Danny Siu's Personal Blog

Pseudoinverse and the shortest solution to least squares

May 03, 2020 | 2 min read

When we try to solve Ax=bAx=b for least squares, we solve ATAx^=ATbA^TA\hat{x}=A^Tb. However, since AA may have dependent columns, the solution to ATAx^=ATbA^TA\hat{x}=A^Tb is not exact.


A solution can be x+=A+bx^+ = A^+b where A+A^+ is the pseudoinverse of AA. To see that x+=A+bx^+ = A^+b is a solution to ATAx^=ATbA^TA\hat{x}=A^Tb:

ATAx+=ATb              ATAA+b=ATb    AT(bAA+b)=0\begin{aligned} A^TAx^+ &= A^Tb \\ \iff \space\space\space\space\space\space\space\space\space\space A^TAA^+b &= A^Tb \\ \iff A^T(b - AA^+b) &= 0 \\ \end{aligned}

Using SVD, we have AA+b=(u1u1T++ururT)b=(u1Tb)u1++(urTb)urAA^+b = (u_1u_1^T + \dots + u_ru_r^T)b = (u_1^Tb)u_1 + \dots + (u_r^Tb)u_r and using the result I proved on here, {u1,,uru_1, \dots , u_r} can be a basis to C(A)C(A), we know AA+bAA^+b is the projection of bb onto C(A)C(A) and hence (bAA+b)N(AT)(b - AA^+b) \isin N(A^T) and so AT(bAA+b)=0A^T(b - AA^+b) = 0.


So, how special is x+=A+bx^+ = A^+b? It is the shortest one among all solutions to ATAx^=ATbA^TA\hat{x}=A^Tb.


To see such result, we first observe the SVD of ATA^T and A+A^+:

AT=σ1v1u1T++σrvrurTA+=1σ1v1u1T++1σrvrurT\begin{aligned} A^T &= \sigma_1v_1u_1^T + \dots + \sigma_rv_ru_r^T \\ A^+ &= \frac{1}{\sigma_1}v_1u_1^T + \dots + \frac{1}{\sigma_r}v_ru_r^T \end{aligned}

Using the result I proved on here again, we know ATA^T and A+A^+ have the same four subspaces. Hence, x+=A+bC(A+)x^+ = A^+b \isin C(A^+) which is C(AT)C(A^T) and N(A)\perp N(A).


Now, any other x^\hat{x} which satisfies ATAx^=ATbA^TA\hat{x}=A^Tb, x^x+N(ATA)\hat{x}-x^+ \isin N(A^TA) which is N(A)N(A). Then, we have:

1. (x^x+)x+2. (x^x+)Tx+=0    x^Tx+=(x+)Tx+=x+23. (x+)T(x^x+)=0    (x+)Tx^=x+2\begin{aligned} 1.& \space(\hat{x}-x^+) \perp x^+ \\ 2.& \space(\hat{x}-x^+)^T x^+ = 0 \iff \hat{x}^Tx^+ = (x^+)^Tx^+ = |x^+|^2 \\ 3.& \space(x^+)^T(\hat{x}-x^+) = 0 \iff (x^+)^T\hat{x} = |x^+|^2 \end{aligned}

Then, we consider (x^x+)T(x^x+)(\hat{x} - x^+)^T(\hat{x}-x^+):

(x^x+)T(x^x+)=x^Tx^x^Tx+(x+)Tx^+(x+)Tx+=x^22x+2+x+2=x^2x+2\begin{aligned} &(\hat{x} - x^+)^T(\hat{x}-x^+) \\ =& \hat{x}^T\hat{x} - \hat{x}^Tx^+ - (x^+)^T\hat{x} + (x^+)^Tx^+ \\ =& |\hat{x}|^2 - 2|x^+|^2 + |x^+|^2 \\ =& |\hat{x}|^2 - |x^+|^2 \end{aligned}

i.e. x^2x+2=(x^x+)T(x^x+)=x^x+2|\hat{x}|^2 - |x^+|^2 = (\hat{x} - x^+)^T(\hat{x}-x^+) = |\hat{x}-x^+|^2
x^2=x^x+2+x+2\therefore |\hat{x}|^2 = |\hat{x}-x^+|^2 + |x^+|^2 and hence x+=A+bx^+ = A^+b is the shortest one among all solutions x^\hat{x} to ATAx^=ATbA^TA\hat{x} = A^Tb. Then, we know x+=A+bx^+ = A^+b is the shortest solution to least squares of Ax=bAx=b.


Written by Danny Siu who lives in Hong Kong. You should follow him on Twitter