Pseudoinverse and the shortest solution to least squares
May 03, 2020 | 2 min read
When we try to solve Ax=b for least squares, we solve ATAx^=ATb. However, since A may have dependent columns, the solution to ATAx^=ATb is not exact.
A solution can be x+=A+b where A+ is the pseudoinverse of A. To see that x+=A+b is a solution to ATAx^=ATb:
ATAx+⟺ ATAA+b⟺AT(b−AA+b)=ATb=ATb=0Using SVD, we have AA+b=(u1u1T+⋯+ururT)b=(u1Tb)u1+⋯+(urTb)ur and using the result I proved on here, {u1,…,ur} can be a basis to C(A), we know AA+b is the projection of b onto C(A) and hence (b−AA+b)∈N(AT) and so AT(b−AA+b)=0.
So, how special is x+=A+b? It is the shortest one among all solutions to ATAx^=ATb.
To see such result, we first observe the SVD of AT and A+:
ATA+=σ1v1u1T+⋯+σrvrurT=σ11v1u1T+⋯+σr1vrurTUsing the result I proved on here again, we know AT and A+ have the same four subspaces. Hence, x+=A+b∈C(A+) which is C(AT) and ⊥N(A).
Now, any other x^ which satisfies ATAx^=ATb, x^−x+∈N(ATA) which is N(A). Then, we have:
1.2.3. (x^−x+)⊥x+ (x^−x+)Tx+=0⟺x^Tx+=(x+)Tx+=∣x+∣2 (x+)T(x^−x+)=0⟺(x+)Tx^=∣x+∣2
Then, we consider (x^−x+)T(x^−x+):
===(x^−x+)T(x^−x+)x^Tx^−x^Tx+−(x+)Tx^+(x+)Tx+∣x^∣2−2∣x+∣2+∣x+∣2∣x^∣2−∣x+∣2i.e. ∣x^∣2−∣x+∣2=(x^−x+)T(x^−x+)=∣x^−x+∣2
∴∣x^∣2=∣x^−x+∣2+∣x+∣2 and hence x+=A+b is the shortest one among all solutions x^ to ATAx^=ATb. Then, we know x+=A+b is the shortest solution to least squares of Ax=b.