Moore–Penrose pseudoinverse

Moore–Penrose pseudoinverse

In mathematics, and in particular linear algebra, a pseudoinverse A+ of a matrix A is a generalization of the inverse matrix.[1] The most widely known type of matrix pseudoinverse is the Moore–Penrose pseudoinverse, which was independently described by E. H. Moore[2] in 1920, Arne Bjerhammar [3] in 1951 and Roger Penrose[4] in 1955. Earlier, Fredholm had introduced the concept of a pseudoinverse of integral operators in 1903. When referred to a matrix, the term pseudoinverse, without further specification, is often used to indicate the Moore–Penrose pseudoinverse. The term generalized inverse is sometimes used as a synonym for pseudoinverse.

A common use of the Moore–Penrose pseudoinverse (hereafter, just pseudoinverse) is to compute a 'best fit' (least squares) solution to a system of linear equations that lacks a unique solution (see below under Applications). Another use is to find the minimum (Euclidean) norm solution to a system of linear equations with multiple solutions.

The pseudoinverse is defined and unique for all matrices whose entries are real or complex numbers. It can be computed using the singular value decomposition.

Contents

Notation

In the following discussion, the following conventions are adopted.

  • \mathbb{K} will denote one of the fields of real or complex numbers, denoted \mathbb{R},\,\mathbb{C}, respectively. The ring of m \times n matrices over \mathbb{K} is denoted by M(m,n;\mathbb{K}).
  • For A \in M(m,n;\mathbb{K}), AT and A * denote the transpose and Hermitian transpose (also called conjugate transpose) respectively. If \mathbb{K} = \mathbb{R}, then A * = AT.
  • For A \in M(m,n;\mathbb{K}), then \operatorname{Im}(A) denotes the range (image) of A (the space spanned by the column vectors of A) and \operatorname{Ker}(A) denotes the kernel (null space) of A.
  • Finally, for any positive integer n, I_{n} \in M(n,n;\mathbb{K}) denotes the n \times n identity matrix.

Definition

For  A \in M(m,n;\mathbb{K}) , a Moore–Penrose pseudoinverse (hereafter, just pseudoinverse) of A is defined as a matrix  A^+ \in M(n,m;\mathbb{K}) satisfying all of the following four criteria:[4][5]

  1. A A^+A = A\,\!       (AA+ need not be the general identity matrix, but it maps all column vectors of A to themselves);
  2. A^+A A^+ = A^+\,\!       (A+ is a weak inverse for the multiplicative semigroup);
  3. (AA^+)^* = AA^+\,\!       (AA+ is Hermitian); and
  4. (A^+A)^* = A^+A\,\!       (A+A is also Hermitian).

Properties

Proofs for some of these facts may be found on a separate page here.

Existence and uniqueness

  • The Moore–Penrose pseudoinverse exists and is unique: for any matrix  A\,\!, there is precisely one matrix  A^+\,\!, that satisfies the four properties of the definition.[5]

A matrix satisfying the first two conditions of the definition is known as a generalized inverse. Generalized inverses always exist but are not in general unique. Uniqueness is a consequence of the last two conditions.

Basic properties

  • If A\,\! has real entries, then so does  A^+\,\!.
  • If A\,\! is invertible, the pseudoinverse and the inverse coincide: A^+=A^{-1}\,\!.[6]:243
  • The pseudoinverse of a zero matrix is its transpose.
  • The pseudoinverse of the pseudoinverse is the original matrix: (A^+)^+=A\,\!.[6]:245
  • Pseudoinversion commutes with transposition, conjugation, and taking the conjugate transpose:[6]:245
(A^T)^+ = (A^+)^T,~~ \overline{A}^+ = \overline{A^+},~~ (A^*)^+ = (A^+)^*.\,\!
  • The pseudoinverse of a scalar multiple of A is the reciprocal multiple of A+:
(\alpha A)^+ = \alpha^{-1} A^+\,\! for \alpha\neq 0.

Identities

The following identities can be used to cancel certain subexpressions or expand expressions involving pseudoinverses. Proofs for these properties can be found in the proofs subpage.

\begin{array}{lclll}
A^+ &=& A^+   & A^{+*} & A^*\\
A^+ &=& A^*   & A^{+*} & A^+\\
A   &=& A^{+*}& A^*    & A  \\
A   &=& A     & A^*    & A^{+*}\\
A^* &=& A^*   & A      & A^+\\
A^* &=& A^+   & A      & A^*\\
\end{array}

Reduction to Hermitian case

  • A^+ = (A^*A)^+A^*\,\!.
  • A^+ = A^*(AA^*)^+\,\!.

Products

If  A \in M(m,n;\mathbb{K}),~B \in M(n,p;\mathbb{K})\,\! and either,

  •  A\,\! has orthonormal columns (i.e.  A^*A = I_n\, ) or,
  •  B\,\! has orthonormal rows (i.e.  BB^* = I_n\, ) or,
  •  A\,\! has all columns linearly independent (full column rank) and  B\,\! has all rows linearly independent (full row rank),

then (AB)^+ = B^+ A^+\,\!.

Projectors

If P = AA^+\,\! and Q = A^+A\,\! are orthogonal projection operators i.e.  P = P^*\,\!,  Q = Q^*\,\!,  P^2 = P\,\! and  Q^2 = Q\,\! i.e. they are Hermitian and idempotent.

Then the following hold:

  • PA=A=AQ\,\! and A^+P=A^+=QA^+\,\!
  • P\,\! is the orthogonal projector onto the range of A\,\! (which equals the orthogonal complement of the kernel of  A^*\,\!).
  • Q\,\! is the orthogonal projector onto the range of A^*\,\! (which equals the orthogonal complement of the kernel of  A\,\!).
  • (I - P)\,\! is the orthogonal projector onto the kernel of A^*\,\!.
  • (I - Q)\,\! is the orthogonal projector onto the kernel of A\,\!.[5]

Subspaces

  •  \operatorname{Ker}(A^+) = \operatorname{Ker}(A^*)\,\!
  •  \operatorname{Im}(A^+) = \operatorname{Im}(A^*)\,\!

Limit relations

  • The pseudoinverse can also be computed via limiting processes:
A^+ = \lim_{\delta \searrow 0} (A^* A + \delta I)^{-1} A^*
          = \lim_{\delta \searrow 0} A^* (A A^* + \delta I)^{-1}
(see Tikhonov regularization). These limits exist even if (AA^*)^{-1}\,\! or (A^*A)^{-1}\,\! do not exist.[5]:263

Continuity

  • In contrast to ordinary matrix inversion, the process of taking pseudoinverses is not continuous: if the sequence (An) converges to the matrix A (in the maximum norm or Frobenius norm, say), then (An)+ need not converge to A+. However, if all the matrices have the same rank, (An)+ will converge to A+.

Special cases

Scalars

It is also possible to define a pseudoinverse for scalars and vectors. This amounts to treating these as matrices. The pseudoinverse of a scalar x is zero if x is zero and the reciprocal of x otherwise:

x^+ = \left\{\begin{matrix} 0, & \mbox{if }x=0;
 \\ x^{-1}, & \mbox{otherwise}. \end{matrix}\right.

Vectors

The pseudoinverse of the null (all zero) vector is the transposed null vector. The pseudoinverse of a non-null vector is the conjugate transposed vector divided by its squared magnitude:

x^+ = \left\{\begin{matrix} 0^T, & \mbox{if }x = 0;
 \\ {x^* \over x^* x}, & \mbox{otherwise}. \end{matrix}\right.

Linearly independent columns

If the columns of A\,\! are linearly independent (so that m \ge n), then A^*A\,\! is invertible. In this case, an explicit formula is:[1]

A^+ = (A^*A)^{-1}A^*\,\!.

It follows that A^+\,\! is then a left inverse of A\,\!:    A^+ A = I_n\,\!.

Linearly independent rows

If the rows of A\,\! are linearly independent (so that m \le n), then AA * is invertible. In this case, an explicit formula is:

A^+ = A^*(A A^*)^{-1}\,\!.

It follows that A^+\,\! is a right inverse of A\,\!:   A A^+ = I_m\,\!.

Orthonormal columns or rows

This is a special case of either full column rank or full row rank (treated above). If A\,\! has orthonormal columns (A^*A = I_n\,\!) or orthonormal rows (AA^* = I_m\,\!), then A^+ = A^*\,\!.

Circulant matrices

For a Circulant matrix C\,\!, the singular value decomposition is given by the Fourier transform, that is the singular values are the Fourier coefficients. Let \mathcal{F} be the Discrete Fourier Transform (DFT) matrix, then

C = \mathcal{F}\cdot\Sigma\cdot\mathcal{F}^*\,\!
C^+ = \mathcal{F}\cdot\Sigma^+\cdot\mathcal{F}^*\,\![7]

Construction

Rank decomposition

Let  r \le \min(m,n) denote the rank of A \in M(m,n;\mathbb{K})\,\!. Then A\,\! can be (rank) decomposed as A=BC\,\! where B \in M(m,r;\mathbb{K})\,\! and C \in M(r,n;\mathbb{K})\,\! are of rank r. Then A^+ = C^+B^+ = C^*(CC^*)^{-1}(B^*B)^{-1}B^*\,\!.

The QR method

For \mathbb{K}=\mathbb{R}\,\! or \mathbb{K}=\mathbb{C}\,\! computing the product AA * or A * A and their inverses explicitly is often a source of numerical rounding errors and computational cost in practice. An alternative approach using the QR decomposition of A\,\! may be used instead.

Considering the case when A\,\! is of full column rank, so that A^+ = (A^*A)^{-1}A^*\,\!. then the Cholesky decomposition A^*A = R^*R\,\!, where R\,\! is an upper triangular matrix, may be used. Multiplication by the inverse is then done easily by solving a system with multiple right-hand-sides,

A^+ = (A^*A)^{-1}A^*  \qquad \Leftrightarrow \qquad  (A^*A)A^+ = A^*  \qquad \Leftrightarrow \qquad R^*RA^+ = A^*

which may be solved by forward substitution followed by back substitution.

The Cholesky decomposition may be computed without forming A^*A\,\! explicitly, by alternatively using the QR decomposition of  A = QR\,\!, where Q\,\,\! has orthonormal columns, Q * Q = I, and R\,\! is upper triangular. Then

 A^*A \,=\, (QR)^*(QR) \,=\, R^*Q^*QR \,=\, R^*R,

so R is the Cholesky factor of A * A.

The case of full row rank is treated similarly by using the formula A^+ = A^*(AA^*)^{-1}\,\! and using a similar argument, swapping the roles of A and A * .

Singular value decomposition (SVD)

A computationally simple and accurate way to compute the pseudoinverse is by using the singular value decomposition.[1][5][8] If A = UΣV * is the singular value decomposition of A, then A + = VΣ + U * . For a diagonal matrix such as Σ, we get the pseudoinverse by taking the reciprocal of each non-zero element on the diagonal, leaving the zeros in place, and transposing the resulting matrix. In numerical computation, only elements larger than some small tolerance are taken to be nonzero, and the others are replaced by zeros. For example, in the MATLAB or NumPy function pinv, the tolerance is taken to be t = ε•max(m,n)•max(Σ), where ε is the machine epsilon.

The computational cost of this method is dominated by the cost of computing the SVD, which is several times higher than matrix-matrix multiplication, even if a state-of-the art implementation (such as that of LAPACK) is used.

The above procedure shows why taking the pseudoinverse is not a continuous operation: if the original matrix A has a singular value 0 (a diagonal entry of the matrix Σ above), then modifying A slightly may turn this zero into a tiny positive number, thereby affecting the pseudoinverse dramatically as we now have to take the reciprocal of a tiny number.

Block matrices

Optimized approaches exist for calculating the pseudoinverse of block structured matrices.

The iterative method of Ben-Israel and Cohen

Another method for computing the pseudoinverse uses the recursion

 A_{i+1} = 2A_i - A_i A A_i, \,

which is sometimes referred to as hyper-power sequence. This recursion produces a sequence converging quadratically to the pseudoinverse of A if it is started with an appropriate A0 satisfying A0A = (A0A) * . The choice A0 = αA * (where 0 < \alpha < 2/\sigma^2_1(A), with σ1(A) denoting the largest singular value of A) [9] has been argued not to be competitive to the method using the SVD mentioned above, because even for moderately ill-conditioned matrices it takes a long time before Ai enters the region of quadratic convergence.[10] However, if started with A0 already close to the Moore–Penrose pseudoinverse and A0A = (A0A) * , for example A0: = (A * A + δI) − 1A * , convergence is fast (quadratic).

Updating the pseudoinverse

For the cases where A has full row or column rank, and the inverse of the correlation matrix (AA * for A with full row rank or A * A for full column rank) is already known, the pseudoinverse for matrices related to A can be computed by applying the Sherman–Morrison–Woodbury formula to update the inverse of the correlation matrix, which may need less work. In particular, if the related matrix differs from the original one by only a changed, added or deleted row or column, incremental algorithms[11][12] exist that exploit the relationship.

Similarly, it is possible to update the Cholesky factor when a row or column is added, without creating the inverse of the correlation matrix explicitly. However, updating the pseudoinverse in the general rank-deficient case is much more complicated.[13][14]

Software libraries

The package NumPy provides a pseudo-inverse calculation through its functions matrix.I and linalg.pinv. High quality implementations of SVD, QR, and back substitution are available in standard libraries, such as LAPACK. Writing one's own implementation of SVD is a major programming project that requires a significant numerical expertise. In special circumstances, such as parallel computing or embedded computing, however, alternative implementations by QR or even the use of an explicit inverse might be preferable, and custom implementations may be unavoidable.

Applications

Linear least-squares

The pseudoinverse provides a least squares solution to a system of linear equations.[15] For  A \in M(m,n; \mathbb{K})\,\!, given a system of linear equations

A x = b\,,

in general, a vector x which solves the system may not exist, or if one exists, it may not be unique. The pseudoinverse solves the "least-squares" problem as follows:

  •  \forall x \in \mathbb{K}^n\,\!, we have \|Ax -b\|_2 \ge \|Az -b\|_2 where z = A + b and \|\cdot\|_2 denotes the Euclidean norm. This weak inequality holds with equality if and only if x = A + b + (IA + A)w for any vector w; this provides an infinitude of minimizing solutions unless A has full column rank, in which case (IA + A) is a zero matrix.

This result is easily extended to systems with multiple right-hand sides, when the Euclidean norm is replaced by the Frobenius norm. Let  B \in M(m,p; \mathbb{K})\,\!.

  •  \forall X \in M(n,p; \mathbb{K})\,\!, we have  \|AX - B\|_F \ge \|AZ -B\|_F where Z = A + B and \|\cdot\|_F denotes the Frobenius norm.

Obtaining all solutions of a linear system

If the linear system

A x = b\,

has any solutions, they are all given by

x = A + b + [IA + A]w

for arbitrary vector w. Solution(s) exist if and only if AA + b = b. If the latter holds, then the solution is unique if and only if A has full column rank, in which case [IA + A] is a zero matrix.

Minimum-norm solution to a linear system

For linear systems A x = b,\, with non-unique solutions (such as under-determined systems), the pseudoinverse may be used to construct the solution of minimum Euclidean norm \|x\|_2 among all solutions.

  • If A x = b\, is satisfiable, the vector z = A + b is a solution, and satisfies \|z\|_2 \le \|x\|_2 for all solutions.

This result is easily extended to systems with multiple right-hand sides, when the Euclidean norm is replaced by the Frobenius norm. Let  B \in M(m,p; \mathbb{K})\,\!.

  • If A X = B\, is satisfiable, the matrix Z = A + B is a solution, and satisfies \|Z\|_F \le \|X\|_F for all solutions.

Geometric construction

This description suggests the following geometric construction for the result of applying the pseudoinverse of an m×n matrix A to a vector. To find A + b for given b in Rm, first project b orthogonally onto the range of A, finding a point p(b) in the range. Then form A-1({p(b)}), i.e. find those vectors in Rn that A sends to p(b). This will be an affine subspace of Rn parallel to the kernel of A. The element of this subspace that has the smallest length (i.e. is closest to the origin) is the answer A + b we are looking for. It can be found by taking an arbitrary member of A-1({p(b)}) and projecting it orthogonally onto the orthogonal complement of the kernel of A.

Condition number

Using the pseudoinverse and a matrix norm, one can define a condition number for any matrix:

\mbox{cond}(A)=\|A\| \|A^+\|.\

A large condition number implies that the problem of finding least-squares solutions to the corresponding system of linear equations is ill-conditioned in the sense that small errors in the entries of A can lead to huge errors in the entries of the solution.[16]

Generalizations

In order to solve more general least-squares problems, one could try to define Moore–Penrose pseudoinverses for all continuous linear operators A : H1H2 between two Hilbert spaces H1 and H2, using the same four conditions as in our definition above. It turns out that not every continuous linear operator has a continuous linear pseudo-inverse in this sense.[16] Those that do are precisely the ones whose range is closed in H2.

In abstract algebra, a Moore–Penrose pseudoinverse may be defined on a *-regular semigroup. This abstract definition coincides with the one in linear algebra.

See also

References

  1. ^ a b c Ben-Israel, Adi; Thomas N.E. Greville (2003). Generalized Inverses. Springer-Verlag. ISBN 0-387-00293-6. 
  2. ^ Moore, E. H. (1920). "On the reciprocal of the general algebraic matrix". Bulletin of the American Mathematical Society 26 (9): 394–395. doi:10.1090/S0002-9904-1920-03322-7. http://projecteuclid.org/euclid.bams/1183425340. 
  3. ^ Bjerhammar, Arne (1951). "Application of calculus of matrices to method of least squares; with special references to geodetic calculations". Trans. Roy. Inst. Tech. Stockholm 49. 
  4. ^ a b Penrose, Roger (1955). "A generalized inverse for matrices". Proceedings of the Cambridge Philosophical Society 51: 406–413. doi:10.1017/S0305004100030401. 
  5. ^ a b c d e Golub, Gene H.; Charles F. Van Loan (1996). Matrix computations (3rd ed.). Baltimore: Johns Hopkins. pp. 257–258. ISBN 0-8018-5414-8. 
  6. ^ a b c Stoer, Josef; Bulirsch, Roland (2002). Introduction to Numerical Analysis (3rd ed.). Berlin, New York: Springer-Verlag. ISBN 978-0-387-95452-3. .
  7. ^ Stallings, W. T.; Boullion, T. L. (1972). "The Pseudoinverse of an r-Circulant Matrix". Proceedings of the American Mathematical Society 34: 385–388. doi:10.2307/2038377. 
  8. ^ Linear Systems & Pseudo-Inverse
  9. ^ Ben-Israel, Adi; Cohen, Dan (1966). "On Iterative Computation of Generalized Inverses and Associated Projections". SIAM Journal on Numerical Analysis 3: 410–419. doi:10.1137/0703035. JSTOR 2949637. pdf
  10. ^ Söderström, Torsten; Stewart, G. W. (1974). "On the Numerical Properties of an Iterative Method for Computing the Moore- Penrose Generalized Inverse". SIAM Journal on Numerical Analysis 11: 61–74. doi:10.1137/0711008. JSTOR 2156431. 
  11. ^ Tino Gramß (1992). Worterkennung mit einem künstlichen neuronalen Netzwerk. Georg-August-Universität zu Göttingen. 
  12. ^ , Mohammad Emtiyaz, "Updating Inverse of a Matrix When a Column is Added/Removed"[1]
  13. ^ Meyer, Carl D., Jr. Generalized inverses and ranks of block matrices. SIAM J. Appl. Math. 25 (1973), 597—602
  14. ^ Meyer, Carl D., Jr. Generalized inversion of modified matrices. SIAM J. Appl. Math. 24 (1973), 315—323
  15. ^ Penrose, Roger (1956). "On best approximate solution of linear matrix equations". Proceedings of the Cambridge Philosophical Society 52: 17–19. doi:10.1017/S0305004100030929. 
  16. ^ a b Roland Hagen, Steffen Roch, Bernd Silbermann. C*-algebras and Numerical Analysis, CRC Press, 2001. Section 2.1.2.

External links


Wikimedia Foundation. 2010.

Игры ⚽ Нужна курсовая?

Look at other dictionaries:

  • Moore-Penrose pseudoinverse — In mathematics, and in particular linear algebra, the pseudoinverse A^+ of an m imes n matrix A is a generalization of the inverse matrix.cite book | last=Ben Israel | first = Adi | coauthors=Thomas N.E. Greville | title=Generalized Inverses |… …   Wikipedia

  • Moore-Penrose pseudoinverse/Proofs — Existence and Uniqueness =Let A be an m by n matrix over K , where K is either the field of real numbers or the field of complex numbers. Then there is a unique n by m matrix A^+ over K such that: #A A^+A = A #A^+A A^+ = A^+ #(AA^+)^* = AA^+… …   Wikipedia

  • Moore-Penrose-Inverse — Die Pseudoinverse einer Matrix ist ein Begriff aus dem mathematischen Teilgebiet lineare Algebra. Sie ist eine Verallgemeinerung der inversen Matrix auf singuläre und nichtquadratische Matrizen, weshalb sie häufig auch als verallgemeinerte… …   Deutsch Wikipedia

  • Pseudoinverse — Die Pseudoinverse einer Matrix ist ein Begriff aus dem mathematischen Teilgebiet lineare Algebra. Sie ist eine Verallgemeinerung der inversen Matrix auf singuläre und nichtquadratische Matrizen, weshalb sie häufig auch als verallgemeinerte… …   Deutsch Wikipedia

  • Pseudoinverse — Pseudo inverse En algèbre linéaire, la notion de pseudo inverse (ou inverse généralisé) généralise celle d’inverse d’une application linéaire ou d’une matrice[1] aux cas non inversibles en lui supprimant certaines des propriétés demandées aux… …   Wikipédia en Français

  • Roger Penrose — Infobox Scientist box width = 300px name = Sir Roger Penrose image size = 200px caption = Roger Penrose at Brookhaven Lab, February 6, 2007 birth date = birth date and age|1931|08|08 birth place = Colchester, Essex, England death date = death… …   Wikipedia

  • E. H. Moore — Eliakim Hastings Moore (* 28. Januar 1862 in Marietta, Ohio; † 30. Dezember 1932 in Chicago, Illinois) war ein US amerikanischer Mathematiker. Moore studierte ab 1879 an der Yale University Mathematik und Astronomie und promovierte 1885 mit der… …   Deutsch Wikipedia

  • Eliakim Hastings Moore — (* 28. Januar 1862 in Marietta, Ohio; † 30. Dezember 1932 in Chicago, Illinois) war ein US amerikanischer Mathematiker. Moore studierte ab 1879 an der Yale University Mathematik und Astronomie und promovierte …   Deutsch Wikipedia

  • Invertible matrix — In linear algebra an n by n (square) matrix A is called invertible (some authors use nonsingular or nondegenerate) if there exists an n by n matrix B such that where In denotes the n by n identity matrix and the multiplication used is ordinary… …   Wikipedia

  • Inverse element — In abstract algebra, the idea of an inverse element generalises the concept of a negation, in relation to addition, and a reciprocal, in relation to multiplication. The intuition is of an element that can undo the effect of combination with… …   Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”