Sylvester’s Criterion for Positive Definiteness

Suppose A=(a_{ij}) is an n\times n complex matrix. For 1\leq k\leq n, we define the k^{th} leading principal submatrix of A to be the k\times k matrix

\displaystyle A_{k}:=\begin{bmatrix}a_{11}&\cdots&a_{1k}\\ \vdots&\ddots&\vdots\\a_{k1}&\cdots&a_{kk}\end{bmatrix}

We define the k^{th} leading principal minor of A to be the determinant of A_{k}.

I prefer to think of n\times n matrices as operators on \mathbb{R}^{n} or \mathbb{C}^{n}, so what is the operator interpretation of the k^{th} principal submatrix? Let P:\mathbb{R}^{n}\rightarrow\mathbb{R}^{k} be the projection map of a vector onto its first k coordinates with respect to the standard basis e_{1},\ldots,e_{n}. More precisely,

\displaystyle Pv=P\left(\langle{v,e_{1}}\rangle e_{1}+\cdots+\langle{v,e_{n}}\rangle e_{n}\right)=\langle{v,e_{1}}\rangle e_{1}+\cdots+\langle{v,e_{k}}\rangle e_{k},\indent\forall v\in\mathbb{R}^{n}

We can write P as a matrix with respect to the standard bases of \mathbb{R}^{n} and \mathbb{R}^{k}:

P=\begin{bmatrix}1&0&\cdots&0&0&\cdots&0\\ 0&1&\cdots&0&0&\cdots&0\\ 0&0&\ddots&0&0&\cdots&0\\ 0&0&\cdots&1&0&\cdots&0\end{bmatrix},\indent P^{T}=\begin{bmatrix}1&0&\cdots&0\\ 0&1&\cdots&0\\ 0&0&\ddots&0\\ 0&0&\cdots&1\\ 0&0&\cdots&0\\ \vdots&\vdots&\vdots&\vdots\\ 0&0&\cdots&0\end{bmatrix}

Then A_{k}=PAP^{T}.
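Concretely, the operator identity A_{k}=PAP^{T} just says that A_{k} is the upper-left k\times k block of A. A quick NumPy check (the matrix entries below are arbitrary):

```python
import numpy as np

n, k = 4, 2
A = np.arange(1.0, 17.0).reshape(n, n)  # arbitrary 4x4 entries

# P projects R^n onto the first k coordinates: a k-by-n matrix with
# ones on the diagonal and zeros elsewhere.
P = np.eye(k, n)

# The operator P A P^T is the k-th leading principal submatrix...
A_k = P @ A @ P.T

# ...i.e., the upper-left k-by-k block of A.
assert np.array_equal(A_k, A[:k, :k])
```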

Why are leading principal minors useful? Well, suppose that you have located a critical point \mathbf{x}=(x_{1},\ldots,x_{n})\in\mathbb{R}^{n} of a smooth function f:\mathbb{R}^{n}\rightarrow\mathbb{R}. You want to determine whether or not this critical point is a local minimum. So, you compute the Hessian matrix H(f)(\mathbf{x}) at the point \mathbf{x}. The second-derivative test says that \mathbf{x} is a strict local minimum if the matrix H(f)(\mathbf{x}) is positive definite:

\displaystyle\mathbf{y}^{T}H(f)(\mathbf{x})\mathbf{y}>0,\indent\forall\mathbf{y}\neq0\in\mathbb{R}^{n}

Second-Derivative Test. Suppose f:U\rightarrow\mathbb{R} is a C^{2} function on an open set U\subset\mathbb{R}^{n}. If x is a critical point of f and Hf(x) is positive definite, then x is a strict local minimizer of f on U.

Proof. Set A:=Hf(x). Since A is positive definite, the function g(d):=\langle{Ad,d}\rangle satisfies g(d)>0 for all d\neq 0. Restricting g to the compact unit sphere S, we see from Weierstrass’ theorem that g attains its minimum \alpha at some point d_{0}\in S; hence \alpha>0. Normalizing an arbitrary nonzero vector d\in\mathbb{R}^{n}, we see that

\displaystyle g(d)=\left\|d\right\|^{2}\left\langle{A\frac{d}{\left\|d\right\|},\frac{d}{\left\|d\right\|}}\right\rangle\geq\alpha\left\|d\right\|^{2},\indent\forall d\neq0

Of course, the inequality also holds (with equality) when d=0.

Since x is a critical point, \nabla f(x)=0. If \left\|d\right\| is sufficiently small, then Taylor’s theorem tells us that

\begin{array}{lcl}\displaystyle f(x+d)=f(x)+\left\langle{\nabla f(x),d}\right\rangle+\dfrac{1}{2}\left\langle{Ad,d}\right\rangle+o\left(\left\|d\right\|^{2}\right)&=&\displaystyle f(x)+\dfrac{1}{2}g(d)+o\left(\left\|d\right\|^{2}\right)\\&\geq&\displaystyle f(x)+\left\|d\right\|^{2}\left(\dfrac{\alpha}{2}+\dfrac{o(\left\|d\right\|^{2})}{\left\|d\right\|^{2}}\right)\\&>&\displaystyle f(x),\end{array}

where the last inequality holds for \left\|d\right\| sufficiently small, since \frac{o(\left\|d\right\|^{2})}{\left\|d\right\|^{2}}\rightarrow 0 as d\rightarrow 0 and \alpha>0. \Box
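The constant \alpha in the proof is just the smallest eigenvalue of the Hessian. As a numerical sanity check, the sketch below uses an illustrative quadratic of my own choosing, f(x,y)=2x^{2}+xy+y^{2}, whose gradient vanishes at the origin:

```python
import numpy as np

# Illustrative choice: f(x, y) = 2x^2 + xy + y^2, whose gradient
# (4x + y, x + 2y) vanishes at the origin, a critical point.
# Since f is quadratic, its Hessian is the constant matrix below.
H = np.array([[4.0, 1.0],
              [1.0, 2.0]])

# alpha, the minimum of <Hd, d> over the unit sphere, is the
# smallest eigenvalue of H.
alpha = np.linalg.eigvalsh(H).min()
assert alpha > 0  # H is positive definite: the origin is a strict local min

# Spot-check the bound <Hd, d> >= alpha * ||d||^2 from the proof.
rng = np.random.default_rng(0)
for _ in range(100):
    d = rng.standard_normal(2)
    assert d @ H @ d >= alpha * (d @ d) - 1e-9
```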

This is where the leading principal minors come in.

Theorem 1. (Sylvester’s Criterion) A real, symmetric matrix A is positive definite if and only if all of its leading principal minors are positive.

The proof presented below is adapted from George T. Gilbert’s American Mathematical Monthly article “Positive Definite Matrices and Sylvester’s Criterion.”

For necessity, it suffices to show that the k^{th} leading principal submatrix A_{k} is positive definite, for 1\leq k\leq n: the determinant of a positive definite matrix is the product of its (positive) eigenvalues, hence positive. Using the decomposition A_{k}=PAP^{T} obtained above and the hypothesis that A is a positive definite matrix, we see that

\displaystyle\langle{A_{k}v,v}\rangle=\langle{PAP^{T}v,v}\rangle=\langle{AP^{T}v,P^{T}v}\rangle>0,\indent\forall v\neq0\in\mathbb{R}^{k}

since P^{T}v\neq0 whenever v\neq0.

For sufficiency, we first prove two lemmas.

Lemma 2.  Let v_{1},\ldots,v_{n} be a basis of a finite-dimensional vector space V. Suppose W is a k-dimensional subspace of V. If m<k, then there exists a nonzero vector in W which belongs to the span of v_{m+1},\ldots,v_{n}.

Proof. Let w_{1},\ldots,w_{k} be a basis for W. If the statement of the lemma is false, then W\cap\text{span}(v_{m+1},\ldots,v_{n})=\{0\}, so the k+(n-m)>n vectors w_{1},\ldots,w_{k},v_{m+1},\ldots,v_{n} are linearly independent, which contradicts \dim V=n. \Box

Lemma 3. Let A be an n\times n real, symmetric matrix. Suppose W is a k-dimensional subspace of \mathbb{R}^{n} such that

\displaystyle w^{T}Aw>0,\indent\forall w\neq0\in W

Then A has at least k (counting multiplicity) positive eigenvalues.

Proof. By the real spectral theorem, there is an orthonormal basis v_{1},\ldots,v_{n} for \mathbb{R}^{n} consisting of eigenvectors of A with corresponding eigenvalues \lambda_{1},\ldots,\lambda_{n}. Let m denote the number of positive eigenvalues of A, where 0\leq m\leq n, and order the eigenvalues so that \lambda_{1},\ldots,\lambda_{m}>0 and \lambda_{m+1},\ldots,\lambda_{n}\leq0. Assume for the sake of a contradiction that m<k. By Lemma 2, there exists a nonzero vector w\in W of the form

\displaystyle w=a_{m+1}v_{m+1}+\cdots+a_{n}v_{n}

Applying our hypothesis that the quadratic form defined by A is positive on W, we obtain

\begin{array}{lcl}\displaystyle w^{T}Aw=Aw\cdot w&=&\displaystyle\left(a_{m+1}\lambda_{m+1}v_{m+1}+\cdots+a_{n}\lambda_{n}v_{n}\right)\cdot\left(a_{m+1}v_{m+1}+\cdots+a_{n}v_{n}\right)\\&=&\displaystyle\lambda_{m+1}a_{m+1}^{2}+\cdots+\lambda_{n}a_{n}^{2}\\&>&\displaystyle0\end{array}

However, we assumed that \lambda_{m+1},\ldots,\lambda_{n} are all nonpositive, so that \sum_{i=m+1}^{n}\lambda_{i}a_{i}^{2}\leq 0, which is a contradiction. \Box

We prove the sufficiency of Sylvester’s criterion by induction on n. For the base case n=1, there is nothing to prove. Suppose that the sufficiency of Sylvester’s criterion holds for all (n-1)\times(n-1) real, symmetric matrices. Let W denote the subspace of \mathbb{R}^{n} consisting of all vectors with zero n^{th} coordinate, and let P denote the projection of \mathbb{R}^{n} onto the span of the first n-1 standard basis vectors, as above, so that P^{T}Pw=w for every w\in W. Then

\displaystyle w^{T}Aw=\left(P^{T}Pw\right)^{T}A\left(P^{T}Pw\right)=\left(Pw\right)^{T}\left(PAP^{T}\right)\left(Pw\right)=\left(Pw\right)^{T}A_{n-1}\left(Pw\right)>0,\indent\forall w\neq0\in W

by the induction hypothesis: the leading principal minors of A_{n-1} are precisely the first n-1 leading principal minors of A, so A_{n-1} is positive definite, and Pw\neq0 whenever 0\neq w\in W. By Lemma 3, A has at least n-1 positive eigenvalues (counting multiplicity). The remaining eigenvalue must also be positive, since otherwise \det A=\lambda_{1}\cdots\lambda_{n}\leq0, contradicting the positivity of the n^{th} leading principal minor. All eigenvalues of A are therefore positive, so A is positive definite. \Box
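With the theorem in hand, the criterion translates directly, if naively, into code: compute each leading principal minor and check its sign. A sketch in NumPy (the example matrices are my own):

```python
import numpy as np

def sylvester_check(A, tol=0.0):
    """Positive definiteness via Sylvester's criterion:
    every leading principal minor must be positive."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    return all(np.linalg.det(A[:k, :k]) > tol for k in range(1, n + 1))

# A symmetric positive definite matrix (leading minors 2, 3, 4)...
pd = np.array([[ 2.0, -1.0,  0.0],
               [-1.0,  2.0, -1.0],
               [ 0.0, -1.0,  2.0]])
# ...and a symmetric indefinite one (second minor is 1 - 4 = -3).
indef = np.array([[1.0, 2.0],
                  [2.0, 1.0]])

assert sylvester_check(pd)
assert not sylvester_check(indef)

# Cross-check against the eigenvalue characterization.
assert (np.linalg.eigvalsh(pd) > 0).all()
assert not (np.linalg.eigvalsh(indef) > 0).all()
```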

I’m not sure if Sylvester’s criterion is useful for determining whether an n\times n matrix A is positive definite when n is large; I didn’t study much numerical analysis as an undergraduate student. To apply Sylvester’s criterion, we need to know the signs of n determinants. As far as I know, there is no algorithm that computes the sign of a determinant faster than its value. So, we would need to compute n determinants. Laplace expansion, which has arithmetic complexity O(k!), is infeasible when n is large. So, we would probably use Gaussian elimination, which has arithmetic complexity O(k^{3}), to put each leading principal submatrix in upper-triangular form and obtain its determinant.
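In fact, the n determinants need not cost n separate eliminations. If Gaussian elimination without pivoting runs to completion (which it does whenever every leading principal minor is nonzero, in particular for positive definite matrices), then A=LU with L unit lower triangular, so A_{k}=L_{k}U_{k} and \det A_{k} is the product of the first k pivots. A single O(n^{3}) elimination therefore yields all n leading minors at once; a sketch of this observation:

```python
import numpy as np

def leading_minors(A):
    """All leading principal minors from one Gaussian elimination.

    Assumes elimination without pivoting succeeds, which holds
    whenever every leading principal minor is nonzero (in
    particular for positive definite matrices)."""
    U = np.array(A, dtype=float)  # work on a copy
    n = U.shape[0]
    for j in range(n - 1):
        # Eliminate column j below the diagonal, without pivoting.
        U[j + 1:] -= np.outer(U[j + 1:, j] / U[j, j], U[j])
    # det(A_k) is the product of the first k pivots u_11 * ... * u_kk.
    return np.cumprod(np.diag(U))

A = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])

minors = leading_minors(A)
assert np.allclose(minors, [2.0, 3.0, 4.0])
# Sylvester's criterion with a single O(n^3) elimination:
assert (minors > 0).all()
```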

If we had additional information about the matrix A, then the aforementioned approaches would be unnecessary. For instance, if we could easily find a basis with respect to which A is upper triangular, then we could check positive definiteness by examining the diagonal entries. Recall that a symmetric matrix is positive definite if and only if all its eigenvalues are positive, and the diagonal entries of an upper triangular matrix are precisely its eigenvalues, repeated with multiplicity. Therefore we would only need to check whether the diagonal entries are all positive.
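In numerical practice, the standard test is in a similar spirit: attempt a Cholesky factorization A=LL^{T}, which succeeds precisely when the symmetric matrix A is positive definite. A minimal sketch using NumPy, which raises `LinAlgError` when the factorization fails:

```python
import numpy as np

def is_positive_definite(A):
    """Attempt a Cholesky factorization A = L L^T; NumPy raises
    LinAlgError exactly when that fails, i.e. when A is not
    positive definite. Assumes A is symmetric (NumPy only reads
    the lower triangle)."""
    try:
        np.linalg.cholesky(A)
        return True
    except np.linalg.LinAlgError:
        return False

assert is_positive_definite(np.array([[2.0, 1.0],
                                      [1.0, 2.0]]))
assert not is_positive_definite(np.array([[1.0, 2.0],
                                          [2.0, 1.0]]))
```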

This entry was posted in math.OC.

6 Responses to Sylvester’s Criterion for Positive Definiteness

  1. dinesh says:

    From a numerical perspective, won’t the power method help to determine if it is positive definite?

    • matthewhr says:

      If my understanding is correct, the power method only gives you the eigenvalue with the largest magnitude; however, a matrix is positive definite if and only if all its eigenvalues are positive. So, I’m not sure how knowing just one eigenvalue will help.

      There may be a multi-step adaptation of the power method, which perhaps is closer to what you have in mind, but I am not familiar with it. If you find out more, please do let me know.

  2. Dileep says:

    Oh, I was thinking that if you have a numerical technique to estimate the lowest eigenvalue (or its corresponding eigenvector), then that would certainly tell you whether the matrix is positive definite or not. I thought the power method did that. Guess I was wrong.

  3. Pingback: Sylvester’s Criterion for Positive-Semidefinite Matrices I | Math by Matt

  4. Anonymous says:

    The spectral decomposition of a real symmetric matrix can be done relatively easily using the Jacobi iteration algorithm, at least for moderate (n x n) matrices. The non-linear optimization objective functions I have encountered typically involve fewer than 20-30 variables, which is well within the capability of the Jacobi algorithm.
