Monday's lecture on K-Means Clustering and today's on Principal Component Analysis cover algorithms used to achieve dimensionality reduction in a data set. This sort of reduction is mainly useful for compression, whose applications are easy to see, and for feature extraction. Professor LeCun talked about a project in his lab which uses dimensionality expansion to try to identify multiple voices in a piece of orchestral or jazz music, but we didn't go into any actual mechanics.
Principal Component Analysis attempts to reduce the number of dimensions needed to represent the data by finding positional correlation between two or more of its dimensions. Think about it like this: with two-dimensional data, if all of the points lie on a straight line, then their position in one direction is completely correlated with their position in the other, and you only have to store one of those positions to record each point with no error. In reality, even with very strong correlation, each point will lie slightly to one side or the other of that line (whose direction is what we call the principal component), so projecting onto it introduces some loss. This is much easier to visualize in two dimensions than in ten, but the same basic idea applies, as long as you keep in mind that there may be more than one principal component.
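As a small illustration of that two-dimensional picture (a sketch using numpy; the line y = 2x and the noise level are just made up for the example), storing a single coordinate along the line reconstructs the points almost exactly:

```python
import numpy as np

rng = np.random.default_rng(0)

# 200 points scattered very tightly around the line y = 2x.
t = rng.uniform(-5, 5, size=200)
points = np.column_stack([t, 2 * t + rng.normal(scale=0.05, size=200)])

# Unit vector along the line -- playing the role of the principal component.
u = np.array([1.0, 2.0]) / np.sqrt(5)

# Store a single number per point: its coordinate along u...
coords = points @ u

# ...and reconstruct the 2-D points from that one number.
reconstructed = np.outer(coords, u)

# The reconstruction error is tiny, on the order of the off-line scatter.
print(np.abs(points - reconstructed).max())
```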
In order to find the principal components we need to compute a covariance matrix for the data. Remember that stronger correlation means better compression, since it means less variance in the dimensions orthogonal to the principal component. With this in mind, the covariance matrix A for p (centered, mean-zero) data points x_1, ..., x_p in R^2 is generated with the following sum:
A = (1/p)({sum from k=1 to p} x_k*x_k')

which, writing x_k = (x_k1, x_k2), has the entries

a11 = (1/p)({sum from k=1 to p} (x_k1)^2)
a12 = a21 = (1/p)({sum from k=1 to p} x_k1*x_k2)
a22 = (1/p)({sum from k=1 to p} (x_k2)^2)
The off-diagonal terms a12 and a21 indicate correlation between the two dimensions. When there is no correlation at all, as with independent random data, these terms will be (approximately) 0.
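As a minimal sketch of that computation (assuming numpy and data that has already been centered to mean zero), here the covariance matrix is built for a strongly correlated data set and for an uncorrelated one, to show the off-diagonal terms behaving as described:

```python
import numpy as np

rng = np.random.default_rng(1)

def covariance_matrix(X):
    """X has one sample per row; assumes the columns already have zero mean."""
    p = X.shape[0]
    return (X.T @ X) / p          # (1/p) * sum over samples of x_k x_k'

# Strongly correlated 2-D data: the second coordinate follows the first.
t = rng.normal(size=(1000, 1))
correlated = np.hstack([t, 2 * t + rng.normal(scale=0.1, size=(1000, 1))])
correlated -= correlated.mean(axis=0)      # center

# Independent noise in both coordinates: no correlation.
uncorrelated = rng.normal(size=(1000, 2))
uncorrelated -= uncorrelated.mean(axis=0)

print(covariance_matrix(correlated))    # large off-diagonal terms (a12, a21)
print(covariance_matrix(uncorrelated))  # off-diagonal terms near 0
```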
The question remains: what use is this covariance matrix? The answer is that it lets us find the principal component U and compute the projection of each vector onto it, discarding the components of the vector that are orthogonal to U. We do this by computing two new matrices: one containing the eigenvalues of A in increasing order along the diagonal (typically called Lambda, though here I'll call it L), and the matrix Q whose columns are the normalized and mutually orthogonal eigenvectors of A. Q is an orthogonal (rotation) matrix, and it can readily be seen that A = QLQ'.
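In numpy this decomposition might look like the following sketch; np.linalg.eigh is the symmetric eigensolver, and it returns the eigenvalues in increasing order with the eigenvectors as the columns of Q (the data-generating code is just the running example from above):

```python
import numpy as np

rng = np.random.default_rng(1)

# The centered, correlated 2-D data from the covariance sketch above.
t = rng.normal(size=(1000, 1))
X = np.hstack([t, 2 * t + rng.normal(scale=0.1, size=(1000, 1))])
X -= X.mean(axis=0)
A = (X.T @ X) / X.shape[0]

# eigh handles symmetric matrices; eigenvalues come back in increasing order,
# and the eigenvectors are the orthonormal columns of Q.
eigenvalues, Q = np.linalg.eigh(A)
L = np.diag(eigenvalues)

print(np.allclose(A, Q @ L @ Q.T))  # A = QLQ' holds
```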
Now that we have these matrices we can construct a rectangular matrix Qk made up of the k most important eigenvectors in Q, which are the principal components of A. The definition of "most important" in this case is situational, but often there are a few eigenvalues which are of a much larger magnitude than the rest, and these values' eigenvectors make up Qk.
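How to pick those eigenvectors doesn't have a single right answer, but one common heuristic (a sketch, assuming numpy; top_k_eigenvectors is just an illustrative name) is to sort the eigenvalues in decreasing order and look at the fraction of the total variance each one carries:

```python
import numpy as np

def top_k_eigenvectors(A, k):
    """Columns of the returned matrix are the k eigenvectors of the
    symmetric matrix A with the largest eigenvalues."""
    eigenvalues, Q = np.linalg.eigh(A)        # ascending order
    order = np.argsort(eigenvalues)[::-1]     # re-sort, largest eigenvalue first
    # Fraction of total variance carried by each component -- a way to spot
    # the "few eigenvalues of much larger magnitude" mentioned above.
    print(eigenvalues[order] / eigenvalues.sum())
    return Q[:, order[:k]]                    # this is Qk

# e.g. Qk = top_k_eigenvectors(A, 1) for the 2x2 example above.
```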
Multiplying a vector by Qk' (equivalently, taking its dot product with each of the principal eigenvectors) gives the projections of the vector onto the principal eigenvectors of A. It's now easy and inexpensive to compute the k PCA features of any vector in the original space by simply performing this multiplication. This efficiency has a price, however, in the computation of Q and L: computing the eigenvectors and eigenvalues of a matrix is roughly cubic in the number of dimensions, so there is an upper bound on the dimensionality of the data sets for which PCA is practical.
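Putting the pieces together, here is a sketch of the whole pipeline on synthetic data (the ten-dimensional data set and the choice of k = 2 are invented for illustration): center the data, build the covariance matrix, take its eigendecomposition, keep the top k eigenvectors as Qk, and get the PCA features of a new vector with k dot products:

```python
import numpy as np

rng = np.random.default_rng(2)

# Ten-dimensional data whose variance lives mostly in a few directions.
p, n, k = 500, 10, 2
basis = rng.normal(size=(3, n))
X = rng.normal(size=(p, 3)) @ basis + rng.normal(scale=0.01, size=(p, n))
mean = X.mean(axis=0)
X -= mean                                 # center the data

A = (X.T @ X) / p                         # covariance matrix
eigenvalues, Q = np.linalg.eigh(A)
order = np.argsort(eigenvalues)[::-1]     # sort eigenvalues, largest first
Qk = Q[:, order[:k]]                      # n x k matrix of principal components

# The k PCA features of any new vector are just k dot products.
x_new = rng.normal(size=n)
features = Qk.T @ (x_new - mean)
print(features.shape)                     # (k,)
```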