Your life will be easier when you use Principal Component Analysis.

How: Compress a higher-dimensional vector into a lower-dimensional one. It is a lossy compression (it might lose information), but it helps save memory and computational cost.

I will explain it from the perspective of matrices.

So, here is an easy example [1]:

Take a vector v,

normalize it:

u = v / ‖v‖

Suppose there is a point p = [3, 2, 5], and make a projection of the point onto the vector u.

Then we will have:

p · u ≈ 5.65

That is, we use the number 5.65 to represent the point [3, 2, 5]. We successfully reduced the 3-dimensional vector to a single number (1 dimension).
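For a quick check in NumPy: the concrete vector did not survive the formatting above, so v = [1, 0, 1] below is only an assumed example, chosen because it happens to reproduce the 5.65:

```python
import numpy as np

v = np.array([1, 0, 1])      # assumed example vector (not from the source)
u = v / np.linalg.norm(v)    # normalize: u = v / ||v||
p = np.array([3, 2, 5])      # the point to compress

print(np.dot(p, u))          # 5.6568... ~= 5.65, the 1-dimensional representation
```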

Step-by-step [2]:

When you are given a matrix M:

1. Get the average value m of all the samples, and the covariance matrix S;

2. Get the eigenvalues of S in descending order, and select the eigenvectors of the first n; you will get an eigenmatrix E;

3. Project to get the new matrix M′ = (M − m)·E (subtract the mean from each sample, then multiply by E).

I drew a simple sketch to show the process. (I know my handwriting is terrible… :() Suppose we have 100 data points, each containing 10 features, starting from the top left corner:

After PCA, we get a new matrix M′, a compressed matrix.

If you are familiar with ML, you can treat the matrix E as a “classifier”. When a new dataset comes in, say only 2 records, it can be represented by a matrix T with dimensions 2 × 10. The next step is to apply this “classifier” to the data: T·E, and you will get the final result, as sketched below.
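Here is a tiny sketch of that step. The 2 × 10 data T is made up, and the E below is only a stand-in with the shape of the real eigenmatrix from the code that follows:

```python
import numpy as np

T = np.random.randn(2, 10)   # hypothetical new data: 2 records, 10 features
E = np.random.randn(10, 4)   # stand-in with the shape of the real eigenmatrix

result = np.dot(T, E)        # apply the "classifier": project the new records
print(result.shape)          # (2, 4): each record is now 4 numbers
```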

Python code for the example:

```python
import numpy as np
from numpy import linalg as LA

# Build a sample matrix M: 100 samples, each with 10 features
mu_vec1 = np.zeros(10)
cov_mat1 = np.identity(10)
sampleM = np.random.multivariate_normal(mu_vec1, cov_mat1, 100).T  # (10, 100)

# Covariance matrix S of the 10 features
covM = np.cov(sampleM)  # (10, 10)

# Get eigenvalues and eigenvectors (eigh suits a symmetric matrix),
# then sort them into descending order of eigenvalue
lambdas, vecs = LA.eigh(covM)
order = np.argsort(lambdas)[::-1]
vecs = vecs[:, order]

# Eigenmatrix E: keep the first 4 eigenvectors
E = vecs[:, :4]  # use E.shape to check the shape: output is (10, 4)

# Center the samples around the mean m, then project
centered = sampleM.T - sampleM.T.mean(axis=0)
M2 = np.dot(centered, E)  # M2, the new matrix, has the shape (100, 4)
print(M2.shape)
print(M2)
```

NumPy is powerful for matrix processing. lol, I cannot imagine what this would look like in Java if I had to define a matrix; probably an ArrayList of ArrayLists.

Besides, building real-world models this way makes everything easy: forget about the objects and just do matrix multiplications. Btw, GPUs are quite good at that job.

**Application: Facial Recognition**

Think of an image: it can be seen as a matrix A (composed of pixels). Use PCA to get the eigenmatrix E, and then a compressed matrix. The PCA-based *Eigenface* approach is popular for facial recognition jobs and is provided by OpenCV; the function *cvEigenDecomposite()* is used to get the projection onto n dimensions. Note that this only happens in the training process.
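As a rough illustration, here is a minimal NumPy sketch of the eigenface idea; the image size and the random “faces” are made-up stand-ins for a real dataset:

```python
import numpy as np

# Stand-in data: 50 "face images", each 32x32 pixels, flattened into rows
faces = np.random.rand(50, 32 * 32)

# Center the images around the mean face
mean_face = faces.mean(axis=0)
centered = faces - mean_face

# Covariance of the pixel features and its eigen-decomposition
covF = np.cov(centered, rowvar=False)        # (1024, 1024)
eigvals, eigvecs = np.linalg.eigh(covF)

# Keep the top 20 eigenvectors ("eigenfaces"), descending by eigenvalue
order = np.argsort(eigvals)[::-1][:20]
eigenfaces = eigvecs[:, order]               # (1024, 20)

# Each face is now compressed to 20 coefficients
weights = np.dot(centered, eigenfaces)       # (50, 20)
print(weights.shape)
```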

I have not tried it myself, but I guess it would be interesting.

Life will be easier if you simplify your problems: try to focus on the principal components of your life, the ones that really matter.

References:

[1] https://www.youtube.com/watch?v=f9mZ8dSrVJA

[2] http://blog.csdn.net/xizhibei/article/details/7536022