The covariance matrix is a mathematical concept that appears in several areas of machine learning. If you have a set of n numeric data items, where each data item has d dimensions, then the covariance matrix is a d-by-d symmetric matrix with variance values on the diagonal and covariance values off the diagonal.

Suppose you have a set of n=5 data items, representing 5 people, where each data item has a Height (X), test Score (Y), and Age (Z) (therefore d = 3):

      X        Y        Z
      Height   Score    Age
      64.0     580.0    29.0
      66.0     570.0    33.0
      68.0     590.0    37.0
      69.0     660.0    46.0
      73.0     600.0    55.0
mean  68.0     600.0    40.0    (n = 5)

The covariance matrix for this data set is:

        X        Y        Z
X    11.50    50.00    34.75
Y    50.00  1250.00   205.00
Z    34.75   205.00   110.00

The 11.50 is the variance of X, 1250.0 is the variance of Y, and 110.0 is the variance of Z. To compute a variance, in words: subtract the dimension mean from each value, square the differences, add them up, and divide by n-1. For example, for X:

Var(X) = [ (64-68.0)^2 + (66-68.0)^2 + (68-68.0)^2 + (69-68.0)^2 + (73-68.0)^2 ] / (5-1) = (16.0 + 4.0 + 0.0 + 1.0 + 25.0) / 4 = 46.0 / 4 = 11.50.
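The variance calculation above can be sketched in a few lines of Python (a minimal sketch using the article's Height values; variable names are my own):

```python
# Sample variance of the Height (X) column, computed by hand.
xs = [64.0, 66.0, 68.0, 69.0, 73.0]

mean_x = sum(xs) / len(xs)  # 68.0

# Sum of squared deviations from the mean, divided by n-1.
var_x = sum((x - mean_x) ** 2 for x in xs) / (len(xs) - 1)

print(var_x)  # 11.5
```

Note the n-1 divisor, which gives the sample variance; dividing by n instead would give the population variance.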

The covariance for XY is best shown by example:

Covar(XY) = [ (64-68.0)*(580-600.0) + (66-68.0)*(570-600.0) + (68-68.0)*(590-600.0) + (69-68.0)*(660-600.0) + (73-68.0)*(600-600.0) ] / (5-1)

= (80.0 + 60.0 + 0.0 + 60.0 + 0.0) / 4

= 200.0 / 4 = 50.0
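The same calculation in Python form (a sketch using the article's Height and Score values; the names are my own):

```python
# Sample covariance of Height (X) and Score (Y), computed by hand.
xs = [64.0, 66.0, 68.0, 69.0, 73.0]
ys = [580.0, 570.0, 590.0, 660.0, 600.0]

mean_x = sum(xs) / len(xs)  # 68.0
mean_y = sum(ys) / len(ys)  # 600.0

# Sum of products of paired deviations, divided by n-1.
cov_xy = sum((x - mean_x) * (y - mean_y)
             for x, y in zip(xs, ys)) / (len(xs) - 1)

print(cov_xy)  # 50.0
```

The structure mirrors the variance formula: variance is just the covariance of a dimension with itself.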

If you examine the calculations carefully, you’ll see the pattern to compute the covariance of the XZ and YZ columns. And you’ll see that Covar(XY) = Covar(YX).
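In practice you'd rarely compute the entries by hand. A sketch using NumPy's np.cov function on the demo data (rowvar=False tells np.cov that each column, not each row, is a variable; np.cov uses the n-1 divisor by default):

```python
import numpy as np

# Rows are the 5 people; columns are Height (X), Score (Y), Age (Z).
data = np.array([
    [64.0, 580.0, 29.0],
    [66.0, 570.0, 33.0],
    [68.0, 590.0, 37.0],
    [69.0, 660.0, 46.0],
    [73.0, 600.0, 55.0],
])

cov = np.cov(data, rowvar=False)
print(cov)
```

The result matches the matrix shown above, including the symmetry Covar(XY) = Covar(YX).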

One way to think about a covariance matrix is that it is a numerical summary of the variability of a dataset: the diagonal entries tell you how much each dimension varies on its own, and the off-diagonal entries tell you how strongly pairs of dimensions vary together.

*“Light Variance” – Wade Koniakowsky*