Correlation Matrices
Visualizing correlation matrices
Correlation is one of the most common and most useful statistics: a single number that describes the degree of linear relationship between two variables. The function corrcoef provided by numpy returns a matrix R of correlation coefficients calculated from an input matrix X whose rows are variables and whose columns are observations. Each element of R is the correlation between two variables X and Y, and it is computed as
corr(X,Y) = cov(X,Y) / (σX σY)
where cov(X,Y) is the covariance between X and Y, and σX and σY are their standard deviations. If N is the number of variables, then R is an N-by-N matrix. When the number of variables is large, we need a way to visualize R. The following snippet uses a pseudocolor plot to visualize R:
from numpy import corrcoef, sum, log, arange
from numpy.random import rand
from pylab import pcolor, show, colorbar, xticks, yticks
# generating some uncorrelated data
data = rand(10,100) # each row represents a variable
# creating correlation between the variables
# variable 2 is correlated with all the other variables
data[2,:] = sum(data,0)
# variable 4 is correlated with variable 8
data[4,:] = log(data[8,:])*0.5
# plotting the correlation matrix
R = corrcoef(data)
pcolor(R)
colorbar()
yticks(arange(0.5,10.5),range(0,10))
xticks(arange(0.5,10.5),range(0,10))
show()
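The definition of the correlation coefficient given above can be checked numerically. The following is a minimal sketch, not part of the original snippet: the helper name corr_from_cov, the seed and the array shape are assumptions chosen for illustration. It builds the matrix from the covariance and the standard deviations and compares the result with corrcoef.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
X = rng.random((3, 50))         # 3 variables (rows), 50 observations (columns)

def corr_from_cov(X):
    """Hypothetical helper: cov(X_i, X_j) / (sigma_i * sigma_j) for every pair."""
    C = np.cov(X)                  # N-by-N covariance matrix, rows are variables
    sigma = np.sqrt(np.diag(C))    # standard deviation of each variable
    return C / np.outer(sigma, sigma)

R_manual = corr_from_cov(X)
R_numpy = np.corrcoef(X)

# The two matrices should agree up to floating point error
print(np.allclose(R_manual, R_numpy))
```

Both matrices are symmetric with ones on the diagonal, since every variable is perfectly correlated with itself.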
#!/usr/bin/env python
"""
Draws Hinton diagrams using matplotlib ( http://matplotlib.sf.net/ ).
Hinton diagrams are a handy way of visualizing weight matrices, using
colour to denote sign and area to denote magnitude.
By David Warde-Farley -- user AT cs dot toronto dot edu (user = dwf)
with thanks to Geoffrey Hinton for providing the MATLAB code off of
which this is modeled.
Redistributable under the terms of the 3-clause BSD license
(see http://www.opensource.org/licenses/bsd-license.php for details)
"""
import numpy as np
import matplotlib.pyplot as plt
def _blob(x, y, area, colour):
    """
    Draws a square-shaped blob with the given area (< 1) at
    the given coordinates.
    """
    hs = np.sqrt(area) / 2  # half the side length of the square
    xcorners = np.array([x - hs, x + hs, x + hs, x - hs])
    ycorners = np.array([y - hs, y - hs, y + hs, y + hs])
    plt.fill(xcorners, ycorners, colour, edgecolor=colour)
def hinton(W, maxweight=None):
    """
    Draws a Hinton diagram for visualizing a weight matrix.
    Temporarily disables matplotlib interactive mode if it is on,
    otherwise this takes forever.
    """
    reenable = False
    if plt.isinteractive():
        plt.ioff()
        reenable = True  # remember to restore interactive mode afterwards
    plt.clf()
    height, width = W.shape
    if not maxweight:
        # default scale: smallest power of two >= the largest |weight|
        maxweight = 2**np.ceil(np.log(np.max(np.abs(W)))/np.log(2))
    # gray background rectangle covering the whole matrix
    plt.fill(np.array([0, width, width, 0]),
             np.array([0, 0, height, height]),
             'gray')
    plt.axis('off')
    plt.axis('equal')
    for x in range(width):
        for y in range(height):
            _x = x+1
            _y = y+1
            w = W[y, x]
            if w > 0:
                # positive weights are drawn as white squares
                _blob(_x - 0.5,
                      height - _y + 0.5,
                      min(1, w/maxweight),
                      'white')
            elif w < 0:
                # negative weights are drawn as black squares
                _blob(_x - 0.5,
                      height - _y + 0.5,
                      min(1, -w/maxweight),
                      'black')
    if reenable:
        plt.ion()
if __name__ == "__main__":
    hinton(np.random.randn(20, 20))
    plt.title('Example Hinton diagram - 20x20 random normal')
    plt.show()
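To make the scaling used in the Hinton script easier to follow, here is a small sketch of how a weight is turned into the area of its square. The helper names default_maxweight and blob_area and the example matrix are hypothetical and do not appear in the script. They only mirror its default choice of maxweight (the smallest power of two not below the largest absolute weight) and the cap of the area at 1, which corresponds to a full cell.

```python
import numpy as np

def default_maxweight(W):
    """Hypothetical helper: smallest power of two >= max |W|."""
    return 2 ** np.ceil(np.log2(np.max(np.abs(W))))

def blob_area(w, maxweight):
    """Hypothetical helper: area of the square for weight w, capped at 1."""
    return min(1.0, abs(w) / maxweight)

# Illustrative weights, not taken from the original script
W = np.array([[0.3, -0.75],
              [1.5,  0.0]])

mw = default_maxweight(W)  # the largest |weight| is 1.5, so the scale is 2.0
areas = [blob_area(w, mw) for w in W.ravel()]
print(mw, areas)
```

The entries of a correlation matrix lie between -1 and 1, so when drawing one with hinton it may be simpler to pass maxweight=1 explicitly rather than rely on the default.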