Correlation Matrices

Visualizing correlation matrices

Correlation is one of the most common and most useful statistics. A correlation coefficient is a single number that describes the degree of linear relationship between two variables. The function corrcoef provided by numpy returns a matrix R of correlation coefficients calculated from an input matrix X whose rows are variables and whose columns are observations. Each element of R is the correlation between two variables X and Y, computed as

$$R_{XY} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}$$

where cov(X,Y) is the covariance between X and Y, and σX and σY are their standard deviations. If N is the number of variables, then R is an N-by-N matrix. When the number of variables is large, we need a way to visualize R. The following snippet uses a pseudocolor plot to visualize R:

from numpy import corrcoef, sum, log, arange
from numpy.random import rand
from pylab import pcolor, show, colorbar, xticks, yticks

# generate some uncorrelated data
data = rand(10,100) # each row represents a variable

# create correlations between the variables
# variable 2 is correlated with all the other variables
data[2,:] = sum(data,0)
# variable 4 is correlated with variable 8
data[4,:] = log(data[8,:])*0.5

# plot the correlation matrix
R = corrcoef(data)
pcolor(R)
colorbar()
# place the tick labels at the center of each cell
yticks(arange(0.5,10.5),range(0,10))
xticks(arange(0.5,10.5),range(0,10))
show()
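As a quick check of the formula above, the sketch below recomputes one entry of R by hand from the covariance and the standard deviations and compares it with the value returned by corrcoef. The helper name manual_corr and the seed are illustrative choices, not part of the original snippet; the sample versions of cov and std (ddof=1) are assumed so that numerator and denominator are consistent.

```python
import numpy as np

rng = np.random.default_rng(0)          # arbitrary seed, only for repeatability
data = rng.random((10, 100))            # rows are variables, columns are observations
data[4, :] = np.log(data[8, :]) * 0.5   # variable 4 depends on variable 8

def manual_corr(x, y):
    """Correlation of two 1-D samples as cov(x, y) / (std(x) * std(y))."""
    cov_xy = np.cov(x, y)[0, 1]         # off-diagonal entry is the sample covariance
    return cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

R = np.corrcoef(data)
r_manual = manual_corr(data[4], data[8])

print(R.shape)                          # one row and one column per variable
print(np.isclose(R[4, 8], r_manual))    # hand-computed value matches corrcoef
print(np.allclose(R, R.T))              # R is symmetric
print(np.allclose(np.diag(R), 1.0))     # every variable correlates perfectly with itself
```

Because R is symmetric with ones on the diagonal, only the entries above (or below) the diagonal carry information when reading the plot.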

#!/usr/bin/env python
"""
Draws Hinton diagrams using matplotlib ( http://matplotlib.sf.net/ ).
Hinton diagrams are a handy way of visualizing weight matrices, using
colour to denote sign and area to denote magnitude.

By David Warde-Farley -- user AT cs dot toronto dot edu (user = dwf)
  with thanks to Geoffrey Hinton for providing the MATLAB code off of
  which this is modeled.

Redistributable under the terms of the 3-clause BSD license
(see http://www.opensource.org/licenses/bsd-license.php for details)
"""

import numpy as np
import matplotlib.pyplot as plt

def _blob(x, y, area, colour):
    """
    Draws a square-shaped blob with the given area (< 1) at
    the given coordinates.
    """
    hs = np.sqrt(area) / 2  # half the side length of the square
    xcorners = np.array([x - hs, x + hs, x + hs, x - hs])
    ycorners = np.array([y - hs, y - hs, y + hs, y + hs])
    plt.fill(xcorners, ycorners, colour, edgecolor=colour)

def hinton(W, maxweight=None):
    """
    Draws a Hinton diagram for visualizing a weight matrix.
    Temporarily disables matplotlib interactive mode if it is on,
    otherwise this takes forever.
    """
    reenable = False
    if plt.isinteractive():
        plt.ioff()
        reenable = True  # remember to restore interactive mode afterwards

    plt.clf()
    height, width = W.shape
    if not maxweight:
        # smallest power of two that is >= the largest absolute weight
        maxweight = 2**np.ceil(np.log(np.max(np.abs(W)))/np.log(2))

    # gray background covering the whole matrix
    plt.fill(np.array([0, width, width, 0]),
             np.array([0, 0, height, height]),
             'gray')

    plt.axis('off')
    plt.axis('equal')
    for x in range(width):
        for y in range(height):
            _x = x+1
            _y = y+1
            w = W[y, x]
            # white for positive weights, black for negative, nothing for zero
            if w > 0:
                _blob(_x - 0.5,
                      height - _y + 0.5,
                      min(1, w/maxweight),
                      'white')
            elif w < 0:
                _blob(_x - 0.5,
                      height - _y + 0.5,
                      min(1, -w/maxweight),
                      'black')
    if reenable:
        plt.ion()

if __name__ == "__main__":
    hinton(np.random.randn(20, 20))
    plt.title('Example Hinton diagram - 20x20 random normal')
    plt.show()
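The scaling that hinton applies to each weight can be examined without drawing anything. The sketch below is a minimal illustration, not part of the original script: hinton_scale is a hypothetical helper that reproduces the maxweight default (the smallest power of two at or above the largest absolute weight) and the capped blob areas, and it assumes np.log2 as an equivalent of the log ratio used above.

```python
import numpy as np

def hinton_scale(W, maxweight=None):
    """Return (maxweight, areas, signs) using the same scaling as hinton()."""
    W = np.asarray(W, dtype=float)
    if not maxweight:
        # Smallest power of two at or above the largest absolute weight.
        # An all-zero W is not handled here (the logarithm of 0 is -inf).
        maxweight = 2 ** np.ceil(np.log2(np.max(np.abs(W))))
    areas = np.minimum(1.0, np.abs(W) / maxweight)  # blob area, capped at 1
    signs = np.sign(W)                              # +1 white, -1 black, 0 no blob
    return maxweight, areas, signs

W = np.array([[0.5, -3.0],
              [1.0,  0.0]])
maxweight, areas, signs = hinton_scale(W)

print(maxweight)            # → 4.0
print(areas)                # fraction of each unit cell covered by its square
print(np.sqrt(areas) / 2)   # half side length of each square, as computed in _blob
```

For a correlation matrix the largest absolute entry is 1 (the diagonal), so the default maxweight is 1 and each square's area equals the absolute value of the corresponding correlation.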