An Example

Here is what it looks like to run Damon in an interactive Python interpreter. To write reusable programs, you type the same commands into a Python script (a text file with a .py extension). In this interactive session, we:

    1. Import Damon.
    2. Create an artificial dataset, adding noise (error) and making 10% of cells missing. This step returns both the "true" model values and the "observed" noisy values, called "data", each as a "Damon object" to which Damon "methods" can be applied in sequence. Damon sees only the data, but we want it to predict the model values.
    3. Calculate row and column coordinates.
    4. Compute cell estimates.
    5. Compare the cell estimates with the model values. Do they match? Have we predicted the missing cells?

Running Damon is not hard, but it does require some comfort with coding in Python and a way to look up how Damon's methods work. If you're willing to climb that learning curve, Damon is as powerful and customizable as you want it to be.
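
One convenient way to get that information is Python's built-in help() and dir(), which display the docstrings that ship with Damon. For example (create_data() and coord() appear in the session below; the class name Damon is an assumption about how damon1.core exposes its methods):

# Browse Damon's documentation from the interpreter
>>> import damon1.core as dmn
>>> help(dmn.create_data)     # full docstring for the artificial-data builder
>>> dir(dmn)                  # everything the core module exposes
>>> help(dmn.Damon.coord)     # docstring for the coord() method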

# Import Numpy and Damon, plus some tools
>>> import numpy as np
>>> import damon1.core as dmn
>>> import damon1.tools as dmnt

# Create a 100 x 80 (rows x cols) artificial dataset.  Make it 3-dimensional.  Add noise.  Make 10% of cells missing.
>>> created = dmn.create_data(100, 80, ndim=3, noise=5, p_nan=0.10)
create_data() is working...
Number of Rows= 100
Number of Columns= 80
Number of Dimensions= 3
Data Min= -9.097
Data Max= 8.604
Proportion made missing= 0.098
Not-a-Number Value (nanval)= -999.0
create_data() is done.
Contains:
['fac0coord', 'model', 'fac1coord', 'data', 'anskey'] 

# "d" is a Damon object containing "data", with noise and missing values.
>>> d = created['data']

# "m" is a Damon object containing "model" values, no noise or missing values.
>>> m = created['model']

# Here's a snippet of what the data looks like. -999 means missing.
>>> print d.coredata
[[  -1.45   -0.43   -3.52 ...,    1.96    0.66 -999.  ]
 [  -2.3     0.97    2.91 ...,   -3.91    2.88   -2.49]
 [-999.     -1.1  -999.   ...,    0.24    0.17   -1.16]
 ..., 
 [   0.7     2.7     1.83 ...,    4.01    5.92    0.54]
 [  -0.17   -1.6     2.51 ...,   -0.36   -4.02   -0.37]
 [  -2.34   -1.29   -1.16 ..., -999.     -2.67 -999.  ]]
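
# Quick sanity check: the -999 cells should make up roughly 10% of the data,
# matching the "Proportion made missing" reported by create_data() above.
>>> print(np.mean(d.coredata == -999))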

# Run the coord() method to compute row and column coordinates, letting Damon pick the best dimensionality from 1 through 6. The coordinates are not displayed but stored "inside" the Damon object.
>>> d.coord(ndim=[range(1, 7)])
coord() is working...
Getting best dimensionality...
1..2..3..4..5..6..
Dim Acc Stab Obj Speed Err 
1 0.439 0.889 0.625 0.432 2.383 
2 0.638 0.936 0.773 0.87 2.044 
3 0.794 0.955 0.871 0.996 1.617 
4 0.791 0.821 0.806 0.88 1.628 
5 0.773 0.698 0.735 0.813 1.692 
6 0.719 0.617 0.666 0.796 1.882 

Best Dimensionality =  3 

Seed Acc Stab Obj Speed Err 
1 0.794 0.955 0.87 0.985 1.617 
2 0.794 0.955 0.87 0.988 1.617 
3 0.794 0.955 0.87 0.996 1.617 
Best coordinate seed is 3 , out of 3 attempts.
Warning in coord()/seed(): Unable to find starting coordinates that meet your 'seed' requirements.  It is possible the dataset cannot yield the desired objectivity.
Dim Fac Iter Change jolt_
3 0 0 1.0001 
3 1 0 1.0001 
3 0 1 0.05837 
3 1 1 0.05837 
3 0 2 0.03156 
3 1 2 0.03156 
3 0 3 0.00237 
3 1 3 0.00237 
3 0 4 0.00019 
3 1 4 0.00019 
coord() is done -- see my_obj.coord_out
Contains:
['ndim', 'fac1coord', 'anchors', 'changelog', 'facs_per_ent', 'fac0coord'] 
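
# The new coordinates are stored in d.coord_out under 'fac0coord' (row
# coordinates) and 'fac1coord' (column coordinates).  Assuming each is a
# datadict with a 'coredata' array, like base_est_out below, they can be
# inspected directly -- expect shapes of (100, 3) and (80, 3), one
# 3-dimensional vector per row and per column entity.
>>> row_coords = d.coord_out['fac0coord']['coredata']
>>> col_coords = d.coord_out['fac1coord']['coredata']
>>> print(row_coords.shape)
>>> print(col_coords.shape)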

# Using the coordinates, calculate an estimate for each cell, including missing.
>>> d.base_est()
base_est() is working...
base_est() is done -- see my_obj.base_est_out
Contains:
['nheaders4cols', 'key4rows', 'nheaders4rows', 'rowlabels', 'validchars', 'rowkeytype', 'coredata', 'colkeytype', 'nanval', 'collabels', 'key4cols', 'ecutmaxpos'] 

# Here are the estimates.  Missing cells are filled in.
>>> estimates = d.base_est_out['coredata']
>>> print estimates
[[-1.22  1.59 -0.69 ...,  0.74  2.23  3.55]
 [-2.06  0.67  2.29 ..., -1.2   0.93 -2.18]
 [ 0.47 -1.    0.22 ..., -0.94 -1.66 -2.26]
 ..., 
 [-0.63  1.9   3.78 ...,  3.17  4.74  0.32]
 [-0.37 -0.98  0.64 ..., -1.97 -1.96 -3.24]
 [-3.9   0.76 -0.92 ..., -4.41 -1.05 -0.15]]
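
# Where do these estimates come from?  Damon builds them from the row and
# column coordinates above.  If base_est() amounts to the dot product of the
# two coordinate arrays (an assumption -- base_est() may apply further
# refinements), this hand-rolled version should track the estimates closely.
>>> by_hand = np.dot(row_coords, col_coords.T)   # (100, 3) dot (3, 80) -> (100, 80)
>>> print(dmnt.correl(by_hand, estimates))       # expect a value near 1.0 if so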

# Here are the original "true" values, i.e., the model values without noise or missing values.
>>> true = m.coredata
>>> print true
[[-0.55  1.22 -1.42 ...,  1.11  2.36  3.42]
 [-2.3   0.57  2.16 ..., -1.56  1.03 -1.34]
 [-0.15 -0.65  0.93 ..., -1.11 -1.38 -2.06]
 ..., 
 [-0.7   1.85  3.78 ...,  2.81  5.07 -0.21]
 [-0.47 -0.7   1.11 ..., -1.56 -1.57 -2.37]
 [-4.19  0.36 -0.76 ..., -4.7  -0.87  0.64]]

# How well do the estimates match the true values?  A 0.976 correlation.
>>> est_v_true = dmnt.correl(estimates, true)
>>> print est_v_true
0.97596437021

# How well did Damon predict the true values of the missing cells?  The correlation is almost as high!
>>> missing = d.coredata == -999
>>> est_v_true_missing = dmnt.correl(estimates[missing], true[missing])
>>> print est_v_true_missing
0.975683152898
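
# Cross-check: assuming dmnt.correl() is an ordinary Pearson correlation,
# plain NumPy returns the same ~0.976 value on the missing cells.
>>> print(np.corrcoef(estimates[missing], true[missing])[0, 1])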

**************************************************
# The above looks like a lot of code, but most of it is printout.
# Here is the same calculation with the printouts turned off (controlled by verbose=None):
>>> created = dmn.create_data(100, 80, ndim=3, noise=5, p_nan=0.10, verbose=None)
>>> d = created['data']
>>> d.coord(ndim=[range(1, 7)])
>>> d.base_est()
>>> print d.base_est_out['coredata']
[[-1.22  1.59 -0.69 ...,  0.74  2.23  3.55]
 [-2.06  0.67  2.29 ..., -1.2   0.93 -2.18]
 [ 0.47 -1.    0.22 ..., -0.94 -1.66 -2.26]
 ..., 
 [-0.63  1.9   3.78 ...,  3.17  4.74  0.32]
 [-0.37 -0.98  0.64 ..., -1.97 -1.96 -3.24]
 [-3.9   0.76 -0.92 ..., -4.41 -1.05 -0.15]]

# In just 5 lines, we created, analyzed, and got cell predictions for a 3-D dataset.
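
As mentioned at the top, the same commands become a reusable program once they are saved in a Python script. Here is a minimal sketch (the file name is arbitrary, and verbose=None just suppresses the printouts):

# predict_cells.py -- script version of the session above
import damon1.core as dmn
import damon1.tools as dmnt

created = dmn.create_data(100, 80, ndim=3, noise=5, p_nan=0.10, verbose=None)
d, m = created['data'], created['model']

d.coord(ndim=[range(1, 7)])    # find the best dimensionality from 1 through 6
d.base_est()                   # estimate every cell, including the missing ones

estimates = d.base_est_out['coredata']
print(dmnt.correl(estimates, m.coredata))   # how well do the estimates recover the model?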