Here is what it looks like to run Damon in an interactive Python interpreter. To write reusable programs, you type the same commands into a Python script (a text file with a .py extension). Running Damon is not hard, but it does require some comfort with coding in Python and a willingness to look up how Damon's methods work. If you are willing to climb that learning curve, Damon is as powerful and customizable as you want it to be. In this interactive session, we:

- create an artificial 3-dimensional dataset with noise and missing cells
- run coord() to find the best dimensionality and compute coordinates
- run base_est() to estimate every cell, including the missing ones
- compare the estimates against the true model values
# Import Numpy and Damon, plus some tools
>>> import numpy as np
>>> import damon1.core as dmn
>>> import damon1.tools as dmnt
# Create a 100 x 80 (rows x cols) artificial dataset. Make it 3-dimensional. Add noise. Make 10% of cells missing.
>>> created = dmn.create_data(100, 80, ndim=3, noise=5, p_nan=0.10)
create_data() is working...
Number of Rows= 100
Number of Columns= 80
Number of Dimensions= 3
Data Min= -9.097
Data Max= 8.604
Proportion made missing= 0.098
Not-a-Number Value (nanval)= -999.0
create_data() is done.
Contains:
['fac0coord', 'model', 'fac1coord', 'data', 'anskey']
# "d" is a Damon object containing "data", with noise and missing values.
>>> d = created['data']
# "m" is a Damon object containing "model" values, no noise or missing values.
>>> m = created['model']
# Here's a snippet of what the data looks like. -999 means missing.
>>> print d.coredata
[[ -1.45 -0.43 -3.52 ..., 1.96 0.66 -999. ]
[ -2.3 0.97 2.91 ..., -3.91 2.88 -2.49]
[-999. -1.1 -999. ..., 0.24 0.17 -1.16]
...,
[ 0.7 2.7 1.83 ..., 4.01 5.92 0.54]
[ -0.17 -1.6 2.51 ..., -0.36 -4.02 -0.37]
[ -2.34 -1.29 -1.16 ..., -999. -2.67 -999. ]]
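To see what kind of dataset create_data() is building, here is a minimal plain-NumPy sketch of the same idea: multiply random row and column coordinates to get a low-dimensional "model" matrix, add noise, and code about 10% of cells as missing with the -999 nanval. This is an illustration of the principle, not Damon's create_data() implementation; all names here are made up for the example.

```python
import numpy as np

# Illustration only: a rank-3 "model" built from row and column coordinates,
# the way Damon's artificial datasets are structured.
rng = np.random.default_rng(0)
nrows, ncols, ndim = 100, 80, 3
nanval = -999.0

row_coords = rng.normal(size=(nrows, ndim))
col_coords = rng.normal(size=(ncols, ndim))
model = row_coords @ col_coords.T                       # noise-free "true" values
data = model + rng.normal(scale=0.5, size=model.shape)  # observed = model + noise

# Make roughly 10% of cells missing by overwriting them with the nanval code.
mask = rng.random(model.shape) < 0.10
data[mask] = nanval

print(data.shape)             # (100, 80)
print(round(mask.mean(), 2))  # proportion made missing, near 0.10
```

Because the data are generated from coordinates, a method that recovers those coordinates can reconstruct the missing cells, which is exactly what the rest of this session demonstrates.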
# Run the coord() method to find the best dimensionality between 1 and 6 and compute coordinates. The coordinates are not displayed; they are stored "inside" the Damon object.
>>> d.coord(ndim=[range(1, 7)])
coord() is working...
Getting best dimensionality...
1..2..3..4..5..6..
Dim Acc Stab Obj Speed Err
1 0.439 0.889 0.625 0.432 2.383
2 0.638 0.936 0.773 0.87 2.044
3 0.794 0.955 0.871 0.996 1.617
4 0.791 0.821 0.806 0.88 1.628
5 0.773 0.698 0.735 0.813 1.692
6 0.719 0.617 0.666 0.796 1.882
Best Dimensionality = 3
Seed Acc Stab Obj Speed Err
1 0.794 0.955 0.87 0.985 1.617
2 0.794 0.955 0.87 0.988 1.617
3 0.794 0.955 0.87 0.996 1.617
Best coordinate seed is 3 , out of 3 attempts.
Warning in coord()/seed(): Unable to find starting coordinates that meet your 'seed' requirements. It is possible the dataset cannot yield the desired objectivity.
Dim Fac Iter Change jolt_
3 0 0 1.0001
3 1 0 1.0001
3 0 1 0.05837
3 1 1 0.05837
3 0 2 0.03156
3 1 2 0.03156
3 0 3 0.00237
3 1 3 0.00237
3 0 4 0.00019
3 1 4 0.00019
coord() is done -- see my_obj.coord_out
Contains:
['ndim', 'fac1coord', 'anchors', 'changelog', 'facs_per_ent', 'fac0coord']
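The fac0coord and fac1coord outputs are the row and column coordinates. The core idea behind computing them can be sketched in a few lines of NumPy as alternating least squares: holding the column coordinates fixed, solve a least-squares fit for each row using only its observed cells, then do the same for the columns, and repeat. This is a sketch of the underlying principle under simplified assumptions (noiseless data, no seeding or dimensionality search), not Damon's coord() algorithm.

```python
import numpy as np

# Build a small noiseless rank-3 dataset with ~10% of cells missing.
rng = np.random.default_rng(1)
nanval = -999.0
ndim = 3
R_true = rng.normal(size=(30, ndim))
C_true = rng.normal(size=(20, ndim))
truth = R_true @ C_true.T
obs = rng.random(truth.shape) > 0.10        # True where a cell is observed
data = np.where(obs, truth, nanval)

# Alternating least squares: start from random coordinates and refine.
R = rng.normal(size=(30, ndim))
C = rng.normal(size=(20, ndim))
for _ in range(50):
    for i in range(data.shape[0]):          # fit each row's coordinates
        keep = obs[i]
        R[i] = np.linalg.lstsq(C[keep], data[i, keep], rcond=None)[0]
    for j in range(data.shape[1]):          # fit each column's coordinates
        keep = obs[:, j]
        C[j] = np.linalg.lstsq(R[keep], data[keep, j], rcond=None)[0]

est = R @ C.T                               # estimates for every cell
missing = ~obs
print(round(np.corrcoef(est[missing], truth[missing])[0, 1], 3))
```

Note that only observed cells enter the fits, yet `est` covers the full matrix, which is why the next step can fill in missing values.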
# Using the coordinates, calculate an estimate for each cell, including missing.
>>> d.base_est()
base_est() is working...
base_est() is done -- see my_obj.base_est_out
Contains:
['nheaders4cols', 'key4rows', 'nheaders4rows', 'rowlabels', 'validchars', 'rowkeytype', 'coredata', 'colkeytype', 'nanval', 'collabels', 'key4cols', 'ecutmaxpos']
# Here are the estimates. Missing cells are filled in.
>>> estimates = d.base_est_out['coredata']
>>> print estimates
[[-1.22 1.59 -0.69 ..., 0.74 2.23 3.55]
[-2.06 0.67 2.29 ..., -1.2 0.93 -2.18]
[ 0.47 -1. 0.22 ..., -0.94 -1.66 -2.26]
...,
[-0.63 1.9 3.78 ..., 3.17 4.74 0.32]
[-0.37 -0.98 0.64 ..., -1.97 -1.96 -3.24]
[-3.9 0.76 -0.92 ..., -4.41 -1.05 -0.15]]
# Here are the original "true" values, i.e., the model values without noise or missing values.
>>> true = m.coredata
>>> print true
[[-0.55 1.22 -1.42 ..., 1.11 2.36 3.42]
[-2.3 0.57 2.16 ..., -1.56 1.03 -1.34]
[-0.15 -0.65 0.93 ..., -1.11 -1.38 -2.06]
...,
[-0.7 1.85 3.78 ..., 2.81 5.07 -0.21]
[-0.47 -0.7 1.11 ..., -1.56 -1.57 -2.37]
[-4.19 0.36 -0.76 ..., -4.7 -0.87 0.64]]
# How well do the estimates match the true values? A 0.976 correlation.
>>> est_v_true = dmnt.correl(estimates, true)
>>> print est_v_true
0.97596437021
# How well did Damon predict the true values of the cells that were missing? The correlation is almost as high!
>>> missing = d.coredata == -999
>>> est_v_true_missing = dmnt.correl(estimates[missing], true[missing])
>>> print est_v_true_missing
0.975683152898
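The masking trick above is ordinary NumPy boolean indexing: comparing the data array to the nanval yields a boolean mask, and indexing with it pulls out just those cells from both arrays before correlating. Here is a tiny self-contained sketch of that pattern using `np.corrcoef` in place of Damon's correl() tool; the arrays are made up for the example.

```python
import numpy as np

nanval = -999.0
data = np.array([[1.0, nanval, 3.0],
                 [nanval, 5.0, nanval]])
estimates = np.array([[1.1, 2.2, 2.9],
                      [3.8, 5.1, 6.2]])
true = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

missing = data == nanval                    # boolean mask of missing cells
r = np.corrcoef(estimates[missing], true[missing])[0, 1]
print(round(r, 3))
```

Indexing a 2-D array with a 2-D boolean mask flattens the selected cells into a 1-D vector, so the two vectors line up cell-for-cell for the correlation.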
**************************************************
# The above looks like a lot of code, but most of those lines are printouts, not commands.
# Here is the same analysis with the reporting turned off (verbose=None):
>>> created = dmn.create_data(100, 80, ndim=3, noise=5, p_nan=0.10, verbose=None)
>>> d = created['data']
>>> d.coord(ndim=[range(1, 7)])
>>> d.base_est()
>>> print d.base_est_out['coredata']
[[-1.22 1.59 -0.69 ..., 0.74 2.23 3.55]
[-2.06 0.67 2.29 ..., -1.2 0.93 -2.18]
[ 0.47 -1. 0.22 ..., -0.94 -1.66 -2.26]
...,
[-0.63 1.9 3.78 ..., 3.17 4.74 0.32]
[-0.37 -0.98 0.64 ..., -1.97 -1.96 -3.24]
[-3.9 0.76 -0.92 ..., -4.41 -1.05 -0.15]]
# In just 5 lines, we created, analyzed, and got cell predictions for a 3-D dataset.