Load Text File

Often, your data will be in the form of a comma-delimited or tab-delimited file with a .csv or .txt extension, such as you might get using Excel's Save As... command.  We have already discussed how to convert a NumPy array into a Damon object using Damon().  The same procedure is used to load text files.  But first, let's create a text file using core.create_data().

Create a New Script
In IDLE, open the damon1.templates.blank module.  Save it as my_script3.py in a directory of your choice.

Create Text File
In IDLE, type >>>  help(core.create_data).  Paste the core.create_data() function into my_script3.py.  Add the dmn prefix and assign the output to a variable named d.

Enter values for nfac0, nfac1, ndim, and any other parameters you like.  However, it is the output_as, outfile, and delimiter parameters that specify an output text file.  Set output_as = 'textfile', outfile = 'scrap0.csv', and delimiter = ',' .  This will create two comma-delimited text files:  a_data_scrap0.csv and a_model_scrap0.csv.  The prefix "a_" and a descriptor ("data" or "model") are automatically added to the specified outfile name.

Create Text File

import os
import sys

import numpy as np
import numpy.random as npr
import numpy.linalg as npla
import numpy.ma as npma

try:
    import matplotlib.pyplot as plt
except ImportError:
    pass

import damon1 as damon1
import damon1.core as dmn
import damon1.tools as dmnt


# Start programming here...

d = dmn.create_data(nfac0 = 10,  # [Number of facet 0 elements -- rows/persons]
                    nfac1 = 8,  # [Number of facet 1 elements -- columns/items]
                    ndim = 3,   # [Number of dimensions to create]
                    seed = 2,  # [<None => randomly pick starter coordinates; int => integer of "seed" random coordinates>]
                    facmetric = [4,-2],  # [[m,b] => rand() * m + b, to set range of facet coordinate values]
                    noise = 2, # [<None, noise, {'Rows':<noise,{1:noise1,4:noise4}>,'Cols':<noise,{2:noise2,5:noise5}> => add error to rows/cols]
                    validchars = None,   # [<None, ['All',[valid chars]]; or ['Cols', {1:['a','b','c','d'],2:['All'],3:['1.2 -- 3.5'],4:['0 -- '],...}]> ]
                    mean_sd = None, # [<None, ['All',[Mean,SD]], or ['Cols', {1:[Mean1,SD1],2:[Mean2,SD2],3:'Refer2VC',...}]> ]
                    p_nan = 0.10,  # [Proportion of cells to make missing at random]
                    nanval = -999.,  # [Numeric code for designating missing values]
                    condcoord_ = None,  # [< None, 'Orthonormal'>]
                    nheaders4rows = 1,  # [Number of header column labels to put before each row]
                    nheaders4cols = 1,  # [Number of header row labels to put before each column]
                    extra_headers = 0,  # [If headers > 1, range of integer values for header labels, applies to both row and col]
                    input_array = None,   # [<None, name of data array, {'fac0coord':EntxDim row coords,'fac1coord':EntxDim col coords}>]
                    output_as = 'textfile',  # [<'Damon','datadict','array','textfile','Damon_textfile','datadict_textfile','array_textfile','hd5'>]
                    outfile = 'scrap0.csv',    # [<None, name of the output file/path prefix when output_as includes 'textfile'>]
                    delimiter = ',',    # [<None, delimiter character used to separate fields of output file, e.g., ',' or '   '>]
                    bankf0 = None,  # [<None => no bank,[<'All', list of F0 (Row) entities>]> ]
                    bankf1 = None,  # [<None => no bank,[<'All', list of F1 (Col) entities>]> ]
                    verbose = True, # [<None, True> => print useful information and messages]
                    )







IDLE

>>> 
create_data() is working...

Number of Rows= 10
Number of Columns= 8
Number of Dimensions= 3
Data Min= -3.958
Data Max= 6.79
Proportion made missing= 0.112
Not-a-Number Value (nanval)= -999.0
a_data_scrap0.csv has been saved.
a_model_scrap0.csv has been saved.
Returning only specified files, no arrays.


create_data() is done.
Contains:
['fac0coord', 'model', 'fac1coord', 'data', 'anskey'] 

>>> 


The IDLE output informs us that we built two text files consisting of 10 rows (nfac0 = 10) and 8 columns (nfac1 = 8) of numerical data (not including labels) with a dimensionality of 3 (ndim = 3).  The root of the file name is as specified by the outfile parameter, but Damon has added prefix information to show that one of the files is "model" (mathematically pure) and the other is "data" (with noise and missing cells added).  Because outfile does not contain any path information, Damon automatically drops the files into the "current working directory" -- the directory where you saved your Damon script my_script3.py.  You could also have specified a full path, e.g., outfile = "/Home/Me/scrap0.csv".  Check out the data files by opening them in a text editor.
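If you are ever unsure where the files landed, you can print the current working directory and preview the first few lines of the data file from within my_script3.py.  This is just a convenience sketch; it assumes the script is run from the directory where create_data() wrote the files.

# Confirm where Damon wrote the files and preview the data file
# (os was already imported at the top of the template)
print 'Current working directory:', os.getcwd()

f = open('a_data_scrap0.csv', 'r')
for i, line in enumerate(f):
    if i > 3:
        break
    print line.rstrip()
f.close()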

Read the Data File as a Damon Object
Read in the data file using Damon().  In IDLE, type >>>  help(dmn.Damon.__init__).  The Damon() documentation will pop up.  At the end, under Paste Class, copy and paste the Damon() arguments into my_script3.py, add the dmn prefix, and assign the result to d_obj:  d_obj = dmn.Damon(data, format_, ...)

For the data parameter, type in the name of the data file, 'a_data_scrap0.csv'.  (You can also specify a path name if the file is not in the same directory as my_script3.py.)  For the format_ parameter, type 'textfile'.  For the nheaders4rows and nheaders4cols parameters, type 1 and 1 -- the number of header columns/rows holding the labels.  Because the data comes from a text file, you need to tell Damon to interpret the core data as numbers.  Set validchars = ['All',['All'],'Num'].  That will do the trick.  (Make sure to read the validchars documentation in dmn.Damon.__init__ when you have a chance.  It is quite useful.)

The defaults for the rest of the parameters should work fine.  In the example, I stripped out the inline documentation for ease of reading. 

Important Note Regarding Path Names
Unlike Unix-based machines (including OS X), Windows machines use backward slashes in their pathnames ("\Home\Documents\a_data_scrap0.csv").  This can confuse Python, which reserves the backslash for other uses.  There are two remedies if you are using a Windows machine:
  • Use Unix-style forward slashes ("/Home/Documents/a_data_scrap0.csv").  Python will interpret them correctly, even on a Windows machine.
  • Use Windows-style backward slashes but preface the pathname with the letter r, meaning "raw string" (r"\Home\Documents\a_data_scrap0.csv").  This tells Python not to treat the backslash in a special way.
Getting the actual pathnames can be a bit of a chore.  On my Mac, I used Automator to write a little program to get the pathname of a file, or you can use the Get Info option.  Windows has its own tricks.  A short sketch of both remedies follows.
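Here is a minimal sketch of the two remedies; the pathname is just the illustrative one from above, so substitute your own.

# Two equivalent ways to write the same pathname on Windows
fwd_path = "/Home/Documents/a_data_scrap0.csv"     # forward slashes work even on Windows
raw_path = r"\Home\Documents\a_data_scrap0.csv"    # raw string leaves the backslashes alone

# Either form can then be handed to Damon, e.g.:
# d_obj = dmn.Damon(data = fwd_path, format_ = 'textfile', ...)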

One other note:  Python does not accept purely numerical file names.

Here is the script and the IDLE output.

Read In Text File

# Load the text file and format as Damon object
d_obj = dmn.Damon(data = 'a_data_scrap0.csv',
                  format_ = 'textfile',
                  validchars = ['All',['All'],'Num'],
                  nheaders4rows = 1,
                  nheaders4cols = 1
                  )

# Here is what the data looks like
np.set_printoptions(precision=2,suppress=True)
print 'd_obj.collabels =\n',d_obj.collabels
print 'd_obj.rowlabels =\n',d_obj.rowlabels
print 'd_obj.coredata =\n',d_obj.coredata


IDLE

Building Damon object...

Rows in coredata: 10
Columns in coredata: 8 

Damon object has been built.
Contains:
['missingchars', 'core_row', 'verbose', 'dtype', 'rl_col', 'validchars', 'whole_row', 'coredata', 'recode', 'path', 'colkeytype', 'collabels', 'data_out', 'fileh', 'key4rows', 'nheaders4rows', 'cols2left', 'workformat', 'miss4headers', 'cl_row', 'rowkeytype', 'rl_row', 'selectrange', 'nanval', 'format_', 'nheaders4cols', 'core_col', 'pytables', 'rowlabels', 'check_dups', 'cl_col', 'delimiter', 'whole_col', 'whole', 'key4cols'] 

d_obj.collabels =
[[id 1 2 3 4 5 6 7 8]]
d_obj.rowlabels =
[[id]
 [1]
 [2]
 [3]
 [4]
 [5]
 [6]
 [7]
 [8]
 [9]
 [10]]
d_obj.coredata =
[[  -1.94 -999.      2.33    3.48    1.5     3.26    0.08    0.23]
 [  -0.15   -2.06    2.22    0.83   -1.1     0.38    0.31    1.38]
 [   1.54   -0.6     3.33 -999.     -1.69 -999.      1.54 -999.  ]
 [  -0.62    1.1     0.06   -1.48   -1.45   -1.53    0.57   -1.59]
 [   1.09   -2.04    4.39    0.24   -1.25   -0.76    1.81    0.99]
 [   2.36    2.23   -2.52   -3.26   -0.44   -2.46   -1.65    1.11]
 [  -1.07   -1.63   -0.06 -999.      2.07    4.43   -0.99 -999.  ]
 [   0.26 -999.      6.79    2.16   -1.35   -0.26    3.7     0.28]
 [   0.53   -1.17    3.45 -999.     -1.07   -1.67    2.27   -0.09]
 [  -1.23   -2.99    5.36    3.06   -0.63 -999.      2.99   -0.57]]
>>> 




Alternative Methods
Text files in the real world come in a confusing variety, with slightly different conventions, and Damon may not be able to read all of them.  If you run into trouble, try one of Python's or NumPy's file readers, save the result as an array, and read it into Damon specifying format_ = 'array'.  Start with NumPy's genfromtxt() function.
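Here is a minimal sketch of that workaround, assuming the a_data_scrap0.csv file created above and that Damon's 'array' format accepts the labeled string array with the same nheaders4rows/nheaders4cols settings used earlier.  (Depending on your NumPy version, you may need to pass an explicit string dtype, such as dtype = 'S60', to genfromtxt.)

# Read the raw text file as a string array with NumPy,
# then hand the array to Damon instead of the file name.
# (np and dmn were already imported at the top of the template)
arr = np.genfromtxt('a_data_scrap0.csv', delimiter = ',', dtype = str)

d_alt = dmn.Damon(data = arr,
                  format_ = 'array',
                  validchars = ['All',['All'],'Num'],
                  nheaders4rows = 1,
                  nheaders4cols = 1
                  )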

And that is how to read in a text file.