How do you manage your data?

How do you manage your data?  How do other people manage their data?  We all have idiosyncratic styles, which result partially from individual preferences and partially from the nature of the work that we do.

Have you ever been curious about how other people handle their data?  Are you looking for ways to get more organized?  Then check out our examples of data management structures below.

Have you found data management ideas that work, and want to share them?  Then create a diagram describing your data management structure and send it to us at sedimentexp@gmail.com, along with a brief caption explaining what it shows.  It could be as simple as showing how the folders on your computer are organized.  Feel free to present things in whatever way works best for you (screenshots, diagrams, drawings)!  Including a photo of your experiment would help too!  The examples below can help you get started.
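If you would rather not draw the folder diagram by hand, a short script can print a text outline of your folders that you can paste into an email. The Python sketch below is one possible way to do it. The function name `folder_tree` and the two-level default depth are our own illustrative choices. It lists folders only, not individual files, so the outline stays short even when folders hold thousands of data files.

```python
from pathlib import Path


def folder_tree(root: Path, max_depth: int = 2) -> str:
    """Return an indented text outline of the folders under `root`.

    Only directories are listed (files are skipped), down to `max_depth`
    levels below `root`. `folder_tree` is a hypothetical helper name.
    """
    lines = [root.resolve().name + "/"]

    def walk(folder: Path, depth: int) -> None:
        if depth > max_depth:
            return
        # Sort so the outline comes out in the same order on every run.
        for child in sorted(p for p in folder.iterdir() if p.is_dir()):
            lines.append("    " * depth + child.name + "/")
            walk(child, depth + 1)

    walk(root, 1)
    return "\n".join(lines)


if __name__ == "__main__":
    # Print the outline of the current working directory.
    print(folder_tree(Path(".")))
```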




Description: I am currently running experiments to understand the initiation of sediment motion in bed load transport.  These experiments depend on high-speed movies that take up huge amounts of hard disk space (~100 GB per experiment).  I therefore needed a way to clearly separate the "raw" movie files from the "processed" data (e.g., trajectories of individual particles), which are much smaller and more manageable for everyday use.  The processed data are still "objective" in the sense that no analysis or interpretation has been performed on them.  However, they are affected by decisions I have made (e.g., how to set the color threshold for particle extraction), which is why I keep them clearly separate from the raw data.  Once created, the processed data remain relatively unchanged.  The "Analysis" folder is where the action happens.  Because I am continually working on the project, this folder is quite messy, containing lots of half-written and poorly commented scripts that will hopefully get cleaned up eventually.  That's why I believe it is good to have a clear separation between the "Raw," "Processed," and "Analysis" folders shown above.
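As an illustration of this three-way split, here is a minimal Python sketch. It assumes a project folder containing `Raw`, `Processed`, and `Analysis` subfolders, with raw movies stored as `Raw/<experiment>/<movie file>`. The helper names, the per-movie output folder, and the `processing_params.json` file are hypothetical choices made for the example, not a description of the actual setup. Saving parameters such as the color threshold next to the processed data is one way to keep a record of the decisions that shape those data.

```python
import json
from pathlib import Path

# Hypothetical stage names mirroring the Raw / Processed / Analysis split.
STAGES = ("Raw", "Processed", "Analysis")


def make_layout(project: Path) -> dict[str, Path]:
    """Create the three stage folders if they are missing; return their paths."""
    paths = {stage: project / stage for stage in STAGES}
    for path in paths.values():
        path.mkdir(parents=True, exist_ok=True)
    return paths


def processed_dir_for(raw_file: Path, project: Path) -> Path:
    """Map Raw/<experiment>/<movie>.<ext> to Processed/<experiment>/<movie>/.

    Mirroring the raw tree keeps each processed output traceable to the movie
    it came from, and places it under Processed rather than Raw.
    Raises ValueError if raw_file is not inside <project>/Raw.
    """
    relative = raw_file.relative_to(project / "Raw")
    return project / "Processed" / relative.parent / relative.stem


def save_params(out_dir: Path, params: dict) -> Path:
    """Write processing choices (e.g. a color threshold) as JSON next to the
    processed data they produced. The file name is a hypothetical choice."""
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / "processing_params.json"
    target.write_text(json.dumps(params, indent=2, sort_keys=True))
    return target
```

For example, `processed_dir_for(Path("proj/Raw/exp01/movie.avi"), Path("proj"))` returns `Path("proj/Processed/exp01/movie")`.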
--Raleigh Martin, University of Pennsylvania





I like to organize my folders at the top level by data type (force data, photos, videos, laser data, etc.). Within each of those folders, I have three folders for raw, processed, and derived data.

  • The raw data folder holds the files that come directly from sensors or observations, organized in folders named by date as YYMMDD so that they sort numerically (and therefore chronologically). If more than one run is done on a day, a descriptive suffix is added.
  • The processed data folder holds, for example for the force data, a binary file of the data in a format that my preferred software can read (.mat files for MATLAB). These files are also named YYMMDD, parallel to the raw data folders.
  • In the derived data folder, I number the subfolders so that they sort by processing step. Inside each subfolder, the files are still named with the same YYMMDD unique ID, and I keep the YYMMDD string in figure names as well. This parallel naming across folders and figures is really useful for running scripts on entire folders and for finding all figures from a certain date (by searching for the unique ID in a photo organizer such as Picasa).
  • A master table of runs gives more detail about each YYMMDD unique identifier.
--Leslie Hsu
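The YYMMDD convention lends itself to small helper scripts. Below is a minimal Python sketch of two such helpers. The first formats and parses run IDs. The second collects every file or folder carrying a given ID, whichever raw, processed, or derived folder it sits in. The function names are hypothetical, and so is the underscore between the date and the descriptive suffix: the description above does not specify a separator, and it uses MATLAB rather than Python.

```python
import re
from datetime import date, datetime
from pathlib import Path

# A run ID is YYMMDD, optionally followed by "_" and a descriptive suffix
# (e.g. "130612_flood"). The "_" separator is an assumption for this sketch.
RUN_ID = re.compile(r"^(\d{6})(?:_(\w+))?$")


def make_run_id(day: date, suffix: str = "") -> str:
    """Format a run ID; zero-padded YYMMDD strings sort chronologically."""
    run_id = day.strftime("%y%m%d")
    return f"{run_id}_{suffix}" if suffix else run_id


def parse_run_id(name: str) -> tuple[date, str]:
    """Split a folder name or file stem such as "130612_flood" into (date, suffix)."""
    match = RUN_ID.match(name)
    if match is None:
        raise ValueError(f"not a YYMMDD run ID: {name!r}")
    day = datetime.strptime(match.group(1), "%y%m%d").date()
    return day, match.group(2) or ""


def files_for_run(data_root: Path, run_id: str) -> list[Path]:
    """Return every file or folder under data_root whose name starts with
    run_id, whether it sits in a raw, processed, or derived folder.

    This is a prefix match, so "130612" also matches "130612_flood".
    """
    return sorted(data_root.rglob(f"{run_id}*"))
```

Because the year, month, and day are all zero-padded to two digits, plain alphabetical sorting of these IDs is also chronological within a single century: `sorted(["130701", "130612_b", "130612"])` gives `["130612", "130612_b", "130701"]`.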
Attachments: Leslie Hsu, Jun 22, 2013, 6:24 AM; Raleigh Martin, Jun 18, 2013, 12:34 PM; Raleigh Martin, Jun 18, 2013, 12:45 PM