Computational Lab Notebooks

For EACH research project you undertake (essentially, every time you create a Project Summary), create a new Benchling Project:

1. Create a Benchling project and SHARE it with the pailab-umms organization, or at least with Athma (athmaapai)

2. Create a notebook entry for each segment of the project

3. Each day, outline the "experiments" (broad term here) that you are undertaking - with goals, results, and conclusions/next steps clearly outlined.

Everything should be recorded - you should be able to hand the project to your labmate tomorrow and they shouldn't have to look up anything in order to be able to re-create your entire analysis workflow!

Computational experiments include many different elements such as:

1. data downloading/parsing

- outline data sources (papers, database accessions, date downloaded, etc.) as well as the path to where the data is saved
- salient features of each dataset downloaded (number of samples, sample sources, experimental protocol used to generate the data, processing/analyses previously done)
- scripts for data "munging" or clean-up/formatting necessary for your analyses

2. central script development (usually unix, bash, or python scripting)

- goal of each script, path to script, parameters input/output
- versions of the different software used (with links to manuals)
- order in which a set of scripts must be run or key things to change for each run

3. data analyses + plotting

- goal of each analysis, results, and next steps
- scripts (+ paths) for analyses
- try to use R Markdown/Jupyter Notebooks when possible
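
As one possible way to capture the data sources, paths, and software versions listed above, the following Python sketch writes them to a JSON manifest that can be attached to a Benchling entry. The function and field names (dataset_record, software_versions, write_manifest, n_samples, and so on) are illustrative assumptions, not a lab standard.

```python
import hashlib
import json
import platform
from datetime import date
from importlib import metadata
from pathlib import Path


def sha256sum(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dataset_record(path, source, accession, n_samples, notes=""):
    """Describe one downloaded dataset (field names are hypothetical)."""
    path = Path(path)
    return {
        "path": str(path.resolve()),           # where the data is saved
        "source": source,                      # paper or database
        "accession": accession,                # database accession
        "date_downloaded": date.today().isoformat(),
        "n_samples": n_samples,
        "sha256": sha256sum(path),             # detects later changes to the file
        "notes": notes,                        # protocol, previous processing, etc.
    }


def software_versions(packages):
    """Map each Python package name to its installed version (None if absent)."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def write_manifest(out_path, datasets, packages):
    """Write dataset records and software versions to a JSON file."""
    manifest = {"datasets": datasets, "software": software_versions(packages)}
    Path(out_path).write_text(json.dumps(manifest, indent=2))
    return manifest
```

Calling write_manifest with a list of dataset_record results and the package names an analysis depends on produces one file per project segment. This only covers Python packages; versions of command-line tools would still need to be noted by hand.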

Division of computational lab notebooks

Given that you will be integrating across multiple different programming/scripting resources (i.e., unix, python, R, etc.), it is quite hard to keep a notebook as one continuous document. Instead, I recommend the following workflow:

1. Keep a benchling entry for each day that outlines the main features of the work that you've done that day - including paths to data/scripts, version controls, analyses you've undertaken, and plots you've made, as outlined above.

2. Create either a GitHub or Bitbucket account, and use it for version control on all your scripts (and markdown/notebook documents, see below) - especially those on the cluster. At the end of each day, commit and push your changes - this will ensure that you won't lose work and allow you to revisit previous snippets of code you didn't think would be useful!

3. For all data exploration and analyses, in either R or python, format your code in a Markdown document (R Markdown or Jupyter Notebook). At the end of each day, commit and push these documents and upload the PDF outputs to Benchling so that your results can be easily shared (especially with Athma).

4. Consider using workflowr to create a GitHub-hosted page that allows you to share your analyses and results as they're updated!
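
The daily entry described in step 1 could be scaffolded with a small helper like the one below. This is only a sketch: the daily_entry function and the dictionary keys it expects (title, goal, paths, results, next_steps) are hypothetical, and the output is plain text meant to be pasted into Benchling and filled in during the day.

```python
from datetime import date


def daily_entry(project, experiments, day=None):
    """Render a plain-text notebook entry.

    Each experiment is a dict with 'title' and 'goal', plus optional
    'paths', 'results', and 'next_steps' (all names are illustrative).
    """
    day = day or date.today()
    lines = [f"{project} - {day.isoformat()}", ""]
    for number, exp in enumerate(experiments, start=1):
        lines.append(f"{number}. {exp['title']}")
        lines.append(f"   Goal: {exp['goal']}")
        for path in exp.get("paths", []):
            lines.append(f"   Path: {path}")
        # Missing fields are left as TODO so gaps stay visible.
        lines.append(f"   Results: {exp.get('results', 'TODO')}")
        lines.append(f"   Conclusions/next steps: {exp.get('next_steps', 'TODO')}")
        lines.append("")
    return "\n".join(lines)
```

Leaving unfilled fields as TODO makes it easy to spot, at the end of the day, which experiments still lack results or next steps.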

More resources for organizing computational lab notebooks:

PLOS Comp Bio's 10 Simple Rules for a Computational Biologist's Laboratory Notebook

Addgene's How to Keep a Lab Notebook for Bioinformatic Analyses

Ivan Hanigan's Implementing an Electronic Lab Notebook and Keeping an ELN for Computational Statistics Projects (relevant even though epidemiology focused!)

Biostar's thread on Documentation in Bioinformatics (note that Benchling essentially serves as a version controlled wiki!)

Cookiecutter data science Tips on what makes a good computational lab notebook

The Pai Lab at University of Massachusetts c2018