"Neurodata Without Borders (NWB) provides a common self-describing format ideal for archiving neurophysiology data and sharing it with colleagues. NWB provides a standardized schema – a set of rules and best practices – to help neuroscientists package their data and metadata together so that they are both machine- and human-readable."
The following are the steps that I (Ehren) took to get oriented to what NWB is and how it works.
(~2 hrs) It started with reading the info page about NWB. This provides a well-meaning but often cryptic introduction. My main take-away? That the standard over-specifies the life out of all the ways you should document your experiment. Okay, mostly I appreciate the need for so much obsessiveness, but whoa. A lot of it is quite jargony, and the sheer number of specified fields is overwhelming. But, I suppose one doesn't need to use every last one (though the more you use, the more transparent your data becomes). Hopefully, it is possible to build reusable modules that save the effort (e.g., with the details regarding opto-light devices, etc.).
After reading through much of the documentation at the above link, I decided that I want to play with some data in an existing NWB file. I downloaded an example NWB data file that should have lots of single units from DANDI with the thought that I might be able to try implementing an IDTxl analysis.
To read this file, I needed to follow a tutorial. I landed on the pyNWB toolbox page, which was impressively well organized and had a nice 'NWB File Basics' tutorial. Feeling the need to learn more about Python, and given that I eventually hoped to push the data into the Python-based IDTxl toolbox, I decided to start here instead of using matNWB. This required getting pyNWB installed. . .
Starting with my M1 Macbook Pro, which already had Anaconda installed. . .
Problem: I tried to start Anaconda, but it would not launch; it only allowed me to 'Force quit.'
Solution attempt 1: I tried to reinstall Anaconda from an install package downloaded from their site, but the installer said to use the command:
conda update anaconda
Soln attempt 2: I updated Anaconda as suggested. It took forever, and Navigator still wouldn't start.
Soln attempt 3: I updated Python (it was showing 99% processor usage even though Anaconda wouldn't start) with:
conda update python
Soln attempt 4: I gave up on the Anaconda Navigator and just decided to use the terminal window itself
Soln attempt 5: <LATER> I found a page saying the solution is to update anaconda-navigator, and this worked:
conda update anaconda-navigator
Problem: Install pyNWB
Soln attempt 1: called the following in a terminal (without starting any coding environment)
conda install -c conda-forge pynwb
Problem: Start Python so I could try pyNWB
Soln attempt 1: call the following in the terminal
python
Problem: Understand what NWB is
Soln attempt 1: Blindly follow the 'NWB File Basics' tutorial.
Soln attempt 2: Try to load the NWB file I downloaded from DANDI
Problem: the file is in a different folder, requiring me to figure out how to get Python to change its working directory
Solution:
import os
print(os.getcwd())
os.chdir('/Users/ehren/Downloads')
Problem: get the file into readable form
Solution:
from pynwb import NWBFile, TimeSeries, NWBHDF5IO
io = NWBHDF5IO("sub-699733573_ses-715093703.nwb", mode="r")
read_nwbfile = io.read()
Problem: I got a big long traceback ending with 'KeyError: "'ndx-aibs-ecephys' not a namespace" '
Solution attempt 1: A page on the AllenSDK user forum indicated that this error meant I needed the updated SDK (which I didn't have installed at all). I got it by quitting Python and then installing the AllenSDK as follows. But this did not solve the problem.
quit()
pip install allensdk
Solution attempt 2: Follow tutorials on the following page: https://allensdk.readthedocs.io/en/latest/visual_coding_neuropixels.html
The quickstart demo seems to be working well (although the data fetch is way slower than via DANDI)
Notes from CoSyNe workshop
All materials can be found here: https://bit.ly/nwb-cosyne-2023
DANDI -
Focus on the 0402 folder in the DANDI tutorials
tqdm is a useful Python tool for generating progress bars
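A minimal sketch of the tqdm note (assumes tqdm is installed, e.g., via pip or conda):

```python
# Wrap any iterable in tqdm() to get a live progress bar on stderr.
from tqdm import tqdm

total = 0
for chunk in tqdm(range(100), desc="processing"):
    total += chunk  # stand-in for real per-file work
# total is now 0 + 1 + ... + 99 = 4950
```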
For streaming of files from DANDI:
Use the Python button on the DANDI page
from nwbwidgets import Panel
DANDI downloads - focus on getting individual files instead of the whole session, because a whole session can be VERY large
(An image showing how to stream files into local analysis code is not reproduced here)
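In code, the streaming recipe looks roughly like this (a hedged sketch: assumes the dandi, fsspec, h5py, and pynwb packages; the arguments are placeholders, and imports are deferred into the function so nothing runs until it is called):

```python
def stream_nwb(dandiset_id, asset_path):
    """Open a remote NWB asset for reading without downloading the whole file."""
    # Imports deferred so this sketch only needs the packages when called.
    from dandi.dandiapi import DandiAPIClient
    import fsspec, h5py, pynwb

    # Resolve the asset's direct download URL via the DANDI API
    with DandiAPIClient() as client:
        asset = client.get_dandiset(dandiset_id, "draft").get_asset_by_path(asset_path)
        url = asset.get_content_url(follow_redirects=1, strip_query=True)

    # Open the remote file lazily; only the bytes you touch are fetched
    remote = fsspec.filesystem("http").open(url, "rb")
    io = pynwb.NWBHDF5IO(file=h5py.File(remote, "r"))
    return io.read()
```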
SpikeInterface allows for spike sorting on recordings
See also ProbeInterface for recordings made with probes (i.e., good for use with Neuropixels)
There is a demo available in the cosyne-2023 tutorials to use as a template (use it!)
Will need Docker/Singularity installed to download and use 'dockerized' spike sorters (dockerized means able to run in an isolated, containerized compute environment)
Includes information about curation of units and how to compare the output of multiple sorters
Pose estimation in NWB
SLEAP and DLC are supported
Can work with video stored in NWB ('internal mode': intended for optical imaging and stimulus videos) or by pointing to a file ('external mode': videos of tracking behavior)
My challenge when I started playing with the Allen Brain Institute Brain Observatory data had been learning how to use the NWB object. Unfortunately, the AllenSDK is required to work with the Brain Observatory data, and that extra layer totally removes the need to work with pyNWB directly. Alas.
But, now that I was able to download some of the data and get the spiking into a form that can be processed by IDTxl, I'm tempted to see how far I can push it. Here are some notes about how this could, in principle, mature into something publishable.
Scientific Question: Does the amount of synergy vary systematically by brain region as a function of the task being performed?
Hypothesis: Synergy varies directly as a function of the amount of 'work' a neural circuit is doing.
Approach: The Allen B.O. data sets include recordings of the spiking of dozens of well-isolated single units across more than a dozen brain areas while the mouse was either unstimulated (i.e., 'spontaneous') or presented with visual patterns of varying complexity (including flashes, Gabor patches, moving gratings, movies, etc.). One could think of this as a large two-way design matrix [anatomy X visual condition], where anatomy and visual condition each have numerous levels. Given this, one can ask how some metric (e.g., synergy) varies over conditions differently across brain areas. For synergy, we would extract the mvTE functional network for each brain area separately. Whether this would be done separately for each condition or using the data from all the conditions isn't immediately clear to me. With the functional networks in hand, we would then compute the mean synergy for the computational triads separately for each condition.
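The bookkeeping for that design matrix might look like the following sketch (all numbers are fake, generated only to show the [anatomy X condition] layout; real values would come from the IDTxl mvTE analysis, and the area/condition labels are just examples):

```python
import numpy as np

areas = ["V1", "LM", "AM"]                         # example anatomy levels
conditions = ["spontaneous", "flashes", "movies"]  # example condition levels

# synergy[i, j] = mean triad synergy for area i under condition j (fake data)
rng = np.random.default_rng(0)
synergy = rng.random((len(areas), len(conditions)))

per_condition = synergy.mean(axis=0)  # average over areas, per condition
per_area = synergy.mean(axis=1)       # average over conditions, per area
```

With this layout, the hypothesis becomes a concrete comparison of rows (areas) across columns (conditions).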
Logic: If synergy is related to the amount of work a circuit is doing, then it should be greater in conditions that require more processing (e.g., videos > scrambled videos > flashes). But this relationship should only hold in brain regions that are strongly visual, and, arguably, the effect should be stronger in the higher visual areas than the lower ones, since the higher areas are putatively more sensitive to higher-order structure in the stimuli.