The Information Dynamics Toolkit xl (IDTxl) is a comprehensive software package for efficient inference of networks and their node dynamics from multivariate time series data using information theory.
From the local computer:
Open a terminal
cd ~/Dropbox\ \(NewmanLab\)/scripts/projects/AllenBO_vs_PID
For the full results folder:
rsync -azP --ignore-existing ehnewman@quartz.uits.iu.edu:/N/project/memLab_deepLabCut/AllenBO_vs_PID/results .
NOTES:
-a Archive mode (preserves modification times, permissions, etc.)
-z Compress files in transit to make the transfer more efficient
-P Show progress and keep partially transferred files (shorthand for --partial --progress)
--ignore-existing leaves existing files intact and only copies new files. WILL NOT UPDATE FILES THAT ALREADY EXIST
--update will re-copy any files that have been modified more recently on the source than the local copies
rsync -azP --ignore-existing AllenBO_vs_PID ehnewman@quartz.uits.iu.edu:/N/project/memLab_deepLabCut/AllenBO_vs_PID/
rsync -azP --ignore-existing results ehnewman@quartz.uits.iu.edu:/N/project/memLab_deepLabCut/AllenBO_vs_PID/
To avoid recopying existing files, add --ignore-existing
From Thomas Varley:
install Miniconda on your BigRed profile, and then create a conda environment named python3.7 equipped with python3.7, numpy, scipy, and IDTxl.
You can download the Miniconda install here: https://docs.conda.io/en/latest/miniconda.html
Grab the Miniconda3 Linux 64-Bit version, download it, run the BASH file and complete the install.
Then you can create an environment with `conda create -n python3.7 python=3.7 numpy scipy networkx cython matplotlib seaborn h5py pip`
Once that's done, you can activate the environment with `source activate python3.7`.
There are a few other dependencies that IDTxl needs, so run:
`conda install -c conda-forge jpype1` # required by CPU JIDT estimators
`conda install -c conda-forge pyopencl` # required by GPU OpenCL estimators
`conda install -c anaconda ecos` # required by Tartu PID estimator
`conda install numba` # required by NumbaCuda estimators
`conda install cudatoolkit` # required by NumbaCuda estimators
** ADDITION LATER **
`conda install xarray`
At that point, you can `cd` into the IDTxl directory you cloned from GitHub and install it with `pip install .`
That should get you a fully configured Python environment that can run IDTxl out of the box.
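A quick way to confirm the environment works is an import smoke test like the one below (a minimal sketch; the class names come from the IDTxl documentation, and the mute-data generator is just IDTxl's built-in toy dataset):

# Smoke test: these imports should succeed in the new environment
from idtxl.data import Data
from idtxl.multivariate_te import MultivariateTE

# Generate IDTxl's built-in toy dataset to confirm the install is usable
data = Data()
data.generate_mute_data(n_samples=1000, n_replications=5)
print('IDTxl environment looks good')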
You shouldn't have to do this all again when you get access to Quartz - just copy the lines that Miniconda writes into your .bashrc and paste them into the .bashrc in your Quartz home directory. They point to the same Miniconda install and put it on the relevant PATH.
Be sure you set up JAVA_HOME appropriately:
macOS:
First, make sure Java is installed. Hint: it isn't installed by default on macOS.
You can download the installer from here
Use the 'arm64' installer for Apple silicon (M1/M2) chips
Use the 'x64' installer for Intel chips
Run the following to update your .zshrc file
echo export "JAVA_HOME=\$(/usr/libexec/java_home)" >> ~/.zshrc
In the unlikely case you are using bash instead of the newer default zsh
echo export "JAVA_HOME=\$(/usr/libexec/java_home)" >> ~/.bash_profile
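Once JAVA_HOME is set, you can check from Python that JPype (and therefore the CPU JIDT estimators) can locate the JVM. A minimal sketch of such a check:

import jpype

# If JAVA_HOME is set correctly, this prints the path to the JVM shared
# library instead of raising an error.
print(jpype.getDefaultJVMPath())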
I (ehren) have been building analyses to facilitate the piping of data into and through the IDTxl package.
I have been developing this code in the folder:
projDir = '/Dropbox (NewmanLab)/scripts/projects/AllenBO_vs_PID/'
scriptsDir = projDir + 'scripts/'
Here is a guide to the various scripts, roughly in the order in which you'll call them:
projectPath = os.path.join(os.path.expanduser('~'),'Dropbox (NewmanLab)/scripts/projects/AllenBO_vs_PID')
The root folder of the project. Allen NWB files will be cached under <projectPath>/Allen/, with the cache manifest at <projectPath>/Allen/Manifest.json
session_id = 791319847
The session to extract rasters from; see the AllenBO_vs_PID page for a list of options
time_step = 0.011
spkRast timescale in seconds (i.e., bin width in seconds)
minUnits = 30
The minimum number of units an area must have for its spkRast to be built
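For reference, the automatic download mentioned below typically goes through the AllenSDK cache. A minimal sketch of how these parameters might be used, assuming the standard EcephysProjectCache API (not necessarily the exact code in the notebook):

import os
from allensdk.brain_observatory.ecephys.ecephys_project_cache import EcephysProjectCache

# Point the AllenSDK cache at the project's Allen folder (see projectPath above).
# Session NWB files are downloaded into this cache on first use.
manifest_path = os.path.join(projectPath, 'Allen', 'Manifest.json')
cache = EcephysProjectCache.from_warehouse(manifest=manifest_path)

# Pull the session; this triggers the (slow) download the first time.
session = cache.get_session_data(session_id)
print(len(session.units), 'units in session', session_id)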
So far I (ehren) have been running it from a Jupyter Notebook on my laptop
After it runs, manually copy the rasters over to the HPC
From the local computer:
Open a terminal
cd ~/Dropbox\ \(NewmanLab\)/scripts/projects/AllenBO_vs_PID
rsync -azP --ignore-existing results ehnewman@quartz.uits.iu.edu:/N/project/memLab_deepLabCut/AllenBO_vs_PID/
If the relevant session of data has not yet been synced (i.e., if this is the first time processing data from this session), it will download it. NOTE: this can be slow, so be patient.
It will save the spike raster files into the directory: AllenBO_vs_PID/results/<sessID>/<epchID>/spkRast.npz
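To peek inside one of these raster files without assuming anything about its field names, you can list the arrays it contains (a quick sketch; the epoch ID 'all' below is just an example):

import os
import numpy as np

# Example path following the layout above; substitute your own sessID/epchID
rasterFile = os.path.join(projectPath, 'results', str(session_id), 'all', 'spkRast.npz')
spk = np.load(rasterFile, allow_pickle=True)
print(spk.files)  # names of the arrays stored in the file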
Contained in this file is a search string that determines which spike raster files to build mTE networks for. It is intended to be general (e.g., sessSearchString = '*epch-all_*anat-*.npz' to process all rasters for epch=all regardless of the anatomical site). It will automatically process each target separately until all targets have been processed or until a 'fullNet.p' file exists.
This script is best called from a slurm batch script on the HPC that instantiates many calls to this function over HPC compute nodes.
Each job will find the first target it can that isn't either already computed or being computed and start to process it. Once it finishes processing a target, it will continue on and find the next target that needs processing until either there is nothing left to process or the HPC wall-clock time runs out.
As outputs, it will create *lock.txt files as it runs to communicate to other instances which targets are already being processed. If there are still *lock.txt files after all instances of this script have terminated, these represent processing that died before completing. They can safely be deleted. It simply means more instances of this script must be started again (by running another slurm job).
When it successfully finishes processing a target, it will rename the *lock.txt file to *done.txt. Contained inside will be information about which node did the processing, which target out of how many was processed, and the total time it took to process the target along with any other output that might have resulted from running the analysis.
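For intuition, the lock/done convention is just claim-then-rename; here is a minimal sketch of the idea (names and layout are illustrative, not the script's actual implementation):

import os
import socket
import time

def claim_target(targDir, targNum):
    # Try to claim a target by atomically creating its lock file;
    # return None if someone else already claimed or finished it.
    lockFile = os.path.join(targDir, 'target_%d_lock.txt' % targNum)
    doneFile = os.path.join(targDir, 'target_%d_done.txt' % targNum)
    if os.path.exists(lockFile) or os.path.exists(doneFile):
        return None
    try:
        # open with 'x' fails if the file exists, so two instances
        # cannot both claim the same target
        with open(lockFile, 'x') as f:
            f.write('claimed by %s at %s\n' % (socket.gethostname(), time.ctime()))
        return lockFile
    except FileExistsError:
        return None

def mark_done(lockFile, elapsedSec):
    # Record the run time, then rename *lock.txt to *done.txt
    with open(lockFile, 'a') as f:
        f.write('finished in %.1f s\n' % elapsedSec)
    os.rename(lockFile, lockFile.replace('lock.txt', 'done.txt'))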
Most substantively, this will save the results of the mTE analysis to a compressed numpy file in projDir + 'mTE_byTarget/' + spkRasterFnameStem + '_target_' + targNum + '.npz'
These can be loaded as follows:
import os
import numpy as np

# Collect the per-target mTE results into a list (rec is the filename stem
# of the spike raster; targets is the list of target indices)
res_list = []
for target in targets:
    npzFile = rec + '_target_' + str(target) + '.npz'
    if os.path.isfile(npzFile):
        res_list.append(np.load(npzFile, allow_pickle=True)['arr_0'].item())
#SBATCH --array=1-200
This implicitly sets how many instances of AllenSpkRast2mTE.py will be started. The numbers given here are the indices of the array tasks, so --array=5-10 will create 6 instances of AllenSpkRast2mTE.py, one for each of the indices 5, 6, 7, 8, 9, 10. --array=15-20 will also start 6 instances but with the indices 15, 16, 17, 18, 19, 20. In general the indices do not matter much. However, AllenSpkRast2mTE.py will pause before starting for a number of seconds equal to its index. This is intended to desynchronize each instance from the others, so that the instances that start before it can ID which target they will run and create the appropriate lock file before the next one starts and looks to see which target it should run.
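The staggered start described above is simple: each array task reads its own index from the environment and sleeps that many seconds before looking for a target. A sketch of the idea (not the script's exact code):

import os
import time

# SLURM exposes each array task's index as an environment variable
taskID = int(os.environ.get('SLURM_ARRAY_TASK_ID', 0))

# Stagger the start so instances claim their lock files one at a time
time.sleep(taskID)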
#SBATCH --time=4-00:00:00
As written here, it asks the batch scheduler for a 4 day reservation on the HPC. This is the max. If you need less, ask for less. Your job will start sooner.
#SBATCH --mail-user=ehnewman@iu.edu
This is the email address to which the batch scheduler will send updates. Please replace my address with yours so you get the notices instead of me :-D
#SBATCH --mail-type=FAIL,END
These are the conditions under which you'll get emails. Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to BEGIN, END, FAIL, INVALID_DEPEND, REQUEUE, and STAGE_OUT), INVALID_DEPEND (dependency never satisfied), STAGE_OUT (burst buffer stage out and teardown completed), TIME_LIMIT, TIME_LIMIT_90 (reached 90 percent of time limit), TIME_LIMIT_80 (reached 80 percent of time limit), TIME_LIMIT_50 (reached 50 percent of time limit) and ARRAY_TASKS (send emails for each array task).
#SBATCH --mem=2G
This is the maximum amount of memory your job may use. The job will fail if it needs more than you ask for. However, as with the time limit, if you ask for substantially more than you need, it will come back to get you. In the short term it will take longer for your job to start. In the long term the HPC admins will hunt you down and shake a finger at you. :-o
#SBATCH --job-name=sweep
OPTIONAL - this will label your jobs in the queue with something that will help you ID what they are.
Sign onto the relevant HPC system
ssh <your username>@bigred200.uits.iu.edu
Move to the directory where this file is located
cd /N/project/memLab_IDTxl/AllenBO_vs_PID/slurmLaunchDirs/slurm_fullSweep/
Check that there are no *lock.txt files from prior runs that would block it from finishing targets that still need attention
ls ../../results/*/*/*/*lock.txt
rm ../../results/*/*/*/*lock.txt
Submit this file to sbatch
sbatch slurm_AllenSpkRast2mTE_v1.1.sh
sbatch will process this file and start a single HPC job (i.e., with a single job number) with the relevant number of subprocesses that you requested inside the script (see #SBATCH --array above).
As each subprocess starts, it will create a 'slurm_<jobID>_<subprocessIND>.out' file in the directory you launched from. These build up quickly, which is why we call sbatch from inside 'slurmLaunchDirs': it keeps these files in their own special home. Once the job terminates, they can be safely deleted.
Each instance of AllenSpkRast2mTE.py will do what it does (see above).
Individual jobs will run until either AllenSpkRast2mTE.py runs out of things to do or the wall-clock limit is hit (or it gets the boot)
If you need to cancel a job you started (e.g., if you made a mistake)
scancel <jobID>
in the above, replace <jobID> with the job number (e.g., if the job ID is 1284099, call scancel 1284099)
Nothing, really. It does have three fields one could edit, though I can't think of a reason to at the moment:
sessSearchString = 'sess-*'
The sessions to examine
epchSearchString = '*epch-*'
The epochs to examine
mvteSearchString = 'mTE*'
The mTE networks to collapse over
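For reference, here is a sketch of how these three patterns might combine to locate the mTE folders, assuming the results/<sess>/<epch>/<mTE...> folder layout implied by the lock-file paths above:

import glob
import os

# Find every mTE folder that matches all three search strings
mteFolders = glob.glob(os.path.join(projectPath, 'results',
                                    sessSearchString, epchSearchString, mvteSearchString))
print(len(mteFolders), 'mTE folders found')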
Sign onto the relevant HPC system
ssh <your username>@bigred200.uits.iu.edu
Move to the directory where this file is located
cd /N/project/memLab_IDTxl/AllenBO_vs_PID/scripts
Call python to execute the script
source activate python3.7
python CombineTargetsIntoNetwork_v1.1.py
It will automatically search for mTE folders in the AllenBO_vs_PID/results folder
It will step through each and look for a fullNet.p file, moving along if it exists
If it doesn't exist, it will load the spkRast.npz in the parent folder to see how many target files to expect
It will then iteratively look for each target file, noting any that are missing
If all exist, it will collapse them into a network with FDR (false discovery rate) correction and one without, saving these as fullNet_FDR.p and fullNet.p respectively, and then move the target files to a subfolder called 'targFiles' to clean up
If they don't all exist, it will give a report as to what is missing and move along
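The saved networks can then be pulled into downstream analyses; assuming the .p files are standard Python pickles (which the .p extension suggests), loading looks like this:

import pickle

# Load the combined network (use fullNet.p for the uncorrected version)
with open('fullNet_FDR.p', 'rb') as f:
    fullNet = pickle.load(f)
print(type(fullNet))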