The Information Dynamics Toolkit xl (IDTxl) is a comprehensive software package for efficient inference of networks and their node dynamics from multivariate time series data using information theory.
From the local computer:
Open a terminal
cd ~/Dropbox\ \(NewmanLab\)/scripts/projects/AllenBO_vs_PID
For the full results folder:
rsync -azP --ignore-existing ehnewman@quartz.uits.iu.edu:/N/project/memLab_deepLabCut/AllenBO_vs_PID/results .
NOTES:
-a Archive mode (preserves modification times, permissions, etc.)
-z Compress files in transit to make the transfer more efficient
-P Show progress and keep partially transferred files (shorthand for --partial --progress)
--ignore-existing leaves existing files intact and only copies new files. WILL NOT UPDATE FILES THAT ALREADY EXIST
--update will re-copy any files that have been modified more recently on the source than the local copies
rsync -azP --ignore-existing AllenBO_vs_PID ehnewman@quartz.uits.iu.edu:/N/project/memLab_deepLabCut/AllenBO_vs_PID/
rsync -azP --ignore-existing results ehnewman@quartz.uits.iu.edu:/N/project/memLab_deepLabCut/AllenBO_vs_PID/
To avoid recopying existing files, add --ignore-existing
From Thomas Varley:
install Miniconda on your BigRed profile, and then create a conda environment named python3.7 equipped with python3.7, numpy, scipy, and IDTxl.
You can download the Miniconda install here: https://docs.conda.io/en/latest/miniconda.html
Grab the Miniconda3 Linux 64-Bit version, download it, run the BASH file and complete the install.
Then you can create an environment with `conda create -n python3.7 python=3.7 numpy scipy networkx cython matplotlib seaborn h5py pip`
Once that's done, you can activate the environment with `source activate python3.7`.
There are a few other dependencies that IDTxl needs, so run:
`conda install -c conda-forge jpype1` # required by CPU JIDT estimators
`conda install -c conda-forge pyopencl` # required by GPU OpenCL estimators
`conda install -c anaconda ecos` # required by Tartu PID estimator
`conda install numba` # required by NumbaCuda estimators
`conda install cudatoolkit` # required by NumbaCuda estimators
** ADDITION LATER **
`conda install xarray`
At that point, you can `cd` into the IDTxl directory you cloned from GitHub and install it with `pip install .`
That should get you a fully configured Python environment that can run IDTxl out of the box.
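A quick way to confirm the environment works is an import smoke test like the one below (a minimal sketch; the class names come from the IDTxl documentation, and the mute-data generator is just IDTxl's built-in toy dataset):

# Smoke test: these imports should succeed in the new environment
from idtxl.data import Data
from idtxl.multivariate_te import MultivariateTE

# Generate IDTxl's built-in toy dataset to confirm the install is usable
data = Data()
data.generate_mute_data(n_samples=1000, n_replications=5)
print('IDTxl environment looks good')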
You shouldn't have to do this all again when you get access to Quartz - just copy the lines that Miniconda writes into your .bashrc and paste them into the .bashrc in your Quartz home directory. They point to the same Miniconda install and put it on the relevant PATH.
Be sure you set up JAVA_HOME appropriately:
macOS:
First, make sure Java is installed. Hint: it isn't installed by default on macOS.
You can download the installer from here
Use the 'arm64' installer for Apple silicon (M1/M2) chips
Use the 'x64' installer for Intel chips
Run the following to update your .zshrc file
echo export "JAVA_HOME=\$(/usr/libexec/java_home)" >> ~/.zshrc
In the unlikely case you are using bash instead of the newer default zsh
echo export "JAVA_HOME=\$(/usr/libexec/java_home)" >> ~/.bash_profile
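Once JAVA_HOME is set, you can check from Python that JPype (and therefore the CPU JIDT estimators) can locate the JVM. A minimal sketch of such a check:

import jpype

# If JAVA_HOME is set correctly, this prints the path to the JVM shared
# library instead of raising an error.
print(jpype.getDefaultJVMPath())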
I (ehren) have been building analyses to facilitate the piping of data into and through the IDTxl package.
I have been developing this code in the folder:
projDir = '/Dropbox (NewmanLab)/scripts/projects/AllenBO_vs_PID/'
scriptsDir = projDir + 'scripts/'
Here is a guide to the various scripts, roughly in the order in which you'll call them:
projectPath = os.path.join(os.path.expanduser('~'),'Dropbox (NewmanLab)/scripts/projects/AllenBO_vs_PID')
The root folder of the project. Allen NWB files will be cached under <projectPath>/Allen/, with the cache manifest at <projectPath>/Allen/Manifest.json
session_id = 791319847
The session to extract rasters from; see the AllenBO_vs_PID page for a list of options
time_step = 0.011
spkRast timescale in seconds (i.e., bin width in seconds)
minUnits = 30
The minimum number of units an area must have for its spkRast to be built
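For reference, the automatic download mentioned below typically goes through the AllenSDK cache. A minimal sketch of how these parameters might be used, assuming the standard EcephysProjectCache API (not necessarily the exact code in the notebook):

import os
from allensdk.brain_observatory.ecephys.ecephys_project_cache import EcephysProjectCache

# Point the AllenSDK cache at the project's Allen folder (see projectPath above).
# Session NWB files are downloaded into this cache on first use.
manifest_path = os.path.join(projectPath, 'Allen', 'Manifest.json')
cache = EcephysProjectCache.from_warehouse(manifest=manifest_path)

# Pull the session; this triggers the (slow) download the first time.
session = cache.get_session_data(session_id)
print(len(session.units), 'units in session', session_id)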
So far I (ehren) have been running it from a Jupyter Notebook on my laptop
After it runs, manually copy the rasters over to the HPC
From the local computer:
Open a terminal
cd ~/Dropbox\ \(NewmanLab\)/scripts/projects/AllenBO_vs_PID
rsync -azP --ignore-existing results ehnewman@quartz.uits.iu.edu:/N/project/memLab_deepLabCut/AllenBO_vs_PID/
If the relevant session of data has not yet been synced (i.e., if this is the first time processing data from this session), it will download it. NOTE: this can be slow, so be patient.
It will save the spike raster files into the directory: AllenBO_vs_PID/results/<sessID>/<epchID>/spkRast.npz
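To peek inside one of these raster files without assuming anything about its field names, you can list the arrays it contains (a quick sketch; the epoch ID 'all' below is just an example):

import os
import numpy as np

# Example path following the layout above; substitute your own sessID/epchID
rasterFile = os.path.join(projectPath, 'results', str(session_id), 'all', 'spkRast.npz')
spk = np.load(rasterFile, allow_pickle=True)
print(spk.files)  # names of the arrays stored in the file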
Contained in this file is a search string that determines which spike raster files to build mTE networks for. It is intended to be general (e.g., sessSearchString = '*epch-all_*anat-*.npz' to process all rasters for epch=all regardless of the anatomical site). It will automatically process each target separately until all targets have been processed or until a 'fullNet.p' file exists.
This script is best called from a slurm batch script on the HPC that instantiates many calls to this function over HPC compute nodes.
Each job will find the first target it can that isn't either already computed or being computed and start to process it. Once it finishes processing a target, it will continue on and find the next target that needs processing until either there is nothing left to process or the HPC wall-clock time runs out.
As outputs, it will create *lock.txt files as it runs to communicate to other instances which targets are already being processed. If there are still *lock.txt files after all instances of this script have terminated, these represent processing that died before completing. They can safely be deleted. It simply means more instances of this script must be started again (by running another slurm job).
When it successfully finishes processing a target, it will rename the *lock.txt file to *done.txt. Contained inside will be information about which node did the processing, which target out of how many was processed, and the total time it took to process the target along with any other output that might have resulted from running the analysis.
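For intuition, the lock/done convention is just claim-then-rename; here is a minimal sketch of the idea (names and layout are illustrative, not the script's actual implementation):

import os
import socket
import time

def claim_target(targDir, targNum):
    # Try to claim a target by atomically creating its lock file;
    # return None if someone else already claimed or finished it.
    lockFile = os.path.join(targDir, 'target_%d_lock.txt' % targNum)
    doneFile = os.path.join(targDir, 'target_%d_done.txt' % targNum)
    if os.path.exists(lockFile) or os.path.exists(doneFile):
        return None
    try:
        # open with 'x' fails if the file exists, so two instances
        # cannot both claim the same target
        with open(lockFile, 'x') as f:
            f.write('claimed by %s at %s\n' % (socket.gethostname(), time.ctime()))
        return lockFile
    except FileExistsError:
        return None

def mark_done(lockFile, elapsedSec):
    # Record the run time, then rename *lock.txt to *done.txt
    with open(lockFile, 'a') as f:
        f.write('finished in %.1f s\n' % elapsedSec)
    os.rename(lockFile, lockFile.replace('lock.txt', 'done.txt'))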
Most substantively, this will save the results of the mTE analysis to a compressed numpy file in projDir + 'mTE_byTarget/' + spkRasterFnameStem + '_target_' + targNum + '.npz'
These can be loaded as follows:
import os
import numpy as np

# Collect the per-target mTE results into a list (rec is the filename stem
# of the spike raster; targets is the list of target indices)
res_list = []
for target in targets:
    npzFile = rec + '_target_' + str(target) + '.npz'
    if os.path.isfile(npzFile):
        res_list.append(np.load(npzFile, allow_pickle=True)['arr_0'].item())
#SBATCH --array=1-200
This implicitly sets how many instances of AllenSpkRast2mTE.py will be started. The numbers given here are the indices of the array tasks, so --array=5-10 will create 6 instances of AllenSpkRast2mTE.py, one for each of the indices 5, 6, 7, 8, 9, 10. --array=15-20 will also start 6 instances but with the indices 15, 16, 17, 18, 19, 20. In general the indices do not matter much. However, AllenSpkRast2mTE.py will pause before starting for a number of seconds equal to its index. This is intended to desynchronize each instance from the others, so that the instances that start before it can ID which target they will run and create the appropriate lock file before the next one starts and looks to see which target it should run.
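The staggered start described above is simple: each array task reads its own index from the environment and sleeps that many seconds before looking for a target. A sketch of the idea (not the script's exact code):

import os
import time

# SLURM exposes each array task's index as an environment variable
taskID = int(os.environ.get('SLURM_ARRAY_TASK_ID', 0))

# Stagger the start so instances claim their lock files one at a time
time.sleep(taskID)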
#SBATCH --time=4-00:00:00
As written here, it asks the batch scheduler for a 4 day reservation on the HPC. This is the max. If you need less, ask for less. Your job will start sooner.
#SBATCH --mail-user=ehnewman@iu.edu
This is the email address to which the batch scheduler will send updates. Please replace my address with yours so you get the notices instead of me :-D
#SBATCH --mail-type=FAIL,END
These are the conditions under which you'll get emails. Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to BEGIN, END, FAIL, INVALID_DEPEND, REQUEUE, and STAGE_OUT), INVALID_DEPEND (dependency never satisfied), STAGE_OUT (burst buffer stage out and teardown completed), TIME_LIMIT, TIME_LIMIT_90 (reached 90 percent of time limit), TIME_LIMIT_80 (reached 80 percent of time limit), TIME_LIMIT_50 (reached 50 percent of time limit) and ARRAY_TASKS (send emails for each array task).
#SBATCH --mem=2G
This is the maximum amount of memory your job may use. The job will fail if it needs more than you ask for. However, as with the time limit, if you ask for substantially more than you need, it will come back to get you. In the short term it will take longer for your job to start. In the long term the HPC admins will hunt you down and shake a finger at you. :-o
#SBATCH --job-name=sweep
OPTIONAL - this will label your jobs in the queue with something that will help you ID what they are.
Sign onto the relevant HPC system
ssh <your username>@bigred200.uits.iu.edu
Move to the directory where this file is located
cd /N/project/memLab_IDTxl/AllenBO_vs_PID/slurmLaunchDirs/slurm_fullSweep/
Check that there are no *lock.txt files from prior runs that would block it from finishing targets that still need attention
ls ../../results/*/*/*/*lock.txt
rm ../../results/*/*/*/*lock.txt
Submit this file to sbatch
sbatch slurm_AllenSpkRast2mTE_v1.1.sh
sbatch will process this file and start a single HPC job (i.e., with a single job number) with the relevant number of subprocesses that you requested inside the script (see #SBATCH --array above).
As each subprocess starts, it will create a 'slurm_<jobID>_<subprocessIND>.out' file in the directory you launched from. These build up quickly, which is why we call sbatch from inside 'slurmLaunchDirs': it keeps these files in their own special home. Once the job terminates, they can be safely deleted.
Each instance of AllenSpkRast2mTE.py will do what it does (see above).
Individual jobs will run until either AllenSpkRast2mTE.py runs out of things to do or the wall-clock limit is hit (or it gets the boot)
If you need to cancel a job you started (e.g., if you made a mistake)
scancel <jobID>
in the above, replace <jobID> with the job number (e.g., if the job ID is 1284099, call scancel 1284099)
Nothing, really. It does have three fields one could edit, though I can't think of a reason to at the moment:
sessSearchString = 'sess-*'
The sessions to examine
epchSearchString = '*epch-*'
The epochs to examine
mvteSearchString = 'mTE*'
The mTE networks to collapse over
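For reference, here is a sketch of how these three patterns might combine to locate the mTE folders, assuming the results/<sess>/<epch>/<mTE...> folder layout implied by the lock-file paths above:

import glob
import os

# Find every mTE folder that matches all three search strings
mteFolders = glob.glob(os.path.join(projectPath, 'results',
                                    sessSearchString, epchSearchString, mvteSearchString))
print(len(mteFolders), 'mTE folders found')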
Sign onto the relevant HPC system
ssh <your username>@bigred200.uits.iu.edu
Move to the directory where this file is located
cd /N/project/memLab_IDTxl/AllenBO_vs_PID/scripts
Call python to execute the script
source activate python3.7
python CombineTargetsIntoNetwork_v1.1.py
It will automatically search for mTE folders in the AllenBO_vs_PID/results folder
It will step through each and look for a fullNet.p file, moving along if it exists
If it doesn't exist, it will load the spkRast.npz in the parent folder to see how many target files to expect
It will then iteratively look for each target file, noting any that are missing
If all exist, it will collapse them into a network with FDR (false discovery rate) correction and one without, saving these as fullNet_FDR.p and fullNet.p respectively, and then move the target files to a subfolder called 'targFiles' to clean up
If they don't all exist, it will give a report as to what is missing and move along
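The saved networks can then be pulled into downstream analyses; assuming the .p files are standard Python pickles (which the .p extension suggests), loading looks like this:

import pickle

# Load the combined network (use fullNet.p for the uncorrected version)
with open('fullNet_FDR.p', 'rb') as f:
    fullNet = pickle.load(f)
print(type(fullNet))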