The GREENE cluster uses Intel CPUs, so your code can benefit from the Intel distribution of Python. Look at Intel's benchmarks to see whether this would be useful for you.
The Intel distribution of Python, together with the daal4py library, may make your ML code much faster!
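As a rough sketch of how this is typically used (this assumes the installed daal4py build provides the scikit-learn patching API, daal4py.sklearn.patch_sklearn; check your version):

# Sketch only: assumes daal4py's scikit-learn patching is available in your build.
from daal4py.sklearn import patch_sklearn
patch_sklearn()  # replaces supported scikit-learn estimators with accelerated versions

import numpy as np
from sklearn.cluster import KMeans  # import after patching

X = np.random.rand(100_000, 10)
KMeans(n_clusters=8, n_init=10).fit(X)  # now runs on the accelerated backend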
In order to experiment with it on the cluster, use
module use /share/apps/intel/19.1.2/intelpython3/bin
Note: the command below will show the available environment modules for Python built with the Intel compilers. However, this is not the Intel distribution of Python.
module avail python/intel/3.8.6
You have several options to make your computations parallel:
Independent tasks
If you can separate your large job into several independent tasks, do that and submit a corresponding number of jobs.
Grid Search across parameter space
If you can, run the grid search as independent jobs on separate nodes (see the Python sketch after this list).
ML/Data processing algorithms supporting multiple CPUs on single node
ML/Data processing algorithms supporting multiple CPUs on multiple nodes
This usually requires the algorithm to be compiled with MPI support.
Splitting data
Some algorithms support splitting the data: computations are performed on independent chunks of data and the results are then combined.
Splitting features
Some algorithms support splitting by features: computations are performed on independent subsets of features and the results are then combined.
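For example, a grid search can be split across a SLURM job array so that each array task fits one parameter combination on its own node, while the estimator uses the CPUs allocated to that task. The sketch below is purely illustrative (the file name, parameter grid, and estimator are our own choices, not from the cluster documentation):

# grid_point.py -- submit as a job array, e.g. sbatch --array=0-5 ...
import os
from itertools import product

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Full parameter grid; each array task evaluates exactly one combination.
grid = list(product([100, 200, 400], [None, 10]))          # (n_estimators, max_depth)
task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))  # which combination this task handles
n_estimators, max_depth = grid[task_id]

X, y = load_iris(return_X_y=True)

# Use only the CPUs SLURM allocated to this task on a single node.
n_cpus = int(os.environ.get("SLURM_CPUS_PER_TASK", "1"))
clf = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth, n_jobs=n_cpus)

score = cross_val_score(clf, X, y, cv=3).mean()
print(f"task {task_id}: n_estimators={n_estimators}, max_depth={max_depth}, score={score:.3f}")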
It is always good to look at the options listed in the official documentation: here and here.
If you use Python thread-based parallelism, note that Python uses the GIL (Global Interpreter Lock): at any one time only a single thread can hold the lock and execute Python code. Threads share global objects, but CPU-bound work will not run in parallel; execution simply jumps from one thread to another.
If you use Python multiprocessing, there is no shared access to global variables at all: objects are pickled and passed to the independent worker processes.
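A minimal illustration of process-based parallelism (the task function and sizes are our own, purely illustrative):

import os
from multiprocessing import Pool

def cpu_bound_task(n):
    # Toy CPU-bound work: sum of squares up to n.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    # Use the number of CPUs SLURM allocated to this job, falling back to 1.
    n_workers = int(os.environ.get("SLURM_CPUS_PER_TASK", "1"))

    inputs = [10_000_000] * n_workers
    # Each input is pickled and sent to a separate worker process;
    # workers do not share the parent's global variables.
    with Pool(processes=n_workers) as pool:
        results = pool.map(cpu_bound_task, inputs)

    print(results)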
There are other tools addressing the parallelization task, such as Ray, mpi4py, and dask-mpi. We have examples for a couple of options you may find useful:
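These are not the cluster's own examples, but for a flavor of multi-node parallelism from Python, a minimal mpi4py sketch looks like this (launched with srun or mpirun; the file name is our own):

# hello_mpi.py -- run with e.g. srun python hello_mpi.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # this process's index
size = comm.Get_size()   # total number of MPI processes

# Each rank could work on its own slice of the data here.
print(f"Rank {rank} of {size} is running")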
A well-written overview of parallelization options for R is presented here: https://www.glennklockwood.com/data-intensive/r/
Note: it is important to tell R the correct number of cores, matching what you requested from SLURM.
A wrong way: asking R for every core it can see (for example with detectCores() from the parallel package).
No matter how many cores you requested from SLURM, this will instruct R to use all the cores on the node (20, 28, or even 40). A job that requests a certain number of CPUs but uses more or fewer cores than requested will be detected by the monitoring tools and killed.
An acceptable method: read the number of cores actually allocated to your job from the SLURM_CPUS_PER_TASK environment variable (Sys.getenv("SLURM_CPUS_PER_TASK") in R).
Also set OMP_NUM_THREADS to 1. Some R algorithms will still try to use more cores if OMP_NUM_THREADS is not set. Thus, please do
export OMP_NUM_THREADS=1
in your SLURM batch script, or equivalently
Sys.setenv(OMP_NUM_THREADS = "1")  # inside R script

Spark provides an alternative way to distribute computations across multiple CPUs and multiple nodes. Spark comes with Spark ML - a collection of ML algorithms accessible from pyspark and sparklyr.
You can read more information here: Spark, MLlib, PySpark, sparklyr
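For a flavor of what Spark ML code looks like from Python, here is a minimal pyspark sketch (the toy data and column names are our own, purely illustrative):

from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler

spark = SparkSession.builder.appName("spark-ml-example").getOrCreate()

# Toy DataFrame; in practice you would read data with spark.read.csv(...) or similar.
df = spark.createDataFrame(
    [(0.0, 1.0, 0.0), (1.0, 0.5, 1.0), (0.2, 0.9, 0.0), (0.9, 0.1, 1.0)],
    ["x1", "x2", "label"],
)
features = VectorAssembler(inputCols=["x1", "x2"], outputCol="features").transform(df)

# Spark distributes the fitting work across the executors of the Spark cluster.
model = LogisticRegression(featuresCol="features", labelCol="label").fit(features)
print(model.coefficients)

spark.stop()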
Some may also find useful the algorithms provided by H2O, which can be run with Sparkling Water (H2O on top of Spark): Sparkling Water
Example: Spark interactive: Scala, Python, R