Matlab Parallel Jobs
Simple PARFOR
In the MATLAB script, replace for loops with parfor loops. Note that parfor loops cannot be nested inside one another: in a nested loop, only one level (usually the outermost) can be a parfor.
In MATLAB script:
for i=1:n => parfor i=1:n
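As a minimal sketch of the nested-loop restriction, the example below parallelizes only the outer loop and keeps the inner loop as an ordinary for. The variable names and loop bounds are illustrative assumptions, not taken from any site script.

```matlab
% Sketch: outer loop runs in parallel, inner loop stays serial.
n = 8;                      % number of outer iterations (assumed value)
m = 100;                    % number of inner iterations (assumed value)
total = zeros(1, n);        % sliced output: iteration i writes only total(i)
parfor i = 1:n
    s = 0;                  % temporary variable, private to each iteration
    for j = 1:m             % ordinary for loop inside the parfor
        s = s + i * j;
    end
    total(i) = s;
end
disp(total);
```

Each iteration writes only to its own element of total, which is what lets MATLAB distribute the iterations across workers.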
In SLURM script:
Reserve the whole node. The number of processors (-c) and the amount of memory (--mem) depend on the node type selected by the keyword (-C). Visit Server & Storage for details.
#SBATCH -N 1 -c <x> -C <keyword> --mem=<m>gb
Inherently Parallel Jobs
Some MATLAB operations and functions are inherently multithreaded and try to use as many cores as possible. Follow the job script rule suggested for simple PARFOR. You can try the sample example "solveEq.m" at /usr/local/doc/MATLAB.
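If you want to see how many computational threads MATLAB will use, or cap them at the number of cores requested with -c, a sketch along the following lines may help. It assumes the job was submitted with -c so that SLURM sets the SLURM_CPUS_PER_TASK environment variable; the matrix size is an arbitrary illustration.

```matlab
% Sketch: align MATLAB's thread count with the cores requested from SLURM.
nStr = getenv('SLURM_CPUS_PER_TASK');   % empty outside a SLURM job or without -c
nCores = str2double(nStr);              % NaN if the variable is empty
if ~isnan(nCores) && nCores >= 1
    maxNumCompThreads(nCores);          % cap the number of computational threads
end
fprintf('MATLAB will use up to %d computational threads\n', maxNumCompThreads);

A = rand(2000);                         % assumed problem size
tic;
B = A * A;                              % matrix multiply is implicitly multithreaded
toc;
```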
MDCS (MATLAB Distributed Computing Server)
Configuration and Validation:
It is a good idea to create a separate directory for each MATLAB version under the matlab directory in your home directory.
mkdir -p /home/<caseID>/matlab/<version>
Copy the appropriate SLURMProfile20xxb.settings file from the /usr/local/doc/MATLAB directory into it.
cd /home/<caseID>/matlab/<version>
cp /usr/local/doc/MATLAB/<SLURMProfile> .
Open MATLAB GUI (follow the Interactive Job Submission section above)
Click on the Parallel drop-down menu and select "Cluster Profile Manager". The menu names may differ between MATLAB versions.
Click on "Import" and select the SLURM Profile File.
Click on the edit button and make the following changes:
Replace each <caseID> entry with your caseID.
In the Resource List parameters, adjust the resources to suit your needs.
Click Validate; all stages should pass. If validation fails, make sure your changes were saved correctly and validate again. If the SLURM validation stage fails, refer to the Troubleshooting section.
(Note: Use as few workers as possible, because the number of available MDCS licenses is limited. Start with 2 workers. Each worker checks out one MDCS license.)
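The import step above can also be sketched from the MATLAB command line. The directory and file name below are assumptions (a "R2015b" version directory holding "SLURMProfile2015b.settings"); adjust them to wherever you copied the profile. This sketch does not edit the <caseID> entries or run the validation stages, which still need to be done in the Cluster Profile Manager.

```matlab
% Sketch: import a cluster profile and make it the default.
% Hypothetical location; change to your own version directory and file name.
profileFile = fullfile(getenv('HOME'), 'matlab', 'R2015b', ...
                       'SLURMProfile2015b.settings');
profileName = parallel.importProfile(profileFile);  % returns the imported profile's name
parallel.defaultClusterProfile(profileName);        % parcluster now uses this profile
c = parcluster;                                     % cluster object for the default profile
disp(c);                                            % inspect the cluster settings
```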
Distributed Jobs
In a distributed MDCS job, workers (processors) compute the different tasks of the job, possibly on different nodes.
Interactive Submission:
Copy the distributed script "createjob.m" from /usr/local/doc/MATLAB which looks like the following.
%MATLAB Distributed Job
myCluster = parcluster; % Cluster object for the default cluster profile
j = createJob(myCluster);
createTask(j,@rand,1,{{1} {2} {3}}); % Three rand tasks with three different inputs
submit(j); % Tasks may be submitted to different nodes
wait(j); % Wait for the job to complete
results = fetchOutputs(j); % Get the results (fetchOutputs replaces getAllOutputArguments for parcluster jobs)
celldisp(results); % Display the results
delete(j); % Remove the job and its data (delete replaces destroy for parcluster jobs)
Open the MATLAB terminal following the Interactive Job Submission Procedure above. Make sure that you are in the directory where your "createjob.m" file is and then type:
createjob
outputs:
results{1} =
0.3246
....
In this example, we use MATLAB's built-in rand function. For a user-defined function, you need to add your script, or the path to the directory that contains it, to your cluster profile under "Files and Folders". Your createTask statement then looks like this:
createTask(j,@<YOURFUNCTION>,<# of outputs>,{{<task1-input>} {<task2-input>} {...}});
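As a rough illustration of the user-defined case, the sketch below attaches the function file to the job directly instead of editing the profile. The function name myFunc and its file myFunc.m are hypothetical; the file is assumed to sit in the current directory and to take one input and return one output.

```matlab
% Sketch: distributed job with a user-defined task function.
% myFunc.m is a hypothetical file, e.g.:  function y = myFunc(x), y = x.^2; end
c = parcluster;                                   % default cluster profile
j = createJob(c, 'AttachedFiles', {'myFunc.m'});  % ship the function file to the workers
createTask(j, @myFunc, 1, {{1} {2} {3}});         % three tasks, one input each
submit(j);
wait(j);                                          % block until all tasks finish
results = fetchOutputs(j);                        % one row of outputs per task
celldisp(results);
delete(j);                                        % remove the job and its data
```

Attaching the file per job keeps the profile unchanged, which may be convenient when different jobs use different functions.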
BATCH Submission:
To test, you can copy the dependency file "primeNumbersDist_serial.m" and the distributed job file "primeDist_serial.m" from /usr/local/doc/MATLAB. The script "primeNumbersDist_serial.m" counts the prime numbers up to a given upper bound.
The SLURM script for batch submission looks similar to the one for the Monte Carlo Method. If you want to run a distributed job with a user-defined function as a batch submission, copy the SLURM script "primeDist.slurm" from /usr/local/doc/MATLAB and submit it using:
sbatch primeDist.slurm
Parpool
With parpool, the MATLAB workers run on the same node (shared memory), and each worker processes a share of the loop iterations.
poolObj = parpool(4); % Start a pool of 4 workers
parfor i = lower : upper
...
end
delete(poolObj); % Shut down the pool and release the workers
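A complete, minimal version of the pattern above might look like the following sketch. The loop bounds and the computation (a sum of squares) are arbitrary choices for illustration, and the pool size of 4 assumes at least 4 cores were requested for the job.

```matlab
% Sketch: open a pool, run a parfor with a reduction, close the pool.
poolObj = parpool(4);            % assumes 4 cores are available to the job
lowerBound = 1;                  % illustrative loop bounds
upperBound = 1000;
total = 0;                       % reduction variable accumulated across workers
parfor i = lowerBound:upperBound
    total = total + i^2;         % each worker sums its share of the iterations
end
fprintf('Sum of squares from %d to %d: %d\n', lowerBound, upperBound, total);
delete(poolObj);                 % shut down the pool and release the workers
```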
Example: Central Limit Theorem
The central limit theorem simulation is an example that exhibits slicing. It investigates how well the central limit theorem holds in the deep tails of the distribution, simulating from a t distribution with df = 3. To increase speed and avoid overloading memory, the simulation is divided into a set of batches that are processed with a parfor statement. This shows the basic idea of taking advantage of parallelism. Although the script uses plot functions, the figure is written to a PostScript (.ps) file so that the job can be submitted as a batch job. This is only example code to guide you in writing efficient parallel code.
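The batching idea described above can be sketched as follows. This is not the contents of "central_theorem.m"; the sample size, batch count, batch size, threshold, and output file name are all assumptions, and the t variates are built from randn so that no extra toolbox function is needed.

```matlab
% Sketch: tail probability of a standardized mean of t(3) samples, in batches.
df = 3;                          % degrees of freedom of the t distribution
n = 50;                          % samples per mean (assumed)
nBatches = 20;                   % number of parfor batches (assumed)
batchSize = 5000;                % simulated means per batch (assumed)
threshold = 3;                   % tail cutoff in standard deviations (assumed)
sigma = sqrt(df / (df - 2));     % standard deviation of t(df), valid for df > 2
exceed = 0;                      % reduction variable: count of tail events
parfor b = 1:nBatches
    z = randn(batchSize, n);                      % standard normal numerators
    chi2 = sum(randn(batchSize, n, df).^2, 3);    % chi-square(df) denominators
    t = z ./ sqrt(chi2 / df);                     % t(df) variates
    zbar = mean(t, 2) * sqrt(n) / sigma;          % standardized sample means
    exceed = exceed + sum(abs(zbar) > threshold);
end
pHat = exceed / (nBatches * batchSize);           % simulated two-sided tail probability
pNormal = erfc(threshold / sqrt(2));              % normal approximation P(|Z| > threshold)
fprintf('Simulated: %.5f   Normal approximation: %.5f\n', pHat, pNormal);

fig = figure('Visible', 'off');                   % no display needed in a batch job
bar([pHat pNormal]);
print(fig, '-dpsc', 'tail_sketch.ps');            % hypothetical output file name
close(fig);
```

Processing one batch at a time keeps the memory footprint bounded by the batch size, while the reduction variable collects the counts from all workers.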
Run as a Batch Job
Copy the MATLAB pool script file "central_theorem.m" from /usr/local/doc/MATLAB and create a job script file "runCenTh.slurm" using the template above.
Submit the script:
sbatch runCenTh.slurm
In recent versions of MATLAB (R2014 and later), use "poolObj = parpool(n)" at the beginning of the script and "delete(poolObj)" at the end, where n is the number of workers. If your SLURM profile is not recognized even when it is selected as the default, explicitly provide the full path to the exported profile in the MATLAB script:
myProfile = parallel.importProfile('/home/CaseID/<path-to-slurm-profile-file>/SLURMProfile2015b')
poolObj = parpool(myProfile,4);
To see the plot:
evince-viewer plot_central.ps