multiprocessing is Python's standard module for parallel computing. It lets you run the same function in multiple worker processes. The most common use case in data analysis is to apply the same processing to multiple years, ensemble members, or model outputs. In that case, each calculation only needs to write its results to a file, so no communication between processes is required.
In parallel computing with multiprocessing, you define a function and call that function in parallel.
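As a minimal sketch of this pattern (the function square_one_value and its inputs here are illustrative only, not part of the example that follows):

import multiprocessing

def square_one_value(x):
    print(x * x)  # each worker process prints its own result

if __name__ == "__main__":  # guard needed when the start method is "spawn" (e.g. Windows, macOS)
    jobs = []
    for x in [1, 2, 3]:
        job = multiprocessing.Process(target=square_one_value, args=(x,))
        jobs.append(job)
        job.start()
    for job in jobs:
        job.join()  # wait for all worker processes to finish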
One important point: if an error occurs inside the parallel portion of multiprocessing, it can be difficult to trace. It is therefore recommended to include an option in the script that switches parallelization on and off. If an error occurs, debug the script without parallelization, resolve the issue, and then rerun the parallel computation. The following example divides the calculation across ensemble members (the same pattern works for years or model outputs). The actual calculation is done in the calc_for_one_member function, which can be used for both parallel and serial execution.

import numpy as np
import multiprocessing
import os
import pdb

is_parallel = True   # switch between parallel and serial execution
nice_value = 20      # large nice value to lower the priority of child processes
n_parallel = 9       # number of parallel processes

def calc_for_one_member(member):
    if is_parallel:
        os.nice(nice_value)  # <- reduce priority by using a large nice value
    # ... main body of the computation ...
    if not is_parallel:
        pdb.set_trace()  # <- if needed, stop with the Python debugger (usable only in serial runs)

members = np.arange(29)  # example of member generation
len_members = len(members)
n_round = int(np.ceil(len_members / n_parallel))  # number of rounds of up to n_parallel jobs

for i_round in range(n_round):
    i_st = i_round * n_parallel  # index start
    i_ed = i_st + n_parallel     # index end
    if is_parallel:
        jobs = []
        for member in members[i_st:i_ed]:
            job = multiprocessing.Process(target=calc_for_one_member, args=(member,))
            jobs.append(job)
            job.start()
        for job in jobs:
            job.join()  # wait for the current round of processes to finish
    else:
        for member in members[i_st:i_ed]:
            calc_for_one_member(member)
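If the per-process setup in calc_for_one_member (os.nice, the pdb breakpoint) is not essential, the round-by-round batching above can also be delegated to multiprocessing.Pool, which keeps a fixed number of worker processes busy. The following is only an alternative sketch; it assumes the definitions of calc_for_one_member, members, and n_parallel from the example above:

with multiprocessing.Pool(processes=n_parallel) as pool:
    pool.map(calc_for_one_member, members)  # distributes the members over n_parallel worker processes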