Introduction

DistNumPy is a branch of NumPy that targets distributed-memory architectures. The idea is to exploit distributed-memory architectures when running a sequential Python program that uses the NumPy module. To accomplish this, we have introduced a new array type in NumPy called the Distributed NumPy Array.

NumPy is a numerical computation module for Python, implemented mainly in C. It is the fundamental package for scientific computing with Python. NumPy provides a powerful N-dimensional array object that supports a broad range of numerical operations. By using array operations instead of scalar operations, a program can achieve a significant performance boost.
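To illustrate the difference between scalar operations and array operations, the sketch below computes the same sum of squares both ways in standard NumPy (not DistNumPy). The function names are our own, and the timings it prints depend entirely on the machine, so no particular speedup is claimed here.

```python
import time

import numpy as np

def sum_of_squares_loop(values):
    # Scalar operations: the Python interpreter visits one element per iteration.
    total = 0.0
    for v in values:
        total += v * v
    return total

def sum_of_squares_array(values):
    # Array operations: np.square and .sum() each run as a single loop in C.
    return float(np.square(values).sum())

values = np.random.default_rng(0).random(200_000)

t0 = time.perf_counter()
loop_result = sum_of_squares_loop(values)
t1 = time.perf_counter()
array_result = sum_of_squares_array(values)
t2 = time.perf_counter()

print(f"scalar loop: {loop_result:.6f} in {t1 - t0:.4f} s")
print(f"array ops:   {array_result:.6f} in {t2 - t1:.4f} s")
```

Both versions compute the same value up to floating-point rounding, since NumPy's sum may add the terms in a different order than the Python loop.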

NumPy supports parallelization to some degree, but only in a shared-memory environment. However, many scientific computations run on large distributed-memory machines, mainly because of their computational and memory demands. In these cases, the programmer has to implement the communication between processes manually, so the parallelized program ends up significantly different from the sequential one. We have eliminated this difference by introducing a distributed version of the N-dimensional array object. All operations on this new array utilize all available processes, and the array itself is distributed among multiple processes, making it possible to work with arrays larger than a single machine's memory would otherwise allow.
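As a rough illustration of what it means for an array to be distributed among processes, the sketch below simulates, inside a single Python process, a one-dimensional block-cyclic layout in which fixed-size blocks of the array are dealt round-robin to the processes. The layout, the block size, and the helpers block_cyclic_owner and scatter are assumptions made for illustration only; they are not DistNumPy's API, and DistNumPy's actual distribution scheme may differ.

```python
def block_cyclic_owner(index, block_size, nprocs):
    """Map a global index to (rank, local_index) under a 1-D block-cyclic layout."""
    block = index // block_size        # which block holds the element
    rank = block % nprocs              # blocks are dealt round-robin to processes
    local_block = block // nprocs      # position of that block on its owning process
    return rank, local_block * block_size + index % block_size

def scatter(values, block_size, nprocs):
    """Split a global sequence into per-process local lists (simulated, no real MPI)."""
    local = [[] for _ in range(nprocs)]
    for i, v in enumerate(values):
        rank, _ = block_cyclic_owner(i, block_size, nprocs)
        local[rank].append(v)          # global order matches local order on each rank
    return local

parts = scatter(list(range(10)), block_size=2, nprocs=3)
print(parts)  # [[0, 1, 6, 7], [2, 3, 8, 9], [4, 5]]

# An elementwise operation touches only local data, so no communication is needed.
squared = [[v * v for v in part] for part in parts]

# A reduction runs locally first; the partial results must then be combined,
# which in a real distributed system is where processes exchange messages.
partial_sums = [sum(part) for part in squared]
total = sum(partial_sums)
print(total)  # 285
```

In this simplified model, elementwise work such as squaring stays entirely local, while a reduction needs one extra combining step across processes.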

The only API difference between our branch and the original NumPy is an extra parameter added to all array creation routines. This parameter, dist, specifies whether the array should be distributed (dist=True) or not.
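Because distribution is requested through a single keyword, one way to exercise a DistNumPy-style program on standard NumPy during development is a thin wrapper that accepts and ignores that keyword. The wrapper zeros below is hypothetical and defined here; it is neither the DistNumPy nor the NumPy API, and it performs no distribution at all.

```python
import numpy as np

def zeros(shape, dtype=float, dist=False):
    """Hypothetical wrapper mirroring a DistNumPy-style creation call on plain NumPy.

    Standard NumPy has no distributed arrays, so `dist` is accepted only for
    source compatibility and ignored: the result is always a local ndarray.
    """
    return np.zeros(shape, dtype=dtype)

# The same call site works whether or not distribution is requested.
a = zeros((4, 4), dist=True)
b = zeros((4, 4), dist=False)
print(type(a).__name__, a.shape, (a == b).all())
```

The design choice here is to keep the call sites identical to what a DistNumPy program would use, so that only the module providing zeros changes between environments.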

NB: This project is continued by the Bohrium project: http://www.bh107.org

Monte Carlo example

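The following program is a straightforward Monte Carlo computation of Pi, written in Python using our NumPy branch. Points (x, y) drawn uniformly from the unit square fall inside the quarter circle x^2 + y^2 <= 1 with probability Pi/4, so four times the fraction of such points is an estimate of Pi.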

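import numpy as np
import time

def MC(c, s):
    start = time.time()
    sum = 0.0
    for i in range(c):
        # Draw fresh points every iteration; reusing x and y would square the
        # previous iteration's 0/1 results instead of new random samples.
        x = np.random([s], dtype=np.float, dist=True)
        y = np.random([s], dtype=np.float, dist=True)
        np.square(x, x)                   # x = x**2, in place
        np.square(y, y)                   # y = y**2, in place
        np.add(x, y, x)                   # x = x**2 + y**2
        np.less_equal(x, 1.0, x)          # x = 1.0 inside the quarter circle, else 0.0
        sum += np.add.reduce(x) * 4.0 / s # this iteration's estimate of Pi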
    stop=time.time()
    print 'Performance of Monte Carlo (in sec): ', stop-start,