Speed up your calculations with Intel's Python Distribution

Introduction

It goes without saying that Python is quite slow for mathematical and scientific calculations compared to other programming languages such as C and C++. To close that gap, Intel's software team has been developing its own Python distribution, aimed at making Python faster.

This Intel software suite is built on top of Anaconda's Python distribution, and both distributions already ship with the Intel MKL library. Intel also provides other libraries, such as the Data Analytics Acceleration Library (DAAL) and Threading Building Blocks (TBB), among others. In this post, I show step by step how to benchmark Python performance with Anaconda's distribution against Intel's.

Required packages for benchmarking

1. Environment

I suggest you create a new environment for this project. For example:

conda create -n intel python=3
conda activate intel

2. Dask

pip install dask
pip install dask[array]

or

conda install dask

3. TBB

pip install tbb4py

or

conda install tbb

Tip: Full installation instructions for the Intel Distribution for Python can be found here.
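Before benchmarking, it can help to confirm that the packages above are actually importable from the active environment. The following is a minimal sketch using only the standard library; the module names `dask` and `tbb` are assumptions based on the install commands above (`tbb` being the module that `python -m tbb` runs), and `find_missing` is a hypothetical helper name.

```python
import importlib.util

# Module names assumed for this post: "dask" (from the dask package)
# and "tbb" (the module that `python -m tbb` runs).
REQUIRED_MODULES = ["dask", "tbb"]


def find_missing(modules):
    """Return the names in `modules` that cannot be located for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```

If a module is reported as missing, install it with one of the commands listed above and run the check again.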

Benchmarker

The QR decomposition will be used here as the test workload for comparing the performance of Anaconda's Python distribution and Intel's Python distribution.

For a square matrix A, the QR decomposition factors A into the product of an orthogonal matrix Q (i.e. Q^T Q = I) and an upper triangular matrix R, so that A = QR.

Create a Python script, for example bench.py, and add the following source code to it:
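To make these properties concrete, here is a small sketch that uses NumPy's `numpy.linalg.qr` on a 4 x 4 random matrix and checks each one numerically. The matrix size and the seed are arbitrary choices for illustration and are not part of the benchmark.

```python
import numpy as np

# Arbitrary small example; the size and seed are illustrative assumptions.
rng = np.random.default_rng(0)
a = rng.random((4, 4))

q, r = np.linalg.qr(a)

# Q is orthogonal: Q^T Q should be the identity (up to rounding error).
q_is_orthogonal = np.allclose(q.T @ q, np.eye(4))
# R is upper triangular: everything below the diagonal is zero.
r_is_upper = np.allclose(r, np.triu(r))
# The product QR reproduces A.
reconstructs_a = np.allclose(q @ r, a)

print(q_is_orthogonal, r_is_upper, reconstructs_a)
```

The comparisons use `np.allclose` rather than exact equality because floating-point rounding makes the products only approximately equal to the identity and to A.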

import time
import dask.array as da

# 100000 x 2000 random matrix, split into ten 10000 x 2000 chunks
x = da.random.random((100000, 2000), chunks=(10000, 2000))
t0 = time.time()

# Build the (lazy) QR factorization and a check that QR reproduces x
q, r = da.linalg.qr(x)
test = da.all(da.isclose(x, q.dot(r)))
assert test.compute()  # compute(scheduler="threads") by default

# Elapsed wall-clock time in seconds
print(time.time() - t0)

This script generates a matrix of random numbers, computes its QR factorization (source code of Dask's QR function is here), and checks that the product QR reproduces the original matrix. Dask's function is the counterpart of NumPy's QR function, numpy.linalg.qr.

My machine's specifications

  • Operating system: Windows 10
  • Processor: 7th Gen Intel Core i5-7300U
  • Storage: 256GB Solid State Drive (SSD)
  • Memory: 8GB RAM, 1866MHz LPDDR3

Running the test

1. Restart your machine and close all other applications before running the tests.

2. Open cmd (or a terminal on Linux and macOS).

3. Make sure the new environment is activated. You are now ready to execute the script.

Anaconda distribution for Python

python bench.py

Intel distribution for Python

python -m tbb bench.py


Note that the test can take 1-2 minutes depending on the system.
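A single run can be noisy, so one option is to repeat each command a few times and compare the medians. The sketch below is one way to do that with the standard library; `time_command` is a hypothetical helper, and the placeholder command is only there so the sketch runs anywhere. Note that it measures the whole process, including interpreter start-up and imports, so its numbers will be somewhat larger than the time printed by bench.py itself.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, repeats=3):
    """Run `cmd` `repeats` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)  # raises if the command fails
        timings.append(time.perf_counter() - start)
    return timings


# Placeholder command so the sketch runs anywhere; replace it with
# [sys.executable, "bench.py"] or [sys.executable, "-m", "tbb", "bench.py"].
cmd = [sys.executable, "-c", "pass"]
timings = time_command(cmd, repeats=3)
print("median seconds:", statistics.median(timings))
```

Using `sys.executable` keeps the child process in the same environment as the script that launches it.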

Benchmark results

1. Computation time

  • Anaconda Python distribution: 116.9416298866272 seconds
  • Intel Python distribution: 91.34923768043518 seconds


In this run, the Intel Python distribution took about 22% less time than Anaconda's, which makes it roughly 1.28 times faster.
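For reference, the relative difference can be worked out directly from the two timings reported above. This short sketch shows the arithmetic; the variable names are chosen here for illustration.

```python
# Timings in seconds, copied from the results above.
t_anaconda = 116.9416298866272
t_intel = 91.34923768043518

time_saved = (t_anaconda - t_intel) / t_anaconda  # fraction of time saved
speedup = t_anaconda / t_intel                    # how many times faster

print(f"time saved: {time_saved:.1%}")
print(f"speed-up:   {speedup:.2f}x")
```

The two numbers describe the same result in different ways: the first is the reduction in run time, the second is the ratio of the two run times.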

2. CPU usage before running the test

3. CPU usage during running the test

Rangsiman Ketkaew