Speed up your calculations by Intel's Python Distribution
Introduction
Python is quite slow for mathematical and scientific calculations compared to compiled languages such as C and C++. To speed up such calculations, Intel's software team has been developing its own Python distribution aimed at making Python faster.
The Intel software suite is built on top of Anaconda's Python distribution, and both distributions already ship with the Intel MKL library. Intel also provides other libraries, such as the Data Analytics Acceleration Library (DAAL) and Threading Building Blocks (TBB), among others. In this post, I show step by step how to benchmark Python performance under the Anaconda and Intel distributions.
Required packages for benchmarking
1. Environment
I suggest creating a new environment for this project. For example:
conda create -n intel python=3
activate intel
2. Dask
pip install dask
pip install dask[array]
or
conda install dask
3. TBB
pip install tbb4py
or
conda install tbb
Tip: Full installation of Intel Distribution for Python can be found here.
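Before running the benchmark, it can help to confirm that the packages are importable from the active environment. The following is a minimal sketch using only the standard library; the helper name missing_modules is hypothetical, and it assumes the import names are dask and tbb (the same names used later in this post).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names: "dask" for Dask and "tbb" for the TBB Python module.
required = ["dask", "tbb"]
missing = missing_modules(required)
if missing:
    print("Not installed in this environment:", ", ".join(missing))
else:
    print("All required packages are available.")
```

If a name is reported as missing, install it with one of the commands above and run the check again.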
Benchmarker
QR decomposition will be used here as the test workload for comparing the performance of Anaconda's Python distribution and Intel's Python distribution.
QR decomposition factors a matrix A into the product of a matrix Q with orthonormal columns (i.e. (Q^T)(Q) = I) and an upper triangular matrix R. Hence: A=QR. For a square matrix A, Q is an orthogonal matrix; the factorization also exists for tall rectangular matrices such as the one used in this benchmark.
Create a Python script, for example bench.py, and add the following source code to it:
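The properties above can be checked numerically on a small example. This is only an illustrative sketch with NumPy (not part of the benchmark); the matrix size and the random seed are arbitrary choices. It uses a tall matrix, for which Q has orthonormal columns rather than being a full square orthogonal matrix.

```python
import numpy as np

# Small tall matrix; size and seed are arbitrary, for illustration only.
rng = np.random.default_rng(0)
a = rng.random((6, 3))

# Default "reduced" mode: q is 6x3 with orthonormal columns, r is 3x3.
q, r = np.linalg.qr(a)

orthonormal = np.allclose(q.T @ q, np.eye(3))   # (Q^T)(Q) = I
upper = np.allclose(r, np.triu(r))              # R is upper triangular
reconstructs = np.allclose(a, q @ r)            # A = QR

print(orthonormal, upper, reconstructs)
```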
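import time
import dask.array as da

# 100000 x 2000 random matrix, split into ten row blocks of 10000 x 2000
x = da.random.random((100000, 2000), chunks=(10000, 2000))
t0 = time.time()
q, r = da.linalg.qr(x)
# Check that Q times R reproduces x; nothing is computed until compute() is called
test = da.all(da.isclose(x, q.dot(r)))
assert test.compute()  # compute(scheduler="threads") by default
print(time.time() - t0)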
This script generates a matrix of random numbers and then computes its QR factorization with Dask's QR function (its source code is here), which is equivalent to NumPy's QR function.
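A single timing like the one in the script can be noisy. One possible refinement, sketched below with the standard library only, is to repeat the measurement and look at the minimum and median. The helper time_repeated and the stand-in workload function are hypothetical names introduced for illustration; in practice the workload would be the QR computation from bench.py.

```python
import statistics
import time


def time_repeated(func, repeats=3):
    """Call func() `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def workload():
    # Cheap stand-in for the QR benchmark so the sketch runs quickly.
    sum(i * i for i in range(100_000))


durations = time_repeated(workload, repeats=5)
print("min: %.4f s, median: %.4f s" % (min(durations), statistics.median(durations)))
```

time.perf_counter() is used here instead of time.time() because it is a monotonic, high-resolution clock intended for measuring short intervals.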
My machine's specifications
- Software: Windows 10
- Processor: Seventh Gen Intel Core i5-7300U
- Storage: 256GB Solid State Drive (SSD)
- Memory: 8GB RAM, 1866 MHz LPDDR3
Running the test
1. Restart your machine and close all other applications before running the tests.
2. Open cmd (or terminal for Linux and macOS users).
3. Assuming you have already activated the new environment, you are ready to execute the script.
Anaconda distribution for Python
python bench.py
Intel distribution for Python
python -m tbb bench.py
Note that the test can take 1-2 minutes depending on the system.
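When switching between distributions, it is easy to lose track of which interpreter is actually running. The sketch below prints the interpreter's version banner and makes a rough guess at the vendor. It rests on an assumption: that the build embeds a vendor name such as "Intel Corporation" or "Anaconda" in sys.version, which is common but not guaranteed. The function guess_distribution is a hypothetical helper, not part of either distribution.

```python
import sys


def guess_distribution(version_string):
    """Heuristically guess the vendor of a Python build from its version banner.

    Assumption: the vendor name appears somewhere in the banner text.
    """
    banner = version_string.lower()
    if "intel" in banner:
        return "intel"
    if "anaconda" in banner or "conda" in banner:
        return "anaconda"
    return "unknown"


print(sys.version)
print("looks like:", guess_distribution(sys.version))
```

If the result is "unknown", inspect the printed banner and the path in sys.executable manually rather than relying on this heuristic.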
Benchmark results
1. Computation time
- Anaconda Python distribution
116.9416298866272
- Intel Python distribution
91.34923768043518
In this run, the Intel Python distribution finished in about 22% less time than Anaconda (91.3 s versus 116.9 s), a speedup of roughly 1.28x.
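The relative difference can be worked out directly from the two timings reported above. This small sketch computes both the percentage of time saved and the speedup factor; the variable names are chosen here for illustration.

```python
# Timings in seconds, copied from the benchmark results above.
anaconda_seconds = 116.9416298866272
intel_seconds = 91.34923768043518

# Fraction of the Anaconda run time that the Intel run saved, as a percentage.
time_saved_pct = (anaconda_seconds - intel_seconds) / anaconda_seconds * 100

# How many times faster the Intel run was.
speedup = anaconda_seconds / intel_seconds

print(f"time reduced by {time_saved_pct:.1f}%, speedup {speedup:.2f}x")
```

Note that these figures come from a single run on one machine, so they should be read as indicative only.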
2. CPU usage before running the test
3. CPU usage during running the test
Other resources that are worth reading
References
- https://software.intel.com/en-us/distribution-for-python
- www.google.com (one of my best advisers)
Rangsiman Ketkaew