The main feature of many-core accelerators such as GPUs is their massively parallel architecture, which allows them to speed up computations involving matrix-based operations, which lie at the heart of many ML/DL implementations. Manufacturers often offer the possibility to enhance a hardware configuration with many-core accelerators to improve machine/cluster performance, as well as accelerated libraries that provide highly optimized primitives, algorithms and functions for accessing the massively parallel power of GPUs (Table 2).
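To illustrate why such primitives matter, the sketch below runs a general matrix multiplication (GEMM), the operation that accelerated libraries such as cuBLAS hand off to the GPU. NumPy on the CPU is used here purely as a stand-in; the array sizes and names are illustrative and not taken from any library mentioned above.

```python
# Minimal sketch: GEMM (general matrix multiplication) is the core
# primitive that accelerated libraries optimize on GPUs; NumPy's CPU
# implementation stands in for it here.
import numpy as np

rng = np.random.default_rng(0)
inputs = rng.standard_normal((256, 512))    # e.g. a batch of 256 samples
weights = rng.standard_normal((512, 128))   # e.g. one dense layer's weights

# A dense layer's forward pass is essentially a single GEMM:
outputs = inputs @ weights
print(outputs.shape)  # (256, 128)
```

On a GPU the same call would be dispatched to a tuned kernel (e.g. via cuBLAS), which is where the speed-up for ML/DL workloads comes from.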
Other parallel programming libraries that support computational speed-up are:
An application built with a hybrid model of parallel programming can run on a computer cluster using both OpenMP and MPI: OpenMP provides parallelism within a (multi-core) node, while MPI provides parallelism between nodes. More details about accelerators and accelerated computing will be available in Deliverable D4.1, Available Technologies for Accelerators and HPC. In recent years, accelerators have been successfully used in many areas, e.g. text, image and sound processing and recognition, and life simulations, as ML/NN/DL applications. Applicable areas of DNNs are: