In addition to the above-mentioned algorithmic advances, the other two factors responsible for the success of DL are the availability of huge amounts of data and of computing power. DL requires specialised hardware with low-latency interconnects for accelerated computing, i.e. massively parallel architectures that extend the Single Instruction Multiple Data (SIMD) paradigm with large-scale multi-threading, streaming memory and dynamic scheduling. Better hardware would allow training to scale beyond current data volumes and enable bigger and more accurate models. The current mainstream solution [NVidiaAC] has been to use Graphics Processing Units (GPUs) as general-purpose processors (GPGPU). GPUs provide massive parallelism for large-scale DM problems, allowing algorithms to scale vertically to data volumes not computable by traditional approaches [Cano 2017]. GPUs are effective solutions for real-world and real-time systems requiring very fast decisions and learning, such as DL, especially for image processing. Besides that, the use of Field Programmable Gate Arrays (FPGAs) [Lacey 2016] and the recently announced Google TPU2 (second-generation Tensor Processing Unit) for inference and training also constitute interesting alternatives [TPU2 2017]. Other IT companies have also started to offer dedicated hardware for DL acceleration, e.g. Kalray with its second-generation DL acceleration device, the MPPA2-256 Bostan, oriented towards mobile devices such as autonomous cars [Kalray 2017]. Vertical scalability for large-scale data mining is still limited by GPU memory capacity, currently up to 16 GB on the NVidia Pascal architecture. Multi-GPU and distributed-GPU solutions are used to combine hardware resources to scale out to bigger data (data parallelism) or bigger models (model parallelism). Integration of MapReduce frameworks with GPU computing may overcome many of these performance limitations and remains an open challenge for future research [Cano 2017].
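The distinction between data parallelism and model parallelism mentioned above can be sketched in a few lines of NumPy. This is a minimal CPU simulation: the "devices", function names and shapes are illustrative assumptions, not the API of any particular framework.

```python
import numpy as np

def forward(W, x):
    """One linear layer: y = W @ x."""
    return W @ x

def data_parallel_step(W, batch, targets, lr=0.1, n_devices=2):
    """Data parallelism: each 'device' receives a shard of the batch,
    computes a local gradient, and the gradients are averaged
    (simulating an all-reduce) before one shared update."""
    x_shards = np.array_split(batch, n_devices, axis=1)
    t_shards = np.array_split(targets, n_devices, axis=1)
    grads = []
    for xs, ts in zip(x_shards, t_shards):
        err = forward(W, xs) - ts               # local error on this shard
        grads.append(err @ xs.T / xs.shape[1])  # local gradient
    g = np.mean(grads, axis=0)                  # all-reduce: average gradients
    return W - lr * g

def model_parallel_forward(W_top, W_bottom, x):
    """Model parallelism: the layer's rows live on different 'devices';
    each computes its slice of the output, then the results are gathered."""
    y_top = W_top @ x        # rows held on device 0
    y_bottom = W_bottom @ x  # rows held on device 1
    return np.concatenate([y_top, y_bottom], axis=0)
```

With equal-sized shards, the averaged data-parallel gradient matches the full-batch gradient, and the gathered model-parallel output matches the single-device forward pass; the two strategies differ only in whether the batch or the weights are partitioned across devices.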