Machine Learning (ML), and especially its subfield Deep Learning (DL), has seen remarkable advances in recent years and may lead to technological breakthroughs used by billions of people. Software development in this field is changing fast, with a great number of open-source packages coming from academia, industry, start-ups, and open-source communities. As a new computing model, DL with GPU support is changing how software is developed and how it runs. Nowadays, ML algorithms learn from huge amounts of real-world examples in a variety of formats. DL is about designing and training NNs. After a computationally expensive training phase, NNs can be deployed in data centers to infer, predict, and classify from new incoming data presented to them. Trained NNs can also be deployed into intelligent IoT devices to interpret the world around them. The deployment of trained NNs requires smaller computational resources than the development phase. This new computing model in the Big Data era requires massive data processing and massive parallelism support, capable of scaling computation effectively and efficiently according to actual need.
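To make the train-once, infer-many pattern concrete, the following minimal sketch (assuming PyTorch as the framework; the toy model, random data, and file name are illustrative, not taken from this document) trains a small NN and then reloads the saved weights for lightweight inference:

    import torch
    import torch.nn as nn

    # Illustrative toy model and data; a real workload would be far larger.
    model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
    x = torch.randn(256, 4)          # stand-in for a large training set
    y = torch.randn(256, 1)

    # Training: the computationally expensive phase, typically run on GPUs.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()
    for _ in range(100):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()              # backpropagation dominates the cost
        optimizer.step()

    torch.save(model.state_dict(), "model.pt")   # ship only the trained weights

    # Deployment: inference only, no gradients, far cheaper to run,
    # e.g. in a data-center service or on an IoT device.
    deployed = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
    deployed.load_state_dict(torch.load("model.pt"))
    deployed.eval()
    with torch.no_grad():
        prediction = deployed(torch.randn(1, 4))  # classify/predict new data

The asymmetry is visible in the sketch itself: the training loop repeats forward and backward passes over the whole dataset, while deployment is a single forward pass over each new input.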
ML and DM are research areas of computer science undergoing fast-evolving development due to the advances in data analysis research in the Big Data era. While the number of ML algorithms is extensive and growing, the number of their realizations in ML/NN/DL frameworks and libraries is extensive and growing too. A short summary of the outcomes of this document follows.
It should be noted that using these kinds of tools is not the only way to build compute-intensive applications or to do data analytics and data mining. Self-made code can do the same job. The price of this approach is, of course, the time and effort spent on code development and maintenance.
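As an illustration of this trade-off, a from-scratch alternative to a single library call might look like the following sketch (a minimal, hypothetical logistic-regression trainer using only NumPy; all names and parameters here are illustrative), with the author then owning, testing, and maintaining every line of it:

    import numpy as np

    def train_logistic_regression(X, y, lr=0.1, epochs=1000):
        """Hand-rolled logistic regression via batch gradient descent."""
        w = np.zeros(X.shape[1])
        b = 0.0
        for _ in range(epochs):
            z = X @ w + b
            p = 1.0 / (1.0 + np.exp(-z))        # sigmoid activation
            grad_w = X.T @ (p - y) / len(y)     # gradient of the log-loss
            grad_b = np.mean(p - y)
            w -= lr * grad_w
            b -= lr * grad_b
        return w, b

    # Toy usage: two roughly separable clusters.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-1, 0.5, (50, 2)), rng.normal(1, 0.5, (50, 2))])
    y = np.concatenate([np.zeros(50), np.ones(50)])
    w, b = train_logistic_regression(X, y)
    accuracy = np.mean(((X @ w + b) > 0) == y)  # should approach 1.0

The equivalent with an established library (for example, scikit-learn's LogisticRegression().fit(X, y)) is a single call; the custom version trades that convenience for full control, at the development and maintenance cost noted above.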
There is a challenge in managing multiple tools and multiple approaches from divergent ML/DL user communities in different application areas. The challenge is hard because it requires exposing a unified, comprehensive, efficient, and coherent platform that is capable of scaling computation dynamically and on demand. The combined impact of new computing resources and techniques, together with an increasing avalanche of large datasets, is transforming many research areas. This evolution has many different faces, components, and contexts, and our project DEEP Hybrid DataCloud will combine some of them to propose a new e-infrastructure framework able to address relevant challenges in research.