KAU Data Science Center

3.3.1 Deeplearning4j

Deeplearning4j (DL4J) stands apart from most other ML/DL frameworks and libraries: it is a modern, open-source, distributed DL library implemented in Java for the JVM, aimed at the industrial Java development ecosystem and Big Data processing.

DL4J comes with built-in GPU support, an important feature for the training process, and supports YARN, Hadoop's distributed application management framework [DL4J] [Skymind 2017]. The library consists of several sub-projects: transformation of raw data into feature vectors (DataVec), tools for NN configuration (DeepLearning4j), third-party model import (Python and Keras models), native library support for fast matrix processing on CPU and GPU (ND4J), a Scala wrapper that runs on multiple GPUs with Spark (ScalNet), a library of reinforcement learning algorithms (RL4J), a tool for searching the hyperparameter space to find the best NN configuration (Arbiter), and working examples (DL4J-Examples). Deeplearning4j has Java, Scala and Python APIs.

DL4J supports many types and formats of input data, and the set can easily be extended with other specialized types and formats. The DataVec toolkit accepts raw data such as images, video, audio, text or time series as input and handles its ingestion, normalization and transformation into feature vectors. It can also load data into Spark RDDs. DataVec contains record readers for various common formats. DL4J also includes some core NLP tools: SentenceIterator (feeds text piece by piece into a natural language processor), Tokenizer (segments the text into single words or n-grams) and Vocab (a cache for storing metadata). Specialized formats can be introduced by implementing a custom input format, much as is done in Hadoop via InputFormat.
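The division of labour between the three NLP tools named above can be sketched in plain Java. This is a minimal illustration only: the class TextPipelineSketch and its methods are hypothetical names invented for this example and are not the DL4J classes or their API. The sketch assumes that sentences end with '.', '!' or '?' and that the stored metadata is simply a word count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Plain-Java illustration only: hypothetical names, NOT the DL4J API.
class TextPipelineSketch {

    // Role of a sentence iterator: hand the text over one sentence at a time.
    // Assumption: a sentence ends with '.', '!' or '?' followed by whitespace.
    static Iterator<String> sentences(String text) {
        return Arrays.stream(text.split("(?<=[.!?])\\s+"))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .iterator();
    }

    // Role of a tokenizer: segment one sentence into lower-cased word tokens.
    static List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        String lower = sentence.toLowerCase(Locale.ROOT);
        for (String t : lower.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Optional second tokenization level: word n-grams built from the tokens.
    static List<String> ngrams(List<String> tokens, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            out.add(String.join(" ", tokens.subList(i, i + n)));
        }
        return out;
    }

    // Role of a vocabulary cache: keep per-token metadata (here, only a count).
    static Map<String, Integer> buildVocab(String text) {
        Map<String, Integer> vocab = new LinkedHashMap<>();
        Iterator<String> it = sentences(text);
        while (it.hasNext()) {
            for (String token : tokenize(it.next())) {
                vocab.merge(token, 1, Integer::sum);
            }
        }
        return vocab;
    }

    public static void main(String[] args) {
        String text = "DL4J runs on the JVM. The JVM runs on many platforms!";
        System.out.println(buildVocab(text));
        System.out.println(ngrams(tokenize("deep learning for java"), 2));
    }
}
```

Keeping the three roles as separate steps mirrors the description above: each stage can be swapped independently, for example a different sentence splitter for another language, without touching the vocabulary logic.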

Strong points

The distinguishing advantage of DL4J is that it uses the full power of the Java ecosystem to perform efficient DL [Varangaonkar 2017]. It can be deployed on top of popular Big Data tools such as Apache Hadoop, Spark and Kafka with an arbitrary number of GPUs or CPUs. DL4J is the choice for many commercial, industry-focused distributed DL platforms, since the Java ecosystem is predominant in business software development.
A rich set of DL architectures, including CNN, RNN (RNTN, LSTM), RBM and DBN, which gives excellent capabilities for image recognition, fraud detection and NLP.

Weak points

Java and Scala are not as popular in the DL/ML community as Python.
Currently, it attracts less overall interest than H2O in the Big Data and Spark communities.
