FlinkML is part of Apache Flink, an open-source framework for distributed stream and batch data processing [Flink]. FlinkML aims to provide a set of scalable ML algorithms and an intuitive API adapted to the Flink distributed framework; it contains algorithms for supervised learning, unsupervised learning, data preprocessing and recommendation, as well as other utilities. Flink focuses on processing large volumes of data with very low latency and high fault tolerance on distributed systems; its core feature is the ability to process data streams in real time. The main difference between Spark and Flink lies in the way each framework deals with streams of data.
Flink is a native stream processing framework that can also work on batch data. Spark was originally designed to work with static data through its RDDs and uses micro-batching to deal with streams.
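The difference between the two processing styles can be illustrated with a minimal, framework-independent sketch. It uses neither the Flink nor the Spark API; the function names are hypothetical, and the batches are delimited by a fixed event count purely for simplicity (an assumption, since real micro-batching systems typically cut batches by time interval). Both functions maintain a running sum, but the first emits a result after every event while the second emits one only when a batch is complete.

```python
from typing import Iterable, Iterator, List


def process_per_event(events: Iterable[int]) -> Iterator[int]:
    """Native-streaming style: update and emit the running sum for every event."""
    total = 0
    for value in events:
        total += value
        yield total


def process_micro_batches(events: Iterable[int], batch_size: int) -> Iterator[int]:
    """Micro-batching style: buffer events and emit the running sum once per batch."""
    total = 0
    batch: List[int] = []
    for value in events:
        batch.append(value)
        if len(batch) == batch_size:
            total += sum(batch)
            batch.clear()
            yield total
    if batch:  # flush the final, partially filled batch
        total += sum(batch)
        yield total


events = [1, 2, 3, 4, 5]
per_event_results = list(process_per_event(events))
micro_batch_results = list(process_micro_batches(events, batch_size=2))
```

In this sketch the per-event variant produces five intermediate results and the micro-batch variant three; the final value is the same, but each result in the micro-batch variant becomes visible only after its batch has filled, which is the source of the additional latency.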
Oryx 2 from Cloudera also has an ML layer. Oryx 2 is a realization of the Lambda architecture built on Apache Spark and Apache Kafka for real-time, large-scale ML [Oryx2]; it is designed for building applications and includes packaged, end-to-end applications for collaborative filtering, classification, regression and clustering. Oryx 2 comprises the following three tiers: 1) a general Lambda architecture tier providing batch, speed and serving layers, which are not specific to ML; 2) an ML abstraction tier for hyperparameter selection; 3) an end-to-end implementation of the same standard ML algorithms as an application (ALS, random decision forests, k-means).
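The interplay of the batch, speed and serving layers in a Lambda architecture can be sketched with a toy example. This is not the Oryx 2 API: the class `LambdaSketch` and its methods are hypothetical, everything runs in a single process, and a simple per-key count stands in for an ML model. The batch layer periodically recomputes its view from the full history, the speed layer tracks only the events that arrived since the last batch run, and the serving layer answers queries by combining both views.

```python
from collections import Counter
from typing import List


class LambdaSketch:
    """Toy, in-memory stand-in for the three Lambda layers (hypothetical names)."""

    def __init__(self) -> None:
        self.history: List[str] = []          # complete record of all events
        self.batch_view: Counter = Counter()  # recomputed from scratch by the batch layer
        self.speed_view: Counter = Counter()  # incremental updates since the last batch run

    def ingest(self, event: str) -> None:
        """Speed layer: store the event and update the incremental view immediately."""
        self.history.append(event)
        self.speed_view[event] += 1

    def run_batch(self) -> None:
        """Batch layer: rebuild the view from the full history, then reset the speed view."""
        self.batch_view = Counter(self.history)
        self.speed_view = Counter()

    def query(self, key: str) -> int:
        """Serving layer: merge the batch view with the recent incremental updates."""
        return self.batch_view[key] + self.speed_view[key]


system = LambdaSketch()
for event in ["a", "b", "a"]:
    system.ingest(event)
count_before_batch = system.query("a")

system.run_batch()
system.ingest("a")
count_after_batch = system.query("a")
```

The design choice shown here is that queries never wait for a batch run: recent events are visible through the speed view at once, and the batch run later absorbs them into a view computed from the complete data.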
KNIME (Konstanz Information Miner) is the data analytics, reporting and integration platform of KNIME AG, Switzerland [KNIME]. It integrates various components for ML and DM through its modular data pipelining concept: a GUI allows the assembly of nodes for data preprocessing (ETL - Extraction, Transformation and Load), for modelling, and for data analysis and visualisation with no, or only minimal, programming. The platform is released under the open-source GNU GPLv3 license and has more than 1500 modules, a comprehensive range of integrated tools, and a wide choice of advanced algorithms. KNIME is implemented in Java but also allows wrappers that call other code, provides nodes that run Java, Python, Perl and other programming languages, and integrates with Weka, R, Python, Keras (DL), H2O (ML/DL) and DL4J (DL, Hadoop/Spark). It has considerable community support: it is used by over 3000 organizations in more than 60 countries.