ReForeSt is a distributed, scalable implementation of the Random Forest (RF) learning algorithm that targets fast and memory-efficient processing. Its main contributions are threefold: (i) it provides a novel approach to implementing RF in a distributed environment, targeting efficient in-memory processing; (ii) it is faster and more memory efficient than MLlib, the de facto standard; and (iii) its level of parallelism is self-configuring.
ReForeSt and its documentation are designed for developers and data scientists who are familiar with the Spark environment and the MLlib library. Please refer to the Spark and MLlib documentation before starting with ReForeSt.
Get ReForeSt from the downloads page of the project website. ReForeSt is built on top of Apache Spark and requires Spark to run.
See the examples to learn how to train a random forest with ReForeSt.
ReForeSt has been developed at Smartlab - DIBRIS - University of Genoa.