To use ReForeSt, add the ReForeSt and Apache Spark JAR files to your classpath. The following example trains a Random Forest classifier and evaluates it on the test set:
import reforest.rf.{RFProperty, RFRunner}
// RFParameterBuilder, RFParameterType, CCUtil and ReForeStTrainerBuilder must also be
// imported; check their packages in the ReForeSt jar of the version you use.
// Create the ReForeSt configuration.
val property = RFParameterBuilder.apply
  .addParameter(RFParameterType.Dataset, "data/test10k-labels")
  .addParameter(RFParameterType.NumFeatures, 794)
  .addParameter(RFParameterType.NumClasses, 10)
  .addParameter(RFParameterType.NumTrees, 100)
  .addParameter(RFParameterType.Depth, 10)
  .addParameter(RFParameterType.BinNumber, 32)
  .addParameter(RFParameterType.SparkMaster, "local[4]")
  .addParameter(RFParameterType.SparkCoresMax, 4)
  .addParameter(RFParameterType.SparkPartition, 4 * 4)
  .addParameter(RFParameterType.SparkExecutorMemory, "4096m")
  .addParameter(RFParameterType.SparkExecutorInstances, 1)
  .build
val sc = CCUtil.getSparkContext(property)
// Create the Random Forest classifier.
val timeStart = System.currentTimeMillis()
val rfRunner = ReForeStTrainerBuilder.apply(property).build(sc)
// Train a Random Forest model.
val model = rfRunner.trainClassifier()
val timeEnd = System.currentTimeMillis()
// Evaluate the model on the test instances and compute the test error.
val testData = rfRunner.getDataLoader.getTestingData
val labelAndPreds = testData.map { point =>
  val prediction = model.predict(point.features)
  (point.label, prediction)
}
val testErr = labelAndPreds.filter(r => r._1 != r._2).count.toDouble / testData.count()
println("Accuracy: " + (1 - testErr))
// Elapsed wall-clock time for building the trainer and training, in milliseconds.
println("Time (ms): " + (timeEnd - timeStart))
rfRunner.sparkStop()
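The evaluation step above reduces to simple arithmetic: the test error is the fraction of test instances whose prediction differs from the true label, and the accuracy is 1 minus that fraction. The following plain-Scala sketch reproduces the computation on a local `Seq` of hypothetical (label, prediction) pairs instead of a Spark RDD, so it runs without ReForeSt or Spark. `AccuracySketch` and its methods are illustrative names, not part of the ReForeSt API, and representing labels and predictions as `Double` is an assumption.

```scala
// Plain-Scala sketch of the evaluation step: a local Seq stands in for the
// Spark RDD of (label, prediction) pairs. Hypothetical helper, not ReForeSt API.
object AccuracySketch {
  /** Fraction of pairs whose predicted label differs from the true label. */
  def testError(labelAndPreds: Seq[(Double, Double)]): Double = {
    require(labelAndPreds.nonEmpty, "need at least one test instance")
    val wrong = labelAndPreds.count { case (label, pred) => label != pred }
    wrong.toDouble / labelAndPreds.size
  }

  /** Accuracy is the complement of the test error. */
  def accuracy(labelAndPreds: Seq[(Double, Double)]): Double =
    1.0 - testError(labelAndPreds)
}

// Hypothetical data: 3 of the 4 predictions are correct.
val sample = Seq((0.0, 0.0), (1.0, 1.0), (2.0, 1.0), (3.0, 3.0))
println("Accuracy: " + AccuracySketch.accuracy(sample)) // prints "Accuracy: 0.75"
```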
Since version v1.2, ReForeSt supports model selection.
To enable it, set the ModelSelection parameter to true and pass an array of candidate values for each parameter to be tuned (here NumTrees, Depth, FeatureMultiplierPerNode and BinNumber), as in the following configuration:
val property = RFParameterBuilder.apply
  .addParameter(RFParameterType.Dataset, "data/test10k-labels")
  .addParameter(RFParameterType.NumFeatures, 794)
  .addParameter(RFParameterType.NumClasses, 10)
  .addParameter(RFParameterType.ModelSelectionEpsilon, 0.001)
  .addParameter(RFParameterType.ModelSelectionEpsilonRemove, 0.02)
  .addParameter(RFParameterType.NumTrees, Array(50, 80, 100))
  .addParameter(RFParameterType.Depth, Array(5, 10))
  .addParameter(RFParameterType.FeatureMultiplierPerNode, Array(1d, 2d))
  .addParameter(RFParameterType.BinNumber, Array(8, 16, 32))
  .addParameter(RFParameterType.SparkMaster, "local[4]")
  .addParameter(RFParameterType.SparkCoresMax, 4)
  .addParameter(RFParameterType.SparkPartition, 4 * 4)
  .addParameter(RFParameterType.SparkExecutorMemory, "4096m")
  .addParameter(RFParameterType.SparkExecutorInstances, 1)
  .addParameter(RFParameterType.ModelSelection, true)
  .build
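Each `Array` in the configuration above lists the candidate values for one parameter. If every combination were evaluated, the candidate space would be their Cartesian product: 3 × 2 × 2 × 3 = 36 configurations. The sketch below only enumerates that space under this assumption; it does not model how ReForeSt actually explores it or how `ModelSelectionEpsilon` and `ModelSelectionEpsilonRemove` influence the search. `GridSketch` and `Candidate` are hypothetical names.

```scala
// Sketch: enumerate every combination of the candidate values passed to the
// builder. Only the size/shape of the search space is modelled here, not
// ReForeSt's actual selection strategy.
object GridSketch {
  // Hypothetical container for one candidate configuration.
  final case class Candidate(numTrees: Int, depth: Int, featureMultiplier: Double, binNumber: Int)

  def grid(numTrees: Seq[Int], depths: Seq[Int],
           multipliers: Seq[Double], bins: Seq[Int]): Seq[Candidate] =
    for {
      t <- numTrees
      d <- depths
      m <- multipliers
      b <- bins
    } yield Candidate(t, d, m, b)
}

// Same candidate values as the configuration above.
val candidates = GridSketch.grid(Seq(50, 80, 100), Seq(5, 10), Seq(1d, 2d), Seq(8, 16, 32))
println(candidates.size) // prints 36
println(candidates.head) // prints Candidate(50,5,1.0,8)
```

An exhaustive search grows multiplicatively with every array added, which is one reason a pruning strategy such as the epsilon-based parameters may be worthwhile.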
Random Forest Rotation is an ensemble method that extends the classic Random Forest formulation by training each tree on a randomly rotated feature space. The original definition can be found at: http://www.jmlr.org/papers/volume17/blaser16a/blaser16a.pdf
Since version v1.1, ReForeSt supports Random Forest Rotation. To enable it, set the Rotation parameter to true and specify NumRotations, as in the following configuration:
val property = RFParameterBuilder.apply
  .addParameter(RFParameterType.Dataset, "data/sample-covtype.libsvm")
  .addParameter(RFParameterType.NumFeatures, 54)
  .addParameter(RFParameterType.NumClasses, 2)
  .addParameter(RFParameterType.NumTrees, 100)
  .addParameter(RFParameterType.Depth, 10)
  .addParameter(RFParameterType.BinNumber, 32)
  .addParameter(RFParameterType.SparkMaster, "local[4]")
  .addParameter(RFParameterType.SparkCoresMax, 4)
  .addParameter(RFParameterType.SparkPartition, 4 * 4)
  .addParameter(RFParameterType.SparkExecutorMemory, "4096m")
  .addParameter(RFParameterType.SparkExecutorInstances, 1)
  .addParameter(RFParameterType.Rotation, true)
  .addParameter(RFParameterType.NumRotations, 4)
  .build
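The core idea behind random rotation ensembles is to draw a random orthogonal matrix and multiply every feature vector by it before a tree is trained. The following plain-Scala sketch, independent of ReForeSt's internals, generates such a matrix by Gram-Schmidt orthonormalization of Gaussian random vectors and applies it to a feature vector. `RotationSketch` is a hypothetical name. The sketch does not force the determinant to +1, so the result may be a reflection rather than a proper rotation, and it omits any feature scaling an actual implementation may apply before rotating.

```scala
import scala.util.Random

// Sketch of the random-rotation idea: draw a random orthogonal matrix and
// multiply every feature vector by it. Hypothetical helper, not ReForeSt code.
object RotationSketch {
  type Matrix = Array[Array[Double]]

  def dot(a: Array[Double], b: Array[Double]): Double =
    a.indices.map(k => a(k) * b(k)).sum

  /** Random n x n orthogonal matrix via modified Gram-Schmidt on Gaussian rows.
    * The determinant is not forced to +1, so this may be a reflection. */
  def randomOrthogonal(n: Int, rng: Random): Matrix = {
    val rows = Array.fill(n, n)(rng.nextGaussian())
    for (i <- 0 until n) {
      // Subtract the projections onto the rows already orthonormalized.
      for (j <- 0 until i) {
        val proj = dot(rows(i), rows(j))
        for (k <- 0 until n) rows(i)(k) -= proj * rows(j)(k)
      }
      // Normalize to unit length (Gaussian rows are independent almost surely).
      val norm = math.sqrt(dot(rows(i), rows(i)))
      for (k <- 0 until n) rows(i)(k) /= norm
    }
    rows
  }

  /** Rotated feature vector x' = R x. */
  def rotate(r: Matrix, x: Array[Double]): Array[Double] =
    r.map(row => dot(row, x))
}

// Usage: rotate a 3-dimensional feature vector with a seeded random matrix.
val rng = new Random(42)
val r = RotationSketch.randomOrthogonal(3, rng)
val rotated = RotationSketch.rotate(r, Array(1.0, 2.0, 2.0))
println(rotated.mkString(", "))
```

Because the matrix is orthogonal, the rotated vector keeps its Euclidean length; axis-aligned splits in the rotated space correspond to oblique splits in the original feature space.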
ReForeSt was developed at Smartlab, DIBRIS, University of Genoa.