To use ReForeSt, add the ReForeSt and Apache Spark jar files to your classpath.
```scala
import reforest.rf.{RFProperty, RFRunner}
// RFParameterBuilder, RFParameterType, CCUtil and ReForeStTrainerBuilder must also
// be imported from their ReForeSt packages (the exact paths depend on the version).

// Create the ReForeSt configuration.
val property = RFParameterBuilder.apply
  .addParameter(RFParameterType.Dataset, "data/test10k-labels")
  .addParameter(RFParameterType.NumFeatures, 794)
  .addParameter(RFParameterType.NumClasses, 10)
  .addParameter(RFParameterType.NumTrees, 100)
  .addParameter(RFParameterType.Depth, 10)
  .addParameter(RFParameterType.BinNumber, 32)
  .addParameter(RFParameterType.SparkMaster, "local[4]")
  .addParameter(RFParameterType.SparkCoresMax, 4)
  .addParameter(RFParameterType.SparkPartition, 4 * 4)
  .addParameter(RFParameterType.SparkExecutorMemory, "4096m")
  .addParameter(RFParameterType.SparkExecutorInstances, 1)
  .build

val sc = CCUtil.getSparkContext(property)

// Create the Random Forest classifier.
val timeStart = System.currentTimeMillis()
val rfRunner = ReForeStTrainerBuilder.apply(property).build(sc)

// Train a Random Forest model.
val model = rfRunner.trainClassifier()
val timeEnd = System.currentTimeMillis()

// Evaluate the model on the test instances and compute the test error.
val labelAndPreds = rfRunner.getDataLoader.getTestingData.map { point =>
  val prediction = model.predict(point.features)
  (point.label, prediction)
}
val testErr = labelAndPreds.filter(r => r._1 != r._2).count.toDouble /
  rfRunner.getDataLoader.getTestingData.count()
println("Accuracy: " + (1 - testErr))
println("Time (ms): " + (timeEnd - timeStart))

// Release the Spark resources.
rfRunner.sparkStop()
```

Since version v1.2, ReForeSt supports model selection.
To enable model selection, activate the functionality by setting `RFParameterType.ModelSelection` to `true` and provide an array of candidate values for each parameter to be tuned. An example configuration that enables model selection is given below.
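Conceptually, supplying several candidate values per parameter defines a grid of configurations to compare. The plain-Scala sketch below illustrates that idea with an exhaustive grid search that keeps the configuration with the lowest validation error. It uses neither Spark nor ReForeSt and is not ReForeSt's own selection procedure (the role of `ModelSelectionEpsilon` and `ModelSelectionEpsilonRemove` is not covered here). `Candidate`, `errorRate`, `select` and `toyValidate` are hypothetical names, and the validator is a toy stand-in for training and evaluating a real model.

```scala
// Hypothetical sketch: exhaustive grid search in plain Scala.
// This is NOT ReForeSt's internal model-selection algorithm.

// One candidate configuration (illustrative field names, not ReForeSt types).
case class Candidate(numTrees: Int, depth: Int, featureMultiplier: Double, binNumber: Int)

// Candidate values for each tuned parameter.
val numTreesValues   = Seq(50, 80, 100)
val depthValues      = Seq(5, 10)
val multiplierValues = Seq(1.0, 2.0)
val binNumberValues  = Seq(8, 16, 32)

// Cartesian product of the candidate values.
val grid: Seq[Candidate] = for {
  t <- numTreesValues
  d <- depthValues
  m <- multiplierValues
  b <- binNumberValues
} yield Candidate(t, d, m, b)

// Fraction of (label, prediction) pairs that disagree.
def errorRate(labelAndPreds: Seq[(Double, Double)]): Double =
  labelAndPreds.count { case (label, pred) => label != pred }.toDouble / labelAndPreds.size

// Evaluate every candidate and keep the one with the lowest error
// (ties are resolved in favour of the first candidate in the grid).
def select(candidates: Seq[Candidate],
           validate: Candidate => Seq[(Double, Double)]): (Candidate, Double) =
  candidates.map(c => (c, errorRate(validate(c)))).minBy(_._2)

// Toy validator (hypothetical): 1 error in 10 for depth 10 with 100 trees, 3 otherwise.
val toyValidate: Candidate => Seq[(Double, Double)] = c => {
  val wrong = if (c.depth == 10 && c.numTrees == 100) 1 else 3
  Seq.tabulate(10)(i => (0.0, if (i < wrong) 1.0 else 0.0))
}

val (best, err) = select(grid, toyValidate)
println(s"${grid.size} candidates, best = $best, validation error = $err")
```

With 3 tree counts, 2 depths, 2 feature multipliers and 3 bin numbers the grid already holds 3 × 2 × 2 × 3 = 36 candidates, so the cost of an exhaustive search grows quickly as more values are added.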
```scala
val property = RFParameterBuilder.apply
  .addParameter(RFParameterType.Dataset, "data/test10k-labels")
  .addParameter(RFParameterType.NumFeatures, 794)
  .addParameter(RFParameterType.NumClasses, 10)
  .addParameter(RFParameterType.ModelSelectionEpsilon, 0.001)
  .addParameter(RFParameterType.ModelSelectionEpsilonRemove, 0.02)
  // Array-valued parameters list the candidate values to evaluate.
  .addParameter(RFParameterType.NumTrees, Array(50, 80, 100))
  .addParameter(RFParameterType.Depth, Array(5, 10))
  .addParameter(RFParameterType.FeatureMultiplierPerNode, Array(1d, 2d))
  .addParameter(RFParameterType.BinNumber, Array(8, 16, 32))
  .addParameter(RFParameterType.SparkMaster, "local[4]")
  .addParameter(RFParameterType.SparkCoresMax, 4)
  .addParameter(RFParameterType.SparkPartition, 4 * 4)
  .addParameter(RFParameterType.SparkExecutorMemory, "4096m")
  .addParameter(RFParameterType.SparkExecutorInstances, 1)
  // Activate model selection.
  .addParameter(RFParameterType.ModelSelection, true)
  .build
```

Random Forest Rotation is an ensemble method that extends the classic Random Forest formulation with random rotations of the feature space. The original definition can be found at: http://www.jmlr.org/papers/volume17/blaser16a/blaser16a.pdf
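To illustrate the core operation, here is a minimal plain-Scala sketch. It is not ReForeSt's implementation. It draws a random orthogonal matrix by orthonormalising Gaussian vectors with Gram-Schmidt, negates one row if needed so that the determinant is +1, and applies the resulting rotation to a feature vector. In a rotation ensemble, each tree would be trained on features transformed by its own matrix. The helper names `randomRotation`, `determinant` and `rotate` are hypothetical.

```scala
import scala.util.Random

// Hypothetical sketch of a random rotation, NOT ReForeSt's implementation.
type Matrix = Array[Array[Double]]

// Determinant by Gaussian elimination with partial pivoting.
// Assumes m is non-singular, which holds for orthogonal matrices.
def determinant(m: Matrix): Double = {
  val a = m.map(_.clone())
  val n = a.length
  var det = 1.0
  for (c <- 0 until n) {
    val p = (c until n).maxBy(r => math.abs(a(r)(c)))
    if (p != c) {
      val tmp = a(p); a(p) = a(c); a(c) = tmp
      det = -det // a row swap flips the sign of the determinant
    }
    det *= a(c)(c)
    for (r <- c + 1 until n) {
      val f = a(r)(c) / a(c)(c)
      for (k <- c until n) a(r)(k) -= f * a(c)(k)
    }
  }
  det
}

// Orthonormalise n i.i.d. Gaussian vectors (modified Gram-Schmidt) to obtain a
// uniformly distributed orthogonal matrix, then negate the first row if needed
// so that det = +1, i.e. a proper rotation rather than a reflection.
def randomRotation(n: Int, rng: Random): Matrix = {
  val q = Array.ofDim[Double](n, n)
  for (i <- 0 until n) {
    val v = Array.fill(n)(rng.nextGaussian())
    // Remove the components along the rows accepted so far.
    for (j <- 0 until i) {
      val dot = (0 until n).map(k => v(k) * q(j)(k)).sum
      for (k <- 0 until n) v(k) -= dot * q(j)(k)
    }
    val norm = math.sqrt(v.map(x => x * x).sum)
    for (k <- 0 until n) q(i)(k) = v(k) / norm
  }
  if (determinant(q) < 0) q(0) = q(0).map(x => -x)
  q
}

// Apply the rotation to one feature vector: y = R x.
def rotate(r: Matrix, x: Array[Double]): Array[Double] =
  r.map(row => row.indices.map(k => row(k) * x(k)).sum)

// Usage: rotate a 4-dimensional feature vector with a seeded generator.
val rng = new Random(42)
val r = randomRotation(4, rng)
val x = Array(1.0, 2.0, 3.0, 4.0)
val y = rotate(r, x)
println(y.mkString(", "))
```

A rotation preserves lengths and angles, so it does not discard information in the features; it only changes their orientation relative to the coordinate axes along which the trees split, which is what lets differently rotated trees see the data differently.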
Since version v1.1, ReForeSt supports Random Forest Rotation. To enable it, set `RFParameterType.Rotation` to `true` and choose the number of random rotations with `RFParameterType.NumRotations`, as follows:
```scala
val property = RFParameterBuilder.apply
  .addParameter(RFParameterType.Dataset, "data/sample-covtype.libsvm")
  .addParameter(RFParameterType.NumFeatures, 54)
  .addParameter(RFParameterType.NumClasses, 2)
  .addParameter(RFParameterType.NumTrees, 100)
  .addParameter(RFParameterType.Depth, 10)
  .addParameter(RFParameterType.BinNumber, 32)
  .addParameter(RFParameterType.SparkMaster, "local[4]")
  .addParameter(RFParameterType.SparkCoresMax, 4)
  .addParameter(RFParameterType.SparkPartition, 4 * 4)
  .addParameter(RFParameterType.SparkExecutorMemory, "4096m")
  .addParameter(RFParameterType.SparkExecutorInstances, 1)
  .addParameter(RFParameterType.Rotation, true)
  .addParameter(RFParameterType.NumRotations, 4)
  .build
```

ReForeSt has been developed at Smartlab - DIBRIS - University of Genoa.