[2017-02-01] Regression is now added to RRF. 

[2014-07-08] Introducing "inTrees" (interpretable trees), a framework and an R package for extracting, measuring, pruning, selecting and summarizing rules from a tree ensemble (so far including random forest, RRF and gbm). For Latex user: these rules can be easily formatted as latex code. 
Web page with introduction deck and demo code

Feedback can be sent to: hdeng3@asu.edu

Technical details:

Houtao Deng, George Runger, "Gene Selection with Guided Regularized Random Forest", Pattern Recognition, 46.12 (2013): 3483-3489

 >> This paper describes the algorithms used in the RRF package. The guided RRF is an enhanced RRF which is guided by the importance scores from an ordinary random forest. 

Houtao Deng, George Runger, "Feature Selection via Regularized Trees", The 2012 International Joint Conference on Neural Networks (IJCNN), IEEE, 2012. 

 >> This paper describes the regularized trees including the original RRF.

Houtao Deng, "Guided Random Forest in the RRF Package", arXiv:1306.0237, 2013.

>> One can assign a weight [0,1] to each feature. Then, the features with larger weights will be preferred when building the GRF. 

>> In the paper the weight is decided by importance scores from an ordinary random forest. But it can be determined by other ways, e.g., expert insights on which features may be more important. 

>> Experiments on 10 high-dimensional gene data sets (#features >> #instances) show that, with a fixed parameter value (without tuning the parameter), RF applied to features selected by GRF outperforms RF applied to all features on 9 data sets and 7 of them have significant differences at the 0.05 level. Thus for these data sets, GRF  improve RF's performance by removing irrelevant features

Sample code:

Gene data sets: Code 
Simulated data: code  

slides on variable (feature) ranking and selection using random forests

Usage of RRF/GRRF

library(RRF); # install RRF package before running the script

# Generate Data and Visualize; X1 and X21 are the truely useful features
nCol <- 100; X <- matrix(runif(400*nCol, min=-2, max=2), ncol=nCol);
class <- (X[,1])^2 + (X[,21])^2 #X1 and X21 are the truely useful features
ix <- which(class > quantile(class, 6/10)); ix <- c(ix,which(class < quantile(class, 1/10))); class <- class*0-1; class[ix] <- 1
X11(); plot(X[ix,1],X[ix,21],col="blue",pch=1,xlim=c(-3,3),ylim=c(-3,3),xlab="Variable 1",ylab="Variable 2")
ix <- which(class==-1);
legend("topright",legend=c("class 1","class 2"), col=c("blue","red"), pch=c(1,3))

# Feature selection via RRF
lambda <- 0.8 Both the number of features and the quality of the features are quite sensitive to lambda for RRF. A smaller lambda leads to fewer features.
rrf <- RRF(X,as.factor(class),flagReg=1,coefReg=lambda) # coefReg is a constant for all variables.   #either "X,as.factor(class)" or data frame like "Y~., data=data" is fine, but the later one is significantly slower. 
subsetRRF <- rrf$feaSet # produce the indices of the features selected by RRF

# Feature selection via GRRF
rf <- RRF(X,as.factor(class), flagReg = 0) # build an ordinary RF 
impRF <- rf$importance 
impRF <- impRF[,"MeanDecreaseGini"] # get the importance score 
imp <- impRF/(max(impRF)) #normalize the importance scores into [0,1]
gamma <- 0.5   #A larger gamma often leads to fewer features. But, the quality of the features selected is quite stable for GRRF, i.e., different gammas can have similar accuracy performance (the accuracy of an ordinary RF using the feature subsets). See the paper for details. 
coefReg <- (1-gamma) + gamma*imp   # each variable has a coefficient, which depends on the importance score from the ordinary RF and the parameter: gamma
grrf <- RRF(X,as.factor(class), flagReg=1, coefReg=coefReg)
subsetGRRF <- grrf$feaSet # produce the indices of the features selected by GRRF

print(subsetRRF) #the subset includes many more noisy variables than GRRF

======    Notes     ======

If RRF and randomForest packages are loaded in the same R session, you may use "package::function" for functions appear to be the same in both packages. For example:  RRF::importance() or randomForest::importance()

Feature selection + classification procedure
1. remove unwanted features by applying a feature selection algorithm such as RRF to the training data
2. apply a classifier such as random forest to the reduced training data
3. test the classifier on the reduced testing data to estimate the error

OOB ERROR in (G)RRF should not be used for performance evaluation. 

The difference between RRF and guided RRF?

Guided RRF is recommended.

RRF is greedy in the feature selection process: variables are selected based on a subsample of data/variables at each node.

Guided RRF uses the importance scores from an ordinary random forest to guide the feature selection process of RRF. The motivation is to assign smaller penalites to variables with larger importance scores.  

Note both RRF and Guided RRF can be used in the 'RRF' package. 

But if classification accuracy is the only goal, you may use RRF with minimal regularization, that is, coefReg = 1. The number of features used in the RRF, though, can be large. 

The difference between Guided Random Forest and (G)RRF?

GRF selects a subset of RELEVANT features, while (G)RRF selects a subset of relevant and non-redundant features. GRF often selects a lot more features than GRRF (sometimes most of the features), but it may lead to better classification accuracy than GRRF (demonstrated in Guided Random Forest in the RRF Package). 

GRF can be implemented in a distributed computing framework such as Hadoop as each base tree is built independently. 

The key difference between RRF and RF's importance score ?

The relationship between RRF and RF is similar to the relationship between LASSO and ordinary regression.
RRF is good for feature subset selection. The variables with non-zero importance scores are selected.  
RF produces an importance score for each variable. But it does not provide a feature subset.

For a large number of instances, how to make RF/RRF/GRRF faster ?

You can increase the "nodesize" variable. The default value of nodesize is 1, i.e., grow complete trees, which may not be necessary when there is a large number of instances.
You can also set a limit to "maxnodes", which is the maximum number of nodes. Since
 the trees are built in a breadth-first fashion, the nodes with larger information gain tend to be formed earlier.