Data Science

H. Deng, "Interpreting Tree Ensembles with inTrees", arXiv:1408.5456, 2014

>> Extract interpretable rules and interactions from tree ensembles like random forests and boosted trees

>> Available in the inTrees R package.

H. Deng, "Guided Random Forest in the RRF Package", arXiv:1306.0237, 2013.

>> Build random forests with pre-defined weights

H. Deng, G. Runger, E. Tuv, W. Bannister, "CBC: An Associative Classifier with A Small Number of Rules", Decision Support Systems, Volume 59, March 2014, Pages 163–170

>> This paper provides some insights/examples about decision trees and associative classifiers.

>> "I had the opportunity to read a worthy research work addressing the issue of associative classifiers." Comments from DSS

H. Deng, G. Runger, "Gene Selection with Guided Regularized Random Forest", Pattern Recognition, 46.12 (2013): 3483-3489

>> The "RRF" R package implements the (guided) regularized random forest algorithm, which is a type of regularized trees.

>> More about RRF (illustrative examples, clarifications, etc.)

H. Deng, M. Baydogan, G. Runger, "SMT: Sparse Multivariate Tree", Statistical Analysis and Data Mining, 7.1:53-69. 2014

>> Matlab+C code (thanks to Mustafa for preparing this version for public release).

>> SMT finds a linear combination of a small subset of variables at each node. In addition to regular data, it can also be used for time series data and can generate informative temporal patterns.

H. Deng, G. Runger, E. Tuv, M. Vladimir, "A Time Series Forest for Classification and Feature Extraction", Information Sciences, 239, pp. 142-153. 2013

>> Matlab+C Code (you need to run mexC.m first)

>> TSF provides the temporal importance curve for understanding the temporal patterns useful for classification.

>> TSF outperforms nearest-neighbor (NN) classifiers with Euclidean distance or dynamic time warping (DTW).

>> The time series must all be of the same length; otherwise, they need to be aligned to a common length first.

>> Note: the error rates of SonyAIBORobotSurface and SonyAIBORobotSurfaceII are swapped in the paper (thanks to Professor Eamonn for pointing this out).
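>> One simple way to meet the same-length requirement is to resample every series onto a common number of points by linear interpolation. The sketch below is only an illustration under that assumption; it is not part of the released Matlab+C code, and the names `resample` and `align_lengths` are hypothetical.

```python
def resample(series, target_len):
    """Linearly interpolate `series` onto `target_len` evenly spaced points."""
    n = len(series)
    if n == 0:
        raise ValueError("series must not be empty")
    if target_len < 1:
        raise ValueError("target_len must be at least 1")
    if n == 1 or target_len == 1:
        # Nothing to interpolate between: repeat the first value.
        return [float(series[0])] * target_len
    out = []
    for k in range(target_len):
        # Fractional position of output point k on the original index axis.
        pos = k * (n - 1) / (target_len - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(series[lo] * (1 - frac) + series[hi] * frac)
    return out


def align_lengths(dataset, target_len=None):
    """Resample every series to a common length (default: the longest one)."""
    if target_len is None:
        target_len = max(len(s) for s in dataset)
    return [resample(s, target_len) for s in dataset]


# Two series of different lengths become two series of length 3.
aligned = align_lengths([[0, 2], [0, 1, 2]])
```

Linear interpolation keeps the overall shape but stretches or compresses the time axis uniformly, so it may not suit data where the timing of local patterns matters.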

H. Deng, G. Runger, E. Tuv, "System Monitoring with Real-Time Contrasts", Journal of Quality Technology, 44(1), pp. 9-27. 2012

>> R_Code

>> "This is a really interesting approach with potential for wide application." Comments from JQT, a prestigious journal in the process monitoring area.

H. Deng, G. Runger, "Feature Selection via Regularized Trees", Proceedings of the 2012 International Joint Conference on Neural Networks (IJCNN), IEEE, 2012.

>> RRF is available in the "RRF" R package.

>> The relationship between regularized trees and ordinary trees is similar to the relationship between Lasso and ordinary regression. Note that Lasso is a linear model, whereas regularized trees can capture non-linear interactions between variables and naturally handle missing values, different scales, and both numerical and categorical variables.
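>> To make the Lasso analogy concrete: as described in the regularized-trees paper, the split gain of a feature that has not been used in earlier splits is scaled by a coefficient between 0 and 1, so a new feature is added only when it clearly beats the features already in use. The sketch below is a minimal illustration of that idea, not the code of the "RRF" package; the names `regularized_gain`, `pick_split_feature`, and `lam` are hypothetical.

```python
def regularized_gain(gain, feature, selected, lam=0.5):
    """Return the gain unchanged for an already-used feature, else lam * gain."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return gain if feature in selected else lam * gain


def pick_split_feature(gains, selected, lam=0.5):
    """Choose the feature with the highest regularized gain at one node.

    `gains` maps feature name -> ordinary (unpenalized) gain at this node.
    `selected` is the set of features used by earlier splits; it is updated
    in place when the chosen feature has a positive regularized gain.
    """
    best = max(gains, key=lambda f: regularized_gain(gains[f], f, selected, lam))
    if regularized_gain(gains[best], best, selected, lam) > 0:
        selected.add(best)
    return best


# x2 has the higher raw gain, but x1 is already in use and x2 is penalized.
used = {"x1"}
choice = pick_split_feature({"x1": 0.30, "x2": 0.40}, used, lam=0.5)
```

With `lam=1.0` there is no penalty and the choice reduces to an ordinary tree split; smaller values of `lam` yield smaller feature subsets.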

H. Deng, G. Runger, E. Tuv, "Bias of Importance Measures for Multi-Valued Attributes and Solutions", Proceedings of the 21st International Conference on Artificial Neural Networks (ICANN 2011).

>> R_Code

>> Many variable importance measures, e.g., those derived from tree-based models, are biased toward multi-valued attributes. "Partial permutation" and "OOB Forest" are used to reduce the bias.
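>> A toy illustration of the bias, assuming Gini impurity and a multi-way split on a categorical attribute: an identifier-like attribute with one distinct value per row achieves the maximum impurity decrease on the training data even though it cannot generalize, while a binary attribute unrelated to the labels gets none. This sketch only demonstrates the problem; it does not implement "partial permutation" or "OOB Forest", and all names in it are hypothetical.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(values, labels):
    """Gini decrease from splitting into one child per distinct attribute value."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    n = len(labels)
    weighted = sum(len(g) / n * gini(g) for g in groups.values())
    return gini(labels) - weighted


labels = [0, 1, 0, 1, 0, 1, 0, 1]
row_id = list(range(8))            # one distinct value per row
coin = [0, 0, 1, 1, 0, 0, 1, 1]    # binary, unrelated to the labels

id_score = impurity_decrease(row_id, labels)    # equals the parent impurity
coin_score = impurity_decrease(coin, labels)    # no decrease at all
```

Ranking attributes by this score alone would place `row_id` first, which is the kind of artifact a bias correction is meant to remove.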

H. Deng, S. Davila, G. Runger, E. Tuv, "Learning Markov Blankets for Continuous or Discrete Networks via Feature Selection", Proceedings of ECML-SUEMA 2010, Pages: 97-108.

>> The code belongs to Intel and is not open-source.

Others

S. Ji, A. Fakhry, and H. D., "Integrative Analysis of the Connectivity and Gene Expression Atlases in the Mouse Brain", NeuroImage, 84(1), 245-253, 2014

S. H., W. J., and H. D., "SVM in RTC", submitted, 2015

E. A., and H. D., "Predict disease in Crop", submitted, 2015