R Package FHDI

Please email bug reports for the FHDI R package to icho@iastate.edu.

The current active version is 1.4.1 as of Sep 2020.

Update Notes:

[>=1.4.1] (1) Option for the cell construction method for big-p data (data with many variables): "s_op_cellmake" in FHDI_CellMake() and FHDI_Driver(). With this option, for each unique missing pattern that has insufficient donors (i.e., fewer than two possible donors in the same categorized imputation cell), k-nearest-neighbor (KNN) search is used in lieu of the cell collapsing method of previous versions, thereby dramatically reducing computation time. (2) Option for controlling the sure independence screening method used for big-p data: "top_corr_var" in FHDI_CellMake() and FHDI_Driver(). With this option, only a small set of top-ranked correlated variables is used to construct a reduced correlation matrix, thereby dramatically reducing memory usage for big-p data.
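The KNN idea in (1) can be illustrated with a minimal base-R sketch. It is not the package's implementation: the toy matrix, the helper name knn_donors(), and the use of plain Euclidean distance on raw values are all assumptions made for illustration (FHDI itself works with categorized imputation cells, and its internal distance may differ). The sketch only shows how nearest neighbors can supply donors for a unit whose cell would otherwise have too few.

```r
# Toy data (hypothetical values): 6 units, 3 variables; unit 1 is missing y3.
daty <- matrix(c(
  1.0, 2.0, NA,
  1.1, 2.1, 3.0,
  0.9, 1.8, 2.9,
  5.0, 6.0, 7.0,
  5.2, 6.1, 7.2,
  1.2, 2.2, 3.1
), ncol = 3, byrow = TRUE)

# Hypothetical helper: return the row indices of the k fully observed rows
# closest to `recipient`, measuring Euclidean distance only over the
# variables that the recipient has observed.
knn_donors <- function(daty, recipient, k = 2) {
  obs_vars <- which(!is.na(daty[recipient, ]))
  donors <- setdiff(which(complete.cases(daty)), recipient)
  # Subtract the recipient's observed values from each candidate donor row.
  diffs <- sweep(daty[donors, obs_vars, drop = FALSE], 2,
                 daty[recipient, obs_vars])
  d <- sqrt(rowSums(diffs^2))
  donors[order(d)[seq_len(min(k, length(donors)))]]
}

donor_rows <- knn_donors(daty, recipient = 1, k = 2)
print(donor_rows)
```

A direct neighbor search like this avoids repeatedly merging cells until enough donors appear, which is the intuition behind the reduced computation time described above.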

[>=1.4.0] Options for variable reduction for big-p data based on sure independence screening (adapted from the theory of Fan and Lv, 2008): "i_op_SIS" and "s_op_SIS" in FHDI_CellMake() and FHDI_Driver(). With these options, for each unique missing pattern, only a small set of the most correlated observed variables is used to construct imputation cells, thereby dramatically reducing computation when there are many variables (i.e., big-p data) and possibly avoiding over-fitting.
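The screening step can be pictured with a short base-R sketch that ranks candidate variables by absolute marginal correlation with a partially missing variable and keeps the top few. The simulated data, the helper name screen_top(), and the argument n_keep are hypothetical and exist only for this illustration; the package's own screening rules may differ in detail.

```r
set.seed(1)
n <- 200

# Simulated predictors: two informative variables and eight pure-noise ones.
x1 <- rnorm(n)
x2 <- rnorm(n)
noise <- matrix(rnorm(n * 8), n, 8)
X <- cbind(x1, x2, noise)
colnames(X) <- c("x1", "x2", paste0("noise", 1:8))

# Response depends only on x1 and x2; 20 values are then set to missing.
y <- 2 * x1 - 1.5 * x2 + rnorm(n, sd = 0.5)
y[sample(n, 20)] <- NA

# Hypothetical helper: rank the columns of X by absolute marginal
# correlation with y (using only units where y is observed) and return
# the names of the n_keep highest-ranked columns.
screen_top <- function(X, y, n_keep) {
  score <- abs(cor(X, y, use = "pairwise.complete.obs"))[, 1]
  names(sort(score, decreasing = TRUE))[seq_len(n_keep)]
}

selected <- screen_top(X, y, n_keep = 2)
print(selected)
```

Only the selected columns would then be carried forward to build imputation cells, so the cost of cell construction grows with the number of retained variables rather than with the full number of variables.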

[>=1.2.5] Option for unordered categorical variables: "categorical" in FHDI_CellMake() and FHDI_Driver(). With this option, a mixed data set of continuous and unordered categorical variables is handled more efficiently.

SHARED DATA:

  • User manual on using FHDI for data curing

  • daty_user.csv (example data file)

  • election.csv (example categorical data file)

  • election.rds (example R data file in categorical format)

  • FHDIonRegression.r (R code for regression using FHDI)

  • The R Journal paper on FHDI

The items above are available at:

https://iastate.box.com/s/vff7cpug89aaq624xszf3i2hkqldlmva