PBC4cip manual
Installation
Open Weka's Package manager (Tools > Package manager). Click File/URL, select the PBC4cip package, and install it. After the installation finishes, restart Weka before using the package.
PBC4cip parameters
PBC4cip has only two parameters: miner and filter. The miner extracts contrast patterns (CPs); at present the only working miner is RandomForestMinerWithoutFiltering. If the filter parameter is set to true, the extracted patterns are filtered, keeping only the most general ones.
PBC4cip is a Weka RandomizableClassifier, so its results are reproducible: for the same data and parameter settings, the classification results change only if you change the seed.
Random forest miner
The miner extracts CPs from a random forest of trees built using the selected builder. There are two builders implemented:
DecisionTreeBuilder: the miner extracts univariate CPs
MultivariateDecisionTreeBuilder: the miner extracts multivariate CPs based on linear combinations of features
The number of trees is set with numTrees.
If bagging is set to True, each tree is built using random samples with replacement from the training dataset. The number of samples is a percentage of the original size of the dataset, specified with bagSizePercent.
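As a rough illustration of the bagging step, the following Python sketch draws a sample with replacement whose size is a percentage of the dataset. It is not PBC4cip code: the function name bag_indices is hypothetical, and rounding the sample size down is an assumption.

```python
import random

def bag_indices(n_instances, bag_size_percent, seed):
    """Draw instance indices with replacement (a bootstrap-style sample)."""
    rng = random.Random(seed)
    # Assumption: the sample size is rounded down to a whole number of instances.
    bag_size = int(n_instances * bag_size_percent / 100)
    # Each draw is independent, so the same index can appear more than once.
    return [rng.randrange(n_instances) for _ in range(bag_size)]

sample = bag_indices(n_instances=200, bag_size_percent=50, seed=1)
print(len(sample))  # 100
```

Because the draws are made with replacement, a sample usually contains repeated instances and leaves some instances of the training dataset out.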
The numFeatures parameter sets the number of randomly selected features used to build each tree; if the parameter is set to -1, log2(features)+1 features are randomly selected, where features is the number of features in the dataset.
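The default for numFeatures can be worked out with a short Python sketch. The function names are hypothetical, and truncating log2(features) to an integer before adding one is an assumption about how the value is rounded.

```python
import math
import random

def features_per_tree(num_features, total_features):
    """Resolve the numFeatures setting into a concrete feature count."""
    if num_features == -1:
        # Assumption: log2 is truncated to an integer before adding 1.
        return int(math.log2(total_features)) + 1
    return num_features

def pick_features(num_features, total_features, seed):
    """Randomly choose distinct feature indices for one tree."""
    k = features_per_tree(num_features, total_features)
    return random.Random(seed).sample(range(total_features), k)

print(features_per_tree(-1, 16))  # 5
print(features_per_tree(-1, 10))  # 4
print(pick_features(-1, 16, seed=1))
```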
Decision tree builders
The following parameters are common to the univariate and multivariate decision tree builders:
The distributionEvaluator is the function used to evaluate splits. Three functions are currently implemented: Quinlan gain, Hellinger distance, and a multi-class version of the Hellinger distance. The minimum value of an evaluator should be 0.0, and greater values correspond to better splits.
The maxDepth is the maximum depth allowed for each tree as an integer. If set to -1, there is no limit to the depth.
The MinimalObjByLeaf is the minimum number of objects in a leaf; a split that would result in a leaf with fewer objects is not valid.
The MinimalSplitGain is the minimum value of the distributionEvaluator needed to consider a split.
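To make the common parameters concrete, the Python sketch below computes two split evaluators from class counts and then applies the depth, leaf-size, and gain checks. It uses the textbook definitions of information gain and of the two-class Hellinger distance, which are assumed (not confirmed) to correspond to the evaluators in PBC4cip. The function names, the convention that the root has depth 0, and the use of >= in the comparisons are also assumptions.

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the children."""
    total = sum(parent)
    weighted = sum(sum(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

def hellinger_distance(parent, children):
    """Two-class Hellinger distance between class distributions over children.

    parent and each child are (positive_count, negative_count) pairs,
    and both classes must be present in the parent.
    """
    pos_total, neg_total = parent
    return math.sqrt(sum(
        (math.sqrt(pos / pos_total) - math.sqrt(neg / neg_total)) ** 2
        for pos, neg in children))

def is_valid_split(children, value, node_depth,
                   max_depth, minimal_obj_by_leaf, minimal_split_gain):
    """Check a candidate split against the three common constraints."""
    # Assumption: the root has depth 0 and children sit one level deeper.
    if max_depth != -1 and node_depth + 1 > max_depth:
        return False
    if any(sum(child) < minimal_obj_by_leaf for child in children):
        return False
    return value >= minimal_split_gain

parent = (40, 60)
children = [(30, 10), (10, 50)]
gain = information_gain(parent, children)
print(gain, hellinger_distance(parent, children))
print(is_valid_split(children, gain, node_depth=0, max_depth=-1,
                     minimal_obj_by_leaf=2, minimal_split_gain=0.01))
```

Both functions return 0.0 for a split that leaves the class distribution unchanged and grow as the children become purer, which matches the requirement that the minimum value is 0.0 and greater values indicate better splits.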
The multivariate decision tree builder has two extra parameters:
The wMin parameter is the minimal absolute value for each weight in a linear combination after normalizing (norm of the weights equal to one). When a linear combination with a normalized weight less than wMin is found, we stop the Sequential Forward Selection early. It takes values between 0.0 and 1.0.
The minimalForwardGain is the minimum increment in the split evaluating function between the candidate split and the parent split.
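The two checks used by the multivariate builder can be sketched in Python as follows. This is an illustration, not the PBC4cip implementation: the function names are hypothetical, the Euclidean norm is assumed for the normalization, and the comparison with minimalForwardGain is assumed to be >=.

```python
import math

def normalize(weights):
    """Scale the weights so that their Euclidean norm equals one."""
    # Assumption: "norm" means the Euclidean (L2) norm.
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights]

def has_negligible_weight(weights, w_min):
    """True if any normalized weight is smaller than wMin in absolute value."""
    return any(abs(w) < w_min for w in normalize(weights))

def improves_enough(candidate_value, parent_value, minimal_forward_gain):
    """True if the candidate split beats the parent split by the required margin."""
    return candidate_value - parent_value >= minimal_forward_gain

print(normalize([3.0, 4.0]))                         # approximately [0.6, 0.8]
print(has_negligible_weight([3.0, 4.0], w_min=0.05))   # False
print(has_negligible_weight([0.01, 1.0], w_min=0.05))  # True
print(improves_enough(0.30, 0.25, minimal_forward_gain=0.01))  # True
```

In this sketch, a True result from has_negligible_weight corresponds to the condition under which the Sequential Forward Selection stops early.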