Importing data
One instance per file:
bin/mallet import-dir --input sample-data/web/* --output web.mallet
One file, one instance per line:
bin/mallet import-file --input /data/web/data.txt --output web.mallet
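A minimal sketch of the one-instance-per-line input, assuming the default import-file layout of instance name, label, then text, separated by whitespace. The file name data.txt and its contents are made up for illustration. The sanity check only counts fields and does not run Mallet.

```shell
# Hypothetical sample data: <name> <label> <text...>, one instance per line.
printf '%s\n' \
  'doc1 en the quick brown fox' \
  'doc2 de der schnelle braune fuchs' \
  'doc3 en jumps over the lazy dog' > data.txt

# Count lines with fewer than three whitespace-separated fields.
bad_lines=$(awk 'NF < 3 { n++ } END { print n + 0 }' data.txt)
instances=$(wc -l < data.txt | tr -d ' ')
echo "instances: $instances, malformed: $bad_lines"

# With Mallet installed, the file can then be imported as shown above:
# bin/mallet import-file --input data.txt --output web.mallet
```

Checking the field count first is cheap and catches lines that would otherwise be imported with a missing label or empty text.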
Build the classifier
bin/mallet train-classifier --input acl/acl.mallet --output-classifier acl/acl.maxent.classifier --trainer MaxEnt
Test how it works with unseen data
bin/mallet classify-dir --input datadir --output - --classifier classifier
Evaluation of a Classification Algorithm
./bin/mallet train-classifier --input web.mallet --training-portion 0.9 --trainer MaxEnt
./bin/mallet train-classifier --input web.mallet --cross-validation 10 --trainer MaxEnt
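The two commands above evaluate on a random 90/10 split and with 10-fold cross-validation, respectively. A small sketch for repeating the cross-validation with more than one trainer follows. NaiveBayes as the second trainer and the MALLET variable are assumptions for illustration, and the loop only prints the commands unless the Mallet launcher and web.mallet are found.

```shell
MALLET=${MALLET:-./bin/mallet}   # assumed location of the Mallet launcher
cmds=""
for trainer in MaxEnt NaiveBayes; do
  cmd="$MALLET train-classifier --input web.mallet --cross-validation 10 --trainer $trainer"
  cmds="$cmds$cmd
"
  if [ -x "$MALLET" ] && [ -f web.mallet ]; then
    $cmd                      # run the evaluation for this trainer
  else
    echo "would run: $cmd"    # dry run when Mallet or the data is missing
  fi
done
```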
Mallet Sequence Tagging
Using SimpleTagger, perform n-fold cross-validation with these parameters:
--train true --test lab --threads 2 --iterations 50 crf-input-data.txt
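SimpleTagger reads one token per line, with the features first and the label as the last field, and a blank line between sequences. The sketch below writes a tiny made-up crf-input-data.txt in that layout and prints the full command. The classpath is an assumption that depends on how Mallet was installed.

```shell
# Made-up sample: feature(s) then label on each line; a blank line ends a sequence.
cat > crf-input-data.txt <<'EOF'
Bill CAPITALIZED noun
slept non-noun
here LOWERCASE STOPWORD non-noun

Mary CAPITALIZED noun
ran non-noun
EOF

# Count sequences: awk paragraph mode (RS="") treats blank-line-separated blocks as records.
sequences=$(awk 'BEGIN { RS = ""; n = 0 } { n++ } END { print n }' crf-input-data.txt)
echo "sequences: $sequences"

# Assumed classpath; adjust to the local Mallet build before running the printed command.
CP="class:lib/mallet-deps.jar"
echo java -cp "$CP" cc.mallet.fst.SimpleTagger \
  --train true --test lab --threads 2 --iterations 50 crf-input-data.txt
```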
Generalised Expectation
./bin/mallet import-file --input train.file.tsv --output train.file.mallet
./bin/mallet import-file --input test.file.tsv --use-pipe-from train.file.mallet --output test.file.mallet
Hide the labels with vectors2vectors:
--input hockey-train.mallet --output hockey.unlabeled.vectors --hide-targets
Build the constraints file with vectors2featureconstraints:
--input lang-train.mallet --output lang.constraints --features-file labeled-features-lang.tsv --targets heuristic
Train with the constraints and report test accuracy:
./bin/mallet train-classifier --training-file ham.train.unlabeled.vectors --testing-file ham.test.mallet --trainer "MaxEntGETrainer,gaussianPriorVariance=0.1,constraintsFile=\"ham.constraints\"" --report test:accuracy
Human-generated features:
java cc.mallet.classify.tui.Vectors2FeatureConstraints \
--input baseball-hockey.train.vectors \
--output baseball-hockey.constraints \
--features-file baseball-hockey.labeled_features \
--targets heuristic
Machine-generated features:
Finally, the target expectations can be estimated exactly from the labeled data. The --targets value for this is oracle.
java cc.mallet.classify.tui.Vectors2FeatureConstraints \
--input baseball-hockey.train.vectors \
--output baseball-hockey.constraints \
--features-file baseball-hockey.features \
--targets oracle
---
Mallet error "Line # does not match regex" when importing files
Run:
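tr -dc '[:alnum:] ,.\n' < ./inputfile.txt > ./inputfilefixed.txt
This deletes every character that is not a letter, digit, space, comma, period, or newline. Import ./inputfilefixed.txt instead of the original file.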
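To see what the filter does before running it on real data, here is a short sketch on a made-up line. It uses a quoted variant of the character set, and LC_ALL=C is added as an assumption so that only ASCII letters and digits count as alphanumeric.

```shell
# Made-up input line with quotes, a semicolon, and a non-ASCII character (e-acute as UTF-8 bytes).
sample='doc1 en caf\303\251 "quoted" text; done.\n'

# -d deletes, -c complements the set: everything NOT listed is removed.
cleaned=$(printf "$sample" | LC_ALL=C tr -dc '[:alnum:] ,.\n')
echo "$cleaned"
```

The accented character, the quotes, and the semicolon are removed, leaving "doc1 en caf quoted text done." Note that the filter also strips tabs and all non-ASCII letters, so it is a blunt tool for non-English text.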