Perplexity estimates

posted Apr 26, 2012, 1:57 AM by Giuseppe Attardi
In several Domain Adaptation experiments, we used the likelihood that the parser estimates for each transition as an indicator of perplexity. Sentences with high perplexity are good candidates for Active Learning.
See, for example, this paper:

J. Atserias, G. Attardi, M. Simi, H. Zaragoza. Active Learning for Building a Corpus of Questions for Parsing. Proc. of LREC 2010, Malta, 2010.
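
As a rough illustration of the selection step, the sketch below picks the sentences the parser was least confident about. It is not the procedure from the paper: the function name select_for_annotation, the use of a single likelihood score per sentence, and the batch size k are all assumptions made for this example.

```python
import heapq
from typing import List, Tuple


def select_for_annotation(scores: List[Tuple[str, float]], k: int) -> List[str]:
    """Return the k sentences with the lowest likelihood score.

    `scores` pairs each sentence (or sentence id) with a likelihood score,
    where a lower score means higher perplexity. Hypothetical helper,
    not part of the parser.
    """
    lowest = heapq.nsmallest(k, scores, key=lambda pair: pair[1])
    return [sentence for sentence, _ in lowest]
```

With log-likelihood scores, the most negative values come first, so those sentences would be sent to the annotator before the others.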

Since this feature might be useful for other purposes as well, I have added a command-line option, -p, to enable it.

When this option is active, the parser prints a tag like the following in front of each sentence:

<LogLikelihood all=-23.6232 avg=0.111773 min=5.50254e-11 />

Here all is the overall likelihood of the parse (reported as a logarithm, as the tag name and the negative value indicate), avg is the average over all the transitions, and min is the lowest value among the transitions.
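
For downstream processing, the tag can be read back with a small script. The following is a minimal sketch that assumes the output looks exactly like the example tag above, with unquoted numeric attributes; the function name parse_loglikelihood is hypothetical and not part of the parser. A regular expression is used instead of an XML parser because unquoted attribute values are not well-formed XML.

```python
import re
from typing import Dict, Optional

# Matches the tag and captures its attribute string.
TAG_RE = re.compile(r"<LogLikelihood\s+([^/>]*)/>")
# Matches one name=value pair; the value runs up to whitespace, '/' or '>'.
ATTR_RE = re.compile(r"(\w+)=([^\s/>]+)")


def parse_loglikelihood(line: str) -> Optional[Dict[str, float]]:
    """Extract the numeric attributes of a LogLikelihood tag, or None if absent."""
    match = TAG_RE.search(line)
    if match is None:
        return None
    # Strip optional quotes in case the values are quoted (an assumption).
    return {
        name: float(value.strip('"'))
        for name, value in ATTR_RE.findall(match.group(1))
    }
```

Calling parse_loglikelihood on each line of the parser output and skipping the None results gives one dictionary per sentence, with the keys all, avg and min.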

The option has an effect only with the MLP or ME classifiers, which provide probability distributions over the predictions.