
Version 1.2.5 released

posted Feb 11, 2012, 1:14 AM by Giuseppe Attardi   [ updated Feb 11, 2012, 2:07 AM ]
Composite features can now be used: they are extracted from several tokens and may use different attributes.

Until now, each feature was elementary, extracted from a single token.
For example, the following notation:

Features LEMMA -1 0 1 leftChild(0) rightChild(prev(0))

meant that the lemma was used as a feature for each of the tokens determined as follows:

-1 first on the stack
0 next in input queue
1 second in input
leftChild(0) left child of next
rightChild(prev(0)) right child of token immediately preceding the next in input
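
As an illustration only, the expansion of the old notation into elementary features can be sketched in Python. The function name `expand_features` is hypothetical and this is not DeSR's own configuration reader; it only assumes that the line is whitespace-separated, with the attribute first and the token specifiers after it.

```python
def expand_features(line):
    """Expand 'Features ATTR spec spec ...' into (ATTR, spec) pairs.

    Hypothetical helper: one elementary feature per token specifier.
    """
    keyword, attribute, *specs = line.split()
    if keyword != "Features":
        raise ValueError("expected a line starting with 'Features'")
    return [(attribute, spec) for spec in specs]


pairs = expand_features("Features LEMMA -1 0 1 leftChild(0) rightChild(prev(0))")
for attribute, spec in pairs:
    print(attribute, spec)
```

Each printed pair corresponds to one elementary feature: the same attribute read from a different token.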

It is now possible to specify composite features, extracted from different tokens, in the following way:

Feature LEMMA(-1) POSTAG(0) DEPREL(leftChild(-1))

The feature is the concatenation of three elementary features, extracted from tokens denoted in the same way as before.
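
A minimal Python sketch of this concatenation follows. It is not DeSR's implementation: the names `parse_composite` and `composite_value`, the `|` separator, the `_` placeholder for a missing token or attribute, and the toy token table are all assumptions made for illustration.

```python
import re

# ATTR(spec), where spec may itself contain parentheses, e.g. leftChild(-1).
TERM = re.compile(r"^([A-Z]+)\((.*)\)$")


def parse_composite(line):
    """Split 'Feature ATTR(spec) ...' into (ATTR, spec) pairs."""
    keyword, *terms = line.split()
    if keyword != "Feature":
        raise ValueError("expected a line starting with 'Feature'")
    parsed = []
    for term in terms:
        match = TERM.match(term)
        if match is None:
            raise ValueError(f"malformed term: {term}")
        parsed.append((match.group(1), match.group(2)))
    return parsed


def composite_value(line, resolve, separator="|"):
    """Concatenate the elementary values; separator and '_' are assumptions."""
    parts = []
    for attribute, spec in parse_composite(line):
        token = resolve(spec)  # returns a dict of attributes, or None
        parts.append(token.get(attribute, "_") if token else "_")
    return separator.join(parts)


# Toy parser state: token specifiers mapped directly to already-resolved tokens.
tokens = {
    "-1": {"LEMMA": "cat", "POSTAG": "NN"},
    "0": {"LEMMA": "sleep", "POSTAG": "VBZ"},
    "leftChild(-1)": {"LEMMA": "the", "DEPREL": "NMOD"},
}

value = composite_value(
    "Feature LEMMA(-1) POSTAG(0) DEPREL(leftChild(-1))", tokens.get
)
print(value)  # cat|VBZ|NMOD
```

A real parser would resolve each specifier against its stack and input queue; the dictionary lookup here only stands in for that step.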

The old notation is still available for backward compatibility.

Adding the following composite features to the English model:

Feature         CPOSTAG(-1) CPOSTAG(0)
Feature         CPOSTAG(0) CPOSTAG(1)
Feature         CPOSTAG(-1) CPOSTAG(1)

improved the scores to

  Labeled   attachment score: 50685 / 57676 * 100 = 87.88 %
  Unlabeled attachment score: 52066 / 57676 * 100 = 90.27 %

on the English Penn TreeBank.
With parser combination, DeSR achieves:

  Labeled   attachment score: 51277 / 57676 * 100 = 88.91 %
  Unlabeled attachment score: 52612 / 57676 * 100 = 91.22 %
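
The percentages reported here follow directly from the token counts. A short Python check, using a hypothetical helper name `attachment_score`, reproduces them and shows the gain from parser combination in tokens:

```python
def attachment_score(correct, total):
    """Percentage of tokens with a correct attachment, rounded to 2 decimals."""
    return round(correct / total * 100, 2)


TOTAL = 57676  # scored tokens, as reported in this announcement

single = {"LAS": 50685, "UAS": 52066}    # with the composite features
combined = {"LAS": 51277, "UAS": 52612}  # with parser combination

for name in ("LAS", "UAS"):
    extra = combined[name] - single[name]
    print(
        name,
        attachment_score(single[name], TOTAL),
        attachment_score(combined[name], TOTAL),
        f"(+{extra} tokens)",
    )
```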
