Composite features can now be used: a single feature can be extracted from several tokens and from different attributes.
Until now, features were elementary, each extracted from a single token.
For example, the following notation:
Features LEMMA -1 0 1 leftChild(0) rightChild(prev(0))
meant that the lemma of each of the following tokens is used as a feature:
-1 first token on the stack
0 next token in the input queue
1 second token in the input queue
leftChild(0) left child of the next input token
rightChild(prev(0)) right child of the token immediately preceding the next input token
Composite features, extracted from different tokens, can now be written as follows:
Feature LEMMA(-1) POSTAG(0) DEPREL(leftChild(-1))
The feature is the concatenation of three elementary features, each extracted from a token denoted in the same way as before.
The old notation is still available for backward compatibility.
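The way a composite feature is assembled can be illustrated with a minimal Python sketch. This is not DeSR's actual implementation: the `Token` and `State` classes, the `resolve` and `composite` functions, the `_` separator and the `<null>` placeholder for a missing token or attribute are all assumptions made for illustration. The sketch only shows how each `ATTRIBUTE(token)` part is resolved against the parser state and how the resulting values are concatenated.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

NULL = "<null>"  # hypothetical placeholder for a missing token or attribute
SEP = "_"        # hypothetical separator between elementary values


@dataclass
class Token:
    attrs: Dict[str, str]
    left: Optional["Token"] = None   # left child in the partial tree
    right: Optional["Token"] = None  # right child in the partial tree
    prev: Optional["Token"] = None   # token immediately preceding this one


@dataclass
class State:
    stack: List[Token] = field(default_factory=list)  # top of stack is last
    queue: List[Token] = field(default_factory=list)  # next input is first


_CALL = re.compile(r"(\w+)\((.*)\)")
_LINKS = {"leftChild": "left", "rightChild": "right", "prev": "prev"}


def resolve(ref: str, state: State) -> Optional[Token]:
    """Resolve a token reference such as -1, 0 or rightChild(prev(0))."""
    m = _CALL.fullmatch(ref)
    if m:
        base = resolve(m.group(2), state)
        return getattr(base, _LINKS[m.group(1)]) if base is not None else None
    i = int(ref)
    if i < 0:  # negative indices address the stack: -1 is the top
        return state.stack[i] if -i <= len(state.stack) else None
    return state.queue[i] if i < len(state.queue) else None


def composite(spec: str, state: State) -> str:
    """Concatenate the elementary features listed after the Feature keyword."""
    values = []
    for part in spec.split():
        m = _CALL.fullmatch(part)
        if m is None:
            raise ValueError(f"expected ATTRIBUTE(token), got {part!r}")
        token = resolve(m.group(2), state)
        values.append(token.attrs.get(m.group(1), NULL) if token is not None else NULL)
    return SEP.join(values)


# Toy state: "see" on the stack with left child "I", "dog" next in input.
t_i = Token({"LEMMA": "I", "POSTAG": "PRP", "DEPREL": "SBJ"})
t_see = Token({"LEMMA": "see", "POSTAG": "VBD"}, left=t_i)
t_dog = Token({"LEMMA": "dog", "POSTAG": "NN"}, prev=t_see)
state = State(stack=[t_see], queue=[t_dog])

print(composite("LEMMA(-1) POSTAG(0) DEPREL(leftChild(-1))", state))
```

In this sketch the composite feature is a single string, so the learner sees one value for the whole combination rather than three independent values.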
Adding the following composite features to the English model:
Feature CPOSTAG(-1) CPOSTAG(0)
Feature CPOSTAG(0) CPOSTAG(1)
Feature CPOSTAG(-1) CPOSTAG(1)
improved the scores to
Labeled attachment score: 50685 / 57676 * 100 = 87.88 %
Unlabeled attachment score: 52066 / 57676 * 100 = 90.27 %
on the English Penn TreeBank.
With parser combination, DeSR achieves:
Labeled attachment score: 51277 / 57676 * 100 = 88.91 %
Unlabeled attachment score: 52612 / 57676 * 100 = 91.22 %
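The percentages above follow directly from the token counts. The short Python sketch below recomputes them; the function name `attachment_score` and the dictionary layout are illustrative choices, while the counts are those reported above.

```python
def attachment_score(correct: int, total: int) -> float:
    """Percentage of tokens with a correct attachment, rounded to two decimals."""
    return round(correct / total * 100, 2)


TOTAL = 57676  # scored tokens, as reported above

scores = {
    "single LAS": attachment_score(50685, TOTAL),
    "single UAS": attachment_score(52066, TOTAL),
    "combined LAS": attachment_score(51277, TOTAL),
    "combined UAS": attachment_score(52612, TOTAL),
}

for name, value in scores.items():
    print(f"{name}: {value:.2f} %")
```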