5.5 HOW GOOD IS THIS CLASSIFICATION?
Our classification routine has generated a set of word classes for our primary corpus. Now we need to determine how good this classification is. To do this, we examined how well our classification covered the word-occurrence and shared-context distribution classes given in Figures 5.1 and 5.3.
The shared-context distribution classes are a better indicator of performance because we use shared contexts as the basis for our word groups. So, it is significant that we classify 100% of the words that have 20 or more shared contexts. Further, there are only 14 words with 5 or more shared contexts that are not classified (less than 6% of these words). When we consider the set of groupable words (those words with 2 or more shared contexts), the classification process tripled the total number of grouped words, increasing the percentage grouped from 24% to 72%.
But is this any good? The problem is that such a large percentage of our corpus has either NO shared contexts or only ONE shared context. Recall that we added generalization to the classification to try to bring these words into the process. Although we have had some success at classifying the words with one shared context (35% grouped), we have had virtually no success at grouping the words with zero shared contexts (less than 1% grouped).
So what does this mean? It depends on how we define the "remainder of the words in the lexicon." If we decide that this must include all lexical entries, then we have not done very well at all. The 1,896 words that we have grouped represent less than 19% of the total lexicon. This does not seem very good. On the other hand, if we limit our attention to the groupable words, then we are concerned only with the 1,131 words in the lexicon that have two or more shared contexts. Over this restricted lexicon, the remainder is the 867 words that were not grouped by iterative clustering. Our classification procedure grouped an additional 546 of these words. Together with the 264 words grouped by iterative clustering, this gives a total of 810 grouped words from the restricted lexicon, or 72% of the groupable words.
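The arithmetic over the restricted lexicon can be checked directly. A minimal sketch using only the counts reported above (the variable names are ours):

```python
# Coverage over the restricted (groupable) lexicon, using the counts
# reported in this section.
groupable = 1131              # words with two or more shared contexts
not_clustered = 867           # groupable words missed by iterative clustering
from_clustering = groupable - not_clustered   # words grouped by clustering
from_classification = 546     # words added by the classification procedure
total_grouped = from_clustering + from_classification

print(from_clustering)                         # 264
print(total_grouped)                           # 810
print(round(100 * total_grouped / groupable))  # 72 (percent of groupable words)
```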
But a more telling way to evaluate the classification procedure is to look at the errors associated with the classification (that is, the noun-verb mismatches detected). Although the overall error rate is not bad (128 errors, or 6.75% of the grouped words), most of the error is concentrated in the one large noun group (group label MAJORITY, with 687 nouns, 88 verbs, and 49 words from other categories). There are more verbs in this NOUN class than in any of the VERB classes. Note that this group also accounts for all of the error in the CORE GROUPING.
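These error figures are consistent with the grouped-word total for the full lexicon. A quick check, assuming (as the group counts suggest) that each verb in the MAJORITY noun group counts as one noun-verb mismatch:

```python
# Error-rate arithmetic from the reported counts.
errors = 128           # noun-verb mismatches over all grouped words
grouped = 1896         # total words grouped across the full lexicon
majority_verbs = 88    # verbs sitting in the MAJORITY noun group

print(round(100 * errors / grouped, 2))      # 6.75 (overall error rate, %)
print(round(100 * majority_verbs / errors))  # 69 (share of errors in MAJORITY, %)
```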
There are two ways to improve this performance. The first would involve tuning the classification algorithm itself. Recall that we use a very conservative classification test: more than 50% of the sum of the weighted scores for the CORE contexts associated with a candidate word must come from a single CORE context. We have done some tuning, but this remains an area for further research.
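The conservative test can be sketched as follows. The function name, the score representation (a mapping from CORE context to weighted score), and the example contexts are ours; only the >50% single-context criterion comes from the text:

```python
def passes_core_test(weighted_scores):
    """Conservative classification test (sketch): accept a candidate word
    only if more than 50% of the summed weighted scores over its CORE
    contexts comes from a single CORE context.

    weighted_scores: dict mapping CORE context -> weighted score.
    Returns the dominant context on success, else None.
    """
    total = sum(weighted_scores.values())
    if total == 0:
        return None  # no CORE evidence at all
    context, best = max(weighted_scores.items(), key=lambda kv: kv[1])
    return context if best > 0.5 * total else None

# A dominant context clears the threshold (6.0 of 9.0 total)...
print(passes_core_test({"the _ of": 6.0, "a _ in": 2.0, "_ -ed": 1.0}))
# ...while evenly spread evidence (3.0 of 6.0, not strictly more
# than half) does not.
print(passes_core_test({"the _ of": 3.0, "a _ in": 3.0}))
```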
The second way to improve classification results is to modify the CORE GROUPING itself. That is, we could look for a way to increase the number of words in the CORE GROUPING. This would have two effects. First, more CORE words should increase the number of CORE contexts available for classifying words. Second, more grouped words should expand the scope of generalization, allowing more contexts to participate in the ABSTRACT portion of the classification process. Thus an expanded CORE GROUPING should improve our overall classification performance.
Note that increasing the size of the CORE does not require that we retain all current CORE words; it requires only that the total number of grouped words increase. This is significant when considering the problem with the group labeled MAJORITY. It seems that, to improve overall classification performance, we will need to remove some of the verbs from that group's CORE. This implies a major change from the iterative clustering methodology. In that system, the groups grew monotonically: once a word entered a group, it would never become "ungrouped." Now we are allowing the possibility that a grouped word may become ungrouped (or may change its group).
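The methodological shift can be illustrated with a toy sketch (the data structures, function names, word, and group labels are hypothetical, not taken from the system itself):

```python
# Under iterative clustering, group membership grows monotonically:
# a word's first assignment is permanent.
monotonic = {}
def assign_monotonic(word, group):
    monotonic.setdefault(word, group)   # later assignments are ignored

# The revised scheme lets a grouped word be regrouped or ungrouped.
revisable = {}
def assign_revisable(word, group):
    if group is None:
        revisable.pop(word, None)       # ungroup the word
    else:
        revisable[word] = group         # may overwrite an earlier group

assign_monotonic("record", "NOUN-MAJORITY")
assign_monotonic("record", "VERB-2")
print(monotonic["record"])              # NOUN-MAJORITY: first assignment sticks

assign_revisable("record", "NOUN-MAJORITY")
assign_revisable("record", "VERB-2")
print(revisable["record"])              # VERB-2: reassignment is allowed
```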