This project is written in Python. The most recent version of my code is available on GitHub: https://github.com/michaelinwords/ua-americanspeech
Performance examples and other information are also included in the GitHub repo.
An overview of the latest performance:
This training was performed on 218 (80%) of the 273 available articles, using a TF-IDF vectoriser.
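As a rough sketch of that setup (the repo's actual loading code and classifier may differ; `texts`, `labels`, and the one-vs-rest logistic-regression baseline below are placeholders, not the project's confirmed choices):

```python
# Minimal sketch of an 80/20 split with a TF-IDF vectoriser (scikit-learn).
# `texts` (list of article strings) and `labels` (binary multilabel matrix)
# are placeholders loaded elsewhere; the classifier is a stand-in baseline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

X_train_txt, X_test_txt, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42)  # 218 train / 55 test of 273

vectoriser = TfidfVectorizer()   # fit on training text only, then transform both splits
X_train = vectoriser.fit_transform(X_train_txt)
X_test = vectoriser.transform(X_test_txt)

clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))  # placeholder model
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
```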
Although the output below reports 5 k-fold splits, only one of the fold loops was actually used for this classification report (the stratified k-fold (SKF) procedure is not yet fully implemented).
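A fully implemented version of that loop might look roughly like the following. scikit-learn's StratifiedKFold does not accept multilabel targets, so this sketch assumes the third-party `iterative-stratification` package (MultilabelStratifiedKFold) as one possible workaround; `X`, `y`, and `clf` are placeholders carried over from the sketch above.

```python
# Sketch of running all 5 stratified folds and printing a report per fold.
# X (feature matrix for all articles) and y (binary label matrix) are
# placeholders; ideally the TF-IDF vectoriser would be refit inside each
# fold so no test-fold vocabulary leaks into training.
from iterstrat.ml_stratifiers import MultilabelStratifiedKFold
from sklearn.metrics import classification_report

mskf = MultilabelStratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(mskf.split(X, y), start=1):
    clf.fit(X[train_idx], y[train_idx])   # `clf` as in the placeholder sketch above
    y_pred = clf.predict(X[test_idx])
    print(f"--- fold {fold} ---")
    print(classification_report(y[test_idx], y_pred, zero_division=0))
```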
The subset_accuracy is very low, though this is a harsh metric: it is an "exact match ratio" (the complement of the zero-one loss), so a sample only counts as correct when its predicted labels match its actual labels exactly. The per_label_accuracy, by contrast, is somewhat high.
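To make the gap between the two metrics concrete, here is a small self-contained toy example (made-up data, not the project's): a single wrong label removes a sample entirely from subset accuracy, but barely moves per-label accuracy.

```python
# Toy illustration of subset accuracy vs per-label accuracy (made-up data).
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 0],
                   [1, 1, 0, 1]])
y_pred = np.array([[1, 0, 1, 0],    # exact match
                   [0, 1, 0, 1],    # one extra label -> whole sample counts as wrong
                   [1, 0, 0, 1]])   # one missing label -> whole sample counts as wrong

# Subset accuracy ("exact match ratio"): every label of a sample must match.
print(accuracy_score(y_true, y_pred))   # 1/3 ≈ 0.33

# Per-label accuracy: fraction of individual label decisions that are correct.
print((y_true == y_pred).mean())        # 10/12 ≈ 0.83
```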
Low support is clearly a major issue for most of the categories, hence the need for more cross-validation as well as more articles generally, and especially for the underrepresented categories.
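One quick way to quantify that imbalance is to count per-label support directly from the label matrix (again assuming a binary indicator matrix `y` and a matching `label_names` list, both placeholders):

```python
# Count how many articles carry each label (placeholder `y` / `label_names`).
import numpy as np

support = np.asarray(y).sum(axis=0)   # articles per category
for name, count in sorted(zip(label_names, support), key=lambda pair: pair[1]):
    print(f"{name:30s} {count}")
```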
The current performance is primarily due to a combination of the code being incomplete (see Next Directions) and an insufficient number of articles, both across the category distribution and overall (see Limitations).