HunPoS (Halácsy et al., 2007) is an open-source reimplementation of the statistical part-of-speech tagger Trigrams'n'Tags, also called TnT (Brants, 2000), that allows the user to tune the tagger with different feature settings. TagPer (Seraji, 2015, Chapter 4, pp. 91-96) was developed by training HunPoS on the Uppsala Persian Corpus (UPC), a large, freely available Persian corpus. UPC is a modified version of the Bijankhan corpus with additional sentence segmentation and consistent tokenization; it contains over 2.7 million words and is annotated with morpho-syntactic and partly semantic features.
The tool was developed by Mojgan Seraji (mojgan.seraji96@gmail.com) and is licensed under the GNU General Public License. It is used for part-of-speech tagging of Persian texts and can be downloaded below:
Before using the language model, you first need to download HunPoS. You can then tag your text with the model using the following command line:
prompt> hunpos-tag model_TagPer < input_file.txt > output_file.txt
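The command above reads its input from a file and writes tagged output to another. As a minimal sketch, assuming HunPoS's usual convention of one token per line with an empty line between sentences on input, and tab-separated token and tag on output, the following Python helpers prepare such an input and read the result back. The function names, tokens and tags are hypothetical placeholders, not part of TagPer or HunPoS, and the text is assumed to be tokenized already.

```python
def to_hunpos_input(sentences):
    """Render tokenized sentences as one token per line,
    with an empty line after each sentence (assumed input format)."""
    lines = []
    for tokens in sentences:
        lines.extend(tokens)
        lines.append("")  # empty line marks the sentence boundary
    return "\n".join(lines) + "\n"


def parse_hunpos_output(text):
    """Parse tab-separated token/tag lines (assumed output format)
    back into a list of sentences of (token, tag) pairs."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # An empty line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        token, _, tag = line.partition("\t")
        current.append((token, tag.strip()))
    if current:
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    # Placeholder tokens; real input would be tokenized Persian text.
    print(to_hunpos_input([["tok1", "tok2"], ["tok3"]]), end="")
```

The string returned by to_hunpos_input can be saved as input_file.txt and passed to the command line above; the contents of output_file.txt can then be given to parse_hunpos_output.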
1. Halácsy, P., Kornai, A., and Oravecz, Cs. 2007. HunPos - an open source trigram tagger. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, Companion Volume: Proceedings of the Demo and Poster Sessions, pages 209-212, Prague, Czech Republic. Association for Computational Linguistics.
2. Seraji, Mojgan. 2011. A Statistical Part-of-Speech Tagger for Persian. In Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA), Riga, Latvia.
3. Seraji, Mojgan. 2015. Morphosyntactic Corpora and Tools for Persian. Doctoral dissertation, Uppsala University. Studia Linguistica Upsaliensia 16.