Persian Sentence Segmenter and Tokenizer: SeTPer

The tools

The tools are developed by Mojgan Seraji (mojgan.seraji96@gmail.com) in collaboration with Jörg Tiedemann (jorg.tiedemann@lingfil.uu.se) and licensed under the GNU General Public License. The following scripts use regular expressions similar to those in Uplug (Tiedemann, 2003), with extensions for Persian. To get the tools click the following links:

Running SeTPer

You can run SeTPer by typing the following at the command line prompt:

    prompt> perl fa_sent.pl < input_file.txt | perl fa_tok.pl > output_file.txt

The first script, fa_sent.pl, reads the raw text from input_file.txt and splits it into sentences. Its output is piped to fa_tok.pl, which tokenizes it and writes the result to output_file.txt.

References

1. Tiedemann, Jörg. 2003. Recycling Translations - Extraction of Lexical Data from Parallel Corpora and their Application in Natural Language Processing. Doctoral dissertation, Uppsala University. Studia Linguistica Upsaliensia 1.
2. Seraji, Mojgan. 2015. Morphosyntactic Corpora and Tools for Persian. Doctoral dissertation, Uppsala University. Studia Linguistica Upsaliensia 16.
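
Illustration: a two-stage regular-expression pipeline

The sketch below illustrates the general idea behind a regular-expression sentence segmenter followed by a tokenizer, as described under "The tools". It is not the SeTPer implementation and does not reproduce the rules in fa_sent.pl or fa_tok.pl. The function names split_sentences and tokenize, and the two patterns, are assumptions made for this example only. It treats the Arabic-script question mark, comma and semicolon as punctuation and leaves the zero-width non-joiner (U+200C) inside words.

```python
import re

# Hypothetical patterns for illustration only; these are NOT SeTPer's rules.
# A sentence boundary is assumed to be whitespace that follows ".", "!", "?"
# or the Arabic-script question mark (U+061F).
SENT_END = re.compile(r'(?<=[.!?\u061F])\s+')

# Punctuation to split off as separate tokens, including the Arabic comma
# (U+060C) and the Arabic semicolon (U+061B).
PUNCT = re.compile(r'([.!?,;:()\[\]"«»\u060C\u061B\u061F])')


def split_sentences(text):
    """Split text into sentences at whitespace after sentence-final punctuation."""
    return [s.strip() for s in SENT_END.split(text) if s.strip()]


def tokenize(sentence):
    """Separate punctuation from words, then split on whitespace.

    The zero-width non-joiner (U+200C) is not whitespace, so words that
    contain it stay in one piece.
    """
    spaced = PUNCT.sub(r' \1 ', sentence)
    return spaced.split()


if __name__ == "__main__":
    sample = "این کتاب است. آن چیست\u061F می\u200cروم!"
    for sent in split_sentences(sample):
        # One sentence per line, tokens separated by single spaces.
        print(" ".join(tokenize(sent)))
```

A sketch this simple also splits decimal numbers and abbreviations at the full stop; a practical tool needs extra rules for such cases.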