Mojgan Seraji's Academic Web Pages
Short Bio
My research covers the automatic morphosyntactic processing and analysis of text, combining data-driven methods with linguistic knowledge.
Member of the Universal Dependencies team and the main developer of the Universal Dependencies for Persian (UD Persian Seraji).
Developer of the BLARK project for Persian, a large collection of open-source tools and linguistic resources for natural language processing of Persian. It includes a normalizer, a sentence segmenter and tokenizer, a PoS tagger, and a dependency parser, as well as a treebank of 6,000 sentences (circa 152,000 tokens) with a syntactic annotation scheme based on Stanford Dependencies. The largest part of this project is the development of the Uppsala Persian Dependency Treebank, a process spanning a series of tasks from corpus normalization and tokenization to adding encoding layers for part-of-speech tags and dependency relations. The project also includes an innovative approach to handling the language-specific challenges that Persian poses for automatic processing.
Events
Release of the Uppsala Persian Dependency Treebank (UPDT) with Diacritics, in collaboration with Google (July 08, 2016).
Uppsala University Conferment Ceremony (Doktorspromotion) (Jan 29, 2016).
Workshop on Universal Dependencies for Indian languages, organized by Hima Bindu Maringanti (North Orissa University) and Mojgan Seraji (Uppsala University), India (Dec 11-14, 2015).
Invited talk at the International Workshop on Language Resources for Iranian Languages, Université Sorbonne Nouvelle, Paris (Nov 25-26, 2015).
Invited researcher at the Google NLP PhD Summit, Google office, Zurich (Sep 23-26, 2015).
Projects
PARSEME (ICT COST Action IC1207)
Research interests
Data-driven methods for Natural Language Processing/Natural Language Understanding
Basic Language Resource Kit (BLARK) for Persian: corpora, treebank, text normalization, PoS tagging, morphological analysis, dependency parsing
Machine Translation
The correlation of speech to different gestures in human-human and human-machine communication
Developed open source BLARK for Persian
Detailed descriptions of the following language resources and tools are given in my book (doctoral dissertation), entitled Morphosyntactic Corpora and Tools for Persian.
Copyright © 2015 Mojgan Seraji