  • My research covers automatic morphosyntactic processing and analysis of texts, combining data-driven methods with linguistic knowledge.
  • Member of the Universal Dependencies team and main developer of Universal Dependencies for Persian (UD Persian Seraji).
  • Developer of the BLARK project for Persian, a large collection of open-source tools and linguistic resources for natural language processing of Persian. It includes a normalizer, a sentence segmenter and tokenizer, a PoS tagger, and a dependency parser, as well as a treebank of 6,000 sentences (circa 152,000 tokens) with a syntactic annotation scheme based on Stanford Dependencies. The largest part of the project is the development of the Uppsala Persian Dependency Treebank, a process spanning a series of tasks from corpus normalization and tokenization to adding encoding layers for part-of-speech tags and dependency relations. The project also includes an innovative approach to handling language-specific challenges in automatic processing.



  • Data-driven methods for NLP/NLU
  • Basic Language Resource Kit (BLARK) for Persian: corpora, treebank, text normalization, PoS tagging, morphological analysis, dependency parsing
  • Machine Translation
  • The correlation between speech and different gestures in human-human and human-machine communication

Developed open-source BLARK for Persian

Detailed descriptions of the following language resources and tools are given in my book (doctoral dissertation), Morphosyntactic Corpora and Tools for Persian.

