Eleni Miltsakaki - Ελένη Μιλτσακάκη


Founder and CEO of Choosito!
Email: first name at name of company dot com

Adjunct Assistant Professor of Education
Email: elenimi at gse dot upenn dot edu


PhD in Computational Linguistics, UPenn (2003)
MA in Applied Linguistics, U of Essex, UK (1991)
BA in English, Aristotle U of Thessaloniki (1988)

Short Bio 

I am a computational linguist doing research in discourse and dialogue. I have a PhD in Linguistics from the University of Pennsylvania, U.S.A., and an M.A. in Applied Linguistics from the University of Essex, UK. I have extensively studied topic tracking and topic structure in discourse, reference and pronoun resolution in text and dialogue, discourse relations, and the analysis of text readability for the purpose of matching text to the reader's reading comprehension ability. Methodological approaches: a) linguistic and corpus analyses of discourse, b) psycholinguistic experiments, and c) machine learning methods for classification and labeling tasks.

Recently, I founded Choosito!, an innovative research tool that uses linguistic analysis to filter websites by grade level and thematic content. While I continue to teach EDUC 526 at GSE in Summer Session I, I spend the rest of my time developing Choosito's innovative technology.

Choosito! has been funded by NSF/SBIR. 

Research projects 

  • Text Simplification: Related publications.
  • Personalized Learning
  • Antelogue is a pronoun resolution system that uses natural language techniques to process dialogues and identify co-referring relations between pronouns and their antecedents in the dialogue. Antelogue is a computationally efficient system that uses linguistically rich resources to achieve high-precision resolution. The current version of Antelogue is specifically designed to process dialogues from the popular TV series 'Lost'. It achieves 93% accuracy for first-, second-, and third-person pronouns. Plural pronouns are not handled yet. Related publications.
  • Read-X is a web search optimization engine that searches the web using existing search engines and returns results classified by thematic area and expected level of reading difficulty. Read-X uses a MaxEnt classifier trained on hand-labeled data in eight thematic areas ranging from literature and science to business and sports. The reading level is determined by an analysis of several linguistic features. Read-X is a computationally efficient engine that performs web text analyses in a matter of seconds. The next version of Read-X, which will include a sophisticated measure of readability sensitive to the reader's familiarity with specific content areas, is currently under development. Related publications.

  • The Penn Discourse Treebank is to date the largest discourse-annotated corpus. PDTB 1.0 was released in 2006 and contained annotations of discourse connectives (explicit and implicit) and their arguments. PDTB 2.0, released in January 2008, is enriched with annotations of speaker attribution and the senses of connectives. Related publications.

  • Automated evaluation of coherence in student essays implements a centering-based algorithm to identify topic discontinuities in student essays. Statistical analysis of the algorithm's performance on a corpus of student essays shows that the topic discontinuity model can improve the performance of e-rater, the automated essay scoring system developed at ETS. Related publications link coming.