Eleni Miltsakaki - Ελένη Μιλτσακάκη



  Research Associate 

  
Institute of Research in Cognitive Science

   Computer and Information Science
   University of Pennsylvania
   3401 Walnut St., Suite 400A, Room 407
   Philadelphia, PA 19104-6228
   Tel: 215 573 6285, Fax: 215 573 9247
   Email (below)


LSA 2012 Symposium on Information Structure and Discourse, in memory of Ellen Prince. 

Readability Bibliography



Short Bio

I am a computational linguist doing research in discourse and dialogue. I have a PhD in Linguistics from the University of Pennsylvania, U.S.A., and an M.A. in Applied Linguistics from the University of Essex, UK. I have extensively studied topic tracking and topic structure in discourse, reference and pronoun resolution in text and dialogue, discourse relations, and the analysis of text readability, with the purpose of matching text to the reader's reading comprehension ability. Methodological approaches: a) linguistic and corpus analyses of discourse, b) psycholinguistic experiments, and c) machine learning methods for classification and labeling tasks.


Research projects 

  • Antelogue is a pronoun resolution system that uses natural language processing techniques to process dialogues and identify coreference relations between pronouns and their antecedents in the dialogue. Antelogue is a computationally efficient system that uses linguistically rich resources to achieve high-precision resolution. The current version of Antelogue is specifically designed to process dialogues from the popular TV series 'Lost'. It achieves 93% accuracy for first, second and third person pronouns. Plural pronouns are not handled yet. Related publications.
  • Read-X is a web search optimization engine that searches the web using existing search engines and returns results classified by thematic area and expected level of reading difficulty. Read-X uses a MaxEnt classifier trained on hand-labeled data in eight thematic areas ranging from literature and science to business and sports. The reading level is determined by an analysis of several linguistic features. Read-X is a computationally efficient engine that performs web text analyses in a matter of seconds. The next version of Read-X, which will include a sophisticated measure of readability sensitive to the reader's familiarity with specific content areas, is currently under development. Related publications.
  • The Penn Discourse Treebank is, to date, the largest discourse-annotated corpus. PDTB 1.0 was released in 2006 and contained annotations of discourse connectives (explicit and implicit) and their arguments. PDTB 2.0, released in January 2008, is enriched with annotations of speaker attribution and the senses of connectives. Related publications.
  • Automated evaluation of coherence in student essays implements a centering-based algorithm to identify topic discontinuities in student essays. Statistical analysis of the algorithm's performance on a corpus of student essays shows that the topic discontinuity model can improve the performance of e-rater, the automated essay scoring system developed at ETS. Related publications.

Positions held

  • Research associate, Computer & Information Science, University of Pennsylvania (2008-)
  • Lecturer, Educational Technologies, Graduate School of Education (2006-)
  • Postdoctoral fellow, IRCS, University of Pennsylvania (2003-2006)
  • ETS fellow (summer scholarship programs) in the natural language processing group (1998, 1999, 2000)
  • Director of Studies, Ekpedeftiki Foreign Language School, Athens, Greece (1991-1996)
  • Educational consultant for Cambridge University Press (1991-1996)

