Email: first name at name of company dot com
Adjunct Assistant Professor of Education
Email: elenimi at gse dot upenn dot edu
Email: elenimi at seas dot upenn dot edu
Penn office: 3401 Walnut St., Suite 400A, Room 463
Tel: 215 573 6285 Fax: 215 773 9247
BA in English, Aristotle U of Thessaloniki (1988)
I am a computational linguist doing research in discourse and dialogue.
I have a PhD in Linguistics from the University of Pennsylvania,
U.S.A., and an M.A. in Applied Linguistics from the University of Essex,
UK. I have extensively studied topic tracking and topic structure in
discourse, reference and pronoun resolution in text and
dialogue, discourse relations, and the analysis of text readability with
the purpose of matching text to the reader's reading comprehension
ability. My methodological approaches include (a) linguistic and corpus
analyses of discourse, (b) psycholinguistic experiments, and (c) machine
learning methods for classification and labeling tasks.
Recently, I founded Choosito!, a research tool that uses linguistic analysis to filter websites by grade level and thematic content. While I continue to teach EDUC 526 at GSE in Summer Session I, I spend the rest of my time developing Choosito!'s technology.
Choosito! has been funded by NSF/SBIR.
- Antelogue is
a pronoun resolution system that uses natural language processing
techniques to process dialogues and identify co-reference relations
between pronouns and their antecedents in the dialogue. Antelogue is a
computationally efficient system that uses linguistically rich resources
to achieve high-precision resolution. The current version of Antelogue is
specifically designed to process dialogues from the popular TV series
'Lost'. It achieves 93% accuracy for first-, second-, and third-person
pronouns. Plural pronouns are not handled yet. Related publications.
- Read-X is
a web search optimization engine that searches the web using existing
search engines and returns results classified by thematic area and
expected level of reading difficulty. Read-X uses a MaxEnt classifier
trained on hand-labeled data in eight thematic areas ranging from
literature and science to business and sports. The reading level is
determined by an analysis of several linguistic features. Read-X is a
computationally efficient engine performing web text analyses in a
matter of seconds. The next version of Read-X, including a sophisticated
measure of readability sensitive to the reader's familiarity with
specific content areas, is currently under development. Related publications.
- The Penn Discourse Treebank is
to date the largest discourse-annotated corpus. PDTB 1.0 was released
in 2006 and contained annotations of discourse connectives (explicit and
implicit) and their arguments. PDTB 2.0, released in January 2008, is
enriched with annotations of speaker attribution and the senses of
connectives. Related publications.
- Automated evaluation of coherence
in student essays implements a centering-based algorithm to identify
topic discontinuities in student essays. Statistical analysis of the
performance of the algorithm on a corpus of student essays shows that
the topic discontinuity model can improve the performance of e-rater, the automated essay scoring system developed at ETS. Related publications (link coming).
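
To give a rough idea of what a centering-based check for topic discontinuities can look like, here is a minimal sketch in Python. It is not the algorithm used in the essay coherence project above; it only applies the textbook Centering Theory transition labels. It assumes that each utterance has already been reduced to a list of entity names ordered from most to least salient, which in a real system would require parsing and reference resolution. The function names, the label strings, and the choice to treat both rough shifts and utterances with no shared entity as discontinuities are all assumptions made for illustration.

```python
from typing import List, Optional


def backward_center(prev: List[str], cur: List[str]) -> Optional[str]:
    """Return the highest-ranked entity of the previous utterance
    that is also mentioned in the current one (None if there is none)."""
    for entity in prev:
        if entity in cur:
            return entity
    return None


def transitions(utterances: List[List[str]]) -> List[str]:
    """Label the transition into each utterance after the first.

    Each utterance is a list of entity names, most salient first
    (hypothetical input format; a real system must compute it).
    """
    labels = []
    prev_cb = None  # backward-looking center of the previous utterance
    for i in range(1, len(utterances)):
        prev, cur = utterances[i - 1], utterances[i]
        cb = backward_center(prev, cur)
        cp = cur[0] if cur else None  # preferred (top-ranked) center
        if cb is None:
            label = "NO_CB"  # nothing links this utterance to the last one
        elif prev_cb is None or cb == prev_cb:
            label = "CONTINUE" if cb == cp else "RETAIN"
        else:
            label = "SMOOTH_SHIFT" if cb == cp else "ROUGH_SHIFT"
        labels.append(label)
        prev_cb = cb
    return labels


def discontinuities(utterances: List[List[str]]) -> List[int]:
    """Indices of utterances flagged as topic discontinuities
    (assumption: rough shifts and utterances with no shared entity)."""
    flagged = {"ROUGH_SHIFT", "NO_CB"}
    return [i + 1 for i, label in enumerate(transitions(utterances))
            if label in flagged]


if __name__ == "__main__":
    essay = [
        ["John", "store"],   # John went to the store.
        ["John", "milk"],    # He bought milk.
        ["Mary", "milk"],    # Mary likes milk.
        ["weather"],         # The weather was nice.
    ]
    print(transitions(essay))
    print(discontinuities(essay))
```

In this toy essay, the third and fourth utterances are the ones flagged. The proportion of flagged utterances in an essay is the kind of feature that could then be passed to a scoring model, although how the project above actually uses such a signal is not described here.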