Downloads

Cross-Lingual Semantic Relatedness (CLSR)

A validated translation of the original Miller-Charles (Miller and Charles, 1998) and WordSimilarity-353 (Finkelstein et al., 2001) data sets. The data set contains the English word pairs and their corresponding translations in Spanish, Romanian, and Arabic. It has been used to evaluate monolingual and cross-lingual semantic similarity in (Hassan and Mihalcea 2009).

  • If you use this data set, please cite:
 Samer Hassan and Rada Mihalcea, Cross-Lingual Semantic Relatedness using
 Encyclopedic Knowledge, in Proceedings of the Conference on Empirical Methods
 in Natural Language Processing (EMNLP), Suntec, Singapore, August 2009.
@InProceedings{Hassan09a,
 author = {Samer Hassan and Rada Mihalcea},
 title = {Cross-lingual Semantic Relatedness Using Encyclopedic Knowledge},
 booktitle = {Proceedings of the Conference on Empirical Methods in Natural
 Language Processing},
 address =      {Singapore},
 year =         {2009}
}


Learning to Identify Educational Material (LIEM)

The data set is a collection of 862 documents annotated for their educativeness value, along with other user-selected features. It has been used to identify educational materials in (Hassan and Mihalcea 2009).

  • Download (August 11, 2009)
  • If you use this data set, please cite:
 Samer Hassan and Rada Mihalcea, Learning to Identify Educational Materials, 
 in Proceedings of the Conference on Recent Advances in Natural Language 
 Processing (RANLP), Borovets, Bulgaria, September 2009.
@InProceedings{Hassan09b,
 author = {Samer Hassan and Rada Mihalcea},
 title = {Learning to Identify Educational Materials},
 booktitle = {Proceedings of the Conference on Recent Advances in 
              Natural Language Processing},
 address = {Borovets, Bulgaria},
 year = {2009}
}