Text Reuse and Plagiarism Detection

With ever more electronic text being created by word processors and ever wider access to electronic text via the Internet, a rise in plagiarism was inevitable and is now occurring. Higher education institutions, charged on the one hand with embracing new technology and widening access through increased participation and distance learning, and on the other with maintaining quality and standards, need tools to help combat this form of fraud. Computerised techniques that analyse lexical and phrasal features of texts can help to identify likely incidents of plagiarism and draw tutors' attention to texts that should be examined more closely to determine whether plagiarism has occurred.

This project will develop new techniques for automatically identifying text reuse and plagiarism. Several different types of text reuse (and plagiarism) can occur, for example when the original document has been translated from another language or rewritten to avoid detection. The project will focus on a subset of these types.
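
As a rough illustration of what analysing lexical and phrasal features can mean in practice, the sketch below scores how many of a suspicious text's word n-grams also appear in a candidate source (a containment measure). It is a generic baseline, not the project's own method; the function names `word_ngrams` and `containment`, the tokenisation rule, and the default n-gram length of 3 are all assumptions made for this example.

```python
import re


def word_ngrams(text, n=3):
    """Return the set of word n-grams in `text` (lowercased, punctuation dropped)."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def containment(suspicious, source, n=3):
    """Fraction of the suspicious text's n-grams that also occur in the source.

    Hypothetical helper for illustration: 0.0 means no shared n-grams,
    1.0 means every n-gram of the suspicious text appears in the source.
    """
    suspicious_ngrams = word_ngrams(suspicious, n)
    if not suspicious_ngrams:
        # Text shorter than n words has no n-grams to compare.
        return 0.0
    shared = suspicious_ngrams & word_ngrams(source, n)
    return len(shared) / len(suspicious_ngrams)
```

A high score only flags a pair of texts for closer reading by a tutor. Exact n-gram matching of this kind misses reuse that has been translated or heavily rewritten, which are among the cases the project description mentions.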


Contact Person:

Dr. Rao Muhammad Adeel Nawab, Assistant Professor, COMSATS Institute of Information Technology, Lahore, Pakistan.

Foreign Advisers:

Dr. Mark Stevenson, Senior Lecturer, Department of Computer Science, University of Sheffield, UK.

Dr. Paul Clough, Senior Lecturer, Information School, University of Sheffield, UK.


Selected Publications:

  • R. Nawab, M. Stevenson and P. Clough (2012) "Detecting Text Reuse with Modified and Weighted N-grams", In *SEM: The First Joint Conference on Lexical and Computational Semantics, Association for Computational Linguistics (ACL), pp. 54-58, Montreal, Canada.

  • R. Nawab, M. Stevenson and P. Clough (2012) "Retrieving Candidate Plagiarised Documents using Query Expansion", In Proceedings of the 34th European Conference on Information Retrieval (ECIR), pp. 207-218, Barcelona, Spain.