Zahurul's Web Space
Since February 2011, I have been working as a scientific staff member (Wissenschaftlicher Mitarbeiter) at the Text Technology Lab, Institut für Informatik, Goethe-Universität Frankfurt am Main. My work is part of the LOEWE Digital Humanities project in Frankfurt. In the project, I work on multilingual text classification, more specifically text readability classification and the classification of source versus translated texts. I am exploring various information-theoretic, linguistic and lexical features for both tasks. I am also pursuing my PhD under the supervision of Prof. Dr. Alexander Mehler and submitted my dissertation in December 2014.
Before joining the lab, I worked as a graduate researcher at the Dipartimento di Informatica, University of Pisa, on a Machine Translation (MT) project. The project's focus was to explore the feasibility of an MT system based on dependency parsing. I was involved in developing the first version of the Italian-to-English Tanl Translate system using the MOSES toolkit.
I completed the European Masters in Language and Communication Technology (EM-LCT) at the Department of Computational Linguistics, University of Saarland, and the Faculty of Arts, University of Groningen.
Before joining the master's program, I worked at the Center for Research on Bangla Language Processing (CRBLP), BRAC University, Bangladesh. The research focused mainly on Bangla (my native language) language processing, in particular morphological analysis, spell checking, and corpus analysis.
CV: [ pdf ]
EDUCATION
MA in Linguistics (October 2007 to August 2009)
University of Groningen, The Netherlands
BSc in Computer Science, 2005
BRAC University, Bangladesh
RESEARCH INTERESTS
Machine Learning, Digital Humanities, Statistical Alignment, Machine Translation and Natural Language Processing. My special interest is in Machine Learning based NLP applications.
HONORS AND AWARDS
- Recipient of an Erasmus Mundus Scholarship for master's studies at the University of Saarland and the University of Groningen.
- Awarded the university's High Distinction award in 2006 for attaining a high CGPA in the Department of Computer Science and Engineering.
- Recipient of a merit-based scholarship from BRAC University during the undergraduate program for maintaining a good CGPA.
PUBLICATIONS
- Islam, Zahurul and Rahman, Rashedur; Readability of Bangla News Articles for Children; In: The 28th Pacific Asia Conference on Language, Information and Computing (PACLIC); 2014.
- Islam, Zahurul and Rahman, Rashedur and Mehler, Alexander; Readability Classification of Bangla Texts; In: Proceedings of Computational Linguistics and Intelligent Text Processing; Springer, 2014.
- vor der Brück, Tim and Mehler, Alexander and Islam, Zahurul; ColLex.en: Automatically Generating and Evaluating a Full-form Lexicon for English; In: Proceedings of LREC 2014.
- Islam, Md. Zahurul and Hoenen, Armin, Source and Translation Classification using Most Frequent Words, In: Proceedings of the 6th International Joint Conference on Natural Language Processing (IJCNLP), 2013.
- Islam, Md. Zahurul and Mehler, Alexander, Automatic Readability Classification of Crowd-Sourced Data based on Linguistic and Information-Theoretic Features, In: Proceedings of the 14th International Conference on Intelligent Text Processing and Computational Linguistics, 2013.
- Islam, Md. Zahurul and Rahman, Rashedur, English to Bangla Name Transliteration System (Abstract), In: Proceedings of the 23rd Meeting of Computational Linguistics in the Netherlands (CLIN 2013), 2013.
- Islam, Md. Zahurul; Mehler, Alexander and Rahman, Rashedur, Text Readability Classification of Textbooks of a Low-Resource Language, In: Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation (PACLIC 26), 2012.
- Islam, Md. Zahurul and Mehler, Alexander, Customization of the Europarl Corpus for Translation Studies, In: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC), 2012.
- Sukhareva, Maria; Islam, Md. Zahurul; Hoenen, Armin and Mehler, Alexander, A Three-step Model of Language Detection in Multilingual Ancient Texts, In: Proceedings of the Workshop on Annotation of Corpora for Research in the Humanities, Heidelberg, Germany, 2012.
- Islam, Md. Zahurul; Mittmann, Roland and Mehler, Alexander, Multilingualism in Ancient Texts: Language Detection by Example of Old High German and Old Saxon, In: Proceedings of the GSCL conference on Multilingual Resources and Multilingual Applications (GSCL 2011), 28-30 September, Hamburg, Germany, 2011.
- Islam, Md. Zahurul; Tiedemann, Jörg and Eisele, Andreas, English to Bangla Phrase-Based Machine Translation, In: Proceedings of the 14th Annual Conference of the European Association for Machine Translation, Saint-Raphaël, France, 27-28 May, 2010.
- Bouma, Gosse; Duarte, Sergio and Islam, Md. Zahurul, Cross-lingual Alignment and Completion of Wikipedia Templates, In: Proceedings of the Third International Workshop on Cross Lingual Information Access: Addressing the Information Need of Multilingual Societies (CLIAWS3), Boulder, Colorado, USA, June 4, 2009.
- Islam, Md. Zahurul, English to Bangla Phrase-Based Statistical Machine Translation, Master's Thesis, University of Saarland and University of Groningen, 2009.
- Asadullah, Munshi; Islam, Md. Zahurul and Khan, Mumit, Error-tolerant Finite-state Recognizer and String Pattern Similarity Based Spell-Checker for Bengali, In: Proceedings of the 5th International Conference on Natural Language Processing (ICON) (poster), Hyderabad, India, 2007.
- Islam, Md. Zahurul; Uddin, Md. Nizam and Khan, Mumit, A Light Weight Stemmer for Bengali and Its Use in Spelling Checker, In: Proceedings of the 1st International Conference on Digital Communications and Computer Applications (DCCA2007), 2007.
- Islam, Md. Zahurul and Khan, Mumit, Bangla Verb Morphology and a Multilingual Computational Morphology Framework for PC-KIMMO, In: Proceedings of the Workshop on Morpho-Syntactic Analysis by the School of Asian Applied Natural Language Processing for Language Diversity and Language Resource Development (ADD), Bangkok, Thailand, 2007.
- Islam, Md. Zahurul and Khan, Mumit, JKimmo: A Multilingual Computational Morphology Framework for PC-KIMMO, In: Proceedings of the 9th International Conference on Computer and Information Technology (ICCIT 2006), Dhaka, Bangladesh, 2006.
- Rownok, Tofazzal; Islam, Md. Zahurul and Khan, Mumit, Bangla Text Input and Rendering Support for Short Message Service on Mobile Devices, In: Proceedings of the 9th International Conference on Computer and Information Technology (ICCIT 2006), Dhaka, Bangladesh, 2006.
- Arafat, Yeasir; Islam, Md. Zahurul and Khan, Mumit, Analysis and Observations From a Bangla News Corpus, In: Proceedings of the 9th International Conference on Computer and Information Technology (ICCIT 2006), Dhaka, Bangladesh, 2006.
- Islam, Md. Zahurul and Khan, Mumit, Teaching Compiler Development to Undergraduates Using a Template Based Approach, In: Proceedings of the 8th International Conference on Computer and Information Technology (ICCIT 2005), Dhaka, Bangladesh, 2005.
- Islam, Md. Zahurul, Code Generation for JVM .NET and MIPS Targets from Subset of C Language, Bachelor Thesis, BRAC University, Bangladesh, 2005.
THESES
- Md. Zahurul Islam, Multilingual Text Classification Using Information-Theoretic Features, PhD Thesis, Goethe University Frankfurt, Supervisors: Prof. Dr. Alexander Mehler and Prof. Dr. Visvanathan Ramesh. Status: Submitted.
- Md. Zahurul Islam, English to Bangla Phrase-Based Statistical Machine Translation, Master's Thesis, University of Saarland and University of Groningen, August 2009, Supervisors: Dr. Andreas Eisele, DFKI Saarbrücken and Dr. Jörg Tiedemann, University of Groningen.
- Md. Zahurul Islam, Code Generation for JVM .NET and MIPS Targets from Subset of C Language, Undergraduate Thesis (Computer Science), BRAC University, May 2005. Supervisor: Mumit Khan.
SOFTWARE
- ColLex.x: Collected lexica, in the sense that the word-form tokens underlying the lexical entries were gathered by harvesting various resources.
- CRBLP Converter: An open-source tool for converting Bangla ASCII text to Unicode text. [details] [download]
- JKimmo: An open-source multilingual computational morphology framework for PC-KIMMO. [details] [download]
- BanglaPad: An open-source, platform-independent Unicode rich-text editor capable of editing Bangla and English. [details] [download]
- C Minus (C-): A compiler framework for teaching the Compiler Design course to undergraduates at BRAC University, Bangladesh.
TEACHING
Lab Instructor (part-time), Compiler Design course, Department of Computer Science and Engineering, BRAC University, Fall 2005 to Summer 2007.
Teaching Assistant, Operating System, BRAC University, Spring 2004
Teaching Assistant, Programming Language II (C++), BRAC University, Fall 2003