Course Name: Computer Applications for Indian Languages
Parameswari Krishnamurthy, Centre for Applied Linguistics and Translation Studies, University of Hyderabad
Course Introduction
Course Description
This course provides a broad introduction to language technology with a particular emphasis on developing computer applications in Indian languages. The major objectives of the course are:
To study and understand computational models of natural languages for their analysis and generation
To create and access multilingual knowledge resources for Indian languages
To provide hands-on experience in building computer applications for Indian languages
This course gives students experience with the models and algorithms used in language technology for developing practical applications for Indian languages.
Prerequisites: Knowledge of linguistics and basic computer literacy. Prior knowledge of programming is an added advantage but is not required.
Learning Outcomes
By the end of the course, students will be able to:
understand the need for and importance of language technology in general, and of Indian language technology in particular.
explain the importance of language technology research for a multilingual country like India in overcoming language barriers in communication.
experiment with basic text analysis using Linux commands
use regular expressions to match and manipulate strings
learn and use NLTK (the Natural Language Toolkit), a leading platform for building Python programs to work with human language data
develop resources for building NLP tools and modules, informed by an understanding of the complexity of Indian languages
design the architecture of a machine translation system and explain the complexities involved in it
Course Outline
Week 1
Part-1: Computational paradigms in linguistics
Part-2: Computational paradigms in linguistics
The history of language technology in the world
The history of language technology in India
Week 2
Part-1: Grammar formalisms and language modeling
Part-2: Grammar formalisms and language modeling
Computational techniques and tools
Goals of language technology
Week 3
Corpora as a basis for linguistic studies
Digital corpora in computational studies
Part-1: The importance of corpora in language technology
Part-2: The importance of corpora in language technology
Week 4
Corpus building: methods and challenges
Corpora and standardization
Corpus cleaning and normalization
Characterization of modern corpora
Week 5
Corpus frequency analysis: An introduction
Character frequency and syllable frequency
Word frequency and N-gram analysis
Corpus annotation
Week 6
Tools and techniques in text processing
Part-1: Linux commands
Part-2: Linux commands
Part-3: Linux commands
Week 7
Part-1: vi editing commands
Part-2: vi editing commands
Part-1: Regular expression and pattern matching
Part-2: Regular expression and pattern matching
Week 8
Introducing Python
Python commands and syntax
Python variables
Python data types
Week 9
Python strings, operators and regular expressions
Python loops and file handling
Part-1: Python programming
Part-2: Python programming
Week 10
Part-1: Practical session-1: Build a Python program to tokenize words
Part-2: Practical session-1: Build a Python program to tokenize sentences
Part-1: Practical session-2: Build a Python program to find frequencies
Part-2: Practical session-2: Build a Python program to find frequencies
Week 11
Part-1: Practical session-3: Build a Python program to analyze words
Part-2: Practical session-3: Build a Python program to analyze words
Part-1: Practical session-4: Build a Python program to generate words
Part-2: Practical session-4: Build a Python program to generate words
Week 12
Introducing NLTK
NLTK installation and corpus extraction
NLTK POS tagger for English and its applications
NLTK parsing for English and its applications
Week 13
Importance of building NLP tools for Indian languages
Complexity involved in Indian language technology
Part-1: Morphological complexity in Indian languages
Part-2: Morphological complexity in Indian languages
Week 14
Computer applications for Indian languages
Part-1: Early models and latest developments
Part-2: Early models and latest developments
Tools required for Indian languages
Week 15
Building knowledge resources for Indian languages
Morphological analysers and generators
POS taggers and Parsing
Introducing machine translation
Week 16
Architecture of machine translation
Translation divergence
Part-1: Current approaches and development in machine translation
Part-2: Current approaches and development in machine translation
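To give a concrete sense of the Week 10 practical sessions (tokenizing words, tokenizing sentences and finding frequencies), here is a minimal Python sketch using only the standard library. It is an illustration, not the course's reference solution: the function names, the sample sentences and the set of punctuation marks (including the Devanagari danda ।) are assumptions made for this example.

```python
import re
from collections import Counter

# Sentence-final marks assumed for this sketch: full stop, question mark,
# exclamation mark, and the Devanagari danda (U+0964) and double danda (U+0965).
SENTENCE_END = r"[.?!\u0964\u0965]"

# A word token is any maximal run of characters that is neither whitespace
# nor one of the punctuation marks listed here. r"\w+" is avoided on purpose:
# Python's \w does not match Indic vowel signs (categories Mn/Mc), so it
# would split words such as "गया" into fragments.
WORD = re.compile(r"[^\s.,?!;:\"'()\[\]\u0964\u0965]+")


def tokenize_sentences(text):
    """Split text at whitespace that follows a sentence-final mark."""
    parts = re.split(r"(?<=" + SENTENCE_END + r")\s+", text.strip())
    return [part for part in parts if part]


def tokenize_words(text):
    """Return the word tokens of text, with punctuation dropped."""
    return WORD.findall(text)


def word_frequencies(text):
    """Count how often each word token occurs in text."""
    return Counter(tokenize_words(text))


if __name__ == "__main__":
    sample = "राम घर गया। सीता भी घर गई। Ram went home."
    for sentence in tokenize_sentences(sample):
        print(sentence)
    for word, count in word_frequencies(sample).most_common(3):
        print(word, count)
```

Running the file prints the three sample sentences and then the most frequent words, with घर (which occurs twice) first. The character-class approach is a simple design choice that works across scripts; the course's later NLTK sessions cover more complete tokenizers.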
Week 1 Course Content
Topic: Introduction to Language Technology
References
Bharati, A., Chaitanya, V., & Sangal, R. 1995. Natural Language Processing: A Paninian Perspective. New Delhi: Prentice-Hall of India.
Grishman, R. 1986. Computational Linguistics: An Introduction. Cambridge: Cambridge University Press.
Hutchins, W. J., & Somers, H. L. 1992. An Introduction to Machine Translation. Vol. 362. London: Academic Press.
Jurafsky, D., & Martin, J. 2014. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. India: Dorling Kindersley Pvt. Ltd.
Kennedy, G. 2014. An Introduction to Corpus Linguistics. London and New York: Longman.
Mitkov, R. (Ed.). 2002. The Oxford Handbook of Computational Linguistics. Oxford: OUP.
Uma Maheshwar Rao, G., & Kulkarni, A. 2007. Natural Language and Computing. PGDCAIL, vol. 411. Hyderabad: CDE, University of Hyderabad.