Course Name:
Computer Applications for Indian Languages
Parameswari Krishnamurthy || Centre for Applied Linguistics and Translation Studies, University of Hyderabad
This course provides a broad introduction to language technology, with a particular emphasis on developing computer applications in Indian languages. The major objectives of the course are:
To study and understand computational models of natural languages for their analysis and generation
To create and access multilingual knowledge resources for Indian languages
To provide hands-on experience in building computer applications for Indian languages
This course presents an opportunity for students to gain experience with models and algorithms used in language technology to develop practical applications for Indian languages.
Prerequisite: Knowledge of linguistics and basic computer literacy. Prior knowledge of programming is an added advantage.
By the end of the course, students will be able to:
understand the need and importance of language technology in general and Indian language technology in particular.
explain the importance of language technology research for a multilingual country like India in overcoming language barriers in communication.
experiment with basic text analysis using Linux commands
use regular expressions to match and manipulate strings
learn and use the NLTK toolkit, the leading platform for building Python programs to work with human language data
develop resources for building NLP tools and modules, based on an understanding of the complexity of Indian languages
design the architecture of a machine translation system and explain the complexities involved in it
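As an illustration of the regular-expression and text-analysis outcomes listed above, here is a minimal sketch of word tokenization for Devanagari text using only the Python standard library. The sample sentence, the function name `tokenize_words`, and the choice of character range are assumptions made for this example and are not taken from the course material. The pattern names the Devanagari Unicode block explicitly because, as far as we can tell, Python's built-in `\w` class does not reliably match dependent vowel signs and may split words in the middle.

```python
import re
from collections import Counter

# Devanagari block (U+0900-U+097F) minus the danda and double danda
# (U+0964, U+0965), which mark sentence ends rather than belong to words.
DEVANAGARI_WORD = re.compile(r"[\u0900-\u0963\u0966-\u097F]+")


def tokenize_words(text):
    """Return the Devanagari word tokens found in text, in order."""
    return DEVANAGARI_WORD.findall(text)


# Hypothetical sample text, used only for illustration.
sample = "राम घर गया। सीता घर आई।"

tokens = tokenize_words(sample)
freq = Counter(tokens)

print(tokens)
print(freq.most_common(1))
```

Other scripts would need their own Unicode ranges, and real corpora usually also need normalization before tokenization.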
Part-1: Computational paradigms in linguistics
Part-2: Computational paradigms in linguistics
The history of language technology in the world
The history of language technology in India
Part-1: Grammar formalisms and language modeling
Part-2: Grammar formalisms and language modeling
Computational techniques and tools
Goals of language technology
Corpora as a basis of linguistics studies
Digital corpora in computational studies
Part-1: The importance of corpus in language technology
Part-2: The importance of corpus in language technology
Corpus building: ways and challenges
Corpus and Standardization
Corpus cleaning and normalization
Characterization of modern corpora
Corpus frequency analysis: An introduction
Character frequency and syllable frequency
Word frequency and N-gram analysis
Corpus annotation
Tools and techniques in text processing
Part-1: Linux commands
Part-2: Linux commands
Part-3: Linux commands
Part-1: VI-editing commands
Part-2: VI-editing commands
Part-1: Regular expression and pattern matching
Part-2: Regular expression and pattern matching
Introducing Python
Python commands and syntax
Python variables
Python data types
Python strings, operators and regular expressions
Python loops and file handling
Part-1: Python programming
Part-2: Python programming
Part-1: Practical session-1: Build a Python program to tokenize words
Part-2: Practical session-1: Build a Python program to tokenize sentences
Part-1: Practical session-2: Build a Python program to find frequencies
Part-2: Practical session-2: Build a Python program to find frequencies
Part-1: Practical session-3: Build a Python program to analyze words
Part-2: Practical session-3: Build a Python program to analyze words
Part-1: Practical session-4: Build a Python program to generate words
Part-2: Practical session-4: Build a Python program to generate words
Introducing NLTK
NLTK installation and corpus extraction
NLTK POS tagger for English and application
NLTK parsing for English and application
Importance of building NLP tools for Indian languages
Complexity involved in Indian language technology
Part-1: Morphological complexity in Indian languages
Part-2: Morphological complexity in Indian languages
Computer applications for Indian languages
Part-1: Early models and latest developments
Part-2: Early models and latest developments
Tools required for Indian languages
Building knowledge resources for Indian languages
Morphological analysers and generators
POS taggers and Parsing
Introducing machine translation
Architecture of machine translation
Translation divergence
Part-1: Current approaches and development in machine translation
Part-2: Current approaches and development in machine translation
Bharati, A., Chaitanya, V., & Sangal, R. 1995. Natural Language Processing: A Paninian Perspective. New Delhi: Prentice-Hall of India.
Grishman, R. 1986. Computational Linguistics: An Introduction. Cambridge: Cambridge University Press.
Hutchins, W. J., & Somers, H. L. 1992. An Introduction to Machine Translation. Vol. 362. London: Academic Press.
Jurafsky, D. and J. Martin. 2014. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. India: Dorling Kindersley Pvt. Ltd.
Kennedy, G. 2014. An Introduction to Corpus Linguistics. London and New York: Longman.
Mitkov, R. (ed.). 2002. The Oxford Handbook of Computational Linguistics. Oxford: OUP.
Uma Maheshwar Rao, G. and Amba Kulkarni. 2007. Natural Language and Computing. PGDCAIL, vol.411. Hyderabad: CDE, University of Hyderabad.