Course Name:

Computer Applications for Indian Languages

Parameswari Krishnamurthy || Centre for Applied Linguistics and Translation Studies, University of Hyderabad

Course Introduction

Course Description

This course provides a broad introduction to language technology, with a particular emphasis on developing computer applications in Indian languages. The major objectives of the course are:

  • To study and understand computational models of natural languages for their analysis and generation

  • To create and access multilingual knowledge resources for Indian languages

  • To provide hands-on experience in building computer applications for Indian languages

This course presents an opportunity for students to gain experience with models and algorithms used in language technology to develop practical applications for Indian languages.

Prerequisite: Knowledge of linguistics and basic computer literacy. Prior knowledge of programming is an added advantage.

Learning Outcomes

By the end of the course, students will be able to:

  • understand the need for and importance of language technology in general and Indian language technology in particular

  • explain the importance of language technology research for a multilingual country like India in addressing language barriers in communication

  • experiment with basic text analysis using Linux commands

  • use regular expressions to match and manipulate strings

  • learn and use NLTK (the Natural Language Toolkit), a leading platform for building Python programs that work with human language data

  • develop resources for building NLP tools and modules, informed by an understanding of the complexity of Indian languages

  • design the architecture of a machine translation system and explain the complexities involved

Course Outline

Week 1

  • Introduction to language technology

  • Part-1: Computational paradigms in linguistics

  • Part-2: Computational paradigms in linguistics

  • The history of language technology in the world

  • The history of language technology in India

Week 2

  • Part-1: Grammar formalisms and language modeling

  • Part-2: Grammar formalisms and language modeling

  • Computational techniques and tools

  • Goals of language technology

Week 3

  • Corpora as a basis of linguistic studies

  • Digital corpora in computational studies

  • Part-1: The importance of corpus in language technology

  • Part-2: The importance of corpus in language technology

Week 4

  • Corpus building: ways and challenges

  • Corpus and standardization

  • Corpus cleaning and normalization

  • Characterization of modern corpora

Week 5

  • Corpus frequency analysis: An introduction

  • Character frequency and syllable frequency

  • Word frequency and N-gram analysis

  • Corpus annotation

Week 6

  • Tools and techniques in text processing

  • Part-1: Linux commands

  • Part-2: Linux commands

  • Part-3: Linux commands

Week 7

  • Part-1: vi editing commands

  • Part-2: vi editing commands

  • Part-1: Regular expression and pattern matching

  • Part-2: Regular expression and pattern matching

Week 8

  • Introducing Python

  • Python commands and syntax

  • Python variables

  • Python data types

Week 9

  • Python strings, operators and regular expressions

  • Python loops and file handling

  • Part-1: Python programming

  • Part-2: Python programming

Week 10

  • Part-1: Practical session 1: Build a Python program to tokenize words

  • Part-2: Practical session 1: Build a Python program to tokenize sentences

  • Part-1: Practical session 2: Build a Python program to find frequencies

  • Part-2: Practical session 2: Build a Python program to find frequencies
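
As an illustration of the kind of program the Week 10 practical sessions aim at, the following is a minimal sketch of word tokenization and frequency counting in Python. It is not the course's own material: the function names, the punctuation set, and the sample Hindi sentence are assumptions made for this example. Tokens are obtained by splitting on whitespace and stripping punctuation from the edges, which keeps vowel signs and other combining marks attached to their words.

```python
from collections import Counter

# Punctuation stripped from token edges. Including the danda (।) and
# double danda (॥) is an assumption about the input text.
PUNCTUATION = ".,;:!?\"'()[]{}।॥"


def tokenize_words(text):
    """Split text on whitespace and strip punctuation from token edges."""
    tokens = []
    for raw in text.split():
        token = raw.strip(PUNCTUATION)
        if token:  # skip items that were punctuation only
            tokens.append(token)
    return tokens


def word_frequencies(text):
    """Return a Counter mapping each token to its number of occurrences."""
    return Counter(tokenize_words(text))


if __name__ == "__main__":
    # Illustrative sample sentence, not taken from any course corpus.
    sample = "राम घर गया। सीता घर आई।"
    for word, count in word_frequencies(sample).most_common():
        print(word, count)
```

A real corpus would need further normalization (for example, consistent Unicode forms and handling of numerals and hyphenated words), which the Week 4 topics on corpus cleaning address.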

Week 11

  • Part-1: Practical session 3: Build a Python program to analyze words

  • Part-2: Practical session 3: Build a Python program to analyze words

  • Part-1: Practical session 4: Build a Python program to generate words

  • Part-2: Practical session 4: Build a Python program to generate words

Week 12

  • Introducing NLTK

  • NLTK installation and corpus extraction

  • NLTK POS tagger for English and its application

  • NLTK parsing for English and its application

Week 13

  • Importance of building NLP tools for Indian languages

  • Complexity involved in Indian language technology

  • Part-1: Morphological complexity in Indian languages

  • Part-2: Morphological complexity in Indian languages

Week 14

  • Computer applications for Indian languages

  • Part-1: Early models and latest developments

  • Part-2: Early models and latest developments

  • Tools required for Indian languages

Week 15

  • Building knowledge resources for Indian languages

  • Morphological analysers and generators

  • POS taggers and parsing

  • Introducing machine translation

Week 16

  • Architecture of machine translation

  • Translation divergence

  • Part-1: Current approaches and developments in machine translation

  • Part-2: Current approaches and developments in machine translation

Week-1 Course Content

Topic: Introduction to Language Technology

References

Bharati, A., Chaitanya, V., & Sangal, R. 1995. Natural Language Processing: A Paninian Perspective. New Delhi: Prentice-Hall of India.

Grishman, R. 1986. Computational Linguistics: An Introduction. Cambridge: Cambridge University Press.

Hutchins, W. J., & Somers, H. L. 1992. An Introduction to Machine Translation. Vol. 362. London: Academic Press.

Jurafsky, D., & Martin, J. 2014. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. India: Dorling Kindersley Pvt. Ltd.

Kennedy, G. 2014. An Introduction to Corpus Linguistics. London and New York: Longman.

Mitkov, Ruslan (ed.). 2002. The Oxford Handbook of Computational Linguistics. Oxford: OUP.

Uma Maheshwar Rao, G., & Amba Kulkarni. 2007. Natural Language and Computing. PGDCAIL, Vol. 411. Hyderabad: CDE, University of Hyderabad.

E-text Content

Assessment Questions