Parameswari Krishnamurthy - Introduction to Computer Application for Indian Languages

Course Name:

Computer Applications for Indian Languages

Parameswari Krishnamurthy || Centre for Applied Linguistics and Translation Studies, University of Hyderabad

Course Introduction

Course Description

This course provides a broad introduction to language technology, with particular emphasis on developing computer applications for Indian languages. The major objectives of the course are:

To study and understand computational models of natural languages for their analysis and generation
To create and access multilingual knowledge resources for Indian languages
To provide hands-on experience in building computer applications for Indian languages

This course presents an opportunity for students to gain experience with models and algorithms used in language technology to develop practical applications for Indian languages.

Prerequisite: Knowledge of linguistics and basic computer literacy. Prior knowledge of programming is an added advantage.

Learning Outcomes

By the end of the course, students will be able to:

understand the need for and importance of language technology in general and Indian language technology in particular
explain the importance of language technology research for a multilingual country like India in overcoming language barriers in communication
experiment with basic text analysis using Linux commands
use regular expressions to match and manipulate strings
learn and use NLTK, a leading toolkit for building Python programs that work with human language data
develop resources for building NLP tools and modules, informed by an understanding of the complexity of Indian languages
design the architecture of a machine translation system and explain the complexities involved in it

Course Outline

Week 1

Introduction to language technology
Part-1: Computational paradigms in linguistics
Part-2: Computational paradigms in linguistics
The history of language technology in the world
The history of language technology in India

Week 2

Part-1: Grammar formalisms and language modeling
Part-2: Grammar formalisms and language modeling
Computational techniques and tools
Goals of language technology

Week 3

Corpora as a basis of linguistic studies
Digital corpora in computational studies
Part-1: The importance of corpus in language technology
Part-2: The importance of corpus in language technology

Week 4

Corpus building: ways and challenges
Corpus and Standardization
Corpus cleaning and normalization
Characterization of modern corpora

Week 5

Corpus frequency analysis: An introduction
Character frequency and syllable frequency
Word frequency and N-gram analysis
Corpus annotation

Week 6

Tools and techniques in text processing
Part-1: Linux commands
Part-2: Linux commands
Part-3: Linux commands

Week 7

Part-1: VI-editing commands
Part-2: VI-editing commands
Part-1: Regular expression and pattern matching
Part-2: Regular expression and pattern matching

Week 8

Introducing Python
Python commands and syntax
Python variables
Python data types

Week 9

Python strings, operators and regular expressions
Python loops and file handling
Part-1: Python programming
Part-2: Python programming

Week 10

Part-1: Practical session-1: Build a Python program to tokenize words
Part-2: Practical session-1: Build a Python program to tokenize sentences
Part-1: Practical session-2: Build a Python program to find frequencies
Part-2: Practical session-2: Build a Python program to find frequencies
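
As a rough illustration of what the Week 10 practical sessions cover, the sketch below tokenizes sentences and words and counts word frequencies using only the Python standard library. It is not the course's own material: the function names (`tokenize_sentences`, `tokenize_words`, `word_frequencies`), the punctuation list, and the Hindi sample text are assumptions chosen for illustration. Words are split on whitespace rather than with the regex class `\w`, because in Python's `re` module `\w` does not match the combining vowel signs used in scripts such as Devanagari.

```python
import re
from collections import Counter

# Sentence-final marks: ., ?, ! plus the danda (U+0964) and double danda (U+0965).
# Split on whitespace that follows one of these marks.
SENTENCE_END = re.compile(r"(?<=[.?!\u0964\u0965])\s+")

# Punctuation stripped from the edges of tokens (illustrative, not exhaustive).
PUNCT = ".,;:?!\"'()[]{}\u0964\u0965"


def tokenize_sentences(text):
    """Split text into sentences, keeping the sentence-final mark."""
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def tokenize_words(text):
    """Split on whitespace, then strip punctuation from token edges."""
    tokens = (tok.strip(PUNCT) for tok in text.split())
    return [tok for tok in tokens if tok]


def word_frequencies(text):
    """Count how often each word token occurs in the text."""
    return Counter(tokenize_words(text))


if __name__ == "__main__":
    sample = "राम घर गया। राम ने खाना खाया।"  # hypothetical sample text
    print(tokenize_sentences(sample))
    print(tokenize_words(sample))
    print(word_frequencies(sample).most_common(3))
```

This simple approach treats abbreviations such as "Dr." as sentence ends and does not normalize spelling variants, so real corpora would likely need the cleaning and normalization steps listed under Week 4 first.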

Week 11

Part-1: Practical session-3: Build a Python program to analyze words
Part-2: Practical session-3: Build a Python program to analyze words
Part-1: Practical session-4: Build a Python program to generate words
Part-2: Practical session-4: Build a Python program to generate words

Week 12

Introducing NLTK
NLTK installation and corpus extraction
NLTK POS tagger for English and its application
NLTK parsing for English and its application

Week 13

Importance of building NLP tools for Indian languages
Complexity involved in Indian language technology
Part-1: Morphological complexity in Indian languages
Part-2: Morphological complexity in Indian languages

Week 14

Computer applications for Indian languages
Part-1: Early models and latest developments
Part-2: Early models and latest developments
Tools required for Indian languages

Week 15

Building knowledge resources for Indian languages
Morphological analysers and generators
POS taggers and Parsing
Introducing machine translation

Week 16

Architecture of machine translation
Translation divergence
Part-1: Current approaches and developments in machine translation
Part-2: Current approaches and developments in machine translation

Week-1 Course Content

Topic: Introduction to Language Technology

References

Bharati, A., Chaitanya, V., & Sangal, R. 1995. Natural Language Processing: A Paninian Perspective. New Delhi: Prentice-Hall of India.

Grishman, R. 1986. Computational Linguistics: An Introduction. Cambridge: Cambridge University Press.

Hutchins, W. J., & Somers, H. L. 1992. An Introduction to Machine Translation. Vol. 362. London: Academic Press.

Jurafsky, D., & Martin, J. 2014. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. India: Dorling Kindersley Pvt. Ltd.

Kennedy, G. 2014. An Introduction to Corpus Linguistics. London and New York: Longman.

Mitkov, R. (ed.). 2002. The Oxford Handbook of Computational Linguistics. Oxford: Oxford University Press.

Uma Maheshwar Rao, G., & Kulkarni, A. 2007. Natural Language and Computing. PGDCAIL, vol. 411. Hyderabad: CDE, University of Hyderabad.

E-text Content

Assessment Questions

Discussion Forums

groups.google.com/g/cail-group/
