Research


The underlying theme of my work has been the search for informative patterns in signals, including raw speech, images, and text, as well as signals enhanced for different communication channels.

  • Noisy Text Analytics
  • Biological Knowledge Discovery Infrastructure
  • Audio-Visual Speech Recognition
  • Underwater Image Processing
  • Coding for AWGN and Fading Channels

Scroll down for more details on each of the above topics.


Current Research

 

Noisy Text Analytics

Noisy unstructured text data results from informal communications (online chat, SMS, e-mails, message boards, tweets, etc.) and from text produced by processing (automatic speech recognition, optical character recognition, machine translation, historical text, etc.). Noise severely degrades the performance of information processing algorithms (e.g., NLP, classification, clustering, information retrieval, summarization, information extraction). This work develops techniques that overcome the noise and still produce useful text analytics. The focus is on automatically extracting useful information from noisy, informal, unstructured human-human communications.
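To make the kind of noise described above concrete, here is a minimal Python sketch that maps SMS-style tokens back to dictionary words. It is a toy illustration, not the method of any paper listed below: it assumes noisy tokens are formed only by deleting characters from a clean word, and the names `VOCABULARY`, `is_subsequence` and `normalize` are hypothetical.

```python
# Toy SMS normalization under a deletion-only noise model (an assumption).
VOCABULARY = ["please", "call", "me", "tomorrow", "before", "the", "meeting"]


def is_subsequence(token, word):
    """Return True if token can be obtained from word by deleting characters."""
    remaining = iter(word)
    # `ch in remaining` advances the iterator, so character order is enforced.
    return all(ch in remaining for ch in token)


def normalize_token(token, vocabulary):
    """Map one noisy token to the closest vocabulary word, or keep it as-is."""
    if token in vocabulary:
        return token
    candidates = [w for w in vocabulary if is_subsequence(token, w)]
    if not candidates:
        return token  # e.g. digit substitutions are outside this noise model
    # Prefer the candidate that needs the fewest deleted characters.
    return max(candidates, key=lambda w: len(token) / len(w))


def normalize(message, vocabulary):
    """Normalize a whitespace-separated message token by token."""
    return " ".join(normalize_token(t, vocabulary) for t in message.split())


print(normalize("pls cal me tmrw b4 the mtg", VOCABULARY))
```

The token "b4" is left unchanged because a digit substitution cannot be explained by character deletion alone, which is one reason real systems need richer noise models than this sketch.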

I co-founded the workshop series on Analytics for Noisy Unstructured Text Data (AND) and co-chaired it in 2007, 2008, 2009 and 2010. I was guest co-editor of the International Journal on Document Analysis and Recognition (IJDAR) for three special issues on Noisy Text Analytics in 2007, 2009 and 2011. This focus has successfully highlighted research challenges in processing noisy unstructured text data. As a result, several top text processing and text mining conferences, such as NAACL HLT, ACL, WAIM, SDM and ECIR, have recently included noisy text analytics as a prominent topic in their calls for papers. In fact, NAACL HLT 2010 was completely focused on this topic. I also gave a tutorial on this topic at NAACL HLT 2010.

In 2011, my team's work on Noisy Text Analytics was recognized as an IBM Research accomplishment.

 

Relevant Publications:


Overviews and Surveys:


* Data Cleansing Techniques for Large Enterprise Datasets

K. H. Prasad, T. A. Faruquie, S. Joshi, S. Chaturvedi, L. V. Subramaniam, M. K. Mohania
Proceedings: SRII Global Conference (SRII), San Jose, CA, USA, Mar 30 - Apr 2, 2011

* Data Cleansing as a Transient Service (Industry Track)

T. A. Faruquie, K. H. Prasad, L. V. Subramaniam, M. Mohania, G. Venkatachaliah, S. Kulkarni, P. Basu
Proceedings: IEEE International Conference on Data Engineering (ICDE), Long Beach, CA, USA, Mar 1-6, 2010

* A Survey of Types of Text Noise and Techniques to Handle Noisy Text

L. V. Subramaniam, S. Roy, T. A. Faruquie, S. Negi
Proceedings: Third Workshop on Analytics for Noisy Unstructured Text Data (AND), Pages 115-122, Barcelona, Spain, Jul 23-24, 2009

* Analytics for Noisy Unstructured Text Data II

L. V. Subramaniam, S. Roy
Encyclopedia of Artificial Intelligence, Information Science Reference, 2008

* Analytics for Noisy Unstructured Text Data I

S. Roy, L. V. Subramaniam
Encyclopedia of Artificial Intelligence, Information Science Reference, 2008

 

Handling Noise in Informal Communications (SMS, Emails):


* Handling Noisy Queries in Cross Language FAQ Retrieval

D. Contractor, G. Kothari, T. A. Faruquie, L. V. Subramaniam, S. Negi
Proceedings: Conference on Empirical Methods in Natural Language Processing (EMNLP), Cambridge, MA, USA, Oct 9-11, 2010

* Unsupervised Cleansing of Noisy Text

D. Contractor, T. A. Faruquie, L. V. Subramaniam
Proceedings: International Conference on Computational Linguistics (COLING), Beijing, China, Aug 23-27, 2010

* Transfer of Supervision for Improved Address Standardization

G. Kothari, T. A. Faruquie, L. V. Subramaniam, K. H. Prasad, M. Mohania
Accepted: International Conference on Pattern Recognition (ICPR), Istanbul, Turkey, Aug 23-26, 2010

* Automatically Generating Term Frequency Induced Taxonomies (Short Paper)

K. Murthy, T. A. Faruquie, L. V. Subramaniam, K. H. Prasad, M. Mohania
Proceedings: Meeting of the Association for Computational Linguistics (ACL), Uppsala, Sweden, Jul 11-16, 2010

* A Knowledge Acquisition Method for Improving Data Quality in Services Engagements (Industry Track)

M. Dani, T. A. Faruquie, R. Garg, G. Kothari, M. Mohania, K. H. Prasad, L. V. Subramaniam, V. Swamy
Proceedings: IEEE International Conference on Services Computing (SCC), Miami, Florida, Jul 5-10, 2010

* Customer Focused Service Management for Contact Centers

M. Bhide, S. Negi, L. V. Subramaniam, H. Gupta
IBM Journal of Research and Development (IBMR): Special Issue on Global Service Delivery Technology, Vol. 53, No. 6, Paper 9, 2009

* Language Independent Unsupervised Learning of Short Message Service Dialect

S. Acharyya, S. Negi, L. V. Subramaniam, S. Roy
International Journal on Document Analysis and Recognition (IJDAR): Special Issue on Noisy Text Analytics, Springer, Vol. 12, No. 3, pp. 175-184, September 2009

* SMS Based Interface for FAQ Retrieval

G. Kothari, S. Negi, T. A. Faruquie, V. T. Chakaravarthy, L. V. Subramaniam
Proceedings: Joint Conference of the Association for Computational Linguistics and International Joint Conference on Natural Language Processing (ACL-IJCNLP), Pages 852-860, Singapore, Aug 2-7, 2009

* Mobile Medicine: Providing Drug Related Information through Natural Language Queries

A. Langer, B. Kumar, A. Mittal, L. V. Subramaniam
Proceedings: IEEE International Advance Computing Conference (IACC), Pages 546-551, Patiala, India, Mar 6-7, 2009

* Unsupervised Learning of Multilingual Short Messaging Service (SMS) Dialect From Noisy Examples

S. Acharyya, S. Negi, L. V. Subramaniam, S. Roy
Proceedings: SIGIR 2008 Workshop on Analytics for Noisy Unstructured Text Data (AND), Pages 67-74, Singapore, Jul 24, 2008

* Automatically Selecting Answer Templates to Respond to Customer Emails

R. Malik, L. V. Subramaniam, S. Kaushik
Proceedings: International Joint Conference on Artificial Intelligence (IJCAI), Pages 1659-1664, Hyderabad, India, Jan. 6-12, 2007


Noise in Enterprise Datasets:


* Optimal Training Data Selection for Rule-Based Cleansing Models

S. Chaturvedi, T. A. Faruquie, L. V. Subramaniam, K. H. Prasad, G. Venkatachaliah, S. Padmanabhan
Proceedings: SRII Global Conference (SRII), San Jose, CA, USA, Mar 30 - Apr 2, 2011

* Estimating Accuracy for Text Classification Tasks on Large Unlabelled Data

S. Chaturvedi, T. A. Faruquie, L. V. Subramaniam, M. K. Mohania
Proceedings: ACM Conference on Information and Knowledge Management (CIKM), Toronto, Canada, Oct 26-30, 2010

* Automatically Generating Term Frequency Induced Taxonomies (Short Paper)

K. Murthy, T. A. Faruquie, L. V. Subramaniam, K. H. Prasad, M. Mohania
Proceedings: Meeting of the Association for Computational Linguistics (ACL), Uppsala, Sweden, Jul 11-16, 2010

* Resource Allocation and SLA Determination for Large Data Processing Services Over Cloud (Industry Track)

K. H. Prasad, T. A. Faruquie, L. V. Subramaniam, M. Mohania, G. Venkatachaliah
Proceedings: IEEE International Conference on Services Computing (SCC), Miami, Florida, Jul 5-10, 2010

* A Knowledge Acquisition Method for Improving Data Quality in Services Engagements (Industry Track)

M. Dani, T. A. Faruquie, R. Garg, G. Kothari, M. Mohania, K. H. Prasad, L. V. Subramaniam, V. Swamy
Proceedings: IEEE International Conference on Services Computing (SCC), Miami, Florida, Jul 5-10, 2010


Handling Noise Resulting from Processing (Noisy Speech Transcripts):


* Automatically Extracting Dialog Models From Conversation Transcripts (Short Paper)

S. Negi, S. Joshi, A. Chalamalla, L. V. Subramaniam
Proceedings: IEEE International Conference on Data Mining (ICDM), Miami, FL, USA, Dec 6-9, 2009

* Unsupervised Segmentation of Conversational Transcripts

K. Kummamuru, D. Padmanabhan, S. Roy, L. V. Subramaniam
Statistical Analysis and Data Mining (SAM), Vol. 2, No. 4, pp. 231-245, November 2009

* Protecting Sensitive Customer Information in Call Center Recordings

T. A. Faruquie, S. Negi, L. V. Subramaniam
Proceedings: IEEE International Conference on Services Computing (SCC), Pages 81-88, Bangalore, India, Sept 21-25, 2009

* Getting Insights From the Voices of Customers: Conversation Mining at a Contact Center

H. Takeuchi, L. V. Subramaniam, T. Nasukawa, S. Roy
Information Sciences (IS): Special Issue on Chance Discovery, Elsevier, Vol. 179, No. 11, pp. 1584-1591, May 2009

* Business Intelligence from Voice of Customer (Industry Track)

L. V. Subramaniam, T. A. Faruquie, S. Ikbal, S. Godbole, M. K. Mohania
Proceedings: International Conference on Data Engineering (ICDE), Shanghai, China, Mar 29-April 4, 2009

* Identification of Class Specific Discourse Patterns

A. K. Chalamalla, S. Negi, L. V. Subramaniam, G. Ramakrishnan
Proceedings: ACM Conference on Information and Knowledge Management (CIKM), Pages 1193-1202, Napa Valley, CA, Oct 26-30, 2008

* Exploiting Context to Detect Sensitive Information in Call Center Conversations (Poster Paper)

T. A. Faruquie, S. Negi, A. K. Chalamalla, L. V. Subramaniam
Proceedings: ACM Conference on Information and Knowledge Management (CIKM), Pages 1513-1514, Napa Valley, CA, Oct 26-30, 2008

* Unsupervised Segmentation of Conversational Transcripts

K. Kummamuru, Deepak P., S. Roy, L. V. Subramaniam
Proceedings: SIAM International Conference on Data Mining (SDM), Pages 834-845, Atlanta, Georgia, Apr. 24-26, 2008 

* Sentence Boundary Detection in Conversational Speech Transcripts using Noisily Labelled Examples

H. Takeuchi, L. V. Subramaniam, S. Roy, D. Punjani, T. Nasukawa
International Journal on Document Analysis and Recognition (IJDAR): Special Issue on Noisy Text Analytics, Springer, Vol. 10, No. 3-4, pp. 147-155, Dec. 2007

* A Conversation-Mining System for Gathering Insights to Improve Agent Productivity (Short Paper)

H. Takeuchi, L. V. Subramaniam, T. Nasukawa, S. Roy, S. Balakrishnan
Proceedings: IEEE Joint Conference on E-Commerce Technology and Enterprise Computing, E-Commerce and E-Services (CEC-EEE), Pages 465-468, Tokyo, Japan, Jul. 23-26, 2007

* Automatic Identification of Important Segments and Expressions for Mining of Business-Oriented Conversations at Contact Centers

H. Takeuchi, L. V. Subramaniam, T. Nasukawa, S. Roy
Proceedings: Conference on Empirical Methods in Natural Language Processing (EMNLP), Pages 458-467, Prague, Czech Republic, Jun. 28-30, 2007

* Adding Sentence Boundaries to Conversational Speech Transcriptions Using Noisily Labeled Examples

T. Nasukawa, D. Punjani, S. Roy, L. V. Subramaniam, H. Takeuchi
Proceedings: IJCAI 2007 Workshop on Analytics for Noisy Unstructured Text Data (AND), Pages 71-78, Hyderabad, India, Jan 8, 2007

* Automatic Generation of Domain Models for Call-Centers from Noisy Transcriptions

S. Roy, L. V. Subramaniam
Proceedings: Joint International Conference on Computational Linguistics and the Conference of the Association for Computational Linguistics (COLING-ACL), Pages 737-744, Sydney, Australia, Jul. 17-21, 2006

 

Past Research

 

Biological Knowledge Discovery Infrastructure

Biological text data is being generated at a very fast rate. New research findings are published and made available online (for example, on PubMed). For medical researchers, drug designers, biologists, life scientists, chemists, and others, this is the literature to read to stay up to date with the latest in the field. However, because of the volume of data, it is very difficult to home in on the relevant information. In this project, new methods for biological knowledge discovery were developed.


Relevant Publications:

 

* Text Analytics for Life Science Using the Unstructured Information Management Architecture

R. Mack, S. Mukherjea, A. Sofer, N. Uramoto, E. Brown, A. Coden, J. Cooper, A. Inokuchi, B. Iyer, Y. Mass, H. Matsuzawa, L. V. Subramaniam
IBM Systems Journal (IBMR): Special Issue on Unstructured Information Management, Vol. 43, No. 3, pp. 490-515, 2004

* Enhancing a Biomedical Information Extraction System with Dictionary Mining and Context Disambiguation

S. Mukherjea, L. V. Subramaniam, G. Chanda, S. Sankararaman, R. Kothari, V. Batra, D. Bhardwaj, B. Srivastava
IBM Journal of Research and Development (IBMR): Special Issue on Research in Asia, Vol. 48, No. 5-6, pp. 693-702, 2004

* Information Extraction from Biomedical Literature: Methodology, Evaluation and an Application

L. V. Subramaniam, S. Mukherjea, P. Kankar, B. Srivastava, V. S. Batra, P. V. Kamesam, R. Kothari
Proceedings: ACM Conference on Information and Knowledge Management (CIKM), Pages 410-417, New Orleans, USA, Nov. 3-8, 2003

 

Audio-Visual Speech Recognition

In this project I worked on the twin problems of multimodal speech recognition and audio-driven facial animation. Audio and video provide orthogonal information in many cases, and their combination has been shown to greatly aid speech recognition. Methods were proposed for tracking the lips on a talking face and for extracting visual features for speech recognition from the lip region. Techniques for audio-driven facial animation, using morphing-based and other approaches, were also proposed. Techniques for adapting the phone set of one language to another were developed, and a translingual visual speech synthesis engine based on this research was built.


Relevant Publications:

 

* Animating Expressive Faces Across Languages

A. Verma, L. V. Subramaniam, N. Rajput, C. Neti, T. A. Faruquie
IEEE Transactions on Multimedia (TMM), Vol. 6, No. 6, pp. 791-800, Dec. 2004

* Using Viseme Based Acoustic Models for Speech Driven Lip Synthesis

A. Verma, N. Rajput, L. V. Subramaniam
Proceedings: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Pages V 720-723, Hong Kong, Apr. 6-10, 2003 and
Proceedings: International Conference on Multimedia and Expo (ICME), Volume: 3, Pages 533-536, Jul. 6-9, 2003

* An English-Hindi Statistical Machine Translation System

T. A. Faruquie, L. V. Subramaniam, R. Udupa
Proceedings: Symposium on Translation Support Systems (STRANS), Kanpur, India, Mar. 15-17, 2002

* Animating Expressive Faces to Speak in Indian Languages

T. A. Faruquie, C. Neti, N. Rajput, L. V. Subramaniam, A. Verma
Proceedings: National Conference on Communications (NCC), Pages 355-362, Bombay, India, Jan. 26-27, 2002

* Audio Driven Facial Animation for Audio-Visual Reality

T. Faruquie, A. Kapoor, R. Kate, N. Rajput, L. V. Subramaniam
Proceedings: IEEE International Conference on Multimedia and Expo (ICME), Pages 821-824, Tokyo, Japan, Aug. 22-25, 2001

* On Deriving a Phoneme Model for a New Language

N. Mukherjee, N. Rajput, L. V. Subramaniam, A. Verma
Proceedings: IEEE International Conference on Spoken Language Processing (ICSLP), Pages 198-201, Beijing, China, Oct. 16-20, 2000

* Adapting Phonetic Decision Trees Between Languages for Continuous Speech Recognition

N. Rajput, L. V. Subramaniam, A. Verma
Proceedings: IEEE International Conference on Spoken Language Processing (ICSLP), Pages 850-852, Beijing, China, Oct. 16-20, 2000

* Large Vocabulary Audio-Visual Speech Recognition using Active Shape Models

T. A. Faruquie, A. Majumdar, N. Rajput, L. V. Subramaniam
Proceedings: International Conference on Pattern Recognition (ICPR), Pages 106-109, Barcelona, Spain, Sep. 3-8, 2000

* Translingual Visual Speech Synthesis

T. A. Faruquie, C. Neti, N. Rajput, L. V. Subramaniam, A. Verma
Proceedings: IEEE International Conference on Multimedia and Expo (ICME), Pages 1089-1092, New York, USA, Jul. 30-Aug. 2, 2000

* Audio-Visual Large Vocabulary Continuous Speech Recognition in the Broadcast Domain

S. Basu, C. Neti, N. Rajput, A. Senior, L. Subramaniam, A. Verma
Proceedings: 1999 IEEE International Workshop on Multimedia Signal Processing (MMSP), Pages 475-481, Copenhagen, Denmark, Sep. 13-15, 1999

 

Underwater Image Processing

I worked as part of the team developing algorithms and software for the Department of Electronics sponsored project on ADvanced Object VIsualization Techniques (ADOVIT) for the underwater scenario, from August 1993 to September 1996. This work involved forming visual images from sonar data, which is very sparse and noisy. A multiframe imaging technique was developed to reduce speckle and noise. The sonar data is used as hard constraints and merged with a shape-from-shading model of the scene obtained from visual images. Segmentation and surface understanding techniques are then used to form a dense image for a human observer.
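As a rough illustration of why combining several frames helps against speckle, the Python sketch below averages simulated frames of a flat scene. It is not the technique from the papers listed below: it assumes the frames are already aligned, models speckle as unit-mean exponential multiplicative noise, and uses plain averaging as a stand-in. The names `simulate_frame` and `average_frames` are hypothetical.

```python
import random
import statistics


def simulate_frame(scene, rng):
    """Corrupt each pixel with multiplicative, unit-mean exponential speckle."""
    return [pixel * rng.expovariate(1.0) for pixel in scene]


def average_frames(frames):
    """Pixel-wise mean over a list of aligned frames of equal length."""
    count = len(frames)
    return [sum(values) / count for values in zip(*frames)]


rng = random.Random(0)           # fixed seed so the run is repeatable
scene = [10.0] * 1000            # assumed flat scene of constant intensity
frames = [simulate_frame(scene, rng) for _ in range(16)]

single_std = statistics.pstdev(frames[0])
averaged_std = statistics.pstdev(average_frames(frames))
print(f"single-frame std: {single_std:.2f}, 16-frame average std: {averaged_std:.2f}")
```

For independent noise, averaging N frames reduces the standard deviation by roughly a factor of sqrt(N), so 16 frames should give about a fourfold reduction in this simulation.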


Relevant Publications:

 

* Segmentation and Surface Fitting of Sonar Images for 3-D Visualization

L. V. Subramaniam, R. Bahl
Proceedings: 9th International Symposium on Unmanned Untethered Submersible Technology (UUST), Pages 350-359, Durham, NH, USA, Sep. 25-27, 1995

* Reduction of Speckle and Environmental Noise Using Multiframe Imaging Technique

R. Bahl, L. V. Subramaniam, M. Kumar, N. Rajpal
Proceedings: 9th International Symposium on Unmanned Untethered Submersible Technology (UUST), Pages 290-299, Durham, NH, USA, Sep. 25-27, 1995


Coding for AWGN and Fading Channels

How good is a code designed for the AWGN channel? I tried to answer this question by obtaining a lower bound on the largest achievable rate versus the Euclidean distance of the code. This work also suggests constellations in 2, 3 and 4 dimensions over which asymptotically good codes may be found. For fading channels, I worked on TCM schemes with asymmetric PSK signal sets specially designed to maximize performance.
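For readers unfamiliar with asymmetric PSK, the Python sketch below builds one common parametrization of an asymmetric 8-PSK signal set: a unit-energy QPSK constellation plus a copy of it rotated by an angle phi. It then computes the minimum squared Euclidean distance between points. This parametrization and the names `asymmetric_8psk` and `min_squared_distance` are assumptions made for illustration. The design criteria and signal sets used in the papers below are not reproduced here.

```python
import cmath
import math
from itertools import combinations


def asymmetric_8psk(phi):
    """Unit-energy QPSK points plus the same points rotated by phi radians."""
    base_angles = [k * math.pi / 2 for k in range(4)]
    angles = base_angles + [a + phi for a in base_angles]
    return [cmath.exp(1j * a) for a in angles]


def min_squared_distance(points):
    """Smallest squared Euclidean distance over all pairs of signal points."""
    return min(abs(p - q) ** 2 for p, q in combinations(points, 2))


# phi = pi/4 gives ordinary (symmetric) 8-PSK; other values skew the spacing.
for phi in (math.pi / 4, math.pi / 6):
    d2 = min_squared_distance(asymmetric_8psk(phi))
    print(f"phi = {phi:.4f} rad, minimum squared distance = {d2:.4f}")
```

Moving phi away from pi/4 shrinks the smallest distance between neighbouring points while enlarging others, which is the trade-off a coded scheme can exploit when point labels are assigned jointly with the trellis.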

 

Relevant Publications:

 

* Gilbert-Varshamov Bound for Euclidean Space Codes Over Distance-Uniform Signal Sets

B. S. Rajan, L. V. Subramaniam, R. Bahl
IEEE Transactions on Information Theory (T-IT), Vol. 48, pp. 537-546, Feb. 2002

* Performance of 4 and 8-State TCM Schemes with Asymmetric 8-PSK in Fading Channels

L. V. Subramaniam, B. S. Rajan, R. Bahl
IEEE Transactions on Vehicular Technology (TVT), Vol. 49, No. 1, pp. 211-219, Jan. 2000

* Gilbert-Varshamov Bound for Euclidean Space Codes Over Signal Sets Matched to Groups

B. S. Rajan, L. V. Subramaniam, R. Bahl
Proceedings: National Conference on Communications (NCC), Pages 355-362, Kharagpur, India, Jan. 30-31, 1999

* Trellis Coded Modulation Schemes for Underwater Acoustic Communications

L. V. Subramaniam, B. S. Rajan, R. Bahl
Proceedings: IEEE OCEANS, Pages 800-804, Nice, France, Sep. 28 - Oct. 1, 1998

* A 4-state Asymmetric 8-PSK TCM Scheme for Rayleigh Fading Channels

L. V. Subramaniam, B. S. Rajan, R. Bahl
Proceedings: IEEE International Symposium on Information Theory (ISIT), Page 252, MIT, USA, Aug. 16-21, 1998

* An Optimal 16-ary Constellation for Underwater Acoustic Communications

L. V. Subramaniam, R. Bahl, B. S. Rajan
Proceedings: National Symposium on Ocean Electronics (NSOE), Pages 29-34, Cochin, India, Dec. 16-17, 1997

 
