Fourth Workshop on Analytics for Noisy Unstructured Text Data

October 26th, 2010, Toronto, Canada

in conjunction with CIKM 2010

Noisy unstructured text data is ubiquitous in real-world communication. Natural language, and the creative ways in which humans use it, can create problems for computational techniques. Electronic text from the Internet (emails, message boards, newsgroups, blogs, wikis, chat logs and web pages), contact centers (complaints, emails, call transcriptions, message summaries) and mobile phones (SMS) is often noisy: it contains spelling errors, abbreviations, non-standard words, false starts, repetitions,


AND 2010 is a workshop devoted to issues arising from the need to contend with noisy inputs, the impact noise can have on downstream applications, and the demands it places on document analysis. The Fourth Workshop on Analytics for Noisy Unstructured Text Data builds on three successful previous AND workshops, held in 2007 (in conjunction with the 20th International Joint Conference on Artificial Intelligence [IJCAI]), 2008 (in conjunction with the 31st Annual International ACM SIGIR Conference) and 2009 (in conjunction with the 10th International Conference on Document Analysis and Recognition [ICDAR]).

The proceedings of AND 2010 are available in the ACM Digital Library. The proceedings of AND 2009, AND 2008 and AND 2007 are also available online.


* Keynote *

The Nature of Noise in Linguistic Corpora

Randy Goebel, University of Alberta

* Panel Discussion *

Why is it Impossible to Handle Noisy Text With Existing Techniques: The Way Forward

Seamus Ross, University of Toronto
Yuji Matsumoto, Nara Institute of Science and Technology
Gareth Jones, Dublin City University

* Journal Special Issue *

Authors of selected papers from the workshop will be invited to submit expanded versions for publication in a special issue of the International Journal on Document Analysis and Recognition, published by Springer.


* IAPR Best Student Paper *

Julien Fayolle for the paper:

Julien Fayolle, Fabienne Moreau, Christian Raymond and Guillaume Gravier. Reshaping automatic speech transcripts for robust high-level spoken document analysis


* Photographs from the Workshop *