Topological Data Analysis in Natural Language Processing
Wlodek Zadrozny, UNC Charlotte
Abstract: Topological Data Analysis (TDA) provides methods for capturing the underlying structure of shapes in data. Over the last two decades, TDA has been examined mostly in unsupervised machine learning tasks. It has often been considered an alternative to conventional algorithms because it can handle high-dimensional data in tasks including, but not limited to, clustering, dimensionality reduction, and descriptive modeling. This tutorial will focus on applications of topological data analysis to text data. After introducing the fundamentals, we will show ways in which topological information can be applied to example natural language processing (NLP) tasks, leading to new insights or improved accuracy. Examples include classification, sentence acceptability judgments, the structure of word embeddings, comparisons of writing styles, summarization, and others, such as fraud detection.
Bio: Dr. Wlodek Zadrozny joined the faculty of the University of North Carolina at Charlotte in 2013, after a 27-year career at the IBM T.J. Watson Research Center. He is Professor of Computer Science and Professor of Data Science at UNC Charlotte. His research focuses on natural language understanding and its applications. At IBM, from 2008 to 2013, Dr. Zadrozny was a member of the Watson project, which built the Jeopardy!-playing machine, and he was subsequently a recipient of the 2013 AAAI Feigenbaum Prize for his contributions to the project. As a scientist at IBM Research, he led and contributed to a range of projects, including semantic search, natural language dialogue systems, and a value net analysis of intangible assets. Dr. Zadrozny has published about a hundred refereed papers on various aspects of text processing and has been granted sixty patents.