Invited Speaker

Wray Buntine - Making Topic Models More Usable

The output of topic models has always been seductive but not quite satisfying, ever since the early work of Hofmann (PLSI) and Lee and Seung (NMF). An important approach to cleaning up the semantics is output analysis using coherence, and indeed other document summarisation methods could also be used. However, this talk argues that topic models themselves need attention. New ways of modelling document semantics are being explored in the field of deep neural networks. Similarly, non-parametric versions of topic models allow modelling such effects as document structure, word sparsity, word burstiness, background words, multi-word terms, network effects from author or follower networks, and semantic word hierarchies. These are usually done in the spirit of deep neural networks using hierarchical models, but earlier algorithms were often too slow to be realistic.
This talk will start with a brief tour of some of the variants, which can only be superficial given the huge number. This will be followed by a brief tour of some non-parametric methods known to be moderately efficient and suited to multi-core implementation. Note that the most important effect, modelling coherence, is currently poorly developed. The talk will then present experimental results on various versions of topic models to see how they can mitigate some of the unwanted artifacts of simple LDA.

Wray Buntine joined Monash University as a professor in February 2014 after 7 years at NICTA in Canberra, Australia. He is a co-director of the Machine-Learning Flagship and director of the Master of Data Science. He was previously at the Helsinki Institute for Information Technology from 2002, and at NASA Ames Research Center, University of California, Berkeley, and Google. He is known for his theoretical and applied work in graphical models, Bayesian non-parametric methods, and document and text analysis. He applies probabilistic and non-parametric methods to tasks such as text analysis. He was programme co-chair of ECML-PKDD in Bled, Slovenia, in 2009, programme co-chair of ACML in Singapore in 2012, and general co-chair of ACML in Canberra in 2013. He reviews for conferences such as ACML, ECIR, SIGIR, ECML-PKDD, ICML, NIPS, UAI, and KDD, and is on the editorial board of Data Mining and Knowledge Discovery. He distributes the HCA topic modelling suite, currently the only multi-core non-parametric topic modelling software with comparable speed to parametric software.