Generate & Tag Topics with Hyperbolic Geometries

Hyperbolic Spaces

In figure (a), we see a concept tree in Euclidean space. Words such as space shuttle and satellite, which belong to moderately different super-concepts (vehicles and space, respectively), are pulled close together because of their semantic similarity. Their surrounding words, such as helicopter and solar system, are pulled together as well, which creates false distance relationships and a crowding effect in Euclidean space.

In figure (b), we see a concept tree in hyperbolic space (the Poincaré ball), which inherently has more room (represented by grey circles) than Euclidean space. Distances grow exponentially towards the edge of the ball, so concepts at deeper levels, such as helicopter and solar system, move apart in this growing space and end up far from each other.

The dashed blue line shows how the distances in both spaces are calculated.
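
To make the difference between the two distance calculations concrete, here is a minimal sketch that compares the Euclidean distance with the standard Poincaré ball distance, d(u, v) = arcosh(1 + 2||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2))). The 2D coordinates are hypothetical values chosen for illustration and are not taken from the figure.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance between two points."""
    return math.dist(u, v)


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball."""
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    denom = (1.0 - sq_norm_u) * (1.0 - sq_norm_v)
    return math.acosh(1.0 + 2.0 * sq_diff / denom)


# Hypothetical point pairs: both are 0.1 apart in Euclidean terms,
# but one pair sits near the center and the other near the boundary.
near_center = ((0.0, 0.0), (0.1, 0.0))
near_edge = ((0.85, 0.0), (0.95, 0.0))

for name, (u, v) in [("near center", near_center), ("near edge", near_edge)]:
    print(
        f"{name}: euclidean={euclidean_distance(u, v):.3f}, "
        f"poincare={poincare_distance(u, v):.3f}"
    )
```

The pair near the boundary comes out several times farther apart in the Poincaré metric than the pair near the center, even though both pairs have the same Euclidean separation. This is the extra room that lets deeper concepts spread out.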

HyHTM: Hyperbolic Geometry based Hierarchical Topic Models

Published at Findings of ACL, 2023 [Code] [Paper], presented at Findings Spotlight

Also presented at NeurIPS WiML Workshop, 2021 [Paper] [Oral Presentation]

Hierarchical Topic Models (HTMs) are useful for discovering topic hierarchies in a collection of documents. 

Problem 1: Traditional HTMs often produce hierarchies in which lower-level topics are unrelated to their higher-level topics or not specific enough.

Solution: We use hyperbolic geometry to create topic hierarchies that better capture hierarchical relationships in real-world concepts. To achieve this, we propose a novel method of incorporating semantic hierarchy among words from hyperbolic spaces and encoding it explicitly into topic models. This encourages the topic model to attend to parent-child relationships between topics.


Problem 2: Existing methods can be computationally expensive (taking ~32 hrs for even 18k documents).

Solution: Pre-computing the word hierarchies saves a lot of time.


The figure on the right compares the hierarchies produced by HyHTM (ours) and by CluHTM, which uses Euclidean embeddings.

Topic Label with Hierarchical Relationships


Topic models are statistical tools that learn, extract, and recognize latent topics present in a collection of documents. A topic is usually represented by a list of terms ranked by their probability, but such lists can be difficult to interpret.

We propose an unsupervised topic labeler which employs the parent-child hierarchical relations present in the hyperbolic geometry to assign labels to topics.
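
As an illustration of how parent-child relations in hyperbolic space could drive labeling, the sketch below picks a label for a topic from a toy vocabulary. It is not the labeler described above. It rests on two assumptions: that more general words lie closer to the origin of the Poincaré ball (a common property of Poincaré embeddings), and that a good label is a more general word with the smallest mean hyperbolic distance to the topic's terms. The function `label_topic` and the embedding values are hypothetical.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball."""
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))


def label_topic(topic_terms, embeddings):
    """Return the candidate word that looks most like a parent of the topic.

    Assumption: a candidate must be more general (smaller norm) than every
    topic term; among those, the one with the lowest mean distance wins.
    """
    term_vecs = [embeddings[t] for t in topic_terms]
    min_term_norm = min(math.hypot(*vec) for vec in term_vecs)
    best_label, best_score = None, math.inf
    for word, vec in embeddings.items():
        # Skip the topic's own terms and anything that is not more general.
        if word in topic_terms or math.hypot(*vec) >= min_term_norm:
            continue
        score = sum(poincare_distance(vec, tv) for tv in term_vecs) / len(term_vecs)
        if score < best_score:
            best_label, best_score = word, score
    return best_label


# Hypothetical 2D Poincaré embeddings, for illustration only.
toy_embeddings = {
    "vehicle": (0.30, 0.10),
    "space": (-0.30, 0.10),
    "helicopter": (0.80, 0.30),
    "car": (0.85, 0.10),
    "truck": (0.80, -0.10),
    "solar system": (-0.80, 0.30),
}

print(label_topic(["helicopter", "car", "truck"], toy_embeddings))
```

With these toy values the topic made of helicopter, car, and truck is labeled with its nearest more general word. A real labeler would work with learned embeddings and would need a rule for topics that have no suitable candidate.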