Reading, Reimagined

A Framework for Enhanced Readability through Cross-Domain Entity Alignment

[Paper, NeurIPS ML4CD] [Video] [Poster]

Simra Shahid, Surgan Jandial, Nikitha Srikanth, Balaji Krishnamurthy

Problem Statement

In the face of increasingly diverse reader demographics, ensuring that content both resonates and remains relevant poses a subtle yet intricate challenge. Traditionally, content personalisation has focused on tailoring tonality, style, and lexicon to suit reader preferences. More recently, frameworks have emerged for simplification, such as 'Explain it to me like I'm five', and for targeted explanation, such as 'Explain it to me like I'm a scientist'. However, even when the style is adapted, readers may still feel disconnected when the examples in the content diverge from their knowledge and interests. For instance, technology references may mean little to a fashion enthusiast, and examples drawn from a distant era or an unfamiliar region may feel irrelevant.

In this work, we address the entity correspondence problem across time, region, and topic of interest: given an example and its context, the task is to find an alternative example from a target time period, region, or topic. Our work uses LLMs to enhance readability by adapting the entities and context within a document to align closely with varied reader interests, with the goal of making reading more engaging, relatable, and factually consistent for diverse readers.

Example

Suppose you are reading a document that mentions the Magnavox Odyssey (see the text highlighted in yellow in the figure below). From the surrounding context, we learn that although the Magnavox Odyssey was the first to enter its market, it did not sustain its first-mover advantage. However, this example does not resonate with every reader, and it may lose relevance over time.

Our framework retrieves alternative entities that fit the same context. For example, in the sports domain, Reebok faced a similar outcome to the Magnavox Odyssey. If we want a more recent example, Netflix or IBM Personal Computers can serve as alternatives. And if we need an example for a particular region, such as India, Micromax Mobiles is a fitting alternative to the Magnavox Odyssey.

Approach

Our framework works in three steps. First, we identify the entities of interest, along with their context, within a given document. Next, we align them with an interest vocabulary of domain, time, and regional terms to generate appropriate replacements. Finally, we verify the adapted entities and their statements against credible references, and present readers with examples that match their interests. The figure shows the step-by-step workflow of our framework.
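The three steps above can be sketched as a small Python pipeline. This is a minimal sketch under stated assumptions, not the authors' implementation: the `llm` callable (prompt in, text out), the `lookup` function standing in for a credible-reference source, the prompt wording, and the names `identify_entities`, `propose_replacements`, `verify`, and `adapt` are all hypothetical. A toy stub replaces the real LLM so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical interfaces (assumptions, not a real library API):
#   LLM:    takes a prompt string, returns the model's text reply.
#   Lookup: returns evidence text for an entity from a trusted source, or "".
LLM = Callable[[str], str]
Lookup = Callable[[str], str]


@dataclass
class EntityMention:
    entity: str   # e.g. "Magnavox Odyssey"
    context: str  # the point the entity illustrates in the document


@dataclass
class Target:
    dimension: str  # "topic", "time" or "region"
    value: str      # e.g. "Sports"


def identify_entities(document: str, llm: LLM) -> List[EntityMention]:
    """Step 1: ask the LLM for example entities and their context."""
    prompt = ("List each example entity in the text and the point it illustrates, "
              "one per line as 'entity || context'.\n\n" + document)
    mentions = []
    for line in llm(prompt).splitlines():
        if "||" in line:
            entity, context = (part.strip() for part in line.split("||", 1))
            mentions.append(EntityMention(entity, context))
    return mentions


def propose_replacements(mention: EntityMention, target: Target,
                         llm: LLM, k: int = 3) -> List[str]:
    """Step 2: align the mention with the reader's target time, region or topic."""
    prompt = (f"'{mention.entity}' illustrates: {mention.context}. "
              f"Suggest up to {k} entities from the {target.dimension} "
              f"'{target.value}' that illustrate the same point, one per line.")
    return [c.strip() for c in llm(prompt).splitlines() if c.strip()][:k]


def verify(candidate: str, mention: EntityMention, lookup: Lookup, llm: LLM) -> bool:
    """Step 3: keep a candidate only if reference evidence supports the claim."""
    evidence = lookup(candidate)
    if not evidence:
        return False  # no reference found, so the candidate is dropped
    answer = llm(f"Evidence: {evidence}\nDoes the evidence show that {candidate} "
                 f"fits this statement: {mention.context}? Answer yes or no.")
    return answer.strip().lower().startswith("yes")


def adapt(document: str, target: Target, llm: LLM,
          lookup: Lookup) -> Dict[str, List[str]]:
    """Run all three steps; map each original entity to its verified alternatives."""
    return {
        m.entity: [c for c in propose_replacements(m, target, llm)
                   if verify(c, m, lookup, llm)]
        for m in identify_entities(document, llm)
    }


# Toy stand-ins so the sketch runs offline; swap in a real LLM and reference source.
def toy_llm(prompt: str) -> str:
    if prompt.startswith("List each example entity"):
        return "Magnavox Odyssey || first mover that did not sustain its lead"
    if prompt.startswith("'"):  # replacement prompt built by propose_replacements
        return "Reebok\nPlaceholderCo"
    return "yes" if "Reebok" in prompt else "no"  # verification prompt


TOY_REFERENCES = {"Reebok": "toy evidence snippet about Reebok"}

if __name__ == "__main__":
    doc = "Magnavox Odyssey was first to market but did not keep its lead."
    print(adapt(doc, Target("topic", "Sports"), toy_llm,
                lambda name: TOY_REFERENCES.get(name, "")))
    # prints {'Magnavox Odyssey': ['Reebok']}
```

In the toy run, the hypothetical `PlaceholderCo` is discarded because the reference lookup returns no evidence for it, which mirrors the idea of checking adapted entities against credible references before showing them to the reader. The `k` parameter simply caps how many candidates are verified per entity.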

Limitations

The pipeline relies on LLMs to find alternative examples, so its coverage is bounded by the knowledge encoded in those LLMs.

Future Work

How did I come across this problem? 🤪

I'm an avid follower of business insights, whether they come from reading industry articles or binge-watching episodes of Shark Tank. One particular episode caught my attention: it delved deep into the pros and cons of being the first mover in a market. Intrigued, I decided to dig further into the concept. My search led me to an enlightening Harvard Business Review article on first-mover advantages. However, I noticed that the examples it cited were a bit outdated.

Recognizing this gap, I set out on a digital scavenger hunt for academic papers that could shed more light on the subject. While I found several that discussed aspects like "temporal correspondence," tracing products from the Sony Walkman to the iPod, none seemed to address varying reader interests, be they regional or topical. This observation spurred me to create a framework that addresses this problem. I collaborated with my teammates, and we showcased it at the Adobe Hackathon.