Problem Statement
As reader demographics grow more diverse, ensuring that content both resonates and remains relevant is a subtle yet intricate challenge. Traditionally, content personalisation has focused on tailoring tone, style, and vocabulary to suit reader preferences. More recently, frameworks have emerged for simplification, such as 'Explain it to me like I'm five', and for targeted explanation, such as 'Explain to me like I'm a scientist'. However, even when the style is adapted, readers may still feel disconnected when the examples in the content diverge from their knowledge and interests. For a fashion enthusiast, for instance, technological references from disparate eras or geographic regions may not be relevant.
In this work, we address the entity correspondence problem across time, region, and topic of interest: given an example and its context, the task is to find an alternative example that exists in a target time, region, or topic of interest. To this end, our work uses LLMs to enhance readability by adapting entities and their context within a document to align closely with varied reader interests, making reading more engaging, relatable, and factually consistent for diverse readers.
Example
Suppose you are reading a document that mentions the Magnavox Odyssey (the text highlighted in yellow in the figure below). From the surrounding context, we recognise that even though the Magnavox Odyssey was the first to enter the market, it did not sustain its first-mover advantage. However, this example does not resonate with all readers, and it may lose relevance over time.
Our framework aims to retrieve other entities that can replace it in a similar context. For example, in the Sports domain, Reebok faced a similar outcome to the Magnavox Odyssey. If we want a more recent example, Netflix or IBM Personal Computers can be alternatives. And if we need an example for a particular region, Micromax Mobiles serves as a fitting alternative to the Magnavox Odyssey.
Approach
We use a step-by-step process. First, we identify the entities of interest along with their context within a given document. Next, we align them with an interest vocabulary of domain, time, and regional terms to generate appropriate replacements. Finally, we verify the adapted entities and their statements against credible references, and present readers with examples that match their interests. The step-by-step workflow of our framework is shown in the figure.
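The three steps can be sketched roughly as follows. This is a minimal illustration, not the framework's actual implementation: the names `EntityInContext`, `Interest`, `build_prompt`, and `adapt_document` are hypothetical, and the `identify`, `generate`, and `verify` callables are stand-ins for the entity extraction, LLM generation, and reference-checking steps, which are stubbed here with hard-coded values ("Placeholder Corp" is a made-up name used only to show a rejected candidate).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class EntityInContext:
    entity: str   # e.g. "Magnavox Odyssey"
    context: str  # what the surrounding text says about the entity


@dataclass
class Interest:
    # Hypothetical "interest vocabulary": any subset of the fields may be set.
    domain: Optional[str] = None
    period: Optional[str] = None
    region: Optional[str] = None


def build_prompt(item: EntityInContext, interest: Interest) -> str:
    """Combine the entity, its context, and the reader's interest terms."""
    labelled = (("domain", interest.domain),
                ("time period", interest.period),
                ("region", interest.region))
    constraints = "; ".join(f"{name}: {value}" for name, value in labelled if value)
    return (f"Entity: {item.entity}\n"
            f"Context: {item.context}\n"
            f"Suggest replacement entities that fit the same context ({constraints}).")


def adapt_document(
    document: str,
    interest: Interest,
    identify: Callable[[str], List[EntityInContext]],
    generate: Callable[[str], List[str]],
    verify: Callable[[str, EntityInContext], bool],
) -> Dict[str, List[str]]:
    """Step 1: identify entities. Step 2: generate candidates. Step 3: keep verified ones."""
    results: Dict[str, List[str]] = {}
    for item in identify(document):
        candidates = generate(build_prompt(item, interest))
        results[item.entity] = [c for c in candidates if verify(c, item)]
    return results


# Stubs standing in for the real extraction, LLM, and verification steps (assumptions).
def stub_identify(document: str) -> List[EntityInContext]:
    return [EntityInContext(
        "Magnavox Odyssey",
        "entered the market first but did not sustain its first-mover advantage",
    )]


def stub_generate(prompt: str) -> List[str]:
    return ["Micromax Mobiles", "Placeholder Corp"]


def stub_verify(candidate: str, item: EntityInContext) -> bool:
    return candidate != "Placeholder Corp"


adapted = adapt_document("...", Interest(region="India"),
                         stub_identify, stub_generate, stub_verify)
print(adapted)  # {'Magnavox Odyssey': ['Micromax Mobiles']}
```

Passing the three steps in as callables keeps the sketch independent of any particular LLM or reference source; in practice each stub would be replaced by a real component.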
Limitations
This pipeline uses an LLM to find these examples, so its outputs are bounded by the knowledge contained in that LLM.
Future Work
First, we plan to update the framework so that it uses an LLM to paraphrase the context of an example and to generate queries for a search engine. These queries will then be used to find alternative examples on the web. This uses the LLM as an agent, so the outputs will no longer be bounded by the knowledge in the LLM.
Second, we plan to use this pipeline to create a benchmark dataset of cross-domain entities, and to benchmark LLMs on their inherent knowledge of these correlated entities.
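As a rough illustration of the query-generation idea, the sketch below replaces the LLM with a simple template: it masks the original entity in its context and prefixes each interest term. The function names `mask_entity` and `build_queries`, the placeholder text, and the query format are all assumptions made for this example; the planned framework would have an LLM paraphrase the context and write the queries instead.

```python
import re
from typing import List


def mask_entity(context: str, entity: str, placeholder: str = "a product") -> str:
    """Replace the original entity so search results are not biased towards it."""
    return re.sub(re.escape(entity), placeholder, context, flags=re.IGNORECASE)


def build_queries(context: str, entity: str, interest_terms: List[str]) -> List[str]:
    """Template stand-in for the LLM step: one query per interest term."""
    masked = mask_entity(context, entity)
    return [f"{term}: {masked}" for term in interest_terms]


queries = build_queries(
    "Magnavox Odyssey entered the market first but lost its first-mover advantage",
    "Magnavox Odyssey",
    ["sports", "India"],
)
for query in queries:
    print(query)
```

Each resulting string could then be sent to a search engine, and the returned pages mined for candidate entities to verify.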
How did I come across this problem? 🤪
I'm an avid follower of business insights, whether they come from reading industry articles or binge-watching episodes of Shark Tank. One particular episode caught my attention—it delved deep into the pros and cons of being the first mover in a market. Intrigued, I decided to dig further into the concept. My search led me to an enlightening Harvard Business Review article on first-mover advantages. However, I noticed that the examples cited were a bit outdated.
Recognizing this gap in the literature, I set out on a digital scavenger hunt for academic papers that could shed more light on the subject. While I found several that discussed aspects like "temporal correspondence," tracing products from the Sony Walkman to the iPod, none seemed to address varying reader interests, whether regional or topical. This observation spurred me to create a framework that addresses this problem. I collaborated with my teammates, and we showcased it at the Adobe Hackathon.