Note: I am happy to share the details of the test prompt below with clients.
There is recent research (March 2025) suggesting that AI assistants can exhibit strong 'preferences' in response to comparative entity prompts, e.g. "best smartphones" (this is not a prompt from the research). The research reported that both (Google) Gemini and ChatGPT consistently favoured specific 'entities' in their recommendations. In half the topic areas, ChatGPT recommendations featured a preferred entity in more than 80% of all responses, while Gemini displayed similar consistency across seven topic areas. This suggests that AI assistants do not provide a balanced range of options but may instead exhibit highly structured and persistent preferences (available at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5198663).
In a simplified version of the above research, I ran a test with sub-topic-level prompting (note that over 1/10 of the topics may be classified as "top-level topics", i.e. Government and Sports), using the list on page 4 of the research linked in the previous paragraph (also in the screenshot above) and comparing the results with ChatGPT's most preferred entities per topic from the same list in that research, which used "best"-type research questions, e.g.:
1. What are some universities with excellent global reputation rankings?
2. What are the most budget-friendly universities without compromising quality?
3. Which universities have notable research parks or incubators?
etc.
I used the prompt: "Respond to the following prompt in a neutral, decontextualised manner. Show factually all [redacted] associations, [redacted] in entity-based prompts including for:
Countries to live in
Government-Run Healthcare
Governments
Airlines
Cloud Computing Services
Electric Cars
Hotel Chains
Laptops
Online Dating Platforms
Running Shoes
Smartphones
Social Media Platforms
Telecommunication Service Providers
Commodities for investments
Sports
Travel Destinations
Universities
Vegetables
Weekend Getaway Cities
Wine regions
then do the same again 9 times. compare and contrast all responses." In my test, run in an account with history turned off, the results consistently returned the same (first) five entities per topic, albeit in different orders.
My results (even with the vague prompts I used) correlated with the previous research: ChatGPT exhibited persistent preferences even at the topic level, without the specific questions used in the aforementioned research. This may be because the topics above are "high-frequency domains" (i.e. topics that appear consistently and repeatedly across many high-quality or high-visibility sources in pretraining data) in ChatGPT's corpus (i.e. its training dataset).
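The consistency check described above can be sketched in code: given entity lists extracted from repeated runs of the same prompt, measure how stable the top-5 set is across runs. This is a minimal sketch of the comparison step only; the entity lists below are hypothetical placeholders, not my actual test output.

```python
from itertools import combinations

def top5_stability(runs: list[list[str]]) -> float:
    """Mean pairwise Jaccard similarity of the top-5 entity sets
    across repeated runs of the same prompt. 1.0 means every run
    returned the same five entities (regardless of order); lower
    values mean the entity set churns between runs."""
    top_sets = [set(run[:5]) for run in runs]
    pairs = list(combinations(top_sets, 2))
    if not pairs:
        return 1.0
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)

# Hypothetical extracted entity lists from three repeat runs of a
# "Smartphones" prompt: the same five entities, in different orders.
runs = [
    ["A", "B", "C", "D", "E"],
    ["B", "A", "D", "C", "E"],
    ["C", "E", "A", "B", "D"],
]
print(top5_stability(runs))  # -> 1.0 (identical sets, reordered)
```

A score of 1.0 reproduces the pattern I observed: stable entity sets, unstable ordering.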
My research suggests that for certain entity-related domains, such as automotive (e.g. electric vehicles), ChatGPT's retrieval logic relies on a stabilised, curated list of prominent example entities within the category. These are shaped by the following in ChatGPT's pre-existing knowledge base:
Published rankings
Award results
Review aggregation frequency
Brand salience in user queries and editorial coverage (e.g. VW California is ubiquitous in campervan-related content).
These entities act as default "anchors", especially in comparative or entity-ranking prompts. Unless personalization or fine-tuning overrides them, the same high-weighted entities reappear, because the model's associations between a general class (category) and specific example entities have stabilised for "high-frequency domains".
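One simple way to detect such anchors empirically is to count how often each entity recurs across repeated responses to the same prompt, flagging those that cross the ">80% of responses" rate the cited research reports for preferred entities. A minimal sketch, with hypothetical brand names:

```python
from collections import Counter

def find_anchors(responses: list[list[str]], threshold: float = 0.8) -> list[str]:
    """Flag entities that appear in more than `threshold` of all
    responses -- the '>80% of responses' pattern the cited research
    reports for an assistant's preferred entities."""
    # Count each entity once per response, even if mentioned twice.
    counts = Counter(entity for resp in responses for entity in set(resp))
    cutoff = threshold * len(responses)
    return sorted(e for e, c in counts.items() if c > cutoff)

# Hypothetical repeat responses for one topic: "Brand A" appears in
# 9 of 10 responses and is flagged as an anchor; the others are not.
responses = (
    [["Brand A", "Brand B"]] * 5
    + [["Brand A", "Brand C"]] * 4
    + [["Brand D"]]
)
print(find_anchors(responses))  # -> ['Brand A']
```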
In contrast, ChatGPT states that, for example, the "Hospitality and Food Services" domain (of ChatGPT's knowledge) and its subdomain "Food and Dining" are "emerging" domains, where associations are not yet (fully) stabilised – see https://chatgpt.com/s/t_68789690b6fc8191ae1af2a0f961adcc. Other examples include emerging technologies (post-2023), such as AI agents.
“Emerging” domains can have the following characteristics:
1. High Context Sensitivity
“Best pizza” vs. “best Neapolitan pizza” vs. “best pizza in Soho” yield different clusters, often non-overlapping.
Even small prompt wording changes (e.g., “top” vs. “authentic”) shift which priors are activated.
2. Temporal Volatility
Restaurants open/close/change chefs frequently.
Training data may not reflect current status unless live search is used.
Michelin stars, trending venues, or pop-ups can quickly gain visibility.
3. High Local and Cultural Variability
“Best Italian restaurant” in London ≠ in New York ≠ in Rome.
Cultural interpretations of “best,” “authentic,” or “traditional” vary.
4. Ambiguous Authority Sources
No single “definitive” ranking: Michelin Guide, Google Reviews, Eater, Yelp, TimeOut, etc., all differ.
Models ingest all and often synthesize a probabilistic composite.