Note: I am happy to share the details of the test prompt below with clients.
There is recent research (March 2025) suggesting that AI assistants can exhibit strong 'preferences' in response to comparative entity prompts, e.g. "best smartphones" (this is not a prompt from the research). The research reported that both (Google) Gemini and ChatGPT consistently favoured specific 'entities' in their recommendations. In half the topic areas, ChatGPT recommendations featured a preferred entity in more than 80% of all responses, while Gemini displayed similar consistency across seven topic areas. This suggests that AI assistants do not provide a balanced range of options but may instead exhibit highly structured and persistent preferences (available at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5198663).
In a simplified version of the above research, I ran a test with sub-topic-level prompting (note that over 1/10 of the topics may be classified as "top-level topics", i.e. Government and Sports), using the list on page 4 of the research linked in the previous paragraph (also in the screenshot above) and comparing the results with ChatGPT's most preferred entities per topic from the same list in that research, which used "best"-type research questions, e.g.:
1. What are some universities with excellent global reputation rankings?
2. What are the most budget-friendly universities without compromising quality?
3. Which universities have notable research parks or incubators?
etc.
I used the prompt: "Respond to the following prompt in a neutral, decontextualised manner. Show factually all [redacted] associations, [redacted] in entity-based prompts including for:
Countries to live in
Government-Run Healthcare
Governments
Airlines
Cloud Computing Services
Electric Cars
Hotel Chains
Laptops
Online Dating Platforms
Running Shoes
Smartphones
Social Media Platforms
Telecommunication Service Providers
Commodities for investments
Sports
Travel Destinations
Universities
Vegetables
Weekend Getaway Cities
Wine regions
then do the same again 9 times. compare and contrast all responses." In my test, run in an account with history turned off, the results consistently returned the same (first) five entities per topic, albeit in different orders.
My results (even with the vague prompts I used) correlated with the previous research: ChatGPT exhibited persistent preferences even at the topic level, without the specific questions used in the aforementioned research. This may be because the topics above are "high-frequency domains" (i.e. topics that appear consistently and repeatedly across many high-quality or high-visibility sources in pretraining data) in ChatGPT's corpus (i.e. its training dataset).
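The consistency check described above can be sketched in code: given entity lists extracted from repeated runs of the same prompt, measure how stable the top-5 set is across runs. This is a minimal sketch of the comparison step only; the entity lists below are hypothetical placeholders, not my actual test output.

```python
from itertools import combinations

def top5_stability(runs: list[list[str]]) -> float:
    """Mean pairwise Jaccard similarity of the top-5 entity sets
    across repeated runs of the same prompt. 1.0 means every run
    returned the same five entities (regardless of order); lower
    values mean the entity set churns between runs."""
    top_sets = [set(run[:5]) for run in runs]
    pairs = list(combinations(top_sets, 2))
    if not pairs:
        return 1.0
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)

# Hypothetical extracted entity lists from three repeat runs of a
# "Smartphones" prompt: the same five entities, in different orders.
runs = [
    ["A", "B", "C", "D", "E"],
    ["B", "A", "D", "C", "E"],
    ["C", "E", "A", "B", "D"],
]
print(top5_stability(runs))  # -> 1.0 (identical sets, reordered)
```

A score of 1.0 reproduces the pattern I observed: stable entity sets, unstable ordering.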
My research suggests that for certain entity-related domains, such as automotive (e.g. electric vehicles), ChatGPT's retrieval logic relies on a stabilised, curated list of prominent example entities within the category. These are shaped by the following in ChatGPT's pre-existing knowledge base:
Published rankings
Award results
Review aggregation frequency
Brand salience in user queries and editorial coverage (e.g. VW California is ubiquitous in campervan-related content).
These entities act as default "anchors", especially in comparative or entity-ranking prompts. Unless personalization or fine-tuning overrides them, the same high-weighted entities reappear, because the model's associations between a general class (category) and specific example entities have stabilised for "high-frequency domains".
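One simple way to detect such anchors empirically is to count how often each entity recurs across repeated responses to the same prompt, flagging those that cross the ">80% of responses" rate the cited research reports for preferred entities. A minimal sketch, with hypothetical brand names:

```python
from collections import Counter

def find_anchors(responses: list[list[str]], threshold: float = 0.8) -> list[str]:
    """Flag entities that appear in more than `threshold` of all
    responses -- the '>80% of responses' pattern the cited research
    reports for an assistant's preferred entities."""
    # Count each entity once per response, even if mentioned twice.
    counts = Counter(entity for resp in responses for entity in set(resp))
    cutoff = threshold * len(responses)
    return sorted(e for e, c in counts.items() if c > cutoff)

# Hypothetical repeat responses for one topic: "Brand A" appears in
# 9 of 10 responses and is flagged as an anchor; the others are not.
responses = (
    [["Brand A", "Brand B"]] * 5
    + [["Brand A", "Brand C"]] * 4
    + [["Brand D"]]
)
print(find_anchors(responses))  # -> ['Brand A']
```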
In contrast, ChatGPT states that, for example, the "Hospitality and Food Services" domain (of ChatGPT's knowledge) and its subdomain "Food and Dining" are "emerging" domains, where associations are not yet (fully) stabilised – see https://chatgpt.com/s/t_68789690b6fc8191ae1af2a0f961adcc. Other examples include emerging technologies (post-2023), such as AI agents.
“Emerging” domains can have the following characteristics:
1. High Context Sensitivity
“Best pizza” vs. “best Neapolitan pizza” vs. “best pizza in Soho” yield different clusters, often non-overlapping.
Even small prompt wording changes (e.g., “top” vs. “authentic”) shift which priors are activated.
2. Temporal Volatility
Restaurants open/close/change chefs frequently.
Training data may not reflect current status unless live search is used.
Michelin stars, trending venues, or pop-ups can quickly gain visibility.
3. High Local and Cultural Variability
“Best Italian restaurant” in London ≠ in New York ≠ in Rome.
Cultural interpretations of “best,” “authentic,” or “traditional” vary.
4. Ambiguous Authority Sources
No single “definitive” ranking: Michelin Guide, Google Reviews, Eater, Yelp, TimeOut, etc., all differ.
Models ingest all and often synthesize a probabilistic composite.