Shared task: Product business idea generation from patents (PBIG)

PBIG has concluded. Thank you for your participation!

The shared task paper has been published:

Wataru Hirota, Chung-Chi Chen, Tomoko Ohkuma, Tomoki Taniguchi and Tatsuya Ishigaki. Overview of PBIG Shared Task at AgentScen 2025: Product Business Idea Generation from Patents.

Important dates

April 4th, 2025: Registration Open!
- If you're interested in this task, register using this form to receive announcements: https://docs.google.com/forms/d/e/1FAIpQLSfYcJI5UiBk-1-TUUl3eMznWyqP5aV-9C4PB5Ma8zbckiDRVQ/viewform
April 4th, 2025: Patent dataset release
- To access the data, please register using the form above.
April 30th, 2025: System output / paper submission form open
June 1st, 2025: Deadline for system output submission
- We're accepting system outputs! https://sites.google.com/view/agentscen/shared-task/sysmte-output-submission-pbig-shared-task
June 16th, 2025: Release of evaluation results
June 20th, 2025: Deadline for paper submission

Time zone: Anywhere On Earth (AOE)

Motivation

Product business idea generation from patents remains comparatively underexplored. Recent language technologies, including LLMs, have shown promise in tasks such as scientific hypothesis generation and discovery [1] [2]. However, generating viable business ideas requires multifaceted competencies, including a deep understanding of the relevant domain, the ability to pinpoint unmet user needs, and the creative synthesis of novel concepts.

By leveraging patents as a rich source of technical information, we expect that LLMs can facilitate the ideation process in a manner that leads to practical and innovative products. Harnessing the potential of these models for business idea generation promises to accelerate AI-driven innovation.

[1] Wang et al. SciMON: Scientific Inspiration Machines Optimized for Novelty. ACL 2024.

[2] Si et al. Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers. ICLR 2025.

Task Definition

We provide participants with a set of patents (including full text and figures). For each patent, the task is to generate a business idea for a potential future product that uses the patent’s technology.

In this shared task, the scope of ideas is limited to products that can realistically be launched or implemented within three years using the patent.

A system proposed by a participant takes a patent as an input and generates the following four explanatory texts:

Product title: A concise name for your product (up to 100 characters).
Product description: A brief explanation of the product outlining its essential features and functions, the target users, their needs, and the benefits provided by the product (up to 300 characters).
Implementation: An explanation describing how you will implement the patent’s technology into your product (up to 300 characters).
Differentiation: An explanation highlighting what makes your product unique and the reason why it stands out from existing solutions (up to 300 characters).
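The character limits above can be checked before submission. The following is a minimal sketch, not an official validator: it assumes the field names of the output format (title, product_description, implementation, differentiation) and assumes that length is counted in Python characters (code points), which this page does not specify. The function names are hypothetical.

```python
# Character limits taken from the four fields listed above.
FIELD_LIMITS = {
    "title": 100,
    "product_description": 300,
    "implementation": 300,
    "differentiation": 300,
}


def over_limit_fields(idea: dict) -> dict:
    """Return {field: number_of_excess_characters} for fields over their limit."""
    return {
        field: len(idea.get(field, "")) - limit
        for field, limit in FIELD_LIMITS.items()
        if len(idea.get(field, "")) > limit
    }


def truncate_idea(idea: dict) -> dict:
    """Return a copy of the idea with each limited field cut to its maximum length."""
    clipped = dict(idea)  # shallow copy so the caller's dict is not modified
    for field, limit in FIELD_LIMITS.items():
        if field in clipped:
            clipped[field] = clipped[field][:limit]
    return clipped
```

Running over_limit_fields on each generated idea makes it possible to shorten a text deliberately instead of having it cut off mid-sentence.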

Participants are allowed to use any external data in addition to the input patent. For example, you can use other patents or crawled websites as additional inputs to your system. These strategies are recommended as they may enhance the diversity of generated ideas and improve the user needs–product match, while also fostering deeper discussion at the AgentScen workshop at IJCAI-2025.

Data

Input

The patent dataset

*To access the data, please register using this form: https://docs.google.com/forms/d/e/1FAIpQLSfYcJI5UiBk-1-TUUl3eMznWyqP5aV-9C4PB5Ma8zbckiDRVQ/viewform

There is a directory for each category (NLP, Computer Science, Materials Chemistry).
Each directory contains:
- A JSONL file with the patent contents.
- A folder named pdf_and_image that includes subdirectories. Each subdirectory is named after a patent’s publication number and contains:
  - A PDF file of the patent.
  - Figure images in JPG format.

The input patent document (JSON) is represented in the following format:

{
  "title": "...",
  "application_number": "...",
  "publication_number": "...",
  "publication_date": "YYYY/MM/DD",
  "abstract": "...",
  "claims": "...",
  "description": "..."
}

This is an example.

{
  "title": "Natural language processing for restricting user access to systems",
  "application_number": "US-2024039919-A9",
  "publication_number": "US-202117564168-A",
  "publication_date": "2024/02/01",
  "abstract": "A method and system determine network ... (truncated)",
  "claims": "What is claimed is: 1 . A method for determining network based access to restricted systems, comprising. ... (truncated)",
  "description": "BACKGROUND Technical Field The present disclosure generally relates to networking systems ... (truncated)"
}
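The directory layout and record format described above can be read with a few lines of Python. This is a minimal sketch under stated assumptions: the JSONL file is assumed to end in .jsonl and the figures in .jpg (the exact file names are not given on this page), and load_patents and figure_paths are hypothetical helper names.

```python
import json
from pathlib import Path


def load_patents(category_dir):
    """Read every JSONL file in one category directory into a list of dicts."""
    patents = []
    for jsonl_path in sorted(Path(category_dir).glob("*.jsonl")):
        with jsonl_path.open(encoding="utf-8") as f:
            for line in f:
                if line.strip():  # skip blank lines
                    patents.append(json.loads(line))
    return patents


def figure_paths(category_dir, publication_number):
    """List the JPG figures stored for one patent (empty if there are none)."""
    patent_dir = Path(category_dir) / "pdf_and_image" / publication_number
    if not patent_dir.is_dir():
        return []
    return sorted(patent_dir.glob("*.jpg"))
```

Each returned record can then be accessed by the keys shown in the format above, for example patent["claims"] or patent["publication_number"].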

In the shared task, we sampled 50 USPTO patents for each of three categories: NLP, Computer Science, and Materials Chemistry. We chose these patents based on experts’ availability and the expected diversity of product ideas.

May 2, 2025: The output file spec has been updated. patent_id is no longer used; use publication_number instead.

Output

A system should output a JSON file that contains a list of future product descriptions.

[
  {
    "publication_number": "...",
    "title": "...",
    "product_description": "...",
    "implementation": "...",
    "differentiation": "..."
  },
  {
    "publication_number": "...",
    "title": "...",
    "product_description": "...",
    "implementation": "...",
    "differentiation": "..."
  },
  ...
]

This is an example of a future product.

{
  "publication_number": "US-202117564168-A",
  "title": "NameGuard: AI-Powered Access Control for Enterprise Systems",
  "product_description": "NameGuard helps IT admins and compliance teams block unauthorized access by checking user names against global deny lists and using AI to catch name variations. It’s ideal for finance, defense, and critical infrastructure sectors needing strong security and compliance.",
  "implementation": "Use the patented method to integrate a name screening API into login or user registration flows. Names are matched against an updated denylist, decomposed, and analyzed via a neural network to detect obfuscated identities. Access decisions are then returned to the enterprise system.",
  "differentiation": "Unlike traditional DPL checks, NameGuard detects partial or altered name matches using name decomposition and machine learning. It adapts to evolving threats, aggregates multi-source deny lists, and flags suspect names not yet on known lists, reducing false negatives and increasing compliance accuracy."
}
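A submission file in the format above could be assembled as follows. This is a minimal sketch rather than an official tool: it assumes the five keys shown in the output format and rejects a second idea for the same publication_number, since only one idea per patent is accepted. The function names and the output path are hypothetical.

```python
import json

# Keys of one product idea, in the order shown in the output format above.
REQUIRED_KEYS = (
    "publication_number",
    "title",
    "product_description",
    "implementation",
    "differentiation",
)


def build_submission(ideas):
    """Check each idea and return a list containing only the required keys."""
    seen = set()
    records = []
    for idea in ideas:
        missing = [key for key in REQUIRED_KEYS if key not in idea]
        if missing:
            raise ValueError(f"missing keys: {missing}")
        number = idea["publication_number"]
        if number in seen:
            raise ValueError(f"more than one idea for patent {number}")
        seen.add(number)
        records.append({key: idea[key] for key in REQUIRED_KEYS})
    return records


def write_submission(ideas, path):
    """Write the ideas as a single JSON list, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(build_submission(ideas), f, ensure_ascii=False, indent=2)
```

Writing through json.dump rather than string formatting avoids the missing commas and unbalanced braces that hand-built JSON tends to have.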


Evaluation and leaderboard

Overview: In this task, submissions are evaluated through pairwise comparisons. Given a pair of business ideas (A, B) derived from the same patent, human and LLM-based evaluators choose one of the following options with respect to the evaluation criteria explained below: “A is better,” “B is better,” “Tie,” or “Neither is good.”

Human Evaluators: Domain experts manually evaluate sampled submissions.

Automatic Evaluators: Three different prompts are fed into three different LLMs to evaluate all the submissions. One example prompt is shown below; the other prompts are hidden from participants to prevent evaluation metric hacking.

## Inputs

Read (1) a patent and (2) two product business ideas using the technology in the patent.

</patent>

</idea>

</idea>

## Task

Your task is to choose the better idea from the perspective of **{{ criteria }}**.

{{ criteria_description }}.

## Output format

Output the judgment in the following format:

```json

{"idea_id": "<1 or 2>", "reason": <"reason for the choice">}

```
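A reply in the output format requested by the example prompt above could be parsed as shown below. This is a minimal sketch, not the organizers' evaluation code: it assumes the judge returns either a bare JSON object or one wrapped in a json code fence, and it treats anything else as unparsable. The name parse_judgment is hypothetical, and the hidden prompts may use a different output format.

```python
import json
import re

# Build the fence marker programmatically so it does not appear literally here.
FENCE_MARK = "`" * 3
FENCED_JSON = re.compile(
    FENCE_MARK + r"(?:json)?\s*(\{.*?\})\s*" + FENCE_MARK, re.DOTALL
)


def parse_judgment(reply):
    """Return (idea_id, reason) from a judge reply, or None if it cannot be parsed."""
    match = FENCED_JSON.search(reply)
    payload = match.group(1) if match else reply.strip()
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    idea_id = str(data.get("idea_id", "")).strip()
    if idea_id not in {"1", "2"}:
        return None  # the example prompt only allows 1 or 2
    return idea_id, str(data.get("reason", ""))
```

Returning None for malformed replies lets the caller decide whether to retry the judge or discard the comparison.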

Evaluation criteria

An idea is evaluated from the following perspectives:

Technical validity: Is the patent suitable for the product? Is the implementation feasible? Can it be done within three years?
Innovativeness: Does the patented technology offer a novel solution to the demand?
Specificity: Is the idea specific? For example, “help researchers manage references” is more specific than “help researchers do research.”
Need validity: Do the described users really need this solution?
Market size: Is the market large enough? Are there many potential users?
Competitive advantage: What business advantage does the product gain by using this patented technology?

Ranking Submissions

[1] Elo. The Rating of Chessplayers, Past and Present. Ishi Press, 1986.
[2] Chatbot Arena: Elo Rating Calculation (July 17, 2023) https://colab.research.google.com/drive/1RAWb22-PFNI-X1gPVzc927SGUdfr6nsR?usp=sharing#scrollTo=B_PYA7oVyaHO
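The references above point to Elo ratings [1] and the Chatbot Arena Elo calculation [2]. The following is a minimal sketch, assuming that submissions are ranked by Elo ratings updated from the pairwise judgments; the exact procedure is not stated on this page. The K-factor, the initial rating, and the function names are assumptions, and “Neither is good” judgments are expected to be mapped to "tie" or dropped by the caller.

```python
def expected_score(rating_a, rating_b):
    """Expected score of A against B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_ratings(comparisons, k=4.0, initial=1000.0):
    """Update ratings sequentially from (system_a, system_b, outcome) triples.

    outcome is "a" (A is better), "b" (B is better), or "tie".
    """
    outcome_to_score = {"a": 1.0, "b": 0.0, "tie": 0.5}
    ratings = {}
    for system_a, system_b, outcome in comparisons:
        rating_a = ratings.setdefault(system_a, initial)
        rating_b = ratings.setdefault(system_b, initial)
        score_a = outcome_to_score[outcome]
        expected_a = expected_score(rating_a, rating_b)
        # Zero-sum update: whatever A gains, B loses.
        ratings[system_a] = rating_a + k * (score_a - expected_a)
        ratings[system_b] = rating_b - k * (score_a - expected_a)
    return ratings
```

Because sequential Elo updates depend on the order of the comparisons, a ranking built this way is usually stabilized by shuffling the comparisons and averaging over several runs.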

Baseline methods

This is the prompt of a baseline method.

I give you the description of a patent. Read it.

</patent>

## Task

Generate one business idea for a product using this patent.

Output the idea in the following format:

{
  "product_title": "...",
  "product_description": "...",
  "implementation": "...",
  "differentiation": "...",
}

## Rules

- product_title: A concise name for your product (up to 100 characters).

- product_description: A brief explanation of the product outlining its essential features and functions, the target users, their needs, and the benefits provided by the product (up to 300 characters).

- implementation: An explanation describing how you will implement the patent’s technology into your product (up to 300 characters).

- differentiation: An explanation highlighting what makes your product unique and the reason why it stands out from existing solutions (up to 300 characters).
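The baseline prompt asks for the key product_title, whereas the output file format uses title together with publication_number. The following is a minimal sketch of converting one baseline reply into a submission record. It assumes the reply is a valid JSON object (a trailing comma copied from the template would have to be removed first), and to_submission_record is a hypothetical helper name.

```python
import json


def to_submission_record(publication_number, reply_text):
    """Convert one baseline reply (a JSON string) into an output-format record."""
    idea = json.loads(reply_text)
    return {
        "publication_number": publication_number,
        "title": idea["product_title"],  # renamed to match the output format
        "product_description": idea["product_description"],
        "implementation": idea["implementation"],
        "differentiation": idea["differentiation"],
    }
```

A KeyError from this function signals that the model omitted one of the four requested fields, which is easier to handle at generation time than after submission.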

Regulations

Competition

Participants submit exactly one product business idea per patent.
Participants may select one or more patent categories.
Participants can use external resources, such as a web search API, for ideation.
Each section’s text is truncated if it exceeds its maximum length.
Participants need to review submissions from other teams in the NLP and Computer Science categories. Reviewing is not required for the Materials Chemistry category.

Paper Submission

Use the ACL template for all submissions.
The main text is limited to 4 pages, with an unlimited appendix placed after the references.
Shared task participants will be asked to review other teams’ papers during the review period.

Licenses

The patent dataset is from Google Patents Public Data, provided by IFI CLAIMS Patent Services and Google, and is licensed under a Creative Commons Attribution 4.0 International License.

The patent PDFs and drawings are from the United States Patent and Trademark Office, www.uspto.gov.

Shared Task Organizers

Wataru Hirota - Stockmark Inc., Japan.
Chung-Chi Chen - AIRC, AIST, Japan.
Tatsuya Ishigaki - AIRC, AIST, Japan.
Tomoko Ohkuma - Asahi Kasei Corporation, Japan.
Tomoki Taniguchi - Asahi Kasei Corporation, Japan.

Acknowledgements

We thank the annotators for their valuable contributions to the data annotation process.

Asahikasei TENAC R&D Dept.

Yoshitaka Shigematsu
Motohiro Fukai
Tomohito Asakawa
Shinya Akebi

Stockmark Inc.

Daisuke Nakagawa
Jiro Kashiwagi
Kumiko Boshita
Shotaro Kusaura
Tomoaki Okamoto
Wataru Inoue
Yutaka Mori

Contacts

ijcai2025@<domain>

Replace <domain> with stockmark.co.jp
