Background:
The USPTO patent classification system assigns each patent to one or more categories. This classification is typically done by humans before a patent is approved. Over time, the Cooperative Patent Classification (CPC) hierarchy has evolved to accommodate new technologies and applications. For example, with the increasing interest in climate change-related technology, the category "Y02", which covers patents focusing on "technologies for mitigating climate change", is a new addition.
Problem Statement:
In this project, we use Y02 patent data to align TopicGPT for the creation of topic hierarchies. This system serves two key purposes: (1) assisting patent officials by streamlining the labor-intensive process of manually annotating patents, and (2) empowering innovators and industry professionals to explore previously uncharted patents from different points of view.
Approach:
Collected and compiled a new dataset by scraping recent data from the USPTO website, and published the dataset on Kaggle.
Applied prompt engineering and few-shot learning to align TopicGPT with our generation objectives.
Developed a robust evaluation technique based on inter-annotator agreement among LLMs such as Mistral, Llama 3, Qwen, and DBRX.
Demonstrated superior interpretability and flexibility to user-defined goals compared to BERTopic.
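
The inter-annotator agreement step in the approach can be illustrated with a minimal Python sketch. The summary does not say which agreement statistic was used, so this assumes pairwise Cohen's kappa over categorical judgments. The function names (`cohen_kappa`, `pairwise_kappa`), the "yes"/"no" label scheme, and the toy labels are all hypothetical and are not taken from the project.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(a)
    # Fraction of items on which the two annotators agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def pairwise_kappa(annotations):
    """Kappa for every pair of annotators in a {name: labels} mapping."""
    return {
        (m1, m2): cohen_kappa(annotations[m1], annotations[m2])
        for m1, m2 in combinations(sorted(annotations), 2)
    }


# Made-up judgments: does a generated topic fit each of five patents?
toy_annotations = {
    "mistral": ["yes", "yes", "no", "yes", "no"],
    "llama3": ["yes", "no", "no", "yes", "no"],
    "qwen": ["yes", "yes", "no", "yes", "yes"],
}

if __name__ == "__main__":
    for pair, kappa in pairwise_kappa(toy_annotations).items():
        print(pair, round(kappa, 3))
```

Kappa corrects raw agreement for chance, so two LLM annotators that both answer "yes" almost everywhere do not score as highly as their raw overlap would suggest. With more than two annotators, a single statistic such as Fleiss' kappa would be an alternative to averaging the pairwise values.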