Exploring transformers for patent text analysis
Patent documents are a rich and growing source of data for innovation research. Recent progress in natural language processing (NLP), driven in particular by transformer models, has opened new ways to analyze and extract insights from patent text. Transformers, the deep learning architecture underlying large language models (LLMs), rely on attention mechanisms and word embeddings to capture the meaning of text more effectively than traditional methods, which makes them a powerful tool for studying innovation through patents. In this target article, we explain the main ideas behind transformers in accessible terms, linking concepts from machine learning and linguistics. We then show how these models can be applied to patent research, with a focus on measuring technological novelty. To illustrate, we include exploratory analyses that demonstrate how transformers can be used in practice. Together with the accompanying code and data, the article can serve as a didactic tool for PhD students in the field of innovation.
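To make the core idea concrete, the sketch below shows one simple way a transformer can support novelty measurement: encode patent abstracts as dense embeddings and treat low similarity to prior art as a proxy for textual novelty. This is an illustrative example, not the article's actual method; the model name and the toy abstracts are assumptions chosen for the demonstration.

```python
from sentence_transformers import SentenceTransformer, util

# Load a general-purpose sentence encoder (model choice is an assumption;
# the article's own analyses may use a different transformer).
model = SentenceTransformer("all-MiniLM-L6-v2")

# Toy patent abstracts (illustrative only, not from the article's data).
prior_art = [
    "A rechargeable lithium-ion battery with a silicon anode.",
    "A method for wireless charging of electric vehicles.",
]
focal_patent = "A solid-state battery using a sulfide electrolyte."

# Encode each text into a dense embedding vector.
prior_embeddings = model.encode(prior_art, convert_to_tensor=True)
focal_embedding = model.encode(focal_patent, convert_to_tensor=True)

# One simple novelty proxy: 1 minus the maximum cosine similarity
# to prior art; higher values suggest more distinctive text.
similarities = util.cos_sim(focal_embedding, prior_embeddings)
novelty = 1.0 - similarities.max().item()
print(f"Novelty score: {novelty:.3f}")
```

In practice, researchers would compare each focal patent against a large corpus of earlier filings rather than a handful of examples, but the logic (embed, compare, score) is the same.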
Data and code (AVAILABLE SOON)