import requests
import csv
import os
import datetime
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
requests
Used to make HTTP requests, such as GET or POST, to access APIs or websites.
csv
Provides functionality to read from and write to CSV (Comma-Separated Values) files.
os
Provides a way to interact with the operating system, such as handling file paths or environment variables.
datetime
Used to manipulate dates and times, offering date formatting, time calculations, and more.
pandas
A powerful data manipulation library used for data analysis, providing data structures like DataFrames.
numpy
A library for numerical operations and array handling, offering efficient computation for large datasets.
seaborn
A data visualization library built on top of matplotlib, providing an easy interface for creating attractive statistical graphics.
matplotlib.pyplot
The plotting interface of matplotlib, a library for creating static, interactive, and animated visualizations in Python.
First, we are going to need the main URL.
url = "https://api.patentsview.org/patents/query"
Next, we need to define the parameters that are sent along with the request.
Q : Query
A query is a request to retrieve specific information from a database, API, or other data source. It specifies conditions or filters that narrow the results down to what the user needs. For example, an API query might ask for records where a particular field matches a given value. The query defines the exact subset of data you want, rather than returning everything available. In this code, the query restricts the search to a date range and to topic keywords in the patent title (machine learning, ML, Natural Language Processing, NLP).
F : Fields
This parameter specifies which fields the API should return in the response, so the response contains only the data you are interested in. In this code, all available fields are requested in order to get the raw data.
O : Options
This parameter specifies additional options for the query, such as pagination or sorting. Because the API limits how many results a single request can return, we use these options to page through the full result list.
query_payload = {
    "q": {
        "_and": [
            {
                "_gte": {"patent_date": "2018-01-01"}
            },
            {
                "_lte": {"patent_date": "2024-12-31"}
            },
            {
                "_or": [
                    {"_text_phrase": {"patent_title": "machine learning"}},
                    {"_text_phrase": {"patent_title": "ML"}},
                    {"_text_phrase": {"patent_title": "Natural Language Processing"}},
                    {"_text_phrase": {"patent_title": "NLP"}}
                ]
            }
        ]
    },
    "f": [
        "appcit_app_number",
        "appcit_category",
        "appcit_date",
        "appcit_kind",
        ....
        "wipo_field_id",
        "wipo_field_title",
        "wipo_sector_title",
        "wipo_sequence"
    ],
    "o": {
        "per_page": 1000,
        "page": 1
    }
}
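The "q" part of the payload can also be assembled programmatically, which helps when the date range or keyword list changes. The following is a minimal sketch, not part of the original code: the helper name build_query and its parameters are hypothetical, and it only reproduces the operators already used in the payload (_and, _or, _gte, _lte, _text_phrase).

```python
def build_query(start_date, end_date, phrases, field="patent_title"):
    """Combine a date range and a list of phrases into one query dict.

    Hypothetical helper: it mirrors the structure of the "q" block
    in the payload and uses no operators beyond the ones shown there.
    """
    return {
        "_and": [
            {"_gte": {"patent_date": start_date}},
            {"_lte": {"patent_date": end_date}},
            # One _text_phrase condition per keyword, joined with _or.
            {"_or": [{"_text_phrase": {field: phrase}} for phrase in phrases]},
        ]
    }


query = build_query(
    "2018-01-01",
    "2024-12-31",
    ["machine learning", "ML", "Natural Language Processing", "NLP"],
)
```

The resulting dict could be used as the value of "q" in query_payload, so the keyword list lives in one place.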
With the search options set, we now send the request to the URL. Because of the "o" settings, each response contains up to 1000 patents per page, and the remaining results follow on later pages. For each page, we use the requests library to POST the query and read the API response.
# First request: used to read the total number of matching patents.
response = requests.post(url, json=query_payload)
data = response.json()
total_patent_count = data['total_patent_count']
per_page = query_payload['o']['per_page']
# Ceiling division: number of pages needed to cover all results.
total_pages = (total_patent_count + per_page - 1) // per_page
all_patents = []
# Request every page in turn and collect the patents from each one.
for page in range(1, total_pages + 1):
    query_payload['o']['page'] = page
    response = requests.post(url, json=query_payload)
    data = response.json()
    patents = data.get('patents', [])
    all_patents.extend(patents)
    print(f"Retrieved page {page}/{total_pages}")
print(f"Total patents retrieved: {len(all_patents)}\n")
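The paging logic above can be wrapped in a function so that it is easier to reuse and to test. This is a hedged sketch rather than the code that produced the results here: fetch_all, FakeResponse, and fake_post are hypothetical names, the example URL is a placeholder, and the sample records are made up. It differs from the loop above in two ways: it reuses the first response instead of requesting page 1 twice, and it calls raise_for_status() so that HTTP errors surface immediately.

```python
import math


def fetch_all(post, url, payload):
    """Collect the 'patents' list from every page of a paginated query.

    `post` is any callable with the same (url, json=...) call shape as
    requests.post, so the function can be tried without network access.
    """
    per_page = payload["o"]["per_page"]
    all_patents = []
    page = 1
    total_pages = 1  # Updated once the first response is in.
    while page <= total_pages:
        # Build a per-page copy so the caller's payload is not mutated.
        page_payload = {**payload, "o": {**payload["o"], "page": page}}
        response = post(url, json=page_payload)
        response.raise_for_status()
        data = response.json()
        if page == 1:
            total_pages = math.ceil(data["total_patent_count"] / per_page)
        all_patents.extend(data.get("patents") or [])
        page += 1
    return all_patents


class FakeResponse:
    """Stand-in for requests.Response, for offline demonstration only."""

    def __init__(self, body):
        self._body = body

    def raise_for_status(self):
        pass

    def json(self):
        return self._body


def fake_post(url, json):
    """Serve five made-up records, sliced according to the 'o' options."""
    records = [{"patent_number": str(i)} for i in range(5)]
    per_page = json["o"]["per_page"]
    start = (json["o"]["page"] - 1) * per_page
    return FakeResponse(
        {"total_patent_count": len(records), "patents": records[start:start + per_page]}
    )


patents = fetch_all(
    fake_post, "https://example.invalid/query", {"q": {}, "o": {"per_page": 2, "page": 1}}
)
print(len(patents))
```

Against the real API, the call would presumably be fetch_all(requests.post, url, query_payload).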
Now the data is ready. We export it to a CSV file, with the field names as the header row, in preparation for data cleaning.
with open('patents.csv', 'w', newline='') as csvfile:
    fieldnames = [
        "appcit_app_number",
        "appcit_category",
        "appcit_date",
        "appcit_kind",
        ...
        "wipo_field_id",
        "wipo_field_title",
        "wipo_sector_title",
        "wipo_sequence"
    ]
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    items = all_patents
    for item in items:
        writer.writerow({
            "appcit_app_number" : item.get("appcit_app_number", ""),
            "appcit_category" : item.get("appcit_category", ""),
            "appcit_date" : item.get("appcit_date", ""),
            "appcit_kind" : item.get("appcit_kind", ""),
            "appcit_sequence" : item.get("appcit_sequence", ""),
            ...
            "wipo_field_id" : item.get("wipo_field_id", ""),
            "wipo_field_title" : item.get("wipo_field_title", ""),
            "wipo_sector_title" : item.get("wipo_sector_title", ""),
            "wipo_sequence" : item.get("wipo_sequence", "")
            }
        )
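Writing every field out by hand works, but csv.DictWriter can fill in missing keys itself, which avoids repeating the field list. The sketch below is one possible shortening, not the original approach: write_rows is a hypothetical helper, and the two-column header and the sample row are made-up stand-ins for the full field list.

```python
import csv
import io


def write_rows(handle, fieldnames, rows):
    """Write dict rows to an open file handle as CSV.

    restval="" fills any field a row lacks with an empty string, and
    extrasaction="ignore" silently drops keys that are not in fieldnames.
    """
    writer = csv.DictWriter(
        handle, fieldnames=fieldnames, restval="", extrasaction="ignore"
    )
    writer.writeheader()
    writer.writerows(rows)


# Demonstration on an in-memory buffer with a shortened, illustrative header.
buffer = io.StringIO()
write_rows(
    buffer,
    ["appcit_app_number", "appcit_category"],
    [{"appcit_app_number": "X1", "unlisted_key": "dropped"}],
)
print(buffer.getvalue())
```

With a real file, the handle would come from open('patents.csv', 'w', newline='') and rows would be all_patents.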
Total patents retrieved: 7092
We now have a fresh CSV file built from the PatentsView API.
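Before moving on to cleaning, it can be worth confirming that the file has the expected header and row count. The snippet below is a small sketch of such a check using only the csv module. The name summarize_csv is hypothetical, and the demonstration runs on a tiny temporary file with made-up values rather than on patents.csv itself.

```python
import csv
import os
import tempfile


def summarize_csv(path):
    """Return (column names, number of data rows) for a CSV file."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        row_count = sum(1 for _ in reader)
        return reader.fieldnames, row_count


# Build a tiny stand-in file so the check can run anywhere.
tmp_path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(tmp_path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["wipo_field_id", "wipo_field_title"])
    writer.writeheader()
    writer.writerow({"wipo_field_id": "1", "wipo_field_title": "example"})

columns, rows = summarize_csv(tmp_path)
print(columns, rows)
```

For the real export, summarize_csv('patents.csv') should report a row count equal to the number of patents retrieved.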
To inspect the raw data, see the sub-tab.