I am new to using patent data. How does patent data work?
A good start would be to read the classic article by Hall et al. (2001) on the content of patents. It is available via the NBER website.
Thereafter, keep the following in mind:
- all granted patents are included in the grant_pat table;
- granted patents are classified via one or more technological classifications (class_cpc, class_ipc, class_loc, class_nat);
- granted patents can have one or more applicants (party_app), assignees (party_asg), attorneys (party_att), examiners (party_exa) and inventors (party_inv);
- granted patents can have prior-art references to US patents (cites_cpt), non-US patents (cites_cfr), and non-patent literature (cites_npl);
- granted patents were compared against existing patents in one or more technological classes, recorded under multiple classification schemes (ufocs_cct, ufocs_cpc, ufocs_ipc, ufocs_nat);
- granted patents may relate to prior patents as continuations, divisions, etc. (usrel);
All these tables link to each other through the grant_id field.
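For example, here is a minimal Stata sketch of such a link, assuming the DTA files are named after the tables above: it attaches every inventor record to its patent via grant_id.

    * load the grant table, then attach all inventor records per patent
    use grant_pat.dta, clear
    joinby grant_id using party_inv.dta, unmatched(master)

The unmatched(master) option keeps patents that have no matching inventor record in the result.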
Which software should I use?
The data are provided as a relational database. Therefore any relational database package will do (an SQL database, SAS, Access, etc.), and most of these applications can import CSV files. In addition, you can use Stata to compute measures via its ‘merge’ and ‘joinby’ commands. For Stata, use the DTA files.
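As a minimal sketch of the ‘merge’ route, assuming the DTA file names match the table names and that grant_pat holds one record per patent:

    * attach classification records to the grant table (one-to-many on grant_id)
    use grant_pat.dta, clear
    merge 1:m grant_id using class_cpc.dta

The automatically created _merge variable records which observations found a match.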
How should we cite this dataset?
Please simply refer to "Patent Grant Data (2017). Available via https://sites.google.com/site/patgrantdata/".
Help, I get an out-of-memory error message!
This means the dataset is too large for your software, its settings, and/or your hardware. If you get this message, first check your settings: many applications have a built-in limit on the maximum amount of memory they will use, and this limit can usually be raised. For example, in Stata the ‘query memory’ command shows your current memory settings.
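A minimal Stata sketch, assuming your machine has the RAM to spare (the 16g value below is only an illustration):

    * show the current memory settings
    query memory
    * raise the memory ceiling (example value; it must fit within your machine's RAM)
    set max_memory 16g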
Second, certain applications are simply limited in their capabilities. For example, Excel cannot deal with tables of more than about one million (2^20 = 1,048,576) rows. Ensure that your database software can deal with tables of up to 100 million records (the size of the largest table).
Third, sometimes your computer hardware is simply limited: it does not have enough RAM or processing power to deal with the larger files. In that case, you could download the CSV files and cut them into pieces: use LTFViewer to inspect the structure of the files and use a text file splitter (GSplit, TextFileSplitter, etc.) to split them into pieces which can be used individually. In Stata, it is also possible to load only part of a dataset, using the command ‘use var1 var2 var3 using file.dta’ for certain fields or ‘use in 1/1000 using file.dta’ for certain records.
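As a concrete sketch of partial loading (grant_id is an actual field; var1 and var2 stand for whichever fields you need):

    * load only selected variables from a large table
    use grant_id var1 var2 using grant_pat.dta, clear
    * or load only the first 1,000 records
    use in 1/1000 using grant_pat.dta, clear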
I found an error in the data. Where could I report it?
Please contact me via e-mail. I will look into this and figure out what went wrong.
I have another question.
Please contact me (contact details on the right of each page).