Chapter 5 Databases
This chapter provides a brief overview of some of the main sources of free patent data. It is intended as a quick reference and points to some free tools for accessing patent databases that you may not be familiar with.
It goes without saying that getting access to patent data in the first place is fundamental to patent analysis. There are quite a few free services out there and we will highlight some of the important ones. Most free sources have particular strengths or weaknesses, such as the number of records that can be downloaded, the data fields that can be queried, the format the data comes back in, or how clean the data is (that is, how many hours are needed to prepare it for analysis). We won't go into all of the details but will provide some basic pointers.
5.2 The Databases
5.2.1 The Lens
Previously known as the Patent Lens, this is a well-designed site with quite a few visualisation options and access to sequence data. It is possible to search the title, abstract, description and claims of patent documents and to create and share data in collections. In 2015 the ability to download up to 10,000 records at a time was added. Combined with interactive charts that allow the user to drill down into a results set, this has transformed the Lens into a very useful and innovative database and visualisation tool.
5.2.2 WIPO Patentscope
The WIPO Patentscope database provides access to Patent Cooperation Treaty (PCT) data, including downloads of a selection of fields (up to 10,000 records), a very useful cross-lingual search expansion tool, and machine translation.
Sequence data can also be obtained from Patentscope. Note that this rapidly runs to gigabytes of data.
5.2.3 Espacenet
Espacenet, from the European Patent Office, is probably the best known free patent database.
5.2.4 LATIPAT
For readers in Latin America (or Spain and Portugal), LATIPAT is a very useful resource.
5.2.5 EPO Open Patent Services
Open Patent Services (OPS) provides free access to EPO patent data through an Application Programming Interface (API). Using it requires some programming knowledge.
The developer portal allows you to test your API queries and is recommended.
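To give a flavour of the programming involved, the sketch below builds (but does not send) the two HTTP requests that a typical OPS session starts with: an OAuth token request using the consumer key and secret issued when you register, and a bibliographic search using a CQL query such as `ti=graphene`. The endpoint URLs (version 3.2 of the service), the query syntax and the function names are our assumptions for illustration; check them against the developer portal before relying on them.

```python
import base64
import urllib.parse
import urllib.request

# Assumed OPS endpoints (version 3.2); verify them on the EPO developer portal.
OPS_AUTH_URL = "https://ops.epo.org/3.2/auth/accesstoken"
OPS_SEARCH_URL = "https://ops.epo.org/3.2/rest-services/published-data/search"


def build_token_request(consumer_key: str, consumer_secret: str) -> urllib.request.Request:
    """Build (without sending) an OAuth client-credentials token request."""
    credentials = f"{consumer_key}:{consumer_secret}".encode("utf-8")
    headers = {
        # HTTP Basic authentication with the key and secret from your OPS account
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    return urllib.request.Request(OPS_AUTH_URL, data=body, headers=headers, method="POST")


def build_search_request(access_token: str, cql_query: str) -> urllib.request.Request:
    """Build (without sending) a published-data search request for a CQL query."""
    url = OPS_SEARCH_URL + "?" + urllib.parse.urlencode({"q": cql_query})
    headers = {
        "Authorization": f"Bearer {access_token}",  # token returned by the auth step
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


if __name__ == "__main__":
    token_request = build_token_request("YOUR_KEY", "YOUR_SECRET")
    search_request = build_search_request("ACCESS_TOKEN", "ti=graphene")
    print(token_request.get_method(), token_request.full_url)
    print(search_request.get_method(), search_request.full_url)
```

Sending each request with `urllib.request.urlopen()` and reading the `access_token` field from the JSON reply would complete the exchange. Keeping request construction separate from sending makes it easy to inspect a query before it counts against your usage allowance.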
5.2.6 USPTO PatentsView
The USPTO main database search page can reasonably be described as well… old. In 2016 the USPTO launched an Open Data and Mobility initiative that opens up USPTO patent and trademark data. The new Open Data Portal is still in beta but provides an insight into things to come.
As part of the shift to open data, the USPTO has established PatentsView, an external service for free searches and bulk downloads. If simple searching does not meet your needs, or the bulk options are too overwhelming, then the new JSON API is likely to be a good fit. These services are still in beta, but this is a very exciting development for those who need greater access to patent data or to specific data fields.
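As a rough illustration of how such a JSON API is queried, the sketch below builds a request URL in the format used by the original (beta-era) PatentsView API, in which the query (`q`), the fields to return (`f`) and options such as page size (`o`) are each passed as JSON. The endpoint, the operator names (`_and`, `_gte`, `_text_any`) and the field names reflect our understanding of that API and should be treated as assumptions; the service may since have changed, so check the current PatentsView documentation.

```python
import json
import urllib.parse

# Assumed endpoint of the original PatentsView API; it may have been superseded.
PATENTSVIEW_URL = "https://api.patentsview.org/patents/query"


def build_patentsview_url(query: dict, fields: list, per_page: int = 25) -> str:
    """Encode a JSON query (q), field list (f) and options (o) as URL parameters."""
    compact = {"separators": (",", ":")}  # drop spaces to keep the URL short
    params = {
        "q": json.dumps(query, **compact),
        "f": json.dumps(fields, **compact),
        "o": json.dumps({"per_page": per_page}, **compact),
    }
    return PATENTSVIEW_URL + "?" + urllib.parse.urlencode(params)


if __name__ == "__main__":
    # Example: patents granted from 2015 onwards with "graphene" in the title
    query = {"_and": [{"_gte": {"patent_date": "2015-01-01"}},
                      {"_text_any": {"patent_title": "graphene"}}]}
    print(build_patentsview_url(query, ["patent_number", "patent_title", "patent_date"]))
```

Because the query is an ordinary Python dictionary, it can be built up programmatically (for example, looping over a list of search terms) rather than typed by hand.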
5.2.7 Google Patents
In the free version of the Google Custom Search API, data retrieval is limited and the patent field headings are unclear (that is, they use non-standard names). For free patent analytics, Google Custom Search is presently of very limited use.
5.2.8 Google Prior Art Finder
The Google Prior Art Finder is a relatively recent development that allows you to enter search terms or patent numbers and to view and export results.
The results include a Top Ten and are broken down into sections such as Google Scholar and Patents.
The Export button will export the top ten results for each section in a .csv file.
It is possible to load more results for a section (e.g. see More Patent Results at the bottom of the results) and then export them (e.g. 20 patent documents rather than 10). In a test we managed to export 140 patent results, but this rapidly becomes laborious. An additional issue is that the exported data needs to be transposed before analysis. At the time of writing we had not identified an API route to Prior Art Finder.
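Transposing is straightforward once the export has been saved. The minimal sketch below, using only Python's standard library, swaps the rows and columns of a CSV export and pads short rows so that no values are lost. We assume for illustration that each result occupies a column with field names down the first column; check the layout of your own file. `transpose_csv` is a hypothetical helper name, not part of any Google tool.

```python
import csv
import io


def transpose_csv(text: str) -> list:
    """Return the rows of a CSV string with rows and columns swapped."""
    rows = list(csv.reader(io.StringIO(text)))
    width = max((len(row) for row in rows), default=0)
    # Pad ragged rows so that zip() does not silently drop trailing values
    padded = [row + [""] * (width - len(row)) for row in rows]
    return [list(column) for column in zip(*padded)]


if __name__ == "__main__":
    # Hypothetical export: field names down the first column, one result per column
    export = "title,Result A,Result B\nlink,http://a.example,http://b.example\n"
    for row in transpose_csv(export):
        print(row)
```

The transposed rows can then be written back to disk with `csv.writer(...).writerows(...)` for use in a spreadsheet or in R.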
It is a fantastic service, and an example to patent offices everywhere on freeing up patent data. If you have a good broadband connection and the hard drive space, it is quite good fun to suddenly have access to millions of patent records. The authors used the service to text mine the collection for millions of biological species names as reported here.
However, one important issue to note is that individual documents within the XML files are not always cleanly delimited. This means that code that works for one set of bulk files may fail on another. While it is possible to address this, be prepared to spend time working on it and/or seek assistance from a professional programmer. For an insight into these issues see this Stackoverflow discussion on parsing the data in R.
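One common workaround, sketched below under stated assumptions, is to treat a bulk file as a series of complete XML documents concatenated one after another, split it wherever a new `<?xml ...?>` declaration begins, and parse each piece separately. That the files follow this pattern is an assumption to check against the files you download, and `split_concatenated_xml` is a hypothetical name. Chunks that fail to parse are kept rather than discarded so that they can be inspected.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: a new document begins wherever an XML declaration appears.
# The lookahead is zero-width, so each declaration stays with its own document.
DECLARATION = re.compile(r"(?=<\?xml\s)")


def split_concatenated_xml(text: str):
    """Split a file of concatenated XML documents and parse each one.

    Returns (parsed_roots, failed_chunks) so that problem records can be
    examined rather than silently dropped.
    """
    roots, failures = [], []
    for chunk in DECLARATION.split(text):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip the empty string before the first declaration
        try:
            roots.append(ET.fromstring(chunk))
        except ET.ParseError:
            failures.append(chunk)
    return roots, failures


if __name__ == "__main__":
    sample = (
        '<?xml version="1.0" encoding="UTF-8"?>\n<doc id="1"><title>First</title></doc>\n'
        '<?xml version="1.0" encoding="UTF-8"?>\n<doc id="2"><title>Second</title></doc>\n'
        '<?xml version="1.0" encoding="UTF-8"?>\n<doc id="3"><title>Broken</doc>\n'
    )
    roots, failures = split_concatenated_xml(sample)
    print([root.get("id") for root in roots], len(failures))
```

Real files may also declare DTDs or named entities that `ElementTree` cannot resolve, which would show up here as failed chunks; counting and sampling those failures is a quick way to see whether a given set of files needs special handling.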
5.2.10 Free Patents Online
Sign up for a free account for enhanced access and to save and download data. It has been around for quite a while now and, while the download options are limited, we rather like it.
5.2.11 DEPATISnet
We are not covering national databases. However, DEPATISnet, the patent database of the German Patent and Trademark Office, struck us as potentially very useful. It allows searches in English and German and has extensive coverage of international patent data, including the Chinese, EP, US and PCT collections. The coverage details are here. It is worth experimenting with.
5.2.12 OECD Patent Databases
This resource is aimed more at patent statisticians. The OECD has invested a lot of effort in developing patent indicators and resources, including citations, the Harmonised Applicant Names (HAN) database, and mapping through the REGPAT database, among other resources that are available free of charge.
Along the same lines, the US National Bureau of Economic Research (NBER) US Patent Citations Data File is an important resource.
5.2.13 PATSTAT
The most important database for statistical use is the EPO Worldwide Patent Statistical Database (PATSTAT), which contains around 90 million records. PATSTAT is not free: it costs 1,250 euros for a year (two editions) or 630 euros for a single edition. The main barrier to using PATSTAT is the need to run and maintain a database of more than 200 gigabytes. However, an online version of PATSTAT is also available, and you can try it free for the first two months by signing up for the trial (knowledge of SQL is required).
For users seeking to load PATSTAT into a MySQL database, Simone Mainardi provides code for this on GitHub.
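To show the kind of SQL involved, the sketch below runs a typical statistical query (counting applications by filing office and filing year) against a tiny in-memory SQLite table. The table and column names (`tls201_appln`, `appln_id`, `appln_auth`, `appln_filing_year`) follow our understanding of PATSTAT naming, but the table here is a hypothetical three-column miniature holding invented toy rows, not the real schema or real data.

```python
import sqlite3

# Hypothetical miniature of a PATSTAT-style applications table (toy rows, not real data).
TOY_ROWS = [
    (1, "EP", 2010),
    (2, "EP", 2010),
    (3, "EP", 2011),
    (4, "US", 2010),
]

# Count applications per filing office (appln_auth) and filing year.
COUNT_BY_OFFICE_AND_YEAR = """
    SELECT appln_auth, appln_filing_year, COUNT(*) AS n_applications
    FROM tls201_appln
    GROUP BY appln_auth, appln_filing_year
    ORDER BY appln_auth, appln_filing_year
"""


def load_toy_table() -> sqlite3.Connection:
    """Create an in-memory database holding the toy applications table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tls201_appln ("
        "appln_id INTEGER PRIMARY KEY, appln_auth TEXT, appln_filing_year INTEGER)"
    )
    conn.executemany("INSERT INTO tls201_appln VALUES (?, ?, ?)", TOY_ROWS)
    return conn


if __name__ == "__main__":
    conn = load_toy_table()
    for office, year, count in conn.execute(COUNT_BY_OFFICE_AND_YEAR):
        print(office, year, count)
    conn.close()
```

The same `GROUP BY` pattern should carry over to a MySQL copy of PATSTAT or to the online interface, although table sizes there make indexing and careful filtering far more important than in this toy example.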
5.2.14 Other data sources
A number of companies provide access to patent data, typically with tiered access depending on your needs and budget. Examples include Thomson Innovation, Questel Orbit, STN, and PatBase. We will not focus on these services, but we will look at how data tools can be used to work with data from services such as Thomson Innovation.
5.3 Tools for Accessing Patent Data
To close this chapter, we highlight a couple of tools for accessing patent data, typically using APIs and Python. We will return to this topic later and are working on applying the same approach in R.
5.3.1 Patent2Net in Python
A Python tool to access and process the data from the European Patent Office OPS service.
5.3.2 Python EPO OPS Client by Gsong
A Python client for OPS access developed by Gsong and freely available on GitHub. It is used by Patent2Net (above).
5.3.3 Fung Institute Patent Server for USPTO data in JSON
Researchers at the Fung Institute have also been active in developing open source resources for accessing and working with patent data. We highlight patentserver, but it is worth checking out other resources in the repository, such as patentprocessor, a set of Python scripts for processing USPTO bulk download data. Note that development of these tools no longer appears to be active.
5.4 Round Up
One problem confronting patent analysts is access to data in a form that is suitable for more detailed analysis, which typically involves hundreds or many thousands of records. In recent years patent data has increasingly been opened up through the ability to download 1,000 or 10,000 records at a time. However, where analysts need titles, abstracts and claims, or descriptions and full texts, the options for downloading them remain limited. Patent offices such as the USPTO have taken a leading role in making bulk patent data available, and this is very much to be welcomed by those working on patent analytics. Overall, it is reasonable to say that access has improved (through Patentscope, the Lens and the EPO OPS service), but not yet in the quantities or with the data fields that patent analysts would like.