Chapter 7 Databases

7.0.1 Introduction

This chapter provides a quick overview of some of the main sources of free patent data. It is intended for quick reference and points to some free tools for accessing patent databases that you may not be familiar with.

It goes without saying that getting access to patent data in the first place is fundamental to patent analysis. There are quite a few free services out there and we will highlight some of the important ones. Most free sources have particular strengths or weaknesses, such as the number of records that can be downloaded, the data fields that can be queried, the format the data comes back in, or how clean the data is in terms of the hours required to prepare it for analysis. We won’t go into all of the details but will provide some basic pointers.

7.1 The Databases

7.1.1 The Lens

Originally known as the Patent Lens, this is a very well designed site that provides combined access to data on hundreds of millions of scientific publications and worldwide patent collections (Figure 7.1). Originally free for all users, the services are increasingly being split between non-commercial and commercial users in order to generate funds for the services. However, the Lens has adopted an equitable access policy (LEAP) that means that ability to pay should not be an obstacle to access.

Compared with other service providers, the Lens is particularly strong in linking the scientific and patent literature through citations. The Lens also stands out for the number of records that can be downloaded (50,000) compared with other services, for its rapid visualisation tools, for its new application programming interface (API) and bulk download services (paid), and for its genetic sequence data (free and paid).

It is possible to search the title, abstract, description and claims of patent documents and create and share data in collections. As of 2021 it is possible to download 50,000 records at a time, including the titles, abstracts and claims of patent documents. As far as we are aware no other patent database offers this level of rapid access for patent analytics purposes.

Figure 7.1: The Lens Home Page

7.1.2 Patentscope

The WIPO Patentscope database provides access to over 2.4 million Patent Cooperation Treaty applications and over 99 million patent documents. It includes downloads of a selection of fields (up to 10,000 records), a very useful tool that expands search terms across languages, and translation (Figure 7.2).

For readers in South East Asia (ASEAN) it is important to note that the national collections of Brunei Darussalam, Cambodia, Indonesia, Malaysia, the Philippines, Thailand and Viet Nam are available in Patentscope. This overcomes a limitation in access to these national collections across different databases.

Figure 7.2: The Patentscope Home Page

Figure 7.3 shows search results for drone technology on Patentscope.

Figure 7.3: Search Results from Patentscope

Sequence data can also be obtained from Patentscope (Figure 7.4). Note that this rapidly becomes gigabytes of data.

Figure 7.4: Patentscope Sequence Listings with Bulk Download available over ftp

Perhaps less well known is WIPO Pearl (see Figure 7.5). This provides access to powerful human-curated mappings of search terms across multiple languages and is particularly important for developing detailed searches in specific domains and for searching across jurisdictions.

Figure 7.5: WIPO Pearl Cross Language Terms

In Figure 7.5 we have searched for the term drone and exposed related terms within and across languages in patent data. In Figure 7.6 below we use the concept map search, focusing on airborne drones, to reveal closely related concepts in patent data.

Figure 7.6: WIPO Pearl Patent Concept Map

7.1.3 espacenet

Probably the best known free patent database, espacenet from the European Patent Office provides free access to over 130 million patent documents (Figure 7.7).

Figure 7.7: espacenet home page

The espacenet search interface has been updated since the first edition of this manual and now includes a greater range of search options and tooltips (Figure 7.8).

Figure 7.8: espacenet advanced search features filters and tool tips

7.1.4 LATIPAT

For readers in Latin America (or Spain and Portugal), LATIPAT (Figure 7.9) is a very useful resource.

Figure 7.9: LATIPAT for patent texts in Spanish and Portuguese

7.1.5 EPO Open Patent Services

Access patent data through the EPO Application Programming Interface (API) free of charge (Figure 7.10). Requires programming knowledge.

Figure 7.10: EPO Open Patent Services for programmatic access to EPO data
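
To give a sense of what programmatic access involves, the following is a minimal sketch in Python of the two requests a typical session needs: one to exchange a registered key and secret for an access token, and one to run a search. The base URL, the version path (`3.2`), the endpoint paths and the JSON `Accept` header reflect our understanding of the service and should be treated as assumptions to check against the current OPS documentation. The function names are our own. The sketch only builds the requests and does not send them, because sending requires registered credentials.

```python
import base64
import urllib.parse
import urllib.request

# Assumed base URL and version path; check the current OPS documentation.
OPS_BASE = "https://ops.epo.org/3.2"


def build_token_request(consumer_key, consumer_secret):
    """Build the POST request that exchanges a key and secret for an access token."""
    credentials = f"{consumer_key}:{consumer_secret}".encode("utf-8")
    headers = {
        # Key and secret are sent as HTTP Basic credentials.
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return urllib.request.Request(
        OPS_BASE + "/auth/accesstoken",
        data=b"grant_type=client_credentials",
        headers=headers,
        method="POST",
    )


def build_search_request(access_token, cql_query):
    """Build a GET request for a bibliographic search using a CQL query string."""
    query = urllib.parse.urlencode({"q": cql_query})
    return urllib.request.Request(
        f"{OPS_BASE}/rest-services/published-data/search?{query}",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",  # assumption: JSON output is requested this way
        },
    )


# Example: requests for a title search on "drone" (placeholder credentials and token).
token_request = build_token_request("my-key", "my-secret")
search_request = build_search_request("my-token", "ti=drone")
```

Either request can then be passed to `urllib.request.urlopen`. In practice most users will prefer a dedicated client library, such as the ones described in Section 7.2, which also handle token renewal and usage limits.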

7.1.6 USPTO PatentsView

The USPTO main database search page can reasonably be described as well… old. In 2016 the USPTO team launched an Open Data and Mobility initiative that opens up USPTO patent and trademark data. The new Open Data Portal is still in beta but provides an insight into things to come.

As part of the shift to open data the USPTO has established an external service, PatentsView, for free searches and bulk downloads. If simple searching does not meet your needs, or the bulk options are too overwhelming, then the new JSON API service is likely to be the right fit. The services are still in beta but this is a very exciting development for those who need greater levels of access to patent data or access to specific data fields.
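
As an illustration of what a JSON API query looks like, the sketch below builds a query for patents with the word drone in the title that were granted from a given date onwards. The endpoint address, the `X-Api-Key` header, the operators (`_and`, `_text_any`, `_gte`) and the field names are assumptions based on our understanding of the service. These details have changed between versions of the API, so check the current PatentsView documentation before relying on them. The function names are our own, and the request is built but not sent.

```python
import json
import urllib.request

# Assumed endpoint; the address and field names have changed between API versions.
PATENTSVIEW_URL = "https://search.patentsview.org/api/v1/patent/"


def build_query(title_words, granted_from, fields):
    """Return a query payload: title contains any of the words AND grant date >= granted_from."""
    return {
        "q": {
            "_and": [
                {"_text_any": {"patent_title": title_words}},
                {"_gte": {"patent_date": granted_from}},
            ]
        },
        "f": fields,  # the data fields to return
    }


def build_request(payload, api_key):
    """Wrap the payload in a POST request carrying a (hypothetical) API key."""
    return urllib.request.Request(
        PATENTSVIEW_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Api-Key": api_key},
        method="POST",
    )


payload = build_query("drone", "2020-01-01", ["patent_id", "patent_title", "patent_date"])
request = build_request(payload, "my-api-key")  # placeholder key
```

The point of the example is the shape of the query: criteria are nested JSON objects and the fields to return are listed explicitly, which is what makes it possible to request only the specific data fields needed.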

7.1.7 Google Patents

Figure 7.11: Google Patents

The Google Patent Search API has been deprecated. Access is now through the Google Custom Search API, where the API flag for patents is reported to be &tbm=pts, and example code is available for using the API in Python.

In the free version of the Google Custom Search API data retrieval is limited and the patent field headings are unclear (that is, they use non-standard names). For free patent analytics, Google Custom Search is presently of very limited use.

7.1.8 USPTO Bulk Downloads

The USPTO patent database interface is archaic. However, a lot has been happening at the USPTO to make patent data more accessible in a range of formats to enable wider use.

For patent analytics we recommend the PatentsView API for regular lightweight use and the PatentsView Data Download for heavier duty users. The PatentsView Data Download is an excellent resource because it offers the complete US patent collection in data tables that can easily be joined or text mined.
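
To illustrate what joining the data tables means in practice, here is a minimal sketch using only the Python standard library. The two miniature tab-separated tables are made up for the example, and the column names (`patent_id`, `patent_title`, `organization`) are simplified assumptions rather than the exact headings in the download files. The real files are far larger, so a database or a data frame library will usually be a better choice.

```python
import csv
import io
from collections import defaultdict

# Made-up miniature tables standing in for two downloaded files.
PATENT_TSV = (
    "patent_id\tpatent_title\n"
    "1\tDrone landing gear\n"
    "2\tSequence alignment method\n"
)
ASSIGNEE_TSV = (
    "patent_id\torganization\n"
    "1\tExample Aero Ltd\n"
    "1\tExample University\n"
    "2\tExample Bio Inc\n"
)


def read_table(text):
    """Parse tab-separated text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def join_on_patent_id(patents, assignees):
    """Attach the list of assignee organizations to each patent row."""
    by_patent = defaultdict(list)
    for row in assignees:
        by_patent[row["patent_id"]].append(row["organization"])
    return [
        {**patent, "organizations": by_patent.get(patent["patent_id"], [])}
        for patent in patents
    ]


joined = join_on_patent_id(read_table(PATENT_TSV), read_table(ASSIGNEE_TSV))
for row in joined:
    print(row["patent_id"], row["patent_title"], row["organizations"])
```

The shared identifier does the work here: because each table carries the patent identifier, any table can be linked to any other without cleaning names first.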

In addition to PatentsView the USPTO also offers a bulk download service for the raw html files from the USPTO Patent Gazette.

Figure 7.12: USPTO Bulk Patent Data

It is unclear what exactly you might want to do with this, and it could take a lot of work to parse, but we include it for the sake of completeness. One possible use would be to check legal status issues such as the payment of renewal fees. We suspect that this type of use will be more relevant to companies for whom up-to-date information is of critical importance. Regular users will probably prefer other sources of the same data even if it is less timely.

7.1.9 Free Patents Online

Sign up for a free account for enhanced access and to save and download data. It has been around quite a while now and while the download options are limited we rather like it.

Figure 7.13: Free Patents Online

7.1.10 DEPATISnet

We are not covering national databases. However, the patent database of the German Patent and Trademark Office struck us as potentially very useful. It allows for searches in English and German and has extensive coverage of international patent data, including the China, EP, US and PCT collections. Worth experimenting with.

Figure 7.14: German Patent and Trademark Office Database

7.1.11 OECD Patent Databases & Resources

This one is more for patent statisticians and is particularly important for methodologies and for understanding patent data. The OECD has invested a lot of effort in developing patent indicators and resources, including citations, the Harmonised Applicant Names (HAN) database and regional mapping through the REGPAT database, among other resources. The OECD datasets are linked to PATSTAT and are accessible free of charge by filling out a short form.

Figure 7.15: The OECD IP Portal

Along the same lines, the US National Bureau of Economic Research (NBER) US Patent Citations Data File is an important resource. NBER patent data resources are accessible as part of the suite of data at PatentsView, notably the NBER patent category and subcategory classification files.

7.1.12 EPO World Patent Statistical Database

The most important database for statistical use is the EPO World Patent Statistical Database (PATSTAT), which contains around 90 million records. PATSTAT is not free and costs 1250 Euro for a year (two editions) or 630 Euro for a single edition. The main barrier to using PATSTAT is the need to run and maintain a database of more than 200 gigabytes. However, there is also an online version of PATSTAT that is free for the first two months if you wish to try it by signing up for the trial (knowledge of SQL required).

Figure 7.16: EPO Patstat Portal
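
For readers wondering what level of SQL is involved, the sketch below runs a typical counting query against a tiny in-memory SQLite table. The table and column names (`tls201_appln`, `appln_auth`, `appln_filing_year`) follow PATSTAT naming as we understand it and should be checked against the PATSTAT documentation. The four rows are made up, and SQLite simply stands in for whichever database engine you use.

```python
import sqlite3

# In-memory database with a toy version of the applications table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tls201_appln ("
    "appln_id INTEGER PRIMARY KEY, appln_auth TEXT, appln_filing_year INTEGER)"
)
conn.executemany(
    "INSERT INTO tls201_appln VALUES (?, ?, ?)",
    [(1, "EP", 2019), (2, "EP", 2019), (3, "US", 2019), (4, "US", 2020)],  # made-up rows
)

# Count applications by filing authority and filing year.
QUERY = """
SELECT appln_auth, appln_filing_year, COUNT(*) AS applications
FROM tls201_appln
GROUP BY appln_auth, appln_filing_year
ORDER BY appln_auth, appln_filing_year
"""

counts = conn.execute(QUERY).fetchall()
for authority, year, applications in counts:
    print(authority, year, applications)
```

Most statistical work with PATSTAT consists of queries of this kind, grouping and counting records and joining the applications table to other tables on shared identifiers.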

For users seeking to load PATSTAT into a MySQL database, Simone Mainardi provides code on GitHub, while Gianluca Tarasconi also provides scripts through the raw patent blog. Note that, as a result of the pandemic, members of the patent analytics community may not have posted the latest scripts. However, the PATSTAT team includes

7.1.13 Other data sources

A number of companies provide access to patent data, typically with tiered access depending on your needs and budget. Examples include The Lens, Derwent Innovation, PatSnap, Dimensions, Questel Orbit, STN, and PatBase. We will not be focusing on these services but we will look at the use of data tools to work with data from services such as Derwent Innovation.

For more information on free and commercial data providers try the excellent Patent Information User Group and its list of Patent Databases.

Figure 7.17: Patent Information User Group

7.2 Tools for Accessing Patent Data

In closing this chapter we will highlight a couple of tools for accessing patent data, typically using APIs and Python. We will come back to this later and are working to try this approach in R.

7.2.1 Patent2Net in Python

A Python based tool to access and process the data from the European Patent Office OPS service.

Figure 7.18: Patent2Net

7.2.2 Python EPO OPS Client by Gsong

A Python client for OPS access developed by Gsong and freely available on GitHub. Used in Patent2Net above.

Figure 7.19: Python Client for the EPO OPS Service

7.3 Round Up

One problem that has confronted patent analysts for many years is access to data in a form that is suitable for more detailed analysis. In the six years between the creation of the first edition of this Manual and the second edition the situation has changed quite dramatically. On the one hand data providers such as the Lens allow users to download 50,000 records at a time. On the other hand the USPTO through PatentsView and the EPO through its text analysis service have made access to patent full texts quite straightforward for those who are prepared to invest some of their time in acquiring programming skills. As patent data becomes more readily available for analysis attention is inevitably turning to topics such as machine learning to work with data at scale, such as the open source PatCit initiative for patent citations. However, before addressing these more complex topics it is important to focus on the fundamentals of patent analytics. In the next two chapters we provide practical walkthroughs with two patent databases before turning to the use of analytics tools.