Chapter 1 Introduction

Patent analytics is a growing field that encompasses the analysis of patent data and the scientific literature, data cleaning, text mining, machine learning, geographic mapping and data visualisation. The WIPO Patent Analytics Handbook provides an introduction to advanced methods and tools for patent analytics. The Handbook complements the WIPO Manual on Open Source Patent Analytics, which provides an introduction to tools and methods in patent analytics.

The Handbook focuses on more advanced methods and approaches using commercial and free tools and databases. The fields of patent search, patent statistics and patent analytics have been transformed in recent years by the growing availability of free and commercial databases and software for data mining, data visualisation and geographic mapping. The increasing availability of a wide range of web services or Application Programming Interfaces (APIs) for access to patent data and the scientific literature, together with cloud computing services for machine learning or geocoding, means that today a patent analyst has access to an unprecedented and cost-effective range of tools to facilitate their work.

Chapter 2 focuses on researching the scientific literature as a foundation for in-depth patent research and analysis. The chapter begins by highlighting the growing accessibility of scientific publications and data arising from an increasing emphasis on open access publication. It then turns to the role of exploratory searches of the scientific literature in defining keyword search strategies, before exploring the main issues that arise when working with the scientific literature and how they can be addressed. The chapter concludes by considering strategies for joining together the scientific literature and patent literature.

Chapter 3 examines geocoding of the scientific literature to develop geographic maps. Increasingly, it is possible to link different types of data on the same map using online geolocation services and to present the results in interactive form. This chapter provides an introductory guide to geocoding using the Google Maps API and packages that can be used to access the API.

Chapter 4 provides an in-depth exploration of methods for counting patent data as a basis for creating descriptive patent statistics and statistical models. Methodologies for patent counts have received remarkably little attention outside a highly specialised literature, and this chapter aims to provide a step-by-step introduction to the issues involved in developing descriptive patent statistics. The chapter ranges through counts by priority, counts of patent applications, and counts by family. The chapter then provides a gentle introduction to linear regression using popular models as a basis for exploring predictive modelling by forecasting trends in PCT applications at WIPO.
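To make the difference between these counting units concrete, the following is a minimal sketch rather than the method used in Chapter 4. The records, identifiers and yearly totals are invented for illustration, the function name fit_line is hypothetical, and the trend line is an ordinary least squares fit written out by hand.

```python
from collections import Counter

# Hypothetical records: (application number, family id, earliest priority year).
# All values are invented for illustration.
records = [
    ("APP-1", "FAM-1", 2015),
    ("APP-2", "FAM-1", 2015),
    ("APP-3", "FAM-2", 2016),
    ("APP-4", "FAM-3", 2016),
    ("APP-5", "FAM-3", 2016),
    ("APP-6", "FAM-4", 2017),
]

# Count of applications: every application is counted once.
applications_per_year = Counter(year for _, _, year in records)

# Count by family: each family is counted once, under its earliest priority year.
family_year = {}
for _, family, year in records:
    family_year[family] = min(year, family_year.get(family, year))
families_per_year = Counter(family_year.values())


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented yearly totals, used only to show a simple linear forecast.
years = [2014, 2015, 2016, 2017, 2018]
totals = [10, 12, 14, 16, 18]
slope, intercept = fit_line(years, totals)
forecast_2019 = slope * 2019 + intercept

print(dict(applications_per_year))
print(dict(families_per_year))
print(forecast_2019)
```

In this toy data the same six applications collapse to four families, which is why the choice of counting unit needs to be stated alongside any patent statistic.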

Chapter 5 addresses the importance of understanding patent classification systems as the key tool for supporting patent analytics. The patent system is supported by a range of classification schemes that are designed to assist patent examiners with identifying and retrieving patent documents. These classification schemes commonly take the form of alphanumeric codes organised from general to specific categories. This chapter discusses the use of the International Patent Classification (IPC) and the closely related Cooperative Patent Classification (CPC) in patent analytics. The chapter provides an in-depth introduction to the IPC with a case study of using the IPC to examine patent activity for animal genetic resources, and concludes with a discussion of the growing use of classification systems in technology mapping.
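The general-to-specific organisation of these codes can be sketched by splitting an IPC symbol into its hierarchical levels (section, class, subclass, main group and subgroup). This is a minimal illustration under stated assumptions: the function name parse_ipc and the regular expression are hypothetical, the symbol is used only as an example of the format, and real data often contains formatting variants that this pattern does not handle.

```python
import re

# Assumed layout of a full IPC symbol such as "A01K 67/027":
# section (letter A-H), class (two digits), subclass (letter),
# main group (1-4 digits), "/" and subgroup (2-6 digits).
IPC_PATTERN = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


def parse_ipc(symbol):
    """Split an IPC symbol into its levels, from general to specific."""
    match = IPC_PATTERN.match(symbol.strip().upper())
    if match is None:
        raise ValueError(f"Not a recognised IPC symbol: {symbol!r}")
    section, cls, subclass, main_group, subgroup = match.groups()
    stem = section + cls + subclass
    return {
        "section": section,
        "class": section + cls,
        "subclass": stem,
        "main_group": f"{stem} {main_group}/00",
        "subgroup": f"{stem} {main_group}/{subgroup}",
    }


levels = parse_ipc("A01K 67/027")
for name, code in levels.items():
    print(name, code)
```

Truncating codes to a shared level, for example the subclass, is one simple way to aggregate documents before counting them by technology area.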

Chapter 6 explores the important role that patent citations play in patent analytics and the strengths and weaknesses of different approaches to patent citation analysis. The chapter begins with a description of the two types of patent citation (backward and forward citations), the sources of patent citations and their impacts, before considering different approaches to citation counts based on citations of individual documents and citations of patent families. The chapter uses the example of gene editing CRISPR patents and citations as a case study and concludes by exploring research on main path analysis with citation data.
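The relationship between the two citation types, and the effect of counting at document or family level, can be sketched as follows. This is an illustration only: the document and family identifiers are invented, the function names are hypothetical, and counting distinct citing families is just one of several possible family-level conventions.

```python
from collections import defaultdict

# Hypothetical backward citations: each document lists the earlier
# documents that it cites. Identifiers are invented for illustration.
backward = {
    "DOC-C": ["DOC-A", "DOC-B"],
    "DOC-D": ["DOC-A"],
    "DOC-E": ["DOC-C", "DOC-A"],
}

# Hypothetical mapping of documents to patent families.
family_of = {
    "DOC-A": "FAM-1",
    "DOC-B": "FAM-1",
    "DOC-C": "FAM-2",
    "DOC-D": "FAM-3",
    "DOC-E": "FAM-3",
}


def forward_citations(backward_citations):
    """Invert backward citations: cited document -> documents citing it."""
    forward = defaultdict(set)
    for citing, cited_docs in backward_citations.items():
        for cited in cited_docs:
            forward[cited].add(citing)
    return {doc: sorted(citing) for doc, citing in forward.items()}


def family_forward_counts(backward_citations, families):
    """Count the distinct citing families for each cited family."""
    citing_families = defaultdict(set)
    for citing, cited_docs in backward_citations.items():
        for cited in cited_docs:
            citing_families[families[cited]].add(families[citing])
    return {fam: len(citers) for fam, citers in citing_families.items()}


forward = forward_citations(backward)
document_counts = {doc: len(citing) for doc, citing in forward.items()}
family_counts = family_forward_counts(backward, family_of)

print(document_counts)
print(family_counts)
```

In this toy data the two documents in FAM-1 receive four document-level citations between them, but only two distinct families cite FAM-1, which shows how the choice of counting unit changes the result.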

Chapter 7 provides an in-depth introduction to text mining as a powerful tool in the patent analyst's toolbox. Building on the discussion in Chapter 2, the chapter moves through the basics of text mining with patent data and concludes with the growing emphasis on machine learning approaches such as the popular Word2Vec algorithm.

Chapter 8 examines the opportunities presented by machine learning to advance patent analytics. Machine learning or artificial intelligence approaches are increasingly being applied to text classification, named entity recognition and image classification. The application of machine learning in patent analytics remains at an early stage, with the USPTO pioneering the application of machine learning algorithms to inventor and applicant name cleaning, while Clarivate Analytics has recently applied machine learning to enhance the cleaning of applicant names. In future years we are likely to see the application of machine learning across the spectrum of patent analysis tasks. However, it can be very difficult to separate the hype around machine learning and artificial intelligence from the reality of what is available and achievable now. This chapter moves from the basics of machine learning approaches in Natural Language Processing to a worked example using the popular spaCy library in Python.

Chapter 9 concludes this edition of the WIPO Patent Analytics Handbook with a discussion of the possible future(s) of patent analytics in the context of increasing access to patent and related data at scale and the rise of machine learning.