Chapter 1 Introduction

Patent analytics is a growing field that encompasses the analysis of patent data, analysis of the scientific literature, data cleaning, text mining, machine learning, geographic mapping and data visualisation. The WIPO Patent Analytics Handbook provides an introduction to advanced methods and tools for patent analytics. It complements the WIPO Manual on Open Source Patent Analytics, which introduces the open source tools and methods used in patent analytics.

The Handbook focuses on more advanced methods and approaches using commercial and free tools and databases. The fields of patent search, patent statistics and patent analytics have been transformed in recent years by the growing availability of free and commercial databases and software for data mining, data visualisation and geographic mapping. The increasing availability of web services and Application Programming Interfaces (APIs) for accessing patent data and the scientific literature, together with cloud computing services for machine learning and geocoding, means that patent analysts today have access to an unprecedented and cost-effective range of tools to support their work.

Chapter 2 focuses on researching the scientific literature as a foundation for in-depth patent research and analysis. The chapter begins by highlighting the growing accessibility of scientific publications and data arising from the increasing emphasis on open access publication. It then examines the role of exploratory searches of the scientific literature in defining keyword search strategies, and explores the main issues that arise when working with the scientific literature and how they can be addressed. The chapter concludes by considering strategies for linking the scientific literature with the patent literature.

Chapter 3 addresses methods for counting patent data as a basis for creating descriptive patent statistics and statistical models. Methodologies for patent counts have received remarkably little attention outside a highly specialised literature, and this chapter aims to provide a step-by-step introduction to the issues involved in developing descriptive patent statistics. The chapter concludes by illustrating how trends in demand for patent rights can be identified across multiple countries.

Chapter 4 focuses on the EPO World Patent Statistical Database (PATSTAT), which many patent offices and researchers use as the international standard for developing patent statistics and indicators. In common with other databases, PATSTAT must be queried using SQL, or through interfaces such as IISC PATSTAT that make access easier.
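To give a flavour of what an SQL query against patent data looks like, the sketch below uses Python's built-in sqlite3 module and a tiny in-memory table loosely modelled on PATSTAT's application table (tls201_appln). The three-column layout and the rows are illustrative assumptions, not real PATSTAT data; the full database is far larger and is normally loaded into a server database.

```python
import sqlite3

# Illustrative only: a tiny in-memory table loosely modelled on PATSTAT's
# tls201_appln application table. The column set and rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tls201_appln ("
    " appln_id INTEGER PRIMARY KEY,"
    " appln_auth TEXT,"          # filing authority, e.g. 'EP' or 'US'
    " appln_filing_year INTEGER)"
)
conn.executemany(
    "INSERT INTO tls201_appln VALUES (?, ?, ?)",
    [(1, "EP", 2015), (2, "US", 2015), (3, "EP", 2016), (4, "EP", 2016)],
)

# Count applications per filing authority and filing year.
query = """
SELECT appln_auth, appln_filing_year, COUNT(*) AS n_applications
FROM tls201_appln
GROUP BY appln_auth, appln_filing_year
ORDER BY appln_auth, appln_filing_year
"""
rows = conn.execute(query).fetchall()
for auth, year, n in rows:
    print(auth, year, n)
```

The same GROUP BY pattern underlies many of the descriptive counts discussed in Chapter 3, although real queries usually join several PATSTAT tables.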

Chapter 5 addresses the use of other patent data fields, such as applicant and inventor names, classification codes and citation data, in the development of innovation analysis and business intelligence indicators.

Chapter 6 provides an introduction to text mining as a powerful tool in the patent analyst's toolbox. Building on the discussion in Chapter 1, the chapter moves through the basics of text mining with patent data and concludes by considering machine learning approaches such as the popular Word2Vec algorithm.

Chapter 7 addresses the geocoding of patent data to produce geographic maps of patent and related data. Increasingly, it is possible to link different types of data on the same map using online geolocation services and to present the results in interactive maps. This chapter will discuss the principal patent data fields that are available for mapping and provide illustrations from sources such as the USPTO and the ASEAN marine patent landscape report. It will also discuss the strengths and weaknesses of geolocation services such as the Google Maps API, including the noisy nature of patent name and address fields, methods for regularising address data, and the challenges involved in validating the georeferenced data returned by geolocation web services.

Chapter 9 focuses on the opportunities presented by machine learning to advance patent analytics. Machine learning and artificial intelligence approaches are increasingly being applied to text classification, named entity recognition and image classification. The application of machine learning in patent analytics remains at an early stage, with the USPTO pioneering the use of machine learning algorithms for inventor and applicant name cleaning, while Clarivate Analytics has recently applied machine learning to enhance the cleaning of applicant names. In the coming years we are likely to see machine learning applied across the spectrum of patent analysis tasks. However, it can be very difficult to separate the hype around machine learning and artificial intelligence from the reality of what is available and achievable now. This chapter aims to help readers navigate these exciting but at times confusing and overhyped opportunities.

The patent system is supported by a range of classification schemes that are designed to assist patent examiners with identifying and retrieving patent documents. These classification schemes commonly take the form of alphanumeric codes organised from general to specific categories. Chapter 10 discusses the use of the International Patent Classification (IPC) and the closely related Cooperative Patent Classification (CPC) in patent analytics.
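The general-to-specific structure of these codes can be illustrated by splitting a symbol such as A61K 31/445 into its levels: section (A), class (A61), subclass (A61K), main group (A61K 31/00) and subgroup. The sketch below is a simplified assumption: it expects the common "A61K 31/445" layout and does not capture the dot-based hierarchy among subgroups that the official schemes use.

```python
import re
from typing import NamedTuple


class ClassificationSymbol(NamedTuple):
    section: str
    cls: str
    subclass: str
    main_group: str
    subgroup: str


# Sections A-H are IPC sections; the CPC adds section Y. This pattern is a
# simplified assumption covering the common "A61K 31/445" layout only.
SYMBOL_RE = re.compile(r"^([A-HY])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


def parse_symbol(code: str) -> ClassificationSymbol:
    match = SYMBOL_RE.match(code.strip().upper())
    if match is None:
        raise ValueError(f"Unrecognised classification symbol: {code!r}")
    return ClassificationSymbol(*match.groups())


def hierarchy(code: str) -> list[str]:
    """Return the symbol at each level, from most general to most specific."""
    s = parse_symbol(code)
    subclass = f"{s.section}{s.cls}{s.subclass}"
    return [
        s.section,                                 # section
        f"{s.section}{s.cls}",                     # class
        subclass,                                  # subclass
        f"{subclass} {s.main_group}/00",           # main group
        f"{subclass} {s.main_group}/{s.subgroup}", # subgroup
    ]


print(hierarchy("A61K 31/445"))
```

Aggregating counts at the subclass or main group level, rather than the full symbol, is one way to move from specific to broader technology categories.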

Chapter 11 discusses the important role that patent citations play in patent analytics and the strengths and weaknesses of different approaches to patent citation analysis. The chapter begins with a description of the two types of patent citation (backward and forward citations), the sources of patent citations and their impacts, before considering different approaches to citation counts based on citations of individual documents and citations of patent families.
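The difference between counting forward citations (citations a document receives from later documents) at the level of individual documents and at the level of patent families can be made concrete with a small sketch. The document identifiers, family assignments and citation pairs below are invented for illustration, and the choice to ignore citations between members of the same family is an assumption rather than a fixed rule.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: document -> patent family, and (citing, cited) pairs.
family_of = {"US1": "F1", "EP1": "F1", "US2": "F2", "EP2": "F2", "US3": "F3"}
citations = [("US2", "US1"), ("EP2", "EP1"), ("EP2", "US1"), ("US3", "US1")]


def document_forward_counts(citations):
    """Forward citations received by each cited document (duplicate pairs ignored)."""
    return Counter(cited for _citing, cited in set(citations))


def family_forward_counts(citations, family_of):
    """Number of distinct citing families for each cited family."""
    citing_families = defaultdict(set)
    for citing, cited in citations:
        citing_fam, cited_fam = family_of[citing], family_of[cited]
        if citing_fam != cited_fam:  # assumption: skip within-family citations
            citing_families[cited_fam].add(citing_fam)
    return {fam: len(fams) for fam, fams in citing_families.items()}


doc_counts = document_forward_counts(citations)
fam_counts = family_forward_counts(citations, family_of)
print(dict(doc_counts))
print(fam_counts)
```

Here US1 and EP1 receive four document-level citations between them, but only two distinct families (F2 and F3) cite family F1, showing how family-level counting reduces the inflation caused by the same invention being filed in several countries.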

Chapter 12 considers the emerging topic of social media as part of the toolbox for patent analytics. Using data from Twitter as an example, the chapter considers the potential use of social media data in areas such as searching for prior art, understanding company activity in a technology, assessing potential markets for an invention and following public debate about controversial areas of science and technology such as artificial intelligence.