Preface

The WIPO Patent Analytics Handbook provides an introduction to advanced methods for patent analytics and focuses on tools and skills that patent analysts can use in their everyday work. The Handbook builds on the WIPO Manual for Open Source Patent Analytics, which introduces a range of free tools for obtaining, cleaning and visualizing patent data. The Handbook aims to address two challenges.

The first challenge is that anyone starting work in patent analytics is confronted by a lack of reliable practical guidance on how to develop simple descriptive patent statistics. The OECD Patent Statistics Manual is an invaluable resource and required reading for anyone seeking to engage with patent statistics (OECD Patent Statistics Manual 2009). However, it focuses on the issues we need to think about rather than on practical demonstration. The Handbook addresses this problem by working through first principles in the development of patent counts for descriptive statistics and by providing basic illustrations of the use of linear regression and forecasting models. In the process, the Handbook aims to build a bridge to more sophisticated approaches to working with patent data at scale in fields such as econometrics, and points to useful resources in these areas.

The ability to generate descriptive patent statistics is only one aspect of patent analytics. Recent years have witnessed an explosion in the availability of different data types that can be integrated with patent data to better inform and enrich analysis. The second and major challenge addressed by the Handbook is integrating these different data types, ranging from the scientific literature to geographic information and the results of text mining, into patent analytics. In turn, the range of methods available to patent analysts promises to be transformed by the emergence of accessible machine learning tools for tasks such as applicant name cleaning, text mining and image classification. As in many other fields of research, machine learning appears to hold considerable promise for patent analytics, but it remains to be seen whether this promise will be realised.

The Handbook is therefore intended for researchers and professionals who are relatively new to working with patent data. It is also intended to be of interest to experienced researchers and professionals who want to expand their skills in working with patent and related data at different scales.

One important challenge that has emerged in recent years with the growth of patent analytics and patent landscape analysis is the problem of reproducibility (Smith et al. 2017). Patent analysts typically work with data from a number of different databases and use a number of different methods in their analysis. However, the precise details of the coverage of different sources, the methods used, and the limitations of different approaches are often not made explicit. This makes it difficult for others to reproduce the results and to assess the quality of the analysis presented. The Handbook takes the approach that patent analysis should be reproducible, and addresses this issue by using examples drawn from standardised open access datasets created for this purpose or from public sources. The online version of the Handbook is an example of literate programming, and all chapters are accompanied by the code used to develop the examples. The chapters, in R Markdown format and containing all code, are available from the public GitHub repository at https://github.com/wipo-analytics/handbook

References

OECD Patent Statistics Manual. 2009. OECD Publishing. https://doi.org/10.1787/9789264056442-en.

Smith, James A, Zeeshaan Arshad, Hannah Thomas, Andrew J Carr, and David A Brindley. 2017. “Evidence of Insufficient Quality of Reporting in Patent Landscapes in the Life Sciences.” Nature Biotechnology 35 (3): 210–14. https://doi.org/10.1038/nbt.3809.