Chapter 9 Conclusion

This Handbook has aimed to provide an accessible and practical guide to intermediate and advanced methods and tools for patent analytics. It complements the WIPO Manual on Open Source Patent Analytics, which has been widely used in introductory training in patent analytics.

In conclusion, it is important to highlight some of the key take-home messages that emerge from this wide-ranging exploration of methods in patent analytics.

The first of these is that patent activity is an outcome of underlying investments in Research and Development. When initiating research for a patent analytics project, it makes very good sense to start by investigating the scientific literature. Analysis of the scientific literature not only contributes to identifying search strategies for an area of technology but also allows you to become familiar with the main trends and actors in that area. The growing accessibility of tools for geospatial mapping, such as APIs or the new Research Organization Registry (ROR) and its incorporation into datasets such as OpenAlex, means that it is now possible to rapidly generate useful maps of global research activity. Mapping of this type not only assists with refining the focus of an analytics project but can also help to identify potential partners in different countries around the world, along with competing or emerging approaches. Finally, research organisations are increasingly important players in the global landscape of patent activity. It is for these reasons that research methods involving the scientific literature are so prominent in this Handbook.

Our second main insight is that it is very important to become familiar with different methods for counting patent data and the purposes for which different types of count can be used. This also requires careful attention to terminology in order to avoid misleading an audience. For example, as a matter of good practice, using the term ‘patent documents’ rather than ‘patents’ to describe patent data avoids giving an audience the impression that the analysis refers to patent grants rather than applications or a mix of applications and grants. It is important to be clear with audiences about what is being described. For example, where the focus is on identifying trends in research and development, the use of priority counts is appropriate and can be readily explained as first filings. An audience will then generally want to see trends in applications and grants, which involves engagement with patent families and careful attention to kind codes and effective description. Because different data providers use different and sometimes opaque definitions of patent families, it is logical to use the DOCDB or INPADOC families wherever possible. However, in all cases analysis of activity using patent family data should be clearly explained to the reader, with added notes on issues around the interpretation of kind codes, particularly where analysis is extended to multiple countries. Methodological transparency is critically important where analysis aims to inform commercial or policy decision-making.
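
To make the distinction between types of count concrete, the following is a minimal sketch in Python rather than a definitive implementation. The records, the identifiers (`P1`, `F1` and so on) and the `GRANT_KINDS` mapping are all hypothetical. The sketch assumes that each document carries a single earliest priority and a single family identifier, and the kind code mapping is deliberately incomplete, because the meaning of kind codes varies by office and over time.

```python
# Toy records with made-up identifiers (not real patent documents):
# (country, kind code, earliest priority, family identifier)
records = [
    ("US", "A1", "P1", "F1"),
    ("US", "B2", "P1", "F1"),
    ("EP", "A1", "P1", "F1"),
    ("EP", "B1", "P1", "F1"),
    ("US", "A1", "P2", "F2"),
    ("JP", "A", "P3", "F3"),
]

# Illustrative and incomplete: kind codes treated here as grants, per office.
# A real project would need to check each office and period separately.
GRANT_KINDS = {"US": {"B1", "B2"}, "EP": {"B1", "B2"}}


def summarise(records):
    """Return four different counts for the same set of patent documents."""
    first_filings = {priority for _, _, priority, _ in records}
    families = {family for _, _, _, family in records}
    families_with_grant = {
        family
        for country, kind, _, family in records
        if kind in GRANT_KINDS.get(country, set())
    }
    return {
        "documents": len(records),            # every publication counted
        "first_filings": len(first_filings),  # distinct earliest priorities
        "families": len(families),            # distinct family identifiers
        "families_with_grant": len(families_with_grant),
    }


summary = summarise(records)
print(summary)
```

The same six documents yield four different numbers, which is why a report should state which count is being presented and how grants were identified.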

A detailed understanding of the types of patent count and the issues involved is also important in preparation for modelling patent data. Chapter 4 introduced linear regression using widely used models for elucidating trends in existing data. This in turn provided a basis for exploring common approaches to forecasting using data on PCT applications at WIPO. As our example of a fictional crisis at WIPO made clear, it is possible to forecast patent activity such as PCT filings. However, this requires an understanding both of the data itself and of approaches to forecasting, including their strengths and weaknesses. For example, we might reasonably feel confident in approaching forecasting of first filings in individual countries or areas of technology, but forecasting patent family activity globally would be an entirely different matter. In making the transition from simple counts to modelling patent activity we can readily use common smoothing models, but forecasting requires considerable care in selecting the data to be forecast.
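
As a reminder of what fitting a linear trend involves, here is a minimal sketch in plain Python of an ordinary least squares line fitted to yearly counts and extended by one year. The yearly figures are made-up toy numbers, not PCT statistics, and the function name `fit_line` is an assumption for illustration. A straight-line extrapolation is only the simplest possible baseline and is not a substitute for a proper forecasting model.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up toy counts of first filings per year (illustrative only).
years = [2015, 2016, 2017, 2018, 2019]
filings = [100, 110, 120, 130, 140]

slope, intercept = fit_line(years, filings)

# Naive one-step-ahead extrapolation of the fitted trend.
forecast_2020 = slope * 2020 + intercept
print(f"trend: {slope:.1f} filings per year, 2020 estimate: {forecast_2020:.0f}")
```

Real patent counts are rarely this tidy. The most recent years are typically incomplete because of publication lags, so the choice of which years to include matters as much as the choice of model.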

The global patent system can be likened to the fictional Hogwarts library in that the global patent library houses information that is entirely innocuous and information that could be dangerous in the wrong hands. Knowledge of patent classification systems is central to the ability of an analyst to successfully navigate this system. It is therefore important to recognise the strength of patent classification systems in helping to focus patent analysis by selecting relevant areas of the system. However, as we saw in the case study in Chapter 5, it is also important to recognise the weaknesses of the patent classification for specific projects. Thus, while useful, the classification will commonly be either too broad or too narrow for a given analytics project. In other cases, such as emerging technologies, the classification may not yet have caught up with the latest developments (e.g. in the historic case of nanotechnology). These limitations will generally require patent analysts to develop their own groupings. Chapter 5 provided an example of one approach to grouping, using network analysis and community detection, that became the basis for organising a patent landscape analysis. Finally, it is important to remember that the use of classification in patent analysis forms part of an exercise in communication with an audience who are unlikely to have much time for, or interest in, classification symbols or long reams of formulaic text. As such, finding ways to communicate data on technology areas in a way that is understandable, as in the case of the short IPC, is an important element in successful patent analytics.
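
As a rough illustration of how classification symbols can be turned into a network for grouping, the sketch below truncates IPC symbols to the four-character subclass level and counts how often pairs of subclasses appear on the same document. The documents are hypothetical, the truncation rule is an assumption made for this example, and the resulting weighted edge list is only the input that a community detection algorithm would need, not the detection step itself.

```python
from collections import Counter
from itertools import combinations


def subclass(symbol):
    """Return the four-character subclass of an IPC symbol, e.g. 'C12N'."""
    return symbol.replace(" ", "")[:4]


# Hypothetical documents, each with a list of IPC symbols.
documents = {
    "doc1": ["C12N 15/09", "C12N 9/22", "A61K 38/00"],
    "doc2": ["C12N 15/11", "A61K 48/00"],
    "doc3": ["A01H 5/00", "C12N 15/82"],
}

# Count co-occurrences of distinct subclasses within each document.
edges = Counter()
for symbols in documents.values():
    subclasses = sorted({subclass(s) for s in symbols})
    edges.update(combinations(subclasses, 2))

for (left, right), weight in edges.most_common():
    print(left, right, weight)
```

Working at subclass level keeps the network small enough to label and explain to an audience, at the cost of hiding detail at group level.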

Citation analysis has been a major, if not foundational, focus of attention in scientometrics and patent analytics. Chapter 6 provided an in-depth exploration of patent citations using the example of Nobel Prize-winning gene editing (CRISPR) technology and the contest between Berkeley and the Broad Institute in this field. Analysis of this worked example provided a basis for the exploration of main path analysis, which combines citation analysis with the use of patent classification systems in order to reveal the development of technologies and point towards the forecasting and detection of technological trajectories. Patent citation data is increasingly available at scale through the OECD, the USPTO PatentsView service and web services such as the Lens patent API. In addition, the citation connections between the scientific and patent literature are increasingly openly accessible at a range of different scales to inform patent analytics. Citation analysis is a critically important part of patent analysis because it allows us to identify the contours of technology landscapes using the framework of patent classification systems and, increasingly, to identify the trajectories of technologies within those landscapes. In future years, the ability to freely access citation data at scale will, in the author’s view, provide a platform for transformations in the scale and accuracy of patent analytics.
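
To give a sense of the arithmetic behind main path analysis, the sketch below computes the search path count (SPC), one of several traversal weights used in the main path literature, for each edge of a small citation network. The network and its identifiers are made up, the function name `search_path_counts` is an assumption, and the code assumes the citation graph has no cycles. It is a simplified illustration and does not reproduce the full workflow of Chapter 6.

```python
from functools import lru_cache

# Toy citation edges as (cited, citing) pairs, so knowledge flows left to right.
edges = [
    ("A", "B"), ("A", "C"),
    ("B", "D"), ("C", "D"),
    ("D", "E"), ("B", "E"),
]


def search_path_counts(edges):
    """Weight each edge by the number of source-to-sink paths that cross it."""
    successors, predecessors = {}, {}
    for cited, citing in edges:
        successors.setdefault(cited, []).append(citing)
        predecessors.setdefault(citing, []).append(cited)

    @lru_cache(maxsize=None)
    def paths_in(node):
        # Number of paths reaching this node from any source
        # (a document that cites nothing else in the set).
        parents = predecessors.get(node, [])
        return 1 if not parents else sum(paths_in(p) for p in parents)

    @lru_cache(maxsize=None)
    def paths_out(node):
        # Number of paths from this node to any sink
        # (a document that is not cited within the set).
        children = successors.get(node, [])
        return 1 if not children else sum(paths_out(c) for c in children)

    return {(u, v): paths_in(u) * paths_out(v) for u, v in edges}


spc = search_path_counts(edges)
for edge, weight in sorted(spc.items(), key=lambda item: -item[1]):
    print(edge, weight)
```

Edges with high weights sit on many routes between the earliest and latest documents, which is why they are candidates for the main path. Ties are common in small networks and need an explicit rule in practice.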

Text mining is a key component of patent analytics and is being transformed by growing access to full text patent data thanks to the work of the USPTO PatentsView service and the EPO. Growing access to patent texts at scale is also accompanied by the increasing accessibility of machine learning based approaches to Natural Language Processing (NLP). Chapter 7 examined standard, widely used approaches to text mining and demonstrated how text mining could be combined with knowledge of the patent classification to target texts more accurately. Using the worked example of biodiversity patent activity, this provided a basis for the analysis of words and phrases (ngrams) in text mining and for widely used techniques such as Term Frequency Inverse Document Frequency (TFIDF) to identify the distinctive features of texts in areas of the classification and to focus the analysis on areas such as gene editing. The Chapter also demonstrated the analysis of terms over time and the role of the visualisation of networks of terms in patent analytics. These common techniques can be applied either programmatically (e.g. using R or Python) or using specialised analytics software such as VantagePoint for fine-grained control.
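
For readers who want to see the calculation behind term frequency inverse document frequency spelled out, the following is a minimal sketch in plain Python. It uses one common variant of the formula (term frequency as a share of the words in the document, multiplied by the natural log of the number of documents divided by the number of documents containing the term). The three short texts are invented for illustration, tokenisation is a naive split on whitespace, and the function name `tf_idf` is an assumption.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dictionary per document."""
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    scores = []
    for tokens in tokenised:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Invented example texts (not real patent titles).
texts = [
    "method for gene editing using guide rna",
    "method for preparing plant extract",
    "method for gene editing in plant cells",
]

scores = tf_idf(texts)
top_terms = sorted(scores[0].items(), key=lambda item: -item[1])[:3]
print(top_terms)
```

Terms that appear in every text, such as "method", score zero, while terms confined to one text score highest. That is the property that makes the measure useful for finding what is distinctive about an area of the classification.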

Standard approaches to text mining are extremely effective for many patent analytics tasks, whether using R, Python or VantagePoint. However, these approaches are increasingly being complemented, and for some replaced, by machine learning based approaches to Natural Language Processing. Chapter 8 provided an in-depth introduction to machine learning as a field that encompasses Natural Language Processing, text classification, named entity recognition and image classification. Machine learning based approaches to text or image analysis are ultimately based on the use of algorithms to recognise patterns. The Chapter then explored machine learning in Natural Language Processing in detail using the fasttext, spaCy and Prodigy libraries. It concluded by pointing to the increasing accessibility of more accurate transformer models and of off-the-shelf, plug-and-play services provided through companies such as Hugging Face and the major cloud service providers.

For patent analytics, the promise of machine learning based approaches is that it will be possible to automate tasks such as classification and entity recognition at scale and to incorporate models into processing pipelines. The growing accessibility of pretrained models and affordable infrastructure means that these approaches will become increasingly accessible for analysts regardless of their budgets. Above all, the promise of machine learning approaches for text analysis tasks is that they will relieve some of the burden of hard manual processing from the analyst through automation. However, as discussed in Chapter 8, against this we must not underestimate the challenge of training models to perform accurately on the specialised language of patent texts or the images that accompany patents. In machine learning, the quality of training data is king. A great deal of hidden time and labour is required to generate training data that is appropriate for patent analytics. We may hope, as some initiatives already suggest, that with time specialised models will be created to assist with the classification and extraction of entities from patent documents. However, in the meantime patent analysts will be wise to rely on existing easy-to-use techniques and to progressively experiment with incorporating accessible machine learning models into their workflows. In this way patent analysts can benefit from the undoubted strengths of machine learning approaches while avoiding some of their pitfalls.

In closing the Handbook it is appropriate to briefly speculate about what the future of patent analytics might look like. For the author of this Handbook it would be highly desirable to see the increasing availability of patent data in forms that are amenable to patent analytics. The USPTO PatentsView service is the model in this regard, followed by the EPO. It is to be hoped that in future the WIPO PCT collection might also be made available. As we have seen in the Handbook, considerable manual or computational processing is required on the part of analysts, notably with data cleaning and text processing. Some of this work, such as the harmonisation of applicant and inventor names, could be done in advance. The OECD has done pioneering work in this area that is linked to work on the EPO Worldwide Patent Statistical Database (PATSTAT). More recently the USPTO PatentsView service has done admirable work in disambiguating applicant and inventor names and geocoding patent data. It is highly desirable to support and encourage these types of initiatives for the benefit of the wider users of patent data. At the same time we could imagine that a great deal of the hard labour in text based patent analytics could be removed by following the example of the General Index, which provides open access to the ngrams of over 57 million scientific articles. A similar initiative by patent offices with their texts could greatly improve access to patent texts for analytics purposes by wider user communities. In a similar vein, as we saw in the example of fasttext, supporting and perhaps maintaining vector space models could greatly reduce duplication of effort by creating a common baseline. This would allow researchers and commercial providers to focus on the development of more specialist tasks.

Finally, as the first Handbook of its type, the present work forms part of a wider effort to promote open patent analytics for the benefit of the wider community. I hope that the Handbook has proved useful and that you will contribute to that effort through your own work.
