Datasets

All datasets used in the Handbook are publicly available through an Open Science Framework repository at https://osf.io/jr87e/ and are arranged by chapter.

The Handbook makes extensive use of data relating to drone technology.

  • The drones dataset. This is a set of training datasets used in the examples. The core dataset consists of 15,557 patent applications that contain the term "drone" or "drones" somewhere in the text. You can download the data as a zip file from the repository at https://osf.io/download/zubd4/
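For readers who prefer to fetch the zip file from within R rather than through a browser, the download step might be sketched as follows. This is a minimal sketch, not part of the Handbook's own code: `fetch_zip()` and the `dest_dir` folder names are hypothetical, and it assumes the OSF link serves the zip file directly.

```r
# Sketch: download a zipped dataset and unpack it into a local folder.
# fetch_zip() is a hypothetical helper, not part of the drones package.
fetch_zip <- function(url, dest_dir = "data") {
  # Create the destination folder if it does not already exist
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)

  # Download to a temporary file; mode = "wb" keeps the binary zip intact on Windows
  zip_path <- tempfile(fileext = ".zip")
  download.file(url, destfile = zip_path, mode = "wb")

  # Unpack the archive and return the paths of the extracted files
  unzip(zip_path, exdir = dest_dir)
}

# Example call (requires an internet connection), using the URL given above:
# files <- fetch_zip("https://osf.io/download/zubd4/", dest_dir = "data/drones")
# files
```

The example call is left commented out so that the helper can be sourced without triggering a download.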

Users of RStudio can install the drones package directly using the following code. Note that the devtools package must be installed first (it is included in the packages above).

#install.packages("devtools")

devtools::install_github("wipo-analytics/drones")

When the drones package is installed, review the contents of each dataset in the package documentation (see Packages in RStudio) and load the data into your workspace using the following.

library(drones)
drones <- drones::drones
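The review step described above can also be done from the console. The following is a minimal sketch using base R only; the `drones_installed` variable is introduced here for illustration, and no assumption is made about which columns the table contains.

```r
# Sketch: list and inspect the datasets bundled with the drones package.
# Guarded so the script still runs if the package has not been installed yet.
drones_installed <- requireNamespace("drones", quietly = TRUE)

if (drones_installed) {
  # Names and titles of every dataset shipped with the package
  print(data(package = "drones")$results[, c("Item", "Title"), drop = FALSE])

  # Load the core table, then check its dimensions and column types
  drones <- drones::drones
  print(dim(drones))
  str(drones, max.level = 1)
} else {
  message("The drones package is not installed; see the install_github() call above.")
}
```

Running `help(package = "drones")` opens the same documentation index in the RStudio Help pane.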

A new version of the drones dataset package, called dronesr, has been created using data from The Lens database and its API. If you would like to use updated data to test the approaches provided in the Handbook, install the dronesr package. For those who are not using R, the new data can be downloaded as a single zipped file from the Open Science Framework repository at https://osf.io/download/yngqc/

#install.packages("devtools")

devtools::install_github("wipo-analytics/dronesr") # from github

The dronesr data contains patent data and scientific literature and is available in two public Lens collections:

  1. The patent collection https://www.lens.org/lens/search/patent/list?collectionId=199031
  2. The literature collection https://www.lens.org/lens/search/scholar/list?collectionId=199039

The dronesr data is particularly valuable for readers interested in exploring the relationship between the scientific and the patent literature and citations.