A set of 15,776 patent applications containing the word "drone" or "drones", published between 1845 and 2017, from the Clarivate Analytics Derwent Innovation database. The 15,776 unique application numbers were derived from a raw dataset containing 18,970 publications.
data("drones")
A data frame with 15,776 observations of 26 variables:
abstract: The original document abstract, a character vector
abstract_english: The English document abstract, a character vector
applicant: The patent applicant name, also known as the assignee name, a character vector
applicant_cleaned: A cleaned version of the applicant name
application_number: The long application number including the date, a character vector
basic_patent_date: Derwent Innovation basic patent date, a character vector
basic_patent_number: The Derwent Innovation basic patent number forming the base for the dwpi_family, a character vector
cited_nonpatent: Literature citations, field is noisy, a character vector
cited_patents: Patents cited in one or more documents, a character vector
citing_patents: Patents citing one or more documents, a character vector
cpc: The Cooperative Patent Classification (CPC) codes, a character vector
dwpi_family_dates: Family dates for DWPI family numbers, a character vector
dwpi_family_kind: Document kind codes for DWPI family members, a character vector
dwpi_family_numbers: DWPI (Derwent World Patent Index) family members, a character vector
first_claim: The first claim in a patent document, a character vector
inpadoc_family_members: INPADOC family members in long format with dates, a character vector
inpadoc_first_family_member: The earliest publication number in inpadoc_family_members based on the date, a character vector
inventors: The original inventor name, a character vector
ipc: International Patent Classification (IPC) codes, a character vector
priority_number: Patent priority numbers in long format with dates, a character vector
publication_number: Publication numbers in short form minus dates, a character vector
publication_year: The year of publication of the publication numbers, a character vector
related_application_numbers: Details of related patent applications such as continuations, continuations in part and divisional applications, a character vector
title_english: The English title, a character vector
title_original: The original title, normally concatenated as English, French, German etc., a character vector
Clarivate Analytics Derwent Innovation database
Field names in this dataset have been simplified from their original long form to make them easier to work with in R. Patent data fields commonly hold several values concatenated with a semicolon, so they must be split and tidied before counts are accurate. The cited_nonpatent field in this dataset is messy and contains irrelevant legal status information. Applicant names (assignees) were cleaned in VantagePoint. First, names were fuzzy matched within groups sharing a priority number and then reviewed manually. In the second step, the cleaned names were fuzzy matched within groups sharing an INPADOC family member number.
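The semicolon tidying described above can be sketched in base R as follows. This is a minimal illustration, not the method used to prepare the dataset: the data frame `toy` and its values are invented stand-ins for the real `drones` data, and the example assumes the separator is a plain semicolon with optional surrounding whitespace.

```r
# Toy stand-in for the drones data frame; names and values are invented.
toy <- data.frame(
  publication_number = c("US1", "US2", "EP3"),
  applicant_cleaned  = c("Acme Corp; Beta Ltd", "Acme Corp", "Beta Ltd;Gamma Inc"),
  stringsAsFactors   = FALSE
)

# Split each record on semicolons; the result is a list with one
# character vector per row of the data frame.
parts <- strsplit(toy$applicant_cleaned, ";", fixed = TRUE)

# Build a long table with one row per publication/applicant pair,
# trimming the whitespace left around the separators.
long <- data.frame(
  publication_number = rep(toy$publication_number, lengths(parts)),
  applicant          = trimws(unlist(parts)),
  stringsAsFactors   = FALSE
)

# Drop empty strings that a trailing separator would leave behind.
long <- long[nzchar(long$applicant), ]

# Count applicants on the tidied values rather than the raw strings.
applicant_counts <- sort(table(long$applicant), decreasing = TRUE)
print(applicant_counts)
```

Counting the raw column would treat "Acme Corp; Beta Ltd" as a single applicant, whereas the long table counts each name separately. The same pattern should apply to other concatenated fields such as ipc or cpc, assuming they use the same separator.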