A set of 15,776 patent applications containing the word "drone" or "drones", published between 1845 and 2017, from the Clarivate Analytics Derwent Innovation database. The 15,776 unique application numbers were derived from a raw dataset of 18,970 publications.
data("drones")
A data frame with 15,776 observations of 26 variables:
abstract
The original document abstract, a character vector
abstract_english
The English-language document abstract, a character vector
applicant
The patent applicant name, also known as the assignee name, a character vector
applicant_cleaned
A cleaned version of the applicant name
application_number
The long application number including the date, a character vector
basic_patent_date
Derwent Innovation basic patent date, a character vector
basic_patent_number
The Derwent Innovation basic patent number that forms the basis of the DWPI family, a character vector
cited_nonpatent
Non-patent literature citations (a noisy field; see Details), a character vector
cited_patents
Patents cited in one or more documents, a character vector
citing_patents
Patents citing one or more documents, a character vector
cpc
Cooperative Patent Classification (CPC) codes, a character vector
dwpi_family_dates
Family dates for DWPI family numbers, a character vector
dwpi_family_kind
Document kind codes for DWPI family members, a character vector
dwpi_family_numbers
Derwent World Patent Index (DWPI) family members, a character vector
first_claim
The first claim in a patent document, a character vector
inpadoc_family_members
INPADOC family members in long format with dates, a character vector
inpadoc_first_family_member
The earliest publication number among the inpadoc_family_members, determined by date, a character vector
inventors
The original inventor name, a character vector
ipc
International Patent Classification (IPC) codes, a character vector
priority_number
Patent priority numbers in long format with dates, a character vector
publication_number
Publication numbers in short form, without dates, a character vector
publication_year
The publication year of each publication number, a character vector
related_application_numbers
Details of related patent applications, such as continuations, continuations-in-part and divisional applications, a character vector
title_english
The English title, a character vector
title_original
The original title, often concatenated in several languages (e.g. English, French, German), a character vector
Clarivate Analytics Derwent Innovation database
Field names in this dataset have been simplified from their original long form to make them easier to work with in R. Patent data fields are commonly concatenated with a semicolon and require tidying for accurate counts. The cited_nonpatent field also contains unrelated legal status information and is messy. Applicant (assignee) names were cleaned in VantagePoint in two steps: first, names grouped on the priority number were fuzzy matched and then reviewed manually; second, the cleaned names were fuzzy matched again, this time grouped on the INPADOC family member number.
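Because multi-valued fields are concatenated with a semicolon, counts taken directly on a column such as ipc would treat each whole string as one value. The following is a minimal base R sketch of one way to split such a field into long format before counting. It assumes values are separated by ";" with optional surrounding whitespace; the toy data frame, its IPC values and the helper separate_field() are hypothetical illustrations, not part of the dataset or its package.

```r
# Hypothetical toy rows mimicking the semicolon-concatenated layout described
# in Details; these are not real records from the drones data.
toy <- data.frame(
  application_number = c("A1", "A2", "A3"),
  ipc = c("B64C 39/02; G05D 1/00", "B64C 39/02", "G05D 1/00; H04N 7/18"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split a semicolon-concatenated column so that each
# value gets its own row, keeping the identifier column alongside it.
separate_field <- function(df, id_col, field) {
  parts <- strsplit(df[[field]], ";", fixed = TRUE)
  out <- data.frame(
    id = rep(df[[id_col]], lengths(parts)),  # repeat each id once per value
    value = trimws(unlist(parts)),           # drop spaces around each value
    stringsAsFactors = FALSE
  )
  names(out) <- c(id_col, field)
  # Drop missing or empty values left over from NA fields or trailing ";"
  keep <- !is.na(out[[field]]) & out[[field]] != ""
  out[keep, , drop = FALSE]
}

ipc_long <- separate_field(toy, "application_number", "ipc")

# Count how often each IPC code occurs across applications
ipc_counts <- sort(table(ipc_long$ipc), decreasing = TRUE)
print(ipc_counts)
```

With the real data, a call such as separate_field(drones, "application_number", "ipc") would follow the same pattern, provided the column uses the separator assumed above. If a code can repeat within a single application, counting distinct application_number values per code avoids double counting.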