Learn how to use the USPTO PatentsView API in R.
The USPTO has implemented a patent visualization and analysis platform named PatentsView.
knitr::include_graphics("images/patentsview/fig1_front.png")
PatentsView is openly available on the web and enables the discovery and exploitation of US patent data in a visual, convenient way.
It makes it particularly easy to see which technologies are most patented, who the main patent assignees are within each technology area, and who the most prolific inventors are. These relationships, the associated locations and the underlying data can all be explored and compared interactively.
There is also an associated API for developers to personalise queries and download the corresponding patent data.
Moreover, an R package is available to access PatentsView data from R. Let’s check it out.
With the patentsview package we can access, search and analyse the USPTO patent data available in PatentsView.
First, we need to install and load the required packages (together with patentsview we will also use the highcharter package, which lets us generate interactive visualizations of the data we obtain).
library(patentsview)
library(tidyverse) # for data manipulation
library(highcharter) # wrapper of highchart library to generate interactive visualisations
We can then formulate our patent queries using the available search fields.
We express them inside the function with_qfuns(), which makes the query functions available without a prefix, and combine one condition per data field using the corresponding operator function: e.g. dates with gte(patent_date = "2016-01-01") (where gte stands for "greater than or equal to"), terms in text fields such as the abstract with text_all(patent_abstract = "UAS"), or specific Cooperative Patent Classification (CPC) classes with qry_funs$eq(cpc_subsection_id = "G12").
Here is just a query formulation example considering a series of synonyms and a series of related classes:
library(patentsview)
query <-
  with_qfuns(
    and(
      # patents dated on or after 1 January 2016
      gte(patent_date = "2016-01-01"),
      # abstract contains at least one of these terms
      or(
        text_all(patent_abstract = "UAS"),
        text_all(patent_abstract = "drones"),
        text_all(patent_abstract = "UAV"),
        text_all(patent_abstract = "commercial"),
        text_all(patent_abstract = "mobility"),
        text_all(patent_abstract = "traffic"),
        text_all(patent_abstract = "urban"),
        text_all(patent_abstract = "cities"),
        text_all(patent_abstract = "public"),
        text_all(patent_abstract = "security"),
        text_all(patent_abstract = "city"),
        text_all(patent_abstract = "unmanned"),
        text_all(patent_abstract = "aerial")
      ),
      # patent falls in at least one of these CPC classes
      # note: four-character codes such as "G02B" are CPC subclasses, so they
      # may need a different field than cpc_subsection_id to match anything
      or(
        qry_funs$eq(cpc_subsection_id = "G02B"),
        qry_funs$eq(cpc_subsection_id = "G09B"),
        qry_funs$eq(cpc_subsection_id = "G01"),
        qry_funs$eq(cpc_subsection_id = "G21K"),
        qry_funs$eq(cpc_subsection_id = "B64"),
        qry_funs$eq(cpc_subsection_id = "G08"),
        qry_funs$eq(cpc_subsection_id = "G05")
      )
    )
  )
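To get a feel for what the query helpers build, the sketch below writes a much smaller query by hand as nested R lists and serialises it with jsonlite. It assumes the underscore-prefixed operator names (`_and`, `_gte`, `_text_all`) of the PatentsView query language as it stood when this post was written; `manual_query` and `query_json` are illustrative names, and the API may have changed since.

```r
library(jsonlite)

# Hand-built equivalent of a small query (illustrative only):
# patents dated on or after 2016-01-01 whose abstract mentions "UAV".
# Assumption: operators are written with a leading underscore in the JSON.
manual_query <- list(
  "_and" = list(
    list("_gte" = list(patent_date = "2016-01-01")),
    list("_text_all" = list(patent_abstract = "UAV"))
  )
)

# auto_unbox = TRUE writes length-one vectors as scalars rather than arrays
query_json <- toJSON(manual_query, auto_unbox = TRUE)
print(query_json)
```

Each operator becomes a key and each field/value pair a nested object, which is why conditions can be combined to any depth with and() and or().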
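Some of the operators that can be used are the comparison functions eq, neq, gt, gte, lt and lte; the string functions begins and contains; the full-text functions text_all, text_any and text_phrase; and the logical functions and, or and not.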
Once we are happy with our search, we create a character vector with the fields to be used in our analysis (note that we also request location data, longitude and latitude, so that we can map the assignees later on):
fields <- c("patent_number", "assignee_organization",
"patent_num_cited_by_us_patents", "app_date", "patent_date",
"assignee_total_num_patents", "forprior_country", "assignee_id", "assignee_longitude", "assignee_latitude")
We then send our HTTP request to the PatentsView API to get the data:
library(patentsview)
# all_pages = TRUE pages through every result; note that this request can fail
pv_out <- search_pv(query = query, fields = fields, all_pages = TRUE)
# cache the raw response so later steps can be rerun without calling the API
save(pv_out, file = "data/pv_out.rda", compress = "xz")
load("data/pv_out.rda")
# unnest the data frames that are stored in list columns (e.g. assignees),
# keeping patent_number as the key that links them:
dl <- unnest_pv_data(data = pv_out$data, pk = "patent_number")
save(dl, file = "data/dl.rda", compress = "xz")
load("data/dl.rda")
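Since the API request can be slow or fail, it may help to wrap the save/load pattern in a small helper that only runs the expensive step when no cached file exists. This is a minimal sketch in base R; `cache_rds()` and `slow_fetch()` are hypothetical names introduced here, and `slow_fetch()` merely stands in for the real API call.

```r
# Hypothetical helper: return the cached object if the file exists,
# otherwise run `compute`, save its result, and return it.
cache_rds <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- compute()
  dir.create(dirname(path), showWarnings = FALSE, recursive = TRUE)
  saveRDS(result, path)
  result
}

# Stand-in for a slow API call; counts how often it really runs
n_calls <- 0
slow_fetch <- function() {
  n_calls <<- n_calls + 1
  data.frame(patent_number = c("1", "2"), stringsAsFactors = FALSE)
}

cache_file <- file.path(tempdir(), "pv_cache_demo.rds")
unlink(cache_file)                           # start from a clean state
first  <- cache_rds(cache_file, slow_fetch)  # runs slow_fetch and writes the file
second <- cache_rds(cache_file, slow_fetch)  # reads the file, no second call
```

With the objects from this post, the same idea would read `cache_rds("data/pv_out.rds", function() search_pv(query = query, fields = fields, all_pages = TRUE))`.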
Now that we have the data, let’s identify the top assignees.
library(tidyverse)
# We create a data frame with the top 75 assignees:
top_asgns <-
  dl$assignees %>%
  # drop patents assigned to an inventor without an organization
  filter(!is.na(assignee_organization)) %>%
  # numeric column with the assignee's total number of patents
  mutate(ttl_pats = as.numeric(assignee_total_num_patents)) %>%
  # one group per assignee (ttl_pats is constant within an assignee)
  group_by(assignee_organization, ttl_pats) %>%
  # db_pats: number of the assignee's patents in our result set
  summarise(db_pats = n()) %>%
  # share of the assignee's total patents that appear in our result set
  mutate(frac_db_pats = round(db_pats / ttl_pats, 3)) %>%
  ungroup() %>%
  # reorder columns: organization, db_pats, ttl_pats, frac_db_pats
  select(c(1, 3, 2, 4)) %>%
  arrange(desc(db_pats)) %>%
  slice(1:75)
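To see what the summary columns contain, here is the same sequence of steps applied to a few made-up rows. The organisations and numbers in `toy_assignees` are invented for illustration and are not PatentsView data.

```r
library(dplyr)

# Made-up rows in the shape the pipeline expects: one row per
# patent/assignee pair, with the assignee's overall patent total as text
toy_assignees <- data.frame(
  patent_number = c("1", "2", "3", "4", "5"),
  assignee_organization = c("Acme", "Acme", "Acme", "Birdco", NA),
  assignee_total_num_patents = c("10", "10", "10", "4", "1"),
  stringsAsFactors = FALSE
)

toy_top <- toy_assignees %>%
  filter(!is.na(assignee_organization)) %>%            # drops the row without an organization
  mutate(ttl_pats = as.numeric(assignee_total_num_patents)) %>%
  group_by(assignee_organization, ttl_pats) %>%
  summarise(db_pats = n(), .groups = "drop") %>%       # patents per assignee in the toy data
  mutate(frac_db_pats = round(db_pats / ttl_pats, 3)) %>%
  arrange(desc(db_pats))

print(toy_top)
```

Here "Acme" has 3 of its 10 patents in the toy data and "Birdco" 1 of 4, so `frac_db_pats` shows how much of each portfolio the query captures.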
We can now create a data frame with patent counts by application year for each assignee:
data <-
  top_asgns %>%
  select(-contains("pats")) %>%   # keep only the organization name
  slice(1:5) %>%                  # keep the top 5 assignees
  inner_join(dl$assignees) %>%    # back to one row per patent of these assignees
  inner_join(dl$applications) %>% # add the application date of each patent
  # new column holding only the year taken from the application date
  mutate(app_yr = as.numeric(substr(app_date, 1, 4))) %>%
  group_by(assignee_organization, app_yr) %>%
  count()
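The year extraction and counting step can be checked on a few invented rows. `toy_apps` below is made-up data, used only to show the shape of the result.

```r
library(dplyr)

# Invented application dates for two organizations
toy_apps <- data.frame(
  assignee_organization = c("Acme", "Acme", "Acme", "Birdco"),
  app_date = c("2016-03-01", "2016-11-20", "2017-01-05", "2017-06-30"),
  stringsAsFactors = FALSE
)

toy_counts <- toy_apps %>%
  # the first four characters of an ISO date are the year
  mutate(app_yr = as.numeric(substr(app_date, 1, 4))) %>%
  # one row per organization and year, with the number of applications in n
  count(assignee_organization, app_yr)

print(toy_counts)
```

The result has one row per organization and year, which is the long format a grouped line chart needs.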
We are now ready to plot the evolution using highcharter by assigning years to the x axis and the number of patents to the y axis, grouping them by assignee organization:
library(highcharter)
data %>%
  hchart(.,
         type = "line",
         # refer to columns by name; hcaes() looks them up in the data
         hcaes(x = app_yr,
               y = n,
               group = assignee_organization)) %>%
  hc_plotOptions(series = list(marker = list(enabled = FALSE))) %>%
  hc_xAxis(title = list(text = "Application year")) %>%
  hc_yAxis(title = list(text = "Patents on Drones")) %>%
  hc_title(text = "Top 5 assignees patenting on 'Commercial Drones'") %>%
  hc_subtitle(text = "Annual patent applications through time")
To get the top cited assignees, we write a ranking function that will be used to rank patents by their citation counts:
# Percentile rank between 0 and 1; ties share their average rank, NAs stay NA
percent_rank2 <- function(x) {
  (rank(x, ties.method = "average", na.last = "keep") - 1) / (sum(!is.na(x)) - 1)
}
# Create a data frame with normalized citation rates and the assignee stats in top_asgns:
asng_p_dat <-
  dl$patents %>%
  mutate(patent_yr = substr(patent_date, 1, 4)) %>%
  # rank within each year, so older patents are not favoured
  group_by(patent_yr) %>%
  # as.numeric() guards against citation counts being returned as text
  mutate(perc_cite = percent_rank2(as.numeric(patent_num_cited_by_us_patents))) %>%
  inner_join(dl$assignees) %>%
  group_by(assignee_organization) %>%
  summarise(mean_perc = mean(perc_cite)) %>%
  inner_join(top_asgns) %>%
  arrange(desc(ttl_pats)) %>%
  filter(!is.na(assignee_organization)) %>%
  slice(1:20) %>%
  mutate(color = "#18BC9C") %>%
  as.data.frame()
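A quick check on a toy vector shows how the ranking function behaves. The citation counts below are hypothetical; the function is repeated only so that the snippet runs on its own.

```r
# Same definition as in the post, repeated so the example is self-contained
percent_rank2 <- function(x) {
  (rank(x, ties.method = "average", na.last = "keep") - 1) / (sum(!is.na(x)) - 1)
}

# Hypothetical citation counts for five patents from the same year
cites <- c(0, 5, 5, 10, NA)
ranked <- percent_rank2(cites)
print(ranked)
# the two tied patents share a value, the most cited scores 1, NA stays NA

# Edge case: with a single patent the denominator is zero, giving NaN
single <- percent_rank2(7)
print(single)
```

The edge case matters for the pipeline above: a year with only one patent yields NaN, and mean() then returns NaN for that assignee unless na.rm = TRUE is set.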
We can now visualize the result as a bubble chart: the position on the x axis shows the number of patents in our dataset, the position on the y axis shows the mean citation percentile (highly cited organizations sit higher in the chart), and the bubble size is mapped to the fraction of each assignee's patents that fall within our dataset.
# Adapted from http://jkunst.com/highcharter/showcase.html
hchart(asng_p_dat, "scatter",
       hcaes(x = db_pats, y = mean_perc, size = frac_db_pats,
             group = assignee_organization, color = color)) %>%
  hc_xAxis(title = list(text = "Patents on Drones"), type = "logarithmic",
           allowDecimals = FALSE, endOnTick = TRUE) %>%
  hc_yAxis(title = list(text = "Mean percentile of citation")) %>%
  hc_subtitle(text = "Most cited assignees on 'Drones'", align = "center") %>%
  hc_add_theme(hc_theme_538()) %>%
  hc_legend(enabled = FALSE)
Using the mapping library leaflet with CartoDB map tiles, and given that we requested the longitude and latitude fields, we can map assignee organizations around the globe. We make the bubble size relative to the applicant’s number of patents.
library(leaflet)
library(htmltools)
library(dplyr)
library(tidyr)
datad <-
  pv_out$data$patents %>%
  unnest(assignees) %>%
  select(assignee_id, assignee_organization, patent_number,
         assignee_longitude, assignee_latitude) %>%
  # group by every column except patent_number, so n() counts patents per assignee
  group_by_at(vars(-matches("pat"))) %>%
  mutate(num_pats = n()) %>%
  ungroup() %>%
  select(-patent_number) %>%
  distinct() %>%
  # popup text: escaped organization name plus its patent count
  mutate(popup = paste0("<font color='Black'>",
                        htmlEscape(assignee_organization), "<br><br>Patents:",
                        num_pats, "</font>")) %>%
  # convert the longitude and latitude columns to numeric
  mutate_at(vars(matches("_l")), as.numeric) %>%
  filter(!is.na(assignee_id))
pd <- leaflet(datad) %>%
  addProviderTiles(providers$CartoDB.PositronNoLabels) %>%
  addCircleMarkers(lng = ~assignee_longitude, lat = ~assignee_latitude,
                   popup = ~popup, radius = ~sqrt(num_pats), color = "#18BC9C")
pd
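The marker radius uses the square root of the patent count rather than the count itself. A short worked example with hypothetical counts shows the effect: the area of each circle, not its radius, ends up proportional to the number of patents.

```r
# Hypothetical patent counts for three assignees
num_pats <- c(1, 4, 100)

# Radius as computed for the circle markers in the map code
radius <- sqrt(num_pats)

# Circle area grows with the radius squared, so area is proportional to the count
area <- pi * radius^2
relative_area <- area / area[1]
print(relative_area)
```

Without the square root, an assignee with 100 patents would get a circle with 10,000 times the area of a single-patent assignee, which would swamp the map.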
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/wipo-analytics, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Escorsa (2021, April 9). WIPO Patent Analytics: Using the Patentsview API in R. Retrieved from https://wipo-analytics.github.io/posts/2021-04-09-patentsview/
BibTeX citation
@misc{escorsa2021using, author = {Escorsa, Enric}, title = {WIPO Patent Analytics: Using the Patentsview API in R}, url = {https://wipo-analytics.github.io/posts/2021-04-09-patentsview/}, year = {2021} }