Using the Patentsview API in R

Learn how to use the USPTO Patentsview API in R.

Enric Escorsa https://github.com/wipo-analytics
04-09-2021

Introduction

The USPTO has implemented a patent visualization and analysis platform named Patentsview.

knitr::include_graphics("images/patentsview/fig1_front.png")

Patentsview is openly available through the web and enables the discovery and exploitation of US patent data in a visual, convenient way.

It makes it easy to see which technologies are most patented, who the main patent assignees are within each technology area, and who the most prolific inventors are. These relationships, along with the associated locations and data, can be explored and compared interactively.

There is also an associated API for developers to personalise queries and download the corresponding patent data.

Moreover, an R package is available to access Patentsview data from R. Let’s check it out.

Patentsview from R

With the patentsview package we can access, search and analyse USPTO patent data available in Patentsview.

First, we need to load the required packages (installing them beforehand if necessary). Together with patentsview we will also use the highcharter package, which lets us generate interactive visualizations of the data we obtain.

library(patentsview)
library(tidyverse) # for data manipulation
library(highcharter) # wrapper of highchart library to generate interactive visualisations

We can then formulate our patent queries using specified search fields.

We express them with the function with_qfuns(), which makes the query functions available without the qry_funs$ prefix, and combine a search on each data field with the corresponding operator function; e.g. dates with gte(patent_date = "2016-01-01") (where gte stands for greater than or equal to), words in textual fields (e.g. in the abstract) with text_all(patent_abstract = "UAS"), or specific Cooperative Patent Classification (CPC) classes with qry_funs$eq(cpc_subsection_id = "G12").

Here is just a query formulation example considering a series of synonyms and a series of related classes:

library(patentsview)

query <- 
  with_qfuns(
    and(
      gte(patent_date = "2016-01-01"),
      or(
        text_all(patent_abstract = "UAS"),
        text_all(patent_abstract = "drones"),
        text_all(patent_abstract = "UAV"),
        text_all(patent_abstract = "commercial"),
        text_all(patent_abstract = "mobility"),
        text_all(patent_abstract = "traffic"),
        text_all(patent_abstract = "urban"),
        text_all(patent_abstract = "cities"),
        text_all(patent_abstract = "public"),
        text_all(patent_abstract = "security"),
        text_all(patent_abstract = "city"), 
        text_all(patent_abstract = "unmanned"),
        text_all(patent_abstract = "aerial")
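      ),
      # NOTE: "G02B", "G09B" and "G21K" are four-character CPC codes, while
      # cpc_subsection_id values look like "G12" (three characters), so these
      # clauses may need a different CPC field; check the API field list.
      or(
        qry_funs$eq(cpc_subsection_id = "G02B"),
        qry_funs$eq(cpc_subsection_id = "G09B"),
        qry_funs$eq(cpc_subsection_id = "G01"),
        qry_funs$eq(cpc_subsection_id = "G21K"),
        qry_funs$eq(cpc_subsection_id = "B64"),
        qry_funs$eq(cpc_subsection_id = "G08"),
        qry_funs$eq(cpc_subsection_id = "G05")
      )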
    )
  )
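
With a long list of synonyms, the or() clause can also be assembled from a character vector rather than typed out. The following is a minimal sketch, assuming that qry_funs$or() accepts any number of sub-queries as separate arguments (as it does in the query above); the names terms, abstract_clauses and small_query are illustrative only and are not part of the original analysis.

```r
library(patentsview)

# Illustrative subset of the synonyms used above
terms <- c("UAS", "drones", "UAV", "unmanned", "aerial")

# One text_all() clause per term, all searching the abstract
abstract_clauses <- lapply(terms, function(term) {
  qry_funs$text_all(patent_abstract = term)
})

# Combine the clauses with OR, then AND the result with the date filter
small_query <- qry_funs$and(
  qry_funs$gte(patent_date = "2016-01-01"),
  do.call(qry_funs$or, abstract_clauses)
)

# Printing the query object should show the JSON that would be sent to the API
small_query
```

No request is sent at this point; the object only describes the search, so it is cheap to inspect and adjust before calling the API.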

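Some of the operators that can be used are the comparison functions eq, neq, gt, gte, lt and lte, the string functions begins and contains, the full-text functions text_all, text_any and text_phrase, and the logical functions and, or and not.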

Once we are happy with our search, we create a character vector with the fields to be used in our analysis (note that we also request location data, longitude and latitude, so that we can map the results later on):

fields <- c("patent_number", "assignee_organization",
            "patent_num_cited_by_us_patents", "app_date", "patent_date",
            "assignee_total_num_patents", "forprior_country", "assignee_id",
            "assignee_longitude", "assignee_latitude")

We then send our HTTP request to the PatentsView API to get the data:

library(patentsview)
# Request every page of results
# (note: this request was failing at the time of writing)
pv_out <- search_pv(query = query, fields = fields, all_pages = TRUE)

# Cache the raw response on disk, then reload it from the cached copy
save(pv_out, file = "data/pv_out.rda", compress = "xz")
load("data/pv_out.rda")

# Unnest the data frames that are stored in list columns (such as assignees):
dl <- unnest_pv_data(data = pv_out$data, pk = "patent_number")

# Cache the unnested tables as well
save(dl, file = "data/dl.rda", compress = "xz")
load("data/dl.rda")
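
The save() and load() pairs above act as a disk cache, so a slow or failing request does not have to be repeated on every run. One way to make that explicit is sketched below with a hypothetical helper, cached(), which is not part of patentsview; it uses base R's saveRDS() and readRDS(), and a toy function stands in for the API call so the example runs offline.

```r
# Hypothetical helper: evaluate compute() only when no cached copy exists
cached <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))       # reuse the stored result
  }
  result <- compute()           # run the expensive step once
  saveRDS(result, path, compress = "xz")
  result
}

# Toy stand-in for the API call, counting how often it is actually run
n_calls <- 0
fake_search <- function() {
  n_calls <<- n_calls + 1
  data.frame(patent_number = c("1", "2"))
}

cache_file <- tempfile(fileext = ".rds")
first  <- cached(cache_file, fake_search)  # computes and writes the file
second <- cached(cache_file, fake_search)  # reads the file, no second call
```

In the workflow above, the second argument would be a function wrapping the search_pv() call, and the path would point into the data/ folder.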

Identifying top assignees

Now that we have the data, let’s try to identify the top assignees.

library(tidyverse)
# We create a data frame with the top 75 assignees:
top_asgns <-
  dl$assignees %>%
  # keep organizations only (drop patents assigned to an individual inventor)
  filter(!is.na(assignee_organization)) %>%
  # total number of patents held by the assignee, as a number
  mutate(ttl_pats = as.numeric(assignee_total_num_patents)) %>%
  # group by assignee and its total patent count
  group_by(assignee_organization, ttl_pats) %>%
  # number of the assignee's patents in our result set
  summarise(db_pats = n()) %>%
  # share of the assignee's total patents that falls in our result set
  mutate(frac_db_pats = round(db_pats / ttl_pats, 3)) %>%
  ungroup() %>%
  # reorder columns: organization, db_pats, ttl_pats, frac_db_pats
  select(c(1, 3, 2, 4)) %>%
  arrange(desc(db_pats)) %>%
  slice(1:75)
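
To make the two counts concrete, here is a small sketch on made-up data (the organizations and numbers are invented for illustration). It uses dplyr's count() as a shorthand for the group_by() and summarise() steps; db_pats is the number of rows an assignee has in the result set, and frac_db_pats is that number divided by the assignee's total patent count.

```r
library(dplyr)

# Made-up rows: one row per (patent, assignee) pair in the result set
toy_assignees <- tibble(
  patent_number = c("p1", "p2", "p3", "p4"),
  assignee_organization = c("Acme", "Acme", "Acme", "Birdco"),
  assignee_total_num_patents = c("300", "300", "300", "8")
)

toy_top <- toy_assignees %>%
  mutate(ttl_pats = as.numeric(assignee_total_num_patents)) %>%
  count(assignee_organization, ttl_pats, name = "db_pats") %>%
  mutate(frac_db_pats = round(db_pats / ttl_pats, 3)) %>%
  arrange(desc(db_pats))

toy_top
```

Here Acme has more patents in the result set (3 against 1), but they are only 1% of its portfolio, whereas Birdco's single patent is 12.5% of everything it holds.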

Evolution of patent activity

We can now create a data frame with patent counts by application year for each assignee:

data <-
  top_asgns %>%
  select(-contains("pats")) %>%
  slice(1:5) %>% # keep the top 5 assignees
  inner_join(dl$assignees) %>%
  inner_join(dl$applications) %>%
  mutate(app_yr = as.numeric(substr(app_date, 1, 4))) %>% # extract the year from the application date
  group_by(assignee_organization, app_yr) %>%
  count()

We are now ready to plot the evolution using highcharter, assigning years to the x axis and number of patents to the y axis, and grouping them by assignee organization:

library(highcharter)
data %>%
  hchart(.,
         type = "line",
         # hcaes() takes bare column names, evaluated inside the data frame
         hcaes(x = app_yr,
               y = n,
               group = assignee_organization)) %>%
  hc_plotOptions(series = list(marker = list(enabled = FALSE))) %>%
  hc_xAxis(title = list(text = "Application year")) %>%
  hc_yAxis(title = list(text = "Patents on Drones")) %>%
  hc_title(text = "Top 5 assignees patenting on 'Commercial Drones'") %>%
  hc_subtitle(text = "Annual patent applications through time")

Top cited assignees

To get the top cited assignees, we write a ranking function that will be used to rank patents by their citation counts:

percent_rank2 <- function(x)
  (rank(x, ties.method = "average", na.last = "keep") - 1) / (sum(!is.na(x)) - 1)

# Create a data frame with normalized citation rates, joined to the assignee stats in top_asgns:
asng_p_dat <-
  dl$patents %>%
  mutate(patent_yr = substr(patent_date, 1, 4)) %>%
  group_by(patent_yr) %>%
  # as.numeric() guards against citation counts being stored as text
  mutate(perc_cite = percent_rank2(as.numeric(patent_num_cited_by_us_patents))) %>%
  inner_join(dl$assignees) %>%
  group_by(assignee_organization) %>%
  summarise(mean_perc = mean(perc_cite)) %>%
  inner_join(top_asgns) %>%
  arrange(desc(ttl_pats)) %>%
  filter(!is.na(assignee_organization)) %>%
  slice(1:20) %>%
  mutate(color = "#18BC9C") %>%
  as.data.frame()
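
A toy vector helps to see what percent_rank2 returns. The sketch below repeats the function so that it runs on its own; the input values are made up. It also illustrates one assumption worth checking: if citation counts arrive as character strings (the as.numeric() call on assignee_total_num_patents earlier suggests that counts may come back as text), rank() orders them alphabetically rather than numerically.

```r
percent_rank2 <- function(x)
  (rank(x, ties.method = "average", na.last = "keep") - 1) / (sum(!is.na(x)) - 1)

# Ties share the average rank, and NA stays NA
cites <- c(0, 5, 5, 20, NA)
percent_rank2(cites)
# 0.0 0.5 0.5 1.0 NA

# Counts stored as text are ranked alphabetically, so "9" outranks "100"
as_text <- c("9", "10", "100")
percent_rank2(as_text)              # 1.0 0.0 0.5
percent_rank2(as.numeric(as_text))  # 0.0 0.5 1.0
```

The least cited patent in a year gets 0 and the most cited gets 1, which is what makes the scores comparable across grant years with very different citation counts.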

We can now visualize the result as a bubble chart, where the x axis shows the number of patents in our data set, the y axis shows the mean citation percentile (highly cited organizations sit higher in the chart), and the bubble size reflects the fraction of the assignee's total patents that falls in our data set.

# Adapted from http://jkunst.com/highcharter/showcase.html
hchart(asng_p_dat, "scatter", hcaes(x = db_pats, y = mean_perc, size = frac_db_pats,
                                    group = assignee_organization, color = color)) %>%
  hc_xAxis(title = list(text = "Patents on Drones"), type = "logarithmic",
           allowDecimals = FALSE, endOnTick = TRUE) %>%
  hc_yAxis(title = list(text = "Mean percentile of citation")) %>%
  hc_subtitle(text = "Most cited assignees on 'Drones'", align = "center") %>%
  hc_add_theme(hc_theme_538()) %>%
  hc_legend(enabled = FALSE)

Origin of inventions

Using the mapping library leaflet with CartoDB tiles, and given that we retrieved the longitude and latitude fields, we can map assignee organizations around the globe. We scale the bubble radius with the square root of the applicant’s number of patents.

library(leaflet)
library(htmltools)
library(dplyr)
library(tidyr)

datad <-
  pv_out$data$patents %>%
    unnest(assignees) %>%
    select(assignee_id, assignee_organization, patent_number,
           assignee_longitude, assignee_latitude) %>%
    group_by_at(vars(-matches("pat"))) %>%
    mutate(num_pats = n()) %>%
    ungroup() %>%
    select(-patent_number) %>%
    distinct() %>%
    mutate(popup = paste0("<font color='Black'>",
                          htmlEscape(assignee_organization), "<br><br>Patents:",
                          num_pats, "</font>")) %>%
    mutate_at(vars(matches("_l")), as.numeric) %>%
    filter(!is.na(assignee_id))

pd <- leaflet(datad) %>%
  addProviderTiles(providers$CartoDB.PositronNoLabels) %>%
  addCircleMarkers(lng = ~assignee_longitude, lat = ~assignee_latitude,
                   popup = ~popup, radius = ~sqrt(num_pats), color = "#18BC9C")
pd

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/wipo-analytics, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Escorsa (2021, April 9). WIPO Patent Analytics: Using the Patentsview API in R. Retrieved from https://wipo-analytics.github.io/posts/2021-04-09-patentsview/

BibTeX citation

@misc{escorsa2021using,
  author = {Escorsa, Enric},
  title = {WIPO Patent Analytics: Using the Patentsview API in R},
  url = {https://wipo-analytics.github.io/posts/2021-04-09-patentsview/},
  year = {2021}
}