Using PostgreSQL with RStudio and DBI

Using DBI to connect to and query PostgreSQL databases

Jasmine Kindness (One World Analytics)oneworldanalytics.com
2022-03-16

Connecting to Database from DBI

To connect to a PostgreSQL database from inside RStudio, you will need a database interface (DBI) and a DBIdriver.

We will be using the DBI package for our database interface and RPostgreSQL as our DBI driver.

The DBI package separates the connection to a database management system (DBMS) into a front end and a back end with a connection in the middle. In this case PostgreSQL is our DBMS.

DBI functions produce DBI objects which can be used for any DBMS, and RPostgreSQL communicates with the specific DBMS that we will be using.

Below we will cover how to establish a connection to a PostgreSQL database with DBI and RPostgreSQL, before diving into some basic DBI queries we can use to query PostgreSQL databases.

This post assumes that you already have your own PostgreSQL database. If you’re not sure how to set one up, you can follow our tutorial on getting started with PostgreSQL here££££

Setting up the connection

First load your packages:

Next, set up the DBI PostgreSQL driver:

drv <- DBI::dbDriver("PostgreSQL")

Finally, set up the connection to your PostgreSQL database.

pw <- "password"
con <- dbConnect(drv, dbname = "dbname",
                  host = "localhost", port = 5432,
                  user = "username", password = pw)

# You can also disconnect from a database using:
dbDisconnect(con)

Your connection is set up, you can check whether it works by calling the following to list all tables in the database:

r dbListTables(con)

It works! Now we’re ready to query our database in RStudio.

Using DBI to query our database

Below we cover some basic DBI queries which will come in handy when querying your database:

dbListTables(con)

# List remote tables

# To list fields in a specific table in the database
# List field names of a remote table

dbListFields(con, "table") 

#To read a table from postgres into R (the R object will have the same name
# in your environment as it does in postgres):
dbReadTable(con, "table")

dbCreateTable()

# Create a table in the database

dbAppendTable()

# Insert rows into a table

dbRemoveTable()

# Remove a table from the database

One excellent feature of using DBI with PostgreSQL in RStudio is that it is far easier to write tables to PostgreSQL than it is in postgres itself, say by writing from a csv file saved on disk.

A few things to note when writing tables to postgres:

  1. Remember to use data.frame() on your tables before attempting to write them to Postgres. It will not recognise tibbles etc.
  2. Always check the data types of each column of your tables. For the most part they should be integer or character to ensure postgres will be able to interpret them.
  3. Always check the columns and contents of columns after you have written a table to postgres. You should check the contents of a field after writing to the database to ensure they are what you expect. For example, PostgreSQL does not accept list columns, but writing them will not throw an error. Instead you’ll find that while the field is there, it is empty.
#To write a table from R to postgres:
dbWriteTable(con, "table", tablenameinpostgres) 
# (Use overwrite = TRUE) to overwrite existing tables. 
# Use row.names = FALSE if it is defaulting to TRUE

SQL Queries in RStudio

You can also use standard PostgreSQL syntax to build queries from RStudio. A full breakdown of SQL queries is beyond the scope of this post, but you can find more information on writing SELECT statements (here)[https://www.postgresql.org/docs/8.4/tutorial-select.html]. You can also find more on joins between tables (here)[https://www.postgresql.org/docs/8.4/tutorial-join.html].

res <- dbGetQuery(con, "SELECT name FROM laureates WHERE gender = 'female'")

res <- dbGetQuery(con, "SELECT * FROM table WHERE column != 4")
# Executes a query and retrieves the data

Using functions to write queries

You can also write functions to generate SQL queries for you. This is extremely useful when you want to be able to generate queries without writing them manually, such as when you wish to pass parameters to a query in a parameterized report.

This function constructs our query as a character string using the paste0 function. We use dQuote() for double quotation marks and shQuote() for single quotation marks which must be contained inside the character string.

# initialise the function
queryfunc <- function(parameter, table, column1, column2 = "") {
select <- "select "
comma <- ", "
parameter <- shQuote(parameter))
from <- " from "
where <- " where "
like <- " like "
colon <- ";"
qry <- paste0(select, dQuote(column1, FALSE),
              from, table, where, column2, 
              like, 
              parameter, colon)
got <- RPostgreSQL::dbGetQuery(conn = con, statement = qry)
}

# This would be called like so:

queryfunc(parameter, table, column1, column2)

This will produce the same result as:

dbGetQuery(con, "SELECT column1 FROM table WHERE column2 LIKE parameter")

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/wipo-analytics, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Kindness (2022, March 16). WIPO Patent Analytics: Using PostgreSQL with RStudio and DBI. Retrieved from https://wipo-analytics.github.io/posts/postgresdbi/

BibTeX citation

@misc{kindness2022using,
  author = {Kindness, Jasmine},
  title = {WIPO Patent Analytics: Using PostgreSQL with RStudio and DBI},
  url = {https://wipo-analytics.github.io/posts/postgresdbi/},
  year = {2022}
}