:fire: A fast, easy-to-use database library for R
Designed for both research and production environments
Supports Postgres, MySQL, MariaDB, SQLite, SQL Server, and more
Install dbx
install.packages("dbx")
And follow the instructions for your database
To install with Jetpack, use:
::add("dbx") jetpack
For Postgres, install the R package
install.packages("RPostgres")
And use:
library(dbx)
<- dbxConnect(adapter="postgres", dbname="mydb") db
You can also pass user, password, host, port, and url.
Works with RPostgreSQL as well
For MySQL, install the R package
install.packages("RMySQL")
And use:
library(dbx)
<- dbxConnect(adapter="mysql", dbname="mydb") db
You can also pass user, password, host, port, and url.
Works with RMariaDB as well
For SQLite, install the R package
install.packages("RSQLite")
And use:
library(dbx)
<- dbxConnect(adapter="sqlite", dbname=":memory:") db
For SQL Server, install the R package
install.packages("odbc")
And use:
library(dbx)
db <- dbxConnect(adapter=odbc::odbc(), database="mydb")
You can also pass uid, pwd, server, and port.
For Redshift, follow the Postgres instructions.
For DuckDB, install the R package
install.packages("duckdb")
And use:
library(dbx)
db <- dbxConnect(adapter=duckdb::duckdb(), dbdir=":memory:")
For other databases, install the appropriate R package and use:
db <- dbxConnect(adapter=odbc::odbc(), database="mydb")
Create a data frame of records from a SQL query
<- dbxSelect(db, "SELECT * FROM forecasts") records
Pass parameters
dbxSelect(db, "SELECT * FROM forecasts WHERE period = ? AND temperature > ?", params=list("hour", 27))
Parameters can also be vectors
dbxSelect(db, "SELECT * FROM forecasts WHERE id IN (?)", params=list(1:3))
Insert records
<- "forecasts"
table <- data.frame(temperature=c(32, 25))
records dbxInsert(db, table, records)
If you use auto-incrementing ids, you can get the ids of newly inserted rows by passing the column name:
dbxInsert(db, table, records, returning=c("id"))
returning is not available for MySQL or Redshift
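One way to use the result, as a sketch (this assumes the call returns a data frame containing the requested columns, reusing the table and records from above):
# capture the generated ids of the rows that were just inserted
inserted <- dbxInsert(db, table, records, returning=c("id"))
inserted$id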
Update records
records <- data.frame(id=c(1, 2), temperature=c(16, 13))
dbxUpdate(db, table, records, where_cols=c("id"))
Use where_cols to specify the columns used for lookup. Other columns are written to the table.
Updates are batched when possible, but often need to be run as multiple queries. We recommend upsert when possible for better performance, as it can always be run as a single query. Turn on logging to see the difference.
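A quick sketch of how to compare the two: enable logging and look at the SQL each call generates (dbxUpsert is described just below).
# with logging on, the generated statements are printed as they run
options(dbx_logging=TRUE)
dbxUpdate(db, table, records, where_cols=c("id"))
dbxUpsert(db, table, records, where_cols=c("id"))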
Atomically insert if they don’t exist, otherwise update them
records <- data.frame(id=c(2, 3), temperature=c(20, 25))
dbxUpsert(db, table, records, where_cols=c("id"))
Use where_cols to specify the columns used for lookup.
There must be a unique index on them, or an error will be thrown.
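If the index is missing, one way to add it is with a plain SQL statement. This is only a sketch using Postgres syntax with an illustrative index name; dbxExecute is covered further below.
# the upsert lookup columns need a unique index (or primary key) backing them
dbxExecute(db, "CREATE UNIQUE INDEX IF NOT EXISTS index_forecasts_on_id ON forecasts (id)")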
To skip existing rows instead of updating them, use:
dbxUpsert(db, table, records, where_cols=c("id"), skip_existing=TRUE)
If you use auto-incrementing ids, you can get the ids of newly upserted rows by passing the column name:
dbxUpsert(db, table, records, where_cols=c("id"), returning=c("id"))
returning is not available for MySQL or Redshift
Delete specific records
bad_records <- data.frame(id=c(1, 2))
dbxDelete(db, table, where=bad_records)
Delete all records (uses TRUNCATE when possible for performance)
dbxDelete(db, table)
Execute a statement
dbxExecute(db, "UPDATE forecasts SET temperature = temperature + 1")
Pass parameters
dbxExecute(db, "UPDATE forecasts SET temperature = ? WHERE id IN (?)", params=list(27, 1:3))
Log all SQL queries with:
options(dbx_logging=TRUE)
Customize logging by passing a function
logQuery <- function(sql) {
  # your logging code
}
options(dbx_logging=logQuery)
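For example, a minimal logger that timestamps each statement (just a sketch; adapt it to your own logging setup):
# print each SQL statement along with the time it was issued
logQuery <- function(sql) {
  message(format(Sys.time(), "%Y-%m-%d %H:%M:%S"), " ", sql)
}
options(dbx_logging=logQuery)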
Environment variables are a convenient way to store database credentials. This keeps them outside your source control. It’s also how platforms like Heroku store them.
Create an .Renviron file in your home directory with:
DATABASE_URL=postgres://user:pass@host/dbname
Install urltools:
install.packages("urltools")
And use:
db <- dbxConnect()
If you have multiple databases, use a different variable name, and:
db <- dbxConnect(url=Sys.getenv("OTHER_DATABASE_URL"))
You can also use a package like keyring.
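For example, a sketch using keyring; it assumes a secret was stored beforehand with keyring::key_set("mydb", username="user"):
# read the password from the system keyring instead of an environment variable
db <- dbxConnect(
  adapter="postgres",
  dbname="mydb",
  user="user",
  password=keyring::key_get("mydb", username="user")
)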
By default, operations are performed in a single statement or transaction. This is better for performance and prevents partial writes on failures. However, when working with large data frames on production systems, it can be better to break writes into batches. Use the batch_size option to do this.
dbxInsert(db, table, records, batch_size=1000)
dbxUpdate(db, table, records, where_cols, batch_size=1000)
dbxUpsert(db, table, records, where_cols, batch_size=1000)
dbxDelete(db, table, where, batch_size=1000)
Add comments to queries to make it easier to see where time-consuming queries are coming from.
options(dbx_comment=TRUE)
The comment will be appended to queries, like:
SELECT * FROM users /*script:forecast.R*/
Set a custom comment with:
options(dbx_comment="hi")
To perform multiple operations in a single transaction, use:
DBI::dbWithTransaction(db, {
  dbxInsert(db, ...)
  dbxDelete(db, ...)
})
For updates inside a transaction, use:
dbxUpdate(db, transaction=FALSE)
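For instance, a sketch combining the two, reusing the table, records, and bad_records objects from the earlier examples:
# run an update and a delete atomically; per the note above,
# pass transaction=FALSE to dbxUpdate inside the transaction
DBI::dbWithTransaction(db, {
  dbxUpdate(db, table, records, where_cols=c("id"), transaction=FALSE)
  dbxDelete(db, table, where=bad_records)
})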
To specify a schema, use:
table <- DBI::Id(schema="schema", table="table")
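The identifier can then be passed wherever dbx expects a table name, for example (a sketch reusing records from the insert example):
# the qualified identifier is used in place of a plain table name
dbxInsert(db, table, records)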
Dates are returned as Date objects and times as POSIXct objects. Times are stored in the database in UTC and converted to your local time zone when retrieved.
Times without dates are returned as character vectors since R has no built-in support for this type. If you use hms, you can convert columns with:
records$column <- hms::as_hms(records$column)
SQLite does not have support for TIME columns, so we recommend storing as VARCHAR.
JSON and JSONB columns are returned as character vectors. You can use jsonlite to parse them with:
records$column <- lapply(records$column, jsonlite::fromJSON)
SQLite does not have support for JSON columns, so we recommend storing as TEXT.
BLOB and BYTEA columns are returned as raw vectors.
RSQLite does not currently provide enough info to automatically typecast dates and times. You can manually typecast date columns with:
records$column <- as.Date(records$column)
And time columns with:
records$column <- as.POSIXct(records$column, tz="Etc/UTC")
attr(records$column, "tzone") <- Sys.timezone()
RMariaDB and RSQLite do not currently provide enough info to automatically typecast booleans. You can manually typecast with:
records$column <- records$column != 0
RMariaDB does not currently support JSON.
RMySQL can write BLOB columns, but can’t retrieve them directly. To work around this, use:
<- dbxSelect(db, "SELECT HEX(column) AS column FROM table")
records
<- function(x) {
hexToRaw <- strsplit(x, "")[[1]]
y <- paste0(y[c(TRUE, FALSE)], y[c(FALSE, TRUE)])
z as.raw(as.hexmode(z))
}
$column <- lapply(records$column, hexToRaw) records
BIGINT columns are returned as numeric vectors. The numeric type in R loses precision above 2^53. Some libraries (RPostgres, RMariaDB, RSQLite, ODBC) support returning bit64::integer64 vectors instead.
dbxConnect(bigint="integer64")
Install the pool package
install.packages("pool")
Create a pool
library(pool)
factory <- function() {
  dbxConnect(adapter="postgres", ...)
}

pool <- poolCreate(factory, maxSize=5)
Run queries
conn <- poolCheckout(pool)

tryCatch({
  dbxSelect(conn, "SELECT * FROM forecasts")
}, finally={
  poolReturn(conn)
})
In the future, dbx commands may work directly with pools.
When connecting to a database over a network you don’t fully trust, make sure your connection is secure.
With Postgres, use:
<- dbxConnect(adapter="postgres", sslmode="verify-full", sslrootcert="ca.pem") db
With RMariaDB, use:
<- dbxConnect(adapter="mysql", ssl.ca="ca.pem") db
Please let us know if you have a way that works with RMySQL.
Set session variables with:
db <- dbxConnect(variables=list(search_path="archive"))
Set a statement timeout with:
# Postgres
db <- dbxConnect(variables=list(statement_timeout=1000)) # ms

# MySQL 5.7.8+
db <- dbxConnect(variables=list(max_execution_time=1000)) # ms

# MariaDB 10.1.1+
db <- dbxConnect(variables=list(max_statement_time=1)) # sec
With Postgres, set a connect timeout with:
db <- dbxConnect(connect_timeout=3) # sec
With SQL Server, set a connect timeout with:
db <- dbxConnect(timeout=3) # sec
All connections are simply DBI connections, so you can use them anywhere you use DBI.
dbCreateTable(db, ...)
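For example, a sketch of mixing plain DBI calls with dbx (the table and column names here are illustrative):
# create a table through DBI, then write to it with dbx
DBI::dbCreateTable(db, "cities", data.frame(name=character(), population=numeric()))
dbxInsert(db, "cities", data.frame(name="Tokyo", population=14000000))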
Install dbplyr to use data with dplyr.
<- tbl(db, "forecasts") forecasts
To close a connection, use:
dbxDisconnect(db)
View the changelog
Everyone is encouraged to help improve this project, whether by reporting bugs, improving the documentation, or submitting pull requests with fixes and new features.
To get started with development:
git clone https://github.com/ankane/dbx.git
cd dbx
# create Postgres database
createdb dbx_test
# create MySQL database
mysqladmin create dbx_test
In R, do:
install.packages("devtools")
devtools::install_deps(dependencies=TRUE)
devtools::test()
To test a single file, use:
devtools::install() # to use latest updates
devtools::test_active_file("tests/testthat/test-postgres.R")
To test the ODBC adapter, use:
brew install mariadb-connector-odbc psqlodbc
# or
sudo apt-get install odbc-mariadb odbc-postgresql
To test SQL Server, use:
brew install freetds
# or
sudo apt-get install tdsodbc
docker run -e 'ACCEPT_EULA=Y' -e 'SA_PASSWORD=YourStrong!Passw0rd' -p 1433:1433 -d mcr.microsoft.com/mssql/server:2022-latest
docker exec -it <container-id> /opt/mssql-tools18/bin/sqlcmd -S localhost -U SA -P YourStrong\!Passw0rd -C -Q "CREATE DATABASE dbx_test"