---
title: "`data.table::merge()` wrapper"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{`data.table::merge()` wrapper}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: console
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(joyn)
library(data.table)

x1 = data.table(id = c(1L, 1L, 2L, 3L, NA_integer_),
                t  = c(1L, 2L, 1L, 2L, NA_integer_),
                x  = 11:15)

y1 = data.table(id = c(1,2, 4),
                y  = c(11L, 15L, 16))

x2 = data.table(id1 = c(1, 1, 2, 3, 3),
                id2 = c(1, 1, 2, 3, 4),
                t   = c(1L, 2L, 1L, 2L, NA_integer_),
                x   = c(16, 12, NA, NA, 15))

y2 = data.table(id  = c(1, 2, 5, 6, 3),
                id2 = c(1, 1, 2, 3, 4),
                y   = c(11L, 15L, 20L, 13L, 10L),
                x   = c(16:20))
```

This vignette describes the use of the `joyn` `merge()` function.

🔀 `joyn::merge` works much like `base::merge` and `data.table::merge`, while also incorporating the additional features that characterize `joyn`. In fact, `joyn::merge` masks the other two.

### Examples

#### Simple merge

Suppose you want to merge `x1` and `y1`. First notice that while `base::merge` is principally for data frames, `joyn::merge` coerces `x` and `y` to data tables if they are not already. By default, `merge` will join by the shared column name(s) in `x` and `y`.

```{r ex1}
# Example not specifying the key
merge(x = x1,
      y = y1)

# Example specifying the key
merge(x = x1,
      y = y1,
      by = "id")
```

As usual, if the columns you want to join by don’t have the same name, you need to tell `merge` which columns to use: `by.x` for the column name in the x data frame, and `by.y` for the one in y.
For example,

```{r ex2}
df1 <- data.frame(id = c(1L, 1L, 2L, 3L, NA_integer_, NA_integer_),
                  t  = c(1L, 2L, 1L, 2L, NA_integer_, 4L),
                  x  = 11:16)

df2 <- data.frame(id = c(1,2, 4, NA_integer_, 8),
                  y  = c(11L, 15L, 16, 17L, 18L),
                  t  = c(13:17))

merge(x = df1,
      y = df2,
      by.x = "x",
      by.y = "y")
```

By default, `sort` is `TRUE`, so the merged table is sorted by the `by.x` column. Notice that the output table distinguishes the non-by column *t* coming from `x` from the one coming from `y` by appending the *.x* and *.y* suffixes, which are the default value of the `suffixes` argument. In addition, `no.dups` is `TRUE` by default, so suffixes are also used to avoid name clashes between by and non-by columns.

#### Going further

Like the main `joyn()` function, `merge()` offers a number of arguments to verify and control the merge[^1].

[^1]: See the "Advanced functionalities" article for more details.

For example, `joyn::joyn` lets you perform one-to-one, one-to-many, many-to-one and many-to-many joins. Similarly, `merge` accepts the `match_type` argument:

```{r ex3}
# Example with many-to-many merge
joyn::merge(x = x2,
            y = y2,
            by.x = "id1",
            by.y = "id2",
            match_type = "m:m")

# Example with many-to-one merge
joyn::merge(x = x1,
            y = y1,
            by = "id",
            match_type = "m:1")
```

In the same way, you can use all the other options available in `joyn()`, e.g., for keeping common variables, updating NAs and values, or displaying messages, which you can explore in the "Advanced functionalities" article.
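
When choosing a `match_type`, it can help to first check whether the key identifies rows uniquely on each side. The chunk below is a minimal sketch of that check using only `data.table`; the helper `key_side()` is a hypothetical name introduced for illustration and is not part of `joyn`. It rebuilds `x1` and `y1` from the setup chunk so that it can run on its own.

```{r ex4}
library(data.table)

# Same tables as in the setup chunk, repeated so the chunk is self-contained
x1 <- data.table(id = c(1L, 1L, 2L, 3L, NA_integer_),
                 t  = c(1L, 2L, 1L, 2L, NA_integer_),
                 x  = 11:15)
y1 <- data.table(id = c(1, 2, 4),
                 y  = c(11, 15, 16))

# Hypothetical helper (not a joyn function): returns "m" if the key columns
# contain duplicated combinations, and "1" if every combination is unique
key_side <- function(dt, by) {
  if (anyDuplicated(dt, by = by) > 0L) "m" else "1"
}

# Combine both sides into a string in the style expected by `match_type`
mt <- paste(key_side(x1, "id"), key_side(y1, "id"), sep = ":")
mt
```

Here `id` is repeated in `x1` but unique in `y1`, so the check suggests `"m:1"`, which is the value passed in the second call of the `match_type` example above.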