Random Forest, using Ranger

Function	Works
`tidypredict_fit()`, `tidypredict_sql()`, `parse_model()`	✔
`tidypredict_to_column()`	✗
`tidypredict_test()`	✔
`tidypredict_interval()`, `tidypredict_sql_interval()`	✗
`parsnip`	✔

Under the hood

The parser is based on the output of the ranger::treeInfo() function. For each tree, it returns one decision path per row with a non-NA value in the prediction field, that is, one path per terminal node.

treeInfo(model) %>%
  head()
#>   nodeID leftChild rightChild splitvarID splitvarName splitval terminal
#> 1      0         1          2          8         gear     3.50    FALSE
#> 2      1         3          4          2           hp   192.50    FALSE
#> 3      2         5          6          4           wt     2.26    FALSE
#> 4      3        NA         NA         NA         <NA>       NA     TRUE
#> 5      4        NA         NA         NA         <NA>       NA     TRUE
#> 6      5        NA         NA         NA         <NA>       NA     TRUE
#>   prediction
#> 1         NA
#> 2         NA
#> 3         NA
#> 4   16.02000
#> 5   12.18333
#> 6   29.98750
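
To make the link between this table and the decision paths concrete, here is a minimal base R sketch of how such paths could be collected by walking the tree from the root. It is not tidypredict's actual parser. The helper collect_paths() is hypothetical, the row for node 6 is not shown by head() and is inferred from the .default value of the first case_when() in the tidypredict_fit() output, and the < / >= operators simply mirror that output.

```r
# Tree table copied from the treeInfo() output above.
# Assumption: node 6 (not shown by head()) is a terminal node whose
# prediction matches the .default of the first case_when() in the
# tidypredict_fit() output.
tree <- data.frame(
  nodeID       = 0:6,
  leftChild    = c(1, 3, 5, NA, NA, NA, NA),
  rightChild   = c(2, 4, 6, NA, NA, NA, NA),
  splitvarName = c("gear", "hp", "wt", NA, NA, NA, NA),
  splitval     = c(3.5, 192.5, 2.26, NA, NA, NA, NA),
  terminal     = c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE),
  prediction   = c(NA, NA, NA, 16.02, 12.18333, 29.9875, 20.00769)
)

# Hypothetical helper: recursively walk from `node` down to every terminal
# node, accumulating the split conditions met along the way.
collect_paths <- function(tree, node = 0, conds = character()) {
  row <- tree[tree$nodeID == node, ]
  if (row$terminal) {
    # A terminal node closes one decision path.
    return(list(list(conditions = conds, prediction = row$prediction)))
  }
  left  <- paste(row$splitvarName, "<", row$splitval)
  right <- paste(row$splitvarName, ">=", row$splitval)
  # Newest condition goes first, matching the order in the printed formula.
  c(
    collect_paths(tree, row$leftChild, c(left, conds)),
    collect_paths(tree, row$rightChild, c(right, conds))
  )
}

paths <- collect_paths(tree)

# Print each path in a case_when()-like form.
for (p in paths) {
  cat(paste(p$conditions, collapse = " & "), "~", p$prediction, "\n")
}
```

The table has four terminal nodes, so the sketch yields four paths, one per non-NA prediction.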

The output from parse_model() is transformed into a dplyr (Tidy Eval) formula. Each decision tree becomes one dplyr::case_when() statement, and the statements are then combined into a single expression.

tidypredict_fit(model)
#> case_when(hp < 192.5 & gear < 3.5 ~ 16.02, hp >= 192.5 & gear < 
#>     3.5 ~ 12.1833333333333, wt < 2.26 & gear >= 3.5 ~ 29.9875, 
#>     .default = 20.0076923076923) + case_when(vs < 0.5 & wt < 
#>     3.295 ~ 21.1833333333333, vs >= 0.5 & wt < 3.295 ~ 25.8714285714286, 
#>     qsec < 18.15 & wt >= 3.295 ~ 14.1588235294118, .default = 18.5) + 
#>     case_when(hp < 79.5 & disp < 163.8 ~ 28.125, hp >= 79.5 & 
#>         disp < 163.8 ~ 21.225, wt < 4.5475 & disp >= 163.8 ~ 
#>         17.15, .default = 10.4) + case_when(disp < 101.55 & cyl < 
#>     5 ~ 31.65, disp >= 101.55 & cyl < 5 ~ 23.3, cyl < 7 & cyl >= 
#>     5 ~ 20.2666666666667, .default = 15.3538461538462) + case_when(wt < 
#>     1.885 & cyl < 5 ~ 31.5666666666667, wt >= 1.885 & cyl < 5 ~ 
#>     23.9714285714286, cyl < 7 & cyl >= 5 ~ 19.8, .default = 14.91875)
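
As a small illustration of how one of these case_when() statements maps a row to a terminal node, the sketch below evaluates the first tree from the output above on four made-up rows. The data frame toy_cars and its values are hypothetical and chosen so that each row lands in a different branch. The .default argument assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical rows, one per branch of the first tree.
toy_cars <- data.frame(
  hp   = c(110, 245, 66, 110),
  gear = c(3, 3, 4, 4),
  wt   = c(3.2, 3.5, 2.2, 2.62)
)

# First case_when() copied from the tidypredict_fit() output above.
scored_toy <- toy_cars %>%
  mutate(
    tree_1 = case_when(
      hp < 192.5 & gear < 3.5 ~ 16.02,
      hp >= 192.5 & gear < 3.5 ~ 12.1833333333333,
      wt < 2.26 & gear >= 3.5 ~ 29.9875,
      .default = 20.0076923076923
    )
  )

scored_toy
```

The last row matches none of the three explicit conditions (gear >= 3.5 and wt >= 2.26), so it receives the .default value.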

From there, the Tidy Eval formula can be used anywhere it can be evaluated. tidypredict provides three paths:

Use directly inside dplyr: mutate(mtcars, !!tidypredict_fit(model))
Use tidypredict_to_column(model) in a piped command set (note that the table above marks this function as unsupported for ranger models)
Use tidypredict_sql(model) to retrieve the SQL statement
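
The paths listed above can be sketched as follows for the dplyr and SQL cases. This is a hedged example. The document does not show how model was fitted, so the ranger() call (five trees, depth two, mtcars) is an assumption based on the printed output, and the simulated connection from dbplyr::simulate_dbi() stands in for a real database connection. tidypredict_to_column() is left out because the table at the top marks it as unsupported for ranger.

```r
library(dplyr)
library(ranger)
library(tidypredict)

# Assumed model fit; the exact call is not given in the text, and the
# resulting trees will differ from the printed ones.
set.seed(100)
model <- ranger(mpg ~ ., data = mtcars, num.trees = 5, max.depth = 2)

# Path 1: splice the Tidy Eval formula directly into a dplyr verb.
scored <- mtcars %>%
  mutate(fit = !!tidypredict_fit(model))

head(scored[, c("mpg", "fit")])

# Path 3: translate the same formula to SQL against a simulated connection.
sql_text <- tidypredict_sql(model, dbplyr::simulate_dbi())
sql_text
```

With a real database, the simulated connection would be replaced by the DBI connection the query is meant to run on, since the generated SQL depends on the connection's dialect.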

parsnip

tidypredict also supports ranger model objects fitted via the parsnip package.

library(parsnip)

parsnip_model <- rand_forest(mode = "regression", trees = 5) %>%
  set_engine("ranger", max.depth = 2) %>%
  fit(mpg ~ ., data = mtcars)

tidypredict_fit(parsnip_model)
#> case_when(disp < 197.95 & gear < 3.5 ~ 21.5, disp >= 197.95 & 
#>     gear < 3.5 ~ 15.42, drat < 4 & gear >= 3.5 ~ 23.4444444444444, 
#>     .default = 27.9833333333333) + case_when(wt < 2.2775 & hp < 
#>     131.5 ~ 30.5, wt >= 2.2775 & hp < 131.5 ~ 21.1125, drat < 
#>     3.035 & hp >= 131.5 ~ 10.4, .default = 16.8833333333333) + 
#>     case_when(hp < 78.5 & cyl < 5 ~ 31.15, hp >= 78.5 & cyl < 
#>         5 ~ 26.1285714285714, disp < 266.9 & cyl >= 5 ~ 20.2, 
#>         .default = 15.2583333333333) + case_when(drat < 4.325 & 
#>     vs < 0.5 ~ 16.7, drat >= 4.325 & vs < 0.5 ~ 26, wt < 2.26 & 
#>     vs >= 0.5 ~ 32.2333333333333, .default = 20.6375) + case_when(vs < 
#>     0.5 & disp < 120.65 ~ 26, vs >= 0.5 & disp < 120.65 ~ 31.2777777777778, 
#>     wt < 3.3125 & disp >= 120.65 ~ 21.4555555555556, .default = 16.4307692307692)
