--- title: "1. Getting started with florabr" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{1. Getting started with florabr} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = F, warning = FALSE ) ``` ## Introduction ### What is Flora e Funga do Brasil? Flora e Funga do Brasil is the most comprehensive work to reliably document Brazilian plant, fungi and algae diversity. It involves the work of hundreds of taxonomists, integrating data from plant(including algae) and fungi collected in Brazil during the last two centuries. The database contains detailed and standardized morphological descriptions, illustrations, nomenclatural data, geographic distribution, and keys for the identification of all native and non-native plants and fungi found in Brazil. The florabr package includes a collection of functions designed to retrieve, filter and spatialize data from the [Flora e Funga do Brasil dataset](https://floradobrasil.jbrj.gov.br/reflora/listaBrasil/PrincipalUC/PrincipalUC.do?lingua=en#CondicaoTaxonCP). ## Overview of the functions: + `get_florabr()`: download the latest or an older version of Flora e Funga do Brasil database. + `check_version()`: check if you have the latest version of Flora e Funga do Brasil data available in a directory. + `load_florabr()`: load Flora e Funga do Brasil database. + `solve_discrepancies()`: Resolve discrepancies between species and subspecies/varieties information in the original dataset. + `get_attributes()`: display all the options available to filter species by its characteristics. + `select_species()`: filter species based on its characteristics and distribution. + `get_binomial()`: extract the binomial name (Genus + specific epithet) from a Scientific Name. + `get_synonym()`: Retrieve synonyms for species. + `check_names()`: check if the species names are correct. + `subset_species()`: subset a list of species from Flora e Funga do Brasil. + `get_spat_occ()`: get Spatial polygons (SpatVectors) of species based on its distribution. + `filter_florabr()`: removes or flags records outside of the species' natural ranges. + `get_pam()`: get a presence-absence matrix of species based on its distribution. ## Installation ### Install stable version from CRAN To install the stable version of `florabr` use: ``` r install.packages("florabr") ```
### Install development version from GitHub You can install the released version of *florabr* from [github](https://github.com/wevertonbio/florabr) with: ```{r, results='hide', message=FALSE} if(!require(remotes)){ install.packages("remotes") } if(!require(faunabr)){ remotes::install_github('wevertonbio/faunabr')} library(florabr) ``` ## Get data from Flora e Funga do Brasil All data included in the Flora e Funga do Brasil (i.e., nomenclature, life form, and distribution) are stored in Darwin Core Archive data sets, which is updated weekly. Before downloading the data available in the Flora e Funga do Brasil, we need to create a folder to save the data: ```{r} #Creating a folder in a temporary directory #Replace 'file.path(tempdir(), "florabr")' by a path folder to be create in #your computer my_dir <- file.path(file.path(tempdir(), "florabr")) dir.create(my_dir) ``` You can now utilize the `get_florabr` function to retrieve the most recent version of the data: ```{r, results='hide', message=FALSE, warning=FALSE} get_florabr(output_dir = my_dir, #directory to save the data data_version = "latest", #get the most recent version available overwrite = T) #Overwrite data, if it exists ``` The function will take a few seconds to download the data and a few minutes to merge the datasets into a single data.frame. Upon successful completion, you will find a folder named with the version of Flora e Funga do Brasil. This folder contains the downloaded raw dataset (TXT files in Portuguese) and a file named **CompleteBrazilianFlora.rds**. The latter represents the finalized dataset, merged and translated into English. You also have the option to download an older, specific version of the Flora e Funga do Brasil dataset. To explore the available versions, please refer to [this link](https://ipt.jbrj.gov.br/jbrj/resource?r=lista_especies_flora_brasil). For downloading a particular version, simply replace 'latest' with the desired version number. For example: ```{r, results='hide', message=FALSE, warning=FALSE} get_florabr(output_dir = my_dir, #directory to save the data data_version = "393.385", #Version 393.385, published on 2023-07-21 overwrite = T) #Overwrite data, if it exists ``` As previously mentioned, you will find a folder named '393.385' within the designated directory. To view the available versions in your specified directory, run: ```{r} check_version(data_dir = my_dir) #> You have the following versions of Flora e Funga do Brasil: #> 393.385 #> 393.401 #> It includes the latest version: 393.401 ``` ## Loading the data In order to use the other functions of the package, you need to load the data into your environment. To achieve this, utilize the `load_florabr()` function. By default, the function will automatically search for the latest available version in your directory. However, you have the option to specify a particular version using the *data_version* parameter. Additionally, you can choose between two versions of the data: the 'short' version (containing the 23 columns required for run the other functions of the package) or the 'complete' version (with all original 39 columns). The function imports the 'short' version by default. ```{r} #Short version bf <- load_florabr(data_dir = my_dir, data_version = "Latest_available", type = "short") #short #> Loading version 393.401 colnames(bf) #See variables from short version #> [1] "species" "scientificName" "acceptedName" #> [4] "kingdom" "group" "subgroup" #> [7] "phylum" "class" "order" #> [10] "family" "genus" "lifeForm" #> [13] "habitat" "biome" "states" #> [16] "vegetation" "origin" "endemism" #> [19] "taxonomicStatus" "nomenclaturalStatus" "vernacularName" #> [22] "taxonRank" "id" ``` Note that the complete version has 20 more columns: ```{r} #Complete version bf_complete <- load_florabr(data_dir = my_dir, data_version = "Latest_available", type = "complete") #complete colnames(bf_complete) #See variables from complete version #> [1] "id" "taxonID" #> [3] "acceptedNameUsageID" "parentNameUsageID" #> [5] "originalNameUsageID" "group" #> [7] "subgroup" "species" #> [9] "acceptedName" "scientificName" #> [11] "acceptedNameUsage" "parentNameUsage" #> [13] "namePublishedIn" "namePublishedInYear" #> [15] "higherClassification" "kingdom" #> [17] "phylum" "class" #> [19] "order" "family" #> [21] "genus" "specificEpithet" #> [23] "infraspecificEpithet" "taxonRank" #> [25] "scientificNameAuthorship" "taxonomicStatus" #> [27] "nomenclaturalStatus" "vernacularName" #> [29] "lifeForm" "habitat" #> [31] "vegetation" "origin" #> [33] "endemism" "biome" #> [35] "states" "countryCode" #> [37] "modified" "bibliographicCitation" #> [39] "references" ``` If you want to save the datasets to open with an external editor (e.g. Excel), you can save the data.frame as a CSV file: ```{r, eval=FALSE, warning=FALSE, message=FALSE, } write.csv(x = bf, file = file.path(my_dir, "BrazilianFlora.csv"), row.names = F) ``` ## Solve discrepancies Discrepancies may arise in the original dataset, particularly between species and subspecies/varieties information. One common scenario is when a species is reported in one biome (e.g., Amazon), while a subspecies or variety of the same species is found in another biome (e.g., Cerrado). The `solve_discrepancies()` function rectifies such discrepancies by considering distribution (states, biomes, and vegetation), life form, and habitat. For example, if a subspecies is documented in a specific biome, it suggests that the species also exists in that biome You can resolve these discrepancies by setting *solve_discrepancy = FALSE* in `get_florabr()`: ```{r eval = FALSE} get_florabr(output_dir, data_version = "latest", solve_discrepancy = TRUE, #Default is false overwrite = TRUE, verbose = TRUE) ``` By resolving discrepancies internally in get_florabr(), the dataset loaded with load_florabr() will already have these issues addressed. You can verify if the dataset has resolved discrepancies by examining its attributes: ```{r} attr(bf, "solve_discrepancies") #> FALSE ``` Since we downloaded the data with the default option (solve_discrepancy = FALSE), the loaded dataset may contain discrepancies. For instance, let's explore the biomes with confirmed occurrences of species and subspecies of *Acianthera ochreata*: ```{r} acianthera <- subset_species(data = bf, species = "Acianthera ochreata", include_subspecies = TRUE, include_variety = TRUE) #Check biomes with confirmed occurrence acianthera[,c("scientificName", "taxonRank", "biome")] #> scientificName taxonRank biome #> 8134 Acianthera ochreata (Lindl.) Pridgeon & M.W.Chase Species Caatinga;Cerrado #> 24923 Acianthera ochreata subsp. cylindrifolia (Borba & Semir) Borba Subspecies Cerrado #> 29550 Acianthera ochreata (Lindl.) Pridgeon & M.W.Chase subsp. ochreata Subspecies Atlantic_Forest;Caatinga;Cerrado ``` We observe that the species *Acianthera ochreata* has confirmed occurrences in two biomes: Caatinga and Cerrado. However, there are two subspecies of *A. ochreata*: *A. ochreata subsp. cylindrifolia*, confirmed in the Cerrado, and *Acianthera ochreata subsp. ochreata*, confirmed in the Caatinga, Cerrado, and Atlantic Forest. Through the solve_discrepancies() function, we can retrieve information from subspecies and varieties with accepted names to update the information of the related species ```{r} bf_solved <- solve_discrepancies(bf) #See attribute attr(bf_solved, "solve_discrepancies") #> TRUE ``` In this new dataset with discrepancies solved, let's examine the biomes where *Acianthera ochreata* has confirmed occurrences: ```{r} acianthera_solved <- subset_species(data = bf_solved, species = "Acianthera ochreata", include_subspecies = TRUE, include_variety = TRUE) #Check biomes with confirmed occurrence acianthera_solved[,c("scientificName", "taxonRank", "biome")] #> scientificName taxonRank biome #> 16108 Acianthera ochreata (Lindl.) Pridgeon & M.W.Chase Species Atlantic_Forest;Caatinga;Cerrado #> 24923 Acianthera ochreata subsp. cylindrifolia (Borba & Semir) Borba Subspecies Cerrado #> 29550 Acianthera ochreata (Lindl.) Pridgeon & M.W.Chase subsp. ochreata Subspecies Atlantic_Forest;Caatinga;Cerrado ``` Now, we can see that the biomes with confirmed occurrences of the species *A. ochreata* are updated to Caatinga, Cerrado, and Atlantic Forest (the latter retrieved from the subspecies).