Overview of Datasets

The {avocado} package consists of three different datasets that summarize the weekly sales (units) of Hass Avocados at different regional levels.

Units

Throughout the datasets, you’ll see the term units. Think of a unit as 1 avocado. The Hass Avocado Board (HAB) provides insights on the unit sales of avocados. This includes bags. In terms of bags, 1 unit still refers to 1 avocado. A bag (of any size) may consist of multiple units.

Rounding

The raw dataset that is provided by HAB typically includes fractional units. This does not imply that fractions of avocados were sold. Rather, the underlying data comes from external sources. These sources can provide fractional units depending on how they count units. For example, partial sales could result in fractional units. For the datasets in this package, all values have been rounded up to the nearest whole number.
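
As a minimal sketch of why rounded values may not add up exactly, the example below uses hypothetical fractional counts (not HAB data) and base R's `ceiling()`. It assumes that "rounded up" means rounding each value to the next whole number, and that totals and components are rounded separately; neither assumption is confirmed by the package.

```r
# Hypothetical fractional unit counts for three bag sizes (not real HAB values).
raw_bags <- c(sml = 1200.4, lrg = 310.2, xlrg = 15.7)

# Round each component up to the next whole number.
rounded_bags <- ceiling(raw_bags)

# Summing the already-rounded components can differ slightly from
# rounding the total on its own.
total_of_rounded <- sum(rounded_bags)        # 1201 + 311 + 16
rounded_total    <- ceiling(sum(raw_bags))   # ceiling(1526.3)

total_of_rounded - rounded_total
#> [1] 1
```

Small differences of this kind are worth keeping in mind when comparing a total column against the sum of its component columns.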

See the HAB website for their summarized reports and informational dashboards.

PLU

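The product/price lookup code (PLU) uniquely identifies a product (mainly produce). The Hass Avocado Board focuses on six different PLUs.

Non-organic (conventional) avocados:

* 4046: non-organic small/medium Hass Avocados (~3-5 oz)
* 4225: non-organic large Hass Avocados (~8-10 oz)
* 4770: non-organic extra large Hass Avocados (~10-15 oz)

Organic avocados have the digit 9 prefixed to the non-organic PLUs:

* 94046: organic small/medium Hass Avocados (~3-5 oz)
* 94225: organic large Hass Avocados (~8-10 oz)
* 94770: organic extra large Hass Avocados (~10-15 oz)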

Within each dataset, use the type column together with the columns plu4046_units, plu4225_units, and plu4770_units to determine whether the units are for conventional (non-organic) or organic avocados. For example, if the type is Organic and you look at the value in plu4046_units, you’ll actually be looking at the unit sales for organic avocados with PLU 94046.
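
The rule above can be sketched as a small lookup. The helper `full_plu()` and the toy data frame are hypothetical and are not part of the {avocado} package; the unit values are illustrative only.

```r
# Toy rows mimicking the `type` column; unit values are illustrative only.
sales <- data.frame(
  type = c("Conventional", "Organic"),
  plu4046_units = c(100, 20)
)

# Hypothetical helper: derive the full PLU for a units column from the row type.
full_plu <- function(type, column) {
  # Extract the four-digit code from a column name such as "plu4046_units".
  base_plu <- sub("^plu([0-9]+)_units$", "\\1", column)
  # Organic rows get the digit 9 prefixed to the non-organic PLU.
  ifelse(type == "Organic", paste0("9", base_plu), base_plu)
}

sales$plu_code <- full_plu(sales$type, "plu4046_units")
sales$plu_code
#> [1] "4046"  "94046"
```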

Bags vs PLU

Another distinction that the HAB makes is between bags and bulk. Bulk typically means avocados that are sold as individual pieces and are easily distinguished by their PLU codes. Hence, a PLU column refers to bulk sales. Bags, on the other hand, are pre-packaged containers that hold a variable number of avocados and can differ in weight.
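
To make the bulk/bag split concrete, here is a rough sketch using the first hass_usa row as printed by `glimpse()` later in this document. It assumes that bulk units are the sum of the three PLU columns and that bulk plus bags make up total_bulk_and_bags_units; the column names suggest this, but it is an assumption.

```r
# First row of hass_usa, copied from the glimpse() output in this document.
row1 <- data.frame(
  total_bulk_and_bags_units = 38879717,
  plu4046_units = 12707895,
  plu4225_units = 14201201,
  plu4770_units = 549845,
  total_bagged_units = 11420777
)

# Assumption: bulk units are the sum of the three PLU columns.
bulk_units <- with(row1, plu4046_units + plu4225_units + plu4770_units)

# Share of all units sold in bags versus in bulk.
bag_share  <- row1$total_bagged_units / row1$total_bulk_and_bags_units
bulk_share <- bulk_units / row1$total_bulk_and_bags_units

# Bulk plus bags should match the reported total up to rounding.
bulk_units + row1$total_bagged_units - row1$total_bulk_and_bags_units
#> [1] 1
```

The leftover difference of one unit is consistent with the rounding described earlier.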

Region vs. Market

The hass_region and hass_market datasets share a variable called region, and the hass_market dataset has an additional variable called market. Regions are defined by the Hass Avocado Board, and markets are selected cities or sub-regions that are part of a region. Because the markets are only a selection and do not cover the whole region, the totals for all markets within a region will not equal the total reported for that region. For convenience, here is a breakdown of the regions and markets:
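
One way to derive such a breakdown yourself is sketched below in base R. To stay self-contained, it uses a toy data frame built from the three fully visible hass_market rows printed later in this document; with the package loaded, the same calls should work on the full dataset, assuming the column names shown in the `glimpse()` output.

```r
# Toy subset: three hass_market rows copied from the glimpse() output.
hass_market_toy <- data.frame(
  region = c("Northeast", "Southeast", "Midsouth"),
  market = c("Albany", "Atlanta", "Baltimore/Washington"),
  total_bulk_and_bags_units = c(129949, 547566, 631761)
)

# Unique region/market pairs, sorted by region and then market.
breakdown <- unique(hass_market_toy[, c("region", "market")])
breakdown <- breakdown[order(breakdown$region, breakdown$market), ]

# Market-level units summed per region.
market_totals <- aggregate(total_bulk_and_bags_units ~ region,
                           data = hass_market_toy, FUN = sum)

# Midsouth total for the same week and type, from the hass_region glimpse().
region_total_midsouth <- 2878968
midsouth_markets <- market_totals$total_bulk_and_bags_units[
  market_totals$region == "Midsouth"
]

# The markets cover only part of the region, so their sum falls short.
midsouth_markets < region_total_midsouth
#> [1] TRUE
```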

Datasets

hass_usa

The hass_usa dataset focuses on weekly Hass Avocado sales at the country (i.e., contiguous US) level and consists of the following fields:

library(avocado)
data('hass_usa')
dplyr::glimpse(hass_usa)
#> Rows: 810
#> Columns: 11
#> $ week_ending               <date> 2017-01-02, 2017-01-08, 2017-01-15, 2017-01…
#> $ type                      <chr> "Conventional", "Conventional", "Conventiona…
#> $ avg_selling_price         <dbl> 0.89, 0.99, 0.98, 0.94, 0.96, 0.77, 0.87, 0.…
#> $ total_bulk_and_bags_units <dbl> 38879717, 38049803, 38295489, 42140394, 3937…
#> $ plu4046_units             <dbl> 12707895, 11809728, 12936859, 14254151, 1403…
#> $ plu4225_units             <dbl> 14201201, 13856935, 12625666, 14212882, 1168…
#> $ plu4770_units             <dbl> 549845, 539069, 579347, 908617, 818728, 1664…
#> $ total_bagged_units        <dbl> 11420777, 11844072, 12153619, 12764745, 1283…
#> $ sml_bagged_units          <dbl> 8551134, 9332972, 9445623, 9462854, 9918256,…
#> $ lrg_bagged_units          <dbl> 2802710, 2432260, 2638919, 3231020, 2799961,…
#> $ xlrg_bagged_units         <dbl> 66934, 78841, 69078, 70872, 119096, 112870, …

hass_region

The hass_region dataset focuses on weekly US Hass Avocado sales at the region level and consists of the following fields:

library(avocado)
data('hass_region')
dplyr::glimpse(hass_region)
#> Rows: 6,480
#> Columns: 12
#> $ region                    <chr> "California", "Great Lakes", "Midsouth", "No…
#> $ week_ending               <date> 2017-01-02, 2017-01-02, 2017-01-02, 2017-01…
#> $ type                      <chr> "Conventional", "Conventional", "Conventiona…
#> $ avg_selling_price         <dbl> 0.89, 0.88, 1.12, 1.35, 0.83, 0.64, 0.94, 0.…
#> $ total_bulk_and_bags_units <dbl> 7175277, 4225246, 2878968, 3513389, 2382743,…
#> $ plu4046_units             <dbl> 2266314, 636278, 653896, 174843, 1462455, 35…
#> $ plu4225_units             <dbl> 2877689, 2157250, 1285365, 2589316, 509660, …
#> $ plu4770_units             <dbl> 90900, 189357, 64704, 39607, 4781, 27549, 92…
#> $ total_bagged_units        <dbl> 1940376, 1242362, 875005, 709624, 405849, 13…
#> $ sml_bagged_units          <dbl> 1762034, 885770, 719380, 659612, 387098, 110…
#> $ lrg_bagged_units          <dbl> 151334, 349033, 151227, 49533, 13009, 230435…
#> $ xlrg_bagged_units         <dbl> 27008, 7560, 4399, 479, 5743, 16335, 1884, 3…

hass_market

The hass_market dataset summarizes weekly Hass Avocado sales within the contiguous US by city or sub-region. These areas are defined by the HAB and make up portions of the regions in the region field of the hass_region dataset. The fields are:

library(avocado)
data('hass_market')
dplyr::glimpse(hass_market)
#> Rows: 38,522
#> Columns: 13
#> $ region                    <chr> "Northeast", "Southeast", "Midsouth", "West"…
#> $ market                    <chr> "Albany", "Atlanta", "Baltimore/Washington",…
#> $ week_ending               <date> 2017-01-02, 2017-01-02, 2017-01-02, 2017-01…
#> $ type                      <chr> "Conventional", "Conventional", "Conventiona…
#> $ avg_selling_price         <dbl> 1.47, 0.93, 1.47, 0.92, 1.29, 1.43, 1.21, 1.…
#> $ total_bulk_and_bags_units <dbl> 129949, 547566, 631761, 104511, 458831, 1053…
#> $ plu4046_units             <dbl> 4846, 224074, 54531, 27846, 4120, 1286, 4776…
#> $ plu4225_units             <dbl> 117028, 118927, 408953, 9409, 371224, 58532,…
#> $ plu4770_units             <dbl> 201, 338, 14388, 11342, 3934, 103, 15037, 11…
#> $ total_bagged_units        <dbl> 7875, 204229, 153892, 55915, 79554, 45430, 5…
#> $ sml_bagged_units          <dbl> 7867, 111600, 151346, 53094, 79340, 45156, 4…
#> $ lrg_bagged_units          <dbl> 8, 92629, 2543, 2794, 214, 256, 13712, 1079,…
#> $ xlrg_bagged_units         <dbl> 0, 0, 4, 28, 0, 19, 47, 5090, 2, 0, 917, 98,…