Analyzing Hass USA

The {avocado} package provides weekly summaries of Hass avocado sales from January 2017 through November 2020. The package contains three datasets. We begin with hass_usa, which covers weekly avocado sales in the contiguous US.

Let’s start by loading the package, along with {dplyr} (for data wrangling) and {ggplot2} (for data visualization), and exploring the dataset’s structure.

library(avocado)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2)

data('hass_usa')

dplyr::glimpse(hass_usa)
#> Rows: 810
#> Columns: 11
#> $ week_ending               <date> 2017-01-02, 2017-01-08, 2017-01-15, 2017-01…
#> $ type                      <chr> "Conventional", "Conventional", "Conventiona…
#> $ avg_selling_price         <dbl> 0.89, 0.99, 0.98, 0.94, 0.96, 0.77, 0.87, 0.…
#> $ total_bulk_and_bags_units <dbl> 38879717, 38049803, 38295489, 42140394, 3937…
#> $ plu4046_units             <dbl> 12707895, 11809728, 12936859, 14254151, 1403…
#> $ plu4225_units             <dbl> 14201201, 13856935, 12625666, 14212882, 1168…
#> $ plu4770_units             <dbl> 549845, 539069, 579347, 908617, 818728, 1664…
#> $ total_bagged_units        <dbl> 11420777, 11844072, 12153619, 12764745, 1283…
#> $ sml_bagged_units          <dbl> 8551134, 9332972, 9445623, 9462854, 9918256,…
#> $ lrg_bagged_units          <dbl> 2802710, 2432260, 2638919, 3231020, 2799961,…
#> $ xlrg_bagged_units         <dbl> 66934, 78841, 69078, 70872, 119096, 112870, …
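Before plotting, it can help to confirm the date coverage and check for missing values. The following is a minimal sketch using only the columns shown in the glimpse() output above; the names coverage, n_weeks, first_week, and last_week are introduced here for illustration and are not part of the package.

```r
library(avocado)
library(dplyr)

data('hass_usa')

# Number of weekly records and the date range covered, per avocado type
coverage <- hass_usa |>
  group_by(type) |>
  summarise(
    n_weeks    = n(),
    first_week = min(week_ending),
    last_week  = max(week_ending),
    .groups    = 'drop'
  )

coverage

# Number of missing values in each column
colSums(is.na(hass_usa))
```

If the per-type counts differ, or if any column contains missing values, the later summaries should be read with that in mind.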

Exploratory Data Analysis

Let’s begin by exploring the following two topics:

Fluctuation of Average Selling Price


hass_usa |> 
  ggplot(aes(x = week_ending)) +
  geom_line(aes(y = avg_selling_price, color = as.factor(type))) +
  scale_color_manual(labels = c('Conventional','Organic'), values = c('steelblue','forestgreen')) +
  scale_x_date(date_breaks = '1 year', date_labels = '%Y') +
  labs(
    x = 'Year',
    y = 'Average Selling Price per Unit (US$)',
    title = 'Fluctuation of Average Selling Price', 
    caption = 'Not adjusted for inflation\nSource: Hass Avocado Board',
    color = ''
  ) +
  ylim(0, 3) +
  theme(
    plot.background = element_rect(fill = "grey20"),
    plot.title = element_text(color = "#FFFFFF"),
    axis.title = element_text(color = "#FFFFFF"),
    axis.text.x = element_text(color = 'grey50', angle = 45, hjust = 1),
    axis.text.y = element_text(color = 'grey50'),
    plot.caption = element_text(color = 'grey75'),
    panel.background = element_blank(),
    panel.grid.major = element_line(color = "grey50", linewidth = 0.2),
    panel.grid.minor = element_line(color = "grey50", linewidth = 0.2),
    legend.background = element_rect(fill = 'grey20'),
    legend.key = element_rect(fill = 'grey20'),
    legend.title = element_text(color = 'grey75'),
    legend.text = element_text(color = 'grey75'),
    legend.position = 'inside',
    legend.position.inside = c(0.85, 0.85)
  )

The average selling price of organic avocados tends to be higher than that of conventional (non-organic) avocados. There also appears to be a fairly large spike in selling price in late 2017. Moreover, the peak average selling price seems to decline over time.
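These visual impressions can be checked numerically. The sketch below summarises the mean and peak weekly selling price for each type and calendar year; the names price_summary, year, mean_price, peak_price, and peak_week are assumptions made for this example. A yearly summary is only one of several reasonable ways to quantify the trend.

```r
library(avocado)
library(dplyr)

data('hass_usa')

# Mean and peak weekly selling price per type and calendar year
price_summary <- hass_usa |>
  mutate(year = as.integer(format(week_ending, '%Y'))) |>
  group_by(type, year) |>
  summarise(
    mean_price = mean(avg_selling_price, na.rm = TRUE),
    peak_price = max(avg_selling_price, na.rm = TRUE),
    # Week in which the yearly peak price occurred
    peak_week  = week_ending[which.max(avg_selling_price)],
    .groups    = 'drop'
  ) |>
  arrange(type, year)

price_summary
```

Comparing mean_price across types within a year shows the size of the organic premium, while the peak_price column shows whether the yearly peaks are in fact falling.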