Introduction to the jumps package

Version: 2025-03-16

Introduction

The jumps package implements the method described in the forthcoming Economic Modelling article “A Hodrick-Prescott filter with automatically selected breaks”. The terms jumps and breaks are used interchangeably in the package documentation. The initial version of the article used the term jumps, but one of the referees pointed out that this term usually refers to the jump processes of mathematical finance, while breaks is the standard term in econometrics. We agreed and adopted the term breaks in the final version of the article. We kept jumps in the package name and in the function names, however, because most of the package had already been written.

The method extends the well-known Hodrick-Prescott filter (HPF) by allowing a small number of discontinuities in the otherwise smooth filter. The number, positions and magnitudes of these discontinuities, which we call breaks or jumps, are estimated automatically from the data. The method minimises the sum of squared residuals of the HPF subject to a penalty on the number of breaks. The penalty can be chosen by the user or selected automatically; when it is set so that no breaks are allowed (maxsum = 0), the method reduces to the standard HPF. The technique is implemented in hpj, the main function of the package, which is a wrapper for other functions that implement several variants of the technique.
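As a point of reference, the standard HPF without breaks can be written as a penalised least-squares problem with a closed-form solution. The following base-R sketch illustrates that special case only. The helper hp_trend is hypothetical and is not a function of the jumps package, whose engine relies on the Kalman filter instead.

```r
# Standard HP filter (no breaks) solved as a linear system.
# hp_trend is a hypothetical helper, not part of the jumps package.
hp_trend <- function(y, lambda = 1600) {
  y <- as.numeric(y)
  n <- length(y)
  stopifnot(n >= 3, lambda >= 0)
  # D is the (n - 2) x n second-difference operator, rows of the form (1, -2, 1)
  D <- diff(diag(n), differences = 2)
  # Minimise sum((y - tau)^2) + lambda * sum((D %*% tau)^2) over tau
  drop(solve(diag(n) + lambda * crossprod(D), y))
}

# Example on an annual series available in base R
trend <- hp_trend(Nile, lambda = 6.25)
```

With lambda = 0 the sketch returns the data unchanged, and a series that is already a straight line is returned unchanged for any lambda, because its second differences are zero.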

For efficiency, the package’s computational engine is written in C++ through the Rcpp package. The engine implements the Kalman filter and smoother for the state-space form underlying the HPF. All formulae can be found in the article and in the vignette titled Formulae.
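For orientation, the textbook state-space form of the standard HPF (without breaks) is the integrated random walk plus noise shown below. This is the well-known representation, given here as a sketch; the notation is an assumption of this example, and the exact form used by the package, including the treatment of breaks, is the one in the article and in the Formulae vignette.

```latex
\begin{aligned}
y_t         &= \mu_t + \varepsilon_t, & \varepsilon_t &\sim N(0, \sigma_\varepsilon^2), \\
\mu_{t+1}   &= \mu_t + \beta_t, \\
\beta_{t+1} &= \beta_t + \zeta_t,     & \zeta_t &\sim N(0, \sigma_\zeta^2), \\
\lambda     &= \sigma_\varepsilon^2 / \sigma_\zeta^2 .
\end{aligned}
```

In this representation the smoothed level \mu_t plays the role of the HPF trend, and the smoothing parameter is the ratio of the noise variance to the variance of the slope disturbance.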

Users who need finer control over the estimation process can use the functions with names hpfj* and auto_hpfj*, which allow full control over the starting values and penalties. However, the wrapper function hpj makes the process simpler and more stable: the technique is based on a complex numerical optimisation, so good starting values are essential. Inside the hpj function, we linearly transform the time series so that the default starting values of the hpfj* functions are expected to work well, and then we apply the inverse transform to all the results so that they refer to the original time series.

While hpfj* and auto_hpfj* work only on time series stored in a numeric vector (any type of time series object is coerced to a numeric vector), the hpj function accepts all the main time series objects used in R (ts, zoo, xts, and timeSeries). All the time series generated by hpj receive the same class and dates as the original time series.

Basic usage

The function is called as

hpj(y, maxsum = NULL, lambda = NULL, xreg = NULL, ic = c("bic", "hq", "aic", "aicc"))

where y is the time series to be filtered; maxsum is an inverse penalty (when it is zero no breaks are allowed, and the larger maxsum, the more and the larger the breaks that are allowed); lambda is the smoothing parameter of the HPF; xreg is a matrix of regressors (we are still working on this, and it is not yet implemented); and ic is the information criterion used to select the penalty when maxsum = NULL. When lambda = NULL, the smoothing parameter is estimated by quasi-maximum likelihood. The parameter lambda can be a positive number or one of the following strings (the corresponding value of the smoothing parameter is given in parentheses): daily (110,930,628,906), weekly (45,697,600), monthly (129,600), quarterly (1,600), annual (6.25).
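The values listed for the lambda strings agree with the fourth-power frequency adjustment commonly attributed to Ravn and Uhlig, which rescales the quarterly value 1600 by (observations per year / 4)^4. The sketch below reproduces the list under assumed frequencies of 365, 52, 12, 4 and 1 observations per year. The helper hp_lambda is hypothetical, and whether the package computes its values this way is an assumption.

```r
# Hypothetical helper, not part of the jumps package:
# rescale the quarterly value 1600 by the fourth power of the frequency ratio.
hp_lambda <- function(obs_per_year) 1600 * (obs_per_year / 4)^4

# Assumed number of observations per year for each string
freq <- c(daily = 365, weekly = 52, monthly = 12, quarterly = 4, annual = 1)

lambda_values <- hp_lambda(freq)
format(lambda_values, big.mark = ",", scientific = FALSE, trim = TRUE)
```

The daily value produced by this rule is 110,930,628,906.25, which matches the listed value up to rounding.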

The function returns an object of class hpj, which is a list with the following slots:

The methods print and plot are available for the hpj class. The plot method can be customised:

plot(x, prob = NULL, show_breaks = TRUE, main = "original + filter", use_ggplot = TRUE, ...)

where x is the object of class hpj; prob is the coverage of the confidence interval for the filter, which is not plotted when prob = NULL; show_breaks is a logical indicating whether to show the breaks in the plot; main is the title of the plot; use_ggplot is a logical indicating whether to use ggplot2; and ... are additional arguments passed to the plot function when the standard plot is used (that is, when use_ggplot = FALSE).

Examples

Simulated time series

library(jumps)
set.seed(2025)
n <- 100

# simulated smooth trend
mu <- 100*cos(3*pi/n*(1:n)) - ((1:n) > 50)*n - c(rep(0, 50), 1:50)*10
# simulated time series
y <- mu + rnorm(n, sd = 20)

# HP filter with jumps with estimated lambda and fixed penalty (maxsum = 50)
hpj_sim <- hpj(y, maxsum = 50)

print(hpj_sim)
#> Hodrick-Prescott filter with jumps
#> Call:
#>   hpj(y, maxsum = 50) 
#> Parameters:
#>   sd(slope) = 1.848931
#>   sd(noise) = 20.67477
#>   gamma     = 0.03659672
#>   lambda    = 125.0375
#>   maxsum    = 50
#> Model's degrees of freedom:  14.0596 
#> Log-likelihood:  -492.3048 
#> Information criteria:
#>   AIC  = 1012.729
#>   AICc = 1017.714
#>   BIC  = 1049.357
#>   HQ   = 1027.553
#> Break dates:    
#> 1 51

plot(hpj_sim)

plot(hpj_sim, prob = 0.95)

plot(hpj_sim, use_ggplot = FALSE)
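To see why a break at observation 51 is the expected result, one can inspect the simulated trend itself. The sketch below uses base R only and rebuilds mu exactly as in the example; the names time_index, step and i are introduced here for illustration.

```r
n <- 100
time_index <- 1:n

# same trend as in the example: a cosine plus a level shift and a slope change after t = 50
mu <- 100*cos(3*pi/n*time_index) - (time_index > 50)*n - c(rep(0, 50), 1:50)*10

step <- diff(mu)             # step[i] = mu[i + 1] - mu[i]
i <- which.max(abs(step))    # position of the largest one-step change

c(break_at = i + 1, size = step[i])
```

The largest one-step change in mu occurs between observations 50 and 51 and is about -100.6, while every other step is smaller than 20 in absolute value. This is consistent with the single break date, 51, reported by print(hpj_sim).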

Nile time series

# HP filter with jumps with estimated lambda and automatically selected penalty
hpj_nile <- hpj(Nile)

print(hpj_nile)
#> Hodrick-Prescott filter with jumps
#> Call:
#>   hpj(Nile) 
#> Parameters:
#>   sd(slope) = 1.692275e-07
#>   sd(noise) = 127.7478
#>   gamma     = 3.428068e-14
#>   lambda    = 5.698556e+17
#>   maxsum    = 152.3048
#> Model's degrees of freedom:  4.277247 
#> Log-likelihood:  -644.5773 
#> Information criteria:
#>   AIC  = 1297.709
#>   AICc = 1298.186
#>   BIC  = 1308.852
#>   HQ   = 1302.219
#> Break dates:      
#> 1 1899

plot(hpj_nile, main = "Nile river flow")
#> Don't know how to automatically pick scale for object of type <ts>. Defaulting
#> to continuous.

plot(hpj_nile, prob = 0.95, main = "Nile river flow")
#> Don't know how to automatically pick scale for object of type <ts>. Defaulting
#> to continuous.

plot(hpj_nile, use_ggplot = FALSE, main = "Nile river flow")

Employment in Italy

data("employed_IT")
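# keep the Y25.29 series from the start of 2009
y <- window(employed_IT[, "Y25.29"], start = c(2009, 1))
# HP filter with jumps with estimated lambda and automatically selected penalty
hpj_emp <- hpj(y, scl = "original")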

print(hpj_emp)
#> Hodrick-Prescott filter with jumps
#> Call:
#>   hpj(y, scl = "original") 
#> Parameters:
#>   sd(slope) = 3.347636
#>   sd(noise) = 21.13837
#>   gamma     = 17.3066
#>   lambda    = 39.87185
#>   maxsum    = 76.59153
#> Model's degrees of freedom:  14.90965 
#> Log-likelihood:  -301.9286 
#> Information criteria:
#>   AIC  = 633.6765
#>   AICc = 644.1979
#>   BIC  = 665.1489
#>   HQ   = 646.0108
#> Break dates:         
#> 1 2010.25
#> 2 2012   
#> 3 2013.75
#> 4 2019.75
#> 5 2020.25

plot(hpj_emp, main = "Millions of employed in Italy: age 25-29")
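The parameters printed in the three examples can be related to each other. In each case the reported lambda appears to equal the squared ratio of sd(noise) to sd(slope), which is the usual state-space reading of the HPF smoothing parameter. The sketch below checks this using only the numbers copied from the printed output above; the data frame est and its column names are introduced here for illustration.

```r
# Values copied from the printed output of the three examples
est <- data.frame(
  example  = c("simulated", "Nile", "employment"),
  sd_slope = c(1.848931, 1.692275e-07, 3.347636),
  sd_noise = c(20.67477, 127.7478, 21.13837),
  lambda   = c(125.0375, 5.698556e+17, 39.87185)
)

# noise variance divided by slope-disturbance variance
est$ratio <- (est$sd_noise / est$sd_slope)^2

# relative difference from the printed lambda (small, due to print rounding)
est$rel_diff <- abs(est$ratio / est$lambda - 1)

est[, c("example", "lambda", "ratio", "rel_diff")]
```

The relative differences are of the order of 1e-6 or smaller. For the Nile series, sd(slope) is practically zero, so lambda is extremely large and the filter is close to a straight line on each side of the 1899 break.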