Practical Session 5: Periodicity

Periodicity

Show the code
# Install pacman if not installed already, and activate it
rm(list=ls())

if (!require("pacman")) install.packages("pacman")

# Install, update and activate libraries
pacman::p_load(
  here, 
  rio, 
  skimr,
  tsibble,
  TSA,
  tidyverse,
  season,
  lmtest
)
package 'season' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
    C:\Users\javid\AppData\Local\Temp\RtmpiUbtXs\downloaded_packages
Show the code
# Create tsa_theme from previous exercise to be used in ggplot.
tsa_theme <- theme_bw() + 
        theme(
            plot.title = element_text(face = "bold", 
                                      size = 12),
            legend.background = element_rect(fill = "white", 
                                             size = 4, 
                                             colour = "white"),
            # legend.justification = c(0, 1),
            legend.position = "bottom",
            panel.grid.minor = element_blank()
        )

Expected learning outcomes

By the end of this session, participants should be able to:

  • assess the existence of periodicity in surveillance data

  • fit and interpret models containing a trend and one or several sine and cosine curves on surveillance data to model both trend and periodicity

Task 5.1

Using mortz object (tibble), which is in mortagg_case_4.RData (created in Practical 4), visually assess the existence of any periodicity in the total number of deaths (cyclical patterns). What do you think the period is, if any? Discuss your results with your peers.

Help for Task 5.1

Show the code
# Read data
load(here("data", "mortagg2_case_4.RData"))

Explore the time series.

The function decompose() decomposes the time series by applying moving averages to the series.

Show the code
# format the data to a time series object using ts(); this is needed when using decompose() directly after.
mortz.ts  <- ts(mortz$cases, frequency=52, start=c(2010,1))

plot(decompose(mortz.ts))

Discuss the panels obtained. What features seem to be relevant?

Step 2)

Although we can see that there is a trend in the panels returned by decompose(), we wish to inspect it ourselves. Apply moving averages (practical 4) to determine if there is a trend. Which periodicity or periodicities might be relevant.

Step 3)

We will now estimate the trend, as was done in practical 4. Run the code in order to have at hand the results and be able to compare later with other models.

Show the code
mort_trend <- glm(cases ~ index, family = "poisson", data = mortz)

summary(mort_trend)

Call:
glm(formula = cases ~ index, family = "poisson", data = mortz)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) 8.910e+00  1.007e-03 8850.25   <2e-16 ***
index       1.895e-04  3.302e-06   57.39   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 62509  on 520  degrees of freedom
Residual deviance: 59215  on 519  degrees of freedom
AIC: 64841

Number of Fisher Scoring iterations: 4
mort_poissontrend %>% 
  gtsummary::tbl_regression(
    exponentiate = TRUE,
    estimate_fun = ~style_number(.x, digits = 4)
  )

Step 4:

As there seem to be some periodicity in the initial explorations, proceed to estimate the seasonality component in the time series.

Seasonality and periodicity both refer to re-occurring patterns in the data.

Periodogram is called a graph that shows at what frequencies and time units there is periodicity. R has a number of functions which produce a periodogram. Here we will use the periodogram function from the TSA package.

Show the code
mort_period <- TSA::periodogram(mortz$cases)

The periodogram function plots the estimated periodogram for a given time series and also returns various useful outputs as a list, which we can use for other analyses.

This type of plot is interpreted by identifying peaks. To obtain something that can be easily interpretable, we will extract the spectra and frequencies from the output of the periodogram, and calculate the periods by taking the reciprocal of the frequencies, and the plot the frequencies.

Show the code
mort_period_recip <-
    tibble(
        freq = mort_period$freq,
        spec = mort_period$spec
    ) %>%
    arrange(desc(spec)) %>% 
    mutate(reciprocal_freq = 1 / freq)  # here in weeks

view(mort_period_recip)

The above lines of code extract two variables (freq and spec) from the mort_period_recip object and place them in a tibble data.frame. Both columns are ordered in descending order of the spectral variable. At the top of the table we find the strongest spectral signal, as well as at what periodicity does it appear (see reciprocal frequency). Observe that the reciprocal frequency is in the same units as the data, in this case is in
weeks.

Show the code
# breaks every 13, equivalent to every 3 months approximately.

ggplot(data = mort_period_recip, aes(x = reciprocal_freq, y = 2*spec)) +
    geom_line() +
    scale_x_continuous(limits = c(0, 160), breaks=seq(0, 160, 13)) +
    labs(x = "Period (weeks)", 
         y = "Spectral density") +
    tsa_theme

What is the main periodicity in the mortality data? (use the plot and the table of mort_period_recip).

Is there another periodicity that might be worth checking out?

Task 5.2

Fit a model on the data with a trend and then the relevant period(s).

How many different periods are you including in your model? Test statistically if the inclusion of more than one period contributes to the fit of the model. Discuss your results.

As the periodogram shows periodicity close to 52 weeks, we will use a sine and cosine curves of a 52-week period. Periodicity with a period of one year is also referred to as seasonality.

Fit a poisson regression of cases that contains a trend as well as sine and cosine predictor term for the periodicit identified. In order to have an appropriate phase in your model (you do not have to worry about identifying it; this will happen automatically), you need to use both a sine and a cosine curve with the same period. The sum of these two curves gives the periodicity for the specified period and the phase best describing our data.

Show the code
# calculate the sine and cosine terms.
mortz <- mortz %>%
  mutate(cos52 = cos(2 * pi * index / 52),
         sin52 = sin(2 * pi * index / 52))

mort_trendsin52cos52 <- glm(cases ~ index + sin52 + cos52, family = "poisson", data = mortz)

summary(mort_trendsin52cos52)

Call:
glm(formula = cases ~ index + sin52 + cos52, family = "poisson", 
    data = mortz)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) 8.898e+00  1.012e-03 8792.82   <2e-16 ***
index       2.184e-04  3.311e-06   65.97   <2e-16 ***
sin52       9.137e-02  7.066e-04  129.29   <2e-16 ***
cos52       1.061e-01  7.033e-04  150.88   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 62509  on 520  degrees of freedom
Residual deviance: 19668  on 517  degrees of freedom
AIC: 25298

Number of Fisher Scoring iterations: 3
Show the code
ggplot(data = mortz, aes(x = date_index)) +
    geom_point(aes(y = cases), alpha = 0.4) +
    geom_line(
        mapping = aes(y = fitted(mort_trendsin52cos52)),
        colour = "green",
        lwd=1.5
    ) +
    scale_y_continuous(limits = c(0, NA)) +
  scale_x_yearweek(date_labels = "%Y-%W", 
                   date_breaks = "1 year") +
    labs(x = "Year week", 
         y = "Number of deaths", 
         title = "Spain: number of deaths per week with fit from trend+sin52+cos52") +
    tsa_theme

How does the fit look visually?

Plot the results of the model with only trend and the newer model with trend and periodicity 52 weeks. Compare statistically and comment on the results.

Show the code
ggplot(data = mortz, aes(x = date_index)) +
    geom_point(aes(y = cases), alpha = 0.4) +
    geom_line(
        mapping = aes(y = fitted(mort_trend)),
        colour = "blue",
        lwd=1.5
    ) +
    geom_line(
        mapping = aes(y = fitted(mort_trendsin52cos52)),
        colour = "green",
        lwd=1.5
    ) +
    scale_y_continuous(limits = c(0, NA)) +
  scale_x_yearweek(date_labels = "%Y-%W", 
                   date_breaks = "1 year") +
    labs(x = "Year week", 
         y = "Number of deaths", 
         title = "Spain: number of deaths per week with fit from only trend and trend+sin52+cos52") +
    tsa_theme

To compare two nested models, we make use of the likelihood ratio test (LRT). These models are nested since they contain the same data, and the second model has the same variables as the first plus some additional ones.

Show the code
# likelihood ratio test: compares two nested models

# test sin52 and cos52 
lrtest(mort_trend, mort_trendsin52cos52) 
Likelihood ratio test

Model 1: cases ~ index
Model 2: cases ~ index + sin52 + cos52
  #Df LogLik Df Chisq Pr(>Chisq)    
1   2 -32418                        
2   4 -12645  2 39547  < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

A significant p-value from LRT indicates that the extra variable(s) result in a better fit of the bigger model. In this cases, a better fit to the data is obtained when we include the seasonality at 52 weeks.

To fit the model better, we could try to add more sine/cosine curves, of a period corresponding to the second strongest peak in the model (26 weeks). This will allow for cycles (periods) of not only 52, but also 26 weeks, should we think there might be half-yearly cycles which are relevant. In this case, for instance, there may be elevated mortality in winter, but also in summer during heatwaves.

Generate sine and cosine terms with a period of 26 weeks and name them sin26 and cos26. Add these two new terms to the previous model.

Show the code
mortz <-
    mortz %>%
    mutate(
        sin26 = sin(2 * pi * index / 26),
        cos26 = cos(2 * pi * index / 26)
    )

mort_trendsin2cos2 <- glm(cases ~ index + sin52 + cos52 + sin26 + cos26,
                           family = "poisson", 
                           data = mortz)
                           
summary(mort_trendsin2cos2)

Call:
glm(formula = cases ~ index + sin52 + cos52 + sin26 + cos26, 
    family = "poisson", data = mortz)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) 8.895e+00  1.013e-03 8779.51   <2e-16 ***
index       2.265e-04  3.313e-06   68.35   <2e-16 ***
sin52       9.034e-02  7.142e-04  126.48   <2e-16 ***
cos52       1.021e-01  6.994e-04  145.94   <2e-16 ***
sin26       5.075e-02  7.054e-04   71.94   <2e-16 ***
cos26       3.298e-02  7.034e-04   46.88   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 62509  on 520  degrees of freedom
Residual deviance: 12288  on 515  degrees of freedom
AIC: 17922

Number of Fisher Scoring iterations: 3
Show the code
ggplot(data = mortz, aes(x = date_index)) +
    geom_point(aes(y = cases), alpha = 0.4) +
    geom_line(
        mapping = aes(y = fitted(mort_trendsin2cos2)),
        colour = "green",
        lwd=1.5
    ) +
    scale_y_continuous(limits = c(0, NA)) +
  scale_x_yearweek(date_labels = "%Y-%W", 
                   date_breaks = "1 year") +
    labs(x = "Year-week", 
         y = "Number of deaths", 
         title = "Spain: number of deaths per week with fit from trend, sin52, cos52, sin26, cos26.") +
    tsa_theme

Test if the model with two periodicities fit better than the model with one.

Show the code
# likelihood ratio test: compares two nested models

# test sin26 and cos26
lrtest(mort_trendsin52cos52, mort_trendsin2cos2)
Likelihood ratio test

Model 1: cases ~ index + sin52 + cos52
Model 2: cases ~ index + sin52 + cos52 + sin26 + cos26
  #Df   LogLik Df Chisq Pr(>Chisq)    
1   4 -12644.9                        
2   6  -8954.9  2  7380  < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Is the addition of periodicity at 26 weeks a statistically significant contribution to your model?

Task 5.3 (Optional)

Carry out the same analysis for men and women. Do you see differences between the two? Where?

Save

Show the code
# Save the mortz data created during this case study to be used in the next practical.

#save(mortz, file = here("data", "mortagg2_case_5.RData"))