Practical Session 10: Assessing the impact of interventions using surveillance data

Expected learning outcomes

By the end of the case study, participants will be able to:

  • Assess and interpret the effect of an intervention on trends and periodicity in surveillance data

Description of the dataset

Rotavirus is a very common and potentially serious infection of the large bowel, mainly affecting young babies. Nearly every child will have at least one episode of rotavirus gastroenteritis by five years of age. People of any age can be affected, but the illness is more severe in young infants. The rotavirus immunisation programme was introduced in the UK on 1 July 2013 (week 27) with the objective of preventing a significant number of young infants from developing rotavirus infection. It may also provide some additional protection to the wider population through herd immunity. The aim of the rotavirus immunisation programme is to provide two doses of vaccine to infants from six weeks of age and before 24 weeks of age. The first dose of vaccine is offered at approximately eight weeks of age and the second dose at least four weeks after the first dose.

High coverage was rapidly achieved for the first cohort of children offered rotavirus vaccine routinely in England and this has been maintained throughout the first fourteen months of the routine programme. Over this period, rotavirus vaccine coverage for children in the routine cohort averaged 93.3% for one dose and 88.3% for two doses.1

Task 10.1

You have been given a dataset for notified Rotavirus infections in England and Wales from week 23/2009 up to week 42/2018 (rotavirus.csv).

Assess the impact of the introduction of the vaccination scheme in England and Wales for the whole population. Has the vaccine introduction had an effect on trend or periodicity patterns? Has the vaccine introduction affected the timing of the yearly outbreaks?

Help for Task 10.1

Open the rotavirus dataset, which is a line list of all notified rotavirus infections in the UK between June 2009 and December 2014. Prepare your data and test if the introduction of the vaccine has had an impact on the notified number of cases of rotavirus infections in the UK. Has the trend changed? Has the periodicity changed?

Prepare the rotavirus.csv for time-series analysis. You need to manipulate the dataset before being able to use it for time series analysis.

Show the code
rota <- import(here("data", "rotavirus.csv")) 
view(rota)

As the data in rota shows above, this dataset is structured in a “wide” format. You can see two variables representing dates, and all other variables containing case counts are organised horizontally. Alternatively, we can store the same information by converting the first dataset to a “long” format, where one variable contains the different age group categories of cases (cases_cat), and another displays the corresponding case counts (n) for each age group and year-week.

In order to use ggplot2, remember that the data should be organised in a long format.

The function as_tsibble() is what indicates to R that the data is a time series. index represents the time variable and key represents any categorical variable that splits the data into several time series (eg: location, age group…).

Show the code
## convert to long format
rota_lg <- rota %>% 
    pivot_longer(
      cols = -c(year, week),
      names_to = "cases_cat",
      values_to = "n"
    )

## recode
rota_lg <- rota_lg %>%
  mutate(cases_cat = case_when(
    cases_cat == "case1" ~ "Cases <1 year",
    cases_cat == "case2" ~ "Cases 1 to 4 years",
    cases_cat == "case3" ~ "Cases 5 to 14 years",
    cases_cat == "case4" ~ "Cases 15 to 64 years",
    cases_cat == "case5" ~ "Cases 65+ years",
    cases_cat == "cases" ~ "Total cases",
    TRUE ~ NA_character_
    ))

## turn into ts format and make sure the data is complete
rota_lg <- rota_lg %>%
  mutate(year_week = make_yearweek(year = year, week = week)) %>%
  group_by(cases_cat) %>%
  mutate(index = seq.int(from = 1, to = length(unique(year_week)))) %>%
  ungroup() %>%
  as_tsibble(index = year_week, key = cases_cat) %>%
  complete(index, cases_cat, fill=list(n = 0))

Plot the total number of cases by age group against time. Note that .

Show the code
ggplot(data = rota_lg) +
  geom_line (mapping = aes(x = year_week, y = n), alpha = 0.3) +
  facet_wrap(facets = vars(cases_cat), 
             scales = "free_y") +   
  scale_y_continuous(limits = c(0, NA)) +
  scale_x_yearweek(date_labels = "%Y-%W", 
                   date_breaks = "1 year") +
  labs(
    x = "Year", 
    y = "Weekly case counts",
    title = "Rotavirus cases\n(note differing y scales)"
  ) +
  tsa_theme

Show the code
## Notes: 
# You can get line breaks in labels using \n
# You can also make all y-axes the same by using scales = "fixed_y"

We want to see whether the number of cases changed after the introduction of the vaccine.

For that, first we need to state in the dataset when the vaccine was available. Firstly, we are adding one new vaccine variable to the rota_ts time series. It will be 0 for weeks before week 27 2013, and 1 thereafter.

We have also created sine and cosine variables so that we can model annual seasonality as before.

Show the code
rota_ts <- rota_lg %>%
  mutate(
    vaccine = case_when(
      year >= 2014 ~ 1,
      year <= 2012 ~ 0,
      year == 2013 & week >= 27 ~ 1,
      year == 2013 & week < 27 ~ 0,
      TRUE ~ NA_real_
      ),
    sin52 = sin(2 * pi * index / 52),
    cos52 = cos(2 * pi * index / 52)
  )

## subset to total cases
rota_ts <- rota_ts %>% 
  filter(cases_cat == "Total cases")

Run first a separate model for the time before and after the introduction of the vaccine in England and Wales for all population. Can you detect any change in trend?

Note in the code that we are building further on the tbl_regression function, adding options.

What we saw before:

  • exponentiate = T to exponentiate coefficients
  • style_number() to specify the number of digits in the estimates.
  • add_glance_table() to add key parameters at the bottom of the table.
  • table_merge() to take several tables built with tbl_regression and put them in one big table.

What we are adding now: - style_pvalue to format the pvalue (here with max 3 digits) - modify_caption to add a table title

Show the code
rotamodel_prevacc_p <- glm(n ~ index,
    data = rota_ts %>% filter(vaccine == 0),
    family = "poisson"
) 

#If the `summary` command is used for a `glm` Poisson model, the log rate ratios are reported by 
#default. They need to be exponentiated 

summary_rotamodel_prevacc_p <- rotamodel_prevacc_p %>% 
  gtsummary::tbl_regression(
    exponentiate = TRUE,
    pvalue_fun = ~style_pvalue(.x, digits = 3), 
    estimate_fun = ~style_number(.x, digits = 4)
    ) %>%
  gtsummary::add_glance_table(include = c(AIC, deviance, df.residual)) %>% 
  gtsummary::modify_caption("Rotavirus model pre-vaccination - Poisson")
  
summary_rotamodel_prevacc_p
Rotavirus model pre-vaccination - Poisson
Characteristic IRR 95% CI p-value
index 1.0028 1.0027, 1.0029 <0.001
AIC 78,347

Deviance 76,884

Residual df 210

Abbreviations: CI = Confidence Interval, IRR = Incidence Rate Ratio

Is a Poisson model the best option?

Before we have a look at the period post-vaccination, is a Poisson model really the best model to use here? We need to check whether our data exhibit overdispersion.

Overdispersion occurs when the mean of the data is not equal to its variance, violating the fundamental assumption of the Poisson distribution. If this happens, the model underestimates the standard error of the coefficients, biasing results towards narrower confidence intervals and artifically lower p-values.

Let’s check for overdispersion. We can calculate an overdispersion parameter from a model as the ratio of the deviance and the residuals degrees of freedom. Also, proper statistical test are available, for example using the AER library and the dispersiontest function

Show the code
# Quick check for overdispersion: if check > 1 then use negative binomial
deviance <- summary(rotamodel_prevacc_p)$deviance
residual.df <- summary(rotamodel_prevacc_p)$df.residual

check <- deviance / residual.df
check
[1] 366.1138
Show the code
# Proper check for overdispersion: 
AER::dispersiontest(rotamodel_prevacc_p, trafo = NULL, alternative = c("greater"))

    Overdispersion test

data:  rotamodel_prevacc_p
z = 7.2434, p-value = 2.188e-13
alternative hypothesis: true dispersion is greater than 1
sample estimates:
dispersion 
   446.172 

The overdispersion parameter is fairly greater that 1, and the overdispersion test is significant, so we can safely conclude that a negative binomial model is more appropiate. Let’s compute it with the glm.nb function, and compare with the previous one

Show the code
#so best to use a negative binomial here
rotamodel_prevacc_nb <- glm.nb(n ~ index,
                               data = rota_ts %>% filter(vaccine == 0)) 


# Table with results
summary_rotamodel_prevacc_nb <- rotamodel_prevacc_nb %>%
  gtsummary::tbl_regression(
    exponentiate = TRUE,
    pvalue_fun = ~style_pvalue(.x, digits = 3),
    estimate_fun = ~style_number(.x, digits = 4)
    ) %>%
  gtsummary::add_glance_table(include = c(AIC, deviance, df.residual)) %>% 
  gtsummary::modify_caption("Rotavirus model pre-vaccination - Negative Binomial")
  

# Let's compare the two models
comparison_p_nb <- gtsummary::tbl_merge(
  tbls = list(summary_rotamodel_prevacc_p, 
              summary_rotamodel_prevacc_nb),
  tab_spanner = c("Poisson model", "Negative Binomial model")
  )

comparison_p_nb
Rotavirus model pre-vaccination - Poisson
Characteristic
Poisson model
Negative Binomial model
IRR 95% CI p-value IRR 95% CI p-value
index 1.0028 1.0027, 1.0029 <0.001 1.0029 1.0005, 1.0054 0.015
AIC 78,347

2,854

Deviance 76,884

248

Residual df 210

210

Abbreviations: CI = Confidence Interval, IRR = Incidence Rate Ratio

The AIC comparison proves how the negative binomial model outperforms the Poisson model. Also, realize how the index coefficient shifted from a tiny negative to a tiny positive overal trend, and how the p-value has greatly increased (but it still significant)

Let’s use negative binomial models for the rest of the practical.

Now let’s look at the period post-introduction of the vaccine.

We will fit a model for the post-vaccination period and compare it with the pre-vaccination period

Show the code
rotamodel_postvacc <- glm.nb(n ~ index,
                             data = rota_ts %>% filter(vaccine == 1))

# Table with results 
summary_rotamodel_postvacc <- rotamodel_postvacc %>% 
  gtsummary::tbl_regression(
    exponentiate = TRUE,
    pvalue_fun = ~style_pvalue(.x, digits = 3),
    estimate_fun = ~style_number(.x, digits = 4)) %>%
  gtsummary::add_glance_table(include = c(AIC, deviance, df.residual)) %>% 
  gtsummary::modify_caption("Rotavirus model post-vaccination")


# Comparisson between models
comparison_pre_post <- gtsummary::tbl_merge(
    tbls = list(summary_rotamodel_prevacc_nb, 
                summary_rotamodel_postvacc),
    tab_spanner = c("Pre-vaccination", "Post-vaccination")
  )

comparison_pre_post
Rotavirus model pre-vaccination - Negative Binomial
Characteristic
Pre-vaccination
Post-vaccination
IRR 95% CI p-value IRR 95% CI p-value
index 1.0029 1.0005, 1.0054 0.015 0.9977 0.9967, 0.9988 <0.001
AIC 2,854

2,830

Deviance 248

293

Residual df 210

274

Abbreviations: CI = Confidence Interval, IRR = Incidence Rate Ratio

We can see a difference in trends, but the two models cannot be directly compared.

Now we want to run one single model for both periods (before/after the introduction of the vaccine)

Show the code
rotamodel_trend <- glm.nb(n ~ index + vaccine,
                          data = rota_ts)

# Table with results
summary_rotamodel_trend <- rotamodel_trend %>% 
  gtsummary::tbl_regression(
    exponentiate = TRUE,
    pvalue_fun = ~style_pvalue(.x, digits = 3),
    estimate_fun = ~style_number(.x, digits = 4)) %>%
  gtsummary::add_glance_table(include = c(AIC, deviance, df.residual)) %>% 
  gtsummary::modify_caption("Rotavirus model with trend and vaccine intervention")

summary_rotamodel_trend
Rotavirus model with trend and vaccine intervention
Characteristic IRR 95% CI p-value
index 0.9995 0.9983, 1.0006 0.341
vaccine 0.2624 0.1940, 0.3551 <0.001
AIC 5,768

Deviance 544

Residual df 485

Abbreviations: CI = Confidence Interval, IRR = Incidence Rate Ratio

How would you interpret these results?

We can plot the model against the raw data.

Show the code
rota_ts <- rota_ts %>% 
  mutate(fitted_trend = fitted(rotamodel_trend))
   

ggplot(data = rota_ts) +
  geom_point(mapping = aes(x = year_week, y = n), alpha = 0.3) +
  geom_line(mapping = aes(x = year_week, y = fitted_trend), 
            alpha = 0.7, colour = "green") +
  scale_y_continuous(limits = c(0, NA)) +
  scale_x_yearweek(date_labels = "%Y-%W", 
                   date_breaks = "1 year") +
  labs(x = "Year", 
       y = "Weekly case counts",
       title = "Rotavirus cases: model with trend and intervention"
  ) +
  tsa_theme

Now we want to run one single model for both periods (before/after the introduction of the vaccine), this time including seasonality.

Show the code
rotamodel_seasonality <- glm.nb(n ~ index + sin52 + cos52 + vaccine,
                                data = rota_ts)


summary_rotamodel_seasonality <- rotamodel_seasonality %>% 
  gtsummary::tbl_regression(
    exponentiate = TRUE,
    pvalue_fun = ~style_pvalue(.x, digits = 3),
    estimate_fun = ~style_number(.x, digits = 4)) %>%
  gtsummary::add_glance_table(include = c(AIC, deviance, df.residual)) %>% 
  gtsummary::modify_caption("Rotavirus model with seasonality, trend and vaccine intervention")

summary_rotamodel_seasonality
Rotavirus model with seasonality, trend and vaccine intervention
Characteristic IRR 95% CI p-value
index 0.9989 0.9982, 0.9996 0.001
sin52 0.4245 0.3951, 0.4560 <0.001
cos52 2.0527 1.9092, 2.2071 <0.001
vaccine 0.4885 0.4034, 0.5919 <0.001
AIC 5,225

Deviance 506

Residual df 483

Abbreviations: CI = Confidence Interval, IRR = Incidence Rate Ratio

We can plot the model against the raw data again.

Show the code
rota_ts <- rota_ts %>% 
  mutate(fitted_seasonality = fitted(rotamodel_seasonality))
 

ggplot(data = rota_ts) + 
  geom_point(mapping = aes(x = year_week, y = n), alpha = 0.3) +
  geom_line(mapping = aes(x = year_week, y = fitted_seasonality), 
            alpha = 0.7, colour = "green") +
  scale_y_continuous(limits = c(0, NA)) +
  scale_x_yearweek(date_labels = "%Y-%W", 
                   date_breaks = "1 year") +
  labs(x = "Year", 
       y = "Weekly case counts",
       title = "Rotavirus cases: model with trend, seasonality, and intervention"
    ) +
  tsa_theme

To be able to comment on any change in trend after the introduction of the vaccine, we need to either include an interaction term in the model or include a time variable that starts at the introduction of the vaccine. Let’s here use an interaction term.

Show the code
rotamodel_winteraction <- glm.nb(n ~ index * vaccine + sin52 + cos52,
                                 data = rota_ts)

summary_rotamodel_trendinteraction <- rotamodel_winteraction %>% 
  gtsummary::tbl_regression(
    exponentiate = TRUE,
    pvalue_fun = ~style_pvalue(.x, digits = 3),
    estimate_fun = ~style_number(.x, digits = 4)) %>%
  gtsummary::add_glance_table(include = c(AIC, deviance, df.residual)) %>% 
  gtsummary::modify_caption("Rotavirus model with seasonality, trend, vaccine and interaction between intervention and trend")

summary_rotamodel_trendinteraction
Rotavirus model with seasonality, trend, vaccine and interaction between intervention and trend
Characteristic IRR 95% CI p-value
index 1.0015 1.0003, 1.0027 0.015
vaccine 0.9833 0.7076, 1.3693 0.919
sin52 0.4290 0.4001, 0.4600 <0.001
cos52 2.0434 1.9036, 2.1935 <0.001
index * vaccine 0.9962 0.9948, 0.9977 <0.001
AIC 5,202

Deviance 505

Residual df 482

Abbreviations: CI = Confidence Interval, IRR = Incidence Rate Ratio

You can visualize the effect of the interaction term using plot_model() from the sjPlot library

Show the code
sjPlot::plot_model(rotamodel_winteraction, type ="int") 

Have there been any changes in seasonality, too?

Note how we can use brackets to introduce interaction terms for several variables.

Show the code
rotamodel_full <- glm.nb(n ~ (sin52 + cos52 + index)*vaccine,
                         data = rota_ts)


summary_rotamodel_full <- rotamodel_full %>% 
  gtsummary::tbl_regression(
    exponentiate = TRUE,
    pvalue_fun = ~style_pvalue(.x, digits = 3),
    estimate_fun = ~style_number(.x, digits = 4)) %>%
  gtsummary::add_glance_table(include = c(AIC, deviance, df.residual)) %>% 
  gtsummary::modify_caption("Rotavirus model with seasonality, trend, vaccine and interaction term")

summary_rotamodel_full
Show the code
sjPlot::plot_model(rotamodel_full, type ="int")
[[1]]


[[2]]


[[3]]

Let’s plot the full model estimate.

Show the code
rota_ts <- rota_ts %>% 
  mutate(fitted_full = fitted(rotamodel_full))


ggplot(data = rota_ts) +
  geom_point(mapping = aes(x = year_week, y = n), alpha = 0.3) +
  geom_line(mapping = aes(x = year_week, y = fitted_full), 
            alpha = 0.7, colour = "green") +
  scale_y_continuous(limits = c(0, NA)) +
  scale_x_yearweek(date_labels = "%Y-%W", date_breaks = "1 year") +
  labs(x = "Year",
       y = "Weekly case counts",
       title = "Rotavirus cases: model with trend, seasonality, intervention \n and interaction terms"
  ) +
  tsa_theme

We can now compare all models next to each other:

Show the code
comparison_all <- gtsummary::tbl_merge(
    tbls = list(summary_rotamodel_seasonality,
                summary_rotamodel_trendinteraction,  
                summary_rotamodel_full),
    tab_spanner = c("Model without interaction terms", 
                    "Interaction with trend only", 
                    "Full model")
  )

comparison_all
Rotavirus model with seasonality, trend and vaccine intervention
Characteristic
Model without interaction terms
Interaction with trend only
Full model
IRR 95% CI p-value IRR 95% CI p-value IRR 95% CI p-value
index 0.9989 0.9982, 0.9996 0.001 1.0015 1.0003, 1.0027 0.015 1.0006 0.9997, 1.0016 0.185
sin52 0.4245 0.3951, 0.4560 <0.001 0.4290 0.4001, 0.4600 <0.001 0.2551 0.2361, 0.2756 <0.001
cos52 2.0527 1.9092, 2.2071 <0.001 2.0434 1.9036, 2.1935 <0.001 2.1715 1.9979, 2.3602 <0.001
vaccine 0.4885 0.4034, 0.5919 <0.001 0.9833 0.7076, 1.3693 0.919 0.9175 0.7086, 1.1897 0.504
AIC 5,225

5,202

4,945

Deviance 506

505

496

Residual df 483

482

480

index * vaccine


0.9962 0.9948, 0.9977 <0.001 0.9971 0.9959, 0.9982 <0.001
sin52 * vaccine





2.6312 2.3698, 2.9216 <0.001
cos52 * vaccine





0.8702 0.7800, 0.9709 0.010
Abbreviations: CI = Confidence Interval, IRR = Incidence Rate Ratio

What about age? Can you see a difference in impact of vaccination depending on age category?

Task 10.2 (Optional)

Discuss with colleagues how you can determine the number of cases prevented as a result of the introduction of the vaccine (Hint: refer to Practical 7!).


  1. Rotavirus infant immunisation programme 2014/15: Vaccine uptake report for England (Published: June 2015; PHE publications gateway number: 2015141)↩︎