Practical Session 10: Assessing the impact of interventions using surveillance data

Expected learning outcomes

By the end of the case study, participants will be able to:

Assess and interpret the effect of an intervention on trends and periodicity in surveillance data

Description of the dataset

Rotavirus is a very common and potentially serious infection of the large bowel, mainly affecting young babies. Nearly every child will have at least one episode of rotavirus gastroenteritis by five years of age. People of any age can be affected, but the illness is more severe in young infants. The rotavirus immunisation programme was introduced in the UK on 1 July 2013 (week 27) with the objective of preventing a significant number of young infants from developing rotavirus infection. It may also provide some additional protection to the wider population through herd immunity. The aim of the rotavirus immunisation programme is to provide two doses of vaccine to infants from six weeks of age and before 24 weeks of age. The first dose of vaccine is offered at approximately eight weeks of age and the second dose at least four weeks after the first dose.

High coverage was rapidly achieved for the first cohort of children offered rotavirus vaccine routinely in England and this has been maintained throughout the first fourteen months of the routine programme. Over this period, rotavirus vaccine coverage for children in the routine cohort averaged 93.3% for one dose and 88.3% for two doses.¹

Task 10.1

You have been given a dataset for notified Rotavirus infections in England and Wales from week 23/2009 up to week 42/2018 (rotavirus.csv).

Assess the impact of the introduction of the vaccination scheme in England and Wales for the whole population. Has the vaccine introduction had an effect on trend or periodicity patterns? Has the vaccine introduction affected the timing of the yearly outbreaks?

Help for Task 10.1

Open the rotavirus dataset, which is a line list of all notified rotavirus infections in the UK between June 2009 and December 2014. Prepare your data and test if the introduction of the vaccine has had an impact on the notified number of cases of rotavirus infections in the UK. Has the trend changed? Has the periodicity changed?

Prepare the rotavirus.csv for time-series analysis. You need to manipulate the dataset before being able to use it for time series analysis.

Show the code

rota <- import(here("data", "rotavirus.csv")) 
view(rota)

As the data in rota shows above, this dataset is structured in a “wide” format. You can see two variables representing dates, and all other variables containing case counts are organised horizontally. Alternatively, we can store the same information by converting the first dataset to a “long” format, where one variable contains the different age group categories of cases (cases_cat), and another displays the corresponding case counts (n) for each age group and year-week.

In order to use ggplot2, remember that the data should be organised in a long format.

The function as_tsibble() is what indicates to R that the data is a time series. index represents the time variable and key represents any categorical variable that splits the data into several time series (eg: location, age group…).

Show the code

## convert to long format
rota_lg <- rota %>% 
    pivot_longer(
      cols = -c(year, week),
      names_to = "cases_cat",
      values_to = "n"
    )

## recode
rota_lg <- rota_lg %>%
  mutate(cases_cat = case_when(
    cases_cat == "case1" ~ "Cases <1 year",
    cases_cat == "case2" ~ "Cases 1 to 4 years",
    cases_cat == "case3" ~ "Cases 5 to 14 years",
    cases_cat == "case4" ~ "Cases 15 to 64 years",
    cases_cat == "case5" ~ "Cases 65+ years",
    cases_cat == "cases" ~ "Total cases",
    TRUE ~ NA_character_
    ))

## turn into ts format and make sure the data is complete
rota_lg <- rota_lg %>%
  mutate(year_week = make_yearweek(year = year, week = week)) %>%
  group_by(cases_cat) %>%
  mutate(index = seq.int(from = 1, to = length(unique(year_week)))) %>%
  ungroup() %>%
  as_tsibble(index = year_week, key = cases_cat) %>%
  complete(index, cases_cat, fill=list(n = 0))

Plot the total number of cases by age group against time. Note that .

Show the code

ggplot(data = rota_lg) +
  geom_line (mapping = aes(x = year_week, y = n), alpha = 0.3) +
  facet_wrap(facets = vars(cases_cat), 
             scales = "free_y") +   
  scale_y_continuous(limits = c(0, NA)) +
  scale_x_yearweek(date_labels = "%Y-%W", 
                   date_breaks = "1 year") +
  labs(
    x = "Year", 
    y = "Weekly case counts",
    title = "Rotavirus cases\n(note differing y scales)"
  ) +
  tsa_theme

Show the code

## Notes: 
# You can get line breaks in labels using \n
# You can also make all y-axes the same by using scales = "fixed_y"

We want to see whether the number of cases changed after the introduction of the vaccine.

For that, first we need to state in the dataset when the vaccine was available. Firstly, we are adding one new vaccine variable to the rota_ts time series. It will be 0 for weeks before week 27 2013, and 1 thereafter.

We have also created sine and cosine variables so that we can model annual seasonality as before.

Show the code

rota_ts <- rota_lg %>%
  mutate(
    vaccine = case_when(
      year >= 2014 ~ 1,
      year <= 2012 ~ 0,
      year == 2013 & week >= 27 ~ 1,
      year == 2013 & week < 27 ~ 0,
      TRUE ~ NA_real_
      ),
    sin52 = sin(2 * pi * index / 52),
    cos52 = cos(2 * pi * index / 52)
  )

## subset to total cases
rota_ts <- rota_ts %>% 
  filter(cases_cat == "Total cases")

Run first a separate model for the time before and after the introduction of the vaccine in England and Wales for all population. Can you detect any change in trend?

Note in the code that we are building further on the tbl_regression function, adding options.

What we saw before:

exponentiate = T to exponentiate coefficients
style_number() to specify the number of digits in the estimates.
add_glance_table() to add key parameters at the bottom of the table.
table_merge() to take several tables built with tbl_regression and put them in one big table.

What we are adding now: - style_pvalue to format the pvalue (here with max 3 digits) - modify_caption to add a table title

Show the code

rotamodel_prevacc_p <- glm(n ~ index,
    data = rota_ts %>% filter(vaccine == 0),
    family = "poisson"
) 

#If the `summary` command is used for a `glm` Poisson model, the log rate ratios are reported by 
#default. They need to be exponentiated 

summary_rotamodel_prevacc_p <- rotamodel_prevacc_p %>% 
  gtsummary::tbl_regression(
    exponentiate = TRUE,
    pvalue_fun = ~style_pvalue(.x, digits = 3), 
    estimate_fun = ~style_number(.x, digits = 4)
    ) %>%
  gtsummary::add_glance_table(include = c(AIC, deviance, df.residual)) %>% 
  gtsummary::modify_caption("Rotavirus model pre-vaccination - Poisson")
  
summary_rotamodel_prevacc_p

Rotavirus model pre-vaccination - Poisson
Characteristic	IRR	95% CI	p-value
index	1.0028	1.0027, 1.0029	<0.001
AIC	78,347
Deviance	76,884
Residual df	210
Abbreviations: CI = Confidence Interval, IRR = Incidence Rate Ratio

Is a Poisson model the best option?

Before we have a look at the period post-vaccination, is a Poisson model really the best model to use here? We need to check whether our data exhibit overdispersion.

Overdispersion occurs when the mean of the data is not equal to its variance, violating the fundamental assumption of the Poisson distribution. If this happens, the model underestimates the standard error of the coefficients, biasing results towards narrower confidence intervals and artifically lower p-values.

Let’s check for overdispersion. We can calculate an overdispersion parameter from a model as the ratio of the deviance and the residuals degrees of freedom. Also, proper statistical test are available, for example using the AER library and the dispersiontest function

Show the code

# Quick check for overdispersion: if check > 1 then use negative binomial
deviance <- summary(rotamodel_prevacc_p)$deviance
residual.df <- summary(rotamodel_prevacc_p)$df.residual

check <- deviance / residual.df
check

[1] 366.1138

Show the code

# Proper check for overdispersion: 
AER::dispersiontest(rotamodel_prevacc_p, trafo = NULL, alternative = c("greater"))


    Overdispersion test

data:  rotamodel_prevacc_p
z = 7.2434, p-value = 2.188e-13
alternative hypothesis: true dispersion is greater than 1
sample estimates:
dispersion 
   446.172

The overdispersion parameter is fairly greater that 1, and the overdispersion test is significant, so we can safely conclude that a negative binomial model is more appropiate. Let’s compute it with the glm.nb function, and compare with the previous one

Show the code

#so best to use a negative binomial here
rotamodel_prevacc_nb <- glm.nb(n ~ index,
                               data = rota_ts %>% filter(vaccine == 0)) 


# Table with results
summary_rotamodel_prevacc_nb <- rotamodel_prevacc_nb %>%
  gtsummary::tbl_regression(
    exponentiate = TRUE,
    pvalue_fun = ~style_pvalue(.x, digits = 3),
    estimate_fun = ~style_number(.x, digits = 4)
    ) %>%
  gtsummary::add_glance_table(include = c(AIC, deviance, df.residual)) %>% 
  gtsummary::modify_caption("Rotavirus model pre-vaccination - Negative Binomial")
  

# Let's compare the two models
comparison_p_nb <- gtsummary::tbl_merge(
  tbls = list(summary_rotamodel_prevacc_p, 
              summary_rotamodel_prevacc_nb),
  tab_spanner = c("Poisson model", "Negative Binomial model")
  )

comparison_p_nb

Rotavirus model pre-vaccination - Poisson
Characteristic	Poisson model			Negative Binomial model
Characteristic	IRR	95% CI	p-value	IRR	95% CI	p-value
index	1.0028	1.0027, 1.0029	<0.001	1.0029	1.0005, 1.0054	0.015
AIC	78,347			2,854
Deviance	76,884			248
Residual df	210			210
Abbreviations: CI = Confidence Interval, IRR = Incidence Rate Ratio

The AIC comparison proves how the negative binomial model outperforms the Poisson model. Also, realize how the index coefficient shifted from a tiny negative to a tiny positive overal trend, and how the p-value has greatly increased (but it still significant)

Let’s use negative binomial models for the rest of the practical.

Now let’s look at the period post-introduction of the vaccine.

We will fit a model for the post-vaccination period and compare it with the pre-vaccination period

Show the code

rotamodel_postvacc <- glm.nb(n ~ index,
                             data = rota_ts %>% filter(vaccine == 1))

# Table with results 
summary_rotamodel_postvacc <- rotamodel_postvacc %>% 
  gtsummary::tbl_regression(
    exponentiate = TRUE,
    pvalue_fun = ~style_pvalue(.x, digits = 3),
    estimate_fun = ~style_number(.x, digits = 4)) %>%
  gtsummary::add_glance_table(include = c(AIC, deviance, df.residual)) %>% 
  gtsummary::modify_caption("Rotavirus model post-vaccination")


# Comparisson between models
comparison_pre_post <- gtsummary::tbl_merge(
    tbls = list(summary_rotamodel_prevacc_nb, 
                summary_rotamodel_postvacc),
    tab_spanner = c("Pre-vaccination", "Post-vaccination")
  )

comparison_pre_post

Rotavirus model pre-vaccination - Negative Binomial
Characteristic	Pre-vaccination			Post-vaccination
Characteristic	IRR	95% CI	p-value	IRR	95% CI	p-value
index	1.0029	1.0005, 1.0054	0.015	0.9977	0.9967, 0.9988	<0.001
AIC	2,854			2,830
Deviance	248			293
Residual df	210			274
Abbreviations: CI = Confidence Interval, IRR = Incidence Rate Ratio

We can see a difference in trends, but the two models cannot be directly compared.

Now we want to run one single model for both periods (before/after the introduction of the vaccine)

Show the code

rotamodel_trend <- glm.nb(n ~ index + vaccine,
                          data = rota_ts)

# Table with results
summary_rotamodel_trend <- rotamodel_trend %>% 
  gtsummary::tbl_regression(
    exponentiate = TRUE,
    pvalue_fun = ~style_pvalue(.x, digits = 3),
    estimate_fun = ~style_number(.x, digits = 4)) %>%
  gtsummary::add_glance_table(include = c(AIC, deviance, df.residual)) %>% 
  gtsummary::modify_caption("Rotavirus model with trend and vaccine intervention")

summary_rotamodel_trend

Rotavirus model with trend and vaccine intervention
Characteristic	IRR	95% CI	p-value
index	0.9995	0.9983, 1.0006	0.341
vaccine	0.2624	0.1940, 0.3551	<0.001
AIC	5,768
Deviance	544
Residual df	485
Abbreviations: CI = Confidence Interval, IRR = Incidence Rate Ratio

How would you interpret these results?

We can plot the model against the raw data.

Show the code

rota_ts <- rota_ts %>% 
  mutate(fitted_trend = fitted(rotamodel_trend))
   

ggplot(data = rota_ts) +
  geom_point(mapping = aes(x = year_week, y = n), alpha = 0.3) +
  geom_line(mapping = aes(x = year_week, y = fitted_trend), 
            alpha = 0.7, colour = "green") +
  scale_y_continuous(limits = c(0, NA)) +
  scale_x_yearweek(date_labels = "%Y-%W", 
                   date_breaks = "1 year") +
  labs(x = "Year", 
       y = "Weekly case counts",
       title = "Rotavirus cases: model with trend and intervention"
  ) +
  tsa_theme

Now we want to run one single model for both periods (before/after the introduction of the vaccine), this time including seasonality.

Show the code

rotamodel_seasonality <- glm.nb(n ~ index + sin52 + cos52 + vaccine,
                                data = rota_ts)


summary_rotamodel_seasonality <- rotamodel_seasonality %>% 
  gtsummary::tbl_regression(
    exponentiate = TRUE,
    pvalue_fun = ~style_pvalue(.x, digits = 3),
    estimate_fun = ~style_number(.x, digits = 4)) %>%
  gtsummary::add_glance_table(include = c(AIC, deviance, df.residual)) %>% 
  gtsummary::modify_caption("Rotavirus model with seasonality, trend and vaccine intervention")

summary_rotamodel_seasonality

Rotavirus model with seasonality, trend and vaccine intervention
Characteristic	IRR	95% CI	p-value
index	0.9989	0.9982, 0.9996	0.001
sin52	0.4245	0.3951, 0.4560	<0.001
cos52	2.0527	1.9092, 2.2071	<0.001
vaccine	0.4885	0.4034, 0.5919	<0.001
AIC	5,225
Deviance	506
Residual df	483
Abbreviations: CI = Confidence Interval, IRR = Incidence Rate Ratio

We can plot the model against the raw data again.

Show the code

rota_ts <- rota_ts %>% 
  mutate(fitted_seasonality = fitted(rotamodel_seasonality))
 

ggplot(data = rota_ts) + 
  geom_point(mapping = aes(x = year_week, y = n), alpha = 0.3) +
  geom_line(mapping = aes(x = year_week, y = fitted_seasonality), 
            alpha = 0.7, colour = "green") +
  scale_y_continuous(limits = c(0, NA)) +
  scale_x_yearweek(date_labels = "%Y-%W", 
                   date_breaks = "1 year") +
  labs(x = "Year", 
       y = "Weekly case counts",
       title = "Rotavirus cases: model with trend, seasonality, and intervention"
    ) +
  tsa_theme

To be able to comment on any change in trend after the introduction of the vaccine, we need to either include an interaction term in the model or include a time variable that starts at the introduction of the vaccine. Let’s here use an interaction term.

Show the code

rotamodel_winteraction <- glm.nb(n ~ index * vaccine + sin52 + cos52,
                                 data = rota_ts)

summary_rotamodel_trendinteraction <- rotamodel_winteraction %>% 
  gtsummary::tbl_regression(
    exponentiate = TRUE,
    pvalue_fun = ~style_pvalue(.x, digits = 3),
    estimate_fun = ~style_number(.x, digits = 4)) %>%
  gtsummary::add_glance_table(include = c(AIC, deviance, df.residual)) %>% 
  gtsummary::modify_caption("Rotavirus model with seasonality, trend, vaccine and interaction between intervention and trend")

summary_rotamodel_trendinteraction

Rotavirus model with seasonality, trend, vaccine and interaction between intervention and trend
Characteristic	IRR	95% CI	p-value
index	1.0015	1.0003, 1.0027	0.015
vaccine	0.9833	0.7076, 1.3693	0.919
sin52	0.4290	0.4001, 0.4600	<0.001
cos52	2.0434	1.9036, 2.1935	<0.001
index * vaccine	0.9962	0.9948, 0.9977	<0.001
AIC	5,202
Deviance	505
Residual df	482
Abbreviations: CI = Confidence Interval, IRR = Incidence Rate Ratio

You can visualize the effect of the interaction term using plot_model() from the sjPlot library

Show the code

sjPlot::plot_model(rotamodel_winteraction, type ="int")

Have there been any changes in seasonality, too?

Note how we can use brackets to introduce interaction terms for several variables.

Show the code

rotamodel_full <- glm.nb(n ~ (sin52 + cos52 + index)*vaccine,
                         data = rota_ts)


summary_rotamodel_full <- rotamodel_full %>% 
  gtsummary::tbl_regression(
    exponentiate = TRUE,
    pvalue_fun = ~style_pvalue(.x, digits = 3),
    estimate_fun = ~style_number(.x, digits = 4)) %>%
  gtsummary::add_glance_table(include = c(AIC, deviance, df.residual)) %>% 
  gtsummary::modify_caption("Rotavirus model with seasonality, trend, vaccine and interaction term")

summary_rotamodel_full

Rotavirus model with seasonality, trend, vaccine and interaction term
Characteristic	IRR	95% CI	p-value
sin52	0.2551	0.2361, 0.2756	<0.001
cos52	2.1715	1.9979, 2.3602	<0.001
index	1.0006	0.9997, 1.0016	0.185
vaccine	0.9175	0.7086, 1.1897	0.504
sin52 * vaccine	2.6312	2.3698, 2.9216	<0.001
cos52 * vaccine	0.8702	0.7800, 0.9709	0.010
index * vaccine	0.9971	0.9959, 0.9982	<0.001
AIC	4,945
Deviance	496
Residual df	480
Abbreviations: CI = Confidence Interval, IRR = Incidence Rate Ratio

Show the code

sjPlot::plot_model(rotamodel_full, type ="int")

[[1]]


[[2]]


[[3]]

Let’s plot the full model estimate.

Show the code

rota_ts <- rota_ts %>% 
  mutate(fitted_full = fitted(rotamodel_full))


ggplot(data = rota_ts) +
  geom_point(mapping = aes(x = year_week, y = n), alpha = 0.3) +
  geom_line(mapping = aes(x = year_week, y = fitted_full), 
            alpha = 0.7, colour = "green") +
  scale_y_continuous(limits = c(0, NA)) +
  scale_x_yearweek(date_labels = "%Y-%W", date_breaks = "1 year") +
  labs(x = "Year",
       y = "Weekly case counts",
       title = "Rotavirus cases: model with trend, seasonality, intervention \n and interaction terms"
  ) +
  tsa_theme

We can now compare all models next to each other:

Show the code

comparison_all <- gtsummary::tbl_merge(
    tbls = list(summary_rotamodel_seasonality,
                summary_rotamodel_trendinteraction,  
                summary_rotamodel_full),
    tab_spanner = c("Model without interaction terms", 
                    "Interaction with trend only", 
                    "Full model")
  )

comparison_all

Rotavirus model with seasonality, trend and vaccine intervention
Characteristic	Model without interaction terms			Interaction with trend only			Full model
Characteristic	IRR	95% CI	p-value	IRR	95% CI	p-value	IRR	95% CI	p-value
index	0.9989	0.9982, 0.9996	0.001	1.0015	1.0003, 1.0027	0.015	1.0006	0.9997, 1.0016	0.185
sin52	0.4245	0.3951, 0.4560	<0.001	0.4290	0.4001, 0.4600	<0.001	0.2551	0.2361, 0.2756	<0.001
cos52	2.0527	1.9092, 2.2071	<0.001	2.0434	1.9036, 2.1935	<0.001	2.1715	1.9979, 2.3602	<0.001
vaccine	0.4885	0.4034, 0.5919	<0.001	0.9833	0.7076, 1.3693	0.919	0.9175	0.7086, 1.1897	0.504
AIC	5,225			5,202			4,945
Deviance	506			505			496
Residual df	483			482			480
index * vaccine				0.9962	0.9948, 0.9977	<0.001	0.9971	0.9959, 0.9982	<0.001
sin52 * vaccine							2.6312	2.3698, 2.9216	<0.001
cos52 * vaccine							0.8702	0.7800, 0.9709	0.010
Abbreviations: CI = Confidence Interval, IRR = Incidence Rate Ratio

What about age? Can you see a difference in impact of vaccination depending on age category?

Task 10.2 (Optional)

Discuss with colleagues how you can determine the number of cases prevented as a result of the introduction of the vaccine (Hint: refer to Practical 7!).

Rotavirus infant immunisation programme 2014/15: Vaccine uptake report for England (Published: June 2015; PHE publications gateway number: 2015141)↩︎