1. Learning outcomes

At the end of the session, participants will be able to:

  • Consider the effect of confounding and effect modification on the association between exposure and disease,

  • Perform stratified analysis using the Mantel-Haenszel approach

The use of stratified analysis, is the first step to identify confounding factors and effect modifiers one by one (after, of course, thinking which variables could potentially be confounders or effect modifiers). As the final step, you will be using Regression Models to account for confounding and check for effect modification. We will see these with you in the Multivariable Module (MVA), next year.

2. Story/plot description

From the univariable analysis, it seems that eating orzpo and eating moussaka as well as drinking champagne are associated with the highest risk of becoming ill. There are, however, many other food items that are associated with an increased risk (even if not statistically significant).

You should next think about potential confounders and about effect modification. Think about which variables you might want to check for effect modification or confounding. One common strategy is to base this decision on the results obtained in the univariable analysis and a p-value threshold of 0.20-0.25. Also, food items that are known risk factors for gastroenteritis could also be included regardless of their univariable p-value.

3. Questions/Assignments

3.1. Confounders and effect modification

Discuss how to identify potential confounders and effect modification. Draw dummy tables before coding to have clear what you want to achieve.

3.2. Install packages and load libraries

Code
# Load the required libraries into the current R session:
pacman::p_load(rio, 
               here, 
               tidyverse, 
               skimr,
               plyr,
               janitor,
               lubridate,
               gtsummary, 
               flextable,
               officer,
               epikit, 
               apyramid, 
               scales,
               EpiStats)

3.3. Import your data

Code
# Import the raw data set: 
copdata <- rio::import(here::here("data", "Spetses_clean2_2024.rds"))

3.4. Consider and assess for confounding and/or effect modification

Have a look at the relative risk for being a case having eaten a specific food item (for example, moussake), when stratified by another variable (for example, orzo). You may consider stratifying by orzo, as it has the highest RR in the univariable analysis.

There are many variables in this dataset and it might not make sense to stratify each variable by each other variable on our search for effect modifiers and confounders.

However, we also don’t want to be too restrictive as a variable which actually (i.e. causally) is associated with the outcome might not show a significant association at the significant level we decided (say 5%) in the univariable analysis due to confounding. Therefore, we could test all variables statistically significant at the 15%, 20%, or 25% level (specific percentage to be decided by your group). In our solutions here, we are looking at moussaka and champagne, stratified by orzo, but you may decide to look at other food items as well.

Use the function csinter() of the EpiStats package.

a) Moussaka as exposure of interest, stratified by having eaten orzo

If we stratify the effect of moussaka by orzo we ask the question: does eating orzo modify or confound the association between eating moussaka and being a case?

Code
stratall <- copdata %>% 
  # Mutate across to convert cases to numeric:
  mutate(across(.cols = case, 
                .fns = ~ as.numeric(.)))

# Pass data to the csinter function:
orzostrata <- csinter(x = stratall, 
                       cases = "case", 
                       exposure = "moussaka", 
                       by = "orzo")

orzostrata
$df1
  CSInter case - moussaka by(orzo) Total Cases Risk %          P.est. Stats
1                         orzo = 1   338  <NA>     NA Risk difference  0.10
2                          Exposed   330   198  60.00      Risk Ratio  1.20
3                        Unexposed     8     4  50.00 Attrib.risk.exp  0.17
4                                     NA  <NA>     NA Attrib.risk.pop  0.16
5                         orzo = 0    36  <NA>     NA Risk difference  0.02
6                          Exposed     8     3  37.50      Risk Ratio  1.05
7                        Unexposed    28    10  35.71 Attrib.risk.exp  0.05
8                                     NA  <NA>     NA Attrib.risk.pop  0.01
9              Missing / Missing %     3  0.8%     NA            <NA>    NA
  95%CI.ll 95%CI.ul
1    -0.25     0.45
2     0.60     2.41
3    -0.68     0.59
4       NA       NA
5    -0.36     0.40
6     0.38     2.92
7    -1.65     0.66
8       NA       NA
9       NA       NA

$df2
                    Point Estimate Chi2 p.value  Stats 95%CI.ll 95%CI.ul
1        Woolf test of homogeneity 0.04   0.833     NA       NA       NA
2            Crude RR for moussaka   NA      NA   1.53     1.01     2.32
3 MH RR moussaka adjusted for orzo   NA      NA   1.15     0.64     2.04
4   Adjusted/crude relative change   NA      NA -25.08       NA       NA

Let’s check if orzo is associated with moussaka (if we are thinking moussaka may be a confounder, we need to see if there is an association between the potential confounder (moussaka) and the exposure (orzo)):

Code
# Perform Wilcoxon rank sum test on orzo and moussaka:
wilcox.test(orzo ~ moussaka, 
            data = copdata)

    Wilcoxon rank sum test with continuity correction

data:  orzo by moussaka
W = 1496, p-value < 2.2e-16
alternative hypothesis: true location shift is not equal to 0

b) Champagne as exposure of interest, stratified by having eaten orzo

Code
# Pass data to the csinter function:
champstrata <- csinter(x = stratall, 
                       cases = "case", 
                       exposure = "champagne", 
                       by = "orzo")

Are there any indications to make you think there may be effect modification and/or confounding?

We could stratify by orzo (the strongest risk factor in the univariable analysis), to examine if orzo confounds the association between eating moussaka and being a case. Before stratification, we will need to check if orzo meets the conditions of being a confounder. For a variable to be a confounder it needs to be associated both with the outcome (being a case) and with the exposure (and not be in the causal pathway between exposure and outcome). We know from univariable analysis that orzo is associated with being a case. If we run a Wilcoxon rank sum test, we will see that orzo is also associated with moussaka (indeed, you can see that most people either had both moussaka and orzo or neither of these food items, so they are associated with each other).

Above, we uses ccinter to stratify and save the object as “orzostrata”.

orzostrata: Within the stratum of the people who ate orzo, moussaka has no significant effect (RR = 1.20, CI: 0.60 - 2.41). The same holds within the stratum of people who didn’t eat orzo (RR = 1.05, CI = 0.38, 2.92). The adjusted MH-RR also suggests that moussaka has no effect (RRadj = 1.15, CI: 0.80 - 2.85).  To identify confounding, we want to look at the % change between the crude and the adjusted RR. This is given by the csinter output “Adjusted/crude relative change”.  The difference between the crude and the MH-RR in this case is >20% suggesting that orzo confounds the association between moussaka and the disease.

This result suggest that moussaka is not a risk factor of the disease and that the crude observed effect was due to the confounding effect of orzo.

If you stratify by moussaka, you see that moussaka does not confound the association between orzo and the disease. The same applies if you stratify the exposure to orzo by other variables. The above, the higher RR for orzo and the dose response relationship we found earlier for orzo (remember this was optional) provide additional evidence that there was something going on with the orzo with pesto dish!