At the end of the session, participants will be able to:
Consider the effect of confounding and effect modification on the association between exposure and disease
Perform stratified analysis using the Mantel-Haenszel approach
Stratified analysis is the first step in identifying confounding factors and effect modifiers one by one (after first thinking about which variables could plausibly be confounders or effect modifiers). As the final step, you will use regression models to account for confounding and check for effect modification. We will cover these with you in the Multivariable Module (MVA) next year.
2. Story/plot description
From the univariable analysis, it seems that eating pasta and eating veal as well as drinking champagne are associated with the highest risk of becoming ill. There are, however, many other food items that are associated with an increased risk (even if not statistically significant).
Next, think about potential confounders and effect modifiers, and decide which variables you want to check. One common strategy is to base this decision on the univariable results and a p-value threshold of 0.20-0.25. Food items that are known risk factors for gastroenteritis can also be included regardless of their univariable p-value.
3. Questions/Assignments
3.1. Confounders and effect modification
Discuss how to identify potential confounders and effect modifiers. Draw dummy tables before coding so that it is clear what you want to achieve.
3.2. Install packages and load libraries
# Load the required libraries into the current R session:
pacman::p_load(rio, here, tidyverse, skimr, plyr, janitor, lubridate, gtsummary, flextable, officer, epikit, apyramid, scales, EpiStats)
3.3. Import your data
# Import the cleaned data set:
copdata <- rio::import(here::here("data", "Spetses_clean2_2024.rds"), trust = TRUE)
3.4. Consider and assess for confounding and/or effect modification
Have a look at the relative risk of being a case after eating a specific food item (for example, veal), stratified by another variable (for example, pasta). You may consider stratifying by pasta, as it has the highest RR in the univariable analysis.
There are many variables in this dataset, and it might not make sense to stratify every variable by every other variable in our search for effect modifiers and confounders.
However, we also don’t want to be too restrictive: because of confounding, a variable that is causally associated with the outcome might not reach the significance level we chose (say 5%) in the univariable analysis. Therefore, we could test all variables that are statistically significant at the 15%, 20%, or 25% level (the specific percentage to be decided by your group). In our solutions here, we look at veal and champagne, stratified by pasta, but you may decide to look at other food items as well.
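One way to apply such a threshold is sketched below. This is an illustration only: `screen_exposures()` is a hypothetical helper (it is not part of EpiStats), the toy data frame and its column names are invented so that the example runs on its own, and Fisher's exact test is just one possible choice of univariable test.

```r
# Hypothetical helper: keep exposures whose univariable p-value is below a threshold.
# Assumes the outcome and all exposures are binary columns of `data`.
screen_exposures <- function(data, outcome, exposures, threshold = 0.25) {
  p_values <- vapply(
    exposures,
    function(v) fisher.test(table(data[[v]], data[[outcome]]))$p.value,
    numeric(1)
  )
  out <- data.frame(
    exposure = exposures,
    p_value = unname(p_values),
    stringsAsFactors = FALSE
  )
  out[out$p_value < threshold, , drop = FALSE]
}

# Toy data (invented, not the case study data):
# food_a is perfectly associated with being a case, food_b is unrelated.
toy <- data.frame(
  case   = c(1, 1, 1, 1, 0, 0, 0, 0, 1, 0),
  food_a = c(1, 1, 1, 1, 0, 0, 0, 0, 1, 0),
  food_b = c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0)
)

screened <- screen_exposures(toy, outcome = "case", exposures = c("food_a", "food_b"))
screened
```

With the case study data, the call could look like `screen_exposures(copdata, "case", c("veal", "pasta", "champagne"))`, extended with whichever exposures your group wants to screen.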
Need a little bit of help?
Use the function csinter() of the EpiStats package.
a) Veal as exposure of interest, stratified by having eaten pasta
If we stratify the effect of veal by pasta we ask the question: does eating pasta modify or confound the association between eating veal and being a case?
stratall <- copdata %>%
  # Mutate across to convert cases to numeric:
  mutate(across(.cols = case, .fns = ~ as.numeric(.)))

# Pass data to the csinter function:
pastastrata <- csinter(x = stratall, cases = "case", exposure = "veal", by = "pasta")

# Print the stratified results:
pastastrata
$df1
  CSInter case - veal by(pasta)  Total  Cases  Risk %  P.est.           Stats  95%CI.ll  95%CI.ul
1 pasta = 1                      338    <NA>   NA      Risk difference  0.10   -0.25     0.45
2 Exposed                        330    198    60.00   Risk Ratio       1.20    0.60     2.41
3 Unexposed                      8      4      50.00   Attrib.risk.exp  0.17   -0.68     0.59
4                                NA     <NA>   NA      Attrib.risk.pop  0.16    NA       NA
5 pasta = 0                      36     <NA>   NA      Risk difference  0.02   -0.36     0.40
6 Exposed                        8      3      37.50   Risk Ratio       1.05    0.38     2.92
7 Unexposed                      28     10     35.71   Attrib.risk.exp  0.05   -1.65     0.66
8                                NA     <NA>   NA      Attrib.risk.pop  0.01    NA       NA
9 Missing / Missing %            3      0.8%   NA      <NA>             NA      NA       NA

$df2
  Point Estimate                  Chi2  p.value  Stats   95%CI.ll  95%CI.ul
1 Woolf test of homogeneity       0.04  0.833    NA      NA        NA
2 Crude RR for veal               NA    NA       1.53    1.01      2.32
3 MH RR veal adjusted for pasta   NA    NA       1.15    0.64      2.04
4 Adjusted/crude relative change  NA    NA       -25.08  NA        NA
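To see where the crude RR, the Mantel-Haenszel RR and the relative change come from, the calculation can be reproduced by hand from the stratum counts printed in `$df1`. This is a minimal sketch using the standard Mantel-Haenszel formula for risk ratios; the object and column names are illustrative, and it is not a substitute for `csinter()`, which also provides confidence intervals.

```r
# Stratum-specific counts for veal, copied from the csinter() output ($df1)
strata <- data.frame(
  pasta           = c(1, 0),
  exposed_total   = c(330, 8),   # ate veal
  exposed_cases   = c(198, 3),
  unexposed_total = c(8, 28),    # did not eat veal
  unexposed_cases = c(4, 10)
)
strata$n <- strata$exposed_total + strata$unexposed_total

# Stratum-specific risk ratios
strata$rr <- (strata$exposed_cases / strata$exposed_total) /
  (strata$unexposed_cases / strata$unexposed_total)

# Crude RR: collapse the strata and ignore pasta
rr_crude <- (sum(strata$exposed_cases) / sum(strata$exposed_total)) /
  (sum(strata$unexposed_cases) / sum(strata$unexposed_total))

# Mantel-Haenszel RR: weighted combination of the stratum-specific estimates
rr_mh <- sum(strata$exposed_cases * strata$unexposed_total / strata$n) /
  sum(strata$unexposed_cases * strata$exposed_total / strata$n)

# Relative change between the adjusted and the crude RR, in percent
rel_change <- 100 * (rr_mh - rr_crude) / rr_crude

round(c(crude = rr_crude, mh = rr_mh, rel_change = rel_change), 2)
```

Rounded to two decimals, these values agree with the `$df2` output: a crude RR of 1.53, an MH RR of 1.15 and a relative change of -25.08%.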
Let’s check whether pasta is associated with veal. If pasta is a potential confounder, it must be associated with the exposure (veal):
# Perform Wilcoxon rank sum test on veal and pasta:
wilcox.test(veal ~ pasta, data = copdata)
Wilcoxon rank sum test with continuity correction
data: veal by pasta
W = 1496, p-value < 2.2e-16
alternative hypothesis: true location shift is not equal to 0
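Because veal and pasta both appear to be binary variables (an assumption based on the `csinter()` output above), the association can also be examined directly in a 2x2 table. The sketch below rebuilds that table from the stratum totals in `$df1`, so the records that `csinter()` reported as missing are not included, and applies Fisher's exact test. This is a common alternative for categorical data, not the test used in the analysis above.

```r
# 2x2 table of pasta by veal, rebuilt from the stratum totals in $df1
pasta_veal <- matrix(
  c(330, 8,    # ate pasta: ate veal, did not eat veal
    8, 28),    # no pasta:  ate veal, did not eat veal
  nrow = 2, byrow = TRUE,
  dimnames = list(pasta = c("yes", "no"), veal = c("yes", "no"))
)

# Row percentages: share of veal eaters among pasta eaters and non-eaters
round(100 * prop.table(pasta_veal, margin = 1), 1)

# Fisher's exact test of association between pasta and veal
pasta_veal_test <- fisher.test(pasta_veal)
pasta_veal_test$p.value
```

About 98% of those who ate pasta also ate veal (330 of 338), compared with about 22% of those who did not eat pasta (8 of 36), which is the pattern the rank sum test picks up.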
b) Champagne as exposure of interest, stratified by having eaten pasta
# Pass data to the csinter function:
champstrata <- csinter(x = stratall, cases = "case", exposure = "champagne", by = "pasta")
Let’s stop and… think!
Are there any indications to make you think there may be effect modification and/or confounding?
We could stratify by pasta (the strongest risk factor in the univariable analysis), to examine if pasta confounds the association between eating veal and being a case. Before stratification, we will need to check if pasta meets the conditions of being a confounder. For a variable to be a confounder it needs to be associated both with the outcome (being a case) and with the exposure (and not be in the causal pathway between exposure and outcome). We know from univariable analysis that pasta is associated with being a case. If we run a Wilcoxon rank sum test, we will see that pasta is also associated with veal (indeed, you can see that most people either had both pasta and veal or neither of these food items, so they are associated with each other).
Above, we used csinter() to stratify and saved the result as “pastastrata”.
pastastrata: Within the stratum of people who ate pasta, veal has no significant effect (RR = 1.20, 95% CI: 0.60 - 2.41). The same holds within the stratum of people who didn’t eat pasta (RR = 1.05, 95% CI: 0.38 - 2.92). The adjusted MH-RR also suggests that veal has no effect (RRadj = 1.15, 95% CI: 0.64 - 2.04). To identify confounding, we look at the % change between the crude and the adjusted RR, which csinter reports as “Adjusted/crude relative change”. Here the MH-RR is about 25% lower than the crude RR (1.15 versus 1.53). This difference of more than 20% suggests that pasta confounds the association between veal and the disease.
This result suggests that veal is not a risk factor for the disease and that the crude effect was due to confounding by pasta.
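Effect modification can be examined with the same stratum counts: if pasta modified the effect of veal, the stratum-specific RRs would differ by more than chance would explain. The sketch below computes a test of homogeneity in its usual inverse-variance (Woolf) form on the log RR scale. It assumes this is the form reported in `$df2`; the exact implementation inside `csinter()` may differ, and the object names are illustrative.

```r
# Stratum counts for veal, copied from the csinter() output ($df1);
# first element: pasta = 1, second element: pasta = 0
a  <- c(198, 3)    # cases among those who ate veal
n1 <- c(330, 8)    # all who ate veal
c0 <- c(4, 10)     # cases among those who did not eat veal
n0 <- c(8, 28)     # all who did not eat veal

# Stratum-specific log risk ratios and their approximate variances
log_rr <- log((a / n1) / (c0 / n0))
var_log_rr <- 1 / a - 1 / n1 + 1 / c0 - 1 / n0

# Inverse-variance weights and the pooled log RR
w <- 1 / var_log_rr
pooled_log_rr <- sum(w * log_rr) / sum(w)

# Homogeneity statistic: weighted squared deviations from the pooled estimate
woolf_chi2 <- sum(w * (log_rr - pooled_log_rr)^2)
woolf_p <- pchisq(woolf_chi2, df = length(a) - 1, lower.tail = FALSE)

round(c(chi2 = woolf_chi2, p.value = woolf_p), 3)
```

A large p-value here, as in the `$df2` output (p = 0.833), gives no indication that the RR for veal differs between the pasta strata, so there is no evidence of effect modification by pasta.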
If you stratify by veal, you see that veal does not confound the association between pasta and the disease. The same applies if you stratify the exposure to pasta by other variables. Together with the higher RR for pasta and the dose-response relationship found earlier for pasta (remember, this was optional), these results provide additional evidence that there was something going on with the pasta with pesto dish!