Let’s denote the vaccine-symptom pairs from the VAERS database as \(2 \times 2\) contingency tables:
Target vaccine | Target symptom | All other symptoms | Total |
---|---|---|---|
Yes | \(n_{ij}\) | \(n_i - n_{ij}\) | \(n_i\) |
No | \(n_j - n_{ij}\) | \(n - n_i - n_j + n_{ij}\) | \(n - n_i\) |
Total | \(n_j\) | \(n - n_j\) | \(n\) |
It is often the case than not that \(n_{ij}\) occures as a rare event such that the observed count follows a Poisson distribution. In the GPS algorithm, we model the relative reporting ratio (RRR), which measures the ratio of how many symptoms under vaccine were actually observed over the number of expected symptoms, under the assumption that the symptoms and vaccine exposure are independent. The RRR (\(\lambda\)) is defined as
\[ \lambda_{ij} = \frac{\mu_{ij}}{E_{ij}}. \]
\(\mu_{ij}\) is the mean of the Poisson distribution of \(n_{ij}\). \(E_{ij}\) is the expected event count under the assumption that the vaccine exposure and the symptom are idependent. It can be estimated as
\[ \hat{E_{ij}} = \frac{n_i n_j}{n}. \]
For \(\lambda\), we can use a gamma distribution as a conjugate prior since the Poisson distributed \(n_{ij}\). (\(n_{ij} \sim Poisson (\mu_{ij})\)). In the GPS model, a simple mixture of two Gamma distributions was used as the prior to allow for more flexibility in the model:
\[ \lambda_{ij} \sim P \ \text{Gamma}(\alpha_1, \beta_1) + (1-P) \ \text{Gamma}(\alpha_2, \beta_2). \]
Using an empirical Bayesian approach, we can use the data to estimate the paramaters (\(\alpha_1\), \(\alpha_2\), \(\beta_1\), \(\beta_2\), \(P\)), calculate the posterior for \(\lambda\), and then derive the expectation of \(\log(\lambda)\):
\[ E(\log(\lambda)) = Q (\psi (\alpha_1 + n_{ij}) - \log(1/\beta_1 + E_{ij})) + (1-Q) (\psi (\alpha_2 + n_{ij}) - \log(1/\beta_2) + E_{ij}) \]
where
\[ Q = \frac{P \cdot \text{NB}(n_{ij}, E_{ij} \ | \ \alpha_1, \beta_1)}{P \cdot \text{NB}(n_{ij}, E_{ij} \ | \ \alpha_1, \beta_1) + (1-P) \cdot \text{NB}(n_{ij}, E_{ij} \ | \ \alpha_2, \beta_2)} \]
\(\psi\) is the digamma function. \(\text{NB}\) is the negative binomial distribution.
With the expectation of \(\log(\lambda)\), we can define the Empirical Bayesian Geometric Mean (EBGM) as
\[EBGM = e^{E(\log(\lambda))}.\]
The fifth percentile of the posterior distribution of \(\lambda\) is denoted as EB05 and interpreted as the lower one-sided 95% confidence limit for the EBGM. This estimator gives more conservative estimates when the event counts are small, thus yields a shrinkage estimate which could reduce the number of false-positive signals.
To fit a Gamma Poisson Shrinker model (DuMouchel 1999), we first calculate actual counts (N), expected counts (E) of each vaccine-symptom combination, relative reporting ratio (RRR), and proportional reporting ratio (PRR) using age + gender stratification to control potential confounding effects:
library("openEBGM")
library("magrittr")
library("kableExtra")
library("ggsci")
df_p <- readRDS("data-processed/df.rds") %>%
processRaw(stratify = TRUE, zeroes = FALSE)
# stratification variables used: strat_age, strat_gender
# there were 12 strata: <= 2-Female, <= 2-Male, <= 2-Unknown, > 65-Female, > 65-Male, > 65-Unknown, 18 - 65-Female, 18 - 65-Male, 18 - 65-Unknown, 2 - 18-Female, 2 - 18-Male, 2 - 18-Unknown
Sanity check:
df_p %>% head()
var1 var2 N E RR
1 ADENOVIRUS (NO BRAND NAME) Arthralgia 1 0.119480157 8.37
2 ADENOVIRUS (NO BRAND NAME) Dyspnoea 1 0.083668035 11.95
3 ADENOVIRUS (NO BRAND NAME) Face oedema 1 0.011163701 89.58
4 ADENOVIRUS (NO BRAND NAME) Gait disturbance 1 0.014375342 69.56
5 ADENOVIRUS (NO BRAND NAME) Osteoarthritis 1 0.005387014 185.63
6 ADENOVIRUS (NO BRAND NAME) Petechiae 1 0.002717102 368.04
PRR
1 16.26
2 17.43
3 86.42
4 67.12
5 296.72
6 212.56
# var1 var2 N E RR PRR
# 1 ADENOVIRUS (NO BRAND NAME) Arthralgia 1 0.119480157 8.37 16.26
# 2 ADENOVIRUS (NO BRAND NAME) Dyspnoea 1 0.083668035 11.95 17.43
# 3 ADENOVIRUS (NO BRAND NAME) Face oedema 1 0.011163701 89.58 86.42
# 4 ADENOVIRUS (NO BRAND NAME) Gait disturbance 1 0.014375342 69.56 67.12
# 5 ADENOVIRUS (NO BRAND NAME) Osteoarthritis 1 0.005387014 185.63 296.72
# 6 ADENOVIRUS (NO BRAND NAME) Petechiae 1 0.002717102 368.04 212.56
Save processed data to file:
df_p %>% saveRDS(file = "data-processed/df_p.rds")
df_p <- readRDS("data-processed/df_p.rds")
squashed <- squashData(df_p, bin_size = 100, keep_pts = 0)
squashed <- squashData(squashed, count = 2, bin_size = 12)
library("foreach")
library("doParallel")
registerDoParallel(parallel::detectCores())
suppressMessages(library("DEoptim"))
set.seed(2020)
theta_hat <- DEoptim(
negLLsquash,
lower = c(rep(1e-05, 4), 0.001),
upper = c(rep(5, 4), 1 - 0.001),
control = DEoptim.control(
itermax = 500,
reltol = 1e-04,
steptol = 200,
NP = 100,
CR = 0.85,
F = 0.75,
trace = 25,
parallelType = 2
),
ni = squashed$N, ei = squashed$E, wi = squashed$weight
)
Iteration: 25 bestvalit: 325555.202871 bestmemit: 0.014104 0.116785 1.749330 1.712070 0.101938
Iteration: 50 bestvalit: 325354.375330 bestmemit: 0.002284 0.113023 2.013934 1.947101 0.094070
Iteration: 75 bestvalit: 325347.784248 bestmemit: 0.000153 0.115025 2.093288 2.020985 0.097340
Iteration: 100 bestvalit: 325347.575191 bestmemit: 0.000036 0.116657 2.103292 2.031426 0.099336
Iteration: 125 bestvalit: 325347.567304 bestmemit: 0.000010 0.116819 2.103899 2.031816 0.099493
Iteration: 150 bestvalit: 325347.567208 bestmemit: 0.000010 0.116802 2.103769 2.031757 0.099480
Iteration: 175 bestvalit: 325347.567205 bestmemit: 0.000010 0.116798 2.103761 2.031760 0.099474
Iteration: 200 bestvalit: 325347.567205 bestmemit: 0.000010 0.116799 2.103766 2.031765 0.099474
Iteration: 225 bestvalit: 325347.567205 bestmemit: 0.000010 0.116799 2.103766 2.031764 0.099475
# Iteration: 25 bestvalit: 325555.202871 bestmemit: 0.014104 0.116785 1.749330 1.712070 0.101938
# Iteration: 50 bestvalit: 325354.375330 bestmemit: 0.002284 0.113023 2.013934 1.947101 0.094070
# Iteration: 75 bestvalit: 325347.784248 bestmemit: 0.000153 0.115025 2.093288 2.020985 0.097340
# Iteration: 100 bestvalit: 325347.575191 bestmemit: 0.000036 0.116657 2.103292 2.031426 0.099336
# Iteration: 125 bestvalit: 325347.567304 bestmemit: 0.000010 0.116819 2.103899 2.031816 0.099493
# Iteration: 150 bestvalit: 325347.567208 bestmemit: 0.000010 0.116802 2.103769 2.031757 0.099480
# Iteration: 175 bestvalit: 325347.567205 bestmemit: 0.000010 0.116798 2.103761 2.031760 0.099474
# Iteration: 200 bestvalit: 325347.567205 bestmemit: 0.000010 0.116799 2.103766 2.031765 0.099474
# Iteration: 225 bestvalit: 325347.567205 bestmemit: 0.000010 0.116799 2.103766 2.031764 0.099475
Check \(\hat{\theta}\):
theta_hat <- as.numeric(theta_hat$optim$bestmem)
theta_hat
[1] 1.000001e-05 1.167989e-01 2.103766e+00 2.031764e+00 9.947457e-02
# 1.000001e-05 1.167989e-01 2.103766e+00 2.031764e+00 9.947457e-02
Save to file:
saveRDS(theta_hat, file = "data-processed/theta_hat.rds")
df_p <- readRDS("data-processed/df_p.rds")
squashed <- squashData(df_p, bin_size = 100, keep_pts = 0)
squashed <- squashData(squashed, count = 2, bin_size = 12)
theta_hat <- readRDS("data-processed/theta_hat.rds")
qn <- Qn(theta_hat, N = df_p$N, E = df_p$E)
head(qn)
[1] 5.391214e-06 6.031719e-06 8.479922e-06 8.312731e-06 8.803126e-06
[6] 8.963224e-06
identical(length(qn), nrow(df_p))
[1] TRUE
summary(qn)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000018 0.0000027 0.0000036 0.0010325 0.0000056 1.0000000
df_p$ebgm <- ebgm(theta_hat, N = df_p$N, E = df_p$E, qn = qn)
head(df_p) %>% kable() %>% kable_styling()
var1 | var2 | N | E | RR | PRR | ebgm |
---|---|---|---|---|---|---|
ADENOVIRUS (NO BRAND NAME) | Arthralgia | 1 | 0.1194802 | 8.37 | 16.26 | 1.22 |
ADENOVIRUS (NO BRAND NAME) | Dyspnoea | 1 | 0.0836680 | 11.95 | 17.43 | 1.24 |
ADENOVIRUS (NO BRAND NAME) | Face oedema | 1 | 0.0111637 | 89.58 | 86.42 | 1.28 |
ADENOVIRUS (NO BRAND NAME) | Gait disturbance | 1 | 0.0143753 | 69.56 | 67.12 | 1.28 |
ADENOVIRUS (NO BRAND NAME) | Osteoarthritis | 1 | 0.0053870 | 185.63 | 296.72 | 1.29 |
ADENOVIRUS (NO BRAND NAME) | Petechiae | 1 | 0.0027171 | 368.04 | 212.56 | 1.29 |
df_p$QUANT_05 <- quantBisect(
5,
theta_hat = theta_hat,
N = df_p$N, E = df_p$E, qn = qn
)
df_p$QUANT_95 <- quantBisect(
95,
theta_hat = theta_hat,
N = df_p$N, E = df_p$E, qn = qn
)
head(df_p) %>% kable() %>% kable_styling()
var1 | var2 | N | E | RR | PRR | ebgm | QUANT_05 | QUANT_95 |
---|---|---|---|---|---|---|---|---|
ADENOVIRUS (NO BRAND NAME) | Arthralgia | 1 | 0.1194802 | 8.37 | 16.26 | 1.22 | 0.41 | 3.00 |
ADENOVIRUS (NO BRAND NAME) | Dyspnoea | 1 | 0.0836680 | 11.95 | 17.43 | 1.24 | 0.41 | 3.05 |
ADENOVIRUS (NO BRAND NAME) | Face oedema | 1 | 0.0111637 | 89.58 | 86.42 | 1.28 | 0.42 | 3.15 |
ADENOVIRUS (NO BRAND NAME) | Gait disturbance | 1 | 0.0143753 | 69.56 | 67.12 | 1.28 | 0.42 | 3.15 |
ADENOVIRUS (NO BRAND NAME) | Osteoarthritis | 1 | 0.0053870 | 185.63 | 296.72 | 1.29 | 0.42 | 3.17 |
ADENOVIRUS (NO BRAND NAME) | Petechiae | 1 | 0.0027171 | 368.04 | 212.56 | 1.29 | 0.43 | 3.17 |
Suspicious pair with EBGM05 > 5, which are clearly reporting signals:
suspicious <- df_p[df_p$QUANT_05 >= 5, ]
nrow(suspicious)
[1] 311
nrow(df_p)
[1] 166220
nrow(suspicious) / nrow(df_p)
[1] 0.001871014
suspicious <- suspicious[order(suspicious$QUANT_05, decreasing = TRUE), c("var1", "var2", "N", "E", "QUANT_05", "ebgm", "QUANT_95")]
head(suspicious, 5) %>% kable() %>% kable_styling()
var1 | var2 | N | E | QUANT_05 | ebgm | QUANT_95 | |
---|---|---|---|---|---|---|---|
73892 | INFLUENZA (SEASONAL) (FLUCELVAX) | Product use issue | 57 | 0.2753666 | 115.19 | 144.07 | 178.40 |
131701 | ROTAVIRUS (ROTASHIELD) | Gastrointestinal haemorrhage | 94 | 0.7136427 | 94.70 | 112.59 | 133.06 |
72211 | INFLUENZA (SEASONAL) (FLUBLOK QUADRIVALENT) | Product administered to patient of inappropriate age | 141 | 1.5132541 | 74.88 | 86.19 | 98.82 |
48195 | HPV (CERVARIX) | Product use issue | 22 | 0.0950984 | 70.29 | 101.47 | 142.71 |
73652 | INFLUENZA (SEASONAL) (FLUCELVAX) | Drug administered to patient of inappropriate age | 350 | 5.8868353 | 53.27 | 58.21 | 63.52 |
tabbed <- table(suspicious$var1)
head(tabbed[order(tabbed, decreasing = TRUE)]) %>% kable() %>% kable_styling()
Var1 | Freq |
---|---|
SMALLPOX (ACAM2000) | 35 |
LYME (LYMERIX) | 34 |
ROTAVIRUS (ROTATEQ) | 21 |
HPV (GARDASIL) | 14 |
SMALLPOX (DRYVAX) | 13 |
INFLUENZA (SEASONAL) (FLUMIST QUADRIVALENT) | 12 |
txt <- paste0(
1:nrow(suspicious),
". After controlling for the confounding effects from age and gender, the reporting of the adverse reaction ",
suspicious$var2,
" for vaccine ",
suspicious$var1,
" is disproportionately high compared to this same event for all other vaccines, with ",
suspicious$N,
" total reports and an EBGM05 score ",
suspicious$QUANT_05,
"."
)
txt[1:5]
# [1] 1. After controlling for the confounding effects from age and gender,
the reporting of the adverse reaction Product use issue for vaccine
INFLUENZA (SEASONAL) (FLUCELVAX) is disproportionately high compared
to this same event for all other vaccines, with 57 total reports and
an EBGM05 score 115.19.
# [2] 2. After controlling for the confounding effects from age and gender,
the reporting of the adverse reaction Gastrointestinal haemorrhage for
vaccine ROTAVIRUS (ROTASHIELD) is disproportionately high compared to
this same event for all other vaccines, with 94 total reports and
an EBGM05 score 94.7.
# [3] 3. After controlling for the confounding effects from age and gender,
the reporting of the adverse reaction Product administered to patient
of inappropriate age for vaccine INFLUENZA (SEASONAL) (FLUBLOK QUADRIVALENT)
is disproportionately high compared to this same event for all other
vaccines, with 141 total reports and an EBGM05 score 74.88.
# [4] 4. After controlling for the confounding effects from age and gender,
the reporting of the adverse reaction Product use issue for vaccine HPV
(CERVARIX) is disproportionately high compared to this same event for all
other vaccines, with 22 total reports and an EBGM05 score 70.29.
# [5] 5. After controlling for the confounding effects from age and gender,
the reporting of the adverse reaction Drug administered to patient of
inappropriate age for vaccine INFLUENZA (SEASONAL) (FLUCELVAX) is
disproportionately high compared to this same event for all other
vaccines, with 350 total reports and an EBGM05 score 53.27.
df_p <- readRDS("data-processed/df_p.rds")
squashed <- squashData(df_p, bin_size = 100, keep_pts = 0)
squashed <- squashData(squashed, count = 2, bin_size = 12)
theta_hat <- readRDS("data-processed/theta_hat.rds")
est <- list("estimates" = theta_hat)
names(est$estimates) <- c("alpha1", "beta1", "alpha2", "beta2", "P")
# for the 5th and 95th percentiles
ebout <- ebScores(
df_p,
hyper_estimate = est,
quantiles = c(5, 95)
)
print(ebout, threshold = 10)
There were 123 var1-var2 pairs with a QUANT_05 greater than 10
Top 5 Highest QUANT_05 Scores
var1
73892 INFLUENZA (SEASONAL) (FLUCELVAX)
131701 ROTAVIRUS (ROTASHIELD)
72211 INFLUENZA (SEASONAL) (FLUBLOK QUADRIVALENT)
48195 HPV (CERVARIX)
73652 INFLUENZA (SEASONAL) (FLUCELVAX)
var2 N
73892 Product use issue 57
131701 Gastrointestinal haemorrhage 94
72211 Product administered to patient of inappropriate age 141
48195 Product use issue 22
73652 Drug administered to patient of inappropriate age 350
E QUANT_05
73892 0.27536662 115.19
131701 0.71364266 94.70
72211 1.51325411 74.88
48195 0.09509842 70.29
73652 5.88683526 53.27
The global overview might not be too helpful:
plot(ebout, plot.type = "histogram")
plot(ebout, plot.type = "shrinkage") + scale_color_nejm()
Let’s focus on a specific type of adverse event (vomiting):
plot(ebout, event = "Vomiting") + scale_fill_nejm()
plot(ebout, plot.type = "histogram", event = "Vomiting")
plot(ebout, plot.type = "shrinkage", event = "Vomiting") + scale_color_nejm()
DuMouchel, William. 1999. “Bayesian Data Mining in Large Frequency Tables, with an Application to the Fda Spontaneous Reporting System.” The American Statistician 53 (3): 177–90.
If you see mistakes or want to suggest changes, please create an issue on the source repository.