
Power calculation for a joint analysis of a two-stage case control design for SNP data.

Usage

cats(
  freq = 0.5,
  freq2 = -1,
  ncases = 500,
  ncontrols = 500,
  ncases2 = 500,
  ncontrols2 = 500,
  risk = 1.5,
  risk2 = -1,
  pisamples = -1,
  prevalence = 0.1,
  prevalence2 = -1,
  additive = 0,
  recessive = 0,
  dominant = 0,
  multiplicative = 1,
  alpha = 1e-07,
  pimarkers = 0.00316
)

Arguments

freq

numeric. The minor allele frequency (MAF) in the first stage

freq2

numeric. The MAF in the second stage. Optional; if -1, the first-stage value is used

ncases

integer. The number of cases in the first stage

ncontrols

integer. The number of controls in the first stage

ncases2

integer. The number of cases in the second stage

ncontrols2

integer. The number of controls in the second stage

risk

numeric. The relative risk in the first stage

risk2

numeric. The relative risk in the second stage. Optional; if -1, the first-stage value is used

pisamples

numeric. The weight used for the joint statistic. Optional; see Details

prevalence

numeric. The prevalence of the disease in the population for the first stage

prevalence2

numeric. The prevalence of the disease in the population for the second stage. Optional; if -1, the first-stage value is used

additive

boolean. If 1, an additive model is assumed

recessive

boolean. If 1, a recessive model is assumed

dominant

boolean. If 1, a dominant model is assumed

multiplicative

boolean. If 1, a multiplicative model is assumed

alpha

numeric. The significance threshold. Often a threshold of 0.05 divided by the number of markers is chosen

pimarkers

numeric. The fraction of markers genotyped in the second stage
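
As a rough illustration of how `alpha` and `pimarkers` relate to the size of a marker panel, the sketch below assumes a hypothetical panel of 500,000 markers; that count is an assumption chosen for illustration and is not stated anywhere in this documentation. With that panel size, the 0.05-per-marker rule described for `alpha` gives the default value of 1e-07.

```r
# Hypothetical genome-wide panel size (an assumption, not a package default).
n_markers <- 500000

# Threshold of 0.05 divided by the number of markers, as described for `alpha`.
alpha <- 0.05 / n_markers

# Number of markers that would be genotyped in the second stage
# when pimarkers is left at its default of 0.00316.
pimarkers <- 0.00316
n_stage2 <- n_markers * pimarkers
```

Under these assumptions about 1,580 markers would be carried forward to the second stage.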

Value

P.one.study

The power if only one study was performed. NB: this is only a valid estimate if the relative risk and allele frequency are the same for both stages

P.first.stage

The power for a marker to proceed to the second stage

P.rep.study

The power of the study if based on replication and not a joint analysis

P.joint.min

The power of the joint analysis to detect at least one susceptibility SNP, assuming that five susceptibility SNPs exist

P.joint

The power of the joint analysis

pi

The weight used to calculate the joint statistic

T.one.study

Recommended thresholds for a one-stage study

T.first.stage

Recommended thresholds for the first stage in two-stage study

T.second.stage.rep

Recommended thresholds for the second stage in replication analysis

T.second.stage.joint

Recommended thresholds for the second stage in a joint analysis

E.Disease.freq.cases1

The expected disease allele frequency in stage 1 for cases

E.Disease.freq.controls1

The expected disease allele frequency in stage 1 for controls

E.Disease.freq.cases2

The expected disease allele frequency in stage 2 for cases

E.Disease.freq.controls2

The expected disease allele frequency in stage 2 for controls
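
To give a feel for where expected allele frequencies such as `E.Disease.freq.cases1` and `E.Disease.freq.controls1` can come from, here is a minimal sketch of the textbook calculation under a multiplicative model. It assumes Hardy-Weinberg proportions in the population and treats `freq` as the frequency of the risk allele. The helper `expected_freqs()` is hypothetical and is not claimed to be the package's internal code.

```r
# Hypothetical helper, not part of the package: expected risk-allele
# frequencies in cases and controls under a multiplicative model,
# assuming Hardy-Weinberg proportions in the population.
expected_freqs <- function(freq, risk, prevalence) {
  # Population genotype frequencies for 0, 1 and 2 copies of the risk allele.
  geno <- c((1 - freq)^2, 2 * freq * (1 - freq), freq^2)
  # Relative penetrances: each extra copy multiplies the risk.
  rel <- c(1, risk, risk^2)
  # Scale the penetrances so that they average to the disease prevalence.
  pen <- rel * prevalence / sum(geno * rel)
  # Genotype distributions conditional on disease status.
  geno_cases <- geno * pen / prevalence
  geno_controls <- geno * (1 - pen) / (1 - prevalence)
  # Allele frequency = half the heterozygotes plus the homozygotes.
  allele <- c(0, 0.5, 1)
  c(cases = sum(geno_cases * allele), controls = sum(geno_controls * allele))
}

ef <- expected_freqs(freq = 0.2, risk = 1.5, prevalence = 0.1)
```

For these inputs the sketch gives a frequency of about 0.273 in cases and about 0.192 in controls.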

Details

These power analyses are based on Skol et al. 2006, but are generalized so that the ratio of cases to controls may vary between stages. The allele frequencies, disease prevalence and relative risk are also allowed to vary. The joint statistic is $z_{joint}=z_1\sqrt{\pi}+z_2\sqrt{1-\pi}$, where $z_1$ and $z_2$ are the z-scores for the first and second stage and the weight $\pi$ is calculated as $\pi=\frac{1}{var(\hat{p}'_1-\hat{p}_1)}\left(\frac{1}{var(\hat{p}'_1-\hat{p}_1)}+\frac{1}{var(\hat{p}'_2-\hat{p}_2)}\right)^{-1}$, where $\hat{p}'_1$ is the estimate of the allele frequency of the cases in the first stage and $\hat{p}_1$ is the corresponding estimate for the controls. This is consistent with Skol et al. 2006 when the ratios of cases to controls are the same in both stages. When this is not the case, the weight $\pi$ may vary slightly with different allele frequencies and different relative risks. For power calculations I would recommend calculating the weight at a likely scenario where there is about 80-90% power, and fixing the weight at this value for the other scenarios (and for the testing of the real data). This can be done by assigning a value to pisamples. In practice this will hardly affect the power.
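
The weight and joint statistic described above can be sketched in a few lines of R. This is an illustration only: the variance formula assumes independent binomial sampling of 2n alleles in each group, which is an assumption of this sketch rather than something stated in the documentation, and the helpers `var_freq_diff()`, `joint_weight()` and `z_joint()` are hypothetical names that are not part of the package.

```r
# Hypothetical helper: variance of the difference between the case and control
# allele frequency estimates, assuming binomial sampling of 2 * n alleles.
var_freq_diff <- function(p_cases, p_controls, n_cases, n_controls) {
  p_cases * (1 - p_cases) / (2 * n_cases) +
    p_controls * (1 - p_controls) / (2 * n_controls)
}

# Inverse-variance weight for the first stage, following the formula in Details.
joint_weight <- function(v1, v2) {
  (1 / v1) / (1 / v1 + 1 / v2)
}

# Joint statistic combining the two stage-wise z-scores.
z_joint <- function(z1, z2, weight) {
  z1 * sqrt(weight) + z2 * sqrt(1 - weight)
}

# Two identical stages: the weight is one half.
v1 <- var_freq_diff(0.27, 0.19, n_cases = 500, n_controls = 1000)
v2 <- var_freq_diff(0.27, 0.19, n_cases = 500, n_controls = 1000)
w_equal <- joint_weight(v1, v2)

# Doubling both sample sizes in stage 2 halves its variance,
# so the first stage then receives a weight of one third.
v2_big <- var_freq_diff(0.27, 0.19, n_cases = 1000, n_controls = 2000)
w_unequal <- joint_weight(v1, v2_big)
```

When the case to control ratio and the allele frequencies are the same in both stages, this weight reduces to the fraction of samples in the first stage, which matches the `pi = 0.5` printed in the first example of the Examples section.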

References

Skol AD, Scott LJ, Abecasis GR, Boehnke M: Joint analysis is more efficient than replication-based analysis for two-stage genome-wide association studies. Nat Genet 38: 209-213, 2006.

Author

Anders Albrechtsen

Examples

# calculate the power under a multiplicative model using a two stage design
# and assuming a relative risk of 1.5
cats(
  freq = 0.2,
  ncases = 500, ncases2 = 500,
  ncontrols = 1000, ncontrols2 = 1000,
  risk = 1.5, multiplicative = 1
)
#> Expected Power is;
#> 
#>      
#> 
#>                    For a one-stage study = 0.94
#>       For first stage in two-stage study = 0.972
#> For second stage in replication analysis = 0.784
#>     For second stage in a joint analysis = 0.929
#>                                       pi = 0.5
#> 

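# compare joint, replication and one-stage power over a range of relative risks
RR <- 23:32 / 20
power.J <- numeric(length(RR))
power.R <- numeric(length(RR))
power.O <- numeric(length(RR))
for (tal in seq_along(RR)) {
  temp <- cats(risk = RR[tal])
  power.J[tal] <- temp$P.joint
  power.R[tal] <- temp$P.rep.study
  power.O[tal] <- temp$P.one.study
}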
plot(RR, power.J, type = "b", lwd = 2, ylab = "Power")
lines(RR, power.R, lwd = 2, col = 2, type = "b")
lines(RR, power.O, lwd = 2, col = 3, type = "b")
legend(1.4, 0.4, c(
  "joint analysis", "replication design",
  "one stage design"
), col = 1:3, lwd = 3, bty = "n")