Generate simulation data (Gaussian case) following the settings in Xiao and Xu (2015).

Usage

msaenet.sim.gaussian(
  n = 300,
  p = 500,
  rho = 0.5,
  coef = rep(0.2, 50),
  snr = 1,
  p.train = 0.7,
  seed = 1001
)

Arguments

n

Number of observations.

p

Number of variables.

rho

Correlation base for generating correlated variables.

coef

Vector of non-zero coefficients.

snr

Signal-to-noise ratio (SNR). SNR is defined as $$ \frac{Var(E(y | X))}{Var(y - E(y | X))} = \frac{Var(f(X))}{Var(\varepsilon)} = \frac{Var(X^T \beta)}{Var(\varepsilon)} = \frac{\beta^T \Sigma \beta}{\sigma^2}, $$ where Σ is the covariance matrix of X and σ² is the variance of the noise ε.

p.train

Proportion of the n observations assigned to the training set, given as a fraction between 0 and 1 (for example, 0.7 for 70%).

seed

Random seed for reproducibility.
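
To make the interplay of rho, coef, snr, and p.train concrete, here is a minimal base-R sketch of a generator of this kind. It is not the package's implementation. It assumes a covariance of the form rho^|i - j|, assumes the non-zero coefficients occupy the first length(coef) positions, and assumes the training size is round(n * p.train); the variable names (beta, Sigma, x_tr, and so on) are hypothetical. The one step that follows directly from the SNR definition is the noise variance: since Var(X^T β) = β^T Σ β, a target SNR implies σ² = β^T Σ β / snr.

```r
# Hypothetical base-R sketch; not the package's implementation.
set.seed(1001)
n <- 300
p <- 500
rho <- 0.5
snr <- 1
p.train <- 0.7
coef <- rep(0.2, 50)

# Assumption: non-zero coefficients come first, the remaining ones are zero
beta <- c(coef, rep(0, p - length(coef)))

# Assumption: covariance Sigma[i, j] = rho^|i - j|
Sigma <- rho^abs(outer(1:p, 1:p, "-"))

# Noise variance implied by the target SNR: sigma^2 = beta' Sigma beta / snr
signal_var <- as.numeric(t(beta) %*% Sigma %*% beta)
noise_sd <- sqrt(signal_var / snr)

# Correlated Gaussian predictors via the Cholesky factor of Sigma
x <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)
y <- as.numeric(x %*% beta) + rnorm(n, sd = noise_sd)

# Assumption: training size is round(n * p.train), rows drawn at random
train_rows <- sample(seq_len(n), round(n * p.train))
x_tr <- x[train_rows, , drop = FALSE]
x_te <- x[-train_rows, , drop = FALSE]
y_tr <- y[train_rows]
y_te <- y[-train_rows]

dim(x_tr)
#> [1] 210 500
dim(x_te)
#> [1]  90 500
```

Raising snr shrinks noise_sd for a fixed coef and rho, so the same signal becomes easier to recover; the split sizes depend only on n and p.train.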

Value

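A list with four components: x.tr and x.te (the training and test predictor matrices), and y.tr and y.te (the corresponding training and test responses).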

References

Nan Xiao and Qing-Song Xu. (2015). Multi-step adaptive elastic-net: reducing false positives in high-dimensional variable selection. Journal of Statistical Computation and Simulation 85(18), 3755–3765.

Author

Nan Xiao <https://nanx.me>

Examples

dat <- msaenet.sim.gaussian(
  n = 300, p = 500, rho = 0.6,
  coef = rep(1, 10), snr = 3, p.train = 0.7,
  seed = 1001
)

dim(dat$x.tr)
#> [1] 210 500
dim(dat$x.te)
#> [1]  90 500
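
As a quick sanity check on the returned list, the responses should line up row for row with the predictor matrices. The sketch below assumes the msaenet package is installed and uses only the function and component names documented on this page; NROW() is used so the check works whether the responses are stored as vectors or as one-column matrices.

```r
library("msaenet")

dat <- msaenet.sim.gaussian(
  n = 300, p = 500, rho = 0.6,
  coef = rep(1, 10), snr = 3, p.train = 0.7,
  seed = 1001
)

# Each response should have one entry per row of its predictor matrix
NROW(dat$y.tr) == nrow(dat$x.tr)
NROW(dat$y.te) == nrow(dat$x.te)

# Training and test rows together should account for all n observations
nrow(dat$x.tr) + nrow(dat$x.te)
```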