Measuring feature importance with ensemble partial least squares.

Usage

enpls.fs(
  x,
  y,
  maxcomp = NULL,
  cvfolds = 5L,
  reptimes = 500L,
  method = c("mc", "boot"),
  ratio = 0.8,
  parallel = 1L
)

Arguments

x

Predictor matrix.

y

Response vector.

maxcomp

Maximum number of components included within each model. If not specified, the maximum possible number is used, taking into account cross-validation and the special case where n is smaller than p.

cvfolds

Number of cross-validation folds used in each model for automatic parameter selection. Default is 5.

reptimes

Number of models to build with Monte-Carlo resampling or bootstrapping. Default is 500.

method

Resampling method. "mc" (Monte-Carlo resampling) or "boot" (bootstrapping). Default is "mc".

ratio

Sampling ratio used when method = "mc". Default is 0.8.

parallel

Integer. Number of CPU cores to use. Default is 1 (not parallelized).
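
The difference between the two resampling methods, and the role of ratio, can be illustrated with a minimal base R sketch. It does not reproduce the package's internal code: the variable names are hypothetical, and drawing round(n * ratio) rows without replacement for "mc" and n rows with replacement for "boot" is an assumption based on the usual definitions of these schemes.

```r
set.seed(42)

n <- 20L         # hypothetical number of samples (rows of x)
ratio <- 0.8     # plays the role of the `ratio` argument
reptimes <- 3L   # kept small here; plays the role of `reptimes`

# Monte-Carlo resampling (assumed scheme): each model sees a random
# subset of the rows, drawn without replacement.
mc_idx <- replicate(
  reptimes,
  sample(seq_len(n), round(n * ratio)),
  simplify = FALSE
)

# Bootstrapping (assumed scheme): each model sees n rows drawn with
# replacement, so some rows repeat and others are left out.
boot_idx <- replicate(
  reptimes,
  sample(seq_len(n), n, replace = TRUE),
  simplify = FALSE
)

lengths(mc_idx)    # 16 16 16
lengths(boot_idx)  # 20 20 20
```

Under these assumptions, ratio only affects the "mc" method, which matches the argument description above.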

Value

A list containing two components:

  • variable.importance - a vector of variable importance

  • coefficient.matrix - original coefficient matrix
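
The two components can be accessed by name, as with any R list. The following is a small sketch of one possible workflow, assuming the enpls package is installed and using the alkanes data from the Examples section; the names imp and top5 are illustrative only.

```r
library("enpls")

data("alkanes")

set.seed(42)
fs <- enpls.fs(alkanes$x, alkanes$y, reptimes = 50)

# Importance scores, one per predictor
imp <- fs$variable.importance

# The five predictors with the highest importance
top5 <- head(sort(imp, decreasing = TRUE), 5)
top5

# Dimensions of the coefficient matrix
dim(fs$coefficient.matrix)
```

Sorting the importance vector manually like this is useful when the ranked variables feed into a later step, such as refitting a model on a reduced predictor set.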

See also

See enpls.od for outlier detection with ensemble partial least squares regressions. See enpls.fit for fitting ensemble partial least squares regression models.

Author

Nan Xiao <https://nanx.me>

Examples

data("alkanes")
x <- alkanes$x
y <- alkanes$y

set.seed(42)
fs <- enpls.fs(x, y, reptimes = 50)
print(fs)
#> Variable Importance by Ensemble Partial Least Squares
#> ---
#>          Importance
#> MEDV.23   2.3438355
#> MEDV.33   2.1624571
#> Chi.P.4   2.1475160
#> Chi.C.3   2.0521822
#> Chi.P.5   1.4142498
#> Estate.1  1.2850053
#> MEDV.22   1.2822210
#> Chi.P.3   1.0533900
#> MEDV.12   1.0532281
#> MEDV.11   0.9379795
#> MEDV.13   0.8904436
#> Chi.PC.4  0.7798934
#> Estate.2  0.7473758
#> Chi.P.2   0.7221879
#> Kappa.3   0.7076264
#> Kappa.1   0.7043556
#> Kappa.2   0.4590357
#> Chi.P.1   0.4193437
#> Estate.3  0.3400528
#> Chi.P.0   0.2904166
#> Kappa.4   0.2455370
plot(fs)
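
As a variation on the example above, the same call can be made with bootstrapping instead of the default Monte-Carlo resampling. This sketch assumes the enpls package is installed; the object name fs_boot is illustrative, and no output is shown because the resulting values depend on the resampling.

```r
library("enpls")

data("alkanes")
x <- alkanes$x
y <- alkanes$y

set.seed(42)
# method = "boot" switches to bootstrapping; `ratio` is not used in this case
fs_boot <- enpls.fs(x, y, reptimes = 50, method = "boot")
print(fs_boot)
```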