Measuring feature importance with ensemble partial least squares.

Usage

enpls.fs(
  x,
  y,
  maxcomp = NULL,
  cvfolds = 5L,
  reptimes = 500L,
  method = c("mc", "boot"),
  ratio = 0.8,
  parallel = 1L
)

Arguments

x

Predictor matrix.

y

Response vector.

maxcomp

Maximum number of components included within each model. If not specified, the maximum possible number is used (accounting for cross-validation and for special cases where n is smaller than p).

cvfolds

Number of cross-validation folds used in each model for automatic parameter selection. Default is 5.

reptimes

Number of models to build with Monte-Carlo resampling or bootstrapping. Default is 500.

method

Resampling method. "mc" (Monte-Carlo resampling) or "boot" (bootstrapping). Default is "mc".

ratio

Sampling ratio used when method = "mc". Default is 0.8.
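The two resampling schemes named by method can be illustrated with base R index sampling. This is a minimal sketch under assumptions, not the package's internal code: it takes "mc" to mean drawing a fraction ratio of the rows without replacement and "boot" to mean drawing n rows with replacement, and the names n, idx_mc and idx_boot are hypothetical.

```r
# Illustrative only: base R index sampling, not enpls internals.
set.seed(42)
n <- 10L       # hypothetical number of samples (rows of x)
ratio <- 0.8   # same value as the default of the `ratio` argument

# "mc" (assumed): a subset of floor(n * ratio) distinct row indices
idx_mc <- sample(n, floor(n * ratio))

# "boot" (assumed): n row indices drawn with replacement
idx_boot <- sample(n, n, replace = TRUE)

length(idx_mc)         # size of the Monte-Carlo subset
anyDuplicated(idx_mc)  # 0 when no index repeats
length(idx_boot)       # size of the bootstrap sample
```

Under these assumptions, ratio only affects the "mc" scheme, which matches the description of the argument above.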

parallel

Integer. Number of CPU cores to use. Default is 1 (not parallelized).

Value

A list containing two components:

  • variable.importance - a vector of variable importance

  • coefficient.matrix - original coefficient matrix
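As a hedged illustration of working with the returned list, the sketch below extracts both components and ranks the predictors by importance. It assumes the enpls package is installed and that x carries column names; imp and ord are hypothetical variable names.

```r
library("enpls")

data("alkanes")
x <- alkanes$x
y <- alkanes$y

set.seed(42)
fs <- enpls.fs(x, y, reptimes = 50)

imp <- fs$variable.importance         # one importance score per predictor
ord <- order(imp, decreasing = TRUE)  # predictor indices, most important first

# Names of the five highest-ranked predictors (assumes x has column names)
head(colnames(x)[ord], 5)

# Dimensions of the coefficient matrix collected from the individual models
dim(fs$coefficient.matrix)
```

Ranking with order() rather than relying on names attached to the importance vector keeps the sketch independent of how the vector is labelled.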

See also

See enpls.od for outlier detection with ensemble partial least squares regressions. See enpls.fit for fitting ensemble partial least squares regression models.

Author

Nan Xiao <https://nanx.me>

Examples

data("alkanes")
x <- alkanes$x
y <- alkanes$y

set.seed(42)
fs <- enpls.fs(x, y, reptimes = 50)
print(fs)
#> Variable Importance by Ensemble Partial Least Squares
#> ---
#>          Importance
#> Chi.C.3   2.4683701
#> MEDV.23   2.3787962
#> MEDV.33   2.2815314
#> Chi.P.4   2.0315902
#> Chi.P.5   1.6766926
#> MEDV.13   1.4392556
#> MEDV.22   1.4159863
#> Chi.P.3   1.2510102
#> Estate.1  1.2467426
#> MEDV.12   1.2000973
#> MEDV.11   1.0926947
#> Chi.P.2   1.0155413
#> Chi.PC.4  0.8968577
#> Kappa.1   0.8588444
#> Kappa.3   0.8552669
#> Estate.2  0.6537099
#> Chi.P.1   0.6405966
#> Kappa.2   0.6140915
#> Kappa.4   0.5323162
#> Chi.P.0   0.4964470
#> Estate.3  0.4209405
plot(fs)