Model applicability domain evaluation with ensemble partial least squares.

Usage

enpls.ad(
  x,
  y,
  xtest,
  ytest,
  maxcomp = NULL,
  cvfolds = 5L,
  space = c("sample", "variable"),
  method = c("mc", "boot"),
  reptimes = 500L,
  ratio = 0.8,
  parallel = 1L
)

Arguments

x

Predictor matrix of the training set.

y

Response vector of the training set.

xtest

List, with the i-th component being the i-th test set's predictor matrix (see example code below).

ytest

List, with the i-th component being the i-th test set's response vector (see example code below).

maxcomp

Maximum number of components included within each model. If not specified, the maximum number possible will be used, taking cross-validation and the case where n is smaller than p into account (see the sketch after this argument list).

cvfolds

Number of cross-validation folds used in each model for automatic parameter selection. Default is 5.

space

Space in which to apply the resampling method. Can be the sample space ("sample") or the variable space ("variable").

method

Resampling method. "mc" (Monte-Carlo resampling) or "boot" (bootstrapping). Default is "mc".

reptimes

Number of models to build with Monte-Carlo resampling or bootstrapping.

ratio

Sampling ratio used when method = "mc".

parallel

Integer. Number of CPU cores to use. Default is 1 (not parallelized).
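
When maxcomp is left unspecified, the cap on the number of components is chosen automatically. The helper below is an illustrative sketch of one way such a cap could be derived, not the package's internal rule: it limits the components by the number of variables and by the samples available in each cross-validation training fold.

# Illustrative sketch only (not the package's internal rule)
choose_maxcomp <- function(x, cvfolds = 5L) {
  n.cv.train <- floor(nrow(x) * (cvfolds - 1L) / cvfolds)
  min(ncol(x), n.cv.train - 1L)
}
choose_maxcomp(matrix(rnorm(100 * 300), 100, 300), cvfolds = 5L)  # n < p case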

Value

A list containing:

  • tr.error.mean - absolute mean prediction error for training set

  • tr.error.median - absolute median prediction error for training set

  • tr.error.sd - prediction error sd for training set

  • tr.error.matrix - raw prediction error matrix for training set

  • te.error.mean - list of absolute mean prediction error for test set(s)

  • te.error.median - list of absolute median prediction error for test set(s)

  • te.error.sd - list of prediction error sd for test set(s)

  • te.error.matrix - list of raw prediction error matrix for test set(s)
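
The components are accessed by name. For example, assuming ad is an object returned by enpls.ad(), as created in the Examples below:

# Assuming `ad` is the object returned by enpls.ad() in the Examples below
summary(ad$tr.error.mean)  # mean absolute errors over the training samples
ad$te.error.sd[[1]]        # prediction error SDs for the first test set
str(ad$te.error.matrix)    # raw prediction error matrices for the test sets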

Note

Note that when space = "variable", method can only be "mc", since bootstrapping in the variable space would create duplicated variables, which causes problems.
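
A small illustration of the difference (a sketch, not the package's internal code): sampling variable indices with replacement, as "boot" would, can select the same column more than once, whereas Monte-Carlo sampling without replacement cannot.

set.seed(42)
p <- 10
idx.boot <- sample(p, p, replace = TRUE)  # bootstrap: duplicates possible
idx.mc <- sample(p, floor(p * 0.8))       # Monte-Carlo: columns stay unique
anyDuplicated(idx.boot) > 0               # typically TRUE
anyDuplicated(idx.mc) > 0                 # always FALSE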

Author

Nan Xiao <https://nanx.me>

Examples

data("alkanes")
x <- alkanes$x
y <- alkanes$y

# training set
x.tr <- x[1:100, ]
y.tr <- y[1:100]

# two test sets
x.te <- list(
  "test.1" = x[101:150, ],
  "test.2" = x[151:207, ]
)
y.te <- list(
  "test.1" = y[101:150],
  "test.2" = y[151:207]
)

set.seed(42)
ad <- enpls.ad(
  x.tr, y.tr, x.te, y.te,
  space = "variable", method = "mc",
  ratio = 0.9, reptimes = 50
)
print(ad)
#> Model Applicability Domain Evaluation by ENPLS
#> ---
#> Absolute mean prediction error for each training set sample:
#>   [1]  1.143535290  0.266577478  0.075668338  1.131416799  0.103337151
#>   [6]  1.062594738  0.023209713  0.700521215  0.468235064  0.673075458
#>  [11]  0.089540802  0.401803647  3.489442389  0.627893821  0.222504312
#>  [16]  3.221940312  0.894591039  0.096840929  0.792751494  1.236641601
#>  [21]  0.001584416  0.339331094  0.609968357  0.286792550  0.471706657
#>  [26]  0.691666512  0.219584091  0.687592096  0.737439448  0.692377554
#>  [31]  1.577144836  0.747749279  1.237863334  0.791600110  0.830273570
#>  [36]  1.187375782  0.416252943  1.386693400  1.209712475  1.110794824
#>  [41]  2.140243438  2.399984105  1.428537318  1.055821644  1.517716920
#>  [46]  1.590342977  0.706991956  0.914251843  1.973206057  1.939137967
#>  [51]  1.671019536  0.091573195  3.972970145  0.955507259  0.415111831
#>  [56]  0.931788087  2.508476277  2.759110197  1.600821991  0.404102398
#>  [61]  0.963920649  3.568594663 14.075284215  5.965384961  0.849943296
#>  [66]  1.141524697  2.287166025  1.359689933  1.504079464  2.344010556
#>  [71]  0.629486701  1.049468037  1.268353928  2.135248556  1.616434750
#>  [76]  2.119979067  0.269447046  1.830802524  1.784506205  0.637978496
#>  [81]  1.039257790  0.413567656  5.704702300  0.307698959  2.670416866
#>  [86]  0.691503674  2.185178349  2.001917127  1.529464596  0.374889846
#>  [91]  1.283773190  3.774151143  2.857702789  4.220121496  6.504020855
#>  [96]  1.762052847  3.572747930  1.679601429  1.776315457  1.140765375
#> ---
#> Prediction error SD for each training set sample:
#>   [1] 0.6632510 1.0747613 0.6228398 0.3909717 0.6276894 0.2926764 0.4835952
#>   [8] 0.2839815 0.3790893 0.2795853 0.2936881 0.2066487 0.7131719 0.3041280
#>  [15] 0.1152717 0.5529949 0.2850841 0.1779433 0.4157889 0.3811892 0.2819642
#>  [22] 0.2720929 0.4005652 0.3226532 0.2209190 0.4454591 0.3876323 0.2040831
#>  [29] 0.4670555 0.4136547 0.3290377 0.1708938 0.4155863 0.1579950 0.1666879
#>  [36] 0.2073140 0.5295909 0.4394888 0.3281213 0.2408321 0.4505811 0.1982922
#>  [43] 0.2434650 0.6775080 0.5017698 0.4212760 0.5761228 0.3315790 0.2303946
#>  [50] 0.2397685 0.3025108 0.1862496 0.3459049 0.3907732 0.2002375 0.3334323
#>  [57] 0.2800619 0.2489747 0.4342413 0.3122039 0.3108747 0.1818030 0.3497313
#>  [64] 0.1975296 0.2514031 0.5769985 0.2638713 0.2774563 0.3553371 0.3684134
#>  [71] 0.2015989 0.2458771 0.2710750 0.2248169 0.3510436 0.4010946 0.5246710
#>  [78] 0.5445661 0.2722141 0.3716945 0.3971595 0.1601688 0.2975253 0.2694639
#>  [85] 0.1819710 0.3799211 0.1644366 0.4944335 0.1960038 0.3795414 0.3421108
#>  [92] 0.3548105 0.6303994 0.5065287 0.2099194 0.1725128 0.2470902 0.4008990
#>  [99] 0.1955731 0.4749818
#> ---
#> Absolute mean prediction error for each test set sample:
#> [[1]]
#>  [1]  1.65850329  0.38377988  1.67378051  0.05017774  4.39395510  0.10970090
#>  [7]  0.80312305  2.00112429  3.30352901  2.39702103  2.49110746  2.57201965
#> [13]  3.40522658  1.66881500  0.18343101  2.68753476  3.38755741  0.45423958
#> [19]  2.05966181 11.37567054 12.46448491 10.29298124 13.73648078  7.69634489
#> [25]  9.60840997 61.52340940 12.87208410 11.51112678 11.74810709  6.98371976
#> [31]  3.21827472 12.00306032 12.10550647 12.84024755  4.05105088 12.90499545
#> [37] 11.90523171  4.25191528  2.50046794 12.77903509  6.03664168 11.53383212
#> [43]  4.61455483  2.44420160 12.70236566  7.43903336  3.71605764  2.86329295
#> [49]  7.64329131  7.18551395
#> 
#> [[2]]
#>  [1]  3.4543176  2.4871241 11.3411211 10.5407448  0.6383966  2.5794987
#>  [7]  1.3770894  7.8395977  1.2465770  0.4117513 36.6052600 31.2661905
#> [13] 35.1483035 31.4823204 35.0510448 29.4380334 40.1687529 34.5458973
#> [19] 23.2587832 77.0500127 28.5465786 23.0528668 25.4344415 34.8137632
#> [25] 30.6215780 22.2266004 19.7494606  0.2107381  0.8672154  3.3958982
#> [31]  2.7987705  2.3278965  1.7741967  0.7847005  6.0851973  1.5202727
#> [37]  5.0502711 46.4425419  0.7740034  1.3427991  3.9044839  1.0792567
#> [43]  4.9727341  0.7154980  3.8805581 15.7410126  3.2326191 14.0260989
#> [49]  2.5844449  6.2848282 12.2295047 10.8379296 11.2892066 12.5826622
#> [55]  7.0478476 12.8170478 11.4202239
#> 
#> ---
#> Prediction error SD for each test set sample:
#> [[1]]
#>  [1]   0.3435652   0.4202025   0.2808564   0.4494669   0.5071351   0.6865988
#>  [7]   0.2336136   0.4024350   0.4419844   0.6074236   0.4874105   0.6118569
#> [13]   0.5662671   0.7046519   0.5319585   0.6922911   0.6478133   0.7873733
#> [19]   0.9220921   9.1907515   8.9865306   8.7179049   8.6820109   7.3214560
#> [25]   6.3265450 100.7454688   8.0020688   8.0038258   7.6711559   8.0594401
#> [31]   7.9284787   7.0233280   7.5094411   7.8319137   7.8441271   7.8257605
#> [37]   7.4775638   7.7852585   7.7400842   6.8966159   7.6805341   7.6719323
#> [43]   7.7588242   7.7021805   7.6900458   7.6252742   7.6487169   7.5851825
#> [49]   7.6648408   7.4873894
#> 
#> [[2]]
#>  [1]   7.6684654   7.5798389   7.6259424   7.2935751   7.9845870   7.5090642
#>  [7]   7.5267845   6.6601347   7.4769873   7.5602014  23.1031563  19.9672830
#> [13]  20.0716499  21.5910269  20.7678312  21.0867461  21.3929124  21.3106238
#> [19]  21.2005159  88.4988937  21.1606533  21.1062933  20.6057930  20.9342275
#> [25]  20.9589177  20.7578514  20.7028630   0.2224852   0.6708101   0.5494225
#> [31]   0.3928201   0.5017335   0.3570617   0.3469072   0.5381546   0.1064181
#> [37]   0.3604482 105.9905919   0.2603238   0.3310628   0.2514998   0.1795206
#> [43]   0.3312007   0.3384145   0.7288403   8.9270759   7.4587752   8.2109699
#> [49]   7.9356333   7.9803569   7.9508740   6.9278026   7.8746692   7.7807236
#> [55]   7.9334114   7.9810786   7.7136865
#> 
plot(ad)

# the interactive plot requires an HTML viewer
if (FALSE) {
plot(ad, type = "interactive")
}
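
One possible follow-up (an illustrative sketch, not part of the enpls API) is to flag test samples whose mean absolute prediction error is large relative to the training-set errors, for example above mean + 3 SD:

# Illustrative only: the cutoff and rule are assumptions, not package defaults
cutoff <- mean(ad$tr.error.mean) + 3 * sd(ad$tr.error.mean)
lapply(ad$te.error.mean, function(err) which(err > cutoff))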