Ensemble Partial Least Squares for Model Applicability Domain Evaluation
Source: R/enpls.ad.R
enpls.ad.Rd
Model applicability domain evaluation with ensemble partial least squares.
Arguments
- x
Predictor matrix of the training set.
- y
Response vector of the training set.
- xtest
List, with the i-th component being the i-th test set's predictor matrix (see example code below).
- ytest
List, with the i-th component being the i-th test set's response vector (see example code below).
- maxcomp
Maximum number of components included within each model. If not specified, will use the maximum number possible (considering cross-validation and special cases where n is smaller than p).
- cvfolds
Number of cross-validation folds used in each model for automatic parameter selection. Default is 5.
- space
Space in which to apply the resampling method. Can be the sample space ("sample") or the variable space ("variable").
- method
Resampling method: "mc" (Monte-Carlo resampling) or "boot" (bootstrapping). Default is "mc".
- reptimes
Number of models to build with Monte-Carlo resampling or bootstrapping.
- ratio
Sampling ratio used when method = "mc".
- parallel
Integer. Number of CPU cores to use. Default is 1 (not parallelized).
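The space, method, reptimes and ratio arguments together decide which rows or columns each sub-model sees. The following is a minimal sketch of that idea in base R, not the package's internal code: resample_idx() is a hypothetical helper, and its default values are chosen only for illustration.

```r
# Hypothetical helper (not part of enpls): draw the indices used to fit
# each of the `reptimes` sub-models.
resample_idx <- function(n, method = c("mc", "boot"), reptimes = 5, ratio = 0.8) {
  method <- match.arg(method)
  lapply(seq_len(reptimes), function(i) {
    if (method == "mc") {
      # Monte-Carlo: a random subset of size floor(ratio * n), without replacement
      sort(sample(n, size = floor(n * ratio), replace = FALSE))
    } else {
      # Bootstrap: n draws with replacement, so indices may repeat
      sort(sample(n, size = n, replace = TRUE))
    }
  })
}

set.seed(1)
n <- 10  # number of rows for space = "sample", of columns for space = "variable"
mc_idx <- resample_idx(n, "mc", reptimes = 3, ratio = 0.8)
boot_idx <- resample_idx(n, "boot", reptimes = 3)
```

In the sample space the indices would select rows of x (and elements of y); in the variable space they would select columns of x.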
Value
A list containing:
tr.error.mean - absolute mean prediction error for training set
tr.error.median - absolute median prediction error for training set
tr.error.sd - prediction error sd for training set
tr.error.matrix - raw prediction error matrix for training set
te.error.mean - list of absolute mean prediction error for test set(s)
te.error.median - list of absolute median prediction error for test set(s)
te.error.sd - list of prediction error sd for test set(s)
te.error.matrix - list of raw prediction error matrix for test set(s)
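To make the relationship between the raw error matrix and the three summaries concrete, here is a small sketch on made-up numbers. It assumes one row per model and one column per sample, and it reads "absolute mean" as the absolute value of the mean error; both are assumptions for illustration, not statements about how the package stores or computes these values.

```r
# Mock raw error matrix: 4 models (rows) x 3 samples (columns).
# The layout and the numbers are assumptions made for this sketch.
err <- matrix(
  c( 0.5, -1.0, 2.0,
    -0.5, -2.0, 2.5,
     1.0, -1.5, 3.0,
     0.0, -1.5, 2.5),
  nrow = 4, byrow = TRUE
)

err_mean   <- abs(colMeans(err))          # absolute mean error per sample
err_median <- abs(apply(err, 2, median))  # absolute median error per sample
err_sd     <- apply(err, 2, sd)           # error SD across models, per sample
```

A sample with a large absolute mean error is predicted poorly on average, while a large SD indicates that the sub-models disagree about it.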
Note
Note that for space = "variable", method can only be "mc", since bootstrapping in the variable space would create duplicated variables, which could cause problems.
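The effect of duplicated variables can be seen on a small random matrix. This sketch uses only base R; x_demo and the chosen sizes are made up for illustration and are unrelated to the package's data.

```r
set.seed(42)
p <- 8
x_demo <- matrix(rnorm(20 * p), nrow = 20, ncol = p)

# Bootstrap the column indices: p draws with replacement
boot_cols <- sample(p, size = p, replace = TRUE)
x_boot <- x_demo[, boot_cols]

# Monte-Carlo resampling of the column indices: a subset without replacement
mc_cols <- sample(p, size = 6, replace = FALSE)
x_mc <- x_demo[, mc_cols]

anyDuplicated(boot_cols) > 0  # repeated columns are very likely here
qr(x_boot)$rank               # rank matches the number of distinct columns drawn
qr(x_mc)$rank                 # no column is repeated, so no rank is lost
```

Repeated columns are exactly collinear, so the bootstrapped matrix carries fewer independent variables than its width suggests.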
Author
Nan Xiao <https://nanx.me>
Examples
data("alkanes")
x <- alkanes$x
y <- alkanes$y
# training set
x.tr <- x[1:100, ]
y.tr <- y[1:100]
# two test sets
x.te <- list(
  "test.1" = x[101:150, ],
  "test.2" = x[151:207, ]
)
y.te <- list(
  "test.1" = y[101:150],
  "test.2" = y[151:207]
)
set.seed(42)
ad <- enpls.ad(
  x.tr, y.tr, x.te, y.te,
  space = "variable", method = "mc",
  ratio = 0.9, reptimes = 50
)
print(ad)
#> Model Applicability Domain Evaluation by ENPLS
#> ---
#> Absolute mean prediction error for each training set sample:
#> [1] 1.143535290 0.266577478 0.075668338 1.131416799 0.103337151
#> [6] 1.062594738 0.023209713 0.700521215 0.468235064 0.673075458
#> [11] 0.089540802 0.401803647 3.489442389 0.627893821 0.222504312
#> [16] 3.221940312 0.894591039 0.096840929 0.792751494 1.236641601
#> [21] 0.001584416 0.339331094 0.609968357 0.286792550 0.471706657
#> [26] 0.691666512 0.219584091 0.687592096 0.737439448 0.692377554
#> [31] 1.577144836 0.747749279 1.237863334 0.791600110 0.830273570
#> [36] 1.187375782 0.416252943 1.386693400 1.209712475 1.110794824
#> [41] 2.140243438 2.399984105 1.428537318 1.055821644 1.517716920
#> [46] 1.590342977 0.706991956 0.914251843 1.973206057 1.939137967
#> [51] 1.671019536 0.091573195 3.972970145 0.955507259 0.415111831
#> [56] 0.931788087 2.508476277 2.759110197 1.600821991 0.404102398
#> [61] 0.963920649 3.568594663 14.075284215 5.965384961 0.849943296
#> [66] 1.141524697 2.287166025 1.359689933 1.504079464 2.344010556
#> [71] 0.629486701 1.049468037 1.268353928 2.135248556 1.616434750
#> [76] 2.119979067 0.269447046 1.830802524 1.784506205 0.637978496
#> [81] 1.039257790 0.413567656 5.704702300 0.307698959 2.670416866
#> [86] 0.691503674 2.185178349 2.001917127 1.529464596 0.374889846
#> [91] 1.283773190 3.774151143 2.857702789 4.220121496 6.504020855
#> [96] 1.762052847 3.572747930 1.679601429 1.776315457 1.140765375
#> ---
#> Prediction error SD for each training set sample:
#> [1] 0.6632510 1.0747613 0.6228398 0.3909717 0.6276894 0.2926764 0.4835952
#> [8] 0.2839815 0.3790893 0.2795853 0.2936881 0.2066487 0.7131719 0.3041280
#> [15] 0.1152717 0.5529949 0.2850841 0.1779433 0.4157889 0.3811892 0.2819642
#> [22] 0.2720929 0.4005652 0.3226532 0.2209190 0.4454591 0.3876323 0.2040831
#> [29] 0.4670555 0.4136547 0.3290377 0.1708938 0.4155863 0.1579950 0.1666879
#> [36] 0.2073140 0.5295909 0.4394888 0.3281213 0.2408321 0.4505811 0.1982922
#> [43] 0.2434650 0.6775080 0.5017698 0.4212760 0.5761228 0.3315790 0.2303946
#> [50] 0.2397685 0.3025108 0.1862496 0.3459049 0.3907732 0.2002375 0.3334323
#> [57] 0.2800619 0.2489747 0.4342413 0.3122039 0.3108747 0.1818030 0.3497313
#> [64] 0.1975296 0.2514031 0.5769985 0.2638713 0.2774563 0.3553371 0.3684134
#> [71] 0.2015989 0.2458771 0.2710750 0.2248169 0.3510436 0.4010946 0.5246710
#> [78] 0.5445661 0.2722141 0.3716945 0.3971595 0.1601688 0.2975253 0.2694639
#> [85] 0.1819710 0.3799211 0.1644366 0.4944335 0.1960038 0.3795414 0.3421108
#> [92] 0.3548105 0.6303994 0.5065287 0.2099194 0.1725128 0.2470902 0.4008990
#> [99] 0.1955731 0.4749818
#> ---
#> Absolute mean prediction error for each test set sample:
#> [[1]]
#> [1] 1.65850329 0.38377988 1.67378051 0.05017774 4.39395510 0.10970090
#> [7] 0.80312305 2.00112429 3.30352901 2.39702103 2.49110746 2.57201965
#> [13] 3.40522658 1.66881500 0.18343101 2.68753476 3.38755741 0.45423958
#> [19] 2.05966181 11.37567054 12.46448491 10.29298124 13.73648078 7.69634489
#> [25] 9.60840997 61.52340940 12.87208410 11.51112678 11.74810709 6.98371976
#> [31] 3.21827472 12.00306032 12.10550647 12.84024755 4.05105088 12.90499545
#> [37] 11.90523171 4.25191528 2.50046794 12.77903509 6.03664168 11.53383212
#> [43] 4.61455483 2.44420160 12.70236566 7.43903336 3.71605764 2.86329295
#> [49] 7.64329131 7.18551395
#>
#> [[2]]
#> [1] 3.4543176 2.4871241 11.3411211 10.5407448 0.6383966 2.5794987
#> [7] 1.3770894 7.8395977 1.2465770 0.4117513 36.6052600 31.2661905
#> [13] 35.1483035 31.4823204 35.0510448 29.4380334 40.1687529 34.5458973
#> [19] 23.2587832 77.0500127 28.5465786 23.0528668 25.4344415 34.8137632
#> [25] 30.6215780 22.2266004 19.7494606 0.2107381 0.8672154 3.3958982
#> [31] 2.7987705 2.3278965 1.7741967 0.7847005 6.0851973 1.5202727
#> [37] 5.0502711 46.4425419 0.7740034 1.3427991 3.9044839 1.0792567
#> [43] 4.9727341 0.7154980 3.8805581 15.7410126 3.2326191 14.0260989
#> [49] 2.5844449 6.2848282 12.2295047 10.8379296 11.2892066 12.5826622
#> [55] 7.0478476 12.8170478 11.4202239
#>
#> ---
#> Prediction error SD for each test set sample:
#> [[1]]
#> [1] 0.3435652 0.4202025 0.2808564 0.4494669 0.5071351 0.6865988
#> [7] 0.2336136 0.4024350 0.4419844 0.6074236 0.4874105 0.6118569
#> [13] 0.5662671 0.7046519 0.5319585 0.6922911 0.6478133 0.7873733
#> [19] 0.9220921 9.1907515 8.9865306 8.7179049 8.6820109 7.3214560
#> [25] 6.3265450 100.7454688 8.0020688 8.0038258 7.6711559 8.0594401
#> [31] 7.9284787 7.0233280 7.5094411 7.8319137 7.8441271 7.8257605
#> [37] 7.4775638 7.7852585 7.7400842 6.8966159 7.6805341 7.6719323
#> [43] 7.7588242 7.7021805 7.6900458 7.6252742 7.6487169 7.5851825
#> [49] 7.6648408 7.4873894
#>
#> [[2]]
#> [1] 7.6684654 7.5798389 7.6259424 7.2935751 7.9845870 7.5090642
#> [7] 7.5267845 6.6601347 7.4769873 7.5602014 23.1031563 19.9672830
#> [13] 20.0716499 21.5910269 20.7678312 21.0867461 21.3929124 21.3106238
#> [19] 21.2005159 88.4988937 21.1606533 21.1062933 20.6057930 20.9342275
#> [25] 20.9589177 20.7578514 20.7028630 0.2224852 0.6708101 0.5494225
#> [31] 0.3928201 0.5017335 0.3570617 0.3469072 0.5381546 0.1064181
#> [37] 0.3604482 105.9905919 0.2603238 0.3310628 0.2514998 0.1795206
#> [43] 0.3312007 0.3384145 0.7288403 8.9270759 7.4587752 8.2109699
#> [49] 7.9356333 7.9803569 7.9508740 6.9278026 7.8746692 7.7807236
#> [55] 7.9334114 7.9810786 7.7136865
#>
plot(ad)
# the interactive plot requires an HTML viewer
if (FALSE) {
  plot(ad, type = "interactive")
}
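Beyond printing and plotting, the per-sample summaries can be used directly, for instance to list test samples that look like they fall outside the model's applicability domain. The sketch below works on a mock object that only borrows the documented field names; the numbers and the cutoffs mean_cut and sd_cut are hypothetical.

```r
# Mock object with the same field names as the documented return value;
# the numbers are made up for illustration.
ad_mock <- list(
  te.error.mean = list(test.1 = c(0.4, 1.7, 11.4), test.2 = c(3.5, 36.6, 0.2)),
  te.error.sd   = list(test.1 = c(0.3, 0.4, 9.2),  test.2 = c(7.7, 23.1, 0.2))
)

# Hypothetical cutoffs; in practice they could be chosen from the
# training-set error distribution.
mean_cut <- 5
sd_cut <- 5

# For each test set, indices of samples exceeding either cutoff
flagged <- Map(
  function(m, s) which(m > mean_cut | s > sd_cut),
  ad_mock$te.error.mean, ad_mock$te.error.sd
)
```

With a real result, ad$te.error.mean and ad$te.error.sd would take the place of the mock lists.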