catboost - parameter tuning and model selection with k-fold cross-validation and grid search

Usage

cv_catboost(
  x,
  y,
  params = cv_param_grid(),
  n_folds = 5,
  n_threads = 1,
  seed = 42,
  verbose = TRUE
)

Arguments

x

Predictor matrix.

y

Response vector.

params

Parameter grid generated by cv_param_grid().

n_folds

Number of folds. Default is 5.

n_threads

The number of parallel threads. For optimal speed, match this to the number of physical CPU cores rather than logical threads. See the respective model documentation for more details. Default is 1.

seed

Random seed for reproducibility.

verbose

Show progress?
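As a hedged illustration of the n_threads advice, the sketch below counts physical cores with parallel::detectCores(). Whether logical = FALSE is honoured depends on the platform, and the call can return NA, so the sketch falls back to a single thread. The variable names are illustrative and not part of this package.

```r
# Ask for physical cores; on some platforms this may return NA
# or the logical count instead.
n_physical <- parallel::detectCores(logical = FALSE)

# Fall back to one thread if the count is unavailable.
n_threads <- if (is.na(n_physical)) 1L else max(1L, as.integer(n_physical))
n_threads
```

The resulting value could then be passed as n_threads to cv_catboost().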

Value

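The complete tuning grid with the cross-validated AUC value for each parameter combination, together with the best parameter combination and the highest AUC value. In the example below, the grid is accessed as the df element of the returned object.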
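To make the idea of grid search with k-fold cross-validation concrete, here is a minimal base R sketch. It is not the package's implementation: a logistic regression (glm) stands in for catboost, the tuning parameter n_vars is made up, the data are simulated, and pooling out-of-fold predictions before computing AUC is an assumption about how the metric could be computed.

```r
set.seed(42)

# Toy binary classification data (hypothetical, for illustration only).
n <- 200
x <- matrix(rnorm(n * 3), ncol = 3, dimnames = list(NULL, c("x1", "x2", "x3")))
y <- rbinom(n, 1, plogis(x[, 1] - 0.5 * x[, 2]))

# Rank-based (Mann-Whitney) AUC; tied scores receive average ranks.
auc <- function(score, label) {
  r <- rank(score)
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Each row of the grid is one candidate parameter combination.
# "n_vars" is a made-up parameter for the stand-in model.
grid <- expand.grid(n_vars = 1:3)

# Randomly assign every observation to one of n_folds folds.
n_folds <- 5
fold_id <- sample(rep(seq_len(n_folds), length.out = n))

grid$metric <- vapply(seq_len(nrow(grid)), function(i) {
  dat <- data.frame(y = y, x[, seq_len(grid$n_vars[i]), drop = FALSE])
  oof <- numeric(n) # out-of-fold predictions
  for (k in seq_len(n_folds)) {
    test <- fold_id == k
    # Fit on the other folds, predict on the held-out fold.
    fit <- glm(y ~ ., data = dat[!test, , drop = FALSE], family = binomial())
    oof[test] <- predict(fit, newdata = dat[test, , drop = FALSE], type = "response")
  }
  auc(oof, y)
}, numeric(1))

grid
```

Every observation is predicted exactly once by a model that did not see it during fitting, so the metric column estimates out-of-sample performance for each grid row.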

Examples

sim_data <- msaenet::msaenet.sim.binomial(
  n = 100,
  p = 10,
  rho = 0.6,
  coef = rnorm(5, mean = 0, sd = 10),
  snr = 1,
  p.train = 0.8,
  seed = 42
)

params <- cv_catboost(
  sim_data$x.tr,
  sim_data$y.tr,
  params = cv_param_grid(
    n_iterations = c(100, 200),
    max_depth = c(3, 5),
    learning_rate = c(0.1, 0.5)
  ),
  n_folds = 5,
  n_threads = 1,
  seed = 42,
  verbose = FALSE
)

params$df
#>   iterations depth    metric
#> 1        100     3 0.7788221
#> 2        200     3 0.8007519
#> 3        100     5 0.7907268
#> 4        200     5 0.7719298
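
If only the best row of the grid is needed, it can be picked out with which.max(). The sketch below rebuilds the data frame from the printed output above so that it runs on its own; with the real result, params$df would be used in place of df.

```r
# Tuning grid copied from the printed output above.
df <- data.frame(
  iterations = c(100, 200, 100, 200),
  depth = c(3, 3, 5, 5),
  metric = c(0.7788221, 0.8007519, 0.7907268, 0.7719298)
)

# Row with the highest AUC.
best <- df[which.max(df$metric), ]
best
#>   iterations depth    metric
#> 2        200     3 0.8007519
```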