Regularized Logistic Regressions with Computational Graphs

Introduction

The logreg package implemented logistic regression and regularized logistic regression models with the computational graph and automatic differentiation framework provided by the R package cgraph.

In this vignette, we will test the methods implemented in this package:

Logistic regression
Logistic regression with the ridge penalty (\(\ell_2\) regularization)
Logistic regression with the seamless-\(\ell_0\) (SELO) penalty (differentiable approximation for \(\ell_0\) regularization)

We will also compare them with one existing method to make sure the data can be fitted by a reasonable regularized logistic regression model:

Logistic regression with multi-step adaptive elastic-net (\(\ell_1\)-\(\ell_2\) based but approximately non-convex regularization)

Generate data

Let’s simulate some data for testing high-dimensional linear models:

library("logreg")
library("msaenet")

sim <- msaenet.sim.binomial(
  n = 500, p = 500, rho = 0.5, coef = rnorm(5, sd = 5), snr = 2,
  p.train = 0.7, seed = 2019
)

Now we have

350 samples in the training set and 150 in the test set;
500 variables with the first 5 of them being true variables;
Moderate correlations between variables (rho = 0.5) and moderate signal-to-noise ratio (snr = 2).

Fit the models

Fit the models on the training set:

fit_base <- fit_logistic(x = sim$x.tr, y = sim$y.tr, n_epochs = 500)
fit_ridge <- fit_logistic_ridge(x = sim$x.tr, y = sim$y.tr, n_epochs = 500, lambda = 0.5)
fit_selo <- fit_logistic_selo(x = sim$x.tr, y = sim$y.tr, n_epochs = 500, tau = 0.05)
fit_msaenet <- msaenet(sim$x.tr, sim$y.tr, family = "binomial", init = "ridge", tune = "ebic", nsteps = 10L, seed = 2009)

Convergence

Plot the training errors vs. epochs:

par(mfrow = c(1, 3))
plot_error(fit_base)
plot_error(fit_ridge)
plot_error(fit_selo)

The ridge model converged around 100 epochs, much faster than the other two models (vanilla and SELO).

Estimated coefficients

Let’s plot the estimated coefficients and see if they meet our expectations.

The logistic regression got the overall correct estimation for true variables; other variables got smaller coefficients:

plot_coef(fit_base)

For the ridge model, some shrinkage effects are observed while no sparsity was induced, as expected:

plot_coef(fit_ridge)

For the SELO model, we can see the shrinkage effect and apparent sparsity in the estimation results. Note that the near-zero coefficients are not estimated as precisely 0, due to the limitation of our unconstrained gradient-based optimization method.

plot_coef(fit_selo)

The multi-step adaptive elastic-net gave us a model closest to the true model, with 4 in 5 true variables estimated to be non-zero, with all others being 0 (1 false negative):

plot_coef(fit_msaenet)

Predictive performance

Let’s compute AUC on the training and test set.

The logistic regression model sets the baseline, and overfits the training set:

c(auc(sim$y.tr, predict(fit_base, sim$x.tr)), auc(sim$y.te, predict(fit_base, sim$x.te)))
#> [1] 1.0000000 0.6626506

The ridge model clearly overfits the training set too, but with 3% to 4% AUC improvement on the test set compared to the baseline:

c(auc(sim$y.tr, predict(fit_ridge, sim$x.tr)), auc(sim$y.te, predict(fit_ridge, sim$x.te)))
#> [1] 0.9923589 0.6951987

The SELO model overfits the training set, with an almost 10% AUC improvement on the test set compared to the baseline:

c(auc(sim$y.tr, predict(fit_selo, sim$x.tr)), auc(sim$y.te, predict(fit_selo, sim$x.te)))
#> [1] 0.9912813 0.7574177

The multi-step adaptive elastic-net model has similar AUCs over the training set and set set. with a 20% AUC improvement on the test set compared to the baseline:

c(auc(sim$y.tr, predict(fit_msaenet, sim$x.tr)), auc(sim$y.te, predict(fit_msaenet, sim$x.te)))
#> [1] 0.8928618 0.8730444