# Building Regularized Logistic Regressions from Scratch with Computational Graphs in R

The R package built for this post is available on GitHub: nanxstats/logreg.

As one of the cornerstones for deep learning frameworks, automatic differentiation was briefly mentioned in our previous post. Today let’s focus on the other important piece: the computational graph. I’m only able to write something about this, thanks to Ron Triepels, the author of the recently published R package cgraph. This package offers a decent, minimalist implementation for computational graphs and automatic differentiation in native C and R. Downhill into the forest. Photo by Joakim Honkasalo.

The computational graph is a useful abstraction/framework for describing and executing computational workflows for data. In many cases, the workflows/pipelines are composed of very reusable components, such as all the types of layers in neural networks. Besides machine learning applications, another interesting example is the Common Workflow Language (CWL). CWL is widely used for describing bioinformatics workflows. The components in the workflow described in YAML or JSON formats are parsed to create a computational graph to run through later.

To me, one of the benefits of using computational graphs is that you will notice the issues immediately if there is an error compiling the graph before even running things, say, mistakes were made when describing the workflow. This may help enforce and improve code quality and reproducibility. Plus, The modularized computational components makes it particularly easy to prototype new machine learning algorithms, just like legos. On the performance side, since the data and computation flow is predictable, many aggressive optimizations are possible, for example, on parallelization.

To experience how powerful computational graph + automatic differentiation is for solving classical machine learning problems, I took a few hours to use cgraph and built an R package logreg. I implemented three methods here:

• Logistic regression
• Logistic regression with the ridge ($$\ell_2$$) penalty
• Logistic regression with the seamless-$$\ell_0$$ (SELO) penalty, a differentiable approximation of the $$\ell_0$$ penalty.

Although these methods could be implemented very differently before, the core implementation with cgraph is surprisingly similar: pretty much just replacing the loss/cost functions, as is shown below (SELO). Check out this vignette for test examples, and please feel free to leave me a message if you find any bugs in the code.

# initialize graph
graph <- cg_graph()

# initialize input (X), target (y)
input <- cg_input("input")
target <- cg_input("target")

# intialize parameters (beta, alpha)
parms <- list(
beta = cg_parameter(initialize_glorot_normal(ncol(x), 1), "beta"),
alpha = cg_parameter(initialize_constant(0), "alpha")
)

# define the model (yhat = beta X + alpha)
output <- cg_sigmoid(cg_add(cg_matmul(input, parms$beta), cg_as_numeric(parms$alpha)), "output")

# define the selo loss (y, yhat, beta)
cg_mean(crossentropy(target, output)),
cg_div(
cg_mean(cg_ln(cg_add(cg_div(cg_abs(parms$beta), cg_add(cg_abs(parms$beta), tau)), 1))),
log(2)
),
"loss"
)

# track errors
error <- rep(0, n_epochs)

# optimize parameters via sgd
for (i in 1:n_epochs) {
values <- cg_graph_run(graph, loss, list(input = x, target = y))
for (parm in parms) parm$value <- parm$value - learning_rate * grads[[parm$name]] error[i] <- values$loss
}