R Package Release Notes: ggsci, protr, and msaenet (Spring 2024)

Nan Xiao April 21, 2024 3 min read

Cute market in Rome, Italy. Photo by Mark Pecar.

Maintaining R packages is a significant time and effort commitment. The Releasing to CRAN chapter of the R Packages book provides an excellent overview of the many responsibilities package authors face. Henrik’s CRANhaven also helps me appreciate the efforts made by both CRAN maintainers and package maintainers to keep a consistent, high standard on software quality and integration within a single, trusted repository.

As a package maintainer myself, I currently oversee 10 personal packages on CRAN, most of which originated from previous research activities. This Spring, I took some time to clean up the issues in half of these packages. This post documents the most important improvements.

ggsci 3.0.3

ggsci is a ggplot2 extension package offering various color scales. ggsci 3.0.2 and 3.0.3 resolved some compatibility issues with ggplot 3.5.0. Specifically, ggplot2 3.5.0 deprecated the scale_name argument in discrete_scale(). This will generate warning messages when users call any color scale functions that did not remove this argument when using ggplot2 >= 3.5.0:

Warning message:
The `scale_name` argument of `discrete_scale()` is deprecated as of
ggplot2 3.5.0.

However, if we simply remove this argument in the color scale functions, R will complain under ggplot2 < 3.5.0 because this was a required argument:

Error in `discrete_scale()`:
! argument "scale_name" is missing, with no default
Backtrace:
 1. ggsci::scale_color_*()
 2. ggplot2::discrete_scale(...)
 3. ggplot2::ggproto(...)
 4. rlang::list2(...)
Execution halted

To maximize compatibility without requiring a recent minimum ggplot2 version, I implemented a more cautious approach by detecting the installed ggplot2 version at runtime and use different discrete_scale() calls based on the detected version. The internal functions I borrowed from the shiny package, such as is_installed(), are effective in solving this version detection problem (although a bit complicated).

protr 1.7-1

protr is a package for protein sequence feature engineering.

protr 1.7-1 brings crossSetSim() to feature parity with parSeqSim(). First, it extends crossSetSim() added in protr 1.7-0 to further support splitting similarity computation into batches and displaying computation progress. It also introduces a new function crossSetSimDisk() to offload the intermediate results from each batch to disk and merge them at the end, similar to the logic in parSeqSimDisk().

These enhancements might help scale your similarity computation between protein sequence sets when being bounded by available RAM, given the \(\mathcal{O}(n^2)\) or \(\mathcal{O}(mn)\) space complexity. However, specific situations may require further custom optimizations. Of course, using high-memory machines is a simple alternative solution.

msaenet 3.1.1

msaenet helps users explore the solution space of adaptive estimations when building sparse linear models, by supporting arbitrary numbers of adaptive estimation steps, initialization methods, and automatic parameter tuning. A previous post demonstrated how it can generate parsimonious solutions when modeling the sparse index tracking problem.

msaenet 3.1.1 resolves R CMD check notes and code example warnings accumulated from R version and dependency updates. It also improves code style and adopts proper three-number semantic versioning.

Where do we go from here?

I believe that proper delegation and transition of maintainership is critical for the long-term health and sustainability of open source projects. I have tried a few times in the past years to find new maintainers for my packages—if you are interested in taking some responsibilities, please reach out.