Performance¶
revdeprun is designed to minimize time-to-results. Its performance model is:
- Setup time is mostly network + I/O + some single-core work.
- Checking time is embarrassingly parallel and benefits strongly from more and faster CPU cores.
Scaling tips¶
- On very large machines, raise file descriptor limits: ulimit -n 10240.
- Expect diminishing returns: the slowest individual packages dominate near the end.
- Disk matters: R CMD check can write a lot, so avoid tiny or slow storage.
Parallel workers for reverse dependency check¶
The CLI option --num-workers controls how many parallel workers
xfun::rev_check() uses. By default, revdeprun uses all available CPU cores.
Binary package downloads¶
revdeprun configures the async HTTP connection limits of pak/pkgcache to accelerate binary package downloads from P3M.
By default, binary package downloads use the same auto-tuned total connection limit as --max-connections (see below), with a cap of 50 connections per host, up from the pkgcache default of 6 connections per host.
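Conceptually there are two knobs here: a total connection cap and a per-host cap. As a rough, hedged illustration of those two limits using curl's connection-pool API (how revdeprun actually wires them into pkgcache is not shown, and a user-supplied pool is an assumption, not revdeprun's mechanism):

```r
# Illustration of the two connection limits only; this is not revdeprun's
# actual mechanism for configuring pkgcache.
library(curl)

pool <- new_pool(
  total_con = 512, # overall cap, e.g. the auto-tuned --max-connections value
  host_con  = 50   # per-host cap, raised from pkgcache's default of 6
)
```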
Source tarball downloads¶
xfun::rev_check() downloads reverse dependency tarballs from CRAN. Historically
these downloads were serial, which is painfully slow at 1,000+ packages.
assets/patch-xfun.R patches xfun:::download_tarball() to download in parallel
using parallel::mcmapply(). Concurrency is controlled by
getOption("xfun.rev_check.download_cores"); the default is 50.
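A simplified sketch of the same approach (this is not the actual patch in assets/patch-xfun.R; the CRAN URL layout, helper name, and error handling are illustrative):

```r
# Sketch of parallel tarball downloads via parallel::mcmapply(); not the
# actual patch in assets/patch-xfun.R.
download_tarballs <- function(pkgs, versions, dest_dir = "revdep-src") {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  urls  <- sprintf("https://cran.r-project.org/src/contrib/%s_%s.tar.gz",
                   pkgs, versions)
  dests <- file.path(dest_dir, basename(urls))
  cores <- getOption("xfun.rev_check.download_cores", 50)
  # mcmapply() forks the downloads (Unix-alikes only), running up to
  # mc.cores of them at a time.
  parallel::mcmapply(
    function(url, dest) try(download.file(url, dest, quiet = TRUE), silent = TRUE),
    urls, dests,
    mc.cores = cores
  )
  invisible(dests)
}
```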
Auto-tuning --max-connections¶
Behind the scenes, revdeprun computes a safe --max-connections value for
running Rscript. The --max-connections argument has been available since R 4.4.0.
When xfun::rev_check() runs with many parallel workers, the main R process
can run out of connections (pipes for inter-process communication + various
other connections opened during installs and checks). To avoid that, revdeprun
auto-tunes Rscript --max-connections based on the worker count
(--num-workers).
The heuristic is implemented in src/util.rs (util::optimal_max_connections),
where Ncpus is the number of parallel workers revdeprun will use.
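For reference, the computation can be transcribed into R (the real implementation is the Rust function named above; this only mirrors the arithmetic):

```r
# R transcription of the --max-connections heuristic; the actual code is
# util::optimal_max_connections in src/util.rs.
optimal_max_connections <- function(ncpus) {
  baseline <- max(128, 3 * ncpus + 64)       # ~2 per worker + ~50% headroom + fixed cushion
  rounded  <- ceiling(baseline / 128) * 128  # round up to the next multiple of 128
  min(rounded, 4096)                         # respect R's hard cap of 4096
}

optimal_max_connections(32)   # 256
optimal_max_connections(128)  # 512
```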
Why this works¶
- The main process typically needs about 1 to 2 R connections per worker (pipes for result/control) plus a small fixed overhead.
- Multiplying workers by 3 bakes in ~50% headroom over a 2-per-worker baseline for miscellaneous connections opened by your code and the parallel machinery.
- The extra +64 is a fixed cushion for the main process's own connections.
- It respects R's hard cap (4096) and the legacy default (128), and rounds up to the next multiple of 128 for stable "round number" values.
Example values¶
The unrounded baseline is max(128, 3 * Ncpus + 64). revdeprun then rounds up
to the next multiple of 128 and caps the final value at 4096.
- 16 cores → computed 112 → 128 (lifted to the 128 floor)
- 32 cores → computed 160 → 256
- 128 cores → computed 448 → 512
- 256 cores → computed 832 → 896
- 384 cores → computed 1216 → 1280
pkgdepends scheduler patch¶
pak embeds pkgdepends for install planning and scheduling. With thousands of packages, two effects show up:
- The worker pool can under-fill when many small binary installs finish between polls.
- Dependency bookkeeping can become expensive for very large plans.
assets/patch-pkgdepends.R patches the scheduler to refill the worker pool
more aggressively and avoid unnecessary work.
The monkey patch is applied before calling pak::pkg_install().
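The general monkey-patching pattern in R looks roughly like the sketch below. This is not the contents of assets/patch-pkgdepends.R: the patched function name is hypothetical, the replacement body is elided, and how revdeprun makes the patch visible to the pkgdepends code that pak actually runs is not shown.

```r
# Sketch of monkey-patching an internal pkgdepends function; "schedule_next"
# is a hypothetical name, and the guard keeps the sketch safe to run as-is.
patch_pkgdepends <- function(fn_name = "schedule_next") {
  ns <- asNamespace("pkgdepends")
  if (!exists(fn_name, envir = ns, inherits = FALSE)) {
    warning("nothing to patch: ", fn_name, " not found in pkgdepends")
    return(invisible(FALSE))
  }
  orig <- get(fn_name, envir = ns, inherits = FALSE)
  patched <- function(...) {
    # ...adjusted pool-refill / bookkeeping logic would go here...
    orig(...)
  }
  utils::assignInNamespace(fn_name, patched, ns = "pkgdepends")
  invisible(TRUE)
}

patch_pkgdepends()        # applied before install planning starts
# pak::pkg_install(pkgs)  # then install as usual
```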