I am pleased to introduce pkglite for Python, now available on PyPI. pkglite for Python provides a minimalist framework for packing source packages written in any programming language into a text file and restoring them to their original directory structure. You can install it with:
pip install pkglite
For installation as a global command-line tool, use pipx:
pipx install pkglite
Context
Four years ago, we released the R package pkglite, designed for bidirectional conversion between R source packages and plain text files. The motivation behind pkglite for R, including its role in supporting eCTD submissions, was summarized in our Clinical Trials paper (Zhao et al. 2023). We also explored its application in retrieval-augmented generation to provide code context to large language models.
pkglite for R has since been adopted in multiple real-world applications, including the first two R Consortium open-source R submissions (pilot 1, pilot 2), and sponsor-led submissions (webinar 1, webinar 2). It is also featured in the pharmaverse end-to-end clinical reporting packages.
The overwhelmingly positive feedback on the R package motivated me to consider: what’s next?
Creating pkglite for Python
To address this, I asked myself a broader question: what do people need? Many colleagues and collaborators of mine use Python and other languages daily. Extending the capability to pack and exchange source packages written in their language of choice into plain text felt like a natural progression.
This goal led to several key design updates in pkglite for Python:
Language-agnostic design. To support any programming language, I replaced R-specific packing scope specifications with a .pkgliteignore configuration file, compliant with the gitignore format standard.
Command-line interface. Alongside the Python API, pkglite for Python includes a CLI built with Typer. This enables seamless integration into shell scripts and standard engineering workflows.
Optimized tooling. File type recognition now relies on a content-based zlib algorithm, replacing file-extension-based methods. The packed file parser has been rewritten using finite state machines for improved maintainability. Additionally, UTF-8 encoding is enforced for all text files across platforms.
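To make the idea of content-based file type recognition concrete, here is a minimal sketch of the text-versus-binary heuristic documented with zlib, assuming that is the algorithm referred to above. It is not pkglite's actual code; the function name `looks_like_text` is hypothetical. The rule: data counts as text if it contains at least one "allowed" byte and no "blocked" byte, with a small gray list of control bytes tolerated either way.

```python
# Sketch of the zlib-style text detection heuristic (an illustration only,
# not pkglite's implementation; names are hypothetical).

# Allowed textual bytes: TAB, LF, CR, and everything from SPACE to 255.
ALLOW = frozenset({9, 10, 13}) | frozenset(range(32, 256))
# Tolerated bytes that neither prove nor disprove text: BEL, BS, VT, FF, SUB, ESC.
GRAY = frozenset({7, 8, 11, 12, 26, 27})
# Blocked bytes: every remaining control byte below SPACE.
BLOCK = frozenset(range(32)) - ALLOW - GRAY


def looks_like_text(data: bytes) -> bool:
    """Return True if data has an allowed byte and no blocked byte."""
    seen = set(data)  # iterating bytes yields integers in 0..255
    if seen & BLOCK:
        return False
    return bool(seen & ALLOW)


if __name__ == "__main__":
    print(looks_like_text("café\n".encode("utf-8")))
    print(looks_like_text(b"\x00\x01binary"))
```

Because the decision depends only on byte values, a source file with an unusual or missing extension is still classified by what it contains. A real tool would likely inspect only a leading chunk of each file rather than the whole content; that choice is left out of this sketch.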
Performance testing
pkglite for Python is developed with performance in mind. Using this bench.sh script, I tested packing and unpacking an entire Python project with the virtual environment, which offers a realistic mix of text and binary files.
On an M2 MacBook Air (2022), packing the project generated a 1.18 GB text file in 16 seconds:
Packing...
Packing complete.
Output file size: 1.18 GB
Packing time: 16 seconds
Packing write speed: 76.04 MB/s
Unpacking from the file took only 8 seconds:
Unpacking...
Unpacking complete.
Unpacking time: 8 seconds
Unpacking read speed: 152.09 MB/s
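The two runs can be compared with a line of arithmetic: throughput is output size divided by elapsed time. The helper below is a hypothetical illustration, not part of bench.sh. It plugs in the rounded figures printed above and assumes 1 GB = 1024 MB, so its results differ slightly from the logged speeds, which were presumably computed from the exact byte count.

```python
def throughput_mb_per_s(size_gb: float, seconds: float) -> float:
    """Convert a size in GB (assuming 1 GB = 1024 MB) and a duration into MB/s."""
    return size_gb * 1024 / seconds


# Rounded values taken from the benchmark output above.
pack_speed = throughput_mb_per_s(1.18, 16)
unpack_speed = throughput_mb_per_s(1.18, 8)

print(f"pack: {pack_speed:.2f} MB/s, unpack: {unpack_speed:.2f} MB/s")
print(f"unpack/pack ratio: {unpack_speed / pack_speed:.1f}x")
```

Since both runs move the same file, halving the time doubles the throughput, which matches the roughly 2x gap between the reported write and read speeds.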
Acknowledgements
I am deeply grateful to my original pkglite R package coauthors, Keaven Anderson and Yilong Zhang, for their encouragement to explore new ideas. My thanks also go to Ross Farrugia and the pharmaverse council for including the package and providing GitHub hosting, plus everyone who kindly contributed valuable time for code reviews and validations.