Nan Xiao

Genomic Data Scientist. Seven Bridges Genomics, Inc. Cambridge, MA, USA.
PhD Candidate. Statistics. Central South University. Changsha, Hunan, China.

Software Index

R Packages: msaenet · ggsci · liftr · protr · Rcpi · OHPL · hdnom · sevenbridges-r · enpls · RECA · grex · sbgr

Web Applications: DockFlow · hdnom web app · TargetNet · ProtrWeb · ImgSVD · Signify

R Packages

A collection of my R packages for machine learning, data visualization, and reproducible research.


msaenet: Multi-Step Adaptive Estimation Methods for Reducing False Positive Selection in Sparse Regressions

msaenet implements the multi-step adaptive elastic-net (MSAENet) algorithm for feature selection in high-dimensional regressions. Multi-step adaptive estimation based on MCP-net or SCAD-net is also supported.

Website CRAN GitHub Paper


ggsci: Scientific Journal and Sci-Fi Themed Color Palettes for ggplot2

ggsci offers a collection of ggplot2 color palettes inspired by scientific journals, data visualization libraries, science fiction movies, and TV shows.

Website CRAN GitHub


liftr: Containerize R Markdown Documents

liftr aims to solve the problem of persistent reproducible reporting. To achieve this goal, it extends the R Markdown metadata format, and uses Docker to containerize and render R Markdown documents.

Website CRAN GitHub


protr: R Package for Generating Various Numerical Representation Schemes of Protein Sequence

An R package for generating various numerical representation schemes of protein sequences for bioinformatics and proteochemometrics research.

Website CRAN GitHub Paper
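
To make "numerical representation scheme" concrete, here is a minimal sketch of one of the simplest such schemes, amino acid composition: the fraction of each of the 20 standard amino acids in a sequence. It is written in Python for illustration and does not use or reproduce protr's R API; the function name `amino_acid_composition` and the example sequence are assumptions made for this sketch.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a dict mapping each standard amino acid to its fraction in `sequence`.

    Hypothetical helper for illustration; not part of protr.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    unknown = set(seq) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"non-standard residues found: {sorted(unknown)}")
    counts = Counter(seq)
    # Counter returns 0 for residues that do not occur in the sequence.
    return {aa: counts[aa] / len(seq) for aa in AMINO_ACIDS}


# A short made-up peptide, used only to exercise the function.
features = amino_acid_composition("MKTAYIAKQR")
print(len(features), features["K"], features["W"])
```

Every sequence, whatever its length, is mapped to a fixed-length vector of 20 numbers, which is what makes descriptors of this kind usable as inputs to machine learning models.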


Rcpi: R/Bioconductor Package for Generating Various Descriptors of Proteins, Compounds, and their Interactions

The Rcpi package emphasizes the comprehensive integration of bioinformatics and chemoinformatics into a molecular informatics platform for drug discovery.

Website Bioconductor GitHub Paper


OHPL: Ordered Homogeneity Pursuit Lasso for Group Variable Selection

Ordered homogeneity pursuit lasso (OHPL) algorithm for group variable selection. The OHPL method takes the homogeneity structure in high-dimensional data into account and enjoys the grouping effect to select groups of important variables automatically. This feature makes it particularly useful for high-dimensional datasets with strongly correlated variables, such as spectroscopic data.

Website CRAN GitHub Paper


hdnom: Benchmarking and Visualization Toolkit for Penalized Cox Models

hdnom creates nomogram visualizations for penalized Cox regression models, with the support of reproducible survival model building, validation, calibration, and comparison for high-dimensional data.

Website CRAN GitHub Paper


sevenbridges-r: Seven Bridges API Client, CWL Schema, Meta Schema, and SDK Helper in R

R client and utilities for the Seven Bridges platform API, covering the Cancer Genomics Cloud and other platforms supported by Seven Bridges.

Website Bioconductor GitHub


enpls: R Package for Ensemble Partial Least Squares Regression

Algorithmic framework for measuring feature importance, outlier detection, model applicability evaluation, and ensemble predictive modeling with (sparse) partial least squares regressions.

Website CRAN GitHub


RECA: R Package for Relevant Component Analysis (RCA) in Supervised Distance Metric Learning

Relevant Component Analysis (RCA) tries to find a linear transformation of the feature space such that the effect of irrelevant variability is reduced in the transformed space.

Website CRAN GitHub


grex: Gene ID Mapping for Genotype-Tissue Expression (GTEx) Data

grex offers a minimal dependency solution for mapping Ensembl gene IDs to Entrez IDs, HGNC gene symbols, and UniProt IDs, for Genotype-Tissue Expression (GTEx) data.

Website CRAN GitHub
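
The kind of mapping described above can be sketched as a table lookup. This is a Python illustration, not grex's R interface: the two-row `GENE_TABLE`, the function names, and the assumption that input IDs may carry a version suffix such as `.11` are all choices made for this sketch.

```python
# Tiny illustrative lookup table; a real mapping table would cover all genes.
GENE_TABLE = {
    "ENSG00000141510": {"entrez": "7157", "symbol": "TP53", "uniprot": "P04637"},
    "ENSG00000012048": {"entrez": "672", "symbol": "BRCA1", "uniprot": "P38398"},
}

EMPTY = {"entrez": None, "symbol": None, "uniprot": None}


def strip_version(ensembl_id):
    """Drop a trailing version suffix, e.g. 'ENSG00000141510.11' -> 'ENSG00000141510'."""
    return ensembl_id.split(".", 1)[0]


def map_gene_ids(ensembl_ids):
    """Map Ensembl gene IDs to Entrez IDs, HGNC symbols, and UniProt IDs.

    IDs missing from the table are kept in the output with None values,
    so the result always has one row per input ID, in the same order.
    """
    rows = []
    for raw_id in ensembl_ids:
        key = strip_version(raw_id)
        rows.append({"ensembl": key, **GENE_TABLE.get(key, EMPTY)})
    return rows


for row in map_gene_ids(["ENSG00000141510.11", "ENSG00000012048", "ENSG00000000000"]):
    print(row)
```

Keeping unmatched IDs as rows of missing values, rather than dropping them, preserves the alignment between the input and the mapped output.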


sbgr: R Client for Seven Bridges Genomics API (v1)

sbgr provides an R client for accessing the Seven Bridges Genomics API (v1).

Website GitHub

Web Applications

A collection of my Shiny apps for reproducible interactive data analysis.


DockFlow: Bioconductor Workflow Containerization and Orchestration with liftr

Proof-of-concept project exploring the technical feasibility and complexity of containerizing and orchestrating bioinformatics workflows with Docker and liftr. All 18 available Bioconductor workflows were containerized. GitHub

Web Application for Building Nomograms with High-Dimensional Data

The web application for the hdnom package. All 9 model types in the hdnom package are supported. It streamlines the process of nomogram building, model validation, model calibration, and reproducible report generation.

This app was selected for the Shiny User Showcase. Paper


TargetNet: Shiny Web Application for Drug Target Identification with Large-Scale Public Binding Affinities Data

Web application for predicting the binding probability of 623 potential drug targets for given molecule(s). Driven by machine learning modeling of large-scale public chemogenomics data. GitHub


ProtrWeb: Shiny Web Application for Computing Protein Sequence-Derived Descriptors

Web application for computing 14 types of protein sequence-derived structural and physicochemical features in bioinformatics. GitHub Paper


ImgSVD: Shiny Web Application for Image Compression via Singular Value Decomposition

ImgSVD is a Shiny app for image compression via singular value decomposition (SVD). ImgSVD is inspired by Yihui Xie's comment in Yixuan Qiu's article on image compression via singular value decomposition with the R package rARPACK. GitHub
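
The idea behind SVD-based image compression can be sketched in a few lines: keep only the k largest singular values and their singular vectors, which gives the best rank-k approximation of the image matrix. The sketch below uses Python with NumPy rather than ImgSVD's own R code, and a synthetic 64 × 64 array stands in for a real grayscale image; the function name `compress_svd` is an assumption made for this example.

```python
import numpy as np


def compress_svd(image, k):
    """Return the rank-k approximation of a 2-D array via truncated SVD."""
    u, s, vt = np.linalg.svd(image, full_matrices=False)
    # Scale the first k left singular vectors by their singular values,
    # then project back with the first k right singular vectors.
    return (u[:, :k] * s[:k]) @ vt[:k, :]


# Synthetic "image": a smooth gradient plus a little noise (not real image data).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 64)
image = np.outer(x, x) + 0.01 * rng.standard_normal((64, 64))

for k in (1, 5, 20):
    approx = compress_svd(image, k)
    rel_error = np.linalg.norm(image - approx) / np.linalg.norm(image)
    stored = k * (image.shape[0] + image.shape[1] + 1)  # numbers kept for rank k
    print(f"k={k:2d}  stored values={stored:5d}  relative error={rel_error:.4f}")
```

Storing a rank-k approximation takes k(m + n + 1) numbers instead of mn for an m × n image, so a small k trades reconstruction error for storage.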


Signify: Shiny Web Application for Making Your p-value Sound Significant

Signify is a Shiny-based web application for making your (>0.05) p-values sound significant. The application is powered by the data from Matthew Hankins. GitHub

R Document Archives


Developing Web Applications with R and Apache

Chinese title: 使用 R 和 Apache 开发 Web 应用程序 (Developing Web Applications with R and Apache)

Created by Jeffrey Horner, rApache is a project that supports web application development using the R statistical language and environment together with the Apache web server.

This is the Chinese translation of the documentation, first released in March 2010. The current version was revised in December 2011.



Google's R Style Guide

Chinese title: 来自 Google 的 R 语言编码风格指南 (R Coding Style Guide from Google)

Google's style guide for R gurus. The guide emphasizes that consistency and attention to detail are essential in R coding. Translated in December 2011.


© 2018 Nan Xiao