Nan Xiao

R developer
data science practitioner
machine learning researcher


msaenet · ggsci · liftr · protr · Rcpi · OHPL · hdnom · sevenbridges-r · enpls · RECA · grex · sbgr

R Packages

A collection of my R packages for machine learning, data visualization, and reproducible research.

msaenet: Multi-Step Adaptive Estimation Methods for Reducing False Positive Selection in Sparse Regressions

msaenet implements the multi-step adaptive elastic-net (MSAENet) algorithm for feature selection in high-dimensional regressions. Multi-step adaptive estimation based on MCP-net or SCAD-net is also supported.

Website CRAN GitHub Paper

ggsci: Scientific Journal and Sci-Fi Themed Color Palettes for ggplot2

ggsci offers a collection of ggplot2 color palettes inspired by scientific journals, data visualization libraries, science fiction movies, and TV shows.

Website CRAN GitHub

liftr: Containerize R Markdown Documents

liftr aims to solve the problem of persistent reproducible reporting. To achieve this goal, it extends the R Markdown metadata format, and uses Docker to containerize and render R Markdown documents.

Website CRAN GitHub

protr: R Package for Generating Various Numerical Representation Schemes of Protein Sequence

protr generates various numerical representation schemes of protein sequences for bioinformatics and proteochemometrics research.

Website CRAN GitHub Paper

Rcpi: R/Bioconductor Package for Generating Various Descriptors of Proteins, Compounds, and their Interactions

The Rcpi package emphasizes the comprehensive integration of bioinformatics and chemoinformatics into a molecular informatics platform for drug discovery.

Website Bioconductor GitHub Paper

OHPL: Ordered Homogeneity Pursuit Lasso for Group Variable Selection

OHPL implements the ordered homogeneity pursuit lasso (OHPL) algorithm for group variable selection. The method takes the homogeneity structure in high-dimensional data into account and exploits the grouping effect to select groups of important variables automatically. This makes it particularly useful for high-dimensional datasets with strongly correlated variables, such as spectroscopic data.

Website CRAN GitHub Paper

hdnom: Benchmarking and Visualization Toolkit for Penalized Cox Models

hdnom creates nomogram visualizations for penalized Cox regression models, with support for reproducible survival model building, validation, calibration, and comparison for high-dimensional data.

Website CRAN GitHub Paper

sevenbridges-r: Seven Bridges API Client, CWL Schema, Meta Schema, and SDK Helper in R

R client and utilities for the Seven Bridges platform API, from the Cancer Genomics Cloud to other Seven Bridges-supported platforms.

Website Bioconductor GitHub

enpls: R Package for Ensemble Partial Least Squares Regression

An algorithmic framework for measuring feature importance, detecting outliers, evaluating model applicability, and building ensemble predictive models with (sparse) partial least squares regression.

Website CRAN GitHub

RECA: R Package for Relevant Component Analysis (RCA) in Supervised Distance Metric Learning

Relevant Component Analysis (RCA) tries to find a linear transformation of the feature space such that the effect of irrelevant variability is reduced in the transformed space.

Website CRAN GitHub
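
As a rough illustration of the idea behind RCA (not the RECA package's interface), the sketch below assumes the textbook formulation: samples are grouped into "chunklets" known to share a class, the within-chunklet covariance is estimated, and the data are whitened by its inverse square root. The function name rca_sketch and its arguments are hypothetical, and every sample is assumed to belong to a chunklet.

```r
# Minimal RCA sketch in base R (hypothetical helper, not the RECA API).
# x:      n-by-p numeric matrix of samples
# chunks: length-n vector of chunklet labels (all samples assumed labeled)
rca_sketch <- function(x, chunks) {
  stopifnot(is.matrix(x), length(chunks) == nrow(x))
  centered <- x
  for (g in unique(chunks)) {
    idx <- chunks == g
    xg <- x[idx, , drop = FALSE]
    # Subtract each chunklet's own mean so only within-chunklet
    # (assumed irrelevant) variability remains.
    centered[idx, ] <- sweep(xg, 2, colMeans(xg))
  }
  # Within-chunklet covariance matrix.
  C <- crossprod(centered) / nrow(x)
  # Inverse square root of C via eigendecomposition
  # (assumes C is full rank, so all eigenvalues are positive).
  e <- eigen(C, symmetric = TRUE)
  W <- e$vectors %*% diag(1 / sqrt(e$values), nrow = length(e$values)) %*% t(e$vectors)
  list(C = C, W = W, x_new = x %*% W)
}

# Example with simulated data: 6 chunklets of 10 samples each.
set.seed(42)
x <- matrix(rnorm(60 * 3), ncol = 3)
fit <- rca_sketch(x, chunks = rep(1:6, each = 10))
```

In the transformed space, directions that vary a lot within chunklets are scaled down, which is the sense in which irrelevant variability is reduced.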

grex: Gene ID Mapping for Genotype-Tissue Expression (GTEx) Data

grex offers a minimal dependency solution for mapping Ensembl gene IDs to Entrez IDs, HGNC gene symbols, and UniProt IDs, for Genotype-Tissue Expression (GTEx) data.

Website CRAN GitHub

sbgr: R Client for Seven Bridges Genomics API (v1)

sbgr provides an R client for accessing the Seven Bridges Genomics API (v1).

Website GitHub

Web Applications

A collection of my Shiny apps for reproducible interactive data analysis.

DockFlow: Bioconductor Workflow Containerization and Orchestration with liftr

Proof-of-concept project exploring the technical feasibility and complexity of bioinformatics workflow containerization and orchestration using Docker and liftr. All 18 available Bioconductor workflows were containerized. GitHub

Web Application for Building Nomograms with High-Dimensional Data

This is the web application for the hdnom package. All 9 model types in the hdnom package are supported. It streamlines the process of nomogram building, model validation, model calibration, and reproducible report generation.

The web app has been included in the Shiny User Showcase. Paper

TargetNet: Shiny Web Application for Drug Target Identification with Large-Scale Public Binding Affinities Data

Web application for predicting the binding probability of 623 potential drug targets for given molecule(s). Driven by machine learning modeling of large-scale public chemogenomics data. GitHub

ProtrWeb: Shiny Web Application for Protein Sequence-Derived Descriptor Computation

Web application for computing 14 types of protein sequence-derived structural and physicochemical features in bioinformatics. GitHub Paper

ImgSVD: Shiny Web Application for Image Compression via Singular Value Decomposition

ImgSVD is a Shiny app for image compression via singular value decomposition (SVD). ImgSVD is inspired by Yihui Xie's comment in Yixuan Qiu's article on image compression via singular value decomposition with the R package rARPACK. [Photo credit: Crouching Tiger, Hidden Dragon] GitHub
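
The idea of SVD-based compression can be sketched in a few lines of base R. This is a minimal illustration, not ImgSVD's actual code: the function name compress_svd is hypothetical, the "image" is a synthetic grayscale matrix, and base svd() is used instead of rARPACK.

```r
# Rank-k approximation of a grayscale image matrix via truncated SVD.
# compress_svd is a hypothetical helper for illustration only.
compress_svd <- function(img, k) {
  stopifnot(is.matrix(img), k >= 1, k <= min(dim(img)))
  s <- svd(img)
  # Keep only the k largest singular values and their vectors.
  u <- s$u[, seq_len(k), drop = FALSE]
  d <- s$d[seq_len(k)]
  v <- s$v[, seq_len(k), drop = FALSE]
  # d * t(v) scales row i of t(v) by d[i], i.e. diag(d) %*% t(v).
  u %*% (d * t(v))
}

# Synthetic 64-by-48 "image" standing in for real pixel intensities.
img <- outer(1:64, 1:48, function(i, j) sin(i / 8) + cos(j / 6))

approx1 <- compress_svd(img, k = 1)
approx2 <- compress_svd(img, k = 2)

# Frobenius-norm reconstruction error for each rank.
err1 <- norm(img - approx1, "F")
err2 <- norm(img - approx2, "F")
```

Storing a rank-k approximation of an m-by-n image takes k * (m + n + 1) numbers instead of m * n, so a small k trades image detail for space.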

Signify: Shiny Web Application for Making Your p-value Sound Significant

Signify is a Shiny-based web application for making your (>0.05) p-values sound significant. The application is powered by the data from Matthew Hankins. GitHub

R Document Archives

Developing Web Applications with R and Apache
Chinese translation of "Developing Web Applications with R and Apache"

Created by Jeffrey Horner, rApache is a project that supports web application development using the R statistical language and environment together with the Apache web server.

This is the Chinese translation of the documentation, first released in March 2010. The current version was revised in December 2011.

Read More

Google's R Style Guide
Chinese translation of Google's R coding style guide

Google's style guide for R gurus. The guide emphasizes that consistency and attention to detail are essential in R coding.

Translated in December 2011.

Read More

© Nan Xiao 2017