Data Science Practitioner
Machine Learning Researcher
Invited talk at the 10th China R Conference. Tsinghua University, Beijing, China. May 20, 2017.
Invited talk at DockerCon 2017. Austin, TX. April 2017.
Invited talk at Boston R/Bioconductor for Genomics Meetup. Dana-Farber Cancer Institute. January 12, 2017.
Invited talk at Shiny Developers Conference. Stanford University. January 30, 2016.
Invited talk at 2015 Bioinformatics Workshop. Center for Research Informatics, The University of Chicago. December 3, 2015.
Abstract: At the workshop, we introduced the modern concepts, principles, tools, and challenges in reproducible (computational) research, with some coverage of the following topics:
Invited workshop (joint with Dan Tenenbaum & Tengfei Yin) at BioC 2015. Fred Hutchinson Cancer Research Center, Seattle, WA. July 21, 2015.
Abstract: We will introduce the Common Workflow Language (CWL), the R package cwl, and its implementation with Rabix. We will then demonstrate how to write an R command line tool with docopt, how to convert your R command line tool to CWL, how to use the rabix R package's R interface to describe your tool, and how to use Rabix to develop, deploy, and run it on the AWS cloud with the SBG platform or run it locally. We will also demonstrate dockerizing R Markdown documents with Rabix support using the liftr package, and automating a workflow (raw data uploading, pipeline running, and report retrieving) with the sbgr API package.
Presented at the Computational Biology & Drug Design (CBDD) Group, Central South University. December 2013.
Abstract: Machine learning research urgently needs appropriate methods for measuring the similarity between data points, but handcrafting good metrics for specific problems is difficult. Over the past decade, this has led to the emergence of supervised distance metric learning, which aims to learn a metric automatically from data. The talk reviews the successful methods in the field of supervised distance metric learning and discusses the pros and cons of each approach, especially RCA, NCA, ITML, and LMNN.
Keywords: distance metric learning
Invited talk at the 6th China R Conference. Renmin University of China, Beijing. May 18, 2013.
Abstract: The web itself is the world's largest publicly accessible data source. Knowing how to scrape data from the web has become a must-have skill, particularly for data hackers. In this talk, you will learn the basic coding strategies and neat tricks for web scraping with R. While introducing how to retrieve data from the web and parse a variety of data formats, we will summarize the usage and application scenarios of several useful R packages. Last but not least, this talk emphasizes suitable exception handling and parallelization methods, which are crucial for building a robust and high-performance web scraper with R.
Keywords: R; web scraping; web crawling
Presented at Computational Biology & Drug Design (CBDD) Group, Central South University. March 29, 2012.
Introduction to the linear and circular layouts for network visualization.
Presented at 2010 PKU Visualization Summer School. Peking University, Beijing. August 18, 2010.
Final project presentation by our group for the 2010 Visualization Summer School at Peking University.