Introduction

This “easy API” is built on lower level R API function calls provided by this package, (please check the other tutorial and manual) but with a more user friendly interface that provides

  • Simpler cascading style from a Auth object or its children, for example
## download a file
a$project("wgs")$file("17.vcf")$download()
## upload a file
a$project("wgs")$upload("sample.tgz")
  • Exact matching or fuzz matching by names and id.
  • Actions for each different objects
  • Nice print format than a long list
  • Return an R5 object in stead of a list
  • Extensible with more actions

We defined several different objects easy to manipulate, including Auth, Project, Task, File, Upload, Member, etc, each comes with their own methods to access the server, send request and get a response.

To understand the cascading style, you has to under stand the structure of the SBG platforms.

  • Each account could generate one auth token, it’s our top level object.
  • Each account has several different projects represented by Project object
  • Each account had different billing group and each project is assigned to particular billing group in order to run on AWS cloud.
  • Each project is composed with files (File object), tasks (Task object), pipelines (Pipeline object) and members or users (Member object) with different permission level.
  • Each member has its own permission to particular project.
  • A task is a scheduled pipeline job with files and parameters.

So our cascading visually represent this relationship and structure, if you are familiar with SBG platform, you will find this interface very easy to understand.

To understand the API better, you can also check the original SBG developer hub documents

Cheatsheet

This cheat sheet provide main API designed for end users, please note “single” is not real project name, just means the cascading method expecting a single returned object not a list, so you name better has a single hit or you pick the one you want from the list. Remember the logic structure and relationship of those concept will help you remember the API easily. For example, a project has members so of course you can run member() method on project object.

For a searching function that need name or id, when both are empty, they will return every object (file, project, pipeline, task, member) in a list. These function usually has optional parameter called ignore.case and exact used for matching, when exact = FALSE you can use pattern to grep objects.

The cheat sheet only show some most used arguments, please check their help function to read more details.

a <- Auth()
a$billing()
a$project_new(name = , description = , billing_group_id = )
a$project(repos = c("public", "my", "project"), project_name = , project_id = )
## to save some typing
p <- a$project("single")
p$delete()
p$member(name = , id = )
p$member_add()
p$member_update()
p$member_delete()
p$file(name = , id = )
p$file("single")$download()
p$upload(file = , metadata = )
p$pipeline(name = , id = )
p$pipeline_add()
p$task(name = , id = )
p$task_run()
p$task("single")$abort()
p$task("single")$monitor(time = )

So you can do a cascading like this, it reads like “I want download the file sample1.tgz in project API under account a”

a$project("API")$file("sample1.tgz")$download("~/Desktop/")

Please play the main function yourself, we are not covering all the function in this tutorial. Also some users may notice the object actually has lower level API binding as well for example, you can call a$pipeline_list_project() but those are used internally not supposed to be used by end users. user can just call a$prject("public")

Quick start: running the FastQC pipeline

I will go through the same tutorial again, but this time use a more simpler cascading API.

Create an Auth object

First thing you do is to get your authentication token following this tutorial, then create a Auth object which will be our master object for request and actions.

By default we are using SBG US platform API URL “https://api.sbgenomics.com/1.1/”, you can also pass another URL, for example, NCI cancer cloud API URL.

Now let’s use some fake auth token and load the library.

library(sbgr)
a <- Auth("aef7e9e3f6c54fb1b338ac4ecddf1a56")
a
## == Auth ==
## auth_token : aef7e9e3f6c54fb1b338ac4ecddf1a56
## url : https://api.sbgenomics.com/1.1/

Billing group

Billing group specify how your computation charges, when you first registered, you should have some free credit, with billing group called something like “Free Account”. Or maybe your company/organization purchased a license, you should have other billing group attached as well

To show how many billing group you belong too, just call billing function

b <- a$billing()
b

Create a project

Let’s first list all projects you have under your account

a$project()

You can search it by name or by id, note: id is unique, but name may have multiple hits, by default, we are using exact matching, only return it when it has single hits. There are two additional parameters to control it exact and ignore.case.

a$project(name = "my first") # return one matching

Maybe you want to create another project called “API” just for testing, but make sure you pass the required field name, description and billing_group_id.

a$project_new(
  name = "API", description = "API tutorial",
  billing_group_id = b[[1]]$id)

Upload file and set metadata

To save your typing/lines for cascading style, I want to save the project I am gonna use for this tutorial to a Project object first.

## get the project we just created
p <- a$project("API")

To list all files in that project and search for a file, it’s also easy

a$project("my first")$file()
a$project("my first")$file("illumina", exact = FALSE)

Get the file we are going to upload and get the metadata ready as well.

fl <- system.file("extdata", "sample1.fastq", package = "sbgr")
## create meta data
fl.meta <- list(
  file_type = "fastq",
  seq_tech = "Illumina",
  sample = "sample1",
  author = "tengfei")

To upload the file to the project, just call upload method from project object, it will initialize the multiparts automatically, checking each part and complete the call when finished. You will see the progress bar.

p$upload(fl, metadata = fl.meta)
Initialized
|=============================================================================| 100%
== File ==
id : 55c90c73e4b01cacdc4fbf64
name : sample1.fastq
size : 16
-- metadata --
file_type : fastq
seq_tech : Illumina
sample : sample1
author: tengfei

To check if it’s uploaded successfully, just check the file on the server to see if it exists or not.

p$file(basename(fl))

Keep playing with file metadata

Metadata is designed with fixed fields in the GUI with fixed enum types for some fields, like file_type, to check what’s fixed, please do

## -- Metadata --
## file_type : NA
## qual_scale : NA
## seq_tech : NA
## sample :
## library :
## platform_unit :
## paired_end : NA
Metadata()$file_type
## An object of class "FileTypeSingleEnum"
## [1] NA
## Slot "levels":
##  [1] "text"            "binary"          "fasta"           "csfasta"        
##  [5] "fastq"           "qual"            "xsq"             "sff"            
##  [9] "bam"             "bam_index"       "illumina_export" "vcf"            
## [13] "sam"             "bed"             "archive"         "juncs"          
## [17] "gtf"             "gff"             "enlis_genome"    NA
levels(Metadata()$file_type)
##  [1] "text"            "binary"          "fasta"           "csfasta"        
##  [5] "fastq"           "qual"            "xsq"             "sff"            
##  [9] "bam"             "bam_index"       "illumina_export" "vcf"            
## [13] "sam"             "bed"             "archive"         "juncs"          
## [17] "gtf"             "gff"             "enlis_genome"    NA

To add more metadata items, you can pass more named list entries, and to keep original turn on parameter append.

p$file(basename(fl))$set_metadata(list(version = "2.0"), append = TRUE)

Read more about metadata please visit SBG developer hub page

Delete a file

To delete a file, just call delete method on a File object. for example.

p$file(basename(fl))$delete()

Get pipelines information

p$pipeline()

It’s a new project, there is no pipeline yet, there are several other ways to check out available pipelines for public, my pipeline and pipelines in particular project.

## project pipelines
a$project("my first")$pipeline()
## all public pipeline
a$pipeline()
## my pipeline
a$pipeline("my")
## particular project
a$pipeline(project_name = "my first")

Let’s look for a pipeline about “FastQC”, from public repos

f.pipe <- a$pipeline(pipeline_name = "FastQC")
f.pipe

Cool, we got a unique hit, let’s copy the FastQC pipeline to the new project

p$pipeline_add(pipeline_name = f.pipe$name)

To confirm

f.pipe <- p$pipeline(name = "FastQC")
f.pipe

Run a task with the pipeline

Please call detail function to check the required inputs, here I quote the SBG API introduction

  • id: The ID number of the pipeline, used by the API to uniquely refer to it
  • revision: The revision of the pipeline
  • name: The human-readable name of the pipeline
  • description: A short description of the pipeline’s function
  • inputs: An array listing the pipeline’s input nodes, i.e. those pipeline apps which require files as input
  • nodes: An array of intermediary nodes, representing the pipeline apps which transform the data. These may have editable parameters that should be set before running the pipeline, although in this particular pipeline example no parameters are editable.
  • outputs: An array of output nodes, producing files, reports, visualizations, etc.
f.pipe$details()
f.pipe$details()$inputs
f.pipe$details()$nodes

To run the pipeline, we need to provide task details, we know we want to input our upload sample file as input node here. So we need to link our file id with the input node id.

f.task <- p$task_run(
  name = "my task",
  description = "A text description",
  pipeline_id = f.pipe$id,
  inputs = list("177252" = list(f.file$id)))

f.task

Monitor your task

Pass the interval time (in seconds) this will check status of a task, print finish when it’s finished.

f.task$monitor(30)

Or simple call details method

f.task$details()

Abort a task

f.task$abort()

Download the results

f.task <- p$task(id = "fed9279b-8677-4eef-9c16-9135a457f53f")
f.task$download("~/Desktop/")

Wrap the example up

Ok, the full script is something like

library("sbgr")
token <- "aef7e9e3f6c54fb1b338ac4ecddf1a56"
a <- Auth(token)

## get billing info
b <- a$billing()

## create project
a$project_new(
  name = "API", description = "API tutorial",
  billing_group_id = b[[1]]$id)
p <- a$project("API")

## get data
fl <- system.file("extdata", "sample1.fastq", package = "sbgr")

## create meta data
fl.meta <- list(
  file_type = "fastq",
  seq_tech = "Illumina",
  sample = "sample1",
  author = "tengfei")

## upload data with metadata
p$upload(fl, metadata = fl.meta)

## check uploading success
f.file <- p$file(basename(fl))

## get the pipeline from public repos
f.pipe <- a$pipeline(pipeline_name = "FastQC")

## copy the pipeline to your poject
p$pipeline_add(pipeline_name = f.pipe$name)

## get the pipeline from your project not public one
f.pipe <- p$pipeline(name = "FastQC")

## check the inputs needed for running tasks
f.pipe$details()

## Ready to run a task? go
f.task <- p$task_run(
  name = "my task",
  description = "A text description",
  pipeline_id = f.pipe$id,
  inputs = list("177252" = list(f.file$id)))
f.task$run()

## or you can just run with Task constructor
f.task <- Task(
  auth = Auth(token),
  name = "my task",
  description = "A text description",
  pipeline_id = f.pipe$id,
  project_id = p$id,
  inputs = list("177252" = list(f.file$id)))

## Monitor you task
f.task$monitor(30)

## download a task output files
f.task <- p$task("my task")
f.task$download("~/Desktop/")

## Abort the task
f.task$abort()