This “easy API” is built on the lower-level R API function calls provided by this package (please check the other tutorial and the manual), but offers a more user-friendly, cascading interface, for example:
## download a file
a$project("wgs")$file("17.vcf")$download()
## upload a file
a$project("wgs")$upload("sample.tgz")
We define several objects that are easy to manipulate, including Auth, Project, Task, File, Upload, Member, etc. Each comes with its own methods to access the server, send a request and get a response.
To understand the cascading style, you have to understand the structure of the SBG platform.
Our cascading style visually represents this relationship and structure, so if you are familiar with the SBG platform, you will find this interface very easy to understand.
To understand the API better, you can also check the original SBG developer hub documents.
This cheat sheet provides the main API designed for end users. Please note that “single” is not a real project name; it just means the cascading method expects a single returned object, not a list, so your name should either produce a single hit or you should pick the one you want from the list. Remembering the logical structure and relationships of these concepts will help you remember the API easily. For example, a project has members, so of course you can run the member() method on a Project object.
For a search function that takes a name or id, when both are empty, it returns every object (file, project, pipeline, task, member) in a list. These functions usually have optional parameters called ignore.case and exact that are used for matching; when exact = FALSE, you can use a pattern to grep objects.
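The matching behaviour described above can be sketched in base R. This is only an illustration, not the package's implementation: match_names is a hypothetical helper, and the default values chosen for exact and ignore.case here are assumptions.

```r
# Hypothetical helper (not part of sbgr): filter a character vector of
# object names the way the exact / ignore.case arguments are described.
match_names <- function(x, name = NULL, exact = TRUE, ignore.case = TRUE) {
  if (is.null(name)) return(x)  # no query: return every object
  if (exact) {
    # exact matching: the whole name must be equal
    if (ignore.case) x[tolower(x) == tolower(name)] else x[x == name]
  } else {
    # pattern matching: the name is treated as a regular expression
    x[grepl(name, x, ignore.case = ignore.case)]
  }
}

files <- c("sample1.fastq", "Illumina_run.bam", "illumina_notes.txt")
all_files <- match_names(files)                           # everything
pattern_hits <- match_names(files, "illumina", exact = FALSE)
exact_hits <- match_names(files, "illumina", exact = TRUE)
print(pattern_hits)
print(exact_hits)
```

With exact = FALSE the query behaves like grep, so a short pattern can return several objects; with exact = TRUE only a full-name match counts.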
The cheat sheet only shows some of the most used arguments; please check the help pages to read more details.
a <- Auth()
a$billing()
a$project_new(name = , description = , billing_group_id = )
a$project(repos = c("public", "my", "project"), project_name = , project_id = )
## to save some typing
p <- a$project("single")
p$delete()
p$member(name = , id = )
p$member_add()
p$member_update()
p$member_delete()
p$file(name = , id = )
p$file("single")$download()
p$upload(file = , metadata = )
p$pipeline(name = , id = )
p$pipeline_add()
p$task(name = , id = )
p$task_run()
p$task("single")$abort()
p$task("single")$monitor(time = )
So you can chain calls like this; it reads like “I want to download the file sample1.tgz in project API under account a”:
a$project("API")$file("sample1.tgz")$download("~/Desktop/")
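To see why such a chain works, here is a toy sketch in base R. It does not use the sbgr classes at all: new_auth, new_project and new_file are hypothetical stand-ins, and download only returns the target path instead of contacting a server. The point is simply that each call returns an object whose methods can be called in turn.

```r
# Toy stand-ins (not the sbgr classes): each constructor returns a list of
# closures, so the result of one call can be chained into the next with `$`.
new_file <- function(name) {
  list(
    name = name,
    # pretend download: only builds and returns the destination path
    download = function(dest) file.path(dest, name)
  )
}

new_project <- function(name, files) {
  list(
    name = name,
    # look up a single file by exact name
    file = function(fname) new_file(files[files == fname])
  )
}

new_auth <- function(projects) {
  list(project = function(pname) projects[[pname]])
}

a_toy <- new_auth(list(API = new_project("API", c("sample1.tgz", "17.vcf"))))
target <- a_toy$project("API")$file("sample1.tgz")$download("downloads")
print(target)
```

The chain mirrors the ownership structure: an account owns projects, and a project owns files.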
Please try the main functions yourself; we are not covering every function in this tutorial. Some users may also notice that the objects have lower-level API bindings as well, for example, you can call a$pipeline_list_project(), but those are used internally and are not meant for end users. Users can just call a$project("public").
I will go through the same tutorial again, but this time using the simpler cascading API.
The first thing to do is get your authentication token by following this tutorial, then create an Auth object, which will be our master object for requests and actions.
By default we use the SBG US platform API URL “https://api.sbgenomics.com/1.1/”; you can also pass another URL, for example, the NCI cancer cloud API URL.
Now let’s load the library and use a fake auth token.
library("sbgr")
token <- "aef7e9e3f6c54fb1b338ac4ecddf1a56"
a <- Auth(token)
a
## == Auth ==
## auth_token : aef7e9e3f6c54fb1b338ac4ecddf1a56
## url : https://api.sbgenomics.com/1.1/
A billing group specifies how your computation is charged. When you first register, you should have some free credit, with a billing group called something like “Free Account”. If your company or organization has purchased a license, you should have other billing groups attached as well.
To show the billing groups you belong to, just call the billing function:
b <- a$billing()
b
Let’s first list all projects you have under your account
a$project()
You can search by name or by id. Note that id is unique, but name may have multiple hits. By default we use exact matching and only return a result when there is a single hit. Two additional parameters, exact and ignore.case, control this.
a$project(name = "my first") # return one matching
Maybe you want to create another project called “API” just for testing; make sure you pass the required fields name, description and billing_group_id.
a$project_new(
name = "API", description = "API tutorial",
billing_group_id = b[[1]]$id)
To save typing in the cascading style, I save the project I am going to use for this tutorial to a Project object first.
## get the project we just created
p <- a$project("API")
Listing all files in a project and searching for a file is also easy:
a$project("my first")$file()
a$project("my first")$file("illumina", exact = FALSE)
Get the file we are going to upload and get the metadata ready as well.
fl <- system.file("extdata", "sample1.fastq", package = "sbgr")
## create meta data
fl.meta <- list(
file_type = "fastq",
seq_tech = "Illumina",
sample = "sample1",
author = "tengfei")
To upload the file to the project, just call the upload method on the Project object. It will initialize the multipart upload automatically, check each part and complete the call when finished. You will see a progress bar.
p$upload(fl, metadata = fl.meta)
Initialized
|=============================================================================| 100%
== File ==
id : 55c90c73e4b01cacdc4fbf64
name : sample1.fastq
size : 16
-- metadata --
file_type : fastq
seq_tech : Illumina
sample : sample1
author: tengfei
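As a rough picture of what a multipart upload does behind the scenes, the sketch below splits a file size into consecutive byte ranges, one per part. part_ranges is a hypothetical helper and the part size of 5 bytes is an arbitrary assumption chosen for illustration; the actual part size and bookkeeping are handled by the package and the server.

```r
# Hypothetical helper (not part of sbgr): compute the byte ranges that a
# multipart upload would send, given a file size and a part size.
part_ranges <- function(file_size, part_size) {
  stopifnot(file_size > 0, part_size > 0)
  starts <- seq(0, file_size - 1, by = part_size)  # 0-based start offsets
  ends <- pmin(starts + part_size, file_size)      # exclusive end offsets
  data.frame(part = seq_along(starts), start = starts, end = ends)
}

# The sample file above has size 16; 5 is an arbitrary part size.
parts <- part_ranges(file_size = 16, part_size = 5)
print(parts)
```

Each row corresponds to one part; the upload is complete once every part has been sent and the sizes add up to the file size.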
To check whether the upload succeeded, just look for the file on the server.
p$file(basename(fl))
Metadata is designed with fixed fields in the GUI, with fixed enum types for some fields, like file_type. To check what is fixed, please run:
Metadata()
## -- Metadata --
## file_type : NA
## qual_scale : NA
## seq_tech : NA
## sample :
## library :
## platform_unit :
## paired_end : NA
Metadata()$file_type
## An object of class "FileTypeSingleEnum"
## [1] NA
## Slot "levels":
## [1] "text" "binary" "fasta" "csfasta"
## [5] "fastq" "qual" "xsq" "sff"
## [9] "bam" "bam_index" "illumina_export" "vcf"
## [13] "sam" "bed" "archive" "juncs"
## [17] "gtf" "gff" "enlis_genome" NA
## [1] "text" "binary" "fasta" "csfasta"
## [5] "fastq" "qual" "xsq" "sff"
## [9] "bam" "bam_index" "illumina_export" "vcf"
## [13] "sam" "bed" "archive" "juncs"
## [17] "gtf" "gff" "enlis_genome" NA
To add more metadata items, you can pass more named list entries; to keep the original entries, turn on the parameter append.
To read more about metadata, please visit the SBG developer hub page.
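The difference between replacing and appending metadata can be pictured with plain named lists. This is only a sketch using base R's modifyList; how the package resolves a key that exists in both lists is an assumption here (the new value wins).

```r
# Existing metadata on the server and new entries to add (illustrative values).
old_meta <- list(file_type = "fastq", sample = "sample1")
new_meta <- list(author = "tengfei", sample = "sample2")

# Without append: the new list simply replaces the old one.
replaced <- new_meta

# With append: old entries are kept and new entries are merged in.
# Assumption: on a name clash the new value overrides the old one.
appended <- modifyList(old_meta, new_meta)
str(appended)
```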
To delete a file, just call the delete method on a File object, for example:
p$file(basename(fl))$delete()
p$pipeline()
It’s a new project, so there is no pipeline yet. There are several other ways to check available pipelines: public pipelines, my pipelines and pipelines in a particular project.
## project pipelines
a$project("my first")$pipeline()
## all public pipelines
a$pipeline()
## my pipelines
a$pipeline("my")
## particular project
a$pipeline(project_name = "my first")
Let’s look for a “FastQC” pipeline in the public repos
f.pipe <- a$pipeline(pipeline_name = "FastQC")
f.pipe
Cool, we got a unique hit. Let’s copy the FastQC pipeline to the new project
p$pipeline_add(pipeline_name = f.pipe$name)
To confirm
f.pipe <- p$pipeline(name = "FastQC")
f.pipe
Please call the details method to check the required inputs, as described in the SBG API introduction.
f.pipe$details()
f.pipe$details()$inputs
f.pipe$details()$nodes
To run the pipeline, we need to provide task details. We know we want to use our uploaded sample file as the input here, so we need to link our file id with the input node id.
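Linking a file id to an input node id amounts to building a named list whose names are node ids and whose values are lists of file ids. The sketch below builds that structure in base R; the node id "177252" is the one used in the full script at the end of this tutorial, and the file id is the one printed by the upload above. Both will differ in your own project.

```r
# Node id of the pipeline input (taken from the full script in this tutorial).
node_id <- "177252"
# Id of the uploaded file (taken from the upload output above).
file_ids <- c("55c90c73e4b01cacdc4fbf64")

# One named entry per input node, each holding a list of file ids.
inputs <- setNames(list(as.list(file_ids)), node_id)
str(inputs)
```

The result has the same shape as writing list("177252" = list(file_id)) by hand, but is easier to generate when node ids are stored in variables.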
Pass the interval time (in seconds) to monitor; it will check the status of the task at that interval and print a message when the task has finished.
f.task$monitor(30)
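Conceptually, monitoring is a polling loop: ask for the status, stop if the task has reached a final state, otherwise wait and ask again. The sketch below shows that loop with a fake status function instead of a server call. monitor_task, the status names and the max_checks limit are all assumptions made for illustration, not the package's implementation.

```r
# Hypothetical polling loop; get_status stands in for a request to the server.
monitor_task <- function(get_status, time = 30, max_checks = 100) {
  for (i in seq_len(max_checks)) {
    status <- get_status()
    message("check ", i, ": ", status)
    # assumed final states; the real API may use different names
    if (status %in% c("completed", "failed", "aborted")) return(status)
    Sys.sleep(time)  # wait before the next check
  }
  "timeout"
}

# Fake task that reports "active" twice and then "completed".
n <- 0
fake_status <- function() {
  n <<- n + 1
  if (n < 3) "active" else "completed"
}

result <- monitor_task(fake_status, time = 0)
print(result)
```

A longer interval means fewer requests to the server at the cost of noticing completion later.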
Or simply call the details method
f.task$details()
f.task <- p$task(id = "fed9279b-8677-4eef-9c16-9135a457f53f")
f.task$download("~/Desktop/")
OK, the full script is something like this:
library("sbgr")
token <- "aef7e9e3f6c54fb1b338ac4ecddf1a56"
a <- Auth(token)
## get billing info
b <- a$billing()
## create project
a$project_new(
name = "API", description = "API tutorial",
billing_group_id = b[[1]]$id)
p <- a$project("API")
## get data
fl <- system.file("extdata", "sample1.fastq", package = "sbgr")
## create meta data
fl.meta <- list(
file_type = "fastq",
seq_tech = "Illumina",
sample = "sample1",
author = "tengfei")
## upload data with metadata
p$upload(fl, metadata = fl.meta)
## check that the upload succeeded
f.file <- p$file(basename(fl))
## get the pipeline from public repos
f.pipe <- a$pipeline(pipeline_name = "FastQC")
## copy the pipeline to your project
p$pipeline_add(pipeline_name = f.pipe$name)
## get the pipeline from your project, not the public one
f.pipe <- p$pipeline(name = "FastQC")
## check the inputs needed for running tasks
f.pipe$details()
## Ready to run a task? go
f.task <- p$task_run(
name = "my task",
description = "A text description",
pipeline_id = f.pipe$id,
inputs = list("177252" = list(f.file$id)))
f.task$run()
## or you can just run with Task constructor
f.task <- Task(
auth = Auth(token),
name = "my task",
description = "A text description",
pipeline_id = f.pipe$id,
project_id = p$id,
inputs = list("177252" = list(f.file$id)))
## Monitor your task
f.task$monitor(30)
## download the task's output files
f.task <- p$task("my task")
f.task$download("~/Desktop/")
## Abort the task
f.task$abort()