The ssw package offers an R interface for SSW (Zhao et al. 2013), a fast implementation of the Smith-Waterman algorithm for local sequence alignment that uses SIMD instructions. The package is currently built on ssw-py.
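The sketch below shows where the scores in the examples come from. It is a minimal base-R implementation of Smith-Waterman local alignment scoring with affine gap penalties (Gotoh's recurrences). It is not the SSW implementation. It returns only the optimal score and does no traceback, CIGAR construction or SIMD acceleration.

The default parameters are an assumption, chosen because they reproduce the optimal scores printed in the examples that follow:

- match score 2
- mismatch penalty 2
- gap open penalty 3, charged on the first base of a gap
- gap extension penalty 1, charged on each further base

The function name `sw_score()` is hypothetical and not part of the ssw package.

```r
# Smith-Waterman local alignment score with affine gap penalties (Gotoh).
# Sketch only: returns the optimal score, with no traceback or SIMD.
# Default scoring parameters are assumptions, not read from ssw.
sw_score <- function(query, target, match = 2, mismatch = 2,
                     gap_open = 3, gap_extension = 1) {
  query_bases <- strsplit(query, "")[[1]]
  target_bases <- strsplit(target, "")[[1]]
  m <- length(query_bases)
  n <- length(target_bases)
  # h:   best score of a local alignment ending at query i, target j
  # del: same, but ending in a gap in the query (CIGAR D)
  # ins: same, but ending in a gap in the target (CIGAR I)
  h <- matrix(0, m + 1, n + 1)
  del <- matrix(-Inf, m + 1, n + 1)
  ins <- matrix(-Inf, m + 1, n + 1)
  best <- 0
  for (i in seq_len(m)) {
    for (j in seq_len(n)) {
      # Either open a new gap from h or extend an existing one
      del[i + 1, j + 1] <- max(h[i + 1, j] - gap_open,
                               del[i + 1, j] - gap_extension)
      ins[i + 1, j + 1] <- max(h[i, j + 1] - gap_open,
                               ins[i, j + 1] - gap_extension)
      s <- if (query_bases[i] == target_bases[j]) match else -mismatch
      # Local alignment: a score never drops below zero
      h[i + 1, j + 1] <- max(0, h[i, j] + s,
                             del[i + 1, j + 1], ins[i + 1, j + 1])
      best <- max(best, h[i + 1, j + 1])
    }
  }
  best
}

# Only TTTTACGTCCCCC appears in the text; the other references are
# hypothetical, reconstructed from the printed target coordinates.
sw_score("ACGT", "TTTTACGTCCCCC")               # 8, exact match
sw_score("ACGT", "TTTTACAGTCCCCC")              # 5, one-base deletion
sw_score("ACGT", "TTTTACTCCCCC")                # 4, insertion, gap open 3
sw_score("ACGT", "TTTTACTCCCCC", gap_open = 0)  # 6, insertion, no gap open
```

The three matrices are the standard way to make a gap's first base cost more than later ones. Filling them takes time proportional to the product of the two sequence lengths.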
Load the package and define a short read sequence:
library("ssw")
read <- "ACGT"
Exact alignment
Align the read against the reference sequence TTTTACGTCCCCC and print the results:
a <- align(read, "TTTTACGTCCCCC")
a
CIGAR start index 4: 4M
optimal_score: 8 sub-optimal_score: 0
target_begin: 4 target_end: 7
query_begin: 0 query_end: 3
Target: 4 ACGT 7
||||
Query: 0 ACGT 3
ACGT
Get specific results, such as the alignment scores:
a$alignment$optimal_score
[1] 8
a$alignment$sub_optimal_score
[1] 0
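Deletion
When the reference carries an extra A between AC and GT (ACAGT), the read aligns with a one-base deletion (CIGAR 2M1D2M):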
CIGAR start index 4: 2M1D2M
optimal_score: 5 sub-optimal_score: 0
target_begin: 4 target_end: 8
query_begin: 0 query_end: 3
Target: 4 ACAGT 8
||*||
Query: 0 AC-GT 3
ACGT
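The optimal score of 5 can be checked by hand. This assumes a match score of 2 and a gap open penalty of 3 charged on the first base of a gap. These values are inferred from the scores shown here, not documented by the package.

```latex
S(\texttt{AC-GT}) = \underbrace{2 + 2}_{\texttt{AC}} - \underbrace{3}_{\text{gap open}} + \underbrace{2 + 2}_{\texttt{GT}} = 5
```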
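Insertion with gap open penalty
When the read contains a G that the reference lacks, the gap open penalty makes a gapped alignment score lower than aligning AC alone, so only AC is reported (CIGAR 2M):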
CIGAR start index 4: 2M
optimal_score: 4 sub-optimal_score: 0
target_begin: 4 target_end: 5
query_begin: 0 query_end: 1
Target: 4 AC 5
||
Query: 0 AC 1
ACGT
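Insertion with no gap open penalty
With the gap open penalty set to zero, the extra G is treated as a one-base insertion and the whole read is aligned (CIGAR 2M1I1M):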
CIGAR start index 4: 2M1I1M
optimal_score: 6 sub-optimal_score: 0
target_begin: 4 target_end: 6
query_begin: 0 query_end: 3
Target: 4 AC-T 6
||*|
Query: 0 ACGT 3
ACGT
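The two insertion results follow from comparing the ungapped AC alignment with the gapped one. This uses the same assumed scoring: match 2, with the gap open penalty charged on the first gap base. Local alignment keeps whichever option scores higher.

```latex
\begin{aligned}
S(\texttt{2M}) &= 2 + 2 = 4 \\
S(\texttt{2M1I1M}),\ \text{gap open } 3 &= 2 + 2 - 3 + 2 = 3 < 4 \\
S(\texttt{2M1I1M}),\ \text{gap open } 0 &= 2 + 2 - 0 + 2 = 6 > 4
\end{aligned}
```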
Specify start index
a <- align("ACTG", "ACTCACTG", start_idx = 4)
a
CIGAR start index 0: 4M
optimal_score: 8 sub-optimal_score: 0
target_begin: 0 target_end: 3
query_begin: 0 query_end: 3
Target: 0 ACTC 3
|||*
Query: 0 ACTG 3
ACTG
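By default, the target line is printed from position 0 of the reference, so it shows ACTC rather than the aligned ACTG. Print the results from position 4 instead: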
CIGAR start index 0: 4M
optimal_score: 8 sub-optimal_score: 0
target_begin: 0 target_end: 3
query_begin: 0 query_end: 3
Target: 0 ACTG 3
||||
Query: 0 ACTG 3
ACTG
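The output above suggests that `start_idx` is a 0-based offset. The read is aligned against the reference from that position onward, and the reported coordinates are relative to that slice. The base-R sketch below works under that assumption. It shows the slice and shifts the reported coordinates back onto the full reference. The variable names are illustrative only.

```r
reference <- "ACTCACTG"
start_idx <- 4  # assumed 0-based, as in align(..., start_idx = 4)

# Part of the reference the read is aligned against (substring() is 1-based)
ref_slice <- substring(reference, start_idx + 1)
ref_slice  # "ACTG"

# Coordinates as reported in the output, relative to the slice
target_begin <- 0
target_end <- 3

# Shift back to 0-based positions in the full reference
full_coords <- c(target_begin, target_end) + start_idx
full_coords  # 4 7

# Recover the aligned region from the full reference
aligned_target <- substring(reference, full_coords[1] + 1, full_coords[2] + 1)
aligned_target  # "ACTG"
```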
Forced alignment
Forbid gaps in the alignment by increasing the gap penalty:
a <- force_align("ACTG", "TTTTCTGCCCCCACG")
a
The result is truncated, leaving the leading A of the read out of the alignment:
CIGAR start index 4: 3M
optimal_score: 6 sub-optimal_score: 0
target_begin: 4 target_end: 6
query_begin: 1 query_end: 3
Target: 4 CTG 6
|||
Query: 1 CTG 3
ACTG
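The leading A is dropped for a simple reason. With gaps ruled out, extending the alignment one base to the left pairs the read's A with the T at reference position 3. Assuming a match score of 2 and a mismatch penalty of 2, that extension lowers the score, so the local alignment stops at CTG.

```latex
\begin{aligned}
S(\texttt{CTG}) &= 2 + 2 + 2 = 6 \\
S(\texttt{ACTG}\ \text{vs}\ \texttt{TCTG}) &= -2 + 2 + 2 + 2 = 4 < 6
\end{aligned}
```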
Use formatter() to avoid the truncation:
[[1]]
[1] "TTTTCTGCCCCCACG"
[[2]]
[1] "   ACTG"
Or pretty-print the formatted results directly:
TTTTCTGCCCCCACG
   ACTG
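The padding in the formatted output can be reproduced by hand. The read is shifted right by `target_begin - query_begin` positions (4 - 1 = 3 here), so that its first base sits under reference position 3.

The helper `pad_read()` is hypothetical and not part of the ssw package. It is a minimal base-R sketch of that idea and assumes the read does not overhang the start of the reference.

```r
# Hypothetical helper, not part of ssw: left-pad the read so it lines up
# with the reference, using the 0-based coordinates from the alignment.
pad_read <- function(read, target_begin, query_begin) {
  offset <- target_begin - query_begin
  # This sketch does not handle reads that overhang the reference start
  stopifnot(offset >= 0)
  paste0(strrep(" ", offset), read)
}

reference <- "TTTTCTGCCCCCACG"
padded <- pad_read("ACTG", target_begin = 4, query_begin = 1)
cat(reference, padded, sep = "\n")
```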
References
Zhao, Mengyao, Wan-Ping Lee, Erik P Garrison, and Gabor T Marth. 2013. “SSW Library: An SIMD Smith-Waterman C/C++ Library for Use in Genomic Applications.” PLoS ONE 8 (12): e82138. https://doi.org/10.1371/journal.pone.0082138.