Parallelized Protein Sequence Similarity Calculation based on Sequence Alignment
Source: R/704-calcProtSeqSim.R
calcParProtSeqSim.Rd
Arguments
- protlist
A length n list containing n protein sequences. Each component of the list is a character string storing one protein sequence. Unknown sequences should be represented as ''.
- cores
Integer. The number of CPU cores to use for parallel execution; the default is 2. Users can call the detectCores() function in the parallel package to see how many cores are available.
- type
Type of alignment; the default is 'local'. Can be 'global' or 'local', where 'global' represents Needleman-Wunsch global alignment and 'local' represents Smith-Waterman local alignment.
- submat
Substitution matrix; the default is 'BLOSUM62'. Can be one of 'BLOSUM45', 'BLOSUM50', 'BLOSUM62', 'BLOSUM80', 'BLOSUM100', 'PAM30', 'PAM40', 'PAM70', 'PAM120', 'PAM250'.
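As a rough illustration of how these arguments might be chosen and checked before the call, the sketch below uses detectCores() from the parallel package and base R's match.arg(). The helper name check_sim_args is hypothetical and is not part of Rcpi; the allowed values are copied from the argument descriptions.

```r
library(parallel)

# Hypothetical helper (not part of Rcpi): validate arguments up front.
check_sim_args <- function(cores = 2,
                           type = c("local", "global"),
                           submat = c("BLOSUM62", "BLOSUM45", "BLOSUM50",
                                      "BLOSUM80", "BLOSUM100", "PAM30",
                                      "PAM40", "PAM70", "PAM120", "PAM250")) {
  type <- match.arg(type)      # the first choice acts as the default
  submat <- match.arg(submat)  # errors on a value outside the list

  available <- detectCores()   # can be NA on some platforms
  if (is.na(available)) available <- 1L

  # Never request more workers than the machine reports.
  cores <- max(1L, min(as.integer(cores), available))
  list(cores = cores, type = type, submat = submat)
}

checked <- check_sim_args(cores = 2, type = "global")
str(checked)
```

Checking the values early means a misspelled matrix name fails immediately rather than after the parallel workers have started.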
Details
This function implements a parallelized version of protein sequence similarity calculation based on sequence alignment.
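The general pattern behind a parallel all-pairs similarity calculation can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's implementation: toy_sim is a hypothetical stand-in score (a Jaccard index over letters) used in place of an alignment score, and mclapply() from the parallel package is one of several possible back ends.

```r
library(parallel)

# Hypothetical stand-in for an alignment-based score: the Jaccard index
# of the distinct letters in two sequences. This is NOT what Rcpi computes.
toy_sim <- function(a, b) {
  x <- unique(strsplit(a, "")[[1]])
  y <- unique(strsplit(b, "")[[1]])
  length(intersect(x, y)) / length(union(x, y))
}

seqs <- list("MKTAYIAK", "MKTAYLAK", "GGSWQ")
n <- length(seqs)

# Enumerate the n * (n - 1) / 2 unordered pairs once.
idx <- combn(n, 2)

# Forking is unavailable on Windows, so fall back to one worker there.
workers <- if (.Platform$OS.type == "windows") 1L else 2L

# Score each pair independently; pairs do not depend on one another.
scores <- mclapply(seq_len(ncol(idx)), function(k) {
  toy_sim(seqs[[idx[1, k]]], seqs[[idx[2, k]]])
}, mc.cores = workers)

# Fill a symmetric matrix; the toy score of a sequence with itself is 1.
simmat <- diag(n)
for (k in seq_len(ncol(idx))) {
  i <- idx[1, k]
  j <- idx[2, k]
  simmat[i, j] <- simmat[j, i] <- scores[[k]]
}
print(simmat)
```

Because each pair is scored independently, the work splits cleanly across workers, and only the upper triangle needs to be computed.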
See also
See calcTwoProtSeqSim for protein sequence alignment of two protein sequences. See calcParProtGOSim for protein similarity calculation based on Gene Ontology (GO) semantic similarity.
Examples
s1 = readFASTA(system.file('protseq/P00750.fasta', package = 'Rcpi'))[[1]]
s2 = readFASTA(system.file('protseq/P08218.fasta', package = 'Rcpi'))[[1]]
s3 = readFASTA(system.file('protseq/P10323.fasta', package = 'Rcpi'))[[1]]
s4 = readFASTA(system.file('protseq/P20160.fasta', package = 'Rcpi'))[[1]]
s5 = readFASTA(system.file('protseq/Q9NZP8.fasta', package = 'Rcpi'))[[1]]
plist = list(s1, s2, s3, s4, s5)
# \donttest{
psimmat = calcParProtSeqSim(plist, cores = 2, type = 'local',
                            submat = 'BLOSUM62')
#> Error: The package "pwalign" is required. Please install it from Bioconductor.
print(psimmat)
# }
#> Error: object 'psimmat' not found
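The error output in this example indicates that the pwalign package was not installed when the example was run. One way to guard against that is sketched below; the variable name have_pwalign is illustrative, and the commented installation line assumes the BiocManager package is available.

```r
# The error message names the Bioconductor package "pwalign" as required.
have_pwalign <- requireNamespace("pwalign", quietly = TRUE)

if (!have_pwalign) {
  # One common installation route (assumes BiocManager is installed):
  # BiocManager::install("pwalign")
  message("pwalign is not installed; skipping the similarity calculation.")
}
```

With this check in place, the call to calcParProtSeqSim() can be wrapped in if (have_pwalign) { ... } so that later lines do not fail on a missing psimmat object.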