Generalized BLOSUM and PAM Matrix-Derived Descriptors
Source: R/420-extractPCMBLOSUM.R
extractPCMBLOSUM.Rd
Arguments
- x
A character vector, as the input protein sequence.
- submat
Substitution matrix for the 20 amino acids. Should be one of AABLOSUM45, AABLOSUM50, AABLOSUM62, AABLOSUM80, AABLOSUM100, AAPAM30, AAPAM40, AAPAM70, AAPAM120, AAPAM250. Default is 'AABLOSUM62'.
- k
Integer. The number of selected scales (i.e. the first k scales) derived from the substitution matrix. This can be chosen according to the printed relative importance values.
- lag
The lag parameter. Must be less than the number of amino acids in the sequence.
- scale
Logical. Should the substitution matrix (submat) be auto-scaled before the eigen decomposition? Default is TRUE.
- silent
Logical. Whether to suppress printing the relative importance of each scale (the diagonal values of the eigen decomposition result matrix B). Set to FALSE to print them. Default is TRUE.
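The relationship between scale, k and the printed relative importance values can be illustrated with a small base R sketch. It is not the package's implementation: the 4 x 4 matrix toy is made up (it is not a real BLOSUM or PAM matrix), and standardizing by the global mean and standard deviation is only an assumed form of auto-scaling, chosen here because it keeps the matrix symmetric.

```r
# Toy symmetric "substitution matrix" for four residues (made-up values,
# NOT a real BLOSUM/PAM matrix; used only to keep the sketch self-contained).
aa <- c("A", "R", "N", "D")
toy <- matrix(c( 4, -1,  0, -2,
                -1,  5, -3,  1,
                 0, -3,  6, -1,
                -2,  1, -1,  7),
              nrow = 4, byrow = TRUE,
              dimnames = list(aa, aa))

# Assumed auto-scaling: standardize all entries by the global mean and sd.
# This elementwise transform preserves symmetry.
toy_scaled <- (toy - mean(toy)) / sd(as.vector(toy))

# Eigen decomposition; for a symmetric matrix the eigenvalues are real
# and returned in decreasing order.
eig <- eigen(toy_scaled, symmetric = TRUE)

# The eigenvalues play the role of the "relative importance" values.
importance <- eig$values
print(importance)

# Keep the first k eigenvectors as the selected scales (one row per residue).
k <- 2
scales <- eig$vectors[, seq_len(k), drop = FALSE]
rownames(scales) <- aa
print(scales)
```

With a real 20 x 20 substitution matrix the same steps would yield 20 importance values, which matches the length of the vector printed when silent = FALSE.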
Details
This function calculates the generalized BLOSUM and PAM matrix-derived descriptors. For users' convenience, Rcpi provides the BLOSUM45, BLOSUM50, BLOSUM62, BLOSUM80, BLOSUM100, PAM30, PAM40, PAM70, PAM120, and PAM250 matrices for the 20 amino acids to select from.
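The requirement that lag be smaller than the sequence length is easiest to see in a lagged covariance, where values at positions a fixed distance apart are multiplied together. The sketch below is an assumption about how a lag parameter typically enters sequence descriptors; this page does not state the exact formula the function uses. The helper lagged_autocov and the random toy_scale vector are hypothetical.

```r
# Hypothetical helper: lagged auto-covariance of one numeric scale
# measured along a sequence, for lags 1..lag.
lagged_autocov <- function(v, lag) {
  n <- length(v)
  # A lag of n or more would leave no pairs of positions to compare.
  stopifnot(lag >= 1, lag < n)
  v <- v - mean(v)
  vapply(seq_len(lag), function(l) {
    # Pair position i with position i + l and average the products.
    sum(v[1:(n - l)] * v[(1 + l):n]) / (n - l)
  }, numeric(1))
}

# Toy data: one scale value per residue of a 30-residue sequence.
set.seed(1)
toy_scale <- rnorm(30)

ac <- lagged_autocov(toy_scale, lag = 7)
length(ac)  # one value per lag
```

Under this reading, each selected scale contributes one value per lag, so the descriptor length grows with both k and lag.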
References
Georgiev, A. G. (2009). Interpretable numerical descriptors of amino acid space. Journal of Computational Biology, 16(5), 703–723.
Examples
x = readFASTA(system.file('protseq/P00750.fasta', package = 'Rcpi'))[[1]]
blosum = extractPCMBLOSUM(x, submat = 'AABLOSUM62', k = 5, lag = 7, scale = TRUE, silent = FALSE)
#> Relative importance of all the possible 20 scales:
#> [1] 1.204960e+01 7.982007e+00 6.254364e+00 4.533706e+00 4.326286e+00
#> [6] 3.850579e+00 3.752197e+00 3.538207e+00 3.139155e+00 2.546405e+00
#> [11] 2.373286e+00 1.666259e+00 1.553126e+00 1.263685e+00 1.024699e+00
#> [16] 9.630187e-01 9.225759e-01 7.221636e-01 1.020085e-01 -5.354392e-18
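One possible way to choose k from the printed values is to look at their cumulative share. The sketch below copies the 20 values from the output above; the 80% cutoff is an arbitrary, hypothetical threshold and not a recommendation from the package.

```r
# Relative importance values copied from the printed output above.
importance <- c(1.204960e+01, 7.982007e+00, 6.254364e+00, 4.533706e+00,
                4.326286e+00, 3.850579e+00, 3.752197e+00, 3.538207e+00,
                3.139155e+00, 2.546405e+00, 2.373286e+00, 1.666259e+00,
                1.553126e+00, 1.263685e+00, 1.024699e+00, 9.630187e-01,
                9.225759e-01, 7.221636e-01, 1.020085e-01, -5.354392e-18)

# Treat the tiny negative value (numerical noise around zero) as zero.
imp <- pmax(importance, 0)

# Cumulative share of the total importance covered by the first k scales.
cum_share <- cumsum(imp) / sum(imp)

# Hypothetical cutoff: smallest k whose cumulative share reaches 80%.
threshold <- 0.80
k <- which(cum_share >= threshold)[1]

print(round(cum_share, 3))
print(k)
```

The chosen k can then be passed to extractPCMBLOSUM in place of the k = 5 used in the example call.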