This function calculates BLOSUM matrix-derived descriptors. For convenience, protr provides the BLOSUM45, BLOSUM50, BLOSUM62, BLOSUM80, BLOSUM100, PAM30, PAM40, PAM70, PAM120, and PAM250 matrices for the 20 amino acids to select from.
Arguments
- x
A character vector, as the input protein sequence.
- submat
Substitution matrix for the 20 amino acids. Should be one of AABLOSUM45, AABLOSUM50, AABLOSUM62, AABLOSUM80, AABLOSUM100, AAPAM30, AAPAM40, AAPAM70, AAPAM120, or AAPAM250. Default is "AABLOSUM62".
- k
Integer. The number of selected scales (i.e. the first k scales) derived from the substitution matrix. This can be chosen according to the printed relative importance values.
- lag
The lag parameter. Must be less than the number of amino acids in the sequence.
- scale
Logical. Whether to auto-scale the substitution matrix (submat) before the eigen decomposition. Default is TRUE.
- silent
Logical. Whether to suppress printing the relative importance of each scale (the diagonal values of the eigen decomposition result matrix B). Default is TRUE.
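The constraints listed above can be checked before calling the function. The following is a minimal sketch, not part of protr: the helper name `check_blosum_args` is hypothetical, and it assumes that `x` is a single string, that `k` cannot exceed the 20 available scales, and that `lag` is at least 1.

```r
# Hypothetical helper (not part of protr): checks arguments against the
# constraints documented above before calling extractBLOSUM().
check_blosum_args <- function(x, submat = "AABLOSUM62", k, lag) {
  allowed <- c(
    "AABLOSUM45", "AABLOSUM50", "AABLOSUM62", "AABLOSUM80", "AABLOSUM100",
    "AAPAM30", "AAPAM40", "AAPAM70", "AAPAM120", "AAPAM250"
  )
  # Assumption: the sequence is passed as one character string.
  if (!is.character(x) || length(x) != 1L) {
    stop("x must be a single character string")
  }
  if (!submat %in% allowed) {
    stop("submat must be one of: ", paste(allowed, collapse = ", "))
  }
  # Assumption: a 20 x 20 matrix yields at most 20 scales.
  if (k < 1 || k > 20 || k != round(k)) {
    stop("k must be an integer between 1 and 20")
  }
  # Documented constraint: lag must be less than the sequence length.
  # Assumption: lag must also be at least 1.
  if (lag < 1 || lag >= nchar(x)) {
    stop("lag must be at least 1 and less than the sequence length")
  }
  invisible(TRUE)
}

# Passes silently for a 10-residue toy sequence with lag = 7.
check_blosum_args("MKTAYIAKQR", submat = "AABLOSUM62", k = 5, lag = 7)
```

Failing early with a descriptive message can be easier to debug than an error raised deep inside the descriptor calculation.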
References
Georgiev, A. G. (2009). Interpretable numerical descriptors of amino acid space. Journal of Computational Biology, 16(5), 703–723.
Author
Nan Xiao <https://nanx.me>
Examples
x <- readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
blosum <- extractBLOSUM(x, submat = "AABLOSUM62", k = 5, lag = 7, scale = TRUE, silent = FALSE)
#> Relative importance of all the possible 20 scales:
#> [1] 1.204960e+01 7.982007e+00 6.254364e+00 4.533706e+00 4.326286e+00
#> [6] 3.850579e+00 3.752197e+00 3.538207e+00 3.139155e+00 2.546405e+00
#> [11] 2.373286e+00 1.666259e+00 1.553126e+00 1.263685e+00 1.024699e+00
#> [16] 9.630187e-01 9.225759e-01 7.221636e-01 1.020085e-01 -5.354392e-18
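One possible way to choose `k` from the printed relative importance values is a cumulative-share rule. The sketch below is a heuristic, not protr's own selection method: it treats each value as a share of the total, and the 0.8 threshold is an arbitrary assumption for illustration.

```r
# Relative importance values copied from the output printed above.
importance <- c(
  1.204960e+01, 7.982007e+00, 6.254364e+00, 4.533706e+00, 4.326286e+00,
  3.850579e+00, 3.752197e+00, 3.538207e+00, 3.139155e+00, 2.546405e+00,
  2.373286e+00, 1.666259e+00, 1.553126e+00, 1.263685e+00, 1.024699e+00,
  9.630187e-01, 9.225759e-01, 7.221636e-01, 1.020085e-01, -5.354392e-18
)

# Clamp the tiny negative last value (numerical noise) to zero.
importance <- pmax(importance, 0)

# Cumulative share of the total importance covered by the first k scales.
cum_share <- cumsum(importance) / sum(importance)

# Assumed threshold: keep the smallest k whose cumulative share reaches 80%.
threshold <- 0.8
k_selected <- which(cum_share >= threshold)[1]

round(cum_share[1:5], 3)
k_selected
```

With these values, the first five scales (the `k = 5` used in the example) account for about 56% of the total, and an 80% threshold selects `k = 10`.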