This function calculates descriptors derived from amino acid substitution matrices.
For convenience, protr provides the BLOSUM45, BLOSUM50,
BLOSUM62, BLOSUM80, BLOSUM100, PAM30, PAM40, PAM70, PAM120, and PAM250
matrices for the 20 amino acids to choose from.
Arguments
- x
A character vector containing the input protein sequence.
- submat
Substitution matrix for the 20 amino acids. Should be one of
"AABLOSUM45", "AABLOSUM50", "AABLOSUM62", "AABLOSUM80", "AABLOSUM100", "AAPAM30", "AAPAM40", "AAPAM70", "AAPAM120", or "AAPAM250". Default is "AABLOSUM62".
- k
Integer. The number of selected scales (i.e. the first
k scales) derived from the substitution matrix. This can be chosen according to the printed relative importance values.
- lag
The lag parameter. Must be less than the number of amino acids in the input sequence.
- scale
Logical. Should the substitution matrix (submat) be auto-scaled before the eigen decomposition? Default is TRUE.
- silent
Logical. Whether to print the relative importance of each scale (the diagonal values of the eigen decomposition result matrix B). Default is
TRUE.
References
Georgiev, A. G. (2009). Interpretable numerical descriptors of amino acid space. Journal of Computational Biology, 16(5), 703–723.
Author
Nan Xiao <https://nanx.me>
Examples
x <- readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
blosum <- extractBLOSUM(x, submat = "AABLOSUM62", k = 5, lag = 7, scale = TRUE, silent = FALSE)
#> Relative importance of all the possible 20 scales:
#> [1] 1.204960e+01 7.982007e+00 6.254364e+00 4.533706e+00 4.326286e+00
#> [6] 3.850579e+00 3.752197e+00 3.538207e+00 3.139155e+00 2.546405e+00
#> [11] 2.373286e+00 1.666259e+00 1.553126e+00 1.263685e+00 1.024699e+00
#> [16] 9.630187e-01 9.225759e-01 7.221636e-01 1.020085e-01 -5.354392e-18
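The k argument is meant to be chosen from the printed relative importance values. One common heuristic, borrowed from principal component analysis and not something protr itself prescribes, is to keep the smallest number of leading scales whose cumulative share of the total relative importance reaches a threshold. The base R sketch below applies that idea to the values printed in the example above; the helper choose_k and the thresholds 0.5 and 0.8 are hypothetical illustrations, not part of protr.

```r
# Relative importance values copied from the example output above
# (submat = "AABLOSUM62", scale = TRUE), as printed
imp <- c(
  12.04960, 7.982007, 6.254364, 4.533706, 4.326286,
  3.850579, 3.752197, 3.538207, 3.139155, 2.546405,
  2.373286, 1.666259, 1.553126, 1.263685, 1.024699,
  0.9630187, 0.9225759, 0.7221636, 0.1020085, -5.354392e-18
)

# Hypothetical helper (not a protr function): smallest k whose leading
# scales account for at least `threshold` of the total relative importance
choose_k <- function(imp, threshold = 0.8) {
  imp <- pmax(imp, 0)               # treat tiny negative values (numerical noise) as zero
  prop <- cumsum(imp) / sum(imp)    # cumulative share of the total
  which(prop >= threshold)[1]       # first position reaching the threshold
}

choose_k(imp, 0.5)  # 5
choose_k(imp, 0.8)  # 10
```

The threshold is a judgment call: a lower value keeps fewer scales and therefore yields a shorter descriptor vector, at the cost of discarding more of the variation captured by the substitution matrix.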
