BLOSUM and PAM Matrix-Derived Descriptors

This function calculates BLOSUM and PAM matrix-derived descriptors. For convenience, protr provides the BLOSUM45, BLOSUM50, BLOSUM62, BLOSUM80, BLOSUM100, PAM30, PAM40, PAM70, PAM120, and PAM250 substitution matrices for the 20 amino acids to choose from.

Usage

extractBLOSUM(x, submat = "AABLOSUM62", k, lag, scale = TRUE, silent = TRUE)

Arguments

x: A character vector containing the input protein sequence.
submat: Substitution matrix for the 20 amino acids. Should be one of AABLOSUM45, AABLOSUM50, AABLOSUM62, AABLOSUM80, AABLOSUM100, AAPAM30, AAPAM40, AAPAM70, AAPAM120, or AAPAM250. Default is "AABLOSUM62".
k: Integer. The number of scales to keep (i.e., the first k scales) derived from the substitution matrix. Choose it according to the relative importance values printed when silent = FALSE.
lag: The lag parameter. Must be less than the number of amino acids in the input sequence.
scale: Logical. Whether to auto-scale the substitution matrix (submat) before the eigendecomposition. Default is TRUE.
silent: Logical. If FALSE, print the relative importance of each scale (the diagonal values of the eigendecomposition result matrix B). Default is TRUE.
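The k, scale, and silent arguments all refer to an eigendecomposition of the substitution matrix. The base R sketch below illustrates that idea under a stated assumption: the scales are taken to be the leading eigenvectors weighted by the square roots of their eigenvalues, so that keeping all of them reproduces the matrix as Z %*% t(Z). This is not protr's internal code. derive_scales() and the toy 4 x 4 matrix are hypothetical, and the optional auto-scaling step (scale = TRUE) is skipped.

```r
# Minimal sketch (assumption, not protr's internal code): derive k numeric
# scales from a symmetric substitution matrix via eigendecomposition.
derive_scales <- function(submat, k) {
  stopifnot(isSymmetric(submat), k >= 1, k <= nrow(submat))
  # For symmetric input, eigen() returns real eigenvalues in decreasing order
  eig <- eigen(submat, symmetric = TRUE)
  idx <- seq_len(k)
  # Weight each retained eigenvector by sqrt(eigenvalue); clip negatives to 0
  w <- sqrt(pmax(eig$values[idx], 0))
  scales <- eig$vectors[, idx, drop = FALSE] %*% diag(w, nrow = k)
  dimnames(scales) <- list(rownames(submat), paste0("scale", idx))
  list(importance = eig$values, scales = scales)
}

# Hypothetical 4 x 4 toy matrix standing in for a 20 x 20 substitution matrix
aa <- c("A", "C", "D", "E")
toy <- matrix(c(
   4,  0, -2, -1,
   0,  9, -3, -4,
  -2, -3,  6,  2,
  -1, -4,  2,  5
), nrow = 4, byrow = TRUE, dimnames = list(aa, aa))

res <- derive_scales(toy, k = 2)
res$importance  # analogous to the "relative importance" printout
res$scales      # one row per amino acid, one column per retained scale
```

Keeping only the first k columns trades fidelity to the original matrix for fewer descriptors, which is why k is chosen from the importance values.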

Value

A named numeric vector of length lag * p^2, where p is the number of selected scales (i.e., p = k).
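The lag * p^2 length follows from computing one auto cross covariance (ACC) term for every lag from 1 to lag and every ordered pair of scales. The sketch below assumes the common ACC form sum(z[j, i] * z[m, i + l]) / (n - l), applied to a p x n matrix of per-residue scale values. acc_sketch() and its naming scheme are hypothetical and may differ from protr's actual output names and ordering.

```r
# Sketch (assumed ACC form, hypothetical names): z is a p x n matrix with one
# row per scale and one column per residue in sequence order.
acc_sketch <- function(z, lag) {
  p <- nrow(z)
  n <- ncol(z)
  if (lag >= n) stop("lag must be less than the sequence length")
  out <- numeric(lag * p^2)
  nms <- character(lag * p^2)
  pos <- 0
  for (l in seq_len(lag)) {
    for (j in seq_len(p)) {
      for (m in seq_len(p)) {
        pos <- pos + 1
        # Average product of scale j at residue i and scale m at residue i + l
        out[pos] <- sum(z[j, 1:(n - l)] * z[m, (1 + l):n]) / (n - l)
        nms[pos] <- paste0("scl", j, ".scl", m, ".lag", l)
      }
    }
  }
  names(out) <- nms
  out
}

# Toy input: p = 2 scales over a 4-residue sequence
z <- matrix(c(1, 2, 3, 4,
              0, 1, 0, 1), nrow = 2, byrow = TRUE)
v <- acc_sketch(z, lag = 2)
length(v)  # lag * p^2 = 2 * 2^2 = 8
```

With k = 5 and lag = 7, as in the Examples, the formula gives 7 * 5^2 = 175 descriptors.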

References

Georgiev, A. G. (2009). Interpretable numerical descriptors of amino acid space. Journal of Computational Biology, 16(5), 703–723.

Author

Nan Xiao <https://nanx.me>

Examples

x <- readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
blosum <- extractBLOSUM(x, submat = "AABLOSUM62", k = 5, lag = 7, scale = TRUE, silent = FALSE)
#> Relative importance of all the possible 20 scales: 
#>  [1]  1.204960e+01  7.982007e+00  6.254364e+00  4.533706e+00  4.326286e+00
#>  [6]  3.850579e+00  3.752197e+00  3.538207e+00  3.139155e+00  2.546405e+00
#> [11]  2.373286e+00  1.666259e+00  1.553126e+00  1.263685e+00  1.024699e+00
#> [16]  9.630187e-01  9.225759e-01  7.221636e-01  1.020085e-01 -5.354392e-18
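The relative importance values printed above are what k is meant to be chosen from. One simple heuristic, which is an assumption here and not a protr rule, is to keep the smallest k whose cumulative share of the total importance reaches a threshold such as 80%. The sketch copies the printed values; choose_k() is a hypothetical helper.

```r
# Relative importance values copied from the printed output above
importance <- c(
  1.204960e+01, 7.982007e+00, 6.254364e+00, 4.533706e+00, 4.326286e+00,
  3.850579e+00, 3.752197e+00, 3.538207e+00, 3.139155e+00, 2.546405e+00,
  2.373286e+00, 1.666259e+00, 1.553126e+00, 1.263685e+00, 1.024699e+00,
  9.630187e-01, 9.225759e-01, 7.221636e-01, 1.020085e-01, -5.354392e-18
)

# Clip the numerically-zero negative value, then compute cumulative shares
share <- cumsum(pmax(importance, 0)) / sum(pmax(importance, 0))

# Hypothetical rule: smallest k whose cumulative share reaches the threshold
choose_k <- function(share, threshold = 0.8) which(share >= threshold)[1]

round(share[5], 3)  # share retained by k = 5, as used in the call above
choose_k(share)
```

For these values, the first five scales retain about 56% of the total, while the 80% rule would suggest k = 10. A larger k yields more descriptors, since the output length grows as lag * k^2.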