
This function calculates scales-based descriptors derived by Principal Components Analysis (PCA), with support for gaps in the sequence. Users can provide a customized amino acid property matrix. The function implements the core computation needed for two descriptor families in the protr package: scales-based descriptors derived from amino acid properties (AAindex), and scales-based descriptors derived from 20+ classes of 2D and 3D molecular descriptors (Topological, WHIM, VHSE, etc.).
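The PCA step can be sketched in base R as below. This is a minimal illustration, not protr's source code. It assumes that the scales are the `prcomp()` scores of each amino acid on the first `pc` components, and that `scale` maps onto `prcomp()`'s `scale.` argument. `pca_scales()` is a hypothetical helper, and the property matrix is random toy data.

```r
# Hypothetical helper (not part of protr): derive pc scales for each amino
# acid by PCA. Assumes rows = amino acids (one-letter row names) and
# columns = properties.
pca_scales <- function(propmat, pc, scale = TRUE, silent = TRUE) {
  stopifnot(pc <= ncol(propmat))
  # always center; divide by the column standard deviation only if scale = TRUE
  pca <- prcomp(propmat, center = TRUE, scale. = scale)
  if (!silent) {
    cat("Summary of the first", pc, "principal components:", "\n")
    # rows: Standard deviation, Proportion of Variance, Cumulative Proportion
    print(summary(pca)$importance[, 1:pc, drop = FALSE])
  }
  # scores of each amino acid on the first pc components (20 x pc)
  pca$x[, 1:pc, drop = FALSE]
}

# toy property matrix: 20 amino acids x 6 random properties
set.seed(1)
aa <- strsplit("ARNDCQEGHILKMFPSTWYV", "")[[1]]
toy <- matrix(rnorm(20 * 6), nrow = 20, dimnames = list(aa, paste0("P", 1:6)))
sc <- pca_scales(toy, pc = 3)
dim(sc)  # 20 3
```

With `silent = FALSE`, the helper prints the same three summary rows shown in the Examples section.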

Usage

extractScalesGap(x, propmat, pc, lag, scale = TRUE, silent = TRUE)

Arguments

x

A character vector containing the input protein sequence. Use '-' to represent gaps in the sequence.

propmat

A matrix containing the properties of the amino acids. Each row represents one amino acid type and each column represents one property. The one-letter amino acid codes must be provided as row names, because they are used to look up the properties of each amino acid type.
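The row names matter because each residue is looked up by its one-letter code. The sketch below shows such a lookup. `seq_to_scale_matrix()` is hypothetical, and mapping a gap '-' to zero on every scale is an assumption made for illustration only, not a description of how protr handles gaps.

```r
# Hypothetical helper (not part of protr): turn a sequence into a p x n
# matrix of scale values by looking residues up through the row names.
# Assumption: a gap '-' contributes 0 on every scale.
seq_to_scale_matrix <- function(x, scales) {
  res <- strsplit(x, "")[[1]]
  unknown <- setdiff(res, c(rownames(scales), "-"))
  if (length(unknown) > 0) {
    stop("no row name for: ", paste(unknown, collapse = ", "))
  }
  mat <- matrix(0, nrow = ncol(scales), ncol = length(res),
                dimnames = list(colnames(scales), NULL))
  is_aa <- res != "-"
  # row-name lookup: scales["A", ] holds the scales of alanine
  mat[, is_aa] <- t(scales[res[is_aa], , drop = FALSE])
  mat
}

# toy scales for three amino acids on two scales
toy_scales <- rbind(A = c(1, 2), C = c(-1, 0.5), G = c(0, -3))
colnames(toy_scales) <- c("PC1", "PC2")
m <- seq_to_scale_matrix("AC-G", toy_scales)
```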

pc

Integer. Use the first pc principal components as the scales. Must be no greater than the number of AA properties provided.

lag

The lag parameter. Must be less than the number of amino acids in the sequence.

scale

Logical. Should we auto-scale the property matrix (propmat) before PCA? Default is TRUE.

silent

Logical. Whether to suppress printing the standard deviation, proportion of variance, and cumulative proportion of the selected principal components. Default is TRUE (nothing is printed).

Value

A named vector of length lag * p^2, where p is the number of scales (principal components) selected.
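The lag * p^2 length is consistent with an auto cross covariance (ACC) transform: for each lag g from 1 to lag and each ordered pair of scales (j, k), average the product of scale j at position i and scale k at position i + g over the n - g valid positions. The sketch below assumes that formulation. `acc_sketch()` and its element names are hypothetical, and protr's normalization, ordering, and naming may differ.

```r
# Sketch of an auto cross covariance (ACC) transform (not protr's own code).
# mat: p x n matrix (p scales, n residues); returns lag * p^2 named values.
acc_sketch <- function(mat, lag) {
  p <- nrow(mat)
  n <- ncol(mat)
  if (lag >= n) stop("lag must be less than the sequence length")
  out <- numeric(0)
  for (g in seq_len(lag)) {
    for (j in seq_len(p)) {
      for (k in seq_len(p)) {
        # average product of scale j at position i and scale k at i + g
        v <- sum(mat[j, 1:(n - g)] * mat[k, (1 + g):n]) / (n - g)
        out <- c(out, setNames(v, paste0("s", j, "_s", k, "_lag", g)))
      }
    }
  }
  out
}

set.seed(2)
toy_mat <- matrix(rnorm(5 * 30), nrow = 5)  # p = 5 scales, n = 30 residues
desc <- acc_sketch(toy_mat, lag = 7)
length(desc)  # 7 * 5^2 = 175
```

The j = k terms are the auto covariances (lag * p values) and the j != k terms are the cross covariances (lag * p * (p - 1) values), which together give lag * p^2.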

See also

See extractProtFPGap for amino acid property based scales descriptors (protein fingerprint) with gap support.

Author

Nan Xiao <https://nanx.me>

Examples

library("protr")

# amino acid sequence with gaps
x <- readFASTA(system.file("protseq/align.fasta", package = "protr"))$`IXI_235`
data(AAindex)
AAidxmat <- t(na.omit(as.matrix(AAindex[, 7:26])))
scales <- extractScalesGap(x, propmat = AAidxmat, pc = 5, lag = 7, silent = FALSE)
#> Summary of the first 5 principal components: 
#>                             PC1      PC2      PC3      PC4     PC5
#> Standard deviation     12.38381 10.73268 7.742507 6.802462 5.22316
#> Proportion of Variance  0.28881  0.21693 0.112890 0.087140 0.05138
#> Cumulative Proportion   0.28881  0.50574 0.618640 0.705780 0.75716