
This function calculates amino acid properties-based scales descriptors (protein fingerprint) with gap support. Users can choose which properties to use by specifying their numerical or character indices in the AAindex database.

Usage

extractProtFPGap(x, index = NULL, pc, lag, scale = TRUE, silent = TRUE)

Arguments

x

A character vector, as the input protein sequence. Use '-' to represent gaps in the sequence.

index

Integer vector or character vector. Selects properties from the AAindex database by their numerical or character index. Default is NULL, which selects all the amino acid properties in the AAindex database.

pc

Integer. Use the first pc principal components as the scales. Must be no greater than the number of AA properties provided.

lag

The lag parameter. Must be less than the number of amino acids in the sequence.

scale

Logical. Whether to auto-scale the property matrix before PCA. Default is TRUE.

silent

Logical. Whether to suppress printing of the standard deviation, proportion of variance, and cumulative proportion of the selected principal components. Default is TRUE (nothing is printed); set to FALSE to print the summary.
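
The roles of pc, scale, and silent can be illustrated with a minimal base R sketch. It assumes the PCA step behaves like stats::prcomp applied to a 20 x k matrix of amino acid properties; the matrix prop_mat below is random stand-in data, not real AAindex values, and the variable names are hypothetical rather than taken from the package.

```r
# Hypothetical stand-in for a property matrix: 20 amino acids x 6 properties.
# Real values would come from the AAindex database; these are random.
set.seed(42)
aa <- c("A", "R", "N", "D", "C", "E", "Q", "G", "H", "I",
        "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")
prop_mat <- matrix(rnorm(20 * 6), nrow = 20,
                   dimnames = list(aa, paste0("prop", 1:6)))

pc <- 3  # must not exceed the number of properties, ncol(prop_mat)

# scale. = TRUE standardizes each property column before PCA
# (the analogue of scale = TRUE in this sketch)
pca <- prcomp(prop_mat, scale. = TRUE)

# One row of scales per amino acid, one column per retained component
scales <- pca$x[, 1:pc]

# The kind of summary that would be shown when silent = FALSE
importance <- summary(pca)$importance[, 1:pc]
print(importance)
```

The rows of importance carry the same three labels as the summary printed in the Examples section (standard deviation, proportion of variance, cumulative proportion).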

Value

A named vector of length lag * p^2, where p is the number of scales (principal components) selected.
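
One way to see where the length lag * p^2 comes from: for each lag there are p auto-covariance terms (each scale with itself) and p * (p - 1) cross-covariance terms (each ordered pair of distinct scales), giving p^2 values per lag. The sketch below is a hedged illustration in base R. The helper acc_sketch is hypothetical and not part of the package, the input is random data, and the exact normalization and ordering of terms used by the package may differ.

```r
# Hypothetical helper, not part of protr: auto- and cross-covariance of a
# p x n matrix of scale values (one row per scale, one column per position).
acc_sketch <- function(mat, lag) {
  p <- nrow(mat)
  n <- ncol(mat)
  stopifnot(lag >= 1, lag < n)  # lag must be less than the sequence length
  out <- numeric(0)
  for (l in seq_len(lag)) {
    head_part <- mat[, 1:(n - l), drop = FALSE]
    tail_part <- mat[, (1 + l):n, drop = FALSE]
    # p x p matrix: entry [i, j] averages scale i at position k times
    # scale j at position k + l; the diagonal holds the auto-covariances
    cc <- head_part %*% t(tail_part) / (n - l)
    out <- c(out, as.vector(cc))
  }
  out
}

set.seed(1)
p <- 5; lag <- 7; n <- 30
mat <- matrix(rnorm(p * n), nrow = p)  # random stand-in for per-residue scales
res <- acc_sketch(mat, lag)
length(res)  # lag * p^2 = 7 * 25 = 175
```

With pc = 5 and lag = 7, as in the Examples section, the same arithmetic gives a vector of length 175.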

Author

Nan Xiao <https://nanx.me>

Examples

# amino acid sequence with gaps
x <- readFASTA(system.file("protseq/align.fasta", package = "protr"))$`IXI_235`
fp <- extractProtFPGap(x, index = c(160:165, 258:296), pc = 5, lag = 7, silent = FALSE)
#> Summary of the first 5 principal components: 
#>                             PC1      PC2      PC3      PC4     PC5
#> Standard deviation     4.398253 2.620509 2.267688 1.756102 1.52816
#> Proportion of Variance 0.429880 0.152600 0.114280 0.068530 0.05189
#> Cumulative Proportion  0.429880 0.582480 0.696760 0.765290 0.81718