# Scales-Based Descriptors derived by Principal Components Analysis (with Gap Support)

Source: `R/pcm-01-extractScalesGap.R`

`extractScalesGap.Rd`

This function calculates scales-based descriptors derived by Principal Components Analysis (PCA), with gap support. Users can provide customized amino acid property matrices. This function implements the core computation procedure needed for the scales-based descriptors derived by AA-Properties (AAindex) and scales-based descriptors derived by 20+ classes of 2D and 3D molecular descriptors (Topological, WHIM, VHSE, etc.) in the protr package.

## Arguments

- x
A character vector, as the input protein sequence. Use `-` to represent gaps in the sequence.

- propmat
A matrix containing the properties for the amino acids. Each row represents one amino acid type, and each column represents one property. The one-letter row names must be provided, because they are used to look up the properties for each amino acid type.

- pc
Integer. Use the first `pc` principal components as the scales. Must be no greater than the number of AA properties provided.

- lag
The lag parameter. Must be less than the number of amino acids in the sequence.

- scale
Logical. Should the property matrix (`propmat`) be auto-scaled before PCA? Default is `TRUE`.

- silent
Logical. Whether to suppress printing of the standard deviation, proportion of variance, and cumulative proportion of the selected principal components. Default is `TRUE`.
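To make the roles of `propmat`, `pc`, and `scale` more concrete, the following is a minimal base R sketch of the PCA step only. It is not the protr implementation: `toy_propmat` and its random property values are hypothetical, and the final lookup simply skips gap positions rather than reproducing how `extractScalesGap()` treats them.

```r
# Hypothetical property matrix: 20 amino acids (rows) x 4 made-up properties (columns).
# Real use would pass measured properties, e.g. a subset of AAindex.
set.seed(42)
aa <- strsplit("ARNDCQEGHILKMFPSTWYV", "")[[1]]
toy_propmat <- matrix(
  rnorm(length(aa) * 4),
  nrow = length(aa),
  dimnames = list(aa, paste0("prop", 1:4))  # one-letter row names are required
)

pc <- 3  # must be no greater than ncol(toy_propmat)

# PCA on the auto-scaled property matrix (the analogue of scale = TRUE).
pca <- prcomp(toy_propmat, scale. = TRUE)

# One row of PC scores per amino acid; the first `pc` columns act as the scales.
aa_scales <- pca$x[, seq_len(pc), drop = FALSE]

# Standard deviation, proportion of variance, and cumulative proportion
# for the selected components (the kind of table shown when silent = FALSE).
pc_summary <- summary(pca)$importance[, seq_len(pc), drop = FALSE]
print(pc_summary)

# Row names let us look up scales per residue; gap positions ("-") have no row,
# so this sketch drops them before indexing (an assumption, see above).
residues <- strsplit("AR-ND", "")[[1]]
residue_scales <- aa_scales[residues[residues != "-"], , drop = FALSE]
print(residue_scales)
```

The row-name lookup in the last step is why `propmat` must carry one-letter row names, and the `pc` limit follows from PCA yielding at most one component per property column.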

## See also

See `extractProtFPGap` for amino acid property based scales descriptors (protein fingerprint) with gap support.

## Author

Nan Xiao <https://nanx.me>

## Examples

```
# amino acid sequence with gaps
x <- readFASTA(system.file("protseq/align.fasta", package = "protr"))$`IXI_235`
data(AAindex)
AAidxmat <- t(na.omit(as.matrix(AAindex[, 7:26])))
scales <- extractScalesGap(x, propmat = AAidxmat, pc = 5, lag = 7, silent = FALSE)
#> Summary of the first 5 principal components:
#>                             PC1      PC2      PC3      PC4     PC5
#> Standard deviation     12.38381 10.73268 7.742507 6.802462 5.22316
#> Proportion of Variance  0.28881  0.21693 0.112890 0.087140 0.05138
#> Cumulative Proportion   0.28881  0.50574 0.618640 0.705780 0.75716
```