# Scales-Based Descriptors derived by Factor Analysis

Source: `R/pcm-04-extractFAScales.R`

`extractFAScales.Rd`

This function calculates scales-based descriptors derived by Factor Analysis (FA). Users can provide customized amino acid property matrices.

## Usage

```
extractFAScales(
  x,
  propmat,
  factors,
  scores = "regression",
  lag,
  scale = TRUE,
  silent = TRUE
)
```

## Arguments

- x
A character vector, as the input protein sequence.

- propmat
A matrix containing the properties of the amino acids. Each row represents one amino acid type and each column represents one property. The row names must be the one-letter amino acid codes, because they are used to look up the properties of each AA type.

- factors
Integer. The number of factors to be fitted. Must be no greater than the number of AA properties provided.

- scores
Type of scores to produce. The default is `"regression"`, which gives Thomson's scores; `"Bartlett"` gives Bartlett's weighted least-squares scores.

- lag
The lag parameter. Must be less than the number of amino acids in the protein sequence.

- scale
Logical. Should the property matrix (`propmat`) be auto-scaled before performing Factor Analysis? Default is `TRUE`.

- silent
Logical. Whether to suppress printing of the SS loadings, proportion of variance, and cumulative proportion of the selected factors. Default is `TRUE`.
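The constraints stated in the argument list can be checked before calling the function. The following is a minimal sketch in base R, not part of protr: the helper name `check_fa_inputs`, the toy property matrix, and the toy sequence are all hypothetical and only illustrate the documented requirements (one-letter row names, `factors` no greater than the number of properties, `lag` less than the sequence length).

```r
# Hypothetical helper (not part of protr): checks the documented constraints.
check_fa_inputs <- function(x, propmat, factors, lag) {
  stopifnot(is.character(x), is.matrix(propmat), !is.null(rownames(propmat)))
  residues <- strsplit(x, "", fixed = TRUE)[[1]]
  stopifnot(
    all(nchar(rownames(propmat)) == 1L),   # one-letter row names
    all(residues %in% rownames(propmat)),  # every residue has a property row
    factors <= ncol(propmat),              # no more factors than properties
    lag < length(residues)                 # lag shorter than the sequence
  )
  invisible(TRUE)
}

# Toy inputs (assumed for illustration only): 20 amino acids, 6 random properties.
aa <- strsplit("ARNDCEQGHILKMFPSTWYV", "")[[1]]
set.seed(1)
toy_props <- matrix(rnorm(20 * 6), nrow = 20,
                    dimnames = list(aa, paste0("prop", 1:6)))
toy_seq <- "MKTAYIAKQR"

ok <- check_fa_inputs(toy_seq, toy_props, factors = 2, lag = 3)
```

A call that violates any constraint, such as `factors = 7` with six properties, stops with an error from `stopifnot()` instead of returning.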

## References

Atchley, W. R., Zhao, J., Fernandes, A. D., & Druke, T. (2005). Solving the protein sequence metric problem. Proceedings of the National Academy of Sciences of the United States of America, 102(18), 6395-6400.

## Author

Nan Xiao <https://nanx.me>

## Examples

```
library(protr)
x <- readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
data(AATopo)
tprops <- AATopo[, c(37:41, 43:47)] # select a set of topological descriptors
fa <- extractFAScales(x, propmat = tprops, factors = 5, lag = 7, silent = FALSE)
#> Summary of the factor analysis result:
#>
#> Call:
#> factanal(x = propmat, factors = factors, scores = scores)
#>
#> Uniquenesses:
#> WhetZ Whetm Whetv Whete Whetp JhetZ Jhetm Jhetv Jhete Jhetp
#> 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005
#>
#> Loadings:
#> Factor1 Factor2 Factor3 Factor4 Factor5
#> WhetZ 0.982 -0.166
#> Whetm 0.982 -0.166
#> Whetv 0.989 -0.133
#> Whete 0.982 -0.159
#> Whetp 0.989 -0.131
#> JhetZ -0.399 0.881 0.162 0.192
#> Jhetm -0.399 0.881 0.162 0.192
#> Jhetv 0.985 -0.131
#> Jhete -0.378 0.872 0.300
#> Jhetp 0.970 -0.210
#>
#> Factor1 Factor2 Factor3 Factor4 Factor5
#> SS loadings 5.313 4.340 0.212 0.107 0.003
#> Proportion Var 0.531 0.434 0.021 0.011 0.000
#> Cumulative Var 0.531 0.965 0.986 0.997 0.997
#>
#> Test of the hypothesis that 5 factors are sufficient.
#> The chi square statistic is 472.2 on 5 degrees of freedom.
#> The p-value is 7.96e-100
```
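The `Call:` line in the output above shows that the factor analysis is fitted with `factanal()` from base R's stats package. As a rough illustration of what the `scores` argument selects, the sketch below fits the same model with both score types on a simulated property matrix. The data, the two-factor structure, and all variable names are assumptions made for this example; it does not reproduce the lag-based descriptor step of `extractFAScales()`.

```r
set.seed(42)
aa <- strsplit("ARNDCEQGHILKMFPSTWYV", "")[[1]]

# Simulated data (assumption): two latent factors generate six toy properties.
f <- matrix(rnorm(20 * 2), nrow = 20)
true_loadings <- rbind(c(0.9, 0), c(0.8, 0), c(0.7, 0),
                       c(0, 0.9), c(0, 0.8), c(0, 0.7))
toy_props <- f %*% t(true_loadings) + matrix(rnorm(20 * 6, sd = 0.4), nrow = 20)
dimnames(toy_props) <- list(aa, paste0("prop", 1:6))

# Same model, two kinds of factor scores.
fa_reg  <- factanal(toy_props, factors = 2, scores = "regression")
fa_bart <- factanal(toy_props, factors = 2, scores = "Bartlett")

# One row of scores per amino acid, one column per factor.
dim(fa_reg$scores)
head(fa_bart$scores, 3)
```

Both fits share the same loadings and uniquenesses; only the per-residue score matrix differs between the two methods.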