Generalized Scales-Based Descriptors derived by Factor Analysis

Usage

extractPCMFAScales(
  x,
  propmat,
  factors,
  scores = "regression",
  lag,
  scale = TRUE,
  silent = TRUE
)

Arguments

x

A character vector containing the input protein sequence.

propmat

A matrix containing the properties of the amino acids. Each row represents one amino acid type and each column represents one property. The row names must be the one-letter amino acid codes, because they are used to look up the properties of each residue in the sequence.

factors

Integer. The number of factors to be fitted. Must be no greater than the number of AA properties provided.

scores

Type of scores to produce. The default is "regression", which gives Thompson's scores; "Bartlett" gives Bartlett's weighted least-squares scores.

lag

The lag parameter. Must be less than the number of amino acids in the protein sequence.

scale

Logical. Should the property matrix (propmat) be auto-scaled before the factor analysis? Default is TRUE.

silent

Logical. If FALSE, the SS loadings, proportion of variance, and cumulative proportion of the selected factors are printed. Default is TRUE (nothing is printed).
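
The constraints on the arguments above can be checked before calling the function. The following is only a sketch in base R: the property values, the sequence, and all variable names are made up for illustration and are not part of the package.

```r
# Hypothetical inputs, for illustration only (not real property values).
aa <- c("A", "R", "N", "D", "C", "E", "Q", "G", "H", "I",
        "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")
propmat <- matrix(seq_len(20 * 3), nrow = 20,
                  dimnames = list(aa, c("p1", "p2", "p3")))
x <- "MKTAYIAKQRQISFVK"
factors <- 2
lag <- 5

# Every residue in the sequence needs a matching one-letter row name.
residues <- strsplit(x, "")[[1]]
ok_rows <- all(residues %in% rownames(propmat))

# factors must be no greater than the number of properties (columns).
ok_factors <- factors <= ncol(propmat)

# lag must be less than the number of amino acids in the sequence.
ok_lag <- lag < nchar(x)

stopifnot(ok_rows, ok_factors, ok_lag)
```

Running these checks up front makes it easier to tell which argument is at fault when a call fails.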

Value

A named vector of length lag * p^2, where p is the number of scales (factors) selected.

Details

This function calculates the generalized scales-based descriptors derived by Factor Analysis (FA). Users can provide customized amino acid property matrices.
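
A minimal sketch of how descriptors of this kind could be computed with base R alone. It assumes (this page does not say so) that each residue is mapped to its factor scores and that the scores are then combined into lagged auto- and cross-covariance terms. The toy property matrix, the sequence, and all variable names are made up, and the ordering and naming of elements in the function's actual return value may differ.

```r
# Synthetic property matrix with a built-in two-factor structure
# (illustrative values only, not real amino acid properties).
aa <- c("A", "R", "N", "D", "C", "E", "Q", "G", "H", "I",
        "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")
set.seed(42)
latent <- matrix(rnorm(20 * 2), nrow = 20)
loadings <- matrix(c(0.9, 0.8, 0.7, 0.1, 0.0, 0.2,
                     0.1, 0.0, 0.2, 0.9, 0.8, 0.7),
                   nrow = 2, byrow = TRUE)
propmat <- latent %*% loadings + matrix(rnorm(20 * 6, sd = 0.3), nrow = 20)
dimnames(propmat) <- list(aa, paste0("prop", 1:6))

x <- "MKTAYIAKQRQISFVKSHFSRQ"  # hypothetical sequence
p <- 2                         # number of factors (scales)
lag <- 3

# Fit the factor model on the scaled property matrix and keep the scores.
fit <- factanal(scale(propmat), factors = p, scores = "regression")

# Look up the factor scores of every residue: one row per sequence position.
scl <- fit$scores[strsplit(x, "")[[1]], , drop = FALSE]

# For each lag, a p x p block of averaged products between positions i and i + lag.
n <- nrow(scl)
desc <- numeric(0)
for (l in seq_len(lag)) {
  a <- scl[1:(n - l), , drop = FALSE]
  b <- scl[(1 + l):n, , drop = FALSE]
  desc <- c(desc, as.vector(crossprod(a, b) / (n - l)))
}

length(desc) == lag * p^2
```

Each lag contributes p^2 values (p auto terms on the diagonal and p^2 - p cross terms off it), which is where a total length of lag * p^2 comes from.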

References

Atchley, W. R., Zhao, J., Fernandes, A. D., & Drüke, T. (2005). Solving the protein sequence metric problem. Proceedings of the National Academy of Sciences of the United States of America, 102(18), 6395-6400.

Examples

x = readFASTA(system.file('protseq/P00750.fasta', package = 'Rcpi'))[[1]]
data(AATopo)
tprops = AATopo[, c(37:41, 43:47)]  # select a set of topological descriptors
fa = extractPCMFAScales(x, propmat = tprops, factors = 5, lag = 7, silent = FALSE)
#> Summary of the factor analysis result:
#> 
#> Call:
#> factanal(x = propmat, factors = factors, scores = scores)
#> 
#> Uniquenesses:
#> WhetZ Whetm Whetv Whete Whetp JhetZ Jhetm Jhetv Jhete Jhetp 
#> 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 
#> 
#> Loadings:
#>       Factor1 Factor2 Factor3 Factor4 Factor5
#> WhetZ  0.982  -0.166                         
#> Whetm  0.982  -0.166                         
#> Whetv  0.989  -0.133                         
#> Whete  0.982  -0.159                         
#> Whetp  0.989  -0.131                         
#> JhetZ -0.399   0.881   0.162   0.192         
#> Jhetm -0.399   0.881   0.162   0.192         
#> Jhetv          0.985          -0.131         
#> Jhete -0.378   0.872   0.300                 
#> Jhetp          0.970  -0.210                 
#> 
#>                Factor1 Factor2 Factor3 Factor4 Factor5
#> SS loadings      5.313   4.340   0.212   0.107   0.003
#> Proportion Var   0.531   0.434   0.021   0.011   0.000
#> Cumulative Var   0.531   0.965   0.986   0.997   0.997
#> 
#> Test of the hypothesis that 5 factors are sufficient.
#> The chi square statistic is 472.2 on 5 degrees of freedom.
#> The p-value is 7.96e-100