This function reads protein sequences in FASTA format.
Usage
readFASTA(
file = system.file("protseq/P00750.fasta", package = "protr"),
legacy.mode = TRUE,
seqonly = FALSE
)
Arguments
- file
Path to the file containing the protein sequences in FASTA format. If it does not contain an absolute or relative path, the file name is relative to the current working directory, getwd(). The default here is to read the P00750.fasta file, which is present in the protseq directory of the protr package.
- legacy.mode
If set to TRUE, lines starting with a semicolon (;) are ignored. Default value is TRUE.
- seqonly
If set to TRUE, only the sequences are returned, without any attempt to modify them or to get their names and annotations (execution time is divided approximately by a factor of 3). Default value is FALSE.
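To make the effect of the semicolon rule concrete, the following is a minimal base-R sketch of FASTA parsing. It is not protr's implementation: the helper read_fasta_sketch and its legacy_mode argument are hypothetical names introduced only for illustration, and the sketch assumes that header lines start with ">" and that all other non-empty lines belong to the preceding record.

```r
# Hypothetical helper, NOT part of protr: a minimal base-R FASTA reader
# illustrating how semicolon comment lines can be skipped.
read_fasta_sketch <- function(path, legacy_mode = TRUE) {
  lines <- readLines(path, warn = FALSE)
  lines <- lines[nzchar(lines)]                  # drop empty lines
  if (legacy_mode) {
    lines <- lines[!startsWith(lines, ";")]      # skip lines starting with ";"
  }
  is_header <- startsWith(lines, ">")
  if (!any(is_header)) stop("no FASTA header line found")
  record_id <- cumsum(is_header)                 # record index of every line
  # Group sequence lines by record; lines before the first header are dropped
  groups <- split(
    lines[!is_header],
    factor(record_id[!is_header], levels = seq_len(sum(is_header)))
  )
  # Join the lines of each record into a single sequence string
  seqs <- vapply(groups, paste, character(1), collapse = "")
  names(seqs) <- sub("^>", "", lines[is_header]) # header text without ">"
  seqs
}

# Write a small two-record file with a comment line inside the first record
tmp <- tempfile(fileext = ".fasta")
writeLines(c(">seq1 demo", "; legacy comment", "MKT", "AYI", ">seq2", "GAVL"), tmp)

result <- read_fasta_sketch(tmp)
print(result)
```

With legacy_mode = FALSE this sketch simply treats the semicolon line as sequence text, which is why skipping such lines is the safer default for older files.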
Value
Character vector of the protein sequences.
References
Pearson, W.R. and Lipman, D.J. (1988) Improved tools for biological sequence comparison. Proceedings of the National Academy of Sciences of the United States of America, 85: 2444–2448.
See also
See getUniProt for retrieving protein sequences from uniprot.org.
Author
Nan Xiao <https://nanx.me>
Examples
P00750 <- readFASTA(system.file("protseq/P00750.fasta", package = "protr"))