Remove or replace gaps and other irregular characters in protein sequences, making them suitable for feature extraction or alignment-based similarity computation.
Arguments
- x
Character vector containing the input protein sequence(s).
- pattern
Character string containing the gap (or other irregular) character to be removed or replaced. Default is "-". For advanced usage, see gsub.
- replacement
A replacement for matched characters. Default is "" (remove the matched character).
- ...
Additional parameters for gsub.
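The argument list mirrors base R's gsub(), so the core behavior can be approximated in a few lines. The sketch below uses a hypothetical helper, strip_gaps(), that copies the documented defaults (pattern = "-", replacement = "") and forwards extra arguments to gsub() through `...`. It is an assumption about how removeGaps() behaves, not protr's actual source, and it always returns a list.

```r
# Hypothetical stand-in for removeGaps(), using base R only.
# Mirrors the documented arguments; NOT protr's actual implementation.
strip_gaps <- function(x, pattern = "-", replacement = "", ...) {
  # Run gsub() on each sequence; extra arguments such as fixed = TRUE
  # are passed straight through to gsub() via `...`.
  lapply(x, function(s) gsub(pattern, replacement, s, ...))
}

aaseq <- list("MHGD-TPTL", "MHGDTP--TL")

# Default: delete every "-"
strip_gaps(aaseq)                     # list("MHGDTPTL", "MHGDTPTL")

# Replace each gap with "X" instead of deleting it
strip_gaps(aaseq, replacement = "X")  # list("MHGDXTPTL", "MHGDTPXXTL")
```

Replacing rather than deleting keeps every sequence at its original length, which can matter when positions must stay comparable across sequences.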
Author
Nan Xiao <https://nanx.me>
Examples
# amino acid sequences that contain gaps ("-")
aaseq <- list(
"MHGDTPTLHEYMLDLQPETTDLYCYEQLSDSSE-EEDEIDGPAGQAEPDRAHYNIVTFCCKCDSTLRLCVQS",
"MHGDTPTLHEYMLDLQPETTDLYCYEQLNDSSE-EEDEIDGPAGQAEPDRAHYNIVTFCCKCDSTLRLCVQS"
)
if (FALSE) { # \dontrun{
# gaps cause problems for alignment-based similarity
parSeqSim(aaseq)
# remove the gaps
nogapseq <- removeGaps(aaseq)
parSeqSim(nogapseq)
} # }
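"For advanced usage" refers to regular-expression patterns. The sketch below calls base gsub() directly so it runs without protr installed. The first call strips every character outside the 20 standard amino-acid letters; the second passes fixed = TRUE so "." is matched literally instead of as "any character". Passing the same pattern (and fixed = TRUE through `...`) to removeGaps() should behave the same way, assuming it forwards them to gsub() unchanged as the argument descriptions suggest.

```r
seqs <- c("MHGD-TPTL*", "MHGD.TPBZTL")

# Regex: drop everything that is not one of the 20 standard amino acids
gsub("[^ACDEFGHIKLMNPQRSTVWY]", "", seqs)  # "MHGDTPTL" "MHGDTPTL"

# Literal match: with fixed = TRUE, "." is only a dot, not a wildcard
gsub(".", "", seqs, fixed = TRUE)          # "MHGD-TPTL*" "MHGDTPBZTL"
```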