Remove or replace gaps and other irregular characters in protein sequences, to make them suitable for feature extraction or sequence-alignment-based similarity computation.

Usage

removeGaps(x, pattern = "-", replacement = "", ...)

Arguments

x

character vector, containing the input protein sequence(s).

pattern

character string containing the gap (or other irregular) character to be removed or replaced. Default is "-". For advanced usage, see gsub.

replacement

a replacement for matched characters. Default is "" (remove the matched character).

...

additional parameters for gsub.
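
The arguments above refer to gsub for advanced usage, so the behaviour of pattern, replacement, and ... can be sketched with base R's gsub() directly. This is only an illustration under the assumption that removeGaps() hands its arguments to gsub() as the descriptions suggest; the input sequences and the variable names are made up for this sketch.

```r
# Hypothetical input: two short sequences containing "-", "." and "*"
seqs <- c("MHGD-TPTL*", "MH.GDTP-TL")

# Default pattern and replacement: drop every "-"
# (assumed to correspond to removeGaps(seqs))
no_gaps <- gsub("-", "", seqs)

# pattern is a regular expression, so a character class can
# remove several irregular characters in one pass
cleaned <- gsub("[-.*]", "", seqs)

# An extra gsub argument such as fixed = TRUE (the kind of argument
# that would go through ...) makes "." match a literal dot only
no_dots <- gsub(".", "", seqs, fixed = TRUE)

# A non-empty replacement substitutes the match instead of removing it
masked <- gsub("-", "X", seqs)
```

Without fixed = TRUE, a pattern of "." would match every character, so literal matching is the safer choice whenever the irregular character is also a regex metacharacter.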

Value

a vector of protein sequence(s) with gaps or irregular characters removed/replaced.

Author

Nan Xiao <https://nanx.me>

Examples

# amino acid sequences that contain gaps ("-")
aaseq <- list(
  "MHGDTPTLHEYMLDLQPETTDLYCYEQLSDSSE-EEDEIDGPAGQAEPDRAHYNIVTFCCKCDSTLRLCVQS",
  "MHGDTPTLHEYMLDLQPETTDLYCYEQLNDSSE-EEDEIDGPAGQAEPDRAHYNIVTFCCKCDSTLRLCVQS"
)
if (FALSE) { # \dontrun{
# gaps create issues for alignment
parSeqSim(aaseq)

# remove the gaps
nogapseq <- removeGaps(aaseq)
parSeqSim(nogapseq)
} # }
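
Before and after cleaning, it can be useful to check whether any gaps or other irregular characters are present. The following is a minimal sketch using base R's grepl(); the short sequences are hypothetical, and the assumption that "irregular" means any character outside the 20 standard amino acid letters is made for this example only.

```r
# Hypothetical sequences: the first contains a gap, the second does not
aaseq_check <- c("MHGDTPTLHEY-MLDLQ", "MHGDTPTLHEYMLDLQ")

# TRUE for each sequence that contains a literal "-"
has_gap <- grepl("-", aaseq_check, fixed = TRUE)

# TRUE for each sequence with a character outside the
# 20 standard amino acid letters (assumption for this sketch)
irregular <- grepl("[^ACDEFGHIKLMNPQRSTVWY]", aaseq_check)

# After removing the gaps, no sequence should be flagged
cleaned_check <- gsub("-", "", aaseq_check, fixed = TRUE)
still_irregular <- grepl("[^ACDEFGHIKLMNPQRSTVWY]", cleaned_check)
```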