Protein Sequence Segmentation/Partition

This function extracts segments from a protein sequence: fixed-width windows centered on each occurrence of a specified amino acid.

Usage

protseg(
  x,
  aa = c("A", "R", "N", "D", "C", "E", "Q", "G", "H", "I", "L", "K", "M", "F", "P", "S",
    "T", "W", "Y", "V"),
  k = 7
)

Arguments

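x: A character vector containing the input protein sequence.
aa: A character string giving the amino acid type to center the segments on. One of 'A', 'R', 'N', 'D', 'C', 'E', 'Q', 'G', 'H', 'I', 'L', 'K', 'M', 'F', 'P', 'S', 'T', 'W', 'Y', 'V'.
k: A positive integer specifying the half-window size, i.e. the number of residues taken on each side of the matched amino acid, so each segment is 2k + 1 residues long. Default is 7.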

Value

A named list. Each component contains one segment (a character string), and the name of each component is the position of the specified amino acid in the sequence.
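
The shape of the return value can be illustrated with a minimal base R sketch of the window arithmetic. The helper `segment_sketch()` and the toy sequence below are hypothetical and not part of protr. In particular, dropping matches whose full window would run past either end of the sequence is an assumption of this sketch, not a confirmed description of how `protseg()` treats the sequence ends.

```r
# Hypothetical helper for illustration only; not the protr implementation.
segment_sketch <- function(x, aa = "R", k = 7) {
  # Split the sequence string into single residues
  residues <- strsplit(x, split = "")[[1]]
  n <- length(residues)

  # Positions at which the target amino acid occurs
  pos <- which(residues == aa)

  # Assumption: keep only positions whose full window fits in the sequence
  pos <- pos[pos > k & pos <= n - k]

  # Cut a window of k residues on each side of every kept position
  segs <- lapply(pos, function(i) {
    paste(residues[(i - k):(i + k)], collapse = "")
  })

  # Name each segment by the position of its central residue
  names(segs) <- pos
  segs
}

# Made-up toy sequence with "R" at positions 1, 3, 7 and 12
toy <- "RARGGCRTTLLRAA"
segs <- segment_sketch(toy, aa = "R", k = 2)
print(segs)
```

With `k = 2`, each segment is 2 * 2 + 1 = 5 residues long, and the list names are the character strings "3", "7" and "12". The match at position 1 is dropped in this sketch because its window would start before the first residue. Since the names are character strings, numeric positions can be recovered with `as.integer(names(segs))`.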

Author

Nan Xiao <https://nanx.me>

Examples

x <- readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
protseg(x, aa = "R", k = 5)
#> $`6`
#> [1] "MDAMKRGLCCV"
#> 
#> $`29`
#> [1] "QEIHARFRRGA"
#> 
#> $`31`
#> [1] "IHARFRRGARS"
#> 
#> $`32`
#> [1] "HARFRRGARSY"
#> 
#> $`35`
#> [1] "FRRGARSYQVI"
#> 
#> $`42`
#> [1] "YQVICRDEKTQ"
#> 
#> $`58`
#> [1] "HQSWLRPVLRS"
#> 
#> $`62`
#> [1] "LRPVLRSNRVE"
#> 
#> $`65`
#> [1] "VLRSNRVEYCW"
#> 
#> $`75`
#> [1] "WCNSGRAQCHS"
#> 
#> $`90`
#> [1] "SCSEPRCFNGG"
#> 
#> $`124`
#> [1] "CEIDTRATCYE"
#> 
#> $`136`
#> [1] "QGISYRGTWST"
#> 
#> $`164`
#> [1] "KPYSGRRPDAI"
#> 
#> $`165`
#> [1] "PYSGRRPDAIR"
#> 
#> $`170`
#> [1] "RPDAIRLGLGN"
#> 
#> $`180`
#> [1] "NHNYCRNPDRD"
#> 
#> $`184`
#> [1] "CRNPDRDSKPW"
#> 
#> $`224`
#> [1] "NGSAYRGTHSL"
#> 
#> $`268`
#> [1] "KHNYCRNPDGD"
#> 
#> $`284`
#> [1] "HVLKNRRLTWE"
#> 
#> $`285`
#> [1] "VLKNRRLTWEY"
#> 
#> $`302`
#> [1] "STCGLRQYSQP"
#> 
#> $`310`
#> [1] "SQPQFRIKGGL"
#> 
#> $`333`
#> [1] "IFAKHRRSPGE"
#> 
#> $`334`
#> [1] "FAKHRRSPGER"
#> 
#> $`339`
#> [1] "RSPGERFLCGG"
#> 
#> $`362`
#> [1] "HCFQERFPPHH"
#> 
#> $`374`
#> [1] "TVILGRTYRVV"
#> 
#> $`377`
#> [1] "LGRTYRVVPGE"
#> 
#> $`418`
#> [1] "KSDSSRCAQES"
#> 
#> $`427`
#> [1] "ESSVVRTVCLP"
#> 
#> $`462`
#> [1] "PFYSERLKEAH"
#> 
#> $`469`
#> [1] "KEAHVRLYPSS"
#> 
#> $`475`
#> [1] "LYPSSRCTSQH"
#> 
#> $`484`
#> [1] "QHLLNRTVTDN"
#> 
#> $`497`
#> [1] "CAGDTRSGGPQ"
#> 
#> $`524`
#> [1] "CLNDGRMTLVG"
#> 
#> $`557`
#> [1] "YLDWIRDNMRP"
#>