
Rcpi Quick Reference Card
Nan Xiao <https://nanx.me>
Source:vignettes/Rcpi-quickref.Rmd
      Rcpi-quickref.RmdRetrieve protein sequence data from online databases
| Function name | Function description | 
|---|---|
| getProt() | Retrieve protein sequence in FASTA format or PDB format from various online databases | 
| getFASTAFromUniProt() | Retrieve protein sequence in FASTA format from UniProt | 
| getFASTAFromKEGG() | Retrieve protein sequence in FASTA format from KEGG | 
| getPDBFromRCSBPDB() | Retrieve protein sequence in PDB Format from RCSB PDB | 
| getSeqFromUniProt() | Retrieve protein sequence from UniProt | 
| getSeqFromKEGG() | Retrieve protein sequence from KEGG | 
| getSeqFromRCSBPDB() | Retrieve protein sequence from RCSB PDB | 
Retrieve drug molecular data from online databases
| Function name | Function description | 
|---|---|
| getDrug() | Retrieve drug molecules in MOL format and SMILES format from various online databases | 
| getMolFromDrugBank() | Retrieve drug molecules in MOL format from DrugBank | 
| getMolFromPubChem() | Retrieve drug molecules in MOL format from PubChem | 
| getMolFromChEMBL() | Retrieve drug molecules in MOL format from ChEMBL | 
| getMolFromKEGG() | Retrieve drug molecules in MOL format from the KEGG | 
| getMolFromCAS() | Retrieve drug molecules in InChI format from CAS | 
| getSmiFromDrugBank() | Retrieve drug molecules in SMILES format from DrugBank | 
| getSmiFromPubChem() | Retrieve drug molecules in SMILES format from PubChem | 
| getSmiFromChEMBL() | Retrieve drug molecules in SMILES format from ChEMBL | 
| getSmiFromKEGG() | Retrieve drug molecules in SMILES format from KEGG | 
Calculate commonly used protein sequence derived descriptors
| Function name | Descriptor name | Descriptor group | 
|---|---|---|
| extractProtAAC() | Amino acid composition | Amino acid composition | 
| extractProtDC() | Dipeptide composition | |
| extractProtTC() | Tripeptide composition | |
| extractProtMoreauBroto() | Normalized Moreau-Broto autocorrelation | Autocorrelation | 
| extractProtMoran() | Moran autocorrelation | |
| extractProtGeary() | Geary autocorrelation | |
| extractProtCTDC() | Composition | CTD | 
| extractProtCTDT() | Transition | |
| extractProtCTDD() | Distribution | |
| extractProtCTriad() | Conjoint Triad | Conjoint Triad | 
| extractProtSOCN() | Sequence-order-coupling number | Quasi-sequence-order | 
| extractProtQSO() | Quasi-sequence-order descriptors | |
| extractProtPAAC() | Pseudo-amino acid composition | Pseudo-amino acid composition | 
| extractProtAPAAC() | Amphiphilic pseudo-amino acid composition | |
| AAindex | AAindex data of 544 physicochemical and biological properties for 20 amino acids | Dataset | 
Generate profile-based protein representations
| Function name | Function description | 
|---|---|
| extractProtPSSM() | Compute PSSM (Position-Specific Scoring Matrix) for given protein sequence or peptides | 
| extractProtPSSMFeature() | Profile-based protein representation derived by PSSM | 
| extractProtPSSMAcc() | Profile-based protein representation derived by PSSM and auto cross covariance (ACC) | 
Generate scales-based descriptors for proteochemometrics modeling
| Function name | Descriptor class | Derived by | 
|---|---|---|
| extractPCMScales() | Generalized scales-based descriptors derived by principal components analysis (PCA) | Principal components analysis | 
| extractPCMPropScales() | Generalized scales-based descriptors derived by amino acid properties (AAindex) | |
| extractPCMDescScales() | Generalized scales-based descriptors derived by 2D and 3D molecular descriptors (Topological, WHIM, VHSE, etc.) | |
| extractPCMFAScales() | Generalized scales-based descriptors derived by factor analysis | Factor analysis | 
| extractPCMMDSScales() | Generalized scales-based descriptors derived by multidimensional scaling (MDS) | Multidimensional scaling | 
| extractPCMBLOSUM() | Generalized BLOSUM and PAM matrix-derived descriptors | Substitution matrix | 
| acc() | Auto cross covariance (ACC) for generating scales-based descriptors of the same length | 
Molecular descriptor sets of the 20 amino acids for generating scales-based descriptors
| Dataset name | Dataset description | Dimensionality | Calculated by | 
|---|---|---|---|
| OptAA3d | Optimized 20 amino acids | – | MOE | 
| AA2DACOR | 2D autocorrelations descriptors | 92 | Dragon | 
| AA3DMoRSE | 3D-MoRSE descriptors | 160 | Dragon | 
| AAACF | Atom-centred fragments descriptors | 6 | Dragon | 
| AABurden | Burden Eigenvalues descriptors | 62 | Dragon | 
| AAConn | Connectivity indices descriptors | 33 | Dragon | 
| AAConst | Constitutional descriptors | 23 | Dragon | 
| AAEdgeAdj | Edge adjacency indices descriptors | 97 | Dragon | 
| AAEigIdx | Eigenvalue-based indices descriptors | 44 | Dragon | 
| AAFGC | Functional group counts descriptors | 5 | Dragon | 
| AAGeom | Geometrical descriptors | 41 | Dragon | 
| AAGETAWAY | GETAWAY descriptors | 194 | Dragon | 
| AAInfo | Information indices descriptors | 47 | Dragon | 
| AAMolProp | Molecular properties descriptors | 12 | Dragon | 
| AARandic | Randic molecular profiles descriptors | 41 | Dragon | 
| AARDF | RDF descriptors | 82 | Dragon | 
| AATopo | Topological descriptors | 78 | Dragon | 
| AATopoChg | Topological charge indices descriptors | 15 | Dragon | 
| AAWalk | Walk and path counts descriptors | 40 | Dragon | 
| AAWHIM | WHIM descriptors | 99 | Dragon | 
| AACPSA | CPSA descriptors | 41 | Accelrys Discovery Studio | 
| AADescAll | All the 2D descriptors calculated by Dragon | 1171 | Dragon | 
| AAMOE2D | All the 2D descriptors calculated by MOE | 148 | MOE | 
| AAMOE3D | All the 3D descriptors calculated by MOE | 143 | MOE | 
| AABLOSUM45 | BLOSUM45 matrix for 20 amino acids | Biostrings | |
| AABLOSUM50 | BLOSUM50 matrix for 20 amino acids | Biostrings | |
| AABLOSUM62 | BLOSUM62 matrix for 20 amino acids | Biostrings | |
| AABLOSUM80 | BLOSUM80 matrix for 20 amino acids | Biostrings | |
| AABLOSUM100 | BLOSUM100 matrix for 20 amino acids | Biostrings | |
| AAPAM30 | PAM30 matrix for 20 amino acids | Biostrings | |
| AAPAM40 | PAM40 matrix for 20 amino acids | Biostrings | |
| AAPAM70 | PAM70 matrix for 20 amino acids | Biostrings | |
| AAPAM120 | PAM120 matrix for 20 amino acids | Biostrings | |
| AAPAM250 | PAM250 matrix for 20 amino acids | Biostrings | 
Note: non-informative descriptors (e.g. descriptors with only one value across all the 20 amino acids) in these datasets have been filtered out.
Molecular descriptors
| Function name | Descriptor name | 
|---|---|
| extractDrugAIO() | All the molecular descriptors in the package | 
| extractDrugALOGP() | Atom additive logP and molar refractivity values descriptor | 
| extractDrugAminoAcidCount() | Number of amino acids | 
| extractDrugApol() | Sum of the atomic polarizabilities | 
| extractDrugAromaticAtomsCount() | Number of aromatic atoms | 
| extractDrugAromaticBondsCount() | Number of aromatic bonds | 
| extractDrugAtomCount() | Number of atom descriptor | 
| extractDrugAutocorrelationCharge() | Moreau-Broto autocorrelation descriptors using partial charges | 
| extractDrugAutocorrelationMass() | Moreau-Broto autocorrelation descriptors using atomic weight | 
| extractDrugAutocorrelationPolarizability() | Moreau-Broto autocorrelation descriptors using polarizability | 
| extractDrugBCUT() | BCUT, the eigenvalue based descriptor | 
| extractDrugBondCount() | Number of bonds of a certain bond order | 
| extractDrugBPol() | Sum of the absolute value of the difference between atomic polarizabilities of all bonded atoms in the molecule | 
| extractDrugCarbonTypes() | Topological descriptor characterizing the carbon connectivity in terms of hybridization | 
| extractDrugChiChain() | Kier & Hall Chi chain indices of orders 3, 4, 5, 6 and 7 | 
| extractDrugChiCluster() | Kier & Hall Chi cluster indices of orders 3, 4, 5 and 6 | 
| extractDrugChiPath() | Kier & Hall Chi path indices of orders 0 to 7 | 
| extractDrugChiPathCluster() | Kier & Hall Chi path cluster indices of orders 4, 5 and 6 | 
| extractDrugCPSA() | Descriptors combining surface area and partial charge information | 
| extractDrugDescOB() | Molecular descriptors provided by OpenBabel | 
| extractDrugECI() | Eccentric connectivity index descriptor | 
| extractDrugFMF() | FMF descriptor | 
| extractDrugFragmentComplexity() | Complexity of a system | 
| extractDrugGravitationalIndex() | Mass distribution of the molecule | 
| extractDrugHBondAcceptorCount() | Number of hydrogen bond acceptors | 
| extractDrugHBondDonorCount() | Number of hydrogen bond donors | 
| extractDrugHybridizationRatio() | Molecular complexity in terms of carbon hybridization states | 
| extractDrugIPMolecularLearning() | Ionization potential | 
| extractDrugKappaShapeIndices() | Kier & Hall Kappa molecular shape indices | 
| extractDrugKierHallSmarts() | Number of occurrences of the E-State fragments | 
| extractDrugLargestChain() | Number of atoms in the largest chain | 
| extractDrugLargestPiSystem() | Number of atoms in the largest Pi chain | 
| extractDrugLengthOverBreadth() | Ratio of length to breadth descriptor | 
| extractDrugLongestAliphaticChain() | Number of atoms in the longest aliphatic chain | 
| extractDrugMannholdLogP() | LogP based on the number of carbons and hetero atoms | 
| extractDrugMDE() | Molecular Distance Edge (MDE) descriptors for C, N and O | 
| extractDrugMomentOfInertia() | Principal moments of inertia and ratios of the principal moments | 
| extractDrugPetitjeanNumber() | Petitjean number of a molecule | 
| extractDrugPetitjeanShapeIndex() | Petitjean shape indices | 
| extractDrugRotatableBondsCount() | Number of non-rotatable bonds on a molecule | 
| extractDrugRuleOfFive() | Number failures of the Lipinski’s Rule Of Five | 
| extractDrugTPSA() | Topological Polar Surface Area (TPSA) | 
| extractDrugVABC() | Volume of a molecule | 
| extractDrugVAdjMa() | Vertex adjacency information of a molecule | 
| extractDrugWeight() | Total weight of atoms | 
| extractDrugWeightedPath() | Weighted path (Molecular ID) | 
| extractDrugWHIM() | Holistic descriptors described by Todeschini et al. | 
| extractDrugWienerNumbers() | Wiener path number and wiener polarity number | 
| extractDrugXLogP() | Prediction of logP based on the atom-type method called XLogP | 
| extractDrugZagrebIndex() | Sum of the squared atom degrees of all heavy atoms | 
Molecular fingerprints
| Function name | Fingerprint type | 
|---|---|
| extractDrugStandard() | Standard molecular fingerprints (in compact format) | 
| extractDrugStandardComplete() | Standard molecular fingerprints (in complete format) | 
| extractDrugExtended() | Extended molecular fingerprints (in compact format) | 
| extractDrugExtendedComplete() | Extended molecular fingerprints (in complete format) | 
| extractDrugGraph() | Graph molecular fingerprints (in compact format) | 
| extractDrugGraphComplete() | Graph molecular fingerprints (in complete format) | 
| extractDrugHybridization() | Hybridization molecular fingerprints (in compact format) | 
| extractDrugHybridizationComplete() | Hybridization molecular fingerprints (in complete format) | 
| extractDrugMACCS() | MACCS molecular fingerprints (in compact format) | 
| extractDrugMACCSComplete() | MACCS molecular fingerprints (in complete format) | 
| extractDrugEstate() | E-State molecular fingerprints (in compact format) | 
| extractDrugEstateComplete() | E-State molecular fingerprints (in complete format) | 
| extractDrugPubChem() | PubChem molecular fingerprints (in compact format) | 
| extractDrugPubChemComplete() | PubChem molecular fingerprints (in complete format) | 
| extractDrugKR() | KR (Klekota and Roth) molecular fingerprints (in compact format) | 
| extractDrugKRComplete() | KR (Klekota and Roth) molecular fingerprints (in complete format) | 
| extractDrugShortestPath() | Shortest Path molecular fingerprints (in compact format) | 
| extractDrugShortestPathComplete() | Shortest Path molecular fingerprints (in complete format) | 
| extractDrugOBFP2() | FP2 molecular fingerprints | 
| extractDrugOBFP3() | FP3 molecular fingerprints | 
| extractDrugOBFP4() | FP4 molecular fingerprints | 
| extractDrugOBMACCS() | MACCS molecular fingerprints | 
Protein-protein and compound-protein interation descriptors
| Function name | Function description | 
|---|---|
| getPPI() | Generating protein-protein interaction descriptors | 
| getCPI() | Generating compound-protein interaction descriptors | 
Similarity and similarity searching
| Function name | Function description | 
|---|---|
| calcDrugFPSim() | Calculate drug molecule similarity derived by molecular fingerprints | 
| calcDrugMCSSim() | Calculate drug molecule similarity derived by maximum common substructure search | 
| searchDrug() | Parallelized drug molecule similarity search by molecular fingerprints similarity or maximum common substructure search | 
| calcTwoProtSeqSim() | Similarity calculation based on sequence alignment for a pair of protein sequences | 
| calcParProtSeqSim() | Parallellized protein sequence similarity calculation based on sequence alignment | 
| calcTwoProtGOSim() | Similarity calculation based on Gene Ontology (GO) similarity between two proteins | 
| calcParProtGOSim() | Protein similarity calculation based on Gene Ontology (GO) similarity | 
Protein sequence data manipulation
| Function name | Function description | 
|---|---|
| readFASTA() | Read protein sequences in FASTA format | 
| readPDB() | Read protein sequences in PDB format | 
| segProt() | Protein sequence segmentation | 
| checkProt() | Check if the protein sequence’s amino acid types are the 20 default types | 
Molecular data manipulation
| Function name | Function description | 
|---|---|
| readMolFromSDF() | Read molecules from SDF files and return parsed Java molecular object | 
| readMolFromSmi() | Read molecules from SMILES files and return parsed Java molecular object or plain text list | 
| convMolFormat() | Chemical file formats conversion |