Parallelized Drug Molecule Similarity Search by Molecular Fingerprints Similarity or Maximum Common Substructure Search
Source: R/705-searchDrug.R, searchDrug.Rd
Usage
searchDrug(
  mol,
  moldb,
  cores = 2,
  method = c("fp", "mcs"),
  fptype = c("standard", "extended", "graph", "hybrid", "maccs", "estate", "pubchem",
    "kr", "shortestpath", "fp2", "fp3", "fp4", "obmaccs"),
  fpsim = c("tanimoto", "euclidean", "cosine", "dice", "hamming"),
  mcssim = c("tanimoto", "overlap"),
  ...
)
Arguments
- mol
The query molecule. The location of an sdf file containing one molecule.
- moldb
The molecule database. The location of an sdf file containing all the molecules to be searched.
- cores
Integer. The number of CPU cores to use for the parallel search. The default is 2. Use the detectCores() function in the parallel package to see how many cores are available.
- method
'fp' or 'mcs'. Search by molecular fingerprints or by maximum common substructure.
- fptype
The fingerprint type, only used when method = 'fp'. Rcpi supports 13 types of fingerprints: 'standard', 'extended', 'graph', 'hybrid', 'maccs', 'estate', 'pubchem', 'kr', 'shortestpath', 'fp2', 'fp3', 'fp4', and 'obmaccs'.
- fpsim
The similarity measure for fingerprints, only used when method = 'fp'. One of 'tanimoto', 'euclidean', 'cosine', 'dice', and 'hamming'. See calcDrugFPSim for details.
- mcssim
The similarity measure for maximum common substructure search, only used when method = 'mcs'. One of 'tanimoto' and 'overlap'.
- ...
Other parameters for maximum common substructure search. See calcDrugMCSSim for available options.
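As a minimal sketch of how a value for cores might be chosen, the snippet below asks the parallel package how many cores are present and keeps one free. Leaving one core free is an assumption made here for illustration, not a recommendation from Rcpi.

```r
library(parallel)

# detectCores() can return NA on some platforms, so fall back to a single core
available <- detectCores()
if (is.na(available)) available <- 1L

# Illustrative policy (an assumption): leave one core free, but never go below one
cores <- max(1L, available - 1L)
```

The resulting value could then be passed as the cores argument of searchDrug().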
Value
A named numeric vector containing the similarity values of the molecules in the database, sorted in decreasing order.
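Because the result is a named numeric vector in decreasing order, ordinary vector operations are enough to pick out the best matches. The sketch below uses a made-up vector in place of real searchDrug() output. The names and values are placeholders, not results from Rcpi.

```r
# Hypothetical stand-in for a searchDrug() result: names and values are placeholders
sim <- c(cmpd_a = 0.91, cmpd_b = 0.74, cmpd_c = 0.52, cmpd_d = 0.18)

# The vector is already sorted, so the first element is the closest match
best <- names(sim)[1]

# Keep the two highest-scoring entries
top2 <- head(sim, 2)

# Keep every entry at or above an illustrative similarity cutoff
hits <- sim[sim >= 0.7]
```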
Details
This function runs a compound similarity search for one query compound against a set of molecules. Similarity is computed either from molecular fingerprints, using one of several fingerprint types and similarity measures, or by maximum common substructure search.
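To make the fingerprint-based search concrete, the following sketch ranks a tiny hand-written database against a query using the Tanimoto coefficient on binary fingerprints. It is an illustration in base R, not the Rcpi implementation: the helper tanimoto() and the bit vectors are hypothetical, and the loop runs sequentially rather than in parallel.

```r
# Hypothetical helper (not part of Rcpi): Tanimoto coefficient of two bit vectors,
# defined as the number of shared on-bits divided by the number of bits set in either
tanimoto <- function(a, b) {
  both <- sum(a & b)
  either <- sum(a | b)
  if (either == 0) return(0)
  both / either
}

# Made-up 8-bit fingerprints for a query and three database molecules
query <- c(1, 1, 0, 1, 0, 0, 1, 0)
db <- list(
  mol1 = c(1, 1, 0, 1, 0, 0, 1, 0),
  mol2 = c(1, 0, 0, 1, 0, 1, 1, 0),
  mol3 = c(0, 0, 1, 0, 1, 0, 0, 1)
)

# Score every database molecule against the query, then sort in decreasing order
sims <- vapply(db, tanimoto, numeric(1), b = query)
ranked <- sort(sims, decreasing = TRUE)
```

Here ranked is a named numeric vector with mol1 first (identical to the query) and mol3 last (no bits in common), the same shape as the value described above.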
Examples
mol = system.file('compseq/DB00530.sdf', package = 'Rcpi')
# DrugBank ID DB00530: Erlotinib
moldb = system.file('compseq/tyrphostin.sdf', package = 'Rcpi')
# Database built by searching PubChem for 'tyrphostin', filtered by Lipinski's Rule of Five
# \donttest{
searchDrug(mol, moldb, cores = 4, method = 'fp', fptype = 'maccs', fpsim = 'hamming')
#> Error in loadMolecules(mol): The package "rcdk" is required to load molecular structures
searchDrug(mol, moldb, cores = 4, method = 'fp', fptype = 'fp2', fpsim = 'tanimoto')
#> Error: Must install the `ChemmineOB` package first.
searchDrug(mol, moldb, cores = 4, method = 'mcs', mcssim = 'tanimoto')
# }
#> Error: Must install the `ChemmineR` package to use the 'mcs' method.
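The errors in the example output indicate that the optional packages rcdk, ChemmineOB, and ChemmineR were not installed where the examples were run. A minimal sketch of checking for them before calling searchDrug() is shown below. The mapping from each call to a package is inferred from those error messages only, and the labels in needed are hypothetical.

```r
# Optional packages named in the error messages above; the labels are illustrative
needed <- c(maccs_example = "rcdk", fp2_example = "ChemmineOB", mcs_example = "ChemmineR")

# requireNamespace() returns TRUE or FALSE without attaching the package
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)

# Report anything that is missing instead of failing inside searchDrug()
missing_pkgs <- needed[!installed]
if (length(missing_pkgs) > 0) {
  message("Missing optional packages: ", paste(missing_pkgs, collapse = ", "))
}
```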