Introduction

The tidychem package offers a lightweight R interface to RDKit through the RDKit Python API.

Load, parse, and write chemical data

Chemical data format intro: SMI (SMILES strings, one molecule per line) and SDF/MOL (atom and bond tables with coordinates).
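
As a rough illustration of the SMI format, the base R sketch below writes a small file with one molecule per line (a SMILES string followed by a name) and reads it back as a table. The file contents and the column names are made up for the example, and no tidychem functions are involved.

```r
# Toy SMI content: one molecule per line, SMILES string then a name
smi_lines <- c(
  "CCO ethanol",
  "c1ccccc1 benzene",
  "CC(=O)O acetic_acid"
)

smi_path <- tempfile(fileext = ".smi")
writeLines(smi_lines, smi_path)

# Read it back as a two-column table (column names are illustrative).
# comment.char = "" matters because "#" is a valid SMILES character (triple bond).
smi_tbl <- read.table(
  smi_path,
  col.names = c("smiles", "name"),
  stringsAsFactors = FALSE,
  comment.char = "",
  quote = ""
)
smi_tbl
```

An SDF/MOL file, by contrast, stores each molecule as a multi-line record with explicit atom coordinates and a bond table, so it is not practical to read it line by line like this.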

Reading. Parsing (error handling example: a failed parse will be NULL). Writing.
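
The error-handling case can be sketched without the package. The list below is a hypothetical stand-in for parser output, under the assumption that a failed parse shows up as a NULL element; the actual return type of parse_smiles() may differ.

```r
# Hypothetical stand-in for parser output: the second input failed to parse
smiles <- c("CCO", "not-a-smiles", "c1ccccc1")
parsed <- list("mol:CCO", NULL, "mol:c1ccccc1")

# Flag the failures so they can be reported
failed <- vapply(parsed, is.null, logical(1))
smiles[failed]

# Drop failures from both vectors so inputs and results stay aligned
parsed_ok <- parsed[!failed]
smiles_ok <- smiles[!failed]
length(parsed_ok)
```

Filtering both the inputs and the results with the same logical index keeps any response vector that is subset the same way in step with the molecules.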

Calculate chemical fingerprints

mols <- "smi-multiple.smi" |>
  tidychem_example() |>
  read_smiles()

# ECFP4
mols |> fp_morgan()

# similarity
# mols |> fp_morgan() |> sim_tanimoto()

# matrix
mols |> fp_morgan(explicit = TRUE)
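
Since the similarity step is commented out above, here is a minimal base R sketch of Tanimoto similarity on an explicit fingerprint matrix. It assumes rows are molecules and columns are 0/1 bits, which may not match the layout returned by fp_morgan(explicit = TRUE). The helper name tanimoto_matrix() and the toy matrix are hypothetical.

```r
# Tanimoto similarity: shared on-bits / (on-bits in A + on-bits in B - shared)
tanimoto_matrix <- function(fp) {
  fp <- (as.matrix(fp) != 0) * 1            # coerce to a 0/1 numeric matrix
  shared <- tcrossprod(fp)                  # pairwise counts of shared on-bits
  on_bits <- rowSums(fp)                    # on-bits per molecule
  either <- outer(on_bits, on_bits, "+") - shared
  sim <- shared / either
  sim[either == 0] <- 0                     # convention chosen here for two empty fingerprints
  sim
}

# Toy fingerprints: three molecules, five bits
fp_toy <- rbind(
  a = c(1, 1, 0, 0, 1),
  b = c(1, 0, 0, 0, 1),
  c = c(0, 0, 1, 1, 0)
)
sim <- tanimoto_matrix(fp_toy)
sim
```

Molecules a and b share two of three distinct on-bits, so their similarity is 2/3, while a and c share none and score 0.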

Calculate chemical descriptors

2D descriptors and 3D descriptors.

The 3D descriptors follow a common workflow: 3D conformer -> descriptor…

If the molecules already have optimized 3D coordinates, load them directly with parse_sdf() or read_sdf(), then compute the 3D descriptors with the vanilla option.

library("readr") # provides read_tsv()

df <- "logd74.tsv" |>
  tidychem_example() |>
  read_tsv()
y <- df$logD7.4
mols <- df$SMILES |> parse_smiles()
mols

# matrix of 2D descriptors
x <- mols |> desc_2d()
x
# replace missing descriptor values with 0 before model fitting
x[is.na(x)] <- 0
library("glmnet")

cvfit <- cv.glmnet(x, y)
plot(cvfit)

fit <- glmnet(x, y)
plot(fit)
# exact = TRUE may refit the model, which requires the original x and y
head(coef(fit, s = cvfit$lambda.min, exact = TRUE, x = x, y = y), n = 30)
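
To make the last step easier to read, the sketch below runs the same cross-validated lasso on simulated data and keeps only the non-zero coefficients. The toy matrix, the desc_* column names, and the response are invented for illustration and stand in for the descriptor matrix and logD values above.

```r
library("glmnet")

# Simulated stand-in data: 100 "molecules", 20 "descriptors",
# with only the first two descriptors driving the response
set.seed(1)
x_toy <- matrix(rnorm(100 * 20), nrow = 100)
colnames(x_toy) <- paste0("desc_", 1:20)
y_toy <- 2 * x_toy[, 1] - x_toy[, 2] + rnorm(100, sd = 0.1)

cv_toy <- cv.glmnet(x_toy, y_toy)

# Coefficients at the lambda with the lowest cross-validated error,
# converted from a sparse matrix to a named numeric vector
beta <- as.matrix(coef(cv_toy, s = "lambda.min"))[, 1]
selected <- beta[beta != 0]
selected

# Training-set RMSE; this is optimistic, so use held-out data for a real estimate
pred <- predict(cv_toy, newx = x_toy, s = "lambda.min")
rmse <- sqrt(mean((y_toy - pred)^2))
rmse
```

Calling coef() on the cv.glmnet object with s = "lambda.min" avoids passing the lambda value around by hand, which is one alternative to the coef(fit, s = cvfit$lambda.min, ...) form used above.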