What's new in the RDKit since the 2012 UGM?

Greg Landrum

2013 RDKit UGM

General

  • Move to github
  • New contributors

Contrib area

New feature highlights

  • C# wrapper (2012.12)
  • "protected" atoms in reactions (2012.12)
  • rdkit.ML.Scoring.Scoring module, includes ROC, AUC, RIE, BEDROC, and EF (2012.12)
  • Pandas integration (2013.03)
  • chirality fixes for reactions (2013.03)
  • chirality in substructure matching (2013.03)
  • fmcs threshold (2013.06)
  • mmff94 (2013.09)
  • initial protein support (2013.09)
  • similarity maps (2013.09)
In [1]:
from rdkit import Chem
from rdkit.Chem import AllChem,Draw,Descriptors
from rdkit.Chem.Draw import IPythonConsole

Chirality in substructure matching and reactions

Note that throughout in this section I am somewhat lazy with terminology: "Chiral" here refers to chiral centers in molecules, not bulk properties of collections of molecules.

In [2]:
p = Chem.MolFromSmarts('CC(F)Cl')
p1 = Chem.MolFromSmarts('C[[email protected]](F)Cl')
p2 = Chem.MolFromSmarts('C[[email protected]@H](F)Cl')
m = Chem.MolFromSmiles('CCC(F)Cl')
m1 = Chem.MolFromSmiles('CC[[email protected]](F)Cl')
m2 = Chem.MolFromSmiles('CC[[email protected]@H](F)Cl')

The default behavior is the same as in previous versions of the RDKit: substructure matching ignores stereochemistry

In [3]:
print 'm :',m.HasSubstructMatch(p),m.HasSubstructMatch(p1),m.HasSubstructMatch(p2)
print 'm1:',m1.HasSubstructMatch(p),m1.HasSubstructMatch(p1),m1.HasSubstructMatch(p2)
print 'm2:',m2.HasSubstructMatch(p),m2.HasSubstructMatch(p1),m2.HasSubstructMatch(p2)
m : True True True
m1: True True True
m2: True True True

This can be changed with the useChirality option:

In [4]:
print 'm :',m.HasSubstructMatch(p,useChirality=True),m.HasSubstructMatch(p1,useChirality=True),m.HasSubstructMatch(p2,useChirality=True)
print 'm1:',m1.HasSubstructMatch(p,useChirality=True),m1.HasSubstructMatch(p1,useChirality=True),m1.HasSubstructMatch(p2,useChirality=True)
print 'm2:',m2.HasSubstructMatch(p,useChirality=True),m2.HasSubstructMatch(p1,useChirality=True),m2.HasSubstructMatch(p2,useChirality=True)
m : True False False
m1: True True False
m2: True False True

Note how this works: chiral queries only match when atoms with the appropriate chirality are present, but non-chiral queries match everywhere.

Including the explicit H in the query is not necessary as long as the other three atoms are present:

In [5]:
p3 = Chem.MolFromSmarts('C[[email protected]](F)Cl')
p4 = Chem.MolFromSmarts('C[[email protected]@](F)Cl')
In [6]:
print 'm1:',m1.HasSubstructMatch(p3,useChirality=True),m1.HasSubstructMatch(p4,useChirality=True)
print 'm2:',m2.HasSubstructMatch(p3,useChirality=True),m2.HasSubstructMatch(p4,useChirality=True)
m1: True False
m2: False True

Let's move on to reactions

In [7]:
rxn = AllChem.ReactionFromSmarts("[C:1][[email protected]:2]1[C:3][C:4][N:5]([C:6])[C:7](=[O:8])[[email protected]:9]1[C:10]>>[C:1][[email protected]:2]1[C:3][C:4][N:5]([C:6])[C:7](=[O:8])[[email protected]@H:9]1[C:10]")
rxn
Out[7]:

In this reaction, the stereochemistry (if any) at C9 will be inverted, while the stereochemistry (if any) at C2 will be retained.

In [8]:
m = Chem.MolFromSmiles('C[[email protected]]1CCN(C)C(=O)[[email protected]]1CC')
p = rxn.RunReactants((m,))[0][0]
Chem.SanitizeMol(p)
Draw.MolsToGridImage((m,p))
Out[8]:

The stereochemistry info is not used in the substructure matching:

In [9]:
m = Chem.MolFromSmiles('C[[email protected]@H]1CCN(C)C(=O)[[email protected]@H]1CC')
p = rxn.RunReactants((m,))[0][0]
Chem.SanitizeMol(p)
Draw.MolsToGridImage((m,p))
Out[9]:

The reaction still applies if stereochemistry info is missing from the reactants:

In [10]:
m = Chem.MolFromSmiles('CC1CCN(C)C(=O)[[email protected]@H]1CC')
p = rxn.RunReactants((m,))[0][0]
Chem.SanitizeMol(p)
Draw.MolsToGridImage((m,p))
Out[10]:
In [11]:
m = Chem.MolFromSmiles('C[[email protected]]1CCN(C)C(=O)C1CC')
p = rxn.RunReactants((m,))[0][0]
Chem.SanitizeMol(p)
Draw.MolsToGridImage((m,p))
Out[11]:

Specific cases for stereochemistry in reactions:

Stereo labels match -> retention of stereochemistry:

In [12]:
rxn = AllChem.ReactionFromSmarts("[F:1][[email protected]:2]([C:3])[I:4]>>[F:1][[email protected]:2]([C:3])[Cl:4]")
In [13]:
m = Chem.MolFromSmiles('F[[email protected]](CC)I')
p = rxn.RunReactants((m,))[0][0]
Chem.SanitizeMol(p)
Draw.MolsToGridImage((m,p))
Out[13]:
In [14]:
m = Chem.MolFromSmiles('F[[email protected]@H](CC)I')
p = rxn.RunReactants((m,))[0][0]
Chem.SanitizeMol(p)
Draw.MolsToGridImage((m,p))
Out[14]:
In [15]:
m = Chem.MolFromSmiles('FC(CC)I')
p = rxn.RunReactants((m,))[0][0]
Chem.SanitizeMol(p)
Draw.MolsToGridImage((m,p))
Out[15]:

Stereo labels change -> Inversion

In [16]:
rxn = AllChem.ReactionFromSmarts("[F:1][[email protected]:2]([C:3])[I:4]>>[F:1][[email protected]@H:2]([C:3])[Cl:4]")
In [17]:
m = Chem.MolFromSmiles('F[[email protected]](CC)I')
p = rxn.RunReactants((m,))[0][0]
Chem.SanitizeMol(p)
Draw.MolsToGridImage((m,p))
Out[17]:
In [18]:
m = Chem.MolFromSmiles('F[[email protected]@H](CC)I')
p = rxn.RunReactants((m,))[0][0]
Chem.SanitizeMol(p)
Draw.MolsToGridImage((m,p))
Out[18]:
In [19]:
m = Chem.MolFromSmiles('FC(CC)I')
p = rxn.RunReactants((m,))[0][0]
Chem.SanitizeMol(p)
Draw.MolsToGridImage((m,p))
Out[19]:

Stereo label removed -> removal of stereochemistry:

In [20]:
rxn = AllChem.ReactionFromSmarts("[F:1][[email protected]:2]([C:3])[I:4]>>[F:1][C:2]([C:3])[Cl:4]")
In [21]:
m = Chem.MolFromSmiles('F[[email protected]](CC)I')
p = rxn.RunReactants((m,))[0][0]
Chem.SanitizeMol(p)
Draw.MolsToGridImage((m,p))
Out[21]:
In [22]:
m = Chem.MolFromSmiles('F[[email protected]@H](CC)I')
p = rxn.RunReactants((m,))[0][0]
Chem.SanitizeMol(p)
Draw.MolsToGridImage((m,p))
Out[22]:
In [23]:
m = Chem.MolFromSmiles('FC(CC)I')
p = rxn.RunReactants((m,))[0][0]
Chem.SanitizeMol(p)
Draw.MolsToGridImage((m,p))
Out[23]:

Stereo label appears -> creation/setting of stereochemistry:

In [24]:
rxn = AllChem.ReactionFromSmarts("[F:1][C:2]([C:3])[I:4]>>[F:1][[email protected]:2]([C:3])[Cl:4]")
In [25]:
m = Chem.MolFromSmiles('FC(CC)I')
p = rxn.RunReactants((m,))[0][0]
Chem.SanitizeMol(p)
Draw.MolsToGridImage((m,p))
Out[25]:
In [26]:
m = Chem.MolFromSmiles('F[[email protected]@H](CC)I')
p = rxn.RunReactants((m,))[0][0]
Chem.SanitizeMol(p)
Draw.MolsToGridImage((m,p))
Out[26]:
In [27]:
m = Chem.MolFromSmiles('F[[email protected]](CC)I')
p = rxn.RunReactants((m,))[0][0]
Chem.SanitizeMol(p)
Draw.MolsToGridImage((m,p))
Out[27]:

Some notes

It's not purely the stereo label that's use in determining whether or not to invert:

In [28]:
rxn = AllChem.ReactionFromSmarts("[F:1][[email protected]:2]([C:3])[I:4]>>[F:1][[email protected]:2]([Cl:4])[C:3]")
In [29]:
m = Chem.MolFromSmiles('F[[email protected]](CC)I')
p = rxn.RunReactants((m,))[0][0]
Chem.SanitizeMol(p)
Draw.MolsToGridImage((m,p))
Out[29]:
In [29]:
 
In [29]:
 

Protected atoms in reactions

Can be used to make up for deficiencies in generic reactions or to simulate protecting groups without requiring additional steps.

Demonstrate this with a simple amide-bond formation definition:

In [30]:
rxn = AllChem.ReactionFromSmarts('[C:1](=[O:2])-[OH].[N;!H0:3]>>[C:1](=[O:2])-[N:3]')
rxn
Out[30]:

Here are the reactants, including a building block with two amines:

In [31]:
acid = Chem.MolFromSmiles('c1ccccc1C(=O)O')
acid
Out[31]:
In [32]:
amine = Chem.MolFromSmiles('NCCc1ccccc1N')
amine
Out[32]:

By default this generates two products:

In [33]:
ps = [x[0] for x in rxn.RunReactants((acid,amine))]
[Chem.SanitizeMol(x) for x in ps]
Draw.MolsToGridImage(ps)
Out[33]:

But we can protect one of the amines by setting the "_protected" atom property :

In [34]:
amine.GetAtomWithIdx(0).SetProp("_protected","1")

ps = [x[0] for x in rxn.RunReactants((acid,amine))]
[Chem.SanitizeMol(x) for x in ps]
Draw.MolsToGridImage(ps)
Out[34]:

Clearing the property allows that atom to react again:

In [35]:
amine.GetAtomWithIdx(0).ClearProp("_protected")

ps = [x[0] for x in rxn.RunReactants((acid,amine))]
[Chem.SanitizeMol(x) for x in ps]
Draw.MolsToGridImage(ps)
Out[35]:
In [35]:
 

Pandas integration

Contribution from Nikolas Fechner.

Much more on this coming during Niko's session

In [36]:
import pandas as pd
from rdkit import Chem,rdBase
from rdkit.Chem import PandasTools
In [37]:
data =PandasTools.LoadSDF('data/d3_aid563770.sdf')
data.head()
Out[37]:
ID SMILES ROMol
0 498399 COc1ccccc1N1CCN(CCCN2CCc3ccccc3C2=O)CC1 Mol
1 498400 COc1ccc(N2CCN(CCCN3CCc4ccccc4C3=O)CC2)cc1 Mol
2 498401 O=C1c2ccccc2CCN1CCCN1CCN(c2ccccn2)CC1 Mol
3 498402 O=C1c2ccccc2CCN1CCCN1CCN(c2ncccn2)CC1 Mol
4 498475 O=C1c2ccccc2CCN1CCCN1CCN(c2cccc(C(F)(F)F)c2)CC1 Mol

We can do things like substructure searches:

In [38]:
query = Chem.MolFromSmiles('c1ncccc1')
subset = data[data.ROMol >= query]
subset
Out[38]:
ID SMILES ROMol
2 498401 O=C1c2ccccc2CCN1CCCN1CCN(c2ccccn2)CC1 Mol
8 498479 O=C1c2ccccc2CCCN1CCCN1CCN(c2ccccn2)CC1 Mol
15 497391 O=C1c2ccccc2CCN1CCCCN1CCN(c2ccccn2)CC1 Mol
22 497602 O=C1c2ccccc2CCCN1CCCCN1CCN(c2ccccn2)CC1 Mol

Add descriptors:

In [39]:
data['slogp']=data.ROMol.map(Descriptors.MolLogP)
data['amw'] =data.ROMol.map(Descriptors.MolWt)
In [40]:
data.head()
Out[40]:
ID SMILES ROMol slogp amw
0 498399 COc1ccccc1N1CCN(CCCN2CCc3ccccc3C2=O)CC1 Mol 2.9058 379.504
1 498400 COc1ccc(N2CCN(CCCN3CCc4ccccc4C3=O)CC2)cc1 Mol 2.9058 379.504
2 498401 O=C1c2ccccc2CCN1CCCN1CCN(c2ccccn2)CC1 Mol 2.2922 350.466
3 498402 O=C1c2ccccc2CCN1CCCN1CCN(c2ncccn2)CC1 Mol 1.6872 351.454
4 498475 O=C1c2ccccc2CCN1CCCN1CCN(c2cccc(C(F)(F)F)c2)CC1 Mol 3.9160 417.475

And very easily do plots:

In [41]:
data.plot(x='amw',y='slogp',style='o')
Out[41]:
<matplotlib.axes.AxesSubplot at 0x109455c50>

fmcs threshold

The FMCS code now has an optional argument that causes it to search for the MCS that appears in at least a given fraction of the input molecules.

Here's an example using some molecules from a ChEMBL beta2 adrenergic data set.

In [42]:
from rdkit.Chem import MCS
In [43]:
ms = [x for x in Chem.SDMolSupplier('data/beta2_adrenergic_aid37833.sdf')]
Draw.MolsToGridImage(ms,molsPerRow=5,legends=[x.GetProp('_Name') for x in ms])
Out[43]:

Notice that these molecules share a common scaffold except for 5528, this leads to quite a small MCS:

In [44]:
mcs=MCS.FindMCS(ms,completeRingsOnly=True)
print mcs.smarts
mcsM = Chem.MolFromSmarts(mcs.smarts)
mcsM.UpdatePropertyCache()
Chem.SetHybridization(mcsM)
mcsM
[#7][email protected][#6]:1:[#6]:[#6]:[#6]:[#6]:[#6]:1
Out[44]:

Using the threshold argument allows the algorithm to find an MCS that is more representative:

In [45]:
mcs=MCS.FindMCS(ms,completeRingsOnly=True,threshold=0.8)
print mcs.smarts
mcsM = Chem.MolFromSmarts(mcs.smarts)
mcsM.UpdatePropertyCache()
Chem.SetHybridization(mcsM)
mcsM

Now we can render the molecules nicely aligned:

In [46]:
AllChem.Compute2DCoords(mcsM)
for m in ms:
    if m.HasSubstructMatch(mcsM):
        AllChem.GenerateDepictionMatching2DStructure(m,mcsM)
Draw.MolsToGridImage(ms,molsPerRow=5,legends=[x.GetProp('_Name') for x in ms])
Out[46]:
In [46]:
 

MMFF94 integration

Contribution from Paolo Tosco. More information coming in his presentation.

In [4]:
from rdkit.Chem import PyMol
v = PyMol.MolViewer()
In [5]:
m = Chem.MolFromSmiles('c1ccccc1C(=O)NCC')
mh = Chem.AddHs(m)
In [6]:
import time
AllChem.EmbedMolecule(mh)
v.ShowMol(mh)
v.GetPNG(preDelay=2)
Out[6]:
In [7]:
AllChem.MMFFOptimizeMolecule(mh)
v.ShowMol(mh)
v.GetPNG(preDelay=2)
Out[7]:

Getting information about the atom types and charges:

In [8]:
mp = AllChem.MMFFGetMoleculeProperties(mh)
for idx in range(mh.GetNumAtoms()):
    print idx,mh.GetAtomWithIdx(idx).GetSymbol(),mp.GetMMFFAtomType(idx),mp.GetMMFFPartialCharge(idx)
0 C 37 -0.15
1 C 37 -0.15
2 C 37 -0.15
3 C 37 -0.15
4 C 37 -0.15
5 C 37 0.0862
6 C 3 0.5438
7 O 7 -0.57
8 N 10 -0.7301
9 C 1 0.3001
10 C 1 0.0
11 H 5 0.15
12 H 5 0.15
13 H 5 0.15
14 H 5 0.15
15 H 5 0.15
16 H 28 0.37
17 H 5 0.0
18 H 5 0.0
19 H 5 0.0
20 H 5 0.0
21 H 5 0.0

Similarity maps

Contribution from Sereina Riniker

Riniker, S. & Landrum, G. A. "Similarity maps - a visualization strategy for molecular fingerprints and machine-learning methods." J Cheminf (2013). http://www.jcheminf.com/content/5/1/43

In [54]:
from rdkit.Chem.Draw import SimilarityMaps
from rdkit.Chem import rdMolDescriptors
from rdkit import DataStructs
In [55]:
fp1 = rdMolDescriptors.GetTopologicalTorsionFingerprint(ms[0])
fp2 = rdMolDescriptors.GetTopologicalTorsionFingerprint(ms[16])
print DataStructs.DiceSimilarity(fp1,fp2)
Draw.MolsToGridImage((ms[0],ms[16]))
0.388059701493
Out[55]:
In [56]:
SimilarityMaps.GetSimilarityMapForFingerprint(ms[0],ms[16],SimilarityMaps.GetTTFingerprint)
Out[56]:
(<matplotlib.figure.Figure at 0x109786850>, 0.31663113006396593)
In [57]:
fp1 = rdMolDescriptors.GetAtomPairFingerprint(ms[0])
fp2 = rdMolDescriptors.GetAtomPairFingerprint(ms[16])
print DataStructs.DiceSimilarity(fp1,fp2)
SimilarityMaps.GetSimilarityMapForFingerprint(ms[0],ms[16],SimilarityMaps.GetAPFingerprint)
0.338912133891
Out[57]:
(<matplotlib.figure.Figure at 0x10a8dd2d0>, 0.043901287904228592)
In [58]:
fp1 = rdMolDescriptors.GetMorganFingerprint(ms[0],2)
fp2 = rdMolDescriptors.GetMorganFingerprint(ms[16],2)
print DataStructs.DiceSimilarity(fp1,fp2)
SimilarityMaps.GetSimilarityMapForFingerprint(ms[0],ms[16],SimilarityMaps.GetMorganFingerprint)
0.455284552846
Out[58]:
(<matplotlib.figure.Figure at 0x10a912650>, 0.17499999999999999)