Inspired by this RDKit-discuss thread started by Liz Wylie: http://www.mail-archive.com/[email protected]/msg03676.html

In [1]:
from rdkit import Chem
from rdkit.Chem import MCS
from rdkit.Chem.Draw import IPythonConsole

Start with two sets of simple molecules

In [2]:
smis = ['CC(C)=C','CC(C)C']
ms = [Chem.MolFromSmiles(x) for x in smis]
smis2 = ['CC(C)=C','CC(C)=N']
ms2 = [Chem.MolFromSmiles(x) for x in smis2]

default MCS behavior:

In [3]:
mcs=MCS.FindMCS(ms)
mcs.smarts
Out[3]:
'[#6]-[#6]-[#6]'
In [4]:
mcs=MCS.FindMCS(ms2)
mcs.smarts
Out[4]:
'[#6]-[#6]-[#6]'

define a simple hash that combines atomic num and hybridization

In [5]:
def label(a): 
    return 100*int(a.GetHybridization())+a.GetAtomicNum()

and now use that to label the atoms of the first set of molecules:

In [6]:
nms = [Chem.Mol(x) for x in ms]
for nm in nms:
    for at in nm.GetAtoms():
        at.SetIsotope(label(at))
In [8]:
Chem.MolToSmiles(nms[0],True)
Out[8]:
'[406CH3][306C]([406CH3])=[306CH2]'

If we now use "isotopes" as the MCS atom comparator, we get nothing for the first set of molecules:

In [9]:
mcs=MCS.FindMCS(nms,atomCompare='isotopes')
mcs.smarts

The second set returns a result:

In [10]:
nms2 = [Chem.Mol(x) for x in ms2]
for nm in nms2:
    for at in nm.GetAtoms():
        at.SetIsotope(label(at))
mcs=MCS.FindMCS(nms2,atomCompare='isotopes')
mcs.smarts
Out[10]:
'[406*]-[306*]-[406*]'

But that's kind of ugly... let's get the SMILES for it.

We start by finding the matching atoms in the first of the modified molecules:

In [11]:
mcsp = Chem.MolFromSmarts(mcs.smarts)
match = nms2[0].GetSubstructMatch(mcsp)
match
Out[11]:
(0, 1, 2)

And now generate smiles for the first unmodified molecule, but only including those atoms. We can do this because we know that the atom numbers didn't change.

In [12]:
print Chem.MolFragmentToSmiles(ms2[0],atomsToUse=match,isomericSmiles=True,canonical=False)
CCC

Ok, let's try it with Liz's test case:

In [13]:
smis=["COc1ccc(C(Nc2nc3c(ncn3COCC=O)c(=O)[nH]2)(c2ccccc2)c2ccccc2)cc1",
      "COc1ccc(C(Nc2nc3c(ncn3COC(CO)(CO)CO)c(=O)[nH]2)(c2ccccc2)c2ccccc2)cc1"]
ms = [Chem.MolFromSmiles(x) for x in smis]
In [14]:
ms[0]
Out[14]:
In [15]:
ms[1]
Out[15]:
In [16]:
nms = [Chem.Mol(x) for x in ms]
for nm in nms:
    for at in nm.GetAtoms():
        at.SetIsotope(label(at))
mcs=MCS.FindMCS(nms,atomCompare='isotopes')
mcs.smarts
Out[16]:
'[406*]-[308*]-[306*]:1:[306*]:[306*]:[306*](:[306*]:[306*]:1)-[406*](-[306*]:1:[306*]:[306*]:[306*]:[306*]:[306*]:1)(-[306*]:1:[306*]:[306*]:[306*]:[306*]:[306*]:1)-[307*]-[306*]:1:[307*]:[306*]:2:[306*](:[306*](:[307*]:1)=[308*]):[307*]:[306*]:[307*]:2-[406*]-[408*]-[406*]'
In [17]:
mcsp = Chem.MolFromSmarts(mcs.smarts)
match = nms[0].GetSubstructMatch(mcsp)
smi=Chem.MolFragmentToSmiles(ms[0],atomsToUse=match,isomericSmiles=True,canonical=False)
print smi
COc1ccc(C(Nc2nc3c(ncn3COC)c(=O)[nH]2)(c2ccccc2)c2ccccc2)cc1
In [18]:
core = Chem.MolFromSmiles(smi)
core
Out[18]: