ChEMBL Structure Pipeline demo

Use it directly from the library

Standardize a molblock

Standardises chemical structures according to a set of predefined ChEMBL business rules

In [1]:
from chembl_structure_pipeline import standardizer

s_molblock = """
  Mrv1810 07121910172D          

  4  3  0  0  0  0            999 V2000
   -2.5038    0.4060    0.0000 C   0  0  3  0  0  0  0  0  0  0  0  0
   -2.5038    1.2310    0.0000 O   0  5  0  0  0  0  0  0  0  0  0  0
   -3.2182   -0.0065    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
   -1.7893   -0.0065    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  1  3  1  0  0  0  0
  1  4  1  4  0  0  0
M  CHG  2   2  -1   3   1
M  END
"""

standard_molblock = standardizer.standardize_molblock(s_molblock)
print(standard_molblock)
     RDKit          2D

  4  3  0  0  0  0  0  0  0  0999 V2000
   -2.5038    0.4060    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.5038    1.2310    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   -3.2182   -0.0065    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   -1.7893   -0.0065    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  1  3  1  0
  1  4  1  0
M  END

RDKit WARNING: [13:05:05] Enabling RDKit 2019.09.3 jupyter extensions
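
The most visible change in the output above is that the `M  CHG` line is gone. A minimal sketch for checking this programmatically follows; `formal_charges` is a hypothetical helper, not part of chembl_structure_pipeline, and it assumes the charge fields are separated by whitespace, as they are in these examples.

```python
def formal_charges(molblock):
    """Return {atom_index: charge} read from the 'M  CHG' lines of a V2000 molblock."""
    charges = {}
    for line in molblock.splitlines():
        if line.startswith("M  CHG"):
            fields = line.split()[2:]          # drop the 'M' and 'CHG' tokens
            count = int(fields[0])             # number of (atom, charge) pairs on this line
            pairs = fields[1:1 + 2 * count]
            for i in range(0, len(pairs), 2):
                charges[int(pairs[i])] = int(pairs[i + 1])
    return charges


# Property lines copied from the input and the standardized output above.
before = "M  CHG  2   2  -1   3   1\nM  END"
after = "M  END"

print(formal_charges(before))  # {2: -1, 3: 1}
print(formal_charges(after))   # {}
```

In the notebook the same helper can be called on `s_molblock` and `standard_molblock` directly.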

Get parent molblock

Generates parent structures of multi-component compounds based on a set of rules and a defined list of salts and solvents.

In [2]:
from chembl_structure_pipeline import standardizer

p_molblock = """
  Mrv1810 07121910262D          

  3  1  0  0  0  0            999 V2000
   -5.2331    1.1053    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.5186    1.5178    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
   -2.8647    1.5789    0.0000 Cl  0  5  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
M  CHG  2   2   1   3  -1
M  END
"""

parent_molblock, _ = standardizer.get_parent_molblock(p_molblock)
print(parent_molblock)
     RDKit          2D

  2  1  0  0  0  0  0  0  0  0999 V2000
   -5.2331    1.1053    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.5186    1.5178    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
M  END
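
One way to confirm that the chloride counter-ion was stripped is to compare the counts line of the two molblocks. The sketch below uses a hypothetical helper, `atom_bond_counts`, which relies on the V2000 convention that the atom and bond counts occupy the first two three-character fields of the line ending in `V2000`.

```python
def atom_bond_counts(molblock):
    """Return (n_atoms, n_bonds) from the counts line of a V2000 molblock."""
    for line in molblock.splitlines():
        if line.rstrip().endswith("V2000"):
            # Fixed-width fields: columns 1-3 are atoms, columns 4-6 are bonds.
            return int(line[0:3]), int(line[3:6])
    raise ValueError("no V2000 counts line found")


# Counts lines copied from the input and the parent output above.
salt_counts = "  3  1  0  0  0  0            999 V2000"
parent_counts = "  2  1  0  0  0  0  0  0  0  0999 V2000"

print(atom_bond_counts(salt_counts))    # (3, 1)
print(atom_bond_counts(parent_counts))  # (2, 1)
```

The function also accepts a full molblock such as `p_molblock` or `parent_molblock`, since it scans for the counts line itself.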

Check the molecule

Identifies and validates problem structures before they are added to the database

In [3]:
from chembl_structure_pipeline import checker

c_molblock = """ 
  Mrv1810 02151908462D           
 
  4  3  0  0  0  0            999 V2000 
    2.2321    4.4196    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0 
    3.0023    4.7153    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0 
    1.4117    4.5059    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0 
    1.9568    3.6420    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0 
  1  2  1  1  0  0  0 
  1  3  1  0  0  0  0 
  1  4  1  0  0  0  0 
M  END 
"""

issues = checker.check_molblock(c_molblock)
print(issues)
((5, 'InChi_RDKit/Mol stereo mismatch'),)
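
The result is a tuple of (number, message) pairs. Assuming the number is a penalty score where a higher value means a more serious problem, a small sketch for summarising the result could look like this; `max_penalty` is a hypothetical helper, not part of the library.

```python
def max_penalty(issues):
    """Return the highest penalty score among the issues, or 0 if there are none."""
    return max((score for score, _message in issues), default=0)


# The value printed by the cell above.
issues = ((5, 'InChi_RDKit/Mol stereo mismatch'),)

print(max_penalty(issues))  # 5
print(max_penalty(()))      # 0
```

A caller could compare this value against a threshold of its own choosing to decide whether a structure needs manual review.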

Use it from Beaker

  • Beaker can process a batch of molecules sent in a single request, but this is no longer recommended.

Standardize a molblock

Standardises chemical structures according to a set of predefined ChEMBL business rules.

In [4]:
import requests

res = requests.post('https://www.ebi.ac.uk/chembl/api/utils/standardize', data=s_molblock)
standard_molblock = res.json()[0]['standard_molblock']
print(standard_molblock)
     RDKit          2D

  4  3  0  0  0  0  0  0  0  0999 V2000
   -2.5038    0.4060    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.5038    1.2310    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   -3.2182   -0.0065    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   -1.7893   -0.0065    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  1  3  1  0
  1  4  1  0
M  END
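
The call above indexes straight into the response, which fails with an unhelpful error if the service is unreachable or returns an error status. A more defensive sketch follows; `beaker_first_result` is a hypothetical wrapper, and the 30-second timeout is an arbitrary assumption. The HTTP function is passed in as an argument so that the wrapper can be exercised without network access.

```python
def beaker_first_result(post, url, molblock, key, timeout=30):
    """POST one molblock and return field `key` of the first result.

    `post` is any callable with the calling convention of requests.post.
    """
    res = post(url, data=molblock, timeout=timeout)
    res.raise_for_status()      # raise on HTTP errors instead of parsing an error page
    results = res.json()
    if not results:
        raise ValueError("service returned no results")
    return results[0][key]


# Usage in the notebook (performs a real HTTP request):
#   import requests
#   standard_molblock = beaker_first_result(
#       requests.post,
#       'https://www.ebi.ac.uk/chembl/api/utils/standardize',
#       s_molblock,
#       'standard_molblock',
#   )
```

The same wrapper should work for the getParent endpoint with the key `'parent_molblock'`.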

Get parent molblock

Generates parent structures of multi-component compounds based on a set of rules and a defined list of salts and solvents.

In [5]:
import requests

res = requests.post('https://www.ebi.ac.uk/chembl/api/utils/getParent', data=p_molblock)
parent_molblock = res.json()[0]['parent_molblock']
print(parent_molblock)
     RDKit          2D

  2  1  0  0  0  0  0  0  0  0999 V2000
   -5.2331    1.1053    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.5186    1.5178    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
M  END

Check the molecule

Identifies and validates problem structures before they are added to the database

In [6]:
import requests

res = requests.post('https://www.ebi.ac.uk/chembl/api/utils/check', data=c_molblock)
issues = res.json()[0]
print(issues)
[[5, 'InChi_RDKit/Mol stereo mismatch']]
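
Because the response is decoded from JSON, the issues arrive as nested lists rather than the tuple of tuples returned by the library. If the two results need to be compared, a conversion along these lines should be enough; `as_issue_tuples` is a hypothetical helper.

```python
def as_issue_tuples(json_issues):
    """Convert [[score, message], ...] into ((score, message), ...)."""
    return tuple((score, message) for score, message in json_issues)


# Values printed by the Beaker cell and by the library cell.
from_beaker = [[5, 'InChi_RDKit/Mol stereo mismatch']]
from_library = ((5, 'InChi_RDKit/Mol stereo mismatch'),)

print(as_issue_tuples(from_beaker) == from_library)  # True
```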