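`standardizer.standardize_molblock` standardises chemical structures according to a set of predefined ChEMBL business rules. In the example below, the charged oxygen and nitrogen are neutralised and the wavy ("either") stereo bond between atoms 1 and 4 becomes a plain single bond.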
```python
from chembl_structure_pipeline import standardizer

s_molblock = """
  Mrv1810 07121910172D

  4  3  0  0  0  0            999 V2000
   -2.5038    0.4060    0.0000 C   0  0  3  0  0  0  0  0  0  0  0  0
   -2.5038    1.2310    0.0000 O   0  5  0  0  0  0  0  0  0  0  0  0
   -3.2182   -0.0065    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
   -1.7893   -0.0065    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  1  3  1  0  0  0  0
  1  4  1  4  0  0  0
M  CHG  2   2  -1   3   1
M  END
"""

standard_molblock = standardizer.standardize_molblock(s_molblock)
print(standard_molblock)
```
```

     RDKit          2D

  4  3  0  0  0  0  0  0  0  0999 V2000
   -2.5038    0.4060    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.5038    1.2310    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   -3.2182   -0.0065    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   -1.7893   -0.0065    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  1  3  1  0
  1  4  1  0
M  END
```
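Comparing the input and output molfiles field by field makes the effect of standardisation easier to see. The following is a minimal sketch using only the standard library; `charges_and_bond_stereo` is a hypothetical helper, not part of `chembl_structure_pipeline`, and it assumes a well-formed, fixed-width V2000 molblock whose string starts with the (empty) title line, as in the example above.

```python
def charges_and_bond_stereo(molblock):
    """Return ({atom_index: charge}, {(atom1, atom2): stereo_flag}) for a V2000 molblock.

    Hypothetical helper. In V2000, 'M  CHG' lines supersede the charge codes in
    the atom block, so only those lines are read; a file that encodes charges
    solely in the atom block will report no charges here.
    """
    lines = molblock.split("\n")
    # Lines 1-3 are the header block; line 4 is the counts line (3-character fields).
    counts = lines[3]
    n_atoms = int(counts[0:3])
    n_bonds = int(counts[3:6])
    bond_start = 4 + n_atoms

    # Bond lines: first atom, second atom, bond type, bond stereo (3 characters each).
    stereo = {}
    for line in lines[bond_start:bond_start + n_bonds]:
        a1, a2, flag = int(line[0:3]), int(line[3:6]), int(line[9:12])
        if flag:
            stereo[(a1, a2)] = flag

    # 'M  CHGnn8' followed by nn entries of 8 characters: atom index (4), charge (4).
    charges = {}
    for line in lines[bond_start + n_bonds:]:
        if line.startswith("M  CHG"):
            n_entries = int(line[6:9])
            for i in range(n_entries):
                start = 9 + 8 * i
                atom = int(line[start:start + 4])
                charges[atom] = int(line[start + 4:start + 8])
    return charges, stereo
```

Applied to `s_molblock`, this returns `({2: -1, 3: 1}, {(1, 4): 4})`; applied to the standardised output it returns `({}, {})`, confirming that both charges and the "either" bond flag were removed.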
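`standardizer.get_parent_molblock` generates parent structures of multi-component compounds based on a set of rules and a defined list of salts and solvents. It returns a tuple whose first element is the parent molblock; in this example the chloride counterion is removed and the remaining nitrogen is neutralised.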
```python
from chembl_structure_pipeline import standardizer

p_molblock = """
  Mrv1810 07121910262D

  3  1  0  0  0  0            999 V2000
   -5.2331    1.1053    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.5186    1.5178    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
   -2.8647    1.5789    0.0000 Cl  0  5  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
M  CHG  2   2   1   3  -1
M  END
"""

# get_parent_molblock returns a tuple; only the parent molblock is used here
parent_molblock, _ = standardizer.get_parent_molblock(p_molblock)
print(parent_molblock)
```
```

     RDKit          2D

  2  1  0  0  0  0  0  0  0  0999 V2000
   -5.2331    1.1053    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.5186    1.5178    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
M  END
```
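The core idea behind parent generation is to split a structure into its disconnected fragments and discard those recognised as salts or solvents. The sketch below illustrates only that idea with the standard library and is not the ChEMBL rule set: `SALT_SYMBOLS` is a hypothetical stand-in for the real salt and solvent list, fragments are matched by element symbol alone, none of the pipeline's additional rules are applied, and no neutralisation is attempted. All function names are hypothetical.

```python
# Hypothetical stand-in for ChEMBL's salt/solvent list: single-atom fragments only.
SALT_SYMBOLS = {"Cl", "Br", "I", "Na", "K"}


def read_atoms_and_bonds(molblock):
    """Return (symbols, bonds) from a fixed-width V2000 molblock (1-based atom indices)."""
    lines = molblock.split("\n")
    counts = lines[3]
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atom_lines = lines[4:4 + n_atoms]
    bond_lines = lines[4 + n_atoms:4 + n_atoms + n_bonds]
    # The atom symbol occupies columns 32-34 (1-based) of each atom line.
    symbols = {i + 1: line[31:34].strip() for i, line in enumerate(atom_lines)}
    bonds = [(int(line[0:3]), int(line[3:6])) for line in bond_lines]
    return symbols, bonds


def fragments(symbols, bonds):
    """Group atom indices into connected components using union-find."""
    parent = {a: a for a in symbols}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for a1, a2 in bonds:
        parent[find(a1)] = find(a2)
    groups = {}
    for a in sorted(symbols):
        groups.setdefault(find(a), []).append(a)
    return sorted(groups.values())


def strip_salts(molblock):
    """Return the atom-index lists of fragments that are not single-atom salts."""
    symbols, bonds = read_atoms_and_bonds(molblock)
    return [
        frag for frag in fragments(symbols, bonds)
        if not (len(frag) == 1 and symbols[frag[0]] in SALT_SYMBOLS)
    ]
```

Applied to `p_molblock`, `strip_salts` keeps the fragment made of atoms 1 and 2 (C and N) and drops the chloride, which matches the atoms in the parent shown above; the real pipeline additionally neutralises the remaining nitrogen.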
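`checker.check_molblock` identifies and validates problem structures before they are added to the database. It returns a tuple with one `(score, message)` pair per issue found. In this example the input, propan-2-ol drawn with a wedge bond on its central carbon (which is not a stereocentre), is reported with a stereo mismatch.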
```python
from chembl_structure_pipeline import checker

c_molblock = """
  Mrv1810 02151908462D

  4  3  0  0  0  0            999 V2000
    2.2321    4.4196    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.0023    4.7153    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.4117    4.5059    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    1.9568    3.6420    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  1  0  0  0
  1  3  1  0  0  0  0
  1  4  1  0  0  0  0
M  END
"""

issues = checker.check_molblock(c_molblock)
print(issues)
```
```
((5, 'InChi_RDKit/Mol stereo mismatch'),)
```
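Because `check_molblock` returns plain `(score, message)` pairs, a loading gate can be built on top of it with a few lines of code. This is a sketch assuming a hypothetical policy in which any issue scoring at or above `REJECT_THRESHOLD` blocks a structure; the threshold value and both function names are illustrative assumptions, not ChEMBL settings.

```python
# Hypothetical policy: any issue scoring at or above this value blocks loading.
REJECT_THRESHOLD = 5


def blocking_issues(issues, threshold=REJECT_THRESHOLD):
    """Return the (score, message) pairs whose score meets the threshold."""
    return [(score, message) for score, message in issues if score >= threshold]


def should_load(issues, threshold=REJECT_THRESHOLD):
    """True when no issue reaches the threshold."""
    return not blocking_issues(issues, threshold)


# The issues tuple printed by the checker example above.
issues = ((5, 'InChi_RDKit/Mol stereo mismatch'),)
print(should_load(issues))  # False
```

Keeping the threshold as a parameter lets the same check be tightened or relaxed per data source without touching the checker itself.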
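The same operations are also available as ChEMBL web services under `https://www.ebi.ac.uk/chembl/api/utils/`. Each endpoint takes a molblock as the body of a POST request and returns JSON; the examples below use `requests` and read the first element of the returned list.

The `standardize` endpoint standardises chemical structures according to a set of predefined ChEMBL business rules: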
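```python
import requests

# s_molblock is the molfile from the standardizer example above
res = requests.post('https://www.ebi.ac.uk/chembl/api/utils/standardize', data=s_molblock)
res.raise_for_status()  # fail loudly on HTTP errors
standard_molblock = res.json()[0]['standard_molblock']
print(standard_molblock)
```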
```

     RDKit          2D

  4  3  0  0  0  0  0  0  0  0999 V2000
   -2.5038    0.4060    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.5038    1.2310    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   -3.2182   -0.0065    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   -1.7893   -0.0065    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  1  3  1  0
  1  4  1  0
M  END
```
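If adding `requests` as a dependency is undesirable, the utils endpoints can also be called with the standard library. This is a sketch that assumes the service accepts the raw molblock as the request body in the same way as in the `requests` calls here; the `text/plain` content type is an assumption, and `UTILS_URL`, `build_request`, `parse_response` and `post_molblock` are hypothetical names.

```python
import json
import urllib.request

UTILS_URL = "https://www.ebi.ac.uk/chembl/api/utils/"


def build_request(endpoint, molblock):
    """Prepare a POST to a utils endpoint such as 'standardize', 'getParent' or 'check'."""
    return urllib.request.Request(
        UTILS_URL + endpoint,
        data=molblock.encode("utf-8"),
        # Assumption: the service accepts a plain-text body.
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def parse_response(body):
    """Decode the JSON body and return the first element of the result list."""
    return json.loads(body)[0]


def post_molblock(endpoint, molblock, timeout=30):
    """Send the molblock and return the first result (urllib raises HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(build_request(endpoint, molblock), timeout=timeout) as resp:
        return parse_response(resp.read().decode("utf-8"))


# Example (performs a network call):
# result = post_molblock("standardize", s_molblock)
# print(result["standard_molblock"])
```

Keeping request construction and response parsing in separate functions means both can be checked without network access.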
The `getParent` endpoint generates parent structures of multi-component compounds based on a set of rules and a defined list of salts and solvents:
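```python
import requests

# p_molblock is the molfile from the get_parent_molblock example above
res = requests.post('https://www.ebi.ac.uk/chembl/api/utils/getParent', data=p_molblock)
res.raise_for_status()  # fail loudly on HTTP errors
parent_molblock = res.json()[0]['parent_molblock']
print(parent_molblock)
```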
```

     RDKit          2D

  2  1  0  0  0  0  0  0  0  0999 V2000
   -5.2331    1.1053    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.5186    1.5178    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
M  END
```
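The `check` endpoint identifies and validates problem structures before they are added to the database. Because the response is JSON, each issue comes back as a list rather than a tuple: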
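```python
import requests

# c_molblock is the molfile from the checker example above
res = requests.post('https://www.ebi.ac.uk/chembl/api/utils/check', data=c_molblock)
res.raise_for_status()  # fail loudly on HTTP errors
issues = res.json()[0]
print(issues)
```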
```
[[5, 'InChi_RDKit/Mol stereo mismatch']]
```
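JSON has no tuple type, so the web service returns each issue as a two-element list, whereas `checker.check_molblock` returns a tuple of tuples. A small helper, hypothetical and not part of either API, can normalise the remote result so that code written against the local library, or a direct comparison of the two results, works unchanged:

```python
def as_issue_tuples(issues):
    """Convert [[score, message], ...] from the web service to ((score, message), ...)."""
    return tuple((int(score), str(message)) for score, message in issues)


remote = [[5, 'InChi_RDKit/Mol stereo mismatch']]   # printed by the web example
local = ((5, 'InChi_RDKit/Mol stereo mismatch'),)   # printed by check_molblock
print(as_issue_tuples(remote) == local)  # True
```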