Here will attempt to assemble a growth medium for the infant gut based on a breast milk diet. We will use the following strategy:
Let's start by reading the metabolomics data from a study on breast milk which is incidentally the only one on the metabolomics workbench. This one is from obese donors, but since we will fill in the main abundances based on the WHO we hope this will be fairly representative.
import requests
import pandas as pd
from io import StringIO
response = requests.get("https://www.metabolomicsworkbench.org/data/DRCCMetadata.php?Mode=ProcessDownloadResults&DownloadMode=DownloadResults&StudyID=ST001322&AnalysisID=AN002198")
abundances, metabolites = [pd.read_csv(StringIO(content), sep="\t") for content in response.text.split("\n\n")]
abundances.columns = abundances.columns.str.strip()
abundances.set_index("metabolite_name", inplace=True)
abundances = abundances.mean(axis=1)
abundances
metabolite_name zymosterol 24984.303109 xylulose 1194.383420 xylose 2586.269430 xylonolactone 2670.575130 xylitol 3394.310881 ... 1-monopalmitin 209480.132124 1-monoolein 900478.279793 1-methylgalactose 2383.512953 1,5-anhydroglucitol 7029.012953 1,2,4-benzenetriol 597.020725 Length: 124, dtype: float64
metabolites.set_index("metabolite_name", inplace=True)
metabolites["abundance"] = abundances
metabolites
retention index | quantitated m/z | Binbase ID | PubChem ID | spectrum | KEGG ID | InChI Key | ri_type | abundance | |
---|---|---|---|---|---|---|---|---|---|
metabolite_name | |||||||||
zymosterol | 1088064 | 129 | 110304 | 92746.0 | 85:177.0 89:568.0 91:6448.0 92:1105.0 93:4713.... | C05437 | CGSJXLIKVBJVRY-XTGBIJOFSA-N | Binbase | 24984.303109 |
xylulose | 553450 | 173 | 31632 | 439205.0 | 85:1861.0 86:702.0 87:1148.0 88:324.0 89:10095... | C00312 | LQXVFWRQNMEDEE-PYHARJCCSA-N | Binbase | 1194.383420 |
xylose | 543267 | 103 | 169 | 135191.0 | 86:77.0 87:118.0 89:838.0 90:80.0 91:46.0 94:1... | C00181 | SRBFZHDQGSBBOR-IOVATXLUSA-N | Binbase | 2586.269430 |
xylonolactone | 535176 | 217 | 1808 | 439692.0 | 86:6.0 88:17.0 89:5.0 101:53.0 103:590.0 104:3... | C02266 | XXBSUZSONOQQGK-FLRLBIABSA-N | Binbase | 2670.575130 |
xylitol | 567437 | 217 | 5857 | 6912.0 | 85:22.0 87:53.0 88:95.0 89:312.0 94:15.0 99:38... | C00379 | HEBKCHPVOIAQTA-NGQZWQHPSA-N | Binbase | 3394.310881 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1-monopalmitin | 901749 | 129 | 2070 | 14900.0 | 85:11049.0 86:1066.0 87:1727.0 88:2503.0 89:43... | C01885 | QHZLMUACJMDIAE-UHFFFAOYSA-N | Binbase | 209480.132124 |
1-monoolein | 955584 | 129 | 21632 | 5283468.0 | 85:72.0 89:187.0 91:637.0 92:23.0 93:609.0 94:... | NaN | RZRNAYUHWVFMIP-KTKRTIGZSA-N | Binbase | 900478.279793 |
1-methylgalactose | 664807 | 204 | 477 | 2108.0 | 85:1337.0 86:428.0 87:1347.0 88:1090.0 89:3019... | NaN | HOVAGTYPODGVJG-UHFFFAOYSA-N | Binbase | 2383.512953 |
1,5-anhydroglucitol | 633603 | 217 | 209168 | 64960.0 | 85:8076.0 87:8514.0 88:4105.0 89:11826.0 91:22... | C07326 | MPCAJMNYNOGXPB-SLPGGIOYSA-N | Binbase | 7029.012953 |
1,2,4-benzenetriol | 521803 | 239 | 26704 | 10787.0 | 85:1133.0 87:801.0 88:613.0 89:323.0 90:201.0 ... | C02814 | GGNQRNBDZQJCCN-UHFFFAOYSA-N | Binbase | 597.020725 |
124 rows × 9 columns
Now we try to map it onto the AGORA database.
agora_mets = pd.read_csv("../data/agora_metabolites.csv")
agora_mets.head()
metabolite | name | hmdb | kegg.compound | pubchem.compound | inchi | chebi | |
---|---|---|---|---|---|---|---|
0 | 10fthf5glu | 10-formyltetrahydrofolate-[Glu](5) | NaN | NaN | NaN | NaN | NaN |
1 | 10fthf | 10-Formyltetrahydrofolate | HMDB00972 | C00234 | 122347.0 | NaN | NaN |
2 | 10m3hddcaACP | 10-methyl-3-hydroxy-dodecanoyl-ACP | NaN | NaN | NaN | NaN | NaN |
3 | 10m3hundecACP | 10-methyl-3-hydroxy-undecanoyl-ACP | NaN | NaN | NaN | NaN | NaN |
4 | 10m3oddcaACP | 10-methyl-3-oxo-dodecanoyl-ACP | NaN | NaN | NaN | NaN | NaN |
kegg = pd.merge(agora_mets[agora_mets["kegg.compound"].notnull()], metabolites, left_on="kegg.compound", right_on="KEGG ID")
name = pd.merge(agora_mets, metabolites, left_on=agora_mets.name.str.lower(), right_on=metabolites.index)
merged = pd.concat([kegg, name]).drop_duplicates(subset=["metabolite"]).drop(columns=["spectrum", "key_0"])
merged
metabolite | name | hmdb | kegg.compound | pubchem.compound | inchi | chebi | retention index | quantitated m/z | Binbase ID | PubChem ID | KEGG ID | InChI Key | ri_type | abundance | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ala_B | beta-alanine | HMDB00056 | C00099 | NaN | InChI=1S/C3H7NO2/c4-2-1-3(5)6/h1-2,4H2,(H,5,6) | NaN | 435564 | 248 | 148 | 239.0 | C00099 | UCMIRNVEIXFBKS-UHFFFAOYSA-N | Binbase | 2.759922e+02 |
1 | ala_L | L-alanine | HMDB00161 | C00041 | 5950.0 | InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5... | NaN | 243971 | 116 | 34178 | 5950.0 | C00041 | QNAYBMKLOCPYGJ-REOHCLBHSA-N | Binbase | 1.915410e+04 |
2 | asp_L | L-aspartate(1-) | HMDB00191 | C00049 | 5960.0 | InChI=1S/C4H7NO4/c5-2(4(8)9)1-3(6)7/h2H,1,5H2,... | NaN | 480387 | 232 | 79 | 5960.0 | C00049 | CKLJMWTZIZZHCS-REOHCLBHSA-N | Binbase | 2.535544e+02 |
3 | cit | Citrate | HMDB00094 | C00158 | 311.0 | InChI=1S/C6H8O7/c7-3(8)1-6(13,5(11)12)2-4(9)10... | NaN | 617342 | 273 | 288 | 311.0 | C00158 | KRKNYBCHXYNGOX-UHFFFAOYSA-N | Binbase | 2.791619e+03 |
4 | ddca | laurate | HMDB00638 | C02679 | 3893.0 | InChI=1S/C12H24O2/c1-2-3-4-5-6-7-8-9-10-11-12(... | NaN | 547906 | 117 | 49 | 3893.0 | C02679 | POULHZVOKOAJMA-UHFFFAOYSA-N | Binbase | 3.843045e+05 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
58 | glutar | Glutarate | HMDB00661 | C00489 | 3772.0 | InChI=1S/C5H8O4/c6-4(7)2-1-3-5(8)9/h1-3H2,(H,6... | NaN | 421596 | 261 | 16952 | 743.0 | C00489 | JFCQEDHGNNZCLN-UHFFFAOYSA-N | Binbase | 2.172487e+02 |
59 | chsterol | cholesterol | HMDB00067 | C00187 | 5997.0 | NaN | NaN | 1076014 | 129 | 87943 | 5997.0 | C00187 | HVYWMOMLDIMFJA-DPAQBDIFSA-N | Binbase | 2.556318e+05 |
6 | lcts | Lactose | HMDB00186 | C00243 | 440995.0 | NaN | NaN | 932179 | 204 | 1373 | 6134.0 | C01970 | GUBGYTABKSRVRQ-DCSYEGIMSA-N | Binbase | 4.425044e+06 |
8 | raffin | Raffinose | NaN | NaN | NaN | NaN | NaN | 1120886 | 361 | 3190 | 439242.0 | C00492 | MUPFEKGTMRGPLJ-ZQSKZDJDSA-N | Binbase | 2.543718e+03 |
12 | hqn | Hydroquinone | NaN | NaN | NaN | NaN | NaN | 422583 | 239 | 16709 | 785.0 | C00530 | QIGBRXMKCJKVMJ-UHFFFAOYSA-N | Binbase | 1.679870e+02 |
63 rows × 15 columns
Now we will in some abundances with data from the WHO (https://archive.unu.edu/unupress/food/8F174e/8F174E04.htm). We do this for 1l of milk for now since that is pretty much the largest amount a baby drinks per day. We also add in some carbon sources that are present in the gut (mucin cores and primary bile acids). Note that we won't be using the metabolomics abundances here since those are relative data not absolute ones (are under peak). So higher values between metabolites dont't necessarily mean that one is more abundant than the other.
merged.set_index("metabolite", inplace=True)
merged.loc["lcts", "mmol_per_litre"] = 70/180*1000
merged.loc["chsterol", "mmol_per_litre"] = 0.16/386 * 1000
merged.loc["ca2", "mmol_per_litre"] = 0.3/40 * 1000
merged.loc["ppi", "mmol_per_litre"] = 0.14/174 * 1000
merged.loc["na1", "mmol_per_litre"] = 0.15/35 * 1000
merged.loc["k", "mmol_per_litre"] = 0.55/39 * 1000
merged.loc["cl", "mmol_per_litre"] = 0.43/35 * 1000
# mucin
for met in agora_mets.loc[agora_mets.metabolite.str.contains("core"), "metabolite"]:
merged.loc[met, "mmol_per_litre"] = 1
merged.loc[met, "name"] = agora_mets.loc[agora_mets.metabolite == met, "name"].values
# primary BAs
for met in ["gchola", "tchola"]:
merged.loc[met, "mmol_per_litre"] = 1
merged.loc[met, "name"] = agora_mets.loc[agora_mets.metabolite == met, "name"].values
# anaerobic
merged.loc["o2", ["mmol_per_litre", "name"]] = [0.001, "Oxygen"]
merged.loc[merged.mmol_per_litre.isnull(), "mmol_per_litre"] = 1
merged
name | hmdb | kegg.compound | pubchem.compound | inchi | chebi | retention index | quantitated m/z | Binbase ID | PubChem ID | KEGG ID | InChI Key | ri_type | abundance | mmol_per_litre | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
metabolite | |||||||||||||||
ala_B | beta-alanine | HMDB00056 | C00099 | NaN | InChI=1S/C3H7NO2/c4-2-1-3(5)6/h1-2,4H2,(H,5,6) | NaN | 435564.0 | 248.0 | 148.0 | 239.0 | C00099 | UCMIRNVEIXFBKS-UHFFFAOYSA-N | Binbase | 275.992228 | 1.000 |
ala_L | L-alanine | HMDB00161 | C00041 | 5950.0 | InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5... | NaN | 243971.0 | 116.0 | 34178.0 | 5950.0 | C00041 | QNAYBMKLOCPYGJ-REOHCLBHSA-N | Binbase | 19154.101036 | 1.000 |
asp_L | L-aspartate(1-) | HMDB00191 | C00049 | 5960.0 | InChI=1S/C4H7NO4/c5-2(4(8)9)1-3(6)7/h2H,1,5H2,... | NaN | 480387.0 | 232.0 | 79.0 | 5960.0 | C00049 | CKLJMWTZIZZHCS-REOHCLBHSA-N | Binbase | 253.554404 | 1.000 |
cit | Citrate | HMDB00094 | C00158 | 311.0 | InChI=1S/C6H8O7/c7-3(8)1-6(13,5(11)12)2-4(9)10... | NaN | 617342.0 | 273.0 | 288.0 | 311.0 | C00158 | KRKNYBCHXYNGOX-UHFFFAOYSA-N | Binbase | 2791.619171 | 1.000 |
ddca | laurate | HMDB00638 | C02679 | 3893.0 | InChI=1S/C12H24O2/c1-2-3-4-5-6-7-8-9-10-11-12(... | NaN | 547906.0 | 117.0 | 49.0 | 3893.0 | C02679 | POULHZVOKOAJMA-UHFFFAOYSA-N | Binbase | 384304.502591 | 1.000 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
gncore2_rl | released GlcNAc-alpha-1,4-Core 2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.000 |
core7 | Core 7 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.000 |
gchola | glycocholate | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.000 |
tchola | taurocholate | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.000 |
o2 | Oxygen | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.001 |
84 rows × 15 columns
Now we will try to identify components that can be taken up by human cells.
To achieve this we will load the Recon3 human model. AGORA and Recon IDs are very similar so we should be able to match them. We just have to adjust the Recon3 ones a bit. We start by identifying all available exchanges in Recon3 and adjusting the IDs.
from cobra.io import read_sbml_model
import pandas as pd
recon3 = read_sbml_model("../data/Recon3D.xml.gz")
exchanges = pd.Series([r.id for r in recon3.exchanges])
exchanges = exchanges.str.replace("__", "_").str.replace("_e$", "", regex=True)
exchanges.head()
0 EX_5adtststerone 1 EX_5adtststerones 2 EX_5fthf 3 EX_5htrp 4 EX_5mthf dtype: object
medium = merged.reset_index().copy()
medium["reaction"] = "EX_" + medium.metabolite
medium["dilution"] = 1.0
medium.loc[medium.reaction.isin(exchanges), "dilution"] = 0.1
medium.dilution.value_counts()
0.1 71 1.0 13 Name: dilution, dtype: int64
medium["metabolite"] = medium.reaction.str.replace("^EX_", "", regex=True) + "_m"
medium["global_id"] = medium.reaction + "(e)"
medium["reaction"] = medium.reaction + "_m"
medium["flux"] = medium.mmol_per_litre * medium.dilution
medium.loc[medium.flux < 1e-4, "flux"] = 1e-4
medium
metabolite | name | hmdb | kegg.compound | pubchem.compound | inchi | chebi | retention index | quantitated m/z | Binbase ID | PubChem ID | KEGG ID | InChI Key | ri_type | abundance | mmol_per_litre | reaction | dilution | global_id | flux | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ala_B_m | beta-alanine | HMDB00056 | C00099 | NaN | InChI=1S/C3H7NO2/c4-2-1-3(5)6/h1-2,4H2,(H,5,6) | NaN | 435564.0 | 248.0 | 148.0 | 239.0 | C00099 | UCMIRNVEIXFBKS-UHFFFAOYSA-N | Binbase | 275.992228 | 1.000 | EX_ala_B_m | 0.1 | EX_ala_B(e) | 0.1000 |
1 | ala_L_m | L-alanine | HMDB00161 | C00041 | 5950.0 | InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5... | NaN | 243971.0 | 116.0 | 34178.0 | 5950.0 | C00041 | QNAYBMKLOCPYGJ-REOHCLBHSA-N | Binbase | 19154.101036 | 1.000 | EX_ala_L_m | 0.1 | EX_ala_L(e) | 0.1000 |
2 | asp_L_m | L-aspartate(1-) | HMDB00191 | C00049 | 5960.0 | InChI=1S/C4H7NO4/c5-2(4(8)9)1-3(6)7/h2H,1,5H2,... | NaN | 480387.0 | 232.0 | 79.0 | 5960.0 | C00049 | CKLJMWTZIZZHCS-REOHCLBHSA-N | Binbase | 253.554404 | 1.000 | EX_asp_L_m | 0.1 | EX_asp_L(e) | 0.1000 |
3 | cit_m | Citrate | HMDB00094 | C00158 | 311.0 | InChI=1S/C6H8O7/c7-3(8)1-6(13,5(11)12)2-4(9)10... | NaN | 617342.0 | 273.0 | 288.0 | 311.0 | C00158 | KRKNYBCHXYNGOX-UHFFFAOYSA-N | Binbase | 2791.619171 | 1.000 | EX_cit_m | 0.1 | EX_cit(e) | 0.1000 |
4 | ddca_m | laurate | HMDB00638 | C02679 | 3893.0 | InChI=1S/C12H24O2/c1-2-3-4-5-6-7-8-9-10-11-12(... | NaN | 547906.0 | 117.0 | 49.0 | 3893.0 | C02679 | POULHZVOKOAJMA-UHFFFAOYSA-N | Binbase | 384304.502591 | 1.000 | EX_ddca_m | 0.1 | EX_ddca(e) | 0.1000 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
79 | gncore2_rl_m | released GlcNAc-alpha-1,4-Core 2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.000 | EX_gncore2_rl_m | 1.0 | EX_gncore2_rl(e) | 1.0000 |
80 | core7_m | Core 7 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.000 | EX_core7_m | 0.1 | EX_core7(e) | 0.1000 |
81 | gchola_m | glycocholate | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.000 | EX_gchola_m | 0.1 | EX_gchola(e) | 0.1000 |
82 | tchola_m | taurocholate | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.000 | EX_tchola_m | 0.1 | EX_tchola(e) | 0.1000 |
83 | o2_m | Oxygen | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.001 | EX_o2_m | 0.1 | EX_o2(e) | 0.0001 |
84 rows × 20 columns
medium[medium.metabolite == "lcts_m"]
metabolite | name | hmdb | kegg.compound | pubchem.compound | inchi | chebi | retention index | quantitated m/z | Binbase ID | PubChem ID | KEGG ID | InChI Key | ri_type | abundance | mmol_per_litre | reaction | dilution | global_id | flux | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
60 | lcts_m | Lactose | HMDB00186 | C00243 | 440995.0 | NaN | NaN | 932179.0 | 204.0 | 1373.0 | 6134.0 | C01970 | GUBGYTABKSRVRQ-DCSYEGIMSA-N | Binbase | 4.425044e+06 | 388.888889 | EX_lcts_m | 0.1 | EX_lcts(e) | 38.888889 |
But can the bacteria in our model database actually grow on this medium? Let's check and start by downbloading the AGORA model database.
# !wget https://zenodo.org/record/3755182/files/agora103_genus.qza?download=1 -O data/agora103_genus.qza
No we we will check for growth by running the growth medium against any single model.
from micom.workflows.db_media import check_db_medium
check = check_db_medium("../data/agora103_genus.qza", medium, threads=20)
Output()
check
now includes the entire manifest plus two new columns: the growth rate and whether the models can grow.
check.can_grow.value_counts()
False 227 Name: can_grow, dtype: int64
Okay nothing can grow. We probably miss some important cofactor such as manganese or copper.
Let's complete the medium so that all taxa in AGORA can grow at a rate of at least 1e-3.
Sometimes you may start from a few componenents and will want to complete this skeleton medium to reach a certain minimum growth rate across all models in the database. This can be done with complete_db_medium
. We can minimize either the added total flux, mass or presence of any atom. Since, we want to build a low carb diet here we will minimize the presence of added carbon.
from micom.workflows.db_media import complete_db_medium
manifest, imports = complete_db_medium("../data/agora103_genus.qza", medium, growth=0.01, threads=20, max_added_import=10, weights="mass")
Output()
manifest.can_grow.value_counts()
True 225 False 2 Name: can_grow, dtype: int64
manifest
is the amended manifest as before and imports
contains the used import fluxes for each model. A new column in the manifest also tells us how many import were added.
manifest.added.describe()
count 225.000000 mean 16.511111 std 6.705615 min 6.000000 25% 12.000000 50% 15.000000 75% 21.000000 max 38.000000 Name: added, dtype: float64
From this we build up our new medium.
fluxes = imports.max()
fluxes = fluxes[(fluxes > 1e-6) | fluxes.index.isin(medium.reaction)]
completed = pd.DataFrame({
"reaction": fluxes.index,
"metabolite": fluxes.index.str.replace("^EX_", "", regex=True),
"global_id": fluxes.index.str.replace("_m$", "(e)", regex=True),
"flux": fluxes
})
completed.shape
(182, 4)
Let's also export the medium as Qiime 2 artifact which can be read with q2-micom
or the normal micom package.
from qiime2 import Artifact
arti = Artifact.import_data("MicomMedium[Global]", completed)
arti.save("../media/breast_milk_agora.qza")
'../media/breast_milk_agora.qza'
As a last step we validate the created medium.
check = check_db_medium("../data/agora103_genus.qza", completed, threads=20)
check.can_grow.value_counts()
Output()
True 227 Name: can_grow, dtype: int64
check.growth_rate.describe()
count 227.000000 mean 0.019931 std 0.011181 min 0.000048 25% 0.010000 50% 0.019846 75% 0.025641 max 0.062186 Name: growth_rate, dtype: float64