Breast milk medium for AGORA¶

Here will attempt to assemble a growth medium for the infant gut based on a breast milk diet. We will use the following strategy:

Obtain metabolomics data for human breast milk
Use WHO breast milk composition data to fill in some of the abundances
Distribute remaining mass across the non-quantified components
Add in intestinal metabolites like mucins and primary bile acids
complete the medium so all taxa in the AGORA database can grow in it

Let's start by reading the metabolomics data from a study on breast milk which is incidentally the only one on the metabolomics workbench. This one is from obese donors, but since we will fill in the main abundances based on the WHO we hope this will be fairly representative.

In [106]:

import requests
import pandas as pd
from io import StringIO

response = requests.get("https://www.metabolomicsworkbench.org/data/DRCCMetadata.php?Mode=ProcessDownloadResults&DownloadMode=DownloadResults&StudyID=ST001322&AnalysisID=AN002198")
abundances, metabolites = [pd.read_csv(StringIO(content), sep="\t") for content in response.text.split("\n\n")]
abundances.columns = abundances.columns.str.strip()
abundances.set_index("metabolite_name", inplace=True)
abundances = abundances.mean(axis=1)
abundances

Out[106]:

metabolite_name
zymosterol              24984.303109
xylulose                 1194.383420
xylose                   2586.269430
xylonolactone            2670.575130
xylitol                  3394.310881
                           ...      
1-monopalmitin         209480.132124
1-monoolein            900478.279793
1-methylgalactose        2383.512953
1,5-anhydroglucitol      7029.012953
1,2,4-benzenetriol        597.020725
Length: 124, dtype: float64

In [107]:

metabolites.set_index("metabolite_name", inplace=True)
metabolites["abundance"] = abundances
metabolites

Out[107]:

	retention index	quantitated m/z	Binbase ID	PubChem ID	spectrum	KEGG ID	InChI Key	ri_type	abundance
metabolite_name
zymosterol	1088064	129	110304	92746.0	85:177.0 89:568.0 91:6448.0 92:1105.0 93:4713....	C05437	CGSJXLIKVBJVRY-XTGBIJOFSA-N	Binbase	24984.303109
xylulose	553450	173	31632	439205.0	85:1861.0 86:702.0 87:1148.0 88:324.0 89:10095...	C00312	LQXVFWRQNMEDEE-PYHARJCCSA-N	Binbase	1194.383420
xylose	543267	103	169	135191.0	86:77.0 87:118.0 89:838.0 90:80.0 91:46.0 94:1...	C00181	SRBFZHDQGSBBOR-IOVATXLUSA-N	Binbase	2586.269430
xylonolactone	535176	217	1808	439692.0	86:6.0 88:17.0 89:5.0 101:53.0 103:590.0 104:3...	C02266	XXBSUZSONOQQGK-FLRLBIABSA-N	Binbase	2670.575130
xylitol	567437	217	5857	6912.0	85:22.0 87:53.0 88:95.0 89:312.0 94:15.0 99:38...	C00379	HEBKCHPVOIAQTA-NGQZWQHPSA-N	Binbase	3394.310881
...	...	...	...	...	...	...	...	...	...
1-monopalmitin	901749	129	2070	14900.0	85:11049.0 86:1066.0 87:1727.0 88:2503.0 89:43...	C01885	QHZLMUACJMDIAE-UHFFFAOYSA-N	Binbase	209480.132124
1-monoolein	955584	129	21632	5283468.0	85:72.0 89:187.0 91:637.0 92:23.0 93:609.0 94:...	NaN	RZRNAYUHWVFMIP-KTKRTIGZSA-N	Binbase	900478.279793
1-methylgalactose	664807	204	477	2108.0	85:1337.0 86:428.0 87:1347.0 88:1090.0 89:3019...	NaN	HOVAGTYPODGVJG-UHFFFAOYSA-N	Binbase	2383.512953
1,5-anhydroglucitol	633603	217	209168	64960.0	85:8076.0 87:8514.0 88:4105.0 89:11826.0 91:22...	C07326	MPCAJMNYNOGXPB-SLPGGIOYSA-N	Binbase	7029.012953
1,2,4-benzenetriol	521803	239	26704	10787.0	85:1133.0 87:801.0 88:613.0 89:323.0 90:201.0 ...	C02814	GGNQRNBDZQJCCN-UHFFFAOYSA-N	Binbase	597.020725

124 rows × 9 columns

Now we try to map it onto the AGORA database.

In [108]:

agora_mets = pd.read_csv("../data/agora_metabolites.csv")
agora_mets.head()

Out[108]:

	metabolite	name	hmdb	kegg.compound	pubchem.compound	inchi	chebi
0	10fthf5glu	10-formyltetrahydrofolate-[Glu](5)	NaN	NaN	NaN	NaN	NaN
1	10fthf	10-Formyltetrahydrofolate	HMDB00972	C00234	122347.0	NaN	NaN
2	10m3hddcaACP	10-methyl-3-hydroxy-dodecanoyl-ACP	NaN	NaN	NaN	NaN	NaN
3	10m3hundecACP	10-methyl-3-hydroxy-undecanoyl-ACP	NaN	NaN	NaN	NaN	NaN
4	10m3oddcaACP	10-methyl-3-oxo-dodecanoyl-ACP	NaN	NaN	NaN	NaN	NaN

In [109]:

kegg = pd.merge(agora_mets[agora_mets["kegg.compound"].notnull()], metabolites, left_on="kegg.compound", right_on="KEGG ID")
name = pd.merge(agora_mets, metabolites, left_on=agora_mets.name.str.lower(), right_on=metabolites.index)
merged = pd.concat([kegg, name]).drop_duplicates(subset=["metabolite"]).drop(columns=["spectrum", "key_0"])
merged

Out[109]:

	metabolite	name	hmdb	kegg.compound	pubchem.compound	inchi	chebi	retention index	quantitated m/z	Binbase ID	PubChem ID	KEGG ID	InChI Key	ri_type	abundance
0	ala_B	beta-alanine	HMDB00056	C00099	NaN	InChI=1S/C3H7NO2/c4-2-1-3(5)6/h1-2,4H2,(H,5,6)	NaN	435564	248	148	239.0	C00099	UCMIRNVEIXFBKS-UHFFFAOYSA-N	Binbase	2.759922e+02
1	ala_L	L-alanine	HMDB00161	C00041	5950.0	InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5...	NaN	243971	116	34178	5950.0	C00041	QNAYBMKLOCPYGJ-REOHCLBHSA-N	Binbase	1.915410e+04
2	asp_L	L-aspartate(1-)	HMDB00191	C00049	5960.0	InChI=1S/C4H7NO4/c5-2(4(8)9)1-3(6)7/h2H,1,5H2,...	NaN	480387	232	79	5960.0	C00049	CKLJMWTZIZZHCS-REOHCLBHSA-N	Binbase	2.535544e+02
3	cit	Citrate	HMDB00094	C00158	311.0	InChI=1S/C6H8O7/c7-3(8)1-6(13,5(11)12)2-4(9)10...	NaN	617342	273	288	311.0	C00158	KRKNYBCHXYNGOX-UHFFFAOYSA-N	Binbase	2.791619e+03
4	ddca	laurate	HMDB00638	C02679	3893.0	InChI=1S/C12H24O2/c1-2-3-4-5-6-7-8-9-10-11-12(...	NaN	547906	117	49	3893.0	C02679	POULHZVOKOAJMA-UHFFFAOYSA-N	Binbase	3.843045e+05
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
58	glutar	Glutarate	HMDB00661	C00489	3772.0	InChI=1S/C5H8O4/c6-4(7)2-1-3-5(8)9/h1-3H2,(H,6...	NaN	421596	261	16952	743.0	C00489	JFCQEDHGNNZCLN-UHFFFAOYSA-N	Binbase	2.172487e+02
59	chsterol	cholesterol	HMDB00067	C00187	5997.0	NaN	NaN	1076014	129	87943	5997.0	C00187	HVYWMOMLDIMFJA-DPAQBDIFSA-N	Binbase	2.556318e+05
6	lcts	Lactose	HMDB00186	C00243	440995.0	NaN	NaN	932179	204	1373	6134.0	C01970	GUBGYTABKSRVRQ-DCSYEGIMSA-N	Binbase	4.425044e+06
8	raffin	Raffinose	NaN	NaN	NaN	NaN	NaN	1120886	361	3190	439242.0	C00492	MUPFEKGTMRGPLJ-ZQSKZDJDSA-N	Binbase	2.543718e+03
12	hqn	Hydroquinone	NaN	NaN	NaN	NaN	NaN	422583	239	16709	785.0	C00530	QIGBRXMKCJKVMJ-UHFFFAOYSA-N	Binbase	1.679870e+02

63 rows × 15 columns

Now we will in some abundances with data from the WHO (https://archive.unu.edu/unupress/food/8F174e/8F174E04.htm). We do this for 1l of milk for now since that is pretty much the largest amount a baby drinks per day. We also add in some carbon sources that are present in the gut (mucin cores and primary bile acids). Note that we won't be using the metabolomics abundances here since those are relative data not absolute ones (are under peak). So higher values between metabolites dont't necessarily mean that one is more abundant than the other.

In [110]:

merged.set_index("metabolite", inplace=True)
merged.loc["lcts", "mmol_per_litre"] = 70/180*1000
merged.loc["chsterol", "mmol_per_litre"] = 0.16/386 * 1000
merged.loc["ca2", "mmol_per_litre"] = 0.3/40 * 1000
merged.loc["ppi", "mmol_per_litre"] = 0.14/174 * 1000
merged.loc["na1", "mmol_per_litre"] = 0.15/35 * 1000
merged.loc["k", "mmol_per_litre"] = 0.55/39 * 1000
merged.loc["cl", "mmol_per_litre"] = 0.43/35 * 1000

# mucin
for met in agora_mets.loc[agora_mets.metabolite.str.contains("core"), "metabolite"]:
    merged.loc[met, "mmol_per_litre"] = 1
    merged.loc[met, "name"] = agora_mets.loc[agora_mets.metabolite == met, "name"].values

# primary BAs
for met in ["gchola", "tchola"]:
    merged.loc[met, "mmol_per_litre"] = 1
    merged.loc[met, "name"] = agora_mets.loc[agora_mets.metabolite == met, "name"].values
    
# anaerobic
merged.loc["o2", ["mmol_per_litre", "name"]] = [0.001, "Oxygen"]

merged.loc[merged.mmol_per_litre.isnull(), "mmol_per_litre"] = 1
merged

Out[110]:

	name	hmdb	kegg.compound	pubchem.compound	inchi	chebi	retention index	quantitated m/z	Binbase ID	PubChem ID	KEGG ID	InChI Key	ri_type	abundance	mmol_per_litre
metabolite
ala_B	beta-alanine	HMDB00056	C00099	NaN	InChI=1S/C3H7NO2/c4-2-1-3(5)6/h1-2,4H2,(H,5,6)	NaN	435564.0	248.0	148.0	239.0	C00099	UCMIRNVEIXFBKS-UHFFFAOYSA-N	Binbase	275.992228	1.000
ala_L	L-alanine	HMDB00161	C00041	5950.0	InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5...	NaN	243971.0	116.0	34178.0	5950.0	C00041	QNAYBMKLOCPYGJ-REOHCLBHSA-N	Binbase	19154.101036	1.000
asp_L	L-aspartate(1-)	HMDB00191	C00049	5960.0	InChI=1S/C4H7NO4/c5-2(4(8)9)1-3(6)7/h2H,1,5H2,...	NaN	480387.0	232.0	79.0	5960.0	C00049	CKLJMWTZIZZHCS-REOHCLBHSA-N	Binbase	253.554404	1.000
cit	Citrate	HMDB00094	C00158	311.0	InChI=1S/C6H8O7/c7-3(8)1-6(13,5(11)12)2-4(9)10...	NaN	617342.0	273.0	288.0	311.0	C00158	KRKNYBCHXYNGOX-UHFFFAOYSA-N	Binbase	2791.619171	1.000
ddca	laurate	HMDB00638	C02679	3893.0	InChI=1S/C12H24O2/c1-2-3-4-5-6-7-8-9-10-11-12(...	NaN	547906.0	117.0	49.0	3893.0	C02679	POULHZVOKOAJMA-UHFFFAOYSA-N	Binbase	384304.502591	1.000
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
gncore2_rl	released GlcNAc-alpha-1,4-Core 2	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	1.000
core7	Core 7	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	1.000
gchola	glycocholate	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	1.000
tchola	taurocholate	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	1.000
o2	Oxygen	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0.001

84 rows × 15 columns

Now we will try to identify components that can be taken up by human cells.

Identifying human adsorption¶

To achieve this we will load the Recon3 human model. AGORA and Recon IDs are very similar so we should be able to match them. We just have to adjust the Recon3 ones a bit. We start by identifying all available exchanges in Recon3 and adjusting the IDs.

In [111]:

from cobra.io import read_sbml_model
import pandas as pd

recon3 = read_sbml_model("../data/Recon3D.xml.gz")
exchanges = pd.Series([r.id for r in recon3.exchanges])
exchanges = exchanges.str.replace("__", "_").str.replace("_e$", "", regex=True)
exchanges.head()

Out[111]:

0     EX_5adtststerone
1    EX_5adtststerones
2             EX_5fthf
3             EX_5htrp
4             EX_5mthf
dtype: object

In [112]:

medium = merged.reset_index().copy()
medium["reaction"] = "EX_" + medium.metabolite
medium["dilution"] = 1.0
medium.loc[medium.reaction.isin(exchanges), "dilution"] = 0.1
medium.dilution.value_counts()

Out[112]:

0.1    71
1.0    13
Name: dilution, dtype: int64

In [113]:

medium["metabolite"] = medium.reaction.str.replace("^EX_", "", regex=True) + "_m"
medium["global_id"] = medium.reaction + "(e)"
medium["reaction"] = medium.reaction + "_m"
medium["flux"] = medium.mmol_per_litre * medium.dilution
medium.loc[medium.flux < 1e-4, "flux"] = 1e-4
medium

Out[113]:

	metabolite	name	hmdb	kegg.compound	pubchem.compound	inchi	chebi	retention index	quantitated m/z	Binbase ID	PubChem ID	KEGG ID	InChI Key	ri_type	abundance	mmol_per_litre	reaction	dilution	global_id	flux
0	ala_B_m	beta-alanine	HMDB00056	C00099	NaN	InChI=1S/C3H7NO2/c4-2-1-3(5)6/h1-2,4H2,(H,5,6)	NaN	435564.0	248.0	148.0	239.0	C00099	UCMIRNVEIXFBKS-UHFFFAOYSA-N	Binbase	275.992228	1.000	EX_ala_B_m	0.1	EX_ala_B(e)	0.1000
1	ala_L_m	L-alanine	HMDB00161	C00041	5950.0	InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5...	NaN	243971.0	116.0	34178.0	5950.0	C00041	QNAYBMKLOCPYGJ-REOHCLBHSA-N	Binbase	19154.101036	1.000	EX_ala_L_m	0.1	EX_ala_L(e)	0.1000
2	asp_L_m	L-aspartate(1-)	HMDB00191	C00049	5960.0	InChI=1S/C4H7NO4/c5-2(4(8)9)1-3(6)7/h2H,1,5H2,...	NaN	480387.0	232.0	79.0	5960.0	C00049	CKLJMWTZIZZHCS-REOHCLBHSA-N	Binbase	253.554404	1.000	EX_asp_L_m	0.1	EX_asp_L(e)	0.1000
3	cit_m	Citrate	HMDB00094	C00158	311.0	InChI=1S/C6H8O7/c7-3(8)1-6(13,5(11)12)2-4(9)10...	NaN	617342.0	273.0	288.0	311.0	C00158	KRKNYBCHXYNGOX-UHFFFAOYSA-N	Binbase	2791.619171	1.000	EX_cit_m	0.1	EX_cit(e)	0.1000
4	ddca_m	laurate	HMDB00638	C02679	3893.0	InChI=1S/C12H24O2/c1-2-3-4-5-6-7-8-9-10-11-12(...	NaN	547906.0	117.0	49.0	3893.0	C02679	POULHZVOKOAJMA-UHFFFAOYSA-N	Binbase	384304.502591	1.000	EX_ddca_m	0.1	EX_ddca(e)	0.1000
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
79	gncore2_rl_m	released GlcNAc-alpha-1,4-Core 2	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	1.000	EX_gncore2_rl_m	1.0	EX_gncore2_rl(e)	1.0000
80	core7_m	Core 7	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	1.000	EX_core7_m	0.1	EX_core7(e)	0.1000
81	gchola_m	glycocholate	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	1.000	EX_gchola_m	0.1	EX_gchola(e)	0.1000
82	tchola_m	taurocholate	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	1.000	EX_tchola_m	0.1	EX_tchola(e)	0.1000
83	o2_m	Oxygen	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0.001	EX_o2_m	0.1	EX_o2(e)	0.0001

84 rows × 20 columns

In [114]:

medium[medium.metabolite == "lcts_m"]

Out[114]:

	metabolite	name	hmdb	kegg.compound	pubchem.compound	inchi	chebi	retention index	quantitated m/z	Binbase ID	PubChem ID	KEGG ID	InChI Key	ri_type	abundance	mmol_per_litre	reaction	dilution	global_id	flux
60	lcts_m	Lactose	HMDB00186	C00243	440995.0	NaN	NaN	932179.0	204.0	1373.0	6134.0	C01970	GUBGYTABKSRVRQ-DCSYEGIMSA-N	Binbase	4.425044e+06	388.888889	EX_lcts_m	0.1	EX_lcts(e)	38.888889

Checking the growth medium against the DB¶

But can the bacteria in our model database actually grow on this medium? Let's check and start by downbloading the AGORA model database.

In [115]:

# !wget https://zenodo.org/record/3755182/files/agora103_genus.qza?download=1 -O data/agora103_genus.qza

No we we will check for growth by running the growth medium against any single model.

In [116]:

from micom.workflows.db_media import check_db_medium

check = check_db_medium("../data/agora103_genus.qza", medium, threads=20)

Output()

check now includes the entire manifest plus two new columns: the growth rate and whether the models can grow.

In [117]:

check.can_grow.value_counts()

Out[117]:

False    227
Name: can_grow, dtype: int64

Okay nothing can grow. We probably miss some important cofactor such as manganese or copper.

Let's complete the medium so that all taxa in AGORA can grow at a rate of at least 1e-3.

Supplementing a growth medium from a skeleton¶

Sometimes you may start from a few componenents and will want to complete this skeleton medium to reach a certain minimum growth rate across all models in the database. This can be done with complete_db_medium. We can minimize either the added total flux, mass or presence of any atom. Since, we want to build a low carb diet here we will minimize the presence of added carbon.

In [118]:

from micom.workflows.db_media import complete_db_medium

manifest, imports = complete_db_medium("../data/agora103_genus.qza", medium, growth=0.01, threads=20, max_added_import=10, weights="mass")

Output()

In [119]:

manifest.can_grow.value_counts()

Out[119]:

True     225
False      2
Name: can_grow, dtype: int64

manifest is the amended manifest as before and imports contains the used import fluxes for each model. A new column in the manifest also tells us how many import were added.

In [120]:

manifest.added.describe()

Out[120]:

count    225.000000
mean      16.511111
std        6.705615
min        6.000000
25%       12.000000
50%       15.000000
75%       21.000000
max       38.000000
Name: added, dtype: float64

From this we build up our new medium.

In [121]:

fluxes = imports.max()
fluxes = fluxes[(fluxes > 1e-6) | fluxes.index.isin(medium.reaction)]
completed = pd.DataFrame({
    "reaction": fluxes.index,
    "metabolite": fluxes.index.str.replace("^EX_", "", regex=True),
    "global_id": fluxes.index.str.replace("_m$", "(e)", regex=True),
    "flux": fluxes
})
completed.shape

Out[121]:

(182, 4)

Let's also export the medium as Qiime 2 artifact which can be read with q2-micom or the normal micom package.

In [122]:

from qiime2 import Artifact

arti = Artifact.import_data("MicomMedium[Global]", completed)
arti.save("../media/breast_milk_agora.qza")

Out[122]:

'../media/breast_milk_agora.qza'

Validation¶

As a last step we validate the created medium.

In [123]:

check = check_db_medium("../data/agora103_genus.qza", completed, threads=20)
check.can_grow.value_counts()

Output()

Out[123]:

True    227
Name: can_grow, dtype: int64

In [124]:

check.growth_rate.describe()

Out[124]:

count    227.000000
mean       0.019931
std        0.011181
min        0.000048
25%        0.010000
50%        0.019846
75%        0.025641
max        0.062186
Name: growth_rate, dtype: float64