Here we will build a medium based on the anticipated metabolites present in the human gut for a meal matched to the Chepang people in Nepal. Please be aware that this is unlikely to represent the full diversity of the diet.
The composition of the meal is the following:
This sums up to 1020.5 kcal.
We will start by reading individual tables for the specific foods and scale the abundances.
import pandas as pd
meal = {
"Mountain yam": 450,
"Cowpea": 250,
"Himalayan Stinging Nettle": 500,
"Chicken": 100
}
foods = []
for food in meal:
mets = pd.read_excel("../data/foods_diets.xlsx", food)
mets["amount_g"] = mets.relative_abundance / mets.relative_abundance.sum() * meal[food]
mets["concentration"] = mets["amount_g"] / mets["mw"] * 1000.0 # to yield mmol/meal
mets["flux"] = mets["concentration"] / 8.0 # to yield mmol/h
mets["food"] = food
foods.append(mets)
foods = pd.concat(foods)
Now we combine the data.
foods.loc[foods.metabolite == "h2o", "flux"] += 1000.0 / 18.01 * 1000.0 / 8.0 # add 1L water
diet = foods.dropna(subset=["metabolite"]).groupby("metabolite").flux.sum().reset_index()
diet.head()
metabolite | flux | |
---|---|---|
0 | ala_L | 16.966227 |
1 | arg_L | 13.388907 |
2 | ascb_L | 0.030642 |
3 | asn_L | 5.593654 |
4 | asp_L | 19.277857 |
To achieve this we will load the Recon3 human model. AGORA and Recon IDs are very similar so we should be able to match them. We just have to adjust the Recon3 ones a bit. We start by identifying all available exchanges in Recon3 and adjusting the IDs.
from cobra.io import read_sbml_model
import pandas as pd
recon3 = read_sbml_model("../data/Recon3D.xml.gz")
exchanges = pd.Series([r.id for r in recon3.exchanges])
exchanges = exchanges.str.replace("__", "_").str.replace("_e$|EX_", "", regex=True)
exchanges.head()
0 5adtststerone 1 5adtststerones 2 5fthf 3 5htrp 4 5mthf dtype: object
diet["dilution"] = 1.0
diet.loc[diet.metabolite.isin(exchanges), "dilution"] = 0.2
diet["flux"] = diet["flux"] * diet["dilution"]
diet[["metabolite", "dilution"]].drop_duplicates().dilution.value_counts()
0.2 47 1.0 9 Name: dilution, dtype: int64
Finally we add the host metabolites such as primary bile acids and mucins and a minuscule amount of oxygen.
diet.set_index("metabolite", inplace=True)
annotations = pd.read_csv("../data/agora_metabolites.csv")
# mucin
for met in annotations.loc[annotations.metabolite.str.contains("core"), "metabolite"]:
diet.loc[met, "flux"] = 1
# primary BAs
for met in ["gchola", "tchola"]:
diet.loc[met, "flux"] = 1
# anaerobic
diet.loc["o2", "flux"] = 0.001
diet.reset_index(inplace=True)
diet["reaction"] = "EX_" + diet.metabolite + "(e)"
diet
metabolite | flux | dilution | reaction | |
---|---|---|---|---|
0 | ala_L | 3.393245 | 0.2 | EX_ala_L(e) |
1 | arg_L | 2.677781 | 0.2 | EX_arg_L(e) |
2 | ascb_L | 0.006128 | 0.2 | EX_ascb_L(e) |
3 | asn_L | 1.118731 | 0.2 | EX_asn_L(e) |
4 | asp_L | 3.855571 | 0.2 | EX_asp_L(e) |
... | ... | ... | ... | ... |
67 | gncore2_rl | 1.000000 | NaN | EX_gncore2_rl(e) |
68 | core7 | 1.000000 | NaN | EX_core7(e) |
69 | gchola | 1.000000 | NaN | EX_gchola(e) |
70 | tchola | 1.000000 | NaN | EX_tchola(e) |
71 | o2 | 0.001000 | NaN | EX_o2(e) |
72 rows × 4 columns
And we will merge this table with some annotations to make it more accessible.
skeleton = pd.merge(diet, annotations, on="metabolite")
skeleton["global_id"] = skeleton.reaction
skeleton["reaction"] = "EX_" + skeleton.metabolite + "_m"
skeleton.head()
metabolite | flux | dilution | reaction | name | hmdb | kegg.compound | pubchem.compound | inchi | chebi | global_id | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | ala_L | 3.393245 | 0.2 | EX_ala_L_m | L-alanine | HMDB00161 | C00041 | 5950.0 | InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5... | NaN | EX_ala_L(e) |
1 | arg_L | 2.677781 | 0.2 | EX_arg_L_m | L-argininium(1+) | HMDB00517 | C00062 | 6322.0 | InChI=1S/C6H14N4O2/c7-4(5(11)12)2-1-3-10-6(8)9... | NaN | EX_arg_L(e) |
2 | ascb_L | 0.006128 | 0.2 | EX_ascb_L_m | L-ascorbate | HMDB00044 | C00072 | NaN | InChI=1S/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/... | NaN | EX_ascb_L(e) |
3 | asn_L | 1.118731 | 0.2 | EX_asn_L_m | L-asparagine | HMDB00168 | C00152 | 6267.0 | InChI=1S/C4H8N2O3/c5-2(4(8)9)1-3(6)7/h2H,1,5H2... | NaN | EX_asn_L(e) |
4 | asp_L | 3.855571 | 0.2 | EX_asp_L_m | L-aspartate(1-) | HMDB00191 | C00049 | 5960.0 | InChI=1S/C4H7NO4/c5-2(4(8)9)1-3(6)7/h2H,1,5H2,... | NaN | EX_asp_L(e) |
Great we now have a pretty good skeleton. One issue that this will never be fully complete. There will always be some components missing that are essential for microbial growth. Fortunately, we provide a algorithm in MICOM to complete a medium with the smallest set of additional components to provide growth to all intestinal taxa.
from micom.workflows.db_media import complete_db_medium
manifest, imports = complete_db_medium(
"../data/agora103_strain.qza",
medium=skeleton,
growth=0.1,
threads=12,
max_added_import=10, # do not add more than 10 mmol/h of flux per component
strict=["EX_o2(e)"], # force anaerobic environment
weights="mass" # minimize added molecular weight
)
Completing 818 strain-level models on a medium with 72 components (1 strict).
Output()
Obtained growth for 815 models adding additional flux of 0.62/6763.81 on average.
manifest.can_grow.value_counts()
True 815 False 3 Name: can_grow, dtype: int64
filled = imports.max()
added = filled.sum() - skeleton.loc[skeleton.reaction.isin(filled.index), "flux"].sum()
print(f"Added flux is {added.sum():.2f}/{filled.sum():.2f} mmol/h.")
Added flux is 119.80/6969.38 mmol/h.
Let's see what was added in large amounts.
added_fluxes = filled.copy()
shared = added_fluxes.index[added_fluxes.index.isin(skeleton.reaction)]
added_fluxes[shared] -= skeleton.flux[shared]
added_fluxes.sort_values(ascending=False)[0:20]
EX_h_m 10.000000 EX_glyleu_m 10.000000 EX_h2_m 10.000000 EX_glyc_m 8.883167 EX_no_m 7.881353 EX_ser_L_m 7.426290 EX_for_m 7.181399 EX_asn_L_m 5.312592 EX_urea_m 4.965212 EX_ac_m 4.587295 EX_fru_m 4.524197 EX_cytd_m 4.458517 EX_mal_L_m 3.850456 EX_co2_m 3.453552 EX_no3_m 3.413730 EX_ph2s_m 3.074153 EX_nh4_m 2.765672 EX_no2_m 2.182533 EX_glyc3p_m 1.819979 EX_fum_m 1.686218 dtype: float64
Looks okay. So we will now assemble the final medium. For this we add the new components to each sample and rebuild the annotations for a nicely formatted medium.
added_df = filled[filled > 1e-8].reset_index()
added_df.iloc[:, 0] = added_df.iloc[:, 0].str.replace("EX_|_m$", "", regex=True)
added_df.columns = ["metabolite", "flux"]
completed = pd.merge(added_df, annotations, on="metabolite", how="left")
completed["reaction"] = "EX_" + completed.metabolite + "_m"
completed["global_id"] = "EX_" + completed.metabolite + "(e)"
completed
metabolite | flux | name | hmdb | kegg.compound | pubchem.compound | inchi | chebi | reaction | global_id | |
---|---|---|---|---|---|---|---|---|---|---|
0 | ala_L | 3.393245 | L-alanine | HMDB00161 | C00041 | 5950.0 | InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5... | NaN | EX_ala_L_m | EX_ala_L(e) |
1 | arg_L | 3.769624 | L-argininium(1+) | HMDB00517 | C00062 | 6322.0 | InChI=1S/C6H14N4O2/c7-4(5(11)12)2-1-3-10-6(8)9... | NaN | EX_arg_L_m | EX_arg_L(e) |
2 | asn_L | 6.431323 | L-asparagine | HMDB00168 | C00152 | 6267.0 | InChI=1S/C4H8N2O3/c5-2(4(8)9)1-3(6)7/h2H,1,5H2... | NaN | EX_asn_L_m | EX_asn_L(e) |
3 | asp_L | 3.855571 | L-aspartate(1-) | HMDB00191 | C00049 | 5960.0 | InChI=1S/C4H7NO4/c5-2(4(8)9)1-3(6)7/h2H,1,5H2,... | NaN | EX_asp_L_m | EX_asp_L(e) |
4 | but | 0.000322 | butyrate | HMDB00039 | C00246 | 264.0 | InChI=1S/C4H8O2/c1-2-3-4(5)6/h2-3H2,1H3,(H,5,6... | NaN | EX_but_m | EX_but(e) |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
164 | indole | 0.005516 | Indole | NaN | NaN | NaN | NaN | NaN | EX_indole_m | EX_indole(e) |
165 | cmp | 0.008410 | CMP | HMDB00095 | C00055 | 6131.0 | NaN | NaN | EX_cmp_m | EX_cmp(e) |
166 | coa | 0.000619 | Coenzyme A | HMDB01423 | C00010 | 6816.0 | NaN | NaN | EX_coa_m | EX_coa(e) |
167 | datp | 0.002391 | dATP | HMDB01532 | C00131 | 15993.0 | NaN | NaN | EX_datp_m | EX_datp(e) |
168 | urea | 4.965212 | Urea | HMDB00294 | C00086 | 1176.0 | InChI=1S/CH4N2O/c2-1(3)4/h(H4,2,3,4) | NaN | EX_urea_m | EX_urea(e) |
169 rows × 10 columns
And we will now validate whether the medium works.
from micom.workflows.db_media import check_db_medium
check = check_db_medium("../data/agora103_strain.qza", medium=completed, threads=12)
Checking 818 strain-level models on a medium with 169 components.
Output()
check.growth_rate.describe()
count 818.000000 mean 0.191422 std 0.109046 min 0.036577 25% 0.100000 50% 0.156764 75% 0.256409 max 0.646666 Name: growth_rate, dtype: float64
import qiime2 as q2
arti = q2.Artifact.import_data("MicomMedium[Global]", completed)
arti.save("../media/himalaya.qza")
'../media/himalaya.qza'