My research involves visiting and sampling hot springs of Yellowstone National Park (YNP), USA and elsewhere around the world.
Specifically, I sample and study heat-loving, or "thermophilic", microbes that form communities submerged in hot spring water. Communities can grow large enough to view with the naked eye, such as these "pink streamer" communities photographed in the water of Octopus Spring. (In person, these look like billowing clumps of cotton candy stuck to rocks under the water!)
Downstream, thermophile communities can look completely different, like this huge photosynthetic microbial mat at Octopus Spring. It is as squishy and slimy as it looks!
One way that thermophiles have adapted to life in boiling water is by making their outer layer, or cell membrane, resistant to heat. Cell membranes are composed of many individual lipid molecules with different abundances and chemical structures that provide heat-resistance. There are many, many, many different kinds of lipids. Chemical cartoons of two examples are shown below. For fun, try to find a few similarities and differences between them.
The top one is found in some thermophiles that live at around 70 to 90 °C, while the bottom one is found in photosynthetic microbes that live below 70 °C and is also common in plants. These lipids have complicated chemical names that I won't refer to here, but in case you're curious, the top and bottom lipids shown above are called "phosphoaminopentanetetrol acyl ether glycerol" and "sulfoquinovosyl diacylglycerol", respectively.
Note that these lipids commonly have a polar headgroup and a nonpolar tailgroup. The tailgroup commonly has two lipid chains, which are represented by the long zig-zag lines in the cartoon.
A few ways that the chemical structure of lipids can be modified are shown in blue in the cartoon below. Some of these modifications provide heat resistance, such as longer chain lengths, more ether bonds, and fewer double bonds.
When I extract and quantify thermophile lipids in hot spring sediment or biomass samples I look for:
Point 3 requires performing calculations to estimate how much energy it takes for a thermophile to synthesize a particular lipid relative to another. This cost changes depending on the temperature and water chemistry of the habitat.
I hypothesize that microbes adapt their lipid composition to 1) minimize energy cost and 2) maximize lipid function.
In other words, microbes have performed cost-benefit analysis through natural selection: a microbe that evolves a way to save more energy can reproduce faster than its neighbors, therefore making it more competitive. But remember, a given lipid will cost different amounts to make depending on habitat conditions, which could explain why we see different distributions and chemical structures of lipids in different places.
To test this hypothesis, I first sampled thermophile communities living in different temperatures and chemical conditions. Then for each sample, I calculated whether the energy needed to synthesize observed lipids was minimized in the chemical conditions of that sample location relative to other locations.
The remainder of this notebook focuses on the results from one hot spring in YNP, Bison Pool, which was sampled during the summer of 2012. The source pool is pictured below on the left, where hot water comes to the surface at ~89 °C and then flows away via a narrow outflow channel. The photomosaic map of Bison Pool's outflow channel shown below indicates the six sites where I sampled lipids from thermophile biomass.
Lipid identities and abundances are obtained by analyzing biomass extracts via liquid chromatography tandem mass spectrometry. This is a time-intensive process, as hot spring samples like these can produce very complex lipid profiles, commonly with never-before-seen lipids.
Shown in Table 1 (below) are weighted lipid chain properties observed in biomass extracts from the six sample locations marked on the map above. For reference, these properties are shown in the cartoon below Table 1. The column "% ester bond" is the percent of observed lipid chains in a sample with an ester bond (the remainder are overwhelmingly ether bonds). The column "ave. length" shows the average number of carbons per chain (or "kinks" in the zig-zag cartoon chain). The column "ave. double bonds" is the average number of double bonds per chain, and "% hydroxyl" is the percent of observed chains in a sample with a hydroxylation.
Site | % ester bond | ave. length | ave. double bonds | % hydroxyl. |
---|---|---|---|---|
Bison Pool 1 | 27 | 19.8 | 0.16 | 0.9 |
Bison Pool 2 | 42 | 19.4 | 0.18 | 1.4 |
Bison Pool 3 | 74 | 17.9 | 0.17 | 2.1 |
Bison Pool 4 | 55 | 17.3 | 0.17 | 1.5 |
Bison Pool 5 | 85 | 16.8 | 0.27 | 2.6 |
Bison Pool 6 | 97 | 16.7 | 0.41 | 0.3 |
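To make the idea of "weighted" chain properties concrete, here is a minimal sketch in base R. The chain table is made up for illustration (it is not measured data), and the column and function names are hypothetical; it only assumes that each property is averaged using relative abundance as the weight.

```r
# Hypothetical chain-level observations for one sample (illustrative values only).
# Each row is one type of lipid chain with its relative abundance.
chains <- data.frame(
  abundance    = c(0.50, 0.30, 0.20),     # relative abundance, used as weights
  n_carbon     = c(20, 18, 16),           # chain length in carbons
  double_bonds = c(0, 0, 1),              # double bonds per chain
  is_ester     = c(FALSE, TRUE, TRUE),    # ester-bonded (otherwise ether)
  is_hydroxyl  = c(FALSE, FALSE, TRUE)    # carries a hydroxylation
)

# Abundance-weighted properties, in the same form as the columns of Table 1
weighted_chain_properties <- function(d) {
  w <- d$abundance / sum(d$abundance)     # normalize weights to sum to 1
  c(
    pct_ester        = 100 * sum(w * d$is_ester),
    ave_length       = sum(w * d$n_carbon),
    ave_double_bonds = sum(w * d$double_bonds),
    pct_hydroxyl     = 100 * sum(w * d$is_hydroxyl)
  )
}

props <- weighted_chain_properties(chains)
print(props)
```

With real data, each sample would have its own chain table, and one row of Table 1 would come from one call to the function.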
In this work, I focus only on lipid tailgroup properties in my samples, and not headgroup properties, because the scientific literature lacks experimental thermodynamic data for many headgroups and headgroup-like structures. I do have thermodynamic data that allow calculations involving lipid tailgroup chains, but this first requires knowing which types of chains to calculate properties for.
In reality, hundreds of lipid structures are typically observed in a single hot spring sample. Performing thermodynamic calculations on this many lipids across the six sample sites of Bison Pool can get nasty and complicated, so I chose to approach this problem in a different way.
Imagine instead that only a single lipid was found at each sample site, and that this lipid had the average properties shown in Table 1. For Bison Pool site 1, this average lipid would have 27% of an ester bond, a length of 19.8 carbons, 0.16 of a double bond, and 0.01 of a hydroxylation. A cartoon of this average lipid is shown below.
Repeat this process for each of the six samples along Bison Pool, and one ends up with six average lipid representatives. Table 2 (below) shows the calculated chemical formulae of these six average lipids.
Six lipids, rather than hundreds, is much more manageable when it comes to computing thermodynamic properties in the next step.
Site | hypoth. lipid formula |
---|---|
Bison Pool 1 | C19.8H39.8O1.3N0.01 |
Bison Pool 2 | C19.4H38.9O1.4N0.02 |
Bison Pool 3 | C17.9H35.8O1.7N0.02 |
Bison Pool 4 | C17.3H35.1O1.6N0.01 |
Bison Pool 5 | C16.9H33.4O1.8N0.003 |
Bison Pool 6 | C16.7H32.6O2.0N0.003 |
Recall that the goal of this work is to test the hypothesis that thermophiles have naturally selected lipids that minimize energetic cost while maximizing function.
Thermodynamic calculations allow the estimation of the energetic costs of these six average lipid chains relative to each other. If the final results of the calculation show that the average lipids of sites 1, 2, 3, 4, 5, and 6 are most stable in their respective thermal and geochemical conditions, then this lends support to the hypothesis. If, on the other hand, the average lipid of site 6 is most stable in the thermal and geochemical conditions measured at site 1 (for example) then there is a problem with our original hypothesis.
Describing in detail the process behind estimating the thermodynamic properties of the six average representative lipids could fill one or more Jupyter notebooks in itself, so instead I'll keep it simple and brief. I wrote scripts in the R programming language to do the following:
Think of Step 3 like building a Lego set (e.g. the thermodynamic property of an average lipid) out of individual Lego blocks (e.g. the partial thermodynamic contribution of an ester bond). Most of these thermodynamic data come from ORCHYD, an online thermodynamic database of aqueous chemical species.
Parts of the average lipids, thermodynamic estimation strategy, and thermodynamic data used are summarized in Table 3, below. For simplicity's sake, many of the lipid parts mentioned in Table 3, such as "pentane rings" and "half monolayers" haven't been shown in any cartoon in this notebook demo, but exist in nature and have been accounted for in these calculations.
| lipid part | strategy | data used |
|:---|:---|:---|
| straight-chain | linear regression of… | |
| C-C | n-alkanes | C3 to C14 |
| ether | 1-alcohols | C3 to C12 |
| ester | carboxylic acids | C3 to C12 |
| | | |
| modification | add properties of… | |
| unsaturation | 2-alkenes - alkanes | 2-(pent, hex)ene |
| hydroxyl | 2-alcohols - alkanes | 2-hept, 3-(pent, hex, hept), 4-heptanol |
| branch | branched - nonbranched alkanes | 2-methyl(prop, but, pent, hex, oct), 3-methyl(pent, hex, hept), 4-methyloctane |
| pentane ring | cyclopentanes - n-alkanes | (methyl, propyl, pentyl)cyclopentane |
| hexane ring | 1,1,3-trimethylcyclohexane - decane | 1,1,3-trimethylcyclohexane |
| half monolayer | [CH2] - [CH3] | CH2 and CH3 contribution of n-alkanes, C3 to C14 |
| amide | [CONH2] - [COOH] | amino acid backbone contributions |
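The Lego-block strategy in Table 3 can be sketched in a few lines of base R. This is not my actual estimation script: the property values below are synthetic placeholders generated from assumed numbers, and the names `estimate_chain` and `mods` are hypothetical. It only illustrates the two moves in the table, a linear regression against carbon number for the straight chain, followed by adding a contribution for each modification.

```r
# Synthetic property values for n-alkanes C3 to C14 (NOT real thermodynamic data).
# They are generated from an assumed end-group term and an assumed per-carbon step.
n_carbon     <- 3:14
assumed_ends <- -5.0    # hypothetical contribution of the chain ends
assumed_step <- 2.0     # hypothetical contribution per added carbon
property     <- assumed_ends + assumed_step * n_carbon

# "linear regression of n-alkanes": recover the per-carbon contribution
fit        <- lm(property ~ n_carbon)
intercept  <- unname(coef(fit)["(Intercept)"])
per_carbon <- unname(coef(fit)["n_carbon"])

# "add properties of...": hypothetical increments for two modifications
mods <- c(unsaturation = 15.0, hydroxyl = -35.0)

# Estimate a property for a chain with fractional (average) features
estimate_chain <- function(length, n_double, n_hydroxyl) {
  intercept + per_carbon * length +
    n_double   * mods[["unsaturation"]] +
    n_hydroxyl * mods[["hydroxyl"]]
}

# An average chain with 19.8 carbons, 0.16 double bonds, 0.009 hydroxylations
est <- estimate_chain(19.8, 0.16, 0.009)
print(est)
```

Because the building blocks are additive, fractional counts such as 0.16 of a double bond pose no problem: the contribution is simply scaled.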
Shown below are the thermodynamic properties estimated for the average representative lipid of Bison Pool sample site 1 that are relevant to testing the original hypothesis.
These properties are then used to calculate a few extra parameters, called "Helgeson-Kirkham-Flowers (HKF) equation-of-state coefficients", that are needed to perform thermodynamic calculations at temperatures other than 25 °C. This is an important step, considering we are doing calculations for Bison Pool, which is 89 °C at site 1!
These calculations are repeated to obtain thermodynamic properties and HKF coefficients for all six average lipids. Next, we need to compare relative energetic costs of lipids to finally test the hypothesis that thermophiles minimize energetic cost while maximizing lipid function.
Now that the thermodynamic properties of the six average representative thermophile lipids have been calculated, the next step is to compare their energetic costs.
A number of factors influence the energetic cost that a thermophile must pay when making a lipid. One factor is temperature; a particular lipid might cost more to make at a higher temperature than at a lower one, or vice versa. Another is the availability of nutrients. By "nutrients", I refer to the molecules thermophiles "eat" from their environment to make themselves, such as dissolved bicarbonate (HCO3−) as a source of carbon and ammonia (NH3) as a source of nitrogen. The concentration of protons (H+) in the environment determines pH and can also have a huge impact on energetic costs.
Likewise, the oxidation-reduction potential (Eh) of a thermophile habitat influences the energetic cost of creating a more oxidized or reduced lipid. One can visualize this in terms of dissolved oxygen (O2) concentrations along Bison Pool. When hot water bubbles up from underground at the spring source, there is very little dissolved oxygen in the water, which makes chemical reactions that require O2 less favorable in those conditions because of its scarcity. We call these conditions more reducing. As water trickles downstream away from the hot spring source, O2 from the atmosphere begins mixing with the water, increasing its concentration. At the same time, some thermophiles generate O2 metabolically, increasing the concentration of O2 even more downstream. In sites where the oxygen concentration is higher, reactions that require O2 are more favorable. We call these conditions more oxidizing. Rather than using O2 concentrations in these calculations, I've chosen instead to represent Eh in terms of electron (e−) concentrations, though doing it either way yields equivalent results.
Eh receives special attention in this work because there is a huge gradient in Eh between Bison Pool site 1 and site 6, with the most reduced sites upstream (e.g. site 1) and the most oxidized sites downstream (e.g. sites 5 and 6).
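For readers who want to see how Eh relates to an electron "concentration" (strictly, an activity), here is a small sketch. It assumes the standard relation pe = F·Eh / (ln(10)·R·T), with log a(e−) = −pe; the function name `eh_to_log_ae` is hypothetical and is not part of any package used in this notebook.

```r
# Convert Eh (volts) to the log activity of the electron at a given temperature.
# Assumes pe = F * Eh / (ln(10) * R * T) and log a(e-) = -pe.
eh_to_log_ae <- function(Eh_volts, T_celsius) {
  F_const  <- 96485.33212     # Faraday constant, C/mol
  R_const  <- 8.314462618     # gas constant, J/(mol K)
  T_kelvin <- T_celsius + 273.15
  pe <- F_const * Eh_volts / (log(10) * R_const * T_kelvin)
  -pe
}

# Most reduced and most oxidized ends of the Eh range explored later, at 89 degrees C
log_ae_reduced  <- eh_to_log_ae(-0.55, 89)
log_ae_oxidized <- eh_to_log_ae(-0.30, 89)
print(c(reduced = log_ae_reduced, oxidized = log_ae_oxidized))
```

A more negative Eh gives a higher electron activity (roughly 7.65 log units at −0.55 V and 89 °C), which is why the upstream sites are called more reducing.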
An unbalanced example reaction involving these important chemicals has been written below for the synthesis of the average representative lipid from Bison Pool site 1. Note that nutrients (HCO3− and NH3), pH in the form of H+, and Eh in the form of e− are all considered. Water (H2O) is on the right side of the reaction because it is a common byproduct of metabolic reactions and will serve to balance our equation later. Because the average lipid from site 1 is being formed in this reaction, we call this a formation reaction.
Now that the thermodynamic properties of each of the six average lipids and their chemical formulae (Table 2) have been calculated or estimated, it is possible to balance their formation reactions and calculate their energetic costs.
I use a thermodynamic R package called "CHNOSZ" to automate this. Shown below is a full calculation comparing the energetic costs of the six average representative lipids at Bison Pool.
First, I load the necessary packages for calculation and plotting.
My six average representative lipids are then imported into the CHNOSZ thermodynamic database. Their names are Bison1, Bison2, and so on to Bison6 furthest downstream from the hot spring source.
### install necessary R packages
# install.packages('CHNOSZ', repos='http://cran.us.r-project.org')
# install.packages('ggplot2', repos='http://cran.us.r-project.org')
# install.packages('repr', repos='http://cran.us.r-project.org')
### Load packages
library(CHNOSZ)
library(ggplot2)
library(repr)
CHNOSZ version 1.3.2-10 (2019-05-06) reset: creating "thermo" object obigt: loading default database with 1841 aqueous, 3357 total species
### add average lipid properties to CHNOSZ database
suppressMessages(add.obigt("data/OBIGT_bison.csv"))
# check to see lipids have loaded properly into the database
info(info(c("Bison1", "Bison2", "Bison3",
"Bison4", "Bison5", "Bison6")))
| | name | abbrv | formula | state | ref1 | ref2 | date | G | H | S | Cp | V | a1 | a2 | a3 | a4 | c1 | c2 | omega | Z |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> |
3358 | Bison1 | NA | C19.8H39.5N0.0111O1.28 | aq | thermofunc | NA | 11/1/2017 | -7779.977 | -165548.2 | 146.4985 | 395.8124 | 319.0402 | 6.380555 | 4610.164 | 88.93901 | -329282.4 | 387.7237 | 32835.18 | -15373.26 | 0 |
3359 | Bison2 | NA | C19.4H38.6N0.0164O1.42 | aq | thermofunc | NA | 11/1/2017 | -15764.062 | -169001.9 | 150.6434 | 388.9181 | 312.4211 | 6.254139 | 4499.090 | 86.61967 | -321221.1 | 381.0372 | 31963.55 | -15039.45 | 0 |
3360 | Bison3 | NA | C18H35.5N0.0236O1.73 | aq | thermofunc | NA | 11/1/2017 | -36975.259 | -177535.1 | 150.5980 | 361.4238 | 291.4375 | 5.849094 | 4158.048 | 79.53854 | -296386.5 | 354.1030 | 29609.83 | -14152.19 | 0 |
3361 | Bison4 | NA | C17.3H34.7N0.015O1.55 | aq | thermofunc | NA | 11/1/2017 | -28442.862 | -165284.8 | 145.0045 | 350.8409 | 280.6913 | 5.642638 | 3980.866 | 75.87279 | -283555.8 | 343.8761 | 28109.31 | -13597.09 | 0 |
3362 | Bison5 | NA | C16.9H33N0.00259O1.81 | aq | thermofunc | NA | 11/1/2017 | -42450.206 | -172146.0 | 147.9660 | 338.3405 | 272.5637 | 5.481634 | 3859.431 | 73.37821 | -274599.8 | 331.4704 | 27710.01 | -13450.73 | 0 |
3363 | Bison6 | NA | C16.7H31.8N0.0033O1.96 | aq | thermofunc | NA | 11/1/2017 | -44937.702 | -169183.8 | 150.9237 | 329.8841 | 265.7185 | 5.374079 | 3684.549 | 69.73712 | -262889.3 | 324.2365 | 22531.73 | -11602.47 | 0 |
Next, I input the concentrations of nutrients (HCO3− and NH3) and protons (H+) that our research group actually measured in the water of Bison Pool. I have chosen to perform the calculation for the geochemical conditions we measured for Bison Pool site 1 as a starting point.
Note that I do not include a measured value for Eh. The technical reason for this is that there is no single Eh value for water not in redox equilibrium. So instead of defining a single Eh value, Eh is allowed to become an independent variable to show how relative energetic costs of making these average representative lipids changes along an Eh gradient.
# input measured log concentrations of nutrients and pH
basis(c("HCO3-", "H2O", "H+", "e-", "NH3"),
c(-2.3655, 0, -7.235, NA, -5.6034))
| | C | H | N | O | Z | ispecies | logact | state |
|---|---|---|---|---|---|---|---|---|
| | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <int> | <dbl> | <chr> |
HCO3- | 1 | 1 | 0 | 3 | -1 | 13 | -2.3655 | aq |
H2O | 0 | 2 | 0 | 1 | 0 | 1 | 0.0000 | liq |
H+ | 0 | 1 | 0 | 0 | 1 | 3 | -7.2350 | aq |
e- | 0 | 0 | 0 | 0 | -1 | 2 | NA | aq |
NH3 | 0 | 3 | 1 | 0 | 0 | 64 | -5.6034 | aq |
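The `logact` column holds base-10 logarithms, which can be hard to read at a glance. As a rough guide, the sketch below converts them back to linear values, under the simplifying assumption that activity is approximately equal to molal concentration (activity coefficients of 1). The variable names are hypothetical.

```r
# Log activities entered in basis() above, for Bison Pool site 1
log_act <- c("HCO3-" = -2.3655, "H+" = -7.235, "NH3" = -5.6034)

# Back-transform to linear values (assumes activity ~ molality)
molal <- 10^log_act

# pH is the negative log activity of H+
pH <- -log_act[["H+"]]

print(signif(molal, 3))
print(pH)
```

Under that assumption, bicarbonate is present at a few millimolal while ammonia is at a few micromolal, so nitrogen is by far the scarcer nutrient here.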
After that, I call up the six average representative lipids. This table shows how many of each nutrient/proton/electron are required to balance the formation reactions for each average representative lipid.
Note that the rightmost column of each row gives the name of the average lipid, and the five leftmost columns give the numbers of HCO3−, H2O, H+, e−, and NH3 needed to balance its formation reaction. For instance, the average lipid Bison1 requires 19.8 HCO3− in the formation reaction to supply the 19.8 carbons in the lipid structure. Also note the negative sign for H2O, which indicates that water is formed as a byproduct of the lipid formation reaction.
# set 'average lipids' as species of interest
species(c("Bison1", "Bison2", "Bison3", "Bison4", "Bison5", "Bison6"))
HCO3- | H2O | H+ | e- | NH3 | ispecies | logact | state | name |
---|---|---|---|---|---|---|---|---|
<dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <int> | <dbl> | <chr> | <chr> |
19.8 | -58.12 | 135.9067 | 116.10670 | 0.01110 | 3358 | -3 | aq | Bison1 |
19.4 | -56.78 | 132.7108 | 113.31080 | 0.01640 | 3359 | -3 | aq | Bison2 |
18.0 | -52.27 | 121.9692 | 103.96920 | 0.02360 | 3360 | -3 | aq | Bison3 |
17.3 | -50.35 | 118.0550 | 100.75500 | 0.01500 | 3361 | -3 | aq | Bison4 |
16.9 | -48.89 | 113.8722 | 96.97223 | 0.00259 | 3362 | -3 | aq | Bison5 |
16.7 | -48.14 | 111.3701 | 94.67010 | 0.00330 | 3363 | -3 | aq | Bison6 |
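As a sanity check, the Bison1 row of this table can be verified by hand. The sketch below uses only numbers printed above (the basis species compositions, the Bison1 coefficients, and the Bison1 formula C19.8H39.5N0.0111O1.28) and confirms that elements and charge balance. The object names are hypothetical, and the tolerance allows for rounding in the printed values.

```r
# Elemental composition (C, H, N, O) and charge (Z) of each basis species
comp <- matrix(
  c(1, 1, 0, 3, -1,    # HCO3-
    0, 2, 0, 1,  0,    # H2O
    0, 1, 0, 0,  1,    # H+
    0, 0, 0, 0, -1,    # e-
    0, 3, 1, 0,  0),   # NH3
  nrow = 5, byrow = TRUE,
  dimnames = list(c("HCO3-", "H2O", "H+", "e-", "NH3"),
                  c("C", "H", "N", "O", "Z"))
)

# Coefficients for Bison1 from the species() table
# (positive = consumed, negative = produced)
coeffs <- c("HCO3-" = 19.8, "H2O" = -58.12, "H+" = 135.9067,
            "e-" = 116.1067, "NH3" = 0.0111)

# Composition of the Bison1 average lipid (neutral, so Z = 0)
lipid <- c(C = 19.8, H = 39.5, N = 0.0111, O = 1.28, Z = 0)

# Each row of comp is scaled by its coefficient, then summed per element
supplied <- colSums(comp * coeffs)
residual <- supplied - lipid
print(round(residual, 6))
```

All residuals come out at zero within rounding, so the reaction 19.8 HCO3− + 0.0111 NH3 + 135.9067 H+ + 116.1067 e− → Bison1 + 58.12 H2O is balanced.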
Finally, I perform the thermodynamic calculation that compares the relative energetic costs of making each of the six average lipids in the geochemical conditions and temperature we measured at Bison Pool (site 1).
# set resolution to 300 calculations
res <- 300
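# calculate hyp. lipid affinities of formation at 89 degrees C
# across `res` evenly spaced Eh values from -0.55 to -0.30 V
a <- affinity(Eh=c(-0.55, -0.30, res), T=89.0)
# calculate equilibrium chemical activities of lipids from affinities,
# balancing the reactions on one mole of each lipid (balance=1)
e <- equilibrate(a, balance=1)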
affinity: temperature is 89 C affinity: pressure is Psat affinity: variable 1 is Eh at 300 values from -0.55 to -0.3 V subcrt: 11 species at 362.15 K and 1 bar (wet) balance: from numeric argument value equilibrate: n.balance is 1 1 1 1 1 1 equilibrate: loga.balance is -2.22184874961636 equilibrate: using boltzmann method
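The messages above mention a "boltzmann method" and a `loga.balance` value. The sketch below is a simplified picture of what that step does, not the CHNOSZ source code: it assumes each lipid is balanced on one mole (as with `balance=1`) and that affinities are expressed as dimensionless base-10 values, A/(2.303RT). The affinities used are made-up numbers and the function name is hypothetical.

```r
# Relative abundances from dimensionless affinities (base-10), Boltzmann-style:
# fraction_i = 10^A_i / sum_j(10^A_j)
boltzmann_fractions <- function(A) {
  w <- 10^(A - max(A))   # subtract the maximum to avoid numerical overflow
  w / sum(w)
}

# Hypothetical affinities for three species (illustrative values only)
A_hyp <- c(Bison1 = 2.0, Bison2 = 1.0, Bison3 = 0.0)
frac  <- boltzmann_fractions(A_hyp)
print(round(100 * frac, 2))   # "percent formation"

# loga.balance in the messages above: six lipids, each at log activity -3
loga_balance <- log10(6 * 10^-3)
print(loga_balance)
```

A difference of one log unit in affinity translates to a tenfold difference in relative abundance, which is why the curves in the plot switch over fairly sharply.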
With the thermodynamic calculation complete, all that remains is to plot the output using the ggplot2 package.
# prepare 'average lipid' speciation output for plotting:
x <- round(seq(from=-0.55, to=-0.30, length.out=res), 5) # Eh
y1 <- round(10^e$loga.equil[[1]]/10^e$loga.balance, 5)*100 # Bison1
y2 <- round(10^e$loga.equil[[2]]/10^e$loga.balance, 5)*100 # Bison2
y3 <- round(10^e$loga.equil[[3]]/10^e$loga.balance, 5)*100 # Bison3
y4 <- round(10^e$loga.equil[[4]]/10^e$loga.balance, 5)*100 # Bison4
y5 <- round(10^e$loga.equil[[5]]/10^e$loga.balance, 5)*100 # Bison5
y6 <- round(10^e$loga.equil[[6]]/10^e$loga.balance, 5)*100 # Bison6
df <- data.frame(cbind(x, y1, y2, y3, y4, y5, y6)) # bind output into a data frame
# plotly color palette
pal <- c("#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd", "#8c564b")
# create ggplot
p <- ggplot(data=df) +
ggtitle(expression("Formation of average representative lipids of Bison Pool, 89"*~degree*C)) +
theme(panel.background=element_blank()) +
labs(x="Eh, volts", y="percent formation") +
geom_line(aes(x=x, y=y1, colour="Bison1"), size=1) +
geom_line(aes(x=x, y=y2, colour="Bison2"), size=1) +
geom_line(aes(x=x, y=y3, colour="Bison3"), size=1) +
geom_line(aes(x=x, y=y4, colour="Bison4"), size=1) +
geom_line(aes(x=x, y=y5, colour="Bison5"), size=1) +
geom_line(aes(x=x, y=y6, colour="Bison6"), size=1) +
scale_colour_manual(name="Lipid", values=pal)
# Change plot size to 7 x 3
options(repr.plot.width=7, repr.plot.height=3)
# Display plot
print(p)
Try out an interactive plotly version of this plot.
As mentioned previously, Eh is allowed to be our independent variable (x-axis) because we are interested in how the relative energetic costs of the six average lipids change along an Eh gradient, such as the one downstream at Bison Pool. Eh values further to the left side of the x-axis are more reduced (e.g. conditions close to the hot spring source), while values further to the right are more oxidized (e.g. conditions further downstream).
Changes in Eh result in the rise and fall of six colored lines, which correspond to our six average representative lipids. The y-axis is "percent formation", a measure of relative thermodynamic stability: a higher value means a lipid is more thermodynamically stable than the others and therefore requires less energy to produce via its formation reaction. For any given Eh, the percent formation of all six average lipids adds up to 100; for instance, at the Eh where the purple and brown lines cross, the most energetically favorable lipid composition is 50% Bison5 and 50% Bison6.
According to the diagram above, the most favorable configuration of lipids at the most reduced Eh is represented by average lipid Bison1, which was sampled at the hottest, most reduced site upstream. This agrees with the original hypothesis. A result that would challenge our hypothesis would show an average lipid representative of a downstream site, e.g. Bison5 or Bison6, as the most stable lipid at the most reducing Eh. Likewise, if Bison1 or Bison2 were most stable at oxidizing Eh relative to the other lipids, this would challenge our hypothesis. But because the dominant average lipid transitions from Bison1 to Bison6 from most reduced to most oxidized Eh, or left to right on the x-axis, these results support the original hypothesis.
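The visual argument above can also be phrased as a simple check: find the dominant lipid at each Eh step and confirm that the site number never decreases from reduced to oxidized conditions. The sketch below uses two small made-up matrices (rows are Eh steps from reduced to oxidized, columns are lipids ordered from upstream to downstream); the function names are hypothetical.

```r
# Index of the lipid with the highest percent formation at each Eh step
dominant_lipid <- function(pct) apply(pct, 1, which.max)

# TRUE if the dominant lipid only moves downstream as Eh becomes more oxidized
is_ordered <- function(idx) all(diff(idx) >= 0)

# Illustrative pattern consistent with the hypothesis (made-up numbers)
pct_supporting <- rbind(c(90, 10,  0),
                        c(30, 60, 10),
                        c( 5, 25, 70))

# Illustrative pattern that would challenge it (made-up numbers)
pct_challenging <- rbind(c( 0, 10, 90),
                         c(30, 60, 10),
                         c(70, 25,  5))

print(is_ordered(dominant_lipid(pct_supporting)))
print(is_ordered(dominant_lipid(pct_challenging)))
```

To apply this to the real output, one could build the matrix from the `y1` to `y6` columns of `df`, since those are already ordered by Eh and by site.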