FDT in climate science (fluctuation-dissipation theorem)

FDT: The use of the fluctuation-dissipation theorem can be traced back to the annus mirabilis of 1905, the year Albert Einstein published the legendary papers that changed how we see the world. However, I am not going to talk about relativity. Instead, this project will focus on FDT, a theorem that was applied to explain Brownian motion (and other problems in statistical mechanics). FDT goes by many names in climate science, and you might be able to name a few of them, such as (1) linear response function, (2) Jacobian matrix, (3) Green's function, (4) linear inverse model, or (5) Markov chain.

Before I jump into the details of FDT, I would like to give you a general picture of the physical assumptions behind different dynamical models (or dynamical systems). All of the dynamical models we use today are based on different physical assumptions. For example, when we use a global climate model with cumulus parameterization, we are assuming a quasi-equilibrium state of short-lived cumulus clouds, which nearly strike a balance with the large-scale tendency. When we use a cloud-permitting model, we can use vertical wind shear and static stability to approximate the large eddies in the boundary layer (i.e., a boundary layer scheme). In addition, in a large eddy simulation (cloud resolving), we assume the physical processes (e.g., condensation) are relatively fast compared to the lifecycle of clouds and turbulence. Thus, when we predict the "resolved-scale" systems (i.e., those that can be seen by the grid), the "unresolved scales" are parameterized. The good news is, most of the time, these assumptions are quite robust (except for some cases crossing the gray zone between scales). This indicates that if we want to forecast the large-scale systems, we don't need a deterministic forecast for the small-scale systems. Instead, what we need is the statistics of these systems (i.e., the parameterization).

Then... what is the assumption of FDT? FDT can be regarded as a reduced-dimension dynamical model, in which the time series of normal modes are predicted. Typically, these normal modes are red in spectrum (i.e., Gaussian or linear), while the high-frequency systems (those that can't be described by the normal modes) are white in spectrum (i.e., a random process). Just like the physical parameterizations mentioned above, these high-frequency systems are parameterized in FDT. Thus, what we really care about is when the predictable signal (i.e., the normal modes) is greater than the stochastic process (i.e., the unpredictable components).

Here, I use a very simple ODE as an example.

$ \begin{equation} \frac{d}{dt}\mathbf x= \mathbf A \mathbf x+\mathbf\epsilon \label{eq:1}\tag{1} \end{equation} $

In equation $\ref{eq:1}$, $\mathbf{x}$ is the predictable component (i.e., the time series of normal modes), which has a dimension of $n\times1$, and $\epsilon$ is the stochastic part (i.e., random white noise) with the same dimension as $\mathbf{x}$. If we drop the unpredictable part, we have the general solution of $\ref{eq:1}$.
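To build some intuition for equation $\ref{eq:1}$, here is a minimal numerical sketch: a hypothetical 2x2 stable matrix stands in for $\mathbf A$ (the values are purely illustrative, not from any climate dataset), and the system is stepped forward with a simple Euler-Maruyama scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2x2 stable matrix standing in for A (eigenvalues -0.5 +/- i):
# the deterministic part decays and oscillates (red), the forcing is white.
A = np.array([[-0.5,  1.0],
              [-1.0, -0.5]])

# Euler-Maruyama integration of Eq. (1): dx/dt = A x + eps.
dt, n_steps = 0.01, 200_000
noise = rng.standard_normal((n_steps, 2)) * np.sqrt(dt)  # white-noise increments
traj = np.empty((n_steps, 2))
x = np.zeros(2)
for k in range(n_steps):
    x = x + A @ x * dt + noise[k]
    traj[k] = x

# The trajectory wanders but stays bounded: damping balances the forcing.
print(traj[-3:])
```

The red (slowly decorrelating) part of the signal comes entirely from $\mathbf A$; the white-noise increments carry no memory from one step to the next.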

$ \begin{equation} \mathbf{x}_{\tau}=e^{\mathbf A \tau} \mathbf{x}_{0} \label{eq:2}\tag{2} \end{equation} $

In equation $\eqref{eq:2}$, $e^{\mathbf A \tau}$ is the so-called "propagator", which carries the information of $\mathbf{x}$ from $t=0$ to $t=\tau$ and has a dimension of $n\times n$. If you still remember your high school mathematics, you will find that the $e^{\mathbf A \tau}$ matrix acts just like the matrices we use for coordinate transforms (i.e., a Jacobian matrix). For example, if you have a vector $(a,b)$ and you want to rotate this vector counter-clockwise by an angle $\theta$, what does the formulation look like?

$ \begin{equation} \begin{bmatrix} a'\\ b' \end{bmatrix}= \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} a\\ b \end{bmatrix} \label{eq:3}\tag{3} \end{equation} $

$\mathbf{x}_{0}$ is just like the vector $(a,b)$, $\mathbf{x}_{\tau}$ is similar to the vector $(a',b')$, and $e^{\mathbf A \tau}$ is the matrix we use for the coordinate transform. Similar formulations can be found in many places, such as statistical tests, the Fourier transform, the Laplace transform, etc. (I will write another discussion about these analogs).
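The analogy between the propagator and a rotation matrix can be made exact: for an antisymmetric generator, the matrix exponential in $\eqref{eq:2}$ is literally the rotation matrix in $\eqref{eq:3}$. A quick numpy/scipy check:

```python
import numpy as np
from scipy.linalg import expm

theta = np.pi / 3  # rotate counter-clockwise by 60 degrees

# The generator of 2-D rotations is an antisymmetric matrix...
J = np.array([[0.0, -1.0],
              [1.0,  0.0]])

# ...and its matrix exponential reproduces the rotation matrix of Eq. (3).
R = expm(theta * J)
R_explicit = np.array([[np.cos(theta), -np.sin(theta)],
                       [np.sin(theta),  np.cos(theta)]])

print(np.allclose(R, R_explicit))  # True
```

In the climate problem $\mathbf A$ also has decaying (negative real) eigenvalue parts, so $e^{\mathbf A \tau}$ both rotates and damps the state vector rather than purely rotating it.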

Apparently, the key is to derive the propagator $e^{\mathbf A \tau}$, which enables us to develop this simplified model. I will provide the details in the following sections.


Developing a simple climate model from observational data

To solve for $e^{\mathbf A \tau}$ in $\eqref{eq:2}$ numerically, we can use an inverse approach (of course, the forward approach is another option as well, which is computationally less expensive but has some drawbacks). The first step in solving for $e^{\mathbf A \tau}$ is to multiply both sides of $\eqref{eq:2}$ by $\mathbf{x}_{0}^T$.

$ \begin{equation} \mathbf{x}_{\tau}\mathbf{x}_{0}^{T}=e^{\mathbf A \tau} \mathbf{x}_{0} \mathbf{x}_{0}^{T} \label{eq:4} \tag{4} \end{equation} $

then we can have

$ \begin{equation} \mathbf{C}_\tau=e^{\mathbf A \tau} \mathbf{C}_0 \label{eq:5}\tag{5} \end{equation} $

where $\mathbf{x}_{\tau}\mathbf{x}_{0}^{T}=\mathbf{C}_\tau$ and $\mathbf{x}_{0} \mathbf{x}_{0}^{T}=\mathbf{C}_0$ (in practice, these products are averaged over samples). $\mathbf{C}_\tau$ and $\mathbf{C}_0$ are the covariance matrices of $\mathbf{x}$ between (1) $t=\tau$ and $t=0$ and (2) $t=0$ and $t=0$. Now, we can see that all three matrices in $\eqref{eq:5}$ have a dimension of $n\times n$. This means we can invert any matrix in $\eqref{eq:5}$ directly, as long as its determinant is not 0. Luckily, using normal modes ensures that the determinant of $\mathbf{C}_0$ won't be 0. Thus, multiplying both sides of $\eqref{eq:5}$ by the inverse of $\mathbf{C}_0$, we can ultimately get $e^{\mathbf A \tau}$. Woohoo! (Looks easy, right?)

$ \begin{equation} \mathbf{C}_\tau \mathbf{C}_0^{-1}=e^{\mathbf A \tau} \label{eq:6}\tag{6} \end{equation} $
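The recipe in $\eqref{eq:6}$ can be checked end to end on synthetic data: simulate equation $\ref{eq:1}$ with a known (purely illustrative) $\mathbf A$, form the sample covariance matrices, and compare $\mathbf{C}_\tau \mathbf{C}_0^{-1}$ against the exact matrix exponential.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)

# Hypothetical stable dynamics standing in for A in Eq. (1); illustration only.
A = np.array([[-0.5,  1.0],
              [-1.0, -0.5]])

# Simulate Eq. (1) with an Euler-Maruyama step.
dt, n_steps = 0.01, 400_000
noise = rng.standard_normal((n_steps, 2)) * np.sqrt(dt)
traj = np.empty((n_steps, 2))
x = np.zeros(2)
for k in range(n_steps):
    x = x + A @ x * dt + noise[k]
    traj[k] = x

lag = 100                        # tau = lag * dt = 1.0
X0, Xt = traj[:-lag], traj[lag:]

# Sample estimates of the lagged and zero-lag covariance matrices.
C_tau = Xt.T @ X0 / len(X0)
C_0 = X0.T @ X0 / len(X0)

# Inverse estimate of the propagator, Eq. (6), vs. the exact expm(A*tau).
G_est = C_tau @ np.linalg.inv(C_0)
G_true = expm(A * lag * dt)
print(np.abs(G_est - G_true).max())  # small sampling/discretization error
```

With a long enough record, the estimated propagator converges to $e^{\mathbf A \tau}$; with short records, the sampling noise in the covariances dominates.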


Limitation of inverse modeling: the $\tau$ test

Although the steps for deriving $e^{\mathbf A \tau}$ look simple, there is still a very important criterion. Let's go back to the assumption at the very beginning. From $\eqref{eq:1}$ to $\eqref{eq:2}$, we dropped $\epsilon$. The reason we can drop this term is that...

$ \begin{equation} (\frac{d}{dt}\mathbf x)\mathbf x^{T}= \mathbf A \mathbf x\mathbf x^{T}+\mathbf\epsilon \mathbf x^{T}\label{eq:7} \tag{7} \end{equation} $

the last term of $\eqref{eq:7}$ is nearly zero because $\epsilon$ is white in spectrum (multiplying the state by a white-noise term and averaging gives a mean of zero, or nearly zero). This indicates that no information about $\epsilon$ is passed from the former time step to the next time step. On the other hand, everything left (i.e., the predictable part, $\mathbf{x}$) should be red. This assumption gives us the following way to test our data.
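A scalar toy version makes the argument concrete (an AR(1) analog of equation $\ref{eq:1}$, not real data): the white-noise forcing is uncorrelated with the earlier state, while the red signal correlates strongly with its own past.

```python
import numpy as np

rng = np.random.default_rng(2)

# A red AR(1) signal x driven by white noise eps (scalar analog of Eq. (1)).
n = 200_000
eps = rng.standard_normal(n)
x = np.empty(n)
x[0] = 0.0
for k in range(1, n):
    x[k] = 0.9 * x[k - 1] + eps[k]

# eps at step k is independent of the state at the previous step, so the
# sample average of eps * x (the analog of the eps x^T term) is near zero...
cross = np.mean(eps[1:] * x[:-1])
# ...while the red signal is strongly correlated with its own past.
auto = np.mean(x[1:] * x[:-1])
print(cross, auto)
```

This is exactly why the $\epsilon\,\mathbf x^T$ term in $\eqref{eq:7}$ averages out while the covariance of the predictable part survives.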

Specifically, we can examine whether a dataset can be used in FDT by checking the diagonal elements of $\mathbf{C}_\tau \mathbf{C}_0^{-1}$ in $\eqref{eq:6}$. If the general solution in $\eqref{eq:2}$ holds for every $\tau$, we will have only one $\mathbf{A}$ matrix regardless of the value of $\tau$, which means...

$ \begin{equation} \mathrm{tr}(\mathbf A)=\mathrm{tr}\left(\ln(\mathbf{C}_\tau \mathbf{C}_0^{-1})\right)/\tau=\mathrm{const} \label{eq:8} \tag{8} \end{equation} $

If we find that $\eqref{eq:8}$ doesn't hold for every $\tau$, we can't apply FDT to the given dataset.
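The $\tau$ test can be sketched numerically: simulate equation $\ref{eq:1}$ with a known illustrative $\mathbf A$ (here $\mathrm{tr}(\mathbf A)=-1$), then evaluate the quantity in $\eqref{eq:8}$ at several lags and check that it stays roughly constant.

```python
import numpy as np
from scipy.linalg import logm

rng = np.random.default_rng(3)

# Hypothetical stable dynamics standing in for A in Eq. (1); tr(A) = -1.
A = np.array([[-0.5,  1.0],
              [-1.0, -0.5]])

# Simulate Eq. (1) with an Euler-Maruyama step.
dt, n_steps = 0.01, 400_000
noise = rng.standard_normal((n_steps, 2)) * np.sqrt(dt)
traj = np.empty((n_steps, 2))
x = np.zeros(2)
for k in range(n_steps):
    x = x + A @ x * dt + noise[k]
    traj[k] = x

def trace_estimate(lag):
    """tr(ln(C_tau C_0^{-1})) / tau, the quantity in Eq. (8)."""
    X0, Xt = traj[:-lag], traj[lag:]
    C_tau = Xt.T @ X0 / len(X0)
    C_0 = X0.T @ X0 / len(X0)
    G = C_tau @ np.linalg.inv(C_0)
    return np.trace(logm(G)).real / (lag * dt)

# If Eq. (1) really describes the data, these should all be close to tr(A) = -1.
traces = [trace_estimate(lag) for lag in (50, 100, 200)]
print(traces)
```

For real climate data the estimates drift with $\tau$ once the linear-plus-white-noise assumption breaks down, which is exactly the failure the test is designed to catch.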
