Hi! This is my first blog post on Gaussian mixture reduction and its applications.

What is a Gaussian mixture?

A Gaussian mixture is a family of distributions. Most people are familiar with the Gaussian distribution, a bell-shaped distribution that is widely used in practice. For example, if you draw the histogram of the final grades in a class, or the histogram of the ages of a population, the histogram will usually have a peak in the middle and decay on both ends.

The Gaussian distribution

Let $\sR^{d}$ be the standard Euclidean space of dimension $d$ and $\Theta$ be some parameter space. A parametric distribution family with density function $f(x;\theta)$ with respect to some $\sigma$-finite measure is $\gF = \{ f(x;\theta): x\in \sR^{d}, \theta \in \Theta\}$. Let $\delta_{\theta}(\cdot)$ be the Dirac measure such that $\delta_{\theta}(\sA)=1$ if $\theta$ is in set $\sA$ and $0$ otherwise. Let $G= \sum_{k=1}^{K} w_k \delta_{\theta_k}$ be a discrete probability measure, assigning probability $w_k$ to parameter value $\theta_k \in \Theta$ for some integer $K>0$. A distribution with the density function \begin{equation} \label{def:general_mixture_density} f(x; G) = \int f(x;\theta)\,dG(\theta) = \sum_{k=1}^K w_k f(x;\theta_k) \end{equation} is called a finite mixture of $\gF$. We call $f(x; \theta)$ the subpopulation density function. The elements of $\btheta=(\theta_1,\theta_2,\ldots,\theta_K)^{\top}$ and those of $\vw=(w_1,w_2,\ldots,w_K)^{\top}$ are respectively called the subpopulation parameters and the mixing weights. We use $F(x; \theta)$ and $F(x; G)$ for the \ac{CDF} of $f(x;\theta)$ and $f(x;G)$ respectively. Let $\Theta^{K} =\Theta\times\Theta\times\cdots\times \Theta$ be the Cartesian product of $K$ copies of $\Theta$ and $\Delta_{K-1}$ be the $(K-1)$-dimensional simplex $\{(w_1, w_2, \ldots, w_{K}): w_{k} \in [0,1], \sum_{k=1}^{K} w_k=1\}$. We denote the space of mixing distributions $G$ of order up to $K$ as \begin{equation} \label{eq:G_K} \sG_{K} = \left\{ G: G=\sum_{k=1}^K w_k\delta_{\theta_k}, \vw \in \Delta_{K-1}, \btheta \in \Theta^K \right\}. \end{equation} A mixture of (exactly) order $K$ has its mixing distribution $G$ in $\sG_{K}\backslash\sG_{K-1} = \{G: G\in \sG_{K}\text{ and }G\not\in\sG_{K-1}\}$. The order $K$ is also referred to as the number of components of the mixture. A mixture model is a collection of mixture distributions of $\gF$.
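To make the definition concrete, here is a minimal Python sketch of evaluating a mixture density $f(x;G)=\sum_{k=1}^K w_k f(x;\theta_k)$, using a univariate Gaussian as the subpopulation family. The function names are my own illustrations, not from any package.

```python
import math

def normal_pdf(x, mu, sigma):
    """Subpopulation density f(x; theta) with theta = (mu, sigma)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, weights, thetas, component_pdf):
    """f(x; G) = sum_k w_k f(x; theta_k) for a generic subpopulation family."""
    return sum(w * component_pdf(x, *th) for w, th in zip(weights, thetas))

# G puts weight 0.3 on theta_1 = (0, 1) and 0.7 on theta_2 = (3, 0.5),
# so this mixture has order K = 2.
weights = [0.3, 0.7]
thetas = [(0.0, 1.0), (3.0, 0.5)]
density_at_zero = mixture_pdf(0.0, weights, thetas, normal_pdf)
```

Because $\vw$ lies on the simplex $\Delta_{K-1}$, the mixture density integrates to one whenever each subpopulation density does.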

Finite mixtures are commonly used to model the distribution of a population that exhibits heterogeneity. In many applications, the population can be decomposed into several different but homogeneous subpopulations, whose distributions can be modelled by a classical parametric distribution. As early as 1894, \citet{pearson1894contributions} applies a Gaussian mixture to analyze data on crabs' ratio of forehead to body length. \begin{figure}[htbp] \centering \includegraphics[width=0.9\textwidth]{figure/misc/Pearson.png} \caption{Plot of the histogram of the ratio of forehead to body length of $1000$ crabs, together with the fitted Gaussian density (dashed line) and two-component Gaussian mixture density (solid line). The two-component Gaussian mixture suggests the crabs may come from two unidentified subspecies.} \label{fig:pearson_example} \end{figure} The histogram of the ratio of forehead to body length of the $1000$ crabs that Pearson analyzed is shown in Figure~\ref{fig:pearson_example}. In this figure, the dashed line is the density function of a single Gaussian distribution fitted to the data. The Gaussian distribution is clearly not a good fit. Based on the general understanding that a well-developed biological species should have normally (Gaussian) distributed biometrics, Pearson suggests that the $1000$ crabs are composed of $2$ unidentified subspecies. He subsequently fits a $2$-component Gaussian mixture to the data; the density function of the fitted mixture is given by the solid line in Figure~\ref{fig:pearson_example}. The well-fitted mixture supports the two-subspecies hypothesis for the crabs in the collected sample. Finite mixtures are also widely used in other disciplines. In finance, stock prices are believed to be in either a ``normal'' state or an ``extreme'' state~\citep{liesenfeld2001generalized}. Hence, the distribution of stock prices often resembles a Gaussian mixture.
In the study of the evolution of galaxies, \citet{baldry2004color} suggests the existence of $2$ galaxy subpopulations: a passively evolving red subpopulation and a blue star-forming subpopulation. A $2$-component mixture fitted to the data suggests that there cannot be a continuous evolution and that the rapid change of galaxies between these $2$ subpopulations is due to galaxy mergers.

In machine learning, finite mixtures are often used as probabilistic models for clustering analysis~\citep{bishop2006pattern}. The finite mixture model is used in~\citet{fraley2002model} to cluster breast cancer patients into different groups. Clinically, doctors generally divide tumours into malignant and benign types. Their analysis suggests that there may be $3$ groups, indicating that malignant tumours may be in different stages. This mixture-model-based finding is clinically important for determining an appropriate course of action for malignancy. In a clinical example in~\citet{baudry2010combining}, the Gaussian mixture is used to study the development of graft-versus-host disease (GvHD). GvHD occurs in allogeneic hematopoietic stem cell transplant recipients when donor immune cells in the graft attack the skin, gut, liver, and other tissues of the recipient. GvHD is diagnosed by clinical and histologic criteria that are often nonspecific, and it is typically apparent only after the disease is well established. In their study, a mixture model is fitted to bio-marker data from GvHD-positive patients and the result suggests the existence of $4$ cell subpopulations. These cell subpopulations correspond to colour combinations of lymphocyte phenotypic and activation markers at progressive time points post transplant.

It is often cited~\citep{titterington1985statistical,nguyen2020approximation} that there always exists a Gaussian mixture whose density function is arbitrarily close to any given density function. For example, the kernel density estimate with a Gaussian kernel and a proper bandwidth is consistent for any continuous density function that vanishes at infinity. \todo{reference} Finite mixtures are therefore also broadly used as a parametric model to approximate distributions with unknown shapes. Figure~\ref{fig:gmm_density} gives density functions of Gaussian mixtures with various shapes, demonstrating their ability to approximate an arbitrary density. \begin{figure}[htbp] \centering
\includegraphics[width=0.24\textwidth]{figure/misc/skewed_unimodal_density.png} \includegraphics[width=0.24\textwidth]{figure/misc/strongly_skewed_density.png} \includegraphics[width=0.24\textwidth]{figure/misc/bimodal_density.png} \includegraphics[width=0.24\textwidth]{figure/misc/asymmetric_bimodal_density.png}\\ \includegraphics[width=0.24\textwidth]{figure/misc/claw_density.png} \includegraphics[width=0.24\textwidth]{figure/misc/asymmetric_claw_density.png} \includegraphics[width=0.24\textwidth]{figure/misc/discrete_comb.png} \includegraphics[width=0.24\textwidth]{figure/misc/double_claw_density.png} \caption{Density functions of Gaussian mixtures with various shapes from \citet[Section 1.5]{mclachlan2004finite}.} \label{fig:gmm_density} \end{figure} In system design in engineering, the shape of the distribution of the design life of a system can vary considerably. \citet{buvcar2004reliability} proposes to approximate the density functions of these distributions by finite Weibull mixtures. In~\citet{santosh2013tracking}, \citet{brubaker2015map}, and \citet{yu2018density}, Gaussian mixtures are used to approximate density functions in Bayesian inference procedures under hidden Markov models for the task of object tracking in video sequences.
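To see how such shapes arise, note that the ``claw'' in Figure~\ref{fig:gmm_density} can itself be written as a six-component Gaussian mixture: a broad $N(0,1)$ component carrying half the weight plus five narrow, equally weighted spikes. The parameters below follow the commonly cited Marron--Wand form and are meant as an illustrative sketch, not necessarily the exact values behind the figure.

```python
import math

def phi(x, mu, sigma):
    """Univariate Gaussian density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Half the mass on a broad N(0, 1), plus five spikes of weight 0.1 each
# centred at -1, -0.5, 0, 0.5, 1, all with standard deviation 0.1.
CLAW = [(0.5, 0.0, 1.0)] + [(0.1, l / 2.0 - 1.0, 0.1) for l in range(5)]

def claw_density(x):
    """Mixture density: sum of weighted Gaussian components."""
    return sum(w * phi(x, m, s) for w, m, s in CLAW)
```

The narrow spikes sit on top of the broad component, producing the five ``fingers'' while the mixing weights still sum to one.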

\subsubsection{Commonly Used Models for Subpopulation} There are many choices for the subpopulation distribution family $\gF$. We give several examples of the most commonly used models below.

Finite Gaussian mixtures are by far the most studied finite mixture model. For example, \citet{lo2001testing}, \citet{chen2009hypothesis}, and \citet{chen2012inference} study the problem of testing the order of a Gaussian mixture. The \texttt{mclust} package in \texttt{R}~\citep{scrucca2016mclust} is developed for using finite Gaussian mixtures for model-based clustering, classification, and density estimation in applications. \citet{xu1996convergence} studies the convergence of the \ac{EM} algorithm under finite Gaussian mixtures. Various learning approaches under finite Gaussian mixtures~\citep{vlassis2002greedy,pernkopf2005genetic,constantinopoulos2007unsupervised} have also been studied. We give the density function of finite Gaussian mixtures in the following example.

\begin{example}[Finite Gaussian Mixture] As the name suggests, a finite Gaussian mixture is a mixture of Gaussian distributions. A $d$-dimensional Gaussian distribution with mean vector $\mu$ and covariance matrix $\Sigma$ has density function \[ \phi(x; \mu, \Sigma) = \det(2\pi\Sigma)^{-1/2} \exp\left\{-\frac{1}{2}(x-\mu)^\top \Sigma^{-1}(x-\mu)\right\} \] where $\det(\cdot)$ is the determinant of a square matrix. We denote by $\Phi(x; \mu, \Sigma)$ its \acs{CDF}. We denote the density function of a finite \ac{GMM} of order $K$ and its \acs{CDF} by \[ \phi(x; G) = \sum_{k=1}^K w_k \phi(x; \mu_k,\Sigma_k), \quad \Phi(x; G) = \sum_{k=1}^K w_k \Phi(x; \mu_k,\Sigma_k). \] Under the finite \acs{GMM}, the subpopulation parameter is $\theta=(\mu, \Sigma)$ with parameter space $\Theta = \sR^d \times \sS_{+}^{d}$, where $\sS_{+}^{d}$ is the space of all $d\times d$ positive definite matrices. \end{example}
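A minimal sketch of the bivariate case ($d=2$): the $2\times 2$ determinant and inverse are written out by hand purely to keep the example self-contained; in practice one would use a linear-algebra library.

```python
import math

def gauss2d_pdf(x, mu, Sigma):
    """phi(x; mu, Sigma) for d = 2:
    det(2*pi*Sigma)^{-1/2} exp{-(x - mu)^T Sigma^{-1} (x - mu) / 2}.
    Sigma must be positive definite, so det(Sigma) > 0."""
    (a, b), (c, d) = Sigma
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mu[0], x[1] - mu[1]]
    quad = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    # det(2*pi*Sigma) = (2*pi)^d * det(Sigma) with d = 2.
    return math.exp(-0.5 * quad) / math.sqrt((2.0 * math.pi) ** 2 * det)

def gmm_pdf(x, weights, mus, Sigmas):
    """phi(x; G) = sum_k w_k phi(x; mu_k, Sigma_k)."""
    return sum(w * gauss2d_pdf(x, m, S) for w, m, S in zip(weights, mus, Sigmas))
```

At the mean of a standard bivariate Gaussian the density is $\det(2\pi I)^{-1/2} = 1/(2\pi)$, which gives a quick sanity check.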

Binomial and Poisson mixtures are also broadly investigated in the literature and used in applications. In genetics, the number of recombinants in a family with $K$ offspring has a binomial mixture distribution in the presence of genetic mutations~\citep{chernoff1995asymptotic}. The Poisson mixture model is well motivated for count data such as the number of patents~\citep{wang1998analysis} and the spinal tumour counts of patients with the disease neurofibromatosis 2~\citep{joe2005generalized}. Gamma mixtures are often used to model household income distributions~\citep{he2021strong}. These mixtures may be regarded as special cases where the subpopulation distribution family $\gF$ is an exponential family. The exponential family is defined as follows.

\begin{definition}[Exponential Family] An exponential family is a distribution family whose densities can be represented as \[ f(x; \theta) = \exp\{\theta^{\top} T(x) - A(\theta)\}h(x) \] with respect to some reference measure $\nu(\cdot)$. In this definition, the vector $\theta = (\theta_1, \theta_2,\ldots,\theta_m)^{\top}$ is called the natural parameter. The natural sufficient statistic is the vector $T(x)=(T_1(x), T_2(x),\ldots, T_m(x))^{\top}$. The function $h(x)$ modifies the reference measure $\nu(\cdot)$, and the log-partition function $A(\theta)$ is a normalization constant that does not depend on $x$. The parameter space of $\theta$ is usually expanded to be \[ \Theta = \left\{ \theta\in\sR^{m}: \int \exp\{\theta^{\top}T(x)\}h(x)\,\nu(dx)<\infty \right\}. \] \end{definition} In Table~\ref{tab:exp_family}, we list widely used exponential families with their sufficient statistics and parameter spaces. We do not include the reference measure $\nu(\cdot)$ and $h(x)$ as they are not relevant to statistical inference.
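As a quick check of the template, the Poisson family fits it with $\theta = \log\lambda$, $T(x) = x$, $A(\theta) = \exp(\theta)$, and $h(x) = 1/x!$, matching the Poisson row of Table~\ref{tab:exp_family}. The sketch below handles a scalar natural parameter only; the names are mine.

```python
import math

def expfam_pmf(x, theta, T, A, h):
    """f(x; theta) = exp{theta * T(x) - A(theta)} h(x), scalar natural parameter."""
    return math.exp(theta * T(x) - A(theta)) * h(x)

# Poisson(lam): theta = log(lam), T(x) = x, A(theta) = exp(theta), h(x) = 1/x!.
def poisson_pmf(x, lam):
    return expfam_pmf(x, math.log(lam),
                      T=lambda v: v, A=math.exp,
                      h=lambda v: 1.0 / math.factorial(v))
```

Expanding the exponent recovers the textbook form $\lambda^{x} e^{-\lambda}/x!$, since $\exp\{x\log\lambda - \lambda\} = \lambda^{x}e^{-\lambda}$.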

\begin{table}[htb] \centering \caption{The natural sufficient statistics, natural parameter, and parameter space of some widely used exponential distribution families.} \resizebox{\textwidth}{!}{
\begin{tabular}{llll}
\toprule
Name of $\gF$ & $T(x)$ & $A(\theta)$ & $\Theta$\\
\midrule
\multicolumn{4}{c}{Univariate discrete distributions}\\
Binomial & $x$ & $\log\{1 + \exp(\theta)\}$ & $\sR$\\
Poisson & $x$ & $\exp(\theta)$ & $\sR$\\
\midrule
\multicolumn{4}{c}{Univariate continuous distributions}\\
Exponential & $x$ & $-\log(-\theta)$ & $(-\infty, 0)$ \\
Weibull (known $k$) & $x^k$ & $-\log(-\theta)$ & $(-\infty, 0)$ \\
Laplace (known $\mu$) & $|x-\mu|$ & $\log(-2/\theta)$ & $(-\infty, 0)$\\
Rayleigh & $x^2$ & $-\log(-2\theta)$ & $(-\infty, 0)$\\
Log-normal & $(\log x, \log^2 x)^{\top}$ & $-\theta_1^2/\theta_2-1/\sqrt{2\theta_2}$ & $\sR \times (-\infty, 0)$ \\
Gamma & $(\log x, x)^{\top}$ & $\log\Gamma(\theta_1+1)-(\theta_1+1)\log(-\theta_2)$ & $(-1, \infty) \times(-\infty, 0)$\\
Inverse Gamma & $(\log x, 1/x)^{\top}$ & $\log\Gamma(-\theta_1-1)+(\theta_1+1)\log(-\theta_2)$ & $(-\infty, -1)\times (-\infty, 0)$\\
\bottomrule
\end{tabular}} \label{tab:exp_family} \end{table}

Another well-investigated class of mixture models has location-scale families as its subpopulation model $\gF$. Let $f_0(x)$ be the density function of a univariate random variable with support $x \in \sR$. A location-scale distribution family is formed by all distributions with density function \begin{equation*} f(x; \theta) = \frac{1}{\sigma} f_0 \left(\frac{x-\mu}{\sigma} \right) \end{equation*} for $\theta = (\mu, \sigma)^{\top}$ and parameter space $\Theta = \sR \times (0, \infty)$. Some examples of $f_0(x)$ are: \begin{itemize} \item Univariate Gaussian distribution: $f_0(x)=\phi(x;0,1)$; \item Logistic distribution: $f_0(x) = \exp(-x)/\{1+\exp(-x)\}^2$; \item Gumbel distribution (type I extreme-value distribution): $f_0(x)= \exp\{-x-\exp(-x)\}$. \end{itemize}
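Since the construction is mechanical, it is easy to sketch: given any standard density $f_0$, the factory below produces the whole location-scale family $f(x;\mu,\sigma) = f_0((x-\mu)/\sigma)/\sigma$. The two $f_0$'s are the logistic and Gumbel densities listed above; the function names are mine.

```python
import math

def location_scale(f0):
    """Turn a standard density f0 into the family f(x; mu, sigma) = f0((x - mu)/sigma)/sigma."""
    def f(x, mu, sigma):
        return f0((x - mu) / sigma) / sigma
    return f

# Standard densities from the itemized list above.
def logistic0(x):
    return math.exp(-x) / (1.0 + math.exp(-x)) ** 2

def gumbel0(x):
    return math.exp(-x - math.exp(-x))

logistic_pdf = location_scale(logistic0)
gumbel_pdf = location_scale(gumbel0)
```

At $\mu=0$, $\sigma=1$ the family reduces to $f_0$ itself; for example, the standard logistic density at $0$ is $1/4$ and the standard Gumbel density at $0$ is $e^{-1}$.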

\citet{naya2006logistic} uses a logistic mixture for thermogravimetric analysis and \citet{salimans2017pixelcnn++} uses this model for image analysis. The Weibull mixture is used by~\citet{hernandez2006weibull} to characterize end-to-end network delays and by~\citet{marin2005using} to model the lifetime of patients with lupus nephritis. \citet{zhang2006fitting} applies a mixture of Weibull distributions to model the irregular diameter distribution of forest stands. The Weibull mixture is also used by~\citet{carta2007analysis} to model the distribution of wind speed.

What is Gaussian mixture reduction (GMR)?

Applications of GMR

Kalman filter

Existing approaches for GMR