eChung's Note

Otsu's method

What is it?


Otsu’s Method is an image thresholding algorithm in computer vision. The algorithm assumes that the given image contains two well-separated groups of pixel intensities (foreground and background). Under this assumption, it finds the pixel intensity that best separates the two groups. The implementation of the algorithm can be found here.

Theory behind the algorithm

Let

  • the range of intensity values be $0$ ~ $L$.
  • $C_i$, where $i \in \{0, 1\}$, be the two classes of pixel intensities separated by a threshold $t$ (the bi-modal assumption)
  • $\omega_j$, where $j \in \{0, 1\}$, be the probability that a pixel belongs to each class
  • the variance of each class be $\sigma^2_i$, where $i \in \{0,1\}$
  • $n_i$ be the number of pixels with intensity value $i$, and $N$ be the total number of pixels in the image, $N=\sum_{i=0}^{L}n_i$.

The goal is to find a single pixel intensity (threshold) value $t_{opt}$ that cleanly separates the pixels into the two classes $C_0$ and $C_1$. To achieve this, we need a way to measure how well the two classes are separated: the better separated they are, the easier it is to find the optimal value $t_{opt}$. Otsu’s paper uses discriminant criterion measures from discriminant analysis, and claims that the simplest formulation is:

$$
\eta = \frac{\sigma_B^2}{\sigma_T^2}
$$

where the inter-class variance (a measure of how far apart the two classes are) is
$$\begin{aligned} \sigma_B^2 & = \omega_0(\mu_0 - \mu_T)^2 + \omega_1(\mu_1 - \mu_T)^2 \\ & = \omega_0\omega_1(\mu_1 - \mu_0)^2 \end{aligned}$$

and the total variance is

$$
\sigma_T^2 = \sum_{i=0}^{L} (i - \mu_T)^2 \cdot p_i
$$
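As a small sketch (the histogram here is a hypothetical toy example, not from the post, and NumPy is assumed), the total mean and total variance can be computed directly from a normalized histogram $p$:

```python
import numpy as np

# Toy normalized histogram over intensities 0..3 (L = 3); sums to 1.
p = np.array([0.25, 0.25, 0.0, 0.5])
i = np.arange(p.size)

mu_T = (i * p).sum()                    # total mean  mu_T = sum_i i * p_i
sigma_T2 = ((i - mu_T) ** 2 * p).sum()  # total variance sum_i (i - mu_T)^2 * p_i
```

Note that $\sigma_T^2$ does not depend on the threshold $t$, which is what later lets us maximize $\sigma_B^2(t)$ alone.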

where $p_i$ is the normalized probability of intensity $i$.

The paper states that $\eta$ is a function of the threshold $t$, and that the $t$ maximizing $\eta$ gives the optimal threshold value $t_{opt}$.


The normalized probability distribution of pixel intensities within the image is:
$$
p_i = \frac{n_i}{N}
$$

where $p_i \geq 0$ and $\sum_{i=0}^{L} p_i = 1$.
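As a concrete sketch of this step (assuming NumPy and a hypothetical toy 8-bit image, neither of which appears in the post), the normalized histogram $p_i = n_i / N$ can be built with a bin count:

```python
import numpy as np

L = 255  # intensity range 0..L for an 8-bit image

# Hypothetical toy "image": a dark cluster and a bright cluster of pixels.
image = np.array([10, 12, 10, 11, 200, 201, 199, 200], dtype=np.uint8)

# n_i: number of pixels with intensity i, for every i in 0..L.
n = np.bincount(image.ravel(), minlength=L + 1)
N = image.size

# p_i = n_i / N, satisfying p_i >= 0 and sum_i p_i = 1.
p = n / N
```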

The class probabilities $\omega_j, j \in \{0, 1\}$ are defined as

$$\begin{aligned} \omega_0 &= p_{C_0} = \sum_{i=0}^{t - 1} p_i = \omega(t) \\ \omega_1 &= p_{C_1} = \sum_{i=t}^{L} p_i = 1 - \omega(t) \end{aligned}$$

The above holds by the third Kolmogorov axiom, which says $p\left(\bigcup_{e \in E} e\right) = \sum_{e \in E} p(e)$ for disjoint events $E$, together with the assumption that the probabilities $p_i$ are normalized.
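A minimal sketch of these two class probabilities, assuming a toy normalized histogram `p` (not from the post):

```python
import numpy as np

p = np.array([0.25, 0.25, 0.0, 0.5])  # toy normalized histogram, L = 3
t = 2                                  # C_0 = {0, ..., t-1}, C_1 = {t, ..., L}

omega0 = p[:t].sum()   # omega(t) = sum_{i=0}^{t-1} p_i
omega1 = 1.0 - omega0  # since the classes are disjoint and sum_i p_i = 1
```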

The mean $\mu_i$, where $i \in \{0, 1\}$, of each class $C_i$ can be found as follows:

$$\begin{aligned} \mu_0 & = \sum_{i=0}^{t-1} i \cdot \frac{p_i}{\omega_0} = \frac{\mu(t)}{\omega(t)} \\ \mu_1 & = \sum_{i=t}^{L} i\cdot\frac{p_i}{\omega_1} = \frac{\mu_T - \mu(t)}{1 - \omega(t)} \end{aligned}$$

where

$$\begin{aligned} \mu_T &= \sum_{i=0}^{L} i \cdot p_i \\ \mu(t) &= \sum_{i=0}^{t-1} i \cdot p_i \\ \omega(t) &= \sum_{i=0}^{t-1} p_i \end{aligned}$$
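These cumulative quantities translate directly into code. A sketch over an assumed toy histogram `p` (not the author's data):

```python
import numpy as np

p = np.array([0.25, 0.25, 0.0, 0.5])  # toy normalized histogram, L = 3
i = np.arange(p.size)
t = 2

mu_T = (i * p).sum()                   # total mean
omega_t = p[:t].sum()                  # omega(t)
mu_t = (i[:t] * p[:t]).sum()           # mu(t) = sum_{i=0}^{t-1} i * p_i

mu0 = mu_t / omega_t                   # class-0 mean
mu1 = (mu_T - mu_t) / (1.0 - omega_t)  # class-1 mean
```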

Up to this point, we have the normalized class probabilities and the class means. The last quantities to define are the class variances:

$$\begin{aligned} \sigma_0^2(t) & = \sum_{i=0}^{t-1} [i - \mu_0]^2 \cdot \frac{p_i}{\omega_0} \\ \sigma_1^2(t) & = \sum_{i=t}^{L} [i - \mu_1]^2 \cdot \frac{p_i}{\omega_1} \end{aligned}$$
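The class variances can be sketched the same way, again over an assumed toy histogram `p` rather than anything from the post:

```python
import numpy as np

p = np.array([0.25, 0.25, 0.0, 0.5])  # toy normalized histogram, L = 3
i = np.arange(p.size)
t = 2

omega0 = p[:t].sum()
omega1 = p[t:].sum()
mu0 = (i[:t] * p[:t]).sum() / omega0
mu1 = (i[t:] * p[t:]).sum() / omega1

# sigma_i^2(t): variance of intensities within each class.
var0 = ((i[:t] - mu0) ** 2 * p[:t]).sum() / omega0
var1 = ((i[t:] - mu1) ** 2 * p[t:]).sum() / omega1
```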

The goal is to find the $t_{opt}$ that maximizes $\eta$. There are a few things to keep in mind.

  • The variance $\sigma_B^2$ is a function of $t$, and so is $\eta$.
  • The function $\eta$ increases as $\sigma_B^2$ increases.

Then we have $\eta(t) = \frac{\sigma_B^2(t)}{\sigma_T^2}$, and $\sigma_B^2(t)$ can be formulated as follows:

$$\begin{aligned} \sigma_B^2(t) &= \omega_0 \omega_1(\mu_1-\mu_0)^2 \\\\ &= \left[\omega(t) \cdot (1-\omega(t))\right] \cdot \left[\frac{\mu_T \cdot \omega(t)-\mu(t)}{\left[(1-\omega(t)) \cdot \omega(t)\right]}\right]^2 \\ &=\frac{\left[ \mu_T \cdot \omega(t) - \mu(t) \right]^2}{(1-\omega(t)) \cdot \omega(t)} \end{aligned}$$

Since $\sigma_T^2$ does not depend on $t$, $\eta(t)$ increases exactly when $\sigma_B^2(t)$ increases, so we only have to find $t_{opt}$ such that

$$\begin{aligned} t_{opt} = \operatorname{arg\,max}_{0 \leq t \leq L} \sigma_B^2(t) \end{aligned}$$
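Putting the pieces together, here is a minimal end-to-end sketch of the search for $t_{opt}$: evaluate $\sigma_B^2(t)$ for every candidate $t$ and take the argmax. The function name and the toy image are illustrative assumptions, not the author's linked implementation:

```python
import numpy as np

def otsu_threshold(image, L=255):
    """Return the t maximizing the inter-class variance sigma_B^2(t)."""
    p = np.bincount(image.ravel(), minlength=L + 1) / image.size
    i = np.arange(L + 1)
    mu_T = (i * p).sum()  # total mean

    best_t, best_sigma_b2 = 0, -1.0
    for t in range(1, L + 1):                 # C_0 = [0, t), C_1 = [t, L]
        omega_t = p[:t].sum()
        if omega_t == 0.0 or omega_t == 1.0:  # one class empty: skip
            continue
        mu_t = (i[:t] * p[:t]).sum()
        # sigma_B^2(t) = [mu_T * omega(t) - mu(t)]^2 / [omega(t) * (1 - omega(t))]
        sigma_b2 = (mu_T * omega_t - mu_t) ** 2 / (omega_t * (1.0 - omega_t))
        if sigma_b2 > best_sigma_b2:
            best_t, best_sigma_b2 = t, sigma_b2
    return best_t

# Toy bimodal image: t_opt should land between the dark and bright modes.
image = np.array([10, 12, 10, 11, 200, 201, 199, 200], dtype=np.uint8)
t_opt = otsu_threshold(image)
```

Note that $\sigma_B^2(t)$ is constant for every $t$ between the two modes, so any intensity on that plateau is an equally valid threshold; this sketch simply returns the first one found.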

References