eChung's Note

CS & Math



CAS TRUePic Upload

Posted on 2017-01-25 | In iOS

What is it?


CAS TRUePic Upload is an iOS application that lets truckers upload their information (driver's license, trucking company, etc.) in order to pick up sensitive shipments handled by a cargo company called CAS (Consolidated Aviation Services). The application started as an idea to simplify and speed up the process of picking up sensitive shipments, and it has been published on the App Store.

What was required?

The project required implementing the backend in ASP.NET MVC and the iOS application in Objective-C. All communication between the server and the application uses JSON, and the application provides features such as uploading documents and viewing shipment information that is visible to the public.

Backend (ASP.NET MVC)

  • The server-side code was implemented in ASP.NET MVC (C#) and published to an internal company server.
  • The server exposes a RESTful API that exchanges JSON.
  • The code was written to be as clean and testable as possible.

Mobile (iOS)

  • The iOS mobile application was implemented in Objective-C.
  • It provides features such as uploading documents and viewing shipment information.

Mahalanobis Distance

Posted on 2016-08-04

Gradient Descent

Posted on 2016-07-22 | In Machine Learning

Gradient Descent

Let's assume that we have a function $f$ whose value $f(x)$, $x \in \text{dom}(f)$, needs to be minimized over its domain. Gradient Descent is an optimization algorithm that finds such a minimizing parameter $x$.


Definition

Let $f$ be a function differentiable at every point of its domain, and let $\theta$ be some small positive real number (the learning rate). Then we build a sequence of points $\{ x_{i} \}_{i=0}^{L}$ such that

$$\begin{align*} x_{i+1} & = x_i - \theta \nabla f(x_i) \end{align*}$$

where $f(x_{j-1}) \geq f(x_j),\ j \in \{1, 2, 3, \ldots\}$

Since the gradient $\nabla f(x_i)$ is the vector pointing in the direction in which the function increases fastest, the negative of the gradient gives the direction in which the function decreases fastest. The algorithm iteratively moves toward a point $x \in \text{dom}(f)$ where the value of $f(x)$ is minimized. This point $x$ will in general only be a local minimum, unless the function $f$ is convex, in which case the local minimum is guaranteed to be the global minimum.
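As a concrete sketch, the update rule can be written in a few lines of JavaScript; the objective $f(x) = (x-3)^2$, its gradient, the learning rate $\theta = 0.1$, and the iteration count are illustrative choices, not from the post.

```javascript
// Gradient Descent on f(x) = (x - 3)^2, whose gradient is f'(x) = 2(x - 3).
function gradientDescent(grad, x0, theta, steps) {
  var x = x0;
  for (var i = 0; i < steps; i++) {
    x = x - theta * grad(x); // x_{i+1} = x_i - theta * grad f(x_i)
  }
  return x;
}

var grad = function (x) { return 2 * (x - 3); };
var xMin = gradientDescent(grad, 0, 0.1, 100); // approaches the minimum at x = 3
```

Since this $f$ is convex, the sequence converges to the global minimum at $x = 3$.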

Finding the Learning Rate

One of the most important parts of the Gradient Descent algorithm is choosing a good learning rate. Depending on this value, the algorithm can either converge or diverge: if the learning rate is too high, the algorithm is likely to diverge. One practical way to find this value is to start high and decrease it whenever the iterates diverge, i.e. whenever $f(x_i) > f(x_j)$ for some $i > j$.
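The start-high-and-decrease heuristic can be sketched as follows; the objective, the initial rate $\theta_0 = 10$, and the halving schedule are illustrative assumptions, not from the post.

```javascript
// Halve the learning rate whenever a step would increase f (a sign of divergence).
function descentWithDecreasingRate(f, grad, x0, theta0, steps) {
  var x = x0, theta = theta0;
  for (var i = 0; i < steps; i++) {
    var next = x - theta * grad(x);
    if (f(next) > f(x)) {
      theta = theta / 2; // diverging: decrease the rate and try again
    } else {
      x = next;          // the step decreased f, so accept it
    }
  }
  return x;
}

var f = function (x) { return (x - 3) * (x - 3); };
var fGrad = function (x) { return 2 * (x - 3); };
var xMin = descentWithDecreasingRate(f, fGrad, 0, 10, 200);
```

Starting from $\theta_0 = 10$ the first few steps overshoot badly, so the rate is halved until steps start decreasing $f$, after which the iterates converge.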

References

  • Wiki
  • Sebastian Ruder
  • Atomic Object

Linear Regression

Posted on 2016-07-16 | In Machine Learning

Linear Regression

Linear Regression is a statistical approach for finding a relationship within data consisting of dependent variables $y_i$ and independent variables $\{ x_{i,k} \}_{k=1}^{L}$ of dimension $L$. The best-known approach to fitting the data set is the Least Squares Method, which finds the line that best fits the data.

Least Square Method

The Least Squares Method is a way of finding the line that best fits a given data set. By best-fit line, we mean the line that minimizes the distance (error) between the line and each data point.

Let the dimension be $L=1$, and assume that we have a data set $\{ (x_i, y_i) \}$ for $i \in \{1, 2, \ldots , D\}$. With the equation of a line $y = mx + b$ in hand, we may define the error function as follows:

$$\begin{equation*} E(m, b) = \sum_{i = 1}^{D} \left[ y_i - (mx_i + b) \right]^2 \end{equation*}$$

In order to find the best-fit line with this error function, the problem comes down to finding the parameters: the slope $m$ and the y-intercept $b$. From calculus, the critical points of a function include the points that minimize it, so setting the partial derivatives of the error function equal to $0$, we have:
$$\begin{align*} \frac{\partial E(m,b)}{\partial m} &= 2 \cdot \sum_{i=1}^{D} \left[ y_i - (mx_i + b) \right] \cdot (-x_i) \\ &= 0 \\ \frac{\partial E(m,b)}{\partial b} &= 2 \cdot \sum_{i=1}^{D} \left[ y_i - (mx_i + b) \right] \cdot (-1) \\ &= 0 \end{align*}$$

Each of the partial derivatives can be expanded and rewritten as follows:

$$\begin{align*} & -2 \cdot \sum_{i=1}^{D} \left[ y_i x_i - m x_i^2 - b x_i \right] = 0 \\ \implies & \sum_{i=1}^{D} m x_i^2 + \sum_{i=1}^{D} b x_i = \sum_{i=1}^{D} y_i x_i \\ \end{align*}$$

and
$$\begin{align*} & -2 \cdot \sum_{i = 1}^{D} \left[ y_i - m x_i - b \right] = 0 \\ \implies & \sum_{i=1}^{D} m x_i + \sum_{i=1}^{D} b = \sum_{i=1}^{D} y_i \end{align*}$$

In matrix form, this can be rewritten as
$$\begin{bmatrix} \sum_{i=1}^{D} x_i^2 & \sum_{i=1}^{D} x_i \\ \sum_{i=1}^{D} x_i & \sum_{i=1}^{D} 1 \end{bmatrix} \cdot \begin{bmatrix} m \\ b \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{D} y_i x_i \\ \sum_{i=1}^{D} y_i \end{bmatrix}$$

Since the goal is to find the parameters $m$ and $b$, the matrix equation above (note that $\sum_{i=1}^{D} 1 = D$) can be solved by applying the inverse matrix as follows:
$$\begin{bmatrix} m \\ b \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{D} x_i^2 & \sum_{i=1}^{D} x_i \\ \sum_{i=1}^{D} x_i & \sum_{i=1}^{D} 1 \end{bmatrix}^{-1} \cdot \begin{bmatrix} \sum_{i=1}^{D} y_i x_i \\ \sum_{i=1}^{D} y_i \end{bmatrix}$$
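Solving the $2 \times 2$ system above directly gives a short implementation; the sample data set (points on $y = 2x + 1$) and the function name are illustrative, not from the post.

```javascript
// Least-squares fit of y = mx + b by solving the 2x2 normal equations above.
function leastSquares(xs, ys) {
  var D = xs.length;
  var sxx = 0, sx = 0, sxy = 0, sy = 0;
  for (var i = 0; i < D; i++) {
    sxx += xs[i] * xs[i]; // sum of x_i^2
    sx  += xs[i];         // sum of x_i
    sxy += xs[i] * ys[i]; // sum of y_i * x_i
    sy  += ys[i];         // sum of y_i
  }
  // Invert [[sxx, sx], [sx, D]] and multiply by [sxy, sy].
  var det = sxx * D - sx * sx;
  var m = (D * sxy - sx * sy) / det;
  var b = (sxx * sy - sx * sxy) / det;
  return { m: m, b: b };
}

var fit = leastSquares([0, 1, 2, 3], [1, 3, 5, 7]); // data lies on y = 2x + 1
```

On data that lies exactly on a line, the fit recovers the slope and intercept exactly.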

References

  • Least Square - Wikipedia
  • Linear Regression
  • Steven J. Miller

Binomial Distribution

Posted on 2016-06-30 | In Probability Theory

Binomial Distribution

The Binomial Distribution is a discrete probability distribution that gives the probability of the number of successes in a sequence of independent trials, with parameters $n$ (the number of independent trials) and $p$ (the probability of success on each trial). If a random variable $X$ has a binomial distribution, we denote it as $X \sim B(n, p)$.

Probability Mass Function

Let

  • $n$ be the number of independent trials
  • $k$ be the number of successes in $n$ trials
  • $p$ be the probability of success on each trial

then, the probability mass function can be defined as

$$\begin{align*} p(X = k) = { n \choose k } p^k (1-p)^{(n-k)} \end{align*}$$

where ${ n \choose k } = \frac{n!}{k!(n-k)!}$ is the binomial coefficient.

Binomial Probability in Code


The following graph is implemented in D3.js and plots the probability of $k$ successes in $n$ independent trials. The variables must satisfy $n \geq k$, since there cannot be more successes than the total number of trials. Besides the code that draws the plot on the screen, some code is needed to calculate the probability itself.

In order to find the binomial probability, we first need the Binomial Coefficient, defined as ${ n \choose k } = \frac{n!}{k!(n-k)!}$. This in turn involves computing the factorial.

Factorial

The factorial is defined as
$$\begin{align*} n! &= \prod_{i=1}^{n} i \end{align*}$$

and $0! = 1$. In code this can be calculated as below.

function factorial(n) {
  if (n === 0) return 1;
  var result = 1;
  for (var i = n; i > 0; i--) {
    result *= i;
  }
  return result;
}

Binomial Coefficient

As stated above, the Binomial Coefficient is defined as follows.

$$\begin{align*} { n \choose k } &= \frac{n!}{k!(n-k)!} \end{align*}$$

Since we already have the factorial in hand, this can be calculated easily.

function binomialCoefficient(n, k) {
  var nFact = factorial(n);
  var kFact = factorial(k);
  var n_kFact = factorial(n - k);
  return nFact / (kFact * n_kFact);
}

Binomial Probability

Once we have the Factorial and the Binomial Coefficient in hand, finding the Binomial Probability with parameters $n$, $k$, and $p$ is easy. Since the Binomial Probability is defined as follows:

$$\begin{align*} p(X = k) = { n \choose k } p^k (1-p)^{(n-k)} \end{align*}$$

we can translate the equation directly into code as follows:

function binomialProb(n, k, p) {
  return binomialCoefficient(n, k) * Math.pow(p, k) * Math.pow(1 - p, n - k);
}

The Perceptron Learning Algorithm

Posted on 2016-06-16 | In Machine Learning

Perceptron Learning Algorithm(PLA)

The Perceptron algorithm is a supervised learning algorithm that finds a binary classifier (Yes/No, True/False). Let $x \in \mathbb{R}^d$ and $y \in \mathbb{R}$. Then a function that maps a $d$-dimensional vector to a single signed value can be defined as follows:
$$\begin{aligned} f(x) & = \sum_{i=1}^{d}w_i x_i + b \\ & = w^T \cdot x + b \end{aligned}$$

where $w \in \mathbb{R}^d$ is the weight vector (the classifier), $b$ is the bias, and $\cdot$ denotes the dot product. If the input data is linearly separable, then there exists a hyperplane $w \cdot x + b = 0$ that cleanly separates the data into two groups.

To make the formula even simpler, we can merge the bias term $b$ into the vectors: let $w = [w_0, w_1, \ldots , w_d]$ with $w_0 = b$ and $x = [x_0, x_1, \ldots , x_d]$ with $x_0 = 1$. Then the function $f(x)$ becomes

$$f(x) = w^T \cdot x$$

The Perceptron Learning Algorithm (PLA) finds the weight vector $w$ in a finite number of iterations, provided the data is linearly separable. At each step $t = 0, 1, 2, \ldots$, pick an example $(x(t), y(t))$ that the current weights misclassify, i.e. $\operatorname{sign}(w(t) \cdot x(t)) \neq y(t)$, and apply the update rule

$$w(t+1) = w(t) + y(t)x(t)$$

where $y(t) \in \{-1, +1\}$ is the label and $x(t)$ is the input vector of the misclassified example.
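A minimal sketch of the algorithm, with the bias folded into the vectors as above ($x_0 = 1$); the toy data set is an illustrative linearly separable example, not from the post.

```javascript
function sign(v) { return v >= 0 ? 1 : -1; }

function dot(w, x) {
  var s = 0;
  for (var i = 0; i < w.length; i++) s += w[i] * x[i];
  return s;
}

// Repeat: find a misclassified example and apply w(t+1) = w(t) + y(t) x(t).
function pla(xs, ys, maxIter) {
  var w = [0, 0, 0]; // w_0 plays the role of the bias b
  for (var t = 0; t < maxIter; t++) {
    var mis = -1;
    for (var i = 0; i < xs.length; i++) {
      if (sign(dot(w, xs[i])) !== ys[i]) { mis = i; break; }
    }
    if (mis === -1) return w; // every example classified correctly
    for (var j = 0; j < w.length; j++) w[j] += ys[mis] * xs[mis][j];
  }
  return w;
}

// Each point is [1, x1, x2], so x_0 = 1 absorbs the bias.
var xs = [[1, 2, 3], [1, 3, 4], [1, -2, -1], [1, -3, -2]];
var ys = [1, 1, -1, -1];
var w = pla(xs, ys, 1000);
```

Because this data set is linearly separable, the loop terminates with a weight vector that classifies every example correctly.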

References

  • Linear separability Wiki
  • Perceptron Wiki
  • Learning from data

Otsu's method

Posted on 2016-06-07 | In Computer Vision

What is it?


Otsu's Method is an image thresholding algorithm in computer vision. One assumption the algorithm makes is that the given image contains two separated groups of pixel intensities (foreground and background). Given this condition on the input image, the algorithm tries to find the optimal pixel intensity that cleanly separates these two groups. The implementation of the algorithm can be found here.

Theory behind algorithm

Let

  • the range of intensity values be $0$ to $L$
  • $C_i$, where $i \in \{0, 1\}$, be the two classes of pixel intensities divided by a threshold $t$ (the bi-modal assumption)
  • $\omega_j$, where $j \in \{0, 1\}$, be the probability of the pixel intensities within each class
  • the variance of each class be $\sigma^2_i$, where $i \in \{0,1\}$
  • $n_i$ be the number of pixels with intensity value $i$, and $N$ be the total number of pixels in the image, $N=\sum_{i=0}^{L}n_i$.

The goal is to find the one pixel intensity (threshold) value $t_{opt}$ that cleanly separates the intensities into the two classes $C_i$. In order to achieve this, we need a way to represent how far apart the two classes are: the further apart they are, the easier it is to find the optimal value $t_{opt}$. Otsu's paper uses the discriminant criterion measures from discriminant analysis and claims that the simplest formulation is:

$$
\eta = \frac{\sigma_B^2}{\sigma_T^2}
$$

where the between-class variance (how far apart the classes are) is
$$\begin{aligned} \sigma_B^2 & = \omega_0(\mu_0 - \mu_T)^2 + \omega_1(\mu_1 - \mu_T)^2 \\ & = \omega_0\omega_1(\mu_1 - \mu_0)^2 \end{aligned}$$

and the total variance is

$$
\sigma_T^2 = \sum_{i=0}^{L} (i - \mu_T)^2 \cdot p_i
$$

where $p_i$ is the normalized probability of intensity $i$.

The paper states that $\eta$ is a function of the intensity $t$, and finding the $t$ that maximizes it gives the optimal threshold value $t_{opt}$.


The normalized probability distribution of the pixel intensities within the image is:
$$
p_i = \frac{n_i}{N}
$$

where $p_i \geq 0$ and $\sum_{i=0}^{L} p_i = 1$.

The class probabilities $\omega_j, j \in \{0, 1\}$ are defined as

$$\begin{aligned} \omega_0 &= p_{C_0} = \sum_{i=0}^{t - 1} p_i = \omega(t) \\ \omega_1 &= p_{C_1} = \sum_{i=t}^{L} p_i = 1 - \omega(t) \end{aligned}$$

The above statement is valid because of the Kolmogorov axioms (the third axiom), which says that $p\left(\bigcup_{e \in E} e\right) = \sum_{e \in E} p(e)$ for pairwise disjoint events, and we assumed the probabilities $p_i$ are normalized.

The mean $\mu_i$, where $i \in \{0, 1\}$, of each class $C_i$ can be found as follows:

$$\begin{aligned} \mu_0 & = \sum_{i=0}^{t-1} i \cdot \frac{p_i}{\omega_0} = \frac{\mu(t)}{\omega(t)} \\ \mu_1 & = \sum_{i=t}^{L} i\cdot\frac{p_i}{\omega_1} = \frac{\mu_T - \mu(t)}{1 - \omega(t)} \end{aligned}$$

where

$$\begin{aligned} \mu_T &= \sum_{i=0}^{L} i \cdot p(i) \\ \mu(t) &= \sum_{i=0}^{t-1} i \cdot p_i \\ \omega(t) &= \sum_{i=0}^{t-1} p_i \end{aligned}$$

Up to this point, we have the normalized class probabilities and the class means. The last quantities to define are the class variances:

$$\begin{aligned} \sigma_0^2(t) & = \sum_{i=0}^{t-1} [i - \mu_0]^2 \cdot \frac{p_i}{\omega_0} \\ \sigma_1^2(t) & = \sum_{i=t}^{L} [i - \mu_1]^2 \cdot \frac{p_i}{\omega_1} \end{aligned}$$

The goal is to find the $t_{opt}$ that maximizes $\eta$. There are a few things to keep in mind.

  • The variance $\sigma_B^2$ is a function of $t$, and so is $\eta$.
  • The function $\eta$ increases as $\sigma_B^2$ increases.

Then we have $\eta(t) = \frac{\sigma_B^2(t)}{\sigma_T^2}$, and $\sigma_B^2(t)$ can be formulated as follows:

$$\begin{aligned} \sigma_B^2(t) &= \omega_0 \omega_1(\mu_1-\mu_0)^2 \\\\ &= \left[\omega(t) \cdot (1-\omega(t))\right] \cdot \left[\frac{\mu_T \cdot \omega(t)-\mu(t)}{\left[(1-\omega(t)) \cdot \omega(t)\right]}\right]^2 \\ &=\frac{\left[ \mu_T \cdot \omega(t) - \mu(t) \right]^2}{(1-\omega(t)) \cdot \omega(t)} \end{aligned}$$

Since $\eta(t)$ increases as $\sigma_B^2(t)$ increases, we only have to find $t_{opt}$ such that

$$\begin{aligned} t_{opt} = \operatorname{arg\ max}_{0 \leq t \leq L} \sigma_B^2(t) \end{aligned}$$
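A minimal sketch of this search, assuming a precomputed gray-level histogram with $L+1$ bins; the tiny bimodal histogram at the end is an illustrative example, not from the paper.

```javascript
// Scan every threshold t and maximize the between-class variance
// sigma_B^2(t) = [mu_T * omega(t) - mu(t)]^2 / [omega(t) * (1 - omega(t))],
// where omega(t) and mu(t) accumulate over intensities 0 .. t-1.
function otsuThreshold(hist) {
  var N = 0;
  for (var i = 0; i < hist.length; i++) N += hist[i];

  var muT = 0; // total mean, the sum of i * p_i
  for (var i = 0; i < hist.length; i++) muT += i * (hist[i] / N);

  var omega = 0, mu = 0, best = -1, tOpt = 0;
  for (var t = 0; t < hist.length; t++) {
    var denom = omega * (1 - omega);
    if (denom > 0) { // skip degenerate thresholds with an empty class
      var num = muT * omega - mu;
      var sigmaB2 = (num * num) / denom;
      if (sigmaB2 > best) { best = sigmaB2; tOpt = t; }
    }
    omega += hist[t] / N;    // omega(t+1) = omega(t) + p_t
    mu += t * (hist[t] / N); // mu(t+1)    = mu(t) + t * p_t
  }
  return tOpt; // class C_0 holds intensities 0 .. tOpt - 1
}

// Two clusters of intensities, around 1 and around 6 (8 gray levels).
var tOpt = otsuThreshold([4, 8, 4, 0, 0, 4, 8, 4]);
```

For this histogram the between-class variance peaks between the two clusters, so the returned threshold separates them.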

References

  • Kolmogorov axioms
  • Wikipedia
  • 다크 프로그래머
  • Otsu, 1979. A Threshold Selection Method from Gray-Level Histograms

Storeband

Posted on 2016-06-05 | In iOS

Tatap

Posted on 2016-06-04 | In Android

What is it?


Tatap is an Android chat app that lets you define messages in advance and send them to your friends with two taps. This app can be useful in situations where you are in a hurry or need to send a quick message. The features it provides are:

  1. Pre-defined messages, so that you can send them as quickly as possible.
  2. Sending messages from Tatap widgets on your phone's home screen, without opening the app.
  3. Sending or receiving a location message that pinpoints where you or your friend is.

What was required?

The project required its own backend server and mobile application. The backend server was implemented using Node.js, and the Android app was developed natively. All communication between the mobile app and the server was done with JSON, and each endpoint on the server was well tested.

Backend (Node.js)

  • The server was set up on AWS.
  • Nginx was used as a reverse proxy.

Mobile (Android)

  • Retrofit2 was used for REST API communication.
  • The Google Maps SDK was used for location messages.
  • Google Analytics was used for tracking app usage.
Edward Chung

Man is what he believes.

© 2017 Edward Chung
Powered by Hexo