Gradient Descent
Suppose we have a function $f$ whose value $f(x)$ we want to minimize over its domain. Gradient Descent is an iterative optimization algorithm that finds a parameter $x \in \text{dom}(f)$ at which $f(x)$ is (locally) minimized.
Definition
Let $f$ be differentiable at every point $a \in \text{dom}(f) \subseteq \mathbb{R}^D$, and let $\theta$ be some small positive real number (the learning rate). Starting from an initial point $x_0$, we define a sequence of points $\{ x_{i} \}_{i=0}^{L}$ by
$$\begin{align*} x_{i+1} & = x_i - \theta \, \nabla f(x_i) \end{align*}$$For a suitable $\theta$, this sequence satisfies $f(x_{j-1}) \geq f(x_j)$ for $j \in \{1, 2, 3, \ldots\}$.
Since the gradient $\nabla f(x_i)$ is the vector pointing in the direction in which the function increases most rapidly, the negative gradient points in the direction in which the function decreases most rapidly. The algorithm thus iteratively moves toward a point $x \in \text{dom}(f)$ where the value of $f(x)$ is minimized. In general, this point is only a local minimum; if $f$ is a convex function, however, every local minimum is guaranteed to be the global minimum.
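The update rule above can be sketched as a short function. This is a minimal illustration, not a production implementation: the stopping criterion (a fixed number of steps) and the example objective $f(x) = (x - 3)^2$ are assumptions chosen for clarity.

```python
import numpy as np

def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Minimize f by repeatedly stepping against its gradient.

    grad  -- function returning the gradient of f at a point
    x0    -- initial point
    lr    -- learning rate (the small constant called theta above)
    steps -- fixed number of iterations (a simple stopping rule)
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * grad(x)  # x_{i+1} = x_i - theta * grad f(x_i)
    return x

# Example: f(x) = (x - 3)^2 has gradient 2 * (x - 3) and minimum at x = 3.
grad_f = lambda x: 2 * (x - 3)
x_min = gradient_descent(grad_f, x0=0.0)  # converges toward 3
```

Because this example objective is convex, the iterates approach the global minimum $x = 3$ for any sufficiently small learning rate.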
Finding the Learning Rate
One of the most important parts of the Gradient Descent algorithm is choosing a good learning rate. Depending on this value, the algorithm can converge or diverge. If the learning rate is too high, the steps overshoot the minimum and the algorithm is likely to diverge, meaning $f(x_i) > f(x_j)$ for $i > j$. One practical heuristic is to start with a high learning rate and decrease it whenever the iterates diverge.
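The start-high-and-decrease heuristic can be sketched as follows. The shrink factor of 0.5, the restart-from-$x_0$ strategy, and the use of an increasing loss as the divergence signal are all assumptions for illustration, not part of the original text.

```python
import numpy as np

def tune_learning_rate(f, grad, x0, lr=10.0, steps=50, shrink=0.5):
    """Start with a high learning rate; halve it whenever the loss increases.

    Returns the first learning rate for which all `steps` iterations
    monotonically decrease f, together with the final iterate.
    """
    while lr > 1e-12:
        x = np.asarray(x0, dtype=float)
        prev = f(x)
        diverged = False
        for _ in range(steps):
            x = x - lr * grad(x)
            cur = f(x)
            # Treat any increase (or overflow to inf/nan) as divergence.
            if not np.isfinite(cur) or cur > prev:
                diverged = True
                break
            prev = cur
        if not diverged:
            return lr, x
        lr *= shrink  # decrease and retry from the starting point
    raise RuntimeError("no stable learning rate found")

# Example: f(x) = (x - 3)^2; lr = 10 badly overshoots, so the rate is
# halved repeatedly until the iterates descend.
f = lambda x: (x - 3) ** 2
grad_f = lambda x: 2 * (x - 3)
lr, x_min = tune_learning_rate(f, grad_f, x0=0.0)
```

Restarting from $x_0$ after each failure keeps the search simple; a common refinement is instead to resume from the last point where the loss was still decreasing.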