3.1 Levenberg Method

Define a modified information matrix, with a damping factor $ \lambda$:

$\displaystyle \mathbf{A}\equiv\mathbf{J}^{T}\cdot\mathbf{R}^{-1}\cdot\mathbf{J}+\lambda\cdot\mathbf{I}$ (32)

As $ \lambda\rightarrow0$, $ \mathbf{A}$ approaches the unmodified information matrix. For $ \lambda\rightarrow\infty$, $ \mathbf{A}$ is dominated by the identity matrix. As $ \lambda$ increases, the computed update $ \boldsymbol{\delta}$ tends to the scaled gradient descent direction:

$\displaystyle \boldsymbol{\delta}\rightarrow\frac{1}{\lambda}\left(\mathbf{J}^{T}\cdot\mathbf{R}^{-1}\cdot\mathbf{v}\right)$ (33)

with decreasing step size. Thus if the residual $ L$ is not currently at a minimum, increasing $ \lambda$ and computing the update $ \boldsymbol{\delta}$ will eventually lead to a decrease in $ L$.

To control convergence behavior, we modify $ \lambda$ according to a simple schedule, controlled by two factors $ 1<a<b$. Typical values are $ a=2$ and $ b=10$. Starting with parameters $ \mathbf{x}$, residual $ L(\mathbf{x})$, and damping value $ \lambda$, an update $ \boldsymbol{\delta}_{\lambda}$ is computed and applied:

$\displaystyle \mathbf{x}^{\prime}=\mathbf{x}\oplus\boldsymbol{\delta}_{\lambda}$ (34)

Then the residual $ L(\mathbf{x}^{\prime})$ is computed under the new parameters. If the residual has decreased, such that $ L(\mathbf{x}^{\prime})<L(\mathbf{x})$, then the update is valid, and the damping factor is decreased by factor $ a$:

$\displaystyle \mathbf{x}$ $\displaystyle \leftarrow$ $\displaystyle \mathbf{x}^{\prime}$ (35)
$\displaystyle \lambda$ $\displaystyle \leftarrow$ $\displaystyle \frac{1}{a}\cdot\lambda$ (36)

If the residual has increased, or has not decreased by some threshold amount, the parameters are left unchanged and $ \lambda$ is increased:

$\displaystyle \lambda$ $\displaystyle \leftarrow$ $\displaystyle b\cdot\lambda$ (37)

Thus only parameter updates that decrease the residual are kept. The process is iterated similarly to the Gauss-Newton method, and can be terminated when $ \lambda$ reaches a large threshold value (which corresponds to a vanishingly small update). Note that in the case where the parameter update is rejected and $ \lambda$ increases, the information matrix and vector need not be recomputed. Instead only the matrix $ \mathbf{A}$ needs to be updated using the new $ \lambda$ value, and the linear system solved to find a new $ \boldsymbol{\delta}_{\lambda}$.

Ethan Eade 2012-02-16