3.1 Levenberg Method

Define a modified information matrix, with a damping factor $\lambda$ :

$\displaystyle \mathbf{A}\equiv\mathbf{J}^{T}\cdot\mathbf{R}^{-1}\cdot\mathbf{J}+\lambda\cdot\mathbf{I}$

(32)

As $\lambda\rightarrow0$ , $\mathbf{A}$ approaches the unmodified information matrix. For $\lambda\rightarrow\infty$ , $\mathbf{A}$ is dominated by the identity matrix. As $\lambda$ increases, the computed update $\boldsymbol{\delta}$ tends to the scaled gradient descent direction:

$\displaystyle \boldsymbol{\delta}\rightarrow\frac{1}{\lambda}\left(\mathbf{J}^{T}\cdot\mathbf{R}^{-1}\cdot\mathbf{v}\right)$

(33)

with decreasing step size. Thus if the residual

is not currently at a minimum, increasing $\lambda$ and computing the update $\boldsymbol{\delta}$ will eventually lead to a decrease in

To control convergence behavior, we modify $\lambda$ according to a simple schedule, controlled by two factors . Typical values are and . Starting with parameters $\mathbf{x}$ , residual $L(\mathbf{x})$ , and damping value $\lambda$ , an update $\boldsymbol{\delta}_{\lambda}$ is computed and applied:

$\displaystyle \mathbf{x}^{\prime}=\mathbf{x}\oplus\boldsymbol{\delta}_{\lambda}$

(34)

Then the residual $L(\mathbf{x}^{\prime})$ is computed under the new parameters. If the residual has decreased, such that $L(\mathbf{x}^{\prime})<L(\mathbf{x})$ , then the update is valid, and the damping factor is decreased by factor :

$\displaystyle \mathbf{x}$	$\displaystyle \leftarrow$	$\displaystyle \mathbf{x}^{\prime}$	(35)
$\displaystyle \lambda$	$\displaystyle \leftarrow$	$\displaystyle \frac{1}{a}\cdot\lambda$	(36)

If the residual has increased, or has not decreased by some threshold amount, the parameters are left unchanged and $\lambda$ is increased:

$\displaystyle \lambda$

$\displaystyle \leftarrow$

$\displaystyle b\cdot\lambda$

(37)

Thus only parameter updates that decrease the residual are kept. The process is iterated similarly to the Gauss-Newton method, and can be terminated when $\lambda$ reaches a large threshold value (which corresponds to a vanishingly small update). Note that in the case where the parameter update is rejected and $\lambda$ increases, the information matrix and vector need not be recomputed. Instead only the matrix $\mathbf{A}$ needs to be updated using the new $\lambda$ value, and the linear system solved to find a new $\boldsymbol{\delta}_{\lambda}$ .

Ethan Eade 2012-02-16