3.2 Levenberg-Marquardt Method

A refinement due to Marquardt changes how $ \mathbf{A}$ is defined in terms of $ \lambda$. Instead of damping all parameter dimensions equally (by adding a multiple of the identity matrix), a scaled version of the diagonal of the information matrix itself can be added:

$\displaystyle \mathbf{A}\equiv\mathbf{J}^{T}\cdot\mathbf{R}^{-1}\cdot\mathbf{J}+\lambda\,\mathrm{\mathbf{diag}}\left(\mathbf{J}^{T}\cdot\mathbf{R}^{-1}\cdot\mathbf{J}\right)$ (38)

As $ \lambda$ grows, $ \boldsymbol{\delta}_{\lambda}$ again tends towards a gradient descent update, but with each dimension scaled by the inverse of the corresponding diagonal entry of the information matrix. This can lead to faster convergence than the Levenberg damping term when some dimensions of the error surface have very different curvature from others.
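
The difference between the two damping terms can be sketched numerically. The following is a minimal illustration, not a definitive implementation: the function name `damped_step`, the residual vector `r`, and the sign convention (solving $ \mathbf{A}\cdot\boldsymbol{\delta}=-\mathbf{J}^{T}\cdot\mathbf{R}^{-1}\cdot\mathbf{r}$) are assumptions made for this example. The toy problem has one dimension with curvature 1 and another with curvature 10000.

```python
import numpy as np

def damped_step(J, R_inv, r, lam, marquardt=True):
    """Solve A * delta = -J^T R^-1 r for a damped Gauss-Newton step.

    Assumed convention (hypothetical, for illustration): r is the residual
    vector and the step minimizes r^T R^-1 r.
    """
    H = J.T @ R_inv @ J          # information matrix J^T R^-1 J
    g = J.T @ R_inv @ r          # gradient direction (up to a constant factor)
    if marquardt:
        D = np.diag(np.diag(H))  # Marquardt: damp with diag(J^T R^-1 J)
    else:
        D = np.eye(H.shape[0])   # Levenberg: damp with the identity
    A = H + lam * D
    return np.linalg.solve(A, -g)

# Toy problem with very different curvature in the two dimensions.
J = np.diag([1.0, 100.0])
R_inv = np.eye(2)
r = np.array([1.0, 1.0])

step_gn = damped_step(J, R_inv, r, lam=0.0)                  # undamped Gauss-Newton
step_m = damped_step(J, R_inv, r, lam=3.0, marquardt=True)   # Marquardt damping
step_l = damped_step(J, R_inv, r, lam=3.0, marquardt=False)  # Levenberg damping
```

Because the information matrix is diagonal here, the Marquardt step is just the Gauss-Newton step shrunk uniformly by $ 1/(1+\lambda)$ in every dimension. The Levenberg step shrinks the low-curvature dimension by the same factor but leaves the high-curvature dimension almost undamped, so the two dimensions are treated unequally.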



Ethan Eade 2012-02-16