Next: , Previous: , Up: Least-Squares Fitting   [Index]


37.5 Ridge (Tikhonov) regression

Ordinary least squares models seek a solution vector c which minimizes the residual

\chi^2 = || y - X c ||^2

In cases where the least squares matrix X is ill-conditioned, small perturbations of the matrix can lead to widely different solution vectors c. In these cases it is often advantageous to include a regularization term in the least squares minimization,

\chi^2 = || y - X c ||^2 + || L c ||^2

for a suitably chosen matrix L. In many cases, L is chosen as a multiple of the identity matrix, giving preference to solution vectors c with smaller norms. Including this regularization term leads to the explicit solution

c = \left( X^T X + L^T L \right)^{-1} X^T y

which reduces to the ordinary least squares solution when L = 0.

The routines below support two formats for the L matrix. The first is L = \lambda I, giving a solution

c = \left( X^T X + \lambda^2 I \right)^{-1} X^T y

which damps all solution coefficients equally, and is known as the Tikhonov "standard form". For any square invertible matrix L, the general problem can be converted to standard form through the transformation

\tilde{X} = X L^{-1}, \tilde{c} = L c

leading to

\tilde{c} = \left( \tilde{X}^T \tilde{X} + I \right)^{-1} \tilde{X}^T y

which is in standard form with \lambda^2 = 1.

The second regularization matrix input supported by GSL has the form

L = diag(\lambda_0, \lambda_1, ..., \lambda_{p-1})

which allows the user to selectively damp each coefficient differently. For this matrix, GSL performs the above transformation and then backtransforms the least squares solution to recover the original vector c. For applications which require a more complicated matrix L, the user can apply the above transformation themselves, if L is square and invertible, and use gsl_multifit_linear_ridge, setting the input \lambda = 1. If L is not square or not invertible, other techniques must be used (see Hansen 1998, chapter 2.3).

Function: int gsl_multifit_linear_ridge (const double lambda, const gsl_matrix * X, const gsl_vector * y, gsl_vector * c, gsl_matrix * cov, double * chisq, gsl_multifit_linear_workspace * work)

This function computes the best-fit parameters c of the model y = X c for the observations y and the matrix of predictor variables X, using the preallocated workspace provided in work. The Tikhonov matrix appearing in the least squares solution is L = \lambda I where \lambda is provided in lambda, so that \lambda^2 is added to the diagonal elements of X^T X. The p-by-p variance-covariance matrix of the model parameters cov is set to \sigma^2 (X^T X)^{-1}, where \sigma is the standard deviation of the fit residuals. The residual \chi^2 = || Xc - y ||^2 + \lambda^2 || c ||^2 is returned in chisq.

The best-fit is found by singular value decomposition of the matrix X. Any components which have zero singular value (to machine precision) are discarded from the fit.

Function: int gsl_multifit_linear_ridge2 (const gsl_vector * lambda, const gsl_matrix * X, const gsl_vector * y, gsl_vector * c, gsl_matrix * cov, double * chisq, gsl_multifit_linear_workspace * work)

This function computes the best-fit parameters c of the model y = X c for the observations y and the matrix of predictor variables X, using the preallocated workspace provided in work. The Tikhonov matrix appearing in the least squares solution is L = diag(\lambda_0,\lambda_1,...,\lambda_{p-1}) where \lambda_i is provided in lambda[i]. The problem is converted to standard form as discussed above, solved and backtransformed to recover the desired coefficients. Therefore, the matrix L^{-1} must exist and so none of the \lambda_i may be zero. The p-by-p variance-covariance matrix of the model parameters cov is set to \sigma^2 (X^T X)^{-1}, where \sigma is the standard deviation of the fit residuals. The residual \chi^2 = || Xc - y ||^2 + || L c ||^2 is returned in chisq.

The best-fit is found by singular value decomposition of the matrix X. Any components which have zero singular value (to machine precision) are discarded from the fit.

