The Method of Least Squares

Hervé Abdi¹

1 Introduction

The least square method (LSM) is probably the most popular technique in statistics. This is due to several factors. First, most common estimators can be cast within this framework. For example, the mean of a distribution is the value that minimizes the sum of squared deviations of the scores. Second, using squares makes LSM mathematically very tractable because the Pythagorean theorem indicates that, when the error is uncorrelated with (orthogonal to) an estimated quantity, one can add the squared error and the squared estimated quantity. Third, the mathematical tools and algorithms involved in LSM (derivatives, eigendecomposition, singular value decomposition) have been well studied for a relatively long time. LSM is one of the oldest techniques of modern statistics, and even though ancestors of LSM can be traced back to Greek mathematics, the first modern precursor is probably Galileo (see Harper, 1974, for a history and pre-history of LSM). The modern approach was first exposed in 1805 by the French mathematician Legendre in a now classic memoir, but the method is somewhat older: after the publication of Legendre's memoir, Gauss (the famous German mathematician) contested Legendre's priority. Gauss often did not publish ideas when he thought that they could be controversial or not yet ripe, but would mention his discoveries when others published them (as he did, for example, for the discovery of non-Euclidean geometry). In 1809, Gauss published another memoir in which he mentioned that he had previously discovered LSM and used it as early as 1795 in estimating the orbit of an asteroid. A somewhat bitter priority dispute followed (reminiscent of the Leibniz-Newton controversy about the invention of calculus), which, however, did not diminish the popularity of the technique. The use of LSM in a modern statistical framework can be traced to Galton (1886), who used it in his work on the heritability of size, which laid down the foundations of correlation and (also gave the name to) regression analysis. The two antagonistic giants of statistics, Pearson and Fisher, who did so much in the early development of statistics, used and developed it in different contexts (factor analysis for Pearson and experimental design for Fisher). Nowadays, the least square method is widely used to find or estimate the numerical values of the parameters that fit a function to a set of data and to characterize the statistical properties of estimates. It exists in several variations: its simplest version is called ordinary least squares (OLS); a more sophisticated version is called weighted least squares (WLS), which often performs better than OLS because it can modulate the importance of each observation in the final solution. Recent variations of the least square method are alternating least squares (ALS) and partial least squares (PLS).

¹ In: Neil Salkind (Ed.) (2007). Encyclopedia of Measurement and Statistics. Thousand Oaks (CA): Sage. Address correspondence to: Hervé Abdi, Program in Cognition and Neurosciences, MS: Gr.4.1, The University of Texas at Dallas, Richardson, TX 75083–0688, USA. E-mail: herve@utdallas.edu, http://www.utd.edu/~herve
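The claim that the mean is the value minimizing the sum of squared deviations can be checked numerically. The following is a minimal Python sketch; the four scores and the grid of candidate values are made up purely for illustration, and `sum_squared_deviations` is a hypothetical helper name.

```python
def sum_squared_deviations(scores, c):
    """Sum of squared deviations of the scores from a candidate value c."""
    return sum((s - c) ** 2 for s in scores)

scores = [2.0, 3.0, 7.0, 8.0]          # illustrative data
mean = sum(scores) / len(scores)

# Try candidate values on a grid from mean - 3 to mean + 3 in steps of 0.1.
candidates = [mean + step / 10 for step in range(-30, 31)]
best = min(candidates, key=lambda c: sum_squared_deviations(scores, c))

print(mean, best, sum_squared_deviations(scores, mean))  # 5.0 5.0 26.0
```

The grid search only illustrates the claim. Algebraically, $\sum_i (s_i - c)^2 = \sum_i (s_i - M)^2 + N(M - c)^2$ for the mean $M$ of $N$ scores, so the sum is smallest exactly at $c = M$; this split of a sum of squares into two additive parts is the Pythagorean property mentioned above.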

2 Functional fit example: regression

The oldest (and still the most frequent) use of OLS is linear regression, which corresponds to the problem of finding a line (or curve) that best fits a set of data points. In the standard formulation, a set of N pairs of observations $\{Y_i, X_i\}$ is used to find a function relating the value of the dependent variable (Y) to the values of an independent variable (X). With one variable and a linear function, the prediction is given by the following equation:

$$\hat{Y} = a + bX . \tag{1}$$

This equation involves two free parameters, which specify the intercept (a) and the slope (b) of the regression line. The least square method defines the estimates of these parameters as the values that minimize the sum of the squares (hence the name least squares) of the differences between the measurements and the model (i.e., the predicted values). This amounts to minimizing the expression

$$E = \sum_i (Y_i - \hat{Y}_i)^2 = \sum_i \left[Y_i - (a + bX_i)\right]^2 \tag{2}$$

(where E stands for "error", which is the quantity to be minimized). The estimation of the parameters is obtained using basic results from calculus and, specifically, uses the property that a quadratic expression that is convex in its parameters (as E is in a and b) reaches its minimum value when its derivatives vanish. Taking the derivatives of E with respect to a and b and setting them to zero gives the following set of equations (called the normal equations):

$$\frac{\partial E}{\partial a} = 2Na + 2b\sum_i X_i - 2\sum_i Y_i = 0 \tag{3}$$

and

$$\frac{\partial E}{\partial b} = 2b\sum_i X_i^2 + 2a\sum_i X_i - 2\sum_i Y_i X_i = 0 . \tag{4}$$

Dividing Equation 3 by 2N gives the intercept directly, and substituting that intercept into Equation 4 gives the slope. Solving the normal equations thus gives the following least square estimates of a and b:

$$a = M_Y - bM_X \tag{5}$$

(with $M_Y$ and $M_X$ denoting the means of Y and X, respectively) and

$$b = \frac{\sum_i (Y_i - M_Y)(X_i - M_X)}{\sum_i (X_i - M_X)^2} . \tag{6}$$
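Equations (5) and (6) translate directly into code. Here is a minimal pure-Python sketch, assuming the data arrive as two equal-length lists; `ols_line` is a hypothetical name and the five data points are invented.

```python
def ols_line(xs, ys):
    """Least square estimates (a, b) of the line Y = a + bX.

    b is the sum of cross-products of deviations divided by the sum of
    squared X deviations (Equation 6); a = M_Y - b * M_X (Equation 5).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (X, Y) pairs of equal length")
    m_x = sum(xs) / n
    m_y = sum(ys) / n
    sxy = sum((y - m_y) * (x - m_x) for x, y in zip(xs, ys))
    sxx = sum((x - m_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all X values are equal; the slope is undefined")
    b = sxy / sxx
    a = m_y - b * m_x
    return a, b

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = ols_line(xs, ys)
print(a, b)  # close to 0.05 and 1.99
```

A quick sanity check is that the fitted residuals $Y_i - (a + bX_i)$ satisfy both normal equations: they sum to zero, and so do their products with $X_i$.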

OLS can be extended to more than one independent variable (using matrix algebra) and to non-linear functions.
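For several independent variables, one common matrix formulation (assumed here; the text does not spell it out) writes the normal equations as $\mathbf{X}^\top\mathbf{X}\,\boldsymbol\beta = \mathbf{X}^\top\mathbf{y}$, where each row of $\mathbf{X}$ holds one observation with a leading 1 for the intercept. The sketch below builds and solves that system in pure Python with Gaussian elimination; the helper names and the noise-free data are hypothetical.

```python
def solve(A, v):
    """Solve the square system A x = v by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]  # augmented matrix [A | v]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: predictors are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """OLS coefficients beta solving the normal equations X'X beta = X'y.

    X is a list of rows; include a column of 1s to estimate an intercept.
    """
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

# Hypothetical data generated exactly from y = 1 + 2*x1 - 3*x2 (no noise).
x1 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.0, 0.0, 2.0, 1.0, 3.0, 2.0]
y = [1 + 2 * u - 3 * w for u, w in zip(x1, x2)]
X = [[1.0, u, w] for u, w in zip(x1, x2)]
beta = ols(X, y)
print(beta)  # close to [1.0, 2.0, -3.0]
```

Forming $\mathbf{X}^\top\mathbf{X}$ explicitly can lose numerical accuracy when predictors are nearly collinear; solvers based on the QR or singular value decomposition are generally preferred in practice.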


2.1 The geometry of least squares

OLS can be interpreted in a geometrical framework as an orthogonal projection of the data vector onto the space defined by the independent variables. The projection is orthogonal because the error vector (the difference between the actual and the predicted values) is orthogonal to the predicted values (and, when the model includes an intercept, uncorrelated with them). This is illustrated in Figure 1, which depicts the case of two independent variables (vectors $\mathbf{x}_1$ and $\mathbf{x}_2$) and the data vector ($\mathbf{y}$), and shows that the error vector ($\mathbf{y} - \hat{\mathbf{y}}$) is orthogonal to the least square estimate ($\hat{\mathbf{y}}$), which lies in the subspace defined by the two independent variables.

[Figure: data vector y, least square estimate ŷ, error vector y − ŷ, and independent variable vectors x1 and x2]

Figure 1: The least square estimate of the data is the orthogonal projection of the data vector onto the independent variable subspace.
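The orthogonality shown in Figure 1 can be verified numerically. This sketch, with made-up vectors, projects y onto the plane spanned by two vectors x1 and x2 (solving the 2 × 2 normal equations by Cramer's rule), then checks that the error vector is orthogonal to both vectors and that the squared lengths add up as the Pythagorean theorem predicts. The function names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def project(y, x1, x2):
    """Orthogonal projection of y onto the plane spanned by x1 and x2.

    Solves the 2 x 2 normal equations G [b1, b2] = [x1.y, x2.y] by
    Cramer's rule, where G holds the inner products of x1 and x2.
    """
    g11, g12, g22 = dot(x1, x1), dot(x1, x2), dot(x2, x2)
    r1, r2 = dot(x1, y), dot(x2, y)
    det = g11 * g22 - g12 * g12
    if abs(det) < 1e-12:
        raise ValueError("x1 and x2 are collinear")
    b1 = (r1 * g22 - g12 * r2) / det
    b2 = (g11 * r2 - g12 * r1) / det
    return [b1 * u + b2 * v for u, v in zip(x1, x2)]

y = [3.0, 1.0, 4.0, 1.0]
x1 = [1.0, 1.0, 1.0, 1.0]   # a column of ones, i.e., an intercept
x2 = [0.0, 1.0, 2.0, 3.0]
y_hat = project(y, x1, x2)
error = [a - b for a, b in zip(y, y_hat)]

print(dot(error, x1), dot(error, x2))   # both zero up to rounding
print(dot(y, y), dot(y_hat, y_hat) + dot(error, error))  # equal up to rounding
```

Because x1 here is a column of ones, the errors also sum to zero, so their orthogonality to ŷ amounts to zero correlation in this case.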


2.2 Optimality of least square estimates

OLS estimates have some strong statistical properties. Specifically, when (1) the data constitute a random sample from a well-defined population, (2) the population model is linear, (3) the error has a zero expected value, (4) the independent variables are linearly independent, and (5) the error has a constant variance (the so-called homoscedasticity assumption) and is uncorrelated with the independent variables and across observations, then the OLS estimate is the best linear unbiased estimate, often denoted by the acronym "BLUE" (the 5 conditions and the proof are called the Gauss-Markov conditions and theorem). In addition, when the Gauss-Markov conditions hold and the error is also normally distributed, OLS estimates are maximum likelihood estimates.
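Unbiasedness, one half of the BLUE property, can be illustrated by simulation: when the data are generated from a known linear model whose errors meet the conditions above, the OLS slope averages out to the true slope over many samples. This is a sketch under assumed values (true intercept 1, true slope 2, standard normal errors, a fixed seed); it does not demonstrate the "best" (minimum variance) part of the theorem.

```python
import random

def slope(xs, ys):
    """OLS slope from Equation 6."""
    m_x, m_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - m_x) * (y - m_y) for x, y in zip(xs, ys))
    sxx = sum((x - m_x) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(0)          # fixed seed so the run is reproducible
true_a, true_b = 1.0, 2.0       # hypothetical population model Y = 1 + 2X + error
xs = [float(i) for i in range(20)]

estimates = []
for _ in range(2000):
    # Zero-mean, constant-variance, independent normal errors.
    ys = [true_a + true_b * x + rng.gauss(0.0, 1.0) for x in xs]
    estimates.append(slope(xs, ys))

mean_b = sum(estimates) / len(estimates)
print(mean_b)  # close to 2.0
```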

2.3 Weighted least squares

The optimality of OLS relies heavily on the homoscedasticity assumption. When the data come from different sub-populations for which an independent estimate of the error variance is available, a better estimate than OLS can be obtained using weighted least squares (WLS), a special case of generalized least squares (GLS). The idea is to assign to each observation a weight that reflects the uncertainty of the measurement. In general, the weight $w_i$ assigned to the $i$th observation will be a function of the variance of this observation, denoted $\sigma_i^2$. A straightforward weighting scheme is to define $w_i = \sigma_i^{-1}$ (but other, more sophisticated weighting schemes can also be proposed). For the linear regression example, WLS will find the values of a and b minimizing:

$$E_w = \sum_i w_i (Y_i - \hat{Y}_i)^2 = \sum_i w_i \left[Y_i - (a + bX_i)\right]^2 . \tag{7}$$
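Setting the derivatives of Equation 7 to zero gives the OLS formulas with weighted means and weighted sums in place of the ordinary ones. A pure-Python sketch, assuming the weights are supplied directly so that any scheme can be plugged in (the text's $w_i = \sigma_i^{-1}$, or the common inverse-variance choice $w_i = \sigma_i^{-2}$); `wls_line`, the data, and the standard deviations are invented for illustration.

```python
def wls_line(xs, ys, ws):
    """Weighted least square estimates (a, b) minimizing Equation 7.

    Same form as the OLS solution, but with weighted means of X and Y
    and weighted sums of cross-products and squares.
    """
    sw = sum(ws)
    m_x = sum(w * x for w, x in zip(ws, xs)) / sw
    m_y = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - m_x) * (y - m_y) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - m_x) ** 2 for w, x in zip(ws, xs))
    b = sxy / sxx
    a = m_y - b * m_x
    return a, b

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 20.0]            # the last observation is much noisier
sigmas = [1.0, 1.0, 1.0, 10.0]        # hypothetical known error s.d. per observation
ws = [1.0 / s for s in sigmas]        # w_i = sigma_i^(-1), the scheme in the text
a, b = wls_line(xs, ys, ws)
print(a, b)
```

With these numbers, the noisy fourth point pulls the OLS slope (equal weights) up to 5.3, while WLS, which down-weights that point, gives a slope of about 2.9.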

2.4 Iterative methods: Gradient descent

When estimating the parameters of a nonlinear function with OLS or WLS, the standard approach of setting the derivatives to zero and solving the resulting equations is not always possible, because these equations generally have no closed-form solution. In this case, iterative methods are very often used. These methods search in a stepwise fashion for the best values of the estimate. Often they proceed by using, at each step, a linear approximation of the function and refine this approximation by successive corrections. The techniques involved are known as gradient descent and Gauss-Newton approximations. They correspond to nonlinear least squares approximation in numerical analysis and nonlinear regression in statistics. Neural networks constitute a popular recent application of these techniques.
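The stepwise idea can be sketched with a damped Gauss-Newton method (Gauss-Newton with step halving). At each iteration, the model is linearized around the current parameters, the linear least squares problem for the correction is solved through its normal equations, and the step is halved until the sum of squared errors no longer increases. The exponential model, the starting values, and the noise-free data are hypothetical choices for illustration; production code would typically rely on a library routine with better safeguards.

```python
import math

def model(a, b, x):
    """Hypothetical nonlinear model Y = a * exp(b * X)."""
    return a * math.exp(b * x)

def sse(a, b, xs, ys):
    """Sum of squared errors of the model with parameters (a, b)."""
    return sum((y - model(a, b, x)) ** 2 for x, y in zip(xs, ys))

def gauss_newton(xs, ys, a, b, max_iter=100, tol=1e-12):
    """Fit (a, b) by Gauss-Newton iterations with step halving."""
    for _ in range(max_iter):
        # Jacobian columns: derivatives of the model with respect to a and b.
        ja = [math.exp(b * x) for x in xs]
        jb = [a * x * math.exp(b * x) for x in xs]
        r = [y - model(a, b, x) for x, y in zip(xs, ys)]
        # 2 x 2 normal equations (J'J) delta = J'r of the linearized problem.
        g11 = sum(u * u for u in ja)
        g12 = sum(u * v for u, v in zip(ja, jb))
        g22 = sum(v * v for v in jb)
        h1 = sum(u * e for u, e in zip(ja, r))
        h2 = sum(v * e for v, e in zip(jb, r))
        det = g11 * g22 - g12 * g12
        da = (h1 * g22 - g12 * h2) / det
        db = (g11 * h2 - g12 * h1) / det
        # Halve the step while it would increase the sum of squared errors.
        step, current = 1.0, sse(a, b, xs, ys)
        while sse(a + step * da, b + step * db, xs, ys) > current and step > 1e-10:
            step /= 2
        a, b = a + step * da, b + step * db
        if abs(step * da) + abs(step * db) < tol:
            break
    return a, b

xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [model(2.0, 0.7, x) for x in xs]   # noise-free data from a = 2, b = 0.7
a_hat, b_hat = gauss_newton(xs, ys, a=1.0, b=0.5)
print(a_hat, b_hat)  # close to 2.0 and 0.7
```

Plain gradient descent would instead move along the negative gradient $2\mathbf{J}^\top\mathbf{r}$ direction with a chosen step size; Gauss-Newton usually needs far fewer iterations when the residuals at the solution are small.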

3 Problems with least squares, and alternatives

Despite its popularity and versatility, LSM has its problems. Probably the most important drawback of LSM is its high sensitivity to outliers (i.e., extreme observations). This is a consequence of using squares, because squaring exaggerates the magnitude of differences (e.g., the difference between 20 and 10 is equal to 10, but the difference between 20² and 10² is equal to 300) and therefore gives a much stronger importance to extreme observations. This problem is addressed by using robust techniques, which are less sensitive to the effect of outliers. This field is currently under development and is likely to become more important in the near future.
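The effect of one extreme observation can be seen directly. In this sketch with invented data, changing a single Y value shifts the OLS slope from 2 to 6 and the mean from 5 to 9. As one simple point of comparison (chosen here, not named in the text), the median, which minimizes the sum of absolute rather than squared deviations, does not move.

```python
def ols_slope(xs, ys):
    """OLS slope b from Equation 6."""
    m_x, m_y = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - m_x) * (y - m_y) for x, y in zip(xs, ys))
            / sum((x - m_x) ** 2 for x in xs))

def median(values):
    """Middle value of the sorted data (average of the two middle values if even)."""
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
clean = [1.0, 3.0, 5.0, 7.0, 9.0]        # exactly Y = 1 + 2X
dirty = [1.0, 3.0, 5.0, 7.0, 29.0]       # one extreme observation

print(ols_slope(xs, clean), ols_slope(xs, dirty))   # 2.0 6.0
print(sum(dirty) / len(dirty), median(dirty))       # 9.0 5.0
```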

References

[1] Abdi, H., Valentin, D., & Edelman, B.E. (1999). Neural networks. Thousand Oaks: Sage.
[2] Bates, D.M., & Watts, D.G. (1988). Nonlinear regression analysis and its applications. New York: Wiley.
[3] Greene, W.H. (2002). Econometric analysis. New York: Prentice Hall.
[4] Harper, H.L. (1974–1976). The method of least squares and some alternatives. Parts I–VI. International Statistical Review, 42, 147–174; 42, 235–264; 43, 1–44; 43, 125–190; 43, 269–272; 44, 113–159.
[5] Nocedal, J., & Wright, S. (1999). Numerical optimization. New York: Springer.
[6] Plackett, R.L. (1972). The discovery of the method of least squares. Biometrika, 59, 239–251.
[7] Seal, H.L. (1967). The historical development of the Gauss linear model. Biometrika, 54, 1–23.
