The essence of the method of least squares, where it is applied, and examples of solving problems by the least squares method

The least squares method (LSM) has many applications, as it allows an approximate representation of a given function by other, simpler ones. LSM can be extremely useful in processing observations, and it is actively used to estimate some quantities from the results of measurements of others containing random errors. In this article, you will learn how to implement least squares calculations in Excel.

Statement of the problem on a specific example

Suppose there are two indicators X and Y. Moreover, Y depends on X. Since OLS is of interest to us from the point of view of regression analysis (in Excel, its methods are implemented using built-in functions), we should immediately proceed to consider a specific problem.

So, let X be the selling area of a grocery store, measured in square meters, and Y be the annual turnover, defined in millions of rubles.

It is required to make a forecast of what turnover (Y) the store will have if it has one or another retail space. Obviously, the function Y = f (X) is increasing, since the hypermarket sells more goods than the stall.

A few words about the correctness of the initial data used for prediction

Let's say we have a table built with data for n stores.

According to mathematical statistics, the results will be more or less correct if data on at least 5-6 objects are examined. Also, "anomalous" results cannot be used. In particular, an elite small boutique can have a turnover many times greater than the turnover of large outlets of the "mass market" class.

The essence of the method

The table data can be displayed on the Cartesian plane as points M_1(x_1, y_1), …, M_n(x_n, y_n). Now the solution of the problem will be reduced to the selection of an approximating function y = f(x), which has a graph passing as close as possible to the points M_1, M_2, …, M_n.

Of course, you can use a polynomial of high degree, but this option is not only difficult to implement but simply incorrect, since it will not reflect the main trend that needs to be detected. The most reasonable solution is to search for the straight line y = ax + b, or more precisely its coefficients a and b, that best approximates the experimental data.

Accuracy score

For any approximation, the assessment of its accuracy is of particular importance. Denote by e_i the difference (deviation) between the functional and experimental values at the point x_i, i.e. e_i = y_i − f(x_i).

Obviously, to assess the accuracy of the approximation, you can use the sum of deviations: when choosing a straight line for an approximate representation of the dependence of Y on X, preference should be given to the one with the smallest sum of e_i over all points under consideration. However, not everything is so simple: along with positive deviations, there will in practice also be negative ones.

You can solve the problem using the absolute values of the deviations or their squares. The latter method is the most widely used. It is used in many areas, including regression analysis (in Excel, it is implemented using two built-in functions), and has long proven its effectiveness.

Least squares method

In Excel, as you know, there is a built-in AutoSum function that allows you to calculate the sum of all the values located in the selected range. Thus, nothing prevents us from calculating the value of the expression (e_1² + e_2² + e_3² + … + e_n²).

In mathematical notation, this looks like:

S = e_1² + e_2² + … + e_n² = Σ e_i².

Since the decision was initially made to approximate using a straight line, we have:

e_i = y_i − (a·x_i + b).

Thus, the task of finding the straight line that best describes the specific relationship between X and Y amounts to calculating the minimum of a function of two variables:

S(a, b) = Σ (y_i − (a·x_i + b))² → min.

This requires equating to zero the partial derivatives with respect to the variables a and b and solving a primitive system of two equations in two unknowns of the form:

∂S/∂a = 0, ∂S/∂b = 0.

After simple transformations, including dividing by 2 and manipulating the sums, we get:

a·Σx_i² + b·Σx_i = Σx_i·y_i,
a·Σx_i + b·n = Σy_i

(all sums are taken over i = 1, …, n).

Solving it, for example, by Cramer's method, we obtain a stationary point with certain coefficients a* and b*. This is the minimum, i.e., to predict what turnover the store will have for a certain area, the straight line y = a*·x + b* is suitable, which is a regression model for the example in question. Of course, it will not give you the exact result, but it will help you get an idea of whether buying a store on credit for a particular area will pay off.
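For those who want to check the arithmetic outside of Excel, here is a minimal sketch of the same computation in Python; the store data in the listing are invented purely for illustration:

```python
# Fit y = a*x + b by solving the 2x2 normal system with Cramer's rule.
# The data below are hypothetical (area in sq. m, turnover in mln rubles).
xs = [50, 70, 100, 120, 150, 200]
ys = [1.5, 2.1, 2.9, 3.3, 4.2, 5.5]

n = len(xs)
sx = sum(xs)
sy = sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))

# Normal system:  a*sxx + b*sx = sxy
#                 a*sx  + b*n  = sy
det = sxx * n - sx * sx          # determinant of the system
a = (sxy * n - sx * sy) / det    # slope a*
b = (sxx * sy - sx * sxy) / det  # intercept b*

print(f"y = {a:.4f}*x + {b:.4f}")
# Forecast for a hypothetical store of 130 sq. m:
print(f"forecast: {a * 130 + b:.2f} million rubles")
```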

How to implement the least squares method in Excel

Excel has a function for calculating values by the least squares method. It has the following form: TREND(known values of Y; known values of X; new values of X; const). Let's apply this formula for calculating by LSM in Excel to our table.

To do this, in the cell in which the result of the calculation using the least squares method in Excel should be displayed, enter the “=” sign and select the “TREND” function. In the window that opens, fill in the appropriate fields, highlighting:

  • the range of known values of Y (in this case, the turnover data);
  • the range x_1, …, x_n, i.e., the sizes of the retail space;
  • the new values of x, for which you need to find out the size of the turnover (for information about their location on the worksheet, see below).

In addition, there is a logical variable "Const" in the formula. If you enter 0 (FALSE) in the field corresponding to it, this means that the calculations are carried out assuming b = 0; if you enter 1 (TRUE) or leave the field empty, the coefficient b is calculated in the usual way.

If you need to know the forecast for more than one x value, then after entering the formula, you should not press "Enter", but type the combination "Ctrl" + "Shift" + "Enter" on the keyboard, so that the formula is entered as an array formula.

Some Features

Regression analysis can be accessible even to dummies. The Excel formula for predicting the value of an array of unknown variables - "TREND" - can be used even by those who have never heard of the least squares method. It is enough just to know some features of its work. In particular:

  • If you place the range of known values of the variable y in one row or column, then each row (column) with known values of x will be treated by the program as a separate variable.
  • If the range with known x values is not specified in the TREND window, the function treats it as an array of integers 1, 2, 3, …, whose size matches the range with the given values of the variable y.
  • To output an array of "predicted" values, the TREND expression must be entered as an array formula.
  • If no new x values are specified, the TREND function considers them equal to the known ones.
  • The range containing the new x values must have the same number of rows or columns as the range with the known x values; in other words, it must be commensurate with the independent variables.
  • An array with known x values can contain multiple variables. However, if only one is used, the ranges with the given x and y values must be commensurate. In the case of several variables, the range with the given y values must fit in one column or one row.
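For readers who prefer to verify TREND's output outside of Excel, a rough Python equivalent of the single-variable case might look like this (numpy's polyfit performs the same least squares line fit; the data are again invented):

```python
import numpy as np

# Rough equivalent of Excel's TREND for a single x variable:
# fit a line by least squares, then evaluate it at the new x values.
known_y = np.array([1.5, 2.1, 2.9, 3.3, 4.2, 5.5])  # hypothetical turnovers
known_x = np.array([50, 70, 100, 120, 150, 200])    # hypothetical areas
new_x = np.array([90, 130])                         # areas to forecast for

a, b = np.polyfit(known_x, known_y, deg=1)  # slope and intercept
print(np.polyval([a, b], new_x))            # predicted turnovers
```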

FORECAST function

Forecasting in Excel is implemented using several functions. One of them is called FORECAST. It is similar to TREND, i.e., it gives the result of a calculation using the least squares method, but only for a single x for which the value of y is unknown.

Now you know the Excel formulas for dummies that allow you to predict the future value of an indicator according to a linear trend.

The problem is to find the coefficients of the linear dependence for which the function of two variables a and b

F(a, b) = Σ (y_i − (a·x_i + b))²

takes the smallest value. That is, for the found a and b, the sum of the squared deviations of the experimental data from the found straight line will be the smallest. This is the whole point of the least squares method.

Thus, the solution of the example is reduced to finding the extremum of a function of two variables.

Derivation of formulas for finding the coefficients. A system of two equations in two unknowns is compiled and solved. We find the partial derivatives of the function F(a, b) with respect to the variables a and b and equate these derivatives to zero.

We solve the resulting system of equations by any method (for example, the substitution method or Cramer's method) and obtain formulas for finding the coefficients by the least squares method (LSM).

With these a and b, the function takes the smallest value.

That's the whole method of least squares. The formula for finding the parameter a contains the sums Σx_i, Σy_i, Σx_i·y_i, Σx_i², and the parameter n, the amount of experimental data. The values of these sums are recommended to be calculated separately. The coefficient b is found after calculating a: b = (Σy_i − a·Σx_i)/n.

The main area of application of such polynomials is the processing of experimental data (the construction of empirical formulas). The fact is that an interpolation polynomial constructed from function values obtained experimentally will be strongly influenced by "experimental noise"; moreover, in interpolation the nodes cannot repeat, i.e., the results of repeated experiments under the same conditions cannot be used. The root-mean-square polynomial smooths the noise and makes it possible to use the results of multiple experiments.

Numerical integration and differentiation. Example.

Numerical integration is the calculation of the value of a definite integral (as a rule, an approximate one). By numerical integration one understands a set of numerical methods for finding the value of a definite integral.

Numerical differentiation is a set of methods for calculating the value of the derivative of a discretely given function.

Integration

Formulation of the problem. The mathematical statement of the problem: it is necessary to find the value of the definite integral

I = ∫ₐᵇ f(x) dx,

where a, b are finite and f(x) is continuous on [a, b].

When solving practical problems, it often happens that the integral is inconvenient or impossible to take analytically: it may not be expressible in elementary functions, the integrand may be given in the form of a table, etc. In such cases, numerical integration methods are used. Numerical integration methods replace the area of a curvilinear trapezoid by a finite sum of areas of simpler geometric shapes that can be calculated exactly. In this sense one speaks of the use of quadrature formulas.

Most methods use the representation of the integral as a finite sum (a quadrature formula):

I ≈ Σ A_i·f(x_i), i = 1, …, n.

The quadrature formulas are based on the idea of ​​replacing the graph of the integrand on the integration interval with functions of a simpler form, which can be easily integrated analytically and, thus, easily calculated. The simplest task of constructing quadrature formulas is realized for polynomial mathematical models.

Three groups of methods can be distinguished:

1. Methods with division of the segment of integration into equal intervals. The division into intervals is done in advance; usually the intervals are chosen equal (to make it easier to calculate the function at the ends of the intervals). Calculate the areas and sum them (the rectangle, trapezoid, and Simpson methods).

2. Methods with partitioning of the segment of integration using special points (Gauss method).

3. Calculation of integrals using random numbers (Monte Carlo method).

Rectangle method. Let the function f(x) be integrated numerically on the segment [a, b]. We divide the segment into N equal intervals. The area of each of the N curvilinear trapezoids can be replaced by the area of a rectangle.

The width of all rectangles is the same and equal to:

h = (b − a)/N.

The height of the rectangles can be chosen as the value of the function on the left boundary of each interval. In this case, the height of the first rectangle will be f(a), of the second f(x_1), …, of the N-th f(x_(N−1)).

If we take the value of the function on the right boundary as the height of the rectangle, then the height of the first rectangle will be f(x_1), of the second f(x_2), …, of the N-th f(x_N).

As can be seen, in this case one of the formulas gives an approximation to the integral with an excess, and the other with a deficiency. There is another way: to use the value of the function in the middle of each interval for the approximation:

I ≈ h·(f(m_1) + f(m_2) + … + f(m_N)), where m_i is the midpoint of the i-th interval.

Estimation of the absolute error of the method of rectangles (middle):

|R| ≤ ((b − a)·h²/24)·max|f″(x)|.

Estimation of the absolute error of the methods of left and right rectangles:

|R| ≤ ((b − a)·h/2)·max|f′(x)|.

Example. Calculate I = ∫₀¹ dx/(1 + x²), first for the entire interval and then dividing the interval into four sections.

Solution. Analytical calculation of this integral gives I = arctan(1) − arctan(0) = π/4 ≈ 0.7853981634. In our case:

1) h = 1; x_0 = 0; x_1 = 1;

2) h = 0.25 (1/4); x_0 = 0; x_1 = 0.25; x_2 = 0.5; x_3 = 0.75; x_4 = 1;

We calculate by the method of left rectangles:
I_left = 1·f(0) = 1 (entire interval); I_left = 0.25·(1 + 0.94118 + 0.8 + 0.64) ≈ 0.84529 (four sections).

We calculate by the method of right rectangles:
I_right = 1·f(1) = 0.5 (entire interval); I_right = 0.25·(0.94118 + 0.8 + 0.64 + 0.5) ≈ 0.72029 (four sections).

We calculate by the method of average rectangles:
I_mid = 1·f(0.5) = 0.8 (entire interval); I_mid = 0.25·(0.98462 + 0.87671 + 0.71910 + 0.56637) ≈ 0.78670 (four sections).
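These sums are easy to verify with a short script; this is only a sketch of the computations above for f(x) = 1/(1 + x²) on [0, 1]:

```python
def f(x):
    return 1.0 / (1.0 + x * x)

def left_rect(f, a, b, n):
    h = (b - a) / n
    return h * sum(f(a + i * h) for i in range(n))

def right_rect(f, a, b, n):
    h = (b - a) / n
    return h * sum(f(a + (i + 1) * h) for i in range(n))

def mid_rect(f, a, b, n):
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

for n in (1, 4):
    print(n, left_rect(f, 0, 1, n), right_rect(f, 0, 1, n), mid_rect(f, 0, 1, n))
# For n = 4 this prints approximately 0.84529, 0.72029 and 0.78670,
# against the exact value pi/4 = 0.78540.
```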

Trapezoidal method. Using a polynomial of the first degree for interpolation (a straight line drawn through two points) leads to the trapezoid formula. The ends of the integration segment are taken as interpolation nodes. Thus, the curvilinear trapezoid is replaced by an ordinary trapezoid, whose area can be found as the product of half the sum of the bases and the height:

I ≈ ((f(a) + f(b))/2)·(b − a).

In the case of N segments of integration, for all nodes except the extreme points of the segment, the value of the function is included in the total sum twice (since neighboring trapezoids have one common side).

The trapezoid formula can be obtained by taking half the sum of the rectangle formulas along the right and left edges of the segment:

I ≈ h·((f(x_0) + f(x_N))/2 + f(x_1) + f(x_2) + … + f(x_(N−1))).

Checking the stability of the solution. As a rule, the shorter the length of each interval, i.e., the greater the number of these intervals, the smaller the difference between the approximate and exact values of the integral. This is true for most functions. In the trapezoid method, the error in calculating the integral σ is approximately proportional to the square of the integration step (σ ~ h²). Thus, to calculate the integral of a function over the limits a, b, it is necessary to divide the segment into N_0 intervals and find the sum of the areas of the trapezoids. Then the number of intervals is increased to N_1, the sum of the trapezoids is calculated again, and the resulting value is compared with the previous result. This is repeated, increasing N_i, until the specified accuracy of the result (the convergence criterion) is reached.

For the rectangle and trapezoid methods, usually at each iteration step, the number of intervals increases by a factor of 2 (N i +1 =2N i).

Convergence criterion: |I(N_(i+1)) − I(N_i)| < ε.

The main advantage of the trapezoid rule is its simplicity. However, if the integration requires high precision, this method may require too many iterations.

The absolute error of the trapezoidal method is estimated as

|R| ≤ ((b − a)·h²/12)·max|f″(x)|.
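A minimal sketch of this iterative scheme in Python (the tolerance eps and the starting number of intervals are arbitrary choices for illustration):

```python
def trapezoid(f, a, b, n):
    # Composite trapezoid rule with n equal intervals.
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return h * s

def trapezoid_adaptive(f, a, b, eps=1e-6, n0=4):
    n = n0
    prev = trapezoid(f, a, b, n)
    while True:
        n *= 2                      # N_{i+1} = 2 * N_i
        curr = trapezoid(f, a, b, n)
        if abs(curr - prev) < eps:  # convergence criterion
            return curr
        prev = curr

print(trapezoid_adaptive(lambda x: 1.0 / (1.0 + x * x), 0.0, 1.0))
# converges to pi/4 = 0.7853981...
```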

Example. Calculate the given definite integral approximately using the trapezoid formula.

a) Dividing the integration segment into 3 parts.
b) Dividing the segment of integration into 5 parts.

Solution:
a) By condition, the integration segment must be divided into 3 parts, that is, n = 3.
Calculate the length of each segment of the partition: h = (b − a)/3.

Thus, the general trapezoid formula reduces to a pleasantly compact form:

Finally:

I remind you that the resulting value is an approximate value of the area.

b) We divide the integration segment into 5 equal parts, that is, n = 5. By increasing the number of segments, we increase the accuracy of the calculations.

If n = 5, the trapezoid formula takes the following form:

Let's find the partitioning step: h = (b − a)/5 = 0.6, that is, the length of each intermediate segment is 0.6.

When finishing the task, it is convenient to arrange all the calculations in a table:

In the first row we write the "counter" of the node numbers i = 0, 1, …, 5.

As a result:

Well, there really is a refinement, and a serious one!
If for 3 segments of the partition we obtained one approximation, then for 5 segments the result is noticeably closer to the exact value. If you take even more segments, the result will be more accurate still.

Simpson formula. The trapezoid formula gives a result that strongly depends on the step size h, which affects the accuracy of calculating a definite integral, especially in cases where the function is non-monotonic. One can expect an increase in the accuracy of calculations if, instead of segments of straight lines replacing the curvilinear fragments of the graph of the function f(x), we use, for example, fragments of parabolas drawn through three neighboring points of the graph. A similar geometric interpretation underlies Simpson's method for calculating the definite integral. The entire integration interval [a, b] is divided into N segments; the length of each segment is h = (b − a)/N.

Simpson's formula for a pair of adjacent segments is:

I ≈ (h/3)·(f(x_0) + 4·f(x_1) + f(x_2)),

with remainder term R = −(h⁵/90)·f⁽⁴⁾(ξ), ξ ∈ [x_0, x_2].

With an increase in the length of the segments, the accuracy of the formula decreases; therefore, to increase the accuracy, the composite Simpson formula is used. The entire integration interval is divided into an even number of identical segments N; the length of each segment is again h = (b − a)/N. The composite Simpson formula is:

I ≈ (h/3)·(f(x_0) + f(x_N) + 4·(f(x_1) + f(x_3) + … + f(x_(N−1))) + 2·(f(x_2) + f(x_4) + … + f(x_(N−2)))).

In the formula, the expressions in brackets are the sums of the values of the integrand at the odd and even internal nodes, respectively.

The remainder term of Simpson's formula is already proportional to the fourth power of the step:

|R| ≤ ((b − a)·h⁴/180)·max|f⁽⁴⁾(x)|.

Example: Calculate the integral using Simpson's rule. (Exact solution - 0.2)
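The integrand of this example is not reproduced in the text; purely for illustration, assume f(x) = x⁴ on [0, 1], whose exact integral is indeed 0.2. A sketch of the composite Simpson rule:

```python
def simpson(f, a, b, n):
    # Composite Simpson rule; n must be even.
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    odd = sum(f(a + i * h) for i in range(1, n, 2))   # odd internal nodes
    even = sum(f(a + i * h) for i in range(2, n, 2))  # even internal nodes
    return h / 3.0 * (f(a) + f(b) + 4.0 * odd + 2.0 * even)

print(simpson(lambda x: x ** 4, 0.0, 1.0, 10))  # ~0.2000013, exact value 0.2
```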

Gauss method

Quadrature formula of Gauss. The basic principle of quadrature formulas of the second kind is visible from Figure 1.12: the points x_0 and x_1 must be placed inside the segment [a; b] in such a way that the total areas of the "triangles" are equal to the area of the "segment". When using the Gauss formula, the original segment [a; b] is mapped to the interval [−1; 1] by the change of variable

x = 0.5·(b − a)·t + 0.5·(b + a).

Then ∫ₐᵇ f(x) dx = 0.5·(b − a)·∫₋₁¹ φ(t) dt, where φ(t) = f(0.5·(b − a)·t + 0.5·(b + a)).

This substitution is possible if a and b are finite and the function f(x) is continuous on [a; b]. The Gauss formula for n points t_i, i = 0, 1, …, n−1, inside the interval [−1; 1]:

∫ₐᵇ f(x) dx ≈ 0.5·(b − a)·(A_0·φ(t_0) + A_1·φ(t_1) + … + A_(n−1)·φ(t_(n−1))),  (1.27)

where t_i and A_i for various n are given in reference books. For example, for n = 2: A_0 = A_1 = 1; for n = 3: t_0 = −t_2 ≈ 0.775, t_1 = 0, A_0 = A_2 ≈ 0.555, A_1 ≈ 0.889.

Quadrature formula of Gauss

is obtained with a weight function equal to one, p(x) = 1, and nodes x_i that are the roots of the Legendre polynomials. The coefficients A_i are easily calculated by the formulas

A_i = 2 / ((1 − x_i²)·(P′_(n+1)(x_i))²),  i = 0, 1, 2, …, n.

The values of the nodes and coefficients for n = 2, 3, 4, 5 are given in the table (here n is the maximum node index, so the row n = 2 corresponds to three nodes):

Order | Nodes | Coefficients
n = 2 | x_1 = 0; x_2 = −x_0 = 0.7745966692 | A_1 = 8/9; A_0 = A_2 = 5/9
n = 3 | x_2 = −x_1 = 0.3399810436; x_3 = −x_0 = 0.8611363116 | A_1 = A_2 = 0.6521451549; A_0 = A_3 = 0.3478548451
n = 4 | x_2 = 0; x_3 = −x_1 = 0.5384693101; x_4 = −x_0 = 0.9061798459 | A_2 = 0.5688888889; A_3 = A_1 = 0.4786286705; A_4 = A_0 = 0.2369268851
n = 5 | x_5 = −x_0 = 0.9324695142; x_4 = −x_1 = 0.6612093865; x_3 = −x_2 = 0.2386191861 | A_5 = A_0 = 0.1713244924; A_4 = A_1 = 0.3607615730; A_3 = A_2 = 0.4679139346

Example. Calculate the value using the Gauss formula for n=2:

Exact value: .

The algorithm for calculating the integral by the Gauss formula provides not for doubling the number of micro-segments, but for increasing the number of ordinates by 1 and comparing the obtained values of the integral. The advantage of the Gauss formula is high accuracy with a relatively small number of ordinates. Disadvantages: it is inconvenient for manual calculations, and the values t_i, A_i for various n must be stored in computer memory.
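A sketch of Gauss quadrature in Python; numpy supplies the Legendre nodes and weights via leggauss, and the test integrand 1/(1 + x²) on [0, 1] is taken from the earlier rectangle example purely for illustration:

```python
import numpy as np

def gauss_quad(f, a, b, n):
    # n-point Gauss-Legendre quadrature on [a, b]:
    # map [-1, 1] -> [a, b] by x = 0.5*(b - a)*t + 0.5*(b + a).
    t, A = np.polynomial.legendre.leggauss(n)  # nodes t_i and weights A_i
    x = 0.5 * (b - a) * t + 0.5 * (b + a)
    return 0.5 * (b - a) * np.sum(A * f(x))

f = lambda x: 1.0 / (1.0 + x * x)
for n in (2, 3, 4, 5):
    print(n, gauss_quad(f, 0.0, 1.0, n))  # rapidly approaches pi/4 = 0.7853982
```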

The remainder term of the Gauss quadrature formula on the segment involves the derivative of the integrand of order 2N, with a coefficient α_N that decreases rapidly with the growth of N.

Gauss formulas provide high accuracy already with a small number of nodes (from 4 to 10). At the same time, in practical calculations, the number of nodes can range from several hundred to several thousand. We also note that the weights of Gaussian quadratures are always positive, which ensures the stability of the algorithm for calculating the sums.

The method of least squares (LSM) allows you to estimate various quantities using the results of many measurements containing random errors.

Characteristics of LSM

The main idea of this method is that the sum of squared errors, which one seeks to minimize, is taken as the criterion of the accuracy of the solution of the problem. When using this method, both numerical and analytical approaches can be applied.

In particular, as a numerical implementation, the least squares method implies making as many measurements of an unknown random variable as possible. Moreover, the more calculations, the more accurate the solution will be. On this set of calculations (initial data), another set of proposed solutions is obtained, from which the best one is then selected. If the set of solutions is parametrized, then the least squares method will be reduced to finding the optimal value of the parameters.

As an analytical approach to the implementation of LSM, on the set of initial data (measurements) and the proposed set of solutions, a certain functional is defined, which can be expressed by a formula obtained as a certain hypothesis that needs to be confirmed. In this case, the least squares method reduces to finding the minimum of this functional on the set of squared errors of the initial data.

Note that it is the squares of the errors that are summed, not the errors themselves. Why? The fact is that often the deviations of measurements from the exact value are both positive and negative. When determining the average, simple summation can lead to an incorrect conclusion about the quality of the estimate, since the mutual cancellation of positive and negative values will reduce the effective power of the set of measurements and, consequently, the accuracy of the estimate.

To prevent this from happening, the squared deviations are summed. Moreover, in order to equalize the dimension of the measured value and the final estimate, the square root is extracted from the sum of squared errors.

Some applications of LSM

LSM is widely used in various fields. For example, in probability theory and mathematical statistics, the method is used to determine such a characteristic of a random variable as the standard deviation, which determines the width of the range of values of the random variable.

Approximation of experimental data is a method based on the replacement of experimentally obtained data with an analytical function that passes most closely to, or coincides at the nodal points with, the initial values (the data obtained during an experiment or observation). There are currently two ways to define an analytic function:

By constructing an interpolation polynomial of degree n that passes directly through all points of the given data array. In this case, the approximating function is represented as an interpolation polynomial in the Lagrange form or in the Newton form.

By constructing an approximating polynomial of degree n that passes close to the points of the given data array. Thus, the approximating function smooths out all random noise (or errors) that may occur during the experiment: the values measured during the experiment depend on random factors that fluctuate according to their own random laws (measurement or instrument errors, inaccuracies or experimental errors). In this case, the approximating function is determined by the least squares method.

The least squares method (in the English literature, Ordinary Least Squares, OLS) is a mathematical method based on determining an approximating function built in the closest proximity to the points of a given array of experimental data. The closeness of the initial data and the approximating function F(x) is determined by a numerical measure: the sum of the squared deviations of the experimental data from the approximating curve F(x) should be the smallest.

Fitting curve constructed by the least squares method

The least squares method is used:

To solve overdetermined systems of equations when the number of equations exceeds the number of unknowns;

To search for a solution in the case of ordinary (not overdetermined) nonlinear systems of equations;

For approximating point values ​​by some approximating function.

The approximating function by the least squares method is determined from the condition of the minimum sum of squared deviations of the calculated approximating function from the given array of experimental data. This criterion of the least squares method is written as the following expression:

S = Σ (F(x_i) − y_i)² → min, i = 1, …, n, where:

F(x_i) are the values of the calculated approximating function at the nodal points x_i,

y_i is the specified array of experimental data at the nodal points x_i.

The quadratic criterion has a number of "good" properties, such as differentiability, providing a unique solution to the approximation problem with polynomial approximating functions.

Depending on the conditions of the problem, the approximating function is a polynomial of degree m:

F(x) = c_0 + c_1·x + c_2·x² + … + c_m·x^m.

The degree of the approximating function does not depend on the number of nodal points, but it must always be less than the dimension (the number of points) of the given array of experimental data.

∙ If the degree of the approximating function is m=1, then we approximate the table function with a straight line (linear regression).

∙ If the degree of the approximating function is m=2, then we approximate the table function with a quadratic parabola (quadratic approximation).

∙ If the degree of the approximating function is m=3, then we approximate the table function with a cubic parabola (cubic approximation).

In the general case, when it is required to construct an approximating polynomial of degree m for given tabular values, the condition for the minimum sum of squared deviations over all nodal points is rewritten in the following form:

S = Σ (c_0 + c_1·x_i + … + c_m·x_i^m − y_i)² → min, i = 1, …, n, where:

c_j are the unknown coefficients of the approximating polynomial of degree m,

n is the number of specified table values.

A necessary condition for the existence of a minimum of the function is the equality to zero of its partial derivatives with respect to the unknown variables c_j. As a result, we obtain the following system of equations:

∂S/∂c_j = 2·Σ (c_0 + c_1·x_i + … + c_m·x_i^m − y_i)·x_i^j = 0, j = 0, 1, …, m.

Let's transform the resulting linear system of equations: open the brackets and move the free terms to the right side of the expression. As a result, the system of linear algebraic equations takes the following form:

c_0·Σx_i^j + c_1·Σx_i^(j+1) + … + c_m·Σx_i^(j+m) = Σy_i·x_i^j, j = 0, 1, …, m.

This system of linear algebraic equations can be rewritten in matrix form:

As a result, a system of linear equations of dimension m + 1 in m + 1 unknowns is obtained. This system can be solved by any method for solving systems of linear algebraic equations (for example, by the Gauss method). As a result of the solution, the unknown parameters of the approximating function are found that provide the minimum sum of squared deviations of the approximating function from the original data, i.e., the best possible quadratic approximation. It should be remembered that if even one value of the initial data changes, all the coefficients will change their values, since they are completely determined by the initial data.
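A sketch of assembling and solving this normal system for a polynomial of degree m (the data array is invented for illustration):

```python
import numpy as np

def polyfit_normal(xs, ys, m):
    # Build the (m+1) x (m+1) normal system of the least squares method:
    # sum_k c_k * (sum_i x_i^(k+j)) = sum_i y_i * x_i^j,  j = 0..m
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    A = np.array([[np.sum(xs ** (k + j)) for k in range(m + 1)]
                  for j in range(m + 1)])
    rhs = np.array([np.sum(ys * xs ** j) for j in range(m + 1)])
    return np.linalg.solve(A, rhs)  # coefficients c_0 .. c_m

xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]   # hypothetical nodes
ys = [1.1, 1.4, 2.1, 3.2, 4.9, 7.2]   # hypothetical measurements
c = polyfit_normal(xs, ys, m=2)
print(c)  # c_0, c_1, c_2 of the quadratic approximation
residual = sum((np.polyval(c[::-1], x) - y) ** 2 for x, y in zip(xs, ys))
print(residual)  # the minimum sum of squared deviations
```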

Approximation of initial data by linear dependence

(linear regression)

As an example, consider the method for determining the approximating function given as the linear relationship F(x) = a·x + b. In accordance with the least squares method, the condition for the minimum of the sum of squared deviations is written as follows:

S = Σ (a·x_i + b − y_i)² → min, where:

x_i, y_i are the coordinates of the nodal points of the table,

a and b are the unknown coefficients of the approximating function, which is given as a linear relationship.

A necessary condition for the existence of a minimum of the function is the equality to zero of its partial derivatives with respect to the unknown variables. As a result, we obtain the following system of equations:

∂S/∂a = 2·Σ (a·x_i + b − y_i)·x_i = 0,
∂S/∂b = 2·Σ (a·x_i + b − y_i) = 0.

Let us transform the resulting linear system of equations:

a·Σx_i² + b·Σx_i = Σx_i·y_i,
a·Σx_i + b·n = Σy_i.

We solve the resulting system of linear equations. The coefficients of the approximating function in analytical form are determined as follows (Cramer's method):

a = (n·Σx_i·y_i − Σx_i·Σy_i) / (n·Σx_i² − (Σx_i)²),
b = ((Σx_i²)·(Σy_i) − (Σx_i)·(Σx_i·y_i)) / (n·Σx_i² − (Σx_i)²).

These coefficients provide the construction of a linear approximating function in accordance with the criterion of minimizing the sum of squared deviations of the approximating function from the given tabular values (experimental data).

Algorithm for implementing the method of least squares

1. Initial data:

Given an array of experimental data with the number of measurements N

The degree of the approximating polynomial (m) is given

2. Calculation algorithm:

2.1. The coefficients for constructing the system of equations of dimension (m + 1) × (m + 1) are determined:

the coefficients of the system of equations (the left side of the equations): Σ x_i^(k+j), i = 1, …, N, where j is the index of the column number of the square matrix of the system of equations;

the free terms of the system of linear equations (the right side of the equations): Σ y_i·x_i^k, i = 1, …, N, where k is the index of the row number of the square matrix of the system of equations.

2.2. Formation of the system of linear equations of dimension (m + 1) × (m + 1).

2.3. Solution of a system of linear equations in order to determine the unknown coefficients of the approximating polynomial of degree m.

2.4. Determination of the sum of squared deviations of the approximating polynomial from the initial values over all nodal points:

S_min = Σ (F(x_i) − y_i)², i = 1, …, N.

The found value of the sum of squared deviations is the minimum possible.

Approximation with Other Functions

It should be noted that when approximating the initial data in accordance with the least squares method, a logarithmic function, an exponential function, and a power function are sometimes used as an approximating function.

Log approximation

Consider the case when the approximating function is given by a logarithmic function of the form F(x) = a·ln(x) + b.

Least squares method

The least squares method (OLS, Ordinary Least Squares) is one of the basic methods of regression analysis for estimating the unknown parameters of regression models from sample data. The method is based on minimizing the sum of squares of the regression residuals.

It should be noted that the least squares method itself can be called a method for solving a problem in any area if the solution consists in, or satisfies, a criterion of minimizing the sum of squares of some functions of the unknown variables. Therefore, the least squares method can also be used for an approximate representation (approximation) of a given function by other (simpler) functions, when finding a set of quantities that satisfy equations or constraints whose number exceeds the number of these quantities, etc.

The essence of LSM

Let a (parametric) model of a probabilistic (regression) dependence between the (explained) variable y and a set of factors (explanatory variables) x be given:

y = f(x, b) + ε,

where b is the vector of unknown model parameters and ε is the random model error.

Let there also be sample observations of the values of these variables. Let t be the observation number (t = 1, …, n). Then x_t, y_t are the values of the variables in the t-th observation. Then, for given values of the parameters b, it is possible to calculate the theoretical (model) values of the explained variable y:

ŷ_t = f(x_t, b).

The residuals e_t = y_t − f(x_t, b) depend on the values of the parameters b.

The essence of LSM (ordinary, classical) is to find the parameters b for which the sum of the squares of the residuals (RSS, Residual Sum of Squares) is minimal:

RSS(b) = Σ e_t² = Σ (y_t − f(x_t, b))² → min over b.

In the general case, this problem can be solved by numerical methods of optimization (minimization). In this case, one speaks of nonlinear least squares (NLS or NLLS, Non-Linear Least Squares). In many cases, an analytical solution can be obtained. To solve the minimization problem, it is necessary to find the stationary points of the function by differentiating it with respect to the unknown parameters b, equating the derivatives to zero, and solving the resulting system of equations:

∂RSS/∂b = 0.

If the random errors of the model are normally distributed, have the same variance, and are not correlated with each other, the least squares parameter estimates are the same as the maximum likelihood method (MLM) estimates.

LSM in the case of a linear model

Let the regression dependence be linear:

y = b_1·x_1 + b_2·x_2 + … + b_k·x_k + ε.

Let y be the column vector of observations of the explained variable, and X the matrix of observations of the factors (the rows of the matrix are the vectors of factor values in a given observation, the columns are the vectors of values of a given factor in all observations). The matrix representation of the linear model has the form:

y = Xb + ε.

Then the vector of estimates of the explained variable and the vector of regression residuals will be equal to

ŷ = Xb,  e = y − ŷ = y − Xb;

accordingly, the sum of the squares of the regression residuals will be equal to

RSS = eᵀe = (y − Xb)ᵀ(y − Xb).

Differentiating this function with respect to the parameter vector b and equating the derivatives to zero, we obtain a system of equations (in matrix form):

XᵀX·b = Xᵀy.

The solution of this system of equations gives the general formula for the least squares estimates for the linear model:

b̂ = (XᵀX)⁻¹Xᵀy = (XᵀX/n)⁻¹·(Xᵀy/n).

For analytical purposes, the last representation of this formula turns out to be useful. If the data in the regression model are centered, then in this representation the first matrix has the meaning of the sample covariance matrix of the factors, and the second is the vector of covariances of the factors with the dependent variable. If, in addition, the data are also normalized by the standard deviation (that is, ultimately standardized), then the first matrix has the meaning of the sample correlation matrix of the factors, and the second vector is the vector of sample correlations of the factors with the dependent variable.

An important property of the LSM estimates for models with a constant is that the line of the constructed regression passes through the center of gravity of the sample data, that is, the equality

ȳ = x̄ᵀ·b̂

is fulfilled.

In particular, in the extreme case when the only regressor is a constant, we find that the OLS estimate of the single parameter (the constant itself) is equal to the mean value of the variable being explained. That is, the arithmetic mean, known for its good properties from the laws of large numbers, is also a least squares estimate: it satisfies the criterion of the minimum sum of squared deviations from it.
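A minimal numpy sketch of the matrix formula b̂ = (XᵀX)⁻¹Xᵀy on simulated data (the true parameters and the noise model are arbitrary illustration choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, n)  # true constant 2.0, slope 0.5

X = np.column_stack([np.ones(n), x])       # design matrix with a constant
b_hat = np.linalg.solve(X.T @ X, X.T @ y)  # OLS estimates: (X'X) b = X'y
print(b_hat)

# The regression line passes through the sample "center of gravity":
print(np.mean(y), X.mean(axis=0) @ b_hat)  # the two numbers coincide
```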

Example: simple (pairwise) regression

In the case of paired linear regression y = a + b·x + ε, the calculation formulas are simplified (you can do without matrix algebra):

b̂ = Σ (x_t − x̄)(y_t − ȳ) / Σ (x_t − x̄)² = Cov(x, y)/Var(x),
â = ȳ − b̂·x̄.

Properties of OLS estimates

First of all, we note that for linear models the least squares estimates are linear estimates, as follows from the above formula. For the OLS estimates to be unbiased, it is necessary and sufficient that the most important condition of regression analysis be fulfilled: the mathematical expectation of the random error, conditional on the factors, must be equal to zero. This condition is satisfied, in particular, if

  1. the mathematical expectation of random errors is zero, and
  2. factors and random errors are independent random variables.

The second condition, the condition of exogeneity of the factors, is fundamental. If this property is not satisfied, then we can assume that almost any estimates will be extremely unsatisfactory: they will not even be consistent (that is, even a very large amount of data does not allow obtaining qualitative estimates in this case). In the classical case, a stronger assumption is made about the determinism of the factors, in contrast to the random error, which automatically means that the exogeneity condition is satisfied. In the general case, for the consistency of the estimates, it is sufficient that the exogeneity condition be fulfilled together with the convergence of the matrix XᵀX/n to some non-singular matrix as the sample size increases to infinity.

In order for the estimates of the (usual) LSM to be, in addition to consistency and unbiasedness, also efficient (the best in the class of linear unbiased estimates), additional properties of the random error must hold:

  1. the variance of the random error is the same in all observations (homoscedasticity): V(ε_t) = σ²;
  2. random errors in different observations are uncorrelated: cov(ε_t, ε_s) = 0 for t ≠ s.

These assumptions can be formulated for the covariance matrix of the random error vector: V(ε) = σ²·I.

A linear model that satisfies these conditions is called classical. The least squares estimators for classical linear regression are unbiased, consistent, and the most efficient estimators in the class of all linear unbiased estimators (in the English literature the abbreviation BLUE, Best Linear Unbiased Estimator, is used; in the domestic literature the Gauss-Markov theorem is more often cited). As it is easy to show, the covariance matrix of the vector of coefficient estimates is equal to:

V(b̂) = σ²·(XᵀX)⁻¹.

Generalized least squares

The method of least squares allows for a wide generalization. Instead of minimizing the sum of squares of the residuals, one can minimize some positive definite quadratic form of the residual vector, eᵀWe, where W is some symmetric positive definite weight matrix. Ordinary least squares is a special case of this approach, when the weight matrix is proportional to the identity matrix. As is known from the theory of symmetric matrices (or operators), such matrices admit a decomposition W = PᵀP. Therefore, the specified functional can be represented as eᵀWe = (Pe)ᵀ(Pe), that is, as the sum of the squares of some transformed "residuals". Thus, we can distinguish a class of least squares methods: LS-methods (Least Squares).

It is proved (Aitken's theorem) that for a generalized linear regression model (in which no restrictions are imposed on the covariance matrix of random errors), the most efficient (in the class of linear unbiased estimates) are the estimates of so-called generalized least squares (GLS, Generalized Least Squares): the LS-method with a weight matrix equal to the inverse covariance matrix of the random errors, W = V(ε)⁻¹.

It can be shown that the formula for the GLS estimates of the parameters of the linear model has the form

b̂_GLS = (XᵀV⁻¹X)⁻¹·XᵀV⁻¹y.

The covariance matrix of these estimates, respectively, will be equal to

V(b̂_GLS) = (XᵀV⁻¹X)⁻¹.

In fact, the essence of GLS lies in a certain (linear) transformation (P) of the original data and the application of ordinary least squares to the transformed data. The purpose of this transformation is that, for the transformed data, the random errors already satisfy the classical assumptions.

Weighted least squares

In the case of a diagonal weight matrix (and hence a diagonal covariance matrix of random errors), we have the so-called weighted least squares (WLS, Weighted Least Squares). In this case, the weighted sum of squares of the residuals of the model is minimized, that is, each observation receives a "weight" inversely proportional to the variance of the random error in that observation: w_t = 1/σ_t². In fact, the data are transformed by weighting the observations (dividing by an amount proportional to the assumed standard deviation of the random errors), and ordinary least squares is applied to the weighted data.
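A minimal sketch of this weighting scheme on simulated heteroscedastic data (the noise model is an illustration assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(1, 10, n)
sigma = 0.3 * x                  # error SD grows with x (heteroscedasticity)
y = 1.0 + 2.0 * x + rng.normal(0, sigma)

X = np.column_stack([np.ones(n), x])
w = 1.0 / sigma                  # weight each observation by 1/sigma_t
Xw, yw = X * w[:, None], y * w   # transformed (weighted) data
b_wls = np.linalg.solve(Xw.T @ Xw, Xw.T @ yw)  # ordinary LSM on weighted data
print(b_wls)                     # close to the true (1.0, 2.0)
```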

Some special cases of application of LSM in practice

Linear Approximation

Consider the case when, as a result of studying the dependence of a certain scalar quantity y on a certain scalar quantity x (this can be, for example, the dependence of voltage on current strength: U = k·I, where k is a constant value, the resistance of the conductor), n measurements of these quantities were carried out, yielding the values x_1, …, x_n and their corresponding values y_1, …, y_n. The measurement data should be recorded in a table.

Table. Measurement results.

Measurement No. | x_i | y_i
1 | … | …
2 | … | …
3 | … | …
4 | … | …
5 | … | …
6 | … | …

The question sounds like this: what value of the coefficient k can be chosen to best describe the dependence y = k·x? According to least squares, this value should be such that the sum of the squared deviations of the values y_i from the values k·x_i

S(k) = Σ (y_i − k·x_i)²

is minimal.

The sum of squared deviations has one extremum, a minimum, which allows this condition to be used directly. Let's find the value of the coefficient k from it. To do this, we equate the derivative of S(k) to zero and transform the left side as follows:

dS/dk = −2·Σ x_i·(y_i − k·x_i) = 0, whence k = Σ x_i·y_i / Σ x_i².

The last formula allows us to find the value of the coefficient k, as required in the problem.
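A sketch of this single-coefficient estimate (the current and voltage measurements are invented for illustration):

```python
# Estimate k in y = k*x by least squares: k = sum(x_i*y_i) / sum(x_i^2).
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]   # hypothetical currents, A
ys = [1.1, 2.0, 3.1, 3.9, 5.1, 6.0]   # hypothetical voltages, V

k = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
print(f"k (resistance estimate) = {k:.4f} ohms")
```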

History

Until the beginning of the XIX century, scientists did not have definite rules for solving a system of equations in which the number of unknowns is less than the number of equations; until that time, particular methods were used that depended on the type of equations and on the ingenuity of the calculators, so different calculators, starting from the same observational data, came to different conclusions. Gauss (1795) is credited with the first application of the method, and Legendre (1805) independently discovered and published it under its modern name (French: Méthode des moindres quarrés). Laplace related the method to probability theory, and the American mathematician Adrain (1808) considered its probabilistic applications. The method was spread and improved by the further research of Encke, Bessel, Hansen, and others.

Alternative uses of LSM

The idea of ​​the least squares method can also be used in other cases not directly related to regression analysis. The fact is that the sum of squares is one of the most common proximity measures for vectors (the Euclidean metric in finite-dimensional spaces).

One application is "solving" systems of linear equations in which the number of equations is greater than the number of variables:

Ax = b,

where the matrix A is not square, but rectangular.

Such a system of equations, in the general case, has no exact solution. Therefore, this system can be "solved" only in the sense of choosing a vector x so as to minimize the "distance" between the vectors Ax and b. To do this, you can apply the criterion of minimizing the sum of squared differences of the left and right sides of the equations of the system, that is, ‖Ax − b‖² → min. It is easy to show that the solution of this minimization problem leads to the solution of the following system of equations:

AᵀA·x = Aᵀb.
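A sketch of such a "solution" in Python; numpy's lstsq minimizes ‖Ax − b‖² directly, which is equivalent to solving the normal system AᵀAx = Aᵀb (the small system below is an illustration):

```python
import numpy as np

# Three equations, two unknowns -- generally no exact solution.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)  # minimizes ||Ax - b||^2
x_normal = np.linalg.solve(A.T @ A, A.T @ b)     # same via normal equations
print(x_lstsq, x_normal)                         # the two answers coincide
```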