072. Ordinary Least Squares Method (the line of best fit)

This method is called ordinary least squares (OLS) because it yields the line that minimizes the sum of the squared vertical distances from each observation point to the line itself (Fig. 7.3).

Figure 7.3 – Ordinary Least Squares

In Figure 7.3, Yi is an actual, observed value of the Y variable, and Ŷ is the value on the line predicted by the equation.

Thus, the vertical difference between each value Yi and the corresponding Ŷ is calculated and then squared to yield (Yi - Ŷ)². All the squared differences are then summed, and the line is chosen so that

Σ(Yi - Ŷ)² = min, (7.3)

where min means that this sum is smaller than the summed squared vertical deviations between the actual data points and any other line. Hence the term least squares.

The difference Yi - Ŷ is called the residual, or error.
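The least-squares criterion can be illustrated with a minimal Python sketch. The four data points and the two candidate lines below are hypothetical, chosen only for illustration, and `sum_squared_residuals` is an illustrative helper name rather than anything defined in the text.

```python
def sum_squared_residuals(xs, ys, b0, b1):
    """Sum of squared vertical distances between each observed y and the line b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical observations (not taken from the text).
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

# The least-squares line for these four points is Y-hat = 0 + 1.9X;
# the second line is an arbitrary alternative for comparison.
sse_ols = sum_squared_residuals(xs, ys, b0=0.0, b1=1.9)
sse_other = sum_squared_residuals(xs, ys, b0=1.0, b1=1.5)

print(round(sse_ols, 2), round(sse_other, 2))
```

For these points the least-squares line gives a sum of about 0.70, while the alternative line gives 1.50, so the alternative fits worse in the least-squares sense.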

Assumptions of OLS.

1) The error term is a random variable and is normally distributed.

2) Any two errors are independent of each other, i.e. the error when Xi = 10 is completely independent of the error that occurs when Xi+1 is equal to any other value.

3) All errors have the same variance. This condition is known as homoscedasticity (Fig. 7.4).

Figure 7.4 – Equality of Variance in the Error Term

4) The means of the Y-values all lie on a straight line. Given some value Xi, there will occur a normal distribution of Y-values. This distribution has a mean. The same is true if X is set equal to any other value. OLS assumes that these two means, as well as all others that might be observed, lie on a straight line. This is referred to as the assumption of linearity, and can be expressed as

μY|X = β0 + β1X,

where μY|X is the mean of the population of Y-values for any given value of X.
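The four assumptions can be pictured with a small simulation. This is only a sketch: the population values `beta0`, `beta1` and `sigma` are hypothetical, and `simulate_y` is an illustrative helper name. Each Y is a point on a straight line plus an independent, normally distributed error with the same standard deviation at every X.

```python
import random

def simulate_y(x, beta0, beta1, sigma, rng):
    """Draw one Y for a given x: the point on the line plus a normal error term."""
    error = rng.gauss(0.0, sigma)  # normal, mean 0, same sigma for every x
    return beta0 + beta1 * x + error

rng = random.Random(0)               # fixed seed so the run is repeatable
beta0, beta1, sigma = 4.0, 1.0, 2.0  # hypothetical population values

for x in (10, 15):
    # Each draw uses a fresh error, independent of all the others.
    draws = [simulate_y(x, beta0, beta1, sigma, rng) for _ in range(20000)]
    mean_y = sum(draws) / len(draws)
    # The average of the simulated Y-values should be close to the line at this x.
    print(x, round(mean_y, 2), beta0 + beta1 * x)
```

With many draws, the average Y at each X settles near the value on the line, which is what the linearity assumption describes.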

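To calculate the intercept b0 and the regression coefficient (slope) b1 in Formula (7.2B), the sums of squares and cross-products are used:

b1 = SSxy / SSx, b0 = Ȳ - b1X̄, (7.4)

where

SSx = ΣX² - (ΣX)²/n,

SSxy = ΣXY - (ΣX)(ΣY)/n. (7.5)

Here n is the number of observations, and X̄ and Ȳ are the sample means of X and Y.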

These calculations are extremely sensitive to rounding. In the interest of accuracy, it is better to carry out the calculations to five or six decimal places.

The meaning of b1: it indicates by how much Y will change for every one-unit change in the X-variable.
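The calculation of the two coefficients can be sketched in Python. The sketch assumes the usual sums-of-squares form, SSx = ΣX² - (ΣX)²/n and SSxy = ΣXY - (ΣX)(ΣY)/n, with slope SSxy/SSx and intercept equal to the mean of Y minus the slope times the mean of X. The function name `ols_from_sums` and the sample points are hypothetical.

```python
def ols_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (b0, b1) computed from raw sums via sums of squares and cross-products."""
    ss_x = sum_x2 - sum_x ** 2 / n      # SSx
    ss_xy = sum_xy - sum_x * sum_y / n  # SSxy
    b1 = ss_xy / ss_x                   # slope
    b0 = sum_y / n - b1 * sum_x / n     # intercept: mean(Y) - b1 * mean(X)
    return b0, b1

# Hypothetical points that lie exactly on the line Y = 1 + 2X.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

b0, b1 = ols_from_sums(
    n=len(xs),
    sum_x=sum(xs),
    sum_y=sum(ys),
    sum_xy=sum(x * y for x, y in zip(xs, ys)),
    sum_x2=sum(x * x for x in xs),
)
print(b0, b1)  # prints 1.0 2.0
```

Because the sample points lie exactly on a line, the function recovers that line: the intercept is 1 and the slope is 2, meaning Y rises by 2 for every one-unit increase in X.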

Example 7.4. In order to make decisions regarding allocations for the advertising budget, the accounting department for Hop Scotch Airlines must determine the nature of the relationship between advertising expenditures and the number of passengers. The senior accountant recognizes that regression analysis would be of invaluable assistance.

Table 7.2 – Regression Data for Hop Scotch Airlines

Observation   Advertising, X   Passengers, Y   XY      X²      Y²
(months)      (in $1000's)     (in 1000's)

1             10               15              150     100     225
2             12               17              204     144     289
3             8                13              104     64      169
4             17               23              391     289     529
5             10               16              160     100     256
6             15               21              315     225     441
7             10               14              140     100     196
8             14               20              280     196     400
9             19               24              456     361     576
10            10               17              170     100     289
11            11               16              176     121     256
12            13               18              234     169     324
13            16               23              368     256     529
14            10               15              150     100     225
15            12               16              192     144     256

Total         187              268             3490    2469    4960

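Solution: from Formulas (7.4) and (7.5), with n = 15, the coefficients of the regression model can be determined as

SSx = 2469 - (187)²/15 = 137.733333,

SSxy = 3490 - (187)(268)/15 = 148.933333,

b1 = 148.933333 / 137.733333 = 1.081317, or 1.08,

b0 = 17.866667 - 1.081317(12.466667) = 4.386, or 4.4.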
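The arithmetic of this example can be checked with a short Python sketch. The X and Y values are transcribed from Table 7.2; the variable names are illustrative, and the sums-of-squares formulas are assumed to have the usual form (slope = SSxy/SSx, intercept = mean of Y minus slope times mean of X).

```python
# Data transcribed from Table 7.2:
# advertising (X, in $1000's) and passengers (Y, in 1000's).
advertising = [10, 12, 8, 17, 10, 15, 10, 14, 19, 10, 11, 13, 16, 10, 12]
passengers = [15, 17, 13, 23, 16, 21, 14, 20, 24, 17, 16, 18, 23, 15, 16]

n = len(advertising)
sum_x = sum(advertising)
sum_y = sum(passengers)
sum_xy = sum(x * y for x, y in zip(advertising, passengers))
sum_x2 = sum(x * x for x in advertising)

ss_x = sum_x2 - sum_x ** 2 / n      # SSx
ss_xy = sum_xy - sum_x * sum_y / n  # SSxy
b1 = ss_xy / ss_x                   # slope
b0 = sum_y / n - b1 * sum_x / n     # intercept

# Same intercept formula, but with the slope rounded to two decimals first.
b0_from_rounded_slope = sum_y / n - round(b1, 2) * sum_x / n

print(sum_x, sum_y, sum_xy, sum_x2)
print(round(b1, 4), round(b0, 4), round(b0_from_rounded_slope, 4))
```

The column totals reproduce the Total row of the table. Carrying full precision gives an intercept of about 4.386, while rounding the slope to 1.08 before computing the intercept gives about 4.403, which shows how sensitive these calculations are to early rounding.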

The regression equation is therefore

Ŷ = 4.40 + 1.08X.

Interpretation:

1) The model tells us that if, for example, $10000 is spent on advertising (X = 10), then

Ŷ = 4.40 + 1.08(10) = 15.2,

i.e. an estimated 15200 passengers will choose to fly Hop Scotch.

2) Since b1 = 1.08, for every additional $1000 that Hop Scotch spends on advertising, an estimated 1080 more passengers will choose this airline, i.e. if advertising is increased by one unit to $11000, the estimate of total passengers becomes Ŷ = 4.40 + 1.08(11) = 16.28, or 16280 passengers.
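Both interpretations can be reproduced with a small Python sketch that uses the rounded coefficients from the regression equation. The function name `predict_passengers` is hypothetical.

```python
def predict_passengers(advertising_thousands, b0=4.40, b1=1.08):
    """Estimated passengers (in 1000's) for advertising spending given in $1000's."""
    return b0 + b1 * advertising_thousands

at_10 = predict_passengers(10)  # $10000 spent on advertising
at_11 = predict_passengers(11)  # one more unit, i.e. $11000

# The difference between the two estimates equals the slope b1.
print(round(at_10, 2), round(at_11, 2), round(at_11 - at_10, 2))
```

The two estimates are 15.2 and 16.28 (in thousands of passengers), and their difference is 1.08, the value of b1.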

Because regression alone provides no evidence of a cause-and-effect relationship, the simultaneous increase in X and Y may have been caused by an unknown third variable excluded from the study. It is a common misconception to assume that a cause-and-effect relationship exists between the two variables.
