4%, so that the two variables together explain about 68% of the variation in total sales. However, if we look at the p-values for the individual variables in the multiple regression, we see that, even at the 0.1 level, neither variable is a significant explanatory variable. What happened? In each simple regression, the variable is highly significant, and in the multiple regression the two are significant collectively, but not individually. This apparent contradiction is explained when we observe that the number of ads is highly correlated with their cost. In fact, the correlation between these two variables is r = 0.8949, so we have a multicollinearity problem in the data. We might ask why these two variables are not perfectly correlated. The reason is that the cost of an advertisement varies slightly, depending on where it appears in the newspaper. For example, on Sunday, ads placed in the television section cost more than those in the news section, and the manager of Pizza Shack has placed ads in each of these sections on different occasions. Since X1 and X2 are closely related, each in effect explains the same part of the variability of Y. This is why we get r² = 61.0% in the first simple regression, r² = 67.3% in the second simple regression, and an r² of only 68.4% in the multiple regression.
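The pattern described above is easy to reproduce numerically. The following is a minimal NumPy sketch on simulated stand-in data (the variable names mirror the example, but the values are not the actual Pizza Shack figures): two highly correlated predictors each achieve a high R² on their own, yet using both together adds very little.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 12

# Illustrative stand-ins for the Pizza Shack variables (simulated, not the
# textbook data): COST closely tracks ADS, so the two predictors overlap.
ads = rng.uniform(5, 15, n)
cost = 3.0 * ads + rng.normal(0, 4, n)
sales = 7.0 + 2.0 * cost + rng.normal(0, 4, n)

def r_squared(X, y):
    """Least-squares fit of y on X (plus an intercept); returns R^2."""
    X1 = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ b
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

print(f"r(ADS, COST)            = {np.corrcoef(ads, cost)[0, 1]:.4f}")
print(f"R^2, SALES on ADS only  = {r_squared(ads, sales):.3f}")
print(f"R^2, SALES on COST only = {r_squared(cost, sales):.3f}")
print(f"R^2, SALES on both      = {r_squared(np.column_stack([ads, cost]), sales):.3f}")
```

Because the two predictors carry nearly the same information, the multiple-regression R² sits only slightly above the better of the two simple-regression R² values, just as in the text.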

Regression analysis (Minitab output, Figure 13-8)
The regression equation is: SALES = 6.58 + 0.62 ADS + 2.14 COST
Predictors: Constant, ADS, COST
s = 3.989

Using the number of ads as a second explanatory variable, in addition to the cost of the ads, explains only about an additional 1% of the variation in total sales. At this point, it is fair to ask: which variable really explains the variation in total sales in the multiple regression? The answer is that both do, but we cannot separate their individual contributions, because they are highly correlated with each other. Consequently, their coefficients in the multiple regression have high standard errors, relatively low computed t-values, and relatively high Prob > |t| values. How does this multicollinearity affect us? We can still make relatively accurate predictions when it is present: note that for the multiple regression (the output is given in Figure 13-8), the standard error of the estimate, which determines the width of the confidence intervals for the predictions, is 3.989.
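The effect on the coefficients can also be seen directly. In this NumPy sketch (again on simulated stand-in data, not the textbook figures), the standard error of the COST slope is far larger in the multiple regression than in the simple regression, while the standard error of the estimate, which governs the width of prediction intervals, remains small:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 12

# Simulated stand-ins for the Pizza Shack variables (illustrative only).
ads = rng.uniform(5, 15, n)
cost = 3.0 * ads + rng.normal(0, 3, n)   # highly correlated with ads
sales = 7.0 + 2.0 * cost + rng.normal(0, 4, n)

# Multiple regression of SALES on ADS and COST.
X = np.column_stack([np.ones(n), ads, cost])
b, *_ = np.linalg.lstsq(X, sales, rcond=None)
resid = sales - X @ b
s2 = resid @ resid / (n - 3)             # squared standard error of estimate
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))  # coefficient std. errors

# Collinearity inflates the slope standard errors, so t = b/se is small
# for each predictor even though the joint fit is good ...
for name, bj, sej in zip(["const", "ADS", "COST"], b, se):
    print(f"{name:5s} coef={bj:8.3f}  se={sej:7.3f}  t={bj / sej:6.2f}")

# ... compare with the simple regression of SALES on COST alone:
Xs = np.column_stack([np.ones(n), cost])
bs, *_ = np.linalg.lstsq(Xs, sales, rcond=None)
rs = sales - Xs @ bs
se_cost_simple = np.sqrt((rs @ rs / (n - 2)) * np.linalg.inv(Xs.T @ Xs)[1, 1])
print(f"se(COST): simple = {se_cost_simple:.3f}, multiple = {se[2]:.3f}")

# Predictions stay precise: their interval width is driven by sqrt(s2).
print(f"standard error of estimate = {np.sqrt(s2):.3f}")
```

This is the standard variance-inflation story: with correlation r between the two predictors, the variance of each slope estimate is inflated by roughly 1/(1 - r²), while the overall fit, and hence the prediction accuracy, is essentially unaffected.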
