Butler Trucking: Here we have only one independent variable, x1 (number of km travelled), and r² is only 0.60. (The dependent variable is y: travel time.) Is the fit good?
We now include a second variable (x2: number of deliveries made) to explain the travel time. Here are the MegaStat results for this case with the output comments given here.
Multi-collinearity is a serious problem in multiple regression. It arises when two or more independent variables are highly correlated with each other, and it (i) inflates the standard errors, (ii) distorts the coefficient estimates, and (iii) can make the coefficients (and even their signs) change substantially when a single data point is removed, among other effects.
Multi-collinearity can be checked using the correlation matrix output of MegaStat. See this example, where distance and gasoline usage are correlated: even though the model as a whole is significant, the individual variables are not!
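The same check can be sketched outside MegaStat. The following is a minimal Python illustration, not the course example: the data are made up, the helper `flag_high_correlations` is hypothetical, and the 0.8 cutoff is only a common rule of thumb assumed here. It builds the correlation matrix of three candidate predictors and lists the pairs whose correlation is large in absolute value.

```python
import numpy as np

# Made-up illustrative data (NOT the course data set).
distance = np.array([40.0, 55.0, 60.0, 72.0, 85.0, 90.0, 100.0, 110.0])
# Gasoline use is nearly proportional to distance, so the two predictors overlap.
gasoline = 0.1 * distance + np.array([0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3])
deliveries = np.array([2.0, 4.0, 1.0, 3.0, 2.0, 4.0, 1.0, 3.0])

names = ["distance", "gasoline", "deliveries"]
# np.corrcoef treats each row as one variable.
corr = np.corrcoef(np.vstack([distance, gasoline, deliveries]))


def flag_high_correlations(names, corr, threshold=0.8):
    """Return the pairs of variables whose |correlation| exceeds the threshold.

    The threshold of 0.8 is an assumed rule of thumb, not a value from the notes.
    """
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                flagged.append((names[i], names[j]))
    return flagged


flagged = flag_high_correlations(names, corr)
print(np.round(corr, 2))
print("Highly correlated pairs:", flagged)
```

With data like these, only the distance and gasoline pair is flagged, which suggests keeping just one of the two in the model.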
In the Butler Trucking problem, suppose that either a van (1) or a pickup truck (0) is used for deliveries. With such a dummy variable (x3) in the model, its coefficient beta3 corresponds to the additional time it takes to deliver goods with a van rather than a pickup truck, holding distance and the number of deliveries fixed. The Excel file for this problem is here.
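To see how the dummy coefficient behaves, here is a minimal Python sketch under stated assumptions: the data are made up (they are not the Butler Trucking file), the "true" coefficients in `assumed_beta` are invented for illustration, and travel times are generated without noise so that least squares recovers those coefficients exactly. The model fitted is y = beta0 + beta1*x1 + beta2*x2 + beta3*x3, with x3 = 1 for a van and 0 for a pickup truck.

```python
import numpy as np

# Made-up illustrative data (NOT the Butler Trucking file).
km = np.array([40.0, 55.0, 60.0, 72.0, 85.0, 90.0, 100.0, 110.0])
deliveries = np.array([2.0, 4.0, 1.0, 3.0, 2.0, 4.0, 1.0, 3.0])
van = np.array([1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0])  # dummy: 1 = van, 0 = pickup

# Invented coefficients (beta0, beta1, beta2, beta3), used only to generate y.
assumed_beta = np.array([0.5, 0.06, 0.8, 0.7])

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones_like(km), km, deliveries, van])
travel_time = X @ assumed_beta  # noise-free, so the fit is exact

# Ordinary least squares fit.
beta, _, rank, _ = np.linalg.lstsq(X, travel_time, rcond=None)

# Two trips that are identical except for the vehicle used.
trip_pickup = np.array([1.0, 80.0, 3.0, 0.0])
trip_van = np.array([1.0, 80.0, 3.0, 1.0])
extra_time_with_van = trip_van @ beta - trip_pickup @ beta

print("Estimated coefficients:", np.round(beta, 4))
print("Extra time with a van:", round(float(extra_time_with_van), 4))
```

The difference between the two predictions equals the estimated beta3, which is exactly the interpretation given above. With real, noisy data the estimate would come with a standard error and a p-value that should be checked before reading much into it.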
Real Estate Data (again): This time we use LotSize (x1), SqrFt (x2), Bedrooms (x3) and Bathrooms (x4) as variables and predict the Price (y). The regression equation is obtained as
Price = 17.41 + 12.27*LotSize + 0.01*SqrFt + 55.03*Bedrooms + 20.55*Bathrooms.
So, if your home has LotSize = 1, SqrFt = 2,800, Bedrooms = 6 and Bathrooms = 4, then the predicted Price is $496,457 (the equation gives Price in thousands of dollars).
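The substitution can be checked with a few lines of Python. This is a minimal sketch that uses only the rounded coefficients printed in the equation above; the names `coefficients`, `home` and `predict_price` are illustrative. With these rounded values the sum comes to about 470.06 (in thousands of dollars), which is lower than the $496,457 quoted. A likely explanation, which is an assumption and is not confirmed by these notes, is that the quoted figure was computed with unrounded coefficients: the SqrFt coefficient is multiplied by 2,800, so rounding it to two decimals can shift the prediction considerably.

```python
# Rounded coefficients exactly as printed in the regression equation.
coefficients = {
    "Intercept": 17.41,
    "LotSize": 12.27,
    "SqrFt": 0.01,
    "Bedrooms": 55.03,
    "Bathrooms": 20.55,
}

# The home described in the text.
home = {"LotSize": 1, "SqrFt": 2800, "Bedrooms": 6, "Bathrooms": 4}


def predict_price(coefficients, home):
    """Evaluate the fitted equation: intercept plus coefficient times value."""
    price = coefficients["Intercept"]
    for name, value in home.items():
        price += coefficients[name] * value
    return price


predicted = predict_price(coefficients, home)
print(f"Predicted price (thousands of dollars): {predicted:.2f}")
```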
Also, note that R² = 0.93, and the p-value for the F-test is almost 0. What do these imply?
Actual real estate data for Hamilton homes.