Assignment 2
1) Generate the summary statistics for each variable in the table (use the Data Analysis
ToolPak). Write down your observations. (5 marks)
4) Create a correlation matrix of all the variables (use the Data Analysis ToolPak).
a) Which are the top 3 positively correlated pairs?
b) Which are the top 3 negatively correlated pairs? (5 marks)
Top 3 positively correlated pairs: TAX & DISTANCE, NOX & INDUS, NOX & AGE.
Top 3 negatively correlated pairs: AVG_PRICE & LSTAT, AVG_ROOM & LSTAT,
AVG_PRICE & PTRATIO.
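
The ranking above was read off the ToolPak correlation matrix. As a cross-check, the same ranking can be sketched in a few lines of Python. This is a minimal sketch: the column names "a", "b", "c" and their values are toy placeholders, not the assignment data, and the helper names are made up for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ranked_pairs(columns):
    """Return (r, name_a, name_b) tuples, most positive correlation first."""
    pairs = [
        (pearson(columns[a], columns[b]), a, b)
        for a, b in combinations(columns, 2)
    ]
    return sorted(pairs, reverse=True)


# Toy columns for illustration only (NOT the housing table).
toy = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10.5],
    "c": [5, 4, 3, 2, 1],
}

ranking = ranked_pairs(toy)
top_positive = ranking[:3]      # first entries: strongest positive pairs
top_negative = ranking[::-1][:3]  # last entries reversed: strongest negative pairs
```

With the real table, each key would be one variable (TAX, NOX, LSTAT and so on) and each list one column of values.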
5) Build an initial regression model with AVG_PRICE as ‘y’ (Dependent variable) and LSTAT
variable as Independent Variable. Generate the residual plot. (8 marks)
a) What do you infer from the Regression Summary output in terms of variance explained,
coefficient value, Intercept, and the Residual plot?
b) Is the LSTAT variable significant for the analysis based on your model?
a) The Regression Summary output shows how well the model fits the data and
how the independent variable relates to the dependent variable. The R-square
value is low, so LSTAT alone does not explain the variation in AVG_PRICE very
well. The negative coefficient for LSTAT means that the predicted price goes
down as LSTAT goes up. The residual plot shows no clear pattern, which
suggests no obvious problem with the linear model.
b) The p-value for the LSTAT variable is less than 0.05, so LSTAT is a
significant variable.
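
The quantities discussed above (coefficient, intercept, R-square and residuals) can be reproduced outside Excel. The following is a minimal sketch of simple least-squares regression. The function name and the toy x/y values are assumptions made for illustration; they are not the LSTAT and AVG_PRICE columns.

```python
def fit_simple_ols(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual = observed value minus fitted value; these are what the
    # residual plot shows against x.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    ss_res = sum(e ** 2 for e in residuals)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot  # share of variance explained
    return slope, intercept, r_squared, residuals


# Toy values for illustration only (NOT the assignment data).
toy_x = [1, 2, 3, 4, 5]
toy_y = [10, 8, 7, 4, 1]
slope, intercept, r_squared, residuals = fit_simple_ols(toy_x, toy_y)
```

A negative slope here plays the same role as the negative LSTAT coefficient: the fitted value falls as x rises. Residuals scattered around zero with no visible trend are what "no pattern" in the residual plot means.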
6) Build a new Regression model including LSTAT and AVG_ROOM together as independent
variables and AVG_PRICE as dependent variable. (6 marks)
a) Write the Regression equation. If a new house in this locality has 7 rooms (on average)
and has a value of 20 for LSTAT, then what will be the value of AVG_PRICE? How does it
compare to the company quoting a value of 30000 USD for this locality? Is the company
overcharging or undercharging?
b) Is the performance of this model better than the previous model you built in Question 5?
Compare in terms of adjusted R-square and explain.
a) Regression equation: AVG_PRICE = -1.3582 + 5.0947*AVG_ROOM - 0.6423*LSTAT.
For AVG_ROOM = 7 and LSTAT = 20, the predicted price is
-1.3582 + 35.6629 - 12.8460 = 21.4587, or about 21.46K USD. Since the quoted
price is 30K USD, the company is overcharging.
b) The adjusted R-square value is higher than in the previous model, so this
model explains the dependent variable better than the model from Question 5.
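
The prediction in part a) can be checked with a short calculation. This sketch uses the coefficients from the regression equation above and assumes, as the answer does, that AVG_PRICE is expressed in thousands of USD. The function and variable names are illustrative.

```python
# Coefficients copied from the regression equation in answer 6 a).
INTERCEPT = -1.3582
COEF_AVG_ROOM = 5.0947
COEF_LSTAT = -0.6423


def predict_avg_price(avg_room, lstat):
    """Predicted AVG_PRICE (assumed to be in thousands of USD)."""
    return INTERCEPT + COEF_AVG_ROOM * avg_room + COEF_LSTAT * lstat


predicted = predict_avg_price(avg_room=7, lstat=20)

# The company's quote of 30000 USD, in the same unit (thousands of USD).
quoted = 30.0
verdict = "overcharging" if quoted > predicted else "undercharging"

print(f"predicted: {predicted:.4f}K USD, quoted: {quoted}K USD, {verdict}")
```

The quote is roughly 8.5K USD above the model's prediction, which is why the answer calls it overcharging.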
7) Build another Regression model with all variables, where AVG_PRICE alone is the
Dependent Variable and all the other variables are independent. Interpret the output in
terms of adjusted R-square, coefficient and Intercept values. Explain the significance of each
independent variable with respect to AVG_PRICE. (8 marks)
The R-square value is 0.69 (69%), which indicates a reasonably good fit. NOX,
TAX, PTRATIO and LSTAT have negative coefficients, so an increase in any of
these variables is associated with a decrease in the average price. All other
variables have positive coefficients.
Crime rate is the only variable whose p-value is not less than 0.05. Therefore,
all variables except crime rate are significant for the prediction of the
average price.
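
Question 7 asks about adjusted R-square, which penalises R-square for the number of independent variables. The sketch below shows the standard formula. The R-square of 0.69 comes from the answer above, but the number of observations (100) and the predictor counts (8 and 9) are hypothetical values chosen only to show the effect, since the answer does not state them.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R-square = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r_squared) * (n_obs - 1) / dof


# Hypothetical sample size and predictor counts, for illustration only.
with_nine_predictors = adjusted_r_squared(0.69, n_obs=100, n_predictors=9)
with_eight_predictors = adjusted_r_squared(0.69, n_obs=100, n_predictors=8)

# Same R-square with one predictor fewer gives a higher adjusted R-square.
print(round(with_nine_predictors, 4), round(with_eight_predictors, 4))
```

This is why a variable that adds almost nothing to R-square can lower adjusted R-square, and why removing it can raise adjusted R-square.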
8) Pick out only the significant variables from the previous question. Make another instance
of the Regression model using only the significant variables you just picked and answer the
questions below: (8 marks)
a) Interpret the output of this model.
b) Compare the adjusted R-square value of this model with the model in the previous
question, which model performs better according to the value of adjusted R-square?
c) Sort the values of the Coefficients in ascending order. What will happen to the average
price if the value of NOX is more in a locality in this town?
d) Write the regression equation from this model.
a) This model has an R-square value very similar to that of the previous model
but a slightly higher adjusted R-square value. All the p-values are less than
0.05, so all the variables are significant.
b) Since this model has a slightly higher adjusted R-square value, it performs
better than the previous model.
c) Since the NOX variable has a negative coefficient, a higher value of NOX
leads to a lower average price.
d) Equation: AVG_PRICE = 29.42 - 10.27*NOX - 1.07*PTRATIO - 0.60*LSTAT - 0.01*TAX
+ 0.03*AGE + 0.13*INDUS + 0.26*DISTANCE + 4.12*AVG_ROOM.
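
The effect described in part c) can be made concrete with the coefficients from part d). This is a minimal sketch: the coefficients are the rounded values from the equation above, while the feature values for the "base" locality are placeholders (all set to 1.0) chosen only so the function can be evaluated. The names are illustrative.

```python
# Rounded coefficients copied from the equation in answer 8 d).
INTERCEPT = 29.42
COEFFICIENTS = {
    "NOX": -10.27,
    "PTRATIO": -1.07,
    "LSTAT": -0.60,
    "TAX": -0.01,
    "AGE": 0.03,
    "INDUS": 0.13,
    "DISTANCE": 0.26,
    "AVG_ROOM": 4.12,
}


def predict_avg_price_full(features):
    """Predicted AVG_PRICE from a dict mapping variable name to value."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * features[name] for name in COEFFICIENTS
    )


# Coefficients sorted in ascending order, as part c) asks.
ascending = sorted(COEFFICIENTS.items(), key=lambda item: item[1])

# Placeholder locality (NOT real data): every feature set to 1.0.
base = {name: 1.0 for name in COEFFICIENTS}
more_nox = dict(base, NOX=base["NOX"] + 0.1)

# Raising NOX by 0.1 with everything else fixed changes the prediction
# by 0.1 * (-10.27) = -1.027.
change = predict_avg_price_full(more_nox) - predict_avg_price_full(base)
```

Because the model is linear, the change in predicted price depends only on the NOX coefficient and the size of the NOX increase, not on the placeholder values of the other variables.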