Assignment 2

The document discusses a real estate agency that employs auditors to study various geographic and property features of houses to estimate their pricing. The agency has provided a dataset of 506 houses in Boston with details on crime rates, pollution levels, education facilities, distance from highways, and other variables. The data will be analyzed to build regression models to predict average home prices based on the other variables.

Uploaded by

Gnaneshwar Rao


Terro’s real estate agency

Terro’s real-estate is an agency that estimates the pricing of houses in a certain
locality. The price is determined based on different features or factors of a
property. This also helps the agency identify the business value of a property.
To do this, the company employs an “Auditor”, who studies various geographic
features of a property such as pollution level (NOX), crime rate, education
facilities (pupil-to-teacher ratio), connectivity (distance from highway), etc.
This helps in determining the price of a property. The agency has provided a
dataset of 506 houses in Boston. The details of the dataset are as follows:

Data Dictionary: The data consists of the following variables.

CRIME RATE   Per capita crime rate by town
INDUSTRY     Proportion of non-retail business acres per town (in percentage terms)
NOX          Nitric oxides concentration (parts per 10 million)
AVG_ROOM     Average number of rooms per house
AGE          Proportion of houses built prior to 1940 (in percentage terms)
DISTANCE     Distance from highway (in miles)
TAX          Full-value property-tax rate per $10,000
PTRATIO      Pupil-teacher ratio by town
LSTAT        % lower status of the population
AVG_PRICE    Average value of houses in $1000's

EXCEL WEEK 2 ASSESSMENT:
GNANESHWAR RAO R

1) Generate the summary statistics for each variable in the table (use the Data Analysis
ToolPak). Write down your observations. (5 marks)

Based on the measures of shape, ‘AVG_ROOM’ has the sharpest peak because it has
the highest kurtosis, while ‘AVG_PRICE’ is the most positively skewed variable.
Based on the measures of variability, ‘TAX’ has the highest standard deviation,
indicating that its values are the most spread out in absolute terms (the
variables are measured on different scales, so their standard deviations are not
directly comparable).
Based on the minimum and maximum values, ‘TAX’ and ‘AGE’ have very wide ranges,
which suggests that outliers may be present in these variables.
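The skewness and kurtosis values discussed above can be reproduced outside Excel. The sketch below assumes the Analysis ToolPak uses the same bias-corrected formulas as Excel's SKEW and KURT worksheet functions (sample standard deviation, excess kurtosis); the function names are illustrative, and the sample lists are hypothetical rather than columns from the dataset.

```python
from statistics import mean, stdev


def sample_skewness(values):
    """Bias-corrected sample skewness (assumed to match Excel's SKEW)."""
    n = len(values)
    m, s = mean(values), stdev(values)  # stdev uses the n - 1 divisor
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)


def sample_excess_kurtosis(values):
    """Bias-corrected excess kurtosis (assumed to match Excel's KURT)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    fourth = sum(((x - m) / s) ** 4 for x in values)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


if __name__ == "__main__":
    symmetric = [1, 2, 3, 4, 5]          # hypothetical, perfectly symmetric
    right_tailed = [1, 1, 1, 2, 10]      # hypothetical, long right tail
    print(sample_skewness(symmetric))         # approximately 0
    print(sample_excess_kurtosis(symmetric))  # approximately -1.2 (flatter than normal)
    print(sample_skewness(right_tailed) > 0)  # positive skew
```

A positive kurtosis value indicates a sharper peak and heavier tails than a normal curve; a negative value indicates a flatter distribution.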

2) Plot a histogram of the Avg_Price variable. What do you infer? (5 marks)

Based on the shape of the distribution, the AVG_PRICE variable is positively
skewed: most of the values lie below (to the left of) the mean, and a long tail
extends to the right.
Because the long tail is on the right side, this is called right-skewed or
positively skewed data.
In a positively skewed distribution, the central tendency measures typically
satisfy the following inequality:
Mean > Median > Mode.
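The histogram reading and the Mean > Median > Mode relationship can be checked with a short sketch. The prices below are hypothetical values chosen to have a long right tail, not the actual AVG_PRICE column, and `histogram_counts` is an illustrative helper. Its bins are left-closed, [start, start + width); to our understanding, Excel's Histogram tool instead counts values up to and including each bin's upper bound, so edge values may land in a different bar.

```python
from statistics import mean, median, mode


def histogram_counts(values, bin_width):
    """Count values per bin [start, start + bin_width), like the bars of a histogram."""
    counts = {}
    for x in values:
        start = (x // bin_width) * bin_width  # left edge of the bin containing x
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Hypothetical prices in $1000s with a long right tail
    prices = [12, 15, 18, 18, 20, 21, 22, 23, 24, 35, 48, 50]
    print(histogram_counts(prices, 10))  # {10: 4, 20: 5, 30: 1, 40: 1, 50: 1}
    print(mean(prices), median(prices), mode(prices))  # mean > median > mode
```

Most of the counts fall in the low bins while a few large values stretch the range to the right, which pulls the mean above the median.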

3) Compute the covariance matrix. Share your observations. (5 marks)

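From the covariance matrix, we can infer that TAX and AGE have the highest
covariance, followed by TAX and DISTANCE.
Meanwhile, TAX and AVG_PRICE have the lowest covariance.
Because covariance depends on the units of each variable, the large values
involving TAX partly reflect its large scale; the correlation matrix in
Question 4 gives a scale-free comparison.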
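A minimal sketch of how a covariance matrix is built, using hypothetical helper names. It assumes the Analysis ToolPak Covariance tool reports population covariance (dividing by n, like COVARIANCE.P); that is our understanding rather than something stated in this report, so the `population` flag lets you switch to the sample version (n - 1).

```python
from statistics import mean


def covariance(x, y, population=True):
    """Covariance of two equal-length lists; divides by n if population, else by n - 1."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    mx, my = mean(x), mean(y)
    total = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return total / (len(x) if population else len(x) - 1)


def covariance_matrix(columns, population=True):
    """Pairwise covariances for a dict of {variable_name: list_of_values}."""
    names = list(columns)
    return {(a, b): covariance(columns[a], columns[b], population)
            for a in names for b in names}


if __name__ == "__main__":
    # Hypothetical data: rescaling one variable rescales its covariances
    x, y = [1, 2, 3], [2, 4, 6]
    print(covariance(x, y))                    # population covariance
    print(covariance(x, [10 * v for v in y]))  # ten times larger, same relationship
```

The second print illustrates why covariance magnitudes are not comparable across variables measured in different units.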

4) Create a correlation matrix of all the variables (use the Data Analysis ToolPak).
a) Which are the top 3 positively correlated pairs.
b) Which are the top 3 negatively correlated pairs. (5 marks)

Top 3 positively correlated pairs: TAX & DISTANCE, NOX & INDUSTRY, NOX & AGE.
Top 3 negatively correlated pairs: AVG_PRICE & LSTAT, AVG_ROOM & LSTAT,
AVG_PRICE & PTRATIO.
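Ranking the pairs in a correlation matrix can be sketched as follows. `pearson` and `ranked_pairs` are illustrative names, and the three-column dataset in the usage example is hypothetical; with the real data you would pass all ten variables.

```python
import math
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranked_pairs(columns, top=3):
    """Return the `top` most positive and `top` most negative distinct variable pairs."""
    pairs = [((a, b), pearson(columns[a], columns[b]))
             for a, b in combinations(columns, 2)]  # each unordered pair once
    pairs.sort(key=lambda item: item[1])  # ascending by correlation
    return pairs[::-1][:top], pairs[:top]


if __name__ == "__main__":
    data = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1]}  # hypothetical
    positive, negative = ranked_pairs(data)
    print(positive)
    print(negative)
```

Using `combinations` skips the diagonal (each variable with itself, always 1) and the mirrored upper/lower halves of the matrix, so each pair is ranked only once.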

5) Build an initial regression model with AVG_PRICE as ‘y’ (Dependent variable) and LSTAT
variable as Independent Variable. Generate the residual plot. (8 marks)
a) What do you infer from the Regression Summary output in terms of variance explained,
coefficient value, Intercept, and the Residual plot?
b) Is the LSTAT variable significant for the analysis based on your model?
a) The Regression Summary output shows how well the model fits the data and how
the independent variable relates to the dependent variable. Since the R square
value is relatively low, LSTAT alone does not explain the variation in price
very well. The negative coefficient of LSTAT means that the price tends to go
down as LSTAT goes up. The residual plot shows no obvious pattern, suggesting no
major problems with the regression model.
b) The p-value for LSTAT is less than 0.05, so it is considered a significant
variable.
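The quantities in a single-variable Regression Summary (intercept, coefficient, R square, residuals) come from ordinary least squares. A minimal sketch with a hypothetical `simple_ols` helper; plotting the returned residuals against the x values gives a residual plot like the one discussed above. P-values are not computed here.

```python
from statistics import mean


def simple_ols(x, y):
    """Least-squares fit y = intercept + slope * x.

    Returns (intercept, slope, r_squared, residuals).
    """
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    fitted = [intercept + slope * a for a in x]
    residuals = [b - f for b, f in zip(y, fitted)]
    ss_res = sum(r ** 2 for r in residuals)       # unexplained variation
    ss_tot = sum((b - my) ** 2 for b in y)        # total variation
    return intercept, slope, 1 - ss_res / ss_tot, residuals


if __name__ == "__main__":
    # Hypothetical LSTAT values and prices ($1000s); not the real dataset
    lstat = [5, 10, 15, 20, 25]
    price = [30, 25, 21, 18, 15]
    b0, b1, r2, res = simple_ols(lstat, price)
    print(b0, b1, r2)  # negative slope: price falls as LSTAT rises
```

R square is the share of the total variation in y that the fitted line accounts for, which is why a low value means the single predictor leaves much of the price variation unexplained.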

6) Build a new Regression model including LSTAT and AVG_ROOM together as independent
variables and AVG_PRICE as dependent variable. (6 marks)
a) Write the Regression equation. If a new house in this locality has 7 rooms (on an average)
and has a value of 20 for L-STAT, then what will be the value of AVG_PRICE? How does it
compare to the company quoting a value of 30000 USD for this locality? Is the company
Overcharging/Undercharging?
b) Is the performance of this model better than the previous model you built in Question 5?
Compare in terms of adjusted R-square and explain.
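a) Regression equation: AVG_PRICE = -1.3582 + 5.0947*AVG_ROOM - 0.6423*LSTAT.
For AVG_ROOM = 7 and LSTAT = 20:
-1.3582 + 5.0947*7 - 0.6423*20 = -1.3582 + 35.6629 - 12.846 = 21.4587,
i.e. a predicted price of about 21.46 thousand USD ($21,459). Since the quoted
price is 30,000 USD, the company is overcharging by roughly $8,541.
b) The adjusted R square value is higher than that of the previous model, so this
model explains the dependent variable better than the model in Question 5.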
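The prediction arithmetic and the adjusted R-square comparison can be sketched as below. The coefficients are the ones reported in the Question 6 equation; the adjusted R-square function uses the standard textbook formula with n observations (506 houses here) and k independent variables, and both function names are illustrative.

```python
def predict_avg_price(avg_room, lstat):
    """AVG_PRICE (in $1000s) from the Question 6 regression equation."""
    return -1.3582 + 5.0947 * avg_room - 0.6423 * lstat


def adjusted_r_squared(r_squared, n, k):
    """Adjusted R-square for n observations and k independent variables."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


if __name__ == "__main__":
    predicted = predict_avg_price(7, 20)
    print(round(predicted, 4))       # 21.4587 (thousand USD)
    print(round(30 - predicted, 2))  # gap to the $30,000 quote, in $1000s
```

Adjusted R-square penalizes each extra predictor through the (n - k - 1) term, so it only rises when a new variable (here AVG_ROOM) adds more explanatory power than chance alone would.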

7) Build another Regression model with all variables, where AVG_PRICE alone is the
Dependent Variable and all the other variables are independent. Interpret the output in
terms of adjusted R-square, coefficient and Intercept values. Explain the significance of each
independent variable with respect to AVG_PRICE. (8 marks)
The R squared value is 0.69 (69%), which indicates a reasonably good fit. NOX,
TAX, PTRATIO and LSTAT have negative coefficients, indicating that an increase
in these variables is associated with a decrease in the average price; all
other variables have positive coefficients.
Crime rate is the only variable whose p-value is not less than 0.05. Therefore,
all variables except ‘crime rate’ are significant for predicting the average
price.
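With several independent variables, the coefficients come from solving the least-squares normal equations (X'X)b = X'y. The sketch below is the textbook approach with Gauss-Jordan elimination, not necessarily Excel's internal algorithm; `fit_multiple_ols` is a hypothetical name, it returns coefficients only (no standard errors or p-values), and the usage data is hypothetical.

```python
def fit_multiple_ols(rows, y):
    """Least-squares coefficients [intercept, b1, ..., bk] via the normal equations.

    rows: list of observations, each a list of k predictor values.
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]  # leading 1 for the intercept
    n, p = len(X), len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    # The left block is now diagonal, so each coefficient is a single division
    return [A[r][p] / A[r][r] for r in range(p)]


if __name__ == "__main__":
    # Hypothetical data generated exactly from y = 1 + 2*a - 3*b
    rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
    y = [1, 3, -2, 0, 2]
    print(fit_multiple_ols(rows, y))  # close to [1, 2, -3]
```

The normal equations are easy to follow but can lose precision when predictors are highly correlated (as TAX and DISTANCE are here); statistical packages often use QR decomposition instead for that reason.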

8) Pick out only the significant variables from the previous question. Make another instance
of the Regression model using only the significant variables you just picked and answer the
questions below: (8 marks)
a) Interpret the output of this model.
b) Compare the adjusted R-square value of this model with the model in the previous
question, which model performs better according to the value of adjusted R-square?
c) Sort the values of the Coefficients in ascending order. What will happen to the average
price if the value of NOX is more in a locality in this town?
d) Write the regression equation from this model.
a) This model has an R squared value very similar to the previous model but a
slightly higher adjusted R square value. All the p-values are less than 0.05,
so all the variables are significant.
b) Since this model has a slightly higher adjusted R square value, it explains
the y variable better.
c) In ascending order, the coefficients are: NOX (-10.27), PTRATIO (-1.07),
LSTAT (-0.60), TAX (-0.01), AGE (0.03), INDUSTRY (0.13), DISTANCE (0.26),
AVG_ROOM (4.12). Since NOX has a negative coefficient (the most negative of
all), a higher value of NOX in a locality is associated with a lower average
price.
d) Equation: AVG_PRICE = 29.42 - 10.27*NOX - 1.07*PTRATIO - 0.60*LSTAT - 0.01*TAX
+ 0.03*AGE + 0.13*INDUSTRY + 0.26*DISTANCE + 4.12*AVG_ROOM.
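Sorting the coefficients and evaluating the final equation can be sketched as below. The numbers are the coefficients reported for the Question 8 model, keyed by the variable names from the data dictionary (INDUSTRY for the industry variable); the helper names are illustrative.

```python
# Coefficients of the Question 8 model (significant variables only)
COEFFICIENTS = {
    "NOX": -10.27, "PTRATIO": -1.07, "LSTAT": -0.60, "TAX": -0.01,
    "AGE": 0.03, "INDUSTRY": 0.13, "DISTANCE": 0.26, "AVG_ROOM": 4.12,
}
INTERCEPT = 29.42


def sorted_coefficients(coefficients):
    """Return (variable, coefficient) pairs in ascending order of coefficient."""
    return sorted(coefficients.items(), key=lambda item: item[1])


def predict(features, coefficients=COEFFICIENTS, intercept=INTERCEPT):
    """Evaluate the linear equation for a dict of {variable: value}."""
    return intercept + sum(coefficients[name] * features[name] for name in coefficients)


if __name__ == "__main__":
    print(sorted_coefficients(COEFFICIENTS))
    base = {name: 0.0 for name in COEFFICIENTS}
    # Raising NOX by 0.1 with everything else fixed changes the prediction by -10.27 * 0.1
    print(predict(dict(base, NOX=0.1)) - predict(base))
```

Because the model is linear, the effect of a change in NOX is the same at every starting point: each additional unit lowers the predicted AVG_PRICE by 10.27 (thousand USD), holding the other variables constant.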
