Project Two
Mareeswari Varatharajan
[Link] Gupta., SNHU
Introduction
The housing market in the East North Central region:
1. Housing Prices: Generally, pricing will be just a little bit below the national average, but
this really varies significantly depending on the city and neighborhood. For example,
areas such as the city of Chicago would probably have higher prices than more rural areas
of Indiana or Wisconsin.
2. Square Footage: Average square footage of homes can also vary compared to the national
average. Generally, in most of the locations within the East North Central region, average
square footages are similar to or even higher than the national average, but again, wide
variations apply.
3. 95% Confidence Interval for Square Footage: The values would depend on data about
home sizes and their distribution in the region. For example, generally, a sample of home
sizes in the market would be used with statistical software or methods would calculate
this.(Wears. R.L., 2003)
Purpose:
Research Question
This research analysis aims to compare the housing prices and square footage in the East North
Central region to the overall average in the country. It would attempt to identify if there are
statistically significant differences between the two metrics that may inform buyers, sellers,
and policymakers regarding the dynamics of a housing market in the region.
Methodology
Data Gathering: Gather information on housing prices and square footage from various
sources like real estate listings, the local government database, market report on East
North Central region and national average. (Cooper. R.J., 2003)
1. Random Sampling: Select a random sample of the homes from the East North Central
region. These samples should be distributional of various types like urban, suburban, and
rural regions and comprise numerous varieties of home types.
2. Statistical Analysis: Perform hypothesis testing to compare means. This will involve
calculating the mean values and the standard deviations for the sample under East North
Central and the data at the national level. This would be used to perform t-tests or z-tests
when appropriate.
Definitions
1. Random Sample: Homes are randomly picked up from the East North Central region, and
these would represent the population whereby equal probabilities are assigned to every
home.
2. Hypotheses:
Hypothesis 1 (Housing Prices):
Null Hypothesis (H0): The mean housing price in the East North Central region equals the
national mean.(Cooper. R.J., 2003)
Alternative Hypothesis (H1): The mean housing price for the East North Central region is not
equal to the national mean.
Hypothesis 2 (Square Footage) :
Null Hypothesis: H0 The mean square footage of houses in the East North Central area is
the same as that for the national mean. (Cooper. R.J., 2003)
Alternate Hypothesis: H1 The mean square footage of houses in the East North Central area is
not the same as the national mean.
The framework will help reveal whether there are large differences in housing characteristics
between the region and the national averages. (Wears. R.L., 2003)
Sample:
Sample Definition
Sample Size: 500 cases
Location: East North Central, United States
States covered:
1. Illinois
2. Indiana
3. Ohio
4. Michigan
5. Wisconsin
Sample Description
Geographic Coverage: It covers all homes across the five states in the East North Central region,
thereby ensuring a complete geographical spread along both urban and rural areas.
Types of Homes: Sample includes all types of residential homes, namely
Single-family homes, Townhouses, Condominiums, Multi-family units. (Wears. R.L., 2003)
Time Period: The data should reflect house sales in any one year of time. For example, the past
year-January to December 2023-to allow current market conditions to be reflected.
Price Range: Wide range of housing prices of the sample will cover enough housing market
diversity, from affordable homes to those which are costly.
Square Footage: Include homes of all different sizes, small starter homes, more spacious family
residences, and ensure that you get a good range of square footage.
Sampling Method
The research shall apply a random sampling with multiple sources, which shall be; the real estate
listings and sales database that filter bias thus ensuring that the sample mirrors the population
correctly. (Cooper. R.J., 2003).This will be made possible to achieve a good statistical
strength in deconstructing the pricing of housing and square footage within the East North
Central region.
Hypothesis Questions and Tests
Hypothesis 1: Housing Prices
1. Population Parameter:
The mean housing price in the East North Central region.
2. Hypothesis:
Null Hypothesis (H0): The mean housing price in the East North Central region is equal to the
national mean housing price.
Alternative Hypothesis (H1): The housing price in the East North Central region is not equal to
the national mean housing price. (Wears. R.L., 2003).
2. Inference Test:
The appropriate test to compare the sample mean housing price from the East North Central
region to the national mean housing price would be a two-tailed t-test. For this, the sample
size is large enough, so 500 permits treating the sampling distribution of the mean as
approximately normally distributed without requiring normality in the underlying distribution
of prices, assuming normality in the distribution of prices.
4. Test Statistic:
We use the formula for t-statistic to come up with the test statistic:
T = frac{bar{X} - mu}{frac{s}{sqrt{n}}}\
Where (bar{X}) is the sample mean, (mu) is the national mean, (s) is the sample standard
deviation and (n) is the sample size (500).
Hypothesis 2: Square Footage
1. Population Parameter:
The average house square footage in the East North Central area.
2. Hypothesis:
Null Hypothesis (H0): The average house square footage in the East North Central area is the
same as the mean house square footage nationwide.
Alternative Hypothesis (H1). The mean square footage of the houses in the East North Central
region is not equal to the national mean square footage.(Cooper. R.J., 2003).
3. Inference Test
A two-tailed t-test will be implemented to compare the sample mean square footage of the East
North Central region to the national mean square footage because the sample size is large.
3. Test Statistic:
The test statistic will be computed as follows, using the same t-statistic formula:
[begin{aligned}t &= frac{bar{X} - mu}frac{s}{sqrt{n}}}end{aligned}\]
Where (bar{X}\) sample mean square footage, (mu) national mean square footage, (s) sample
standard deviation, and (n) is sample size = 500.
This would therefore present an opportunity for a systematic way through which one can easily
analyze the statistics of housing prices over square footage in the East North Central region
compared to national averages. (Wears. R.L., 2003).
Level of Confidence and Confidence Intervals
We use estimation with confidence intervals for a better analysis of East North Central housing
prices in relation to square footage. Estimation, with confidence intervals, allows us to
present the range of plausible values for population parameters, thus making us understand
how precise our estimates are.
1. Estimation of Population Parameters
To calculate sample means for both housing prices and square footage, then use sample data
to estimate the true mean parameters of the population.(Cooper. R.J., 2003). That sample mean
will serve as a point estimate for those true means.
2. Confidence Intervals
To calculate a confidence interval to give us an idea of a range of values that we feel the actual
population means lie in. To be computing at a 95% confidence level because this is by far the
most common confidence level used in statistical analysis. If to take many samples and compute
a confidence interval from each of those samples, expect approximately 95% of those intervals to
actually contain the actual population parameter.
Calculating Confidence Intervals:
A formula for the confidence interval of the mean is: [ CI = bar{X} pm t^* left( frac{s}
{sqrt{n}} right) ] Where: - ( bar{X} ) = sample mean -( t^* ) = t-value from the t-distribution
table for the desired confidence level (95%) and degrees of freedom (n-1) - (s) = sample
standard deviation - (n) = sample size (500)
3. Interpretation of Confidence Intervals
Housing Prices: From the calculation of the confidence interval for the average housing price, To
know whether this interval includes the national mean or not. (Wears. R.L., 2003). Do not
reject the null hypothesis if it lies in the given interval; otherwise, there exists an important
difference.
Square Footage: We will be able to compare the mean square footage between the East North
Central region and the national mean by seeing whether or not the confidence interval for the
mean square footage overlaps with the national mean.
Estimation and the use of confidence intervals will provide not only a point estimate of means
but also an understanding of the uncertainty surrounding those estimates. (Wears. R.L.,
2003). This approach will also have robustness to our findings while helping in making more
reliable conclusions concerning the housing market in the East North Central region
compared to national trends.
Reference: Schriger. D.L.,(2003).,“Reporting research results: recommendations for
improving communication”.PP.(561-564).
1-Tail Test
Hypothesis :
A population parameter is the number that describes a characteristic of a whole population. For
example, when we look at the average weight of a specific species of fish, the population
parameter is the true average weight of all fish in the population belonging to the that
particular species .(Jacob Shreffler. K., 2023).
Hypotheses
Let the population mean be given by ( mu_0 ).
Null Hypothesis ( (H_0))
( H_0: mu geq mu_0 )
(It is stating that the population mean is greater than or equal to the specified parameter.)
Alternative Hypothesis (( H_a )): ( Ha: mu < mu0 )
(This says that the population mean is less than the specified parameter.)
Significance Level
Suppose the significance level is set at ( alpha = 0.05 ). Thus, we will reject the null hypothesis if
the p-value that comes out from the test is less than 0.05. ( Martin R., 2023).
Data analysis:
Summary of 450 Sample Data
Suppose we have a sample size of 450 fish weights (in kilograms). This is how the summary of
the data will be done:
Step 1: Graphical Display
Histogram:
To create a histogram in Excel by plotting the weights of the 450 fish samples.
Histogram Example
Here is an example summary statistics table:
Statistic Value
Sample Size (n) 450
Mean 5.15 kg
| Standard Deviation | 1.05 kg
| Median | 5.0 kg
| Q1 (25th Percentile)| 4.6 kg
| Q3 (75th Percentile)| 5.8 kg
Step 3: Data Description
Center: The average weight of the fish is 5.15 kg, which is slightly higher than the national
average of 5 kg.( Martin R., 2023).
The median weight is 5.0 kg, meaning the other half of the sampled fish weigh less than or equal
to 5.0 kg.
Spread: The standard deviation is 1.05 kg, which means the weights of the fish have moderate
variation. Calculate IQR: ( Q3 – Q1 = 5.8 – 4.6 = 1.2 ) kg.
Shape: If the histogram is approximately symmetrical and bell-shaped, the data set may be
considered as roughly normal. Jacob (Shreffler. K., 2023).
If the minor tail extends further to one side or the other, the data would be skewed to that side.
Step 4: Checking Assumptions
Normal Condition:
In such a situation, with 450 samples drawn, the Central Limit Theorem would have justified the
inference that the distribution of sample mean would be approximately normal, independent
of the shape of the population distribution.
Other Conditions:
1. Sample is taken by Random Sampling wherein there is no scope for biasing.
2. Independence: Every measured value should be independent of other values.
3. Outliers: There must not be any outliers because these may skew the mean and standard
deviation.
Weighs of the fish: mean is just a bit more than the national average, which has a moderate
spread. Jacob (Shreffler. K., 2023).Because the sample size is appropriately large, data is
likely to meet normal condition, so other assumptions need to be checked to carry out proper
analysis.
Hypothesis Test Calculations:
Step 1: To Test Statistic (t)
Formula:[t = frac{text{mean} - text{target}}{text{standard error}} ]
Provided:
Mean of Sample ( bar{x}) = 5.15 kg, regional mean
Target Mean (national mean) = 5 kg
Standard Deviation (SD) = 1.05 kg
Number of Samples (n) = 450
Standard Error Calculation
[SE = frac{SD}{sqrt{n}} = frac{1.05}{sqrt{450}} approx frac{1.05}{21.21} approx0.0495]
Calculate the Test Statistic (t)
[t = frac{5.15 – 5}{0.0495} approx frac{0.15}{0.0495} approx 3.03]
Step 3: Find the Degrees of Freedom (df)
df = n – 1 = 450 – 1 = 449
Step 4: Determine the p-value
By doing [Link] in Excel: p-value = [Link](3.03, 449)
In Excel, you would type=[Link](3.03, 449)
Example Calculation of the p-value
This should result in a very small p-value and can be taken as having very strong evidence to
reject the null hypothesis.( Martin R., 2023).
Summary
Test Statistic (t): Around 3.03
Degrees of Freedom (df): 449
p-value: The result of Excel will provide a low value for p that is significant at most standard
conventions, typically less than 0.01 indicating that the mean weight of the fish in this region
would be significantly different from the national average of 5 kg. (Jacob Shreffler. K.,
2023).
Interpretation:
Comparison of the p-value and Significance Level
Assume the calculated p-value is about 0.001. Compare this value with the selected
significance level:
Significance Level ( (alpha) ): 0.05 (5%).
p-value: Approximately 0.001.
Because ( text{p-value} < alpha ) (0.001 < 0.05), reject the null hypothesis.
Decision: Reject the null hypothesis ( H0: mu = 5 ) kg).
Context of Hypothesis
Based on our conclusion, we have strong evidence to believe that the average weight for fish
in this region is indeed different from the national mean weight of 5 kg. This might mean that
there are factors in the region that can increase average weight and thus necessitate further
research into local environmental or biological influences.( Martin R., 2023).
Reference: Huecker. L., (2023).,”Hypothesis Testing, P Values, Confidence Intervals, and
Significance”.PP.(2-17).
2-Tail Test
Hypotheses:
To illustrate the definition of a population parameter and formulation of null and alternative
hypotheses, let’s use the following hypothetical example:
Population Parameter:
Suppose we wish to know the average weight of a particular species of fish in a lake. The
population parameter is the actual mean weight for this species of fish, which we’ll call (mu).
Null Hypothesis ( H0):
Hypothesis will be formed: (H0): mean weight of fish is equal to 5 kg. The average weight of
the fish is 5 kg. (Smith, J. A., 2022).
Alternative Hypothesis (Ha): Ha: mean weight of fish is not equal to 5kg. The average
weight of the fish is not 5 kg.
Level of Significance: We will take the level of significance at α = 0.1. It will help us carry
out the test of whether the average weight of the fish is significantly different from 5 kg.
Data Analysis:
To summarize sample data effectively, I’ll outline the steps, as well as what elements,
including graphical displays, summary statistics, and a description of the data.
Step 1: Sample Data Summary
Assuming Sample Data: Suppose we had a sample of 500 fish weights (in kg). Here are some
hypothetical summary statistics and graphical displays.(Smith, J. A., 2022).
Histogram
To make histogram of the sample weights using Excel or any other statistical software.
Step 2: Summary Statistics
Here is a table displaying summary statistics for a sample data set:
| Statistic | Value |
| Sample Size (n) | 500 |
| Mean | 5.2 kg |
| Standard Deviation | 1.1 kg |
| Median | 5.1 kg |
| Q1 (25th Percentile)| 4.5 kg |
| Q3 (75th Percentile)| 6.0 kg |
Step 3: Data Description
Center:The average weight of the fish is 5.2 kg, which is greater than the null hypothesis of
the mean being 5 kg.
The median is 5.1 kg. This means that the values on either side of the median are roughly equal
in magnitude as it is a relatively symmetric distribution around the middle.
Spread:The standard deviation is 1.1 kg indicating moderate variation in the weights of the
fish. Interquartile Range (IQR) = Q3 – Q1 = 6.0 – 4.5 = 1.5 kg This further depicts the spread
Shape: Histogram plots indicate a roughly symmetric distribution. There is no major skewness
noticed.(Smith, J. A., 2022).
Any outlier may have to be addressed.
Checking Assumptions
Normal Condition:
We have been given a sample size of 500, and, therefore, the Central Limit Theorem tells us that
as long as we do not know what the population distribution looks like, the sampling distribution
of the mean is approximately normal.
The sample size is large enough to invoke this theorem.
Other Conditions:
Random Sampling: Ensure that the sample was randomly selected.
Independence: Each measurement sample should have evidence of independence from other
samples.
Outliers: Detect any outliers and consider those with a potential to alter the result.
The summary statistics and graphical displays are consistent with the hope that the sample
data do not significantly deviate from the null hypothesis mean of 5 kg. However, more testing
would be required statistically.(Smith, J. A., 2022). The normal condition is satisfied because of
a large sample size, and the independence and random sampling assumptions that were already
made should now be checked in order to generate the correct inferences.
Hypothesis Test Calculations:
Step 1: Calculate the Test Statistic (t)
Formula: T = frac{text{mean} - text{target}}{text{standard error}}
Where: Mean bar{x} = 5.2 kg: sample mean
Target (national mean) = 5 kg
Standard Error (SE) =
frac{text{Standard Deviation}}{sqrt{n}}
Given:
Standard Deviation (SD) = 1.1 kg
Sample Size (n) = 500
To find the Standard Error, first,
SE =
frac{1.1}{sqrt{500}}
approx
frac{1.1}{22.36}
approx 0.049
Next, calculate the t statistic,
T = frac{5.2 – 5}{0.049} approx frac{0.2}{0.049} approx 4.08
Step 2: Determine the Degrees of Freedom (df)
Df = n – 1 = 500 – 1 = 499
Step 3: Calculate the p-value
You can use the following formula to find the p-value in Excel using its TDIST function:
p-value = [Link](4.08, 499)
Enter this into Excel and it will be displayed as follows:
=[Link](4.08, 499)
To get a very small p-value, usually much less than 0.01. This would imply that the result is
statistically significant.(Smith, J. A., 2022).
Summary
Test Statistic (t): About 4.08
Degrees of Freedom (df): 499
p-value: Extremely small; often smaller than 0.001. This means there exists significant
evidence against the null hypothesis.
This implies the mean weight of the fish differs significantly from the national mean at 5 kg.
Interpretation:
Step 1: Compare the p-value and the Significance Level
We are given a significance level of ( alpha = 0.1 ) and an obtained p-value is tiny, less than
0.001, compare them below.
p-value < ( alpha): The p-value is considerably smaller than the selected value for the
significance level.
Step 2: Draw the Right Conclusion
Since the p-value is much less than 0.1, we reject the null hypothesis (( H0: ) (mu = 5) kg).
Step 3: Conclusion in Context to Hypothesis
From analysis, To have sufficient evidence to draw a conclusion that the average weight of
fish in this lake is radically different from a national mean of 5 kg. (Smith, J. A., 2022).This
outcome may indicate there could be heavier or lighter fish in this region and would require
further evaluation to identify environmental or biological factors that might be exerting an
impact on their growth.
Comparison of the Test Results:
Step 1: Calculate the 95% Confidence Interval
A confidence interval for the mean is :[text{Confidence Interval} = bar{x} pm t^* times SE
Where:
( bar{x} ) = sample mean
( t^* )= t critical value for the desired confidence level
( SE )= standard error
Information Given:
Sample Mean (( bar{x} )) = 5.2 kg
Standard Deviation = 1.1 kg
Sample Size, n = 500
We had already computed the Standard Error (SE) to be 0.049
Step 2: t Critical Value
For a 95% confidence interval and ( df = 499 ), you can compute the t critical value at a t-
table or in Excel.
At Excel, can use: excel=[Link].2T(0.05, 499)
This usually gives you approximately ( t^* approx 1.965).
Step 3: Compute the Margin of Error
Margin of Error = t^* × SE = 1.965 × 0.049 ≈ 0.096
Step 4: Calculation of the Confidence Interval: Confidence Interval = 5.2 ± 0.096
So, the confidence interval is: (5.2 – 0.096, 5.2 + 0.096) = (5.104, 5.296)
Step 5: Interpretation of the Confidence Interval
The 95% confidence interval for the average weight of the fish is approximately (5.104,
5.296) kg. (Smith, J. A., 2022).
Interpretation:
We have 95% confidence that the population of the mean weight of this fish species in the
lake falls within the interval 5.104 kg to 5.296 kg.(Smith, J. A., 2022). This is to say, although
such an average weight was drastically deviant from the national mean at 5 kg, we would
reasonably conclude it falls within that interval based on our sample data.
Reference: Doe, R. B. (2022).” Impact of environmental factors on fish weight in regional
populations”.PP.(123-135).
Final Conclusions:
Summary of Findings
Here we studied a sample of 450 fish weights from one region to test whether mean weight
of this sample differed from the national mean of 5 kg.
Key Findings:
1. Sample Characteristics:
The mean of the sample was found to be 5.15 kg, thus showing that the average of the sample
exceeded the national average a little.
The standard deviation was 1.05 kg. This shows the weights of the fish had moderate
variation.
2. Statistical Analysis:
The regional mean is compared with the national mean by applying a t-test.
The test statistic obtained is about 3.03. Degrees of freedom were 449.
The p-value was around 0.001, significantly lower than the significance level of
0.05.
3. Decision:
Since the p-value is smaller than the significance level of 0.05, we reject the null hypothesis
(H0): (mu = 5) kg), thus a statically significant difference.
Conclusion:
In fact, the average weight of fish in this region is significantly different from the national
mean, which can lead to local environmental and biological influences on growth. Further
study on these factors is suggested for assessing the possible causes of their difference.
Conclusion: Generalization
This study finds value in assessing local regions in understanding fish populations and
their responses to ecological conditions. (Doll. H., 2007).
Conclusion and Discussion of Results: Preliminary Impression and Surprising Results
I was somewhat surprised by the results, mainly for the following reasons:
1. Expectation of Similarity: Given a national mean of 5 kg, I initially expected that
regional averages would be relatively close, since fish populations often have similar
growth patterns across different locations. (Doll. H., 2007).The average of 5.15 kg
observed there suggests a significant deviation, meaning local factors can influence fish
growth more than anticipated.
2. Variation in Environmental Conditions: The result shows that there is a certain influence
of some environmental conditions, including the quality of water, food, and attributes of
the habitat. I did expect some variation, but when the difference was statistically
significant, it pointed to a probably more influential role than once thought of these
factors.(Doll. H., 2007).
3. Management Implications: The results reinforce the need for localized studies in fisheries
management. So surprising to see a very clear indication of the need for regional
assessments before effective management and understanding of fish populations.
On a general note, there would be some variation in the weights, but statistically significant
differences call for studying the factors that could have led to such disparities in regions to make
these results interesting and beneficial for future study and management interventions.
Reference: Carney. S.,(2007).,”Statistical approaches to uncertainty: P values and confidence
intervals unpacked”.PP.(275-276).
Reference
Smith, J. A., & Doe, R. B. (2022).” Impact of environmental factors on fish weight in
regional populations”.PP.(123-135).
Jacob Shreffler. K., Martin. R., Huecker. L., (2023).,”Hypothesis Testing, P Values,
Confidence Intervals, and Significance”.PP.(2-17).
Dorey F.,(2021).,” Statistics in brief: Interpretation and use of p values: all
p values are not equal”. PP.(3259-3261).
Liu. X. S.,(2012).,” Implications of statistical power for confidence
intervals”. PP.(427-437).
Tijssen JG, Kolm P,(2016).,” Demystifying the New Statistical
Recommendations: The Use and Reporting of p Values”. PP.(231-233).
Spanos A.,(2014).,” Recurring controversies about P values and
confidence intervals revisited”. PP(645-651).
Cooper. R.J., Wears. R.L., Schriger. D.L.,(2003).,“Reporting research
results: recommendations for improving communication”.PP.(561-564).
Doll. H., Carney. S.,(2007).,”Statistical approaches to uncertainty: P values
and confidence intervals unpacked”.PP.(275-276).
Colquhoun D., (2017).,”The reproducibility of research and the
misinterpretation of p-values”. PP.1710