
Regression Practice

This document presents 11 examples of linear regression. Each example provides numerical data and asks to calculate the regression line, the correlation coefficient, and make predictions. The examples involve variables such as percentages of metal absorption, bird breeding time, number of customers, student grades, weight and height of players, production in a workshop, hours of sleep vs TV, employee sales, pages vs book price, cement resistance, and budget.

REGRESSION PRACTICE

1.- Ten subjects participated in a study, using radioactive detectors, of
the body's capacity to absorb iron and lead. Each was given an identical
oral dose of iron (ferrous sulfate) and lead (lead chloride-203). After
twelve days, the amount of each component retained in the body was
measured and, from these values, the percentages absorbed by the body were
determined. The data obtained were:
Iron (%) 17 22 35 43 80 85 91 92 96 100
Lead (%) 8 17 18 25 58 59 41 30 43 58
a) Draw the scatter plot. Based on it, should the correlation coefficient
be expected to be close to 1, -1, or 0?
b) Find and interpret the coefficient of determination.
c) Check the suitability of the linear regression model. If it is
appropriate, estimate the regression line and use it to predict the
percentage of iron absorbed by an individual whose body absorbs 15% of the
ingested lead.
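
As a rough numerical check for parts b) and c), the sketch below computes the correlation coefficient, the coefficient of determination and the least-squares line from the table. The names `fit_line`, `iron` and `lead` are illustrative, and treating lead as the predictor is an assumption taken from the wording of part c).

```python
from math import sqrt

# Absorbed percentages from the table in exercise 1.
iron = [17, 22, 35, 43, 80, 85, 91, 92, 96, 100]
lead = [8, 17, 18, 25, 58, 59, 41, 30, 43, 58]


def fit_line(x, y):
    """Return (slope, intercept, r) for the line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Assumption: part c) predicts iron from lead, so lead is the predictor.
slope, intercept, r = fit_line(lead, iron)
r_squared = r ** 2
iron_at_15 = intercept + slope * 15

print(f"r = {r:.4f}, r^2 = {r_squared:.4f}")
print(f"iron = {intercept:.3f} + {slope:.4f} * lead")
print(f"predicted iron at 15% lead: {iron_at_15:.2f}%")
```

The coefficient of determination is the same whichever variable is taken as the predictor; only the fitted line changes.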
2.- A study of photoperiod in aquatic birds is being conducted. The aim is
to establish an equation for predicting the reproduction time, Y, from
knowledge of the photoperiod (number of hours of light per day) under
which reproduction began, X. Data were obtained from the behavior of 11
Aythya (diving ducks). The results were the following:
Reproduction time 40 54 98 50 67 58 52 50 43 15 28
Photoperiod 12.8 13.9 14.1 14.7 15.0 15.1 16.0 16.5 16.6 17.2 17.9
Find the corresponding regression line.
Calculate a prediction of the reproduction time for a photoperiod of 14.5
hours. Would it make sense in this case to make a prediction for a
photoperiod of 24 hours?
What would be the approximate weight of a six-year-old child?

3.- Depending on the distance, in kilometers, at which a shopping center
is located from a population center, it attracts the numbers of customers
listed in the table:

No. of customers (X)   Distance (Y)
8                      15
7                      19
6                      25
4                      23
2                      34
1                      40

a) Calculate the linear correlation coefficient.
b) If the shopping center is located 2 km away, how many customers can it
attract?
c) If 5 customers are to be received, at what distance from the population
center should the shopping center be located?
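
Parts b) and c) predict in opposite directions, so a sketch of one way to handle them is given here: regress the number of customers on the distance for b), and the distance on the number of customers for c). Using a separate line for each direction is a common convention, assumed here rather than stated in the exercise; all variable names are illustrative.

```python
from math import sqrt

# Table from exercise 3: X = number of customers, Y = distance in km.
customers = [8, 7, 6, 4, 2, 1]
distance = [15, 19, 25, 23, 34, 40]

n = len(customers)
mean_x = sum(customers) / n
mean_y = sum(distance) / n
sxx = sum((x - mean_x) ** 2 for x in customers)
syy = sum((y - mean_y) ** 2 for y in distance)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(customers, distance))

# a) Linear correlation coefficient.
r = sxy / sqrt(sxx * syy)

# b) Line of X on Y: predict customers from a known distance.
slope_x_on_y = sxy / syy
customers_at_2km = mean_x + slope_x_on_y * (2 - mean_y)

# c) Line of Y on X: predict distance from a known number of customers.
slope_y_on_x = sxy / sxx
distance_for_5 = mean_y + slope_y_on_x * (5 - mean_x)

print(f"r = {r:.4f}")
print(f"customers at 2 km: {customers_at_2km:.2f}")
print(f"distance for 5 customers: {distance_for_5:.2f} km")
```

A distance of 2 km lies well below the observed range (15 to 40 km), so the answer to b) is an extrapolation and should be read with caution.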

4.- The grades obtained by five students in Mathematics and Accounting are:

Mathematics   Accounting
6             6.5
4             4.5
8             7.0
5             5.0
3.5           4.0
Determine the regression lines and calculate the expected grade in
Accounting for a student who has 7.5 in Mathematics.
5.- The heights and weights of 10 basketball players from a student team
are:

Height (X)   Weight (Y)
186          85
189          85
190          86
192          90
193          87
193          91
198          93
201          103
203          100
205          101
Calculate:
a) The regression line of Y on X.
b) The correlation coefficient.
c) The estimated weight of a player who is 208 cm tall.

6.- Based on the following data on hours worked in a workshop (X) and
units produced (Y), determine the regression line of Y on X and the
linear correlation coefficient, and interpret it.

Hours (X)   Production (Y)
80 300
79 302
83 315
84 330
78 300
60 250
82 300
85 340
79 315
84 330
80 310
62 240

7.- A group of 50 individuals has been asked about the number of hours
they spend daily sleeping and watching television. Classifying the
responses has produced the following table:

Number of hours slept (X)           6 7 8 9 10
Number of hours of television (Y)   4 3 3 2 1
a) Calculate the correlation coefficient.
b) Determine the equation of the regression line of Y on X.
c) If a person sleeps eight and a half hours, how many hours of television
can that person be expected to watch?

8.- The following table gives the aptitude test scores (X) of six sales
clerks on probation and their sales in the first trial month (Y), in
hundreds of euros.

X 25 42 33 54 29 36
Y 42 72 50 90 45 48
a) Find the correlation coefficient and interpret the result obtained.
b) Calculate the regression line of Y on X. Predict the sales of a
sales clerk who scores 47 on the test.

9.- The attached table shows the number of pages and the price of twelve
technical books:

pages   price   pages   price   pages   price
310     350     400     800     420     250
300     350     170     180     610     500
280     350     430     700     420     540
310     730     230     320     450     370

1. Fit a regression line that explains the price as a function of the
number of pages and interpret the results.
2. Build the associated ANOVA table. Is the fit adequate?
3. Calculate 90% confidence intervals for the model parameters.

10.- The resistance of cement depends, among other things, on its drying
time. In an experiment, the resistance of cement blocks with different
drying times was measured; the results were as follows:

Time (days)   Resistance (kg/cm²)
1             130 133 118
2             219 245 247
3             298 280 241 242 262
7             324 304 345 331 357
28            418 426 403 357 373

1. Analyze the possible existence of a relationship between these two
variables.
2. What conclusions can be drawn from the regression test and the
linearity test?
3. If a quadratic fit were used, would better results be obtained?

11.- The variable Y represents, in thousands, the number of donkeys in
Spain, and the variable X the percentage of the State budget dedicated to
Education.

year   Y      X      year   Y     X      year   Y     X
1920   1006   5.5    1945   747   9.7    1970   476   12.7
1925   1162   4.8    1950   732   9.6    1975   386   11.5
1930   1479   7.8    1955   683   8.9    1980   368   11.4
1935   805    8.2    1960   686   11.4
1940   795    8.6    1965   493   10.6

1. Represent these data graphically.
2. Build the regression line that explains the behavior of the variable
"percentage of the State budget dedicated to Education" as a function of
the variable "number of donkeys in Spain" and interpret the results.
3. Is the correlation coefficient between these two variables significant?
4. Are the residuals of the linear regression fit independent?
5. Plot the variables X and Y against time. Calculate the correlation
coefficients and regression lines of the variables X and Y with respect to
time.
12.- A study was conducted to determine the relationship between the
number of years of experience and the monthly salary, in thousands of
pesetas, among the computer scientists of a Spanish region. To do this, a
random sample of 17 computer scientists was taken and the following data
were obtained:

Exper.   Salary   Exper.   Salary   Exper.   Salary
13       261      31       364      27       360
16       332      19       338      25       365
30       361      20       365      7        214
2        165      1        169      15       310
8        264      4        198      13       314
6        191      10       246

1. Calculate the linear regression of the salary variable against years of
experience. Calculate 95% confidence intervals for the coefficients of
this model.
2. Calculate the linear correlation coefficient and the coefficient of
determination. Can the null hypothesis that the coefficient of
determination is zero be rejected with α = 0.05?
3. Estimate the salary of a computer scientist with 8 years of experience
and calculate 90% and 95% confidence intervals for this prediction.
4. Is any anomaly observed in the plot of the residuals against the
regressed variable?

13.- The following data set was collected on groups of female workers
from England and Wales in the period 1970-72. Each group is made up of
workers of the same profession (doctors, textile workers, decorators,
etc.), and in each of the twenty-five groups shown two variables were
observed: the standardized cigarette consumption index (regressor
variable, x) and the lung cancer mortality index (dependent variable, y).

1. Study the linear regression model of the mortality index against the
smoking index.
2. Calculate the ANOVA table. Conclusions.
3. Check whether the model hypotheses are satisfied.

x y x y x y
77 84 102 88 133 146
137 116 91 104 115 128
117 123 104 129 105 115
94 128 107 86 87 79
116 155 112 96 91 85
102 101 113 144 100 120
111 118 110 139 76 60
93 113 125 113 66 51
88 104

14.- Anscombe used the following data set to demonstrate the importance
of graphs in regression and correlation analysis. There are four
two-dimensional data sets; the vector X is the same for the first three
sets.

X1=X2=X3   Y1      Y2     Y3      X4   Y4
10         8.04    9.14   7.46    8    6.58
8          6.95    8.14   6.77    8    5.76
13         7.58    8.74   12.74   8    7.71
9          8.81    8.77   7.11    8    8.84
11         8.33    9.26   7.81    8    8.47
14         9.96    8.10   8.84    8    7.04
6          7.24    6.13   6.08    8    5.25
4          4.26    3.10   5.39    8    5.56
12         10.84   9.13   8.15    8    7.91
7          4.82    7.26   6.42    8    6.89
5          5.68    4.74   5.73    19   12.50

1. Calculate the regression line of Y against X in these four data sets.
Calculate the correlation coefficient.

15.- In 34 batches of 120 pounds of peanuts, the average level of
aflatoxin (parts per billion), X, and the percentage of non-contaminated
peanuts in each batch, Y, were observed.

X       Y        X       Y        X       Y        X       Y        X       Y
3.0     99.971   18.8    99.942   46.8    99.863   12.3    99.956   25.8    99.858
4.7     99.979   18.9    99.932   46.8    99.811   71.3    99.821   18.8    99.975
8.3     99.982   21.7    99.908   58.1    99.877   12.5    99.972   30.6    99.987
9.3     99.971   21.9    99.970   62.3    99.798   12.6    99.889   36.2    99.958
9.9     99.957   22.8    99.985   70.6    99.855   15.9    99.961   39.8    99.909
11.0    99.961   24.2    99.933   71.1    99.788   16.7    99.982   44.3    99.859
83.2    99.830   83.6    99.718   99.5    99.642   111.2   99.658

1. Analyze these data and investigate the relationship between the two
variables in order to predict Y from X. Is a linear fit appropriate?
2. Do the residuals satisfy the structural hypotheses?
3. Try to find a parametric fit that improves on the linear fit.

16.- In fifteen houses in the city of Milton Keynes, the average
difference (in degrees Celsius) between the temperature outside and the
temperature inside the house, and the daily gas consumption in kWh, were
observed over a period of time.

Temp. diff.   Consumption   Temp. diff.   Consumption   Temp. diff.   Consumption
10.3          69.81         13.4          75.32         15.6          86.35
11.4          82.75         13.6          69.81         16.4          110.23
11.5          81.75         15.0          78.54         16.5          106.55
12.5          80.38         15.2          81.29         17.0          85.50
13.1          85.89         15.3          99.20         17.1          90.02

1.- Plot the data. Is there a relationship between these two variables?
2.- Can the gas consumption be explained by a linear relationship with the
temperature difference?
3.- If a higher-degree polynomial is fitted, is a larger coefficient of
determination obtained? Which model is preferable?

17.- The height (in centimeters) and weight (in kilograms) of thirty
eleven-year-old girls from the Heaton Middle School of Bradford were
measured. Study these data and the relationship between both variables.

Height Weight   Height Weight   Height Weight   Height Weight   Height Weight
135    26       141    28       149    46       148    32       149    32
146    33       136    28       147    36       149    34       141    32
153    55       154    36       152    47       141    29
154    50       151    48       140    33       164    47
139    32       155    36       143    42       146    37
131    25       137    31       146    35       137    34
149    44       143    36       133    31       135    30

1. Plot these observations and calculate the regression line of weight
against height and of height against weight.
2. In the linear regression of weight against height, is there any
atypical observation?
3. Are there influential observations?
4. Test the structural hypotheses of the model.
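
For items 2 and 3, one possible approach is to look at standardized residuals and leverages of the weight-on-height fit. The sketch below does so with the thirty pairs from the table; the cut-offs used (absolute standardized residual above 2, leverage above 4/n) are common rules of thumb assumed here, not criteria given in the exercise, and the variable names are illustrative.

```python
from math import sqrt

# The thirty (height, weight) pairs from the table, read row by row.
heights = [135, 141, 149, 148, 149, 146, 136, 147, 149, 141,
           153, 154, 152, 141, 154, 151, 140, 164, 139, 155,
           143, 146, 131, 137, 146, 137, 149, 143, 133, 135]
weights = [26, 28, 46, 32, 32, 33, 28, 36, 34, 32,
           55, 36, 47, 29, 50, 48, 33, 47, 32, 36,
           42, 37, 25, 31, 35, 34, 44, 36, 31, 30]

n = len(heights)
mean_h = sum(heights) / n
mean_w = sum(weights) / n
shh = sum((h - mean_h) ** 2 for h in heights)
shw = sum((h - mean_h) * (w - mean_w) for h, w in zip(heights, weights))

# Least-squares line of weight on height.
slope = shw / shh
intercept = mean_w - slope * mean_h
residuals = [w - (intercept + slope * h) for h, w in zip(heights, weights)]

# Residual standard deviation with n - 2 degrees of freedom.
s = sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Leverage of each observation in simple linear regression.
leverages = [1 / n + (h - mean_h) ** 2 / shh for h in heights]

# Standardized (internally studentized) residuals.
std_residuals = [e / (s * sqrt(1 - lev))
                 for e, lev in zip(residuals, leverages)]

# Assumed rules of thumb: |std. residual| > 2 or leverage > 4 / n.
for i, (h, w) in enumerate(zip(heights, weights)):
    if abs(std_residuals[i]) > 2 or leverages[i] > 4 / n:
        print(f"obs {i + 1}: height={h}, weight={w}, "
              f"std. residual={std_residuals[i]:.2f}, "
              f"leverage={leverages[i]:.3f}")
```

Flagged observations are candidates for closer inspection only; whether a point is genuinely atypical or influential still has to be judged from the plot and the context.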

18.- The iron content of blast furnace slag can be determined by a
chemical test in the laboratory or, more cheaply and quickly, by a
magnetic test. There is interest in studying the relationship between the
results of the chemical test and those of the magnetic test. In
particular, one wants to know whether the results of the chemical test on
the iron content can be estimated from the results of the magnetic test.
For this purpose, both tests were performed on a set of batches collected
sequentially in time. The results obtained are in the attached table.

1. Analyze these data. Carry out a descriptive and graphical study of
them.
2. Study the relationship between the tests.
3. Check the model's hypotheses.
Chem Mag Chem Mag Chem Mag Chem Mag Chem Mag Chem Mag

24 25 18 19 17 12 21 18 20 21 25 16
16 22 20 10 19 15 24 22 24 18 15 16
24 17 21 23 16 15 15 20 24 20 16 26
18 21 20 20 15 15 20 21 23 25 27 28
18 20 21 19 15 15 20 21 29 20 27 28
10 13 15 15 13 17 25 25 27 18 30 30
14 16 16 16 24 18 27 22 23 19 29 32
16 14 15 16 22 16 22 18 19 16 26 28
25 28 25 36 32 40 28 33 25 33

19.- The following data represent the Gross National Product (GNP) of the
USA and consumer spending, in billions of 1972 dollars, for the years
1960-1980.

Year          1960     1961     1962     1963     1964     1965     1966
GNP           737.2    756.6    800.3    832.5    876.4    929.3    984.8
Consumption   452.0    461.4    482.0    500.5    528.0    557.5    585.7

Year          1967     1968     1969     1970     1971     1972     1973
GNP           1011.4   1058.1   1087.6   1085.6   1122.4   1185.9   1255.0
Consumption   602.7    634.4    657.9    672.1    696.8    737.1    768.5

Year          1974     1975     1976     1977     1978     1979     1980
GNP           1248.0   1233.9   1300.4   1371.7   1436.9   1483.0   1480.7
Consumption   763.6    780.2    823.7    863.9    904.8    930.9    935.1

1. Fit a linear model and interpret the estimated regression
coefficients.
2. Plot the standardized residuals against time. Study the independence
hypothesis.
3. If there is positive autocorrelation, transform the data and fit the
linear regression model to the transformed data (least squares).
20.- The data in the attached table are the classic data set of Strong's
psychological test on memory retention. The data were collected as
follows: a group of individuals memorized a list of unrelated objects
and, after some time, recalled it. The variable p indicates the average
percentage of memory retention and the variable t is the elapsed time.
The objective of the study was to explain the variable p as a function of
t.

t      p      t      p      t      p      t       p
1      0.84   60     0.54   720    0.36   10080   0.08
5      0.71   120    0.47   1440   0.26
15     0.61   240    0.45   2880   0.20
30     0.56   480    0.38   5760   0.16

1. Analyze this data set and study the relationship of the variable p
with respect to t.
2. Study analytically and graphically a model of the type p = exp(-t),
which suggests a geometric loss of memory.
3. Study analytically and graphically a model of the type logp = + 0
What interpretation does this model have? Which fit is better?



DEVELOPMENT
