Linear Regression Problems

1. As Earth’s population continues to grow, the solid waste generated by the
population grows with it. Governments must plan for disposal and recycling of
ever-growing amounts of solid waste. Planners can use data from the past to predict
future waste generation and plan for enough facilities for disposing of and recycling
the waste.
Given the following data on the waste generated in Florida from 1990 to 1994, how
can we construct a function to predict the waste that was generated in the
years 1995-1999? The scatter plot is shown in Figure 1.85.
Year    Tons of Solid Waste Generated (in thousands)
1990    19,358
1991    19,484
1992    20,293
1993    21,499
1994    23,561
a) Make a scatterplot of the data, letting x represent the number of years since
1990.
b) Use a graphing calculator to fit linear, quadratic, cubic, and power functions
to the data. By comparing the values of R², determine the function that best
fits the data.
c) Graph the function of best fit with the scatterplot of the data.
d) With each function found in part (b), predict the average tons of waste in 2000
and 2005, and determine which function gives the most realistic predictions.
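For readers working in software instead of on a graphing calculator, parts (b) and (d) might be sketched as follows. This is a minimal sketch in plain Python, not the worksheet's prescribed method: the helper names polyfit, evaluate, and r_squared are illustrative assumptions, and the fit is done by solving the least-squares normal equations directly. The power function is left out because the usual log-log fit cannot handle x = 0 (the year 1990); it would need a shifted x.

```python
# Least-squares polynomial fits to the Florida solid-waste data.
xs = [0, 1, 2, 3, 4]                          # years since 1990
ys = [19358, 19484, 20293, 21499, 23561]      # thousands of tons


def polyfit(xs, ys, degree):
    """Return least-squares coefficients, lowest power first."""
    n = degree + 1
    # Normal equations A c = b: A[i][j] = sum of x^(i+j), b[i] = sum of y*x^i.
    A = [[float(sum(x ** (i + j) for x in xs)) for j in range(n)] for i in range(n)]
    b = [float(sum(y * x ** i for x, y in zip(xs, ys))) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        known = sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - known) / A[i][i]
    return coeffs


def evaluate(coeffs, x):
    """Evaluate the polynomial with the given coefficients at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


def r_squared(coeffs, xs, ys):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - evaluate(coeffs, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


degrees = [("linear", 1), ("quadratic", 2), ("cubic", 3)]
fits = {name: polyfit(xs, ys, d) for name, d in degrees}
r2 = {name: r_squared(c, xs, ys) for name, c in fits.items()}

for name, coeffs in fits.items():
    print(f"{name:9s} R^2 = {r2[name]:.4f}  "
          f"2000: {evaluate(coeffs, 10):,.0f}  2005: {evaluate(coeffs, 15):,.0f}")
```

Adding a higher power can never lower R² on the same data, so the R² column alone will favor the cubic. Comparing the 2000 and 2005 columns is what part (d) asks for: judging which extrapolation looks realistic.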
2. The numbers of insured commercial banks y (in thousands) in the United States for
the years 1987 to 1996 are shown in the table. (Source: Federal Deposit Insurance
Corporation).
Year    y
1987    13.70
1988    13.12
1989    12.71
1990    12.34
1991    11.92
1992    11.46
1993    10.96
1994    10.45
1995    9.94
1996    9.53
a) Make a scatterplot of the data, letting x represent the number of years since 1987.
b) Use a graphing calculator to fit linear, quadratic, cubic, and power functions
to the data. By comparing the values of R², determine the function that best
fits the data.
c) Graph the function of best fit with the scatterplot of the data.
d) With each function found in part (b), predict the average number of insured
commercial banks in 2000 and 2005, and determine which function gives the
most realistic predictions.
e) Plot the actual data and the model you selected on the same graph. How
closely does the model represent the data?
Project AMP
A Quesada Director Project AMP
3. U.S. Farms. As the number of farms has decreased in the United States, the
average size of the remaining farms has grown larger, as shown in the table below.
Year    Average Acreage Per Farm
1910    139
1920    149
1930    157
1940    175
1950    216
1959    303
1969    390
1978    449
1987    462
1997    487
a) Make a scatterplot of the data, letting x represent the number of years since
1900.
b) Use a graphing calculator to fit linear, quadratic, cubic, and power functions
to the data. By comparing the values of R², determine the function that best
fits the data.
c) Graph the function of best fit with the scatterplot of the data.
d) With each function found in part (b), predict the average acreage in 2000 and
2010 and determine which function gives the most realistic predictions.
4. Sports The winning times (in minutes) in the women’s 400-meter freestyle
swimming event in the Olympics from 1936 to 1996 are given by the following
ordered pairs.
(1936, 5.44)    (1972, 4.32)
(1948, 5.30)    (1976, 4.16)
(1952, 5.20)    (1980, 4.15)
(1956, 4.91)    (1984, 4.12)
(1960, 4.84)    (1988, 4.06)
(1964, 4.72)    (1992, 4.12)
(1968, 4.53)    (1996, 4.12)
a) Make a scatterplot of the data, letting x represent the number of years since
1972.
b) Use a graphing calculator to fit linear, quadratic, cubic, and power functions
to the data. By comparing the values of R², determine the function that best
fits the data.
c) Graph the function of best fit with the scatterplot of the data.
d) Plot the actual data and the model you selected on the same graph. How
closely does the model represent the data?
Quadratic Regression Problems
1. The following data was obtained by throwing a rubber ball at a CBR.
Time (sec)    Height (m)
0.0000        1.03754
0.1080        1.40205
0.2150        1.63806
0.3225        1.77412
0.4300        1.80392
0.5375        1.71522
0.6450        1.50942
0.7525        1.21410
0.8600        0.83173
a) Use the data above to make a scatterplot, letting x represent the number of
seconds elapsed.
b) Next, use a graphing calculator to find the model that best expresses the height
and vertical velocity of the rubber ball. We can also use this model to predict the
maximum height of the ball and its vertical velocity when it hits the face of the
CBR.
c) Fit linear, quadratic, cubic, and power functions to the data. By comparing the
values of R², determine the function that best fits the data.
d) Graph the function of best fit with the scatterplot of the data.
e) Determine the maximum height of the ball (in meters).
f) With the model you selected in part (b), predict when the height of the ball is at
least 1.5 meters.
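Parts (e) and (f) can be checked numerically once a model is chosen. The sketch below assumes a quadratic model (constant gravity gives a parabolic height curve); the helper solve and the variable names are illustrative, not part of the worksheet. The velocity line uses the fact that the derivative of c2 t² + c1 t + c0 is 2 c2 t + c1.

```python
import math

t = [0.0000, 0.1080, 0.2150, 0.3225, 0.4300, 0.5375, 0.6450, 0.7525, 0.8600]
h = [1.03754, 1.40205, 1.63806, 1.77412, 1.80392, 1.71522, 1.50942, 1.21410, 0.83173]


def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [mr - factor * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


# Normal equations for the least-squares fit h = c0 + c1*t + c2*t^2.
A = [[sum(ti ** (i + j) for ti in t) for j in range(3)] for i in range(3)]
b = [sum(hi * ti ** i for ti, hi in zip(t, h)) for i in range(3)]
c0, c1, c2 = solve(A, b)

# The maximum of a downward-opening parabola is at its vertex.
t_peak = -c1 / (2 * c2)
h_peak = c0 + c1 * t_peak + c2 * t_peak ** 2

# Times at which the model height equals 1.5 m: roots of c2 t^2 + c1 t + (c0 - 1.5).
disc = math.sqrt(c1 ** 2 - 4 * c2 * (c0 - 1.5))
t_low, t_high = sorted(((-c1 + disc) / (2 * c2), (-c1 - disc) / (2 * c2)))

print(f"h(t) = {c2:.3f} t^2 + {c1:.3f} t + {c0:.3f}")
print(f"vertical velocity v(t) = {2 * c2:.3f} t + {c1:.3f}")
print(f"maximum height {h_peak:.3f} m at t = {t_peak:.3f} s")
print(f"height >= 1.5 m for {t_low:.3f} s <= t <= {t_high:.3f} s")
```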
2. Stopping Distance A state highway patrol safety division collected the data on
stopping distances in Table 2.16.
a) Draw a scatter plot of the data.
b) Fit linear, quadratic, cubic, and power functions to the data. By comparing the
values of R², determine the function that best fits the data.
c) Superimpose the regression curve on the scatter plot.
d) Use the regression model to predict the stopping distance for a vehicle traveling at
25 mph.
e) Use the regression model to predict the speed of a car if the stopping distance is
300 ft.
Table 2.16 Highway Safety Division
Speed (mph)    Stopping Distance (ft)
10             15.1
20             39.9
30             75.2
40             120.5
50             175.9
3. Home Schooling Growth The estimated numbers of U.S. children who were
home-schooled in the years 1992 to 1997 are shown in Table 1.13.
Table 1.13 Home Schooling
Year    Number
1992    703,000
1993    808,000
1994    929,000
1995    1,060,000
1996    1,220,000
1997    1,347,000
(a) Produce a scatter plot of the number of children home-schooled in thousands
(y) as a function of years since 1990 (x).
(b) Find the linear regression equation. (Round the coefficients to the nearest
0.01.)
(c) Does the value of r² suggest that the linear model is appropriate?
(d) Find the quadratic regression equation. (Round the coefficients to the nearest
0.01.)
(e) Does the value of R² suggest that a quadratic model is appropriate?
(f) Use both curves to predict the number of U.S. children that are home-schooled
in the year 2005. How different are the estimates?
(g) Writing to Learn Use the results of this exploration to explain why it is risky
to use regression equations to predict y-values for x values that are not very
close to the data points, even when the curves fit the data points very well.
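The point of parts (f) and (g) can be seen numerically. The following sketch, in plain Python with illustrative helper names (polyfit, predict, r_squared are assumptions, not the worksheet's notation), fits both models by least squares and extrapolates each to x = 15, the year 2005. Coefficients are left unrounded here, so the estimates may differ slightly from ones computed with coefficients rounded to 0.01.

```python
x = [2, 3, 4, 5, 6, 7]                         # years since 1990
y = [703, 808, 929, 1060, 1220, 1347]          # thousands of children


def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest power first."""
    n = degree + 1
    # Augmented normal equations: sum(x^(i+j)) * c_j = sum(y * x^i).
    M = [[float(sum(v ** (i + j) for v in xs)) for j in range(n)]
         + [float(sum(w * v ** i for v, w in zip(xs, ys)))] for i in range(n)]
    for col in range(n):                        # Gauss-Jordan elimination
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def predict(coeffs, v):
    """Evaluate the fitted polynomial at v."""
    return sum(c * v ** k for k, c in enumerate(coeffs))


def r_squared(coeffs):
    """Coefficient of determination of a fit against the table data."""
    mean_y = sum(y) / len(y)
    ss_res = sum((w - predict(coeffs, v)) ** 2 for v, w in zip(x, y))
    ss_tot = sum((w - mean_y) ** 2 for w in y)
    return 1 - ss_res / ss_tot


linear, quadratic = polyfit(x, y, 1), polyfit(x, y, 2)
lin_2005, quad_2005 = predict(linear, 15), predict(quadratic, 15)
print(f"linear    R^2 = {r_squared(linear):.4f}, 2005 estimate {lin_2005:,.0f} thousand")
print(f"quadratic R^2 = {r_squared(quadratic):.4f}, 2005 estimate {quad_2005:,.0f} thousand")
print(f"difference: {quad_2005 - lin_2005:,.0f} thousand")
```

Both fits track the six data points closely, yet the two 2005 estimates differ by several hundred thousand children, which is the risk part (g) asks about.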
4. Leisure Time The following table shows the median number of hours of leisure
time that Americans had each week in various years.
Year    x (years since 1973)    Median Number of Leisure Hours Per Week
1973    0                       26.2
1980    7                       19.2
1987    14                      16.6
1993    20                      18.8
1997    24                      19.5
Source: Louis Harris and Associates
(a) Make a scatterplot of the data, letting x represent the number of years
since 1973, and determine which model best fits the data.
(b) Use a graphing calculator to fit the type of function determined in part
(a) to the data.
(c) Graph the equation with the scatterplot. Then, use the function found
in part (b) to estimate the number of leisure hours per week in
1978, 1990, and 2005.
5. On-line Travel Revenue With the explosion of Internet use, more and
more travelers are booking their travel reservations on-line. The following table
lists the total on-line revenue for recent years. Most of the revenue is from airline
tickets.
Year    On-Line Travel Revenue (in millions of dollars)
1996    276
1997    827
1998    1900
1999    3200
2000    4700
2001    6500
2002    8900
Source: Travel and Interactive Technology 1999
(a) Create a scatterplot of the data. Let x= the number of years since 1996.
(b) Use a graphing calculator to fit the data with linear, quadratic, and
exponential functions. Determine which function has the best fit.
(c) Graph all three functions found in part (b) with the scatterplot in part (a).
(d) Use the functions found in part (b) to estimate the on-line travel revenue
in 2010. Which function provides the most realistic prediction?
Quartic Regression Problems
1. Consumer Debt Nonmortgage consumer debt is mounting in the United States, as
shown in the table below.
Year    Non-mortgage Debt (in billions of dollars)
1989    762
1990    789
1991    783
1992    775
1993    804
1994    902
1995    1038
1996    1161
1997    1216
1998    1266
a) Draw a scatter plot of the data.
b) Fit linear, exponential, power, cubic, and quartic functions to the data. By
comparing the values of R², determine the function that best fits the data.
c) Superimpose the regression curve on the scatter plot.
d) Use the regression model to predict when consumer debt will reach 1400 billion
dollars.
2. Declining Number of Farms in the United States Today U.S. farm acreage is
about the same as it was in the early part of the twentieth century, but the number
of farms has shrunk.
Year    Number of Farms (in millions)
1910    6.4
1920    6.5
1930    6.3
1940    6.1
1950    5.4
1959    3.7
1969    2.7
1978    2.3
1987    2.1
1997    1.9
Looking at the table above, we note that the data could be modeled with a cubic or a
quartic function.
(a) Model the data with both cubic and quartic functions. Let the first
coordinate of each data point be the number of years after 1900. That
is, enter the data as (10, 6.4), (20, 6.5), and so on. Then using R², the
coefficient of determination, decide which function is the better fit.
The R² value gives an indication of how well the function fits the
data. The closer R² is to 1, the better the fit.
(b) Graph the function with the scatterplot of the data.
(c) Use the answer to part (a) to estimate the number of farms in 1900,
1975, and 2003.
Exponential Regression Problems
1. In the years before the Civil War, the population of the United States grew
rapidly, as shown in the following table from the U.S. Bureau of the Census.
Year    Population in Millions
1790    3.93
1800    5.31
1810    7.24
1820    9.64
1830    12.86
1840    17.07
1850    23.19
1860    31.44
a) Draw a scatter plot of the data.
b) Fit linear, quadratic, exponential, power, logarithmic, and logistic functions to the
data. By comparing the values of R², determine the function that best fits the
data.
c) Superimpose the regression curve on the scatter plot.
d) Use the regression model to predict the population in 1870.
e) Use the regression model to predict the population in 1930. Explain why you do
or do not think this prediction is valid. (Hint: you may want to complete this
problem after you finish the problem dealing with Census records after the Civil
War.)
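An exponential model y = a·b^x can be fitted with ordinary linear regression after taking logarithms, since ln y = ln a + x ln b. The sketch below takes that route in plain Python; the function names are illustrative assumptions, and x is measured in decades since 1790, a choice made here for convenience. Many graphing calculators appear to use the same transformation for their exponential regression, so the numbers should be close, but that is an assumption about the particular device.

```python
import math

decades = list(range(8))                       # x = decades since 1790
population = [3.93, 5.31, 7.24, 9.64, 12.86, 17.07, 23.19, 31.44]   # millions


def linear_regression(xs, ys):
    """Return (intercept, slope, r_squared) of the least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope, sxy ** 2 / (sxx * syy)


# y = a * b^x is equivalent to ln y = ln a + x ln b, a straight line in x.
log_pop = [math.log(p) for p in population]
intercept, slope, r2 = linear_regression(decades, log_pop)
a, b = math.exp(intercept), math.exp(slope)


def model(year):
    """Exponential model evaluated at a calendar year."""
    return a * b ** ((year - 1790) / 10)


print(f"y = {a:.3f} * {b:.4f}^x   (R^2 of the log-linear fit: {r2:.4f})")
print(f"1870: {model(1870):.1f} million   1930: {model(1930):.1f} million")
```

For part (e), the 1930 output can be set against the 1930 census figure of 123.20 million listed in the table for Logistic Regression Problem 1.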
2. Projected Number of Alzheimer’s Patients: German psychiatrist Alois Alzheimer
first described the disease, later called Alzheimer’s disease, in 1906. Since life
expectancy has significantly increased in the last century, the number of
Alzheimer’s patients has increased dramatically. The number of patients in the
United States reached 4 million in 2000. The following table lists projected data
regarding the number of Alzheimer’s patients in years beyond 2000.
Year, x    Projected Number of Alzheimer’s Patients in the United States (in millions)
2000       4.0
2010       5.8
2020       6.8
2030       8.7
2040       11.8
2050       14.3
a) Draw a scatter plot of the data.
b) Fit linear, exponential, power, logistic and logarithmic functions to the data. By
comparing the values of R², determine the function that best fits the data.
c) Superimpose the regression curve on the scatter plot.
d) Use the regression model to estimate the number of Alzheimer’s patients in 2005,
2025, and 2100.
3. Number of physicians: The following table contains data regarding the number of
physicians in the United States in selected years.
Year    Total Number of Physicians
1950    219,997
1955    241,711
1960    260,484
1965    292,088
1970    334,028
1975    393,742
1980    467,679
1985    552,716
1990    615,421
1994    684,414
1995    720,325
1996    737,764
a) Draw a scatter plot of the data.
b) Fit linear, quadratic, cubic, exponential, quartic, and power functions to the data.
By comparing the values of R², determine the function that best fits the data.
c) Superimpose the regression curve on the scatter plot.
d) Use the regression model to estimate the number of physicians in 1975.
e) Use the regression model to estimate the number of physicians in 2000 and 2025.
4. Credit Card Volume: The total credit card volume for Visa, MasterCard,
American Express, and Discover has increased dramatically in recent years, as
shown in the table below. (Source: CardWeb Inc.’s CardData)
Year, x    Credit Card Volume, y (in billions)
1988       261.0
1989       296.3
1990       338.4
1991       361.0
1992       403.1
1993       476.7
1994       584.8
1995       701.2
1996       798.3
1997       885.2
a) Draw a scatter plot of the data.
b) Fit linear, quadratic, cubic, exponential, quartic, and power functions to the data.
By comparing the values of R², determine the function that best fits the data.
c) Superimpose the regression curve on the scatter plot.
d) Use the regression model to predict the credit card volume in 2003 and in 2010.
Logarithmic Regression Problems
1. Forgetting In an art class, students were tested at the end of the course on a final
exam. Then they were retested with an equivalent test at subsequent time intervals.
Their scores after time t, in months, are given in the table.
Time, t (in months)    Score, y
1                      84.9%
2                      84.6%
3                      84.4%
4                      84.2%
5                      84.1%
6                      83.9%
a) Draw a scatter plot of the data.
b) Fit linear, quadratic, logarithmic, and power functions to the data. By comparing
the values of R², determine the function that best fits the data.
c) Superimpose the regression curve on the scatter plot.
d) Use the regression model to predict test scores after 8, 10, 24, and 36 months.
e) After how long will the test scores fall below 82%?
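If the logarithmic model y = a + b ln t is the one selected, parts (d) and (e) reduce to a straight-line fit against ln t followed by solving for t. The sketch below shows one way to do this in plain Python; it assumes the logarithmic model is the choice from part (b), and the names predicted_score and t_82 are illustrative.

```python
import math

months = [1, 2, 3, 4, 5, 6]
scores = [84.9, 84.6, 84.4, 84.2, 84.1, 83.9]   # percent

# Least-squares line through the points (ln t, y): y = a + b ln t.
u = [math.log(t) for t in months]
n = len(u)
mean_u, mean_y = sum(u) / n, sum(scores) / n
b = (sum((ui - mean_u) * (yi - mean_y) for ui, yi in zip(u, scores))
     / sum((ui - mean_u) ** 2 for ui in u))
a = mean_y - b * mean_u


def predicted_score(t):
    """Model score (percent) after t months."""
    return a + b * math.log(t)


# Solve a + b ln t = 82 for t to find when the model score reaches 82%.
t_82 = math.exp((82 - a) / b)

print(f"y = {a:.3f} + ({b:.3f}) ln t")
for t in (8, 10, 24, 36):
    print(f"after {t:2d} months: {predicted_score(t):.2f}%")
print(f"score reaches 82% after about {t_82:.0f} months")
```

The answer to part (e) lies far outside the six months of data, so it should be read with the same caution as any long extrapolation.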
2. Jamie, a meteorologist, is interested in finding a function that explains the relation
between the height of a weather balloon (in kilometers) and the atmospheric
pressure (measured in millimeters of mercury) on the balloon. She collects the
data shown in Table 10.
Table 10
Atmospheric Pressure, p (mm of mercury)    Height, h (km)
760                                        0
740                                        0.184
725                                        0.328
700                                        0.565
650                                        1.079
630                                        1.291
600                                        1.634
580                                        1.862
550                                        2.235
a) Using a graphing utility, draw a scatter diagram of the data with
atmospheric pressure as the independent variable.
b) Fit linear, quadratic, logarithmic, and power functions to the data. By
comparing the values of R², determine the function that best fits the data.
c) Superimpose the regression curve on the scatter plot.
d) Use the function in part (b) to predict the height of the weather balloon if the
atmospheric pressure is 560 millimeters of mercury.
3. Economics and Marketing The following data represent the price and
quantity supplied in 2005 for IBM personal computers.
Price ($/Computer)    Quantity Supplied
2300                  180
2000                  173
1700                  160
1500                  150
1300                  137
1200                  130
1000                  113
(a) Using a graphing utility, draw a scatter diagram of the data with price
as the independent variable.
(b) Using a graphing utility, try a variety of function families. Compare
the values of R² to find the function that best fits the data.
(c) Using a graphing utility, draw the function found in part (b) on the
scatter diagram.
(d) Use the function found in part (b) to predict the number of IBM
personal computers that will be supplied if the price is $1650.
Power Regression Problems
1. Use the data in the table below to obtain a model for speed p versus distance
traveled d. Consider linear, quadratic, exponential, power, and quartic models.
Then use the model you selected as the best fit to predict the speed of the ball at
impact, given that impact occurs when d  1.80 m.
Table 2.12 Rubber Ball Data from CBR Experiment
Distance (m)    Speed (m/s)
0.00000         0.00000
0.04298         0.82372
0.16119         1.71163
0.35148         2.45860
0.59394         3.05209
0.89187         3.74200
1.25557         4.49558
2. The length of time that a planet takes to make one complete revolution around the
sun is its year. The table shows the length (in Earth years) of each planet’s year
and the distance of that planet from the sun (in millions of miles). Find a model
for these data in which x is the length of the year and y the distance from the sun.
Planet     Year      Distance
Mercury    0.24      36.0
Venus      0.62      67.2
Earth      1         92.9
Mars       1.88      141.6
Jupiter    11.86     483.6
Saturn     29.46     886.7
Uranus     84.01     1783.0
Neptune    164.79    2794.0
Pluto      247.69    3674.5
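A power model y = a·x^b becomes a straight line after taking logarithms of both variables, since ln y = ln a + b ln x. The sketch below fits that line in plain Python. It assumes the power family is the intended model for this problem, and the variable names are illustrative.

```python
import math

# x: length of year in Earth years; y: distance from the sun in millions of miles
years = [0.24, 0.62, 1, 1.88, 11.86, 29.46, 84.01, 164.79, 247.69]
distances = [36.0, 67.2, 92.9, 141.6, 483.6, 886.7, 1783.0, 2794.0, 3674.5]

# y = a * x^b is equivalent to ln y = ln a + b ln x, a line in (ln x, ln y).
u = [math.log(x) for x in years]
v = [math.log(y) for y in distances]
n = len(u)
mean_u, mean_v = sum(u) / n, sum(v) / n
b = (sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
     / sum((ui - mean_u) ** 2 for ui in u))
a = math.exp(mean_v - b * mean_u)

print(f"distance = {a:.2f} * year^{b:.4f}")
```

The fitted exponent should come out close to 2/3, which is what Kepler's third law (the square of the period is proportional to the cube of the distance) predicts.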
3. Cholesterol Level and the Risk of Heart Attack. The data in the following table
show the relationship of cholesterol level in men to the risk of a heart attack.
Cholesterol Level, x    Men, Per 10,000, Who Suffer a Heart Attack, y
100                     30
200                     65
250                     100
275                     130
300                     180
(a) Use a graphing calculator to fit a model function to the data. Consider
linear, exponential, power, and cubic functions.
(b) Graph the function with the scatterplot of the data.
(c) Use the answer to part (a) to estimate the heart attack rate for men with
cholesterol levels of 150, 350, and 400.
Logistic Regression Problems
1. After the Civil War, the U.S. population increased, as shown below.
Year    Population in Millions
1870    38.56
1880    50.19
1890    62.98
1900    76.21
1910    92.23
1920    106.02
1930    123.20
1940    132.16
1950    151.33
1960    179.32
1970    202.30
1980    226.54
1990    248.72
2000    281.42
a) Draw a scatter plot of the data.
b) Fit linear, quadratic, exponential, power, logarithmic, and logistic functions to the
data. By comparing the values of R², determine the function that best fits the data.
c) Use the regression model to predict the population in 1975 and in 2010. Explain
why you do or do not think these predictions are valid.
2. Effect of Advertising A company introduces a new software product on a trial run
in a city. They advertised the product on television and found the following data
relating the percent P of people who bought after x ads were run.
Number of Ads, x    % Who Bought, P
0                   0.2
10                  0.7
20                  2.7
30                  9.2
40                  27
50                  57.6
60                  83.3
70                  94.8
80                  98.5
90                  99.6
Draw a scatter plot of the data. Then, fit linear, exponential, power, logistic and logarithmic
functions to the data. By comparing the values of R², determine the function that best fits the
data. Then use the regression model to predict the percent P of people who will buy the
software after 100 ads are run.
* Relate what you have discovered in this exercise to what you have observed in television ads.
What could the company do to change this pattern?
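A logistic curve can be fitted without special software if the upper limit is fixed in advance. The sketch below assumes a ceiling of 100% (an assumption made here, since no more than all of the people can buy) and uses the fact that P = 100 / (1 + e^-(kx + c)) is equivalent to ln(P / (100 - P)) = kx + c, a straight line. A calculator's logistic regression normally estimates the ceiling as a third parameter, so its coefficients will differ somewhat from these. The names CEILING and percent_bought are illustrative.

```python
import math

ads = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
pct = [0.2, 0.7, 2.7, 9.2, 27, 57.6, 83.3, 94.8, 98.5, 99.6]

CEILING = 100.0   # assumed upper limit: at most 100% of people can buy

# If P = CEILING / (1 + e^-(k x + c)), then ln(P / (CEILING - P)) = k x + c.
logit = [math.log(p / (CEILING - p)) for p in pct]
n = len(ads)
mean_x, mean_z = sum(ads) / n, sum(logit) / n
k = (sum((x - mean_x) * (z - mean_z) for x, z in zip(ads, logit))
     / sum((x - mean_x) ** 2 for x in ads))
c = mean_z - k * mean_x


def percent_bought(x):
    """Model percent of people who have bought after x ads."""
    return CEILING / (1 + math.exp(-(k * x + c)))


print(f"P(x) = 100 / (1 + e^-({k:.4f} x + ({c:.3f})))")
print(f"half of the people have bought after about {-c / k:.0f} ads")
print(f"after 100 ads: {percent_bought(100):.2f}%")
```

The model levels off just below the ceiling, so the prediction for 100 ads is only slightly above the last data point. That flattening is the pattern the starred question asks about.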