Exponential regression (Source: http://www

*** NO HELP IS ALLOWED ***
STA1001C
CHAPTER 4 PROJECT
This will be your test grade for Chapter 4.
*** NO HELP IS ALLOWED ***
Project Guidelines:
1. You may choose to work alone or with a partner. More than two people
in a group is not acceptable. Working alone is acceptable.
2. No work or answers should be submitted on the question paper.
3. Include a cover page with the name(s) of the person(s) who worked on
the project. If working with a partner, submit only one project.
4. Show all work neatly, clearly, and completely. Whenever appropriate,
answer in full sentences. It is strongly recommended that partners carefully
proofread each other’s solutions.
5. Several questions ask for graphs. All graphs should be imported from a
computer and/or a graphing calculator (you may use Excel or StatCrunch).
Include the viewing window or show the scale on the axes.
6. Put thought into your explanations. Do not hesitate to research them on
the Web. Don’t take the easy way out!
7. Begin this project as soon as possible and complete the problems
gradually. Do not try to rush and complete it at the last minute. You will not
have enough time to produce a quality product. Remember this will be
your test grade for Chapter 4.
8. This project is worth 100 points.
9. No help is allowed from anyone other than your class partner, if you
choose to work with one. If you need help with technology, see your
instructor!
PART I – Comparing Linear Data
Shown below are the populations for both Birmingham, Alabama and Orlando, Florida from
1960 to 2000 (Sources: http://en.wikipedia.org/wiki/Birmingham,_Alabama and
http://en.wikipedia.org/wiki/Orlando,_Florida). Use these tables to answer questions 1-7 below.
Historical populations
Census    Birmingham, Alabama Pop.    Orlando, Florida Pop.
1960      340,887                     86,135
1970      300,910                     99,006
1980      284,413                     128,291
1990      265,968                     164,693
2000      242,820                     185,951
1. Make scatterplots of the Birmingham population and the Orlando population data
on the same graph (use years since 1960, ex. 1960 = 0, 1970 = 10).
[Scatterplot: Population (vertical axis, 0 to 400,000) versus Year since 1960 (horizontal axis, 0 to 50), with the Birmingham and Orlando data on the same graph.]
2. Describe the relationship between the two variables for each data set, in terms of
form, strength and direction.
Birmingham:
Form: Linear
Strength: Strong
Direction: Negative
As year increases, the population decreases.
Orlando:
Form: Linear
Strength: Strong
Direction: Positive
As year increases, the population increases.
3. Find the equation of the least squares regression line, (line of best fit) for each
data set, rounding the slope and y-intercept to the nearest hundredth.
Birmingham: ŷ = 333,214.8 - 2,310.76x
Orlando: ŷ = 79,751.4 + 2,653.19x
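The regression lines above can be checked with a short script. This is a minimal sketch, assuming NumPy is available; the helper name `least_squares_line` is made up for illustration, and the data are copied from the population tables.

```python
import numpy as np

# Years since 1960 and census populations, copied from the tables above.
years = np.array([0, 10, 20, 30, 40], dtype=float)
birmingham = np.array([340887, 300910, 284413, 265968, 242820], dtype=float)
orlando = np.array([86135, 99006, 128291, 164693, 185951], dtype=float)

def least_squares_line(x, y):
    """Return (intercept, slope) of the least squares line y = a + b*x."""
    slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    intercept = y.mean() - slope * x.mean()
    return intercept, slope

b_intercept, b_slope = least_squares_line(years, birmingham)
o_intercept, o_slope = least_squares_line(years, orlando)

print(f"Birmingham: intercept = {b_intercept:.2f}, slope = {b_slope:.2f}")
print(f"Orlando:    intercept = {o_intercept:.2f}, slope = {o_slope:.2f}")
```

The slope is computed from the textbook formula (sum of cross-deviations over sum of squared x-deviations) rather than a library fit, so each step matches the hand calculation.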
4. What is the slope of the regression line for the Birmingham data? Explain what
the slope means in the context of this problem. What is the slope of the regression
line for the Orlando data? Explain what the slope means in the context of this
problem.
Birmingham: Slope is -2,310.76/1
For each additional year, the predicted population decreases by about 2,310.76 people.
Orlando: Slope is 2,653.19/1
For each additional year, the predicted population increases by about 2,653.19 people.
5. Do the y-intercepts have a reasonable interpretation in the context of this
problem? If so, what is the meaning of the y-intercept for each equation?
Yes, they have a reasonable interpretation in the context of this problem.
Birmingham: y-intercept is 333,214.8
At year 0 (1960), the predicted population is 333,214.8 people.
Orlando: y-intercept is 79,751.4
At year 0 (1960), the predicted population is 79,751.4 people.
6. What is the correlation coefficient for the Birmingham data? What does it tell you
about the association between the two variables in this data? What is the
correlation coefficient for the Orlando data? What does it tell you about the
association between the two variables in this data?
Birmingham: correlation coefficient is -0.985
The association between the two variables appears to be strong in a negative
direction. The correlation coefficient is very close to -1. As year increases,
the population decreases.
Orlando: correlation coefficient is 0.99
The association between the two variables appears to be strong in a positive
direction. The correlation coefficient is very close to +1. As year increases,
the population increases.
*Remember that correlation does NOT imply causation.
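The correlation coefficients can be reproduced as follows. This is a sketch, assuming NumPy and the census figures from the tables in Part I; `np.corrcoef` returns a 2x2 correlation matrix, and the off-diagonal entry is r.

```python
import numpy as np

# Years since 1960 and census populations from the Part I tables.
years = np.array([0, 10, 20, 30, 40], dtype=float)
birmingham = np.array([340887, 300910, 284413, 265968, 242820], dtype=float)
orlando = np.array([86135, 99006, 128291, 164693, 185951], dtype=float)

# The [0, 1] entry of the correlation matrix is Pearson's r.
r_birmingham = np.corrcoef(years, birmingham)[0, 1]
r_orlando = np.corrcoef(years, orlando)[0, 1]

print(f"Birmingham r = {r_birmingham:.3f}")
print(f"Orlando r    = {r_orlando:.3f}")
```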
7. Assume that the regression lines you found represent accurate trends in the
population data for each city. Is there a year in which Birmingham and Orlando
will have the same population? If so, state when and what that population is.
Indicate that point on your scatterplot.
(51.06, 215,225.48)
At year 51.06 (2011), the populations are equal at 215,225.48 people.
This can be found algebraically by setting the two regression equations equal and solving for x:
333,214.8 - 2,310.76x = 79,751.4 + 2,653.19x
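The algebra can be carried out numerically. The sketch below uses the rounded coefficients from question 3; the variable names are illustrative only.

```python
# Rounded coefficients from the regression lines in question 3.
b_intercept, b_slope = 333214.8, -2310.76   # Birmingham
o_intercept, o_slope = 79751.4, 2653.19     # Orlando

# Set the two lines equal and solve for x:
#   b_intercept + b_slope*x = o_intercept + o_slope*x
#   x = (b_intercept - o_intercept) / (o_slope - b_slope)
x_equal = (b_intercept - o_intercept) / (o_slope - b_slope)

# Substitute x back into either line to get the shared population.
pop_equal = o_intercept + o_slope * x_equal

print(f"x = {x_equal:.2f} years since 1960 (calendar year {1960 + int(x_equal)})")
print(f"population = {pop_equal:,.2f}")
```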
PART II – Comparing Linear and Non-linear Data
United States Population
The following tables contain data from the Information Please
Almanac. They show United States population from 1800 to 1860
(just prior to the Civil War) and from 1870 to 1920 (just after the
end of World War I).
Table 1
years since 1800          0      10     20     30      40      50      60
population in millions    5.31   7.24   9.64   12.87   17.07   23.19   31.44

Table 2
years since 1870          0      10     20     30     40     50
population in millions    39.2   50.3   63.1   76.2   92.0   106.1
Answer questions 1 – 11 relating to the data in the tables.
1. Make two separate scatter plots – one for the data in Table 1 and another for the
data in table 2.
Table 1:
[Scatterplot: Population (Millions), scale 0 to 35, versus Year since 1800, scale 0 to 70.]
Table 2:
[Scatterplot: Population (Millions), scale 0 to 150, versus Year since 1870, scale 0 to 60.]
2. In which scatterplot does the data appear to be more linear in form? (Make a note
for yourself which table contains data that is linear in form, and which table
contains data that is exponential in form. For the following questions you will be
switching back and forth between tables but to avoid revealing the answers, the
tables will be referenced by the form of the data they contain rather than the table
number.)
Table 1: Exponential
Table 2: Linear
3. For the scatterplot you identified in question 2 (linear form),
a. Find the equation of the least squares regression line, (line of best fit) for
each data set, rounding the slope and y-intercept to the nearest hundredth.
ŷ = 37.39 + 1.35x
b. What is the slope of the regression line? Explain what the slope means in
the context of this problem.
Slope is: 1.35/1
For each additional year, the predicted population increases by about 1.35 million people.
c. What does the correlation coefficient tell you about the association
between the two variables in that data?
Correlation Coefficient: r = 0.998
The strength is strong because it is close to 1.
As the years increase, the population increases.
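The linear fit and correlation coefficient for the linear data (Table 2) can be verified with the sketch below, which assumes NumPy. `np.polyfit` with degree 1 returns the slope first and the intercept second.

```python
import numpy as np

# Table 2: years since 1870 and U.S. population in millions.
x = np.array([0, 10, 20, 30, 40, 50], dtype=float)
y = np.array([39.2, 50.3, 63.1, 76.2, 92.0, 106.1])

# Degree-1 least squares fit: coefficients are returned highest power first.
slope, intercept = np.polyfit(x, y, 1)
r = np.corrcoef(x, y)[0, 1]

print(f"y = {intercept:.2f} + {slope:.2f}x")
print(f"r = {r:.3f}")
```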
4. Find the equation of an exponential model for the remaining scatterplot (the one
you did not identify in question 2 – exponential form).
ŷ = 5.34(1.03)^x
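One common way to obtain an exponential model of the form ŷ = a(b)^x is to fit a straight line to the natural log of the population and then exponentiate the coefficients. The sketch below assumes that approach (it may not be exactly what a particular calculator or spreadsheet does internally) and uses NumPy with the Table 1 data.

```python
import numpy as np

# Table 1: years since 1800 and U.S. population in millions.
x = np.array([0, 10, 20, 30, 40, 50, 60], dtype=float)
y = np.array([5.31, 7.24, 9.64, 12.87, 17.07, 23.19, 31.44])

# If y = a * b**x, then ln(y) = ln(a) + x * ln(b), which is linear in x.
ln_b, ln_a = np.polyfit(x, np.log(y), 1)

a = np.exp(ln_a)  # initial value (population at x = 0)
b = np.exp(ln_b)  # growth factor per year

print(f"y = {a:.2f} * ({b:.2f})^x")
```

Fitting on the log scale minimizes squared error in ln(y), not in y itself, so a direct nonlinear fit could give slightly different coefficients.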
5. Using your exponential model, what was the growth or decay factor? Be sure to
identify whether this factor represents growth or decay. (Bonus: Based on that
factor, what was the percent of increase or decrease from year to year?)
The growth factor is 1.03. Because b is greater than 1, it represents growth.
BONUS: 3% increase per year. Reason: 1.03 = 1 + 0.03, where 1 is the original
amount and 0.03 is the percent increase.
6. If your exponential model continued to accurately represent the population for the
exponential data through the year 1900, what would its estimate for the population in
1900 be?
1900 would be x of 100 (Remember 1800 is 0)
ŷ = 5.34(1.03)^100 = 102.63 million
7. Assuming that the trend for the linear data continued to accurately represent the
population until 1950, use your linear regression equation to estimate the
population of the United States in 1950.
1950 would be x of 80 (1870 is 0), ŷ = 37.39 + 1.35(80) = 145.39 million
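Both predictions can be reproduced directly from the rounded model coefficients. In this sketch the function names `exponential_model` and `linear_model` are illustrative, not part of the project.

```python
def exponential_model(x):
    """Exponential model for Table 1; x is years since 1800, result in millions."""
    return 5.34 * 1.03 ** x

def linear_model(x):
    """Linear model for Table 2; x is years since 1870, result in millions."""
    return 37.39 + 1.35 * x

# 1900 is 100 years after 1800; 1950 is 80 years after 1870.
pop_1900 = exponential_model(1900 - 1800)
pop_1950 = linear_model(1950 - 1870)

print(f"1900 (exponential model): {pop_1900:.2f} million")
print(f"1950 (linear model):      {pop_1950:.2f} million")
```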
8. What was the actual United States population in 1950? To answer this question,
you will need to use the internet or some other form of reference. Please include
in your answer the reference you used (if a website, identify the web address).
151,325,798 people. Reference:
https://www.census.gov/history/www/through_the_decades/fast_facts/1950_fast_facts.html
9. By how much do the answers to questions 7 and 8 differ? What historical
reason(s) can you give for this difference?
Actual - Predicted = Difference
Question 8 - Question 7: 151,325,798 - 145,390,000 = 5,935,798 people
Reasons: Answers vary. Possible reasons include historical events that changed the
population and are not accounted for in the prediction.
10. In which year did the population of the United States surpass 300 million? To
answer this question, you will need to use the internet or some other form of
reference. Please include in your answer the reference you used (if a website,
identify the web address).
February 2007. The file referenced below lists an estimated population of 300,095,558 for that month (row: 2007,2,300095558).
https://www.census.gov/popest/data/intercensal/national/files/US-EST00INTTOT.csv
11. Using the year you identified in your answer to question 10, determine which model
gives the better prediction for the population of the United States in that year. Why
do you think this is so?
2007 would be x of 207 (1800 is 0), ŷ = 5.34(1.03)^207 = 2425.75 million
2007 would be x of 137 (1870 is 0), ŷ = 37.39 + 1.35(137) = 222.34 million
The linear model gives the better prediction: its estimate of 222.34 million is far
closer to the actual population of about 300 million than the exponential model's
2,425.75 million. Exponential growth compounds, so a large input such as x = 207
produces a very large output, while the linear model grows by a constant amount each year.
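The comparison can be made concrete by computing each model's error against the population figure cited in question 10 (300,095,558, or about 300.10 million). This is a sketch using the rounded coefficients; the variable names are illustrative.

```python
# Population figure cited in question 10, converted to millions.
actual_2007 = 300095558 / 1_000_000

# Exponential model: x is years since 1800. Linear model: x is years since 1870.
exp_2007 = 5.34 * 1.03 ** (2007 - 1800)
lin_2007 = 37.39 + 1.35 * (2007 - 1870)

# Error = predicted - actual, in millions of people.
exp_error = exp_2007 - actual_2007
lin_error = lin_2007 - actual_2007

print(f"Exponential: {exp_2007:.2f} million (error {exp_error:+.2f})")
print(f"Linear:      {lin_2007:.2f} million (error {lin_error:+.2f})")
```

The linear model underestimates and the exponential model overestimates, but the exponential error is larger by more than an order of magnitude.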