*** NO HELP IS ALLOWED ***

STA1001C CHAPTER 4 PROJECT

This will be your test grade for Chapter 4.

Project Guidelines:
1. You may choose to work alone or with a partner. More than two people in a group is not acceptable. Working alone is acceptable.
2. No work or answers should be submitted on the question paper.
3. Include a cover page with the name(s) of the person(s) who worked on the project. If working with a partner, submit only one project.
4. Show all work neatly, clearly, and completely. Whenever appropriate, answer in full sentences. It is strongly recommended that partners carefully proofread each other’s solutions.
5. Several questions ask for graphs. All graphs should be imported from a computer and/or a graphing calculator (you may use Excel or StatCrunch). Include the viewing window or show the scale on the axes.
6. Put thought into your explanations. Do not hesitate to research them on the Web. Don’t take the easy way out!
7. Begin this project as soon as possible and complete the problems gradually. Do not try to rush and complete it at the last minute. You will not have enough time to produce a quality product. Remember this will be your test grade for Chapter 4.
8. This project is worth 100 points.
9. No help is allowed from anyone other than your class partner, if you choose to work with one. If you need help with technology, see your instructor!

PART I – Comparing Linear Data

Shown below are the populations for both Birmingham, Alabama and Orlando, Florida from 1960 to 2000 (Sources: http://en.wikipedia.org/wiki/Birmingham,_Alabama and http://en.wikipedia.org/wiki/Orlando,_Florida). Use these tables to answer questions 1-7 below.

Historical populations

Census   Birmingham, Alabama   Orlando, Florida
         Pop.                  Pop.
1960     340,887               86,135
1970     300,910               99,006
1980     284,413               128,291
1990     265,968               164,693
2000     242,820               185,951

1.
Make scatterplots of the Birmingham population and the Orlando population data on the same graph (use years since 1960, e.g. 1960 = 0, 1970 = 10).

[Scatterplot: Population (0 to 400,000) versus Year since 1960 (0 to 50), with the Birmingham data and the Orlando data plotted as separate series.]

2. Describe the relationship between the two variables for each data set, in terms of form, strength and direction.

Birmingham:
Form: Linear
Strength: Strong
Direction: Negative
As year increases, the population decreases.

Orlando:
Form: Linear
Strength: Strong
Direction: Positive
As year increases, the population increases.

3. Find the equation of the least squares regression line (line of best fit) for each data set, rounding the slope and y-intercept to the nearest hundredth.

Birmingham: $\hat{y} = 333{,}214.8 - 2{,}310.76x$
Orlando: $\hat{y} = 79{,}751.4 + 2{,}653.19x$

4. What is the slope of the regression line for the Birmingham data? Explain what the slope means in the context of this problem. What is the slope of the regression line for the Orlando data? Explain what the slope means in the context of this problem.

Birmingham: The slope is -2,310.76. For each additional year, the predicted population decreases by about 2,310.76 people.
Orlando: The slope is 2,653.19. For each additional year, the predicted population increases by about 2,653.19 people.

5. Do the y-intercepts have a reasonable interpretation in the context of this problem? If so, what is the meaning of the y-intercept for each equation?

Yes, they have a reasonable interpretation in the context of this problem.
Birmingham: The y-intercept is 333,214.8. At year 0 (1960), the predicted population is 333,214.8 people.
Orlando: The y-intercept is 79,751.4. At year 0 (1960), the predicted population is 79,751.4 people.

6. What is the correlation coefficient for the Birmingham data? What does it tell you about the association between the two variables in this data? What is the correlation coefficient for the Orlando data?
What does it tell you about the association between the two variables in this data?

Birmingham: The correlation coefficient is -0.985. The association between the two variables is strong and negative, because the correlation coefficient is very close to -1. As year increases, the population decreases.
Orlando: The correlation coefficient is 0.99. The association between the two variables is strong and positive, because the correlation coefficient is very close to +1. As year increases, the population increases.
*Remember that correlation does NOT imply causation.

7. Assume that the regression lines you found represent accurate trends in the population data for each city. Is there a year in which Birmingham and Orlando will have the same population? If so, state when and what that population is. Indicate that point on your scatterplot.

(51.06, 215,225.48)
At year 51.06 (2011), the populations are equal at about 215,225.48 people.
This can be found algebraically by setting the two regression equations equal and solving for x:
$333{,}214.8 - 2{,}310.76x = 79{,}751.4 + 2{,}653.19x$
$253{,}463.4 = 4{,}963.95x$, so $x \approx 51.06$.

PART II – Comparing Linear and Non-linear Data

United States Population

The following tables contain data from the Information Please Almanac. They show United States population from 1800 to 1860 (just prior to the Civil War) and from 1870 to 1920 (just after the end of World War I).

Table 1
years since 1800         0      10     20     30     40     50     60
population in millions   5.31   7.24   9.64   12.87  17.07  23.19  31.44

Table 2
years since 1870         0      10     20     30     40     50
population in millions   39.2   50.3   63.1   76.2   92.0   106.1

Answer questions 1 – 11 relating to the data in the tables.

1. Make two separate scatter plots – one for the data in Table 1 and another for the data in Table 2.
Table 1:
[Scatterplot: Population (Millions), 0 to 35, versus Year since 1800, 0 to 70.]

Table 2:
[Scatterplot: Population (Millions), 0 to 150, versus Year since 1870, 0 to 60.]

2. In which scatterplot does the data appear to be more linear in form? (Make a note for yourself which table contains data that is linear in form, and which table contains data that is exponential in form. For the following questions you will be switching back and forth between tables but to avoid revealing the answers, the tables will be referenced by the form of the data they contain rather than the table number.)

Table 1: Exponential
Table 2: Linear

3. For the scatterplot you identified in question 2 (linear form),

a. Find the equation of the least squares regression line (line of best fit) for each data set, rounding the slope and y-intercept to the nearest hundredth.
$\hat{y} = 37.39 + 1.35x$

b. What is the slope of the regression line? Explain what the slope means in the context of this problem.
The slope is 1.35. For each additional year, the predicted population increases by about 1.35 million people.

c. What does the correlation coefficient tell you about the association between the two variables in that data?
Correlation coefficient: r = 0.998. The association is strong and positive because r is close to +1. As the years increase, the population increases.

4. Find the equation of an exponential model for the remaining scatterplot (the one you did not identify in question 2 – exponential form).
$\hat{y} = 5.34(1.03)^{x}$

5. Using your exponential model, what was the growth or decay factor? Be sure to identify whether this factor represents growth or decay. (Bonus: Based on that factor, what was the percent of increase or decrease from year to year?)
The factor is 1.03. It is a growth factor because b is greater than 1.
BONUS: 3% increase per year. Reason: 1.03 = 1 + 0.03, where the 1 is the original amount and the 0.03 is the percent increase.

6.
If your exponential model continued to accurately represent the population for the exponential data through the year 1900, what would its estimate for the population in 1900 be?
1900 corresponds to x = 100 (remember 1800 is x = 0):
$\hat{y} = 5.34(1.03)^{100} \approx 102.63$ million

7. Assuming that the trend for the linear data continued to accurately represent the population until 1950, use your linear regression equation to estimate the population of the United States in 1950.
1950 corresponds to x = 80 (1870 is x = 0):
$\hat{y} = 37.39 + 1.35(80) = 145.39$ million

8. What was the actual United States population in 1950? To answer this question, you will need to use the internet or some other form of reference. Please include in your answer the reference you used (if a website, identify the web address).
Actual 1950 population: 151,325,798
https://www.census.gov/history/www/through_the_decades/fast_facts/1950_fast_facts.html

9. By how much do the answers to questions 7 and 8 differ? What historical reason(s) can you give for this difference?
Actual – Predicted = Difference
151,325,798 – 145,390,000 = 5,935,798
Reasons: Answers vary. Possible reasons are events that changed the population and that the prediction does not account for.

10. In which year did the population of the United States surpass 300 million? To answer this question, you will need to use the internet or some other form of reference. Please include in your answer the reference you used (if a website, identify the web address).
February 2007 (data row 2007,2,300095558: year 2007, month 2, population 300,095,558)
https://www.census.gov/popest/data/intercensal/national/files/US-EST00INTTOT.csv

11. Using the year you identified in your answer to question 10, determine which model gives the better prediction for the population of the United States in that year. Why do you think this is so?
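Exponential model: 2007 corresponds to x = 207 (1800 is x = 0):
$\hat{y} = 5.34(1.03)^{207} \approx 2{,}425.75$ million

Linear model: 2007 corresponds to x = 137 (1870 is x = 0):
$\hat{y} = 37.39 + 1.35(137) = 222.34$ million

The linear model gives the better prediction. Its estimate of 222.34 million is far closer to the actual population of about 300 million than the exponential estimate of 2,425.75 million. Exponential growth compounds, so a large input such as x = 207 produces a very large result, while the linear model adds the same amount each year and stays closer to the actual total.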
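
The regression equations, the intersection point, and the predictions in Parts I and II can be double-checked with a short script. The sketch below is not part of the original project, which expects Excel, StatCrunch, or a graphing calculator. The helper name `least_squares` and all variable names are illustrative, and the exponential fit assumes the model was obtained by least squares on ln(y), which the project does not state.

```python
import math


def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Part I: populations by years since 1960 (data from the project table).
years_1960 = [0, 10, 20, 30, 40]
birmingham = [340887, 300910, 284413, 265968, 242820]
orlando = [86135, 99006, 128291, 164693, 185951]

b_int, b_slope = least_squares(years_1960, birmingham)
o_int, o_slope = least_squares(years_1960, orlando)

# Question 7: set the two lines equal and solve for x.
x_equal = (b_int - o_int) / (o_slope - b_slope)
pop_equal = b_int + b_slope * x_equal

# Part II, Table 2 (linear form): years since 1870, population in millions.
years_1870 = [0, 10, 20, 30, 40, 50]
table2 = [39.2, 50.3, 63.1, 76.2, 92.0, 106.1]
lin_int, lin_slope = least_squares(years_1870, table2)

# Part II, Table 1 (exponential form): years since 1800, population in millions.
# ASSUMPTION: the exponential model y = a * b**x is fitted as a straight line
# through (x, ln y); the project does not say which method was used.
years_1800 = [0, 10, 20, 30, 40, 50, 60]
table1 = [5.31, 7.24, 9.64, 12.87, 17.07, 23.19, 31.44]
log_int, log_slope = least_squares(years_1800, [math.log(y) for y in table1])
exp_a, exp_b = math.exp(log_int), math.exp(log_slope)


# Predictions use the coefficients rounded to hundredths, as the answers do.
def linear_model(x):
    return 37.39 + 1.35 * x


def exponential_model(x):
    return 5.34 * 1.03 ** x


print(f"Birmingham: y = {b_int:.2f} + ({b_slope:.2f})x")
print(f"Orlando:    y = {o_int:.2f} + ({o_slope:.2f})x")
print(f"Equal populations at x = {x_equal:.2f}, population = {pop_equal:.2f}")
print(f"Table 2 line: y = {lin_int:.2f} + {lin_slope:.2f}x")
print(f"Table 1 model: y = {exp_a:.2f} * {exp_b:.2f}^x")
print(f"1900 (exponential): {exponential_model(100):.2f} million")
print(f"1950 (linear):      {linear_model(80):.2f} million")
print(f"2007 (exponential): {exponential_model(207):.2f} million")
print(f"2007 (linear):      {linear_model(137):.2f} million")
```

Fitting with unrounded values and predicting with rounded coefficients mirrors how the answers above were produced, so small differences would appear if the unrounded coefficients were used for the predictions instead.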