Group Projects on Estimation and Hypothesis Testing

advertisement
Projects on Estimation and Hypothesis Testing
General Guidelines
This assignment is due on Wednesday, December 6th at the beginning of class. Papers
that are late or not stapled will not be accepted. Leave time for unforeseen
emergencies. You are to choose one of the following three assignments for your project.
IMPORTANT!!!
This project is to be done by you individually, without assistance from anyone
except Dr. Stevens. This includes but is not limited to any discussion of the
project with classmates, other students, faculty, and family. While you are
allowed to use any textbooks or library books that you wish, any discussion or
sharing of your methods, spreadsheets, data, or results with other students
constitutes an honor violation and will be dealt with as such. You should know
that I have successfully pursued such violations in the past.
Your work on this project must be done in Excel. Except where specified in the
instructions, you may not use my Excel templates for your work. Generate the
appropriate Excel formulas for your problem without reference to my templates.
Standard Operating Procedures (SOP) for Excel Printouts
1. Turn gridlines on and turn labels on. Your resulting printout should then be
gridded, and each row and column should appear with a numerical or letter label.
You can find these options on the File/Page Setup menu by clicking on the Sheet
tab that appears.
2. Adjust the column widths of your work so that the columns are wide enough to
contain the data, yet fit well on the page. When printing, look at both landscape
and portrait orientation for your printout, and choose the one that fits best. Your
problem must fit on a single page. You can make your data fit on one page by
checking the “fit to 1 page wide by 1 page high” box. If you do this, make sure
that you haven’t made so many columns (or rows) that the printing is too tiny to
read!
3. Print two copies of your sheet: one with the numeric values, and one with the
formulae that generated them. You can switch between these views by using the
control-~ key combination. Note that you’ll probably need to resize your column
widths when you display formulas. Failure to do so might cut some information
off.
4. Clearly label the meanings of the data that you enter. If the population mean for
the number of rabbits caught is 35 per week, indicate this (e.g., “population
mean”, “mean number caught (pop)”, etc.). Make sure that it’s clear where your
answer is. Although my templates are more detailed than you will need for this
work, they can give you some ideas for organization.
5. Any graphs in your work should be properly identified and labeled. This includes
title, axis labels, and (if more than one series is in the graph) a sensible legend.
Project #1: The Name Game
Go to the website http://www.ssa.gov/OACT/babynames/. Scroll to the bottom of the
page and select the top 1000 baby names for people born in the same decade that you
were born. Find your own first name on this list. (If your name is not on the list, you
cannot do this particular project.) Read the information given above the data, as it is
relevant to your work. Treat the information at the website as applying to a sample from
a much larger population.
Your tasks are as follows:
1. We’ll define your “cohort” as the set of all Americans of your gender who were
born in the same decade as you were. (My cohort, for example, would be
American males born in the 1950s.) Build the 98% confidence interval for the
fraction of people in your cohort who share your first name.
2. At the 2% level of significance, determine if there is evidence that your name is
less popular (in terms of the fraction of children given that name) in the 2000s
cohort than it was in your own cohort. You may use my Excel spreadsheet for
this, but must clearly explain your answer.
3. The website includes the top 10 names for all years from 1880 until 1997.
Assume that this list is correct for the entire US population for those years.
Generate a simple random sample of 10 of those years and note the number of
letters in the most popular girl’s name for each of those 10 years. Test the null
hypothesis that the average number of letters in the most popular girl’s name
during the period 1880 to 1997 is 4.5 letters long. Use a 10% level of
significance.
The report should be professional in appearance, done with a word processor and Excel,
and should include:

A cover page including your name, your section, and the name of the project .
Also include a signed honor pledge: “On my honor, I have neither given nor
received unauthorized assistance on this project.”

The text of each of the tasks above, along with your answer for that task. This
writeup of your answer to the task should include
o all relevant data raw data (such as the names selected for task 3).
o a check, where appropriate, of the assumptions needed to apply a
statistical method. (You should conduct the statistical analysis even if the
assumptions are not met, but should comment on this in your conclusions.)
o your calculations, shown on an Excel spreadsheet, with correct notation
used. (You may write “x-bar” for sample mean, “sigma-sub-p” for the
standard error of the proportion, and so on.) You may not use my
spreadsheets for this work. Except as noted in task 2, you may not
used my Excel spreadsheets in doing this work. Note, in particular, that
you will be giving me two copies of your spreadsheet: the numerical view
and the formula view.
o your final conclusions, interpreting the results of your spreadsheet. Your
conclusions should refer to the particular task at hand. Don’t just tell me,
“We reject the null hypothesis.” What was the null, and what does it mean
to reject it?
o A clear explanation of how you generated your simple random sample.
Use the Excel “randbetween” function in generating your sample.
It’s up to you whether you print your spreadsheets on a separate sheet or imbed them in
your Word document, but it’s probably easier for you to print them on separate sheets.
Project #2: Tornado!
Visit the website http://www4.ncdc.noaa.gov/cgi-win/wwcgi.dll?wwEvent~Storms. Use
the site to gather the data on all tornadoes of intensity F2 or higher that occurred in
Oklahoma during the 25 year period from January 1, 1969 to December 31, 2003. (The
site should return information on all 548 such tornados, including the number of people
killed and injured in each of these storms.) You can move the data from this site to Excel
by highlighting all of the data and doing a cut and paste command.
Your tasks are as follows:
1. Take a simple random sample of 50 storms selected from this population. Use the
data from these 50 storms to build a 95% confidence interval for the average
number of casualties per storm. (A person is a casualty if he or she is either
injured or killed by the storm.)
2. Use all of the population data (all 548 storms) to determine the actual average
number of casualties per storm from this population. Compare the result of this
calculation with the confidence interval you constructed, and discuss your
findings. (It’s quite possible that your confidence interval “won’t work”. If it
doesn’t think about possible reasons why.)
3. Using the same sample of 50 storms obtained above, conduct the appropriate
hypothesis test to see if there is evidence that someone was hurt or killed in more
than 40% of Oklahoma tornados of intensity F2 or higher during the period from
1969 to 2003. Use a 5% level of significance. Identify two changes that have
taken place over the period of time from 1969 to 2003 that suggest that it may not
be appropriate to use these results in predicting tornado casualties in 2007 from
this data.
The report should be professional in appearance, done with a word processor and Excel,
and should include:

A cover page including your name, your section, and the name of the project .
Also include a signed honor pledge: “On my honor, I have neither given nor
received unauthorized assistance on this project.”

The text of each of the tasks above, along with your answer for that task. This
writeup of your answer to the task should include
o all relevant data raw data (such as the information on the 50 storms that
constitute your sample). You need not include the population data on
all 548 storms.
o a check, where appropriate, of the assumptions needed to apply a
statistical method. (You should conduct the statistical analysis even if the
assumptions are not met, but should comment on this in your conclusions.)
o your calculations, shown on an Excel spreadsheet, with correct notation
used. (You may write “x-bar” for sample mean, “sigma-sub-p” for the
standard error of the proportion, and so on.) You may not use my
spreadsheets for this work. Note, in particular, that you will be giving
me two copies of your spreadsheet: the numerical view and the formula
view.
o your final conclusions, interpreting the results of your spreadsheet. Your
conclusions should refer to the particular task at hand. Don’t just tell me,
“We reject the null hypothesis.” What was the null, and what does it mean
to reject it?
o A clear explanation of how you generated your simple random sample.
Use the Excel “randbetween” function in generating your sample.
It’s up to you whether you print your spreadsheets on a separate sheet or imbed them in
your Word document, but it’s probably easier for you to print them on separate sheets.
Project #3: The Bible, Chapter and Verse
Get a copy of the Holy Bible, either the Revised Standard Version or the King James
Version.
Your tasks are as follows:
1. Obtain a simple random sample of 50 Bible chapters, and note the number of
verses in each of these chapters.
2. Use this sample to estimate the average length (in verses) of chapters in the Bible.
Your margin of error for this confidence interval should be +3 verses. What is the
confidence level for the interval that you have constructed?
3. At the 10% level of significance, do you reject the null hypothesis that the
average length of a Bible chapter is 21 verses? Base your conclusion on the 50
chapters in your simple random sample.
The report should be professional in appearance, done with a word processor and Excel,
and should include:

A cover page including your name, your section, and the name of the project .
Also include a signed honor pledge: “On my honor, I have neither given nor
received unauthorized assistance on this project.”

The text of each of the tasks above, along with your answer for that task. This
writeup of your answer to the task should include
o all relevant data raw data (such as the information on the 50 Chapters that
constitute your sample).
o a check, where appropriate, of the assumptions needed to apply a
statistical method. (You should conduct the statistical analysis even if the
assumptions are not met, but should comment on this in your conclusions.)
o your calculations, shown on an Excel spreadsheet, with correct notation
used. (You may write “x-bar” for sample mean, “sigma-sub-p” for the
standard error of the proportion, and so on.) You may not use my
spreadsheets for this work. Note, in particular, that you will be giving
me two copies of your spreadsheet: the numerical view and the formula
view.
o your final conclusions, interpreting the results of your spreadsheet. Your
conclusions should refer to the particular task at hand. Don’t just tell me,
“We reject the null hypothesis.” What was the null, and what does it mean
to reject it?
o A clear explanation of how you generated your simple random sample.
This is crucial for this project. Use the Excel “randbetween” function in
generating your sample.
It’s up to you whether you print your spreadsheets on a separate sheet or imbed them in
your Word document, but it’s probably easier for you to print them on separate sheets.
Download