Statistics 301 Project #2 Description Fall 2015

advertisement
Statistics 301
Project #2 Description
Fall 2015
The projects allow you to use your skills in a practical problem. Statistics provides tools to make
justifiable conclusions from data with variation. The project includes description of data, inference
about answers to questions, and evaluating assumptions. You will work in a team of 2 to 4 people,
although please talk with me if you prefer to work as an individual. Your team will choose one of the
two topics described on the next page. I will provide you your data set. Your team will write a technical
report of not more than 8 typed (11 point font or larger) double-spaced pages. Reports will not be
returned, so make a copy if you want a copy for yourself.
This project allows you to apply all the modelling skills you have learnt over the semester. The next
page provides short summaries of the two topics. Additional information about your chosen topic and
the data set will be sent to you next week. You will have time in lab to work on the project.
Your report should include:
 An executive summary (one or two paragraphs) that briefly but completely summarizes your
approach and the results of your analysis. Although this will appear at the beginning of your
report, it is likely to be the last thing you write.
 A summary of relevant aspects of the study design.
 A description of the relationship between your explanatory and response variables. Again,
include any graphs, summary statistics, and analysis that support your description. You may
need to transform one or more variable if the initial analysis has problems with assumptions.
 An evaluation of assumptions for the ``final’’ analysis.
 Answers to topic-specific questions
 Predictions and any related information for the specified X values.
Write your report with a minimum of statistical jargon. Jargon is especially inappropriate in the
executive summary. If you use statistical phrases, e.g., saying something is statistically significant, you
need to explain what you mean by that.
Turning in a wad of JMP output is not an appropriate statistical analysis. The body of your report should
include only those results that support your description or inference. You should cut/paste important
figures into your report. Most numeric results will be better presented in a form other than the JMP
output. Remember that JMP usually gives you more than you need so do not include graphs or output
that is not needed. Including unnecessary JMP output will result in deductions from the final project
score.
The project will be graded with a total of 30 points. If everyone on the team contributes approximately
equally, each team member will receive the same score. When the report is submitted, any team
member can include a separate sealed envelope giving her/his assessment of the relative effort
contributed by each team member. This should be expressed as a percentage of the total team effort
for each team member. If effort was wildly unequal, I will discuss the issue with team members and if
appropriate, adjust individual scores. If no envelopes are turned in, I will assume equal contributions.
The two topics available are:
Topic 1: Understanding reproduction in wild horses (part 2). The wild horse is a Western icon but large
numbers of wild horses create ecological problems. The response variable is fecundity, the number of
foals per adult in the group. This is measured on 50 groups of wild horses. Data were collected over
multiple years in two locations. As it turns out, the treatment was expected to have no effect in the first
year of the study (details provided if you choose this project).
There are various nuisance variables that might need to be adjusted for: location, number of adults in
the group, observation month, and year post-treatment. It is also not clear how to best denote the
treatment: as 1/0 (for group has a sterilized male / does not) or as the proportion of sterilized males in
the group (0 or some number > 0).
The folks managing these herds want to know:
1. what is the effect of the treatment on foal production (fecundity)?
2. is the response to the treatment different in the two locations?
3. the predicted fecundity for three specified conditions
Your group will need to decide (among other things): how to deal with the nuisance variables, how to
use data from the first year of the study, and how best to use the treatment information.
Topic 2: Investigation of possible sex discrimination in a bank. These data are derived from those
provided in support of a 1980’s era lawsuit against a Texas bank. The plaintiff’s claim that female
employees were hired at a lower starting salary and given smaller pay raises. To support these claims,
they obtained data on 100 employees hired between 1965 and 1975. The data set contains information
on one job category (skilled, entry-level clerical). It includes information on the starting salary, the
salary in 1977, the average annual pay raise, the number of months they had been hired when the data
were collected, the year of hire, and, and some descriptive information about each individual: their age,
sex, number of years of education, and number of months of work experience prior to being hired at the
bank.
The plaintiff’s lawyers have asked you to:
1. estimate the average difference between (or ratio of) female and male starting salaries.
2. estimate the average difference between (or ratio of) female and male starting salaries, after
adjusting for appropriate individual characteristics.
3. estimate the average difference between (or ratio of) female and male pay raises, after
adjusting if appropriate.
4. the predicted pay raise for two specified individuals.
Your group will need to decide (among other things): how to appropriately adjust for individual
characteristics
Due dates:
End of lecture, 20 Nov: Each team submits a list of team members and its choice of topic. If no team
member will be in class that day, please e-mail me that information.
End of lecture, Friday, 11 Dec: Turn in your report.
Download