
TigerSTAT Instructor Guide
Laboratory Exercise: Using Simple Linear Regression to Estimate a Tiger’s Age
Quick Info
Level: Intro/Intermediate Undergraduate Statistics
Brief Description: Students investigate the association of tiger age with particular tiger characteristics by
conducting a simple linear regression analysis. Instructors also have the option to assign a reading
from a scientific article that uses descriptive statistics and regression analysis.
Topics Covered: Experimental Design, Data Analysis, Linear Regression, and an introduction to a real application
of statistical modeling using the arcsine transformation.
Software Required: Data analysis software such as Excel or Minitab for descriptive statistics and
regression analysis. Students will also need computer access to play the TigerSTAT game on the web (this
can be done inside or outside of the regularly scheduled class time).
Prerequisites: Descriptive Statistics, Distributions, Hypothesis Testing
Time: 1 to 3 hours in class and 1 to 2 hours of homework
Instructor Resources: Student Lab, Instructor Guide, TigerSTAT Game Website
(http://web.grinnell.edu/individuals/kuipers/stat2labs/tigerstat.html)
Sustainable trophy hunting of African lions (Whitman et al. 2004)
(http://www.cbs.umn.edu/lionresearch/publications/articles/Sustainable_trophy_hunting_of_African_lions.pdf)
Why use this lab in your course?
In this lab, students use the online TigerSTAT game to collect data and explore models for estimating
the age of a Siberian tiger. In this game, students act as researchers on a national preserve where
they are expected to catch tigers, collect data, analyze their data (using simple linear regression
on transformed data), and draw appropriate conclusions. Before playing the game, students can read a
scientific paper discussing current methods of estimating the age of lions, largely through the use of proxy
variables. Through the TigerSTAT game, they are exposed to messy data and to issues associated with data collection.
This lab provides an engaging way to practice simple linear regression applied to a real
problem. The realism of the lab can be increased if students also read and discuss the research
article provided in the introduction. One goal of this lab is to encourage students to consider
the implications of more complicated research design topics like sampling and bias.
What type of course is this TigerSTAT lab designed for?
This lab is designed for any course that introduces the simple linear regression model.
When should you use this lab in your course and what are the prerequisites?
Distributions and hypothesis testing should be familiar topics to the students. Although
linear regression is the primary topic of this lab, the game can be used to motivate many
topics such as descriptive statistics and visualizing data. In this sense, the game may
be visited several times during a course.
How should you conduct the lab? How much time should you expect to allocate?
Day 1: OPTIONAL READING: Ask students to complete the background research
exercise (page 1), in which they read the entire article Sustainable trophy hunting of
African lions (Whitman et al. 2004) and answer the discussion questions before class.
At the beginning of Day 1 you can start class by using the questions from the
background research exercise to motivate discussion. An alternative approach is to
include this as a part of the first day, and perhaps select only portions of the article to
have students read in class and then discuss together. The game and the following lab
questions do not require students to read the article or look at the mathematical model
used in the article.
You may also choose to skip page 1 (reading and discussing the research article) and
start students directly on page 2 (the lab and the game). Introduce the game (15
minutes). If students have computers available, the first day would be an excellent
opportunity to go to the game’s webpage and play the tutorial. If no computers are
available, a brief discussion is warranted. The instructor should explain how the tutorial
works, the difference between the two missions and how to retrieve the dataset. The
instructor should have students complete questions 1-4 of the lab in preparation for the
next class.
Day 2: Have students come to class with their game data and complete all the lab
questions, working in groups of 2-3. There are a few points during the lab at which
the instructor should lead a discussion of particular blocks of questions. We recommend
the following:
- Begin with a discussion of student responses to questions 1-4 (10 min).
- Have students work on questions 5-9, focusing discussion on the assumptions of a
linear model (15 min). Instructors should be aware that (1) some student data
may not support the model, (2) small data sets may be very erratic and may not
accurately fit any model, and (3) it may be best to combine data from several groups
(make sure there are no duplicate observations in the combined data).
- After a majority of student groups have worked on questions 10-12, discuss the
significance of the association and the model fit (15 min).
- Question 13 will elicit very different responses from students. For some, the
transformation will make little difference to the fitted relationship between nose-blackness
proportion and age or to the model's R² value; for others, the effect of the transformation
will be readily apparent. Discussion should focus on the issues of sample size, representative
sampling, and bias in the context of the experiment (20 min).
The Background Research exercise and TigerSTAT lab are available in the next
section.
For more information and ideas on using TigerSTAT in your course go to:
http://web.grinnell.edu/individuals/kuipers/stat2labs/tigerstat.html.
TigerSTAT Background Research
Before conducting the TigerSTAT lab, read the 2004 article by Whitman et al.,
Sustainable trophy hunting of African lions (available at
http://www.cbs.umn.edu/lionresearch/publications/articles/Sustainable_trophy_hunting_of_African_lions.pdf),
and answer the following questions:
1. Why is estimating the age of a lion a worthwhile question?
2. What are some of the difficulties associated with estimating the age?
3. What are a few approaches to estimating the ages of lions? Which of these might be
useful in estimating the age of a tiger?
4. How could you test whether your model produces good estimates of a tiger’s age?
TigerSTAT
Using Simple Linear Regression to Estimate a Tiger’s Age
You are hired to develop models to use in estimating the age of a population of
tigers.
The Bolshoy Kosha (Russian for big cat) Reserve is a newly created animal reserve that
was developed specifically to help endangered species prosper. This 10,000-acre wild
animal reserve was selected because Siberian tigers have been found in abundance
in the area. The diverse terrain of the reserve provides a wide variety of habitats
for many different species of animals.
Because tigers are far more abundant in this area than anywhere else in the world,
they are starting to draw a significant number of researchers to the region. Your
primary responsibility will be to help these researchers as they study the tigers and then
incorporate the results of their research into a system for identifying the best management
practices for this reserve.
Establishing a simple model to estimate the age of a tiger.
While the exact age is not known for most of the tigers in your reserve, the ages of some
tigers are known. These tigers have been carefully monitored by keeping them in a smaller
research zone within the Bolshoy Kosha (BK) land area. To estimate the age of a tiger that is captured on
your reserve, you will need to compare characteristics of the captured tiger with those of the tigers
that live in the research zone (whose ages are known).
When data is collected as an indirect measure of the variable of interest, it is often
called proxy data. For this task we will examine one model developed for lions and see
how well it extends to our tigers. Whitman et al. (2004) used
the percentage of nose blackness (NOSEBLACK%) to develop the model:
$$\text{AGE} = \beta_0 + \beta_1 \arcsin(\text{NOSEBLACK\%}) \qquad (1)$$
Your mission is to go into the Bolshoy Kosha reserve and gather data on as many tigers
as you can in 30 minutes. Using your sample data, answer the questions shown below.
Collecting Data
Go to http://statgames.tietronix.com/TigerStat/ and enter a
PlayerName and GroupName. (Use a secret name consisting of any combination of letters and
numbers with no spaces; do not use your name or a term that could identify you.) If you
are working in a team, each person on your team should use the same GroupName.
Use the Full Screen option to see the entire game on your computer screen.
Questions:
1) Calculate the mean and standard deviation of NOSEBLACK% of the tigers in
your sample.
2) Calculate the mean and standard deviation of the AGE of the tigers in your
sample.
3) Produce a graph of the AGE against NOSEBLACK% for your sample – describe
the relationship you observe. Would a linear model be appropriate for these
variables? Why or why not?
4) Create a new variable ANOSEBLACK% by computing the Arcsin of
NOSEBLACK% for tigers in your sample. Graph AGE as a function of this new
variable. Do you think the assumed linear relationship is reasonable? Why or
why not?
5) Use your software package to regress AGE on ANOSEBLACK% in order to
estimate the parameters in equation (1).
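Students who prefer a scripting language to Excel or Minitab could work through Questions 1, 2, 4, and 5 using the following minimal Python sketch. The `noseblack` and `age` lists are invented placeholder values, not TigerSTAT output, so replace them with your own game data. The sketch assumes NOSEBLACK% is recorded as a proportion between 0 and 1, because arcsin is only defined on [-1, 1]. If your data are on a 0-100 scale, divide by 100 first. The helper `fit_line` is a hypothetical name. It computes the ordinary least-squares estimates b1 = Sxy / Sxx and b0 = ȳ - b1·x̄.

```python
import math
import statistics

# Hypothetical sample: invented values for illustration only, NOT real TigerSTAT data.
# Assumption: NOSEBLACK% is stored as a proportion in [0, 1] so that arcsin is defined.
noseblack = [0.10, 0.15, 0.22, 0.30, 0.38, 0.45, 0.55, 0.63, 0.72, 0.80]
age = [1.5, 2.0, 2.8, 3.5, 4.1, 5.0, 5.8, 7.1, 8.0, 9.6]

# Questions 1 and 2: sample mean and sample standard deviation (n - 1 denominator).
print("NOSEBLACK% mean, sd:", statistics.mean(noseblack), statistics.stdev(noseblack))
print("AGE mean, sd:", statistics.mean(age), statistics.stdev(age))

# Question 4: transformed predictor ANOSEBLACK% = arcsin(NOSEBLACK%), in radians.
anoseblack = [math.asin(p) for p in noseblack]


def fit_line(x, y):
    """Ordinary least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Question 5: AGE is the response, ANOSEBLACK% is the predictor.
b0, b1 = fit_line(anoseblack, age)
print(f"AGE = {b0:.3f} + {b1:.3f} * arcsin(NOSEBLACK%)")
```

The regression direction matters here. Equation (1) treats AGE as the response, so AGE is the second argument, `y`.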
Checking the assumptions of any statistical model is imperative before any inferences
are made. One assumption of our simple regression model is that the residuals are normally
distributed. Let’s check the validity of this assumption.
6) Using the parameter values obtained in Question 5, estimate the age of the tigers in your
sample.
7) For each estimated value, compute the associated residual $e_i = y_i - \hat{y}_i$.
8) Create an appropriate plot you have learned about in class (such as a histogram or a normal
Q-Q plot) to assess the normality assumption for the set of residuals computed in
Question 7. Does our assumption of normality hold?
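Questions 6 to 8 can be sketched in Python as follows. The data are invented placeholder values, not real TigerSTAT output, and NOSEBLACK% is again assumed to be a proportion. For the Q-Q coordinates, the sketch uses plotting positions (i - 0.5)/n. This is one common convention among several, not necessarily the one your software uses. `statistics.NormalDist` has been in the Python standard library since version 3.8.

```python
import math
import statistics

# Hypothetical (invented) sample; replace with your own TigerSTAT data.
noseblack = [0.10, 0.15, 0.22, 0.30, 0.38, 0.45, 0.55, 0.63, 0.72, 0.80]
age = [1.5, 2.0, 2.8, 3.5, 4.1, 5.0, 5.8, 7.1, 8.0, 9.6]
x = [math.asin(p) for p in noseblack]  # ANOSEBLACK%

# Least-squares fit of AGE on ANOSEBLACK%.
x_bar, y_bar = statistics.mean(x), statistics.mean(age)
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, age))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

# Question 6: estimated ages y_hat_i = b0 + b1 * x_i.
age_hat = [b0 + b1 * xi for xi in x]

# Question 7: residuals e_i = y_i - y_hat_i.
residuals = [yi - yhi for yi, yhi in zip(age, age_hat)]

# Question 8: normal Q-Q coordinates. The sorted residuals are paired with
# standard normal quantiles at plotting positions (i - 0.5) / n.
n = len(residuals)
std_normal = statistics.NormalDist()
theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
qq_points = list(zip(theoretical, sorted(residuals)))
for q, r in qq_points:
    print(f"{q:7.3f}  {r:7.3f}")
```

You can plot the printed pairs or paste them into Excel or Minitab. Points lying close to a straight line support the normality assumption. A curved or S-shaped pattern, or isolated points far from the line, suggests the assumption does not hold. With only a handful of tigers, expect some wobble even when the assumption is reasonable.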
Before performing our regression, we transformed our explanatory variable
NOSEBLACK% using the Arcsin function (ANOSEBLACK%). Let’s evaluate the validity
of our transformation.
9) Create a plot of the residuals computed in Question 7 versus the ANOSEBLACK% of
each member of the dataset. Do any patterns emerge? What does it mean if
there is a distinctive pattern?
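For Question 9, the Python sketch below produces the (ANOSEBLACK%, residual) pairs for the plot, sorted along the horizontal axis. Least-squares residuals always sum to zero and are uncorrelated with the predictor. A problem therefore shows up as shape (curvature, or spread that fans out) rather than as an overall trend. The run count at the end is an informal aid we have added, not part of the original lab and not a formal test. The data values are invented placeholders.

```python
import math
import statistics

# Hypothetical (invented) sample; replace with your own TigerSTAT data.
noseblack = [0.10, 0.15, 0.22, 0.30, 0.38, 0.45, 0.55, 0.63, 0.72, 0.80]
age = [1.5, 2.0, 2.8, 3.5, 4.1, 5.0, 5.8, 7.1, 8.0, 9.6]
x = [math.asin(p) for p in noseblack]  # ANOSEBLACK%

# Refit AGE on ANOSEBLACK% by ordinary least squares.
x_bar, y_bar = statistics.mean(x), statistics.mean(age)
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, age))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, age)]

# Points for the residual plot, ordered along the horizontal axis.
pairs = sorted(zip(x, residuals))
for xi, ei in pairs:
    print(f"ANOSEBLACK% = {xi:.3f}   residual = {ei:+.3f}")

# Informal check: count runs of same-signed residuals in x order.
# A few long runs (e.g. + + + - - - - + + +) hint at curvature the line misses.
signs = [ei > 0 for _, ei in pairs]
runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
print("runs of same-signed residuals:", runs)
```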
Now that we have checked the assumptions of our model, let’s look at how well the
model performs. One measure of this is the coefficient of determination, R². This is
the proportion of variability in the data set that is accounted for by the statistical model,
and it gives us insight into how well future outcomes are likely to be predicted by the
model. We compute R² using equation (2) below.
$$R^2 = 1 - \frac{SSE}{SST}, \qquad SSE = \sum_i (y_i - \hat{y}_i)^2, \qquad SST = \sum_i (y_i - \bar{y})^2 \qquad (2)$$
SSE is the sum of squared errors, a measure of the variability in the response
not captured by the model. SST, the total sum of squares, is a measure of the overall
variability of the response about its sample mean.
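Equation (2) translates directly into a few lines of Python. In the minimal sketch below, the function name `r_squared` is our own hypothetical choice. It accepts any lists of observed and fitted values, so it can be reused for both the transformed and the untransformed models. R² computed this way is guaranteed to lie between 0 and 1 only when the fitted values come from a least-squares fit that includes an intercept.

```python
def r_squared(y, y_hat):
    """Coefficient of determination from equation (2): R^2 = 1 - SSE / SST."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))  # variation left unexplained
    sst = sum((yi - y_bar) ** 2 for yi in y)                  # total variation about the mean
    return 1 - sse / sst


# Small made-up check: observed values and fitted values from some model.
observed = [2.0, 4.0, 6.0, 8.0]
fitted = [3.0, 4.0, 5.0, 8.0]
print(round(r_squared(observed, fitted), 3))  # SSE = 2, SST = 20, so prints 0.9
```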
10) Compute R² for the model developed for your sample. Based on this value,
how well does our model perform?
Before making any inferences or predictions about the mean values of the response
variable, we must determine whether the parameter associated with the predictor
ANOSEBLACK% is significantly different from zero. That is, we want to test the null hypothesis 𝛽1 = 0
versus the alternative hypothesis 𝛽1 ≠ 0. The test statistic (t) for this hypothesis is
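$$t = \frac{\hat{\beta}_1 - \beta_1}{s_{\hat{\beta}_1}} \qquad (3)$$
where $s_{\hat{\beta}_1}$ is the estimated standard error of $\hat{\beta}_1$ and $\beta_1$ is set to its hypothesized value of 0.
When the null hypothesis is true, the test statistic has a t-distribution with n-2 degrees of freedom.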
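Statistical packages such as Minitab report the slope's standard error, t statistic, and p-value directly. The Python sketch below shows what equation (3) computes, using only the standard library. The standard error is the usual least-squares formula s / sqrt(Sxx), with s = sqrt(SSE / (n - 2)). Python's standard library has no t distribution, so the two-sided p-value is obtained by numerically integrating the t density with Simpson's rule. That integration is our own stand-in for a package's t-distribution routine and is not part of the original lab. The function names and data values are hypothetical.

```python
import math


def slope_t_test(x, y):
    """Least-squares slope, its standard error, the t statistic of equation (3)
    under Ho: beta_1 = 0, and the n - 2 degrees of freedom."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # residual standard error
    se_b1 = s / math.sqrt(sxx)    # standard error of the slope estimate
    t = (b1 - 0.0) / se_b1        # hypothesized beta_1 = 0
    return b1, se_b1, t, n - 2


def t_pdf(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))


def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|) = 1 - 2 * (integral of the density from 0 to |t|),
    with the integral computed by Simpson's rule (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


# Hypothetical (invented) sample; replace with your own TigerSTAT data.
noseblack = [0.10, 0.15, 0.22, 0.30, 0.38, 0.45, 0.55, 0.63, 0.72, 0.80]
age = [1.5, 2.0, 2.8, 3.5, 4.1, 5.0, 5.8, 7.1, 8.0, 9.6]
anoseblack = [math.asin(p) for p in noseblack]

b1, se_b1, t, df = slope_t_test(anoseblack, age)
print(f"b1 = {b1:.3f}, SE = {se_b1:.3f}, t = {t:.2f}, df = {df}, p = {two_sided_p(t, df):.4g}")
```

Because the p-value is computed as 1 minus an integral, extremely small p-values are only accurate to about 1e-12. For reporting, "p < 0.001" is usually the sensible form.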
11) Compute the test statistic for the null hypothesis Ho: 𝛽1 = 0. Do we reject or fail
to reject the null hypothesis? Why or why not?
12) Interpret the results of the hypothesis test in the context of the study.
13) Now perform a similar analysis without the Arcsin transformation. Describe any
differences you see from the previous analysis.
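For Question 13, one way to script the comparison is to fit both models to the same sample. The Python sketch below does this, using a hypothetical helper `fit_and_r2` and invented placeholder data, so no particular result is claimed. Which model has the higher R² depends on your own sample. R² alone does not settle which model is appropriate, so also compare residual plots like those in Questions 8 and 9.

```python
import math


def fit_and_r2(x, y):
    """Least-squares line y = b0 + b1 * x and its R^2 = 1 - SSE / SST."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return b0, b1, 1 - sse / sst


# Hypothetical (invented) sample; replace with your own TigerSTAT data.
noseblack = [0.10, 0.15, 0.22, 0.30, 0.38, 0.45, 0.55, 0.63, 0.72, 0.80]
age = [1.5, 2.0, 2.8, 3.5, 4.1, 5.0, 5.8, 7.1, 8.0, 9.6]

raw = fit_and_r2(noseblack, age)                          # AGE on NOSEBLACK%
arc = fit_and_r2([math.asin(p) for p in noseblack], age)  # AGE on ANOSEBLACK%
for label, (b0, b1, r2) in [("untransformed", raw), ("arcsin", arc)]:
    print(f"{label:>13}: AGE = {b0:.3f} + {b1:.3f} * x,  R^2 = {r2:.3f}")
```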