Statistical Inference Resource • Years 9 to 13 • Update Nov 26 2013
Lake Taupo Trout
Data Sets
for developing statistical inference at Years 9 to 13
These ten data sets list all the trout weighed in during the Easter fishing
competitions in years 1993 to 1999 and then 2006, 2007 and 2011. The early
data sets have 600 to 1000 fish and the later ones around 125.
Each fish has length, weight, condition factor, how caught, species, sex, maturity
and a fish ID number.
Weight is in kg.
Length is in cm (the minimum for this competition is 45cm, measured from nose to
the V in the tail).
Condition Factor is a calculation from weight and length,
although I think there is a small alteration for the competition purposes. A good
investigation!
How Caught is the method used to catch the fish: DT (Deep Trolling), ST
(Shallow Trolling) or FF (Fly Fishing). Down Rigging appears in the latest data
sets as regulations changed.
Species is either Rainbow (Oncorhynchus mykiss) or Brown (Salmo trutta).
Sex is Male or Female, if able to be determined.
Maturity is either maiden or spawned.
ID Number is a competition-generated number showing the order of weighing in;
bigger numbers were weighed in later.
Jim Hogan
Page 1
2/7/2016
Department of Conservation (DOC, http://www.doc.govt.nz/by-region/centralnorth-island/) Fishing Officers made all the measurements and
judgements, and this data was recorded by competition organisers. DOC used this
data to give measures of trout in the lake at the time.
For research, see the “History of Taupo Fishery” link. Google to find out
more about fishing methods. Try using “rainbow trout”, “taupo fishing”, “fly
fishing” etc. The 1998 Home Pages of the competition are available on disc.
KEY IDEA
It is probably a very wise idea to have a theme (of say Fishing or Water) running
in your department (or school) so that conceptual knowledge is built over time
in anticipation of future learning and assessment opportunities.
Year 9, 10 Teaching Activity Suggestions.
These suggestions are for teachers to use to help design a learning programme
suitable for their students. See Census at School and www.nzmaths.co.nz for
more ideas. Data cards are a good starter. (see NZMATHS, Statistics).
Show a video of fishing to help set the scene. There are YouTube videos
(http://www.youtube.com/watch?v=1V8RdK0h1-E) that show a few fishing
techniques; this one is a fly fishing experience in the South Island. Here is a boy
called Lucas http://www.youtube.com/watch?v=24PVjq4jb3Y and a dog called
Milo.
Introduce the 1993 data set (all 618 trout) by letting students “fish” and catch 9,
17 or 19 trout each. Have students plot these as a dot plot and a box and
whisker plot. Why these numbers? Sample sizes of the general form 4n + 9 make
the median and quartiles easy to find (5 is too small but works).
There is a page of six of these line graphs in the resource. Students can pair up
and fish together or do it on their own. Replace the fish if you need to and mix them
up. There are plenty of fish to catch; the 1999 data set has over 1000 trout.
Making the Teacher Resource
1. Find a box about 20cm by 20cm by 10cm to keep the little fish cards in.
2. Decorate with Lake Taupo pictures.
3. Print and cut up the dataset required.
1993 Sex and Weight only – Introducing topic and junior years
Data Set 1, 1993, 618 fish, weight, length and sex.
Data Set 2, 1999, 1000+ fish, weight, length and sex.
Data Set 3, 2011, 125 fish, weight, length and sex.
4. Take a sample of 17 fish, draw a dot plot and a box and whisker plot on the grid
supplied. Repeat to complete the page. Notice the variation between samples. [Catch
and release only… put your fish back!] [Rule for a nice sample size: 4x + 5]
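The "catch and release" activity can also be run on a computer. The sketch below is only an illustration: the pond is filled with made-up weights (not the 1993 data), the function name is hypothetical, and the quartile convention (median included in both halves when the sample size is odd) is an assumption, since classrooms and software differ on this.

```python
import random
import statistics

def five_number_summary(values):
    """Return (min, lower quartile, median, upper quartile, max).

    Assumed convention: quartiles are the medians of the lower and upper
    halves, with the overall median included in both halves when the
    sample size is odd.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2          # size of each half
    lower, upper = data[:half], data[n - half:]
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])

# Hypothetical pond of 618 weights in kg (made-up numbers, not the 1993 data).
rng = random.Random(1993)
pond = [round(rng.uniform(0.8, 3.5), 2) for _ in range(618)]

for catch in range(3):
    sample = rng.sample(pond, 17)    # catch 17 fish without replacement
    print(catch + 1, five_number_summary(sample))
```

Under this convention a sample of 17 puts the quartiles on the 5th and 13th fish and the median on the 9th, so no averaging is needed. Each run of the loop gives a different summary, which is the sample-to-sample variation students are asked to notice.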
Things to explore and connect with other areas of mathematics.
These suggestions connect to probability, statistics, geometry, measurement,
number and algebra. They suit group work, investigation and project work.
• Making a dot plot
• Making box and whisker
• Determining what an outlier could be (3x IQR above or below quartiles)
• Comparing Male and Female fish (length or weight)
• Estimate the probability of catching a big fish.
• Using the Condition Factor formula to find best fish
• Using scale factor and grid to enlarge Trouty or other pictures
• Discuss how to estimate the number of fish in Lake Taupo. [DOC
estimate approximately 1.5 million mature fish (>45cm)] Tagging fish simulation.
• Estimate the area of Lake Taupo from a map outline [Pick's formula, count
squares, approximate shapes]
• The average depth of Lake Taupo is about 110m. Calculate the volume and mass of
water in the lake. [Surface area 616 km²; average depth 110 m; 616 × 0.11 is about
68 cubic km... wow!]
• Find out how many rivers flow into the lake and how many flow out.
• Work out how much energy is stored as hydro-electricity. Name all of
the hydro-stations on the Waikato River.
• Draw a cross section diagram showing heights of all stations.
• Choose a project to investigate and produce a wall display.
• Go fishing or learn how to tie a fishing fly.
• Discover the National Trout Centre http://www.troutcentre.com/ on
the Tongariro River.
• http://www.food.com/recipes/trout, yum
• Present your project!
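For the "Tagging fish simulation" suggested in the list above, one possible approach is a capture and recapture estimate. The sketch below assumes the simple Lincoln-Petersen estimator (tagged × caught ÷ recaptured); the resource does not say which method is intended, and the lake size and catch sizes are invented for illustration.

```python
import random

def lincoln_petersen(tagged, caught, recaptured):
    """Estimate population size as tagged * caught / recaptured."""
    if recaptured == 0:
        raise ValueError("no tagged fish recaptured; estimate undefined")
    return tagged * caught / recaptured

def simulate(population, tagged, caught, rng):
    """Tag `tagged` fish, catch `caught` at random, return the number of tags seen."""
    fish = [True] * tagged + [False] * (population - tagged)
    second_catch = rng.sample(fish, caught)
    return sum(second_catch)

rng = random.Random(45)
true_size = 2000      # hypothetical classroom "lake" of counters
recaptured = simulate(true_size, tagged=200, caught=150, rng=rng)
if recaptured:
    print(recaptured, lincoln_petersen(200, 150, recaptured))
```

Repeating the simulation with different seeds shows how much the estimate varies around the true size, which links back to sampling variation.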
KEY TASK
List as many statistical ideas as you can.
Outline how statistical inference ideas are developed in your scheme for junior
classes.
Year 11 • NCEA Level 1
The Achievement Standard AS91035 “Investigate a given multivariate data set using the statistical
enquiry cycle” (4cr)
is assessable using the Trout data. There are also possibilities for problems to be
solved by students that would provide evidence towards the Work and Study
Skills Unit Standards 26623, 26626 and 26627.
Key Learning
• taking a random sample to avoid bias
• using dot plot and box and whisker to illustrate sample
• describing features of a sample distribution
• overlap and spread to assist “making a call” for comparisons
• point estimates of population parameters
• use of contextual information associated with the data
Learning PPDAC
The Data
Each fish has length, weight, condition factor, how caught, species, sex, maturity
and a fish ID number.
Weight is in kg.
Length is in cm (the minimum for this competition is 45cm, measured from nose to
the V in the tail).
Condition Factor is a calculation from weight and length,
although I think there is a small alteration for the competition purposes.
How Caught is the method used to catch the fish: DT (Deep Trolling), ST
(Shallow Trolling) or FF (Fly Fishing). Down Rigging appears in the latest data
sets as regulations changed.
Species is either Rainbow (Oncorhynchus mykiss) or Brown (Salmo trutta).
Sex is Male or Female, if able to be determined.
Maturity is either maiden or spawned.
ID Number is a competition-generated number showing the order of weighing in;
bigger numbers were weighed in later.
Sample Assessment for AS91035
Using the 1993 Lake Taupo Fishing Tournament trout data pose a suitable
comparison question and using the PPDAC cycle present your analysis and
answer your question.
1. Problem: I wonder if the weight of a female trout is larger than the weight of a
male trout caught in the Lake Taupo Fishing Tournament in 1993?
2. Plan: I am going to take a random sample of about 30 male and 30 female fish
from the 1993 Lake Taupo Fishing Tournament results and compute the mean of each.
These will be my point estimates. I will describe the distributions and use overlap
and spread to make my decision.
3. Data: 1993 Lake Taupo Fishing Tournament results.
4. Analysis: I used Fathom software to sample 30 random male and 30 random
female fish.
[Note, iNZight can be used to give the same results and summary data].
In my sample there were 30 of each sex. The male distribution has less spread
than the females selected and the weights are grouped around 2kg. The females
have 3 big fish but most are grouped around 1.56kg. The mean of the males is
1.9kg and the mean of the females is 1.8kg. From the box and whisker plots it can
be seen that 50% of the males are bigger than 75% of the females. The difference
in the means is 0.1kg, and the overall spread is UQmales – LQfemales =
2.19 – 1.44 = 0.75kg. This difference of 0.1kg is much smaller than a third of the
overall spread.
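The comparison of the difference with the overall spread can be checked with a few lines of arithmetic. This is a minimal sketch using the figures quoted in the analysis; the function name is hypothetical and the one-third threshold is the one the text refers to.

```python
def shift_to_spread(centre_a, centre_b, overall_low, overall_high):
    """Return the distance between two centres as a fraction of the overall spread."""
    spread = overall_high - overall_low
    if spread <= 0:
        raise ValueError("overall spread must be positive")
    return abs(centre_a - centre_b) / spread

# Means 1.9kg (males) and 1.8kg (females); overall spread from 1.44kg to 2.19kg.
ratio = shift_to_spread(1.9, 1.8, overall_low=1.44, overall_high=2.19)
print(round(ratio, 2), ratio > 1 / 3)
```

Here the ratio is about 0.13, which is below one third.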
5. Conclusion: Based upon this sample it appears that male trout caught in the
1993 Lake Taupo Fishing Tournament are heavier than the female trout caught.
The answer to my question is no, female trout do not appear to be heavier than
male trout. The mean weight of both male and female fish caught was 1.85kg.
This is a point estimate of the weight of any legal fish caught in Lake Taupo in
1993. The minimum length of fish weighed in is 45cm so a legal fish is one that
exceeds this length.
I expected female fish to be heavier because at this time of the year they are full
of roe ready for spawning up rivers within a month or so.
Broad Answer Schedule
See above for sample and analysis. A suitable comparison question might be “I
wonder if the mean weight of a female trout is bigger than the mean weight of a
male trout caught in the Lake Taupo Fishing Tournament in 1993?”
A – PPDAC cycle used, sample taken, graphical evidence used to answer question.
M – Features of sample distribution described and overlap and shift used to justify
answer to question posed.
E – Contextual information included describing why the female fish might be
larger than the male fish. Female fish in April are preparing for spawning and are
full of roe, or eggs. In the following months they migrate up rivers to reproduce.
Note if this context is going to be used to assess AS91035 or AS91036 a more
professional examination and more detailed answer schedule should be
developed. This example was provided as an illustration only.
KEY TASKS
Write a paragraph explaining why this example meets the standard for assessing
AS 91035. Include ideas for improvement.
Develop an assessment task for AS91035 or AS91036, with answer schedule,
based on this example.
Year 12 • NCEA Level 2
The internal Achievement Standard AS91264 “Use statistical methods to make an inference” (4cr)
could be assessed using this topic.
Key Learning
• sample size affects variability
• sample variability is inversely proportional to √n
• point estimate of population parameters
• IQR as a measure of spread
• estimate of median based upon median ± 1.5 × IQR/√n
• interpreting informal inference interval
• using iNZight (or Fathom) essential
Sample Assessment
Using the 1999 Lake Taupo Fishing Tournament trout data pose a suitable
comparison question and using the PPDAC cycle present your analysis and
answer your question.
Sample Answer
Problem: I wonder if the median condition factor of female trout caught during
the Lake Taupo Fishing Tournament was larger than the median condition
factor of male trout caught?
[Note the inclusion of “median”; ref moderator reports 2012/3.]
Plan: I am going to use Fathom software to randomly sample at least 30 female
and at least 30 male trout caught during the 1999 Lake Taupo Trout Fishing
Tournament. I will make a dot plot and a box and whisker plot of the trout
caught. I am choosing 30 of each because from class work a sample of size 30
appears to give a reliable point estimate and interval.
Data: 1999 Lake Taupo Trout Fishing Results database.
Analysis: I sampled 75 trout, which included 31 male trout and 43 female trout.
There must have been more female trout caught in the tournament because
with every resample I ended up with more female trout. Displaying all fish caught
showed nearly 700 female trout and about 400 male trout. There were almost
twice as many female trout caught.
I established an interval based on this sample within which I can be pretty sure
that the true population median is contained.
The median condition factor for the female trout was 45.9 and the median
condition factor for male trout was 42.2. These are point estimates.
Using the formula median ± 1.5 × IQR/√n, the interval generated for the female
trout for the median condition factor is
45.9 ± 1.5 × (49.7 – 40.4)/√43 = [43.8, 48.0] (3sf)
and the interval generated for the male trout for the median condition factor is
42.2 ± 1.5 × (46.7 – 37.2)/√31 = [39.6, 44.8] (3sf)
There was one fish whose sex was unidentified. This has been omitted from the
analysis. Some fish are very difficult to sex due to maturity.
The difference in medians is
median female – median male = 45.9 – 42.2 = 3.7 [mass]/[length]^3
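The two informal intervals can be reproduced with a short calculation. This is a sketch only, using the medians, quartiles and sample sizes quoted above; the function name is hypothetical, and in class iNZight or Fathom would normally do this step.

```python
import math

def informal_interval(median, lq, uq, n):
    """Return (low, high) for median ± 1.5 × IQR / √n."""
    margin = 1.5 * (uq - lq) / math.sqrt(n)
    return median - margin, median + margin

female = informal_interval(45.9, 40.4, 49.7, 43)
male = informal_interval(42.2, 37.2, 46.7, 31)

# Two intervals overlap when each one starts before the other one ends.
overlap = male[1] > female[0] and female[1] > male[0]
print([round(x, 1) for x in female], [round(x, 1) for x in male], overlap)
```

Rounded to one decimal place this gives 43.8 to 48.0 for the females and 39.6 to 44.8 for the males, and the overlap check comes out true.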
Conclusion: The median condition factor for the females was 3.7 higher but the
interval estimates overlapped slightly. I am pretty sure that the median condition
factor for female trout is in the interval 43.8 to 48.0. I am pretty sure that the
median condition factor for males is in the interval 39.6 to 44.8. There is a small
overlap of 44.8 – 43.8 = 1.0 compared to the overall spread of 48.0 – 39.6 = 8.4. I
will not make a call that the median condition factor for female fish is bigger than
that for male fish because there is an overlap.
A reason to suspect females to have a higher condition factor is that this April
competition is just a few months before the spawning period when the fat
females deposit abundant quantities of eggs in the rivers around Lake Taupo.
The male trout are also fit and healthy but the sperm sacs are not as heavy as the
female egg sacs. The fish become fatter but not longer so the condition factor
formula should give bigger values for these fatter female fish. This data does not
support this hunch.
KEY TASK
Write a paragraph explaining why this example meets the standard for assessing
AS 91264. Include ideas for improvement.
Develop an assessment task for AS 91264, with answer schedule, based on this
example. Use any trout database.
Year 13 • NCEA Level 3
The internal Achievement Standards AS91581 “Investigate bivariate measurement data” (4cr) and
AS91582 “Use statistical methods to make a formal inference” (4cr)
could be assessed using this topic. There is time series information but it is not
sufficient for assessment.
Key Learning
• sample size affects variability
• sample variability is inversely proportional to √n
• point estimate of population parameters
• the bootstrap method of resampling
• estimate of parameter interval using bootstrap or other
• interpreting formal inference interval
• using iNZight (or Fathom) essential
• could involve correlation, model and strength of relationship.
Sample Assessment
Using the Lake Taupo Fishing Tournament trout data pose a suitable comparison
question and using the PPDAC cycle present your analysis and answer your
question.
Sample Answer
The trout data includes years 1993 to 1999, and 2006, 2007 and 2011. I am
interested to see if the median weight of a trout has changed over this time period.
[Usually inference is made about a difference. Hence “I wonder if there is a
difference in the weights of fish in the 1993 tournament and the 2011
tournament?” is an alternative question.]
Problem: Has the median weight of trout caught during the Lake Taupo Trout
Fishing Tournament stayed the same over the data periods supplied?
Plan: I am going to take a sample of size 20 from each of the 1994, 1998, 2006 and
2011 data distributions to establish a bootstrapped interval estimate of the
median weight for each of these years. A line graph of these will show trends
over this period 1994 to 2011.
Data: The data supplied is the fish that were weighed in as legal limit fish. These
are fish greater than 45cm in length and “kept” by fishermen who are only
allowed to catch a fishing “limit”. This data is really a sample of the whole lake
but for this assessment I am going to treat the sample as the population. The
time of the tournament is April each year.
I noticed when researching the tournament that Mt Ruapehu,
situated 40km to the southwest of Lake Taupo, erupted in 1995 and 1996. This
deposited ash in the lake. Waikato Regional Council estimates that 2.3 million tonnes
of ash fell into Lake Taupo during this eruption and that in the short term “water
quality improved”. http://www.waikatoregion.govt.nz/Environment/Natural-resources/Water/Lakes/LakeTaupo/How-Mount-Ruapehus-eruptions-affect-Lake-Taupo/
A thesis on trout in Lake Taupo by Heeg (2012) is here:
http://researcharchive.vuw.ac.nz/bitstream/handle/10063/2046/thesis.pdf?sequence=2
Analysis: I sampled 20 randomly selected trout from the 1994 database. Then I
bootstrapped by sampling with replacement, taking 200 resamples of size 20. This
method mimics taking the sample 200 times. The diagram shows the original sample, a
bootstrap sample and the collection of 200 medians.
The collection of medians is used to establish the interval for the true value of the
population median. Finding the 5th sample median from the right of the
distribution and the 5th sample median from the left establishes a 95% percentile
interval containing the median of the population.
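The bootstrap steps described above can be sketched in a few lines. This is an illustration under stated assumptions, not the iNZight or Fathom procedure: the 20 weights are made up (they are not the 1994 sample), the function name is hypothetical, and the interval ends are taken literally as the 5th median from each end of the 200 sorted medians.

```python
import random
import statistics

def bootstrap_median_interval(sample, resamples=200, position=5, rng=None):
    """Percentile-style interval for the median from bootstrap resamples.

    Each resample is drawn with replacement and has the same size as the
    sample. The interval runs from the `position`-th smallest to the
    `position`-th largest of the resample medians.
    """
    rng = rng or random.Random()
    medians = sorted(
        statistics.median(rng.choices(sample, k=len(sample)))
        for _ in range(resamples)
    )
    return medians[position - 1], medians[-position]

# Hypothetical sample of 20 weights in kg (made-up values).
sample = [1.3, 1.5, 1.6, 1.7, 1.8, 1.8, 1.9, 1.9, 2.0, 2.0,
          2.1, 2.1, 2.2, 2.2, 2.3, 2.4, 2.5, 2.6, 2.8, 3.1]
low, high = bootstrap_median_interval(sample, rng=random.Random(1994))
print(low, high)
```

Changing the seed or the number of resamples shows how stable the interval ends are for a sample of this size.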
This works out to be [1.75, 2.28] kg.
This process was repeated for 1998, 2006 and 2011.
The results for these years were:-
1998 interval is [1.9, 2.7]
2006 interval is [1.6, 2.25]
2011 interval is [0.95, 1.4]
These values were put into a table and graphed.
The graph shows an increase in the maximum and minimum of the median weight
interval to 1998 and then a decrease through 2006 to 2011. The answer to the
problem posed is that the median weight of Lake Taupo trout has not stayed the
same during the data periods. The median increased to a maximum of around 2kg in
1998 and then decreased to around 1.2kg in 2011.
The eruption noted earlier could have resulted in the lake being cleaned and
made more suitable for trout. The lake contains some 1.5 million catchable trout, as
researched by DOC and reported in Target magazines. Google “DOC Trout Target
NZ” for many reports. The larger trout of earlier years have been replaced by
the more numerous, smaller trout of today. There is also a catfish issue which could
be interfering with food supply chains. The smaller trout of 2011 do have
reasonable condition factors so there may be other factors involved.
Answer Schedule
A - The student has used the bootstrap method to establish an answer to part of
the question posed.
M – The student has justified the sampling method and decisions made as well as
answering the question posed.
E – A well presented and reasoned report showing contextual knowledge applied
appropriately to explain outcomes from the investigation. The bootstrap method
is clear.
KEY TASK
Write a paragraph explaining why this example meets the standard for assessing
AS 91581. Include ideas for improvement.
Develop an assessment task for AS 91581 or AS 91582, with answer schedule,
based on this example. Use any trout database. There are some interesting
alternatives with the data using methods of fishing.
Author Notes
This document is an attempt to show how early context learning can be used to
inform later assessment. Introducing a topic in earlier years gives students the
opportunity to develop interest, learn the context, and be informed prior
to assessment at senior levels. The statistical techniques involved are modern
and use the computer. In this assessment they are somewhat artificial, as the data
is in fact a sample of the whole population of Lake Taupo trout, of which we know
little. Estimates of over 1.5 million legal fish exist on the DOC website and are
described in past issues of their publication TARGET.
The software is powerful enough to take a bootstrap sample of size 1018 (1998)
and repeat this 10000 times to establish a very tight interval for population
parameters. This could be done, but only 100 or so resamples are needed for a
large sample. It soon becomes very obvious that bootstrapping works: a
surprisingly small sample will give a surprisingly good estimate of many
parameters. My experience suggests a sample size of between 7 and 18, resampled
200 times, will produce a very good inference.
The Central Limit Theorem can be used to assess this standard as no specific
reference is made to the techniques used. The CLT is being “de-emphasised”,
however, in an attempt to help students understand intervals. The modern
computer and available software, as shown, allow resampling techniques and
improved access to understanding the inference process of statistical thinking.
The advantage of resampling is any population shape can be analysed in this way
as the resample simply mirrors the sample taken. For the population involved
here the part of the normal distribution of weights considered is those of trout
with lengths above 45cm. This is a truncated distribution. The population of
small trout less than 45cm must be enormous and dynamic but finite.
A population with a wide variation will produce samples that have wide
variations, hence the IQR in the Year 12 formula. Likewise a larger sample size
reduces the variation between samples, hence the 1/√n aspect of the Year 12
formula. Note that the 1.5 multiplier in this formula is a guess, made by John Tukey,
as to what will work.
Local situations, strengths and resources must dominate school curriculum and
assessment. For the Central Plateau, Rotorua, Napier, Gisborne, Otago regions
trout fishing is a significant activity and one that would be popular with many
thousands of school students. This topic might not be popular with central city,
rural or coastal schools that have different priorities.
2013
iNZight has become established as the standard software. The VIT modules are
excellent teaching resources to show variation in sampling, bootstrapping and
randomisation ideas. Google “inzight nz” for finding this FREE software.
See http://schools.reap.org.nz/advisor for all files.