Mayfield High School – Higher Tier

Mayfield High is a fictitious school but the data collected is from real students. You
will investigate the relationship between height and weight. There are mistakes in the
data including impossible heights and weights.
Locating the Mayfield Database
In the school Intranet select Subject and Group Sites
Then select Maths Website, followed by Mayfield and click on the word HERE
You will spend some time on the computers obtaining data to work on and producing
graphs and statistical calculations, but marks are awarded for your entire write up of
the coursework. Your report must contain the following:
1. A pilot study (including a commentary)
2. A detailed plan for your main study
3. Calculations and graphs produced
4. Detailed conclusions
5. An evaluation
Firstly, as you are only interested in height, weight, year and gender you must delete
the rest of the data.
Deleting Unwanted Data (leaving Year, Gender, Height and Weight)
Select columns 2 to 7
From the Edit menu select Delete
Now repeat this for columns 3 to 12 and 5 to 11
REMEMBER TO REGULARLY SAVE YOUR WORK INTO YOUR MY WORK FOLDER
Part 1 – The Pilot Study
A pilot study (sometimes called a pretest) is a preliminary test to see if there is a line
of enquiry to investigate further.
You will be testing the following hypothesis in your pilot study:

The taller you are the heavier you are
Taking a Sample
There are 1183 students at Mayfield High School. The table shows the number of
students in each year group for males and females.
Year Group    Boys    Girls    Total
Year 7        151     131      282
Year 8        145     125      270
Year 9        118     143      261
Year 10       106     94       200
Year 11       84      86       170
For your pilot study you must consider each of the 10 groups in the school (i.e. Year 7
Girls, Year 7 Boys, Year 8 Girls, etc.) to test your initial hypothesis. To do this you will
need samples from each group as the whole school is too large to work with. You
should take a stratified random sample of 100 students for your pilot study.
Stratified Sample - A stratified sample takes a proportional number from each
group in the population so that each group is fairly represented. This is necessary
when producing graphs or statistical calculations on more than one section of the
population together.
E.g. If there were 150 Year 7 girls in a school with 1000 students and you wanted to
take a stratified sample of 100 students in total, then use the formula:
No. of each group in sample = (no. of students in group / total no. of students in school) x sample size
E.g.
No. of Year 7 girls = (150 / 1000) x 100 = 15 students
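The same calculation can be sketched in Python for all ten Mayfield groups. This is only an illustration, not part of the required Excel method: the function name `stratified_counts` is made up for this example, and the group sizes are copied from the table above. Rounding each group to the nearest whole student does not always give a total equal to the sample size, so the total is printed as a check.

```python
# Group sizes copied from the Mayfield table (boys and girls in each year).
group_sizes = {
    ("Year 7", "Boys"): 151, ("Year 7", "Girls"): 131,
    ("Year 8", "Boys"): 145, ("Year 8", "Girls"): 125,
    ("Year 9", "Boys"): 118, ("Year 9", "Girls"): 143,
    ("Year 10", "Boys"): 106, ("Year 10", "Girls"): 94,
    ("Year 11", "Boys"): 84, ("Year 11", "Girls"): 86,
}


def stratified_counts(sizes, sample_size):
    """Number to sample from each group: (group size / total) x sample size, rounded."""
    total = sum(sizes.values())
    return {group: round(n / total * sample_size) for group, n in sizes.items()}


counts = stratified_counts(group_sizes, 100)
for group, n in counts.items():
    print(group, n)

# Always check the total, because rounding can leave it one above or below.
print("Total in sample:", sum(counts.values()))
```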
Random Sample - Every person should have an equal chance of being chosen for your
sample to make it fair and avoid bias. A quick way of doing this is to give each student
a random number and then sort the data on this number which produces a random list
of the data.
Getting a random sample:
1. Type =RAND() in the first free cell to the right of the first line of data and
press Enter to insert a random number.
2. Click on this cell again and move the cursor to the bottom right of the cell until
it changes to a black cross. Drag down until you reach the bottom of the data.
3. To mix up the data, highlight the cell to the right of the first random number.
Select the Data menu and Sort. Sort by column 5 and this will mix up all the
data.
4. Now select Data menu and Sort and Sort by Year Group Then by Gender.
5. Select the number of calculated students from each group and copy to a
separate sheet
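The steps above can also be sketched outside Excel. The Python below is a rough equivalent under stated assumptions: the records and field names (`year`, `gender`, `height`, `weight`) are invented for illustration and are not Mayfield data, and `random_sample_by_group` is a hypothetical helper. Shuffling plays the role of sorting on the =RAND() column, and taking the first few records of each group matches step 5.

```python
import random


def random_sample_by_group(records, counts, seed=None):
    """Shuffle the records, then keep the first counts[group] records of each group."""
    rng = random.Random(seed)
    shuffled = list(records)   # copy, so the original order is left untouched
    rng.shuffle(shuffled)      # same idea as sorting on a column of random numbers
    taken = {group: 0 for group in counts}
    sample = []
    for rec in shuffled:
        group = (rec["year"], rec["gender"])
        if group in taken and taken[group] < counts[group]:
            sample.append(rec)
            taken[group] += 1
    return sample


# Made-up records for illustration only.
records = [
    {"year": 7, "gender": "F", "height": 1.40 + i / 100, "weight": 38 + i}
    for i in range(10)
] + [
    {"year": 7, "gender": "M", "height": 1.42 + i / 100, "weight": 40 + i}
    for i in range(10)
]

counts = {(7, "F"): 3, (7, "M"): 2}
sample = random_sample_by_group(records, counts, seed=1)
print(len(sample), "students chosen")
```

Fixing `seed` makes the shuffle repeatable, which helps if you need to show how the sample was obtained.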
Calculations and evidence required for your pilot study:
• One scatter graph showing the whole sample for the pilot study
• Correlation coefficient for all heights and weights in your sample
Scatter Graphs – These are used to show the relationship (correlation) between two
sets of data.
Correlation Coefficient – This is a more accurate method to compare correlation.
It uses the mean of each set of data and looks at the distance away from the mean of
each point. The formula, which is known as the Product Moment Correlation
Coefficient or r is
\[
r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}
\]
(where x̄ and ȳ, written \bar{x} and \bar{y} in the formula, are the means of the x and y values respectively)
The value of r determines correlation. It is always between –1 and 1.
-1 = Perfect Negative Correlation
-0.8 = Good Negative Correlation
-0.5 = Some Negative Correlation
0 = No Correlation
0.5 = Some Positive Correlation
0.8 = Good Positive Correlation
1 = Perfect Positive Correlation
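As a check on the formula, here is a minimal Python sketch that follows it term by term. The function name `pmcc` and the heights and weights are made up for illustration; in your coursework Excel's CORREL function does this calculation for you.

```python
from math import sqrt


def pmcc(xs, ys):
    """Product moment correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Top of the fraction: sum of (x - mean of x) times (y - mean of y)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Bottom of the fraction: the two sums of squared distances from the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up heights (m) and weights (kg), for illustration only.
heights = [1.50, 1.55, 1.60, 1.65, 1.70]
weights = [42, 47, 50, 56, 60]

r = pmcc(heights, weights)
print("r =", round(r, 3))
```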
A line of best fit should only be drawn on a scatter graph if the correlation
coefficient is >0.6 or <-0.6
The reason for displaying the equation of the line of best fit is that it can be used to make
predictions. E.g. If the equation for Year 8 boys is y = 50x – 40, this means that the weight of a boy in
Year 8 can be predicted from his height by calculating
weight = 50 x height – 40.
r² (called R² in Excel) is the square of the correlation coefficient and indicates how
reliable predictions from a line of best fit are likely to be. R² is the
proportion of the variation in y that is explained by the linear relationship with x (i.e. how
much of the variation in weight is accounted for by height).
E.g. If the correlation coefficient, r = 0.8
then r² = 0.64
= 64%
That is, 64% of the variation in weight is explained by the line of best fit on height; the
remaining 36% is due to other factors.
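For reference, a line of best fit and its r² can be worked out directly from the data. The sketch below assumes the usual least-squares method (the method behind a linear trendline); the function names `line_of_best_fit` and `predict` are invented for this example, and the height of 1.60 m is an arbitrary illustrative value fed into the Year 8 boys example equation y = 50x – 40.

```python
def line_of_best_fit(xs, ys):
    """Return (gradient, intercept, r_squared) of the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    r_squared = sxy ** 2 / (sxx * syy)   # the square of the correlation coefficient
    return gradient, intercept, r_squared


def predict(x, gradient, intercept):
    """Predicted y value for a given x, read off the line of best fit."""
    return gradient * x + intercept


# Points lying exactly on y = 2x + 1, so the fit should recover that line.
m, c, r2 = line_of_best_fit([1, 2, 3, 4], [3, 5, 7, 9])
print(round(m, 6), round(c, 6), round(r2, 6))

# Using the example equation for Year 8 boys, weight = 50 x height - 40:
print(round(predict(1.60, 50, -40), 2), "kg predicted for a height of 1.60 m")
```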
To Draw a Scatter Graph in Excel:
1. Highlight the two columns of data
2. Click on Chart Wizard (Bar chart icon on tool bar)
3. Choose XY(Scatter)
4. Enter chart title and label axes (remember units!)
5. In Legend untick box labelled Show Legend
6. Choose whether to save as separate chart or on sheet
To Improve Presentation:
Right click on the x-axis and select format axis, choose scale and change the minimum
value. Repeat for the y-axis if necessary.
To Put on a Line of Best Fit (only if strong enough correlation):
Right click on a point in the scatter graph, select add trendline. In options tick
boxes to display equation and R-squared on graph.
To Calculate Correlation Coefficient in Excel:
1. Select a blank cell in the spreadsheet
2. Click on fx on the tool bar
3. Select Statistical in the function category
4. Select Correl in the function name and then click ok
5. In Array 1 highlight the heights
6. Click in Array 2 and highlight all the weights
7. Click ok
Part 2 – The Main Study
Based on the results of your pilot study, you should have a line of enquiry to
investigate further and should refine your hypotheses to include how you think age and
gender will affect results.
E.g.
Hypothesis 1 – The taller you are the heavier you are. The
relationship between the two will become stronger as you get older.
Hypothesis 2 – Boys are taller and heavier than girls. The difference
between boys and girls will increase as the students get older.
Sampling
For your main study you will deal with each of the 10 groups in the school separately to
make comparisons across year groups and gender. To do this you will need larger
samples from each group. 30 students from each group should be enough to perform
statistical calculations on.
As your data has already been randomised you simply need to return to this and take
the first 30 from each group.
Anomalies
Once you have taken your sample, you may notice anomalies (extreme values in your
data, like someone who is 4.65m tall or 5kg in weight). These anomalies may be outliers
(i.e. results differing greatly from others in the same sample). You should test for
these outliers and then decide how to deal with them (leave them in or replace them).
Keep your graphs from before and after.
There are two methods to test for outliers:
1. Using Standard Deviation
A piece of data is considered an outlier if it is more than two standard deviations away
from the mean of the data set.
e.g. The mean of a set of heights is 1.54 m and the standard deviation is 0.11 m.
Is the height 1.25 m in the data set an outlier?
2 x standard deviation = 2 x 0.11 = 0.22
1.54 - 0.22 = 1.32
1.25 is smaller than 1.32 and so is more than 2 standard deviations below the mean,
making it an outlier.
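The same test can be written as a short check in Python. This is a sketch only; the function name `is_outlier_sd` is made up, and the numbers are those of the worked example.

```python
def is_outlier_sd(value, mean, sd, k=2):
    """True if value is more than k standard deviations away from the mean."""
    return abs(value - mean) > k * sd


# Worked example: mean 1.54 m, standard deviation 0.11 m
print(is_outlier_sd(1.25, 1.54, 0.11))   # True: below 1.54 - 0.22 = 1.32
print(is_outlier_sd(1.40, 1.54, 0.11))   # False: within 0.22 of the mean
```

Using `abs` means the check catches values that are too far above the mean as well as too far below it.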
2. Using Interquartile Range (IQR)
A piece of data is considered an outlier if it is more than 1.5 times the Interquartile
range above the upper quartile (UQ) or below the lower quartile (LQ).
e.g. The Lower quartile for a set of data is 6, the upper quartile is 9.5. Is the value 15
in the data set an outlier?
IQR = UQ – LQ = 9.5 – 6 = 3.5
Multiply the IQR by 1.5:
3.5 x 1.5 = 5.25
Any values more than 5.25 below the lower quartile or more than 5.25 above the upper
quartile are outliers.
LQ - 5.25 = 6 - 5.25 = 0.75
UQ + 5.25 = 9.5 + 5.25 = 14.75
15 is larger than 14.75 and so is an outlier.
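A short Python sketch of the interquartile range test, using the numbers from the worked example. The function names `iqr_limits` and `is_outlier_iqr` are invented for illustration; the limits are 1.5 x IQR below the lower quartile and 1.5 x IQR above the upper quartile.

```python
def iqr_limits(lq, uq, k=1.5):
    """Return (lower limit, upper limit); values outside these are outliers."""
    iqr = uq - lq
    return lq - k * iqr, uq + k * iqr


def is_outlier_iqr(value, lq, uq, k=1.5):
    """True if value lies below the lower limit or above the upper limit."""
    lower, upper = iqr_limits(lq, uq, k)
    return value < lower or value > upper


lower, upper = iqr_limits(6, 9.5)
print(lower, upper)                  # 0.75 14.75
print(is_outlier_iqr(15, 6, 9.5))    # True
```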
Replacing Anomalies
If you find an outlier in your data you must deal with it and include evidence. If it is
only just an outlier in your test you may choose to leave it in, but you must explain why
you have chosen to do this. If it is a clear outlier you must remove it and replace it
with a new person from your original randomised data. Your scatter graph and
calculations will automatically update, so remember to print out any graphs and
calculations before deleting to use as evidence in your report. Do not forget that if
you replace a piece of data, this will also need testing (N.B. The standard deviation or
interquartile range used will be a different value from before).
Calculations required for the Main Study
1. Scatter graphs for each group
2. Correlation coefficient for each group
3. Line of best fit, equation and R² on scatter graphs (only if correlation
coefficient is >0.6 or <-0.6)
4. Mean and standard deviation for heights (or weights) for each group
5. Minimum, lower quartile, median, upper quartile and maximum for weights (or
heights). From this you can find the range and interquartile range and produce
box and whisker diagrams (these should be done by hand on graph paper –
putting them all on one sheet along the same scale will make comparisons easier)
Standard Deviation – Standard deviation (represented by the symbol σ) looks at
how spread out the data is. It is obtained by looking at how far each individual value is
away from the mean. It is calculated using the formula:
\[
\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}
\]
where x̄ (written \bar{x} in the formula) is the mean of the data set and n is the number of values
The larger the value obtained, the further the values are, on average, from the mean. Comparing
the standard deviations of two sets of data shows which set is more spread out. Standard deviation can be calculated
in different ways depending upon the type of data. In Excel the function STDEVP uses
the above formula, which is the one you need for your data.
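To see the formula at work, the Python sketch below calculates the standard deviation step by step and compares it with `statistics.pstdev` from the standard library, which uses the same divide-by-n formula. The function name `population_sd` and the data values are made up for illustration.

```python
from math import sqrt
import statistics


def population_sd(values):
    """Standard deviation using the divide-by-n formula."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((x - mean) ** 2 for x in values) / n)


# Made-up data with mean 5; the squared distances from the mean add up to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(population_sd(data))        # 2.0
print(statistics.pstdev(data))    # library version of the same formula
```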
To Calculate Standard Deviation in Excel:
1. Select an empty cell
2. Click on fx on the tool bar
3. Select Statistical in the function category
4. Select Stdevp in the function name and then click ok
5. Highlight the first item in list and drag down to highlight all the data in the
column, which will appear in the number 1 box (ignore number 2 box) and click ok
To Calculate the Mean in Excel:
1. Click in a blank cell
2. Click on fx on the tool bar
3. Select Statistical in the function category
4. Select Average in the function name and then click ok
5. Highlight the list of numbers you require the mean for, which will appear in the
number 1 box (ignore number 2 box) and click ok
Interquartile Range – This is also a measure of spread but looks at the spread of
the middle 50% of the data around the median. It is found by subtracting the lower
quartile from the upper quartile (calculating UQ-LQ).
To Calculate Quartiles in Excel:
The Lower Quartile
1. Click in an empty cell
2. Click on fx on the tool bar
3. Select Statistical in the function category
4. Select Quartile in the function name and then click ok
5. Highlight the column of data, which will appear in the array box
6. Click in quart box and type 1
7. Click ok
To calculate the Minimum, repeat as above but type in 0 instead of 1 in the quart
box
To calculate the Median, repeat as above but type in 2 in the quart box
To calculate the Upper Quartile, repeat as above but type in 3 in the quart box
To calculate the Maximum, repeat as above but type in 4 in the quart box
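The five values can also be found together in Python. This sketch uses `statistics.quantiles` with `method="inclusive"`, which interpolates between data values; the assumption here is that this corresponds to Excel's QUARTILE function, so check against Excel before relying on it. The function name `five_number_summary` and the heights are made up for illustration.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    lq, median, uq = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), lq, median, uq, max(values)


# Made-up heights in metres, for illustration only.
heights = [1.42, 1.48, 1.50, 1.53, 1.55, 1.58, 1.60, 1.63, 1.71]

minimum, lq, median, uq, maximum = five_number_summary(heights)
print("Min:", minimum, "LQ:", lq, "Median:", median, "UQ:", uq, "Max:", maximum)
print("Range:", round(maximum - minimum, 2))
print("IQR:", round(uq - lq, 2))
```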
Drawing Box and Whisker Diagrams
1. Use graph paper
2. Draw a horizontal scale. It is a good idea to draw all the box plots on one piece
of graph paper using the same scale. You will then be able to compare your
results in your conclusion.
3. Mark on the 5 pieces of data you have found and draw in the box and whiskers
as shown below.
[Figure: example box and whisker diagram with the minimum, lower quartile, median, upper quartile and maximum labelled, drawn on a horizontal scale from 1.0 to 2.0]
Summary Tables
Summary tables are used to make comparisons between years and gender easier. You
should include key pieces of data in your summary table(s):
• Correlation Coefficient
• Mean and Standard Deviation
• Median and Interquartile Range
E.g.
Group                Correlation    Height                        Weight
                     Coefficient    Mean    Standard Deviation    Median    IQR
Year 7     Male
           Female
Year 8     Male
           Female
Year 9     Male
           Female
Year 10    Male
           Female
Year 11    Male
           Female
Bar charts can also be a way to compare these values visually.
E.g.
[Figure: bar chart titled "Mean Heights" comparing girls and boys; x-axis: Year Group (Year 7 to Year 11); y-axis: Mean Height (m), from 0 to 2]
Conclusions
Your conclusions are very important and must be detailed, but clear; do not waffle! Any
calculations which you do not mention are considered irrelevant data. Make
statements and comparisons from your calculations across year groups and gender and
link these to your original hypotheses. (More info in Your Report section of this
booklet)
Evaluation
You must consider how reliable your findings are. Can you use them to make correct
predictions for the whole school? Could they be used to make predictions for your own
school?
One way of considering the limitations of your sample is to compare the key results in
your summary table to calculations for the whole population (use Excel to obtain these
calculations)
Your Report
Your write up must include the following:
Pilot Study
• Brief introduction to the coursework task
• Why and how pilot study was carried out
• Why and how a stratified random sample was taken and any possible limitations
• Findings from pilot study
Plan for Main Study
• State refined hypotheses (to include variation across years and gender)
• How you will prove/disprove your hypotheses (i.e. what calculations and analysis
  you will make; explain which calculations will prove which hypothesis and WHY
  you use them/what they show)
• Why/how sample is taken; how many; what groupings
• How you will eliminate bias/deal with outliers (include evidence)
Represent
• Scatter graphs for each group, correlation coefficient, line of best fit and
  equation (where appropriate)
• Min, max, lower and upper quartiles and median for heights or weights of each
  group
• Box and Whisker diagrams (on same sheet for later analysis)
• Mean and standard deviation for heights or weights
• Summary table(s) of calculated values
Conclusions
• Correlation shown on each scatter graph and what they evidence
• Comparison of correlation values across the groups
• Explain your equations of lines of best fit and use to make
  predictions/inferences (include calculations)
• Comparison of height/weight between boys/girls
• Comparison of height/weight across year groups
• Refer to box and whisker diagrams and what they demonstrate
• Refer to mean and standard deviation and what they demonstrate
• Relate findings to original hypotheses. Do your findings support them?
• Give/suggest reasons for your findings
• Give/suggest reasons for exceptions to patterns in your data
Evaluation
• Was your sample a good/fair/representative sample?
• Any limitations of the project?
• Any problems you faced and changes you would make