Lab 2 skills

Q Sci 381
Lab 2
October 8, 2014
As the head fisheries biologist at the Horne River salmon counting station, it is your job to
analyze the data your team gathers. Finding the length of the Chinook salmon that are
swimming to the headwaters to spawn is of high priority to the Horne River Trust Fund (HRTF).
Throughout the salmon run, your team of biologists has measured a sample of 15 Chinook out of
the 327 Chinook counted swimming upstream. The Excel file chin_lengths contains the lengths, in
centimeters, of the fish sampled.
1. Find the mean, median and range of the sample.
In a new cell type in “=average(A2:A16)”. You can highlight the data you’re interested in to
save time. Repeat this for median (“=median(A2:A16)”). To find the range use the min and
max functions and then subtract min from max:
“=min(A2:A16)” gives the minimum value
“=max(A2:A16)” gives the maximum value
(Max value)-(min value) = range
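If you want to double-check the spreadsheet results outside of Excel, the same three summaries can be sketched in Python. This is only an illustration: the list `lengths` below holds made-up values, not the chin_lengths data, and the variable names are our own.

```python
import statistics

# Hypothetical lengths in cm (NOT the chin_lengths data from the lab).
lengths = [40.0, 45.5, 50.0, 52.5, 62.0]

mean_length = statistics.mean(lengths)       # same idea as =average(A2:A16)
median_length = statistics.median(lengths)   # same idea as =median(A2:A16)
length_range = max(lengths) - min(lengths)   # (max value) - (min value)

print(mean_length, median_length, length_range)
```

Replacing `lengths` with the 15 values from column A should reproduce the numbers Excel gives.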
2. What is the standard deviation? (Hint- is this a sample or population?)
We have a sample of 15 fish out of 327 counted, so we want the sample standard deviation. To
find it we use “=STDEV.S(A2:A16)”. (STDEV.P would only be appropriate if we had measured the
entire population.)
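The difference between the sample and population formulas can be seen in a short Python sketch. The data are made up for illustration (not the chin_lengths values); `statistics.stdev` divides by n - 1 like STDEV.S, while `statistics.pstdev` divides by n like STDEV.P.

```python
import statistics

# Hypothetical lengths in cm (NOT the chin_lengths data from the lab).
lengths = [40.0, 45.5, 50.0, 52.5, 62.0]

sample_sd = statistics.stdev(lengths)        # divides by n - 1, like STDEV.S
population_sd = statistics.pstdev(lengths)   # divides by n, like STDEV.P

print(sample_sd, population_sd)
```

The sample version is always the larger of the two, because it divides the same sum of squared deviations by a smaller number.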
3. Calculate z-scores for each value. Are any data points unusual?
$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$$
To do this in Excel, we’ll create a new column titled z. Highlight the entire z-score column from
B2 to B16. Now in the formula bar type in “=(A2:A16-A18)/A19”, where A18 and A19 are the cells
holding the mean and the standard deviation. Here’s the important part: hit Control + Shift +
Enter (Command + Enter on a Mac). This enters the formula as an array formula, so Excel pulls
from the correct cell in each row as it calculates our z-scores.
Looking at our z-score column we can see that our last Chinook length of 103.4 is 2.8 standard
deviations away from our mean. This is pretty unusual, since for roughly bell-shaped data about
95% of values fall within 2 standard deviations of the mean.
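The z-score column can also be sketched in Python as a cross-check. The data below are hypothetical (not the chin_lengths values), and the cutoff of 2 standard deviations is an assumption that mirrors the 95% reasoning above.

```python
import statistics

# Hypothetical lengths in cm (NOT the chin_lengths data from the lab).
lengths = [40.0, 42.0, 44.0, 46.0, 48.0, 50.0, 90.0]

mean_length = statistics.mean(lengths)
sample_sd = statistics.stdev(lengths)   # sample standard deviation, like STDEV.S

# z = (value - mean) / standard deviation, one z-score per data point
z_scores = [(x - mean_length) / sample_sd for x in lengths]

# Flag values more than 2 standard deviations from the mean (assumed cutoff).
unusual = [x for x, z in zip(lengths, z_scores) if abs(z) > 2]

print(unusual)
```

With these made-up numbers only the 90.0 cm fish is flagged.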
4. Make a histogram of the lengths using 5 classes. Is the distribution symmetric?
We know from earlier calculations we have a range of 68.7. We divide by our desired number of
classes (5) and find the class width should be 13.74; we’ll round up to 14.
$$\text{class width} = \frac{\text{range}}{\text{number of classes}} = \frac{68.7}{5} = 13.74 \approx 14$$
Beginning with just below our lower limit (34.7) we’ll create our classes and class boundaries.
Class     Class Boundaries
34-48     48
48-62     62
62-76     76
76-90     90
90-104    104
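The arithmetic behind this table can be written out as a small Python sketch. It uses the range (68.7), lowest value (34.7), and number of classes (5) from the text; the variable names are our own, and rounding the starting point down to a whole number is our reading of "just below our lower limit".

```python
import math

data_range = 68.7     # range found in question 1
num_classes = 5       # desired number of classes
lowest_value = 34.7   # smallest length in the sample

class_width = math.ceil(data_range / num_classes)   # 13.74 rounds up to 14
start = math.floor(lowest_value)                    # begin just below 34.7

# Upper boundary of each class, then the (lower, upper) pair for each class.
upper_boundaries = [start + class_width * k for k in range(1, num_classes + 1)]
classes = [(upper - class_width, upper) for upper in upper_boundaries]

print(upper_boundaries)
print(classes)
```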
Now, you can either use the FREQUENCY function to calculate frequencies for a histogram:
Select the new column (+ one additional cell, which counts any values above the last boundary)
that will become the frequency column.
Use the FREQUENCY function to calculate frequencies (“=FREQUENCY(data array, class boundaries)”).
Remember to Control + Shift + Enter.
Sum the frequency column.
Create a new column to become relative frequency.
OR you can create the histogram using the data analysis tool we used last week by choosing Data
Analysis -> Histogram and then filling in our data array for the input range and the upper limits
for the bin range.
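To see what the frequency count is doing, here is a minimal Python sketch of the same idea. The function name `frequency` and the data are hypothetical; the sketch assumes each value is counted in the first class whose upper boundary it does not exceed, with one extra slot for values above the last boundary.

```python
import bisect

def frequency(data, upper_boundaries):
    """Count values per class; the final slot counts values above the last boundary."""
    counts = [0] * (len(upper_boundaries) + 1)
    for x in data:
        # Index of the first boundary that is >= x (a value equal to a
        # boundary is counted in that boundary's class).
        counts[bisect.bisect_left(upper_boundaries, x)] += 1
    return counts

# Hypothetical lengths in cm (NOT the chin_lengths data from the lab).
lengths = [35.0, 41.2, 48.0, 55.5, 70.1, 95.0, 110.0]
boundaries = [48, 62, 76, 90, 104]

counts = frequency(lengths, boundaries)
print(counts)
```

The extra slot at the end is why the frequency column needs one more cell than there are class boundaries.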
Then we might wish to graph the relative frequencies. Remember the relative frequencies are
the proportions of the data that fall in each class. To compute relative frequencies, you need
the frequency for each class and the total number of data points.
Name a new column “Rel. Frequency” next to the frequency column. In the first cell of this new
column type in “=6/15”. You can also select the cell containing 6 instead of typing it; if you do
this you can copy and paste to fill the rest of the column.
Class     Class Boundaries   Frequency   Relative Frequency
34-48     48                 6           0.4
48-62     62                 5           0.333333333
62-76     76                 3           0.2
76-90     90                 0           0
90-104    104                1           0.066666667
Above 104                    0
Total                        15
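The relative frequency column is just each frequency divided by the total. A short Python sketch using the class frequencies from this lab (6, 5, 3, 0, 1) shows the calculation; the variable names are our own.

```python
# Class frequencies from the lab: 34-48, 48-62, 62-76, 76-90, 90-104
frequencies = [6, 5, 3, 0, 1]

total = sum(frequencies)                          # total number of fish sampled
relative = [f / total for f in frequencies]       # proportion in each class

for f, r in zip(frequencies, relative):
    print(f, round(r, 9))
```

The relative frequencies should always add up to 1, which is a quick way to check the column.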
To display the relative frequencies as a histogram, simply drag the blue box that was shading in
the frequency column over to the next relative frequency column. The graph should now display
the relative frequencies.
[Figure: histogram titled “Relative Frequency of Chinook Lengths”; x-axis: Length (cm), with classes labeled 48, 62, 76, 90, 104; y-axis: Relative Frequency, from 0 to 0.5]
5. Determine the 1st, 2nd, and 3rd quartile.
To determine quartiles in Excel, we use the “quartile.inc” function. For the first quartile we’ll
type “=quartile.inc(A2:A16, 1)”, where A2:A16 is the data we’re interested in and “1” tells Excel
we’re interested in the 1st quartile.
For the second quartile we’ll type “=quartile.inc(A2:A16, 2)”; we could also use the median
since they’re the same thing. For the third quartile we’ll type “=quartile.inc(A2:A16, 3)”.
Quartile 0 and quartile 4 should give the same values as the min and max, respectively.
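For a cross-check outside Excel, Python's `statistics.quantiles` with `method="inclusive"` interpolates between data points in the same way as the inclusive quartile calculation, so it should agree with quartile.inc. The data below are made up for illustration (not the chin_lengths values).

```python
import statistics

# Hypothetical lengths in cm (NOT the chin_lengths data from the lab).
lengths = [40.0, 45.5, 50.0, 52.5, 62.0]

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = statistics.quantiles(lengths, n=4, method="inclusive")

print(q1, q2, q3)
print(q2 == statistics.median(lengths))   # the 2nd quartile is the median
```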
6. Make a box-and-whisker plot for the data.
First take a minute to arrange our values and make sure we have everything we need to create a
box-and-whisker plot. We should end up with something like:
Min            34.7
min valid      34.7
1st quartile   42.15
Average        54.6
median         53.3
3rd quartile   61.55
max valid      70.3
max            103.4
Min valid is the lowest value that is not an outlier and max valid is the greatest value that is
not an outlier (take a look at the z-scores to think about this).
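Now calculate the width of each percentile box and the non-outlier limits. For example, to find
the width of the 50th percentile box, subtract the 1st quartile (25th percentile) from the median
(50th percentile). The lower limit is the 1st quartile minus min valid, and the higher limit is
max valid minus the 3rd quartile.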
25th percentile   42.15
50th percentile   11.15
75th percentile   8.25
lower limit       7.45
higher limit      8.8
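The box widths and limits above are simple differences between the summary values, which the following Python sketch works through. The numbers are the ones from the summary table in this lab; the variable names are our own.

```python
# Summary values from the lab (cm)
min_valid = 34.7
q1 = 42.15
median = 53.3
q3 = 61.55
max_valid = 70.3

box_25 = q1               # first stacked bar runs from 0 up to the 1st quartile
box_50 = median - q1      # width of the box between the 1st quartile and the median
box_75 = q3 - median      # width of the box between the median and the 3rd quartile
lower_limit = q1 - min_valid   # length of the lower whisker
upper_limit = max_valid - q3   # length of the upper whisker

for name, value in [("25th", box_25), ("50th", box_50), ("75th", box_75),
                    ("lower", lower_limit), ("higher", upper_limit)]:
    print(name, round(value, 2))
```

The higher limit works out to 8.75, which appears as 8.8 in the table once rounded to one decimal place.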
Select the three percentile cells and make a stacked bar graph (Insert -> horizontal bar charts
icon -> All Charts -> second horizontal stacked bar icon). It should return something like this:
We’ll modify the graph to make it look closer to a box-and-whisker plot.
• Right-click, Format Data Series -> Fill -> No fill
• -> Solid Line
• Right-click, Format Axis -> Scale -> Vertical axis crosses at maximum axis value
• Right-click, Format Axis..., Line Color, No line
• Right-click in the chart, Select Data, pick the Horizontal (Category) Axis labels, click the Edit button and pick the cell containing your data label
So far we’ve got:
To add lower and upper limits:
Click on “Add Chart Element” on the top menu -> Error Bars -> Error Bars Options -> Error Bars ->
Minus; then Custom Value -> Specify Value -> choose the lower limit for the negative error value.
Now select the third box and follow the previous steps, except choose the upper limit for the
positive error value.
To add the average:
• In a cell next to your average write 0.
• Select the cells containing the average label and value (C3:D3). Do a copy (Ctrl-C). Simply select the graph and paste this value (Ctrl-V) as a new series.
• It's being added as a new box in our graph. Select it and Change Series Chart Type..., pick the X Y (Scatter) with only markers.
• This will show the marker, but it won't be positioned correctly in the graph. Remember we switched rows and columns earlier: right-click anywhere in the chart, Select Data..., then pick the newly added series (Average). Edit the X value to be your average value and the Y value to be 0.
• Pick the left-side Secondary Vertical (Value) Axis and right-click to Format Axis..., set the Minimum value to -1 and the Maximum to 1. Close the dialog. You can now delete this extra axis scale indicator.
• The average is now positioned correctly. You might want to change the indicator to a more visible symbol.
To add the outliers:
• Right-click the graph and Select Data...
• Click the Add button.
• For the Series X Values, select the values greater than our max valid, that is, the outliers (103.4 in our case).
• For the Series Y Values, put ={0} so the new data points are shown correctly on the vertical scale.
Finally!
For more help on box-and-whisker plots, read Stéphane Hamel, “Box plot and whisker plots in
Excel 2007” (immeria: an immersion in digital analytics).