Q Sci 381 Lab 2
October 8, 2014

As the head fisheries biologist at the Horne River salmon counting station, it is your job to analyze the data your team gathers. Finding the length of the Chinook salmon that are swimming to the headwaters to spawn is a high priority for the Horne River Trust Fund (HRTF). Throughout the salmon run, your team of biologists has measured a sample of 15 Chinook out of the 327 Chinook counted swimming upstream. The Excel file chin_lengths contains the lengths, in centimeters, of the fish sampled.

1. Find the mean, median and range of the sample.

In a new cell type “=average(A2:A16)”. You can highlight the data you’re interested in to save time. Repeat this for the median (“=median(A2:A16)”). To find the range, use the min and max functions and then subtract min from max:

“=min(A2:A16)” gives the minimum value
“=max(A2:A16)” gives the maximum value
(max value) - (min value) = range

2. What is the standard deviation? (Hint: is this a sample or a population?)

We have a sample of 15 fish out of 327 counted. Because this is a sample rather than the whole population, we use the sample standard deviation: “=STDEV.S(A2:A16)”.

3. Calculate z-scores for each value. Are any data points unusual?

$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$$

To do this in Excel, we’ll create a new column titled z. Highlight the entire z-score column from B2 to B16. Now in the formula bar type “=(A2:A16-A18)/A19”, where A18 holds the mean and A19 holds the standard deviation. Here’s the important part: press Control + Shift + Enter (Command + Enter on a Mac). This enters the formula as an array formula, so Excel pulls from the correct cell in each row as it computes the z-scores.

Looking at our z-score column, we can see that our last Chinook length of 103.4 is 2.8 standard deviations away from our mean. This is unusual: a z-score beyond about ±2 falls outside the range that contains roughly 95% of values in an approximately normal distribution.

4. Make a histogram of the lengths using 5 classes. Is the distribution symmetric?

We know from earlier calculations that we have a range of 68.7. We divide by our desired number of classes (5) and find the class width should be 13.74; we’ll round up to 14.
$$\frac{68.7 \text{ (range)}}{5 \text{ (classes)}} = 13.74 \approx 14 \text{ (class width)}$$

Beginning just below our minimum value (34.7), we’ll create our classes and class boundaries.

Class      Class Boundaries
34-48      48
48-62      62
62-76      76
76-90      90
90-104     104

Now, you can either use the frequency function to calculate frequencies for a histogram:

• Select the new column (plus one additional cell) that will become the frequency column.
• Use the frequency function to calculate frequencies (“=frequency(data array, class boundaries)”). Remember to press Control + Shift + Enter.
• Sum the frequency column.
• Create a new column to become relative frequency.

OR you can create the histogram using the Data Analysis tool we used last week by choosing Data Analysis -> Histogram and then filling in our data array for the input range and the upper limits for the bin range.

Then we might wish to graph the relative frequencies. Remember that the relative frequency of a class is the proportion of all data points that fall in that class. To compute relative frequencies, you need the frequency for each class and the total number of data points. Name a new column “Rel. Frequency” next to the frequency column. In the first cell of this new column type “=6/15”. You can also select the cell containing 6 instead of typing it; if you do this you can copy and paste to fill the rest of the column.

Class      Class Boundaries   Frequency   Relative Frequency
34-48      48                 6           0.4
48-62      62                 5           0.333333333
62-76      76                 3           0.2
76-90      90                 0           0
90-104     104                1           0.066666667
Over 104                      0
Total                         15

To display the relative frequencies as a histogram, simply drag the blue box that shades the frequency column over to the neighboring relative frequency column. The graph should now display the relative frequencies.

[Figure: Relative Frequency of Chinook Lengths. Horizontal axis: Length (cm), with classes ending at 48, 62, 76, 90 and 104; vertical axis: Relative Frequency.]

5. Determine the 1st, 2nd, and 3rd quartile.

To determine quartiles in Excel, we use the “quartile.inc” function.
For the first quartile we’ll type “=quartile.inc(A2:A16, 1)”, where A2:A16 is the data we’re interested in and “1” tells Excel we want the 1st quartile. For the second quartile we’ll type “=quartile.inc(A2:A16, 2)”; we could also use the median, since the two are the same thing. For the third quartile we’ll type “=quartile.inc(A2:A16, 3)”. Quartile 0 and quartile 4 should return the same values as min and max, respectively.

6. Make a box-and-whisker plot for the data.

First take a minute to arrange our values and make sure we have everything we need to create a box-and-whisker plot. We should end up with something like:

min             34.7
min valid       34.7
1st quartile    42.15
average         54.6
median          53.3
3rd quartile    61.55
max valid       70.3
max             103.4

Min valid is the lowest value that is not an outlier, and max valid is the greatest value that is not an outlier (take a look at the z-scores to think about this).

Now calculate the width of each percentile box and the non-outlier limits. For example, to find the width of the 50th percentile box, subtract the 25th percentile (1st quartile) from the 50th percentile (median). The lower limit is the 1st quartile minus min valid, and the higher limit is max valid minus the 3rd quartile.

25th percentile    42.15
50th percentile    11.15
75th percentile    8.25
lower limit        7.45
higher limit       8.8

Select the three percentile cells and make a stacked bar graph (Insert -> horizontal bar charts icon -> All Charts -> second horizontal stacked bar icon). The result is a stacked bar chart, which we’ll now modify to make it look closer to a box-and-whisker plot.
• [...] right-click, Format Data Series -> Fill -> No fill
• [...] -> Solid Line
• [...] right-click, Format Axis -> Scale -> Vertical axis crosses at maximum axis value
• [...] right-click, Format Axis..., Line Color, No line
• Right-click in the chart, Select Data, pick the Horizontal (Category) Axis Labels, click the Edit button and pick the cell containing your data label

To add lower and upper limits:

• [...] click on “Add Chart Element” on the top menu -> Error Bars -> Error Bars Options -> Error Bars -> Minus; then Custom Value -> Specify Value -> choose the lower limit for the negative error value
• Now select the third box and follow the previous steps, except choose the upper limit for the positive error value.

To add the average:

• In a cell next to your average, write 0.
• Select the cells containing the average label and value (C3:D3) and copy them (Ctrl-C). Then select the graph and paste (Ctrl-V) to add the value as a new series.
• It is added as a new box in our graph. Select it, choose Change Series Chart Type..., and pick X Y (Scatter) with only markers.
• This will show the marker, but it won't be positioned correctly in the graph. Remember we switched rows and columns earlier: right-click anywhere in the chart, choose Select Data..., then pick the newly added series (Average). Edit the X value to be your average value and the Y value to be 0.
• Pick the left-side Secondary Vertical (Value) Axis and right-click to Format Axis...; set the Minimum value to -1 and the Maximum to 1. Close the dialog. You can now delete this extra axis scale indicator.
• The average is now positioned correctly. You might want to change the indicator to a more visible symbol.

To add the outliers:

• Right-click the graph and choose Select Data...
• Click the Add button.
• For the Series X values, select the values greater than our max valid (103.4 in our case).
• For the Series Y values, put ={0} so the new data points are shown correctly on the vertical scale.

Finally!
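As a cross-check outside Excel, the arithmetic behind the histogram classes and the box-plot widths can be sketched in Python. This is only a minimal sketch: it assumes nothing but the summary values printed in this handout (it does not read the chin_lengths file), and the variable names are illustrative, not part of the lab.

```python
import math

# Summary values copied from the handout (cm). The 15 raw lengths are not used here.
minimum, q1, median, q3, max_valid, maximum = 34.7, 42.15, 53.3, 61.55, 70.3, 103.4
frequencies = [6, 5, 3, 0, 1]  # counts per class, from the frequency table

# Range, and class width for 5 classes rounded up to a whole number.
data_range = maximum - minimum
class_width = math.ceil(data_range / 5)

# Relative frequency = class count / total count.
total = sum(frequencies)
relative = [count / total for count in frequencies]

# Segment lengths for the stacked bar chart, plus the error-bar lengths.
box_widths = {
    "25th percentile": q1,
    "50th percentile": median - q1,
    "75th percentile": q3 - median,
    "lower limit": q1 - minimum,
    "higher limit": max_valid - q3,  # 8.75; the handout's table shows it rounded to 8.8
}

print("range:", round(data_range, 2), "class width:", class_width)
print("relative frequencies:", [round(r, 3) for r in relative])
for label, width in box_widths.items():
    print(label, round(width, 2))
```

If the printed values differ from your spreadsheet, the most likely cause is a cell range that does not cover exactly A2:A16.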
For more help on box-and-whisker plots, read Stéphane Hamel, “Box plot and whisker plots in Excel 2007” (immeria: an immersion in digital analytics).