Artificial Selection Lab Big Idea 1

Artificial Selection Lab
Big Idea 1 – Lab 1
Grow Wisconsin Fast Plants (Brassica rapa)
And now we wait… and observe variations
in the plants, like…?
Hmmm… height? OK.
Finally, day 7…
Trichomes on Cannabis: epidermal outgrowths of various kinds
Now we need to measure the heights of all our plants and calculate appropriate descriptive
statistics for the class data. Where is the class data? On the next slide…
Plant height data for a sample of 41 plants (N = 41) at day 7.
What descriptive statistics should we calculate from these data
to describe the population as a whole in terms of height?
(Watch the Anderson video on standard deviation.)
1. histogram
2. mean
3. median
4. range
5. standard deviation
Let’s do this…
What is a histogram?
These are all histograms. What do they have in common?
Histograms are graphs that reveal the
distribution/frequency of your data (how
often particular values appear).
What goes on the x-axis?
Your range of data values. These can be individual values (left) or ranges/bins of values
(right).
What goes on the y-axis?
Frequency, or the number of times a given value appears.
Ex1) According to the histogram on the left, how many times did a score of 16 appear on a
multiple-choice test given to a class of students? 3
Ex2) According to the histogram above, how many people were paid between $77,000 and
$87,000? ~330
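The y-axis frequency is just a count of how often each value appears. As a minimal sketch (in Python, with made-up test scores standing in for the histogram's data, which we do not have), `collections.Counter` does the tallying:

```python
from collections import Counter

# Hypothetical multiple-choice test scores (illustrative only, not the slide's data)
scores = [12, 14, 16, 15, 16, 13, 16, 14, 15, 12]

# Counter maps each distinct value to how many times it appears (its frequency)
frequency = Counter(scores)

# A simple text histogram: x = score, y = frequency (one # per occurrence)
for value in sorted(frequency):
    print(f"{value:>3} | {'#' * frequency[value]}")

print("Times a score of 16 appeared:", frequency[16])
```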
If enough data is collected,
histograms can reveal a
normal distribution in the
data around a central mean.
What is the approximate mean of the
birth-weight data shown on the
right? About 3.5 kg
(the apex of the normal distribution curve)
Now let’s build a histogram
for our plant height data…
What should we do first?
Sort the data!!
Sorting the Data
Mac: Highlight the two columns → Data → Sort, sorting by height.
PC: Figure it out… lol
What next?
Estimate an appropriate bin size (or use no bins). Write the bins in one column of the Excel
sheet and determine the frequency for each bin in the column next to it… see right. Bin size
is critical, as you will see on the next slide, so "choose wisely".
Data sorted by height
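Counting how many plants land in each bin is what the bins-plus-frequency columns in Excel accomplish. A minimal sketch in Python, assuming equal-width bins that start at 1 cm (like the 1 to 5, 6 to 10, … bins on the next slide) and using made-up heights; `bin_counts` is a hypothetical helper, not an Excel function:

```python
def bin_counts(values, start, width, n_bins):
    """Count how many values fall in each equal-width bin.

    Bin i covers [start + i*width, start + (i+1)*width); values outside
    all bins are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)   # index of the bin this value lands in
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts

# Hypothetical plant heights in cm (illustrative only)
heights = sorted([1, 7, 8, 9, 9, 10, 10, 11, 11, 12, 13, 15])

wide = bin_counts(heights, start=1, width=5, n_bins=3)     # 1-5, 6-10, 11-15 cm
narrow = bin_counts(heights, start=1, width=1, n_bins=16)  # 1, 2, ..., 16 cm
print(wide)
print(narrow)
```

Changing `width` is exactly the bin-size choice the next slide warns about: the 1-cm bins show the shape of the distribution in far more detail than the 5-cm bins.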
Binning and Graphing the Data
Scenario 1 (large bin size) vs. Scenario 2 (small bin size)
[Figure: two histograms of plant height; y-axis: # of plants. Scenario 1 uses 5-cm bins (1 to 5, 6 to 10, 11 to 15, 16 to 20, 21 to 25 cm); Scenario 2 uses 1-cm bins from 1 to 16 cm and carries an "Outlier?" label.]
Which histogram provides more information about the distribution of plant heights
in our population? Scenario 2, as its higher resolution tells a more complete story.
Calculate the mean (in Excel, =AVERAGE()), the median, and the range.
[Figure: plant-height histogram; x-axis: Height bins (cm), 1 to 16; y-axis: # of plants.]
The mean and median
- Measures of central tendency
The range
- A measure of spread
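Outside Excel, Python's standard `statistics` module does the same calculations. A minimal sketch with made-up heights (not our class data):

```python
import statistics

# Hypothetical plant heights in cm (illustrative only)
heights = [1, 7, 8, 9, 9, 10, 10, 11, 11, 12, 13, 15]

mean = statistics.mean(heights)             # same idea as Excel's =AVERAGE()
median = statistics.median(heights)         # middle value after sorting
value_range = max(heights) - min(heights)   # distance from smallest to largest

print(f"mean = {mean:.2f} cm, median = {median} cm")
print(f"range = {min(heights)} to {max(heights)} cm ({value_range} cm)")
```

In this made-up sample the single 1-cm plant pulls the mean (about 9.67 cm) below the median (10 cm), a preview of the outlier discussion that follows.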
Descriptive Statistics: Histograms and Distributions
The MEDIAN:
This is simply the data value that falls in the middle after sorting the data from low to high
(or high to low). Just arrange the data and find the middle number…
For example, in the sample to the right, the value that separates the higher and lower halves
of the data is 291 ms, which is the median.
Reaction time (ms): 265, 273, 286, 291, 293, 300, 330
The MEDIAN
This is simply the value in a data set that separates the higher half of a sample from the
lower half.
What if there is an even number of data points, as shown on the right?
Again, sort the data from low to high, and now just average the two middle numbers. In this
case you average 286 and 292 to get a median of 289.
Reaction time (ms): 265, 273, 286, 292, 293, 300
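Both median cases (odd and even counts) fit in one short function. A minimal Python sketch, run on the two reaction-time samples from these slides; `median` here is our own illustrative helper (Python's `statistics.median` gives the same results):

```python
def median(values):
    """Sort, then take the middle value (odd count) or average the two middle values (even count)."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                      # odd count: the single middle value
    return (data[mid - 1] + data[mid]) / 2    # even count: average of the two middle values

odd_sample = [265, 273, 286, 291, 293, 300, 330]   # reaction times (ms), 7 values
even_sample = [265, 273, 286, 292, 293, 300]       # reaction times (ms), 6 values
print(median(odd_sample))    # 291
print(median(even_sample))   # 289.0
```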
Stats can be misleading… be very wary…
For example, a college boasts that the average starting salary of last year's graduating
class was $362,000 per year. This sounds quite impressive…
However, what they did not tell you was that the class had 30 students, of whom 29 started at
$30,000 a year and one, a first-round NFL draft pick, made approximately $10,000,000 per year:
(29 × $30,000 + $10,000,000) ÷ 30 ≈ $362,000.
[Figure: histogram of athlete reaction times; x-axis: Time (ms), 10-ms bins from 201-210 to 501-510; y-axis: frequency.]
Such a data point ($10,000,000 per year) is called an outlier: a data point much higher or
lower than the rest of the data points.
An outlier can also be seen in the histogram of our athlete reaction-time data (right)…
perhaps the person blinked while the reaction time was being measured.
What is the median of this data set?
$30,000
The median is far less sensitive to outliers than the mean.
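A quick Python check of the college example, using the 29 graduates at $30,000 plus one at $10,000,000 described above:

```python
import statistics

# The college example: 29 graduates at $30,000 and one at about $10,000,000
salaries = [30_000] * 29 + [10_000_000]

mean_salary = statistics.mean(salaries)      # pulled far up by the single outlier
median_salary = statistics.median(salaries)  # stays at the typical salary

print(f"mean   = ${mean_salary:,.0f}")
print(f"median = ${median_salary:,.0f}")
```

The mean comes out near the advertised $362,000, while the median stays at $30,000; raising the outlier further would move the mean but not the median.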
That said, the median can hide extremes…
Ex)
Consider the wages at 'The Widget Company' below. Suppose we increase the earnings of the CEO
from $100,000 to $500,000. How does the median reported to the public change?
Median: $30,000
It doesn't. You could raise it to a trillion and the median would not budge…
Look at our data…
We have what appears to be an outlier… a single plant with a height of
1 cm, way off the beaten path…
What does the histogram tell us about the mean, then?
It may not be so accurate because of this potential outlier, so the
median may be the better value to use as the center of the data.
What would the mean be without the outlier?
10.6… closer to the median…
[Figure: plant-height histogram; x-axis: Height bins (cm), 1 to 16; y-axis: # of plants.]
The range can also be misleading…
Ex) The range of 1 to 16 cm makes it seem that our plant heights might be evenly
spread across the entire range.
However, what does the histogram show us?
The vast majority fall between 6 and 16 cm. A very different picture indeed.
- The range is a measure of spread, but it should never be used as the only measure of
spread, because it tells you nothing about what is going on in the middle.
[Figure: plant-height histogram; x-axis: Height bins (cm), 1 to 16; y-axis: # of plants.]
Then what other measure of spread will help us talk about the middle and not
just the edges of the data?
Standard Deviation (s or σ)
[Figure: plant-height histogram; x-axis: Height bins (cm), 1 to 16; y-axis: # of plants.]
σ = the lowercase Greek letter sigma
What is Standard Deviation (s or σ)?
- The standard deviation is a number that you calculate from your data. It tells you, more
precisely than the range, where your data lie relative to the mean… not just between 1 and
16 cm like before.
How does it do that? What does this number tell me?
[Figure: plant-height histogram; x-axis: Height bins (cm), 1 to 16; y-axis: # of plants.]
Quite simply: our data have a mean of 10.37 cm. Let's say we calculate the standard deviation
to be σ = 1.1 cm. Then we would write 10.37 ± 1.1 cm. This tells you to add 1.1 to the mean,
getting 11.47 cm, and to subtract it from the mean, getting 9.27 cm.
Great, so what?
So what? If the data are roughly normally distributed, about 68% of your data lie between
9.27 cm and 11.47 cm!!
Or, the next plant you grow has about a 68% chance of being between 9.27 and 11.47 cm.
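The "mean ± SD" arithmetic is easy to script. A minimal Python sketch using the slide's numbers (mean 10.37 cm, hypothetical σ = 1.1 cm); `sd_interval` is an illustrative helper name:

```python
def sd_interval(mean, sd, k=1):
    """Interval from k standard deviations below the mean to k above it."""
    return mean - k * sd, mean + k * sd

# The slide's example: mean 10.37 cm with a hypothetical SD of 1.1 cm
low, high = sd_interval(10.37, 1.1)
print(f"10.37 ± 1.1 cm covers {low:.2f} cm to {high:.2f} cm")   # 9.27 cm to 11.47 cm
```

Passing `k=2` or `k=3` gives the wider intervals used in the 68/95/99.7 rule.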
Which data set, red or blue, has the
greater mean?
They have the same mean and the
peaks of both normal distributions
align.
Which data set has the greater
standard deviation?
The blue one. The red data are tighter, closer to the
mean, so the red standard deviation is smaller (the middle
68% of the red data sits closer to the mean than the middle
68% of the blue data).
Histograms of two data sets, blue and red (they can represent any data you want).
Conclusion: The smaller the standard
deviation… the closer the data are to the mean and
the narrower the peak!!
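To see the red/blue idea in numbers, here is a minimal Python sketch with two made-up data sets (not the figure's data) that share a mean of 10 but differ in spread:

```python
import statistics

# Two hypothetical data sets with the same mean (10) but different spread
red = [9, 10, 10, 11]    # tightly clustered around the mean
blue = [6, 8, 12, 14]    # spread out away from the mean

for name, data in (("red", red), ("blue", blue)):
    print(f"{name}: mean = {statistics.mean(data)}, SD = {statistics.stdev(data):.2f}")
```

Same mean, but the tighter red set has the smaller standard deviation (about 0.82 vs. 3.65), so its peak would be narrower.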
What do researchers hope their
standard deviation values will be?
As small as possible, making the data
peaks as narrow as possible.
Why?
Because we typically compare two or
more data sets to each other as we will
do later…
Look to the right. We are comparing the blue data,
say, blood pressure of untreated people, with the green
data, blood pressure of people on medication to
lower blood pressure.
Now can you figure out why they want the
peaks to be as narrow as possible? To tell whether there is a difference between the groups!!!
Narrow peaks overlap less, so a real difference between the means is easier to see.
Why do these peaks have spread
associated with them? Why can’t all
the data just fall on one point giving us
a line? Why can’t all the plants just
have one height??
1. Natural variation in a
population… and there is nothing you
can do about this.
2. Variables not being controlled tightly enough, like
temperature, water, sunlight, etc., or variables that
you are not considering but should be.
3. Limits of one's measuring instruments (not the same as
making a mistake)… a ruler can only measure so
precisely… significant digits… cough, cough!
4. Small sample size
CONCLUSION: Nature has enough variation. Researchers need to control important variables tightly, develop and
use measuring instruments appropriate to the study, and do their best to have a large sample size.
What’s up with this kid?
Now that you understand standard
deviation (SD, s, σ), what is the
meaning of the figure to the left?
68% of data falls within 1 SD of the mean
95% of data falls within 2 SD of the mean
99.7% of data falls within 3 SD of the mean
s = standard deviation
x̄ = mean
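The 68%, 95%, and 99.7% figures are properties of the normal distribution, so they only approximate real data that are roughly bell-shaped. A minimal sketch that checks them with Python's `statistics.NormalDist` (available since Python 3.8):

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Share of a normal distribution within k SDs of the mean: P(-k < Z < k)
coverage = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in coverage.items():
    print(f"within {k} SD of the mean: {share:.1%}")
```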
I love SD. Please can you show me how
to calculate it from my data?
It’s so simple! You really just want to know how far away all of your data points are
from the mean!!... and a little more:
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
s = standard deviation
x̄ = mean
n = sample size
x = data value
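1. Determine the average (mean).
2. Subtract the mean from every one of your data values.
3. Square each of the differences.
4. Sum up the squares… this is called the Sum of Squares (SOS).
5. Divide the SOS by the sample size minus 1 (n - 1); this number is called the variance.
6. Now just take the square root of the variance to get back to the original units, and there
you go… the SD!
Why divide by n - 1 and not just n?
You are almost averaging the squared differences. If n = 1000, subtracting 1 makes essentially
no difference to the SD. However, if the sample size is 2, subtracting 1 halves the denominator,
so the SD comes out noticeably larger… penalized for a small sample size you are!!!!!!
(Formally, this is called Bessel's correction: because the mean was calculated from the same
sample, dividing by n would tend to underestimate the variance of the whole population.)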
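The SD recipe on these slides translates almost line for line into code. A minimal Python sketch with made-up heights; `sample_sd` is our own illustrative function, checked against the standard library's `statistics.stdev` (Excel's `=STDEV.S()` computes the same n - 1 version):

```python
import math
import statistics

def sample_sd(values):
    """Sample standard deviation, following the step-by-step recipe."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data values")
    mean = sum(values) / n                     # 1. the mean
    deviations = [x - mean for x in values]    # 2. subtract the mean from each value
    squares = [d ** 2 for d in deviations]     # 3. square each difference
    sos = sum(squares)                         # 4. sum of squares (SOS)
    variance = sos / (n - 1)                   # 5. divide by n - 1: the variance
    return math.sqrt(variance)                 # 6. square root: the SD

# Hypothetical plant heights in cm (illustrative only)
heights = [7, 9, 10, 10, 11, 13]
print(sample_sd(heights))                    # 2.0
print(statistics.stdev(heights))             # 2.0, same n - 1 formula
print(round(statistics.pstdev(heights), 3))  # 1.826, dividing by n instead
```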
Average height of your population of
Wisconsin Fast Plants (Brassica rapa):
10.4 ± 2.94 cm
[Figure: plant-height histogram; x-axis: Height bins (cm), 1 to 16; y-axis: # of plants.]
We now need to do some artificial selection…
What should we do? Directional?
Disruptive? Stabilizing?
Type of graph?
Histogram!
Directional? Me too… let's do it. But how?
Let's kill the tallest 25% of the plants before they form flowers (you
should know why) and push the population toward being shorter (select for
allele combinations that give shorter plants)…
[Figure: plant-height histogram with the tallest 25% of plants marked "Remove these".]
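Picking which plants to remove is just sorting and cutting off the top quarter. A minimal Python sketch with made-up heights; the function name and the choice to round the number removed down (with `int()`) are our assumptions, and ties at the cutoff are broken arbitrarily by sort order:

```python
def remove_tallest(heights, fraction=0.25):
    """Return the heights kept after removing the tallest `fraction` of plants.

    The number removed is rounded down, so with 41 plants, 10 are removed.
    """
    ranked = sorted(heights)
    n_remove = int(fraction * len(ranked))
    return ranked[:len(ranked) - n_remove]

# Hypothetical plant heights in cm (illustrative only)
heights = [12, 5, 9, 16, 10, 8, 11, 14]
selected = remove_tallest(heights)
print(selected)   # [5, 8, 9, 10, 11, 12]
```

The kept plants become the new parental (P) generation whose descriptive stats you recalculate next.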
Now recalculate the
descriptive stats for height
of your new parental
population before you
breed them
Measure                      Original population   Selected population (P generation)
Average (cm)                 10.4                  9.26
Standard deviation (σ, cm)   2.94                  2.45
Now breed the selected P generation and look at the
phenotype (height in this case) of the F1 generation.
The F1 generation data that you collected (the Excel file
is on the website).
Guess what you do now with this data?
Descriptive stats of course…histogram,
average, sigma (SD),…
Measure                      Original population   Selected population (P generation)   F1 generation
Average (cm)                 10.4                  9.26                                 9.61
Standard deviation (σ, cm)   2.94                  2.45                                 2.53
The big question now…Is the original population
significantly different from the F1 generation in
terms of height due to the artificial selection?
How can we determine this? Is the difference in
averages enough to draw a conclusion?
Try this… make a bar chart and a histogram, each showing both populations in the same chart.
Standard Error and Error Bars
[Figure: bar chart of average height (cm) for the original population and the F1 generation, without error bars; group sizes n = 38 and n = 41.]
A bar graph showing group averages without error bars is meaningless…
Error bars typically show either the standard deviation or the standard error.
We will use standard error. How does one calculate standard error, you ask?
$SE_{\bar{x}} = \dfrac{s}{\sqrt{n}}$
SEx̄ = standard error of the mean
s = standard deviation
n = sample size
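The standard error is a one-line calculation. A minimal Python sketch; it uses the F1 standard deviation from the table (2.53 cm) and assumes n = 38 for that group, our assumption because it reproduces the .410 shown on the slide:

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: the SD divided by the square root of the sample size."""
    return sd / math.sqrt(n)

# F1 generation: s = 2.53 cm; n = 38 is an assumption (it matches the slide's .410)
se_f1 = standard_error(2.53, 38)
print(f"SE = {se_f1:.3f} cm")   # SE = 0.410 cm
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.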
Measure                      Original population   Selected population (P generation)   F1 generation
Average (cm)                 10.4                  9.26                                 9.61
Standard deviation (σ, cm)   2.94                  2.45                                 2.53
Standard error (SEx̄, cm)     0.376                 n/a                                  0.410
[Figure: bar chart of average height (cm), original population vs. F1 generation (y-axis 9 to 12 cm), with error bars; group sizes n = 38 and n = 41.]
Error bars indicate the standard error of each group.
Histogram of both groups:
[Figure: overlaid histograms of plant height; x-axis: Height bins (cm), 1 to 16; y-axis: # of plants. Gold = original generation; Red = F1 generation.]
Even though the averages are different, the histogram shows that the data overlap
dramatically, which you would expect from the standard deviations of the groups.
How does a researcher deal with this?
We would need to use a statistical test known as a t-test to give us a p-value, which of
course would tell us…
The probability of seeing a difference at least this large if the null hypothesis (no difference
between groups) were true!! A small p-value makes the null hypothesis hard to believe.
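As a rough illustration (not the lab's prescribed test), here is a minimal Python sketch of Welch's t-test, one common t-test variant, computed from the table's averages and standard deviations. The group sizes (41 original, 38 F1) are our assumption, and the p-value uses the normal curve as an approximation to the t distribution, which is reasonable only because the degrees of freedom here are large:

```python
import math
from statistics import NormalDist

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and degrees of freedom from summary statistics."""
    v1 = sd1 ** 2 / n1          # squared standard error of group 1
    v2 = sd2 ** 2 / n2          # squared standard error of group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Table values; the sample sizes 41 and 38 are assumptions
t, df = welch_t(10.4, 2.94, 41, 9.61, 2.53, 38)

# Approximate two-sided p-value from the normal curve (not the exact t distribution)
p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
print(f"t = {t:.2f}, df = {df:.1f}, approx. p = {p_approx:.2f}")
```

For an exact p-value from the same summary numbers, a statistics package such as SciPy (`scipy.stats.ttest_ind_from_stats`) can be used instead.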
What is the approximate range of the
birth-weight data shown on the
right? About 0.9 to 5.0 kg (a spread of about 4.1 kg)
So should we be focusing on the median more than the mean????
No. Generally speaking, once outliers have been dealt with, the mean is TYPICALLY a more
accurate measure of central tendency than the median.
To convince yourself, try this exercise from Seeing Statistics (www.seeingstatistics.com):
The median is more resistant to extreme, misleading data values so it would seem to be the clear choice. However, we also
need to consider accuracy. Is the median or the mean more likely to be close to the true value?
To evaluate the relative accuracy of the median and the mean, let's consider how they do when we know the true center of
the data. Suppose that the only possible scores are the whole numbers between 0 and 100.
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
The center of these 101 numbers, whether we use the median or the mean, is 50. What if we were to select five numbers
randomly from this set of 101 and calculate the median and mean of those five numbers? Would the median or the mean
be closer to what we know is the true value of 50?
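If you want to check your answer by simulation, here is a minimal Python sketch. It assumes the five numbers are drawn without repeats and uses root-mean-square (RMS) error as the yardstick for "closer to 50"; both are our choices, not part of the original exercise:

```python
import math
import random
import statistics

def rms_errors(trials=20_000, sample_size=5, true_center=50, seed=1):
    """Return (median RMS error, mean RMS error) over many random samples.

    Each trial draws `sample_size` different whole numbers from 0-100 and
    records how far the sample median and sample mean land from the true center.
    """
    rng = random.Random(seed)        # fixed seed so the run is repeatable
    population = range(101)          # the whole numbers 0, 1, ..., 100
    median_sq = mean_sq = 0.0
    for _ in range(trials):
        sample = rng.sample(population, sample_size)
        median_sq += (statistics.median(sample) - true_center) ** 2
        mean_sq += (sum(sample) / sample_size - true_center) ** 2
    # Root-mean-square error: the typical distance from the true center
    return math.sqrt(median_sq / trials), math.sqrt(mean_sq / trials)

median_rmse, mean_rmse = rms_errors()
print(f"median: typical error {median_rmse:.1f}")
print(f"mean:   typical error {mean_rmse:.1f}")
```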