Statistical Reasoning

A. Describing data
1. Frequency distributions – Where are the
majority of the scores?
Multiple Choice (Mean=34.3, SD=4.2)
Points  Grade  Score
13      A+     40
12      A      38-39
11      A-     37
10      B+     36
9       B      34-35
8       B-     33
7       C+     32
6       C      30-31
5       C-     29
4       D+     28
3       D      26-27
2       D-     25
1       F      24 or below
Counts by grade band (highest score first):
A: 4, 11, 11 (41%)
B: 6, 4, 5, 5 (31%)
C: 4, 3, 3, 2 (19%)
D: 4, 1 (8%)
F: 1 (2%)

Essay (Mean=10.2, SD=2.0)
Points  Grade  Count  Band %
12      A      23     52%
11      A-     10
10      B+     15
9       B      5      41%
8       B-     6
7       C+     2
6       C      1      5%
5       C-
4       D+     2
3       D             3%
2       D-
1       F      0      0%

Composite (Mean=9.3, SD=2.3)
Points  Grade  Count  Band %
13      A+
12      A      11     39%
11      A-     14
10      B+     9
9       B      12     45%
8       B-     8
7       C+     2
6       C      2      11%
5       C-     3
4       D+
3       D      3      5%
2       D-
1       F             0%
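As a rough check on the band percentages, the Essay counts can be tallied by letter. This is a minimal Python sketch, not part of the original slides: the names `essay_counts` and `band_percentages` are illustrative, and blank cells in the table are assumed to mean zero students.

```python
# Essay counts per grade, copied from the Essay table above.
# Assumption: a blank cell in the table means zero students.
essay_counts = {
    "A": 23, "A-": 10,
    "B+": 15, "B": 5, "B-": 6,
    "C+": 2, "C": 1, "C-": 0,
    "D+": 2, "D": 0, "D-": 0,
    "F": 0,
}

def band_percentages(counts):
    """Return the whole-number percentage of students in each letter band."""
    total = sum(counts.values())
    bands = {}
    for grade, n in counts.items():
        letter = grade[0]  # "B+" and "B-" both fall in band "B"
        bands[letter] = bands.get(letter, 0) + n
    return {letter: round(100 * n / total) for letter, n in bands.items()}

print(band_percentages(essay_counts))
```

Under that assumption the tally reproduces the 52%, 41%, 5%, 3% and 0% figures shown for the Essay scores.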
2. Histograms & frequency polygons – ways
of showing your frequency distribution data.
Histogram: uses a bar graph to show data.
[Figure: bar graph. X-axis: Grades (A, B, C, D). Y-axis: Percentage of students (0 to 50).]
Frequency Polygon: uses a line graph to show data.
[Figure: line graph. X-axis: Grades (A, B, C, D). Y-axis: Percentage of students (0 to 50).]
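To make the histogram idea concrete, here is a small Python sketch that draws a text-only bar for each grade band. It uses the Essay band percentages from the frequency distribution (52%, 41%, 5%, 3%); the function name `text_histogram` and the choice of one `#` per 5 percentage points are illustrative assumptions.

```python
def text_histogram(percentages, step=5):
    """Build one text bar per category; each '#' stands for `step` percent."""
    lines = []
    for label, pct in percentages.items():
        bar = "#" * (pct // step)  # partial steps are dropped
        lines.append(f"{label} | {bar} {pct}%")
    return lines

# Essay grade-band percentages from the frequency distribution.
essay_percentages = {"A": 52, "B": 41, "C": 5, "D": 3}

for line in text_histogram(essay_percentages):
    print(line)
```

A frequency polygon would plot the same percentages as points joined by a line instead of bars.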
B. Measures of central tendency – 3 types
Example scores: 4, 3, 5, 4, 4
1. Mode = most common score = 4
(Reports which value occurs most often. Used for categories that
can't be averaged; you can't average men & women.)
2. Mean = arithmetic average = 20/5 = 4
(has the most statistical value)
3. Median = middle score when the scores are in order (3, 4, 4, 4, 5) = 4
(Half the scores are higher, half are lower.
Used when there are extreme scores.)
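The three measures can be checked on the example scores with Python's standard `statistics` module. This is only a quick sketch; the variable name `scores` is illustrative.

```python
from statistics import mean, median, mode

scores = [4, 3, 5, 4, 4]

print(mode(scores))    # most common score
print(mean(scores))    # arithmetic average, 20 / 5
print(median(scores))  # middle of the sorted scores 3, 4, 4, 4, 5
```

All three come out to 4 for this data set, which is why a set with an extreme score is a better test of how the measures differ.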
Central Tendency
An extremely high or low price/score can skew the mean. Sometimes the
median is better at showing you the central tendency.
1968 Topps Baseball Cards
Nolan Ryan        $1500
Elston Howard     $8
Billy Williams    $5
Luis Aparicio     $5
Harmon Killebrew  $3.50
Orlando Cepeda    $3.50
Maury Wills       $3
Jim Bunning       $3
Tony Conigliaro   $3
Tony Oliva        $3
Lou Piniella      $2.50
Mickey Lolich     $2.25
Jim Bouton        $2
Rocky Colavito    $2
Boog Powell       $2
Luis Tiant        $2
Tim McCarver      $1.75
Tug McGraw        $1.75
Joe Torre         $1.50
Rusty Staub       $1.25
Curt Flood        $1
With Ryan: Median=$2.50, Mean=$74.14
Without Ryan: Median=$2.38, Mean=$2.85
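The effect of the one extreme price can be reproduced with a short Python sketch. The prices are taken from the card list, in order; the variable names are illustrative.

```python
from statistics import mean, median

# Prices in dollars from the card list; Nolan Ryan's card comes first.
prices = [1500, 8, 5, 5, 3.50, 3.50, 3, 3, 3, 3, 2.50,
          2.25, 2, 2, 2, 2, 1.75, 1.75, 1.50, 1.25, 1]

with_ryan = prices
without_ryan = prices[1:]  # drop the $1500 card

print(round(mean(with_ryan), 2), median(with_ryan))
print(round(mean(without_ryan), 2), median(without_ryan))
```

Removing a single card moves the mean from about $74 to under $3, while the median shifts by only about 12 cents (the median without Ryan is 2.375, which the slide rounds to $2.38).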
Does the mean accurately portray the central
tendency of incomes?
No. A few extremely high incomes pull the mean upward.
What measure of central tendency would more accurately
show income distribution?
The median. Most incomes cluster around that number.
C. Measures of variation
1. Range – Difference between the highest & lowest scores
(can be skewed by an extreme score)
2. Variance & standard deviation – How spread out is
your data?
Calculating Standard Deviation
How spread out (consistent) is your data?
1. Calculate the mean.
2. Take each score and subtract the mean from it.
3. Square each deviation to make it positive.
4. Average the squared deviations. This is the variance.
5. Take the square root of the variance to get back to your original
unit of measurement. This is the standard deviation.
6. The smaller the standard deviation, the more closely packed the data.
The larger it is, the more spread out the data.
Standard Deviation
Punt (yds)   Deviation (distance from mean)   Deviation squared
36           36 - 40 = -4                     16
38           38 - 40 = -2                     4
41           41 - 40 = +1                     1
45           45 - 40 = +5                     25
Sum: 160                                      Sum: 46
(Each deviation is multiplied by itself, then the squares are added together.)
Mean: 160/4 = 40 yds
Variance: 46/4 = 11.5
Standard deviation: square root of variance = √11.5 = 3.4 yds
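The punt example can be traced step by step in Python. This is a minimal sketch that follows the numbered steps literally; the function name `population_sd` is illustrative. Like the slides, it divides by the number of scores (the population formula) and not by one less than that (the sample formula).

```python
from math import sqrt

def population_sd(scores):
    """Follow steps 1 to 5: mean, deviations, squares, average, square root."""
    m = sum(scores) / len(scores)            # step 1: the mean
    deviations = [x - m for x in scores]     # step 2: score minus mean
    squared = [d ** 2 for d in deviations]   # step 3: square each deviation
    variance = sum(squared) / len(squared)   # step 4: average the squares
    return sqrt(variance)                    # step 5: back to original units

punts = [36, 38, 41, 45]
print(round(population_sd(punts), 1))
```

For the four punts this gives a variance of 11.5 and a standard deviation of about 3.4 yards, matching the worked table.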
Z-Scores
A number expressed in standard deviation units that shows
an individual score's deviation from the mean.
Basically, it shows how you did compared to everyone else.
+ Z-score means you are above the mean,
– Z-score means you are below the mean.
Z-score = (your score - average score) / standard deviation
Which class did you perform better in compared to your classmates?
Test      Total   Your score   Average score   S.D.
Biology   200     168          160             4
Psych.    100     44           38              2
Z-score in Biology: 168 - 160 = 8, 8/4 = +2 S.D.
Z-score in Psych: 44 - 38 = 6, 6/2 = +3 S.D.
You performed better in Psych compared to your classmates.
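The comparison can be written as a small Python helper. This is a sketch only; the function name `z_score` and its parameter names are illustrative.

```python
def z_score(score, mean, sd):
    """Return how many standard deviations `score` lies from `mean`."""
    return (score - mean) / sd

biology = z_score(168, 160, 4)
psych = z_score(44, 38, 2)
print(biology, psych)
```

The Psych score is lower in raw points, but its z-score (+3) is higher than the Biology z-score (+2), so it stands further above its class average.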
D. Characteristics of the normal curve
Skewed Curves
A Positive Skew has a tail that goes
to the right.
A Negative Skew has a tail that goes
to the left.
E. Inference
1. Does the sample represent the population?
a. Non-biased sample-good
b. Low variability-good
c. Larger samples-good
2. Are differences between groups in your
results statistically significant?
a. Big differences-good
b. Low variability-good
c. Big groups-good
Statistical Significance
p value = the probability of getting a result at least this large by
chance alone (that is, if the independent variable had no real effect)
• Researchers want this number to be as small as possible. A small
p value suggests that the change in their experiment was caused by
the independent variable and not by chance.
• By convention, this number can be no greater than 5% for the
findings to be considered statistically significant.
p ≤ .05
• This means a difference this large would show up by chance alone
no more than 5% of the time.
• Replication of the experiment helps confirm whether the result
is real or a fluke.
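One way to see what a p value measures is an exact permutation test, sketched below in Python. The slides do not name a specific test, so this technique is a stand-in chosen for illustration, and the two groups of scores are made-up data, not results from any study. The function name `permutation_p_value` is also hypothetical.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means."""
    pooled = group_a + group_b
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    # Try every way of relabelling the pooled scores into two groups.
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Count relabellings whose gap is at least as large as the real one.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Hypothetical scores, invented for this illustration only.
treatment = [12, 14, 15, 17]
control = [8, 9, 10, 11]

p = permutation_p_value(treatment, control)
print(round(p, 3), p <= 0.05)
```

If group labels made no difference, only 2 of the 70 possible relabellings would produce a gap this large, so p is about .029 and falls under the .05 cutoff. Low variability and larger groups make it easier for a real difference to clear that bar.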