Research & Descriptive Statistics

Textbook Credits
• Textbook
Shavelson, R.J. (1996). Statistical reasoning for the behavioral
sciences (3rd Ed.). Boston: Allyn & Bacon.
• Supplemental Material
Ruiz-Primo, M.A., Mitchell, M., & Shavelson, R.J. (1996).
Student guide for Shavelson statistical reasoning for the
behavioral sciences (3rd Ed.). Boston: Allyn & Bacon.
Overview
• Example data on Teacher Expectancy Research
• Frequency Distributions
• Measures of Central Tendency & Variability
• The Normal Distribution
Teacher Expectancy Data
Teacher Expectancy Research: Study Design
• A 2 x 6 x 2 (treatment x grade level x test occasions) randomized experiment
Schematic of design:
• Occasion 1: IQ Pretest
• Treatment (randomly assigned): Experimental X (“Bloomers”) or Control
• Grade: 1, 2, 3, 4, 5, 6
• Occasion 2: IQ Posttest
Teacher Expectancy Data Matrix
• Another convenient way to depict the data
Teacher Expectancy Frequency Distribution
• Posttest scores from the treatment group
Sum = 30
Frequency Distribution Using Class Intervals
• Treatment group posttest scores divided into 11 class intervals
• Each class interval has a size of 3 (e.g., the interval 123–125 contains the score values 123, 124, and 125)
• Clear patterns emerge: the interval 114–116 has the highest frequency (f)
Frequency Distribution Using Class Intervals
• Number of class intervals: use 11
• Range: highest – lowest score = 125 – 95 = 30
• Class interval size (i) = (H – L) / number of class intervals = 30/11 ≈ 2.7 (round to 3)
• Rule: the lowest interval's starting score must be divisible by the class interval size. The lowest score, 95, is not divisible by 3, so subtract 1; 94 is still not divisible by 3, so subtract 1 again; 93 is divisible by 3, so the lowest class interval begins at 93.
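The interval-size arithmetic and the divisibility rule above can be sketched in Python. This is a minimal illustration assuming integer scores; the function name `class_intervals` is hypothetical. Python's `round` sends exact halves to the nearest even number, which does not matter for 30/11.

```python
def class_intervals(lowest, highest, n_intervals):
    """Build class intervals using the rule on the slide (sketch, integer scores)."""
    # Interval size i = (H - L) / number of intervals, rounded to a whole number
    i = round((highest - lowest) / n_intervals)
    # The lowest interval must start at a score divisible by i:
    # step down one score at a time from the lowest score until that holds
    start = lowest
    while start % i != 0:
        start -= 1
    intervals = []
    lower = start
    while lower <= highest:
        intervals.append((lower, lower + i - 1))  # apparent limits, e.g. (93, 95)
        lower += i
    return i, intervals

i, intervals = class_intervals(95, 125, 11)
print(i)  # 3
for lo, hi in intervals:
    print(f"{lo}-{hi}")  # 93-95, 96-98, ..., 123-125
```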
Teacher Expectancy: Histogram
• Histogram showing the class-interval posttest scores on the abscissa and frequency on the ordinate
• Lower and upper limit scores with zero frequency are shown.
Teacher Expectancy: Polygon
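• Frequency polygon showing the class-interval posttest scores (plotted at interval midpoints) on the abscissa and frequency on the ordinate
• The lower and upper midpoints with zero f (one interval beyond each end) are shown, so the polygon begins and ends on the abscissa.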
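A frequency polygon plots each interval's frequency at its midpoint and closes with zero-frequency points one interval beyond each end. The sketch below assumes integer scores (so the midpoint of 93–95 is 94) and uses hypothetical frequencies; `polygon_points` is an illustrative name, not something from the textbook.

```python
def polygon_points(lower_limits, width, freqs):
    """Return (midpoint, f) pairs for a frequency polygon.

    lower_limits: lower apparent limits of the intervals, lowest first
    width: class interval size i
    One empty interval is added below and above, plotted at f = 0.
    """
    mids = [lo + (width - 1) / 2 for lo in lower_limits]  # e.g. 93-95 -> 94.0
    first = mids[0] - width    # midpoint of the empty interval below
    last = mids[-1] + width    # midpoint of the empty interval above
    return [(first, 0)] + list(zip(mids, freqs)) + [(last, 0)]

# Hypothetical: three intervals of size 3 starting at 93, 96, and 99
points = polygon_points([93, 96, 99], 3, [1, 4, 2])
print(points)  # [(91.0, 0), (94.0, 1), (97.0, 4), (100.0, 2), (103.0, 0)]
```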
Teacher Expectancy: Stem-and-leaf plot
• Stem-and-leaf plot of the posttest scores from the data matrix, with stems in increments of 5
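Assuming "increments of 5" means each stem is split into two lines (leaves 0–4 and leaves 5–9), a split-stem display can be sketched as follows. The scores are hypothetical, and the `L`/`H` suffixes marking the low and high halves of each stem are an illustrative convention.

```python
from collections import defaultdict

def split_stem_leaf(scores):
    """Stem-and-leaf lines with each stem split into two halves:
    leaves 0-4 (suffix "L") and leaves 5-9 (suffix "H")."""
    rows = defaultdict(list)
    for score in sorted(scores):          # sorting keeps leaves in order
        stem, leaf = divmod(score, 10)    # 114 -> stem 11, leaf 4
        half = "L" if leaf < 5 else "H"
        rows[(stem, half)].append(str(leaf))
    lines = []
    for stem in range(min(scores) // 10, max(scores) // 10 + 1):
        for half in ("L", "H"):           # empty halves are still listed
            leaves = "".join(rows[(stem, half)])
            lines.append(f"{stem}{half} | {leaves}".rstrip())
    return lines

# Hypothetical posttest scores
scores = [97, 103, 108, 112, 114, 114, 118, 121]
for line in split_stem_leaf(scores):
    print(line)
```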
Common Frequency Distribution Shapes
• Normal distribution (bell shape): symmetric about the mean
• Unimodal distribution: 1 peak
• Bimodal distribution: 2 peaks
• Multimodal distribution: > 2 peaks
• Positively skewed
• Negatively skewed
• Rectangular distribution (no peaks): symmetric about the mean
• Kurtosis (peakedness): platykurtic (flat) or leptokurtic (peaked)
The Relative Frequency (Probability) Distribution
• Each score's frequency is shown as a proportion of the total number of scores in the sample: RF = f / total number of subjects
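The relative frequency formula translates directly into code. This sketch uses a hypothetical frequency table for 10 subjects; `relative_frequencies` is an illustrative name.

```python
def relative_frequencies(freqs):
    """RF = f / total number of subjects, for each score value."""
    total = sum(freqs.values())
    return {score: f / total for score, f in freqs.items()}

# Hypothetical frequency table (score -> f) for 10 subjects
freqs = {8: 1, 9: 3, 10: 4, 11: 2}
rf = relative_frequencies(freqs)
print(rf)  # {8: 0.1, 9: 0.3, 10: 0.4, 11: 0.2}
```

The relative frequencies always sum to 1 (up to floating-point rounding), which is what lets them be read as probabilities.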
Sum = 10
Sum = 20
Total Sum = 30
The Relative Frequency Polygon
• Relative frequency polygons are constructed like frequency polygons, except that relative frequency is plotted on the ordinate
The Cumulative Frequency Distribution
• The cumulative frequency distribution shows the number of scores falling below a
certain point on the scale of scores
• The cf of a class interval is the number of cases falling below the interval's upper real limit
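Cumulative frequencies are running totals of f, each attached to an interval's upper real limit. The sketch assumes integer scores, so the upper real limit is the upper apparent limit plus 0.5; the intervals and frequencies are hypothetical.

```python
def cumulative_frequencies(intervals, freqs):
    """Return (upper real limit, cf) for each class interval.

    intervals: (lower, upper) apparent limits, sorted lowest first.
    cf counts every case below the interval's upper real limit.
    """
    table = []
    running = 0
    for (_, upper), f in zip(intervals, freqs):
        running += f                       # add this interval's cases
        table.append((upper + 0.5, running))
    return table

# Hypothetical intervals (size 3) and frequencies
intervals = [(93, 95), (96, 98), (99, 101), (102, 104)]
freqs = [1, 2, 4, 3]
for url, cf in cumulative_frequencies(intervals, freqs):
    print(url, cf)  # 95.5 1, 98.5 3, 101.5 7, 104.5 10
```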
The Cumulative Frequency Polygon
• The cumulative frequency polygon plots cumulative frequency (ordinate) against the upper real limits of the class intervals (abscissa)
Cumulative Proportions and Percentiles
• 80% of the subjects in the experimental group of the expectancy study received a
posttest score below 119.5
• CP = CF / total number of subjects
• C% = CP x 100
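As a check on the two formulas, assume the experimental group has 30 subjects (the frequency total shown earlier for the treatment group's posttest scores). Then "80% below 119.5" corresponds to CF = 24. The function name is hypothetical.

```python
def cumulative_percent(cf, n):
    """CP = CF / total number of subjects; C% = CP x 100."""
    cp = cf / n
    return cp, cp * 100

# Assumed: N = 30 subjects, 24 of whom scored below 119.5
cp, c_pct = cumulative_percent(24, 30)
print(round(cp, 2), round(c_pct, 1))  # 0.8 80.0
```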
Percentile Scores
• Example: A raw score of 113.5 has a percentile rank of 57
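For a score that falls inside a class interval rather than on an upper real limit, a common approach is linear interpolation, assuming the cases in the interval are spread evenly across it. The sketch below uses that assumption and hypothetical numbers (N = 20, an interval 10–14 with real limits 9.5–14.5); at an upper real limit it reduces to C%.

```python
def percentile_rank(x, lower_real_limit, width, f_interval, cf_below, n):
    """Percentile rank of x by linear interpolation within its class interval.

    cf_below:   cases below the interval's lower real limit
    f_interval: cases inside the interval that contains x
    Assumes those f_interval cases are spread evenly across the interval.
    """
    cases_below_x = cf_below + (x - lower_real_limit) / width * f_interval
    return 100 * cases_below_x / n

# Hypothetical: 8 cases below 9.5, 6 cases in the interval 10-14, N = 20
print(percentile_rank(12, 9.5, 5, 6, 8, 20))    # 55.0
print(percentile_rank(14.5, 9.5, 5, 6, 8, 20))  # 70.0 (upper real limit: equals C%)
```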
Measures of Central Tendency
The central tendency of a set of measurements, that is, the tendency of the data to cluster, or center, about certain numerical values.
Central Tendency (Location)
Measures of Variability
The variability of a set of measurements, that is, the spread of the data.
Variation (Dispersion)
Standard Notation
• Mean: sample $\bar{x}$, population $\mu$
• Size: sample n, population N
Mean
• Most common measure of central tendency
• Acts as ‘balance point’
• Affected by extreme values (‘outliers’)
• Denoted as $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, where $x_i$ is the ith observation and n is the sample size
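The "balance point" and "outlier" properties can both be checked numerically. This sketch reuses the five values from the median example below; the outlier value 100.0 is hypothetical.

```python
def mean(xs):
    """Sample mean: the sum of the observations divided by n."""
    return sum(xs) / len(xs)

data = [24.1, 22.6, 21.5, 23.7, 22.6]
xbar = mean(data)
deviations = [x - xbar for x in data]

print(round(xbar, 2))                # 22.9
# Balance point: deviations above and below the mean cancel out
print(abs(sum(deviations)) < 1e-9)   # True
# One extreme value pulls the mean far from the rest of the data
print(round(mean(data + [100.0]), 2))  # 35.75
```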
Mean Example
Median
Median Example: Odd Size Sample
• Raw data: 24.1 22.6 21.5 23.7 22.6
• Ordered: 21.5 22.6 22.6 23.7 24.1
• Position: 1 2 3 4 5
• Median = 22.6, the value at position (n + 1)/2 = 3
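The odd/even rule for locating the median can be sketched as below. The odd case uses the example above; for the even case the sketch borrows the six values from the "no mode" example later in this section, since the slide's own even-size data did not survive.

```python
def median(xs):
    """Middle value of the ordered data.

    Odd n:  the value at position (n + 1) / 2.
    Even n: the average of the two middle values.
    """
    ordered = sorted(xs)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([24.1, 22.6, 21.5, 23.7, 22.6]))              # 22.6 (odd n)
print(round(median([10.3, 4.9, 8.9, 11.7, 6.3, 7.7]), 2))  # 8.3 (even n: (7.7 + 8.9) / 2)
```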
Median Example: Even Size Sample
Mode
• Measure of central tendency
• Value that occurs most often
• Not affected by extreme values
• May be no mode or several modes
• May be used for quantitative or qualitative data
Mode Example
• No mode
Raw data: 10.3 4.9 8.9 11.7 6.3 7.7
• One mode (4.9)
Raw data: 6.3 4.9 8.9 6.3 4.9 4.9
• More than one mode (28 and 43)
Raw data: 21 28 41 28 43 43
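Counting occurrences reproduces all three cases. This sketch assumes, as in the first example, that a data set in which no value repeats has no mode; `modes` is an illustrative name.

```python
from collections import Counter

def modes(xs):
    """Return the value(s) that occur most often, in ascending order.

    Assumption: if no value occurs more than once, the data have no mode
    and an empty list is returned.
    """
    counts = Counter(xs)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(value for value, count in counts.items() if count == top)

print(modes([10.3, 4.9, 8.9, 11.7, 6.3, 7.7]))  # []
print(modes([6.3, 4.9, 8.9, 6.3, 4.9, 4.9]))    # [4.9]
print(modes([21, 28, 41, 28, 43, 43]))          # [28, 43]
```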
Measures of Variability
• Range: measure of dispersion
• Difference between the largest and smallest observations
Range = $x_{\text{largest}} - x_{\text{smallest}}$
• Ignores how the data are distributed
Standard Notation
Sample & Population Variance
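The variance is the average of the squared distances of the values from the mean (for a sample, the sum of squared distances is divided by n – 1 rather than n).
Sample variance (n – 1 in the denominator!):
$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1} = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n-1}$$
Population variance (N in the denominator!):
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$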
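The only difference between the two variance formulas is the denominator. This sketch computes both by hand on hypothetical scores and cross-checks them against Python's `statistics.variance` (which divides by n – 1) and `statistics.pvariance` (which divides by N).

```python
import statistics

def sample_variance(xs):
    """s^2: sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def population_variance(xs):
    """sigma^2: sum of squared deviations from mu, divided by N."""
    N = len(xs)
    mu = sum(xs) / N
    return sum((x - mu) ** 2 for x in xs) / N

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores; mean = 5, sum of squares = 32
print(population_variance(data), statistics.pvariance(data))  # both equal 32 / 8 = 4
print(sample_variance(data), statistics.variance(data))       # both equal 32 / 7
```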
Standard Deviation
The standard deviation is the square root of the variance.
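Sample standard deviation:
$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}} = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n-1}}$$
Population standard deviation:
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$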
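Taking square roots of the two variances gives the two standard deviations. This sketch uses the same hypothetical scores as a simple worked case and checks the results against `statistics.stdev` and `statistics.pstdev`.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores

n = len(data)
xbar = sum(data) / n
ss = sum((x - xbar) ** 2 for x in data)  # sum of squared deviations (32)

s = math.sqrt(ss / (n - 1))  # sample SD: n - 1 in the denominator
sigma = math.sqrt(ss / n)    # population SD: N in the denominator

print(round(s, 3), round(sigma, 3))  # 2.138 2.0
# Cross-check with the standard library
print(round(statistics.stdev(data), 3), round(statistics.pstdev(data), 3))
```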
Sample Standard Deviation Formula
S = $\sqrt{\frac{\sum (x - \bar{x})^2}{n-1}}$ = 2.523
Overview of the Normal Distribution
• Serves as a reasonably good model of many natural phenomena
• Provides a good model for the frequency distribution of scores
• Is of particular importance in inferential statistics as a probability distribution
• There is a close connection between the sample size and the distribution of means calculated for many samples of subjects drawn from the same population
• As the sample size increases, the distribution of these sample means becomes approximately normal, even when the scores themselves are not normally distributed
• May provide a good approximation to probabilities of other distributions that are more difficult to work with
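A small simulation can illustrate the connection between sample size and the distribution of means drawn from the same population. It assumes a skewed (exponential) population with mean 1 and standard deviation 1; the sample size of 30, the 2,000 repetitions, and the seed are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

def sample_means(n, reps):
    """Means of `reps` random samples of size n drawn from a skewed
    exponential population whose mean and standard deviation are both 1."""
    return [statistics.fmean([random.expovariate(1.0) for _ in range(n)])
            for _ in range(reps)]

n = 30
means = sample_means(n, reps=2000)
se = 1 / n ** 0.5  # theoretical SD of the means: 1 / sqrt(30)
within_1se = sum(abs(m - 1) < se for m in means) / len(means)

print(round(statistics.fmean(means), 3))  # close to the population mean, 1
print(round(statistics.stdev(means), 3))  # close to 1 / sqrt(30), about 0.183
print(round(within_1se, 2))               # close to the normal-curve value, about 0.68
```

The individual scores stay skewed; it is the collection of means that clusters symmetrically around 1 with roughly the normal-curve proportion inside one standard deviation.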
Properties of the Normal Distribution
• It is unimodal, with its single peak at the mean
• It is symmetric about the mean: ½ of the scores fall below the mean and ½ fall above it
• The mean, mode, and median are all equal
• It is asymptotic (its tails approach but never touch the abscissa)
• It is continuous for all values of X from −∞ to +∞
Properties of the Normal Distribution
• Unimodal
• Symmetric
• Mode = median = mean
• Asymptotic
Empirical Rule of the Normal Distribution Areas
Interpretation of z-Scores Example
• Approximately 68% of the measurements will have a
z-score between –1 and 1.
• Approximately 95% of the measurements will have a
z-score between –2 and 2.
• Approximately 99.7% of the measurements will have
a z-score between –3 and 3.
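The 68/95/99.7 percentages can be reproduced from the standard normal curve: the proportion within k standard deviations of the mean is erf(k/√2). This sketch uses Python's `math.erf`; the function name `area_within` is illustrative.

```python
import math

def area_within(k):
    """Proportion of a normal distribution with -k < z < k: erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```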
Empirical Rule Example
Computing the z-Score
Z-score Example 1
Find the area between the mean and a given raw score
• z score: a standardized score with mean 0 and s = 1
– Distance between a score (X) and the mean of a distribution in standard deviation (s) units
– Used to display and interpret areas of the normal distribution
Assume the score is 9, mean = 8, s = 2
z = (x - mean) / s
z = (9 - 8)/2 = 0.5
• Next, find the area between the mean and z = 0.5. From Table B, Column 2, in Appendix II we find 0.1915.
0.1915, or 19.15% of the cases
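Table B values can be reproduced with the standard normal cumulative distribution, Φ(z) = ½[1 + erf(z/√2)], computed here with Python's `math.erf`. A minimal sketch for Example 1 (the helper names `z_score` and `phi` are illustrative):

```python
import math

def z_score(x, mean, sd):
    """Distance of x from the mean in standard deviation units."""
    return (x - mean) / sd

def phi(z):
    """Standard normal cumulative probability: area below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z = z_score(9, 8, 2)
area_mean_to_z = phi(abs(z)) - 0.5  # area between the mean (z = 0) and z
print(z, round(area_mean_to_z, 4))  # 0.5 0.1915
```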
Z-score Example 2
Find the area below a given raw score
Assume the score is 9, mean = 8, s = 2
z = (x - mean) / s
z = (9 - 8)/2 = 0.5
• Mark off the area in the normal distribution (ND)
• Find the area below z = 0.5
• From Table B column 3 we find 0.6915 (equivalently, 0.5 + 0.1915)
0.6915, or 69.15% of the cases
Z-score Example 3
Find the area above a given raw score
Assume the score is 9, mean = 8, s = 2
z = (x - mean) / s
z = (9 - 8)/2 = 0.5
• Mark off the area in the normal distribution (ND)
• Find the area above z = 0.5
• From Table B column 3 we find 0.3085 (equivalently, 1 – 0.6915)
0.3085, or 30.85% of the cases
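The areas below and above the same score (the two preceding examples) are complements, so one cumulative probability gives both. A minimal sketch using Python's `math.erf` in place of Table B; `phi` is an illustrative helper name.

```python
import math

def phi(z):
    """Standard normal cumulative probability: area below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z = (9 - 8) / 2     # 0.5, as in the examples
below = phi(z)      # area below the score
above = 1 - phi(z)  # area above the score
print(round(below, 4), round(above, 4))  # 0.6915 0.3085
```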
Z-score Example 4
Find the area between two given raw scores
Assume the 1st score is 9, the 2nd score is 5.8, mean = 8, s = 2
• z = (x - mean) / s = (9 - 8)/2 = 0.5; find the area between the mean and z = 0.5: 0.1915
• z = (x - mean) / s = (5.8 - 8)/2 = -1.1; find the area between the mean and z = -1.1: 0.3643
• Total = 0.1915 + 0.3643 = 0.5558
0.5558, or 55.58% of the cases
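When the two scores lie on opposite sides of the mean, the area between them is Φ(z₁) − Φ(z₂), which equals the sum of the two mean-to-z areas from the table. A minimal sketch using Python's `math.erf`; `phi` is an illustrative helper name.

```python
import math

def phi(z):
    """Standard normal cumulative probability: area below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mean, sd = 8, 2
z1 = (9 - mean) / sd    # 0.5
z2 = (5.8 - mean) / sd  # -1.1 (up to floating-point rounding)

part1 = phi(z1) - 0.5   # area between the mean and z1
part2 = 0.5 - phi(z2)   # area between z2 and the mean
between = phi(z1) - phi(z2)

print(round(part1, 4), round(part2, 4))  # 0.1915 0.3643
print(round(z2, 2), round(between, 4))   # -1.1 0.5558
```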
Practice Exercises
1. Select a hypothetical product or process and create plausible test data of your choice (no more than 10 values), as shown in the textbook and in class
2. Describe your experimental approach
3. Create a detailed frequency distribution table
4. Display your data with different types of graphs
5. Calculate the measures of central tendency and variability
6. Calculate the z-score(s) and indicate their relative position in the normal distribution
7. Provide any other pertinent information from your results
Questions?