Statistics Refresher 1

Statistics Refresher
Research Methods PSYC362
Three Stages of Data Analysis
 1) Getting to know the data
 2) Summarizing the data
 3) Confirming what the data reveal
Three Stages of Data Analysis
 1) Getting to know the data
 Exploratory or investigative stage
 What is going on with the data?
 Are there errors in the data?
 Does the data require transformation?
 Frequency distribution of data
 Histogram, frequency polygon, etc.
 Stem-and-Leaf displays
Frequency Distribution
 Frequency distribution
 An organized tabulation of the number of individuals located in
each category on the scale of measurement
 Presented as either tables or a graph
 Two elements of Frequency Distributions
 The set of categories that make up the original measurement
scale.
 A record of the number of individuals in each category.
Frequency Distribution Tables
 8, 9, 8, 7, 10, 9, 6, 4, 9, 8, 7, 8, 10, 9, 8, 6, 9, 7, 8, 8
 X    f
10    2
 9    5
 8    7
 7    3
 6    2
 5    0
 4    1
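The tabulation above can be reproduced with a short script. This is a minimal sketch rather than part of the original slides; the function name `frequency_table` is a hypothetical helper, and it assumes whole-number scores so that empty categories (such as X = 5) can be listed with a frequency of 0.

```python
from collections import Counter

# The 20 raw scores listed on the slide.
scores = [8, 9, 8, 7, 10, 9, 6, 4, 9, 8, 7, 8, 10, 9, 8, 6, 9, 7, 8, 8]


def frequency_table(data):
    """Return (X, f) rows from the highest score down to the lowest.

    Categories with no scores are kept with f = 0.
    """
    counts = Counter(data)  # a missing key counts as 0
    return [(x, counts[x]) for x in range(max(data), min(data) - 1, -1)]


table = frequency_table(scores)
for x, f in table:
    print(f"{x:>2}  {f}")
```

The frequencies should add up to N, the total number of scores, which is a quick check for tabulation errors.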
Proportions and Percentages
 Proportion/relative frequency
 p = f / N
 Percentage
 p(100) = (f / N)(100)
Proportions & Percentages
 X    f    p      %
10    2    .10    10%
 9    5    .25    25%
 8    7    .35    35%
 7    3    .15    15%
 6    2    .10    10%
 5    0    0       0%
 4    1    .05     5%
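The p and % columns follow directly from p = f / N. The sketch below recomputes them for the same 20 scores; the helper name `proportion_table` is an assumption made for illustration, not something taken from the slides.

```python
from collections import Counter

scores = [8, 9, 8, 7, 10, 9, 6, 4, 9, 8, 7, 8, 10, 9, 8, 6, 9, 7, 8, 8]


def proportion_table(data):
    """Return (X, f, p, percent) rows, with p = f / N and percent = p * 100."""
    n = len(data)
    counts = Counter(data)
    rows = []
    for x in range(max(data), min(data) - 1, -1):
        f = counts[x]
        p = f / n
        rows.append((x, f, p, p * 100))
    return rows


rows = proportion_table(scores)
for x, f, p, pct in rows:
    print(f"{x:>2}  {f}  {p:.2f}  {pct:.0f}%")
```

The proportions should sum to 1 and the percentages to 100, which gives a simple check on the table.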
Frequency Distributions Graphs
 Histograms
 Vertical bars above each score
 Height of bar corresponds to Frequency
 Width extends to real limits of the score
 Bar graphs
 Vertical bars above each score with space between each bar
 Designates separate distinct categories
 Frequency Distribution Polygon (line graph)
 A dot is centered above the score w/ height corresponding to
frequency
 Connected with a continuous line
Histogram [figure; vertical axis: Frequency]
Bar graph [figure; vertical axis: Frequency]
Line graph (frequency distribution polygon) [figure; vertical axis: Frequency]
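As a rough stand-in for the graphs, the difference between a histogram (bars touch) and a bar graph (bars separated) can be sketched in plain text. This is only an illustration under stated assumptions: `vertical_bars` and its `gap` parameter are hypothetical names, each score gets a bar two characters wide, and the same numeric scores are reused for both layouts even though a bar graph would normally show distinct categories.

```python
from collections import Counter

scores = [8, 9, 8, 7, 10, 9, 6, 4, 9, 8, 7, 8, 10, 9, 8, 6, 9, 7, 8, 8]


def vertical_bars(data, gap=False):
    """Draw vertical bars above each score as rows of text.

    gap=False: adjacent bars touch, as in a histogram.
    gap=True: a space separates the bars, as in a bar graph.
    """
    counts = Counter(data)
    xs = list(range(min(data), max(data) + 1))
    sep = " " if gap else ""
    rows = []
    # Work from the tallest frequency down to 1, one text row per level.
    for level in range(max(counts.values()), 0, -1):
        rows.append(sep.join("##" if counts[x] >= level else "  " for x in xs))
    rows.append(sep.join(f"{x:>2}" for x in xs))  # score labels under the bars
    return rows


print("\n".join(vertical_bars(scores)))
print()
print("\n".join(vertical_bars(scores, gap=True)))
```

The height of each bar is the frequency f for that score, so the tallest bar sits above X = 8.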
Shape of a distribution
 Symmetrical distribution
 Line at midpoint will give identical halves
 Skewed distribution
 Scores pile up at one end and taper off at the other
 Positively skewed
 Tail points toward the positive end of the scale
 Negatively skewed
 Tail points toward the negative end of the scale
Symmetrical Distribution [figure; vertical axis: Frequency]
Negative Skew [figure; vertical axis: Frequency]
Positive Skew [figure; vertical axis: Frequency]
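The direction of skew can also be checked numerically. The slides describe skew only visually, so the sketch below is an added assumption: it uses the sign of the average cubed deviation from the mean (the third central moment), one common indicator. The function name `skew_direction` is hypothetical, and a value of zero is consistent with symmetry but does not prove it.

```python
from statistics import mean

scores = [8, 9, 8, 7, 10, 9, 6, 4, 9, 8, 7, 8, 10, 9, 8, 6, 9, 7, 8, 8]


def skew_direction(data, tol=1e-12):
    """Classify skew by the sign of the average cubed deviation from the mean."""
    m = mean(data)
    third_moment = sum((x - m) ** 3 for x in data) / len(data)
    if abs(third_moment) < tol:
        return "symmetrical"
    return "positive" if third_moment > 0 else "negative"


print(skew_direction(scores))
```

For the 20 scores used earlier, the single low score of 4 pulls the tail toward the low end of the scale, so the result is a negative skew.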
Stem and Leaf Display: An alternative to traditional frequency distribution
• Lowest scores appear at the top.
• Presenting exact value for each score
• Showing the shape of the distribution (viewed from the side)
• One of the techniques in Exploratory Data Analysis
6|12
6|7889
7|000223
7|5677888899
8|00112222344
8|5566666667788999
9|01
9|6
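A display like the one above can be generated from raw scores. The sketch below assumes two-digit whole-number scores and splits each stem into a low row (leaves 0 to 4) and a high row (leaves 5 to 9), matching the layout of the slide. The function name `stem_and_leaf` is a hypothetical helper, the sample scores are a small subset read off the display, and rows with no scores are simply omitted.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Return display rows such as '6|12', lowest scores first.

    Each stem (tens digit) is split into two rows: leaves 0-4 and leaves 5-9.
    """
    groups = defaultdict(list)
    for score in sorted(data):
        stem, leaf = divmod(score, 10)
        groups[(stem, leaf >= 5)].append(str(leaf))
    return [f"{stem}|{''.join(leaves)}" for (stem, _), leaves in sorted(groups.items())]


sample = [61, 62, 67, 68, 68, 69, 70, 90, 91, 96]
display = stem_and_leaf(sample)
print("\n".join(display))
```

Because every leaf is kept, the original scores can be read back from the display, which a grouped frequency table does not allow.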
Three Stages of Data Analysis
 2) Summarizing the data
 Measures of Central Tendency:
 Mean- the sum of the scores divided by the number of scores contributing to the
sum
 Median- the middle point of a frequency distribution determined by ranking all
of the scores from lowest to highest
 Mode- the most often recurring score
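The three measures can be computed with Python's standard `statistics` module. This is a small sketch that reuses the 20 scores from the frequency distribution example earlier in the slides; the variable names are chosen here for illustration.

```python
from statistics import mean, median, mode

scores = [8, 9, 8, 7, 10, 9, 6, 4, 9, 8, 7, 8, 10, 9, 8, 6, 9, 7, 8, 8]

score_mean = mean(scores)      # sum of the scores divided by the number of scores
score_median = median(scores)  # middle point after ranking the scores
score_mode = mode(scores)      # most often recurring score

print(score_mean, score_median, score_mode)
```

With an even number of scores, `median` averages the two middle values after ranking; here both middle values are 8.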
Three Stages of Data Analysis
 2) Summarizing the data
 Dispersion or Variability:
 Range- the lowest score in the distribution subtracted from the highest
 Standard Deviation- the square root of the average squared deviations of scores
about the mean. Tells how far on average a score is from the mean
 Standard Error of the Mean- population standard deviation divided by the square
root of the sample size
 Estimated Standard Error of the Mean- sample standard deviation divided by the
square root of the sample size
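The range and the two standard-error definitions can be sketched as follows. The helper name `standard_error` is an assumption, the 20 scores are reused from the frequency distribution example, and the population values in the last line (a standard deviation of 15 and a sample size of 25) are hypothetical numbers chosen only to show the arithmetic.

```python
from math import sqrt
from statistics import stdev

scores = [8, 9, 8, 7, 10, 9, 6, 4, 9, 8, 7, 8, 10, 9, 8, 6, 9, 7, 8, 8]


def standard_error(sd, n):
    """Standard deviation divided by the square root of the sample size."""
    return sd / sqrt(n)


# Range: lowest score subtracted from the highest.
score_range = max(scores) - min(scores)

# Estimated standard error: uses the sample standard deviation (N - 1 denominator).
estimated_se = standard_error(stdev(scores), len(scores))

# Standard error of the mean: uses a known population standard deviation.
# sigma = 15 and n = 25 are hypothetical values, not from the slides.
population_se = standard_error(15, 25)

print(score_range, round(estimated_se, 4), population_se)
```

The only difference between the two standard errors is which standard deviation goes in the numerator: the population value when it is known, the sample estimate otherwise.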
STANDARD DEVIATION
 Raw Scores Method
 Method we will use for Class
 First calculate the Sum of Squares
$SS = \sum X^2 - \frac{(\sum X)^2}{N}$
 Then calculate the standard deviation
$s = \sqrt{\frac{SS}{N - 1}}$ (sample)
$\sigma = \sqrt{\frac{SS}{N}}$ (population)
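A worked example of the raw scores method, assuming the usual computational formula SS = ΣX² − (ΣX)²/N and a sample standard deviation with N − 1 in the denominator. The function name `sum_of_squares` is a hypothetical helper, and the data are the 20 scores from the frequency distribution example.

```python
from math import sqrt

scores = [8, 9, 8, 7, 10, 9, 6, 4, 9, 8, 7, 8, 10, 9, 8, 6, 9, 7, 8, 8]


def sum_of_squares(data):
    """Raw scores method: SS = sum(X^2) - (sum(X))^2 / N."""
    n = len(data)
    return sum(x * x for x in data) - sum(data) ** 2 / n


n = len(scores)
ss = sum_of_squares(scores)   # 1288 - 158**2 / 20
s = sqrt(ss / (n - 1))        # sample standard deviation
sigma = sqrt(ss / n)          # standard deviation if the scores are a whole population

print(round(ss, 1), round(s, 4), round(sigma, 4))
```

For these scores ΣX = 158 and ΣX² = 1288, so SS = 1288 − 1248.2 = 39.8. The raw scores method avoids computing each deviation from the mean separately, which is convenient for hand calculation.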
Sampling Distribution of the Mean
 Central Limit Theorem
 Standard deviation of the sampling distribution of the mean is equal to the standard deviation of the raw score population divided by the square root of n
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
 Where n = the size of the sample
 Thus, the sampling distribution of means is narrower than the population distribution
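The relationship can be checked with a small simulation. Everything in this sketch is an assumption made for illustration: the population is 10,000 made-up scores between 1 and 10, the sample size is 25, and 2,000 samples are drawn with replacement using a fixed random seed.

```python
import random
from math import sqrt
from statistics import mean, pstdev

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical raw score population (not data from the slides).
population = [rng.randint(1, 10) for _ in range(10_000)]
sigma = pstdev(population)

n = 25  # size of each sample
sample_means = [mean(rng.choices(population, k=n)) for _ in range(2_000)]

observed = pstdev(sample_means)  # spread of the simulated sample means
predicted = sigma / sqrt(n)      # population standard deviation / sqrt(n)

print(round(sigma, 3), round(observed, 3), round(predicted, 3))
```

The observed spread of the sample means should land close to the predicted value and well below the population standard deviation, which is the sense in which the sampling distribution is narrower.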