Data Analysis Statistics

OVERVIEW
• Getting Ready for Data Collection
• The Data Collection Process
• Getting Ready for Data Analysis
• Descriptive Statistics
GETTING READY FOR DATA COLLECTION
Four steps
• Constructing a data collection form
• Establishing a coding strategy
• Collecting the data
• Entering data onto the collection form
THE DATA COLLECTION PROCESS
• Begins with raw data
– Raw data are unorganized data
CONSTRUCTING DATA COLLECTION FORMS
One column for each variable

ID   Gender   Grade   Building   Reading Score   Mathematics Score
1    2        8       1          55              60
2    2        2       6          41              44
3    1        8       6          46              37
4    2        4       6          56              59
5    2        10      6          45              32

One row for each subject
CODING DATA

Variable            Range of Data Possible   Example
ID Number           001 through 200          138
Gender              1 or 2                   2
Grade               1, 2, 4, 6, 8, or 10     4
Building            1 through 6              1
Reading Score       1 through 100            78
Mathematics Score   1 through 100            69

• Use single digits when possible
• Use codes that are simple and unambiguous
• Use codes that are explicit and discrete
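
One practical use of a coding table is to check entered data against the allowed range for each variable. The sketch below is illustrative only: the dictionary keys, the `find_invalid` helper and the row layout are assumptions made for this example, while the ranges and the five rows come from the coding table and the sample collection form.

```python
# Allowed codes per variable, taken from the "Range of Data Possible" column.
# The short field names are hypothetical labels chosen for this sketch.
ALLOWED = {
    "id": range(1, 201),
    "gender": {1, 2},
    "grade": {1, 2, 4, 6, 8, 10},
    "building": range(1, 7),
    "reading": range(1, 101),
    "math": range(1, 101),
}

FIELDS = ("id", "gender", "grade", "building", "reading", "math")

# The five subjects shown on the sample data collection form.
ROWS = [
    (1, 2, 8, 1, 55, 60),
    (2, 2, 2, 6, 41, 44),
    (3, 1, 8, 6, 46, 37),
    (4, 2, 4, 6, 56, 59),
    (5, 2, 10, 6, 45, 32),
]


def find_invalid(rows):
    """Return (row_index, field, value) for every code outside its allowed range."""
    problems = []
    for i, row in enumerate(rows):
        for field, value in zip(FIELDS, row):
            if value not in ALLOWED[field]:
                problems.append((i, field, value))
    return problems


print(find_invalid(ROWS))
```

An empty list means every entered code is one the coding strategy allows; a mistyped gender code of 3, for instance, would be reported with its row index.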
Interpretation
• The process of making pertinent inferences
and drawing conclusions concerning the
meaning and implications of a research
investigation
The Basics
• Descriptive statistics
• Inferential statistics
• Sample statistics
• Population parameters
Sample and population
Sample statistics
• Variables in a sample or measures computed from sample data
Population parameters
• The variables in a population or measured characteristics of the population
Making Data Usable
…Or what to do with all those
numbers
Descriptive Statistics
Frequency Distributions
• Organizing a set of
data by summarizing
the number of times a
particular value of a
variable occurs
Frequency distribution of ice cream consumption

Age     Frequency (number in range)
0       25
1-5     15
6-10    8
11-15   2
TOTAL   50

Percentage Distributions
• Organizing the frequency
distribution into a chart or
graph that summarizes
percentage values
associated with particular
values of a variable
Proportion
• The share of elements that meet some
criterion, expressed as a percentage,
fraction or decimal
Percentage distribution of ice cream consumption by age

Age     Percent (of people who consumed ice cream in range)
0       50
1-5     30
6-10    16
11-15   4
TOTAL   100%
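
Each percentage is the frequency for a range divided by the total count, times 100. A minimal sketch using the ice cream frequencies from the frequency distribution slide (the function name `percentage_distribution` is made up for this example):

```python
# Frequencies from the ice cream frequency distribution (total of 50).
freq = {"0": 25, "1-5": 15, "6-10": 8, "11-15": 2}


def percentage_distribution(freq):
    """Convert a frequency distribution into percentages of the total count."""
    total = sum(freq.values())
    return {label: 100 * n / total for label, n in freq.items()}


pct = percentage_distribution(freq)
for label, p in pct.items():
    print(f"{label:>5}  {p:.0f}%")
```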
Graphic Representations of Data
Pie Chart: Ice cream consumption (segments: Winter, Spring, Summer, Fall)
Bar Chart: Frequency of Seasonal Ice Cream consumption (vertical axis: Amt, 0 to 90; horizontal axis: Winter, Spring, Summer, Fall)
Cross tabulation
• Cross tabulation: a technique for
organizing data by groups, categories or
classes, thus facilitating comparisons; a
joint frequency distribution of observations
on two or more sets of variables
Types of Cross tabs
• Contingency table: the results of a cross
tabulation of two variables, such as survey
questions
• Cross tab of question: Do you have children under the age
of six currently living with you? This is a 2X2 table. Why?

          Yes   No   Total
Males     5     15   20
Females   10    20   30
Total     15    35   50

Types of Cross tabs
• Percentage cross-tab. Using percentages helps us make
relative comparisons. The total number of
respondents/observations may be used as a base for
computing the percentage in each cell
• Percentage Cross tab: Do you have children under the
age of six currently living with you? (Each row total is the base here.)

          Yes      No       Total
Males     25%      75%      100% (20)
Females   33.33%   66.67%   100% (30)
Total     30%      70%      100% (50)
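
The percentages in such a table can be recomputed from the contingency counts. The sketch below assumes the counts from the 2X2 table (5 and 15 for males, 10 and 20 for females) and uses each row total as the base; the helper names are hypothetical.

```python
# Counts from the 2X2 contingency table: rows are groups, columns are answers.
counts = {
    "Males": {"Yes": 5, "No": 15},
    "Females": {"Yes": 10, "No": 20},
}


def row_percentages(counts):
    """Express each cell as a percentage of its own row total."""
    table = {}
    for group, cells in counts.items():
        base = sum(cells.values())
        table[group] = {answer: 100 * n / base for answer, n in cells.items()}
    return table


def column_totals(counts):
    """Add the cells down each column to get the 'Total' row of counts."""
    totals = {}
    for cells in counts.values():
        for answer, n in cells.items():
            totals[answer] = totals.get(answer, 0) + n
    return totals


pct = row_percentages(counts)
pct["Total"] = row_percentages({"Total": column_totals(counts)})["Total"]

for group, cells in pct.items():
    print(group, {answer: round(p, 2) for answer, p in cells.items()})
```

Using the grand total (50) as the base instead would answer a different question: what share of all respondents falls in each cell.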
Graphical representation of results
from cross tab
Bar Chart: Frequency of Seasonal Ice Cream consumption Shown By Gender (vertical axis: 0 to 90; series: Male, Female; horizontal axis: Winter, Spring, Summer, Fall)
Elaboration Analysis of
Cross tabs
• Analysis of the basic cross-tab for each level of another
variable, such as subgroups of the same sample
• Percentage Cross tab : Do you have children under the
age of six currently living with you?
• Moderator Variable; Spurious relationship

        Aged 17-25        Aged 25 and up
        Male    Female    Male    Female
Yes     0       2         5       8
No      10      20        0       0

Calculating Rank Data
• Please place in rank order the following
varieties of cookies (1= most preferred to
4=least preferred)
• __ Chocolate chip
• __ Marshmallow
• __ Oatmeal
• __ Oreo

      Choco chip   Marshm   Oatmeal   Oreo
1     1            2        4         3
2     1            3        4         2
3     2            1        3         4
4     2            4        3         1
5     2            1        3         4
6     3            4        1         2
7     2            3        1         4
8     1            4        2         3
9     4            3        2         1
10    2            1        3         4

Chocolate chip: (3X1) + (5X2) + (1X3) + (1X4) = 20 (lowest total, most preferred)
Marshmallow: (3X1) + (1X2) + (3X3) + (3X4) = 26
Oatmeal: (2X1) + (2X2) + (4X3) + (2X4) = 26
Oreo: (2X1) + (2X2) + (2X3) + (4X4) = 28
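
The weighted totals can be tallied directly from the ten rows of rankings. This is a sketch under the assumption that the rows read as transcribed below (one tuple per respondent, in the order chocolate chip, marshmallow, oatmeal, Oreo); `rank_totals` is a name invented for the example.

```python
from collections import Counter

COOKIES = ("Chocolate chip", "Marshmallow", "Oatmeal", "Oreo")

# One tuple per respondent: the rank (1 = most preferred) given to each cookie.
RANKINGS = [
    (1, 2, 4, 3),
    (1, 3, 4, 2),
    (2, 1, 3, 4),
    (2, 4, 3, 1),
    (2, 1, 3, 4),
    (3, 4, 1, 2),
    (2, 3, 1, 4),
    (1, 4, 2, 3),
    (4, 3, 2, 1),
    (2, 1, 3, 4),
]


def rank_totals(rankings):
    """For each cookie, sum (number of times a rank was given) x (that rank)."""
    totals = {}
    for name, ranks in zip(COOKIES, zip(*rankings)):
        counts = Counter(ranks)  # how often each rank was assigned
        totals[name] = sum(rank * n for rank, n in counts.items())
    return totals


totals = rank_totals(RANKINGS)
best = min(totals, key=totals.get)  # lowest total = most preferred
print(totals, best)
```

Because every respondent hands out the ranks 1 through 4 exactly once, the four totals must add up to 10 x 10 = 100, which gives a quick check on the tally.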
Measures of central tendency
• Mode: the value that occurs most often
• Median: the midpoint; the value below which
half the values in a distribution fall
• Mean: the arithmetic average
• Remember: what type of scale you use
determines the type of statistic you may calculate
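
As a small illustration, the three measures can be computed with Python's standard `statistics` module. The data are the Building codes and Reading Scores of the five subjects on the sample collection form; treating Building as nominal and Reading Score as interval-level is an assumption of this sketch.

```python
import statistics

building = [1, 6, 6, 6, 6]        # nominal codes: only the mode is meaningful
reading = [55, 41, 46, 56, 45]    # test scores: median and mean both apply

mode_building = statistics.mode(building)      # most frequent code
median_reading = statistics.median(reading)    # middle value once sorted
mean_reading = statistics.mean(reading)        # arithmetic average

print(mode_building, median_reading, mean_reading)
```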
WHEN TO USE WHICH MEASURE

Measure of Central Tendency   Level of Measurement   Use When                      Examples
Mode                          Nominal                Data are categorical          Eye color, party affiliation
Median                        Ordinal                Data include extreme scores   Rank in class, birth order
Mean                          Interval and ratio     You can, and the data fit     Speed of response, age in years

Measures of dispersion
What is the tendency for measures to depart from
the central tendency?
• Range: simplest measure of dispersion
• Deviation scores: quantitative index of
dispersion
– Average deviation: never used (deviations from the mean always sum to zero)
– Variance: the sum of squared deviation scores
divided by sample size minus 1; often used (variance
is in squared units, e.g. squared dollars)
– Standard Deviation: square root of variance
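
The definitions above can be followed step by step in code. This sketch uses the five Reading Scores from the sample collection form as illustrative data and cross-checks the hand computation against the standard `statistics` module.

```python
import math
import statistics

scores = [55, 41, 46, 56, 45]  # Reading Scores of the five sample subjects

value_range = max(scores) - min(scores)          # highest minus lowest score
mean = sum(scores) / len(scores)
deviations = [x - mean for x in scores]          # deviation scores
# Sum of squared deviations divided by (sample size - 1).
variance = sum(d ** 2 for d in deviations) / (len(scores) - 1)
std_dev = math.sqrt(variance)                    # back in the original units

print(value_range, round(variance, 2), round(std_dev, 2))

# The library functions use the same n - 1 definition.
assert math.isclose(variance, statistics.variance(scores))
assert math.isclose(std_dev, statistics.stdev(scores))
```

The deviation scores themselves add up to zero (apart from floating-point rounding), which is why their plain average is not informative and the squared deviations are used instead.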
MEASURES OF VARIABILITY
Variability is the degree of spread or dispersion in a
set of scores
• Range: difference between highest and lowest
score
• Standard deviation: roughly the typical distance of each
score from the mean (the square root of the variance)
THE MEAN AND THE STANDARD DEVIATION
STANDARD DEVIATIONS AND % OF CASES
• The normal curve is symmetrical
• The area between the mean and one standard deviation on either side is about 34% of the
area under the curve
• About 68% of scores lie within ± 1 standard deviation of the mean
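
These percentages can be checked numerically. A minimal sketch, assuming a standard normal distribution (mean 0, standard deviation 1) and Python 3.8 or later for `statistics.NormalDist`:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

one_side = z.cdf(1) - z.cdf(0)      # area between the mean and +1 SD
within_one = z.cdf(1) - z.cdf(-1)   # area within 1 SD on both sides

print(f"{one_side:.4f} {within_one:.4f}")
```

By symmetry the second value is twice the first, so the rounded 34% and 68% figures are consistent with each other.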