Data Analysis Statistics

Data Analysis
Statistics
OVERVIEW




Getting Ready for Data Collection
The Data Collection Process
Getting Ready for Data Analysis
Descriptive Statistics
GETTING READY FOR DATA COLLECTION
Four steps




Constructing a data collection form
Establishing a coding strategy
Collecting the data
Entering data onto the collection form
THE DATA COLLECTION
PROCESS

Begins with raw data
– Raw data are unorganized data
CONSTRUCTING DATA COLLECTION FORMS
One column for each variable

ID   Gender   Grade   Building   Reading Score   Mathematics Score
1    2        8       1          55              60
2    2        2       6          41              44
3    1        8       6          46              37
4    2        4       6          56              59
5    2        10      6          45              32

One row for each subject
CODING DATA
Variable            Range of Data Possible    Example
ID Number           001 through 200           138
Gender              1 or 2                    2
Grade               1, 2, 4, 6, 8, or 10      4
Building            1 through 6               1
Reading Score       1 through 100             78
Mathematics Score   1 through 100             69

Use single digits when possible
Use codes that are simple and unambiguous
Use codes that are explicit and discrete
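
A coding strategy like this can be checked mechanically before analysis. The following is a minimal sketch, assuming each subject is stored as a Python dictionary; the names CODEBOOK and invalid_fields and the field keys are hypothetical and not part of the slides.

```python
# Allowed codes for each variable, taken from the coding table.
# The dictionary keys are illustrative names, not from the original form.
CODEBOOK = {
    "id": range(1, 201),           # 001 through 200
    "gender": {1, 2},
    "grade": {1, 2, 4, 6, 8, 10},
    "building": range(1, 7),       # 1 through 6
    "reading": range(1, 101),      # 1 through 100
    "math": range(1, 101),         # 1 through 100
}


def invalid_fields(record):
    """Return the names of fields whose value falls outside the codebook."""
    return [name for name, allowed in CODEBOOK.items()
            if record.get(name) not in allowed]


# The example row from the coding table, plus a deliberately miscoded row.
example = {"id": 138, "gender": 2, "grade": 4, "building": 1,
           "reading": 78, "math": 69}
miscoded = {"id": 138, "gender": 3, "grade": 5, "building": 1,
            "reading": 78, "math": 69}

print(invalid_fields(example))
print(invalid_fields(miscoded))
```

Keeping the allowed codes in one place makes it easier to keep codes simple, unambiguous, and discrete, because any value outside the list is flagged at data entry.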
Interpretation
• The process of making pertinent
inferences and drawing conclusions
concerning the meaning and
implications of a research investigation
The Basics




Descriptive statistics
Inferential statistics
Sample statistics
Population parameters
Sample and population
Sample statistics: variables in a sample, or measures computed from sample data
Population parameters: the variables in a population, or measured characteristics of the population
Making Data Usable
…Or what to do with all those
numbers
Descriptive Statistics
Frequency Distributions
• Organizing a set of data by summarizing the number of times a particular value of a variable occurs
Frequency distribution of ice cream consumption

Age      Frequency (number in range)
0        25
1-5      15
6-10     8
11-15    2
TOTAL    50

Percentage Distributions
• Organizing the frequency distribution into a chart or graph that summarizes the percentage values associated with particular values of a variable
Proportion
• The share of elements that meet some criterion, expressed as a percentage, fraction, or decimal
Percentage distribution of ice cream consumption by age

Age      Percent (of people who consumed ice cream in range)
0        50
1-5      30
6-10     16
11-15    4
TOTAL    100%
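
The percentage distribution follows directly from the frequency counts: each count is divided by the total and multiplied by 100. A minimal sketch in Python, using the counts from the frequency distribution; the variable names are illustrative only.

```python
# Frequency counts per range, copied from the frequency distribution.
frequencies = {"0": 25, "1-5": 15, "6-10": 8, "11-15": 2}

total = sum(frequencies.values())

# Convert each count to a percentage of the total number of observations.
percentages = {label: 100 * count / total
               for label, count in frequencies.items()}

for label, pct in percentages.items():
    print(f"{label:>6}: {pct:.0f}%")
print(f" TOTAL: {sum(percentages.values()):.0f}%")
```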
Graphic Representations of Data
Pie chart: ice cream consumption by season (Winter, Spring, Summer, Fall)
Bar chart: frequency of seasonal ice cream consumption; vertical axis "Amt" from 0 to 90, horizontal axis by season (Winter, Spring, Summer, Fall)
Graphical representation of results from cross tab
Bar chart: frequency of seasonal ice cream consumption shown by gender (Male, Female); vertical axis from 0 to 90, horizontal axis by season (Winter, Spring, Summer, Fall)
Cross tabulation

Cross tabulation:
– a technique for organizing data by
groups, categories or classes, thus
facilitating comparisons;
– a joint frequency distribution of
observations on two or more sets of
variables
Types of Cross tabs


Contingency table: the results of a
cross tabulation of two variables, such
as survey questions
Cross tab of question: Do you have children under the age of six currently living with you? (2 x 2 table)

           Yes   No    Total
Males      5     15    20
Females    10    20    30
Total      15    35    50

Types of Cross tabs


Percentage cross-tab. Using percentages helps us
make relative comparisons. The total number of
respondents/observations may be used as a base
for computing the percentage in each cell
Percentage cross tab: Do you have children under the age of six currently living with you?

           Yes      No       Total
Males      25%      75%      100% (20)
Females    33.33%   66.67%   100% (30)
Total      30%      70%      100% (50)
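
Row percentages can be computed from the counts in the 2 x 2 contingency table by dividing each cell by its row total. A minimal sketch, assuming the table is held as nested Python dictionaries; the function name row_percentages is hypothetical.

```python
# Counts from the 2 x 2 contingency table (rows: gender, columns: answer).
counts = {
    "Males": {"Yes": 5, "No": 15},
    "Females": {"Yes": 10, "No": 20},
}


def row_percentages(table):
    """Express each cell as a percentage of its row total (2 decimals)."""
    result = {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        result[row] = {col: round(100 * n / row_total, 2)
                       for col, n in cells.items()}
    return result


# Add a Total row by summing each column across the groups.
with_total = dict(counts)
with_total["Total"] = {
    col: sum(cells[col] for cells in counts.values())
    for col in ("Yes", "No")
}

percent_table = row_percentages(with_total)
for row, cells in percent_table.items():
    print(row, cells)
```

Using the row total as the base answers the question "what share of each group said yes?". Using the grand total of 50 as the base instead would show each cell's share of all respondents.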
Elaboration Analysis of
Cross tabs


Analysis of the basic cross-tab for each level of
another variable, such as subgroups of the same
sample
Percentage Cross tab: Do you have children under the age of six currently living with you?

        Aged 17-25         Aged 25 and up
        Male    Female     Male    Female
Yes     0       2          5       8
No      10      20         0       0

Calculating Rank Data





Please place in rank order the
following varieties of cookies (1= most
preferred to 4=least preferred)
__ Chocolate chip
__ Marshmallow
__ Oatmeal
__ Oreo

Respondent   Chocolate chip   Marshmallow   Oatmeal   Oreo
1            1                2             4         3
2            1                3             4         2
3            2                1             3         4
4            2                4             3         1
5            2                1             3         4
6            3                4             1         2
7            2                3             1         4
8            1                4             2         3
9            4                3             2         1
10           2                1             3         4

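Each total is the sum of (number of respondents x rank); the lowest total marks the most preferred variety.
Chocolate chip: (3 x 1) + (5 x 2) + (1 x 3) + (1 x 4) = 20
Marshmallow: (3 x 1) + (1 x 2) + (3 x 3) + (3 x 4) = 26
Oatmeal: (2 x 1) + (2 x 2) + (4 x 3) + (2 x 4) = 26
Oreo: (2 x 1) + (2 x 2) + (2 x 3) + (4 x 4) = 28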
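
The rank totals can also be tallied directly from the ten respondents' rankings. A minimal sketch, assuming the first column of the table is the respondent number and the remaining four columns are the ranks; the variable names are illustrative.

```python
from collections import Counter

varieties = ["Chocolate chip", "Marshmallow", "Oatmeal", "Oreo"]

# One row per respondent, in the same column order as `varieties`
# (1 = most preferred, 4 = least preferred), copied from the rank table.
rankings = [
    (1, 2, 4, 3),
    (1, 3, 4, 2),
    (2, 1, 3, 4),
    (2, 4, 3, 1),
    (2, 1, 3, 4),
    (3, 4, 1, 2),
    (2, 3, 1, 4),
    (1, 4, 2, 3),
    (4, 3, 2, 1),
    (2, 1, 3, 4),
]

rank_totals = {}
for column, name in enumerate(varieties):
    ranks = [row[column] for row in rankings]
    tally = Counter(ranks)  # how many respondents gave each rank
    rank_totals[name] = sum(rank * count for rank, count in tally.items())

# Lowest total first: a lower total means the variety was preferred more.
for name, total in sorted(rank_totals.items(), key=lambda item: item[1]):
    print(f"{name}: {total}")
```

A quick sanity check on any such tally: every respondent hands out ranks 1 through 4, so the four totals must add up to 10 respondents x 10 = 100.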
Measures of central
tendency




Mode: the value that occurs most often
Median: the midpoint; the value below
which half the values in a distribution fall
Mean: the arithmetic average
Remember: what type of scale you use
determines the type of statistic you may
calculate
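
As an illustration, the three measures can be computed with Python's standard statistics module. This sketch reuses the gender codes and reading scores of the five subjects on the sample data collection form; the variable names are illustrative.

```python
import statistics

gender_codes = [2, 2, 1, 2, 2]          # nominal codes: only the mode applies
reading_scores = [55, 41, 46, 56, 45]   # test scores: median and mean apply

mode_gender = statistics.mode(gender_codes)          # most frequent value
median_reading = statistics.median(reading_scores)   # middle value when sorted
mean_reading = statistics.mean(reading_scores)       # arithmetic average

print(mode_gender, median_reading, mean_reading)
```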
WHEN TO USE WHICH MEASURE

Measure of Central Tendency   Level of Measurement   Use When                        Examples
Mode                          Nominal                Data are categorical            Eye color, party affiliation
Median                        Ordinal                Data include extreme scores     Rank in class, birth order
Mean                          Interval and ratio     You can, and the data fit       Speed of response, age in years

Measures of dispersion
What is the tendency for measures to depart from the central tendency?
• Range: the simplest measure of dispersion
• Deviation scores: a quantitative index of dispersion
– Variance: the sum of squared deviation scores divided by the sample size minus 1; often used. (Variance is in squared units, e.g. squared dollars.)
– Standard deviation: the square root of the variance
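
These definitions translate into a few lines of arithmetic. A minimal sketch, again using the five reading scores from the sample data collection form; the variable names are illustrative, and the result is cross-checked against Python's statistics module.

```python
import math
import statistics

scores = [55, 41, 46, 56, 45]  # reading scores from the sample form

mean = sum(scores) / len(scores)
score_range = max(scores) - min(scores)    # highest minus lowest score
deviations = [x - mean for x in scores]    # deviation scores

# Sample variance: sum of squared deviations divided by (n - 1).
variance = sum(d ** 2 for d in deviations) / (len(scores) - 1)

# Standard deviation: square root of the variance, back in original units.
std_dev = math.sqrt(variance)

print(score_range, round(variance, 2), round(std_dev, 2))

# The standard library computes the same sample statistics.
assert math.isclose(variance, statistics.variance(scores))
assert math.isclose(std_dev, statistics.stdev(scores))
```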
MEASURES OF VARIABILITY
Variability is the degree of spread or dispersion in a set of scores
• Range: the difference between the highest and lowest scores
• Standard deviation: roughly the average distance of each score from the mean
THE MEAN AND THE
STANDARD DEVIATION
STANDARD DEVIATIONS
AND % OF CASES



The normal curve is symmetrical
One standard deviation to either side of the mean contains about 34% of the area under the curve
About 68% of scores lie within ± 1 standard deviation of the mean
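
The 34% and 68% figures can be checked numerically from the standard normal distribution. A minimal sketch using the error function from Python's math module; the helper name normal_cdf is hypothetical.

```python
import math


def normal_cdf(z):
    """Cumulative probability of a standard normal variable up to z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


one_side = normal_cdf(1) - normal_cdf(0)         # mean to +1 SD
within_one_sd = normal_cdf(1) - normal_cdf(-1)   # -1 SD to +1 SD

print(f"{one_side:.1%}")        # about 34%
print(f"{within_one_sd:.1%}")   # about 68%
```

Because the curve is symmetrical, the area within one standard deviation on both sides is exactly twice the one-sided area.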