7/20/2015
Admin
Choices
Levels
Statistics
Measures
Choosing
SPSS
SOC497/L: SOCIOLOGY RESEARCH METHODS
Elementary Statistics: Choices & Implications
Ellis Godard
SOC497 @ CSUN w/ Ellis Godard

Last Lab Assigned
Threats to Internal Validity
Some you can plan for, some you can’t (history)
Either way, choices in research design have implications for what happens during data collection
This lecture & lab:
Choices in measurement have implications for what happens (or can happen) during analysis

Admin
Clickers…
1. Rock my world
2. Add some value
3. Whatever…
4. Have problems
5. Really, really suck.

Outline for Today
Operationalization Choices
Evolution, Components
Levels of Measurement
Review of Introductory Statistics
Central Tendency & Dispersion
Choosing Statistics
Distributions & Shapes
Example & Lab

Admin
1. How many full chapters of reading in the text were assigned for this lecture?
1. 1
2. 2
3. 3
4. 4
5. None of the above

Evolution of Operationalizations
Kinds:
Recoding & computing
Indices & Scales
Select cases
Crosstabs and other bivariate analyses
Timing:
Ideally, they would have been spelled out in advance
Often, they arise during data analysis
Consequence:
Changing the measurement changes the meaning!
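
To see why recoding changes meaning, here is a minimal Python sketch. It assumes hypothetical ages and hypothetical cut points (under 30, 30 to 49, 50 and over). Recoding an interval variable into ordered groups keeps the rank order but throws away the distances between values, so a mean that was defined before recoding is undefined afterwards.

```python
# Hypothetical ages (interval/ratio level), for illustration only
ages = [19, 23, 31, 38, 44, 52, 67]

def recode_age(age: int) -> str:
    """Recode an interval age into a hypothetical ordinal age group."""
    if age < 30:
        return "young"
    elif age < 50:
        return "middle"
    else:
        return "older"

# Recoding maps values to new values (old values > new values)
groups = [recode_age(a) for a in ages]

# Before recoding, distances are meaningful, so a mean is defined
mean_age = sum(ages) / len(ages)

# After recoding, only order survives: 19 and 23 become identical,
# while 44 and 52 land in different groups, and no mean can be taken
print(groups)
print(round(mean_age, 1))
```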

Components of Variation (sets)
Attribute: Characteristic or quality of something
Single measurement’s value, (as if) for 1 case
Examples: young, female, Armenian, queer, wealthy, plumber
Variable: a logical set of related attributes
One of those e.g.’s – could apply to many cases
Examples: age, gender, ethnicity, sexual orientation, SES, occupation
Two primary characteristics/requirements:
Exhaustiveness: able to classify every observation – just use NOI
Mutual exclusivity: each case fits one and only one value

Levels of Measurement
Nominal variables:
Only exhaustive & mutually exclusive – just names
Values cannot be ordered/ranked – apples/oranges
Examples: gender, race, religion, department
Ordinal variables:
Also rank-ordered – more/less, higher/lower
Ranks relative, not absolute
Difference between 2 values or cases is unclear
Range covered by each value may be unclear too
Examples: short/medium/tall; <HS/HS/BA/+

2. Which variable has rankable values?
1. Age
2. Gender
3. Race
4. Religion
5. All of the above

Interval Variables
Meaningful standard intervals
Meaningful spectrum of values
Distance between attributes is measured & uniform
Distance between successive values is clear
Distance between any two values can be calculated
Difference between any two cases can be calculated
No “true zero”
Ratio measures originate/end at zero
Values can be compared, e.g. Jill has twice as much as Joe
Examples: age, Kelvin
No other differences for our purposes

3. Which value is from an interval variable?
1. <8 years of education
2. 12-14 years old
3. >5 sexual partners
4. $18K-$20K in income
5. None of the above

Question 2 from 1st Day Quiz
2. Which level of measurement (LOM) is appropriate for…
College major – nominal
Socioeconomic status (low, medium, high) – ordinal
Average GPA – interval
Occupation (plumber, accountant, teacher, etc.) – nominal
Able to compose web pages in HTML (yes or no) – nominal (b/c only 2 values)
Verbal complexity (on a 100-point continuum) – interval
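
The two requirements for a variable’s attributes can be checked mechanically. A minimal Python sketch, assuming hypothetical income categories written as half-open ranges [low, high) in $K: a scheme is exhaustive if every observation fits at least one category, and mutually exclusive if no observation fits more than one. The function names are illustrative, not from any statistics package.

```python
from typing import Dict, List, Tuple

# A hypothetical category: a half-open range [low, high)
Category = Tuple[float, float]

def matching_categories(value: float, categories: Dict[str, Category]) -> List[str]:
    """Return the names of every category whose range contains the value."""
    return [name for name, (low, high) in categories.items() if low <= value < high]

def check_scheme(values: List[float], categories: Dict[str, Category]) -> Dict[str, bool]:
    """Exhaustive: every value fits somewhere.
    Mutually exclusive: no value fits more than one category."""
    counts = [len(matching_categories(v, categories)) for v in values]
    return {
        "exhaustive": all(c >= 1 for c in counts),
        "mutually_exclusive": all(c <= 1 for c in counts),
    }

incomes = [12.0, 18.0, 20.0, 35.0, 250.0]  # hypothetical incomes in $K

# Overlapping and incomplete: 20 fits both ranges, 250 fits neither
bad = {"low": (0, 20.5), "middle": (20, 100)}
# Non-overlapping, with an open-ended top category that catches high values
good = {"low": (0, 20), "middle": (20, 100), "high": (100, float("inf"))}

print(check_scheme(incomes, bad))
print(check_scheme(incomes, good))
```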

Implications of Level Chosen
Analysis techniques require minimum levels
Anticipate appropriate conclusions
Sometimes need >1 level, >1 indicator
Computing variables > new variable
Indices & Scales – coming lecture
Not the same as recoding (values > new values)

Why Do Levels Matter?
Associated w/ different statistics, 2 ways…
Each level is described differently
Each requires different univariate procedures
Can compute most frequent (modal) religion
Can’t compute “average religion”
Combinations require different bivariate procedures
Statistical techniques require a (minimum) level
Each has a set of assumptions about data
Including mathematical manipulation of the values
Addition, subtraction, multiplication, division
Require at least interval level of measurement

Course Progress
Typically learn statistics in this order:
Descriptive statistics for univariate data
Inferential statistics for univariate data
Descriptions of bivariate relationships
Inferences from bivariate relationships
Review statistics first, then pair w/ levels

Introduction to Statistics
Meaning: Numbers vs. Procedures
Basic Targets: Parameters (about populations) vs. Statistics (about samples)
Purposes:
Descriptive statistics – numerically summarize observations
Inferential statistics – generalize beyond a sample
Complexity:
Univariate analysis – single variables, not relationships, causes, etc.
Bivariate analysis – two variables; test relationships, causes, etc.
Multivariate analysis – more than two variables

Univariate Analysis
Describing a sample in terms of a single variable
Gender is a good example: how many of the respondents were women, as compared to men?
Including discussion of distributions
How answers are distributed across possible responses
What is the shape of the distribution? (…)
Where is this distribution centered? (typical value)
How spread out is the distribution? (dispersion)

Shapes of Distributions
Normal – not the same as “bell-shaped”
Skewed – left/right? heavy/slight?
Flat/even/uniform
Logarithmic
Oddities – U-shaped (polarized), log, etc.
Warnings
May be nothing distinctive
Don’t exaggerate – almost certainly not normal!
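
Univariate analysis begins with how answers are distributed across the possible responses. A minimal Python sketch, assuming hypothetical responses to a nominal religion item: the frequency distribution and the modal category are meaningful summaries, but an “average religion” is not, because the labels carry no numeric distances.

```python
from collections import Counter

# Hypothetical responses to a nominal variable (religion)
responses = ["Catholic", "Protestant", "None", "Catholic", "Jewish",
             "None", "Catholic", "Muslim", "Protestant", "Catholic"]

counts = Counter(responses)
n = len(responses)

# Frequency distribution with percentages, most common first
# (ties keep the order in which values were first seen)
for value, freq in counts.most_common():
    print(f"{value:<12}{freq:>3}{100 * freq / n:>7.1f}%")

# The modal (most frequent) category is a legitimate nominal summary
mode_value, mode_freq = counts.most_common(1)[0]
print("Mode:", mode_value)
```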

Univariate Descriptive Statistics
Central Tendency (what is typical?)
Mode: most frequently occurring value
Median: middle value, when sample ordered
Mean: arithmetic average, “center of gravity”
Dispersion (how far are they spread out?)
Variation ratio: percent that isn’t the mode
Range (Max – Min) & IQR (middle 50%)
Variance & Standard Deviation

Measures of Central Tendency
Mean
Formula: $\bar{Y} = \frac{\sum Y_i}{n}$, the simple average (the sum of all the values, divided by the number of values)
Appropriate if… interval (unless heavily skewed)
Median
Formula: the median case (not value) when cases are ordered. If the number of cases is odd, it is the [(n+1)/2]th case; if even, the median falls between the (n/2)th and (n/2 + 1)th cases and is usually the average of their values
Appropriate if… interval (if skewed), ordinal (mode too?)
Mode
Formula: highest (relative) frequency
Appropriate if… any, but best for nominal (only choice)

Measures of Dispersion
Variation Ratio
Formula: the percent of cases not in the modal category
Appropriate if… nominal (because nothing else works)
Range
Formula: simple subtraction of the lowest value (the “minimum”) from the highest (the “maximum”)
Appropriate if… ordinal (works best), and interval (esp. if sample range is less than population range)
Interquartile range (IQR)
Formula: same as the range, but only of the middle half of the cases when ordered, i.e. from the 25th percentile to the 75th
Appropriate if… interval (also perhaps okay for ordinal, if the sample is not small and the range is not short)

More Measures of Dispersion
Variance
Formula: $s^2 = \frac{\sum (Y_i - \bar{Y})^2}{n}$ (for population: $\sigma^2$), the average squared difference between each value and the mean
Appropriate if… interval
Standard Deviation
Formula: $s = \sqrt{s^2} = \sqrt{\frac{\sum (Y_i - \bar{Y})^2}{n}}$ (for population: $\sigma$), the square root of the variance
Appropriate if… interval
(SPSS, like most sample formulas, divides by n − 1 rather than n.)

Standard Deviations & Standard Errors
Deviations across sample distributions:
About 68% of the values in a normal distribution fall within 1 standard deviation of the mean, 95% within 1.96, and 99.7% within 3.
Deviations across sampling distributions:
Same idea, same distribution, same %’s
But the standard deviation of a sampling distribution is called the standard error
Don’t confuse that with sampling error

Criteria for Selection
What else makes a measure of central tendency or dispersion “appropriate”? (in order of importance)
1. Scale of measurement – no medians or means for nominal data; don’t trust means for ordinal data
2. Shape of distribution – e.g. for skewed interval data, use both; the mean will differ from the median in the direction of skew (i.e. higher if skewed right, lower if skewed left)
3. Robustness – here, a statistic is “robust” if it resists sampling deviations. The mean is fairly robust, but the median is less misleading if there are scraggly tails
4. Efficiency – use the highest level of precision available (modes are least precise)
5. When in doubt, use more than one
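
The measures above can be computed side by side with Python’s standard `statistics` module. A minimal sketch, assuming a hypothetical set of interval-level scores: it computes each measure of central tendency and dispersion from the tables, shows the variance both as on the slide (divide by n) and as the n − 1 sample version, and checks the normal-curve percentages. Percentile conventions differ between programs, so the IQR from `statistics.quantiles` (Python’s default method) may not match SPSS output exactly.

```python
import statistics
from collections import Counter

# Hypothetical interval-level scores, for illustration only
scores = [2, 4, 4, 5, 7, 9, 11, 14]
n = len(scores)

# --- Central tendency ---
mean = sum(scores) / n                    # sum of the values / number of values
median = statistics.median(scores)        # even n: average of the two middle cases
mode_value, mode_freq = Counter(scores).most_common(1)[0]

# --- Dispersion ---
variation_ratio = 1 - mode_freq / n       # share of cases NOT in the modal category
value_range = max(scores) - min(scores)   # maximum minus minimum
q1, _, q3 = statistics.quantiles(scores, n=4)  # Python's default method; conventions vary
iqr = q3 - q1                             # spread of the middle 50% of cases

var_n = sum((y - mean) ** 2 for y in scores) / n  # divide by n, as on the slide
var_n1 = statistics.variance(scores)               # sample version, divides by n - 1
sd_n = var_n ** 0.5                                # square root of the variance

# --- Normal-curve percentages (68 / 95 / 99.7 rule) ---
z = statistics.NormalDist()               # standard normal: mean 0, sd 1
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 1.96, 3)}

print(mean, median, mode_value, variation_ratio, value_range, iqr)
print(var_n, round(var_n1, 3), round(sd_n, 3))
print({k: round(p, 4) for k, p in within.items()})
```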

Question 1 from 1st Day Quiz
Nominal: Mode and variation ratio (because there are no alternatives)
Ordinal: Median and Range (because you can at least put the values in order)
Interval: Mean and Standard deviation (taking advantage of equal increments between values)

Questions 3-6 from 1st Day Quiz
3. Which level(s) appropriate for crosstabs? Ord, Nom
4. Which level(s) for t test or F test (ANOVA)? Interval (t: 2 levels of Ord or Nom; F: >2 of O)
5. Which level(s) for regression? Any (though most of you did interval, if that)
6. Formula for z (or t)? Math 140!

4. To measure nominal dispersion use…
1. Mode
2. Median
3. Variance
4. Variation Ratio
5. Standard Deviation

Typical Choices
Nominal
Central tendency: mode (only!)
Dispersion: variation ratio (or the Index of Qualitative Variation)
Ordinal
Central tendency: median (& mode?)
Dispersion: range (IQR?)
Interval
Central tendency: mean (median if skewed)
Dispersion: standard deviation (mode “ “)

SPSS Tips for Today’s Lab
Two options for basic stats (used the 2nd)
ANALYZE – DESCRIPTIVE STATISTICS – DESCRIPTIVES
Mean, stdev, a few others – but not all you need!
ANALYZE – DESCRIPTIVE STATISTICS – FREQUENCIES
Stats & choose which ones you want
More options, more output, tables, etc.
Use this one!!

SPSS Tips for HW5 (Happy etc)
Chi-square
ANALYZE – DESCRIPTIVE STATISTICS – CROSSTABS
DV = row; IV = column
Stats – Chi-square
Correlations
ANALYZE – CORRELATE – BIVARIATE
Just get those stats!
Don’t worry about crosstab or corr. matrix
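
SPSS computes chi-square for you, but the arithmetic behind it is short. A minimal Python sketch, assuming a hypothetical 2×2 crosstab laid out as in the HW5 tip (DV in rows, IV in columns): each expected count is row total × column total / grand total, and the Pearson chi-square sums (observed − expected)² / expected over every cell. It returns only the statistic and its degrees of freedom, (rows − 1)(columns − 1); the p-value SPSS reports alongside it is not computed here.

```python
from typing import List

def chi_square(table: List[List[float]]) -> float:
    """Pearson chi-square statistic for a crosstab of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if DV and IV were unrelated
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: rows = DV (happy: yes, no), columns = IV (group A, group B)
observed = [[30, 20],
            [10, 40]]

stat = chi_square(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(stat, 2), df)
```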
Lab Exercise
Pick a dataset (from 497 or 364 sites)
Pick 3 variables
One for each level of measurement (I, O, N)
Do NOT use the “measure” column in SPSS to pick!!
Look at values column and/or codebook and/or freq tables
Submit
Printout of frequency tables & histograms
Description of the shape of each distribution
Report central tendency & dispersion of each
Report the pieces and tell a story
Use the statistics to describe the sample
5. To measure nominal dispersion use…
1. Mode
2. Median
3. Variance
4. Variation Ratio
5. Standard Deviation

Quiz Scores by Clicker Attitude [chart: Points by Team for Rock my world, Add some value, Whatever…, Really, really suck.]