Overview
• Identify and explain descriptive and inferential
(parametric) statistics.
• Descriptive statistics include the measures of central
tendency and dispersion:
• Mean, median, mode, standard deviation, and
standard error of the mean.
• The inferential statistics are:
• The z-test, t-test, F-test, and the Pearson correlation
coefficient (r).
Overview
• To provide the information necessary to interpret the
tables for these tests as presented in the Statistical
Package for the Social Sciences (SPSS).
• This includes:
• Confidence level
• Confidence interval
• Significance (sig.)
• Assumption of equal variance
Quantitative Research
• Most of the quantitative research done at the MBA level
is survey research.
• Key issues in quantitative survey research design include:
• What are the variables for the study?
• How will these variables be measured?
• How will the sample be selected?
• How will the data be collected?
• How many participants are needed?
• How will the data be analyzed?
Quantitative Research
• There are four types of quantitative research questions.
Type of Research Question | Survey Research Example
Descriptive | What are participants’ opinions about hazardous waste?
Group Difference | Is there a significant difference in opinions about hazardous waste between Democrats and Republicans?
Relationship | Is there a significant relationship between the age of participants and their opinions about hazardous waste?
Prediction | Can opinions about hazardous waste be accurately predicted by level of education or religious preference?
Descriptive Questions
• Descriptive questions count and group responses.
• In our study regarding participants’ opinions about
hazardous waste, descriptive questions would be used
to:
• Count similar opinions
• Group opinions by demographic characteristics
(age, education level, sex, etc.)
• Describe grouped opinions by average, most often
expressed, degree of intensity, etc.
• Descriptive statistics are used to answer these questions:
• Mean, median, mode, standard deviation.
Group Difference Questions
• Group difference questions compare the responses of
one group of participants with those of another group to
identify significant differences.
• A single sample can be compared to a constant – a
researcher might want to test whether a group’s
average level of concern expressed on a hazardous
waste survey differs from 10 (on a 1 to 10 scale).
• Two independent samples can be compared with
each other - a researcher might want to test whether
the average level of concern expressed by
Republicans on a hazardous waste survey differs
from that expressed by Democrats.
Group Difference Questions
• Two paired samples can be compared with each
other – a researcher might want to test whether the
average level of concern expressed by a group on a
hazardous waste survey before a hazardous waste
ad campaign differs from that expressed by the same
group after the ad campaign.
• Mean comparison statistics are used to answer these
questions.
• One-sample t test
• Independent samples t test
• Paired samples t test
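Below is a minimal sketch of these three mean-comparison t tests using scipy.stats. The concern scores (1 to 10 scale) and group labels are hypothetical, invented only to illustrate the calls.

```python
# A minimal sketch of the three mean-comparison t tests listed above,
# using scipy.stats. The concern scores and group labels are hypothetical.
from scipy import stats

group_a = [7, 8, 6, 9, 7, 8, 5, 9, 6, 8]   # hypothetical group 1 scores
group_b = [5, 6, 4, 7, 5, 6, 5, 4, 6, 5]   # hypothetical group 2 scores
before  = [5, 6, 4, 7, 5, 6, 5, 4, 6, 5]   # same group, before ad campaign
after   = [6, 7, 5, 8, 6, 7, 6, 5, 7, 6]   # same group, after ad campaign

# One-sample t test: does the group mean differ from the constant 10?
print(stats.ttest_1samp(group_a, popmean=10))

# Independent samples t test: do two independent group means differ?
print(stats.ttest_ind(group_a, group_b, equal_var=True))

# Paired samples t test: did the same group change after the campaign?
print(stats.ttest_rel(before, after))
```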
Relationship Questions
• Relationship questions look for relationships between
dependent and independent variables.
• A researcher may want to determine if there is a
significant relationship between a participant’s age
and the level of concern expressed on a hazardous
waste survey.
• Bivariate correlation statistics are used to answer these
questions.
• Pearson’s correlation coefficient.
Prediction Questions
• Prediction questions look for independent variables that
can be used to predict the outcome of the dependent
variable.
• A researcher may want to determine if participants’
level of education or religious preference predicts the
average level of concern expressed on a hazardous
waste survey.
• Forecasting statistics are used to answer these
questions.
• Linear regression analysis
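A short sketch of a prediction analysis using scipy.stats.linregress; the years-of-education and concern values are hypothetical.

```python
# A minimal sketch of a prediction question answered with linear
# regression via scipy.stats.linregress. All values are hypothetical.
from scipy import stats

education_years = [10, 12, 12, 14, 16, 16, 18, 20]   # hypothetical predictor
concern_scores  = [4, 5, 6, 6, 7, 8, 8, 9]           # hypothetical outcome

result = stats.linregress(education_years, concern_scores)
print(f"concern = {result.intercept:.2f} + {result.slope:.2f} * education")
print(f"R^2 = {result.rvalue ** 2:.2f}, p = {result.pvalue:.4f}")

# Predicted concern for a participant with 15 years of education
print(result.intercept + result.slope * 15)
```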
Levels of Measurement
• In putting together a survey the researcher must pay
attention to the level of measurement used to collect
data on the variables in the study.
• There are four levels or scales of measurement.
• Nominal
• Ordinal
• Interval
• Ratio
Nominal Measurements
• In nominal measurement the numerical values just
"name" the attribute uniquely. No ordering of the cases is
implied.
• For example, jersey numbers in basketball are measured
at the nominal level. A player with number 30 is not more
of anything than a player with number 15, and is
certainly not twice whatever number 15 is.
• Examples include:
• 1 = Male, 2 = Female
• 1 = Caucasian, 2 = Indian, 3 = Asian
• 1 = Married, 2 = Single, 3 = Divorced
Ordinal Measurements
• In ordinal measurement the attributes can be rank-ordered.
Here, distances between attributes do not have any
meaning.
• For example, you might code Educational Attainment as
0 = less than H.S., 1 = some H.S., 2 = H.S. degree,
3 = some college, 4 = college degree, 5 = post-college.
• In this measure, higher numbers mean more education.
But is the distance from 0 to 1 the same as the distance
from 3 to 4? Of course not. The interval between values
is not interpretable in an ordinal measure.
• Examples include:
• Movie ratings
• Class standing – Freshman, Sophomore, Junior, Senior
Interval Measurements
• In interval measurement the distance between attributes
does have meaning. For temperature (in Fahrenheit), the
distance from 30 to 40 is the same as the distance from
70 to 80.
• Because the interval between values is interpretable, we
can compute an average of an interval variable, whereas
we cannot for an ordinal variable.
• Note that in interval measurement ratios don't make any
sense - 80 degrees is not twice as hot as 40 degrees.
• Examples include measures where 0 (zero) does not
mean the absence of the attribute.
Ratio Measurements
• In ratio measurement there is always an absolute zero
that is meaningful. This means that you can construct a
meaningful fraction (or ratio) with a ratio variable.
• Weight is a ratio variable.
• In applied social research most "count" variables are
ratio, for example, the number of clients in the past six
months. Why? Because you can have zero clients and
because it is meaningful to say that "...we had twice as
many clients in the past six months as we did in the
previous six months."
• Examples include: degrees Kelvin, annual income in
dollars, length or distance in inches, feet, miles, etc.
Levels of Measurement
Level of Measurement | Allowable Descriptive Statistics | Allowable Arithmetic Operations
Nominal | Mode | Counts
Ordinal | Median, Mode | Greater or less than
Interval | Mean, Median, Mode, Standard Deviation | Addition and subtraction of scale values
Ratio | Mean, Median, Mode, Standard Deviation | Multiplication and division of scale values
• Inferential statistics (the z test, t test, F test, Pearson’s correlation
coefficient, and linear regression) require data measured at the interval
or ratio level.
• In social science research the Likert scale is often used
to promote ordinal values to interval values. This allows
the data to be analyzed using inferential statistics.
The Likert Scale
• The scale is named after Rensis Likert, who first
developed it.
• The Likert scale is a psychometric scale commonly used
in questionnaires, and is the most widely used scale in
survey research.
• When responding to a Likert questionnaire item,
respondents specify their level of agreement to a
statement.
A Seven Level Likert Scale
1   2   3   4   5   6   7
The Likert Scale
• Likert scales can be used to evaluate subjective or
objective criteria.
• They are bipolar scales: generally some level of
agreement or disagreement is measured.
• Many social science researchers recommend scales
consisting of 7 to 9 levels.
The Likert Scale
• Likert scale values should always be accompanied by
item labels.
• Without labels a mean result with a scale value of 1.9
would be reported as 1.9 on a scale of 1 to 7.
• With labels a mean result of 1.9 can additionally be
reported as ‘Dissatisfied’. This adds meaning to the
interpretation of the results.
A Seven Level Likert Scale
1 = Very Dissatisfied   2 = Dissatisfied   3 = Somewhat Dissatisfied
4 = Neither Dissatisfied nor Satisfied   5 = Somewhat Satisfied
6 = Satisfied   7 = Very Satisfied
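A small sketch, assuming the labeled seven-level scale above, of how a mean score can be reported with its nearest item label; the report() helper is hypothetical and not an SPSS feature.

```python
# Hypothetical helper: attach the nearest item label to a mean score
# from the labeled seven-level satisfaction scale above.
LABELS = {
    1: "Very Dissatisfied", 2: "Dissatisfied", 3: "Somewhat Dissatisfied",
    4: "Neither Dissatisfied nor Satisfied", 5: "Somewhat Satisfied",
    6: "Satisfied", 7: "Very Satisfied",
}

def report(mean_score: float) -> str:
    # Round the mean to the nearest scale value and attach its label.
    nearest = min(LABELS, key=lambda value: abs(value - mean_score))
    return f"{mean_score:.1f} on a scale of 1 to 7 ({LABELS[nearest]})"

print(report(1.9))   # -> 1.9 on a scale of 1 to 7 (Dissatisfied)
```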
The Likert Scale
• Some researchers object to the middle item, which indicates
the absence of satisfaction (or the absence of agreement or
disagreement).
• They suggest that participants should either be forced to
choose a side or be offered a separate no response option
outside the scale.
The Likert Scale
• Moving the no response option out of the scale adds to
the meaningfulness of measures of central tendency
(mean, median, mode).
• It also allows the data to be interpreted in two groups.
• The researcher can indicate the number of participants
responding as neither satisfied nor dissatisfied.
• And, when the calculations are made, they are based
on participants with an opinion about satisfaction.
A Six Level Likert Scale
1 = Very Dissatisfied   2 = Dissatisfied   3 = Somewhat Dissatisfied
4 = Somewhat Satisfied   5 = Satisfied   6 = Very Satisfied
0 = Neither Satisfied nor Dissatisfied
Types of Statistics
• There are two types of statistics used in social science
research.
• Descriptive
• Inferential
• Descriptive statistics refer to methods used to organize,
summarize, and tabulate data.
• Descriptive statistics provide a picture of what happened
in the study.
• Descriptive statistics provide a basis for inferential
statistics.
Types of Statistics
• Inferential statistics refers to methods used to draw
inferences about a population based on descriptive data
available on a sample drawn from the population.
The Mean
• The mean is the sum of
the individual observations (x)
divided by the number of
observations (n).
• μ (mu) is the mean of the
entire population.
• x̄ (x-bar) is the mean of
the sample.
Observation:  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
x:           60  34  74  10  86  59  34  50  43  59  68  35  53  28  82  47  60  40  19  59
Σx = 1,000

x̄ = Σx / n = 1,000 / 20 = 50
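A quick check of the calculation, using Python's statistics module on the 20 observations above.

```python
# Verify the mean of the 20 observations shown above.
from statistics import mean

x = [60, 34, 74, 10, 86, 59, 34, 50, 43, 59,
     68, 35, 53, 28, 82, 47, 60, 40, 19, 59]

print(sum(x), len(x))    # 1000 20
print(sum(x) / len(x))   # 50.0  (x-bar = sum of x / n)
print(mean(x))           # 50    (same result from the statistics module)
```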
The Median
• The median is the number
that separates the higher
half of a sample from the
lower half.
• It is found by arranging the
observations from highest
to lowest and picking the
middle one.
• When the number of
observations is even, the
median is the mean* of the
two middle observations.
Observations ranked from highest to lowest:
86, 82, 74, 68, 60, 60, 59, 59, 59, 53, 50, 47, 43, 40, 35, 34, 34, 28, 19, 10

Median = (53 + 50) / 2 = 51.5

* The strength of the median as a measure of central tendency is that,
unlike the mean, it is a value that occurs in the sample. This strength
is nullified when the median is the mean of the two middle observations.
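The same check for the median of the 20 observations.

```python
# Verify the median of the 20 observations.
from statistics import median

x = [60, 34, 74, 10, 86, 59, 34, 50, 43, 59,
     68, 35, 53, 28, 82, 47, 60, 40, 19, 59]

# With an even number of observations the median is the mean of the two
# middle values (53 and 50).
print(median(x))   # 51.5
```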
The Mode
• The mode is the value
that occurs most
frequently in a sample.
• When multiple values
occur with the highest
frequency, the sample is
said to be bi-modal or
multi-modal.
Observations ranked from highest to lowest:
86, 82, 74, 68, 60, 60, 59, 59, 59, 53, 50, 47, 43, 40, 35, 34, 34, 28, 19, 10

Mode = 59 (occurs three times)
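The same check for the mode; statistics.multimode also covers the bi-modal or multi-modal case by listing every value tied for the highest frequency.

```python
# Verify the mode of the 20 observations.
from statistics import mode, multimode

x = [60, 34, 74, 10, 86, 59, 34, 50, 43, 59,
     68, 35, 53, 28, 82, 47, 60, 40, 19, 59]

print(mode(x))        # 59 (occurs three times)
print(multimode(x))   # [59]
```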
Standard Deviation
• It is a measure of the
dispersion of a collection
of numbers.
• It indicates how widely
spread the values in a
dataset are with respect
to their mean.
A data set with a mean of 50 (shown
in blue) and a standard deviation (σ)
of 20.
Standard Deviation
• It is calculated by determining the square root of the variance:

s = √( Σ(x - x̄)² / (n - 1) ) = √( 7,612 / 19 ) ≈ 20

Obs    x    x̄    x - x̄   (x - x̄)²
1     60   50     10      100
2     34   50    -16      256
3     74   50     24      576
4     10   50    -40     1600
5     86   50     36     1296
6     59   50      9       81
7     34   50    -16      256
8     50   50      0        0
9     43   50     -7       49
10    59   50      9       81
11    68   50     18      324
12    35   50    -15      225
13    53   50      3        9
14    28   50    -22      484
15    82   50     32     1024
16    47   50     -3        9
17    60   50     10      100
18    40   50    -10      100
19    19   50    -31      961
20    59   50      9       81
                  Σ(x - x̄)² = 7,612
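A sketch of the standard deviation calculation on the same 20 observations, first step by step and then with statistics.stdev.

```python
# Verify the sample standard deviation shown above.
from statistics import stdev

x = [60, 34, 74, 10, 86, 59, 34, 50, 43, 59,
     68, 35, 53, 28, 82, 47, 60, 40, 19, 59]

x_bar = sum(x) / len(x)                   # 50.0
ss = sum((xi - x_bar) ** 2 for xi in x)   # sum of squared deviations = 7612.0
s = (ss / (len(x) - 1)) ** 0.5            # sqrt(7612 / 19), about 20.02
print(ss, s)
print(stdev(x))                           # same result (n - 1 denominator)
```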
Standard Error of the Mean (SEM)
• Because the mean of the
population is usually unknown
it is important to estimate the
error between the sample
mean and the population
mean.
• The SEM is an unbiased
estimate of expected error in
the sample estimate of a
population mean.
SEM = σ / √n = 20 / √20 ≈ 4.47
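The SEM for the same sample: the standard deviation divided by the square root of n.

```python
# Verify the standard error of the mean for the same sample.
from statistics import stdev

x = [60, 34, 74, 10, 86, 59, 34, 50, 43, 59,
     68, 35, 53, 28, 82, 47, 60, 40, 19, 59]

sem = stdev(x) / len(x) ** 0.5
print(round(sem, 2))   # 4.48 here; the slides use s = 20 exactly,
                       # which gives 20 / sqrt(20) = 4.47
```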
Inferential Statistics
• With these descriptive statistics (mean, median, mode,
standard deviation, and standard error of the mean), we
are now able to statistically compare samples.
• We have our sample (n = 20):
• Mean = 50
• Median = 51.5
• Mode = 59
• Standard Deviation = 20
• Standard Error of the Mean = 4.47
• Let’s assume that we want to compare our sample with
another sample having a mean of 60 to determine if
there is a significant difference between the samples.
The z-test
• The z-test is used primarily with standardized testing to
determine if the test scores of a particular sample of test
takers are within or outside of the standard performance
of test takers (a second group of test scores).
• The assumptions necessary for the z-test are:
• The population standard deviation must be known.
• The sample must be random.
• The sample must be normally distributed.
• The null hypothesis for the test is that the means are
equal H0: μ1=μ2 (the samples come from the same
population).
The z-test
• First we need to determine the confidence level
(CL) we require for this comparison.
• The CL is determined by the researcher.
• The CL tells us how sure we can be of our results.
• Most social science research uses a CL of 95%.
• Next we need to determine the alpha level (α).
• α = 1-CL
• Most social science research uses an α of 5%.
The z-test
• The z score shows the
distance of our test mean
from our sample mean in
units of the standard error
of the mean.
• Our test mean = 60
• Our sample mean = 50
• Our SEM = 4.47
• Therefore, our z-score is 2.23

z = (test mean - sample mean) / SEM = (60 - 50) / 4.47 = 2.23
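A sketch of the z calculation above, with scipy supplying the two-tailed critical value for a 95% confidence level.

```python
# Compute the z score from the slide's values and compare it with the
# two-tailed critical value for a 95% confidence level.
from scipy import stats

sample_mean = 50
comparison_mean = 60
sem = 4.47

z = (comparison_mean - sample_mean) / sem
print(round(z, 2))                   # 2.24 (the slides round to 2.23)

critical_z = stats.norm.ppf(0.975)   # 1.96 for a two-tailed alpha of .05
print(abs(z) > critical_z)           # True -> reject H0
```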
The Normal Distribution
That means our comparison
mean is 2.23 standard errors
of the mean above our
sample mean.

(In the normal distribution the mean, median, and mode coincide at the center.)
The Normal Distribution
A z-table tells us that 48.75%
of scores fall between the
mean (z = 0) and our score of
60 (z = 2.23).
In the normal distribution,
50% of scores fall below the
mean (between z = 0 and -∞).
The Normal Distribution
Our confidence level is 95%.
The comparison mean of 60 lies
above 98.75% (48.75% + 50%) of
the scores, placing it outside
the central 95% of the
distribution. Therefore, we
must reject H0: μsamp = μcomp.
The Normal Distribution
The critical z-scores for our
95% CL in this two-tailed test
are ±1.96; 95% of scores fall
between them.
Confidence Interval
• Now that we know the
critical z-score for the 95%
CL for the z-test we can
calculate our confidence
interval.
• z-score for 95% CL = ±1.96
• SEM = 4.47
• Mean = 50
The upper bound is: ub = x̄ + (SEM)(z) = 50 + (4.47)(1.96) = 50 + 8.76 = 58.76
The lower bound is: lb = x̄ - (SEM)(z) = 50 - (4.47)(1.96) = 50 - 8.76 = 41.24
Confidence Interval
• The range of means for which we can fail to reject
H0: (μsamp=μcomp) is 41.24 to 58.76.
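The same 95% confidence interval computed in Python.

```python
# Compute the 95% confidence interval from the slide's values.
sample_mean = 50
sem = 4.47
z = 1.96   # critical z-score for the 95% confidence level

lower = sample_mean - z * sem
upper = sample_mean + z * sem
print(round(lower, 2), round(upper, 2))   # 41.24 58.76
```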
Errors in Hypothesis Testing
• Hypothesis testing provides a negative assessment.
• Researchers do not prove hypotheses. They test them
in an attempt to disprove them.
• Researchers either reject H0 or fail to reject H0.
• There are two possible errors in hypothesis testing.
• Type 1 Error: reject H0 when H0 is true.
• Type 2 Error: fail to reject H0 when H0 is false.
• The α level of the test sets the probability of making a
Type 1 Error.
• In most social science research the probability of a
Type 1 Error is set at 5%.
The t distribution
• The t test is the mean comparison test most often used
in social science research. The z-test requires that the
population standard deviation is known. That is almost
never the case in social science research.
• The assumptions of the t test are:
• The sample must be random.
• The sample must be normally distributed.
• However, the t test is much more forgiving of violations of
these assumptions (truly random samples are difficult to obtain).
The t Distribution
• The t test is much better suited to smaller samples.
• The shape of the t distribution changes with sample size; below about 200 it has noticeably heavier tails than the normal curve.
• With sample sizes above 200 the t and z tests give the same results.
The t Distribution
• By placing more area in the tails of the distribution for smaller
samples (raising the critical values), the t test decreases the
likelihood of a Type 1 error.
• t test and z test results are read and interpreted in the same way.
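A sketch comparing two-tailed critical values (α = .05) from the t and normal distributions, for a few hypothetical sample sizes, to show the convergence described above.

```python
# Compare t and z critical values (two-tailed, alpha = .05) across
# hypothetical sample sizes.
from scipy import stats

z_crit = stats.norm.ppf(0.975)
for n in (5, 10, 20, 50, 200):
    t_crit = stats.t.ppf(0.975, df=n - 1)
    print(f"n = {n:>3}: t critical = {t_crit:.3f}, z critical = {z_crit:.3f}")
# At n = 5 the t critical value is about 2.776; by n = 200 it is about
# 1.972, essentially the z value of 1.960.
```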
The F Distribution
• The F test has a variety of applications.
• It measures variance within and between samples.
• In inferential statistics it is used to test whether two samples
have equal variances (the equal-variance assumption of the
independent samples t test) or to compare the means of more than
two samples in an analysis of variance (ANOVA).
• It is read and interpreted in the same way as the t test
and the z test.
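A minimal one-way ANOVA sketch with scipy.stats.f_oneway; the three groups of concern scores are hypothetical.

```python
# One-way ANOVA (F test) on hypothetical concern scores from three groups.
from scipy import stats

group_1 = [7, 8, 6, 9, 7, 8]   # hypothetical
group_2 = [5, 6, 4, 7, 5, 6]   # hypothetical
group_3 = [6, 7, 5, 8, 6, 7]   # hypothetical

f_stat, p_value = stats.f_oneway(group_1, group_2, group_3)
# Read like a t or z result: a p value below .05 leads us to reject the
# null hypothesis that all group means are equal.
print(round(f_stat, 2), round(p_value, 4))
```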
SPSS
• Pause here and use the Statistical Package for the
Social Sciences (SPSS) to compare sample means.
Bivariate Correlation
• The relationship between two variables can be
expressed numerically with a correlation coefficient.
• The correlation coefficient is a measure of the strength of
association between the two variables.
• The correlation coefficient is a value between -1 and 1.
• The stronger the correlation between two variables,
the closer the correlation coefficient is to -1 (for
negative correlations) or +1 (for positive correlations).
Bivariate Correlation
• Hinkle provides a table used by many social science
researchers to describe the level of correlation present in
their data.
Interpretation | Positive Correlation | Negative Correlation
Very High | .90 to 1.00 | -.90 to -1.00
High | .70 to .89 | -.70 to -.89
Moderate | .50 to .69 | -.50 to -.69
Low | .30 to .49 | -.30 to -.49
Little if any | .00 to .29 | -.00 to -.29
The Pearson r
• The Pearson correlation coefficient or Pearson r is used
when both the X and Y variables are measured on at
least the interval scale.
• All correlation coefficients operate in a similar fashion.
• The H0 for the Pearson r is H0: ρ = 0; that is, there is no
correlation in the population.
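A sketch of the Pearson r with scipy.stats.pearsonr; the age and concern scores are hypothetical interval-level data.

```python
# Pearson correlation between hypothetical age and concern scores.
from scipy import stats

age     = [22, 25, 31, 36, 42, 48, 55, 61, 67, 72]   # hypothetical
concern = [3, 4, 4, 5, 6, 6, 7, 8, 8, 9]             # hypothetical

r, p_value = stats.pearsonr(age, concern)
print(round(r, 2), round(p_value, 4))
# An r near +1 would fall in Hinkle's "Very High" positive range; a p
# value below .05 leads us to reject H0: rho = 0.
```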
SPSS
• Pause here and use the Statistical Package for the
Social Sciences (SPSS) to calculate correlations.