IST 203 Statistics for Social Sciences
(Section 5231, 5232)
Review for Final Examination
[ Lectures 1, 2, 3 (A, B), 4A, 5A, 6, 7 ]
Bangkok University International College
May 7, 2011
1-2
IST 203: Statistics for Social Sciences
Lecture 1
1-3
What is Statistics?
• Statistics is the science of collecting, organizing,
analyzing, interpreting, and presenting data.
• A statistic is a single measure (number) used to
summarize a sample data set. For example, the
average height of students in this class.
• A statistician is an expert with at least a master’s
degree in mathematics or statistics or a trained
professional in a related field.
McGraw-Hill/Irwin
© 2008 The McGraw-Hill Companies, Inc. All rights reserved.
1-4
Uses of Statistics
 Two primary uses for statistics:
• Descriptive statistics – the collection, organization,
presentation and summary of data.
• Inferential statistics – generalizing from a sample
to a population, estimating unknown parameters,
drawing conclusions, making decisions.
1-5
Statistical Challenges
 Working with Imperfect Data
• State any assumptions and limitations and use
generally accepted statistical tests to detect
unusual data points or to deal with missing data.
 Dealing with Practical Constraints
• You will face constraints on the type and quantity
of data you can collect.
1-6
Statistical Challenges
 Upholding Ethical Standards
• Know and follow accepted procedures, maintain
data integrity, carry out accurate calculations,
report procedures, protect confidentiality, cite
sources and financial support.
 Using Consultants
• Hire consultants at the beginning of the project,
when your team lacks certain skills or when an
unbiased or informed view is needed.
1-7
Statistical Pitfalls
 Pitfall 1: Making Conclusions about a Large
Population from a Small Sample
• Be careful about making generalizations from
small samples (e.g., a group of 10 patients).
 Pitfall 2: Making Conclusions from
Nonrandom Samples
• Be careful about making generalizations from
retrospective studies of special groups (e.g.,
heart attack patients).
1-8
Statistical Pitfalls
 Pitfall 3: Attaching Importance to Rare
Observations from Large Samples
• Be careful about drawing strong inferences from
events that are not surprising when looking at the
entire population (e.g., winning the lottery).
 Pitfall 4: Using Poor Survey Methods
• Be careful about using poor sampling methods or
vaguely worded questions (e.g., anonymous
survey or quiz).
1-9
Statistical Pitfalls
 Pitfall 5: Assuming a Causal Link Based on
Observations
• Be careful about drawing conclusions when no
cause-and-effect link exists (e.g., most shark
attacks occur between 12 p.m. and 2 p.m.).
 Pitfall 6: Making Generalizations about
Individuals from Observations about Groups
• Avoid reading too much into statistical
generalizations (e.g., men are taller than
women).
1-10
Statistical Pitfalls
 Pitfall 7: Unconscious Bias
• Be careful about unconsciously or subtly allowing
bias to color handling of data (e.g., heart disease
in men vs. women).
 Pitfall 8: Attaching Practical Importance to
Every Statistically Significant Study Result
• Statistically significant effects may lack practical
importance (e.g., Austrian military recruits born in
the spring average 0.6 cm taller than those born
in the fall).
1-11
IST 203: Statistics for Social Sciences
Lecture 2
1-12
Data Vocabulary
• Data is the plural form of the Latin datum (a “given”
fact).
• In scientific research, data arise from experiments
whose results are recorded systematically.
• In business, data usually arise from accounting
transactions or management processes.
• Important decisions may depend on data.
1-13
Data Vocabulary
 Subjects, Variables, Data Sets
• We treat data as plural and use data set to mean a
particular collection of data taken as a whole.
• Observation – each data value.
• Subject (or individual) – an item for study (e.g., an
employee in your company).
• Variable – a characteristic about the subject or
individual (e.g., employee’s income).
1-14
Data Vocabulary
 Subjects, Variables, Data Sets
• Three types of data sets:
Data Set       Variables       Typical Tasks
Univariate     One             Histograms, descriptive statistics, frequency tallies
Bivariate      Two             Scatter plots, correlations, simple regression
Multivariate   More than two   Multiple regression, data mining, econometric modeling
1-15
Data Vocabulary
 Subjects, Variables, Data Sets
• Consider a multivariate data set with 5 variables
and 8 subjects: 5 × 8 = 40 observations.
1-16
Data Vocabulary
 Attribute Data
• Also called categorical, nominal or qualitative data.
• Values are described by words rather than
numbers.
• For example,
- Automobile style (e.g., X = full, midsize,
compact, subcompact).
- Mutual fund (e.g., X = load, no-load).
1-17
Data Vocabulary
 Data Coding
• Coding refers to using numbers to represent
categories to facilitate statistical analysis.
• Coding an attribute as a number does not make
the data numerical.
• For example,
1 = Bachelor’s, 2 = Master’s, 3 = Doctorate
• Rankings may exist, for example,
1 = Liberal, 2 = Moderate, 3 = Conservative
1-18
Data Vocabulary
 Binary Data
• A binary variable has only two values,
1 = presence, 0 = absence of a characteristic of
interest (codes themselves are arbitrary).
• For example,
1 = employed, 0 = not employed
1 = married, 0 = not married
1 = male, 0 = female
1 = female, 0 = male
• The coding itself has no numerical value so binary
variables are attribute data.
1-19
Data Vocabulary
 Numerical Data
• Numerical or quantitative data arise from counting
or some kind of mathematical operation.
• For example,
- Number of auto insurance claims filed in
March (e.g., X = 114 claims).
- Ratio of profit to sales for last quarter
(e.g., X = 0.0447).
• Can be broken down into two types – discrete or
continuous data.
1-20
Data Vocabulary
 Discrete Data
• A numerical variable with a countable number of
values that can be represented by an integer (no
fractional values).
• For example,
- Number of Medicaid patients (e.g., X = 2).
- Number of takeoffs at O’Hare (e.g., X = 37).
1-21
Data Vocabulary
 Continuous Data
• A numerical variable that can have any value
within an interval (e.g., length, weight, time, sales,
price/earnings ratios).
• Any continuous interval contains infinitely many
possible values (e.g., 426 < X < 428).
1-22
Level of Measurement
 Four levels of measurement for data:
Level of Measurement   Characteristics          Example
Nominal                Categories only          Eye color (blue, brown, green, hazel)
Ordinal                Rank has meaning         Bond ratings (Aaa, Aab, C, D, F, etc.)
Interval               Distance has meaning     Temperature (57° Celsius)
Ratio                  Meaningful zero exists   Accounts payable ($21.7 million)
1-23
Level of Measurement
 Nominal Measurement
• Nominal data merely identify a category.
• Nominal data are qualitative, attribute, categorical
or classification data (e.g., Apple, Compaq, Dell,
HP).
• Nominal data are usually coded numerically,
codes are arbitrary (e.g., 1 = Apple, 2 = Compaq,
3 = Dell, 4 = HP).
• The only valid mathematical operations are counting
(e.g., frequencies) and simple statistics such as the mode.
1-24
Level of Measurement
 Ordinal Measurement
• Ordinal data codes can be ranked
(e.g., 1 = Frequently, 2 = Sometimes, 3 = Rarely,
4 = Never).
• Distance between codes is not meaningful
(e.g., distance between 1 and 2, or between 2 and
3, or between 3 and 4 lacks meaning).
• Many useful statistical tests exist for ordinal data.
Especially useful in social science, marketing and
human resource research.
1-25
Level of Measurement
 Interval Measurement
• Data can not only be ranked, but also have
meaningful intervals between scale points
(e.g., difference between 60°F and 70°F is same
as difference between 20°F and 30°F).
• Since intervals between numbers represent
distances, mathematical operations can be
performed (e.g., average).
• Zero point of interval scales is arbitrary, so ratios
are not meaningful (e.g., 60°F is not twice as
warm as 30°F).
1-26
Level of Measurement
 Likert Scales
• A special case of interval data frequently used in
survey research.
• The coarseness of a Likert scale refers to the
number of scale points (typically 5 or 7).
“College-bound high school students should be required to study a
foreign language.” (check one)
[ ] Strongly Agree
[ ] Somewhat Agree
[ ] Neither Agree Nor Disagree
[ ] Somewhat Disagree
[ ] Strongly Disagree
1-27
Level of Measurement
 Likert Scales
• A neutral midpoint (“Neither Agree Nor Disagree”)
is available when an odd number of scale points is used.
It can be omitted (an even number of scale points) to
force the respondent to “lean” one way or the other.
• Likert data are coded numerically (e.g., 1 to 5) but any
equally spaced values will work.
Likert coding: 1 to 5 scale   Likert coding: -2 to +2 scale
5 = Help a lot                +2 = Help a lot
4 = Help a little             +1 = Help a little
3 = No effect                  0 = No effect
2 = Hurt a little             -1 = Hurt a little
1 = Hurt a lot                -2 = Hurt a lot
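The claim that any equally spaced coding works can be checked with a short Python sketch. The responses below are hypothetical (they are not from the lecture). Recoding from the 1-to-5 scale to the -2-to-+2 scale only shifts every value, and therefore the mean, by the midpoint 3.

```python
from statistics import mean

# Hypothetical responses on the 1-to-5 coding (5 = Help a lot ... 1 = Hurt a lot).
responses_1_to_5 = [5, 4, 4, 3, 2, 5, 1]

# Recode to the -2-to-+2 scale by subtracting the midpoint (3).
responses_centered = [r - 3 for r in responses_1_to_5]

mean_1_to_5 = mean(responses_1_to_5)
mean_centered = mean(responses_centered)

# The two means differ only by the constant shift of 3.
print(mean_1_to_5, mean_centered)
```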
1-28
Sampling Concepts
 Sample or Census?
• A sample involves looking only at some items
selected from the population.
• A census is an examination of all items in a
defined population.
• Why can’t the United States Census survey every
person in the population?
- Mobility
- Illegal immigrants
- Budget constraints
- Incomplete responses or nonresponses
1-29
Sampling Concepts
Situations Where A Sample May Be Preferred:
Infinite Population
No census is possible if the population is infinite or of indefinite size
(an assembly line can keep producing bolts, a doctor can keep
seeing more patients).
Destructive Testing
The act of sampling may destroy or devalue the item (measuring
battery life, testing auto crashworthiness, or testing aircraft turbofan
engine life).
Timely Results
Sampling may yield more timely results than a census (checking
wheat samples for moisture and protein content, checking peanut
butter for aflatoxin contamination).
1-30
Sampling Concepts
Situations Where A Sample May Be Preferred:
Accuracy
Sample estimates can be more accurate than a census. Instead of
spreading limited resources thinly to attempt a census, our budget
of time and money might be better spent to hire experienced staff,
improve training of field interviewers, and improve data safeguards.
Cost
Even if it is feasible to take a census, the cost, either in time or
money, may exceed our budget.
Sensitive Information
Some kinds of information are better captured by a well-designed
sample, rather than attempting a census. Confidentiality may also
be improved in a carefully-done sample.
1-31
Sampling Concepts
Situations Where A Census May Be Preferred
Small Population
If the population is small, there is little reason to sample, for the effort of
data collection may be only a small part of the total cost.
Large Sample Size
If the required sample size approaches the population size, we might as
well go ahead and take a census.
Database Exists
If the data are on disk we can examine 100% of the cases. But auditing or
validating data against physical records may raise the cost.
Legal Requirements
Banks must count all the cash in bank teller drawers at the end of each
business day. The U.S. Congress forbade sampling in the 2000 decennial
population census.
1-32
Sampling Concepts
 Parameters and Statistics
• Statistics are computed from a sample of n items,
chosen from a population of N items.
• Statistics can be used as estimates of parameters
found in the population.
• Symbols are used to represent population
parameters and sample statistics.
1-33
Sampling Concepts
 Parameters and Statistics
Parameter or Statistic?
Parameter
Any measurement that describes an entire population.
Usually, the parameter value is unknown since we
rarely can observe the entire population. Parameters
are often (but not always) represented by Greek
letters.
Statistic
Any measurement computed from a sample. Usually,
the statistic is regarded as an estimate of a population
parameter. Sample statistics are often (but not
always) represented by Roman letters.
1-34
Sampling Concepts
 Parameters and Statistics
• The population must be carefully specified and the
sample must be drawn scientifically so that the
sample is representative.
 Target Population
• The target population is the population we are
interested in (e.g., U.S. gasoline prices).
• The sampling frame is the group from which we
take the sample (e.g., 115,000 stations).
• The frame should not differ from the target
population.
1-35
Sampling Concepts
 Finite or Infinite?
• A population is finite if it has a definite size, even if
its size is unknown.
• A population is infinite if it is of arbitrarily large
size.
• Rule of Thumb: A population may be treated as
infinite when N is at least 20 times n (i.e., when
N/n ≥ 20).
1-36
Sampling Methods
Probability Samples
Simple Random Sample   Use random numbers to select items from a list (e.g., VISA cardholders).
Systematic Sample      Select every kth item from a list or sequence (e.g., restaurant customers).
Stratified Sample      Select randomly within defined strata (e.g., by age, occupation, gender).
Cluster Sample         Like stratified sampling except strata are geographical areas (e.g., zip codes).
1-37
Sampling Methods
Nonprobability Samples
Judgment Sample      Use expert knowledge to choose “typical” items (e.g., which employees to interview).
Convenience Sample   Use a sample that happens to be available (e.g., ask co-worker opinions at lunch).
1-38
Sampling Methods
 With or Without Replacement
• If we allow duplicates when sampling, then we are
sampling with replacement.
• Duplicates are unlikely when n is much smaller
than N.
• If we do not allow duplicates when sampling, then
we are sampling without replacement.
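A minimal Python sketch of the two approaches, assuming a hypothetical population of N = 100 numbered items and a sample of n = 10. The standard library's `random.choices` draws with replacement and `random.sample` draws without replacement; the seed is fixed only to make the sketch repeatable.

```python
import random

rng = random.Random(42)            # fixed seed so the sketch is repeatable
population = list(range(1, 101))   # hypothetical population of N = 100 item IDs
n = 10

# With replacement: the same item may be drawn more than once.
with_replacement = rng.choices(population, k=n)

# Without replacement: every drawn item is distinct.
without_replacement = rng.sample(population, k=n)

print(with_replacement)
print(without_replacement)
```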
1-39
Sampling Methods
 Systematic Sampling
• Sample by choosing every kth item from a list,
starting from a randomly chosen entry on the list.
• For example, starting at item 2, we sample every
k = 4 items to obtain a sample of n = 20 items from
a list of N = 78 items.
• Note that N/n = 78/20 = 3.9 ≈ 4.
1-40
Sampling Methods
 Systematic Sampling
• A systematic sample of n items from a population
of N items requires that periodicity k be
approximately N/n.
• Systematic sampling should yield acceptable
results unless patterns in the population happen to
recur at periodicity k.
• Can be used with unlistable or infinite populations.
• Systematic samples are well-suited to linearly
organized physical populations.
1-41
Sampling Methods
 Systematic Sampling
• For example, out of 501 companies, we want to
obtain a sample of 25. What should the periodicity
k be?
k = N/n = 501/25 ≈ 20.
• So, we should choose every 20th company from a
random starting point.
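The procedure can be sketched in Python as follows. The function name `systematic_sample` and the company IDs are hypothetical. One assumption is made explicit: the periodicity is rounded down (k = N // n) so that the list never runs out before n items are selected.

```python
import random

def systematic_sample(items, n, rng=random):
    """Take every k-th item, starting from a random entry among the first k."""
    N = len(items)
    k = N // n                  # periodicity, approximately N/n (rounded down)
    start = rng.randrange(k)    # random starting index in 0..k-1
    return items[start::k][:n]

companies = list(range(1, 502))   # hypothetical IDs for N = 501 companies
sample = systematic_sample(companies, 25, random.Random(1))
print(sample)
```

With N = 501 and n = 25 the periodicity is k = 20, so consecutive selected IDs are always 20 apart.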
1-42
Sampling Methods
 Stratified Sampling
• Utilizes prior information about the population.
• Applicable when the population can be divided
into relatively homogeneous subgroups of known
size (strata).
• A simple random sample of the desired size is
taken within each stratum.
• For example, from a population containing 55%
males and 45% females, randomly sample 110
males and 90 females (n = 200).
1-43
Sampling Methods
 Stratified Sampling
• Or, take a random sample of the entire population
and then combine individual strata estimates using
appropriate weights.
• For a population with L strata, the population size
N is the sum of the stratum sizes:
$N = N_1 + N_2 + \dots + N_L$
• The weight assigned to stratum j is
$w_j = N_j / N$
• For example, take a random sample of n = 200
and then weight the responses for males by
wM = .55 and for females by wF = .45.
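As a rough illustration of combining stratum estimates, the Python sketch below weights two stratum means by the population shares (0.55 and 0.45, i.e. each stratum's size divided by the population size N). The stratum means are hypothetical values chosen only for illustration.

```python
# Stratum shares from the example: 55% males, 45% females (share = N_j / N).
weights = {"male": 0.55, "female": 0.45}

# Hypothetical stratum means computed from the sample responses.
stratum_means = {"male": 3.2, "female": 3.8}

# Weighted estimate of the population mean: sum of weight times stratum mean.
weighted_mean = sum(weights[s] * stratum_means[s] for s in weights)
print(round(weighted_mean, 2))
```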
1-44
Sampling Methods
 Cluster Sample
• Strata consist of geographical regions.
• One-stage cluster sampling – sample consists of
all elements in each of k randomly chosen
subregions (clusters).
• Two-stage cluster sampling – first choose k
subregions (clusters), then choose a random
sample of elements within each cluster.
1-45
Sampling Methods
 Judgment Sample
• A nonprobability sampling method that relies on
the expertise of the sampler to choose items that
are representative of the population.
• Can be affected by subconscious bias (i.e.,
nonrandomness in the choice).
• Quota sampling is a special kind of judgment
sampling, in which the interviewer chooses a
certain number of people in each category.
1-46
Sampling Methods
 Convenience Sample
• Take advantage of whatever sample is available at
that moment. A quick way to sample.
 Sample Size
• Sample size depends on the inherent variability of
the quantity being measured and on the desired
precision of the estimate.
1-47
IST 203: Statistics for Social Sciences
Lecture 3 (A, B)
1-48
Visual Description
• Methods of organizing, exploring and summarizing
data include:
- Visual (charts and graphs)
provides insight into characteristics of a data set
without using mathematics.
- Numerical (statistics or tables)
provides insight into characteristics of a data set
using mathematics.
1-49
Visual Description
• Begin with univariate data (a set of n observations
on one variable) and consider the following:
Characteristic     Interpretation
Measurement        What are the units of measurement? Are the data integer or continuous?
                   Any missing observations? Any concerns with accuracy or sampling methods?
Central Tendency   Where are the data values concentrated? What seem to be typical or
                   middle data values?
1-50
Visual Description
Characteristic   Interpretation
Dispersion       How much variation is there in the data? How spread out are the data
                 values? Are there unusual values?
Shape            Are the data values distributed symmetrically? Skewed? Sharply peaked?
                 Flat? Bimodal?
1-51
Visual Description
 Measurement
• Look at the data and visualize how it was collected
and measured.
 Sorting
• Sort the data and then summarize in a graphical
display. Here are the sorted P/E ratios:
 8 10 10 10 13 13 14 14 15 15
16 16 17 18 19 19 20 20 21 22
23 26 26 27 29 29 34 48 55 68
• A histogram graphically displays sorted data.
1-52
Visual Description
 Sorting
• Sorting allows you to observe central tendency,
dispersion and shape as well as minimum, maximum
and range.
• What else do
you observe?
1-53
Dot Plots
• A dot plot is the simplest graphical display of n
individual values of numerical data.
- Easy to understand
- Not good for large samples (e.g., > 5,000).
 Steps in Making a Dot Plot
1. Make a scale that covers the data range
2. Mark the axes and label them
3. Plot each data value as a dot above the scale at
its approximate location
If more than one data value lies at about the same
axis location, the dots are piled up vertically.
1-54
Dot Plots
• Range of data shows dispersion.
• Clustering shows central tendency.
• Dot plots do not tell much of shape of distribution.
• Can add annotations (text boxes) to call attention
to specific features.
1-55
Dot Plots
 Small Sample: Home Prices
• Consider the following median home prices for
nine U.S. cities.
Metropolitan Area     Median Home Price ($000)
Akron OH              119.6
Bergen-Passaic NJ     363.0
Bradenton FL          170.4
Colorado Springs CO   181.7
Hartford CT           198.5
Milwaukee WI          186.2
Raleigh-Durham NC     173.8
San Francisco CA      560.2
Topeka KS             100.7
1-56
Dot Plots
 Small Sample: Home Prices
• A dot plot is useful to realtors as they discuss
patterns in home selling prices within their
community.
1-57
Dot Plots
 Comparing Groups
• A stacked dot plot compares two or more groups
using a common X-axis scale.
Frequency Distributions
and Histograms
1-58
 Bins and Bin Limits
• A frequency distribution is a table formed by
classifying n data values into k classes (bins).
• Bin limits define the values to be included in each
bin. Widths must all be the same.
• Frequencies are the number of observations within
each bin.
• Express as relative frequencies (frequency divided
by the total) or percentages (relative frequency
times 100).
Frequency Distributions
and Histograms
 Constructing a Frequency Distribution
1. Sort data in ascending order (e.g., P/E ratios)
 8 10 10 10 13 13 14 14 15 15
16 16 17 18 19 19 20 20 21 22
23 26 26 27 29 29 34 48 55 68
2. Choose the number of bins (k)
- k should be much smaller than n.
- Too many bins results in sparsely populated
bins, too few and dissimilar data values are
lumped together.
1-59
1-60
Frequency Distributions
and Histograms
 Constructing a Frequency Distribution
- Herbert Sturges proposes the following rule:
Sample Size (n)   Number of Bins (k)
16                5
32                6
64                7
128               8
256               9
512               10
1024              11
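The tabulated values are consistent with the usual statement of Sturges' rule, k = 1 + log2(n). A minimal Python sketch follows; the function name `sturges_bins` is hypothetical, and rounding up for sample sizes that are not powers of 2 is an assumption, since the table lists only powers of 2.

```python
import math

def sturges_bins(n):
    """Suggested number of bins: k = 1 + log2(n), rounded up to an integer."""
    return math.ceil(1 + math.log2(n))

for n in (16, 32, 64, 128, 256, 512, 1024):
    print(n, sturges_bins(n))
```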
Frequency Distributions
and Histograms
 Constructing a Frequency Distribution
3. Set the bin limits:
Bin width ≈ $\frac{x_{\max} - x_{\min}}{k}$
For example, for k = 7 bins, the approximate bin
width is:
Bin width ≈ $\frac{68 - 8}{7} = \frac{60}{7} \approx 8.57$
To obtain “nice” limits, we round the width to 10
and start the first bin at 0 to get bin limits:
0, 10, 20, 30, 40, 50, 60, 70
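The same arithmetic as a short Python sketch. The "nice" width of 10 and the starting limit of 0 are judgment calls taken from the example, not values the code derives.

```python
pe_min, pe_max, k = 8, 68, 7

raw_width = (pe_max - pe_min) / k   # 60 / 7, about 8.57
nice_width = 10                     # rounded up by judgment, as in the example
first_limit = 0                     # chosen start of the first bin

# k bins need k + 1 limits.
bin_limits = [first_limit + i * nice_width for i in range(k + 1)]
print(round(raw_width, 2), bin_limits)
```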
1-61
Frequency Distributions
and Histograms
1-62
 Constructing a Frequency Distribution
4. Put the data values in the appropriate bin
In general, the lower limit is included in the bin
while the upper limit is excluded.
5. Create the table. You can include:
Frequencies – counts for each bin
Relative frequencies – absolute frequency divided
by total number of data values.
Cumulative frequencies – accumulated relative
frequency values as bin limits increase.
Frequency Distributions
and Histograms
What are the bin limits for the P/E ratio data?
Bin Range              Frequency   Relative Frequency   Cumulative Relative Frequency
 0 ≤ P/E Ratio < 10     1          0.0333               0.0333
10 ≤ P/E Ratio < 20    15          0.5000               0.5333
20 ≤ P/E Ratio < 30    10          0.3333               0.8667
30 ≤ P/E Ratio < 40     1          0.0333               0.9000
40 ≤ P/E Ratio < 50     1          0.0333               0.9333
50 ≤ P/E Ratio < 60     1          0.0333               0.9667
60 ≤ P/E Ratio < 70     1          0.0333               1.0000
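The table can be reproduced with a few lines of Python. This is a sketch using the 30 sorted P/E ratios and the bin limits 0, 10, ..., 70, with the lower limit included and the upper limit excluded. Cumulative relative frequencies are computed from cumulative counts so that rounding error does not accumulate.

```python
pe_ratios = [8, 10, 10, 10, 13, 13, 14, 14, 15, 15, 16, 16, 17, 18, 19,
             19, 20, 20, 21, 22, 23, 26, 26, 27, 29, 29, 34, 48, 55, 68]
bin_limits = [0, 10, 20, 30, 40, 50, 60, 70]

n = len(pe_ratios)
bins = list(zip(bin_limits[:-1], bin_limits[1:]))

# Count the values in each bin: lower limit included, upper limit excluded.
frequencies = [sum(lower <= x < upper for x in pe_ratios) for lower, upper in bins]

cumulative = 0
for (lower, upper), f in zip(bins, frequencies):
    cumulative += f
    print(f"{lower:>2} <= P/E < {upper:<2}  {f:>2}  {f / n:.4f}  {cumulative / n:.4f}")
```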
1-63
Frequency Distributions
and Histograms
 Histograms
• A histogram is a graphical representation of a
frequency distribution.
• A histogram is a bar chart:
- Y-axis shows frequency within each bin.
- X-axis ticks show end points of each bin.
1-64
Frequency Distributions
and Histograms
 Histograms
• Consider 3 histograms for the P/E ratio data with
different bin widths. What do they tell you?
1-65
Frequency Distributions
and Histograms
1-66
 Modal Class
• A histogram bar that is higher than those on either
side.
• Monomodal – a single modal class.
• Bimodal – two modal classes.
• Multimodal – more than two modal classes.
• Modal classes may be artifacts of the way bin
limits are chosen.
Frequency Distributions
and Histograms
1-67
 Shape
• A histogram suggests the shape of the population.
• It is influenced by number of bins and bin limits.
• Skewness – indicated by the direction of the longer
tail of the histogram.
Left-skewed – (negatively skewed) a longer left
tail.
Right-skewed – (positively skewed) a longer right
tail.
Symmetric – both tail areas approximately the
same.
1-68
1-69
Line Charts
 Log Scales
• Arithmetic scale – distances on the Y-axis are
proportional to the magnitude of the variable being
displayed.
• Logarithmic scale – (ratio scale) equal distances
represent equal ratios.
• Use a log scale for the vertical axis when data vary
over a wide range, say, by more than an order of
magnitude.
• This will reveal more detail for small data values.
1-70
Scatter Plots
• A scatter plot shows n pairs of observations as
dots (or some other symbol) on an XY graph.
• A starting point for bivariate data analysis.
• Allows observations about the relationship
between two variables.
• Answers the question: Is there an association
between the two variables and if so, what kind of
association?
1-71
Scatter Plots
 Example: Birth Rates and Life Expectancy
• Consider the following data:
Nation          Birth Rate   Life Expectancy
Afghanistan     41.03        46.60
Canada          11.09        79.70
Finland         10.60        77.80
Guatemala       34.17        66.90
Japan           10.03        80.90
Mexico          22.36        72.00
Pakistan        30.40        62.70
Spain            9.29        79.10
United States   14.10        77.40
1-72
Scatter Plots
 Example: Birth Rates and Life Expectancy
• Here is a scatter plot with life expectancy on the
X-axis and birth rates on the Y-axis.
• Is there an association between the two variables?
• Is there a cause-and-effect relationship?
1-73
Scatter Plots
 Example: Aircraft Fuel Consumption
• Consider five observations on flight time and fuel
consumption for a twin-engine Piper Cheyenne
aircraft.
• A causal relationship is assumed since a longer
flight would consume more fuel.
Trip Leg   Flight Time (hours)   Fuel Used (pounds)
1          2.3                   145
2          4.2                   258
3          3.6                   219
4          4.7                   276
5          4.9                   283
1-74
Scatter Plots
 Example: Aircraft Fuel Consumption
• Here is the scatter plot with flight time on the
X-axis and fuel use on the Y-axis.
• Is there an association between variables?
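One way to put a number on the association is the sample correlation coefficient. The slide itself relies only on the scatter plot, so the Python sketch below is an added illustration rather than part of the lecture's method. It uses the five flight time and fuel observations; the helper name `pearson_r` is hypothetical.

```python
import math

flight_time = [2.3, 4.2, 3.6, 4.7, 4.9]   # hours
fuel_used = [145, 258, 219, 276, 283]     # pounds

def pearson_r(x, y):
    """Sample correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(flight_time, fuel_used)
print(round(r, 3))
```

A value of r close to +1 indicates a very strong positive linear association, which matches what the scatter plot suggests.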
1-75
Scatter Plots
 Degree of Association
Very strong association
Strong association
Moderate association
Little or no association
1-76
Tables
• Tables are the simplest form of data display.
• A compound table is a table that contains time
series data down the columns and variables
across the rows.
 Example: School Expenditures
• Arrangement of data is in rows and columns to
enhance meaning.
• The data can be viewed by focusing on the time
pattern (down the columns) or by comparing the
variables (across the rows).
1-77
Tables
 Example: School Expenditures
                     Elementary and Secondary     Colleges and Universities
Year   All Schools   Total    Public   Private    Total    Public   Private
1960   142.2          99.6     93.0     6.6        42.6     23.3     19.3
1970   317.3         200.2    188.6    11.6       117.2     75.2     41.9
1980   373.6         232.7    216.4    16.2       140.9     93.4     47.4
1990   526.1         318.5    293.4    25.1       207.6    132.9     74.7
2000   691.9         418.2    387.8    30.3       273.8    168.8    105.0
Source: U.S. Census Bureau, Statistical Abstract of the United States: 2002, p. 133.
Note: All figures are in billions of constant 2000/2001 dollars.
• Units of measure are stated in the footnote.
• Note merged headings to group columns.
1-78
Pie Charts
 An Oft-Abused Chart
• A pie chart can only convey a general idea of the
data.
• Pie charts should be used to portray data which
sum to a total (e.g., percent market shares).
• A pie chart should only have a few (i.e., 2 or 3)
slices.
• Each slice should be labeled with data values or
percents.
1-79
Pie Charts
 Common Errors in Pie Chart Usage
• Pie charts can only convey a general idea of the
data values.
• Pie charts are ineffective when they have too many
slices.
• Pie chart data must represent parts of a whole
(e.g., percent market share).
1-80
Maps and Pictograms
 Pictograms
• A visual display in which data values are replaced
by pictures.
1-81
Deceptive Graphs
 Error 1: Nonzero Origin
• A nonzero origin will exaggerate the trend.
Deceptive
Correct
1-82
IST 203: Statistics for Social Sciences
Lecture 4A
1-83
Numerical Description
• Statistics are descriptive measures derived from a
sample (n items).
• Parameters are descriptive measures derived from
a population (N items).
1-84
Numerical Description
• Three key characteristics of numerical data:
Characteristic     Interpretation
Central Tendency   Where are the data values concentrated? What seem to be typical or
                   middle data values?
Dispersion         How much variation is there in the data? How spread out are the data
                   values? Are there unusual values?
Shape              Are the data values distributed symmetrically? Skewed? Sharply peaked?
                   Flat? Bimodal?
1-85
Central Tendency
 Six Measures of Central Tendency
Statistic: Mean
  Formula: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
  Excel Formula: =AVERAGE(Data)
  Pro: Familiar and uses all the sample information.
  Con: Influenced by extreme values.
Statistic: Median
  Formula: Middle value in sorted array
  Excel Formula: =MEDIAN(Data)
  Pro: Robust when extreme data values exist.
  Con: Ignores extremes and can be affected by gaps in data values.
1-86
Central Tendency
 Six Measures of Central Tendency
Statistic: Mode
  Formula: Most frequently occurring data value
  Excel Formula: =MODE(Data)
  Pro: Useful for attribute data or discrete data with a small range.
  Con: May not be unique, and is not helpful for continuous data.
Statistic: Midrange
  Formula: $\frac{x_{\min} + x_{\max}}{2}$
  Excel Formula: =0.5*(MIN(Data)+MAX(Data))
  Pro: Easy to understand and calculate.
  Con: Influenced by extreme values and ignores most data values.
1-87
Central Tendency
 Six Measures of Central Tendency
Statistic: Geometric mean (G)
  Formula: $G = \sqrt[n]{x_1 x_2 \cdots x_n}$
  Excel Formula: =GEOMEAN(Data)
  Pro: Useful for growth rates and mitigates high extremes.
  Con: Less familiar and requires positive data.
Statistic: Trimmed mean
  Formula: Same as the mean except omit highest and lowest k% of data values (e.g., 5%)
  Excel Formula: =TRIMMEAN(Data, %)
  Pro: Mitigates effects of extreme values.
  Con: Excludes some data values that could be relevant.
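The three less familiar measures can be sketched in Python as follows. The data are hypothetical, and `midrange` and `trimmed_mean` are illustrative helper names. The trimmed mean here drops k% of the values from each tail, following the definition above; a spreadsheet function may interpret its percentage argument differently, so results need not match Excel exactly.

```python
from statistics import geometric_mean, mean

def midrange(data):
    """Halfway point between the smallest and largest value."""
    return 0.5 * (min(data) + max(data))

def trimmed_mean(data, k_percent):
    """Mean after dropping the lowest and highest k% of the sorted values."""
    values = sorted(data)
    drop = int(len(values) * k_percent / 100)   # items removed from each end
    return mean(values[drop:len(values) - drop])

scores = [1, 2, 3, 4, 100]        # hypothetical data with one extreme value
print(midrange(scores))           # pulled strongly toward the extreme value
print(trimmed_mean(scores, 20))   # drops 1 and 100 before averaging
print(geometric_mean([2, 8]))     # square root of 2 * 8
```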
1-88
Central Tendency
 Mean
• A familiar measure of central tendency.
Population Formula: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
Sample Formula: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
• In Excel, use function =AVERAGE(Data) where
Data is an array of data values.
1-89
Central Tendency
 Mean
• For the sample of n = 37 car brands:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{87 + 93 + 98 + \dots + 159 + 164 + 173}{37} = \frac{4639}{37} = 125.38$
1-90
Central Tendency
 Characteristics of the Mean
• Arithmetic mean is the most familiar average.
• Affected by every sample item.
• The balancing point or fulcrum for the data.
1-91
Central Tendency
 Characteristics of the Mean
• Regardless of the shape of the distribution, the
signed deviations of the data points from the mean
always sum to zero:
$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$
• Consider the following asymmetric distribution of quiz
scores whose mean = 65:
$\sum_{i=1}^{n} (x_i - \bar{x})$ = (42 – 65) + (60 – 65) + (70 – 65) + (75 – 65) + (78 – 65)
= (-23) + (-5) + (5) + (10) + (13) = -28 + 28 = 0
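A quick numerical check in Python, using the five quiz scores from the example. It is the signed deviations that cancel; the absolute distances from the mean do not sum to zero.

```python
scores = [42, 60, 70, 75, 78]
x_bar = sum(scores) / len(scores)          # 65.0

deviations = [x - x_bar for x in scores]   # signed: -23, -5, 5, 10, 13
absolute = [abs(d) for d in deviations]

print(sum(deviations))   # signed deviations cancel
print(sum(absolute))     # absolute distances do not
```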
1-92
Central Tendency
 Median
• The median (M) is the 50th percentile or midpoint
of the sorted sample data.
• M separates the upper and lower half of the sorted
observations.
• If n is odd, the median is the middle observation in
the data array.
• If n is even, the median is the average of the
middle two observations in the data array.
1-93
Central Tendency
 Median
• For n = 8, the median is between the fourth and
fifth observations in the data array.
1-94
Central Tendency
 Median
• For n = 9, the median is the fifth observation in the
data array.
1-95
Central Tendency
 Median
• Consider the following n = 6 data values:
11 12 15 17 21 32
• What is the median?
For even n, Median = $\frac{x_{n/2} + x_{(n/2)+1}}{2}$
n/2 = 6/2 = 3 and n/2 + 1 = 6/2 + 1 = 4
$M = (x_3 + x_4)/2 = (15 + 17)/2 = 16$
[Figure: number line of the six values, with the median M = 16 marked between 15 and 17]
1-96
Central Tendency
 Median
• Consider the following n = 7 data values:
12 23 23 25 27 34 41
• What is the median?
For odd n, Median = $x_{(n+1)/2}$
(n+1)/2 = (7+1)/2 = 8/2 = 4
$M = x_4 = 25$
[Figure: number line of the seven values, with the median M = 25 marked at the fourth value]
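The two position rules can be written as one small Python function. This is a sketch: the function name `median_by_position` is hypothetical, and the positions are shifted by one because Python lists are indexed from 0.

```python
def median_by_position(data):
    """Middle value for odd n, or the mean of the two middle values for even n."""
    x = sorted(data)
    n = len(x)
    if n % 2 == 1:
        return x[(n + 1) // 2 - 1]           # position (n+1)/2, as a 0-based index
    return (x[n // 2 - 1] + x[n // 2]) / 2   # positions n/2 and n/2 + 1

print(median_by_position([11, 12, 15, 17, 21, 32]))       # even n
print(median_by_position([12, 23, 23, 25, 27, 34, 41]))   # odd n
```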
1-97
Central Tendency
 Median
• Use Excel’s function =MEDIAN(Data) where Data
is an array of data values.
• For the 37 vehicle quality ratings (odd n) the
position of the median is
(n+1)/2 = (37+1)/2 = 19.
• So, the median is x19 = 121.
• When there are several duplicate data values, the
median does not provide a clean “50-50” split in
the data.
1-98
Central Tendency
 Characteristics of the Median
• The median is insensitive to extreme data values.
• For example, consider the following quiz scores for
3 students:
Tom’s scores:  20, 40, 70, 75, 80   Mean = 57, Median = 70, Total = 285
Jake’s scores: 60, 65, 70, 90, 95   Mean = 76, Median = 70, Total = 380
Mary’s scores: 50, 65, 70, 75, 90   Mean = 70, Median = 70, Total = 350
• What does the median for each student tell you?
1-99
Central Tendency
 Mode
• The most frequently occurring data value.
• Similar to mean and median if data values occur
often near the center of sorted data.
• May have multiple modes or no mode.
1-100
Central Tendency
 Mode
• For example, consider the following quiz scores for
4 students:
Lee’s scores:  60, 70, 70, 70, 80    Mean = 70, Median = 70, Mode = 70
Pat’s scores:  45, 45, 70, 90, 100   Mean = 70, Median = 70, Mode = 45
Sam’s scores:  50, 60, 70, 80, 90    Mean = 70, Median = 70, Mode = none
Xiao’s scores: 50, 50, 70, 90, 90    Mean = 70, Median = 70, Modes = 50, 90
• What does the mode for each student tell you?
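The modes for these students can be found by counting how often each score occurs. A minimal Python sketch follows; the helper name `modes` is hypothetical. It adopts the convention that a data set in which no value repeats has no mode, and it reports every tied value when there is more than one mode.

```python
from collections import Counter

def modes(data):
    """Return all most-frequent values, or an empty list if no value repeats."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []   # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == top)

quiz_scores = {
    "Lee": [60, 70, 70, 70, 80],
    "Pat": [45, 45, 70, 90, 100],
    "Sam": [50, 60, 70, 80, 90],
    "Xiao": [50, 50, 70, 90, 90],
}
for name, scores in quiz_scores.items():
    print(name, modes(scores))
```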
1-101
Central Tendency
 Mode
• Easy to define, not easy to calculate in large
samples.
• Use Excel’s function =MODE(Array)
- will return #N/A if there is no mode.
- will return first mode found if multimodal.
• May be far from the middle of the distribution and
not at all typical.
1-102
Central Tendency
 Mode
• Generally isn’t useful for continuous data since
data values rarely repeat.
• Best for attribute data or a discrete variable with a
small range (e.g., Likert scale).
1-103
Central Tendency
 Example: Price/Earnings Ratios and Mode
• Consider the following P/E ratios for a random
sample of 68 Standard & Poor’s 500 stocks.
7 8 8 10 10 10 10 12 13 13 13 13 13 13 13 14 14
14 15 15 15 15 15 16 16 16 17 18 18 18 18 19 19 19
19 19 20 20 20 21 21 21 22 22 23 23 23 24 25 26 26
26 26 27 29 29 30 31 34 36 37 40 41 45 48 55 68 91
• What is the mode?
1-104
Central Tendency
 Example: Price/Earnings Ratios and Mode
• Excel’s descriptive
statistics results are:
• The mode 13 occurs
7 times, but what
does the dot plot
show?
Mean: 22.7206
Median: 19
Mode: 13
Range: 84
Minimum: 7
Maximum: 91
Sum: 1545
Count: 68
1-105
Central Tendency
 Example: Price/Earnings Ratios and Mode
• The dot plot shows local modes (a peak with
valleys on either side) at 10, 13, 15, 19, 23, 26, 29.
• These multiple modes suggest that the mode is
not a stable measure of central tendency.
1-106
Central Tendency
 Example: Rose Bowl Winners’ Points
• Points scored by the winning NCAA football team
tend to have modes at multiples of 7 because a
touchdown with the extra point yields 7 points.
• Consider the dot plot of the points scored by the
winning team in the first 87 Rose Bowl games.
• What is the mode?
1-107
Central Tendency
 Mode
• A bimodal distribution refers to the shape of the
histogram rather than the mode of the raw data.
• Occurs when dissimilar populations are combined
in one sample. For example,
1-108
Central Tendency
 Symptoms of Skewness
Distribution’s Shape | Histogram Appearance | Statistics
Skewed left (negative skewness) | Long tail of histogram points left (a few low values but most data on right) | Mean < Median
Symmetric | Tails of histogram are balanced (low/high values offset) | Mean ≈ Median
Skewed right (positive skewness) | Long tail of histogram points right (most data on left but a few high values) | Mean > Median
1-109
Central Tendency
 Geometric Mean
• The geometric mean (G) is a multiplicative average:
$G = \sqrt[n]{x_1 x_2 \cdots x_n}$
• For the J. D. Power quality data (n=37):
$G = \sqrt[37]{(87)(93)(98)\cdots(164)(173)} = \sqrt[37]{2.37667 \times 10^{77}} = 123.38$
• In Excel use =GEOMEAN(Array)
• The geometric mean tends to mitigate the effects
of high outliers.
1-110
Central Tendency
 Midrange
• The midrange is the point halfway between the
lowest and highest values of X.
• Easy to use but sensitive to extreme data values.
$\text{Midrange} = \dfrac{x_{\min} + x_{\max}}{2}$
• For the J. D. Power quality data (n=37):
$\text{Midrange} = \dfrac{x_{\min} + x_{\max}}{2} = \dfrac{x_1 + x_{37}}{2} = \dfrac{87 + 173}{2} = 130$
• Here, the midrange (130) is higher than the mean
(125.38) or median (121).
1-111
Central Tendency
 Trimmed Mean
• To calculate the trimmed mean, first remove the
highest and lowest k percent of the observations.
• For example, for the n = 68 P/E ratios, we want a 5
percent trimmed mean (i.e., k = .05).
• To determine how many observations to trim,
multiply k x n = 0.05 x 68 = 3.4 or 3 observations.
• So, we would remove the three smallest and three
largest observations before averaging the
remaining values.
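The trimming steps can be sketched as follows, using the 68 P/E ratios listed earlier. The function name `trimmed_mean` is a hypothetical helper, and truncating k × n to a whole number is the assumption taken from the "3.4 or 3 observations" step above.

```python
from statistics import mean

pe_ratios = [
    7, 8, 8, 10, 10, 10, 10, 12, 13, 13, 13, 13, 13, 13, 13, 14, 14,
    14, 15, 15, 15, 15, 15, 16, 16, 16, 17, 18, 18, 18, 18, 19, 19, 19,
    19, 19, 20, 20, 20, 21, 21, 21, 22, 22, 23, 23, 23, 24, 25, 26, 26,
    26, 26, 27, 29, 29, 30, 31, 34, 36, 37, 40, 41, 45, 48, 55, 68, 91,
]

def trimmed_mean(values, k):
    """Drop the int(k * n) smallest and largest observations, then average the rest."""
    data = sorted(values)
    cut = int(k * len(data))  # 0.05 * 68 = 3.4, truncated to 3
    return mean(data[cut:len(data) - cut])

plain_mean = mean(pe_ratios)
five_percent_trimmed = trimmed_mean(pe_ratios, 0.05)
```

Because the three largest values (55, 68, 91) are far from the rest, the trimmed mean comes out lower than the ordinary mean.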
1-112
Dispersion
• Variation is the “spread” of data points about the
center of the distribution in a sample. Consider the
following measures of dispersion:
 Measures of Variation
Statistic | Formula | Excel | Pro | Con
Range | $x_{\max} - x_{\min}$ | =MAX(Data)-MIN(Data) | Easy to calculate. | Sensitive to extreme data values.
Variance ($s^2$) | $\dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$ | =VAR(Data) | Plays a key role in mathematical statistics. | Non-intuitive meaning.
1-113
Dispersion
 Measures of Variation
Statistic | Formula | Excel | Pro | Con
Standard deviation ($s$) | $\sqrt{\dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$ | =STDEV(Data) | Most common measure. Uses same units as the raw data (\$, £, ¥, etc.). | Non-intuitive meaning.
Coefficient of variation (CV) | $100 \times \dfrac{s}{\bar{x}}$ | None | Measures relative variation in percent so can compare data sets. | Requires nonnegative data.
1-114
Dispersion
 Range
• The difference between the largest and smallest
observation.
Range = xmax – xmin
• For example, for the n = 68 P/E ratios,
Range = 91 – 7 = 84
1-115
Dispersion
 Variance
• The population variance ($\sigma^2$) is defined as the sum of squared
deviations around the mean $\mu$ divided by the population size:
$\sigma^2 = \dfrac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}$
• For the sample variance ($s^2$), we divide by n – 1 instead of n;
otherwise $s^2$ would tend to underestimate the unknown
population variance $\sigma^2$:
$s^2 = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$
1-116
Dispersion
 Standard Deviation
• The square root of the variance.
• Explains how individual values in a data set vary
from the mean.
• Units of measure are the same as X.
Population standard deviation: $\sigma = \sqrt{\dfrac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}$
Sample standard deviation: $s = \sqrt{\dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$
1-117
Dispersion
 Calculating a Standard Deviation
• Consider the following five quiz scores for
Stephanie.
1-118
Dispersion
 Calculating a Standard Deviation
• Now, calculate the sample standard deviation:
$s = \sqrt{\dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}} = \sqrt{\dfrac{2380}{5-1}} = \sqrt{595} = 24.39$
• Somewhat easier, the two-sum formula can also
be used:
$s = \sqrt{\dfrac{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1}} = \sqrt{\dfrac{28300 - \dfrac{(360)^2}{5}}{5-1}} = \sqrt{\dfrac{28300 - 25920}{5-1}} = \sqrt{595} = 24.39$
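The two formulas can be compared numerically with a small sketch. The five scores below are hypothetical: the slide's score table is not reproduced here, so they were chosen only because they match the sums used above (Σx = 360, Σx² = 28300).

```python
from math import sqrt
from statistics import stdev

# Hypothetical scores (an assumption, not taken from the slide) that
# reproduce the sums used in the calculation: 360 and 28300.
scores = [40, 55, 75, 95, 95]
n = len(scores)
x_bar = sum(scores) / n

# Definitional formula: squared deviations about the mean
s_definitional = sqrt(sum((x - x_bar) ** 2 for x in scores) / (n - 1))

# Two-sum formula: needs only the sum and the sum of squares
s_two_sum = sqrt((sum(x * x for x in scores) - sum(scores) ** 2 / n) / (n - 1))

# Library cross-check (sample standard deviation, divisor n - 1)
s_library = stdev(scores)
```

The two-sum form needs one pass over the data, which is why it is convenient for hand calculation.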
1-119
IST 203: Statistics for Social Sciences
Lecture 5A
1-120
Random Experiments
 Sample Space
• A random experiment is an observational process
whose results cannot be known in advance.
• The set of all outcomes (S) is the sample space for
the experiment.
• A sample space with a countable number of
outcomes is discrete.
1-121
Random Experiments
 Sample Space
• For a single roll of a die, the sample space is:
S = {1, 2, 3, 4, 5, 6}
• When two dice are rolled, the sample space is
the following pairs:
S = {(1,1), (1,2), (1,3), (1,4), (1,5), (1,6),
(2,1), (2,2), (2,3), (2,4), (2,5), (2,6),
(3,1), (3,2), (3,3), (3,4), (3,5), (3,6),
(4,1), (4,2), (4,3), (4,4), (4,5), (4,6),
(5,1), (5,2), (5,3), (5,4), (5,5), (5,6),
(6,1), (6,2), (6,3), (6,4), (6,5), (6,6)}
1-122
Random Experiments
 Sample Space
• Consider the sample space to describe a randomly
chosen United Airlines employee by
2 genders,
21 job classifications,
6 home bases (major hubs) and
4 education levels
There are: 2 x 21 x 6 x 4 = 1008 possible outcomes
• It would be impractical to enumerate this sample
space.
1-123
Random Experiments
 Sample Space
• If the outcome is a continuous measurement, the
sample space can be described by a rule.
• For example, the sample space for the length of a
randomly chosen cell phone call would be
S = {all X such that X > 0}
or written as S = {X | X > 0}
• The sample space to describe a randomly chosen
student’s GPA would be
S = {X | 0.00 ≤ X ≤ 4.00}
1-124
Random Experiments
 Events
• An event is any subset of outcomes in the sample
space.
• A simple event or elementary event, is a single
outcome.
• A discrete sample space S consists of all the
simple events (Ei):
S = {E1, E2, …, En}
1-125
Random Experiments
 Events
• Consider the random experiment of tossing a
balanced coin.
What is the sample space?
S = {H, T}
• What are the chances of observing a H or T?
• These two elementary events are equally likely.
• When you buy a lottery ticket, the sample space
S = {win, lose} has only two events.
• Are these two events equally likely to occur?
1-126
Random Experiments
 Events
• A compound event consists of two or more simple
events.
• For example, in a sample space of 6 simple
events, we could define the compound events
A = {E1, E2}
B = {E3, E5, E6}
• These are
displayed in a
Venn diagram:
1-127
Random Experiments
 Events
• Many different compound events could be defined.
• Compound events can be described by a rule.
• For example, the compound event
A = “rolling a seven” on a roll of two
dice consists of 6 simple events:
A = {(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)}
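As a sketch, the two-dice sample space and the "rolling a seven" event can be enumerated directly. The variable names are illustrative, and the probability assumes all 36 outcomes are equally likely (fair dice).

```python
from fractions import Fraction
from itertools import product

# All ordered pairs (first die, second die)
sample_space = list(product(range(1, 7), repeat=2))

# Compound event described by a rule: the two faces sum to seven
rolling_seven = [roll for roll in sample_space if sum(roll) == 7]

# Equally likely outcomes: P(event) = favorable / total
p_seven = Fraction(len(rolling_seven), len(sample_space))
```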
1-128
Probability
 Definitions
• The probability of an event is a number that
measures the relative likelihood that the event will
occur.
• The probability of event A [denoted P(A)], must lie
within the interval from 0 to 1:
0 ≤ P(A) ≤ 1
If P(A) = 0, then the
event cannot occur.
If P(A) = 1, then the event
is certain to occur.
1-129
Probability
 Definitions
• In a discrete sample space, the probabilities of all
simple events must sum to unity:
P(S) = P(E1) + P(E2) + … + P(En) = 1
• For example, if purchases were made by the following methods:
Payment method | Percent | Probability
credit card | 32% | P(credit card) = .32
debit card | 15% | P(debit card) = .15
cash | 35% | P(cash) = .35
check | 18% | P(check) = .18
Sum | 100% | 1.0
1-130
Probability
 Law of Large Numbers
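• The law of large numbers is an important
probability theorem stating that, as the number of
trials increases, the empirical probability (relative
frequency) of an event tends toward its true
probability. This is why a large sample is preferred
to a small one.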
• Flip a coin 50 times. We would expect the
proportion of heads to be near .50.
• However, in a small finite sample, any ratio can be
obtained (e.g., 1/3, 7/13, 10/22, 28/50, etc.).
• A large n may be needed to get close to .50.
• Consider the results of 10, 20, 50, and 500 coin
flips.
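A simulation gives a feel for this. The sketch below uses Python's pseudo-random generator with a fixed seed (an arbitrary choice so runs are repeatable) and adds a much larger run of 100,000 flips, which is not on the slide, to show the proportion settling near .50.

```python
import random

def proportion_heads(n_flips, seed=0):
    """Simulate n_flips fair coin flips and return the share that are heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# Proportion of heads for increasingly large samples
results = {n: proportion_heads(n) for n in (10, 20, 50, 500, 100_000)}
```

Small samples can land well away from .50; only the larger runs can be expected to sit close to it.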
1-131
Probability
1-132
Probability
 Practical Issues for Actuaries
• Actuarial science is a high-paying career that
involves estimating empirical probabilities.
• For example, actuaries
- calculate payout rates on life insurance,
pension plans, and health care plans
- create tables that guide IRA withdrawal
rates for individuals from age 70 to 99
1-133
Rules of Probability
 Complement of an Event
• The complement of an event A is denoted by
A′ and consists of everything in the sample space
S except event A.
1-134
Rules of Probability
 Complement of an Event
• Since A and A′ together comprise the entire
sample space,
P(A) + P(A′ ) = 1
• The probability of A′ is found by
P(A′ ) = 1 – P(A)
• For example, The Wall Street Journal reports that
about 33% of all new small businesses fail within
the first 2 years. The probability that a new small
business will survive is:
P(survival) = 1 – P(failure) = 1 – .33 = .67 or 67%
1-135
Rules of Probability
 Odds of an Event
• The odds in favor of event A occurring are
$\text{Odds} = \dfrac{P(A)}{P(A')} = \dfrac{P(A)}{1 - P(A)}$
• Odds are used in sports and games of chance.
• For a pair of fair dice, P(7) = 6/36 (or 1/6).
What are the odds in favor of rolling a 7?
$\text{Odds} = \dfrac{P(\text{rolling seven})}{1 - P(\text{rolling seven})} = \dfrac{1/6}{1 - 1/6} = \dfrac{1/6}{5/6} = \dfrac{1}{5}$
1-136
Rules of Probability
 Odds of an Event
• On the average, for every time a 7 is rolled, there
will be 5 times that it is not rolled.
• In other words, the odds are 1 to 5 in favor of
rolling a 7.
• The odds are 5 to 1 against rolling a 7.
• In horse racing and other sports, odds are usually
quoted against winning.
1-137
Rules of Probability
 Odds of an Event
• If the odds against event A are quoted as b to a,
then the implied probability of event A is:
$P(A) = \dfrac{a}{a + b}$
• For example, if a race horse has 4 to 1 odds
against winning (b = 4, a = 1), then P(win) is
$P(\text{win}) = \dfrac{a}{a + b} = \dfrac{1}{4 + 1} = \dfrac{1}{5} = 0.20$ or 20%
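Both conversions (probability to odds, and quoted odds back to probability) can be sketched with exact fractions. The function names are hypothetical helpers introduced for illustration.

```python
from fractions import Fraction

def odds_in_favor(p):
    """Odds in favor of an event with probability p: P(A) / (1 - P(A))."""
    return p / (1 - p)

def probability_from_odds_against(b, a):
    """Implied P(A) when the odds against A are quoted as 'b to a'."""
    return Fraction(a, a + b)

seven_odds = odds_in_favor(Fraction(1, 6))    # rolling a seven with two dice
p_win = probability_from_odds_against(4, 1)   # horse quoted at 4 to 1 against
```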
1-138
Rules of Probability
 Union of Two Events
• The union of two events consists of all outcomes in
the sample space S that are contained either in
event A or in event B or both
(denoted A ∪ B or “A or B”).
∪ may be read
as “or” since
one or the other
or both events
may occur.
1-139
Rules of Probability
 Union of Two Events
• For example, randomly choose a card from a deck
of 52 playing cards.
• If Q is the event that we draw a
queen and R is the event that we
draw a red card, what is Q ∪ R?
• It is the possibility of drawing
either a queen (4 ways)
or a red card (26 ways)
or both (2 ways).
1-140
Rules of Probability
 Intersection of Two Events
• The intersection of two events A and B
(denoted A ∩ B or “A and B”) is the event
consisting of all outcomes in the sample space S
that are contained in both event A and event B.
∩ may be read
as “and” since
both events
occur. This is a
joint probability.
1-141
Rules of Probability
 Intersection of Two Events
• For example, randomly choose a card from a deck
of 52 playing cards.
• If Q is the event that we draw a
queen and R is the event that we
draw a red card, what is
Q ∩ R?
• It is the possibility of getting
both a queen and a red card
(2 ways).
1-142
Rules of Probability
 General Law of Addition
• The general law of addition states that the
probability of the union of two events A and B is:
P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
When you add P(A) and P(B) together, you count P(A and B) twice.
So, you have to subtract P(A ∩ B) to avoid overstating the probability.
1-143
Rules of Probability
 General Law of Addition
• For the card example:
P(Q) = 4/52 (4 queens in a deck)
P(R) = 26/52 (26 red cards in a deck)
P(Q ∩ R) = 2/52 (2 red queens in a deck)
P(Q ∪ R) = P(Q) + P(R) – P(Q ∩ R)
= 4/52 + 26/52 – 2/52
= 28/52 = .5385 or 53.85%
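The card example can be verified by building the deck and counting, as in this sketch. The rank and suit labels are illustrative choices; each of the 52 cards is assumed equally likely to be drawn.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(ranks, suits))

queens = {card for card in deck if card[0] == "Q"}
reds = {card for card in deck if card[1] in ("hearts", "diamonds")}

def prob(event):
    """Probability of an event (a set of cards) under equally likely draws."""
    return Fraction(len(event), len(deck))

# Union counted directly versus the general law of addition
p_union_direct = prob(queens | reds)
p_union_by_law = prob(queens) + prob(reds) - prob(queens & reds)
```

Counting the union directly and applying the addition law give the same answer because the two red queens are subtracted once.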
1-144
Rules of Probability
 Mutually Exclusive Events
• Events A and B are mutually exclusive (or disjoint)
if their intersection is the null set (∅) that contains
no elements. If A ∩ B = ∅, then P(A ∩ B) = 0
 Special Law of Addition
• In the case of mutually
exclusive events, the
addition law reduces
to:
P(A ∪ B) = P(A) + P(B)
1-145
Rules of Probability
 Forced Dichotomy
• Polytomous events can be made dichotomous
(binary) by defining the second category as
everything not in the first category.
Polytomous Events | Binary (Dichotomous) Variable
Vehicle type (SUV, sedan, truck, motorcycle) | X = 1 if SUV, 0 otherwise
A randomly-chosen NBA player’s height | X = 1 if height exceeds 7 feet, 0 otherwise
Tax return type (single, married filing jointly, married filing separately, head of household, qualifying widower) | X = 1 if single, 0 otherwise
1-146
Rules of Probability
 Conditional Probability
• The probability of event A given that event B has
occurred.
• Denoted P(A | B).
The vertical line “ | ” is read as “given.”
$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$ for P(B) > 0 and undefined otherwise
1-147
Rules of Probability
 Conditional Probability
• Consider the logic of this formula by looking at the
Venn diagram.
The sample space is restricted to B, an event that has occurred.
A ∩ B is the part of B that is also in A.
The ratio of the relative size of A ∩ B to B is P(A | B):
$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$
1-148
Rules of Probability
 Example: High School Dropouts
• First define
U = the event that the person is unemployed
D = the event that the person is a high school
dropout
P(D) = .2905
P(U ∩ D) = .0532
P(U) = .1350
$P(U \mid D) = \dfrac{P(U \cap D)}{P(D)} = \dfrac{.0532}{.2905} = .1831$ or 18.31%
• P(U | D) = .1831 > P(U) = .1350
• Therefore, being a high school dropout is related
to being unemployed.
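The dropout calculation can be reproduced with a short sketch. The function name is a hypothetical helper; it raises an error when P(B) is not positive, matching the "undefined otherwise" condition in the definition.

```python
def conditional_probability(p_a_and_b, p_b):
    """P(A | B) = P(A and B) / P(B), defined only when P(B) > 0."""
    if p_b <= 0:
        raise ValueError("P(A | B) is undefined when P(B) is not positive")
    return p_a_and_b / p_b

p_d = 0.2905        # P(D): high school dropout
p_u_and_d = 0.0532  # P(U and D): unemployed and a dropout
p_u = 0.1350        # P(U): unemployed

p_u_given_d = conditional_probability(p_u_and_d, p_d)

# Unemployment is more likely among dropouts than in the group overall
dropouts_more_likely_unemployed = p_u_given_d > p_u
```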
1-149
IST 203: Statistics for Social Sciences
Lecture 6
1-150
Probability Models
 Probability Models
• A random (or stochastic) process is a repeatable
random experiment.
• For example, each call arriving at
the L.L. Bean order center is a
random experiment in which the
variable of interest is the amount
of the order.
• Probability can be used to analyze random (or
stochastic) processes and to understand business
processes.
1-151
Discrete Distributions
 Random Variables
• A random variable is a function or rule that
assigns a numerical value to each outcome in
the sample space of a random experiment.
• Nomenclature:
- Capital letters are used to represent
random variables (e.g., X, Y).
- Lower case letters are used to represent
values of the random variable (e.g., x, y).
• A discrete random variable has a countable
number of distinct values.
1-152
Discrete Distributions
Probability Distributions
• A discrete probability distribution assigns a
probability to each value of a discrete random
variable X.
• To be a valid probability distribution, each probability must satisfy
$0 \le P(x_i) \le 1$
• and the sum of all the probabilities for the values of
X must be equal to unity:
$\sum_{i=1}^{n} P(x_i) = 1$
1-153
Discrete Distributions
Example: Coin Flips
When you flip a coin
three times, the
sample space has
eight equally likely
simple events.
They are:
1st Toss | 2nd Toss | 3rd Toss
H | H | H
H | H | T
H | T | H
H | T | T
T | H | H
T | H | T
T | T | H
T | T | T
1-154
Discrete Distributions
Example: Coin Flips
If X is the number of heads, then X is a random
variable whose probability distribution is as follows:
Possible Events | x | P(x)
TTT | 0 | 1/8
HTT, THT, TTH | 1 | 3/8
HHT, HTH, THH | 2 | 3/8
HHH | 3 | 1/8
Total | | 1
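This distribution can be derived by listing the eight outcomes and counting heads, as sketched below. The variable names are illustrative, and the eight outcomes are assumed equally likely (a fair coin).

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Eight equally likely outcomes of three tosses, e.g. ('H', 'T', 'H')
outcomes = list(product("HT", repeat=3))

# X = number of heads in each outcome
head_counts = Counter(outcome.count("H") for outcome in outcomes)

# P(x) = (number of outcomes with x heads) / 8
pmf = {x: Fraction(count, len(outcomes)) for x, count in sorted(head_counts.items())}
```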
1-155
Discrete Distributions
Expected Value
• The expected value E(X) of a discrete random
variable is the sum of all X-values weighted by
their respective probabilities.
• If there are n distinct values of X,
$E(X) = \mu = \sum_{i=1}^{n} x_i P(x_i)$
• The E(X) is a measure of central tendency.
1-156
Discrete Distributions
Example: Service Calls
The probability distribution of emergency service calls
on Sunday by Ace Appliance Repair is:
x | P(x)
0 | 0.05
1 | 0.10
2 | 0.30
3 | 0.25
4 | 0.20
5 | 0.10
Total | 1.00
What is the average or expected
number of service calls?
1-157
Discrete Distributions
Example: Service Calls
First calculate xiP(xi):
x | P(x) | xP(x)
0 | 0.05 | 0.00
1 | 0.10 | 0.10
2 | 0.30 | 0.60
3 | 0.25 | 0.75
4 | 0.20 | 0.80
5 | 0.10 | 0.50
Total | 1.00 | 2.75
The sum of the xP(x) column
is the expected value or
mean of the discrete
distribution.
$E(X) = \mu = \sum_{i=1}^{6} x_i P(x_i) = 2.75$
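The same weighted sum can be written as a short sketch. The dictionary layout and the function name `expected_value` are illustrative choices; the check that the probabilities sum to one is an added safeguard.

```python
from math import isclose

# Probability distribution of Sunday service calls: {x: P(x)}
service_calls_pmf = {0: 0.05, 1: 0.10, 2: 0.30, 3: 0.25, 4: 0.20, 5: 0.10}

def expected_value(pmf):
    """Sum of each x weighted by its probability, for a valid discrete distribution."""
    if not isclose(sum(pmf.values()), 1.0):
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in pmf.items())

mu = expected_value(service_calls_pmf)
```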
1-158
Discrete Distributions
Example: Service Calls
This particular probability distribution is not symmetric around the mean μ = 2.75.
[Figure: bar chart of Probability versus Number of Service Calls (0 to 5), with the mean μ = 2.75 marked.]
However, the mean is still the balancing point, or fulcrum.
Because E(X) is an average, it does not have to be an
observable point.
1-159
Discrete Distributions
Application: Life Insurance
• Expected value is the basis of life insurance.
• For example, what is the probability that a 30-year-old white female will die within the next year?
• Based on mortality statistics, the probability is
.00059 and the probability of living another year is
1 - .00059 = .99941.
• What premium should a life insurance company
charge to break even on a $500,000 1-year term
policy?
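One way to read the break-even question, as a sketch that ignores administrative costs, interest, and profit (assumptions not stated on the slide): the insurer pays $500,000 with probability .00059 and nothing otherwise, so the expected payout is

```latex
E(X) = (500{,}000)(.00059) + (0)(.99941) = \$295
```

Under those assumptions, a premium of about $295 would break even on average.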
1-160
Discrete Distributions
Application: Raffle Tickets
• Now, calculate the E(X):
E(X) = (value if you win)P(win) + (value if you lose)P(lose)
$= (55{,}000)\dfrac{1}{29{,}346} + (0)\dfrac{29{,}345}{29{,}346}$
= (55,000)(.000034076) + (0)(.999965924) = $1.87
• The raffle ticket is actually worth $1.87. Is it worth
spending $2.00 for it?
1-161
Discrete Distributions
Variance and Standard Deviation
• If there are n distinct values of X, then the variance
of a discrete random variable is:
$V(X) = \sigma^2 = \sum_{i=1}^{n} [x_i - \mu]^2 P(x_i)$
• The variance is a weighted average of the
dispersion about the mean and is denoted either
as $\sigma^2$ or V(X).
• The standard deviation is the square root of the
variance and is denoted $\sigma$:
$\sigma = \sqrt{\sigma^2} = \sqrt{V(X)}$
1-162
Discrete Distributions
Example: Bed and Breakfast
The Bay Street Inn is a 7-room
bed-and-breakfast in Santa
Theresa, Ca.
The probability
distribution of room
rentals during
February is:
x | P(x)
0 | 0.05
1 | 0.05
2 | 0.06
3 | 0.10
4 | 0.13
5 | 0.20
6 | 0.15
7 | 0.26
Total | 1.00
1-163
Discrete Distributions
Example: Bed and Breakfast
First find the expected value:
$E(X) = \mu = \sum_{i=1}^{8} x_i P(x_i) = 4.71$ rooms
x | P(x) | x P(x)
0 | 0.05 | 0.00
1 | 0.05 | 0.05
2 | 0.06 | 0.12
3 | 0.10 | 0.30
4 | 0.13 | 0.52
5 | 0.20 | 1.00
6 | 0.15 | 0.90
7 | 0.26 | 1.82
Total | 1.00 | μ = 4.71
1-164
Discrete Distributions
Example: Bed and Breakfast
The E(X) is then used to find the variance:
$V(X) = \sigma^2 = \sum_{i=1}^{8} [x_i - \mu]^2 P(x_i) = 4.2259$ rooms²
x | P(x) | x P(x) | [x − μ]² | [x − μ]² P(x)
0 | 0.05 | 0.00 | 22.1841 | 1.109205
1 | 0.05 | 0.05 | 13.7641 | 0.688205
2 | 0.06 | 0.12 | 7.3441 | 0.440646
3 | 0.10 | 0.30 | 2.9241 | 0.292410
4 | 0.13 | 0.52 | 0.5041 | 0.065533
5 | 0.20 | 1.00 | 0.0841 | 0.016820
6 | 0.15 | 0.90 | 1.6641 | 0.249615
7 | 0.26 | 1.82 | 5.2441 | 1.363466
Total | 1.00 | μ = 4.71 | | σ² = 4.225900
The standard deviation is:
$\sigma = \sqrt{4.2259} = 2.0557$ rooms
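The mean, variance, and standard deviation for the room-rental distribution can be recomputed with a brief sketch. The variable names are illustrative; the probabilities are the ones tabulated for the Bay Street Inn.

```python
from math import sqrt

# Room rentals in February: {x: P(x)}
room_pmf = {0: 0.05, 1: 0.05, 2: 0.06, 3: 0.10, 4: 0.13, 5: 0.20, 6: 0.15, 7: 0.26}

# Expected value: probability-weighted sum of x
mu = sum(x * p for x, p in room_pmf.items())

# Variance: probability-weighted squared deviations from the mean
variance = sum((x - mu) ** 2 * p for x, p in room_pmf.items())

# Standard deviation: square root of the variance, in rooms
sigma = sqrt(variance)
```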
1-165
Discrete Distributions
What is a PDF or CDF?
• A probability distribution function (PDF) is a
mathematical function that shows the probability of
each X-value.
• A cumulative distribution function (CDF) is a
mathematical function that shows the cumulative
sum of probabilities, adding from the smallest to
the largest X-value, gradually approaching unity.
1-166
Discrete Distributions
What is a PDF or CDF?
Consider the following illustrative histograms:
[Figure: two illustrative histograms, each plotting Probability against Value of X (0 to 14).]
Left panel: Illustrative PDF (Probability Distribution Function)
Right panel: Illustrative CDF (Cumulative Distribution Function)
The equations for these functions depend on the
parameter(s) of the distribution.
1-167
Uniform Distribution
 Characteristics of the Uniform Distribution
• The uniform distribution describes a random
variable with a finite number of integer values from
a to b (the only two parameters).
• Each value of the random variable is equally likely
to occur.
• Consider the following summary of the uniform
distribution:
1-168
Uniform Distribution
Parameters: a = lower limit, b = upper limit
PDF: $P(x) = \dfrac{1}{b - a + 1}$
Range: $a \le x \le b$
Mean: $\dfrac{a + b}{2}$
Std. Dev.: $\sqrt{\dfrac{[(b - a) + 1]^2 - 1}{12}}$
Random data generation in Excel: =a+INT((b-a+1)*RAND())
Comments: Used as a benchmark, to generate random integers, or to create other distributions.
1-169
Uniform Distribution
 Example: Rolling a Die
• The number of dots on the roll of a die forms a
uniform random variable with six equally likely
integer values: 1, 2, 3, 4, 5, 6
• What is the probability of rolling any of these?
[Figure: PDF for one die and CDF for one die, each plotting Probability against Number of Dots Showing on the Die (1 to 6).]
1-170
Uniform Distribution
 Example: Rolling a Die
• The PDF for all x is: $P(x) = \dfrac{1}{b - a + 1} = \dfrac{1}{6 - 1 + 1} = \dfrac{1}{6}$
• Calculate the mean as:
$\dfrac{a + b}{2} = \dfrac{1 + 6}{2} = 3.5$
• Calculate the standard deviation as:
$\sqrt{\dfrac{[(b - a) + 1]^2 - 1}{12}} = \sqrt{\dfrac{[(6 - 1) + 1]^2 - 1}{12}} = 1.708$
1-171
Uniform Distribution
 Application: Pumping Gas
 On a gas pump, the last two digits (pennies)
displayed will be a uniform random integer
(assuming the pump stops automatically).
[Figure: PDF and CDF, each plotting Probability against Pennies Digits on Pump (0 to 99).]
The parameters are: a = 00 and b = 99
1-172
Uniform Distribution
 Application: Pumping Gas
• The PDF for all x is:
$P(x) = \dfrac{1}{b - a + 1} = \dfrac{1}{99 - 0 + 1} = \dfrac{1}{100} = .010$
• Calculate the mean as:
$\dfrac{a + b}{2} = \dfrac{0 + 99}{2} = 49.5$
• Calculate the standard deviation as:
$\sqrt{\dfrac{[(b - a) + 1]^2 - 1}{12}} = \sqrt{\dfrac{[(99 - 0) + 1]^2 - 1}{12}} = 28.87$
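The discrete uniform formulas can be bundled into one small sketch and applied to both the die (a = 1, b = 6) and the gas pump (a = 0, b = 99). The function name and the dictionary keys are hypothetical choices made for illustration.

```python
from math import sqrt

def discrete_uniform_summary(a, b):
    """PDF value, mean, and standard deviation for integers a..b, all equally likely."""
    n_values = b - a + 1  # number of distinct integer values
    return {
        "pdf": 1 / n_values,
        "mean": (a + b) / 2,
        "std_dev": sqrt((n_values ** 2 - 1) / 12),
    }

die = discrete_uniform_summary(1, 6)
pump = discrete_uniform_summary(0, 99)
```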
1-173
Bernoulli Distribution
 Bernoulli Experiments
• A random experiment with only 2 outcomes is a
Bernoulli experiment.
• One outcome is arbitrarily labeled a
“success” (denoted X = 1) and the other a “failure”
(denoted X = 0).
p is the P(success), 1 – p is the P(failure).
• “Success” is usually defined as the less likely
outcome so that p < .5 for convenience.
• Note that P(0) + P(1) = (1 – p) + p = 1 and
0 ≤ p ≤ 1.
1-174
Bernoulli Distribution
 Bernoulli Experiments
Consider the following Bernoulli experiments:
Bernoulli Experiment | Possible Outcomes | Probability of “Success”
Flip a coin | 1 = heads, 0 = tails | p = .50
Inspect a jet turbine blade | 1 = crack found, 0 = no crack found | p = .001
Purchase a tank of gas | 1 = pay by credit card, 0 = do not pay by credit card | p = .78
Do a mammogram test | 1 = positive test, 0 = negative test | p = .0004
1-175
Bernoulli Distribution
 Bernoulli Experiments
• The expected value (mean) of a Bernoulli
experiment is calculated as:
$E(X) = \sum_{i=1}^{2} x_i P(x_i) = (0)(1 - p) + (1)(p) = p$
• The variance of a Bernoulli experiment is
calculated as:
$V(X) = \sum_{i=1}^{2} [x_i - E(X)]^2 P(x_i) = (0 - p)^2 (1 - p) + (1 - p)^2 (p) = p(1 - p)$
• The mean and variance are useful in developing
the next model.
1-176
Binomial Distribution
 Characteristics of the Binomial Distribution
• The binomial distribution arises when a Bernoulli
experiment is repeated n times.
• Each Bernoulli trial is independent so the
probability of success p remains constant on each
trial.
• In a binomial experiment, we are interested in X =
number of successes in n trials. So,
X = x1 + x2 + ... + xn
• The probability of a particular number of successes
P(X) is determined by parameters n and p.
1-177
Binomial Distribution
 Characteristics of the Binomial Distribution
• The mean of a binomial distribution is found by
adding the means for each of the n Bernoulli
independent events: p + p + … + p = np
• The variance of a binomial distribution is found by
adding the variances for each of the n Bernoulli
independent events:
p(1-p)+ p(1-p) + … + p(1-p) = np(1-p)
• The standard deviation is
$\sqrt{np(1 - p)}$
1-178
Binomial Distribution
Parameters: n = number of trials, p = probability of success
PDF: $P(x) = \dfrac{n!}{x!(n - x)!} p^x (1 - p)^{n - x}$
Excel function: =BINOMDIST(k,n,p,0)
Range: X = 0, 1, 2, . . ., n
Mean: np
Std. Dev.: $\sqrt{np(1 - p)}$
Random data generation in Excel: Sum n values of =1+INT(2*RAND()) or use Excel’s Tools | Data Analysis
Comments: Skewed right if p < .50, skewed left if p > .50, and symmetric if p = .50.
1-179
Binomial Distribution
 Example: Quick Oil Change Shop
• It is important to quick oil change shops to ensure
that a car’s service time is not considered “late” by
the customer.
• Service times are defined as either late or not late.
• X is the number of cars that are late out of the total
number of cars serviced.
• Assumptions:
- cars are independent of each other
- probability of a late car is constant
1-180
Binomial Distribution
 Example: Quick Oil Change Shop
• What is the probability that exactly 2 of the next n
= 10 cars serviced are late (P(X = 2))?
• P(car is late) = p = .10
• P(car not late) = 1 - p = .90
$P(x) = \dfrac{n!}{x!(n - x)!} p^x (1 - p)^{n - x}$
$P(X = 2) = \dfrac{10!}{2!(10 - 2)!} (.10)^2 (1 - .10)^{10 - 2} = .1937$
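The binomial formula translates directly into code. In this sketch `binomial_pmf` is a hypothetical helper name; `math.comb` supplies the n!/(x!(n − x)!) term.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for n independent trials, each with success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Exactly 2 late cars among the next 10, with P(late) = .10
p_two_late = binomial_pmf(2, 10, 0.10)
```

The same helper gives the whole distribution by looping x from 0 to n.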
1-181
Binomial Distribution
 Application: Uninsured Patients
• On average, 20% of the emergency room patients
at Greenwood General Hospital lack health
insurance.
• In a random sample of 4 patients, what is the
probability that at least 2 will be uninsured?
• X = number of uninsured patients (“success”)
• P(uninsured) = p = 20% or .20
• P(insured) = 1 – p = 1 – .20 = .80
• n = 4 patients
• The range is X = 0, 1, 2, 3, 4 patients.
1-182
Binomial Distribution
 Application: Uninsured Patients
• What is the mean and standard deviation of this
binomial distribution?
Mean = μ = np = (4)(.20) = 0.8 patients
Standard deviation = σ = $\sqrt{np(1 - p)} = \sqrt{4(.20)(1 - .20)} = \sqrt{0.64}$ = 0.8 patients
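The question about at least 2 uninsured patients can be worked through the complement, P(X ≥ 2) = 1 − P(0) − P(1). The sketch below is one way to carry that out, using the n = 4 and p = .20 given above; the variable names are illustrative.

```python
from math import comb, sqrt

n, p = 4, 0.20

# P(X = x) for x = 0, 1, ..., n
pmf = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]

# "At least 2" is the complement of "0 or 1"
p_at_least_two = 1 - pmf[0] - pmf[1]

mean = n * p
std_dev = sqrt(n * p * (1 - p))
```

Working through the complement needs only two terms of the distribution instead of three.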
1-183
IST 203: Statistics for Social Sciences
Lecture 7
1-184
Continuous Variables
 Events as Intervals
• Discrete Variable – each value of X has its own
probability P(X).
• Continuous Variable – events are intervals and
probabilities are areas underneath smooth
curves. A single point has no probability.
1-185
Describing a Continuous Distribution
 PDFs and CDFs
Continuous PDF’s:
• Denoted f(x)
• Must be nonnegative
• Total area under
curve = 1
• Mean, variance and
shape depend on
the PDF parameters
• Reveals the shape
of the distribution
Normal PDF
1-186
Uniform Continuous Distribution
 Characteristics of the Uniform Distribution
1-187
Uniform Continuous Distribution
 Example: Anesthesia Effectiveness
• An oral surgeon injects a painkiller prior to
extracting a tooth. Given the varying
characteristics of patients, the dentist views the
time for anesthesia effectiveness as a uniform
random variable that takes between 15 minutes
and 30 minutes.
• X is U(15, 30)
• a = 15, b = 30, find the mean and standard
deviation.
1-188
Uniform Continuous Distribution
 Example: Anesthesia Effectiveness
$\mu = \dfrac{a + b}{2} = \dfrac{15 + 30}{2} = 22.5$ minutes
$\sigma = \sqrt{\dfrac{(b - a)^2}{12}} = \sqrt{\dfrac{(30 - 15)^2}{12}} = 4.33$ minutes
Find the probability that the anesthetic takes between
20 and 25 minutes.
P(c < X < d) = (d – c)/(b – a)
P(20 < X < 25) = (25 – 20)/(30 – 15)
= 5/15 = 0.3333 or 33.33%
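The anesthesia example can be checked with a short sketch. The function name is a hypothetical helper, and it assumes the interval (c, d) lies inside (a, b).

```python
from math import sqrt

def uniform_interval_probability(c, d, a, b):
    """P(c < X < d) for X ~ U(a, b), assuming a <= c <= d <= b."""
    if not (a <= c <= d <= b):
        raise ValueError("interval (c, d) must lie within (a, b)")
    return (d - c) / (b - a)

a, b = 15, 30
mean = (a + b) / 2
std_dev = sqrt((b - a) ** 2 / 12)
p_20_to_25 = uniform_interval_probability(20, 25, a, b)
```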
1-189
Normal Distribution
 What is Normal?
A normal random variable should:
• Be measured on a continuous scale.
• Possess clear central tendency.
• Have only one peak (unimodal).
• Exhibit tapering tails.
• Be symmetric about the mean (equal tails).
1-190
Standard Normal Distribution
 Characteristics of the Standard Normal
• Since there is a different normal distribution for
every value of μ and σ, we transform a
normal random variable to a standard normal
distribution with μ = 0 and σ = 1 using the
formula:
$z = \dfrac{x - \mu}{\sigma}$
• Denoted N(0,1)
1-191
Standard Normal Distribution
 Finding Areas by using Standardized Variables
• Suppose John took an economics exam and
scored 86 points. The class mean was 75 with a
standard deviation of 7. What percentile is John
in (i.e., find P(X < 86))?
$z_{\text{John}} = \dfrac{x - \mu}{\sigma} = \dfrac{86 - 75}{7} = \dfrac{11}{7} = 1.57$
• So John’s score is 1.57 standard deviations above
the mean.
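As a sketch of the remaining step, the standardized score can be converted to a percentile with the standard normal CDF. This assumes the exam scores are normally distributed; the CDF is written with `math.erf` rather than read from a table, and the function name is illustrative.

```python
from math import erf, sqrt

def standard_normal_cdf(z):
    """P(Z < z) for a standard normal variable, computed from the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# John's standardized score: (x - mean) / standard deviation
z_john = (86 - 75) / 7

# Share of the class expected to score below John, if scores are normal
percentile_john = standard_normal_cdf(z_john)
```

Under that assumption, John sits at roughly the 94th percentile of the class.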