Chapter 1 ** STA 1020 - Wayne State University

** STA 1020 - Part 1 (24/Sep/13) **
STA 1020
Fall 2013 Section 09 MWF 10:40-11:35 0035 State
Instructor: Dr. J.L. Menaldi
Textbook - Statistics: Concepts and Controversies,
by David S. Moore and William I. Notz, 2013, W.H. Freeman & Company [8th ed]
Class Link: http://www.math.wayne.edu/~menaldi/teach/13f1020.htm
“Statistics” is the Science of collecting, describing and interpreting data...
It is said that “Probability” is the vehicle of Statistics, i.e., if it were not for the laws
of probability, the theory of statistics would not be possible
JLM (WSU)

Contents
MATERIAL FOR EXAM #1
Exam 1 of 3: Producing Data
Quizzes every chapter and then First Partial Exam
Part 1: How to tell if ‘data’ are well made?
Chapter 1 - Where Do Data Come From?
Chapter 2 - Samples, Good and Bad
Chapter 3 - What Do Samples Tell Us?
Chapter 4 - Sample Surveys in the Real World – mostly skipped!
Chapter 5 - Experiments, Good and Bad
Chapter 6 - Experiments in the Real World
Chapter 7 - Data Ethics – skipped!
Chapter 8 - Measuring
Chapter 9 - Do the Numbers Make Sense? – skipped!

Ch01 - Where Do Data Come From?
Chapter 1
Talking about data
STATISTICS is the Science of collecting (organizing), describing
(displaying, summarizing) and interpreting (understanding,
comparing) data (numbers in context)
Using ‘data’ to draw a conclusion about something unknown
Decision making in the presence of uncertainty (or partial knowledge)
Pieces of information or Numbers are ‘data’ only if the information
has a meaning attached!
Data come from Observational Studies and from Experiments
NOW: How are data obtained?
‘Individuals’ (are the objects described by a set of data), and
‘variables’ (are the characteristics of an individual)
Again, Statistics is a collection of procedures and principles for
gathering data and analyzing information to help people make
decisions when faced with uncertainty.

Roosevelt-Landon Election
What was wrong in the Literary Digest poll? Founded in 1890, the
magazine correctly predicted the winners in the presidential elections of
1916, 1920, 1924, 1928, and 1932. In the 1936 presidential contest
between A. Landon and F.D. Roosevelt, the magazine sent out 10 million
ballots and received about 1.3 million ballots for Landon and 0.9 million
ballots for Roosevelt, so it appeared that Landon would get 57% of the
votes. The size of the poll was extremely large when compared with the
size of other typical polls.
.......................................................................
In that same 1936 presidential election, George Gallup used a much smaller
poll of 50,000 people, and he correctly predicted that Roosevelt would win.
.......................................................................
How could it happen that the larger Literary Digest poll could be wrong by
such a large margin? What went wrong?
.......................................................................
Key words: Sampling method, Great Depression, disproportionately wealthy
people (car owners, magazine subscribers, people with phones), voluntary response.

Example 1: Who recycles?
Researchers spent lots of time and money weighing (in pounds) the
stuff put out each week in the curbside recycling basket of each residence in
two neighborhoods in a California city (referred to as Upper Crust and
Lower Mid).
The Upper Crust households contributed more pounds per week on
average than did the folks in Lower Mid.
Can we say that the rich are more serious about recycling? No.
Someone noticed that Upper Crust recycling baskets contained lots of
heavy glass wine bottles. In Lower Mid, they put out lots of light
plastic soda bottles and light metal beer/soda cans.
Weight tells us little about commitment to recycling. How to rectify
this?

Other Examples
Read Example 2: What’s your race?
Ex 3: Do power lines cause leukemia in children?
Context: Electric currents generate magnetic fields. Really strong
magnetic fields can disturb living cells in laboratory studies.
What about the weaker magnetic fields we experience if we live near
power lines? Some data suggested that more children in these
locations might develop leukemia (a cancer of the blood cells)
Result: No evidence (of more than a chance connection between
magnetic fields and childhood leukemia)
No risk? (The result says only that a very careful study could not find any
risk that stands out from the play of chance that distributes leukemia cases
across the landscape, at least as far as we know!)
Whenever risk statistics are reported, there is a risk that they are
misreported. Journalists often present risk data in a way that produces the
best story rather than in a way that provides the best information. Very
commonly, news reports either don’t contain or don’t emphasize the
information you need to understand risk.
http://www.math.wayne.edu/~menaldi/teach/

Observational Study / Experiment
Data
Context of data (representing what, measuring what, which units, . . . )
Source of data (how and who got them, particular interest, . . . )
Sampling method (random, voluntary or self-selected, . . . )
Conclusion (statistical significance vs. practical significance, . . . )
Observational Study
Observes individuals and measures variables of interest but does not
attempt to influence the responses
Describes some group or situation
Supports no “cause-and-effect” conclusion
Sample Surveys are a type of observational study
Experiment
Deliberately imposes some treatment on individuals in order to observe
their responses
Studies whether the treatment causes change in the response
A “cause-and-effect” conclusion is desired

Common Language - 1
What are the key words? (statistics jargon)
Statistics is about variation: ‘data’ vary because we don’t see everything
and because even what we do see and measure, we measure imperfectly.
So, in a very basic way, ‘statistics’ is about the real, imperfect world in
which we live.
Individuals are the objects described by the set of data; they may be
people, but also animals or things
Variable is any characteristic of an individual, which may change from
individual to individual
Population is the entire group of individuals about which we want
information
Sampling Frame is the list (or set) of individuals (not necessarily
the same as the population) from which the sample will be drawn
Sample is the subset of individuals from which information is collected
(and used to draw conclusions about the whole)
In reality, you have a ‘theoretical’ population, an ‘implementable’ sampling
frame, and an ‘actual’ sample

Common Language - 2
Observational Studies try to gather information without disturbing
the scene they are observing
Sample Survey is a type of observational study: data are collected on a
sample, which looks at only part of the population
Census is a sample survey that attempts to include the entire
population in the sample
Experiments actually do something (called treatments) to individuals
in order to see how they respond. Usually, the goal is to learn whether
some treatment actually causes a certain response
Response is a variable that measures an outcome or result of a study.
Three steps to doing Statistics right: (1) Think first: know where you are
headed and why. (2) Show is what folks think Statistics is about; the
mechanics of calculating statistics and making displays are important, but
not the most important part of Statistics. (3) Tell what you have learned;
until you have explained your results so that someone else can understand
your conclusion, the job is not done.

Example 4: Public Opinion Polls
Polls such as those conducted by Gallup and many news organizations ask
people’s opinions on a variety of issues. The variables measured are
responses to questions about public issues. Though most noticed at
election time, these polls are conducted on a regular basis throughout the
year. For a typical poll:
Population: US residents 18 years of age and over. Noncitizens and
even illegal immigrants are included
Sample: Between 1,000 and 1,500 people interviewed by telephone
Sampling Frame? Reachable population?

Example 5: CPS
Government economic and social data come from large sample surveys of
a nation’s individuals, households, or businesses. The monthly Current
Population Survey (CPS) is the most important government sample survey
in the United States. Many of the variables recorded by the CPS concern
the employment or unemployment of everyone over 16 years old in a
household. The CPS also records many other economic and social
variables. For the CPS:
Population: The more than 111 million US households (i.e., all
people who share the same living quarters, regardless of how they are
related)
Sample: About 60,000 households interviewed
Sampling Frame? Have you been interviewed?

Example 6: TV Ratings
Market research is designed to discover what consumers want and what
they use. One example of market research is the television-rating service
Nielsen Media Research. These ratings influence how much advertisers will
pay to sponsor a program and whether or not the program stays on the air.
For the Nielsen national TV ratings:
Population: The over 111 million US households that have a
television set
Sample: About 25,000 households that agree to use a “people meter”
to record the TV viewing of all people in the household
Sampling Frame? Do you use a people meter?

Example 7: GSS
Social science research makes heavy use of sampling. The General Social
Survey (GSS), carried out every second year by the National Opinion
Research Center at the University of Chicago, is the most important social
science sample survey. The variables cover the subject’s personal and
family background, experiences and habits, and attitudes and opinions on
subjects from abortion to war.
Population: Adults (aged 18 and over) living in the United States who
can be interviewed in English, excluding adults in prisons and college
dormitories (& homeless)
Sample: About 3,000 adults interviewed in person in their homes.
Sampling Frame? Have you been interviewed?
Example 8: Helping welfare mothers find jobs
Most adult recipients of welfare are mothers of young children. Observational studies of
welfare mothers show that many are able to increase their earnings and leave the welfare
system. Some take advantage of the voluntary job-training programs to improve their
skills. Should participating in job-training and job-search programs be required of all
able-bodied welfare mothers? Observational studies of the current system cannot tell us
what the effects of such a policy would be. Even if the mothers studied are a properly
chosen sample of all welfare recipients, those who seek out training and find jobs may
differ in many ways from those who do not. They are observed to have more education,
for example, but they may also differ in values and motivation, things that cannot be
observed.
* To see if a required jobs program will help mothers escape welfare, such a program
must actually be tried. Choose two similar groups of mothers when they apply for
welfare. Require one group to participate in a job-training program, but do not offer the
program to the other group. This is an experiment. Comparing the income and work
record of the two groups after several years will show whether requiring training has the
desired effect.
* If we hope the training will raise earnings, is it ethical to offer it to some women and
not to others?

Key Concepts
Knowing about statistical methods will have practical consequences in
your everyday lives
Experiments versus Observational Studies
Common Terms (Individuals, Population, Sampling Frame, Sample,
Sample Survey, Census, Variable)
Data and Variables: identify, classify, and describe the Who, What,
When, Where, Why and How
NOW IT’S YOUR TURN:
1.1 Federal Funding - Yes or No?
1.2 Posting lectures on the class web site - Was this an
Experiment?
Answer the question in Case Study Evaluated

Data Set
The first time you see a data set, ask yourself these questions:
What are the objects of interest?
What variables were measured?
What are the units of measurement?
How were the variables measured?
Who collected the data?
How did they collect the data?
Where were the data collected?
Why did they collect the data?
A data file consists of rows and columns, where each row represents a
unique individual (or object), and each column represents a variable (or
characteristic) that describes the individuals.
Numerical variables describe quantities of the individuals, and
Categorical variables describe qualities of the individuals.
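
The row/column layout of a data file can be sketched in a few lines of Python. This is only an illustration: the three students and the variable names (age_years, credits, major, commuter) are made up, and the rule "numbers are numerical, text is categorical" is a simplifying assumption (a numeric code such as a ZIP code would really be categorical).

```python
# Hypothetical data file: each row (a dict) is one individual,
# each key is a variable. All values are made up for illustration.
data_file = [
    {"age_years": 19, "credits": 15, "major": "Biology", "commuter": "yes"},
    {"age_years": 22, "credits": 12, "major": "History", "commuter": "no"},
    {"age_years": 20, "credits": 16, "major": "Nursing", "commuter": "yes"},
]


def variable_types(rows):
    """Label each variable numerical or categorical from its stored values.

    Simplifying assumption: a variable is numerical when every value is
    an int or float, and categorical otherwise.
    """
    types = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        is_number = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in values
        )
        types[name] = "numerical" if is_number else "categorical"
    return types


types = variable_types(data_file)
print(len(data_file), "individuals,", len(types), "variables")
for name, kind in types.items():
    print(name, "->", kind)
```

Here the units of measurement are written into the variable name (age_years), which answers one of the questions on the slide directly in the data file.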

Morals to Remember Ch01
Simple summaries of data can tell an interesting story and are easier to digest than
long lists.
When discussing the change in the rate or risk of occurrence of something, make
sure you also include the base rate or baseline risk.
A representative sample of only a few thousand, or perhaps even a few hundred,
can give reasonably accurate information about a population of many millions.
An unrepresentative sample, even a large one, tells you almost nothing about the
population.
Cause-and-effect conclusions cannot generally be made on the basis of an
observational study.
Unlike with observational studies, cause-and-effect conclusions can generally be
made on the basis of randomized experiments.
A “statistically significant” finding does not necessarily have practical importance.
When a study reports a statistically significant finding, find out the magnitude of
the relationship or difference. A secondary moral to this story is that the implied
direction of cause and effect may be wrong. In this case, it could be that people
who were more lonely and depressed were more prone to using the Internet. And
as the follow-up research makes clear, remember that “truth” does not necessarily
remain fixed across time. Any study should be viewed in the context of society at
the time it was done. [Utts & Heckard - Mind on Statistics - Brooks-Cole (2007)]

Exercise Ch01
1.10 What is the population? For each of the following sampling
situations, identify the population as exactly as possible. That is, say what
kind of individuals the population consists of and say exactly which
individuals fall in the population. If the information given is not sufficient,
complete the description of the population in a reasonable way. (a) A
sociologist is interested in determining the extent to which teens are
self-motivated. She selects a sample of four high schools in a large city
and interviews all tenth-graders in each of the schools. (b) The lecturer in
a large introductory mathematics course is concerned about the accuracy
with which multiple-choice tests are graded by her teaching assistants.
After the most recent test, she selects a sample of the exams and regrades
them. (c) The host of a local radio talk show wonders if people who are
actively religious are happier than those who are not. The station receives
calls from 48 listeners who voice their opinions.

Exercise (answer) Ch01
**Answers
1.10 What is the population? Exact descriptions of the populations may
vary. (a) Teenagers (or “tenth-graders,” but from the description of the
situation, the researcher would like information about all teens). (b) The
most recent set of exams (or perhaps, all exams for the course). (c) All
adults, or everyone (the radio host’s question does not necessarily exclude
children from consideration).
Multiple choice Ch01
The monthly government sample survey that produces the unemployment
rate and other data about employment and earnings is called
(a) the National Household Survey. (b) the General Social Survey. (c) the
Survey of Employment. (d) the Current Population Survey.
Answer: (d)
.......................................................................
Each month, the commissioner of the Bureau of Labor Statistics appears
before Congress. His most recent testimony (October 3, 2008) began,
“Thank you for the opportunity to discuss the September employment and
unemployment data that we released this morning. The unemployment
rate was unchanged at 6.1 percent ...” The large sample survey that
produces monthly data on employment and unemployment is called the
(a) General Social Survey. (b) Current Population Survey. (c) Federal
Employment Survey. (d) Gallup Poll.
Answer: (b)

Exercise 2 Ch01
Identifying Data Sets. In a recent survey, 1500 adults in the United
States were asked if they thought there was solid evidence of global
warming. Eight hundred fifty-five of the adults said yes. Identify the
population and the sample. Describe the sample data set.
**Solution.
The population consists of the responses of all adults in the United States,
and the sample consists of the responses of the 1500 adults in the United
States in the survey.
The sample is a subset of the responses of all adults in the United States.
The sample data set consists of 855 yes’s and 645 no’s.
.......................................................................
Definition [Chapter 02]
* A ‘statistic’ is a numerical description of a sample characteristic (e.g.,
percentage of yes in sample, i.e., 855/1500 = 57%).
* A ‘parameter’ is a numerical description of a population characteristic
(e.g., percentage of yes in population, i.e., ?).
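
The arithmetic behind the statistic in the global warming survey can be checked with a short Python sketch. The variable names are illustrative; the counts (1500 adults, 855 yes) come from the exercise, and the population parameter stays unknown.

```python
# Sample data from the survey in the exercise above.
sample_size = 1500
yes_count = 855
no_count = sample_size - yes_count  # 645 no's

# The statistic: proportion of "yes" in the sample.
p_hat = yes_count / sample_size

# The parameter (proportion of "yes" among all US adults) is unknown;
# the statistic is what we use to estimate it.
print(f"no's in sample: {no_count}")
print(f"sample proportion of yes: {p_hat:.0%}")
```

Only the sample proportion can be computed from the data; the "?" in the definition of the parameter remains a question mark no matter how carefully the sample is summarized.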

Ch02 - Samples, Good and Bad
Thought Questions 1,2. . .
Popular magazines often contain surveys that ask their readers to answer
questions about hot topics in the news. Do you think the responses the
magazines receive are representative of public opinion? Explain why or
why not.
.......................................................................
A survey on poverty and welfare included the following question, “Do you
agree with the popular notion that government policy should attempt to
assist those individuals who have had the misfortune to end up living in
poverty by providing them with much needed financial assistance until
they can get back on their feet?” Based on the wording, do you think the
author of this question was looking for support or opposition to welfare
programs? Explain.
Thought Questions 3,4. . .
The Cable News Network (CNN) often asks its viewers to call the network
with their opinions on certain political issues, like whether or not they
favor current foreign policy. Do you think the results of these polls
represent the feelings of the general population? Do you think they
represent the feelings of all those watching CNN at the time? Explain.
.......................................................................
Suppose you access an online listing of all courses at your institution,
alphabetized by department, to determine what proportion of all courses
have a statistics course as a prerequisite. If you decide to sample 50
courses in order to get a representative sample of courses, how would you
select them? Would it be appropriate to simply select the first 50 courses
listed?

Good & Bad Samples
Objective: To obtain samples that are representative of the population
(i.e., as little bias as possible, a reduced picture of the whole)
Biased sampling methods
The design of a statistical study is biased if it systematically favors
certain outcomes
Convenience Sampling: Selection of whichever individuals are easiest
to reach
Voluntary Response Sample: Individuals responding to a general
appeal, e.g., write-in or call-in opinion polls.
Convenience Sampling and Voluntary Response Sample are biased!
Example 1: At The Mall
Manufacturers and advertising agencies often use interviews at shopping
malls to gather information about the habits of consumers and the
effectiveness of ads. A sample of mall shoppers is fast and cheap. But
people contacted at shopping malls are not representative of the entire US
population. They are richer, for example, and more likely to be teenagers or
retired. Moreover, the interviewers tend to select neat, safe-looking
individuals from the stream of customers.
Mall samples are biased: they systematically over-represent some parts
of the population (prosperous people, teenagers, and retired people) and
under-represent others. The opinions of such a convenience sample may be
very different from those of the population as a whole.
Random selection methods are “better”, like “drawing names out of a
hat”. . . Use Tables of Random Numbers or computer software

Example 2: Write-in
Ann Landers once asked the readers of her advice column, “If you had it
to do over again, would you have children?” She received nearly 10,000
responses, almost 70% saying “NO!” Can it be true that 70% of parents
regret having children? Not at all. This is a voluntary response sample.
People who feel strongly about an issue, particularly people with strong
negative feelings, are more likely to take the trouble to respond. Ann
Landers’ results are strongly biased.
Abigail Van Buren (niece of AL) revisited this question in her column and
repeated the poll, and the majority of respondents would have children again.
Write-in and call-in opinion polls are almost sure to lead to strong
bias. In fact, only about 15% of the public have ever responded to a
call-in poll, and these tend to be the same people who call radio talk
shows.

SRS
A Random Sample (RS) of size n consists of n individuals from the
population chosen in such a way that every individual has an equal
chance (or is equally likely) to be in the sample actually selected
A Simple Random Sample (SRS) of size n consists of n individuals
from the population chosen in such a way that every set of n
individuals has an equal chance to be the sample actually selected
A Table of Random Digits is a long string of the ten digits
0, 1, . . . , 9 with the following properties:
1. Each entry in the table is equally likely to be any of the ten digits
0, 1, . . . , 9
2. The entries are independent of each other, i.e., the knowledge of one
part of the table gives no information about any other part
Note: Usually, a table of random digits is organized by lines, and
digits are grouped (Table A) or (Table A 8Ed)

Courses w/Sta Prerequisite
Suppose there are 800 courses at an institution, alphabetized by
department (and numbered 001-800), and you decide to randomly select
50 of them to determine what proportion of all the courses have a
statistics course as a prerequisite
Use a random digits table to select which 50 courses to sample
For example, pick a line and column at random: suppose we get line
111, column 3
Random labels: 605 130 929 700 412 712 . . .
Give details . . .

Example 3 - SRS
Joan’s small accounting firm serves 30 business clients. Joan wants to interview a
sample of 5 clients to find ways to improve client satisfaction. To avoid bias, she
chooses an SRS of size 5. How to do it?
1. Give each client a numerical label, using as few digits as possible. Two digits are
needed to label 30 clients, for instance 01, 02, . . . , 29, 30. It is also correct to
use labels 00 to 29, or 31 to 60
2. Enter Table A anywhere and read two-digit groups. For instance, if we enter
at line 130, then the first 10 two-digit groups in this line are
69 05 16 48 17 87 17 40 95 17. Any two-digit group in Table A is equally
likely to be any of the 100 possible groups 00, 01, . . . , 98, 99. Joan used only
labels 01 to 30, so we ignore repeated two-digit groups and all other two-digit
groups; this is valid because (in Table A) the digits are independent of each other.
3. Thus, of the first 10 labels in line 130, we retain only 3 labels, namely,
05, 16, 17. We continue (on the same line, or with line 131 if needed) until five
labels are chosen.
4. Finally, we form the SRS with the clients corresponding to the chosen labels
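
The table-reading procedure of Example 3 (read fixed-width groups, keep only valid labels, skip repeats) can be sketched in Python. The function name read_labels is made up for illustration; the digit groups are exactly the ones quoted on the slides for lines 130 and 111 of Table A, and no further table digits are assumed.

```python
def read_labels(groups, valid, size):
    """Scan table groups in order; keep valid labels, skipping repeats.

    Stops early once `size` labels are chosen. If the supplied groups run
    out first, fewer labels are returned and more of the table is needed.
    """
    chosen = []
    for group in groups:
        if group in valid and group not in chosen:
            chosen.append(group)
        if len(chosen) == size:
            break
    return chosen


# Example 3: clients labeled 01..30, first 10 two-digit groups of line 130.
line_130 = "69 05 16 48 17 87 17 40 95 17".split()
client_labels = {f"{k:02d}" for k in range(1, 31)}
clients = read_labels(line_130, client_labels, size=5)
print(clients)  # ['05', '16', '17'] so far; keep reading to reach five

# Courses example: labels 001..800, three-digit groups from line 111.
line_111 = "605 130 929 700 412 712".split()
course_labels = {f"{k:03d}" for k in range(1, 801)}
courses = read_labels(line_111, course_labels, size=50)
print(courses)  # 929 is skipped because no course has that label
```

Both runs stop short of the requested sample size because only the quoted digits are used; with the full table one would simply pass more groups.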

SRS - How to?
Choose an SRS in two steps
1. Labeling. Assign a numerical label to every individual in the
population (or sampling frame). If you are planning to use a table of
random digits, then be sure that all labels have the same number of
digits.
2. Software or Table. Use random digits to select labels at random.

Stratified RS (Ch4!)
Choose a Stratified Random Sample in two steps
1. Strata. Divide the population into groups of similar individuals, called
strata. Choose the strata according to any special interest you have in
certain groups within the population or because the individuals in
each stratum resemble each other.
2. SRS. Take a separate SRS in each stratum and combine these to
make up the complete sample.

Example 4 SRS using software http://www.randomizer.org
Repeat Example 3. We ask the Randomizer to generate one set of numbers with
five numbers per set. We specified the number range as 1 to 30. We requested
that each number remain unique, and that the numbers be sorted from least to
greatest. We asked to view the outputted numbers with the place markers off.
After clicking on the “Randomize Now” button, we obtain the result (which may
be different each time we try!)
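
The same request (one set, five unique numbers, range 1 to 30, sorted) can be sketched with Python's standard random module instead of the web site. This is a stand-in for illustration, not the Randomizer's own method; the function name simple_random_sample is made up.

```python
import random


def simple_random_sample(low, high, size, rng=random):
    """Draw `size` distinct labels from low..high and sort them.

    random.sample draws without replacement, so every set of `size`
    labels is equally likely, which is what an SRS requires.
    """
    return sorted(rng.sample(range(low, high + 1), size))


labels = simple_random_sample(1, 30, 5)
print(labels)  # a different set of five client labels on most runs
```

As with the Randomizer, the output changes from run to run; passing a seeded generator such as random.Random(1) as rng would make a run repeatable.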

*Example 8 Ch 4: Stratifying a sample of students
A large university has 30,000 students, of whom 3,000 (10%) are graduate
students. An SRS of 500 students gives every student the same chance to
be in the sample: 500/30000 = 1/60. We expect an SRS of 500 to contain
only about 50 grad students, and a sample of size 50 is not large enough to
estimate grad students’ opinions with reasonable accuracy. We may prefer a
stratified sample of 200 grad students and 300 undergraduates
*Check also Multistage Sample Design and other sampling techniques
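
The numbers in the stratified example can be worked through in a short Python sketch. The labeling (1 to 3000 for graduate students, 3001 to 30000 for undergraduates) and the seed are assumptions made only so the code runs; the sizes 30,000, 3,000, 500, 200 and 300 come from the example.

```python
import random
from fractions import Fraction

N_TOTAL, N_GRAD = 30_000, 3_000
N_UNDER = N_TOTAL - N_GRAD

# Plain SRS of 500: every student has the same chance of selection.
srs_chance = Fraction(500, N_TOTAL)    # 1/60
expected_grads = N_GRAD * srs_chance   # about 50 grad students expected

# Hypothetical labels: 1..3000 are grads, 3001..30000 are undergrads.
grads = range(1, N_GRAD + 1)
undergrads = range(N_GRAD + 1, N_TOTAL + 1)


def stratified_sample(strata, sizes, rng):
    """Take a separate SRS in each stratum and combine the results."""
    sample = []
    for stratum, size in zip(strata, sizes):
        sample.extend(rng.sample(stratum, size))
    return sample


rng = random.Random(2013)  # arbitrary seed, only for repeatability
sample = stratified_sample([grads, undergrads], [200, 300], rng)

grad_chance = Fraction(200, N_GRAD)     # 1/15
under_chance = Fraction(300, N_UNDER)   # 1/90
print(srs_chance, expected_grads, grad_chance, under_chance, len(sample))
```

The stratified design guarantees exactly 200 graduate students, but a graduate student is now six times as likely to be chosen as an undergraduate (1/15 versus 1/90), so the two strata should be summarized separately or weighted before being combined.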

Sample Surveys (Ch4!)
Errors in sampling
Sampling errors are errors caused by the act of taking a sample. They
cause sample results to be different from the results of a census, e.g.,
undercoverage (recall sampling frame!).
Random sampling error is the deviation between the sample statistic
and the population parameter caused by chance in selecting a random
sample. The margin of error, in a confidence statement, includes only
random sampling error.
Nonsampling errors are errors not related to the act of selecting a
sample from the population. They can be present even in a census,
e.g., processing errors, nonresponse, question wording
The announced margin of error for a sample survey covers only random
sampling error. Undercoverage, nonresponse, and other practical
difficulties can cause large bias that is not covered by the margin of error.

Believe a Poll (Ch4!)
Questions to ask before you believe a poll
1. Who carried out the survey? Even a political party should hire a professional
sample survey firm which follows good survey practices.
2. What was the population? That is, whose opinions were being sought?
3. How was the sample selected? Look for mention of random sampling.
4. How large was the sample? Even better, find out both the sample size and the
margin of error for a 95% confidence level
5. What was the response rate? That is, what percent of the original subjects
actually provided information?
6. How were the subjects contacted? By telephone? Mail? Face-to-face interview?
7. When was the survey conducted? Was it just after some event that might have
influenced opinion?
8. What were the exact questions asked?
Note: Academic survey centers and government statistical offices answer these
questions when they announce the results of a sample survey.

Exercise Ch02
2.4 Instant opinion. The BusinessWeek online poll can be found at the
Web site indicated in the “Notes and Data Sources”. The latest question
appears on the screen, and visitors to the site can simply click buttons to
vote. On March 29, 2007, the question was “Do you think Google is too
powerful?” In all, 1336 (35.9%) said “Yes,” 2051 (55.1%) said “No,” and
335 (9.0%) said “I’m not sure.” (a) What is the sample size for this poll?
(b) At the Web site, BusinessWeek includes the following statement about
its online poll. “Note: These are surveys, not scientific polls.” Explain why
the poll may give unreliable information. (c) Just above the poll question
was the following statement: “Google’s accelerating lead in search and its
moves into software and traditional advertising are sparking a backlash
among rivals.” How might this statement affect the poll results?

Exercise (answer) Ch02
**Answers
2.4 Instant opinion. (a) The sample size is 3,722. (b) Online polls may
not be reliable for a number of reasons (e.g., not everyone has a chance to
access the poll or to know about the poll, we don’t know if people can
answer the poll more than once). (c) This statement might bias more
people to say “Yes” than they otherwise would.
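
The sample size in part (a) and the reported percentages can be verified with a few lines of Python. The dictionary layout is just one convenient way to hold the three counts given in the exercise.

```python
# Vote counts reported for the BusinessWeek online poll.
counts = {"Yes": 1336, "No": 2051, "I'm not sure": 335}

# (a) The sample size is the total number of votes.
sample_size = sum(counts.values())

# Percentage for each answer, rounded to one decimal as in the exercise.
percents = {
    answer: round(100 * n / sample_size, 1) for answer, n in counts.items()
}

print(sample_size)
for answer, pct in percents.items():
    print(f"{answer}: {pct}%")
```

The check confirms the arithmetic only; it says nothing about whether 3,722 self-selected visitors represent any larger population, which is the point of parts (b) and (c).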

Multiple choice Ch02
The student newspaper runs a weekly question that readers can answer online or
by campus mail. One question was “Do you think the college is doing enough to
provide student parking?” Of the 82 people who responded, 79% said “No.”
When we say that the newspaper poll is biased, we mean that
(a) faculty may have a different opinion from students. (b) repeated polls would
give results that are very different from each other. (c) the question asked shows
gender or racial bias. (d) repeated polls would miss the truth about the
population in the same direction.
Answer: (d)
.......................................................................
In a table of random digits,
(a) each pair of digits 00, 01, 02, ..., 99 appears exactly once in any row of the
table. (b) any pair of entries is equally likely to be any of the 100 possible pairs
00, 01, 02, . . . , 99. (c) a specific pair such as 00 cannot be repeated until all
other pairs have appeared. (d) the pair 00 can appear, but 000 is not random
and can never appear in the table.
Answer: (b)
Ch03 - What Do Samples Tell Us?
Chapter 3
Thought Questions. . .
During a medical exam, the doctor measures your cholesterol two times.
Do you think both measurements would be exactly the same? Why or why
not?
.......................................................................
To estimate the percentage of all adults who have an internet connection
in their homes, a properly chosen sample of 1100 adults across the US was
surveyed, and 60% said “yes”.
How close do you think that is to the percentage of the entire country who
have an internet connection?
Within 30%? 10%? 5%? 1%? Exactly the same?
Taking a Sample
Sampling Terminology
Statistical inference is the process of drawing conclusions
about the population based on a sample
Parameter is a fixed, but unknown, number that describes some
characteristic of the population
Statistic is a known value calculated from a sample; a statistic is
used to estimate a parameter
Bias in repeated samples: the sample statistic consistently
misses the population parameter in the same direction
Variability: different samples from the same population may yield
different values of the sample statistic
Sampling distribution is the distribution of the values a given statistic
takes in repeated random samples; it is essential for statistical inference
JLM (WSU)
39 / 100
Example 1: Proportion in Favor
The proportion of all adults who favor a constitutional amendment (that
would define marriage as being between a man and a woman) is a parameter
describing the population of 220 million US adults. Call it p, for
“proportion.” Alas, we do not know the numerical value of p (Why?).
To estimate p, Gallup took a sample of 2527 adults. The proportion of the
sample who favor such an amendment is a statistic. Call it p̂. It happens
that 1289 of this sample (of size 2527) said that they favor an
amendment, so for this sample p̂ = 1289/2527 = 0.51 (i.e., 51%).
.......................................................................
Because all adults had the same chance of being among the chosen 2527 (SRS), it
seems reasonable to use the statistic p̂ as an estimate of the unknown
parameter p. It is a fact that 51% of the sample favored an amendment
(we know because we asked them). We do not know what percentage of all adults favor
an amendment, but we estimate that about 51% do (Based on what?)
Example 2: Lots of Samples
Figure 3.1 Draw 1000 SRSs of size 100 from the same population
Figure 3.2 Draw 1000 SRSs of size 2527 from the same population as in Figure 3.1.
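The idea behind Figures 3.1 and 3.2 can be tried out with a small simulation. This is only a sketch under stated assumptions: the true proportion p = 0.5 is a made-up value (the real p is unknown), the function names are illustrative, and each respondent is drawn independently, which is a reasonable stand-in for an SRS when the population is far larger than the sample.

```python
import random

def simulate_sample_proportions(p, n, num_samples, seed):
    """Draw num_samples samples of size n and return each sample proportion."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(num_samples):
        # Count how many of the n simulated respondents say "yes".
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions

def middle_95_width(values):
    """Width of the interval that holds the middle 95% of the values."""
    ordered = sorted(values)
    low = ordered[int(0.025 * len(ordered))]
    high = ordered[int(0.975 * len(ordered)) - 1]
    return high - low

# Assumption: p = 0.5 is hypothetical, chosen only for illustration.
small = simulate_sample_proportions(p=0.5, n=100, num_samples=1000, seed=1)
large = simulate_sample_proportions(p=0.5, n=2527, num_samples=1000, seed=1)

print("spread of middle 95%, n = 100: ", middle_95_width(small))
print("spread of middle 95%, n = 2527:", middle_95_width(large))
```

Both sets of sample proportions center near the assumed p (little bias), but the larger samples are much less spread out (less variability).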
Two Types of Error
To reduce bias, use random sampling
To reduce variability, use larger samples
A good sampling method has both small bias and small variability.
Figure 3.3: Bias and variability in shooting arrows at a target. Bias means the archer
systematically misses in the same direction. Variability means that the arrows are scattered
Example 3: TV News
Here is what the TV news announcer says: “A new Gallup Poll finds that
a slim majority of 51% of American adults favor a constitutional amendment
that would define marriage as being between a man and a woman, thus barring
marriages between gay and lesbian couples. The margin of error for the poll was
2 percentage points.” Plus or minus 2% starting at 51% is from 49% to 53%. Most
people think Gallup claims that the truth about the entire population lies in that range.
This is what Gallup actually said: “For results based on a sample of this size, one
can say with 95% confidence that the error attributable to sampling and other
random effects could be plus or minus 2 percentage points for adults.” That is, Gallup
tells us that the margin of error includes the truth about the entire
population for only 95% of all its samples. The “95% confidence” is a
shorthand for that. The news report left out the “95% confidence.”
Margin of Error / Confidence
The amount by which the proportion obtained from the sample ($\hat{p}$) will
differ from the true population proportion ($p$) rarely exceeds the margin of
error.
Typical Margin of Error: $1/\sqrt{n}$
In 95% of surveys, the sample proportion will not differ from the
population proportion by more than the margin of error. (“95%
confidence”)
For instance, to cut the margin of error in half we need to increase
the sample size four times.
More details will be discussed in later chapters... Recall two issues:
Margin of Error: $\hat{p} - 1/\sqrt{n} \le p \le \hat{p} + 1/\sqrt{n}$
95% Confidence, i.e., only 5% of the SRSs may miss $p$ by more than the margin of error
All based on the sampling distribution!
Example 4: What is the MoE?
The Gallup Poll in Example 1 interviewed 2527 people. The margin of
error for 95% confidence will be about
$1/\sqrt{2527} = 1/50.27 = 0.020$, i.e., about 2.0%
Gallup announced a margin of error of 2%, and our quick method agrees
with this. In general our quick method may disagree a bit with Gallup’s for
two reasons. First, polls usually round their announced margin of error to
the nearest whole percent to keep press releases simple. Second, our rough
formula works for an SRS; more details in later chapters
Example 5: MoE and Sample Size
In Example 2, we compared the results of taking many SRSs of size
n = 100 and many SRSs of size n = 2527 from the same population. We
found that the spread of the middle 95% of the sample results was about
five times larger for the smaller samples.
Our quick formula estimates the margin of error for SRSs of size 2527 to
be about 2.0%. The margin of error for SRSs of size 100 is about
$1/\sqrt{100} = 1/10 = 0.1$, i.e., 10%. Because 2527 is roughly 25 times 100
and the square root of 25 is 5, the margin of error is about five times
larger for samples of 100 people than for samples of 2527 people.
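The quick method used in Examples 4 and 5 is easy to compute. The following is a minimal sketch; the function names are illustrative and not part of the course material, and the formula is the rough 1/√n rule for an SRS, not the exact method a polling organization would use.

```python
import math

def quick_margin_of_error(n):
    """Quick-method margin of error for 95% confidence, SRS of size n."""
    return 1 / math.sqrt(n)

def quick_confidence_interval(p_hat, n):
    """Interval p_hat - 1/sqrt(n) to p_hat + 1/sqrt(n)."""
    moe = quick_margin_of_error(n)
    return (p_hat - moe, p_hat + moe)

# Example 4: the Gallup sample of 2527 adults, with p_hat = 1289/2527.
p_hat = 1289 / 2527
print("margin of error, n = 2527:", round(quick_margin_of_error(2527), 3))
print("interval:", quick_confidence_interval(p_hat, 2527))

# Example 5: a sample of 100 has a margin of error about five times larger.
print("margin of error, n = 100: ", quick_margin_of_error(100))

# Quadrupling the sample size cuts the margin of error in half.
print("n = 400 versus n = 100:",
      quick_margin_of_error(400) / quick_margin_of_error(100))
```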
Confidence statements
A confidence statement has two parts: a margin of error and a level of
confidence.
The margin of error says how close the sample statistic lies to the
population parameter.
The level of confidence says what percentage of all possible samples
satisfy the margin of error.
.......................................................................
Population size does not matter: The variability of a statistic from a
random sample does not depend on the size of the population as long as
the population is at least 100 times larger than the sample. For an SRS of
size $n$, the margin of error for 95% confidence is about $1/\sqrt{n}$
Interpreting confidence statements
** We must be careful to interpret confidence intervals correctly.
There is a correct interpretation and many different and creative incorrect
interpretations of the confidence interval $\hat{p} - 1/\sqrt{n} \le p \le \hat{p} + 1/\sqrt{n}$
Correct: “We are 95% confident that the interval from $\hat{p} - 1/\sqrt{n}$ to
$\hat{p} + 1/\sqrt{n}$ actually does contain the true value of the population
proportion $p$.”
This means that if we were to select many different samples of size n and
construct the corresponding confidence intervals, then at least 95% of
them would actually contain the value of the population proportion p.
The level of 95% refers to the success rate of the process being used to
estimate the proportion.
.......................................................................
INCORRECT: “There is a 95% chance that the true value of $p$ will fall
between $\hat{p} - 1/\sqrt{n}$ and $\hat{p} + 1/\sqrt{n}$.”
It would also be incorrect to say that “95% of sample proportions fall
between $\hat{p} - 1/\sqrt{n}$ and $\hat{p} + 1/\sqrt{n}$.”
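The "success rate of the process" reading of a confidence statement can be checked by simulation. This is a sketch under stated assumptions: the true proportion p = 0.51 and the sample size n = 400 are hypothetical values chosen for illustration, respondents are drawn independently, and the function name is illustrative.

```python
import math
import random

def coverage_of_quick_interval(p, n, num_samples, seed):
    """Fraction of simulated samples whose interval p_hat +/- 1/sqrt(n) contains p."""
    rng = random.Random(seed)
    moe = 1 / math.sqrt(n)
    hits = 0
    for _ in range(num_samples):
        # One simulated sample of n respondents and its sample proportion.
        p_hat = sum(1 for _ in range(n) if rng.random() < p) / n
        if p_hat - moe <= p <= p_hat + moe:
            hits += 1
    return hits / num_samples

# Assumption: p = 0.51 and n = 400 are hypothetical values.
coverage = coverage_of_quick_interval(p=0.51, n=400, num_samples=2000, seed=7)
print("fraction of intervals that contain p:", coverage)
```

Each individual interval either contains p or it does not; the 95% describes how often the procedure succeeds over many samples.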
Exercise Ch03
3.8 A sampling experiment. Let us illustrate sampling variability in a
small sample from a small population. Ten of the 25 club members listed
below are female. Their names are marked with asterisks in the list. The
club chooses 5 members at random to receive free trips to the national
convention.
Alonso      Darwin     Herrnstein  Myrdal    Vogt*
Binet*      Epstein    Jimenez*    Perez*    Went
Blumenbach  Ferri      Luo         Spencer*  Wilson
Chase*      Gonzales*  Moll*       Thomson   Yerkes
Chen*       Gupta      Morales*    Toulmin   Zimmer
(a) Draw 20 SRSs of size 5, using a different part of Table A each time.
Record the number of females in each of your samples. Make a histogram
like that in Figure 3.1 to display your results. What is the average number
of females in your 20 samples? (b) Do you think the club members should
suspect discrimination if none of the 5 tickets go to women?
Exercise (answer) Ch03
**Answers
3.8 A sampling experiment. (a) Results will vary (see note following),
but the theoretical set of probabilities is shown following; the theoretical
mean number of women is 2. (b) We would expect only about 1 in 20
samples to have no women in them; this is not impossible, but is at least
unlikely, so it may be enough reason to suspect discrimination.
Theoretical Probabilities:
Number of women  0        1        2        3        4        5
Probability      0.05652  0.25692  0.38538  0.23715  0.05929  0.00474
These theoretical probabilities were computed using a hypergeometric
distribution. 90% of all students who do this exercise will have no women
in two or fewer of their 20 samples, while nearly all (99.5% of all
students) will have no women in no more than four samples.
Multiple choice Ch03
The student newspaper runs a weekly question that readers can answer
online or by campus mail. One question was “Do you think the college is
doing enough to provide student parking?” Of the 82 people who
responded, 79% said “No.”
1
The number 79% is a
(a) statistic. (b) parameter. (c) reliability. (d) margin of error.
Answer: (a)
. .................................................................
2
If we applied the quick method to the poll we would obtain this 95%
confidence interval:
(a) 79% ± 11%. (b) 79% ± 9%. (c) 82 ± 79. (d) 82 ± 11%.
Answer: (a)
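The theoretical probabilities quoted in the answer to Exercise 3.8 can be reproduced with a short hypergeometric calculation. This is a minimal sketch; the function names are illustrative. It also checks the two remarks about how many of a student's 20 samples contain no women, treating the 20 samples as independent draws (an assumption).

```python
from math import comb

def hypergeom_pmf(k, population, successes, draws):
    """P(exactly k successes) when drawing without replacement."""
    return (comb(successes, k) * comb(population - successes, draws - k)
            / comb(population, draws))

# 25 club members, 10 of them women, 5 chosen at random.
probs = [hypergeom_pmf(k, 25, 10, 5) for k in range(6)]
mean_women = sum(k * pr for k, pr in enumerate(probs))

for k, pr in enumerate(probs):
    print(k, "women:", round(pr, 5))
print("mean number of women:", round(mean_women, 5))

def binomial_cdf(max_k, trials, prob):
    """P(at most max_k successes in independent trials)."""
    return sum(comb(trials, k) * prob**k * (1 - prob)**(trials - k)
               for k in range(max_k + 1))

# Chance that at most 2 (or at most 4) of 20 samples contain no women.
at_most_two = binomial_cdf(2, 20, probs[0])
at_most_four = binomial_cdf(4, 20, probs[0])
print("at most 2 all-male samples:", round(at_most_two, 3))
print("at most 4 all-male samples:", round(at_most_four, 3))
```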
Ch05 - Experiments, Good and Bad
Chapter 5
Thought Questions. . .
In an observational study, researchers observe what individuals do (or have
done) naturally, while in an experiment, they randomly assign the
individuals to groups to receive one of several “treatments”.
1. Ex1 Learning on the Web: . . . students learning undergraduate
courses on line were “equal in learning” to students taking the same
courses in class . . .
2. Ch1 - Ex3 Do power lines cause leukemia in children? Electric
currents generate magnetic fields. Strong magnetic fields can disturb
living cells. What about if you live near power lines?
This last example is a situation where an experiment would not be feasible
and thus an observational study is used.
Alzheimer’s disease
In studies to determine the relationship between two conditions (activities,
traits, etc.), one of them is often defined as the explanatory (independent)
variable and the other as the outcome or response (dependent) variable. In
an experiment to determine whether the drug memantine improves
cognition of patients with moderate to severe Alzheimer’s disease, whether
or not the patient received memantine is one variable, and cognitive score
is the other.
Which is the explanatory variable and which is the response variable? How
would you go about randomizing 100 patients to the two treatment groups
(memantine group & placebo group)?
Why is it necessary to randomly assign the subjects, rather than having
the experimenter decide which patients should get which treatment?
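One way to randomize the 100 patients is to shuffle their labels and split the list in half. The following is a minimal sketch: the patient labels 1 to 100, the seed, and the function name are assumptions made for illustration, and a computer shuffle stands in for a table of random digits.

```python
import random

def randomize_two_groups(subject_ids, seed=None):
    """Shuffle the subjects by chance and split them into two equal groups."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return sorted(shuffled[:half]), sorted(shuffled[half:])

# Assumption: patients are labeled 1..100; the seed only makes the run repeatable.
patients = range(1, 101)
memantine_group, placebo_group = randomize_two_groups(patients, seed=2013)

print("memantine group:", memantine_group)
print("placebo group:  ", placebo_group)
```

Because chance alone decides the groups, the experimenter cannot (even unconsciously) steer the healthier patients toward one treatment.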
Common Language
For Experiments or Clinical Trials
Response variable: what is measured as the outcome or result of a
study
Explanatory variable: what we think explains or causes changes in
the response variable (often determines how subjects are split into
groups)
Subjects: the individuals that are participating in a study
Treatments: specific experimental conditions (related to the
explanatory variable) applied to the subjects. If an experiment has
several explanatory variables, a treatment is a combination of specific
values of these variables.
The Experimental units are the objects on which measurements are taken
Ex.: The effect of day care
. . . The Carolina Abecedarian Project has followed a group of children
since 1972. The Abecedarian Project is an experiment in which the
subjects are 111 people who in 1972 were healthy but low-income black
infants in Chapel Hill, North Carolina. All the infants received nutritional
supplements and help from social workers. Half, chosen at random, were
also placed in an intensive preschool program. The experiment compares
these two treatments.
The explanatory variable is just “preschool, yes or no”. There are many
response variables, recorded over more than 30 years, including academic
test scores, college attendance, and employment.
This long and expensive experiment did not completely establish that
good day care makes a big difference in later school and work.
A follow-up question that we would like to answer is: “how good must day
care be to really help children succeed in life” . . .
Back to Ex1, Really?
Do students who take a course via the Web learn as well as those who
take the same course in a traditional classroom? The best way to find out
is to assign some students to the classroom and others to the Web. That is
an experiment. If students choose for themselves whether to enroll in a
classroom or online version of a course, then this is not an experiment (no
treatment is imposed on the subjects).
Actually, students who chose the online course were very different from the
classroom students, e.g., their average score on tests on the course
material given before the courses started was 40.70, against only 27.64 for
the classroom students.
Confounded variables? Lurking variables!
Confounding
A lurking variable is a variable that has an important effect on the
relationship among the variables in a study but is not one of the
explanatory variables studied
Two variables are confounded when their effects on a response
variable cannot be distinguished from each other. The confounded
variables may be either explanatory variables or lurking variables.
Placebo Effect: A placebo is a dummy treatment with no active
ingredients. Many patients respond favorably to any treatment, even a
placebo. The response to a dummy treatment is the placebo effect.
Perhaps the placebo effect is in our minds, based on the trust in the
doctor, expectations of a cure, or else . . . (read Example 3)
Ex1 Confounded Variables
Figure 5.1 Confounding in the Nova Southeastern University study. The influence of course
setting (the explanatory variable) cannot be distinguished from the influence of student
preparation (a lurking variable)
Logic Design
Example 4 Sickle-cell Anemia
. . . is an inherited disorder of the red blood cells . . . The NIH carried out a
clinical trial of the drug hydroxyurea . . . half received placebo . . . schedule
medical checkups . . . Lurking variables affected both groups equally
. . . The experiment was stopped ahead of schedule because the
hydroxyurea group had many fewer pain episodes . . .
We use chance to choose the groups in order to eliminate any systematic
bias in assigning the subjects to groups, i.e., use SRSs to form the groups.
Figure 5.2 The design of a randomized comparative experiment to compare hydroxyurea with a
placebo for treating sickle-cell anemia, for Example 4.
Cause-and-Effect Conclusions
Randomization produces groups of subjects that should be similar in
all respects before we apply the treatments
Comparative design ensures that influences other than experimental
treatments operate equally on all groups
Therefore, differences in the response variable (as far as our knowledge
goes!) must be due to the effects of the treatments
Example 5 Conserving Energy
An electric company considers placing electronic meters in households to show what the cost
would be if the electricity use at that moment continued for a month. One cheaper approach is
to give customers a chart and information about monitoring their electricity use. The
experiment compares these two approaches (meter & chart) and also a
control (information, but no help).
Figure 5.3 The design of a randomized comparative experiment to compare three programs to
reduce electricity use by households, for Example 5
Design & Conclusions
A solution to handle Lurking Variables for an
Experiment: randomize experimental units to receive different
treatments (possible confounding variables should “even out” across
groups)
Observational Study: measure potential confounding variables and
determine if they have an impact on the response (may then adjust
for these variables in the statistical analysis)
If an experiment or observational study finds a difference in two (or more)
groups, is this difference really important?
If the observed difference is larger than what would be expected just by
chance, then it is labeled statistically significant. Rather than relying
solely on the label of statistical significance, also look at the actual results
to determine if they are practically important
Principles of experimental design
The basic principles of statistical design of experiments are:
Control the effects of lurking variables on the response, most simply
by comparing two or more treatments
Randomize: use impersonal chance to assign subjects to treatments
Use enough subjects in each group to reduce chance variation in
the results
An observed effect of a size that would rarely occur by chance is called
statistically significant
Ex 6 (Living longer through religion) and Ex 7 (Sex bias in treating heart
disease?) Case Study Evaluated!
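To get a feel for "an effect that would rarely occur by chance", one option is a re-randomization (permutation) check: reshuffle the group labels many times and see how often chance alone produces a difference at least as large as the one observed. The slides do not prescribe this technique; it is swapped in here as an illustration, and the data below are entirely hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(group_a, group_b, num_shuffles=10000, seed=0):
    """Fraction of random regroupings whose mean difference is at least
    as large (in absolute value) as the observed one."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    count = 0
    for _ in range(num_shuffles):
        rng.shuffle(pooled)  # pretend the treatment labels mean nothing
        new_a = pooled[:len(group_a)]
        new_b = pooled[len(group_a):]
        if abs(mean(new_a) - mean(new_b)) >= observed:
            count += 1
    return count / num_shuffles

# Hypothetical data: number of pain episodes per subject in two groups.
treatment = [2, 3, 1, 4, 2, 3, 2, 1]
placebo = [6, 7, 5, 8, 6, 7, 9, 6]

p_value = permutation_p_value(treatment, placebo)
print("share of shuffles at least as extreme:", p_value)
```

A very small share suggests the observed difference is statistically significant; whether it is practically important is a separate question.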
Exercise Ch05
5.6 Aspirin and heart attacks. Can aspirin help prevent heart attacks?
The Physicians’ Health Study, a large medical experiment involving 22,000
male physicians, attempted to answer this question. One group of about
11,000 physicians took an aspirin every second day, while the rest took a
placebo. After several years the study found that subjects in the aspirin
group had significantly fewer heart attacks than subjects in the placebo
group. (a) Identify the experimental subjects, the explanatory variable and
the values it can take, and the response variable. (b) Use a diagram to
outline the design of the Physicians’ Health Study. (When you outline the
design of an experiment, be sure to indicate the size of the treatment
groups and the response variable. The diagrams in Figures 5.2 and 5.3 are
models.) (c) What do you think the term “significantly” means in
“significantly fewer heart attacks”?
Exercise (answer) Ch05
**Answers
5.6 Aspirin and heart attacks. (a) The subjects are the physicians, the
explanatory variable is medication (aspirin or placebo), and the response
variable is health, specifically whether the subjects have heart attacks. (b)
Random Assignment
  Group 1: 11,000 physicians -> Treatment 1: Aspirin -> Observe heart attacks
  Group 2: 11,000 physicians -> Treatment 2: Placebo -> Observe heart attacks
(c) “Significantly” means “unlikely to have occurred by chance if there
were no difference between the aspirin and placebo groups.”
Multiple choice Ch05
Does using a cell phone while driving make an accident more likely? Researchers
compared telephone company and police records to find 699 people who had cell
phones and were also involved in an auto accident. Using phone billing records,
they compared cell phone use in the period of the accident with cell phone use
in the same period on a previous day. Result: the risk of an accident was four times
higher when using a cell phone.
1
This study is
(a) a randomized comparative experiment. (b) an experiment, but without
randomization. (c) a simple random sample. (d) an observational study, but
not a simple random sample.
Answer: (d)
2
The explanatory variable in this study is
(a) whether the subject had an auto accident. (b) whether the subject was
using a cell phone. (c) the risk of an accident. (d) whether the subject
owned a cell phone.
Answer: (b)
3
An example of a lurking variable that might affect the results of this study is
(a) whether the subject had an auto accident. (b) whether the subject was
using a cell phone. (c) whether the subject was talking to a passenger in the
car. (d) whether the subject owned a cell phone.
Answer: (c)
Ch06 - Experiments in the Real World
Chapter 6
Thought Questions. . .
Suppose you are interested in determining if drinking a glass of red wine
each day helps prevent heartburn.
You recruit 40 adults age 50 and older to participate in an
experiment. You want half of them to drink a glass of red wine each
day and the other half to not do so.
You ask them which they would prefer, and 20 say they would like to
drink the red wine and the other 20 say they would not.
You ask each of them to record how many cases of heartburn they
have in the next six months. At the end of that time period, you
compare the results reported from the two groups.
Give three reasons why this is not a good experiment
Ex1 Rats & Rabbits
Rats & Rabbits that are specially bred to be uniform in their inherited characteristics
are subjects in experiments
To find out if a new breakfast cereal provides good nutrition, we compare
the weight gains of young rats fed the new product and rats fed a standard
diet. The rats are randomly assigned to diets and are housed in large racks
of cages. It turns out that rats in upper cages grow a bit faster than rats
in bottom cages. So, if we put rats fed the new product at the top and
those fed the standard diet below, the experiment is biased.
Cholesterol level . . . human affection . . . All of the rabbit subjects ate the
same diet. Some (chosen at random) were regularly removed from their
cages to have their furry heads scratched by friendly people. The rabbits
who received affection had lower cholesterol . . .
Ex2 Powerful Placebo
One study found that 42% of balding men maintained or increased the
amount of hair on their heads when they took a placebo.
Another study told 13 people who were very sensitive to poison ivy that
the stuff being rubbed on one arm was poison ivy. It was a placebo, but all
13 broke out in a rash. The stuff rubbed on the other arm really was
poison ivy, but the subjects were told it was harmless, and only 2 of the 13
developed a rash.
The strength of the placebo effect (as we can observe!) in medical
treatment is hard to pin down because it depends on the exact
environment. Even how enthusiastic the doctor is seems to matter a lot.
When the ailment is vague and psychological, like depression, some
experts think 3/4 of the effect of drugs is just placebo effect. Certainly,
others disagree!
Double-blinding
In a double-blind experiment neither the subjects nor the people who
work with them know which treatment each subject is receiving. This helps to
control respondent bias
Examples 3 & 4
Sample surveys suffer from nonresponse due to failure to contact some
people selected for the sample and refusal of others to participate.
Experiments with human subjects may suffer from similar problems
Minorities, women, the poor and the elderly have long been under-
represented in clinical trials, for various reasons. Refusals also remain
a problem
Subjects who participate but don’t follow the experimental treatment
(called non-adherers) can also cause bias
Experiments that continue over an extended period of time also suffer
dropouts, i.e., subjects who begin the experiment but do not
complete it
Can we generalize?
Are our findings statistically significant?, i.e., Are they too strong
to often occur just by chance? [or equivalently, our findings would rarely
occur by chance alone?]
The treatments, the subjects or the environment may not be realistic
Ex5 Studying frustration
A psychologist wants to study the effects of failure and frustration on the
relationships among members of a work team. She forms a team of
students, brings them to the psychology lab, and has them play a game
that requires teamwork. The game is rigged so that they lose regularly.
The psychologist observes the students through a one-way window and
notes the changes in their behavior during an evening of game playing. . .
The subjects (students who know they are subjects in an experiment), the
treatment (a rigged game) and the environment (the psychology lab) are
all unrealistic if the goal is to reach conclusions about the effects of
frustration on teamwork in the workplace
Ex6 Center brake lights
Cars sold in the US since 1986 have been required to have a high center brake
light in addition to the usual two brake lights at the rear of the vehicle. This
safety requirement was justified by randomized comparative experiments
with fleets of rental and business cars. The experiments showed that the
third brake light reduced rear-end collisions by as much as 50%.
After a decade in actual use, the Insurance Institute found only a 5%
reduction in rear-end collisions.
Most cars did not have the extra brake light when the experiments were
carried out, so it caught the eye of following drivers. Now that almost all
cars have the third light, it no longer captures attention.
Read Example 7: Are subjects treated too well?
Carolina Abecedarian Project (Ex 2, Ch05) faces the same “too good to
be realistic” question . . .
Ex8 Completely randomized
Effects of TV advertising
. . . All subjects viewed a 40 min TV program that included ads for a digital camera. Some
subjects saw a 30-sec commercial; others a 90-sec version. The same commercial was
repeated either 1, 3 or 5 times during the program. After viewing, all of the subjects
answered questions about their recall of the ad, their attitude toward the camera, and
their intention to purchase it.
Figure 6.1 The treatments in the experiment of Example 8. Combinations of two explanatory variables form 6 treatments
. . . the interaction of several factors can produce effects that could not be predicted from
looking at the effect of each factor alone . . .
Now it’s your turn. Tasty cakes. . . . baking time or temperatures . . .
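The six treatments of Example 8 are all combinations of commercial length and number of repetitions, and a completely randomized design spreads the subjects over them by chance. The following is a sketch only: the source does not say how many subjects took part, so the 120 subject labels here are hypothetical, as are the function name and the seed.

```python
import itertools
import random

# Two explanatory variables: commercial length and number of repetitions.
lengths = ["30-sec", "90-sec"]
repetitions = [1, 3, 5]
treatments = list(itertools.product(lengths, repetitions))

def completely_randomized(subjects, treatments, seed=None):
    """Assign all subjects at random among all treatments, in equal numbers
    when the number of subjects is a multiple of the number of treatments."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    assignment = {t: [] for t in treatments}
    for position, subject in enumerate(shuffled):
        assignment[treatments[position % len(treatments)]].append(subject)
    return assignment

# Assumption: 120 subjects labeled 1..120 (the real count is not given).
assignment = completely_randomized(range(1, 121), treatments, seed=6)

for treatment, group in assignment.items():
    print(treatment, "->", len(group), "subjects")
```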
Ex9 Coke versus Pepsi
Pepsi wanted to demonstrate that Coke drinkers prefer Pepsi when they
taste both colas blind. The subjects, all people who said they were Coke
drinkers, tasted both colas from glasses without brand markings and said
which they liked better. This is a matched pairs design in which each
subject compares the two colas. Because responses may depend on which
cola is tasted first, the order of tasting should be chosen at random for
each subject.
When more than half the Coke drinkers chose Pepsi, Coke claimed that the
experiment was biased. The Pepsi glasses were marked M and the Coke
glasses were marked Q. Aha, said Coke, the results could just mean that
people like the letter M better than the letter Q. The matched pairs design is OK,
but a more careful experiment would avoid any distinction other than Coke
versus Pepsi
Ex10 Men, women & ads
An experiment to compare the effectiveness of three TV commercials for
the same product will want to look separately at the reactions of men and
women, as well as the overall response to the ads . . .
Figure 6.2 A block design to compare the effectiveness of three TV advertisements. Female and male subjects form two blocks
Ex 11 (Comparing welfare) and Statistical Controversies!
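The block design of Example 10 randomizes separately inside each block. This is a minimal sketch under stated assumptions: the block sizes (30 women, 30 men), the subject labels, the ad names, and the function name are all hypothetical.

```python
import random

def block_randomize(blocks, treatments, seed=None):
    """Within each block, assign that block's subjects to treatments at random."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, members in blocks.items():
        shuffled = list(members)
        rng.shuffle(shuffled)  # the randomization happens inside the block
        groups = {t: [] for t in treatments}
        for position, subject in enumerate(shuffled):
            groups[treatments[position % len(treatments)]].append(subject)
        assignment[block_name] = groups
    return assignment

# Assumption: hypothetical blocks of 30 women and 30 men, three ads.
blocks = {
    "women": [f"W{i}" for i in range(1, 31)],
    "men": [f"M{i}" for i in range(1, 31)],
}
ads = ["Ad 1", "Ad 2", "Ad 3"]
design = block_randomize(blocks, ads, seed=10)

for block_name, groups in design.items():
    for ad, group in groups.items():
        print(block_name, ad, len(group), "subjects")
```

Because men and women are randomized separately, each ad is seen by the same number of subjects from each block, so differences between the sexes cannot be confused with differences between the ads.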
Some Techniques
STA 1020
Ch06 - Experiments in the Real World
In a Double-blinding experiment neither the subjects not the people
who works with them know which treatment each subject is receiving.
This help to control experimenter/respondent bias.
In a Completely Randomized experimental design, all experimental
subjects are allocated at random among all treatments.
A block is a group of experimental subjects that are known before
the experiment to be similar in some way that is expected to affect
the response to treatments.
In a block design, the random assignment of subjects to treatments
is carried out separately within each block. Matched Pairs Design is
a particular case.
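The difference between the two designs above can be sketched in Python. This is a minimal illustration, assuming two blocks of six hypothetical subjects each and the three ads of Ex10; the function names and subject labels are made up for the example.

```python
import random

def completely_randomized(subjects, treatments, rng):
    """Shuffle the subjects, then deal them out to the treatments in turn."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

def block_design(blocks, treatments, rng):
    """Carry out a separate completely randomized assignment within each block."""
    return {name: completely_randomized(members, treatments, rng)
            for name, members in blocks.items()}

rng = random.Random(6)  # fixed seed so the example is reproducible
blocks = {
    "women": [f"W{i}" for i in range(1, 7)],  # hypothetical subject labels
    "men": [f"M{i}" for i in range(1, 7)],
}
treatments = ["Ad 1", "Ad 2", "Ad 3"]

assignment = block_design(blocks, treatments, rng)
for block, groups in assignment.items():
    print(block, groups)
```

Because the shuffle is done inside each block, every ad is seen by the same number of women and the same number of men, which a single completely randomized assignment does not guarantee.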
Exercise Ch06
6.8 Fatty acids and depression. A group of medical researchers studied
the effects of the intake of omega-6 fatty acids relative to omega-3 fatty
acids on depression. They randomly assigned 88 highly stressed and
depressed subjects to either a diet high in omega-6 fatty acids relative to
omega-3 fatty acids or a diet with a much lower amount of omega-6 fatty
acids relative to omega-3 fatty acids. They found that subjects generally
showed increased symptoms of depression on the high omega-6 diet
compared with those on the low omega-6 diet. The researchers themselves
cautioned against interpreting these experimental results as a general
warning that diets rich in omega-6 fatty acids increase depression. Why?
Simple or multistage or stratified random samples (or experiments).
Exercise (answer) Ch06
**Answers
6.8 Fatty acids and depression. Since the subjects were “highly stressed
and depressed,” the researchers need to be cautious about releasing a
general warning, since the existing stress and depression was confounded
with the treatments of interest.
Multiple choice Ch06
A study of a drug to prevent hair loss showed that 86% of the men who took it
maintained or increased the amount of hair on their heads. But so did 42% of the
men in the same study who took a placebo instead of the drug. This is an
example of
(a) a sampling error: the study should not have included men whose hair grew
without the drug. (b) the placebo effect: a treatment often works if you believe
that it will work. (c) an error in calculating percentages. (d) failure to use the
double-blind idea.
Answer: (b)
.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
A psychologist wants to know if adults with normal vision can be fooled by a
certain optical illusion. She recruits 50 students from her PSY 120 class and finds
that 42 of them are fooled by the illusion. The biggest potential weakness of
experiments is
(a) they do not give good evidence for cause and effect. (b) they only work when
we can give a placebo. (c) it can be hard to generalize conclusions beyond the
actual subjects to a wider population. (d) informed consent is often not possible.
Answer: (c)
Ch08 - Measuring
Chapter 8
Thought Questions. . .
A local health club is doing a survey to see if there is a relationship
between strength and fitness. They want to measure the fitness and
strength of a sample of 100 members of the club. Which of these two
attributes do you think will be easier for them to measure? Explain.
fitness: ‘the quality of being suitable’ or ‘good physical condition; being in shape’
strength: ‘the property of being physically or mentally strong’ or ‘physical energy or intensity’
.......................................................................
Would you get the same result if you measure (something) again, and
again?
.......................................................................
A study on customer service found that there were more customer
complaints registered at a large local grocery store in the past year than at
a small local market. Is it fair to conclude that the local market had better
customer service? What would be a fairer way to see the numbers?
Ex1 Patients / Measurement
Clinical trials tend to measure things that are easy to measure, e.g., blood pressure,
tumor size, virus concentration in the blood. They often do not directly measure what
matters most to patients (does the treatment really improve their lives?). One study
found that only 5% of trials published between 1980 and 1997 measured the effect of
treatment on patients’ emotional well-being or their ability to function in social settings.
We measure a property of a person or thing when we assign a number to
represent the property, i.e., the property is being quantified.
We often use an instrument to make a measurement. We may have a
choice of the units we use to record the measurements.
The result of measurement is a numerical variable that takes different
values for people or things that differ in whatever we are measuring.
Ex2 Length. . .
To measure the length of a bed, you can use a tape measure as instrument (in inches or
centimeters, as unit of measurement).
You might decide to use the number of people who die in motor vehicle accidents in a
year as a variable to measure highway safety. The government’s Fatal Accident Reporting
System collects data on all fatal traffic crashes (instrument?, unit?).
To measure a student’s readiness for college, you might ask the student to take the SAT
Reasoning exam (instrument?). The variable is the student’s score (in points, between
600 and 2400, combining Writing, Critical Reading and Math sections).
Questions to ask in any statistical study:
1. Exactly how are the variables defined?
2. Are the variables a valid way to describe the properties they claim to
measure?
3. How accurate are the measurements?
Ex3 Measuring unemployment
Each month the Bureau of Labor Statistics (BLS) announces the
unemployment rate for the previous month. People who are not available
for work (e.g., retirees, students who do not want to work while in school,
etc.) are not counted as unemployed. The unemployment rate is the ratio
of the number of people unemployed to the number of people in
the labor force.
The interviewer for BLS cannot simply ask “Are you employed?” Many
questions are needed to classify a person as employed, unemployed, or not in
the labor force.
To complete the exact definition of unemployment rate, the BLS has very
detailed descriptions of what it means to be “in the labor force” and what
it means to be “unemployed”. For instance, if you are on strike but expect
to return to the same job, you count as employed. If you are not working
and did not look for work in the last two weeks, you are not in the labor
force. The details matter.
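The definition above can be written as a short Python sketch. The counts used here are hypothetical round numbers chosen for illustration, not BLS data, and the function name `unemployment_rate` is an assumption of the example.

```python
def unemployment_rate(employed, unemployed):
    """Unemployed people as a percent of the labor force.

    The labor force is employed + unemployed. People who are not
    available for work, or not looking for work, appear in neither count.
    """
    labor_force = employed + unemployed
    if labor_force == 0:
        raise ValueError("the labor force is empty")
    return 100 * unemployed / labor_force

# Hypothetical counts (in thousands), not BLS data.
rate = unemployment_rate(employed=940, unemployed=60)
print(f"Unemployment rate: {rate:.1f}%")
```

Note that moving a person from "unemployed" to "not in the labor force" lowers the rate even though nobody found a job, which is why the exact definitions matter.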
Figure 8.1 The unemployment rate from August 1991 to July 1994. The gap shows the effect of a change in how the
government measures unemployment
Know your variables
A variable is a valid measure of a property if it is relevant or
appropriate as a representation of that property
Often a rate (a fraction, proportion, or percent) at which something
occurs is a more valid measure than a simple count of occurrences
A measurement of a property has predictive validity if it can be used
to predict success on tasks that are related to the property measured.
Often it is difficult to determine if a measurement is valid, especially
for behavioral properties
Ex4 Measuring highway safety
Roads get better. Speed limits increase. Big SUVs replace cars. Enforcement campaigns
reduce drunk driving. How has highway safety changed over time in this changing
environment? The Fatal Accident Reporting System says there were 40,716 deaths in
1994 and 42,642 deaths 12 years later in 2006. The number of deaths has increased. But
the number of licensed drivers rose from 175 million in 1994 to 201 million in 2006. The
number of miles that people drove rose from 2,358 billion to 2,996 billion. If more
people drive more miles, there may be more deaths even if the roads are safer. The
count of deaths is not a valid measure of highway safety.
Rather than a count, we should use a rate. The number of deaths per mile driven takes
into account the fact that more people drive more miles than in the past.
This death rate is the ratio between “motor vehicle deaths” and “100s of millions of
miles driven”, i.e., 42642/29960 = 1.4 for 2006, and 40716/23580 = 1.7 for 1994. That
is a decrease, i.e., there were 18% fewer deaths per mile driven in 2006 than in 1994.
Driving is getting safer. . .
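The arithmetic in Ex4 can be checked with a short Python sketch. The death counts and miles driven are the figures quoted in the example; the variable and function names are assumptions of the sketch.

```python
# Figures quoted in Ex4.
deaths = {1994: 40_716, 2006: 42_642}
miles_in_billions = {1994: 2_358, 2006: 2_996}

def deaths_per_100_million_miles(death_count, billions_of_miles):
    # 1 billion miles = 10 units of 100 million miles
    return death_count / (billions_of_miles * 10)

rate = {year: deaths_per_100_million_miles(deaths[year], miles_in_billions[year])
        for year in deaths}

# Relative change in the rate from 1994 to 2006 (negative means a decrease).
change = (rate[2006] - rate[1994]) / rate[1994]

for year in sorted(rate):
    print(f"{year}: {rate[year]:.1f} deaths per 100 million miles")
print(f"Change in rate: {change:.0%}")
```

The count of deaths goes up between the two years while the rate goes down, which is the point of the example: the rate adjusts for how much driving was done.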
Ex5/6 Achievement/IQ tests
When you take a statistics exam, you hope that it will ask you about the
points of the course syllabus. If it does, the exam is a valid measure of how
much you know about the course material . . . Experts can judge validity by
comparing the test questions with the syllabus of material covered.
.......................................................................
Psychologists would like to measure aspects of the human personality that
cannot be observed directly, such as “intelligence” or “authoritarian
personality”. Some psychologists affirm that IQ tests measure intelligence,
but others disagree. If we cannot agree on exactly what intelligence is, we
cannot agree on how to measure it!
.......................................................................
Accuracy & Reliability
Errors in measurement
We can think about errors in measurement this way:
measured value = true value + bias + random error
A measurement process has bias if it systematically overstates or
understates the true value of the property it measures
A measurement process has random error if repeated measurements
on the same individual give different results. If the random error is
small, we say the measurement is reliable
No measuring process is perfectly reliable. The average of several
repeated measurements of the same individual is more reliable (less
variable) than a single measurement
Read Ex7 (The SAT again), Statistical Controversies
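The error model above (measured value = true value + bias + random error) can be explored with a small simulation. This is a sketch under stated assumptions: the true value, the bias, and the spread of the random error are hypothetical numbers, and the random error is assumed to follow a normal distribution.

```python
import random
import statistics

TRUE_VALUE = 100.0  # hypothetical true value of the property
BIAS = 2.0          # hypothetical systematic error of the instrument
NOISE_SD = 5.0      # hypothetical spread of the random error

def measure(rng):
    """One measurement: true value + bias + random error."""
    return TRUE_VALUE + BIAS + rng.gauss(0, NOISE_SD)

def averaged_measure(rng, n):
    """The average of n repeated measurements of the same individual."""
    return statistics.mean([measure(rng) for _ in range(n)])

rng = random.Random(8)  # fixed seed so the example is reproducible
singles = [measure(rng) for _ in range(2000)]
averages = [averaged_measure(rng, 25) for _ in range(2000)]

print("single measurements: mean", round(statistics.mean(singles), 2),
      "spread", round(statistics.stdev(singles), 2))
print("averages of 25:      mean", round(statistics.mean(averages), 2),
      "spread", round(statistics.stdev(averages), 2))
```

In this sketch the averages vary much less than the single measurements (they are more reliable), but both stay centered near true value + bias: averaging reduces random error and does nothing about bias.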
....................................................................................
Improving reliability, reducing bias. What time is it? Much modern technology requires
very exact measurement of time, such as the Global Positioning System (GPS), which
uses satellite signals to tell you where you are. Time used to start with the earth’s path
around the sun, which lasts one year, but the earth is much too erratic. Since 1967, time
starts with the standard second, defined to be the time required for 9,192,631,770
vibrations of a cesium atom. Physical clocks are bothered by changes in temperature,
humidity and air pressure. The cesium atom does not care.
....................................................................................
Navigation: Measuring latitude (an imaginary line around the Earth parallel to the
equator) position at night in relation to the stars is relatively simple, but measuring the
longitude (angular distance between a point on any meridian and the prime meridian at
Greenwich) position was a ‘problem’ in the past . . .
Ex8 Smart brains?
In the mid-19th century, it was thought that measuring the volume of a human skull
would measure the intelligence of the skull’s owner . . . A professor of surgery showed
that filling a skull with small lead shot, then pouring out the shot and weighing it, gave
a reliable measurement of the skull’s volume. These accurate measurements do not,
however, give a valid measure of intelligence. Skull volume turned out to have no
relation to intelligence or achievement.
Ex9 Really accurate time
NIST’s (National Institute of Standards and Technology) atomic clock is
very accurate but not perfectly accurate. The world standard is
Coordinated Universal Time, compiled by the International Bureau of Weights
and Measures (BIPM) in Sèvres, France. BIPM does not have a better
clock than NIST. It calculates the time by averaging the results of more
than 200 atomic clocks around the world. NIST tells us (after the fact)
how much it misses the correct time by (about 10^(-9) sec).
In the long run, NIST’s measurements of time are not biased (sometimes
shorter, sometimes longer than BIPM’s).
The average (mean) of several measurements is more reliable than a
single measurement.
The National Institute of Standards and Technology (NIST) keeps an even
more accurate atomic clock and broadcasts the results (with some loss in
transmission) by radio, telephone and internet.
Read Ex10 (Measuring unemployment again)
Ex11 Authoritarian personality
In 1950, a group of psychologists developed the “F-scale” as an
instrument to measure authoritarian personality. The F-scale asks how
strongly you agree or disagree with statements such as the following:
Obedience and respect for authority are the most important virtues
children should learn
Science has its place, but there are many important things that can
never be understood by the human mind
Strong agreement with such statements marks you as authoritarian. The
F-scale and the idea of the authoritarian personality continue to be
prominent in psychology, e.g., in studies of prejudice and right-wing
extremist movements.
Read Case Study Evaluated
Exercise Ch08
8.12 Testing job applicants. The law requires that tests given to job
applicants must be shown to be directly job related. The Department of
Labor believes that an employment test called the General Aptitude Test
Battery (GATB) is valid for a broad range of jobs. As in the case of the
SAT, blacks and Hispanics get lower average scores on the GATB than do
whites. Describe briefly what must be done to establish that the GATB
has predictive validity as a measure of future performance on the job.
Exercise (answer) Ch08
**Answers
8.12 Testing job applicants. It must be shown that scores on the GATB predict
future job performance. First, give the GATB to a large number of job applicants
for a broad range of jobs. Then, after some time, rate each applicant’s actual
performance. These ratings should be objective when possible; if workers are rated
by supervisors, the rating should be blind in the sense that the rater does not
know the GATB score. Arranging reliable and unbiased rating of job performance
may be the hardest part of the task. Finally, examine the relationship between
GATB scores and later job ratings. See Constance Holden, “Academy joins the fray
over job testing,” Science, vol. 244 (1989), pp. 1036-1037 for a discussion of the GATB
that contains some nice statistical points. The National Academy of Sciences, reviewing
the evidence, said that: (1) The GATB is valid, and is in fact the best single predictor of
future job performance, beating interviews, educational background, and past work
experience. But the correlation with future ratings is modest, in the neighborhood of
0.3. (2) The GATB predicts just as well for minorities as for whites. (3) Nonetheless,
the GATB scores of lower-scoring minority groups should be adjusted upward to avoid
adverse impact so severe that civil rights law would rule out use of the test.
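The last step of the answer to 8.12, examining the relationship between test scores and later job ratings, is often summarized by a correlation. The sketch below computes the usual (Pearson) correlation in plain Python. The scores and ratings are invented purely to show the mechanics; they are not GATB data, and the helper name `pearson_r` is an assumption of the example.

```python
import math

def pearson_r(xs, ys):
    """Correlation between two equally long lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Invented illustration only: test scores at hiring and blind
# supervisor ratings (1 to 5) collected some time later.
test_scores = [85, 92, 100, 104, 110, 118, 125, 131]
job_ratings = [3.1, 2.4, 3.8, 2.9, 3.5, 3.0, 4.1, 3.6]

r = pearson_r(test_scores, job_ratings)
print(f"correlation between score and later rating: {r:.2f}")
```

A correlation near 0 would mean the test tells us little about later performance; a clearly positive correlation is evidence of predictive validity, even when it is modest.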
Multiple choice Ch08
A psychologist says that scores on a test for “authoritarian personality”
can’t be trusted because the test counts religious belief as authoritarian.
The psychologist is attacking the test’s
(a) validity. (b) reliability. (c) margin of error. (d) confidence level.
Answer: (a)
.......................................................................
In one of the first attempts to discover the speed of light, Simon Newcomb
in 1882 made 66 measurements of the time light takes to travel between
the Washington Monument and his laboratory on the Potomac River. Why
did Newcomb repeat his measurement 66 times and then take the average
of the 66 as his final result?
(a) Averaging several measurements reduces any bias that is present in his
instruments. (b) The average of several measurements is more reliable
(less variable) than a single measurement. (c) Even if a measuring process
is not valid, averaging several measurements made by this process will be
valid. (d) Both (a) and (c) but not (b).
Answer: (b)