Chapter 5.1 Data Production

AP Statistics

Observational study: We observe individuals
and measure variables of interest but do not
attempt to influence responses.

Experiment: We deliberately impose some
treatment on individuals in order to observe
their responses.

Pros vs. cons of each? (Control of conditions, etc.: an experiment gives better evidence of cause and effect.)

Population: the entire group of individuals that we
want information about

Sample: a part of the population that we
actually examine in order to gather info

Sampling vs. Census: Sampling studies a part
in order to gain info about the whole, census
attempts to contact every individual in the pop



Voluntary response: People choose
themselves by responding
Convenience sampling: Choosing individuals
who are easiest to reach
Bias: The sampling method is biased if it
systematically favors certain outcomes


The simplest way to use chance to select a
sample is to place names in a hat (the
population) and draw out a handful (the
sample).
SRS (simple random sample): every individual has an
equal chance of getting picked, and every possible
sample of the size you are drawing has an equal
chance of getting picked
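
The names-in-a-hat idea can be sketched in Python. This is only an illustration: the population of 50 numbered labels and the helper name draw_srs are made up for the example. random.sample draws without replacement, so every set of the requested size is equally likely to be chosen.

```python
import random

def draw_srs(population, n, seed=None):
    """Return a simple random sample of size n, drawn without replacement."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population: labels 1..50 standing in for the names in the hat.
population = list(range(1, 51))
sample = draw_srs(population, 5, seed=1)
print(sample)
```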


Table B: long string of digits 0-9, each entry in
table is equally likely to be any of the 10 digits
Choosing SRS with table:
 1. Label: Assign a # label to every individual in the
pop (example: 01-50 for each senior girl @ SYHS)
 2. Table: use table B to select random labels
 3. Stop: indicate when you should stop sampling (toss
out repeated numbers, or numbers out of your range)
 4. Identify sample: use the random #’s to identify
subjects to be selected from your pop. This is your
sample!
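
The four steps above can be sketched in Python. The digit string below is made up for the example (it is not copied from the real Table B), the helper name srs_from_digits is hypothetical, and the code assumes two-digit labels such as 01-50.

```python
def srs_from_digits(digits, low, high, n):
    """Read two-digit groups left to right; keep labels in [low, high], skip repeats."""
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        label = int(digits[i:i + 2])
        # Step 3: toss out labels that are out of range or already chosen.
        if low <= label <= high and label not in chosen:
            chosen.append(label)
        if len(chosen) == n:
            break
    return chosen

# Made-up line of random digits, read as 07 31 99 88 42 12 56 07 31 09 45.
digits = "0731998842125607310945"
sample = srs_from_digits(digits, 1, 50, 5)
print(sample)  # [7, 31, 42, 12, 9]
```

Here 99, 88 and 56 are skipped as out of range, and the second 07 and 31 are skipped as repeats.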

On the calculator: MATH, PRB, randInt(lowest #, highest #, # of
people you want in your sample). randInt can repeat
numbers, so toss out any repeats.

If you use the Catalog Help app (CtlgHelp): instead of
pressing ENTER when randInt( is highlighted in the
PRB menu, press “+” and it will tell you what goes
in the parentheses.
You can store your random numbers in a list:
randInt(1,150,25) STO→ L1
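
For comparison, here is a rough Python analogue of the calculator command, offered as a sketch and not as an exact model of the calculator. Independent draws (like randInt) can repeat a label, while random.sample draws without replacement and so always returns distinct labels.

```python
import random

rng = random.Random(42)

# Analogue of randInt(1,150,25): 25 independent draws, so repeats are possible.
with_repeats = [rng.randint(1, 150) for _ in range(25)]

# Drawing without replacement guarantees 25 distinct labels.
distinct = rng.sample(range(1, 151), 25)

# Number of different labels obtained by each method.
print(len(set(with_repeats)), len(set(distinct)))
```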






Probability sample: samples chosen by chance
Stratified random sample: divide population into
groups (aka strata) that are similar in some way,
then choose a separate SRS in each stratum,
then combine these SRSs to form the full
sample
Cluster sampling: divide population into groups
(aka clusters). Some of these clusters are
randomly selected. Then all individuals in
chosen clusters are selected to be in the sample
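Multistage samples: select the sample in stages (for
example, randomly choose clusters, then take an SRS
within each chosen cluster)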
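
The difference between stratified and cluster sampling can be sketched in Python. The four groups "A" to "D" with 30 members each are made up, and the function names are hypothetical. The same groups are treated first as strata (an SRS from every group) and then as clusters (everyone from a few randomly chosen groups).

```python
import random

def stratified_sample(strata, n_per_stratum, rng):
    """Take a separate SRS from every stratum, then combine them."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep everyone in the chosen ones."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]

rng = random.Random(7)

# Hypothetical population: four groups of 30 labeled individuals.
groups = {g: [f"{g}-{i}" for i in range(1, 31)] for g in ("A", "B", "C", "D")}

strat = stratified_sample(groups, 5, rng)  # 5 from each of the 4 groups
clus = cluster_sample(groups, 2, rng)      # everyone in 2 of the 4 groups
print(len(strat), len(clus))
```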

Undercoverage: occurs when some groups in
the population are left out in the process of
choosing the sample. (It is hard to get an accurate
and complete list of the population, so most
samples suffer from some degree of this.)

Nonresponse: occurs when an individual
chosen for the sample can’t be contacted or
does not cooperate.

The behavior of the respondent or
interviewer can cause response bias in
sample results

Wording of questions can influence answers

Larger random samples give more accurate
(less variable) results than smaller random
samples
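
A small simulation can illustrate the effect of sample size. This is a sketch under made-up assumptions: the population values are generated by the code (roughly normal, mean 100, standard deviation 15) and the helper name is hypothetical. It measures how much the sample mean varies from one SRS to the next at two sample sizes.

```python
import random
import statistics

def spread_of_sample_means(population, n, reps, rng):
    """Standard deviation of the sample mean across many SRSs of size n."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(reps)]
    return statistics.stdev(means)

rng = random.Random(3)

# Made-up population of 10,000 values.
population = [rng.gauss(100, 15) for _ in range(10_000)]

small = spread_of_sample_means(population, 10, 500, rng)
large = spread_of_sample_means(population, 250, 500, rng)
print(small, large)  # the second number should be clearly smaller
```

A larger sample reduces chance variation only; it does not remove bias from a poor sampling method.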




The individuals on which the experiment is
done are the experimental units.
If units are humans, they are called subjects.
The experimental condition applied to the
units (aka the thing we ‘do’ to the people
participating) is called a treatment.
Goal of research is to establish a causal link
between a particular treatment and a
response.



Factors: the explanatory variables we are interested in
(example: Study differences of gender and
alcohol preference. 2 factors: gender, alcohol
preference)
Levels: the ‘categories’ of each factor
(gender has 2 levels: M/F; alcohol, let's say,
has 3 levels: hard liquor/beer/wine)
This is an example of a 2x3 study (2 x 3 = 6 combinations)
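
The combinations in a 2x3 study can be listed with a short Python sketch. The factor and level names are taken from the example above; the dictionary layout is just one convenient way to write them down.

```python
from itertools import product

# Each factor maps to its list of levels.
factors = {
    "gender": ["M", "F"],
    "alcohol": ["hard liquor", "beer", "wine"],
}

# Every combination of one level per factor.
combinations = list(product(*factors.values()))
for combo in combinations:
    print(combo)
print(len(combinations))  # 6
```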

We often use lab experiments to protect us
from lurking variables that can creep in
when conducting experiments ‘in the field’



Even w/control, natural variability occurs
among experimental units.
We would like to see units within a treatment
group responding similarly to one another,
but differently from units in other treatment
groups (then we can be more confident that the
treatment is responsible for the differences).
If we assign many individuals to each
treatment group, the effects of chance (and
individual differences) will average out.

Comparison of the effects of several
treatments is valid only when all treatments
are applied to similar groups of experimental
units.


Experimenters often attempt to match
groups by elaborate balancing (match
patients in a ‘new drug’ and ‘placebo’ group
by age, sex, physical condition, smoker, etc).
This is helpful but not adequate b/c of lurking
variables.
Statistician’s remedy: rely on chance to make
an assignment that doesn’t depend on any
characteristic of the experimental units or the
judgment of the experimenter in any way.
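
Random assignment can be sketched in Python as shuffling the subjects and dealing them out to the treatments. The 20 subject labels and the function name randomize are made up for the example; the ‘new drug’ and ‘placebo’ groups come from the notes above.

```python
import random

def randomize(units, treatments, rng):
    """Shuffle the units, then deal them out to the treatments as evenly as possible."""
    shuffled = units[:]
    rng.shuffle(shuffled)
    k = len(treatments)
    return {t: shuffled[i::k] for i, t in enumerate(treatments)}

rng = random.Random(11)

# Hypothetical subjects.
subjects = [f"subject{i}" for i in range(1, 21)]
groups = randomize(subjects, ["new drug", "placebo"], rng)
for treatment, members in groups.items():
    print(treatment, members)
```

Nothing about a subject (age, sex, condition) and nothing about the experimenter's judgment enters the assignment; only the shuffle decides.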



Randomization produces 2 groups of subjects
we expect to be similar in all respects before
treatment is applied
Comparative design ensures that influences
other than what is being studied operate
equally on both groups
Therefore, measured differences must be due
either to treatment or play of chance in the
random assignment of subjects to 2 groups



1. Control the effects of lurking variables on
the response, most simply by comparing 2 or
more treatments
2. Replicate each treatment on many units
to reduce chance variation in results
3. Randomize – use impersonal chance to
assign experimental units to treatments

We hope to see big differences (differences
so large they are not likely just due to chance
or individual differences).

If we do have an observed effect so large that
it would rarely occur by chance, we call our
result statistically significant
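
One way to judge whether an effect "would rarely occur by chance" is to re-do the random assignment many times on the pooled data and see how often a difference that large shows up. The sketch below uses a permutation approach, which is one of several possible methods and not necessarily the one used in the course. The response values and the function name are made up.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, reps, rng):
    """Share of random re-assignments whose mean difference is at least as
    large (in absolute value) as the one actually observed."""
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = group_a + group_b
    n_a = len(group_a)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / reps

rng = random.Random(5)

# Made-up responses, for illustration only.
treatment = [24, 27, 25, 29, 30, 26, 28, 31]
control = [20, 22, 19, 23, 21, 24, 18, 22]

p = permutation_p_value(treatment, control, 2000, rng)
print(p)  # a small value means the difference would rarely occur by chance
```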


A block is a group of experimental units that
are known before the experiment to be
similar in some way that is expected to
systematically affect the response to
treatments (ex: Testing the effect of weight
lifting on a group of people- men/women will
have obvious differences).
Separate into “blocks” of similar subjects to
reduce the effect of variation
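
A block design can be sketched in Python as a separate random assignment inside each block. The subject labels, the two treatment names and the function name are made up; the men/women blocks follow the weight lifting example above.

```python
import random

def randomize_within_blocks(blocks, treatments, rng):
    """Carry out a separate random assignment inside every block."""
    assignment = {}
    for block_name, units in blocks.items():
        shuffled = units[:]
        rng.shuffle(shuffled)
        # Deal the shuffled units out to the treatments in turn.
        for i, unit in enumerate(shuffled):
            assignment[unit] = (block_name, treatments[i % len(treatments)])
    return assignment

rng = random.Random(2)

# Hypothetical subjects, blocked by sex.
blocks = {
    "men": [f"M{i}" for i in range(1, 9)],
    "women": [f"W{i}" for i in range(1, 9)],
}
plan = randomize_within_blocks(blocks, ["weight lifting", "no weight lifting"], rng)
for unit, (block, treatment) in plan.items():
    print(unit, block, treatment)
```

Each treatment ends up with the same number of men and of women, so differences between the sexes cannot be mistaken for a treatment effect.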



Matching the subjects in various ways can
produce more precise results than simple
randomization
Matched pairs design compares 2
treatments. Subjects matched in pairs.
Fitness example: Pair females with each
other, males with each other. One person in
each pair is randomly assigned to one treatment
group (weights), the other person goes to the
other treatment group (pilates)
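
The matched pairs assignment can be sketched in Python as a coin flip within each pair. The subject labels and the function name are made up; the weights and pilates treatments come from the fitness example.

```python
import random

def assign_matched_pairs(pairs, treatments, rng):
    """Within each pair, a coin flip decides who gets which of the two treatments."""
    plan = []
    for first, second in pairs:
        if rng.random() < 0.5:
            first, second = second, first
        plan.append({treatments[0]: first, treatments[1]: second})
    return plan

rng = random.Random(9)

# Hypothetical pairs: females matched with females, males with males.
pairs = [("F1", "F2"), ("F3", "F4"), ("M1", "M2"), ("M3", "M4")]
plan = assign_matched_pairs(pairs, ("weights", "pilates"), rng)
for row in plan:
    print(row)
```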


Double-blind: neither subject nor
experimenter knows which treatment is
assigned
Lack of realism: subjects or treatments of an
experiment may not realistically duplicate the
conditions we really want to study.