Chapter 3 Generating Data

Introduction to Data Collection/Analysis
• Exploratory Data Analysis: Plots and Measures
that describe a set of measurements with no clear
research questions posed.
• Statistical Inference: Methods used to make
statements regarding population(s) based on
sample data
• Statistical Design: Strategy to obtain data to
answer research questions (gameplan)
• Anecdotal Evidence: Information obtained from
individual, high-profile cases (plane crashes,
storms, etc.)
Data Sources
• Available Data: Information previously obtained
and available in libraries and/or the Internet
• Sampling: Selecting a subset from population of
interest and obtaining relevant information from
individuals (observational study)
• Census: Information collected from all
individuals in a population
• Experiment: Individuals are placed in various
conditions by researchers and responses are then
obtained
Experimental Design
• Experimental Units: Individuals participating in
experiment (Humans often called Subjects or Ss)
• Treatment: Specific condition applied to units
• Factor: Explanatory variable used in experiment. Many
experiments have more than 1 factor
• Factor Level: Value that a factor takes on.
• Example: Unplanned Purchases
– 68 subjects selected, response: #unplanned items purchased
– Factors: Store Knowledge and Time Pressure
– Factor Levels: Knowledge(Familiar/Unfamiliar) Time
Pressure(Present/Absent)
– Treatments: 4 Combinations of Knowledge and Time Pressure
Unplanned Purchases Experiment
                          Time Pressure             No Time Pressure
Familiar Environment      17 subjects, Mean=2.29    17 subjects, Mean=3.62
Unfamiliar Environment    17 subjects, Mean=2.13    17 subjects, Mean=7.68
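As a rough illustration, the cell means from the unplanned purchases experiment can be combined into marginal means and a simple interaction contrast. This sketch assumes the first mean listed for each environment belongs to the Time Pressure condition and the second to No Time Pressure; the variable and function names are illustrative and not part of the original study.

```python
# Cell means (unplanned items purchased), 17 subjects per cell.
# Assumption: the first mean in each row is the Time Pressure cell.
cell_means = {
    ("Familiar", "Time Pressure"): 2.29,
    ("Familiar", "No Time Pressure"): 3.62,
    ("Unfamiliar", "Time Pressure"): 2.13,
    ("Unfamiliar", "No Time Pressure"): 7.68,
}


def marginal_mean(factor_index, level):
    """Average the cell means sharing one factor level (cells are equal-sized)."""
    values = [m for key, m in cell_means.items() if key[factor_index] == level]
    return sum(values) / len(values)


knowledge_means = {lvl: marginal_mean(0, lvl) for lvl in ("Familiar", "Unfamiliar")}
pressure_means = {
    lvl: marginal_mean(1, lvl) for lvl in ("Time Pressure", "No Time Pressure")
}

# Effect of removing time pressure within each level of store knowledge.
effect_familiar = (
    cell_means[("Familiar", "No Time Pressure")]
    - cell_means[("Familiar", "Time Pressure")]
)
effect_unfamiliar = (
    cell_means[("Unfamiliar", "No Time Pressure")]
    - cell_means[("Unfamiliar", "Time Pressure")]
)

# Interaction contrast: how much the time-pressure effect differs by knowledge.
interaction = effect_unfamiliar - effect_familiar

print("Knowledge marginal means:", knowledge_means)
print("Time pressure marginal means:", pressure_means)
print(f"Time-pressure effect (familiar):   {effect_familiar:.2f}")
print(f"Time-pressure effect (unfamiliar): {effect_unfamiliar:.2f}")
print(f"Interaction contrast:              {interaction:.2f}")
```

If the two within-store effects were equal, the interaction contrast would be zero. Under the assumed layout the effect is 5.55 items in the unfamiliar store versus 1.33 in the familiar store; whether that difference exceeds chance variation would need a formal test.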
Comparative Experiments
• Goal: Compare two or more conditions (treatments)
• Units assigned at random to receive 1 treatment
(usually, although some designs have each unit
receive each treatment)
• Placebo Effect: Phenomenon where subjects show
improvement even when given a dummy treatment
• Control Group: Subjects that receive a placebo or
non-active agent or no treatment at all
• Biased Design: Favors certain response outcomes
• Randomization: Use of chance to assign units to
treatment conditions
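Random assignment of units to treatments can be sketched in a few lines. This is a minimal example of a completely randomized design, assuming 68 subjects and the 4 treatment combinations from the unplanned purchases example; the function name, subject numbering, and seed are illustrative choices.

```python
import random


def assign_treatments(units, treatments, seed=None):
    """Shuffle the units, then deal them out to treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups


subjects = list(range(1, 69))  # subjects labelled 1..68 (hypothetical labels)
treatments = [
    "Familiar / Time Pressure",
    "Familiar / No Time Pressure",
    "Unfamiliar / Time Pressure",
    "Unfamiliar / No Time Pressure",
]

groups = assign_treatments(subjects, treatments, seed=3)
for treatment, members in groups.items():
    print(f"{treatment}: {len(members)} subjects")
```

Because chance alone decides who lands in which group, the researcher's preferences cannot influence the allocation.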
Principles of Experimental Design
• Control: Removing effects of lurking variables
by comparing two or more treatments
• Randomization: Use of chance to allocate
subjects to treatments. Removes personal
biases. Makes use of tables/computer programs
for random digits
• Replication: Apply treatments to as many units
as possible
• Statistical Significance: Observed effect that
exceeds what could be expected by chance
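One way to make "exceeds what could be expected by chance" concrete is a permutation test: reshuffle the group labels many times and see how often a difference at least as large as the observed one arises by chance. The sketch below is one possible approach, not the method used in the slides, and the response values are made up for illustration.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_shuffles=10000, seed=0):
    """Estimate how often random relabelling gives a mean difference
    at least as large (in absolute value) as the observed one."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_large += 1
    return at_least_as_large / n_shuffles


# Hypothetical responses for two treatment groups (not real data).
treatment = [7, 9, 8, 10, 6, 9, 8, 11]
control = [2, 3, 1, 4, 2, 3, 2, 1]

p_value = permutation_p_value(treatment, control)
print(f"Estimated p-value: {p_value:.4f}")
```

A small estimated p-value means that random relabelling rarely produces a difference this large, which is the sense in which an effect is called statistically significant.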
Miscellaneous Topics
• Blinding: Whenever possible, subject and observer
should be unaware of which treatment was assigned.
When neither knows, the experiment is called “double-blind”
• Realism: Do the conditions in the experiment reflect the real-world setting of interest to investigators?
• Matching: Identifying pairs of units based on some
criteria expected to be related to response, then
randomly assigning one from each pair to each
treatment
• Block Design: Extension of matching to more than 2
groups (subjects can be their own blocks and receive
each treatment in some experiments)
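Matching followed by randomization within pairs can be sketched as follows. This example assumes each unit has a single numeric matching score, pairs units with adjacent scores, and then flips a coin within each pair; the unit labels, scores, and function name are hypothetical.

```python
import random


def matched_pairs_assignment(units, seed=None):
    """units: list of (unit_id, matching_score) tuples.

    Sort by score, pair neighbours, then randomly assign one member of
    each pair to treatment and the other to control.
    """
    if len(units) % 2 != 0:
        raise ValueError("matched pairs need an even number of units")
    rng = random.Random(seed)
    ordered = sorted(units, key=lambda unit: unit[1])
    pairs = []
    for i in range(0, len(ordered), 2):
        pair = [ordered[i][0], ordered[i + 1][0]]
        rng.shuffle(pair)  # chance decides who gets the treatment
        pairs.append({"treatment": pair[0], "control": pair[1]})
    return pairs


# Hypothetical units with a matching score (e.g. a pre-test result).
units = [("A", 52), ("B", 31), ("C", 47), ("D", 29),
         ("E", 60), ("F", 33), ("G", 58), ("H", 45)]

pairs = matched_pairs_assignment(units, seed=1)
for pair in pairs:
    print(pair)
```

A block design generalizes the same idea: units are grouped into blocks of similar units, and the randomization is carried out separately inside each block.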
Sampling Design
• Population: Entire set of individuals of interest to
researcher
• Sample: Subset of population obtained for data
collection/information gathering
• Voluntary Response Sample: Individuals who select
themselves as respondents. Internet polls are an example.
Such samples tend to be very biased.
• Simple Random Sample: Sample selected so that each
group of n individuals is equally likely to be selected
• Probability Sample: Sample chosen by chance
• Stratified Random Sample: Simple Random samples
selected from pre-specified groups (strata)
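The difference between a simple random sample and a stratified random sample can be illustrated with a short sketch. The population, the "urban"/"rural" strata, and the per-stratum sample sizes below are all hypothetical; `random.Random.sample` draws without replacement so that every group of n individuals is equally likely.

```python
import random


def simple_random_sample(population, n, rng):
    """Every group of n individuals is equally likely to be chosen."""
    return rng.sample(population, n)


def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Draw a separate simple random sample inside each stratum."""
    strata = {}
    for individual in population:
        strata.setdefault(stratum_of(individual), []).append(individual)
    return {
        name: rng.sample(members, n_per_stratum[name])
        for name, members in strata.items()
    }


# Hypothetical population: 70 urban and 30 rural individuals.
population = [(i, "urban" if i < 70 else "rural") for i in range(100)]
rng = random.Random(7)

srs = simple_random_sample(population, 10, rng)
stratified = stratified_sample(
    population, lambda individual: individual[1], {"urban": 7, "rural": 3}, rng
)

print("SRS:", srs)
print("Stratified:", stratified)
```

The simple random sample may by chance contain few or no rural individuals, while the stratified sample fixes how many come from each group.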
Miscellaneous Topics in Sampling
• Multistage Sampling: Government surveys tend to have
multiple levels in the sampling process.
Primary Sampling Unit → Block → Clusters of units
• Undercoverage: Groups in the population are not
included in sample
• Nonresponse: Selected individuals who do not respond
• Biases:
– Response Bias: Subject gives answer to please interviewer
– Recall Bias: Tendency for some subjects to remember
past events more readily or accurately than others
– Wording: Questions can be phrased to elicit certain responses
Introduction to Statistical Inference
• Parameter: Number describing a population
$\mu$ - Population mean (quantitative variable)
$p$ - Population proportion with a characteristic (categorical variable)
• Statistic: Number describing a sample
$\bar{x}$ - Sample mean
$\hat{p}$ - Sample proportion: $\hat{p} = \frac{\text{number in sample with characteristic}}{\text{sample size}}$
Parameters are fixed (usually unknown) values.
Statistics vary from one sample to another because different samples contain different individuals.
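The contrast between fixed parameters and varying statistics can be seen in a small simulation. The population below is entirely made up (1,000 individuals with a numeric measurement and a yes/no characteristic); the parameters are computed once from the whole population, while the sample mean and sample proportion are recomputed for each sample of 50.

```python
import random
import statistics

rng = random.Random(11)

# Hypothetical population: (measurement, has_characteristic) per individual.
population = [(rng.gauss(50, 12), rng.random() < 0.4) for _ in range(1000)]

# Parameters: fixed numbers describing the whole population.
mu = statistics.mean(value for value, _ in population)
p = sum(flag for _, flag in population) / len(population)


def sample_statistics(n):
    """Draw one simple random sample of size n; return (x_bar, p_hat)."""
    sample = rng.sample(population, n)
    x_bar = statistics.mean(value for value, _ in sample)
    p_hat = sum(flag for _, flag in sample) / n  # number with characteristic / n
    return x_bar, p_hat


draws = [sample_statistics(50) for _ in range(3)]

print(f"Parameters: mu = {mu:.2f}, p = {p:.3f}")
for i, (x_bar, p_hat) in enumerate(draws, start=1):
    print(f"Sample {i}: x_bar = {x_bar:.2f}, p_hat = {p_hat:.3f}")
```

Each run of `sample_statistics` gives somewhat different values, while `mu` and `p` stay the same.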
Sampling Distributions
• Sampling Distribution: Distribution of values that a
statistic can take on across all samples from the
population.
– Shape: For large samples, the sampling distributions of sample
means and proportions tend to be approximately normal
– Center: The center of the sampling distribution is equal to the
parameter value in the population (unbiased)
– Spread: The spread of the distribution decreases as the sample
size increases (variability of statistic shrinks as sample size
gets larger)
– Margin of error: Bounds on the size of likely sampling error
(difference between sample statistic and population parameter)
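The center and spread properties above can be checked with a simulation of the sample proportion. This is a sketch under stated assumptions: the population proportion of 0.3 is hypothetical, samples are simulated as independent draws, and the margin of error is approximated as two standard deviations of the simulated statistic (a common rule of thumb for roughly 95% of samples, not a formula given in the slides).

```python
import random
import statistics


def sample_proportions(p, n, n_samples, rng):
    """Simulate the sample proportion for n_samples samples of size n."""
    return [
        sum(rng.random() < p for _ in range(n)) / n
        for _ in range(n_samples)
    ]


rng = random.Random(42)
p = 0.3  # hypothetical population proportion

results = {}
for n in (25, 100, 400):
    p_hats = sample_proportions(p, n, 2000, rng)
    center = statistics.mean(p_hats)
    spread = statistics.stdev(p_hats)
    results[n] = (center, spread)
    print(
        f"n={n:4d}  center={center:.3f}  spread={spread:.3f}  "
        f"approx. margin of error={2 * spread:.3f}"
    )
```

The simulated centers should stay close to the population proportion at every sample size, while the spread, and with it the approximate margin of error, shrinks as n grows.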