TERM
Chapter 1
Statistics: a collection of procedures and principles for gaining and
analyzing information to educate people and help them make better
decisions when faced with uncertainty.
CHAPTER SECTION
01
01.02
01
01.03
Chapter 2
Data: a plural word referring to a collection of numbers or other
pieces of information to which meaning has been attached
02
02.01
Chapter 3
Deliberate bias: when survey questions are worded in such a way as to
elicit a desired answer
03
03.02
03
03.03
03
03.05
Sample: people or objects in a study
Population: the larger group from which people or objects in a study
are chosen
Observational Study: a study in which we merely observe things
about our sample
Randomized Experiment: a study in which we randomly assign
people to one of two groups
Random Assignment: a way of determining the group membership
for each person in a study
Unintentional bias: when survey questions are worded in such a way
that the meaning is misinterpreted by a large percentage of the
respondents
Desire to please:
Asking the uninformed:
Unnecessary complexity:
Ordering of questions:
Confidentiality versus anonymity: not releasing identifying
information about survey respondents versus when a researcher does
not know the identity of survey respondents.
Open Question: a survey question in which respondents are allowed
to answer in their own words
Closed Question: a survey question in which they are given a list of
alternatives from which to choose their answer.
Pilot Study or Pilot Survey: a study in which a small group of people
are asked survey questions in open form and their responses are used
to create the choices for the closed form.
Categorical variables (nominal variables): variables we can place
into a category but that may not have any logical ordering
Ordinal Variable: variables we can place into categories that have a
natural ordering
Measurement Variables (quantitative variables): variables for
which we can record a numerical value and then order respondents
according to those values
Interval Variable: a measurement variable in which it makes sense to
talk about differences, but not about ratios. Temperature is a good
example of an interval variable.
Ratio Variable: a measurement variable (like pulse rate) that has a
meaningful value of zero
Discrete Variable: a variable for which you could actually count the
possible responses
Continuous Variable: a variable that can be anything within a given
interval
Valid Measurement: a measurement that actually measures what it
claims to measure
Reliable Measurement: a measurement that will give you or anyone
else approximately the same result time after time when taken on the
same object or individual
Biased Measurement: a measurement that is systematically off the
mark in the same direction
Variability: the concept that measurements are likely to differ from
one time to the next or from one individual to the next because of
unpredictable errors, discrepancies, or natural differences that are not
readily explained
Measurement Error: the amount by which each measurement
differs from the true value
Natural Variability: variability that results from changes across time
in the system being measured
Chapter 4
Sample survey: a process in which a subgroup, or sample, of a large
population is questioned on a set of topics
Experiment: measures the effect of manipulating the environment in
some way
Randomized experiment: an experiment in which the manipulation
is assigned to participants on a random basis
Explanatory variable: the feature in an experiment being
manipulated
04
04.01
Outcome variable (response variable): the result of an experiment
Observational study: a study in which the manipulation occurs
naturally rather than being imposed by the experimenter
Case-control study: an observational study that includes an
appropriate control group
Meta-analysis: a quantitative review of a collection of studies all
done on a similar topic
Case study: an in-depth examination of one or a small number of
individuals
Unit: a single individual or object to be measured.
04
04.02
Margin of error: the measure of accuracy of a sample survey
04
04.03
Probability sampling plan: a sampling plan in which everyone in the
population must have a specified chance of making it into the sample
04
04.04
04
04.05
Population (or universe): the entire collection of units about
which we would like information or the entire collection of
measurements we would have if we could measure the whole
population.
Sample: the collection of units we actually measure or the collection
of measurements we actually obtain.
Sampling frame: a list of units from which the sample is chosen.
Ideally, it includes the whole population.
Census: a survey in which the entire population is measured.
Simple random sample: a sample in which every conceivable group
of people of the required size has the same chance of being the
selected sample
Strata: natural groups of population units
Stratified random sample: a sample in which units are collected by
first dividing the population of units into groups (strata) and then
taking a simple random sample from each
Cluster sampling: a sampling method in which population units are
divided into groups (clusters), but rather than sampling within each
group, a random sample of clusters is selected and only those clusters
are measured
Systematic sampling: a sampling method in which the population list
is divided into as many consecutive segments as needed, a starting
point is randomly chosen in the first segment and then each segment
is sampled at that same point
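The sampling plans defined above can be compared side by side in a short sketch. This is only an illustration under stated assumptions: the population of 100 numbered units, the two strata, and all function names are hypothetical, and the systematic version assumes the list length is a multiple of the sample size.

```python
import random

def simple_random_sample(units, n, rng):
    """Every group of n units has the same chance of being the sample."""
    return rng.sample(units, n)

def stratified_random_sample(strata, n_per_stratum, rng):
    """Take a simple random sample of the same size from each stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def systematic_sample(units, n, rng):
    """Split the list into n consecutive segments, pick a random starting
    point in the first segment, and sample every segment at that point."""
    segment_length = len(units) // n
    start = rng.randrange(segment_length)
    return [units[start + i * segment_length] for i in range(n)]

rng = random.Random(0)               # fixed seed so the sketch is repeatable
population = list(range(1, 101))     # hypothetical unit IDs 1..100
strata = {"low": population[:50], "high": population[50:]}  # hypothetical strata

srs = simple_random_sample(population, 10, rng)
stratified = stratified_random_sample(strata, 5, rng)
systematic = systematic_sample(population, 10, rng)
```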
Multistage sampling: sampling that may combine methods to sample
successively smaller divisions of the population to reach an individual
unit
Volunteer response: a situation in which only some members of a
selected sample choose to participate in a study
Chapter 5
Treatment: one or a combination of categories of the explanatory
variable(s) assigned by the experimenter
04
04.06
05
05.01
05
05.02
Confounding variable: a variable that 1.) is related to the
explanatory variable in the sense that individuals who differ for the
explanatory variable are also likely to differ for the confounding
variable and 2.) affects the response variable
Effect modifier: a subgroup variable that modifies the effect of the
explanatory variable on the outcome
Interaction: occurs when the relationship of one of two explanatory
variables to the response depends on the other one
Experimental units: the smallest basic objects to which we can
assign different treatments in a randomized experiment
Observational units: the objects or people measured in any study
Control group: a group in an experiment which is handled identically
to the treatment group in all respects, except that they don’t receive
the active treatment
Placebo: a treatment in a study that looks like the real drug but has no
active ingredients
Placebo effect: improvement in health in an experimental subject not
attributable to treatment
Double-blind experiment: an experiment in which neither the
participant nor the researcher taking the measurements knows who
had which treatment
Single-blind experiment: an experiment in which only one of the
two, the participant or the researcher taking the measurements, knows
which treatment the participant was assigned
Matched-pair designs: experimental designs that use either two
matched individuals or the same individual to receive each of two
treatments
Randomized block design (block design): an experimental design in
which similar experimental units are first placed together in groups
called blocks, then treatments are randomly assigned separately
within each block
Repeated-measures designs: designs in which the same participants
are measured repeatedly
Ecological validity: the measure of whether the variables in a study
have been removed from their natural setting and are measured in the
laboratory or in some other artificial setting
05
05.03
Retrospective: an observational study in which participants are asked
to recall past events
05
05.04
07
07.01
07
07.02
07
07.03
Prospective: an observational study in which participants are
followed into the future and events are recorded
Chapter 6 (n/a)
Chapter 7
Mean: the numerical average of a data set
Median: the middle value of a data set
Mode: the most common or most frequent value in a data set
Outliers: values that are far removed from the rest of the data in a
data set
Range: the difference between the minimum value and the maximum
value in a data set
Shape:
Stemplot (stem-and-leaf plot or stem-and-leaf diagram):
Histogram:
Symmetric data set: a data set in which, if you were to draw a line
through the center, the picture on one side would be a mirror image of
the picture on the other side
Bell-shaped data set: a data set in which the picture is not only
symmetric but also shaped like a bell
Unimodal: a data set with a single prominent peak in a histogram or
stemplot
Bimodal: a data set with two prominent peaks in a histogram or
stemplot
Skewed data set: a data set that is basically unimodal but is
substantially off from being bell-shaped
Five-number summary: a summary of numbers showing the lowest
value, highest value, median, lower quartile, and upper quartile
Quartile: simply the median of each of the two halves of an ordered
list of numbers
Lower quartile: one quarter of the way from the bottom of an
ordered list.
Upper quartile: one quarter of the way down from the top of an
ordered list
Boxplot (box and whisker plot): a visually appealing and useful
way to present a five-number summary
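A five-number summary can be computed directly from the quartile definition above (the median of each half of the ordered list). The sketch below is a minimal illustration: the function name and the pulse-rate data are hypothetical, and it assumes one common convention, namely that the overall median is left out of both halves when the list has an odd length.

```python
from statistics import median

def five_number_summary(values):
    """Lowest value, lower quartile, median, upper quartile, highest value."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    # Assumption: with an odd count, skip the middle value itself.
    upper_half = ordered[half + 1:] if n % 2 else ordered[half:]
    return {
        "lowest": ordered[0],
        "lower_quartile": median(lower_half),
        "median": median(ordered),
        "upper_quartile": median(upper_half),
        "highest": ordered[-1],
    }

pulse_rates = [54, 58, 60, 62, 64, 66, 68, 70, 72, 75, 95]  # hypothetical data
summary = five_number_summary(pulse_rates)
```

Other textbooks and software packages use slightly different quartile conventions, so results can differ a little for small data sets.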
07
07.04
07
07.05
08
08.01
08
08.03
08
08.04
09
09.04
Interquartile range: the distance between the lower and upper
quartiles of an ordered list of numbers
Outlier: any value that is more than 1.5 × IQR beyond the closest
quartile
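The 1.5 × IQR rule for outliers can be sketched as follows. The function name and the pulse-rate data are hypothetical, and the quartiles are passed in as given values rather than computed here.

```python
def find_outliers(values, lower_quartile, upper_quartile):
    """Return values more than 1.5 x IQR beyond the closest quartile."""
    iqr = upper_quartile - lower_quartile      # interquartile range
    low_fence = lower_quartile - 1.5 * iqr
    high_fence = upper_quartile + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

pulse_rates = [54, 58, 60, 62, 64, 66, 68, 70, 72, 75, 95]  # hypothetical data
# Quartiles 60 and 72 are assumed for this data set; IQR = 12, fences at 42 and 90.
outliers = find_outliers(pulse_rates, lower_quartile=60, upper_quartile=72)
```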
Mean: the numerical average of a set of numbers
Standard deviation: a measure of the spread or variability in the
values of a set of numbers.
Variance: the square of the standard deviation
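The relationship between variance and standard deviation can be checked with a small sketch. It assumes the sample version of the formula, which divides by n - 1; the function names and the scores are hypothetical.

```python
from math import sqrt

def sample_variance(values):
    """Average squared distance from the mean, dividing by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def sample_standard_deviation(values):
    """The standard deviation is the square root of the variance."""
    return sqrt(sample_variance(values))

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean 5
variance = sample_variance(scores)
sd = sample_standard_deviation(scores)
```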
Chapter 8
Frequency curve: a curve that shows the possible values for a
measurement and how often each occurs
Normal distribution (bell-shaped curve, normal curve, Gaussian
curve): a symmetric, bell-shaped distribution of a set of numbers
Proportion: percentage of the population of measurements that falls
into a certain range
Percentile: the position of your measurement in comparison with
everyone else’s
Standardized score (standard score, z-score): the number of
standard deviations an observed value or score falls from the mean
Standard normal curve: a normal curve with a mean of 0 and a
standard deviation of 1
Empirical Rule: for any normal curve, approximately 68% of the
values fall within 1 standard deviation of the mean in either direction;
95% of the values fall within 2 standard deviations of the mean in
either direction; 99.7% of the values fall within 3 standard deviations
of the mean in either direction.
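The standardized score and the Empirical Rule can be checked numerically with Python's standard library. This is a sketch, not part of the original glossary: the function names and the example scale (mean 100, standard deviation 15) are hypothetical.

```python
from statistics import NormalDist

def standardized_score(value, mean, sd):
    """Number of standard deviations the value falls from the mean."""
    return (value - mean) / sd

standard_normal = NormalDist(mu=0, sigma=1)   # mean 0, standard deviation 1

def proportion_within(k):
    """Proportion of a normal population within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

z = standardized_score(130, mean=100, sd=15)   # hypothetical scale
percentile = standard_normal.cdf(z)            # proportion falling below z

for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))
```

The printed proportions come out close to 68%, 95%, and 99.7%, which is why the rule is stated with the word "approximately".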
Chapter 9
Time series: a record of a variable across time, usually measured at
equally spaced intervals
Trend: a steady change, either increasing or decreasing, across
time
Seasonal component: the component of variation in a time series,
where the variation is high in certain months or seasons and low in
others every year
Cycle: the irregular (but smooth) unexplainable random fluctuations
of a time series
Random fluctuation: the natural variability present in all
measurements
Chapter 10
Correlation: a measurement of the strength of a certain type of
relationship between two measurement variables
10
10.01
Statistically significant: a relationship that is strong enough in the
observed sample that it would have been unlikely to occur if there
were no relationship in the corresponding population
10
10.02
Regression: the procedure we use to find a straight line that comes as
close as possible to the points in a scatterplot
10
10.04
12
12.01
12
12.02
Regression: a numerical method for trying to predict the value of one
measurement variable from knowing the value of another one
Deterministic relationship: a relationship in which, if we know the
value of one variable, we can determine the value of the other exactly.
Statistical relationship: a relationship in which there is variation
from the average pattern
Regression line: the resulting line of a regression
Regression equation: the formula that describes a regression line
Least squares line: the line found by the most common procedure for
choosing the best straight line relating two variables; it minimizes
the sum of the squared vertical distances from the points to the line
Intercept: the point where a line crosses the vertical axis, that is,
its value when the variable on the horizontal axis is zero
Slope: the amount by which one variable (the one on the vertical
axis) increases when the other (on the horizontal axis) increases by
one unit
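The intercept and slope of a least squares line can be computed with the standard closed-form formulas. The sketch below is illustrative only: the function name and the height and weight data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope

heights = [60, 62, 64, 66, 68]        # hypothetical, inches
weights = [115, 122, 131, 138, 144]   # hypothetical, pounds
intercept, slope = least_squares_line(heights, weights)

# Regression equation: predicted weight = intercept + slope * height
predicted_at_65 = intercept + slope * 65
```

For this made-up data the slope is about 3.7, meaning each extra inch of height goes with about 3.7 more pounds on average.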
Detrended time series: a time series in which the linear trend is
removed
Chapter 11
n/a
Chapter 12
Contingency table: displays the counts of how many individuals fall
into the possible combinations of categories for two categorical
variables
Cell: each row and column combination in a contingency table
Proportion: the fraction of the total that falls into a particular
category of a categorical variable; equivalently, the chance that a
randomly selected individual will fall into that category.
Odds: the measurement comparing the chance that the individual will
fall into a particular category for a categorical variable to the chance
that it will not
Baseline risk: the risk associated with something before a treatment
or behavior is considered
Relative risk: the ratio of the risks for two categories of an
explanatory variable
Odds ratio: compares the odds of an occurrence for two different
categories
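Risk, odds, relative risk, and odds ratio are easy to confuse, so a small worked sketch may help. All counts and names below are hypothetical: 30 of 100 exposed individuals and 10 of 100 unexposed individuals experience the outcome.

```python
def risk(events, total):
    """Proportion of the group that experiences the outcome."""
    return events / total

def odds(events, total):
    """Chance of the outcome compared with the chance of no outcome."""
    return events / (total - events)

exposed_events, exposed_total = 30, 100       # hypothetical counts
unexposed_events, unexposed_total = 10, 100   # hypothetical counts

baseline_risk = risk(unexposed_events, unexposed_total)
relative_risk = risk(exposed_events, exposed_total) / baseline_risk
odds_ratio = (odds(exposed_events, exposed_total)
              / odds(unexposed_events, unexposed_total))
```

Here the relative risk is 3.0 while the odds ratio is 27/7, about 3.86; the two measures are close only when the outcome is rare in both groups.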
Simpson’s Paradox: a phenomenon in which omitting a third
variable masks, or even reverses, a relationship between categorical
variables
12
12.04
13
13.02
13
13.03
14
14.02
14
14.03
Selection ratio: the ratio of the proportion of successful applicants for
a job from one group (sex, race, and so on) compared with another
group
Chapter 13
Hypothesis test: used to decide whether an observed relationship in a
sample provides evidence of a real relationship in the population
represented by the sample
Alternative hypothesis (the research hypothesis): in hypothesis
testing, what the researchers are interested in showing to be true
Null hypothesis: in hypothesis testing, usually some form of “nothing
interesting happening”
Chi-square test: a procedure used in trying to determine if there is a
relationship between two categorical variables
p-value: the probability of observing a test statistic as extreme as the
one observed or more so if the null hypothesis is really true
Level of the test (level of significance, level): the number used as the
p-value cutoff for statistical significance
Chi-square statistic: a measure that combines the strength of the
relationship with information about the size of the sample to give one
summary number
Expected count: for a chi-square test, the counts that would be
expected, on average, if there really is no relationship between the
two variables (that is, if the null hypothesis really is true)
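Expected counts and the chi-square statistic can be computed from a contingency table in a few lines. This is a minimal sketch with a hypothetical 2 × 2 table and hypothetical function names; it computes only the statistic, not the p-value.

```python
def expected_counts(table):
    """Expected count per cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_statistic(table):
    """Sum over all cells of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum((obs - exp) ** 2 / exp
               for obs_row, exp_row in zip(table, expected)
               for obs, exp in zip(obs_row, exp_row))

observed = [[30, 70],    # hypothetical counts: group 1, outcome yes / no
            [10, 90]]    # hypothetical counts: group 2, outcome yes / no
expected = expected_counts(observed)
chi_square = chi_square_statistic(observed)
```

For this table every row has expected counts of 20 and 80, and the statistic works out to 12.5.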
Chapter 14
Probability (relative frequency): the proportion of time any specific
outcome occurs over the long run
Personal probability: the degree to which a given individual believes
an event will happen
Coherent: the concept that the personal probability of one event
doesn’t contradict the personal probability of another
Mutually exclusive: when two outcomes cannot happen
simultaneously
14
14.04
14
14.06
15
15.01
Randomization distribution: distribution of chi-square statistics we
would observe if the null hypothesis were true
15
15.03
Randomization test (permutation tests): a test that uses simulation
to estimate p-values
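A randomization test can be sketched by scrambling group labels many times and counting how often the scrambled data look at least as extreme as the observed data. Note the swapped-in statistic: for brevity this sketch uses the absolute difference in group means rather than the chi-square statistic mentioned above. The function names, data, and seed are hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(group_a, group_b, n_permutations, rng):
    """Estimate a two-sided p-value for the difference in means by simulation."""
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)   # one permutation: scramble the group labels
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations

rng = random.Random(1)                 # fixed seed so the sketch is repeatable
new_method = [78, 85, 90, 88, 94]      # hypothetical scores
old_method = [70, 72, 81, 75, 79]      # hypothetical scores
p_value = permutation_p_value(new_method, old_method, 2000, rng)
```

The estimate varies slightly with the seed and the number of permutations, which is expected for a simulation-based p-value.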
15
15.04
16
16.02
16
16.03
16
16.04
Independent events: events that do not influence each other.
Knowing that one of them will happen or has happened does
not change the probability of the other one happening.
Expected value (EV): the average value of any measurement over the
long run
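Expected value is computed by weighting each possible outcome by its probability and adding the results. The sketch below uses a hypothetical lottery ticket that costs $1 and pays $500 with probability 1/1000; the function name is also an assumption.

```python
def expected_value(outcomes):
    """outcomes: list of (value, probability) pairs whose probabilities sum to 1."""
    total_probability = sum(p for _, p in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(value * p for value, p in outcomes)

# Net gain is +499 on a win (prize minus ticket price) and -1 otherwise.
ticket = [(499, 0.001), (-1, 0.999)]   # hypothetical game
ev = expected_value(ticket)
```

An expected value of about -0.50 means an average loss of 50 cents per ticket over the long run, even though no single ticket ever loses exactly 50 cents.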
Chapter 15
Simulation: the use of computer models to mimic what might happen
in the real world
Permutation: a scrambling of the values in a data set
Chapter 16
Certainty effect: the tendency to give more value to a fixed amount
of change in probability if that change results in 100% assurance of a
good thing happening or 100% assurance of a bad thing not
happening
Possibility effect: the tendency to give more value to a small change
in probability when it increases the probability of a good outcome
from 0 to a small non-zero amount
Pseudocertainty effect: says that people will pay more to reduce
some of the possible risks to zero and not reduce others at all, rather
than reducing all risks by some amount that results in the same overall
reduction
Heuristic: a simple procedure that helps find adequate, though often
imperfect, answers to difficult questions
Availability heuristic: distorts probability estimates by tying them to
how readily situations can be brought to mind
Anchor: a reference point
Representativeness heuristic: a heuristic that leads people to assign
higher probabilities than are warranted to scenarios that are
representative of how we imagine things would happen
Conjunction fallacy: occurs when detailed scenarios involving the
conjunction of events are given higher probability assessments than
statements of one of the simple events alone
Conservatism: the tendency to change previous probability estimates
more slowly than warranted by new data
Chapter 17
Coincidence: a surprising concurrence of events, perceived as
meaningfully related, with no apparent causal connection
17
17.02
Gambler’s fallacy: the idea that the long-run frequency of an event
should apply even in the short run
17
17.03
17
17.04
18
18.01
18
18.05
19
19.02
Rule for Sample Means: describes the pattern (frequency curve) of
sample means that would result from taking repeated samples of the
same size
19
19.03
Confidence interval: an interval of values that a researcher is fairly
sure covers the true value for the population
19
19.04
20
20.01
Law of small numbers: the fallacy that even small samples are
highly representative of the populations from which they are drawn
Confusion of the inverse: the mistaken belief that the conditional
probability of event A happening given that event B happened is
similar to the conditional probability of event B, given event A
Sensitivity (of a test): the proportion of people who correctly test
positive when they actually have the disease
Specificity (of a test): the proportion of people who correctly test
negative when they don’t have the disease
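Confusion of the inverse shows up clearly when sensitivity and specificity are combined with how common the disease is. The sketch below is a hedged illustration: the function name and the figures (1% base rate, 90% sensitivity, 95% specificity) are hypothetical.

```python
def probability_disease_given_positive(base_rate, sensitivity, specificity):
    """P(disease | positive test), which is NOT the same as the sensitivity,
    P(positive test | disease)."""
    true_positives = base_rate * sensitivity
    false_positives = (1 - base_rate) * (1 - specificity)
    return true_positives / (true_positives + false_positives)

# Hypothetical test: 1% of people have the disease.
p = probability_disease_given_positive(
    base_rate=0.01, sensitivity=0.90, specificity=0.95)
```

With these made-up numbers, only about 15% of people who test positive actually have the disease, even though the test detects 90% of true cases.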
Chapter 18
Price index number: measures prices at one time period relative to
another time period, usually as a percentage
Leading economic indicator: an indicator in which the highs, lows,
and changes tend to precede or lead similar changes in the economy
Coincident economic indicator: an indicator with changes that
coincide with those in the economy
Lagging economic indicator: an indicator whose changes lag behind
or follow changes in the economy
Chapter 19
Rule for Sample Proportions: describes the pattern (frequency
curve) of sample proportions that would result from taking repeated
samples of the same size
Hypothesis testing (significance testing): a statistical technique that
uses sample data to attempt to reject the hypothesis that nothing
interesting is happening
Chapter 20
Confidence level: accompanies a confidence interval and provides
the long-run relative frequency for which the confidence interval
procedure works
Standard error of the sample proportions (standard error or
SEP): the estimated standard deviation of sample proportions,
obtained when the sample proportion is substituted for the population
proportion in the standard deviation formula
20
20.03
21
21.01
21
21.02
22
22.02
22
22.03
22
22.04
Confidence interval for a proportion: a calculation of the sample
proportion ± multiplier × standard error
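The formula sample proportion ± multiplier × standard error can be sketched as below. The function names and the survey figures are hypothetical, and the default multiplier of 2 is an assumption for an approximate 95% confidence level, in line with the Empirical Rule.

```python
from math import sqrt

def standard_error_of_proportion(p_hat, n):
    """SEP: the standard deviation formula with the sample proportion plugged in."""
    return sqrt(p_hat * (1 - p_hat) / n)

def proportion_confidence_interval(p_hat, n, multiplier=2.0):
    """Sample proportion plus or minus multiplier times standard error."""
    margin = multiplier * standard_error_of_proportion(p_hat, n)
    return p_hat - margin, p_hat + margin

# Hypothetical survey: 80 of 400 respondents answer "yes".
p_hat = 80 / 400
low, high = proportion_confidence_interval(p_hat, n=400)
```

For this example the standard error is 0.02, so the interval runs from about 0.16 to 0.24.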
Chapter 21
Standard error of the mean (standard error or SEM): the standard
deviation for the possible sample means
Confidence interval for a population mean: sample mean ±
multiplier × standard error
Student’s t distribution: the distribution from which the multiplier
in a confidence interval for a population mean is derived
t-multiplier: the multiplier in the equation for a confidence interval
for a population mean
Standard error of the difference in two means (standard error of
difference or SED): standard error of difference = square root of
[(SEM1)^2 + (SEM2)^2]
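The SED formula combines the two standard errors of the mean by squaring, adding, and taking the square root. In the sketch below the function names and the sample figures are hypothetical; SEM is computed with the usual formula, standard deviation divided by the square root of the sample size.

```python
from math import sqrt

def standard_error_of_mean(sd, n):
    """SEM = sample standard deviation / square root of sample size."""
    return sd / sqrt(n)

def standard_error_of_difference(sem1, sem2):
    """SED = square root of [(SEM1)^2 + (SEM2)^2]."""
    return sqrt(sem1 ** 2 + sem2 ** 2)

sem1 = standard_error_of_mean(sd=18, n=36)   # hypothetical group 1, SEM = 3
sem2 = standard_error_of_mean(sd=32, n=64)   # hypothetical group 2, SEM = 4
sed = standard_error_of_difference(sem1, sem2)
```

Note that the SED (5 here) is smaller than the sum of the two standard errors (7), because the errors are combined on the squared scale.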
Chapter 22
Test statistic: the single summary of data on which the decision in a
hypothesis test is based
Level of significance: the cutoff that a p-value must fall below to
rule out the null hypothesis
Null value: the specific value of a population proportion that
researchers are interested in testing
One-sided test or a one-tailed test: a hypothesis test where the
values above the null value only or below the null value only are
included in the alternative hypothesis
Two-sided test or a two-tailed test: a hypothesis test where values
on either side of the null value are included in the alternative
hypothesis
Sample proportion: in hypothesis testing, the corresponding
proportion in a sample that is compared to the null value of a
population proportion
Null standard error: the standard deviation of the sample proportion
computed when we assume that the true population proportion is the
null value and use the null value in the standard deviation formula
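The null value, sample proportion, and null standard error come together in the test statistic for a single proportion. The sketch below is an illustration under stated assumptions: the function names and the figures (null value 0.5, 60 successes in 100 trials) are hypothetical, and the p-value uses a normal approximation for a one-sided "greater than" alternative.

```python
from math import sqrt
from statistics import NormalDist

def null_standard_error(null_value, n):
    """Standard deviation of the sample proportion if the null value is true."""
    return sqrt(null_value * (1 - null_value) / n)

def z_test_statistic(sample_proportion, null_value, n):
    """(sample proportion - null value) / null standard error."""
    return (sample_proportion - null_value) / null_standard_error(null_value, n)

# Hypothetical study: 60 successes in 100 trials, null value 0.5.
z = z_test_statistic(sample_proportion=0.60, null_value=0.50, n=100)

# One-sided p-value: chance of a z at least this large if the null is true.
p_value = 1 - NormalDist().cdf(z)
```

A two-sided test would double this p-value, since values on either side of the null value count as evidence for the alternative.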
False negative: in medical tests, when someone is actually diseased
but has been told he or she is not
False positive: in medical tests, when someone is actually healthy but
has been told he or she is diseased
Type 1 error: an error made when the null hypothesis is true but is
rejected
Type 2 error: an error made when the alternative hypothesis is true
but the data do not provide convincing evidence that it is true
Chapter 24
Multiple testing: the conducting of many hypothesis tests
24
24.04
25
25.01
Multiple comparisons: making many comparisons through either
confidence intervals or hypothesis tests
Bonferroni method: a method developed for handling multiple
comparisons, done by dividing up the significance level (or
confidence level) and apportioning it across tests (or confidence
intervals)
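The Bonferroni method amounts to simple division: the overall significance level is split evenly across the tests. The sketch below uses hypothetical function names and hypothetical p-values from five tests.

```python
def bonferroni_level(overall_level, number_of_tests):
    """Significance level to use for each individual test."""
    return overall_level / number_of_tests

def significant_after_bonferroni(p_values, overall_level=0.05):
    """Flag each p-value that falls at or below the per-test cutoff."""
    cutoff = bonferroni_level(overall_level, len(p_values))
    return [p <= cutoff for p in p_values]

p_values = [0.001, 0.02, 0.04, 0.30, 0.012]   # hypothetical results
flags = significant_after_bonferroni(p_values)
```

With five tests and an overall level of 0.05, each test is judged against 0.01, so only the first p-value remains statistically significant even though four of the five are below 0.05.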
Chapter 25
Vote-counting method: the practice of simply counting how many
studies on a topic were statistically significant
Meta-analysis: a collection of statistical techniques for combining
studies
Fixed effects model: in meta-analysis, the assumption that all of
the studies included samples from similar populations, with a fixed
but unknown magnitude of the effect being tested
25.02
File drawer problem: a criticism of meta-analysis, the possibility
that numerous studies may not be discovered by the meta-analyst
25.04
Chapter 26
Informed consent: the idea that participants in experiments are to be
told what the research is about and given an opportunity to make an
informed choice about whether to participate
26
26.01