Chapter 1
Statistics: a collection of procedures and principles for gaining and analyzing information to educate people and help them make better decisions when faced with uncertainty

Chapter 2
Data: a plural word referring to a collection of numbers or other pieces of information to which meaning has been attached

Chapter 3
Deliberate bias: when survey questions are worded in such a way as to elicit a desired answer
Sample: the people or objects in a study
Population: the larger group from which the people or objects in a study are chosen
Observational study: a study in which we merely observe things about our sample
Randomized experiment: a study in which we randomly assign people to one of two groups
Random assignment: a way of using chance to determine the group membership of each person in a study
Unintentional bias: when survey questions are worded in such a way that the meaning is misinterpreted by a large percentage of the respondents
Desire to please:
Asking the uninformed:
Unnecessary complexity:
Ordering of questions:
Confidentiality versus anonymity: confidentiality means not releasing identifying information about survey respondents; anonymity means the researcher does not know the identity of the respondents
Open question: a survey question in which respondents are allowed to answer in their own words
Closed question: a survey question in which respondents are given a list of alternatives from which to choose their answer
Pilot study or pilot survey: a study in which a small group of people is asked survey questions in open form and their responses are used to create the choices for the closed form
Categorical variables (nominal variables): variables we can place into a category but that may not have any logical ordering
Ordinal variables: variables we can place into categories that have a natural ordering
Measurement variables (quantitative variables): variables for which we can record a numerical value and then order respondents according to those values
Interval variable: a measurement variable (like temperature) for which it makes sense to talk about differences, but not about ratios
Ratio variable: a measurement variable (like pulse rate) that has a meaningful value of zero
Discrete variable: a variable for which you could actually count the possible responses
Continuous variable: a variable that can take any value within a given interval
Valid measurement: a measurement that actually measures what it claims to measure
Reliable measurement: a measurement that will give you or anyone else approximately the same result time after time when taken on the same object or individual
Biased measurement: a measurement that is systematically off the mark in the same direction
Variability: the concept that measurements are likely to differ from one time to the next or from one individual to the next because of unpredictable errors, discrepancies, or natural differences that are not readily explained
Measurement error: the amount by which each measurement differs from the true value
Natural variability: variability that results from changes across time in the system being measured

Chapter 4
Sample survey: a process in which a subgroup, or sample, of a large population is questioned on a set of topics
Experiment: a study that measures the effect of manipulating the environment in some way
Randomized experiment: an experiment in which the manipulation is assigned to participants on a random basis
Explanatory variable: the feature being manipulated in an experiment
Outcome variable (response variable): the result of an experiment
Observational study: a study in which the manipulation occurs naturally rather than being imposed by the experimenter
Case-control study: an observational study that includes an appropriate control group
Meta-analysis: a quantitative review of a collection of studies all done on a similar topic
Case study: an in-depth examination of one or a small number of individuals
Unit: a single individual or object to be measured
Margin of error: a measure of the accuracy of a sample survey
Probability sampling plan: a sampling plan in which everyone in the population has a specified chance of making it into the sample
Population (or universe): the entire collection of units about which we would like information, or the entire collection of measurements we would have if we could measure the whole population
Sample: the collection of units we actually measure, or the collection of measurements we actually obtain
Sampling frame: a list of units from which the sample is chosen; ideally, it includes the whole population
Census: a survey in which the entire population is measured
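As a rough illustration of the sampling frame, sample, and margin of error entries above, the following Python sketch draws a sample from a made-up sampling frame and reports an approximate margin of error. The frame contents, the function names, and the 1/sqrt(n) rule (a common conservative approximation at about 95% confidence) are assumptions for illustration; the glossary itself does not give a formula.

```python
import math
import random


def draw_sample(sampling_frame, n, seed=None):
    """Draw n units so that every group of n units is equally likely.

    random.Random.sample selects without replacement, which matches the
    idea of measuring each chosen unit once.
    """
    rng = random.Random(seed)
    return rng.sample(sampling_frame, n)


def conservative_margin_of_error(n):
    """Approximate 95% margin of error for a sample proportion.

    Assumption: the common conservative rule 1 / sqrt(n).
    """
    return 1 / math.sqrt(n)


# Hypothetical sampling frame listing 10,000 units.
frame = [f"unit-{i}" for i in range(1, 10001)]
sample = draw_sample(frame, 1600, seed=1)

print(len(sample))                                # 1600
print(conservative_margin_of_error(len(sample)))  # 0.025
```

With 1,600 units the approximate margin of error is 2.5 percentage points. Quadrupling the sample size only halves it, which is why very large samples bring diminishing gains in accuracy.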
Simple random sample: a sample in which every conceivable group of people of the required size has the same chance of being the selected sample
Strata: natural groups of population units
Stratified random sample: a sample in which units are collected by first dividing the population of units into groups (strata) and then taking a simple random sample from each
Cluster sampling: a sampling method in which population units are divided into groups (clusters), but rather than sampling within each group, a random sample of clusters is selected and only those clusters are measured
Systematic sampling: a sampling method in which the population list is divided into as many consecutive segments as needed, a starting point is randomly chosen in the first segment, and then each segment is sampled at that same point
Multistage sampling: sampling that may combine methods to sample successively smaller divisions of the population to reach an individual unit
Volunteer response: a situation in which only some members of a selected sample choose to participate in a study

Chapter 5
Treatment: one or a combination of categories of the explanatory variable(s) assigned by the experimenter
Confounding variable: a variable that (1) is related to the explanatory variable, in the sense that individuals who differ for the explanatory variable are also likely to differ for the confounding variable, and (2) affects the response variable
Effect modifier: a subgroup variable that modifies the effect of the explanatory variable on the outcome
Interaction: occurs when the relationship of one of two explanatory variables to the response depends on the value of the other
Experimental units: the smallest basic objects to which we can assign different treatments in a randomized experiment
Observational units: the objects or people measured in any study
Control group: a group in an experiment that is handled identically to the treatment group in all respects, except that it does not receive the active treatment
Placebo: a treatment in a study that looks like the real drug but has no active ingredients
Placebo effect: improvement in health in an experimental subject that is not attributable to the active treatment
Double-blind experiment: an experiment in which neither the participant nor the researcher taking the measurements knows who had which treatment
Single-blind experiment: an experiment in which only one of the two, the participant or the researcher taking the measurements, knows which treatment the participant was assigned
Matched-pair designs: experimental designs that use either two matched individuals or the same individual to receive each of two treatments
Randomized block design (block design): an experimental design in which similar experimental units are first placed together in groups called blocks, then treatments are randomly assigned separately within each block
Repeated-measures designs: designs in which the same participants are measured repeatedly
Ecological validity: the extent to which the variables in a study are measured in their natural setting rather than in the laboratory or some other artificial setting
Retrospective study: an observational study in which participants are asked to recall past events
Prospective study: an observational study in which participants are followed into the future and events are recorded

Chapter 6
(n/a)

Chapter 7
Mean: the numerical average of a data set
Median: the middle value of an ordered data set
Mode: the most common or most frequent value in a data set
Outliers: values that are far removed from the rest of the data in a data set
Range: the difference between the minimum value and the maximum value in a data set
Shape:
Stemplot (stem-and-leaf plot or stem-and-leaf diagram):
Histogram:
Symmetric data set: a data set in which, if you were to draw a line through the center, the picture on one side would be a mirror image of the picture on the other side
Bell-shaped data set: a data set in which the picture is not only symmetric but also shaped like a bell
Unimodal: a data set with a single prominent peak in a histogram or stemplot
Bimodal: a data set with two prominent peaks in a histogram or stemplot
Skewed data set: a data set that is basically unimodal but is substantially off from being bell-shaped
Five-number summary: a summary of numbers showing the lowest value, highest value, median, lower quartile, and upper quartile
Quartile: the median of the lower half or of the upper half of an ordered list of numbers
Lower quartile: the value one quarter of the way up from the bottom of an ordered list
Upper quartile: the value one quarter of the way down from the top of an ordered list
Boxplot (box and whisker plot): a visually appealing and useful way to present a five-number summary
Interquartile range (IQR): the distance between the lower and upper quartiles of an ordered list of numbers
Outlier: any value that is more than 1.5 × IQR beyond the closest quartile
Mean: the numerical average of a set of numbers
Standard deviation: a measure of the spread or variability in the values of a set of numbers
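The five-number summary, interquartile range, and 1.5 × IQR outlier rule defined above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive method: the function names and data are made up, and it assumes that when the list has an odd number of values, the median itself is left out of both halves before the quartiles are taken (textbooks and software differ on this convention).

```python
from statistics import median


def five_number_summary(values):
    """Return (lowest, lower quartile, median, upper quartile, highest).

    Quartiles are the medians of the lower and upper halves of the
    ordered list. Assumption: for an odd count, the middle value is
    excluded from both halves. Requires at least two values.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half = data[:half]
    upper_half = data[half + 1:] if n % 2 else data[half:]
    return (data[0], median(lower_half), median(data),
            median(upper_half), data[-1])


def find_outliers(values):
    """Return values more than 1.5 x IQR beyond the closest quartile."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]


# Hypothetical data with one value far from the rest.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(five_number_summary(scores))  # (1, 2.5, 5, 7.5, 100)
print(find_outliers(scores))        # [100]
```

Here the quartiles are 2.5 and 7.5, so the IQR is 5 and any value below -5 or above 15 counts as an outlier under the rule.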
Variance: the square of the standard deviation

Chapter 8
Frequency curve: a curve that shows the possible values for a measurement and how frequently they occur
Normal distribution (bell-shaped curve, normal curve, Gaussian curve): a symmetric, bell-shaped distribution of a set of numbers
Proportion: the fraction (often given as a percentage) of the population of measurements that falls into a certain range
Percentile: the position of your measurement in comparison with everyone else’s, given as the percentage of the population that falls below it
Standardized score (standard score, z-score): the number of standard deviations an observed value or score falls from the mean
Standard normal curve: a normal curve with a mean of 0 and a standard deviation of 1
Empirical Rule: for any normal curve, approximately 68% of the values fall within 1 standard deviation of the mean in either direction; 95% of the values fall within 2 standard deviations of the mean in either direction; 99.7% of the values fall within 3 standard deviations of the mean in either direction

Chapter 9
Time series: a record of a variable across time, usually measured at equally spaced intervals
Trend: a steady change, either increasing or decreasing, across time
Seasonal component: the component of variation in a time series, where the variation is high in certain months or seasons and low in others every year
Cycle: the irregular (but smooth) unexplainable random fluctuations of a time series
Random fluctuation: the natural variability present in all measurements

Chapter 10
Correlation: a measure of the strength of the linear relationship between two measurement variables
Statistically significant: describes a relationship that is strong enough in the observed sample that it would have been unlikely to occur if there were no relationship in the corresponding population
Regression: the procedure we use to find a straight line that comes as close as possible to the points in a scatterplot
Regression: a numerical method for trying to predict the value of one measurement variable from knowing the value of another one
Deterministic relationship: a relationship in which, if we know the value of one variable, we can determine the value of the other exactly
Statistical relationship: a relationship in which there is variation from the average pattern
Regression line: the straight line that results from a regression
Regression equation: the formula that describes a regression line
Least squares line: the best straight line relating two variables under the most common procedure, which minimizes the sum of the squared vertical distances between the points and the line
Intercept: the value at which a line crosses the vertical axis, that is, the value of the vertical-axis variable when the horizontal-axis variable is zero
Slope: the amount by which the variable on the vertical axis changes when the variable on the horizontal axis increases by one unit
Detrended time series: a time series from which the linear trend has been removed

Chapter 11
(n/a)

Chapter 12
Contingency table: displays the counts of how many individuals fall into the possible combinations of categories for two categorical variables
Cell: each row and column combination in a contingency table
Proportion: the fraction of the total that falls into a particular category of a categorical variable; equivalently, the chance that a randomly selected individual will fall into that category
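To make the Chapter 10 regression entries above (least squares line, intercept, slope) concrete, here is a minimal Python sketch that computes them from paired data. The function name and the data are hypothetical, and the formulas are the standard least squares ones (slope = sum of cross-deviations divided by sum of squared x-deviations; intercept = mean of y minus slope times mean of x), which the glossary does not spell out.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares line for paired data.

    The line minimizes the sum of squared vertical distances between
    the points and the line. Requires at least two distinct x values.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical paired measurements showing a statistical relationship.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
intercept, slope = least_squares_line(hours, scores)
print(round(intercept, 2), round(slope, 2))  # 2.2 0.6

# The regression equation then predicts y for a new x.
predicted = intercept + slope * 6
print(round(predicted, 2))  # 5.8
```

Because the points do not fall exactly on the line, this is a statistical rather than a deterministic relationship: the equation describes the average pattern, and individual values vary around it.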
Odds: a ratio comparing the chance that an individual will fall into a particular category of a categorical variable to the chance that it will not
Baseline risk: the risk associated with something before a treatment or behavior is considered
Relative risk: the ratio of the risks for two categories of an explanatory variable
Odds ratio: compares the odds of an occurrence for two different categories
Simpson’s Paradox: a phenomenon in which omitting a third variable masks, or even reverses, a relationship between categorical variables
Selection ratio: the ratio of the proportion of successful applicants for a job from one group (sex, race, and so on) compared with another group

Chapter 13
Hypothesis test: used to decide whether an observed relationship in a sample provides evidence of a real relationship in the population represented by the sample
Alternative hypothesis (the research hypothesis): in hypothesis testing, what the researchers are interested in showing to be true
Null hypothesis: in hypothesis testing, usually some form of “nothing interesting happening”
Chi-square test: a procedure used in trying to determine if there is a relationship between two categorical variables
p-value: the probability of observing a test statistic as extreme as the one observed, or more so, if the null hypothesis is really true
Level of the test (level of significance, level): the number used as the p-value cutoff for statistical significance
Chi-square statistic: a measure that combines the strength of the relationship with information about the size of the sample to give one summary number
Expected count: for a chi-square test, the counts that would be expected, on average, if there really is no relationship between the two variables (that is, if the null hypothesis really is true)

Chapter 14
Probability (relative frequency): the proportion of time any specific outcome occurs over the long run
Personal probability: the degree to which a given individual believes an event will happen
Coherent: the concept that the personal probability of one event doesn’t contradict the personal probability of another
Mutually exclusive: when two outcomes cannot happen simultaneously
Randomization distribution: the distribution of chi-square statistics we would observe if the null hypothesis were true
Randomization test (permutation test): a test that uses simulation to estimate p-values
Independent events: events that do not influence each other. Knowing that one of them will happen or has happened does not change the probability of the other one happening.
Expected value (EV): the average value of any measurement over the long run

Chapter 15
Simulation: the use of computer models to mimic what might happen in the real world
Permutation: a scrambling of the values in a data set

Chapter 16
Certainty effect: the tendency to give more value to a fixed amount of change in probability if that change results in 100% assurance of a good thing happening or 100% assurance of a bad thing not happening
Possibility effect: the tendency to give more value to a small change in probability when it increases the probability of a good outcome from 0 to a small non-zero amount
Pseudocertainty effect: the tendency for people to pay more to reduce some of the possible risks to zero while not reducing others at all, rather than reducing all risks by some amount that results in the same overall reduction
Heuristic: a simple procedure that helps find adequate, though often imperfect, answers to difficult questions
Availability heuristic: a heuristic that distorts probability estimates by tying them to how readily situations can be brought to mind
Anchor: a reference point
Representativeness heuristic: a heuristic that leads people to assign higher probabilities than are warranted to scenarios that are representative of how we imagine things would happen
Conjunction fallacy: occurs when detailed scenarios involving the conjunction of events are given higher probability assessments than statements of one of the simple events alone
Conservatism: the tendency to change previous probability estimates more slowly than warranted by new data

Chapter 17
Coincidence: a surprising concurrence of events, perceived as meaningfully related, with no apparent causal connection
Gambler’s fallacy: the mistaken idea that the long-run frequency of an event should apply even in the short run
Rule for Sample Means: describes the pattern (frequency curve) of sample means that would result from taking repeated samples of the same size
Confidence interval: an interval of values that a researcher is fairly sure covers the true value for the population
Law of small numbers: the mistaken belief that even small samples are highly representative of the populations from which they are drawn
Confusion of the inverse: the mistaken belief that the conditional probability of event A happening given that event B happened is similar to the conditional probability of event B, given event A
Sensitivity (of a test): the proportion of people who actually have the disease who correctly test positive
Specificity (of a test): the proportion of people who don’t have the disease who correctly test negative

Chapter 18
Price index number: measures prices at one time period relative to another time period, usually as a percentage
Leading economic indicator: an indicator in which the highs, lows, and changes tend to precede or lead similar changes in the economy
Coincident economic indicator: an indicator with changes that coincide with those in the economy
Lagging economic indicator: an indicator whose changes lag behind or follow changes in the economy

Chapter 19
Rule for Sample Proportions: describes the pattern (frequency curve) of sample proportions that would result from taking repeated samples of the same size
Hypothesis testing (significance testing): a statistical technique that uses sample data to attempt to reject the hypothesis that nothing interesting is happening

Chapter 20
Confidence level: accompanies a confidence interval and gives the long-run relative frequency with which the confidence interval procedure produces an interval that covers the true population value
Standard error of the sample proportions (standard error or SEP): the value obtained when the sample proportion is substituted for the population proportion in the standard deviation formula
Confidence interval for a proportion: sample proportion ± multiplier × standard error

Chapter 21
Standard error of the mean (standard error or SEM): the standard deviation for the possible sample means
Confidence interval for a population mean: sample mean ± multiplier × standard error
Student’s t distribution: the distribution from which the multiplier in a confidence interval for a population mean is derived
t-multiplier: the multiplier in the equation for a confidence interval for a population mean
Standard error of the difference in two means (standard error of difference or SED): standard error of difference = $\sqrt{(\mathrm{SEM}_1)^2 + (\mathrm{SEM}_2)^2}$

Chapter 22
Test statistic: the single summary of data on which the decision in a hypothesis test is based
Level of significance: the p-value cutoff; a p-value at or below it is considered small enough to rule out the null hypothesis
Null value: the specific value of a population proportion that researchers are interested in testing
One-sided test or one-tailed test: a hypothesis test where the values above the null value only or below the null value only are included in the alternative hypothesis
Two-sided test or two-tailed test: a hypothesis test where values on either side of the null value are included in the alternative hypothesis
Sample proportion: in hypothesis testing, the corresponding proportion in a sample that is compared to the null value of a population proportion
Null standard error: the standard error computed when we assume that the true population proportion is the null value and use the null value in the standard deviation formula
False negative: in medical tests, when someone is actually diseased but has been told he or she is not
False positive: in medical tests, when someone is actually healthy but has been told he or she is diseased
Type 1 error: an error made when the null hypothesis is true but is rejected
Type 2 error: an error made when the alternative hypothesis is true but the data do not provide convincing evidence that it is true

Chapter 24
Multiple testing: the conducting of many hypothesis tests
Multiple comparisons: making many comparisons through either confidence intervals or hypothesis tests
Bonferroni method: a method developed for handling multiple comparisons, done by dividing up the significance level (or confidence level) and apportioning it across tests (or confidence intervals)

Chapter 25
Vote-counting method: the practice of simply counting how many studies on a topic were statistically significant
Meta-analysis: a collection of statistical techniques for combining studies
Fixed effects model: in meta-analysis, the assumption that all of the studies included samples from similar populations, with a fixed but unknown magnitude of the effect being tested
File drawer problem: a criticism of meta-analysis; the possibility that numerous studies may not be discovered by the meta-analyst

Chapter 26
Informed consent: the idea that participants in experiments are to be told what the research is about and given an opportunity to make an informed choice about whether to participate
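Several of the Chapter 20 to 24 entries above are short formulas, so a small worked sketch may help. The Python below is illustrative only: the function names and numbers are made up, the standard deviation formula for a sample proportion is assumed to be the usual sqrt(p(1 - p)/n), the default multiplier of 2 is an assumed rounding of the roughly 1.96 used for 95% confidence, and the Bonferroni function assumes the level is split equally across tests.

```python
import math


def standard_error_proportion(p_hat, n):
    """SEP: the sample proportion substituted into sqrt(p(1 - p)/n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)


def confidence_interval_proportion(p_hat, n, multiplier=2.0):
    """Sample proportion +/- multiplier x standard error.

    Assumption: multiplier 2 approximates a 95% confidence level.
    """
    se = standard_error_proportion(p_hat, n)
    return (p_hat - multiplier * se, p_hat + multiplier * se)


def standard_error_difference(sem1, sem2):
    """SED: square root of (SEM1 squared + SEM2 squared)."""
    return math.sqrt(sem1 ** 2 + sem2 ** 2)


def bonferroni_level(overall_level, number_of_tests):
    """Per-test significance level when the overall level is split equally."""
    return overall_level / number_of_tests


# Hypothetical survey: 200 of 400 respondents say yes.
low, high = confidence_interval_proportion(0.5, 400)
print(round(low, 3), round(high, 3))   # 0.45 0.55

print(standard_error_difference(3, 4))  # 5.0
print(round(bonferroni_level(0.05, 5), 4))  # 0.01
```

In the survey example the standard error is 0.025, so the interval runs from about 45% to 55%. With five tests and an overall level of 0.05, each individual test would need a p-value of 0.01 or less to count as statistically significant under the equal-split assumption.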