Overview of How To Lie With Statistics by Darrell Huff
With additional insights

Chapter 1 - Sampling Biases
• Response Bias: Tendency for people to over- or under-state the truth
• Non-response Bias: People who complete surveys are systematically different from those who fail to respond (accessibility, pride)
• Representative Sample: One where all sources of bias have been removed (cautionary example: the Literary Digest poll)
• Questionnaire wording/Interviewer effects
• Recall Bias: Tendency for one group to remember prior exposure better than another in retrospective studies

Chapter 2 - Well-Chosen Average
• Arithmetic Mean: Evenly distributes the total among individuals. Can be unrepresentative when measurements are highly right-skewed (e.g. per capita income)
• Median: Value dividing the distribution into two equal parts; the 50th percentile (e.g. median household income)
• Mode: Most frequently observed outcome (rarely reported with numeric data)

Chapter 3 - Little Figures Not There
• Small samples: Estimators have large standard errors and can show seemingly very strong effects by chance
• Low incidence rates: Very large samples are needed for meaningful estimates of low-frequency events
• Significance levels/margins of error: Measures of the strength and precision of inference
• Ranges: Report ranges or standard deviations along with means (e.g. “normal” ranges)
• Distinguish inference about individuals from inference about populations
• Clearly label chart axes

Chapter 4 - Much Ado About Nothing
• Probable Error (PE): Bound that the estimation error stays within with probability 0.5. If the estimator is approximately normal, PE is approximately 0.675 standard errors (old-school measure)
• Margin of Error (ME): Bound that the estimation error stays within with probability 0.95. If the estimator is approximately normal, ME is approximately 2 standard errors
• Clinical (practical) significance: In very large samples an effect may be statistically significant but not significant in a practical sense. Report confidence intervals as well as P-values.
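The probable error and margin of error from Chapter 4 can be computed directly from a standard error. The sketch below is illustrative only: it assumes the estimator is approximately normal, and the poll figures (a sample proportion of 0.5 from 1,000 respondents) and the helper name `error_bound` are hypothetical, not taken from the book.

```python
import math
from statistics import NormalDist


def error_bound(std_error, coverage):
    """Half-width that contains the estimation error with probability
    `coverage`, assuming an approximately normal estimator."""
    # Two-sided bound: put (1 - coverage) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return z * std_error


# Hypothetical poll: sample proportion 0.5 from n = 1000 respondents.
p_hat, n = 0.5, 1000
std_error = math.sqrt(p_hat * (1 - p_hat) / n)

probable_error = error_bound(std_error, 0.50)   # error within this half the time
margin_of_error = error_bound(std_error, 0.95)  # error within this 95% of the time

print(f"standard error:  {std_error:.4f}")
print(f"probable error:  {probable_error:.4f}")
print(f"margin of error: {margin_of_error:.4f}")
```

With a standard error of 1, the two multipliers come out near 0.674 and 1.96, which is where the "0.675 standard errors" and "2 standard errors" rules of thumb come from.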
Chapter 5 - Eye-Catching Graphs
• Choice of ranges on graphs can have a huge impact on interpretation (e.g. percent change)
• Choice of the proportion of y-axis to x-axis can distort as well (very easy to do with modern software)
• Bar charts can also be distorted by starting the bars at a positive value rather than 0, or by trimming everything below an artificial baseline

Chapter 6 - 1-D Pictures
• Bar Charts and Pictorial Graphs should have areas proportional to values (only make comparisons in one dimension)

Chapter 7 - Semiattached Figure
• Target Population: Group about which we want to make inferences
• Study Population: Group or items on which the experiment or survey is conducted
• When comparative studies are conducted among products, treatments, or groups: what is the comparison product, treatment, or group?
• Control for all other potential risk factors when studying the effects of a factor

Chapter 8 - Causal Relationships
• Correlation does not imply causation
• Elements of causal relationships
  – Association between Y and X
  – Clear time ordering (X precedes Y)
  – Removal of alternative explanations (controlling for other factors)
  – Dose-Response (when possible)
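To make "removal of alternative explanations" concrete, here is a minimal sketch using a small made-up data set (not from the book). A hypothetical third factor Z shifts both X and Y, so X and Y are strongly correlated overall, yet the association disappears once Z is held fixed. The `pearson` helper and all values are assumptions chosen for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: within each level of Z, X and Y vary independently,
# but moving from Z = 0 to Z = 1 shifts both X and Y up by 4.
offsets = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
rows = [(z, 4 * z + dx, 4 * z + dy) for z in (0, 1) for dx, dy in offsets]

# Ignoring Z: X and Y look strongly associated.
overall = pearson([x for _, x, _ in rows], [y for _, _, y in rows])

# Controlling for Z: compute the correlation separately within each level.
within = {
    z: pearson([x for zz, x, _ in rows if zz == z],
               [y for zz, _, y in rows if zz == z])
    for z in (0, 1)
}

print(f"correlation ignoring Z: {overall:.2f}")
for z, r in within.items():
    print(f"correlation within Z = {z}: {r:.2f}")
```

Here the overall correlation is 0.8 while the correlation within each level of Z is 0, so the apparent X-Y relationship is explained entirely by Z. Real data will rarely separate this cleanly.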