Chapter 2
Collecting Data with Surveys and Scientific Studies

Surveys
• Instruments used to obtain demographic characteristics and attitudes or behavioral tendencies from subjects
• Passive in nature, obtaining “naturally occurring” information
• Many fields conduct surveys regularly:
  – Public Opinion: Gallup, CNN, WSJ, TV networks
  – Government Bureaus: Census, Labor Statistics
  – Business: Customer satisfaction, quality, practices
  – Recreation: State parks and wildlife area usage

Sampling Methods
• Simple Random Sampling: Frame listing all N elements of population exists. Random numbers used to obtain a sample of n elements such that all samples of size n have an equal chance of selection.
• Stratified Random Sampling: Population split into homogeneous groups (strata) based on auxiliary variable(s) such as gender, income, race. Simple random samples taken from each stratum.
• Cluster Sampling: Population broken into a set of clusters (often based on location), and a sample of clusters is selected, with all elements in each sampled cluster measured.
• Systematic Sampling: Element selected at random near top of list, then every kth element thereafter measured.

Survey Problems
• Nonresponse: If people who do not respond tend to differ systematically from responders, results will be biased
• Measurement Problems
  – Recall: Tendency to forget occurrences of certain things or be unable to give accurate counts of frequency of occurrence
  – Leading Questions: Wording of questions can lead to certain responses that can bias survey results
  – Unclear Wording: Different people can interpret the same question in different ways, making results inaccurate when responses depend on interpretations

Survey Techniques
• Personal Interviews: In-person, face-to-face meetings between interviewer and interviewee. Biases can occur due to the interaction.
• Telephone Interviews: Interview over the phone. Less costly than personal interviews.
  Bias can occur due to unlisted numbers and different schedules for different people.
• Self-administered Questionnaire: Inexpensive, but notoriously low response rates. Can be done by mail or on the internet.
• Direct Observation: Measurements made directly using monitoring equipment or public records

Scientific Studies
• Designed Experiment: Investigation to obtain/compare measurements from subjects under various conditions
• Elements of Experiments:
  – Factors: Variables to be controlled by experimenter
  – Measurements/Observations: Responses that are recorded (but not controlled) by the experimenter. Outcome of interest
  – Treatments: Conditions constructed from factor(s) to be assigned to units. Control is “benchmark” condition.
  – Experimental Unit: Physical entity receiving treatment
  – Replication: Treatments are assigned to more than one unit so that experimental error/variation can be measured
  – Measurement Unit: Unit on which observation is made. Could be experimental unit, or a “smaller part” (e.g. student in class)

Treatment Designs
• 1-Factor: Completely Randomized Design
• Multi-Factor: Factorial Treatment Design
  – Full factorial: All combinations of factor levels are observed in experiment.
  – Fractional factorial: Subset of all possible factor level combinations observed (when too many exist)
• Randomized Block Design: Experimental units broken into multiple measurement units (blocks), and treatments assigned at random to measurement units within blocks
• Latin Square Design: Similar to Randomized Block Design, except positions within blocks have effects to be controlled (e.g. tire positions on an automobile)

Factorial Treatment Design in CRD
• 2 Factors: A and B (A has a levels, B has b levels)
• 1-at-a-Time Approach: Vary levels of Factor A while holding Factor B constant, and vice versa. Can obtain main effects for each factor, but not interaction.
• Interaction: When effects of levels of one factor depend on the level of the other factor, and vice versa
• Factorial Treatment Structures: Generate all ab combinations of levels of Factors A and B. Randomly assign experimental units to these treatments as in Completely Randomized Design with one factor.

Statistical Interaction Absent
[Figure: “No Interaction”. Mean response versus Factor A (levels 1, 2, 3), with separate lines for B=1 and B=2.]

Statistical Interaction Present
[Figure: “Interaction Present”. Mean response versus Factor A (levels 1, 2, 3), with separate lines for B=1 and B=2.]

Observational Studies
• Sometimes cannot assign experimental units to treatments due to nature or ethics
  – Gender, race, religion cannot be assigned to subjects
  – Items cannot be assigned at random to manufacturer (they are built by firm)
• Would like to compare factor levels anyway
• More difficult to assess causal relationships, since external factors related to the factors identified in the study may be what cause the observed differences
• Often will attempt to “control” for other factors in analysis
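
One simple way to "control" for an external factor in an observational comparison is to compare the factor levels within strata of that external factor and then combine the within-stratum differences. The sketch below is only an illustration under stated assumptions: the records are made-up values, the labels "A"/"B" and "old"/"new" are hypothetical, and weighting by stratum size is one of several reasonable choices, not a method prescribed by the chapter.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (factor level, external factor stratum, outcome).
# Level "A" is observed mostly in the "old" stratum, level "B" mostly in "new".
records = [
    ("A", "old", 10), ("A", "old", 10), ("A", "old", 10), ("B", "old", 10),
    ("A", "new", 20), ("B", "new", 20), ("B", "new", 20), ("B", "new", 20),
]

def crude_difference(rows):
    """Mean outcome for level A minus level B, ignoring the external factor."""
    a = [y for level, _, y in rows if level == "A"]
    b = [y for level, _, y in rows if level == "B"]
    return mean(a) - mean(b)

def stratified_difference(rows):
    """Size-weighted average of the within-stratum A-minus-B differences."""
    by_stratum = defaultdict(lambda: {"A": [], "B": []})
    for level, stratum, y in rows:
        by_stratum[stratum][level].append(y)

    weighted_sum = 0.0
    total = 0
    for groups in by_stratum.values():
        if not groups["A"] or not groups["B"]:
            continue  # no within-stratum comparison is possible here
        n = len(groups["A"]) + len(groups["B"])
        weighted_sum += n * (mean(groups["A"]) - mean(groups["B"]))
        total += n
    if total == 0:
        raise ValueError("no stratum contains both factor levels")
    return weighted_sum / total

crude = crude_difference(records)
adjusted = stratified_difference(records)
```

With these made-up records the crude difference is -5.0, while the stratum-adjusted difference is 0.0: the apparent gap between levels A and B comes entirely from the external factor being unevenly distributed across the two levels.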
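
The factorial treatment structure described under Factorial Treatment Design in CRD (generate all ab combinations, then assign units completely at random) can be sketched as follows. This is a minimal illustration, assuming equal replication per treatment; the function names, the seed, and the cell means are hypothetical and are not taken from the chapter or read off its figures. The last function computes a simple 2x2 interaction contrast: the change in mean response across two levels of A at one level of B, minus the same change at another level of B.

```python
import random
from itertools import product

def factorial_treatments(levels_a, levels_b):
    """All a*b combinations of the levels of Factors A and B."""
    return list(product(levels_a, levels_b))

def assign_completely_at_random(units, treatments, seed=None):
    """Randomly assign units to treatments with equal replication."""
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    reps = len(units) // len(treatments)
    # One slot per (treatment, replicate), shuffled into random order.
    slots = [t for t in treatments for _ in range(reps)]
    random.Random(seed).shuffle(slots)
    return dict(zip(units, slots))

def interaction_contrast(cell_means, a1, a2, b1, b2):
    """(Change from a1 to a2 at b2) minus (change from a1 to a2 at b1).

    A value of zero means the effect of A is the same at both levels of B
    for these cells, i.e. no interaction between them.
    """
    change_at_b2 = cell_means[(a2, b2)] - cell_means[(a1, b2)]
    change_at_b1 = cell_means[(a2, b1)] - cell_means[(a1, b1)]
    return change_at_b2 - change_at_b1

# Factor A with a = 3 levels, Factor B with b = 2 levels: 6 treatments.
treatments = factorial_treatments([1, 2, 3], [1, 2])
units = list(range(12))  # 12 hypothetical experimental units, 2 per treatment
assignment = assign_completely_at_random(units, treatments, seed=1)

# Made-up cell means keyed by (level of A, level of B).
parallel = {(1, 1): 20, (2, 1): 30, (1, 2): 40, (2, 2): 50}
non_parallel = {(1, 1): 20, (2, 1): 30, (1, 2): 40, (2, 2): 70}
no_interaction = interaction_contrast(parallel, 1, 2, 1, 2)
interaction = interaction_contrast(non_parallel, 1, 2, 1, 2)
```

In the first set of made-up means the effect of moving from A=1 to A=2 is 10 at both levels of B, so the contrast is 0. In the second it is 10 at B=1 and 30 at B=2, so the contrast is 20. A 1-at-a-time approach never observes all four cells together, which is why it cannot estimate this quantity.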