Chapter 3 Generating Data

Introduction to Data Collection/Analysis
• Exploratory Data Analysis: Plots and measures that describe a set of measurements with no clear research questions posed
• Statistical Inference: Methods used to make statements regarding population(s) based on sample data
• Statistical Design: Strategy to obtain data to answer research questions (game plan)
• Anecdotal Evidence: Information obtained from individual, high-profile cases (plane crashes, storms, etc.)

Data Sources
• Available Data: Information previously obtained and available in libraries and/or on the Internet
• Sampling: Selecting a subset from the population of interest and obtaining relevant information from individuals (observational study)
• Census: Information collected from all individuals in a population
• Experiment: Individuals are placed in various conditions by researchers and responses are then obtained

Experimental Design
• Experimental Units: Individuals participating in the experiment (humans are often called Subjects or Ss)
• Treatment: Specific condition applied to units
• Factor: Explanatory variable used in the experiment. Many experiments have more than one factor
• Factor Level: Value that a factor takes on
• Example: Unplanned Purchases
  – 68 subjects selected; response: number of unplanned items purchased
  – Factors: Store Knowledge and Time Pressure
  – Factor Levels: Knowledge (Familiar/Unfamiliar), Time Pressure (Present/Absent)
  – Treatments: 4 combinations of Knowledge and Time Pressure

Unplanned Purchases Experiment

                          Time Pressure    No Time Pressure
Familiar Environment      17 subjects      17 subjects
                          Mean = 2.29      Mean = 3.62
Unfamiliar Environment    17 subjects      17 subjects
                          Mean = 2.13      Mean = 7.68

Comparative Experiments
• Goal: Compare two or more conditions (treatments)
• Units are usually assigned at random to receive one treatment (in some designs each unit receives every treatment)
• Placebo Effect: Phenomenon in which subjects show improvement even when given a dummy treatment
• Control Group: Subjects that receive a placebo, a non-active agent, or no treatment at all
• Biased Design: Design that favors certain response outcomes
• Randomization: Use of chance to assign units to treatment conditions

Principles of Experimental Design
• Control: Removing effects of lurking variables by comparing two or more treatments
• Randomization: Use of chance to allocate subjects to treatments. Removes personal biases. Makes use of tables of random digits or computer programs
• Replication: Apply treatments to as many units as possible, to reduce chance variation in the results
• Statistical Significance: Observed effect that exceeds what could be expected by chance

Miscellaneous Topics
• Blinding: Whenever possible, subject and observer should be unaware of which treatment was assigned.
When neither knows, the experiment is called “double-blind”
• Realism: Do the conditions in the experiment reflect the real-world setting of interest to investigators?
• Matching: Identifying pairs of units based on some criteria expected to be related to the response, then randomly assigning one unit from each pair to each treatment
• Block Design: Extension of matching to more than 2 groups (in some experiments subjects can be their own blocks and receive each treatment)

Sampling Design
• Population: Entire set of individuals of interest to the researcher
• Sample: Subset of the population obtained for data collection/information gathering
• Voluntary Response Sample: Individuals who select themselves as respondents. Internet polls are an example. Tend to be very biased
• Simple Random Sample: Sample selected so that each group of n individuals is equally likely to be selected
• Probability Sample: Sample chosen by chance
• Stratified Random Sample: Simple random samples selected from pre-specified groups (strata)

Miscellaneous Topics in Sampling
• Multistage Sampling: Government surveys tend to have multiple levels in the sampling process (levels: Primary Sampling Unit, Block, Clusters of units)
• Undercoverage: Groups in the population are not included in the sample
• Nonresponse: Individuals selected who do not respond
• Biases:
  – Response Bias: Subject gives an answer to please the interviewer
  – Recall Bias: Tendency for some subjects to remember past events more readily or differently than others
  – Wording: Questions can be phrased to elicit certain responses

Introduction to Statistical Inference
• Parameter: Number describing a population
  – μ: Population mean (quantitative variable)
  – p: Population proportion with a characteristic (categorical variable)
• Statistic: Number describing a sample
  – x̄: Sample mean
  – p̂ = (# in sample with characteristic) / (sample size): Sample proportion

Parameters are fixed (usually unknown) values.
Statistics vary from one sample to another because different samples contain different individuals.

Sampling Distributions
• Sampling Distribution: Distribution of values that a statistic can take on across all samples from the population
  – Shape: For large samples, the sampling distributions of sample means and proportions tend to be approximately normal
  – Center: The center of the sampling distribution is equal to the parameter value in the population (unbiased)
  – Spread: The spread of the distribution decreases as the sample size increases (variability of the statistic shrinks as the sample size gets larger)
  – Margin of error: Bound on the size of the likely sampling error (difference between sample statistic and population parameter)
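
The center and spread properties listed above can be checked with a small simulation. The sketch below is illustrative only: the population is hypothetical (10,000 right-skewed values generated by the code, not data from these notes), and the names sampling_distribution, population, and results are made up for this example. It draws many simple random samples of each size and summarizes the resulting sample means.

```python
import random
import statistics


def sampling_distribution(population, n, num_samples, rng):
    """Return the sample means of num_samples simple random samples of size n."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(num_samples)]


rng = random.Random(42)  # fixed seed so repeated runs give the same numbers

# Hypothetical population (an assumption for illustration): right-skewed values.
population = [rng.expovariate(1 / 5.0) for _ in range(10_000)]
mu = statistics.fmean(population)  # the parameter: fixed, usually unknown in practice

results = {}
for n in (10, 40, 160):
    means = sampling_distribution(population, n, 1_000, rng)
    center = statistics.fmean(means)   # should sit close to mu (unbiased)
    spread = statistics.stdev(means)   # should shrink as n grows
    results[n] = (center, spread)
    print(f"n={n:4d}  center={center:.3f}  spread={spread:.3f}  mu={mu:.3f}")
```

In a run of this sketch, the center for every sample size should land near mu, and the spread should fall by roughly half each time n is quadrupled. A rough margin of error for a sample mean could be taken as about twice the spread at the chosen n, assuming the approximately normal shape noted above.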