Week 5
Dr. Jenne Meyer

The field of statistics is often regarded as dry, antiseptic, and unfeeling.
The aim is to dispel this misapprehension by sharing these data stories with you.
Individual people are speaking from behind the numbers.

Hurricane Katrina was the costliest and one of the deadliest hurricanes in American history. Damages exceeded $50 billion and fatalities exceeded 1300, according to the National Oceanic and Atmospheric Administration. In September 2005, a survey was conducted of a group of hurricane survivors who had later been moved to shelters in the Greater Houston area. The respondents who did not evacuate were asked for their most important reason for not evacuating.

Katrina survivors’ most important reasons for not evacuating

Look at the accompanying illustration. Do you find this 2000 presidential election ballot confusing? There is evidence that many Palm Beach County, Florida, residents did find the ballot confusing. According to the Palm Beach Post, confused voters marked more than one choice on the “butterfly ballot.” Gore likely lost 6607 votes because of these ballots. There is also evidence that many confused Palm Beach County voters chose Reform Party candidate Pat Buchanan by mistake.

Informal meaning of statistic: a number that describes a person, a group, or a set of items.
According to a recent survey, 54% of the men surveyed responded that they believed in aliens, and 33% of the women did.

Statistics is the art and science of collecting, analyzing, presenting, and interpreting data.

A business major interested in diversifying her portfolio with stocks chosen by their price/earnings ratio.
A psychology major interested in whether there are differences in therapeutic outcomes between traditional counseling methods and a new cognitive approach.
An education major interested in whether listening to a Mozart sonata before taking an exam can significantly improve a grade.
How would researchers go about studying whether superstitions change the way people behave? What kind of evidence would support the hypothesis that Friday the 13th causes a change in human behavior? T. J. Scanlon and his co-researchers reasoned that if there were fewer vehicles on the road on Friday the 13th than on the previous Friday, this would be evidence that some people were playing it safe on Friday the 13th and staying off the roads. What they deemed important was the effect of such a superstition on human behavior and how to measure that effect as a change in behavior.

Phase 1: Data collection
Select a method to collect the data.
The researchers obtained data kept by the British Department of Transport on the traffic flow through certain junctions of the M25 motorway in England.

Phase 2: Data analysis
Determine ways to analyze the data.
The researchers compared the number of vehicles passing through certain junctions on the M25 motorway on Friday the 13th and on the previous Friday during 1990, 1991, and 1992.
Table 1.3 Traffic through M25 junctions, 1990–1992

Phase 3: Data presentation
Presentation of the results is important.
The researchers found a highly respected journal, the British Medical Journal, in which to publish their findings.
Other methods: delivering a talk at a conference, writing up a report for one’s supervisor, or presenting a class project.

Phase 4: Data interpretation
Results should be understandable to nonstatisticians.
In this case, the researchers chose a decrease in the number of vehicles as the criterion on which to base support for their hypothesis that people changed their behavior on Friday the 13th.
A consistent decrease in traffic on Friday the 13th supports the hypothesis.

Descriptive statistics refers to methods for summarizing and organizing the information in a data set. We use numbers (such as counts and percents) and graphics to describe the data set, as a first step in data analysis.

An element is a specific entity for which information is collected.
A variable is a characteristic of an element, which can assume different values for different elements.
An observation is the set of values of the variables for a given element.

A qualitative variable is a variable that does not have a numeric value but is classified into categories. Qualitative variables are also called categorical variables, because they can be grouped into categories.
A quantitative variable is a variable that takes numeric values. Quantitative variables can be classified as either discrete or continuous.
A discrete variable can take either a finite or a countable number of values. Each value can be graphed as a separate point on a number line, with space between each point.
A continuous variable can take infinitely many values, forming an interval on the number line, with no space between the points.

Nominal – names, labels, or categories. There is no natural or obvious ordering of nominal data (such as high to low).
Ordinal – data that can be arranged in a particular order, but no arithmetic can be performed on them. Example: poor, satisfactory, good, or best.
Interval – like ordinal data, with the extra property that subtraction may be carried out; there is no natural zero. Example: high temperatures in the city of Pompey’s Pillar, Montana, for the month of December.
Ratio – like interval data, but division may also be carried out because a natural zero exists. Example: salaries of college professors.

Descriptive methods of data analysis are widespread and quite informative.
The modern field of statistics involves much more than simply summarizing a data set.
Statistical inference: learning about the characteristics of a population by studying those characteristics in a subset of the population (that is, in a sample).

Time
Cost
Destructive nature of sampling
Access to sample
Sampling is good enough

Attributes or characteristics of the population are generally normally distributed.
For instance, when attributes such as height and weight are considered, most people will be clustered around the mean, leaving only a small number at the extremes who are either very tall or very short.

What is the relevant target population of focus to the study?
What exactly are the parameters we are interested in investigating?
What kind of a sampling frame is available?
What is the sample size needed?
What costs are attached to the sampling design?
How much time is available to collect the data from the sample?

Simple random sample
Systematic sampling (every nth one)
Stratified random sampling (random samples from segments)
Cluster sampling (random clusters)
Area sampling
Convenience sampling
Quotas

Administrative – self-identifying, usually comes from the company
Classification – demographic info
Target questions
Structured – closed-ended
Unstructured – open-ended

Surveys (paper, online, in person, mail-in)
Interviews
Focus groups
Ethnography
Observation
Case study
Content analysis
Omnibus survey

Should this question be asked?
Is the question of proper scope and coverage?
Can the participant adequately answer this question as asked?
Will the participant willingly answer this question as asked?

Vague or ambiguous terminology
Technical terminology
Hypothetical questions – must be reasonable for meaningful answers
Leading questions – “would you agree” or “do you agree” questions, for example, “Would you agree the government’s policies on healthcare are unfair?”
Value judgments – do not express your views
Context effects – be aware of the impact of

GM took a step back when it tried to market the NOVA in Central and South America.
In Spanish, “No va” means “it doesn’t go.”
Pepsi’s “Come Alive With the Pepsi Generation,” when translated into Chinese, means “Pepsi Brings Your Ancestors From the Grave.”
Frank Perdue’s chicken slogan, “It takes a strong man to make a tender chicken,” translates in Spanish to, “It takes an aroused

Demographic
Behavioral
Attitudes and opinions
Knowledge
Intentions, expectations, and aspirations

An experimental study is conducted when a survey or sampling methods cannot be used.
Researchers investigate how varying the predictor variable affects the response variable.
A predictor variable (or explanatory variable) is a characteristic purported to explain differences in the response variable.
Treatment – a predictor variable that takes the form of a purposeful intervention
Three main factors: control, randomization, and replication

An observational study is used when an experimental study is not possible for ethical reasons.
It observes whether the subjects’ differences in the predictor variable are associated with differences in the response variable.
No attempt is made to manipulate the variables.

A 2006 Surgeon General’s report found that “the evidence is sufficient to infer a causal relationship” between secondhand tobacco smoke exposure from parental smoking and respiratory illnesses in infants and children. Was this report based on an experimental study or an observational study?
Solution: It would be unethical to force the parents in a treatment group to smoke tobacco, so the study must have been an observational one.
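
In contrast with the observational study above, where researchers do not control who is exposed, an experimental study lets the researcher decide who receives the treatment. The randomization factor listed among the three main factors can be illustrated with a minimal Python sketch. The function name assign_groups, the subject IDs, and the even two-way split are assumptions made for illustration and do not come from the slides.

```python
import random


def assign_groups(subjects, seed=None):
    """Randomly split subjects into a treatment group and a control group.

    Hypothetical helper: shuffles a copy of the subject list, then cuts it
    in half, so chance alone decides who receives the treatment.
    """
    rng = random.Random(seed)      # a seeded generator makes the split repeatable
    shuffled = list(subjects)      # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


# Twenty made-up subject IDs, used only to show the mechanics.
subjects = [f"subject_{i:02d}" for i in range(1, 21)]
groups = assign_groups(subjects, seed=42)

# Each subject lands in exactly one group, and the groups are equal in size.
print(len(groups["treatment"]), len(groups["control"]))  # prints: 10 10
```

Having many subjects in each group reflects replication, and the control group gives a baseline against which to compare the treatment group. Random assignment of this kind is what was ethically impossible in the secondhand smoke example.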