Chapter 1 1-1 Chapter Goals After completing this chapter, you should be able to: ♦Population vs. Sample Chapter 1 Introduction and Data Collection ♦Discrete vs. Continuous data Chap 1-1 Explain the difference between descriptive and inferential statistics Describe the difference between inductive and deductive reasoning Yandell - GSBA 502 Chap 1-2 What is Statistics? Why study statistics? What Is Statistics? To become a better decision maker Statistics involves Statistics is a tool used to gather, present, analyze, and interpret data for decision making collecting, organizing, summarizing, and analyzing data ♦Primary vs. Secondary data types ♦Categorical vs. Numerical data ♦Time Series vs. Cross-Sectional data Yandell - GSBA 502 Describe key data collection and sampling methods Know key definitions: and using the results for decision making Yandell - GSBA 502 Chap 1-3 Yandell - GSBA 502 Chap 1-4 Why Study Statistics? Why Study Statistics? (continued) (continued) Business decision making: Personal decision making: everyone can benefit from the ability to make informed decisions managers in all business fields must make choices based on statistical evidence Examples: Examples: smoking and health recalls and car safety marital status or sex or age vs. car insurance rates passing offense and points scored etc.... marketing surveys, accounting audits, economics forecasts, manufacturing quality control, finance trends Many everyday events and decisions involve statistical reasoning and analysis of data Recent news report: night lights and nearsightedness see file nightlight_news.htm Click here to see news article Yandell - GSBA 502 Chap 1-5 Yandell - GSBA 502 Chap 1-6 Chapter 1 1-2 Two broad categories of constraints: Key Aspects of Decision Making Objectives Alternatives Without alternatives, no choices are necessary. These define the alternative means of satisfying the objective Environmental: These represent the goals of the decision maker - they define the problem and determine the incentives of the decision maker Constraints Yandell - GSBA 502 Informational: Constraints determine the boundaries of the feasible (or allowable) set of actions by the decision maker Chap 1-7 factors that cannot be controlled by the decision maker Physical (resources, budget) Technical (technology) Legal (government barriers or restrictions) Behavioral (customs or traditions) results of alternative actions are uncertain Risk Uncertainty Yandell - GSBA 502 Chap 1-8 Completing the Decision Making Process Establish objectives Define the problem Decision Process Flowchart Data collection and analysis Information is needed to support the decision making process Decision Consider input and resource constraints Identify types and availability of existing data Use graphical summary techniques An action is chosen based on objectives, alternatives, constraints, and data Scope of this course: Identify possible solutions Consider sources of variation Gather data Summarize, analyze, and interpret data Select the best expected solution Consider legal and other constraints Develop surveys or sampling plans to acquire needed data Use numerical summary techniques and analytical methods Evaluate risk and uncertainty Implement the decision Yandell - GSBA 502 Chap 1-9 Yandell - GSBA 502 Deductive reasoning Statistical Reasoning Chap 1-10 (from general to specific) Statistics is more than mathematical manipulation, it is a way of thinking about data Example All children are short (Major premise = general statement) Statistical logic: We use Inductive Logic rather than Deductive Logic Yandell - GSBA 502 Chap 1-11 Tom is a child (Minor premise = specific example) Therefore . . . Yandell - GSBA 502 Chap 1-12 Chapter 1 1-3 Deductive reasoning (from general to specific) Using Venn diagrams: (continued) Example “Children” represents an entire group (the “population”) about which a characteristic is known “Tom” is one of the group (a “sample observation” or “member” of the population) All children are short (Major premise = general statement) Tom is a child (Minor premise = specific example) Therefore Tom is short (conclusion) Yandell - GSBA 502 Chap 1-13 Tom and so Tom has the group characteristic Yandell - GSBA 502 Inductive reasoning Chap 1-14 Inductive reasoning (from specific cases to the general case) (from specific cases to the general case) (continued) Example Example I have seen many children I have seen many children All of them were short All of them were short Therefore . . . Therefore, the average height of children is likely to be small Yandell - GSBA 502 Chap 1-15 Yandell - GSBA 502 Using Venn diagrams Chap 1-16 Tools of Business Statistics An observed characteristic of a sample of children leads us to infer a characteristic of the entire population (even though they all have not been observed) Descriptive statistics Children I have seen Chap 1-17 Collecting, presenting, and describing data Inferential statistics All Children Yandell - GSBA 502 Children Yandell - GSBA 502 Drawing conclusions and/or making decisions concerning a population based only on sample data Chap 1-18 Chapter 1 1-4 Descriptive Statistics Data Sources Collect data e.g. Survey, Observation, Primary Secondary Data Collection Data Compilation Experiments Print or Electronic Present data Characterize data Observation e.g. Charts and graphs e.g. Sample mean = ∑x Experimentation i n Yandell - GSBA 502 Chap 1-19 Yandell - GSBA 502 Populations and Samples Examples: Population a b All likely voters in the next election All parts produced today All sales receipts for November Examples: x y Chap 1-21 Sample data is needed to make inferences about population parameters c gi o z n r u y Chap 1-22 Why Sample? Why Gather Data? b Yandell - GSBA 502 Data Collection and Sampling cd o p q rs t u v w 1000 voters selected at random for interview A few parts selected for destructive testing Every 100th receipt selected for audit Yandell - GSBA 502 Sample ef gh i jk l m n A Sample is a subset of the population Chap 1-20 Population vs. Sample A Population is the set of all items or individuals of interest Survey i.e., the true proportion of defective parts in large shipment the true mean of a population Less time consuming than a census Less costly to administer than a census It is possible to obtain statistical results of a sufficiently high precision based on samples To reach valid inferences, we need to examine a representative sample of the population Yandell - GSBA 502 the observed sample characteristic is used to infer the likely value of the population characteristic. Chap 1-23 Yandell - GSBA 502 Chap 1-24 Chapter 1 1-5 Sampling Techniques Statistical Sampling Samples Non-Probability Samples Judgement Items of the sample are chosen based on known or calculable probabilities Probability Samples Probability Samples Simple Random Convenience Systematic Stratified Simple Cluster Yandell - GSBA 502 Chap 1-25 Systematic Cluster Yandell - GSBA 502 Chap 1-26 Simple Random Samples Example Every individual or item from the population has an equal chance of being selected Selection may be with replacement or without replacement Samples can be obtained from a table of random numbers or computer random number generators Yandell - GSBA 502 Stratified Random Chap 1-27 Suppose the output from a production process is identified by a code number For simplicity assume that the code number runs from 1 to 10,000 (hence the population consists of 10,000 objects) A random sample of 100 items could be drawn by using a random number generator to create 100 numbers corresponding to 100 codes in the population Yandell - GSBA 502 Example Chap 1-28 Stratified Samples (continued) If we require that each of the randomly generated numbers be unique then this technique represents sampling without replacement Sampling with replacement would allow for the possibility that a particular item be selected more than once This has the advantage of ensuring that the probability of selecting an object remains constant and independent across trials. The obvious disadvantage is the potential for redundant observations Population divided into subgroups (called strata) according to some common characteristic Simple random sample selected from each subgroup Samples from subgroups are combined into one Population Divided into 4 strata Sample Yandell - GSBA 502 Chap 1-29 Yandell - GSBA 502 Chap 1-30 Chapter 1 1-6 Stratified Random Sampling Systematic Samples (continued) Decide on sample size: n The separate samples from the different strata do not need to be the same size Divide frame of N individuals into groups of k individuals: k=N/n the total sample could be drawn so that the proportion from each strata reflects the composition of the population Randomly select one individual from the 1st group Select every kth individual thereafter N = 64 n=8 First Group k=8 Yandell - GSBA 502 Chap 1-31 Yandell - GSBA 502 Chap 1-32 Systematic Sampling Example Cluster Samples A sample can be gathered by choosing items according to a set rule Example Population is divided into several “clusters,” each representative of the population A simple random sample of clusters is selected If the objects in a population of 1,000 are randomly ordered then a sample of size 100 could be chosen by selecting every tenth item All items in the selected clusters can be used, or items can be chosen from a cluster using another probability sampling technique Population divided into 16 clusters. Yandell - GSBA 502 Chap 1-33 Yandell - GSBA 502 Validity of Results Chap 1-34 A Potential Problem: Sampling Error The validity of the sample results depends on how the sample is gathered There is no “best” technique for gathering a sample The user must decide which sampling technique best suits the current problem Given that a sample is to be taken, there are two sources of error that can result. 1. sampling error we will not know the true population parameters exactly because we are using only a subset of the population Yandell - GSBA 502 Randomly selected clusters for sample 2. nonsampling error error due to other factors (arises independently of the sampling technique involved) Chap 1-35 Yandell - GSBA 502 Chap 1-36 Chapter 1 1-7 Example Random Sampling Suppose young male survey workers only approach attractive females to complete a survey Then sample results may be biased since all members of the population did not have an equal chance to be selected for the sample. Young females may not represent the relevant population Notice that the word “random” was used in each example above to describe the method of selecting elements from a population for a sample Only random sampling plans generate information that reliably mirrors the extent and pattern of the variation in the population The results would be biased due to nonsampling error. Yandell - GSBA 502 Chap 1-37 Yandell - GSBA 502 Chap 1-38 Sampling Bias Random Sampling (continued) Simply put, a sample is random if each element of the population is equally likely to be selected for the sample Naturally, the actual elements selected for the sample are variable, and some unusual sample results may occur This variability in sample results leads to variability in estimating population parameters . . . but the random nature of the sample insures that the results can be statistically investigated for insight into their reliability Yandell - GSBA 502 Non-random sampling plans may introduce biases. Bias destroys the ability to make good decisions. Examples of sampling plans that may generate biases are: Chap 1-39 a. Sampling from the same location in all containers or bins. b. Sampling every nth item from a production run if production variation is not random. c. Ignoring lot portions that are inconvenient to sample. d. Sampling performance at constant (specified) time intervals. Yandell - GSBA 502 Chap 1-40 Key Definitions A population is the entire collection of things under consideration Data Types Data A parameter is a summary measure computed to describe a characteristic of the population Attribute (Categorical) A sample is a portion of the population selected for analysis Variable (Numerical) Examples: A statistic is a summary measure computed to describe a characteristic of the sample Marital Status Political Party Satisfactory vs. Defective Eye Color (Defined categories) Discrete Examples: Yandell - GSBA 502 Chap 1-41 Yandell - GSBA 502 Number of Children Defects per hour (Counted items) Continuous Examples: Weight Voltage (Measured characteristics) Chap 1-42 Chapter 1 1-8 Attribute Data Variable Data categories can be identified not measured in physical units Examples: scratched vs. scratch-free spreads peanut butter first vs. spreads jelly first satisfactory vs. defective any “yes” vs. “no” question cars are “4 cylinder”, “6 cyl”, “8 cyl”, or “other” marital status categories ... Yandell - GSBA 502 Chap 1-43 expressed in actual physical units can be measured or given a numerical value Examples: Continuous Voltage Time Weight Income Hardness Discrete Number of brothers and sisters Number of data entries per hour Number of absences per month Number of change orders per week Number of rings before phone is answered Yandell - GSBA 502 Chap 1-44 Data Types Data Types (continued) Sales (in $1000’s) Time Series Data Ordered data values observed over time Cross Section Data Data values observed at a fixed point in time 2003 2004 2005 2006 Atlanta 435 460 475 490 Boston 320 345 375 395 Cleveland 405 390 410 395 Denver 260 270 285 280 Time Series Data Cross Section Data Yandell - GSBA 502 Chap 1-45 Yandell - GSBA 502 Inferential Statistics Inferential Statistics Drawing conclusions and/or making decisions concerning a population based on sample results. Making statements about a population by examining sample results Sample statistics (known) Population parameters Inference Estimation (unknown, but can be estimated from sample evidence) Yandell - GSBA 502 Population Chap 1-47 e.g.: Estimate the population mean weight using the sample mean weight Hypothesis Testing Sample Chap 1-46 e.g.: Use sample evidence to test the claim that the population mean weight is 120 pounds Yandell - GSBA 502 Chap 1-48 Chapter 1 1-9 Description vs. Inference Description vs. Inference 2. Inference 1. Description a. graphical techniques for summarizing and presenting information Examples: graphs, charts, histograms, other visual displays – text chapter 2 What can be said from sample data? At what levels of confidence? With what sources and amount of error? b. numerical summary values Examples: mean, median, standard deviation, quartiles – text chapter 3 Yandell - GSBA 502 Chap 1-49 Yandell - GSBA 502 Chap 1-50 Road Map Semester Road Map A useful summary chart (text p. 7) It might be helpful to think of the course material in 3 main blocks (see next slide) Yandell - GSBA 502 Chap 1-51 Yandell - GSBA 502 Chap 1-52 Survey Design Steps Survey Design Steps (continued) Define the issue what are the purpose and objectives of the survey? Define the population of interest Formulate survey questions Yandell - GSBA 502 make questions clear and unambiguous use universally-accepted definitions limit the number of questions Chap 1-53 Pre-test the survey pilot test with a small group of participants assess clarity and length Determine the sample size and sampling method Select Sample and administer the survey Yandell - GSBA 502 Chap 1-54 Chapter 1 1-10 Types of Questions Data from Surveys Closed-end Questions Select from a short list of defined choices Example: Major: __business __liberal arts __science __other Open-end Questions Respondents are free to respond with any value, words, or statement Potential problems with surveys What population is being sampled? Are responses random and representative? Are questions likely to generate truthful and relevant data? How confident are we that survey responses accurately represent the population? Example: What did you like best about this course? Demographic Questions Questions about the respondents’ personal characteristics Example: Gender: __Female __ Male Yandell - GSBA 502 Chap 1-55 Yandell - GSBA 502 Chap 1-56 Survey Errors Survey Errors (continued) 1. Coverage error or selection bias . . . may occur if selected members of the population are not represented in the survey sample 2. Nonresponse error or nonresponse bias . . . 3. Sampling error . . . may occur if those who respond are not representative of the full population. This is why survey follow-ups are done to try to get a large response rate. Yandell - GSBA 502 will always exist, since only a sample is obtained from the population. Statistical methods can be used to report the size of this error. 4. Measurement Error . . . may exist if the questions (or the interviewer’s non-verbal actions) influence responses. Non-neutral wording can elicit emotional reactions that may not appear if questions are worded differently. Chap 1-57 Yandell - GSBA 502 Chap 1-58 Chap 1-59 Yandell - GSBA 502 Chap 1-60 It’s A Magical World Bill Watterson Scholastic, Inc., 1996 Yandell - GSBA 502 Chapter 1 1-11 Link to Survey and Polling Sites Visit the links collected on this web page to find out more about survey and polling techniques, or to see what recent opinion polls reveal about current social, political, and economic issues: links to national polling and opinion survey sites Yandell - GSBA 502 Chap 1-61 Yandell - GSBA 502 Chap 1-62 Class Discussions Questions Discussion Question 2 1. A newspaper clipping appeared stating “A random inspection of 4500 tires removed from cars in four American cities showed that 1125 were worn down below tread indicators, Firestone [tire company] reports. The tire company said this is a potentially dangerous practice because 90 percent of tire trouble occurs in the 10 percent of tread life.” 2. In the late 1960s and early 1970s metal-studded snow tires were widely used during winter months in Canada and the northern U.S. to increase traction on icy roads. The studs were blamed for tearing up roads and were ultimately banned in many states. A newspaper article reported that . . . “National Safety Council tests have demonstrated that studs improve stopping distance by as much as 50 percent and best starting traction in some instances as high as 50 percent on ‘glaze ice’ near the freezing point – soft ice that is easy for the studs to grip.” Do the statistics and information presented mislead or illuminate the issue of tire safety? Yandell - GSBA 502 Chap 1-63 Yandell - GSBA 502 Chap 1-64 Discussion Question 2 Discussion Question 3 (continued) Nevertheless, according to the article, a survey in Ontario, Canada . . . Are statistical procedures a replacement for intuition and common sense? Is the reverse true? What should be done if the conclusions from a statistical analysis seem to conflict with intuition and common sense? “concluded that vehicles with studded tires are involved in a somewhat bigger percentage of accidents on icy roads than those on clear surfaces.” Give possible explanations for the higher accident rates for drivers with studded snow tires on icy roads. Can you think of flaws in the design of the Ontario study, or in the description of the findings, the drawing of inferences, or the (implied) decision of the study? Yandell - GSBA 502 Chap 1-65 Yandell - GSBA 502 Chap 1-66 Chapter 1 1-12 For further reading: Chapter Summary “How numbers are tricking you” Reviewed key data collection methods Introduced key definitions: By Arnold Barnett Technology Review, October 1994 Click here to see article Yandell - GSBA 502 Chap 1-67 ♦Population vs. Sample ♦Primary vs. Secondary data types ♦Attribute vs. Variable data ♦Time Series vs. Cross-Sectional data Examined descriptive vs. inferential statistics Described different sampling techniques Reviewed types of data Yandell - GSBA 502 Chap 1-68