Statistics for Business Administration (TOAE302)
Theory of Economics Statistics (TOAE301)
Nguyen Thu Hang
nguyenthuhang.cs2@ftu.edu.vn

Assessment
- Attendance: 10%
- Mid-term test: 30%
- Final exam: 60%

Course outline
Chapter 1: Introduction to Statistics
Chapter 2: Summarizing Data
Chapter 3: Numerical Descriptive Techniques
Chapter 4: Inferences Based on a Single Sample: Confidence Intervals and Tests of Hypothesis
Chapter 5: Inferences Based on Two Samples: Confidence Intervals and Tests of Hypothesis
Chapter 6: Analysis of Variance (ANOVA)
Chapter 7: Regression Analysis
Chapter 8: Time Series Analysis

Textbooks
- Gerald Keller (2018), Statistics for Management and Economics, Cengage Learning.
- James T. McClave, P. George Benson, Terry Sincich (2018), Statistics for Business and Economics, Pearson Education.
- Levine, Stephan, Krehbiel & Berenson (2017), Statistics for Managers Using Microsoft Excel, 8e, Pearson Prentice-Hall, Inc.

Chapter 1: Introduction to Statistics (6 hours)

Chapter outline
In this chapter you learn:
1. Statistics Definition and Objectives
2. Statistical Concepts
3. Types of data and variable measurements
4. Statistical Analysis Process
5. Source of Data
6. Questionnaire design

Business Statistics Marks
A student enrolled in a business program is attending the first class of the required statistics course. The student is somewhat apprehensive because he believes the myth that the course is difficult. To alleviate his anxiety, the student asks the professor about last year’s marks. The professor obliges and provides a list of the final marks, each of which is composed of term work plus the final exam. What information can the student obtain from the list?

Case: Pepsi’s Agreement

1. What Is Statistics?
1. Collecting data (e.g., survey)
2. Presenting data (e.g., charts and tables)
3. Characterizing data (e.g., average)
Why data analysis? Decision making.

What is statistics?
- A branch of mathematics that takes numbers and transforms them into useful information for decision makers.
- A way to get information from data.
- Methods for processing and analyzing numbers.
- Methods for helping reduce the uncertainty inherent in decision making.

Statistics is the science of data. It involves collecting, classifying, summarizing, organizing, analyzing, and interpreting numerical information.

Application Areas
- Economics: forecasting, demographics
- Sports: individual and team performance
- Engineering: construction, materials
- Business: consumer preferences, financial trends

Objectives of Statistics
Decision makers use statistics to:
- Present and describe business data and information properly
- Draw conclusions about large groups of individuals or items, using information collected from subsets of the individuals or items
- Make reliable forecasts about a business activity
- Improve business/production processes
- Improve product quality

Statistics: Two Processes
A. Describing sets of data
B. Drawing conclusions (making estimates, decisions, predictions, etc.) about sets of data based on sampling

Types of Statistics
Statistics: the branch of mathematics that transforms data into useful information for decision makers.
- Descriptive statistics: collecting, summarizing, and describing data.
- Inferential statistics: drawing conclusions and/or making decisions concerning a population based only on sample data.

Descriptive Statistics
- Collect data (e.g., survey)
- Present data (e.g., tables and graphs)
- Characterize data (e.g., sample mean $\bar{X} = \frac{\sum X_i}{n}$)

Descriptive statistics utilizes numerical and graphical methods to explore data, i.e., to:
- look for patterns in a data set,
- summarize the information revealed in a data set,
- present the information in a convenient form.

Inferential Statistics
Drawing conclusions about a large group of individuals based on a subset of the large group.
- Estimation (e.g., estimate the population mean weight using the sample mean weight)
- Hypothesis testing (e.g., test the claim that the population mean weight is 120 pounds)

Inferential statistics utilizes sample data to make estimates, decisions, predictions, or other generalizations about a larger set of data.

2. Statistical Concepts
- Experimental unit: the object upon which we collect data
- Population: the totality of objects under consideration
- Variable: a characteristic of an individual experimental unit
- Measurement: the process we use to assign numbers to variables of individual population units
- Sample: a subset of the units of a population that is selected for analysis
Memory aid: P as in Population and Parameter; S as in Sample and Statistic.

Measurement
Numerical representations are often not readily available for some variables, so the process of measurement plays an important supporting role in statistical studies. Measurement is the process we use to assign numbers to variables of individual population units.
- Measure the preference for a food product by asking a consumer to rate the product’s taste on a scale from 1 to 10.
- Measure workforce age by simply asking each worker, “How old are you?”
- Measure gender by assigning 0 to female and 1 to male.
Statistical Concepts
- Data: facts or information that is relevant or appropriate to a decision maker
- Parameter: a summary measure (e.g., mean) that is computed to describe a characteristic of the population
- Statistic: a summary measure (e.g., mean) that is computed to describe a characteristic of the sample

Population vs. Sample
- Population: measures used to describe the population are called parameters.
- Sample: measures computed from sample data are called statistics.

Example
According to a report in the Washington Post (Sep. 5, 2014), the average age of viewers of television programs broadcast on CBS, NBC, and ABC is 54 years. Suppose a rival network (e.g., FOX) executive hypothesizes that the average age of FOX viewers is less than 54. To test her hypothesis, she samples 200 FOX viewers and determines the age of each.
a. Describe the population.
b. Describe the variable of interest.
c. Describe the sample.
d. Describe the inference.

Measure of Reliability
A statement (usually quantified) about the degree of uncertainty associated with a statistical inference.

Four Elements of Descriptive Statistical Problems
1. The population or sample of interest
2. One or more variables (characteristics of the population or sample units) that are to be investigated
3. Tables, graphs, or numerical summary tools
4. Identification of patterns in the data

Five Elements of Inferential Statistical Problems
1. The population of interest
2. One or more variables (characteristics of the population units) that are to be investigated
3. The sample of population units
4. The inference about the population based on information contained in the sample
5. A measure of reliability for the inference

Example
“The actual preference for Pepsi is between 51% and 61%.”
This interval represents a measure of reliability for the inference.

Process (optional)
A process is a series of actions or operations that transforms inputs to outputs.
A process produces or generates output over time.
A process whose operations or actions are unknown or unspecified is called a black box.
Any set of output (objects or numbers) produced by a process is called a sample.

Example
A particular fast-food restaurant chain has 6,289 outlets with drive-through windows. To attract more customers to its drive-through services, the company is considering offering a 50% discount to customers who wait more than a specified number of minutes to receive their order. To help determine what the time limit should be, the company decided to estimate the average waiting time at a particular drive-through window in Dallas, Texas. For 7 consecutive days, the worker taking customers’ orders recorded the time that every order was placed. The worker who handed the order to the customer recorded the time of delivery. In both cases, workers used synchronized digital clocks that reported the time to the nearest second. At the end of the 7-day period, 2,109 orders had been timed.
a. Describe the process of interest at the Dallas restaurant.
b. Describe the variable of interest.
c. Describe the sample.
d. Describe the inference of interest.
e. Describe how the reliability of the inference could be measured.

3. Types of Data and Variable Measurements
Quantitative data are measurements that are recorded on a naturally occurring numerical scale.
Qualitative data are measurements that cannot be recorded on a natural numerical scale; they can only be classified into one of a group of categories.

Quantitative Data
Measured on a numeric scale.
- Number of defective items in a lot
- Salaries of CEOs of oil companies
- Ages of employees at a company

Qualitative Data
Classified into categories.
- College major of each student in a class
- Gender of each employee at a company
- Method of payment (cash, check, credit card)
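To make the distinction concrete, the short sketch below summarizes one quantitative and one qualitative variable. The values are invented for illustration and do not come from the slides. It only shows that averaging is meaningful for ages, while methods of payment can only be counted per category.

```python
from collections import Counter
from statistics import mean

# Hypothetical data, invented for illustration only.
employee_ages = [23, 35, 41, 29, 52]  # quantitative: numeric scale
payment_methods = ["cash", "credit card", "cash", "check", "cash"]  # qualitative

# Arithmetic such as averaging is meaningful for quantitative data.
average_age = mean(employee_ages)

# Qualitative data can only be classified into categories and counted.
payment_counts = Counter(payment_methods)

print(f"Average age: {average_age}")
print(f"Payment counts: {dict(payment_counts)}")
```

Trying to average the payment methods would make no sense, which is a quick practical check for whether a variable is qualitative.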
Example
Chemical and manufacturing plants sometimes discharge toxic-waste materials such as DDT into nearby rivers and streams. These toxins can adversely affect the plants and animals inhabiting the river and the riverbank. The U.S. Army Corps of Engineers conducted a study of fish in the Tennessee River (in Alabama) and its three tributary creeks: Flint Creek, Limestone Creek, and Spring Creek. A total of 144 fish were captured, and the following variables were measured for each:
1. River/creek where each fish was captured
2. Species (channel catfish, largemouth bass, or smallmouth buffalo fish)
3. Length (centimeters)
4. Weight (grams)
5. DDT concentration (parts per million)
These data are saved in the DDT file. Classify each of the five variables measured as quantitative or qualitative.

Types of Variables
- Categorical (qualitative) variables have values that can only be placed into categories, such as “yes” and “no.”
- Numerical (quantitative) variables have values that represent quantities.

Types of Variables
Data
- Categorical (defined categories). Examples: marital status, political party, eye color
- Numerical
  - Discrete (counted items). Examples: number of children, defects per hour
  - Continuous (measured characteristics). Examples: weight, voltage

Levels of Measurement
A nominal scale classifies data into distinct categories in which no ranking is implied.
Examples of nominal-scale categorical variables and their categories:
- Personal computer ownership: Yes / No
- Type of stocks owned: Growth / Value / Other
- Internet provider: Microsoft Network / AOL / Other

Levels of Measurement
An ordinal scale classifies data into distinct categories in which ranking is implied.
Examples of ordinal-scale categorical variables and their ordered categories:
- Student class designation: Freshman, Sophomore, Junior, Senior
- Product satisfaction: Satisfied, Neutral, Unsatisfied
- Faculty rank: Professor, Associate Professor, Assistant Professor, Instructor
- Standard & Poor’s bond ratings: AAA, AA, A, BBB, BB, B, CCC, CC, C, DDD, DD, D
- Student grades: A, B, C, D, F

Levels of Measurement
An interval scale is an ordered scale in which the difference between measurements is a meaningful quantity but the measurements do not have a true zero point.
A ratio scale is an ordered scale in which the difference between the measurements is a meaningful quantity and the measurements have a true zero point.

Difference between interval and ordinal scales
The critical difference between them is that the intervals or differences between values of interval data are consistent and meaningful (which is why this type of data is called interval). For example, the difference between marks of 85 and 80 is the same five-mark difference that exists between 75 and 70; that is, we can calculate the difference and interpret the results.
Because the codes representing ordinal data are arbitrarily assigned except for the order, we cannot calculate and interpret differences. Using a 1-2-3-4-5 coding system to represent poor, fair, good, very good, and excellent, we note that the difference between excellent and very good is identical to the difference between good and fair. With a 6-18-23-45-88 coding, the difference between excellent and very good is 43, and the difference between good and fair is 5. Both codings preserve the order equally well, so the size of the differences carries no meaning.

4. Statistical Analysis Process
- Identify research goals
- Identify variables of interest and measuring methods
- Data collection
- Data summarization
- Data analysis
- Forecasting
- Decision making

Figure: The role of statistics in business analytics. Source: From The American Statistician by George Benson.

Discussion
Monitoring product quality. The Wallace Company of Houston is a distributor of pipes, valves, and fittings to the refining, chemical, and petrochemical industries. The company was a recent winner of the Malcolm Baldrige National Quality Award. One of the steps the company takes to monitor the quality of its distribution process is to send out a survey twice a year to a subset of its current customers, asking the customers to rate the speed of deliveries, the accuracy of invoices, and the quality of the packaging of the products they have received from Wallace.
a. Describe the process studied.
b. Describe the variables of interest.
c. Describe the sample.
d. Describe the inferences of interest.
e. What are some of the factors that are likely to affect the reliability of the inferences?

Questions
What are some of the factors that are likely to lead to a selection bias problem in:
- A survey of customers’ satisfaction with a digital banking service?
- A survey of customers’ satisfaction with a bancassurance service?

5. Sources of Data
1. Data from a published source
2. Data from a designed experiment
3. Data from an observational study

Primary sources: the data collector is the one using the data for analysis.
- Data from a political survey
- Data collected from an experiment
- Observed data
Secondary sources: the person performing the data analysis is not the data collector.
- Analyzing census data
- Examining data from print journals or data published on the internet
Sources of Data
- Published source: book, journal, newspaper, Web site (e.g., https://www.wider.unu.edu/data, https://data.worldbank.org/)
- Designed experiment: the researcher exerts strict control over the units
- Survey: a group of people are surveyed and their responses are recorded
- Observational study: units are observed in their natural setting and variables of interest are recorded

Designed Experiment
A designed experiment is a data-collection method where the researcher exerts full control over the characteristics of the experimental units sampled. These experiments typically involve a group of experimental units that are assigned the treatment and an untreated (or control) group.

Observational Study
An observational study is a data-collection method where the experimental units sampled are observed in their natural setting. No attempt is made to control the characteristics of the experimental units sampled. (Examples include opinion polls and surveys.)

Samples
A representative sample exhibits characteristics typical of those possessed by the population of interest.
A simple random sample of n experimental units is a sample selected from the population in such a way that every different sample of size n has an equal chance of selection.

Example
Suppose you wish to assess the feasibility of building a new high school. As part of your study, you would like to gauge the opinions of people living close to the proposed building site. The neighborhood adjacent to the site has 711 homes.
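A simple random sample for a study like this one can be drawn by numbering the homes and letting a random number generator choose among the numbers. The sketch below is one possible approach, not a prescribed method. It assumes the 711 homes are labeled 1 to 711 and that 20 households are wanted. The seed value and variable names are illustrative choices.

```python
import random

N_HOMES = 711      # homes in the neighborhood (from the example)
SAMPLE_SIZE = 20   # number of households to select for the study

# Assumption: homes are labeled 1..711, e.g., from a list of addresses.
home_ids = range(1, N_HOMES + 1)

# Assumption: a fixed seed is used only so the draw can be reproduced.
rng = random.Random(2024)

# random.sample draws without replacement, so every set of 20 distinct
# homes has the same chance of being chosen.
selected_homes = sorted(rng.sample(home_ids, SAMPLE_SIZE))

print(selected_homes)
```

Sampling without replacement matters here: drawing 20 random numbers independently could select the same home twice, which would no longer be a simple random sample of 20 distinct households.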
Use a random number generator to select a simple random sample of 20 households from the neighborhood to participate in the study.

Importance of Selection
How a sample is selected from a population is of vital importance in statistical inference because the probability of an observed sample will be used to infer the characteristics of the sampled population.

Nonrandom Sample Errors
- Selection bias results when a subset of the experimental units in the population is excluded so that these units have no chance of being selected for the sample.
- Nonresponse bias results when the researchers conducting a survey or study are unable to obtain data on all experimental units selected for the sample.
- Measurement error refers to inaccuracies in the values of the data recorded. In surveys, the error may be due to ambiguous or leading questions and the interviewer’s effect on the respondent.

Example
How do consumers feel about using the Internet for online shopping? To find out, United Parcel Service (UPS) commissioned a nationwide survey of 5,118 U.S. adults who had conducted at least two online transactions in 2015. One finding from the study is that 74% of online shoppers have used a smartphone to do their shopping.
a. Identify the data-collection method.
b. Identify the target population.
c. Are the sample data representative of the population?

Questionnaire Design

Questionnaires
The validity of the results depends on the quality of these instruments. Good questionnaires are difficult to construct; bad questionnaires are difficult to analyze.
Questionnaires are difficult to design for several reasons:
- Each question must provide a valid and reliable measure.
- The questions must clearly communicate the research intention to the survey respondent.
- The questions must be assembled into a logical, clear instrument that flows naturally and will keep the respondent sufficiently interested to continue to cooperate.

Quality aims in survey research
The goal is to collect information that is:
- Valid: measures the quantity or concept that is supposed to be measured
- Reliable: measures the quantity or concept in a consistent or reproducible manner
- Unbiased: measures the quantity or concept in a way that does not systematically under- or overestimate the true value
- Discriminating: can distinguish adequately between respondents for whom the underlying level of the quantity or concept is different

Steps to design a questionnaire:
Step 1: Write out the primary and secondary aims of your study.
Step 2: Write out concepts/information to be collected that relates to these aims.
Step 3: Review the current literature to identify already validated questionnaires that measure your specific area of interest.
Step 4: Compose a draft of your questionnaire.
Step 5: Revise the draft.
Step 6: Assemble the final questionnaire.

Step 1: Define the aims of the study
- Write out the problem and primary and secondary aims using one sentence per aim.
- Formulate a plan for the statistical analysis of each aim.
- Make sure to define the target population in your aim(s).

Step 2: Define the variables to be collected
- Write a detailed list of the information to be collected and the concepts to be measured in the study.
- Are you trying to identify attitudes, needs, behavior, demographics, or some combination of these concepts?
- Translate these concepts into variables that can be measured.
- Define the role of each variable in the statistical analysis.

Step 3: Review the literature
Review current literature to identify related surveys and data collection instruments that have measured concepts similar to those related to your study’s aims.
Step 4: Compose a draft
- Determine the mode of survey administration: face-to-face interviews, telephone interviews, self-completed questionnaires, computer-assisted approaches.
- Format the draft as if it were the final version, with appropriate white space, to get an accurate estimate of its length; longer questionnaires reduce the response rate.
- Make sure questions flow naturally from one to another.

Compose a draft
Question: How many cups of coffee or tea do you drink in a day?
Principle: Ask for an answer in only one dimension.
Solution: Separate the question into two:
(1) How many cups of coffee do you drink during a typical day?
(2) How many cups of tea do you drink during a typical day?

Compose a draft
Question: What brand of computer do you own? (A) IBM PC (B) Apple
Principle: Avoid hidden assumptions. Make sure to accommodate all possible answers.
Solution:
(1) Make each response a separate dichotomous item.
    Do you own an IBM PC? (Circle: Yes or No)
    Do you own an Apple computer? (Circle: Yes or No)
(2) Add necessary response categories and allow for multiple responses.
    What brand of computer do you own? (Circle all that apply)
    Do not own computer / IBM PC / Apple / Other

Compose a draft
Question: Have you had pain in the last week? [ ] Never [ ] Seldom [ ] Often [ ] Very often
Principle: Make sure question and answer options match.
Solution: Reword either question or answer to match.
How often have you had pain in the last week? [ ] Never [ ] Seldom [ ] Often [ ] Very often

Compose a draft
Question: Are you against drug abuse? (Circle: Yes or No)
Principle: Write questions that will produce variability in the responses.
Solution: Eliminate the question.

Compose a draft
Question: Which one of the following do you think increases a person’s chance of having a heart attack the most? (Check one.)
[ ] Smoking [ ] Being overweight [ ] Stress
Principle: Encourage the respondent to consider each possible response to avoid the uncertainty of whether a missing item may represent either an answer that does not apply or an overlooked item.
Solution: Which of the following increases the chance of having a heart attack?
Smoking: [ ] Yes [ ] No [ ] Don’t know
Being overweight: [ ] Yes [ ] No [ ] Don’t know
Stress: [ ] Yes [ ] No [ ] Don’t know

Compose a draft
Question:
(1) Do you currently have a life insurance policy? (Circle: Yes or No) If no, go to question 3.
(2) How much is your annual life insurance premium?
Principle: Avoid branching as much as possible to avoid confusing respondents.
Solution: If possible, write as one question.
How much did you spend last year for life insurance? (Write 0 if none.)

Step 5: Revise
- Shorten the set of questions for the study. If a question does not address one of your aims, discard it.
- Refine the questions included and their wording by testing them with a variety of respondents.
- Ensure the flow is natural.
- Verify that terms and concepts are familiar and easy to understand for your target audience.

Step 6: Assemble the final questionnaire
- Decide whether you will format the questionnaire yourself or use computer-based programs for assistance (e.g., SurveyMonkey.com, Google Forms).
- At the top, clearly state:
  - The purpose of the study
  - How the data will be used
  - Instructions on how to fill out the questionnaire
  - Your policy on confidentiality
- Group questions concerning major subject areas together and introduce them by heading or short descriptive statements.
- Order and format questions to ensure unbiased and balanced results.
- Include white space to make answers clear and to help increase response rate.
- Space response scales widely enough so that it is easy to circle or check the correct answer without the mark accidentally including the answer above or below.
- Open-ended questions: the space for the response should be big enough to allow respondents with large handwriting to write comfortably in the space.
- Closed-ended questions: line up answers vertically and precede them with boxes or brackets to check, or by numbers to circle, rather than open blanks.

Non-responders
Understanding the characteristics of those who did not respond to the survey is important to quantify what, if any, bias exists in the results.
To quantify the characteristics of the non-responders to postal surveys, Moser and Kalton suggest tracking the length of time it takes for surveys to be returned. Those who take the longest to return the survey are most like the non-responders. This result may be situation-dependent.

Conclusions
- You need plenty of time!
- Design your questionnaire from research hypotheses that have been carefully studied and thought out.
- Discussing the research problem with colleagues and subject matter experts is critical to developing good questions.
- Review, revise, and test the questions on an iterative basis.
- Examine the questionnaire as a whole for flow and presentation.

End of Chapter 1