MIS 301 Statistical Analysis for Business
Chris O’Byrne
Spring 2019, Sections 1 & 6
Lecture Notes for Chapters 1 – 3

Course Text: Anderson, D.R., Sweeney, D.J. and T.A. Williams, Modern Business Statistics with Microsoft Excel, 5th edition, Cengage Learning, 2015.

Suggested Problems: Read Chapters 1 – 3.
Chapter 1 exercises: 3, 7, 10, 13
Chapter 2 exercises: 2, 8, 11, 12, 15, 23, 29, 49
Chapter 3 exercises: 4, 5, 6, 8, 21, 27, 31, 40, 47, 52, 56, 58, 61
Use Excel for problems with a CD file. Answers to all of the homework problems are available in the solutions manual.

Section 1 – Monday, January 28, 2019
Section 6 – Friday, January 25, 2019

Copyright © 2017, Chris O’Byrne, MIS Department, San Diego State University. Any use or reproduction of these notes without his consent is prohibited by law.

Chapter 1: Data and Statistics

Terms:
Statistics: The art and science of collecting, analyzing, presenting, and interpreting data.
Data: The facts and figures that are collected, analyzed, and summarized for presentation and interpretation.
Elements: The entities on which data are collected.
Variable: An attribute or characteristic of an entity.
Observation: The set of measurements for a particular element.

Example: Financial Data on Selected Companies

Company    Exchange  Ticker  Market Cap (millions)  P/E Ratio  Current Price  52-week high  52-week low
Qualcomm   NASDAQ    QCOM    56,767                 30.75      34.75          44.99         32.08
GE         NYSE      GE      371,101                21.21      34.99          37.75         31.42
IBM        NYSE      IBM     127,936                15.86      79.30          99.10         71.85
Gateway    NYSE      GTW     1,392                  0.00       3.75           6.92          2.78
Microsoft  NASDAQ    MSFT    271,081                24.36      25.09          30.20         23.82

Source: money.cnn.com accessed on 7/10/05

What is the element or entity in the above data set?
What are the attributes for these entities?
How many observations are in the above data set?

Scale of Measurement:
Nominal: Data are labels or names.
Ordinal: The data have the properties of nominal data and can be ordered in some meaningful way.
Interval: The data have the properties of ordinal data, and equal distances on the scale represent equal distances being measured.
Ratio: The data have the properties of interval data and contain an absolute zero value, resulting in meaningful ratios.

Qualitative Data vs. Quantitative Data
Qualitative data use labels or names to describe an attribute of an element, whereas quantitative data indicate how much or how many.

Discrete vs. Continuous

Cross-Sectional Data vs. Time Series Data
Cross-sectional data are collected at approximately the same point in time. Time series data are collected over a series of time periods.

Statistical Studies
Experimental: Control one or more independent variables (IVs) to measure their influence on a dependent variable (DV).
Observational: No attempt is made to control the IVs.

Statistical Inference
Population: The set of all elements of interest in a particular study.
Sample: A subset of the population.

Ethical Guidelines for Statistical Practice (unethical behavior can take a variety of forms)
• Improper sampling
• Inappropriate analysis of the data
• Development of misleading graphs
• Use of inappropriate summary statistics
• Biased interpretation of the statistical results
• Multiple tests until a desired result is obtained

Chapter 2: Descriptive Statistics: Tabular and Graphical Methods

Read Chapter 2
Review the Following:
2.1 Summarizing Qualitative Data
    Frequency Distribution
    Relative and Percent Frequency Distributions (Table 2.3)
    Bar Graphs and Pie Charts (Figures 2.3 & 2.4)
2.2 Summarizing Quantitative Data
    Frequency Distribution (Table 2.6)
    Relative and Percent Frequency Distributions (Table 2.7)
    Histogram (Figures 2.7 & 2.8)
    Cumulative Distributions (Table 2.8)
    Ogive (Figure 2.15)
2.3 Stem-and-Leaf Display
2.4 Crosstabulations and Scatter Diagrams (Table 2.11 and Figure 2.23)

Chapter 3: Descriptive Statistics: Numerical Measures

Terms:
Sample Statistics: Numerical measures computed from a sample (e.g., the sample mean, $\bar{x}$, and the sample standard deviation, $s$).
Population Parameters: Numerical measures computed from a population (e.g., the population mean, $\mu$, and the population standard deviation, $\sigma$).
Point Estimate: The sample statistic used to estimate the corresponding population parameter (e.g., the sample mean, $\bar{x}$, is often used as a point estimate of the population mean, $\mu$).

3.1 Measures of Location

Mean
    Population Mean
    Sample Mean

Median – The middle value when the data are sorted in ascending order.
To compute the median, sort the data in ascending order:
(a) If n is odd, the median is the middle value.
(b) If n is even, the median is the average of the two middle values.

Mode – The value that occurs with the greatest frequency.

Excel Functions:
Mean      =AVERAGE(range)
Median    =MEDIAN(range)
Mode      =MODE(range)

Percentile: The pth percentile is a value such that at least p percent of the observations are less than or equal to this value and at least (100 − p) percent of the observations are greater than or equal to this value.

Calculating the pth Percentile
Step 1. Sort data in ascending order.
Step 2. Compute index i: $i = \frac{p}{100} n$
Step 3. (a) If i is not an integer, round up. The next integer greater than i denotes the position of the pth percentile.
        (b) If i is an integer, the pth percentile is the average of the values in positions i and i + 1.

Quartiles
Q1 = first quartile, or 25th percentile
Q2 = second quartile, or 50th percentile (median)
Q3 = third quartile, or 75th percentile

Calculate the 35th percentile: 2 4 7 9 13 17 21 23 29 40 43

Excel Functions:
Percentile    =PERCENTILE(array, k) where k is the percentile from 0 to 1
Quartile      =QUARTILE(array, quart) where quart is the quartile

Note: Excel assumes that you are working with continuous data and uses a slightly different formula to compute percentiles (for an explanation see p. 95 in the textbook). When working with large datasets, this difference is negligible.
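The three-step index procedure above can be sketched in Python as a way to check hand calculations. This is a minimal sketch, not Excel's method: the function name `percentile_by_index` is a hypothetical choice for illustration, and the code assumes p is a whole number strictly between 0 and 100.

```python
import math

def percentile_by_index(data, p):
    """pth percentile using the index rule i = (p / 100) * n from the notes.

    Assumes p is a whole number strictly between 0 and 100 (illustrative only).
    """
    if not data:
        raise ValueError("data must not be empty")
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    values = sorted(data)                    # Step 1: sort ascending
    n = len(values)
    if (p * n) % 100 != 0:                   # Step 2: i = p*n/100 is not an integer
        position = math.ceil(p * n / 100)    # Step 3(a): round up
        return values[position - 1]          # positions in the notes are 1-based
    position = int(p * n // 100)             # Step 3(b): i is an integer
    return (values[position - 1] + values[position]) / 2

sample = [2, 4, 7, 9, 13, 17, 21, 23, 29, 40, 43]
# n = 11, so i = 0.35 * 11 = 3.85, which rounds up to position 4
print(percentile_by_index(sample, 35))
```

Excel's =PERCENTILE interpolates between positions, so its result can differ slightly from this rule on small data sets.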
Exact Values for Percentiles & Quartiles:
$L_P = \frac{P}{100}(n + 1)$
If $L_P = 10.4$: Pth percentile = 10th # + .4(11th # − 10th #), i.e., .4 or 40 percent of the way between the 10th and 11th numbers.

Calculate the 35th percentile and 3rd quartile: 2 4 7 9 13 17 21 23 29 40 43

Example. The following is a list of countries and their Gross National Income (GNI) per capita as published by the World Bank, 7/16/01. The data are rounded and presented in thousands (1,000s).
Australia: 20
Canada: 20
Ireland: 23
Italy: 19
Portugal: 11
Singapore: 24
Spain: 15
Sweden: 27
Switzerland: 37
United States: 34
Calculate the Mean, Median, Mode, Q1, Q2, Q3 and the 40th percentile.

3.2 Measures of Variability

Range
Interquartile Range
Variance
    Population Variance
    Sample Variance
Standard Deviation
    Population Standard Deviation
    Sample Standard Deviation
Coefficient of Variation – a descriptive statistic that indicates how large the standard deviation is relative to the mean.
$\left(\frac{s}{\bar{x}} \times 100\right)\%$

Using the following sample: 3 8 11 15 20 25 28 29 31 35
Find the mean, median, quartiles, range, IQR, 60th percentile and the coefficient of variation.

Compute the coefficient of variation for the following stocks:
Stock       Mean    Variance
IBM         121     36
Citibank    14      18

Excel Functions:
Range        =MAX(range)-MIN(range)
IQR          =QUARTILE(array, 3) - QUARTILE(array, 1)
Variance     =VAR(range)
Std. Dev.    =STDEV(range)
Note: To compute the population variance and the population standard deviation, use =VARP(range) and =STDEVP(range) respectively.

Example. The following is a list of countries and their Gross National Income (GNI) per capita as published by the World Bank, 7/16/01. The data are rounded and presented in thousands (1,000s).
Australia: 20
Canada: 20
Ireland: 23
Italy: 19
Portugal: 11
Singapore: 24
Spain: 15
Sweden: 27
Switzerland: 37
United States: 34
Calculate the Range, Variance, Standard Deviation, IQR, and Coefficient of Variation.

Review “Using Excel’s Descriptive Statistics Tool” on pp. 106 – 108.
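As a cross-check on the Section 3.2 measures outside of Excel, here is a minimal Python sketch of the sample range, variance, standard deviation, and coefficient of variation. The function name `sample_variability` and the dictionary keys are hypothetical choices for illustration. The variance divides by n − 1, which corresponds to =VAR and =STDEV rather than =VARP and =STDEVP.

```python
import math

def sample_variability(data):
    """Range, sample variance, sample standard deviation, and CV (in percent)."""
    n = len(data)
    mean = sum(data) / n
    # Sample variance divides by n - 1; a population variance would divide by n.
    variance = sum((x - mean) ** 2 for x in data) / (n - 1)
    std_dev = math.sqrt(variance)
    return {
        "range": max(data) - min(data),
        "variance": variance,
        "std_dev": std_dev,
        "cv_percent": std_dev / mean * 100,
    }

# Per capita GNI in thousands, from the example above
gni = [20, 20, 23, 19, 11, 24, 15, 27, 37, 34]
result = sample_variability(gni)
for name, value in result.items():
    print(name, round(value, 2))
```

The IQR is left out on purpose, since its value depends on which percentile method is used.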
Data File: Salary.xls
Tools → Data Analysis → Descriptive Statistics
Output

3.3 Measures of Distribution Shape, Relative Location, and Detecting Outliers

Skewness = (negative, positive)
Excel Function: =SKEW(range)
Mean vs Median:

Relative Location
The z-score, often referred to as the standardized value, is the number of standard deviations an observation $x_i$ is away from the mean $\bar{x}$.

z-Score

Country          Per capita GNI (1,000s)
Australia        20
Canada           20
Ireland          23
Italy            19
Portugal         11
Singapore        24
Spain            15
Sweden           27
Switzerland      37
United States    34

z-Score Example:
Test    Mean    Std Dev.
SAT     970     280
ACT     24      3.2

Student    Score
Sally      1330
Alice      30

Who did better on their respective exam?

Chebyshev’s Theorem: at least $\left(1 - \frac{1}{k^2}\right)$ of the data values lie within k standard deviations of the mean, for any k > 1. This is a lower bound (“at least”).
Example: Mean = 28, standard deviation = 5. What is the minimum proportion of data points that fall within 12 units of the mean?
Example: If the mean is 45 and the standard deviation is 8, what is the minimum proportion of data points that will fall between 35 and 55?
Example: If the mean is 163 and the standard deviation is 25, what is the minimum proportion of data points that will fall between 100 and 226?

Empirical Rule: (normal dist.)
Within 1 standard deviation of the mean: 68%
Within 2 standard deviations of the mean: 95%
Within 3 standard deviations of the mean: 99.7%

Rule of thumb for Identifying Outliers:

3.4 Exploratory Data Analysis

Five Number Summary
Stem Plot
Box Plot
8 15 19 22 22 25 30 33 34 38 40 44 45 48 53 81

3.5 Measures of Association Between Two Variables

Covariance
    Sample Covariance
    Population Covariance
Pearson Product Moment Correlation Coefficient
    Sample Correlation Coefficient:
    Population Correlation Coefficient:

3.6 Weighted Mean

Weighted Mean
Example: A student in Dr. Reinig’s section of IDS 301 finishes the semester with the following scores:
Assignments: 65, 95, 100 (worth 10%)
Midterms: 58, 76, 68 (high is worth 30%, 2nd is worth 25%, 3rd is thrown out)
Final: 80 (worth 35%)
What is this student’s weighted mean for the course?

Assume that the student in the above example has not yet taken the final.
What is his weighted mean for the course prior to the final, and what does he need to score on the final exam to earn a 72 for the course?

Example:
x       5     25    90
f(x)    10    4     1

GPA
Class    Units    Grade
IDS      3        A
MGMT     3        B
BIO      5        B
PE       1        D
ACCT     4        F
MKT      2        C

Find the weighted average.

Cars per household in a Community
Cars    Households
0       48
1       51
2       34
3       9
4       5

Geometric Mean (used typically in financial data)
2 questions: 1) value at end of period 2) average rate of return

Year    Return    Growth Factor
1       -22.1%    0.779
2       28.7%     1.287
3       10.9%     1.109
4       4.9%      1.049
5       15.8%     1.158
6       5.5%      1.055
7       -37.0%    0.63
8       26.5%     1.265
9       15.1%     1.151
10      2.1%      1.021

Initial investment: $100
Ending value: $133.45 (product of growth factors: 1.3345)
Geometric mean growth factor: 1.029275, an average rate of 2.9275%
Average rate by just adding up and dividing (wrong): 5.040%

Initially invested $100:
$100 × [.779 × 1.287 × 1.109 × 1.049 × 1.158 × 1.055 × .630 × 1.265 × 1.151 × 1.021] = $133.4493
Mean growth rate: $\bar{x}_g = \sqrt[10]{1.334493} = 1.029275$, a 2.9275% annual growth rate, where 1.334493 = $133.4493 / $100 (initial investment).

If you invested $1500 for 5 years and received the following returns, what will be the value of your investment and the average rate of return after 5 years?

Year    Return    Growth Factor
1       15.0%
2       32.1%
3       -11.2%
4       4.9%
5       -9.0%

Practice Problems:

1. Chapter 3 Q. 62: Tax Penalties on Payroll Taxes
820 270 450 1010 890 700 1350 350 300 1200
390 730 2040 230 640 350 420 270 370 620
The above data represent a sample of tax penalties for 20 companies that did not properly pay payroll taxes.
A. What is the range of this data?
B. What is the IQR?
C. What are the mean and median?
D. What are the first and third quartiles?
E. What is the 60th percentile?
F. What is the 35th percentile?
G. What is the coefficient of variation? (Sample Standard Deviation = 455.91)

2. The average weight of a 26-year-old male is 171 pounds with a standard deviation of 18 pounds. Assume the weights are normally distributed.
A. What is the probability that you pick someone and they weigh more than 189 pounds?
B. What is the probability that you pick someone and they weigh less than 135 pounds?
C. What is the probability that you pick someone and they weigh more than 153 pounds?
D. What is the probability that you pick someone and they weigh less than 225 pounds?
E. What is the probability that you pick someone and they weigh more than 225 pounds?
F. What is the probability that you pick someone and they weigh between 135 pounds and 189 pounds?

3. The average amount of money in tips a food server makes at this particular restaurant is $91 with a standard deviation of $16. What is the minimum percentage of food servers that make between $50 and $132?

4. The average cell phone bill is $78 with a standard deviation of $10. What is the minimum percentage of cell phone bills that will be between $54 and $102?

5. What is the weighted average of the following salaries?
Category      # of Workers    Salary
CEO           1               $100,000
President     1               $75,000
CFO           1               $65,000
Mechanics     15              $40,000
Sales         20              $35,000
Associates    10              $45,000

6. The following table shows ticket prices and the number of times I paid each price to go to baseball games this season. What was the average price I paid this season?
Price of Ticket    # of Tickets
$1                 8
$3                 21
$8                 17
$15                15
$40                8

7. In one of your classes the syllabus states that HW is worth 10%, Quizzes 20%, Tests 40% and the final 30%. Your grades are 80 on HW, 75 on Quizzes and 68 on Tests. What is your current grade in the class? What do you need to get on the final to get a 75 in the class?

8. The possible returns for your portfolio and the probabilities that they occur are as follows:
Return    Probability
-15%      .20
9%        .15
11%       .30
18%       .25
25%       .10
What is the expected return for the portfolio?

9. If you invested $1000 for 4 years and received the following returns, what will be the value of your investment and the average rate of return after 4 years?
Year    Return    Growth Factor
1       25.0%
2       -50.0%
3       30.0%
4       10.0%

Answers to Practice Problems:

1. A. 1810  B. 522.5  C. 670 & 535  D. 350 & 872.5  E. 676  F. 377  G. 68.05%
   (The answers to B and D through F use the exact location method, $L_P = \frac{P}{100}(n + 1)$.)
2. A. 16%  B. 2.5%  C. 84%  D. 99.85%  E. .15%  F. 81.5%
3. Number of std devs (k) = 2.5625; 84.771%
4. Number of std devs (k) = 2.4; 82.6389%
5. $41,458.33
6. $10.90
7. Before the final 71.71; needed on the final 82.67
8. 8.65%
9.
Year    Return    Growth Factor
1       25.0%     1.25
2       -50.0%    0.5
3       30.0%     1.3
4       10.0%     1.1
Initial investment: $1,000
Ending value: $893.75 (product of growth factors: 0.894)
Geometric mean growth factor: 0.972308, an average rate of -2.769%
Average rate by just adding up and dividing (wrong): 3.750%

The following will be included in your formula sheet for Midterm 1:

Formulas from Chapter 3: Descriptive Statistics: Numerical Measures

Sample Mean: $\bar{x} = \frac{\sum x_i}{n}$
Sample Variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
Sample Standard Deviation: $s = \sqrt{s^2}$
Coefficient of Variation: $\left(\frac{s}{\bar{x}} \times 100\right)\%$
z-score: $z_i = \frac{x_i - \bar{x}}{s}$
Sample Covariance: $s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
Correlation Coefficient: $r_{xy} = \frac{s_{xy}}{s_x s_y}$
Weighted Mean: $\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$
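The formula-sheet definitions above can also be checked numerically. The following is a minimal Python sketch, not part of the formula sheet: all function names are hypothetical, and the z-score lines assume Sally's score is on the SAT and Alice's is on the ACT, as the score ranges suggest.

```python
import math

def mean(xs):
    """Sample mean: sum of the values divided by n."""
    return sum(xs) / len(xs)

def sample_std(xs):
    """Sample standard deviation: square root of the sample variance (n - 1)."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

def z_score(x, center, spread):
    """Number of standard deviations that x lies from the mean."""
    return (x - center) / spread

def sample_covariance(xs, ys):
    """Sum of paired deviation products divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Sample covariance divided by the product of the two standard deviations."""
    return sample_covariance(xs, ys) / (sample_std(xs) * sample_std(ys))

def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# z-scores for the SAT/ACT example (assumed pairing: Sally on SAT, Alice on ACT)
print(z_score(1330, 970, 280), z_score(30, 24, 3.2))

# Average ticket price from practice problem 6, weighted by tickets bought
print(round(weighted_mean([1, 3, 8, 15, 40], [8, 21, 17, 15, 8]), 2))
```

Passing the mean and standard deviation into `z_score` directly, rather than computing them from raw data, matches problems like the SAT/ACT example where only summary values are given.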