26133: Business Statistics Exam Notes

1 TYPES OF DATA
  1.1 Data Quality (Nominal, Ordinal, Interval, Ratio)
  1.2 Method of data collection
  1.3 Types of graphs
2 DESCRIPTIVE STATISTICS, NUMERICAL MEASURES
  2.1 Numerical Data Summaries
  2.2 Finding Outliers
3 PROBABILITY [1]
  3.1 Miscellaneous Laws
4 DEPENDENCE (CHI² TEST)
5 PROBABILITY [2]: DISCRETE PROBABILITY DISTRIBUTIONS
  5.1 Binomial Distribution
  5.2 Poisson Distributions
6 PROBABILITY [3]: CONTINUOUS DISTRIBUTIONS
  6.1 Uniform Distribution
  6.2 Normal Distribution
  6.3 Exponential Distributions
7 SAMPLING AND SAMPLING DISTRIBUTIONS
  7.1 Can the sample be assumed to be normal?
  7.2 Standard error of a sample mean
  7.3 Finite correction factor
8 INTERVAL ESTIMATION
  8.1 Estimating the population mean with a large n, using "z"
  8.2 Estimating the population mean, using "t-statistic" (σ unknown)
  8.3 Estimating the population proportion
  8.4 Estimating population variance
  8.5 Estimating sample size
9 HYPOTHESIS TESTING [1 POPULATION]
  9.1 Methodology
  9.2 Rejection and non-rejection regions
  9.3 Types of questions
10 HYPOTHESIS TESTING [2+ POPULATIONS]
11 REGRESSION [1]
12 REGRESSION [2]

1 TYPES OF DATA

1.1 Data Quality (Nominal, Ordinal, Interval, Ratio)

- Nominal (purely descriptive)
- Ordinal (ordered)
- Interval (each group of equal magnitude)
- Ratio (has a zero point)

1.2 Method of Data Collection

- Sampling (small group to represent population)
  o Cheap
- Population (everyone)
  o Thorough
- Time-series (over time)
  o Shows change
- Cross-sectional (once/a snapshot)
  o Cheap/where time is irrelevant

1.3 Types of Graphs

- Bar chart
  o Sectional comparison/growth
- Line graph
- Ogive
  o Cumulative frequency (percentage less than)
- Pie chart
  o Percentages
- Scatter plot
  o Infer trends

2 DESCRIPTIVE STATISTICS, NUMERICAL MEASURES

2.1 Numerical Data Summaries

2.1.1 Mode
The most frequently occurring value.

2.1.2 Median
The middle value of the ordered data.

2.1.3 Mean
$$\text{Mean} = \mu = \bar{x} = \frac{\sum \text{observations}}{\text{number of observations}}$$

1. SD Mode: [MODE], [MODE], [1].
2. Stat clear: [SHIFT], [MODE], [1].
3. Enter values: [OBSERVATION], [SHIFT], [,], [NUMBER OF OBSERVATIONS], [M+] (repeat for each observation value).
4. Calculate: [SHIFT], [2] (S-VAR), [1] ($\bar{x}$), [=].

2.1.4 Variance
Population variance: $$\sigma^2 = \frac{\sum_{i=1}^{n}(x_i - \mu)^2}{n}$$
Sample variance: $$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$$

1. Stat clear: [SHIFT], [MODE], [1].
2. Enter values: [OBSERVATION], [SHIFT], [,], [NUMBER OF OBSERVATIONS], [M+] (repeat for each observation value).
3. Calculate standard deviation: [SHIFT], [2] (S-VAR), [2] (xσn) OR [3] (xσn-1), [=].
4. Square for variance: [x²], [=].

2.1.5 Standard Deviation
Population: $\sigma = \sqrt{\sigma^2}$. Sample: $s = \sqrt{s^2}$.

1. Stat clear: [SHIFT], [MODE], [1].
2. Enter values: [OBSERVATION], [SHIFT], [,], [NUMBER OF OBSERVATIONS], [M+] (repeat for each observation value).
3. Calculate standard deviation: [SHIFT], [2] (S-VAR), [2] (xσn) OR [3] (xσn-1), [=].

2.1.6 Coefficient of Variation
Measure of relative data spread; best used where all values in the data set are positive.
Sample: $$CV = \frac{s}{\bar{x}}(100)$$
Population: $$CV = \frac{\sigma}{\mu}(100)$$

2.2 Finding Outliers

2.2.1 Z-Score
The z-score describes the distance of a number from the average in terms of standard deviations.
$$z_i = \frac{x_i - \bar{x}}{s}$$

- For outliers, $|z_i| > 3$

2.2.2 Box and whisker plot
Use for irregular/asymmetrical data.
Describes the data set in terms of 5 points: min, q1, median, q3, max, with $IQR = q_3 - q_1$. Points beyond the min and max limits below are treated as outliers.

- min (lower limit) = $q_1 - 1.5(IQR)$
- q1 = median of the lower half (split again)
- median = central data point
- q3 = median of the upper half (split again)
- max (upper limit) = $q_3 + 1.5(IQR)$

3 PROBABILITY [1]

3.1 Miscellaneous Laws

- Sum of probabilities = 1 = 100%
- $p' = 1 - p$

         B           B'          Total
A        P(A∩B)      P(A∩B')     P(A)
A'       P(A'∩B)     P(A'∩B')    P(A')
Total    P(B)        P(B')       1

3.1.1 Intersection
Both occur: $P(A \cap B)$

3.1.2 Union
Either A or B or both occurring: $P(A \cup B)$
$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

3.1.3 Conditional Probability
Probability of A occurring given that B has already occurred:
$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$

4 DEPENDENCE (CHI² TEST)

4.1.1 Observed Data
Insert observed data into a probability table.

Observed data    W       W'      Total
Retail           420     280     700   (sum rp)
Sale             1140    160     1300  (sum rp')
Total            1560    440     2000  (TT)
                 (sum w) (sum w')

4.1.2 Probability from observations
$$\text{Probability from observation} = \frac{\text{Observation}}{\sum\sum}$$
where $\sum\sum$ is the grand total (TT).

Probability      P(W)    P(W')   Total
P(Retail)        0.21    0.14    0.35  P(RP)
P(Sale)          0.57    0.08    0.65  P(RP')
Total            0.78    0.22    1     (TT)

4.1.3 Predicted results if events are independent
$$\text{Predicted result if events are independent} = \frac{\text{Column total} \times \text{Row total}}{\sum\sum}$$

Events as independent    W       W'      Total
Retail                   546     154     700   (sum rp)
Sale                     1014    286     1300  (sum rp')
Total                    1560    440     2000  (TT)
                         (sum w) (sum w')

4.1.4 Chi² Test

1. Create table: for each cell,
   $$\chi^2_{cell} = \frac{(\text{observed result} - \text{predicted result if independent})^2}{\text{predicted result if independent}}$$
2. Total all cells: total = chi² value.
3. Compare the chi² value with the chi² critical value [found by entering the degrees of freedom (number of rows − 1)(number of columns − 1) and the alpha value (1 − certainty required) into the chi² tables].

- If chi² > chi² critical value, then the variables are dependent.

5 PROBABILITY [2]: DISCRETE PROBABILITY DISTRIBUTIONS

Countable number of possible outcomes.

1. Determine the type of distribution
   a. Binomial Distribution
   b. Poisson Distribution
2. What is the question?
   a. Probability of x? Probability of more or less than?
   b. DRAW
3. Get the formula
4. Apply the terms

5.1 Binomial Distribution

$$P(x) = \binom{n}{x} p^x q^{n-x} = \frac{n!}{x!(n-x)!} p^x q^{n-x}$$

x = number of successes required
n = number of trials
p = probability of success
q = 1 − p = probability of failure

In calculator notation: $f(x = a) = {}_nC_a \cdot p^a \cdot q^{n-a}$, where f(x) = probability of x successes in n trials.

5.2 Poisson Distributions

$$P(x) = \frac{\lambda^x e^{-\lambda}}{x!}$$

λ = mean of the Poisson distribution

6 PROBABILITY [3]: CONTINUOUS DISTRIBUTIONS

Working strictly with probabilities (percentages etc.), found as areas under the distribution curve.

6.1 Uniform Distribution
This one looks like a rectangle; you merely need to find the area.

6.2 Normal Distribution

6.2.1 Probability density function of the normal distribution
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left[\frac{x-\mu}{\sigma}\right]^2}$$

6.2.2 Standardization (z-scores)
$$z = \frac{x - \mu}{\sigma}$$
Then look the z-score up in the z distribution table (one-sided).

6.3 Exponential Distributions

6.3.1 Probability density function of the exponential distribution
$$f(x) = \lambda e^{-\lambda x}$$
x ≥ 0 and λ > 0

6.3.2 Probability of the right tail of the exponential distribution
$$P(x \ge x_0) = e^{-\lambda x_0}$$
$x_0$ ≥ 0

7 SAMPLING AND SAMPLING DISTRIBUTIONS

7.1 Can the sample be assumed to be normal?
- If the sample size n > 30: yes
- If the population is normal: yes

7.2 Standard error of a sample mean

For an infinite population:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
For a finite population:
$$\sigma_{\bar{x}} = \left(\frac{\sigma}{\sqrt{n}}\right)\left(\sqrt{\frac{N-n}{N-1}}\right)$$
N = observations in population
n = observations in sample

7.3 Finite correction factor

This is necessary when $\frac{n}{N} > 0.05$.

For proportions:
$$\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} \sqrt{\frac{N-n}{N-1}}$$
$$\hat{p} = \text{proportion} = \frac{x}{n}$$
x = number of items in the sample with the requisite characteristic

For quantitative data:
$$\sigma_{\bar{x}} = \left(\frac{\sigma}{\sqrt{n}}\right)\left(\sqrt{\frac{N-n}{N-1}}\right)$$

8 INTERVAL ESTIMATION

8.1 Estimating the population mean with a large n, using "z"

8.1.1 Basic form
point estimate ± critical value × standard error

If $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ and the sample mean can be greater or less than the population mean, the confidence interval is:
$$\mu = \bar{x} \pm z\frac{\sigma}{\sqrt{n}}$$

8.1.2 Estimating μ
$$\mu = \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
$z_{\alpha/2}$ = z-score of the one-sided area outside of the confidence interval

Or
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
Usually $z_{\alpha/2}$ is needed for a confidence of 95%; see below.

8.1.3 Finding $z_{\alpha/2}$
1. Draw
2. Look up the area for $z_{\alpha/2}$ in the z-tables

8.1.4 Add a finite correction factor
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$$

8.1.5 If n is small (<30), then you can only use the above formulae if the population is normal

8.2 Estimating the population mean, using "t-statistic" (σ unknown)

8.2.1 T distribution
A distribution that describes the standardized sample mean when σ is unknown and the population is normal.

8.2.2 T value
Tool used to reach conclusions about the null hypothesis.
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

8.2.3 T distribution table
To read the table we need the degrees of freedom and the tail area:
degrees of freedom = n − 1
tail area = α/2

8.2.4 Confidence intervals to estimate the population mean using the t-stat
$$\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$

8.3 Estimating the population proportion

$$z = \frac{\hat{p} - p}{\sqrt{\frac{\hat{p}\hat{q}}{n}}}$$
$\hat{p}$ = sample proportion
$\hat{q}$ = 1 − $\hat{p}$
p = population proportion
n = sample size

8.3.1 Confidence interval to estimate p
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$

8.4 Estimating population variance

$$s^2 = \frac{\sum(x - \bar{x})^2}{n-1}$$

8.4.1 Chi² formula for variance
NB: The distribution must be normal to use this formula.
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}, \qquad df = n - 1$$

8.4.2 Confidence interval to estimate the population variance
$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}, \qquad df = n - 1$$
Work the χ² values out using $\chi^2_{\alpha/2,\,df}$ and $\chi^2_{1-\alpha/2,\,df}$ and the χ² tables.

8.5 Estimating sample size

This is used to find the minimum sample size to fulfill the requirements of a particular confidence level within a certain amount of error.

8.5.1 Sample size when estimating µ
$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{E^2} = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$$
$E = (\bar{x} - \mu)$ = error of estimation
You either need to work out E, or it can be given as "to be within .03 of the true population proportion".
Always round up, since you can't have half-people.

8.5.2 Sample size when estimating p
$$n = \frac{z^2 pq}{E^2}$$
Work out the z-stat through the confidence level and the tables.

9 HYPOTHESIS TESTING [1 POPULATION]

9.1 Methodology

1. Specify the thing of interest
2. Formulate H0 and Ha
   a. Draw
3. Define the level of significance
   a. 1 sided or two sided test?
      i. 1 sided for greater or less
      ii. 2 sided for equals
4. Test
   a. Determine the appropriate statistical test
   b. Establish the decision rule
   c. Gather sample data
   d. Analyze the data
5. Conclude/business application

9.2 Rejection and non-rejection regions

Via critical values (inside is the non-rejection region, outside is the rejection region).

9.3 Using z-stat

9.3.1 Testing hypothesis about a population mean using the z-stat
Z test for a single mean:
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
To find the critical z, subtract the tail area from 0.5 (or from 1, depending on the table), find that area in the body of the z table, then read off the row and column (i.e. the reverse of looking up a z-score).

9.3.1.1 EXAMPLE QUESTION
The average net income (Y) of a sole-proprietor CPA is $74914 [statistic from 10 years ago].
Test again, n = 112, σ = $14530.

STEP 1: HYPOTHESISE
H0: µ = $74914
Ha: µ ≠ $74914

STEP 2: WHICH TEST TO USE?
Sample size is large (n > 30) and the sample mean is the statistic, therefore z-stat.
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

STEP 3: WHAT ARE THE CRITICAL VALUES?
Accuracy required: 95%, therefore α = .05.
This test involves an = sign, not a ≤ or ≥ sign, so it is a two-tailed test.
α/2 = .05/2 = .025
Each side therefore has a .475 non-rejection area and a .025 rejection area.
Look up the tail area .025 (equivalently, the central area .475) in the z table to find $z_{\alpha/2}$ → ±1.96

STEP 4: FIND TEST STATISTIC
Sample mean = $78695, n = 112, µ = $74914, σ = $14530
$$z = \frac{78695 - 74914}{14530/\sqrt{112}} = 2.75$$

STEP 5: COMPARE TO CRITICAL VALUES
Non-rejection range = ±1.96; 2.75 is not in this range, so reject the null hypothesis.

9.3.2 Testing the mean with a finite population
$$z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}}$$

9.4 Using t-stat

9.4.1 T-test for µ (p320)
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}, \qquad df = n - 1$$

9.5 Hypothesis about a proportion

$$z = \frac{\hat{p} - p}{\sqrt{\frac{pq}{n}}}$$

9.6 Hypothesis about a variance (p331)

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}, \qquad df = n - 1$$

9.7 Type 2 errors

Failing to reject the null hypothesis when it is false (see p334).

10 HYPOTHESIS TESTING [2+ POPULATIONS] (p399)

10.1 Z formula for the difference in two sample means and population variances

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
Under H0, $\mu_1 - \mu_2 = 0$.

10.1.1 Confidence intervals to estimate $\mu_1 - \mu_2$ (see p360)

10.2 T stat for the difference in two sample means (variances unknown) (see p365)

10.2.1 Confidence intervals to estimate $\mu_1 - \mu_2$ (see p369)

10.3 Statistical inferences for related populations (see p373)

10.4 Statistical inferences for two population proportions (p383)

10.5 Statistical inferences for two population variances (p390)

The ratio of two sample variances gives the F value.

11 ANOVA

12 REGRESSION [1]

12.1 Single regression

$$y = (\text{intercept}) + c_1 x_1 + c_2 x_2 + \dots + c_n x_n$$
If a coefficient's "p-value" in the regression output is smaller than .05, reject the null hypothesis and use that variable in the formula.
R^2 shows the "goodness" of the model (0 = bad, 1 = good).

12.2 Multiple regression

In multiple regression R^2 is inaccurate, so we have to use the adjusted R^2.

12.3 Problems

Multicollinearity (predictor values overlap).

13 REGRESSION [2] MORE PROBLEMS

The residual is the difference between predicted and actual results.

13.1 F-test

H0: all of the coefficients = 0.
Reject H0 if F-stat > critical F, or equivalently if significance F < alpha.
To test each coefficient, change one at a time to 0 and see if there is a change.
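
The worked z-test in 9.3.1.1 can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `z_test_two_tailed` is a hypothetical helper introduced here for illustration, and the numbers are the ones given in the example (sample mean $78695, µ = $74914, σ = $14530, n = 112, α = .05).

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_tailed(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed z-test for a single mean with known sigma.

    Hypothetical helper for illustration; returns the test statistic,
    the positive critical value, and whether H0 is rejected.
    """
    standard_error = sigma / sqrt(n)            # sigma / sqrt(n)
    z = (sample_mean - mu0) / standard_error    # test statistic
    # Critical value: z-score leaving alpha/2 in the upper tail.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z, z_crit, abs(z) > z_crit


# Figures from the example question in 9.3.1.1.
z, z_crit, reject = z_test_two_tailed(78695, 74914, 14530, 112)
print(f"z = {z:.2f}, critical values = +/-{z_crit:.2f}, reject H0: {reject}")
```

Here `NormalDist().inv_cdf` stands in for the reverse z-table lookup described in 9.3.1, so the critical value is computed rather than read from the tables.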

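
The chi² dependence test in section 4 can be reproduced the same way. This sketch recomputes the predicted (independent) table and the chi² value from the observed Retail/Sale by W/W' counts. Two things are assumptions, not from the notes: α = .05 is chosen for illustration, and the critical value is obtained by squaring the two-tailed z critical value, a shortcut that is valid only for 1 degree of freedom (a 2×2 table). For larger tables the chi² tables are still needed.

```python
from statistics import NormalDist

# Observed counts from 4.1.1: rows = Retail, Sale; columns = W, W'.
observed = [[420, 280],
            [1140, 160]]

row_totals = [sum(row) for row in observed]          # 700, 1300
col_totals = [sum(col) for col in zip(*observed)]    # 1560, 440
grand_total = sum(row_totals)                        # 2000

# 4.1.3: predicted count if independent = column total * row total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# 4.1.4: sum (observed - predicted)^2 / predicted over all cells.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Assumption: alpha = 0.05. With df = (2-1)(2-1) = 1, a chi-square variable
# is a squared standard normal, so the critical value is z_(alpha/2) squared.
alpha = 0.05
chi2_crit = NormalDist().inv_cdf(1 - alpha / 2) ** 2

print(f"chi2 = {chi2:.2f}, critical = {chi2_crit:.2f}, dependent: {chi2 > chi2_crit}")
```

The recomputed predicted table matches the one in 4.1.3 (546, 154, 1014, 286), and the chi² value is far above the critical value, so under the assumed α the two variables would be judged dependent.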