26133: BUSINESS STATISTICS EXAM NOTES

1 TYPES OF DATA
   1.1 Data Quality (Nominal, Ordinal, Interval, Ratio)
   1.2 Method of data collection
   1.3 Types of graphs
2 DESCRIPTIVE STATISTICS, NUMERICAL MEASURES
   2.1 Numerical Data Summaries
   2.2 Finding Outliers
3 PROBABILITY [1]
   3.1 Miscellaneous Laws
4 DEPENDENCE (CHI2 TEST)
5 PROBABILITY [2]: DISCRETE PROBABILITY DISTRIBUTIONS
   5.1 Binomial Distribution
   5.2 Poisson Distributions
6 PROBABILITY [3]: CONTINUOUS DISTRIBUTIONS
   6.1 Uniform Distribution
   6.2 Normal Distribution
   6.3 Exponential Distributions
7 SAMPLING AND SAMPLING DISTRIBUTIONS
   7.1 Can the sample be assumed to be normal?
   7.2 Standard error of a sample mean
   7.3 Finite correction factor
8 INTERVAL ESTIMATION
   8.1 Estimating the population mean with a large n, using “z”
   8.2 Estimating the population mean, using “t-statistic” (σ unknown)
   8.3 Estimating the population proportion
   8.4 Estimating population variance
   8.5 Estimating sample size
9 HYPOTHESIS TESTING [1 POPULATION]
   9.1 Methodology
   9.2 Rejection and non-rejection regions
   9.3 Types of questions
10 HYPOTHESIS TESTING [2+ POPULATIONS]
11 REGRESSION [1]
12 REGRESSION [2]

1 TYPES OF DATA

1.1 DATA QUALITY (NOMINAL, ORDINAL, INTERVAL, RATIO)

• Nominal (purely descriptive)
• Ordinal (ordered)
• Interval (each group of equal magnitude)
• Ratio (has a true zero point)

1.2 METHOD OF DATA COLLECTION

• Sampling (small group to represent population)
   o Cheap
• Population (everyone)
   o Thorough
• Time-series (over time)
   o Shows change
• Cross-sectional (once/a snapshot)
   o Cheap/where time is irrelevant

1.3 TYPES OF GRAPHS

• Bar chart
   o Sectional comparison/growth
• Line graph
• Ogive
   o Cumulative frequency (percentage less than)
• Pie chart
   o Percentages
• Scatter plot
   o Infer trends

2 DESCRIPTIVE STATISTICS, NUMERICAL MEASURES

2.1 NUMERICAL DATA SUMMARIES

2.1.1 Mode

Most frequently occurring value.

2.1.2 Median

Middle value of the ordered data.

2.1.3 Mean

\[ \text{Mean} = \mu = \bar{x} = \frac{\sum \text{observations}}{\#\ \text{of observations}} \]

On the calculator:

1. SD Mode: [MODE], [MODE], [1].
2. Stat clear: [SHIFT], [MODE], [1].
3. Enter values: [OBSERVATION], [SHIFT], [,], [NUMBER OF OBSERVATIONS], [M+] (repeat for each observation value).
4. Calculate: [SHIFT], [2] (S-VAR), [1] (x̄), [=].

2.1.4 Variance

\[ \text{Variance (population)} = \sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N} \]

\[ \text{Variance (sample)} = s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1} \]

1. Stat clear: [SHIFT], [MODE], [1].
2. Enter values: [OBSERVATION], [SHIFT], [,], [NUMBER OF OBSERVATIONS], [M+] (repeat for each observation value).
3. Calculate standard deviation: [SHIFT], [2] (S-VAR), [2] (xσn) OR [3] (xσn-1), [=].
4. Square for variance: [x²], [=].

2.1.5 Standard Deviation

\[ \text{Standard Deviation (population)} = \sigma = \sqrt{\sigma^2}, \qquad \text{Standard Deviation (sample)} = s = \sqrt{s^2} \]

1. Stat clear: [SHIFT], [MODE], [1].
2. Enter values: [OBSERVATION], [SHIFT], [,], [NUMBER OF OBSERVATIONS], [M+] (repeat for each observation value).
3. Calculate standard deviation: [SHIFT], [2] (S-VAR), [2] (xσn, population) OR [3] (xσn-1, sample), [=].

2.1.6 Coefficient of Variation

Measure of relative spread (standard deviation as a percentage of the mean); most useful when all data values are positive.

\[ \text{Coefficient of Variation} = \frac{s}{\bar{x}}(100) = \frac{\sigma}{\mu}(100) \]

2.2 FINDING OUTLIERS

2.2.1 Z-Score

The z-score describes the distance of a value from the average in terms of standard deviations.

\[ \text{Z-score} = z_i = \frac{x_i - \bar{x}}{s} \]

• For outliers, \( |z_i| > 3 \)

2.2.2 Box and whisker plot

Use for irregular/asymmetrical data.

Describes the data set in terms of 5 points: min, q1, median, q3, max → IQR = q3 − q1.

• min = q1 − 1.5(IQR)
• q1 = split again (median of the lower half)
• median = central data point
• q3 = split again (median of the upper half)
• max = q3 + 1.5(IQR)

Points below min or above max are treated as outliers.

3 PROBABILITY [1]

3.1 MISCELLANEOUS LAWS

• Sum of probabilities = 1 = 100%
• \( P' = 1 - P \)

              A            A'
   B       P(A∩B)       P(A'∩B)       P(B)
   B'      P(A∩B')      P(A'∩B')      P(B')
           P(A)         P(A')         1

3.1.1 Intersection

Both occur: \( P(A \cap B) \)

3.1.2 Union

Either A or B or both occurring: \( P(A \cup B) \)

\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \]

3.1.3 Conditional Probability

Probability of A occurring given that B has already occurred:

\[ P(A|B) = \frac{P(A \cap B)}{P(B)} \]

4 DEPENDENCE (CHI2 TEST)

4.1.1 Observed Data

Insert the observed data into a table.

Observed data
                W        'W
   Retail      420      280       700   sum rp
   Sale       1140      160      1300   sum 'rp
              1560      440      2000   TT
             sum w    sum 'w

4.1.2 Probability from observations

\[ \text{Probability from observation} = \frac{\text{observation}}{\sum\sum} \]

where ΣΣ is the grand total (TT).

Probability
                 P (W)    P' (W)
   P (Retail)    0.21     0.14      0.35   P (RP)
   P' (Sale)     0.57     0.08      0.65   P' (RP)
                 0.78     0.22      1      TT
                 P (W)    P' (W)

4.1.3 Predicted results if events are independent

\[ \text{Predicted result if events are independent} = \frac{\text{Column} \times \text{Row}}{\sum\sum} \]

Events as independent
                W        'W
   Retail      546      154       700   sum rp
   Sale       1014      286      1300   sum 'rp
              1560      440      2000   TT
             sum w    sum 'w

4.1.4 Chi2 Test

1. Create table: for each cell,

\[ \chi^2_{\text{cell}} = \frac{(\text{observed result} - \text{predicted result if independent})^2}{\text{predicted result if independent}} \]

2. Total all cells: Total = Chi2 value

Compare the Chi2 value with the Chi2 critical value [found by entering the degrees of freedom, (number of rows − 1)(number of columns − 1), and the alpha value, (1 − certainty required), into the chi2 tables]
→ if chi2 > chi2 critical value, then the values are dependent.

5 PROBABILITY [2]: DISCRETE PROBABILITY DISTRIBUTIONS

Countable number of possible outcomes.

1. Determine the type of distribution
   a. Binomial Distribution
   b. Poisson Distribution
2. What is the question?
   a. Probability of x? Probability of more or less than?
   b. DRAW
3. Get the formula
4. Apply the terms

5.1 BINOMIAL DISTRIBUTION

\[ P(x) = \binom{n}{x} p^x q^{n-x} = \frac{n!}{x!(n-x)!}\, p^x q^{n-x} \]

x = number of successes required
n = number of trials
p = probability of success
q = 1 − p = probability of failure

In calculator notation:

\[ P(x = r) = {}^{n}C_{r} \cdot p^{r} \cdot q^{n-r} \]

P(x) = probability of x successes in n trials

5.2 POISSON DISTRIBUTIONS

\[ P(x) = \frac{\lambda^x e^{-\lambda}}{x!} \]

λ = mean of the Poisson distribution

6 PROBABILITY [3]: CONTINUOUS DISTRIBUTIONS

Working strictly with probabilities as areas under a curve (percentages etc.).

6.1 UNIFORM DISTRIBUTION

This one looks like a rectangle; you merely need to find the area. For a distribution between a and b the height is 1/(b − a), so \( P(x_1 \le x \le x_2) = \frac{x_2 - x_1}{b - a} \).

6.2 NORMAL DISTRIBUTION

6.2.1 Probability density function of the normal distribution

\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left[\frac{x-\mu}{\sigma}\right]^2} \]

6.2.2 Standardization (z-scores)

\[ z = \frac{x - \mu}{\sigma} \]

Then look the z-score up in the z distribution table (the table gives a one-sided area).

6.3 EXPONENTIAL DISTRIBUTIONS

6.3.1 Probability density function of the exponential distribution

\[ f(x) = \lambda e^{-\lambda x} \]

x must be greater than or equal to zero and λ must be greater than zero.

6.3.2 Probability of the right tail of the exponential distribution

\[ P(x \ge x_0) = e^{-\lambda x_0} \]

\( x_0 \) must be greater than or equal to 0.

7 SAMPLING AND SAMPLING DISTRIBUTIONS

7.1 CAN THE SAMPLE BE ASSUMED TO BE NORMAL?

• If the sample size is > 30: yes
• If the population is normal: yes

7.2 STANDARD ERROR OF A SAMPLE MEAN

For an infinite population

\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]

For a finite population

\[ \sigma_{\bar{x}} = \left(\frac{\sigma}{\sqrt{n}}\right)\left(\sqrt{\frac{N-n}{N-1}}\right) \]

N = observations in population
n = observations in sample

7.3 FINITE CORRECTION FACTOR

This is necessary when \( \frac{n}{N} > 0.05 \)

For proportions

\[ \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}\,\sqrt{\frac{N-n}{N-1}} \]

\[ \hat{p} = \text{proportion} = \frac{x}{n} \]

x = number of items in the sample with the requisite characteristic

For quantitative data

\[ \sigma_{\bar{x}} = \left(\frac{\sigma}{\sqrt{n}}\right)\left(\sqrt{\frac{N-n}{N-1}}\right) \]

8 INTERVAL ESTIMATION

8.1 ESTIMATING THE POPULATION MEAN WITH A LARGE n, USING “Z”

8.1.1 Basic form

\[ \text{point estimate} \pm \text{critical value} \times \text{standard error} \]

If \( z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \) and the sample mean can be greater or less than the population mean, the confidence interval is:

\[ \mu = \bar{x} \pm z\frac{\sigma}{\sqrt{n}} \]

8.1.2 Estimating μ

\[ \mu = \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]

\( z_{\alpha/2} \) = z-score of the one-sided area outside of the confidence interval

Or

\[ \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]

Usually \( z_{\alpha/2} \) is needed for a confidence of 95%; see below.

8.1.3 Finding \( z_{\alpha/2} \)

1. Draw
2. Look up the tail area α/2 in the z-tables to get \( z_{\alpha/2} \)

8.1.4 Add a finite correction factor

\[ \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} \]

8.1.5 If n is small (<30), then you can only use the above formulae if the population is normal

8.2 ESTIMATING THE POPULATION MEAN, USING “T-STATISTIC” (σ UNKNOWN)

8.2.1 T distribution

A distribution that describes the standardized sample mean when σ is unknown and the population is normal.

8.2.2 T value

Tool used to reach conclusions about the null hypothesis.

\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \]

8.2.3 T distribution table

To read the table we need the degrees of freedom and the tail area:

\[ \text{Degrees of freedom} = n - 1, \qquad \text{tail area} = \alpha/2 \]

8.2.4 Confidence intervals to estimate the population mean using the t-stat

\[ \bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \]

8.3 ESTIMATING THE POPULATION PROPORTION

\[ z = \frac{\hat{p} - p}{\sqrt{\frac{\hat{p}\hat{q}}{n}}} \]

\( \hat{p} \) = sample proportion
\( \hat{q} \) = 1 − \( \hat{p} \)
p = population proportion
n = sample size

8.3.1 Confidence interval to estimate p

\[ \hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \]

8.4 ESTIMATING POPULATION VARIANCE

\[ s^2 = \frac{\sum (x - \bar{x})^2}{n-1} \]

8.4.1 Chi2 formula for variance

NB: The distribution must be normal to use this formula.

\[ \chi^2 = \frac{(n-1)s^2}{\sigma^2}, \qquad df = (n-1) \]

8.4.2 Confidence interval to estimate the population variance

\[ \frac{(n-1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}, \qquad df = (n-1) \]

Work χ² out using \( \chi^2_{(\alpha/2),\,df} \) and \( \chi^2_{(1-\alpha/2),\,df} \) and the χ² tables.

8.5 ESTIMATING SAMPLE SIZE

This is used to find the minimum sample size that meets a particular confidence level within a certain amount of error.

8.5.1 Sample size when estimating µ

\[ n = \frac{z_{\alpha/2}^2\,\sigma^2}{E^2} = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 \]

\[ E = (\bar{x} - \mu) = \text{Error of Estimation} \]

You either need to work out E, or it is given in the question, e.g. “to be within .03 of the true population proportion”.

Always round up, since you can’t have half-people.

8.5.2 Sample size when estimating p

\[ n = \frac{z^2 pq}{E^2} \]

Work out the z-stat through the confidence interval and tables.

9 HYPOTHESIS TESTING [1 POPULATION]

9.1 METHODOLOGY

1. Specify the thing of interest
2. Formulate H0 and Ha
   a. Draw
3. Define the level of significance
   a. 1 sided or two sided test?
      i. 1 sided for greater or less
      ii. 2 sided for equals
4. Test
   a. Determine the appropriate statistical test
   b. Establish the decision rule
   c. Gather sample data
   d. Analyze the data
5. Conclude/business application

9.2 REJECTION AND NON-REJECTION REGIONS

Via critical values (inside is the non-rejection region, outside is the rejection region).

9.3 USING Z-STAT

9.3.1 Testing hypothesis about a population mean using the z-stat

Z test for a single mean

\[ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \]

To find the critical z, subtract the tail area from 0.5 (or from 1, depending on the table), find that area in the body of the z table, then read off the row/column (i.e. the reverse of looking up a z-score).

9.3.1.1 EXAMPLE QUESTION

A CPA’s average net income (Y) as a sole proprietor is $74914 [statistic from 10 years ago].
Test whether this still holds: n=112, σ=$14530

STEP 1: HYPOTHESISE

H0: µ=$74914
Ha: µ≠$74914

STEP 2: WHICH TEST TO USE?

Sample size is large (n>30), sample mean as stat, therefore z-stat.

\[ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \]

STEP 3: WHAT ARE THE CRITICAL VALUES?

Accuracy required: 95%, therefore α=.05
This test involves an = sign, not a ≤ or ≥ sign, so it is a two-tailed test.
α/2=.05/2=.025
Each side therefore has a .475 non-rejection area and a .025 rejection area.
Look up .025 in the z table to find \( z_{\alpha/2} \) → +/- 1.96

STEP 4: FIND TEST STATISTIC

Sample mean = $78695, n = 112, µ = $74914, σ=$14530

\[ z = \frac{78695 - 74914}{14530/\sqrt{112}} = 2.75 \]

STEP 5: COMPARE TO CRITICAL VALUES

Non-rejection range = +/- 1.96; 2.75 is not in this range, so reject the null hypothesis.

9.3.2 Testing the mean with a finite population

\[ z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}} \]

9.4 USING T-STAT

9.4.1 T-test for µ (p. 320)

\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}}, \qquad df = n - 1 \]

9.5 HYPOTHESIS ABOUT A PROPORTION

\[ z = \frac{\hat{p} - p}{\sqrt{\frac{pq}{n}}} \]

9.6 HYPOTHESIS ABOUT A VARIANCE (p. 331)

\[ \chi^2 = \frac{(n-1)s^2}{\sigma^2}, \qquad df = n - 1 \]

9.7 TYPE 2 ERRORS

Failing to reject the null hypothesis when it is false.
See p. 334

10 HYPOTHESIS TESTING [2+ POPULATIONS] (p. 399)

10.1 Z FORMULA FOR THE DIFFERENCE IN TWO SAMPLE MEANS AND POPULATION VARIANCES

\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \]

Under the null hypothesis, \( \mu_1 - \mu_2 = 0 \).

10.1.1 Confidence intervals in estimate of \( \mu_1 - \mu_2 \) (see p. 360)

10.2 T STAT FOR THE DIFFERENCE IN TWO SAMPLE MEANS (VARIANCES UNKNOWN)

(see p. 365)

10.2.1 Confidence intervals in estimate of \( \mu_1 - \mu_2 \) (see p. 369)

10.3 STATISTICAL INFERENCES FOR RELATED POPULATIONS

(see p. 373)

10.4 STATISTICAL INFERENCES FOR TWO POPULATION PROPORTIONS

(p. 383)

10.5 STATISTICAL INFERENCES FOR TWO POPULATION VARIANCES

(p. 390)
The ratio of two sample variances gives the F value.

11 ANOVA

12 REGRESSION [1]

12.1 SINGLE REGRESSION

\[ y = (\text{intercept}) + b_1 x_1 + b_2 x_2 + \dots + b_n x_n \]

If a coefficient's “p-value” in the regression output is smaller than .05, reject the null hypothesis (that the coefficient is 0) and use that variable in the formula.
R^2 shows the “goodness” of the model (0=bad, 1=good).

12.2 MULTIPLE REGRESSION

In multiple regression R^2 is inaccurate (it rises whenever a variable is added), so we have to use the adjusted R^2.

12.3 PROBLEMS

Multicollinearity (the predictor variables overlap, i.e. are correlated with each other).

13 REGRESSION [2] MORE PROBLEMS

The residual is the difference between the predicted and actual results.

13.1 F-TEST

H0: all of the coefficients = 0
If F-stat > critical F, or equivalently if Significance F < alpha, reject H0.
Testing each coefficient: change one at a time to 0 and see if there is a change.
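
As a numerical cross-check of the two worked examples in these notes (the chi2 table in section 4 and the z-test in 9.3.1.1), the following is a minimal Python sketch. The function names `chi_square_statistic` and `z_test_two_tailed` are hypothetical helpers made up for illustration, and the chi2 critical value 3.841 (alpha = .05, 1 degree of freedom) is taken from a standard chi2 table rather than computed. The z critical value comes from the standard library's `statistics.NormalDist` instead of a printed z table.

```python
from math import sqrt
from statistics import NormalDist


def chi_square_statistic(observed):
    """Chi2 statistic for a table of observed counts (a list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Predicted count if rows and columns are independent (4.1.3)
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - expected) ** 2 / expected
    return chi2


def z_test_two_tailed(sample_mean, mu0, sigma, n, alpha):
    """Return (z, critical z, reject H0?) for a two-tailed z test of a mean."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # replaces the z-table lookup
    return z, z_crit, abs(z) > z_crit


# Section 4: rows are Retail and Sale, columns are W and 'W
observed = [[420, 280], [1140, 160]]
chi2 = chi_square_statistic(observed)
CHI2_CRITICAL_05_DF1 = 3.841  # assumption: standard table value, alpha=.05, df=1
print(f"chi2 = {chi2:.2f}, dependent: {chi2 > CHI2_CRITICAL_05_DF1}")
# prints: chi2 = 203.34, dependent: True

# Section 9.3.1.1: CPA net income example
z, z_crit, reject = z_test_two_tailed(78695, 74914, 14530, 112, 0.05)
print(f"z = {z:.2f}, critical = +/-{z_crit:.2f}, reject H0: {reject}")
# prints: z = 2.75, critical = +/-1.96, reject H0: True
```

Both results agree with the hand calculations: the chi2 value is far above the critical value, so the two variables are dependent, and z = 2.75 falls outside +/- 1.96, so the null hypothesis is rejected.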