Basics of Statistical Analysis

Basics of Analysis
• The process of data analysis
  – Observation → (encode) → Data → (analysis) → Information

Example 1:
– Gift catalog marketer
– Mails 4 times a year to its customers
– Company has 1 million customers on its file

Example 1
• The cataloger would like to know whether new customers buy more than old customers.
• Classify as a new customer anyone who bought for the first time within the last twelve months.
• The analyst takes a sample of 100,000 customers and notices the following.

Example 1
• 5,000 orders received in the last month
• 3,000 (60%) were from new customers
• 2,000 (40%) were from old customers
• So it looks like the new customers are doing better

Example 1
• Is there a catch here?
• Data at this gross level does not discriminate between customers within either group.
  – A customer who bought within the last 11 days is treated exactly the same as a customer who bought within the last 11 months.

Example 1
• Can we use some other variable to distinguish between old and new customers?
• Answer: actual dollars spent!
• What can we do with this variable?
  – Find its mean and variation.
• We might find that the average purchase amount for old customers is two or three times larger than the average among new customers.

Numerical Summaries of data
• The two basic concepts are the center and the spread of the data
• Center of data
  – Mean, which is given by
    \[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
  – Median
  – Mode

Numerical Summaries of data
• Forms of variation
  – Sum of differences about the mean (always zero, which is why the differences are squared in the variance):
    \[ \sum_{i=1}^{n} (x_i - \bar{x}) \]
  – Variance:
    \[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]
  – Standard deviation: square root of the variance

Confidence Intervals
• In the catalog example, the analyst wants to know the average purchase amount of customers
• He draws two samples of 75 customers each and finds the means to be $68 and $122
• Since the difference is large, he draws another 38 samples of 75 each
• The mean of the means of the 40 samples turns out to be $94.85
• How confident should he be of this mean of means?
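The repeated-sampling idea in the catalog example can be sketched in Python. This is a minimal illustration, not the analyst's actual procedure: the population is synthetic (made-up purchase amounts drawn from an exponential distribution), the seeds and function names are assumptions, and the output will not reproduce the $94.85 figure. It applies the mean and the n - 1 variance formula to 40 samples of 75 customers each.

```python
import math
import random


def mean(values):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)


def sample_variance(values):
    """Sum of squared differences about the mean, divided by n - 1."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)


def sample_std(values):
    """Standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(values))


def draw_sample_means(population, n_samples=40, sample_size=75, seed=0):
    """Draw n_samples random samples and return the mean of each one."""
    rng = random.Random(seed)
    return [mean(rng.sample(population, sample_size)) for _ in range(n_samples)]


# Hypothetical population: 100,000 synthetic purchase amounts (not real data).
_rng = random.Random(42)
population = [_rng.expovariate(1 / 95.0) for _ in range(100_000)]

sample_means = draw_sample_means(population)
mean_of_means = mean(sample_means)
spread_of_means = sample_std(sample_means)  # standard deviation of the 40 sample means

print(f"mean of sample means: {mean_of_means:.2f}")
print(f"SD of sample means:   {spread_of_means:.2f}")
```

Individual sample means can differ widely, as the $68 and $122 samples did, while the mean of many sample means is far more stable. The spread of the sample means is what quantifies how much trust the mean of means deserves.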
Confidence Intervals
• The analyst calculates the standard deviation of the sample means, called the Standard Error (SE). (For our example, SE is 12.91.)
• Basic premise for confidence intervals
  – 95 percent of the time, the true mean purchase amount lies within plus or minus 1.96 standard errors of the mean of the sample means.
• C.I. = Mean ± 1.96 × Standard Error

Confidence Intervals
• However, if the CI is calculated from only one sample, then
    \[ \text{Standard error of sample mean} = \frac{\text{Standard deviation of sample}}{\sqrt{n}} \]
• Basic premise for confidence intervals with one sample
  – 95 percent of the time, the true mean lies within plus or minus 1.96 standard errors of the sample mean.

Example 2: Confidence Intervals for response rates
• You are the marketing analyst for Online Apparel Company
• You want to run a promotion for all customers on your database
• In the past you have run many such promotions
• Historically you needed a 4% response for the promotions to break even
• You want to test the viability of the current full-scale promotion by running a small test promotion

Example 2: Confidence Intervals for response rates
• Test 1,000 names selected at random from the full list.
• The test sample returns 3.8%.
• You construct the CI based on the sample rate of 3.8% and n = 1000
• Confidence Interval = Sample Response ± 1.96 × SE
• The SE = .006, so the CI is 0.038 ± 1.96 × 0.006, or about (0.026, 0.050)
• In our case C.I. = 2.6% to 5.0%. Since 4% lies inside this interval, the test result is consistent with the hypothesis that the true response rate is 4%

© 2007 Prentice Hall

Example 2: Confidence Intervals for response rates
• So if the sample response rate is 3.8%, then the true response rate may be 4%
• What if the sample response rate were 5%?
• Regression towards the mean: the phenomenon of a test result being different from the true result
• Give more thought to lists whose cutoff rates lie within the confidence interval
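The interval arithmetic in both examples can be checked with a short Python sketch. The function names are illustrative, and the standard error of a response rate is assumed to be the usual binomial form sqrt(p(1 - p)/n), which the slides do not spell out but which reproduces the quoted SE of .006. With the 1.96 multiplier, the response-rate interval comes out to about 2.6% to 5.0%; one standard error either side of 3.8% would give 3.2% to 4.4%.

```python
import math


def proportion_standard_error(rate, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(rate * (1.0 - rate) / n)


def confidence_interval(estimate, standard_error, z=1.96):
    """Return (lower, upper) = estimate -/+ z * standard_error."""
    margin = z * standard_error
    return estimate - margin, estimate + margin


# Example 2: 3.8% response on a test mailing of 1,000 names.
se = proportion_standard_error(0.038, 1000)
low, high = confidence_interval(0.038, se)
break_even = 0.04
print(f"SE = {se:.4f}, 95% CI = ({low:.3f}, {high:.3f})")
print("break-even rate inside interval:", low <= break_even <= high)

# Catalog example: mean of means $94.85 with a standard error of 12.91.
low_m, high_m = confidence_interval(94.85, 12.91)
print(f"95% CI for mean purchase: (${low_m:.2f}, ${high_m:.2f})")
```

The same helper can be rerun with a 5% sample rate to explore the question raised on the last slide: the break-even rate of 4% can sit inside the interval even when the test result itself looks comfortably above it.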