Josh O'Farrell
Math 1040-011
Instructor: T. Hilton
April 30, 2012

Time and $, Are They Related When You Grocery Shop?

INTRODUCTION

It is probably safe to assume that almost all of us go grocery shopping on a fairly regular basis. Because most of us cannot afford to eat out all the time, we need to go to the store to buy food. By "store" I mean anywhere you go to buy your groceries. It is important to understand that we spend more than just money at the store. We also spend our time, which is a precious commodity too. But what if the two are related? This aspect of grocery shopping is what our group decided to focus on.

Our group, consisting of Jennifer Gerrard, Charonda Edwards, Brad Peterson, and myself, posed the following research question for our statistics class: "Is the amount of time a person spends in a grocery store related to the amount of money spent at that same grocery store?" Basically, we want to see whether our independent (or explanatory) variable, the time spent at a grocery store, can predict our dependent (or response) variable, the money spent on groceries at that store. Both of our variables are quantitative, meaning they are numerical measurements (minutes and dollars).

To collect our data we needed to ask people coming out of the grocery store two questions, one for each variable. The first question was, "How much time did you spend in the grocery store?" The second was, "How much money did you spend in the grocery store?" This type of data collection is an observational study. Specifically, we treated it as a case-control (retrospective) study, because we had individuals look back in time and give us data from the past. It would have been impossible to put these questions to every shopper at the stores we visited, so we chose a systematic sampling approach to collect the data needed for our study.
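As a rough illustration of systematic sampling, the sketch below picks every 6th shopper from a stream of people leaving a store, starting at a position chosen by a die roll, and stops after 10 responses. The function name `systematic_sample`, the list of 100 numbered shoppers, and the fixed seed are assumptions made only for this example; they are not part of our actual data collection.

```python
import random

def systematic_sample(exiting_shoppers, k=6, sample_size=10, rng=None):
    """Pick every k-th shopper, starting at a random position from 1 to k."""
    rng = rng or random.Random()
    # For k = 6 this is equivalent to rolling one fair six-sided die.
    start = rng.randint(1, k)
    # Positions are 1-based: start, start + k, start + 2k, ...
    positions = range(start, len(exiting_shoppers) + 1, k)
    return [exiting_shoppers[p - 1] for p in positions][:sample_size]

# Hypothetical stream of 100 shoppers leaving one store, labeled 1..100.
shoppers = list(range(1, 101))
picked = systematic_sample(shoppers, rng=random.Random(0))
print(picked)
```

Only the starting point is random. Once it is fixed, every later pick is determined by the step of 6, which is what makes this approach easy to carry out at a store exit.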
In the systematic method each of us went to a different grocery store (each from a different grocery chain) and asked every kth person exiting the store the two questions mentioned earlier. We set k = 6, so every 6th person was asked, and each of us asked 10 people for a total sample of 40 data points. We each rolled a die to determine our random starting point. I went to the Sunflower Farmers Market on State Street in Murray to collect my ten data points.

DATA STATISTICS & ANALYSIS

All of us did collect 10 data points each, for a total of 40 in our sample. Table 1 contains all the data points of our sample.

Table 1 Sample

X - Time (min)   Y - Cost ($)   |   X - Time (min)   Y - Cost ($)
      25              83        |         30             160
      10              48        |         10              10
      12              21        |         15              60
      10               3        |         30              70
       9              40        |         20              40
       4              18        |         15              17
      75             248        |         15              79
      20             193        |          5              10
       3               8        |         15              14
      10               8        |          3               3
      10              15        |         15               5
      30               7        |         50              90
      60              19        |          8               3
      30              31        |          5              10
      45               4        |         15             100
      15               8        |         30              86
       5              13        |          3               5
      30              27        |         30              44
      10              53        |         30              35
       9              11        |         20              57

Table 2 shows the descriptive statistics for our independent variable. Below that table are the frequency diagram and boxplot for this variable. Both diagrams indicate that the distribution of the data is skewed right, with a median of 15 minutes and a range of 72 minutes. This variable has one outlier, located at 75 minutes.

Table 2 Descriptive Statistics for X - Time (min)

Mean       19.65
S.D.       16.01
Min        3
Q1         9.5
Median     15
Q3         30
Max        75
Range      72
Mode       30
Outliers   75

The descriptive statistics for our dependent variable can be found in Table 3. Below that table are the frequency diagram and boxplot for this variable. As with our independent variable, the distribution of the data for our dependent variable is skewed right. It has a median of $20 and a range of $245. This variable has three outliers, located at $160, $193, and $248.

Table 3 Descriptive Statistics for Y - Money ($)

Mean       43.9
S.D.       53.96
Min        3
Q1         9
Median     20
Q3         58.5
Max        248
Range      245
Mode       3, 8, 10
Outliers   160, 193, 248

When we plot the two variables together on a Cartesian coordinate system, we get the scatter plot in Figure 1, which also includes the estimated linear regression line based on the data we collected. The equation for this line is written in the figure, along with the R-value of 0.5410. The critical value of the correlation coefficient for n = 40 is 0.312. Our R-value is greater than the critical value for our sample size of 40. Because it is greater, and the slope of the regression line is positive, we can say that there is a positive linear relationship between the two variables.

Figure 1 Scatter Plot with Regression Line (y = 1.8237x + 8.0639, R = 0.5410; horizontal axis: X - Time in Store (min); vertical axis: Y - Money Spent ($))

LESSONS LEARNED

One of the things I picked up on right away was the difference in accuracy between the values given for money spent and those given for time spent in the store. Individuals questioned could easily look at their receipt and tell me the exact amount they spent; however, most of the values given for time spent in the store seemed to be estimates at best. Some individuals would look at their watch to try to calculate the time, and others would simply provide me a number (or a best guess). In one case I put the questions to a couple who had finished their shopping, and their answers for how long they had been in the store differed by 15 minutes. Overall, I feel that the reported time values are off from the true values.
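The correlation coefficient and regression line reported with Figure 1 can be re-derived from the Table 1 data. The sketch below does this with the standard least-squares formulas, assuming the X and Y values in Table 1 pair up in the order they are listed. The variable names (`time_min`, `cost_usd`, `CRITICAL_R`) are chosen only for this example.

```python
from math import sqrt

# Table 1 data, paired in listed order (an assumption about the table layout).
time_min = [25, 10, 12, 10, 9, 4, 75, 20, 3, 10, 10, 30, 60, 30, 45, 15, 5, 30, 10, 9,
            30, 10, 15, 30, 20, 15, 15, 5, 15, 3, 15, 50, 8, 5, 15, 30, 3, 30, 30, 20]
cost_usd = [83, 48, 21, 3, 40, 18, 248, 193, 8, 8, 15, 7, 19, 31, 4, 8, 13, 27, 53, 11,
            160, 10, 60, 70, 40, 17, 79, 10, 14, 3, 5, 90, 3, 10, 100, 86, 5, 44, 35, 57]

n = len(time_min)
mean_x = sum(time_min) / n
mean_y = sum(cost_usd) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x in time_min)
syy = sum((y - mean_y) ** 2 for y in cost_usd)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(time_min, cost_usd))

r = sxy / sqrt(sxx * syy)            # Pearson correlation coefficient
slope = sxy / sxx                    # least-squares slope
intercept = mean_y - slope * mean_x  # least-squares intercept

CRITICAL_R = 0.312                   # critical value quoted in the text for n = 40
print(f"r = {r:.4f}, line: y = {slope:.4f}x + {intercept:.4f}")
print("linear relation supported:", abs(r) > CRITICAL_R)
```

Comparing the absolute value of r with the critical value is the same decision rule used in the analysis above. The sign of the slope then tells us the direction of the relationship.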
While a positive linear relation does exist between the two variables, I believe there could be other factors (or lurking variables) that influence how much money is spent in a given amount of time at the grocery store. The data points (45, 4) and (60, 19) are about $86 and $98, respectively, below the values the regression line predicts for those times (about $90 and $117). One possible explanation could be a shopper's familiarity with that particular grocery store. If those individuals walked into that store for the very first time, it may have taken them longer than the average shopper to find the items they were looking for. Shopping habits may also be a factor. Maybe these shoppers were the kind who like to walk up and down every aisle to "window shop" and only pick up a couple of items.

There are also differences in the opposite direction: the points (15, 100) and (20, 193) are about $65 and $148, respectively, above their expected values for those times. Perhaps these shoppers were intimately familiar with "their" store and knew exactly where to find the items on their list, so they took less time than normal to grocery shop. They may also have been purchasing large quantities of items, such as in a case lot sale, or more expensive items. Sales may also affect the values for money spent in the grocery store. While these are reasonable explanations for the variation in the data, I cannot say definitively whether any of these variables affected our study.

CONCLUSION

We set out to see whether there was a relationship between time spent and money spent in the grocery store. Other factors may affect the correlation between the variables, but it is difficult to say so with certainty. Based on the data we collected, we can say that there is a positive linear relation between the two variables we were looking at.
This basically means that yes, there is a relationship between the time we spend in a grocery store and the amount of money spent.
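
The distances above and below the line discussed under LESSONS LEARNED are residuals: the observed cost minus the cost predicted by the regression line in Figure 1. A minimal sketch of that arithmetic for the four points named there is shown below. The function names `predicted_cost` and `residual` are made up for this example, and the slope and intercept are the rounded values printed in the figure.

```python
# Regression line reported in Figure 1: y = 1.8237x + 8.0639
SLOPE, INTERCEPT = 1.8237, 8.0639

def predicted_cost(minutes):
    """Cost in dollars that the regression line predicts for a given time."""
    return SLOPE * minutes + INTERCEPT

def residual(minutes, cost):
    """Observed minus predicted cost; negative means the point is below the line."""
    return cost - predicted_cost(minutes)

# The four (time, cost) points singled out in the discussion.
points = [(45, 4), (60, 19), (15, 100), (20, 193)]
residuals = {p: round(residual(*p), 2) for p in points}

for (minutes, cost), res in residuals.items():
    print(f"({minutes}, {cost}): predicted ${predicted_cost(minutes):.2f}, "
          f"residual {res:+.2f}")
```

A negative residual marks a shopper who spent less than the line predicts for their time in the store, and a positive residual marks one who spent more.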