EXCEL PROJECT TUTORIAL

GETTING YOUR UNIQUE DATA SET

Go to the Stat 216 homepage, http://www.stat.wmich.edu/s216, and click on the Weekly Homework link.

Under the Excel Projects section, click on HW Data. You will be directed to a page containing several data sets. Click on the one assigned for this semester: Realestate Data.

You will then be directed to a page for that data set. Under the Select Variables section, check the box before each variable.

At the bottom of the page, enter 30 for the sample size and enter the 4-digit PIN that you use to access your weekly homework. Then click the Submit button. You will be directed to a page containing your unique data set.

COPYING YOUR DATA SET INTO EXCEL

On the page containing your unique data set, select all, then copy. Open Microsoft Excel and paste your data set into the first cell (A1).

Click on the Data tab, then Text to Columns, to separate the variables into several columns. In the dialogue box that appears, choose Delimited, then click Next.

In the next dialogue box, select how your data set is delimited. In our case, the variables are separated by commas, so make sure only the Comma box is checked. Then click Finish.

Your data set is now separated into columns. You may change the font size and other formatting as you like. Since you will use this specific data set in all three phases of the project, save it with a filename you will remember, e.g. stat216project.

EXCEL PROJECT Phase I

In this phase, you are expected to identify the type and level of measurement of each variable you are dealing with.
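As a rough illustration of what Text to Columns does with the pasted text, and of a quick first pass at sorting variables into numbers versus text, here is a minimal Python sketch. The variable names and values in RAW are made up. The numeric check is only a starting point. A column such as Pool is coded with numbers but is still categorical, so you must decide each variable's type and level of measurement from what the variable means.

```python
import csv
import io

# Hypothetical excerpt laid out like the pasted data set (comma-separated,
# variable names in the first row). Names and values are made up.
RAW = """Price,Bedrooms,Pool
189000,3,0
245500,4,1
172000,2,0
"""


def split_columns(text):
    """Split comma-delimited text into a header list and a dict of columns,
    roughly what Text to Columns (Delimited, Comma) does in Excel."""
    reader = csv.reader(io.StringIO(text.strip()))
    header = next(reader)
    columns = {name: [] for name in header}
    for row in reader:
        for name, cell in zip(header, row):
            columns[name].append(cell.strip())
    return header, columns


def is_numeric(cells):
    """Return True if every cell parses as a number. This is only a first
    pass: numeric codes (e.g. Pool = 0/1) can still be categorical."""
    try:
        for cell in cells:
            float(cell)
    except ValueError:
        return False
    return True


header, columns = split_columns(RAW)
for name in header:
    kind = "numeric" if is_numeric(columns[name]) else "text"
    print(f"{name}: {kind}")
```

All three columns print as numeric here. That is why Pool still needs a human judgment: its 0/1 codes are only labels, so it is categorical (nominal).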
In addition, depending on the kind of variable you have, what is the appropriate method of data presentation for it? Furthermore, what measures of location and spread could you compute for these variables to better describe your data set?

You may construct a table to guide you on what to do with your variables. For example:

Variable   Type          Level of Measurement
Price      Numerical     Ratio
Color      Categorical   Nominal

Once you have identified the type and level of measurement of each variable, what graphs or tables could you use to describe the categorical variables? What about the numerical variables?

Microsoft Excel has a Data Analysis ToolPak that can help you create graphs. On your Data tab, you should see a button labeled Data Analysis. If it is not there, you need to install the ToolPak.

INSTALLING DATA ANALYSIS TOOLPAK

In Excel 2007, click on the Office button in the top-left corner, then choose Excel Options. In the box that appears, click on Add-Ins.

At the bottom of the Add-Ins menu, select Excel Add-ins from the Manage drop-down list, then click Go.

In the Add-Ins dialogue box, check the box for Analysis ToolPak, then click OK. The Data Analysis button will now appear on the Data tab.

GRAPHING VARIABLES

Suppose you want to create a graph for a variable. Let's say, for example, that your variable has two categories: 1 = Yes and 0 = No. For this variable, the first thing you need to do is count the number of observations in each category. Then select the appropriate graph.

Open the file containing your data set.
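The counting step just described can be sketched in Python using a hypothetical list of 0/1 responses in place of a real column. The sketch also computes the share each category receives in a pie graph, which is its count divided by the total.

```python
from collections import Counter

# Hypothetical 0/1 responses (1 = Yes, 0 = No); not real project data.
responses = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]

counts = Counter(responses)        # number of observations per category
n = sum(counts.values())           # total number of observations

for category in sorted(counts):
    share = counts[category] / n   # proportion of the whole
    angle = 360 * share            # size of the pie slice in degrees
    label = "Yes" if category == 1 else "No"
    print(f"{category} ({label}): count={counts[category]}, "
          f"share={share:.0%}, slice={angle:.0f} degrees")
```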
Suppose your data set contains a categorical variable, say Pool (0 = No, 1 = Yes). In this particular example, suppose the observations for Pool start at cell D2 and end at D31.

To graph a categorical variable, you must first create a "bin", a column that lists all the categories of the variable. Since Pool has only two categories, the bin also has two entries: 0 and 1.

Once you have created the bin, click on the Data tab, then click the Data Analysis button. A menu listing all the tools in the Analysis ToolPak will appear. Since our goal is to count the number of observations in each category, choose Histogram, then click OK.

You will then be prompted to enter the Input Range and the Bin Range. The Input Range is the column containing the observations for the variable. The Bin Range is the column containing the categories of the variable. In our example, the observations for Pool run from D2 to D31, while the bin runs from K3 to K4.

Once you click OK, a new worksheet will be created showing the count for each category of the variable.

On this worksheet, click the Insert tab, then choose the graph you want. For example, to make a pie graph, click on Pie, then choose the type of pie you want. Excel will then display the pie graph.

COMPUTING SUMMARY STATISTICS

Suppose you want to describe a variable using some numerical descriptive measures. Let's say our variable is the price of a house, and in our data set this variable is in the first column. Again, click on the Data tab, then the Data Analysis button. From the menu, select Descriptive Statistics.

In the Input Range box, enter the range of the variable you want to compute statistics for.
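For reference, here is a minimal Python sketch of several measures that the Descriptive Statistics tool reports, using a hypothetical list of prices. It uses the sample (n - 1) versions of the standard deviation and variance, which correspond to the ToolPak's Standard Deviation and Sample Variance rows. The tool's Mode, Kurtosis, and Skewness rows are left out.

```python
import math
import statistics

# Hypothetical house prices; substitute the column from your own data set.
prices = [152000, 189000, 172500, 245500, 199000, 310000, 168000, 221000]

summary = {
    "Mean": statistics.mean(prices),
    "Standard Error": statistics.stdev(prices) / math.sqrt(len(prices)),
    "Median": statistics.median(prices),
    "Standard Deviation": statistics.stdev(prices),   # sample SD (n - 1)
    "Sample Variance": statistics.variance(prices),   # sample variance (n - 1)
    "Range": max(prices) - min(prices),
    "Minimum": min(prices),
    "Maximum": max(prices),
    "Sum": sum(prices),
    "Count": len(prices),
}

for name, value in summary.items():
    print(f"{name:>18}: {value:,.2f}")
```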
If the first row contains the label of the variable, check the box that says Labels in First Row. Then check the Summary statistics box and click OK.

The values of several numerical descriptive measures will be displayed on a new worksheet. Adjust the column width to see the values clearly.

PHASE I WRITE-UP

Using all the graphs and computations you made for the variables, describe the data set you have on hand. You do not have to use all the variables in your write-up. However, you must briefly explain why you decided to include each variable that you do use.

EXCEL PROJECT Phase II

The second phase of the project focuses on estimation and tests of hypotheses. In this phase, you will compute point and interval estimates for a specific variable of interest and draw conclusions based on a confidence interval or the p-value of a test.

Suppose, for example, we go back to our data set with the variables Price and Pool. We might be interested in the average price of a house, or in the difference between the average prices of houses with and without a pool.

POINT ESTIMATION

If we are interested only in a point estimate of the mean of a specific variable, we can use the Descriptive Statistics option in the Data Analysis menu (see the Computing Summary Statistics instructions above). If we want a confidence interval instead, we can use an Excel worksheet that we have provided for you.

CONFIDENCE INTERVAL

We made an Excel worksheet to help you compute a confidence interval for the mean easily. The first worksheet is designed for a confidence interval for one population mean. Just follow the instructions written on the spreadsheet, and the worksheet will display your confidence interval.

If you are interested in a confidence interval for the difference between two independent means, use the second spreadsheet.
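This tutorial does not reproduce the worksheet's formulas, so the Python sketch below assumes the standard t intervals. For one mean it uses xbar ± t_crit * s / sqrt(n). For two independent means it assumes a pooled (equal-variance) standard error with n0 + n1 - 2 degrees of freedom. The critical value t_crit is passed in rather than computed; in Excel 2007 you can look it up with TINV(0.05, df) for 95% confidence. The function names and the rows data are hypothetical. The pooled function also returns the t statistic that an equal-variance two-sample t-test would report. Its two-tailed p-value is TDIST(ABS(t), df, 2) in Excel.

```python
import math
import statistics


def one_sample_ci(values, t_crit):
    """Interval xbar +/- t_crit * s / sqrt(n) for one population mean.
    t_crit: two-tailed critical value for n - 1 degrees of freedom."""
    n = len(values)
    xbar = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)   # standard error of the mean
    return xbar - t_crit * se, xbar + t_crit * se


def pooled_two_sample(group0, group1, t_crit):
    """Interval for mean(group0) - mean(group1) using a pooled standard error.
    t_crit: two-tailed critical value for n0 + n1 - 2 degrees of freedom.
    Returns (lower, upper, t_stat, df)."""
    n0, n1 = len(group0), len(group1)
    df = n0 + n1 - 2
    # Pooled variance: the two sample variances weighted by their df.
    sp2 = ((n0 - 1) * statistics.variance(group0)
           + (n1 - 1) * statistics.variance(group1)) / df
    se = math.sqrt(sp2 * (1 / n0 + 1 / n1))
    diff = statistics.mean(group0) - statistics.mean(group1)
    t_stat = diff / se   # test statistic for H0: the two means are equal
    return diff - t_crit * se, diff + t_crit * se, t_stat, df


# Hypothetical (price, pool) rows; grouping by pool plays the role of sorting.
rows = [(150000, 0), (162000, 0), (171000, 0), (158000, 0),
        (205000, 1), (198000, 1), (221000, 1)]
no_pool = [price for price, pool in rows if pool == 0]
with_pool = [price for price, pool in rows if pool == 1]

low, high = one_sample_ci(no_pool, t_crit=3.182)            # 95%, df = 3
print(f"Mean price, no pool: ({low:,.0f}, {high:,.0f})")

low, high, t_stat, df = pooled_two_sample(no_pool, with_pool, t_crit=2.571)  # 95%, df = 5
print(f"No pool minus pool: ({low:,.0f}, {high:,.0f}), t = {t_stat:.3f}, df = {df}")
```

If your data do not support the equal-variance assumption, the unequal-variance (Welch) version uses a different standard error and degrees of freedom, so its results will differ from this sketch.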
First, you need to sort the data set so that the values are grouped by category. For example, suppose we want a confidence interval for the difference in the average price of homes with a pool (Pool = 1) and without a pool (Pool = 0). We need to sort the data set so that all homes with Pool = 0 are next to each other, and all homes with Pool = 1 are next to each other.

SORTING YOUR DATA SET

Select the entire data set (Ctrl + A). Click on the Data tab, then choose the Sort button.

The Sort dialogue box will appear. Since our data set has the variable names in the first row, check the My data has headers box.

Then, from the Sort by drop-down menu, choose the variable to sort by. In our case, we use Pool. Once you have selected the appropriate variable, click OK. Your data set will then be sorted by that variable, with all the homes with Pool = 0 next to each other.

CONFIDENCE INTERVAL

Once your data set is sorted, follow the instructions in the worksheet to obtain your confidence interval. Since our interest is the difference in price between homes with and without a pool, copy the PRICES of the homes with Pool = 0 into the 0 column of the worksheet, and the PRICES of the homes with Pool = 1 into the 1 column. You can also use this confidence interval to draw conclusions.

TEST OF HYPOTHESIS

The Data Analysis ToolPak has several tools you can use to conduct a test of hypothesis. Choose the one appropriate for the test you are going to conduct.

Suppose in our example we want to know whether there is a difference in the average price of houses with and without a pool. Since this compares two independent means, we would use one of the two-sample t-tests in the list (the ToolPak offers versions assuming equal and unequal variances). Once you click OK, the test's dialogue box should appear.

In the Variable 1 Range box, specify the range of prices for homes with Pool = 0.
In the Variable 2 Range box, specify the range of prices for homes with Pool = 1. Set the level of significance in the Alpha box.

Suppose in our sample data set the prices for Pool = 0 run from A2 to A19, while those for Pool = 1 run from A20 to A31. We want to test the hypothesis at the 5% level of significance (Alpha = 0.05).

The output will appear on a new worksheet. Adjust the column widths to see the numbers clearly. The output includes the value of the test statistic (t Stat), the p-value for a one-tailed test (P(T<=t) one-tail), and the p-value for a two-tailed test (P(T<=t) two-tail).

PHASE II WRITE-UP

Your Phase II write-up should include all your estimates and the conclusions you drew. You must provide supporting evidence for each conclusion. That is, state the p-value and explain why it led you to that conclusion.

EXCEL PROJECT Phase III

The final phase of the project is basically Phases I and II combined, with additional information that you could include in your project. For example, by the time you turn in Phase II, we will not yet have covered chi-square, regression, or correlation analysis. In your final phase, you might want to include some of these analyses to give further meaning to your data set.

For example, consider our data set containing the price of a house. Which variables are associated with price? Which variables could you use to predict the price of a house? These are just guide questions to help you analyze your data set further.

CORRELATION ANALYSIS

Suppose you want to determine the strength of the association between the price of a house and the number of bedrooms. In the Data Analysis ToolPak, choose Correlation. In the dialogue box, highlight the columns for price and bedrooms as the Input Range. Also, check the Labels in First Row box.

The output will appear on a new worksheet and shows the correlation coefficient of the two variables.
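The correlation coefficient just computed and the simple regression described next both come from the same centered sums of squares. The Python sketch below uses hypothetical bedroom and price lists to compute Pearson's r and the least-squares intercept and slope. For a single predictor, r squared should match the R Square value in Excel's regression output.

```python
import math

# Hypothetical data: number of bedrooms and sale price for six houses.
bedrooms = [2, 3, 3, 4, 4, 5]
prices = [150000, 172000, 181000, 205000, 214000, 248000]


def _sums(x, y):
    """Return the means of x and y plus the centered sums Sxy, Sxx, Syy."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return mx, my, sxy, sxx, syy


def pearson_r(x, y):
    """Pearson correlation coefficient: Sxy / sqrt(Sxx * Syy)."""
    _, _, sxy, sxx, syy = _sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(x, y):
    """Intercept and slope of the least-squares line y = b0 + b1 * x."""
    mx, my, sxy, sxx, _ = _sums(x, y)
    b1 = sxy / sxx          # slope
    b0 = my - b1 * mx       # intercept: the line passes through (mx, my)
    return b0, b1


r = pearson_r(bedrooms, prices)
b0, b1 = least_squares_line(bedrooms, prices)
print(f"r = {r:.3f}, R Square = {r ** 2:.3f}")
print(f"predicted price = {b0:,.0f} + {b1:,.0f} * bedrooms")
```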
REGRESSION ANALYSIS

Suppose you want to predict the price of a house using, say, the number of bedrooms. You can use the Regression option in the Data Analysis ToolPak.

In the Regression dialogue box, enter the range of values of the variable you want to predict (price) in the Input Y Range box. Enter the range of values of the variable you are using to predict it (bedrooms) in the Input X Range box.

The regression output will appear on a new worksheet.

PHASE III WRITE-UP

In your final project write-up, you are expected to write an executive summary of the entire project. You might want to include the following sections so that your paper is effective for your readers.

Introduction
What is your data set about? What are the variables? What questions did you intend to answer in this project, and what methods did you use to answer them?

Executive Summary
What are your findings? What are the answers to the questions you raised? What can you conclude about your data set?

Appendix
A copy of your data set and your references.