BSAD 141, Spring 2014
Excel Data Analysis Exercise

FIRST: CHECK TO SEE IF YOU HAVE THE DATA ANALYSIS PLUG-IN INSTALLED. We installed the plug-in at the beginning of last class. If you are using a Mac, MAKE SURE THAT YOU ARE RUNNING EXCEL FROM VIRTUAL BOX!

Click the Data tab on the top menu bar and look to the far right. You should see Data Analysis on the very far right. If you have it, great! You do not need to do anything. If you do NOT have the Data Analysis option, follow the directions below to install it. Even more detailed instructions from last class (Tuesday 2/25) are posted online.

The Analysis ToolPak includes the tools described below. To access these tools, click Data Analysis in the Analysis group on the Data tab. If the Data Analysis command is not available, you need to load the Analysis ToolPak add-in program.

1. Click the File tab, click Options, and then click the Add-Ins category.
2. In the Manage box, select Excel Add-ins and then click Go.
3. In the Add-Ins available box, select the Analysis ToolPak check box, and then click OK.

Tip: If Analysis ToolPak is not listed in the Add-Ins available box, click Browse to locate it. If you are prompted that the Analysis ToolPak is not currently installed on your computer, click Yes to install it.

Now everyone should be able to continue:

1) Left click on the “data” hot link in the schedule.
2) The file will open in your browser. Left click on the gear-like icon in the far upper right of the browser's menu bar and choose File, then Save As.
3) Choose a location on your hard drive to save the file (preferably a folder you have associated with this class) and save it as a text file. By default the file name will be data and the type will be Text File (*.txt).
4) Once you have saved the file to your hard drive, you can close your browser.
Open Excel

5) In the upper left menu bar in Excel, choose File, then Open, and navigate to the location on your hard drive where you saved “data.txt”. You will have to select “All Files” from the file type list at the bottom right, because the default is set to search for only Excel files. Double click data.txt to open it in Excel.
6) The Text Import Wizard opens.
7) Select the “Delimited” radio button (it is currently on “Fixed Width”) and click Next.
8) Depending on the data you are using, you need to select the appropriate delimiter. The delimiter describes how the different data fields are separated: for example, by a tab, a space, or a semicolon. In this case we have space-delimited data, so click the check box next to “Space”.
9) You can see in the Data preview window that Excel has now separated the fields of data. Choose Next.
10) Leave the Column data format as General (this is the default) and choose Finish.
11) You have now opened a text file in Excel and have the data separated into the appropriate columns. Save the file as Data.xlsx: choose File, then Save As, choose Excel Workbook as the file type, and save. After this you will have both a data.txt and a data.xlsx file.
12) For our purposes, just delete columns H through M and then insert a row above row 1. We need this row for column headers (field names for our data).
13) Label the columns as follows: column A = Local IP, column B = Remote IP, C = Protocol, D = Local Port, E = Remote Port, F = Incoming Bytes, G = Outgoing Bytes.
14) Create a new column in H and label it Total Bytes. Use the SUM function in Excel to calculate the sum of incoming and outgoing bytes for the Total Bytes column (for example, =SUM(F2:G2) in cell H2, copied down the column).
15) Highlight all columns (A through H). Use the data filter from the Home tab (it is called Sort & Filter, on the far right of the menu bar) and select “Filter”.
16) Create a new worksheet in Excel called TCP and a new worksheet called UDP.
Double click on the Excel tab directly to the right of the one you are currently working in (look on the bottom left of the Excel screen) and change the name from “Sheet1” to “TCP”. Double click on the next tab over, “Sheet2”, and change the name from “Sheet2” to “UDP”.
17) The rows where Protocol = 6 are TCP stream data and the rows where Protocol = 17 are UDP stream data (these values are in column C). Filter on each protocol, copy the filtered data, and paste it into the appropriate worksheet (i.e., protocol 6 into the TCP worksheet and protocol 17 into the UDP worksheet). Save your workbook.
18) Insert a new column on the far left of each worksheet and number the records. You should have 377 TCP records and 109 UDP records.
19) Use Excel to calculate summary statistics for Incoming, Outgoing, and Total Bytes for both TCP and UDP (choose the Data tab from the topmost menu and then Data Analysis from the far right). List the tables one after another in column K (I’ll demonstrate). Make sure to check the Summary statistics and confidence interval (Confidence Level for Mean) options here.
20) What are the Kurtosis and Skewness values for each category (incoming, outgoing, total) for both TCP and UDP? What basic information do these values provide?
21) Once you have done this for all three columns in each of the two worksheets, go to cell N2 and type the following numbers (one number in each row): 100; 500; 1,000; 5,000; 10,000; 50,000; 100,000; 500,000; 1,000,000; 5,000,000 (10 total).
22) These numbers will serve as your bin range for histograms. Use Data Analysis to generate a histogram for all three categories (incoming, outgoing, and total) on both the TCP and UDP worksheets.
For example, for TCP (incoming, outgoing, and total) you should have the following:

TCP: Incoming
Bin          Frequency
100                  4
500                 30
1,000               18
5,000               76
10,000              15
50,000              38
100,000              8
500,000            116
1,000,000           13
5,000,000           37
More                22

TCP: Outgoing
Bin          Frequency
100                  8
500                 19
1,000                3
5,000               40
10,000              21
50,000              91
100,000             33
500,000            138
1,000,000            6
5,000,000           15
More                 3

TCP: Total
Bin          Frequency
100                  2
500                 11
1,000                9
5,000               20
10,000              19
50,000              89
100,000             22
500,000             43
1,000,000           93
5,000,000           40
More                29

23) Using the data from the histograms, plot the probability distribution for each of the three categories on both worksheets (I will demonstrate). Go to the Insert tab and select Column (for inserting a column graph).
24) Be able to provide a brief write-up that describes the important differences between UDP and TCP traffic. Do the incoming and outgoing bytes differ? How? What does this mean? What conclusions can you make? Can you compare the mean values of TCP traffic and UDP traffic for any of the incoming, outgoing, or total categories using the statistical tests provided in Excel?
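
If you want to double check your Excel results outside of Excel, the labeling, Total Bytes, protocol split, histogram, and skewness/kurtosis steps can be sketched in a few lines of Python. This is only a rough sketch and not part of the assignment. The function names are made up for illustration, and the sample lines in the demo are invented (they are NOT records from data.txt). The sketch assumes that a value is counted in the first bin whose limit is greater than or equal to it, with anything above the last bin going to More, and that the skewness and kurtosis formulas match the sample formulas behind Excel's SKEW and KURT functions.

```python
import bisect
import math

# Column labels from step 13; Total Bytes (step 14) is added in parse_rows.
COLUMNS = ["Local IP", "Remote IP", "Protocol", "Local Port",
           "Remote Port", "Incoming Bytes", "Outgoing Bytes"]
# Bin range from step 21.
BINS = [100, 500, 1_000, 5_000, 10_000, 50_000, 100_000,
        500_000, 1_000_000, 5_000_000]


def parse_rows(lines):
    """Split space-delimited lines and keep the first seven fields (columns A-G)."""
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) < len(COLUMNS):
            continue  # skip blank or short lines
        row = dict(zip(COLUMNS, fields))  # zip drops any extra fields (H-M)
        for name in ("Protocol", "Incoming Bytes", "Outgoing Bytes"):
            row[name] = int(row[name])
        row["Total Bytes"] = row["Incoming Bytes"] + row["Outgoing Bytes"]
        rows.append(row)
    return rows


def split_by_protocol(rows):
    """Protocol 6 is TCP and protocol 17 is UDP (step 17)."""
    return {"TCP": [r for r in rows if r["Protocol"] == 6],
            "UDP": [r for r in rows if r["Protocol"] == 17]}


def histogram(values, bins=BINS):
    """Return one count per bin plus a final 'More' count (assumed bin rule)."""
    counts = [0] * (len(bins) + 1)
    for v in values:
        # Index of the first bin limit >= v; len(bins) means 'More'.
        counts[bisect.bisect_left(bins, v)] += 1
    return counts


def probabilities(counts):
    """Turn frequencies into relative frequencies for plotting (step 23)."""
    total = sum(counts)
    return [c / total for c in counts]


def sample_skewness(values):
    """Sample skewness (assumed to match Excel's SKEW); needs 3+ values."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((v - mean) / s) ** 3 for v in values)


def sample_kurtosis(values):
    """Sample excess kurtosis (assumed to match Excel's KURT); needs 4+ values."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    fourth = sum(((v - mean) / s) ** 4 for v in values)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


if __name__ == "__main__":
    # Invented sample lines, NOT real records from data.txt.
    sample = [
        "10.0.0.1 10.0.0.2 6 80 5000 100 400",
        "10.0.0.1 10.0.0.3 6 443 5001 2000 9000",
        "10.0.0.1 10.0.0.4 17 53 5002 60 40",
    ]
    sheets = split_by_protocol(parse_rows(sample))
    for name, rows in sheets.items():
        totals = [r["Total Bytes"] for r in rows]
        print(name, len(rows), histogram(totals))
```

To try it on the real file, you could pass the lines of data.txt to parse_rows (for example, parse_rows(open("data.txt"))) and compare the record counts and histogram counts with what Excel gives you. If the numbers disagree, check the assumptions listed above before assuming Excel is wrong.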