data exercise

BSAD 141, Spring 2014
Excel Data Analysis Exercise
FIRST: CHECK TO SEE IF YOU HAVE THE DATA ANALYSIS PLUG-IN INSTALLED.
We installed the plug-in at the beginning of last class. If you are using a MAC, MAKE SURE THAT
YOU ARE RUNNING EXCEL FROM VIRTUAL BOX!!!! Click the Data tab on the top menu bar and look to
the far right. You should see Data Analysis on the very far right. If you have it, great!
You do not need to do anything.
If you do NOT have the Data Analysis option, follow the directions below to install it. Even more
detailed instructions from last class (Tuesday 2/25) are posted online.
The Analysis ToolPak includes the tools described below. To access these tools, click Data Analysis in the Analysis
group on the Data tab. If the Data Analysis command is not available, you need to load the Analysis ToolPak add-in
program.
1. Click the File tab, click Options, and then click the Add-Ins category.
2. In the Manage box, select Excel Add-ins and then click Go.
3. In the Add-Ins available box, select the Analysis ToolPak check box, and then click OK.
Tip: If Analysis ToolPak is not listed in the Add-Ins available box, click Browse to locate it.
If you are prompted that the Analysis ToolPak is not currently installed on your computer, click Yes to install it.
Now, everyone should be able to proceed:
1) Left click on the “data” hot link in the schedule.
2) The file will open. Left click on the gear-like icon in the far upper right of the menu bar
of your browser and choose “File” → “Save as”.
3) Choose a location on your hard drive to save the file (preferably a folder you have associated
with this class) and save it as a text file. By default the file name will be data and the type
will be Text File (*.txt).
4) Once you have saved the file to your hard drive, you can close your browser. Open Excel.
5) In the upper left menu bar in Excel choose File → Open and navigate to the location on your
hard drive where you saved “data.txt”. You will have to select “All Files” at the bottom right of
the Open window, because the default is set to search for only Excel files. Double click data.txt
to open it in Excel.
6) The Text Import Wizard opens.
7) Select the “Delimited” radio button (it is currently on “Fixed Width”) → click Next.
8) Depending on the data you are using, you need to select the appropriate delimiter (the delimiter
describes how the different data fields are separated – for example, they may be separated by
a tab, a space, a semicolon, etc.). In this case, we have space-delimited data, so click the check
box next to “Space”.
9) You can see that the fields of data are now separated by Excel in the Data preview window →
choose Next.
10) Leave the Column data format as General (this is the default) and choose Finish.
11) You have now opened a text file in Excel and have the data separated into the appropriate
columns. Save the file as Data.xlsx: choose File → Save As → choose Excel Workbook as the file
type → Save (after this you will have both a data.txt and a data.xlsx file).
12) For our purposes, just delete columns H through M and then insert a row above row 1. We need
this row for column headers (field names for our data).
13) Label the columns as follows: column A = Local IP, column B = Remote IP, C = Protocol, D = Local
Port, E = Remote Port, F = Incoming Bytes, G = Outgoing Bytes.
14) Create a new column in H and label it Total Bytes. Use the SUM function in Excel to calculate
the sum of incoming and outgoing bytes for each row of the Total Bytes column (for example,
=SUM(F2:G2) in cell H2).
15) Highlight all columns (A through H). Use the data filter from the Home tab (it is called
Sort & Filter, on the far right of the menu bar) and select “Filter”.
16) Create a new worksheet in Excel called TCP and a new worksheet called UDP. Double click on the
Excel tab directly to the right of the one you are currently working in (look on the bottom left
of the Excel screen) → change the name from “Sheet1” to “TCP”. Double click on the next tab over,
“Sheet2” → change the name from “Sheet2” to “UDP”.
17) The data where the value of Protocol = 6 is TCP stream data, and the data where the value of
Protocol = 17 is UDP stream data (these values are in column C). Filter based on each protocol,
copy the filtered data, and paste it into the appropriate worksheet (i.e. protocol 6 into the TCP
worksheet and protocol 17 into the UDP worksheet) → save your workbook.
18) Insert a new column on the far left of each worksheet and number the records. You should have
377 TCP records and 109 UDP records.
19) Use Excel to calculate summary statistics for Incoming, Outgoing, and Total Bytes for both TCP
and UDP (choose the Data tab from the topmost menu and then Data Analysis from the far right). List
the tables one after another in column K (I’ll demonstrate). Make sure to check the Summary
statistics and Confidence Level for Mean boxes here.
20) What are the Kurtosis and Skewness values for each category (incoming, outgoing, total) for
both TCP and UDP? What basic information do these values provide?
21) Once you do this for all three columns in each of the two worksheets, go to cell N2 and type
the following numbers (one number in each row): 100; 500; 1,000; 5,000; 10,000; 50,000; 100,000;
500,000; 1,000,000; 5,000,000 (10 total).
22) These numbers will serve as your bin range for histograms. Use Data Analysis to generate a
histogram for all three categories (incoming, outgoing, and total) for both the TCP and UDP
worksheets.
For example, for TCP (incoming, outgoing, and total) you should have the following:

TCP: Incoming
Bin          Frequency
100          4
500          30
1,000        18
5,000        76
10,000       15
50,000       38
100,000      8
500,000      116
1,000,000    13
5,000,000    37
More         22

TCP: Outgoing
Bin          Frequency
100          8
500          19
1,000        3
5,000        40
10,000       21
50,000       91
100,000      33
500,000      138
1,000,000    6
5,000,000    15
More         3

TCP: Total
Bin          Frequency
100          2
500          11
1,000        9
5,000        20
10,000       19
50,000       89
100,000      22
500,000      43
1,000,000    93
5,000,000    40
More         29
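
If you want to double-check your histograms outside Excel, the binning can be sketched in a few lines of Python. This is only an illustration, not part of the assignment. It assumes that a value equal to a bin's upper bound is counted in that bin (which is how the Data Analysis histogram tool is documented to behave), and the names BINS, histogram, and relative_frequencies are made up for this sketch. The TCP: Incoming frequencies are copied from the example above; the five sample byte counts are invented just to show where boundary values land.

```python
from bisect import bisect_left

# Bin upper bounds from step 21 of the handout.
BINS = [100, 500, 1_000, 5_000, 10_000, 50_000, 100_000,
        500_000, 1_000_000, 5_000_000]


def histogram(values, bins=BINS):
    """Count values per bin; the extra last slot is the "More" bucket.

    Assumption: a value equal to a bin's upper bound falls in that bin.
    """
    counts = [0] * (len(bins) + 1)
    for value in values:
        # bisect_left returns the index of the first bound >= value,
        # or len(bins) when the value exceeds every bound ("More").
        counts[bisect_left(bins, value)] += 1
    return counts


def relative_frequencies(counts):
    """Convert raw counts into proportions that sum to 1."""
    total = sum(counts)
    return [count / total for count in counts]


# TCP: Incoming frequencies copied from the handout's example.
tcp_incoming = [4, 30, 18, 76, 15, 38, 8, 116, 13, 37, 22]
tcp_incoming_probs = relative_frequencies(tcp_incoming)

# Invented byte counts, only to show where boundary values land.
sample_counts = histogram([50, 100, 101, 5_000, 6_000_000])
```

The example frequencies add up to 377, the number of TCP records, so dividing each frequency by that total gives the proportions you would plot as a probability distribution.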
23) Using the data from the histograms, plot the probability distribution for each of the three
categories on both worksheets (I will demonstrate). Go to the Insert menu tab → select Column
(for inserting a column graph).
24) Be able to provide a brief write-up that describes the important differences between UDP and
TCP traffic. Do the incoming and outgoing bytes differ? How? What does this mean? What conclusions
can you make? Can you compare the mean values of TCP traffic and UDP traffic for any of the
incoming, outgoing, or total categories using the statistical tests provided in Excel?
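
For anyone who wants to cross-check the Excel results, here is a rough Python sketch of the same workflow: read the space-delimited rows, keep the first seven fields, add Total Bytes, split the records by protocol, and compute a two-sample t statistic. It is a sketch under assumptions, not the required method. It assumes the Protocol and byte fields are whole numbers, and it uses Welch's unequal-variance t-test as one possible choice (the Data Analysis list includes "t-Test: Two-Sample Assuming Unequal Variances"). The four sample rows are invented, not taken from data.txt, and the function names load_records and welch_t are hypothetical.

```python
import math
import statistics

# Column labels from step 13 of the handout (columns A through G).
COLUMNS = ["Local IP", "Remote IP", "Protocol", "Local Port",
           "Remote Port", "Incoming Bytes", "Outgoing Bytes"]


def load_records(lines):
    """Parse space-delimited lines, keep the first 7 fields, add Total Bytes."""
    records = []
    for line in lines:
        fields = line.split()  # splits on runs of whitespace
        if len(fields) < len(COLUMNS):
            continue  # skip blank or incomplete lines
        record = dict(zip(COLUMNS, fields))  # extra fields (H onward) are dropped
        # Assumption: these three fields are whole numbers.
        for name in ("Protocol", "Incoming Bytes", "Outgoing Bytes"):
            record[name] = int(record[name])
        record["Total Bytes"] = record["Incoming Bytes"] + record["Outgoing Bytes"]
        records.append(record)
    return records


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    se2_a = statistics.variance(sample_a) / len(sample_a)
    se2_b = statistics.variance(sample_b) / len(sample_b)
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(sample_a) - 1) + se2_b ** 2 / (len(sample_b) - 1)
    )
    return t, dof


# Invented rows for illustration only; the real data.txt is not shown here.
sample_lines = [
    "10.0.0.1 192.0.2.10 6 51000 80 1200 300",
    "10.0.0.1 192.0.2.11 6 51001 443 5200 900",
    "10.0.0.2 192.0.2.12 17 53000 53 120 80",
    "10.0.0.2 192.0.2.13 17 53001 53 200 100",
]
records = load_records(sample_lines)
tcp = [r for r in records if r["Protocol"] == 6]    # protocol 6 = TCP
udp = [r for r in records if r["Protocol"] == 17]   # protocol 17 = UDP
t_stat, dof = welch_t([r["Total Bytes"] for r in tcp],
                      [r["Total Bytes"] for r in udp])
```

With the real file you would pass the lines of data.txt to load_records and expect 377 TCP and 109 UDP records. The t statistic still has to be compared against a t distribution with the returned degrees of freedom, which Excel's tool reports for you as a p-value.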