EXCEL PROJECT TUTORIAL
GETTING YOUR UNIQUE DATA SET…

Go to the STAT 216 homepage at
http://www.stat.wmich.edu/s216 and click on
the Weekly Homework link.

Under the Excel Projects section, click on HW Data.

You will be directed to a page containing several data sets.
Click on the one assigned for this semester: Realestate Data.

You will then be directed to a page for that data set.
Under the Select Variables section, check the box before
each variable.

At the bottom of the page, enter 30 for the sample
size and the 4-digit PIN that you use to access
your weekly homework, then click on the Submit button.

You will then be directed to a page containing your
unique data set.
COPYING YOUR DATA SET INTO EXCEL…


On the page containing your unique data set, select all, then
copy.
Open Microsoft Excel and paste your data set into the first
cell.

Click on the Data tab, then Text to Columns, to
separate the variables into several columns.

A dialogue box appears next.
Choose Delimited, then click Next.

In the next dialogue box, select how your data set is
delimited. In our case, the values are separated by
commas, so make sure that only the Comma box is
checked. Then click on Finish.
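
For readers who want to see what the delimited split does, here is a minimal Python sketch of the same idea. The pasted text, the column names, and the values are hypothetical placeholders, not the actual Realestate data.

```python
import csv
import io

# Hypothetical pasted text: one comma-delimited record per line,
# with the variable names in the first row.
pasted = "Price,Bedrooms,Pool\n150000,3,0\n210000,4,1\n"

# Split every line at the commas, as Text to Columns does.
rows = list(csv.reader(io.StringIO(pasted)))
header, records = rows[0], rows[1:]

# Collect the values of each variable into its own column.
columns = {name: [rec[i] for rec in records] for i, name in enumerate(header)}

print(header)
print(columns["Price"])
```

Each key of `columns` plays the role of one Excel column after the split.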

You will then see your data set separated into columns.
You may change the font size or any other formatting of this
data. Since you are going to use this specific data set in all
three phases of the project, save it under a
filename that you can remember, e.g. stat216project.
EXCEL PROJECT
Phase I
PHASE I



In this phase, you are expected to identify the
type and level of measurement of each variable
that you are dealing with.
In addition, depending on the kind of variable
that you have, what is the appropriate method of
data presentation for that variable?
Furthermore, what measures of location and
spread could you compute for these variables to
better describe your data set?

You may construct a table to help guide you on
what to do with your variables.
Example:

Variable   Type          Level of Measurement
Price      Numerical     Ratio
Color      Categorical   Nominal


Once you have identified the type and level of
measurement of each variable, what graphs or
tables could you use to describe categorical
variables? What about numerical variables?
Microsoft Excel has a Data Analysis ToolPak that
can help you produce graphs. On
your Data tab, you should see a button labeled
Data Analysis. If not, you need to install
this ToolPak.
INSTALLING DATA ANALYSIS TOOLPAK...

In Excel 2007, click on the Office Button at the
top, then choose Excel Options.

In the box that appears, click on Add-Ins.

You will then be directed to the Add-Ins menu.
At the bottom of this menu, select Excel Add-Ins from the
Manage drop-down list, then click on Go.

In the Add-Ins dialogue box that appears, check the box
for Analysis ToolPak, then click OK.

You will then see the Data Analysis button on
the Data tab.
GRAPHING VARIABLES…



Suppose you want to create a graph for a
variable. Let's say, for example, that your variable has
two categories: 1-Yes and 0-No.
For this variable, the first thing you need to do is
count the number of observations belonging to
each category.
Then select the type of graph that you want
to make.

Open the file containing your data set.
Suppose your data set contains a categorical
variable, say Pool (0-No, 1-Yes).

In this particular example, suppose our
observations for pool start at D2 and go up
to D31.
To graph a categorical variable, you must create
a “bin” that contains all the categories of your
variable.

Since we only have two categories for pool, we
create a bin that has two categories as
well, i.e. 0 and 1.


Once you have created the bin, click on the Data tab, then
click on the Data Analysis button. You will see a menu
listing the contents of the Analysis ToolPak.
Since our goal is to count the number of observations in
each category, choose Histogram, then click OK.

You will then be prompted to enter the Input
Range and the Bin Range.
The input range is the column containing
the observations for the variable.
The bin range is the column that contains the
categories of the variable.
In our example, the observations for pool run from
D2 to D31, while the bin runs from K3 to K4.

Once you click OK, a new worksheet is
created showing the counts for each category of
the variable.
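
The counting step can be illustrated with a short Python sketch. The `pool` values below are hypothetical; your own 30 observations will differ. The sketch also computes each category's share of the total, which is what a pie graph displays.

```python
from collections import Counter

# Hypothetical observations for pool (0-No, 1-Yes).
pool = [0, 1, 0, 0, 1, 0, 1, 0, 0, 0]

# The "bin": every category the variable can take.
bins = [0, 1]

# Count how many observations fall in each category.
counts = Counter(pool)
frequency = {b: counts[b] for b in bins}

# Share of each category out of all observations.
shares = {b: frequency[b] / len(pool) for b in bins}

print(frequency)
print(shares)
```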
On this worksheet, click on the Insert tab, then
choose the graph that you want.
For example, suppose we want a pie graph.

Click on Pie, then choose the type of pie graph that
you want. Excel will then display the pie graph.
COMPUTING SUMMARY STATISTICS…


Suppose, for example, you want to describe a variable using
some numerical descriptive measures. Let’s say our
variable is the price of a house, and that in our data set this
variable is in the first column.
Again, click on the Data tab, then the Data Analysis button. From
the menu, select Descriptive Statistics.

In the Input Range box, enter the range of the
variable that you want to compute statistics for.

If the first row contains the label of the variable,
check the box that says Labels in First Row.
Then check the box for Summary statistics, and click OK.

On a new worksheet, the values of some
numerical descriptive measures will be
displayed. Adjust the column width to see
the values clearly.
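
To show what some of these measures mean, here is a minimal Python sketch that computes a subset of them with the standard library. The `prices` list is a hypothetical example (in thousands of dollars), not your data.

```python
import math
import statistics

# Hypothetical house prices, in thousands of dollars.
prices = [100, 120, 140, 160, 180]

summary = {
    "Mean": statistics.mean(prices),
    "Median": statistics.median(prices),
    # Sample versions (divide by n - 1).
    "Standard Deviation": statistics.stdev(prices),
    "Sample Variance": statistics.variance(prices),
    "Standard Error": statistics.stdev(prices) / math.sqrt(len(prices)),
    "Range": max(prices) - min(prices),
    "Minimum": min(prices),
    "Maximum": max(prices),
    "Sum": sum(prices),
    "Count": len(prices),
}

for name, value in summary.items():
    print(name, value)
```

The mean and median are measures of location; the standard deviation, variance, and range are measures of spread.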
PHASE I WRITE-UP


Using all the graphs and computations that you
made for the variables, describe your data set.
You may or may not use all the variables in your
write-up, but you must briefly explain
why you decided to include a particular
variable in your project.
EXCEL PROJECT
Phase II
PHASE II


The second phase of the project focuses on
estimation and hypothesis testing.
In this phase, you are to compute point and
interval estimates for a specific variable of
interest and draw a conclusion based on the confidence
interval or the p-value of the test.

Suppose, for example, we go back to our data set
that has the variables price and pool.
We might be interested in the average price
of a house, or in the difference in the average price
of houses with and without a pool.
POINT ESTIMATION


If we are interested in just a point estimate for
the average of a specific variable, we can
use the Descriptive Statistics option under the
Data Analysis menu (see the previous slides for
instructions).
If we want a confidence interval instead, we
can use an Excel worksheet that we have
provided for you.
CONFIDENCE INTERVAL

We made an Excel spreadsheet that helps you
compute your confidence interval for the mean
easily.

The first worksheet is designed for a confidence interval
for one population mean. Just follow the instructions that are
written on the spreadsheet; the worksheet then displays your
confidence interval.
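
The formulas inside the provided worksheet are not shown here, so the following Python sketch is only an assumption about the usual approach: a t-based interval, mean plus or minus t times s divided by the square root of n. The function name `mean_confidence_interval`, the data, and the critical value are illustrative; the critical value is read from a t table for the chosen confidence level and n - 1 degrees of freedom.

```python
import math
import statistics


def mean_confidence_interval(values, t_critical):
    """Return (lower, upper) for a t-based confidence interval for the mean.

    t_critical is taken from a t table (assumption: the caller supplies it).
    """
    n = len(values)
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    margin = t_critical * standard_error
    return mean - margin, mean + margin


# Hypothetical prices in thousands of dollars.
prices = [100, 120, 140, 160, 180]

# 2.776 is the t-table value for 95% confidence with 4 degrees of freedom.
low, high = mean_confidence_interval(prices, t_critical=2.776)
print(round(low, 2), round(high, 2))
```

For a sample of 30, the 95% critical value with 29 degrees of freedom is about 2.045.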

If you are interested in a confidence interval for the
difference between two independent means,
use the second spreadsheet.

First, you need to sort the data set to separate
the values according to the category they
belong to.
For example, suppose we want a confidence interval for
the difference in the average price of homes with a pool
(pool=1) and without a pool (pool=0).
We need to sort the data set so that all
those with pool=0 are next to each other, and
those with pool=1 are also next to each other.
SORTING YOUR DATA SET


Select the entire data set (CTRL + A).
Click on the Data tab, then choose the Sort
button.

The Sort dialogue box will appear.
Since our data set has the variable names in the
first row, check the box indicating that the data has headers.

Then, from the Sort By drop-down menu, choose the
variable to sort on. In our
case, we use pool.
Once you have selected the appropriate variable, click on
OK.

You will then see your data set sorted
according to that variable.
All those with pool=0 are next to each other.
CONFIDENCE INTERVAL

Once you have your data set sorted, follow the
instructions in the worksheet, which then displays your
confidence interval.

Note that since our interest is the difference in
price between homes with and without a pool, what you
copy into the worksheet are the PRICES for
those with pool=0 under the 0 column, and the
PRICES for those with pool=1 under the 1
column.
You can use this confidence interval to
draw a conclusion as well.
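
The sort-then-copy step amounts to splitting the prices into two groups by pool value. A minimal Python sketch, with hypothetical (price, pool) rows:

```python
import statistics

# Hypothetical rows: (price in thousands of dollars, pool).
rows = [(150, 0), (210, 1), (175, 0), (260, 1), (160, 0)]

# Sort by pool so that equal pool values are next to each other.
rows_sorted = sorted(rows, key=lambda row: row[1])

# Prices that would go under the 0 column and the 1 column.
prices_no_pool = [price for price, pool in rows_sorted if pool == 0]
prices_pool = [price for price, pool in rows_sorted if pool == 1]

# Point estimate of the difference in average price.
difference = statistics.mean(prices_pool) - statistics.mean(prices_no_pool)

print(prices_no_pool, prices_pool, round(difference, 2))
```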
TEST OF HYPOTHESIS

There are several functions in the Data Analysis
ToolPak that you can use to conduct a test of
hypothesis. Choose the one that matches the test you are
going to conduct.

Suppose in our example, we want to know if
there is a difference in the average price of
houses with and without a pool.
From the Data Analysis menu, choose the two-sample test
for the difference between two means, then click OK.

Once you click OK, a dialogue box should
appear. In it:
Specify the range of values for prices with pool = 0.
Specify the range of values for prices with pool = 1.
Set the level of significance.

Suppose in our sample data set, the prices for
pool=0 run from A2 to A19, while those for pool=1
run from A20 to A31. We want to test the
hypothesis at the 5% level of significance.
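
To show what the test statistic measures, here is a minimal Python sketch of a two-sample t statistic. It assumes the unequal-variances (Welch) form; the test selected on the slide is not named here, so this is an illustration and may differ from the option you use in Excel. The function name `welch_t` and the data are hypothetical.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom), assuming unequal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    # Squared standard error of the difference in means.
    se_squared = var_a / n_a + var_b / n_b
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se_squared)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = se_squared ** 2 / (
        (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical prices for pool=0 and pool=1.
t_stat, dof = welch_t([10, 12, 14], [20, 22, 24])
print(round(t_stat, 4), round(dof, 2))
```

A large absolute value of the statistic corresponds to a small p-value.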

The output will be on a new worksheet. Adjust the
column widths to see the numbers clearly.
The output includes the value of the test statistic,
the p-value for the one-tailed test, and
the p-value for the two-tailed test.
PHASE II WRITE-UP


Your write-up for Phase II should include all your
estimates and the conclusions that you drew.
You must give supporting evidence for each
conclusion, i.e. specify the
p-value and explain why it led you to that
conclusion.
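
As a reminder of how a p-value or a confidence interval leads to a conclusion, here is a small Python sketch of the usual decision rules. The function names and the example numbers are illustrative assumptions.

```python
def decide(p_value, alpha=0.05):
    """Reject the null hypothesis when the p-value is below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


def interval_excludes_zero(lower, upper):
    """A confidence interval for a difference that excludes 0
    suggests that the two means differ."""
    return lower > 0 or upper < 0


# Hypothetical results.
print(decide(0.012))                        # below 0.05
print(decide(0.30))                         # above 0.05
print(interval_excludes_zero(12.5, 140.0))  # interval entirely above 0
```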
EXCEL PROJECT
Phase III
PHASE III



The final phase of the project is basically Phases I
and II combined, with some more information
that you could include in your project.
For example, by the time you turn in
Phase II, we will not yet have covered chi-square,
regression, and correlation analysis.
In your final phase, you might want to include
some of these analyses to give further meaning to
your data set.

For example, consider our data set containing the price of a
house. Which variables are associated
with price? Which variables could you
use to predict the price of a house?
These are just guide questions that could help
you analyze your data set further.
CORRELATION ANALYSIS


Suppose you want to determine the strength of
association between the price of a house and the
number of bedrooms.
In the Data Analysis menu, choose Correlation.

In the dialogue box, highlight the columns for
price and bedrooms as the input range. Also,
check the box for Labels in First Row.

You will see the output on a new worksheet.
It shows the correlation coefficient of the two
variables.
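
The correlation coefficient can be sketched in Python as follows. This computes the Pearson correlation from its definition; the function name `pearson_r` and the data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data: number of bedrooms and price in thousands of dollars.
bedrooms = [2, 3, 3, 4, 5]
prices = [120, 150, 160, 200, 240]

r = pearson_r(bedrooms, prices)
print(round(r, 4))
```

Values of r near 1 or -1 indicate a strong linear association; values near 0 indicate a weak one.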
REGRESSION ANALYSIS


Suppose you want to predict the price of a
house using, say, the number of bedrooms.
You can use the Regression option
from the Data Analysis ToolPak.

In the Regression dialogue box, specify the range of
values of the variable you want to predict (here, price)
and the range of values of the variable you are using to
predict it (here, number of bedrooms).

On a new worksheet, you will see the
regression output.
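
The slope and intercept reported in a simple regression can be sketched in Python with the least-squares formulas. The function name `fit_line` and the data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: number of bedrooms and price in thousands of dollars.
bedrooms = [2, 3, 3, 4, 5]
prices = [120, 150, 160, 200, 240]

slope, intercept = fit_line(bedrooms, prices)

# Predicted price for a hypothetical 4-bedroom house.
predicted = intercept + slope * 4
print(round(slope, 2), round(intercept, 2), round(predicted, 2))
```

The slope estimates the change in price for each additional bedroom.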
PHASE III WRITE-UP


In your final project write-up, you are expected to
write an executive summary about the entire
project.
You might want to include these sections in your
project in order to give your readers an
effective paper.

Introduction
What is your data set all about? What are the
variables? What questions did you intend
to answer in this project, and what
methods did you use to answer them?
Executive Summary
What are your findings? What are the answers to
the questions you raised earlier? What can you
conclude about your data set?
Appendix
A copy of your data set
Your references