Copyright © by Todd Easton & Stela Dhami 2007
Excel Skills for Bus. 500,
“Statistical and Quantitative Analysis”
by Todd Easton & Stela Dhami
5 February 2016
Please send suggested improvements to easton@up.edu .
Table of Contents
1 CREATING FREQUENCY DISTRIBUTIONS AND HISTOGRAMS
1.1 Absolute frequency distribution
1.1.1 Find the minimum, maximum, and the range
1.1.2 Establish the classes
1.1.3 Create the absolute frequency distribution table and histogram
1.2 Relative frequency distributions
1.2.1 Create the table
1.2.2 Create the chart
2 CREATING PIVOT TABLES
2.1 A two-way table with counts
2.2 Column percentages
2.3 Row percentages
2.4 Grouping rows together
3 FIGURING PROBABILITIES WITH EXCEL FUNCTIONS
3.1 BINOMDIST
3.2 NORMDIST
3.3 TDIST
4 PERFORMING LINEAR REGRESSION
4.1 Simple Linear Regression
4.2 Multiple Linear Regression
A format note: In what follows, italic script always indicates the labels on Excel commands that
you need to select or click on.
We wrote this document to help students grasp and review crucial Excel skills for Bus. 500.
Because this course is meant to be about applied statistics, not Excel, each Excel skill is usually
taught only once. If you have trouble learning a skill, or if you learn a skill and then forget it,
this handbook can be a valuable resource.
For most students, it will be best to learn (or review) these skills actively. To make that easier,
there is an Excel workbook to accompany this document. It contains the raw data you can use to
create the frequency distributions, pivot tables, and linear regressions in sections 1, 2, and 4 of
the document. You can find the workbook at teaching.up.edu/BUS500/ExcelHandbkData.xls.
The data in the first worksheet of the workbook are from the 2000 US Census. The wage income
data in Column E of this worksheet are used to create the tables and graphs in Section 1. The
race data in Column D, along with the county of residence data in Column F, are used to create
the pivot tables in Section 2. The data in the workbook’s second sheet represent a random
sample of the cars produced in the US in 1997. Section 4 uses these data to estimate linear
regressions predicting fuel efficiency.
1 Creating Frequency Distributions and Histograms
A frequency distribution is a table that describes the distribution of a numerical variable. For
example, a table might show how many people in a city earn low incomes compared to the
number earning high incomes. To do this, a frequency distribution table divides the range of
incomes in the city into a number of contiguous classes. For example, $0 to $9,999; $10,000
to $19,999, et cetera. The table provides counts of the number of people in the city that fall in
each income class. If a table provides counts of the number of people in each class, it is called
an absolute frequency distribution. If it provides the percentage of the population that falls in
each class, it is called a relative frequency distribution. If a bar graph is created from a frequency
distribution, with the bars’ heights illustrating the number of cases in each class (or the
percentage of all cases in each class), we call it a histogram.
To illustrate how to create frequency distributions and histograms, we use data from the 2000
Census for Portland, Oregon. These data are the result of sampling approximately 7.5% of the
records from Portland generated by the 2000 Census. The universe sampled was all people 16-65 in the densely populated portions of the Portland metropolitan area. Because the Census
truncates incomes at $200,000, the 23 people who reported incomes over $200,000 were
eliminated from the sample. The Census is a stratified random sample (not a simple random
sample), but that complication is ignored here.
1.1 Absolute frequency distribution
To create an absolute frequency distribution, along with the corresponding histogram, one
needs to take three steps:
1) find the minimum, maximum, and the range for a data set,
2) establish the classes the data set will be sorted into, and
3) use Excel’s Histogram procedure to create a table and a graph describing the data’s
distribution.
What follows explains how to take these steps and shows you what results each should produce.
1.1.1 Find the minimum, maximum, and the range
To create an absolute frequency distribution, open the Excel workbook where the data are
located (ExcelHandbookData.xls). Select the data of interest, including the data’s label. In the
Portland Census data, the variable of interest is WageInc (wage and salary income). After you
have done that, you should see something like the following on your screen:
To keep the resulting output simple, copy the column and paste it into the next sheet, where you
will perform the relative and absolute frequency distribution analysis. A quick way of doing this
is by clicking on cell E1, and then depressing the Control, Shift, and down-arrow keys
simultaneously. This will allow you to select the entire column of data E1:E2603. Copy the data
and paste into cell A1 of the next sheet.
Prior to building a histogram, you need some information to help build it: the minimum,
maximum, and range for the data. To have Excel calculate these descriptive statistics, click on
Tools, Data Analysis, and then on Descriptive Statistics. At that point, you should see this:
Now, click on OK. You should see a new dialog box:
Fill out the box in four steps:
1) Click on the small button with the little red arrow to the right of “Input Range”. Then,
select the input data (cells A1:A2603).
2) Click in the box to the left of “Labels in First Row,” to let Excel know you included a
label when you selected the data in Step 1).
3) Click in the circle to the left of “Output Range,” and then click on the tiny box to its
right. After that, click in an area of the spreadsheet where the descriptive statistics can be
displayed. Select one cell. That cell will become the upper-left-hand corner of the area
in which the descriptive statistics will be displayed.
4) Click in the box to the left of the label “Summary statistics.”
After doing these four things, you should see the following:
Now click on OK. Excel will calculate the descriptive statistics for the data you selected. You
can beautify this table in two steps.
1) Get rid of distracting decimals. Select the numbers Excel calculated, click on Format,
and then click on Cells. Now, select Number, and then set decimal places to zero.
2) Avoid excessive column width. First, select both the labels and numbers. Second,
click on Format, Column, and then on Autofit Selection. That should give you this
descriptive statistics table:
1.1.2 Establish the classes
At this point, you have enough information to establish the classes you will use for your table.
Generally, you should use between 5 and 15 classes. For small data sets, pick a number of
classes at the small end of that range. For large data sets, pick a number of classes at the large
end. The class width should be the same for all classes, and should be an easy-to-interpret
number (e.g. $1,000 would be better than $940).
To determine class width, take the range and divide it by the desired number of classes. In this
case, suppose you use 10 classes. That would give you a class width of $17,000. Using that
width, the following classes would make sense: incomes less than or equal to $0; incomes more
than $0, but less than or equal to $17,000; incomes more than $17,000, but less than or equal to
$34,000, et cetera.
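For readers who want to check the class-width arithmetic outside Excel, here is a minimal Python sketch. The minimum of $0 and maximum of $170,000 are assumed values, chosen only so that ten classes give the $17,000 width used above; they are not read from the workbook. The function name class_upper_limits is hypothetical.

```python
def class_upper_limits(lowest, class_width, n_classes):
    """Return the top value of each class, starting with `lowest` itself."""
    return [lowest + i * class_width for i in range(n_classes + 1)]

# Assumed values for illustration only; they are not read from the workbook.
minimum, maximum = 0, 170_000
n_classes = 10

class_width = (maximum - minimum) // n_classes   # range divided by the number of classes
upper_limits = class_upper_limits(minimum, class_width, n_classes)

print(class_width)          # 17000
print(upper_limits[:3])     # [0, 17000, 34000]
print(upper_limits[-1])     # 170000
```

In practice the raw quotient will rarely be a round number, so you would round it to an easy-to-interpret width before building the list.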
You communicate these classes to Excel using what Excel calls “bins.” Each bin is the top value
of its class, so the bins corresponding to the classes named above would be $0, $17,000, $34,000,
et cetera. Type the label, Bins, and the relevant bin values into a column on your spreadsheet, as
is demonstrated below:
1.1.3 Create the absolute frequency distribution table and histogram
Next, click on Tools, Data Analysis, Histogram, and then OK. The following box will appear:
Before Excel can create the table and histogram, the input and bin ranges must be selected.
Use the same method you used to select the data and output ranges for the Descriptive Statistics
Tool. After you have done that, you should see this:
Make sure that the “Chart Output” option is selected. This will direct Excel to create both a table
and a graph of the absolute frequencies of the incomes in the data.
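To make the counting rule concrete, the sketch below tallies absolute frequencies in Python. It assumes a value is counted in a bin when it is greater than the previous bin and less than or equal to the bin itself, which matches the class definitions in Section 1.1.2, and it collects anything above the last bin in a final "More" row. The incomes are made up for illustration, and the function name absolute_frequencies is hypothetical.

```python
from bisect import bisect_left

def absolute_frequencies(values, bins):
    """Count values per bin; a value goes in the first bin it does not exceed."""
    counts = [0] * (len(bins) + 1)        # the last slot holds values above every bin
    for v in values:
        counts[bisect_left(bins, v)] += 1
    return counts

# Made-up incomes, for illustration only.
incomes = [0, 0, 12_000, 17_000, 17_001, 30_000, 52_500, 180_000]
bins = [0, 17_000, 34_000, 51_000]

counts = absolute_frequencies(incomes, bins)
for upper, n in zip(bins + ["More"], counts):
    print(upper, n)
```

Note that an income of exactly $17,000 lands in the $17,000 bin, while $17,001 lands in the $34,000 bin.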
Click on “OK” and the following table and histogram should appear:
Left-clicking on the histogram and tugging its lower edge down will make it more legible.
Right-clicking on the “Frequency” legend, and then left-clicking on “Clear” will get rid of it.
Left-clicking on the chart title (Histogram), and then left-clicking again right after the letter “m”
will allow you to create a more descriptive title. At that point, you might see the following:
1.2 Relative frequency distributions
The graph above would certainly be useful, but what if we want a table or graph displaying
relative frequencies (percentages) rather than absolute frequencies (counts)? To do this, we
would need to:
1) create the table by calculating relative frequencies from the counts made by the
Histogram Tool, and
2) use Excel’s chart wizard to create a chart from these relative frequencies.
Below we describe how to create the table and the chart.
1.2.1 Create the table
We can compute the relative frequencies in three steps:
1) Drag the histogram to the right three columns (to create some room to work in).
2) Paste a copy of the Bins to the right of the absolute frequency distribution table.
3) Divide the first absolute frequency by the total (from the “Count” produced by the
Descriptive Statistics Tool), to get a relative frequency, placing the relative frequency to
the right of the first bin (the one having the value of zero).
After following these steps, we should see this:
To complete the relative frequency table:
1) Copy the formula down the column,
2) Type the word “Total” below the last bin and the words “Relative Frequency” just
above the column,
3) Sum the relative frequencies into the cell to the right of the word “Total,”
4) Select all the relative frequencies and the total, right-click on the selected area, left-click on Format Cells, left-click on Percentage, set “Decimal places” to one, and left-click on OK.
That should leave us with the
following:
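The same calculation can be sketched in Python as a cross-check: each relative frequency is an absolute frequency divided by the total count. The counts below are made up for illustration, and the function name relative_frequencies is hypothetical.

```python
def relative_frequencies(counts):
    """Divide each absolute frequency by the total number of cases."""
    total = sum(counts)
    return [c / total for c in counts]

# Made-up absolute frequencies, for illustration only.
counts = [2, 2, 2, 0, 2]
rel = relative_frequencies(counts)

for r in rel:
    print(f"{r:.1%}")              # one decimal place, as in the Excel formatting step
print(f"Total {sum(rel):.1%}")
```

As in the Excel table, the relative frequencies should sum to 100%.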
1.2.2 Create the chart
After calculating the relative frequencies, you can create a graph of them as follows.
1) Select the relative frequency data (but not “100%”).
2) Click on the Chart Wizard icon. (If you have the Standard Toolbar installed, you will
see its icon toward the right end of the toolbar, labeled with a miniature bar graph.)*
Next, choose the first “Column” option.
3) Click Next.
4) Go to the Series tab, click on the small button with the little red arrow to the right of
“Category (X) axis labels,” select the numbers “0” through “170000,” and then click
again on the small button.
At that point, you should see the following:
* If you do not see the icon, click on Insert and then on Chart.
Then left-click on Finish. Pretty the graph up and you will see something like this:
Congratulations! You have now created tables and charts for both absolute and relative
frequency distributions!
2 Creating Pivot Tables
Excel’s PivotTable tool quickly creates what are variously called pivot tables, two-way tables,
contingency tables, or crosstabulation tables. A pivot table allows you to explore the relationship
between two categorical variables.†
To illustrate the use of pivot tables, we use the same data set used previously to create frequency
distributions of incomes in Portland, Oregon. This time we use the data to explore the ethnic
composition of the three metropolitan counties.
The following instructions will show you how to create a simple pivot table, and then modify it
in three useful ways. The first table summarizes the number of people in each ethnic group in
each county.
An explanation of each variable in the Portland dataset
Variable     Explanation
PUMA         Public Use Microdata Area--a metropolitan area subset, contains about 100,000 people
Age          Person's age in years
Sex          Equal to 1 for males & 2 for females
RaceG        Designates which general racial/ethnic group the person belongs to
               1 White
               2 African American
               3 American Indian
               4 Chinese
               5 Japanese
               6 Other Asian or Pacific Islander
               7 Other race, n.e.c.
               8 Two major races
               9 Three or more major races
WageInc      Wage and salary income reported for 1999
MetCounty    The county the person lives in
               1 Multnomah
               2 Clackamas
               3 Washington
† Through the use of Page Fields, one can use the PivotTable tool to explore the relationship between three or four
categorical variables, but this document does not introduce the use of Page Fields.
2.1 A two-way table with counts
To create a pivot table, the first step is to select all the data you will include in the table. Begin
with the Ethnicity by County worksheet. Select all data in columns A and B. Click on Data,
and then on PivotTable and PivotChart Report. The PivotTable and PivotChart Wizard will
open:
Leaving the defaults in place, click Next. Excel will automatically use the data you selected as
the range to be entered into the PivotTable tool, so you should see the following:
Click Next, and Finish, and you will see this:
This screen allows you to quickly select the variables that will be included in your pivot table.
Beginning in the PivotTable Field List box, drag the variable label RaceG and drop it in the
“Drop Row Fields Here” box. Do the same with MetCounty, but drop it in where you see, “Drop
Column Fields Here.” That should get you this:
At this point you have established the two variables that will be used to crosstabulate the cases in
your sample. To actually have Excel do the crosstabulation, you need to drop a variable label in
the center of the table, where it says, “Drop Data Items Here.” It does not matter which variable
you drop, but suppose you select MetCounty. After dropping it in the center of the table, you will
likely see:
Notice how the text in cell A3 says, “Sum of MetCounty.” Excel’s default is to sum the codes for the
variable that was dropped in the center of the table. That is fine if the variable measures the
dollar cost of something, and you want to display the total. However, we wish to count cases,
not sum variable values. To replace sums with counts, double left-click on the “Sum of
MetCounty” label. A new dialog box should open up:
In the “Summarize By” box, click on “Count,” and then on OK. You should see this:
Note that the values in the column labeled 2 (for MetCounty 2) are half as big as before, and that
the values in the column labeled 3 are one-third as big as before.
To make the table easier to interpret, replace the numerical variable codes with their
corresponding labels. To do this, type each label into its corresponding box. Doing that will get
you this final pivot table, showing how the number from each ethnic group in the sample differs
among the three counties.
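The crosstabulation Excel performs here amounts to counting how many cases share each (RaceG, MetCounty) pair. The Python sketch below illustrates that idea; the eight records are made up, and only three of the nine race codes are shown to keep the output short.

```python
from collections import Counter

# Labels for a subset of the codes listed in the variable table.
RACE_LABELS = {1: "White", 2: "African American", 4: "Chinese"}
COUNTY_LABELS = {1: "Multnomah", 2: "Clackamas", 3: "Washington"}

# Made-up (RaceG, MetCounty) pairs, for illustration only.
records = [(1, 1), (1, 1), (1, 2), (2, 1), (4, 3), (4, 3), (4, 1), (1, 3)]

table = Counter(records)   # number of cases for each (race, county) pair

print("Race", list(COUNTY_LABELS.values()))
for race, race_label in RACE_LABELS.items():
    row = [table[(race, county)] for county in COUNTY_LABELS]
    print(race_label, row)
```

Counting pairs, rather than summing the county codes, is what the switch from “Sum” to “Count” accomplishes: summing would double every entry in the column for county code 2 and triple every entry for county code 3.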
2.2 Column percentages
For each county, the simple pivot table we just completed tallied the number of people in the
sample in each ethnic category. What if your goal was to compare the ethnic composition
of the three counties? This count table would not be an ideal tool, because the total number of
people differs greatly among the counties. To make the table more helpful, you could have
Excel calculate column percentages.
To do that, double left-click on “Count of MetCounty,” to get the following:
Next, click on Options, and then find “% of column” in the “Show data as” window. Now you should see this:
Finally, click on OK to see the following:
2.3 Row percentages
The table we just finished allows us to compare the ethnic compositions of the three counties in
our sample. For example, examine the first row. It tells us that 92% of the sample individuals
from Clackamas County are white, while only 81% of the Multnomah County sample is white.
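A column percentage is simply a cell count divided by its column total. The Python sketch below shows the calculation on made-up counts (three rows, three counties); it assumes every column has at least one case, and the function name column_percentages is hypothetical.

```python
def column_percentages(table):
    """Divide each count by its column total, so every column sums to 1."""
    n_cols = len(next(iter(table.values())))
    col_totals = [sum(row[j] for row in table.values()) for j in range(n_cols)]
    return {
        label: [row[j] / col_totals[j] for j in range(n_cols)]
        for label, row in table.items()
    }

# Made-up counts; columns are Multnomah, Clackamas, Washington.
counts = {"White": [80, 45, 60], "Chinese": [5, 1, 6], "Other": [15, 4, 14]}

col_pct = column_percentages(counts)
for label, row in col_pct.items():
    print(label, [f"{p:.0%}" for p in row])
```

Because each column is scaled by its own total, counties of very different sizes can be compared directly.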
What if you were interested in opening a market in Portland that catered to Chinese people, so
that you wanted to find where most Chinese people lived? If that were your goal, you would
want to calculate row percentages rather than column percentages. Luckily, it is easy to get
Excel to make the switch.
First, you would double click on “Count of MetCounty.” Next, you would click on Options, and
then find “% of row” in the “Show data as” window. Finally, you would click on OK. Having
done all of that, you would see the following:
Notice that the row to the right of the “Chinese” label tells us that 48% of the Chinese people in
the sample live in Washington County. Maybe you should investigate locations within
Washington County for your market!
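Row percentages work the same way as column percentages, except that each count is divided by its row total. A minimal Python sketch, again on made-up counts and with a hypothetical function name:

```python
def row_percentages(table):
    """Divide each count by its row total, so every row sums to 1."""
    return {label: [c / sum(row) for c in row] for label, row in table.items()}

# Made-up counts; columns are Multnomah, Clackamas, Washington.
counts = {"White": [80, 45, 60], "Chinese": [5, 1, 6], "Other": [15, 4, 14]}

row_pct = row_percentages(counts)
for label, row in row_pct.items():
    print(label, [f"{p:.0%}" for p in row])
```

In these made-up data, half of the Chinese row falls in the third column, which is the kind of comparison the row-percentage table supports.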
2.4 Grouping rows together
When you create a pivot table, it will often provide more detail than you think is necessary. If
you want a reader to quickly see the point of a table, show only the necessary information. For
example, suppose you decided that your market should cater to all Asians, and also to Pacific
Islanders. It might be useful, in that case, to combine three groups (“Chinese,” “Japanese,” and
“Other Asian or Pacific Islander”) into one.‡
‡ To see labels for these three groups, look in rows 8, 9, and 10 of the illustration above.
To do this, begin by highlighting the three labels you wish to combine. After that, right click the
selected area with your mouse, select Group and Show Detail, and select Group. That should
give you this:
If you now left click, these dialog boxes will close. They may be replaced by a message that
begins, “Do you want to replace the contents of the destination cells in…” If you get that
message, click on No. That will leave you with the following:
To see row percentages combined for “Group 1”, rather than for each of its ethnic groups
separately, right click in the rectangle labeled “Group 1.” Then, select Group and Show Detail,
then Hide Detail. You should see a table that looks like this:
To neaten things up, delete “Group 1” and type the label “Asian and Pacific Islander” in its
place. Next, right-click on “B” at the top of the second column, and then left-click on Hide. Your
final pivot table should look like the following:
The fourth row of the table suggests your Pan-Asian market should be in Multnomah County!
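Grouping rows amounts to adding the counts of the member rows together, column by column, before any percentages are calculated. The Python sketch below illustrates this on made-up counts; the function name group_rows is hypothetical.

```python
def group_rows(table, members, new_label):
    """Merge the rows named in `members` into a single row called `new_label`."""
    n_cols = len(next(iter(table.values())))
    merged = [sum(table[m][j] for m in members) for j in range(n_cols)]
    grouped = {label: row for label, row in table.items() if label not in members}
    grouped[new_label] = merged
    return grouped

# Made-up counts; columns are Multnomah, Clackamas, Washington.
counts = {
    "White": [80, 45, 60],
    "Chinese": [5, 1, 6],
    "Japanese": [2, 1, 2],
    "Other Asian or Pacific Islander": [9, 2, 7],
}

grouped = group_rows(
    counts,
    ["Chinese", "Japanese", "Other Asian or Pacific Islander"],
    "Asian and Pacific Islander",
)
print(grouped)
```

Row percentages for the combined group would then be computed from the merged counts, not by adding the separate groups' percentages.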
3 Figuring Probabilities With Excel Functions
We will use three Excel functions to calculate probabilities: BINOMDIST, NORMDIST, and
TDIST. BINOMDIST figures probabilities using the binomial distribution, NORMDIST figures
probabilities using the normal distribution, and TDIST figures probabilities using the t
distribution.
3.1 BINOMDIST
The binomial distribution is a discrete probability distribution. Our text suggests we think of a
binomial distribution as being generated by a sampling process with the following
characteristics:
• The sample consists of a fixed number of observations, n
• Each observation is classified into success or failure
• The probability that an observation is classified as a success is p, while the probability it
is classified as a failure is 1-p
• Each observation is randomly selected either from an infinite population without
replacement or from a finite population with replacement
Here is an example of a situation you could describe using a binomial distribution: you wish to
analyze reports of phone problems to the call center of a telephone company. In particular, you
are interested in the first three calls of the day, and in the likelihood that the company
successfully resolves exactly two of those problems before the day is over.
Suppose this situation can be analyzed with the binomial distribution. That is another way of
saying that:
a) we have three observations (the first call, the second call, and the third),
b) we can classify each observation as a success or a failure (the company repairs the
phone by the end of the day or it does not), and
c) the probability of success (the chance of repairing a phone by day’s end) is the same
for each observation.
To be concrete, suppose .7 is the probability of success. Given that, we could use the Excel
function BINOMDIST to figure this probability as follows:
P(x = 2) = BINOMDIST(2,3,0.7,FALSE) = 0.441
The probability that exactly two of the first three
troubled phones will be repaired is .441.
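To see where .441 comes from, the binomial probability can be computed directly from its formula, C(n, x) p^x (1-p)^(n-x). The Python sketch below is only a cross-check of the Excel result; the function name binomial_pmf is hypothetical.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n observations, each with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Exactly two of the first three phones repaired, with a .7 chance of repair.
prob = binomial_pmf(2, 3, 0.7)
print(round(prob, 3))   # 0.441
```

Here there are 3 ways to choose which two calls succeed, each with probability .7 × .7 × .3 = .147, giving 3 × .147 = .441.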
One can use the function by typing the necessary text, =BINOMDIST(2,3,.7,FALSE), into a cell
and pressing the Enter key.§ Doing so will cause .441 to be displayed. If you type the same text,
preceded by a single quotation mark ('=BINOMDIST(2,3,.7,FALSE)), Excel will display the text
itself. The single quotation mark tells Excel to treat what follows it as text, rather than as a
formula to be calculated.
One can also invoke BINOMDIST by using the function wizard. To do this, click on Insert and
then on Function. Then, select “Statistical” in the second window of the Insert Function dialog
box. Finally, select “BINOMDIST” in the third window of the dialog box.
At this point, you should see something like this:
§ This notation means that Excel is calculating the probability that 2 out of 3 observations are successes, when the
probability of success is .7. The FALSE tells Excel to calculate the probability of exactly two successes.
Next, click on OK and fill in the blanks in the Insert Function dialog box, using the same
values used above (2, 3, .7, FALSE). You should see this:
If you then click OK, you can see the answer is the same as above: .441. (The text of the formula
in cell B18 is reproduced in C18 to make it visible.)
Let us look at one more example. If we switch the last argument of the BINOMDIST function to
TRUE, then it calculates the probability using a cumulative distribution function. Instead of
calculating the probability of exactly x successes, it calculates the probability of x or fewer
successes. Here is the example:
P(x ≤ 2) = BINOMDIST(2,3,0.7,TRUE) = 0.657
The probability that two or fewer phones (of the first
three) will be repaired is .657.
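The cumulative probability is the sum of the exact probabilities for 0, 1, and 2 successes. The Python sketch below cross-checks the .657 figure; the function name binomial_cdf is hypothetical.

```python
from math import comb

def binomial_cdf(x, n, p):
    """Probability of x or fewer successes in n observations."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))

# Two or fewer of the first three phones repaired, with a .7 chance of repair.
prob = binomial_cdf(2, 3, 0.7)
print(round(prob, 3))   # 0.657
```

The same answer follows from the complement: the only excluded outcome is three successes, which has probability .7³ = .343, and 1 - .343 = .657.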
3.2 NORMDIST
NORMDIST finds (for a particular normal distribution) the probability that a normally
distributed random variable takes on a value less than its first argument.
The second argument of NORMDIST is the mean of the particular normal distribution being
evaluated.
The third argument is the standard deviation of the particular normal distribution being
evaluated.
The fourth argument should be TRUE (or 1) if you want to find the probability that a normally
distributed random variable is less than the first argument.**
As an example, consider the following problem from Levine, Stephan, Krehbiel, and Berenson
(our text). Suppose the fees mutual funds charge are normally distributed. In a particular year,
the mean fee for a fund was .93% of the value of a fund’s assets. The standard deviation of the
fees in that year was .30%.
Suppose we select a fund at random. To find the probability that the fund selected charged less
than 1% of the value of its assets, we type the following text into a cell:
=NORMDIST(1,.93,.3,1). Pressing the Enter key will give the desired result (.592).
mean = 0.93%
standard deviation = 0.30%
a) NORMDIST(1,0.93,0.3,1) = 0.592 is the probability that average expense fees are less than 1%.
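As a cross-check outside Excel, Python's standard library provides a normal distribution object whose cdf method plays the role of NORMDIST with a fourth argument of TRUE, and whose pdf method plays the role of NORMDIST with a fourth argument of FALSE. The variable names below are our own.

```python
from statistics import NormalDist

# Fees (in percent of assets) with the mean and standard deviation given in the text.
fees = NormalDist(mu=0.93, sigma=0.30)

prob_below_1 = fees.cdf(1.0)    # cumulative probability, like NORMDIST(1,.93,.3,TRUE)
density_at_1 = fees.pdf(1.0)    # height of the density curve, like NORMDIST(1,.93,.3,FALSE)

print(round(prob_below_1, 3))
```

The density value is not a probability; it is only the height of the curve at 1, which is why the cumulative form is the one to use for questions such as "less than 1%."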
One can also invoke the NORMDIST function using the function wizard. To do this, click on
Insert and then on Function. Then, select “Statistical” in the second window of the Insert
Function dialog box. Finally, select “NORMDIST” in the third window of the dialog box, and
click on OK.
At this point, you should see the following:
** If you make it FALSE (or 0), then it will give you the value of the normal probability density function at X, the
height of the function above the X-axis.
3.3 TDIST
We use the t distribution when we are testing a hypothesis about a population mean and we do
not know the standard deviation of the population. In particular, we use it when we utilize the
p-value approach to testing the hypothesis.
When we know the population standard deviation, we implement the p-value approach with the
standard normal distribution. However, if we must estimate the sample mean’s standard
deviation, the standard normal distribution will not give us an accurate p-value. This is because
it fails to account for the additional uncertainty introduced into the hypothesis testing process
when we must utilize an estimate of the population standard deviation. We use the t distribution,
rather than the standard normal distribution, to take into account this additional uncertainty.
For example, suppose we have a sample of 49 two-liter bottles from a soft drink bottler’s
production line. Our null hypothesis is that the population mean amount of soda in a bottle is 2
liters. For our 49-bottle sample, suppose the mean content is 2.01 liters, with a standard
deviation of .114 liters.
To calculate the p-value to test this null hypothesis, we figure out how likely it is that we get a
sample mean as extreme as—or more extreme than—2.01 liters if the null hypothesis is true.
Since we are working with the t-distribution, rather than the standard normal distribution, we
cannot find the relevant probability in a table. The standard normal table gives us probabilities
corresponding to a huge number of possible values for z. The table of "Critical Values of t" can't
do the same, because there is not just one t-distribution. There is one t-distribution for each
possible value the degrees of freedom can take on.
Luckily, Excel's TDIST function allows us to easily find the probability we need. We only need
to provide it with 3 arguments: the test statistic, the degrees of freedom, and the number of tails
in our test. Suppose we wish to use the function wizard to access TDIST. We begin by choosing
the relevant function using the Insert menu:
Then, we click on OK and fill in the relevant arguments:††
Finally, we click on OK to see the following:
TDIST is telling us that, if the null hypothesis were true, there would be little surprise in seeing a
sample mean of 2.01 liters from a sample of 49 bottles. To be more precise, it tells us there
would be a 54% chance we would get a sample mean .01 liters or more from 2 liters, if the true
population mean was 2 liters. Since that’s a pretty high probability, larger than any typical
alpha, this sample gives us little reason to reject our null hypothesis.
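The arithmetic behind this result can be sketched in Python. The sketch assumes the usual one-sample test statistic, t = (sample mean - hypothesized mean) / (s / √n), with n - 1 = 48 degrees of freedom and a two-tailed test. Because Python's standard library has no t-distribution function, the tail area is approximated by numerically integrating the t density with Simpson's rule; the function names t_pdf and two_tailed_p are hypothetical.

```python
from math import exp, lgamma, log, pi, sqrt

def t_pdf(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + u * u / df))

def two_tailed_p(t_stat, df, steps=1000):
    """Two-tailed p-value, using Simpson's rule on [0, |t_stat|]; steps must be even."""
    b = abs(t_stat)
    h = b / steps
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3            # P(0 < T < |t_stat|)
    return 2 * (0.5 - area)         # both tails beyond |t_stat|

# Figures from the bottling example in the text.
n, sample_mean, sample_sd, null_mean = 49, 2.01, 0.114, 2.0

t_stat = (sample_mean - null_mean) / (sample_sd / sqrt(n))
p_value = two_tailed_p(t_stat, n - 1)

print(round(t_stat, 3))
print(round(p_value, 2))
```

The test statistic works out to roughly 0.614, and the resulting p-value is close to the 54% reported by TDIST.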
†† We switch to Excel 2007 for the next two screenshots. They were inserted to correct an error in the previous
edition of the handbook. Though the appearance of the program has changed, TDIST is exactly the same.
4 Performing Linear Regression
Linear regression involves fitting a line, or a plane, to a data set. This can allow us to seek
evidence for a relationship between a dependent variable of interest and one or more independent
variables. If the relationship we want to investigate is between a single independent variable and
a dependent variable, then we use simple linear regression. Excel can perform a simple linear
regression in two ways: with the Chart Wizard and with the Regression Tool on the Data Analysis
Menu. If the relationship of interest is between two or more independent variables and a
dependent variable, then we use multiple linear regression. Excel performs multiple linear
regression only with the Regression Tool.
The sections that follow work through one example of simple regression and one example of
multiple regression.
4.1 Simple Linear Regression
Suppose we suspect a relationship exists between the weight of a car (independent variable) and
its fuel efficiency (dependent variable). One way to gather evidence to support (or undercut) the
existence of this relationship in a population would be to collect a random sample of cars from
that population, recording the weight and fuel efficiency of each one.
Suppose you did this, selecting your sample from the population of all cars produced in 1997.
To get a visual sense of the strength of this relationship, and to see if it is positive or negative,
you could use Excel to make a scatter plot of the two variables. Here’s the data set you
collected:
To make the scatter plot, you should select the Weight and MPG columns, including their labels.
Click on the Chart Wizard icon and then on XY (Scatter). Select the first chart sub-type (a scatter
diagram with no lines), and you should see a dialog box that looks like this:
If you then click on Next >, you should see the following:
Now take the following four steps: click on Next > again, enter the chart title, enter the label for the
X-axis, and enter the label for the Y-axis. That will leave you with the following:
Now, click on Next > one final time. Neatening the resulting graph up will leave you with this:
To have Excel perform a simple linear regression, we can work with this graph a bit more.
Begin by clicking on Chart to see this:
Click on Add Trendline. That should leave you with this:
To have an unobstructed view of the graph, just click on OK.
Now, suppose you want to know the equation of the line Excel just fit to your data. To display
this equation, along with its R², right-click the trendline and select Format Trendline. In the
dialog box that appears, click on the Options tab. Now, choose Show Equation and R². After
that, you should see the following:
If you click on OK, that should get you the equation for the linear regression you desired:
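A linear trendline of this kind is a least-squares line. The Python sketch below shows how its slope, intercept, and R² can be computed from the data. The five weight and MPG values are made up for illustration and are not the 1997 car sample; the function name fit_line is hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept, r_squared) of the least-squares line through the points."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy**2 / (sxx * syy)
    return slope, intercept, r_squared

# Made-up weights (pounds) and fuel efficiencies (MPG), for illustration only.
weights = [2000, 2500, 3000, 3500, 4000]
mpg = [36, 31, 27, 24, 20]

slope, intercept, r_squared = fit_line(weights, mpg)
print(f"MPG = {slope:.4f} * Weight + {intercept:.1f}, R squared = {r_squared:.3f}")
```

A negative slope, as in these made-up data, would indicate that heavier cars tend to have lower fuel efficiency.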
4.2 Multiple Linear Regression
You just analyzed the relationship between fuel efficiency and weight. In this model, fuel
efficiency is the dependent variable. The model supposes that fuel efficiency depends on
weight.
What if you think one variable depends on two (or more) other variables? In that case, you can
use multiple regression analysis to describe the sample relationship between the variables, or to
test for the existence of a population relationship.
For example, suppose you think that fuel efficiency depends not just on a car’s weight, but also
on the power of its engine. In that case, you might wish to estimate a multiple regression with
fuel efficiency as the dependent variable, using both weight and engine horsepower as
independent variables. To do this, first paste the data into a new worksheet. Now, select Tools,
Data Analysis, and then Regression. At that point, you should see the following:
To continue, select OK. When the Regression dialog box appears, enter the range for your
dependent variable (MPG) in the box labeled “Input Y Range.” (To do this, use the same
technique described in the section above titled “Find the minimum, maximum, and the range.”)
Then, enter the ranges for your independent variables (Weight and Hpower) in the box labeled
“Input X Range.” Select Labels, Residual Plots, and Normal Probability Plots. Select the
output range (as you did above in the “Find the minimum, maximum, and the range” section).
At this point you should see this:
Clicking on OK will get Excel to perform the multiple linear regression. In addition, doing this
will provide you with three plots that can be used to see how appropriate it is to use linear
regression to analyze this data set.
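For readers curious about what a multiple regression computes, the Python sketch below fits MPG = b0 + b1·Weight + b2·Hpower by least squares, solving the normal equations with Gaussian elimination. It is a simplified illustration, not a description of how Excel's Regression Tool works internally. The data are made up (weight in thousands of pounds, horsepower in hundreds) and are constructed so that MPG = 50 - 6·Weight - 2·Hpower exactly; the function names solve and fit_plane are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]        # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):                          # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_plane(x1, x2, y):
    """Return [b0, b1, b2] minimizing the squared error of y = b0 + b1*x1 + b2*x2."""
    rows = [[1.0, u, v] for u, v in zip(x1, x2)]          # constant column, then predictors
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve(xtx, xty)

# Made-up data: weight (thousands of pounds) and horsepower (hundreds).
weight = [2.0, 2.5, 3.0, 3.5, 4.0]
hpower = [0.9, 1.4, 1.1, 2.0, 1.8]
# MPG constructed to follow the plane exactly, so the fit should recover 50, -6, -2.
mpg = [50 - 6 * w - 2 * h for w, h in zip(weight, hpower)]

coeffs = fit_plane(weight, hpower, mpg)
print([round(c, 4) for c in coeffs])
```

Real data will not lie exactly on a plane, which is why the residual and normal probability plots are worth examining.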
Congratulations on completing this Excel handbook! At this point, you should be able to create
frequency distributions and their corresponding histograms, use three valuable statistical
functions (BINOMDIST, NORMDIST, and TDIST), make two-way pivot tables, and perform
both simple and multiple regressions.