BENEDICTINE UNIVERSITY

Course Outline
MGT 251
STATISTICS II
Spring, 2015
Text:
Modern Business Statistics with Microsoft Office Excel, 5th edition, Anderson, Sweeney & Williams,
South-Western/Cengage, 2015.
ISBN: 978-1-285-43330-1 (hard cover)
Other: Aplia interactive learning/assignment system.
TI-83 or TI-84 calculator.
Course Prerequisites: MATH 105 (Finite Math I) or MATH 110 (College Algebra)
Instructor: Jeffrey M. Madura, 160 Scholl Hall
B.A., University of Notre Dame
M.B.A., Northwestern University
C.P.A., State of Illinois
Office Hours: Announced in class or see web page at
www.ben.edu/faculty/jmadura/home.htm
e-mail: [email protected]
Course Description: This is a course in introductory statistics. The orientation is toward applications and
problem-solving, not mathematical theory. The instructor intends that students gain an appreciation for the
usefulness of statistical methods in analyzing data commonly encountered in business and the social and
natural sciences. The course is a framework within which students may learn the subject matter. This
framework consists of a program of study, opportunity for questions/discussion, explanation, and evaluation
(quizzes). The major topics are:
Inferences About Population Variances
Tests of Goodness of Fit and Independence
Experimental Design and Analysis of Variance
Simple Linear Regression
Multiple Regression
Nonparametric Methods
The course addresses the following College of Business Program Objectives:
Students in this program will receive a thorough grounding in Mathematics and Statistics.
Your student evaluation of this course will be completed online using the IDEA system. This course emphasizes the
following IDEA objectives:
Learning fundamental principles, generalizations, or theories.
Learning to apply course material to improve thinking, problem-solving, and decision-making.
Developing specific skills, competencies and points of view needed by professionals in the fields most closely
related to this course.
Quizzes and Grades: The course is divided into five three-week parts, with a quiz at the end
of each part. Dates are subject to change.
Quiz 1 Feb. 5
Quiz 2 Feb. 26
Quiz 3 Mar. 26
Quiz 4 Apr. 16
Quiz 5 Finals Week
Quizzes will constitute 2/3 of your grade.
The other 1/3 will be your score on assignments.
Class participation may also be a factor.
Grade requirements: A--90%, B--80%, C--60%, D--50%.
There may also be other assignments requiring analysis of data using Excel, and there may be a term project,
with weight equal to one quiz.
It is the responsibility of any student who is
unsure of the grading scale, course requirements, or anything else in this course outline to ask the instructor
for clarification.
Homework Assignments: There will be about 10 Aplia homework assignments. Due dates are listed in the Aplia
system.
The assignments will constitute 1/3 of the course grade. To accommodate the occasional instance when you cannot
meet an Aplia deadline, the lowest assignment will be dropped. Assignments will be handled by Aplia. You will be
required to access the Aplia website, which means you need to register for an account at: http://www.aplia.com
Please register within 24 hours of the first class meeting.
Please note: The computer is absolutely unforgiving about accepting late assignments. Time is kept at Aplia, and
not by the computer you are working on. You may appeal grading decisions made by the computer, if you can
demonstrate that an error has been made.
Faculty have observed that the worst thing students do, in any course, is not think about course material every day.
They sometimes let weeks go by and then try to learn all the material in one or two days. This usually does not
work. The weekly assignments will require keeping up-to-date.
Calculators: Calculators will be required for the computational portion of each quiz. Bring your calculator to every
class and verify each computation performed. The TI-83 is the standard for this course.
Recommended Exercises: Students should work as many as possible of the even-numbered exercises in the
text. Proficiency gained from practice on these will help when similar problems appear on quizzes. Answers
to even-numbered exercises are at the back of the text.
Attendance: Attendance will be taken occasionally and randomly. Frequent absences will be noticed, and they will
have an adverse impact on quiz performance and your final grade. Two or more absences on days when
quizzes are handed back will lower your grade by one letter grade.
Missed Quizzes: Make-up quizzes will be given only if a quiz was missed for a good and documented reason. If a
make-up is given, the quiz score will be reduced 20% in an effort to maintain some degree of fairness
to those who took the quiz at the proper time.
Use of Class Time: Come to class prepared to discuss the material assigned, and to contribute to the solution of the
assigned problems.
Special Needs: If you have a documented learning, psychological, or physical disability, you may be eligible for
reasonable academic accommodations or services. To request accommodations or services, please contact
Tina in the Student Success Center, 012 Krasa Student Center, 630-829-6512. All students are expected to
fulfill essential course requirements. The University will not waive any essential skill or requirement of a
course or degree program.
Academic Honesty Policy: The search for truth and the dissemination of knowledge are the central mission of a
university. Benedictine University pursues these missions in an environment guided by our Roman Catholic
tradition and our Benedictine heritage. Integrity and honesty are therefore expected of all members of the
community, including students, faculty members, administration, and staff. Actions such as cheating,
plagiarism, collusion, fabrication, forgery, falsification, destructions, multiple submission, solicitation, and
misrepresentation, are violations of these expectations and constitute unacceptable behavior in the University
community. The penalties for such actions can range from a private verbal warning, all the way to expulsion
from the University. The University’s Academic Honesty Policy is available at http://www.ben.edu/AHP.
In this course, academic honesty is expected of all class participants, myself included. If your name is on the
work submitted, it is expected that you alone did the work. For example, in terms of quizzes, this means that
copying from another paper, unauthorized collaboration of any sort, or the use of “cribs” of any kind is a
breach of academic honesty. The penalties for a breach of academic honesty in this course are (1) a zero for
the assignment or quiz for the first offense, and (2) an “F” for the course for a subsequent offense by the same
person(s).
Electronic Devices Policy: One aspect of being a member of a community of scholars is to show respect for
others by the way you behave. Do your part to create or maintain an environment that is conducive to
learning. Turn off your cell phone or set it to mute/silence before you enter class. If you use your cell phone
or any other electronic device in any manner during a quiz, you will receive a zero for that test or quiz. Using
the TI-83/84 calculator is permitted.
Feel free to see me if there is anything else of concern to you. Your comments about this course or any course
are always welcome and appreciated. The student is responsible for the information in the syllabus and should
ask for clarification for anything in the syllabus about which they are unsure.
COURSE PHILOSOPHY -- STATISTICS
In an article in the Chronicle of Higher Education, Sharon Rubin, assistant dean at the University of Maryland,
states that all course syllabi, in addition to providing the basic information on texts, topics, schedule, etc., should
answer certain questions. The instructor of this course would like to share these questions with you, and provide
some answers.
You are what you know. You are what you can do.
"What value can you add to our organization?"
1. WHY SHOULD A STUDENT WANT TO TAKE THIS COURSE?
As a decision-maker, you must learn how to analyze and interpret quantitative information. Such skills will
improve your ability to adopt the questioning attitude and independence of thought that are essential to
leadership and success in any field. You may also have the opportunity to introduce statistical data analyses
in areas where they are not currently in use, thus improving the quality of your organization's decisions.
2. WHAT IS THE RELEVANCE OF THIS COURSE TO THE DISCIPLINE?
Statistics courses are part of the curriculum in many of BU's programs. But since this course is part of a
program leading to a degree in business, let us interpret the word "discipline" in this question to mean
"management." This can refer to marketing management, financial management, human resource
management, etc., even the management of your personal affairs. To MANAGE something requires the
ability to exert some CONTROL over it, and the ability to exert control requires identification of
DEPENDENCIES. In order to manage sales performance, for example, you must find things upon which
sales depend (e.g. advertising budget; product price; number, training, and compensation of salespersons;
interest rates; and competitive factors), and learn something about the nature of the dependencies. Statistics
is the major tool for identifying dependencies.
Another example of the importance of identifying dependencies: a new disease appears. Researchers
immediately try to find things that enhance the occurrence rate or the severity of the illness (positive
dependencies), and things that reduce them (negative dependencies). Only after such things are found can
there be any hope of controlling the disease. Again, statistical analysis plays a major role.
Or, the objective may simply be to know more about how the world works. So-called "pure research" has no
immediate application, but seeks to find relationships among things, thereby securing knowledge that may
become useful in the future.
CAREFUL STATISTICAL ANALYSIS OF DATA OFTEN RESULTS IN THE IDENTIFICATION OF
DEPENDENCIES, and this is the reason why statistics is an important tool in virtually all disciplines.
3. HOW DOES THIS COURSE FIT INTO THE "GENERAL EDUCATION" PROGRAM?
Statistics is a major way in which human beings learn about the world, and how to control it. To be familiar
with a tool as fundamental and important as this is a responsibility of every educated person.
Statistics can be viewed as applied quantitative logic, usually seeking to make inferences about unknown
parameters on the basis of observations and measurements of samples drawn from a target population.
The study of statistics can promote clear and careful thinking, enhance problem-solving skills, and strengthen
one's ability to avoid premature conclusions. These are traits of the educated person, and are the mental
qualities essential for "knowledge workers" in modern society.
4. WHAT ARE THE OBJECTIVES OF THE COURSE?
The most important objective is the development of your ability to learn this kind of material on your own, and
to continue learning more about the subject after the course is over. Continuous and independent learning is
an important activity of every successful person. In connection with the objective of independent learning, the
instructor will expect students to study and learn certain topics in the course without formal discussion of them
in class. Questions on these topics, of course, are always welcomed and encouraged.
With respect to specific objectives, they are: that students learn the terminology, theory, principles, and
computational procedures related to basic descriptive and inferential statistics; and the careful cultivation of
the logical processes involved in statistical inference. This will enable students to understand statistics and
communicate statistical ideas using generally-accepted terminology.
Another important objective is that students become aware of the limitations of various statistical procedures.
This is particularly important since most students in this course will be consumers rather than providers of
statistical information and conclusions. Estimates and forecasts, for example, are generally regarded with too
much faith, and relied upon to a degree not warranted in light of their inherent limitations.
5. WHAT MUST STUDENTS DO TO SUCCEED IN THIS COURSE?
Your activities in this course should include: reading and studying the relevant sections of the text; attending
class and taking notes; rewriting, reviewing, and studying your notes; working the recommended exercises in
the text; practicing and experimenting with various spreadsheet files supplied by the instructor; asking and
answering questions in class; spending time just thinking about the procedures and their underlying logic;
forming a study group with other students to review notes on terminology and concepts, and to practice
problem-solving skills; and taking the quizzes.
These activities should help you to further develop your abilities to read, listen, record, and organize important
information; and to communicate, analyze, compute, and learn independently the subject matter of statistics.
In order to do well, students must recognize a basic difference between courses like statistics and courses like
history, philosophy, management or organizational strategy. In the latter type, the emphasis is often on
general ideas in broad contexts, with grades based on essay exams and term papers in which students have
considerable latitude to choose what they are going to discuss. The cogent expression and defense of
well-reasoned opinion are highly valued. Students with good verbal, logical and writing skills often excel in this type
of course. Statistics, on the other hand, is a skills course, requiring precise knowledge of concepts,
terminology, and computational procedures. Verbal skills are still important, but now quantitative logic and
computational competence are also critical. Grades are based on knowledge of terminology and concepts,
and even more on the ability to get the right answers to problems.
Regarding study strategy, it is extremely important for most students to read about statistics, to think about
statistics and to do a few problems every day. The most common error is to neglect the material until shortly
before a quiz. But for most students, many of the concepts in statistics are new and strange, and there will be
many places where they are stopped cold: "What?" "I just don't get this!" Then there is no time left to
cultivate the understanding of new concepts and to refine the computational procedures. Anyone can learn
statistics, but most cannot do it overnight.
As with most courses, this course is organized with the most fundamental material coming first. In learning a
new language, or how to play a musical instrument, or any new set of skills, mastery of the basics is essential
to success later on. The subject matter of statistics is not like history, where, if you did not study 14th century
France, it probably did not affect your learning about 17th century England. In statistics, failure to obtain a
good understanding of earlier material will have a serious adverse effect on your ability to make sense out of
what comes later. It is therefore essential to build a solid foundation of fundamental knowledge early in the
course in order to support the more elaborate logical and computational structures involved later.
6. WHAT ARE THE PREREQUISITES FOR THE COURSE?
The primary prerequisite is a logical mind. This course is computational, but it is not a "math" course.
Mathematical theorems are not derived or proven; the need to solve equations is very rare. The emphasis is
on concrete applications rather than abstract theory. Some students with good math backgrounds have done
poorly, while others with little or no math experience have done very well.
The best MBA stats student I ever had was a philosophy major who did not have a single math course at the
college level. When asked about this, he replied: "My philosophy major gave me excellent training in
logic, and that's really what this course requires."
7. OF WHAT IMPORTANCE IS CLASS PARTICIPATION?
In this course, class participation means frequently asking relevant questions and supplying answers (right or
wrong) to the instructor's and colleagues' questions as problems and examples are worked out and discussed.
These behaviors are evidence of active involvement with the material and will result in better learning and an
automatic positive effect on your grade. In borderline grade cases, a history of active participation will enable
the instructor to award the higher grade to the deserving student.
8. WILL STUDENTS BE GIVEN ALTERNATIVE WAYS TO ACHIEVE SUCCESS, BASED ON DIFFERENT
LEARNING STYLES?
Different learning styles do exist. Some prefer a deductive method (deriving specific knowledge from general
principles), while others tend to prefer an inductive method (deriving the generalities from examples). The
inductive learners may need to work a number of problems before seeing the patterns that are present. The
deductive learners may never need to work a problem--they will know instinctively what to do. Some will not
like the book, and will learn primarily from the class presentations and discussions, while others will learn
mostly from the book and will find class time to be of lesser importance.
But the intended outcomes are the same for all--those in number 4 above.
9. WHAT IS THE PURPOSE OF THE ASSIGNMENTS?
Problems from the text may be suggested, for the purpose of providing practice in analyzing what must be
done, and in performing the required computations. Even though computer software is available to perform
calculations, students can gain insight into the logical structure of a sequence of computational steps if they go
through them several times by hand (i.e. using simple calculators).
Computer assignments using instructor-supplied spreadsheet files will require students to become more
familiar with spreadsheet software that they probably are or will be using in connection with their work. More
importantly, the spreadsheets allow students to experiment with data in order to investigate the quantitative
relationships involved. Such experimentation would be too tedious and time-consuming for manual or even
calculator computation.
10. WHAT WILL THE TESTS TEST? -- MEMORY? UNDERSTANDING? ABILITY TO SYNTHESIZE? TO
PRESENT EVIDENCE LOGICALLY? TO APPLY KNOWLEDGE IN A NEW CONTEXT?
The tests will test your ability to recognize and use statistical terminology correctly, and they will test your
understanding of the logic and principles underlying various statistical procedures. In addition, you will have to
demonstrate your ability to solve problems similar to those discussed in class, sometimes using computer
spreadsheet files.
There is a place for memorization in learning. It is not a substitute for comprehension, but it is better than
getting something wrong on a quiz that you were expected to know. As with prayers among small children,
memorization is often a first step, eventually followed by understanding. But if the memorization (of
terminology, for example) is not done, it is less likely that the comprehension will ever occur.
11. WHY HAS THIS PARTICULAR TEXT BEEN CHOSEN?
Our text is one of the most widely adopted introductory statistics books. It has gone through several editions,
and its popularity remains high. It is relatively easy to read, and its exercise material is excellent.
12. WHAT IS THE RELATIONSHIP BETWEEN KNOWLEDGE LEVEL AND GRADES?
Consider this hypothetical but realistic situation.
Knowledge     Percentage Grade
              Course A     Course B
100%          100%         100%
90%           90%          81%
80%           80%          64%
70%           70%          49%
60%           60%          36%
50%           50%          25%
40%           40%          16%
30%           30%          9%
20%           20%          4%
10%           10%          1%
Course A might be like philosophy, history, or management, where the grade is more-or-less proportional to
knowledge level. Course B might be like statistics or other skills courses, where small deficiencies in
knowledge can have disastrous effects on results. Overstudying is the best strategy for coping with this, with
the dual payoffs of higher grades and, more importantly, greater knowledge.
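The Course B column in the table above is numerically the square of the knowledge level (for example, 0.90 x 0.90 = 0.81). That is an observation about the table's values, not a grading formula stated in this outline. A minimal Python sketch that reproduces both columns under that assumption:

```python
# Knowledge levels from the table, as fractions (100% down to 10%).
knowledge_levels = [k / 10 for k in range(10, 0, -1)]

def course_a_grade(knowledge):
    """Course A: the grade is proportional to the knowledge level."""
    return knowledge

def course_b_grade(knowledge):
    """Course B: the grade equals the square of the knowledge level
    (an assumption inferred from the table, not a stated rule)."""
    return knowledge ** 2

for k in knowledge_levels:
    print(f"{k:4.0%}  A: {course_a_grade(k):4.0%}  B: {course_b_grade(k):4.0%}")
```

Under this reading, a student who knows 90% of the material keeps 90% of the grade in Course A but only 81% in Course B, which is the point of the comparison.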
Course percentage as a function of quiz average (QUIZ) and homework average (HW)
Weights: QUIZ 0.667, HW 0.333

QUIZ \ HW     10     20     30     40     50     60     70     80     90     95    100
100         70.0   73.4   76.7   80.0   83.4   86.7   90.0   93.3   96.7   98.3    100
95          66.7   70.0   73.4   76.7   80.0   83.3   86.7   90.0   93.3   95.0   96.7
90          63.4   66.7   70.0   73.4   76.7   80.0   83.3   86.7   90.0   91.7   93.3
85          60.0   63.4   66.7   70.0   73.3   76.7   80.0   83.3   86.7   88.3   90.0
80          56.7   60.0   63.4   66.7   70.0   73.3   76.7   80.0   83.3   85.0   86.7
75          53.4   56.7   60.0   63.3   66.7   70.0   73.3   76.7   80.0   81.7   83.3
70          50.0   53.4   56.7   60.0   63.3   66.7   70.0   73.3   76.7   78.3   80.0
65          46.7   50.0   53.3   56.7   60.0   63.3   66.7   70.0   73.3   75.0   76.7
60          43.4   46.7   50.0   53.3   56.7   60.0   63.3   66.7   70.0   71.7   73.3
55          40.0   43.3   46.7   50.0   53.3   56.7   60.0   63.3   66.7   68.3   70.0
50          36.7   40.0   43.3   46.7   50.0   53.3   56.7   60.0   63.3   65.0   66.7
45          33.3   36.7   40.0   43.3   46.7   50.0   53.3   56.7   60.0   61.7   63.3
40          30.0   33.3   36.7   40.0   43.3   46.7   50.0   53.3   56.7   58.3   60.0
30          23.3   26.7   30.0   33.3   36.7   40.0   43.3   46.7   50.0   51.6   53.3
20          16.7   20.0   23.3   26.7   30.0   33.3   36.7   40.0   43.3   45.0   46.6
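The grade figures above combine a quiz average and a homework average using the weights stated under Quizzes and Grades (2/3 and 1/3). The sketch below shows that arithmetic together with the letter-grade minimums from this outline. The function names are illustrative, and the sketch assumes no term project or class-participation adjustment. Because it uses exact thirds rather than the rounded weights 0.667 and 0.333, a few results can differ from the figures above by 0.1.

```python
QUIZ_WEIGHT = 2 / 3   # quizzes constitute 2/3 of the grade
HW_WEIGHT = 1 / 3     # assignments constitute the other 1/3

# Minimum percentages from "Grade requirements" in this outline.
CUTOFFS = [("A", 90), ("B", 80), ("C", 60), ("D", 50)]

def course_percentage(quiz_avg, hw_avg):
    """Weighted course percentage from quiz and homework averages (0-100)."""
    return QUIZ_WEIGHT * quiz_avg + HW_WEIGHT * hw_avg

def letter_grade(percentage):
    """Return the first letter whose minimum is met, otherwise F."""
    for letter, minimum in CUTOFFS:
        if percentage >= minimum:
            return letter
    return "F"

example = course_percentage(85, 90)   # quiz average 85, homework average 90
print(round(example, 1), letter_grade(example))
```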
I can use Excel to
    perform basic computations
    prepare tables
    create charts and graphs
    conduct common statistical procedures
    create dashboards
I can use Word to
    create various kinds of documents
I can
    compute
        means
        medians
        variances
        standard deviations
        confidence intervals for means and proportions
    use the
        binomial distribution to answer probability questions
        normal distribution to answer probability questions
        chi-square distribution to answer probability questions
        F distribution to answer probability questions
    conduct
        hypothesis tests on
            the means of one group or two
            the proportions of one group or two
            hypothetical vs. observed distributions
            variances of one group or two
            group means using ANOVA
        regression analysis to examine correlation and make forecasts
I can
    perform financial analysis
    compute the NPV of various investment opportunities
    decide between using debt or equity to raise new funds
    determine the optimum mix of debt and equity financing
    compute cost-of-capital
    decide whether to make or buy components for our products
    determine how much direct labor, direct materials, and overhead is going into our products
    create cash budgets
    conduct cost-volume-profit analyses
    prepare a master budget
    prepare performance reports using standard costs and variances
    employ the scientific method to study problems that may come up
PART TWO -- Essentials--Analysis of Enumerative Data
Enumerate: to count, usually after classification has been performed
Enumerative data: data obtained by classifying and counting occurrences
Multinomial experiment--like the binomial experiment, except each trial has more than two outcomes
n identical trials; k possible outcomes on each trial
Independence--the outcome of one trial does not affect the outcome of any other trial
Constant probabilities for each outcome from trial to trial
p1, p2, p3, . . ., pk are the probabilities of the various outcomes
Cell counts (number of times each outcome occurs) are the variables to be
analyzed
Chi-square (χ²) distribution: continuous, positively skewed
One-dimensional chi-square test--“goodness of fit” tests
Ho: that a population conforms to some expected distribution.
A cell consists of an expectation (E) and an observation (O).
Expected values (E) are derived from Ho.
The number of cells is denoted by k.
Calculated chi-square (test statistic, χ²c) for a cell is the squared deviation
(E-O)² divided by E. The χ²c is the total of all the cells.
Degrees of freedom (df): the number of cells minus one (k-1)
(d.f. = k – 3 when the normal distribution is used.)
(d.f. = k – 2 when the Poisson distribution is used.)
Ho is rejected if χ²c ≥ χ²t, also if p ≤ α
If Ho is rejected, additional information should be reported as to the nature of the deviation from the
expected distribution.
Often used to test for normal distributions.
For the sample size to be sufficient, the expected number (e) in each cell
should equal or exceed 5.
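The one-way (goodness-of-fit) computation described above can be sketched in a few lines of Python. The data are hypothetical, the function name is illustrative, and the sketch assumes the plain k - 1 degrees of freedom (no parameters estimated from the sample). The critical value 7.815 is the usual table value of chi-square for alpha = 0.05 and 3 degrees of freedom.

```python
def chi_square_goodness_of_fit(observed, expected):
    """Return (calculated chi-square, degrees of freedom) for a one-way test."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    if min(expected) < 5:
        raise ValueError("each expected count should equal or exceed 5")
    # Each cell contributes (E - O)^2 / E; the statistic is the total over cells.
    chi_sq = sum((e - o) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # k - 1
    return chi_sq, df

# Hypothetical data: 100 customers choosing among 4 package designs.
observed = [30, 20, 25, 25]
expected = [25, 25, 25, 25]   # Ho: all four designs are equally popular

chi_sq, df = chi_square_goodness_of_fit(observed, expected)
CRITICAL_05_DF3 = 7.815       # table value, alpha = 0.05, df = 3
reject_h0 = chi_sq >= CRITICAL_05_DF3
print(chi_sq, df, reject_h0)
```

Here the calculated chi-square is well below the table value, so Ho is not rejected: the population could conform to the expected distribution.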
Two-dimensional chi-square test
H0: in the population the row variable and column variable are independent.
Ha: in the population the row variable and column variable are dependent.
Contingency (dependency) table contains a matrix of cells
A cell consists of an expectation (E) and an observation (O).
Expected values (E) are derived from H0 using the multiplication rule for
intersections of independent events: P(A ∩ B) = P(A) * P(B).
Calculated chi-square for a cell is (E-O)² / E (same as above).
Ho is rejected if χ²c ≥ χ²t, also if p ≤ α
Degrees of freedom: number of rows minus one, times number of columns
minus one; (r-1)(c-1) where r and c are the numbers of rows and columns
If H0 is rejected, additional information should be reported as to the nature of the dependencies.
For the sample size to be sufficient, the expected number (e) in each cell
should equal or exceed 5.
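For the two-way test, the multiplication rule gives each expected count as n * P(row) * P(column), which simplifies to (row total * column total) / n. A minimal Python sketch with a hypothetical 2 x 2 contingency table follows. The function name is illustrative, and 3.841 is the usual table value of chi-square for alpha = 0.05 and 1 degree of freedom.

```python
def chi_square_independence(table):
    """Return (calculated chi-square, degrees of freedom, expected counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_sq = 0.0
    expected = []
    for i, row in enumerate(table):
        expected_row = []
        for j, o in enumerate(row):
            # Under H0 (independence): E = n * P(row i) * P(column j)
            e = row_totals[i] * col_totals[j] / n
            expected_row.append(e)
            chi_sq += (e - o) ** 2 / e
        expected.append(expected_row)
    df = (len(table) - 1) * (len(table[0]) - 1)  # (r-1)(c-1)
    return chi_sq, df, expected

# Hypothetical counts: rows = two regions, columns = two product preferences.
observed_table = [[30, 20],
                  [20, 30]]

chi_sq_2way, df_2way, expected_table = chi_square_independence(observed_table)
CRITICAL_05_DF1 = 3.841       # table value, alpha = 0.05, df = 1
reject_h0_2way = chi_sq_2way >= CRITICAL_05_DF1
print(chi_sq_2way, df_2way, reject_h0_2way)
```

With these counts every expected value is 25, the calculated chi-square is 4.0, and H0 is rejected at the 0.05 level. A full report would go on to describe the nature of the dependence.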
Terminology--explain each of the following:
enumerative data, multinomial experiment, binomial experiment, identical trials, independence, one-dimensional
or one-way chi-square test, “goodness-of-fit” test, two-dimensional or two-way chi-square test, dependency,
contingency table, multiplication rule for intersections of independent events.
Skills and Procedures
• given appropriate data, conduct a one-way chi-square test and interpret the results
• given appropriate data, conduct a two-way chi-square test and interpret the results
Concepts
• describe what is meant by “goodness-of-fit”
• explain how expected values are determined in a one-way chi-square test
• explain how the concept of “deviation” applies in chi-square test computations
• explain how expected values are determined in a two-way chi-square test
• describe the application of the “multiplication rule for independent events” in two-way chi-square analysis
If the H0 is rejected:
One-Way: “The differences between the observations and the expectations are statistically significant at the
______ level. The population probably does not conform to the expected distribution.” (You should say
more about the nature of the differences between the observations and the expectations.)
Two-Way: “There is statistically significant dependence between ______ and ______ at the ______ level.”
(Give more information about the dependencies.)
If the H0 is not rejected:
One-Way: “The differences between the observations and the expectations are not statistically significant at the
______ level. The population could conform to the expected distribution.”
Two-Way: “The dependence between ______ and ______ is not statistically significant at the ______ level.”
PART THREE -- Essentials--Analysis of Variance (ANOVA)
Purpose: To test for differences between/among two or more population means.
H0: μ1 = μ2 = μ3 . . .;
Population means are all equal.
Ha: not μ1 = μ2 = μ3 . . .;
Population means are not all equal;
Note that Ha is not "all the population means are different."
Rejection of Ho means that there is a statistically significant difference between at
least two of the sample means.
Interval estimation of population means and differences between population means is
also possible.
Sums of squared deviations
TSS--total sum of squared deviations
SST--sum of squared deviations for treatments (between-group variation)
SSE--sum of squared deviations for error (within-group variation)
TSS = SST + SSE
Means of squared deviations--recall that a variance is a mean of squared deviations.
MST--mean of squared deviations for treatments (between-group variance)
MSE--mean of squared deviations for error (within-group variance)
Signal-to-noise analogy
Signal: between-group variance, MST
Noise: within-group variance, MSE
The more false Ho is (the larger the differences between/among population means),
the larger MST will be relative to MSE.
ANOVA table--standardized way of presenting computations and results
Calculated F ( test statistic, Fc ) is MST / MSE
Total degrees of freedom: the number of observations minus one
Degrees of freedom for treatments: number of treatments minus one
Degrees of freedom for error: the number of observations minus the number
of treatments
When there are only two groups and a t-test could be used, the Fc will be equal to the
square of the tc.
Reject Ho if Fc ≥ Ft, or equivalently if p ≤ α.
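The statement that Fc equals the square of tc when there are only two groups can be checked numerically. The sketch below uses hypothetical data and assumes the t-test in question is the pooled-variance (equal variances) two-sample test, which matches the equal-variance assumption of ANOVA. The function names are illustrative.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def pooled_t(group1, group2):
    """Two-sample t statistic using the pooled variance estimate."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean(group1), mean(group2)
    ss1 = sum((x - m1) ** 2 for x in group1)
    ss2 = sum((x - m2) ** 2 for x in group2)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(pooled_var * (1 / n1 + 1 / n2))

def anova_f(groups):
    """Calculated F = MST / MSE for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    sst = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)
    mst = sst / (len(groups) - 1)
    mse = sse / (len(all_values) - len(groups))
    return mst / mse

# Hypothetical samples from two groups.
group_a = [1, 2, 3]
group_b = [3, 4, 5]

t_c = pooled_t(group_a, group_b)
f_c = anova_f([group_a, group_b])
print(t_c ** 2, f_c)   # the two values agree up to rounding
```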
Four assumptions (same as t-tests of chapter 9)
Samples
Random
Independent
Populations
Normally distributed
Equal variances
Moderate departures from the assumptions will not seriously affect validity (robust)
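The one-way ANOVA quantities defined above (TSS, SST, SSE, MST, MSE, and Fc) can be computed directly from their definitions. This is a sketch with hypothetical data and an illustrative function name. The value 5.14 is the usual table value of F for alpha = 0.05 with 2 and 6 degrees of freedom.

```python
def one_way_anova(groups):
    """Return the main entries of a one-way ANOVA table as a dict."""
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    k = len(groups)                      # number of treatments
    grand_mean = sum(all_values) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group variation (treatments) and within-group variation (error).
    sst = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    tss = sum((x - grand_mean) ** 2 for x in all_values)

    df_treatments = k - 1
    df_error = n_total - k
    mst = sst / df_treatments
    mse = sse / df_error
    return {"TSS": tss, "SST": sst, "SSE": sse,
            "df_treatments": df_treatments, "df_error": df_error,
            "MST": mst, "MSE": mse, "F": mst / mse}

# Hypothetical data: three treatments, three observations each.
anova_table = one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
F_TABLE_05_2_6 = 5.14                    # table value, alpha = 0.05, df = (2, 6)
reject_h0_anova = anova_table["F"] >= F_TABLE_05_2_6
print(anova_table, reject_h0_anova)
```

In this example TSS = SST + SSE holds (12 = 6 + 6), the signal (MST = 3) is only three times the noise (MSE = 1), and Ho is not rejected.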
One-way ANOVA--completely randomized design
Two-way ANOVA--randomized block design
TSS = SST + SSB + SSE (B = "blocks")
Two calculated F's: treatments FT = MST / MSE and blocks FB = MSB / MSE
Total degrees of freedom: the number of observations minus one
Degrees of freedom for treatments: the number of treatments minus one
Degrees of freedom for blocks: the number of blocks minus one
Degrees of freedom for error: the number of observations minus the number of
treatments, minus the number of blocks, plus one
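The randomized block computations can be sketched the same way. The data below are hypothetical (one observation per block-treatment combination), the function name is illustrative, and SSE is obtained by subtraction from TSS = SST + SSB + SSE. The value 6.94 is the usual table value of F for alpha = 0.05 with 2 and 4 degrees of freedom.

```python
def randomized_block_anova(data):
    """data[b][t] is the observation for block b under treatment t."""
    n_blocks = len(data)
    n_treatments = len(data[0])
    n_total = n_blocks * n_treatments
    grand_mean = sum(sum(row) for row in data) / n_total
    block_means = [sum(row) / n_treatments for row in data]
    treatment_means = [sum(col) / n_blocks for col in zip(*data)]

    tss = sum((x - grand_mean) ** 2 for row in data for x in row)
    sst = n_blocks * sum((m - grand_mean) ** 2 for m in treatment_means)
    ssb = n_treatments * sum((m - grand_mean) ** 2 for m in block_means)
    sse = tss - sst - ssb                # from TSS = SST + SSB + SSE

    df_treatments = n_treatments - 1
    df_blocks = n_blocks - 1
    df_error = n_total - n_treatments - n_blocks + 1
    mse = sse / df_error
    return {"TSS": tss, "SST": sst, "SSB": ssb, "SSE": sse,
            "df_error": df_error, "MSE": mse,
            "F_treatments": (sst / df_treatments) / mse,
            "F_blocks": (ssb / df_blocks) / mse}

# Hypothetical data: rows are 3 blocks, columns are 3 treatments.
block_data = [[10, 12, 14],
              [11, 15, 16],
              [12, 15, 21]]

block_table = randomized_block_anova(block_data)
F_TABLE_05_2_4 = 6.94                    # table value, alpha = 0.05, df = (2, 4)
print(block_table)
print("treatments significant:", block_table["F_treatments"] >= F_TABLE_05_2_4)
print("blocks significant:", block_table["F_blocks"] >= F_TABLE_05_2_4)
```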
Estimation in One-Way ANOVA
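t_t in the following equations is based on the number of degrees of freedom for error.
Single population mean
$\mu = \bar{x} \pm t_t \, \hat{\sigma}_{\bar{x}}$
where
$\hat{\sigma}_{\bar{x}} = \sqrt{\mathrm{MSE} / n}$
Difference between two population means:
$(\mu_1 - \mu_2) = (\bar{x}_1 - \bar{x}_2) \pm t_t \, \hat{\sigma}_{(\bar{x}_1 - \bar{x}_2)}$
where
$\hat{\sigma}_{(\bar{x}_1 - \bar{x}_2)} = \sqrt{\mathrm{MSE} \times \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$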
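The two interval estimates above, for a single population mean and for the difference between two population means, can be sketched as follows. The inputs are hypothetical (MSE = 1.0 with 6 error degrees of freedom and three observations per group), the function names are illustrative, and 2.447 is the usual table value of t for 95% confidence with 6 degrees of freedom.

```python
from math import sqrt

def ci_single_mean(sample_mean, n, mse, t_table):
    """Interval for one population mean: mean +/- t * sqrt(MSE / n)."""
    half_width = t_table * sqrt(mse / n)
    return sample_mean - half_width, sample_mean + half_width

def ci_mean_difference(mean1, n1, mean2, n2, mse, t_table):
    """Interval for (mu1 - mu2): difference +/- t * sqrt(MSE * (1/n1 + 1/n2))."""
    half_width = t_table * sqrt(mse * (1 / n1 + 1 / n2))
    difference = mean1 - mean2
    return difference - half_width, difference + half_width

T_TABLE_95_DF6 = 2.447   # table value of t, 95% confidence, 6 error df

# Hypothetical one-way ANOVA results.
single_ci = ci_single_mean(sample_mean=2.0, n=3, mse=1.0, t_table=T_TABLE_95_DF6)
diff_ci = ci_mean_difference(4.0, 3, 2.0, 3, mse=1.0, t_table=T_TABLE_95_DF6)
print(single_ci)
print(diff_ci)
```

Both intervals use MSE from the ANOVA table as the variance estimate, which is why t is based on the error degrees of freedom rather than on a single group's sample size.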
Estimation in two-way ANOVA (randomized block design)
Two-way ANOVA estimation -- valid only for differences between population means.
Confidence intervals cannot be obtained for individual treatment means.
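t_t in the following equations is based on the number of degrees of freedom for error.
Difference between two population means:
$(\mu_1 - \mu_2) = (\bar{x}_1 - \bar{x}_2) \pm t_t \, \hat{\sigma}_{(\bar{x}_1 - \bar{x}_2)}$
where
$\hat{\sigma}_{(\bar{x}_1 - \bar{x}_2)} = \sqrt{\mathrm{MSE} \times \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$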
Three-way analysis of variance
"Latin square" design
Terminology--explain each of the following:
TSS--total sum of squared deviations, SST--sum of squared deviations for treatments (between-group variation),
SSE--sum of squared deviations for error (within-group variation), variance, MST--mean of squared
deviations for treatments (between-group variance), MSE--mean of squared deviations for error
(within-group variance), signal-to-noise ratio, ANOVA table, calculated F (MST / MSE), degrees of
freedom (treatments, blocks, error), four assumptions (same as t-tests of chapter 9), robust test--moderate
departures from the assumptions will not seriously affect validity, completely randomized
design, randomized block design, "Latin square" design
Skills and Procedures
• given appropriate data, conduct a one-way ANOVA and interpret the results; include all possible 95%
confidence intervals
• given appropriate data, conduct a two-way ANOVA and interpret the results; include all possible 95%
confidence intervals
Concepts
• explain why, when ANOVA deals with tests on means, it is called “analysis of variance”
• explain the “signal-to-noise ratio” concept in the context of ANOVA
• describe the shortcoming that ANOVA shares with small-sample t-tests
• show where the variances are found in the ANOVA table
If the H0 is rejected:
“The difference between at least two of the sample means of the __________ is statistically significant at
the α level. The population means are probably not all equal.”
If the H0 is not rejected:
“The differences among the sample means of the __________ are not statistically significant at the α
level. All the population means could be equal.”
PART FOUR -- Essentials--Linear Regression and Correlation
Major purpose in business: forecasting
In order for forecasting to be possible, the future must, in some way, be like the past.
Forecasting methods seek to identify relationships from the past, and use them to
predict the future (assuming that the identified relationship will persist).
Finding relationships is a way of identifying dependencies.
Dependent variable--one to be predicted
Independent variable--one used to make the prediction
Types of regression
Based on the number of independent variables
Simple regression--one predictor or independent variable (x)
E.g.
y = a + bx
Multiple regression--two or more predictor or independent variables (x1, x2, . . . ,xn)
E.g.
y = a + bx1 +cx2 +dx3 +ex4
Based on the type of regression line
Linear:
y = a + bx
a = y-intercept;
b = slope
or y = mx + b:
b = y-intercept;
m = slope
or y = β0 + β1 x:
β0 = y-intercept; β1 = slope
Slope is the coefficient (multiplier) of x, no matter what symbol is used or where
it appears in the equation.
Slope is the change in y for a one-unit change in x.
Usually regarded as the single most important result in regression, because it
describes the nature of the relationship between y and x.
In multiple regression, each independent variable has its own slope and its own
Intercept is the other value, also known as the "constant".
Intercept is the value of y when x = 0.
Non-linear (curved):
exponential e.g.
y = ab^x or y = 35(1.06)^x
logarithmic e.g.
y = a log x or y = 3.2 log x
power e.g.
y = ax^b or y = 60(x)^5
trigonometric e.g. y = a sin x or y = 3.7 sin x
etc.
Over a restricted range (relevant range) a curve can be approximated with a straight line
Based on the nature of the suspected relationship between y and x
Causal regression: x may be an actual cause of y, or x may be related to something
else that is a cause of y
Time series regression--popular in business and economics
Time is the independent (x) variable, used to substitute for the actual causes of y.
In time series, it is often better to use less historical data rather than more.
The future is likely to be more like the recent past than the more distant past.
With less data x is closer to x-bar (see below).
Correlation--the degree of "relatedness" between dependent and independent variables
Types of correlation
positive: dependent variable increases as the independent variable increases
negative: dependent variable decreases as the independent variable increases
none: no apparent relationship between dependent variable and independent variable
Measures of correlation
Coefficient of non-determination, k^2--always positive--range, 0 to 1
If there is perfect correlation, k^2 is equal to zero.
If there is no correlation, k^2 is equal to one.
Coefficient of determination, r^2, equal to 1 - k^2--always positive--range, 0 to 1
If there is perfect correlation, r^2 is equal to one.
If there is no correlation, r^2 is equal to zero.
Correlation coefficient, r, the square root of r^2--positive or negative, depending on
the type of correlation--range -1 to +1
Note: ρ (rho) and ρ^2 are the population parameters corresponding to r and r^2
Correlation and causation
The presence of correlation does not, in itself, prove that x causes y.
Three things necessary to prove causation
Statistically significant correlation between the effect, y, and the alleged cause, x.
Alleged cause, x, must be present before or at the same time as the effect, y.
Explanation must be found as to how x causes y.
Prediction errors--five standard errors (sampling standard deviations)
Standard error of the slope, σb
Measure of uncertainty regarding the slope of the regression line
Used to find confidence interval for the slope: β = b ± ttσb
Note: β is the population slope, estimated by b.
Standard error of the intercept, σa
Measure of uncertainty regarding the intercept of the regression line
Used to find confidence interval for the intercept: α = a ± ttσa
Note: α is the population intercept, estimated by a.
Standard error of estimate, σd and standard error of prediction, σpred
Measures of uncertainty regarding predictions
Used in finding confidence interval for predictions: y = y' ± ttσpred
Predictions have the least uncertainty when the value of x is near x-bar.
Standard error of the correlation coefficient, σr
Measure of uncertainty regarding the correlation coefficient
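The outline names these standard errors without giving formulas. The sketch below uses the usual textbook formulas for simple linear regression, which is an assumption rather than something stated in the outline; the function name `regression_standard_errors`, the data, and the choice of `x_new` values are all hypothetical. It illustrates why predictions have the least uncertainty when x is near x-bar.

```python
import math

def regression_standard_errors(xs, ys, x_new):
    """Least-squares fit plus four of the standard errors named above."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx  # slope
    a = y_bar - b * x_bar                                             # intercept
    residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

    se_estimate = math.sqrt(residual / (n - 2))
    se_slope = se_estimate / math.sqrt(sxx)
    se_intercept = se_estimate * math.sqrt(1 / n + x_bar ** 2 / sxx)
    # The (x_new - x_bar)^2 term makes this grow as x_new moves away from x-bar.
    se_prediction = se_estimate * math.sqrt(
        1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
    return {"a": a, "b": b, "se_estimate": se_estimate, "se_slope": se_slope,
            "se_intercept": se_intercept, "se_prediction": se_prediction}

xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]             # hypothetical data
near = regression_standard_errors(xs, ys, x_new=3)    # x_new equal to x-bar
far = regression_standard_errors(xs, ys, x_new=8)     # x_new far from x-bar
print(near["se_prediction"], far["se_prediction"])
```

The standard error of the correlation coefficient is left out of this sketch.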
Types of variation in regression
Initial or original variation
Sum of the squared deviations between the data y-values and the mean of the
y-values -- Σ(y - ybar)^2
Residual variation
Sum of the squared deviations between the data y-values and the predicted
y-values -- Σ(y - y')^2
Removed or explained variation
Initial variation minus residual variation
k^2 is the ratio of residual variation to original variation, Σ(y - y')^2 / Σ(y - ybar)^2.
r^2 is the ratio of removed variation to original variation.
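The three types of variation and the two coefficients can be computed in a few lines. This is a minimal sketch with hypothetical data and a hypothetical function name, `variation_summary`; the line is fitted by ordinary least squares, which the outline does not spell out and is assumed here.

```python
import math

def variation_summary(xs, ys):
    """Original, residual and explained variation for a least-squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar

    original = sum((y - y_bar) ** 2 for y in ys)
    residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    explained = original - residual

    k2 = residual / original          # coefficient of non-determination
    r2 = explained / original         # coefficient of determination
    # r takes the sign of the slope; max() guards against tiny rounding errors.
    r = math.copysign(math.sqrt(max(r2, 0.0)), b)
    return {"original": original, "residual": residual,
            "explained": explained, "k2": k2, "r2": r2, "r": r}

summary = variation_summary([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for name, value in summary.items():
    print(name, round(value, 4))
```

For this made-up data the original variation is 6, the residual variation is 2.4 and the explained variation is 3.6, so k^2 = 0.4 and r^2 = 0.6.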
Hypothesis testing in regression
Ho: No correlation (relationship) between y and x.
ρ = 0 or ρ^2 = 0 or β = 0
Ha: Correlation between y and x (two-sided)
Positive correlation between y and x (one-sided)
Negative correlation between y and x (one-sided)
Reject Ho if tc > tt (when n is small) or if zc > zt (when n is large).
When n is small, df = (n-2)
Reject Ho if p < α (hypothesis-test α, not intercept α)
If Ho is not rejected, there is no statistically significant correlation between x and y.
The regression equation should not be used--just use y-bar to predict y, or don't
make a prediction at all.
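The decision rule above can be sketched as follows. The formula used for calculated-t, tc = r × sqrt(n - 2) / sqrt(1 - r^2), is the usual textbook one and is an assumption here, since the outline does not state it. The function name `correlation_t`, the sample values, and the table-t value (two-sided, α = 0.05, df = 3) are hypothetical inputs.

```python
import math

def correlation_t(r, n):
    """Calculated t for testing Ho: no correlation, with df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r, n = 0.7746, 5          # hypothetical sample correlation and sample size
t_table = 3.182           # table-t looked up by hand: two-sided, alpha 0.05, df 3
t_calc = correlation_t(r, n)
reject_h0 = abs(t_calc) > t_table

print("calculated t:", round(t_calc, 3))
print("reject Ho:", reject_h0)
```

In this made-up case calculated-t is smaller than table-t, so Ho is not rejected even though r looks fairly large; with only five data points the correlation is not statistically significant.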
Exponential regression (not in the textbook)
Linear vs. exponential growth
Simple interest--example of linear growth
Interest is paid only on the initial deposit
E.g. $1,000 deposited today at 5% is worth $1,000 + $50(x) after x years.
$1,000 is the intercept (value of y today, when x = 0).
$50 is the slope (change in y each year (5% of $1,000)).
The slope, $50, is constant.
Compound interest--example of exponential growth
Interest paid not only on the initial deposit, but also on previously-earned interest.
E.g. $1,000 deposited today at 5% is worth $1,000(1.05)^x after x years
$1,000 is the intercept (value of y today, when x = 0)
1.05 is the growth factor (b), which is equal to 1 + the growth rate (r)
b = 1+ r and r = b - 1
In the above example r = 0.05 (5%) and b = 1.05
The slope is not constant, but increases as x increases.
Exponential equation: y'exp = a(b)^x
a = y-intercept; b = compound growth factor
Growth rate r = b - 1, and compound growth factor b = 1+ r
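The contrast between linear and exponential growth can be seen by tabulating both for the $1,000 deposit at 5% used above. The function names are hypothetical; this is a minimal sketch.

```python
def simple_interest_value(principal, rate, years):
    """Linear growth: interest is earned on the initial deposit only."""
    return principal + principal * rate * years

def compound_interest_value(principal, rate, years):
    """Exponential growth: value = a * b^x with growth factor b = 1 + r."""
    return principal * (1 + rate) ** years

for x in (0, 1, 2, 10, 20):
    simple = simple_interest_value(1000, 0.05, x)
    compound = compound_interest_value(1000, 0.05, x)
    print(x, round(simple, 2), round(compound, 2))
```

Under simple interest the yearly change is always $50; under compound interest the yearly change grows, while the ratio of each year's value to the previous year's stays at 1.05.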
"b" values compared:
Linear:
y = a + b(x)
b < 0 negative correlation
b = 0 no correlation (y = intercept a, regardless of value of x)
b > 0 positive correlation
Exponential: y = a(b)^x
b < 1 negative correlation
b = 1 no correlation (y = intercept a, regardless of value of x)
b > 1 positive correlation
Exponential regression computations
Procedure is based on the fact that if y is an exponential function of x, then ln y
(or log y) is a linear function of x
That is, if y = a(b)^x, then ln y = a' + b'(x) or log y = a'' + b''(x).
(The three "a" and "b" values in the above equations are different.)
Procedure
Transform the y-values into the lns (or logs) of the y-values.
Math review
The logarithm of a number is the power to which a base number must
be raised in order to give the original number
Natural logarithms use the number e (2.718281828...) as the base.
ln 25 is 3.218876 because e^3.218876 is 25
ln 100 is 4.605170 because e^4.605170 is 100
Common logarithms use the number 10 as the base
log 25 is 1.397940 because 10^1.397940 is 25
log 100 is 2 because 10^2 is 100
Perform linear regression analysis on the lns (or logs) of the y-values.
Result is a linear equation for predicting the ln (or log) of y
ln y' = a'+b'x or log y' = a''+b''x
Determine a and b values in y' = a(b)^x
a is the inverse ln of a' (or the inverse log of a'')
b is the inverse ln of b' (or the inverse log of b'')
Inverse ln of z = e^z (or Inverse log of z = 10^z)
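The procedure above (transform y to ln y, fit a straight line, then take inverse lns) can be sketched in Python. The function name `exponential_fit` is hypothetical, and the data are made up to follow y = 1000(1.05)^x exactly so that the recovered a and b are easy to check.

```python
import math

def exponential_fit(xs, ys):
    """Fit y' = a * b^x by linear regression on the natural logs of y."""
    logs = [math.log(y) for y in ys]           # step 1: transform y-values
    n = len(xs)
    x_bar, l_bar = sum(xs) / n, sum(logs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    # step 2: ordinary least-squares line ln y' = a' + b'x
    b_prime = sum((x - x_bar) * (l - l_bar) for x, l in zip(xs, logs)) / sxx
    a_prime = l_bar - b_prime * x_bar
    # step 3: inverse ln (e^z) turns a' and b' back into a and b
    return math.exp(a_prime), math.exp(b_prime)

xs = [0, 1, 2, 3, 4]
ys = [1000 * 1.05 ** x for x in xs]            # hypothetical data
a, b = exponential_fit(xs, ys)
print("a =", round(a, 4), " b =", round(b, 6), " growth rate =", round(b - 1, 6))
```

Using common logs instead would only change `math.log` to `math.log10` and `math.exp(z)` to `10 ** z`; the fitted a and b would be the same.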
Confidence intervals in exponential forecasting
Intervals are first computed for ln (or log) of y', then are converted to LCL and UCL
values using inverse lns (or logs)
Two-point regression--linear and exponential--quick forecasts (see examples at end of outline)
Linear
Slope (b) is the difference between y-values divided by the difference between
x-values.
Let y-axis be located at the first x-value (let first x-value correspond to zero
on the x-axis).
Intercept (a) is then the first y-value.
Equation y' = a + bx can then be written and used to make forecasts
Exponential
Growth factor (b) is the ratio of the y-values raised to the 1/n power, where n is the
difference between the x-values.
Let y-axis be located at the first x-value (let first x-value correspond to zero
on the x-axis).
Intercept (a) is then the first y-value.
Equation y' = ab^x can then be written and used to make forecasts
Confidence intervals cannot be computed for two-point forecasts.
Multiple Regression
More than one independent variable
Linear form: y' = a + bx1 + cx2 + dx3 + . . . (a coefficient for each variable)
Partial correlation coefficients and partial coefficients of determination
r1, r2, r3, . . . and r1^2, r2^2, r3^2, . . .
Terminology--explain each of the following:
forecasting (basic concept), dependent variable, independent variable, simple regression, multiple
regression, linear regression, intercept, slope, non-linear regression, exponential regression, causal
regression, time-series regression, correlation, positive correlation, negative correlation, k^2,
coefficient of non-determination, r^2, coefficient of determination, r, correlation coefficient, causation,
standard error of the slope, standard error of the intercept, standard error of estimate, standard
error of prediction, standard error of the correlation coefficient, initial or original variation, residual
variation, removed or explained variation, null hypothesis in regression, alternate hypotheses in
regression, simple interest, compound interest, compound growth factor, growth rate,
transformation, logarithm, natural logarithm, common logarithm, inverse logarithm, two-point
regression, multiple regression, partial correlation, cross-products, degrees of freedom, table-t,
calculated-t, signal-to-noise ratio
Skills and Procedures
• perform linear regression using the TI-83 and the spreadsheet <<REG>>, including
predictions, error factors, hypothesis tests, and evaluation of the degree of correlation
• perform exponential regression using the TI-83 and the spreadsheet <<REG>>, including
predictions, error factors, hypothesis tests and evaluation of the degree of correlation
• interpret, in nonmathematical terms, the intercept and slope in linear regression
• interpret, in nonmathematical terms, the intercept and growth factor in exponential
regression
• interpret the coefficients of nondetermination and determination in linear and exponential
regression
Concepts
• describe “intercept” as nonmathematically as possible
• describe “slope” as nonmathematically as possible
• describe “compound growth factor” as nonmathematically as possible
• explain the difference between simple regression and multiple regression
• explain the significance of the “sum of the squared deviations between the data points and their
mean”
• explain the significance of the “sum of the squared deviations between the data points and the
regression line”
• describe the relationship between the “coefficient of nondetermination” and the two items
immediately above
• describe the relationship between the “coefficient of nondetermination” and the “coefficient of
determination”
• identify the difference between linear growth and exponential growth in terms of what is
constant in each case
• explain why the demonstrated correlation between smoking and lung cancer does not prove
that smoking causes lung cancer
• describe the relationship among the three types of variation: “original,” “residual,” and
“explained” (or “removed”)
• explain the relationship between regression hypothesis-test results and the ability (advisability)
to make predictions
• in exponential growth, describe the relationship between the compound growth factor and the
growth rate
• describe how a regression line, straight or exponential, may be fitted between two data points
If the Ho is rejected:
“The correlation between _____ and _____ is statistically significant at the __ level.”
If the Ho is not rejected:
“The correlation between _____ and _____ is not statistically significant at the __ level.”
Two-point regression examples: A city’s population was 234,000 in 1995, and 683,000 in 2005.
What are the growth rates and forecasts for 2010?
Linear: The b-value is (683,000 - 234,000) / 10 = 44,900 people per year.
Equation is y’ = 234,000 + 44,900(x)
Forecast for 2010 is y’ = 234,000 + 44,900(15) = 907,500.
Exponential: The b-value is (683,000 / 234,000) ^ (1/10) = 1.113065 or 11.31% annual growth.
Equation is y’ = 234,000 * 1.113065 ^ x
Forecast for 2010 is y’ = 234,000 * 1.113065 ^ 15 = 1,166,872.
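The two-point example above can be reproduced with a short script. The function names are hypothetical; as in the outline, the y-axis is placed at the first x-value, so 1995 corresponds to x = 0 and 2010 to x = 15.

```python
def two_point_linear(x1, y1, x2, y2):
    """Return (a, b) for y' = a + b*x, with x measured from x1."""
    b = (y2 - y1) / (x2 - x1)        # slope: change in y per unit of x
    return y1, b                     # intercept is the first y-value

def two_point_exponential(x1, y1, x2, y2):
    """Return (a, b) for y' = a * b^x, with x measured from x1."""
    n = x2 - x1                      # number of periods between the points
    b = (y2 / y1) ** (1 / n)         # compound growth factor
    return y1, b

a_lin, b_lin = two_point_linear(1995, 234_000, 2005, 683_000)
a_exp, b_exp = two_point_exponential(1995, 234_000, 2005, 683_000)

x = 2010 - 1995
linear_forecast = a_lin + b_lin * x
exponential_forecast = a_exp * b_exp ** x

print("linear:", b_lin, round(linear_forecast))
print("exponential:", round(b_exp, 6), round(exponential_forecast))
```

This gives the same figures as the worked example: 44,900 people per year and 907,500 for the linear forecast, and a growth factor of about 1.113065 with a forecast of about 1,166,872 for the exponential one.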
PART FIVE -- Essentials--Nonparametric Statistics
Parameter: population characteristic
Nonparametric test: does not require any particular population characteristics
Advantage: no required population characteristics
Disadvantage: not as powerful as parametric tests (t-test, ANOVA)
Power: ability of a test to detect when Ho is false, and give the correct
conclusion (rejection of Ho)
Power of any test can be increased by increasing the sample size.
If two different tests are applied to the same data, the more powerful test will
produce a lower p-value.
Sign Test--for differences between population means, paired-difference design
Based on the binomial distribution
Ho: the population means are equal.
Ha: the population means are not equal (2-sided). 1-sided tests are also possible.
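Because the sign test is based on the binomial distribution, its p-value can be computed by counting plus and minus signs. The sketch below assumes the common convention that zero differences are dropped and that, under Ho, each remaining difference is equally likely to be positive or negative; the function name and the data are hypothetical.

```python
from math import comb

def sign_test_p_value(differences):
    """Two-sided sign-test p-value from paired differences (zeros dropped)."""
    plus = sum(1 for d in differences if d > 0)
    minus = sum(1 for d in differences if d < 0)
    n = plus + minus
    k = min(plus, minus)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical paired differences: nine positive, one negative.
p_value = sign_test_p_value([2, 3, 1, 4, 2, -1, 3, 2, 1, 5])
print(round(p_value, 4))
```

Here the p-value is 22/1024, about 0.0215, so Ho would be rejected at α = 0.05 but not at α = 0.01.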
Wilcoxon Signed-Rank Test--test for differences between population means,
paired-difference design
Ho: the population means are equal.
Ha: the population means are not equal (2-sided).
1-sided tests are also possible.
Procedure--see “notes”
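The outline defers the procedure to the class notes, so the following is only a sketch of the usual textbook approach, not necessarily the one in the notes: drop zero differences, rank the absolute differences (tied observations share the average rank), and sum the ranks separately for positive and negative differences. Function names and data are hypothetical.

```python
def average_ranks(values):
    """Rank values from smallest to largest; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1        # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def signed_rank_sums(differences):
    """Return (W+, W-): rank sums for positive and negative differences."""
    nonzero = [d for d in differences if d != 0]
    ranks = average_ranks([abs(d) for d in nonzero])
    w_plus = sum(r for d, r in zip(nonzero, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(nonzero, ranks) if d < 0)
    return w_plus, w_minus

# Hypothetical paired differences, including one zero and one tie.
w_plus, w_minus = signed_rank_sums([5, -2, 3, 0, -1, 4, 2])
print(w_plus, w_minus)
```

The smaller of the two sums is normally compared with a table value (or converted to z for larger samples); that step is left out here.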
Mann-Whitney "U" Test--test for differences between population means, unpaired
design
Ho: the population means are equal.
Ha: the population means are not equal (2-sided).
1-sided tests are also possible.
Procedure--see “notes”
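As the procedure is given only in the class notes, the sketch below shows one standard way to obtain the U statistics, by counting pairs; this is equivalent to the rank-sum formula but may not match the layout used in the notes. The function name and the data are hypothetical.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) for two independent (unpaired) samples."""
    # U_a counts the pairs in which the value from A exceeds the value
    # from B; a tied pair counts as one half.
    u_a = sum(1.0 if a > b else 0.5 if a == b else 0.0
              for a in sample_a for b in sample_b)
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b

# Hypothetical samples of four observations each.
u_a, u_b = mann_whitney_u([12, 15, 18, 20], [10, 11, 15, 13])
print(u_a, u_b)
```

The two values always add up to n1 × n2; the smaller one is normally compared with a table value or converted to z for larger samples.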
Runs Test (not in textbook)--test for independence in a series of binomial events
Procedure--see “notes”
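Since the Runs Test procedure is not in the outline itself, the sketch below uses the common large-sample normal approximation for the number of runs, which is an assumption about the method; with short sequences like this one the z value is only approximate. The function name and the sequence are hypothetical.

```python
import math

def runs_test_z(sequence):
    """Count runs in a two-symbol sequence and return (runs, expected, z)."""
    symbols = sorted(set(sequence))
    if len(symbols) != 2:
        raise ValueError("sequence must contain exactly two distinct symbols")
    n1 = sequence.count(symbols[0])
    n2 = sequence.count(symbols[1])
    n = n1 + n2
    # A new run starts at every change of symbol.
    runs = 1 + sum(1 for prev, cur in zip(sequence, sequence[1:]) if prev != cur)
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    return runs, expected, z

# Hypothetical series of binomial events: five H followed by five T.
runs, expected, z = runs_test_z("HHHHHTTTTT")
print(runs, expected, round(z, 3))
```

Far fewer runs than expected, as in this made-up sequence, is usually read as positive dependence (like events cluster together); far more runs than expected suggests negative dependence.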
Terminology--explain each of the following:
Sign Test--application, nonparametric, parametric, sign test, binomial distribution, paired-difference design,
power, ranking, tied observations, tied rankings, Wilcoxon Signed-Rank Test--application, Mann-Whitney “U”
Test--application, Runs Test--application, run, positive dependence, negative dependence, independence.
Skills and Procedures
• given appropriate data, conduct a Sign Test and interpret the results
• given appropriate data, conduct a Wilcoxon Signed-Rank Test and interpret the results
• given appropriate data, conduct a Mann-Whitney “U” Test and interpret the results
• given appropriate data, conduct a Runs Test and interpret the results
Concepts
• describe the advantage of nonparametric tests
• describe the disadvantage of nonparametric tests
• explain how the disadvantage of nonparametric tests may be overcome
• explain the theory of the sign test
• explain the concept of “power” and tell why nonparametric tests are generally less powerful than their
parametric equivalents
• describe what is meant by “randomness” in a series of binomial events
• describe what is meant by “positive dependence” in a series of binomial events