BENEDICTINE UNIVERSITY Course Outline MGT 251 STATISTICS II Spring, 2015 Text: Modern Business Statistics with Microsoft Office Excel, 5th edition, Anderson, Sweeney & Williams, South-Western/Cengage, 2015. ISBN: 978-1-285-43330-1 (hard cover) Other: Aplia interactive learning/assignment system. TI-83 or TI-84 calculator. Course Prerequisites: MATH 105 (Finite Math I) or MATH 110 (College Algebra) Instructor: Jeffrey M. Madura, 160 Scholl Hall B.A., University of Notre Dame M.B.A., Northwestern University C.P.A., State of Illinois Office Hours: Announced in class or see web page at www.ben.edu/faculty/jmadura/home.htm e-mail: jmadura@ben.edu Course Description: This is a course in introductory statistics. The orientation is toward applications and problem-solving, not mathematical theory. The instructor intends that students gain an appreciation for the usefulness of statistical methods in analyzing data commonly encountered in business and the social and natural sciences. The course is a framework within which students may learn the subject matter. This framework consists of a program of study, opportunity for questions/discussion, explanation, and evaluation (quizzes). The major topics are: Inferences About Population Variances Tests of Goodness of Fit and Independence Experimental Design and Analysis of Variance Simple Linear Regression Multiple Regression Nonparametric Methods The course addresses the following College of Business Program Objectives: Students in this program will receive a thorough grounding in Mathematics and Statistics. Your student evaluation of this course will be completed online using the IDEA system. This course emphasizes the following IDEA objectives: Learning fundamental principles, generalizations, or theories. Learning to apply course material to improve thinking, problem-solving, and decision-making. 
Developing specific skills, competencies and points of view needed by professionals in the fields most closely related to this course. Quizzes and Grades: The course is divided into five three-week parts, with a quiz at the end of each part. Dates are subject to change. Quiz 1: Feb. 5; Quiz 2: Feb. 26; Quiz 3: Mar. 26; Quiz 4: Apr. 16; Quiz 5: Finals Week. Quizzes will constitute 2/3 of your grade. The other 1/3 will be your score on assignments. Class participation may also be a factor. Grade requirements: A--90%, B--80%, C--60%, D--50%. There may also be other assignments requiring analysis of data using Excel, and there may be a term project, with weight equal to one quiz. It is the responsibility of any student who is unsure of the grading scale, course requirements, or anything else in this course outline to ask the instructor for clarification. Homework Assignments: There will be about 10 Aplia homework assignments. Due dates are listed in the Aplia system. The assignments will constitute 1/3 of the course grade. To accommodate the occasional instance when you cannot meet an Aplia deadline, the lowest assignment will be dropped. Assignments will be handled by Aplia. You will be required to access the Aplia website, which means you need to register for an account at: http://www.aplia.com Please register within 24 hours of the first class meeting. Please note: The computer is absolutely unforgiving about accepting late assignments. Time is kept at Aplia, and not by the computer you are working on. You may appeal grading decisions made by the computer, if you can demonstrate that an error has been made. Faculty have observed that the worst thing students do, in any course, is not think about course material every day. They sometimes let weeks go by and then try to learn all the material in one or two days. This usually does not work. The weekly assignments will require keeping up-to-date.
Calculators: Calculators will be required for the computational portion of each quiz. Bring your calculator to every class and verify each computation performed. The TI-83 is the standard for this course. Recommended Exercises: Students should work as many as possible of the even-numbered exercises in the text. Proficiency gained from practice on these will help when similar problems appear on quizzes. Answers to even-numbered exercises are at the back of the text. Attendance: Attendance will be taken occasionally and randomly. Frequent absences will be noticed, and they will have an adverse impact on quiz performance and your final grade. Two or more absences on days when quizzes are handed back will lower your grade by one letter grade. Missed Quizzes: Make-up quizzes will be given only if a quiz was missed for a good and documented reason. If a make-up is given, the quiz score will be reduced 20% in an effort to maintain some degree of fairness to those who took the quiz at the proper time. Use of Class Time: Come to class prepared to discuss the material assigned, and to contribute to the solution of the assigned problems. Special Needs: If you have a documented learning, psychological, or physical disability, you may be eligible for reasonable academic accommodations or services. To request accommodations or services, please contact Tina in the Student Success Center, 012 Krasa Student Center, 630-829-6512. All students are expected to fulfill essential course requirements. The University will not waive any essential skill or requirement of a course or degree program. Academic Honesty Policy: The search for truth and the dissemination of knowledge are the central mission of a university. Benedictine University pursues these missions in an environment guided by our Roman Catholic tradition and our Benedictine heritage. Integrity and honesty are therefore expected of all members of the community, including students, faculty members, administration, and staff.
Actions such as cheating, plagiarism, collusion, fabrication, forgery, falsification, destructions, multiple submission, solicitation, and misrepresentation, are violations of these expectations and constitute unacceptable behavior in the University community. The penalties for such actions can range from a private verbal warning, all the way to expulsion from the University. The University’s Academic Honesty Policy is available at http://www.ben.edu/AHP. In this course, academic honesty is expected of all class participants, myself included. If your name is on the work submitted, it is expected that you alone did the work. For example, in terms of quizzes, this means that copying from another paper, unauthorized collaboration of any sort, or the use of “cribs” of any kind is a breach of academic honesty. The penalties for a breach of academic honesty in this course are (1) a zero for the assignment or quiz for the first offense, and (2) an “F” for the course for a subsequent offense by the same person(s). Electronic Devices Policy: One aspect of being a member of a community of scholars is to show respect for others by the way you behave. Do your part to create or maintain an environment that is conducive to learning. Turn off your cell phone or set it to mute/silence before you enter class. If you use your cell phone or any other electronic device in any manner during a quiz, you will receive a zero for that test or quiz. Using the TI-83/84 calculator is permitted. Feel free to see me if there is anything else of concern to you. Your comments about this course or any course are always welcome and appreciated. The student is responsible for the information in the syllabus and should ask for clarification for anything in the syllabus about which they are unsure. 
COURSE PHILOSOPHY -- STATISTICS In an article in the Chronicle of Higher Education, Sharon Rubin, assistant dean at the University of Maryland, states that all course syllabi, in addition to providing the basic information on texts, topics, schedule, etc., should answer certain questions. The instructor of this course would like to share these questions with you, and provide some answers. You are what you know. You are what you can do. "What value can you add to our organization?" 1. WHY SHOULD A STUDENT WANT TO TAKE THIS COURSE? As a decision-maker, you must learn how to analyze and interpret quantitative information. Such skills will improve your ability to adopt the questioning attitude and independence of thought that are essential to leadership and success in any field. You may also have the opportunity to introduce statistical data analyses in areas where they are not currently in use, thus improving the quality of your organization's decisions. 2. WHAT IS THE RELEVANCE OF THIS COURSE TO THE DISCIPLINE? Statistics courses are part of the curriculum in many of BU's programs. But since this course is part of a program leading to a degree in business, let us interpret the word "discipline" in this question to mean "management." This can refer to marketing management, financial management, human resource management, etc., even the management of your personal affairs. To MANAGE something requires the ability to exert some CONTROL over it, and the ability to exert control requires identification of DEPENDENCIES. In order to manage sales performance, for example, you must find things upon which sales depends (e.g. advertising budget; product price; number, training, and compensation of salespersons; interest rates; and competitive factors), and learn something about the nature of the dependencies. Statistics is the major tool for identifying dependencies. Another example of the importance of identifying dependencies: a new disease appears. 
Researchers immediately try to find things that enhance the occurrence rate or the severity of the illness (positive dependencies), and things that reduce them (negative dependencies). Only after such things are found can there be any hope of controlling the disease. Again, statistical analysis plays a major role. Or, the objective may simply be to know more about how the world works. So-called "pure research" has no immediate application, but seeks to find relationships among things, thereby securing knowledge that may become useful in the future. CAREFUL STATISTICAL ANALYSIS OF DATA OFTEN RESULTS IN THE IDENTIFICATION OF DEPENDENCIES, and this is the reason why statistics is an important tool in virtually all disciplines. 3. HOW DOES THIS COURSE FIT INTO THE "GENERAL EDUCATION" PROGRAM? Statistics is a major way in which human beings learn about the world, and how to control it. To be familiar with a tool as fundamental and important as this is a responsibility of every educated person. Statistics can be viewed as applied quantitative logic, usually seeking to make inferences about unknown parameters on the basis of observations and measurements of samples drawn from a target population. The study of statistics can promote clear and careful thinking, enhance problem-solving skills, and strengthen one's ability to avoid premature conclusions. These are traits of the educated person, and are the mental qualities essential for "knowledge workers" in modern society. 4. WHAT ARE THE OBJECTIVES OF THE COURSE? The most important objective is the development of your ability to learn this kind of material on your own, and to continue learning more about the subject after the course is over. Continuous and independent learning is an important activity of every successful person. In connection with the objective of independent learning, the instructor will expect students to study and learn certain topics in the course without formal discussion of them in class. 
Questions on these topics, of course, are always welcomed and encouraged. With respect to specific objectives, they are: that students learn the terminology, theory, principles, and computational procedures related to basic descriptive and inferential statistics; and the careful cultivation of the logical processes involved in statistical inference. This will enable students to understand statistics and communicate statistical ideas using generally-accepted terminology. Another important objective is that students become aware of the limitations of various statistical procedures. This is particularly important since most students in this course will be consumers rather than providers of statistical information and conclusions. Estimates and forecasts, for example, are generally regarded with too much faith, and relied upon to a degree not warranted in light of their inherent limitations. 5. WHAT MUST STUDENTS DO TO SUCCEED IN THIS COURSE? Your activities in this course should include: reading and studying the relevant sections of the text; attending class and taking notes; rewriting, reviewing, and studying your notes; working the recommended exercises in the text; practicing and experimenting with various spreadsheet files supplied by the instructor; asking and answering questions in class; spending time just thinking about the procedures and their underlying logic; forming a study group with other students to review notes on terminology and concepts, and to practice problem-solving skills; and taking the quizzes. These activities should help you to further develop your abilities to read, listen, record, and organize important information; and to communicate, analyze, compute, and learn independently the subject matter of statistics. In order to do well, students must recognize a basic difference between courses like statistics and courses like history, philosophy, management or organizational strategy. 
In the latter type, the emphasis is often on general ideas in broad contexts, with grades based on essay exams and term papers in which students have considerable latitude to choose what they are going to discuss. The cogent expression and defense of well-reasoned opinion are highly valued. Students with good verbal, logical and writing skills often excel in this type of course. Statistics, on the other hand, is a skills course, requiring precise knowledge of concepts, terminology, and computational procedures. Verbal skills are still important, but now quantitative logic and computational competence are also critical. Grades are based on knowledge of terminology and concepts, and even more on the ability to get the right answers to problems. Regarding study strategy, it is extremely important for most students to read about statistics, to think about statistics and to do a few problems every day. The most common error is to neglect the material until shortly before a quiz. But for most students, many of the concepts in statistics are new and strange, and there will be many places where they are stopped cold: "What?" "I just don't get this!" Then there is no time left to cultivate the understanding of new concepts and to refine the computational procedures. Anyone can learn statistics, but most cannot do it overnight. As with most courses, this course is organized with the most fundamental material coming first. In learning a new language, or how to play a musical instrument, or any new set of skills, mastery of the basics is essential to success later on. The subject matter of statistics is not like history, where, if you did not study 14th century France, it probably did not affect your learning about 17th century England. In statistics, failure to obtain a good understanding of earlier material will have a serious adverse effect on your ability to make sense out of what comes later.
It is therefore essential to build a solid foundation of fundamental knowledge early in the course in order to support the more elaborate logical and computational structures involved later. 6. WHAT ARE THE PREREQUISITES FOR THE COURSE? The primary prerequisite is a logical mind. This course is computational, but it is not a "math" course. Mathematical theorems are not derived or proven; the need to solve equations is very rare. The emphasis is on concrete applications rather than abstract theory. Some students with good math backgrounds have done poorly, while others with little or no math experience have done very well. The best MBA stats student I ever had was a philosophy major who did not have a single math course at the college level. When asked about this, he replied: "My philosophy major gave me excellent training in logic, and that's really what this course requires." 7. OF WHAT IMPORTANCE IS CLASS PARTICIPATION? In this course, class participation means frequently asking relevant questions and supplying answers (right or wrong) to the instructor's and colleagues' questions as problems and examples are worked out and discussed. These behaviors are evidence of active involvement with the material and will result in better learning and an automatic positive effect on your grade. In grade border-line cases, a history of active participation will enable the instructor to award the higher grade to the deserving student. 8. WILL STUDENTS BE GIVEN ALTERNATIVE WAYS TO ACHIEVE SUCCESS, BASED ON DIFFERENT LEARNING STYLES? Different learning styles do exist. Some prefer a deductive method (deriving specific knowledge from general principles), while others tend to prefer an inductive method (deriving the generalities from examples). The inductive learners may need to work a number of problems before seeing the patterns that are present. The deductive learners may never need to work a problem--they will know instinctively what to do.
Some will not like the book, and will learn primarily from the class presentations and discussions, while others will learn mostly from the book and will find class time to be of lesser importance. But the intended outcomes are the same for all--those in number 4 above. 9. WHAT IS THE PURPOSE OF THE ASSIGNMENTS? Problems from the text may be suggested, for the purpose of providing practice in analyzing what must be done, and in performing the required computations. Even though computer software is available to perform calculations, students can gain insight into the logical structure of a sequence of computational steps if they go through them several times by hand (i.e. using simple calculators). Computer assignments using instructor-supplied spreadsheet files will require students to become more familiar with spreadsheet software that they probably are or will be using in connection with their work. More importantly, the spreadsheets allow students to experiment with data in order to investigate the quantitative relationships involved. Such experimentation would be too tedious and time-consuming for manual or even calculator computation. 10. WHAT WILL THE TESTS TEST? -- MEMORY? UNDERSTANDING? ABILITY TO SYNTHESIZE? TO PRESENT EVIDENCE LOGICALLY? TO APPLY KNOWLEDGE IN A NEW CONTEXT? The tests will test your ability to recognize and use statistical terminology correctly, and they will test your understanding of the logic and principles underlying various statistical procedures. In addition, you will have to demonstrate your ability to solve problems similar to those discussed in class, sometimes using computer spreadsheet files. There is a place for memorization in learning. It is not a substitute for comprehension, but it is better than getting something wrong on a quiz that you were expected to know. As with prayers among small children, memorization is often a first step, eventually followed by understanding. 
But if the memorization (of terminology, for example) is not done, it is less likely that the comprehension will ever occur. 11. WHY HAS THIS PARTICULAR TEXT BEEN CHOSEN? Our text is one of the most widely adopted introductory statistics books. It has gone through several editions, and its popularity remains high. It is relatively easy to read, and its exercise material is excellent. 12. WHAT IS THE RELATIONSHIP BETWEEN KNOWLEDGE LEVEL AND GRADES? Consider this hypothetical but realistic situation.

Knowledge     Percentage Grade
              Course A     Course B
  100%          100%         100%
   90%           90%          81%
   80%           80%          64%
   70%           70%          49%
   60%           60%          36%
   50%           50%          25%
   40%           40%          16%
   30%           30%           9%
   20%           20%           4%
   10%           10%           1%

Course A might be like philosophy, history, or management, where the grade is more-or-less proportional to knowledge level. Course B might be like statistics or other skills courses, where small deficiencies in knowledge can have disastrous effects on results. Overstudying is the best strategy for coping with this, with the dual payoffs of higher grades and, more importantly, greater knowledge.
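The Course B column above is consistent with the grade being the square of the knowledge level (for example, 0.9 × 0.9 = 0.81). That reading is an inference from the numbers, not something the outline states, and the function names below are illustrative only. A minimal sketch that regenerates both columns under that assumption:

```python
def course_a_grade(knowledge):
    """Course A: grade proportional to knowledge (both as fractions of 1)."""
    return knowledge


def course_b_grade(knowledge):
    """Course B: grade modeled as knowledge squared (an assumed reading of the table)."""
    return knowledge ** 2


# Print knowledge levels from 100% down to 10% with both hypothetical grades.
for pct in range(100, 0, -10):
    k = pct / 100
    print(f"{pct:>3}%  A: {course_a_grade(k):>4.0%}  B: {course_b_grade(k):>4.0%}")
```

Under this model a 10-point drop in knowledge near the top costs about 19 grade points in Course B, which may be the point of the comparison.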
Course percentage by quiz average and homework (HW) average
Weights: QUIZ 0.667, HW 0.333

                                      HW
QUIZ     10     20     30     40     50     60     70     80     90     95    100
 100   70.0   73.4   76.7   80.0   83.4   86.7   90.0   93.3   96.7   98.3    100
  95   66.7   70.0   73.4   76.7   80.0   83.3   86.7   90.0   93.3   95.0   96.7
  90   63.4   66.7   70.0   73.4   76.7   80.0   83.3   86.7   90.0   91.7   93.3
  85   60.0   63.4   66.7   70.0   73.3   76.7   80.0   83.3   86.7   88.3   90.0
  80   56.7   60.0   63.4   66.7   70.0   73.3   76.7   80.0   83.3   85.0   86.7
  75   53.4   56.7   60.0   63.3   66.7   70.0   73.3   76.7   80.0   81.7   83.3
  70   50.0   53.4   56.7   60.0   63.3   66.7   70.0   73.3   76.7   78.3   80.0
  65   46.7   50.0   53.3   56.7   60.0   63.3   66.7   70.0   73.3   75.0   76.7
  60   43.4   46.7   50.0   53.3   56.7   60.0   63.3   66.7   70.0   71.7   73.3
  55   40.0   43.3   46.7   50.0   53.3   56.7   60.0   63.3   66.7   68.3   70.0
  50   36.7   40.0   43.3   46.7   50.0   53.3   56.7   60.0   63.3   65.0   66.7
  45   33.3   36.7   40.0   43.3   46.7   50.0   53.3   56.7   60.0   61.7   63.3
  40   30.0   33.3   36.7   40.0   43.3   46.7   50.0   53.3   56.7   58.3   60.0
  30   23.3   26.7   30.0   33.3   36.7   40.0   43.3   46.7   50.0   51.6   53.3
  20   16.7   20.0   23.3   26.7   30.0   33.3   36.7   40.0   43.3   45.0   46.6

I can use Excel to
  perform basic computations
  prepare tables
  create charts and graphs
  conduct common statistical procedures
  create dashboards
I can use Word to create various kinds of documents
I can
  compute
    means
    medians
    variances
    standard deviations
    confidence intervals for means and proportions
  use the
    binomial distribution to answer probability questions
    normal distribution to answer probability questions
    chi-square distribution to answer probability questions
    F distribution to answer probability questions
  conduct hypothesis tests on
    the means of one group or two
    the proportions of one group or two
    hypothetical vs. observed distributions
    variances of one group or two
    group means using ANOVA
  use regression analysis to examine correlation and make forecasts
I can perform financial analysis
  compute the NPV of various investment opportunities
  decide between using debt or equity to raise new funds
  determine the optimum mix of debt and equity financing
  compute cost-of-capital
  decide whether to make or buy components for our products
  determine how much direct labor, direct materials, and overhead is going into our products
  create cash budgets
  conduct cost-volume-profit analyses
  prepare a master budget
  prepare performance reports using standard costs and variances
  employ the scientific method to study problems that may come up

PART TWO -- Essentials--Analysis of Enumerative Data
Enumerate: to count, usually after classification has been performed
Enumerative data: data obtained by classifying and counting occurrences
Multinomial experiment--like the binomial experiment, except each trial has more than two outcomes
  n identical trials; k possible outcomes on each trial
  Independence--the outcome of one trial does not affect the outcome of any other trial
  Constant probabilities for each outcome from trial to trial
  p1, p2, p3, . . ., pk are the probabilities of the various outcomes
  Cell counts (number of times each outcome occurs) are the variables to be analyzed
Chi-square (χ²) distribution: continuous, positively skewed
One-dimensional chi-square test--“goodness of fit” tests
  H0: that a population conforms to some expected distribution.
  A cell consists of an expectation (E) and an observation (O).
  Expected values (E) are derived from H0.
  The number of cells is denoted by k.
  Calculated chi-square (test statistic, χ²c) for a cell is the squared deviation (E-O)² divided by E. The χ²c is the total of all the cells.
  Degrees of freedom (df): the number of cells minus one (k-1)
    (d.f. = k - 3 when the normal distribution is used.)
    (d.f. = k - 2 when the Poisson distribution is used.)
  H0 is rejected if χ²c ≥ χ²t, also if p ≤ α.
  If H0 is rejected, additional information should be reported as to the nature of the deviation from the expected distribution.
  Often used to test for normal distributions.
  For the sample size to be sufficient, the expected number (E) in each cell should equal or exceed 5.
Two-dimensional chi-square test
  H0: in the population the row variable and column variable are independent.
  Ha: in the population the row variable and column variable are dependent.
  Contingency (dependency) table contains a matrix of cells
  A cell consists of an expectation (E) and an observation (O).
  Expected values (E) are derived from H0 using the multiplication rule for intersections of independent events: P(A ∩ B) = P(A) * P(B).
  Calculated chi-square for a cell is (E-O)² / E (same as above).
  H0 is rejected if χ²c ≥ χ²t, also if p ≤ α.
  Degrees of freedom: number of rows minus one, times number of columns minus one; (r-1)(c-1) where r and c are the numbers of rows and columns
  If H0 is rejected, additional information should be reported as to the nature of the dependencies.
  For the sample size to be sufficient, the expected number (E) in each cell should equal or exceed 5.
Terminology--explain each of the following: enumerative data, multinomial experiment, binomial experiment, identical trials, independence, one-dimensional or one-way chi-square test, “goodness-of-fit” test, two-dimensional or two-way chi-square test, dependency, contingency table, multiplication rule for intersections of independent events.
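The one-way and two-way chi-square computations described in this part can be sketched in a few lines. This is an illustration only: the function names and the sample counts are made up for the example, and the critical values (5.991 for df = 2 and 3.841 for df = 1, both at the .05 level) are standard chi-square table entries supplied by hand rather than computed.

```python
def chi_square_goodness_of_fit(observed, expected_probs):
    """One-way test: E for each cell is n times the probability stated in H0."""
    n = sum(observed)
    expected = [n * p for p in expected_probs]
    chi2 = sum((e - o) ** 2 / e for e, o in zip(expected, observed))
    df = len(observed) - 1  # k - 1
    return chi2, df, expected


def chi_square_independence(table):
    """Two-way test: E = row total * column total / n (multiplication rule)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    expected = []
    for i, row in enumerate(table):
        expected_row = []
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            expected_row.append(e)
            chi2 += (e - o) ** 2 / e
        expected.append(expected_row)
    df = (len(table) - 1) * (len(table[0]) - 1)  # (r-1)(c-1)
    return chi2, df, expected


# Hypothetical one-way data: 100 observations, H0 probabilities .25, .50, .25.
gof_chi2, gof_df, gof_expected = chi_square_goodness_of_fit([30, 50, 20], [0.25, 0.50, 0.25])
print(gof_chi2, gof_df, gof_chi2 >= 5.991)   # 2.0 2 False -> H0 not rejected

# Hypothetical 2 x 2 contingency table of observed counts.
ind_chi2, ind_df, ind_expected = chi_square_independence([[20, 30], [30, 20]])
print(ind_chi2, ind_df, ind_chi2 >= 3.841)   # 4.0 1 True -> H0 rejected
```

Both examples keep every expected count at 25, so the sample-size guideline (E of at least 5 in each cell) is met.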
Skills and Procedures
  given appropriate data, conduct a one-way chi-square test and interpret the results
  given appropriate data, conduct a two-way chi-square test and interpret the results
Concepts
  describe what is meant by “goodness-of-fit”
  explain how expected values are determined in a one-way chi-square test
  explain how the concept of “deviation” applies in chi-square test computations
  explain how expected values are determined in a two-way chi-square test
  describe the application of the “multiplication rule for independent events” in two-way chi-square analysis
If the H0 is rejected:
  One-Way: “The differences between the observations and the expectations are statistically significant at the ______ level. The population probably does not conform to the expected distribution.” (You should say more about the nature of the differences between the observations and the expectations.)
  Two-Way: “There is statistically significant dependence between ______ and ______ at the ______ level.” (Give more information about the dependencies.)
If the H0 is not rejected:
  One-Way: “The differences between the observations and the expectations are not statistically significant at the ______ level. The population could conform to the expected distribution.”
  Two-Way: “The dependence between ______ and ______ is not statistically significant at the ______ level.”

PART THREE -- Essentials--Analysis of Variance (ANOVA)
Purpose: To test for differences between/among two or more population means.
  H0: μ1 = μ2 = μ3 . . .; Population means are all equal.
  Ha: not μ1 = μ2 = μ3 . . .; Population means are not all equal.
  Note that Ha is not "all the population means are different."
  Rejection of H0 means that there is a statistically significant difference between at least two of the sample means.
  Interval estimation of population means and differences between population means is also possible.
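As a rough illustration of how the test of H0: μ1 = μ2 = μ3 is carried out, the sketch below computes the one-way ANOVA quantities that this part goes on to define (TSS, SST, SSE, MST, MSE, and the calculated F). The three small groups are made-up data, the function name is illustrative, and the critical value 5.14 is the tabled F for 2 and 6 degrees of freedom at the .05 level, entered by hand.

```python
def one_way_anova(groups):
    """Return the one-way ANOVA table quantities for a list of groups."""
    all_values = [x for g in groups for x in g]
    n = len(all_values)          # number of observations
    k = len(groups)              # number of treatments
    grand_mean = sum(all_values) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group variation (treatments) and within-group variation (error).
    sst = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    tss = sum((x - grand_mean) ** 2 for x in all_values)

    mst = sst / (k - 1)          # signal: between-group variance
    mse = sse / (n - k)          # noise: within-group variance
    return {"TSS": tss, "SST": sst, "SSE": sse,
            "df_treatments": k - 1, "df_error": n - k, "df_total": n - 1,
            "MST": mst, "MSE": mse, "F": mst / mse}


# Hypothetical data: three treatments, three observations each.
anova = one_way_anova([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
print(anova)
print("Reject H0:", anova["F"] >= 5.14)   # F = 21.0, so H0 is rejected
```

Note that TSS (48) equals SST (42) plus SSE (6), which is a convenient check on hand computations.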
Sums of squared deviations
  TSS--total sum of squared deviations
  SST--sum of squared deviations for treatments (between-group variation)
  SSE--sum of squared deviations for error (within-group variation)
  TSS = SST + SSE
Means of squared deviations--recall that a variance is a mean of squared deviations.
  MST--mean of squared deviations for treatments (between-group variance)
  MSE--mean of squared deviations for error (within-group variance)
Signal-to-noise analogy
  Signal: between-group variance, MST
  Noise: within-group variance, MSE
  The more false H0 is (the larger the differences between/among population means), the larger MST will be relative to MSE.
ANOVA table--standardized way of presenting computations and results
  Calculated F (test statistic, Fc) is MST / MSE
  Total degrees of freedom: the number of observations minus one
  Degrees of freedom for treatments: number of treatments minus one
  Degrees of freedom for error: the number of observations minus the number of treatments
  When there are only two groups and a t-test could be used, the Fc will be equal to the square of the tc.
  Reject H0 if Fc ≥ Ft and if p ≤ α.
Four assumptions (same as t-tests of chapter 9)
  Samples: random, independent
  Populations: normally distributed, equal variances
  Moderate departures from the assumptions will not seriously affect validity (robust)
One-way ANOVA--completely randomized design
Two-way ANOVA--randomized block design
  TSS = SST + SSB + SSE (B = "blocks")
  Two calculated F's: treatments FT = MST / MSE and blocks FB = MSB / MSE
  Total degrees of freedom: the number of observations minus one
  Degrees of freedom for treatments: the number of treatments minus one
  Degrees of freedom for blocks: the number of blocks minus one
  Degrees of freedom for error: the number of observations minus the number of treatments, minus the number of blocks, plus one
Estimation in One-Way ANOVA
  tt in the following equations is based on the number of degrees of freedom for error.
  Single population mean: μ = x̄ ± tt σ̂(x̄), where σ̂(x̄) = √(MSE / n)
  Difference between two population means: (μ1 - μ2) = (x̄1 - x̄2) ± tt σ̂(x̄1 - x̄2), where σ̂(x̄1 - x̄2) = √(MSE × (1/n1 + 1/n2))
Estimation in two-way ANOVA (randomized block design)
  Two-way ANOVA estimation -- valid only for differences between population means. Confidence intervals cannot be obtained for individual treatment means.
  tt in the following equations is based on the number of degrees of freedom for error.
  Difference between two population means: (μ1 - μ2) = (x̄1 - x̄2) ± tt σ̂(x̄1 - x̄2), where σ̂(x̄1 - x̄2) = √(MSE × (1/n1 + 1/n2))
Three-way analysis of variance: "Latin square" design
Terminology--explain each of the following: TSS--total sum of squared deviations, SST--sum of squared deviations for treatments (between-group variation), SSE--sum of squared deviations for error (within-group variation), variance, MST--mean of squared deviations for treatments (between-group variance), MSE--mean of squared deviations for error (within-group variance), signal-to-noise ratio, ANOVA table, calculated F (MST / MSE), degrees of freedom (treatments, blocks, error), four assumptions (same as t-tests of chapter 9), robust test--moderate departures from the assumptions will not seriously affect validity, completely randomized design, randomized block design, "Latin square" design
Skills and Procedures
  given appropriate data, conduct a one-way ANOVA and interpret the results; include all possible 95% confidence intervals
  given appropriate data, conduct a two-way ANOVA and interpret the results; include all possible 95% confidence intervals
Concepts
  explain why, when ANOVA deals with tests on means, it is called “analysis of variance”
  explain the “signal-to-noise ratio” concept in the context of ANOVA
  describe the shortcoming that ANOVA shares with small-sample t-tests
  show where the variances are found in the ANOVA table
If the H0 is rejected: “The difference between at least two of the sample means of the
__________ is statistically significant at the α level. The population means are probably not all equal.”
If the H0 is not rejected: “The differences among the sample means of the __________ are not statistically significant at the α level. All the population means could be equal.”

PART FOUR -- Essentials--Linear Regression and Correlation
Major purpose in business: forecasting
  In order for forecasting to be possible, the future must, in some way, be like the past.
  Forecasting methods seek to identify relationships from the past, and use them to predict the future (assuming that the identified relationship will persist).
  Finding relationships is a way of identifying dependencies.
Dependent variable--one to be predicted
Independent variable--one used to make the prediction
Types of regression
  Based on the number of independent variables
    Simple regression--one predictor or independent variable (x). E.g. y = a + bx
    Multiple regression--two or more predictor or independent variables (x1, x2, . . . , xn). E.g. y = a + bx1 + cx2 + dx3 + ex4
  Based on the type of regression line
    Linear:
      y = a + bx: a = y-intercept; b = slope
      or y = mx + b: b = y-intercept; m = slope
      or y = β0 + β1x: β0 = y-intercept; β1 = slope
      Slope is the coefficient (multiplier) of x, no matter what symbol is used or where it appears in the equation.
      Slope is the change in y for a one-unit change in x. Usually regarded as the single most important result in regression, because it describes the nature of the relationship between y and x.
      In multiple regression, each independent variable has its own slope and its own
      Intercept is the other value, also known as the "constant". Intercept is the value of y when x = 0.
    Non-linear (curved):
      exponential, e.g. y = ab^x or y = 35(1.06)^x
      logarithmic, e.g. y = a log x or y = 3.2 log x
      power, e.g. y = ax^b or y = 60x^5
      trigonometric, e.g. y = a sin x or y = 3.7 sin x
      etc.
      Over a restricted range (relevant range) a curve can be approximated with a straight line
  Based on the nature of the suspected relationship between y and x
    Causal regression: x may be an actual cause of y, or x may be related to something else that is a cause of y
    Time series regression--popular in business and economics
      Time is the independent (x) variable, used to substitute for the actual causes of y.
      In time series, it is often better to use less historical data rather than more.
        The future is likely to be more like the recent past than the more distant past.
        With less data, x is closer to x-bar (see below).
Correlation--the degree of "relatedness" between dependent and independent variables
  Types of correlation
    positive: dependent variable increases as the independent variable increases
    negative: dependent variable decreases as the independent variable increases
    none: no apparent relationship between dependent variable and independent variable
  Measures of correlation
    Coefficient of non-determination, k^2--always positive--range, 0 to 1
      If there is perfect correlation, k^2 is equal to zero. If there is no correlation, k^2 is equal to one.
    Coefficient of determination, r^2, equal to 1 - k^2--always positive--range, 0 to 1
      If there is perfect correlation, r^2 is equal to one. If there is no correlation, r^2 is equal to zero.
    Correlation coefficient, r, the square root of r^2--positive or negative, depending on the type of correlation--range -1 to +1
    Note: ρ (rho) and ρ^2 are the population parameters corresponding to r and r^2
  Correlation and causation
    The presence of correlation does not, in itself, prove that x causes y.
    Three things necessary to prove causation
      Statistically significant correlation between the effect, y, and the alleged cause, x.
      Alleged cause, x, must be present before or at the same time as the effect, y.
      Explanation must be found as to how x causes y.
Prediction errors--five standard errors (sampling standard deviations)
  Standard error of the slope, σ_b
    Measure of uncertainty regarding the slope of the regression line
    Used to find confidence interval for the slope: β = b ± t_t σ_b
    Note: β is the population slope, estimated by b.
  Standard error of the intercept, σ_a
    Measure of uncertainty regarding the intercept of the regression line
    Used to find confidence interval for the intercept: α = a ± t_t σ_a
    Note: α is the population intercept, estimated by a.
  Standard error of estimate, σ_d, and standard error of prediction, σ_pred
    Measures of uncertainty regarding predictions
    Used in finding confidence interval for predictions: y = y' ± t_t σ_pred
    Predictions have the least uncertainty when the value of x is near x-bar.
  Standard error of the correlation coefficient, σ_r
    Measure of uncertainty regarding the correlation coefficient
Types of variation in regression
  Initial or original variation
    Sum of the squared deviations between the data y-values and the mean of the y-values -- Σ(y - y-bar)^2
  Residual variation
    Sum of the squared deviations between the data y-values and the predicted y-values -- Σ(y - y')^2
  Removed or explained variation
    Initial variation minus residual variation
  k^2 is the ratio of residual variation to original variation, Σ(y - y')^2 / Σ(y - y-bar)^2.
  r^2 is the ratio of removed variation to original variation.
Hypothesis testing in regression
  Ho: No correlation (relationship) between y and x. ρ = 0 or ρ^2 = 0 or β = 0
  Ha: Correlation between y and x (two-sided)
      Positive correlation between y and x (one-sided)
      Negative correlation between y and x (one-sided)
  Reject Ho if t_c > t_t (when n is small) or if z_c > z_t (when n is large). When n is small, df = (n - 2)
  Reject Ho if p < α (hypothesis-test α, not intercept α)
  If Ho is not rejected, there is no statistically significant correlation between x and y. The regression equation should not be used--just use y-bar to predict y, or don't make a prediction at all.
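The three types of variation, the k^2 and r^2 ratios, and the calculated t for the slope can be traced on a small data set. The following is a minimal Python sketch, not the course's <<REG>> spreadsheet or the TI-83 routine. The function name, the five data points, and the use of the usual textbook formulas (standard error of estimate = square root of residual variation / (n - 2); standard error of the slope = standard error of estimate / square root of Σ(x - x-bar)^2) are assumptions made for illustration.

```python
import math

def simple_linear_regression(xs, ys):
    """Least-squares fit of y' = a + b*x, plus the variation measures in the outline."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept

    # Types of variation
    original = sum((y - y_bar) ** 2 for y in ys)
    residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    explained = original - residual

    k2 = residual / original     # coefficient of non-determination
    r2 = explained / original    # coefficient of determination
    r = math.copysign(math.sqrt(max(r2, 0.0)), b)  # r takes the sign of the slope

    # Assumed textbook formulas for two of the standard errors
    se_estimate = math.sqrt(residual / (n - 2))
    se_slope = se_estimate / math.sqrt(sxx)
    t_calc = b / se_slope        # compare with table-t at df = n - 2

    return {"a": a, "b": b, "original": original, "residual": residual,
            "explained": explained, "k2": k2, "r2": r2, "r": r,
            "se_estimate": se_estimate, "se_slope": se_slope,
            "t_calc": t_calc, "df": n - 2}

# Hypothetical data, chosen only to keep the arithmetic easy to check by hand
result = simple_linear_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

For these five points the original variation is 6, the residual variation is 2.4, and the explained variation is 3.6, so k^2 = 0.4 and r^2 = 0.6. The calculated t would then be compared with the table-t for df = 3 to decide whether Ho is rejected.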
Exponential regression (not in the textbook)
  Linear vs. exponential growth
    Simple interest--example of linear growth
      Interest is paid only on the initial deposit
      E.g. $1,000 deposited today at 5% is worth $1,000 + $50(x) after x years.
      $1,000 is the intercept (value of y today, when x = 0).
      $50 is the slope (change in y each year (5% of $1,000)).
      The slope, $50, is constant.
    Compound interest--example of exponential growth
      Interest paid not only on the initial deposit, but also on previously-earned interest.
      E.g. $1,000 deposited today at 5% is worth $1,000(1.05)^x after x years
      $1,000 is the intercept (value of y today, when x = 0)
      1.05 is the growth factor (b), which is equal to 1 + the growth rate (r)
      b = 1 + r and r = b - 1
      In the above example r = 0.05 (5%) and b = 1.05
      The slope is not constant, but increases as x increases.
  Exponential equation: y'_exp = a(b)^x
    a = y-intercept; b = compound growth factor
    Growth rate r = b - 1, and compound growth factor b = 1 + r
  "b" values compared:
    Linear: y = a + b(x)
      b < 0 negative correlation
      b = 0 no correlation (y = intercept a, regardless of value of x)
      b > 0 positive correlation
    Exponential: y = a(b)^x
      b < 1 negative correlation
      b = 1 no correlation (y = intercept a, regardless of value of x)
      b > 1 positive correlation
  Exponential regression computations
    Procedure is based on the fact that if y is an exponential function of x, then ln y (or log y) is a linear function of x
    That is, if y = a(b)^x, then ln y = a' + b'(x) or log y = a'' + b''(x). (The three "a" and "b" values in the above equations are different.)
    Procedure
      Transform the y-values into the lns (or logs) of the y-values.
        Math review
          The logarithm of a number is the power to which a base number must be raised in order to give the original number
          Natural logarithms use the number e (2.718281828...) as the base.
            ln 25 is 3.218876 because e^3.218876 is 25
            ln 100 is 4.605170 because e^4.605170 is 100
          Common logarithms use the number 10 as the base
            log 25 is 1.397940 because 10^1.397940 is 25
            log 100 is 2 because 10^2 is 100
      Perform linear regression analysis on the lns (or logs) of the y-values.
        Result is a linear equation for predicting the ln (or log) of y
        ln y' = a' + b'x or log y' = a'' + b''x
      Determine a and b values in y' = a(b)^x
        a is the inverse ln of a' (or the inverse log of a'')
        b is the inverse ln of b' (or the inverse log of b'')
        Inverse ln of z = e^z (or Inverse log of z = 10^z)
    Confidence intervals in exponential forecasting
      Intervals are first computed for ln (or log) of y', then are converted to LCL and UCL values using inverse lns (or logs)
Two-point regression--linear and exponential--quick forecasts (see examples at end of outline)
  Linear
    Slope (b) is the difference between y-values divided by the difference between x-values.
    Let y-axis be located at the first x-value (let first x-value correspond to zero on the x-axis).
    Intercept (a) is then the first y-value.
    Equation y' = a + bx can then be written and used to make forecasts
  Exponential
    Growth factor (b) is the ratio of the y-values raised to the 1/n power, where n is the
    Let y-axis be located at the first x-value (let first x-value correspond to zero on the x-axis).
    Intercept (a) is then the first y-value.
    Equation y' = a(b)^x can then be written and used to make forecasts
  Confidence intervals cannot be computed for two-point forecasts.
Multiple Regression
  More than one independent variable
  Linear form: y' = a + bx1 + cx2 + dx3 + . . . (a coefficient for each variable)
  Partial correlation coefficients and partial coefficients of determination
    r_1, r_2, r_3, . . . and r_1^2, r_2^2, r_3^2, . . .
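The exponential regression procedure described earlier in this part (transform y to ln y, fit a straight line, then take inverse lns) can be sketched in a few lines of Python. This is an illustrative sketch only, not the course's <<REG>> spreadsheet or the TI-83 routine. The function name and the data are hypothetical; the y-values were generated from y = 100(1.5)^x so that the recovered a and b are easy to recognize.

```python
import math

def exponential_regression(xs, ys):
    """Fit y' = a * b**x by linear regression on ln(y). All y-values must be positive."""
    ln_ys = [math.log(y) for y in ys]      # step 1: transform y into ln y

    # step 2: ordinary least-squares line ln y' = a_prime + b_prime * x
    n = len(xs)
    x_bar = sum(xs) / n
    ln_y_bar = sum(ln_ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (ly - ln_y_bar) for x, ly in zip(xs, ln_ys))
    b_prime = sxy / sxx
    a_prime = ln_y_bar - b_prime * x_bar

    # step 3: inverse ln (e**z) recovers a and b
    a = math.exp(a_prime)
    b = math.exp(b_prime)
    return a, b

# Hypothetical data generated from y = 100 * 1.5**x
xs = [0, 1, 2, 3, 4]
ys = [100 * 1.5 ** x for x in xs]

a, b = exponential_regression(xs, ys)
growth_rate = b - 1                        # r = b - 1
forecast = a * b ** 5                      # prediction for x = 5

print(f"a = {a:.4f}, b = {b:.4f}, growth rate = {growth_rate:.2%}")
print(f"forecast at x = 5: {forecast:.3f}")
```

Because the data lie exactly on an exponential curve, the fit returns a = 100 and b = 1.5 (a 50% growth rate). With real data the points would scatter around the fitted curve, and confidence limits would be computed on the ln scale before converting back.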
Terminology--explain each of the following:
  forecasting (basic concept), dependent variable, independent variable, simple regression, multiple regression, linear regression, intercept, slope, non-linear regression, exponential regression, causal regression, time-series regression, correlation, positive correlation, negative correlation, k^2, coefficient of non-determination, r^2, coefficient of determination, r, correlation coefficient, causation, standard error of the slope, standard error of the intercept, standard error of estimate, standard error of prediction, standard error of the correlation coefficient, initial or original variation, residual variation, removed or explained variation, null hypothesis in regression, alternate hypotheses in regression, simple interest, compound interest, compound growth factor, growth rate, transformation, logarithm, natural logarithm, common logarithm, inverse logarithm, two-point regression, multiple regression, partial correlation, cross-products, degrees of freedom, table-t, calculated-t, signal-to-noise ratio
Skills and Procedures
  perform linear regression using the TI-83 and the spreadsheet <<REG>>, including predictions, error factors, hypothesis tests, and evaluation of the degree of correlation
  perform exponential regression using the TI-83 and the spreadsheet <<REG>>, including predictions, error factors, hypothesis tests, and evaluation of the degree of correlation
  interpret, in nonmathematical terms, the intercept and slope in linear regression
  interpret, in nonmathematical terms, the intercept and growth factor in exponential regression
  interpret the coefficients of nondetermination and determination in linear and exponential regression
Concepts
  describe “intercept” as nonmathematically as possible
  describe “slope” as nonmathematically as possible
  describe “compound growth factor” as nonmathematically as possible
  explain the difference between simple regression and multiple regression
  explain the significance of
  the “sum of the squared deviations between the data points and their mean”
  explain the significance of the “sum of the squared deviations between the data points and the regression line”
  describe the relationship between the “coefficient of nondetermination” and the two items immediately above
  describe the relationship between the “coefficient of nondetermination” and the “coefficient of determination”
  identify the difference between linear growth and exponential growth in terms of what is constant in each case
  explain why the demonstrated correlation between smoking and lung cancer does not prove that smoking causes lung cancer
  describe the relationship among the three types of variation: “original,” “residual,” and “explained” (or “removed”)
  explain the relationship between regression hypothesis-test results and the ability (advisability) to make predictions
  in exponential growth, describe the relationship between the compound growth factor and the growth rate
  describe how a regression line, straight or exponential, may be fitted between two data points
If the Ho is rejected: “The correlation between _____ and _____ is statistically significant at the __ level.”
If the Ho is not rejected: “The correlation between _____ and _____ is not statistically significant at the __ level.”

Two-point regression examples:
A city’s population was 234,000 in 1995, and 683,000 in 2005. What are the growth rates and forecasts for 2010?
  Linear:
    The b-value is (683,000 - 234,000) / 10 = 44,900 people per year.
    Equation is y’ = 234,000 + 44,900(x)
    Forecast for 2010 is y’ = 234,000 + 44,900(15) = 907,500.
  Exponential:
    The b-value is (683,000 / 234,000) ^ (1/10) = 1.113065 or 11.31% annual growth.
    Equation is y’ = 234,000 * 1.113065 ^ x
    Forecast for 2010 is y’ = 234,000 * 1.113065 ^ 15 = 1,166,872.
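The two-point arithmetic in the city-population example can be checked with a short Python sketch. The function names are hypothetical; the figures are the ones given in the example, with 1995 treated as x = 0.

```python
def two_point_linear(x1, y1, x2, y2):
    """Return (a, b) for y' = a + b*x, with the y-axis placed at the first x-value."""
    b = (y2 - y1) / (x2 - x1)   # slope: difference in y over difference in x
    a = y1                      # intercept: the first y-value
    return a, b

def two_point_exponential(x1, y1, x2, y2):
    """Return (a, b) for y' = a * b**x, with the y-axis placed at the first x-value."""
    n = x2 - x1                 # 10 years in the city example
    b = (y2 / y1) ** (1 / n)    # growth factor: ratio of y-values to the 1/n power
    a = y1                      # intercept: the first y-value
    return a, b

# City population: 234,000 in 1995 and 683,000 in 2005; forecast 2010 (x = 15)
a_lin, b_lin = two_point_linear(1995, 234_000, 2005, 683_000)
linear_forecast = a_lin + b_lin * 15

a_exp, b_exp = two_point_exponential(1995, 234_000, 2005, 683_000)
exponential_forecast = a_exp * b_exp ** 15

print(f"linear: slope = {b_lin:,.0f} per year, 2010 forecast = {linear_forecast:,.0f}")
print(f"exponential: growth factor = {b_exp:.6f}, 2010 forecast = {exponential_forecast:,.0f}")
```

The gap between the two forecasts (roughly 907,500 versus 1.17 million) shows how quickly the linear and exponential assumptions diverge once the forecast moves beyond the two known points.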
PART FIVE -- Essentials--Nonparametric Statistics
Parameter: population characteristic
Nonparametric test: does not require any particular population characteristics
  Advantage: no required population characteristics
  Disadvantage: not as powerful as parametric tests (t-test, ANOVA)
  Power: ability of a test to detect when Ho is false, and give the correct conclusion (rejection of Ho)
    Power of any test can be increased by increasing the sample size.
    If two different tests are applied to the same data, the more powerful test will produce a lower p-value.
Sign Test--for differences between population means, paired-difference design
  Based on the binomial distribution
  Ho: the population means are equal. Ha: the population means are not equal (2-sided). 1-sided tests are also possible.
Wilcoxon Signed-Rank Test--test for differences between population means, paired-difference design
  Ho: the population means are equal. Ha: the population means are not equal (2-sided). 1-sided tests are also possible.
  Procedure--see “notes”
Mann-Whitney "U" Test--test for differences between population means, unpaired design
  Ho: the population means are equal. Ha: the population means are not equal (2-sided). 1-sided tests are also possible.
  Procedure--see “notes”
Runs Test (not in textbook)--test for independence in a series of binomial events
  Procedure--see “notes”
Terminology--explain each of the following:
  Sign Test--application, nonparametric, parametric, sign test, binomial distribution, paired-difference design, power, ranking, tied observations, tied rankings, Wilcoxon Signed-Rank Test--application, Mann-Whitney “U” Test--application, Runs Test--application, run, positive dependence, negative dependence, independence.
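The binomial basis of the Sign Test described above can be illustrated with a short Python sketch. It assumes the usual form of the test: zero differences are dropped, and under Ho each remaining paired difference is equally likely to be positive or negative (p = 0.5). The function name and the ten data pairs are hypothetical, and the course "notes" may lay out the procedure differently.

```python
from math import comb

def sign_test_two_sided(first, second):
    """Two-sided sign test on paired data; returns (plus, minus, p_value)."""
    # Keep only the non-zero paired differences (ties carry no sign)
    diffs = [s - f for f, s in zip(first, second) if s != f]
    n = len(diffs)
    plus = sum(1 for d in diffs if d > 0)
    minus = n - plus

    # Under Ho the number of plus signs is binomial with n trials and p = 0.5.
    # Tail probability of a count as small as the smaller of the two sign counts:
    k = min(plus, minus)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n

    p_value = min(1.0, 2 * tail)   # double the tail for a 2-sided test
    return plus, minus, p_value

# Hypothetical paired measurements (e.g. before and after)
before = [10, 12, 9, 14, 11, 13, 8, 15, 10, 12]
after = [12, 13, 12, 13, 13, 17, 9, 13, 13, 13]

plus, minus, p_value = sign_test_two_sided(before, after)
print(f"plus = {plus}, minus = {minus}, p-value = {p_value:.6f}")
```

Here 8 of the 10 differences are positive, giving a two-sided p-value of about 0.109, so Ho would not be rejected at α = 0.05. Because the test uses only the signs of the differences and not their sizes, it is less powerful than its parametric equivalent.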
Skills and Procedures
  given appropriate data, conduct a Sign Test and interpret the results
  given appropriate data, conduct a Wilcoxon Signed-Rank Test and interpret the results
  given appropriate data, conduct a Mann-Whitney “U” Test and interpret the results
  given appropriate data, conduct a Runs Test and interpret the results
Concepts
  describe the advantage of nonparametric tests
  describe the disadvantage of nonparametric tests
  explain how the disadvantage of nonparametric tests may be overcome
  explain the theory of the sign test
  explain the concept of “power” and tell why nonparametric tests are generally less powerful than their parametric equivalents
  describe what is meant by “randomness” in a series of binomial events
  describe what is meant by “positive dependence” in a series of binomial events