Quantitative Techniques – STUDY GUIDE
Damelin©

BACHELOR OF COMMERCE (GENERIC)
MODULE: QUANTITATIVE TECHNIQUES
STUDY GUIDE 2021

Copyright © Educor 2020
All rights reserved. No part of this publication may be reproduced, distributed, or transmitted in any form or by any means, including photocopying, recording, or other electronic or mechanical methods, without the prior written permission of Educor Holdings. Individuals found guilty of copyright infringement will be prosecuted and will be held liable for damages.

1 Table of Contents

1 About Damelin
2 Our Teaching and Learning Methodology
   2.1 Icons
3 Introduction to the Module
   3.1 Module Information
   3.2 Module Purpose
   3.3 Outcomes
   3.4 Assessment
   3.5 Planning Your Studies / Resources Required for this Module
4 Prescribed Reading
   4.1 Prescribed Book
   4.2 Recommended Articles
   4.3 Recommended Multimedia
5 Module Pacing
   5.1 Week 1: Statistics in Management
      5.1.1 Introduction
      5.1.2 Statistics in Management
      5.1.3 The Terminology of Statistics
      5.1.4 Components of Statistics
      5.1.5 Statistical Applications in Management
      5.1.6 Statistics and Computers
      5.1.7 Data and Data Quality
      5.1.8 Data Types
      5.1.9 Data Sources
      5.1.10 Self-Assessment
   5.2 Week 2: Summarising Data: Summary Tables and Graphs
      5.2.1 Introduction
      5.2.2 Summarising Categorical Data
      5.2.3 Summarising Numeric Data
      5.2.4 Self-Assessment
   5.3 Week 3: Describing Data: Numeric Descriptive Statistics
      5.3.1 Introduction
      5.3.2 Non-central Location Measures
      5.3.3 Measures of Dispersion
      5.3.4 Measure of Skewness
      5.3.5 The Box Plot
      5.3.6 Self-Assessment
   5.4 Week 4: Basic Probability Concepts
      5.4.1 Introduction
      5.4.2 Types of Probability
      5.4.3 Properties of a Probability
      5.4.4 Basic Probability Concepts
      5.4.5 Calculating Objective Probabilities
      5.4.6 Probability Rules
      5.4.7 Probability Trees
      5.4.8 Permutations and Combinations
      5.4.9 Self-Assessment
   5.5 Week 5: Probability Distributions
      5.5.1 Introduction
      5.5.2 Types of Probability Distribution
      5.5.3 Discrete Probability Distributions
      5.5.4 Binomial Probability Distribution
      5.5.5 Poisson Probability Distribution
      5.5.6 Continuous Probability Distribution
      5.5.7 Normal Probability Distribution
      5.5.8 Standard Normal (z) Probability Distribution
      5.5.9 Self-Assessment
   5.6 Week 6: Confidence Interval Estimation
      5.6.1 Introduction
      5.6.2 Point Estimation
      5.6.3 Confidence Interval Estimation
      5.6.4 Confidence Interval for a Single Population Mean: Sample Standard Deviation Known, n Large (n > 30)
      5.6.5 The Precision of a Confidence Interval
      5.6.6 The Student t-distribution
      5.6.7 Confidence Interval for a Single Population Mean (μ) when the Population Standard Deviation (σ) is Unknown
      5.6.8 Confidence Interval for the Population Proportion (π)
      5.6.9 Self-Assessment
   5.7 Week 7: Hypothesis Tests – Single Population (Proportions & Means)
      5.7.1 Introduction
      5.7.2 The Process of Hypothesis Testing
      5.7.3 Hypothesis Test for a Single Population Mean (μ) – Population Standard Deviation (σ) Known
      5.7.4 Hypothesis Test for a Single Population Mean (μ) – Population Standard Deviation (σ) Unknown
      5.7.5 Hypothesis Test for a Single Population Proportion (π)
      5.7.6 The p-value Approach to Hypothesis Testing
      5.7.7 Self-Assessment
   5.8 Week 8: Simple Linear Regression and Correlation Analysis
      5.8.1 Introduction
      5.8.2 Simple Linear Regression
      5.8.3 Scatter Plot
      5.8.4 Correlation Analysis
      5.8.5 The Coefficient of Determination (r²)
      5.8.6 Self-Assessment
   5.9 Week 9: Time Series Analysis: A Forecasting Tool
      5.9.1 Introduction
      5.9.2 The Components of a Time Series
      5.9.3 Decomposition of a Time Series
      5.9.4 Trend Analysis
      5.9.5 Seasonal Analysis
      5.9.6 Uses of Time Series Indicators
      5.9.7 Self-Assessment
6 References

1 About Damelin

Damelin knows that you have dreams and ambitions. You're thinking about the future and how the next chapter of your life is going to play out. Living the career you've always dreamed of takes some planning and a little bit of elbow grease, but the good news is that Damelin will be there with you every step of the way. We've been helping young people to turn their dreams into reality for over 70 years, so rest assured, you have our support. As South Africa's premier education institution, we're dedicated to giving you the education experience you need, and have proven our commitment in this regard with a legacy of academic excellence that has produced over 500 000 world-class graduates! Damelin alumni are redefining industry in fields ranging from Media to Accounting and Business, from Community Service to Sound Engineering. We invite you to join this storied legacy and write your own chapter in Damelin's history of excellence in achievement.

A Higher Education and Training (HET) qualification provides you with the necessary step in the right direction towards excellence in education and professional development.

2 Our Teaching and Learning Methodology

Damelin strives to promote a learning-centred and knowledge-based teaching and learning environment. Teaching and learning activities primarily take place within academic programmes and guide students to attain specific outcomes.
• A learning-centred approach is one in which not only lecturers and students, but all sections and activities of the institution work together in establishing a learning community that promotes a deepening of insight and a broadening of perspective with regard to learning and its application.
• An outcomes-oriented approach implies that the following categories of outcomes are embodied in the academic programmes:
• Culminating outcomes that are generic, with specific reference to the critical cross-field outcomes, including problem identification and problem-solving, co-operation, self-organisation and self-management, research skills, communication skills, entrepreneurship and the application of science and technology.
• Empowering outcomes that are specific, i.e. the context-specific competencies students must master within specific learning areas and at specific levels before they exit or move to the next level.
• Discrete outcomes of community service learning to cultivate discipline-appropriate competencies.

Damelin actively strives to promote a research culture within which a critical-analytical approach and competencies can be developed in students at undergraduate level. Damelin accepts that students' learning is influenced by a number of factors, including their previous educational experience, their cultural background, their perceptions of particular learning tasks and assessments, as well as discipline contexts.

Students learn better when they are actively engaged in their learning rather than when they are passive recipients of transmitted information and/or knowledge.
A learning-oriented culture that acknowledges individual student learning styles and diversity, and focuses on active learning and student engagement with the objective of achieving deep learning outcomes and preparing students for lifelong learning, is seen as the ideal. These principles are supported through the use of an engaged learning approach that involves interactive, reflective, cooperative, experiential, creative or constructive learning, as well as conceptual learning via online-based tools.

Effective teaching-learning approaches are supported by:

• Well-designed and active learning tasks or opportunities to encourage a deep rather than a surface approach to learning.
• Content integration that entails the construction, contextualisation and application of knowledge, principles and theories rather than the memorisation and reproduction of information.
• Learning that involves students building knowledge by constructing meaning for themselves.
• The ability to apply what has been learnt in one context to another context or problem.
• Knowledge acquisition at a higher level that requires self-insight, self-regulation and self-evaluation during the learning process.
• Collaborative learning in which students work together to reach a shared goal and contribute to one another's learning at a distance.
• Community service learning that leads to collaborative and mutual acquisition of competencies in order to ensure cross-cultural interaction and societal development.
• Provision of resources such as information technology and digital library facilities of a high quality to support an engaged teaching-learning approach.
• A commitment to give effect to teaching-learning in innovative ways and the fostering of digital literacy.
• Establishing a culture of learning as an overarching and cohesive factor within institutional diversity.
• Teaching and learning that reflect the reality of diversity.
• Taking multiculturalism into account in a responsible manner that seeks to foster an appreciation of diversity, build mutual respect and promote cross-cultural learning experiences that encourage students to display insight into and appreciation of differences.

2.1 Icons

The icons below act as markers that will help you make your way through the study guide.

Additional Information: All supplementary and recommended learning resources
Announcements: Important announcements made via myClass
Assessments: Continuous and summative assessments
Audio Material: Audio recordings and podcasts
Calculator: Activities that require calculation and equation-based solutions
Case Study: Working examples of concepts and practices
Chat: A live chat with your Online Academic Tutor
Discussion Forum: Topic to be explored in the weekly discussion forum
Glossary: Learning activity centred on building a module glossary
Group Assignment: Assignments to be completed with peers
Help: Instructions on how to receive academic support and guidance
Individual Assignment: Assignments to be completed individually
Lesson Material: Learning content in myClass as per the units below
Module Information: Important information regarding your module, such as outcomes, credits, assessment and textbooks
Module Welcome: A welcome in myClass to introduce you to the module and important module information
Outcomes: Learning outcomes you will meet at the end of a section or module
Survey: A poll, feedback form or survey to complete
Practice: Indicates an activity for you to practise what you've learnt
Lesson/Virtual Class: Virtual class links available via myClass
Quote: A thought, quote or important statement from a thought leader in the specialist field
Reading: Prescribed reading material and module textbooks
Revision: Questions and activities that will support your module revision
Self-Assessment Quiz: Weekly quizzes to complete to self-measure whether you have a complete understanding of the lesson material
Shout Out | Example: Examples and highlights to contextualise the learning material, critical concepts and processes
Lesson Material: Indicates sections of learning material in myClass
Thinking Point: A question, problem or example posed to you for deeper thinking, interrogation and reflection
Time: The allocated time required per week, unit and module, related to the module credit structure as per your factsheet
Video: Additional videos, video tutorials, desktop capture/screen recordings and other audiovisual supplementary material
Vocabulary: Important words and their definitions that aid the development of your specialist vocabulary

3 Introduction to the Module

Welcome to Quantitative Techniques. This course is Business Statistics in nature, and it is part of every management education programme offered today by academic institutions and business schools. Statistics provides evidence-based information, which makes it an important decision support tool in management. Although students are encouraged to use this guide, it must be used in conjunction with the other prescribed and recommended texts.
3.1 Module Information

Qualification title: Bachelor of Commerce (Generic)
Module title: Quantitative Techniques
NQF level: 7
Credits: 10
Notional hours: 100

3.2 Module Purpose

The purpose of this module is to instil a critical thinking and analytical mindset for making decisions in business and management settings. This will give learners the ability to logically analyse sets of related issues and come up with an informed decision.

3.3 Outcomes

At the end of this module, you should be able to:

• Describe the role of Statistics in management decision making and the importance of data in statistical analysis.
• Describe the meaning of, and be able to calculate, the mean, confidence intervals for the mean, standard deviation, standard error, median, interquartile range and mode.
• Summarise tables (pivot tables) and graphs, providing a broad overview of the profile of random variables and identifying the location, spread and shape of the data.
• Understand the basic concepts of probability to help a manager understand and use probabilities in decision making.
• Describe and make use of the probability distributions that occur most often in management situations, which describe patterns of outcomes for both discrete and continuous events.
• Review the different methods of sampling and the concept of the sampling distribution.
• Describe the concept of interval estimation.
• Describe hypothesis testing and construct the null and alternative hypotheses.
• Describe the normal distribution, Student's t-distribution, binomial distribution, Poisson distribution and the F distribution, and apply them to testing hypotheses.
• Describe the concepts of multiple correlation and regression.
• Discuss the necessity of including statistical planning in research design.
• Test the difference between correlated and uncorrelated sample means using a t-test for two means and analysis of variance for several means.
• Express the relationship between two variables by regression and calculate their correlation.
• Analyse the dependence of one variable upon another by regression.
• Solve well-defined but unfamiliar problems using correct procedures and appropriate evidence.
• Describe time series analysis, using a statistical approach to quantify the factors that influence and shape time series data, and apply it to making forecasts of future levels of activity of the time series variables.

3.4 Assessment

You will be required to complete both formative and summative assessment activities.

Formative assessment: These are activities you will do as you make your way through the course. They are designed to help you learn about the concepts, theories and models in this module. This could be through case studies, practice activities, self-check activities, study group / online forum discussions and think points. You may also be asked to blog / post your responses online.

Summative assessment: These are formal assessments that measure your achievement of the module outcomes. You are required to do two individual assignments, online multiple-choice questions, and an online exam.

Mark allocation
The marks are derived as follows for this module:

Individual Assignment 1: 20%
Individual Assignment 2: 20%
Online Multiple-Choice Questions: 10%
Online Exam: 50%
TOTAL: 100%

3.5 Planning Your Studies / Resources Required for this Module

What equipment will I need?

• Access to a personal computer and the internet.
• A scientific calculator: Casio FX-82ZA Plus or Sharp WriteView EL-W506.

4 Prescribed Reading

4.1 Prescribed Book

Wegner, T. (2016). Applied Business Statistics: Methods and Excel-based Applications. Cape Town: Juta. ISBN 978-1-48511-193-1.

4.2 Recommended Articles

Please refer to the additional resources that are mentioned throughout the various weeks.

4.3 Recommended Multimedia

Please refer to the video resources that are mentioned throughout the various weeks.

5 Module Pacing

Week  Topic                                                        Study Guide Unit  Textbook Chapter
1     Statistics in Management                                     1                 1
2     Summarising Data: Summary Tables and Graphs                  2                 2
3     Describing Data: Numeric Descriptive Statistics              3                 3
4     Basic Probability Concepts                                   4                 4
5     Probability Distributions                                    5                 5
6     Confidence Interval Estimation                               6                 7
7     Hypothesis Tests – Single Population (Proportions & Means)   7                 8
8     Simple Linear Regression and Correlation Analysis            8                 9
9     Time Series Analysis: A Forecasting Tool                     9                 10
10    Revision 1                                                                     1 to 2
11    Revision 2                                                                     3 to 5
12    Revision 3                                                                     6 & 8
13    Revision 4                                                                     9 & 10

The weekly topics for the semester (2020) follow the same sequence as above; after the four revision weeks, the semester concludes with the exam week.

Each unit should be thought of as a "week of content". If a unit is larger, it can be split over two weeks, but a week should be a capsule or episode of learning that can include consolidating learning activities. As such, each unit is required to have a prescribed amount of learning activities and engagements. These are to be embedded within each week and not listed at the end of the unit. PLEASE SEE THE EXAMPLE OF UNIT CONTENT.

Prescribed Learning Activities and Engagements:

Video Content: At least ONE video resource is to be in each subsection of content. This is to be embedded within the content at the appropriate time as per the learning design.

Podcast: At least ONE podcast is to be in each unit. The podcast must be seen as supplementary to the learning content, and if a podcast is not available on the specific topic at hand, an adjunct concept/topic can be used that will broaden the student's general knowledge of the subject matter. Podcasts should not be selected if they are only available on streaming websites that require a subscription.

Thinking Point: At least ONE thinking point should be used within each subsection of content as a way to pause the movement through content and provide the chance for the student to think and concretise their learning or what they have just read. A thinking point may be a hypothetical, a personal reflection or a question regarding the content within a different context (application). A thinking point must be thorough and engaging enough to draw pause and focus from the student.
Case Studies: A case study should be within each unit and can be used in any relevant subsection of content. The case study should be robust enough for the student to understand how to apply something, or to see how a function, tool, theory or practice may work in a real-world environment. A case study should be seen as a way for the student to be reflected in the learning experience, and as such it is advised that case studies are selected from local/Afrocentric contexts and illustrate our commitment to intersectionality within our teaching and learning approach and philosophy.

Discussion Forum: Each unit of study / each week will require at least ONE discussion forum topic. This can either be embedded within a certain section of content or placed at the end of the unit content, depending on the requirements of the module as per the subject matter. The discussion forum topic/question should be robust and dense enough for the student to be engaged, and a reference must be made to the fact that the discussion forum topic is live and available within the module page on myClass.

Example/Practice: These are to be used within each section that deals with applied learning, that is, the application of a process, technique, equation or function. The example is to be used when an example of a problem and a solution is provided, and the practice is to be used when a problem is provided for the student to solve.

Vocabulary: Vocabulary is to be used within each subsection of content where an important word, term or definition is provided that students are to take note of.

Glossary: The glossary is an LMS activity function and can be inserted into a guide where the development of a glossary is required and necessary for the module.
This is to be used mainly within NQF 5 modules, as it speaks to the specific level descriptors of that module.

Additional Resource: Each subsection of content must have at least THREE additional resources. These can be supplementary articles and journals, or mixed/multimedia content such as a respected blog, social media account, news site, music video or audio recording. The additional resource must be provided by the study guide author if it is an "attachment" that will require loading into the LMS.

Prescribed Reading: Each subsection must refer to a page, section or chapter in the prescribed reading for the module. The prescribed reading should indicate to the student where to locate the texts from which the subsection has been summarised or written. This may be placed at the start of the subsection, or at the appropriate point where a student must leave the study guide/LMS and read through a text section in the prescribed reading.

Quote: Each subsection of content should have at least ONE quote that is from a thought leader in the field, or that contextualises a section of learning for the student. The quote must not be inserted as a graphic but as plain text with the appropriate graphic alongside it.

Self-Assessment Quiz: Each unit/week will have a self-assessment quiz for the student. Within the study guide, the author can refer to the self-assessment as per the below, but must stress that the self-assessment will be live in the module's myClass page for completion.
Referencing

5.1 WEEK 1: STATISTICS IN MANAGEMENT

Purpose
The purpose of this unit is to introduce the common terms, notations and concepts in statistical analysis.

Learning Outcomes
By the end of this week, you will be able to:
• Define the term 'management decision support system'
• Explain the difference between data and information
• Explain the basic terms and concepts of Statistics and provide examples
• Recognise the different symbols used to describe statistical concepts
• Explain the different components of Statistics
• Identify some applications of statistical analysis in business practice
• Distinguish between qualitative and quantitative random variables
• Explain and illustrate the different types of data
• Identify the different sources of data
• Discuss the advantages and disadvantages of each form of primary data collection
• Explain how to prepare data for statistical analysis.

Time
It will take you 5 hours to make your way through this study week.

Reading
Wegner, T. (2016). Applied Business Statistics: Methods and Excel-based Applications. Juta: Cape Town, South Africa. ISBN 978-1-48511-193-1. Chapter 1.

5.1.1 Introduction
This week focuses on describing what statistics is and the role it plays in management and decision making. The importance of data in statistics is also discussed, and some basic statistical terms and concepts are explained.

Let us Watch!

5.1.2 Statistics in Management
In academic institutions, business schools and management colleges worldwide, business statistics is part of every programme offered today. The term statistics can take on a variety of meanings. It is frequently used to describe data of any sort – mass, pressure, height, weight, stock prices, batting averages, GPA, temperature, etc. Other people may connect the term to the results of surveys, polls and questionnaires.
In our study, we will use the term statistics primarily to designate a specific academic discipline focused on methods of data collection, analysis and presentation. In virtually all cases, statistics is concerned with the transformation of data into information. Black (2011)

Management Decision Making
For a manager, decision making is one of the most crucial aspects of the job. Decisions are made on all business activities, such as what to sell, how to sell, how much to buy, which market to target, which equipment to buy, whether certain types of goods are of acceptable quality, where to locate stores so as to maximise profits, and whether girls buy more of a particular product than boys. Therefore, a well-informed decision based on quality information needs to be made. Wegner (2016)

Information
In order to make sound and viable business decisions, managers need high-quality information. Information must be relevant, adequate, timeous, accurate and easy to access. Information is organised (collected, collated, summarised, analysed and presented) data values that are meaningful and can be used to make business decisions. Most often, information is not readily available in the formats required by decision makers. Wegner (2016)

Data
Data consists of individual values, for instance observations or measurements on an issue, e.g. R400.50, 5 days, 70 metres, 'strongly agree', etc. Data is readily available but carries little useful and usable information for decision makers.
Wegner (2016)

Statistics
Statistics is a set of mathematically based methods and techniques which transform small or large sets of raw (unprocessed) data into a few meaningful summary measures. These measures may exhibit relationships and show patterns and trends, and therefore contain very useful and usable information to support sound decision making – whether we are sorting out the day's stock quotations to make a more informed investment decision, or unscrambling the latest market research data so we can respond better to customer needs and wants. The understanding and use of statistics gives managers confidence and quantitative reasoning skills that enhance decision-making capabilities and provide an advantage over colleagues who do not possess them. Black (2013)

Transformation process from data to information:

INPUT (Data) → PROCESS (Statistical Analysis) → OUTPUT (Information) → BENEFIT (Management decision making)

Source: Wegner (2016)

Statistics supports the decision process by strengthening the quantifiable basis from which a well-informed decision can be made.

Figure 1.1 Key Statistical Elements
Source: Black (2013)

Let us watch
Watch the YouTube link below and write a brief essay on why you think statistics is important.
https://www.youtube.com/watch?v=yxXsPc0bphQ

5.1.3 The terminology of Statistics
Some essential terms and concepts are:

Random Variable: any attribute or characteristic being measured or observed. It takes different values at each point of measurement, e.g. years of experience of an employee.

Data: the real values or outcomes drawn from a random variable, e.g. the years of experience of employees might be (2, 4, 1, 3, 2, 6).
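The terminology can be made concrete with a short computation. Below is a minimal Python sketch (standard library only) that treats the experience data just given as a sample and derives summary statistics from it; the "more than 5 years" proportion is an invented illustration:

```python
import statistics

# Data drawn from the random variable 'years of experience of an employee'
sample = [2, 4, 1, 3, 2, 6]

n = len(sample)                          # sample size, n
x_bar = statistics.mean(sample)          # sample mean, x-bar
s = statistics.stdev(sample)             # sample standard deviation, s
s2 = statistics.variance(sample)         # sample variance, s squared

# Sample proportion, p: share of employees with more than 5 years' experience
p = sum(1 for x in sample if x > 5) / n

print(n, x_bar, round(s, 3), p)
```

Each computed value is a sample statistic; the corresponding (unknown) population parameters would be written with Greek letters, as shown in Table 1.2 below.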
See below some examples of random variables and related data:
• Travel distances of delivery vehicles (data: 22 km, 18 km and 29 km)
• Daily occupancy rates of hotels in Pretoria (data: 34%, 48% and 34%)
• Duration a machine spends working (data: 13 min, 21 min and 18 min)
• Brand of washing powder preferred (data: Sunlight, OMO and Ariel).

Sampling Unit: the item being measured, observed or counted with respect to the random variable under study, e.g. employees.

Population: represents every possible item that contains a data value (measurement or observation) of the random variable under study. The sampling units should possess the characteristics that are relevant to the problem, e.g. all employees of Damelin Pretoria City.

Population Parameter: the actual value of a random variable in a population. It is derived from all data values on the random variable in the population, and it is constant. E.g. about 57% of MTN employees have more than 5 years' experience.

Sample: a subset of items drawn from a population, e.g. employees in the Finance department of MTN.

Sample Statistic: a value of a random variable derived from sample data. It is NOT constant, as its value always depends on the values included in each sample drawn.

Table 1.1 Examples of populations and associated samples

Random variable | Population | Sampling unit | Sample
Size of bank overdraft | All current accounts with Absa | An Absa client with a current account | 400 randomly selected clients' current accounts
Mode of daily commuter transport to work | All commuters to Cape Town's central business district (CBD) | A commuter to Cape Town's CBD | 600 randomly selected commuters to Cape Town's CBD
TV programme preferences | All TV viewers in Gauteng | A TV viewer in Gauteng | 2 000 randomly selected TV viewers in Gauteng
Age of students at a college | All students at Damelin College | A registered student at Damelin College | 1 000 randomly selected registered Damelin students

Source: Wegner (2016)

Table 1.2 Symbolic Notation for Samples and Populations

Statistical Measure | Sample Statistic | Population Parameter
Mean | x̄ | μ
Standard deviation | s | σ
Variance | s² | σ²
Size | n | N
Proportion | p | π
Correlation | r | ρ

Source: Wegner (2016)

5.1.4 Components of Statistics
Statistics has three components, namely:
• Descriptive Statistics – condenses large volumes of data into summary measures. It seeks to paint a picture of a management problem scenario.
• Inferential Statistics – generalises sample findings to the broader population. It extends the sample findings to the actual population.
• Statistical Modelling – builds relationships between variables to make predictions. It uses equations to explain variables and to estimate or predict values of one or more of the variables under different management scenarios.

5.1.5 Statistical Applications in Management
Recent examples that show the importance of statistics in decision making:
• According to an electronics wholesale survey, the average amount spent by a shopper on computer accessories in a two-month period is R7 280 at the Game store, R5 040 at Makro, R2 460 at PnP Hyper, R6 720 at the Incredible store and R4 200 at President Hyper.
• A Job Mail survey of 1 275 workers reports that 45% of workers believe that the quality of their work is perceived the same when they work remotely as when they are physically in the office.
• A KPMG Retail "Blue" survey of 1 860 adults revealed that 44% agreed that plastic, non-compostable shopping bags should be banned.

From these few examples, it is clear that there is a wide variety of uses and applications of statistics in business, e.g.:
• Finance
• Marketing
• Human Resources
• Operations/Logistics
• Economics

5.1.6 Statistics and Computers
The invention of computers has opened many new opportunities for statistical analysis. A computer allows for the storage, retrieval and transfer of large data sets. Some widely used statistical techniques, such as multiple regression, are so tedious and cumbersome to compute manually that they were of little practical use to researchers before computers were developed. Widely used statistical software packages include R, Minitab, SAS and SPSS. Wegner (2016)

Case Study
The link below takes you through various statistical packages that are used in industry. Identify three statistical packages that are of interest to you and carry out brief research, writing up what features they offer.
https://www.youtube.com/watch?v=hHywVkLwLzg

5.1.7 Data and Data Quality
To this point, we have used the term data fairly loosely. In statistics, the term refers specifically to facts or figures that are subject to summarisation, analysis and presentation. A data set is a collection of data having some common connection. Data can be either numeric or non-numeric. Numeric data are data expressed as numbers; non-numeric data are represented in other ways, often with words or letters. Telephone numbers and golf scores are examples of numeric data; nationalities and nicknames are non-numeric. Data is the raw material of statistical analysis. If the quality of data is poor, the quality of information derived from statistical analysis of this data will also be poor, and user confidence in the statistical findings will consequently be low. A useful acronym to keep in mind is GIGO, which stands for 'garbage in, garbage out'.
It is of great importance to understand the influences on the quality of data needed to produce meaningful and reliable statistical results. Data is used to plan business activities and to make business decisions. Black (2013)

Data should be of good quality, and the quality of data depends on three aspects:
• Type of data
• Source of data
• Data collection method

5.1.8 Data Types

Classification 1: categorical versus numeric

Table 1.3 Categorical Data

Random variable | Categories | Codes
Gender | Female; Male | 1; 2
Country of origin | Angola; Botswana | 1; 2

Source: Developer's compilation

• Categorical data (qualitative): data representing categories of outcomes of a random variable, e.g. gender.
• Numeric data (quantitative): real numbers that can be manipulated using arithmetic operations to produce meaningful results.

Classification 2: Nominal, Ordinal, Interval and Ratio Scales of Measurement

Enormous amounts of numerical data are gathered in businesses every day, representing myriad items. For instance, numbers represent the rand costs of items produced, geographical locations of retail outlets, weights of shipments, and rankings of subordinates at yearly reviews. All this data should not be analysed the same way statistically, because the entities represented by the numbers are different. Business researchers therefore need to know the level of data measurement represented by the numbers being analysed. Black (2013)

Four common levels of data measurement follow:
1. Nominal
2. Ordinal
3. Interval
4. Ratio

Nominal is the lowest level of data measurement, followed by ordinal, interval and ratio. Ratio is the highest level of data measurement.

Figure 1.2 Hierarchy of Levels of Data
Source: Black (2013)

• Nominal-scaled data: used for categorical data of equal importance, e.g.
gender – male or female.
• Ordinal-scaled data: used for categorical data where ranking is implied, e.g. shirt size – small, medium, large.
• Interval-scaled data: mainly from rating scales, which are used in questionnaires to measure respondents' attitudes, motivations, preferences and perceptions, e.g. attitude – poor, unsure, good.
• Ratio-scaled data: used for numeric data involving direct measurement where there is an absolute origin of zero, e.g. length of service – 27 months, 45 months.

Classification 3: Discrete versus Continuous Data
• Discrete data: consists of whole numbers only, e.g. 1, 2, 3, 4.
• Continuous data: numeric data that can take any value in an interval (both whole-number and fractional values), e.g. 4, 4.6, 10.7, 34.2.

Case Study / Online Forum Discussion
Data types are a very important concept in the world of statistics. See the video below for additional information on data types, and discuss in a group their relevance to data analysis.
https://www.youtube.com/watch?v=hZxnzfnt5v8

5.1.9 Data Sources
Data can be collected from a variety of sources – from vast government agencies charged with maintaining public records to surveys conducted among a small group of customers or prospective clients.

The Internet
Over the last 20 years, the Internet has become an almost limitless source of business and economic data. With the help of powerful search engines, even the casual user has instant access to data that once would have required hours, if not weeks, of painstaking research (Bergquist et al., 2013).

Government Agencies and the Private Sector
Both governmental agencies and private-sector companies gather and make available a wealth of business and economic data.
Thinking Point
Name both governmental agencies and private-sector companies that gather and make available business and economic data.

Internal versus External Sources
• Internal data: data available from within an organisation, e.g. financial, production and human resources data.
• External data: data available from outside an organisation, e.g. from employee associations, research institutions and government bodies.

Primary versus Secondary Sources
• Primary data: data captured at the point at which it is generated, e.g. surveys.
• Secondary data: data collected and processed by others for purposes other than the problem at hand, e.g. publications.

Data Collection Methods
• Observation methods:
Direct observation – data is collected by directly observing the respondent or object in action, e.g. a vehicle traffic survey.
Desk research (abstraction) – extracting secondary data from a variety of source documents, e.g. books, publications, newspapers, etc.
• Survey methods: primary data is gathered through the direct questioning of respondents.
Personal interviews: a face-to-face interview with a respondent, during which a questionnaire is completed.
Postal surveys: questionnaires are posted to respondents for completion.

Let us watch
Watch the video on the link below about ideas on data collection, and list some instances where you think a particular data collection method is suitable.
https://www.youtube.com/watch?v=8SHnJfPQ9qc

REVISION QUESTIONS
1. Why is it necessary to differentiate between different types of data?
2. What is the difference between:
• quantitative and qualitative data?
• discrete and continuous data?
3. What types of information would be included in quantitative data?
4. In what ways is qualitative data critical to the success of a business?
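The data-type distinctions in this section determine which operations are meaningful on a data set. Below is a small illustrative Python sketch; the category codes and measurements are invented examples in the spirit of Tables 1.3 and the scale definitions above:

```python
import statistics

# Nominal data: codes merely label categories (1 = Female, 2 = Male),
# so counting frequencies is meaningful but averaging the codes is not.
gender_codes = [1, 2, 2, 1, 1]
frequencies = {code: gender_codes.count(code) for code in set(gender_codes)}

# Ordinal data: ranking is implied, so ordering by rank is meaningful.
sizes = ["medium", "small", "large", "small"]
rank = {"small": 1, "medium": 2, "large": 3}
ordered = sorted(sizes, key=rank.get)

# Ratio data: direct measurement with an absolute zero,
# so arithmetic such as the mean is meaningful.
service_months = [27, 45, 12, 33]
mean_service = statistics.mean(service_months)

# Discrete data consists of whole numbers only; continuous data
# can take any value in an interval, e.g. 4, 4.6, 10.7, 34.2.
print(frequencies, ordered, mean_service)
```

The point of the sketch: the same Python list can hold nominal codes or ratio measurements, and it is the measurement scale, not the storage type, that tells you whether a frequency count, a ranking or a mean is the valid summary.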
Before we progress to the following learning unit, make sure you are able to understand and talk through the following concepts:
• Nominal data
• Ordinal data
• Interval data
• Ratio data
• Scale of measurement

5.1.10 Self-Assessment
Let us see what you have learned so far by taking this short self-assessment. The Self-Assessment for this unit is embedded within your Quantitative Techniques module in myClass. Head on to the quiz to see how you have fared with this section of content! Be sure to complete the self-assessment quiz before you move on to the next section!

5.2 WEEK 2: SUMMARISING DATA: SUMMARY TABLES AND GRAPHS

Purpose
The purpose of this unit is to explain how to summarise data into table format and how to display the results in an appropriate graph or chart.

Learning Outcomes
By the end of this unit, you will be able to:
• Summarise categorical data into a frequency table and a cross-tabulation table.
• Interpret the findings from a categorical frequency table and a cross-tabulation table.
• Construct and interpret appropriate bar and pie charts.
• Summarise numeric data into a frequency distribution and a cumulative frequency distribution (ogive).
• Construct and interpret a histogram and a cumulative frequency polygon.
• Construct and interpret a scatter plot of two numeric measures.
• Display time series data as line graphs and interpret trends.

Time
It will take you 10 hours to make your way through this study week.

Reading
Wegner, T. (2016). Applied Business Statistics: Methods and Excel-based Applications. Juta: Cape Town, South Africa. ISBN 978-1-48511-193-1. Chapter 2.

5.2.1 Introduction
This week aims to give students an understanding of the common ways in which statistical findings are conveyed. The most commonly used means of displaying statistical results are summary tables and graphs.
Summary tables can be used to summarise a single random variable as well as to examine the relationship between two random variables. The choice of summary table and graphic depends on the type of data to be displayed. Managers benefit from statistical findings when the information is easily interpreted and communicated effectively to them. Tables and graphs convey information much more efficiently and quickly than a written report; in graphs and tables there is much truth in the old adage 'a picture is worth a thousand words'. In practice, analysts should therefore consider using summary tables and graphical displays ahead of written text. A profile of a single random variable (e.g. the most-preferred TV channel among viewers, or the pattern of delivery times) or the relationship between two random variables (e.g. between gender and newspaper readership) can be easily summarised by summary tables and graphs.

5.2.2 Summarising Categorical Data
Quantitative data graphs are plotted along a numerical scale, while qualitative graphs are plotted using non-numerical categories. In this section of the study unit, the aim is to examine two types of qualitative data graphs: pie charts and bar charts.

A categorical summary table (one-way pivot table) shows the count/percentage of responses that belong to each category of a categorical variable. If it is given as a count it is called the absolute frequency, and if given as a percentage or fraction it is referred to as the relative frequency. Wegner (2016)
Table 2.1 Types of Cars at Company X

Car Type | Absolute Frequency | Relative Frequency (%)
Mazda | 6 | 40
Toyota | 3 | 20
Nissan | 2 | 13
Isuzu | 4 | 27
Total | 15 | 100

Source: Developer's own compilation (2021)

Data from a categorical frequency table can be displayed as a pie chart or a simple bar chart.

Simple Bar Chart
Bar charts (or bar graphs) contain two or more categories along one axis and a series of bars, one for each category, along the other axis. The length of each bar represents the magnitude of the measure (frequency, percentage, amount, money, etc.) for its category. A bar graph is qualitative, since the categories are not numerical, and the bars may be either horizontal or vertical. The same type of data that is used to produce a bar graph is also used to construct a pie chart. The advantage of a bar graph over a pie chart is that, for categories close in value, it is easier to see the difference between the bars of a bar graph than to discriminate between pie slices. Black (2011)

Construction of a simple bar chart:
• The categories are displayed on the horizontal axis.
• Frequencies are displayed on the vertical axis.
• The height of each bar displays the frequency of its category.
• The width of the bars must be constant.

Figure 2.1 Simple bar chart for Table 2.1
Source: Developer's own compilation (2021)

In the example below, consider the data in Table 2.2 on an average student's back-to-school spending. When constructing a bar graph from the data, the categories would be Electronics, Clothing and Accessories, Dorm Furnishings, School Supplies, and Misc. Bars for each of these categories are made using the rand figures given in the table.
Figure 2.2 is the resulting bar graph produced by Excel.

Table 2.2 Back-to-School Spending

Category | Amount Spent (R)
Electronics | 211.89
Clothing | 134.40
Dorm Furnishings | 90.90
School Supplies | 68.47
Misc. | 93.72

Source: Black (2013)

Figure 2.2 Bar Graph of Back-to-School Spending
Source: Black (2013)

Pie Chart
A pie chart is a circular display of data in which the area of the whole circle represents 100% of the data and slices of the circle represent the percentage breakdown of the sublevels. Pie charts are widely used in business, mostly to show things such as budgets, market share, ethnic groups and time/resource allocations. However, because pie charts can lead to less accuracy than other types of graphs, their use is minimised in science and technology. In general, it is harder for the viewer to interpret the relative sizes of angles in a pie chart than to judge the lengths of rectangles in a bar chart. Black (2013)

Construction of a pie chart:
• Divide a circle into segments.
• The size of each segment should be proportional to the frequency count/percentage of its category.
• The sum of the segment frequencies must equal the whole.

Figure 2.3 Pie Chart for Table 2.1 (Mazda 40%, Toyota 20%, Nissan 13%, Isuzu 27%)
Source: Black (2013)

Profiling two categorical variables
A cross-tabulation table (two-way pivot table) shows the number/percentage of observations that jointly belong to each combination of categories of two categorical variables – for example, car type at Company X in the two years 2007 and 2008. The first categorical variable is car type, with four categories (Mazda, Toyota, Nissan and Isuzu). The second categorical variable is year, with two categories (2007 and 2008).
The results can be displayed as shown in Table 2.3:

Table 2.3 Types of Cars at Company X, 2007 and 2008

Car Type | 2007 | 2008 | Total
Mazda | 6 | 5 | 11
Toyota | 3 | 2 | 5
Nissan | 2 | 1 | 3
Isuzu | 4 | 7 | 11
Total | 15 | 15 | 30

Source: Black (2013)

Data from a cross-tabulation table can be displayed as a component bar chart or a multiple bar chart.

Figure 2.4 Component (Stacked) Bar Chart for Table 2.3
Source: Developer's own compilation

5.2.3 Summarising Numeric Data
Raw data, or data that have not been summarised in any way, are sometimes referred to as ungrouped data. Data that have been organised into a frequency distribution are called grouped data. The distinction between ungrouped and grouped data is important, because the calculation of statistics differs between the two types of data. Several of the charts and graphs presented in this section are constructed from grouped data. One particularly useful tool for grouping data is the frequency distribution, which is a summary of data presented in the form of class intervals and frequencies. Black (2013)

Profiling a single numeric variable
A numeric frequency table (distribution) is a summary table which groups numeric data into intervals and reports the frequency count of the numbers assigned to each interval.

Construction of a frequency table:
• Determine the data range, often defined as the difference between the largest and smallest numbers.
• Decide on a number of classes. One rule of thumb is to select between 5 and 15 classes. If the frequency distribution contains too few classes, the data summary may be too general to be useful; too many classes may result in a frequency distribution that does not aggregate the data enough to be helpful. The final number of classes is arbitrary.
The business researcher arrives at a number by examining the range and determining a number of classes that will span the range adequately and also be meaningful to the user.
• Determine the class width. An approximation of the class width can be calculated by dividing the range by the number of classes.
• Determine the class limits. These are selected so that no data value can fit into more than one class.

Table 2.4 Frequency Table of Office Data

Class Interval | Absolute frequency | Relative frequency (%) | 'Less than' cumulative frequency (%) | 'More than' cumulative frequency (%) | Class midpoint
115–<130 | 5 | 12.5 | 12.5 | 100 | 122.5
130–<145 | 7 | 17.5 | 30 | 87.5 | 137.5
145–<160 | 6 | 15 | 45 | 70 | 152.5
160–<175 | 12 | 30 | 75 | 55 | 167.5
175–<190 | 8 | 20 | 95 | 25 | 182.5
190–<205 | 2 | 5 | 100 | 5 | 197.5
Total | 40 | 100 | | |

Source: Black (2013)

Histogram
A histogram is a graphic display of a numeric frequency distribution, and one of the more widely used types of graphs for quantitative data. A histogram is a series of contiguous rectangles that represent the frequency of data in given class intervals. If the class intervals used along the horizontal axis are equal, then the heights of the rectangles represent the frequency of values in each class interval; if the class intervals are unequal, then the areas of the rectangles can be used for relative comparisons of class frequencies. Black (2013)

Figure 2.6 Example of a Histogram
Source: Black (2011)

Ogive – the cumulative frequency polygon
An ogive (o-jive) is a cumulative frequency polygon. Construction begins by labelling the x-axis with the class endpoints and the y-axis with the frequencies.
However, the use of cumulative frequency values requires that the scale along the y-axis be great enough to include the frequency total. A dot of zero frequency is plotted at the beginning of the first class, and construction proceeds by marking a dot at the end of each class interval for the cumulative value. Connecting the dots then completes the ogive. Ogives are most useful when the decision maker wants to see running totals: for example, if a comptroller is interested in controlling costs, an ogive could depict cumulative costs over a fiscal year. Steep slopes in an ogive can be used to identify sharp increases in frequencies. Black (2013)

Figure 2.7 Ogive of the Unemployment Data
Source: Black (2013)

REVISION QUESTIONS
1. Complete the sentence: 'A picture is worth a .......................'
2. What is the name given to the chart that displays:
a) the summarised data of a single categorical variable?
b) the summarised data of two categorical variables simultaneously?
3. What is the name given to the table that summarises the data of two categorical variables?
4. Explain at least three differences between a bar chart and a histogram.
5. What is the name of the chart that is used to display time series data?

5.2.4 Self-Assessment
Let us see what you have learned so far by taking this short self-assessment. The Self-Assessment for this unit is embedded within your Quantitative Techniques module in myClass. Head on to the quiz to see how you have fared with this section of content! Be sure to complete the self-assessment quiz before you move on to the next section!
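The frequency-table construction steps and the ogive's running totals from this week can be sketched in Python. The raw data below are invented for illustration, and the class intervals follow the lower-limit-inclusive convention of Table 2.4 (115–<130, 130–<145, and so on):

```python
# Invented raw observations to be grouped
raw = [118, 133, 147, 162, 151, 170, 177, 192, 164, 139, 125, 168]

lower, width, k = 115, 15, 6   # first lower limit, class width, number of classes
classes = [(lower + i * width, lower + (i + 1) * width) for i in range(k)]

# Absolute frequency per class: lower limit inclusive, upper limit exclusive
freq = [sum(1 for x in raw if lo <= x < hi) for lo, hi in classes]

n = len(raw)
rel = [100 * f / n for f in freq]   # relative frequency (%)

# 'Less than' cumulative frequencies: the y-values plotted on an ogive
cum = []
running = 0
for f in freq:
    running += f
    cum.append(running)

for (lo, hi), f, c in zip(classes, freq, cum):
    print(f"{lo}-<{hi}: frequency {f}, cumulative {c}")
```

Note how the last cumulative value always equals the total number of observations, which is why the ogive's y-axis must extend to the frequency total.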
5.3 WEEK 3: DESCRIBING DATA: NUMERIC DESCRIPTIVE STATISTICS

Purpose
The purpose of this week is to explore descriptive statistics; these statistics help to identify the location, spread and shape of the data.

Learning Outcomes
By the end of this week, you will be able to:
• Describe the various central and non-central location measures.
• Calculate and interpret each of these location measures.
• Describe the appropriate central location measure for different data types.
• Describe the various measures of spread (or dispersion).
• Calculate and interpret each measure of dispersion.
• Describe the concept of skewness.
• Calculate and interpret the coefficient of skewness.
• Calculate the five-number summary table and construct its box plot.
• Explain how outliers influence the choice of valid descriptive statistical measures.

Time
It will take you 10 hours to make your way through this study week.

Reading
Wegner, T. (2016). Applied Business Statistics: Methods and Excel-based Applications. Juta: Cape Town, South Africa. ISBN 978-1-48511-193-1. Chapter 3.

5.3.1 Introduction
Summary tables and their graphical displays, described in study week 2, are used to communicate broad overviews of the profiles of random variables. Managers sometimes need numerical measures (statistics) to convey more precise information about the behaviour of random variables. This precise communication of data is the purpose of descriptive statistical measures.

5.3.1.1 Central Location Measures

Descriptive Statistics – Location Measures

Definition of Measures of Central Tendency
One type of measure that is used to describe a set of data is the measure of central tendency.
Measures of central tendency yield information about the centre, or middle part, of a group of numbers; observations of a random variable tend to group about some central value. The statistical measures which quantify where the majority of observations are concentrated are referred to as measures of central location. A central location statistic represents a typical value or middle data point of a set of observations and is useful for comparing data sets (Wegner, 2016). There are three main measures of central location:
1. Arithmetic mean
2. Median
3. Mode

Measures of Central Tendency for Ungrouped Data

1. The Mean
The arithmetic mean is the average of a group of numbers and is computed by summing all the numbers and dividing by the count of numbers. Because the arithmetic mean is so widely used, most statisticians refer to it simply as the mean. The population mean is represented by the Greek letter mu (μ); the sample mean is represented by x̄.
Formula:
Population mean: μ = Σx / N
Sample mean: x̄ = Σx / n

2. The Mode
The mode is the most frequently occurring value in a set of data. Organising the data into an ordered array (an ordering of the numbers from smallest to largest) helps to locate the mode, i.e. the value with the highest frequency. It is found by observation (Wegner, 2016).

3. The Median
The median is the middle value in an ordered array of numbers. For an array with an odd number of terms, the median is the middle number. For an array with an even number of terms, the median is the average of the two middle numbers (Wegner, 2016). The following steps are used to determine the median.
STEP 1. Arrange the observations in an ordered data array.
STEP 2. For an odd number of terms, find the middle term of the ordered array. It is the median.
STEP 3. For an even number of terms, find the average of the middle two terms. This average is the median.
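These three ungrouped measures can be checked with Python's standard `statistics` module. The data values below are illustrative (they are the ten survey values that reappear in this week's range example):

```python
import statistics

# Illustrative data: the ten student survey values used in this week's range example
data = [1, 3, 2, 2, 5, 4, 3, 3, 4, 3]

mean = statistics.mean(data)      # sum of the values divided by their count
median = statistics.median(data)  # average of the two middle values here (even n)
mode = statistics.mode(data)      # the most frequently occurring value

print(mean, median, mode)
```

For this data set all three measures coincide at 3, which is what a roughly symmetrical data set produces.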
Measures of Central Tendency for Grouped Data

1. The Mean
For ungrouped data, the mean is computed by summing the data values and dividing by the number of values. With grouped data, the specific values are unknown. What can be used to represent the data values? The midpoint of each class interval is used to represent all the values in a class interval. This midpoint is weighted by the frequency of values in that class interval. The mean for grouped data is then computed by summing the products of the class midpoint and the class frequency for each class and dividing that sum by the total number of frequencies. The formula for the mean of grouped data follows.
x̄ = Σ(fx) / n
Where:
x = midpoint of each class interval
f = frequency of each class interval
n = total frequency

2. The Mode
The mode for grouped data is the class midpoint of the modal class. The modal class is the class interval with the greatest frequency. The formula for the mode of grouped data follows.
Mo = Omo + c(fm − fm−1) / (2fm − fm−1 − fm+1)
Where:
Omo = lower limit of the modal interval
c = width of the modal interval
fm = frequency of the modal interval
fm−1 = frequency of the interval preceding the modal interval
fm+1 = frequency of the interval following the modal interval

3. The Median
The median for ungrouped or raw data is the middle value of an ordered array of numbers. For grouped data, solving for the median is considerably more complicated. The calculation of the median for grouped data is done by using the following formula.
Median formula:
Me = Ome + c[(n/2) − f(<)] / fme
Where:
Ome = lower limit of the median interval
c = class width
n = sample size (number of observations)
fme = frequency count of the median interval
f(<) = cumulative frequency count of all intervals before the median interval

5.3.2 Non-central Location Measures
Non-central location measures describe the position of a data value that lies somewhere other than the centre of an ordered data set. The quartiles, which divide an ordered data set into four equal parts, are the most commonly used non-central measures, and two measures are derived from them:
• Interquartile range
• Quartile deviation

5.3.2.1 Measures of Dispersion for Ungrouped Data
Interquartile range
It is the difference between the highest quartile (upper quartile) and the lowest quartile (lower quartile), i.e.
IQR = Q3 − Q1
Quartile deviation
It is a measure of the spread of the data values about the median, i.e.
QD = (Q3 − Q1) / 2

5.3.3 Measures of Dispersion
These are statistical measures that quantify the spread of the data set about their central location value. The main measures of dispersion include:
• Range
• Variance
• Standard deviation
• Coefficient of variation
• Coefficient of skewness

Range
For our student survey data, 1, 3, 2, 2, 5, 4, 3, 3, 4, 3, we could report a range of 5 − 1 = 4. Unfortunately, although the range is obviously a simple measure to compute and interpret, its ability to effectively measure data dispersion is fairly limited. The problem is that only two values in the data set, the smallest and the largest, are actively involved in the calculation. None of the other values in between has any influence at all. Consider also the data set consisting of the values 3, 7, 5, 2, 4, 5, 1000.
The range would be 998, which would give a misleading sense of the dispersion involved, since all the values but one are clustered within 5 units of each other. The measures described next are intended to correct for this shortcoming (Black, 2013).

Variance
A variance is a measure of average squared deviation. It is calculated using all the data values in the data set.
Variance = (sum of squared deviations) / (sample size − 1)
For ungrouped data: s² = Σ(x − x̄)² / (n − 1)
For grouped data: s² = [Σ(f x²) − n x̄²] / (n − 1)

Standard deviation
It is the square root of the variance.
For ungrouped data: s = √[Σ(x − x̄)² / (n − 1)]
For grouped data: s = √{[Σ(f x²) − n x̄²] / (n − 1)}

Coefficient of variation
It is used to compare variability where data sets are given in different units. The coefficient of variation essentially is a relative comparison of a standard deviation to its mean, and can be useful in comparing standard deviations that have been computed from data with different means.
Coefficient of variation (CV) = (s / x̄) × 100

5.3.4 Measure of Skewness
Measures of shape are tools used in describing the shape of a distribution of data. This section examines skewness. Skewness is a measure of the shape of a uni-modal distribution of numeric data values. A distribution of data in which the right half is a mirror image of the left half is said to be symmetrical. Skewness is present when a distribution is asymmetrical, or lacks symmetry.

Figure 3.1 Relationships of Mean, Median and Mode
Source: Black (2013)

5.3.5 The Box Plot
The box-and-whisker plot is another way to describe a distribution of data. Sometimes referred to as a box plot, it is a depiction of the upper and lower quartiles together with the median and the two extreme values, used to show a distribution graphically. The median is enclosed by the box.
The box extends outward from the median to the lower and upper quartiles, encompassing not only the median but also the middle 50% of the data. From the lower and upper quartiles, lines referred to as whiskers are stretched out from the box toward the outermost data values. The box-and-whisker plot is determined from five specific numbers, sometimes referred to as the five-number summary (Black, 2013).

Figure 3.2 Box-and-Whisker Plot

Think point
Most of the statistics presented here are drawn from studies or surveys. Let’s say a study of laundry usage is done in 50 South African households that have washers and dryers. Water measurements are taken for the volume of water used by each washing machine in completing a cycle. The data presented are the number of litres used by each washing machine during the washing cycle. Summarise the data so that the study findings can be reported.
1. Calculate the mean, mode, median and standard deviation.

Revision Questions
1. Select the appropriate central location measure (mean, median, mode) referred to in each of the following statements.
(a) A quarter of our lecturers have more than 10 years’ work experience.
(b) The wealthiest city in South Africa is Johannesburg.
(c) The average time taken by a runner to finish the 200 m race is 17 seconds.
2. Identify for which of the following statements the arithmetic mean would be inappropriate as a measure of central location. (Give a reason.) State which measure of central location would be more appropriate, if necessary.
(a) The ages of children at a playschool
(b) The number of cars using a parking garage daily
(c) The brand of cereal preferred by consumers
(d) The value of transactions in a clothing store

5.3.6 Self-Assessment
Let us see what you have learned so far by taking this short self-assessment. The Self-Assessment for this unit is embedded within your Quantitative Techniques module in myClass. Head on to the quiz to see how you have fared with this section of content! Be sure to complete the self-assessment quiz before you move on to the next section!

5.4 WEEK 4: BASIC PROBABILITY CONCEPTS

Purpose
The purpose of this week is to provide a brief overview of the basic concepts of probability to help a manager understand and use probabilities in decision making.

Learning Outcomes
By the end of this week, you will be able to:
• Understand the importance of probability in statistical analysis.
• Define the different types of probability.
• Describe the properties and concepts of probabilities.
• Apply the rules of probability to empirical data.
• Construct and interpret probabilities from joint probability tables.
• Understand the use of counting rules (permutations and combinations).

Time
It will take you 12 hours to make your way through this study week.

Reading
Wegner, T. (2016). Applied Business Statistics: Methods and Excel-based Applications. Juta: Cape Town, South Africa. 978-1-48511-193-1. Chapter 4.

5.4.1 Introduction
Most business decisions are made under uncertain conditions. Probability theory provides the underpinning for quantifying and assessing uncertainty. It is used to estimate the reliability of inferences drawn from samples about populations, as well as to quantify the uncertainty of future occurrences.

5.4.2 Types of Probability
Uncertainty surrounds most aspects of a business situation.
Frequently, business people make decisions based on chance. Probability theory provides a logical way of quantifying and evaluating uncertainty. Probability is the chance of a particular outcome occurring out of a number of conceivable outcomes for a given event. There are two types of probability:
• Subjective probability
• Objective probability

Subjective Probability
The subjective method of assigning probability is based on the feelings or insights of the person determining the probability. Subjective probability comes from the person's intuition or intellect. Although not a scientific approach to probability, the subjective method often is based on the accrual of knowledge, understanding and experience stored and processed in the human mind. At times it is merely a supposition; at other times, it can yield accurate probabilities. Subjective probability can be used to exploit the background of experienced workers and managers in decision making. It is based on an educated guess, expert belief or value judgement. This type of probability cannot be confirmed statistically, hence it has limited use; e.g. the probability that it will rain in Cape Town tomorrow is 0.15 (Wegner, 2016).

Objective Probability
It is founded on empirical observations or theoretical properties of an object; e.g. the probability of getting a head after tossing a coin is 0.5. With this method, the probability of an event occurring is equal to the number of times the event has occurred in the past divided by the total number of opportunities for the event to have occurred.
Formula:
P(A) = r / n
Where:
A = event of a specific type
r = number of outcomes of event A
n = total number of all possible outcomes (sample space)
P(A) = probability of event A occurring
Example:
A container holds 3 red balls and 2 black balls. If a ball is picked at random from the container, what is the probability that it is: (i) red, (ii) black?
Solution:
(i) P(Red) = 3/5
(ii) P(Black) = 2/5

Experiment
An experiment is a procedure that produces outcomes. Examples of business-oriented experiments with outcomes that can be statistically analysed might include the following.
• Interviewing 10 randomly selected consumers and asking them which brand of washing powder they prefer
• Sampling every 100th bottle of KOO beans from an assembly line and weighing the contents
• Testing new antibiotic drugs on samples of HIV patients and measuring the patients' improvement
• Auditing every 5th account to detect any errors
• Recording the S&P 500 index on the first Monday of every month for 5 years

Event
Since an event is an outcome of an experiment, the experiment defines the possibilities of the event. If the experiment is to sample five bottles coming off a production line, an event could be to get one defective and four good bottles. In an experiment to roll a die, one event could be to roll an even number and another event could be to roll a number greater than two.

5.4.3 Properties of a Probability
• The probability of an event A is the likelihood of the occurrence of that event. The probability of event A (denoted by P(A)) is a number between 0 and 1 inclusive (i.e. 0 ≤ P(A) ≤ 1).
• If P(A) = 0, then event A cannot occur.
• If P(A) = 1, then event A is certain to occur.
• The sum of the probabilities of all possible events (i.e. the collectively exhaustive set of events) equals one, i.e. P(A1) + P(A2) + P(A3) + ... + P(Ak) = 1, for k possible events.
• If P(A) is the probability of event A occurring, then the probability of event A not occurring is P(Ā) = 1 − P(A).
This is called the complementary probability.

5.4.4 Basic Probability Concepts
Concept 1: Intersection of Two Events (A∩B)
The intersection of events A and B is the set of outcomes that belong to both A and B. The key word is “AND” (Wegner, 2016).
Concept 2: Union of Two Events (A∪B)
The union of events A and B is the set of outcomes that belong to either event A or B or both. The key word is “OR” (Wegner, 2016).
Concept 3: Mutually Exclusive Events
Events are mutually exclusive if they cannot happen together on a single trial of a random experiment. Two or more events are mutually exclusive if the happening of one event precludes the occurrence of the other event(s). This characteristic means that mutually exclusive events cannot occur at the same time and therefore can have no intersection (Wegner, 2016).
NB* The probability of two mutually exclusive events taking place at the same time is zero.
Concept 4: Collectively Exhaustive Events
Events are collectively exhaustive when the union of all possible events is equal to the sample space, i.e. at least one of the events is certain to occur when an object is randomly drawn from the sample space (Wegner, 2016).
Concept 5: Statistically Independent Events
Two events A and B are statistically independent if the happening of event A has no effect on the outcome of event B, and vice versa. Two or more events are independent if the occurrence or non-occurrence of one of the events does not affect the occurrence or non-occurrence of the other event(s). Certain experiments, such as rolling dice, yield independent events; each die is independent of the other. Whether a 6 is rolled on the first die has no effect on whether a 6 is rolled on the second die. Coin tosses always are independent of each other.
The event of getting a head on the first toss of a coin is independent of getting a head on the second toss. It is generally believed that certain human characteristics are independent of other events (Wegner, 2016).

5.4.5 Calculating Objective Probabilities
Components of Objective Probabilities
Empirically derived objective probabilities can be classified into three categories:
• Marginal probability
• Joint probability
• Conditional probability
Marginal Probability
A marginal probability is the probability of only a single event A occurring, i.e. the outcome of only one random variable (Wegner, 2016).
Joint Probability
A joint probability is the probability of both event A and event B occurring simultaneously on a given trial of a random experiment (Wegner, 2016).
Conditional Probability
A conditional probability is the probability of one event A occurring, given information about the occurrence of a prior event B (Wegner, 2016).

Figure 4.1 Marginal, Union, Joint and Conditional Probabilities
Source: Black (2013)

5.4.6 Probability Rules
There are basically two probability rules:
• Addition rule
• Multiplication rule
Addition Rule
For non-mutually exclusive events: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
For mutually exclusive events: P(A ∪ B) = P(A) + P(B)
Multiplication Rule
For statistically dependent events: P(A ∩ B) = P(A|B) × P(B)
For statistically independent events: P(A ∩ B) = P(A) × P(B)

5.4.7 Probability Trees
A probability tree is a graphical way to apply probability rules where multiple events happen in sequence; these events can be represented by branches (similar to a tree). See the example on page 114 of the prescribed textbook.

5.4.8 Permutations and Combinations
Most probability questions involve counting large numbers of event outcomes (r) and a total number of outcomes (n).
Counting rules assist in finding the values of r and n.
Factorial Notation
The factorial is used to find the total number of different ways in which n objects of a single event can be arranged (ordered).
n! = n factorial = n(n−1)(n−2)(n−3) ... 3 × 2 × 1
Permutations Rule
A permutation is the number of distinct ways of arranging a subset of r objects selected from a group of n objects where the order is important. Each possible arrangement (ordering) is called a permutation.
Formula:
nPr = n! / (n − r)!
Combinations Rule
A combination is the number of distinct ways of selecting a subset of r objects from a group of n objects where the order is not important. Each separate grouping is called a combination.
Formula:
nCr = n! / [r!(n − r)!]

Revision Questions
1. If an event has a probability equal to 0.2, what does this mean?
2. What term is used to describe two events that cannot occur simultaneously in a single trial of a random experiment?
3. What is meant when two events are said to be ‘statistically independent’?
4. If P(A) = 0.26, P(B) = 0.35 and P(A and B) = 0.14, what is the value of P(A or B)?
5. If P(X) = 0.54, P(Y) = 0.36 and P(X and Y) = 0.27, what is the value of P(X|Y)? Is it the same as P(Y|X)?
6. Economic sectors: In a survey of companies, it was found that 45 were in the mining sector, 72 were in the financial sector, 32 were in the IT sector and 101 were in the production sector.
a) Show the data as a percentage frequency table.
b) What is the probability that a randomly selected company is in the financial sector?
c) If a company is selected at random, what is the probability that this company is not in the production sector?
d) What is the likelihood that a randomly selected company is either a mining company or an IT company?
e) Name the probability types or rules used in questions b, c and d.

5.4.9 Self-Assessment
Let us see what you have learned so far by taking this short self-assessment. The Self-Assessment for this unit is embedded within your Quantitative Techniques module in myClass. Head on to the quiz to see how you have fared with this section of content! Be sure to complete the self-assessment quiz before you move on to the next section!

5.5 WEEK 5: PROBABILITY DISTRIBUTIONS

Purpose
The purpose of this week is to introduce a few important probability distributions that occur most often in management situations, and to describe patterns of outcomes for both discrete and continuous events.

Learning Outcomes
By the end of this week, you will be able to:
• Understand the concept of a probability distribution.
• Describe three common probability distributions used in management practice.
• Identify applications of each probability distribution in management.
• Calculate and interpret probabilities associated with each of these distributions.

Time
It will take you 12 hours to make your way through this study week.

Reading
Wegner, T. (2016). Applied Business Statistics: Methods and Excel-based Applications. Juta: Cape Town, South Africa. 978-1-48511-193-1. Chapter 5.

5.5.1 Introduction
This week introduces probability distributions. Probabilities can be derived using mathematical functions known as probability distributions. Probability distributions quantify the uncertain behaviour of many random variables in business practice, and can define patterns of outcomes for both discrete and continuous events.
5.5.2 Types of Probability Distribution
A probability distribution is a list of all the conceivable outcomes of a random variable and their associated probabilities of occurrence. Probability distributions can be classified into two groups:
• Discrete probability distributions
• Continuous probability distributions

5.5.3 Discrete Probability Distributions
These are used to model random variables that take whole-number values only, i.e. specific values, e.g. 0, 1, 2, 3, 4, etc. A random variable is a discrete random variable if the set of all possible values is at most a finite or countably infinite number of possible values. In most statistical situations, discrete random variables produce values that are non-negative whole numbers. The two common discrete probability distributions are:
• The binomial probability distribution
• The Poisson probability distribution

5.5.4 Binomial Probability Distribution
The word binomial indicates that any single trial of a binomial experiment consists of only two possible outcomes. These two outcomes are categorised as success or failure. Usually, the outcome of interest to the analyst is labelled a success.
Example
1. If a quality analyst is looking for defective products, he would consider finding a defective product a success, even though the company would not consider a defective product a success.
2. If analysts are studying HIV patients, the outcome of selecting an HIV-positive person in a trial of the experiment is a success.
The other possible outcome of a trial in a binomial experiment is called a failure. The word failure is used only in opposition to success.
Characteristics:
• There are only two, mutually exclusive and collectively exhaustive, outcomes of the random variable: success and failure.
• Each outcome has an associated probability: the probability of success = p and the probability of failure = q, with p + q = 1 (always).
• The random variable is observed n times (trials). Each trial generates either a success or a failure.
• The trials are independent of each other, i.e. p and q are constant.
The binomial question: what is the probability that x successes will occur in n trials of the process under study?
The binomial formula:
P(x) = nCx p^x (1 − p)^(n−x) for x = 0, 1, 2, 3, ..., n
Where:
n = sample size (number of independent trials)
x = the number of successes in the n independent trials
p = probability of a success outcome
q = probability of a failure outcome

Descriptive Statistical Measures of the Binomial Distribution
A measure of central location and a measure of dispersion can be calculated for any random variable that follows a binomial distribution using the following formulae:
Mean: μ = np
Standard deviation: σ = √(np(1 − p))
How to select p
The success outcome is always associated with the probability p. The outcome that must be labelled as the success outcome is identified from the binomial question.
Useful Pointers on Calculating Probabilities
• Key words such as at least, no more than, at most, no less than, smaller than, larger than, greater than and no greater than always imply cumulative probabilities (i.e. the summing of individual marginal probabilities).
• The complementary rule should be considered whenever practical to reduce the number of probability calculations.

Revision question
One study by CNNMoney reported that 60% of workers have less than $25,000 in total savings and investments (excluding the value of their home). If this is true and if a random sample of 20 workers is selected, what is the probability that fewer than 10 have less than $25,000 in total savings and investments?
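The binomial formula, and the cumulative probability the revision question asks for, can be sketched in Python. `math.comb` supplies the nCx term; "fewer than 10" means summing the marginal probabilities for x = 0 up to 9.

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """Binomial formula: P(x) = nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Revision question: n = 20 workers, p = 0.6 (success = has less than $25,000),
# P(fewer than 10 successes) = P(x = 0) + P(x = 1) + ... + P(x = 9)
n, p = 20, 0.6
prob = sum(binomial_pmf(x, n, p) for x in range(10))

mean = n * p                # mu = np
sd = sqrt(n * p * (1 - p))  # sigma = sqrt(np(1 - p))
print(round(prob, 4), mean, round(sd, 4))
```

With a mean of np = 12 successes expected, fewer than 10 successes is relatively unlikely; the computed cumulative probability comes out at roughly 0.13.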
5.5.5 Poisson Probability Distribution
The Poisson distribution describes the occurrence of infrequent events; in fact, the Poisson formula has been referred to as the law of improbable events.
Example
1. Serious accidents at a chemical plant are rare, and the number per month might be described by the Poisson distribution.
2. The number of random customer arrivals per five-minute interval at a small boutique on weekday mornings.
The Poisson distribution is often used to describe the number of random arrivals per time interval. If the number of arrivals per interval is too frequent, the time interval can be reduced enough so that a rare number of occurrences is expected. In the field of management science, models used in queuing theory are usually based on the assumption that the Poisson distribution is the proper distribution to describe random arrival rates over a period of time. In statistical quality control, the Poisson distribution is the basis for the c control chart used to track the number of non-conformances per item or unit.
Characteristics: it measures the number of occurrences of a particular event of a discrete random variable.
• There is a pre-determined time, space or volume interval.
• The average number of occurrences of the event is known or can be determined.
The Poisson question: what is the probability of x occurrences of a given event being observed in a predetermined time, space or volume interval?
The Poisson formula:
P(x) = (e^(−a) a^x) / x! for x = 0, 1, 2, 3, ...
Where:
a = the mean number of occurrences of the event for the predetermined time, space or volume interval
e = a mathematical constant (≈ 2.71828)
x = number of occurrences for which the probability is required
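The Poisson formula is straightforward to evaluate with the standard library. The arrival rate below is invented for illustration, echoing the boutique example (an assumed average of a = 2 customer arrivals per five-minute interval):

```python
from math import exp, factorial

def poisson_pmf(x, a):
    """Poisson formula: P(x) = e^(-a) * a^x / x!"""
    return exp(-a) * a**x / factorial(x)

# Hypothetical example: customer arrivals average a = 2 per five-minute interval
a = 2
print(round(poisson_pmf(0, a), 4))  # P(no arrivals) = e^-2, about 0.1353
print(round(poisson_pmf(3, a), 4))  # P(exactly 3 arrivals), about 0.1804
```

Note that only the mean rate a is needed; unlike the binomial distribution, there is no fixed number of trials n.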
Descriptive Statistical Measures of the Poisson Distribution
A measure of central location and a measure of dispersion can be calculated for any random variable that follows a Poisson process using the following formulae:
Mean: μ = a
Standard deviation: σ = √a

5.5.6 Continuous Probability Distributions
These are used to model random variables that take both fractional and whole-number values, i.e. intervals of x-values. Continuous random variables take on values at every point over a given interval, so they have no gaps or unassumed values. It could be said that continuous random variables are generated from experiments in which things are “measured”, not “counted”. For example, if a worker is assembling a product component, the time it takes to accomplish that task could be any value within a reasonable range, such as 3 minutes 36.4218 seconds or 5 minutes 17.5169 seconds. A list of measures for which continuous random variables might be generated would include time, height, weight and volume. The following are examples of experiments that could produce continuous random variables:
1. Sampling the volume of liquid nitrogen in a storage tank
2. Determining the time between customer arrivals at a retail outlet
3. Determining the lengths of newly designed automobiles
4. Determining the weight of grain in a grain elevator at different points of time
NB* The main continuous probability distribution is the normal distribution.
5.5.7 Normal Probability Distribution
Probably the most widely known and used of all distributions is the normal distribution. It fits many human characteristics, such as height, weight, length, speed, IQ, scholastic achievement and years of life expectancy. Like their human counterparts, living things in nature, such as trees, animals and insects, have many characteristics that are normally distributed. Many variables in business and industry are also normally distributed.
Examples
1. The annual cost of household insurance.
2. The cost per square foot of renting warehouse space.
3. Managers' satisfaction with support from ownership on a five-point scale.
In addition, most items produced or filled by machines are normally distributed.
Characteristics:
• It is a smooth, bell-shaped curve.
• It is symmetrical about the central mean value.
• The tails of the curve are asymptotic.
• The distribution is always described by two parameters: the mean and the standard deviation.
• The total area under the curve will always equal one.
• The probability associated with a particular range of x-values is described by the area under the curve between the limits of the given x range (x1 < x < x2).
Finding Probabilities Using the Normal Distribution
Special statistical tables are used to obtain probabilities for a range of values of x.

5.5.8 Standard Normal (z) Probability Distribution
The normal distribution is described by two parameters, the mean, μ, and the standard deviation, σ. That is, every unique pair of values of μ and σ defines a different normal distribution. This characteristic of the normal curve (a family of curves) could make analysis by the normal distribution tedious, because a separate normal curve table would be required for each different combination of μ and σ.
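In practice, software evaluates these areas under the curve directly, without tables. A minimal sketch using Python's `statistics.NormalDist`, with an invented mean of 50 and standard deviation of 5 purely for illustration:

```python
from statistics import NormalDist

# Hypothetical normal random variable: mean = 50, standard deviation = 5
dist = NormalDist(mu=50, sigma=5)

# Standardising x = 60 with z = (x - mu) / sigma
z = (60 - dist.mean) / dist.stdev
print(z)  # 2.0

# P(45 < x < 60): the area under the curve between the two limits
prob = dist.cdf(60) - dist.cdf(45)
print(round(prob, 4))  # about 0.8186
```

The cdf difference is exactly the "area between two limits" described in the characteristics above.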
Fortunately, a method was established by which all normal distributions can be transformed into a single distribution: the z distribution. This process produces the standardised normal distribution (or curve). The conversion formula for any x value of a given normal distribution is:

z = (x − μ) / σ

Revision Questions
1. Name two regularly used discrete probability distributions.
2. Specify whether each of the following random variables is discrete or continuous:
a) The mass of cans coming off a production line
b) The number of employees in a company
c) The number of households in Gauteng that have solar heating panels
d) The distance travelled daily by a courier service truck.
3. Use the binomial formula to find each of the following probabilities:
(i) n = 7, p = 0.2 and x = 3
(ii) n = 10, p = 0.2 and x = 4
(iii) n = 12, p = 0.3 and x ≤ 4
(iv) n = 10, p = 0.05 and x = 2 or 3
(v) n = 8, p = 0.25 and x ≥ 3
4. Once a week a merchandiser restocks a particular product brand in six stores for which she is responsible. Experience has shown that there is a one-in-five chance that a given store will run out of stock before the merchandiser's weekly visit.
a) Which probability distribution is appropriate in this problem? Why?
b) What is the probability that, on a given weekly round, the merchandiser will find exactly one store out of stock?
c) What is the probability that, at most, two stores will be out of stock?
d) What is the probability that no stores will be out of stock?
e) What is the mean number of stores out of stock each week?
Note: Calculate the probabilities in (b)–(d) using the binomial formula and then the Excel function BINOMDIST.
5.5.9 Self-Assessment
Let us see what you have learned so far by taking this short self-assessment.
The Self-Assessment for this unit is embedded within your Principles of Microeconomics in myClass. Head on to the quiz to see how you have fared with this section of content! Be sure to complete the self-assessment quiz before you move on to the next section!

5.6 WEEK 6: CONFIDENCE INTERVAL ESTIMATION
Purpose
The purpose of this week is to explain the process of confidence interval estimation.
Learning Outcomes
• Understand and explain the concept of a confidence interval
• Calculate a confidence interval for a population mean and a population proportion
• Interpret a confidence interval in a management context
• Identify factors that affect the precision and reliability of confidence intervals
• Determine sample sizes for desired levels of statistical precision.
Time
It will take you 15 hours to make your way through this study week.
Reading
Wegner, T. (2016). Applied Business Statistics: Methods and Excel-based Applications. Juta: Cape Town, South Africa. ISBN 978-1-48511-193-1. Chapter 7.
5.6.1 Introduction
The role of inferential statistics is to use sample evidence to estimate population parameters. A reliable procedure for estimating a population measure is to use the sample statistic as a reference point and to construct an interval of values around it that is likely to cover the true population parameter with a stated level of confidence. This procedure is called confidence interval estimation.
5.6.2 Point Estimation
A point estimate is a statistic calculated from a sample and used to estimate a population measure. A point estimate is only as good as the sample it represents. If other random samples are drawn from the population, the point estimates derived from those samples will vary.
Due to this variation in the sample statistic, estimating a population parameter with an interval estimate is preferable to using a point estimate. A point estimate is the value of a single sample statistic used to represent the true, but unknown, value of a population parameter. For instance, the sample mean is used to estimate the population mean, and the sample proportion is used to estimate the population proportion (Wegner, 2016).

5.6.3 Confidence Interval Estimation
An interval estimate (confidence interval) is a range of values, constructed around the value of a sample statistic, within which the population parameter is expected to lie with a specified level of confidence. A confidence interval has a lower and an upper bound between which the analyst can declare, with the stated level of confidence, that the population parameter is thought to lie. Interval estimates may be two-sided or one-sided. Due to the central limit theorem, the following z formula for the sample mean can be used when the population standard deviation is known, regardless of the shape of the population, provided the sample size is large; for small sample sizes it can also be used, provided the population standard deviation is known and the population is normally distributed:

z = (x̄ − μ) / (σ/√n)

Rearranging this formula algebraically to solve for μ gives

μ = x̄ − z(σ/√n)

Because a sample mean can be greater than or less than the population mean, z can be positive or negative. Thus the preceding expression takes the following form:

x̄ − z(σ/√n) ≤ μ ≤ x̄ + z(σ/√n)

Rewriting this expression yields the confidence interval formula for estimating μ with large sample sizes when the population standard deviation is known.
100(1 − α)% confidence interval to estimate μ when σ is known:

x̄ ± z(α/2) × σ/√n

Alpha (α) is the area under the normal curve in the tails of the distribution, outside the area defined by the confidence interval. We use α to locate the z value used in constructing the confidence interval. For instance, if we want to build a 95% confidence interval, the level of confidence is 95%, or .95. If 100 such intervals are constructed by taking random samples from the population, it is likely that 95 of the intervals would include the population mean and 5 would not. As the level of confidence is increased, the interval gets wider, provided the sample size and standard deviation remain constant.

For 95% confidence, α = .05 and α/2 = .025. The value of z(α/2), or z.025, is found by looking in the standard normal table for the area .5000 − .0250 = .4750. This area in the table is associated with a z value of 1.96. Another way to locate the table z value: because the distribution is symmetric and the intervals are equal on each side of the population mean, ½(95%), or .4750, of the area lies on each side of the mean. This yields a z value of 1.96 for this portion of the normal curve. Thus the z value for a 95% confidence interval is always 1.96. In other words, of all the possible values along the horizontal axis, 95% of them should lie within a z score of 1.96 of the population mean.

Figure 7.1 z Scores for Confidence Intervals in Relation to α
Source: Black (2013)
Figure 7.2 Distribution of Sample Means for 95% Confidence
Source: Black (2013)

Think Point
A survey was taken of South African companies that do business with firms in Nigeria. One of the questions on the survey was: Approximately how many years has your company been trading with firms in Nigeria?
A random sample of 44 responses to this question yielded a mean of 10.455 years. Suppose the population standard deviation for this question is 7.7 years. Using this information, construct a 90% confidence interval for the mean number of years of trading with firms in Nigeria, for the population of South African companies trading with Nigerian firms.

5.6.4 Confidence Interval for a Single Population Mean: Population Standard Deviation Known, n Large (n > 30)

Confidence interval = x̄ ± z(α/2) × σ/√n

Where:
x̄ = sample mean
z = value from the standard normal tables
σ = population standard deviation
n = sample size

5.6.5 The Precision of a Confidence Interval
The width of a confidence interval is a measure of its precision: the narrower the confidence interval, the more precise the interval estimate, and vice versa. The width of the confidence interval is influenced by:
• the specified confidence level
• the sample size
• the population standard deviation
The most commonly used confidence levels and their z limits are shown in the table below.

Confidence Level | z limits
90% | ±1.645
95% | ±1.96
99% | ±2.58

5.6.6 The Student t-Distribution
In the formulas and problems discussed so far in this unit, the sample size was assumed to be large (n ≥ 30). In the business world, however, sample sizes may be small.
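Before continuing, the Think Point above can be checked numerically. A minimal sketch, using the values given there (x̄ = 10.455, σ = 7.7, n = 44, and z = 1.645 for 90% confidence):

```python
from math import sqrt

x_bar, sigma, n = 10.455, 7.7, 44  # sample mean, population std dev, sample size
z = 1.645                          # z value for 90% confidence

margin = z * sigma / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin
print(round(lower, 3), round(upper, 3))
```

This gives an interval of roughly 8.5 to 12.4 years.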
Although the central limit theorem applies only when the sample size is large, the distribution of sample means is normal for any sample size if the population itself is normally distributed. Thus, if it is known that the population from which the sample is being drawn is normally distributed and σ is known, the z formulas presented previously can still be used to estimate a population mean even if the sample size is small (n < 30).

Example
Suppose a South African car rental firm wants to estimate the average number of kilometres travelled per day by each of its cars rented in Cape Town. A random sample of 20 cars rented in Cape Town reveals that the sample mean travel distance per day is 85.5 km, with a population standard deviation of 19.3 km. Compute a 99% confidence interval to estimate μ.

Here, n = 20, x̄ = 85.5, and σ = 19.3. For a 99% level of confidence, a z value of 2.575 is obtained. Assume that the number of kilometres travelled per day is normally distributed in the population. The confidence interval is

85.5 ± 2.575 × 19.3/√20 = 85.5 ± 11.1

The point estimate indicates that the average number of kilometres travelled per day by a rental car in Cape Town is 85.5, with a margin of error of 11.1 km. With 99% confidence, we estimate that the population mean is somewhere between 74.4 and 96.6 km per day.

5.6.7 Confidence Interval for a Single Population Mean (μ) when the Population Standard Deviation (σ) is Unknown
The t distribution is used instead of the z distribution for performing inferences about the population mean when the population standard deviation is unknown and the population is normally distributed. The formula for the t statistic is:

t = (x̄ − μ) / (s/√n)

This formula is essentially the same as the z formula, but the distribution table values are different.
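The rental-car example above can be verified in a few lines, using the values stated in the example:

```python
from math import sqrt

x_bar, sigma, n = 85.5, 19.3, 20  # sample mean (km), known std dev (km), sample size
z = 2.575                         # z value for 99% confidence

margin = z * sigma / sqrt(n)      # margin of error, about 11.1 km
lower, upper = x_bar - margin, x_bar + margin
print(round(lower, 1), round(upper, 1))  # → 74.4 96.6
```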
Confidence interval to estimate μ when the population standard deviation is unknown and the population is normally distributed:

x̄ ± t(α/2, n−1) × s/√n

Example
In the aerospace industry, some companies allow their employees to accumulate extra working hours beyond their 40-hour week. These extra hours are sometimes referred to as green time or comp time. Many managers work longer than the eight-hour workday preparing proposals, overseeing crucial tasks, and taking care of paperwork. Recognition of such overtime is important. Most managers are usually not paid extra for this work, but a record is kept of this time and occasionally the manager is allowed to use some of this comp time as extra leave or vacation time.

Suppose a researcher wants to estimate the average amount of comp time accumulated per week by managers in the aerospace industry. He randomly samples 18 managers, measures the amount of extra time they work during a specific week, and obtains the results shown (in hours). He constructs a 90% confidence interval to estimate the average amount of extra time per week worked by a manager in the aerospace industry, assuming that comp time is normally distributed in the population.

The sample size is 18, so df = 17. A 90% level of confidence results in an α/2 = .05 area in each tail. The table t value is t.05,17 = 1.740. The subscripts in the t value denote the area in the right tail of the t distribution (for confidence intervals, α/2) and the number of degrees of freedom. The sample mean is 13.56 hours, and the sample standard deviation is 7.80 hours. The confidence interval is computed from this information as

13.56 ± 1.740 × 7.80/√18 = 13.56 ± 3.20

The point estimate for this problem is 13.56 hours, with a margin of error of ±3.20 hours.
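These figures can be verified directly, with the t value of 1.740 read from the t table for df = 17:

```python
from math import sqrt

x_bar, s, n = 13.56, 7.80, 18  # sample mean, sample std dev, sample size
t = 1.740                      # t(.05, 17) from the t table

margin = t * s / sqrt(n)       # margin of error, about 3.20 hours
lower, upper = x_bar - margin, x_bar + margin
print(round(lower, 2), round(upper, 2))  # → 10.36 16.76
```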
The researcher is 90% confident that the average amount of comp time accumulated by a manager per week in this industry is between 10.36 and 16.76 hours. From these figures, aerospace managers could attempt to build a reward system for such extra work, or evaluate the regular 40-hour week to determine how to use the normal work hours more effectively and thus reduce comp time.

5.6.8 Confidence Interval for the Population Proportion (π)
Business decision makers and researchers often need to estimate a population proportion.
Examples
What proportion of the market does our company control (market share)? What proportion of our products is defective? What proportion of customers will call customer service with complaints? What proportion of our customers is in the 20-to-30 age group? What proportion of our workers speak Xhosa as a first language?

Techniques similar to those previously discussed can be used to estimate the population proportion. The central limit theorem for sample proportions leads to the following formulae. The standard error of the sample proportion p is estimated by

σp ≈ √(p(1 − p)/n)

Thus the confidence interval for a single population proportion, π, is given by

p − z√(p(1 − p)/n) ≤ π ≤ p + z√(p(1 − p)/n)
(lower limit)             (upper limit)

Example
A study of 87 randomly selected companies with a telemarketing operation revealed that 39% of the sampled companies used telemarketing to assist them in order processing. Using this information, how could an analyst estimate the population proportion of telemarketing companies that use their telemarketing operation to assist them in order processing?

For n = 87 and p = .39, a 95% confidence interval can be computed to determine the interval estimate of π. The z value for 95% confidence is 1.96.
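A quick numerical check of this interval, using the values just given (p = .39, n = 87, z = 1.96):

```python
from math import sqrt

p_hat, n = 0.39, 87  # sample proportion and sample size
z = 1.96             # z value for 95% confidence

se = sqrt(p_hat * (1 - p_hat) / n)  # standard error of the sample proportion
margin = z * se                     # margin of error, about 0.10
lower, upper = p_hat - margin, p_hat + margin
print(round(lower, 2), round(upper, 2))  # → 0.29 0.49
```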
The confidence interval estimate is computed as follows:

.39 ± 1.96√(.39 × .61/87) = .39 ± .10

This interval suggests that the population proportion of telemarketing firms that use their operation to assist order processing is somewhere between .29 and .49, based on the point estimate of .39 with a margin of error of ±.10. This result has a 95% level of confidence.

Revision Questions
1) What is the aim of a confidence interval?
2) If x̄ = 85, σ = 8 and n = 64, set up a 95% confidence interval estimate of the population mean, μ.
3) If the population standard deviation, σ, is not known, which standardised statistic is used to construct a confidence interval?
4) If x̄ = 54, s = 6 and n = 25, set up a 90% confidence interval estimate of the population mean, μ.
5) The Department of Trade and Industry (DTI) conducted a survey to estimate the average number of employees per small and medium-sized enterprise (SME) in Gauteng. A random sample of 144 SMEs in Gauteng found that the average number was 24.4 employees. Assume that the population standard deviation is 8.

5.6.9 Self-Assessment
Let us see what you have learned so far by taking this short self-assessment. The Self-Assessment for this unit is embedded within your Principles of Microeconomics in myClass. Head on to the quiz to see how you have fared with this section of content! Be sure to complete the self-assessment quiz before you move on to the next section!

5.7 WEEK 7: HYPOTHESIS TESTS – SINGLE POPULATION (PROPORTIONS & MEANS)
Purpose
The purpose of this unit is to explain the process of testing the validity of a manager's claim using sample evidence.
This unit covers hypothesis testing only for a single population mean and a single population proportion.
Learning Outcomes
By the end of this unit, you will be able to:
• Understand the concept of hypothesis testing
• Perform hypothesis tests for a single population mean
• Perform hypothesis tests for a single population proportion
• Distinguish when to use the z-test statistic or the t-test statistic
• Correctly interpret the results of a hypothesis test
• Correctly translate the statistical results into management conclusions.
Time
It will take you 12 hours to make your way through this study week.
Reading
Wegner, T. (2016). Applied Business Statistics: Methods and Excel-based Applications. Juta: Cape Town, South Africa. ISBN 978-1-48511-193-1. Chapter 8.
5.7.1 Introduction
In this unit, we focus on another branch of inferential statistics, in which a claim made about the true value of a population parameter is assessed for validity. Hypothesis testing is the statistical process used to test the validity of such claims using sample evidence.
5.7.2 The Process of Hypothesis Testing
Hypothesis testing is a rigorous statistical process for testing how close a sample statistic lies to a hypothesised population parameter.
General procedure:
Step 1: Formulate the statistical hypotheses (null and alternative)
Step 2: Compute the sample test statistic
Step 3: Determine the rejection criteria
Step 4: Compare the sample test statistic to the rejection criteria
Step 5: Draw the statistical and management conclusions
5.7.3 Hypothesis Test for a Single Population Mean (μ) – Population Standard Deviation (σ) is Known
The most basic hypothesis test is a test about a population mean.
A business analyst might be interested in testing whether an established or accepted mean value for an industry still holds, or in testing a hypothesised mean value for a new theory.
Examples
1. A company dealing in computer products sets up a telephone service to provide customers with technical support. The average wait time during weekday hours was 35 minutes. However, after more technical consultants were added to the system, managers believe the waiting time has decreased, and they wish to prove it.
2. A boutique investment firm wishes to test whether the average hourly change in the JSE average over a 5-year period is +0.25.
3. A manufacturing company wishes to test whether the average thickness of a plastic bottle is 2.2 millimetres.
4. A retail store wants to test whether the average age of its customers is less than 42 years.
The formula below can be used to test hypotheses about a single population mean when σ is known: for large samples (n ≥ 30) from any population, and for small samples (n < 30) if x is known to be normally distributed in the population.

z test for a single mean:

z-stat = (x̄ − μ) / (σ/√n)

5.7.4 Hypothesis Test for a Single Population Mean (μ) – Population Standard Deviation (σ) is Unknown
Most of the time, when a business analyst gathers data to test hypotheses about a single population mean, the value of the population standard deviation is unknown and the analyst must use the sample standard deviation as an estimate of it. In such cases, the z test cannot be used. The previous study unit presented the t distribution, which can be used to test hypotheses about a single population mean when σ is unknown, provided the population is normally distributed for the measurement being studied.
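The z-test for a single mean, introduced above, can be sketched in code. The sample figures below (a mean wait of 33.1 minutes from 40 calls, with σ = 8 minutes) are invented for illustration and do not come from the guide:

```python
from math import sqrt

# Hypothetical data for illustration only
mu_0  = 35.0   # hypothesised mean wait time (minutes), per example 1
x_bar = 33.1   # invented sample mean after hiring more consultants
sigma = 8.0    # invented (known) population standard deviation
n     = 40     # invented sample size

z_stat = (x_bar - mu_0) / (sigma / sqrt(n))
print(round(z_stat, 3))

# Lower-tailed test at the 5% level: reject H0 (mu = 35) if z_stat < -1.645
reject = z_stat < -1.645
print(reject)
```

Here the sample mean is lower than 35, but not low enough to fall in the rejection region, so the claim of a decrease is not supported at the 5% level for these invented figures.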
In this part of the unit, the t-test for a single population mean is discussed. The t-test applies whenever the analyst draws a single random sample to test the value of a population mean (μ), the population standard deviation is unknown, and the population is normally distributed for the measurement of interest. The formula for testing such hypotheses follows.

t-stat = (x̄ − μ) / (s/√n)

5.7.5 Hypothesis Test for a Single Population Proportion (π)
Data analyses used in decision making often involve proportions, describing aspects such as consumer makeup, quality defects, market share, on-time delivery rates, profitable stocks, and so on. Business surveys frequently produce information expressed in proportion form, such as .35 of all businesses offering flexible hours to their employees, or .78 of all businesses having social networks for customers. A business analyst may conduct hypothesis tests about such proportions to determine whether they have changed in some way.
Examples
1. Suppose a company has held a 36%, or .36, share of the market for several years. As a result of a massive marketing effort and improved product quality, company officials believe that the market share has increased, and they want to prove it.
2. A market research analyst wishes to test whether the proportion of old car purchasers who are female has increased.
3. A financial analyst wants to test whether the proportion of companies that were profitable last year in the average investment officer's portfolio is 0.50.
4. A quality assurance manager for a large manufacturing firm wishes to test whether the proportion of defective items in a batch is less than 0.04.
The formula below makes it possible to test hypotheses about the population proportion in a manner similar to the formula used to test sample means:

z-stat = (p − π) / √(π(1 − π)/n)

5.7.6 The p-Value Approach to Hypothesis Testing
The p-value method is another way to reach a statistical conclusion in hypothesis testing problems. The p-value technique tests hypotheses without preset critical values of the test statistic. Decisions to reject or fail to reject the null hypothesis are made using the p-value, which is the probability of obtaining a test statistic at least as extreme as the observed test statistic, computed under the assumption that the null hypothesis is true. The p-value is sometimes referred to as the observed significance level. The p-value technique has grown in importance with the increasing use of statistical computer packages to test hypotheses; most packages report a p-value for every analysis. The p-value is the smallest value of alpha (α) for which the null hypothesis can be rejected.
Example
Suppose the p-value of a test is 0.038. The null hypothesis cannot be rejected at α = 0.01, because 0.038 is the smallest value of alpha for which the null hypothesis can be rejected, and it is larger than 0.01. However, the null hypothesis can be rejected at α = 0.05, because the p-value of 0.038 is smaller than 0.05.
Manually solving for a p-value
Suppose an analyst is conducting a one-tailed test with a rejection region in the upper tail, and obtains an observed test statistic of z = 2.04 from the sample data. Using the standard normal table, the probability of randomly obtaining a z value this great or greater by chance is .5000 − .4793 = .0207. Thus, the p-value for this problem is 0.0207.
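This manual table lookup can be reproduced without tables: the standard normal upper-tail probability can be built from the complementary error function in Python's math module.

```python
from math import erfc, sqrt

def upper_tail_p(z: float) -> float:
    """P(Z >= z) for a standard normal variable, via the complementary error function."""
    return 0.5 * erfc(z / sqrt(2))

p_value = upper_tail_p(2.04)
print(round(p_value, 4))  # → 0.0207
```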
Using this information, the analyst would reject the null hypothesis at α = 0.05 or 0.10, or at any alpha value larger than 0.0207. The analyst would not reject the null hypothesis at any alpha value less than 0.0207 (in particular, α = 0.01, 0.001, etc.). When conducting two-tailed tests, remember that alpha is split between the two tails to determine the critical values of the test statistic. For a two-tailed test, the p-value is compared to α/2 to reach a statistical conclusion: if the p-value is less than α/2, the decision is to reject the null hypothesis.

Revision Questions
1. What is meant by the term 'hypothesis testing'?
2. What determines whether a claim about a population parameter value is accepted as probably true or rejected as probably false?
3. Name the five steps of hypothesis testing.
4. What information is required to determine the critical limits for the region of acceptance of a null hypothesis?
5. If −1.96 ≤ z ≤ 1.96 defines the limits for the region of acceptance of a two-tailed hypothesis test and z-stat = 2.44, what statistical conclusion can be drawn from these findings?

5.7.7 Self-Assessment
Let us see what you have learned so far by taking this short self-assessment. The Self-Assessment for this unit is embedded within your Principles of Microeconomics in myClass. Head on to the quiz to see how you have fared with this section of content! Be sure to complete the self-assessment quiz before you move on to the next section!
5.8 WEEK 8: SIMPLE LINEAR REGRESSION AND CORRELATION ANALYSIS
Purpose
The purpose of this unit is to explain the technique used to quantify the relationship between variables, identify the strength of the relationship, and point out the significant variable in the prediction.
Learning Outcomes
By the end of this unit, you will be able to:
• Explain the meaning of regression analysis
• Identify practical examples where regression analysis can be used
• Construct a simple linear regression model
• Use the regression line for prediction purposes
• Calculate and interpret the correlation coefficient
• Calculate and interpret the coefficient of determination
Time
It will take you 12 hours to make your way through this study week.
Reading
Wegner, T. (2016). Applied Business Statistics: Methods and Excel-based Applications. Juta: Cape Town, South Africa. ISBN 978-1-48511-193-1. Chapter 12.
5.8.1 Introduction
Business decisions are often made by predicting the unknown values of a numeric variable using other numeric variables that may be related to it and whose values are known. A statistical method that quantifies the relationship between a single response variable and one or more predictor variables is called regression analysis. This relationship, referred to as a statistical model, is used for prediction purposes. Correlation analysis, on the other hand, determines the strength of the relationships and identifies which variables are useful in predicting the response variable.
5.8.2 Simple Linear Regression
Regression analysis is the process of developing a mathematical model or function that can be used to predict or determine one variable from another variable or variables.
The simplest regression model, called simple regression or bivariate regression, involves two variables: one variable is predicted by one other variable. In simple linear regression, the variable to be predicted is referred to as the dependent variable (y). The predictor is referred to as the independent variable, or explanatory variable (x). Simple linear regression analysis assesses only a straight-line relationship between two variables. Nonlinear relationships, and regression models with more than one independent variable, can be explored using multiple regression models, which are beyond the scope of this module.
Independent variable (x): influences the outcome of the other variable.
Dependent variable (y): is influenced by the independent variable.
5.8.3 Scatter Plot
Usually, the first step in simple linear regression analysis is to develop a scatter plot (scatter diagram). Graphing the data in this way yields preliminary information about the spread and shape of the data. Figures 8.1 and 8.2 are Excel scatter plots of sample data. For the scatter diagrams below, try to imagine a line passing through the points. Is a linear fit possible? Would a curve fit the data better? The scatter plot gives a rough idea of how well a regression line fits the data.
Figure 8.1 Scatter Plot of Airline Cost Data
Source: Black (2013)
Figure 8.2 Scatter Plot of Airline Cost Data
Source: Black (2013)
Determining the equation of a straight line
The first step in determining the equation of the regression line that passes through the sample data is to establish the equation's form.
In regression analysis, analysts use the slope-intercept form of the equation of a line. In statistics, the slope-intercept form of the equation of the regression line is:

ŷ = b0 + b1x

Where:
x = values of the independent variable
ŷ = estimated values of the dependent variable
b0 = y-intercept coefficient (where the regression line cuts the y-axis)
b1 = slope (gradient) coefficient of the regression line

To construct the equation of the regression line for a sample of data, the analyst must determine the values of b0 and b1. This procedure is referred to as least squares analysis: a procedure whereby a regression model is constructed by minimising the sum of the squared errors. On the basis of this premise, and using calculus, a particular set of equations has been developed to produce the components of the regression model.

Figure 8.3 Regression Line
Source: Black (2013)

Method of least squares: a mathematical technique that determines the values of b0 and b1 such that the sum of the squared deviations of the data points from the fitted line is minimised. The coefficients b0 and b1 that result from the method of least squares are calculated as follows:

b1 = (n∑xy − ∑x∑y) / (n∑x² − (∑x)²)

b0 = (∑y − b1∑x) / n

Interpreting the b1 coefficient
The b1 regression coefficient is the slope of the regression line. It is a measure of the marginal rate of change: for a unit change in x, y will change by the value of b1.

Extrapolation
Extrapolation occurs when x-values chosen from outside the observed domain are substituted into the regression equation to estimate y.
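The least squares formulas above can be applied directly. A minimal sketch on a small invented data set (four (x, y) points chosen so that they lie exactly on the line y = 3 + 2x, making the expected coefficients obvious):

```python
# Invented illustration data: the points satisfy y = 3 + 2x exactly
xs = [1.0, 2.0, 3.0, 4.0]
ys = [5.0, 7.0, 9.0, 11.0]
n = len(xs)

# The summary sums that appear in the least squares formulas
sum_x  = sum(xs)
sum_y  = sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)

# Least squares coefficients
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = (sum_y - b1 * sum_x) / n
print(b1, b0)  # → 2.0 3.0
```

The fitted line ŷ = 3 + 2x can then be used for prediction at any x within the observed domain (here, 1 to 4); using it outside that domain would be extrapolation.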
5.8.4 Correlation Analysis
When examining two related variables, correlation measures the degree of relatedness of the variables. Several measures of correlation are available; the selection depends mostly on the level of the data being analysed. Ideally, analysts would like to solve for ρ, the population coefficient of correlation. However, because analysts virtually always deal with sample data, this unit introduces the widely used sample coefficient of correlation, r.

The term r is a measure of the linear correlation of two variables. It ranges between −1 and +1, representing the strength of the relationship between the variables. An r-value of +1 denotes a perfect positive relationship between two variables. An r-value of −1 denotes a perfect negative correlation, indicating an inverse relationship between two variables: as one increases, the other decreases. An r-value of 0 means no linear relationship is present between the two variables.

Pearson's correlation coefficient (r) measures the strength of the linear association between two ratio-scaled random variables, X and Y.

Formula:

r = (n∑xy − ∑x∑y) / √([n∑x² − (∑x)²] × [n∑y² − (∑y)²])

Where:
r = the sample correlation coefficient
x = the values of the independent variable
y = the values of the dependent variable
n = the number of observations

5.8.5 The Coefficient of Determination (r²)
If the sample correlation coefficient, r, is squared, the resulting measure (r²) is called the coefficient of determination. The coefficient of determination measures the proportion (or percentage) of the variation in the dependent variable, y, that is explained by the independent variable, x.
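Pearson's formula can be computed step by step from the same summary sums used for the regression coefficients. The five data points below are invented for illustration:

```python
from math import sqrt

# Invented illustration data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

sum_x  = sum(xs)
sum_y  = sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)

# Pearson's sample correlation coefficient
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r_squared = r ** 2  # coefficient of determination
print(round(r, 4), round(r_squared, 4))
```

For this data set r is about 0.77 and r² about 0.6, meaning roughly 60% of the variation in y is explained by x.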
The coefficient of determination ranges between 0 and 1 (or 0% and 100%):

0 ≤ r² ≤ 1 or 0% ≤ r² ≤ 100%

r² is an important indicator of the usefulness of the regression equation because it measures how strongly x and y are associated. The closer r² is to 1 (or 100%), the stronger the association between x and y. Conversely, the closer r² is to 0, the weaker the association between x and y.

Revision Questions
1. What is regression analysis? What is correlation analysis?
2. What name is given to the variable that is being estimated in a regression equation?
3. What is the purpose of an independent variable in regression analysis?
4. What is the name of the graph used to display the relationship between the dependent variable and the independent variable?
5. What is the name given to the method used to find the regression coefficients?
6. Explain the strength and direction of the association between two variables, x and y, that have a correlation coefficient of −0.78.

5.8.6 Self-Assessment
Let us see what you have learned so far by taking this short self-assessment. The Self-Assessment for this unit is embedded within your Quantitative Techniques module in myClass. Head on to the quiz to see how you have fared with this section of content! Be sure to complete the self-assessment quiz before you move on to the next section!
5.9 WEEK 9: TIME SERIES ANALYSIS: A FORECASTING TOOL

Purpose
The purpose of this unit is to explain how to treat time series data and how to prepare forecasts of future levels of activity.

Learning Outcomes
By the end of this unit, you will be able to:
• Explain the difference between cross-sectional (survey) and time series data
• Explain the purpose of time series analysis
• Identify and explain the components in time series analysis
• Calculate and interpret the trend values in a time series
• Calculate and interpret the seasonal influence in a time series
• De-seasonalise a time series and explain its value
• Prepare seasonally adjusted forecast values of a time series

Time
It will take you 12 hours to make your way through this study week.

Reading
Wegner, T. (2016). Applied Business Statistics: Methods and Excel-based Applications. Juta: Cape Town, South Africa. ISBN 978-1-48511-193-1. Chapter 15.

5.9.1 Introduction
Data collected on a given phenomenon over a period of time at systematic intervals is known as time-series data. Time-series forecasting methods endeavour to account for changes over time by studying patterns, trends or cycles, or by using information about previous time periods to predict the outcome for a future time period. Time-series methods include naïve methods, averaging, smoothing, regression trend analysis, and the decomposition of the possible time-series factors.

Most data used in statistical analysis are cross-sectional data, meaning that they are gathered from sample surveys at one point in time. Conversely, data can also be collected over time. For instance, when a business records its daily, weekly or monthly gross revenue, or when a household records its daily or monthly electricity usage, they are gathering a time series of data.
5.9.2 The Components of a Time Series
The general conviction is that time-series data comprise four components: trend, cycles, seasonal effects, and irregular fluctuations. Not all time-series data have all these features.

Figure 11.1 Bond yield time series data
Source: Black (2013)

The bond yield data portrayed in Figure 11.1 show a general downward trend and comprise two cycles. Each of the cycles spans approximately 5 to 8 years. It is possible, although not presented here, that seasonal periods of highs and lows within each year result in seasonal bond yields. In addition, irregular daily variations of bond yield rates may occur but are unexplainable. Time-series data that exhibit no trend, cyclical or seasonal effects are said to be stationary. Approaches used to forecast stationary data analyse only the irregular fluctuation effects.

Figure 11.2 Time Series Effects
Source: Black (2013)

Figure 11.2 shows the effects of these time-series elements on data over a period of 13 years. The long-term general direction of the data is referred to as the trend. Notice that even though the data move through upward and downward periods, the general direction, or trend, is increasing. Cycles are patterns of highs and lows through which data move over time periods usually of more than a year. Notice that the data in Figure 11.2 apparently move through two periods, or cycles, of highs and lows over the 13-year period. Time-series data that do not extend over a long period of time may not have enough "history" to show cyclical effects. Seasonal effects, on the other hand, are shorter cycles, which generally occur in time periods of less than one year.
Seasonal effects are often measured by the month, but they may occur by the quarter, or may be measured in as small a time frame as a week or even a day. Note the seasonal effects shown in Figure 11.2 as up-and-down cycles, many of which occur within a one-year period. Irregular variations are rapid changes, or "bleeps", in the data, which occur in even shorter time frames than seasonal effects. Irregular fluctuations can happen as often as day to day. They are subject to momentary change and are often unexplained. Note the irregular fluctuations in the data of Figure 11.2.

5.9.3 Decomposition of a Time Series
Time-series decomposition seeks to separate the effects of each of the four factors on the actual time series. The time-series model used as the basis for assessing the influence of these four components assumes a multiplicative relationship between them. The multiplicative time series model is expressed algebraically as:

y = T × C × S × I

Where:
T = trend
C = cycles
S = seasonal effects
I = irregular fluctuations

In this section of the unit, we examine statistical approaches to quantify the trend and seasonal variations only. These two components account for the most significant proportion of an actual value in a time series. By isolating them, most of an actual time series value can be explained.

5.9.4 Trend Analysis
The long-term trend in a time series may be isolated by removing the medium- and short-term fluctuations (i.e. cyclical, seasonal and random movements) from the series. This results in either a smooth curve or a straight line, depending on the method selected. Two procedures for trend isolation are:
• the moving average method, which produces a smooth curve
• regression analysis, which results in a straight-line trend.

The moving average series is smoother than the original time series. It removes the effect of short-term fluctuations (i.e. seasonal and irregular fluctuations) from the original observations, y, by averaging over these short-term fluctuations.
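The smoothing step described above can be sketched in code. The following illustrative example computes a centred four-quarter moving average over two years of made-up quarterly sales figures; the ratios of the actual values to the smoothed values then isolate the short-term (seasonal and irregular) effects.

```python
# A minimal sketch of a centred four-quarter moving average, assuming
# quarterly data. The sales figures are illustrative, not from the guide.

def centred_moving_average(y, period=4):
    """Average out seasonal and irregular effects, leaving roughly T x C."""
    ma = []
    for i in range(len(y) - period):
        first = sum(y[i:i + period]) / period
        second = sum(y[i + 1:i + 1 + period]) / period
        ma.append((first + second) / 2)  # centre the average on an observation
    return ma

# Two years of quarterly sales with a visible seasonal pattern
sales = [20, 35, 28, 15, 24, 41, 32, 19]
smooth = centred_moving_average(sales)
print(smooth)  # prints: [25.0, 26.25, 27.5, 28.5]

# Ratio of actual to moving average: y / (T x C), which is roughly S x I
ratios = [sales[i + 2] / smooth[i] for i in range(len(smooth))]
```

The smoothed series aligns with observations 3 to 6 of the original data (the first and last two quarters are lost to the averaging window), and it rises steadily, reflecting the underlying trend with the quarterly swings averaged out.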
The moving average value can be seen as reflecting mainly the combined trend and cyclical movements. In symbol terms, for the multiplicative model:

Moving average = (T × C × S × I) / (S × I) = T × C

The moving average technique is an average that is updated, or recomputed, for every new time period being considered. The most recent information is utilised in each new moving average. This advantage is offset by the disadvantages that:
1. it is difficult to choose the optimal length of time for which to compute the moving average, and
2. moving averages do not usually adjust for such time-series effects as trend, cycles, or seasonality.
To determine the optimal length for which to compute the moving averages, we would need to forecast with several different average lengths and compare the errors they produce.

5.9.5 Seasonal Analysis
Seasonal effects are patterns in the data that occur in periods of less than one year. How can we separate out seasonal effects? The ratio-to-moving-average method is used to measure and quantify these seasonal effects. This method expresses the seasonal influence as an index number. It measures the percentage deviation of the actual values of the time series, y, from a base value that excludes the short-term seasonal effects. These base values of a time series represent the trend/cyclical influences only.

5.9.6 Uses of Time Series Indicators
Time series indicators are important planning aids to managers in two ways:
1. To de-seasonalise a time series (i.e. to remove seasonal influences), and so provide a clearer view of the longer-term trend/cyclical movements
2. To create seasonally adjusted trend forecasts of future values of a time series.

5.9.7 Self-Assessment
Let us see what you have learned so far by taking this short self-assessment.
The Self-Assessment for this unit is embedded within your Quantitative Techniques module in myClass. Head on to the quiz to see how you have fared with this section of content! Be sure to complete the self-assessment quiz before you move on to the next section!

6 REFERENCES
Bergquist, T., Jones, S. and Freed, N. (2013). Understanding Business Statistics. John Wiley & Sons.
Black, K. (2013). Business Statistics: For Contemporary Decision Making, 7th edition. John Wiley & Sons.
Wegner, T. (2016). Applied Business Statistics: Methods and Excel-based Applications, 4th edition. Juta: Cape Town, South Africa.