Statistics for Business
Instructor: Prof. Ken Tsang
Room E409-R11
Email: kentsang@uic.edu.hk

TA information
Mr. ZHOU, Min 周敏
Room E409
Tel: 3620620
minzhou@uic.edu.hk

Web-page for this class
• Watch for announcements about this class and download lecture notes from http://www.uic.edu.hk/~kentsang/stat2012/stat2012.htm
• Or from this page: http://www.uic.edu.hk/~kentsang/
• Or from Ispace

Tutorials
• One hour each week
• Time & place to be announced later (we need your input)
• More explanations
• More examples
• More exercises

How is my final grade determined?
• Quizzes: 20%
• Mid-term exam: 20%
• Assignments: 10%
• Final Examination: 50%

Some requirements on this Course
Assignments must be handed in before the deadline. We will tell you your scores for the mid-term test and quizzes so that you know your progress. However, for the final examination, we cannot tell you the score before the AR releases the official results.

UIC Score System

Grade Distribution Guidelines

General Information
• Textbook: Business Statistics in Practice, 5th Edition, Bowerman, O’Connell, Murphree, McGraw Hill International Edition (2009)
• Statistics for the Behavioral Sciences, Frederick J. Gravetter and Larry B. Wallnau, Wadsworth Publishing; 8th edition (December 10, 2008)

Chapter 1
An Overview of Statistics

Chapter Summary
1.1 Populations and Samples
1.2 Ratio, Interval, Ordinal, and Nominative Scales of Measurement
1.3 An Introduction to Survey Sampling

What is statistics?
Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting numerical data to gain more knowledge and make more effective decisions.

What is Statistics?
• Statistics is the branch of mathematical science concerned with making effective use of numerical data relating to a population (groups of individuals or experiments).
• It deals with all aspects of the collection, analysis, interpretation (or explanation) and presentation of such data, as well as the planning of the collection of data (i.e. the design of surveys and experiments).

Data collection and statistical analysis
• Once a sample that is representative of the population is determined, data are collected for the sample members in an observational or experimental setting.
• These data can then be subjected to statistical analysis, serving two related purposes:
– description
– inference

Why do I have to learn Statistics?
• Social policy, medical practice, and business decisions all rely on the proper use of statistics.
• Misuse of statistics can produce subtle but serious errors in the description and interpretation of data, which lead to wrong decisions.
• Even when statistics are correctly applied, the results can be difficult to interpret for those lacking expertise.
• The set of basic statistical skills (and skepticism) that people need to deal properly with information in their everyday lives is referred to as statistical literacy.

Who uses statistics?
Statistical techniques are used in many areas:
• Government
• Marketing
• Quality control
• Medical
• Research (sports, education, politics, psychology…)

Examples in business statistics
• The consumer price index (CPI) is a measure estimating the average price of consumer goods and services purchased by households (a constant basket of goods and services within the same area).
• Gross domestic product (GDP) is the market value of all final goods and services made within the borders of a country in a year.

Recent developments
• There are more and more data around us, because
– they are cheap to obtain & store
• Computational tools are widely available. They are cheap and effective.

The McKinsey Global Institute:

How Companies Learn Your Secrets
By CHARLES DUHIGG
Published: February 16, 2012

Basic Terminology
• Measurement, data
• Variables, Value
• Quantitative, Qualitative
• Population, Sample
• Census
• Descriptive Statistics
• Inferential Statistics

Measurement
The process of determining the extent, quantity, or amount of the variable of interest for a particular item of the population.
• Produces data
• For example, collecting the starting salaries of graduates from last year’s MBA program

Data
• Data can be viewed as the raw material from which information is obtained, just as trees are the raw material from which paper is produced.
• In fact, a good definition of data is "facts or figures from which conclusions can be drawn".

Variables
• A variable is a characteristic that can take different values for different members of a population, and to which a value can be assigned by measurement.
• Height, age, amount of income, province or country of birth, grades obtained at school and type of housing are all examples of variables.

Value
The result of measuring a variable.
• The specific measurement for a particular unit in the population
• For example, the starting salary of one graduate from last year’s MBA program

Quantitative
Values that can be expressed as quantities/numbers (for example, “how much” or “how many”).
• Annual starting salary of a college graduate
• Age and weight of a person

Qualitative
A descriptive category to which the value can belong (a descriptive attribute of a population unit).
• A person’s gender
• A person’s hair color

Population
A population is the set of all the individuals of interest in a particular study.
• For example, if we want to know the starting salaries of all UIC graduates, then the population of interest is the totality of all UIC graduates.

Census
The procedure of systematically acquiring and recording information (taking measurements) about all the members of a given population.
• A census is usually too expensive, too time consuming, and too much effort for a large population

Sample
A sample is a set of individuals selected from a population, usually intended to represent the population in a research study.
• For example: 1,000,000 Chinese college students graduated in 2010
• This is too large for a census
• So, we select a sample of these graduates and study their annual starting salaries

Population – the object of statistical study
• In applying statistics to a scientific, industrial, or societal problem, it is necessary to begin with a population to be studied.
• Populations can be as diverse as "all persons living in a city/country" or "all past and present students of UIC".

Parameter & Statistic
A parameter is a value, usually a numerical value, that describes a population. A parameter may be obtained from a single measurement, or it may be derived from a set of measurements from the population.
A statistic is a value, usually a numerical value, that describes a sample. A statistic may be obtained from a single measurement, or it may be derived from a set of measurements from the sample.

Sampling error
• Sampling error is the discrepancy, or amount of error, that exists between a sample statistic and the corresponding population parameter.

Example of Sampling error

Descriptive Statistics
Descriptive statistics are procedures to organize, summarize, and present data in an informative way.
EXAMPLE 1: The average test score for the students in a class, to give a descriptive sense of the typical scores.
EXAMPLE 2: According to Consumer Reports, there were 2.5 problems per one copying machines reported during 2009.

Descriptive statistics
• Descriptive statistics summarize/characterize the population data by describing what was observed in the sample numerically (in tables) or graphically.
• Numerical descriptors include the mean and standard deviation for continuous data types (like heights or weights), while frequency and percentage are more useful for describing categorical data (like race, gender…).

Descriptive Statistics
To describe the important aspects of a set of measurements.
• For example, for a set of starting salaries, we want to know:
– How much to expect (mean)
– What is a high versus low salary
• If the population is small, we could take a census and would not need to make statistical inferences
• But if the population is too large, then …

Inferential Statistics
The science that allows us to study samples and then make generalizations about the population from which they were selected (i.e. to determine, in a statistical sense, the population parameters from sample statistics).
• For example, use a sample of starting salaries to estimate the important aspects of the population of starting salaries.

Inferential statistics
• Inferential statistics (or inductive statistics) uses patterns in the sample data to draw inferences about the population represented.
• These inferences may take the form of:
– answering yes/no questions about the data (hypothesis testing),
– estimating numerical characteristics of the data (estimation),
– describing associations within the data (correlation),
– modeling relationships within the data (regression).

Examples of inferential statistics
Example 1: Each month, 1000 families are chosen at random. A popularity index of TV channels is computed from the data obtained from these families.
Example 2: The accounting department of a large firm selects a sample of invoices in order to check the accuracy of all the invoices of the company.
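
The descriptive and inferential ideas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the salary and hair-color values are made up for the example (they do not come from the slides), the function names are hypothetical, and the interval uses the large-sample normal value 1.96, which is only a rough approximation for a sample this small.

```python
import math
import statistics
from collections import Counter

# Hypothetical sample of annual starting salaries (in thousands); illustrative only.
salaries = [52, 48, 61, 55, 47, 58, 50, 63, 49, 57]

# Hypothetical qualitative (categorical) data; illustrative only.
hair_colors = ["black", "brown", "black", "blond", "black", "brown"]


def describe_quantitative(values):
    """Descriptive statistics for quantitative data: mean and sample standard deviation."""
    return statistics.mean(values), statistics.stdev(values)  # stdev divides by n - 1


def describe_qualitative(values):
    """Descriptive statistics for categorical data: frequency and percentage per category."""
    freq = Counter(values)
    percent = {category: 100 * count / len(values) for category, count in freq.items()}
    return freq, percent


def approximate_interval(values, z=1.96):
    """Inferential step: a rough 95% interval estimate for the population mean.

    Assumes a normal approximation (z = 1.96); this is an assumption of the sketch.
    """
    mean, sd = describe_quantitative(values)
    std_error = sd / math.sqrt(len(values))
    return mean - z * std_error, mean + z * std_error


sample_mean, sample_sd = describe_quantitative(salaries)
freq, percent = describe_qualitative(hair_colors)
low, high = approximate_interval(salaries)

print("sample mean:", sample_mean)
print("sample standard deviation:", round(sample_sd, 2))
print("frequencies:", dict(freq))
print("percentages:", percent)
print("rough interval for the population mean:", (round(low, 2), round(high, 2)))
```

The first two functions only summarize the data at hand (description); the third uses the sample to say something about the unseen population (inference).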

Difference between descriptive & inferential statistics
• Descriptive statistics are distinguished from inferential statistics in that descriptive statistics aim to quantitatively summarize a data set, rather than being used to support inferential statements about the population that the data are thought to represent.
• Descriptive statistics: get a “feel” (characterization) for the data
• Inferential statistics: draw conclusions from the data

Example: Descriptive & Inferential statistics

Data and Variables
• Variables are qualitative or quantitative attributes that characterize a population/sample.
• Data (plural of "datum", which is seldom used) are typically the results of measurements of a set of variables.

Types of Variables
For a Qualitative or Attribute Variable, the characteristic being studied is nonnumeric.
• Gender
• Eye color
• Type of car

In a Quantitative Variable, information is reported numerically.
• Balance in your checking account
• Final score for the students in a class
• Number of children in a family

Quantitative variables can be classified as either Discrete or Continuous.
A Discrete Variable consists of separate, indivisible values. There are “gaps” between possible values of the variable.
Example: the number of bedrooms in a house, or the number of hammers sold at the local hardware store (1, 2, 3, …, etc.).
A Continuous Variable can assume any value within a specified range. There are infinitely many possible values between any 2 observed values.
• The pressure in a tire
• The weight of a pork chop
• The height of students in a class

Summary of Types of Variables
DATA
• Qualitative or attribute (type of car owned)
• Quantitative or numerical
– discrete (number of children)
– continuous (time taken for an exam)

Scales of Measurement
There are four scales of data:
• Nominal
• Ordinal
• Interval
• Ratio

Nominal data
Nominal Scales: Data are classified into categories.
But the ordering of categories is not meaningful. These are:
– Identifier or name
– Unranked categorization
• Example: gender, eye or skin color

Scales of Measurement
The categories of a nominal scale variable must be:
• Mutually exclusive: EVERY individual (or object or measurement) must appear in ONLY ONE category.
• Exhaustive: EVERY individual (or object or measurement) must appear in AT LEAST ONE of the categories.

Scales of Measurement
Ordinal Scale: Orders are meaningful in an ordinal scale, but differences are not.
During a taste test of 4 soft drinks, Coca Cola was ranked number 1, Sprite number 2, Pepsi number 3, and Root Beer number 4. Can we say Coca Cola is 2 better than Pepsi?

Ordinal data
• Ordinal data
– All characteristics of nominal data plus…
– Rank-order categories
– Ranks are relative to each other
• Example: Low (1), moderate (2) or high (3) risk

Scales of Measurement
Interval Scales: Both the orders and the differences are meaningful, but the ratio is not.
Example: temperature on the Fahrenheit scale.

Interval data
• All of the characteristics of ordinal data plus…
• Measurements are on a numerical scale with an arbitrary zero point
– The “zero” is assigned: it is nonphysical and not meaningful
– Zero does not mean the absence of the quantity that we are trying to measure

Interval data Continued
• Can only meaningfully compare values by the interval between them
– Cannot compare values by taking their ratios
– “Interval” is the arithmetic difference between the values
• Example: temperature
– 0 °F means “cold,” not “no heat”
– 80 °F is not twice as warm as 40 °F

Scales of Measurement
Ratio Scales: Orders, differences and ratios are all meaningful for this level of measurement.
• Miles traveled by a sales representative in a month
• Monthly income of surgeons

Ratio data
• All the characteristics of interval data plus…
• Measurements are on a numerical scale with a meaningful zero point
– Zero means “none” or “nothing”
• Values can be compared in terms of their interval and ratio
– $30 is $20 more than $10
– $30 is 3 times as much as $10
– $0 means no money

Ratio data Continued
• In business and finance, most quantitative variables are ratio variables, such as anything to do with money
– Examples: Earnings, profit, loss, age, distance, height, weight

Qualitative Variables
• Descriptive categorization of population or sample units
• Two types:
– Nominal
– Ordinal

Quantitative Variables
• Numerical values represent quantities measured with a fixed or standard unit of measure
• Two types:
– Interval
– Ratio

Summary of Types of Variables
DATA
• Qualitative or attribute (type of car owned)
– Nominal
– Ordinal
• Quantitative or numerical
– Interval
– Ratio
– discrete (number of children)
– continuous (time taken for an exam)

How to choose a sample?
• For a sample to be used as a guide to an entire population, it is important that it is truly representative of that overall population.
• Representative sampling assures that inferences and conclusions can be safely extended from the sample to the population as a whole.
• A major problem lies in determining the extent to which the sample chosen is actually representative.

Sampling
• Sampling is that part of statistical practice concerned with the selection of individual observations intended to yield accurate knowledge about a population of concern, especially for the purposes of statistical inference.

Representative Sample
• A representative sample is not easy to obtain because of random and non-random variations in the sample.
• Statistics offers methods for designing experiments to choose a representative sample of the overall population, strengthening its capability to discern truths about the population.

Random sampling
• Random sampling is a technique for selecting a sample for study from a population. Each individual is chosen entirely by chance, hence unpredictably, and each member of the population has a known, but possibly unequal, chance of being included in the sample.
• By using random sampling, the likelihood of bias (being non-representative) is reduced.
• Simple random sampling is the basic sampling technique in which each individual is chosen entirely by chance with an equal probability of being included in the sample, i.e. each member of the population is equally likely to be chosen at any stage in the sampling process.

Random process
• A random process is a repeating process whose outcomes follow no describable deterministic pattern, but follow a probability distribution.

Probability and Mathematical statistics
• The fundamental mathematical concept employed in understanding randomness is probability.
• Mathematical statistics (statistical theory) is the branch of applied mathematics that uses probability theory and analysis to examine the theoretical basis of statistics.

Chapter 1: GOALS
When you have completed this chapter, you will be able to:
ONE Understand why we study statistics.
TWO Explain what is meant by descriptive statistics and inferential statistics.
THREE Distinguish between qualitative and quantitative variables.
FOUR Distinguish between discrete and continuous variables.
FIVE Distinguish among the nominal, ordinal, interval, and ratio levels of measurement.
SIX Define the terms mutually exclusive and exhaustive.
SEVEN Describe the basic methods of sampling.
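
As a closing illustration of simple random sampling and sampling error, here is a minimal Python sketch. Everything in it is an assumption made for the example: the population is 1,000 simulated starting salaries (not real data), and the function names `simple_random_sample` and `sampling_error` are hypothetical helpers, not part of the course material.

```python
import random
import statistics


def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; every member is equally likely to be chosen."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(population, n)


def sampling_error(sample, population):
    """Sample statistic (mean) minus the corresponding population parameter (mean)."""
    return statistics.mean(sample) - statistics.mean(population)


# Hypothetical population: simulated starting salaries (in thousands) of 1,000 graduates.
population_rng = random.Random(2012)
population = [round(population_rng.gauss(50, 8), 1) for _ in range(1000)]

sample = simple_random_sample(population, 50, seed=1)

print("population mean (parameter):", round(statistics.mean(population), 2))
print("sample mean (statistic):", round(statistics.mean(sample), 2))
print("sampling error:", round(sampling_error(sample, population), 2))
```

Running the sketch with different seeds gives different samples and therefore different sampling errors, which is the random variation that inferential statistics has to account for.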