Chapter 1: Why Study Statistics?

Statistical techniques are used to make many decisions that involve data. An understanding of statistical methods will help us make decisions more effectively. Statistics turns data into information.

What is Meant by Statistics?
Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting numerical data, which can then be used as a basis for inference to assist in making better decisions.

Who Uses Statistics?
Statistical techniques are used extensively by marketers, investors, accountants, consumers, professional sports people, hospital administrators, educators, politicians, and physicians. What examples can you think of?
See examples:
• Operations: quality control, reliability
• Advertising: household surveys, TV viewing habits
• Strategists: forecasting, planning, risk minimization

Data: Raw facts or measurements of interest; values assigned to observations or measurements.
Information: Data that are transformed into useful facts that can be used for a specific purpose, such as making a decision.

Low temperature in Celsius for NY, first week of January 2018 (source: www.accuweather.com)
Jan 1: -14
Jan 2: -16
Jan 3: -9
Jan 4: -7
Jan 5: -13
Jan 6: -14
Jan 7: -15
Average: ______
Meaningful: lowest January temperature historically for NY

Data Set: A collection of data points.
Database: A collection of data points that contains many rows (records) and columns (fields).

Data Sources
Primary: Data collected for your own use.
Secondary: Data collected by someone else.
Which kind of data was the temperature data?
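As a small illustration of turning data into information, the sketch below computes the average of the seven daily lows listed above, along with the coldest value. The variable names (`daily_lows_c`, `average_low_c`, `coldest_c`) are illustrative and do not come from the text; the values are the ones in the temperature listing.

```python
from statistics import mean

# Daily low temperatures (Celsius) for NY, Jan 1-7, 2018, as listed above.
daily_lows_c = [-14, -16, -9, -7, -13, -14, -15]

# The raw values are data; a summary such as the mean is information
# that can support a decision.
average_low_c = mean(daily_lows_c)
coldest_c = min(daily_lows_c)

print(f"Average low: {average_low_c:.1f} C")
print(f"Coldest low: {coldest_c} C")
```

The single average is easier to act on than seven separate readings, which is the sense in which statistics turns data into information.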
Primary Data
Advantages: collected by the person or organization who uses the data.
Disadvantages: can be expensive and time consuming to gather.

Secondary Data
Advantages: readily available, less expensive to use.
Disadvantages: no control over how the data was collected; less reliable unless collected and recorded accurately.

Data Sources: Existing Sources
Data needed for a particular application might already exist within a firm, e.g. detailed information on customers, suppliers, and employees.
Government agencies are another important source of data.
Substantial amounts of business and economic data are available from organizations that specialize in collecting and maintaining data.
Data are also available from a variety of industry associations and special-interest organizations.

Statistical Studies
In experimental studies, the variables of interest are first identified. Then one or more factors are controlled so that data can be obtained about how the factors influence the variables.
In observational (non-experimental) studies, no attempt is made to control or influence the variables of interest. The survey is perhaps the most common type. If a survey is conducted properly, no attempt is made to control or influence the variable of interest.
See EG of bias: bias can occur when a question is stated in a way that encourages particular answers.
Observing; Experiments: treatments in a controlled environment; Surveys: subjects are asked questions.

Data Acquisition: Cost-Benefit Analysis
Time Requirement: Information might no longer be useful by the time it is available.
Cost of Acquisition: The information must be worth the cost of acquiring it.
Data Errors: Part of the cost of acquiring worthwhile information is taking care in choosing how the data is measured, so that the study measures what it is supposed to measure.

Qualitative vs Quantitative Data
Qualitative (descriptive): ex. eye color, political party
Quantitative (counted/measured): ex.
number of children, weight, voltage

EG: Quantitative Data/Variables
Ordinary arithmetic operations are meaningful only with quantitative data.
Discrete variables assume only certain counted values. There are gaps when we graph discrete data on a number line.
A continuous variable can assume any value within a specified range. We observe a solid line on the number line with no gaps.

Levels of Measurement
There are four levels of measurement for data.

Nominal (qualitative): Labels or names used to identify an attribute. A nonnumeric label or a numeric code may be used. Examples? Postal codes, hair color.

Ordinal (qualitative): Like nominal data, but the order or rank of the data is meaningful. The differences between data values cannot be determined or are meaningless. A nonnumeric label or a numeric code may be used. Examples? Education level (master's and doctorate).

Interval (quantitative): Similar to the ordinal level, but meaningful amounts of difference between data values can be determined. There is no natural zero point: the “zero” is assigned by the scale and does not mean the absence of the quantity we are trying to measure. Interval data is always numeric. Examples? Calendar year, temperature (in Celsius or Fahrenheit).

Ratio (quantitative): The interval level with an inherent (true) zero point. Both differences and ratios are meaningful at this level of measurement. Ratio data is always numeric. Examples? Income.

Cross-Sectional and Time Series Data
Time series data is collected over a range of time periods.
Cross-sectional data is collected at the same or approximately the same point in time.
See Tables 1.4, 1.5, and 1.6 in the text.

Branches of Statistics
Descriptive Statistics: Methods of organizing, summarizing, and presenting data in an informative way.
EG: census of populations; individual responses of registered voters regarding their choice of PM of Canada.
Inferential Statistics: Making claims or conclusions about a population by examining sample results.
Predictive Statistics: Analyzing past data to predict future values and make decisions.

Population vs Sample
• Population: represents all possible subjects that are of interest in a particular study.
• Sample: refers to a portion of the population that is representative of the population from which it was selected.

Parameter vs Statistic
• Parameter: a described characteristic of a population.
• Statistic: a described characteristic of a sample.

Why Sample the Population?
• Contacting the whole population would often be too time consuming.
• The cost of studying all the items in a population may be prohibitive.
• The sample results are often adequate.
• The destructive nature of some tests, or the physical impossibility of checking all items in the population, makes it impossible to examine the entire population.

Inferential Statistics: see Figure 1.8.
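The relationship between a parameter and a statistic can be sketched with a short simulation. This is only a sketch under stated assumptions: the population of 10,000 values is synthetic (generated with a fixed random seed, not real data), and the names `population`, `sample`, and the sample size of 200 are hypothetical choices that do not come from the text.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 synthetic measurements (not real data).
population = [random.gauss(50, 10) for _ in range(10_000)]

# Parameter: a characteristic of the whole population.
population_mean = mean(population)

# Statistic: the same characteristic computed from a sample.
# random.sample draws without replacement (a simple random sample).
sample = random.sample(population, 200)
sample_mean = mean(sample)

print(f"Population mean (parameter): {population_mean:.2f}")
print(f"Sample mean (statistic):     {sample_mean:.2f}")
print(f"Difference:                  {abs(population_mean - sample_mean):.2f}")
```

Here only 2% of the population is examined, yet the sample mean usually lands close to the population mean. How close depends on the random draw, and quantifying that uncertainty is the job of inferential statistics.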