Notes by Gabriel Okello

INTRODUCTION TO STATISTICS AND DATA

Description
This lesson introduces the basic concepts of statistics and data. It also introduces different statistical software packages for data analysis, including SPSS.

Learning Outcomes
At the end of this lesson, students will be able to:
1. Understand different statistical concepts
2. Recognize the wide range of applications of statistics in business
3. Identify the different types and sources of data
4. Compare the four levels of data measurement: nominal, ordinal, interval, and ratio
5. Distinguish between different statistical software packages

Lesson Outline
1. Basic statistical concepts
2. Application of statistics
3. Types, sources and classification of data
4. Data measurements
5. Introduction to statistical software

Basic Statistical Concepts
[Refer to Gupta & Kapoor (2000) Chapter 1]

Statistics is the art and science of collecting, analyzing, presenting and interpreting data in order to draw conclusions.

Statistics has two branches: descriptive statistics and inferential statistics.

Descriptive statistics: the part of statistics concerned with the description and summarization of data. It also refers to the presentation of a body of data in the form of tables, charts, graphs, and other forms of graphic display, together with numerical measures.
Inferential statistics: the part of statistics concerned with drawing conclusions about the properties of the whole population from a sample.
Statistical inference: the process of using data from a sample to make estimates and test hypotheses about the characteristics of the population.

Inferential statistics has two branches: parametric statistics and nonparametric statistics.

Parametric statistics: requires certain assumptions about the distribution of the data. It requires interval or ratio data.
Nonparametric statistics: makes few or no assumptions about the distribution of the data. It can be used with nominal and ordinal data.
Population: the total collection of elements of interest in a particular study.
Sample: a sub-group of the population.
Parameter: a numerical measure describing a characteristic of the population.
Statistic: a numerical measure describing a characteristic of a sample.

A statistic is usually used to estimate a parameter. The parameter is the true value we want to find out.

Application of Statistics
[Refer to Gupta & Kapoor (2000) Chapter 1]

Statistics is used today by practically every profession:
– Economists use it to test the efficiency of alternative production techniques
– Businesspeople may use it to identify the product design or package that maximizes sales
– Sociologists use it to analyze the results of a drug rehabilitation program
– Industrial psychologists use it to examine workers' responses to the plant environment
– Political scientists use it to forecast voting patterns
– Physicians use it to test the effectiveness of a new drug
– Chemists use it to produce cheaper fertilizers
– Financial analysts use it to guide their investment recommendations

Some essential functions of statistics:
– Presents facts and figures in a definite form
– Simplifies the complexity of data
– Facilitates comparison between different sets of observations
– Supports the formulation and testing of hypotheses
– Helps in formulating plans and policies in different fields
– Supports deriving valid inferences
– Supports forecasting of events and trends

Data
Data are facts and figures collected, analyzed and summarized for presentation and interpretation. Data can also be defined as the values of a variable.
– Elements are the entities on which data are collected.
– A variable is a characteristic of interest for the elements; that is, a characteristic that varies from one person or thing to another.
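
The relationship between elements, variables, parameters and statistics defined above can be illustrated with a short Python sketch. Everything in it is hypothetical: the population of 1,000 shops, the "monthly sales" variable, and the chosen mean and spread are invented purely for illustration and do not come from the lesson.

```python
import random
import statistics

# Hypothetical population: monthly sales (in units) for 1,000 shops.
# Each shop is an element, "monthly sales" is the variable,
# and the recorded numbers are the data.
random.seed(42)  # fixed seed so the run is repeatable
population = [random.gauss(500, 80) for _ in range(1000)]

# Parameter: a numerical measure computed from the whole population.
population_mean = statistics.mean(population)

# Sample: a sub-group of the population (here, 50 shops drawn at random).
sample = random.sample(population, 50)

# Statistic: the same measure computed from the sample only.
sample_mean = statistics.mean(sample)

print(f"Parameter (population mean): {population_mean:.1f}")
print(f"Statistic (sample mean):     {sample_mean:.1f}")
print(f"Estimation error:            {sample_mean - population_mean:.1f}")
```

In practice the whole population is rarely observed, so the parameter is unknown and the sample statistic serves as its estimate. The sketch computes both only to show how close the estimate can be.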
Data can be classified:
– By scale of measurement
– As either qualitative or quantitative
– As either cross-sectional or time-series
– As either primary or secondary

Scales of Measurement / Levels of Data Measurement
There are four measurement scales: nominal, ordinal, interval and ratio (NOIR).

Ratio Scale
Ratio-level data measurement is the highest level of data measurement. Ratio data have the same properties as interval data, but ratio data have an absolute zero, and the ratio of two numbers is meaningful. Examples of ratio data are height, weight, time, volume, and Kelvin temperature. With ratio data, a researcher can state that 180 pounds of weight is twice as much as 90 pounds or, in other words, make a ratio of 180:90. Many of the data gathered by machines in industry are ratio data.

Interval Scale
In an interval scale, the numerals assigned to each measure are ranked in order and the intervals between numerals are equal. Interval-level data measurement is the next to the highest level of data measurement, in which the distances between consecutive numbers have meaning and the data are always numerical. The distances represented by the differences between consecutive numbers are equal; that is, interval data have equal intervals. An example of interval measurement is Fahrenheit temperature. With Fahrenheit temperature numbers, the temperatures can be ranked, and the temperature differences between consecutive readings, such as 20°, 21°, and 22°, are the same. Because zero on the Fahrenheit scale is arbitrary, ratios of interval values are not meaningful.

Ordinal Scale
Ordinal-level data measurement is higher than the nominal level. In addition to the nominal-level capabilities, ordinal-level measurement can be used to rank or order objects. Examples of variables that can be measured on the ordinal scale include: social class, military rank, teaching staff rank, etc.

Nominal Scale
A nominal scale is considered to be the lowest level of measurement of a variable. This type of scale merely groups subjects, units or cases from the sample into categories.
Subjects or cases in each category share some common set of characteristics. Numbers representing nominal-level data (the word "level" is often omitted) can be used only to classify or categorize. Variables that can only be measured on the nominal scale include: sex, race, marital status, employment status, language, roofing materials, religion, shape of a building, colour, etc.

Qualitative and Quantitative Data
Data can be either qualitative or quantitative.
– Quantitative data are data sets that are numerical in nature.
– Qualitative data are data sets that are non-numerical in nature.

Cross-sectional and Time Series Data
Data can be classified as either cross-sectional or time-series, especially when analyzing data.
– Cross-sectional data are data collected at the same or approximately the same point in time.
– Time series data are data collected sequentially over time.

Data Sources
Data can be obtained from existing sources (secondary data) or collected for the first time (primary data) using various methods.

Data collection methods:
– Questionnaires
– Experiments
– Interviews
– Observation checklists

Organizing Data
Some situations generate an overwhelming amount of data, which must be organized for presentation. Data can be presented using:
– Tables (frequency distributions)
– Graphs/charts
– Numerical measures

The type of table and graph depends on the type of data: qualitative (non-numerical) or quantitative (numerical).
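
As a rough illustration of a tabular presentation, the Python sketch below builds frequency, relative frequency and percentage distributions for a small set of qualitative (nominal) data. The ten survey responses are hypothetical and were invented for this example.

```python
from collections import Counter

# Hypothetical answers to "What is your employment status?" (nominal data)
responses = [
    "employed", "self-employed", "employed", "unemployed", "employed",
    "student", "self-employed", "employed", "student", "employed",
]

# Frequency: how many times each category occurs
frequency = Counter(responses)
total = sum(frequency.values())

# One row per category: (category, frequency, relative frequency, percentage)
table = []
for category, count in frequency.most_common():
    relative = count / total
    table.append((category, count, relative, relative * 100))

# Print the frequency distribution as a simple text table
print(f"{'Category':<15}{'Freq':>6}{'Rel. freq':>11}{'Percent':>9}")
for category, count, relative, percent in table:
    print(f"{category:<15}{count:>6}{relative:>11.2f}{percent:>8.0f}%")
```

The relative frequencies sum to 1 and the percentages to 100, which is a quick check that no observation was dropped when building the table.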
Summarizing Qualitative Data
– Tabular methods: frequency, relative frequency and percentage distributions
– Graphical methods: bar charts and pie charts

Summarizing Quantitative Data
– Tabular methods: frequency, relative frequency and percentage distributions; cumulative frequency, relative cumulative frequency and percentage cumulative frequency distributions
– Graphical methods: line graphs, dot plots, stem and leaf plots, histograms, boxplots, scatterplots and ogives

Understanding Symbols Used in the Formulas
[Refer to Gupta & Kapoor (2000) Chapter 1 for formulas used in the chapters]

Introduction to Statistical Software
Statistical software is a computer program designed to conduct statistical analyses. Several statistical software packages are available. Common ones are:
– SPSS
– STATA
– SAS
– Eviews
– Matlab
– Minitab
– R (statistical computing)
– Excel
– NVivo

The choice of statistical software depends on:
– The type of data
– The type of analyses to be done
– Cost
– Institutional preferences
– Learning curve

Further Reading
– Read Gupta & Kapoor (2000) Chapter 1
– Read Wonnacott & Wonnacott (1990) Chapter 1
– Watch video clips on introduction to statistics, data and statistical software:
  – Basic Concepts and Terminology: https://www.youtube.com/watch?v=zlfwdsEDC4Q
  – Choosing appropriate statistical software: https://www.youtube.com/watch?v=musm3zmPThI

Week 1 Activity
1. With examples, distinguish between a variable and data.
2. Describe the two major branches of statistics, with examples.
3. Using examples, describe the following levels of data measurement:
   a. Ordinal
   b. Ratio
   c. Interval
   d. Nominal
4. What are the limitations of statistics?