Math 331 - DATA ANALYSIS
Descriptive Statistics: Measures of Central Tendency and Measures of Variation
Olive N. Mancera, PhD
Instructor

DESCRIPTIVE STATISTICS
• Descriptive statistics are brief descriptive coefficients that summarize a given data set. The data set can represent either the entire population or a sample of it.

Descriptive Statistics
1. Organize and summarize data using numbers and graphs.
2. Describe the characteristics of the sample or population.
3. Involve collecting, organizing, summarizing, and presenting the data.
4. Use charts, graphs, and tables.
5. Use measures of central tendency and measures of dispersion.
6. Data set is small.

Examples of Descriptive Statistics
• Graphical
  - Arranging data in tables
  - Bar graphs and pie charts
• Numerical
  - Percentages
  - Averages
  - Range

Example of Data Analysis
Suppose there are 20 statistics classes at your university, and you have collected the ages of the students in one class.
Ages of students in your statistics class: 19, 21, 18, 18, 34, 30, 25, 26, 24, 24, 19, 18, 21, 49, 27.
A descriptive question that could be asked about this data: "What is the most common age of students in your statistics class?"

Limitations of Descriptive Statistics
Descriptive statistics only allow you to make summaries about the people or objects that you have actually measured. You cannot use the data you have collected to generalize to other people or objects (that is, to use data from a sample to infer the properties or parameters of a population).

DESCRIPTIVE STATISTICS
1. Measures of Central Tendency
2. Measures of Variation

MEASURES OF CENTRAL TENDENCY
• A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set. For this reason, measures of central tendency are also called measures of central location. They are also classed as summary statistics.
• The mean (often called the average), the median, and the mode are the measures of central tendency.
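As a minimal sketch, Python's standard-library statistics module can answer the descriptive question about the ages example and compute all three measures of central tendency. This assumes Python 3.8 or later (for multimode); the variable names are illustrative, and the data are the fifteen ages listed in the example.

```python
from statistics import mean, median, multimode

# Ages of students in one statistics class (the fifteen values from the example)
ages = [19, 21, 18, 18, 34, 30, 25, 26, 24, 24, 19, 18, 21, 49, 27]

# multimode returns every value tied for the highest frequency
most_common = multimode(ages)

print("Most common age(s):", most_common)   # → [18]
print("Mean age:", round(mean(ages), 2))    # → 24.87
print("Median age:", median(ages))          # → 24
```

The age 18 occurs three times, more often than any other age, so it answers the question "What is the most common age?"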
The Arithmetic Mean
• The mean (or average) is the most popular and well-known measure of central tendency. It can be used with both discrete and continuous data. The mean is equal to the sum of all the values in the data set divided by the number of values in the data set.
• Formula: If a data set contains n values $x_1, x_2, \ldots, x_n$, the sample mean, denoted by $\bar{x}$, is
  $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$   or   $\bar{x} = \frac{\sum x}{n}$
• For the population mean, use the lowercase Greek letter "mu", denoted $\mu$.

Computing the Mean
Example: The grades in Math 10 of 10 students are 87, 84, 85, 85, 86, 90, 79, 82, 78, and 76. What is the mean grade?
Solution:
  $\bar{x} = \frac{\sum x}{n} = \frac{87+84+85+85+86+90+79+82+78+76}{10} = \frac{832}{10} = 83.2$

Important Properties of the Mean
• A set of data has only one mean.
• It includes every value in the data set in the calculation.
• The mean can be applied to interval and ratio data.
• The mean is very useful for comparing two or more data sets.
• The mean is affected by extremely small or large values in a data set.

When Not to Use the Mean
• The mean is susceptible to the influence of outliers, or extreme values. Outliers are values that are unusual compared with the rest of the data set because they are especially small or large in numerical value.
Example: Consider the wages of the staff at a factory.

  Staff   1       2       3       4       5       6       7       8       9       10
  Wage    15000   18000   16000   14000   15000   15000   12000   17000   90000   95000

• The mean wage for these ten staff members is 30,700, which does not reflect the typical wage, since most workers earn between 12,000 and 18,000. The mean is being skewed by the two large wages.
• When the data is skewed, the mean loses its ability to provide the best central location for the data, because the skewed values drag it away from the typical value.

When Is the Mean the Best Measure of Central Tendency?
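The mean formula and the factory-wage example can be checked with a short sketch. Here arithmetic_mean is a hypothetical helper written for illustration; it applies the formula directly (sum of the values divided by n). The standard-library median is used only to contrast the two measures on the skewed wage data.

```python
from statistics import median

def arithmetic_mean(values):
    """Hypothetical helper: sample mean = (x1 + x2 + ... + xn) / n."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

# Math 10 grades of 10 students
grades = [87, 84, 85, 85, 86, 90, 79, 82, 78, 76]
print(arithmetic_mean(grades))   # → 83.2  (832 / 10)

# Factory wages: two large values pull the mean upward
wages = [15000, 18000, 16000, 14000, 15000, 15000, 12000, 17000, 90000, 95000]
print(arithmetic_mean(wages))    # → 30700.0, above every typical wage
print(median(wages))             # → 15500.0, inside the 12,000 to 18,000 range
```

The median of the wages stays close to what most workers earn, which is why the mean is a poor choice for this skewed data set.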
• The mean is usually the best measure of central tendency when the data distribution is continuous and symmetrical, as in a normal distribution. However, the choice depends on what you are trying to show with your data.

Median
• The median is the middle score of a set of data that has been arranged in order of magnitude. It is less affected by outliers and skewed data than the mean.
Example: Find the median of the data below.
65 55 89 56 35 14 56 55 87 45
Solution:
1. Rearrange the data in order of magnitude.
14 35 45 55 55 56 56 65 87 89
2. Because there is an even number of values (10), there are two middle scores, the 5th and 6th (55 and 56). Average them to get the median: (55 + 56) / 2 = 55.5. (When the number of values is odd, the median is the single middle value.)

When Is the Median the Best Measure of Central Tendency?
• The median is considered the best representative of the central location of the data when the data is skewed or has outliers (extreme values).
• It is best used with ordinal data (e.g., socioeconomic status, education level, income level, satisfaction rating).
• It is also used with skewed interval data (e.g., age).

Mode
• The mode is the most frequent score in the data set.
• It is the least used of the measures of central tendency, but it is the only one that can be used with nominal data (e.g., the result of a test recorded as "pass" or "fail").
• It is commonly used for categorical data and rarely used with continuous data.

Properties of the Mode
• Found by locating the most frequently occurring value.
• The easiest average to compute.
• A data set can have more than one mode or no mode at all.
• Not affected by extremely small or large values.
• Can be applied to nominal, ordinal, interval, and ratio data.

Types of Mode
• Unimodal – the data set has only one value that occurs with the greatest frequency.
• Bimodal – the data set has two values that share the greatest frequency; both values are considered modes.
• Multimodal – the data set has more than two modes.
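The two-step median procedure (sort, then take the middle score or average the two middle scores) can be sketched as follows. median_of is a hypothetical name used only for illustration; Python's statistics.median gives the same result.

```python
def median_of(values):
    """Hypothetical helper: median by sorting and taking the middle."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)          # Step 1: arrange in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: one middle score
        return ordered[mid]
    # even count: average the two middle scores (positions n/2 and n/2 + 1)
    return (ordered[mid - 1] + ordered[mid]) / 2

scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45]
print(median_of(scores))   # → 55.5, the average of 55 and 56
```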
• No mode – every value in the data set occurs with the same frequency.

Example:
• An operations manager in charge of a company's manufacturing keeps track of the number of LCD televisions manufactured each day. The following data represent the number of LCD televisions manufactured over the past three weeks: 20, 18, 19, 25, 20, 21, 20, 25, 30, 29, 28, 29, 25, 25, 27, 26, 22, and 20. Find the mode of the data set.
Solution:
1. Arrange the data set from least to greatest.
18 19 20 20 20 20 21 22 25 25 25 25 26 27 28 29 29 30
2. Identify which number(s) occur most frequently: 20 and 25 each occur four times, and no other value occurs more than twice.
3. Determine the type of mode: since two values share the greatest frequency, the data set is bimodal, with modes 20 and 25.

Summary of When to Use the Mean, Median, and Mode
[Table: Summary of When to Use the Mean, Median, and Mode]

Guide on Choosing the Most Appropriate Measure of Central Tendency
[Figure: Guide on Choosing the Most Appropriate Measure of Central Tendency]

Thank you for listening!
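The mode-finding steps from the LCD television example can be sketched with collections.Counter, which tallies how often each value occurs. classify_modes is a hypothetical helper written for illustration. It follows the definitions in these notes, including the convention that a data set whose values all occur equally often has no mode; note that statistics.multimode would instead return every value in that case.

```python
from collections import Counter

def classify_modes(values):
    """Hypothetical helper: return (modes, type of mode) for a data set."""
    counts = Counter(values)                 # frequency of each value
    if not counts:
        return [], "no mode"
    top = max(counts.values())               # greatest frequency
    modes = sorted(v for v, c in counts.items() if c == top)
    if len(modes) == len(counts):            # every value occurs equally often
        return [], "no mode"
    if len(modes) == 1:
        return modes, "unimodal"
    if len(modes) == 2:
        return modes, "bimodal"
    return modes, "multimodal"

# LCD televisions manufactured per day over three weeks
lcd_per_day = [20, 18, 19, 25, 20, 21, 20, 25, 30, 29, 28, 29, 25, 25, 27, 26, 22, 20]
print(classify_modes(lcd_per_day))   # → ([20, 25], 'bimodal')
```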