Descriptive-Statistics-EDDA
PhD Math 331: DATA ANALYSIS
Descriptive Statistics
Measures of Central Tendency
Measures of Variation
Olive N. Mancera
Instructor
DESCRIPTIVE STATISTICS
• Descriptive statistics are brief descriptive coefficients that summarize a given data set, which can be either a representation of the entire population or a sample of the population.
Descriptive Statistics
1. Organizing and summarizing data using numbers and graphs
2. Describing the characteristics of the sample or population
3. Collecting, organizing, summarizing, and presenting the data
4. Charts, graphs, and tables
5. Measures of Central Tendency, Measures of Dispersion
6. Data set is small
Examples of Descriptive Statistics
• Graphical
- Arrange data in tables
- Bar Graphs and Pie Charts
• Numerical
- Percentages
- Averages
- Range
Example of Data Analysis
Let us say there are 20 statistics classes at your university, and you’ve collected the ages of the students in one class. Ages of students in your statistics class: 19, 21, 18, 18, 34, 30, 25, 26, 24, 24, 19, 18, 21, 49, 27.
A descriptive question that could be asked about this data:
“What’s the most common age of students in your statistics class?”
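One way to answer this descriptive question is to count how often each age occurs. The sketch below is a minimal illustration using Python's standard library: `collections.Counter` builds the frequency table, and `statistics.multimode` returns every value tied for the highest frequency (so ties would not be silently dropped). The `ages` list is the data from the example.

```python
from collections import Counter
from statistics import multimode

# Ages of the 15 students in the statistics class (from the example)
ages = [19, 21, 18, 18, 34, 30, 25, 26, 24, 24, 19, 18, 21, 49, 27]

# Frequency table: how many students have each age
counts = Counter(ages)
print(counts.most_common(3))

# Every age tied for the highest frequency
most_common_ages = multimode(ages)
print(most_common_ages)  # [18]
```

Age 18 occurs three times; no other age occurs more than twice.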
Limitations of Descriptive Statistics
Descriptive statistics only allow you to summarize the people or objects that you have actually measured. You cannot use the data you have collected to generalize to other people or objects (that is, using data from a sample to infer the properties or parameters of a population).
DESCRIPTIVE STATISTICS
1. Measures of Central Tendency
2. Measures of Variation
MEASURES OF CENTRAL TENDENCY
• A measure of central tendency is a single value that
attempts to describe a set of data by identifying the
central position within the set of data. As such,
measures of central tendency are also called
measures of central location. They are also classed
as summary statistics.
• The mean (often called the average), median, and
mode are the measures of central tendency.
Arithmetic Mean
• The mean (or average) is the most popular and well-known measure of central tendency. It can be used with both discrete and continuous data. The mean is equal to the sum of all the values in the data set divided by the number of values in the data set.
• Formula
If there are $n$ values in a data set and they have values $x_1, x_2, \ldots, x_n$, the sample mean, denoted by $\bar{x}$, is

$$\bar{x} = \frac{\sum x}{n} \qquad \text{or} \qquad \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}.$$

• For the population mean, use the Greek lowercase letter “mu”, denoted as µ.
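The formula translates directly into code. This is a minimal sketch, assuming the data are held in a Python list of numbers; the function name `arithmetic_mean` is illustrative only, and its result is compared against the standard library's `statistics.mean`. The arithmetic is the same whether the list is a sample ($\bar{x}$) or an entire population (µ); only the notation differs.

```python
from statistics import mean


def arithmetic_mean(values):
    """Return (x1 + x2 + ... + xn) / n for a non-empty list of numbers."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    total = sum(values)  # numerator: x1 + x2 + ... + xn
    n = len(values)      # denominator: number of values
    return total / n


sample = [2, 4, 4, 5, 7]  # illustrative data, not from the slides
print(arithmetic_mean(sample))  # 4.4
print(mean(sample))             # 4.4
```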
Computing the Mean
Example:
The Math 10 grades of 10 students are 87, 84, 85, 85, 86, 90, 79, 82, 78, and 76. What is the mean grade?
• Solution:

$$\bar{x} = \frac{\sum x}{n} = \frac{87+84+85+85+86+90+79+82+78+76}{10} = \frac{832}{10} = 83.2$$
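The same computation for the Math 10 grades, written out step by step so that each intermediate value matches the solution (a minimal Python sketch):

```python
# Math 10 grades of the 10 students
grades = [87, 84, 85, 85, 86, 90, 79, 82, 78, 76]

total = sum(grades)  # numerator: sum of all grades
n = len(grades)      # denominator: number of students
mean_grade = total / n

print(total, n, mean_grade)  # 832 10 83.2
```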
Important Properties of the Mean
• A set of data has only one mean.
• It includes every value in the data set as part of the calculation.
• The mean can be applied to interval and ratio data.
• The mean is very useful in comparing two or more data sets.
• The mean is affected by extremely small or large values in a data set.
When Not to Use the Mean
• The mean is susceptible to the influence of outliers or extreme values. Outliers are values that are unusual compared to the rest of the data set because they are especially small or large in numerical value.
Example:
Consider the wages of the staff members at a factory.

Staff   1       2       3       4       5       6       7       8       9       10
Wage    15000   18000   16000   14000   15000   15000   12000   17000   90000   95000

• The mean wage for these ten staff members is 30,700, which does not reflect the typical wage of a worker, as most workers have wages in the 12,000 to 18,000 range. The mean is being skewed by the two large wages.
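To see how far the two large wages pull the mean away from the typical value, compare it with the median (the middle value of the sorted data) of the same ten wages. A minimal sketch using Python's `statistics` module; the `typical` list, which simply drops the two extreme wages, is added here for illustration.

```python
from statistics import mean, median

# Wages of staff 1 to 10 from the table
wages = [15000, 18000, 16000, 14000, 15000, 15000, 12000, 17000, 90000, 95000]

print(mean(wages))    # 30700
print(median(wages))  # 15500.0

# Dropping the two extreme wages shows how much they inflate the mean
typical = [w for w in wages if w < 90000]
print(mean(typical))  # 15250
```

The median stays inside the 12,000 to 18,000 range where most wages lie, while the mean does not.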
• When the data is skewed, the mean loses its ability to provide the best central location for the data, because the skewed values drag it away from the typical value.
When is the Mean the Best Measure of Central Tendency?
• The mean is usually the best measure of central tendency to use when the data distribution is continuous and symmetrical (for example, normally distributed). However, the choice depends on what you are trying to show from your data.
Median
•The median is the middle score for a
set of data that has been arranged in
order of magnitude. It is less affected
by outliers and skewed data.
Example:
Find the median of the data below.
65 55 89 56 35 14 56 55 87 45
Solution:
1. Rearrange the data in order of magnitude.
14 35 45 55 55 56 56 65 87 89
2. Since there are 10 values (an even number), take the two middle scores (the 5th and 6th, 55 and 56) and average them: (55 + 56) / 2 = 55.5, which is the median.
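The two steps, sorting and then taking the middle score (or the average of the two middle scores when n is even), can be sketched in Python as follows. The name `median_of` is illustrative; the result is checked against the standard library's `statistics.median`.

```python
from statistics import median


def median_of(values):
    """Middle score of the sorted data; average of the two middle scores if n is even."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)  # step 1: arrange in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:  # odd n: a single middle score
        return ordered[mid]
    # even n: average the (n/2)th and (n/2 + 1)th scores
    return (ordered[mid - 1] + ordered[mid]) / 2


scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45]
print(median_of(scores))  # 55.5
print(median(scores))     # 55.5
```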
When is the Median the Best Measure of Central Tendency?
• The median is considered the best representative of the central location of the data when the data is skewed or has outliers (extreme values).
• It is best used when dealing with ordinal data (e.g., socio-economic status, education level, income level, satisfaction rating, etc.).
• It is also used when dealing with skewed interval data (e.g., age).
Mode
• The mode is the most frequent score in the data set.
• It is the least used of the measures of central tendency, but it is the only one that can be used when dealing with nominal data (e.g., a test result recorded as “pass” or “fail”).
• It is commonly used for categorical data and rarely used with continuous data.
Properties of Mode
• It is found by locating the most frequently occurring value.
• It is the easiest average to compute.
• A data set can have more than one mode, or no mode at all.
• It is not affected by extremely small or large values.
• It can be applied to nominal, ordinal, interval, and ratio data.
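Two of these properties are easy to check in Python: the mode works on nominal (non-numeric) data, where a mean cannot be computed, and adding an extreme value does not move it. A minimal sketch using `statistics.mode`; the pass/fail results and the numeric values are made-up illustrative data, not from the slides.

```python
from statistics import mean, mode

# Hypothetical nominal data: test results recorded as categories
results = ["pass", "fail", "pass", "pass", "fail"]
print(mode(results))  # pass

# Hypothetical numeric data before and after adding an extreme value
values = [3, 4, 4, 5, 6]
with_outlier = values + [100]
print(mode(values), mode(with_outlier))  # 4 4
print(mean(values), mean(with_outlier))  # mean rises from 4.4 to about 20.3
```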
Types of Mode
• Unimodal: the data set has only one value that occurs with the greatest frequency.
• Bimodal: the data set has two values with the same greatest frequency; both values are considered the mode.
• Multimodal: the data set has more than two modes.
• No mode: every value in the data set occurs with the same frequency.
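The four cases can be told apart by counting frequencies. A minimal sketch that follows the definitions above, including treating a data set in which every value has the same frequency as having no mode; the function name `mode_type` is hypothetical.

```python
from collections import Counter


def mode_type(data):
    """Return (label, modes) using the unimodal/bimodal/multimodal/no-mode definitions."""
    counts = Counter(data)
    if not counts:
        raise ValueError("empty data set")
    highest = max(counts.values())
    modes = sorted(v for v, c in counts.items() if c == highest)
    # Every distinct value occurs equally often: nothing stands out, so no mode
    if len(modes) == len(counts):
        return "no mode", []
    if len(modes) == 1:
        return "unimodal", modes
    if len(modes) == 2:
        return "bimodal", modes
    return "multimodal", modes


print(mode_type([1, 2, 2, 3]))           # ('unimodal', [2])
print(mode_type([1, 1, 2, 2, 3]))        # ('bimodal', [1, 2])
print(mode_type([1, 1, 2, 2, 3, 3, 4]))  # ('multimodal', [1, 2, 3])
print(mode_type([1, 2, 3, 4]))           # ('no mode', [])
```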
Example:
• An operations manager in charge of a company’s manufacturing keeps track of the number of LCD televisions manufactured each day. The following data represent the number of LCD televisions manufactured over the past three weeks: 20, 18, 19, 25, 20, 21, 20, 25, 30, 29, 28, 29, 25, 25, 27, 26, 22, and 20. Find the mode of the given data set.
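Solutions:
1. Arrange the data set from least to greatest.
18 19 20 20 20 20 21 22 25 25 25 25 26 27 28 29 29 30
2. Identify which number(s) occur most frequently: 20 and 25 each occur four times.
3. Determine the type of mode of the given data set: since two values share the greatest frequency, the data set is bimodal, with modes 20 and 25.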
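A short sketch that reproduces the three steps for the LCD data with Python's `collections.Counter` and `statistics.multimode` (which returns every value tied for the highest frequency):

```python
from collections import Counter
from statistics import multimode

# Daily counts of LCD televisions manufactured (the 18 values from the example)
tvs = [20, 18, 19, 25, 20, 21, 20, 25, 30, 29, 28, 29, 25, 25, 27, 26, 22, 20]

print(sorted(tvs))  # step 1: least to greatest

freq = Counter(tvs)
print(freq[20], freq[25])  # 4 4

modes = multimode(tvs)  # step 2: values with the highest frequency
print(modes)            # [20, 25]
print(len(modes))       # step 3: 2 modes, so the data set is bimodal
```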
Summary of When to Use the Mean, Median, and Mode
Guide on Choosing the Most Appropriate Measure of Central Tendency
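The rules of thumb stated in the slides can be collected into a small decision function. This is a hypothetical sketch (`choose_measure` and its parameters are illustrative names) that encodes only what the slides say: the mode for nominal data, the median for ordinal data or for skewed interval/ratio data, and the mean for symmetrical interval/ratio data.

```python
def choose_measure(level: str, skewed: bool = False) -> str:
    """Suggest a measure of central tendency from the level of measurement.

    level: "nominal", "ordinal", "interval", or "ratio"
    skewed: whether interval/ratio data are skewed or contain outliers
    """
    level = level.lower()
    if level == "nominal":
        return "mode"  # the only measure usable with categories
    if level == "ordinal":
        return "median"  # order is meaningful, distances between ranks are not
    if level in ("interval", "ratio"):
        return "median" if skewed else "mean"
    raise ValueError(f"unknown level of measurement: {level!r}")


print(choose_measure("nominal"))             # mode
print(choose_measure("ordinal"))             # median
print(choose_measure("ratio", skewed=True))  # median
print(choose_measure("interval"))            # mean
```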
Thank you for listening!