Professor Vipin 2014 Unit 5 Analysis of Uni-variate Data

Central Tendency
In statistics, a central tendency (or, more commonly, a measure of central tendency) is a central value or
a typical value for a probability distribution.
It is occasionally called an average or just the center of the distribution. The most common measures of
central tendency are the arithmetic mean, the median and the mode. A central tendency can be
calculated for either a finite set of values or for a theoretical distribution, such as the normal
distribution.
Arithmetic Mean
This is the simplest way of representing data by a single value. It is also called the ‘average’.
Properties of Mean
1. The sum of the deviations, of all the values of x, from their arithmetic mean, is zero.
2. The product of the arithmetic mean and the number of items gives the total of all items.
3. If $\bar{x}_1$ and $\bar{x}_2$ are the arithmetic means of two samples of sizes n1 and n2 respectively, then the
arithmetic mean of the distribution combining the two can be calculated as
$$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$$
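The three properties above can be checked numerically. The following is a minimal Python sketch; the two samples and the helper name combined_mean are hypothetical and chosen only for illustration, and property 3 is taken in its usual form, (n1·x̄1 + n2·x̄2) / (n1 + n2).

```python
from statistics import mean

# Hypothetical samples, chosen only for illustration.
sample_1 = [4, 8, 12]   # n1 = 3
sample_2 = [10, 20]     # n2 = 2


def combined_mean(n1, mean1, n2, mean2):
    """Mean of the pooled data, from the sizes and means of two samples."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


m1, m2 = mean(sample_1), mean(sample_2)

# Property 1: deviations from the mean sum to zero.
sum_of_deviations = sum(x - m1 for x in sample_1)

# Property 2: mean times number of items gives the total of all items.
total_from_mean = m1 * len(sample_1)

# Property 3: the combined mean agrees with the mean of the pooled data.
pooled = combined_mean(len(sample_1), m1, len(sample_2), m2)
pooled_directly = mean(sample_1 + sample_2)
```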
Merits of Mean
Easy to understand and simple to compute.
It is based on all the values.
It is well defined by a mathematical formula.
It is capable of further mathematical treatment.
It possesses sampling stability.
Demerits of Mean
It is highly affected by extreme values.
If the distribution has open-ended classes, it cannot be easily calculated.
It cannot be used for qualitative data.
It cannot be obtained graphically.
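Simple Arithmetic Mean
Arithmetic Mean for Un-Grouped Data (Frequencies Not Given)
1. Direct Method
$$\bar{x} = \frac{\sum x}{n}$$
2. Step Deviation method (Used when values are large)
$$\bar{x} = A + \frac{\sum d'}{n} \times c, \quad d' = \frac{x - A}{c}$$
Here A is an assumed mean and c is a common factor of the deviations.
Arithmetic Mean for Grouped Data (Frequencies Given)
1. Direct Method
$$\bar{x} = \frac{\sum f x}{N}, \quad N = \sum f$$
2. Step Deviation
$$\bar{x} = A + \frac{\sum f d'}{N} \times c, \quad d' = \frac{x - A}{c}$$
3. Continuous Series (Direct Method)
$$\bar{x} = \frac{\sum f x}{N}, \quad N = \sum f$$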
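Here x is the midpoint of each class interval.
4. Continuous Series (Step Deviation Method)
$$\bar{x} = A + \frac{\sum f d'}{N} \times c, \quad d' = \frac{x - A}{c}$$
Here A is an assumed mean, c is the class width and N is the total frequency.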
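As a rough illustration of the direct and step deviation methods for a continuous series, the sketch below computes the mean both ways. The class intervals, the assumed mean A = 25 and the class width c = 10 are hypothetical values, and the function names are not from the original notes.

```python
# Hypothetical continuous series: (lower limit, upper limit, frequency).
classes = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 5)]


def mean_direct(classes):
    """Direct method: sum(f * x) / N, with x the class midpoint."""
    n = sum(f for _, _, f in classes)
    return sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n


def mean_step_deviation(classes, assumed_mean, width):
    """Step deviation: A + (sum(f * d') / N) * c, with d' = (x - A) / c."""
    n = sum(f for _, _, f in classes)
    total = sum(
        f * (((lo + hi) / 2 - assumed_mean) / width) for lo, hi, f in classes
    )
    return assumed_mean + (total / n) * width


direct = mean_direct(classes)
step = mean_step_deviation(classes, assumed_mean=25, width=10)
```

Both methods should give the same mean; the step deviation method only makes the hand arithmetic lighter.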
Weighted Arithmetic Mean
$$\bar{x}_w = \frac{\sum w x}{\sum w}$$
Here w is the weight attached to each value x.
Median
It is the middle-most value of the data when they are arranged in order. It is denoted by ‘M’. It divides
the distribution into two equal parts.
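Median (Direct Method)
In this case, the median can be easily computed by sorting the data in ascending or descending order
and taking the middle value.
$$M = \text{value of the } \left(\frac{N + 1}{2}\right)^{\text{th}} \text{ observation}$$
Where N is the number of observations. When N is even, M is the average of the two middle values.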
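A minimal Python sketch of the direct method follows. The function name and the two data sets are hypothetical; for an even number of observations it averages the two middle values, which is the usual convention.

```python
def median_direct(values):
    """Median of ungrouped data: sort, then take the middle value."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd N: the ((N + 1) / 2)-th observation.
        return ordered[mid]
    # Even N: average of the two middle observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_case = median_direct([7, 3, 9, 5, 11])   # sorted: 3, 5, 7, 9, 11
even_case = median_direct([7, 3, 9, 5])      # sorted: 3, 5, 7, 9
```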
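Median (With Continuous Series)
In this case we first determine the median class, and then locate the median within that class
by using the interpolation formula.
$$\text{Median class} = \text{class containing the } \left(\frac{N}{2}\right)^{\text{th}} \text{ observation}$$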
$$M = l_1 + \left(\frac{\frac{N}{2} - cf}{f}\right) \times i$$
l1 is the lower limit of the median class
cf is the cumulative frequency of the class preceding the median class
i is the length of the median class (l2 - l1)
N is the number of observations in the distribution
f is the simple frequency of the median class
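The interpolation step can be sketched in Python as below. This is an illustrative sketch only: the frequency table is hypothetical, the function name is not from the notes, and cf is taken as the cumulative frequency of the classes before the median class, as in the standard formula l1 + ((N/2 - cf) / f) × i.

```python
# Hypothetical continuous series: (lower limit, upper limit, frequency).
classes = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 5)]


def grouped_median(classes):
    """Median of a continuous series by linear interpolation."""
    n = sum(f for _, _, f in classes)
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, upper, f in classes:
        if f > 0 and cumulative + f >= n / 2:
            # This is the median class: interpolate inside it.
            return lower + ((n / 2 - cumulative) / f) * (upper - lower)
        cumulative += f
    raise ValueError("the distribution has no observations")


median_value = grouped_median(classes)
```

Here N = 30, so the median class is 20-30 (cumulative frequency 13 before it), and the median works out to 20 + (2/12) × 10.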
Mode (Direct Method)
Mode is the value of that observation which has the highest frequency.
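Consider the following values:
12, 34, 32, 16, 14, 19, 20, 27
Here every value occurs exactly once, so no value has a higher frequency than the others and the data
has no mode. Note that 34 is the largest value (the maximum), which is not the same as the mode.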
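A short Python sketch of the direct method, counting how often each value occurs. Applied to the eight values listed above, it shows that each occurs exactly once, so no single value stands out as the mode; the second list is a hypothetical variant in which 34 is repeated. The function name is an assumption made for this example.

```python
from collections import Counter


def modes(values):
    """Return (list of most frequent values, their frequency)."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top], top


# Values from the notes: every value occurs once.
notes_modes, notes_freq = modes([12, 34, 32, 16, 14, 19, 20, 27])

# Hypothetical variant in which 34 is repeated.
variant_modes, variant_freq = modes([12, 34, 32, 34, 14])
```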
Mode (With Continuous series)
Modal class is the class having the highest frequency.
Mode is the value inside the modal class given by the formula:
$$\text{Mode} = l_1 + \left(\frac{f_1 - f_0}{2 f_1 - f_0 - f_2}\right) \times i$$
l1 is the lower limit of the modal class
f1 is the frequency of the modal class
f0 is the frequency of the class preceding the modal class
f2 is the frequency of the class following the modal class
i is the length of the modal class
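The modal-class formula, l1 + ((f1 - f0) / (2·f1 - f0 - f2)) × i, can be sketched as follows. The frequency table and function name are hypothetical; when the modal class is the first or last class, the missing neighbour frequency is assumed to be 0, and the sketch does not handle the degenerate case where the denominator is zero.

```python
# Hypothetical continuous series: (lower limit, upper limit, frequency).
classes = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 5)]


def grouped_mode(classes):
    """Mode of a continuous series from the modal class and its neighbours."""
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))          # modal class (first one if tied)
    lower, upper, f1 = classes[k]
    f0 = freqs[k - 1] if k > 0 else 0                # preceding class
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0   # following class
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)


mode_value = grouped_mode(classes)
```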
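Empirical Relation between Mean, Median and Mode
For a moderately skewed distribution, approximately
$$\text{Mode} = 3\,\text{Median} - 2\,\text{Mean}$$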
Geometric Mean
It is the nth root of the product of ‘n’ observations in a series. It is denoted by G.
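1. Ungrouped Data
$$G = \sqrt[n]{x_1 x_2 \cdots x_n} = \operatorname{antilog}\left(\frac{\sum \log x}{n}\right)$$
2. For Discrete and Continuous Data
$$G = \operatorname{antilog}\left(\frac{\sum f \log x}{N}\right), \quad N = \sum f$$
For continuous data, x is the midpoint of each class interval.
Merits of Geometric Mean
1. Based on all observations
2. Well defined by a mathematical formula
3. Capable of further mathematical treatment
Demerits of Geometric Mean
1. Not easy to understand and complicated to calculate
2. Cannot be computed if any one of the observations is zero or negative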
Harmonic Mean
Harmonic mean is the reciprocal of the AM of the reciprocals of the set of observations. It is denoted by H.
1. Ungrouped data
$$H = \frac{n}{\sum \left(\frac{1}{x}\right)}$$
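2. Discrete Data (Grouped / With Frequency)
$$H = \frac{N}{\sum \left(\frac{f}{x}\right)}, \quad N = \sum f$$
3. Continuous Data
$$H = \frac{N}{\sum \left(\frac{f}{x}\right)}$$
Here x is the midpoint of each class interval.
Merits of Harmonic Mean
1. Based on all observations
2. Well defined by a mathematical formula
3. Capable of further mathematical treatment
Demerits of Harmonic Mean
1. Not easy to understand and complicated to calculate
2. Cannot be computed if any one of the observations is zero, and is not meaningful for negative observations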
Relationship Between AM, GM and HM
For positive observations,
$$AM \geq GM \geq HM$$
For any two positive numbers, a and b,
$$GM = \sqrt{AM \times HM}$$
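The relationship between the three means can be checked with a small Python sketch. The function names and the two numbers a = 4 and b = 16 are hypothetical; the geometric mean is computed through logarithms, which mirrors the antilog form used in hand calculation and requires positive values.

```python
import math


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def geometric_mean(xs):
    """n-th root of the product, computed via logs (positive values only)."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def harmonic_mean(xs):
    """Reciprocal of the arithmetic mean of the reciprocals (no zeros)."""
    return len(xs) / sum(1 / x for x in xs)


pair = [4, 16]  # hypothetical values of a and b
am = arithmetic_mean(pair)
gm = geometric_mean(pair)
hm = harmonic_mean(pair)
```

For this pair the sketch should show AM ≥ GM ≥ HM, and GM squared equal to AM × HM up to floating-point rounding.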
Measures of Position
Median
It is the measure that divides the data into two equal parts.
Quartiles
They are measures which divide the data into four equal parts.
Q1 is the first quartile or lower quartile
Q2 is the second quartile or median
Q3 is the third quartile or upper quartile
Deciles
There are nine deciles which divide the data into 10 equal parts. The 5th decile is Q2.
Percentiles
They are measures which divide the data into 100 equal parts.
Computation of Quartiles, Deciles and Percentiles
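1. Ungrouped Data
$$Q_i = \text{value of the } \left(\frac{i(n+1)}{4}\right)^{\text{th}} \text{ observation}, \quad i = 1, 2, 3$$
$$D_i = \text{value of the } \left(\frac{i(n+1)}{10}\right)^{\text{th}} \text{ observation}, \quad i = 1, \ldots, 9$$
$$P_i = \text{value of the } \left(\frac{i(n+1)}{100}\right)^{\text{th}} \text{ observation}, \quad i = 1, \ldots, 99$$
2. Discrete Data (Grouped Data)
Step 1: Find LCF (the less-than cumulative frequency)
Step 2: Find $\frac{i(N+1)}{4}$ for Qi, $\frac{i(N+1)}{10}$ for Di, or $\frac{i(N+1)}{100}$ for Pi, where N denotes the total frequency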
Step 3: Qi / Di / Pi = value of x corresponding to the LCF just greater than the value found in Step 2
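3. Continuous Data
Step 1: Find LCF
Step 2: Find $\frac{iN}{4}$ for Qi, $\frac{iN}{10}$ for Di, or $\frac{iN}{100}$ for Pi,
where N denotes the total frequency
Step 3: Qi / Di / Pi class = class corresponding to the LCF just greater than the value found in Step 2
Step 4:
$$Q_i = l_1 + \left(\frac{\frac{iN}{4} - cf}{f}\right) \times c$$
where l1 is the lower limit of the Qi class, cf is the cumulative frequency of the preceding class, f is the
frequency of the Qi class and c is its length. For Di and Pi, replace iN/4 by iN/10 and iN/100 respectively.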
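The four steps for continuous data can be sketched as one Python function that covers quartiles, deciles and percentiles. This is a sketch under assumptions: the frequency table and the name grouped_quantile are hypothetical, and the interpolation is taken in the standard form l1 + ((iN/k - cf) / f) × c with k = 4, 10 or 100.

```python
# Hypothetical continuous series: (lower limit, upper limit, frequency).
classes = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 5)]


def grouped_quantile(classes, i, parts):
    """i-th quantile when the data are divided into `parts` equal parts.

    parts = 4 gives quartiles, 10 gives deciles, 100 gives percentiles.
    """
    n = sum(f for _, _, f in classes)    # total frequency N
    target = i * n / parts               # Step 2: iN/4, iN/10 or iN/100
    cumulative = 0                       # Step 1: running LCF
    for lower, upper, f in classes:
        if f > 0 and cumulative + f >= target:
            # Steps 3 and 4: quantile class found, interpolate inside it.
            return lower + ((target - cumulative) / f) * (upper - lower)
        cumulative += f
    raise ValueError("the distribution has no observations")


q1 = grouped_quantile(classes, 1, 4)
q2 = grouped_quantile(classes, 2, 4)
d5 = grouped_quantile(classes, 5, 10)
p50 = grouped_quantile(classes, 50, 100)
```

Q2, D5 and P50 all locate the same point, the median, which is a quick sanity check on any implementation.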
Measures of Dispersion
Absolute Measures of Variation
Range
Quartile Deviation
Mean Deviation
Standard Deviation
Relative Measures of Variation
Coefficient of Range
Coefficient of Quartile deviation
Coefficient of Mean deviation
Coefficient of Standard deviation
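Range
It is the simplest method of studying variation.
$$\text{Range} = L - S$$
where L is the largest value and S is the smallest value.
Coefficient of Range
$$\text{Coefficient of Range} = \frac{L - S}{L + S}$$
Quartile Deviation
It is half of the inter-quartile range.
$$QD = \frac{Q_3 - Q_1}{2}$$
Coefficient of QD
$$\text{Coefficient of QD} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$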
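Mean Deviation (From Mean)
1. Ungrouped Data
$$MD = \frac{\sum |x - \bar{x}|}{n}$$
2. Discrete Data (Grouped)
$$MD = \frac{\sum f\,|x - \bar{x}|}{N}, \quad N = \sum f$$
3. Continuous (Grouped)
It is the same as discrete data. However x denotes the midpoints of the class intervals.
Coefficient of MD is a relative measure:
$$\text{Coefficient of MD} = \frac{MD}{\bar{x}}$$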
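Mean Deviation (From Median)
1. Ungrouped data
$$MD = \frac{\sum |x - M|}{n}$$
2. Discrete data (Grouped)
$$MD = \frac{\sum f\,|x - M|}{N}, \quad N = \sum f$$
3. Continuous (Grouped)
It is the same as discrete data. However x denotes the midpoints of the class intervals.
$$\text{Coefficient of MD} = \frac{MD}{M}$$
Here M is the median.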
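Mean Deviation (From Mode)
1. Ungrouped Data
$$MD = \frac{\sum |x - \text{Mode}|}{n}$$
2. Discrete data (Grouped)
$$MD = \frac{\sum f\,|x - \text{Mode}|}{N}, \quad N = \sum f$$
3. Continuous (Grouped)
It is the same as discrete data. However x denotes the midpoints of the class intervals (CI).
$$\text{Coefficient of MD} = \frac{MD}{\text{Mode}}$$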
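Mean deviation about any chosen centre follows the same pattern, so one small Python function can cover the mean, median and mode cases. The sketch below is illustrative: the data and the function name are hypothetical, and the optional freqs argument is an assumption that lets the same function handle grouped data (pass the values or class midpoints together with their frequencies).

```python
from statistics import mean, median


def mean_deviation(values, center, freqs=None):
    """Average absolute deviation of the values from `center`."""
    freqs = freqs or [1] * len(values)   # ungrouped data: every f is 1
    n = sum(freqs)
    return sum(f * abs(x - center) for x, f in zip(values, freqs)) / n


data = [2, 3, 5, 8, 12]  # hypothetical ungrouped data

md_about_mean = mean_deviation(data, mean(data))
md_about_median = mean_deviation(data, median(data))

# Relative measures: divide by the centre that was used.
coefficient_mean = md_about_mean / mean(data)
coefficient_median = md_about_median / median(data)
```

For this data the deviation about the median comes out smaller than the deviation about the mean, which reflects the general fact that mean deviation is least when taken about the median.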
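Standard Deviation
It is the positive square root of the mean of the squared deviations of the given observations from their AM.
1. Ungrouped Data
Direct Method
$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
Deviation Method
$$\sigma = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2}, \quad d = x - A$$
Here A is an assumed mean.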
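2. Discrete Data (Grouped)
$$\sigma = \sqrt{\frac{\sum f (x - \bar{x})^2}{N}}, \quad N = \sum f$$
Deviation Method
$$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2}, \quad d = x - A$$
3. Continuous Frequency (Grouped)
$$\sigma = \sqrt{\frac{\sum f (x - \bar{x})^2}{N}}$$
Where x is the midpoint of the class interval
Deviation Method
$$\sigma = c \times \sqrt{\frac{\sum f d'^2}{N} - \left(\frac{\sum f d'}{N}\right)^2}, \quad d' = \frac{x - A}{c}$$
Here A is an assumed mean and c is the class width.
Coefficient of SD
$$\text{Coefficient of SD} = \frac{\sigma}{\bar{x}}$$
Coefficient of Variation
$$CV = \frac{\sigma}{\bar{x}} \times 100$$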
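The direct and deviation methods for ungrouped data, together with the coefficient of variation, can be sketched in Python as below. The data, the assumed mean A = 4 and the function names are hypothetical; the divisor is n (not n - 1), matching the definition of SD as the root of the mean squared deviation.

```python
import math


def std_direct(values):
    """Direct method: root of the mean squared deviation from the AM."""
    n = len(values)
    m = sum(values) / n
    return math.sqrt(sum((x - m) ** 2 for x in values) / n)


def std_deviation_method(values, assumed_mean):
    """Deviation method: deviations d = x - A from an assumed mean A."""
    n = len(values)
    d = [x - assumed_mean for x in values]
    return math.sqrt(sum(v * v for v in d) / n - (sum(d) / n) ** 2)


def coefficient_of_variation(values):
    """CV = (SD / AM) * 100."""
    return std_direct(values) / (sum(values) / len(values)) * 100


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, AM = 5

sd_direct = std_direct(data)
sd_deviation = std_deviation_method(data, assumed_mean=4)
cv = coefficient_of_variation(data)
```

The choice of assumed mean does not affect the result; it only changes the size of the numbers handled along the way.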
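Moments
Moments are used to describe characteristics of a frequency distribution such as averages, dispersion,
skewness and kurtosis.
Types of Moments
1. Raw Moments
Moments about any arbitrary point ‘A’ are known as raw moments.
$$\mu'_r = \frac{\sum (x - A)^r}{n}$$
2. Central Moments
Moments about the mean are known as central moments.
$$\mu_r = \frac{\sum (x - \bar{x})^r}{n}$$
a) First: $\mu_1 = \frac{\sum (x - \bar{x})}{n} = 0$
b) Second: $\mu_2 = \frac{\sum (x - \bar{x})^2}{n}$ (the variance)
c) Third: $\mu_3 = \frac{\sum (x - \bar{x})^3}{n}$
d) Fourth: $\mu_4 = \frac{\sum (x - \bar{x})^4}{n}$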
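Karl Pearson’s Beta and Gamma Coefficients
These are coefficients based on the first four central moments.
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}, \quad \beta_2 = \frac{\mu_4}{\mu_2^2}$$
$$\gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} = \pm\sqrt{\beta_1}, \quad \gamma_2 = \beta_2 - 3$$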
The first coefficient, β1, tells us whether a distribution is skewed or symmetrical, and the second, β2,
tells us the difference between a symmetrical curve and the normal curve.
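Central moments and the beta and gamma coefficients can be computed directly from their definitions. The Python sketch below is illustrative only: the data set and function names are hypothetical, and the coefficients are taken in their standard forms β1 = μ3²/μ2³, β2 = μ4/μ2², γ1 = μ3/μ2^(3/2) and γ2 = β2 - 3.

```python
def central_moment(values, r):
    """r-th moment about the mean: sum((x - mean)^r) / n."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** r for x in values) / n


def pearson_coefficients(values):
    """Return (beta1, beta2, gamma1, gamma2) from the central moments."""
    mu2, mu3, mu4 = (central_moment(values, r) for r in (2, 3, 4))
    beta1 = mu3 ** 2 / mu2 ** 3
    beta2 = mu4 / mu2 ** 2
    gamma1 = mu3 / mu2 ** 1.5    # keeps the sign of mu3
    gamma2 = beta2 - 3
    return beta1, beta2, gamma1, gamma2


data = [1, 2, 3, 4, 5]  # hypothetical symmetric data

first_moment = central_moment(data, 1)
beta1, beta2, gamma1, gamma2 = pearson_coefficients(data)
```

Because this data set is symmetric about its mean, the third central moment vanishes and both β1 and γ1 come out as zero.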
Skewness
In probability theory and statistics, skewness is a measure of the asymmetry of the probability
distribution of a real-valued random variable about its mean. The skewness value can be positive or
negative, or even undefined.
Types of Skewness
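Suppose a teacher expects most of the students to get good marks. If that happens, the distribution of
marks looks like the symmetrical normal curve.
But for some reasons (e.g., lazy students, not understanding the lectures, not being attentive) it does
not happen. We then get one of two other curves: a positively skewed curve (long tail to the right) or a
negatively skewed curve (long tail to the left).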
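Karl Pearson’s Coefficient of Skewness
$$Sk_P = \frac{\text{Mean} - \text{Mode}}{\sigma}$$
When the mode is ill-defined, the empirical relation between mean, median and mode gives
$$Sk_P = \frac{3(\text{Mean} - \text{Median})}{\sigma}$$
Bowley’s Coefficient of Skewness
$$Sk_B = \frac{Q_3 + Q_1 - 2 Q_2}{Q_3 - Q_1}$$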
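Both coefficients are simple ratios of summary statistics, as the Python sketch below illustrates. The function names and all input numbers are hypothetical, and the formulas are taken in their standard forms: (Mean - Mode) / SD, 3(Mean - Median) / SD, and (Q3 + Q1 - 2·Q2) / (Q3 - Q1).

```python
def karl_pearson_skewness(mean, mode, sd):
    """Karl Pearson's coefficient based on the mode."""
    return (mean - mode) / sd


def karl_pearson_skewness_median(mean, median, sd):
    """Karl Pearson's coefficient based on the median."""
    return 3 * (mean - median) / sd


def bowley_skewness(q1, q2, q3):
    """Bowley's coefficient based on the three quartiles."""
    return (q3 + q1 - 2 * q2) / (q3 - q1)


# Hypothetical summary statistics, for illustration only.
sk_mode = karl_pearson_skewness(mean=50, mode=44, sd=12)
sk_median = karl_pearson_skewness_median(mean=50, median=48, sd=12)
sk_bowley = bowley_skewness(q1=10, q2=18, q3=30)
```

A positive value indicates positive skewness, a negative value indicates negative skewness, and zero indicates a symmetrical distribution.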