section 2 2018

2ndEnglish
Descriptive Statistics
REVIEW OF CH. 3;4
Measure: Mean

Raw data:
$\bar{X} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$

Grouped data:
$\bar{X} = \frac{\sum_{i=1}^{n} f_i X_i}{\sum_{i=1}^{n} f_i}$
$X_i$: the midpoint of the i-th class.
$f_i$: the frequency of that class.

Measure: Median

Raw data:
1- Sort the data.
2- Calculate:
- odd sample: the [(n+1)/2]th value.
- even sample: ([n/2]th value + [n/2 + 1]th value)/2.
n: the sample size.

Grouped data:
$MD = A + \frac{\frac{n}{2} - f_1}{f_2 - f_1} \times L$
A is the lower limit (boundary) of the class of the median.
$f_1$ is the cumulative frequency of all the classes before the class of the median.
$f_2$ is the cumulative frequency up to and including the class of the median, so $f_2 - f_1$ is the frequency of that class.
L is the width of the class.
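The mean and raw-data median rules above can be sketched in Python. This is a minimal illustration, not part of the original handout: the function names are hypothetical, the raw sample reuses the handout's data 2;5;7;8;9;11, and the grouped midpoints and frequencies are made-up values.

```python
def mean_raw(data):
    """Sum of the values divided by the sample size n."""
    return sum(data) / len(data)


def mean_grouped(midpoints, freqs):
    """Sum of f_i * X_i divided by the sum of f_i (X_i = class midpoint)."""
    return sum(f * x for x, f in zip(midpoints, freqs)) / sum(freqs)


def median_raw(data):
    """Sort, then take the middle value (odd n) or average the two middle values (even n)."""
    s = sorted(data)
    n = len(s)
    if n % 2 == 1:
        return s[(n + 1) // 2 - 1]          # the [(n+1)/2]th value, 1-based
    return (s[n // 2 - 1] + s[n // 2]) / 2  # ([n/2]th + [n/2 + 1]th) / 2


sample = [2, 5, 7, 8, 9, 11]                # data from the handout's quartile example
print(mean_raw(sample))                     # 7.0
print(median_raw(sample))                   # 7.5

# Hypothetical grouped data: class midpoints and their frequencies.
print(mean_grouped([5, 15, 25], [2, 5, 3])) # 16.0
```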
Measure: Mode

Raw data:
The value that occurs most often.

Grouped data:
$\text{Mode} = A + \frac{f - f_1}{2f - f_1 - f_2} \times L$
A is the lower limit (boundary) of the class containing the mode.
f is the largest frequency (the frequency of the class containing the mode).
$f_1$ is the frequency of the class preceding the class containing the mode.
$f_2$ is the frequency of the class following the class containing the mode.
L is the width of the class.
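As a rough sketch of the mode rules above, the following Python uses `statistics.multimode` for raw data and the interpolation formula for grouped data. The function name `mode_grouped`, the class boundaries, and the frequencies are assumptions made for illustration. A missing neighbouring class is treated as having frequency 0, and the first class with the largest frequency is used if there is a tie.

```python
import statistics


def mode_grouped(lower_boundaries, freqs, width):
    """Mode = A + (f - f1) / (2f - f1 - f2) * L for the class with the largest frequency."""
    i = max(range(len(freqs)), key=lambda j: freqs[j])  # index of the modal class
    f = freqs[i]
    f1 = freqs[i - 1] if i > 0 else 0                   # frequency of the preceding class
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0      # frequency of the following class
    denom = 2 * f - f1 - f2
    if denom == 0:
        raise ValueError("formula undefined: neighbouring classes have the same frequency")
    return lower_boundaries[i] + (f - f1) / denom * width


# Raw data: the value(s) occurring most often (a list, since the mode need not be unique).
print(statistics.multimode([2, 5, 5, 7, 8]))            # [5]

# Hypothetical grouped data: classes 0-10, 10-20, 20-30, 30-40.
print(mode_grouped([0, 10, 20, 30], [2, 5, 3, 1], 10))  # 10 + (3/5)*10 = 16.0
```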
Measure: 1st quartile (Q1)

Raw data:
1- Sort the data.
2- Calculate the position (n+1)/4:
- if (n+1)/4 is an integer, Q1 = the [(n+1)/4]th value.
- otherwise, Q1 = L + F×(U - L).
L: the value at the position given by the integer part of (n+1)/4.
U: the value at the position given by (n+1)/4 rounded up.
F: the fractional part of (n+1)/4.
Ex: 2;5;7;8;9;11
[(n+1)/4]th = [1.75]th
Q1 = 2 + .75×(5 - 2) = 4.25.

Grouped data:
$Q_1 = A + \frac{\frac{n}{4} - f_1}{f_2 - f_1} \times L$
A is the lower limit (boundary) of the class of Q1 (the class containing n/4).
$f_1$ is the cumulative frequency of all the classes before the class of Q1 (n/4).
$f_2$ is the cumulative frequency up to and including the class of Q1 (n/4).
L is the width of the class.
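The raw-data quartile rule can be checked with a short Python sketch. The function name `quartile_raw` and its parameter `q` are hypothetical. It uses the position q(n+1)/4 and interpolates between the two neighbouring sorted values when that position is not an integer.

```python
import math


def quartile_raw(data, q=1):
    """q-th quartile from raw data using the 1-based position q*(n+1)/4."""
    s = sorted(data)
    n = len(s)
    pos = q * (n + 1) / 4
    lo = math.floor(pos)           # integer part of the position
    frac = pos - lo                # fractional part F
    if lo < 1 or lo > n or (frac > 0 and lo == n):
        raise ValueError("sample too small for this quartile position")
    if frac == 0:
        return s[lo - 1]           # the position is an integer: take that value
    lower, upper = s[lo - 1], s[lo]
    return lower + frac * (upper - lower)   # L + F * (U - L)


print(quartile_raw([2, 5, 7, 8, 9, 11], q=1))   # position 1.75 -> 2 + 0.75*(5 - 2) = 4.25
```

Library routines such as `statistics.quantiles` offer several interpolation conventions, so their results may differ slightly from this hand rule.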
Measure: 3rd quartile (Q3)

Raw data:
1- Sort the data.
2- Calculate the position 3(n+1)/4:
- if 3(n+1)/4 is an integer, Q3 = the [3(n+1)/4]th value.
- otherwise, Q3 = L + F×(U - L), with L, U and F taken at the position 3(n+1)/4.
Ex: 2;5;7;8;9;11
[3(n+1)/4]th = [5.25]th
Q3 = 9 + .25×(11 - 9) = 9.5.

Grouped data:
$Q_3 = A + \frac{\frac{3n}{4} - f_1}{f_2 - f_1} \times L$
A is the lower limit (boundary) of the class of Q3 (the class containing 3n/4).
$f_1$ is the cumulative frequency of all the classes before the class of Q3 (3n/4).
$f_2$ is the cumulative frequency up to and including the class of Q3 (3n/4).
L is the width of the class.
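The grouped-data formulas for the median, Q1 and Q3 share one shape, A + (p·n - f1)/(f2 - f1) × L with p = 1/2, 1/4 and 3/4. The sketch below assumes that f2 is the cumulative frequency up to and including the class in question, so that f2 - f1 is that class's frequency. The function name and the frequency table are hypothetical.

```python
from itertools import accumulate


def grouped_quantile(lower_boundaries, freqs, width, p):
    """A + (p*n - f1) / (f2 - f1) * L for the first class whose cumulative frequency reaches p*n."""
    n = sum(freqs)
    target = p * n
    for i, f2 in enumerate(accumulate(freqs)):   # f2: cumulative frequency through class i
        if f2 >= target:
            f1 = f2 - freqs[i]                   # cumulative frequency before class i
            return lower_boundaries[i] + (target - f1) / (f2 - f1) * width
    raise ValueError("p must satisfy 0 < p <= 1")


# Hypothetical table: classes 0-10, 10-20, 20-30, 30-40 with n = 20.
bounds, freqs, width = [0, 10, 20, 30], [4, 8, 6, 2], 10
print(grouped_quantile(bounds, freqs, width, 0.25))  # Q1 = 10 + (5 - 4)/8 * 10 = 11.25
print(grouped_quantile(bounds, freqs, width, 0.50))  # MD = 10 + (10 - 4)/8 * 10 = 17.5
print(grouped_quantile(bounds, freqs, width, 0.75))  # Q3 = 20 + (15 - 12)/6 * 10 = 25.0
```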
Measure: Midrange

Raw data:
(Lowest + Highest)/2

Grouped data:
(Lowest boundary + Highest boundary)/2
Measure: Sample variance

Raw data:
$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$

Grouped data:
$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2 f_i}{\sum_{i=1}^{n} f_i - 1}$
$X_i$: the midpoint of the i-th class.
$f_i$: the frequency of that class.
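A small Python sketch of the two sample-variance formulas follows. The function names are hypothetical, the raw sample is the handout's 2;5;7;8;9;11, and the grouped midpoints and frequencies are invented for illustration.

```python
def variance_raw(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


def variance_grouped(midpoints, freqs):
    """Sum of f_i * (X_i - mean)^2 divided by (sum of f_i) - 1, with X_i the class midpoint."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    return sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / (n - 1)


print(variance_raw([2, 5, 7, 8, 9, 11]))          # 50 / 5 = 10.0
print(variance_grouped([5, 15, 25], [2, 5, 3]))   # 490 / 9, about 54.44
```

For raw data, `statistics.variance` from the standard library uses the same n - 1 divisor.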
Measure: Range

Raw data:
(Highest - Lowest)

Grouped data:
(Highest boundary - Lowest boundary)
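Measure: Interquartile range
$Q = Q_3 - Q_1$

Measure: Semi-interquartile range
$\frac{Q}{2} = \frac{Q_3 - Q_1}{2}$

Measure: Coefficient of variation (relative measure)
$\frac{S}{\bar{X}} \times 100\%$; $S = \sqrt{S^2}$.

Measure: Range Rule of Thumb
$S \approx \frac{\text{Range}}{4}$; Chebyshev’s Theorem when k=2.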
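To make the measures of spread concrete, here is a hedged Python sketch on the handout's sample 2;5;7;8;9;11. The quartiles Q1 = 4.25 and Q3 = 9.5 are taken from the handout's worked quartile examples; the variable names are illustrative only.

```python
import statistics

data = [2, 5, 7, 8, 9, 11]
q1, q3 = 4.25, 9.5                      # from the handout's quartile examples

iqr = q3 - q1                           # interquartile range Q = Q3 - Q1
semi_iqr = iqr / 2                      # semi-interquartile range

s = statistics.stdev(data)              # sample standard deviation S = sqrt(S^2)
cv = s / statistics.mean(data) * 100    # coefficient of variation, in percent

rule_of_thumb = (max(data) - min(data)) / 4   # S is roughly Range / 4

print(iqr, semi_iqr)                    # 5.25 2.625
print(round(s, 4), round(cv, 2))        # 3.1623 45.18
print(rule_of_thumb)                    # 2.25
```

The rule-of-thumb estimate (2.25) is only a rough approximation of S (about 3.16) for such a small sample.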
Important Notes
1-
2- Properties of the Mean
- Uses all data values.
- Varies less than the median or mode when samples are taken from the same population.
- Used in computing other statistics, such as the variance.
- Unique, but not necessarily one of the data values.
- Affected by extremely high or low values, called outliers.
- Cannot be used for nominal or ordinal data.
Properties of the Median
- Does not use all data values.
- Affected less than the mean by extremely high or extremely low values.
- Cannot be used for nominal data.
Properties of the Mode
- Easiest measure to compute.
- Can be used with nominal data.
- Not always unique, and may not exist.
Properties of the Midrange
- Easy to compute.
- Affected by extremely high or low values in a data set.
3- Chebyshev’s Theorem and the Empirical Rule
$P(\mu - k\sigma < x < \mu + k\sigma) \ge 1 - \frac{1}{k^2}; \quad k > 1.$

Number of standard    Minimum proportion within    Minimum percentage within
deviations, k         k standard deviations        k standard deviations
2                     1 - 1/4 = 3/4                75%
3                     1 - 1/9 = 8/9                88.89%
4                     1 - 1/16 = 15/16             93.75%
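The rows of the table follow directly from the bound 1 - 1/k². A minimal Python sketch is given below; the function name `chebyshev_min_proportion` is hypothetical, and exact fractions are used so the output matches the table.

```python
from fractions import Fraction


def chebyshev_min_proportion(k):
    """Minimum proportion of values within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / Fraction(k) ** 2


for k in (2, 3, 4):
    p = chebyshev_min_proportion(k)
    print(k, p, f"{float(p) * 100:.2f}%")
# 2 3/4 75.00%
# 3 8/9 88.89%
# 4 15/16 93.75%
```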
EX:
The mean price of houses in a certain neighborhood is $50,000, and the standard
deviation is $10,000.
1- Find the price range for which at least 55% of the houses will sell.
2- Find the price range for which at least 75% of the houses will sell.
1- Chebyshev’s Theorem states that at least 55% of a data set will fall within 1.5
standard deviations of the mean (1 - 1/1.5² ≈ 0.556).
Lowest value = 50000 - 1.5 × 10000 = 35000
Highest value = 50000 + 1.5 × 10000 = 65000
2- Chebyshev’s Theorem states that at least 75% of a data set will fall within 2
standard deviations of the mean.
Lowest value = 50000 - 2 × 10000 = 30000
Highest value = 50000 + 2 × 10000 = 70000
Note: there is a negative relation between the accuracy and the width of the estimated
range: the higher the guaranteed percentage, the wider (less precise) the range.
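The house-price example can be reproduced with a short Python sketch. Both function names are hypothetical. `k_for_proportion` inverts the bound 1 - 1/k² = p, giving k = 1/√(1 - p). For p = 0.55 this is about 1.49, which the example rounds up to 1.5.

```python
import math


def k_for_proportion(p):
    """Smallest k for which Chebyshev's bound 1 - 1/k^2 reaches the proportion p (0 < p < 1)."""
    return 1 / math.sqrt(1 - p)


def chebyshev_interval(mean, sd, k):
    """The interval (mean - k*sd, mean + k*sd)."""
    return (mean - k * sd, mean + k * sd)


print(round(k_for_proportion(0.55), 4))       # 1.4907, rounded up to 1.5 in the example
print(chebyshev_interval(50000, 10000, 1.5))  # (35000.0, 65000.0)

print(k_for_proportion(0.75))                 # 2.0
print(chebyshev_interval(50000, 10000, 2))    # (30000, 70000)
```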
When the distribution of the data is roughly bell-shaped, the Empirical Rule states that:
The interval (μ - σ, μ + σ) will contain approximately 68% of all the measurements.
The interval (μ - 2σ, μ + 2σ) will contain approximately 95% of all the measurements.
The interval (μ - 3σ, μ + 3σ) will contain approximately 99.7% of all the measurements.
EX:
A survey of local companies found that the mean expenditure on traveling for
individuals was $0.25 per month. The standard deviation was $0.025.
1- Using Chebyshev’s theorem, find the minimum percentage of the individuals’
expenditures that will fall between $0.20 and $0.30.
2- Assuming the distribution of the expenditures is bell-shaped, find the approximate
percentage of the individuals’ expenditures that will fall between $0.20 and $0.30.
1- Compute the value of k:
$k = \frac{.30 - .25}{.025} = 2$ or $k = \frac{.25 - .20}{.025} = 2$
By Chebyshev’s theorem, at least 1 - 1/2² = 75% of the individuals’ expenditures will fall between $0.20 and $0.30.
2- By the Empirical Rule, approximately 95% of the individuals’ expenditures will fall between $0.20 and $0.30.
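As a cross-check of this solution, the sketch below computes k from the interval and then looks up both answers. The helper name `k_from_bounds` and the `EMPIRICAL` lookup table are illustrative assumptions; exact fractions avoid floating-point rounding in (0.30 - 0.25)/0.025.

```python
from fractions import Fraction

# Approximate percentages from the Empirical Rule for k = 1, 2, 3 (bell-shaped data only).
EMPIRICAL = {1: 68.0, 2: 95.0, 3: 99.7}


def k_from_bounds(mean, sd, low, high):
    """Number of standard deviations from the mean to each end of a symmetric interval."""
    k_low = (mean - low) / sd
    k_high = (high - mean) / sd
    if k_low != k_high:
        raise ValueError("interval is not symmetric about the mean")
    return k_low


mean, sd = Fraction("0.25"), Fraction("0.025")
k = k_from_bounds(mean, sd, Fraction("0.20"), Fraction("0.30"))

chebyshev_pct = float(1 - 1 / k ** 2) * 100   # guaranteed minimum, any distribution
empirical_pct = EMPIRICAL[int(k)]             # approximate, bell-shaped distribution

print(k)               # 2
print(chebyshev_pct)   # 75.0
print(empirical_pct)   # 95.0
```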