Lecture 1-3
Descriptive Statistics
Central Tendency
Descriptive Statistics
• Two important characteristics of a population:
– Center: measures of central tendency
– Behavior around the center: measures of variability or dispersion
• Examples of measures of central tendency: mean, median, mode
• Examples of measures of dispersion: sample standard deviation, sample range
Sample Mean
• The arithmetic average of the data.
• Denoted by $\bar{x}$.
• If we have $n$ items in our sample, $x_1, x_2, \ldots, x_n$, then
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
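The definition maps directly onto code. The following is a minimal Python sketch; `sample_mean` is a hypothetical helper name, and the standard library's `statistics.mean` should give the same result for ordinary numeric data.

```python
def sample_mean(xs):
    """Return x-bar: the sum of the n observations divided by n."""
    if len(xs) == 0:
        raise ValueError("the sample mean of an empty sample is undefined")
    return sum(xs) / len(xs)


print(sample_mean([2.0, 4.0, 9.0]))  # 5.0
```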
Sample Median
• The “middle” value of the data.
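• Denoted $\tilde{x}$.
• Requires the data to be ordered from smallest to largest: $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$.
• The median divides the data into two equal parts.
• Let $(n+1)/2$ be the “location” of the middle.
Sample Median
• If $n$ is odd, the median is $\tilde{x} = x_{((n+1)/2)}$.
• If $n$ is even, it is $\tilde{x} = \frac{x_{(n/2)} + x_{(n/2+1)}}{2}$.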
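The odd/even rule can be sketched in Python as follows. `sample_median` is a hypothetical helper; Python lists are 0-based, so the 1-based positions in the formulas shift down by one.

```python
def sample_median(xs):
    """Median of a sample, following the odd/even rule on the ordered data."""
    if len(xs) == 0:
        raise ValueError("the sample median of an empty sample is undefined")
    ordered = sorted(xs)  # x_(1) <= x_(2) <= ... <= x_(n)
    n = len(ordered)
    mid = n // 2          # 0-based index
    if n % 2 == 1:
        # n odd: the value at 1-based position (n + 1) / 2
        return ordered[mid]
    # n even: average of the values at 1-based positions n/2 and n/2 + 1
    return (ordered[mid - 1] + ordered[mid]) / 2


print(sample_median([7, 1, 3]))      # 3
print(sample_median([7, 1, 3, 10]))  # 5.0
```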
Sample Mode
• Most frequently observed value
• Has no special notation
• Often not unique
– Multiple modes
– Frequently occurs with small data sets
• Not used often in practice.
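Because the mode is often not unique, a sketch that returns every most-frequent value may be more faithful than one that returns a single number. `sample_modes` is a hypothetical helper built on `collections.Counter`; the standard library's `statistics.multimode` (Python 3.8+) behaves similarly but keeps first-seen order instead of sorting.

```python
from collections import Counter


def sample_modes(xs):
    """Return all most-frequent values in ascending order (the mode need not be unique)."""
    counts = Counter(xs)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)


print(sample_modes([1, 2, 2, 3, 3, 4]))  # [2, 3]
print(sample_modes([5, 1, 5]))           # [5]
```

If every value occurs exactly once, this sketch returns all of them; some texts would instead say that such data have no mode.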
Example
• The following data are the results of tests done by the National Bureau of Standards to determine the melting point of biphenyl:
343.0, 342.4, 343.4, 343.1, 343.3, 343.7, 343.5,
343.1, 343.3, 343.4, 343.8, 343.3, 343.3, 343.3
• Compute the mean, median, and mode.
Example
• Mean: $\sum x_i = 4805.9$, so the mean is $4805.9/14 \approx 343.28$
• Median: Ordered data set:
342.4, 343.0, 343.1, 343.1, 343.3, 343.3, 343.3,
343.3, 343.3, 343.4, 343.4, 343.5, 343.7, 343.8
Example
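• Median, continued: $n = 14$ is even, so the median is the average of the 7th and 8th ordered values: $(343.3 + 343.3)/2 = 343.3$
• Mode: the most frequent value is 343.3 (it occurs five times)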
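The hand calculations can be cross-checked with Python's standard `statistics` module. This sketch assumes Python 3.8 or later (for `multimode`) and uses the 14 values exactly as listed in the data above.

```python
import statistics

# The 14 melting points of biphenyl, as listed in the data above.
melting_points = [343.0, 342.4, 343.4, 343.1, 343.3, 343.7, 343.5,
                  343.1, 343.3, 343.4, 343.8, 343.3, 343.3, 343.3]

mean = statistics.mean(melting_points)
median = statistics.median(melting_points)    # n = 14: average of 7th and 8th ordered values
modes = statistics.multimode(melting_points)  # every most-frequent value (Python 3.8+)

print(f"mean   = {mean:.2f}")    # mean   = 343.28
print(f"median = {median:.1f}")  # median = 343.3
print(f"modes  = {modes}")       # modes  = [343.3]
```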
Relationships
• For symmetric distributions, the mean, median, and mode tend to be very similar.
• For right-skewed distributions, the tendency is
Mode < Median < Mean
• For left-skewed distributions, the tendency is
Mean < Median < Mode
• These are tendencies, not rules: in a particular sample, especially a small one, the ordering may not hold.
Which Measure of Centrality is Best?
• Depends on the data:
– The mean is sensitive to extreme values.
– The median is not sensitive to these values.
– The mode is not always representative of all of the data.
• The median is an example of a “robust” measure of the center.
• It is less sensitive to “outliers.”
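A small made-up example illustrates the robustness claim: replacing one value with an extreme one moves the mean a lot and leaves the median untouched. The data below are hypothetical, chosen only for illustration.

```python
import statistics

# Hypothetical data: identical except that the largest value becomes an outlier.
clean = [10, 11, 12, 13, 14]
with_outlier = [10, 11, 12, 13, 140]

for label, xs in (("clean", clean), ("with outlier", with_outlier)):
    print(f"{label:>12}: mean = {statistics.mean(xs):6.1f}, "
          f"median = {statistics.median(xs):6.1f}")
```

With these numbers the mean jumps from 12 to 37.2, while the median stays at 12.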
Descriptive Statistics:
Variability or Dispersion
Measures of Variability/Dispersion
• Measures of central tendency give
information only about the “typical” value.
• Real data exhibit variability.
Common Measures of Variability
• Sample Range
• Sample Variance
• Sample Standard Deviation
• Interquartile Range
Sample Range
• Difference between the largest and smallest observation
• Note: a single value, not a set of values as with the term “range” in mathematics.
• There is no complete agreement on the symbol; these notes use R.
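A one-line sketch of the sample range in Python; `sample_range` is a hypothetical helper name.

```python
def sample_range(xs):
    """Largest observation minus smallest: a single number, not an interval."""
    if len(xs) == 0:
        raise ValueError("the sample range of an empty sample is undefined")
    return max(xs) - min(xs)


print(sample_range([4, 9, 1, 7]))  # 8
```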
Sample Variance
• Deviations from the typical value are the heart
of variability.
• Let $x_i - \bar{x}$ be the deviation or residual of the $i$th observation.
• Definition (for now):
Sample Variance
• Consider
Sample Variance
• Now consider
Sample Variance
• Theoretical Formula:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$
• Computational Formula:
$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1}$$
• The $n-1$ in the denominator reflects the “degrees of freedom.”
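The two variance formulas can be compared numerically. This sketch uses hypothetical helper names and made-up data, and also checks that the raw deviations $x_i - \bar{x}$ sum to zero, which is why they are squared before being combined.

```python
def variance_theoretical(xs):
    """s^2 = sum of squared deviations (x_i - x-bar)^2, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


def variance_computational(xs):
    """s^2 = (sum of x_i^2 - n * x-bar^2) / (n - 1), using only running sums."""
    n = len(xs)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    # (sum x_i)^2 / n is the same quantity as n * x-bar^2
    return (total_sq - total * total / n) / (n - 1)


data = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]  # made-up illustrative values
xbar = sum(data) / len(data)
print(sum(x - xbar for x in data))     # 0.0: the raw deviations cancel
print(variance_theoretical(data))      # 4.8
print(variance_computational(data))    # 4.8
```

In floating point, the computational form can lose accuracy when the data are large relative to their spread, because it subtracts two nearly equal sums; the two-pass theoretical form avoids that.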
Sample Standard Deviation
• Note: the units of the sample variance are the squares of the data's units.
• The sample standard deviation, $s = \sqrt{s^2}$, converts the sample variance back into the units of the data.
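The relationship $s = \sqrt{s^2}$ can be checked against Python's standard library, which provides both `statistics.variance` (divisor $n-1$) and `statistics.stdev`. The data are made up for illustration.

```python
import math
import statistics

data = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]  # made-up illustrative values
s2 = statistics.variance(data)         # sample variance (divisor n - 1), in squared units
s = math.sqrt(s2)                      # standard deviation, back in the data's units
print(f"s^2 = {s2:.3f}, s = {s:.3f}")  # s^2 = 4.800, s = 2.191
print(math.isclose(s, statistics.stdev(data)))  # True
```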
Interquartile Range (IQR)
• Range of the “middle 50%” of the data.
• Requires calculation of the “quartiles”
– First quartile ($Q_1$): approximately 25% of the data lie below it
– Third quartile ($Q_3$): approximately 25% of the data lie above it
• Conceptually:
– Use the median to split the data into two parts.
– $Q_1$ is the median of the lower half.
– $Q_3$ is the median of the upper half.
– There is no universal definition beyond this concept!
IQR
• IQR = $Q_3 - Q_1$
• Robust estimate of dispersion
– Insensitive to outliers
– Concept of leverage:
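Because there is no universal definition of the quartiles, any implementation has to pick a convention. This sketch follows the “median of each half” idea from the slides and assumes that, for odd $n$, the median itself is left out of both halves; the helper names are hypothetical.

```python
def _median_sorted(v):
    """Median of an already-sorted, non-empty list."""
    n = len(v)
    mid = n // 2
    return v[mid] if n % 2 == 1 else (v[mid - 1] + v[mid]) / 2


def quartiles_by_halves(xs):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: when n is odd, the middle observation (the median) is left
    out of both halves. Other conventions include it, which can change Q1 and Q3.
    """
    v = sorted(xs)
    n = len(v)
    if n < 2:
        raise ValueError("quartiles need at least two observations")
    half = n // 2
    lower = v[:half]
    upper = v[half + n % 2:]  # skip the middle value when n is odd
    return _median_sorted(lower), _median_sorted(upper)


def iqr(xs):
    """Interquartile range Q3 - Q1."""
    q1, q3 = quartiles_by_halves(xs)
    return q3 - q1


print(quartiles_by_halves([1, 2, 3, 4, 5, 6, 7, 8]))  # (2.5, 6.5)
print(iqr([1, 2, 3, 4, 5, 6, 7, 1000]))               # 4.0
```

The second call shows the robustness: replacing the largest value with 1000 leaves the IQR at 4.0. Library routines such as `statistics.quantiles` interpolate between observations, so their quartiles can differ slightly from this convention.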
Measures of Position
• Measures of position:
– Describe the relative position of a specific data value in ranked order
– Common term: percentile
– Important for robust and nonparametric statistics
• Examples:
– Median: estimate of the 50th percentile
– First quartile: estimate of the 25th percentile
– Third quartile: estimate of the 75th percentile
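Percentiles have several competing definitions. As one simple convention, a value's relative position can be reported as the percentage of observations strictly below it; `percent_below` and the scores are hypothetical, made up for illustration.

```python
def percent_below(xs, value):
    """Percentage of observations strictly less than `value`.

    One simple convention for a value's relative position; others count ties
    as half or use "at or below", so results can differ slightly.
    """
    if len(xs) == 0:
        raise ValueError("need at least one observation")
    below = sum(1 for x in xs if x < value)
    return 100.0 * below / len(xs)


scores = [55, 62, 70, 70, 81, 88, 90, 94]  # made-up illustrative values
print(percent_below(scores, 81))  # 50.0
print(percent_below(scores, 94))  # 87.5
```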
Al Contamination in PET Plastic
• Data:
291, 222, 125, 79, 145, 119, 244, 118, 182, 63, 30, 140, 101,
102, 87, 183, 60, 191, 119, 511, 120, 172, 70, 30, 90, 115
• Variance: $n = 26$; $\sum x_i = 3709$; $\sum x_i^2 = 770205$
Contamination - Continued
• Variance (continued): $s^2 = \frac{770205 - 3709^2/26}{26 - 1} \approx 9644.08$
• Standard Deviation: $s = \sqrt{9644.08} \approx 98.20$
Contamination - Continued
• Data in Ascending Order
30 30 60 63 70 79 87
90 101 102 115 118 119
119 120 125 140 145 172 182
183 191 222 244 291 511
• Range:
Min = 30; Max = 511
R = Max − Min = 511 − 30 = 481
Contamination - Continued
• IQR
– Lower half and upper half have 13 values each
– Median of each half is the 7th value: $Q_1 = 87$, $Q_3 = 182$
– IQR = $Q_3 - Q_1$ = 182 − 87 = 95
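The contamination results can be reproduced in a few lines of Python. This sketch uses the 26 observations listed above, the computational variance formula, and the median-of-halves quartiles described on the IQR slide.

```python
import math
import statistics

# Al contamination data from the slides (26 observations).
al = [291, 222, 125, 79, 145, 119, 244, 118, 182, 63, 30, 140, 101,
      102, 87, 183, 60, 191, 119, 511, 120, 172, 70, 30, 90, 115]

n = len(al)
total = sum(al)                             # sum of x_i
total_sq = sum(x * x for x in al)           # sum of x_i^2
s2 = (total_sq - total ** 2 / n) / (n - 1)  # computational formula
s = math.sqrt(s2)

ordered = sorted(al)
r = ordered[-1] - ordered[0]                # sample range R

half = n // 2                               # n = 26 is even: 13 values per half
q1 = statistics.median(ordered[:half])      # 7th value of the lower half
q3 = statistics.median(ordered[half:])      # 7th value of the upper half

print(n, total, total_sq)              # 26 3709 770205
print(f"s^2 = {s2:.2f}, s = {s:.2f}")  # s^2 = 9644.08, s = 98.20
print(r, q1, q3, q3 - q1)              # 481 87 182 95
```

`statistics.variance(al)` should agree with `s2`. `statistics.quantiles(al, n=4)` uses an interpolation rule, so its quartiles come out slightly different from 87 and 182.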
Which Measure of Variability Is Best?
• Statistical theory uses the variance
– Standard deviation is in the original units
– Variance and standard deviation are sensitive to
extreme values.
• Range
– Not commonly used outside of statistical process control (SPC)
– Very sensitive to extreme values
• IQR
– Useful for exploratory data analysis
– Robust to extreme values