Review Pareto Chart

Review
Pareto Chart
• time series: a data set collected over time
• time plot: a graph of a time series, with time on the horizontal axis
For quantitative variables,
• what is a representative observation like?
center of the distribution: mean & median
• Do the observations take similar values, or are they quite
spread out?
spread of the distribution
Numerical Summaries (Statistics):
• Mean: the sum of the observations divided by the number of
observations
• Median: the midpoint of the observations when they are
ordered from the smallest to the largest (or from the largest
to the smallest)
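The two definitions above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function names and the data values are made up, and averaging the two middle values when the number of observations is even is the usual convention, which the definition above does not spell out.

```python
import statistics


def mean(xs):
    """Sum of the observations divided by the number of observations."""
    return sum(xs) / len(xs)


def median(xs):
    """Midpoint of the observations once they are ordered."""
    ordered = sorted(xs)          # order from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                # odd n: the single middle observation
        return ordered[mid]
    # even n: average the two middle observations (assumed convention)
    return (ordered[mid - 1] + ordered[mid]) / 2


data = [7, 3, 9, 4, 12]           # hypothetical observations
print(mean(data))                 # 7.0
print(median(data))               # 7

# Cross-check against the standard library
print(statistics.mean(data) == mean(data))
print(statistics.median(data) == median(data))
```

Ordering from largest to smallest would give the same median, since the middle position does not change.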
Symbols:
• use x, y to denote variables
• use n to denote the sample size
• use x̄ to denote the sample mean
• formula for mean:
$\bar{x} = \dfrac{\sum x}{n}$
Properties of mean:
• The mean is the balance point of the data
• For a skewed distribution, the mean is pulled in the direction
of the longer tail, relative to the median.
• The mean can be highly influenced by an outlier
An outlier is an observation that falls well above or well
below the overall bulk of the data.
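As a rough check of the first two properties, the sketch below uses a small made-up, right-skewed sample. The deviations from the mean cancel out (the "balance point" idea), and the mean lands above the median, on the side of the longer tail. The data values are assumptions chosen for easy arithmetic.

```python
import statistics

# Hypothetical right-skewed sample: most values are small, one is far to the right
data = [1, 2, 2, 3, 4, 5, 18]

x_bar = statistics.mean(data)
med = statistics.median(data)

# Balance point: deviations above and below the mean sum to zero
deviations = [x - x_bar for x in data]
print(sum(deviations))    # 0

# Skew: the mean is pulled toward the long right tail, past the median
print(x_bar, med)         # 5 3
```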
While the mean is sensitive to extreme observations, the median is
resistant to them.
• A numerical summary of the observations is called resistant if
extreme observations have little, if any, influence on its value.
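A small illustration of resistance, using made-up numbers: replacing the largest observation with an extreme value moves the mean a long way but leaves the median where it was.

```python
import statistics

original = [10, 12, 13, 15, 20]        # hypothetical observations
with_outlier = [10, 12, 13, 15, 200]   # same data, largest value replaced by an outlier

print(statistics.mean(original), statistics.median(original))          # 14 13
print(statistics.mean(with_outlier), statistics.median(with_outlier))  # 50 13
```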
Sometimes, the median is too resistant!
Choice of mean or median
• If a distribution is very highly skewed, the median is usually
preferred over the mean because it better represents what is
typical.
• If the distribution is close to symmetric or only mildly skewed
or if it is discrete with few distinct values, the mean is usually
preferred because it uses the numerical values of all the
observations.
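One way to see why the median can be "too resistant" for discrete data with few distinct values is sketched below. The two samples are invented counts taking only the values 0, 1, and 2. Their distributions clearly differ, yet the median is the same for both, while the mean reflects the shift.

```python
import statistics

# Hypothetical counts with only three distinct values
sample_a = [0, 0, 0, 1, 1, 1, 1, 1, 2, 2]
sample_b = [0, 1, 1, 1, 1, 1, 2, 2, 2, 2]

# The median cannot tell the two samples apart
print(statistics.median(sample_a), statistics.median(sample_b))

# The mean uses every observation, so it picks up the difference
print(statistics.mean(sample_a), statistics.mean(sample_b))
```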