Review
• Pareto Chart
• time series: a data set collected over time
• time plot:

For quantitative variables,
• What is a representative observation like? (center of the distribution: mean & median)
• Do the observations take similar values, or are they quite spread out? (spread of the distribution)

Numerical Summaries (Statistics):
• Mean: the sum of the observations divided by the number of observations
• Median: the midpoint of the observations when they are ordered from the smallest to the largest (or from the largest to the smallest)

Symbols:
• use x, y to denote variables
• use n to denote the sample size
• use x̄ to denote the sample mean
• formula for mean: x̄ = (Σ x) / n

Properties of mean:
• The mean is the balance point of the data.
• For a skewed distribution, the mean is pulled in the direction of the longer tail, relative to the median.
• The mean can be highly influenced by an outlier.

An outlier is an observation that falls well above or well below the overall bulk of the data.

While the mean is sensitive to extreme observations, the median is resistant to them.
• A numerical summary of the observations is called resistant if extreme observations have little, if any, influence on its value.

Sometimes, the median is too resistant!

Choice of mean or median
• If a distribution is very highly skewed, the median is usually preferred over the mean because it better represents what is typical.
• If the distribution is close to symmetric or only mildly skewed, or if it is discrete with few distinct values, the mean is usually preferred because it uses the numerical values of all the observations.
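
Example (a minimal sketch, not part of the original notes): the short Python snippet below illustrates how an outlier pulls the mean much more than the median. The data values and the helper name summarize are made up for illustration; mean and median come from Python's standard statistics module.

```python
from statistics import mean, median


def summarize(observations):
    """Return (mean, median) of a list of numbers (hypothetical helper)."""
    return mean(observations), median(observations)


# Made-up data: five observations, symmetric around 4.
data = [2, 3, 4, 5, 6]

# The same data with one extreme observation (an outlier) added.
data_with_outlier = data + [100]

# The formula from the notes: sum of the observations divided by n.
manual_mean = sum(data) / len(data)

mean_before, median_before = summarize(data)
mean_after, median_after = summarize(data_with_outlier)

print("without outlier: mean =", mean_before, " median =", median_before)
print("with outlier:    mean =", mean_after, " median =", median_after)
```

In this made-up data set the mean moves from 4 to 20 once the value 100 is added, while the median only moves from 4 to 4.5. This is the sense in which the median is resistant and the mean is not.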