Lecture 5: Measures of Center and Variability for Distributions (Population); Quartiles, Boxplots for Data (Sample)

Chapter 2

2.1 Measures of Center (Distributions)
• Mean for continuous distributions
  – Let f(x) be the density function of a continuous random variable X. Then the mean of X is:
    $$\mu_X = \int_{-\infty}^{\infty} x \cdot f(x)\,dx$$
• Mean for discrete distributions
  – Let p(x) be the mass function of a discrete random variable X. Then the mean of X is:
    $$\mu_X = \sum_{x} x \cdot p(x)$$

Examples: Means
• Continuous distributions
  – Normal (µ, σ): µ
  – Exponential (λ): 1/λ
  – Uniform (a, b): (a+b)/2
• Discrete distributions
  – Binomial (n, π): nπ
  – Poisson (λ): λ

Example 1
• Find the mean value of a variable X with density function:
    f(x) = 1.5(1 - x^2) for 0 < x < 1, and f(x) = 0 otherwise.
• What is the median of the distribution of X?

Example: Medians
• Median for continuous distributions
  – Let f(x) be the density function of a continuous random variable X. Then the median of X is the value $\tilde{\mu}$ that satisfies:
    $$\int_{-\infty}^{\tilde{\mu}} f(x)\,dx = 0.5$$
• What is the median of a Normal distribution?
• If a continuous distribution is perfectly symmetric, mean = median.
• The integral definition above does not carry over to discrete distributions, so no median is defined for them here.

2.2 Measures of Variability (Distributions)
• Variance for continuous distributions
  – Let f(x) be the density function of a continuous random variable X. Then the variance of X is:
    $$\sigma_X^2 = \int_{-\infty}^{\infty} (x - \mu_X)^2 \cdot f(x)\,dx$$
• Variance for discrete distributions
  – Let p(x) be the mass function of a discrete random variable X. Then the variance of X is:
    $$\sigma_X^2 = \sum_{x} (x - \mu_X)^2 \cdot p(x)$$
• $\sigma_X$, the standard deviation of X, is the square root of the variance $\sigma_X^2$.

Examples: Variances
• Continuous distributions
  – Normal (µ, σ): $\sigma^2$
  – Exponential (λ): $1/\lambda^2$
  – Uniform (a, b): $(b-a)^2/12$
• Discrete distributions
  – Binomial (n, π): nπ(1-π)
  – Poisson (λ): λ
• Self-reading: “empirical rule” (68-95-99.7 rule) in the middle of Pg 76, “unbiasedness” at the top of Pg 77

Example 2
• What is the standard deviation of X from Example 1?
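The questions in Example 1 and Example 2 can be checked numerically. The sketch below is not part of the lecture: it approximates the defining integrals with a simple midpoint rule and finds the median by bisection on the cumulative area. The helper names (`density`, `integrate`, `median_by_bisection`) and the step counts are illustrative assumptions.

```python
# Density from Example 1: f(x) = 1.5 * (1 - x^2) on 0 < x < 1, and 0 otherwise.
def density(x):
    return 1.5 * (1.0 - x * x) if 0.0 < x < 1.0 else 0.0


def integrate(func, lower, upper, steps=100_000):
    """Midpoint-rule approximation of the integral of func over [lower, upper]."""
    width = (upper - lower) / steps
    return sum(func(lower + (i + 0.5) * width) for i in range(steps)) * width


def median_by_bisection(pdf, lower, upper, tol=1e-9):
    """Find m with integral of pdf from lower to m equal to 0.5 (pdf must be >= 0)."""
    lo, hi = lower, upper
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # Area to the left of mid; if it is below 0.5 the median lies to the right.
        if integrate(pdf, lower, mid, steps=2_000) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Mean: integral of x * f(x); variance: integral of (x - mean)^2 * f(x).
mean = integrate(lambda x: x * density(x), 0.0, 1.0)
variance = integrate(lambda x: (x - mean) ** 2 * density(x), 0.0, 1.0)
std_dev = variance ** 0.5
median = median_by_bisection(density, 0.0, 1.0)

print(f"mean = {mean:.4f}, median = {median:.4f}, sd = {std_dev:.4f}")
```

Working the integrals by hand gives a mean of 0.375 and a variance of 0.059375 (standard deviation about 0.244). The median solves m^3 - 3m + 1 = 0 on (0, 1), which is about 0.347. The mean exceeds the median, which is consistent with this density being positively skewed.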
2.3 Other Measures: Quartiles
• The median is the midpoint of the data (Sample)
• Quartiles break the data into quarters
  – 1st Quartile (Q1) = lower quartile = 25th percentile
  – 2nd Quartile = median = 50th percentile
  – 3rd Quartile (Q3) = upper quartile = 75th percentile
• How to find the quartiles?
  – They are just the medians of the lower and upper halves of the sorted data
• Interquartile Range (IQR) = Q3 - Q1
• Self-reading: Percentiles, Pg 85

Example
• Scores for 10 students are: 78 80 80 81 82 83 85 85 86 87
• Find the median and quartiles:
  1. Median = Q2 = M = (82+83)/2 = 82.5
  2. Q1 = median of the lower half (78 80 80 81 82) = 80
  3. Q3 = median of the upper half (83 85 85 86 87) = 85
  Therefore, IQR = Q3 - Q1 = 85 - 80 = 5
• Additionally, find Min and Max: Min = 78, Max = 87
  – We get a five-number summary!

    Min   Q1   Median   Q3   Max
    78    80   82.5     85   87

Boxplots; Modified Version
• Visual representation of the five-number summary
  – Central box: Q1 to Q3
  – Line inside box: Median
  – Extended straight lines: from each end of the box to the lowest and highest observations
• Modified boxplots: only extend the lines to the smallest and largest observations that are not outliers. Each mild outlier* is represented by a closed circle and each extreme outlier** by an open circle.
  *Any observation farther than 1.5 IQR from the closest quartile is an outlier.
  **An outlier is extreme if it is more than 3 IQR from the nearest quartile, and mild otherwise.

Example
• Five-number summary is:
  – Min: 78
  – Q1: 80
  – Median: 82.5
  – Q3: 85
  – Max: 87
• Draw a boxplot:

More on Boxplots
• Much more compact than histograms
• “Quick and dirty” visual picture
• Gives a rough idea of how the data are distributed
  – Shows the center/typical value (the median)
  – The position of the median line within the box indicates whether the data are symmetric or positively/negatively skewed
  – The central box (Q1 to Q3, of width IQR) covers the middle 50% of the data
  – Min to Max gives the entire range
• Side-by-side boxplots are very useful for comparisons
  – See from slide 10

Describe a Boxplot
• Symmetric? If not, positively or negatively skewed (based on the median line)?
• Outliers? Based on the 1.5 IQR rule (and the 3 IQR rule for extreme outliers)
• Overall range = Max - Min
• IQR = width of the central box
• Similar procedure for side-by-side comparisons

Examples: MPG

After Class …
• Review sections 2.1 and 2.2, especially Pg 63 – 68, 74 – 77
• Review section 2.3, self-reading Pg95
• Read sections 2.4 and 5.1
• Hw#2, due by 5pm next Monday
• Start Lab#2 now; you don’t want to wait until the last minute of next Thursday pm…
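For review, the section 2.3 scores example and the outlier rules for modified boxplots can be reproduced in a few lines of Python. This is a minimal sketch, not the textbook's procedure: the function names are hypothetical, and for an odd number of observations it assumes the median is left out of both halves, a convention the slides do not specify (the example has an even count, so it is unaffected).

```python
from statistics import median


def five_number_summary(values):
    """Min, Q1, median, Q3, max using the 'medians of the two halves' method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half = data[:half]
    # Assumption: for odd n, the middle observation belongs to neither half.
    upper_half = data[half:] if n % 2 == 0 else data[half + 1:]
    return {
        "min": data[0],
        "q1": median(lower_half),
        "median": median(data),
        "q3": median(upper_half),
        "max": data[-1],
    }


def classify_outliers(values):
    """Split outliers into mild (> 1.5 IQR) and extreme (> 3 IQR) lists."""
    summary = five_number_summary(values)
    iqr = summary["q3"] - summary["q1"]
    mild, extreme = [], []
    for x in values:
        # Distance from the nearest quartile; 0 for points inside the box.
        distance = max(summary["q1"] - x, x - summary["q3"], 0)
        if distance > 3 * iqr:
            extreme.append(x)
        elif distance > 1.5 * iqr:
            mild.append(x)
    return mild, extreme


scores = [78, 80, 80, 81, 82, 83, 85, 85, 86, 87]
summary = five_number_summary(scores)
mild, extreme = classify_outliers(scores)
print(summary)
print("IQR:", summary["q3"] - summary["q1"])
print("mild outliers:", mild, "extreme outliers:", extreme)
```

For these scores the summary matches the slide (78, 80, 82.5, 85, 87) with IQR = 5. Points count as outliers only below 80 - 7.5 = 72.5 or above 85 + 7.5 = 92.5, so neither list contains any scores and the whiskers of the modified boxplot run all the way to Min and Max.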