Descriptive Statistics

• Tabular and Graphical Displays
  – Frequency Distribution - List of intervals of values for a variable, and the number of occurrences per interval
  – Relative Frequency - Proportion (often reported as a percentage) of observations falling in the interval
  – Histogram/Bar Chart - Graphical representation of a Relative Frequency distribution
  – Stem and Leaf Plot - Horizontal tabular display of data, based on 2 digits (stem/leaf)

Comparing Groups

• Side-by-side bar charts
• 3-dimensional histograms
• Back-to-back stem and leaf plots
• Goal: Compare 2 (or more) groups with respect to the variable(s) being measured
• Do measurements tend to differ among groups?

Sample & Population Distributions

• Distributions of Samples and Populations - As samples get larger, the sample distribution gets smoother and looks more like the population distribution
  – U-shaped - Measurements tend to be large or small, with fewer in the middle range of values
  – Bell-shaped - Measurements tend to cluster around the middle with few extremes (symmetric)
  – Skewed Right - Few extreme large values
  – Skewed Left - Few extreme small values

Measures of Central Tendency

• Mean - Sum of all measurements divided by the number of observations (even distribution of outcomes among cases). Can be highly influenced by extreme values.
• Notation: Sample measurements labeled $Y_1, \ldots, Y_n$

  $\bar{Y} = \frac{Y_1 + \cdots + Y_n}{n} = \frac{\sum_{i=1}^{n} Y_i}{n}$

Median, Percentiles, Mode

• Median - Middle measurement after data have been ordered from smallest to largest. Appropriate for interval and ordinal scales
• Pth percentile - Value where P% of measurements fall below and (100-P)% lie above. Lower quartile (25th), Median (50th), Upper quartile (75th) often reported
• Mode - Most frequently occurring outcome. Typically reported for ordinal and nominal data.
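The relative frequency, mean, median, quartile, and mode definitions above can be tried out with Python's standard `statistics` module. This is a minimal sketch, not part of the original slides: the data set `scores` and the helper names `relative_frequencies` and `center_summary` are hypothetical, and the quartiles use the module's "inclusive" interpolation rule, which is only one of several percentile conventions.

```python
from collections import Counter
import statistics


def relative_frequencies(values):
    """Map each distinct value to its proportion of all observations."""
    counts = Counter(values)
    n = len(values)
    return {value: count / n for value, count in sorted(counts.items())}


def center_summary(values):
    """Return the mean, median, lower/upper quartiles, and mode(s)."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "lower_quartile": q1,
        "upper_quartile": q3,
        "modes": statistics.multimode(values),
    }


# Hypothetical data: six modest values plus one extreme value (40).
scores = [2, 3, 3, 5, 7, 8, 40]

with_extreme = center_summary(scores)
without_extreme = center_summary(scores[:-1])  # drop the extreme value

print("relative frequencies:", relative_frequencies(scores))
print("with extreme value:   ", with_extreme)
print("without extreme value:", without_extreme)
```

Dropping the single extreme value moves the mean from about 9.7 to about 4.7, while the median only moves from 5 to 4, which illustrates why the mean is described as highly influenced by extreme values.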
Measures of Variation

• Measures of how similar or different individuals' measurements are
  – Range - Largest observation minus smallest observation
  – Deviation - Difference between the ith individual's outcome and the sample mean: $Y_i - \bar{Y}$
  – Variance of n observations $Y_1, \ldots, Y_n$ is the "average" squared deviation:

  $s^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n-1} = \frac{(Y_1 - \bar{Y})^2 + (Y_2 - \bar{Y})^2 + \cdots + (Y_n - \bar{Y})^2}{n-1}$

Measures of Variation

• Standard Deviation - Positive square root of the variance (measured in the original units):

  $s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n-1}}$

• Properties of the standard deviation:
  – $s \geq 0$, and $s = 0$ only if all observations are equal
  – s increases with the amount of variation around the mean
  – Division by n-1 (not n) is due to technical reasons (later)
  – s depends on the units of the data (e.g. $1000s vs $)

Empirical Rule

• If the histogram of the data is approximately bell-shaped, then:
  – Approximately 68% of measurements lie within 1 standard deviation of the mean.
  – Approximately 95% of measurements lie within 2 standard deviations of the mean.
  – Virtually all of the measurements lie within 3 standard deviations of the mean.

Other Measures and Plots

• Interquartile Range (IQR) - 75th percentile minus 25th percentile (measures the spread in the middle 50% of data)
• Box Plots - Display a box containing the middle 50% of measurements, with a line at the median and lines extending from the box. Breaks data into four quartiles
• Outliers - Observations falling more than 1.5 × IQR above the upper quartile or below the lower quartile

Dependent and Independent Variables

• Dependent variables are outcomes of interest to investigators. Also referred to as Responses or Endpoints
• Independent variables are Factors that are often hypothesized to affect the outcomes (levels of dependent variables). Also referred to as Predictor or Explanatory Variables
• Research question: Does the I.V. affect the D.V. (I.V. → D.V.)?
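Returning to the measures of variation, the Empirical Rule, and the 1.5 × IQR outlier rule above, the following Python sketch works through each of them. It is illustrative only: the data sets and the helper names (`sample_variance`, `iqr_outliers`, `share_within`) are hypothetical, the bell-shaped data are simulated rather than real, and the quartiles again use the "inclusive" convention of the `statistics` module.

```python
import math
import random
import statistics


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((y - mean) ** 2 for y in values) / (n - 1)


def iqr_outliers(values):
    """Observations more than 1.5 * IQR beyond the lower/upper quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [y for y in values if y < low or y > high]


def share_within(values, k):
    """Proportion of observations within k standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return sum(abs(y - mean) <= k * sd for y in values) / len(values)


# Hypothetical small data set with one unusually large observation.
data = [4, 5, 6, 6, 7, 8, 20]
variance = sample_variance(data)
sd = math.sqrt(variance)
print("variance:", variance, "sd:", sd)
print("outliers by the 1.5 * IQR rule:", iqr_outliers(data))

# s depends on the units: rescaling by 1000 rescales s by 1000.
sd_rescaled = math.sqrt(sample_variance([1000 * y for y in data]))
print("sd after multiplying the data by 1000:", sd_rescaled)

# Simulated (not real) bell-shaped data to check the Empirical Rule.
rng = random.Random(0)
bell = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
for k in (1, 2, 3):
    print(f"share within {k} sd:", share_within(bell, k))
```

For the small data set the mean is 8, the sum of squared deviations is 178, and the variance is 178/6; the quartiles are 5.5 and 7.5, so the fences sit at 2.5 and 10.5 and only the value 20 is flagged.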
Example - Clinical Trials of Cialis

• Clinical trials conducted worldwide to study efficacy and safety of Cialis (Tadalafil) for ED
• Patients randomized to Placebo, 10mg, and 20mg
• Co-Primary outcomes:
  – Change from baseline in the erectile function domain of the International Index of Erectile Function (Numeric)
  – Response to: “Were you able to insert your P… into your partner’s V…?” (Nominal: Yes/No)
  – Response to: “Did your erection last long enough for you to have successful intercourse?” (Nominal: Yes/No)

Source: Carson, et al. (2004).

Example - Clinical Trials of Cialis

• Population: All adult males suffering from erectile dysfunction
• Sample: 2102 men with mild-to-severe ED in 11 randomized clinical trials
• Dependent Variable(s): Co-primary outcomes listed on the previous slide
• Independent Variable: Cialis Dose (0, 10, 20 mg)
• Research Question: Does use of Cialis improve erectile function?

Sample Statistics/Population Parameters

• Sample Mean and Standard Deviation are the most commonly reported summaries of sample data. They are random variables since they will change from one sample to another.
• Population Mean ($\mu$) and Standard Deviation ($\sigma$) computed from a population of measurements are fixed (unknown in practice) values called parameters.
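The distinction between sample statistics and population parameters can be illustrated with a small simulation. This is a sketch under stated assumptions, not data from the trials: the population is assumed to be normal with hypothetical parameters `MU = 50` and `SIGMA = 10`, and the sample sizes and random seed are arbitrary choices.

```python
import random
import statistics

# Hypothetical population parameters, fixed by assumption for this sketch.
MU = 50.0
SIGMA = 10.0


def draw_sample(rng, n):
    """Draw n measurements from the assumed normal population."""
    return [rng.gauss(MU, SIGMA) for _ in range(n)]


def sample_summary(sample):
    """Return the sample mean and the sample standard deviation (n - 1)."""
    return statistics.mean(sample), statistics.stdev(sample)


rng = random.Random(2024)  # arbitrary seed so the run is repeatable

# Five small samples: the statistics change from one sample to another.
small_summaries = [sample_summary(draw_sample(rng, 25)) for _ in range(5)]
for mean, sd in small_summaries:
    print(f"n=25   mean={mean:6.2f}  sd={sd:5.2f}")

# One larger sample: the statistics tend to land closer to MU and SIGMA.
large_mean, large_sd = sample_summary(draw_sample(rng, 1000))
print(f"n=1000 mean={large_mean:6.2f}  sd={large_sd:5.2f}")
```

The parameters stay at 50 and 10 throughout, while each sample produces its own mean and standard deviation; in practice only the sample values are observed and the parameters are unknown.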