Comparisons across normal distributions
Z-Scores

Overview: Plan for the night
  Z-scores
    Definition
    Calculation
    Use
  Graphing Data/Distributions
    Frequencies/Percentages
    Charts/Graphs

Last time…
  Last week we covered:
    Measures of Central Tendency: Mean, Mode, Median
    Measures of Variability: Range, IQR, SIQR, Standard Deviation
  The most commonly used of the above are the Mean and SD, reported as Mean (SD)
  These two measures can be combined to further describe the “position” of a score/datapoint

Is that a good score?
  Mean and SD are useful, but sometimes we need to make comparisons between different measures
  Examples (w/ same unit of measure):
    SAT vs. ACT vs. GRE
    10-yd dash time vs. 40-yd dash time
    Free-throw% vs. FG% vs. 3-Point%
  Examples (w/ different units of measure):
    ERA vs. WHIP
    VO2max vs. Vertical Jump
    BMI vs. %BodyFat vs. Waist Circumference

Minimal Statistics
  Mean, SD, n
  Describe the “typical” score, the “spread” of scores, and the number of cases

Z-scores
  Combine the mean w/ SD to create a new unit of measurement (standardizes scores)
  Clearly identifies a score as above or below the mean AND expresses a score in units of SD
  Examples:
    z-score = 1.00 (1 SD above the mean)
    z-score = -2.00 (2 SD below the mean)

Z-score = 1.0: GRAPHICALLY
  [Figure: normal curve with Z = 1 marked; 84% of scores are smaller than this]
  Recall: 50% of scores are below the mean + 34% of scores are between the mean and 1 SD above

Calculating z-scores
  Z = (X – X̄) / SD   OR   Z = Deviation Score / SD
  Calculate Z for each of the following situations:
    X̄ = 20, SD = 3, X = 32
    X̄ = 9, SD = 2, X = 6

Other features of z-scores
  1) The Mean of a distribution of z-scores = 0
    Recall the mean is the balance point of a distribution, where deviation scores sum to 0
    A z-score of 0 is equivalent to scoring the mean
  2) The SD of a distribution of z-scores = 1
    Since the SD is the unit of measurement, when the mean is z = 0, the mean + 1 SD = a z-score of 1
  3) A z-score distribution is the same shape as the raw score distribution
    Even though you are changing the unit of measurement, this does not change the “look” of the distribution when plotted

Here is our normal distribution example from last week (X̄ = 70, SD = 10)
  Raw score:   40    50    60    70    80    90   100
  Z:           -3    -2    -1     0     1     2     3
  Percentage of scores in each 1-SD band, left to right: 2.3%, 13.6%, 34.1%, 34.1%, 13.6%, 2.3%
  If a subject scored 70, their z-score would be 0
  What is the z-score of a subject that got: 80? 50? 100?
  34% of scores still fall between z-scores of 0 and 1

Z-score Comparison
  As stated, z-scores standardize different distributions, allowing you to make comparisons regardless of the unit of measure
  Bart’s score: SAT Exam 450 (mean 500, SD 100)
  Lisa’s score: ACT Exam 24 (mean 18, SD 6)
  Who scored higher?
    Bart: (450 – 500)/100 = -0.5
    Lisa: (24 – 18)/6 = 1
  Lisa (1 SD above the mean) scored higher than Bart (0.5 SD below the mean)

Z-scores & the normal curve
  For any z-score, we can calculate the percentage of scores between it and the mean, the percentage of all scores below it, and the percentage of all scores above it
  Tons of online calculators: http://www.measuringusability.com/normal_curve.php
  Example: Mean BMI and WC in elementary school boys
    What upper and lower limits include 95% of BMI scores?
    If one boy’s BMI is 22 kg/m2 and another’s WC is 70 cm, which of the two has the highest adiposity?
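
The z-score calculation, the Bart vs. Lisa comparison, and the "percentage of scores below a z-score" idea can be sketched in a few lines of Python. This is only an illustration, not part of the SPSS workflow used in class: the function names (z_score, percent_below, central_limits) are made up for this example, and the percentages assume the scores are normally distributed. It relies on NormalDist from Python's standard statistics module.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Express a raw score in SD units above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (x - mean) / sd


def percent_below(z):
    """Percentage of scores below z, ASSUMING a normal distribution."""
    return NormalDist().cdf(z) * 100


def central_limits(mean, sd, coverage=0.95):
    """Lower and upper raw-score limits enclosing the central `coverage`
    proportion of scores, ASSUMING a normal distribution."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # about 1.96 for 95%
    return mean - z * sd, mean + z * sd


# Bart vs. Lisa, using the exam means and SDs from the slide
bart_z = z_score(450, mean=500, sd=100)  # SAT
lisa_z = z_score(24, mean=18, sd=6)      # ACT
print(f"Bart: z = {bart_z:+.2f}")
print(f"Lisa: z = {lisa_z:+.2f}")
print("Higher relative score:", "Lisa" if lisa_z > bart_z else "Bart")

# A z-score of 1 sits above roughly 84% of scores (50% + 34%)
print(f"Percent below z = 1: {percent_below(1.0):.1f}%")

# 95% limits for last week's example distribution (mean 70, SD 10)
low, high = central_limits(70, 10)
print(f"95% of scores fall between {low:.1f} and {high:.1f}")
```

The same central_limits call could be applied to the BMI question once the mean and SD for the boys are filled in; those values are not given in the text here, so they are left out.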
Nomenclature/Terminology
  Frequency: number of cases, subjects, or occurrences in a distribution
    Represented with f
    e.g., f = 12 for a score of 25 means 12 occurrences of 25 in the sample
  Percentage: number of cases, subjects, or occurrences expressed per 100
    Represented with P or %
    e.g., f = 12 for a score of 25 when n = 25: P = 12/25 × 100 = 48% (of scores were 25)

Warning
  Report the f when presenting percentages
  e.g., “80% of the elementary students came from a family with an income < $25,000” has a different interpretation if n = 5 compared to n = 100
  Reported in the literature as f = 4 (80%) OR 80% (f = 4) OR 80% (n = 4)

Numerator Monster
  The Pantagraph reported that State Farm paid out over $1 billion in dividends to customers in the United States (Pantagraph, 6/13/00)
  How much do you pay in car insurance every 6 months?
  So…how much is State Farm keeping?

Frequency Distributions
  Graphically displaying the data should ALWAYS come before any type of statistical analysis
  Measures of central tendency and variability will give you a feeling for the distribution of the data, but it’s always easier to visually examine it
    Check for normality (are the data normally distributed?)
    Check for outliers (are any subjects sticking out as odd?)
    Check for potential associations (might two variables relate to each other?)

Frequency Distribution of Math Test Scores: SPSS Output
  Score   Frequency   Percent   Valid Percent   Cumulative Percent
  24          1          3.0         3.0                3.0
  25          1          3.0         3.0                6.1
  28          2          6.1         6.1               12.1
  29          2          6.1         6.1               18.2
  30          1          3.0         3.0               21.2
  31          1          3.0         3.0               24.2
  32          3          9.1         9.1               33.3
  33          1          3.0         3.0               36.4
  34          6         18.2        18.2               54.5
  35          3          9.1         9.1               63.6
  36          4         12.1        12.1               75.8
  37          8         24.2        24.2              100.0
  Total      33        100.0       100.0
  40 items on exam
  Most students scored 34 or higher
  Skewed (more scores at one end of the scale)

Cumulative frequencies & cumulative percentages
  See the Cumulative Percent column of the SPSS output above
  Cumulative percentage: what percentage of subjects scored at or below a given score?
  i.e., 33.3% of students scored 32 or lower

Eyeball check of data: Intro to (brute force) graphing with SPSS
  Stem and Leaf Plot: quick viewing of data distribution
  Boxplot: visual representation of many of the descriptive statistics discussed last week
  Bar Chart: frequency of all cases
  Histogram: malleable bar chart
  Scatterplot: displays all cases based on two values of interest (X & Y)
  Note: compare to our previous discussion of distributions (normal, positively skewed, etc…)

Stem and Leaf Plots
  Frequency    Stem &  Leaf
      2.00 Extremes    (=<25.0)
      2.00       28 .  00
      2.00       29 .  00
      1.00       30 .  0
      1.00       31 .  0
      3.00       32 .  000
      1.00       33 .  0
      6.00       34 .  000000
      3.00       35 .  000
      4.00       36 .  0000
      8.00       37 .  00000000
  Stem width: 1
  Each leaf: 1 case
  Fast look at the shape of the distribution
  Shows f numerically & graphically
  Stem is the value, leaf is f

Stem and Leaf Plots
  Another way of doing a stemplot
  Babe Ruth’s home runs in each of 15 seasons with the NY Yankees:
  54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22
    2 | 25
    3 | 45
    4 | 1166679
    5 | 449
    6 | 0

Stem and Leaf Plots
  Back-to-back stem plots allow you to visualize two data sets at the same time
  Babe Ruth vs. Roger Maris
    Maris |   | Ruth
        8 | 0 |
      643 | 1 |
      863 | 2 | 25
       93 | 3 | 45
          | 4 | 1166679
          | 5 | 449
        1 | 6 | 0

Boxplots
  [Figure: boxplot of Weight (in pounds), N = 16, Y axis from 80 to 180, with the Maximum, Q3, Median, Q1, and Minimum labeled]
  Note: we can also do side-by-side boxplots for a visual comparison of data sets

Format of Bar Chart
  Y axis: f
  X axis: scores/categories

Test score data as Bar Chart
  [Figure: bar chart; Y axis: Count (0 to 10); X axis: math test score, max = 40 (24, 25, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37)]

Format of Histogram (similar to Bar)
  Y axis: f
  X axis: scores/categories
  Can be manipulated

Test score data as Histogram
  [Figure: histogram; Y axis: 0 to 10; X axis: math test score, max = 40 (24.0 to 38.0 in steps of 2.0); Std. Dev = 3.62, Mean = 33.4, N = 33.00]

Test score data as Revised Histogram
  [Figure: histogram; Y axis: 0 to 14; X axis: math test score, max = 40 (24.6, 27.8, 31.0, 34.2, 37.4); Std. Dev = 3.62, Mean = 33.4, N = 33.00]

Scatterplot
  Quick way to visualize the data & see trends, patterns, etc…
  This plot visually shows the relationship between BMI and WC in a group of elementary school boys

Scatterplot
  Somebody shook their pedometer for 2 hours a day…
  Here’s the relationship between females’ steps/day and waist circumference

Scatterplot: Outlier removed
  This will impact any statistical tests you run (correlations, regression, etc…)

Take home message
  Z-scores:
    A simple combination of Mean and SD
    Allow comparisons regardless of unit of measurement
  Always plot your data first!
    Descriptive statistics (like Mean/SD) are generally presented along with graphical representations of the distribution
    A histogram (for a single variable) and a scatterplot (for paired variables) are most commonly used
  Check for outliers!
    Is the value plausible?

Upcoming…
  Homework = Cronk 3.5 & all of Chapter 4
    Blackboard description upcoming
  We will examine relationships between variables next week
    Think about those scatterplots…do statistical relationships exist between those variables? How strong? In what direction?
  In-class activity 3…
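
For anyone who wants to check the frequency (f), percentage, and cumulative percentage columns from the math test example by hand, here is a minimal Python sketch. It is an illustration only, not the SPSS procedure used in class: the function name frequency_table is made up for this example, and the frequencies are read off the stem and leaf output for the math test (n = 33).

```python
from itertools import accumulate

# Score -> f, read off the math test stem and leaf output (n = 33)
freq = {24: 1, 25: 1, 28: 2, 29: 2, 30: 1, 31: 1,
        32: 3, 33: 1, 34: 6, 35: 3, 36: 4, 37: 8}


def frequency_table(freq):
    """Return rows of (score, f, percent, cumulative percent)."""
    n = sum(freq.values())                      # number of cases
    scores = sorted(freq)
    cum_f = accumulate(freq[s] for s in scores)  # running total of f
    return [
        (s, freq[s], 100 * freq[s] / n, 100 * c / n)
        for s, c in zip(scores, cum_f)
    ]


table = frequency_table(freq)
print(f"{'Score':>5} {'f':>3} {'%':>6} {'Cum %':>6}")
for score, f, pct, cum_pct in table:
    print(f"{score:>5} {f:>3} {pct:>6.1f} {cum_pct:>6.1f}")
```

The row for a score of 32 gives the cumulative percentage quoted in the slides: 11 of 33 students (33.3%) scored 32 or lower.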