Non-parametric equivalents to the t-test
Sam Cromie

Parametric assumptions
• Normal distribution
  – (Kolmogorov-Smirnov test)
• For between-groups designs, homogeneity of variance
  – (Levene’s test)
• Data must be of interval quality or above

Scales of measurement - NOIR
• Nominal
  – Label that is attached to someone or something
  – Can be arbitrary or have meaning, e.g., number on a football shirt as opposed to gender
  – Has no numerical meaning
• Ordinal
  – Organised in magnitude according to some variable, e.g., place in class, world ranking
  – Tells us nothing about the distance between adjacent scores
• Interval
  – Adjacent data points are separated by equivalent amounts, e.g., going from an IQ of 100 to 110 is the same increase as going from 110 to 120
• Ratio
  – Adjacent data points are separated by the same amount, but the scale also has an absolute zero, e.g., height or weight
  – When we talk about attractiveness on a scale of 0-5, 0 does not mean that the person has zero attractiveness; it means we cannot measure it
  – Psychological data is rarely of ratio quality

What type of scale?
• Education level
• County of Birth
• Reaction time
• IQ

Between groups design
• Non-parametric equivalent = Mann-Whitney U-test

Mann-Whitney U-test
• Based on ordinal data
• If differences exist, scores in one group should be larger than in the other

Group A scores: 3, 4, 4, 9
Group B scores: 7, 10, 10, 12

Rank ordering the data
• Scores must be combined and rank ordered to carry out the analysis, e.g.,

Original scores:   3    4     4     7    9    10    10    12
Ordinal scores:    1    2     3     4    5    6     7     8
Final ranks:       1    2.5   2.5   4    5    6.5   6.5   8

• Tied scores share the mean of their ordinal positions, e.g., the two 4s occupy positions 2 and 3, so each is ranked 2.5
• If there is a difference, scores for one group should be concentrated at one end (e.g., the end which represents a high score) while the scores for the second group are concentrated at the other end

Null hypothesis
• H0: There is no tendency for ranks in one treatment condition to be systematically higher or lower than the ranks in the other treatment condition.
• Could also be thought of as
  – Mean rank for individuals in the first treatment is the same as the mean rank for individuals in the second treatment
• Less accurate since average rank is not calculated

Calculation
• For each data point, identify how many data points in the other group have a larger rank
• Sum these for each group; the sums are referred to as U scores
• As the difference between the two groups increases, so does the difference between the two U values

Calculating U scores
(UA, UB = no. of data points in the alternative group with a larger rank)

Rank    Score   Group   UA    UB
1       3       A       4
2.5     4       A       4
2.5     4       A       4
4       7       B             1
5       9       A       3
6.5     10      B             0
6.5     10      B             0
8       12      B             0
U score for each group  15    1

Determining significance
• Mann-Whitney U value = the smaller of the two U values calculated; here it is 1
• With the specified n for each group, look up the critical value of U; your result must be equal to or lower than it to be considered significant

Mann-Whitney U table (2 groups of 4, two-tailed)
• Note extremes…
  – At the extreme there should be no overlap and therefore the Mann-Whitney U value should be 0
  – As the two groups become more alike, the ranks begin to intermix and U becomes larger

Reporting the result
• Critical U = 0
• Critical value is dependent on n for each group
• U = 1 (n = 4, 4), p > .05, two-tailed

Formula for calculation
• The previous process can be tedious, so using a formula is more straightforward

$$U_A = n_A n_B + \frac{n_A (n_A + 1)}{2} - \sum R_A$$

where $\sum R_A$ is the sum of ranks for Group A

Repeated measures - Wilcoxon T
• H0: In the general population there is no tendency for the signs of the difference scores to be systematically positive or negative. There is no difference between the means.
• H1: The difference scores are systematically positive or negative. There is a difference between the means.
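As a check on the Mann-Whitney worked example (Group A: 3, 4, 4, 9; Group B: 7, 10, 10, 12), the following is a minimal Python sketch of both routes to U: counting larger scores in the other group, and the rank-sum formula. The helper names `average_ranks` and `u_by_counting` are hypothetical, and counting a cross-group tie as 0.5 is an assumption that the slides do not state (no such ties occur in this data).

```python
# Hypothetical helper names; plain Python, no external libraries.

def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their ordinal positions."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Ordinal positions i+1 .. j are tied, so each gets their mean.
        rank_of[ordered[i]] = (i + 1 + j) / 2
        i = j
    return [rank_of[v] for v in values]


def u_by_counting(group, other):
    """For each score in `group`, count the scores in `other` that are larger.

    Assumption: a tie across groups counts as 0.5 (none occur in this data).
    """
    return sum((o > g) + 0.5 * (o == g) for g in group for o in other)


group_a = [3, 4, 4, 9]
group_b = [7, 10, 10, 12]

# Route 1: counting, as in the "Calculating U scores" table.
u_a = u_by_counting(group_a, group_b)
u_b = u_by_counting(group_b, group_a)

# Route 2: rank-sum formula, nA*nB + nA(nA+1)/2 - (sum of ranks in Group A).
ranks = average_ranks(group_a + group_b)
n_a, n_b = len(group_a), len(group_b)
sum_ranks_a = sum(ranks[:n_a])
u_a_formula = n_a * n_b + n_a * (n_a + 1) / 2 - sum_ranks_a

# The Mann-Whitney U value is the smaller of the two U scores.
u = min(u_a, u_b)
print("ranks:", ranks)
print("UA:", u_a, "UB:", u_b, "UA (formula):", u_a_formula, "U:", u)
```

Both routes give UA = 15, and the smaller U score is UB = 1, which is the value compared against the critical U.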
Table showing calculation

            Treatments
Subject     1      2      diff    rank
1           18     43     +25     6
2           9      14     +5      2
3           21     20     -1      1
4           30     48     +18     5
5           14     21     +7      3
6           12     4      -8      4

ΣR+ = 16
ΣR- = 5

• Calculate difference score
• Assign rank independent of sign
• Add ranks for each sign separately
• T = lowest rank total
• T = 5

Interpreting results
• Look up the critical value of T
• Your result must be equal to or lower than it in order to be considered significant
• With n = 6, critical T is 0 and therefore the result here is not significant
• As either sum of ranks approaches 0, the presence of that direction of change is limited
• If the sum of negative ranks is small, there are very few decreases, indicating that most scores increased

Non-parametric Pros and Cons
• Advantages of non-parametric tests
  – Shape of the underlying distribution is irrelevant; it does not have to be normal
  – Large outliers have little effect, since only the ranks are used
  – Can be used with data of ordinal quality
• Disadvantages
  – Less power: less likely to reject H0
  – Reduced analytical sophistication: with non-parametric tests there are not as many options available for analysing your data
  – Inappropriate to use with lots of tied ranks
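The Wilcoxon T calculation from the table above (difference scores, ranks independent of sign, rank totals per sign, T = lowest total) can be sketched in plain Python. This is an illustrative sketch only: `average_ranks` is a hypothetical helper, and dropping zero differences before ranking is an assumption not covered in the slides (no zero differences occur in this data).

```python
# Hypothetical helper name; plain Python, no external libraries.

def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their ordinal positions."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + 1 + j) / 2
        i = j
    return [rank_of[v] for v in values]


treatment_1 = [18, 9, 21, 30, 14, 12]
treatment_2 = [43, 14, 20, 48, 21, 4]

# Step 1: difference score for each subject (treatment 2 minus treatment 1).
diffs = [b - a for a, b in zip(treatment_1, treatment_2)]

# Assumption: zero differences carry no sign and are dropped before ranking.
diffs = [d for d in diffs if d != 0]

# Step 2: rank the differences by absolute size, ignoring sign.
ranks = average_ranks([abs(d) for d in diffs])

# Step 3: add the ranks for each sign separately.
sum_positive = sum(r for d, r in zip(diffs, ranks) if d > 0)
sum_negative = sum(r for d, r in zip(diffs, ranks) if d < 0)

# Step 4: T is the lowest rank total.
t_value = min(sum_positive, sum_negative)
print("diffs:", diffs)
print("ranks:", ranks)
print("sum R+:", sum_positive, "sum R-:", sum_negative, "T:", t_value)
```

The rank totals come out as 16 (positive) and 5 (negative), so T = 5, which exceeds the critical T of 0 for n = 6.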