Quantitative Methods Part 3: T-Statistics

Standard Deviation
Measures the spread of scores within the data set
◦ Population standard deviation is used when you are only interested in your own data
◦ Sample standard deviation is used when you want to generalise to the rest of the population

Z-Scores
A method for describing a specific location within a distribution
◦ Used to determine the precise location of an individual score
◦ Used to compare the relative positions of 2 or more scores
[Figure: a normally distributed (bell-shaped) curve]

Distribution of the Means
• Frequency distribution of 4 scores (2, 4, 6, 8)
[Figure: frequency distribution with one X at each of the scores 2, 4, 6 and 8, on an axis from 0 to 9]
• It is flat and not bell shaped
• Mean of population is (2+4+6+8)/4 = 5
• Take all possible samples of pairs of scores from (2, 4, 6, 8)
• Use random sampling with replacement: each sampled score is returned to the data set before the next draw
• Calculate the mean of every sample pair, for example:
◦ (2+2)/2 = 2
◦ (2+4)/2 = 3, (4+2)/2 = 3
◦ (2+6)/2 = 4, (6+2)/2 = 4, (4+4)/2 = 4
◦ (2+8)/2 = 5, (8+2)/2 = 5, (4+6)/2 = 5, (6+4)/2 = 5
◦ and likewise for the remaining pairs (16 in total)
[Figure: frequency distribution of the sample means, with 1 X at 2, 2 at 3, 3 at 4, 4 at 5, 3 at 6, 2 at 7 and 1 at 8, on an axis from 0 to 9]

Central Limit Theorem
“For any population with a mean μ and standard deviation σ, the distribution of sample means for sample size n will have a mean of μ and standard deviation of σ/√n and will approach a normal distribution as n gets very large.”
How big should the sample size be? A common rule of thumb is n = 30.
[Figure: frequency distribution of sample means, on an axis from 0 to 9]

Standard Error
σ/√n is used to calculate the standard error of the sample mean
Sample data = $X$
The mean of each sample = $\bar{X}$
Then the standard error becomes $\sigma_{\bar{X}} = \sigma/\sqrt{n}$
It measures how much, on average, an observed sample mean is expected to differ from the unmeasurable population mean μ. So to be more confident that our sample mean is a good measure of the population mean, the standard error should be small. One way to ensure this is to take large samples.

Example
$z = (\bar{X} - \mu)/\sigma_{\bar{X}}$
The population of SAT scores is normal with μ = 500, σ = 100. What is the chance that a sample of n = 25 students has a mean score of $\bar{X}$ = 540 or more?
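The question above can also be checked numerically. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist`; the function names `standard_error`, `z_score` and `prob_sample_mean_at_least` are hypothetical and not part of the original notes. It computes the standard error σ/√n, the z-score of the sample mean, and the upper-tail probability under the standard normal curve.

```python
from math import sqrt
from statistics import NormalDist


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)


def z_score(sample_mean: float, mu: float, sigma: float, n: int) -> float:
    """z-score of a sample mean within the distribution of sample means."""
    return (sample_mean - mu) / standard_error(sigma, n)


def prob_sample_mean_at_least(sample_mean: float, mu: float,
                              sigma: float, n: int) -> float:
    """Chance that a random sample of size n has a mean >= sample_mean.

    Assumes the distribution of sample means is normal.
    """
    z = z_score(sample_mean, mu, sigma, n)
    return 1.0 - NormalDist().cdf(z)


# SAT example from the notes: mu = 500, sigma = 100, n = 25, sample mean = 540
se = standard_error(100, 25)
z = z_score(540, 500, 100, 25)
p = prob_sample_mean_at_least(540, 500, 100, 25)
print(f"standard error = {se}, z = {z}, upper-tail probability = {p:.4f}")
```

The exact upper-tail probability for z = 2 is about 0.0228, which is commonly rounded to 2%.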
Since the distribution is normal, we can use the z-score
First calculate the standard error
◦ $\sigma_{\bar{X}} = 100/\sqrt{25} = 100/5 = 20$
Then the z-score
◦ $z = (540 - 500)/20 = 2$
The z-value is 2, so around 98% of sample means fall below this value and only about 2% fall above it. We conclude that the chance of getting a sample mean of 540 or more is about 2%, so we are about 98% confident that such a sample mean, if recorded in an experiment, is not a chance result from this population.

T-Statistics
So far we’ve looked at the mean and SD of populations, and our calculations have used population parameters
But how do we deduce something about the population beyond our sample?
We can use the t-statistic

Remember SD from last week?
$\sigma = \sqrt{SS/N}$
Great for a population of N, but not for a sample of n, where we use
$s = \sqrt{SS/(n-1)}$
Why n − 1? Because once the sample mean is fixed we can only freely choose n − 1 of the scores (degrees of freedom, df = n − 1)

Standard error: $\sigma_{\bar{X}} = \sigma/\sqrt{n}$
Z-score rewritten to show the standard error: $z = (\bar{X} - \mu)/(\sigma/\sqrt{n})$
To get the t-statistic, we substitute σ (SD of population) with s (SD of sample): $t = (\bar{X} - \mu)/(s/\sqrt{n})$
But what about μ?

Hypothesis Testing
Sample of computer game players, n = 16
Intervention = inclusion of rich graphical elements
Level has 2 rooms
◦ Room A = lots of visuals
◦ Room B = very bland
Put them in the level for 60 minutes
Record how long they spend in B

Results
Average time spent in B: $\bar{X}$ = 39 minutes
Observed “sum of squares” for the sample is SS = 540.
[Figure: the level's two rooms, A and B]

Stage 1: Formulation of Hypothesis
$H_0$: “null hypothesis”, that the visuals have no effect on the behaviour.
$H_1$: “alternate hypothesis”, that the visuals do have an effect on the players’ behaviour.
If visuals have no effect, how long on average should they be in room B?
The null hypothesis is crucial: under it we can infer that μ = 30, which stands in for the unknown population mean

Stage 2: Locate the critical region
We use the t-table to locate the critical region, which tells us whether to reject or retain the null hypothesis.
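Locating the critical region can be sketched as a table lookup followed by a comparison. The sketch below is illustrative only: `T_TABLE_TWO_TAILED` is a hypothetical name holding a three-row excerpt of the standard two-tailed t-table, and `critical_t` and `in_critical_region` are assumed helpers, none of them taken from the original notes.

```python
# Excerpt of a standard two-tailed t-table: {df: {alpha: critical value of t}}.
# Only a few rows are included here for illustration.
T_TABLE_TWO_TAILED = {
    14: {0.05: 2.145, 0.01: 2.977},
    15: {0.05: 2.131, 0.01: 2.947},
    16: {0.05: 2.120, 0.01: 2.921},
}


def critical_t(df: int, alpha: float = 0.05) -> float:
    """Look up the two-tailed critical value of t for the given df and alpha."""
    try:
        return T_TABLE_TWO_TAILED[df][alpha]
    except KeyError:
        raise ValueError(f"no table entry for df={df}, alpha={alpha}") from None


def in_critical_region(t: float, df: int, alpha: float = 0.05) -> bool:
    """True when t lies beyond +/- the critical value (two-tailed test)."""
    return abs(t) > critical_t(df, alpha)


# A sample of 16 has df = 16 - 1 = 15
print(critical_t(15))                 # critical value at alpha = 0.05
print(in_critical_region(1.8, 15))    # inside the +/- boundary: not critical
print(in_critical_region(-2.5, 15))   # beyond the lower boundary: critical
```

Using the absolute value of t reflects the two-tailed test: the critical region lies in both tails of the distribution.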
To get the critical value we need:
◦ Number of degrees of freedom: df = 16 − 1 = 15
◦ Level of significance (α = 0.05, i.e. 95% confidence)
◦ Locate in t-table (2 tails): critical values of t = +2.131 and t = −2.131

Stage 3: Calculate statistics
Calculate sample SD: $s = \sqrt{SS/(n-1)} = \sqrt{540/15} = 6$
Sample standard error: $s_{\bar{X}} = s/\sqrt{n} = 6/4 = 1.5$
t-statistic: $t = (\bar{X} - \mu)/s_{\bar{X}} = (39 - 30)/1.5 = 6$
The μ = 30 came from the null hypothesis: if the visuals had no effect, the players would spend 30 minutes in each of rooms A and B.

Stage 4: Decision
Can we reject $H_0$, that the visuals have no effect on the behaviour?
◦ t = 6, which is well beyond the critical value of 2.131, the boundary past which chance alone is an unlikely explanation. So yes, we can safely reject it and say the visuals do affect behaviour
Which room do they prefer?
◦ They spent on average 39 minutes in Room B, which is bland

Workshop
Work on Workshop 6 activities
Your journal (Homework)
Your Literature Review (Complete/update)

References
Dr C. Price’s notes 2010
Gravetter, F. and Wallnau, L. (2003) Statistics for the Behavioral Sciences, New York: West Publishing Company
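The worked hypothesis test in these notes can be reproduced in a few lines. This is a minimal sketch using only the figures from the example (n = 16, sample mean 39, SS = 540, hypothesised μ = 30, critical t = 2.131); the function name `one_sample_t` and the constant `T_CRITICAL` are hypothetical.

```python
from math import sqrt


def one_sample_t(sample_mean: float, mu: float, ss: float, n: int):
    """Return (sample SD, estimated standard error, t-statistic).

    ss is the sum of squared deviations from the sample mean.
    """
    df = n - 1                      # degrees of freedom
    s = sqrt(ss / df)               # sample standard deviation
    se = s / sqrt(n)                # estimated standard error of the mean
    t = (sample_mean - mu) / se     # t-statistic
    return s, se, t


# Figures from the game-player example; mu = 30 comes from the null hypothesis
s, se, t = one_sample_t(sample_mean=39, mu=30, ss=540, n=16)

T_CRITICAL = 2.131                  # two-tailed t-table value for df = 15
reject_null = abs(t) > T_CRITICAL

print(f"s = {s}, standard error = {se}, t = {t}, reject H0: {reject_null}")
```

Working from SS rather than raw scores mirrors the notes, which report only the sum of squares for the sample.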