CS130 – Software Tools, Fall 2010
Statistics and PASW Wrap-up

T-Test
A t-test tests the difference between the means of two samples.
If those samples are taken from the same population, you would anticipate that their means would be largely equal.
In words, this simple test checks whether the difference between the means observed in the two samples is consistent with the difference we would EXPECT between two samples from the same population.
"Consistent" here means within the standard error that you might expect between any two samples.
Source: geography.dur.ac.uk
Remember – the test assumes the data is taken from a normally distributed population.

T-Test
The key concept here is that PASW tells you whether the difference between the means of the two conditions or groups is large enough that it is unlikely to be due to chance alone.

Types of t-Tests
All t-tests have the comparison of means as their basis. This explains why the PASW menu item for all t-tests is called Compare Means.
There are several variants of t-tests, as you have already learned:
Independent
Paired or Dependent
One-sample
There are also several “assumption” tests that check whether the sample data is suitable for a parametric test such as a t-test, e.g. Levene’s Test for equality of variances, which we used for our independent t-test.

Speaking of P-Values
You were introduced to p-values, or Sig.
(2-tailed) as a method for determining when you can reject, or fail to reject, the null hypothesis.
However, before we wrap up the course, you should be aware of how general-purpose the p-value is.
P-values are compared against a threshold, sometimes called α (alpha). We have been using 0.05.

Speaking of P-Values
It is important to note that the design of the study determines alpha. We have been using 0.05 because it is common, but the value can be chosen based on what you are trying to do.
The smaller the p-value, the more evidence there is against the hypothesis being tested (in this case, our null hypothesis).
If you want an even stronger case for rejecting, you could insist on a threshold of 0.01, meaning a result this extreme would occur by chance less than 1% of the time if the null hypothesis were true.
However…
A p-value is only the probability of seeing a difference in the means at least this large by chance, assuming the null hypothesis is true.
It has nothing to do with, and knows nothing about, the nature of your hypothesis.

Speaking of P-Values
The Prosecutor’s Fallacy (Shaughnessy and Chance, 2005):
“The p-value is .001.
This means that the chance is only 1 in 1000 that the null hypothesis is true”
This reading is the fallacy. The probability is a statement about the data in the sample, not about the interpretation.
That sample data is then interpreted within the context of the hypothesis.
The hypothesis is a statement of how we might expect to see the data, based on the samples that we have collected.

A classic example
You take 1 random coin out of your bank.
You want to test the fairness of this one coin.
You flip it 10 times in a row and you get heads every time.
Null Hypothesis: The coin is fair, and it flips honestly and independently.
Observed data: In 10 tries, all are heads.
Now calculate the p-value:
P(10H in 10) = P(H) x P(H) x … x P(H) = (1/2)^10 = 1/1024 ≈ .001
This is strong evidence that the null hypothesis can be rejected.

Introduction to Analysis of Variance
And finally, a brief introduction to another major family of statistical tests. This time the test works by analyzing variance rather than by comparing two means directly.
This is ANOVA, or Analysis of Variance.
It's here that we answer the age-old question (at least a 7-week-course-old question):
What happens if I want to compare several independent variables to see how they interact with each other?

Introduction to Analysis of Variance
Like the t-test, ANOVA comes in many kinds: Factorial ANOVA, MANOVA, ANCOVA, and so on. For this intro, we will look only at what you need to know to decide whether it is worth investing time in understanding this method.
The simplest ANOVA, for example, might compare the effects of caffeine on learning by using a placebo (Decaf…wow, that is mean) and a specific level of caffeinated beverage.

Introduction to Analysis of Variance
How about adding more groups as independent variables, though? For example, the effect of caffeine and weight on learning, with the control being a placebo.
Now you start to leave the domain of a t-test.
Analysis of Variance is just what it says: a comparison of the total variance of the data, the variance of the data within each group, and the variance of the data across the groups (in our case caffeine, placebo, and weight as independent variables, and perhaps test score as the indicator of learning).

Introduction to Analysis of Variance
A few terms to remember…
ANOVA uses the F-ratio to compare the variances. A high F-ratio means that there is more “planned” variance (variance explained by the groups) than “unplanned” variance, or error.
And again, it has a Significance value, just like our t-tests.

Introduction to Analysis of Variance
One example to consider:
I have created a research question… I am interested to see if job satisfaction and gender have any influence on what type of car a person might buy.
My two independent factors, or variables, are job satisfaction and gender; my dependent variable is car category.
My null hypothesis is that there is no significant relationship between the type of car I buy and my relative job satisfaction and gender.

Introduction to Analysis of Variance
Of course, in PASW there is no menu pick named for this factor-based ANOVA; they call it the General Linear Model (GLM), Univariate. Of course!!
Or I could use a One-Way ANOVA, which is found under Compare Means, but that does not allow for two independent variables.
My data was given to me in the form of a .sav file.
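As a rough illustration of the F-ratio, here is a minimal sketch in Python (not PASW) for the simplest one-way case. The group names and test scores are made up for illustration, and f_ratio is a hypothetical helper, not a PASW function. It divides the between-group (“planned”) variance by the within-group (“unplanned”, or error) variance.

```python
from statistics import mean


def f_ratio(groups):
    """Return the one-way ANOVA F-ratio for a list of groups of numbers."""
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # total number of observations
    grand_mean = mean([x for g in groups for x in g])

    # "Planned" variance: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # "Unplanned" variance (error): spread of the scores inside each group.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)        # mean square between groups
    ms_within = ss_within / (n_total - k)    # mean square within groups
    return ms_between / ms_within


# Hypothetical test scores for three caffeine conditions (illustration only).
placebo = [70, 72, 68, 71, 69]
low_caffeine = [74, 76, 73, 75, 77]
high_caffeine = [80, 79, 82, 81, 78]

print(f_ratio([placebo, low_caffeine, high_caffeine]))
```

With these made-up scores the group means are far apart relative to the spread inside each group, so the F-ratio comes out large. PASW reports a Sig. value alongside the F-ratio; this sketch stops at the F-ratio because computing Sig. requires the F distribution.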
Introduction to Analysis of Variance
The results show that, in fact, there is a high degree of “similarity” in the variance between the groups of independent variables. I see this in the F-ratios.
I also see a very low Sig. for all of them for car category, which means it is very unlikely that the variance in the data is due to chance alone.
Therefore, I can reject my null hypothesis and say that there is a statistically significant relationship between gender, job satisfaction, and the type of car I might purchase.

Introduction to Analysis of Variance
One final note on the introduction:
This is meant to give you an additional pathway to investigate when you have a statistical project and the design of the experiment is perhaps slightly more complex.
You will need a fair amount of study to understand the details and proper use of ANOVA and its variants (no pun intended there).

CS130 Conclusion
So, this concludes our CS130 section for the Fall. You have covered a myriad of topics and tools:
Excel
Equation Editor
Word – Templates, Styles, Merge
PowerPoint – Presenting and Information Visualization (Tufte, Klass)
PASW and Statistics
All in the context of Academic Research and Design of Experiments.
You should feel armed and ready to take on interesting scholarly questions and present your important work.