Statistics Wrap-up

CS130 – Software Tools
Fall 2010
Statistics and PASW Wrap-up
T-Test
Tests the difference between the means of two samples
If both samples are taken from the same population, you would expect their means to be largely equal
In words, this simple test checks whether the difference between the means observed in the two samples matches what we would EXPECT from two samples drawn from the same population
"Matches" here means within the standard error that you might expect between any two such samples




Source: geography.dur.ac.uk
Remember: the t-test assumes the data are taken from a normally distributed population
T-Test
The key concept here is that PASW tells you whether the difference between the means of the two conditions or groups is large enough that it is unlikely to be due to chance alone
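To make the idea concrete, here is a minimal sketch of the statistic behind an independent t-test, written with only the Python standard library. It is not how PASW computes its output internally; the function name `independent_t` and the two small samples are made up for illustration, and the pooled-variance form assumes the two groups have roughly equal variances.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(sample_a, sample_b):
    """Pooled-variance t statistic and degrees of freedom for two independent samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    # Standard error of the difference between the two means.
    std_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(sample_a) - mean(sample_b)) / std_error
    return t, df


# Hypothetical scores for two groups (illustration only).
group_a = [1, 2, 3, 4, 5]
group_b = [3, 4, 5, 6, 7]
t_stat, df = independent_t(group_a, group_b)
print(t_stat, df)
```

For these made-up numbers the difference in means is 2 and the standard error is about 1, so t is about -2 with 8 degrees of freedom. The two-tailed critical value at 0.05 for 8 degrees of freedom is 2.306, so this difference would not count as significant.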
Types of t-Tests
All t-tests are based on the comparison of means
This explains why the PASW menu item for all t-tests is called Compare Means
There are several variants of t-tests, as you have already learned:
Independent
Paired or Dependent
One-sample
There are also several “assumption” tests that check whether the sample data are suitable for a parametric test such as a t-test, e.g. Levene’s Test, which evaluates the equality of variances; we used it for our independent t-test
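As a rough illustration of what an equal-variance check measures, the sketch below computes Levene's W statistic by hand using absolute deviations from each group mean. The function name `levene_w` and the sample values are hypothetical, and the sketch stops at the statistic: turning W into a Sig. value requires the F distribution, which PASW handles for you.

```python
from statistics import mean


def levene_w(groups):
    """Levene's W statistic using absolute deviations from each group mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Z values: how far each observation sits from its own group mean.
    z_groups = []
    for g in groups:
        g_mean = mean(g)
        z_groups.append([abs(x - g_mean) for x in g])
    z_group_means = [mean(z) for z in z_groups]
    z_grand_mean = mean([z for zg in z_groups for z in zg])
    # Spread of the group-level deviations around the overall deviation.
    between = sum(len(zg) * (zm - z_grand_mean) ** 2
                  for zg, zm in zip(z_groups, z_group_means))
    # Spread of individual deviations around their own group's mean deviation.
    within = sum((z - zm) ** 2
                 for zg, zm in zip(z_groups, z_group_means) for z in zg)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical samples: the second group is clearly more spread out.
narrow = [1, 2, 3, 4, 5]
wide = [1, 3, 5, 7, 9]
w_stat = levene_w([narrow, wide])
print(w_stat)
```

A larger W means the groups differ more in spread, which is evidence against the equal-variance assumption.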
Speaking of P-Values


You were introduced to p-values, or Sig. (2-tailed), as a method for deciding when you can reject or fail to reject the null hypothesis
However, before we wrap up the course, you should be aware of their general-purpose nature
P-values are compared against a threshold, sometimes called α (alpha)
We have been using 0.05
Speaking of P-Values



It is important to note that the design of the study determines the alpha; we have been using 0.05 because it is common, but you can choose a value based on what you are trying to do
The smaller the p-value, the more evidence there is against the hypothesis (in this case our null hypothesis)
If you want an even stronger case for rejecting, you could insist on a threshold of 0.01, meaning a result this extreme would occur by chance less than 1% of the time if the null hypothesis were true
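The role of the threshold can be sketched in a few lines. The helper `decide` and the p-value of 0.03 are hypothetical; the point is only that the same result can lead to different decisions depending on the alpha chosen when the study was designed.

```python
def decide(p_value, alpha=0.05):
    """Return the decision for a test given its p-value and the chosen alpha."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


p_value = 0.03  # hypothetical Sig. (2-tailed) value from a test
decision_at_05 = decide(p_value, alpha=0.05)
decision_at_01 = decide(p_value, alpha=0.01)
print(decision_at_05)
print(decision_at_01)
```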



However…
A p-value only describes the probability of seeing a difference in means at least this large by chance, assuming the null hypothesis is true
It has nothing to do with, and knows nothing about, the nature of your hypothesis
Speaking of P-Values

The Prosecutor’s Fallacy – (Shaughnessy and Chance – 2005)
“The p-value is .001. This means that the chance is only 1 in 1000
that the null hypothesis is true”



The probability comes from the data in the sample, not from the interpretation
The data are then interpreted within the context of the hypothesis
The hypothesis is a statement of how we might expect the data to look, based on the samples that we have collected
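One way to see why the quoted statement is a fallacy is to compute both probabilities side by side. The sketch below takes a coin that lands heads 10 times in 10 flips and applies Bayes' rule. The prior (999 of every 1000 coins are fair) and the single alternative (a two-headed coin) are assumptions made up purely for illustration; they do not come from the slides.

```python
from fractions import Fraction

# ASSUMPTIONS (illustrative only): 999 of every 1000 coins are fair,
# and the only alternative to a fair coin is a two-headed coin.
prior_fair = Fraction(999, 1000)
prior_two_headed = 1 - prior_fair

# Probability of the observed data (10 heads in 10 flips) under each hypothesis.
p_data_given_fair = Fraction(1, 2) ** 10   # what a p-value describes
p_data_given_two_headed = Fraction(1)      # a two-headed coin always shows heads

# Bayes' rule: probability that the coin is fair, given the data.
posterior_fair = (prior_fair * p_data_given_fair) / (
    prior_fair * p_data_given_fair + prior_two_headed * p_data_given_two_headed
)

print(float(p_data_given_fair))  # probability of the data if the null is true
print(float(posterior_fair))     # probability the null is true given the data
```

Under these assumed numbers the data have a probability of about 0.001 if the coin is fair, yet the probability that the coin is fair is still roughly 0.49. The two quantities answer different questions, which is exactly what the fallacy confuses.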
A classic example






You take 1 random coin out of your bank
You want to test the fairness of this one coin
You flip it 10 times in a row and you get heads every time
Null Hypothesis: The coin is fair and it flips honestly and independently
Observed data: In 10 tries all are heads
Now calculate the p-value
P(10H in 10) = P(H) x P(H) x … x P(H) = (1/2)^10 ≈ .001
This is strong evidence that the null hypothesis can be rejected
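The arithmetic can be checked exactly with Python's `fractions` module. This is only a sketch of the calculation above; the two-sided line is an addition for comparison with the Sig. (2-tailed) values used elsewhere in the course, on the assumption that 10 tails would count as equally extreme.

```python
from fractions import Fraction

flips = 10
p_heads = Fraction(1, 2)  # a fair coin under the null hypothesis

# Probability of 10 heads in 10 independent flips of a fair coin.
p_all_heads = p_heads ** flips

# Two-sided version: all heads OR all tails are treated as equally extreme.
p_two_sided = 2 * p_all_heads

print(p_all_heads, float(p_all_heads))
print(p_two_sided, float(p_two_sided))
```

The exact value is 1/1024, which rounds to the .001 quoted above; the two-sided value is 1/512, or about .002.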
Introduction to Analysis of Variance
And finally, a brief introduction to another major family of statistical tests that compares an attribute of a variable; this time we will look at the variance, not the mean
This is ANOVA, or Analysis of Variance



It's here that we answer the age-old question (at least a 7-week-course-old question)

What happens if I want to compare several independent variables to see how they interact with each other?
Introduction to Analysis of Variance



As with t-tests, there are many kinds of ANOVA methods: Factorial ANOVA, MANOVA, ANCOVA, and so on.
For this intro, we will just look at what you need to know to decide whether you should consider investing time in understanding this method
The simplest ANOVA, for example, might compare the effects of caffeine on learning by using a placebo (Decaf…wow, that is mean) and a specific level of caffeinated beverage
Introduction to Analysis of Variance


How about adding more groups as independent variables? For example, the effect of caffeine and weight on learning, with a placebo as the control. Now you start to leave the domain of a t-test
Analysis of Variance is just what it says: a comparison of the total variance of the data, the variance of the data within each group, and the variance of the data across the groups (in our case caffeine, placebo, and weight as independent variables, and maybe test score as the indicator of learning)
Introduction to Analysis of Variance



A few terms to remember… ANOVA uses the F-ratio to determine the equality of the variances.
A high F-ratio means that there is more “planned” variance than “unplanned” variance or error
And again, it has a Significance value just like our t-tests
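The F-ratio for the simplest (one-way) case can be sketched by hand: the variance between group means ("planned") divided by the variance within groups ("unplanned" or error). The function name `f_ratio` and the test scores below are hypothetical, and the sketch does not compute the Significance value, which needs the F distribution that PASW provides.

```python
from statistics import mean


def f_ratio(groups):
    """One-way ANOVA F-ratio: between-group variance over within-group variance."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)
    n_total = len(all_values)
    # "Planned" variation: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # "Unplanned" variation (error): spread of scores inside each group.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical test scores for three groups (illustration only).
placebo = [1, 2, 3]
low_caffeine = [2, 3, 4]
high_caffeine = [6, 7, 8]
f_value = f_ratio([placebo, low_caffeine, high_caffeine])
print(f_value)
```

For these made-up scores the between-group mean square is 21 and the within-group mean square is 1, giving an F-ratio of 21: far more variation between the groups than inside them.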
Introduction to Analysis of Variance




One example to consider
I have created a research question… I am interested to see if job satisfaction and gender have any influence on what type of car a person might buy
My two independent factors or variables are job satisfaction and gender; my dependent variable is car category
My null hypothesis is that there is no significant relationship between the type of car I buy and my relative job satisfaction and gender
Introduction to Analysis of Variance

Of course in PASW, there is no menu pick for this factor-based ANOVA; they call it the General Linear Model (GLM) with Univariate. Of course!!

Or I could use a One-Way ANOVA, which is found under Compare Means, but that does not allow for two independent variables

My data was given to me in the form of a .sav file
Introduction to Analysis of Variance




The results show that, in fact, there is a high degree of “similarity” in the variance between the groups of independent variables
I see this by the F-ratios
I also see a very low Sig. value for each factor against car category, which means there is very little probability that the variance in the data is due to chance alone
Therefore, I can reject my null hypothesis and say that there is a statistically significant relationship between my gender, job satisfaction and the type of car I might purchase.
Introduction to Analysis of Variance



One final note on the introduction
This is meant to give you an additional pathway to investigate when you have a statistical project and the design of the experiment is slightly more complex
You will need a fair amount of study to understand the details and proper use of ANOVA and its variants (no pun intended there)
CS130 Conclusion


So, this concludes our CS130 section for the Fall.
You have covered a myriad of topics and tools:
Excel
Equation Editor
Word – Templates, Styles, Merge
PowerPoint – Presenting and Information Visualization (Tufte, Klass)
PASW and Statistics
All in the context of Academic Research and Design of Experiments
You should feel armed and ready to take on interesting scholarly questions and present your important work