Descriptive Statistics

HCI 510: HCI Methods I
• Statistics
HCI 510: HCI Methods I
• Descriptive Statistics
• Inferential Statistics
• Significance
• T-Test
Descriptive Statistics
Descriptive vs. Inferential statistics
Descriptive Statistics
Descriptive statistics are used to describe the basic
features of the data in a study.
They provide simple summaries about the sample and
the measures.
Together with simple graphics analysis, they form the
basis of virtually every quantitative analysis of data.
Descriptive Statistics
Descriptive statistics are typically distinguished from
inferential statistics.
With descriptive statistics you are simply describing
what is or what the data shows.
With inferential statistics, you are trying to reach
conclusions that extend beyond the immediate data
alone.
Descriptive Statistics
For instance, we use inferential statistics to try to infer
from the sample data what the population might think.
Or, we use inferential statistics to make judgments of the
probability that an observed difference between groups
is a dependable one or one that might have happened by
chance in this study.
Thus, we use inferential statistics to make inferences
from our data to more general conditions; we use
descriptive statistics simply to describe what's going on
in our data.
Descriptive Statistics
Descriptive Statistics are used to present quantitative
descriptions in a manageable form.
In a research study we may have lots of measures.
Or we may measure a large number of people on any
measure.
Descriptive statistics help us to simplify large amounts of
data in a sensible way.
Each descriptive statistic reduces lots of data into a
simpler summary.
Descriptive Statistics
For instance, consider a simple number used to
summarize how well a batter is performing in
baseball, the batting average.
This single number is simply the number of hits
divided by the number of times at bat.
A batter who is hitting .333 is getting a hit one
time in every three at bats.
One batting .250 is hitting one time in four.
The single number describes a large number of
discrete events.
Descriptive Statistics
Or, consider the scourge of many students, the
Grade Point Average (GPA).
This single number describes the general
performance of a student across a potentially
wide range of course experiences.
Descriptive Statistics
Every time you try to describe a large set of
observations with a single indicator you run the
risk of distorting the original data or losing detail.
The batting average doesn't tell you whether the
batter is hitting home runs or singles. It doesn't
tell whether she's been in a slump or on a streak.
The GPA doesn't tell you whether the student was
in difficult courses or easy ones, or whether they
were courses in their major field or in other
disciplines.
Even given these limitations, descriptive statistics
provide a powerful summary that may enable
comparisons across people or other units.
Descriptive Statistics
Univariate Analysis
Univariate analysis involves the examination across cases of
one variable at a time. There are three major characteristics
of a single variable that we tend to look at:
• the distribution
• the central tendency
• the dispersion
In most situations, we would describe all three of these
characteristics for each of the variables in our study.
Descriptive Statistics
The Distribution.
The distribution is a summary of the frequency of individual
values or ranges of values for a variable.
The simplest distribution would list every value of a variable
and the number of persons who had each value.
Descriptive Statistics
The Distribution.
For instance, a typical way to describe the
distribution of college students is by year
in college, listing the number or percent
of students at each of the four years.
Or, we describe gender by listing the number or percent of
males and females.
In these cases, the variable has few enough values that we
can list each one and summarize how many sample cases had
the value.
Descriptive Statistics
The Distribution.
But what do we do for a variable like income or GPA?
With these variables there can be a large number of possible
values, with relatively few people having each one.
In this case, we group the raw scores into categories
according to ranges of values.
For instance, we might look at GPA according to the letter
grade ranges. Or, we might group income into four or five
ranges of income values.
Descriptive Statistics
The Distribution.
One of the most common ways to describe a single variable
is with a frequency distribution.
Depending on the particular variable, all of the data values
may be represented, or you may group the values into
categories first (e.g., with age, price, or temperature
variables it would usually not be sensible to determine the
frequencies for each value; rather, the values are grouped
into ranges and the frequencies determined).
Descriptive Statistics
The Distribution.
Frequency distributions can be depicted in two ways, as a
table or as a graph. The table shows an age frequency
distribution with five categories of age ranges defined.
Descriptive Statistics
The Distribution.
The same frequency distribution can be depicted in a graph
as shown in the figure. This type of graph is often referred to
as a histogram or bar chart.
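As a concrete sketch, here is how such a grouped frequency distribution could be built in Python (the ages here are hypothetical; the slide's own table is not reproduced in this text version):

from collections import Counter

ages = [19, 22, 25, 31, 34, 38, 41, 45, 52, 57, 23, 29, 36, 44, 48]  # hypothetical

def age_range(age):
    low = (age // 10) * 10
    return f"{low}-{low + 9}"          # e.g. 30-39

freq = Counter(age_range(a) for a in ages)
for rng in sorted(freq):
    print(rng, freq[rng])              # each line is one bar of the histogram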
Descriptive Statistics
Central Tendency.
The central tendency of a distribution is an estimate of the
"center" of a distribution of values.
There are three major types of estimates of central tendency:
• Mean
• Median
• Mode
Descriptive Statistics
The Mean or average is probably the most commonly used
method of describing central tendency.
To compute the mean all you do is add up all the values and
divide by the number of values.
For example, the mean or average quiz score is determined
by summing all the scores and dividing by the number of
students taking the quiz.
Descriptive Statistics
For example, consider the test score values:
15, 20, 21, 20, 36, 15, 25, 15
The sum of these 8 values is 167,
so the mean is 167/8 = 20.875.
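This arithmetic is easy to check in Python:

scores = [15, 20, 21, 20, 36, 15, 25, 15]
print(sum(scores) / len(scores))   # 20.875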
Descriptive Statistics
The Median is the score found at the exact middle of the set
of values.
One way to compute the median is to list all scores in
numerical order, and then locate the score in the center of
the sample.
Descriptive Statistics
The Median
For example, if there are 500 scores in the list, score #250
would be the median. If we order 8 scores, we would get:
15,15,15,20,20,21,25,36
There are 8 scores, and scores #4 and #5 represent the
halfway point. Since both of these scores are 20, the median
is 20.
If the two middle scores had different values, you would have
to interpolate to determine the median.
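Python's statistics module performs the ordering, and the interpolation for an even number of scores, automatically:

import statistics

scores = [15, 20, 21, 20, 36, 15, 25, 15]
print(sorted(scores))             # [15, 15, 15, 20, 20, 21, 25, 36]
print(statistics.median(scores))  # 20.0 (the average of the two middle scores)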
Descriptive Statistics
The mode is the most frequently occurring value in the set of
scores.
To determine the mode, you might again order the scores as
shown above, and then count each one. The most frequently
occurring value is the mode.
In our example (15, 20, 21, 20, 36, 15, 25, 15), the value 15
occurs three times and is the mode.
In some distributions there is more than one modal value.
For instance, in a bimodal distribution there are two values
that occur most frequently.
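The same module reports the mode, and (in Python 3.8+) every modal value of a bimodal or multimodal set:

import statistics

scores = [15, 20, 21, 20, 36, 15, 25, 15]
print(statistics.mode(scores))       # 15
print(statistics.multimode(scores))  # [15]; a bimodal set would list two values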
Descriptive Statistics
Notice that for the same set of 8 scores we got three different
values (20.875, 20, and 15) for the mean, median, and mode
respectively.
If the distribution is truly normal (i.e., bell-shaped), the mean,
median and mode are all equal to each other.
Descriptive Statistics
Dispersion.
Dispersion refers to the spread of the values around the
central tendency.
There are two common measures of dispersion,
the range and the standard deviation.
Descriptive Statistics
Dispersion.
The range is simply the highest value minus the lowest
value. In our example distribution, the high value is 36 and
the low is 15, so the range is 36 - 15 = 21.
Descriptive Statistics
The Standard Deviation is a more accurate and detailed
estimate of dispersion because an outlier can greatly
exaggerate the range.
Look at the set of scores 15, 20, 21, 20, 36, 15, 25, 15:
the single outlier value of 36 stands apart from the rest of the values.
The Standard Deviation shows the relation that set of
scores has to the mean of the sample.
Descriptive Statistics
To compute the standard deviation, we first find the distance
between each value and the mean.
We know from above that the mean is 20.875. So, the
differences from the mean are:
15 - 20.875 = -5.875
20 - 20.875 = -0.875
21 - 20.875 = +0.125
20 - 20.875 = -0.875
36 - 20.875 = +15.125
15 - 20.875 = -5.875
25 - 20.875 = +4.125
15 - 20.875 = -5.875
Descriptive Statistics
Notice that values that are below the mean have negative
discrepancies and values above it have positive ones.
Next, we square each discrepancy:
-5.875 * -5.875 = 34.515625
-0.875 * -0.875 = 0.765625
+0.125 * +0.125 = 0.015625
-0.875 * -0.875 = 0.765625
15.125 * 15.125 = 228.765625
-5.875 * -5.875 = 34.515625
+4.125 * +4.125 = 17.015625
-5.875 * -5.875 = 34.515625
Descriptive Statistics
Now, we take these "squares" and sum them to get the Sum
of Squares (SS) value. Here, the sum is 350.875.
Next, we divide this sum by the number of scores minus 1.
Here, the result is 350.875 / 7 = 50.125.
This value is known as the variance.
To get the standard deviation, we take the square root of the
variance (remember that we squared the deviations earlier).
This would be SQRT(50.125) = 7.079901129253.
Descriptive Statistics
Although this computation may seem convoluted, it's actually
quite simple.
To see this, consider the formula for the standard deviation:
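The formula itself is not reproduced in this text version; in standard notation it is

s = \sqrt{ \frac{\sum (x - \bar{x})^2}{n - 1} }

where \bar{x} is the mean and n is the number of scores.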
Descriptive Statistics
In the top part of the ratio, the numerator,
we see that each score has the mean
subtracted from it, the difference is
squared, and the squares are summed.
In the bottom part, we take the number of scores minus 1.
The ratio is the variance and the square root is the standard
deviation. We can describe the standard deviation as:
The square root of the sum of the squared deviations from
the mean divided by the number of scores minus one
Descriptive Statistics
Although we can calculate these univariate statistics by
hand, it gets quite tedious when you have more than a few
values and variables.
Every statistics program and calculator is capable of
calculating them easily for you.
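For example, Python's standard library reproduces the entire hand computation above:

import statistics

scores = [15, 20, 21, 20, 36, 15, 25, 15]
print(statistics.mean(scores))      # 20.875
print(statistics.variance(scores))  # 50.125 (sample variance: divides by n - 1)
print(statistics.stdev(scores))     # 7.0799... (square root of the variance)
print(max(scores) - min(scores))    # 21 (the range)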
Descriptive Statistics
The standard deviation allows us to reach some conclusions
about specific scores in our distribution.
Assuming that the distribution of scores is normal or
bell-shaped (or close to it!), the following conclusions can be
reached:
• approximately 68% of the scores in the sample fall within one standard deviation of the mean
• approximately 95% of the scores in the sample fall within two standard deviations of the mean
• approximately 99.7% of the scores in the sample fall within three standard deviations of the mean
Descriptive Statistics
For instance, since the mean in our example is 20.875 and
the standard deviation is 7.0799,
we can from the above statement estimate that
approximately 95% of the scores will fall in the range of
20.875-(2*7.0799) to 20.875+(2*7.0799)
or between 6.7152 and 35.0348.
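A quick check of that arithmetic in Python:

mean, sd = 20.875, 7.0799
print(round(mean - 2 * sd, 4), round(mean + 2 * sd, 4))  # 6.7152 35.0348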
This kind of information is a critical stepping stone to
enabling us to compare the performance of an individual on
one variable with their performance on another, even when
the variables are measured on entirely different scales.
Descriptive Statistics
Worksheet 01
While performing a usability test on a new computer interface,
the following data was collected:
1. Create a frequency distribution for each variable.
2. Calculate the mean, median, and mode for each data set.
3. Calculate the range, variance, and standard deviation for each data set.
4. What conclusions, if any, can be drawn from these statistics?
HCI 510: HCI Methods I
• Descriptive Statistics
• Inferential Statistics
• Significance
• T-Test
Inferential Statistics
Inferential Statistics
With inferential statistics, you are trying to reach conclusions
that extend beyond the immediate data alone.
For instance, we use inferential statistics to try to infer from
the sample data what the population might think.
Or, we use inferential statistics to make judgments of the
probability that an observed difference between groups is a
dependable one or one that might have happened by chance
in this study.
Inferential Statistics
Inferential Statistics
Thus,
we use inferential statistics to make inferences from our data
to more general conditions;
we use descriptive statistics simply to describe what's going
on in our data.
Inferential Statistics
Inferential Statistics
Inferential statistics are useful in experimental and
quasi-experimental research designs or in program outcome
evaluation.
Inferential Statistics
Inferential Statistics
Perhaps the simplest inferential test is used when you
want to compare the average performance of two groups on
a single measure to see if there is a difference.
You might want to know whether eighth-grade boys and girls
differ in math test scores or whether a program group differs
on the outcome measure from a control group.
Whenever you wish to compare the average performance
between two groups you should consider the t-test for
differences between groups.
Inferential Statistics
Inferential Statistics
Most of the major inferential statistics come from a general
family of statistical models known as the General Linear
Model.
This includes the t-test, Analysis of Variance (ANOVA),
Analysis of Covariance (ANCOVA), regression analysis, and
many of the multivariate methods like factor analysis,
multidimensional scaling, cluster analysis, discriminant
function analysis, and so on.
Significance
Significance
"Significance level" is a misleading term that many people do
not fully understand. In normal English, "significant" means
important, while in Statistics "significant" means probably
true (not due to chance).
A research finding may be true without being important.
When statisticians say a result is "highly significant" they
mean it is very probably true.
They do not (necessarily) mean it is highly important.
Significance
Take a look at the table below.
Significance
The chi-squares at the bottom of the table appear in two rows
of numbers. The top-row numbers, 0.07 and 24.4, are the
chi-square statistics themselves. The second row contains the
values .795 and .001; these are the significance levels.
Significance
Significance levels show you how likely a result is due to
chance.
The most common level, used to mean something is good
enough to be believed, is .95.
This means that the finding has a 95% chance of being true.
However, this value is also used in a misleading way.
Significance
No statistical package will show you "95%" or ".95" to
indicate this level.
Instead it will show you ".05," meaning that the finding has a
five percent (.05) chance of not being true, which is the
converse of a 95% chance of being true.
To find the significance level, subtract the number shown
from one.
For example, a value of ".01" means that there is a 99% (1 - .01 = .99) chance of it being true.
Significance
In this table, there is probably no difference in purchases of
gasoline X by people in the city center and the suburbs,
because the probability is .795 (i.e., there is only a 20.5%
chance that the difference is true).
Significance
In contrast the high significance level for type of vehicle (.001
or 99.9%) indicates there is almost certainly a true difference
in purchases of Brand X by owners of different vehicles in the
population from which the sample was drawn.
Significance
In all cases of calculating statistical significance, the p value
tells you how likely it is that a result is not true.
If a chi square test shows probability of .04, it means that
there is a 96% (1-.04=.96) chance that the answers given by
different groups really are different.
If a t-test reports a probability of .07, it means that there is a
93% chance that the two means being compared would be
truly different if you looked at the entire population.
Significance
Significance is a statistical term that tells how sure you are
that a difference or relationship exists.
To say that a significant difference or relationship exists only
tells half the story. We might be very sure that a relationship
exists, but is it a strong, moderate, or weak relationship?
After finding a significant relationship, it is important to
evaluate its strength.
Significant relationships can be strong or weak. Significant
differences can be large or small. It just depends on your
sample size.
Significance
For example, suppose we give 1,000 people an IQ test, and
we ask if there is a significant difference between male and
female scores.
The mean score for males is 98 and the mean score for
females is 100.
We use an independent groups t-test and find that the
difference is significant at the .001 level.
The big question is, "So what?" The difference between 98
and 100 on an IQ test is a very small difference... so small, in
fact, that it's not even important.
Significance
Then why did the t-statistic come out significant?
Because there was a large sample size.
When you have a large sample size, very small differences
will be detected as significant.
This means that you are very sure that the difference is real
(i.e., it didn't happen by fluke). It doesn't mean that the
difference is large or important.
If we had only given the IQ test to 25 people instead of
1,000, the two-point difference between males and females
would not have been significant.
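A sketch of this effect, assuming SciPy is available and an IQ standard deviation of 15 (scipy.stats.ttest_ind_from_stats works directly from summary statistics; the exact p-values depend on the assumed sd):

from scipy import stats

# the same two-point difference (98 vs. 100), two different sample sizes
for n in (1000, 25):
    t, p = stats.ttest_ind_from_stats(98, 15, n, 100, 15, n)
    print(n, round(p, 4))
# with n = 1000 per group the difference is significant (p is about .003);
# with n = 25 per group it is nowhere near significant (p is about .64)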
Significance
People sometimes think that the 95% level is sacred.
If a test shows a .06 probability, it means that it has a 94%
chance of being true. You can't be quite as sure about it as if
it had a 95% chance of being true, but the odds still are
that it is true.
The 95% level comes from academic publications, where a
theory usually has to have at least a 95% chance of being
true to be considered worth telling people about.
In the ‘real’ world if something has a 90% chance of being
true (probability =.1), it can't be considered proven, but it is
probably better to act as if it were true rather than false.
Significance
Many researchers use the word "significant" to describe a
finding that may have decision-making utility to a client.
From a statistician's viewpoint, this is an incorrect use of the
word.
However, the word "significant" has virtually universal
meaning to the public.
Thus, many researchers use the word "significant" to
describe a difference or relationship that may be strategically
important to a client (regardless of any statistical tests).
Significance
In these situations, the word "significant" is used to advise a
client to take note of a particular difference or relationship
because it may be relevant to the company's strategic plan.
The word "significant" is not the exclusive domain of
statisticians and either use is correct in the business world.
Thus, for the HCI expert, it may be wise to adopt a policy of
always referring to "statistical significance" rather than simply
"significance" when communicating with the public.
Significance
One important concept in significance testing is whether you
use a one-tailed or two-tailed test of significance.
The answer is that it depends on your hypothesis.
Significance
When your research hypothesis states the direction of the
difference or relationship, then you use a one-tailed
probability.
For example, a one-tailed test would be used to test these
null hypotheses:
• Females will not score significantly higher than males on an IQ test.
• Blue collar workers will not buy significantly more product than
white collar workers.
• Superman is not significantly stronger than the average person.
In each case, the null hypothesis (indirectly) predicts the
direction of the difference.
Significance
A two-tailed test would be used to test these null hypotheses:
• There will be no significant difference in IQ scores between males and
females.
• There will be no significant difference in the amount of product
purchased between blue collar and white collar workers.
• There is no significant difference in strength between Superman and
the average person.
The one-tailed probability is exactly half the value of the two-tailed probability.
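A sketch of that halving relationship, assuming SciPy is available and that its scipy.stats.ttest_ind supports the alternative parameter (added in SciPy 1.6); the scores are hypothetical:

from scipy import stats

group_a = [98, 102, 95, 101, 99, 97]      # hypothetical scores
group_b = [104, 108, 101, 107, 103, 106]

t, p_two = stats.ttest_ind(group_a, group_b)                       # two-tailed
t, p_one = stats.ttest_ind(group_a, group_b, alternative='less')   # one-tailed
print(p_two, p_one)   # the one-tailed p is half the two-tailed p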
Significance
Whenever we perform a significance test, it involves
comparing a test value that we have calculated to some
critical value for the statistic.
It doesn't matter what type of statistic we are calculating
(e.g., a t-statistic, a chi-square statistic, an F-statistic, etc.);
the procedure to test for significance is the same.
1. Decide on the critical alpha level you will use (i.e., the error rate you
are willing to accept).
2. Conduct the research.
3. Calculate the statistic.
4. Compare the statistic to a critical value obtained from a table.
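A sketch of that procedure for a t-statistic, assuming SciPy is available (scipy.stats.t.ppf plays the role of the printed table; the numbers are hypothetical):

from scipy import stats

alpha = 0.05        # step 1: the critical alpha level (error rate accepted)
t_statistic = 2.31  # step 3: the statistic calculated from your data (hypothetical)
df = 18             # degrees of freedom for the test

# step 4: compare to the critical value a table would give
critical = stats.t.ppf(1 - alpha / 2, df)   # two-tailed critical value, about 2.101
if abs(t_statistic) > critical:
    print("significant: reject the null hypothesis (p < alpha)")
else:
    print("not significant: fail to reject the null hypothesis (p > alpha)")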
Significance
If your statistic is higher than the critical value from the table:
• Your finding is significant.
• You reject the null hypothesis.
• The probability is small that the difference or relationship
happened by chance, and p is less than the critical alpha
level (p < alpha).
Significance
If your statistic is lower than the critical value from the table:
• Your finding is not significant.
• You fail to reject the null hypothesis.
• The probability is high that the difference or relationship
happened by chance, and p is greater than the critical
alpha level (p > alpha).
HCI 510: HCI Methods I
• Descriptive Statistics
• Inferential Statistics
• Significance
• T-Test
T-Test
The t-test assesses whether the means of two groups are
statistically different from each other.
This analysis is appropriate whenever you want to compare
the means of two groups, and especially appropriate as the
analysis for the posttest-only two-group randomized
experimental design.
T-Test
The figure shows the distributions for the treated (orange)
and control (purple) groups in a study.
T-Test
The figure indicates where the control and treatment group
means are located.
The question the t-test addresses is whether the means are
statistically different.
T-Test
What does it mean to say that the averages for two groups
are statistically different?
T-Test
The first thing to notice about the three situations is that the
difference between the means is the same in all three.
But, you should also notice that the three situations don't
look the same -- they tell very different stories.
T-Test
The top example shows a case with moderate variability of
scores within each group. The second situation shows the
high-variability case, and the third shows the case with low
variability.
T-Test
Clearly, we would conclude that the two groups appear most
different or distinct in the bottom or low-variability case.
Why? Because there is relatively little overlap between the
two bell-shaped curves.
In the high variability case, the group difference appears
least striking because the two bell-shaped distributions
overlap so much.
T-Test
This leads us to a very important conclusion: when we are
looking at the differences between scores for two groups, we
have to judge the difference between their means relative to
the spread or variability of their scores.
The t-test does just this.
T-Test
Statistical Analysis of the t-test
The formula for the t-test is a ratio.
The top part of the ratio is just the difference between the two
means or averages.
The bottom part is a measure of the variability or dispersion
of the scores.
T-Test
Statistical Analysis of the t-test
This formula is essentially another example of the signal-to-noise metaphor in research:
the difference between the means is the signal that, in this
case, we think our program or treatment introduced into the
data;
the bottom part of the formula is a measure of variability that
is essentially noise that may make it harder to see the group
difference.
T-Test
The figure below shows the formula for the t-test and how the
numerator and denominator are related to the distributions.
T-Test
The top part of the formula is easy to compute -- just find the
difference between the means.
The bottom part is called the standard error of the difference.
To compute it, we take the variance for each group and
divide it by the number of people in that group.
We add these two values and then take their square root.
T-Test
The final formula for the t-test is shown below:
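The formula itself is not reproduced in this text version; in standard notation, following the description above, it is

t = \frac{ \bar{x}_1 - \bar{x}_2 }{ \sqrt{ \mathrm{var}_1 / n_1 + \mathrm{var}_2 / n_2 } }

and a minimal Python sketch of it (the two groups are hypothetical) is:

import math
import statistics

group_1 = [24, 27, 22, 29, 25, 26]   # hypothetical treatment scores
group_2 = [20, 23, 19, 22, 21, 24]   # hypothetical control scores

diff = statistics.mean(group_1) - statistics.mean(group_2)    # the signal
se = math.sqrt(statistics.variance(group_1) / len(group_1) +
               statistics.variance(group_2) / len(group_2))   # the noise
print(diff / se)                                              # the t-value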
Remember that the variance is simply the square of the standard deviation.
T-Test
The t-value will be positive if the first mean is larger than the
second and negative if it is smaller.
Once you compute the t-value you have to look it up in a
table of significance to test whether the ratio is large enough
to say that the difference between the groups is not likely to
have been a chance finding.
To test the significance,
you need to set a risk level
(called the alpha level).
T-Test
You also need to determine the
degrees of freedom (df) for the
test.
In the t-test, the degrees of
freedom is the sum of the
persons in both groups minus 2.
T-Test
Given the alpha level, the df, and
the t-value, you can look the
t-value up in a standard table of
significance to determine
whether the t-value is large
enough to be significant.
If it is, you can conclude that the
means of the two groups are
different (even given the variability).
T-Test
Worksheet 2
Rosenthal and Jacobson (1968) informed classroom
teachers that some of their students showed unusual
potential for intellectual gains.
Eight months later, the students identified to teachers
as having potential for unusual intellectual gains
showed significantly greater gains in performance on a
test said to measure IQ than did children who were
not so identified.
T-Test
Conclusions
If you do a large number of tests, falsely significant results are a problem.
Remember that a 95% chance of something being true means there is a
5% chance of it being false. This means that of every 100 tests that show
results significant at the 95% level, the odds are that five of them do so
falsely.
If you took a random, meaningless set of data and did 100 significance
tests, the odds are that five tests would be falsely reported significant.
As you can see, the more tests you do, the more of a problem these false
positives are. You cannot tell which results are the false ones; you just know
they are there.
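A short simulation of this point, assuming NumPy and SciPy are available:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
false_positives = 0
for _ in range(100):
    a = rng.normal(size=30)   # meaningless random data: no true difference exists
    b = rng.normal(size=30)
    if stats.ttest_ind(a, b).pvalue < 0.05:
        false_positives += 1
print(false_positives)        # typically on the order of 5 of the 100 tests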
T-Test
Conclusions
Limiting the number of tests to a small group chosen before the data is
collected is one way to reduce the problem.
If this isn't practical, there are other ways of solving this problem.
The best approach from a statistical point of view is to repeat the study
and see if you get the same results.
If something is statistically significant in two separate studies, it is
probably true.
T-Test
Conclusions
In real life it is not usually practical to repeat a survey, but you can use
the "split halves" technique of dividing your sample randomly into two
halves and do the tests on each.
If something is significant in both halves, it is probably true.
The main problem with this technique is that when you halve the sample
size, a difference has to be larger to be statistically significant.
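A minimal sketch of the split-halves idea, assuming SciPy is available (the function name is made up for illustration):

import random
from scipy import stats

def split_halves_ttest(group_1, group_2, seed=0):
    """Run the same t-test separately on two random halves of the sample."""
    rnd = random.Random(seed)
    g1, g2 = group_1[:], group_2[:]
    rnd.shuffle(g1)
    rnd.shuffle(g2)
    m1, m2 = len(g1) // 2, len(g2) // 2
    p_first = stats.ttest_ind(g1[:m1], g2[:m2]).pvalue
    p_second = stats.ttest_ind(g1[m1:], g2[m2:]).pvalue
    return p_first, p_second   # if both fall below alpha, the effect is probably real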
T-Test
Conclusions
The last common error is also important.
Most significance tests assume you have a truly random sample.
If your sample is not truly random, a significance test may overstate the
accuracy of the results, because it only considers random error.
The test cannot consider biases resulting from non-random error (for
example a badly selected sample).
T-Test
Conclusions
To summarize:
• In statistical terms, significant does not necessarily mean important.
• Probability values should be read in reverse (1 - p).
• Too many significance tests will turn up some falsely significant relationships.
• Check your sampling procedure to avoid bias.