DATA ANALYSIS FOR RESEARCH PROJECTS

TYPES OF DATA

Quantitative data
measurements use a scale with equal intervals
examples include mass (g), length (cm),
volume (mL), temperature (°C or K)

Qualitative data
non-standard scales with unequal intervals or
discrete categories
examples include gender, choice, color scales
Quantitative Scales of Measure
Scale | Properties | Example
Interval (equal) | Numerical value indicates rank and meaningfully reflects relative distance between points on a scale | Temperature (°C or °F)
Ratio (equal) | Has all the properties of an interval scale, and in addition has a true zero point (proportional scale) | Length, Weight, Temperature (K)
Qualitative Scales of Measure
Scale | Properties | Example
Nominal (to name) | Data represents qualitative or equivalent categories (not numerical, cannot be rank ordered) | Eye color, hair color, Gender, Race
Ordinal (to order) | Numerically ranked, but has no implication about how far apart ranks are | Grades, Rating Scales
Sample Data
An experiment was
conducted to measure the
tensile strength of each of
twelve pieces of two types
of steel. The data from
this experiment are given
in the table to the right.
Is there a significant
difference in tensile
strength between the two
types of steel?
Steel 1
Steel 2
(1000 lb/in^2) (1000 lb/in^2)
23.39
27.89
24.29
25.15
24.28
29.50
25.36
18.75
22.93
29.60
13.82
27.34
25.45
22.92
27.42
27.65
27.31
27.26
25.58
25.62
26.61
25.92
27.46
26.46
Is there a better way to compare the data from these groups?
What have you used before to compare data from two different groups?

It is difficult to decide (consistently) whether differences between experimental groups are significant.
We need a rigorous procedure that includes a clear operational definition of dissimilarity.

Statistics & Statistical Analysis

Statistical hypothesis-testing methods
give us the ability to say with confidence
that differences between groups are
real and not just due to random chance,
sampling errors, or other mistakes in
data collection.
Sample data for consideration…

For the following sets of data, discuss:
– What were the independent variable (IV) and dependent variable (DV) tested?
– How should the data be processed to
determine if the IV affects the DV?
– How will you decide if the IV has a
significant effect on the DV?
Sample Data Set 1
Effect of Temperature on the Pressure of a Sample of Gas above Water
Temperature of Water (°C) | Pressure (mmHg)
50 | 90
55 | 120
60 | 145
65 | 180
70 | 219
75 | 264
80 | 310
Graphing data

Correlation coefficient gives a measure
of how strong the relationship is
between the graphed variables.

Multiple trials can and should all be
analyzed at the same time.
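As a rough illustration of the correlation coefficient, the sketch below computes Pearson's r for Sample Data Set 1. It assumes the temperatures and pressures pair up in increasing order (50 °C with 90 mmHg, and so on); the function name `pearson_r` is just a label chosen for this example.

```python
import math

# Sample Data Set 1 from this deck; pairing by increasing order is an assumption.
temperature_c = [50, 55, 60, 65, 70, 75, 80]
pressure_mmhg = [90, 120, 145, 180, 219, 264, 310]


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(temperature_c, pressure_mmhg)
print(f"r = {r:.3f}")  # a value near +1 suggests a strong positive linear trend
```

A value of r close to +1 or -1 indicates a strong linear relationship; a value close to 0 indicates a weak one.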
Sample Data Set 2
Effect of Stress on the Height of Bean Plants after 30 Days
Stressed Plants (cm)
Unstressed Plants (cm)
55.0
65.0
50.0
57.0
48.0
65.0
59.0
57.0
59.0
73.0
57.0
51.0
63.0
65.0
54.0
62.0
68.0
58.0
44.0
50.0
Comparing levels of IV
If graphing the data is not appropriate, the different groups (levels) of the IV can be compared using summary statistics.
These types of statistics are called “Descriptive Statistics” since they:

– describe the data sets
– summarize groups of measurements
Descriptive Statistics:
Measure of Central Tendency
attempt to provide one value that is most typical of
the entire set of data
What are some examples of measures of central
tendency?
Variation
describes the spread within the data set
* two sets of data with the same mean may have
quite different spread within the data
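A minimal sketch of these ideas using Python's standard `statistics` module. All numbers here are hypothetical values made up for illustration; they are not taken from the sample data sets.

```python
import statistics

# Hypothetical plant heights in cm, made up purely for illustration.
heights = [52, 55, 55, 57, 61]

print("mean:", statistics.mean(heights))      # arithmetic average
print("median:", statistics.median(heights))  # middle value of the ranked data
print("mode:", statistics.mode(heights))      # most frequent value

# Two hypothetical data sets with the same mean but very different spread.
tight = [9, 10, 11]
wide = [0, 10, 20]
for name, data in (("tight", tight), ("wide", wide)):
    data_range = max(data) - min(data)  # range = largest value - smallest value
    print(name, "mean:", statistics.mean(data), "range:", data_range)
```

Both hypothetical sets have a mean of 10, yet one spans 2 units and the other 20, which is why a measure of variation is reported alongside the mean.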
Appropriate Measures of Central Tendency
and Variations for Types of Data
Type of Data | Central Tendency | Variation
Quantitative | Mean, Median or Mode | Standard Deviation or Range
Qualitative, Nominal | Mode | Frequency Distribution
Qualitative, Ordinal | Median | Frequency Distribution
What is “standard deviation”?

The standard deviation is a statistic that tells you how tightly the values in a set of data are clustered around the mean. It is a measure of the variation in the data.
When the data points are precise (close to the mean, little variation), the bell-shaped curve is steep, and the standard deviation is small.
When there is greater variation in the data, the bell curve is relatively flat, which tells you that the standard deviation is relatively large.
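To make the calculation concrete, here is a sketch of the sample standard deviation (dividing by n - 1, the convention reported as Sx on TI calculators), checked against Python's `statistics.stdev`. The two data sets are hypothetical and made up for illustration.

```python
import math
import statistics


def sample_std_dev(values):
    """Sample standard deviation: square root of summed squared deviations / (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(v - mean) ** 2 for v in values]
    variance = sum(squared_deviations) / (n - 1)
    return math.sqrt(variance)


# Hypothetical measurements with the same mean (60) but different spread.
clustered = [58, 59, 60, 61, 62]
scattered = [40, 50, 60, 70, 80]

for name, data in (("clustered", clustered), ("scattered", scattered)):
    print(name, round(sample_std_dev(data), 2), round(statistics.stdev(data), 2))
```

The clustered set gives a small standard deviation and the scattered set a large one, even though the means are identical.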
Displaying variation:
Box-and-Whisker Plot
[Figure: box-and-whisker plot marking, from left to right, the smallest value, first quartile, median, third quartile, and largest value]
First Quartile (Q1) – smaller than 75% of ranked values
Median (Q2) – smaller than 50% and larger than 50%
Third Quartile (Q3) – smaller than 25% of ranked values
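The five values drawn on a box-and-whisker plot can be sketched with `statistics.quantiles` (Python 3.8+). The data are hypothetical, and quartile conventions differ slightly between tools, so a calculator or Excel may report slightly different Q1 and Q3 values for the same data.

```python
import statistics

# Hypothetical data set, made up for illustration.
values = [44, 48, 50, 51, 54, 57, 58, 59, 62, 63, 65]

# With n=4, quantiles() returns the three cut points Q1, Q2 (median), and Q3.
q1, q2, q3 = statistics.quantiles(values, n=4)

five_number_summary = (min(values), q1, q2, q3, max(values))
print("smallest, Q1, median, Q3, largest:", five_number_summary)
```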
Illustrating Distributions for
quantitative data: Histograms



Symmetrical – mean equals median
Left-skewed – mean < median
Right-skewed – mean > median
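The mean-versus-median rule of thumb above can be sketched as a small check. The helper name `describe_skew` and the data are hypothetical; note that a mean equal to the median is consistent with a symmetrical distribution but does not prove it.

```python
import statistics


def describe_skew(values):
    """Classify a data set by comparing its mean with its median (rule of thumb)."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "right-skewed"
    if mean < median:
        return "left-skewed"
    return "symmetrical"


# Hypothetical data sets, made up for illustration.
print(describe_skew([1, 2, 3, 4, 5]))       # mean 3, median 3
print(describe_skew([1, 2, 2, 3, 20]))      # a large value pulls the mean up
print(describe_skew([1, 18, 19, 19, 20]))   # a small value pulls the mean down
```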
Statistical Hypothesis Testing
“A trend is apparent in the graph of the data. Is this trend significant?”
“So the means of the groups are different. Is the difference significant?”

Statistical hypothesis testing is needed to determine the significance of the results of your data analysis.
The results of these tests provide “Inferential Statistics.” We make inferential decisions about a population based on the data we collect from a sample of that population.
Example for comparing means:
t Test for Quantitative Data
Equal Sample Size
$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2 + s_2^2}{n}}} $$
where
$\bar{x}_1$ = mean of Group 1
$\bar{x}_2$ = mean of Group 2
$s_1^2$ = variance of Group 1
$s_2^2$ = variance of Group 2
$n$ = number of items or measurements in each group
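As a sketch of the equal-sample-size formula, the function below takes the two group means, the two variances, and the common group size. The inputs are the summary statistics that appear in this deck's Excel output for the bean plants; the function name `t_equal_n` is hypothetical.

```python
import math


def t_equal_n(mean1, mean2, var1, var2, n):
    """t statistic for two independent groups that each contain n measurements."""
    return (mean1 - mean2) / math.sqrt((var1 + var2) / n)


# Summary statistics from the Excel output for the bean plant data.
t = t_equal_n(mean1=60, mean2=56, var1=49.11111111, var2=54.88888889, n=10)
print(round(t, 4))  # about 1.2403, in line with the "t Stat" reported by Excel
```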
Statistical calculations

Use the TI-84 or TI-83 calculator
OR
Use Microsoft Excel Data Analysis

Calculate the t-test for the stressed plants data on the next slide, using the graphing calculator
Level of Significance
Establish a level of significance
In this class, use 0.05.
This means the probability of error in rejecting the null hypothesis (rejecting it when it is actually true) is 5/100
OR
we require 95% confidence before the null hypothesis may be rejected
Results from the calculator
t: value for the t-test
x̄1: mean from List 1
x̄2: mean from List 2
Sx1: standard deviation for List 1
Sx2: standard deviation for List 2
df: degrees of freedom
n1: number of values in List 1
n2: number of values in List 2
t-Test Results from Excel
t-Test: Two-Sample Assuming Equal Variances
Statistic | Stressed Plants (cm) | Unstressed Plants (cm)
Mean | 60 | 56
Variance | 49.11111111 | 54.88888889
Observations | 10 | 10
Pooled Variance | 52 |
Hypothesized Mean Difference | 0 |
df | 18 |
t Stat | 1.240347346 |
P(T<=t) one-tail | 0.115386178 |
t Critical one-tail | 1.734063062 |
P(T<=t) two-tail | 0.230772356 |
t Critical two-tail | 2.100923666 |
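The pooled variance, df, and t Stat rows can be reproduced from the means, variances, and observation counts alone. The sketch below uses the standard pooled-variance formula for a two-sample t-test assuming equal variances; the function name `pooled_t` is hypothetical, and this is not a description of Excel's internal implementation.

```python
import math


def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Return (pooled variance, df, t) for a two-sample equal-variance t-test."""
    df = (n1 - 1) + (n2 - 1)
    # Weighted average of the two variances, weighted by each group's df.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return pooled_var, df, (mean1 - mean2) / standard_error


# Means, variances, and observation counts from the Excel output above.
pooled_var, df, t_stat = pooled_t(60, 49.11111111, 10, 56, 54.88888889, 10)
print(round(pooled_var, 2), df, round(t_stat, 4))
```

With equal group sizes the pooled variance is simply the average of the two variances, which is why it comes out at 52 here.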
Statistical Hypotheses
(different from your research hypothesis)
Null Hypothesis
suggests any observed difference between two
sample means occurred by chance and is NOT
significant
states that there is no relationship between variables:
i.e. two means are equal OR they are not statistically
different
Claim / Alternative Hypothesis
derived from literature, research hypothesis
suggests outcome of experiment if I.V. affects D.V.
Null Hypothesis
What would be the null hypothesis
for this set of data?
The mean height of stressed plants is not
significantly different from the mean height
of unstressed plants.
Confidence Levels

Indicates how sure we can be that findings are not just due to chance
Infers that results of the sample reflect the results of the whole population
If we reject the null hypothesis at the 95% confidence level:
– a difference this large would occur by chance less than 5% of the time if the null hypothesis were true
– we accept a 5% risk of rejecting a null hypothesis that is actually true
Confidence levels
Probability of error: Error that occurs if the null hypothesis is rejected when it is true and should not be rejected
Identified by Greek lowercase alpha, α
Researchers usually select α of 0.05 or less
If confidence level is 95%, then probability of error (α) is 5%, or 0.05

Statistical Tests:
Test Values and Critical Values
Test value – the result of a statistical test on your data.
Critical value – a reference value for each statistical test.
– Your calculated statistical test value must exceed this value for you to reject the null hypothesis

You can find the critical value for each statistical test in publications and on university websites. (links available on my website)
If you use Microsoft Excel for your statistics, the critical value will be given with the results.
Significance of t value
Determine the degrees of freedom
df = (number in experimental group – 1) + (number in control group – 1)
df = (10 – 1) + (10 – 1) = 18
Determine significance of calculated t by looking at a table of critical t values
Calculated t < critical t → not significant
Calculated t > critical t → is significant
At df = 18 and the 0.05 level (two-tail), critical t = 2.101
Calculated t of 1.24 < 2.101, so it is not significant at the 0.05 level.
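The decision rule above can be sketched in a few lines. The lookup table holds only the single critical value quoted in this deck (df = 18, 0.05 level, two-tail); any other entry would have to come from a published t table. The function names are hypothetical.

```python
def degrees_of_freedom(n_experimental, n_control):
    """df = (number in experimental group - 1) + (number in control group - 1)."""
    return (n_experimental - 1) + (n_control - 1)


def is_significant(calculated_t, critical_t):
    """Two-tail rule: significant only if |calculated t| exceeds the critical t."""
    return abs(calculated_t) > critical_t


# Only the critical value quoted in this deck; extend from a published t table.
CRITICAL_T_TWO_TAIL_05 = {18: 2.101}

df = degrees_of_freedom(10, 10)
critical_t = CRITICAL_T_TWO_TAIL_05[df]
print(df, critical_t, is_significant(1.24, critical_t))  # 18 2.101 False
```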
Rejecting Null Hypothesis
If test value is not significant →
null hypothesis is NOT REJECTED
If test value is significant →
null hypothesis is REJECTED
Do Statistical Findings Support the
Research Hypothesis?
Null hypothesis was rejected =
Research hypothesis was supported
(unless research hypothesis IS a null hypothesis)
Null hypothesis was not rejected =
Research hypothesis was not supported
Summary:
Steps of Hypothesis Testing
1. State the null hypothesis and alternative hypothesis (claim)
2. Choose the confidence level (95%) and sample size
3. Collect the data and calculate the appropriate statistics
4. Make the proper statistical inference
Populations of Study – Be careful
what you claim!
Sample
specific portion of the population that is selected for the
study (the 100 bean seedlings used in the study)
Sampled Population
population from which the sample was drawn (all the
bean seedlings in the nursery from which the
experimenter obtained their bean seedlings)
Target Population
ALL units (persons, things, experimental outcomes) of
the specific group whose characteristics are being
studied (all the bean seedlings of the same species)
Communicating Statistics
Effect of Stress on the Mean Height of Bean Plants after 30 Days
Statistic | Stressed Group | Unstressed Group
Mean | 60.0 cm | 56.0 cm
Variance | 49.1 cm² | 60.7 cm²
Standard Deviation | 7.0 cm | 7.8 cm
1SD | 53.0 – 67.0 cm | 48.2 – 63.8 cm
2SD | 46.0 – 74.0 cm | 40.4 – 71.6 cm
Number | 10 | 10
Results of t test: t = 1.3; df = 18; t of 1.3 < 2.101; p > 0.10
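The 1SD and 2SD rows are the intervals mean ± 1 standard deviation and mean ± 2 standard deviations. A short sketch that reproduces them from the means and standard deviations in the table (the helper name `sd_range` is hypothetical):

```python
def sd_range(mean, sd, k):
    """Return the interval (mean - k*sd, mean + k*sd), rounded to one decimal place."""
    return (round(mean - k * sd, 1), round(mean + k * sd, 1))


# Means and standard deviations (cm) from the table above.
groups = {"Stressed": (60.0, 7.0), "Unstressed": (56.0, 7.8)}

for name, (mean, sd) in groups.items():
    print(name, "1SD:", sd_range(mean, sd, 1), "2SD:", sd_range(mean, sd, 2))
```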
[Figure: Effect of Stress on the Height of Bean Plants After 30 Days. Vertical axis: Height (cm), 40 to 75. Horizontal axis: Treatment of Plants (Stressed, Unstressed).]
Types of Tests

For Quantitative Data:
– Linear Regression
– One-Way Analysis of Variance (ANOVA)
– t Test

For Qualitative Data:
– Chi-Squared Test
– Z Test
Linear Regression

Determines a linear relationship
between two variables based on a
correlation coefficient
H0: The number of yellow M&M’s is not
related to the total number of M&M’s in
the package.
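As an illustration of fitting a line, the sketch below computes the least-squares slope and intercept. No M&M counts are given in this deck, so it reuses the gas data from Sample Data Set 1 instead, again assuming the values pair up in increasing order. The function name is hypothetical.

```python
# Sample Data Set 1 from this deck; pairing by increasing order is an assumption.
temperature_c = [50, 55, 60, 65, 70, 75, 80]
pressure_mmhg = [90, 120, 145, 180, 219, 264, 310]


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)
    return slope, intercept


slope, intercept = least_squares_line(temperature_c, pressure_mmhg)
print(f"pressure = {slope:.2f} * temperature + ({intercept:.2f})")
```

The slope estimates how many mmHg the pressure rises per degree Celsius over the measured range; the fitted line should not be trusted far outside that range.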
ANOVA Test

Compares the means of more than two
groups
H0: There is no significant difference
between the numbers of M&M’s in plain
packages, almond packages and
peanut packages
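A from-scratch sketch of the one-way ANOVA F statistic (variation between group means divided by variation within groups). The package counts are hypothetical numbers made up for illustration, and the resulting F must still be compared with a critical F value from a table for the given degrees of freedom.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of measurements."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of each measurement around its own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical M&M counts per package, made up for illustration.
plain = [55, 57, 56, 58]
almond = [21, 23, 22, 20]
peanut = [30, 28, 31, 29]
print(one_way_anova_f([plain, almond, peanut]))
```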
t-Test

Compares the means of two independent
groups
H0: There is no significant difference between
the numbers of M&M’s in plain and peanut
packages

Two-tail test determines whether the two means differ in either direction (more difficult to support)
One-tail test determines whether one mean is greater than the other (easier to support)
Chi-Squared Test

Determines if a proportion within a
sample is larger than expected; can be
used for more than two groups
H0: There are equal numbers of each
color of M&M in a package.
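For the null hypothesis of equal numbers of each color, the chi-squared statistic sums (observed - expected)² / expected over the categories, where every expected count is the total divided by the number of colors. The counts below are hypothetical, and the result must be compared with a critical value for df = (number of categories - 1).

```python
def chi_squared_equal_expected(observed_counts):
    """Chi-squared statistic when every category is expected to be equally common."""
    total = sum(observed_counts)
    expected = total / len(observed_counts)
    return sum((observed - expected) ** 2 / expected
               for observed in observed_counts)


# Hypothetical color counts from one package, made up for illustration.
color_counts = {"red": 8, "orange": 12, "yellow": 10, "green": 9, "blue": 11}

chi2 = chi_squared_equal_expected(list(color_counts.values()))
df = len(color_counts) - 1
print(round(chi2, 2), df)
```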
Z-Test

Compares proportions between two
groups
H0: There are equal proportions of red
M&M’s in plain and peanut packages
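A sketch of the two-proportion Z statistic using the usual pooled-proportion formula. The counts are hypothetical; for a two-tail test at the 0.05 level the standard critical value is 1.96.

```python
import math


def two_proportion_z(count1, n1, count2, n2):
    """Z statistic comparing the proportions count1/n1 and count2/n2."""
    p1 = count1 / n1
    p2 = count2 / n2
    # Pooled proportion under the null hypothesis of equal proportions.
    pooled = (count1 + count2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


# Hypothetical counts: 12 red out of 60 plain, 6 red out of 40 peanut.
z = two_proportion_z(12, 60, 6, 40)
print(round(z, 2), abs(z) > 1.96)
```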
Selecting a Statistical Test
Things to consider:
Number of groups of data
Type of data: Quantitative or Qualitative
Type of variable – numerical or categorical
The relationship in the null hypothesis being tested
Statistical Tests Review
Comparison of two variables for correlation → correlation coefficient test
Comparing means of more than two groups/levels → ANOVA test
Comparing two means → t-test
Comparison of proportions within a population → χ² (chi-squared) test
Comparison of proportions between populations → Z test
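The review list can be read as a simple lookup. The sketch below encodes it as a function; the goal labels ("correlation", "means", and so on) are hypothetical names invented for this example.

```python
def suggest_test(goal, groups=2):
    """Map an analysis goal to the test named in the review list."""
    if goal == "correlation":
        return "correlation coefficient test"
    if goal == "means":
        # Two groups use a t-test; more than two groups/levels use ANOVA.
        return "t-test" if groups == 2 else "ANOVA test"
    if goal == "proportions within":
        return "chi-squared test"
    if goal == "proportions between":
        return "Z test"
    raise ValueError(f"unknown goal: {goal!r}")


print(suggest_test("means", groups=2))
print(suggest_test("means", groups=3))
print(suggest_test("proportions between"))
```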

Key Questions for your Research:

What kind of data will you need to collect to
test your hypothesis? (Qualitative or
Quantitative)
– What kind of scale will you use?
– How do you plan on analyzing this data?
• Comparison of groups? What will you compare?
• Look for a trend? What will you graph?
– How many different levels will you need data for?
– How many trials?

What relevant qualitative data will you look for
that may also help you interpret results?