Data Management and Basic Statistics

The Nature of Data: Data
Management and Basic Statistics
Part 1
Jefte Arshed
The Nature of Data
Data Management
• Data capture and management, like other aspects of
laboratory and field science, should be conducted so that the
chance of error is minimized.
• Data can be captured electronically, and most of the
principles that apply to traditional methods continue to apply.
• Research questions: each research question should correspond to a
statistical analysis.
Correlation to Statistics
• Descriptive statistics: describes data (for
example, with a chart or graph)
• Inferential statistics: allows you to make
predictions (“inferences”) from that data
– Parametric statistics
– Non-Parametric statistics
Descriptive Statistics
• Measures of Frequency
– Count, Percent, Frequency
– Shows how often something occurs
• Measures of Central Tendency
– Mean, Median, and Mode
– Locates the center of the distribution
• Measures of Dispersion and Variation
– Range, Variance/Standard Deviation
– Identifies the spread of scores by stating intervals
– Range = difference between the highest and lowest points
– Variance or Standard Deviation = how far the observed
scores typically lie from the mean
• Measures of Position
– Percentile Ranks, Quartile Ranks
– Describes how scores fall in relation to one another.
Relies on standardized scores
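The four families of descriptive measures above can be sketched in Python with only the standard library. This is a minimal illustration, not part of the original slides: the `scores` list is made-up data, and the variable names are assumptions chosen for readability.

```python
import statistics
from collections import Counter

# Hypothetical exam scores, made up purely for illustration.
scores = [72, 85, 85, 90, 64, 78, 85, 90, 70, 81]

# Measures of frequency: how often each score occurs, as a count and a percent.
counts = Counter(scores)
percent = {score: 100 * n / len(scores) for score, n in counts.items()}

# Measures of central tendency.
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of dispersion and variation.
value_range = max(scores) - min(scores)
variance = statistics.variance(scores)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(scores)      # sample standard deviation

# Measures of position: the three quartile cut points.
q1, q2, q3 = statistics.quantiles(scores, n=4)

print("mean:", mean, "median:", median, "mode:", mode)
print("range:", value_range, "variance:", variance, "std dev:", std_dev)
print("quartiles:", q1, q2, q3)
```

Note that `statistics.variance` and `statistics.stdev` use the sample (n - 1) formulas; `statistics.pvariance` and `statistics.pstdev` are the population versions.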
Terms to Remember
Normal Distribution
Inferential Statistics
Commonly used Statistical Analysis
A. T-test (student T-test)
B. Chi-Square Goodness of Fit test
C. Correlation and Regression
D. Analysis of Variance (ANOVA)
Commonly used Statistical Analysis
A. T-test (student T-test) (Parametric)
– The t test tells you how significant the differences
between groups are;
– In other words it lets you know if those
differences (measured in means) could have
happened by chance.
• Let’s say you have a cold and you try a
naturopathic remedy. Your cold lasts a couple
of days. The next time you have a cold, you
buy an over-the-counter pharmaceutical and
the cold lasts a week. You survey your friends
and they all tell you that their colds were of a
shorter duration (an average of 3 days) when
they took the naturopathic remedy. A t-test compares
the mean durations of the two groups and tells you how
likely a difference that large is to arise by chance.
• The t-score is a ratio between the difference
between two groups and the difference within
the groups.
– A large t-score tells you that the groups are
different.
– A small t-score tells you that the groups are
similar.
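The ratio described above can be sketched as follows. This is a minimal illustration using the unequal-variances (Welch) form of the statistic; the function name `welch_t` and the cold-duration numbers are assumptions made up for this example, not values from the slides.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Return (t, df) for two independent samples with unequal variances."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    # Variance of each sample mean: the "difference within the groups".
    se2_a = statistics.variance(group_a) / len(group_a)
    se2_b = statistics.variance(group_b) / len(group_b)
    # Difference between the groups divided by the variation within them.
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(group_a) - 1) + se2_b ** 2 / (len(group_b) - 1)
    )
    return t, df

# Hypothetical cold durations in days, made up for illustration.
remedy = [3, 2, 4, 3, 3]
pharma = [7, 6, 8, 7, 7]
t, df = welch_t(remedy, pharma)
print("t =", round(t, 3), "df =", round(df, 1))

# Two groups with the same mean give a t-score of zero.
t_similar, _ = welch_t([3, 2, 4, 3, 3], [3, 3, 4, 2, 3])
print("t for similar groups =", t_similar)
```

The sign of t only reflects which group is listed first; its size is what indicates how different the groups are.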
t-values and p-values
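• Every t-value has a p-value to go with it.
• The p-value is the probability of getting results at least as
extreme as those in your sample if the null hypothesis were true
(that is, by chance alone).
– p-values range from 0 to 1 (0% to 100%).
– A p-value below 0.05 (5%) is conventionally taken to mean the
result is statistically significant.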
• There are three main types of t-test:
– An Independent Samples t-test compares
the means for two groups.
– A Paired sample t-test compares means from the
same group at different times (say, one year
apart).
– A One sample t-test tests the mean of a single
group against a known mean.
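The paired and one-sample variants listed above are closely related: a paired test is a one-sample test on the per-subject differences. A minimal sketch, assuming hypothetical function names (`one_sample_t`, `paired_t`) and made-up before/after measurements:

```python
import math
import statistics

def one_sample_t(sample, known_mean):
    """Return (t, df) for one sample tested against a known mean."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    t = (statistics.mean(sample) - known_mean) / standard_error
    return t, n - 1

def paired_t(before, after):
    """Return (t, df) for paired samples, via the per-subject differences."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    # Null hypothesis: the mean difference is 0.
    return one_sample_t(diffs, 0)

# Hypothetical measurements for the same five subjects, made up for illustration.
before = [10, 12, 9, 11, 13]
after = [12, 13, 11, 12, 16]
t_paired, df_paired = paired_t(before, after)
print("paired t =", round(t_paired, 3), "df =", df_paired)
```

Both functions return only the t statistic and degrees of freedom; turning these into a p-value needs the t distribution, which Excel's Data Analysis output reports directly.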
Loading the Data Analysis Toolpak in
Excel
• First, check the Data tab in Excel and look for
the Data Analysis button (commonly at the right-hand
end of the Data ribbon). If it is not there, load the add-in;
see the links below for a tutorial.
• Windows: https://www.youtube.com/watch?v=_yNxLFagKgw
• Mac: https://www.youtube.com/watch?v=cE7YLvdWNK4
How to do independent t-test in
EXCEL
• Step 1: Type your data into Excel: the first column for the
subject identifier (i.e. a name or a number), the second
column for the first group’s results and the
third column for the second group’s results.
• Step 2: State your null hypothesis. For example, your null hypothesis might be
that the means are the same.
• Step 3: Click the “Data” tab and then click “Data
Analysis.” If you don’t see the Data Analysis option, load
the Data Analysis Toolpak.
• Step 4: Click “t-Test: Two-Sample Assuming Unequal
Variances” in the options window, then click “OK.”
• Step 5: Click the “Variable 1 Range” box and then
select your first variable list (the first group’s
results).
• Step 6: Click the “Variable 2 Range” box and then
select your second variable list (the second
group’s results).
• Step 7: Type a number into the Hypothesized
Mean Difference box. For example, if your null
hypothesis stated that there was no difference
between the means, enter “0.” Otherwise, if you
are hypothesizing there is a difference, type that
difference into the box.
• Step 8: Check the “Labels” box if you have
included labels.
• Step 9: Type an alpha level into the alpha level
box. An alpha level of 0.05, or 5%, is standard
in hypothesis testing so if you aren’t sure what
alpha level you need, leave this at 0.05.
• Step 10: Click the Output Range box and select
an area to the right of your data. (The results table will appear
here once the analysis has been run on the datasets.)
• Step 11: Click “OK.”
Reading The Results from the Independent t-Test for means in Excel
• Your results will include a lot of data, some of it obvious (like
the number of data items). But when you run a t-test you’re
really only looking for two things: the t-statistic and the p-value.
• Step 1: Compare the alpha level you chose (e.g. 0.05) to the p-value in the output. If the p-value in the output is smaller than
the alpha level you chose, reject the null hypothesis.
• Step 2: Compare the t-critical value in the output with the t-value. If the absolute value of the t-value (t Stat) is larger than the t-critical
value, reject the null hypothesis. There are two t-critical values,
one-tail and two-tail. If you aren’t sure whether you have a one-tailed
test or a two-tailed test, compare the t-value to the two-tail t-critical value.
• To reject the null hypothesis, use both values (p
and t) in combination, taking both from the same kind of test (both one-tail
or both two-tail). In other words, if you think you might
reject the null based on the t-value, but your p-value is larger than alpha,
then don’t reject the null.
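The decision rule above can be written as a small check. This is only a sketch: the function name `reject_null` and the example numbers are hypothetical stand-ins for values you would read off the Excel output table (t Stat, the two-tail p-value, and the two-tail t-critical value).

```python
def reject_null(t_stat, t_critical_two_tail, p_value, alpha=0.05):
    """Return True only when both the p-value and the t-statistic support rejection."""
    p_says_reject = p_value < alpha
    # Use the absolute value so that a large negative t-statistic also counts.
    t_says_reject = abs(t_stat) > t_critical_two_tail
    return p_says_reject and t_says_reject

# Hypothetical values, as if copied from an Excel t-test output table.
strong_result = reject_null(t_stat=-8.94, t_critical_two_tail=2.306, p_value=0.00002)
weak_result = reject_null(t_stat=1.10, t_critical_two_tail=2.306, p_value=0.30)

print("reject H0 (strong case):", strong_result)
print("reject H0 (weak case):", weak_result)
```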
How to do Paired t-test in EXCEL
• Step 1: Type your data into Excel. Because the paired two-sample t-test
for means is usually used for “before” and “after”
data, you’ll probably have three columns: the first column for
the subject identifier (i.e. a name or a number), the second
column for the Before results and the third column for the After
results.
• Step 2: State your null hypothesis. For example, your null hypothesis might be that the
means are the same.
• Step 3: Click the “Data” tab and then click “Data Analysis.” If you
don’t see the Data Analysis option, load the Data Analysis
Toolpak.
• Step 4: Click “t-Test: Paired Two Sample for Means” in the
options window, then click “OK.”
• Step 5: Click the “Variable 1 Range” box and then
select your first variable list (usually the Before
list).
• Step 6: Click the “Variable 2 Range” box and then
select your second variable list (usually the After
list).
• Step 7: Type a number into the Hypothesized
Mean Difference box. For example, if your null
hypothesis stated that there was no difference
between the means, enter “0.” Otherwise, if you
are hypothesizing there is a difference, type that
difference into the box.
• Step 8: Check the “Labels” box if you have
included labels.
• Step 9: Type an alpha level into the alpha level
box. An alpha level of 0.05, or 5%, is standard
in hypothesis testing so if you aren’t sure what
alpha level you need, leave this at 0.05.
• Step 10: Click the Output Range box and select
an area to the right of your data.
• Step 11: Click “OK.”
Reading The Results from the Paired t-Test for means in Excel
• Your results will include a lot of data, some of it obvious (like
the number of data items). But when you run a t-test you’re
really only looking for two things: the t-statistic and the p-value.
• Step 1: Compare the alpha level you chose (e.g. 0.05) to the p-value in the output. If the p-value in the output is smaller than
the alpha level you chose, reject the null hypothesis.
• Step 2: Compare the t-critical value in the output with the t-value. If the absolute value of the t-value (t Stat) is larger than the t-critical
value, reject the null hypothesis. There are two t-critical values,
one-tail and two-tail. If you aren’t sure whether you have a one-tailed
test or a two-tailed test, compare the t-value to the two-tail t-critical value.
• To reject the null hypothesis, use both values (p
and t) in combination, taking both from the same kind of test (both one-tail
or both two-tail). In other words, if you think you might
reject the null based on the t-value, but your p-value is larger than alpha,
then don’t reject the null.
Click the links for video tutorials
• Independent t-test
– https://www.youtube.com/watch?v=norKDF0MH0M
• Paired t-test
– https://www.youtube.com/embed/RHBIQ2reACM
Chi-Square Goodness of Fit test
• Used to find out whether the observed values of a
given phenomenon differ significantly
from the expected values.
• The term goodness of fit refers to comparing
the observed sample distribution with the
expected probability distribution.
• The Chi-Square goodness of fit test determines
how well a theoretical distribution fits the
empirical distribution.
• The Chi-Square goodness of fit test fits one categorical
variable to a distribution.
– The Chi-Square statistic can only be computed from counts.
It can’t be used on percentages, proportions, means
or similar statistical values. For example, if you have 10
percent of 200 people, you will need to convert that to a
count (20) before you can run the test.
Procedure for Chi-Square Goodness of Fit Test:
1. The test can only be used for data put into classes. If
you have non-binned data you’ll need to make a
frequency table or histogram before performing the
test.
• A. Null hypothesis: In Chi-Square goodness of
fit test, the null hypothesis assumes that there
is no significant difference between the
observed and the expected value.
• B. Alternative hypothesis: In Chi-Square
goodness of fit test, the alternative hypothesis
assumes that there is a significant difference
between the observed and the expected
value.
• Compute the value of the Chi-Square goodness of
fit statistic using the following formula:
\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]
where O = observed value and E = expected value, and the sum runs over all categories.
Chi Square Goodness of Fit Test in
EXCEL
Las Vegas Dice Chi Square Goodness of Fit Test
Example
• Let's say you want to know whether a six-sided die is fair
or unfair (Advanced Statistics by Dr. Larry
Stephens). If the die is fair, then each side will
have an equal probability of coming up; if not,
then one or more of the sides will come up more
often. Now, roll the die 120 times and enter the
data into Excel. We would expect each side of the
die to come up 20 times (120/6).
Hypotheses for the die example:
H0: p1 = p2 = p3 = p4 = p5 = p6 = 1/6
Ha: At least one p is not equal to 1/6
Interpreting the Chi Square Goodness of Fit results
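• H0: p1 = p2 = p3 = p4 = p5 = p6 = 1/6
• Ha: At least one p is not equal to 1/6
• If the p-value of the test is smaller than the chosen alpha level (e.g. 0.05),
reject H0 and conclude that the die is unfair; otherwise, do not reject H0.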
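The die example can be sketched in Python as follows. The observed counts below are hypothetical, made up for illustration, because the original data table did not survive; the function name `chi_square_statistic` is likewise an assumption. Instead of a p-value, this sketch compares the statistic with the standard table critical value for 5 degrees of freedom at alpha = 0.05 (11.070), which leads to the same decision.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for faces 1 to 6 over 120 rolls (not the original data).
observed = [22, 17, 20, 26, 15, 20]
rolls = sum(observed)
expected = [rolls / 6] * 6  # a fair die: 120 / 6 = 20 per face

chi2 = chi_square_statistic(observed, expected)
degrees_of_freedom = len(observed) - 1

# Table critical value of the chi-square distribution for df = 5, alpha = 0.05.
CRITICAL_VALUE_DF5_ALPHA_05 = 11.070
reject_h0 = chi2 > CRITICAL_VALUE_DF5_ALPHA_05

print("chi-square =", round(chi2, 3), "df =", degrees_of_freedom)
print("reject H0 (die is unfair):", reject_h0)
```

With these made-up counts the statistic is well below the critical value, so H0 is not rejected and there is no evidence that the die is unfair.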