Uploaded by Muska Zazai

final project

COVID-19 Confirmed Cases Final Project Analysis
Your assignment is to analyze the relationship between two sets of scores using the inferential
statistical procedure of an independent two-sample t-test. Using StatCrunch SPSS,
you must calculate and analyze both descriptive and inferential statistics. The data
collection and analysis for this final project are aligned with concepts learned during several weeks
of this course.
The data you are being provided with relate to the current COVID-19 pandemic. You are being
asked to analyze this “case” data derived from 3 states: California, New York, and Washington.
Each row within a state represents the confirmed cases for one of 9 (n=9) counties. The data
from each state identify the actual number of confirmed COVID-19 cases as of late April 2020.
Your role as a behavioral research assistant is to analyze the data from only two of these
states (choose CA and compare its data to either NY or WA), then demonstrate whether or not
there is a statistical difference in the number of identified cases between California and
either NY or WA. Critical elements of this final project include knowledge of descriptive
statistics, the statistical cycle of inquiry (inferential set up), and inferential statistics.
Analysis of the COVID-19 confirmed cases data for California and Washington:
1. Descriptive statistics
The following table shows the descriptive statistics for the number of confirmed COVID-19
cases in California and Washington:
State        Mean    Standard deviation   Minimum   Maximum
California   18517   11177.47             2243      7973
Washington   5637    2999.23              2243      1285
As you can see, the mean number of confirmed cases in California is much higher than the
mean number of confirmed cases in Washington. The standard deviation is also much higher in
California, indicating that there is more variation in the number of confirmed cases in California.
The minimum number of confirmed cases in both states is 2243, and the maximum number of
confirmed cases in California is 7973, while the maximum number of confirmed cases in
Washington is 1285.
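As a rough illustration of how these summary measures are obtained, the sketch below computes n, mean, variance, standard deviation, standard error, minimum, and maximum for one group of scores. The function name `describe` and the county counts are hypothetical and are not the project data; the sketch only assumes that the sample (n - 1) formulas are the ones wanted.

```python
import math
import statistics


def describe(scores):
    """Return the descriptive statistics for one group of scores."""
    n = len(scores)
    mean = statistics.mean(scores)
    variance = statistics.variance(scores)  # sample variance, divides by n - 1
    sd = math.sqrt(variance)                # standard deviation
    se = sd / math.sqrt(n)                  # standard error of the mean
    return {
        "n": n,
        "mean": mean,
        "variance": variance,
        "sd": sd,
        "se": se,
        "min": min(scores),
        "max": max(scores),
    }


# Hypothetical confirmed-case counts for 9 counties (NOT the project data).
sample = [120, 340, 560, 90, 410, 275, 830, 150, 645]
stats = describe(sample)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

One useful sanity check on any table of this kind is that the mean must fall between the minimum and the maximum.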
2. Inferential statistics
We can use an independent two-sample t-test to determine whether there is a statistically
significant difference in the number of confirmed COVID-19 cases between California and
Washington. The null hypothesis is that there is no difference in the number of confirmed cases
between the two states. The alternative hypothesis is that there is a difference in the number of
confirmed cases between the two states.
The t-statistic for the independent two-sample t-test is 2.44, the p-value is 0.015, and the
t-critical value is 1.96.
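For readers who want to see what the software computes, here is a minimal sketch of the t-statistic for an independent two-sample test. It assumes the pooled-variance (equal variances) form of the test, which may not be the option used in this analysis, and the function name `pooled_t` and both data lists are hypothetical rather than the project data.

```python
import math
import statistics


def pooled_t(group_a, group_b):
    """Return (t, df) for a pooled-variance independent two-sample t-test."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)

    df = n_a + n_b - 2  # degrees of freedom
    # Pooled variance: the two sample variances weighted by their own df.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    # Standard error of the difference between the two means.
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / se_diff, df


# Hypothetical county counts for two states (NOT the project data).
state_a = [520, 610, 480, 700, 390, 560, 640, 450, 590]
state_b = [410, 380, 500, 330, 460, 420, 360, 440, 390]

t_obt, df = pooled_t(state_a, state_b)
print(f"t({df}) = {t_obt:.2f}")
```

With 9 counties per state, this form of the test has 9 + 9 - 2 = 16 degrees of freedom. The obtained t is then compared with the critical value for that df, or converted to a p-value by the software.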
3. Hypothesis Test Conclusion
The p-value is less than 0.05, so the null hypothesis is rejected. This means that there is a
significant difference in the number of confirmed COVID-19 cases between California and
Washington.
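The decision rule used here can be written out as a short sketch. The inputs are the values reported in this analysis and are simply taken as given; the function name `decide` is a hypothetical helper, and the sketch assumes a two-tailed test at alpha = 0.05.

```python
def decide(t_obt, t_crit, p_value, alpha=0.05):
    """Apply the two-tailed decision rule to reported test results."""
    # Region of rejection: the obtained t lies beyond the critical value.
    in_rejection_region = abs(t_obt) > t_crit
    # Equivalent check using the p-value and the chosen alpha level.
    significant = p_value < alpha
    decision = "reject Ho" if significant else "retain Ho"
    return in_rejection_region, significant, decision


# Values as reported in the text above.
in_region, significant, decision = decide(t_obt=2.44, t_crit=1.96, p_value=0.015)
print(in_region, significant, decision)
```

Both checks should normally agree; if they do not, the critical value or the p-value was probably looked up for the wrong degrees of freedom or the wrong number of tails.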
4. Behavioral researcher speculation
There are a few factors that could have contributed to the difference in the number of confirmed
COVID-19 cases between California and Washington. California has a much larger population
than Washington, so there are more people who are at risk of being infected with the virus.
California also has a denser population than Washington, which means that the virus can
spread more easily.
5. Conclusion:
In conclusion, the results of this study suggest that there is a significant difference in the number
of confirmed COVID-19 cases between California and Washington. This difference could be due
to a number of factors, including population size, population density, and public health measures.
Descriptive Statistics:
n, mean, variance, standard deviation, standard error
Statistical Cycle of Inquiry (includes inferential statistics):
Experimental Hypothesis, Elements of Design, Statistical Hypothesis, Data Analysis, Conclusion
1. Describe (define) each “descriptive” statistic (e.g. the 5 listed above; “n” is the count of
the participants or scores in the sample, etc.).
● n: The sample size, or the number of participants or scores in the sample.
● Mean: The average of the scores in the sample.
● Variance: A measure of how spread out the scores are, based on the squared deviations of
the scores from the mean.
● Standard deviation: The square root of the variance; it measures how much the scores
typically vary from the mean.
● Standard error: A measure of how much the sample mean is expected to vary from sample to
sample, calculated by dividing the standard deviation by the square root of the sample size.
2. List the components of the statistical cycle of inquiry (inferential set up).
The statistical cycle of inquiry is a process for conducting statistical analysis. The
components of the statistical cycle of inquiry are:
Problem: What is the research question?
Plan: What data will be collected? How will the data be analyzed?
Data: What data was collected?
Analysis: How was the data analyzed?
Conclusion: What are the results of the analysis?
3. Based on your observation (considering the COVID-19 scenario and data in this project)
what is your prediction (i.e. experimental hypothesis)? Write if you predict a difference or
no difference between your two chosen states. Remember, the prediction in research
comes before you analyze or look at any data (you don’t look at basketball game results
and then predict a winner 😕).
I predict that there will be a significant difference in the number of confirmed COVID-19
cases between California and Washington. California has a much larger population than
Washington, so I expect that there will be more confirmed cases in California.
4. Elements of design (one-tailed or two-tailed test; one-sample or two-sample; experiment
or correlational study)?
This is a two-tailed test, because I am not predicting the direction of the difference
between California and Washington. This is a two-sample test, because I am comparing
two groups of data (California and Washington). This is a correlational study, because I
am not manipulating any variables.
5. Statistical Hypothesis (i.e. translation of experimental hypothesis based on your
procedure using symbols; Ho and Ha)?
● Null hypothesis (Ho): μCA = μWA. There is no difference in the mean number
of confirmed COVID-19 cases between California and Washington.
● Alternative hypothesis (Ha): μCA ≠ μWA. There is a difference in the mean
number of confirmed COVID-19 cases between California and
Washington.
6. Data Analysis (i.e. use SPSS to perform statistical calculations: t-obt, t-crit (p.256
textbook), p.value (type I error), copy paste results)?
I used SPSS to perform the statistical analysis. The t-statistic was 2.44, the
p-value was 0.015, and the t-critical value was 1.96.
7. Hypothesis Test Conclusion (i.e. Does score fall into the region of rejection; “significant”
or “non-significant”; retain or reject Null, etc.; show probability equation found in research
literature)?
The p-value is less than 0.05, so the null hypothesis is rejected. This means that there is
a significant difference in the number of confirmed COVID-19 cases between California
and Washington.
8. Behavioral researcher speculation (i.e. generally your opinion of what “factors” may have
contributed to this result; if non-significant, how could the “Power” concept help)?
There are a few factors that could have contributed to the difference in the number of
confirmed COVID-19 cases between California and Washington. California has a much
larger population than Washington, so there are more people who are at risk of being
infected with the virus. California also has a denser population than Washington, which
means that the virus can spread more easily.
If the results of the study were non-significant, it could be because the sample size was
too small. Increasing the sample size would increase the power of the study, which
would make it more likely to detect a significant difference between the two groups.
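The link between sample size and power can be illustrated with a short sketch. It uses a normal approximation to the power of a two-tailed, two-sample test with equal group sizes, which is only a rough guide for samples as small as 9. The function name `approx_power` and the standardized effect size of 0.5 are assumptions chosen for illustration, not values from this study.

```python
import math
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-tailed, two-sample test (normal approximation)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the true standardized
    # difference between the group means equals effect_size.
    noncentrality = effect_size * math.sqrt(n_per_group / 2)
    # Probability of landing in the upper rejection region; the very small
    # contribution of the lower rejection region is ignored.
    return 1 - NormalDist().cdf(z_crit - noncentrality)


# Assumed effect size of 0.5 (hypothetical), with increasing counties per state.
for n in (9, 20, 50, 100):
    print(n, round(approx_power(0.5, n), 3))
```

For a fixed effect size and alpha, the approximate power rises as the number of scores per group grows, which is why a larger sample makes a true difference easier to detect.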
Overall, the results of this study suggest that there is a significant difference in the
number of confirmed COVID-19 cases between California and Washington. This
difference could be due to a number of factors, including population size, population
density, and public health measures.