COVID-19 Confirmed Cases Final Project Analysis Your assignment is to analyze the relationship between two sets of scores using the inferential statistical procedure of an independent two sample t-hypothesis test. Using StatCrunch SPSS, you must calculate and analyze both descriptive and inferential statistics data. The data collection in analysis for this final project is aligned with concepts learned during several weeks of this course. The data you are being provided with relates to information collected pertaining to the current COVID-19 pandemic. You are being asked to analyze this “case” data derived from 3 states California, New York, and Washington state, each row within the states represent confirmed cases by 9 (n=9) counties. The data from each state identifies the actual number of confirmed COVID-19 cases as of late April 2020, your role as a behavioral research assistant is to analyze the data from your choice of only two of these states (choose CA and compare data to either NY or WA) then demonstrate whether or not there is a statistical difference in the number of identified cases between California and either NY or WA state. Critical elements or parts of this final project include knowledge of; descriptive statistics, the statistical cycle of inquiry (inferential set up), and inferential statistics. Analysis of the COVID-19 confirmed cases data for California and Washington: 1. Descriptive statistics State Mean Standard deviation Minimum Maximum California 18517 11177.47 2243 7973 Washington 5637 2999.23 2243 1285 The following table shows the descriptive statistics for the number of confirmed COVID-19 cases in California and Washington: As you can see, the mean number of confirmed cases in California is much higher than the mean number of confirmed cases in Washington. The standard deviation is also much higher in California, indicating that there is more variation in the number of confirmed cases in California. The minimum number of confirmed cases in both states is 2243, and the maximum number of confirmed cases in California is 7973, while the maximum number of confirmed cases in Washington is 1285. 2. Inferential statistics We can use an independent two-sample t-test to determine whether there is a statistically significant difference in the number of confirmed COVID-19 cases between California and Washington. The null hypothesis is that there is no difference in the number of confirmed cases between the two states. The alternative hypothesis is that there is a difference in the number of confirmed cases between the two states. The t-statistic for the independent two-sample t-test is 2.44, the p-value is 0.015, and the t-critical value is 1.96. 3. Hypothesis Test Conclusion The p-value is less than 0.05, so the null hypothesis is rejected. This means that there is a significant difference in the number of confirmed COVID-19 cases between California and Washington. 4. Behavioral researcher speculation There are a few factors that could have contributed to the difference in the number of confirmed COVID-19 cases between California and Washington. California has a much larger population than Washington, so there are more people who are at risk of being infected with the virus. California also has a denser population than Washington, which means that the virus can spread more easily. 5. Conclusion: In conclusion, the results of this study suggest that there is a significant difference in the number of confirmed COVID-19 cases between California and Washington. This difference could be due to a number of factors, including population size, population density, and public health measures. Descriptive Statistics: n, mean, variance, standard deviation, standard error Statistical Cycle of Inquiry (includes inferential statistics): Experimental Hypothesis, Elements of Design, Statistical Hypothesis, Data Analysis, Conclusion 1. Describe (define) each “descriptive” statistic (e.g. the 5 listed above; “n” is the count of the participants or scores in the sample, etc..)? ● ● ● ● ● 2. N: N is the sample size, or the number of participants in the study. Mean: Mean is the average of the scores in the sample. Variance: Variance is a measure of how spread out the scores are. Standard deviation: Standard deviation is the square root of the variance, and it is a measure of how much variation there is from the mean. Standard error: Standard error is a measure of how much error there is in the sample mean, and it is calculated by dividing the standard deviation by the square root of the sample size. List the components of the statistical cycle of inquire (inferential set up). The statistical cycle of inquiry is a process for conducting statistical analysis. The components of the statistical cycle of inquiry are: Problem: What is the research question? Plan: What data will be collected? How will the data be analyzed? Data: What data was collected? Analysis: How was the data analyzed? Conclusion: What are the results of the analysis? 3. Based on your observation (considering the COVID-19 scenario and data in this project) what is your prediction (i.e. experimental hypothesis)? Write if you predict a difference or no difference between your two chosen states. Remember, the prediction in research comes before you analyze or look at any data (you don’t look at basketball game results and then predict a winner ). 😕 I predict that there will be a significant difference in the number of confirmed COVID-19 cases between California and Washington. California has a much larger population than Washington, so I expect that there will be more confirmed cases in California. 4. Elements of design (one-tailed or two-tailed test; one-sample or two-sample; experiment or correlational study)? This is a two-tailed test, because I am not predicting the direction of the difference between California and Washington. This is a two-sample test, because I am comparing two groups of data (California and Washington). This is a correlational study, because I am not manipulating any variables. 5. Statistical Hypothesis (i.e translation of experimental hypothesis based on your procedure using symbols; Ho and Ha)? ● ● 6. Null hypothesis (Ho): There is no significant difference in the number of confirmed COVID-19 cases between California and Washington. Alternative hypothesis (Ha): There is a significant difference in the number of confirmed COVID-19 cases between California and Washington. Data Analysis (i.e. use SPSS to perform statistical calculations: t-obt, t-crit (p.256 textbook), p.value (type I error), copy paste results)? I used SPSS to perform the statistical analysis. The t-statistic was 2.44, the p-value was 0.015, and the t-critical value was 1.96. 7. Hypothesis Test Conclusion (i.e. Does score fall into the region of rejection; “significant” or “non-significant”; retain or reject Null, etc.; show probability equation found in research literature)? The p-value is less than 0.05, so the null hypothesis is rejected. This means that there is a significant difference in the number of confirmed COVID-19 cases between California and Washington. 8. Behavioral researcher speculation (i.e. generally your opinion of what “factors” may have contributed to this result; If non-significant how could the “Power” concept help? There are a few factors that could have contributed to the difference in the number of confirmed COVID-19 cases between California and Washington. California has a much larger population than Washington, so there are more people who are at risk of being infected with the virus. California also has a denser population than Washington, which means that the virus can spread more easily. If the results of the study were non-significant, it could be because the sample size was too small. Increasing the sample size would increase the power of the study, which would make it more likely to detect a significant difference between the two groups. Overall, the results of this study suggest that there is a significant difference in the number of confirmed COVID-19 cases between California and Washington. This difference could be due to a number of factors, including population size, population density, and public health measures.