Data Analysis & Research Design Evaluation 2

Please read and sign the Declaration below.

As I type my name below, I admit that I am submitting my own work. I have answered all the questions and conducted all the analyses myself independently. I understand that all forms of plagiarism, cheating and collusion are regarded seriously by the University and could result in penalties including failure and possible exclusion from the University.

Your full name: Naomi Yuen
Student ID: 19896638
Date: 22/05/21

You must provide relevant SPSS output for all data analysis questions; no relevant output = zero marks.

Question 1 (18 marks; 3 marks for each part)
As a health and fitness enthusiast, you are interested in the effects of regularly eating kale (also known as leaf cabbage), even though you don't particularly like the taste of your daily kale smoothie. A close friend reports an improvement in their depression with the consumption of kale. You are just about to finish EPID1000, a unit you were skeptical about in the beginning because numbers scared you, but now you realise there is always a story behind numbers, and you appreciate the knowledge and skills this unit has provided, along with practice opportunities in lab and tutorial exercises and even the first assessment. You now feel confident you can plan a hypothetical research study to explore whether regular consumption of kale really improves or even prevents depression. You consider various study designs (as below): what each would involve in terms of participants and processes, what measures of association and effect each can provide, what potential biases and other (confounding) factors you should be aware of, and how these could be minimised or even eliminated.

You should use the above description and the points provided in the review exercise (Tutorial 11) to answer this question. Please also revisit the epidemiology iLectures, which cover each design on a weekly basis.
a) Cross-sectional study
Each individual's depression status and kale consumption are measured at the same single point in time, to explore whether an association exists between exposure (kale) and outcome (depression level). To plan, define the IV (kale consumption), the DV (depression level), and extraneous and confounding variables, and decide how to collect the information. A questionnaire/survey can collect all relevant information, such as depression score (measured as a score on a depression scale; ratio variable) and kale consumption (very regularly, regularly, irregularly, never; ordinal variable). A representative, willing sample of the general population (to minimise selection bias) will be recruited to answer the questions, and data will be collected once. Information bias may occur and may be minimised by asking precise, standardised questions. Because this design involves no randomisation, confounding can be minimised at the analysis stage through stratification or statistical adjustment. The prevalence ratio (prevalence of depression, defined by a cut-off on the depression scale, among regular vs non-regular kale consumers) is used as the measure of association and effect.

b) Case-control study
Two groups of participants need to be recruited: cases (with depression) and controls (without depression). Participants are selected on the basis of having or not having the condition of interest (depression), and the two groups should be similar in all regards other than the condition. Participants should be chosen from the general population (to reduce the chance of selection bias). Essential data such as past kale consumption and confounding factors such as genes and lifestyle will need to be collected. Both groups will be asked the same questions in the same way and their previous exposures compared; otherwise, systematic differences in data collection can bias the results. Recall and information bias may also occur. Matching can be used to at least minimise confounding. The measure of association and effect is the odds ratio, and statistical significance can be tested with a chi-square test and the 95% CI.

c) Prospective cohort study
Disease-free participants are identified and followed for a specified period of time. Identify and define the cohort: depression-free individuals from the general population at the start, with varying levels of kale consumption; those who already have depression are excluded. Data on exposure (kale consumption) and confounding variables (e.g., age, major life events, medication) must be collected before data on the outcome (depression). All participants are followed up for a set period, and all new cases of depression (yes/no; nominal) are documented. The cumulative incidence (risk) of depression is compared between those with high and low kale consumption. Because exposure is recorded before the outcome occurs, recall bias is less of a concern than in retrospective designs, but the study is prone to selection bias (including loss to follow-up) and information bias, which may be minimised by blinding and identical data collection methods. The relative risk (RR) and its 95% CI can be used as the measure of association and effect, and Pearson's chi-square test can be used to establish statistical significance.

d) Retrospective cohort study
Existing records are used to identify past exposure (kale consumption), and records from a few years later are used to establish the outcome. To plan, identify and define a cohort through records (in this scenario, most likely past medical records that include information about kale consumption). Then check records from a few years later for any reports or diagnoses of depression (yes/no; binary nominal) and compare the risk of depression between those with high and low kale consumption. This design is prone to recall, selection and information bias, which may be hard to avoid because the records were collected for other purposes. The RR and its 95% CI can be used as the measure of association and effect, with a chi-square test providing the p value for statistical significance.

e) Quasi-experimental study
This design resembles an experimental design, but there is no randomisation of study participants. Both groups have their depression scores recorded at baseline and again at the end.
A roughly equal number of patients with depression from a random selection of mental health facilities will be allocated to two groups without randomisation: in each facility, the first 30 patients will consume kale, while the next 30 patients will not. Any difference between the groups at the end is attributed to the intervention. Selection bias may occur because group assignment is not random. Susceptibility to confounding is also high and cannot be fully controlled. To assess statistical significance, the p value and 95% CI can be used. If depression is also recorded as present/absent at the end, relative risk can be used as the measure of association and effect.

f) Randomised controlled trial
This is a type of intervention (experimental) study. Participants will be chosen if they have the condition of interest (depression) and provide consent. Baseline measurements are recorded for the sample (pre-test). Participants are then randomly allocated to either the experimental (intervention) group or the control (no intervention) group. Participants in the experimental group consume kale, while the control group does not. Both groups are followed up, and the outcome (depression level; ratio) is recorded (post-test) and compared to see whether there are any differences between them; an independent samples t test can be used for this. Confounding and selection bias can be minimised with randomisation, blinding and identical data collection methods. To assess statistical significance, the p value and 95% CI can be used. If depression is also recorded as present/absent, relative risk can be used as the measure of association and effect.

Note: Please be specific. No marks will be given for writing general design features from the iLectures. We want you to 'apply' your knowledge of study designs in light of the given scenario.

Formatting & line limit requirements: For each of the six parts, please use Times New Roman size 12 font, normal page borders and a maximum of 10 lines for each part (10 lines DO NOT mean ten sentences or ten statements).
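The measures of association named in parts (a) to (f) can all be computed from a 2x2 table of kale exposure by depression status. The Python sketch below is illustrative only and assumes a hypothetical table layout (the `TwoByTwo` class and the function names are not from SPSS or the unit materials). It computes the prevalence/risk ratio and the odds ratio with log-scale 95% CIs (Katz and Woolf methods) and a Pearson chi-square test without continuity correction. The counts in the example are made up and are not study data.

```python
import math
from dataclasses import dataclass

Z_95 = 1.959964  # two-sided 95% critical value of the standard normal


@dataclass
class TwoByTwo:
    """Hypothetical layout: rows = kale exposure (yes/no), columns = depression (yes/no).

    Assumes no zero cells (otherwise the log-scale CIs are undefined).
    """
    a: int  # exposed, depressed
    b: int  # exposed, not depressed
    c: int  # unexposed, depressed
    d: int  # unexposed, not depressed


def risk_ratio(t: TwoByTwo) -> tuple[float, float, float]:
    """Risk ratio (cohort) or prevalence ratio (cross-sectional) with Katz log-scale 95% CI."""
    p_exposed = t.a / (t.a + t.b)
    p_unexposed = t.c / (t.c + t.d)
    rr = p_exposed / p_unexposed
    se = math.sqrt(1 / t.a - 1 / (t.a + t.b) + 1 / t.c - 1 / (t.c + t.d))
    return rr, math.exp(math.log(rr) - Z_95 * se), math.exp(math.log(rr) + Z_95 * se)


def odds_ratio(t: TwoByTwo) -> tuple[float, float, float]:
    """Odds ratio (case-control) with Woolf log-scale 95% CI."""
    or_ = (t.a * t.d) / (t.b * t.c)
    se = math.sqrt(1 / t.a + 1 / t.b + 1 / t.c + 1 / t.d)
    return or_, math.exp(math.log(or_) - Z_95 * se), math.exp(math.log(or_) + Z_95 * se)


def pearson_chi_square(t: TwoByTwo) -> tuple[float, float]:
    """Pearson chi-square for a 2x2 table (no continuity correction) and its p value on 1 df."""
    n = t.a + t.b + t.c + t.d
    numerator = n * (t.a * t.d - t.b * t.c) ** 2
    denominator = (t.a + t.b) * (t.c + t.d) * (t.a + t.c) * (t.b + t.d)
    chi2 = numerator / denominator
    # With 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not data from any study.
    table = TwoByTwo(a=20, b=80, c=30, d=70)
    print("RR/PR (95%% CI): %.2f (%.2f to %.2f)" % risk_ratio(table))
    print("OR (95%% CI): %.2f (%.2f to %.2f)" % odds_ratio(table))
    print("chi-square = %.2f, p = %.3f" % pearson_chi_square(table))
```

The same `risk_ratio` function serves as a prevalence ratio in the cross-sectional design and as a relative risk in the cohort designs; only the sampling, and hence the interpretation, differs. For a case-control study only `odds_ratio` is meaningful, because the proportion of cases is fixed by the design.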
These formatting requirements are for Q1 ONLY; the rest of the questions are free from any formatting requirements.

For Q2, Q3, Q4, Q5, Q6, Q7, Q8 and Q9 you will conduct ALL analyses using your own random sample of 50 (video and instructions posted separately). NO MARKS will be awarded if you used the entire dataset (N = 700) instead of your own sample of 50, or if you do not provide the relevant SPSS output. NO SPSS output = no mark. You can choose a different sample of 50 for every question or use the same sample of 50 for all questions; it is up to you (please read the FAQs for more questions and answers).

Question 2 (7 marks)
Write null and non-directional alternative hypotheses that can be tested with an independent samples t test. Briefly describe, and report on, the equality of variance assumption. Provide a complete interpretation of your results, including their statistical and practical significance. You need to provide the working out for practical significance using the 'average SD' as explained in the relevant iLecture.

H0: There is no difference in the mean number of hours exercised per week between males and females in the population.
H0: µ1 (males) = µ2 (females)
H1: There is a difference in the mean number of hours exercised per week between males and females in the population.
H1: µ1 (males) ≠ µ2 (females)

Levene's test assesses whether the two groups have equal variances. As Levene's p = .771 (> .05), the equality of variance assumption is not violated; in other words, the two gender groups are roughly similar in the variability of their exercise hours. The mean number of hours exercised per week is 0.26 hours higher for females than for males (6.28 vs 6.02). However, since p > .05 and the 95% confidence interval of the mean difference includes 0 (-1.24 to 0.71), there is no evidence of a real difference between the two groups, and the observed difference is most likely due to chance alone. Thus, the null hypothesis is retained, and we conclude that the difference is not statistically significant.
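Question 2 asks for the practical-significance working using the 'average SD'. A minimal Python sketch follows, assuming that 'average SD' means the simple mean of the two group standard deviations (worth checking against the iLecture; a pooled SD weighted by degrees of freedom is a common alternative). The function names are hypothetical, and the labels use Cohen's conventional benchmarks of 0.2 (small), 0.5 (medium) and 0.8 (large).

```python
import statistics


def cohens_d_from_summary(mean1: float, mean2: float, sd1: float, sd2: float) -> float:
    """Cohen's d = |mean1 - mean2| / average SD, with average SD = (sd1 + sd2) / 2.

    Assumption: 'average SD' is the simple mean of the two group SDs.
    """
    average_sd = (sd1 + sd2) / 2
    return abs(mean1 - mean2) / average_sd


def cohens_d_from_data(group1: list[float], group2: list[float]) -> float:
    """Same calculation from raw scores, using sample SDs (n - 1 denominator)."""
    return cohens_d_from_summary(
        statistics.mean(group1),
        statistics.mean(group2),
        statistics.stdev(group1),
        statistics.stdev(group2),
    )


def describe_effect(d: float) -> str:
    """Label |d| using Cohen's conventional benchmarks (0.2 small, 0.5 medium, 0.8 large)."""
    size = abs(d)
    if size < 0.2:
        return "negligible (below small)"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


if __name__ == "__main__":
    # d values reported in the answers to Q2 and Q5
    for d in (0.16, 1.54):
        print(f"d = {d:.2f}: {describe_effect(d)}")
```

Back-solving from the figures reported for Q2 (mean difference 0.26 hours, d = 0.16) implies an average SD of roughly 1.6 hours; the two group SDs from the SPSS Group Statistics table would be passed to `cohens_d_from_summary` to show the full working.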
To assess practical significance, Cohen's d is used (the mean difference divided by the average SD of the two groups). The two groups differ by 0.16 standard deviations, so the difference in weekly exercise hours between males and females is very small (below Cohen's 0.2 benchmark for a small effect).

Question 3 (3 marks)
Are people more willing to travel inter-state (i.e., within Australia) for holidays than to travel overseas for holidays in the near future? Choose a suitable test to answer this question and provide a short description of your data analysis, including the statistical significance of your findings. No need to list or test any assumptions.

A Wilcoxon signed-rank test was conducted to investigate whether people were more willing to travel inter-state for holidays than to travel overseas for holidays. There were 37 negative ranks, meaning 37 participants were more willing to travel within Australia than overseas; 9 participants (positive ranks) were more willing to travel overseas than within Australia; and 4 participants rated both equally (ties). As p < .05, travelling within Australia was rated significantly more favourably than travelling overseas, and we conclude that there is a statistically significant difference between willingness to travel inter-state and willingness to travel overseas for holidays in the near future.

Question 4 (3 marks)
Are pet owners more willing to travel inter-state (i.e., within Australia) for holidays than those who do not currently have any pets? Choose a suitable test to answer this question and provide a short description of your data analysis, including the statistical significance of your findings. No need to list or test any assumptions.

A Mann-Whitney U test was performed to investigate whether willingness to travel inter-state was significantly different between pet owners and non-pet owners.
The mean ranks in the Ranks table are higher for pet owners (mean rank = 27.24, n = 31) than for non-pet owners (mean rank = 22.66, n = 19), suggesting that pet owners tended to be somewhat more willing to travel inter-state. However, since p = .251 (2-tailed), the difference in willingness to travel inter-state between pet owners and non-pet owners is not statistically significant, and any difference is most likely due to chance alone.

Question 5 (3 marks)
Were people generally happier before the COVID pandemic than during the pandemic? Choose a suitable test to answer this question. Carry out the appropriate analyses and write a short summary of your results and conclusion. No need to list or test any assumptions.

A paired samples t test was conducted to compare happiness scores before the pandemic (control condition) and during the pandemic (experimental condition). Mean happiness was 35.79 before and 24.86 during the pandemic, a mean difference of 10.94. We are 95% confident that the true mean difference in happiness scores in the population could be as low as 7.99 or as high as 13.89. As p < .001 and the 95% confidence interval for the mean difference does not include zero, we conclude that happiness scores are significantly lower during the pandemic (experimental condition) and that this difference is unlikely to be due to chance alone. To assess practical significance, Cohen's d is calculated: mean happiness scores are separated by 1.54 standard deviations, which is a large and meaningful effect (above Cohen's 0.8 benchmark). People were generally happier before the COVID pandemic than during the pandemic.

Question 6 (3 marks)
Choose only those who are 26 years and younger and test the hypothesis that they come from a population in which the average daily sleep time is 5.5 hours per week. Write null and alternative hypotheses.
Choose a suitable test, carry out the appropriate analyses and write a short summary of your results, including their statistical significance. No need to list or test any assumptions.

H0: This sample represents a population of people aged 26 years and younger whose average daily sleep time is 5.5 hours.
H0: µ (daily sleep time) = 5.5
HA: This sample represents a population of people aged 26 years and younger whose average daily sleep time is not 5.5 hours.
HA: µ (daily sleep time) ≠ 5.5

A one-sample t test was conducted. Based on a random sample of 22 participants, the average daily sleep time is 7.49 hours, 1.99 hours above the test value. As p = .001 and the 95% CI for the mean difference does not include zero (0.97 to 3.02), we reject the null hypothesis and conclude that the average daily sleep time in this population is significantly more than 5.5 hours and that this difference is unlikely to be due to chance. Although 7.49 hours is the sample (point) estimate, we are 95% confident that the true average daily sleep time of those aged 26 years and younger is as low as 0.97 or as high as 3.02 hours above the hypothesised value of 5.5 hours.

Question 7 (4 marks)
Test the hypothesis that there is no significant difference in Resting Energy Expenditure among those who Strongly Agree, Agree, Disagree and Strongly Disagree with the statement 'Weekly quizzes helped me to keep up with the content I need to learn every week'. Choose a suitable test to answer this question. Carry out the appropriate analyses and write a short summary of your results and conclusion. Report only on the assumption that you can report at the analysis stage from the SPSS output.

Homogeneity of variance was tested: since Levene's p = .639, the assumption is not violated, as the variability of resting energy expenditure scores is roughly similar across all four groups. A one-way ANOVA was conducted to test the hypothesis.
Mean resting energy expenditure was highest in group D (strongly disagree; 1390.33), followed by group A (strongly agree; 1368.44) and group B (agree; 1245.20), with group C (disagree) having the lowest mean (1123.40). As p = .260 (> .05), we retain the null hypothesis: there is no statistically significant difference between the group means, and the observed differences are most likely due to chance.

Question 8 (3 marks)
Is there any significant difference in text messaging behaviour while driving among people with different pet preferences? Choose a suitable test to answer this question. Carry out the appropriate analyses and write a short summary of your results and conclusion. No need to list or test any assumptions.

A Kruskal-Wallis test was conducted to investigate whether text messaging behaviour while driving differed significantly between people with different pet preferences. Those who prefer reptiles texted most frequently (mean rank 29.88), followed by those who prefer 'other' (mean rank 26.89). Those who prefer dogs had a mean rank of 20.50, while those who prefer cats texted the least (mean rank 19.83). As p = .381, we conclude that there is no significant difference in texting frequency by pet preference, and any difference is probably due to chance alone.

Question 9 (1 mark)
Participants were asked: if only one wish could come true, what would that wish be? Every participant provided one wish that was important to them, and some wishes were more common than others. Researchers collated all the responses, and the top five responses were coded (Variable: Top Wish; values/codes A, B, C, D and E). Your task is to use your discretion, creativity or even imagination to assign each code a hypothetical wish, label or description; make up whatever you think these are, can be or should be (e.g., 'Assessments become an extinct species' or 'My two cats stop trying to kill each other!').
Please provide a suitable graph that shows the label/description for each of the five codes as wishes. There will be no mark if you simply provide a suitable graph without the five wishes (no labels/descriptions = no mark).

Bar graph

(Relevant SPSS output not provided under each question with the written answer = no mark. This applies to Qs 2-9.)

'FAQs' are posted; please refer to those before you post any question. Thank you, everyone, for posting your questions on the Discussion Board or taking them to Open Collaborate sessions for the first submission. The same rules apply to this final submission. To be fair to everyone, emails will not be responded to. We wish you our very best.