Uploaded by Naomi Yuen

Data Analysis & Research Design Evaluation Assignment

Data Analysis & Research Design Evaluation 2
Please read and sign the Declaration below.
As I type my name below, I admit that I am submitting my own work. I have answered
all the questions and conducted all the analyses myself independently. I understand that
all forms of plagiarism, cheating and collusion are regarded seriously by the University
and could result in penalties including failure and possible exclusion from the
University.
Naomi Yuen
Your full name
19896638
Student ID
22/05/21
Date
You must provide relevant SPSS output for all data analyses questions;
No relevant output = Zero mark
Question 1 (18 marks; 3 marks for each part): As a health and fitness enthusiast you are
interested in the effects of regularly eating kale (also known as leaf cabbage), even though you
don’t particularly like the taste of your daily kale smoothie. A close friend reports an
improvement in their depression with the consumption of kale. Now that you are just about to
finish EPID1000, a unit you were skeptical about in the beginning and felt scared of numbers
but now you realize there is always a story behind numbers and you appreciate the knowledge
and skills this unit has provided to you along with practice opportunities with lab & tutorial
exercises and even with first assessment! You now feel confident you can plan a hypothetical
research study to explore if regular consumption of kale really improves or even prevents
depression. You consider various study designs (as below) and what each would involve in
terms of participants and processes, what measures of association and effect each can provide
and what could be potential bias and other (confounding) factors you should be aware of & how
these could be minimized or even eliminated.
You should use the above description and points provided in the review exercise (tutorial 11) to
answer this question. Please also re-visit the epidemiology ilectures which cover each design on
weekly basis.
a) Cross-sectional study
Each individual's outcome (depression present or absent) and exposure (kale consumption) are
measured at the same point in time, so the study explores whether the two are associated but cannot
show which came first. To plan, define the IV (kale consumption), the DV (depression level) and the
likely extraneous and confounding variables, then decide how to collect the information. A
questionnaire/survey can collect the depression score (ratio variable) and kale consumption (very
regularly, regularly, irregularly, never; ordinal variable). A representative and willing sample of the
general population (to minimise selection bias) will be recruited and data will be collected once.
Information bias may occur, which may be minimised by asking precise questions. Because exposure
is not assigned by the researcher, confounders cannot be removed by randomisation; they can be
reduced by measuring them and stratifying or adjusting in the analysis. The prevalence ratio is used
as the measure of association and effect.
b) Case Control study
Two groups of participants need to be recruited: cases (with depression) and controls (without
depression). Subjects are selected on the basis of having or not having the condition of interest
(depression) and should otherwise be as similar as possible. Both groups should be drawn from the
same general population (to reduce the chance of selection bias). Data on past kale consumption and
on potential confounders such as genes and lifestyle will need to be collected. Both groups are asked
the same questions in the same way and their previous exposures are compared; any systematic
difference in how the groups are questioned can bias the results. Recall and information bias may
also occur, since past diet is reported from memory. Matching can be used to minimise confounding.
The odds ratio is the measure of association and effect. Statistical significance can be tested with a
Chi-square test and the 95% CI of the odds ratio.
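The odds ratio and its 95% CI described in part b) can be sketched as follows. This is a minimal illustration, assuming the usual log-based (Woolf) interval; the function name and the 2x2 counts are hypothetical and are not taken from any data in this assignment.

```python
import math

def odds_ratio_with_ci(exposed_cases, unexposed_cases,
                       exposed_controls, unexposed_controls, z=1.96):
    """Return the odds ratio and an approximate 95% CI (log/Woolf method)."""
    odds_ratio = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)
    # Standard error of ln(OR) is the root of the summed reciprocal cell counts
    se_log_or = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                          + 1 / exposed_controls + 1 / unexposed_controls)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts: 20 of 100 cases and 40 of 100 controls ate kale regularly
or_value, or_lower, or_upper = odds_ratio_with_ci(20, 80, 40, 60)
print(f"OR = {or_value:.3f}, 95% CI {or_lower:.2f} to {or_upper:.2f}")
```

With these made-up counts the whole interval sits below 1, which would be read as lower odds of regular kale consumption among cases than among controls.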
c) Prospective Cohort study.
Disease-free participants are identified & followed for a specified period of time. Identify and
define the cohort: depression-free individuals from the general population at the start, with
varying levels of kale consumption. Those that already have the condition are excluded. Data
on exposures (e.g., kale) and confounding variables (e.g., age, major events, medication, etc)
must be collected before data on the outcome (depression). All undergo a follow up for a set
period of time and all cases of depression (yes/no; nominal) will be documented. Cumulative
incidence/risk of depression is compared between those with high and low kale consumption.
It is prone to selection, recall, and information bias, which may be minimised by blinding and
identical data collection methods. RR and its 95% CI can be used as the measure of association
and effect. Pearson's Chi-square test can be used to establish statistical significance.
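The relative risk comparison in part c) can be sketched in the same spirit. This is an illustration only, assuming the standard log-based interval for a risk ratio; the function name and the cohort counts are hypothetical.

```python
import math

def relative_risk_with_ci(cases_high, total_high, cases_low, total_low, z=1.96):
    """Return RR (high vs low kale consumption) and an approximate 95% CI."""
    risk_high = cases_high / total_high   # cumulative incidence, high consumption
    risk_low = cases_low / total_low      # cumulative incidence, low consumption
    rr = risk_high / risk_low
    # Standard error of ln(RR)
    se_log_rr = math.sqrt(1 / cases_high - 1 / total_high
                          + 1 / cases_low - 1 / total_low)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Hypothetical follow-up: 15 of 200 high consumers and 30 of 200 low consumers
# develop depression
rr_value, rr_lower, rr_upper = relative_risk_with_ci(15, 200, 30, 200)
print(f"RR = {rr_value:.2f}, 95% CI {rr_lower:.2f} to {rr_upper:.2f}")
```

An RR below 1 whose interval excludes 1 would point towards a protective association, subject to the biases and confounders listed above.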
d) Retrospective Cohort study.
Existing records are used to identify past exposure (kale consumption), and records from a few
years later are used to establish the outcome. To plan, identify and define a cohort through
records (in this scenario, most likely past medical records that include information about kale
consumption). Then check records dated a few years later to see whether there are any reports
or diagnoses of depression (yes/no; binary nominal) and use relative risk to compare depression
cases between those with high and low kale consumption. This study is prone to recall,
selection, and information bias, which are hard to avoid because the data were not collected
for this purpose. RR and its 95% CI can be used as the measure of association and effect, and
a Chi-square test gives the p value for statistical significance.
e) Quasi-experimental study.
Resembles an experimental design but there is no randomisation of study subjects. Both groups
have their scores recorded at baseline and again at the end. Roughly equal numbers of patients
suffering from depression, drawn from a random selection of mental health facilities, will be
allocated to two groups without randomisation: in each mental health facility, the first 30
patients will consume kale, while the next 30 patients will not. Any difference in outcome is
attributed to the intervention. Selection bias may occur due to the non-random group
assignment. Susceptibility to confounding is also high and cannot be controlled by design.
To assess statistical significance, the p value and 95% CI can be used. Relative risk is used as
the measure of association and effect.
f) Randomized controlled trial
Type of intervention/experimental study. Participants will be chosen if they have the condition
of interest (depression) and provide consent. A sample is obtained and baseline measurements
are recorded (pre-test). Participants are randomly divided into either the
experimental/intervention group or the control/no intervention group. Participants in the
experimental group consume kale while the control group do not. The two groups are followed
up and the outcome is recorded (post-test) and compared to see if there are any differences
between them in outcome (depression level; ratio). An independent samples t-test can be used
for this. Confounding and selection bias can be minimised with randomisation, and information
bias with blinding and identical data collection methods. To assess statistical significance, the
p value and 95% CI can be used. Relative risk can be used as the measure of association and effect.
Note: Please be specific. No marks will be given for writing general design features from the
ilectures. We want you to ‘apply’ your knowledge of study designs in light of the given
scenario.
Formatting & Line limit Requirements: For each of the six parts please use
Times New Roman size 12 font, normal page borders and 10 lines maximum for
each part, (10 lines DO NOT mean ten sentences or ten statements). These
formatting requirements are for Q1 ONLY, rest of the questions are free from any
formatting requirements.
For Q2, Q3, Q4, Q5, Q6, Q7, Q8 and Q9 you will conduct ALL analyses using your own random
sample of 50 (Video & instructions posted separately).


NO MARKS will be awarded:
If you used the entire dataset (N=700) instead of your own sample of 50.
If you do not provide the relevant SPSS output. NO SPSS Output = No Mark
You can choose a different sample of 50 for every question or use the same sample of 50 for all
questions; it is up to you (Please read FAQs for more questions & answers).
Question 2 (7 marks)
Write null and a non-directional alternative hypotheses that can be tested with an Independent
Samples t test. Briefly describe, and report on the equality of variance assumption. Provide
complete interpretation of your results including their statistical and practical significance. You
need to provide the working out for practical significance using ‘average SD’ as explained in the
relevant ilecture.
H0: There is no significant difference in the number of hours exercised per week between
males and females. H0: µ1 (males) = µ2 (females)
H1: There is a significant difference in the number of hours exercised per week between males
and females. H1: µ1 (males) ≠ µ2 (females)
As the p value from Levene's test is .771 (> .05), the equality of variance assumption is met. In other
words, the two gender groups are roughly similar in the variability of exercise hours. The average
number of hours exercised per week is 0.26 higher for females than for males (6.28 vs 6.02).
However, since p > .05 and the 95% confidence interval of the mean difference includes 0 (-1.24 to
.71), there is no evidence of a real difference between the two groups, and the observed difference is
most likely due to chance factors alone. Thus, the null hypothesis is retained, and we conclude that
the difference is not statistically significant.
To assess practical significance, Cohen's d is used:
The two groups differ by .16 standard deviations, which is a very small difference in the number of
hours exercised per week.
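The working for Cohen's d did not survive in this copy, so the sketch below shows how the 'average SD' version is usually computed. The group SDs are not reported above; the code therefore back-calculates the average SD implied by the reported means (6.02 and 6.28) and d = .16. That implied value is an inference from the reported figures, not SPSS output, and the function names are hypothetical.

```python
def cohens_d_average_sd(mean_1, mean_2, sd_1, sd_2):
    """Cohen's d using the simple average of the two group SDs."""
    average_sd = (sd_1 + sd_2) / 2
    return abs(mean_1 - mean_2) / average_sd

def implied_average_sd(mean_1, mean_2, d):
    """Average SD that would reproduce a reported d for two reported means."""
    return abs(mean_1 - mean_2) / d

# Reported in the answer: male mean 6.02, female mean 6.28, d = .16
avg_sd = implied_average_sd(6.02, 6.28, 0.16)
print(round(avg_sd, 2))  # implied average SD, roughly 1.6 hours

# Feeding the implied SD back in recovers the reported effect size
d_check = cohens_d_average_sd(6.02, 6.28, avg_sd, avg_sd)
print(round(d_check, 2))
```

With the actual SDs from the Group Statistics table, calling `cohens_d_average_sd` directly gives the working the question asks for.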
Question 3 (3 marks)
Are people more willing to travel inter-state (i.e. within Australia) for holidays than to travel
overseas for holidays in near future?
Choose a suitable test to answer this question and provide a short description of your data
analysis including statistical significance of your findings. No need to list or test any
assumptions.
A Wilcoxon signed-rank test was conducted to investigate whether people were more willing to travel
inter-state for holidays than to travel overseas for holidays. There were 37 negative ranks, meaning
37 participants were more willing/likely to travel within Australia than to travel overseas; 9
participants were more willing/likely to travel overseas than within Australia, and 4 participants
rated both equally (ties). As the p value is < .05, we conclude that there is a statistically significant
difference between willingness to travel inter-state and willingness to travel overseas for holidays in
the near future, with travel within Australia rated more favourably.
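How the negative ranks, positive ranks and ties arise can be sketched as below. This is a simplified illustration, assuming a normal approximation without tie or continuity correction (SPSS output may differ slightly); the ratings are hypothetical and the function names are not from any library.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def wilcoxon_signed_rank(first, second):
    """Return (positive rank sum, negative rank sum, z, two-tailed p)."""
    diffs = [b - a for a, b in zip(first, second) if b != a]  # ties are dropped
    ranks = average_ranks([abs(d) for d in diffs])
    w_pos = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_neg = sum(r for d, r in zip(diffs, ranks) if d < 0)
    n = len(diffs)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_pos, w_neg) - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))
    return w_pos, w_neg, z, p

# Hypothetical willingness ratings (1 = low, 5 = high) for eight people
interstate = [5, 4, 5, 3, 4, 5, 2, 4]
overseas = [3, 2, 4, 3, 1, 2, 4, 3]
# Differences are overseas minus inter-state, so a negative rank means the
# person was more willing to travel inter-state
w_pos, w_neg, z, p = wilcoxon_signed_rank(interstate, overseas)
print(w_pos, w_neg, round(z, 2), round(p, 3))
```

With only a handful of pairs the normal approximation is rough; it is used here only to keep the sketch short.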
Question 4 (3 marks)
Are pet owners more willing to travel inter-state (i.e. within Australia) for holidays than those
who do not currently have any pets?
Choose a suitable test to answer this question and provide a short description of your data
analysis including statistical significance of your findings. No need to list or test any
assumptions
A Mann-Whitney U test was performed to investigate whether willingness to travel inter-state
differed significantly between pet owners and non-pet owners. Mean ranks in the Ranks table show
a higher rank for pet owners (Mean rank = 27.24, n = 31) than for non-pet owners (Mean rank =
22.66, n = 19), which suggests pet owners in this sample were somewhat more willing to travel
inter-state. However, since p = .251 (2-tailed), we conclude that the difference in willingness to
travel inter-state between pet owners and non-pet owners is not statistically significant, and that
any difference between those that own a pet and those that do not is most likely due to chance
factors alone.
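The reported mean ranks can be sanity-checked with a short calculation. The sketch below rebuilds the rank sums and U from the figures in the answer (27.24 with n = 31, 22.66 with n = 19) and applies a normal approximation with no tie correction; that simplification is an assumption here, so the p value it gives is only expected to land near, not on, the SPSS value.

```python
import math

def mann_whitney_from_mean_ranks(mean_rank_1, n_1, mean_rank_2, n_2):
    """Return (total rank sum, U, z, two-tailed p) from reported mean ranks."""
    rank_sum_1 = mean_rank_1 * n_1
    rank_sum_2 = mean_rank_2 * n_2
    u_1 = rank_sum_1 - n_1 * (n_1 + 1) / 2
    u_2 = rank_sum_2 - n_2 * (n_2 + 1) / 2
    u = min(u_1, u_2)
    mean_u = n_1 * n_2 / 2
    sd_u = math.sqrt(n_1 * n_2 * (n_1 + n_2 + 1) / 12)  # no tie correction
    z = (u - mean_u) / sd_u
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2))
    return rank_sum_1 + rank_sum_2, u, z, p_two_tailed

total_rank_sum, u_value, z_value, p_value = mann_whitney_from_mean_ranks(27.24, 31, 22.66, 19)
# For N = 50 the ranks 1..50 must sum to 50 * 51 / 2 = 1275
print(round(total_rank_sum, 2), round(u_value, 2), round(z_value, 2), round(p_value, 2))
```

The rank sums add up to about 1275, as they should for 50 participants, and the uncorrected p value is well above .05, in line with the non-significant result reported. A tie correction, which matters for rating-scale data, would plausibly account for the gap to p = .251.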
Question 5 (3 marks)
Were people generally happier before the COVID pandemic compared to their happiness during
the pandemic?
Choose a suitable test to answer this question. Carry out the appropriate analyses and write a
short summary of your results and conclusion. No need to list or test any assumptions
Mean happiness scores differed between the two time points, before and during the pandemic
(35.79 vs 24.86), with a mean difference of 10.94. We are 95% confident that the true mean
difference in happiness score could be as low as 7.99 or as high as 13.89 in the population. As
the p value is < .001 and the 95% confidence interval for the mean difference does not include
zero, we can conclude that the happiness score is significantly lower during the pandemic than
before it, and this difference is unlikely to be due to chance factors alone.
To test practical significance, Cohen’s d is calculated:
Mean happiness scores are separated by 1.54 standard deviations, which is a large and
meaningful effect. People were generally happier before the COVID pandemic compared to
their happiness during the pandemic.
Question 6 (3 marks)
Choose only those who are 26 years and younger and test the hypothesis that they come from a
population in which average daily sleep time is 5.5 hours per week.
Write null and alternative hypotheses.
Choose a suitable test and carry out the appropriate analyses and write a short summary of
your results including their statistical significance No need to list or test any assumptions.
H0: This sample represents a population where the average daily sleep time per week is 5.5 hours
for those 26 years and younger. µpopulation daily sleep time = 5.5
HA: This sample represents a population where the average daily sleep time per week is not 5.5
hours. µpopulation daily sleep time ≠ 5.5
Based on a random sample of 22 participants, the average daily sleep time is 7.49 hours, which is
1.99 hours above the test value. As the p value is .001 and the 95% CI for the mean difference does
not include zero (0.97 to 3.02), we reject the null hypothesis and conclude that the average daily
sleep time in this population is significantly more than 5.5 hours and this difference is unlikely to be
due to chance factors.
Although 7.49 (which is 1.99 above 5.5) is the sample/point estimate of average sleep time, we are
95% confident that the true difference in this population of those aged 26 years and younger can be
as low as 0.97 or as high as 3.02 hours above the hypothesized test value of 5.5, and as this interval
does not include zero, the difference is statistically significant.
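The reported interval can be checked against the reported p value. The sketch below assumes the analysis was a one-sample t test with n = 22 (so df = 21) and uses the tabled two-tailed 5% critical value of t for df = 21, about 2.080; the standard error, t statistic and SD it prints are reconstructed from the figures above, not copied from SPSS output.

```python
import math

def t_from_confidence_interval(mean_difference, ci_lower, ci_upper, t_critical):
    """Recover the standard error and t statistic from a reported 95% CI."""
    half_width = (ci_upper - ci_lower) / 2
    standard_error = half_width / t_critical
    return standard_error, mean_difference / standard_error

T_CRITICAL_DF21 = 2.080  # assumption: value taken from a standard t table

# Reported: sample mean 7.49, test value 5.5, 95% CI of the difference 0.97 to 3.02
standard_error, t_statistic = t_from_confidence_interval(7.49 - 5.5, 0.97, 3.02, T_CRITICAL_DF21)
implied_sd = standard_error * math.sqrt(22)
print(round(standard_error, 2), round(t_statistic, 2), round(implied_sd, 2))
```

A t statistic of roughly 4 on 21 degrees of freedom is consistent with the reported p value of about .001.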
Question 7 (4 marks)
Test the hypothesis that there is no significant difference in Resting Energy Expenditure
among those who Strongly Agree, Agree, Disagree and Strongly Disagree to the statement
Weekly quizzes helped me to keep up with the content I need to learn every week.
Choose a suitable test to answer this question. Carry out the appropriate analyses and write a
short summary of your results and conclusion. Report on only the assumption that you can
report at the analysis stage from the SPSS output.
Homogeneity of variance was tested:
Since Levene's p = .639 (> .05), the assumption is not violated: variability of resting energy
expenditure scores is roughly similar across all four groups.
A one-way ANOVA was conducted to test the hypothesis. Mean resting energy expenditure of group D
(strongly disagree) is highest (1390.33), followed by group A (strongly agree) with a mean of
1368.44. Group B (agree) has a mean of 1245.20, and group C (disagree) has the lowest mean
(1123.40). As p = .260 (> .05), we retain the null hypothesis: there is no statistically significant
difference in mean resting energy expenditure among the four groups.
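The F ratio behind a one-way ANOVA can be sketched as below. The data are hypothetical (they are not the sample of 50 used above) and the function name is illustrative; the p value would still come from SPSS or an F table.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]
    # Between-groups: how far each group mean sits from the grand mean
    ss_between = sum(len(group) * (mean - grand_mean) ** 2
                     for group, mean in zip(groups, group_means))
    # Within-groups: spread of scores around their own group mean
    ss_within = sum((x - mean) ** 2
                    for group, mean in zip(groups, group_means) for x in group)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical resting energy expenditure scores for four response groups
hypothetical_groups = [
    [1300, 1400, 1500],  # strongly agree
    [1200, 1250, 1300],  # agree
    [1100, 1150, 1200],  # disagree
    [1350, 1400, 1450],  # strongly disagree
]
f_ratio, df_between, df_within = one_way_anova(hypothetical_groups)
print(round(f_ratio, 2), df_between, df_within)
```

A large F means the group means differ by more than the within-group spread would suggest; a small F, as implied by p = .260 above, means they do not.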
Question 8 (3 marks)
Is there any significant difference in text messaging behavior while driving among people with
different pet preferences?
Choose a suitable test to answer this question. Carry out the appropriate analyses and write
a short summary of your results and conclusion. No need to list or test any assumptions
A Kruskal-Wallis test was conducted to investigate whether text messaging behaviour while
driving differed significantly between people with different pet preferences. Those that prefer
reptiles texted most frequently (mean rank 29.88), followed by those that prefer ‘other’ with a
mean rank of 26.89. Those that prefer dogs had a mean rank of 20.50, while those that prefer
cats texted the least (mean rank 19.83). As the p value is .381 (> .05), we conclude that there is
no significant difference in texting frequency by pet preference and any difference is
probably due to chance alone.
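The link between mean ranks and the Kruskal-Wallis statistic can be sketched as below. Group sizes are not reported in the answer, so the mean ranks and sizes here are hypothetical; the sketch also ignores the tie correction and uses the chi-square approximation with 3 degrees of freedom (four groups), which is rough for groups this small.

```python
import math

def kruskal_wallis_h(mean_ranks, group_sizes):
    """H statistic (no tie correction) from group mean ranks and sizes."""
    n_total = sum(group_sizes)
    expected_rank = (n_total + 1) / 2  # mean rank if groups did not differ
    spread = sum(n * (r - expected_rank) ** 2
                 for r, n in zip(mean_ranks, group_sizes))
    return 12 / (n_total * (n_total + 1)) * spread

def chi_square_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Hypothetical example: four groups of two people, ranks (1, 4), (2, 7), (3, 6), (5, 8)
kw_h = kruskal_wallis_h([2.5, 4.5, 4.5, 6.5], [2, 2, 2, 2])
kw_p = chi_square_sf_df3(kw_h)
print(round(kw_h, 2), round(kw_p, 2))
```

The further the group mean ranks sit from the overall mean rank, the larger H becomes and the smaller the p value.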
Question 9 (1 mark)
Participants were asked if only one wish could come true what would be that wish for them. Every
participant provided one wish that was important to them and some wishes were more common than
the others. Researchers collated all the responses and the top five responses were coded as (Variable: Top
Wish & values/codes A, B, C, D and E). Your task is to use your discretion, creativity or even
imagination to assign each code a hypothetical wish or a label or description, make up any whatever
you think these are/can be/or should be (e.g. Assessments become extinguished specie or My two cats
stop trying to kill each other!).
Please provide a suitable graph that shows labels/description for each of the five codes as
wishes. There will be no mark if you simply provide a suitable graph without five wishes (no
labels/description = no mark).
Bar graph
(Relevant SPSS output not provided (under each question) with written answer = No mark).
This applies to Qs2-9
‘FAQs’ are posted & please refer to those before you post any question.
Thank you everyone for posting your questions on the Discussion Board or taking
them to Open Collaborate sessions for the first submission. Same rules apply to this
final submission.
To be fair to everyone emails will not be responded to.
We wish you our very best.