STA272 Group Assignment

Department of Statistics
2023/24 – Semester II
STA272 – STATISTICAL COMPUTING
GROUP PROJECT
Due: 09-May-2024 @ 16h00
Marks: 50
Instructions:
• Any work submitted late will be penalised as follows:
– work submitted after the 16h00 deadline but before midnight on the due date will attract a penalty of up to 10%
– work submitted a day late will attract a penalty of up to 25%
– work submitted two days late will attract a penalty of up to 50%
– work submitted any later will be awarded a zero mark.
• Cheating in any form is not allowed, and plagiarised work will be awarded a zero mark.
Data
This group project is based on an Excel file named Project Data, located in the Data
folder on the STA272 Moodle portal. The data consists of the following variables:
Variable                          Description                                                 Values
pid                               Participant ID
sex                               Sex of the participant                                      1 = Male, 2 = Female
age                               Age of the participant in years
marital_status                    Marital status                                              1 = Single/Never Married, 2 = Married, 3 = Divorced/Separated, 4 = Widowed
edu_level                         Highest level of education                                  1 = Primary, 2 = Secondary, 3 = Tertiary
weight                            Participant’s weight (in kg)
height                            Participant’s height (in cm)
systolic_bp_1, systolic_bp_2      Participant’s two measurements of systolic blood pressure in mmHg
diastolic_bp_1, diastolic_bp_2    Participant’s two measurements of diastolic blood pressure in mmHg
Objective
The aim of this project is to investigate whether the demographic and physical variables in the data are associated with hypertension status or with the actual blood pressure values. A participant is said to be hypertensive if their
systolic blood pressure (SBP) is 140 mmHg or above, or if their diastolic blood
pressure (DBP) is 90 mmHg or above. This will involve the data transformations
and basic statistical analyses that you have covered in this module.
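The classification rule above can be sketched in R as follows. This is only an illustration: the readings are made-up values, and the names `sbp`, `dbp`, `hypertensive` and `hypertensive_f` are assumptions rather than variables in the Project Data file.

```r
# Hypothetical readings for five participants (illustrative values only).
sbp <- c(118, 142, 135, 160, 139)  # systolic BP in mmHg
dbp <- c(76, 85, 92, 101, 89)      # diastolic BP in mmHg

# Hypertensive if SBP is 140 mmHg or above OR DBP is 90 mmHg or above.
hypertensive <- sbp >= 140 | dbp >= 90

# Store the result as a labelled factor so it tabulates cleanly.
hypertensive_f <- factor(hypertensive,
                         levels = c(FALSE, TRUE),
                         labels = c("Not hypertensive", "Hypertensive"))
table(hypertensive_f)
```

Note that the rule uses "or": a participant with a normal SBP but a DBP of 92 mmHg is still classified as hypertensive.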
Expectations
The following are the outputs you should submit as the final pieces of your work.
1. A written report in Word/LaTeX/R Markdown, which must be handed in on or before the due date at my office, 240/252. [30 marks]
2. All the R code used for data manipulation and analysis. This must be
submitted online via Moodle. [20 marks]
3. A single page outlining each group member’s contribution to the assignment.
Anyone whose contribution is deemed negligible will be awarded a zero mark
for no effort. Here’s what I expect from each team member:
• Clearly defined and measurable contributions: Each team member should
have specific tasks or deliverables assigned. These contributions should
be clearly outlined and have a measurable outcome to demonstrate their
impact on the project.
• Focus on expertise and value-added skills: The description should highlight how each member’s unique skills and knowledge will contribute to
the project’s success.
• Avoid mentioning generic tasks: Tasks like organizing meetings, providing a laptop, or simply being present don’t showcase an individual’s
specific value to the project.
Report
The written report should be structured as follows:
Introduction – What is the problem? A short and precise description of the
goal of the report. What is the structure of the report? Use ordinary, non-statistical language.
Methods – A systematic account of the methods you have used and why they were
chosen. For example, to test for association one can use a χ² test for association, but you still have to justify why it is suitable for your particular problem
and cite relevant supporting sources.
Data Analysis – Carry out some exploratory data analysis. That is, use charts
and graphs to explore your data, in particular the distributions of your continuous variables and boxplots relating the response variable to the predictor
variables.
Conclusions – A summary of the main things which have been learned in earlier parts of the report and what it all means. Use ordinary, non-statistical
language as much as possible.
The following questions are meant to guide your analysis approach and the write-up.
You should therefore not submit direct answers to these questions as your
output.
Q1. Load this data into R and convert all categorical variables in the data to
labelled factor variables.
Q2. There are two BP readings for both systolic and diastolic BP. This is a normal
procedure in clinics whereby a patient’s BP is taken several times between
rests to account for the initial reading elevation due to anxiety or nervousness.
(a) Use an appropriate statistical test to check if the two readings are consistent with each other for both systolic and diastolic BP.
(b) Based on your results and their interpretation in (a), do you think it is
fair to use the average of the two readings? If your answer is affirmative,
compute the average of the two BP readings for both systolic and diastolic BP.
(c) Create a new variable which indicates whether a participant is hypertensive or not. This variable will depend on your answers in parts
(a) and (b), and whichever variables you use should be justified.
Q3. What is the prevalence of hypertension in this cohort? Are there any associations between hypertension and sex, age, highest level of education, weight
or height? Note that in the literature, researchers generally do not use
weight or height directly; body mass index (BMI) is preferred instead.
Q4. It would also be interesting to check whether there are significant correlations between the continuous variables, as these would affect your analysis if you were to
fit a model with several variables. It may therefore be worth considering a
correlation analysis.
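As a rough guide to the mechanics behind Q1 to Q4, the sketch below runs the steps in base R on a small invented data frame. It is not a model answer. The data values and the names `dat`, `sbp`, `dbp`, `bmi`, `hypertensive`, `sbp_test`, `dbp_test`, `sex_assoc`, `bmi_diff` and `cor_mat` are assumptions made for illustration. The paired t-test and χ² test shown are only one possible choice each, and you would still have to justify whichever tests you use on the real data. BMI is computed with the standard formula, weight in kg divided by the square of height in metres.

```r
# Toy data standing in for the Project Data file (all values are invented).
dat <- data.frame(
  sex            = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2),
  edu_level      = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 2),
  age            = c(25, 34, 41, 52, 38, 47, 29, 60, 55, 44),
  weight         = c(62, 70, 85, 90, 58, 77, 66, 95, 88, 73),            # kg
  height         = c(170, 160, 175, 168, 155, 162, 180, 165, 172, 158),  # cm
  systolic_bp_1  = c(118, 126, 145, 152, 115, 138, 122, 160, 149, 131),
  systolic_bp_2  = c(116, 127, 141, 150, 117, 135, 120, 156, 147, 132),
  diastolic_bp_1 = c(76, 82, 92, 95, 74, 88, 79, 101, 93, 84),
  diastolic_bp_2 = c(75, 80, 91, 93, 75, 89, 77, 98, 92, 85)
)

# Q1: turn coded categorical variables into labelled factors.
dat$sex <- factor(dat$sex, levels = 1:2, labels = c("Male", "Female"))
dat$edu_level <- factor(dat$edu_level, levels = 1:3,
                        labels = c("Primary", "Secondary", "Tertiary"))

# Q2(a): one possible check of the two readings is a paired t-test.
sbp_test <- t.test(dat$systolic_bp_1, dat$systolic_bp_2, paired = TRUE)
dbp_test <- t.test(dat$diastolic_bp_1, dat$diastolic_bp_2, paired = TRUE)

# Q2(b): average the two readings per participant.
dat$sbp <- rowMeans(dat[, c("systolic_bp_1", "systolic_bp_2")])
dat$dbp <- rowMeans(dat[, c("diastolic_bp_1", "diastolic_bp_2")])

# Q2(c): hypertensive if SBP >= 140 mmHg or DBP >= 90 mmHg.
dat$hypertensive <- factor(dat$sbp >= 140 | dat$dbp >= 90,
                           levels = c(FALSE, TRUE), labels = c("No", "Yes"))

# Q3: prevalence, BMI, and two example association checks.
prevalence <- mean(dat$hypertensive == "Yes")
dat$bmi <- dat$weight / (dat$height / 100)^2   # height converted from cm to m
# With only ten toy rows, R warns that the chi-squared approximation is poor.
sex_assoc <- chisq.test(table(dat$sex, dat$hypertensive))
bmi_diff  <- t.test(bmi ~ hypertensive, data = dat)

# Q4: pairwise correlations between the continuous variables.
cor_mat <- cor(dat[, c("age", "bmi", "sbp", "dbp")])
```

Printing `sbp_test`, `sex_assoc` or `cor_mat` shows the usual test summaries. For a significance test on a single pair of variables, `cor.test(dat$bmi, dat$sbp)` is the base R counterpart to `cor()`.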
Teams
Below are the project teams, which were randomly allocated.
Group 1
9 Gaborone, Goabaone
25 Moyo, Karabo
14 Maithamako, Diana
23 Mothatego, Mechell
21 Mokgoabone, Tefo
Group 2
17 Maphorisa, Wapapha
22 Monthe, Bame
3 Batshwenyo, Belinda
26 Ngakaemang, Kitso
16 Manthai, Peggy
Group 3
29 Setlhare, Nnete
24 Motlokwa, Jaden
20 Mogome, Yaone
2 Baloyi, Minsozi
4 Bontseng, Motheo
Group 4
28 Seepo, Neo
27 Ntlotlang, Mompati
12 Kebareng, Thobo
1 Bakwadi, Taliah
7 Dlamini, Sinethemba
Group 5
13 Kgoladisa, Boago
6 Diteko, Keneilwe
8 Elijah, Cathy
10 Kammona, Kearate
15 Makumula, Bright
Group 6
5 Dikgang, Gontlafetse
18 Mathe, Piletso
11 Katse, Olorato
19 Moendambele, Albertinah