Department of Statistics
2023/24 – Semester II

STA272 – STATISTICAL COMPUTING
GROUP PROJECT

Due: 09-May-2024 @ 16h00
Marks: 50

Instructions:
• Any work submitted late will be penalised as follows:
  – work submitted after the deadline but before midnight on the due date will attract a penalty of up to 10%;
  – work submitted a day late will attract a penalty of up to 25%;
  – work submitted two days late will attract a penalty of up to 50%;
  – work submitted more than two days late will be awarded a zero mark.
• Any form of cheating is not allowed, and plagiarised work will be awarded a zero mark.

Data

This group project is based on an Excel file named Project Data within the Data folder in the STA272 Moodle portal. The data consists of the following variables:

Variable                          Description                                                          Values
pid                               Participant ID
sex                               Sex of the participant                                               1 = Male, 2 = Female
age                               Age of the participant in years
marital_status                    Marital status                                                       1 = Single/Never Married, 2 = Married, 3 = Divorced/Separated, 4 = Widowed
edu_level                         Highest level of education                                           1 = Primary, 2 = Secondary, 3 = Tertiary
weight                            Participant’s weight (in kg)
height                            Participant’s height (in cm)
systolic_bp_1, systolic_bp_2      Participant’s two measurements of systolic blood pressure in mmHg
diastolic_bp_1, diastolic_bp_2    Participant’s two measurements of diastolic blood pressure in mmHg

Objective

The aim of this project is to investigate whether there are any associations between the demographic and physical variables in the data and either hypertensive status or the actual blood pressure values. A participant is said to be hypertensive if their systolic blood pressure (SBP) is 140 mmHg or above, or if their diastolic blood pressure (DBP) is 90 mmHg or above. This will involve the data transformations and basic statistical analysis that you have covered in this module.

Expectations

The following are the outputs you should submit as the final pieces of your work.

1. A written report in Word/LaTeX/R Markdown, which will be handed in on or before the due date at my office, 240/252. [30 marks]
2. All the R code used for data manipulation and analysis. This must be submitted online via Moodle. [20 marks]
3. A single page outlining each group member’s contribution to the assignment. Anyone whose contribution is deemed negligible will be awarded a zero mark for no effort. Here’s what I expect from each team member:
   • Clearly defined and measurable contributions: Each team member should have specific tasks or deliverables assigned. These contributions should be clearly outlined and have a measurable outcome to demonstrate their impact on the project.
   • Focus on expertise and value-added skills: The description should highlight how each member’s unique skills and knowledge contributed to the project’s success.
   • Avoid mentioning generic tasks: Tasks like organizing meetings, providing a laptop, or simply being present don’t showcase an individual’s specific value to the project.

Report

The written report should be structured as follows:

Introduction – What is the problem? Give a short and precise description of the goal of the report. What is the structure of the report? Use ordinary, non-statistical language.

Methods – A systematic account of the methods you have used and why they were chosen. For example, to test for association one can use a χ² test for association, but you still have to justify why it is suitable for your particular problem and cite relevant supporting sources.

Data Analysis – Carry out some exploratory data analysis. That is, use charts and graphs to explore your data. In particular, examine the distribution of your continuous variables and draw boxplots relating the response variable to the predictor variables.

Conclusions – A summary of the main things that have been learned in earlier parts of the report and what it all means. Use ordinary, non-statistical language as much as possible.
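For reference, the hypertension rule stated under Objective can be written compactly, alongside the standard definition of body mass index (BMI), a common measure that combines weight and height. This is only a sketch: the symbol H_i is introduced here for illustration, and SBP_i and DBP_i stand for whichever reading (or combination of readings) your group can justify. The numbers in the worked example are illustrative and are not taken from the project data. The `cases` environment assumes the amsmath package.

```latex
% Hypertension indicator for participant i (H_i is illustrative notation).
% SBP_i and DBP_i denote the systolic and diastolic values your group decides to use.
\[
H_i =
\begin{cases}
1, & \text{if } \mathrm{SBP}_i \ge 140 \text{ mmHg or } \mathrm{DBP}_i \ge 90 \text{ mmHg},\\
0, & \text{otherwise.}
\end{cases}
\]

% Standard BMI definition; height is recorded in cm, so it is converted to metres.
\[
\mathrm{BMI}_i = \frac{\text{weight}_i \ (\text{kg})}{\bigl(\text{height}_i \ (\text{cm}) / 100\bigr)^{2}}
\]

% Worked example with illustrative values: 70 kg and 175 cm.
\[
\frac{70}{(175/100)^{2}} = \frac{70}{3.0625} \approx 22.9 \ \text{kg/m}^{2}
\]
```

Note that the hypertension rule uses "or": a participant who exceeds either threshold is classified as hypertensive, even if the other measurement is below its threshold.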
The following questions are meant to guide your analysis approach and the write-up. You should therefore not provide direct answers to these questions as your output.

Q1. Load this data into R and convert all categorical variables in the data to labelled factor variables.

Q2. There are two BP readings for both systolic and diastolic BP. This is a normal procedure in clinics, whereby a patient’s BP is taken several times between rests to account for the elevation of the initial reading due to anxiety or nervousness.
  (a) Use an appropriate statistical test to check whether the two readings are consistent with each other, for both systolic and diastolic BP.
  (b) Based on your results and their interpretation in (a), do you think it is fair to use the average of the two readings? If your answer is affirmative, compute the average of the BP readings for both systolic and diastolic BP.
  (c) Create a new variable which indicates whether a participant is hypertensive or not. This variable will depend on your answers in parts (a) and (b), and whichever variables you use must be justified.

Q3. What is the prevalence of hypertension in this cohort? Are there any associations between hypertension and sex, age, highest level of education, weight or height? Note that in the literature, researchers generally do not use weight or height directly; instead, body mass index (BMI) is preferred.

Q4. It would also be interesting to check whether there are significant correlations between the continuous variables, as this would affect your analysis if you were to fit a model with several variables. It may therefore be worth considering a correlation analysis.

Teams

Below are the project teams, which were randomly allocated.
Group 1
   9  Gaborone, Goabaone
  25  Moyo, Karabo
  14  Maithamako, Diana
  23  Mothatego, Mechell
  21  Mokgoabone, Tefo

Group 2
  17  Maphorisa, Wapapha
  22  Monthe, Bame
   3  Batshwenyo, Belinda
  26  Ngakaemang, Kitso
  16  Manthai, Peggy

Group 3
  29  Setlhare, Nnete
  24  Motlokwa, Jaden
  20  Mogome, Yaone
   2  Baloyi, Minsozi
   4  Bontseng, Motheo

Group 4
  28  Seepo, Neo
  27  Ntlotlang, Mompati
  12  Kebareng, Thobo
   1  Bakwadi, Taliah
   7  Dlamini, Sinethemba

Group 5
  13  Kgoladisa, Boago
   6  Diteko, Keneilwe
   8  Elijah, Cathy
  10  Kammona, Kearate
  15  Makumula, Bright

Group 6
   5  Dikgang, Gontlafetse
  18  Mathe, Piletso
  11  Katse, Olorato
  19  Moendambele, Albertinah