PHS 801 Biostatistics: Integrative Exercises
Dr. Sondip Mathur
[Adapted/Reference: Biostatistics Essentials of Biostatistics in Public Health Lisa M. Sullivan]

The Framingham Heart Study: Integrative Exercises (Capstone Assignment)

Background

The National Heart, Lung, and Blood Institute (NHLBI)[1] created a teaching dataset that includes real but anonymized data collected as part of the Framingham Heart Study. The Framingham Heart Study is one of the most influential and longest-running epidemiological studies of risk factors for cardiovascular disease ever conducted. The study started in 1948 and continues today to collect extensive data from original participants, their children, and their children’s children. Much of what we know about cardiovascular disease was discovered by investigators involved in the Framingham Heart Study. In fact, studies to date using data collected in the Framingham Heart Study have resulted in over 3000 publications in high-impact, peer-reviewed medical journals.

The Framingham Heart Study has been widely discussed in the media. WGBH in Boston produced a video documentary for PBS entitled “The Hidden Epidemic: Heart Disease in America” that details the history of heart disease in this country and highlights the Framingham Heart Study.[2] In 2007, CBS News did a story on the study, its participants, and its impact.[3] Additionally, research results from the Framingham Heart Study are communicated widely, most recently highlighting the discovery of a gene that may promote obesity[4] and new data showing declining rates of dementia.[5] Interested readers can visit the Framingham Heart Study website for a detailed history of this incredible study and its many contributions to preventive medicine.[6]

Datasets for Analysis

NHLBI created a longitudinal teaching dataset that includes clinical, laboratory, and outcome data on n = 4434 participants.
Each participant has between one and three observations, which represent examinations held approximately 6 years apart. There are a total of 11,627 observations in the full dataset. A detailed description of the Framingham Heart Study dataset and other public use datasets available from NHLBI are available on the NHLBI Biologic Specimen and Data Repository Information Coordinating Center (BioLINCC) website.[7]

Two datasets are available for analysis here: one is the complete dataset with n = 11,627 observations (or person-exams), and the second includes only data collected at the first examination for each participant (n = 4434). The two datasets are available as comma-separated values (.csv) files for analysis in Excel, R, or other statistical computing packages. FHS-All.csv contains n = 11,627 observations and FHS-Exam1.csv contains n = 4434 observations.

[1] http://www.nhlbi.nih.gov/
[2] http://www.pbs.org/wgbh/takeonestep/heart/
[3] http://www.cbsnews.com/videos/landmark-heart-study/
[4] http://www.cbsnews.com/news/how-a-fat-gene-may-influence-your-weight/
[5] http://www.cbsnews.com/news/dementia-alzheimers-risk-signs-of-hope-study/
[6] https://www.framinghamheartstudy.org/about-fhs/history.php/

Copyright © 2018 by Jones & Bartlett Learning, LLC, an Ascend Learning Company

Variables

The following variables are available in each dataset for analysis (extracted from the complete documentation file, available on the NHLBI BioLINCC website[8]).
Variable Name | Description | Coding Details/Range
RANDID | Unique identification number for each participant | 2248–9999312
SEX | Participant sex | 1 = Male, 2 = Female
PERIOD | Exam cycle | 1, 2, 3
TIME | Number of days since first (baseline) exam | 0–4854
AGE | Age at exam, years | 32–81
SYSBP | Systolic blood pressure, mmHg | 83–295
DIABP | Diastolic blood pressure, mmHg | 30–150
BPMEDS | Use of anti-hypertensive medication | 0 = No, 1 = Yes
CURSMOKE | Currently smoking cigarettes | 0 = No, 1 = Yes
CIGPDAY | Number of cigarettes smoked per day | 0 (non-smoker)–90
TOTCHOL | Total serum cholesterol, mg/dL | 107–696
HDLC* | High density lipoprotein cholesterol, mg/dL | 10–189
LDLC* | Low density lipoprotein cholesterol, mg/dL | 20–565
BMI | Body mass index = weight (kg)/height (m)² | 14–57
GLUCOSE | Serum glucose, mg/dL | 39–478
DIABETES | Diabetes (glucose > 200 mg/dL or on treatment) | 0 = No, 1 = Yes
HEARTRTE | Heart rate, beats/minute | 37–220
PREVAP | Prevalent angina pectoris | 0 = No, 1 = Yes
PREVCHD | Prevalent coronary heart disease (CHD) | 0 = No, 1 = Yes
PREVMI | Prevalent myocardial infarction (MI) | 0 = No, 1 = Yes
PREVSTRK | Prevalent stroke | 0 = No, 1 = Yes
PREVHYP | Prevalent hypertension | 0 = No, 1 = Yes

[7] https://biolincc.nhlbi.nih.gov/static/studies/teaching/framdoc.pdf?link_time=2016-07-06_14:21:55.514359
[8] https://biolincc.nhlbi.nih.gov/static/studies/teaching/framdoc.pdf?link_time=2016-07-06_14:21:55.514359

The following are outcome events coded 1 if the event occurred during the follow-up (only the first event is recorded).
ANGINA | Angina pectoris | 0 = No, 1 = Yes
HOSPMI | Hospitalized for MI | 0 = No, 1 = Yes
MI_FCHD | Hospitalized for MI or fatal CHD | 0 = No, 1 = Yes
ANYCHD | Any coronary heart disease event | 0 = No, 1 = Yes
STROKE | Stroke | 0 = No, 1 = Yes
CVD | Cardiovascular disease | 0 = No, 1 = Yes
HYPERTEN | Hypertension | 0 = No, 1 = Yes
DEATH | Death from any cause | 0 = No, 1 = Yes

The following are numbers of days from the first (baseline) exam to the first event during the follow-up. If no event occurred, time is end of follow-up, death, or last known contact date.

TIMEAP | Time from baseline to first angina
TIMEMI | Time from baseline to first myocardial infarction
TIMEMIFC | Time from baseline to first MI or fatal CHD
TIMECHD | Time from baseline to first CHD
TIMESTRK | Time from baseline to first stroke
TIMECVD | Time from baseline to first cardiovascular disease
TIMEHYP | Time from baseline to first hypertension
TIMEDTH | Time from baseline to death

*Available only at period = 3 exam, missing otherwise

CAPSTONE ASSIGNMENT: INTEGRATIVE EXERCISES

1. Describe the study sample (by completing the following table).

Complete the following table to describe the study sample using data collected at the first examination for each participant (n = 4434). Summarize your results in three to four sentences.
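The table for this exercise reports continuous characteristics as mean (standard deviation) and categorical characteristics as n (%). A minimal Python sketch of both summaries is shown here. The sample values are hypothetical, the helper names `mean_sd` and `n_pct` are illustrative only, and the sketch assumes that missing values are coded as None and are left out of the denominator; other conventions are possible.

```python
import statistics


def mean_sd(values):
    """Summarize a continuous variable as 'mean (SD)'.

    Missing values (None) are skipped. Returns the formatted string and
    the number of non-missing observations used.
    """
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    sd = statistics.stdev(observed)  # sample SD (n - 1 in the denominator)
    return f"{mean:.2f} ({sd:.2f})", len(observed)


def n_pct(values, level=1):
    """Summarize a 0/1 variable as 'n (%)' for the given level.

    Assumption: the percentage uses non-missing observations only.
    """
    observed = [v for v in values if v is not None]
    count = sum(1 for v in observed if v == level)
    return f"{count} ({100 * count / len(observed):.1f}%)", len(observed)


# Hypothetical values for five participants (not taken from the dataset).
ages = [39, 46, 48, 61, 46]
bpmeds = [0, 0, None, 1, 0]  # one participant has no BPMEDS value

age_summary, age_n = mean_sd(ages)
bpmeds_summary, bpmeds_n = n_pct(bpmeds)
print("Age, years:", age_summary, "based on n =", age_n)
print("Use of anti-hypertensive medication:", bpmeds_summary, "based on n =", bpmeds_n)
```

Returning the number of non-missing observations alongside each summary makes it easy to report how many participants were excluded for a given variable.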
Patient Characteristic* | Total Sample (n = 4434)
Age, years | 49.92 (8.67)
Male sex | 2200 (49.7%)
Systolic blood pressure, mmHg | 132.9 (22.42)
Diastolic blood pressure, mmHg | 83.08 (12.05)
Use of anti-hypertensive medication | n (%)
Current smoker | n (%)
Total serum cholesterol, mg/dL | 236.98 (44.65)
Body mass index | 25.84 (4.10)
Diabetes | n (%)
* Mean (Standard deviation) or n (%)

(INSERT SAMPLE BRIEF SUMMARY NARRATIVE/PARAGRAPH ON ITEM-1)

Note: Some of the variables in the table have missing values. For example, the use of anti-hypertensive medication was available in n = 4373 participants (61 participants had missing data). Total serum cholesterol was available in n = 4382 and body mass index in n = 4415 (52 and 19 participants, respectively, had missing data). Because the extent of missing data is small (less than 5% of the total sample) and is unlikely to affect the results, we do not mention the missing data in the summary.

2. Compare risk factors in men and women.

Complete the following table to compare men and women using data collected at the first examination for each participant (n = 4434). Summarize your results in three to four sentences.

Patient Characteristic* | Men (n = 314) | Women (n = 698)
Age, years | 56.2 (9.8) | 58.7 (9.5)
Systolic blood pressure, mmHg | 130.5 (15.6) | 135.2 (17.2)
Diastolic blood pressure, mmHg | 82.4 (9.8) | 85.6 (10.5)
Use of anti-hypertensive medication | 27.4% | 32.1%
Current smoker | n (%) | n (%)
Total serum cholesterol, mg/dL | Mean (SD) | Mean (SD)
Body mass index | Mean (SD) | Mean (SD)
Diabetes | n (%) | n (%)
* Mean (Standard deviation) or n (%)

[INSERT BRIEF SUMMARY]

3. Do risk factors cluster? Specifically, do obese participants have more risk factors?
Create a new variable using BMI that classifies individuals as normal weight, overweight, or obese, as follows:

Normal weight | BMI < 25.0
Overweight | 25.0 ≤ BMI < 30.0
Obese | BMI ≥ 30.0

Complete the following table to compare participants who are normal weight, overweight, and obese using data collected at the first examination for each participant (n = 4434). Summarize your results in one paragraph or less.

Patient Characteristic* | Normal (n = ?) | Overweight (n = ?) | Obese (n = ?)
Age, years | Mean (SD) | Mean (SD) | Mean (SD)
Male sex | n (%) | n (%) | n (%)
Systolic blood pressure, mmHg | Mean (SD) | Mean (SD) | Mean (SD)
Diastolic blood pressure, mmHg | Mean (SD) | Mean (SD) | Mean (SD)
Use of anti-hypertensive medication | n (%) | n (%) | n (%)
Current smoker | n (%) | n (%) | n (%)
Total serum cholesterol, mg/dL | Mean (SD) | Mean (SD) | Mean (SD)
Diabetes | n (%) | n (%) | n (%)
* Mean (Standard deviation) or n (%)

EXAMPLE NARRATIVE: Participants who are obese (BMI ≥ 30 kg/m²) generally have more risk factors for cardiovascular disease than participants who are overweight or normal weight. Specifically, obese participants are older, have higher systolic and diastolic blood pressures, and are more than twice as likely to be on anti-hypertensive medications as their counterparts who are overweight or normal weight. They are also far more likely to have diabetes than participants who are overweight or normal weight. Total serum cholesterol levels are similar across BMI categories and there is no clear trend in the distribution of sex across BMI categories. Participants who are obese are less likely to smoke (34.5%) compared to their counterparts who are overweight (44.6% report current smoking) and normal weight (57.8% report current smoking).

4. Summarize the distribution of BMI categories in men versus women.

Construct a plot to compare the distributions of weight categories (normal, overweight, and obese) for men and women, considered separately. Summarize the comparison in three to four sentences.

[Figure: percent of participants (y-axis, 0% to 60%) in each weight category (Normal, Overweight, Obese), shown separately for men and women]

SAMPLE NARRATIVE: Among men, approximately one third are normal weight, ... The distribution is different in women. Approximately … are obese. This result is consistent with the difference in mean BMI we observed in men (26.2 kg/m²) versus women (25.6 kg/m²) in question 2.

5. What characteristics are associated with BMI?

Use simple and multivariable linear regression analysis to complete the following table relating the characteristics listed to BMI as a continuous variable. Before conducting the analysis, be sure that all participants have complete data on all analysis variables. If participants are excluded due to missing data, the numbers excluded should be reported. Then, describe how each characteristic is related to BMI. Are crude and multivariable effects similar? What might explain or account for any differences?

Outcome Variable: BMI, kg/m²

Characteristic | Crude Models: Regression Coefficient | Crude Models: p-value | Multivariable Model: Regression Coefficient | Multivariable Model: p-value
Age, years | 0.06 | < 0.001 | –0.02 | < 0.001
Male sex | 0.58 | < 0.001 | 0.99 | 0.004
Systolic blood pressure, mmHg | - | - | - | -
Total serum cholesterol, mg/dL | - | - | - | -
Current smoker | - | - | - | -
Diabetes | - | - | - | -

INSERT RELATED NARRATIVE HERE: A total of n = 4434 participants are available for analysis. However, some participants are missing data on key analysis variables and are excluded (BMI, n = 19; total serum cholesterol, n = 51), leaving a sample of n = 4364.
In simple linear regression analyses, each of the characteristics considered is statistically significantly associated with BMI (p < 0.001 for each). Age, male sex, systolic blood pressure, total serum cholesterol, and diabetes are …insert…

In a multivariable linear regression model, all characteristics remain statistically significantly associated with BMI (after adjustment for other characteristics), but some of the associations shift, suggesting that there may be confounding. For example, in an unadjusted model each year of age is associated with a higher BMI of approximately 0.06 kg/m². After adjustment for other risk factors, the effect of age is …insert…. More analysis is needed to investigate relationships among the risk factors.

6. Who is most likely to have prevalent coronary heart disease?

Test if there are significant differences in the following risk factors between persons with and without prevalent coronary heart disease (CHD). Summarize the statistical results in the table below and then compare risk factors in persons with and without prevalent CHD. Be sure to indicate what statistical tests were used in the footnote to the table and in a brief summary of a paragraph or less.

Patient Characteristic* | History of CHD (n = 194) | No History of CHD (n = 4240) | p-value*
Age, years | 57.5 (7.4) | 49.6 (8.6) | < 0.001
Systolic blood pressure, mmHg | INSERT | INSERT | INSERT
Diastolic blood pressure, mmHg | 87.1 (14.3) | 82.9 (11.9) | < 0.001
Total serum cholesterol, mg/dL | INSERT | INSERT | INSERT
Body mass index | INSERT | INSERT | INSERT
* Mean (Standard deviation). P-values are based on (INSERT WHAT TEST…)

For example, you may summarize the first patient characteristic as follows: A total of n = 194 participants report having coronary heart disease (CHD). Participants with prevalent CHD are significantly older than participants who are free of CHD (mean age 57.5 versus 49.6 years, p < 0.001). Likewise, complete summary statements for the remaining characteristics…

7. Who is most likely to have prevalent coronary heart disease?

Test if there are significant differences in the following risk factors between persons with and without prevalent CHD. You will first need to create two new variables, as per the specifications outlined below.

Hyperlipidemia | 1 = Y (total cholesterol > 220), 0 = N (total cholesterol ≤ 220)
Obese | 1 = Obese (BMI ≥ 30), 0 = Not Obese (BMI < 30)

Complete the tables below. Estimate the prevalence of each risk factor in people with and without prevalent CHD, and test if there are significant differences in risk factors by prevalent CHD. Summarize the statistical results in the table below, and then compare risk factors in persons with and without prevalent CHD in one paragraph or less. Then, for each variable, estimate the risk difference and relative risk along with 95% confidence intervals for each, and provide a brief summary in one paragraph or less.
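The two derived variables and the test of a difference in prevalence can be sketched in Python as follows. This is an illustration under stated assumptions, not the required method: the assignment does not name the test, so a Pearson chi-square test without continuity correction is assumed here as one common choice for a 2×2 table. The function names are hypothetical, missing values are assumed to be coded as None, and the counts passed to the test are the hypertension counts reported for this exercise (164 of 194 with CHD, 3088 of 4240 without CHD).

```python
import math


def hyperlipidemia(totchol):
    """1 if total cholesterol > 220 mg/dL, 0 if <= 220, None if missing."""
    if totchol is None:
        return None
    return 1 if totchol > 220 else 0


def obese(bmi):
    """1 if BMI >= 30, 0 if BMI < 30, None if missing."""
    if bmi is None:
        return None
    return 1 if bmi >= 30 else 0


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table

                    factor present   factor absent
        CHD               a                b
        no CHD            c                d

    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the chi-square statistic is a squared standard
    # normal deviate, so the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypertension: 164 of 194 with CHD and 3088 of 4240 without CHD.
chi2, p_value = chi_square_2x2(164, 194 - 164, 3088, 4240 - 3088)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

Participants with a missing cholesterol or BMI value receive None for the derived variable, so they can be excluded from the corresponding row of the table and the number excluded can be reported.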
Distribution of risk factors in participants with and without prevalent CHD

Risk Factor* | History of CHD (n = 194) | No History of CHD (n = 4240)
Hypertension | 164 (84.5%) | 3088 (72.8%)
Current smoker | 86 (44.3%) | 2095 (49.4%)
Hyperlipidemia | n (%) | n (%)
Diabetes | n (%) | n (%)
Obese | n (%) | n (%)
Male sex | n (%) | n (%)
* n (%)

Risk Difference and Relative Risk of Prevalent CHD for various known risk factors in the FHS participants

Risk Factor | Risk Difference (95% CI) | Relative Risk (95% CI)
Hypertension | 0.117 (0.064, 0.170) | 1.16 (1.09, 1.24)
Current smoking status | –0.051 (–0.122, 0.021) | 0.90 (0.76, 1.05)
Hyperlipidemia | Est. (CI) | Est. (CI)
Diabetes | Est. (CI) | Est. (CI)
Obese | Est. (CI) | Est. (CI)
Male sex | Est. (CI) | Est. (CI)

Develop related narrative as follows: Participants with prevalent CHD are significantly more likely to have hypertension than participants who are free of CHD (84.5% versus 72.8%) with a relative risk (RR) of 1.16 (95% confidence interval (CI) for RR 1.09–1.24). They are …times as likely to have…INSERT. There was no significant difference in …INSERT.
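The hypertension estimates quoted in the narrative can be checked with a short Python sketch. It assumes the standard large-sample formulas (a Wald interval for the risk difference and a log-scale interval for the relative risk) with a critical value of 1.96, and it follows the table in comparing the proportion with the risk factor among participants with CHD to the proportion among those without CHD. The function names are illustrative only.

```python
import math

Z_95 = 1.96  # normal critical value for a 95% confidence interval


def risk_difference(x1, n1, x2, n2):
    """Difference in proportions p1 - p2 with a Wald 95% CI."""
    p1, p2 = x1 / n1, x2 / n2
    rd = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return rd, rd - Z_95 * se, rd + Z_95 * se


def relative_risk(x1, n1, x2, n2):
    """Ratio of proportions p1 / p2 with a 95% CI computed on the log scale."""
    rr = (x1 / n1) / (x2 / n2)
    se_log = math.sqrt((n1 - x1) / (x1 * n1) + (n2 - x2) / (x2 * n2))
    lower = math.exp(math.log(rr) - Z_95 * se_log)
    upper = math.exp(math.log(rr) + Z_95 * se_log)
    return rr, lower, upper


# Hypertension: 164 of 194 with CHD versus 3088 of 4240 without CHD.
rd, rd_lo, rd_hi = risk_difference(164, 194, 3088, 4240)
rr, rr_lo, rr_hi = relative_risk(164, 194, 3088, 4240)
print(f"Risk difference: {rd:.3f} ({rd_lo:.3f}, {rd_hi:.3f})")
print(f"Relative risk:   {rr:.2f} ({rr_lo:.2f}, {rr_hi:.2f})")
```

Under these assumptions the results agree, to within rounding, with the hypertension row of the table. The same two functions can be applied to the counts for each remaining risk factor once the n (%) table is completed.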