PHS 801 Biostatistics: Integrative Exercises
Dr. Sondip Mathur
[Adapted/Reference: Biostatistics Essentials of Biostatistics in Public Health Lisa M. Sullivan]
The Framingham Heart Study—Integrative Exercises (Capstone Assignment)
Background
The National Heart, Lung, and Blood Institute (NHLBI)[1] created a teaching dataset that
includes real but anonymized data collected as part of the Framingham Heart Study. The Framingham
Heart Study is one of the most influential and longest-running epidemiological studies of risk factors
for cardiovascular disease. The study started in 1948 and continues today to collect extensive
data from original participants, their children, and their children's children. Much of what we know
about cardiovascular disease was discovered by investigators involved in the Framingham Heart Study.
In fact, studies using data collected in the Framingham Heart Study have resulted in over 3,000
publications in high-impact, peer-reviewed medical journals.
The Framingham Heart Study has been widely discussed in the media. WGBH in Boston
produced a video documentary for PBS entitled "The Hidden Epidemic: Heart Disease in America"
that details the history of heart disease in this country and highlights the Framingham Heart Study.[2]
In 2007, CBS News did a story on the study, its participants, and its impact.[3] Additionally, research
results from the Framingham Heart Study are communicated widely, most recently highlighting the
discovery of a gene that may promote obesity[4] and new data showing declining rates of dementia.[5]
Interested readers can visit the Framingham Heart Study website for a detailed history of this
incredible study and its many contributions to preventive medicine.[6]
Datasets for Analysis
NHLBI created a longitudinal teaching dataset that includes clinical, laboratory, and outcome data
on n = 4434 participants. Each participant has between one and three observations, which represent
examinations held approximately 6 years apart. In total, there are 11,627 observations in the full
dataset. A detailed description of the Framingham Heart Study dataset and of other public use datasets
available from NHLBI can be found on the NHLBI Biologic Specimen and Data Repository
Information Coordinating Center (BioLINCC) website.[7]

[1] http://www.nhlbi.nih.gov/
[2] http://www.pbs.org/wgbh/takeonestep/heart/
[3] http://www.cbsnews.com/videos/landmark-heart-study/
[4] http://www.cbsnews.com/news/how-a-fat-gene-may-influence-your-weight/
[5] http://www.cbsnews.com/news/dementia-alzheimers-risk-signs-of-hope-study/
[6] https://www.framinghamheartstudy.org/about-fhs/history.php/

Copyright © 2018 by Jones & Bartlett Learning, LLC, an Ascend Learning Company
Two datasets are available for analysis here: one is the complete dataset with n = 11,627
observations (or person-exams), and the second includes only data collected at the first examination for
each participant (n = 4434). Both datasets are available as comma-separated values (.csv) files for
analysis in Excel, R, or other statistical computing packages. FHS-All.csv contains n = 11,627
observations and FHS-Exam1.csv contains n = 4434 observations.
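The handout names Excel and R; as an illustration only, here is a minimal Python sketch (standard library) of reading either file and pulling out first-exam records. It assumes that missing values appear as empty cells or `NA` in the .csv files and that each participant's first examination is the row with PERIOD = 1; both are assumptions to check against the BioLINCC documentation. The tiny inline CSV is made up and is not study data.

```python
import csv
import io

# Assumed encodings of a missing value in the FHS .csv files.
MISSING = {"", "NA", "."}


def parse_value(text):
    """Convert one CSV cell to a float, or None when the cell is missing."""
    text = text.strip()
    return None if text in MISSING else float(text)


def load_rows(handle):
    """Read an FHS-style .csv file into a list of {variable: float or None} dicts."""
    reader = csv.DictReader(handle)
    return [{name: parse_value(cell) for name, cell in row.items()} for row in reader]


def first_exam_rows(rows):
    """Keep rows with PERIOD == 1 (assumed to be each participant's first exam)."""
    return [row for row in rows if row["PERIOD"] == 1]


# Made-up rows in the FHS-All.csv layout (not real study data).
example_csv = io.StringIO(
    "RANDID,PERIOD,AGE,BMI\n"
    "1001,1,45,24.1\n"
    "1001,2,51,NA\n"
    "1002,1,60,31.5\n"
)
rows = load_rows(example_csv)
exam1 = first_exam_rows(rows)
print(len(rows), len(exam1), len({row["RANDID"] for row in exam1}))
```

With the real file, something like `with open("FHS-All.csv", newline="") as f: rows = load_rows(f)` would be expected to give 11,627 rows, and `first_exam_rows(rows)` 4434 rows, if the PERIOD assumption holds.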
Variables
The following variables are available in each dataset for analysis (extracted from the complete
documentation file, available on the NHLBI BioLINCC website[8]).
Variable Name   Description                                         Coding Details/Range
RANDID          Unique identification number for each participant   2248–9999312
SEX             Participant sex                                     1 = Male, 2 = Female
PERIOD          Exam cycle                                          1, 2, 3
TIME            Number of days since first (baseline) exam          0–4854
AGE             Age at exam, years                                  32–81
SYSBP           Systolic blood pressure, mmHg                       83–295
DIABP           Diastolic blood pressure, mmHg                      30–150
BPMEDS          Use of anti-hypertensive medication                 0 = No, 1 = Yes
CURSMOKE        Currently smoking cigarettes                        0 = No, 1 = Yes
CIGPDAY         Number of cigarettes smoked per day                 0 (non-smoker)–90
TOTCHOL         Total serum cholesterol, mg/dL                      107–696
HDLC*           High density lipoprotein cholesterol, mg/dL         10–189
LDLC*           Low density lipoprotein cholesterol, mg/dL          20–565
BMI             Body mass index = weight (kg)/height (m)²           14–57
GLUCOSE         Serum glucose, mg/dL                                39–478
DIABETES        Diabetes (glucose > 200 mg/dL or on treatment)      0 = No, 1 = Yes
HEARTRTE        Heart rate, beats/minute                            37–220
PREVAP          Prevalent angina pectoris                           0 = No, 1 = Yes
PREVCHD         Prevalent coronary heart disease (CHD)              0 = No, 1 = Yes
PREVMI          Prevalent myocardial infarction (MI)                0 = No, 1 = Yes
PREVSTRK        Prevalent stroke                                    0 = No, 1 = Yes
PREVHYP         Prevalent hypertension                              0 = No, 1 = Yes

The following are outcome events, coded 1 if the event occurred during follow-up (only the first
event is recorded).

ANGINA          Angina pectoris                                     0 = No, 1 = Yes
HOSPMI          Hospitalized for MI                                 0 = No, 1 = Yes
MI_FCHD         Hospitalized for MI or fatal CHD                    0 = No, 1 = Yes
ANYCHD          Any coronary heart disease event                    0 = No, 1 = Yes
STROKE          Stroke                                              0 = No, 1 = Yes
CVD             Cardiovascular disease                              0 = No, 1 = Yes
HYPERTEN        Hypertension                                        0 = No, 1 = Yes
DEATH           Death from any cause                                0 = No, 1 = Yes

The following are the numbers of days from the first (baseline) exam to the first event during
follow-up. If no event occurred, the time is the end of follow-up, death, or the last known contact date.

TIMEAP          Time from baseline to first angina
TIMEMI          Time from baseline to first myocardial infarction
TIMEMIFC        Time from baseline to first MI or fatal CHD
TIMECHD         Time from baseline to first CHD
TIMESTRK        Time from baseline to first stroke
TIMECVD         Time from baseline to first cardiovascular disease
TIMEHYP         Time from baseline to first hypertension
TIMEDTH         Time from baseline to death

* Available only at the period 3 exam; missing otherwise.

[7] https://biolincc.nhlbi.nih.gov/static/studies/teaching/framdoc.pdf?link_time=2016-07-06_14:21:55.514359
[8] https://biolincc.nhlbi.nih.gov/static/studies/teaching/framdoc.pdf?link_time=2016-07-06_14:21:55.514359
CAPSTONE ASSIGNMENT: INTEGRATIVE EXERCISES
1. Describe the study sample (by completing the following table).
Complete the following table to describe the study sample using data collected at the first
examination for each participant (n = 4434). Summarize your results in three to four sentences.
Patient Characteristic*                Total Sample (n = 4434)
Age, years                             49.92 (8.67)
Male sex                               2200 (49.7%)
Systolic blood pressure, mmHg          132.9 (22.42)
Diastolic blood pressure, mmHg         83.08 (12.05)
Use of anti-hypertensive medication    n (%)
Current smoker                         n (%)
Total serum cholesterol, mg/dL         236.98 (44.65)
Body mass index, kg/m²                 25.84 (4.10)
Diabetes                               n (%)
* Mean (Standard deviation) or n (%)
(INSERT SAMPLE BRIEF SUMMARY NARRATIVE/PARAGRAPH ON ITEM-1)
Note: There were some missing values for some of the variables in the table. For example, use of
anti-hypertensive medication was available for n = 4373 participants (61 participants had missing data).
Total serum cholesterol was available for n = 4382 and body mass index for n = 4415 (52 and
19 participants, respectively, had missing data). Because the extent of missing data is small (less than
5% of the total sample) and unlikely to affect the results, we do not mention it in the summary.
2. Compare risk factors in men and women.
Complete the following table to compare men and women using data collected at the first
examination for each participant (n = 4434). Summarize your results in three to four sentences.
Patient Characteristic*                Men (n = 314)    Women (n = 698)
Age, years                             56.2 (9.8)       58.7 (9.5)
Systolic blood pressure, mmHg          130.5 (15.6)     135.2 (17.2)
Diastolic blood pressure, mmHg         82.4 (9.8)       85.6 (10.5)
Use of anti-hypertensive medication    27.4%            32.1%
Current smoker                         n (%)            n (%)
Total serum cholesterol, mg/dL         Mean (SD)        Mean (SD)
Body mass index, kg/m²                 Mean (SD)        Mean (SD)
Diabetes                               n (%)            n (%)
* Mean (Standard deviation) or n (%)
[INSERT BRIEF SUMMARY]
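A sketch of the grouped summaries behind this table, in Python's standard library. It groups first-exam rows by SEX (1 = Male, 2 = Female) and reports n, mean, and SD for one variable; for a 0/1 variable, the mean times 100 is the percentage. Rows are assumed to be dicts with None for missing values, and the example rows are made up.

```python
import statistics


def summarize_by_group(rows, group_var, variable):
    """Return {group: (n, mean, SD)} for `variable`, skipping rows missing either value.

    Each group needs at least two non-missing values for the sample SD.
    """
    groups = {}
    for row in rows:
        group, value = row[group_var], row[variable]
        if group is None or value is None:
            continue
        groups.setdefault(group, []).append(value)
    return {
        group: (len(vals), statistics.mean(vals), statistics.stdev(vals))
        for group, vals in sorted(groups.items())
    }


# Made-up rows (SEX: 1 = Male, 2 = Female), not study data.
rows = [
    {"SEX": 1, "AGE": 40, "CURSMOKE": 1},
    {"SEX": 1, "AGE": 50, "CURSMOKE": 0},
    {"SEX": 2, "AGE": 55, "CURSMOKE": 0},
    {"SEX": 2, "AGE": 65, "CURSMOKE": 0},
    {"SEX": 2, "AGE": None, "CURSMOKE": 1},
]
age_by_sex = summarize_by_group(rows, "SEX", "AGE")
smoking_by_sex = summarize_by_group(rows, "SEX", "CURSMOKE")
for sex, (n, mean, sd) in age_by_sex.items():
    print(sex, n, round(mean, 1), round(sd, 2))
```

As a consistency check, the group sizes for men and women should add up to the first-exam sample (n = 4434) less any participants with missing data.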
3. Do risk factors cluster? Specifically, do obese participants have more risk factors?
Create a new variable using BMI that classifies individuals as normal weight, overweight, or obese,
as follows:
Normal weight    BMI < 25.0
Overweight       25.0 ≤ BMI < 30.0
Obese            BMI ≥ 30.0
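A minimal Python sketch of the new variable. It assumes a BMI of exactly 25.0 counts as overweight and exactly 30.0 as obese, which matches the BMI ≥ 30 cut point used for the Obese variable in item 7, and it leaves a missing BMI as None so that those participants drop out of the comparison instead of being misclassified.

```python
def bmi_category(bmi):
    """Classify BMI (kg/m²) as 'Normal', 'Overweight', or 'Obese'; None if missing.

    Assumed cut points: Normal < 25.0, Overweight 25.0 to < 30.0, Obese >= 30.0.
    """
    if bmi is None:
        return None
    if bmi < 25.0:
        return "Normal"
    if bmi < 30.0:
        return "Overweight"
    return "Obese"


print([bmi_category(b) for b in [22.4, 25.0, 29.9, 30.0, None]])
# ['Normal', 'Overweight', 'Overweight', 'Obese', None]
```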
Complete the following table to compare participants who are normal weight, overweight, and
obese using data collected at the first examination for each participant (n = 4434). Summarize your
results in one paragraph or less.
Patient Characteristic*                Normal (n = ?)    Overweight (n = ?)    Obese (n = ?)
Age, years                             Mean (SD)         Mean (SD)             Mean (SD)
Male sex                               n (%)             n (%)                 n (%)
Systolic blood pressure, mmHg          Mean (SD)         Mean (SD)             Mean (SD)
Diastolic blood pressure, mmHg         Mean (SD)         Mean (SD)             Mean (SD)
Use of anti-hypertensive medication    n (%)             n (%)                 n (%)
Current smoker                         n (%)             n (%)                 n (%)
Total serum cholesterol, mg/dL         Mean (SD)         Mean (SD)             Mean (SD)
Diabetes                               n (%)             n (%)                 n (%)
* Mean (Standard deviation) or n (%)
EXAMPLE NARRATIVE: Participants who are obese (BMI ≥ 30 kg/m²) generally have more risk
factors for cardiovascular disease than participants who are overweight or normal weight. Specifically,
obese participants are older, have higher systolic and diastolic blood pressures, and are more than
twice as likely to be on anti-hypertensive medication as their counterparts who are overweight or
normal weight. They are also far more likely to have diabetes than participants who are overweight or
normal weight. Total serum cholesterol levels are similar across BMI categories, and there is no clear
trend in the distribution of sex across BMI categories. Participants who are obese are less likely to
smoke (34.5%) than their counterparts who are overweight (44.6% report current smoking)
or normal weight (57.8% report current smoking).
4. Summarize the distribution of BMI categories in men versus women.
Construct a plot to compare the distributions of weight categories (normal, overweight, and obese)
for men and women, considered separately. Summarize the comparison in three to four sentences.
[Figure: Percentage of men and women in each weight category (Normal, Overweight, Obese); y-axis: Percent, 0%–60%]
SAMPLE NARRATIVE: Among men, approximately one third are normal weight, ... The distribution
is different in women. Approximately … are obese. This result is consistent with the difference in
mean BMI we observed in men (26.2 kg/m²) versus women (25.6 kg/m²) in question 2.
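The bars in the plot are row percentages: within each sex, the share of participants in each weight category. A standard-library Python sketch of that calculation follows. It takes (SEX, category) pairs, with SEX coded 1 = Male, 2 = Female and categories named Normal, Overweight, and Obese, and skips pairs with missing data. The plotting step itself is left to whichever tool is used, and the example pairs are made up.

```python
from collections import Counter

CATEGORIES = ["Normal", "Overweight", "Obese"]


def category_percents_by_sex(pairs):
    """Percent of each weight category within each sex (row percentages).

    `pairs` holds (sex, category) tuples with sex coded 1 = Male, 2 = Female;
    pairs with an unknown sex or a missing category are skipped.
    """
    counts = {1: Counter(), 2: Counter()}
    for sex, category in pairs:
        if sex in counts and category is not None:
            counts[sex][category] += 1
    percents = {}
    for sex, counter in counts.items():
        total = sum(counter.values())
        percents[sex] = {c: (100 * counter[c] / total if total else 0.0) for c in CATEGORIES}
    return percents


# Made-up pairs, not study data.
pairs = [(1, "Normal"), (1, "Overweight"), (1, "Overweight"), (1, "Obese"),
         (2, "Normal"), (2, "Normal"), (2, None)]
print(category_percents_by_sex(pairs))
```

Because each sex is its own denominator, the three percentages for men (and separately for women) sum to 100%, which is what makes the two distributions comparable despite different group sizes.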
5. What characteristics are associated with BMI?
Use simple and multivariable linear regression analysis to complete the following table relating the
characteristics listed to BMI as a continuous variable. Before conducting the analysis, be sure that
all participants have complete data on all analysis variables. If participants are excluded due to
missing data, the numbers excluded should be reported. Then, describe how each characteristic is
related to BMI. Are crude and multivariable effects similar? What might explain or account for any
differences?
Outcome Variable: BMI, kg/m²

                                  Crude Models                         Multivariable Model
Characteristic                    Regression Coefficient   p-value     Regression Coefficient   p-value
Age, years                        0.06                     < 0.001     –0.02                    < 0.001
Male sex                          0.58                     < 0.001     0.99                     0.004
Systolic blood pressure, mmHg     -                        -           -                        -
Total serum cholesterol, mg/dL    -                        -           -                        -
Current smoker                    -                        -           -                        -
Diabetes                          -                        -           -                        -
INSERT RELATED NARRATIVE HERE: A total of n = 4434 participants are available for analysis.
However, some participants are missing data on key analysis variables and are excluded (BMI, n = 19;
total serum cholesterol, n = 51), leaving a sample of n = 4364. In simple linear regression
analyses, each of the characteristics considered is statistically significantly associated with BMI (p <
0.001 for each). Age, male sex, systolic blood pressure, total serum cholesterol, and diabetes are
…insert…
In a multivariable linear regression model, all characteristics remain statistically significantly
associated with BMI (after adjustment for the other characteristics), but some of the associations shift,
suggesting that there may be confounding. For example, in an unadjusted model each year of age is
associated with a BMI approximately 0.06 kg/m² higher. After adjustment for other risk factors, the
effect of age is …insert…. More analysis is needed to investigate relationships among the risk factors.
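Two steps in this item lend themselves to a sketch: dropping incomplete cases (and counting them) and fitting crude and multivariable least-squares models. The Python below uses only the standard library and solves the normal equations directly, which is adequate for a handful of predictors but does not produce p-values; those would come from R or another statistics package. `MALE` is a hypothetical 0/1 indicator (1 if SEX = 1) created for the example, and the rows are made up so that the fit recovers known coefficients.

```python
def complete_cases(rows, variables):
    """Keep rows with no missing (None) value in `variables`; also return the number dropped."""
    kept = [row for row in rows if all(row[v] is not None for v in variables)]
    return kept, len(rows) - len(kept)


def ols(rows, outcome, predictors):
    """Least-squares coefficients (intercept first) from the normal equations X'X b = X'y."""
    X = [[1.0] + [float(row[p]) for p in predictors] for row in rows]
    y = [float(row[outcome]) for row in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = []
    for i in range(k):
        line = [sum(x[i] * x[j] for x in X) for j in range(k)]
        line.append(sum(x[i] * yi for x, yi in zip(X, y)))
        A.append(line)
    # Gauss-Jordan elimination with partial pivoting (assumes X'X is invertible).
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    coefs = [A[i][k] / A[i][i] for i in range(k)]
    return dict(zip(["Intercept"] + list(predictors), coefs))


# Made-up rows built so that BMI = 20 + 0.1 * AGE + 1.0 * MALE exactly.
rows = [
    {"BMI": 24.0, "AGE": 40, "MALE": 0},
    {"BMI": 26.0, "AGE": 50, "MALE": 1},
    {"BMI": 26.0, "AGE": 60, "MALE": 0},
    {"BMI": 25.5, "AGE": 45, "MALE": 1},
    {"BMI": None, "AGE": 55, "MALE": 0},
]
kept, dropped = complete_cases(rows, ["BMI", "AGE", "MALE"])
crude_age = ols(kept, "BMI", ["AGE"])
adjusted = ols(kept, "BMI", ["AGE", "MALE"])
print(dropped, {name: round(b, 3) for name, b in adjusted.items()})
```

Filtering once on BMI plus every characteristic in the table, before any model is fit, keeps the crude and multivariable estimates on the same sample, so differences between them reflect adjustment rather than a change in who is analyzed.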
6. Who is most likely to have prevalent coronary heart disease?
Test whether there are significant differences in the following risk factors between persons with and
without prevalent coronary heart disease (CHD). Summarize the statistical results in the table
below, and then compare risk factors in persons with and without prevalent CHD. Be sure to
indicate which statistical tests were used, both in the footnote to the table and in a brief summary of
one paragraph or less.
Patient Characteristic*            History of CHD (n = 194)    No History of CHD (n = 4240)    p-value*
Age, years                         57.5 (7.4)                  49.6 (8.6)                      < 0.001
Systolic blood pressure, mmHg      INSERT                      INSERT                          INSERT
Diastolic blood pressure, mmHg     87.1 (14.3)                 82.9 (11.9)                     < 0.001
Total serum cholesterol, mg/dL     INSERT                      INSERT                          INSERT
Body mass index, kg/m²             INSERT                      INSERT                          INSERT
* Mean (Standard deviation). P-values are based on [INSERT WHAT TEST…].
For example, you may summarize the first patient characteristic as follows: A total of n = 194
participants report having coronary heart disease (CHD). Participants with prevalent CHD are
significantly older than participants who are free of CHD (mean age 57.5 versus 49.6 years, p < 0.001).
Likewise, complete summary statements for the remaining characteristics…
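The footnote leaves the test to be named. For comparing two group means, one common choice is the two independent samples t test with pooled variance, and the Python sketch below assumes that test. Because the standard library has no t distribution, the p-value uses a normal approximation, which is close here because the degrees of freedom (n1 + n2 - 2 = 4432) are large; a statistics package would report the exact t-based p-value. The example reuses the age summaries from the table.

```python
import math
import statistics


def pooled_t_from_summary(m1, s1, n1, m2, s2, n2):
    """Pooled-variance two-sample t statistic from means, SDs, and group sizes.

    Returns (t, df, p), where p is a two-sided p-value from a standard normal
    approximation to the t distribution (reasonable only when df is large).
    """
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (m1 - m2) / se
    p_approx = math.erfc(abs(t) / math.sqrt(2))
    return t, df, p_approx


def pooled_t(x, y):
    """Same test from raw values, skipping missing (None) entries."""
    x = [v for v in x if v is not None]
    y = [v for v in y if v is not None]
    return pooled_t_from_summary(statistics.mean(x), statistics.stdev(x), len(x),
                                 statistics.mean(y), statistics.stdev(y), len(y))


# Age: prevalent CHD 57.5 (7.4), n = 194 versus no CHD 49.6 (8.6), n = 4240.
t, df, p = pooled_t_from_summary(57.5, 7.4, 194, 49.6, 8.6, 4240)
print(round(t, 1), df, p < 0.001)
```

Welch's unequal-variance t test is a common alternative when the two SDs differ noticeably; only the standard error and degrees of freedom change.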
7. Who is most likely to have prevalent coronary heart disease?
Test if there are significant differences in the following risk factors between persons with and
without prevalent CHD. You will first need to create two new variables, as per the specifications
outlined below.
Hyperlipidemia    1 = Yes (total cholesterol > 220 mg/dL), 0 = No (total cholesterol ≤ 220 mg/dL)
Obese             1 = Obese (BMI ≥ 30), 0 = Not Obese (BMI < 30)
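The two definitions translate directly into small functions. In this Python sketch, missing values (None) stay missing rather than being coded 0, and the keys `HYPERLIP` and `OBESE` are hypothetical names chosen for the new variables; the example rows are made up.

```python
def hyperlipidemia(totchol):
    """1 if total cholesterol > 220 mg/dL, 0 if <= 220, None if missing."""
    if totchol is None:
        return None
    return 1 if totchol > 220 else 0


def obese(bmi):
    """1 if BMI >= 30, 0 if BMI < 30, None if missing."""
    if bmi is None:
        return None
    return 1 if bmi >= 30 else 0


# Made-up first-exam rows; HYPERLIP and OBESE are hypothetical variable names.
rows = [{"TOTCHOL": 250.0, "BMI": 31.2}, {"TOTCHOL": None, "BMI": 24.0}]
for row in rows:
    row["HYPERLIP"] = hyperlipidemia(row["TOTCHOL"])
    row["OBESE"] = obese(row["BMI"])
print(rows)
```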
Complete the tables below. Estimate the prevalence of each risk factor in people with and without
prevalent CHD, and test if there are significant differences in risk factors by prevalent CHD.
Summarize the statistical results in the table below, and then compare risk factors in persons with
and without prevalent CHD in one paragraph or less. Then, for each variable, estimate the risk
difference and relative risk along with 95% confidence intervals for each, and provide a brief
summary in one paragraph or less.
Distribution of risk factors in participants with and without prevalent CHD

Risk Factor*       History of CHD (n = 194)    No History of CHD (n = 4240)
Hypertension       164 (84.5%)                 3088 (72.8%)
Current smoker     86 (44.3%)                  2095 (49.4%)
Hyperlipidemia     n (%)                       n (%)
Diabetes           n (%)                       n (%)
Obese              n (%)                       n (%)
Male sex           n (%)                       n (%)
* n (%)
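For comparing the prevalence of a risk factor between two independent groups, the chi-square test of independence is a standard choice; the Python sketch below assumes that test (Pearson, without continuity correction) and uses the fact that with one degree of freedom the p-value equals erfc(√(χ²/2)). It is applied to the hypertension counts from the table (164 of 194 versus 3088 of 4240). The test is usually considered appropriate when every expected count is at least 5, so the function also reports the smallest expected count.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Layout:          factor present   factor absent
        CHD                 a               b
        no CHD              c               d
    Returns (chi2, p, smallest expected count); p uses 1 degree of freedom.
    """
    n = a + b + c + d
    observed = [a, b, c, d]
    row_totals = [a + b, a + b, c + d, c + d]
    col_totals = [a + c, b + d, a + c, b + d]
    expected = [r * k / n for r, k in zip(row_totals, col_totals)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p, min(expected)


# Hypertension: 164 of 194 with CHD vs. 3088 of 4240 without CHD.
chi2, p, min_expected = chi_square_2x2(164, 194 - 164, 3088, 4240 - 3088)
print(round(chi2, 2), p < 0.001, round(min_expected, 1))
```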
Risk Difference and Relative Risk of Prevalent CHD for various known risk factors in the FHS
participants

Risk Factor               Risk Difference (95% CI)    Relative Risk (95% CI)
Hypertension              0.117 (0.064, 0.170)        1.16 (1.09, 1.24)
Current smoking status    –0.051 (–0.122, 0.021)      0.90 (0.76, 1.05)
Hyperlipidemia            Est. (CI)                   Est. (CI)
Diabetes                  Est. (CI)                   Est. (CI)
Obese                     Est. (CI)                   Est. (CI)
Male sex                  Est. (CI)                   Est. (CI)
Develop a related narrative as follows: Participants with prevalent CHD are significantly more likely to
have hypertension than participants who are free of CHD (84.5% versus 72.8%), with a relative risk
(RR) of 1.16 (95% confidence interval (CI) for RR: 1.09–1.24). They are …times as likely to
have…INSERT. There was no significant difference in …INSERT.
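The interval estimates in the risk difference and relative risk table can be sketched in Python with the usual large-sample formulas: a Wald interval for the risk difference and an interval computed on the log scale for the relative risk. That these formulas were used is an assumption, but applied to the counts in the distribution table they reproduce the hypertension row (0.117 (0.064, 0.170) and 1.16 (1.09, 1.24)) and the current smoking row. Group 1 is participants with prevalent CHD and group 2 those without.

```python
import math


def risk_difference(x1, n1, x2, n2, z=1.96):
    """p1 - p2 with a Wald (normal approximation) 95% confidence interval."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    rd = p1 - p2
    return rd, rd - z * se, rd + z * se


def relative_risk(x1, n1, x2, n2, z=1.96):
    """p1 / p2 with a 95% confidence interval computed for ln(RR), then exponentiated."""
    rr = (x1 / n1) / (x2 / n2)
    se_log = math.sqrt((n1 - x1) / (x1 * n1) + (n2 - x2) / (x2 * n2))
    return rr, math.exp(math.log(rr) - z * se_log), math.exp(math.log(rr) + z * se_log)


# Group 1: prevalent CHD (n = 194); group 2: no CHD (n = 4240).
for label, x1, x2 in [("Hypertension", 164, 3088), ("Current smoking", 86, 2095)]:
    rd, rd_lo, rd_hi = risk_difference(x1, 194, x2, 4240)
    rr, rr_lo, rr_hi = relative_risk(x1, 194, x2, 4240)
    print(f"{label}: RD {rd:.3f} ({rd_lo:.3f}, {rd_hi:.3f}); "
          f"RR {rr:.2f} ({rr_lo:.2f}, {rr_hi:.2f})")
```

An interval for the risk difference that contains 0, or one for the relative risk that contains 1, corresponds to no significant difference at the 5% level, as in the current smoking row.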