Framingham Heart Study Data
Cholesterol Data Analysis
Firas Jirjees
COH 602: Biostatistics
11-26-2014
Introduction
A total of 5,209 patients in the Framingham Heart Study were profiled for cholesterol level, cholesterol status, weight, and age at the diagnosis of symptoms. A multiple linear regression examined these four variables and identified possible relationships among them. This paper sought to determine the relevance of weight to cholesterol levels in Framingham heart patients. In addition, we stratified the tested population by age, which ranged from 28 to 62, to assess the impact of age alongside weight. Our hypothesis was that higher weight would correlate positively with higher cholesterol. It is commonly known that older people tend to have higher cholesterol, and they are also at greater risk of becoming overweight because of low physical activity. However, the relationship between age, weight, and cholesterol could be better understood through data analysis.
Methods
The study profiled a population of Framingham patients; we used the SAS GLM procedure to analyze several variables, including cholesterol level, cholesterol status, weight, and age at start of treatment. Reported cholesterol levels ranged from 96 to 568. Cholesterol status was coded as borderline (1), desirable (2), and high (3). Weight ranged from 67 to 300 lbs.
Age at start ranged from 28 to 62, the age range of participants. Cholesterol values were available for 5,057 participants, with a mean of 227.42, a median of 223.00, and a mode of 200.00.
The minimum cholesterol value was a very low 96 and the maximum a very high 568. Most participants registered levels in the lower 200 range.
The mean, median, and mode of these variables were determined by SAS analysis.
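For readers who want to reproduce summary statistics like these outside SAS, a minimal Python sketch using the standard-library `statistics` module follows. The readings are hypothetical illustration values, not the study data. `statistics.stdev` uses the n − 1 denominator, which matches the default sample standard deviation reported by SAS PROC MEANS.

```python
import statistics

# Hypothetical cholesterol readings for illustration only; these are NOT
# values from the Framingham data set analyzed in this paper.
cholesterol = [180, 200, 200, 215, 223, 230, 245, 260, 310]

summary = {
    "n": len(cholesterol),
    "mean": statistics.mean(cholesterol),
    "median": statistics.median(cholesterol),
    "mode": statistics.mode(cholesterol),       # most frequent value
    "std_dev": statistics.stdev(cholesterol),   # sample SD (n - 1 denominator)
    "min": min(cholesterol),
    "max": max(cholesterol),
}

for name, value in summary.items():
    print(f"{name:>8}: {value:.2f}")
```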
Results
Because the analysis involved more than one independent variable, we used multiple linear regression in SAS. We primarily examined weight, age at start, and cholesterol. The 5,209 people in the study ranged in age at start from 28 to 62 years. Mean age was 43.94, with a median of 43.00 and a mode of 36.00; the highest frequency was 5152 (62 years of age). The standard deviation was 8.57. Cholesterol levels ranged from a low of 96 to a high of 568; the highest frequency was 3691 (115). The cholesterol mean was 227.42, with a median of 223.00 and a mode of 200.00. The standard deviation was 44.94. For cholesterol status, we measured the frequency of cholesterol levels within each of the three status levels: borderline, desirable, and high. For borderline status, the highest frequency was 1996 (200); for desirable status, 3417 (199); and for high status, 5209 (240).
Weight ranged from a low of 67 lbs to a high of 300 lbs, with a mean of 219.11, a median of 220.00 and a mode of 200.00; the highest frequency was 1996 (200 lbs).
The standard deviation was 11.45. However, beyond these descriptive data, we must assess the relationships between the variables to test for correlation.
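Assessing the relationship between two variables typically starts with Pearson's correlation coefficient, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). The sketch below computes it directly. The function name `pearson_r` and the paired values are hypothetical, chosen only to illustrate a positive association; they are not from the Framingham data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-deviation
    sxx = sum((a - mx) ** 2 for a in x)                   # spread of x
    syy = sum((b - my) ** 2 for b in y)                   # spread of y
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired measurements (weight in lbs, cholesterol level).
weight = [120, 140, 155, 160, 175, 190, 210, 230]
cholesterol = [190, 205, 200, 220, 215, 235, 240, 255]

r = pearson_r(weight, cholesterol)
print(f"Pearson r = {r:.3f}")
```

A value of r near +1 indicates a strong positive linear relationship; significance would then be judged with a t test on r or, as in this paper, on a regression coefficient.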
Variable       N      Mean      Std Dev   Sum        Min     Max
Cholesterol    5057   227.417   44.935    1150050    96.00   568.00
AgeAtStart     5209   44.068    8.574     229554     28.00   62.00
Weight         5209   219.110   11.450    407764     70.00   300.00
The coefficient of variation was 18.84 for each cholesterol status, and the p-values for all variables were < .0001, allowing us to reject the null hypothesis. Because the null hypothesis stated that there is no relationship between the variables, rejecting it indicates a statistically significant relationship between them.
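The coefficient of variation expresses spread as a percentage of the mean. As a sketch, assuming the 18.84 figure is the Coeff Var statistic from PROC GLM with cholesterol as the dependent variable (SAS computes it as 100 × Root MSE / mean of the response), the snippet below contrasts it with the descriptive CV from the summary table and back-calculates the implied Root MSE. The helper name `coefficient_of_variation` is hypothetical.

```python
def coefficient_of_variation(spread, mean):
    """Return 100 * spread / mean, i.e. spread as a percent of the mean."""
    return 100.0 * spread / mean


# Descriptive CV of cholesterol from the summary table (SD / mean).
descriptive_cv = coefficient_of_variation(44.935, 227.417)

# Assumption: 18.84 is PROC GLM's Coeff Var = 100 * Root MSE / mean(y),
# so it implies the following Root MSE (a derived, not reported, figure).
implied_root_mse = 18.84 * 227.417 / 100

print(f"descriptive CV   = {descriptive_cv:.2f}%")
print(f"implied Root MSE = {implied_root_mse:.2f}")
```

A Root MSE below the raw standard deviation of 44.935 is what one would expect if the predictors explain part of the variation in cholesterol.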
Based on our analysis, there were statistically significant relationships between age at start, weight, cholesterol status, and cholesterol level. Separately, the graph for cholesterol shows a right (positive) skew, since the mean is higher than the median and the median is higher than the mode. Similarly, for age at start, the mean is higher than the median, which is higher than the mode; this also indicates a right skew.
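The mean > median > mode ordering is a rough indicator of right skew. A quick numeric companion is Pearson's second (median) skewness coefficient, 3(mean − median)/SD, sketched below with the means and standard deviations from the summary table and the medians reported in the text. This is a rule of thumb, not the moment-based skewness that SAS PROC UNIVARIATE reports, and it describes the shape of a single variable only; it carries no information about correlation between variables. The function name is hypothetical.

```python
def pearson_median_skewness(mean, median, sd):
    """Pearson's second skewness coefficient: 3 * (mean - median) / sd."""
    return 3.0 * (mean - median) / sd


# Means and SDs from the summary table; medians as reported in the text.
chol_skew = pearson_median_skewness(227.417, 223.00, 44.935)
age_skew = pearson_median_skewness(44.068, 43.00, 8.574)

for label, value in (("Cholesterol", chol_skew), ("AgeAtStart", age_skew)):
    if value > 0:
        direction = "right (positive)"
    elif value < 0:
        direction = "left (negative)"
    else:
        direction = "no"
    print(f"{label}: {value:.2f} -> {direction} skew")
```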
Our hypothesis stated that weight and cholesterol levels would be positively correlated. We tested it with the hypotheses

$H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$

at a significance level of $\alpha = 0.05$. The fitted regression model is

$\hat{Y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3$

where $\hat{Y}$ is the predicted cholesterol level, $b_0$ is the value of $\hat{Y}$ when all independent variables equal 0, and $b_1$, $b_2$ and $b_3$ are the estimated regression coefficients for cholesterol status, age at start and weight, respectively. The test statistic for each coefficient is $t = b_j / SE(b_j)$. Our decision rule was to reject the null hypothesis if p < 0.05. Since p < 0.0001, considerably lower than 0.05, we reject the null hypothesis and conclude that the coefficient differs significantly from zero; that is, the predictor is significantly associated with cholesterol level.
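The model and decision rule above can be sketched in plain Python: fit the coefficients by ordinary least squares, compute t = b_j / SE(b_j) for each coefficient, and reject H0 when the two-sided p-value falls below α = 0.05. This is an illustration under stated assumptions, not the SAS procedure used in the paper: the data are simulated (hypothetical), cholesterol status is treated as a numeric code, the helper names `invert` and `ols_t_tests` are hypothetical, and p-values use a normal approximation to the t distribution, which is only reasonable with large residual degrees of freedom.

```python
import math
import random


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols_t_tests(X, y):
    """Fit y = b0 + b1*x1 + ... by least squares; return (coefs, t-stats, p-values).

    p-values are two-sided and use a normal approximation to the t distribution.
    """
    rows = [[1.0] + list(x) for x in X]  # prepend the intercept column
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    xtx_inv = invert(xtx)
    b = [sum(xtx_inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(bj * rj for bj, rj in zip(b, r)) for r, yi in zip(rows, y)]
    mse = sum(e * e for e in resid) / (n - k)  # residual variance estimate
    se = [math.sqrt(mse * xtx_inv[i][i]) for i in range(k)]
    t = [bi / si for bi, si in zip(b, se)]
    p = [math.erfc(abs(ti) / math.sqrt(2)) for ti in t]
    return b, t, p


# Hypothetical simulated data: x1 = status code (1-3), x2 = age, x3 = weight.
random.seed(1)
X, y = [], []
for _ in range(500):
    status = random.choice([1, 2, 3])
    age = random.uniform(28, 62)
    weight = random.uniform(100, 250)
    X.append((status, age, weight))
    y.append(150 + 10 * status + 0.8 * age + 0.2 * weight + random.gauss(0, 20))

alpha = 0.05
b, t, p = ols_t_tests(X, y)
for name, bi, ti, pi in zip(["b0", "b1", "b2", "b3"], b, t, p):
    decision = "reject H0" if pi < alpha else "fail to reject H0"
    print(f"{name} = {bi:8.3f}  t = {ti:7.2f}  p = {pi:.4g}  -> {decision}")
```

With real data, the exact t distribution with n − k degrees of freedom (as SAS uses) should replace the normal approximation for small samples.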
Test            Statistic        P-Value
Student's t     t = 359.898      Pr > |t|   <.0001
Sign            M = 2528.500     Pr > |M|   <.0001
Signed Rank     S = 6394577      Pr > |S|   <.0001
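The statistic labels t, M, and S with Pr > |t|, Pr > |M|, and Pr > |S| match the "Tests for Location" output of SAS PROC UNIVARIATE (Student's t, sign test, and Wilcoxon signed-rank test), which by default tests H0: μ = 0. Assuming that is their source and that they were computed on cholesterol, they can be reproduced from the Cholesterol row of the summary table, as sketched below. If that reading is right, these tests concern whether the center of the cholesterol distribution is zero, not whether weight predicts cholesterol.

```python
import math

# Summary values for Cholesterol from the descriptive statistics table.
n, mean, sd = 5057, 227.417, 44.935

# Student's t for H0: mu = 0.
t_stat = mean / (sd / math.sqrt(n))

# Sign test M = (n_plus - n_minus) / 2; every cholesterol value is positive,
# so n_plus = n and n_minus = 0.
m_stat = (n - 0) / 2

# Wilcoxon signed-rank S = (sum of ranks of positive values) - n(n + 1)/4.
# With all values positive, the positive ranks sum to n(n + 1)/2.
s_stat = n * (n + 1) / 2 - n * (n + 1) / 4

print(f"t = {t_stat:.3f}, M = {m_stat:.1f}, S = {s_stat:.1f}")
```

The computed t differs slightly from 359.898 because the table's mean and SD are rounded; M and S match the reported values (S = 6394576.5, shown rounded as 6394577).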
Discussion
Based on our findings, cholesterol levels in this age group tended toward the high side, with few extreme lows and some extreme highs. Most cholesterol levels fell in the mid-200 range, which is borderline high.
The rare extreme value of 568 is alarmingly high, and quite a few participants had levels in the 300 range. Similarly, weights were highly variable, spanning a range of 233 lbs; the fact that weights started at 67 lbs and ranged to 300 lbs is telling, as the sample includes both underweight and overweight patients, all of whom are adults.
However, our reported average weight was slightly overweight relative to the general population. The variability matches the general population fairly well, although the range was wider for cholesterol. The sample did not include anyone older than 62, and older people might show higher levels. The variable "age at start" was perhaps the most intriguing: the most frequent age for cholesterol checking was the early 40s, suggesting that many people receive cholesterol checks at this age during routine physicals. Many people begin more extensive health screening around age 40, and cholesterol is a common test.
Conclusion
In conclusion, there was a statistically significant association between weight and cholesterol level, which is consistent with prior medical knowledge. The large sample captures considerable variation, but even so, the shapes of the age and cholesterol graphs show that a common age for the onset of high cholesterol is the mid-40s and that an average level is in the mid-200 range. Also, because all three cholesterol statuses were well represented across ages, people of disparate ages appear vulnerable to high cholesterol. One interesting note is that the highest-frequency values for desirable, borderline, and high cholesterol status (199, 200, and 240, respectively) span only 41 points (240 − 199). This suggests that most people can expect levels within this range unless other factors push their values toward the extremes. Since we rejected the null hypothesis with p-values < .0001, the association between weight and cholesterol is highly significant, and the graphs showed it to be positive. This positive relationship suggests that overweight people should be more mindful of their cholesterol levels regardless of age and have them checked more often. Older people should also have their levels checked frequently.
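The paper does not state the cut-points behind the three status levels. Assuming they follow the NCEP ATP III total-cholesterol categories (desirable below 200 mg/dL, borderline high 200 to 239, high 240 and above), a minimal classifier looks like this; the function name `cholesterol_status` is hypothetical. Under that assumption, the values 199, 200, and 240 sit exactly at the category boundaries, which may explain why they stand out in the status frequencies.

```python
def cholesterol_status(total_chol):
    """Classify total cholesterol (mg/dL) using assumed NCEP ATP III cut-points.

    Assumed cut-points: desirable < 200, borderline high 200-239, high >= 240.
    """
    if total_chol < 200:
        return "Desirable"
    if total_chol < 240:
        return "Borderline"
    return "High"


# Values at and around the category boundaries discussed in the text.
for value in (199, 200, 239, 240):
    print(value, cholesterol_status(value))
```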