Short Course: Biostatistics in Practice Fall 2015
HW #1 (total 40 pts, Due by Oct 6)
Last Name: ____________________
First Name: _____________________
1. Suppose a physician wants to investigate whether there is a higher risk of lung cancer among patients who currently
have asthma. The following variables are the data he/she collected from the chart review. (total 20 pts)
(1) Please describe the type of each of the following variables. (1 pt each, total 12 pts)
a. The number of cigarettes smoked per day (e.g., 0, 1, 2, …, 24)
: count data
b. The blood type (e.g., A, B, AB, O)
: nominal data
c. Gender (e.g., Male, Female)
: nominal data (binary)
d. Weight in pounds
: continuous data
e. Whether or not a patient has lung cancer (e.g., Yes, No)
: nominal data (binary)
f. The duration of having asthma in years (e.g., 3.7 years, 5.1 years, 10.8 years, …)
: continuous data
g. Whether or not a patient has asthma (e.g., Yes, No)
: nominal data (binary)
h. The severity of asthma status (e.g., mild, moderate, severe, very severe)
: ordinal data
i. The Body Mass Index (BMI)
: continuous data
j. The education level (e.g., high school degree, college degree, graduate degree)
: ordinal data
k. Race (e.g., Caucasian, African American, Hispanic, Asian, Other)
: nominal data
l. Age in years
: continuous data
(2) Among the variables listed above, which are binary data? (2 pts)
- gender (c), whether or not a patient has lung cancer (e), whether or not a patient has asthma (g)
(3) If you were a research associate helping him/her to summarize the data, how would you summarize the
data? Indicate for which variables you would provide the sample mean and the standard deviation rather than the
percentage. Indicate for which variables you would provide the percentage rather than the sample mean. Please
provide the reasoning for your choice. (total 6 pts)
Means and standard deviations are usually used for quantitative data (count, continuous), while
percentages are usually used for qualitative data (nominal, ordinal).
Thus, I would use means and standard deviations to summarize the number of cigarettes smoked per day, weight, the
duration of having asthma, BMI, and age (a, d, f, i, l). I would use percentages for blood
type, gender, whether or not a patient has lung cancer, whether or not a patient has asthma, the severity of
asthma status, education level, and race (b, c, e, g, h, j, k).
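The rule above (mean and standard deviation for quantitative variables, percentages for qualitative ones) can be sketched in a few lines of Python. This is only an illustration: the records, the variable names (age, weight_lb, gender, severity), and the helper functions are invented here and do not come from the chart review.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical chart-review records, invented for illustration only.
records = [
    {"age": 54, "weight_lb": 180.0, "gender": "Male", "severity": "mild"},
    {"age": 61, "weight_lb": 150.0, "gender": "Female", "severity": "severe"},
    {"age": 47, "weight_lb": 165.0, "gender": "Female", "severity": "mild"},
    {"age": 58, "weight_lb": 205.0, "gender": "Male", "severity": "moderate"},
]

QUANTITATIVE = ["age", "weight_lb"]   # summarized by mean and SD
QUALITATIVE = ["gender", "severity"]  # summarized by percentages


def summarize_quantitative(rows, name):
    """Return (sample mean, sample standard deviation) of one variable."""
    values = [row[name] for row in rows]
    return mean(values), stdev(values)


def summarize_qualitative(rows, name):
    """Return the percentage of records falling in each category."""
    counts = Counter(row[name] for row in rows)
    total = sum(counts.values())
    return {level: 100.0 * n / total for level, n in counts.items()}


for name in QUANTITATIVE:
    m, s = summarize_quantitative(records, name)
    print(f"{name}: mean = {m:.1f}, SD = {s:.1f}")

for name in QUALITATIVE:
    for level, pct in summarize_qualitative(records, name).items():
        print(f"{name} = {level}: {pct:.1f}%")
```

A mean of a nominal variable such as blood type has no interpretation, which is why the two kinds of variables are routed to different summaries.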
2. An experiment was conducted at the University of California-Berkeley to study the effect of psychological
environment on the anatomy of the brain. A group of 19 rats was randomly divided into two groups. Animals in
the treatment group lived together in a large cage furnished with playthings that were changed daily; animals in
the control group lived in isolation with no toys. After a month, the experimental animals were sacrificed and
dissected to obtain the weight of the cortex (the thinking part of the brain) in milligrams. The following plot shows
the distribution of cortex weight by group. (total 10 pts)
(1) What study design did this study adopt? (2 pts) Answer: d
a. Prospective longitudinal
b. Case-Control
c. Cross-Sectional
d. Randomized-Control
e. None of the above
(2) What is the name of the plot above? (2 pts)
Answer: Box-and-whisker plot (box plot)
(3) The following statements are based on the side-by-side box plot. Circle T (true) or F (false) for each
statement. Each statement is worth 2 points.
(a) T F The median cortex weight for the treatment group is greater than that for the control group.
(b) T F The variability of the cortex weights for the control group is greater than that for the treatment group.
(The range of the cortex weights is wider for the treatment group; thus, more variability
exists in the treatment group.)
(c) T F There is one outlier (or extreme value) in the control group.
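A box plot is drawn from the five-number summary (minimum, first quartile, median, third quartile, maximum). The sketch below shows one way to compute it and to flag outliers. It assumes the common 1.5 × IQR rule, which the homework itself does not state, and the weights are invented values, not the data from the experiment.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using inclusive quartiles."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


def flag_outliers(values):
    """Return values outside Q1 - 1.5*IQR and Q3 + 1.5*IQR (assumed rule)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical cortex weights in mg, invented for illustration only.
weights = [640, 650, 655, 660, 662, 665, 670, 675, 720]

print(five_number_summary(weights))
print(flag_outliers(weights))
```

Statistical packages differ in how they compute quartiles, so the box edges and the flagged points may vary slightly between programs.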
4. Download the data file: SURVEY.SAV and draw the histogram using the variable: STAYMINUTES. Copy
and paste that histogram in the space below. (5 points)
[Histogram of STAYMINUTES. x-axis: Minutes in Clinic (0 to 1,500); left y-axis: Count (0 to 20); right y-axis: Proportion per Bar (0.0 to 0.2)]
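The two vertical axes of the histogram show the same bars on different scales: the proportion per bar is the count in that bar divided by the number of cases. A minimal sketch of that relationship follows; the values, the bin width of 250 minutes, and the function name are assumptions made for illustration and are not taken from SURVEY.SAV or MYSTAT.

```python
def histogram(values, bin_width, start=0):
    """Return (left edge, count, proportion) for each non-empty bin."""
    counts = {}
    for v in values:
        # Left edge of the bin [left, left + bin_width) that contains v.
        left = start + ((v - start) // bin_width) * bin_width
        counts[left] = counts.get(left, 0) + 1
    n = len(values)
    return [(left, counts[left], counts[left] / n) for left in sorted(counts)]


# Hypothetical stay times in minutes, invented for illustration only.
stay_minutes = [13, 45, 120, 130, 260, 300, 465, 470, 900, 1400]

for left, count, proportion in histogram(stay_minutes, bin_width=250):
    print(f"{left:>5} to {left + 250:>5}: count = {count}, proportion = {proportion:.2f}")
```

Empty bins are simply omitted from the returned list in this sketch.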
5. Using MYSTAT, compute the mean, the median, and the standard deviation of the variable:
STAYMINUTES. (5 points)
                      Minutes in Clinic
N of Cases                    79
Minimum                       13.000
Maximum                    1,400.000
Median                       465.000
Arithmetic Mean              473.443
Standard Deviation           323.232
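For readers without MYSTAT, the same three statistics can be computed with Python's standard library. The sketch below uses invented stay times, so its results will not match the table above; it only shows which functions correspond to which row.

```python
from statistics import mean, median, stdev

# Hypothetical stay times in minutes, invented for illustration only.
stay_minutes = [13, 45, 120, 130, 260, 300, 465, 470, 900, 1400]

summary = {
    "N of Cases": len(stay_minutes),
    "Minimum": min(stay_minutes),
    "Maximum": max(stay_minutes),
    "Median": median(stay_minutes),
    "Arithmetic Mean": mean(stay_minutes),
    # stdev() is the sample standard deviation (n - 1 denominator).
    "Standard Deviation": stdev(stay_minutes),
}

for label, value in summary.items():
    print(f"{label:<20}{value:>12.3f}")
```

When the mean is noticeably larger than the median, as in these invented values, the distribution is skewed to the right, and the median is usually the better description of a typical stay.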