Short Course: Biostatistics in Practice
Fall 2015
HW #1 (total 40 pts, due by Oct 6)

Last Name: ____________________ First Name: _____________________

1. Suppose a physician wants to investigate whether there is a higher risk of lung cancer among patients who currently have asthma. The following variables are the data he/she collected from the chart review. (total 20 pts)

(1) Please describe the type of each of the following variables. (1 pt each, total 12 pts)

a. The number of cigarettes smoked per day (e.g., 0, 1, 2, …, 24): count data
b. Blood type (e.g., A, B, AB, O): nominal data
c. Gender (e.g., Male, Female): nominal data (binary)
d. Weight in pounds: continuous data
e. Whether or not a patient has lung cancer (e.g., Yes, No): nominal data (binary)
f. The duration of asthma in years (e.g., 3.7 years, 5.1 years, 10.8 years, …): continuous data
g. Whether or not a patient has asthma (e.g., Yes, No): nominal data (binary)
h. The severity of asthma (e.g., mild, moderate, severe, very severe): ordinal data
i. The Body Mass Index (BMI): continuous data
j. Education level (e.g., high school degree, college degree, graduate degree): ordinal data
k. Race (e.g., Caucasian, African American, Hispanic, Asian, Other): nominal data
l. Age in years: continuous data

(2) Among the variables listed above, which are binary data? (2 pts)

- Gender (c), whether or not a patient has lung cancer (e), whether or not a patient has asthma (g)

(3) If you were a research associate helping him/her summarize the data, how would you summarize it? Indicate the variables for which you would provide the sample mean and the standard deviation rather than the percentage, and the variables for which you would provide the percentage rather than the sample mean. Please provide the reasoning for your choice. (total 6 pts)
Means and standard deviations are usually used for quantitative data (count, continuous), while percentages are usually used for qualitative data (nominal, ordinal). Thus, I would use means and standard deviations to summarize the number of cigarettes smoked per day, weight, the duration of asthma, BMI, and age (a, d, f, i, l). I would use percentages for blood type, gender, whether or not a patient has lung cancer, whether or not a patient has asthma, the severity of asthma, education level, and race (b, c, e, g, h, j, k).

2. An experiment was conducted at the University of California-Berkeley to study the effect of psychological environment on the anatomy of the brain. A group of 19 rats was randomly divided into two groups. Animals in the treatment group lived together in a large cage furnished with playthings that were changed daily; animals in the control group lived in isolation with no toys. After a month, the experimental animals were sacrificed and dissected to obtain the cortex weights (the cortex is the thinking part of the brain) in milligrams. The following plot shows the distribution of the cortex weight by group. (total 10 pts)

(1) What study design did this study adopt? (2 pts) Answer: _____d____

a. Prospective longitudinal
b. Case-Control
c. Cross-Sectional
d. Randomized-Control
e. None of the above

(2) What is the name of the plot above? (2 pts) Answer: Box-and-whisker plot

(3) The following statements are based on the side-by-side box plot. Circle T (true) or F (false) for each statement. Each question is worth 2 points.

(a) T F The median cortex weight for the treatment group is greater than that for the control group.
(b) T F The variability of the cortex weights for the control group is greater than that for the treatment group.
(The range of the cortex weights for the treatment group is wider; thus, more variability exists in the treatment group.)
(c) T F There is one outlier (or an extreme value) in the control group.

4. Download the data file SURVEY.SAV and draw the histogram using the variable STAYMINUTES. Copy and paste that histogram in the space below. (5 points)

[Histogram of STAYMINUTES. x-axis: Minutes in Clinic (0 to 1,500); left y-axis: Count (0 to 20); right y-axis: Proportion per Bar (0.0 to 0.2)]

5. Using MYSTAT, compute the mean, the median, and the standard deviation of the variable STAYMINUTES. (5 points)

                      Minutes in Clinic
N of Cases                       79
Minimum                      13.000
Maximum                   1,400.000
Median                      465.000
Arithmetic Mean             473.443
Standard Deviation          323.232
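
For readers without MYSTAT, the same summary statistics can be sketched in Python. This is only an illustration, not the method used for the answer above: the values in stay_minutes are hypothetical and are NOT taken from SURVEY.SAV, the helper name summarize is made up for this example, and the sample standard deviation (n - 1 denominator) is assumed to be the quantity reported.

```python
import statistics

# Hypothetical stay times in minutes; NOT the values from SURVEY.SAV.
stay_minutes = [13, 120, 240, 465, 480, 600, 1400]


def summarize(values):
    """Return N, minimum, maximum, median, mean, and sample SD of values."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        # statistics.stdev divides by n - 1 (sample standard deviation).
        "sd": statistics.stdev(values),
    }


summary = summarize(stay_minutes)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

With the real STAYMINUTES column loaded into a list, the same function would produce the statistics in the table above. Comparing the mean with the median is also a quick check for skewness: a mean larger than the median suggests a right-skewed distribution.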