BIOSTAT Case Study 1: Exploratory Data Analysis Techniques

BIOSTATISTICS CASE STUDY 1:
Exploratory Data Analysis Techniques
INSTRUCTOR’S VERSION July 30, 2009
Time to Complete Exercise: 30 minutes
LEARNING OBJECTIVES
At the completion of this Case Study, participants should be able to:
• Access TB surveillance data from the CDC Web site
• Generate box-and-whiskers plots, stem-and-leaf diagrams, and histograms
• Generate percentile values and measures of central tendency and dispersion for
  skewed distributions
• Describe the magnitude of the TB incidence (new case) rates in the United States
• Describe the differences in TB incidence rates by sex/gender and state across the
  United States
ASPH A. BIOSTATISTICS COMPETENCIES ADDRESSED IN THIS CASE STUDY
A.5. Apply descriptive techniques commonly used to summarize public health data
A.8. Apply basic informatics techniques with vital statistics and public health records
in the description of public health characteristics and in public health research
and evaluation
ASPH INTERDISCIPLINARY/CROSS-CUTTING COMPETENCIES ADDRESSED IN THIS
CASE STUDY: F. COMMUNICATION AND INFORMATICS
F.8. Use information technology to access, evaluate, and interpret public health data
Please provide your evaluation of the usefulness of this material by clicking here:
http://www.zoomerang.com/Survey/?p=WEB229G2W73FYP
This material was developed by the staff at the Global Tuberculosis Institute
(GTBI), one of four Regional Training and Medical Consultation Centers funded
by the Centers for Disease Control and Prevention. It is published for learning
purposes only.
Case study author(s) name and position:
Marian R. Passannante, PhD
Associate Professor, University of Medicine & Dentistry of New Jersey, New
Jersey Medical School and School of Public Health
Epidemiologist, NJMS, GTBI
For further information please contact:
New Jersey Medical School Global Tuberculosis Institute (GTBI)
225 Warren Street
P.O. Box 1709
Newark, NJ 07101-1709
or by phone at 973-972-9008
Introduction
Control of tuberculosis (TB) in the United States is an important public health
responsibility. Effective TB control requires a complex system that merges elements of
laboratory science, investigative work, public health, surveillance, and clinical care.
The Tuberculosis Information Management System (TIMS) is one example of a public
health surveillance system. TIMS is one of the main sources of descriptive data regarding TB
in the United States. TIMS includes information on all cases of TB that have been reported to
the Division of TB Elimination (DTBE) at the Centers for Disease Control and Prevention
(CDC). This information is reported to CDC by 50 states, the District of Columbia, the city of
New York, Puerto Rico, and other jurisdictions in the Pacific and Caribbean.
Data on person, place, and time relating to TB in the United States are gathered using
TIMS. These data are analyzed and published by the CDC annually and may be
accessed through the CDC Web site in the form of TB Surveillance Reports at:
http://www.cdc.gov/nchstp/tb/surv/Surv.htm and the Online Tuberculosis Information
System (OTIS) at http://wonder.cdc.gov/tb.html. If you were to access OTIS and request
current TB case reports by sex and state for the period 2001-2005, you would obtain the
data below. These are the new TB case rates per 100,000 population for males and
females (person) in the 50 states and the District of Columbia (DC) (place) during the
years 2001 to 2005 (time).
TB Case Rates per 100,000 Population
Place             Females    Males
Alabama               3.4      7.2
Alaska                6.6      9.4
Arizona               3.5      6.6
Arkansas              3.4      6.5
California            7.1     10.6
Colorado              2.1      3.0
Connecticut           2.5      3.6
Delaware              2.7      4.6
DC                    8.2     19.0
Florida               4.3      8.5
Georgia               4.5      7.8
Hawaii                7.9     12.6
Idaho                 0.9      1.1
Illinois              4.1      6.0
Indiana               1.6      2.7
Iowa                  1.2      1.8
Kansas                2.1      3.2
Kentucky              2.1      4.7
Louisiana             3.7      7.9
Maine                 1.2      2.0
Maryland              4.5      6.0
Massachusetts         3.5      5.0
Michigan              2.4      3.2
Minnesota             3.9      4.7
Mississippi           2.9      6.1
Missouri              1.5      3.1
Montana               0.7      2.1
Nebraska              1.5      2.5
Nevada                3.6      5.1
New Hampshire         1.2      1.3
New Jersey            5.0      6.8
New Mexico            2.3      2.8
New York              5.6      9.5
North Carolina        3.2      5.9
North Dakota          0.8      1.0
Ohio                  1.6      2.9
Oklahoma              3.5      6.4
Oregon                2.3      3.9
Pennsylvania          2.2      3.3
Rhode Island          4.0      5.5
South Carolina        4.2      8.2
South Dakota          1.7      2.1
Tennessee             3.4      6.9
Texas                 4.9      9.5
Utah                  1.2      1.7
Vermont               1.5      0.9
Virginia              3.9      4.9
Washington            3.3      5.0
West Virginia         1.0      2.0
Wisconsin             1.2      1.7
Wyoming               0.5      0.7
Exploratory data analysis techniques are often used to organize, summarize, and
describe clinical and epidemiologic data. These techniques include stem-and-leaf
plots and box plots. To make these plots easier to construct, the data appear below
sorted from the lowest to the highest rate, separately for each gender.
Female TB Case Rates per 100,000 Population
 1. Wyoming                 0.5
 2. Montana                 0.7
 3. North Dakota            0.8
 4. Idaho                   0.9
 5. West Virginia           1.0
 6. Iowa                    1.2
 7. Maine                   1.2
 8. New Hampshire           1.2
 9. Utah                    1.2
10. Wisconsin               1.2
11. Missouri                1.5
12. Nebraska                1.5
13. Vermont                 1.5
14. Indiana                 1.6
15. Ohio                    1.6
16. South Dakota            1.7
17. Colorado                2.1
18. Kansas                  2.1
19. Kentucky                2.1
20. Pennsylvania            2.2
21. New Mexico              2.3
22. Oregon                  2.3
23. Michigan                2.4
24. Connecticut             2.5
25. Delaware                2.7
26. Mississippi             2.9
27. North Carolina          3.2
28. Washington              3.3
29. Alabama                 3.4
30. Arkansas                3.4
31. Tennessee               3.4
32. Arizona                 3.5
33. Massachusetts           3.5
34. Oklahoma                3.5
35. Nevada                  3.6
36. Louisiana               3.7
37. Minnesota               3.9
38. Virginia                3.9
39. Rhode Island            4.0
40. Illinois                4.1
41. South Carolina          4.2
42. Florida                 4.3
43. Georgia                 4.5
44. Maryland                4.5
45. Texas                   4.9
46. New Jersey              5.0
47. New York                5.6
48. Alaska                  6.6
49. California              7.1
50. Hawaii                  7.9
51. District of Columbia    8.2
Male TB Case Rates per 100,000 Population
 1. Wyoming                 0.7
 2. Vermont                 0.9
 3. North Dakota            1.0
 4. Idaho                   1.1
 5. New Hampshire           1.3
 6. Utah                    1.7
 7. Wisconsin               1.7
 8. Iowa                    1.8
 9. Maine                   2.0
10. West Virginia           2.0
11. Montana                 2.1
12. South Dakota            2.1
13. Nebraska                2.5
14. Indiana                 2.7
15. New Mexico              2.8
16. Ohio                    2.9
17. Colorado                3.0
18. Missouri                3.1
19. Kansas                  3.2
20. Michigan                3.2
21. Pennsylvania            3.3
22. Connecticut             3.6
23. Oregon                  3.9
24. Delaware                4.6
25. Kentucky                4.7
26. Minnesota               4.7
27. Virginia                4.9
28. Massachusetts           5.0
29. Washington              5.0
30. Nevada                  5.1
31. Rhode Island            5.5
32. North Carolina          5.9
33. Illinois                6.0
34. Maryland                6.0
35. Mississippi             6.1
36. Oklahoma                6.4
37. Arkansas                6.5
38. Arizona                 6.6
39. New Jersey              6.8
40. Tennessee               6.9
41. Alabama                 7.2
42. Georgia                 7.8
43. Louisiana               7.9
44. South Carolina          8.2
45. Florida                 8.5
46. Alaska                  9.4
47. New York                9.5
48. Texas                   9.5
49. California             10.6
50. Hawaii                 12.6
51. District of Columbia   19.0
Question 1
Generate separate stem-and-leaf diagrams of these case rates for males and females and
describe the distribution of these data. (Hint: use the whole number as the stem and the
digit after the decimal point as the leaf.)
Answer Key
Female TB Case Rates per 100,000
19 |
18 |
17 |
16 |
15 |
14 |
13 |
12 |
11 |
10 |
 9 |
 8 | 2
 7 | 19
 6 | 6
 5 | 06
 4 | 0123559
 3 | 234445556799
 2 | 1112334579
 1 | 022222555667
 0 | 5789

Male TB Case Rates per 100,000
19 | 0
18 |
17 |
16 |
15 |
14 |
13 |
12 | 6
11 |
10 | 6
 9 | 455
 8 | 25
 7 | 289
 6 | 00145689
 5 | 00159
 4 | 6779
 3 | 0122369
 2 | 00115789
 1 | 013778
 0 | 79
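The stem-and-leaf displays in this answer key can also be generated programmatically. The following is a minimal Python sketch and is not part of the original case study. The function name stem_and_leaf and its max_stem parameter are illustrative choices. Following the hint, it takes the whole-number part of each rate as the stem and the first decimal digit as the leaf.

```python
from collections import defaultdict

# Sorted TB case rates per 100,000 population, 2001-2005, copied from the case study.
FEMALE = [0.5, 0.7, 0.8, 0.9, 1.0, 1.2, 1.2, 1.2, 1.2, 1.2,
          1.5, 1.5, 1.5, 1.6, 1.6, 1.7, 2.1, 2.1, 2.1, 2.2,
          2.3, 2.3, 2.4, 2.5, 2.7, 2.9, 3.2, 3.3, 3.4, 3.4,
          3.4, 3.5, 3.5, 3.5, 3.6, 3.7, 3.9, 3.9, 4.0, 4.1,
          4.2, 4.3, 4.5, 4.5, 4.9, 5.0, 5.6, 6.6, 7.1, 7.9, 8.2]
MALE = [0.7, 0.9, 1.0, 1.1, 1.3, 1.7, 1.7, 1.8, 2.0, 2.0,
        2.1, 2.1, 2.5, 2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.2,
        3.3, 3.6, 3.9, 4.6, 4.7, 4.7, 4.9, 5.0, 5.0, 5.1,
        5.5, 5.9, 6.0, 6.0, 6.1, 6.4, 6.5, 6.6, 6.8, 6.9,
        7.2, 7.8, 7.9, 8.2, 8.5, 9.4, 9.5, 9.5, 10.6, 12.6, 19.0]


def stem_and_leaf(rates, max_stem=None):
    """Return display rows, highest stem first (stem = whole number, leaf = first decimal)."""
    leaves = defaultdict(list)
    for rate in sorted(rates):
        tenths = round(rate * 10)        # 4.7 -> 47; rounding guards against float error
        stem, leaf = divmod(tenths, 10)  # 47 -> stem 4, leaf 7
        leaves[stem].append(str(leaf))
    # Never drop data: the top stem is at least the largest stem present.
    top = max(max(leaves), max_stem or 0)
    return [f"{stem:>2} | {''.join(leaves[stem])}" for stem in range(top, -1, -1)]


# A common top stem (19) puts both displays on the same scale, as in the answer key.
for title, rates in (("Female", FEMALE), ("Male", MALE)):
    print(f"{title} TB Case Rates per 100,000")
    print("\n".join(stem_and_leaf(rates, max_stem=19)))
    print()
```

Printing both displays on the same scale makes the longer upper tail of the male rates easy to see next to the female rates.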
Question 2
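Describe the distributions. Are they normally distributed, skewed to the right, or skewed
to the left?
Answer Key
Both distributions are skewed to the right: most locations have low rates, and a few
locations with much higher rates form a long upper tail.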
Question 3
What is the median TB case rate among females and among males? What are the 25th and
75th percentile values? The interquartile (IQ) range? The range?
Answer Key
             Females    Males
Median           2.9      4.7
25%              1.5      2.5
75%              4.0      6.8
IQ range         2.5      4.3
Range            7.7     18.3
Histogram, box plots and summary statistics in Instructor’s Version were produced using JMP 7.0,
SAS Institute Inc.
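The Question 3 summary statistics can be checked with a short Python sketch (standard library only, Python 3.8 or later). This is an illustration, not the JMP procedure used for the answer key. Percentile conventions differ between packages; the "exclusive" method assumed here places the p-th percentile at position p(n + 1) in the sorted data, which for n = 51 lands exactly on the 13th, 26th, and 39th values. The helper name summarize is hypothetical.

```python
import statistics

# Sorted TB case rates per 100,000 population, 2001-2005, copied from the case study.
FEMALE = [0.5, 0.7, 0.8, 0.9, 1.0, 1.2, 1.2, 1.2, 1.2, 1.2,
          1.5, 1.5, 1.5, 1.6, 1.6, 1.7, 2.1, 2.1, 2.1, 2.2,
          2.3, 2.3, 2.4, 2.5, 2.7, 2.9, 3.2, 3.3, 3.4, 3.4,
          3.4, 3.5, 3.5, 3.5, 3.6, 3.7, 3.9, 3.9, 4.0, 4.1,
          4.2, 4.3, 4.5, 4.5, 4.9, 5.0, 5.6, 6.6, 7.1, 7.9, 8.2]
MALE = [0.7, 0.9, 1.0, 1.1, 1.3, 1.7, 1.7, 1.8, 2.0, 2.0,
        2.1, 2.1, 2.5, 2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.2,
        3.3, 3.6, 3.9, 4.6, 4.7, 4.7, 4.9, 5.0, 5.0, 5.1,
        5.5, 5.9, 6.0, 6.0, 6.1, 6.4, 6.5, 6.6, 6.8, 6.9,
        7.2, 7.8, 7.9, 8.2, 8.5, 9.4, 9.5, 9.5, 10.6, 12.6, 19.0]


def summarize(rates):
    """Median, quartiles, IQ range, and range, rounded to one decimal place."""
    # Assumption: quartiles at position p * (n + 1), i.e. the "exclusive" method.
    q1, median, q3 = statistics.quantiles(rates, n=4, method="exclusive")
    return {
        "median": round(median, 1),
        "q1": round(q1, 1),
        "q3": round(q3, 1),
        "iqr": round(q3 - q1, 1),
        "range": round(max(rates) - min(rates), 1),
    }


for label, rates in (("Females", FEMALE), ("Males", MALE)):
    print(label, summarize(rates))
```

Under this convention the results agree with the answer key. A different convention, such as linear interpolation between positions 1 and n, can give slightly different quartiles for the same data, so small discrepancies between software packages are not necessarily errors.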
Question 4
Draw/generate a histogram and a box-and-whiskers plot describing the rates for males and
females. Which states/locations have unusually high or low (outlier) rates?
[Figure: "Distributions" output with two panels, "Female TB case rate per 100,000
population" and "Male TB case rate per 100,000 population", each showing a histogram
(count axis) and a box-and-whiskers plot.]
Locations with unusually high (outlier) rates:
Females: Hawaii and DC
Males: DC
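The case study does not state the rule used to flag outliers. A common box-plot convention, assumed in the sketch below, flags any value more than 1.5 × IQ range below the 25th percentile or above the 75th percentile. The Python code is illustrative only: the name outliers is hypothetical, and the quartiles use the same p(n + 1) position convention that reproduces the Question 3 answer key.

```python
import statistics

# (female, male) TB case rates per 100,000 population, 2001-2005, from the case study.
RATES = {
    "Alabama": (3.4, 7.2), "Alaska": (6.6, 9.4), "Arizona": (3.5, 6.6),
    "Arkansas": (3.4, 6.5), "California": (7.1, 10.6), "Colorado": (2.1, 3.0),
    "Connecticut": (2.5, 3.6), "Delaware": (2.7, 4.6), "DC": (8.2, 19.0),
    "Florida": (4.3, 8.5), "Georgia": (4.5, 7.8), "Hawaii": (7.9, 12.6),
    "Idaho": (0.9, 1.1), "Illinois": (4.1, 6.0), "Indiana": (1.6, 2.7),
    "Iowa": (1.2, 1.8), "Kansas": (2.1, 3.2), "Kentucky": (2.1, 4.7),
    "Louisiana": (3.7, 7.9), "Maine": (1.2, 2.0), "Maryland": (4.5, 6.0),
    "Massachusetts": (3.5, 5.0), "Michigan": (2.4, 3.2), "Minnesota": (3.9, 4.7),
    "Mississippi": (2.9, 6.1), "Missouri": (1.5, 3.1), "Montana": (0.7, 2.1),
    "Nebraska": (1.5, 2.5), "Nevada": (3.6, 5.1), "New Hampshire": (1.2, 1.3),
    "New Jersey": (5.0, 6.8), "New Mexico": (2.3, 2.8), "New York": (5.6, 9.5),
    "North Carolina": (3.2, 5.9), "North Dakota": (0.8, 1.0), "Ohio": (1.6, 2.9),
    "Oklahoma": (3.5, 6.4), "Oregon": (2.3, 3.9), "Pennsylvania": (2.2, 3.3),
    "Rhode Island": (4.0, 5.5), "South Carolina": (4.2, 8.2), "South Dakota": (1.7, 2.1),
    "Tennessee": (3.4, 6.9), "Texas": (4.9, 9.5), "Utah": (1.2, 1.7),
    "Vermont": (1.5, 0.9), "Virginia": (3.9, 4.9), "Washington": (3.3, 5.0),
    "West Virginia": (1.0, 2.0), "Wisconsin": (1.2, 1.7), "Wyoming": (0.5, 0.7),
}


def outliers(by_place):
    """Places whose rate lies beyond the 1.5 x IQR fences (an assumed convention)."""
    q1, _, q3 = statistics.quantiles(list(by_place.values()), n=4, method="exclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return sorted(place for place, rate in by_place.items() if rate < low or rate > high)


female = {place: f for place, (f, m) in RATES.items()}
male = {place: m for place, (f, m) in RATES.items()}
print("Females:", outliers(female))
print("Males:", outliers(male))
```

Under these assumptions the upper fences are 7.75 for females and 13.25 for males, so the flagged locations are Hawaii and DC for females and DC for males. The lower fences are negative, so no location has an unusually low rate.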
Question 5
Describe the differences in the TB case rates for males and females.
Answer Key
• Males tend to have higher TB case rates than females in this time period across
  states in the US.
• Both distributions are skewed toward high rates, with DC having unusually high TB
  case rates for both males and females.
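The first point can be checked location by location. The short Python sketch below is illustrative and not part of the original case study; it simply counts the locations where the male rate exceeds the female rate, using variable names chosen here for the example.

```python
# (female, male) TB case rates per 100,000 population, 2001-2005, from the case study.
RATES = {
    "Alabama": (3.4, 7.2), "Alaska": (6.6, 9.4), "Arizona": (3.5, 6.6),
    "Arkansas": (3.4, 6.5), "California": (7.1, 10.6), "Colorado": (2.1, 3.0),
    "Connecticut": (2.5, 3.6), "Delaware": (2.7, 4.6), "DC": (8.2, 19.0),
    "Florida": (4.3, 8.5), "Georgia": (4.5, 7.8), "Hawaii": (7.9, 12.6),
    "Idaho": (0.9, 1.1), "Illinois": (4.1, 6.0), "Indiana": (1.6, 2.7),
    "Iowa": (1.2, 1.8), "Kansas": (2.1, 3.2), "Kentucky": (2.1, 4.7),
    "Louisiana": (3.7, 7.9), "Maine": (1.2, 2.0), "Maryland": (4.5, 6.0),
    "Massachusetts": (3.5, 5.0), "Michigan": (2.4, 3.2), "Minnesota": (3.9, 4.7),
    "Mississippi": (2.9, 6.1), "Missouri": (1.5, 3.1), "Montana": (0.7, 2.1),
    "Nebraska": (1.5, 2.5), "Nevada": (3.6, 5.1), "New Hampshire": (1.2, 1.3),
    "New Jersey": (5.0, 6.8), "New Mexico": (2.3, 2.8), "New York": (5.6, 9.5),
    "North Carolina": (3.2, 5.9), "North Dakota": (0.8, 1.0), "Ohio": (1.6, 2.9),
    "Oklahoma": (3.5, 6.4), "Oregon": (2.3, 3.9), "Pennsylvania": (2.2, 3.3),
    "Rhode Island": (4.0, 5.5), "South Carolina": (4.2, 8.2), "South Dakota": (1.7, 2.1),
    "Tennessee": (3.4, 6.9), "Texas": (4.9, 9.5), "Utah": (1.2, 1.7),
    "Vermont": (1.5, 0.9), "Virginia": (3.9, 4.9), "Washington": (3.3, 5.0),
    "West Virginia": (1.0, 2.0), "Wisconsin": (1.2, 1.7), "Wyoming": (0.5, 0.7),
}

# Locations where the male rate is strictly higher than the female rate.
male_higher = [place for place, (f, m) in RATES.items() if m > f]
exceptions = sorted(set(RATES) - set(male_higher))

print(f"Male rate higher in {len(male_higher)} of {len(RATES)} locations")
print("Exceptions:", exceptions)
```

With these data the male rate is higher in 50 of the 51 locations; Vermont is the only exception.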