1.1 Displaying Distributions with Graphs Ulrich Hoensch Tuesday, January 8, 2013

advertisement
1.1 Displaying Distributions with Graphs
Ulrich Hoensch
Tuesday, January 8, 2013
Data and Variables
Statistics is the science of collecting, organizing, and interpreting
data.
Example 1 The serum levels of of high-density lipoprotein (HDL)
cholesterol (in mg/dL) of 26 subjects are shown here. (Source:
Jekel et al., Epidemiology, Biostatistics, and Preventive Medicine.)
31, 41, 44, 46, 47, 47, 48, 48, 49, 52, 53, 54, 57,
58, 58, 60, 60, 62, 63, 64, 67, 69, 70, 77, 81, 90
I
The numbers in this list are the data;
I
the mathematical object that generates this list is called a
variable. A variable takes different values for different
individuals.
Categorical and Quantitative Variables
I
A categorical variable places an individual into one of two or
more groups or categories.
I
A quantitative variable takes numerical values for which
arithmetic operations such as adding and averaging make
sense.
Note: There are variables that take numerical values, but are not
quantitative (e.g. ZIP codes, jersey numbers); conversely, some
categorical variables may be converted to quantitative variables by
a meaningful coding (e.g. customer satisfaction ratings).
Example 2
I
The variables ID, Grade, Gender, PrevStat are categorical;
I
the variables TotalPoints and Year are quantitative;
I
the variable Grade may be converted to a quantitative variable
by the numerical coding A=4.0, B=3.0, etc.;
I
the variable ID is numerical, but not quantitative.
Displaying Distributions
The distribution of a variable tells us what values it takes and
how often it takes these values.
A distribution may be given in the form of a table, which is called
a frequency distribution. The frequencies in this table can be
expressed using absolute frequencies (counts) or relative
frequencies (percentages).
Example 3
Here is the frequency distribution of the highest level of education
for people aged 25 years and over in the U.S. (Source:
www.census.gov, 2011 CPS)
Education
Less than high school
High school graduate
Some college
Associate degree
Bachelor’s degree
Advanced degree
Count
25,040
61,911
34,203
19,047
39,286
22,057
Percent
12.4%
30.7%
17.0%
9.5%
19.5%
10.9%
The two most common graphs that are used to display data of this
type are bar graphs and pie charts.
Example 3, bar graph
Example 3, pie chart
Stemplots (Stem-and-Leaf Plots)
To make a stemplot:
1. Separate each observation into a stem consisting of all but
the final (rightmost) digit and a leaf, the final digit. Stems
may have as many digits as needed, but each leaf contains
only a single digit.
2. Write the stems in a vertical column with the smallest at the
top, and draw a vertical line at the right of this column.
3. Write each leaf in the row to the right of its stem, in
increasing order out from the stem.
Example 4: Vitamin D in Adolescents
The following show data of vitamin D values (in ng/ml of blood)
for 20 girls aged 11 to 14 years, and 20 boys aged 11 to 14 years.
I
Girls:
16 43 38 48 42 23 36 35 37 34
25 28 26 43 51 33 40 35 41 42
I
Boys:
18 28 28 28 37 31 24 29 8 27
24 12 21 32 27 24 23 33 31 29
Source: Moore/McCabe/Craig, Introduction to the Practice of
Statistics, 7th Edition.
Example 4, Developing the Stemplot
The stemplot for the vitamin D values of girls: (a) writing the
stems; (b) writing the leaves; (c) sorting the leaves.
Example 4, Back-to-back Stemplot
Example 4, Back-to-back Stemplot with Split Stems
Creating Bar Graphs in MS Excel
Consider the following frequency data obtained from 50 teenage
females and 50 teenage males.
We want to create a bar graph that displays the five categories,
with the frequencies for females and males displayed separately in
side-by-side bars. First, click on the “Insert” tab, and select
“Column”. Select the type of pie chart you want to use. We will
use a regular 2-D column (bar) chart here.
Creating Bar Graphs in MS Excel
Creating Bar Graphs in MS Excel
Then, click on “Select Data”, and click and drag the cursor over
the column that contains the frequencies for both males and
females. Include the labels (Female/Male) into the selected cell
range.
Creating Bar Graphs in MS Excel
Now, click on the “Edit” button under the “Horizontal (Category)
Axis Labels” panel. Select the column that contains the labels.
Click “Ok”, and “Ok” again. The following bar chart results.
Creating Bar Graphs in MS Excel
Creating Pie Charts in MS Excel
Consider the following frequency distribution for the number of
sick days of workers.
Creating Pie Charts in MS Excel
To create a pie chart, click on the “Insert” tab, and select “Pie”.
Select the type of pie chart you want to use. We will use a regular
2-D pie chart here.
Creating Pie Charts in MS Excel
Then, click on “Select Data”, and click and drag the cursor over
the column that contains the frequencies.
Creating Pie Charts in MS Excel
Now, click on the “Edit” button under the “Horizontal (Category)
Axis Labels” panel. Select the column that contains the labels.
Creating Pie Charts in MS Excel
Click “Ok”, and “Ok” again. This gives a usable pie chart. We
modify the pie chart to show the percentages and the labels, as
follows. Right-click on the pie chart itself (not the surrounding
white space).
Creating Pie Charts in MS Excel
Select “Add Data Labels”, then right-click on the pie chart again,
and select “Format Data Labels. . . ”
Creating Pie Charts in MS Excel
Check the “Percentage” box and the “Category Name” box, and
uncheck the “Value” box.
Creating Pie Charts in MS Excel
Click “Ok”. Finally, we want to remove the legend on the right of
the graph. Simply click on it, and press the DELETE key. The
following pie chart results.
Creating Pie Charts in MS Excel
To include a title, click on the white space surrounding the pie
chart, and click on the left-most template under “Chart Layouts”,
and simply edit the title by clicking on it.
Creating Pie Charts in MS Excel
The final product should look like this.
Download