1.1 Displaying Distributions with Graphs
Ulrich Hoensch
Tuesday, January 8, 2013

Data and Variables

Statistics is the science of collecting, organizing, and interpreting data.

Example 1
The serum levels of high-density lipoprotein (HDL) cholesterol (in mg/dL) of 26 subjects are shown here. (Source: Jekel et al., Epidemiology, Biostatistics, and Preventive Medicine.)

31, 41, 44, 46, 47, 47, 48, 48, 49, 52, 53, 54, 57, 58, 58, 60, 60, 62, 63, 64, 67, 69, 70, 77, 81, 90

• The numbers in this list are the data;
• the mathematical object that generates this list is called a variable.

A variable takes different values for different individuals.

Categorical and Quantitative Variables

• A categorical variable places an individual into one of two or more groups or categories.
• A quantitative variable takes numerical values for which arithmetic operations such as adding and averaging make sense.

Note: Some variables take numerical values but are not quantitative (e.g. ZIP codes, jersey numbers); conversely, some categorical variables may be converted to quantitative variables by a meaningful coding (e.g. customer satisfaction ratings).

Example 2
• The variables ID, Grade, Gender, PrevStat are categorical;
• the variables TotalPoints and Year are quantitative;
• the variable Grade may be converted to a quantitative variable by the numerical coding A=4.0, B=3.0, etc.;
• the variable ID is numerical, but not quantitative.

Displaying Distributions

The distribution of a variable tells us what values it takes and how often it takes these values. A distribution may be given in the form of a table, which is called a frequency distribution. The frequencies in this table can be expressed as absolute frequencies (counts) or relative frequencies (percentages).

Example 3
Here is the frequency distribution of the highest level of education for people aged 25 years and over in the U.S.
(Source: www.census.gov, 2011 CPS)

Education                 Count     Percent
Less than high school     25,040    12.4%
High school graduate      61,911    30.7%
Some college              34,203    17.0%
Associate degree          19,047     9.5%
Bachelor’s degree         39,286    19.5%
Advanced degree           22,057    10.9%

The two graphs most commonly used to display data of this type are bar graphs and pie charts.

[Figure: Example 3, bar graph]
[Figure: Example 3, pie chart]

Stemplots (Stem-and-Leaf Plots)

To make a stemplot:
1. Separate each observation into a stem consisting of all but the final (rightmost) digit and a leaf, the final digit. Stems may have as many digits as needed, but each leaf contains only a single digit.
2. Write the stems in a vertical column with the smallest at the top, and draw a vertical line at the right of this column.
3. Write each leaf in the row to the right of its stem, in increasing order out from the stem.

Example 4: Vitamin D in Adolescents
The following data are vitamin D values (in ng/ml of blood) for 20 girls and 20 boys, all aged 11 to 14 years.

• Girls: 16 43 38 48 42 23 36 35 37 34 25 28 26 43 51 33 40 35 41 42
• Boys: 18 28 28 28 37 31 24 29 8 27 24 12 21 32 27 24 23 33 31 29

Source: Moore/McCabe/Craig, Introduction to the Practice of Statistics, 7th Edition.

Example 4, Developing the Stemplot
The stemplot for the vitamin D values of girls: (a) writing the stems; (b) writing the leaves; (c) sorting the leaves.

[Figure: Example 4, Back-to-back Stemplot]
[Figure: Example 4, Back-to-back Stemplot with Split Stems]

Creating Bar Graphs in MS Excel

Consider the following frequency data obtained from 50 teenage females and 50 teenage males. We want to create a bar graph that displays the five categories, with the frequencies for females and males displayed separately in side-by-side bars.

First, click on the “Insert” tab, and select “Column”. Select the type of column chart you want to use. We will use a regular 2-D column (bar) chart here.
Then, click on “Select Data”, and click and drag the cursor over the columns that contain the frequencies for both males and females. Include the labels (Female/Male) in the selected cell range.

Now, click on the “Edit” button under the “Horizontal (Category) Axis Labels” panel. Select the column that contains the category labels. Click “OK”, and “OK” again. The following bar chart results.

Creating Pie Charts in MS Excel

Consider the following frequency distribution for the number of sick days of workers.

To create a pie chart, click on the “Insert” tab, and select “Pie”. Select the type of pie chart you want to use. We will use a regular 2-D pie chart here.

Then, click on “Select Data”, and click and drag the cursor over the column that contains the frequencies.

Now, click on the “Edit” button under the “Horizontal (Category) Axis Labels” panel. Select the column that contains the labels.

Click “OK”, and “OK” again. This gives a usable pie chart. We modify the pie chart to show the percentages and the labels, as follows. Right-click on the pie chart itself (not the surrounding white space).

Select “Add Data Labels”, then right-click on the pie chart again, and select “Format Data Labels...”.

Check the “Percentage” box and the “Category Name” box, and uncheck the “Value” box.

Click “OK”. Finally, we want to remove the legend on the right of the graph. Simply click on it, and press the DELETE key. The following pie chart results.
To include a title, click on the white space surrounding the pie chart, click on the left-most template under “Chart Layouts”, and then edit the title by clicking on it.

The final product should look like this.
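
Stemplots in code (supplementary sketch)

The three stemplot steps given earlier in this section (separate stem and leaf, list the stems in increasing order, sort the leaves) can also be carried out programmatically. The following is a minimal Python sketch and is not part of the original slides. The function name `stemplot` is hypothetical, and the sketch assumes non-negative whole-number observations such as the vitamin D values for girls in Example 4. As one design choice, it also prints stems that have no leaves, so that gaps in the data remain visible.

```python
from collections import defaultdict


def stemplot(values):
    """Return the rows of a stemplot as strings such as '3 | 3455678'.

    Assumes non-negative integers (an assumption of this sketch).
    """
    leaves = defaultdict(list)
    for v in values:
        # Step 1: the stem is all but the final digit, the leaf is the final digit.
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)

    rows = []
    # Step 2: list the stems from smallest to largest, including empty stems.
    for stem in range(min(leaves), max(leaves) + 1):
        # Step 3: write the leaves in increasing order out from the stem.
        row = "".join(str(d) for d in sorted(leaves[stem]))
        rows.append(f"{stem} | {row}")
    return rows


# Vitamin D values (ng/ml) for the girls in Example 4.
girls = [16, 43, 38, 48, 42, 23, 36, 35, 37, 34,
         25, 28, 26, 43, 51, 33, 40, 35, 41, 42]

for line in stemplot(girls):
    print(line)
```

For the girls' data this prints five rows, one for each of the stems 1 through 5, with the leaves of each row already sorted as in part (c) of the slide on developing the stemplot.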