Data Structure
Learning Objectives: The student will be able to
- recognize and distinguish among data types
- select appropriate visualization based on the data types
Knowledge and Skills
- data characterization
Key words: variable, independent, dependent, numeric, categorical, discrete, continuous, nominal, ordinal
Prerequisites
- Reading tables and graphs
Preparation: Read page 5 in the White Paper Effectively Communicating Numbers by S. Few.
Citation: Neuhauser, C. Data Structure.
Created: August 3, 2009 Revisions:
Copyright: © 2009 Neuhauser. This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial Share Alike License, which
permits unrestricted use, distribution, and reproduction in any medium, and allows others to translate, make remixes, and produce new stories based on this work, provided the
original author and source are credited and the new work will carry the same license.
Funding: This work was partially supported by a HHMI Professors grant from the Howard Hughes Medical Institute.
A variable is any characteristic or attribute that differs among subjects. Examples include income, age, height, and mortality rate. We distinguish between independent and dependent variables: a variable is independent if the investigator manipulates or selects it, and dependent if it is measured. For instance, to find the median income in a set of countries, an investigator would set up a spreadsheet with three columns: one for “Country,” one for the “Year” when the data were collected, and one for “Median household income.” The investigator selects the country and the year, and then “measures” the median income for each of the selected countries, expressed in Purchasing Power Parity (PPP):
Country           Year    Median household income (PPP)
Switzerland       2005    $55,000
Canada            2005    $44,000
United States     2006    $48,000
New Zealand       2007    $41,000
United Kingdom    2004    $39,000
Australia         2006    $38,000
Israel            2006    $37,000
Singapore         2005    $30,000
The variable “Country” and the variable “Year” are the independent variables, and “Median household income (PPP)” is the dependent variable.
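The split between independent and dependent variables in the table above can be sketched in a few lines of Python. This is only an illustration: the names `Row`, `INDEPENDENT`, `DEPENDENT`, and `split_variables` are hypothetical and not part of the original spreadsheet, and the rows assume that the three columns of the table list the countries in the same order.

```python
from collections import namedtuple

# One record per table row. The field names are illustrative assumptions.
Row = namedtuple("Row", ["country", "year", "median_income_ppp"])

rows = [
    Row("Switzerland", 2005, 55000),
    Row("Canada", 2005, 44000),
    Row("United States", 2006, 48000),
    Row("New Zealand", 2007, 41000),
    Row("United Kingdom", 2004, 39000),
    Row("Australia", 2006, 38000),
    Row("Israel", 2006, 37000),
    Row("Singapore", 2005, 30000),
]

# Variables the investigator selects versus variables that are measured.
INDEPENDENT = ("country", "year")
DEPENDENT = ("median_income_ppp",)


def split_variables(row):
    """Return (selected, measured) values of one row as two dicts."""
    record = row._asdict()
    selected = {name: record[name] for name in INDEPENDENT}
    measured = {name: record[name] for name in DEPENDENT}
    return selected, measured


selected, measured = split_variables(rows[0])
print(selected)
print(measured)
```

Keeping the two groups of column names in separate tuples makes the role of each variable explicit before any analysis or plotting begins.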
A set of data describes characteristics of a population or a sample. Each characteristic is a variable. Data are classified as either numeric or categorical.

Numeric data are either discrete or continuous. Discrete data can be counted, for instance, the number of patients in a study who are
male. Continuous data can take on any value in a finite or infinite interval, for instance, temperature measured in Kelvin can take on any
value greater than or equal to 0.

Data are categorical if they can be sorted into categories. Categorical data cannot be measured, though they can be assigned a code; for instance, male could be coded as 0 and female as 1. Categorical data are either nominal or ordinal: nominal categories can only be compared for equality and have no natural order, whereas ordinal categories can be ordered.
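The four data types described above form a small decision tree: first numeric versus categorical, then counted versus measured, or ordered versus unordered. The following Python sketch encodes that tree. The names `DataType` and `classify` and the three yes/no flags are assumptions made for illustration; deciding how to answer each question for a real variable remains a judgment call.

```python
from enum import Enum


class DataType(Enum):
    DISCRETE_NUMERIC = "discrete numeric"
    CONTINUOUS_NUMERIC = "continuous numeric"
    NOMINAL_CATEGORICAL = "nominal categorical"
    ORDINAL_CATEGORICAL = "ordinal categorical"


def classify(is_numeric, is_counted=False, is_ordered=False):
    """Map yes/no answers about a variable to one of the four data types.

    is_numeric: the values are numbers obtained by counting or measuring
    is_counted: (numeric only) the values come from counting
    is_ordered: (categorical only) the categories have a natural order
    """
    if is_numeric:
        if is_counted:
            return DataType.DISCRETE_NUMERIC
        return DataType.CONTINUOUS_NUMERIC
    if is_ordered:
        return DataType.ORDINAL_CATEGORICAL
    return DataType.NOMINAL_CATEGORICAL


# Examples from the text above.
print(classify(is_numeric=True, is_counted=True).value)   # number of male patients
print(classify(is_numeric=True, is_counted=False).value)  # temperature in Kelvin
print(classify(is_numeric=False).value)                   # sex, coded 0 or 1

# Hypothetical example, not from the text: clothing sizes S < M < L.
print(classify(is_numeric=False, is_ordered=True).value)
```

Note that a numeric code does not make a variable numeric: sex coded as 0 or 1 stays categorical because the code is only a label.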
Resources
http://www.stats.gla.ac.uk/steps/glossary/presenting_data.html#
http://cnx.org/content/m16007/latest/
In-class Activities
In-class Activity 1: The Mohs scale of mineral hardness is listed in the following table. According to the American Federation of Mineralogical Societies, Inc., the Mohs scale was devised in 1812 “by the German mineralogist Frederich Mohs (1773-1839), who selected the ten minerals because they were common and readily available. The scale is not a linear scale, but somewhat arbitrary” (http://www.amfed.org/t_mohs.htm).
Mineral       Hardness
Talc          1
Gypsum        2
Calcite       3
Fluorite      4
Apatite       5
Orthoclase    6
Quartz        7
Topaz         8
Corundum      9
Diamond       10
Categorize the data type of the variable “Hardness” according to one of the four categories:
a. discrete numeric
b. continuous numeric
c. nominal categorical
d. ordinal categorical
In-class Activity 2: The spreadsheet “GarfinkelCardiacData” contains data on 220 men and 338 women who participated in a study to determine whether the drug “dobutamine” could be used to assess a patient’s risk of a heart attack. For each column in the spreadsheet, determine the type of data according to the four categories (a) discrete numeric, (b) continuous numeric, (c) nominal categorical, or (d) ordinal categorical, by checking the appropriate box in the table in the spreadsheet (Table tab).
Homework
1. A discrete numeric variable can only take on finitely many values.
a. TRUE
b. FALSE
2. Measurement of a continuous variable is always a discrete approximation.
a. TRUE
b. FALSE
3. The variable “World Population” in the following table is
a. a discrete numeric variable
b. a continuous numeric variable
c. a nominal categorical variable
d. an ordinal categorical variable
Total Midyear Population
Year    World Population
1950    2,555,955,393
1960    3,041,685,851
1970    3,711,996,957
1980    4,452,557,135
1990    5,284,486,614
2000    6,092,409,072
4. A survey asks you to assign a value from 1 to 5 to rate your satisfaction with a book you recently bought online. The variable “satisfaction” is
a. a discrete numeric variable
b. a continuous numeric variable
c. a nominal categorical variable
d. an ordinal categorical variable
5. The variable “mode of transportation to work” is
a. a discrete numeric variable
b. a continuous numeric variable
c. a nominal categorical variable
d. an ordinal categorical variable
6. The graph below displays the gross domestic expenditure as a percentage of GDP (vertical axis) for different countries (horizontal axis). The
variable “Country” is
a. a discrete numeric variable
b. a continuous numeric variable
c. a nominal categorical variable
d. an ordinal categorical variable
Source: http://titania.sourceoecd.org/vl=1309619/cl=23/nw=1/rpsv/factbook2009/07/01/01/07-01-01-g1.htm
7. The graph below displays the U.S. population pyramid. The population sizes for males and females are on the horizontal axis, and the age classes are on the vertical axis. The variable “Age” in the graph “Population (in millions) of the United States: 2005” is
a. a discrete numeric variable
b. a continuous numeric variable
c. a nominal categorical variable
d. an ordinal categorical variable
Source: http://www.census.gov/ipc/www/idb/pyramids.html
8. In Problems 3, 6, and 7, list all variables and determine for each variable whether it is an independent or a dependent variable.
a. Problem 3:
b. Problem 6:
c. Problem 7:
9. Read pages 6-9 in the White Paper Effectively Communicating Numbers by S. Few. (This paper will be a constant companion throughout the
course.)
References
Few, S. 2005. Effectively Communicating Numbers. White Paper, Perceptual Edge. Downloaded from http://www.perceptualedge.com/library.php#Whitepapers