Variable - GeoStats

Content Objective
Students will be able to identify variables as being either
quantitative or categorical in nature.
Language Objective
Students will understand the meaning of the following terms:
• Variable
• Observational Units
• Data
• Variability
• Quantitative Variable
• Categorical Variable
• Binary Variable
Definitions:

Variable – Any characteristic of a person or thing that can
be assigned a number or a category.
(Cognate: Variable)

Observational Units – The person or thing to which the
number or category is assigned.
(Cognates: Observational - De observación
Unit - Unidad)

Data – The numbers or categories recorded for the
observational units in a study.
(Cognate: Data – datos)

Variability – The phenomenon of a variable taking on
different values or categories from observational unit to
observational unit.
(Cognate: Variability – variabilidad)

Quantitative Variable – A variable which measures a
numerical characteristic such as height for example.
(Cognate: Quantitative – cuantitativo)

Categorical Variable – A variable which records a group
designation (such as gender).
(Cognate: categórico)


Binary Variable – A categorical variable with only two
possible categories (such as male and female).
(Cognate: Binario)
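
As a rough illustration of the three definitions above, the following Python sketch classifies a variable from the values recorded for it. The function name `classify_variable`, the `numbers_are_labels` flag, and the sample values are hypothetical and not part of the lesson. The sketch judges a binary variable by the categories actually observed, whereas the definition refers to the possible categories, so treat it as a heuristic only.

```python
def classify_variable(values, numbers_are_labels=False):
    """Classify recorded values as quantitative, categorical, or binary categorical.

    numbers_are_labels is a hypothetical flag for numeric codes (such as area
    codes) that name a group rather than measure an amount.
    """
    def is_number(value):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)

    if all(is_number(v) for v in values) and not numbers_are_labels:
        return "quantitative"
    if len(set(values)) == 2:
        return "binary categorical"
    return "categorical"


# Hypothetical values recorded for a few observational units.
heights_in_inches = [64, 70.5, 68]
handedness = ["left", "right", "right"]
political_viewpoint = ["liberal", "moderate", "conservative"]
area_codes = [413, 617, 508]

print(classify_variable(heights_in_inches))
print(classify_variable(handedness))
print(classify_variable(political_viewpoint))
print(classify_variable(area_codes, numbers_are_labels=True))
```

The flag is there because a number is not automatically a quantitative value: an area code is stored as digits but only labels a region, so averaging it would be meaningless.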
Sheltered Instruction Strategy: Read Aloud

I choose to read through the dialogue first, focusing on any
new vocabulary that students will encounter.

I read slowly, thinking out loud as I read, alluding to past
lessons where applicable.

Breaking up into groups of two, students volunteer for the
parts they will read.

Once they have read through, I ask students to think back to
the past lesson references that I made and to explain why I
might have made them.

Students switch roles and read through once again.
Unit 1. Data and Variables:
Central Concept: What is meant by the term data? Research
questions are investigated through the collecting of data and
conducting statistical analysis.
Content Standards
Interpreting Categorical and Quantitative Data
S-ID
Summarize, represent, and interpret data on a single count or
measurement variable.
1. Represent data with plots on the real number line (dot plots,
histograms, and box plots).
2. Use statistics appropriate to the shape of the data distribution to
compare center (median, mean) and spread (interquartile range,
standard deviation) of two or more different data sets.
3. Interpret differences in shape, center, and spread in the context of
the data sets, accounting for possible effects of extreme data points
(outliers).
4. Use the mean and standard deviation of a data set to fit it to a
normal distribution and to estimate population percentages.
Recognize that there are data sets for which such a procedure is not
appropriate. Use calculators, spreadsheets, and tables to estimate
areas under the normal curve.
Summarize, represent, and interpret data on two categorical and
quantitative variables.
5. Summarize categorical data for two categories in two-way
frequency tables. Interpret relative frequencies in the context of the
data (including joint, marginal, and conditional relative frequencies).
Recognize possible associations and trends in the data.
6. Represent data on two quantitative variables on a scatter plot, and
describe how the variables are related.
a. Fit a function to the data; use functions fitted to data to solve
problems in the context of the data. Use given functions or choose a
function suggested by the context. Emphasize linear, quadratic, and
exponential models.
b. Informally assess the fit of a function by plotting and analyzing
residuals.
c. Fit a linear function for a scatter plot that suggests a linear
association.
Interpret linear models.
7. Interpret the slope (rate of change) and the intercept (constant
term) of a linear model in the context of the data.
8. Compute (using technology) and interpret the correlation
coefficient of a linear fit.
9. Distinguish between correlation and causation.
Data and Variables:
In Class Activities: 1
Activity 1-1: Cell Phone Calls
a.) For each student in class, record the number of outgoing calls
he or she has made on a cell phone so far today.
These numbers recorded represent data. Not all numbers are data.
Data are numbers collected in a particular context. (For example, the
numbers 10, 3 and 7 do not constitute data in and of themselves.)
They are data if they represent the number of outgoing phone calls
made by the first three students to walk into the classroom today.
b.) Did every student in the classroom make the same number of
outgoing calls?
c.) Is number of outgoing calls a quantitative or categorical
variable?
d.) What if we record only whether or not you have made a call
today? Would that be a quantitative or categorical variable?
e.) Suggest another categorical variable that we could record
about each student in the class with regard to cell phone use
today.
f.) Still considering the students in the class as the observational
units, suppose each was asked the following questions.
Classify each of the following variables as categorical or
quantitative. If it is categorical, also indicate whether it is binary.
• Have you made more outgoing calls or received more incoming
calls today, or the same number of each?
• What is the average duration of calls you have made today?
• Does your cell phone have a QWERTY keyboard?
• At what time did you receive your first call today?
• What was the area code to which you made your first call
today?
g.) Lambert and Pinheiro (2006) describe a study in which
researchers try to identify characteristics of cell phone calls that
suggest that the phone is being used fraudulently. Suppose we
want to know the average duration of all the calls you have
made in the past month as a way to create a profile of your cell
phone usage. Identify the observational unit and variable in this
measurement, and classify the variable as quantitative or
categorical.
Observational Units:
Variable:
Type:
Watch Out
• It is very important that you think about the observational
units and how to phrase the variable as a characteristic that
varies from observational unit to observational unit. It might
be helpful to force yourself to always fill in the blanks of the
following sentence:
We are recording ___________________ (variable)
from ____________ to ____________ (observational units)
h.) Suggest two more categorical variables and two more
quantitative variables that could be measured about the cell
phone calls you made in the past month to help describe how
you use your phone. Make sure you state these as variables
and not as summaries.
In Class Activities: 2
Activity 1-2: Student Data
a.) Again, consider the students in your class as observational units.
Classify each of the following as categorical or quantitative. If it is
categorical, also indicate whether or not it is binary.
• How many hours have you slept in the past 24 hours?
• Whether or not you have slept for at least 7 hours in the past
24 hours
• Number of Harry Potter books that you have read
• How many states have you visited?
• Handedness (which hand do you write with)
• Political viewpoint (liberal, moderate, or conservative)
• Day of the week on which you were born
• Average study time per week
• How many birthday cards you received on your last birthday
• Gender
Research Question: A research question often looks for patterns
in a variable or compares a variable across different groups or
looks for a relationship between variables.
Some research questions that you could investigate with data on the
above variables include
• Do most students in your class get at least 7 hours of
sleep in a typical night?
• Do females tend to study more than males?
• Is there an association between how much students
study per week and how much sleep they get?
Notice that though these are also phrased as questions, they
summarize the variable(s) across the observational units
rather than being posed to the individual observational unit.
b.) Suggest two other research questions that you could
investigate using the variables in part a.
Research question 1:
Research question 2:
c.) Suggest four additional variables that you could record
about yourself and your classmates, and then propose two
research questions that you could address using those
variables. [Hint: Be sure to distinguish the variables from the
research questions; remember a variable is some
characteristic that can be recorded for each student and can
vary from student to student.]
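
The distinction between a variable and a research question in this activity can be sketched in Python. The student records below are invented for illustration, and the names `students`, `slept_at_least_7`, and `mean_study_hours` are hypothetical. The point is only that a variable yields one value per student, while a research question is answered by summarizing those values across students.

```python
from statistics import mean

# Hypothetical records, one per student (the observational units).
students = [
    {"gender": "female", "hours_slept": 7.5, "study_hours_per_week": 12.0},
    {"gender": "male", "hours_slept": 6.0, "study_hours_per_week": 9.0},
    {"gender": "female", "hours_slept": 8.0, "study_hours_per_week": 15.0},
    {"gender": "male", "hours_slept": 7.0, "study_hours_per_week": 10.0},
    {"gender": "male", "hours_slept": 5.5, "study_hours_per_week": 14.0},
]

# A variable: one recorded value for each student (binary categorical here).
slept_at_least_7 = [s["hours_slept"] >= 7 for s in students]

# A research question: "Do most students get at least 7 hours of sleep?"
# It is answered by summarizing the variable across all students.
share_with_7_hours = sum(slept_at_least_7) / len(students)
most_get_7_hours = share_with_7_hours > 0.5


def mean_study_hours(gender):
    """Summarize weekly study time within one gender group."""
    return mean(
        s["study_hours_per_week"] for s in students if s["gender"] == gender
    )


print(slept_at_least_7)
print(share_with_7_hours, most_get_7_hours)
print(mean_study_hours("female"), mean_study_hours("male"))
```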
In Class Activities: 3
Activity 1-3: Variables of State
Suppose that the observational units of interest are the 50 states. Identify
which of the following are variables and which are not. Also classify
the variables as categorical or quantitative.
a.) Gender of the state’s current governor
b.) Number of states that have a female governor
c.) Percentage of the state’s residents older than 65 years of age
d.) Highest speed limit in the state
e.) Whether or not the state’s name contains one word
f.) Average income of the adult residents of the state
g.) How many states were settled before 1865
h.) Telephone area code for the capital building in the capital city
Activity 1-3: Variables of State (Answers)
a. Gender of the state’s current governor: binary categorical
variable.
b. Number of states that have a female governor: is not a variable.
c. Percentage of the state’s residents older than 65: quantitative
variable.
d. Highest speed limit in the state: quantitative variable.
e. Whether or not the state’s name contains one word: binary
categorical variable
f. Average income of the adult residents of the state: quantitative
variable
g. How many states were settled before 1865: is not a variable.
h. Telephone area code for the capital building in the capital city:
categorical
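
Answers b and g are not variables because each is a single number describing the whole collection of states, not something recorded for each state. A minimal Python sketch of that difference follows. The three placeholder states and their values are invented, and all names are hypothetical.

```python
# Hypothetical records for three placeholder states (the observational units).
states = [
    {"name": "State A", "governor_gender": "female", "highest_speed_limit": 70},
    {"name": "State B", "governor_gender": "male", "highest_speed_limit": 65},
    {"name": "State C", "governor_gender": "male", "highest_speed_limit": 75},
]

# A variable has one value per observational unit.
governor_gender = [s["governor_gender"] for s in states]
highest_speed_limit = [s["highest_speed_limit"] for s in states]

# A summary collapses a variable into one number for the whole group,
# so it cannot vary from state to state.
number_with_female_governor = governor_gender.count("female")

print(len(governor_gender), len(highest_speed_limit))
print(number_with_female_governor)
```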
In Class Activities: 4
Activity 1-4: Studies from Blink
The following studies are all described in the popular book Blink: The
Power of Thinking Without Thinking by Malcolm Gladwell (2005).
For each study identify the observational units and variables. Also,
classify each variable as quantitative or categorical.
a.) A psychologist suspects that the chief executive officers
(CEO’s) of American companies tend to be taller than the
national average height of 69 inches; so she takes a random
sample of 100 CEO’s and records their heights.
Observational units
Variable:
Type:
b.) A psychologist shows a videotaped interview of a married
couple to a sample of 150 marriage counselors. Each
counselor is asked to predict whether the couple will still be
married five years later. The psychologist wants to test whether
marriage counselors make the correct prediction more than half
the time.
Observational units
Variable:
Type:
c.) A psychologist gives an SAT-like exam to 200 African-American
college students. Half the students are randomly
assigned to use a version of the exam that asks them to
indicate their race, and the other half are randomly assigned to
use a version of the exam that does not ask them to indicate
their race. The psychologist suspects that those students who
are not asked to indicate their race will score significantly
higher on the exam than those who are asked to indicate their
race.
Observational units
Variable 1:
Type:
Variable 2:
Type:
d.) An economist sends four different actors to ten different car
dealerships to negotiate the best price they can for a particular
model of car. The four people are all the same age, dressed
similarly, and tell the car salespeople that they have the same
occupation and neighborhood of residence. One of the actors is
a white male, one is a black male, one is a white female, and
one is a black female. The economist wants to test whether the
prices offered by these dealerships differ significantly
depending on the race or gender of the customer.
Observational units
Variable 1:
Type:
Variable 2:
Type:
Variable 3:
Type:
Activity 1-4: Studies from Blink (Answers)
a. Observational units: 100 CEOs
Variable: height of the CEO Type: quantitative
(Rossman/Chance, Workshop Statistics, 4/e, Solutions, Unit 1, Topic 1)
b. Observational units: 150 marriage counselors
Variable: whether or not the counselor makes the correct
prediction about whether a couple will still be married in five
years Type: categorical (binary)
c. Observational units: 200 African-American college students
Variable 1: whether or not their version of the exam asks them
to indicate race
Type: categorical (binary)
Variable 2: score on SAT-like exam Type: quantitative
d. Observational units: 10 car dealerships
Variable 1: gender of “customer” Type: categorical (binary)
Variable 2: race of “customer” Type: categorical (binary)
Variable 3: price negotiated for the car Type: quantitative
Self Check
Activity 1-5: A Nurse Accused
Statistical evidence played an important role in the murder trial of
Kristen Gilbert, a nurse who was accused of murdering
hospital patients by giving them fatal doses of a heart
stimulant (Cobb and Gerlach, 2006). Hospital records for an
18-month period of time indicated that of the 257 eight-hour
shifts that Gilbert worked, a patient died on 40 of those shifts
(15.6%). But during the 1384 eight-hour shifts that Gilbert did
not work, a patient died on only 34 of those shifts (2.5%). (You
will learn how to analyze such data in Topics 6 and 21.)
a.)
Identify the observational units in this study. [Hint: The
correct answer is more subtle than most students
suspect.]
b.)
Identify the two variables mentioned in the preceding
paragraph. Classify each as categorical (possibly
binary) or quantitative.
Observational units
Variable 1:
Type:
Variable 2:
Type:
Solution:
a.)
The observational units are the eight-hour shifts.
b.)
One variable is whether or not Gilbert worked on
the shift. This variable is categorical and binary.
(She either worked the shift OR she did not.) The
other variable is whether or not a patient died on
the shift. This variable is also categorical and
binary.
Watch Out
It is tempting to call the patients the observational units, but
that is not consistent with the data reported. The data indicate
what happened on each shift, not what happened to each
patient. The variables, therefore, need to refer to something that
can be recorded about each shift, namely whether Gilbert
worked that shift or not and whether a patient died on that shift
or not. Notice that we are asking these variables as questions to
be posed to each shift. Another way to spot the observational
units is to focus on how many data values are in the study; in
this case there are 257 + 1384 or 1641 shifts, not 1641 patients.
Some common errors in reporting variables are:
• Providing a summary, such as “the total number of patient
deaths” or “the percentage who died on Gilbert’s shifts.”
• Giving an ambiguous answer, such as “patient deaths.”
• Stating the research question rather than the variable,
such as “did patients die at a higher rate on Gilbert’s
shifts?”
• Describing a subset of the observational units, such as
“the patients who died on Gilbert’s shifts.”
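
The counts reported in Activity 1-5 can be laid out with one record per shift, which makes the observational units and the two binary variables explicit. The Python sketch below uses only the counts given in the activity. The variable and function names are hypothetical, and the percentages are summaries computed from the variables, not variables themselves.

```python
# Counts as reported in the activity text.
shifts_worked, deaths_when_worked = 257, 40
shifts_not_worked, deaths_when_not_worked = 1384, 34

# One record per eight-hour shift: (gilbert_worked, patient_died).
shifts = (
    [(True, True)] * deaths_when_worked
    + [(True, False)] * (shifts_worked - deaths_when_worked)
    + [(False, True)] * deaths_when_not_worked
    + [(False, False)] * (shifts_not_worked - deaths_when_not_worked)
)


def percent_shifts_with_death(gilbert_worked):
    """Percentage of shifts in one group on which a patient died."""
    group = [died for worked, died in shifts if worked == gilbert_worked]
    return 100 * sum(group) / len(group)


print(len(shifts))                                 # number of observational units
print(round(percent_shifts_with_death(True), 1))   # shifts Gilbert worked
print(round(percent_shifts_with_death(False), 1))  # shifts Gilbert did not work
```

Counting the records gives 1641 shifts, and the two rounded percentages match the 15.6% and 2.5% quoted in the activity.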
Wrap Up
You can use statistics to address interesting research questions
that help you better understand the world and whatever
academic discipline you study. You’ve seen that statistics
played an important role in the murder trial of Kristen Gilbert
and that statistics enabled researchers to answer questions
such as whether or not CEO’s are taller than average and
whether or not thinking about their races causes African-American
students to do worse on standardized exams.
Because statistics is the science of data, this topic has given
you a sense of what data are and a glimpse of what data analysis
entails. Data are not mere numbers: Data are collected for some
purpose and have meaning in some context. For example, the
numbers 5.25 and 37 are not data until you learn that they
represent the number of hours slept that night and the number
of states that a person has visited.
You encountered the most fundamental concept of statistics:
variability. This concept will be central throughout the course.
How long each of your classmates slept last night varies from
student to student, as does the day of the week on which each
of your classmates was born. One key idea to learn quickly is
that of a variable. Correctly identifying and classifying variables
will serve you well throughout this course and help you
determine which statistical tools to apply to that data.
Homework:
Night 1: Read Chapter 1, pages 3-14
Night 2:
Page 11: Exercise 1-6
Page 12: Exercises 1-8, 1-9, 1-10
Page 13: Exercises 1-14, 1-15
Page 14: Exercises 1-21, 1-22
Exercise 1-6: Miscellany
a. Binary categorical; observational units: pennies being spun
b. Binary categorical; observational units: people leaving the
washroom
c. Quantitative; observational units: fast-food sandwiches
d. Quantitative; observational units: residents of that country
e. Binary categorical; observational units: American households
f. Quantitative; observational units: people trying to memorize a
vocabulary list
g. Binary categorical; observational units: people (applying for a
driver’s license)
h. Categorical; observational units: American voters in 2008
i. Binary categorical; observational units: newborn babies
j. Quantitative; observational units: Alfred Hitchcock movies
k. Quantitative; observational units: American pennies
l. Quantitative; observational units: automobiles
m. Quantitative; observational units: people eating ice cream
Exercise 1-8: Top 100 Films
a. Box office revenue: quantitative
b. Number of years since production: quantitative (though age
might be easier to interpret)
c. Decade produced: categorical
d. Whether or not the film was produced before 1960: binary
categorical
e. Whether or not the film won an Academy Award for Best
Picture: binary categorical
f. Whether or not you have seen the film: binary categorical
g. The number of people in your class who have seen the film:
quantitative (Notice how this quantity will
vary from film to film.)
Exercise 1-9: Credit Card Usage
a. Year in school: categorical
Whether or not the student has a credit card: binary categorical
Outstanding balance on the credit card: quantitative
Whether or not the outstanding balance exceeds $1000: binary
categorical
Source for selecting a credit card: categorical
Region of the country: categorical
b. Answers will vary, but some sample questions include:
Which class (freshman, sophomore, …, ) tends to have the
largest outstanding credit card balance?
Do all regions of the country tend to obtain their credit cards
from the same source?
Exercise 1-10: Got a Tip?
a. Answers will vary, but here are some examples:
the number of customers at each table, the total amount spent
on food and drink by each table,
whether or not there were children at the table, whether a man or
woman paid the bill.
b. Answers will vary. Examples include:
Which tends to have more influence on the tip – the size of the
bill or the number of people in the party?
Do males tend to be better or worse tippers than females?
Exercise 1-14: Natural Light and Achievement
a. The observational units are the students.
b. One variable is whether or not the student learned in natural
light. The other variable is the score on
the standardized test.
c. The first variable in part b is categorical and binary. The
second variable in part b is quantitative.
Exercise 1-15: Children’s Television Viewing
a. The observational units are the third and fourth grade
students in San Jose.
b. The quantitative variables are body mass index, triceps
skinfold thickness, waist circumference, waist-to-hip ratio,
weekly time spent watching television, weekly time spent,
and weekly time spent playing video games.
The categorical variables are which school the student attends
and gender.
Exercise 1-21: Car Ages
a. The observational units would be the cars.
b. Variable 1 = Whether the car is driven by a faculty member
or a student (binary categorical)
Variable 2 = Age of the car (quantitative)
c. This is not a variable; it is the research question under
investigation rather than a measurement or category
recorded about the individual cars.
Exercise 1-22: In the News
Answers will vary.