Content Objective
Students will be able to identify variables as being either quantitative or categorical in nature.

Language Objective
Students will understand the meaning of the following terms: Variable, Observational Units, Data, Variability, Quantitative Variable, Categorical Variable, Binary Variable.

Definitions:
Variable – Any characteristic of a person or thing that can be assigned a number or a category. (Cognate: Variable)
Observational Units – The persons or things to which the number or category is assigned. (Cognates: Observational – De observación; Unit – Unidad)
Data – The numbers or categories recorded for the observational units in a study. (Cognate: Data – datos)
Variability – The phenomenon of a variable taking on different values or categories from observational unit to observational unit. (Cognate: Variability – variabilidad)
Quantitative Variable – A variable that measures a numerical characteristic, such as height. (Cognate: Quantitative – cuantitativo)
Categorical Variable – A variable that records a group designation (such as gender). (Cognate: Categorical – categórico)
Binary Variable – A categorical variable with only two possible categories (such as male and female). (Cognate: Binary – binario)

Sheltered Instruction Strategy: Read Aloud
I choose to read through the dialogue first, focusing on any new vocabulary that students will encounter. I read slowly, thinking out loud as I read, alluding to past lessons where applicable. Breaking up into groups of two, students volunteer for the parts they will read. Once they have read through, I ask students to think back to the past lesson references that I made. Why would I have made them, I ask? Students switch roles and read through once again.

Unit 1. Data and Variables
Central Concept: What is meant by the term data? Research questions are investigated by collecting data and conducting statistical analysis.
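The definitions above can be made concrete with a small sketch. It treats each student as one observational unit and each recorded characteristic as a variable. The field names, the fourth student's value of 0 calls, and the `classify` helper are assumptions made for illustration only; the helper is a rough heuristic, not a general rule, because a variable recorded as a number (such as a telephone area code) can still be categorical.

```python
from numbers import Number

# Hypothetical class data: each dict is one observational unit (a student).
# The call counts 10, 3 and 7 echo the example used in Activity 1-1;
# every other value is made up for illustration.
students = [
    {"outgoing_calls": 10, "political_view": "liberal"},
    {"outgoing_calls": 3, "political_view": "moderate"},
    {"outgoing_calls": 7, "political_view": "conservative"},
    {"outgoing_calls": 0, "political_view": "moderate"},
]

# Derive a second variable: whether or not the student made a call today.
for student in students:
    student["made_call_today"] = "yes" if student["outgoing_calls"] > 0 else "no"


def values_of(variable):
    """Collect the recorded value of one variable for every observational unit."""
    return [student[variable] for student in students]


def classify(values):
    """Rough heuristic: numeric values -> quantitative, otherwise categorical.

    A categorical variable is labeled binary when exactly two categories
    are observed. Strictly, "binary" refers to the possible categories,
    and numeric codes (e.g. area codes) are categorical despite being numbers.
    """
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
        return "quantitative"
    return "binary categorical" if len(set(values)) == 2 else "categorical"


def shows_variability(values):
    """A variable shows variability if its values differ across units."""
    return len(set(values)) > 1


for variable in ("outgoing_calls", "made_call_today", "political_view"):
    vals = values_of(variable)
    print(variable, "->", classify(vals), "| varies:", shows_variability(vals))
```

The point of the sketch is the shape of the data: one value per variable per observational unit. Deciding the type still requires thinking about what the values mean, which is what the activities below practice.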
Content Standards
Interpreting Categorical and Quantitative Data (S-ID)

Summarize, represent, and interpret data on a single count or measurement variable.
1. Represent data with plots on the real number line (dot plots, histograms, and box plots).
2. Use statistics appropriate to the shape of the data distribution to compare center (median, mean) and spread (interquartile range, standard deviation) of two or more different data sets.
3. Interpret differences in shape, center, and spread in the context of the data sets, accounting for possible effects of extreme data points (outliers).
4. Use the mean and standard deviation of a data set to fit it to a normal distribution and to estimate population percentages. Recognize that there are data sets for which such a procedure is not appropriate. Use calculators, spreadsheets, and tables to estimate areas under the normal curve.

Summarize, represent, and interpret data on two categorical and quantitative variables.
5. Summarize categorical data for two categories in two-way frequency tables. Interpret relative frequencies in the context of the data (including joint, marginal, and conditional relative frequencies). Recognize possible associations and trends in the data.
6. Represent data on two quantitative variables on a scatter plot, and describe how the variables are related.
a. Fit a function to the data; use functions fitted to data to solve problems in the context of the data. Use given functions or choose a function suggested by the context. Emphasize linear, quadratic, and exponential models.
b. Informally assess the fit of a function by plotting and analyzing residuals.
c. Fit a linear function for a scatter plot that suggests a linear association.

Interpret linear models.
7. Interpret the slope (rate of change) and the intercept (constant term) of a linear model in the context of the data.
8. Compute (using technology) and interpret the correlation coefficient of a linear fit.
9.
Distinguish between correlation and causation.

Data and Variables: In Class Activities: 1

Activity 1-1: Cell Phone Calls
a.) For each student in class, record the number of outgoing calls he or she has made on a cell phone so far today.
These recorded numbers represent data. Not all numbers are data. Data are numbers collected in a particular context. For example, the numbers 10, 3 and 7 do not constitute data in and of themselves. They are data if they represent the number of outgoing phone calls made by the first three students to walk into the classroom today.
b.) Did every student in the classroom make the same number of outgoing calls?
c.) Is number of outgoing calls a quantitative or categorical variable?
d.) What if we record only whether or not you have made a call today? Would that be a quantitative or categorical variable?
e.) Suggest another categorical variable that we could record about each student in the class with regard to cell phone use today.
f.) Still considering the students in the class as the observational units, suppose each was asked the following questions. Classify each of the following variables as categorical or quantitative. If it is categorical, also indicate whether it is binary.
Have you made more outgoing calls or received more incoming calls today, or the same number of each?
What is the average duration of calls you have made today?
Does your cell phone have a QWERTY keyboard?
At what time did you receive your first call today?
What was the area code to which you made your first call today?
g.) Lambert and Pinheiro (2006) describe a study in which researchers try to identify characteristics of cell phone calls that suggest that the phone is being used fraudulently. Suppose we want to know the average duration of all the calls you have made in the past month as a way to create a profile of your cell phone usage.
Identify the observational unit and variable in this measurement, and classify the variable as quantitative or categorical.
Observational Units:
Variable:
Type:

Watch Out
It is very important that you think about the observational units and how to phrase the variable as a characteristic that varies from observational unit to observational unit. It might be helpful to force yourself to always fill in the blanks of the following sentence:
We are recording ____________ (variable) from ____________ (observational units).

h.) Suggest two more categorical variables and two more quantitative variables that could be measured about the cell phone calls you made in the past month to help describe how you use your phone. Make sure you state these as variables and not as summaries.

In Class Activities: 2

Activity 1-2: Student Data
a.) Again, consider the students in your class as observational units. Classify each of the following as categorical or quantitative. If it is categorical, also indicate whether or not it is binary.
How many hours you have slept in the past 24 hours
Whether or not you have slept for at least 7 hours in the past 24 hours
Number of Harry Potter books that you have read
How many states you have visited
Handedness (which hand you write with)
Political viewpoint (liberal, moderate, or conservative)
Day of the week on which you were born
Average study time per week
How many birthday cards you received on your last birthday
Gender

Research Question: A research question often looks for patterns in a variable, compares a variable across different groups, or looks for a relationship between variables. Some research questions that you could investigate with data on the above variables include:
Do most students in your class get at least 7 hours of sleep in a typical night?
Do females tend to study more than males?
Is there an association between how much students study per week and how much sleep they get?
Notice that although these are also phrased as questions, they summarize the variable(s) across the observational units rather than being posed to the individual observational unit.
b.) Suggest two other research questions that you could investigate using the variables in part a.
Research question 1:
Research question 2:
c.) Suggest four additional variables that you could record about yourself and your classmates, and then propose two research questions that you could address using those variables. [Hint: Be sure to distinguish the variables from the research questions; remember a variable is some characteristic that can be recorded for each student and can vary from student to student.]

In Class Activities: 3

Activity 1-3: Variables of State
Suppose that the observational units of interest are the 50 states. Identify which of the following are variables and which are not. Also classify the variables as categorical or quantitative.
a.) Gender of the state’s current governor
b.) Number of states that have a female governor
c.) Percentage of the state’s residents older than 65 years of age
d.) Highest speed limit in the state
e.) Whether or not the state’s name contains one word
f.) Average income of the adult residents of the state
g.) How many states were settled before 1865
h.) Telephone area code for the capitol building in the capital city

Activity 1-3: Variables of State (Answers)
a. Gender of the state’s current governor: binary categorical variable.
b. Number of states that have a female governor: not a variable.
c. Percentage of the state’s residents older than 65: quantitative variable.
d. Highest speed limit in the state: quantitative variable.
e. Whether or not the state’s name contains one word: binary categorical variable.
f. Average income of the adult residents of the state: quantitative variable.
g. How many states were settled before 1865: not a variable.
h.
Telephone area code for the capitol building in the capital city: categorical variable.

In Class Activities: 4

Activity 1-4: Studies from Blink
The following studies are all described in the popular book Blink: The Power of Thinking Without Thinking by Malcolm Gladwell (2005). For each study, identify the observational units and variables. Also, classify each variable as quantitative or categorical.
a.) A psychologist suspects that the chief executive officers (CEOs) of American companies tend to be taller than the national average height of 69 inches, so she takes a random sample of 100 CEOs and records their heights.
Observational units:
Variable:
Type:
b.) A psychologist shows a videotaped interview of a married couple to a sample of 150 marriage counselors. Each counselor is asked to predict whether the couple will still be married five years later. The psychologist wants to test whether marriage counselors make the correct prediction more than half the time.
Observational units:
Variable:
Type:
c.) A psychologist gives an SAT-like exam to 200 African-American college students. Half the students are randomly assigned to use a version of the exam that asks them to indicate their race, and the other half are randomly assigned to use a version of the exam that does not ask them to indicate their race. The psychologist suspects that those students who are not asked to indicate their race will score significantly higher on the exam than those who are asked to indicate their race.
Observational units:
Variable 1:
Type:
Variable 2:
Type:
d.) An economist sends four different actors to ten different car dealerships to negotiate the best price they can for a particular model of car. The four people are all the same age, dressed similarly, and tell the car salespeople that they have the same occupation and neighborhood of residence. One of the actors is a white male, one is a black male, one is a white female, and one is a black female.
The economist wants to test whether the prices offered by these dealerships differ significantly depending on the race or gender of the customer.
Observational units:
Variable 1:
Type:
Variable 2:
Type:
Variable 3:
Type:

Activity 1-4: Studies from Blink (Answers)
(Rossman/Chance, Workshop Statistics, 4/e, Solutions, Unit 1, Topic 1)
a. Observational units: 100 CEOs
Variable: height of the CEO
Type: quantitative
b. Observational units: 150 marriage counselors
Variable: whether or not the counselor makes the correct prediction about whether a couple will still be married in five years
Type: categorical (binary)
c. Observational units: 200 African-American college students
Variable 1: whether or not their version of the exam asks them to indicate race
Type: categorical (binary)
Variable 2: score on SAT-like exam
Type: quantitative
d. Observational units: 10 car dealerships
Variable 1: gender of “customer”
Type: categorical (binary)
Variable 2: race of “customer”
Type: categorical (binary)
Variable 3: price negotiated for the car
Type: quantitative

Self Check

Activity 1-5: A Nurse Accused
Statistical evidence played an important role in the murder trial of Kristen Gilbert, a nurse who was accused of murdering hospital patients by giving them fatal doses of a heart stimulant (Cobb and Gerlach, 2006). Hospital records for an 18-month period indicated that of the 257 eight-hour shifts that Gilbert worked, a patient died on 40 of those shifts (15.6%). But during the 1384 eight-hour shifts that Gilbert did not work, a patient died on only 34 of those shifts (2.5%). (You will learn how to analyze such data in Topics 6 and 21.)
a.) Identify the observational units in this study. [Hint: The correct answer is more subtle than most students suspect.]
b.) Identify the two variables mentioned in the preceding paragraph. Classify each as categorical (possibly binary) or quantitative.
Observational units:
Variable 1:
Type:
Variable 2:
Type:

Solution:
a.)
The observational units are the eight-hour shifts.
b.) One variable is whether or not Gilbert worked on the shift. This variable is categorical and binary. (She either worked the shift or she did not.) The other variable is whether or not a patient died on the shift. This variable is also categorical and binary.

Watch Out
It is tempting to call the patients the observational units, but that is not consistent with the data reported. The data indicate what happened on each shift, not what happened to each patient. The variables, therefore, need to refer to something that can be recorded about each shift, namely whether Gilbert worked that shift or not and whether a patient died on that shift or not. Notice that we phrase these variables as questions to be posed to each shift. Another way to spot the observational units is to focus on how many data values are in the study; in this case there are 257 + 1384 = 1641 shifts, not 1641 patients.
Some common errors in reporting variables are:
Providing a summary, such as “the total number of patient deaths” or “the percentage who died on Gilbert’s shifts.”
Giving an ambiguous answer, such as “patient deaths.”
Stating the research question rather than the variable, such as “did patients die at a higher rate on Gilbert’s shifts?”
Describing a subset of the observational units, such as “the patients who died on Gilbert’s shifts.”

Wrap Up
You can use statistics to address interesting research questions that help you better understand the world and whatever academic discipline you study. You’ve seen that statistics played an important role in the murder trial of Kristen Gilbert and that statistics enabled researchers to answer questions such as whether or not CEOs are taller than average and whether or not thinking about their race causes African-American students to do worse on standardized exams.
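As a quick check on the figures quoted in Activity 1-5, the sketch below recomputes the two percentages and the total number of observational units from the four counts given in the activity. The variable and function names are assumptions chosen for illustration; only the counts come from the activity.

```python
# Counts reported in Activity 1-5 (each eight-hour shift is one observational unit).
shifts_worked = 257          # shifts Gilbert worked
deaths_worked = 40           # of those, shifts on which a patient died
shifts_not_worked = 1384     # shifts Gilbert did not work
deaths_not_worked = 34       # of those, shifts on which a patient died


def percent_with_death(shifts_with_death, total_shifts):
    """Percentage of shifts on which a patient died."""
    return 100 * shifts_with_death / total_shifts


# Total number of observational units (shifts, not patients).
total_shifts = shifts_worked + shifts_not_worked

rate_worked = percent_with_death(deaths_worked, shifts_worked)
rate_not_worked = percent_with_death(deaths_not_worked, shifts_not_worked)

# Two-way table of the two binary variables: rows are "Gilbert worked?",
# columns are "patient died?".
two_way_table = {
    ("worked", "death"): deaths_worked,
    ("worked", "no death"): shifts_worked - deaths_worked,
    ("did not work", "death"): deaths_not_worked,
    ("did not work", "no death"): shifts_not_worked - deaths_not_worked,
}

print("Total shifts:", total_shifts)
print("Percent of worked shifts with a death:", round(rate_worked, 1))
print("Percent of other shifts with a death:", round(rate_not_worked, 1))
for cell, count in two_way_table.items():
    print(cell, count)
```

Note that the percentages are summaries computed across shifts; they are not the variables. The variables are the two yes/no questions recorded for each individual shift.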
Because statistics is the science of data, this topic has given you a sense of what data are and a glimpse of what data analysis entails. Data are not mere numbers: Data are collected for some purpose and have meaning in some context. For example, the numbers 5.25 and 37 are not data until you learn that they represent the number of hours a person slept last night and the number of states that person has visited.
You encountered the most fundamental concept of statistics: variability. This concept will be central throughout the course. How long each of your classmates slept last night varies from student to student, as does the day of the week on which each of your classmates was born.
One key idea to learn quickly is that of a variable. Correctly identifying and classifying variables will serve you well throughout this course and help you determine which statistical tools to apply to the data.

Homework:
Night 1: Read Chapter 1, pages 3-14
Night 2:
Page 11: Exercise 1-6
Page 12: Exercises 1-8, 1-9, 1-10
Page 13: Exercises 1-14, 1-15
Page 14: Exercises 1-21, 1-22

Exercise 1-6: Miscellany
a. Binary categorical; observational units: pennies being spun
b. Binary categorical; observational units: people leaving the washroom
c. Quantitative; observational units: fast-food sandwiches
d. Quantitative; observational units: residents of that country
e. Binary categorical; observational units: American households
f. Quantitative; observational units: people trying to memorize a vocabulary list
g. Binary categorical; observational units: people (applying for a driver's license)
h. Categorical; observational units: American voters in 2008
i. Binary categorical; observational units: newborn babies
j. Quantitative; observational units: Alfred Hitchcock movies
k. Quantitative; observational units: American pennies
l. Quantitative; observational units: automobiles
m. Quantitative; observational units: people eating ice cream

Exercise 1-8: Top 100 Films
a. Box office revenue: quantitative
b.
Number of years since production: quantitative (though age might be easier to interpret)
c. Decade produced: categorical
d. Whether or not the film was produced before 1960: binary categorical
e. Whether or not the film won an Academy Award for Best Picture: binary categorical
f. Whether or not you have seen the film: binary categorical
g. The number of people in your class who have seen the film: quantitative (Notice how this quantity will vary from film to film.)

Exercise 1-9: Credit Card Usage
a. Year in school: categorical
Whether or not the student has a credit card: binary categorical
Outstanding balance on the credit card: quantitative
Whether or not the outstanding balance exceeds $1000: binary categorical
Source for selecting a credit card: categorical
Region of the country: categorical
b. Answers will vary, but some sample questions include: Which class (freshman, sophomore, …) tends to have the largest outstanding credit card balance? Do all regions of the country tend to obtain their credit cards from the same source?

Exercise 1-10: Got a Tip?
a. Answers will vary, but here are some examples: the number of customers at each table, the total amount spent on food and drink by each table, whether or not there were children at the table, whether a man or woman paid the bill.
b. Answers will vary. Examples include: Which tends to have more influence on the tip – the size of the bill or the number of people in the party? Do males tend to be better or worse tippers than females?

Exercise 1-14: Natural Light and Achievement
a. The observational units are the students.
b. One variable is whether or not the student learned in natural light. The other variable is the score on the standardized test.
c. The first variable in part b is categorical and binary. The second variable in part b is quantitative.

Exercise 1-15: Children’s Television Viewing
a. The observational units are the third and fourth grade students in San Jose.
b.
The quantitative variables are body mass index, triceps skinfold thickness, waist circumference, waist-to-hip ratio, weekly time spent watching television, weekly time spent, and weekly time spent playing video games. The categorical variables are which school the student attends and gender.

Exercise 1-21: Car Ages
a. The observational units would be the cars.
b. Variable 1 = Whether the car is driven by a faculty member or a student (binary categorical)
Variable 2 = Age of the car (quantitative)
c. This is not a variable; it is the research question under investigation rather than a measurement or category recorded about the individual cars.

Exercise 1-22: In the News
Answers will vary.
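
Several answers in this unit (Activity 1-3 parts b and g, Exercise 1-21 part c) turn on the difference between a variable and a summary. The sketch below illustrates that difference using the states setting from Activity 1-3. The three states and all of their values are made up for illustration and are not real data; the names are assumptions as well.

```python
# Hypothetical values for three made-up states; these are NOT real data.
states = [
    {"state": "State A", "governor_gender": "female", "highest_speed_limit": 70},
    {"state": "State B", "governor_gender": "male", "highest_speed_limit": 65},
    {"state": "State C", "governor_gender": "female", "highest_speed_limit": 75},
]

# A variable yields one value for EACH observational unit (each state).
governor_genders = [s["governor_gender"] for s in states]
speed_limits = [s["highest_speed_limit"] for s in states]

# A summary collapses those values into a SINGLE number for the whole group,
# so it cannot vary from state to state and is not a variable.
num_female_governors = governor_genders.count("female")

print("Variable (one value per state):", governor_genders)
print("Variable (one value per state):", speed_limits)
print("Summary (one value overall):", num_female_governors)
```

A useful test is to ask whether the quantity can be recorded for a single state. "Gender of the governor" can; "number of states with a female governor" cannot.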