What is Statistics?
Statistics is the science of reasoning from data, so a natural place to begin your study is by examining
what is meant by the term data. You will find that data vary, and variability abounds in everyday life and in
academic study. Indeed, the most fundamental principle in statistics is that of variability. If the world were
perfectly predictable and showed no variability, you would not need to study statistics. Thus, you will learn
about variables and consider their different classifications. You will also begin to experience the interesting
research questions that you can investigate by collecting data and conducting statistical analyses.
How can statistics be used to help decide the guilt or innocence of a nurse accused of murdering some of
her patients?
If chief executive officers tend to be taller than average, would this convince you that being tall provides
advantages in the business world?
Do some students do worse on standardized tests when they are first asked to indicate their race than
when they are not, perhaps due to negative stereotypes of their academic ability?
Vocabulary Supplemental Exercises Chapter 1
1. Statistics is a collection of methods for planning experiments, obtaining data, and then organizing,
summarizing, presenting, analyzing, interpreting and drawing conclusions based on the data.
2. A population is the complete collection of all elements (scores, people, measurements, and so on) to be
studied. The collection is complete in the sense that it includes all subjects to be studied.
3. A sample is a subcollection of elements selected from a population.
Example: In a study of household incomes in a small town of 1000 households, one might conceivably
obtain the income of every household. However, it is probably very expensive and time consuming to do
this. Therefore, a better approach would be to obtain the data from a portion of the households (let’s say
125 households). In this scenario, the 1000 households are referred to as the population and the 125
households are referred to as a sample.
4. A parameter is a numerical measurement describing some characteristic of a population and computed
from all of the population measurements.
5. A statistic is a numerical measurement describing some characteristic of a sample drawn from the
population.
Example: In the household incomes example from above, the average (mean) income of all 1000
households is a parameter, whereas the average (mean) income of the 125 households is a statistic.
6. Discrete data result when the number of possible values is either a finite number or a countable number.
(That is, the number of possible values is 0 or 1 or 2 and so on.)
Example: The numbers of fatal automobile accidents last month in the 10 largest US cities
7. Continuous data result from infinitely many possible values that correspond to some continuous scale
that covers a range of values without gaps, interruptions, or jumps.
Example: The finishing times of a marathon
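The parameter/statistic distinction in the household income example above can be sketched in a few lines of Python. This is only an illustration: the incomes are randomly generated stand-ins (not real data), and the names population, sample, parameter_mean and statistic_mean are chosen here for convenience.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical incomes for all 1000 households (made-up numbers, not real data)
population = [round(random.lognormvariate(10.8, 0.5)) for _ in range(1000)]

# Parameter: computed from every household in the population
parameter_mean = statistics.mean(population)

# Statistic: computed from a random sample of 125 households
sample = random.sample(population, 125)
statistic_mean = statistics.mean(sample)

print(f"Population mean (parameter): {parameter_mean:.0f}")
print(f"Sample mean (statistic):     {statistic_mean:.0f}")
```

The two means will usually be close but not identical, and a different sample of 125 households would give a different statistic, while the parameter stays fixed.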
Example Problems (parameter/statistic) - Determine whether the given value is a statistic or a parameter.
1. A sample of students is selected from FIU and their average age in years is 23.7.
2. In a study of all current major league baseball players, it was found that 78% batted exclusively right-handed.
Example Problems (discrete/continuous) - Determine whether the given value is from a discrete or
continuous data set.
1. A research poll of 1015 people shows that 752 of them have internet access at work.
2. Josh Beckett’s fastball was clocked at 98 mph during the World Series.
3. A student spent $86.53 on her calculator for class.
The ________ is the set of all measurements of interest to the investigator. (population)
A ______ is a subset of measurements selected from the population of interest. (sample)
Some more examples (answers are given at the end):
Which branch of statistics deals with organizing and summarizing data?
(Descriptive or Inferential)
If I would like to predict the average fuel economy for my new car, in what branch of statistics will I find
prediction methods?
(Descriptive or Inferential)
A researcher at FIU is studying the effects of anti-anxiety drugs on memory. She wants to know if people
suffering from PTSD due to combat stress will have a reduction in symptoms while taking an anti-anxiety
medication. She plans to put 30 PTSD patients on the new drug. What is the population for this study?
What are the experimental units?
Are the 30 patients described above a sample or the population?
Height and weight are both examples of ___________ variables, while eye color is an example of a
___________ variable.
Answers (in order):
Descriptive
Inferential
Population: all people suffering from combat-stress-related PTSD. Experimental units: the individual
patients who will receive the drug.
Sample
Quantitative; qualitative
Examples:
1. Kristen Gilbert, a nurse at a veterans’ hospital, was charged with murdering some of her patients by
administering fatal doses of a heart stimulant. Some of the evidence presented at her trial was statistical.
Researchers combed through the records of all eight-hour shifts at the hospital in the previous eighteen
months. What information do you think they recorded about each shift?
2. Telephone companies constantly collect data on cell phone calls in an effort to detect anomalies that
might indicate fraud (in other words, that someone other than the rightful owner has been using the
phone). What information should you record about each cell phone call, in order to develop a profile for the
cell phone owner so that you could then monitor calls for anomalies?
A variable is any characteristic of a person or thing that can be assigned a number or a category. The
person or thing to which the number or category is assigned, such as a student in your class, is called the
observational unit. Data consist of the numbers or categories recorded for the observational units in a
study. Variability refers to the phenomenon of a variable taking on different values or categories from
observational unit to observational unit.
A quantitative variable measures a numerical characteristic such as height,
while a categorical variable records a group designation such as gender.
Now consider the students in your class as observational units. Classify each of the following variables as
categorical or quantitative:
• How many hours you have slept in the past 24 hours
• Whether or not you have slept for at least 7 hours in the past 24 hours
• How many states you have visited
• Handedness (which hand you write with)
• Day of the week on which you were born
• Whether or not you have used a cell phone today
• Whether you prefer baths or showers
• How much time you spent on your most recent bath or shower
We will continue to focus your attention on observational units and variables in virtually all of the studies
and data that you encounter throughout this book. Along the way, keep the following points in mind:
• This distinction between categorical and quantitative variables is quite important because determining
which statistical tools to use for analyzing a given set of data often depends on the type of variables
involved.
• Notice how the variable of sleep time can be measured either quantitatively (first bullet) or categorically
(second bullet). This is true of many variables: the classification of the variable often depends on how the
quantity is measured more than on any intrinsic property.
• A variable that takes on numerical values that are really just category labels, such as a zip code, is
categorical.
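The points above can be made concrete with a small Python sketch. The responses are invented for illustration (five hypothetical students), and the variable names are assumptions made here, not part of the text.

```python
import statistics

# Hypothetical responses from five students (illustrative values only)
hours_slept = [5.25, 7.0, 8.5, 6.0, 7.5]                   # quantitative
zip_codes = ["33199", "33174", "33199", "33165", "33199"]  # numeric-looking labels

# Averaging makes sense for a quantitative variable
mean_sleep = statistics.mean(hours_slept)

# The same sleep data recorded categorically: at least 7 hours or not
slept_7_plus = [h >= 7 for h in hours_slept]

# For a categorical variable, ask which category occurs most often
most_common_zip = statistics.mode(zip_codes)

print(mean_sleep)       # 6.85
print(slept_7_plus)     # [False, True, True, False, True]
print(most_common_zip)  # 33199
```

Storing the zip codes as text rather than as numbers is a deliberate choice in this sketch: it prevents anyone from accidentally averaging values that are really just labels.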
Variables of State
Suppose that the observational units of interest are the fifty states. Identify which of the
following are variables and which are not. Also classify the variables as categorical or quantitative.
a. Gender of the state’s current governor
b. Number of states that have a female governor
c. Percentage of the state’s residents older than 65 years of age
d. Highest speed limit in the state
e. Whether or not the state’s name contains one word
f. Average income of the adult residents of the state
g. How many states were settled before 1865
Cell Phone Fraud
Lambert and Pinheiro (2006) describe a study in which researchers try to identify characteristics of cell
phone calls that suggest the phone is being used fraudulently. For each cell phone call, the researchers
recorded information on its direction (incoming or outgoing), location (local or roaming), duration, time of
day, day of week, and whether the call took place on a weekday or a weekend.
a. Identify the observational units in this study.
b. Identify the categorical variables mentioned in the preceding paragraph. Indicate
which are binary and which are not.
c. Identify the quantitative variables mentioned in the preceding paragraph.
Studies from Blink
For each study, identify the observational units and variables. Also, classify each variable as quantitative
or categorical.
a. An economist suspects that chief executive officers (CEOs) of American companies tend to be taller
than the national average height of 69 inches, so she takes a random sample of 100 CEOs and records
their heights.
b. A psychologist shows a videotaped interview of a married couple to a sample of 150 marriage
counselors. Each counselor is asked to predict whether the couple will still be married five years later. The
psychologist wants to test whether marriage counselors make the correct prediction more than half the
time.
c. A psychologist gives an SAT-like exam to 200 African-American college students. Half of the students
are randomly assigned to use a version of the exam that asks them to indicate their race, and the other
half are randomly assigned to use a version of the exam that does not ask them to indicate their race. The
psychologist suspects that those students who are not asked to indicate their race will score significantly
higher on the exam than those who are asked to indicate their race.
d. An economist randomly assigns four actors to go to ten different car dealerships each and negotiate the
best price they can for a particular model of car. The four people are all the same age, dressed similarly,
and tell the car salespeople that they have the same occupation and neighborhood of residence. One of
the actors is a white male, one is a black male, one is a white female, and one is a black female. The
economist wants to test whether or not the average prices differ significantly among these four types of
customers.
Student Data
Think of some questions that can be addressed with data collected on yourself and your classmates.
Some research questions that you could investigate with such data include:
• Do female students tend to study more than male students?
• Do left-handers and right-handers differ with regard to sleeping times?
• Do students who study more tend to score higher on exams?
A Nurse Accused
Statistical evidence played an important role in the murder trial of Kristen Gilbert, a nurse who was
accused of murdering hospital patients by giving them fatal doses of a heart stimulant. Hospital records for
an eighteen-month period indicated that of the 257 eight-hour shifts that Gilbert worked, a patient died on
40 of those shifts (15.6%). But during the 1384 eight-hour shifts that Gilbert did not work, a patient died on
only 34 of those shifts (2.5%).
a. Identify the observational units in this study.
b. Identify the two variables mentioned in the preceding paragraph. Classify each as categorical or
quantitative.
Solution
a. The observational units are the eight-hour shifts.
b. One variable is whether or not Gilbert worked on the shift. This variable is categorical. The other
variable is whether or not a patient died on the shift. This variable is also categorical.
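As a quick check, the percentages quoted in the paragraph above can be recomputed from the shift counts. This is a minimal sketch; the variable names are chosen here for readability.

```python
# Shift counts reported in the text above
shifts_worked, deaths_worked = 257, 40
shifts_not_worked, deaths_not_worked = 1384, 34

# Percentage of shifts on which a patient died, for each group of shifts
pct_worked = 100 * deaths_worked / shifts_worked
pct_not_worked = 100 * deaths_not_worked / shifts_not_worked

print(f"Shifts Gilbert worked:         {pct_worked:.1f}%")      # 15.6%
print(f"Shifts Gilbert did not work:   {pct_not_worked:.1f}%")  # 2.5%
```

Each percentage summarizes one categorical variable (whether a patient died) within one category of the other (whether Gilbert worked), which is why the shift, not the patient, is the observational unit.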
Wrap Up
You can use statistics to address interesting research questions that help you better understand the world
and whatever academic discipline you study. You have seen that statistics played an important role in the
murder trial of Kristen Gilbert, and that statistics enabled researchers to answer questions such as
whether or not CEOs are taller than average and whether or not thinking about their race causes
African-American students to do worse on standardized exams. Because statistics is the science of data, this
topic has given you a sense of what data are and a glimpse of what data analysis entails. Data are not
mere numbers: Data are collected for some purpose and have meaning in some context. For example, the
numbers 5.25 and 37 are not data until you learn that they represent the number of hours slept last night
and the number of states that a person has visited. You encountered the most fundamental concept of
statistics: variability. This concept will be central throughout the course. How long your classmates slept
last night varies from student to student, as does the day of the week on which your classmates
were born. One key idea to learn quickly is that of a variable. Correctly identifying and classifying variables
will serve you well throughout this course and help you determine which statistical tools to apply to the
data.
Some useful definitions to remember and habits to develop from this topic include:
• Always consider data in context and anticipate reasonable values for the data
collected and analyzed.
• A variable is a characteristic that varies from person to person or from thing to
thing. The person or thing is called an observational unit.
• Variables can be classified as categorical or quantitative, depending on whether the
characteristic is a categorical designation (such as gender) or a numerical value (such as height).
Four different levels of measurement
Introduction
We classify data obtained from measurements using numbers, and we can do this with different levels of
precision, or levels of measurement. There are 4 levels of measurement, and it is important to know which
level you are working with because it partly determines the arithmetic and statistical
operations you can carry out on the data. The four levels of measurement in ascending order of precision are:
nominal, ordinal, interval and ratio.
Nominal
At the first level of measurement, numbers are used to classify data. In fact, words or letters would be
equally appropriate. Say you wanted to classify a football team into left-footed and right-footed players. You
could put all the left-footed players into a group classified as 1 and all the right-footed players into a group
classified as 2. The numbers 1 and 2 are used for convenience; you could equally use the letters L and R, or
the words LEFT and RIGHT, to label the groups of players. Numbers are often preferred because text takes
longer to type out and takes up more space. Another example is blood groups, where the letters A, B, O and
AB represent the different classes.
Ordinal
In ordinal scales, values given to measurements can be ordered. One example is shoe size. Shoes are
assigned a number to represent the size, and larger numbers mean bigger shoes. So unlike the nominal scale, which
just reflects a category or class, the numbers of an ordinal scale show an ordered relationship between
numbered items: we know that a shoe size of 8 is bigger than a shoe size of 4. What you can’t say, though,
is that a shoe size of 8 is twice as big as a shoe size of 4. So numbers on an ordinal scale represent a rough
and ready ordering of measurements, but the differences and ratios between measurements
on the scale are not guaranteed to be equal or meaningful.
As with the nominal scale, with ordinal scales you can use textual labels instead of numbers to represent the
categories. So, for example, a scale for the measurement of patient satisfaction with the care they received
in hospital might look like this: | Not satisfied | Fairly satisfied | Satisfied | Very satisfied |
There are many everyday examples of measurements assigned to ordinal scales: social class grading I, II,
III, IV; grades A, B, C, D; house numbers 1,3,5…2,4,6, etc.
Interval
On an interval scale, measurements are not only classified and ordered, and so have the properties of
the two previous scales, but the distances between adjacent points are equal all along the scale,
from the low end to the high end. Two points next to each other on the scale, no matter whether they are
high or low, are separated by the same distance. So when you measure temperature in centigrade, the
distance between 96° and 98°, for example, is the same as between 100° and 102°C. Remember, though,
that for interval scales a measurement of 100°C does not mean that the temperature is 10 times hotter than
something measuring 10°C, even though the value given on the scale IS 10 times as large. That’s because
there is no absolute zero: the zero is arbitrary. On the centigrade scale, the zero value is taken as the point at
which water freezes and the 100°C value as the point at which water begins to boil, and between these two values the
scale is divided into a hundred equal divisions. Temperatures below 0° on the centigrade scale are
designated negative numbers. So the arbitrary 0°C does not mean ‘no temperature’. But when expressed on
the Kelvin scale, a ratio scale, a measure of 0 K, equivalent to -273°C, does indeed mean no temperature!
Other examples of interval measurements are rare, but there’s one you will be familiar with. Calendar years
are an interval scale. The arbitrary 0 was assigned when Christ was born and time before this is labeled
‘BC’.
Ratio
Measurements expressed on a ratio scale have a true zero. Apart from this difference, ratio scales
have the same properties as interval scales. The divisions between the points on the scale have the same
distance between them, and numbers on the scale are ranked according to size. There are many examples of
ratio scale measurements: length, weight, temperature on the Kelvin scale, speed, and counted values like
numbers of people or exam marks, where a score of zero really does mean no marks! Returning to the Kelvin
scale of temperatures, at 0 K, the lowest temperature possible, it is so cold that
molecular motion has essentially stopped.
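The difference between interval and ratio scales can be checked numerically with the 100°C and 10°C comparison from the Interval section. The sketch below is illustrative only; the helper celsius_to_kelvin is defined here for the example, and it uses the more precise offset 273.15 rather than the rounded 273 quoted in the text.

```python
def celsius_to_kelvin(c):
    """Convert a centigrade temperature to kelvin (offset 273.15)."""
    return c + 273.15

hot_c, cool_c = 100.0, 10.0

# Ratio of the centigrade numbers: 10.0, but not physically meaningful,
# because 0 on the centigrade scale is an arbitrary zero
ratio_c = hot_c / cool_c

# Ratio on the Kelvin scale, which has a true zero: about 1.32
ratio_k = celsius_to_kelvin(hot_c) / celsius_to_kelvin(cool_c)

# Differences, by contrast, agree on both scales (90 degrees)
diff_c = hot_c - cool_c
diff_k = celsius_to_kelvin(hot_c) - celsius_to_kelvin(cool_c)

print(f"Centigrade ratio: {ratio_c:.2f}")
print(f"Kelvin ratio:     {ratio_k:.2f}")
print(f"Differences:      {diff_c:.1f} and {diff_k:.1f}")
```

So water at 100°C is only about 1.3 times as hot as water at 10°C in the ratio-scale sense, even though the centigrade numbers differ by a factor of 10.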