P201 Lecture Notes 01: Chapter 1

Chapter 1 - Introduction
Steinberg Ch 1
Definition of statistics
1. “Statistics” refers to techniques for summarizing characteristics of collections of numbers.
Such a summary is called a descriptive statistic.
Example: Arithmetic average. The average age of faculty is 34. The average age of students is 22.
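As a minimal sketch of this descriptive statistic, the arithmetic average adds up the scores and divides by how many there are. The function name `arithmetic_mean` and the five ages below are hypothetical, made up for illustration. They are not the faculty or student data mentioned above.

```python
def arithmetic_mean(scores):
    """Add up all the scores, then divide by how many there are."""
    if not scores:
        raise ValueError("need at least one score")
    return sum(scores) / len(scores)

# Hypothetical ages for five students (invented for illustration).
student_ages = [19, 21, 22, 23, 25]
print(arithmetic_mean(student_ages))  # 22.0
```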
Why do we work with collections of numbers? Why won’t just one number work?
Consider the effects of smoking
How old is the oldest person you know who smokes?
How old is the oldest person you know who doesn’t smoke?
Most of us know someone who smokes and has lived a long life.
Many of us know someone who never smoked but died at an early age.
It is only when the lifespans of hundreds of people who smoked are compared with those of hundreds of people who never smoked
that the detrimental effects of smoking become apparent.
The issue is the variability of people. Take 100 people who have all been exposed to exactly the same
conditions, and you’ll get 100 different results.
Variability of people is the cross that statisticians bear.
Variability is why statisticians have devised ways to summarize collections of scores – descriptive statistics.
Biderman's 201 L01 - 1
2. “Statistics” refers to techniques for making inferences (educated guesses) from samples to populations.
This process is called inferential statistics.
The issue is that the collection about which we want to make a statement is HUGE: all smokers vs. all
nonsmokers. How do we make a valid statement about such a huge collection of scores?
The solution is to take a sample from the huge collection, describe the sample, and then use the sample
result to inform our statement about the larger collection of scores.
Example: Polling – the Gallup Poll, the USA Today Poll, etc.
Consider campaign spending. In which states should a candidate for president spend campaign money?
Only in those states where the election will be close. In clearly winning and clearly losing
states, money won’t make enough of a difference to change the state’s electoral vote.
How do we know in which states the election will be close?
We could ask ALL the registered voters in the state.
Or we could take a sample of voters, ask them, and base the results on the sample.
That is inferential statistics – making a decision about a population based on the results of a sample.
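The polling idea can be sketched as a small simulation, assuming a hypothetical state of 100,000 registered voters in which exactly 52% support candidate A (both numbers are invented). In a real poll the population value is unknown; here it is built in so the sample result can be compared with it. Python's standard `random.sample` draws voters without replacement, and the helper name `poll` is hypothetical.

```python
import random

# Hypothetical population: 1 = supports candidate A, 0 = does not.
POPULATION_SIZE = 100_000
N_SUPPORT = 52_000
TRUE_SUPPORT = N_SUPPORT / POPULATION_SIZE  # 0.52, unknown in a real poll
population = [1] * N_SUPPORT + [0] * (POPULATION_SIZE - N_SUPPORT)

def poll(voters, sample_size, rng):
    """Ask a random sample of voters; return the proportion who support A."""
    sample = rng.sample(voters, sample_size)  # sampling without replacement
    return sum(sample) / sample_size

rng = random.Random(201)  # fixed seed so the run is repeatable
estimate = poll(population, 1_000, rng)
print(f"Sample estimate of support: {estimate:.3f}")
print(f"Population value:           {TRUE_SUPPORT:.3f}")
```

Rerunning with a different seed gives a somewhat different estimate each time. That sample-to-sample variability is what inferential statistics has to account for.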
Overview of this course
I. Descriptive Statistics: Chapters 1-6, 34-36
II. Probability Concepts: Chapters 7-15
III. Inferential Statistics: Chapters 16-31
Why should we learn statistics?
Statistics helps us use data to make decisions.
Decision: The act of choosing one of two or more alternative courses of action.
Example: Deciding whether or not to make the iPhone keyless.
Sources of influence
1. Authority: Ask important people who know about keyboards.
2. Logic: Figure out what the best course of action is.
3. Religion: Pray for divine guidance.
4. Data: Build two prototypes, one with keys and one without. Have people use both and evaluate them.
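Source 4 can be sketched in a few lines, assuming each person rates both prototypes on a 1-10 scale. The ratings below are invented, and the "higher average wins" rule is only a naive illustration, not a complete decision procedure.

```python
from statistics import mean

# Hypothetical ratings (1 = hate it, 10 = love it) from six people
# who tried both prototypes. These numbers are made up.
ratings_with_keys = [6, 7, 5, 8, 6, 7]
ratings_keyless = [8, 7, 9, 6, 8, 9]

mean_keys = mean(ratings_with_keys)
mean_keyless = mean(ratings_keyless)

# Naive rule: choose the prototype with the higher average rating.
# Whether a difference this size is trustworthy is a question for
# inferential statistics.
choice = "keyless" if mean_keyless > mean_keys else "with keys"
print(f"with keys: {mean_keys:.2f}, keyless: {mean_keyless:.2f}, choose: {choice}")
```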
Basic Concepts
p. 3-5
Population: A collection of individuals about which we want to make some statement.
Examples: school-aged children, UTC students, people working at a local manufacturing plant,
registered voters, likely voters
Sample: Any subset of a population.
Random Sample: A sample chosen so that
1) each individual in the population has an equal chance of being selected, and
2) each possible combination of individuals (of the sample's size) is equally likely to be selected.
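In Python, the standard library's `random.sample(population, k)` draws `k` distinct individuals without replacement in a way that satisfies both conditions above. The sketch below assumes a hypothetical population of ten people labeled P1 to P10 and adds a rough empirical check of condition 1: across many samples of size 3, each person should be picked about 3/10 of the time.

```python
import random
from collections import Counter

population = [f"P{i}" for i in range(1, 11)]  # 10 hypothetical individuals

rng = random.Random(42)  # fixed seed so the run is repeatable
sample = rng.sample(population, k=3)  # 3 distinct individuals
print("One random sample:", sample)

# Rough check of condition 1: count how often each person is selected
# over many independent samples of size 3.
draws = 20_000
counts = Counter(
    person for _ in range(draws) for person in rng.sample(population, k=3)
)
for person in population:
    print(person, round(counts[person] / draws, 3))  # each near 0.3
```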
Variable: A characteristic that can take on different values for different individuals in a population.
Examples: Age, IQ, Height, Political Conservatism, Depression, Conscientiousness, Gender, Race
Numeric variable: A variable whose values are typically expressed only as numbers.
Examples: Age, IQ, Height.
Nonnumeric variable: A variable whose values are typically expressed as names.
Examples: Gender, Race, College major.
Constant: A characteristic that does not change from one person to the next in a population.
Example: Gender in a population of males.
Data: The numbers representing values of variables.
A single value such as 56 is not “a data”; it is a datum.
Data Matrix: A display of data in row x column form.
Each column represents a variable.
Each row represents an individual.
Example, with height and weight
Person  Height  Weight
P1      72      215
P2      66      144
P3      84      267
P4      49      87
P5      58      135
Etc.
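A minimal sketch of the same data matrix in Python stores one row per individual and one column per variable, then summarizes each column with its arithmetic average. The names in the code are hypothetical, and no units are assumed because the notes do not give any.

```python
from statistics import mean

# The height/weight data matrix: each row is an individual,
# each column is a variable (units not stated in the notes).
variables = ["Height", "Weight"]
data_matrix = {
    "P1": [72, 215],
    "P2": [66, 144],
    "P3": [84, 267],
    "P4": [49, 87],
    "P5": [58, 135],
}

# Descriptive statistics: summarize each column (variable) by its average.
column_means = {}
for col, name in enumerate(variables):
    column = [row[col] for row in data_matrix.values()]
    column_means[name] = mean(column)

print(column_means)  # {'Height': 65.8, 'Weight': 169.6}
```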
[Figure: Part of a real-life data matrix used in a study of ATV accidents]
Relationship of the above to the two meanings of statistics
1. We summarize the values of variables in samples. This is descriptive statistics.
2. We use those summaries to make educated guesses about similar summaries in populations. This is
inferential statistics.
Concepts covered in this lecture
Descriptive statistics
Inferential statistics
Population
Sample
Random sample
Variable
Constant
Data
Data matrix