STA 2023 Elementary Statistics
Lecture Notes
Chapter 1 – Introduction to Statistics
Professor Achenbach
Introduction
Statistics - the science of collecting, organizing, analyzing, and interpreting data.
Chapter 1: Collecting Data
Chapter 2: Organizing/Analyzing Data
Chapters 3-8: Interpreting Data
Types of Data Sets
Population - data set consisting of all outcomes, measurements, or responses of interest
Sample - data set which is a subset of the population data set
Examples:

If we are interested in measuring the salaries of American high-school teachers,
the population data set would be a list of the salaries of every high-school teacher
in America. A sample data set could be obtained by selecting 100 high-school
teachers from across the country and listing their salaries.

A polling organization wants to know whether Americans favor increased defense
spending. The population data set would consist of the responses of every
American. A common way of choosing a sample data set would be to randomly
call 1000 Americans and gather their responses to the question of whether they
favor increased defense spending.

A biologist wants to measure the weights of female Alaskan grizzly bears. What
would be the population data set? A possible sample set?
Types of Measurements
Parameter - a numerical measurement made using the population data set
Statistic - a numerical measurement made using a sample data set
Examples:

Using the teacher salary data sets, we could calculate the average salary for the
high-school teachers. The average calculated from the population data set would
be the parameter. The average calculated from the sample of 100 teachers would
be a statistic.

Using the opinion poll data on defense spending, we could calculate the
percentage of Americans who favor increased defense spending. The actual
percentage of all Americans who favor increased defense spending would be
the parameter. The percentage of the 1000 Americans in our sample who favor
increased spending would be a statistic.
Notice that unless the population is very small it is probably impossible to gather the
population data set, and so it is usually impossible to calculate the parameter we are
interested in.
The main idea of the science of statistics is that we can get around this difficulty by
selecting a sample, calculating the sample statistic, and using the sample statistic to
make an estimate of the parameter.
Unfortunately, statistical estimates can never be 100% certain, but they can be made
with a stated level of confidence, such as 90%, 95%, or 99%.
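The idea of estimating a parameter from a statistic can be sketched in a few lines of Python. This is only an illustration: the population below is made up (50,000 simulated salaries, not real teacher data), and the names `population`, `parameter`, and `statistic` are chosen for this example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 50,000 made-up salaries (assumption, not real data).
population = [random.gauss(60_000, 12_000) for _ in range(50_000)]

# Parameter: a measurement computed from the entire population data set.
parameter = statistics.mean(population)

# Statistic: the same measurement computed from a sample of 100,
# drawn at random without replacement.
sample = random.sample(population, 100)
statistic = statistics.mean(sample)

print(f"Population mean (parameter): {parameter:,.0f}")
print(f"Sample mean (statistic):     {statistic:,.0f}")
print(f"Estimation error:            {statistic - parameter:,.0f}")
```

In practice the first calculation is the one we usually cannot carry out, which is why the sample mean is used as the estimate.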
Types of Data
Qualitative Data - non-numerical characteristics or labels
Examples:
Eye Color, First Name
Favorite Movie, Political Party
Quantitative Data - numerical measurements or quantities
Examples:
Height, Weight, Income
Resting Pulse Rate, Blood Alcohol Level
Levels of Measurement
Nominal Data – Are always qualitative. Data values serve as labels, but the labels have
no meaningful order.
Examples:
Blood Type, College Major, Breed of Dog
Shape of Bacteria in a Petri Dish
Ordinal Data – Can be qualitative or quantitative. Data values serve as labels but the
labels have a natural meaningful order. Differences between values, however, are
meaningless.
Examples:
Statistics Grade, NCAA Basketball Rankings
Terror Threat Level
Interval Data – Are always quantitative. Data values are numerical, so they have a
natural meaningful order, and differences between data values are meaningful. The ratio
of two data values, however, is meaningless. This occurs when zero is an arbitrary
measurement rather than actually indicating “nothing”.
Examples:
Temperature (in degrees Fahrenheit or Celsius), Year of Birth
Ratio Data – Are always quantitative. Data values are numerical, have order, and both
differences and ratios between values are meaningful. A measurement of zero indicates
the absence of the quantity being measured.
Examples:
Weight, Height, Volume, Number of Children
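One way to see the difference between interval and ratio data is to check whether a ratio survives a change of units. The sketch below uses example values chosen only for illustration (10 and 20 degrees Celsius, 40 and 80 kilograms); the helper names `c_to_f` and `kg_to_lb` are assumptions made for this example.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

def kg_to_lb(kilograms):
    """Convert kilograms to pounds (approximate conversion factor)."""
    return kilograms * 2.20462

# Interval data: zero is arbitrary, so the ratio depends on the unit.
cold_c, warm_c = 10.0, 20.0
ratio_c = warm_c / cold_c                   # ratio in Celsius
ratio_f = c_to_f(warm_c) / c_to_f(cold_c)   # same temperatures in Fahrenheit

# Ratio data: zero means "no weight", so the ratio is the same in any unit.
light_kg, heavy_kg = 40.0, 80.0
ratio_kg = heavy_kg / light_kg
ratio_lb = kg_to_lb(heavy_kg) / kg_to_lb(light_kg)

print(f"Temperature ratio: {ratio_c:.2f} in Celsius, {ratio_f:.2f} in Fahrenheit")
print(f"Weight ratio:      {ratio_kg:.2f} in kilograms, {ratio_lb:.2f} in pounds")
```

The temperature ratio changes when the unit changes, so saying "twice as hot" is meaningless. The weight ratio stays the same, so saying "twice as heavy" is meaningful.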
Methods of Data Collection
Census - Collect measurements from the entire population.
Used when the population is small.
Examples:
Determine the average grade on a Statistics exam
Measure the salaries of all 50 state governors
Sampling - Choose a sample from your population and collect measurements
from the sample.
Used when the population is large. (Most common)
Examples:
Opinion Polls
Determine the average income in the U.S.
Simulation - Program a computer with a mathematical or physical model to
simulate population data.
Used when it is impossible to collect sample data.
Examples:
Temperature at the core of the Sun
Monte Carlo Simulations
Experiment - Collect a sample and split the sample into two groups:
The Case Group receives treatment.
The Control Group does not.
Used to measure the effect of treatment by comparing the characteristics of the
case and control groups.
Additional Terms: Placebo, Placebo Effect, Single Blind Experiment,
Double Blind Experiment
Example:
A sample of 200 cancer patients is selected. An experimental drug is given to
100 patients and the remaining 100 patients receive a placebo. The survival
rates of the two groups are then compared.
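The Simulation method can be illustrated with a very small Monte Carlo sketch. The model here is an assumption chosen for simplicity (rolling two fair dice and asking how often they sum to 7); it is not an example from these notes, and the function name `estimate_prob_sum_seven` is hypothetical.

```python
import random

def estimate_prob_sum_seven(trials, seed=0):
    """Estimate P(two fair dice sum to 7) by simulating many rolls."""
    rng = random.Random(seed)  # seeded generator so the run is repeatable
    hits = 0
    for _ in range(trials):
        # The mathematical model: each die is equally likely to show 1 to 6.
        if rng.randint(1, 6) + rng.randint(1, 6) == 7:
            hits += 1
    return hits / trials

estimate = estimate_prob_sum_seven(100_000)
print(f"Simulated estimate: {estimate:.4f}")
print(f"Exact probability:  {1 / 6:.4f}")
```

Here the exact answer is known, so the simulated data can be checked against it. Simulation is most useful when no such check is available and no real sample can be collected.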
Methods of Sampling
Random Sampling - The sample is chosen as a result of chance occurrences.
Examples:
Telephone polling random telephone numbers
Drawing names out of a hat
Systematic Sampling - The population is placed on a list, a random starting
point is chosen, and then every k-th member is selected.
Examples:
Choosing a sample of registered voters by choosing every 25th voter from the
county registration roll
Testing every 300th product from the assembly line
Stratified Sampling - The population is divided into groups (strata), usually with
meaningful differences, and a sample is chosen from each group.
Examples:
Choosing 200 men and 200 women for a sample
Stratify the population by income level and then choose a sample of low, middle,
and high income individuals
Cluster Sampling - The population is divided into groups in a more or less
random way, and then a sample is chosen by randomly selecting entire groups.
Examples:
Randomly choose 10 polling stations in a city and exit poll all voters at those
stations
Convenience Sampling - Choose individuals for a sample because they are
easy to include.
Examples:
Internet Polls
Mail-In Customer Survey
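The sampling methods can be compared side by side in a short Python sketch. Everything here is illustrative: the population is made up (1,000 people, each with an income level and a polling station), and the function names are hypothetical, chosen only to mirror the method names in these notes.

```python
import random

rng = random.Random(42)  # seeded generator so the run is repeatable

# Hypothetical population: 1,000 people (assumption, not real data).
population = [
    {"id": i,
     "income": rng.choice(["low", "middle", "high"]),
     "station": rng.randrange(20)}
    for i in range(1000)
]

def random_sample(pop, n):
    """Random sampling: every member is chosen purely by chance."""
    return rng.sample(pop, n)

def systematic_sample(pop, k):
    """Systematic sampling: random starting point, then every k-th member."""
    start = rng.randrange(k)
    return pop[start::k]

def stratified_sample(pop, key, n_per_stratum):
    """Stratified sampling: split into strata, then sample from each one."""
    strata = {}
    for person in pop:
        strata.setdefault(person[key], []).append(person)
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, n_per_stratum))
    return chosen

def cluster_sample(pop, key, n_clusters):
    """Cluster sampling: pick whole groups at random and keep everyone in them."""
    clusters = sorted({person[key] for person in pop})
    picked = set(rng.sample(clusters, n_clusters))
    return [person for person in pop if person[key] in picked]

simple = random_sample(population, 50)
systematic = systematic_sample(population, 25)          # every 25th person
stratified = stratified_sample(population, "income", 30)
cluster = cluster_sample(population, "station", 3)      # 3 whole stations
convenience = population[:50]                           # whoever is easiest to reach

for name, s in [("random", simple), ("systematic", systematic),
                ("stratified", stratified), ("cluster", cluster),
                ("convenience", convenience)]:
    print(f"{name:12s} sample size: {len(s)}")
```

Note that stratified sampling takes some members from every group, while cluster sampling takes every member from a few groups.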