Chapter 15: Statistics

Section 15.1: Formulating Statistical
Questions, Gathering Data, and Using
Samples
Statistical Questions
• Statistical Questions: ones that can be answered by collecting and
analyzing data (pieces of information, can be numerical or categorical)
• Ex’s: (a) What is the height of each student in our class?
(b) How many home runs has each player hit in a baseball team’s
starting lineup?
(c) Which on-campus restaurant is the favorite among UK students?
(d) What is the average weight of watermelons sold at your local
grocery store?
Types of Statistical Studies
• Observational Studies: observe characteristics or quantities without
influencing these characteristics or quantities
• Experiments: try to determine factors that influence characteristics or
quantities
Gathering Data
• The population is the full set of people or things that the study is
designed to investigate.
• Ex’s: (a) the students in our class, (b) the players in the starting
lineup, (c) the UK student body, (d) all of the watermelons
• A sample of a population consists of some collection of members of
the population.
• Samples need to be representative, i.e. the characteristics of the
sample reflect those of the population
• Ideal representative sample: a random sample, in which every member of
the population has an equal chance of being in the sample.
• Ex: Asking whichever students happen to be in the Student Center at
lunchtime about their favorite on-campus restaurant does not produce a
random or representative sample, because not every student has an equal
chance of being asked.
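Drawing a random sample can be sketched in Python. The roster here is a hypothetical stand-in for a population (ID numbers 1 through 20,000), and the seed is arbitrary; `random.sample` draws without replacement, so every member has an equal chance of being chosen.

```python
import random

# Hypothetical population: ID numbers standing in for 20,000 students.
population = list(range(1, 20001))

# Seeded generator so the sketch is reproducible; the seed value is arbitrary.
rng = random.Random(15)

# Draw 250 distinct members; each ID is equally likely to be chosen.
sample = rng.sample(population, 250)

print(len(sample), len(set(sample)))  # 250 250
```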
Using Samples
• Ex: If a random sampling of 250 students showed that 60 students
said Panda Express was their favorite on campus restaurant, about
how many of UK’s 20,000 students would be expected to say that
Panda Express is their favorite?
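One way to work this example is to scale the sample proportion up to the whole population. The sketch below assumes the sample is representative; the variable names are illustrative only.

```python
from fractions import Fraction

sample_size = 250
said_panda = 60
population_size = 20_000

# Proportion of the sample that chose Panda Express (6/25, i.e. 24%).
proportion = Fraction(said_panda, sample_size)

# Scale the sample proportion up to the whole population.
expected = proportion * population_size

print(float(proportion), int(expected))  # 0.24 4800
```

So about 4,800 of the 20,000 students would be expected to name Panda Express.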
Section 15.2: Displaying and
Interpreting Data
Displaying Categorical Data
• Real graph: display using real objects in
graph form
• Ex: using Starburst candy or wrappers
to display how many pieces there are of
each color
Displaying Categorical Data
• Pictograph: uses icons or pictures to display the data
• Ex: each small colored
rectangle represents
one piece of candy of
that flavor in one
package of candy
Displaying Categorical Data
• Bar Graph: uses a single rectangle for each category to display the data
• Ex: the height of each bar
represents the number of
pieces of candy of each
flavor
Displaying Categorical Data
• Double Bar Graph: each category is subdivided into 2 smaller categories
• Ex: Displays median weekly
earnings broken down by race
and further subdivided by
gender.
Displaying Categorical Data
• Pie Graph: uses a subdivided circle to show how data is partitioned
into categories
• Ex: Shows the percentage of
UK students that prefer that
particular on campus restaurant
Displaying Numerical Data
• Dot Plot: a pictograph with categories being numbers or intervals and
the icons being dots
• Ex: The following displays the home runs hit by hitters in a baseball
lineup, where the home runs hit were 12, 17, 25, 32, 24, 17, 12, 12, 8
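A rough text version of this dot plot can be sketched in Python. As a simplification, it lists only the home run totals that actually occur rather than every number along the axis, and it uses `*` for each dot.

```python
from collections import Counter

home_runs = [12, 17, 25, 32, 24, 17, 12, 12, 8]
counts = Counter(home_runs)

# One row per distinct value, one mark per player with that total.
rows = [f"{value:>2} | " + "*" * counts[value] for value in sorted(counts)]
print("\n".join(rows))
```

The row for 12 carries three marks, since three players hit 12 home runs.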
Displaying Numerical Data
• Histogram: a bar graph with categories being numbers or intervals
• Ex: The number of students
earning each letter grade on
an exam is displayed.
Displaying Numerical Data
• Stem and Leaf Plot: 2 columns in which the stem (left column) and
leaf (right column) together form the data
• 2 | 0572 = data includes 20, 25, 27, and 22
• Ex: Displays the exam scores from the previous histogram
5 | 266
6 | 93
7 | 187827
8 | 02053901
9 | 58263
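Building a stem and leaf plot from raw scores can be sketched as follows. The score list is read off the plot above, in the order the leaves appear; sorting the leaves within each row is a common convention that the slide itself does not follow.

```python
from collections import defaultdict

# Exam scores read off the plot above, in the order the leaves appear.
scores = [52, 56, 56, 69, 63, 71, 78, 77, 78, 72, 77,
          80, 82, 80, 85, 83, 89, 80, 81, 95, 98, 92, 96, 93]

stems = defaultdict(list)
for score in scores:
    stem, leaf = divmod(score, 10)  # tens digit is the stem, ones digit the leaf
    stems[stem].append(leaf)

# Sort the stems, and (by assumption) the leaves within each stem.
lines = [f"{stem} | " + "".join(str(leaf) for leaf in sorted(leaves))
         for stem, leaves in sorted(stems.items())]
print("\n".join(lines))
```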
Displaying Numerical Data
• Line Graphs: data points are plotted and adjacent points are
connected by line segments, used for continuously varying data
• Ex: Displays the U.S. population each decade over the
20th century.
Displaying Numerical Data
• Scatterplot: collection of data points in a plane, shows how 2 kinds of
data are related
• Ex: Compares homework and exam averages for my MA 201 students from
last semester
[Scatterplot: Homework Averages on the horizontal axis and Exam Averages
on the vertical axis, each scaled from 0 to 120]
Reading Graphs
• 1. Reading the data: lift facts directly from the graphs
• 2. Reading between the data: use mathematical concepts and skills to
compare or combine quantities and identify relationships between
data
• 3. Reading beyond the data: predict or infer from the data
Section 15.3: The Center of Data:
Mean, Median, and Mode
Motivating Example
• Ex: If your 4th grade math class had the following scores on their
adding fractions quiz, what score did the “average” student get?
7, 4, 9, 10, 7, 9, 9, 6
• A single-number summary of a set of numerical data is called a
measure of center. Ex’s are mean, median, and mode.
The Mean
• Def: The mean (or arithmetic mean or average) of a list of numbers is
calculated by adding all of the numbers and dividing that sum by the
length of the list (the number of numbers).
• Ex 1: The mean of the quiz scores 7, 4, 9, 10, 7, 9, 9, 6 is …
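The definition translates directly into a short computation: add the scores, then divide by how many there are. A minimal sketch:

```python
scores = [7, 4, 9, 10, 7, 9, 9, 6]

total = sum(scores)   # sum of all the numbers
count = len(scores)   # length of the list
mean = total / count

print(total, count, mean)  # 61 8 7.625
```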
Visualizations of the Mean
• Consider a dot plot of your numerical data as if the axis is a seesaw.
The mean is the location of the fulcrum so that the seesaw is
perfectly balanced.
• See Activity 15K for another visual approach to finding the mean.
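The seesaw picture can be checked numerically: with the fulcrum at the mean, the total distance of the dots on the left equals the total distance of the dots on the right. This sketch assumes every dot has the same weight.

```python
scores = [7, 4, 9, 10, 7, 9, 9, 6]
mean = sum(scores) / len(scores)  # 7.625

# Total distance from the fulcrum for dots on each side of it.
left = sum(mean - x for x in scores if x < mean)
right = sum(x - mean for x in scores if x > mean)

print(left, right)  # 6.5 6.5
```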
The Median
• Def: The median of a list of numbers is the middle number of the list
once it is ordered from smallest to largest.
• Ex 2: The median of the quiz scores 7, 4, 9, 10, 7, 9, 9, 6 is …
• If the length of the list is even, then take the average of the middle
two numbers. If the length is odd, no average is needed.
See Activity 15O
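Both cases of the definition (odd and even length) can be sketched in one small function. The function name is illustrative, and the second list is just the quiz scores with the last one dropped, to show the odd case.

```python
def median(values):
    ordered = sorted(values)  # order from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd length: the single middle number
    # Even length: average the two middle numbers.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 4, 9, 10, 7, 9, 9, 6]))  # 8.0
print(median([7, 4, 9, 10, 7, 9, 9]))     # 9
```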
The Mode
• Def: The mode of a list of numbers is the number that occurs most
frequently.
• Ex 3: The mode of the quiz scores 7, 4, 9, 10, 7, 9, 9, 6 is …
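Finding the mode amounts to tallying how often each number occurs. The sketch below returns every number tied for the highest count, since a list may have more than one mode.

```python
from collections import Counter

scores = [7, 4, 9, 10, 7, 9, 9, 6]
counts = Counter(scores)

highest = max(counts.values())
# Collect every value tied for the highest count.
modes = sorted(value for value, count in counts.items() if count == highest)

print(modes, highest)  # [9] 3
```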
Revisiting Our Motivating Example
• What score did the average or typical student get if the quiz scores
were 7, 4, 9, 10, 7, 9, 9, 6?
• 7.625 (the mean), 8 (the median), and 9 (the mode) are all reasonable
answers
Categorical Data
• Def: The modal category is the category listed most frequently in your
categorical data.
• Note that the modal category is not necessarily the “favorite” category in
statistical studies like the favorite on campus restaurant question. Study
voting theory for other possibilities.
Section 15.4: Data
Distributions
Definition and Types
• Def: Data distribution: numerical data displayed in a dot plot or
histogram.
• A data distribution with a long tail that extends to the right (left) of
the majority of the data is skewed to the right (left).
Types of Data Distributions
• Data distributions which have two or more peaks are bimodal.
Types of Data Distributions
• Data distributions which have or nearly have reflectional symmetry
are symmetric.
Other statistical measures
• Def: The Pth percentile of a set of numerical data is the number such
that P% of the data is ≤ that number.
• Def: The 1st quartile is the 25th percentile and the 3rd quartile is the
75th percentile.
• What is the 50th percentile called?
The median
• Ex 4: What are the 1st quartile, 3rd quartile, and 90th percentile of the
following quiz scores?
3, 4, 5, 5, 5, 6, 7, 7, 7, 7, 7, 8, 8, 9, 9, 9, 9, 10, 10, 10
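For these scores, no value has exactly 75% of the data at or below it (65% of the scores are ≤ 8 and 85% are ≤ 9), so a convention is needed. The sketch below assumes the nearest-rank rule: take the smallest data value with at least P% of the data ≤ it. Other conventions exist and can give different quartiles (for instance, taking the median of the lower half gives 5.5 for the 1st quartile). The function name is illustrative, and `p` is assumed to be a whole number.

```python
def percentile(values, p):
    """Nearest-rank rule: smallest data value with at least p% of the data <= it."""
    ordered = sorted(values)
    n = len(ordered)
    rank = max(1, -(-p * n // 100))  # ceiling of p*n/100 using integer arithmetic
    return ordered[rank - 1]

quiz = [3, 4, 5, 5, 5, 6, 7, 7, 7, 7, 7, 8, 8, 9, 9, 9, 9, 10, 10, 10]
print(percentile(quiz, 25), percentile(quiz, 75), percentile(quiz, 90))  # 5 9 10
```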
Another Graphical Display
• Def: A box plot (box and whiskers plot) is a display of the lowest
value, highest value, 1st and 3rd quartiles, and the median, as shown
in the example below.
• Ex: Draw a box plot for the quiz scores from Ex 4:
3, 4, 5, 5, 5, 6, 7, 7, 7, 7, 7, 8, 8, 9, 9, 9, 9, 10, 10, 10
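The five values a box plot displays can be collected before drawing it. This sketch again assumes the nearest-rank rule for the quartiles; a different quartile convention would move the edges of the box.

```python
from statistics import median

quiz = sorted([3, 4, 5, 5, 5, 6, 7, 7, 7, 7, 7, 8, 8, 9, 9, 9, 9, 10, 10, 10])
n = len(quiz)

# Quartiles by the nearest-rank rule (an assumed convention; others exist).
q1 = quiz[-(-25 * n // 100) - 1]
q3 = quiz[-(-75 * n // 100) - 1]

# Lowest value, 1st quartile, median, 3rd quartile, highest value.
five_number = (quiz[0], q1, median(quiz), q3, quiz[-1])
print(five_number)  # (3, 5, 7.0, 9, 10)
```

The box spans from 5 to 9 with a line at the median 7, and the whiskers reach out to 3 and 10.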