Last Name____________________ First Name _____________________Class Time________Chapter 1-1
STATISTICS IS THE STUDY OF DATA
Statistics: how to collect, organize, summarize, analyze, interpret and draw conclusions from information.
2 MAIN BRANCHES:
• Descriptive Statistics: procedures for describing data using numerical and graphical summaries.
• Inferential Statistics: using sample information to draw conclusions about the population.
KEY TERMS AND VOCABULARY:
• Population: the complete set of ALL people or things being studied
• Sample: a SUBSET of the population that information is actually gathered from
• Parameter: a characteristic or number of the P____________________.
A P____________________ is calculated by using all the data values from a P____________________.
• Parameters are often symbolized with Greek letters such as:
Mean: μ (mu), Standard Deviation: σ (sigma), Correlation Coefficient: ρ (rho), Proportion: p.
• Statistic: a characteristic or number of the S____________________.
A S____________________ is calculated by using only the data values from a S____________________.
• Statistics are often symbolized with Roman letters such as:
Mean: x̄ (lower-case x-bar), Standard Deviation: s, Correlation Coefficient: r, Proportion: p’.
Example 1 – Populations, Sample, Parameters, Statistics – For the following:
1) Define the population.
2) Define the sample.
3) Define the parameter in context of the example; include the population defined in context of the problem.
4) Define the statistic in context of the example; include the sample defined in context of the problem.
5) Write the symbol and value of the parameter.
6) Write the symbol and value of the statistic.
a) PROPORTION: European rules specify that their dark chocolate must be made with 35% cocoa solids.
A sample of 17 expensive European dark chocolate bars was made with 70% cocoa solids.
1) Population: All dark chocolate made in Europe
2) Sample: 17 expensive European dark chocolate bars
3) Parameter: proportion of cocoa solids in all European dark chocolate
4) Statistic: proportion of cocoa solids in a sample of 17 expensive European dark chocolate bars
5) Symbol and Value of Parameter: p = 0.35
6) Symbol and Value of Statistic: p’ = 0.70
b) AVERAGE/MEAN: From 1948 to 2005, San Jose had an average rainfall of 14.4 inches per year. During 15
years randomly selected from 1948 to 2005, on average, 13.8 inches of rain fell per year in San Jose.
1) Population:
2) Sample:
3) Parameter:
4) Statistic:
5) Symbol and Value of Parameter:
6) Symbol and Value of Statistic:
c) In a study of 500 randomly selected customers, a local office supply store found that 14% of customers
purchased pens. Through daily sales records, the store accountant determined that 28% of all customers
purchased pens.
1) Population:
2) Sample:
3) Parameter:
4) Statistic:
5) Symbol and Value of Parameter:
6) Symbol and Value of Statistic:
d) Use the following words to fill in the blanks below: sample, statistic, parameter, and population.
The information from a _______________ can be used to estimate the information for a _______________.
A ________________________ can be used to estimate a ____________________________________ .
Variable: an attribute measured or studied for each element in the population.
• Variables whose values are determined by chance are called Random Variables.
• Variables are symbolized with capital Roman letters: X, Y, etc.
• When asked to identify or define the variable, you are being asked to DESCRIBE the
characteristic of interest with a phrase. Your definition should include the type of units
(ounces, pounds, kilograms, etc.) of your variable, when possible.
• You can also identify the variable as the way you would describe
ONE value of data in response to the survey question.
• Data: the collection of values, measurements, or observations the variables can assume.
• The word Data is plural and must be matched with a plural verb in a sentence. (The data were
increasing over time.) Each individual value of a set of data is called a datum.
• Data are symbolized with lower-case Roman letters: x, y, etc.
Types of Variables and Data
(CAUTION: YOU NEED TO BE ABLE TO SPELL ALL OF THESE TYPES CORRECTLY!)
**CAUTION! Be careful to recognize that some quantitative variables which are continuous
are often reported out in a discrete manner. Examples are time/age (20 years, 2 days, etc.),
length/distance (5 miles, 10 feet, etc.) or weight/amount (3 ounces, 6 cubic yards, 9 gallons, etc.).
Even though one might say they are 20 years old, time continues with no breaks. 20 years old is an
estimate, not an exact value of one's age.
Example 2 – Variables and Data: For the following:
1) Define the variable in context of the problem.
2) Identify the type of variable (qualitative, quantitative discrete or quantitative continuous).
3) List 3 possible results for data.
a) The bank manager was interested in how long, in minutes, her customers waited for service, on average.
1) X = the amount of time (minutes) that one bank customer waited for service
OR
X = how long (minutes) one bank customer waited for service
2) Quantitative continuous
3) x = 2, 10.6, 0.58
b) The zookeeper kept track of the months of all penguin births at the zoo over the past 20 years.
1) X =
OR
X =
2)
3) x =
c) When ordering new soles for making repairs, the shoemaker made a list of shoe size lengths for all shoe
repair orders in his shop.
1) X =
OR
X=
2)
3) x =
Example 3: Interpreting Vocabulary:
Owners of a gym are studying the exercise habits of its members in 2012.
Sign-up questionnaires before joining the gym showed that 16% of its members exercised regularly before
enrolling in the gym. A random survey of 100 members showed that 71% exercised regularly after enrolling
in the gym. On average, surveyed members exercised 4 times and spent an average of 2.5 hours exercising in
week 5 after joining the gym. In particular, one client who was surveyed exercised 6 times for a total of 7.8
hours in week 5 after joining the gym.
a. Describe the population using a complete sentence.
b. Describe the sample using a complete sentence.
c. List two qualitative variables:
d. List one quantitative discrete variable:
e. List one quantitative continuous variable:
f. Fill in the blanks with one of the following words: data, statistic, variable, parameter.
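16% ____________________
7.8 ____________________
71% ____________________
6 ____________________
2.5 ____________________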
Example 4: Type of Variables: Variable Definition: Corresponding Data: Fill in the blanks
Type of Variable/Data
Variable Definition
Qualitative
the size of one shirt in the store
Two Data Values
0, 4, 15, 21
the amount (cups) of sugar used in one cake
Quantitative Discrete
the number of different vegetables grown in one garden
Quantitative Continuous
the weight gained by one baby during its first year
the type of apple in one orchard
1, 1.5
Pink Lady, Fuji
Bambi, The Godfather
the amount of gas (gallons) used on one trip
129, 117, 96, 114
Example 5:
• Make up 6 examples of variable definitions.
• You should have 2 examples of variable definitions for each of the three types of variable/data.
• Give 3 values of data for each of your examples.
• Remember to include the units for variables when appropriate.
• Do not use the examples that are included in the worksheet. Come up with your own ideas. Be creative!
• Your work must be NEAT, complete and written in pencil.
Type of Variable/Data          Variable Definition          Two Data Values
TYPES OF STATISTICAL STUDIES:
Census: the collection of data from every member of the population
Example: We're interested in the percent of women who are registered at De Anza this quarter.
• Gender information could be recorded from the registration record of each student at De Anza this quarter and
the percent of women calculated from that information.
Why don't we just always collect data from the whole population?
• A census may be impractical, inaccurate, expensive, too time-consuming, and often impossible:
o If we wanted to find the average income of all people living in California, could we obtain the income of
everyone living in California?
o To find the minimum weight limit of a specific tire, would it be practical to test all tires?
o To find the percent of all ticks carrying the Lyme disease bacteria, could we conduct a census?
Sampling should be designed to represent the population so that it is useful in making inferences, estimates,
predictions or decisions about a population.
Remember: GIGO - G__________ I__________ G__________ O__________
Data are generally obtained from two types of sources:
• Observational Study: observations and measurements of elements in the population are conducted in
a way that doesn’t change the response or the variable being measured. The researcher only observes
what is happening or what has happened in the past and tries to draw conclusions based on those
observations.
o Observe and record the height (inches) of a creek after heavy rains.
o Compare the number of accidents involving gas-powered cars and electric cars.
• Experimental Study: apply some treatment or manipulate one of the variables and then try to
determine how the treatment or manipulation influences other variables being studied.
o Study the effect of a new drug by splitting respondents into two groups. One group receives a
placebo or look-alike pill with no active drug ingredient. The other group is given the drug, a
treatment that modifies the group.
o Compare the responses of voters before and after advertisements.
Common Problems in Statistics to beware of: A sample should be REPRESENTATIVE of the population.
• A sample that is not representative of the population is biased.
o polling drivers of hybrid cars to investigate the country's opinion on environmental issues
• Self-Selected Samples: Are there differences between those who do and do not choose to be in the sample?
o Exit polling for voting results, reviews based on YELP
• Sample Size Issues: A sample should be large enough to provide useful results.
o Using only 6 appliances to determine repair records of a major brand of refrigerator
• Collecting data or asking questions in a way that influences the response:
o Asking "Do you favor a major overhaul of the current Federal Tax Code that would replace today's
burdensome tax system with one that is simpler and fairer?"
versus asking
"Do you favor a major overhaul of the current Federal Tax Code?"
• Non-response or refusal of subject to participate (direct mail surveys, telephone surveys)

• Causality - Confounding or Lurking Variables: a relationship between two variables does not
necessarily imply that changes in one variable cause changes in the other variable. Instead, both
variables may be related to another variable, called a confounding or lurking variable, which might be
the underlying cause for the changes.
o During WWII it was noticed that bombers were more accurate when there
was more opposition from enemy fighters.
Does this mean that more fighter opposition causes greater bombing accuracy?
No, both fighter opposition and bombing accuracy were less in bad weather. More enemy fighters and
more accurate bombing were both related to clear weather.
o Higher sales of ice cream are related to higher sales of bathing suits.
Does this mean that increased sales of ice cream cause increased sales of bathing suits?
o People with a history of bad credit rating are more likely to be in serious accidents.
Does this mean bad credit ratings cause people to be in more serious accidents?
o People who carry matches are more likely to develop lung cancer.
Does this mean that carrying matches causes people to develop lung cancer?
o Studies claim that SAT review courses lead to an increase in SAT scores.
Does this mean that SAT review courses cause better SAT scores?

• Self-Funded or Self-Interest Studies: As long as sponsors of a study have a stake in the conclusions,
these conclusions are inevitably suspect.
o "Vegan diets are bad for bones" was the conclusion based on flawed research funded by a
dairy products producer in July, 2009. (http://www.drmericle.com/wp/34/study-vegetariandiet-weakens-bones-funded-by-dairy-products-industry/)
o polls paid for by a particular political party or by a specific industry
• Misleading Use of Data: improperly displayed graphs, incomplete data, lack of context
TYPES OF SAMPLES:
A sample is a part of or a subset of a population.
A sample should be representative of the population.
A sample that is not representative of the population is biased.
VOCABULARY AND CONCEPTS:
Sampling Error:
Random error obtained by using part of the population to represent the whole population
This type of error, the difference between the sample measure (statistic) and the corresponding population measure
(parameter), is unavoidable because the sample is not a perfect representation of the population.
Non-Sampling Error:
A non-random error caused by human error. It includes improper data collection, data entry, recording or sampling
techniques, biased questions, biased processing or decision making, inappropriate analysis or conclusions, and
false information provided by respondents.
Circle the Correct Response:
True     False     A very large sample always produces more accurate results than a very small sample.
If a sample is BIASED or is the result of IMPROPER DATA COLLECTION
or POOR SAMPLING TECHNIQUES, then increasing the sample size
will NOT correct the underlying errors of the sample.
Collecting more bad data will not make the sample more representative of the population.
Sampling:
• With Replacement:
When sampling with replacement, whenever an element from the population is selected for the sample it is
returned to the population with the possibility of being selected again for the sample. Think of a fisherman
catching an undersized fish from a pond. The fish is thrown back into the pond, where it can be caught again.
• Without Replacement:
When sampling without replacement, whenever an element from the population is selected for the sample it is
removed from the population with no possibility of being selected a second time for the sample. Think of cutting a
rose from a bush. Once the rose is cut, it is removed from the bush and cannot be cut again.
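The two ways of sampling can be tried out on a computer. The sketch below is only an illustration in Python: the population of five numbered fish and the fixed seed are assumptions made up for this example, not part of the worksheet.

```python
import random

# Hypothetical population: the integers 1 through 5 stand in for five fish in a pond.
population = [1, 2, 3, 4, 5]
rng = random.Random(2024)  # arbitrary fixed seed so the run can be repeated

# With replacement: each pick goes back, so the same element can be picked again.
with_replacement = rng.choices(population, k=10)

# Without replacement: each pick is removed, so no element can repeat.
without_replacement = rng.sample(population, k=5)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```

Ten picks from only five elements must contain repeats, which is possible only with replacement. Five picks without replacement use up the whole population, each element exactly once.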
Sampling Methods: We will cover five basic methods. (You must be able to SPELL these correctly.)

SIMPLE RANDOM SAMPLE
• Every element in the population has an equal chance of being included in the sample.
Calculator Instructions to Generate a Random Number:
Accessing the random integer generator in the TI-83+, 84+:
MATH → PRB → #5: randInt(, press ENTER, or you could just press the number 5 key.
randInt( will appear on your calculator window.
Fill in: randInt(lowest integer value, highest integer value, sample size)
The comma key , is just above the number 7 key.
• Now use the random number generator to choose one integer from the integers 4 through 7.
Results: Did everyone in the class get the same integer? Why or why not?
• Now, use your calculator to randomly choose 3 integers from the integers 1 through 5.
Can you tell if your calculator is sampling with replacement or without replacement?
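For students without a calculator at hand, the same two tasks can be sketched in Python. The helper name rand_int is made up for this example; it only imitates the order of the randInt( arguments and is not the calculator's own program.

```python
import random

def rand_int(low, high, size=1, rng=random):
    """Return `size` random integers from low through high, inclusive."""
    return [rng.randint(low, high) for _ in range(size)]

one_pick = rand_int(4, 7)        # one integer from 4 through 7
three_picks = rand_int(1, 5, 3)  # three integers from 1 through 5
print(one_pick, three_picks)
```

Each integer in this sketch is drawn independently of the others, so the same value can show up more than once in three_picks.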

STRATIFIED SAMPLE
• Divide the entire population into at least two different subgroups or strata.
• Draw a random sample from each subgroup.
• The size of the random samples may be the same or different.
• Combine the random samples from each subgroup to form the stratified sample of the
population.
• Think of a geological core sample of soil taken by drilling down into the surface and extracting
a cylindrical core of material. The core will have several layers of varying height of the
different soils or strata of the earth. Each type of soil is represented in the core sample, a
stratified sample of the population soil.
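The steps for a stratified sample can be sketched in Python as follows. The strata names, member labels and sample sizes are hypothetical and chosen only to keep the example small.

```python
import random

def stratified_sample(strata, sizes, rng):
    """Draw a simple random sample from EVERY stratum, then combine them."""
    sample = []
    for name, members in strata.items():
        sample.extend(rng.sample(members, sizes[name]))
    return sample

# Hypothetical population split into two strata.
strata = {
    "first-year": [f"F{i}" for i in range(1, 13)],   # F1 ... F12
    "second-year": [f"S{i}" for i in range(1, 9)],   # S1 ... S8
}
sizes = {"first-year": 3, "second-year": 2}          # sizes may differ by stratum

rng = random.Random(1)  # arbitrary fixed seed
sample = stratified_sample(strata, sizes, rng)
print(sample)
```

Every stratum contributes to the sample, which is what separates this method from a cluster sample.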

CLUSTER SAMPLE
• Divide the entire population into at least two different subgroups or clusters.
• Randomly select one or more clusters.
• Choose ALL members from those selected clusters for the sample to form the cluster sample of the
population. None of the members from those clusters not selected will be in the sample.
• Think of selecting grapes from a bunch of grapes arranged at a buffet. Grapes are formed as small
clusters on an entire branch of the bunch of grapes. Typically, one would break off one or more
small clusters of grapes from the whole branch to take. The grapes broken off would combine to
form the cluster sample of the entire population of the bunch of grapes.
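A cluster sample can be sketched the same way. The classrooms and names below are hypothetical; the point is only that whole clusters are picked at random and every member of a picked cluster is kept.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep ALL members of each picked cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = [member for name in chosen for member in clusters[name]]
    return chosen, sample

# Hypothetical population already divided into clusters (classrooms).
clusters = {
    "Room 101": ["Ana", "Ben", "Cy"],
    "Room 102": ["Dee", "Eli"],
    "Room 103": ["Fay", "Gus", "Hal", "Ida"],
    "Room 104": ["Jo", "Kai"],
}

rng = random.Random(7)  # arbitrary fixed seed
chosen, sample = cluster_sample(clusters, 2, rng)
print(chosen, sample)
```

Nobody from an unselected classroom can appear in the sample, and the sample size depends on which clusters happen to be chosen.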

SYSTEMATIC SAMPLE
• Number all members of the population sequentially, in some order.
• Then, from a starting point selected at random, include every kth member of the population in
the sample until the desired sample size is reached. (As a practical matter, often the starting
point may not be randomly selected but chosen as one of the members near the beginning of
the ordered list.)
• Caution: Be careful about how the subjects in the population were numbered. For example, if
they were arranged as parent, child, parent, child, etc. and every 50th subject was selected for the
sample, then the sample would consist of all parents or all children.
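The caution about ordering can be checked with a short Python sketch. The list of 1,000 alternating subjects is hypothetical, and picking the random start among the first k members is an assumption of this sketch.

```python
import random

def systematic_sample(population, k, sample_size, rng):
    """Start at a random position among the first k members, then take every kth member."""
    start = rng.randrange(k)
    return population[start::k][:sample_size]

# Hypothetical ordered list: parent, child, parent, child, ...
population = ["parent" if i % 2 == 0 else "child" for i in range(1000)]

rng = random.Random(3)  # arbitrary fixed seed
sample = systematic_sample(population, 50, 10, rng)
print(sample)
```

Because 50 is an even step, every selected position has the same parity as the starting position, so the sample holds only parents or only children no matter where it starts.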

CONVENIENCE SAMPLE
• Use data from the elements of a population that are readily and easily available.
• Caution: Most convenience samples such as those based on telephone surveys and internet polls
are not based on a random selection of elements. Convenience samples are not necessarily
representative and may well be biased.
Simple Random
Stratified
Cluster
Systematic
Convenience
Example 6: Write the name of the sampling method used for each example below:
a. ______________________________ Use postal Zip Codes for Florida to divide the state into regions. Pick a
random sample of 3 Zip Codes and include all hospitals in each selected Zip Code area.
b. ______________________________ The American Statistical Association directory lists its members in
alphabetical order. A sample of 1,000 members is obtained by selecting the 5,342nd member and every 15th
member thereafter.
c. ______________________________ Among the students in class today, two random samples are taken, 15 of the
men and 10 of the women.
d. ______________________________ The phone company estimated the average cost of phone service to its
customers by reviewing 358 accounts randomly selected from its customer base.
e. ______________________________ The minestrone soup received an average rating of 4.5 based on a sample of
794 online reviews of the recipe.
f. ______________________________ At a wine tasting fair, several wineries displayed their bottles of merlot, pinot
noir, zinfandel and burgundy wines. The wine connoisseur randomly selected one type, merlot, and tasted all of the
merlot wines and no others.
g. ______________________________ One thousand workers from a large company were surveyed by randomly
selecting 250 workers each from managers, laborers, craftsmen, and administrators.
Example 7: We're interested in the average number of hours all De Anza students slept last night.
Data were gathered from a previous statistics class.
a. Is this an observational study or an experiment? _____________________________________________
b. Define the population: _________________________________________________________________
c. What sampling method is being used? ______________________________________________________
d. What is the Survey Question: _____________________________________________________________
e. List one possible data value, including the units of that value:
_____________________________
f. Fill in the blank with your data value in answer "e" above and then finish the remainder of the sentence.
_________________________ represents ___________________________________________________
g. Now define the variable with the phrase you used to finish the sentence:
X = ___________________________________________________________________________________
Caution: DO NOT define the variable as:
WRONG!!! X = the number of hours De Anza students slept last night
as this would mean that each and every student slept the same number of hours last night.
WRONG!!! X = the average number of hours De Anza students slept last night
as this would mean that each and every student slept the average number of hours last night.
h. What type of data are we collecting? ___________________________________________________
i. Define the parameter in context of this problem: Remember to include the population definition.
j. Define the statistic in context of this problem: Remember to include the sample definition.
Example 8: Organize data with a Frequency Table
• A Frequency Table partitions data into classes or intervals so that each data value falls into exactly one class.
• A Frequency Table consists of two headings.
• The first heading consists of the data values in order from small to large.
• The second heading consists of the frequency or number of responses for the data value.
• Frequency tables can be oriented either horizontally or vertically.
We will continue to use the data collected from a previous statistics class of 42 students.
X: # of hours     0     1     3     5     6     7     8     10
Frequency         1     1     2    ___    15    6     5     2
OR
X: number of hours     Frequency
0                      1
1                      1
3                      2
5                      ___
6                      15
7                      6
8                      5
10                     2
n = 42 or ∑ = 42
The frequencies add up to "n", the sample size. "N" is the symbol for the size of the population.
The symbol for a sum of values added together is the " ∑ " , the upper-case Greek letter sigma.
a. Show your work to find the missing frequency value:
b. Now we will calculate the average of the data. The number you calculate will be a
________________________, because this is a number calculated from the _______________________.
c. Show the algebraic calculations needed to calculate the average in a clear, complete manner:
d. The sample average (estimated to three decimal places) is __________________________________.
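The same two calculations can be checked with a few lines of Python. To leave the class exercise for you to work out, the sketch uses a made-up frequency table (number of pets in one household, n = 20); the data, variable names and numbers are assumptions for illustration only.

```python
# Hypothetical frequency table (NOT the class data): number of pets in one household.
values      = [0, 1, 2, 3]
frequencies = [4, 8, None, 3]   # one frequency is missing
n = 20                          # sample size

# The frequencies must add up to n, so the missing one is n minus the known total.
known_total = sum(f for f in frequencies if f is not None)
missing = n - known_total
frequencies = [missing if f is None else f for f in frequencies]

# Sample mean: add up (value x frequency) for every row, then divide by n.
sample_mean = sum(x * f for x, f in zip(values, frequencies)) / n

print(missing, round(sample_mean, 3))
```

For this made-up table the missing frequency is 20 - 15 = 5 and the sample mean is (0·4 + 1·8 + 2·5 + 3·3) / 20 = 27 / 20 = 1.35.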
Example 9: Graphing the data with a Dot Plot, the simplest of graphs that we will learn.
• Continue to use the same data.
• Draw a horizontal line for your axis, using a ruler.
• Label the right side of your axis with the definition of the variable, or, if you've defined the variable in
earlier work, then label your axis with the variable symbol, X. (Your variable letter should read
clearly as a CAPITAL letter. Points on written work will be lost for unclear notation!)
• Scale your axis with tick marks and corresponding numbers. (Accuracy and neatness are essential with
graphs. A RULER must be used to draw & scale your axis. Points will be lost if a ruler is not used!)
• Then plot each data value with a dot above the corresponding value on the horizontal axis.
• For repeated data values, stack the dots evenly.
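To see what a finished dot plot looks like, here is a rough text version in Python. It is only a sketch: the function name dot_plot and the pets data are made up for this example, and a hand-drawn plot still needs a ruler.

```python
def dot_plot(values, frequencies, label, width=4):
    """Return a text dot plot: stacked dots above an evenly scaled horizontal axis."""
    freq = dict(zip(values, frequencies))
    # Include every integer from the smallest to the largest value so the scale is even.
    ticks = range(min(values), max(values) + 1)
    lines = []
    for level in range(max(frequencies), 0, -1):  # build rows from the top down
        row = "".join(("*" if freq.get(t, 0) >= level else " ").center(width) for t in ticks)
        lines.append(row.rstrip())
    lines.append("".join(str(t).center(width) for t in ticks).rstrip())  # axis numbers
    lines.append(label)                                                  # axis label
    return "\n".join(lines)

# Hypothetical data: number of pets in one household, 20 households.
plot = dot_plot([0, 1, 2, 3], [4, 8, 5, 3], "X = number of pets in one household")
print(plot)
```

The axis includes every whole number between the smallest and largest value, even ones with no data, so gaps in the data show up as gaps in the plot.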
Example 10: Organizing Data With a Cumulative Relative Frequency Table:
Using the Frequency Table you previously constructed,
• Add a Relative Frequency column to create a Relative Frequency Table
o Relative Frequency = Frequency / Sample Size
o The sum of the relative frequency column should be __________________________________.
• Add a Cumulative Relative Frequency column to create a Cumulative Relative Frequency Table
o Cum. Rel. Freq. = Sum of the Relative Frequency and all prior Relative Frequencies
OR
o Cum. Rel. Freq. = (Sum of Frequency and all previous Frequencies) / Sample Size
o The last value of the cumulative relative frequency column should be _____________________.
a. Now complete the table that follows. Your answers should be decimal values, estimated to 4 places.
Cumulative Relative Frequency Table
x (# of hours)   Frequency   Relative Frequency   Cumulative Relative Frequency
0                1           1/42 = 0.0238        1/42 = 0.0238
1                1           0.0238               0.0238 + 0.0238 = 0.0476 OR (1+1) / 42 = 0.0476
3                2           0.0476               0.0952
5                10          0.2381               0.3333
6                15          ________             ________
7                6           ________             ________
8                5           ________             ________
10               2           ________             ________
                 n = 42      *∑ = 1
*(Both the sum of relative frequencies and the last cumulative relative frequency should equal 1 or,
due to rounding error, should be very close to 1.)
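The two new columns can also be built in Python. The sketch below uses a made-up frequency table (number of pets in one household, n = 20) rather than the class data, so the blanks above are still yours to fill in.

```python
from itertools import accumulate

# Hypothetical frequency table (NOT the class data).
values      = [0, 1, 2, 3]
frequencies = [4, 8, 5, 3]
n = sum(frequencies)  # sample size

# Relative Frequency = Frequency / Sample Size
relative = [f / n for f in frequencies]

# Cum. Rel. Freq. = (sum of this frequency and all previous frequencies) / Sample Size
cumulative = [running_total / n for running_total in accumulate(frequencies)]

print(f"{'x':>3} {'Freq':>5} {'Rel. Freq.':>11} {'Cum. Rel. Freq.':>16}")
for x, f, r, c in zip(values, frequencies, relative, cumulative):
    print(f"{x:>3} {f:>5} {r:>11.4f} {c:>16.4f}")
```

Adding up the counts first and dividing once at the end avoids the small rounding differences that appear when already-rounded relative frequencies are added together.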
Calculator instructions to create the Relative Frequency Column:
To input data values and frequencies:
STAT → EDIT → #1: EDIT → ENTER
To clear a column of numbers already entered,
use the 4 cursor arrows to move the cursor over the column heading (L1, L2, L3, etc.)
Press CLEAR → ENTER
WARNING: DO NOT PRESS DEL when your cursor is on the column heading or you will eliminate the
column entirely from your screen.
However, if this does happen, reset your calculator with:
Press 2ND MEM (this is the second operation when pressing the + key)
→ #7↓ RESET → ALL → 1: ALL Memory → ENTER
Now that the column is empty, enter data values (x values) in List Column 1 (L1).
Enter frequencies in List Column 2 (L2).
To enter relative frequencies in Column 3 (L3), move the cursor onto the column heading, L3.
Enter the formula: L2 / total number of data values entered (sum of frequencies).
To enter L2 (this is the second operation above the digit 2 key), press 2ND L2.
So for our example, enter 2ND L2 / 42.
Values in the relative frequency column are probabilities for selecting a specific data value: P(X = x).
Values in the cumulative relative frequency column are probabilities for selecting a data value less than or
equal to a specific data value: P(X ≤ x).
Probabilities are decimal or fraction values between 0 and 1, inclusive:
0 ≤ P(X=x) ≤ 1
Example 11: Fill in the following:
a. P(X=3) = ______ b. P(X≠3) = ______ c. P(X≤ 3) = ______ d. P(X< 3) = ______ e. P(X≥ 3) = ______
f. P(X> 3) = ______ g. P(X≤ 9) = ______ h. P(X< 9) = ______ i. P(X≥ 9) = ______ j. P(X> 9) = ______
Using your Cumulative Relative Frequency Table, find the percent of data responses that follow:
• First write the problem with a direct translation of the probability statement.
For example: P ( X = 2 ), P ( X ≠ 2 ), P ( X < 2 ), P ( X ≤ 2 ), P ( X > 2 ), or P ( X ≥ 2 )
• If you have written a statement with “ = ” ,
you will be able to use the relative frequency column to find your answer.
• If you have written a statement with “≤ ” ,
you will be able to use the cumulative relative frequency column to find your answer.
• If you have written a statement with any other symbol, rewrite the probability statements with
equivalent probability statements that do include either “ = ” or “≤ ”.
• Show your steps neatly, one below the other, connecting all equivalent steps with an “=” sign.
• Show all numerical work to reach your final answer.
• Write your final answer as a percent, estimated to one decimal place.
(Take care: 0.8403, estimated as a percent to one decimal place is 84.0%, not 84%!)
1. not 6?
P ( X ≠ 6)
= 1 - P ( X = 6)
= 1 - 0.3571
= 0.6429
= 64.3%
(Use Rel. Freq.)
OR
P ( X ≠ 6)
= P ( X = 0, 1, 3, 5, 7, 8, or 10)
= .0238 + .0238 + .0476 + .2381 + .1429 + .1190 + .0476
= 0.6428
= 64.3%
(Use Rel. Freq.)
2. no less than 7?
P ( X ≥ 7)
= P ( X = 7, 8, or 10)
= ( 6 + 5 + 2 ) / 42
= 0.3095
= 31.0%
(Use Rel. Freq.)
OR
P ( X ≥ 7)
= 1 - P ( X < 7)
= 1 - P ( X ≤ 6)
= 1 - 0.6904
= 0.3096
= 31.0%
(Use Cum. Rel. Freq.)
3. exactly 6?
P ( X = 6)
= 0.3571
= 35.7%
(Use Relative Frequency)
4. less than 7?
P ( X < 7)
= P ( X ≤ 6)
= 0.6904
= 69.0%
(Use Cumulative Relative Frequency)
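The rewriting rules used in these worked problems can be sketched in Python. The data are made up (number of pets in one household, n = 20), and the helper names p_equal, p_at_most and as_percent are hypothetical; exact fractions are used so that no rounding happens until the final percent.

```python
from fractions import Fraction

# Hypothetical frequency table (NOT the class data).
values      = [0, 1, 2, 3]
frequencies = [4, 8, 5, 3]
n = sum(frequencies)

def p_equal(x):
    """P(X = x): the relative frequency of x."""
    return sum(Fraction(f, n) for v, f in zip(values, frequencies) if v == x)

def p_at_most(x):
    """P(X <= x): the cumulative relative frequency at x."""
    return sum(Fraction(f, n) for v, f in zip(values, frequencies) if v <= x)

def as_percent(p):
    """Format a probability as a percent with one decimal place."""
    return f"{float(p) * 100:.1f}%"

p_not_2       = 1 - p_equal(2)    # P(X != 2) = 1 - P(X = 2)
p_less_than_2 = p_at_most(1)      # P(X < 2)  = P(X <= 1) for whole-number data
p_at_least_2  = 1 - p_at_most(1)  # P(X >= 2) = 1 - P(X < 2) = 1 - P(X <= 1)
p_more_than_2 = 1 - p_at_most(2)  # P(X > 2)  = 1 - P(X <= 2)

print(as_percent(p_not_2), as_percent(p_less_than_2),
      as_percent(p_at_least_2), as_percent(p_more_than_2))
```

Every statement is rewritten with either "=" or "≤" before a number is looked up, which mirrors the steps you are asked to show by hand.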
5. not 5?
6. no less than 3?
7. less than 2?
8. at least 4?
9. no more than 4?
10. more than 4?
11. under 8?
12. at most 2?