C. Pang's Elementary Statistics Notes & Worksheets
(v.090728b)
ISBN-5-6987458-1-1
© Chi-yin Pang, 2009
No Refund/No Buy-back Value
Contents
1. Basics
a. Basic Terms
b. Critical Thinking
2. Organizing Data
a. Graphs of Quantitative Data
b. Statdisk Instruction
c. Project 1 Assignment and Example
d. Graphs of Categorical Data
3. Center and Variation
a. Measures of Center
b. Standard Deviation
4. Probability
a. Contingency Table (“Titanic”)
b. Stacked Bar Graph (Assessing Dependency)
c. Binomial Distribution
i. Binomial Motivation Example (Juror Selection Process)
ii. Binomial Computation
iii. Binomial Application
iv. Binomial Mean and Standard Deviation
d. Normal Distribution
i. Height Distributions of 2-Year-Olds & 20-Year-Olds
ii. Normal CDF Computation
iii. Inverse Normal CDF Computation
iv. SAT vs. ACT Example (“Donald & Micky”)
v. Central Limit Theorem (CLT)
vi. Normal Distribution Application (Setup)
vii. Central Limit Theorem (CLT) Application (Setup)
5. Inferential Statistics
a. Confidence Interval (CI)
i. “CI Facts” (Summary)
ii. Interpretation of CI
b. Hypothesis Test (HT)
i. Types of Errors & Power of HT
ii. Defining H0 and H1
c. Project 2: CI or HT Assignment & Examples
d. 2-Sample Mean HT (or CI) Data Collection (for Independent Samples)
e. 2-Sample Proportion HT (or CI) Data Collection
f. Correlation & Regression (Car Weight vs. MPG)
g. Chi-Square Goodness-of-Fit Test (Distribution of Categories)
h. Test of Dependency (Gender & Politics; “Titanic”)
6. CI & HT Flowcharts; HT Setup and Decision Methods
Elementary Statistics
Basic Terms Worksheet
Ref. [Tri 2008]
NAME: _______________________
(Ver.090117)

For each statement below, identify the Subject (the individuals that provide
the data), the Question (to or about the subject), the Sample, and the
Population (all the subjects of interest); then circle Stat(istic) or
Parameter (p.5).

A survey of 35 students at Laysie College finds an average of 4.3 units per
student.
    Stat | Parameter

NY City has 3,250 Walk Buttons. 77% of them (2,502 buttons) are broken.
    Stat | Parameter

A sample of 877 executives was surveyed. 45% of them (395) would not hire
someone with a typo on their job application.
    Stat | Parameter

p.10 #5     Stat | Parameter
p.10 #6     Stat | Parameter
p.10 #7     Stat | Parameter
p.10 #8     Stat | Parameter
p.11 #21    Stat | Parameter
p.11 #22    Stat | Parameter
p.11 #23    Stat | Parameter
p.11 #24    Stat | Parameter
Basic Terms Worksheet, Page 2
NAME: _______________________
(Ver.090117)

For each variable below, circle one choice in each group:
Quantitative or Categorical (p.6); Discrete or Continuous (no gaps) (p.6);
Nominal or Ordinal or Interval or Ratio (pp.7-9).

Breed of dog (bulldog, retriever, etc.)
    Quant | Categorical    Discrete | Continuous    Nominal | Ordinal | Interval | Ratio
Cell phone's area code.
    Quant | Categorical    Discrete | Continuous    Nominal | Ordinal | Interval | Ratio
Size of coffee (small, medium, large).
    Quant | Categorical    Discrete | Continuous    Nominal | Ordinal | Interval | Ratio
Outdoor noontime temperature in degrees F.
    Quant | Categorical    Discrete | Continuous    Nominal | Ordinal | Interval | Ratio
Number of people in a country.
    Quant | Categorical    Discrete | Continuous    Nominal | Ordinal | Interval | Ratio
Weight of a candy.
    Quant | Categorical    Discrete | Continuous    Nominal | Ordinal | Interval | Ratio
p.10 #10
    Quant | Categorical    Discrete | Continuous    Nominal | Ordinal | Interval | Ratio
p.10 #11
    Quant | Categorical    Discrete | Continuous    Nominal | Ordinal | Interval | Ratio
p.10 #12
    Quant | Categorical    Discrete | Continuous    Nominal | Ordinal | Interval | Ratio
p.10 #13
    Quant | Categorical    Discrete | Continuous    Nominal | Ordinal | Interval | Ratio
p.10 #14
    Quant | Categorical    Discrete | Continuous    Nominal | Ordinal | Interval | Ratio
p.10 #15
    Quant | Categorical    Discrete | Continuous    Nominal | Ordinal | Interval | Ratio
p.10 #16
    Quant | Categorical    Discrete | Continuous    Nominal | Ordinal | Interval | Ratio
p.10 #17
    Quant | Categorical    Discrete | Continuous    Nominal | Ordinal | Interval | Ratio
p.10 #18
    Quant | Categorical    Discrete | Continuous    Nominal | Ordinal | Interval | Ratio
p.11 #19
    Quant | Categorical    Discrete | Continuous    Nominal | Ordinal | Interval | Ratio
p.11 #20
    Quant | Categorical    Discrete | Continuous    Nominal | Ordinal | Interval | Ratio
____________
    Quant | Categorical    Discrete | Continuous    Nominal | Ordinal | Interval | Ratio
____________
    Quant | Categorical    Discrete | Continuous    Nominal | Ordinal | Interval | Ratio
Elementary Statistics
Critical Thinking Examples
(v.090207+)
Sampling Problems:
Some samples do not represent the population; therefore, the sample statistic is useless for making
inferences about the population.

Situation: From a randomly picked sample, I have concluded that, at SJCC,
66.7% of people drive Hondas and 33.3% of people drive Toyotas.
More background: The statistic is from a sample of 3 cars: 2 Hondas and 1 Toyota.
Problem: Don't draw conclusions from a small sample. [Tri08, p.13] (We will
learn how to determine the minimal sample size later.) Another e.g.: [Huff54]

Situation: www.RateMyProfessors.com
More background: Only the students who felt strongly took the time to write.
Problem: Voluntary sampling misses the "middle spectrum" of the population.
Don't use voluntary sampling! Don't trust the result. [Tri08, p.12]

Situation: The July 27, 2001 Orange County Register reported: "All but 2% of
[home] buyers have at least one computer at home. …" [BVD07, p.282]
More background: The survey was conducted via the Internet.
Problem: This convenience sampling misses the home buyers who do not use a
computer. It does not represent the population. Do not believe the result!
[Tri08, p.28]

Situation: Conduct the U.S. Census by telephone. [Tri08, p.16]
More background: The homeless and poor may not have phones.
Problem: A segment of the population is missing. Beware of Missing Data.
[Tri08, p.16]

Situation: Conduct an opinion poll over the phone. [Tri08, p.16]
More background: Some won't answer, to avoid "selling under the guise" of a poll.
Problem: Those afraid of telemarketers are not represented. Beware of
Non-response. [Tri08, p.16]
Survey Question Problems:
Some survey questions can be "loaded" to encourage a certain answer. Even the order of the
questions may have an effect on the answers. (See [Tri08, p.16] for examples.)
Representation and Interpretation Problems:
The following could be from ignorance (fooling self and others).
They can also be intentional tricks to lead people to wrong conclusions. DON'T BE FOOLED!

Situation: Wage statistics (~1954), drawn as pictures: Rotundia: $30/week vs.
USA: $60/week. [Huff54]
More background: E.g., the Carter $ is worth only 44% of the Ike $, but the
area of the Carter bill in the pictograph is only 19% of the Ike bill. [Tufte83]
Problem: Don't use pictographs. The 2-D and 3-D pictures falsely exaggerate
the relative size. [Tri08, p.14]

Situation: Two graphs of the same data (0-30 M$): "Gov't Pay Rolls Stable!"
vs. "Gov't Pay Rolls Shoot Up!" [Huff54]
Problem: Beware of graphs that do not have the horizontal axis at y = 0.
Include y = 0 for the correct perspective. [Tri08, p.13]
Another misrepresentation e.g.:

Situation: "17.6264% of the bird/plane collisions strike the engine."
More background: 10,916 struck the engine out of 61,930 collisions, between
1990 and 2007. 10916/61930 = 0.176263523…
Problem: Reporting many digits of an estimate gives a false impression of a
very accurate and precise investigation. It is less misleading to report
"About 17.6% ..." or "About 18% …" See Precise Numbers in [Tri08, p.17].

Situation: "We need an emergency pay cut of 50%; after the difficult quarter
we will restore the pay with a raise of 50%." [Huff54]
More background: Pay before the cut = $100. Pay after the cut = $50.
50% of the new pay = $50 × 0.5 = $25. Pay after the "restoration" (a 50%
increase) = $50 + $25 = $75!
Problem: Beware of how percentages are calculated. [Tri08, p.14]
Another e.g.: Using % to manipulate conclusions. [Huff54]

Situation: "Honey, statistics show that BMW owners have a longer life
expectancy. For the sake of our lives, let's buy a BMW."
More background: BMW owners tend to be wealthy. Wealthy people have better
health care.
Problem: "Correlation does not imply Causality." [Tri08, p.17]
E.g., when ice cream sales go up, more people drown. Let's ban ice cream to
save lives!

Situation: In a semester, your instructor taught Statistics in the morning as
well as in the evening. The evening students were older. The evening class
also did better. Can you conclude that the older students were better students?
More background: The instructor repeated the morning lectures in the evening
(with improvements).
Problem: Was the higher achievement due to "age" or "better lectures"? The
effect of "student age" and the effect of "lecture improvement" were
Confounded. When planning an experiment, you must carefully think through all
the factors that might affect the result. [Tri08, p.23]

Ethical Problems: (See [Tri08, p.17].)
Beware of:
• Self-Interest Studies (e.g., an oil company sponsors a study to prove their gas is better)
• Partial Pictures (What are they hiding?)
• Deliberate Distortions (claims based on nothing)
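The pay-cut example above can be checked numerically. This is a short Python sketch (not part of the original handout) showing why the 50% "restoration" falls short: the raise is computed on the smaller, post-cut base.

```python
# The pay-cut example, worked numerically: a 50% cut followed by a
# 50% raise does not restore the original pay, because the raise is
# a percentage of the smaller, post-cut amount.
pay = 100.0
after_cut = pay * (1 - 0.50)             # $100 -> $50
after_restore = after_cut * (1 + 0.50)   # $50 + $25 = $75, not $100
print(after_cut, after_restore)          # 50.0 75.0
```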
Elementary Statistics
Graphs for Quantitative Data
(v.090721)

Goal: These graphs summarize the "distribution" of all the data.
"Distribution" means what values are common (concentrated) and
what values are rare. These are tools to assess whether the data is
from a "normally distributed" (bell-shaped) population.

Example: In a Statistics class (Fall 2008), scores for Test 1 were as follows:
94 87 81 100 76 62 96 81 87 79 86 60 82 73 62 83 99 98 75 94 42 82 93

G1: Dotplot [Tri08, p.58]
Step 1: Label the x-axis.
• Find the minimum and maximum values from the data set.
• Decide the best x-scale to cover all the values.
• Label the x-axis.
Step 2: Plot the dots.
For each value, put a dot over the appropriate x. When there is a
repeated value, just stack the dots.
Observations:
Where are the values concentrated?
Identify any outliers, values that are far from the rest.
Are the values normally distributed (i.e., bell-shaped)?

G2: Stemplot (Stem-and-Leaf) [Tri08, p.59]
(See [Tri08, p.59] for more detail. Also see Wikipedia.)
Step 1: Label the "stem".
• Find the minimum and maximum values from the data set.
• Decide the range of the stem values.
• Label the stem heading, stem unit, and stem values.
• Label the leaf heading and leaf unit.
• CAUTION: Do not skip any stem value.
Step 2: Record each "leaf" value.
Step 3: Sort the leaf values for each "stem row".
• CAUTION: Some stems might not have leaves. You must not
delete that "stem row".
Observations:
Where are the values concentrated?
Identify any outliers, values that are far from the rest.
Are the values normally distributed (i.e., bell-shaped)?

Note: The stemplot and boxplot evolved from Arthur Bowley's work. Bowley (1869-1957) wrote
the first English-language statistics textbook.
G3: Histogram [Tri08, pp.51-]
Enter data into L1:
[STAT]>Edit>1:Edit>[ENTER]
Use [^] to go up to L1>[CLEAR]>[ENTER]
Enter the data into L1.
Quit the list editor: [2nd][MODE](QUIT)
Make the histogram:
[2nd][Y=](STAT PLOT)>1:Plot1…
Set up Plot1 as shown in the screen shot.
[ZOOM]>9:ZoomStat
Copy the histogram to scale.
Label the histogram: Use [TRACE] and [<] [>].
Observations:
Where are the values concentrated?
Identify any outliers, values that are far from the rest.
Are the values normally distributed (i.e., bell-shaped)?

G4: Boxplot (Box-and-Whisker) [Tri08, p.121]
Make the plot:
[2nd][Y=](STAT PLOT)>1:Plot1…
Set up Plot1 as shown in the screen shot.
[ZOOM]>9:ZoomStat
Copy the boxplot to scale.
Label the boxplot: Use [TRACE] and [<] [>].
Interpretation:
• A quarter of the data are between minX and Q1.
• A quarter of the data are between Q1 and Med.
• A quarter of the data are between Med and Q3.
• A quarter of the data are between Q3 and maxX.
• If there are any outliers, they would be indicated with mark(s)
separated from the "whisker".
Observations:
Where are the values concentrated?
Identify any outliers, values that are far from the rest.
Are the values normally distributed (i.e., bell-shaped)?
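The five numbers behind a boxplot (minX, Q1, Med, Q3, maxX) can be sketched with Python's statistics module, using the Test 1 scores. Note this is an illustration: `statistics.quantiles` with `method="inclusive"` interpolates between data points, so its quartiles can differ slightly from the TI's quartile rule.

```python
import statistics

# Test 1 scores from the example above
scores = [94, 87, 81, 100, 76, 62, 96, 81, 87, 79, 86, 60,
          82, 73, 62, 83, 99, 98, 75, 94, 42, 82, 93]

# Q1, Med, Q3 (inclusive method; may differ slightly from the TI)
q1, med, q3 = statistics.quantiles(scores, n=4, method="inclusive")
five_number = (min(scores), q1, med, q3, max(scores))
print(five_number)
```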
G5: Normal Quantile Plot [Tri08, pp.53-54]
Make the plot:
[2nd][Y=](STAT PLOT)>1:Plot1…
Set up Plot1 as shown in the screen shot.
[ZOOM]>9:ZoomStat
Copy the normal quantile plot to scale.
Interpretation:
• The normal quantile plot is mainly used to assess whether a data set
is normally distributed. The points line up straight if and
only if the data set is normally distributed.
• Outliers, if any, would be located on the lower left or upper
right, far from the straight line formed by the other points.
Observations:
Are the values normally distributed (i.e., bell-shaped)?
Identify any outliers.

G3, G4, G5 with Statdisk's "Explore Data" Function
Install and launch Statdisk: See the Statdisk Instruction handout.
Enter data into column 1: See the Statdisk Instruction handout.
Use the "Explore Data" function to do "Exploratory Data Analysis":
Data>Explore Data>choose column 1>[Evaluate]
Compare the Statdisk result with the TI results.
Note:
• The histograms are different, because the TI and Statdisk used
different classes.
• Statdisk's Boxplot does not show outliers.
Elementary Statistics
Graphs for X-Y Data
(v.090120+)

Scatterplot [Tri08, p.60]
Example: Are horsepower & fuel economy correlated? Here are
advertised horsepower ratings and expected gas mileage for several
2001 vehicles. [BVD07, p.164]

Vehicle          x: Horsepower   y: Gas Mileage
Audi A4          170 hp          22 mpg
Chevy Prizm      125             31
Ford Excursion   310             10
Honda Civic      127             29
Lexus 300        215             21
Olds Alero       140             23
VW Beetle        115             29

Plot the data by hand.
Plot the data with the TI:
Enter data into L1 & L2: [STAT]>Edit>1:Edit>[ENTER] etc.
Make the scatterplot: [2nd][Y=](STAT PLOT)>1:Plot1…
Set up Plot1 as shown in the screen shot.
[ZOOM]>9:ZoomStat
Observation: Does the graph show a correlation?

Time-Series Graph [Tri08, p.61]
Example: Trend of marriage age. Here are the average ages of
American women (at first marriage) in different years. [BVD07, p.214]

x: Year   1900   1910   1920   1930   1940   1950   1960   1970   1980   1990
y: Age    21.9   21.6   21.2   21.3   21.5   20.3   20.2   20.8   22.0   23.9

Plot the data by hand.
Plot the data with the TI:
Enter data into L1 & L2: [STAT]>Edit>1:Edit>[ENTER] etc.
Make the time-series plot: [2nd][Y=](STAT PLOT)>1:Plot1…
Set up Plot1 as shown in the screen shot.
[ZOOM]>9:ZoomStat
Observation: What trend do you see?
Elementary Statistics
Triola’s Statdisk Instruction
(v.090301+)
The textbook is Mario F. Triola, Essentials of Statistics (3rd Ed.), Pearson, 2008.
Installing Triola's Statdisk from the CD-ROM:
1.) Insert the STATDISK CD in the CD-ROM drive.
2.) Double click the "My Computer" icon on your desktop (Windows) or open a
Finder window (Macintosh).
3.) Double click on your CD-ROM drive icon. This will open a window
containing the contents of this CD.
4.) Double click on the "Software" folder. This will display the contents of
that folder.
5.) Double click the "STATDISK" folder. This will display the contents of
that folder.
6.) Double click on the file that corresponds to your operating system. Drag
the Statdisk executable to your hard drive to install it.
Installing Triola's Statdisk from the Web:
1.) Go to http://wps.aw.com/aw_triola_stats_series and click Essentials of
Statistics.
2.) Click [STATDISK]. This leads to
http://media.pearsoncmg.com/aw/aw_triola_elemstats_9/software/statdisk.htm
3.) (Assume you have a PC.) Click "STATDISK 10.4.0 for Windows".
4.) Download and save the file to
My Document/Download/TiolaStat/sd_10_4_0_win2kXP.zip
5.) Go to the folder "My Document/Download/TiolaStat/sd_10_4_0_win2kXP.zip",
double click Statdisk.exe, then click [Extract All].
Now you have the application Statdisk.exe with the Histogram icon.
A pictorial guide to drawing a comparative boxplot

You can follow this example to get a feel for how easy it is to make a
comparative boxplot. In your project, you would enter your data into
columns 1 and 2.

Launch Statdisk.
Then load the data from p.631's Data Set 8: Forecast and Actual Temperatures.
We will plot columns 1 and 2.
NOTE: For the comparative boxplots to be effective, you
MUST plot both sets of data at the same time, not
one at a time. The two plots must be together and
on the same scale.
Paste the boxplot into Word
You can use the [Alt]+[PrtSc] (Alternate, Print Screen) keys to "take a
picture" of the StatDisk window. Then, in your Word document (your report),
use Edit>Paste to paste the StatDisk window into your report.
Cropping the boxplot picture in Word
To trim off the unnecessary borders of the pasted boxplot picture, you need
to use the picture crop tool. If you do not see that tool, you need to add
the "Picture Tool Bar".
Click the boxplot picture, then select the crop tool. The cursor becomes the
crop symbol. Use the crop symbol to drag the corners of the picture to trim
the borders.
Position of the picture
You can also use Format>Picture>Layout>Square to experiment with
repositioning the placement of the boxplot on your page.
Then [Close].
Elementary Statistics
Project 1 Assignment
(v.090728)
Project 1: Comparative Boxplots or Dotplots Assignment
Learning Objective
The objectives are to:
• Gain experience with "field" data collection.
• Use comparative Boxplots or Dotplots to compare data.
• Exercise your written presentation skills.
You will use numerical methods to analyze the same data in Project 2.
Due Date
Proposal due _______________
(send me a 1-line e-mail, chi-yin_pang@alumni.hmc.edu)
Report due _________________
Topic
You are to collect two sets of QUANTITATIVE data, of at least 31 samples
each. (Use measurement data, not categorical or Yes/No data.) Compare
the samples with comparative boxplots or dotplots to highlight the difference
(or similarity) of their distributions, and draw conclusions about your
investigation. (If you want to use other graphic displays, talk with me.)
See the end of this assignment for examples.
Pick your own "fun" topic; especially a "fun" topic for which you have ready
access to the data. "Fun" makes work easy. If you are not interested in the
topic, it is like pulling your own teeth.
Project Report's Content Requirements
Your report must include:
• An introduction to your topic. (It would be good to discuss your original
prediction of the result.)
• A description of how you collected the data and the source of the data,
along with the raw data.
• A description of your analysis, presenting the comparative boxplots.
• Conclusion(s). If appropriate, state what you would do differently
for further investigation.
Editing Requirement
The report must be word processed:
• You may draw the graphs with software, for example, StatDisk. You may
use graph paper and sketch the graphs neatly and to scale. You may also
use the "Courier New" font (which is an "equal-width" font) and
hand-typed character graphics, to scale.
• In case you use formulas, type them with Equation Editor.
E.g., μ = (Σx)/n = (x1 + ... + xn)/n.
For Word, use Insert>Object…>Object type: Microsoft Equation 3.0>[OK].
Caution
There are many potential "time-eaters" along the way, such as:
• Collecting data.
• Installing/configuring software for graphing, equation editing, etc.
• Using the equation editor.
• Wanting to collect more data to investigate further.
• Thinking through and writing up the conclusion.
Start the project early to find out what might surprise you.
Project Ideas: (Real projects from Fall 2008)
Price Comparison
Merchandise prices:
Paper back vs. Hardcover
Wal-Mart vs. Target
Starbucks vs. Peets
NVIDIA 260 vs NVIDIA 9800
Rent price:
SF vs. SJ; SJ vs. NY
East SJ vs. West SJ
Car prices:
Charger vs. Magnum
Prius vs. Civic
BMW M5: SF vs. LA
Saturn Ions: 2 vs. 4 door
Office rental price:
Downtown SJ vs.
not Downtown SJ
Women’s boot prices:
Aldo vs. Bakers
Shoe prices:
Manolo Blahnik vs.
Jimmy Choo
Home price:
Zip 95123 vs. Zip 95051
Purse prices:
Coach vs. Dooney & Bourke
Gucci vs. Burberry
# pets in 1000 households:
dogs vs. cats
# text messages:
In box vs. Out box
Past due traffic tickets (in $):
Male vs. Female
Unemployment rates:
8/2007 vs. 9/2008
Household incomes:
3-person families vs.
4-person families
Pairs of jeans owned:
Male vs. Female
College tuitions:
Private vs. Public
Nursing schools passing
rates: 2006 vs. 2007
# teachers in elementary
schools: Male vs. Female
Word lengths:
King James Version vs. New
King James version
# words in a verse:
English vs. Spanish
# letters in the title in Harry
Potter books:
Book 6 vs. Book 7
Age of Target workers:
Female vs. Male
Income from arcade game:
Redemption vs.
Non-Redemption
Bagels wasted (per day):
8/2008 vs. 9/2008
Movie gross (Million $):
Highest grossing of the year
vs. Best picture of the year
Rice export (tons):
1997 vs. 1998
Daily stock prices:
High vs. Low
Radiology scan time:
CT scan vs. MRI scan
Blood glucose level:
Before treatment vs.
After treatment
# days of hospital stay for
major joint replacement:
With complications vs.
Without complications
Number of park visitors:
Regular price vs.
Discount price
Temperature in cities:
West Coast vs. East Coast
Fuel economy of car:
Weight vs. MPG.
Foreclosure price:
Santa Clara County vs.
San Benito County
Sociology
Education
# units taken:
Spring vs. Fall
Literature
Word lengths (# letters):
Broken Spears vs.
To Kill a Mockingbird
Business
Spending at checkouts:
Credit card vs. Cash
Time spent in a furniture
store:
Saturdays vs. Sundays
Medical
Cases of pneumonia:
2006 vs. 2007
Miscellaneous
Off Highway Vehicle park
visits: ATV vs. Motorcycle
NBA draft ages:
2007 vs. 2008
Crime rate:
Project 1 Example
(ver.090222)
Project 1 Example B
Did the 1st Century Doctor Use Longer Words?
Chi-yin Pang
February 22, 2009
Introduction
In the study of Koine Greek, beginners' textbooks like to use examples from fisherman John's
writing rather than Dr. Luke's writing, because John's writings are easier to read and Dr. Luke's
writings are more refined. I thought that medical school must have taught Dr. Luke to write
long, unintelligible words. John was just an uneducated fisherman before he became "St.
John"; therefore, he must have used simpler words. I will just take a sample of their writings,
count the lengths of the words, and see what the data say.
Method and Raw Data
I decided to compare two passages that describe the same event: Jesus feeding five
thousand people. Here are the comparative passages in Koine Greek (from the Byzantine/Majority
Text 2000); the numbers of letters of the first 15 words are shown as a sample of the data
columns.
Dr. Luke's Luke 9:13-17
13ειπεν δε προς αυτους δοτε αυτοις υμεις
φαγειν οι δε ειπον ουκ εισιν ημιν πλειον η
πεντε αρτοι και ιχθυες δυο ει μητι
πορευθεντες ημεις αγορασωμεν εις παντα
τον λαον τουτον βρωματα 14ησαν γαρ ωσει
ανδρες πεντακισχιλιοι ειπεν δε προς τους
μαθητας αυτου κατακλινατε αυτους
κλισιας ανα πεντηκοντα 15και εποιησαν
ουτως και ανεκλιναν απαντας 16λαβων δε
τους πεντε αρτους και τους δυο ιχθυας
αναβλεψας εις τον ουρανον ευλογησεν
αυτους και κατεκλασεν και εδιδου τοις
μαθηταις παρατιθεναι τω οχλω 17και
εφαγον και εχορτασθησαν παντες και ηρθη
το περισσευσαν αυτοις κλασματων κοφινοι
δωδεκα
Fisherman John's John 6:8-13
8λεγει αυτω εις εκ των μαθητων αυτου
ανδρεας ο αδελφος σιμωνος πετρου 9εστιν
παιδαριον εν ωδε ο εχει πεντε αρτους
κριθινους και δυο οψαρια αλλα ταυτα τι εστιν
εις τοσουτους 10ειπεν δε ο ιησους ποιησατε
τους ανθρωπους αναπεσειν ην δε χορτος
πολυς εν τω τοπω ανεπεσον ουν οι ανδρες τον
αριθμον ωσει πεντακισχιλιοι 11ελαβεν δε τους
αρτους ο ιησους και ευχαριστησας διεδωκεν
τοις μαθηταις οι δε μαθηται τοις ανακειμενοις
ομοιως και εκ των οψαριων οσον ηθελον 12ως
δε ενεπλησθησαν λεγει τοις μαθηταις αυτου
συναγαγετε τα περισσευσαντα κλασματα ινα
μη τι αποληται 13συνηγαγον ουν και εγεμισαν
δωδεκα κοφινους κλασματων εκ των πεντε
αρτων των κριθινων α επερισσευσεν τοις
βεβρωκοσιν
Word #   Luke   John
1        5      5
2        2      4
3        4      3
4        6      2
5        4      3
6        6      7
7        5      5
8        6      7
9        2      1
10       2      7
11       5      6
12       3      5
13       5      9
14       4      2
15       6      3
etc.     etc.   etc.
         n=91   n=101
Graphs of the Distributions
I entered the raw word-length counts into Statdisk: Luke 9:13-17's counts in column 1, and John
6:8-13's counts in column 2. Then I used "Boxplot" to draw the Box-and-Whisker plots.
Dr. Luke's Luke 9:13-17
Fisherman John's John 6:8-13
Conclusions
To my surprise, the Boxplots show that the word-length distributions of Luke and John are
remarkably similar. Furthermore, the Boxplots show that 25% of Luke's words are 6 to 14
letters, and 25% of John's words are 7 to 14 letters. This fact even suggests that John used
longer words more often.
What have I learned, and what would I do differently? Since Boxplots only provide the 5-number
summary and sacrifice details of the raw data, perhaps Dotplots would show me more of the
subtle differences. Nevertheless, the Boxplots were very effective in showing that Dr. Luke does
not seem to use longer words than fisherman John.
At a more fundamental level, the whole premise that "difficulties are associated with word length"
might be shaky. For example, grammatical construction might have more influence on reading
difficulty, but difficult grammatical construction might not lead to long words.
Elementary Statistics
Graphs for Categorical Data
(Ver.090118)

Example: Cause of death [BVD04, p.28]
In 1999, a sample of death-cause data shows:

Cause                   Frequency (# Cases)
Cancer                  230
Circulatory diseases    84
Heart disease           303
Respiratory diseases    79

Worksheet columns:
L1: Frequency of the category, xi
L2: Relative frequency, ri = xi / n          (L2 = L1/n)
L3: Central angle for the pie chart, θi = ri × 360°   (L3 = L2*360)

G1: Bar chart (not in [Tri08])
Decide on the scale.
Draw the bars for each category.

G2: Pareto* chart (p.59)
Sort the bars from tallest to shortest.
* Vilfredo Pareto (Italian economist, 1848-1923) observed the "Pareto
Principle" ("80-20 Rule": 80% of the wealth is owned by 20% of the people).

G3: Segmented bar chart (not in [Tri08])
Step A: Calculate the sample size, n:
Go to the list editor: [STAT]>Edit>1:Edit>[ENTER]
Clear L1: Use [^] to go up to L1>[CLEAR]>[ENTER]
Enter the frequencies into L1.
Quit the list editor: [2nd][MODE](QUIT)
Calculate n by summing L1:
[2nd][STAT](LIST)>MATH>5:SUM(>[ENTER]>[2nd][1](L1)>[ ) ]>[ENTER]
Sample size: n = Σxi = sum(L1) =
Step B: Calculate the relative frequencies:
Go to the list editor. Go up to L2.
Type L2=L1/(the value of n), then [ENTER].
Step C: Calculate the cumulative percentage. (Add from bottom to top.)
Step D: Draw the segmented bar chart.

G4: Pie chart
Step A: Calculate the central angles:
Go to the list editor. Go up to L3.
Type L3=L2*360, then [ENTER].
Step B: "Cut" the pie according to the central angles. Label each slice.
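Steps A-B and the pie-chart angles can be sketched in Python (an illustration, not part of the handout), using the cause-of-death frequencies above:

```python
# Step A: sample size n; Step B: relative frequencies r_i = x_i / n;
# Pie chart: central angles theta_i = r_i * 360 degrees
freqs = {"Cancer": 230, "Circulatory diseases": 84,
         "Heart disease": 303, "Respiratory diseases": 79}

n = sum(freqs.values())                      # n = sum of the frequencies
rel = {c: x / n for c, x in freqs.items()}   # relative frequency per category
ang = {c: r * 360 for c, r in rel.items()}   # central angle per pie slice

print(n)
print(round(rel["Heart disease"], 3))
print(round(ang["Heart disease"], 1))
```

The angles necessarily sum to 360°, which is a handy check when "cutting" the pie by hand.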
Measures of Center (Ver.080205+)

Mean (Average)
What? The arithmetic average; the "center of gravity" of the data.
How? Add the n values and divide by n: (x1 + x2 + … + xn) / n.
What for? When? Used for most measured data: height, weight, duration,
grade point average, age, …
Characteristics: Sensitive to every sample in the dataset. Outliers would
pull it way off.

Median (50th Percentile)
What? The value where 50% of the samples are above and 50% are below.
How? 1. Line up all the values from the smallest to the largest.
2. Count from both ends going towards the middle.
3. If there is one middle point, that's the median.
If there are 2 middle points, add them up and divide by 2.
What for? When? Real estate: "median house price." Academic achievement
(50th percentile).
Characteristics: Insensitive to outliers.

Mode
What? The category (or the number) that occurs the most often.
How? Find the category (or the number) that occurs the most often.
What for? When? Elections: the candidate who gets the most votes wins.
Characteristics: The only "center" that works for categorical data.

Midrange
What? The midpoint between the minimum and the maximum.
How? (Minimum + Maximum) / 2.
What for? When? Baby weight, baby length. (Rarely used.)
Characteristics: Only depends on the minimum and the maximum values.

(c) Chi-yin Pang, 2008
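The four measures of center can be sketched with Python's statistics module. The data set here is illustrative (it is not from the handout); it is chosen so the four measures come out different.

```python
import statistics

# Illustrative data set (not from the handout)
data = [2, 3, 3, 5, 7, 10]

mean     = statistics.mean(data)        # (2+3+3+5+7+10)/6 = 5
median   = statistics.median(data)      # two middle points: (3+5)/2 = 4
mode     = statistics.mode(data)        # the value that occurs most often: 3
midrange = (min(data) + max(data)) / 2  # (2+10)/2 = 6

print(mean, median, mode, midrange)
```

Four different answers from the same data: which "center" to report depends on the question, as the table above explains.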
Elementary Statistics
Standard Deviation
(v.090728)

Standard Deviation Exercise
You will get a problem like this in a quiz and probably in the next test. (During the test you will
be given just an EMPTY table grid, and you have to supply all the headings and all the formulae.)

Sample Mean:                x̄ = (Σx)/n = (x1 + ... + xn)/n
Sample Variance:            v = s² = Σ(x − x̄)²/(n − 1)
                                    = [(x1 − x̄)² + ... + (xn − x̄)²]/(n − 1)
Sample Standard Deviation:  s = √(s²) = √( Σ(x − x̄)²/(n − 1) )

      x       x − x̄            (x − x̄)²
      L1      L2 = L1 − ___    L3 = L2^2
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15

Σx = sum(L1) =                 Σ(x − x̄)² = sum(L3) =
x̄ = (Σx)/n =                   s² = Σ(x − x̄)²/(n − 1) =

Computed sample standard deviation: s = √(s²) = √( Σ(x − x̄)²/(n − 1) ) =

Compute the sample standard deviation again
using "1-Var Stats" ([STAT]>CALC>1:1-Var Stats): "Sx" =
Explain the difference, if any, of the results:
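The worksheet columns above (L1 = x, L2 = x − x̄, L3 = (x − x̄)²) map directly onto a few lines of Python. The worksheet's data column is left blank for the student, so the numbers here are illustrative only.

```python
import math

# L1 column: the x values (illustrative data, not from the worksheet)
L1 = [4, 8, 6, 5, 2]
n = len(L1)

xbar = sum(L1) / n               # sample mean: xbar = (sum of x) / n
L2 = [x - xbar for x in L1]      # x - xbar column
L3 = [d ** 2 for d in L2]        # (x - xbar)^2 column (L3 = L2^2)

s2 = sum(L3) / (n - 1)           # sample variance s^2: sum(L3) / (n - 1)
s = math.sqrt(s2)                # sample standard deviation s

print(xbar, s2, round(s, 4))
```

The result should agree with the TI's 1-Var Stats "Sx" for the same data, since Sx also divides by n − 1.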
Mean and Standard Deviation for Data from a Frequency Distribution

Sometimes data is not presented as one list of numerical values, but as a list of values and the
number of times each was observed. E.g., for [Tri08, p.108, #27], a sample of fifty speeding
tickets gives the following set of data. (The data record the speeds of drivers traveling through
a 30 mph speed zone in the town of Poughkeepsie, New York.)

Speed (mph)   Frequency
42-45         25
46-49         14
50-53         7
54-57         3
58-61         1

[Tri08, pp.83 and 108] has formulae to compute the sample mean and standard deviation "by
hand":

x̄ = Σ(f·x) / Σf

s = √( [ n·Σ(f·x²) − (Σ(f·x))² ] / ( n(n − 1) ) )

Isn't it cool?

Long way to compute the Mean and Standard Deviation with the TI:
We can calculate the mean and standard deviation with the TI's 1-Var Stats by entering data
into L1 as follows:
enter 43.5 (the midpoint between 42 and 45) 25 times,
enter 47.5 (the midpoint between 46 and 49) 14 times,
enter 51.5 (the midpoint between 50 and 53) 7 times,
enter 55.5 (the midpoint between 54 and 57) 3 times,
enter 59.5 (the midpoint between 58 and 61) 1 time.
You probably have something better to do with your time than playing this video game.

Nifty way to compute the Mean and Standard Deviation with the TI:
Enter the data into L1, L2 as follows:

Speed (mph)   Mid-Point of the Speed (L1)   Frequency (L2)
42-45         ____                          25
46-49         ____                          14
50-53         ____                          7
54-57         ____                          3
58-61         ____                          1

Now use the TI's 1-Var Stats:
[STAT]>CALC>1:1-Var Stats>1-Var Stats L1, L2
For L1, L2, type "[2nd][1][,][2nd][2]".
Mean =
Sample Standard Deviation =
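The "nifty way" above can be sketched in Python using the class midpoints and the grouped-data formulas from [Tri08, pp.83, 108]:

```python
import math

# L1: class midpoints, L2: frequencies, from the speeding-ticket table
mid = [43.5, 47.5, 51.5, 55.5, 59.5]
f   = [25, 14, 7, 3, 1]

n = sum(f)                                           # 50 tickets in total
sum_fx  = sum(fi * xi for fi, xi in zip(f, mid))     # sum of f*x
sum_fx2 = sum(fi * xi ** 2 for fi, xi in zip(f, mid))  # sum of f*x^2

mean = sum_fx / n                                    # xbar = sum(f*x)/sum(f)
s = math.sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))

print(round(mean, 2), round(s, 2))
```

These are estimates: each ticket is treated as if it sat exactly at its class midpoint, which is why the textbook calls them grouped-data formulas.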
Elementary Statistics, Probability Contingency Table
(ver.090222)

Probability Contingency Table Exercise

Given the frequency contingency table of Titanic passenger survival data:

          1st Class  2nd Class  3rd Class  Crew   (total)
Alive        202        118        178      212      710
Dead         123        167        528      673     1491
(total)      325        285        706      885     2201

1. Fill in the rest of the following probability contingency table.
We can compute the probability contingency table by computing the relative
frequency for each cell, with x/n = x/GrandTotal = x/2201.

          1st Class        2nd Class  3rd Class  Crew  (total)
Alive     202/2201 ≈ .0918
Dead
(total)   325/2201 ≈ .1477

2. Conditional Probability: Often we talk about probabilities under a certain
condition. For example, "the probability of a passenger being alive given (the
condition) that the passenger is a 1st class passenger." For short, we write
P(Alive given 1st Class) or P(Alive | 1st Class).
The probability is computed as follows:
P(Alive | 1st Cls) = (# Alives among 1st Cls)/(total 1st Class) = 202/325
Draw lines to link the numbers to the locations of the contingency table.

3. Compute "the probability of a passenger being dead given that the passenger
is a 1st class passenger" and draw lines to link the numbers to the locations
of the contingency table.
P(Dead | 1st Class) = (# of 1st Class Dead)/(total 1st Class) =

4. P(Dead | Crew) = (# of Crew Dead)/(total Crew) =

5. P(Alive | Crew) =

6. P(Dead | 2nd Class) =

7. P(Alive | 3rd Class) =

8. P(1st Class | Alive) = (# 1st Cla among Alives)/(total Alives) = 202/710

9. P(3rd Cla | Alive) = (# 3rd Cla among Alives)/(total Alives) =

Page 1 of 3
Elementary Statistics, Probability Contingency Table
(ver.090222)

(Use the Titanic frequency contingency table from page 1.)

10. P(3rd Cla | Dead) = (# 3rd Cla among Deads)/(total Deads) =

11. P(3rd Class) = (# 3rd Class Passengers)/(total Passengers) = 706/2201

12. P(2nd Class) = ___/(total Passengers) =

13. P(Crew) =

14. P(Alive) =

15. P(Dead) =

16. P(not Dead) =

17. P(not 1st Class) =

18. To compute the "probability of a random passenger who is 2nd Class AND who
survived (alive)," we use the number of the cell that satisfies both criteria.
The important word AND usually makes the number smaller.
P(2nd Class AND Alive) = 118/2201

19. To compute the "probability of a random passenger who is 2nd Class OR who
died (dead)," we use the numbers of the cells that satisfy either criterion.
The important word OR usually makes the number bigger.
P(Dead OR 2nd Class) = (123 + 167 + 528 + 673 + 118)/2201 = ___/2201

20. P(2nd Class OR Alive) = ___ =

21. P(Crew AND Alive) = ___ =

22. P(1st Class OR 2nd Class) = ___ =

Page 2 of 3
Elementary Statistics, Probability Contingency Table
(ver.090222)

(Use the Titanic frequency contingency table from page 1.)

23. P(1st Class AND 2nd Class) =

24. P(1st Class OR 2nd Class OR 3rd Class) = ___ =

25. Here is an easier way to compute P(1st Class OR 2nd Class OR 3rd Class),
using the Law of Complement:
P(not A) = 1 − P(A), or P(B) = 1 − P(not B)
P(1st Class OR 2nd Class OR 3rd Class) = P(not Crew) = 1 − P(Crew) = 1 − ___ =

26. P(Crew OR 3rd Class OR 2nd Class) = P(not ___) = 1 − P(___) = 1 − ___ =

27. Two events are Mutually Exclusive if they cannot both happen at the same
time. Here is an example: What is the "probability that a passenger is
1st Class AND also 3rd Class"?
P(1st Class AND 3rd Class) =
Fact: If A and B are mutually exclusive, then P(A AND B) = 0. (Meaning they
can never happen at the same time.)

Independence: Events A and B are independent if the probability of A does not
depend on whether B happens or not. That is,
P(A given B) = P(A) = P(A given (not B)), i.e.,
P(A | B) = P(A) = P(A | (not B))

Dependence: Events A and B are dependent if they are not independent.

28. Are "Alive" and being "1st Class" dependent or independent?
P(Alive | 1st Class) = ___ =
P(Alive) = ___ =
P(Alive | not 1st Class) = ___ =
Conclusion: ________________________

29. Are "Dead" and being "3rd Class" dependent or independent?
P(___ | ___) = ___ =
P(___) = ___ =
P(___ | ___) = ___ =
Conclusion: ________________________

30. Write a sociological conclusion about your conclusions of #28 and #29.

Page 3 of 3
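The worksheet's conditional probabilities can also be checked with a short Python sketch (an illustration using the Titanic table above; the dictionary layout is just one possible encoding):

```python
# Titanic frequency contingency table from the worksheet.
table = {
    "Alive": {"1st": 202, "2nd": 118, "3rd": 178, "Crew": 212},
    "Dead":  {"1st": 123, "2nd": 167, "3rd": 528, "Crew": 673},
}

col_total = {c: table["Alive"][c] + table["Dead"][c] for c in table["Alive"]}
row_total = {r: sum(table[r].values()) for r in table}
grand = sum(row_total.values())                   # 2201 passengers

# Conditional probability P(outcome | class) = cell / column total.
def p_given_class(outcome, cls):
    return table[outcome][cls] / col_total[cls]

p_alive = row_total["Alive"] / grand              # unconditional P(Alive)
p_alive_1st = p_given_class("Alive", "1st")       # P(Alive | 1st Class)

# If "Alive" were independent of class, these two numbers would match.
print(round(p_alive, 3), round(p_alive_1st, 3))
```

The two probabilities differ sharply, which is the worksheet's dependency conclusion in #28.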
Elementary Statistics
Contingency Table: Conditional Probability Stacked Bar Graph Worksheet
(Ver.090226)
STEP 1: Form a question of interest.
Does ____________ depend on ____________?

STEP 2: Enter the variable names and the observed frequencies.
Response Variable (outcomes): the outcomes you are curious about.
Explanatory Variable (conditions, possible explanation for different
outcomes): the conditions that are easier to assess.

                  Explanatory Variable
                  E =         F =         G =         H =
Response   R =    Freq =      Freq =      Freq =      Freq =
Variable          P(R | E)=   P(R | F)=   P(R | G)=   P(R | H)=
           S =    Freq =      Freq =      Freq =      Freq =
                  P(S | E)=   P(S | F)=   P(S | G)=   P(S | H)=
           T =    Freq =      Freq =      Freq =      Freq =
                  P(T | E)=   P(T | F)=   P(T | G)=   P(T | H)=
           U =    Freq =      Freq =      Freq =      Freq =
                  P(U | E)=   P(U | F)=   P(U | G)=   P(U | H)=
           Total  Σ Freq =    Σ Freq =    Σ Freq =    Σ Freq =

STEP 3: Compute the totals & the conditional probabilities, P(R | E), etc.

STEP 4: Construct the stacked bar graphs with the conditional probabilities.

STEP 5: Make a conclusion. (The response and explanatory variables are
dependent if and only if the graphs are uneven.)
©2009 by Chi-yin Pang
Page 1 of 1
Elementary Statistics
Binomial Distribution Motivation Example
(v.090702++)
Binomial Distribution Motivation Example

In 1972, Rodrigo Partida was convicted of burglary with the intent to commit
rape. The trial was held in Texas' Hidalgo County, which had 181,525 people
eligible for jury duty, and 80% of them were Mexican-American. There were 12
jurors, and 7 of them were Mexican-American. (See more detail in [Tri08, p.193].)

Question: Based on 7 Mexican-Americans out of 12 jurors, can we conclude that
the jury selection process discriminated against Mexican-Americans?

A Simple-Minded Assessment: Compute the ratio of Mexican-Americans (MAs)
among the jurors.

Ratio of MAs in the jury = #MAs / #People in jury = ___ = ___ %

Ratio of MAs in the population = ___

Your conclusion:

Rephrase the question (Question 2): Is 7 MAs out of 12 jurors unusually low,
if the jury selection process was random?

More rephrase (Question 3): Is 7 MAs more than 2 standard deviations below
the expected value of MAs selected?

More rephrase (Question 4): Is the z-score, z = (x − μ)/σ, of 7 MAs
less than −2?

Are Questions 2, 3, 4 and the original question equivalent?

Tools for answering Question 4: The basic information is
n = Number of random picks (sample size) = Number of jurors =
p = Probability of randomly picking a MA =
q = Probability of randomly picking a non-MA = 1 − p =

To compute the z-score of 7 MAs, we need the mean and the standard deviation.
The mean, or the expected value, is just 80% of the 12 jurors.
μ = np =

By "magic" that we will explain later, the standard deviation is:
σ = √(npq) =

Solution to Question 4: Since we are interested in the z-score of 7 MAs,
x =
z = (x − μ)/σ     (Formula)
  =               (Plug in numbers)
  =               (Final answer)

Answer Question 4:

Answer the original question:

© 2009 Chi-yin Pang
Page 1 of 2
Elementary Statistics
Binomial Distribution Motivation Example
(v.090702++)
How come σ = √(npq)?

We will not prove this mathematically, but we will demonstrate it by using
the TI calculator:
• Compute
  the probability of getting exactly 0 MAs,
  the probability of getting exactly 1 MA,
  …,
  the probability of getting exactly 12 MAs.
• Compute the population mean and standard deviation, μ and σ, with
  "1-Var Stats L1, L2".
• Then compare to the previous answers computed by the formulae.

Computing the probabilities of 0 MAs, 1 MA, 2 MAs, etc.:

Let X = the number of MAs selected. Then we are interested in computing the
following probabilities:
P(X = 0), P(X = 1), ..., P(X = 12)

Again, by "magic" that we will explain later, for any given x,
The prob. of getting exactly x MAs
= P(X = x) = [n!/(x!(n − x)!)] p^x q^(n−x)
= binompdf(n, p, x)
where binompdf(n,p,x) is the TI function "Binomial probability density
function". The key sequence is
[2nd][VARS]>DISTR>0:binompdf(
For example, for x = 10:
binompdf(12,0.8,10) =

Next, we set up the lists for the distribution of "how many MAs are selected":

Step 1: "Store" all 13 probability values into L2. Use the binompdf(12,0.8)
function without the "x" parameter. The "arrow" is the [STO►] key.

Step 2: Type 0, 1, 2, …, 12 into L1, to complete the description of the
discrete probability table.

Step 3: Use 1-Var Stats L1, L2 to compute the population mean and standard
deviation, μ and σ.
μ = x̄ =
σ = σx =

Step 4: Compare to the results given by the formulae.
μ = np =
σ = √(npq) =

© 2009 Chi-yin Pang
Page 2 of 2
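The TI demonstration above can be mirrored in Python (a sketch; `binompdf` here is a hand-rolled stand-in for the TI function, built with `math.comb`):

```python
import math

n, p = 12, 0.8          # jurors, probability of picking a Mexican-American
q = 1 - p

# Stand-in for the TI's binompdf(n, p, x): the binomial probability of
# getting exactly x successes.
def binompdf(n, p, x):
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

xs = list(range(n + 1))                  # 0, 1, ..., 12 (TI list L1)
probs = [binompdf(n, p, x) for x in xs]  # the 13 probabilities (TI list L2)

# Mean and std. dev. of the distribution, as 1-Var Stats L1, L2 computes them.
mu = sum(x * pr for x, pr in zip(xs, probs))
sigma = math.sqrt(sum((x - mu) ** 2 * pr for x, pr in zip(xs, probs)))

# Compare with the shortcut formulae mu = np and sigma = sqrt(npq).
print(round(mu, 6), round(n * p, 6))
print(round(sigma, 6), round(math.sqrt(n * p * q), 6))
```

The two pairs of numbers agree, which is exactly the "magic" the worksheet demonstrates on the TI.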
San Jose City College
Math 63, Elementary Statistics
NAME:_______________________
Binomial Probability Computation (v.090721)
Binomial Probability Computation
Homework
These exercises cover most of the different situations for binomial
probabilities. These are standard tricks, and some of them will appear in the
test. They use the formula for the complementary event, P(not A) = 1 − P(A),
over and over again. These tricks will be used again with the normal
distribution.

The following are all the formulae that you need:

P(successes = x)
= P(exactly x successes)
= [n!/(x!(n − x)!)] p^x q^(n−x) = nCx p^x q^(n−x)
= TI's binompdf(n, p, x)

P(successes ≤ x)
= P(at most x successes)
= P(0) + ... + P(x) = TI's binomcdf(n, p, x)

(Get to binompdf() or binomcdf() by [2nd][VARS](DISTR)>DISTR> …)

--------------------------------------------------
Assume n = 15, p = 0.1.

1. What is q?

2. Compute P(exactly 3 successes) "by hand".
Hint: use TI's 15C3*.1^3*.9^12. (For nCx, type n first, then
[MATH]>PRB>3:nCr.)

3. Compute P(exactly 10 successes) "by hand".

4. Compute P(no success) "by hand".

From here on, just use TI's binompdf() and binomcdf(). (1) Write the TI
function you use and then (2) the numerical answer.

5. P(exactly 3 successes)
Hint: binompdf(15, 0.1, 3).

6. P(exactly 10 successes)

7. P(no success)

8. P(5) (This means P(x=5).)

9. P(at most 3 successes)
Hint: binomcdf(15, 0.1, 3).

10. P(at most 7 successes)

11. P(at least 13 successes)
Hint: 1 − P(successes ≤ 12)

© 2009 Chi-yin Pang
Page 1 of 2
San Jose City College
Math 63, Elementary Statistics
NAME:_______________________
Binomial Probability Computation (v.090721)
12. P(at least 5 successes)

13. P(at least 10 successes)

14. P(not all are successful)
Hint: use either
1 − P(successes = 15) = 1 − binompdf(15, 0.1, 15)
or P(successes ≤ 14) = binomcdf(15, 0.1, 14)

15. P(all failed)

16. P(not all failed)

17. P(at least one failure)

18. Texas' Hidalgo County has 80% Mexican-Americans (MAs). If the jury
selection of 12 jurors is random, what's the probability of having 7 or
fewer MA jurors?
p = 80% = 0.8;  n = 12;  x = 7

19. (Cont.) What is the probability of having 6 or fewer MA jurors?

20. (Cont.) What is the probability of having 8 or fewer MA jurors?

21. (Cont.) You can easily get the whole list of
P(MAs ≤ 0), P(MAs ≤ 1), ..., P(MAs ≤ 12)
with one TI formula:
binomcdf(12,0.8) ►L1
where "►" is the [STO►] (store) key.
About how many MA jurors is the cutoff point of being "unusually low"?

22. (Cont.) What is the probability of having 10 or more MA jurors?

23. (Cont.) What is the probability of having 11 or more MA jurors?

24. (Cont.) What is the probability of having all MA jurors?

25. Gadgetco sells a Gizzmo that has a defect rate of 1 in a thousand. Last
year, they sold 3,800 Gizzmos. What is the expected number of returned
Gizzmos?

26. (Cont.) What is the probability of 0 Gizzmos returned? Is this unusual?

27. (Cont.) What is the probability of at most 4 Gizzmos returned? Is this
unusual?

© 2009 Chi-yin Pang
Page 2 of 2
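For checking answers away from the calculator, the TI's binompdf()/binomcdf() can be imitated in Python (a sketch; the function names simply echo the TI's, and `math.comb` assumes Python 3.8+):

```python
import math

# Stand-in for the TI's binompdf(n, p, x): P(exactly x successes).
def binompdf(n, p, x):
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

# Stand-in for the TI's binomcdf(n, p, x): P(at most x successes).
def binomcdf(n, p, x):
    return sum(binompdf(n, p, k) for k in range(x + 1))

# Exercise 18: P(7 or fewer MA jurors), with n = 12, p = 0.8.
p_at_most_7 = binomcdf(12, 0.8, 7)

# Exercise 11's complement trick, with n = 15, p = 0.1:
# P(at least 13 successes) = 1 - P(successes <= 12).
p_at_least_13 = 1 - binomcdf(15, 0.1, 12)

print(round(p_at_most_7, 4))
```

Since P(7 or fewer MAs) comes out well below 0.05, 7 MA jurors would be an unusually low count under random selection.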
Elementary Statistics
NAME:_______________________
Binomial Applications (v.090721)
Binomial Applications Homework

Each row gives: Description of Situation | Crux | Question | Solution.

(1) At the post office closest to Graceland, one in ten letters that arrive
are addressed to Elvis. In a random sample of seven letters arriving at this
post office, … [FrMa97p.83]
  Crux: n = ; p = ; q = ; Success = ; X =
  … what is the probability that exactly three are addressed to Elvis? x =
  P(X ___) = binom____( ___ , ___ , ___ ) =

(2) Would most wives marry the same man again if given the chance? According
to a poll of 608 married women conducted by Ladies Home Journal (June 1988),
80% would, in fact, marry their current husbands. If you randomly sample 20
people, … [FrMa97p.87]
  Crux: n = ; p = ; q = ; Success = ; X =
  What is the probability that exactly 60% of them would marry their current
  husband? x =
  P(X ___) = binom____( ___ , ___ , ___ ) =
  What is the probability that at most 60% of them would marry their current
  husband? x =
  P(X ___) = binom____( ___ , ___ , ___ ) =
  What is the probability that at least 60% of them would marry their current
  husband? x =
  P(X ___) = binom____( ___ , ___ , ___ ) =

(3) Zoologists have discovered that animals spend a great deal of time
resting. For example, the probability that a female fence lizard will be
resting at any given time is 0.97. [FrMa97p.87]
  Crux: n = ; p = ; q = ; Success = ; X =
  In a random sample of 20 female fence lizards, what is the probability that
  at least 15 will be resting? x =
  P(X ___) = binom____( ___ , ___ , ___ ) =
  In a random sample of 20 female fence lizards, what is the probability that
  fewer than 10 will be resting? x =
  P(X ___) = binom____( ___ , ___ , ___ ) =
  In a random sample of 200 female fence lizards, would you expect to observe
  fewer than 190 at rest? x =
  P(X ___) = binom____( ___ , ___ , ___ ) =

(4) A factory manufactures ball bearings. Each production day the quality
control department randomly selects 10 ball bearings and checks them for
defects. Suppose that last Friday the machines were not calibrated correctly
and consequently 60% of that day's production of ball bearings were
defective. [FrMa97p.83]
  Crux: n = ; p = ; q = ; Success = ; X =
  What is the probability that last Friday's production will be shipped?
  NEED MORE INFO. x =
  P(X ___) = binom____( ___ , ___ , ___ ) =

Page 1 of 2
Elementary Statistics
NAME:_______________________
Binomial Applications (v.090721)
n=
(5) An airline, believing that 5% of passengers fails
to show up for flights, overbooks (sells more tickets
p=
than there are seats). Suppose a plane will hold 265
passengers, and the airline sells 275 seats. [BVD07.400.26] q =
Success =
What is the probability the
airline will not have enough
seats so someone gets
bumped?
P( X
)
= binom ____(
=
,
,
)
What is the probability one of
these classes would not have
enough lefty arm tablet?
P( X
)
= binom ____(
=
,
,
)
Should he suspect he was
misled about the true success
rate? (Hint: Compute the
probability of making up to 10
sales.)
P( X
)
= binom ____(
=
,
,
)
What is the probability that he
will answer at least 80% of
the questions correctly and
get at least a B grade on the
quiz?
P( X
)
= binom ____(
=
,
,
)
x=
X=
(6) A lecture hall has 200 seats with folding arm
tablets, 30 of which are designed for left-handers.
The average size of classes that meet there is 188,
and we can assume that about 13% of students are
left-handed. [BVD07.400.25]
n=
p=
q=
x=
Success =
X=
(7) A newly hired telemarketer is told he will probably n =
make a sale on about 12% of his phone calls. The
p=
first week he called 200 people, but only made 10
sales. [BVD07.400.27]
q=
Success =
X=
(8) A student who did not study for 20-question
true/false quiz in his biology class must randomly
guess the answer to each question. [FrMa97p.83]
Success =
X=
x=
n=
p=
q=
x=
Hints for selected problems:
(1)
n = 7, p = 0.1, q = 0.9, P( x = 3) = binomPDF (7, 0.1, 3)
n = 20, p = 0.8, q = 0.2
(2a) P ( x = n × 60%) = P ( x = 12) = binomPDF ( 20, 0.8, 12)
(2b) P ( x ≤ 12) = binomCDF ( 20, 0.8, 12)
(2c) 1 − binomCDF ( 20, 0.8, 11)
(2)
(3a)
(3b)
(3c)
P( x ≥ 15) = 1 − binomCDF (20, 0.97, 14)
P( x < 10) = P( x ≤ 9) = binomCDF (20, 0.97, 9)
P( x < 190) = P( x ≤ 189) = binomCDF (200, 0.97, 189)
(4) For example, you may want to assume that the acceptance criterion is “no defect among any of the 10 samples.” In this case, you would compute P(x=0)
=…
(8)
n = 20, p = 0.5, q = 0.5, P( x ≥ n × 80%) = P( x ≥ 16) = 1 − binomCDF (20, 0.5, 15)
Page 2 of 2
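A few of the hints above can be evaluated with a small Python stand-in for binomPDF (an illustration only; the function name echoes the TI's):

```python
import math

def binompdf(n, p, x):
    # P(exactly x successes) for a binomial(n, p) variable.
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

# Hint (1): letters to Elvis, n = 7, p = 0.1, P(x = 3).
elvis = binompdf(7, 0.1, 3)

# Hint (2a): n = 20, p = 0.8, P(x = 12) (exactly 60% of the 20 sampled).
marry = binompdf(20, 0.8, 12)

print(round(elvis, 4), round(marry, 4))
```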
Elementary Statistics
NAME:_______________________
Mean & Std. Dev. of Binomial Dist. HW (v.090305)
Mean & Standard Deviation of Binomial Distribution
These exercises drill you on: (1) reading the word problem, and
(2) computing the mean, standard deviation, & usual range of the binomial distribution.
Ref. (Triola 2008)

Basic Info: For the distribution of the number of successes:
Mean = μ = np
Std. Dev. = σ = √(npq)
Range Rule of Thumb for the usual number of successes:
Maximum usual value: μ + 2σ
Minimum usual value: μ − 2σ

p.222 #12a,b (worked example):
Success = "the candy is yellow"; X = "# of yellow candies"
n = 100; p = 0.14; q = 1 − p = 1 − 0.14 = 0.86
μ = np = (100)(.14) = 14
σ = √(npq) = √(100(.14)(.86)) ≈ 3.47
Max usual = μ + 2σ = 14 + 2 × 3.47 = 20.94
Min usual = μ − 2σ = 14 − 2 × 3.47 = 7.06
8 is within the usual range, therefore not unusual. Therefore, the result of
8 does not cast doubt on the claim of 14% yellow.

p.222 #11ab:
Success =        n =        p =        q =
X =              μ =        σ =
Max usual =      Min usual =

p.222 #13ab:
Success =        n =        p =        q =
X =              μ =        σ =
Max usual =      Min usual =

Page 1 of 2
Elementary Statistics
NAME:_______________________
Mean & Std. Dev. of Binomial Dist. HW (v.090305)

p.223 #15ab:
Success =        n =        p =        q =
X =              μ =        σ =
Max usual =      Min usual =

p.223 #17ab:
Success =        n =        p =        q =
X =              μ =        σ =
Max usual =      Min usual =

p.223 #18ab:
Success =        n =        p =        q =
X =              μ =        σ =
Max usual =      Min usual =

p.224 #20ab:
Success =        n =        p =        q =
X =              μ =        σ =
Max usual =      Min usual =

Page 2 of 2
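The worked M&M example (p.222 #12a,b) can be replayed in Python to see the Range Rule of Thumb in action (a sketch mirroring the numbers above):

```python
import math

# Range Rule of Thumb for a binomial count, using the worked example:
# n = 100 candies, p = 0.14 (claimed proportion of yellow).
n, p = 100, 0.14
q = 1 - p

mu = n * p                      # expected number of successes
sigma = math.sqrt(n * p * q)    # standard deviation

max_usual = mu + 2 * sigma
min_usual = mu - 2 * sigma

# A count of 8 yellow candies falls inside [min_usual, max_usual],
# so it is not unusual and does not cast doubt on the 14% claim.
unusual = not (min_usual <= 8 <= max_usual)
print(round(min_usual, 2), round(max_usual, 2), unusual)
```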
Height Distribution (figure): four normal curves plotted on one axis of
height in cm: 2-yr girls ~ N(85.1, 4.5), 2-yr boys ~ N(86.7, 4.6),
20-yr women ~ N(163.3, 8.3), 20-yr men ~ N(176.9, 9.2).
Elementary Statistics
NAME:_______________________
Standard Normal’s CDF Homework (v.090305)
Standard Normal’s CDF Homework
The Standard Normal Distribution has mean = 0 and
standard deviation = 1 ( μ = 0, σ = 1 ).
Notations:
• It is sometimes denoted by N (0,1) .
( N ( μ , σ ) denotes the normal distribution with
mean, μ, and standard deviation, σ.)
• N (0,1) is also called the Z-distribution.
• The values on the horizontal axis are called
z-scores.
The Z-distribution is our most beloved ♥ distribution
that helps us to do most of our statistical inferences.
All the following problems are for the Z-distribution.

(1) Z means how many std. dev. in relation to the mean.

z-score     English description
Z = 1       One std. dev. ABOVE the mean.
Z = 2       ______________________________
Z = 1.6     ______________________________
Z = ___     Three std. dev. above the mean.
Z = –3      Three std. dev. BELOW the mean.
Z = –0.4    ______________________________
Z = ___     One std. dev. below the mean.
Z = 0       The mean. (Not above and not below.)

The area under the standard normal curve is exactly 1, which corresponds to
the total probability being exactly 1.

(For students who know calculus: The standard normal Probability Density
Function (PDF) is f(z) = (1/√(2π)) e^(−z²/2). The Cumulative Distribution
Function (CDF) is CDF(z₀) = ∫ from z = −∞ to z = z₀ of f(z) dz. The area
under the entire curve is CDF(∞) = 1.)

(2) Label Z = 0, Z = 1, Z = 2, Z = 3 on the following Z-dist.

(3) Label Z = –1, Z = –2, Z = –3 on the following Z-dist.

(4) Label Z = –2.5, Z = 0.3, Z = 1.6 on the following Z-dist.

(5a) Shade the area under the curve between z1 = 0.6 and z2 = 2.0.

(5b) Find the shaded area, P(0.6 ≤ Z ≤ 2.0), by
[2nd][VARS](DISTR) > DISTR > 2:normalcdf(
P(0.6 ≤ Z ≤ 2.0) = normalcdf(0.6, 2) =
NOTE: We will never use "normalPdf(z)".

(5c) What percentage (%) of the population is between 0.6 and 2 standard
deviations above the mean?
Page 1 of 3
Elementary Statistics
NAME:_______________________
Standard Normal’s CDF Homework (v.090305)
(6a) Shade the area under the curve between z1 = –1 and z2 = 1.

(6b) Find the shaded area.
P( ___ ≤ Z ≤ ___ ) = normal___( ___ , ___ ) =

(7a) Shade the area under the curve between z1 = –2 and z2 = 2.

(7b) Find the above shaded area by
P( ___ ≤ Z ≤ ___ ) = normal___( ___ , ___ ) =

(7c) The "Range Rule of Thumb" (p.98) says that the "usual values" are
between 2 standard deviations below and above the mean. What percentage of
the population is considered "usual"?

(8a) Shade the area under the curve between z1 = –3 and z2 = 3.

(8b) Find the above shaded area by
P( ___ ≤ Z ≤ ___ ) = normal___( ___ , ___ ) =

(8c) Do the results from (6b, 7b, 8b) agree with the "68-95-99.7%" Rule
(see p.100)?

(8d) What percentage of the population is more than 3 standard deviations
from the mean?
P( not (−3 ≤ Z ≤ 3) ) = 1 − P(−3 ≤ Z ≤ 3) =

For the "left tail" area, use z1 = –∞.
For the "right tail" area, use z2 = ∞.
TI's biggest number is E99 = 10^99. Use E99 as ∞.
The key sequence is [2nd][ , ](EE) > 99.

(9a) Shade the area to the LEFT of z2 = 0.

(9b) Find the above shaded area by
P( Z ≤ ___ ) = normal___( ___ , ___ ) =

(10a) Shade the area to the LEFT of z1 = –2.

(10b) Find the above shaded area by
P( Z ≤ ___ ) = normal___( ___ , ___ ) =

(11a) Shade the area to the RIGHT of z1 = –2.

(11b) Find the above shaded area by
P( ___ ≤ Z ) = normal___( ___ , ___ ) =

(12a) Compute the sum of the answers from #10b and #11b above. That is the
portion of the population that is more than 2 standard deviations from the
mean.

(12b) What is the percentage?
Page 2 of 3
Elementary Statistics
NAME:_______________________
Standard Normal’s CDF Homework (v.090305)
(13) Compute the following:
P( Z ≤ −1.6 ) = normal___( ___ , ___ ) =
P( 1.6 ≤ Z ) = normal___( ___ , ___ ) =
P( Z ≤ 1.6 ) = normal___( ___ , ___ ) =
P( −1.6 ≤ Z ) = normal___( ___ , ___ ) =
Notice the similarities and differences of the above results.

(14) Compute the following:
P(−1 ≤ Z ≤ 2.5) = normal___( ___ , ___ ) =
P(−2.5 ≤ Z ≤ 1) = normal___( ___ , ___ ) =
P(1 ≤ Z ≤ 2.5) = normal___( ___ , ___ ) =
P(−2.5 ≤ Z ≤ −1) = normal___( ___ , ___ ) =
Notice the similarities and differences of the above results.

(15) Compute the following:
P(0 ≤ Z ≤ 0.3) = normal___( ___ , ___ ) =
P(−0.3 ≤ Z ≤ 0) = normal___( ___ , ___ ) =
P(−0.3 ≤ Z ≤ 0.3) = normal___( ___ , ___ ) =
Notice the similarities and differences of the above results.

NOTE: For the probability of both tails (both the left tail and the right
tail) you can add the areas of the two tails, or simply multiply the area of
one tail by 2. CAUTION: Do this only if the tails do not overlap; otherwise
you will get a nonsense probability greater than 1!

(16a) Shade both the area of the LEFT tail of z1 = –2 and the RIGHT tail of
z2 = 2.

(16b) Find the following probabilities:
Left tail: P( Z ≤ ___ ) = normal___( ___ , ___ ) =
Right tail: P( ___ ≤ Z ) = normal___( ___ , ___ ) =

(16c) The tails do not overlap. Compute the sum of the tails. (CAUTION: We
can only do this addition if the tails do not overlap.)
P( (Z ≤ −2) or (2 ≤ Z) )
= P( Z ≤ −2 ) + P( 2 ≤ Z )
=
Page 3 of 3
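Away from the TI, normalcdf can be imitated with Python's `math.erf` (a sketch; the 1e99 constant echoes the TI's E99 stand-in for infinity):

```python
import math

# A plain-Python stand-in for the TI's normalcdf(z1, z2) on the
# standard normal distribution, using the error function.
def normalcdf(z1, z2):
    Phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return Phi(z2) - Phi(z1)

INF = 1e99   # the TI uses E99 in place of infinity

middle_1sd = normalcdf(-1, 1)     # problem 6b
middle_2sd = normalcdf(-2, 2)     # problem 7b
middle_3sd = normalcdf(-3, 3)     # problem 8b
left_tail = normalcdf(-INF, -2)   # problem 10b

print(round(middle_1sd, 4), round(middle_2sd, 4), round(middle_3sd, 4))
```

The three middle areas match the 68-95-99.7% Rule asked about in (8c).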
Elementary Statistics
NAME:_______________________
Inverse of Normal CDF Homework (v.090305)
Inverse of Normal CDF Homework
TI's normalcdf(*,*) function goes
  from given z-scores
  to probability.
I.e., it takes two z-scores as inputs, z1 < z2, and it outputs the
probability that the random variable occurs between those z-scores.
normalcdf(z1, z2) outputs p = P(z1 ≤ Z ≤ z2).

TI's invNorm(*) does the "opposite." It goes
  from a given percentage (a probability)
  to the percentile (a z-score).
invNorm(p) outputs z such that P(−∞ ≤ Z ≤ z) = p.

For example, given a percentage of 90% = 0.9, we can get to invNorm(*) by
[2nd][VARS](DISTR) > DISTR > 3:invNorm(
meaning that P(Z ≤ 1.281551567) = 0.9.
In fact, if we plug the output back into normalcdf(-E99, *), we get 0.9 back
(with a high precision).

(For students who know calculus: invNorm(a) = CDF⁻¹(a), the inverse of the
CDF, the Cumulative Distribution Function.)

(1a) Compute the z-score that is the 50th percentile.
z score = invNorm( ___ ) =
(1b) Label that z-score and shade the left tail.

(2a) Compute the z-score that is the 75th percentile.
z score = invNorm( ___ ) =
(2b) Label that z-score and shade the left tail.

(3a) Compute the z-score that is the 90th percentile.
z score = invNorm( ___ ) =
(3b) Label that z-score and shade the left tail.

(4a) Compute the z-score that is the 95th percentile.
z score = invNorm( ___ ) =
(4b) Label that z-score and shade the left tail.

(5a) Compute the z-score that is the 40th percentile.
z score = invNorm( ___ ) =
(5b) Label that z-score and shade the left tail.

(6a) Compute the z-score that is the 25th percentile.
z score = invNorm( ___ ) =

Page 1 of 2
Elementary Statistics
NAME:_______________________
Inverse of Normal CDF Homework (v.090305)
(6b) Label that z-score and shade the left tail.

(7a) Compute the z-score that is the 10th percentile.
z score = invNorm( ___ ) =
(7b) Label that z-score and shade the left tail.

(8a) Compute the z-score that is the 5th percentile.
z score = invNorm( ___ ) =
(8b) Label that z-score and shade the left tail.

(9a) Compute the z-score that is the 99th percentile.
z score = invNorm( ___ ) =
(9b) Label that z-score and shade the left tail.

Often we want to know the z-scores that contain the middle percentages, e.g.,

(10a) Find the z-scores that "trap" the middle 80% of the population. (The
middle 80% includes all the z-scores from the 10th percentile to the 90th
percentile.)
z1 = invNorm(0.1) =
z2 = invNorm(0.9) =
(10b) Label those z-scores, shade the middle, and label the percentage.

(11a) Find the z-scores that "trap" the middle 50% of the population.
z1 = invNorm( ___ ) =
z2 = invNorm( ___ ) =
(11b) Label those z-scores, shade the middle, and label the percentage.

(12a) Find the z-scores that "trap" the middle 95% of the population.
z1 = invNorm( ___ ) =
z2 = invNorm( ___ ) =
(12b) Label those z-scores, shade the middle, and label the percentage.
Page 2 of 2
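invNorm can likewise be imitated in Python; here a simple bisection search inverts the erf-based CDF (bisection is just one convenient method, not the TI's actual algorithm):

```python
import math

def Phi(z):
    # CDF of the standard normal distribution.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def invnorm(p, lo=-10.0, hi=10.0):
    # Invert Phi by bisection; plenty of precision for worksheet z-scores.
    for _ in range(80):
        mid = (lo + hi) / 2
        if Phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z90 = invnorm(0.90)                        # 90th percentile, about 1.2816
z10, z90b = invnorm(0.10), invnorm(0.90)   # trap the middle 80% (10a)

print(round(z90, 4))
```

Note the symmetry in (10a): the two trapping z-scores are negatives of each other.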
[BB95, p.382]
SAT (Scholastic Aptitude Test)
Mean=500
Std.Dev.=100
ACT (American College Testing)
Mean=18
Std.Dev.=6
X-Land: SAT Distribution and ACT Distribution (figures): two normal curves,
with SAT scores on a 0-1000 axis and ACT scores on a 0-42 axis.
(1) Donald Pato took SAT and got 666 points.
Micky Raton took ACT and got 29 points.
Who has a higher achievement?
(I.e., who achieved a higher percentile?)
Z-Land: N(0,1) (blank standard normal curves for your work)

(2) Univ. of PRB (People's Republic of Beserkeley) accepts only students
above the 85th percentile with their SAT or ACT. Find the minimum acceptance
scores.
minimum SAT score =
minimum ACT score =
San Jose City College, Spring 2007
Math 63 Statistics
Ver. 071022
p.1 of 1
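The Donald-vs-Micky comparison boils down to two z-scores, which a few lines of Python make explicit (a sketch of the computation, not a required tool):

```python
# Comparing scores on different scales by converting to z-scores.
# SAT: mean 500, sd 100.  ACT: mean 18, sd 6.  [BB95, p.382]
def z_score(x, mean, sd):
    return (x - mean) / sd

donald = z_score(666, 500, 100)   # Donald's SAT score of 666
micky = z_score(29, 18, 6)        # Micky's ACT score of 29

# The higher z-score is the higher percentile.
print(round(donald, 3), round(micky, 3))
```

Micky's z-score is larger, so his ACT result sits at the higher percentile.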
Central Limit Theorem (CLT)
The theorem that proves: “The ____ justifies the mean.”
Situation:
1. Given ANY distribution with
unknown mean = μ and
unknown standard deviation = σ,
2. You want to use a sample of size n to estimate the mean (μ). (I.e., take n
samples, x1, x2, ..., xn, and use x̄ = (x1 + x2 + ... + xn)/n to estimate μ.)
CONDITION:
When n is large (e.g., n ≥ 30 ),
RESULT:
The sample mean, x̄, is approximately normally distributed with
mean = μ_x̄ = μ
std. dev. = σ_x̄ = σ/√n
In other words, "x̄ ~ N(μ, σ/√n)".
Exceptions to the n ≥ 30 condition
• If the original distribution IS NORMALly distributed,
then the result is true for any n.
• If the original distribution is already symmetric and uni-modal, n does not
have to be very large for the distribution of x̄ to be approximately normal.
Paraphrases of the CLT result & Notes
• The sample mean (x̄) distribution "gets more and more normal" as the sample
size (n) increases.
• Even if the original distribution is highly skewed and has many peaks, the
sample mean (x̄) distribution will still get more and more symmetric,
uni-modal, and bell-shaped (i.e., normal) as the sample size (n) increases.
• The mean (center point) never changes for the sample mean distribution.
• As the sample size (n) increases, the sample mean distribution has a
smaller and smaller spread, σ_x̄ = σ/√n.
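The CLT statement can be watched numerically with a small simulation (a sketch; the exponential distribution is an arbitrary choice of a skewed, non-normal original distribution):

```python
import math
import random

random.seed(1)

# A CLT check on a skewed distribution: exponential with mean 2 (sd also 2).
mu, sigma = 2.0, 2.0
n = 36                       # sample size (n >= 30, so the CLT applies)

sample_means = []
for _ in range(20000):
    xs = [random.expovariate(1 / mu) for _ in range(n)]
    sample_means.append(sum(xs) / n)

m = sum(sample_means) / len(sample_means)
sd = math.sqrt(sum((x - m) ** 2 for x in sample_means) / len(sample_means))

# The CLT predicts mean ~ mu and std. dev. ~ sigma / sqrt(n).
print(round(m, 3), round(sd, 3), round(sigma / math.sqrt(n), 3))
```

The simulated spread of the sample means lands near σ/√n = 2/6, even though the original distribution is far from normal.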
Elementary Statistics
NAME:_______________________
Normal Dist. App.: Problem Setup HW (ver.081014)
Normal Dist. Applications: Problem Setup Homework
These exercises drill you on: (1) Getting the important information from the word problem, and
(2) Forming the right strategy to solve the problem.
Ref.
Triola
2008)
p.254
#14
“Fish out the
Basic Info”
X = the
woman’s
height
μ = 63.6"
σ = 2.5"
p.255
#15
X=
p.255
#16
X=
Express the question in math
terms
Prob. of meeting the
requirement:
P (58" ≤ X ≤ 80") = ?
Solution strategy
Case P=?. Step 1: Translate the given x to z (z-score)
Step 2: Compute the probability using normalcdf( * , * )
Case x=?. Step 1: Turn the probability to percentile (in
decimal, say p).
Step 2: Find the z-score: z = invNorm( p )
Step 3: Translate the z back to x.
Let x1=58” and x2=80”.
Find the z-scores for x1 and x2.
Then use normalcdf( z1 , z2 )
normalcdf( -E99 , z1 )
normalcdf( z2 , E99)
to find the probabilities.
Prob. of too short:
P ( X ≤ 58") = ?
Prob. of too tall:
P (80 " ≤ X ) = ?
Page 1 of 2
Elementary Statistics
p.255
#18
X=
p.255
#21a
X=
p.256
#21b
X=
p.256
#22
X=
NAME:_______________________
Normal Dist. App.: Problem Setup HW (ver.081014)
Page 2 of 2
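The "Case P = ?" strategy for p.254 #14 can be traced in Python with an erf-based normalcdf (a sketch of the two-step strategy above):

```python
import math

def normalcdf(z1, z2):
    # TI-style normalcdf on the standard normal, via the error function.
    Phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return Phi(z2) - Phi(z1)

# Triola p.254 #14 setup: X = the woman's height, mu = 63.6", sigma = 2.5".
mu, sigma = 63.6, 2.5

# Step 1 ("Case P = ?"): translate the given x-values to z-scores.
z1 = (58 - mu) / sigma    # height requirement lower bound, 58"
z2 = (80 - mu) / sigma    # upper bound, 80"

# Step 2: compute the probabilities with normalcdf.
p_meets = normalcdf(z1, z2)        # P(58" <= X <= 80")
p_too_short = normalcdf(-1e99, z1) # P(X <= 58")
print(round(p_meets, 4))
```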
Elementary Statistics
Central Limit Theorem Application Setup
(v.090328)
Central Limit Theorem (CLT) Application Setup
These exercises drill you on setting up word problems.

Columns: Ref. (Triola 2008) | Assign Correct Symbols | Does CLT Apply?
("Yes" if n ≥ 30 or if the X-dist. is normal) | Mean & S.D. for the Sample
Means (μ_x̄ = μ, σ_x̄ = σ/√n) | Rewrite question with math symbols.

p.276 #9ab (worked example):
X = Random Variable of man's weight; X̄ = R.V. of sample mean weight
μ = 172 lb; σ = 29 lb; n = 12
CLT applies because weights are normally distributed.
μ_x̄ =        σ_x̄ =

p.276 #10:
X = R.V. of ____________
____ = $182;  ____ = $105;  ____ = 35;  ____ = $50
μ_x̄ =        σ_x̄ =

p.276 #11:
____ = R.V. of ____________
μ_x̄ =        σ_x̄ =

p.276 #12ab:
X = R.V. of ____________
μ_x̄ =        σ_x̄ =
Rewrite with math symbols: 12a. P(X > 167) = ?    12b. P(X̄ > 167) = ?

Page 1 of 2

Elementary Statistics
Central Limit Theorem Application Setup
(v.090328)

p.277 #14ab
p.277 #15ab
p.277 #16ab
p.278 #19ab
p.278 #20ab

Page 2 of 2
Elementary Statistics
Confidence Interval Facts
(v.090721)
Confidence Interval Facts
This document summarizes terms and facts about confidence interval problems.
The notation follows Triola's Essentials of Statistics. Each item is given for Measurement Problems, then for Proportion Problems, with comments.

The goal is to estimate …
* Measurement: the population mean, μ.
* Proportion: the population proportion. Triola term: p.

The given sampled data
* Measurement:
  Case 1: the raw data, x1, …, xn.
  Case 2: the sample mean, x̄, and sample std. dev., s.
  In either case, the population standard deviation, σ, might also be given.
* Proportion: n = sample size; x = # successes (# failures = n − x).
  Sometimes the problem gives you the sample proportion, p̂, and you need to compute x = np̂.

Check Assumptions
* Measurement: Proceed only if
  1. n > 30 or the distribution is normal, and
  2. the sample is random.
* Proportion: Proceed only if
  1. # successes ≥ 5 and # failures ≥ 5, and
  2. the sample is random, and the observations are independent of each other.
* Comment: This step is for making sure that the sample size is large enough for the Central Limit Theorem. This step is also for documenting assumptions.

Point estimate
* Measurement: the sample mean, x̄ = (x1 + … + xn)/n.
* Proportion: the sample proportion, p̂ = x/n.
* Comment: This is our best 1-number guess. (It might be close, but it is almost never right on the dot.)

Standard Error (of the sample statistic)
* Measurement: s/√n (≈ σ/√n), where s is the sample standard deviation.
* Proportion: √(p̂(1 − p̂)/n) (≈ √(p(1 − p)/n)).
* Comment: This is an approximation of σ_x̄ (the standard deviation of the sample means) or σ_p̂ (the standard deviation of the sample proportions).

Confidence Level
* E.g., CL = 90% = .90;  CL = 95% = .95;  CL = 99% = .99.
* Comment: The "center area" (center probability).

α/2 = (1 − CL)/2
* E.g., CL = .90 → α/2 = .05;  CL = .95 → α/2 = .025;  CL = .99 → α/2 = .005.
* Comment: The "tail area" (tail probability) of one tail.

Critical Value (for a given CL)
* Measurement: tα/2 = −invT(α/2, n − 1);  with σ known, zα/2 = −invNorm(α/2).
* Proportion: zα/2 = −invNorm(α/2).
* Comment: The number of standard deviations from the center to the edge of the "center area."

© 2009 Chi-yin Pang
Page 1 of 2
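The z critical values above can be reproduced with Python's standard library, mirroring the TI convention zα/2 = −invNorm(α/2) used in the table (a t critical value would additionally need invT or Table A-3):

```python
from statistics import NormalDist

# Critical value z_{alpha/2} = -invNorm(alpha/2), matching the TI
# convention in the table above.
def z_critical(cl):
    alpha_over_2 = (1 - cl) / 2
    return -NormalDist().inv_cdf(alpha_over_2)

for cl in (0.90, 0.95, 0.99):
    print(cl, round(z_critical(cl), 3))  # 1.645, 1.96, 2.576
```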
Elementary Statistics
Confidence Interval Facts
(v.090721)

Margin of Error
* Measurement: E = tα/2 (std. err.) = tα/2 (s/√n);  with σ known, E = zα/2 (σ/√n).
* Proportion: E = zα/2 (std. err.) = zα/2 √(p̂(1 − p̂)/n).
* Comment: The length from the center to the edge.

Confidence Interval
* Measurement: x̄ ± E, i.e., (x̄ − E, x̄ + E).
* Proportion: p̂ ± E, i.e., (p̂ − E, p̂ + E).
* Comment: We hope to capture the population parameter within this interval.

TI Function
* Measurement: If σ is not available (the usual case): [STAT]>TESTS>8:TInterval…
  If σ is available: [STAT]>TESTS>7:ZInterval…
* Proportion: [STAT]>TESTS>A:1-PropZInt…
* Comment: The input for the number of successes, x, MUST be an integer (no decimal). If the number of successes, x, is computed by x = np̂, always round to the nearest integer.

Given margin of error, E, to determine n
* Measurement: n ≥ (zα/2 σ / E)² ≈ (zα/2 s / E)².
* Proportion: If an estimate of p (π) is available: n ≥ [zα/2]² p̂(1 − p̂) / E².
  If an estimate of p (π) is not available: n ≥ (zα/2 × 0.5 / E)² = [zα/2]² (0.25) / E².
* Comment: If the result is a decimal, ALWAYS round UP.

If we increase n …
* E would be smaller; therefore (x̄ − E, x̄ + E) would be narrower.
* Because E = tα/2 (s/√n) or E = zα/2 √(p̂(1 − p̂)/n), and n appears in the denominator. Multiplying n by 4 would cut E in half.

If we increase CL (confidence level) …
* E would be bigger; therefore the interval would be wider.

If you want smaller E (higher precision) …
* Either increase n or decrease CL.

For larger σ or s … (measurement only)
* E would be bigger; therefore the interval would be wider.

For p (π) closer to 0.5 … (proportion only)
* E would be bigger; therefore the interval would be wider.

© 2009 Chi-yin Pang
Page 2 of 2
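The sample-size formulas for a proportion can be sketched in a few lines of Python; the example inputs (95% CL, E = 0.03, prior estimate p̂ = 0.63) are illustrative choices, not problems from the text:

```python
from math import ceil
from statistics import NormalDist

def min_n_proportion(cl, E, p_hat=None):
    """Minimum n for a proportion CI with margin of error E.
    Uses p_hat*(1 - p_hat) when an estimate is available, else 0.25."""
    z = -NormalDist().inv_cdf((1 - cl) / 2)
    pq = p_hat * (1 - p_hat) if p_hat is not None else 0.25
    return ceil(z**2 * pq / E**2)   # ALWAYS round up

# No estimate of p, 95% CL, E = 3 percentage points:
print(min_n_proportion(0.95, 0.03))        # 1068
# With a prior estimate p_hat = 0.63:
print(min_n_proportion(0.95, 0.03, 0.63))  # 995
```

Notice the "no estimate" case needs the larger n, since 0.25 is the worst-case value of p(1 − p).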
San Jose City College, Spring 2007
Math 63 Statistics
Interpretation of Confidence Interval
What does the following sentence mean?
"The 95% confidence interval of weekly income is ($371, $509)."

CORRECT:
• Standard acceptable answer: "We are 95% confident that the population's mean weekly income, μ, is between $371 and $509."
  Note: Although a little ambiguous, this is the standard acceptable answer. The following is what "95% confident" really means.
• Precise interpretation: "When we compute the confidence interval this way, on average, 95 out of 100 of these intervals would capture the weekly income's true mean (the population mean, μ). This time we have calculated the interval to be ($371, $509)."

INCORRECT:
• Each worker makes between $371 and $509 per week.
  Note: This is just plain wrong.
• 95% of workers make between $371 and $509 per week.
  Note: This is just plain wrong.
• Any sample mean x̄ would have a 95% chance of being within ($371, $509).
  Note: This is wrong. The confidence interval tries to capture μ, not the x̄'s.
• If we compute the mean earning, μ, from the population census, 95% of the time it would be within ($371, $509).
  Note: This is nonsense. If we compute μ from the population, there is only one number. That number does not change from time to time.
• There is a 95% chance that the mean earning, μ, is within ($371, $509).
  Note: μ is a fixed number; it is either in the interval (so the probability equals 1) or not in the interval (so the probability equals 0). The 95% probability refers to this type of interval, not to this particular interval or any particular interval.
Ver. 071022
p.1 of 1
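The precise interpretation can be demonstrated by simulation. The population below (normal weekly incomes with μ = $440) is entirely made up for the demonstration; the point is only that roughly 95 out of 100 intervals computed this way capture μ:

```python
import random
from math import sqrt
from statistics import NormalDist

# Hypothetical population of weekly incomes, made up for this demo.
random.seed(1)
mu, sigma, n, trials = 440, 120, 50, 1000
z = -NormalDist().inv_cdf(0.025)        # 95% critical value

captured = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    E = z * sigma / sqrt(n)             # sigma treated as known, for simplicity
    if xbar - E < mu < xbar + E:        # did this interval capture mu?
        captured += 1

print(captured / trials)   # close to 0.95
```

Each trial produces a different interval; μ never moves. That is exactly why "95%" describes the procedure, not any one interval.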
Elementary Statistics
Hypothesis Testing: Errors & Power of a Test
(Ver. 071211) Page 1 of 2
(See [Triola 2008, pp.378-380].)

Decision: Reject H0
E.g.,
1. Patient tested positive
2. Fire alarm set off
3. Failed the EPA standard
4. The suspect was "found" guilty

  If H0 is Actually True
  E.g.,
  1. Patient is healthy
  2. No fire in the building
  3. The factory's discharge has no downstream environmental effects
  4. The suspect is innocent
  Then rejecting H0 is a Type I Error (with probability α):
    α = P(rejecting H0 | H0 is true) = level of significance.
  [Figure: the distribution of the test statistic if H0 is true (dotted line) and the distribution of the test statistic of the population we sampled (solid line) are the same here; critical value = −invNorm(α); the sample's test statistic falls past it, P-value < α; reject H0 — a WRONG decision.]
  DAMAGE:
  1. Scared the patient (small cost)
  2. Annoyance from the buzz
  3. Wasted money complying with an unreasonable standard
  4. An innocent person got executed

  If H0 is Actually False
  E.g.,
  1. The patient has the disease
  2. There is a fire in the building
  3. The factory's discharge has downstream environmental effects
  4. The suspect committed the murder
  Then rejecting H0 is the correct decision:
    1 − β = "the Power of a test"
          = P(rejecting H0 | H0 is false)
          = the probability of supporting a true H1.
  [Figure: here the dotted line (distribution if H0 is true) and the solid line (distribution of the sampled population) differ; the sample's test statistic gives P-value < α; reject H0 — a correct decision.]

Decision: Fail to reject H0
E.g.,
1. Patient tested negative
2. Fire alarm did not set off
3. Met the EPA standard
4. The suspect was "found" not guilty

  If H0 is Actually True
  Then failing to reject H0 is the correct decision.
  [Figure: the dotted and solid lines are the same; P-value > α; fail to reject H0 — a correct decision.]

  If H0 is Actually False
  Then failing to reject H0 is a Type II Error (with probability β):
    β = P(not rejecting H0 | H0 is false).
  (β is not assessable from the data alone, since it depends on the unknown true distribution.)
  [Figure: the dotted and solid lines differ; P-value > α; fail to reject H0 — a WRONG decision.]
  DAMAGE:
  1. Lost the chance for treatment (maybe a life)
  2. Missed escaping the fire
  3. The factory continues to pollute the environment
  4. The murderer went unpunished
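For a left-tailed z test with σ known, β and the power 1 − β can be computed directly once a specific alternative value of μ is assumed. The numbers below are hypothetical, chosen only to illustrate the mechanics:

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical left-tailed test of H0: mu = 100 vs H1: mu < 100,
# sigma known. All numbers are made up for illustration.
mu0, sigma, n, alpha = 100.0, 15.0, 36, 0.05
mu_true = 95.0                       # one specific value under H1

se = sigma / sqrt(n)
# Reject H0 when the sample mean falls below this critical boundary:
xbar_crit = mu0 + NormalDist().inv_cdf(alpha) * se

beta = 1 - NormalDist(mu_true, se).cdf(xbar_crit)   # P(fail to reject | H1)
power = 1 - beta                                    # P(reject H0 | H1)
print(round(power, 3))   # about 0.639
```

Moving mu_true closer to mu0 shrinks the power; this is why β "is not assessable" without committing to a specific alternative.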
Elementary Statistics
NAME: _______________________
(v.090401)
Hypothesis Testing: "Defining H0, H1" Homework
These exercises drill you on setting up the correct Null and Alternative hypotheses for a "Test of Significance."
For each of the referenced word problems, read the context of that problem and fill in the blanks.
(Just use the context of the problems; you DO NOT need to solve what the problems ask.)

Columns: Ref. [Tri 2008] | Mean or Proportion | The English meaning for μ or p | "Claim in Math Notation" | Null & Alternative Hypotheses | Shade Tail & State "2-Tail", "Left", or "Right"

p.394 #5  — Prop — p = the true proportion of the green-flowered peas — p = .25 — H0: p = .25; H1: p ≠ .25 — 2-Tail
p.403 #10 — Mean — μ = the population's mean body temperature — μ < 98.6°F — H0: μ = 98.6°F; H1: μ < 98.6°F — Left
p.426 #1a — Mean / Prop — (fill in)
p.426 #1b — Mean / Prop — (fill in)
p.426 #1c — Mean / Prop — (fill in)
p.426 #2  — Mean / Prop — (fill in)
p.426 #3  — Mean / Prop — (fill in)
© Chi-yin Pang, 2008
Page 1 of 2

p.426 #4  — Mean / Prop — (fill in)
p.426 #5  — Mean / Prop — (fill in)
p.427 #6  — Mean / Prop — (fill in)
p.427 #8  — Mean / Prop — (fill in)
p.427 #9  — Mean / Prop — (fill in)
p.427 #11 — Mean / Prop — (fill in)
p.427 #12 — Mean / Prop — (fill in)
© Chi-yin Pang, 2008
Page 2 of 2
Elementary Statistics
Project 2: Conf. Int. or Hypo. Test Assignment
(v.09728)

Learning Objective
The objectives are to:
• Gain experience with real "field" data collection.
• Apply Confidence Intervals and/or Hypothesis Testing to analyze data and make statistical inferences.
• Exercise your written presentation skills.

Due Dates
Proposal due _______________ (send me a 1-line e-mail, chi-yin_pang@alumni.hmc.edu)
Report due _________________

Topic
Collect two sets of quantitative data. (If appropriate, you may use the data you have collected for Project 1.) Then use a 2-sample Confidence Interval or Hypothesis Test to analyze and make inferences from the raw data.
Make up your own "fun" topic; especially, a topic for which you have ready access to the data. "Fun" makes work easy. If you are not interested in the topic, it is like pulling your own teeth.

Report's Content Requirements
Your report must include the following 4 sections:
1. Introduction: Introduce your topic. (It would be good to discuss your original prediction of the result.)
2. Raw Data: Describe how you collected the data and the source of the data, and present the raw data.
3. Analysis: Describe your analysis technique.
4. Conclusion: Make your conclusion(s). If appropriate, state what you would do differently for further investigation.

Editing Requirement
The requirements are:
• It has to be totally word processed.
• Formulae are typed with Equation Editor. E.g., μ = Σx/n = (x1 + … + xn)/n

Caution
There are many potential "time-eaters" along the way, such as:
• Collecting data.
• Wanting to collect more data to investigate further.
• Thinking about and writing up the conclusion.
Start the project early to find out what might surprise you.
© 2009 Chi-yin Pang
Page 1 of 1
Elementary Statistics
Project 2 "Independent Samples" Example
(v.090421)
When a Car Ages, Does the MPG Change?
Chi-yin Pang
April 16, 2008

Introduction
When a car is new, everything should work perfectly and its performance should be at its peak. However, after the engine has had some wear, it may have less internal friction, which may contribute to higher mileage. This report investigates whether the MPG changes as a car ages.

Method
I use the data from the mileage record of my 1995 Toyota Previa. The records include the miles driven between gas fill-ups and the number of gallons for each fill-up. The first page of the record book is shown on the right.
The MPG is computed by
  MPG = (number of miles since the last fill-up) / (number of gallons for this fill-up).
I use one set of MPG data from when the car was new (from 0 miles to 10,000 miles), and another set from when the car was older (from 100,000 miles to 110,000 miles). Let μ1 be the true mean MPG when the car was new and μ2 be the true mean MPG when the car had 100,000 miles. I will construct the confidence interval of (μ1 − μ2), and see whether it is totally positive, totally negative, or contains zero.

Raw Data
The number of miles driven and the gallons for the fill-ups were entered into a spreadsheet, and the MPG for each fill-up was computed. The table on the right lists the resulting MPG data. Data Column 1 has the MPGs for the first 10,000 miles and Column 2 has the MPGs for 100,000 to 110,000 miles. (The sample sizes are 35 and 42, respectively.)
Page 1 of 2

Analysis
I set up the problem as a 2-sample confidence interval problem. I am trying to estimate the difference of the means, (μ1 − μ2). The data are independent. (The data did not come in pairs. In fact, the sample size for the "new car" MPG is not even the same as the sample size of the "old car" MPG.)
Step 1. Checks: The sample sizes are both greater than 30; therefore, there is no need to check for normality of the distributions. We assume that the sampling is random in the sense that the driving conditions vary, so the sample is an unbiased mix of various situations.
Just for curiosity, I entered the data into the TI's L1 and L2 and made a comparative box plot (shown on the right). The plots show that the distributions are quite symmetrical and, more importantly, that the medians seem to be close together. Therefore, I do not really expect a big difference between μ1 and μ2.
Step 2. Computation: Since σ1 and σ2 are unknown, I used the t-distribution for the analysis. I used the TI-83's 2-SampTInt with a 95% confidence level. (See screen shots on the right.) The input and the resulting confidence interval are shown below.

Conclusion
We are 95% confident that the difference of the mean MPGs of the new and old Toyota Previa is between −0.65 MPG and 1.01 MPG. Since the confidence interval is neither totally positive nor totally negative, (μ1 − μ2) could be 0. In other words, the MPG did not change significantly as the Previa aged. We did not see the MPG get better or worse.
Before the analysis, I mentioned that a worn engine with less friction might result in higher MPG. Apparently, that is not the case; perhaps modern engines have less wear and a longer life. It would be interesting to compare the MPG again when the Previa gets to 200,000 miles.
Page 2 of 2
Elementary Statistics
Project 2 "Matched Pairs" Example
(ver.090421)
Does Chinese or English Use Fewer Syllables to Say the Same Thing?
Chi-yin Pang
November 15, 2008

Introduction
Whenever I read something, I often feel that Chinese can say it more concisely than English. Is that really true, or just an unfounded feeling? I used the confidence interval technique to check whether, on average, Chinese needs fewer syllables than English to say the same thing.

Method
I set out to compare parallel Chinese/English passages. I count the number of syllables in each language and compute the ratio
  x = (Number of Syllables in Chinese) / (Number of Syllables in English).
If Chinese uses more syllables, then x > 1. If Chinese uses fewer syllables, then x < 1.
For example, with the parallel passage Daniel 5:31:
  [Chinese text of Daniel 5:31] — Syllables = 17 (Note: Each Chinese character has one syllable.)
  "And Darius the Mede received the kingdom, being about threescore and two years old." — Syllables = 22
  x = (Number of Syllables in Chinese) / (Number of Syllables in English) = 17/22 ≈ 0.773
I planned to take a sample of n parallel passages and use the sample ratios {x1, x2, …, xn} to compute the 95% confidence interval (CI) of the mean ratio. Then, based on the CI's position relative to 1 (the neutral ratio), I can draw conclusions:
• If the CI is totally less than 1, then μ < 1. Therefore, Chinese uses fewer syllables on average.
• If the CI contains 1, then μ could be 1. Therefore, the evidence supports neither "Chinese uses fewer syllables on average" nor "Chinese uses more syllables on average."
• If the CI is totally greater than 1, then μ > 1. Therefore, Chinese uses more syllables on average.
Page 1 of 3

Raw Data
I decided to sample 20 commonly accessible passages. I took a random sample of 20 verses from the Book of Daniel, which has a mix of narrative and prophetic prose. Daniel has 12 chapters with a total of 357 verses. I used Microsoft Excel's RANDBETWEEN(1,357) function to generate 20 random verses as sample passages. (Note: The sample is "with replacement.") The Chinese translation used is the Chinese Union Version (CUV, 1919) and the English translation used is the American Standard Version (ASV, 1901).

 #   Random verse   Chapter   Verse   CUV (Chinese Union Ver.)   ASV (American Std. Ver.)   x = Ch/Am
 1   168            5         31      17                         22                         0.773
 2   102            4         2       21                         26                         0.808
 3   243            8         9       28                         35                         0.800
 4   87             3         17      37                         38                         0.974
 5   65             2         44      46                         66                         0.697
 6   334            11        35      38                         42                         0.905
 7   158            5         21      55                         81                         0.679
 8   117            4         17      52                         70                         0.743
 9   10             1         10      64                         63                         1.016
10   185            6         17      36                         45                         0.800
11   11             1         11      29                         36                         0.806
12   98             3         28      60                         69                         0.870
13   98             3         28      60                         69                         0.870
14   319            11        20      47                         45                         1.044
15   149            5         12      60                         74                         0.811
16   122            4         22      29                         38                         0.763
17   267            9         16      64                         77                         0.831
18   153            5         16      44                         73                         0.603
19   145            5         8       27                         29                         0.931
20   109            4         9       55                         59                         0.932

The last column, the "Chinese/English" ratio, is used as the sample data.
Page 2 of 3

Analysis
I chose to use Statdisk for the analysis because of the ease of copy-and-paste of the computer results.
Step 1. I copy the xi's from Excel to Statdisk's data column 3.
Step 2. Because of my small sample, n = 20, I use "Data > Explore Data" to verify that the data are approximately normal. The histogram looks approximately normal, and the data's sample mean and sample standard deviation are also computed and ready for use as input for the confidence interval computation:
  x̄ = 0.8326911,  s = 0.1126609
Step 3. I use the Mean One-Sample function ("Analysis > Confidence Intervals > Mean One-Sample").
The 95% confidence interval of μ (the mean Chinese/English ratio) is (0.78, 0.89).
Step 4. Implication: Since the entire confidence interval is less than 1, we are 95% confident that the mean Chinese/English syllable ratio is between 0.78 and 0.89.

Conclusion
Since I have only taken a convenience sample from the Book of Daniel, we must be careful about making generalizations. Nevertheless, for the type of literature that is similar to the Book of Daniel, we expect that, on average, Chinese uses fewer syllables than English. In fact, we expect it to take only about 78% to 89% of the syllables of English.
The statistical analysis confirmed my suspicion that Chinese is more concise. However, the ratio is not as low as I expected. I was expecting something like 2/3. My analysis proved my "guesstimate" wrong.
Page 3 of 3
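The Statdisk computation in Steps 2-3 can be reproduced from the table's printed ratios with Python's standard library (the 2.093 critical value is t for 95% CL with d.f. = 19, as read from Table A-3; results differ from Statdisk's only in the rounding of the table's 3-decimal ratios):

```python
from math import sqrt
from statistics import mean, stdev

# The 20 Chinese/English syllable ratios from the table above,
# as printed (rounded to 3 decimals):
x = [0.773, 0.808, 0.800, 0.974, 0.697, 0.905, 0.679, 0.743, 1.016,
     0.800, 0.806, 0.870, 0.870, 1.044, 0.811, 0.763, 0.831, 0.603,
     0.931, 0.932]

n = len(x)
xbar, s = mean(x), stdev(x)
t_crit = 2.093          # t_{alpha/2}, 95% CL, d.f. = 19 (Table A-3)
E = t_crit * s / sqrt(n)

print(round(xbar, 4), round(s, 4))
print((round(xbar - E, 2), round(xbar + E, 2)))   # about (0.78, 0.89)
```

Since the whole interval sits below 1, the data support "Chinese uses fewer syllables on average," matching the report's Step 4.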
Elementary Statistics
2-μ Independent Sample, Data Collection
NAME: _______________________
(v.090721)
Hypothesis Testing (& Confidence Interval):
2-μ Independent Sample, Data Collection Worksheet
This template guides you on: (1) setting up the correct Null and Alternative hypotheses for a 2-sample measurement problem, and (2) collecting the necessary information for the computation.

Problem: Triola p.459 #9 (filled-in example)
"Claim in Math": μ_echin ≠ μ_plac
μ1 = μ_echin = pop. mean of # days of fever when treated with echinacea.
μ2 = μ_plac = pop. mean of # days of fever when "treated" with placebo.
Hypotheses: H0: μ1 = μ2;  H1: μ1 ≠ μ2
Population 1 (Echinacea): x̄1 = 0.81 days, s1 = 1.50 days, n1 = 337, σ1 = not given
Population 2 (Placebo):   x̄2 = 0.64 days, s2 = 1.16 days, n2 = 370, σ2 = not given

For each of the following problems, fill in the same template:
  "Claim in Math": ___
  μ1 = μ___ = ___ ;  μ2 = μ___ = ___
  Hypotheses: H0: ___ ;  H1: ___
  For each population: description, sample mean, sample std. dev., sample size, pop. std. dev. (σ).

Problem: Triola p.459 #10
Problem: Triola p.459 #11
Problem: Triola p.460 #13
© 2009 Chi-yin Pang
Page 1 of 2
Problem: Triola p.460 #16
Problem: Triola p.460 #17
Problem: Triola p.461 #22
Problem: Triola p.477 #5b
© 2009 Chi-yin Pang
Page 2 of 2
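The filled-in p.459 #9 example above carries enough information to compute the 2-sample interval. The sketch below uses the z critical value as a stand-in for the t critical value a 2-SampTInt would use, which is a close approximation here since both samples are far above 30:

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from the filled-in p.459 #9 example above.
x1, s1, n1 = 0.81, 1.50, 337    # echinacea: mean days of fever
x2, s2, n2 = 0.64, 1.16, 370    # placebo

diff = x1 - x2
se = sqrt(s1**2 / n1 + s2**2 / n2)     # std. error of (x1bar - x2bar)
# Large samples: z is a close stand-in for the t critical value.
z = -NormalDist().inv_cdf(0.025)       # 95% CL
E = z * se

lo, hi = diff - E, diff + E
print((round(lo, 3), round(hi, 3)))    # the interval straddles 0
```

Because the interval contains 0, the data do not show a significant difference in mean fever days between echinacea and placebo.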
Elementary Statistics
2-Prop Data Collection
NAME: _______________________
(v.090728)
Hypothesis Testing (& Confidence Interval):
"2-Prop Data Collection" Worksheet (for [Tri08] Sect.9-2)

Problem: Triola08 p.444, #11 (Home Field Advantage) — filled-in example
Success means: The home team won
"Claim in Math": p_bb = p_fb
p1 = p_bb = pop. prop. of home-team wins in basketball
p2 = p_fb = pop. prop. of home-team wins in football
Hypotheses: H0: p1 = p2;  H1: p1 ≠ p2
Population 1 (Basketball games): x1 = 127, n1 = 198, p̂1 not explicitly given
Population 2 (Football games):   x2 = ___, n2 = ___, p̂2 = ___
(x given; or x = np̂)

Problem: Triola08 p.444, #12 (Gloves) — partially filled
Success means: The glove leaks virus
"Claim in Math": p_vin > p_lat
p1 = p_vin = pop. prop. of vinyl gloves that leak virus
p2 = p_lat = ___
Hypotheses: H0: p1 = p2;  H1: ___
Population 1 (Vinyl gloves): p̂1 = 63% explicitly given, n1 = 240, so x1 = n1·p̂1 = 240 × .63 = ___
Population 2: x2 = ___, n2 = ___, p̂2 = ___

For each of the following problems, fill in the same template:
  Success means: ___ ;  "Claim in Math": ___
  p1 = p___ = pop. proportion of ___ ;  p2 = p___ = ___
  Hypotheses: H0: p1 = p2;  H1: ___
  For each population: description, # "successes" (x given; or x = np̂), sample size, sample proportion.

Problem: Triola08 p.445, #14
Problem: Triola08 p.445, #15
© 2009 Chi-yin Pang
Page 1 of 2
Problem: Triola08 p.445, #17
Problem: Triola08 p.445, #19
Problem: Triola08 p.446, #20
Problem: Triola08 p.446, #21
© 2009 Chi-yin Pang
Page 2 of 2
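Once both columns of the worksheet are filled, the 2-proportion z test proceeds as below. The basketball column is from the worksheet (x1 = 127 out of n1 = 198); the football column is NOT given in the worksheet, so hypothetical values are used here purely to demonstrate the mechanics:

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 127, 198    # basketball home wins (from the worksheet)
x2, n2 = 100, 200    # football column: HYPOTHETICAL values for illustration

p1_hat, p2_hat = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)          # pooled proportion under H0
se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1_hat - p2_hat) / se

# Two-tailed p-value for H1: p1 != p2
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(z, 2), round(p_value, 4))
```

This mirrors what the TI's 2-PropZTest computes; with real worksheet data you would compare the p-value against α to decide whether to reject H0.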
Elementary Statistics
Correlation & Regression Example
(v.090724)
Correlation & Regression Example
We use the example from [Tri08, p.504 #16]. The data has vehicle weight (lb.) and fuel efficiency (mpg, "miles per gallon gas mileage") for 7 cars.

Step 1, 2: Assessment; Define H0, H1.
Explanatory Variable (x): ___
Response Variable (y): ___
Assess correlation with the scatterplot.
• If there is an influential outlier, see whether it is appropriate to delete it.
• If the scatterplot shows a curve, DON'T do linear regression.
• Otherwise, proceed to define H1 as: ρ ≠ 0 or ρ < 0 or ρ > 0.
Formulate the hypotheses:
H0: ρ = 0 (i.e., there is no correlation)
H1: ___
Note: "r" represents the sample coefficient of correlation, and the Greek "ρ" (pronounced "rho") represents the population coefficient of correlation.

Step 3: Compute the Regression Line & p-value.
Use [STAT]>TEST>*:LinRegTTest. Then use [Y=].
LinRegTTest does the following:
• Fits a linear regression line, ŷ = a + bx, to the data points, (x, y)'s, and puts the equation into Y1.
  Note: This is called the "regression line". This line has the "least squares" property that minimizes the sum of all the squares of the y-distances between the data points and the line. See [Tri08, pp. 488, 510].
• Computes r, and computes the p-value for testing the hypotheses.
Result: Use [Y=] to see the equation Y1.

Step 4a: Interpret the Result of the Test.
t = the test statistic, t-score = ___
df = the degrees of freedom = ___
p = the p-value = ___
Conclusion of this test: (Reject H0 if p-value ≤ α)
r = the sample Correlation Coefficient = ___
r² = the Proportion of the Variation of y explained by the regression line = ___

Step 4b: Graph the Regression Line.
"a=" and "b=" define the regression line:
Y1 = ŷ = a + bx = ___
Use ZoomStat to show the original data and the regression line, Y1.

Step 4c: Interpret the Slope, b, of the Line.
b = −0.00797 (in units of y's unit / x's unit)
Therefore, for every additional _____ of _________, the _________ is ______ (increased/decreased) by ______ on average.
Equivalently: "For every additional 100 lb. of vehicle weight, the fuel efficiency is decreased by 0.797 mpg on average."

Step 5: Predict ŷ for a specific x.
• If we failed to reject H0, DON'T predict, because there is no correlation.
• If x is outside the range of the xi's, DON'T predict. It may give an invalid prediction.
• If x is within the range of the xi's, then predict the ŷ value with ŷ = a + bx.
E.g. 1: A Geo Metro weighs 2623 lb. Predict the mpg. The weight x = 2623 lb is within the range of the xi's [2290 lb, 3985 lb]; therefore we can use the equation to predict:
  ŷ = a + bx = ______ − .00797 × ______ ≈ 33.8 mpg
E.g. 2: A Smart Car weighs 1808 lb. Predict the mpg. (Ans.: Don't. Why?)
© 2009 Chi-yin Pang
Page 1 of 1
Elementary Statistics
Chi-square (χ²) Goodness of Fit (GOF) Test Worksheet
(Ver.081209)

STEP 1: Define Problem & Collect Information
Given:
1. A discrete distribution of k categories, P(Category_i) = p_i and p1 + … + pk = 1.
2. A sample of size n was drawn, and the observed frequencies for the categories are: O1, O2, …, Ok.
To Test: How well the observations fit the distribution.
H0: The population has the given distribution.
H1: The population has a different distribution.
[Figure: a bar chart of observed frequencies.]
Fill in the table (for categories i = 1, …, 6):
  Category i | Item Name | L1: Expected Prob. P(Cat_i) = p_i | L2: Observed Freq. O_i | L3: Expected Freq. E_i = np_i | L4 = (L2 − L3)²/L3, i.e., (O_i − E_i)²/E_i
Totals:  Σp_i = ___ ;  n = ΣO_i = sum(L2) = ___ ;  χ² = Σ (O_i − E_i)²/E_i = sum(L4) = ___

STEP 2: Check assumptions
Was the sampling random?
Check: Σp_i = 1?
Check the expected frequencies E_i = np_i. Is E_i ≥ 5 for each i?

STEP 3: Compute the Chi-square Statistic & P-value
χ² = Σ (O_i − E_i)²/E_i = ___
P-value = right-tail probability of the Chi-square distribution with (k − 1) degrees of freedom
  = χ²cdf( χ², ∞, k − 1 )
  = χ²cdf( ___, E99, ___ ) = ___
Use: [2nd][VARS](DISTR)>DISTR>*:χ²cdf(…
TI-84 also has: [STAT]>TESTS>D:χ²GOF-Test…

STEP 4: Make conclusion
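The worksheet's χ² computation can be sketched directly. The observed counts below are hypothetical (a fair-die example with n = 60 rolls), made up only to illustrate Steps 1-4:

```python
# HYPOTHETICAL goodness-of-fit example: is a die fair, given these
# made-up observed counts from n = 60 rolls?
observed = [11, 9, 8, 5, 7, 20]
p = [1/6] * 6                       # claimed distribution (fair die)
n = sum(observed)                   # 60

expected = [n * pi for pi in p]     # E_i = n * p_i; all are 10 >= 5 here
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi2, 3))   # 14.0

# With k - 1 = 5 degrees of freedom, the 0.05 critical value from a
# chi-square table is 11.071, so chi2 = 14.0 would lead us to reject H0.
```

On the TI, the same χ² would be fed to χ²cdf(14.0, E99, 5) to get the p-value.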
Elementary Statistics
Test of Dependency Worksheet
(v.090501)

STEP 1: Define Problem & Collect Information
Given: A frequency contingency table.
E.g., from a 1992 poll conducted by Univ. of Montana [BVD07.631]:
               Female   Male
  Democrat       48      36
  Independent    16      24
  Republican     33      45
Test the claim: The Response variable is Dependent on the Explanatory variable.
E.g., "Political affiliation" depends on "gender" in 1992.
H0: Response and Explanatory are Independent
H1: Response and Explanatory are Dependent
Enter the category names and the observed frequencies into the table (Response Var. rows × Explanatory Variable columns; each cell holds O_ij over E_ij, in a blank grid of up to 4 × 4 cells).

STEP 2: Check the assumptions.
2a. Is the sample random?
2b. Assuming that H0 is true, compute & check the expected frequencies, E_ij's:
  E_ij = n · P(Response Cat. i) · P(Explanatory Cat. j),
  where n = ΣO_ij = sum of all observed frequencies,
  P(Response Cat. i) = Σ_{all j} O_ij / n, and P(Explanatory Cat. j) = Σ_{all i} O_ij / n.
* Enter the observed frequencies into matrix [A]: [2nd][x⁻¹](MATRIX)>EDIT>1:[A]
* Use χ²-Test to compute the expected frequencies: [STAT]>TESTS>*:χ²-Test… Use Observed: [A] and Expected: [B]. The expected frequencies will be written to [B].
* Copy [B] to the worksheet.
* Are all the expected frequencies ≥ 5?

STEP 3: Compute the Test Statistic & P-value.
Use the χ² worksheet OR use TI's χ²-Test results.
Test Statistic = χ² = Σ (O_ij − E_ij)²/E_ij = ___
d.f. = degrees of freedom = (Rows − 1)(Columns − 1) = ___
P-value = right-tail prob. of the χ² distribution with d.f. degrees of freedom
  = χ²cdf( χ², ∞, d.f. )
  = χ²cdf( ___, E99, ___ ) = ___

STEP 4: Make conclusion.
©2009 by Chi-yin Pang
Page 1 of 1
)=
Elementary Statistics
Test of Dependency Worksheet
(v.090501)
Test of Dependency Worksheet
2nd
118
167
3rd
178
528
Crew
212
673
Test the claim: The Response variable is Dependent on
the Explanatory variable.
E.g., “Survival” depends on “Passenger Type”.
Eij = n ⋅ P ( Response Cat. i ) ⋅ P ( Explantory Cat. j )
where n = ∑ Oij = sum of all observed frequencies
2a. Is the sample random?
Given: A frequency contingency table
E.g.,
1st
Alive 202
Dead 123
STEP 2: Check the assumptions.
and P ( Response Cat. i ) = ∑ All j ' s Oij / n
2b. Assuming that H0 is true,
and P ( Explantory Cat. j ) = ∑ All i ' s Oij / n
compute & check the expected
frequencies, Eij’s.
* Enter the observed frequencies into matrix [A].
[2nd][x-1](MATRIX)>EDIT>1:[A]
* Use χ2-Test to compute the expected frequencies,
[STAT]>TESTS>*: χ2-Test…
Use Observed: [A] and Expected: [B].
STEP 1: Define Problem & Collect Information
H0: Response and Explanatory are Independent
H1: Response and Explanatory are Dependent
Expected frequencies will be written to [B].
* Copy [B] to the worksheet.
* Are all the expected frequencies ≥ 5?
Enter the category names and enter the observed frequencies
into the table.
STEP 3: Compute the Test Statistics & P value.
Use the χ2 worksheet OR USE TI’s χ2-Test results.
Explanatory Variable
Test Statistics = χ = ∑
2
(O
ij
− Eij )
Eij
2
=
d . f . = degrees of freedom = ( Rows − 1)(Columns − 1) =
R
e
s
p
o
n
s
e
V
a
r
O11
O12
O13
O14
E11
E12
E13
E14
O21
O22
O23
O24
E21
E22
E23
E24
O31
O32
O33
O34
E31
E32
E33
E34
O41
O42
O43
O44
E41
E42
E43
E44
©2009 by Chi-yin Pang
P value = right tail prob. of the χ2 distribution
with d.f. degrees of freedom
2
= χ cdf( χ 2 , ∞ , d.f. )
= χ2cdf(
, E99,
STEP 4: Make conclusion.
Page 1 of 1
)=
Page 6
Grand Road Map
References are from Triola (2008)

STATISTICAL INFERENCE
* Confidence Interval (Page 2)
  - 1-Sample (Page 3)
    · μ:  σ known §7-3;  σ unknown §7-4
    · p:  §7-2
    · σ:  §7-5
  - 2-Sample Difference (Page 5)
    · Matched Pairs:  μ  §9-4
    · Independent Samples:  μ (σ's known §9-3; σ's unknown §9-3);  p  §9-2
* Hypothesis Testing §8-2 (Page 6, 7)
  - 1-Sample (Page 8)
    · μ:  σ known §8-4;  σ unknown §8-5
    · p:  §8-3
    · σ:  §8-6
  - 2-Sample Difference (Page 8)
    · Matched Pairs:  μ  §9-4
    · Independent Samples:  μ (σ's known §9-3; σ's unknown §9-3);  p  §9-2
  - Linear Correlation & Regression's ρ, β (coeff. of correl. & slope):  §10-2 #39, §10-3 (Page 6, 7)
  - Chi-Square (χ²) Tests (Page 9):  Goodness-of-fit §11-2;  Independence §11-3
  - Other Tests:  ANOVA §11-4

2-SAMPLE DATA ANALYSIS
* Linear Regression Prediction:  §10-4 (Page 4)

Ver.080430
© 2008 by Chi-yin Pang
1
CONFIDENCE INTERVAL
for "trapping" the population mean, μ, or the population proportion, p.
Given a confidence level, 1 − α (e.g., C.L. = 0.95).
(References are from Triola (2008).)

Mean (μ): this involves the average of measurements, e.g., weight.
  Check: 1. Random sample?
         2. n > 30 or normally distributed?
            (No information about the distribution and n ≤ 30: use nonparametric
            methods or bootstrapping, p.360. Don't touch!)

  σ Known (§7-3)
    Confidence Level: 1 − α (compute α = 1 − C.L.)
    Critical Value: zα/2 = invNorm(α/2)
    Point Estimate: x̄
    Margin of Error: E = zα/2·σx̄ = zα/2·(σ/√n)
    Confidence Interval: ( x̄ − E, x̄ + E )   (See p.304, 305, 320.)
    Or use TI's ZInterval: [STAT] > TESTS > 7:ZInterval
      If given raw data, use "Input: Data" & L1.
      If given statistics, use "Input: Stats".

  σ Not Known (§7-4)
    Confidence Level: 1 − α (compute α = 1 − C.L.)
    Degrees of Freedom: d.f. = n − 1
    Critical Value: tα/2, which has P(−tα/2 < t < tα/2) = 1 − α under the
      Student's-t distribution with (n − 1) d.f.  (See p.331, Tbl. A-3.)
      For the TI-84: tα/2 = invT(α/2, n − 1)
    Point Estimate: x̄
    Margin of Error: E = tα/2·(s/√n)
    Confidence Interval: ( x̄ − E, x̄ + E )
    Or use TI's TInterval: [STAT] > TESTS > 8:TInterval
      If given raw data, use "Input: Data" & L1.
      If given statistics, use "Input: Stats".

  Determine Minimum n (§7-3): n ≥ ( zα/2·σ / E )² ≈ ( zα/2·s / E )².
    Always round up. (p.323)

  Conclusion (for the "mean" example ONLY):
    "We are 95% confident that the population's mean weekly income is between
    $371 and $509."
  Precise interpretation: e.g., "When we compute the confidence interval this
    way, on average 95 out of 100 of these intervals would capture the weekly
    income's true mean (the population mean, μ). This time we have calculated
    the interval to be ($371, $509)."

Proportion (p) (§7-2): this involves the ratio (x/n) of successes (x) out of
the total (n); Success = "1", Failure = "0" (a binomial distribution).
  Check: 1. Random sample?
         2. n big enough?
            p̂ = x/n, q̂ = 1 − p̂
            # successes ≥ 5?  # successes = x = n·p̂
            # failures ≥ 5?   # failures = n − x = n·q̂
            (Note: different books use 5 to 15.)
  Confidence Level: 1 − α (compute α = 1 − C.L.)
  Critical Value: zα/2 = invNorm(α/2)
  Point Estimate: p̂ = x/n
  Margin of Error: E = zα/2·σp̂ = zα/2·√(pq/n) ≈ zα/2·√(p̂q̂/n)
  Confidence Interval: ( p̂ − E, p̂ + E )   (See p.304-306.)
  Or use TI's 1-PropZInt: [STAT] > TESTS > A:1-PropZInt
    Note: the input x MUST be an integer. If x is not given, compute x = n·p̂.

  Determine Minimum n (§7-2):
    If an estimate of p is available:  n ≥ [zα/2]²·p̂q̂ / E²
    If no estimate of p is available:  n ≥ [zα/2]²·0.25 / E²
    Always round up. (p.308)

Triola (2008) references: ZInterval: p.338; TInterval: p.338; 1-PropZInt: p.312.
Critical values: t-table: A-3; z-table: A-2.

v.090710+
© 2009 by Chi-yin Pang
Page 2
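The σ-known mean interval, the proportion interval, and the minimum-n rules above can be sketched in Python using only the standard library (function names are illustrative; invNorm corresponds to `NormalDist().inv_cdf`):

```python
from math import ceil, sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf_level=0.95):
    """CI for a mean with sigma known (the sigma-known branch)."""
    alpha = 1 - conf_level
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value z_(alpha/2)
    E = z * sigma / sqrt(n)                   # margin of error
    return xbar - E, xbar + E

def prop_z_interval(x, n, conf_level=0.95):
    """CI for a proportion (the 1-PropZInt branch)."""
    alpha = 1 - conf_level
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p_hat = x / n
    E = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - E, p_hat + E

def min_n_for_mean(sigma, E, conf_level=0.95):
    """Minimum n so the margin of error is at most E (always round up)."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return ceil((z * sigma / E) ** 2)
```

For example, `z_interval(440, 75, 36)` gives roughly (415.5, 464.5), which would be read as "we are 95% confident that μ is between 415.5 and 464.5."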
HYPOTHESIS TESTS (§8-1)
Given a level of significance, α (e.g., α = 0.05).
(References are from Triola (2008).)

(§8-2, p.372) The significance level, α, is the risk you are willing to take of
rejecting a true null hypothesis, H0.
(§8-2, p.374) The P-value is the probability of observing a result that is at
least as extreme as the given sampled data, if H0 is true. Therefore a small
P-value is strong evidence that H0 is false.

Mean (μ)
  Check: 1. Random sample?
         2. n > 30 or normally distributed?
            (No information about the distribution and n ≤ 30: use nonparametric
            methods or bootstrapping, p.360. Don't touch!)

  σ Known (§8-4)
    Test Statistic: z = (x̄ − μ0) / (σ/√n)
    H0: μ = μ0
    H1: μ ≠ μ0 ⇒ P-value = 2-tail prob. = 2 × normalcdf(|z|, E99)
    H1: μ < μ0 ⇒ P-value = left-tail prob. = normalcdf(−E99, z)
    H1: μ > μ0 ⇒ P-value = right-tail prob. = normalcdf(z, E99)
    Or use TI's Z-Test (pp.402, 404): [STAT] > TESTS > 1:Z-Test
      If given raw data, use "Input: Data" & L1; if given statistics, use
      "Input: Stats". The test statistic is reported as "z = …" and the
      P-value as "p = …".

  σ Not Known (§8-5)
    Test Statistic: t = (x̄ − μ0) / (s/√n); use the t-distribution, d.f. = n − 1
    H0: μ = μ0
    H1: μ ≠ μ0 ⇒ P-value = 2-tail prob. = 2 × tcdf(|t|, E99, d.f.)
    H1: μ < μ0 ⇒ P-value = left-tail prob. = tcdf(−E99, t, d.f.)
    H1: μ > μ0 ⇒ P-value = right-tail prob. = tcdf(t, E99, d.f.)
    Or use TI's T-Test (p.411): [STAT] > TESTS > 2:T-Test
      If given raw data, use "Input: Data" & L1; if given statistics, use
      "Input: Stats". The test statistic is reported as "t = …" and the
      P-value as "p = …".

Proportion (p) (§8-3): Success = "1", Failure = "0" (a binomial distribution).
  Check: 1. Random sample?
         2. n big enough?
            p̂ = x/n, q̂ = 1 − p̂
            # successes ≥ 5?  # successes = x = n·p̂
            # failures ≥ 5?   # failures = n − x = n·q̂
            (Note: different books use 5 to 15.)
  Test Statistic: z = (p̂ − p0) / √(p0·q0/n)
  H0: p = p0
  H1: p ≠ p0 ⇒ P-value = 2-tail prob. = 2 × normalcdf(|z|, E99)
  H1: p < p0 ⇒ P-value = left-tail prob. = normalcdf(−E99, z)
  H1: p > p0 ⇒ P-value = right-tail prob. = normalcdf(z, E99)
  Or use TI's 1-PropZTest (pp.393-394): [STAT] > TESTS > 5:1-PropZTest
    Note: the input x MUST be an integer. If x is not given, compute x = n·p̂.
    The test statistic is reported as "z = …" and the P-value as "p = …".

Conclusion:
  α = level of significance (assume 0.05 if not given) (pp.376-377)
  If P-value ≤ α, then reject H0.
    "The observed sample statistic has a P-value of __ ≤ __ [α]. We reject H0
    with a significance level of α = ___. The evidence is statistically
    significant to support _______ [English of H1]."
  If P-value > α, then fail to reject H0.
    "The observed sample statistic has a P-value of __ > __ [α]. We failed to
    reject H0 with a significance level of α = ___. The evidence is not
    statistically significant to support _______ [English of H1]."

Triola (2008) references: H0 & H1: p.369; 3 types of H1's: pp.374-375;
Reject/Fail to Reject H0: pp.376-377; Types of errors: pp.378-380;
Critical region/value, α: pp.372-373; P-value: p.374.
Critical values: t-table: A-3; z-table: A-2.

v.090710
© 2009 by Chi-yin Pang
Page 3
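The proportion test above can be sketched with the Python standard library; normalcdf(−E99, z) corresponds to `NormalDist().cdf(z)`. The function name and numbers are illustrative:

```python
from math import sqrt
from statistics import NormalDist

def one_prop_ztest(x, n, p0, tail="two"):
    """1-PropZTest-style computation; returns (z, P-value)."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    cdf = NormalDist().cdf
    if tail == "left":
        p_value = cdf(z)                   # left-tail probability
    elif tail == "right":
        p_value = 1 - cdf(z)               # right-tail probability
    else:
        p_value = 2 * (1 - cdf(abs(z)))    # two-tailed probability
    return z, p_value
```

With x = 57, n = 100, p0 = 0.5 the two-tailed P-value is about 0.16, so at α = 0.05 we would fail to reject H0, following the conclusion template above.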
CONFIDENCE INTERVAL, 2-SAMPLE (comparing 2 groups)
Given a confidence level, 1 − α (e.g., C.L. = 0.95).
(References are from Triola (2008).)

Mean (μ1 − μ2)
  Matched Pairs (§9-4) (p.455)
    Given related data pairs (xi, yi), analyze the difference di = xi − yi as a
    1-sample confidence-interval problem.
    Caution: it might be more appropriate to analyze the ratio ri = xi/yi or
    the relative difference rdi = (xi − yi)/yi.
  Independent Samples
    Check: 1. Random samples?
           2. (n1 > 30 & n2 > 30), or both normal?
              (No information about the distributions & (n1 ≤ 30 or n2 ≤ 30):
              Don't touch!)
    Both σ1 & σ2 Known (§9-3)
      Confidence Level: 1 − α (compute α = 1 − C.L.)
      Critical Value: zα/2 = invNorm(α/2)
      Point Estimate: x̄1 − x̄2
      Margin of Error: E = zα/2·√( σ1²/n1 + σ2²/n2 )
      Confidence Interval: ( (x̄1 − x̄2) − E, (x̄1 − x̄2) + E )
      Or use TI's 2-SampZInt: [STAT] > TESTS > 9:2-SampZInt
        If given raw data, use "Input: Data" & L1, L2.
        If given statistics, use "Input: Stats".
    σ1 or σ2 Unknown (§9-3) (p.438)
      If σ1 = σ2, then see p.457 (pooled variance); otherwise, see p.450.
      Or use TI's 2-SampTInt: [STAT] > TESTS > 0:2-SampTInt
        If given raw data, use "Input: Data" & L1, L2.
        If given statistics, use "Input: Stats".
        If σ1 = σ2, use "Pooled: Yes"; otherwise, use "Pooled: No".

Proportion (p1 − p2) (§9-2): Success = "1", Failure = "0" (binomial distributions).
  Check: 1. Random samples?
         2. n's big enough?
            p̂1 = x1/n1, q̂1 = 1 − p̂1;  p̂2 = x2/n2, q̂2 = 1 − p̂2
            # successes1 ≥ 5?  # successes1 = x1 = n1·p̂1
            # failures1 ≥ 5?   # failures1 = n1 − x1 = n1·q̂1
            # successes2 ≥ 5?  # successes2 = x2 = n2·p̂2
            # failures2 ≥ 5?   # failures2 = n2 − x2 = n2·q̂2
  Confidence Level: 1 − α (compute α = 1 − C.L.)
  Critical Value: zα/2 = invNorm(α/2)
  Point Estimate: p̂1 − p̂2
  Margin of Error: E = zα/2·√( p̂1q̂1/n1 + p̂2q̂2/n2 )
  Confidence Interval: ( (p̂1 − p̂2) − E, (p̂1 − p̂2) + E )
  Or use TI's 2-PropZInt: [STAT] > TESTS > B:2-PropZInt
    Note: the inputs x1, x2 MUST be integers.

Conclusion:
  "We are __% confident that ___ [English of μ1 − μ2 or p1 − p2] is between _
  and _ [with units]."
  THEN make an inference between the two groups:
  If the interval is entirely positive: "We are __% confident that __ [English
  of μ1 or p1] is greater than ___ [English of μ2 or p2] by at least ___ [the
  lower limit, with unit]."
  If the interval is entirely negative: "We are __% confident that __ [English
  of μ2 or p2] is greater than ___ [English of μ1 or p1] by at least ___ [the
  absolute value of the upper limit, with unit]."
  (If the interval contains 0, the two means could be the same & we can't make
  the above conclusions.)

v.090710
© 2009 by Chi-yin Pang
Page 4
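The p1 − p2 interval above can be sketched with the standard library (function name and sample counts are illustrative):

```python
from math import sqrt
from statistics import NormalDist

def two_prop_interval(x1, n1, x2, n2, conf_level=0.95):
    """CI for p1 - p2 (2-PropZInt-style)."""
    alpha = 1 - conf_level
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value z_(alpha/2)
    p1, p2 = x1 / n1, x2 / n2
    E = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - E, diff + E
```

For example, `two_prop_interval(60, 100, 45, 100)` gives roughly (0.013, 0.287): entirely positive, so by the conclusion template we would say p1 is greater than p2 by at least about 0.013.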
HYPOTHESIS TESTS, 2-SAMPLE (comparing 2 groups)
(References are from Triola (2008).)

Mean (μ1, μ2)
  Matched Pairs (§9-4) (p.455)
    Given related data pairs (xi, yi), analyze the difference di = xi − yi as a
    1-sample hypothesis-test problem.
    Caution: it might be more appropriate to analyze the ratio ri = xi/yi or
    the relative difference rdi = (xi − yi)/yi.
  Independent Samples
    Check: 1. Random samples?
           2. (n1 > 30 & n2 > 30), or both normal?
              (No information about the distributions & (n1 ≤ 30 or n2 ≤ 30):
              Don't touch!)
    Both σ1 & σ2 Known (§9-3)
      Test Statistic: z = (x̄1 − x̄2) / √( σ1²/n1 + σ2²/n2 )
      H0: μ1 = μ2
      H1: μ1 ≠ μ2 ⇒ P-value = 2-tail prob. = 2 × normalcdf(|z|, E99)
      H1: μ1 < μ2 ⇒ P-value = left-tail prob. = normalcdf(−E99, z)
      H1: μ1 > μ2 ⇒ P-value = right-tail prob. = normalcdf(z, E99)
      Or use TI's 2-SampZTest: [STAT] > TESTS > 3:2-SampZTest
        If given raw data, use "Input: Data" & L1, L2.
        If given statistics, use "Input: Stats".
        The test statistic is reported as "z = …" and the P-value as "p = …".
    σ1 or σ2 Unknown (§9-3) (p.437)
      If σ1 = σ2, then see p.457 (pooled variance); otherwise, see p.450.
      Or use TI's 2-SampTTest: [STAT] > TESTS > 4:2-SampTTest
        If given raw data, use "Input: Data" & L1, L2.
        If given statistics, use "Input: Stats".
        If σ1 = σ2, use "Pooled: Yes"; otherwise, use "Pooled: No".
        The test statistic is reported as "t = …" and the P-value as "p = …".

Proportion (p1, p2) (§9-2): Success = "1", Failure = "0" (binomial distributions).
  Check: 1. Independent random samples?
         2. n's big enough?
            p̂1 = x1/n1, q̂1 = 1 − p̂1;  p̂2 = x2/n2, q̂2 = 1 − p̂2
            # successes1 ≥ 5?  # successes1 = x1 = n1·p̂1
            # failures1 ≥ 5?   # failures1 = n1 − x1 = n1·q̂1
            # successes2 ≥ 5?  # successes2 = x2 = n2·p̂2
            # failures2 ≥ 5?   # failures2 = n2 − x2 = n2·q̂2
  Pooled Sample Proportion: p̄ = (x1 + x2) / (n1 + n2), q̄ = 1 − p̄
  Test Statistic:
    z = (p̂1 − p̂2) / √( p̄q̄/n1 + p̄q̄/n2 ) = (x1/n1 − x2/n2) / √( p̄q̄/n1 + p̄q̄/n2 )
  H0: p1 = p2
  H1: p1 ≠ p2 ⇒ P-value = 2-tail prob. = 2 × normalcdf(|z|, E99)
  H1: p1 < p2 ⇒ P-value = left-tail prob. = normalcdf(−E99, z)
  H1: p1 > p2 ⇒ P-value = right-tail prob. = normalcdf(z, E99)
  Or use TI's 2-PropZTest: [STAT] > TESTS > 6:2-PropZTest
    Note: the inputs x1, x2 MUST be integers.
    The test statistic is reported as "z = …" and the P-value as "p = …".

Conclusion:
  α = level of significance (e.g., assume α = 0.05 if not given)
  If P-value ≤ α, then reject H0.
    "The observed sample statistic has a P-value of __ ≤ __ [α]. We reject H0
    with a significance level of α = ___. The evidence is statistically
    significant to support ____ [English of H1]."
  If P-value > α, then fail to reject H0.
    "The observed sample statistic has a P-value of __ > __ [α]. We failed to
    reject H0 with a significance level of α = ___. The evidence is not
    statistically significant to support ____ [English of H1]."

v.090710
© 2009 by Chi-yin Pang
Page 5
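The pooled-proportion test statistic above can be sketched with the standard library (two-tailed form only; the function name and counts are illustrative):

```python
from math import sqrt
from statistics import NormalDist

def two_prop_ztest(x1, n1, x2, n2):
    """2-PropZTest-style computation, two-tailed; returns (z, P-value)."""
    p_pool = (x1 + x2) / (n1 + n2)        # pooled sample proportion
    q_pool = 1 - p_pool
    se = sqrt(p_pool * q_pool * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # 2 x tail probability
    return z, p_value
```

With x1 = 60, n1 = 100, x2 = 45, n2 = 100 this gives z ≈ 2.12 and a P-value of about 0.034, so at α = 0.05 we would reject H0: p1 = p2.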
2-Sample Data, Correlation & Regression Road Map
(The original page is a flowchart; its branches are outlined below.)

2-SAMPLE DATA

For comparison of the same kind of data:
  Mean (μ):
    Matched pairs: take the difference and treat it as a 1-sample problem,
      using ZInterval / Z-Test or TInterval / T-Test.
    Independent samples: 2-SampZInt / 2-SampZTest, 2-SampTInt / 2-SampTTest.
  Proportion (p): 2-PropZInt / 2-PropZTest.
  (For more information about the non-correlation type of analysis, see the
  1-sample and 2-sample Confidence Interval and Hypothesis Test flowcharts.)

For studying the relationship between the Explanatory Variable (x) & the
Response Variable (y): analyze with correlation & regression. These techniques
try to find the association between the variables by fitting a function to the
observed (xi, yi) points. They also assess the strength of the association.

  Fit a LINEAR function (y = ax + b) to the data ("linear regression"):
    Use [STAT] > TESTS > LinRegTTest.
    Another related function is [STAT] > TESTS > LinRegTInt.
    Or turn on diagnostics first:
      [2nd] > [0] (CATALOG) > DiagnosticOn [ENTER] > [ENTER]
    then use
      [STAT] > CALC > LinReg(ax+b) L1, L2, Y1
      [STAT] > CALC > LinReg(a+bx)
      [STAT] > CALC > Med-Med
      [STAT] > CALC > Manual-Fit
    (See the "Linear Regression" flowchart for more detail.)

  Fit a NON-LINEAR function to the data ("non-linear regression"):
    Use QuadReg   for y = ax² + bx + c
    Use CubicReg  for y = ax³ + bx² + cx + d
    Use QuartReg  for y = ax⁴ + bx³ + cx² + dx + e
    Use LnReg     for y = a + b·ln x
    Use ExpReg    for y = a·b^x
    Use PwrReg    for y = a·x^b
    Use Logistic  for y = c / (1 + a·e^(−bx))
    Use SinReg    for y = a·sin(bx + c) + d

Ver.090428
© 2009 by Chi-yin Pang
Page 6
LINEAR REGRESSION
(References are from Triola (2008).)
Given n pairs of (xi, yi)'s. Regression was done (e.g., with LinReg(a+bx)) &
the regression line is y = a + bx with correlation coefficient r. (The symbols
for the corresponding population parameters are α, β, ρ.)

Check: (p.488)
  1. Random sample from all the possible (x, y)'s?
  2. Does the scatterplot approximate a straight line?
  3. Remove any erroneous outliers.
  4. At each x value, y is normally distributed and with the same σ.

Testing ρ & β: these are tests for whether x and y are correlated.
  Test the hypotheses
    H0: β & ρ = 0
    H1: β & ρ ≠ 0 (or < 0; or > 0)
  with a given level of significance α (e.g., α = 0.05).
  (Unfortunately, α is used in two different senses here.)

  Testing ρ, the population correlation coefficient (§10-2):
    Compute t = r·√(n − 2) / √(1 − r²), d.f. = n − 2.
    H0: ρ = 0
    H1: ρ ≠ 0 ⇒ P-value = 2 × tcdf(|t|, E99, d.f.)
    H1: ρ < 0 ⇒ P-value = tcdf(−E99, t, d.f.)
    H1: ρ > 0 ⇒ P-value = tcdf(t, E99, d.f.)
    (For a conf. int. for ρ, see p.508 #39.)

  Testing β, the population slope:
    SSx = Σx² − (Σx)²/n,  SSy = Σy² − (Σy)²/n,  SSxy = Σxy − (Σx)(Σy)/n,
    slope b = SSxy / SSx,
    the Standard Error of the Estimate is Se = √( (SSy − b·SSxy) / (n − 2) ),
    t = b / (Standard Error for b) = b / ( Se/√SSx ), d.f. = n − 2.
    H0: β = 0
    H1: β ≠ 0 ⇒ P-value = 2 × tcdf(|t|, E99, d.f.)
    H1: β < 0 ⇒ P-value = tcdf(−E99, t, d.f.)
    H1: β > 0 ⇒ P-value = tcdf(t, E99, d.f.)

  Or use TI's LinRegTTest:
    Enter the (x, y)'s into L1, L2.
    [STAT] > TESTS > E:LinRegTTest
    The test statistic is reported as "t = …" and the P-value as "p = …".
  Or find the conf. int. of the slope β for conf. level c = 1 − α by using
  TI-84's LinRegTInt:
    Enter the (x, y)'s into L1, L2.
    [STAT] > TESTS > G:LinRegTInt…
    The output indicates the conf. interval for the slope β. Use that to make
    the appropriate conclusion.
    (For a CI for α and a CI for β, see p.533 #25.)

  Conclusion:
    α = level of significance
    If P-value ≤ α, then reject H0. "The observed sample statistic has a
    P-value of __ ≤ __ (α). We reject H0 with a significance level of α = ___.
    The evidence is statistically significant to support ____ [English of H1]."
    If P-value > α, then fail to reject H0. "The observed sample statistic has
    a P-value of __ > __ (α). We failed to reject H0 with a significance level
    of α = ___. The evidence is not statistically significant to support ____
    [English of H1]."

Prediction Intervals (§10-4)
  Given a specific xs, we want a confidence interval to trap the population
  mean of y corresponding to xs, with confidence level c (e.g., c = 0.95).
  Confidence Level: c (a given probability value)
  Degrees of Freedom: d.f. = n − 2
  Critical Value: tc, which has P(−tc < t < tc) = c under the t distribution
    with (n − 2) d.f.
  Point Estimate: yp = a + b·xs, where xs is the specific given x value and yp
    is the predicted value according to the regression line, y = a + bx.
  Error Tolerance: E = tc · Se · √( 1 + 1/n + (xs − x̄)²/SSx )
    (For Se and SSx, see the formulas under "Testing β".)
  Confidence Interval: ( yp − E, yp + E )   (See p.529, p.534 #26.)
  (CAUTION: TI-84's LinRegTInt is for getting the confidence interval for the
  slope β, not the confidence interval for the predicted y's mean. There are
  no "TI tests" for this case.)
  Conclusion: "We are 95% confident that the true mean of copper sulfate that
  will dissolve in 100 g of water at 45°C is between 26.5 g and 39.5 g."

v.090710
© 2009 by Chi-yin Pang
Page 7
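The SSx/SSy/SSxy formulas above can be sketched in Python with the standard library (the function name and data are illustrative; the P-value would then come from the t-distribution with n − 2 d.f., e.g., TI's tcdf):

```python
from math import sqrt

def linreg_ttest_stats(xs, ys):
    """Slope b, intercept a, correlation r, and the t statistic for
    H0: beta = 0 (d.f. = n - 2), via the SSx/SSy/SSxy formulas."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    ss_x = sum(x * x for x in xs) - sx * sx / n
    ss_y = sum(y * y for y in ys) - sy * sy / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sx * sy / n
    b = ss_xy / ss_x                          # slope
    a = sy / n - b * sx / n                   # intercept
    r = ss_xy / sqrt(ss_x * ss_y)             # correlation coefficient
    se = sqrt((ss_y - b * ss_xy) / (n - 2))   # standard error of the estimate
    t = b / (se / sqrt(ss_x))                 # test statistic, d.f. = n - 2
    return a, b, r, t
```

On the toy data (1,2), (2,4), (3,5), (4,4), (5,6) this reproduces the fitted line y = 1.8 + 0.8x with r ≈ 0.853 and t ≈ 2.83 on 3 d.f.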
CHI-SQUARE (χ2) TESTS: tests that use the χ2 distributions.
(References are from Triola (2008).)

Test of Independence (§11-3)
  Example ("Titanic") contingency table:
            1st   2nd   3rd   Crew
    Alive   202   118   178   212
    Dead    123   167   528   673
  Given the observed frequencies in a contingency table with R rows and C
  columns, we want to test whether the variables in the rows and columns are
  independent.
  Check: 1. Random sample?
         2. Each expected count, Eij ≥ 5?
            (NOTE: perform TI's "χ2-Test" (see below), then check the output
            expected-count matrix [B].)
  H0: The two variables are independent.
  H1: The two variables are NOT independent.
  Let Oij = the observed frequency.
  Compute the expected frequencies (assuming H0):
    Eij = (Row total)(Column total) / (Sample size)
  Compute the statistic χ2 = Σ (Oij − Eij)² / Eij.
  Use the chi-square (χ2) distribution with d.f. = (R − 1)(C − 1).
  P-value = the right-tail prob. = χ2cdf(χ2, E99, d.f.)
  Or use TI's χ2-Test:
    [2nd][MATRX] > EDIT; enter the observed frequencies, Oij, into matrix [A].
    [STAT] > TESTS > C:χ2-Test
    The test statistic is reported as "χ2 = …" and the P-value as "p = …"; the
    computed expected counts, Eij's, are stored in matrix [B].

Goodness of Fit (§11-2)
  Given an assumed distribution of k categories with expected probabilities
  p1, p2, …, pk. An observation was made and the observed frequencies for the
  categories are O1, O2, …, Ok. Test how well the observation fits the
  distribution.
  Check: 1. Random sample?
         2. Each expected count, Ei = n·pi ≥ 5?
  H0: The population fits the assumed distribution.
  H1: The population has a different distribution.
  Compute the statistic χ2 = Σ (Oi − Ei)² / Ei.
  Use the chi-square (χ2) distribution with d.f. = k − 1.
  P-value = the right-tail prob. = χ2cdf(χ2, E99, d.f.)
    [2nd][VARS](DISTR) > DISTR > *:χ2cdf(LtLim, RtLim, df)
  Or use TI-84's χ2GOF-Test:
    Enter the observed values in L1 & the expected values in L2.
    [STAT] > TESTS > D:χ2GOF-Test…
    The test statistic is reported as "χ2 = …" and the P-value as "p = …".

Conf. Int. for σ (§7-5)
  For a conf. level = 1 − α (e.g., 0.95).
  Check: 1. Random sample?
         2. Population is NORMAL? (even for large n)
  d.f. = n − 1
  χ²L = χ2CDF⁻¹(α/2, d.f.),  χ²R = χ2CDF⁻¹(1 − α/2, d.f.)
  CI: √( (n − 1)s² / χ²R ) < σ < √( (n − 1)s² / χ²L )
  TI has no "inverse χ2CDF". Use (shown for α = 0.05; replace .025 by α/2)
    CATALOG > solve(χ2cdf(0, X, d.f.) − .025, X, 0)   for χ²L
    CATALOG > solve(χ2cdf(X, E99, d.f.) − .025, X, 0) for χ²R

Test of Std. Dev., σ (§8-6)
  Given a "status quo" σ0 and a sample standard deviation, s (from a sample of
  size n), test the true σ against the "status quo" σ0.
  Check: 1. Random sample?
         2. The population is NORMAL? (even for large n)
  χ2 = (n − 1)·s²/σ0², with d.f. = n − 1
  H0: σ = σ0  (CAUTION: don't forget to square σ0)
  H1: σ < σ0 ⇒ P-value = χ2cdf(0, χ2, d.f.)
  H1: σ > σ0 ⇒ P-value = χ2cdf(χ2, E99, d.f.)
  H1: σ ≠ σ0 ⇒ P-value = 2 × min( χ2cdf(0, χ2, d.f.), χ2cdf(χ2, E99, d.f.) )
  No TI "test" available. Use
    [2nd][VARS](DISTR) > DISTR > *:χ2cdf(LtLim, RtLim, df)

Conclusion:
  α = level of significance
  If P-value ≤ α, then reject H0. "The observed sample statistic has a P-value
  of __ ≤ __ (α). We reject H0 with a significance level of α = ___. The
  evidence is statistically significant to support ____ [English of H1]."
  If P-value > α, then fail to reject H0. "The observed sample statistic has a
  P-value of __ > __ (α). We failed to reject H0 with a significance level of
  α = ___. The evidence is not statistically significant to support ____
  [English of H1]."

v.090710
© 2009 by Chi-yin Pang
Page 8
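The goodness-of-fit computation above can be sketched with the standard library (function name and data are illustrative; an exact P-value is supplied only when d.f. = 2, where the right tail is e^(−χ2/2)):

```python
from math import exp

def chisq_gof(observed, probs):
    """Goodness-of-fit chi-square statistic; exact P-value only for d.f. = 2."""
    n = sum(observed)
    expected = [n * p for p in probs]   # E_i = n * p_i (each should be >= 5)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    # df = 2 has the closed-form right tail exp(-chi2/2); for other df use a
    # chi-square CDF (TI's χ2cdf or a stats library).
    p_value = exp(-chi2 / 2) if df == 2 else None
    return chi2, df, p_value
```

For example, observed counts [50, 30, 20] against an assumed uniform distribution give χ2 = 14 with a tiny P-value, so H0 (the population fits the assumed distribution) would be rejected.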
ANOVA (ANalysis Of VAriance) (§11-4)
(References are from Triola (2008).)
Given measured samples from k populations, test the hypotheses:
  H0: μ1 = μ2 = … = μk
  H1: not (μ1 = μ2 = … = μk)
Check: (p.585)
  1. Random samples?
  2. No two sets of data are matched pairs?
  3. Distributions are all approximately normal?
     (Actually, "not very far from normal" would still give good results.)
  4. σ1 = σ2 = … = σk?
Let: k = number of populations being compared
     ni = number of values in the i-th sample
     x̄i = sample mean of the i-th sample
     si = sample standard deviation of the i-th sample
     N = Σ ni = number of values in all samples combined
     x̄̄ = mean of all sample values combined

Test Statistic:
  F = (variance between samples) / (variance within samples)
    = [ Σ ni·(x̄i − x̄̄)² / (k − 1) ] / [ Σ (ni − 1)·si² / Σ (ni − 1) ]

The test statistic has an "F distribution" with:
  numerator degrees of freedom = "ndf" = k − 1
  denominator degrees of freedom = "ddf" = N − k
P-value = right-tail probability = TI's Fcdf(F, E99, ndf, ddf)
  Keys: [2nd][VARS](DISTR) > DISTR > *:Fcdf(
Or use TI's ANOVA(L1, …, Lk):
  [STAT] > TESTS > *:ANOVA(
  The test statistic is reported as "F = …" and the P-value as "p = …".

Conclusion:
  α = level of significance
  If P-value ≤ α, then reject H0. "The observed sample statistic has a P-value
  of __ ≤ __ (α). We reject H0 with a significance level of α = ___. The
  evidence is statistically significant to support ____ [English of H1]."
  If P-value > α, then fail to reject H0. "The observed sample statistic has a
  P-value of __ > __ (α). We failed to reject H0 with a significance level of
  α = ___. The evidence is not statistically significant to support ____
  [English of H1]."

v.090710
© 2009 by Chi-yin Pang
Page 9
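The F-statistic formula above can be sketched with the standard library (the function name and the three toy samples are illustrative; the P-value would then come from the right tail of the F distribution with (ndf, ddf), e.g., TI's Fcdf(F, E99, ndf, ddf)):

```python
from statistics import mean, variance

def anova_f(*samples):
    """One-way ANOVA F statistic with its (ndf, ddf) degrees of freedom."""
    k = len(samples)                          # number of populations compared
    N = sum(len(s) for s in samples)          # number of values in all samples
    grand_mean = mean(x for s in samples for x in s)
    # variance between samples: sum of n_i*(xbar_i - grand)^2 over (k - 1)
    between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples) / (k - 1)
    # variance within samples: sum of (n_i - 1)*s_i^2 over sum of (n_i - 1) = N - k
    within = sum((len(s) - 1) * variance(s) for s in samples) / (N - k)
    return between / within, k - 1, N - k
```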
San Jose City College
Math 63 Statistics
Ver. 071118

Hypothesis Testing: Setup and Decision Methods

Setting up the null & alternative hypotheses (whether the claim is the
"strict inequality" or the statement with "=" embedded):
  Claim μ ≠ μ0 (or p ≠ p0), or claim μ = μ0 (or p = p0):
    Two-tailed test:   H0: μ = μ0, H1: μ ≠ μ0   (H0: p = p0, H1: p ≠ p0)
  Claim μ < μ0 (or p < p0), or claim μ ≥ μ0 (or p ≥ p0):
    Left-tailed test:  H0: μ = μ0, H1: μ < μ0   (H0: p = p0, H1: p < p0)
  Claim μ > μ0 (or p > p0), or claim μ ≤ μ0 (or p ≤ p0):
    Right-tailed test: H0: μ = μ0, H1: μ > μ0   (H0: p = p0, H1: p > p0)

In every case the test statistic is
  z = (x̄ − μ)/(σ/√n)  or  t = (x̄ − μ)/(s/√n)  or  z = (p̂ − p)/√(pq/n).

P-value Method (compare the P-value (probability) to α (the significance level)):
  Two-tailed test:   P-value = 2 × tail probability
                     = 2 × tcdf(|t|, E99, n − 1) or 2 × normalcdf(|z|, E99)
  Left-tailed test:  P-value = left-tail probability
                     = tcdf(−E99, t, n − 1) or normalcdf(−E99, z)
  Right-tailed test: P-value = right-tail probability
                     = tcdf(t, E99, n − 1) or normalcdf(z, E99)
  Reject H0 if P-value ≤ α.

Traditional Method (compare the test statistic to the critical value (the
t-score or z-score of α)):
  Two-tailed test:   Critical Value = invT(α/2, n − 1) or invNorm(α/2);
                     reject H0 if |Test Stat| ≥ |Critical Value|.
  Left-tailed test:  Critical Value = invT(α, n − 1) or invNorm(α)
                     (a negative number);
                     reject H0 if Test Stat ≤ Critical Value.
  Right-tailed test: Critical Value = −invT(α, n − 1) or −invNorm(α);
                     reject H0 if Test Stat ≥ Critical Value.

© Chi-yin Pang, 2007
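The two decision methods always agree, which can be checked in a small standard-library sketch for the right-tailed z case (the function name is illustrative):

```python
from statistics import NormalDist

def right_tailed_z_decisions(z, alpha=0.05):
    """Right-tailed z test: the P-value method and the traditional
    (critical-value) method reach the same decision."""
    nd = NormalDist()
    p_value = 1 - nd.cdf(z)            # right-tail probability
    critical = nd.inv_cdf(1 - alpha)   # z-score with alpha in the right tail
    reject_by_p = p_value <= alpha
    reject_by_critical = z >= critical
    assert reject_by_p == reject_by_critical
    return p_value, critical, reject_by_p
```

For example, z = 1.8 at α = 0.05 gives a P-value of about 0.036 ≤ 0.05 and a test statistic 1.8 ≥ 1.645: both methods reject H0.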