math 108: introduction to progability and statisics

advertisement
MATH 108: INTRODUCTION TO PROGABILITY AND STATISICS
SHORELINE COMMUNITY COLLEGE
MIKE MORAN, INSTRUCTOR
TABLE OF CONTENTS
Index of TI-83 Programs ...................................................................................................................... 2
Add-in Programs ..................................................................................................................................... 5
WS #21 Histograms ................................................................................................................................ 8
WS #22 Median, Mean, Standard Deviation, Box Plots ........................................................... 11
WS #23 Normal Distributions ......................................................................................................... 13
WS #24 Scatter Plots, Correlation and Linear Regression .................................................... 13
WS #26 Proportions: Intervals and Tests ................................................................................... 23
WS #27 Normal Probability Plot .................................................................................................... 30
WS #28 Means: Confidence Intervals and Tests ....................................................................... 32
WS #29 Comparing Counts – Chi-Square Tests ......................................................................... 37
1
Index of TI-83 Programs
Type of Statistics Problem
TI-83 Command
Mean, Median, Quartiles, Standard
Deviation
Find the specific values for a data set
1-Var Stats
Stat-Edit (Enter Data)
Stat-Calc-1 (Calculate
Values)
Histogram Plot
Statistics Plots
Given a set of data, make a vertical bar chart 2nd-Stat Plot (3rd Icon Type)
showing the frequencies of occurrences of
equal intervals of the data.
Box Plot
Statistics Plots
Given a set of data, plot the minimum, first
2nd-Stat Plot (4th and 5th Icon
quartile, median, third quartile, and
types.
maximum
Scatter Plot
Statistics Plots
Given two sets of data, plot corresponding
2nd-Stat Plot (1st Icon Type)
values as points on an x-y coordinate plane
Normal Probability Plot
Statistics Plots
Given a set of data, check to see if it is a
2nd-Stat Plot (6th Icon Type)
normal distribution. If the plot results in a
straight line it is normal
Normal Distribution
PGRM
Given lower bound (L) and upper bound(U)
NORMDIST 1
(L, U,  ,  )
of interval, find the area above the interval
Normal Distribution
PGRM
Given the area from the left, find the right
NORMDIST 2
bound on the x-axis interval
(Area from left as a decimal,
,)
T Distribution
PGRM
Given a sample mean, find the p-value
TDIST
Critical Values
PGRM
Find the critical value for a level of
CRITVAL
confidence to use to construct a confidence
interval of a proportion or mean
2
Reference1
WS #22
WS #21
WS #22
WS #24
WS #27
WS #23
Add-in
Programs
WS #23
Add-in
Programs
Add-in
Programs
Add-in
Programs
Type of Statistics Problem
Standard Error
Find the standard error to use to construct a
confidence level or hypothesis test
Sample Size
Given a desired confidence level and margin
of error, find the required sample size
Confidence Interval – One Proportion
Find a confidence interval for a sample
proportion
Confidence Interval – Two Proportions
Find confidence interval for the difference of
two sample proportions
Confidence Interval – Mean
Find the confidence interval for a sample
mean when the population standard
deviation is estimated by the standard error
Confidence Interval – Two Means
Find the confidence Interval for the difference
of two sample means when the standard
deviation is estimated by the standard error
Hypothesis Test – One Proportion
Given one sample proportion, find the p-value
that it is a sampling variation of a given
value
Hypothesis Test – Two Proportions
Given two sample proportions, find the pvalue that their difference is a sampling
variation of a true difference of zero.
Hypothesis Test – Mean
Given one sample mean, find the p-value that
it is a sampling variation of a given value.
Hypothesis Test – Two Means
Given two sample means, find the p-value
that their difference is a sampling variation
of a true difference of zero.
TI-83 Command
Reference
PGRM
STDERROR
Add-in
Programs
PGRM
SAMPLSIZ
Add-in
Programs
1-PropZInt
Stat-Tests-A
WS #26
2-PropZInt
Stat-Tests-B
WS #26
TInterval
Stat-Tests-8
WS #28
2-SampTInt
Stat-Tests-0
WS #28
1-PropZTest
Stat-Tests-5
WS #26
2-PropZTest
Stat-Tests-6
WS #26
T-Test
Stat-Tests-2
WS #28
2-SampTTest
Stat-Tests-4
WS #28
3
Type of Statistics Problem
TI-83 Command
Paired Samples – Confidence Interval
Given two sets of data with related numbers
for each subject, i.e. before treatment and
after, find the confidence interval around the
sum of the pair-wise differences
Paired Samples –Hypothesis Test
Test the single list of the pair-wise differences
Comparing Counts – Chi-Square
Distribution
Given a set of counts related to a categorical
variable, check the “goodness of fit” to a
particular model.
Given multiple distributions of categorical
variables, check to see if they are
homogeneous.
Given a two-way table for two categorical
variables, check to see if they are
independent.
Linear Regression and Correlation
Given two sets of data, find the equation of
the line that best models the relationship
Reference
PGRM
PARDSAMP
Confidence Interval
Add-in
Programs
PGRM
PARDSAMP
Hypothesis Test
 2 cdf
2nd Dist-7
Add-in
Programs
WS #29
 2 Test
Stat-Tests-Alpha C
PRGM
GOODFIT
LinReg(ax+b)
Stat-Calc-4
PGRM
REGBASIC
4
WS #24
Add-in
Programs
Add-in Programs
The TI-83 and TI 84 Calculators come with a variety of useful special
commands for Statistics. The instructor has assembled additional programs
which expand the functionality of the TI-83 and TI 84 calculators to be
comparable with the TI 89 Calculator plus some extras.
These programs must be added to the individual calculators. There are three
ways to get the programs added to your calculator.
Bring your calculator to the instructor's during office hours and he will
transfer the programs to your calculator from his calculator. You may also
tape a piece of paper to your calculator with your name and give it to the
instructor at the end of class. He will return it to you the next day with the
programs added to it.
You may transfer the programs to your calculator from another calculator
that has the programs loaded.
You may download the programs from this web site to your computer and
then download them to your calculator from your computer.
Following is a description of each of the programs. To download the program
to your computer, click on the program name.
CRITVAL (Critical Values). This program is used to find the critical values to
calculate confidence intervals. The input is the confidence level for a
proportion as a decimal, and the confidence level and the degrees of
freedom for a mean. The output is the critical value.
NORMDIST (Normal Distribution). This program is used to find values
associated with the normal distribution. If given an interval on the x-axis,
the program will calculate the probability that the x-value is in a given
interval. The inputs are the lower bound of the interval, the upper bound of
the interval, the mean of the normal distribution and the standard deviation
of the distribution. The output is the probability as a decimal. If given a
percentile, a cumulative percentage, the program will calculate the
associated x-axis value. The inputs are the percentile as a percent, the
mean of the normal distribution and the standard deviation of the
5
distribution. The output is the associated x-axis interval. The left bound of
the interval is always -E99, or negative infinity. The right bound is the value
of the percentile. WS #23 further illustrates the use of this program.
GOODFIT (Goodness of Fit). This program uses the Chi-Square Distribution to
test the goodness of fit of a data list to a model. The inputs are the data list
and the model list. The output includes the Chi-Square Statistic, the pvalue, and the cell-by-cell residuals. TI-83 Worksheet #29 demonstrates the
use of the program.
PARDSAMP (Paired Samples). This program is used to find a confidence
interval or a p-value for a hypothesis test when the data is from a paired
samples design. The inputs are the respective lists for the data. The outputs
are a normal probability plot and a confidence interval or a p-value.
REGBASIC (Simple Regression). This program streamlines the use of the TI-83
operations to perform simple regression. The program inputs are the x-list
and y-list. The program displays a scatter plot (optional), finds the
regression equation, r and r-squared, displays the scatter plot with the
regression line (optional), displays the residual plot (optional), and
calculates predicted y- values (optional). TI -83 Worksheet #24
demonstrates the use of the program.
SAMPLSIZ (Sample Size). This program estimates the sample size required
for a particular level of confidence and margin of error for proportions and
means. For proportions the input is the estimated proportion, the
confidence level, and the desired margin of error. For means, the inputs are
the estimated standard deviation, the confidence level, and the desired
margin of error. The output is the required sample size.
STDERROR (Standard Error). This program calculates the standard error for a
variety of situations: one proportion, two proportions, two proportions
pooled, one mean, two means (also includes as output the adjusted
degrees of freedom), the regression estimate of a y-value mean, and the
estimate for a y-value single point.
TDIST (T Distribution). This program calculates the T-Distribution p- value for a
one-tail and two-tail hypothesis test for means. The inputs are: the lower
bound of the interval, the upper bound of the interval, the mean, the
standard deviation, and the degrees of freedom. The output is the p-value.
6
7
WS #21 Histograms
Following is a list of countries and the percent of ninth graders who have used
marijuana. Make a histogram of the data.
Austria 10% Belgium 19%
Denmark 17%
England 40%
Finland 5%
France 12%
Germany 21%
Greece 2%
Iceland 10%
Ireland 37%
Italy 19%
Luxembourg 6% Netherlands 31% No. Ireland 23%
Portugal 7% Scotland 53%
Spain 15%
Sweden 6%
Norway 6%
Switzerland 27%
The histogram will show the “percent usage” categories on the x-axis and the number
of countries in the individual categories on the y-axis. We need only to enter the
percentages, not the countries. A histogram is a one-variable display graph.
Key Strokes
Comment
(Enter Data)
Stat
Displays the
statistics menu
1
Selects the list
editor
(Clear list if not empty)
8
▲◄
Move cursor to highlight L1
Clear Enter
Clears list 1
10 Enter 19 Enter …27 Enter
Enters the percentages in the list
2nd Statplot
Displays statplot menu
1
Selects first of 3
statplots
Enter
Turns statplot 1 on
▼►►
Highlights the histogram icon (3rd type)
Enter
Selects histogram type
▼ 2nd L1
Selects L1 as XList
▼1
Sets frequency to one (If you can’t type a 1, it
is because the calculator is in alpha mode. Hit
the ALPHA key to take calculator out of alpha
mode.
Zoom 9
Selects standard
statistics window –
Displays histogram
Trace
Displays properties of the histogram. Indicates
the first intgerval category is [2, 12.2), and n=9
numbers in that category.
►
Indicates the second category is [12.2, 22.4)
and n=6 numbers in that categroy
(Change the range of data an the width of
intervals)
9
Window
Displays window
settings.
Xmin=0 Xmax=60, Xscl=10
Xscl=10 sets the size of the categories at 10
units wide
Graph Trace
Displays the
histogram with
the revised range
and category size.
10
WS #22 Median, Mean, Standard Deviation, Box Plots
Following is a list of data which represents the cost per minute in cents for long
distance calls to a sample of countries served by Net2Phone.
7.9 17
3.9 9.9 15 9.9 7.9 7.9 7.9 49
9.9 7.9 16
49
21 6.9 11
7.9 8.9
9.9 9.9 7.9 3.9 22
Find the mean and standard deviation.
Key Strokes
Comment
Stat 1 Enter
Brings up list menu
60 Enter 61 Enter ...76 Enter
Enter Height data in list 1, L1
►
Move cursor to list 2, L2
2 Enter 6 Enter…1 Enter
Enter Count in L2
Stat ► 1 Enter
Selects One-Variable
Statistics from
CALC menu and puts command on the home
screen
2nd L1 Enter
Display
x  67.115 (mean)
S x  3.792
(standard deviation)
11
Find the 5-Number Summary for the Data, that is, find the Minimum, First
Quartile, Median, Third Quartile, and Maximum.
Key Strokes
▼▼▼▼▼
Comment
Display:
Min X = 60
Q1=65
Med = 66.000
Q3=70.000
maxX=76.000
Make a Box Plot for the data with the outliers plotted separately.
Key Strokes
Comment
2nd Statplot 1
Select Plot 1
Enter
Turns Plot 1 on
▼ ► ► ► Enter
Highlight and select 4th icon. (The next icon is
a box plot withour the outlilers plotted
separately)
▼ 2nd L1
Makes L1 the XList
12
WS #23 Normal Distributions
Solving Problems with the Add-in Program NORMDIST
Problem: Given a normal distribution for Scholastic Aptitude Test (SAT) is N(500,
100), that is, the mean is 500 and the standard deviation is 100. What is the
probability a student scored between 400 and 650?
Key Strokes
Display/Comment
PGRM
Brings up the Add-in Program Menu
3 ENTER
Shows the option for two types of problems.
We will select 1 because our problem gives
us the interval (400, 600) and asks for the %
of individual students in that interval.
1 ENTER
The program asks for the lower bound of the
interval which is 400.
400 ENTER
The program asks for the upper bound of the
13
interval which is 650.
650 ENTER
The program now asks for the mean of the
normal distribution which is 500.
500 ENTER
The program now asks for the Standard
Deviation of the normal distribution which is
100.
100 ENTER
The program gives the answer: The
probability the student scored between 400
and 650 is 0.7745%
14
Problem: Given a normal distribution for Scholastic Aptitude Test (SAT) is N(500,
100), that is, the mean is 500 and the standard deviation is 100. What percent of
individuals scored less than 300?
Key Strokes
Display/Comment
PRGM 3 ENTER
The interval we are looking for is (-  , 300).
Since the calculator does not have the
 symbol, we have to use the smallest
number in the calculator which is  1 10 99 or
-E99. The calculator interval is (-E99, 300).
We select 1.
1 ENTER
- 2ND EE 99 ENTER
300 ENTER
500 ENTER
15
100 ENTER
The probability a student scored less than
300 is .0228.
Problem: Given a normal distribution for Scholastic Aptitude Test (SAT) is N(500,
100), that is the mean is 500 and the standard deviation is 100. What score would it
take to be in the in the top 10% of all students.
Key Strokes
Display/Comment
PGRM 3 ENTER
The top 10% is the rightmost 10% area.. We
are given an area, and we need to find the
right bound on the x-axis We select 2.
2 ENTER
The area from the left is 100-.10 = 0.90
16
0.90 ENTER
500 ENTER
100 ENTER
The score required is 628.1552. Any
score above this number will be in the
top 10% of all socres.
17
Solving Problems with the TI-38 built in programs: NORMALCDF and INVNORMAL
Problem: Given a normal distribution for Scholastic Aptitude Test (SAT) is N(500,
100), that is, the mean is 500 and the standard deviation is 100. What percent of
students had scores less that 750.
When we know the interval (, 750) and want the area above it, we use the
Normalcdf command. This command takes the form of Normalcdf (lower bound,
upper bound, mean, standard deviation).
The TI-83 has no symbol or negative or positive infinity,   or   , so we use
 10 99 for negative infinity and 10 99 for positive infinity. These are the smallest and
largest numbers the TI 83 will take.
Key Strokes
Comment
2nd DISTR
Displays the
Distribution
Menu
2 Enter
Displays the
normalcdf on the
home page
- 2nd EE 99 , 750 , 500, 100 )
Enters the
parameters for
the command
18
Enter
Displays the answer 0.9938. This means that
99.38% of the test scores are lower than
750.
Problem: Given the SAT distribution of N(500, 100), what score would it take to get
into the top 10% of all tests.
To be in the top 4% of the tests would require a score above the 96th percentile.
For this problem, we use the invNoraml command. This command takes the form
invNormal (percentile, mean, standard deviation).
Key Strokes
Comment
2nd Distr 3
0.96 , 500, 100, )
Add the proper parameters to the command
Enter
Display :
A test score of 675.069 will be above 96
percent of all tests taken and in the top 4%
of all tests.
19
WS #24 Scatter Plots, Correlation and Linear Regression
Given the two-variable set of data for annual tuition costs at a community college as
follows:
Year
1991 1993 1995 1997 1999
Tuition $ 2137 2527 2860 3111 3356
Performing Regression Using TI-83 Add-In Program REGBASIC
Key Strokes
Display/Comment
PROG
Selects the Menu
of Add-in
programs
4 ENTER
Selects the
REGBASIC
program. The
program asks for the input lists. The data was
emtered in to List1 and List2 (See WS #21)
ENTER
Program asks if
we want a
scatter plot.
20
1 ENTER
Scatter plot is
displayed.
ENTER
Program displays
regression
equation, r and
r2 .
ENTER 1
Displays the
regression line
on the scatter
plot.
ENTER
Program asks if
you want to see
the residual plot.
21
1 ENTER
Displays the
residual plot.
ENTER 1 2005
Program asks if
we want a
predicted y. We
input the x-value
of 2005.
ENTER
The program
displays the xvalue in context,
indicating the zscore and the
minimum and
maximum xvalues.
Predictions are most accurate at the mean xvalue and less accurate the further away the xvalue is from the mean. Predictions are not
valid when the x-value for the prediction is
significantly below the x-minimum or
significantly above the x-maximum.
ENTER 2
Ends the program
22
WS #26 Proportions: Intervals and Tests
One proportion confidence Interval
A 2000 Gallup poll found that 38% of a random sample of 1012 adults said they
believe in ghosts. Find the 90% confidence interval and the margin of error for the
poll results.
Key Strokes
Comment
Stat ► ►
Brings up the
Test menu
Alpha A
Selects 1-PropZInt
.38 x 1012 Enter
Displays x:384.56
▲ 384 Enter
Displays x:384 The x: is the input for the
“number of successes” and must be experssed
as an integer.
1012 Enter
Input the sample size
. 90 Enter
Input the confidential interval desired
23
Enter
Displays the
inteveral, the
sample
proportion, and
the sample size
.5(.405-.354) Enter
Displays .026 which is the margin of error. The
margin of error is one half the length of the
confidence interval. The confidence interval
may also be exrressed as .379  .026
24
One Proportion Z Test
At Shoreline Community College it has been thought that students had no preference
as to Coke or Pepsi drinks. A survey of 300 students found 160 preferred Coke. Set
up the appropriate hypotheses and find the p-value.
Let p be the proportion of students that prefer Coke. The null hypothesis is the status
quo, which says there is no preference so the proportions for Coke and Pepsi would
be equal at .50. The alternate hypothesis is that the proportions are not equal.
H 0 : p  .50
H A : p  .50
Key Strokes
Comment
Stat ► ►
Brings up the Tests menu
5
Selects 1-PropZTest
.5
Inputs the proportion value for null hypothesis
▼ 160
Inputs the number of successes (must be an
integer)
▼ 300
Input the sample size
▼ Enter
Highlights  p0 . This is the option for a twotailed test.
▼ Enter
Highlights Calculate option
Enter
Displays:
The p-value is .248.
This means that the
probability of a
sample proportion of
.533 or higher or .467
(.50-.033) or lower when the true sample
proportion is .50 occurring is 24.8%. This is
too high a probability to reject the null
hypothesis.
25
Stat Test 5 ▼ ▼ ▼ ▼ ► Enter
Selects the
Draw option.
Display graph.
The shaded
corresponds to
the proportion
of sample
means above .533 and below .467, or 1.155
standard deviation units above the mean and
1.155 units below the mean.
Two Proportion Confidence interval
A study was done on adults over 65 years of age to measure the incidence of arthritis.
The results were:
# Afflicted Sample Size
Men
411
1012
Women
535
1062
Find the 95% confidence interval for the difference in the proportions of men and
women who are afflicted.
Key Strokes
Comment
Stat ► ► Alpha B
Selects 2 PropZInt
from Tests Menu
411 Enter
Enter # successes for men (The number of
successes must be an integer. If the percent of
26
men afflicted were given, we would have
multiplied it by the sample size and rounded it
to the nearest integer to get an integer input)
1012 Enter
Enter sample size for men
535 Enter
Enter # successes for women
1062 Enter
Enter sample size for women
.95
Enters confidence
level desired
▼ Enter
Displays the
confidence interval
(men – women)
pˆ 1  men
pˆ 2  women
27
Two Proportion Z Test
Given the following data on multiple births, test whether the difference in sample
proportions is significant at an alpha level of 5%.
Multiple Births Sample Size Proportion
White Women
94
3132
p1
Black Women
20
606
p2
Set up the hypotheses.
H 0 : p1  p 2
H A : p1  p 2
Key Strokes
Comment
Stat ► ► 6
Selects 2PropZTest from
Tests menu
94 Enter
Enters # successes for white women
3132 Enter
Enters sample size for white women
20 Enter
Enters # success for black women
606 Enter
Enters sample size for black women
Enter
Highlights p1  p 2 , two-tailed test option
28
▼ Enter
Displays results.
The p-value is 0.694
which is greater than
the alpha level of 5%
so the null hypothesis
cannot be rejected.
Stat ► ► 6 ▼ ▼ ▼ ▼ ▼ ► Enter
Displays results
grahpically.
29
WS #27 Normal Probability Plot
Given the following set of data, check to see if it is close to a normal distribution.
Speed 29 34 34 28 30 29 38 31 29 34 32
31
31 27 37 29 26 24 34 36 31 34 36
To check for normality, we make a probability plot. This is constructed by first
finding the Z-scores for each of the numbers in the data list (recall the z-score = (data
point- mean)/(standard deviation). We then plot the actual data on one axis and the
corresponding z-scores on the other axis. If the result is a perfect straight line, then
the data list is a linear transformation of the standard normal distribution and
therefore the data list is normal. The data list is close to normal of the plot is close to
a straight line.
Key Strokes
Comment
Stat Enter
Brings up List Editor. Select L1
29 Enter 34 Enter … 36 Enter
Enter the data in L1
2nd Statplot Enter Enter
Selects and turns Plot1
▼ ► ► ► ► ► Enter
Highlights the probability plot icon, the last
icon in the second row.
▼ 2nd L1
Enters L1 as the Data List
▼ ► Enter
Selects the y-axis
as the Data Axis
(It does not make
any difference
which axis is the
data axis, but
convention
usually uses the y-axis as the data axis.)
30
Zoom 9
Display the plot.
The solid
vertical line is at
the z-score =0
and will intersect
the mean of the
data. In this case
the line is fairly close to a straight line, so the
distribution of the data would be considered
close to normal.
31
WS #28 Means: Confidence Intervals and Tests
One Sample T – Confidence Interval
Given the following sample data about automobile speeds in a residential area, find
the 90% confidence interval for the true mean speed of the vehicles. Assume that the
data satisfies the necessary conditions so that it can be approximated by a tdistribution.
Speed 29 34 34 28 30 29 38 31 29 34 32
31
31 27 37 29 26 24 34 36 31 34 36
Key Strokes
Comment
Stat Enter
Brings up the list editor. Select L1
29 Enter 34 Enter … 36 Enter
Enter data in L1
Stat ► ►
Brings up Tests menu
8
Selects TInterval
Enter
Highlights and selects Data because we have a
list of data. If we knew the sample mean and
the standard deviation for the sample, we would
have selected Stats and inputted these two
items as required.
2nd L1 Enter
Enters L1 as List
1 Enter
Enters Freq as 1
. 9 Enter
Enter .90 for C-Level
32
Displays the
confidence
interval, the
sample mean,
standard error,
and sample size
Enter
One Sample T-Test
Given the sample data above with the assumption that it meets all the conditions to
be approximated by a t-distribution; can you conclude that true mean speed is
greater than 30mph? State the hypotheses and find the p-value. Use an alpha value of
5%.
Let  0  30 mph
H 0 :   0
H A :   0
Key Strokes
Comment
Stat ► ► 2
Brings up the Tests menu and selects T- Test
Enter
Highlights Data. If we had the sample mean
and the standard error, we would have selected
Stats
▼ 30 Enter
Inputs 30 for  0
2nd L1 Enter
Inputs L1 for List
1
Inputs 1 for Freq
▼►►
Highlights and
selects   0
33
▼ Enter
Displays Results.
Since the p-value
is less than .05,
the alpha level,
the null
hypothesis must
be rejected and
we conclude that the true mean speed is greater
that 30mph.
Two Sample T-Interval
We have two brands of batteries, Brand A and Brand B. Following is the data relating
to the working life in minutes for batteries from a sample of both brands.
Brand A 194.0 205.5 199.2 172.4 184.0 169.5
Brand B 190.7 203.5 203.5 206.5 222.5 209.5
Find the 90% confidence interval for the true mean of the difference in the lives of the
batteries:  A   B . Assume the data meets all the necessary conditions so the tdistribution can be used.
Key Strokes
Comment
Stat Enter
Enter Brand A into L1 and Brand B into L2
Stat ► ► 0
Brings up 2-SampTInt command
Enter
Highlight and select Data option since we have
the raw data. If we had the mean and standard
deviation for each sample, we would use the
Stats option.
▼ 2nd L1 Enter
Input L1 as List1
34
2nd L2 Enter
Input L2 as List2
1 Enter 1 Enter
Input 1 for both Freq1 and Freq2
. 9 Enter
Enter .90 for C-Level
Enter
Highlight and
select No for
Pooled. Always
select No for this
option.
▼ Enter
Displays the
results. df is the
degrees of
freedom the
calculator used
for the
calculation.
Two Sample T-Test
Can we conclude from the data that mean battery life for Brand A is less that the
mean battery life for Brand B? Perform a test. Give the p-value and state your
conclusion.
The hypotheses for the test are:
H 0 : 1   2
H A : 1   2
Key Strokes
Stat ► ► 4
Comment
Selects 2-SampTTest command
35
Enter
Selects and Highlight Data option
▼ 2nd L1 Enter
Inputs L1 as List1
2nd L2 Enter
Inputs L2 as List2
1 Enter 1 Enter
Inputs 1 for both Freq1 and Freq2
► Enter
Highlights and selects   2
▼ Enter
Highlights and
selects No for
Pooled option
▼ ► Enter
Displays the
results
graphically.
Stat ► ► 4 ▼ ▼ ▼ ▼ ▼ ▼ ▼ Enter
Displays the
calculations.
The p-value is
.016. Since the
alpha level is not
given, it is
assumed to be
.05. Since the p-value is less than the alpha
level, we reject the null hypothesis and
conclude that the mean life of Brand A batteries
is less than the mean life of Brand B batteries.
36
WS #29 Comparing Counts – Chi-Square Tests
Goodness-of Fit
Problem: Does your zodiac sign determine if you will be successful in life.
Following is a summary of the signs of the 256 heads of the largest of the 400
companies.
Births Sign
Births Sign
23
Aries
18
Libra
20
Taurus
21
Scorpio
18
Gemini
19
Sagittarius
23
Cancer
22
Capricorn
20
Leo
24
Aquarius
19
Virgo
29
Pisces
After confirming the necessary conditions, we set up the hypotheses.
H 0 : Births are uniformly distributed uniformly over the zodiac signs.
H A : Births are not uniformly distributed over the zodiac signs.
We will compute the Chi-square statistic
(Observed  Expected) 2
and calculate the p
Expected
All Cells
value from the chi-square distribution for 11 degrees of freedom.
Key Strokes
Display/Comment
STAT Enter 23 Enter 20 Enter…29 Enter
Enter the Number of Births in List 1
► 21.33 Enter …21.33 Enter
The expected frequency in each cell if the null
hypotheses is true is 256/12 = 21.33. Enter this
number is List 2 for all cells.
37
►▲
Highlights L3
► ▲ ( 2nd L1 – 2nd L2) x2  2nd L2
Puts (Observed – Expected)2 /Expected in L3
38
► ► ▼ (12)
Move the highlight to the bottom of List 3
2nd LIST ► ► 5 2nd L3) ENTER
Puts the sum of List 3 (The Chi-Square)
statistic, 5.095, at the bottom of List 3
2nd DIST 7 ENTER
5.095, 2nd EE 99, 11) Enter
The p-value is
.926 or 92.7%.
Since the p-value
would have to be
less than 5% to
reject the null
hypothesis, we
have to accept it – the births are uniformly
distributed over zodiac signs.
Using the Goodfit2 TI-83 add-in Program
Problem: Solve the previous problem using the Goodfit program
Key Strokes
Display/Comment
Enter the Data in L1 and the Model in L2 as
above. The program will actually let you
enter the data and model into any list you
choose
2
The program must be added to your calculator. You can get it from your instructor.
39
Program
Displays the addin programs
available in the
calculator.
ENTER, ENTER
Starts the Goodfit program.
2nd L1 ENTER 2nd L2
Enter the
appropriate lists
40
ENTER
Displays
Program output.
ENTER, 2nd LIST
Displays List
Menu
7 ENTER
Displays the
list of
individual cell
chi-square
calculations. Use ►► to view numbers off the
screen.
41
Comparing Observed Distributions
Problem: Following is a table that shows the distributions for post high school
activities for three graduating classes. Are they homogenous or are they significantly
different?
1980 1990 2000
College
320
245
288
Employment 98
24
17
Military
18
19
5
Travel
17
2
5
We set up the appropriate hypotheses.
H 0 : The distributions are homogenous
H A : The distributions are not homogenous
The TI-83 has a built-in program to perform the Chi-square test for this kind of problem.
We have to enter the table as a matrix, and then call up the test.
42
Key Strokes
Display/Comment
2nd MATRIX ►►
Selects Matrix
Edit Menu
ENTER
Choose Matrix A.
4 ENTER 3 ENTER
Set the size of
the matrix to 3
rows by 4
columns
320 ENTER 245 ENTER…5 ENTER
Enter the data
row by row.
2nd QUIT STAT ►► ALPHA C
Selects Chisquare test
Our table is in Matrix A, but we will change the
matrix for the Expected distributions to matrix
B.
2nd MATRIX ▼ ENTER
▼ ENTER
Displays Results.
Since the p-value
is less than 5%
43
we reject the null hypothesis, that the
distributions are homogeneous.
To make sure our conclusion is valid, we need
to check the condition that there need to be at
least an expected value of 5 in each cell since
one of the actual cells has less than 5.
2nd Matrix ►► ▼ ENTER
Displays Matrix
B the expected
count.
We observe that
the condition is met
2nd Quit 2nd MATRIX 1 – 2nd MATRIX 2
STO 2nd MATRIX 3 ENTER
The residuals are
the actual value
– expected value
for each cell of
the table. The
residuals are
stored in Matrix C. They are also displayed on
the home screen.
To see column 3 use the right arrow key. To
view the matrix completely on one screen, use
the Matrix Edit command.
We may want to examine the residuals by cell.
To calculate the standardized residual for each
cell, on the home screen calculate
residual
exp ected
.
For example let’s calculate the standardized
residual for cell 1,1
-45.226

365.23 ENTER
The result is – 2.366. This is essentially a zscore, the actual value is 2.336 standard
deviation units below the expected value.
Standardizing all the residuals provides a basis
for comparing the residuals to analyze where
the significant differences in the distributions
are. Unfortunately, they have to be calculated
cell-by-cell.
44
Independence
Problem: Below is a two-way table related to the ship Titanic. The variables
are class and survival. We want to see if these variables are independent, that
is, did the probability of surviving the disaster depend upon what class is
person was in.
First Second Third Crew
Alive 203
118
178
312
Dead
167
528
673
122
After confirming the necessary conditions, we set up the hypotheses.
H 0 : The variables are independent (The conditional distributions of survival are
uniform)
H A : The variables are not independent.
This problem fits the same procedure as homogeneous problem done above.
Key Strokes
2nd MATRIX ►►
Display/Comment
Selects Matrix Edit Menu
45
ENTER 4 ENTER 2
203 ENTER … 763 ENTER
Enter the data in
the matrix
2nd QUIT STAT ►►ALPHA C
▼▼ENTER
The p-value is
virtually zero, so
we reject then null
hypothesis and
conclude that the variables are not independent,
that the probability of being alive is dependent
on the class a person was in.
46
Download