MATH 108: INTRODUCTION TO PROBABILITY AND STATISTICS
SHORELINE COMMUNITY COLLEGE
MIKE MORAN, INSTRUCTOR

TABLE OF CONTENTS
Index of TI-83 Programs
Add-in Programs
WS #21 Histograms
WS #22 Median, Mean, Standard Deviation, Box Plots
WS #23 Normal Distributions
WS #24 Scatter Plots, Correlation and Linear Regression
WS #26 Proportions: Intervals and Tests
WS #27 Normal Probability Plot
WS #28 Means: Confidence Intervals and Tests
WS #29 Comparing Counts – Chi-Square Tests

Index of TI-83 Programs

Type of Statistics Problem | TI-83 Command | Reference
Mean, Median, Quartiles, Standard Deviation: Find the specific values for a data set. | 1-Var Stats: Stat-Edit (enter data), Stat-Calc-1 (calculate values) | WS #22
Histogram: Given a set of data, make a vertical bar chart showing the frequencies of occurrences of equal intervals of the data. | Statistics Plots: 2nd-Stat Plot (3rd icon type) | WS #21
Box Plot: Given a set of data, plot the minimum, first quartile, median, third quartile, and maximum. | Statistics Plots: 2nd-Stat Plot (4th and 5th icon types) | WS #22
Scatter Plot: Given two sets of data, plot corresponding values as points on an x-y coordinate plane. | Statistics Plots: 2nd-Stat Plot (1st icon type) | WS #24
Normal Probability Plot: Given a set of data, check to see if it is a normal distribution. If the plot results in a straight line, it is normal. | Statistics Plots: 2nd-Stat Plot (6th icon type) | WS #27
Normal Distribution: Given the lower bound (L) and upper bound (U) of an interval, find the area above the interval. | PRGM NORMDIST 1 (L, U, μ, σ) | WS #23, Add-in Programs
Normal Distribution: Given the area from the left, find the right bound of the x-axis interval. | PRGM NORMDIST 2 (area from left as a decimal, μ, σ) | WS #23, Add-in Programs
T Distribution: Given a sample mean, find the p-value. | PRGM TDIST | Add-in Programs
Critical Values: Find the critical value for a level of confidence to use to construct a confidence interval of a proportion or mean. | PRGM CRITVAL | Add-in Programs
Standard Error: Find the standard error to use to construct a confidence interval or hypothesis test. | PRGM STDERROR | Add-in Programs
Sample Size: Given a desired confidence level and margin of error, find the required sample size. | PRGM SAMPLSIZ | Add-in Programs
Confidence Interval – One Proportion: Find a confidence interval for a sample proportion. | 1-PropZInt: Stat-Tests-A | WS #26
Confidence Interval – Two Proportions: Find a confidence interval for the difference of two sample proportions. | 2-PropZInt: Stat-Tests-B | WS #26
Confidence Interval – Mean: Find the confidence interval for a sample mean when the population standard deviation is estimated from the sample. | TInterval: Stat-Tests-8 | WS #28
Confidence Interval – Two Means: Find the confidence interval for the difference of two sample means when the standard deviation is estimated from the samples. | 2-SampTInt: Stat-Tests-0 | WS #28
Hypothesis Test – One Proportion: Given one sample proportion, find the p-value that it is a sampling variation of a given value. | 1-PropZTest: Stat-Tests-5 | WS #26
Hypothesis Test – Two Proportions: Given two sample proportions, find the p-value that their difference is a sampling variation of a true difference of zero. | 2-PropZTest: Stat-Tests-6 | WS #26
Hypothesis Test – Mean: Given one sample mean, find the p-value that it is a sampling variation of a given value. | T-Test: Stat-Tests-2 | WS #28
Hypothesis Test – Two Means: Given two sample means, find the p-value that their difference is a sampling variation of a true difference of zero. | 2-SampTTest: Stat-Tests-4 | WS #28
Paired Samples – Confidence Interval: Given two sets of data with related numbers for each subject, e.g. before treatment and after, find the confidence interval around the mean of the pair-wise differences. | PRGM PARDSAMP Confidence Interval | Add-in Programs
Paired Samples – Hypothesis Test: Test the single list of the pair-wise differences. | PRGM PARDSAMP Hypothesis Test | Add-in Programs
Comparing Counts – Chi-Square Distribution: Given a set of counts related to a categorical variable, check the "goodness of fit" to a particular model. Given multiple distributions of categorical variables, check to see if they are homogeneous. Given a two-way table for two categorical variables, check to see if they are independent. | χ²cdf: 2nd-Distr-7; χ²-Test: Stat-Tests-Alpha C; PRGM GOODFIT | WS #29
Linear Regression and Correlation: Given two sets of data, find the equation of the line that best models the relationship. | LinReg(ax+b): Stat-Calc-4; PRGM REGBASIC | WS #24, Add-in Programs

Add-in Programs

The TI-83 and TI-84 calculators come with a variety of useful special commands for statistics. The instructor has assembled additional programs which expand the functionality of the TI-83 and TI-84 calculators to be comparable with the TI-89 calculator, plus some extras. These programs must be added to the individual calculators.
There are three ways to get the programs added to your calculator.

First, bring your calculator to the instructor during office hours and he will transfer the programs to your calculator from his calculator. You may also tape a piece of paper with your name to your calculator and give it to the instructor at the end of class. He will return it to you the next day with the programs added to it.

Second, you may transfer the programs to your calculator from another calculator that has the programs loaded.

Third, you may download the programs from this web site to your computer and then download them to your calculator from your computer.

Following is a description of each of the programs. To download a program to your computer, click on the program name.

CRITVAL (Critical Values).
This program is used to find the critical values needed to calculate confidence intervals. For a proportion, the input is the confidence level as a decimal. For a mean, the inputs are the confidence level and the degrees of freedom. The output is the critical value.

NORMDIST (Normal Distribution).
This program is used to find values associated with the normal distribution.
If given an interval on the x-axis, the program will calculate the probability that the x-value is in that interval. The inputs are the lower bound of the interval, the upper bound of the interval, the mean of the normal distribution, and the standard deviation of the distribution. The output is the probability as a decimal.
If given a percentile (a cumulative percentage), the program will calculate the associated x-axis value. The inputs are the percentile as a decimal, the mean of the normal distribution, and the standard deviation of the distribution. The output is the associated x-axis interval. The left bound of the interval is always -E99, which stands in for negative infinity. The right bound is the value of the percentile. WS #23 further illustrates the use of this program.

GOODFIT (Goodness of Fit).
This program uses the Chi-Square Distribution to test the goodness of fit of a data list to a model. The inputs are the data list and the model list. The output includes the Chi-Square statistic, the p-value, and the cell-by-cell residuals. TI-83 Worksheet #29 demonstrates the use of the program.

PARDSAMP (Paired Samples).
This program is used to find a confidence interval or a p-value for a hypothesis test when the data is from a paired samples design. The inputs are the respective lists for the data. The outputs are a normal probability plot and a confidence interval or a p-value.

REGBASIC (Simple Regression).
This program streamlines the use of the TI-83 operations to perform simple regression. The program inputs are the x-list and y-list. The program displays a scatter plot (optional), finds the regression equation, r and r-squared, displays the scatter plot with the regression line (optional), displays the residual plot (optional), and calculates predicted y-values (optional). TI-83 Worksheet #24 demonstrates the use of the program.

SAMPLSIZ (Sample Size).
This program estimates the sample size required for a particular level of confidence and margin of error for proportions and means. For proportions, the inputs are the estimated proportion, the confidence level, and the desired margin of error. For means, the inputs are the estimated standard deviation, the confidence level, and the desired margin of error. The output is the required sample size.

STDERROR (Standard Error).
This program calculates the standard error for a variety of situations: one proportion, two proportions, two proportions pooled, one mean, two means (the output also includes the adjusted degrees of freedom), the regression estimate of a y-value mean, and the estimate for a y-value single point.

TDIST (T Distribution).
This program calculates the t-distribution p-value for a one-tail and two-tail hypothesis test for means.
The inputs are: the lower bound of the interval, the upper bound of the interval, the mean, the standard deviation, and the degrees of freedom. The output is the p-value.

WS #21 Histograms

Following is a list of countries and the percent of ninth graders who have used marijuana. Make a histogram of the data.

Country | Percent
Austria | 10%
Belgium | 19%
Denmark | 17%
England | 40%
Finland | 5%
France | 12%
Germany | 21%
Greece | 2%
Iceland | 10%
Ireland | 37%
Italy | 19%
Luxembourg | 6%
Netherlands | 31%
No. Ireland | 23%
Portugal | 7%
Scotland | 53%
Spain | 15%
Sweden | 6%
Norway | 6%
Switzerland | 27%

The histogram will show the "percent usage" categories on the x-axis and the number of countries in the individual categories on the y-axis. We need only to enter the percentages, not the countries. A histogram is a one-variable display graph.

Key Strokes | Comment
(Enter Data)
Stat | Displays the statistics menu
1 | Selects the list editor
(Clear list if not empty)
▲ ◄ | Move cursor to highlight L1
Clear Enter | Clears list 1
10 Enter 19 Enter … 27 Enter | Enters the percentages in the list
2nd Statplot | Displays statplot menu
1 | Selects first of 3 statplots
Enter | Turns statplot 1 on
▼ ► ► | Highlights the histogram icon (3rd type)
Enter | Selects histogram type
▼ 2nd L1 | Selects L1 as XList
▼ 1 | Sets frequency to one. (If you can't type a 1, it is because the calculator is in alpha mode. Hit the ALPHA key to take the calculator out of alpha mode.)
Zoom 9 | Selects the standard statistics window and displays the histogram
Trace | Displays properties of the histogram. Indicates the first interval category is [2, 12.2), and n=9 numbers are in that category.
► | Indicates the second category is [12.2, 22.4) and n=6 numbers are in that category.
(Change the range of data and the width of intervals)
Window | Displays window settings.
Xmin=0, Xmax=60, Xscl=10 | Xscl=10 sets the size of the categories at 10 units wide
Graph Trace | Displays the histogram with the revised range and category size.
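
The regrouping done with the Window settings can be checked away from the calculator. The following is a minimal Python sketch, not part of the course programs: the function name histogram_counts is made up for illustration, and it assumes each category is a left-closed interval [left, left + Xscl), as the Trace display indicates.

```python
# Percent of ninth graders who have used marijuana, in the order listed in WS #21.
percents = [10, 19, 17, 40, 5, 12, 21, 2, 10, 37,
            19, 6, 31, 23, 7, 53, 15, 6, 6, 27]


def histogram_counts(data, xmin, xmax, xscl):
    """Count the values falling in each category [left, left + xscl)."""
    n_bins = int((xmax - xmin) / xscl)
    counts = [0] * n_bins
    for value in data:
        index = int((value - xmin) // xscl)
        if 0 <= index < n_bins:  # values outside the window are not drawn
            counts[index] += 1
    return counts


# Same settings as the revised window: Xmin=0, Xmax=60, Xscl=10.
counts = histogram_counts(percents, xmin=0, xmax=60, xscl=10)
for i, count in enumerate(counts):
    left = i * 10
    print(f"[{left:2d}, {left + 10:2d}): {'*' * count} ({count})")
```

Each printed row corresponds to one bar of the histogram, so the bar heights seen with Trace can be compared against the counts in parentheses.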
WS #22 Median, Mean, Standard Deviation, Box Plots

Following is a list of data which represents the cost per minute in cents for long distance calls to a sample of countries served by Net2Phone.

7.9 17 3.9 9.9 15 9.9 7.9 7.9 7.9 49 9.9 7.9
16 49 21 6.9 11 7.9 8.9 9.9 9.9 7.9 3.9 22

Find the mean and standard deviation.

Key Strokes | Comment
Stat 1 Enter | Brings up list menu
60 Enter 61 Enter ... 76 Enter | Enter height data in list 1, L1
► | Move cursor to list 2, L2
2 Enter 6 Enter … 1 Enter | Enter count in L2
Stat ► 1 Enter | Selects One-Variable Statistics from the CALC menu and puts the command on the home screen
2nd L1 Enter | Display: x̄ = 67.115 (mean), Sx = 3.792 (standard deviation)

Find the 5-Number Summary for the data, that is, find the Minimum, First Quartile, Median, Third Quartile, and Maximum.

Key Strokes | Comment
▼ ▼ ▼ ▼ ▼ | Display: minX = 60, Q1 = 65, Med = 66.000, Q3 = 70.000, maxX = 76.000

Make a Box Plot for the data with the outliers plotted separately.

Key Strokes | Comment
2nd Statplot 1 | Select Plot 1
Enter | Turns Plot 1 on
▼ ► ► ► Enter | Highlight and select the 4th icon. (The next icon is a box plot without the outliers plotted separately.)
▼ 2nd L1 | Makes L1 the XList

WS #23 Normal Distributions

Solving Problems with the Add-in Program NORMDIST

Problem: The distribution of Scholastic Aptitude Test (SAT) scores is N(500, 100), that is, the mean is 500 and the standard deviation is 100. What is the probability a student scored between 400 and 650?

Key Strokes | Display/Comment
PRGM | Brings up the Add-in Program Menu
3 ENTER | Shows the option for two types of problems. We will select 1 because our problem gives us the interval (400, 650) and asks for the % of individual students in that interval.
1 ENTER | The program asks for the lower bound of the interval, which is 400.
400 ENTER | The program asks for the upper bound of the interval, which is 650.
650 ENTER | The program now asks for the mean of the normal distribution, which is 500.
500 ENTER | The program now asks for the standard deviation of the normal distribution, which is 100.
100 ENTER | The program gives the answer: the probability the student scored between 400 and 650 is 0.7745, or 77.45%.

Problem: The distribution of Scholastic Aptitude Test (SAT) scores is N(500, 100), that is, the mean is 500 and the standard deviation is 100. What percent of individuals scored less than 300?

Key Strokes | Display/Comment
PRGM 3 ENTER | The interval we are looking for is (−∞, 300). Since the calculator does not have the ∞ symbol, we have to use the smallest number in the calculator, which is −1 × 10^99 or -E99. The calculator interval is (-E99, 300). We select 1.
1 ENTER
- 2ND EE 99 ENTER
300 ENTER
500 ENTER
100 ENTER | The probability a student scored less than 300 is .0228.

Problem: The distribution of Scholastic Aptitude Test (SAT) scores is N(500, 100), that is, the mean is 500 and the standard deviation is 100. What score would it take to be in the top 10% of all students?

Key Strokes | Display/Comment
PRGM 3 ENTER | The top 10% is the rightmost 10% of the area. We are given an area, and we need to find the right bound on the x-axis. We select 2.
2 ENTER | The area from the left is 1 − .10 = 0.90
0.90 ENTER
500 ENTER
100 ENTER | The score required is 628.1552. Any score above this number will be in the top 10% of all scores.

Solving Problems with the TI-83 built-in programs: normalcdf and invNorm

Problem: The distribution of Scholastic Aptitude Test (SAT) scores is N(500, 100), that is, the mean is 500 and the standard deviation is 100. What percent of students had scores less than 750?

When we know the interval (−∞, 750) and want the area above it, we use the normalcdf command. This command takes the form normalcdf(lower bound, upper bound, mean, standard deviation). The TI-83 has no symbol for negative or positive infinity, −∞ or ∞, so we use −1 × 10^99 for negative infinity and 1 × 10^99 for positive infinity.
These are the smallest and largest numbers the TI-83 will take.

Key Strokes | Comment
2nd DISTR | Displays the Distribution Menu
2 Enter | Displays the normalcdf command on the home screen
- 2nd EE 99 , 750 , 500 , 100 ) | Enters the parameters for the command
Enter | Displays the answer 0.9938. This means that 99.38% of the test scores are lower than 750.

Problem: Given the SAT distribution of N(500, 100), what score would it take to get into the top 4% of all tests?

To be in the top 4% of the tests would require a score above the 96th percentile. For this problem, we use the invNorm command. This command takes the form invNorm(percentile, mean, standard deviation).

Key Strokes | Comment
2nd DISTR 3 0.96 , 500 , 100 ) | Add the proper parameters to the command
Enter | Display: A test score of 675.069 will be above 96 percent of all tests taken and in the top 4% of all tests.

WS #24 Scatter Plots, Correlation and Linear Regression

Given the two-variable set of data for annual tuition costs at a community college as follows:

Year | 1991 | 1993 | 1995 | 1997 | 1999
Tuition $ | 2137 | 2527 | 2860 | 3111 | 3356

Performing Regression Using TI-83 Add-In Program REGBASIC

Key Strokes | Display/Comment
PRGM | Selects the menu of Add-in programs
4 ENTER | Selects the REGBASIC program. The program asks for the input lists. The data was entered into List1 and List2 (see WS #21).
ENTER | Program asks if we want a scatter plot.
1 ENTER | Scatter plot is displayed.
ENTER | Program displays the regression equation, r and r².
ENTER 1 | Displays the regression line on the scatter plot.
ENTER | Program asks if you want to see the residual plot.
1 ENTER | Displays the residual plot.
ENTER 1 2005 | Program asks if we want a predicted y. We input the x-value of 2005.
ENTER | The program displays the x-value in context, indicating the z-score and the minimum and maximum x-values. Predictions are most accurate at the mean x-value and less accurate the further away the x-value is from the mean.
Predictions are not valid when the x-value for the prediction is significantly below the x-minimum or significantly above the x-maximum.
ENTER 2 | Ends the program

WS #26 Proportions: Intervals and Tests

One Proportion Confidence Interval

A 2000 Gallup poll found that 38% of a random sample of 1012 adults said they believe in ghosts. Find the 90% confidence interval and the margin of error for the poll results.

Key Strokes | Comment
Stat ► ► | Brings up the Tests menu
Alpha A | Selects 1-PropZInt
.38 x 1012 Enter | Displays x:384.56
▲ 384 Enter | Displays x:384. The x: is the input for the "number of successes" and must be expressed as an integer.
1012 Enter | Input the sample size
.90 Enter | Input the confidence level desired
Enter | Displays the interval, the sample proportion, and the sample size
.5(.405-.354) Enter | Displays .026, which is the margin of error. The margin of error is one half the length of the confidence interval. The confidence interval may also be expressed as .379 ± .026.

One Proportion Z Test

At Shoreline Community College it has been thought that students had no preference between Coke and Pepsi drinks. A survey of 300 students found 160 preferred Coke. Set up the appropriate hypotheses and find the p-value.

Let p be the proportion of students that prefer Coke. The null hypothesis is the status quo, which says there is no preference, so the proportions for Coke and Pepsi would be equal at .50. The alternate hypothesis is that the proportions are not equal.

H0: p = .50
HA: p ≠ .50

Key Strokes | Comment
Stat ► ► | Brings up the Tests menu
5 | Selects 1-PropZTest
.5 | Inputs the proportion value for the null hypothesis
▼ 160 | Inputs the number of successes (must be an integer)
▼ 300 | Inputs the sample size
▼ Enter | Highlights ≠ p0. This is the option for a two-tailed test.
▼ Enter | Highlights the Calculate option
Enter | Displays: The p-value is .248.
This means that the probability of a sample proportion of .533 or higher, or .467 (.50 − .033) or lower, when the true population proportion is .50, is 24.8%. This is too high a probability to reject the null hypothesis.
Stat Test 5 ▼ ▼ ▼ ▼ ► Enter | Selects the Draw option and displays the graph. The shaded area corresponds to the proportion of sample proportions above .533 and below .467, or 1.155 standard deviation units above the mean and 1.155 units below the mean.

Two Proportion Confidence Interval

A study was done on adults over 65 years of age to measure the incidence of arthritis. The results were:

 | # Afflicted | Sample Size
Men | 411 | 1012
Women | 535 | 1062

Find the 95% confidence interval for the difference in the proportions of men and women who are afflicted.

Key Strokes | Comment
Stat ► ► Alpha B | Selects 2-PropZInt from the Tests menu
411 Enter | Enter # successes for men. (The number of successes must be an integer. If the percent of men afflicted were given, we would have multiplied it by the sample size and rounded it to the nearest integer to get an integer input.)
1012 Enter | Enter sample size for men
535 Enter | Enter # successes for women
1062 Enter | Enter sample size for women
.95 | Enters confidence level desired
▼ Enter | Displays the confidence interval (men − women), where p̂1 is the proportion for men and p̂2 is the proportion for women

Two Proportion Z Test

Given the following data on multiple births, test whether the difference in sample proportions is significant at an alpha level of 5%.

 | Multiple Births | Sample Size | Proportion
White Women | 94 | 3132 | p1
Black Women | 20 | 606 | p2

Set up the hypotheses.

H0: p1 = p2
HA: p1 ≠ p2

Key Strokes | Comment
Stat ► ► 6 | Selects 2-PropZTest from the Tests menu
94 Enter | Enters # successes for white women
3132 Enter | Enters sample size for white women
20 Enter | Enters # successes for black women
606 Enter | Enters sample size for black women
Enter | Highlights p1 ≠ p2, the two-tailed test option
▼ Enter | Displays results.
The p-value is 0.694, which is greater than the alpha level of 5%, so the null hypothesis cannot be rejected.
Stat ► ► 6 ▼ ▼ ▼ ▼ ▼ ► Enter | Displays results graphically.

WS #27 Normal Probability Plot

Given the following set of data, check to see if it is close to a normal distribution.

Speed: 29 34 34 28 30 29 38 31 29 34 32 31 31 27 37 29 26 24 34 36 31 34 36

To check for normality, we make a probability plot. This is constructed by first finding the z-scores for each of the numbers in the data list (recall that z-score = (data point − mean)/(standard deviation)). We then plot the actual data on one axis and the corresponding z-scores on the other axis. If the result is a perfect straight line, then the data list is a linear transformation of the standard normal distribution and therefore the data list is normal. The data list is close to normal if the plot is close to a straight line.

Key Strokes | Comment
Stat Enter | Brings up the List Editor. Select L1
29 Enter 34 Enter … 36 Enter | Enter the data in L1
2nd Statplot Enter Enter | Selects Plot1 and turns it on
▼ ► ► ► ► ► Enter | Highlights the probability plot icon, the last icon in the second row.
▼ 2nd L1 | Enters L1 as the Data List
▼ ► Enter | Selects the y-axis as the Data Axis. (It does not make any difference which axis is the data axis, but convention usually uses the y-axis as the data axis.)
Zoom 9 | Displays the plot. The solid vertical line is at z-score = 0 and will intersect the mean of the data. In this case the plot is fairly close to a straight line, so the distribution of the data would be considered close to normal.

WS #28 Means: Confidence Intervals and Tests

One Sample T – Confidence Interval

Given the following sample data about automobile speeds in a residential area, find the 90% confidence interval for the true mean speed of the vehicles. Assume that the data satisfies the necessary conditions so that it can be approximated by a t-distribution.
Speed: 29 34 34 28 30 29 38 31 29 34 32 31 31 27 37 29 26 24 34 36 31 34 36

Key Strokes | Comment
Stat Enter | Brings up the list editor. Select L1
29 Enter 34 Enter … 36 Enter | Enter data in L1
Stat ► ► | Brings up the Tests menu
8 | Selects TInterval
Enter | Highlights and selects Data because we have a list of data. If we knew the sample mean and the standard deviation for the sample, we would have selected Stats and input these two items as required.
2nd L1 Enter | Enters L1 as List
1 Enter | Enters Freq as 1
.9 Enter | Enters .90 for C-Level
Enter | Displays the confidence interval, the sample mean, the sample standard deviation, and the sample size

One Sample T-Test

Given the sample data above, with the assumption that it meets all the conditions to be approximated by a t-distribution, can you conclude that the true mean speed is greater than 30 mph? State the hypotheses and find the p-value. Use an alpha value of 5%.

Let μ0 = 30 mph.
H0: μ = μ0
HA: μ > μ0

Key Strokes | Comment
Stat ► ► 2 | Brings up the Tests menu and selects T-Test
Enter | Highlights Data. If we had the sample mean and the sample standard deviation, we would have selected Stats
▼ 30 Enter | Inputs 30 for μ0
2nd L1 Enter | Inputs L1 for List
1 | Inputs 1 for Freq
▼ ► ► | Highlights and selects > μ0
▼ Enter | Displays results. Since the p-value is less than .05, the alpha level, we reject the null hypothesis and conclude that the true mean speed is greater than 30 mph.

Two Sample T-Interval

We have two brands of batteries, Brand A and Brand B. Following is the data relating to the working life in minutes for batteries from a sample of both brands.

Brand A | 194.0 | 205.5 | 199.2 | 172.4 | 184.0 | 169.5
Brand B | 190.7 | 203.5 | 203.5 | 206.5 | 222.5 | 209.5

Find the 90% confidence interval for the true mean of the difference in the lives of the batteries: μA − μB. Assume the data meets all the necessary conditions so the t-distribution can be used.
Key Strokes | Comment
Stat Enter | Enter Brand A into L1 and Brand B into L2
Stat ► ► 0 | Brings up the 2-SampTInt command
Enter | Highlight and select the Data option since we have the raw data. If we had the mean and standard deviation for each sample, we would use the Stats option.
▼ 2nd L1 Enter | Input L1 as List1
2nd L2 Enter | Input L2 as List2
1 Enter 1 Enter | Input 1 for both Freq1 and Freq2
.9 Enter | Enter .90 for C-Level
Enter | Highlight and select No for Pooled. Always select No for this option.
▼ Enter | Displays the results. df is the degrees of freedom the calculator used for the calculation.

Two Sample T-Test

Can we conclude from the data that the mean battery life for Brand A is less than the mean battery life for Brand B? Perform a test. Give the p-value and state your conclusion.

The hypotheses for the test are:
H0: μ1 = μ2
HA: μ1 < μ2

Key Strokes | Comment
Stat ► ► 4 | Selects the 2-SampTTest command
Enter | Highlights and selects the Data option
▼ 2nd L1 Enter | Inputs L1 as List1
2nd L2 Enter | Inputs L2 as List2
1 Enter 1 Enter | Inputs 1 for both Freq1 and Freq2
► Enter | Highlights and selects < μ2
▼ Enter | Highlights and selects No for the Pooled option
▼ ► Enter | Displays the results graphically.
Stat ► ► 4 ▼ ▼ ▼ ▼ ▼ ▼ ▼ Enter | Displays the calculations. The p-value is .016. Since the alpha level is not given, it is assumed to be .05. Since the p-value is less than the alpha level, we reject the null hypothesis and conclude that the mean life of Brand A batteries is less than the mean life of Brand B batteries.

WS #29 Comparing Counts – Chi-Square Tests

Goodness-of-Fit

Problem: Does your zodiac sign determine if you will be successful in life? Following is a summary of the signs of the 256 heads of the largest 400 companies.

Births | Sign | Births | Sign
23 | Aries | 18 | Libra
20 | Taurus | 21 | Scorpio
18 | Gemini | 19 | Sagittarius
23 | Cancer | 22 | Capricorn
20 | Leo | 24 | Aquarius
19 | Virgo | 29 | Pisces

After confirming the necessary conditions, we set up the hypotheses.
H0: Births are uniformly distributed over the zodiac signs.
HA: Births are not uniformly distributed over the zodiac signs.

We will compute the Chi-square statistic

$\chi^2 = \sum_{\text{All Cells}} \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$

and calculate the p-value from the chi-square distribution for 11 degrees of freedom.

Key Strokes | Display/Comment
STAT Enter 23 Enter 20 Enter … 29 Enter | Enter the number of births in List 1
► 21.33 Enter … 21.33 Enter | The expected frequency in each cell if the null hypothesis is true is 256/12 = 21.33. Enter this number in List 2 for all cells.
► ▲ | Highlights L3
( 2nd L1 – 2nd L2 ) x² ÷ 2nd L2 | Puts (Observed – Expected)²/Expected in L3
► ► ▼ (12) | Move the highlight to the bottom of List 3
2nd LIST ► ► 5 2nd L3 ) ENTER | Puts the sum of List 3 (the Chi-square statistic), 5.095, at the bottom of List 3
2nd DISTR 7 ENTER 5.095 , 2nd EE 99 , 11 ) Enter | The p-value is .926, or 92.6%. Since the p-value would have to be less than 5% to reject the null hypothesis, we fail to reject it; the data are consistent with births being uniformly distributed over zodiac signs.

Using the GOODFIT TI-83 Add-in Program

Problem: Solve the previous problem using the GOODFIT program. (The program must be added to your calculator. You can get it from your instructor.)

Key Strokes | Display/Comment
 | Enter the data in L1 and the model in L2 as above. The program will actually let you enter the data and model into any list you choose.
PRGM | Displays the add-in programs available in the calculator.
ENTER, ENTER | Starts the GOODFIT program.
2nd L1 ENTER 2nd L2 | Enter the appropriate lists
ENTER | Displays program output.
ENTER, 2nd LIST | Displays the List menu
7 ENTER | Displays the list of individual cell chi-square calculations. Use ► ► to view numbers off the screen.

Comparing Observed Distributions

Problem: Following is a table that shows the distributions for post high school activities for three graduating classes. Are they homogeneous or are they significantly different?
 | 1980 | 1990 | 2000
College | 320 | 245 | 288
Employment | 98 | 24 | 17
Military | 18 | 19 | 5
Travel | 17 | 2 | 5

We set up the appropriate hypotheses.

H0: The distributions are homogeneous.
HA: The distributions are not homogeneous.

The TI-83 has a built-in program to perform the Chi-square test for this kind of problem. We have to enter the table as a matrix, and then call up the test.

Key Strokes | Display/Comment
2nd MATRIX ► ► | Selects the Matrix Edit menu
ENTER | Choose Matrix A.
4 ENTER 3 ENTER | Set the size of the matrix to 4 rows by 3 columns
320 ENTER 245 ENTER … 5 ENTER | Enter the data row by row.
2nd QUIT STAT ► ► ALPHA C | Selects the Chi-square test. Our table is in Matrix A, but we will change the matrix for the Expected distributions to Matrix B.
2nd MATRIX ▼ ENTER ▼ ENTER | Displays results. Since the p-value is less than 5%, we reject the null hypothesis that the distributions are homogeneous.

To make sure our conclusion is valid, we need to check the condition that the expected value in each cell is at least 5, since one of the actual cells has less than 5.

Key Strokes | Display/Comment
2nd MATRIX ► ► ▼ ENTER | Displays Matrix B, the expected counts. We observe that the condition is met.
2nd QUIT 2nd MATRIX 1 – 2nd MATRIX 2 STO 2nd MATRIX 3 ENTER | The residuals are the actual value – expected value for each cell of the table. The residuals are stored in Matrix C. They are also displayed on the home screen. To see column 3, use the right arrow key. To view the matrix completely on one screen, use the Matrix Edit command.

We may want to examine the residuals by cell. To calculate the standardized residual for each cell, on the home screen calculate $\text{residual} / \sqrt{\text{expected}}$. For example, let's calculate the standardized residual for cell 1,1.

Key Strokes | Display/Comment
-45.226 ÷ √ 365.23 ENTER | The result is –2.366. This is essentially a z-score: the actual value is 2.366 standard deviation units below the expected value.
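
The expected counts and standardized residuals for the whole table can also be worked out in one pass off the calculator. The following is a minimal Python sketch under stated assumptions: it is not one of the course programs, the variable names are chosen here for illustration, and it uses the usual formula expected = (row total × column total) / grand total, which is what the expected value of 365.23 for cell 1,1 above implies.

```python
import math

# Observed counts from the table above.
# Rows: College, Employment, Military, Travel. Columns: 1980, 1990, 2000.
observed = [
    [320, 245, 288],
    [98, 24, 17],
    [18, 19, 5],
    [17, 2, 5],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell if the three distributions were homogeneous.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Standardized residual = (observed - expected) / sqrt(expected).
std_residuals = [
    [(o - e) / math.sqrt(e) for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

# Condition check: every expected count should be at least 5.
smallest_expected = min(min(row) for row in expected)

for row in std_residuals:
    print("  ".join(f"{z:7.3f}" for z in row))
print(f"smallest expected count: {smallest_expected:.2f}")
```

The first printed value is the standardized residual for cell 1,1, which can be compared with the home-screen calculation above.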
Standardizing all the residuals provides a basis for comparing the residuals to analyze where the significant differences in the distributions are. On the calculator, they have to be calculated cell by cell.

Independence

Problem: Below is a two-way table related to the ship Titanic. The variables are class and survival. We want to see if these variables are independent, that is, did the probability of surviving the disaster depend upon what class a person was in?

 | First | Second | Third | Crew
Alive | 203 | 118 | 178 | 312
Dead | 167 | 528 | 673 | 122

After confirming the necessary conditions, we set up the hypotheses.

H0: The variables are independent. (The conditional distributions of survival are the same for each class.)
HA: The variables are not independent.

This problem follows the same procedure as the homogeneity problem done above.

Key Strokes | Display/Comment
2nd MATRIX ► ► | Selects the Matrix Edit menu
ENTER 4 ENTER 2
203 ENTER … 763 ENTER | Enter the data in the matrix
2nd QUIT STAT ► ► ALPHA C ▼ ▼ ENTER | The p-value is virtually zero, so we reject the null hypothesis and conclude that the variables are not independent; the probability of being alive depends on the class a person was in.
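
For readers who want to see the arithmetic behind the calculator's Chi-square test, the following minimal Python sketch computes the statistic for the two-way table exactly as it is printed above. It is an illustration only: the function name chi_square_statistic is made up here, and instead of a p-value it compares the statistic with 7.815, the standard table critical value for 3 degrees of freedom at the 5% level.

```python
# Two-way table as printed above.
# Rows: Alive, Dead. Columns: First, Second, Third, Crew.
table = [
    [203, 118, 178, 312],
    [167, 528, 673, 122],
]


def chi_square_statistic(observed):
    """Return (chi-square statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


statistic, df = chi_square_statistic(table)
CRITICAL_VALUE = 7.815  # chi-square table value for df = 3, alpha = .05

print(f"chi-square = {statistic:.1f}, df = {df}")
print("reject H0" if statistic > CRITICAL_VALUE else "fail to reject H0")
```

A statistic far above the critical value corresponds to the "virtually zero" p-value reported by the calculator.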