# Activity - Chi-Square

Dr. Thomas R. Boucher Texas A&M University-Commerce

### Pearson’s Chi-Square

Name:

____________________________________

The file ‘2013 Car MPG.csv’ contains EPA MPG estimates for a variety of 2013 models. We would like to compare the number of cylinders (‘cylinders’) across the various classes (‘class’) of automobiles. Consider the automobiles used in the testing to be a representative sample of all 2013 automobiles.

(1) Use StatCrunch to create a contingency table with ‘class’ defining the rows and ‘cylinders’ defining the columns. Paste the table here. Does the table suggest anything?

There are several zeros (empty cells), most of them in the pickup and SUV rows, which will affect the chi-square values.

Across all classes, most cars have 4, 6, or 8 cylinders.

Few cars in any class have 5 or 12 cylinders.
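StatCrunch builds the contingency table for you; as a cross-check, the same cross-tabulation can be sketched in Python. This is a minimal sketch using only the standard library. The function name `contingency_table` and the toy `records` list are hypothetical and are not taken from ‘2013 Car MPG.csv’; with the real file, the (class, cylinders) pairs could be read with `csv.DictReader`, assuming the header names match ‘class’ and ‘cylinders’.

```python
from collections import Counter

def contingency_table(records):
    """Cross-tabulate (row_label, col_label) pairs into nested dict counts.

    Returns (row_labels, col_labels, table) where table[r][c] is the count,
    with 0 filled in for combinations that never occur (the empty cells).
    """
    counts = Counter(records)
    row_labels = sorted({r for r, _ in records})
    col_labels = sorted({c for _, c in records})
    # Counter returns 0 for missing keys, so empty cells become 0.
    table = {r: {c: counts[(r, c)] for c in col_labels} for r in row_labels}
    return row_labels, col_labels, table

# Hypothetical toy records (class, cylinders); NOT taken from '2013 Car MPG.csv'.
records = [
    ("compact", 4), ("compact", 4), ("compact", 6),
    ("midsize", 4), ("midsize", 6),
    ("suv", 6), ("suv", 8), ("pickup", 8),
]
rows, cols, table = contingency_table(records)
print("class".ljust(8), *[str(c).rjust(4) for c in cols])
for r in rows:
    print(r.ljust(8), *[str(table[r][c]).rjust(4) for c in cols])
```

Combinations that never occur are filled with 0, which is how empty cells like the ones noted above show up in the table.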

(2) Use StatCrunch to create individual bar charts for ‘class’ and for ‘cylinders’, and a grouped bar chart for the data for ‘class’ and ‘cylinders’. Include the plots here. Do the plots show anything?


The most frequent cylinder counts are 4, 6, and 8.

The most frequent classes are compact and midsize.

We can see that the most frequent classes overlap with the most frequent cylinder counts.

4-cylinder cars are mostly compact and midsize.

6-cylinder cars are the most varied, spread across the most classes.

8-cylinder vehicles are mostly SUVs and pickups.
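The individual bar charts display marginal counts, while the grouped bar chart compares the cylinder mix within each class. A rough text-only stand-in can be sketched in Python under stated assumptions: standard library only, a hypothetical helper name `column_distribution_by_row`, and made-up toy records rather than the EPA data. It computes those quantities and prints one `#` per 10 percentage points; columns with no records in a class are simply omitted.

```python
from collections import Counter, defaultdict

def column_distribution_by_row(records):
    """For each row label, return the percentage of its records in each column.

    These within-class percentages are what a grouped bar chart lets you
    compare by eye (e.g. the cylinder mix for compacts vs. pickups).
    """
    by_row = defaultdict(Counter)
    for r, c in records:
        by_row[r][c] += 1
    result = {}
    for r, counter in by_row.items():
        total = sum(counter.values())
        result[r] = {c: 100 * n / total for c, n in sorted(counter.items())}
    return result

# Hypothetical toy records (class, cylinders); NOT taken from '2013 Car MPG.csv'.
records = [
    ("compact", 4), ("compact", 4), ("compact", 6),
    ("midsize", 4), ("midsize", 6),
    ("suv", 6), ("suv", 8), ("pickup", 8),
]
marginal_class = Counter(r for r, _ in records)   # data for the 'class' bar chart
marginal_cyl = Counter(c for _, c in records)     # data for the 'cylinders' bar chart
dist = column_distribution_by_row(records)
for r in sorted(dist):
    # Crude text "grouped bars": one '#' per 10 percentage points.
    bars = "  ".join(f"{c}:{'#' * round(p / 10)}" for c, p in dist[r].items())
    print(r.ljust(8), bars)
```

Comparing percentages rather than raw counts keeps a large class from dominating the picture, which is the same reason a grouped bar chart is easier to read than two separate charts.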

(3) Use Pearson’s Chi-Square test in StatCrunch to test for an association between ‘class’ and ‘cylinders’. Paste the output here. What do you conclude? Can you be confident in the results?

The chi-square statistic is very large.


The p-value is very small, so we reject the null hypothesis of no association and conclude that ‘class’ and ‘cylinders’ are associated.

However, we can’t be fully confident in this result, because more than 20% of the cells have an expected count less than 5.
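The quantities behind StatCrunch’s output can be sketched directly: each expected count is (row total × column total) / N, the statistic sums (observed − expected)² / expected over all cells, the degrees of freedom are (rows − 1)(columns − 1), and the p-value is the upper tail of the chi-square distribution. The Python below is a standard-library sketch, not StatCrunch’s implementation; the helper names and the toy table are hypothetical, and it assumes no row or column total is zero. It also reports the fraction of cells with expected count below 5, the rule of thumb behind the caveat above.

```python
import math

def expected_counts(observed):
    """Expected count per cell under independence: row total * column total / N."""
    row_tot = [sum(row) for row in observed]
    col_tot = [sum(col) for col in zip(*observed)]
    n = sum(row_tot)
    return [[r * c / n for c in col_tot] for r in row_tot]

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in sqrt(x).
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total

def pearson_chi_square(observed):
    """Return (statistic, df, p_value, fraction of cells with expected count < 5)."""
    expected = expected_counts(observed)
    stat = sum((o - e) ** 2 / e
               for orow, erow in zip(observed, expected)
               for o, e in zip(orow, erow))
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    cells = [e for erow in expected for e in erow]
    frac_small = sum(e < 5 for e in cells) / len(cells)
    return stat, df, chi2_sf(stat, df), frac_small

# Hypothetical toy table (rows: compact, midsize, pickup, suv; columns: 4, 6, 8 cylinders).
observed = [
    [2, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 1],
]
stat, df, p, frac_small = pearson_chi_square(observed)
print(f"chi-square = {stat:.3f}, df = {df}, p = {p:.4f}")
if frac_small > 0.2:
    print(f"Caution: {frac_small:.0%} of cells have expected count < 5")
```

On this tiny made-up table the statistic is 58/9 ≈ 6.44 with 6 degrees of freedom, and every expected count is below 5, so the result would be flagged as unreliable in the same way as described above.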

(4) You should have seen a significant association (with caveats) between ‘class’ and ‘cylinders’. Use StatCrunch to create another contingency table containing the observed and expected counts and the contributions to the Chi-Square, and use these to interpret the association.

The biggest contributors to rejecting the null hypothesis are 4-cylinder compact cars, 4-cylinder SUVs, and 8-cylinder SUVs.

As we can see, several cells have more observations than expected, such as 8-cylinder compact cars.

However, more cells have fewer observations than expected, such as 4-cylinder compact cars.

The cells with fewer observations than expected made large contributions to the chi-square statistic, which is likely why we got the result we did.
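The interpretation in part (4) rests on two numbers per cell: the contribution (O − E)² / E, which measures how much a cell drives the statistic, and the sign of O − E, which says whether the cell is above or below expected. A minimal Python sketch follows, assuming a hypothetical helper `cell_contributions`, made-up labels and a toy table (not the EPA data), and the standard library only. It lists cells from largest to smallest contribution.

```python
def cell_contributions(observed, row_labels, col_labels):
    """List (row, col, observed, expected, contribution, direction), largest contribution first."""
    row_tot = [sum(row) for row in observed]
    col_tot = [sum(col) for col in zip(*observed)]
    n = sum(row_tot)
    cells = []
    for i, orow in enumerate(observed):
        for j, o in enumerate(orow):
            e = row_tot[i] * col_tot[j] / n      # expected count under independence
            contrib = (o - e) ** 2 / e           # this cell's share of the chi-square statistic
            direction = "above" if o > e else "below" if o < e else "equal"
            cells.append((row_labels[i], col_labels[j], o, e, contrib, direction))
    # sorted() is stable even with reverse=True, so ties keep their table order.
    return sorted(cells, key=lambda cell: cell[4], reverse=True)

# Hypothetical toy table (rows: class, columns: cylinders); NOT the EPA data.
classes = ["compact", "midsize", "pickup", "suv"]
cylinders = [4, 6, 8]
observed = [
    [2, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 1],
]
ranked = cell_contributions(observed, classes, cylinders)
for cls, cyl, o, e, contrib, direction in ranked[:3]:
    print(f"{cls:8} {cyl:>2} cyl  O={o}  E={e:.3f}  contrib={contrib:.3f}  ({direction})")
```

Because the contribution is squared, it does not show direction on its own; checking the sign of O − E for each large contributor is what separates “more than expected” from “less than expected”.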