Activity 1
Analyzing Categorical Random Variables

Today's activities are based on the dataset UCDavis2. The data were collected in Fall 2000 from n=239 college students. There are fourteen columns of data (variables), described below:

Column  Name       Description
______  _________  _______________
C1      Sex        Male or Female
C2      GPA        Student's GPA
C3      Seat       Typical classroom seat location (Front, Middle, Back)
C4      alcohol    Number of alcoholic beverages consumed in a typical week
C5      WtFeel     Does student feel he/she is Overweight, Underweight, About Right?
C6      Height     Self-reported height, inches
C7      IdealHt    Student's choice of ideal height, inches
C8      momheight  Mother's height, inches
C9      dadheight  Father's height, inches
C10     Hand       Are you Left-handed or Right-handed?
C11     Looks      On a scale of 1-25, importance of personality (1) versus looks (25)
C12     Friends    Who is easiest to make friends with? (Opposite sex, Same sex)
C13     Cheat      Would you tell the instructor if you saw somebody cheating on an exam?
C14     Smoke      Does student smoke at least one pack of cigarettes per week?

First, a quick review of variable types. Choose either categorical, quantitative (numerical), or ordinal for each of the following variables.

Sex:
GPA:
Seat:
alcohol:
WtFeel:
Height:
IdealHt:
momheight:
dadheight:
Hand:
Looks:
Friends:
Cheat:
Smoke:

Minitab Directions: Start Minitab on your computer using Start/All Programs/Spreadsheets and Statistics/Minitab 13 for Windows, and then click on the Minitab icon. Click on the data link, copy the data from the file UCDavis2 (find the link on the website), and paste it into the Minitab "data" window.

Part 1: Summarizing One Categorical Variable

Let’s focus on the “Seat” variable.

1. Here is the one-way table for the Seat variable obtained from Minitab:

Minitab Directions: Select Stat/Tables/Cross Tabulation, enter Seat in the box of Classification variables, and also select the “Total Percent” option.
Tabulated Statistics: Seat

Rows: Seat

        Count   % of Tbl
B          46      19.33
F          53      22.27
M         139      58.40
All       238     100.00

2. What is the percentage of students sitting in the front of the class?

3. Construct a pie chart for the Seat variable.

Minitab Directions: Select Graph/Pie Chart and enter Seat in the box named “Chart data in”.

Write a sentence describing what this pie chart shows.

From the pie chart we can see that most of the students (58%) prefer to sit in the middle of the class. We can also observe that the percentage of students who prefer to sit in the front is slightly larger than the percentage who prefer to sit at the back of the classroom.

Part 2: Summarizing Two Categorical Variables

Now let’s focus on the variables Sex and perception of weight (WtFeel).

1. Below is a contingency table for the relationship between Sex and perception of weight (WtFeel). Note that Sex is used as the row variable (explanatory) and WtFeel as the column variable (response). In addition to counts, row percentages are also displayed (beneath each count).

Minitab Directions: Select Stat/Tables/Cross Tabulation, enter Sex and WtFeel in the box of Classification variables, and select the “Row Percent” option. (Unselect the “Total Percent” option.)

Tabulated Statistics: Sex, WtFeel

Rows: Sex   Columns: WtFeel

          AboutRt    OverWt   UnderWt       All
Female        109        32         7       148
            73.65     21.62      4.73    100.00
Male           56        15        13        84
            66.67     17.86     15.48    100.00
All           165        47        20       232
            71.12     20.26      8.62    100.00

2. Report the following percentages:

Percentage of female students thinking they are overweight:
Percentage of male students thinking they are overweight:
Percentage of students thinking that they are overweight:

Part 3: Assessing the Statistical Significance of the Relationship between Two Categorical Variables

These questions are based on Exercise 6.62 on page 200. Here we will assess the question: “Is there a relation between Sex and WtFeel?” That is, we will use our sample to make generalizations about the population.

1.
Below, I have copied and pasted a contingency table for the relationship between Sex and perception of weight (WtFeel). Note that Sex is used as the row variable and WtFeel as the column variable. The table displays the observed counts, with the expected counts beneath them.

Minitab Directions: Select Stat/Tables/Cross Tabulation, enter Sex and WtFeel in the box of Classification variables, and select the “Chi Square Analysis” option and the “Above the Expected Count” option. (Unselect the “Row Percent” option.)

Tabulated Statistics: Sex, WtFeel

Rows: Sex   Columns: WtFeel

          AboutRt    OverWt   UnderWt       All
Female        109        32         7       148
           105.26     29.98     12.76    148.00
Male           56        15        13        84
            59.74     17.02      7.24     84.00
All           165        47        20       232
           165.00     47.00     20.00    232.00

Chi-Square = 7.921, DF = 2, P-Value = 0.019

2. Write the null and the alternative hypotheses for testing whether males and females differ in their perceptions of their own weight.

Null Hypothesis:
Alternative Hypothesis:

3. Under the null hypothesis, what is the expected number of men in this sample who think they are overweight? (The expected count for each “cell” is the number under the observed count.)

4. What is the value of the chi-square statistic, and what is the p-value? (See the line underneath the table.)

5. Interpret the p-value in the context of this example (i.e., can we reject the statement in the null hypothesis or not?).
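
Optional check: the row percentages, expected counts, and chi-square statistic in the Minitab output can be reproduced by hand from the observed counts alone. The following is a minimal Python sketch, not part of the Minitab activity; the variable names are illustrative, and the p-value line relies on the assumption that DF = 2, for which the chi-square upper-tail probability equals exp(-x/2).

```python
import math

# Observed counts copied from the Sex x WtFeel table above.
# Column order: AboutRt, OverWt, UnderWt.
columns = ["AboutRt", "OverWt", "UnderWt"]
observed = {
    "Female": [109, 32, 7],
    "Male": [56, 15, 13],
}

# Marginal totals.
row_totals = {sex: sum(counts) for sex, counts in observed.items()}
col_totals = [sum(observed[sex][j] for sex in observed) for j in range(len(columns))]
grand_total = sum(row_totals.values())

# Row percentages (Part 2): each count divided by its row total.
row_percents = {
    sex: [100 * c / row_totals[sex] for c in counts]
    for sex, counts in observed.items()
}

# Expected count under the null hypothesis of no relationship:
# (row total * column total) / grand total.
expected = {
    sex: [row_totals[sex] * ct / grand_total for ct in col_totals]
    for sex in observed
}

# Chi-square statistic: sum over all cells of (observed - expected)^2 / expected.
chi_square = sum(
    (o - e) ** 2 / e
    for sex in observed
    for o, e in zip(observed[sex], expected[sex])
)

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(columns) - 1)

# Shortcut valid only for df == 2: P(X > x) = exp(-x / 2).
assert df == 2
p_value = math.exp(-chi_square / 2)

for sex in observed:
    print(sex, "row %:", [round(p, 2) for p in row_percents[sex]])
    print(sex, "expected:", [round(e, 2) for e in expected[sex]])
print(f"Chi-Square = {chi_square:.3f}, DF = {df}, P-Value = {p_value:.3f}")
```

The last printed line should agree with the line underneath the Minitab table. For tables with other degrees of freedom the exp(-x/2) shortcut does not apply, and a chi-square table or statistical software is needed for the p-value.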