Activity 1

Analyzing Categorical Random Variables
Today's activities are based on the dataset UCDavis2. The data were collected in Fall 2000
from n=239 college students. There are fourteen columns of data (variables), and here is their
description:
Column  Name       Description
------  ---------  ----------------------------------------------------------------------
C1      Sex        Male or Female
C2      GPA        Student's GPA
C3      Seat       Typical classroom seat location (Front Middle Back)
C4      alcohol    Number of alcoholic beverages consumed in typical week
C5      WtFeel     Does student feel he/she is Overweight, Underweight, About Right?
C6      Height     Self-reported height, inches
C7      IdealHt    Student's choice of ideal height, inches
C8      momheight  Mother's height, inches
C9      dadheight  Father's height, inches
C10     Hand       Are you Left-handed or Right-handed?
C11     Looks      On a scale of 1-25 importance of personality (1) versus looks (25)
C12     Friends    Who is easiest to make friends with? (Opposite sex, Same sex)
C13     Cheat      Would you tell the instructor if you saw somebody cheating on an exam?
C14     Smoke      Does student smoke at least one pack of cigarettes per week?
First, a quick review of variable types: classify each of the following variables as categorical,
quantitative (numerical), or ordinal.
Sex :
GPA :
Seat :
alcohol :
WtFeel :
Height :
IdealHt :
momheight :
dadheight:
Hand :
Looks :
Friends :
Cheat :
Smoke :
Minitab Directions: Start Minitab on your computer using Start/All Programs/Spreadsheets and
Statistics/Minitab 13 for Windows, and then click on the Minitab icon. Click on the data link, copy the
data from the file UCDavis2 (find the link on the website), and paste it into the Minitab "data" window.
Part 1: Summarizing One Categorical Variable
Let’s focus on the “Seat” variable.
1. Here is the one-way table for the Seat variable obtained from Minitab:
Minitab Directions: Select Stat/Tables/Cross Tabulation, enter Seat in the Classification
variables box, and select the “Total Percent” option.
Tabulated Statistics: Seat
Rows: Seat
       Count   % of Tbl
B         46      19.33
F         53      22.27
M        139      58.40
All      238     100.00
2. What is the percentage of students sitting in the front of the class?
3. Construct a pie chart for the seat variable.
Minitab Directions: Select Graph/Pie chart and enter Seat in the box named chart data in. Write a
sentence describing what this pie chart shows.
From the pie chart we can see that most of the students (58%) prefer to sit in the middle
of the class. We can also see that the percentage of students who prefer to sit in the
front (22%) is slightly larger than the percentage who prefer to sit at the back of the
classroom (19%).
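As a cross-check on the Minitab output, the "% of Tbl" column can be reproduced by hand: each count is divided by the table total and multiplied by 100. The short Python sketch below is only an illustration; the names `seat_counts` and `percent_of_total` are made up for this example, and the counts are typed in from the one-way table rather than read from the UCDavis2 file.

```python
# Counts typed in from the Minitab one-way table for Seat
# (B = Back, F = Front, M = Middle).
seat_counts = {"B": 46, "F": 53, "M": 139}


def percent_of_total(counts):
    """Return each category's share of the table total, as a percentage."""
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}


seat_percents = percent_of_total(seat_counts)

# Print the percentages with two decimals, as Minitab does.
for category, pct in seat_percents.items():
    print(f"{category}: {pct:.2f}%")
```

The same percentages are what the pie chart slices represent, so this is also a quick way to check the sentence written for question 3.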
Part 2: Summarizing Two Categorical Variables
Now let’s focus on the variables Sex and perception of weight (WtFeel).
1. Below is a contingency table for the relationship between Sex and perception of weight
(WtFeel). Note that Sex is used as the row variable (explanatory) and WtFeel as the column
variable (response). In addition to counts, row percentages are also displayed.
Minitab Directions: Select Stat/Tables/Cross Tabulation, enter Sex and WtFeel in the
Classification variables box, and select the “Row Percent” option. (Unselect the “Total Percent”
option.)
Tabulated Statistics: Sex, WtFeel
Rows: Sex     Columns: WtFeel
          AboutRt   OverWt   UnderWt      All
Female        109       32         7      148
            73.65    21.62      4.73   100.00
Male           56       15        13       84
            66.67    17.86     15.48   100.00
All           165       47        20      232
            71.12    20.26      8.62   100.00
2. Report the following percentages:
Percentage of female students thinking they are overweight:
Percentage of male students thinking they are overweight:
Percentage of students thinking that they are overweight:
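Row percentages condition on the row variable: each count is divided by its own row total, not by the table total. The Python sketch below reproduces the second line of each cell in the Part 2 table under that definition. It is only an illustration; the names `observed`, `row_percents`, and `percents` are made up for this example, and the counts are typed in from the table.

```python
# Observed counts typed in from the Sex by WtFeel contingency table.
observed = {
    "Female": {"AboutRt": 109, "OverWt": 32, "UnderWt": 7},
    "Male": {"AboutRt": 56, "OverWt": 15, "UnderWt": 13},
}
categories = ["AboutRt", "OverWt", "UnderWt"]


def row_percents(row):
    """Divide each count in a row by the row total and express it as a percentage."""
    total = sum(row.values())
    return {category: 100 * n / total for category, n in row.items()}


# The "All" row adds the two sexes together, column by column.
all_row = {c: sum(observed[sex][c] for sex in observed) for c in categories}

percents = {sex: row_percents(row) for sex, row in observed.items()}
percents["All"] = row_percents(all_row)

for label, row in percents.items():
    cells = "  ".join(f"{c}: {row[c]:.2f}%" for c in categories)
    print(f"{label:<7} {cells}")
```

Comparing the "OverWt" entries across the Female, Male, and All rows gives the three percentages asked for in question 2.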
Part 3: Assessing the Statistical Significance of the Relationship between two
Categorical Variables.
These questions are based on Exercise 6.62 on page 200. Here we will assess the question:
“Is there a relation between Sex and WtFeel?” So we will use our sample to make
generalizations about the population.
1. Below, I have copied and pasted a contingency table for the relationship between Sex and
perception of weight (WtFeel). Note that Sex is used as the row variable and WtFeel as the
column variable. The table displays observed counts and expected counts.
Minitab Directions: Select Stat/Tables/Cross Tabulation, enter Sex and WtFeel in the
Classification variables box, select the “Chi Square Analysis” option and “Above the Expected Count”
option. (Unselect the “Row Percent” option.)
Tabulated Statistics: Sex, WtFeel
Rows: Sex     Columns: WtFeel
          AboutRt   OverWt   UnderWt      All
Female        109       32         7      148
           105.26    29.98     12.76   148.00
Male           56       15        13       84
            59.74    17.02      7.24    84.00
All           165       47        20      232
           165.00    47.00     20.00   232.00
Chi-Square = 7.921, DF = 2, P-Value = 0.019
2. Write the null and the alternative hypotheses for testing whether males and females differ in
their perceptions of their own weight.
Null Hypothesis:
Alternative Hypothesis:
3. Under the null hypothesis, what is the expected number of men in this sample who think
they are overweight? (The expected count for each “cell” is the number printed under the
observed count.)
4. What is the value of the chi-square statistic, and what is the p-value? (See the line
underneath the table.)
5. Interpret the p-value in the context of this example (i.e., can we reject the null
hypothesis or not?).
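The numbers in the Part 3 output can be checked by hand. Under the null hypothesis of no relationship, the expected count in a cell is (row total × column total) / grand total, and the chi-square statistic is the sum over all cells of (observed − expected)² / expected. The Python sketch below works through this for the Sex by WtFeel table. It is an illustration under stated assumptions, not a description of how Minitab computes its output: the function names are made up for this example, and the p-value line uses the closed form exp(−x/2), which holds only for a chi-square distribution with 2 degrees of freedom.

```python
import math

# Observed counts typed in from the table; columns are AboutRt, OverWt, UnderWt.
observed = [
    [109, 32, 7],   # Female
    [56, 15, 13],   # Male
]


def expected_counts(table):
    """Expected count per cell under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


expected = expected_counts(observed)
chi2 = chi_square_statistic(observed)

# Degrees of freedom: (number of rows - 1) * (number of columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper-tail probability. This shortcut is valid ONLY because df == 2 here;
# other degrees of freedom need a chi-square table or statistical software.
p_value = math.exp(-chi2 / 2)

print(f"Expected count, Male/OverWt: {expected[1][1]:.2f}")
print(f"Chi-Square = {chi2:.3f}, DF = {df}, P-Value = {p_value:.3f}")
```

The largest contributions to the statistic come from the UnderWt column, where the observed counts (7 and 13) are furthest from the expected counts (12.76 and 7.24).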