Exercise 1 - Data Services

Stata Intro
Practice Exercises
2014 - Debby Kermer, George Mason University Libraries Data Services
Instructions
Create and run syntax to accomplish each task.
Press the spacebar to see the next instruction, an answer, or a hint.
Open the Pew Social Trends Dataset
use http://dataservices.gmu.edu/files/pew.dta
(the URL given at the workshop)
OR
File | Open… and type in http://dataservices.gmu.edu/files/pew.dta
OR
Download the dataset from:
http://www.pewsocialtrends.org/category/datasets/?download=5753
1
Exercise 1
Using Help
1a
Produce statistics about yrborn using the summarize command
summarize yrborn
Open the help for that command
help summarize
Modify the syntax to…
… use abbreviations
sum yrborn
or
sum yr
or
su y
… display additional statistics
sum yr, detail
Need to create yrborn?
generate yrborn = 2005 - age
1b
summarize yrborn
… ignore those who refused to give their age
sum yr if (age != 99)
Forgot which value meant refused?
label list AGE
This also works:
sum yr if (age < 99)
Your result should look like this:
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      yrborn |      2948    1963.089    18.01353       1915       1995
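Why does excluding 99 matter? Because yrborn is computed from age, a refusal coded 99 turns into a fake birth year (2005 - 99 = 1906) that drags the mean down. The sketch below uses three made-up respondents (hypothetical data, not the Pew file) to show this, and to show that != 99 and < 99 select the same cases when 99 is the highest code.

```stata
* Hypothetical toy data: ages 25 and 60, plus one refusal coded 99
clear
input age
25
60
99
end

* Same formula as above: the refusal becomes birth year 1906
generate yrborn = 2005 - age
list age yrborn

* Both conditions keep only the two real ages here
count if (age != 99)
count if (age < 99)
summarize yrborn if (age < 99)
```

One difference to keep in mind: Stata treats a true missing value (.) as larger than any number, so for a missing age, (age < 99) is false while (age != 99) is true.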
Now, summarize age, ignoring those who refused to answer
sum age if (age < 99)
… and ALSO display additional statistics
sum age if (age < 99), detail
1c
Extra Challenge
Compare average age by Region (cregion)
tab cregion if (age < 99), sum(age)
See the help page we used as an example:
help tab, then tabulate, summarize()
Notice how this is a combination of both
tab cregion - frequencies for categorical variables
and
sum age - means for numeric variables
But here summarize is used as an option, so the comma and parentheses are necessary.
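To see that the summarize() option really combines the two commands, here is a sketch on Stata's bundled auto dataset, used only because it ships with Stata; foreign stands in for cregion and mpg for age.

```stata
sysuse auto, clear

* Frequencies of foreign, plus the mean, SD, and count of mpg in each group
tab foreign, sum(mpg)

* The same group means, produced by one summarize per group
bysort foreign: summarize mpg
```

The means in the two outputs should match; the option simply saves running summarize once per category.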
2
Exercise 2
Indicator Variables
Make a new variable "voted" indicating those who voted in the
'04 election. Voters should have a 1, non-voters should have a 0.
First, get information about the variable you will use:
codebook pvote04a
Then, create your variable:
generate voted = (pvote04a == 1)
Use tab pvote04a voted to check your work.
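A comparison in parentheses evaluates to 1 when true and 0 when false, which is what makes generate voted = (pvote04a == 1) an indicator. A minimal sketch on made-up data, assuming for illustration only that 1 = voted and 2 = did not vote, also shows a side effect worth checking for: a missing pvote04a gets 0, not missing.

```stata
* Hypothetical toy data: 1 = voted, 2 = did not vote, . = missing
clear
input pvote04a
1
2
1
.
end

* A comparison in parentheses evaluates to 1 (true) or 0 (false)
generate voted = (pvote04a == 1)

* The missing option shows that the missing row was coded 0, not .
tab pvote04a voted, missing
```

If respondents with a missing pvote04a should stay missing, adding if !missing(pvote04a) to the generate command keeps them as ., the same idea as the if version in 2c.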
2a
2b
If you want, this is how you can label the variable "voted" and its values:
label variable voted "Voted in the '04 Election"
label define yesno 1 "Yes" 0 "No"
label values voted yesno
("yesno" is a made-up name; you can use any name.)
Now, you try: label the variable "youth" and its values appropriately
Need to create "youth"?
generate youth = (age < 30)
replace youth = . if (age == 99)
Then label it:
lab var youth "Youth: age < 30"
lab def under30 1 "< 30 yrs old" 0 "30 yrs and up"
lab val youth under30
2c
Extra Challenge
In one statement (i.e., one line of syntax), create a variable
"legal" indicating only those of legal drinking age (21 and older)
gen legal = (age >= 21) & (age < 99)
or
gen legal = (age >= 21) if (age < 99)
Either way, tab legal should show 2,842 respondents with legal = 1.
Although both of the above are good, the values generated by
these two commands are not identical. How do they differ?
With &, the 99's are recoded as 0.
With if, the 99's are recoded as missing (.).
                                        Legal     Not      No Age
                                        Drinker   Legal    (99)
gen legal = (age >= 21) & (age < 99)    1         0        0
gen legal = (age >= 21) if (age < 99)   1         0        .
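The difference in the table above can be reproduced on three made-up respondents (hypothetical data, not the Pew file). The two result variables get the hypothetical names legal_and and legal_if so both versions can sit side by side; the exercise itself calls each one legal.

```stata
* Hypothetical toy data: a 40-year-old, an 18-year-old, and a refusal (99)
clear
input age
40
18
99
end

* & version: the refusal fails (age < 99), so the whole expression is 0
generate legal_and = (age >= 21) & (age < 99)

* if version: the refusal is skipped entirely, so it is left missing (.)
generate legal_if = (age >= 21) if (age < 99)

list age legal_and legal_if
```

Leaving refusals missing usually matters later: tab and sum drop missing values, whereas a 0 would silently count a refusal as "not legal".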
3
Exercise 3
Illustrating Relationships
3a
Show the relationship between age group and voting rate
What variables can you use?
youth and voted
What command can you use? Open help.
help tab, then tabulate twoway
Construct your syntax
tab youth voted, ___________
Use options to include row percentages and a chi-squared test, like this:
           |         voted
     youth |         0          1 |     Total
-----------+----------------------+----------
         0 |     18.68      81.32 |    100.00
         1 |     47.65      52.35 |    100.00
-----------+----------------------+----------
     Total |     23.05      76.95 |    100.00

          Pearson chi2(1) = 179.6007   Pr = 0.000
3b
tab youth voted, row nofreq chi2
So, is there a relationship between age and voting?
Among those younger than 30, 52% voted.
By contrast, among those 30 or older, 81% voted.
Youth were significantly less likely to have voted (chi2(1) = 179.60, p < .001).
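To see where the row percentages and the test statistic come from, here is a sketch with small made-up counts (hypothetical, not the Pew data), entered once per cell and expanded with a frequency weight. tabulate also leaves the statistic and p-value behind in r(chi2) and r(p), which is handy for reporting.

```stata
* Hypothetical counts: 50 respondents aged 30+ (youth = 0), 50 under 30 (youth = 1)
clear
input youth voted freq
0 0 10
0 1 40
1 0 25
1 1 25
end

* Row percentages: 80% of the older group vs. 50% of the younger group voted
tab youth voted [fweight = freq], row nofreq chi2

* Stored results left behind by tabulate
display "chi2 = " r(chi2) ", p = " r(p)
```

For these counts the statistic works out by hand to 100 x (10 x 25 - 40 x 25)^2 / (50 x 50 x 35 x 65), about 9.89.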
3c
Extra Challenge
What are the 4 ways the tabulate command can be written?
tab youth → 1-way, frequencies
tab youth voted → 2-way, crosstab / contingency table
tab youth voted cregion → error: too many variables
tab1 youth voted cregion → tab youth + tab voted + tab cregion
tab2 youth voted cregion → tab youth voted + tab voted cregion + tab youth cregion
tab youth, sum(voted) → tab youth + sum voted (Means by Group)
tab youth cregion, sum(voted) → tab youth cregion + sum voted (Pivot Table)
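All four forms can be tried on Stata's bundled auto dataset, chosen here only because it ships with Stata; foreign, rep78, and headroom stand in for youth, voted, and cregion, and mpg stands in for the summarized variable. The three-variable tab is left out because it stops with an error.

```stata
sysuse auto, clear

tab foreign                   // 1-way frequencies
tab foreign rep78             // 2-way crosstab
tab1 foreign rep78            // one 1-way table per variable
tab2 foreign rep78 headroom   // every 2-way pairing of the three
tab foreign, sum(mpg)         // means by group
tab foreign rep78, sum(mpg)   // pivot table of means
```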
That's All!
Thanks for trying the Stata Exercises.
If you have any questions about using Stata
contact Debby Kermer at dkermer@gmu.edu
or see our online resources at:
http://dataservices.gmu.edu/software/stata