Teaching with Stata
Peter A. Lachenbruch & Alan C. Acock
Oregon State University
peter.lachenbruch@oregonstate.edu
alan.acock@oregonstate.edu
First Course Requirement: Data Entry
• A first course should let students do the things I
want them to be able to do:
– Enter and edit data – the topic must be one they “want to know” about
– Students can do a small survey to get data on topics
of interest to them.
• Voter poll
• Attitudes toward diversity issues on campus
• Beliefs about regulating the internet
– Learn how to create a codebook; use codebook and
codebook, compact
• Where possible use “real” data
WCSUG Presentation
First Course Requirement: Data Management
• Balance statistical content with proper data
management content—hard decision
• Storing original dataset and creating a working
dataset
• Keeping a record of every data modification they
make using do-file
– Menu system is an aid
– Do-files are the requirement
• Missing values--distinguish types
• Variable names, labels, and value labels
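The workflow on this slide might look like the following do-file. It is a minimal sketch, not a prescribed template: the file names, the variable attitude, and the codes -8 and -9 are all hypothetical, chosen only to show an untouched original, a working copy, two distinct kinds of missing value, and labels.

```stata
* Hypothetical survey item: -8 = refused, -9 = don't know
clear
input id attitude
1 4
2 -8
3 2
4 -9
5 5
end
save survey_original, replace          // the original file is never edited again

* All changes are made to a working copy and recorded in this do-file
use survey_original, clear
mvdecode attitude, mv(-8=.a \ -9=.b)   // keep the two kinds of missing distinct
label variable attitude "Attitude toward campus diversity (1-5)"
label define attlbl 1 "Strongly disagree" 5 "Strongly agree" .a "Refused" .b "Don't know"
label values attitude attlbl
save survey_working, replace
tabulate attitude, missing
```

Because every modification lives in the do-file, rerunning it from the original data reproduces the working dataset exactly.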
First Course Requirements: Data Management
• Transformations – log, square root, exp
• Logical editing – beware of logical expressions when
missing values are present: Stata treats missing (“.”)
as larger than any number, so gen y = x < 10 turns a
missing x into 0
• Appending
– Append student generated datasets
• Merging
– Merging two waves of data
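The logical-editing warning above is easy to demonstrate on a three-row toy dataset. The variable names y_bad and y_ok are made up for this sketch; the if !missing(x) qualifier is one common way to keep missing values missing.

```stata
clear
input x
5
12
.
end
generate byte y_bad = x < 10                  // missing x is coded 0
generate byte y_ok  = x < 10 if !missing(x)   // missing x stays missing
list
```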
First Course Requirements: Data Management
• Constructing Measures
– When to use egen newvar = rowtotal(var1 var2 var3)
– When to use egen newvar = rowmean(var1 var2 var3)
– When to use the user-written misschk command, and what it does
• Suppose the variable category is 0 or 1
• If there are missing values in category, there is a
difference between
– gen y = 1 if category
– gen y = 1 if (category==1)
– gen y = 1 if (category>0)
The first and third give a score of 1 when category is missing,
because Stata treats missing as a large nonzero number. The
second leaves y missing for those observations – BEWARE
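Both points on this slide can be checked on a small made-up dataset. The variable names tot, avg, y1, y2, and y3 are hypothetical; the third row is entirely missing so the differences are visible.

```stata
clear
input category var1 var2 var3
1 2 4 6
0 1 . 5
. . . .
end
egen tot = rowtotal(var1 var2 var3)   // missing items count as 0
egen avg = rowmean(var1 var2 var3)    // mean of the nonmissing items only
generate y1 = 1 if category           // missing is nonzero, so treated as true
generate y2 = 1 if (category==1)
generate y3 = 1 if (category>0)       // missing is larger than any number
list
```

In the all-missing row, tot is 0 while avg is missing, and y1 and y3 are 1 while y2 is missing.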
First Course Requirements: Data Management
• edit command, input, insheet (csv files), infile
• gen newvar = ln(oldvar)
• Rarely use replace oldvar = sqrt(oldvar)
– only when correcting an error – don’t replace
data
• merge ptid assessment using file,
update (both datasets must be sorted by ptid and assessment)
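A small sketch of an update merge follows. It uses the merge 1:1 syntax introduced in Stata 11, which does not require the data to be sorted first; the slide's form is the older syntax. The file names wave_master and wave_update and all data values are hypothetical.

```stata
clear
input ptid assessment score
1 1 10
1 2 .
2 1 7
end
save wave_master, replace

clear
input ptid assessment score
1 2 12
2 2 9
end
save wave_update, replace

use wave_master, clear
merge 1:1 ptid assessment using wave_update, update
sort ptid assessment
list    // _merge shows which rows matched, were updated, or were added
```

With update, a missing score in the master data is filled in from the using data, while nonmissing master values are left alone.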
First Course Requirement (2)
– Data presentation, numerical summary measures –
summarize, detail; list; browse; edit;
describe; codebook; codebook, compact
– Graphic presentation – bar chart, histogram, and box plot
seem the minimum
– Probability computations – binomial,
binomialtail, chi2, chi2tail, F, Ftail,
normal – and use of the inverse functions for these (e.g. invnormal, invchi2).
Examples
• summarize sp, detail; list sp;
describe s*; codebook s*
• display binomial(10,3,0.1) for the
cumulative probability, or display
Binomial(10,3,.1) for the reverse
cumulative. Note that disp 1-binomial(10,2,.1) gives the
same result as the latter (as does
binomialtail(10,3,.1))
• display normal(1.2)
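The probability functions above can be tried directly with display. This sketch uses only built-in functions; the inverse functions invnormal() and invchi2tail() are included as examples of the inverse functions mentioned on the previous slide.

```stata
display binomial(10, 3, 0.1)        // P(X <= 3) for n = 10, p = 0.1
display binomialtail(10, 3, 0.1)    // P(X >= 3)
display 1 - binomial(10, 2, 0.1)    // same probability as the line above
display normal(1.2)                 // P(Z <= 1.2), about 0.885
display invnormal(0.975)            // about 1.96
display invchi2tail(1, 0.05)        // about 3.84
```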
• gen y =
First Course Requirement (3)
• Confidence intervals
– Binomial – ci: ci variable, binomial
– Normal – ci: ci variable
– Poisson – ci: ci variable, poisson
• Percentiles –
– summarize,d
– centile price, c(10(10)90)
Examples
• cii 20 4;
– cii 20 4, agresti
• Sometimes we want to use the Agresti formulation. The
exact interval is usually preferable
• ci varname, level(99)
• summarize weakness, detail
– Can use su weakn,d (i.e. abbreviate
commands, options and variables)
• centile weakness,c(20,40,60,80)
– Or centile weakness,c(20(20)80)
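The percentile commands above can be tried on the auto dataset that ships with Stata. Using price in place of the slide's weakness variable is an assumption made only so the sketch runs anywhere.

```stata
sysuse auto, clear
summarize price, detail               // standard percentiles plus moments
centile price, centile(10(10)90)      // deciles, each with a confidence interval
centile price, centile(20 40 60 80)   // the same idea with an explicit list
```

summarize, detail reports a fixed set of percentiles, while centile lets students request any percentile and see its confidence interval.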
First Course Requirements (4)
• Hypothesis Testing:
– Normal r.v.s
• One sample (including paired data) – ttest
• Two sample – ttest
• K samples – ANOVA
– Binomial variables
• One sample – proportion
• Two samples – tabulate, chi2
Examples
• ttest sp = 120 [one-sample]
• ttest spmen = spfem [paired]
• ttest spmen = spfem, unpaired unequal welch
• ttest sp, by(sex) [unequal welch etc.]
• Also an immediate form (ttesti) – see help ttest
• anova sp agegrp
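For a runnable version of these commands, the sketch below swaps in the auto dataset shipped with Stata (mpg, foreign, and rep78 stand in for the slide's sp, sex, and agegrp). The summary numbers given to ttesti are made up for illustration.

```stata
sysuse auto, clear
ttest mpg == 20                        // one-sample test against 20
ttest mpg, by(foreign)                 // two-sample, equal variances assumed
ttest mpg, by(foreign) unequal welch   // Welch's approximation
ttesti 74 21.3 5.8 20                  // immediate form: n, mean, sd, null value
anova mpg rep78                        // k samples
```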
Examples
• bitest success = 0.8 [one sample binomial]
• tabulate success group, chi2 row col
• prtest success, by(group) [two sample binomial]
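The binomial examples can be run on the auto dataset by constructing a 0/1 outcome. The variable highmpg, the cutoff of 25, and the null value 0.2 are hypothetical stand-ins for the slide's success and group variables.

```stata
sysuse auto, clear
generate byte highmpg = mpg > 25         // mpg has no missing values in auto
bitest highmpg == 0.2                    // one-sample binomial test
tabulate highmpg foreign, chi2 row col   // 2 x 2 table with chi-squared test
prtest highmpg, by(foreign)              // two-sample test of proportions
```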
First Course Requirements (5)
• Hypothesis Testing (cont.)
– Power considerations – sampsi
(or a spreadsheet – a nice exercise for some of the
stronger students)
– Nonparametric methods – signtest, signrank,
ranksum
• Contingency tables – tabulate, epitab
Examples
• sampsi 132.86 127.44, p(0.8) r(2) sd1(15.34) sd2(18.23)
• ranksum sp, by(survive)
• signrank before = after
• When should we supplement Stata with other
software, such as G*Power 3 (free and more
flexible than sampsi) or packages such as
PASS or nQuery Advisor?
First Course Requirements (6)
• Simple linear regression – regress,
rvfplot, other diagnostics
• Correlation – corr, spearman, ktau – I tend
not to use corr because of the sensitivity to the
normality assumption for tests and confidence
intervals
• Only pwcorr, not corr, provides a test of
significance
Examples
• regress mpg weight
• rvfplot
• Stata’s “type a little, get a little” approach is very
different from other packages
• correlate mpg weight or pwcorr mpg
weight (pwcorr is especially useful when you have more
than 2 variables; you can specify sig and obs. Note that
these options only work with pwcorr)
• spearman mpg weight – would be nice to
have Stata produce a Spearman correlation
matrix
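These commands run as written on the auto dataset shipped with Stata. The sketch below strings them together; adding price to pwcorr and the yline(0) option on rvfplot are choices made here for illustration, not part of the slide.

```stata
sysuse auto, clear
regress mpg weight
rvfplot, yline(0)                  // residuals versus fitted values
pwcorr mpg weight price, sig obs   // p-values and pairwise n under each r
spearman mpg weight                // rank correlation for one pair
```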
Examples
• It’s easy to use permutation tests
. permute anyhcq t=r(t): ttest ald7 if adult==1 & assnum==1, by(anyhcq)
(running ttest on estimation sample)

Monte Carlo permutation results                    Number of obs   =        97

      command:  ttest ald7, by(anyhcq)
            t:  r(t)
  permute var:  anyhcq

------------------------------------------------------------------------------
T            |     T(obs)       c       n   p=c/n   SE(p) [95% Conf. Interval]
-------------+----------------------------------------------------------------
           t |   1.648305      13     100  0.1300  0.0336   .071073   .2120407
------------------------------------------------------------------------------
Note: confidence interval is with respect to p=c/n.
Note: c = #{|T| >= |T(obs)|}
• One can do similar things with the bootstrap
• These are easy to use and intuitive for students
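A bootstrap counterpart to the permutation test might look like the sketch below, again on the auto dataset rather than the data used on the slide. The statistic name diff, the number of replications, and the seed are arbitrary choices for illustration.

```stata
sysuse auto, clear
* Permutation test of the t statistic, as on the slide but with auto data
permute foreign t=r(t), reps(200) seed(12345) nodots: ttest mpg, by(foreign)
* Bootstrap the difference in group means from the same command
bootstrap diff=(r(mu_1)-r(mu_2)), reps(200) seed(12345) nodots: ttest mpg, by(foreign)
```

Both prefixes wrap an ordinary command, which is part of what makes them approachable for students: the analysis they already know is simply repeated on reshuffled or resampled data.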
Use of Stata in the Classroom
• Use Stata sparingly
– It’s not easy to follow commands typed or used from
menus – students will get confused
– Have handouts of what you do – make spacing large
enough that students can annotate – even if only to
write nasty things about the instructor
– Balancing coverage of Stata (e.g. data management)
with coverage of statistics is a constant issue
– Remember – it’s a course in statistics, not in Stata
Data Sets
• Place data sets on a LAN or common
drive, or make them available for copying to
a flash drive or CD
• Use real data
– Not too many variables
– May have missing values – but should not
affect main analyses – unless you want to
demonstrate the problems with missing
values
In the Classroom
• Using CD rather than flash drive is
better(?)
– Many desktops have USB port located
inconveniently (darn you Dell!)
– Sometimes newer PCs have USB port on
monitor, and laptops usually have an easy
slot for the flash drive
– Light level in the room should allow students
to read easily
– Days of dim projectors are over
In the Classroom (2)
• Enlarge the Stata font by right-clicking
in the window
– I have found that 14 point is pretty good
– Be careful about wraparound of output – if
needed, reduce point size temporarily
– Don’t ever use red on blue font
– See what I mean? It’s more difficult to read
• Show how to move and fix windows
In the Classroom (3)
• Optimizing visibility with projector
– Use rich color background
– EditPreferencesGeneral preferences.
Blue background option good but it relies on
red for errors, green for Standard text, and
doesn’t bold fonts.
– Custom may be better because you can make
fonts bold and pick colors that do not
disadvantage students who are colorblind.
Virtual Lab
• A server supporting 30 simultaneous sessions of
Stata is remarkably inexpensive.
• A department can require students to have
laptops or provide a cart with enough laptops
• Because the laptops act as “dumb” terminals
to the server, they can be cheap and need not be
updated very often
• Any room becomes a lab
• Students should have 24/7 access to the server
Handouts and Data Sets
• Have handouts of your lecture notes
• Have handouts of your data analysis
demonstrations
– Include commands as well as output!
• Data sets
– Online – LAN, CD, or floppy disk – lots of laptops
don’t have floppy drives any more, and flash drives are
inexpensive
• Include
– Student generated datasets
– Datasets with large Ns and relatively few variables
Emphasis in Course
• Lectures devoted to statistics
• Labs devoted to learning Stata, working on
homework, and discussion
• Proper printing of output
– Don’t split output between two pages if
possible (at least, find a good break point)
– Always use a monospaced font (such as Courier
New)
Some Final Issues
• Multiple testing can distort inference (e.g.
doing 100 tests all but guarantees some
significant results – but they may be
meaningless) – Worry about this
• Controlling the digits in the output. Use
outreg, estout, esttab
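The multiple-testing warning on this slide can be made concrete with two display commands. This is a back-of-the-envelope sketch that assumes 100 independent tests, each at the 5% level, with every null hypothesis true; those conditions are assumptions added here, not stated on the slide.

```stata
* Assumes 100 independent tests at the 5% level, all nulls true
display 100 * 0.05      // expected number of false positives: 5
display 1 - 0.95^100    // chance of at least one false positive, about 0.994
```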
The End