Software for data management: The contribution of Stata

Dr Karen Robson, Senior Research Fellow, The Geary Institute, University College Dublin, Ireland
Getting acquainted with Stata




- StataCorp develops and distributes Stata, software for statistical analysis.
- Stata is available for Windows, Macintosh, and Unix computers.
- Stata is used by medical researchers, biostatisticians, epidemiologists, economists, sociologists, political scientists, geographers, psychologists, social scientists, and other research professionals needing to analyze data.
- Gaining popularity in the social and medical sciences
- Particularly useful for handling large-scale longitudinal data
Stata SE (for large data sets)
- can analyze datasets with as many as 32,766 variables, and the only limit on observations is the amount of RAM on your computer
- can handle string variables with a maximum length of 244 characters
- can handle matrices up to 11,000 x 11,000
- requires at least 512 megabytes of RAM and 80 megabytes of disk space

Stata/Intercooled (the standard one)
- can analyze datasets with as many as 2,047 variables, and the only limit on observations is the amount of RAM on your computer
- can handle string variables with a maximum length of 244 characters
- can handle matrices up to 800 x 800

Small Stata
- A smaller, student version of Stata (for educational purchases only)

Stata MP
- The fastest version of Stata (for dual-core and multicore/multiprocessor computers)
- Stata/MP is the fastest and largest version of Stata.

Resources
- StataCorp website (www.stata.com)
- Timberlake website (www.timberlake.co.uk)
- UCLA Stata “portal” (statcomp.ats.ucla.edu/stata or http://www.ats.ucla.edu/stat/)
- Statalist (www.hsph.harvard.edu/statalist)
- Stata Journal (www.stata-journal.com)
As well, available Dec 2008
Launching Stata
- OS contingent
- Default window preferences
- Window preferences fully adjustable
- Auto memory set

Comparing with SPSS
- Start up differences
- With data file open
- Viewing data
  - data viewer, data editor
- Viewing variables
- Viewing output/commands
  - output window buffer, log files
- Syntax and “do files”
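
As a rough illustration of the viewing options listed above, the commands below could be typed in the command window or run from a do file. The example uses auto.dta, an example dataset shipped with Stata; that choice is an assumption for illustration and is not part of the slides. browse opens the data viewer and edit opens the data editor.

```stata
* Load an example dataset shipped with Stata (illustrative choice)
sysuse auto, clear

* Viewing variables: names, storage types, and labels
describe

* Viewing data in the output window: first five observations
list make price mpg in 1/5

* Viewing data interactively; these open windows, so they are meant
* for an interactive session rather than batch mode
browse
edit
```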
INPUT: Variable window, Stata command window, Review window, Do file, Pull-down menu
Computation
RESULTS: Output window, Log file
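
The flow from input through computation to results can be sketched as a short do file that also records its output in a log file. This is a minimal sketch only: the log file name session.log and the auto.dta example dataset are assumptions for illustration, not taken from the slides.

```stata
* Start recording output in a plain-text log (file name is illustrative)
log using "session.log", text replace

* Input: load an example dataset shipped with Stata
sysuse auto, clear

* Computation: results appear in the output window and in the log
summarize price mpg

* Temporarily stop logging, then resume
log off
describe
log on

regress price mpg weight

* Close the log file
log close
```

Saved under a hypothetical name such as analysis.do, the file could be run with `do analysis.do`. The output of describe is shown on screen but not written to the log, because logging is switched off around it.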
Advantages and disadvantages of Stata
- User driven
- Free STBs (Stata Technical Bulletins)
- Dedicated journal
- Web active
- Memory requirements
- Backward compatible

- Change!
- SPSS dominance
- Orientated to writing syntax/code
- Pull-down windows debate! Available from version 8 onward
Advantages and disadvantages of Stata
- Easier code
- Easier data handling
- Clarity of operations/feedback
- Results table function

- Before version 8, limited graphics
- Now, complex graphics
- Variable labelling
- Editing of output
Advantages and disadvantages of Stata
- Nested/master do files
- Flexible terminology
- Setting types of data
- Interactive help
- Switch output (log file) on/off
- Copy and paste
Overview of analytic techniques
- Too numerous to mention!
- Comprehensive manuals
- A selection:
  - All types of regression
  - Survey package
  - Epidemiological package
  - Multilevel modelling
  - Time series functions
  - Cluster analysis

Data
- Data files .dta
- Stat/Transfer software

Stata – using wide and long file formats
- Wide file formats (everything you add goes to the right of the existing data)
- Long file formats (everything you add goes underneath the existing data)

MERGE: Data 1 and Data 2 side by side
APPEND: Data 1 with Data 2 underneath
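
A minimal sketch of the two operations, using tiny made-up datasets. The variable names id, age, and income, and the third dataset used for the append, are hypothetical and not from the slides. The `merge 1:1` syntax assumes Stata 11 or later; earlier versions used a different form of the merge command. Because the temporary file names are local macros, the block should be run as a whole.

```stata
* Data 1: three people (made-up values)
clear
input id age
1 34
2 51
3 28
end
tempfile data1
save `data1'

* Data 2: the same people, with a new variable
clear
input id income
1 30000
2 42000
3 25000
end
tempfile data2
save `data2'

* Data 3: new people, same variable as Data 1
clear
input id age
4 45
5 39
end
tempfile data3
save `data3'

* MERGE: Data 2 goes to the right of Data 1 (adds a variable)
use `data1', clear
merge 1:1 id using `data2'
list

* APPEND: Data 3 goes underneath Data 1 (adds observations)
use `data1', clear
append using `data3'
list
```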
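_merge values
- 1: observation found only in Data 1 (indi), the ‘master’ file
- 2: observation found only in Data 2 (indj), the ‘using’ file
- 3: observation found in both files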
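
To see the _merge codes in practice, the sketch below merges two small made-up files whose identifiers only partly overlap. The variable names id, wave1, and wave2 are hypothetical, and the `merge 1:1` syntax assumes Stata 11 or later. The block should be run as a whole, since the temporary file names are local macros.

```stata
* 'Master' file: individuals 1 to 3 (made-up values)
clear
input id wave1
1 10
2 20
3 30
end
tempfile masterfile
save `masterfile'

* 'Using' file: individuals 2 to 4
clear
input id wave2
2 25
3 35
4 45
end
tempfile usingfile
save `usingfile'

* Merge on id; Stata creates the variable _merge automatically
use `masterfile', clear
merge 1:1 id using `usingfile'

* Inspect the match status of each observation
tabulate _merge
list id _merge

* Keep only observations found in both files, then tidy up
keep if _merge == 3
drop _merge
```

Here individual 1 is in the master file only (_merge is 1), individual 4 is in the using file only (_merge is 2), and individuals 2 and 3 are matched (_merge is 3).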