Optional StataDay

Eleanor Boyle, Adam Lenart and Virginia Zarulli
Epidemiology and Biostatistics, SDU
Autumn 2013
Outline
Introduction
Exercises
Stata-Environment
Stata has several windows:
- Command window: type commands here, for example to display the logarithm of 100:
  display log(100)
  Press the ENTER key to execute the command.
- Results window: shows the result of the command.
- Review window: holds the history of all the commands you have typed.
Do-files
- Script file that holds your Stata commands
- Default extension: .do
- Open a new one via the toolbar
- Type a command like disp "Good Day"
- Select at least part of the line and execute it via CTRL+D (CMD+D for Mac users)
- To run the whole file: CTRL+A, then CTRL+D
Do-files
- Allow you to run a sequence of commands several times
- Keep a record of the commands used to produce results
- Save via the menu or CTRL+S (do it frequently!)
- Comments in do-files: * at the beginning of a line, or everything between /* and */
- Commands over multiple lines: split by ///
  display "A B " ///
          "C D"
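As an illustration of the points above, a minimal do-file might look like the following sketch. The file name example.do and the displayed strings are made up for illustration.

```stata
* example.do: a minimal do-file (hypothetical file name)
* A line starting with * is a comment

/* Everything between these two markers
   is a comment as well */

display "Good Day"

* One command split over two lines with ///
display "A B " ///
        "C D"

display log(100)   // log() is the natural logarithm
```

Select all lines (CTRL+A) and press CTRL+D to run the whole file at once.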
Where to get help
- help correlate
- search covariance
- Stata cheat sheet
- Google
- Search for Stata programs written by others:
  net search unitroot
Stata Data Editor
Open the Data Editor either via
- menu: Data -> Data Editor -> Data Editor (Edit)
- command: edit
The first colored row will contain the variable names.
Assume the first variable is animal.
- First, enter the values of the variable (dog, cat, whale), pressing ENTER after each one.
- Second, go to the Variables window at the lower right and change the default variable name var1 to animal. This name should now appear in the colored row of the table.
- Now you can create another variable, this time called weight, with the values 6.3, 8.9, and 8.
Stata’s .dta files
Save your data via File -> Save and provide the file name, for example animal.
Press OK, and the file is saved in your working directory as animal.dta.
Now clear the memory with
clear
and read the data back in with
use animal, clear
Look at the data with list, browse, or edit.
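The same round trip can also be written as a do-file. The sketch below enters the animal data with the input command instead of the Data Editor, which is a substitution made here for reproducibility; reading the second weight as 8.9 is an assumption.

```stata
* Enter the small animal dataset by command instead of the Data Editor
clear
input str5 animal weight
"dog"   6.3
"cat"   8.9
"whale" 8
end

* Save to animal.dta in the working directory, overwriting an old copy
save animal, replace

* Empty the memory, then read the file back in
clear
use animal, clear
list
```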
Opening and saving data
Open
- open the existing Stata data file animal.dta in your working directory:
  use animal, clear
- open an example dataset from Stata (at http://www.stata-press.com/data/r12/):
  webuse lifeexp, clear
- open a data file (here bodyfat.dta) from our course server:
  use http://www.biostat.sdu.dk/courses/data/bodyfat, clear
Save into your working directory
- save the current data in animalNew.dta:
  save animalNew, replace
Useful commands for working directory
- report the name of the current directory: pwd
- list the contents of the current directory: dir
- change the working directory: cd c:/users/Biostat
- in case of memory problems: help memory
Data import: Excel files
Import of Excel files:
- Menu-oriented: see http://stata.com/stata12/excel-import-export/
- Command, e.g.
  import excel C:/Biostat1/Data.xls, sheet("2012") cellrange(A1:H25) firstrow clear
  see also help import excel
Data import: ASCII (text files)
- Tab separated (-> marks a tab)
  name -> age
  Ute Hansen -> 77.3
  Ib Ibsen -> 22.0
  insheet using mydata.txt, clear names
- Space separated (make sure that variable entries are separated by exactly ONE space (blank))
  name age
  "Ute Hansen" 77.3
  "Ib Ibsen" 22.0
  insheet using mydata.txt, clear names delimiter(" ")
Data import: ASCII (text files)
- Comma separated
  name,age
  Ute Hansen,77.3
  Ib Ibsen,22.0
  insheet using mydata.txt, clear names delimiter(",")
For more information see
http://www.ats.ucla.edu/stat/stata/modules/input.htm
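To try the comma-separated case without preparing a file by hand, the following sketch first writes the three example lines to mydata.txt with Stata's file commands and then reads them back. Writing the file from within Stata is an addition made here for illustration; the comma option is used as an equivalent of delimiter(",").

```stata
* Write a small comma-separated text file into the working directory
tempname fh
file open `fh' using mydata.txt, write text replace
file write `fh' "name,age" _n
file write `fh' "Ute Hansen,77.3" _n
file write `fh' "Ib Ibsen,22.0" _n
file close `fh'

* Read it back: first line holds the variable names
insheet using mydata.txt, comma names clear
list
```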
Data manipulation
webuse lifeexp
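- generate gnppc2 = gnppc^2
  Creates the new variable gnppc2 as the square of gnppc
- gen logPopgrowth = log(popgrowth)
  Takes the (natural!) logarithm
- gen rich = -1 if gnppc >= 20000 & !missing(gnppc)
  replace rich = 0 if gnppc < 20000
  Creates an indicator; !missing() is needed because Stata treats missing values as larger than any number
- recode rich -1=1
  Recodes the value -1 to 1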
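The indicator can also be built in one step. The sketch below is one possible variant, not the only way to do it; the names rich1 and rich2 are chosen here for illustration. It guards against missing values of gnppc, which Stata treats as larger than any number, so that they do not end up in the "rich" group.

```stata
webuse lifeexp, clear

* Two-step version, coding rich countries directly as 1
gen rich1 = 1 if gnppc >= 20000 & !missing(gnppc)
replace rich1 = 0 if gnppc < 20000

* One-step version: the logical expression evaluates to 0 or 1,
* and observations with missing gnppc stay missing
gen byte rich2 = gnppc >= 20000 if !missing(gnppc)

* Check the result, including the missing category
tabulate rich2, missing
```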
Data manipulation
Drop or keep variables
- drop rich
  Drops the variable rich
- keep region country popgrowth lexp gnppc
  Keeps only the listed variables in the data
Drop or keep observations
- drop if region==3
  Drops observations where region is equal to 3
- keep in 6/20
  Keeps observations number 6 to 20
Add summarizing variables to data
Extensions to generate
egen mgnppc=mean(gnppc)
Adds the overall mean of gnppc to the data.
The by() option is important:
egen mgnppcReg=mean(gnppc), by(region)
Adds the region-specific mean of gnppc to the data.
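A short sketch of how the group means might be used, assuming the lifeexp example data. The by() option and the bysort prefix are two ways of writing the same request; the variable names mgnppcReg2 and devReg are made up for illustration.

```stata
webuse lifeexp, clear

* Region-specific mean, written with the by() option
egen mgnppcReg = mean(gnppc), by(region)

* The same, written with the bysort prefix
bysort region: egen mgnppcReg2 = mean(gnppc)

* Deviation of each country from its regional mean
gen devReg = gnppc - mgnppcReg

* One regional mean per region
tabulate region, summarize(mgnppcReg)
```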
Descriptive statistics
webuse lifeexp
* description of data in memory/variable properties
describe
* describe data contents
codebook
* report univariate summary statistics
summarize lexp gnppc
* one-way frequency table
tabulate safewater
* report correlations or covariances
correlate popgrowth lexp gnppc
correlate popgrowth lexp gnppc, covariance
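Many commands also leave their results behind for later use. As a small sketch beyond the commands listed above, summarize with the detail option stores the median in r(p50) and the number of observations in r(N):

```stata
webuse lifeexp, clear

* Extended summary statistics, including percentiles
summarize lexp, detail

* Reuse the stored results
display "Observations: " r(N)
display "Median life expectancy: " r(p50)
```

Type return list after a command to see everything it has stored.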
Graphs
webuse lifeexp
* scatter-plot
scatter lexp gnppc
* line-plot
line lexp gnppc, sort
* scatterplot and overlay regression line
scatter lexp gnppc || lfit lexp gnppc
* histogram
histogram gnppc
Graphs: save
PDF file
graph export myplot.pdf, replace
PNG file (for Word)
graph export myplot.png, replace
WMF file (for Word)
graph export myplot.wmf, replace
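Putting plotting and saving together, a do-file might contain something like the following sketch. The title and axis titles are made up for illustration; graph export always saves the graph that was drawn last.

```stata
webuse lifeexp, clear

* Scatterplot with overlaid regression line and some labelling
twoway (scatter lexp gnppc) (lfit lexp gnppc), ///
    title("Life expectancy and GNP per capita") ///
    ytitle("Life expectancy") xtitle("GNP per capita")

* Save the current graph into the working directory
graph export myplot.png, replace
```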
Exercises
Exercise 1
1. Create a new do-file.
2. Load the dataset water.dta from our course server.
3. Save the data under the name vand.dta into your working
directory.
4. Look at summary measures of the data using e. g. the
commands des, list, summarize, codebook, mean.
5. Make a scatterplot of mortality against calcium using
scatter (try help scatter)
6. Make a histogram of mortality.
Exercise 2 - Part 1
1. download the data
http://www.biostat.sdu.dk/courses/data/tabSepData.txt
http://www.biostat.sdu.dk/courses/data/commaSepData.txt
http://www.biostat.sdu.dk/courses/data/spaceSepData.txt
http://www.biostat.sdu.dk/courses/data/excelData.txt
2. set working directory to where you saved the files
(menu-based or using the command cd)
3. use insheet to read in the data in "tabSepData.txt",
"commaSepData.txt" and "spaceSepData.txt"
4. use import excel to read in the data in "excelData.txt"
5. finally, save the data as a Stata dataset named
"StataData.dta"
6. clear the data from memory and read it back in from
"StataData.dta" using use
Exercise 2 - Part 2
Note: Exercise parts marked with * are more difficult than the
others and might possibly be skipped.
8. Click into the Data Editor and type in the variable sex
with values 1,2, and 1.
9. Define value labels for sex (1=male, 2=female)
10. * Use generate to generate id, a subject index (from 1
to 3).
11. Use rename to rename the variables v1 to v3 to time1
to time3.
(*Also try doing this in a loop using forvalues.)
Reshaping Data
Generally, data exists in two formats: wide and long.
Assume we have measurements on j occasions for i subjects.
- wide: one line per subject (id);
  each occasion j is represented by a variable weightj
  To convert from wide to long:
  reshape long weight, i(id) j(occ)
  Note: the variable occ exists only in the long table; it is created by reshape long.
- long: one line per subject and occasion
  To convert from long to wide:
  reshape wide weight, i(id) j(occ)
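A small worked example may make the two shapes concrete. The data below are invented for illustration: two subjects with weight measured on three occasions.

```stata
* Wide shape: one line per subject, variables weight1-weight3
clear
input id weight1 weight2 weight3
1 70 71 72
2 80 79 78
end
list

* Wide -> long: one line per subject and occasion, new variable occ
reshape long weight, i(id) j(occ)
list, sepby(id)

* Long -> wide: back to one line per subject
reshape wide weight, i(id) j(occ)
list
```

After reshape long the data have 6 lines (2 subjects times 3 occasions); after reshape wide they have 2 lines again.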
Exercise 2 - Part 3
Note: Exercise parts marked with * are more difficult than the
others and might possibly be skipped.
12. * Use reshape to convert the dataset to long shape.
13. * Generate a variable d that is equal to the squared
difference between the variable time at each occasion
and the average of time for each subject.
14. Drop the observation corresponding to the third occasion
for id=2 using the commands drop and if (see
help if).
Exercise 3 (1.2 in the Stata-book)
Download http://www.biostat.sdu.dk/courses/data/wagepan.dta
Data on wages and race for 545 American young males for 1980-1987.
The variables considered here are:
year: calendar year 1980 to 1987
lwage: natural log of hourly wage
black: dummy variable for being black
hisp: dummy variable for being Hispanic
1. Execute: describe (or just: des) to get an overview of the data
2. Retain only the above 4 variables using the keep command
3. Create a new variable equal to the exponential of lwage.
4. Collapse the data (using collapse) to obtain the mean wages by
year and ethnic group.
5. Produce a line graph (using twoway line) showing the mean
wages over time, separately for the groups.
6. * Improve the graph by defining labels, line patterns, legends etc.
(compare Stata book)
Exercise 4
- Load the file Symptoms.dta into Stata via
  use http://www.biostat.sdu.dk/courses/data/Symptoms.dta, clear
- Also download the do-file PlottingSymptomsOfDiseases.do:
  http://www.biostat.sdu.dk/courses/data/PlottingSymptomsOfDiseases.do
  (On some computers this download may cause problems. Mac users often have NOT unchecked the option Open after download under Safari -> Preferences, so the Mac tries to open the do-file, which the operating system cannot do.)
- Try to understand what it is about and what the do-file is doing.