Getting started with Stata

advertisement

Getting started with Stata, in pictures

A cheat sheet prepared by Robert Grant : www.robertgrantstats.co.uk

The screen shots in this document are taken from Stata version 13 on a Linux computer.

Stata versions 12, 13 and 14 are almost identical in how they look (and 11 is quite similar too), and they have the same commands so you can use this with any of those.

The Stata user interface

When you start Stata, it will look something like this. The window is divided into several panes, showing output (the big one in the middle), recent commands (there are none so far), variables in the data file (there isn't one open yet) and a command line at the bottom where you can type in instructions in Stata's own programming language. However, you don't have to work with this code, you can use the drop-down menus instead.

You can rearrange the panes, and you can change the size of font, or the colour scheme.

When you view the data, or a help file, or a graph, additional windows will open up but can be closed without terminating Stata. Only when you close this window seen above, will

Stata shut down – and it will ask you first if you want to save your work.

Common tasks in Stata

The data file used below is at robertgrantstats.co.uk/vitalsigns2009.dta

Change Stata's working directory (where it looks for and saves files)

Click on the menus for File – Change working directory

Or, type the cd command in the command line, for example cd Documents/research/data

Open a Stata-formatted data file (.dta)

File – Open

Choose the file in the usual way, click OK. The list of variables appears in the variables pane.

if you click the Browse button, you can look at the data in spreadsheet style.

Note that the Gender variable is coloured blue, because it contains codes 1 and 2, which

Stata has linked to labels “Male” and “Female”. The other variables, coloured black, are just numbers.

Or, type use filename.dta

(where you replace filename with your actual file's name).

Importing an Excel spreadsheet

File – Import – Excel spreadsheet

• Choose the file

• Choose the worksheet (if there is more than one)

• If the first row contains the variable names, select the “Import first row as variable names” tick box

• Click OK

Or, type import excel "my data file.xls", sheet("Vital signs 2009") firstrow

Don't forget that, if you have data already open, this command (or use ) will require the clear option added to the end, like this: import excel "my data file.xls", sheet("Vital signs 2009") firstrow clear

Note that, if your file name contains spaces, you will need to enclose it in quotation marks.

Get a table of descriptive statistics

Statistics – Summaries, tables and tests – Other tables – Compact table of summary statistics

Choose the variable(s) to analyse.

If you want the results broken down into groups, select the variable that defines the groups

Choose the statistics you want.

Click OK.

Or, type tabstat SBPrest, by(gend) stat(mean sd p50 p25 p75)

The table appears in the output pane.

Draw a histogram

Graphics – Histogram

Choose a variable

Optionally, choose how many 'bins' you want data grouped into (20-30 is a good range)

Optionally, choose what you want on the Y (vertical) axis (frequency will show the actual number of obsevations in each bin)

The graph appears in a new window. The first time you ask for graphics in each Stata session, it will load its graphics commands, which can take a few seconds longer.

Or, type histogram SBPrest, bin(25) frequency

Save a graph

In the graph window, click File – Save as

Choose a file name in the usual way. If you accept the default .gph file format, your file will be save din Stata format and can then be edited in the future inside Stata. If you choose a picture format like .png, you can paste this into slides, documents, upload it, etc. It is usually a good idea to save both.

Or, type: graph save mygraph.gph, replace graph export mygraph.png, replace

(Note the replace option, which will overwrite this file if it already exists.)

Generate a new variable

It is just as quick and simple to type the command in the command line for this task, as to use the menus. Typing generate hyp = (SBPrest>130) will create a new variable called hyp (unless one already exists), which takes values 0 or 1 depending on whether the statement SBPrest>130 is false or true respectively. (This is a strange definition of hypertension, but it will be handy for demonstration purposes because this group – physiotherapy students – are rarely genuinely hypertensive).

Formulas can be typed in too. If the BMI did not already exist, it could be created: generate bmi = weight / (height^2)

To make a binary variable indicating female sex: generate female = gend-1

Get a confidence interval for a mean

Statistics – Summaries, tables and tests – Summary and descriptive statistics –

Confidence intervals

Select the variable you want to analyse

Click on the by/if/in tab and choose any variable defining groups, if you want the data broken down by them.

Click OK. The statistics will appear in your output window.

Or, type ci SBPrest for all participants together, or bysort gend: ci SBPrest to break it down by gender.

Run an independent-samples t-test

Statistics – Summaries, tables and tests – Classical tests of hypotheses – t-test (mean comparison)

Select the type of test: one-sample, two-sample comparinsg groups or variables

(independent samples), or paired.

For comparing groups, include the variable that defines the groups, and click OK. The results appear in the output window.

As before, you can see the corresponding command above the output. In fact, this is a good way of learning commands via the drop down menus.

Calculate correlations

Statistics – Summaries, tables and tests – Summary and descriptive statistics –

Correlations and covariances

Select the variables you want and click OK. The output will contain a matrix relating each variable to all others.

Analyse a prospective study with risk or odds ratios

Statistics – Epidemiology and related – Tables for epidemiologists – Cohort study riskratio etc.

Select the variables which define the outcome (case) and the cause (exposure)

Under options, if you want to see an odds ratio, tick that box. The Woolf approximation is the 'textbook' formula for a confidence interval around the odds ratio.

The results appear in the output.

Or, type cs hyp female, or woolf

Fit a logistic regression

To assess the effect of BMI on our definition of hypertension, we can fit a logistic regression.

Statistics – Binary outcomes – Logistic regression (reporting odds ratios)

Select the outcome and the predictor(s), and click OK.

The results appear in the output. BMI is not a significant predictor of hypertension in this population (but the sample is small).

Or, type logistic hyp bmi

The syntax of Stata commands

Commands can be very useful. They are flexible, allowing a few things that drop-down menus do not. They are quick, once you know the right command, and they allow you to save the instructions or program for your data processing, analysis and graphics in a “dofile”. They all follow a similar syntax, like the structure of a sentence: histogram SBPrest, bin(25) frequency histogram is the command. These can be shortened sometimes; in the help files you can see what is acceptable for this. If you make even a small typo, Stata will not know what command you want to run and will tell you that your command was not known. This usually just means a spallign mistakke.

SBPrest is a variable name. Some commands accept more than one variable name after another here. In the help files, you will see these called 'varname' and 'varlist'.

Everything that follows the comma is an option. There can only be one comma in a Stata command. Forgetting the comma, or adding more than one, is a common mistake for beginners!

Some options, like bin(25) , require some information in parentheses. Bin allows you to specify how many bins in the histogram, so it is then followed by the number.

Some options are just words that ask Stata to turn on a particular option, like frequency .

Getting help!

Knowing where to find help is the most important skill in Stata (or any other software).

Nobody remembers all the commands and options, so it is normal to have the help window open while you work. There are two important ways to find information inside Stata. If you know the name of the command, like tabstat , you can type in the command line: help tabstat and a window will open explaining the syntax and the options to you. There are usually some examples given further down, and as you become a more advanced user, you will find more information here useful. Notice that, when you use Stata's drop-down menus and access a dialog box to do a particular function, the name of the corresponding command is shown at the top of the box.

On the other hand, you might only remember that you once found a useful command that made a table of statistics. In this case, get Stata to search the help files for those words by typing: hsearch table statistics

The first time you do this, Stata will take a minute or two to compile an index of help files, but in future, searching will be fast. The results will include online information, if you are connected to the internet.

Online, you should also look at statalist.org which is a forum for Stata users. Search the archives to see if anyone else has asked your question, or send a message. You will usually get a response from some experienced users quite quickly.

Another excellent resource is www.ats.ucla.edu/stat/stata which gives worked examples and explanations, but only with commands, not drop-down menus.

Finally, don't forget to Google the thing you want to do, by searching for something like

“table of descriptive statistics in Stata”.

Download