Instructions for a Student Project on an Empirical Analysis of Cigarette Smoking
(Written mainly for Wabash students, but adaptable to other institutions. Delete this
note before distributing. These instructions assume you are using the JMP files. The
CigProjectwithPriceData.xls workbook has instructions and links to NBER web sites for
cigarette data, including SAS and Stata programs for reading the data.)
The files in this folder are designed to enable a student to do an empirical analysis of
various questions related to smoking behavior. Since the price elasticity of demand for
cigarettes is of particular interest, we provide the material needed to estimate this
elasticity.
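The price elasticity of demand is the percentage change in quantity demanded for a 1 percent change in price. If demand has constant elasticity, ln Q = a + b ln P, and the slope b is the elasticity, so one common way to estimate it is an ordinary least squares regression of log quantity on log price. The Python sketch below computes that slope for paired observations. It includes no control variables, assumes all prices and quantities are positive (zeros cannot be logged), and uses made-up numbers. Which quantity measure to take from the CPS is your choice, guided by the codebook.

```python
import math

def log_log_elasticity(prices, quantities):
    """Estimate a constant price elasticity as the OLS slope of
    ln(quantity) on ln(price). All inputs must be positive."""
    if len(prices) != len(quantities) or len(prices) < 2:
        raise ValueError("need at least two paired observations")
    x = [math.log(p) for p in prices]
    y = [math.log(q) for q in quantities]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope = covariance(x, y) / variance(x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("prices must vary to identify an elasticity")
    return sxy / sxx

# Hypothetical demand curve Q = 100 * P**-0.4, so the slope should be -0.4.
prices = [2.0, 2.5, 3.0, 3.5, 4.0]
quantities = [100 * p ** -0.4 for p in prices]
print(round(log_log_elasticity(prices, quantities), 4))  # prints -0.4
```

With real data you would use many observations and add control variables in a statistics package; the function only illustrates the log-log idea.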
A codebook (cpsmay99.pdf) is available.
Three data sets are available:

CigPrice.xls is an Excel workbook that contains average cigarette price data by
state.
The other two data sets contain the same CPS data in different formats. With either one,
you will have to use the codebook to whittle the data set down to a manageable size.

CPSMay99.zip is a compressed archive that contains a single JMP file of the CPS
May 1999 Smoking Supplement. Although only 18.1 MB compressed, the file expands
to 450 MB when unzipped, so make sure you have enough disk space before extracting it.
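One way to check in advance whether a drive has room for the unzipped file is sketched below. It uses Python's standard zipfile and shutil modules to compare the total uncompressed size recorded in the archive with the free space at the destination. The function name and the optional safety margin are illustrative choices, not part of the course materials.

```python
import shutil
import zipfile

def enough_space_to_unzip(zip_path, dest_dir=".", margin_bytes=0):
    """Return True if dest_dir has room for every member of zip_path once
    extracted, plus an optional safety margin in bytes."""
    with zipfile.ZipFile(zip_path) as archive:
        # file_size is the uncompressed size stored in the archive's directory
        needed = sum(info.file_size for info in archive.infolist())
    free = shutil.disk_usage(dest_dir).free
    return free >= needed + margin_bytes
```

For example, enough_space_to_unzip("CPSMay99.zip", ".") returns True if the drive holding the current folder has room for the extracted JMP file.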

CPSMay99.dta is a Stata data set that contains all the data. You may not know
how to use Stata, but your instructors can help you. If you are a Wabash student,
we stand ready to give you a subset of the data in an Excel file containing only the
variables you specify. (This data set is too large to transmit over the web. To
obtain the data, go to the NBER web page “Reading Current Population
Survey (CPS) Data with SAS, SPSS, or Stata” and download the .do and .dct files for
reading in the data, as well as the ASCII version of the May 1999 CPS Smoking
Supplement, available at the NBER CPS Supplements page.)
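The .dct file that accompanies the ASCII data is a dictionary describing where each variable sits on a fixed-width line of text. If you would rather read the ASCII file outside Stata, the idea can be sketched in Python as below. The variable names and column positions in LAYOUT are hypothetical placeholders, not the real May 1999 layout; the actual positions must be taken from the .dct file or the codebook. The sketch also assumes 1-based column positions and integer-coded fields.

```python
def parse_fixed_width(line, layout):
    """Split one fixed-width record into a dict of integer fields.

    layout: list of (name, start_column, width) tuples with 1-based start
    columns. Blank fields become None."""
    record = {}
    for name, start, width in layout:
        raw = line[start - 1:start - 1 + width].strip()
        record[name] = int(raw) if raw else None
    return record

# HYPOTHETICAL layout: placeholder names and positions, NOT the real
# May 1999 CPS positions. Copy the actual ones from the .dct file.
LAYOUT = [("state", 1, 2), ("age", 3, 2), ("smoker", 5, 2)]

def read_records(path, layout=LAYOUT):
    """Yield one dict per line of an ASCII fixed-width data file."""
    with open(path, encoding="ascii") as fh:
        for line in fh:
            yield parse_fixed_width(line.rstrip("\r\n"), layout)
```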
CigProjectInstructions.doc
Page 1 of 4
Incorporating Price Information:
If the student wishes to include state average cigarette prices in the analysis, the two
data sets must be merged. The CPS records a Census code for the state in which each
individual resides; the student must match these codes to the states in the price data and
perform the merge. We recommend using Excel's VLOOKUP function to assign each
observation the average price for its state code, taken from a lookup table with three
columns: state abbreviation, state numeric code, and state average price. Because VLOOKUP
searches only the first column of its lookup range, start the range at the numeric-code
column, and set the last argument to FALSE so that only exact matches are returned. For an
example of how to use VLOOKUP, see CPSRecode.xls (in the Basic Tools\InternetData\CPS folder).
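Outside Excel, the same exact-match lookup can be sketched with a Python dictionary keyed on the numeric state code. The field names, state abbreviations, codes, and prices below are hypothetical placeholders; the real Census codes come from the codebook and the prices from CigPrice.xls. As with an exact-match VLOOKUP, a code missing from the price table produces no match (None here, #N/A in Excel), which is a useful check that every state code was matched.

```python
def attach_prices(records, price_table, code_field="state_code"):
    """Return new records with an 'avg_price' field looked up by state code.

    price_table: list of (abbreviation, numeric_code, average_price) rows,
    mirroring the three-column Excel lookup table."""
    price_by_code = {code: price for _abbr, code, price in price_table}
    merged = []
    for rec in records:
        new_rec = dict(rec)  # copy so the input records are left unchanged
        # .get returns None when the code is absent, like #N/A in VLOOKUP
        new_rec["avg_price"] = price_by_code.get(rec.get(code_field))
        merged.append(new_rec)
    return merged

# HYPOTHETICAL codes and prices, for illustration only.
PRICES = [("AA", 11, 2.10), ("BB", 12, 2.45)]
people = [{"id": 1, "state_code": 12}, {"id": 2, "state_code": 11},
          {"id": 3, "state_code": 99}]
for row in attach_prices(people, PRICES):
    print(row["id"], row["avg_price"])
```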
If you are using JMP, we suggest reducing the file size before merging, to speed up
analyses and avoid memory problems.
JMP Suggestions:
To make the JMP file more manageable, we recommend deleting unneeded variables and
observations with missing data. An effective (but by no means the only) way to do this is
the Tables: Subset command: select the rows or columns you want to keep, then execute
Tables: Subset, which creates a new data table containing only the selection.
For example, since smoking is the key concept in this data set, you may wish to drop all
observations for which it is unknown whether the person is a smoker. To do so, you need to
select all of the rows that have data on smoking behavior.
One way to do this is to create a new variable based on the Smoker recode variable. We
created a new column, named it “In Smoker Universe,” and entered a formula equivalent to
If(Smoker recode > 0, 1, 0).
This formula produces a “1” if Smoker recode is greater than 0, and a “0” if Smoker
recode is less than or equal to 0. In the codebook, Smoker recode is less than 0 if it is not
possible to determine whether the person is a smoker or if the smoking questions were not
asked of this person (“not in the universe”). We then created a histogram of this variable
using Analyze: Distribution and clicked on the bar for “1.” This selects the rows with a
value of 1 for “In Smoker Universe” in the data table.
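The logic of the “In Smoker Universe” indicator and the row selection that follows can be sketched in Python as below. The field name smoker_recode and the sample rows are hypothetical; only the rule (keep rows where Smoker recode is greater than 0) comes from these instructions.

```python
def in_smoker_universe(smoker_recode):
    """1 if Smoker recode > 0, else 0. Negative codes mean smoking status
    is unknown or the questions were not asked (not in the universe)."""
    return 1 if smoker_recode > 0 else 0

def keep_smoker_universe(rows, field="smoker_recode"):
    """Keep only rows whose indicator is 1, like selecting the '1' bar
    and then subsetting the selected rows."""
    return [row for row in rows if in_smoker_universe(row[field]) == 1]

# HYPOTHETICAL rows; the field name and code values are placeholders.
sample = [{"id": 1, "smoker_recode": 1}, {"id": 2, "smoker_recode": -1},
          {"id": 3, "smoker_recode": 3}, {"id": 4, "smoker_recode": -3}]
print([row["id"] for row in keep_smoker_universe(sample)])  # prints [1, 3]
```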
Now that the observations with information on smoking behavior are selected, we
executed Tables: Subset and chose the options for keeping only the selected rows.
This produces a new data table with a little more than half as many observations as
before. Notice that the Subset dialog box allows you to produce a smaller data set by
drawing a random sample from the original data set.
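Drawing a random sample can be sketched with Python's random module. This illustrates simple random sampling without replacement in general, not how JMP draws its sample; the function name and seed value are arbitrary. Fixing the seed makes the same subset reproducible, which helps when documenting your work.

```python
import random

def random_subset(rows, n, seed=None):
    """Draw n rows at random without replacement, keeping the original
    row order. A fixed seed makes the subset reproducible."""
    if n > len(rows):
        raise ValueError("sample size exceeds number of rows")
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(len(rows)), n))  # row indices, in order
    return [rows[i] for i in chosen]

rows = list(range(1, 101))  # stand-in for 100 observations
subset = random_subset(rows, 10, seed=1999)
print(len(subset))  # prints 10
```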
The same Tables: Subset procedure can be used to keep only the variables you will use in
your analysis. Click on the header of a column you want to keep, and hold down the
CTRL key as you click on any other, non-contiguous columns you want to keep. When
finished, execute Tables: Subset and you will have a much smaller, more manageable
subset of the original, complete data set.
Document your work so that you can reproduce the smaller data set if needed and so that
others can see what you have done.