Uploaded by Andrew Wang


STATA Exercise #0: Statistics Review
Econ. 470 - Econometrics
Prof. Jonathan Hill
University of North Carolina - Chapel Hill
Instructions: The write-up should be neat and compact, and must be prepared in a document editor (e.g. Word). Sample statistics
should be presented in neat tables with titles. Some figures require you to type your name in the title of the figure: no name = no
points. Anything written below between @...@ is a comment: do not type it in Stata. Consult the course resources on how to
present a well-constructed Stata write-up.
Basic instructions are given below although in some cases Stata's output is not complete (you will still need to do some
computations). You can use the "Stata Manual" posted for reference for Stata. All commands below are performed in the
Command Window.
All datasets for this course are found in the resources folder “Data Sets”.
1.
Download the Birth Weights Excel data set. Open it in Excel, and start Stata.
Copy all Excel data columns, including the variable names. In Stata click Data, Data Editor, Data Editor
(Edit). In the pop-up box paste the data (control-v). Be sure to treat the first row as variable names, and not as data.
Save the file for future use. This creates a Stata .dta file (data file).
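As an alternative to copy-and-paste, the same import can be sketched with commands. The file name birth_weights.xlsx is a hypothetical placeholder (the handout does not give the actual file name), and the sketch assumes the data sit on the first sheet of the workbook and that the file is in Stata's current working directory.

```stata
* Assumption: the downloaded workbook is named birth_weights.xlsx (hypothetical name)
* and is located in Stata's current working directory.
import excel using "birth_weights.xlsx", firstrow clear
* List the variables to confirm that the first row was read as variable names
describe
* Save as a Stata data file for future use
save "birth_weights.dta", replace
```

The firstrow option plays the same role as treating the first pasted row as variable names in the Data Editor.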
a.
Compute the sample means with 90% confidence bands, and standard deviations, for the infant's birth weight,
mother's age, and mother's cigarettes smoked daily.
ci means bwght mage cigs, level(90)
summarize bwght mage cigs
“ci means…” generates the confidence interval and the standard error. We do not want the standard error; use
“summarize” to get the standard deviation.
b.
i.
Create a histogram of birth weight, with an overlayed plot of a normal distribution.
Add a title [1] "Relative Frequency of Birth Weight (YOUR NAME)". Add a text box and an arrow in
order to show the reader that the plotted line is a normal distribution.
histogram bwght, normal    @ creates the plot @
ii.
Plot the sample distribution of birth weight, overlayed with a normal distribution.
Add a title "Distribution of Birth Weight and a Normal Distribution (YOUR NAME)".
kdensity bwght, normal    @ creates the plot @
iii.
Test whether birth weight comes from a normal distribution.
Use ksmirnov: consult the Stata guide book. You must first compute and store the sample mean
and standard deviation of birth weight:
egen bwght_mean = mean(bwght)    @ "egen" generates a new variable @
egen bwght_sd = sd(bwght)        @ bwght_mean contains the sample mean, etc. @
Report the p-value and comment on the evidence of normality based on the plots and test.
[1] Example: "Relative Frequency of Birth Weight (Jonathan B. Hill)".
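For the normality test in part b.iii, one possible form of the ksmirnov call is sketched here. It assumes the variables bwght_mean and bwght_sd were created with the egen commands in that part; check the syntax against the Stata guide before relying on it.

```stata
* Assumes bwght_mean and bwght_sd already exist (created with egen)
* Compare the empirical distribution of bwght with the standard normal CDF
* evaluated at the standardized birth weights
ksmirnov bwght = normal((bwght - bwght_mean) / bwght_sd)
```

Because the mean and standard deviation are estimated from the same sample, the reported p-value is best treated as approximate.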
c.
Test the one-sided hypothesis that the mean birth weight is greater than 3430 grams:
$H_0: \mu \le 3430$
$H_1: \mu > 3430$
Perform the test at the 1%, 5% and 10% levels by using p-values.
ttest bwght = 3430, level(95)
@ This performs the two-sided test and both one-sided tests, and generates 95% bands.
Use STATA's p-values wisely! @
d.
Create a scatter plot of infant birth weight (Y) and mother's cigarette use (X).
Add a title "Mother's Cig. Consumption and Infant Birth Weight (YOUR NAME)".
Label the axes "Mother's Cig. Consumption" for (X) and "Infant Birth Weight" for (Y).
graph twoway scatter bwght cigs
Then edit the graph.
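Instead of editing the graph afterwards, the title and axis labels can also be set as options on the command itself. This is only a sketch of one way to do it; the command is kept on a single line so it can be typed in the Command Window.

```stata
* Scatter plot with the title and axis labels supplied as options
* (replace YOUR NAME with your own name)
twoway scatter bwght cigs, title("Mother's Cig. Consumption and Infant Birth Weight (YOUR NAME)") xtitle("Mother's Cig. Consumption") ytitle("Infant Birth Weight")
```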
e.
Compute the sample correlations of mother's age, cigarette consumption, education and infant's birth
weight.
i.
Test the hypothesis that mother's education and cigarette consumption are uncorrelated.
ii.
Test the hypothesis that mother's cigarette consumption and the infant's birth weight are uncorrelated.
pwcorr mage cigs meduc bwght, sig
@ gives pair-wise correlations; below each is the p-value of the
test that the true correlation is zero. @
2.
Download the Income and Money Excel file. Open it in Excel, and start Stata. Copy-paste the data columns
and variable names into Stata, and save for future use.
a.
Create a line plot of the industrial production index (IPI). This requires that you define the data set as a time
series.
gen t = _n     @ creates a time variable: t = 1,2,...,n @
tsset t        @ tells STATA the dataset is a time series @
tsline IPI     @ plots IPI @
Add a title "U.S. Monthly Industrial Production Index (YOUR NAME)".
b.
Create a new variable that is the growth of IPI (call it g_ipi).
gen g_ipi = log(IPI) - log(IPI[_n-1])
@ this creates the difference in logs between consecutive periods,
which approximates the growth rate; the first observation is missing. @
i.
Create a line plot of g_ipi, add a title with your name.
ii.
Compute the sample mean of industrial growth with a 95% confidence interval.
iii.
Test the hypothesis that the true mean is greater than .003:
$H_0: \mu \le .003$ against $H_1: \mu > .003$
Report the p-value and comment.
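The three steps of part b can be sketched as follows, reusing the commands introduced in question 1. The sketch assumes g_ipi has been generated and the data declared a time series with tsset t; the figure title is a hypothetical placeholder.

```stata
* Assumes g_ipi exists and "tsset t" has been run
* Line plot of growth (the title text is a placeholder)
tsline g_ipi, title("Growth of U.S. Industrial Production (YOUR NAME)")
* Sample mean with a 95% confidence interval
ci means g_ipi, level(95)
* t-test of the mean against .003; the output lists all three alternatives
ttest g_ipi == .003
* Upper one-sided p-value, Pr(T > t), which matches H1: mean > .003
display "one-sided p-value = " r(p_u)
```

Only the right-tail p-value is relevant for this alternative; the two-sided and left-tail values in the ttest output answer different questions.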
c.
Now we want to plot ln(IPI), and de-trend it to look at output cycles. Create a log-series, de-trend it, plot
ln(IPI) with the trend line, and plot the cycles. Add titles with your name.
gen ln_y = log(IPI)      @ creates ln(IPI) @
regress ln_y t           @ estimates trend @
predict trend            @ stores the trend line in trend @
predict cycle, resid     @ stores the cycle = ln(IPI) - trend in cycle @
tsline ln_y trend        @ plots ln(IPI) with the trend line @
tsline cycle             @ plots the cycle @
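The de-trending commands above amount to fitting a linear time trend by least squares. Written out, with the symbols $\alpha$, $\beta$ and $u_t$ introduced here as notation that does not appear in the handout:

```latex
\begin{align*}
\ln(\mathrm{IPI}_t) &= \alpha + \beta t + u_t, \qquad t = 1, \dots, n \\
\widehat{\mathrm{trend}}_t &= \hat{\alpha} + \hat{\beta}\, t \\
\widehat{\mathrm{cycle}}_t &= \ln(\mathrm{IPI}_t) - \widehat{\mathrm{trend}}_t = \hat{u}_t
\end{align*}
```

Under this reading the slope estimate is the average per-period growth of IPI, and because the regression includes an intercept the residuals sum to zero, so the plotted cycle fluctuates around zero.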