Uploaded by Phương Nguyễn

Introduction to Econometrics
Instructor: Le Hang My Hanh
lehangmyhanh.cs2@ftu.edu.vn
Course materials
1. Introduction to Econometrics, 3rd Edition,
Addison-Wesley Series in Economics
2. Basic Econometrics by Gujarati, Fourth
Edition (Ch1-13)
3. Introductory Econometrics: A Modern
Approach by Jeffrey M. Wooldridge (Ch1-9)
Assessment
• Performance: 10%
• Midterm test + project: 30%
• Final test: 60%
Instruction for your project
• Each group should write and present a short report (max. 15 pages all
included) based on the data and introduction given during the course.
• The report should be organized as follows:
1. Introduction
Give a brief statement about the purpose of the study.
2. Literature Review
Summarize the main published work concerning your research question.
It should be a synthesis and analysis of the relevant published work, linked at all times
to your research question.
3. Methodology and data
An introduction of your model (dependent and independent variables)
A description of the data must be provided here. You should discuss the
data sources and the definition of variables and report in a table
summary statistics such as minimum and maximum values, means,
standard deviations for each variable.
4. Results: Estimation results are provided in a table and discussed in this
section.
5. Conclusion: you should summarize the results here.
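The summary statistics table requested in section 3 of the report could be produced along these lines. This is only a minimal sketch: the variable names and the numbers are hypothetical placeholders, not course data.

```python
import statistics

# Hypothetical sample: annual export growth (%) and GDP growth (%) for five years.
data = {
    "export_growth": [4.0, 6.0, 8.0, 10.0, 12.0],
    "gdp_growth": [5.0, 5.5, 6.0, 6.5, 7.0],
}


def summarize(values):
    """Return min, max, mean, and sample standard deviation of a list."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD (divides by n - 1)
    }


summary = {name: summarize(values) for name, values in data.items()}

# Print one row per variable, as in a summary statistics table.
print(f"{'variable':<15}{'min':>8}{'max':>8}{'mean':>8}{'sd':>8}")
for name, s in summary.items():
    print(f"{name:<15}{s['min']:>8.2f}{s['max']:>8.2f}{s['mean']:>8.2f}{s['sd']:>8.2f}")
```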
TOPIC
• Export and economic growth:
– Tea export
– Fisheries export
– Agricultural export
– ….
Outline
• Chapter 1: Introduction to Econometrics
• Chapter 2: Simple Regression
• Chapter 3: Multiple Regression
• Chapter 4: Statistical Inference
• Chapter 5: Diagnosing Model Problems
• Reading papers + Replicating empirical research + Presentation
Le Hang My Hanh, FTU CS2
Some keywords
• Dependent variables, independent variables
• Regression, estimation, estimator, estimate
• Empirical research
• Significant, significance, significance level, confidence interval
• Hypothesis, hypothesis testing
• Assumption
• Correlation, autocorrelation, multicollinearity, heteroscedasticity, homoscedasticity
• Biased, unbiased
Examples of empirical research
• This thesis examines the relationship between the
probability of financial distress and specific financial
ratios in order to identify internal factors that cause distress
for firms (Phu Kim Yen, K49 CLC).
• Findings: Size has negative coefficients that are
statistically significant at the 1% significance level in all
estimations. This finding is consistent with the previous study
of Ohlson (1980). The author concludes that size affects the
probability of financial distress of Vietnamese listed firms,
especially those on HOSE. In practice, large-cap companies
often have a stronger bargaining position with
counterparties as well as more access to financing
sources. Therefore, it is easier for them to weather
unexpected downturns.
Introduction to Econometrics
The Nature and Purpose of Econometrics
1. Why do you need to learn Econometrics?
2. What is Econometrics? What will you
learn from the course?
3. How do you learn? Methodology of
Econometrics
4. Terminology and notation
5. Types of data
Why do you need to learn Econometrics?
Economics suggests important relationships, often with policy
implications, but virtually never suggests quantitative
magnitudes of causal effects.
• What is the quantitative effect of reducing class size on
student achievement?
• How does another year of education change earnings?
• What is the price elasticity of cigarettes?
• What is the effect on output growth of a 1 percentage point
increase in interest rates by the Fed?
• What is the effect on housing prices of environmental
improvements?
What is Econometrics?
• Econometrics = “economic measurement”.
• “Econometrics may be defined as the social science in
which the tools of economic theory, mathematics, and
statistical inference are applied to the analysis of
economic phenomena” (Goldberger 1964).
In this course you will:
• Learn methods for estimating causal effects using
observational data
• Focus on applications – theory is used only as needed to
understand the “why”s of the methods;
• Learn to evaluate the regression analysis of others – this
means you will be able to read/understand empirical
economics papers in other econ courses;
• Get some hands-on experience with regression analysis in
your problem sets.
1.2. Methodology of Econometrics
1. Statement of theory or hypothesis
2. Specification of the mathematical model of the
theory
3. Specification of the statistical, or econometric
model
4. Collecting the data
5. Estimation of the parameters of the econometric
model
6. Hypothesis testing
7. Forecasting or prediction
8. Using the model for control or policy purposes.
Example
1. Statement of Theory or Hypothesis
• Keynes states that on average, consumers increase
their consumption as their income increases, but not
as much as the increase in their income (MPC < 1).
• MPC = marginal propensity to consume
Example
2. Specification of the Mathematical Model of
Consumption (single-equation model)
Y = β1 + β2X, 0 < β2 < 1 (1)
Y = consumption expenditure (dependent variable)
X = income (independent or explanatory variable)
β1 = the intercept
β2 = the slope coefficient
• The slope coefficient β2 measures the MPC.
Example
• Geometrically, the consumption function (1) is a straight line with
intercept β1 and slope β2.
Example
3. Specification of the Econometric Model of Consumption
• Other variables can affect consumption expenditure: size of family,
ages of the members in the family, family religion → the inexact
relationships between economic variables
• To allow for the inexact relationships between economic variables,
(1) is modified as follows:
• Y = β1 + β2X + u (2)
• where u = the disturbance, or error, term, a random (stochastic)
variable that has well-defined probabilistic properties.
• u may well represent all those factors that affect consumption but
are not taken into account explicitly.
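The difference between the exact model (1) and the econometric model (2) can be illustrated with a small simulation. This is a sketch under stated assumptions: the parameter values, the income levels, and the normally distributed disturbance are all hypothetical choices made for illustration only.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Hypothetical parameters: intercept and MPC, with 0 < beta2 < 1.
beta1, beta2 = 100.0, 0.7
incomes = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0]


def exact_consumption(x):
    """Mathematical model (1): consumption is an exact function of income."""
    return beta1 + beta2 * x


def observed_consumption(x, sigma=50.0):
    """Econometric model (2): add a random disturbance u with mean zero.

    Assumption: u is drawn from a normal distribution with SD sigma.
    """
    u = random.gauss(0.0, sigma)
    return exact_consumption(x) + u


for x in incomes:
    y = observed_consumption(x)
    u = y - exact_consumption(x)
    print(f"X={x:7.1f}  exact={exact_consumption(x):7.1f}  observed={y:7.1f}  u={u:7.1f}")
```

Each observed value differs from the exact line by its own draw of u, which is what "inexact relationship" means in the text above.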
Example
• (2) is an example of a linear regression model, i.e., it hypothesizes
that Y is linearly related to X, but that the relationship between the
two is not exact; it is subject to individual variation. The econometric
model of (2) can be depicted as shown in Figure 2.
Example
4. Obtaining Data
• Y = personal consumption expenditure (PCE)
• X = gross domestic product (GDP)
Example
5. Estimation of the Econometric Model
• Regression analysis is the main tool used to obtain the
estimates. We obtain the estimates
β̂1 = −184.08 and β̂2 = 0.7064, so the estimated consumption function is
Ŷi = −184.08 + 0.7064Xi (3)
→ An increase in real income of 1 dollar led, on average,
to an increase of about 70 cents in real consumption.
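As a rough illustration of how such estimates are obtained, the ordinary least squares (OLS) formulas for a one-regressor model can be coded directly. The data below are hypothetical and are not the PCE/GDP series behind equation (3); the function name `ols_simple` is also just an illustrative choice.

```python
def ols_simple(x, y):
    """Return OLS estimates (b1, b2) for the model Y = b1 + b2*X + u."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope = sample covariance of X and Y divided by sample variance of X.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = s_xy / s_xx
    # Intercept makes the fitted line pass through the point of means.
    b1 = y_bar - b2 * x_bar
    return b1, b2


# Hypothetical income (X) and consumption (Y) observations.
income = [100.0, 200.0, 300.0, 400.0, 500.0]
consumption = [90.0, 150.0, 240.0, 290.0, 380.0]

b1, b2 = ols_simple(income, consumption)
print(f"intercept = {b1:.2f}, slope (estimated MPC) = {b2:.2f}")
```

With these made-up numbers the estimated slope is 0.72, which would be read in the same way as the 0.7064 above: one extra unit of income is associated, on average, with about 0.72 extra units of consumption.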
Example
The data are plotted in Figure I.3
Example
6. Hypothesis Testing
• Keynes expected the MPC to be positive but less than 1.
• In our example MPC= 0.70 → we must enquire whether
this estimate is sufficiently below unity. In other words, is
0.70 statistically less than 1? If it is, it may support
Keynes’ theory.
• Such confirmation or refutation of economic theories on
the basis of sample evidence is based on a branch of
statistical theory known as statistical inference
(hypothesis testing).
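A test of whether an estimated MPC is statistically below 1 could be sketched as follows. Assumptions to note: the data are hypothetical, the standard error uses the usual homoskedasticity-only formula, and the critical value 2.353 is the one-sided 5% value of the t distribution with 3 degrees of freedom taken from a standard t-table.

```python
import math


def slope_t_test(x, y, beta2_null=1.0):
    """Return (b2, se_b2, t) for H0: beta2 = beta2_null in Y = b1 + b2*X + u."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = s_xy / s_xx
    b1 = y_bar - b2 * x_bar
    # Residual variance with n - 2 degrees of freedom.
    ssr = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = ssr / (n - 2)
    se_b2 = math.sqrt(sigma2_hat / s_xx)
    t_stat = (b2 - beta2_null) / se_b2
    return b2, se_b2, t_stat


# Hypothetical income (X) and consumption (Y) observations.
income = [100.0, 200.0, 300.0, 400.0, 500.0]
consumption = [90.0, 150.0, 240.0, 290.0, 380.0]

b2, se_b2, t_stat = slope_t_test(income, consumption)

# One-sided test of H0: beta2 = 1 against H1: beta2 < 1 at the 5% level.
T_CRIT_5PCT_DF3 = 2.353  # t-table value for 3 degrees of freedom
reject_h0 = t_stat < -T_CRIT_5PCT_DF3
print(f"b2 = {b2:.2f}, SE = {se_b2:.4f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```

Rejecting the null hypothesis here would be the sample evidence that the slope is below unity, in line with the reasoning in the bullets above.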
Example
7. Forecasting or Prediction
• To illustrate, suppose we want to predict the mean
consumption expenditure for 2015. The GDP value for
2015 was 7269.8 billion dollars, so predicted consumption would be:
Ŷ2015 = −184.0779 + 0.7064 (7269.8) = 4951.3
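The forecast is just the estimated equation evaluated at the given GDP value. A minimal sketch, using the coefficients quoted above (the function name `predict_consumption` is an illustrative choice):

```python
def predict_consumption(gdp, b1=-184.0779, b2=0.7064):
    """Predicted mean consumption from the estimated equation (3)."""
    return b1 + b2 * gdp


y_hat = predict_consumption(7269.8)
print(f"Predicted consumption: {y_hat:.1f}")

# The slope is the predicted change in consumption per extra unit of GDP.
marginal_effect = predict_consumption(7270.8) - predict_consumption(7269.8)
print(f"Effect of one more unit of GDP: {marginal_effect:.4f}")
```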
8. Use of the Model for Control or Policy Purposes
• Suppose the government decides to propose a
reduction in the income tax. What will be the effect of
such a policy on income and thereby on consumption
expenditure and ultimately on employment?
Terminology and notation
Unless stated otherwise:
• The letter Y will denote the dependent variable
• The X’s will denote the independent variables, Xk being the
kth explanatory variable.
• The subscript i or t will denote the ith or the tth observation
or value.
• N will denote the total number of observations or values in
the population,
• n will denote the total number of observations in a sample.
• u or e will denote the random error, or stochastic disturbance, term.
Terminology and notation
• In the literature the terms dependent variable
and explanatory variable are described
variously. A representative list is:
1.3. Types of data
• There are three types of data used in empirical
analysis: time series, cross-section, and panel
data.
• Time series data: a set of observations on the
values that a variable takes at different times.
It is collected at regular time intervals, such
as daily, weekly, monthly, quarterly, annually.
Ex: weekly stock return, monthly interest rate,
GDP growth, CPI and so on.
1.3. Types of data
• Cross-section data: data on one or more
variables collected at the same point in time.
Ex: the census of population conducted by
the Vietnam General Statistics Office every 10
years. Profits of listed firms in 2014.
• Panel data / pooled data: a combination of
time series and cross-section data, e.g., the same
firms observed over several years.
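The relationship between the three data types can be made concrete with a tiny data structure. In this sketch the firm names and profit figures are hypothetical; a panel is stored as values indexed by (unit, year), and a cross-section or a time series is just a slice of it.

```python
# Hypothetical panel: profit (billion VND) indexed by (firm, year).
panel = {
    ("Firm A", 2013): 110.0,
    ("Firm A", 2014): 120.0,
    ("Firm B", 2013): 80.0,
    ("Firm B", 2014): 85.0,
}


def cross_section(panel, year):
    """All units observed at one point in time."""
    return {unit: value for (unit, t), value in panel.items() if t == year}


def time_series(panel, unit):
    """One unit observed over time, in chronological order."""
    return {t: value for (u, t), value in sorted(panel.items()) if u == unit}


print(cross_section(panel, 2014))    # cross-section: profits of all firms in 2014
print(time_series(panel, "Firm A"))  # time series: Firm A's profits by year
```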
Example of panel data
The accuracy of data
The results of research are only as good as
the quality of the data.
• If in given situations researchers find that the
results of the research are “unsatisfactory”,
the cause may be not that they use the wrong
model but that the quality of the data was
poor.
Measurement Scales of Variables
• Four broad categories: ratio scale, interval scale,
ordinal scale and nominal scale.
• Ratio scale: GDP growth rate, interest rate, ROE.
Most economic variables belong to this category.
• Interval scale: the distance between two time
periods, say (2000-1995)
• Ordinal scale: income class (upper, middle,
lower), grading systems (A,B, C grades)
• Nominal scale: gender (male, female), marital
status (married, unmarried, divorced, separated)
1.4 Review of statistics
• Empirical problem: class size and educational output
– Policy question: What is the effect on test
scores (or some other outcome measure) of
reducing class size by one student per class?
– We must use data to find out (is there any way
to answer this without data?)
Example: The California Test Score Data Set
All K-6 and K-8 California school districts (n=420)
Variables:
• 5th grade test scores (Stanford-9 achievement test,
combined math and reading), district average
• Student-teacher ratio (STR) = no. of students in the
district divided by no. of full-time equivalent teachers
Do districts with smaller classes have higher test scores?
Scatterplot of test score v. student – teacher ratio
We need to get some numerical evidence on whether districts
with low STRs have higher test scores – but how?
1. Compare average test scores in districts with low STRs
to those with high STRs (“estimation”)
2. Test the “null” hypothesis that the mean test scores in
the two types of districts are the same, against the “alternative”
hypothesis that they differ (“hypothesis testing”)
3. Estimate an interval for the difference in the mean test
scores, high v. low STR districts (“confidence interval”)
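The three steps above can be sketched in a few lines. The test scores below are hypothetical, not the California data, and the 1.96 critical value is the large-sample normal approximation, so with only five observations per group this is purely illustrative.

```python
import math
import statistics


def compare_means(small, large, z=1.96):
    """Difference in means, its standard error, t-statistic, and 95% interval."""
    n_s, n_l = len(small), len(large)
    diff = statistics.mean(small) - statistics.mean(large)  # 1. estimation
    # Standard error allowing the two groups to have different variances.
    se = math.sqrt(statistics.variance(small) / n_s + statistics.variance(large) / n_l)
    t_stat = diff / se                                      # 2. hypothesis testing
    ci = (diff - z * se, diff + z * se)                     # 3. confidence interval
    return diff, se, t_stat, ci


# Hypothetical district average test scores.
small_str = [660.0, 655.0, 665.0, 650.0, 670.0]  # districts with STR < 20
large_str = [650.0, 645.0, 655.0, 640.0, 660.0]  # districts with STR >= 20

diff, se, t_stat, ci = compare_means(small_str, large_str)
print(f"difference = {diff:.1f}, SE = {se:.1f}, t = {t_stat:.2f}")
print(f"95% confidence interval: ({ci[0]:.1f}, {ci[1]:.1f})")
```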
Initial data analysis: Compare districts with
small (STR<20 ) and large (STR>=20) class sizes
a. Estimation
Is this a large difference in a real-world sense?
• Standard deviation across districts = 19.1
• Difference between 60th and 75th percentiles of
test score distribution is 667.6 – 659.4 = 8.2
• Is this a big enough difference to be important for
school reform discussions, for parents, or for a
school committee?
b. Hypothesis testing
• Difference-in-means test: compute the t-statistic:
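In its usual form (allowing the two groups to have different variances), the difference-in-means t-statistic can be written as below. The subscripts s and l for small and large class sizes are notation assumed here; the s² terms are the sample variances of test scores in each group and the n terms are the numbers of districts in each group.

```latex
t = \frac{\bar{Y}_s - \bar{Y}_l}{SE(\bar{Y}_s - \bar{Y}_l)},
\qquad
SE(\bar{Y}_s - \bar{Y}_l) = \sqrt{\frac{s_s^2}{n_s} + \frac{s_l^2}{n_l}}
```

In large samples, |t| > 1.96 leads to rejecting the null hypothesis of equal means at the 5% significance level.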
c. Confidence interval
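A large-sample 95% confidence interval for the difference in means takes the usual "estimate plus or minus 1.96 standard errors" form. The notation (subscripts s and l for small and large class sizes) is assumed here for illustration.

```latex
(\bar{Y}_s - \bar{Y}_l) \pm 1.96 \times SE(\bar{Y}_s - \bar{Y}_l)
```

If this interval excludes zero, the hypothesis of equal mean test scores is rejected at the 5% significance level.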
1.5 Review of probability
a. Population, random variable, and distribution
b. Moments of a distribution (mean, variance,
standard deviation, covariance,
correlation)
c. Conditional distributions and conditional
means
d. Distribution of a sample of data drawn
randomly from a population: Y1, …, Yn
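The moments in item b and the conditional mean in item c can be computed directly from a joint distribution. The sketch below uses a small hypothetical joint distribution of two binary variables; the probabilities are made up for illustration.

```python
import math

# Hypothetical joint distribution of (X, Y): {(x, y): probability}.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}


def expect(g):
    """Expected value of g(X, Y) under the joint distribution."""
    return sum(p * g(x, y) for (x, y), p in joint.items())


mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - mean_x) ** 2)
var_y = expect(lambda x, y: (y - mean_y) ** 2)
sd_x, sd_y = math.sqrt(var_x), math.sqrt(var_y)
cov_xy = expect(lambda x, y: (x - mean_x) * (y - mean_y))
corr_xy = cov_xy / (sd_x * sd_y)


def conditional_mean_y(x_value):
    """E[Y | X = x_value]: average of Y using only outcomes with X = x_value."""
    p_x = sum(p for (x, _), p in joint.items() if x == x_value)
    return sum(p * y for (x, y), p in joint.items() if x == x_value) / p_x


print(f"E[X] = {mean_x:.2f}, Var(X) = {var_x:.2f}, SD(X) = {sd_x:.2f}")
print(f"Cov(X, Y) = {cov_xy:.2f}, Corr(X, Y) = {corr_xy:.2f}")
print(f"E[Y | X=0] = {conditional_mean_y(0):.2f}, E[Y | X=1] = {conditional_mean_y(1):.2f}")
```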
Population distribution of Y
• The probabilities of different values of Y that occur
in the population, e.g., Pr(Y = 650) (when Y is
discrete)
• Or: the probabilities of sets of these values, e.g.,
Pr(640 ≤ Y ≤ 660) (when Y is continuous)
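Both kinds of probability can be computed once a population distribution is specified. The sketch below assumes a hypothetical discrete distribution for the first case and, for the second, a normal distribution whose mean (654) is a made-up value and whose standard deviation is set to 19.1 purely for illustration.

```python
from statistics import NormalDist

# Discrete case: hypothetical probability mass function of test scores.
pmf = {630: 0.1, 640: 0.2, 650: 0.4, 660: 0.2, 670: 0.1}
p_equal_650 = pmf[650]
p_discrete_interval = sum(p for y, p in pmf.items() if 640 <= y <= 660)

# Continuous case: assume Y is normal with hypothetical mean 654 and SD 19.1.
dist = NormalDist(mu=654.0, sigma=19.1)
p_continuous_interval = dist.cdf(660) - dist.cdf(640)

print(f"Pr(Y = 650), discrete: {p_equal_650:.2f}")
print(f"Pr(640 <= Y <= 660), discrete: {p_discrete_interval:.2f}")
print(f"Pr(640 <= Y <= 660), continuous: {p_continuous_interval:.3f}")
```

For a continuous variable the probability of any single value is zero, which is why the continuous case is stated for an interval.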
Distribution of Y1,…, Yn under simple random sampling
Because individuals #1 and #2 are selected at random, the value of
Y1 has no information content for Y2. Thus:
• Y1 and Y2 are independently distributed
• Y1 and Y2 come from the same distribution, that is, Y1, Y2 are
identically distributed
• That is, under simple random sampling, Y1 and Y2 are
independently and identically distributed (i.i.d.).
• More generally, under simple random sampling, {Yi}, i
= 1,…, n, are i.i.d.
This framework allows rigorous statistical inferences about
moments of population distributions using a sample of data
from that population …
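Simple random sampling can be mimicked in a short simulation. The population below is hypothetical (10,000 simulated scores); drawing with replacement makes each draw independent and identically distributed, and the sample mean is then used to learn about the population mean.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical population of 10,000 test scores.
population = [random.gauss(650.0, 20.0) for _ in range(10_000)]


def draw_sample(n):
    """Draw n observations at random, with replacement, so draws are i.i.d."""
    return [random.choice(population) for _ in range(n)]


sample = draw_sample(100)
sample_mean = statistics.mean(sample)
population_mean = statistics.mean(population)

print(f"population mean = {population_mean:.1f}")
print(f"sample mean (n = 100) = {sample_mean:.1f}")
```

Re-running with a different seed gives a different sample mean, which is the sampling variation that statistical inference has to account for.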
Some databases
• IMF data: http://www.imf.org/en/Data
• ADB data: http://www.adb.org/data/statistics
• WB data:
http://data.worldbank.org/vietnamese
• GSO: https://www.gso.gov.vn/
• https://www.quandl.com/collections/vietnam