
IDA LV1 2023 QUANT

QUANTITATIVE
METHODS
CFA LEVEL 1 - 2023
Lecturer: Hoàng Long Thịnh
7 READINGS – EXAM WEIGHT: 8 – 12%
Reading 1. The Time Value of Money
Reading 2. Organizing, Visualizing, and
Describing Data
Reading 3. Probability Concepts
Reading 4. Common Probability Distributions
Reading 5. Sampling and Estimation
Reading 6. Hypothesis Testing
Reading 7. Introduction to Linear Regression
E ida.coaching.center@gmail.com
Fanpage: facebook.com/IDACoachingCenter/
READING 1: THE TIME VALUE OF MONEY
OUTLINE
⮚ 1. INTEREST RATES: INTERPRETATION
⮚ 2. TIME VALUE OF MONEY
⮚ 3. THE FUTURE VALUE
⮚ 4. THE PRESENT VALUE
1. Interest Rates: Interpretation
⮚ Interpret:
 Required rates of return: the minimum rate of return an investor must receive in order to accept the
investment.
 Discount rates: the rate used to discount future cash flows to determine their present value.
 Opportunity cost is the value that investors forgo by choosing a particular course of action.
⮚ Decomposing:
Interest rate = Real risk-free interest rate + Inflation premium + Default risk premium + Liquidity premium + Maturity premium
Nominal risk-free interest rate = Real risk-free interest rate + Inflation premium (approximated by the 90-day T-bill rate)
 The real risk-free interest rate: Reflects the time preferences of individuals for current versus future real consumption.
 The inflation premium: Compensates for expected inflation and reflects the average inflation rate
expected over the maturity of the debt.
 The default risk premium: Compensates for the possibility that the borrower will fail to make a
promised payment at the contracted time and in the contracted amount.
 The liquidity premium: Compensates for the possibility that the investor will need to convert the
investment to cash quickly and will not receive a fair value.
 The maturity premium: Compensates for the increased sensitivity of the market value of debt to a
change in market interest rates as maturity is extended.
2. Time Value of Money
⮚ Cash flows at different points in time cannot be compared directly.
⮚ Compounding is the process of moving cash flows forward in time.
⮚ Discounting is the process of moving cash flows back in time.
⮚ Types of interest rate:
 Principal is the amount of funds originally invested.
 Simple interest is the interest rate times the principal.
 Compound interest is the interest earned on interest.
3. The Future Value
• Future value of a single cash flow: FV_N = PV × (1 + r)^N
• With m compounding periods per year at stated annual rate r_s: FV_N = PV × (1 + r_s/m)^(m×N)
• With continuous compounding: FV_N = PV × e^(r_s×N)
• Future value of an ordinary annuity with level payment A: FV_N = A × [(1 + r)^N − 1] / r
4. The Present Value
• Present value of a single cash flow: PV = FV_N × (1 + r)^(−N)
• Present value of an ordinary annuity: PV = A × [1 − (1 + r)^(−N)] / r
• Present value of a perpetuity: PV = A / r
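The single-cash-flow relations above can be sketched in Python (my own illustration, not part of the slides; function names are arbitrary):

```python
# Minimal sketch of FV and PV of a single cash flow with m compounding
# periods per year: FV = PV * (1 + r/m)^(m*N).

def future_value(pv, stated_rate, years, m=1):
    """Future value of pv after `years` at stated_rate compounded m times/year."""
    return pv * (1 + stated_rate / m) ** (m * years)

def present_value(fv, stated_rate, years, m=1):
    """Present value of fv due in `years`; the inverse of future_value."""
    return fv * (1 + stated_rate / m) ** (-(m * years))

# e.g. $100 at 8% compounded quarterly for 2 years
fv = future_value(100, 0.08, 2, m=4)
```

Discounting then compounding at the same rate recovers the original amount, which is a quick sanity check on any TVM implementation.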
EXERCISES BY PROBLEMS
⮚ The future value of a single cash flow:
▪ Example: The value in six years of $75,000 invested today at a stated annual interest rate of 7%
compounded quarterly is closest to:
⮚ Future value of an ordinary annuity:
▪ Example: A couple plans to set aside $20,000 per year in a conservative portfolio projected to earn 7
percent a year. If they make their first savings contribution one year from now, how much will they
have at the end of 20 years:
⮚ Future value of an annuity due:
▪ Example: A couple plans to set aside $20,000 per year in a conservative portfolio projected to earn 7
percent a year. If they make their first savings contribution today, how much will they have at the end
of 20 years:
⮚ The present value of a single cash flow:
▪ Example: A client requires £100,000 one year from now. If the stated annual rate is 2.50%
compounded weekly, the deposit needed today is closest to?
⮚ The present value of an ordinary annuity:
▪ Example: Suppose you are considering purchasing a financial asset that promises to pay €1,000 per
year for five years, with the first payment one year from now. The required rate of return is 12 percent
per year. How much should you pay for this asset?
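The two annuity examples (ordinary vs. due) and the asset-pricing example above can be sketched as follows (an illustration of mine, not the lecturer's solution):

```python
# Sketch of level-annuity FV and PV; due=True shifts payments to the start of
# each period, which multiplies the ordinary-annuity value by (1 + r).

def fv_annuity(payment, rate, n, due=False):
    """Future value of a level annuity of n payments."""
    fv = payment * ((1 + rate) ** n - 1) / rate
    return fv * (1 + rate) if due else fv

def pv_annuity(payment, rate, n, due=False):
    """Present value of a level annuity of n payments."""
    pv = payment * (1 - (1 + rate) ** (-n)) / rate
    return pv * (1 + rate) if due else pv

ordinary = fv_annuity(20_000, 0.07, 20)        # ≈ 819,909.85
due = fv_annuity(20_000, 0.07, 20, due=True)   # ≈ 877,303.54
asset_price = pv_annuity(1_000, 0.12, 5)       # ≈ 3,604.78
```

Note that saving the same amount but starting today (annuity due) is worth one extra year of compounding on every payment.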
EXERCISES BY PROBLEMS
⮚ The present value of an annuity due:
▪ Example: An investment pays €300 annually for five years, with the first payment occurring today. The
present value (PV) of the investment discounted at a 4% annual rate is closest to:
⮚ The present value of a perpetuity:
▪ Example: Consider a level perpetuity of £100 per year with its first payment beginning at t = 5. What is
its present value today (at t = 0), given a 5 percent discount rate?
⮚ Solving for rates:
▪ Example: An investment of €500,000 today that grows to €800,000 after six years has a stated annual
interest rate closest to:
⮚ Solving for number of periods:
▪ Example: For a lump sum investment of ¥250,000 invested at a stated annual rate of 3% compounded
daily, the number of months needed to grow the sum to ¥1,000,000 is closest to:
⮚ Solving for size of annuity payments:
▪ Example: A sports car, purchased for £200,000, is financed for five years at an annual rate of 6%
compounded monthly. If the first payment is due in one month, the monthly payment is closest to:
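The three "solving for an unknown" examples above can be sketched by rearranging FV = PV(1 + r)^N (my own illustration; function names are arbitrary):

```python
import math

# Sketch: closed-form rearrangements of FV = PV*(1 + r/m)^(m*N) to back out
# the rate, the number of periods, or the amortizing payment.

def implied_annual_rate(pv, fv, years):
    """Stated annual rate (annual compounding) that grows pv to fv."""
    return (fv / pv) ** (1 / years) - 1

def months_to_grow(pv, fv, stated_rate, m=365):
    """Months for pv to reach fv at stated_rate compounded m times per year."""
    years = math.log(fv / pv) / (m * math.log(1 + stated_rate / m))
    return years * 12

def annuity_payment(pv, periodic_rate, n_periods):
    """Level payment that amortizes pv over n_periods at periodic_rate."""
    return pv * periodic_rate / (1 - (1 + periodic_rate) ** (-n_periods))

rate = implied_annual_rate(500_000, 800_000, 6)    # ≈ 8.15%
months = months_to_grow(250_000, 1_000_000, 0.03)  # ≈ 554.5 (closest to 555)
payment = annuity_payment(200_000, 0.06 / 12, 60)  # ≈ 3,866.56 per month
```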
EXERCISES BY PROBLEMS
⮚ Other problems:
▪ Example: Jill Grant is 22 years old (at t = 0) and is planning for her retirement at age 63 (at t = 41). She
plans to save $2,000 per year for the next 15 years (t = 1 to t = 15). She wants to have retirement
income of $100,000 per year for 20 years, with the first retirement payment starting at t = 41. How
much must Grant save each year from t = 16 to t = 40 in order to achieve her retirement goal?
Assume she plans to invest in a diversified stock-and-bond mutual fund that will earn 8 percent per
year on average:
▪ Example: A client invests €20,000 in a four-year certificate of deposit (CD) that annually pays interest
of 3.5%. The annual CD interest payments are automatically reinvested in a separate savings account
at a stated annual interest rate of 2% compounded monthly. At maturity, the value of the combined
asset is closest to:
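As a rough sketch of the CD example (my own working, not the official solution): the CD pays out 3.5% of €20,000 (€700) each year, and each coupon then compounds at the stated 2% rate, monthly, until the CD matures at t = 4:

```python
# Sketch: principal is returned at maturity; each annual €700 coupon paid at
# year t is reinvested for the remaining 4 - t years at 2% compounded monthly.

principal = 20_000
coupon = principal * 0.035          # 700 per year
monthly = 0.02 / 12

value_at_maturity = principal + sum(
    coupon * (1 + monthly) ** (12 * (4 - t))
    for t in range(1, 5)
)
# value_at_maturity ≈ 22,885.9
```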
READING 2: ORGANIZING, VISUALIZING,
AND DESCRIBING DATA
OUTLINE
 1. Data Types
 2. Organizing Data
 3. Frequency Distributions
 4. Contingency Table
 5. Data Visualization
 6. Measures of Central Tendency
 7. Quantiles
 8. Measures of Dispersion
 9. Downside Deviation and Coefficient of Variation
 10. The Shape of The Distributions
 11. Correlation Between Two Variables
1. Data Types: Numerical vs Categorical Data
 Data contains information in any form.
 Numerical (quantitative) data (Dữ liệu định lượng) are values that can be counted or measured.
 Continuous data (Dữ liệu liên tục): an uncountable or infinite number of possible values, even within a specified range of values.
 Discrete data (Dữ liệu rời rạc): a countable or finite number of possible values.
 Categorical data (qualitative data) (Dữ liệu định tính) are values that describe a quality or characteristic of a group of observations ⟹ used as labels to divide a dataset into groups to summarize and visualize:
 Nominal data (Dữ liệu định danh) are categorical values that are not organized in a logical order. Example: fund style, country of origin, managers’ gender.
 Ordinal data (Dữ liệu thứ tự) are categorical values that can be logically ordered or ranked. Example: credit rating, fund performance ranking.
 We can perform mathematical operations only on numerical data.
 Example: Identify the data type for each of the following kinds of investment-related information:

(1) Number of coupon payments for a corporate bond; (2) Cash dividends per share paid by a public
company; (3) Credit ratings for corporate bond issues; (4) Hedge fund classification types.
1. Data Types: Cross – Sectional vs Time-Series vs Panel Data
 A variable (Biến) is a characteristic or quantity that can be measured, counted, or categorized and is
subject to change. Example: stock price, dividends.
 An observation (Giá trị quan sát) is the value of a specific variable collected at a point in time or over a specified period of time. Example: the EPS of TCB in 2021 was $5.
 Observational units (Đối tượng quan sát) are the units (e.g., companies, countries) on which the observations are made.
 Cross-sectional data (Dữ liệu chéo) are a list of the observations of a specific variable from multiple
observational units at a given point in time. Example: annual inflation rates (i.e., the variable) for each of
the Asian countries (i.e., the observational units) in 2021.
 Time-series data (Dữ liệu chuỗi thời gian) are a sequence of observations for a single observational unit of
a specific variable collected over time and at discrete and typically equally spaced intervals of time.
Example: the daily closing prices (i.e., the variable) of TCB in 2021.
 Panel data (Dữ liệu bảng) consist of observations through time on one or more variables for multiple
observational units. It is a mix of time-series and cross-sectional data. Example: Quarterly earnings per
share (i.e., the variable) for three companies (i.e., the observational units) in 2021.
1. Data Types: Cross – Sectional vs Time-Series vs Panel Data
 Example: Which of the following statements describing panel data is most accurate?
A. It is a sequence of observations for a single observational unit of a specific variable collected over
time at discrete and equally spaced intervals.
B. It is a list of observations of a specific variable from multiple observational units at a given point in
time.
C. It is a mix of time-series and cross-sectional data that are frequently used in financial analysis and
modeling
 Example: Which of the following best describes a time series?
A. Daily stock prices of the XYZ stock over a 60-month period.
B. Returns on four-star rated Morningstar investment funds at the end of the most recent month.
C. Stock prices for all stocks in the FTSE100 on 31 December of the most recent calendar year.
1. Data Types: Structured vs Unstructured Data
 Structured data (Dữ liệu có cấu trúc) are highly organized in a pre-defined manner, usually with repeating
patterns. Example: Market data, Fundamental data, Analytical data.
 Unstructured data (Dữ liệu phi cấu trúc) , in contrast, are data that do not follow any conventionally organized
forms. Some common types of unstructured data are text, audio, video.
 Typically, financial models are able to take only structured data as inputs ⟹ unstructured data must first be
transformed into structured data that models can process.
 Example: Which of the following is most likely to be structured data?
A. Social media posts where consumers are commenting on what they think of a company’s new product.
B. Daily closing prices during the past month for all companies listed on Japan’s Nikkei 225 stock index.
C. Audio and video of a CFO explaining her company’s latest earnings announcement to securities analysts.
2. Organizing Data
 Raw data (Dữ liệu thô) are usually not suitable for use directly by analysts ⟹ convert to a one-dimensional array or a two-dimensional array.
 A one-dimensional array (Mảng một chiều) is the simplest format for representing a collection of data of the same data type. A time series is an example in that it represents a single variable:
 New data can be added without affecting the existing data.
 It contains valuable information, such as a trend.
 A two-dimensional array (also called a data table) (Mảng hai chiều) is one of the most popular forms for organizing data, comprised of columns and rows to hold multiple variables and multiple observations.
3. Frequency Distributions
 A frequency distribution (also called a one-way table) (Phân phối tần suất) is a tabular display of data
constructed by assigning them to specified groups, or intervals.
 Constructing a frequency distribution of a categorical variable (Biến phân loại):
 Count the number of observations for each unique value of the variable.
 Construct a table listing each unique value and the corresponding counts, and then sort the records by number of counts in descending or ascending order.
 Constructing a frequency distribution of a numerical variable (Biến số):
 Sort the data in ascending order.
 Calculate the Range = Maximum value − Minimum value.
 Decide on the number of bins (intervals) in the frequency distribution, k.
 Determine the bin width as Range/k (round up so the maximum value is included).
 Count the number of observations falling in each bin.
 Construct a table of the intervals listed from smallest to largest that shows the number of observations falling in each bin.
3. Frequency Distributions
 Example 3: Suppose we have 12 observations sorted in ascending order: −4.57, −4.04, −1.64, 0.28, 1.34, 2.35,
2.38, 4.28, 4.42, 4.68, 7.16, and 11.43. Create frequency distribution with k = 4.
 The absolute frequency is the actual number of observations counted for each unique value of the variable.
 The relative frequency is calculated as the absolute frequency of each unique value of the variable divided by
the total number of observations.
 The cumulative absolute frequency cumulates the absolute frequencies from the first bin to the last bin.
 The cumulative relative frequency cumulates the relative frequencies; it equals the cumulative absolute frequency divided by the total number of observations.
Return Interval (%) | Absolute Frequency | Relative Frequency | Cumulative Absolute Frequency | Cumulative Relative Frequency
−4.57 ≤ obs < −0.57 | 3 | 25.0% | 3 | 25.0%
−0.57 ≤ obs < 3.43 | 4 | 33.3% | 7 | 58.3%
3.43 ≤ obs < 7.43 | 4 | 33.3% | 11 | 91.6%
7.43 ≤ obs ≤ 11.43 | 1 | 8.4% | 12 | 100%
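The construction steps above, applied to the 12 observations of Example 3, can be sketched in Python (an illustration of mine, not from the slides):

```python
# Sketch: build a k-bin frequency distribution. Range = 11.43 - (-4.57) = 16,
# so with k = 4 the bin width is 4.00.

obs = [-4.57, -4.04, -1.64, 0.28, 1.34, 2.35, 2.38, 4.28, 4.42, 4.68, 7.16, 11.43]
k = 4
lo, hi = min(obs), max(obs)
width = (hi - lo) / k

counts = [0] * k
for x in obs:
    i = min(int((x - lo) // width), k - 1)  # top bin is closed at the maximum
    counts[i] += 1

rel = [c / len(obs) for c in counts]                 # relative frequencies
cum = [sum(counts[: i + 1]) for i in range(k)]       # cumulative absolute
# counts = [3, 4, 4, 1], cum = [3, 7, 11, 12]
```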
4. Contingency Table
 A contingency table (Bảng tương quan) is a two-dimensional array with which we can analyze two variables at
the same time.
4. Contingency Table
 One application is for evaluating the performance of a classification model (in this case, the contingency
table is called a confusion matrix).
 Example: Suppose a model for classifying companies into two groups: default and do not default. The
confusion matrix for displaying the model’s results will be a 2 × 2 table showing the frequency of actual
defaults versus the model’s predicted frequency of defaults.
 To investigate a potential association between two categorical variables, use the chi-square test of independence (Reading 7).
5. Data Visualization – Histogram
 Visualization is the presentation of data in a
pictorial or graphical format for the purpose of
increasing understanding and for gaining insights
into the data.
 The Histogram (Biểu đồ tần suất) is a bar chart of
data that have been grouped into a frequency
distribution.
 To construct a frequency polygon, the midpoint of
each interval is plotted on the horizontal axis, and the
absolute frequency for that interval is plotted on the
vertical axis.
5. Data Visualization – Bar Chart
 Bar chart (Biểu đồ cột) – each bar represents a distinct category, with the bar’s height proportional to the
frequency of the corresponding category.
 A grouped bar chart or clustered bar chart can illustrate two categories at once, showing joint frequencies.
 In a stacked bar chart, the height of each bar represents the cumulative frequency for a category (such as goods-producing industries), and the colors within each bar represent joint frequencies.
5. Data Visualization – Word Cloud, Tree Map
 A word cloud (tag cloud) is a visual device for
representing textual data. A word cloud consists of words
extracted from a source of textual data, with the size of
each distinct word being proportional to the frequency
with which it appears in the given text.
 Stop words (e.g., “a,” “it,” “the”) are generally stripped
out to focus on key words.
 Tree-map consists of a set of colored rectangles
to represent distinct groups, and the area of
each rectangle is proportional to the value of
the corresponding group.
5. Data Visualization – Line Chart
 A line chart is a type of graph used to visualize ordered observations. Often a line chart is used to display the
change of data series over time.
 Facilitates showing changes in the data and underlying trends in a clear and concise way.
 How can we add an additional dimension to a two-dimensional line chart? Use a bubble line chart.
5. Data Visualization – Scatter Plot
 A scatter plot (Biểu đồ phân tán) is a type of graph for visualizing the joint variation in two numerical variables.
 A scatter plot matrix is a useful tool for organizing scatter plots between pairs of variables, making it easy to
inspect all pairwise relationships in one combined visual
5. Data Visualization – Heat Map
 A heat map (Biểu đồ nhiệt) is a type of graphic that organizes and summarizes data in a tabular format and
represents them using a color spectrum.
5. Data Visualization – Guide to Selecting among Visualization Types
5. Data Visualization – Examples
 Example: Explain which type of chart can be used based on the following information:
 The daily trading volumes of a stock over the past five years.
 The analyst would like to get a sense of how closely these 10 variables are associated with the broad stock market index and whether any pair of variables are associated with each other.
 A quantitative researcher wants to analyze the meeting minutes from the website for use in building a model to predict future economic growth.
 A private investor wants to add a stock to her portfolio, so she asks her financial adviser to compare the three-year financial performances (by quarter) of two companies. One company experienced consistent revenue and earnings growth, while the other experienced volatile revenue and earnings growth, including quarterly losses.
6. Measures of Central Tendency
 A measure of central tendency (Xu hướng trung tâm) specifies where the data are centered.
 Measures of location include not only measures of central tendency but other measures that illustrate the location or
distribution of data. Statistical methods include:
 Descriptive statistics (Thống kê mô tả) are used to summarize the important characteristics of large data sets.
 Statistical inference (Thống kê suy luận) involves making forecasts, estimates, or judgments about a larger group from the smaller group actually observed.
 A population (Tổng thể) is defined as the set of all possible members of a stated group.
 Any descriptive measure of a population characteristic is called a parameter.
 A sample (Mẫu) is a subset of a population.
 A sample statistic (or statistic) (Giá trị thống kê của mẫu) is a quantity computed from or used to describe a sample.
 The arithmetic mean (Trung bình cộng) is the sum of the observations divided by the number of observations.
 Population mean: μ = Σᵢ₌₁ᴺ Xᵢ / N
 Sample mean: X̄ = Σᵢ₌₁ⁿ Xᵢ / n
6. Measures of Central Tendency – Arithmetic Mean
 Properties of the arithmetic mean:
 The sample mean is often interpreted as the fulcrum, or center of gravity, for a given set of data.
 The sum of the deviations around the mean equals 0: Σᵢ₌₁ⁿ (Xᵢ − X̄) = 0
 A potential drawback of the arithmetic mean is its sensitivity to extreme values, or outliers (Dữ liệu ngoại lai).
 How to deal with outliers?
• Do nothing: use the data without any adjustment if the values are correct observations.
• Delete all the outliers: Trimmed mean, computed by excluding a stated small percentage of the lowest and highest values and then computing an arithmetic mean of the remaining values. Example: a 5% trimmed mean discards the lowest 2.5% and the highest 2.5% of values and computes the mean of the remaining 95% of values.
• Replace the outliers with another value: Winsorized mean; instead of discarding the highest and lowest observations, they are substituted by a specified low or high value. Example: a 95% winsorized mean sets the bottom 2.5% of values equal to the 2.5th percentile value and the top 2.5% of values equal to the 97.5th percentile value.
6. Measures of Central Tendency – The Median
 The median (Trung vị) is the value of the middle item of a set of items that has been sorted into
ascending or descending order.
 In an odd-numbered sample of n items, the median is the value of the item that occupies the (n + 1)/2 position.
 In an even-numbered sample, we define the median as the mean of the values of the items occupying the n/2 and (n + 2)/2 positions (the two middle items).
 A potential advantage of the median is that, unlike the mean, extreme values do not affect it. However, it does not use all the information about the size and magnitude of the observations.
 Example: What is the median return for five portfolio managers with 10-year annualized total returns
record of: 30%, 15%, 25%, 21%, and 23%?
 Example: Suppose we add a sixth manager to the previous example with a return of 28%. What is the
median return?
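The two median examples above can be sketched as follows (my own illustration, using the (n + 1)/2 position rule from the previous slide):

```python
# Sketch: median via the position rules above. Odd n: the middle item;
# even n: the mean of the items at positions n/2 and (n + 2)/2.

def median(values):
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

five = median([0.30, 0.15, 0.25, 0.21, 0.23])         # 0.23 (23%)
six = median([0.30, 0.15, 0.25, 0.21, 0.23, 0.28])    # (0.23 + 0.25)/2 = 0.24
```

Note how adding the sixth manager changes the answer from a single observation to an average of the two middle ones.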
6. Measures of Central Tendency – The Mode, Weighted Mean
 The mode is the value that occurs most frequently in a data set. A data set may have more than one
mode or even no mode:

Unimodal: one mode. Bimodal: two modes. Trimodal: three modes.
 Example: What is the mode of the following data set? Data set: [30%, 28%, 25%, 23%, 28%, 15%, 5%]
 Stock return data and other data from continuous distributions may not have a modal outcome ⟹ data
are grouped into bins ⟹ modal interval.
 In working with portfolios, stock returns usually have different weights ⟹ Weighted mean:
X̄_w = Σᵢ₌₁ⁿ wᵢXᵢ (where the weights wᵢ sum to 1)
 If we use forward-looking data, the weighted mean is the expected value.
6. Measures of Central Tendency – Geometric Mean
 The geometric mean (Trung bình nhân):
G = (X₁ × X₂ × X₃ × … × Xₙ)^(1/n), with Xᵢ ≥ 0
 The geometric mean return is often used when calculating investment returns over multiple periods or when measuring compound growth rates:
1 + R_G = [(1 + R₁)(1 + R₂)(1 + R₃) … (1 + Rₙ)]^(1/n), where Rₜ is the return for period t
 Example: For the last three years, the returns for Acme Corporation common stock have been –9.34%,
23.45%, and 8.92%. Compute the compound annual rate of return over the 3-year period.
 Average return over a one-period horizon ⟹ use the arithmetic mean, because the arithmetic mean is the average of one-period returns.
 Average returns over more than one period ⟹ use the geometric mean of returns, because the geometric mean captures how the total returns are linked over time.
 For expected returns in the future ⟹ the weighted mean is appropriate, with the probabilities of the possible outcomes used as the weights.
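The Acme example above can be sketched as follows (an illustration of mine, not the lecturer's worked solution):

```python
# Sketch: compound annual (geometric mean) return from a list of period
# returns, using (1 + R_G)^n = product of (1 + R_t).

def geometric_mean_return(returns):
    """Assumes every R_t > -1 so each growth factor is positive."""
    growth = 1.0
    for r in returns:
        growth *= 1 + r
    return growth ** (1 / len(returns)) - 1

acme = geometric_mean_return([-0.0934, 0.2345, 0.0892])   # ≈ 6.8% per year
```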
6. Measures of Central Tendency – Harmonic Mean
 A harmonic mean (Trung bình điều hòa) is used for certain computations, such as the average cost of
shares (Cost averaging):
X̄_H = n / Σᵢ₌₁ⁿ (1/Xᵢ), with Xᵢ > 0
 Example: An investor purchases $1,000 of mutual fund shares each month, and over the last three months the prices paid per share were $8, $9, and $10. What is the average cost per share?
X̄_H = 3 / (1/8 + 1/9 + 1/10) = $8.926 per share
 Arithmetic mean × Harmonic mean = (Geometric mean)² (for two observations)
 Unless all the observations in a dataset have the same value: Harmonic mean < geometric mean <
arithmetic mean.
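The cost-averaging example and the ordering of the three means can be verified with a short sketch (my own illustration):

```python
# Sketch: harmonic mean as average cost per share, plus a check of the
# ordering harmonic <= geometric <= arithmetic for unequal observations.

def harmonic_mean(xs):
    return len(xs) / sum(1 / x for x in xs)

prices = [8, 9, 10]
avg_cost = harmonic_mean(prices)        # ≈ 8.926 dollars per share

arith = sum(prices) / len(prices)       # 9.0
geo = (8 * 9 * 10) ** (1 / 3)           # ≈ 8.963
```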
6. Measures of Central Tendency – Appropriate Central Tendency Measures
7. Quantiles
 Statisticians use the word quantile (or fractile) (Phân vị) as the most general term for a value at or below
which a stated fraction of the data lies.
 Quartiles – the distribution is divided into quarters.
 Quintiles – the distribution is divided into fifths.
 Deciles – the distribution is divided into tenths.
 Percentiles – the distribution is divided into hundredths (percents).
 The interquartile range (IQR) is the difference between the third quartile and the first quartile: IQR = Q3 − Q1.
 The formula for the position of the observation at a given percentile, y, with n data points sorted in ascending order is:
L_y = (n + 1) × (y / 100)
 When the location, L_y, is a whole number, the location corresponds to an actual observation.
 When L_y is not a whole number or integer, L_y lies between the two closest integer positions (one above and one below), and we use linear interpolation (Nội suy tuyến tính) between the observations at those two positions.
 Example: What is the third quartile for the following distribution of returns? 8%, 10%, 12%, 13%,15%, 17%,
17%, 18%, 19%, 23%.
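The location formula with linear interpolation can be sketched as follows, applied to the third-quartile example above (my own illustration):

```python
# Sketch: quantile location L_y = (n + 1) * y / 100 on sorted data, with
# linear interpolation when L_y falls between two observations.

def percentile(sorted_values, y):
    n = len(sorted_values)
    loc = (n + 1) * y / 100        # 1-based position
    lo = int(loc)
    if lo <= 0:
        return sorted_values[0]
    if lo >= n:
        return sorted_values[-1]
    frac = loc - lo
    return sorted_values[lo - 1] + frac * (sorted_values[lo] - sorted_values[lo - 1])

returns = [0.08, 0.10, 0.12, 0.13, 0.15, 0.17, 0.17, 0.18, 0.19, 0.23]
q3 = percentile(returns, 75)   # L_75 = 8.25 -> 0.18 + 0.25*(0.19 - 0.18) = 0.1825
```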
7. Quantiles – Box And Whisker Plot
 A box and whisker plot can be used to visualize the dispersion of data across quartiles. A box and
whisker plot consists of a “box” with “whiskers” connected to the box.
 Example 15 – Curriculum L1V1 Page 124
7. Quantiles
 The tenth percentile corresponds to
observations in bins?
 The second quintile corresponds to
observations in bins?
 The fourth quartile corresponds to observations
in bins?
 The median is closest to?
 The interquartile range is closest to:
8. Measures of Dispersion – Range and MAD
 Dispersion (Độ phân tán) is the variability around the central tendency. If mean return measures reward, dispersion measures risk.
 The range is the difference between the maximum and minimum values in a data set:
Range = Maximum value − Minimum value
 The mean absolute deviation (MAD) (Độ lệch tuyệt đối trung bình) is the average of the absolute values of the deviations of individual observations from the arithmetic mean:
MAD = Σᵢ₌₁ⁿ |Xᵢ − X̄| / n
 The computation of the MAD uses the absolute value of each deviation from the mean because the sum of the actual deviations from the arithmetic mean is zero.
 Example: What is the range and MAD for the 5-year annualized total returns for five investment managers
if the managers’ individual returns were 30%, 12%, 25%, 20%, and 23%?
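The range and MAD for the five managers can be sketched as follows (an illustration of mine):

```python
# Sketch: range and mean absolute deviation for the five manager returns.

returns = [0.30, 0.12, 0.25, 0.20, 0.23]

rng = max(returns) - min(returns)                          # 0.18 (18%)
mean = sum(returns) / len(returns)                         # 0.22 (22%)
mad = sum(abs(x - mean) for x in returns) / len(returns)   # 0.048 (4.8%)
```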
8. Measures of Dispersion – Variance
 Variance (Phương sai) is defined as the average of the squared deviations around the mean.
 Population variance: σ² = Σᵢ₌₁ᴺ (Xᵢ − μ)² / N
 Sample variance: s² = Σᵢ₌₁ⁿ (Xᵢ − X̄)² / (n − 1)
 Using n instead of n − 1 causes the sample variance to be what is referred to as a biased estimator (Ước lượng sai) of the population variance, because we measure central tendency using X̄, which is only an estimate of the true population parameter.
 Using n − 1 (the number of degrees of freedom (Bậc tự do)) instead of n improves the statistical properties and makes the sample variance an unbiased estimator (Ước lượng không sai) of the population variance.
 The relation between the arithmetic mean and geometric mean is:
X̄_G ≈ X̄ − s²/2
8. Measures of Dispersion – Standard Deviation
 Example: Assume that the 5-year annualized total returns for the five investment managers used in the
preceding examples represent only a sample of the managers at a large investment firm. What is the
sample variance of these returns?
 Variance is measured in squared units, so it is difficult to interpret its meaning.
 Standard deviation (Độ lệch chuẩn) is the positive square root of the variance. It is more easily interpreted
because of the same unit of measurement as the observations.
 Example: Compute the sample standard deviation based on the result of the preceding example.
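The sample variance and standard deviation for the five manager returns can be sketched as follows (my own working, using the n − 1 divisor from the formula above):

```python
import math

# Sketch: sample variance and standard deviation of the five manager
# returns (in percent), with the unbiased n - 1 divisor.

returns = [30.0, 12.0, 25.0, 20.0, 23.0]
mean = sum(returns) / len(returns)                                       # 22.0

sample_var = sum((x - mean) ** 2 for x in returns) / (len(returns) - 1)  # 44.5
sample_std = math.sqrt(sample_var)                                       # ≈ 6.67
```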
9. Downside Deviation and Coefficient of Variation
 Variance and standard deviation of returns take account of returns above and below the mean, but investors are concerned only with downside risk, for example, returns below the mean or below some specified minimum target return.
 Target semivariance is defined as the average squared deviation below a stated target:
Target semivariance = Σ_{Xᵢ ≤ B} (Xᵢ − B)² / (n − 1)
where B is the target and n is the total number of sample observations.
 Target semideviation is the positive square root of the target semivariance.
 Example: Suppose the monthly returns on a portfolio are as follow: 5, 3, -1, -4, 4, 2, 0, 4, 3, 0, 6, 5.
Calculate the target downside deviation when the target return is 3%.
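The target downside deviation example can be sketched as follows (an illustration of mine):

```python
import math

# Sketch: target semivariance sums squared deviations only for observations
# at or below the target B, then divides by n - 1 (n = all observations).

returns = [5, 3, -1, -4, 4, 2, 0, 4, 3, 0, 6, 5]
target = 3

semivar = sum((x - target) ** 2 for x in returns if x <= target) / (len(returns) - 1)
semidev = math.sqrt(semivar)    # ≈ 2.76 (%)
```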
9. Downside Deviation and Coefficient of Variation
 Coefficient of Variation:
 Relative dispersion is the amount of variability in a distribution relative to a reference point or benchmark.
 Relative dispersion is commonly measured with the coefficient of variation (CV), which measures the amount of dispersion in a distribution relative to the distribution’s mean:
CV = s / X̄
 This ratio can be thought of as the units of risk per unit of mean return: higher CV = riskier.
 Example: Suppose that analyst collects the return on assets (in
percentage terms) for ten companies for each of two
industries.
 Calculate the average return on assets (ROA) for each
industry.
 Calculate the standard deviation of ROA for each
industry.
 Calculate the coefficient of variation of ROA for each
industry.
10. The Shape of The Distributions – Skewness
 A distribution is symmetrical (Đối xứng) if it is shaped identically on both sides of its mean.
 A distribution that is not symmetrical is called skewed (Trượt/Xiên). Nonsymmetrical distributions may be either positively or negatively skewed and result from the occurrence of outliers in the data set:
 A positively skewed distribution has many outliers in the upper (right) tail ⇒ skewed right.
 A negatively skewed distribution has many outliers that fall within its lower (left) tail ⇒ skewed left.
 A symmetrical distribution has a skewness measure equal to 0.
 Sample skewness (Độ trượt) is computed as:
Skewness ≈ (1/n) × Σᵢ₌₁ⁿ (Xᵢ − X̄)³ / s³, where s is the sample standard deviation
10. The Shape of The Distributions – Skewness
 Skewness (Độ trượt) affects the location of the mean, median, and mode of a distribution.

For a symmetrical distribution, the mean, median, and mode are equal.

For a positively skewed, unimodal distribution, the mode is less than the median, which is less than
the mean.

For a negatively skewed, unimodal distribution, the mean is less than the median, which is less than
the mode.

Notes: the median is between the other two measures for positively or negatively skewed
distributions. Skew affects the mean more than the median and mode, and the mean is “pulled” in
the direction of the skew.
10. The Shape of The Distributions – Kurtosis
 Kurtosis (Độ nhọn) is a measure of the degree to which a distribution is more or less “peaked” than a normal distribution.
 Leptokurtic (fat-tailed) (Cao) describes a distribution that is more peaked (with fatter tails) than a normal distribution.
 Platykurtic (thin-tailed) (Lùn) refers to a distribution that is less peaked, or flatter (with thinner tails), than a normal distribution.
 A distribution is mesokurtic (Bình thường) if it has the same kurtosis as a normal distribution.
 The computed kurtosis for all normal distributions is 3. Statisticians, however, sometimes report excess kurtosis, which is defined as kurtosis minus 3. Sample kurtosis is computed as:
Kurtosis ≈ (1/n) × Σᵢ₌₁ⁿ (Xᵢ − X̄)⁴ / s⁴, where s is the sample standard deviation
Excess kurtosis = Sample kurtosis − 3
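The skewness and kurtosis formulas above can be sketched together (my own illustration; here applied to the 12 monthly returns from the earlier downside-deviation example, chosen only as convenient input):

```python
import math

# Sketch: sample skewness and excess kurtosis via the (1/n) * sum of cubed
# (resp. fourth-power) deviations divided by s^3 (resp. s^4).

def sample_moments(xs):
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    skew = sum((x - mean) ** 3 for x in xs) / (n * s ** 3)
    kurt = sum((x - mean) ** 4 for x in xs) / (n * s ** 4)
    return skew, kurt - 3          # (skewness, excess kurtosis)

skew, excess_kurt = sample_moments([5, 3, -1, -4, 4, 2, 0, 4, 3, 0, 6, 5])
# skew < 0: the low outliers (-4, -1) pull the left tail out
```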
11. Correlation Between Two Variables
 The sample covariance (Hiệp phương sai) is a measure of how two variables in a sample move together:
s_XY = Σᵢ₌₁ⁿ (Xᵢ − X̄)(Yᵢ − Ȳ) / (n − 1)
 Interpreting covariance:
 Negative covariance: X and Y tend to be on opposite sides of their means at the same time.
 Positive covariance: X and Y tend to be on the same side (above or below) of their means at the same time.
 Covariance of returns is 0 if X and Y are unrelated.
 Covariance is difficult to interpret because it can take on extremely large values, ranging from negative to positive infinity, and it is expressed in squared units ⟹ the sample correlation coefficient (Hệ số tương quan): a standardized measure of how two variables in a sample move together:
r_XY = s_XY / (s_X × s_Y)
 Properties of the correlation of two random variables:
 Correlation has no units.
 The correlation ranges from −1 to +1.
 Correlation = 0: there is no linear relationship between the variables.
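The covariance and correlation formulas can be sketched as follows (my own illustration; the two small return series are made-up numbers):

```python
import math

# Sketch: sample covariance with the n - 1 divisor; correlation standardizes
# it by the two sample standard deviations.

def sample_cov(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def sample_corr(xs, ys):
    sx = math.sqrt(sample_cov(xs, xs))   # cov of a series with itself = variance
    sy = math.sqrt(sample_cov(ys, ys))
    return sample_cov(xs, ys) / (sx * sy)

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]     # perfectly linear in x -> correlation = +1
```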
11. Correlation Between Two Variables
 Scatter plots are a very useful tool for the sensible interpretation of a correlation coefficient.
 Limitations:
 Two variables can have a strong nonlinear relation and still have a very low correlation.
 Correlation may also be an unreliable measure when outliers are present.
 Correlation does not imply causation.
 The term spurious correlation (tương quan giả) has been used to refer to:
 (1) correlation between two variables that reflects chance relationships in a particular data set;
 (2) correlation induced by a calculation that mixes each of two variables with a third;
 (3) correlation between two variables arising not from a direct relation between them but from their relation to a third variable.
READING 3: PROBABILITY CONCEPTS
Outline
 1. Introduction
 2. Conditional and Joint Probability
 3. Expected Value and Variance
 4. Expected Value, Variance, Covariances, and Correlations of Portfolio Returns
 5. Bayes’ Formula
 6. Principles of Counting
1. Introduction – Random variable, outcome, and events
 A random variable (Biến ngẫu nhiên) is an uncertain quantity/number.
 An outcome (Kết quả) is an observed value of a random variable.
 An event (Biến cố) is a single outcome or a set of outcomes.
 Mutually exclusive events (Biến cố xung khắc) are events that cannot both happen at the same time.
 Exhaustive events (Biến cố đầy đủ) are those that include all possible outcomes.
 Example: Consider rolling a 6-sided die. The number that comes up is a random variable. If you roll a 1,
that is an outcome. Rolling a 1 is an event, and rolling an even number is an event. Rolling a 4 and
rolling a 6 are mutually exclusive events. Rolling an even number and rolling an odd number is a set of
mutually exclusive and exhaustive events.
1. Introduction – Probability
 Probability (Xác suất) is a number between 0 and 1 that measures the chance that a stated event will occur:

The probability of any event E is a number between 0 and 1: 0 ≤ P(E) ≤ 1.

The sum of the probabilities of any set of mutually exclusive and exhaustive events equals 1.
 Example: The probability of rolling any one of the numbers 1 to 6 with a fair die is 1/6 = 0.1667 = 16.7%. The set of
events - rolling a number equal to 1, 2, 3, 4, 5, or 6 - is exhaustive, and the individual events are mutually
exclusive, so the probability of this set of events is equal to 1.
 An empirical probability (Xác suất thực nghiệm) is established by analyzing past data. Example: Based on the
sample of 200 companies in 2010, 60 companies paid dividends. The empirical probability of a firm paying
dividend is therefore 0.3.
 A subjective probability (Xác suất chủ quan) is based on the use of personal judgment. Example: I believe there
is a 70% probability that ABC stock price will increase this year.
 An a priori probability (Xác suất biết trước) is determined using logical analysis rather than on observation or
personal judgment. Example: The probability of getting aces from the deck is 4/52.
 Objective probabilities (Xác suất khách quan) refer to a priori and empirical probabilities because it does not
vary from person to person.
1. Introduction – Odds
 Probabilities are often stated as odds for or against a given event occurring.
 The odds for the event occurring are P(E) / [1 − P(E)].
 The odds against the event occurring are [1 − P(E)] / P(E).
 Example: Two of your colleagues are taking a quantitative methods investment course:

If your first colleague has a 0.40 probability of passing, what are his odds for passing?

If your second colleague has odds of passing of 4 to 1, what is the probability of her passing?
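The two conversions in the example above can be sketched as follows, using the 0.40 probability and the 4-to-1 odds from the question.

```python
# Converting between probability and "odds for", per the definitions above.
def odds_for(p):
    # odds for = P(E) / (1 - P(E))
    return p / (1 - p)

def prob_from_odds_for(a, b):
    # odds for of "a to b" imply probability a / (a + b)
    return a / (a + b)

print(odds_for(0.40))                   # 2/3, i.e. odds for of 2 to 3
print(prob_from_odds_for(4, 1))         # 0.8
```

So the first colleague's odds for passing are 2 to 3, and the second colleague's probability of passing is 0.8.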
2. Conditional and Joint Probability
 Unconditional probability (Xác suất không điều kiện) or Marginal probability refers to the probability of an
event regardless of the past or future occurrence of other events – P(A).
 A conditional probability (Xác suất có điều kiện) is one where the occurrence of one event affects the
probability of the occurrence of another event – P(A|B), the probability of A given B.
 Joint probability (Xác suất chung) of two events is the probability that they will both occur – P(AB), the
probability of A and B.
 Multiplication Rule for Probability (Nhân xác suất): P(AB) = P(A|B) × P(B) - the joint probability of A and B, P(AB), is equal to the conditional probability of A given B, P(A|B), times the unconditional probability of B, P(B).

Example: P(I) = 0.4, the probability of the monetary authority increasing interest rates (I) is 40%. P(R | I) =
0.7, the probability of a recession (R) given an increase in interest rates is 70%. What is P(RI), the joint
probability of a recession and an increase in interest rates?
2. Conditional and Joint Probability
 Addition Rule for Probabilities: P(A or B) = P(A) + P(B) − P(AB) - the probability that A or B occurs, or both occur, is equal to the probability that A occurs, plus the probability that B occurs, minus the probability that both A and B occur.
 Example: One buy order (Order 1) was placed at a price limit of $10. The probability that it will execute
within one hour is 0.35. The second buy order (Order 2) was placed at a price limit of $9.75; it has a 0.25
probability of executing within the same one-hour time frame. (1) What is the probability that either
Order 1 or Order 2 will execute? (2) What is the probability that Order 2 executes, given that Order 1
executes?
 Note: if the events are mutually exclusive the sets do not intersect, P(AB) = 0, and the probability that
one of the two events will occur is simply P(A) + P(B).
 Example: For a fair die, the probability of rolling a 4 is P(4) = 1/6 and the probability of rolling a 5 is P(5) = 1/6. The probability of rolling a 4 or a 5 is P(4 or 5) = 1/6 + 1/6 = 1/3.
2. Conditional and Joint Probability
 Independent events (Biến cố độc lập) refer to events for which the occurrence of one has no influence on the occurrence of the others: P(A|B) = P(A), or equivalently, P(B|A) = P(B).
 Multiplication Rule for Independent Events: P(AB) = P(A) × P(B)
 Example: What is the probability of rolling three 4s in one simultaneous toss of three dice?
 Total probability rule (Tổng xác suất) highlights the relationship between unconditional and conditional
probabilities of mutually exclusive and exhaustive events:
P(A) = P(AS₁) + … + P(ASₙ) = P(A|S₁) × P(S₁) + … + P(A|Sₙ) × P(Sₙ)

Where S1, S2, …, Sn are mutually exclusive and exhaustive scenarios or events.
 If we have an event or scenario S, the event not-S, called the complement of S, is written SC. Note that
P(S) + P(SC) = 1.
 Example: Assume that P(R|I) = 0.70; P(R|Iᶜ), the probability of recession if interest rates do not rise, is 10%; and P(I) = 0.40, so that P(Iᶜ) = 0.60. What is the unconditional probability of a recession?
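The total probability rule applied to the recession example above can be written out in a few lines.

```python
# Total probability rule: P(R) = P(R|I)P(I) + P(R|I^c)P(I^c),
# using the numbers from the example above.
p_i = 0.40                   # P(interest rates rise)
p_r_given_i = 0.70           # P(recession | rates rise)
p_r_given_ic = 0.10          # P(recession | rates do not rise)

p_r = p_r_given_i * p_i + p_r_given_ic * (1 - p_i)
print(p_r)                   # 0.34
```

The unconditional probability of a recession is therefore 34%.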
3. Expected Value and Variance
 The expected value – E(X) of a random variable is the probability-weighted average of the possible
outcomes of the random variable:
E(X) = Σ_{i=1..n} P(Xᵢ) × Xᵢ
 The variance of a random variable is the expected value (the probability-weighted average) of
squared deviations from the random variable’s expected value:
σ²(X) or Var(X) = Σ_{i=1..n} [Xᵢ − E(X)]² × P(Xᵢ) = E{[X − E(X)]²}
 Because variance has units that are squared, it is not easy to interpret. Accordingly, we use its positive
square root, standard deviation, more often because it also measures dispersion but has the same units
as expected value.
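The expected value, variance, and standard deviation definitions above can be sketched with a small discrete distribution; the EPS outcomes and probabilities below are hypothetical.

```python
# Probability-weighted expected value and variance of a discrete
# random variable, per the formulas above. Hypothetical EPS outcomes.
probs = [0.3, 0.5, 0.2]
outcomes = [1.80, 2.40, 3.00]           # possible EPS values (USD)

ev = sum(p * x for p, x in zip(probs, outcomes))
var = sum(p * (x - ev) ** 2 for p, x in zip(probs, outcomes))
std = var ** 0.5                        # same units as EPS
print(ev, var, std)
```

Note that the standard deviation (0.42 here) is in dollars, like the expected value, while the variance is in squared dollars.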
3. Expected Value and Variance – Conditional Expectation
 The expected value of a random variable X given an event S has occurred:
E(X|S) = P(X₁|S) × X₁ + … + P(Xₙ|S) × Xₙ
 Total probability rule for expected value:
E(X) = E(X|S₁) × P(S₁) + … + E(X|Sₙ) × P(Sₙ)
 The conditional variance of a random variable X given an event S has occurred:
σ²(X|S) or Var(X|S) = Σ_{i=1..n} [Xᵢ − E(X|S)]² × P(Xᵢ|S)
 Example: Calculate E(EPS | stable interest rate), E(EPS), Var(EPS | declining interest rate), and Var(EPS)
4. Applications in Portfolio
 The expected return on the portfolio:
E(Rₚ) = E(w₁R₁ + … + wₙRₙ) = w₁E(R₁) + … + wₙE(Rₙ)
 Example: Calculate expected returns of a portfolio with 50% invested in asset A, 25% in asset B, and 25%
in C, and the expected returns are 13%, 6%, and 15% respectively.
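The portfolio expected-return example above works out as a weighted sum.

```python
# Expected portfolio return E(Rp) = sum of w_i * E(R_i), using the
# weights and expected returns from the example above.
weights = [0.50, 0.25, 0.25]            # assets A, B, C
exp_returns = [0.13, 0.06, 0.15]

e_rp = sum(w * r for w, r in zip(weights, exp_returns))
print(e_rp)                             # 0.1175, i.e. 11.75%
```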
 Covariance (Hiệp phương sai) is a measure of how two assets move together. Given two random
variables 𝑅𝑖 and 𝑅𝑗 , the covariance between 𝑅𝑖 and 𝑅𝑗 is:
Cov(Rᵢ, Rⱼ) = E[(Rᵢ − E(Rᵢ))(Rⱼ − E(Rⱼ))]
 Alternative notations are σ(Rᵢ, Rⱼ) and σᵢⱼ
 We can also determine covariance using historical data. The calculation of the sample covariance is
based on the following formula:
Cov(Rᵢ, Rⱼ) = [Σ_{t=1..n} (Rᵢ,ₜ − R̄ᵢ)(Rⱼ,ₜ − R̄ⱼ)] / (n − 1)
4. Applications in Portfolio
 Covariance using a joint probability model: The joint probability function of two random variables X and
Y, denoted P(X,Y), gives the probability of joint occurrences of values of X and Y.
 The joint probability function below might reflect an analysis based on whether economic conditions
are good, average, or poor.
Joint Probability Function of asset A and B

              R_B = 20%   R_B = 16%   R_B = 10%
R_A = 25%        0.2          0           0
R_A = 12%         0          0.5          0
R_A = 10%         0           0          0.3
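The covariance implied by the joint probability table above can be computed by probability-weighting the cross-products of deviations; only the three diagonal cells carry probability.

```python
# Covariance of R_A and R_B from the joint probability function above:
# Cov = sum of P(RA, RB) * (RA - E(RA)) * (RB - E(RB)) over all cells.
cells = [                    # (probability, R_A, R_B), returns in percent
    (0.2, 25, 20),
    (0.5, 12, 16),
    (0.3, 10, 10),
]
e_a = sum(p * ra for p, ra, _ in cells)
e_b = sum(p * rb for p, _, rb in cells)
cov_ab = sum(p * (ra - e_a) * (rb - e_b) for p, ra, rb in cells)
print(e_a, e_b, cov_ab)      # expected returns 14%, 15%; covariance 16
```

The positive covariance of 16 (in %-squared units) reflects that A and B tend to be above or below their means in the same states of the economy.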
4. Applications in Portfolio
 A covariance matrix shows the covariances between returns on a group of assets.
 In practice, the covariance is difficult to interpret because it can take on extremely large values,
ranging from negative to positive infinity, and, like the variance, these values are expressed in terms of
square units.
 To interpret the relationship between 2 random variables, we use correlation coefficient (Hệ số tương
quan), or simply, correlation:
Corr(Rᵢ, Rⱼ) = Cov(Rᵢ, Rⱼ) / [σ(Rᵢ) × σ(Rⱼ)]
 Alternative notations are ρ(Rᵢ, Rⱼ) and ρᵢⱼ.
 Correlation can be forward-looking if it uses covariance from a probability model, or backward-looking if
it uses sample covariance from historical data.
4. Applications in Portfolio
 Variance of portfolio return:
Var(Rₚ) = Σ_{i=1..n} Σ_{j=1..n} wᵢwⱼ × Cov(Rᵢ, Rⱼ)
 The variance of a portfolio composed of asset A and asset B can be expressed as:
Var(Rₚ) = w_A²σ²(R_A) + w_B²σ²(R_B) + 2w_A w_B Cov(R_A, R_B)
= w_A²σ²(R_A) + w_B²σ²(R_B) + 2w_A w_B σ(R_A)σ(R_B)ρ(R_A, R_B)
 Example: A portfolio with 50% invested in S&P 500, 25% in US Long-Term Corporate Bonds, and 25% in MSCI
EAFE. Calculate portfolio variance.
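Since the example's covariance table is not reproduced here, the two-asset form of the portfolio variance formula can be sketched with hypothetical inputs instead.

```python
# Two-asset portfolio variance per the formula above:
# Var(Rp) = wA^2 sA^2 + wB^2 sB^2 + 2 wA wB sA sB rho.
# All inputs below are hypothetical, not the example's data.
w_a, w_b = 0.6, 0.4
s_a, s_b = 0.20, 0.10        # standard deviations of returns on A and B
rho = 0.3                    # correlation between A and B

var_p = (w_a ** 2 * s_a ** 2 + w_b ** 2 * s_b ** 2
         + 2 * w_a * w_b * s_a * s_b * rho)
print(var_p, var_p ** 0.5)   # variance and portfolio standard deviation
```

Because rho < 1, the portfolio standard deviation (about 13.7%) is less than the weighted average of the two asset standard deviations (16%), which is the diversification effect.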
5. Bayes’ Formula
 Bayes’ formula: Given a set of prior probabilities for an event of interest, if you receive new information,
the rule for updating your probability of the event is:
Updated probability of event given the new information
= [Probability of the new information given event / Unconditional probability of the new information] × Prior probability of event
 In probability notation, this formula can be written concisely as:
P(Event | Information) = [P(Information | Event) / P(Information)] × P(Event)
 This updated probability is called your posterior probability because it reflects or comes after the new
information.
 Example: There is a 60% probability the economy will outperform, and if it does, there is a 70% chance a
stock will go up and a 30% chance the stock will go down. There is a 40% chance the economy will
underperform, and if it does, there is a 20% chance the stock in question will increase in value (have
gains) and an 80% chance it will not. Given that the stock increased in value, calculate the probability
that the economy outperformed.
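Bayes' formula applied to the stock example above: the denominator comes from the total probability rule, and the prior of 0.60 is updated given the observed price increase.

```python
# Bayes' formula for the example above: update P(economy outperforms)
# given that the stock went up.
p_out = 0.60                       # prior P(outperform)
p_up_given_out = 0.70              # P(stock up | outperform)
p_up_given_under = 0.20            # P(stock up | underperform)

# Unconditional P(stock up), via the total probability rule
p_up = p_up_given_out * p_out + p_up_given_under * (1 - p_out)

# Posterior probability
p_out_given_up = p_up_given_out * p_out / p_up
print(p_out_given_up)              # 0.84
```

Observing the gain raises the probability that the economy outperformed from the prior of 60% to a posterior of 84%.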
6. Principles of Counting
 Labeling refers to the situation where there are n items that can each receive one of k different labels:
Number of ways of labeling = n! / [(n₁!) × (n₂!) × … × (nₖ!)], where the symbol "!" stands for factorial and n₁ + … + nₖ = n

Example: Consider a portfolio consisting of eight stocks. Your goal is to designate four of the stocks as
“long-term holds,” three of the stocks as “short-term holds,” and one stock as “sell.” How many ways can
these eight stocks be labeled?
 A special case of labeling arises when the number of labels equals 2 (k = 2).
 Combination Formula (Binomial Formula) (tổ hợp): The number of ways that we can choose r objects from a total of n objects, when the order in which the r objects are listed does not matter, is:
nCr = n! / [(n − r)! × r!]
 Example: A firm will select two of four vice presidents to be added to the investment committee. How many different groups of two are possible?
 Permutation Formula (chỉnh hợp): The number of ways that we can choose r objects from a total of n objects, when the order in which the r objects are listed does matter, is:
nPr = n! / (n − r)!
 Example: From an approved list of 25 funds, a portfolio manager wants to rank 4 mutual funds from most recommended to least recommended. How many ways can the funds be ranked?
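The three counting examples above (labeling, combination, permutation) can be checked with the standard library's factorial, comb, and perm functions.

```python
# Counting for the three examples above.
from math import factorial, comb, perm

# Labeling: 8 stocks split into labels of size 4, 3, and 1
labelings = factorial(8) // (factorial(4) * factorial(3) * factorial(1))

# Combination: choose 2 of 4 vice presidents (order does not matter)
groups = comb(4, 2)

# Permutation: rank 4 of 25 funds (order matters)
rankings = perm(25, 4)

print(labelings, groups, rankings)   # 280 6 303600
```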
READING 4: COMMON PROBABILITY DISTRIBUTIONS
OUTLINE
 1. Discrete Random Variables
 2. The Discrete and Continuous Uniform Distribution
 3. Binomial Distribution
 4. Normal Distribution
 5. Applications of The Normal Distribution
 6. Lognormal Distribution
 7. Student’s T-, Chi-Square, and F-Distributions
 8. Monte Carlo Simulation
1. Discrete Random Variables
 A probability distribution (Phân phối xác suất) specifies the probabilities of the possible outcomes of a
random variable. A random variable is a quantity whose future outcomes are uncertain
 A discrete random variable (Biến ngẫu nhiên rời rạc) is one for which the number of possible outcomes can
be counted.

Example: outcomes when rolling a dice, number of days in next month when VN30 will increase.
 A continuous random variable (Biến ngẫu nhiên liên tục) is one for which the number of possible outcomes is
uncountable, even if lower and upper bounds exist.

Example: Changes in VN30 index.
 A probability function (Hàm xác suất) specifies the probability that the random variable takes on a specific value.
 For a discrete random variable, the shorthand notation for the probability function is p(x) = P(X = x) > 0.
 For a continuous random variable, the probability function is denoted f(x) and called the probability density function (pdf) (Hàm mật độ xác suất), or just the density. For a continuous random variable, P(X = x) = 0 for any single value x.
 The two key properties of a probability function are:
 0 ≤ p(x) ≤ 1.
 σ 𝒑 𝒙 = 𝟏, the sum of the probabilities for all possible outcomes, x, for a random variable, X, equals 1.
2. The Discrete and Continuous Uniform Distribution
 The cumulative distribution function (cdf) (Hàm phân phối tích lũy), or distribution function for short, gives
the probability that a random variable X is less than or equal to a particular value x, 𝐅 𝐱 = 𝐏 𝐗 ≤ 𝐱
 The Discrete Uniform Distribution (Phân phối đều rời rạc) is one for which the probabilities for all possible
outcomes for a discrete random variable are equal.

Consider the discrete uniform probability distribution defined as X = {1, 2, 3, 4, 5, 6}, p(x) = 0.167

The cumulative distribution function for the nth outcome is F(xₙ) = n × p(x)
 The probability for a range of outcomes is k × p(x), where k is the number of possible outcomes in the range.
 Example: Determine p(6), F(6), and P(2 ≤ X ≤ 8) for the discrete uniform distribution function defined as: X
= {2, 4, 6, 8, 10}.
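The discrete uniform example above can be worked out directly from the definitions of p(x), F(x), and range probabilities.

```python
# Discrete uniform X = {2, 4, 6, 8, 10}: p(6), F(6), and P(2 <= X <= 8).
outcomes = [2, 4, 6, 8, 10]
p = 1 / len(outcomes)                        # each outcome has probability 0.2

p6 = p                                       # p(6)
f6 = sum(p for x in outcomes if x <= 6)      # F(6): cumulative up to 6
p_2_to_8 = sum(p for x in outcomes if 2 <= x <= 8)   # 4 outcomes in range
print(p6, f6, p_2_to_8)                      # approximately 0.2, 0.6, 0.8
```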
2. The Discrete and Continuous Uniform Distribution
 The possible outcomes of continuous random variables are never countable.
 The continuous uniform distribution (Phân phối đều liên tục) is defined over a range that spans between
some lower limit, a, and some upper limit, b, which serve as the parameters of the distribution.
 Outcomes can only occur between a and b and for all a ≤ 𝒙𝟏 < 𝒙𝟐 ≤ b. The properties of a continuous
uniform distribution may be described as follows:
P(X = x) = 0
P(X < a or X > b) = 0
P(x₁ < X < x₂) = (x₂ − x₁) / (b − a)
 Example: X is uniformly distributed between 2 and 12. Calculate the probability that X will be between 4
and 8.
 Continuous uniform cumulative distribution:
F(x) = 0 for x < a
F(x) = (x − a) / (b − a) for a ≤ x ≤ b
F(x) = 1 for x > b
 Example: X is uniformly distributed between 2 and 12. Calculate F 8 .
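Both continuous uniform examples above (the range probability and the cdf value) follow from the same two formulas.

```python
# Continuous uniform on [2, 12]: P(4 < X < 8) and F(8), per the
# formulas above.
a, b = 2.0, 12.0

def prob_between(x1, x2):
    # P(x1 < X < x2) = (x2 - x1) / (b - a)
    return (x2 - x1) / (b - a)

def cdf(x):
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

print(prob_between(4, 8), cdf(8))   # 0.4 and 0.6
```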
3. Binomial Distribution
 A binomial random variable (Biến nhị thức ngẫu nhiên) may be defined as the number of “successes” in a
given number of trials, whereby the outcome can be either “success” or “failure.” Such a trial is a
Bernoulli trial.
 The probability of success, p, is constant for each trial, and the trials are independent.
 The binomial probability function (Hàm phân phối nhị thức) defines the probability of x successes in n trials:
p(x) = P(X = x) = [n! / ((n − x)! × x!)] × pˣ × (1 − p)ⁿ⁻ˣ
 A binomial random variable is completely described by two parameters, n and p: 𝑿~𝑩(𝒏, 𝒑)
 Example: Calculate the probability of winning a lottery bet exactly 3 times in 10 plays.
 For a given series of n trials, the expected number of successes, or E(X), is given by the following formula:
𝐄 𝐗 = 𝐧𝐩
 The variance of a binomial random variable is given by:
𝐕𝐚𝐫 𝐗 = 𝐧𝐩(𝟏 − 𝐩)
 Example: Calculate the expected number of defaults and the standard deviation of number of defaults
over the next year of a high-yield bond portfolio with 25 US issues from distinct issuers. The estimated
annual default rate is 10.7 percent.
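The default example above uses E(X) = np and Var(X) = np(1 − p) with n = 25 and p = 0.107; the binomial pmf is included as a check.

```python
# Binomial defaults: n = 25 issues, p = 0.107 annual default rate.
from math import comb, sqrt

n, p = 25, 0.107

def binom_pmf(x):
    # p(x) = nCx * p^x * (1 - p)^(n - x)
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

expected = n * p                        # expected number of defaults
std = sqrt(n * p * (1 - p))             # standard deviation of defaults
print(expected, std, binom_pmf(0))      # 2.675 defaults, sd ~1.55,
                                        # plus P(no defaults) as a check
```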
3. Binomial Distribution
 A binomial tree is constructed by showing all the possible combinations of up-moves and down-moves
over a number of successive periods.
 The probability of an up-move (the up-transition probability, u) is p and the probability of a down-move
(the down-transition probability, d) is (1 − p). Each of the possible values along a binomial tree is a node.
 One of the important applications of a binomial stock price model is in pricing options.
 Example: For a binomial random variable with five trials and a probability of success on each trial of
0.50, the distribution will be?
4. Normal Distribution
 The normal distribution (Phân phối chuẩn) has the following key properties:

It is completely described by its mean, μ, and variance, 𝝈𝟐 , stated as 𝑿~𝑵(𝝁, 𝝈𝟐 ).

Skewness = 0, meaning that the normal distribution is symmetric about its mean.

Kurtosis = 3, or excess kurtosis = 0.

As a consequence of symmetry, the mean, median, and the mode are all equal for a normal
random variable.

Note: A linear combination of normally distributed random variables is also normally distributed.
4. Normal Distribution
 We can calculate the probability that a normally distributed random variable lies inside a given interval:

Approximately 50% of all observations fall in the interval μ ± (2/3)σ.

Approximately 68% of all observations fall in the interval μ ± σ.

Approximately 95% of all observations fall in the interval μ ± 2σ.

Approximately 99% of all observations fall in the interval μ ± 3σ.
 Example: The chief investment officer of Fund XYZ would like to present some investment return scenarios to
the Investment Committee, so she asks your assistance with some indicative numbers. Assuming daily asset
returns are normally distributed, she would like to know the following:

What is the probability that returns would be less than or equal to 1 standard deviation below the
mean?

What is the probability that returns would be between +1 and −1 standard deviation around the
mean?

How far (in terms of standard deviation) must returns fall below the mean for the probability to equal
95%?
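The CIO's three questions above can be answered with the standard normal cdf; the standard library's NormalDist gives exact values rather than the approximate interval rules.

```python
# Standard normal probabilities for the three questions above.
from statistics import NormalDist

z = NormalDist()                        # mean 0, standard deviation 1

p_below_minus1 = z.cdf(-1)              # P(return <= mean - 1 sd), ~0.1587
p_within_1sd = z.cdf(1) - z.cdf(-1)     # P(within +/- 1 sd), ~0.6827
z_for_95 = z.inv_cdf(0.05)              # returns exceed this level 95% of
                                        # the time: ~1.645 sd below the mean
print(p_below_minus1, p_within_1sd, z_for_95)
```

So roughly 16% of daily returns fall more than one standard deviation below the mean, about 68% fall within one standard deviation, and the 95% threshold sits about 1.645 standard deviations below the mean.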
4. Normal Distribution
 A univariate distribution (Phân phối đơn biến) describes a single random variable. For example: Model the
distribution of returns on each asset individually.
 A multivariate distribution (Phân phối đa biến) specifies the probabilities for a group of related random
variables. For example: Model the distribution of returns on the assets as a group.
 A multivariate normal distribution for the returns on n stocks is completely defined by three lists of parameters:

n means of the n series of returns (𝝁𝟏, 𝝁𝟐, …, 𝝁𝒏 );

n variances of the n series of returns (𝝈𝟏 𝟐 , …, 𝝈𝒏 𝟐 );

n(n − 1)/2 distinct pairwise return correlations.
 For example: For a portfolio with 5 assets, how many parameters are needed to describe the multivariate distribution?
4. Normal Distribution – Standard Normal Distribution
 The standard normal distribution (Phân phối chuẩn hóa) is a normal distribution that has been standardized so
that it has a mean of 0 and a standard deviation of 1 [i.e., N~(0,1)].
 To standardize an observation from a given normal distribution, the z-value of the observation must be
calculated. The z-value represents the number of standard deviations a given observation is from the
population mean.
z-value = (X − μ) / σ
 Example: Assume that the annual earnings per share (EPS) for a population of firms are normally distributed with
a mean of $6 and a standard deviation of $2. What are the z-values for EPS of $2 and $8?
 Explain: The probability that we will observe an EPS as small as or smaller than $8 for X ~ N(6,2) is exactly
the same as the probability that we will observe a value as small as or smaller than 1 for Z ~ N(0,1).
 Z-table contains values generated using the cumulative distribution function for a standard normal distribution. N(z) is a conventional notation for the cdf of a standard normal variable.
For z ≥ 0: P(Z ≤ z) = N(z) = P(Z ≥ −z), and P(Z ≥ z) = 1 − N(z) = P(Z ≤ −z)
 Example: Considering again EPS distributed with μ = $6 and σ = $2, what is the probability that EPS will be $9.70
or more?
 Notes: N(-z) = 1 − N(z).
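The two EPS questions above (the z-values and the $9.70 tail probability) can be computed by standardizing and applying the standard normal cdf.

```python
# z-values for EPS ~ N(mu = 6, sigma = 2), and P(EPS >= 9.70).
from statistics import NormalDist

mu, sigma = 6.0, 2.0
z2 = (2 - mu) / sigma                   # z for EPS = $2 -> -2.0
z8 = (8 - mu) / sigma                   # z for EPS = $8 -> +1.0

# P(EPS >= 9.70) = 1 - N(z) with z = (9.70 - 6) / 2 = 1.85
p_ge_970 = 1 - NormalDist(mu, sigma).cdf(9.70)
print(z2, z8, p_ge_970)                 # tail probability about 0.0322
```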
4. Normal Distribution – Standard Normal Distribution
 Example: Assume the portfolio mean return is 12 percent and the standard deviation of return estimate
is 22 percent per year. You want to calculate the following probabilities, assuming that a normal
distribution describes returns. (You can use the excerpt from the table of normal probabilities to answer
these questions.)

What is the probability that portfolio return will exceed 20 percent?

What is the probability that portfolio return will be between 12 percent and 20 percent? In other
words, what is P(12% ≤ Portfolio return ≤ 20%)?

 You can buy a one-year T-bill that yields 5.5 percent. This yield is effectively a one-year risk-free interest rate. What is the probability that your portfolio's return will be equal to or less than the risk-free rate?
5. Applications of The Normal Distribution
 Mean – variance analysis generally considers risk symmetrically in the sense that standard deviation
captures variability both above and below the mean.
 Safety-first rules focus on shortfall risk (Rủi ro thâm hụt), the risk that portfolio value will fall below some
minimum acceptable level over some time horizon.
 Roy’s safety-first criterion states that the optimal portfolio minimizes the probability that portfolio return, R_P, falls below the threshold level, R_L. That means minimize P(R_P < R_L) or maximize the SFRatio:
SFRatio = [E(R_P) − R_L] / σ_P
P(R_P < R_L) = N[(R_L − E(R_P)) / σ_P] = N(−SFRatio) = 1 − N(SFRatio)
 If we substitute the risk-free rate, 𝑹f, for the critical level 𝑹𝑳 , the SFRatio becomes the Sharpe ratio.
5. Applications of The Normal Distribution
 Example: You are researching asset allocations for a client in Canada with a C$800,000 portfolio.
Although her investment objective is long-term growth, at the end of a year she may want to liquidate
C$30,000 of the portfolio to fund educational expenses. If that need arises, she would like to be able to
take out the C$30,000 without invading the initial capital of C$800,000. Table 6 shows three alternative
allocations.

Given the client’s desire not to invade the C$800,000 principal, what is the shortfall level, 𝑹𝑳 ? Use
this shortfall level to answer Part 2.

According to the safety-first criterion, which of the three allocations is the best?

What is the probability that the return on the safety-first optimal will be less than the shortfall level?
6. Lognormal Distribution
 The lognormal distribution is generated by the function e^X, where X is normally distributed.
 A random variable Y follows a lognormal distribution if its natural logarithm, ln(Y), is normally distributed. Equivalently, if a random variable X is normally distributed, then e^X follows a lognormal distribution.
 Properties of lognormal distribution:

The lognormal distribution is skewed to the right.

The lognormal distribution is bounded from below by zero so that it is useful for modeling asset prices
which never take negative values.
 Continuously compounded return from time t to t + 1:
r_{t,t+1} = ln(S_{t+1} / S_t) = ln(1 + HPR)
 Example: A stock was purchased for $100 and sold one year later
for $120. Calculate the investor’s annual rate of return on a
continuously compounded basis
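The example above is a one-line application of r = ln(S₁/S₀).

```python
# Continuously compounded annual return for the $100 -> $120 example.
from math import log

r = log(120 / 100)           # ln(1 + HPR) = ln(1.20)
print(r)                     # about 0.1823, i.e. 18.23% per year
```

For comparison, the holding period return is 20%; the continuously compounded figure is always smaller for a positive HPR.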
6. Lognormal Distribution
 Continuously compounded return from time 0 to T:
r_{0,T} = ln(S_T / S_0) = ln[(S_T / S_{T−1}) × (S_{T−1} / S_{T−2}) × … × (S_1 / S_0)] = r_{T−1,T} + r_{T−2,T−1} + … + r_{0,1}
 If we assume one-period continuously compounded returns are independently and identically distributed
(i.i.d) (Independence = investors cannot predict future returns using past returns. Identical distribution =
the mean and variance of return do not change from period to period):
𝐄 𝐫𝟎,𝐓 = 𝐄 𝐫𝐓−𝟏,𝐓 + ⋯ + 𝐄 𝐫𝟎,𝟏 = 𝛍𝐓
𝛔𝟐 𝐫𝟎,𝐓 = 𝛔𝟐 𝐓
 Where:

𝛍 = Mean of one-period continuously compounded returns.

𝛔𝟐 = Variance of the one-period continuously compounded return.
 Estimate volatility (on an annual basis of 250 trading days) using a historical series of continuously compounded daily returns:
Annualized volatility = σ × √250, where σ is the standard deviation of daily returns
7. Student’s t-, Chi-Square, and F-Distributions
 Student’s t-distribution, or simply the t-distribution, is a bell-shaped probability distribution that is
symmetrical about its mean.
 Properties of t-distribution:
 It is symmetrical.
 It is defined by a single parameter, the degrees
of freedom (df), where the degrees of freedom
are equal to the number of sample observations
minus 1, n − 1, for sample means.
 It has more probability in the tails (“fatter tails”)
than the normal distribution.
 As the degrees of freedom (the sample size)
gets larger, the shape of the t-distribution more
closely approaches a standard normal
distribution.
7. Student’s t-, Chi-Square, and F-Distributions
 Like the t-distribution, a chi-square distribution (χ²) is a family of distributions, each defined by its degrees of freedom, which equal n − 1 (n is the sample size).
 Properties of chi-square distribution:
 Chi-square distribution is asymmetrical (skew
to the right).
 As degrees of freedom get larger, the chi-
square distribution approaches the normal
distribution in shape
 Chi-square distribution is bounded below by
zero.
7. Student’s t-, Chi-Square, and F-Distributions
 Like the chi-square distribution, the F-distribution is a family of asymmetrical distributions bounded from
below by 0.
 The relationship between the chi-square and F-distributions: If χ₁² is one chi-square random variable with m df and χ₂² is another chi-square random variable with n df, then F = (χ₁² / m) / (χ₂² / n) follows an F-distribution with m numerator and n denominator df.
 Properties of F-distribution:
 F-distribution is asymmetrical (skewed to the right).
 As degrees of freedom get larger, the F-distribution approaches the normal distribution in shape.
 F-distribution is bounded below by zero.
8. Monte Carlo Simulation
 Monte Carlo simulation is a technique based on the repeated generation of one or more risk factors
that affect security values, in order to generate a distribution of security values.
 To illustrate the steps, using Monte Carlo simulation to estimate NPV:

1. Specify the probability distributions of Cash flow and of the relevant interest rate, as well as the
parameters (mean, variance, possibly skewness) of the distributions.

2. Randomly generate values for both CF and interest rates using random number generator.

3. Estimate NPV for each pair of CF and interest rate.

4. After many iterations, calculate the mean NPV and use that as your estimate of the NPV.
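The four NPV steps above can be sketched in a minimal simulation. The distributions and their parameters below (a normal one-year cash flow, a normal discount rate, and an initial cost of 900) are assumptions for illustration, not values from the text.

```python
# Minimal Monte Carlo sketch of the four steps above.
import random

random.seed(42)                               # reproducible draws

def simulate_npv(n_iter=100_000, cost=900.0):
    total = 0.0
    for _ in range(n_iter):
        cf = random.gauss(1000.0, 100.0)      # step 2: draw a cash flow
        rate = random.gauss(0.05, 0.01)       # step 2: draw a discount rate
        total += cf / (1 + rate) - cost       # step 3: NPV for this pair
    return total / n_iter                     # step 4: mean NPV estimate

print(simulate_npv())     # near the analytic 1000/1.05 - 900 = 52.38
```

Note the result is a statistical estimate: rerunning with a different seed or more iterations gives a slightly different mean, which is exactly the first limitation listed below.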
8. Monte Carlo Simulation
 Monte Carlo simulation is used to:
 Value complex securities for which no analytic pricing formula is available: contingent claims, mortgage-backed securities with complex embedded options.
 Examine a model’s sensitivity to a change in key assumptions ⇒ can support “what if” analysis.
 Limitations:
 Provides only statistical estimates, not exact results.
 It is a complement to, not a substitute for, analytical methods.
 Provides no insight into cause-and-effect relationships.
READING 5: SAMPLING AND ESTIMATION
OUTLINE
 1. Sampling Methods
 2. The Central Limit Theorem
 3. Point Estimates of The Population Mean
 4. Confidence Interval
 5. Resampling
 6. Data Snooping Bias, Sample Selection Bias, Look-ahead Bias, And Time-period Bias
1. Sampling Methods
 A parameter (tham số tổng thể) is a quantity computed from or used to describe a population of data. For
example: 𝝁, 𝝈
 A statistic (tham số của mẫu) is a quantity computed from or used to describe a sample of data. For example: X̄, s
 There are two types of sampling methods:

Probability sampling: Every member of the population has an equal chance of being selected ⟹ representative of the population.

Nonprobability sampling: Depends on factors other than probability considerations, for example,
researcher’s sample selection capabilities ⟹ non-representative sample.
1. Probability sampling – Simple Random Sampling and Systematic Sampling
 Simple random sample: a subset of a larger population in which each element has an equal probability of
being selected.
 Simple random sampling (Chọn mẫu ngẫu nhiên đơn giản): Number the members of the population in
sequence then use a random-number generator to select.
 Systematic sampling: Select every kth member until we have a sample of the desired size.

For example: Select every 10th company from the list of 600 companies until having 50 companies.
 Sampling error is the difference between a sample statistic (the mean, variance, or standard deviation of
the sample) and its corresponding population parameter (the true mean, variance, or standard deviation of
the population). Example: sampling error of the mean = sample mean – population mean
 The sampling distribution (Phân phối mẫu) of a statistic is the distribution of all the distinct possible values of the
statistic computed from samples of the same size randomly drawn from the population.
 For example: a random sample of 100 bonds is selected from a population of 1,000 bonds, and then the
mean return of the 100-bond sample is calculated. Repeating this process many times will result in many
different estimates of the population mean return (i.e., one for each sample). The distribution of these
estimates of the mean is the sampling distribution of the sample mean (Phân phối trung bình của mẫu).
1. Probability sampling – Stratified Random Sampling
 Stratified random sampling (Lấy mẫu ngẫu nhiên theo nhóm) uses a classification system to separate the
population into smaller groups based on one or more distinguishing characteristics. From each
subgroup, or stratum, a random sample is taken, and the results are pooled.

Bond indexing is one area in which stratified sampling is frequently applied. Bonds in a population
are categorized (stratified) according to major bond risk factors such as duration, maturity, coupon
rate, and the like. Then, samples are drawn from each separate category and combined to form a
final sample.
 For example: You are exploring several approaches to indexing, including a stratified sampling
approach. You first distinguish among agency bonds, US Treasury bonds, and investment grade corporate
bonds. For each of these three groups, you define 10 maturity intervals, and also separate the bonds with
coupons of 6 percent or less from the bonds with coupons of more than 6 percent.

How many cells or strata does this sampling plan entail?

If you use this sampling plan, what is the minimum number of issues the indexed portfolio can have?
1. Probability sampling – Cluster Sampling
 Cluster sampling (Lấy mẫu theo cụm) : Classification is also based on subsets of a population, each subset
(cluster) is representative of the overall population.

In one-stage cluster sampling, a random sample of clusters is selected and all the data in those
clusters comprise the sample.

In two-stage cluster sampling, random samples from each of the selected clusters comprise the
sample.
 Limitations: Lower accuracy because a
sample from a cluster might be less
representative of the entire population.
 Advantage: Time-efficient and cost-efficient
probability sampling plan for analyzing a vast
population.
1. Non – Probability Sampling
 Convenience sampling (Lấy thuận tiện) refers to selecting sample data based on its ease of access,
using data that are readily available.
 Judgmental sampling (Lấy mẫu theo đánh giá) refers to samples for which each observation is selected
from a larger data set by the researcher, based on her experience and judgment.

Could be affected by the bias of the researcher

Judgmental sampling allows researchers to go directly to the target population of interest
 An important consideration when sampling is ensuring that the distribution of the data of interest is constant across the whole population being sampled. For example: judging a characteristic of U.S. banks using data from 2005 to 2015 may not be appropriate if that characteristic changed over the period.
2. The Central Limit Theorem
 The Central Limit Theorem (Định lý giới hạn trung tâm): Given a population described by any probability distribution having mean μ and finite variance σ², the sampling distribution of the sample mean X̄ computed from samples of size n from this population will be approximately normal with mean μ (the population mean) and variance σ²/n (the population variance divided by n) when the sample size n is large.
 The standard deviation of a sample statistic is known as the standard error (Sai số chuẩn) of the statistic:
σ_X̄ = σ/√n
 Example: A population of 2,054 observations. Draw a random sample of 150 observations then calculate
its mean. Repeat that 200 times then we will have distribution of sample mean which is approximately
normal.
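The repeated-sampling experiment above can be sketched in a few lines of Python. The exponential population here is a hypothetical stand-in for "any non-normal distribution"; the point is that the 200 sample means still cluster normally around the population mean:

```python
import random
import statistics

random.seed(42)

# Hypothetical population of 2,054 observations from a skewed
# (exponential) distribution -- deliberately non-normal.
population = [random.expovariate(1.0) for _ in range(2054)]

# Draw 200 random samples of 150 observations each; record each mean.
sample_means = [
    statistics.mean(random.sample(population, 150)) for _ in range(200)
]

# By the CLT, the sample means center on the population mean mu and
# have spread close to sigma / sqrt(n).
mu = statistics.mean(population)
se_theory = statistics.pstdev(population) / 150 ** 0.5
print(round(mu, 3), round(statistics.mean(sample_means), 3), round(se_theory, 3))
```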
2. The Central Limit Theorem
 Properties of central limit theorem:
 If the sample size n is sufficiently large (n ≥ 30), the sampling distribution of the sample means will be
approximately normal.
 E(X̄) = μ.
 The variance of the distribution of sample means is σ²/n, the population variance divided by the sample size.
 Example: The mean hourly wage for Iowa farm workers is $13.50 with a population standard deviation of $2.90.
Calculate and interpret the standard error of the sample mean for a sample size of 30.
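A worked sketch of this calculation, with the values from the example:

```python
import math

# Standard error of the sample mean when the population sigma is known:
# SE = sigma / sqrt(n)
sigma = 2.90   # population standard deviation of hourly wages ($)
n = 30

se = sigma / math.sqrt(n)
print(round(se, 4))   # ~0.5295
```

Interpretation: means of size-30 samples of hourly wages would vary around $13.50 with a standard deviation of about $0.53.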
 However, the population’s standard deviation is almost never known, so we estimate the standard error using the sample standard deviation:
s_X̄ = s/√n, where s² = Σᵢ₌₁ⁿ (Xᵢ − X̄)² / (n − 1)
 Example: Suppose a sample contains the past 30 monthly returns for McCreary, Inc. The mean return is 2% and
the sample standard deviation is 20%. Calculate and interpret the standard error of the sample mean.
 Example: Continuing with our example, suppose that instead of a sample size of 30, we take a sample of the
past 200 monthly returns for McCreary, Inc. In order to highlight the effect of sample size on the sample standard
error, let’s assume that the mean return and standard deviation of this larger sample remain at 2% and 20%,
respectively. Now, calculate the standard error of the sample mean for the 200-return sample.
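A sketch of both McCreary calculations, showing how the standard error shrinks with sample size:

```python
import math

s = 20.0   # sample standard deviation of monthly returns (%)

se_30 = s / math.sqrt(30)    # ~3.65% for the 30-return sample
se_200 = s / math.sqrt(200)  # ~1.41% for the 200-return sample
print(round(se_30, 2), round(se_200, 2))
```

The larger sample cuts the standard error by more than half even though s itself is unchanged.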
3. Point Estimates of The Population Mean
 Estimators (hàm ước lượng): The formulas used to compute the sample mean and all the other sample statistics.
 Estimate (giá trị ước lượng): The particular value calculated from sample observations using an estimator.
 Example: the calculated value of the sample mean in a given sample, used as an estimate of the
population mean, is called a point estimate (ước lượng điểm) of the population mean.
 Desirable properties of an estimator:

 An unbiased estimator (không chệch) is one for which the expected value of the estimator is equal to the parameter you are trying to estimate. The expected value of the sample mean, X̄, equals μ, the population mean, so we say that the sample mean is an unbiased estimator (of the population mean): E(X̄) = μ.
3. Point Estimates of The Population Mean
 An unbiased estimator is also efficient (Hiệu
quả) if no other unbiased estimator of the
same parameter has a sampling distribution
with smaller variance.
 A consistent (Nhất quán) estimator is one for which
the accuracy of the parameter estimate increases as
the sample size increases.
 As the sample size increases, the standard error of
the sample mean falls, and the sampling distribution
bunches more closely around the population mean.
4. Confidence Interval for Population Mean
 Confidence interval (khoảng tin cậy) estimates result in a range of values within which the actual value of
a parameter will lie, given the probability of 𝟏 − 𝜶. Here, alpha, 𝜶, is called the level of significance (mức
ý nghĩa) for the confidence interval, and the probability 𝟏 − 𝜶 is referred to as the degree of confidence
(độ tin cậy).
 A 100(1 − α)% confidence interval for a parameter has the following structure:
𝐏𝐨𝐢𝐧𝐭 𝐞𝐬𝐭𝐢𝐦𝐚𝐭𝐞 ± 𝐑𝐞𝐥𝐢𝐚𝐛𝐢𝐥𝐢𝐭𝐲 𝐟𝐚𝐜𝐭𝐨𝐫 × 𝐒𝐭𝐚𝐧𝐝𝐚𝐫𝐝 𝐞𝐫𝐫𝐨𝐫
 Where:

Point estimate = a point estimate of the parameter (a value of a sample statistic).

Reliability factor = a number based on the assumed distribution of the point estimate and the
degree of confidence (1 − α) for the confidence interval.

Standard error = the standard error of the sample statistic providing the point estimate.
4. Confidence Interval – Normally Distributed Population with Known Variance
 Confidence Intervals for the Population Mean (Normally Distributed Population with Known Variance), A
100(1 − α)% confidence interval for population mean is:
X̄ ± z_(α/2) × σ/√n
 The most commonly used standard normal distribution reliability factors are:
 z_(α/2) = 1.645 for 90% confidence intervals (the significance level is 10%, 5% in each tail).
 z_(α/2) = 1.960 for 95% confidence intervals (the significance level is 5%, 2.5% in each tail).
 z_(α/2) = 2.58 for 99% confidence intervals (the significance level is 1%, 0.5% in each tail).
 Example: Consider a practice exam that was administered to 36 Level I candidates. The mean score on
this practice exam was 80. Assuming a population standard deviation equal to 15, construct and interpret
a 99% confidence interval for the mean score on the practice exam for 36 candidates.
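A worked sketch of the interval, using the values given in the example:

```python
import math

x_bar, sigma, n = 80, 15, 36
z = 2.58  # reliability factor for 99% confidence

half_width = z * sigma / math.sqrt(n)
lower, upper = x_bar - half_width, x_bar + half_width
print(round(lower, 2), round(upper, 2))  # 73.55 86.45
```

Interpretation: we are 99% confident that the population mean score lies between 73.55 and 86.45.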
4. Confidence Interval – Large Sample, Population Variance Unknown
 Confidence Intervals for the Population Mean – The z-Alternative (Large Sample, Population Variance
Unknown), a 100(1 − α)% confidence interval for population mean is :
X̄ ± z_(α/2) × s/√n
 Example: Suppose an investment analyst takes a random sample of US equity mutual funds and
calculates the average Sharpe ratio. The sample size is 100, and the average Sharpe ratio is 0.45. The
sample has a standard deviation of 0.30. Calculate and interpret the 90 percent confidence interval for
the population mean of all US equity mutual funds by using a reliability factor based on the standard
normal distribution.
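A sketch of the z-alternative interval for the Sharpe ratio example:

```python
import math

x_bar, s, n = 0.45, 0.30, 100
z = 1.645  # reliability factor for 90% confidence

se = s / math.sqrt(n)       # 0.03
lower = x_bar - z * se      # ~0.4007
upper = x_bar + z * se      # ~0.4994
print(round(lower, 4), round(upper, 4))
```

Interpretation: we are 90% confident the population mean Sharpe ratio lies between roughly 0.40 and 0.50.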
4. Confidence Interval – Population Variance Unknown
 Confidence Intervals for the Population Mean (Population Variance Unknown) - t-Distribution. If we are
sampling from a population with unknown variance and either of the conditions below holds:

The sample is large, or

The sample is small, but the population is normally distributed, or approximately normally distributed
 A 100(1 − α)% confidence interval for population mean is:
X̄ ± t_(α/2) × s/√n
where the number of degrees of freedom for t_(α/2) is n − 1 and n is the sample size.
 Example: Suppose an investment analyst takes a random sample of US equity mutual funds and
calculates the average Sharpe ratio. The sample size is 100, and the average Sharpe ratio is 0.45. The
sample has a standard deviation of 0.30. Calculate and interpret the 90 percent confidence interval for
the population mean of all US equity mutual funds using theoretically correct t-statistic.
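The same interval with the theoretically correct t reliability factor. The critical value 1.660 is taken from a t-table for df = 99 (tables may quote it more precisely as 1.6604), so treat the bounds as approximate:

```python
import math

x_bar, s, n = 0.45, 0.30, 100
df = n - 1
t_crit = 1.660  # from a t-table: t at the 5% tail with 99 degrees of freedom

se = s / math.sqrt(n)           # 0.03
lower = x_bar - t_crit * se     # ~0.4002
upper = x_bar + t_crit * se     # ~0.4998
print(round(lower, 4), round(upper, 4))
```

Note the t-based interval is slightly wider than the z-based one, reflecting the extra uncertainty from estimating the population standard deviation.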
4. Confidence Interval – Selection of Reliability Factors and Sample Size
 Selection of Sample Size: Larger sample reduces the sampling error and the standard deviation of the
sample statistic around its true (population) value and confidence intervals are narrower.
 Two limitations of larger sample size:

Larger samples may contain observations from a different population (distribution).

Increasing sample size may involve additional expenses that outweigh the value of additional
precision.
5. Resampling – Bootstrap
 Resampling: Repeatedly draws samples from the original observed data sample for the statistical inference
of population parameters.
 Bootstrap: Repeatedly draw samples from the original sample, and each resample is of the same size as the
original sample. Note that each item drawn is replaced for the next draw ⟹ Bootstrap sampling distribution.
 Standard error of the sample mean using bootstrap:
s_X̄ = √[ (1/(B − 1)) × Σ_(b=1)^B (θ̂_b − θ̄)² ]
 Where:
 B is the number of resamples drawn from the original sample.
 θ̂_b is the mean of resample b.
 θ̄ is the mean across all the resample means.
 Advantages:
 Does not rely on an analytical formula ⟹ use when no analytical formula is available. For example:
Median.
 Improve accuracy compared to using only the data in a single sample.
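A minimal bootstrap sketch, assuming a hypothetical sample of 30 monthly returns (mean 2%, standard deviation 20%). It applies the standard-error formula above and compares the result with the analytical s/√n:

```python
import random
import math

random.seed(7)

def bootstrap_se(sample, n_resamples=1000):
    """Bootstrap standard error of the sample mean: resample with
    replacement, each resample the same size as the original sample."""
    means = []
    for _ in range(n_resamples):
        resample = [random.choice(sample) for _ in sample]
        means.append(sum(resample) / len(resample))
    theta_bar = sum(means) / len(means)
    # s = sqrt( (1/(B-1)) * sum over b of (theta_hat_b - theta_bar)^2 )
    var = sum((m - theta_bar) ** 2 for m in means) / (len(means) - 1)
    return math.sqrt(var)

# Hypothetical sample of 30 monthly returns
data = [random.gauss(0.02, 0.20) for _ in range(30)]

m = sum(data) / len(data)
s = math.sqrt(sum((x - m) ** 2 for x in data) / (len(data) - 1))
se_formula = s / math.sqrt(len(data))   # analytical s / sqrt(n)

se_boot = bootstrap_se(data)
print(round(se_boot, 4), round(se_formula, 4))  # the two estimates are close
```

For the mean, the analytical formula exists, so the bootstrap is only a cross-check here; its real value is for statistics like the median where no such formula is available.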
5. Resampling – Jackknife
 Jackknife: Draw samples from the original sample, each with one of the observations removed. For example: from a sample of 10 observations, create 10 different subsamples of 9 observations each.
 For a sample of size n, jackknife usually requires n repetitions.
 Advantages:
 It can remove bias from statistical estimates.
 Does not require much computational power.
 Low-cost.
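A minimal jackknife sketch for the 10-observation example above (the data values are hypothetical):

```python
def jackknife_means(sample):
    """For a sample of size n, form the n leave-one-out subsamples
    and return the mean of each (n repetitions in total)."""
    n = len(sample)
    return [
        sum(sample[:i] + sample[i + 1:]) / (n - 1) for i in range(n)
    ]

data = [3, 5, 7, 9, 11, 13, 15, 17, 19, 21]  # hypothetical 10 observations
loo_means = jackknife_means(data)
print(len(loo_means))        # 10 subsample means, each from 9 observations
print(sum(loo_means) / 10)   # average of the leave-one-out means: 12.0
```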
6. Data Snooping Bias, Sample Selection Bias, Look-ahead Bias, And Time-period Bias
 Data Snooping Bias: the practice of determining a model by extensive searching through a dataset for
statistically significant patterns. For example: Daily temperatures obviously cannot explain VN30 return,
but you try to build a model using that data.
 Two signs that can warn analysts about the potential existence of data mining:

Evidence that many different variables were tested, most of which are unreported, until significant
ones were found.

The lack of any economic theory that is consistent with the empirical results.
 The best way to avoid data mining is to test a potentially profitable trading rule on a data set different
from the one you used to develop the rule (i.e., use out-of-sample data).
 Sample Selection Bias: Sample selection bias occurs when some data are systematically excluded from the analysis, usually because of a lack of availability. This makes the observed sample nonrandom.

Survivorship bias is the most common form of sample selection bias. Example: Funds or companies
that are no longer in business do not appear in mutual fund databases.

Implicit selection bias: A threshold enabling self-selection. Example: companies in HOSE

Backfill bias: When a stock is added to an index, the stock’s past performance may be backfilled
into the index’s database, even though the stock was not included in the database in the previous
year.
6. Data Snooping Bias, Sample Selection Bias, Look-ahead Bias, And Time-period Bias
 Look-Ahead Bias: A test design is subject to look-ahead bias if it uses information that was not available
on the test date.

For example: if a trade is simulated based on information that was not available at the time of the
trade - such as a quarterly earnings number that was released a month later - it will diminish the
accuracy of the trading strategy's true performance and potentially bias the results in favor of the
desired outcome.
 Time-Period Bias: Time-period bias can result if the time period over which the data is gathered is either
too short or too long. Too short of a time period increases the likelihood of period-specific results. Too long
of a time period increases the chance of a regime change.

 For example: Assume you are researching the returns of an industry that has a business cycle of 5 years; you should use at least a 5-year period of data.
 For example: Assume you are researching the foreign exchange rate of a country from 2000 to 2020. However, that country changed its exchange rate regime from fixed to floating in 2010. You should divide the data into 2 subsets, one from 2000 to 2009 and the other from 2010 to 2020.
READING 6: HYPOTHESIS TESTING
OUTLINE
 1. Hypothesis Testing
 2. Multiple Tests and Interpreting Significance
 3. Tests Concerning A Single Mean
 4. Test Concerning Differences Between Means With Independent Samples
 5. Testing Concerning Tests of Variances – A single variance
 6. Parametric Vs Nonparametric Tests
 7. Tests Concerning Correlation – Parametric Test
 8. Test of Independence Using Contingency Table Data
1. Hypothesis Testing
 In hypothesis testing (kiểm định giả thiết), we test to see whether a sample statistic is likely to come from a
population with the hypothesized value of the population parameter.
 A hypothesis is a statement about the value of a population parameter developed for the purpose of
testing a theory or belief. For example: The mean return for the U.S. equity market is greater than zero.
 Steps in Hypothesis Testing:
 1. Stating the hypotheses.
 2. Identifying the appropriate test statistic.
 3. Specifying the significance level.
 4. Stating the decision rule.
 5. Collecting the data and calculating the test statistic.
 6. Making a decision.
1. Hypothesis Testing – Step 1. Stating the Hypothesis
1. Hypothesis Testing – Step 2. Identify The Appropriate Test Statistic
1. Hypothesis Testing – Step 3. Specifying The Significance Level
 Trade-off between type I and type II error :
 If we decrease the probability of a Type I error by specifying a smaller significance level, we increase
the probability of a Type II error.
 Whether to accept more of one type versus the other depends on the cost of the errors.
 The only way to decrease the probability of both errors at the same time is to increase the sample size, because a larger sample reduces the standard error in the denominator of the test statistic.
1. Hypothesis Testing – Step 4. Stating The Decision Rule
 For a two-sided test at the 0.05
level, the total probability of a
Type I error must sum to 0.05. Thus,
0.05/2 = 0.025 of the probability
should be in each tail of the
distribution of the test statistic
under the null. Consequently, the
two rejection points are z (0.025) =
1.96 and −z (0.025) = −1.96.
 Thus, an α significance level in a
two-sided hypothesis test can be
interpreted in exactly the same
way as a (1 − α) confidence
interval.
1. Hypothesis Testing – Step 4. Stating The Decision Rule
 For this one-sided z-test, the rejection point at the 0.05 level of significance is 1.645. We will reject the
null if the calculated z-statistic is larger than 1.645.
1. Hypothesis Testing – Step 5. Collecting the data and calculating the test statistic
 Collect the data and calculate the test statistic.

In practice, data collection is likely to represent the largest portion of the time spent in hypothesis
testing.

First, we need to ensure that the sampling procedure does not include biases, such as sample
selection or time bias.

Second, we need to cleanse the data, checking inaccuracies and other measurement errors in the
data.
1. Hypothesis Testing – Step 6. Make a decision
 Make a statistical decision: Reject or fail to reject the null hypothesis.
 Make an economic decision: The economic or investment decision takes into consideration not only the
statistical decision but also all pertinent economic issues.
 We may find strong statistical evidence of a difference but only weak economic benefit to acting. For
example: a strategy provides a statistically significant positive mean return, the results are not
economically significant when we account for transaction costs, taxes, and risk.
1. Hypothesis Testing – Step 6. Make a decision
 Analysts and researchers often report the p-value associated with hypothesis tests.
 The p-value is the smallest level of significance at which the null hypothesis can be rejected.
 Reject the null hypothesis if p-value < significance level
 Example: An analyst is testing a two-tailed hypothesis. Using software, she determines that the p-value for the test statistic is 0.03, or 3%. Which of the following statements is correct?
A) Reject the null hypothesis at both the 1% and 5%
levels of significance.
B) Reject the null hypothesis at the 5% level but not at
the 1% level of significance.
C) Fail to reject the null hypothesis at both the 1% and
5% levels of significance.
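The decision rule above reduces to a one-line comparison; a sketch applying it to the example:

```python
def reject_null(p_value, alpha):
    """Reject H0 when the p-value is below the significance level."""
    return p_value < alpha

p = 0.03
print(reject_null(p, 0.05))  # True  -> reject at the 5% level
print(reject_null(p, 0.01))  # False -> fail to reject at the 1% level
# So answer B is correct.
```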
2. Multiple Tests and Interpreting Significance
3. Tests Concerning A Single Mean
4. Test Concerning Differences Between Means With Independent Samples
 Example: Note that these periods are of different
lengths and the samples are independent; that is,
there is no pairing of the days for the two periods.
Test whether there is a difference between the
mean daily returns in Period 1 and in Period 2
using a 5% level of significance.
4. Test Concerning Differences Between Means With Dependent Samples
 Paired comparisons test (dependent samples), with dᵢ = the difference between the i-th pair of observations:
Mean of differences: d̄ = (1/n) Σᵢ₌₁ⁿ dᵢ
Sample variance of differences: s_d² = Σᵢ₌₁ⁿ (dᵢ − d̄)² / (n − 1)
Standard error of the mean difference: s_d̄ = s_d/√n
Test statistic: t = (d̄ − μ_d0)/s_d̄, with df = n − 1
4. Test Concerning Differences Between Means With Dependent Samples
 Example: Suppose we want to compare the returns of the ACE High Yield Index with those of the ACE BBB
Index. We collect data over 1,304 days for both indexes and calculate the means and standard
deviations as shown in Exhibit 18. Using a 5% level of significance, determine whether the mean of the
differences is different from zero.
5. Testing Concerning Tests of Variances – A single variance
5. Testing Concerning Tests of Variances – Equality of Two Variances
5. Testing Concerning Tests of Variances – Equality of Two Variances
 Example: You are investigating whether the population variance of returns on a stock market index
changed after a change in market regulation. The first 418 weeks occurred before the regulation change,
and the second 418 weeks occurred after the regulation change. You gather the data in Exhibit 21 for 418
weeks of returns both before and after the change in regulation. You have specified a 5% level of
significance.

Test whether the variance of returns is different before the regulation change versus after the
regulation change, using a 5% level of significance.

Test whether the variance of returns is greater before the regulation change versus after the
regulation change, using a 5% level of significance.
6. Parametric Vs Nonparametric Tests
 Tests are said to be parametric when they are concerned with parameters, and their validity depends on a definite set of assumptions.
 Nonparametric tests, in contrast, are either not concerned with the value of a specific parameter or make
minimal assumptions about the population from which the sample is drawn.
 Situations where a nonparametric test is
called for are the following:

The distributional assumptions of the
parametric test are not satisfied. For
example: the sample is small and may
come from non-normally distributed
population.

There are outliers (use a test of the median).

When data are ranks.

The hypothesis does not involve the
parameters of the distribution.
7. Tests Concerning Correlation – Parametric Test
7. Tests Concerning Correlation – Spearman Rank Correlation Coefficient
8. Test of Independence Using Contingency Table Data
 A contingency or two-way table shows the number of observations from a sample that have a
combination of two characteristics.
8. Test of Independence Using Contingency Table Data
 This test statistic has (r − 1)(c − 1) degrees of freedom, where r is the number of rows and c is the number of
columns.
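A minimal sketch of the chi-square independence test with a hypothetical 2×3 contingency table (the row/column labels and counts are invented for illustration). Expected counts under independence are row total × column total / grand total:

```python
# Hypothetical 2x3 table: fund style (rows) vs. rating group (columns)
observed = [
    [20, 30, 50],
    [30, 40, 30],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

# chi2 = sum over cells of (observed - expected)^2 / expected
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / grand
        chi2 += (obs - exp) ** 2 / exp

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (r-1)(c-1) = 2
print(round(chi2, 3), df)
# Compare chi2 against the critical chi-square value for df = 2.
```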
READING 7: INTRODUCTION TO LINEAR REGRESSION
OUTLINE
 1. Simple Linear Regression
 2. Estimating The Parameters of A Simple Linear Regression
 3. Assumptions of The Simple Linear Regression Model
 4. Analysis Of Variance
 5. Hypothesis Testing of Linear Regression Coefficients
 6. Prediction Using Simple Linear Regression and Prediction Intervals
 7. Functional Forms For Simple Linear Regression
1. Simple Linear Regression
 The purpose of simple linear regression is to explain the variation in a dependent variable in terms of the
variation in a single independent variable.
 The term “variation” is interpreted as the degree to which a variable differs from its mean value. Don’t confuse variation with variance; they are related but not the same.
Variation of Y = Σᵢ₌₁ⁿ (Yᵢ − Ȳ)²
 Our goal is to understand what explains the variation of Y. The variation of Y is often referred to as the
sum of squares total (SST) or the total sum of squares.
 If we use variation of a variable X to explain variation of Y, then:

Dependent variable (Y): explained variable, predicted variable.

Independent variable (X): explanatory variable, predicting variable.
 Simple linear regression models the relationship between two variables as a straight line.
2. Estimating The Parameters of A Simple Linear Regression
 The goal is to fit a line to the observations on Y and X to minimize the squared deviations from the line.
 The following regression equation describes the relation between two variables, X and Y:
Yᵢ = b₀ + b₁Xᵢ + εᵢ, i = 1, 2, …, n
 In which:
 Yᵢ = the i-th observation of the dependent variable, Y
 Xᵢ = the i-th observation of the independent variable, X
 b₀ = intercept term
 b₁ = slope coefficient (b₀ and b₁ are the regression coefficients)
 εᵢ = error term. The error term represents the portion of the dependent variable that cannot be explained by the independent variable. It is also referred to as the disturbance term or residual term.
2. Estimating The Parameters of A Simple Linear Regression
 In simple linear regression, we choose values for the intercept, b₀, and slope, b₁, that minimize the sum of the squared vertical distances between the observations and the regression line (the Sum of Squared Errors, SSE):
Minimize SSE = Σᵢ₌₁ⁿ (Yᵢ − Ŷᵢ)² = Σᵢ₌₁ⁿ eᵢ²
 Solving the above minimization:
b̂₁ = Σᵢ₌₁ⁿ (Xᵢ − X̄)(Yᵢ − Ȳ) / Σᵢ₌₁ⁿ (Xᵢ − X̄)² = Cov_xy / σ²_x
b̂₀ = Ȳ − b̂₁X̄
 where Ȳ = mean of Y, X̄ = mean of X
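The closed-form solutions can be sketched directly, here on hypothetical CAPEX/ROA data invented for illustration:

```python
# OLS sketch using b1_hat = Cov(X, Y) / Var(X) and b0_hat = Ybar - b1_hat * Xbar.
xs = [2.0, 4.0, 6.0, 8.0, 10.0]     # hypothetical CAPEX (%)
ys = [7.0, 10.5, 12.0, 15.5, 16.0]  # hypothetical ROA (%)

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Numerator: sum of cross-deviations; denominator: sum of squared X-deviations
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)

b1_hat = sxy / sxx              # slope
b0_hat = y_bar - b1_hat * x_bar  # intercept
print(round(b1_hat, 4), round(b0_hat, 4))
```

The fitted line necessarily passes through the point (X̄, Ȳ), which is exactly what the intercept formula enforces.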
2. Estimating The Parameters of A Simple Linear Regression
 Interpreting the Regression Coefficients:

The intercept is an estimate of the dependent variable when the independent variable takes on a
value of zero.

The slope coefficient is interpreted as the change in the dependent variable for a 1-unit change in
the independent variable.
 Example: If we use CAPEX to explain ROA, the estimated slope coefficient was 1.25 and the estimated
intercept term was 4.875. Interpret each coefficient estimate.
3. Assumptions of The Simple Linear Regression Model
 Linear regression is based on a number of assumptions:
 Assumption 1. Linearity: The relationship between the dependent variable, Y, and the independent variable, X, is linear. Even if the independent variable enters nonlinearly, linear regression can be used as long as the regression is linear in the parameters. For example: Yᵢ = b₀ + b₁Xᵢ² + εᵢ
 Assumption 2. Homoskedasticity: The
variance of the regression residuals is the
same for all observations
E(εᵢ²) = σ_ε², i = 1, …, n
 If the residuals are not homoscedastic,
that is, if the variance of residuals
differs across observations, then we
refer to this as heteroskedasticity.
 For example: Use inflation rate (X) to
explain short-term interest rate (Y):
3. Assumptions of The Simple Linear Regression Model
 Assumption 3. Independence: The residuals are uncorrelated across observations.

For example: Use the quarters to explain revenue.
3. Assumptions of The Simple Linear Regression Model
 Assumption 4. Normality: the residuals are normally distributed.
 With normally distributed residuals, we can test a particular hypothesis about a linear regression
model.
 For large sample sizes, we may be able to drop the assumption of normality because of central limit
theorem.
4. Analysis Of Variance
 Sum of squares total can be broken down into two parts: the sum of squares error (SSE) and the sum of
squares regression (SSR):
SST = Σᵢ₌₁ⁿ (Yᵢ − Ȳ)², SSR = Σᵢ₌₁ⁿ (Ŷᵢ − Ȳ)², SSE = Σᵢ₌₁ⁿ (Yᵢ − Ŷᵢ)²
SST = SSR + SSE
 How well does the regression model fit the data? The coefficient of determination, also referred to as the R-squared or R², is the percentage of the variation of the dependent variable that is explained by the independent variable:
R² = Explained variation / Total variation = (Total variation − Unexplained variation) / Total variation = (SST − SSE) / SST = SSR / SST
 For simple linear regression, R² = r², where r is the correlation coefficient between X and Y.
4. Analysis Of Variance – ANOVA Table
4. Analysis Of Variance – ANOVA Table
 Standard error of estimate (s_e) measures the degree of variability of the actual Y-values relative to the estimated Y-values (Ŷ) from a regression equation. It is sometimes called the standard error of the regression or the standard deviation of the residuals (error terms). The smaller the standard error, the better the fit.
s_e = √MSE = √[ Σᵢ₌₁ⁿ (Yᵢ − Ŷᵢ)² / (n − 2) ]
 F-distributed test statistic to compare two variances. In regression analysis, we can use an F-distributed test statistic to test whether the slopes in a regression are equal to zero:
 H₀: b₁ = b₂ = … = bₙ = 0 versus Hₐ: at least one b ≠ 0
F-statistic = MSR/MSE = (SSR/1) / (SSE/(n − 2)), with df numerator = 1 and df denominator = n − 2
 This is a one-tailed test.
4. Analysis Of Variance – ANOVA Table
 Example: Suppose you run a cross-sectional regression for 100 companies, where the dependent
variable is the annual return on stock and the independent variable is the lagged percentage of
institutional ownership (INST). The results of this simple linear regression estimation are shown in Exhibit 23.
Evaluate the model by answering the questions below.
 What is the coefficient of determination for this regression model?
 What is the standard error of the estimate for this regression model?
 At a 5% level of significance, do we reject the null hypothesis of the slope coefficient equal to zero if the
critical F-value is 3.938?
 Based on your answers to the preceding questions, evaluate this simple linear regression model.
5. Hypothesis Testing of Linear Regression Coefficients – Slope Coefficient
 The (1) F-statistic tests whether the slope coefficient equals 0.
 But we may also want to perform other hypothesis tests for the slope coefficient (for example: b₁ = 2). We can use a (2) t-test to test such hypotheses about a regression coefficient:
t = (b̂₁ − b₁) / s_b̂₁, with df = n − 2
 s_b̂₁ = the standard error of the slope coefficient:
s_b̂₁ = s_e / √[ Σᵢ₌₁ⁿ (Xᵢ − X̄)² ]
 For one independent variable with hypothesis b₁ = 0: F-statistic = t-statistic²
 We can also use a (3) t-test for the correlation coefficient between X and Y (H₀: r_X,Y = 0)
 Example: The estimated slope coefficient from the ABC example is 0.64 with a standard error equal to
0.26. Assuming that the sample has 36 observations, determine if the estimated slope coefficient is
significantly different than zero at a 5% level of significance.
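A sketch of the ABC example. The critical value 2.032 is taken from a t-table for df = 34 at the 5% two-tailed level, so treat it as approximate:

```python
b1_hat = 0.64   # estimated slope
s_b1 = 0.26     # standard error of the slope
n = 36

t_stat = (b1_hat - 0.0) / s_b1   # test H0: b1 = 0
t_crit = 2.032                   # from a t-table: two-tailed 5%, df = 34
print(round(t_stat, 3))          # ~2.462
print(abs(t_stat) > t_crit)      # True -> reject H0 at the 5% level
```

Since 2.462 exceeds the critical value, the slope is significantly different from zero.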
5. Hypothesis Testing of Linear Regression Coefficients – Intercept
 We can use a t-distributed test statistic to test such hypotheses about an intercept:
t = (b̂₀ − b₀) / s_b̂₀, with df = n − 2
 s_b̂₀ = the standard error of the intercept:
s_b̂₀ = s_e × √[ 1/n + X̄² / Σᵢ₌₁ⁿ (Xᵢ − X̄)² ]
 Example: If the critical t-values are ±1.96 (at the 5% significance level), is the slope coefficient different from zero?
 If the critical t-values are ±1.96 (at the 5% significance level), is the slope coefficient different from 1.0?
6. Prediction Using Simple Linear Regression and Prediction Intervals
 A forecasted value of the dependent variable:
Ŷ_f = b̂₀ + b̂₁X_f
 For example: In our ROA regression model, if we forecast a company’s CAPEX to be 6%, the forecasted ROA based on our estimated equation is 12.375%:
Ŷ_f = 4.875 + 1.25 × 6 = 12.375
 However, because residuals are not all zero, an interval estimate of the forecast is needed to reflect this uncertainty:
Ŷ_f ± t_(α/2) × s_f, with df = n − 2
Standard error of the forecast: s_f = s_e × √[ 1 + 1/n + (X_f − X̄)² / Σᵢ₌₁ⁿ (Xᵢ − X̄)² ]
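A sketch of the prediction interval around the 12.375% ROA forecast. The slides give only the fitted coefficients, so the regression inputs s_e, n, X̄, and Σ(Xᵢ − X̄)² below are hypothetical values assumed purely for illustration:

```python
import math

b0_hat, b1_hat = 4.875, 1.25
x_f = 6.0                      # forecast CAPEX (%)
y_f = b0_hat + b1_hat * x_f    # point forecast: 12.375

se = 2.0                       # hypothetical standard error of estimate
n = 30                         # hypothetical sample size
x_bar = 5.0                    # hypothetical mean of X
sxx = 120.0                    # hypothetical sum of (Xi - Xbar)^2
t_crit = 2.048                 # from a t-table: two-tailed 5%, df = 28

# s_f = se * sqrt(1 + 1/n + (x_f - x_bar)^2 / sxx)
s_f = se * math.sqrt(1 + 1/n + (x_f - x_bar) ** 2 / sxx)
lower, upper = y_f - t_crit * s_f, y_f + t_crit * s_f
print(round(y_f, 3), round(lower, 3), round(upper, 3))
```

Note the "1 +" term inside the square root: even with a huge sample, the interval never collapses to a point, because a single new observation carries its own residual risk.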
7. Functional Forms For Simple Linear Regression
 Not every set of independent and dependent variables has a linear relation ⟹ Transform the data to
enable their use in linear regression.
 The log-lin model has the dependent variable is logarithmic, but the independent variable is linear:
𝐋𝐧𝐘𝐢 = 𝐛𝟎 + 𝐛𝟏 𝐗 𝐢
 Interpret: The slope coefficient in this model is the relative change in the dependent variable for an
absolute change in the independent variable.
 For example: if we have a model, ln Y = −7 + 2X. Then if X = 2.5, then ln Y = -2, or Y = 0.135
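A quick check of the slide's log-lin arithmetic, recovering Y from its logarithm:

```python
import math

# ln(Y) = -7 + 2X evaluated at X = 2.5
ln_y = -7 + 2 * 2.5    # -2.0
y = math.exp(ln_y)     # e^(-2) ~ 0.1353
print(ln_y, round(y, 3))
```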
 The lin-log model has the dependent variable is linear, but the independent variable is logarithmic :
𝐘𝐢 = 𝐛𝟎 + 𝐛𝟏 𝐥𝐧𝐗𝐢
 The slope coefficient in this regression model provides the absolute change in the dependent
variable for a relative change in the independent variable.
 The log-log model has both independent and dependent variables are logarithmic :
𝐥𝐧𝐘𝐢 = 𝐛𝟎 + 𝐛𝟏 𝐥𝐧𝐗 𝐢
 The slope coefficient is the relative change in the dependent variable for a relative change in the
independent variable.
7. Functional Forms For Simple Linear Regression - Correct Functional Form
 Selecting the correct functional form involves determining the nature of the variables and evaluation of
the goodness of fit measures – 𝑹𝟐, the F-statistic, and 𝒔𝒆 – as well as examining whether there are patterns
in the residuals.
 Example: An analyst is investigating the relationship between the annual growth in consumer spending
(CONS) in a country and the annual growth in the country’s GDP (GGDP). The analyst estimates the
following two models:
 Identify the functional form
used in these models.
 Explain which model has
better goodness-of-fit with the
sample data.