
Quant

Random variable, outcome, and event
Subjective
Interest rate
Empirical (observation)
Types of probability
Objective
PV and FV of cash flows
A priori (logical analysis)
Probability and odds ratios
PV of unequal cash flows
Time value of money
Odds for/against
Non-​annual & continuous compounding
Addition rule
Solving for I/Y, CAGR, PMT, N
Conditional and joint probability
Multiplication rule
Additivity principle
Total probability rule
Nominal (non-​ranked)
Probability Concepts
Qualitative
(categorical)
Ordinal (ranked)
Conditional expected value and variance
Expected value and variance
Tree diagram
Discrete
Continuous
Quantitative
(numerical)
Covariance given a joint probability function
Data types
Cross-​sectional / Time Series / Panel data
Bayes' formula
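A minimal sketch of Bayes' formula combined with the total probability rule; all numbers are hypothetical (a quality-control test flagging defective items):

```python
# Bayes' formula: P(A|B) = P(B|A) * P(A) / P(B)
p_a = 0.01              # prior: P(defective)
p_b_given_a = 0.95      # P(flag | defective)
p_b_given_not_a = 0.05  # P(flag | not defective)

# Total probability rule gives the unconditional P(flag)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Posterior: probability a flagged item is actually defective
p_a_given_b = p_b_given_a * p_a / p_b
```

Note how a low prior keeps the posterior well below the test's 95% sensitivity.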
Structured (numbers) / Unstructured (text, audio, video, images)
Organizing data
Combination
Principles of counting
Frequency distributions
Permutation
Contingency table
Discrete => Probability mass function: p(x) = P(X = x)
Visualization
Random variables
Continuous => Probability density function (pdf): f(x), where P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx (P(X = x) = 0 for any single value)
Sample mean (arithmetic)
Cumulative distribution function (cdf): F(x) = P(X ≤ x)
Median
Discrete and continuous uniform
Mode
Bernoulli trial/random variable
Binomial
Outliers
Binomial random variable (X): number of successes in 'n' independent Bernoulli trials
Probability of 'x' successes in 'n' trials
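A sketch of the binomial pmf p(x) = C(n, x) pˣ (1 − p)ⁿ⁻ˣ; the example numbers (5 trials, p = 0.6) are hypothetical:

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x): probability of x successes in n independent Bernoulli trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Example: probability of exactly 3 successes in 5 trials with p = 0.6
p3 = binomial_pmf(3, 5, 0.6)

# Sanity check: the pmf sums to 1 over all possible outcomes
total = sum(binomial_pmf(x, 5, 0.6) for x in range(6))
```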
Organizing, Visualizing, and Describing Data
Univariate/Multivariate distribution
Other means
Geometric mean
'n' Means
Harmonic mean
Multivariate normal distribution for 'n' variables
Common Probability Distributions
Weighted mean
Normal
'n' Variances
'n(n - 1)/2' Pairwise correlations
Trimmed mean
Suitable for modelling quarterly/yearly returns; NOT suitable for modelling asset prices
Winsorized mean
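The geometric, harmonic, and weighted means above can be sketched with the standard library; the returns, prices, and weights are hypothetical:

```python
from statistics import geometric_mean, harmonic_mean, mean

# Geometric mean of gross returns (1 + r): compound growth rate per period
gross = [1.10, 0.95, 1.20]
geo = geometric_mean(gross) - 1

# Harmonic mean: e.g. average cost per share under dollar-cost averaging
prices = [8.0, 10.0, 12.5]
harm = harmonic_mean(prices)

# Weighted mean: e.g. a two-asset portfolio return (weights sum to 1)
weights, rets = [0.6, 0.4], [0.08, 0.12]
weighted = sum(w * r for w, r in zip(weights, rets))
```

The geometric mean is always at or below the arithmetic mean for the same data.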
Standardizing a random variable
Sample variance and standard deviation
Measures of Central Tendency & Dispersion
Modern portfolio theory (MPT): the
value of investment opportunities can
be meaningfully measured in terms of
mean return and variance of return.
Range
Application of normal distribution
Mean absolute deviation (MAD)
Shortfall risk: The risk that portfolio value
or portfolio return will fall below some
minimum acceptable level over some time
horizon.
Sample variance and standard deviation
Quantiles
Interquartile range (IQR)
Roy's Safety-​first Ratio
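Roy's safety-first ratio, SFRatio = (E(Rp) − RL) / σp, picks the portfolio least likely (under normality) to fall below the threshold RL; the portfolios below are hypothetical:

```python
def sf_ratio(expected_return, threshold, stdev):
    """Roy's safety-first ratio: higher is safer relative to the threshold."""
    return (expected_return - threshold) / stdev

rl = 0.02  # minimum acceptable return (hypothetical)
portfolios = {"A": (0.10, 0.15), "B": (0.08, 0.10)}  # (E(R), sigma)
ratios = {name: sf_ratio(er, rl, sd) for name, (er, sd) in portfolios.items()}
best = max(ratios, key=ratios.get)
```

Here B wins despite its lower expected return, because its much lower volatility gives a larger ratio.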
Location (L) of the 'y'th percentile with 'n' sorted data entries: L = (n + 1)y/100
Quartiles / Quintiles / Deciles / Percentiles
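A sketch of the percentile-location formula L = (n + 1)y/100 with linear interpolation when L is not an integer; the data are hypothetical:

```python
def percentile_location(y, n):
    """Location L of the y-th percentile among n sorted entries: L = (n + 1) * y / 100."""
    return (n + 1) * y / 100

def percentile(data, y):
    """Interpolate linearly between neighboring observations when L is fractional."""
    s = sorted(data)
    loc = percentile_location(y, len(s))
    if loc <= 1:
        return s[0]
    if loc >= len(s):
        return s[-1]
    lo = int(loc)            # 1-based position of the lower neighbor
    frac = loc - lo
    return s[lo - 1] + frac * (s[lo] - s[lo - 1])

# Third quartile of eight observations: L = 9 * 75 / 100 = 6.75
q3 = percentile([1, 2, 3, 4, 5, 6, 7, 8], 75)
```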
Stress testing: A specific type of scenario
analysis that estimates losses in rare and
extremely unfavorable combinations of
events or scenarios.
Box and whisker plot
Scenario analysis: A technique for exploring
the performance and risk of investment
strategies in different structural regimes.
Suitable for modelling asset prices
Target downside deviation (target semi-​
deviation) and coefficient of variation
Mean (μL) of a lognormal random variable = e^(μ + 0.5σ^2)
Lognormal distribution
Variance (σL^2) of a lognormal random variable = e^(2μ + σ^2) × [e^(σ^2) − 1]
Suitable for modelling asset prices
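The lognormal mean and variance formulas above, evaluated for hypothetical parameters of the underlying normal distribution:

```python
from math import exp

# If ln(X) ~ N(mu, sigma^2), then X is lognormal (parameters are hypothetical)
mu, sigma = 0.05, 0.20

mean_L = exp(mu + 0.5 * sigma**2)                        # e^(mu + 0.5*sigma^2)
var_L = exp(2 * mu + sigma**2) * (exp(sigma**2) - 1)     # e^(2mu + sigma^2) * (e^(sigma^2) - 1)
```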
Shape of distributions
Formula
Continuously compounded rate of return
Skewness
Normal
The continuously compounded return to time T is the sum of the one-period continuously compounded returns.
Negatively (left) skewed
Positively (right) skewed
Volatility measures the standard deviation of the continuously compounded returns on the
underlying asset (by convention, it is stated as an annualized measure typically done on the
basis of 250 days in a year - the approximate number of days markets are open for trading).
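A sketch tying the two points above together: continuously compounded returns sum across periods, and their standard deviation is annualized with √250. The price series is hypothetical:

```python
from math import log, sqrt
from statistics import stdev

# Hypothetical daily closing prices
prices = [100.0, 101.5, 100.8, 102.2, 101.9, 103.0]

# Continuously compounded daily returns: r_t = ln(P_t / P_{t-1})
returns = [log(b / a) for a, b in zip(prices, prices[1:])]

daily_vol = stdev(returns)           # sample standard deviation of daily returns
annual_vol = daily_vol * sqrt(250)   # annualize using 250 trading days
```

Summing the period returns recovers ln(P_T / P_0), illustrating the additivity of continuously compounded returns.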
Kurtosis
Correlation
Quantitative
methods
t-​test, Chi square, F-​test
t-distribution has fatter tails than the normal distribution => wider, more conservative downside risk estimates
Chi-​square and F-​test are bounded below by 0
Monte Carlo simulation
Strengths: price complex securities for which no analytic expression is available
Weaknesses:
Provides only statistical estimates, not exact results.
Analytic methods, where available, provide more insights into cause-​effect relationship.
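A minimal Monte Carlo sketch: simulating lognormal terminal prices to estimate a European call value. All parameters are hypothetical, and the result is only a statistical estimate, not an exact answer:

```python
import random
from math import exp, sqrt

random.seed(42)

# Hypothetical inputs: spot, strike, risk-free rate, volatility, horizon, paths
s0, k, r, sigma, t, n = 100.0, 105.0, 0.03, 0.20, 1.0, 100_000

payoffs = []
for _ in range(n):
    z = random.gauss(0.0, 1.0)
    # Terminal price under a lognormal model
    st = s0 * exp((r - 0.5 * sigma**2) * t + sigma * sqrt(t) * z)
    payoffs.append(max(st - k, 0.0))

# Discounted average payoff; carries sampling error of order 1/sqrt(n)
price = exp(-r * t) * sum(payoffs) / n
```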
Convenience
Small scale pilot studies
Non-​probability
Judgmental
Auditing
Simple random
Probability
Systematic: selecting every 'k'th observation
Process of hypothesis testing
Stratified random: population is divided into strata based on classification criteria; simple
random samples are then drawn from each stratum proportionally to the relative size of
each stratum in the population to form a large sample.
Appropriate test statistics
Sampling methods
Level of significance
Cluster: divides a population into clusters representative of the population and then
randomly draws certain clusters to form a sample.
Relatively less accurate but more time-​efficient and cost-​efficient
1-​tailed test
Decision rule
2-​tailed test
Sampling error: difference between the observed value of a statistic and the quantity it is
intended to estimate as a result of sampling.
Sampling and Estimation
Sampling Distribution of a Statistic: the distribution of all the distinct possible values that
the statistic can assume when computed from samples of the same size randomly drawn
from the same population.
Hypothesis Testing
Making a decision
Statistical significance ≠ economic significance
Central limit theorem (CLT) & Distribution of the sample mean
p-​value below significance level
=> null is rejected
CLT: Given a population described by any probability distribution having mean μ and finite
variance σ^2, the sampling distribution of the sample mean computed from random
samples from this population will be approximately normal with mean μ (the population
mean) and variance σ^2/n (the population variance divided by n) when the sample size n is
large (n ≥ 30), regardless of the population's distribution.
Standard error of the sample mean: the standard deviation of the sample mean's distribution, σ/√n (or s/√n when σ is unknown); it measures how much sampling error a sample-mean estimate of the population mean carries.
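A simulation sketch of the CLT and standard error: sample means drawn from a non-normal (uniform) population cluster around the population mean with spread σ/√n. Sample sizes and seeds are arbitrary:

```python
import random
from statistics import mean, stdev

random.seed(0)

# Draw many samples of size n from Uniform(0, 1) and record each sample mean
n, trials = 50, 2000
sample_means = [mean(random.uniform(0, 1) for _ in range(n)) for _ in range(trials)]

observed_se = stdev(sample_means)
# Uniform(0, 1) has sigma^2 = 1/12, so the CLT predicts SE = sqrt(1/12) / sqrt(n)
theoretical_se = (1 / 12) ** 0.5 / n ** 0.5
```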
p-​value
p-value is the smallest level of significance
at which the null can be rejected
=> the smaller the p-value, the stronger
the evidence against the null
Unbiased: one whose expected value (the mean of its sampling distribution) equals the
parameter it is intended to estimate.
Point estimates of population mean
Efficient: an unbiased estimator is efficient if no other unbiased estimator of the same
parameter has a sampling distribution with smaller variance.
False discovery approach: An adjustment in
the p-​values for tests performed multiple times.
Consistent: one for which the probability of estimates getting close to the value of the
population parameter increases as sample size increases.
False discovery rate (FDR): The rate of Type I errors in testing a null hypothesis multiple
times for a given level of significance.
Multiple Tests and Significance Interpretation
z-​statistics
Multiple testing problem: The risk of getting statistically
significant test results when performing a test multiple times.
If you run 100 tests of true null hypotheses at a 5% level of
significance, you expect about 5 false positives on average.
Confidence intervals = Point estimate ± Reliability factor × Standard error
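A sketch of the interval formula above for a small sample, using the t-distribution reliability factor; the data are hypothetical and 2.365 is the standard t-table critical value for df = 7 at 95% confidence:

```python
from statistics import mean, stdev

# 95% CI for the population mean: point estimate +/- reliability factor * standard error
sample = [2.1, 2.5, 1.9, 2.8, 2.3, 2.6, 2.0, 2.4]
n = len(sample)

x_bar = mean(sample)
se = stdev(sample) / n ** 0.5        # standard error of the sample mean
t_crit = 2.365                       # t(0.025, df = 7) from a t-table

ci = (x_bar - t_crit * se, x_bar + t_crit * se)
```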
t-​statistics
Confidence Intervals for the Population Mean and Sample Size Selection
Sample size selection
Risk of sampling from more than one population and additional expenses of different
sample sizes.
Tests for a single mean
With independent samples
Bootstrap: repeatedly draws samples with replacement of the selected elements from the
original observed sample, treating the original sample as a new population (often used to
find standard error or construct confidence intervals of population parameters).
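A bootstrap sketch: resampling with replacement from the observed sample to estimate the standard error of the sample mean, then comparing with the analytic s/√n. The data are hypothetical:

```python
import random
from statistics import mean, stdev

random.seed(1)

# Treat the observed sample as the "population" and resample WITH replacement
sample = [4.2, 5.1, 3.8, 6.0, 4.9, 5.5, 4.4, 5.8, 3.9, 5.2]
b = 5000
boot_means = [mean(random.choices(sample, k=len(sample))) for _ in range(b)]

# Bootstrap standard error: standard deviation of the resampled means
boot_se = stdev(boot_means)

# Analytic standard error for comparison: s / sqrt(n)
analytic_se = stdev(sample) / len(sample) ** 0.5
```

The jackknife differs only in how it resamples: it leaves out one observation at a time, without replacement.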
Tests for differences between means
With dependent samples
Resampling
Single variance
Jackknife: repeatedly draws samples by taking the original observed data sample and
leaving out one observation at a time (without replacement) from the set.
Tests of variances
2 variances
Data snooping: determining a model by extensive searching through a dataset
for statistically significant patterns.
When the data we use do NOT meet distributional assumptions.
When there are outliers.
Usage
Sampling biases
Parametric vs Non-​parametric tests
When the data are given in ranks or use an ordinal scale.
When the hypotheses we are addressing do NOT concern a parameter.
Out-of-sample test
Sample selection bias: Bias introduced by systematically excluding some members of the
population according to a particular attribute—​for example, the bias introduced when data
availability leads to certain observations being excluded from the analysis.
Implicit selection bias: selection bias introduced through the presence of a threshold that filters out members failing to meet it.
Survivorship bias: The exclusion of poorly performing or defunct companies from an index
or database, biasing the index or database toward financially healthy companies.
However, if the assumptions of the parametric test are met, the parametric test is preferred
because it may have a greater ability to reject a false null hypothesis.
Backfill bias: certain surviving hedge funds are added to databases and various hedge fund
indexes only after they are initially successful and start to report their returns.
Pearson correlation
Tests of correlation
Look-​ahead bias: bias caused by using information that was unavailable on the test date.
Spearman rank correlation coefficient
Time-​period bias: statistical conclusion may be sensitive to the starting and ending dates of
the sample.
Tests of independence using contingency table
Intercept, slope coefficient, and error term
Sum of squared errors
Estimating parameters
Measures of variation
Simple linear regression (SLR)
Linearity
Homoskedasticity
Introduction to Linear Regression
Assumptions
Independence
Normality
Sum of squared errors (SSE):
measures variation in observed
values NOT attributable to the
relationship between the dependent
and independent variables
Total sum of squares (SST):
measures variation of observed
values around the mean
Regression sum of squares (SSR):
measures variation in observed values
attributable to the relationship between
the dependent and independent variables
Analysis of variance
(ANOVA)
Coefficient of determination (R-​squared)
Standard error of the estimate
Slope coefficient
Hypothesis testing of linear regression coefficients
Intercept
Prediction using SLR
Log-​lin
Lin-​log
Functional forms of SLR
Log-​log: useful in calculating elasticities because the slope coefficient is the relative change
in the dependent variable for a relative change in the independent variable.
Selecting functional forms: examining the goodness of fit measures (R-​squared, F-​statistic,
and the standard error of the estimate), and whether there are patterns in the residuals.
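A sketch of SLR estimation and the ANOVA decomposition (SST = SSR + SSE, R² = SSR/SST) with ordinary least squares on hypothetical data:

```python
from statistics import mean

# Simple linear regression y = b0 + b1 * x + e, hypothetical data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

x_bar, y_bar = mean(x), mean(y)

# OLS slope = Cov(x, y) / Var(x); intercept passes through (x_bar, y_bar)
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

# ANOVA decomposition of variation in y
fitted = [b0 + b1 * xi for xi in x]
sst = sum((yi - y_bar) ** 2 for yi in y)              # total variation
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained variation
ssr = sst - sse                                        # explained variation
r_squared = ssr / sst
```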