1 Distributions, Moments, and Background
We will start with a few definitions and concepts that we will use throughout the rest of this manual. Also, please note that the level of mathematical sophistication in the first few chapters is a little higher than in the rest of the book. The point is to get comfortable with the mathematical knowledge early on. Practice makes perfect, and math is the indispensable toolbox you'll need to start practicing.
1.1 Introduction
We will start by laying down the foundation for the rest of the book. Treat this section as a primer to jog your memory
of elementary level statistics. Much of the material should have been covered in Exam P, so we won’t go into all the
gory details. Let’s get started!
Statistics allows us to model things in real life which occur with some uncertainty. Actuaries use statistics to
model things like frequency and severity of car accidents or a retiree’s mortality. Thus, it is important to begin with
the building blocks of statistical models and probability.
Definition 1.1. An event is a set of outcomes that occur with some probability.
Remark. We typically use capital letters in the beginning of the alphabet, like A, B, or C, to denote events. We might
write Pr(A) to denote the probability that event A occurs.
Definition 1.2. A random variable is a variable which can take on multiple values. The set of values that it can take is called its sample space (often denoted S), and the probabilities corresponding to each value are defined by its distribution.
Example 1.3. We toss a coin which lands 'heads' with probability p, and 'tails' with probability 1 - p. Let X be a random variable which equals 1 if the coin toss resulted in 'heads' and 0 otherwise. Define A to be the event that our coin toss resulted in 'tails'. Then,

Pr(X = 1) = p
Pr(X = 0) = 1 - p = Pr(A)
The sample space of X is {0, 1}.
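As a quick sanity check, here is a minimal Python sketch (assuming numpy is installed; the value p = 0.6 is an arbitrary illustrative choice, not from the text) that simulates this coin toss and confirms the empirical frequencies behave as the probabilities suggest.

import numpy as np

p = 0.6                                  # illustrative choice of p
rng = np.random.default_rng(seed=42)

# Simulate 100,000 tosses: X = 1 ('heads') with probability p, else 0.
x = (rng.random(100_000) < p).astype(int)

print("empirical Pr(X = 1):", x.mean())              # close to p
print("empirical Pr(A) = Pr(X = 0):", 1 - x.mean())  # close to 1 - p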
Remark. A random variable (RV) is usually denoted by a capital letter at the end of the alphabet, such as X, Y, or Z. Unlike for events, we cannot just write Pr(X), because that would be meaningless. We can only talk about the probability of X being equal to some number, say, x: Pr(X = x). We tend to use the lowercase form of the letter denoting the random variable to represent "realizations" (namely data) for that random variable.
Definition 1.4. A distribution describes the relationship between a set of values and their associated probabilities.
For a discrete distribution, the sample space S is a set of discrete numbers, e.g. a set of integers, whereas for a continuous distribution, S is comprised of continuous interval(s), such as the positive real line.
Definition 1.5. A probability mass function (PMF) f(x) for a discrete random variable X is defined as:

f(x) := Pr(X = x)

for all x in the sample space of X.
A probability density function (PDF) f(x) for a continuous random variable X expresses a relative probability of x, and

Pr(x1 ≤ X ≤ x2) = ∫_{x1}^{x2} f(x) dx

A cumulative distribution function (CDF) F(x) applies to both continuous and discrete random variables and is defined as follows for a random variable X:

F(x) := Pr(X ≤ x)
Remark. We will interchangeably use the term PDF to denote both the probability mass function and the probability
density function.
The most important thing to remember about probabilities: the total probability over the entire sample space is 1. Below are some properties that either reiterate or follow from this fact.
1. If X is a continuous random variable, Pr(X = x) = 0 for any x.
2. If X is a discrete random variable with sample space S = {x1, . . . , xk}, then ∑_{x∈S} f(x) = ∑_{i=1}^{k} f(xi) = 1.
3. If X is a continuous random variable with sample space S, where S is possibly the real line or any subset of it, then ∫_S f(x) dx = 1.
4. The CDF is a non-decreasing function that increases from 0 to 1.
Example 1.6. Let X be a discrete random variable with a sample space of {1, 2, 3, 4}. Then, we can use the PDF to compute the probability that X is between 2 and 4, inclusive, by summing up f(x) for values in this interval.

Pr(2 ≤ X ≤ 4) = ∑_{x=2}^{4} f(x)

Note that if X was, instead, a continuous random variable which takes on values in the interval [1, 4], then we would have computed the same probability as

Pr(2 ≤ X ≤ 4) = ∫_{2}^{4} f(x) dx

Most times, discrete distributions differ from continuous distributions in that we use a summation (because we have a countable number of values in our sample space) as opposed to an integral (because in the continuous case, we have intervals to deal with).
If, for some reason, we had F(x), the CDF, at our disposal, we could compute the same probability (in the discrete case) via

Pr(2 ≤ X ≤ 4) = Pr(X ≤ 4) - Pr(X ≤ 1) = F(4) - F(1)

and (in the continuous case) via

Pr(2 ≤ X ≤ 4) = Pr(X ≤ 4) - Pr(X < 2) = Pr(X ≤ 4) - Pr(X ≤ 2) = F(4) - F(2)

The example illustrates the fact that both the PDF and CDF can be used to compute probabilities. Moreover, notice that because continuous distributions assign 0 probability to specific points, namely Pr(X = 2) = 0, our calculation using the CDF is slightly different from that for the discrete distribution.
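To see the two routes agree numerically, here is a short Python sketch; the PMF on {1, 2, 3, 4} is a made-up illustration, not a distribution from the text.

import numpy as np

values = np.array([1, 2, 3, 4])
pmf    = np.array([0.1, 0.2, 0.3, 0.4])   # illustrative probabilities
cdf    = np.cumsum(pmf)                    # F(1), F(2), F(3), F(4)

# Via the PMF: sum f(x) over x = 2, 3, 4.
p_from_pmf = pmf[(values >= 2) & (values <= 4)].sum()

# Via the CDF: F(4) - F(1), as in the discrete computation above.
p_from_cdf = cdf[3] - cdf[0]

print(p_from_pmf, p_from_cdf)              # both 0.9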
Remark. We will interchangeably use Pr(X = x) and P(X = x) to denote the probability that a random variable X
takes on the value x. Both are frequently used in texts!
Example 1.3 demonstrates one of the simplest distributions: the Bernoulli distribution, which is used to model 0-1 outcomes. The Exam 4 Tables contain a long list of distributions (along with their PDFs and CDFs!) that will come up on the exam. You should print it out and keep it handy as you go through this book. Our problems and examples will require you to work with formulas that are included in the tables, and the sooner you become familiar with where to find what, the faster you will be on the exam.
When dealing with continuous distributions, there is a nice relationship between PDFs and CDFs, given to us by the Fundamental Theorem of Calculus. Let X be a random variable with a continuous distribution. Then,

f(x) = (d/dx) F(x)    (1.1)

or equivalently,

F(x) = ∫_{-∞}^{x} f(t) dt    (1.2)

These relationships are important because knowing them means you can compute the PDF on the fly if you only remember the CDF, or vice versa.
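As an illustration (not from the text), the sketch below checks (1.1) and (1.2) numerically for an exponential distribution with θ = 2, whose CDF F(x) = 1 - e^{-x/θ} we will meet again in the problems; scipy is assumed available.

import numpy as np
from scipy.integrate import quad

theta = 2.0                                    # illustrative parameter
f = lambda x: np.exp(-x / theta) / theta       # exponential PDF
F = lambda x: 1 - np.exp(-x / theta)           # exponential CDF

x = 1.5
# (1.2): integrating the PDF from 0 (the left end of the support) to x.
print(quad(f, 0, x)[0], F(x))                  # both approx 0.5276

# (1.1): a numerical derivative of the CDF recovers the PDF.
h = 1e-6
print((F(x + h) - F(x - h)) / (2 * h), f(x))   # both approx 0.2362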
Let us now look at an even more concrete example which makes use of a distribution which is not in the Exam 4
Tables, yet extremely easy to remember. We’ll define the distribution first.
Definition 1.7. Let X be a discrete uniform random variable over integers in the interval {a, . . . , b} (often denoted as X ∼ Unif({a, . . . , b})). Then,

fX(x) = 1/(b - a + 1),    x ∈ {a, . . . , b}
FX(x) = (x - a + 1)/(b - a + 1),    x ∈ {a, . . . , b}

Let Y be a continuous uniform random variable over the interval (a, b) (often denoted as Y ∼ Unif(a, b)). Then,

fY(y) = 1/(b - a),    y ∈ (a, b)

FY(y) = 0 for y < a;    (y - a)/(b - a) for a ≤ y ≤ b;    1 for y > b
Intuitively, a uniform distribution assigns to each x ∈ S equal probability in the discrete case and equal density in the continuous case. The expression for F(x) can be derived via (1.2). As will often be the case, you simply need to memorize the PDF or the CDF if you remember how to go back and forth between them.
Also note that in Definition 1.7, we used subscripts on the PDFs and CDFs to distinguish between the functions for the two different random variables. This is good to do when there is room for ambiguity. Get into the habit of writing your equations clearly to avoid making careless mistakes.
Example 1.8. Let X ∼ Unif({1, 2, 3, 4}) and let Y ∼ Unif(1, 4). Compute Pr(2 ≤ X ≤ 4) and Pr(2 ≤ Y ≤ 4).

Answer. The formulas we use come straight out of Example 1.6.

Pr(2 ≤ X ≤ 4) = ∑_{x=2}^{4} fX(x) = ∑_{x=2}^{4} (1/4) = 0.75

Pr(2 ≤ Y ≤ 4) = ∫_{2}^{4} fY(y) dy = ∫_{2}^{4} (1/3) dy = 2/3
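The same two probabilities can be checked with scipy (a sketch, assuming scipy is installed); note that scipy's randint uses an exclusive upper bound.

from scipy import stats

# Discrete uniform on {1, 2, 3, 4}: randint(low, high) excludes high.
X = stats.randint(1, 5)
print(sum(X.pmf(x) for x in (2, 3, 4)))     # 0.75

# Continuous uniform on (1, 4): uniform(loc, scale) lives on (loc, loc + scale).
Y = stats.uniform(loc=1, scale=3)
print(Y.cdf(4) - Y.cdf(2))                  # 0.6667 = 2/3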
The uniform distribution is an example of a parametric distribution, because its definition contains the parameters
a and b. All other distributions presented in the Exam 4 Tables are parametric as well.
Sometimes our data does not come from a parametric distribution. In fact, this is often the case when we go out to collect real data, say a sample of size n, denoted {x1, . . . , xn}. Yes, actuaries do actually go out in the real world from time to time! In the absence of any other information, we assign an equal probability to each of the n sample points (specifically 1/n).
Definition 1.9. An empirical model or empirical distribution is a discrete distribution derived from an observed dataset of n data points {x1, . . . , xn}, where each observed value is assigned a probability of 1/n. Formally, the empirical PDF and CDF are given by

fn(x) = #{data points equal to x} / n,    Fn(x) = #{data points ≤ x} / n    (1.3)
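A minimal sketch of (1.3) in Python, using a small hypothetical sample (the eight losses below are invented for illustration):

import numpy as np

data = np.array([3, 7, 7, 12, 15, 15, 15, 40])   # hypothetical sample, n = 8
n = len(data)

def f_n(x):
    """Empirical PDF: fraction of data points equal to x."""
    return np.sum(data == x) / n

def F_n(x):
    """Empirical CDF: fraction of data points <= x."""
    return np.sum(data <= x) / n

print(f_n(15))   # 3/8 = 0.375
print(F_n(12))   # 4/8 = 0.5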
The PDF or CDF gives all that is needed to define a distribution. However, both functions can be hard to visualize
without a graphing calculator, and in some instances, the CDF does not even have a closed form! Oftentimes, we
might want information that captures the important details of the distribution. This motivates our definition of some
summary statistics, like the mean. Let’s begin there and more summary statistics will follow throughout the remainder
of this chapter.
Definition 1.10. The mean or expectation (also known as the expected value) of a distribution (or of a random variable X following the distribution) is given by

E(X) = ∑_{x∈S} x f(x)    (for a discrete distribution)

E(X) = ∫_S x f(x) dx    (for a continuous distribution)
The mean is a way to visualize where the center of the distribution lies, and is one of the most important summary
statistics for a distribution.
In addition to computing the expectation of a random variable, we can also compute the expectation of a function
of a random variable.
Definition 1.11. The expectation of a function of a random variable g(X) is given by

E(g(X)) = ∑_{x∈S} g(x) f(x)    (for a discrete distribution)

E(g(X)) = ∫_S g(x) f(x) dx    (for a continuous distribution)

Note that in this definition, g(X) can contain variables other than X, but it can only contain one random variable, X.
Finally, we will round off our discussion with the topic of percentiles and the median.
Definition 1.12. The 100p-th percentile of a distribution is any value πp such that the CDF satisfies F(πp-) ≤ p ≤ F(πp). The median is defined to be the 50th percentile, π0.5.

The median is another useful measure of centrality. Compared to the mean, it is more robust (less sensitive) to the presence of outliers.

Remark. If the CDF F(x) is continuous at πp, then the 100p-th percentile is simply a value πp that satisfies F(πp) = p. Any continuous distribution will be continuous at πp, regardless of p.
If the CDF F(x) is not continuous at πp, then we may need to use limits as in our definition. The superscript minus sign in F(πp-) signifies the limit of F(x) as x approaches πp from the left.
Note that while a continuous distribution can have only one value for the 100p-th percentile, a discrete distribution may have an interval of values corresponding to the 100p-th percentile.
The following example illustrates all three points above for the case of π0.5, the median, but the idea extends to all percentiles.
Example 1.13. Examine closely the three CDFs in the figure below. To satisfy your curiosity, Distribution 1 is a continuous distribution, namely the standard normal distribution (which we define in Section 1.2). The other two are discrete distributions, which you should recognize from the discontinuous nature of their CDFs. Distribution 2 is a Unif({1, 2}) and Distribution 3 is a Unif({1, 3, 5}).
[Figure: three CDF plots, each showing F(x) against x. (a) Distribution 1, a continuous distribution; median = 0. (b) Distribution 2, a discrete distribution; the set of medians is the interval [1, 2). (c) Distribution 3, a discrete distribution; median = 3.]
The graphical procedure for computing πp is to plot F(x), draw a horizontal line at F(x) = p, and read off the corresponding values of x where the horizontal line intersects the CDF. Using this procedure, we find that the median for Distribution 1 is 0 and that the medians for Distribution 2 are the entire set [1, 2). (Note that the ')' to the right of '2' indicates that the value of 2 is not included in this interval, since F(2) = 1 > 0.5.)
For Distribution 3, there is no point of intersection for the horizontal line, so we need to consider the limit. We see that as x approaches 3 from the left, F(x) is equal to 1/3, which, in the fancier notation of the definition, means F(3-) = 1/3. Also, we see that F(3) = 2/3. These two values sandwich 0.5, so by the definition, our median is 3.
Now, as an exercise, convince yourself that our graphical procedure for computing percentiles is consistent with
the definition.
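Here is one way to carry out that exercise in Python for Distribution 3; this sketch applies Definition 1.12 at the support points only, so it reports the percentile when it sits on a jump (it does not enumerate the flat-interval case of Distribution 2).

import numpy as np

# Distribution 3: Unif({1, 3, 5}), each support point with probability 1/3.
support = np.array([1, 3, 5])
pmf      = np.array([1/3, 1/3, 1/3])
cdf      = np.cumsum(pmf)        # F(1) = 1/3, F(3) = 2/3, F(5) = 1
cdf_left = cdf - pmf             # F(x-): left-hand limit at each support point

def percentile(p):
    """Support points x with F(x-) <= p <= F(x), per Definition 1.12."""
    return support[(cdf_left <= p) & (p <= cdf)]

print(percentile(0.5))           # [3], since F(3-) = 1/3 <= 0.5 <= F(3) = 2/3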
Note that the summary statistics (mean, percentile, and median) we have presented thus far can also be computed
for empirical models using the empirical PDF and CDF.
The Exam 4 Tables contain even fancier information on distributions beyond PDFs, CDFs, and means. These include things like moments and VaR. We will cover those next, so that you have a working understanding of them as we head into Exam 4 territory!
1.2 Moments
Raw moments (sometimes just called moments) and central moments are examples of summary statistics. Means are given in the formula sheet in a general form: raw moments. Let's see a formal definition for raw moments. In each of the definitions presented in this section, assume X to be the random variable of interest.
Definition 1.14. The k-th raw moment (of a random variable X) is defined to be E[X^k], and is sometimes denoted µ′k.
Note that the mean is equal to E[X^1], or simply µ (no prime symbol is used for the first raw moment).
In order to calculate a k-th raw moment we need to integrate if the model is continuous and sum if the model is
discrete. If the model is mixed (both continuous and discrete) we need to integrate over the continuous portions and
sum over the discrete portions.
Applying Definition 1.11, we have the following:

E[X^k] = ∫_{-∞}^{∞} x^k f(x) dx    if X is continuous    (1.4)

or

E[X^k] = ∑_{x∈S} x^k p(x)    if X is discrete    (1.5)
Example 1.15. Suppose X is uniformly distributed on (a, b). Calculate E(X).
Answer. Since X is uniformly distributed on (a, b), we know that f(x) = 1/(b - a). Thus,

E(X) = ∫_a^b x · 1/(b - a) dx
 = 1/(b - a) ∫_a^b x dx
 = [x²/(2(b - a))]_a^b
 = (b² - a²)/(2(b - a))
 = (b + a)/2
Observe that the expected value for the uniform distribution is just the midpoint of the interval.
Definition 1.16. The k-th central moment (of a random variable X) is µk = E[(X - µ)^k]. One example of a central moment is the variance σ² or Var(X) (rarely denoted µ2), given by

Var(X) = σ² = E[(X - µ)²]

The standard deviation is defined by σ = √Var(X).
Intuitively, the variance gives us a measure of how spread out a distribution is. The formula can be interpreted as: on average, how far away does a value x deviate from the mean µ? Unfortunately, the variance is still a little awkward to visualize, because it is not in the same unit of measurement as X. For example, if X is the random variable denoting the number of dollars you win in a bet, then E(X) is easily interpretable as the average number of dollars you would expect to win, but Var(X) actually has units of dollars squared. To make that interpretable, we often use the standard deviation σ = √Var(X) to bring the measurement back to our familiar unit of dollars.
As promised, we now deliver some more summary statistics:
Definition 1.17.

coefficient of variation = σ/µ

skewness = γ1 = E[(X - µ)³]/σ³ = µ3/σ³

kurtosis = γ2 = E[(X - µ)⁴]/σ⁴ = µ4/σ⁴
Before we continue, for completeness, we would like to present the formulas to compute a k-th central moment for both continuous and discrete distributions. These are just extensions of (1.4) and (1.5).

E[(X - µ)^k] = ∫_{-∞}^{∞} (x - µ)^k f(x) dx    if X is continuous    (1.6)

or

E[(X - µ)^k] = ∑_{x∈S} (x - µ)^k p(x)    if X is discrete    (1.7)
Example 1.18. Suppose X is uniformly distributed on (a, b). Calculate the second central moment.
Answer. As in Example 1.15, f(x) = 1/(b - a). Thus,

E[(X - µ)²] = ∫_a^b (x - (b + a)/2)² · 1/(b - a) dx
 = 1/(b - a) ∫_a^b [x² - 2x(b + a)/2 + (b + a)²/4] dx
 = 1/(b - a) [x³/3 - (b + a)x²/2 + (b + a)²x/4]_a^b
 = (b - a)²/12
Next, we show an important relationship between second central and raw moments.

Var(X) = E[(X - µ)²] = µ2 = E[X²] - E[X]²    (1.8)

This relationship is often easier to compute since there is no shifting within the expectation calculation. If this is unclear, do not worry yet, since there will be many examples where we use exactly this relationship.
Example 1.19. Redo Example 1.18 using the formula Var(X) = E(X²) - E(X)².

Answer. Recall from Example 1.15 that E(X) = (a + b)/2. Thus, E(X)² = (a² + 2ab + b²)/4. We now calculate E(X²).

E(X²) = ∫_a^b x² f(x) dx
 = 1/(b - a) ∫_a^b x² dx
 = (a² + ab + b²)/3

Finally,

Var(X) = E(X²) - E(X)²
 = (a² + ab + b²)/3 - (a² + 2ab + b²)/4
 = (b - a)²/12

Observe that the answers calculated here and in Example 1.18 are exactly the same.
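If you want reassurance that (b - a)²/12 is right, here is a quick Monte Carlo sketch (the interval (1, 4) is chosen arbitrarily):

import numpy as np

a, b = 1.0, 4.0                              # illustrative interval
rng = np.random.default_rng(seed=0)
x = rng.uniform(a, b, size=1_000_000)

print(x.var())                               # simulated variance
print((b - a) ** 2 / 12)                     # exact: 0.75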
We now take a moment's detour to talk about the normal distribution, another common distribution for which no formulas appear in the Exam 4 Tables. We will present it here, but apart from knowing the mean and standard deviation, and how to make use of the standard normal distribution table, you shouldn't need to memorize anything.
Definition 1.20. Let X ∼ Normal(µ, σ²). Then,

f(x) = 1/(√(2π) σ) · exp{-(x - µ)²/(2σ²)}

E(X) = µ,    Var(X) = σ²

A standard normal distribution is a Normal(0, 1) distribution. A standard normal random variable is often denoted as Z.
[Figure: (a) PDF of X ∼ Normal(2, 9). (b) PDF of Z, the standard normal distribution.]
Note that a normal distribution is symmetric about its mean. This symmetry implies that the mean and the median are identical for a normal distribution.
Another fact to notice is that the two plots above appear very similar. Ignoring the axes, you might think that they are in fact the same distribution. Had we plotted both on the same set of axes, you would notice that X has a flatter distribution (larger variance means values are more spread out) and that its peak is to the right of the peak for Z. We'll formalize this notion below.
Remark. Let X ∼ Normal(µ, σ²), for some parameters µ and σ². Then, we can standardize X in such a way that we get a standard normal distribution. To do so, we define Z as follows:

Z = (X - µ)/σ    (1.9)

Then, Z ∼ Normal(0, 1).
You'll notice that we did not give the CDF of a normal distribution. In fact, it is impossible to write the CDF in closed form. (Try integrating f(x) to convince yourself this is the case.) However, we can transform every normal distribution into a standard normal distribution using the above. The CDF for a standard normal distribution (denoted Φ(z)) is presented in your Exam 4 Tables, so that is what we use as a substitute for the CDF of X.
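As a sketch of how standardization is used in practice (scipy assumed; µ = 2 and σ = 3 mirror the Normal(2, 9) figure above), evaluating the CDF of X directly and via Φ gives the same number:

from scipy import stats

mu, sigma = 2.0, 3.0          # X ~ Normal(2, 9), as in the figure above
x = 5.0

print(stats.norm(loc=mu, scale=sigma).cdf(x))    # P(X <= 5) directly
print(stats.norm.cdf((x - mu) / sigma))          # Phi(1), via (1.9): same value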
Now, back to moments. Using (1.6) and (1.7) we can now compute any central moment, including the skewness and kurtosis. Skewness is used to measure symmetry and kurtosis is used to measure flatness.
Distributions that are left-skewed have a negative skewness value, while distributions that are right-skewed have a positive skewness value. The vast majority of distributions we will be working with will have positive skewness. This implies that small losses are more likely to occur than large losses; in other words, a low-impact collision is more likely than a catastrophic four-car pile-up. It should be obvious that from an insurer's point of view, it is much better to more often pay lower-valued claims than higher-valued claims. Luckily, reality agrees. This is why the majority of our statistical models will have positive skewness. The following graphs of two PDFs show positive and negative skewness.
[Figure: two PDFs. (a) Positive skewness: the right side has an elongated tail. (b) Negative skewness: the left side has an elongated tail.]
To summarize what we have accomplished thus far, we have found several ways of visualizing a distribution. The mean and median give us a measure of centrality. Variance gives us a measure of spread. Percentiles give us information about various points on the CDF. Skewness and kurtosis give us different ways of visualizing the shape of the distribution.
Let us go one step further and look at generating functions, which are presented for some of the distributions in
your Exam Tables.
Definition 1.21. For a random variable X, the moment generating function (MGF) of X (denoted MX(t)) is equal to E(e^{tX}). Furthermore, for discrete variables, the probability generating function (PGF) (denoted PX(z)) is equal to E(z^X).
Note that if the MGF of a random variable is given, then the PGF can be calculated, and vice versa, via:

MX(t) = E(e^{tX}) = E((e^t)^X) = PX(e^t)    (1.10)

Similarly,

PX(z) = E(z^X) = E((e^{ln z})^X) = E(e^{(ln z)X}) = MX(ln z)    (1.11)
Two useful facts to remember about PGFs and MGFs:

Pr(S = k) = (1/k!) · d^k PS(z)/dz^k |_{z=0}

E(X^k) = d^k MX(t)/dt^k |_{t=0}
There exists a one-to-one correspondence between the MGF or PGF of a random variable X and the CDF of the random variable (although the proof is outside the scope of this book). This is a very useful property because one can often easily deduce the distribution from the form of the MGF (or PGF).
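The derivative facts above are easy to test symbolically. The sketch below (assuming sympy is installed) differentiates the exponential MGF M(t) = (1 - θt)^{-1} and recovers the raw moments E(X^k) = k! θ^k:

import sympy as sp

t, theta = sp.symbols('t theta', positive=True)
M = (1 - theta * t) ** (-1)               # MGF of an exponential(theta)

# k-th raw moment = k-th derivative of the MGF, evaluated at t = 0.
for k in (1, 2, 3):
    print(sp.diff(M, t, k).subs(t, 0))    # theta, 2*theta**2, 6*theta**3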
Now, to end the section on MGFs and PGFs let us illustrate the relationship between them through an example.
Example 1.22. Observe that your formula sheet does not include an MGF for the Poisson distribution. The easy way to get it would be to simply use the PGF and plug in e^t for z. For the sake of illustration, let us work the entire example starting with how to compute the PGF. Observe from the formula sheet that if X ∼ Poisson(λ), then

P(X = x) = e^{-λ} λ^x / x!

Thus to compute the PGF we observe:

PX(z) = E(z^X) = ∑_{x=0}^{∞} z^x e^{-λ} λ^x / x! = e^{-λ} ∑_{x=0}^{∞} (zλ)^x / x!

A useful fact to know is: e^x = ∑_{n=0}^{∞} x^n / n!. Using this in our last step from above we see:

e^{-λ} ∑_{x=0}^{∞} (zλ)^x / x! = e^{-λ} e^{zλ} = e^{λ(z-1)}
Again, we see that our answer matches the formula sheet exactly (so use the formula sheet!). Now, to find the MGF of X we simply use (1.10) and observe the following:

MX(t) = PX(e^t) = e^{λ(e^t - 1)}
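A quick numerical check of this result (a sketch; λ = 1.7 and t = 0.3 are arbitrary illustrative values): the closed form should match E(e^{tX}) computed directly from the Poisson PMF.

import numpy as np
from scipy import stats

lam, t = 1.7, 0.3                          # illustrative values

# Closed-form MGF we just derived.
print(np.exp(lam * (np.exp(t) - 1)))

# E(e^{tX}) from the PMF, truncating the infinite sum at x = 59.
x = np.arange(60)
print(np.sum(np.exp(t * x) * stats.poisson(lam).pmf(x)))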
1.3 Sums of Random Variables
Now, let's get into something that actuaries use all the time: sums of random variables. Let Xi be a random variable used to model losses for the i-th individual (or the i-th risk in a policy). An insurer needs to be able to aggregate the individual risks to model the total risk. For this, we define the random variable Sk = ∑_{i=1}^{k} Xi, the total losses for k individuals.
First, a nice property of expectation is that it is linear. This means that:

E(Sk) = ∑_{i=1}^{k} E(Xi)

If all Xi are independent, then furthermore:

Var(Sk) = ∑_{i=1}^{k} Var(Xi)
Now, we present a related result in the form of MGFs and PGFs.
Theorem 1.23. Let Sk = ∑_{i=1}^{k} Xi where the Xi's are independent. Then MSk(t) = ∏_{j=1}^{k} MXj(t) and PSk(z) = ∏_{j=1}^{k} PXj(z).
Using the tools provided in Theorem 1.23, we can begin to analyze the distribution of various Sk random variables.
Example 1.24. Suppose Xi ∼ Gamma(α, θ), for i ∈ {1, . . . , k}. Compute the MGF for Sk = ∑_{i=1}^{k} Xi. Repeat this exercise for the case where each Xi ∼ Gamma(αi, θ), where the αi parameters may vary for each Xi.
Answer. First, we will compute the MGF for one Xi. Although you should never do this on the test because the process is too time consuming, we show a derivation from first principles to illustrate the theory. Then we will use the MGF property defined in Theorem 1.23 to find the distribution of Sk, the sum of k gamma variables. Pulling the PDF from the Exam Tables, we get

f(x) = x^{α-1} e^{-x/θ} / (Γ(α) θ^α)

E(e^{tX}) = ∫_0^∞ e^{tx} · x^{α-1} e^{-x/θ} / (Γ(α) θ^α) dx
 = 1/(Γ(α) θ^α) ∫_0^∞ e^{-x(1/θ - t)} x^{α-1} dx

Now, we do a substitution of variables. Let

y = x(1/θ - t),    so that    dy/dx = 1/θ - t    and    x = y/(1/θ - t)

Also observe that as x goes from 0 → ∞, y goes from 0 → ∞ (we are assuming t < 1/θ). We then continue the computation from above:

1/(Γ(α) θ^α) ∫_0^∞ e^{-y} (y/(1/θ - t))^{α-1} · dy/(1/θ - t)
 = (1/θ - t)^{-α} / (Γ(α) θ^α) ∫_0^∞ e^{-y} y^{α-1} dy

Now, recall the general formula for the gamma function: Γ(z) = ∫_0^∞ e^{-t} t^{z-1} dt. You should also know that Γ(n) = (n - 1)! when n is a positive integer. The remaining integral is exactly Γ(α), so we see:

E(e^{tX}) = Γ(α) (1/θ - t)^{-α} / (Γ(α) θ^α) = 1/(θ^α ((1 - θt)/θ)^α) = (1/(1 - θt))^α = (1 - θt)^{-α}

If you look through the formulas on your Exam Tables, you will see that the MGF presented is M(t) = (1 - θt)^{-α}, which is exactly what we got. Thus, the MGF for Sk = ∑_{i=1}^{k} Xi is

∏_{i=1}^{k} (1 - θt)^{-α} = (1 - θt)^{-kα}
On the test, you should never derive an MGF if it is provided in the tables! MGFs and PGFs are contained in the Exam Tables. If you don't remember the MGF or PGF, simply refer to the Exam Tables.
Now suppose we wanted to find the distribution of Sk where each Xi ∼ Gamma(αi, θ) and all the Xi's are independent. Thus they all have the same θ but different αi. Deriving such a formula directly is quite difficult. However, we can apply the properties of MGFs for Sk to help us. Referencing Theorem 1.23 we see:

MSk(t) = ∏_{i=1}^{k} MXi(t) = ∏_{i=1}^{k} 1/(1 - θt)^{αi} = 1/(1 - θt)^{∑ αi}

Finally, we see that the MGF of Sk is simply the MGF of a gamma distribution with parameters (∑ αi, θ). Since there is a one-to-one correspondence between MGFs and the distribution functions of random variables, we can now conclude that Sk is distributed as a gamma random variable with parameters (∑ αi, θ).
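A Monte Carlo sketch of this conclusion (assuming numpy and scipy; θ = 2 and the αi below are arbitrary illustrative values): simulated quantiles of Sk should line up with Gamma(∑ αi, θ) quantiles.

import numpy as np
from scipy import stats

theta = 2.0
alphas = [0.5, 1.0, 2.5]                   # illustrative alpha_i values
rng = np.random.default_rng(seed=1)

# Simulate S_k = sum of independent Gamma(alpha_i, theta) draws.
s = sum(rng.gamma(shape=a, scale=theta, size=200_000) for a in alphas)

# Compare simulated quantiles with those of Gamma(sum(alphas), theta).
target = stats.gamma(a=sum(alphas), scale=theta)
for q in (0.25, 0.5, 0.75):
    print(round(np.quantile(s, q), 3), round(target.ppf(q), 3))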
Next, we give one of the most important ideas of modern statistics: the central limit theorem.
Theorem 1.25. Central Limit Theorem (CLT)
Let {X1, . . . , Xk} be a sequence of random variables, and let Sk denote their sum, i.e. Sk = X1 + X2 + · · · + Xk. Under certain nice conditions (which usually can be assumed for the actuarial exam!),

(Sk - E(Sk))/√Var(Sk) ⇝ N(0, 1)    as k → ∞

where ⇝ means convergence in distribution.
Let's spend a little time deciphering the notation in this theorem. On the left hand side, we have a random variable (Sk - E(Sk))/√Var(Sk) which has been standardized (subtract the mean, then divide by the standard deviation). The theorem states that as k increases (as we collect more and more data), this standardized random variable converges to a standard normal distribution. That means we can compute probabilities for the standardized random variable, and in turn, we can compute probabilities for Sk, assuming a large enough sample size k. Note that the theorem does not mention any requirements on how each Xi is distributed (gamma, Poisson, exponential, etc.). Their sums inevitably become something we can model with a normal distribution!
Corollary 1.26. Assume that the Xi are independent and identically distributed (iid), such that E(Xi) = µ and Var(Xi) = σ² for all i. Then the Central Limit Theorem implies that:
1. Sk approximately follows a Normal(kµ, kσ²) distribution.
2. X̄ = Sk/k approximately follows a Normal(µ, σ²/k) distribution.
Now we present an example that makes use of the Central Limit Theorem. This example will also show how we
compute normal distribution probabilities. Since it is the first such example in this book, we will go through it in quite
some detail.
Example 1.27. Suppose we ask 100 actuarial students whether they believe that they will pass Exam 4 on their first sitting. Let S100 denote the number of people who say yes. In fact, we can write S100 = ∑_{i=1}^{100} Xi, where each Xi ∼ Binomial(m = 1, q) independently. If the true q is 0.4 (40% of all actuarial students sitting for Exam 4 for the first time believe they will pass it the first time), then what is the probability that our poll resulted in at least 50 people believing they will pass?
Answer. First, we know that the Xi are iid. This is because we are given that each Xi follows the same distribution as every other Xi, by assumption, and furthermore, each person is assumed to be independent of all others. The mean and variance of Xi can be found directly from the Exam Tables: µ = q = 0.4 and σ² = q(1 - q) = 0.4(0.6) = 0.24.
By Corollary 1.26, X̄ approximately follows a Normal(µ, σ²/k) = Normal(0.4, 0.24/100) distribution. k = 100 is large enough for us to make this approximation, in case you were worried. Thus, we can standardize as in (1.9) to get Z = (X̄ - 0.4)/√(0.24/100) to do our computations.

P(X̄ > 0.5) = P((X̄ - 0.4)/√(0.24/100) > (0.5 - 0.4)/√(0.24/100))
 = P(Z > 2.04)
 = 1 - P(Z < 2.04)
 = 1 - Φ(2.04)

We have now written our desired probability as a function of Φ(z), the CDF of a standard normal random variable. Looking it up from the standard normal distribution table, we find that Φ(2.04) = 0.9793. This implies that P(X̄ > 0.5) = 1 - 0.9793 = 0.0207, a very small probability!
The above example ignored an extra step in dealing with normal approximations to binomial distributions. Because
the binomial distribution is a discrete distribution, modeling it as a normal distribution requires some sort of continuity
adjustment. We’ll overlook this for now.
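For the curious, here is a sketch (scipy assumed) comparing the exact binomial answer for Example 1.27 with the plain CLT approximation and with a continuity-adjusted one:

import math
from scipy import stats

n, q = 100, 0.4
sd = math.sqrt(n * q * (1 - q))                 # sd of S_100

print(stats.binom(n, q).sf(49))                 # exact P(S >= 50), approx 0.027

# CLT approximation from Example 1.27 (no continuity adjustment).
print(1 - stats.norm.cdf((50 - n * q) / sd))    # approx 0.0206

# With a continuity adjustment: P(S >= 50) ~ P(Normal > 49.5).
print(1 - stats.norm.cdf((49.5 - n * q) / sd))  # approx 0.026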
1.4 Problems for Section 1
1. Claim sizes are 90, 110, 300, 600 with probabilities 0.5, 0.2, 0.05, 0.25. Compute the skewness and kurtosis.
2. Losses have an exponential distribution with the 75th percentile equal to 1,000. What is θ?
3. Losses have an inverse Weibull distribution with the 25th percentile equal to 5,000 and 50th percentile equal to 50,000. What is τ?
4. Suppose a company writes claims where losses follow a Pareto distribution with α = 3 and θ = 10,000. Use the Central Limit Theorem to approximate the probability that the sum of 100 claims exceeds $700,000.
5. Suppose a company writes claims where losses follow a gamma distribution with α = 10, θ = 1,000. Use the Central Limit Theorem to approximate the probability that the sum of 100 claims exceeds $1 million.
6. A company wrote 100 contracts with a sample mean of 2,000 and a standard deviation of 500. Next year, it will write 1,500 contracts. What is the probability that payout will exceed 102% of expected losses?
1.5 Solutions for Section 1
1. First we compute the mean µ and variance σ².

µ = 0.5(90) + 0.2(110) + 0.05(300) + 0.25(600) = 232
σ² = 0.5(90 - 232)² + 0.2(110 - 232)² + 0.05(300 - 232)² + 0.25(600 - 232)² = 47,146

The standard deviation σ is then √47,146 = 217.1. Next, we calculate the following central moments:

µ3 = 0.5(90 - 232)³ + 0.2(110 - 232)³ + 0.05(300 - 232)³ + 0.25(600 - 232)³ = 10,679,916
µ4 = 0.5(90 - 232)⁴ + 0.2(110 - 232)⁴ + 0.05(300 - 232)⁴ + 0.25(600 - 232)⁴ = 4,833,584,152

Skewness = µ3/σ³ = 10,679,916/217.1³ = 1.04.
Kurtosis = µ4/σ⁴ = 4,833,584,152/217.1⁴ = 2.17.
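These arithmetic-heavy computations are easy to verify in Python (a sketch using numpy):

import numpy as np

x = np.array([90, 110, 300, 600])
p = np.array([0.5, 0.2, 0.05, 0.25])

mu    = np.sum(x * p)
mu2   = np.sum((x - mu) ** 2 * p)        # variance
mu3   = np.sum((x - mu) ** 3 * p)
mu4   = np.sum((x - mu) ** 4 * p)
sigma = np.sqrt(mu2)

print(mu, mu2)                           # 232.0, 47146.0
print(mu3 / sigma ** 3)                  # skewness, approx 1.04
print(mu4 / sigma ** 4)                  # kurtosis, approx 2.17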
2. We know that the cumulative distribution function for exp(θ) is

F(x) = 1 - e^{-x/θ}

Hence, substituting in the given information, we have

0.75 = F(1000) = 1 - e^{-1000/θ}
0.25 = e^{-1000/θ}
ln 0.25 = -1000/θ
θ = -1000/ln 0.25 ≈ 721.3
3. Recall that for an inverse Weibull(τ, θ) distribution, we have F(x) = e^{-(θ/x)^τ}. Hence, given the two percentiles, we know that

0.25 = e^{-(θ/5000)^τ}  ⇒  ln 0.25 = -(θ/5000)^τ
0.5 = e^{-(θ/50,000)^τ}  ⇒  ln 0.5 = -(θ/50,000)^τ

Dividing the two equations at the right, we get

ln 0.5 / ln 0.25 = (θ/50,000)^τ · (5000/θ)^τ
1/2 = (1/10)^τ
τ = ln 0.5 / ln 0.1
τ ≈ 0.301
4. Recall that for a Pareto(α, θ) distribution, the moments can be calculated using the following formula (on the equation sheet):

E(X^k) = θ^k k! / ((α - 1)(α - 2) · · · (α - k))    for integer k < α.

Applying this, we have that for a single claim X ∼ Pareto(α, θ),

E(X) = 10,000/2 = 5000

and

σ² = Var(X) = E(X²) - [E(X)]² = 10,000²(2)/2 - 5000² = 10,000² - 5000²

By the CLT, we know that S, the sum of 100 claims, follows a N(100(5000), 100σ²) distribution. Hence,

P(S > 700,000) = 1 - Φ((700,000 - 5000(100))/√(100[10,000² - 5000²]))
 = 1 - Φ(2.31)
 = 0.0104
5. Using gamma distribution formulas, each loss has a mean and variance of

µ = 10(1000) = 10,000    and    σ² = 10(1000)² = 10,000,000

For 100 claims, the total loss mean equals 100(10,000) = 1,000,000 and the variance equals 100(10,000,000) = 1,000,000,000. The probability of total claims exceeding $1 million is thus:

1 - Φ((1,000,000 - 1,000,000)/√1,000,000,000) = 1 - Φ(0) = 0.5

Hence, the probability that the sum of 100 claims exceeds $1 million is 0.5.
6. For 1500 contracts, the total sum of contracts has a mean of 1500(2000) = 3,000,000 and a standard deviation of √(1500σ²) = √(1500(500²)) = 19,365, and has an approximately normal distribution.
102% of the mean is 1.02(3,000,000) = 3,060,000, so we need to compute

P(X > 3,060,000) = 1 - Φ((3,060,000 - 3,000,000)/19,365) = 1 - Φ(3.10) = 1 - 0.999 = 0.001