Actuarial Statistics
Combined Materials Pack
for exams in 2022
The Actuarial Education Company
on behalf of the Institute and Faculty of Actuaries
All study material produced by ActEd is copyright and is sold for the exclusive use of the purchaser. The copyright is owned by Institute and Faculty Education Limited, a subsidiary of the Institute and Faculty of Actuaries.
Unless prior authority is granted by ActEd, you may not hire out, lend, give out, sell, store or transmit electronically or photocopy any part of the study material.
You must take care of your study material to ensure that it is not used or copied by anybody else.
Legal action will be taken if these terms are infringed. In addition, we may seek to take disciplinary action through the profession or through your employer.
These conditions remain in force after you have finished using the course.
IFE: 2022 Examinations
The Actuarial Education Company
CS1: Study Guide
Page 1
Subject CS1
2022 Study Guide
Introduction
This Study Guide has been created to help you navigate your way through Subject CS1. It contains all the information you will need before starting to study Subject CS1 for the 2022 exams and you may also find it useful to refer to throughout your studies.
The guide is split into two parts:
Part 1 contains specific information about Subject CS1
Part 2 contains general information about the Core Principles subjects.
Please read this Study Guide carefully before reading the Course Notes, even if you have studied for some actuarial exams before. While you may have already read (the majority of) the Part 2 material in previous subjects, the information in Part 1 is unique to this course.
Contents
Part 1   Section 1   Subject CS1 background and contents        Page 2
         Section 2   Subject CS1 Syllabus and Core Reading      Page 4
         Section 3   Subject CS1 summary of ActEd products      Page 12
         Section 4   Subject CS1 skills and assessment          Page 13
         Section 5   Subject CS1 frequently asked questions     Page 14
Part 2   Section 1   Before you start                           Page 15
         Section 2   Core study material                        Page 16
         Section 3   ActEd study support                        Page 18
         Section 4   Study skills and assessment                Page 25
         Section 5   Queries and feedback                       Page 31
1.1 Subject CS1 background and contents
History
The Actuarial Statistics subjects (Subjects CS1 and CS2) were introduced in the Institute and Faculty of Actuaries' 2019 Curriculum.
Subject CS1 is Actuarial Statistics.
Predecessors
The topics in the Actuarial Statistics subjects cover content previously in Subjects CT3, CT4, CT6 and a small amount from Subject ST9:
Subject CS1 contains material from Subjects CT3 and CT6.
Subject CS2 contains material from Subjects CT4, CT6 and ST9.
Exemptions
In order to be eligible for a pass in Subject CS1, you will need:
to have passed or been granted an exemption from Subject CT3 during the transfer process
to have met the profession's requirements based on the current curriculum.
See the profession's website for further details:
www.actuaries.org.uk/studying/exam-exemptions
Prerequisites / required knowledge
The CS1 course assumes that students have a certain level of statistical knowledge before they start. More detail on this is given in the CS1 Syllabus (see pages 4-11 in this document).
If you feel that you do not have this level of background, you may want to consider ordering the ActEd course Pure Maths and Statistics for Actuarial Studies. More information on prerequisites is given later (see page 5 of this document). Alternatively, a good A-level statistics textbook would help to fill any gaps.
An extra chapter covering the assumed statistical knowledge for Subject CS1 is available on the ActEd website. A link is given below:
www.ActEd.co.uk/help_and_advice_CS1_assumed_knowledge.html
Links to other subjects
Subject CS2 (Risk Modelling and Survival Analysis) builds directly on the material in this subject.
Subjects CM1 and CM2 (Actuarial Mathematics, and Financial Engineering and Loss Reserving) apply the material in this subject to actuarial and financial modelling.
Contents
There are four parts to the Subject CS1 course. The parts cover related topics and are broken down into chapters. At the end of each part there are assignments testing the material from that part.
The following table shows how the parts and chapters relate to each other. The final column shows how the chapters relate to the days of the regular tutorials. This table should help you plan your progress across the study session.
Chapter   Title                                  No of pages
1         Data analysis                          23
2         Probability distributions              63
3         Generating functions                   30
4         Joint distributions                    59
5         Conditional expectation                20
6         Central Limit Theorem                  27
7         Sampling and statistical inference     33
8         Point estimation                       63
9         Confidence intervals                   50
10        Hypothesis testing                     89
11        Correlation                            41
12        Linear regression                      77
13        Generalised linear models              73
14        Bayesian statistics                    44
15        Credibility theory                     34
16        Empirical Bayes credibility theory     54
The other columns of the table are Part (1 to 4), X Asst (X1 to X4), Y Asst (Y1 and Y2) and Tutorial, 4 days (days 1 to 4).
1.2 Subject CS1 Syllabus and Core Reading
Syllabus
The Syllabus for Subject CS1 is given here. To the right of each objective are the chapter numbers in which the objective is covered in the ActEd course.
Aim
The aim of the Actuarial Statistics 1 subject is to provide a grounding in mathematical and
statistical techniques that are of particular relevance to actuarial work.
Competences
On successful completion of this subject, a student will be able to:
1. describe the essential features of statistical distributions
2. summarise data using appropriate statistical analysis, descriptive statistics and graphical presentation
3. describe and apply the principles of statistical inference
4. describe, apply and interpret the results of the linear regression model and generalised linear models
5. explain the fundamental concepts of Bayesian statistics and use them to compute Bayesian estimators.
Syllabus topics
1. Random variables and distributions (20%)
2. Data analysis (15%)
3. Statistical inference (20%)
4. Regression theory and applications (30%)
5. Bayesian statistics (15%)
The weightings are indicative of the approximate balance of the assessment of this subject between the main syllabus topics, averaged over a number of examination sessions.
The weightings also have a correspondence with the amount of learning material underlying each syllabus topic. However, this will also reflect aspects such as:
the relative complexity of each topic, and hence the amount of explanation and support required for it
the need to provide thorough foundation understanding on which to build the other objectives
the extent of prior knowledge which is expected
the degree to which each topic area is more knowledge or application based.
Assumed knowledge
This subject assumes that a student will be competent in the following elements of foundational mathematics and basic statistics:
1 Summarise the main features of a data set (exploratory data analysis)
1.1 Summarise a set of data using a table or frequency distribution, and display it graphically using a line plot, a box plot, a bar chart, histogram, stem and leaf plot, or other appropriate elementary device.
1.2 Describe the level/location of a set of data using the mean, median, mode, as appropriate.
1.3 Describe the spread/variability of a set of data using the standard deviation, range and interquartile range, as appropriate.
1.4 Explain what is meant by symmetry and skewness for the distribution of a set of data.
2 Probability
2.1 Set functions and sample spaces for an experiment and an event.
2.2 Probability as a set function on a collection of events and its basic properties.
2.3 Calculate probabilities of events in simple situations.
2.4 Derive and use the addition rule for the probability of the union of two events.
2.5 Define and calculate the conditional probability of one event given the occurrence of another event.
2.6 Derive and use Bayes' Theorem for events.
2.7 Define independence for two events, and calculate probabilities in situations involving independence.
3 Random variables
3.1 Explain what is meant by a discrete random variable, define the distribution function and the probability function of such a variable, and use these functions to calculate probabilities.
3.2 Explain what is meant by a continuous random variable, define the distribution function and the probability density function of such a variable, and use these functions to calculate probabilities.
3.3 Define the expected value of a function of a random variable, the mean, the variance, the standard deviation, the coefficient of skewness and the moments of a random variable, and calculate such quantities.
3.4 Evaluate probabilities associated with distributions (by calculation or by referring to tables as appropriate).
3.5 Derive the distribution of a function of a random variable from the distribution of the random variable.
Detailed syllabus objectives
1 Random variables and distributions (20%)
1.1 Define basic univariate distributions and use them to calculate probabilities, quantiles and moments. (Chapter 2)
1.1.1 Define and explain the key characteristics of the discrete distributions: geometric, binomial, negative binomial, hypergeometric, Poisson and uniform on a finite set.
1.1.2 Define and explain the key characteristics of the continuous distributions: normal, lognormal, exponential, gamma, chi-square, t, F, beta and uniform on an interval.
1.1.3 Evaluate probabilities and quantiles associated with distributions (by calculation or using statistical software as appropriate).
1.1.4 Define and explain the key characteristics of the Poisson process and explain the connection between the Poisson process and the Poisson distribution.
1.1.5 Generate basic discrete and continuous random variables using the inverse transform method.
1.1.6 Generate discrete and continuous random variables using statistical software.
1.2 Independence, joint and conditional distributions, linear combinations of random variables (Chapter 4)
1.2.1 Explain what is meant by jointly distributed random variables, marginal distributions and conditional distributions.
1.2.2 Define the probability function/density function of a marginal distribution and of a conditional distribution.
1.2.3 Specify the conditions under which random variables are independent.
1.2.4 Define the expected value of a function of two jointly distributed random variables, the covariance and correlation coefficient between two variables, and calculate such quantities.
1.2.5 Define the probability function/density function of the sum of two independent random variables as the convolution of two functions.
1.2.6 Derive the mean and variance of linear combinations of random variables.
1.2.7 Use generating functions to establish the distribution of linear combinations of independent random variables.
1.3 Expectations, conditional expectations (Chapter 5)
1.3.1 Define the conditional expectation of one random variable given the value of another random variable, and calculate such a quantity.
1.3.2 Show how the mean and variance of a random variable can be obtained from expected values of conditional expected values, and apply this.
1.4 Generating functions (Chapter 3)
1.4.1 Define and determine the moment generating function of random variables.
1.4.2 Define and determine the cumulant generating function of random variables.
1.4.3 Use generating functions to determine the moments and cumulants of random variables, by expansion as a series or by differentiation, as appropriate.
1.4.4 Identify the applications for which a moment generating function, a cumulant generating function and cumulants are used, and the reasons why they are used.
1.5 Central Limit Theorem: statement and application (Chapter 6)
1.5.1 State the Central Limit Theorem for a sequence of independent, identically distributed random variables.
1.5.2 Generate simulated samples from a given distribution and compare the sampling distribution with the normal.
2 Data analysis (10%)
2.1 Data analysis (Chapter 1)
2.1.1 Describe the possible aims of data analysis (eg descriptive, inferential, and predictive).
2.1.2 Describe the stages of conducting a data analysis to solve real-world problems in a scientific manner and describe tools suitable for each stage.
2.1.3 Describe sources of data and explain the characteristics of different data sources, including extremely large data sets.
2.1.4 Explain the meaning and value of reproducible research and describe the elements required to ensure a data analysis is reproducible.
2.2 Exploratory data analysis (Chapter 11)
2.2.1 Describe the purpose of exploratory data analysis.
2.2.2 Use appropriate tools to calculate suitable summary statistics and undertake exploratory data visualizations.
2.2.3 Define and calculate Pearson's, Spearman's and Kendall's measures of correlation for bivariate data, explain their interpretation and perform statistical inference as appropriate.
2.2.4 Use principal components analysis to reduce the dimensionality of a complex data set.
2.3 Random sampling and sampling distributions (Chapter 7)
2.3.1 Explain what is meant by a sample, a population and statistical inference.
2.3.2 Define a random sample from a distribution of a random variable.
2.3.3 Explain what is meant by a statistic and its sampling distribution.
2.3.4 Determine the mean and variance of a sample mean and the mean of a sample variance in terms of the population mean, variance and sample size.
2.3.5 State and use the basic sampling distributions for the sample mean and the sample variance for random samples from a normal distribution.
2.3.6 State and use the distribution of the t-statistic for random samples from a normal distribution.
2.3.7 State and use the F distribution for the ratio of two sample variances from independent samples taken from normal distributions.
3 Statistical inference (25%)
3.1 Estimation and estimators (Chapter 8)
3.1.1 Describe and apply the method of moments for constructing estimators of population parameters.
3.1.2 Describe and apply the method of maximum likelihood for constructing estimators of population parameters.
3.1.3 Define the following terms: efficiency, bias, consistency and mean square error.
3.1.4 Define and apply the property of unbiasedness of an estimator.
3.1.5 Define the mean square error of an estimator, and use it to compare estimators.
3.1.6 Describe and apply the asymptotic distribution of maximum likelihood estimators.
3.1.7 Use the bootstrap method to estimate properties of an estimator.
3.2 Confidence intervals and prediction intervals (Chapter 9)
3.2.1 Define in general terms a confidence interval for an unknown parameter of a distribution based on a random sample.
3.2.2 Define in general terms a prediction interval for a future observation based on a random sample.
3.2.3 Derive a confidence interval for an unknown parameter using a given sampling distribution.
3.2.4 Calculate confidence intervals for the mean and the variance of a normal distribution.
3.2.5 Calculate confidence intervals for a binomial probability and a Poisson mean, including the use of the normal approximation in both cases.
3.2.6 Calculate confidence intervals for two-sample situations involving the normal distribution, and the binomial and Poisson distributions using the normal approximation.
3.2.7 Calculate confidence intervals for a difference between two means from paired data.
3.2.8 Use the bootstrap method to obtain confidence intervals.
3.3 Hypothesis testing and goodness of fit (Chapter 10)
3.3.1 Explain what is meant by the following terms: null and alternative hypotheses, simple and composite hypotheses, type I and type II errors, sensitivity, specificity, test statistic, likelihood ratio, critical region, level of significance, probability value and power of a test.
3.3.2 Apply basic tests for the one-sample and two-sample situations involving the normal, binomial and Poisson distributions, and apply basic tests for paired data.
3.3.3 Apply the permutation approach to non-parametric hypothesis tests.
3.3.4 Use a chi-square test to test the hypothesis that a random sample is from a particular distribution, including cases where parameters are unknown.
3.3.5 Explain what is meant by a contingency (or two-way) table, and use a chi-square test to test the independence of two classification criteria.
4 Regression theory and applications (30%)
4.1 Linear regression (Chapter 12)
4.1.1 Explain what is meant by response and explanatory variables.
4.1.2 State the simple regression model (with a single explanatory variable).
4.1.3 Derive the least squares estimates of the slope and intercept parameters in a simple linear regression model.
4.1.4 Use appropriate software to fit a simple linear regression model to a data set and interpret the output.
Perform statistical inference on the slope parameter.
Describe the use of measures of goodness of fit of a linear regression model.
Use a fitted linear relationship to predict a mean response or an individual response with confidence limits.
Use residuals to check the suitability and validity of a linear regression model.
4.1.5 State the multiple linear regression model (with several explanatory variables).
4.1.6 Use appropriate software to fit a multiple linear regression model to a data set and interpret the output.
4.1.7 Use measures of model fit to select an appropriate set of explanatory variables.
4.2 Generalised linear models (Chapter 13)
4.2.1 Define an exponential family of distributions. Show that the following distributions may be written in this form: binomial, Poisson, exponential, gamma and normal.
4.2.2 State the mean and variance for an exponential family, and define the variance function and the scale parameter. Derive these quantities for the distributions above.
4.2.3 Explain what is meant by the link function and the canonical link function, referring to the distributions above.
4.2.4 Explain what is meant by a variable, a factor taking categorical values and an interaction term. Define the linear predictor, illustrating its form for simple models, including polynomial models and models involving factors.
4.2.5 Define the deviance and scaled deviance and state how the parameters of a generalised linear model may be estimated. Describe how a suitable model may be chosen by using an analysis of deviance and by examining the significance of the parameters.
4.2.6 Define the Pearson and deviance residuals and describe how they may be used.
4.2.7 Apply statistical tests to determine the acceptability of a fitted model: Pearson's chi-square test and the likelihood ratio test.
4.2.8 Fit a generalised linear model to a data set and interpret the output.
5 Bayesian statistics (15%)
5.1 Explain the fundamental concepts of Bayesian statistics and use these concepts to calculate Bayesian estimators. (Chapters 14, 15 and 16)
5.1.1 Use Bayes' theorem to calculate simple conditional probabilities.
5.1.2 Explain what is meant by a prior distribution, a posterior distribution and a conjugate prior distribution.
5.1.3 Derive the posterior distribution for a parameter in simple cases.
5.1.4 Explain what is meant by a loss function.
5.1.5 Use simple loss functions to derive Bayesian estimates of parameters.
5.1.6 Derive credible intervals in simple cases.
5.1.7 Explain what is meant by the credibility premium formula and describe the role played by the credibility factor.
5.1.8 Explain the Bayesian approach to credibility theory and use it to derive credibility premiums in simple cases.
5.1.9 Explain the empirical Bayes approach to credibility theory and use it to derive credibility premiums in simple cases.
5.1.10 Explain the differences between the two approaches and state the assumptions underlying each of them.
Core Reading
The Subject CS1 Course Notes include the Core Reading in full, integrated throughout the course.
Further reading
The exam will be based on the relevant Syllabus and Core Reading and the ActEd course material will be the main source of tuition for students.
1.3 Subject CS1 summary of ActEd products
The following products are available for Subject CS1:
Course Notes
Paper B Online Resources (PBOR), including the Y Assignments
X Assignments: four assignments:
X1, X2: 80-mark tests (you are allowed 2¾ hours to complete these)
X3, X4: 100-mark tests (you are allowed 3¼ hours to complete these)
Y Assignments: two assignments:
Y1, Y2: 100-mark tests (you are allowed 1¾ hours to complete these)
Series X Marking
Series Y Marking
Online Classroom: over 150 tutorial units
Flashcards
Revision Notes: seven A5 booklets
ASET (2014-17 papers): four years of exam papers, ie eight sittings, covering the period April 2014 to September 2017
ASET (2019-21 papers): three years of exam papers, covering the period April 2019 to September 2021
Mini ASET: covering the April 2022 exam paper
Mock Exam: one 100-mark test for the Paper A examination and a separate 100-mark test for the practical Paper B exam
Additional Mock Pack (AMP): two additional 100-mark Paper A tests and two additional 100-mark Paper B tests
Mock Exam Marking
Marking Vouchers.
Products are generally available in both paper and eBook format. Visit www.ActEd.co.uk for full details about available eBooks, software requirements and restrictions.
The following tutorials are typically available for Subject CS1:
Regular Tutorials (four days)
Block Tutorials (four days)
a Preparation Day for the practical exam.
Full details are set out in our Tuition Bulletin, which is available on our website at www.ActEd.co.uk.
1.4 Subject CS1 skills and assessment
Technical skills
Subjects CS1 and CS2 are very mathematical and have relatively few questions requiring wordy answers.
Exam skills
Exam question skill levels
In the CS subjects, the approximate split of assessment across the three skill types is:
Knowledge: 20%
Application: 65%
Higher Order skills: 15%.
Assessment
Assessment consists of a combination of a 3¼-hour examination and a 1¾-hour practical data analysis and statistical modelling examination.
1.5 Subject CS1 frequently asked questions
Q: What knowledge of earlier subjects should I have?
A: No knowledge of earlier subjects is required.
Q: What level of mathematics is required?
A: The level of maths you need for this course is broadly A-level standard. However, there may be some symbols (eg the gamma function) that are not usually included on A-level syllabuses. You will find the course (and the exam) much easier if you feel comfortable with the mathematical techniques (eg integration by parts) used in the course and you feel confident in applying them yourself.
If your maths or statistics is a little rusty you may wish to consider purchasing additional material to help you get up to speed. The course Pure Maths and Statistics for Actuarial Studies is available from ActEd and it covers the mathematical techniques that are required for the Core Principles subjects, some of which are beyond A-Level (or Higher) standard. You do not need to work through the whole course in order; you can just refer to it when you need help on a particular topic. An initial assessment to test your mathematical skills and further details regarding the course can be found on our website.
You may also find this Assumed Knowledge chapter useful:
www.ActEd.co.uk/help_and_advice_CS1_assumed_knowledge.html
Q: What should I do if I discover an error in the course?
A: If you find an error in the course, please check our website at:
www.ActEd.co.uk/paper_corrections.html
to see if the correction has already been dealt with. Otherwise please send details via email to CS1@bpp.com.
Q: Who should I send feedback to?
A: We are always happy to receive feedback from students, particularly details concerning any errors, contradictions or unclear statements in the courses.
If you have any comments on this course in general, please email to CS1@bpp.com.
If you have any comments or concerns about the Syllabus or Core Reading, these can be passed on to the profession via ActEd. Alternatively, you can send them directly to the Institute and Faculty of Actuaries' Examination Team by email to education.services@actuaries.org.uk.
2.1 Before you start
When studying for the Institute and Faculty of Actuaries' exams, you will need:
a copy of the Formulae and Tables for Examinations of the Faculty of Actuaries and the Institute of Actuaries, 2nd Edition (2002); these are referred to simply as the Tables
a scientific calculator or Excel.
The Tables are available from the Institute and Faculty of Actuaries' eShop. Please visit www.actuaries.org.uk.
2.2 Core study material
This section explains the role of the Syllabus, Core Reading and supplementary ActEd text. It also gives guidance on how to use these materials most effectively in order to pass the exam.
Some of the information below is also contained in the introduction to the Core Reading produced by the Institute and Faculty of Actuaries.
Syllabus
The Syllabus for Subject CS1 has been produced by the Institute and Faculty of Actuaries. The relevant individual syllabus objectives are included at the start of each course chapter and a complete copy of the Syllabus is included in Section 1.2 of this Study Guide. We recommend that you use the Syllabus as an important part of your study.
Core Reading
The Core Reading has been produced by the Institute and Faculty of Actuaries. The purpose of the Core Reading is to assist in ensuring that tutors, students and examiners have clear shared appreciation of the requirements of the Syllabus for the qualification examinations for Fellowship of the Institute and Faculty of Actuaries.
The Core Reading supports coverage of the Syllabus in helping to ensure that both depth and breadth are re-enforced. It is therefore important that students have a good understanding of the concepts covered by the Core Reading.
The examinations require students to demonstrate their understanding of the concepts given in the Syllabus and described in the Core Reading; this will be based on the legislation, professional guidance, etc that are in force when the Core Reading is published, ie on 31 May in the year preceding the examinations.
Therefore the exams in April and September 2022 will be based on the Syllabus and Core Reading as at 31 May 2021. We recommend that you always use the up-to-date Core Reading to prepare for the exams.
Examiners will have this Core Reading when setting the papers. In preparing for examinations, students are advised to work through past examination questions and will find additional tuition helpful. The Core Reading will be updated each year to reflect changes in the Syllabus, to reflect current practice, and in the interest of clarity.
Accreditation
The Institute and Faculty of Actuaries would like to thank the numerous people who have helped in the development of the material contained in this Core Reading.
ActEd text
Core Reading deals with each syllabus objective and covers what is needed to pass the exam. However, the tuition material that has been written by ActEd enhances it by giving examples and further explanation of key points. Here is an excerpt from some ActEd Course Notes to show you how to identify Core Reading and the ActEd material. Core Reading is shown in this bold font.
In the example given above, the index will fall if the actual share price goes below the theoretical ex-rights share price. Again, this is consistent with what would happen to an underlying portfolio. [This is ActEd text]
After allowing for chain-linking, the formula for the investment index then becomes: [This is Core Reading]
$$I(t) = \frac{\sum_i N_{i,t} P_{i,t}}{B(t)}$$
where $N_{i,t}$ is the number of shares issued for the ith constituent at time t;
$B(t)$ is the base value, or divisor, at time t.
Here is an excerpt from some ActEd Course Notes to show you how to identify Core Reading for R code.
The R code to draw a scatterplot for a bivariate data frame, <data>, is:
plot(<data>)
Further explanation on the use of R will not be provided in the Course Notes, but instead be picked up in the Paper B Online Resources (PBOR). We recommend that you refer to and use PBOR at the end of each chapter, or couple of chapters, that contains a significant number of R references.
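As a small illustration of the plot(<data>) call quoted in the excerpt, the sketch below builds a two-column data frame and passes it to plot(). The data frame name claims, its column names and its values are invented purely for illustration and do not come from the Course Notes or PBOR.

```r
# Hypothetical bivariate data frame: names and values are made up for illustration
claims <- data.frame(
  age          = c(23, 31, 38, 45, 52, 60),
  claim_amount = c(120, 150, 170, 210, 260, 300)
)

# For a two-column numeric data frame, plot() draws a scatterplot of the
# second column (vertical axis) against the first column (horizontal axis)
plot(claims)
```

If the data frame had more than two columns, the same call would instead produce a matrix of pairwise scatterplots.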
Copyright
All study material produced by ActEd is copyright and is sold for the exclusive use of the purchaser. The copyright is owned by Institute and Faculty Education Limited, a subsidiary of the Institute and Faculty of Actuaries. Unless prior authority is granted by ActEd, you may not hire out, lend, give out, sell, store or transmit electronically or photocopy any part of the study material. You must take care of your study material to ensure that it is not used or copied by anybody else.
Legal action will be taken if these terms are infringed. In addition, we may seek to take disciplinary action through the Institute and Faculty of Actuaries or through your employer.
These conditions remain in force after you have finished using the course.
2.3 ActEd study support
This section gives a description of the products offered by ActEd.
Successful students tend to undertake three main study activities:
1. Learning: initial study and understanding of subject material
2. Revision: learning subject material and preparing to tackle exam-style questions
3. Rehearsal: answering exam-style questions, culminating in answering questions at exam speed.
Different approaches suit different people. For example, you may like to revise material gradually over the months running up to the exams or you may do your revision in a shorter period just before the exams. Also, these three activities will almost certainly overlap.
We offer a flexible range of products to suit you and let you control your own learning and exam preparation. The following table shows the products that we produce. Not all products are available for all subjects.
LEARNING                          LEARNING & REVISION             REVISION         REVISION & REHEARSAL   REHEARSAL
Course Notes                      Assignments                     Flashcards       Revision Notes         Mock Exam
Paper B Online Resources (PBOR)   Combined Materials Pack (CMP)   Sound Revision   ASET                   Additional Mock Pack (AMP)
                                  Assignment Marking                                                      Mock Marking
                                  Tutorials
                                  Online Classroom
The products and services are described in more detail below.
Learning products
Course Notes
The Course Notes will help you develop the basic knowledge and understanding of principles
needed to pass the exam. They incorporate the complete Core Reading and include full
explanation of all the syllabus objectives, with worked examples and questions (including some
past exam questions) to test your understanding.
Each chapter includes:
the relevant syllabus objectives
a chapter summary
a page of important formulae or definitions (where appropriate)
practice questions with full solutions.
Paper B Online Resources (PBOR)
The Paper B Online Resources (PBOR) will help you prepare for the practical paper. Delivered
through a virtual learning environment (VLE), you will have access to worked examples and
practice questions. PBOR will also include the Y Assignments, which are two exam-style
assessments.
Learning & revision products
X Assignments
The Series X Assignments are assessments that cover the material in each part of the course in
turn. They can be used to develop and test your understanding of the material.
Y Assignments
The Series Y Assignments are exam-style assessments that cover material across the whole
course.
Combined Materials Pack (CMP)
The Combined Materials Pack (CMP) comprises the Course Notes, PBOR and the Series X
Assignments.
CMP Upgrade
The purpose of the CMP Upgrade is to enable you to amend last year's study material to make it
suitable for study for this year.
Wherever possible, it lists the changes to the syllabus objectives, Core Reading, the Course Notes
and the X / Y Assignments since last year that might realistically affect your chance of success in
the exam. It is produced so that you can manually amend your notes. The upgrade includes
replacement pages and additional pages where appropriate.
However, if a large number of changes have been made to the Course Notes and X / Y
Assignments, it is not practical to produce a full upgrade, and the upgrade will only outline the
most significant changes. In this case, we recommend that you purchase a replacement CMP
(printed copy or eBook) or Course Notes at a significantly reduced price.
The CMP Upgrade can be downloaded free of charge on our website at www.ActEd.co.uk.
A separate upgrade for eBooks is not produced but a significant discount is available for retakers
wishing to re-purchase the latest eBook.
X / Y Assignment Marking
We are happy to mark your attempts at the X and/or Y assignments. Marking is not included with
the Assignments or the CMP and you need to order both Series X and Series Y Marking separately.
You should submit your script as an attachment to an email, in the format detailed in your
assignment instructions.
You will be able to download your marker's feedback via a secure link.
Don't underestimate the benefits of attempting and submitting assignments:
Question practice during this phase of your study gives an early focus on the end goal of
answering exam-style questions.
You're incentivised to keep up with your study plan and get a regular, realistic assessment
of your progress.
Objective, personalised feedback from a high quality marker will highlight areas on which
to work and help with exam technique.
In a recent study, we found that students who attempt more than half the assignments have
significantly higher pass rates.
There are two different types of marking product: Series Marking and Marking Vouchers.
Series Marking
Series Marking applies to a specified subject, session and student. If you purchase Series Marking,
you will not be able to defer the marking to a future exam sitting or transfer it to a different subject
or student.
We typically provide full solutions with the Series Assignments. However, if you order Series
Marking at the same time as you order the Series Assignments, you can choose whether or not to
receive a copy of the solutions in advance. If you choose not to receive them with the study
material, you will be able to download the solutions via a secure link when your marked script is
returned (or following the final deadline date if you do not submit a script).
If you are having your attempts at the assignments marked by ActEd, you should submit your scripts
regularly throughout the session, in accordance with the schedule of recommended dates set out
on our website at www.ActEd.co.uk. This will help you to pace your study throughout the session
and leave an adequate amount of time for revision and question practice.
The recommended submission dates are realistic targets for the majority of students. Your scripts
will be returned more quickly if you submit them well before the final deadline dates.
Any script submitted after the relevant final deadline date will not be marked. It is your
responsibility to ensure that we receive scripts in good time.
Marking Vouchers
Marking Vouchers give the holder the right to submit a script for marking at any time, irrespective
of the individual assignment deadlines, study session, subject or person.
Marking Vouchers can be used for any assignment. They are valid for four years from the date of
purchase and can be refunded at any time up to the expiry date.
Although you may submit your script with a Marking Voucher at any time, you will need to adhere
to the explicit Marking Voucher deadline dates to ensure that your script is returned before the date
of the exam. The deadline dates are provided on our website at www.ActEd.co.uk.
Tutorials
Our tutorials are specifically designed to develop the knowledge that you will acquire from the
course material into the higher-level understanding that is needed to pass the exam.
We run a range of different tutorials including face-to-face tutorials at various locations, and Live
Online tutorials. Full details are set out in our Tuition Bulletin, which is available on our website at
www.ActEd.co.uk.
Regular and Block Tutorials
In preparation for these tutorials, we expect you to have read the relevant part(s) of the Course
Notes before attending the tutorial so that the group can spend time on exam questions and
discussion to develop understanding rather than basic bookwork.
You can choose one of the following types of tutorial:
Regular Tutorials spread over the session
a Block Tutorial held two to eight weeks before the exam.
The tutorials outlined above will focus on and develop the skills required for the Paper A
examination. Students wishing for some additional tutor support working through exam-style
questions for Paper B may wish to attend a Preparation Day. These will be available Live Online or
face-to-face, where students will need to provide their own device capable of running R.
Online Classroom
The Online Classroom acts as either a valuable add-on or a great alternative to a face-to-face or
Live Online tutorial, focussing on the Paper A examination.
At the heart of the Online Classroom in each subject is a comprehensive, easily-searched collection
of tutorial units. These are a mix of:
teaching units, helping you to really get to grips with the course material, and
guided questions, enabling you to learn the most efficient ways to answer questions and
avoid common exam pitfalls.
The best way to discover the Online Classroom is to see it in action. You can watch a sample of
the Online Classroom tutorial units on our website at www.ActEd.co.uk.
Revision products
For most subjects, there is a lot of material to revise. Finding a way to fit revision into your
routine as painlessly as possible has got to be a good strategy. Flashcards and Sound Revision are
inexpensive options that can provide a massive boost. They can also provide a variation in
activities during a study day, and so help you to maintain concentration and effectiveness.
Flashcards
Flashcards are a set of A6-sized cards that cover the key points of the subject that most students
want to commit to memory. Each flashcard has questions on one side and the answers on the
reverse. We recommend that you use the cards actively and test yourself as you go.
Sound Revision
It is reported that only 30% of information that is read is retained but this rises to 50% if the
information is also heard. Sound Revision is a set of audio files, designed to help you remember
the most important aspects of the Core Reading.
The files cover the majority of the course, split into a number of manageable topics based on the
chapters in the Course Notes. Each section lasts no longer than a few minutes.
Choice of revision product
Different students will have preferences for different revision products. So, what might influence
your choice between these study aids? The following questions and comments might help you to
choose the revision products that are most suitable for you:
Do you have a regular train or bus journey?
Flashcards are ideal for regular bursts of revision on the move.
Do you want to fit more study into your routine?
Flashcards are a good option for "dead time", eg using flashcards on your phone or sticking
them on the wall in your study.
Do you find yourself cramming for exams (even if that's not your original plan)?
Flashcards are an extremely efficient way to do your pre-exam preparation.
Do you have some regular time where carrying other materials isn't practical,
eg commuting, at the gym, walking the dog?
Sound Revision is an ideal "hands-free" revision tool.
Do you have a preference for auditory learning, eg do you remember conversations more
easily than emails?
Sound Revision will suit your preferred style and be especially effective for you.
Choosing more than one revision product
As there is some degree of overlap between revision products, we do not necessarily recommend
using them all simultaneously. However, if you are retaking a subject, then you might consider
using a different product than on a previous attempt to keep your revision fresh and effective.
Revision & rehearsal products
Revision Notes
Our Revision Notes have been designed with input from students to help you revise efficiently.
They are suitable for first-time sitters who have worked through the ActEd Course Notes or for
retakers (who should find them much more useful and challenging than simply reading through
the course again).
The Revision Notes are a set of A5 booklets, perfect for revising in places where taking large
amounts of study material with you is not practical.
Each booklet covers one main theme or a set of related topics from the course and includes:
Core Reading to develop your bookwork knowledge
relevant past exam questions with concise solutions from the last ten years
other useful revision aids.
ActEd Solutions with Exam Technique (ASET)
The ActEd Solutions with Exam Technique (ASET) contains our solutions to a number of past exam
papers, plus comment and explanation. In particular, it highlights how questions might have been
analysed and interpreted so as to produce a good solution with a wide range of relevant points.
This will be valuable in approaching questions in subsequent examinations.
Rehearsal products
Mock Exam
The Mock Exam consists of two papers. There is a 100-mark mock exam for the Paper A
examination and a separate mock exam for the practical Paper B exam. These provide a realistic
test of your exam readiness.
It is based on the Mock Exam from last year but it has been updated to reflect any changes to the
Syllabus and Core Reading.
Additional Mock Pack (AMP)
The Additional Mock Pack (AMP) consists of four further 100-mark mock exam papers: Mock
Exam 2 (Papers A and B) and Mock Exam 3 (Papers A and B). This is ideal if you are retaking and
have already sat the Mock Exam, or if you just want some extra question practice.
Mock Marking
We are happy to mark your attempts at the mock exams. The same general principles apply as for
the Assignment Marking. In particular:
Mock Exam Marking applies to a specified subject, session and student. In this subject it
covers the marking of both Paper A and Paper B.
Marking Vouchers can be used for each mock exam paper. You will need two marking
vouchers in order to have both Paper A and Paper B marked. Marking vouchers have to
be used for marking the AMP mocks and can be used for marking the Mock Exam.
Recall that:
marking is not included with the products themselves and you need to order it separately
you should submit your script via email in the format detailed in the mock exam instructions
you will be able to download the feedback on your marked script via a secure link.
2.4 Study skills and assessment
Technical skills
The Core Reading and exam papers for these subjects tend to be very technical. The exams
themselves therefore have many calculation and manipulation questions. The emphasis in the
exam will be on understanding the mathematical techniques and applying them to various,
frequently unfamiliar, situations. It is important to have a feel for what the numerical answer
should be by having a deep understanding of the material and by doing reasonableness checks.
As a high level of pure mathematics and statistics is generally required for the Core Principles
subjects, it is important that your mathematical skills are extremely good. If you are a little rusty
you may wish to consider purchasing additional material to help you get up to speed. The course
"Pure Maths and Statistics for Actuarial Studies" is available from ActEd and it covers the
mathematical techniques that are required for the Core Principles subjects, some of which are
beyond A-Level (or Higher) standard. You do not need to work through the whole course in order;
you can just refer to it when you need help on a particular topic. An initial assessment to test
your mathematical skills and further details regarding the course can be found on our website at
www.ActEd.co.uk.
Study skills
Overall study plan
We suggest that you develop a realistic study plan, building in time for relaxation and allowing
some time for contingencies. Be aware of busy times at work, when you may not be able to take
as much study leave as you would like. Once you have set your plan, be determined to stick to it.
You don't have to be too prescriptive at this stage about what precisely you do on each study day.
The main thing is to be clear that you will cover all the important activities in an appropriate
manner and leave plenty of time for revision and question practice.
Aim to manage your study so as to allow plenty of time for the concepts you meet in these
courses to "bed down" in your mind. Most successful students will probably aim to complete the
courses at least a month before the exam, thereby leaving a sufficient amount of time for
revision. By finishing the courses as quickly as possible, you will have a much clearer view of the
big picture. It will also allow you to structure your revision so that you can concentrate on the
important and difficult areas.
You can also try looking at our discussion forum, which can be accessed at
www.ActEd.co.uk/forums (or use the link from our home page at www.ActEd.co.uk). There are
some good suggestions from students on how to study.
Study sessions
Only do activities that will increase your chance of passing. Try to avoid including activities for the
sake of it and don't spend time reviewing material that you already understand. You will only
improve your chances of passing the exam by getting on top of the material that you currently
find difficult.
Ideally, each study session should have a specific purpose and be based on a specific task,
eg "Finish reading Chapter 3 and attempt Practice Questions 3.4, 3.7 and 3.12", as opposed to a
specific amount of time, eg "Three hours studying the material in Chapter 3".
Try to study somewhere quiet and free from distractions (eg an area at home dedicated to study).
Find out when you operate at your peak, and endeavour to study at those times of the day. This
might be between 8am and 10am or could be in the evening. Take short breaks during your study
to remain focused. It's definitely time for a short break if you find that your brain is tired and
that your concentration has started to drift from the information in front of you.
Order of study
We suggest that you work through each of the chapters in turn. To get the maximum benefit from
each chapter you should proceed in the following order:
1. Read the syllabus objectives. These are set out in the box at the start of each chapter.
2. Read the Chapter Summary at the end of each chapter. This will give you a useful overview
of the material that you are about to study and help you to appreciate the context of the
ideas that you meet.
3. Study the Course Notes in detail, annotating them and possibly making your own notes. Try
the self-assessment questions as you come to them. As you study, pay particular attention
to the listing of the Syllabus Objectives and to the Core Reading.
4. Read the Chapter Summary again carefully. If there are any ideas that you can't
remember covering in the Course Notes, read the relevant section of the notes again to
refresh your memory.
5. Attempt (at least some of) the Practice Questions that appear at the end of the chapter.
6. Where relevant, work through the relevant Paper B Online Resources for the chapter(s).
You will need to have a good understanding of the relevant section of the course before you
attempt the corresponding section of PBOR.
7. Think about what specifically you might want to include from that chapter in the reference
materials that you choose to have to hand during the exam. For example, you might want
to put together some easy-reference lists of key concepts or formulae that can be referred
to quickly and conveniently.
It's a fact that people are more likely to absorb something if they review it several times. So, do
look over the chapters you have studied so far from time to time. It is useful to re-read the
Chapter Summaries or to try the Practice Questions again a few days after reading the chapter
itself. It's a good idea to annotate the questions with details of when you attempted each one. This
makes it easier to ensure that you try all of the questions as part of your revision without repeating
any that you got right first time.
Once you've read the relevant part of the notes and tried a selection of questions from the
Practice Questions (and attended a tutorial, if appropriate) you should attempt the corresponding
assignment. If you submit your assignment for marking, spend some time looking through it
carefully when it is returned. It can seem a bit depressing to analyse the errors you made, but
you will increase your chances of passing the exam by learning from your mistakes. The markers
will try their best to provide practical comments to help you to improve.
To be really prepared for the exam, you should not only know and understand the Core Reading but
also be aware of what the examiners will expect. Your revision programme should include plenty of
question practice so that you are aware of the typical style, content and marking structure of exam
questions. You should attempt as many past exam questions as you can.
Active study
Here are some techniques that may help you to study actively.
1. Don't believe everything you read. Good students tend to question everything that they
read. They will ask "why, how, what for, when?" when confronted with a new concept,
and they will apply their own judgement. This contrasts with those who unquestioningly
believe what they are told, learn it thoroughly, and reproduce it (unquestioningly?) in
response to exam questions.
2. Another useful technique as you read the Course Notes is to think of possible questions
that the examiners could ask. This will help you to understand the examiners' point of
view and should mean that there are fewer nasty surprises in the exam. Use the Syllabus
to help you make up questions.
3. Annotate your notes with your own ideas and questions. This will make you study more
actively and will help when you come to review and revise the material. These notes may
also be useful to refer to in the exam. Do not simply copy out the notes without thinking
about the issues.
4. Attempt the questions in the notes as you work through the course. Produce your answer
before you refer to the solution.
5. Attempt other questions and assignments on a similar basis, ie produce your answer
before looking at the solution provided. Attempting the assignments under timed
conditions has some particular benefits:
It forces you to think and act in a way that is similar to how you will behave in the
exam.
When you have your assignments marked it is much more useful if the marker's
comments can show you how to improve your performance under timed
conditions than your performance when you are under no time pressure.
The knowledge that you are going to do an assignment under timed conditions and
then submit it (however good or bad) for marking can act as a powerful incentive to
make you study each part as well as possible.
It is also quicker than trying to produce perfect answers.
6. Sit a mock exam four to six weeks before the real exam to identify your weaknesses and
work to improve them. You could use a mock exam written by ActEd or a past exam
paper.
Ensure that you have your reference materials handy, as you plan to in the actual
exam, so that you can practise finding what you need in them quickly and efficiently. (You
might even be able to add to / modify your reference materials to increase their
usefulness.)
You can find further information on how to study in the profession's Student Handbook, which
you can download from their website at:
www.actuaries.org.uk/studying
Revision and exam skills
Revision skills
You will have sat many exams before and will have mastered the exam and revision techniques
that suit you. However, it is important to note that due to the high volume of work involved in the
Core Principles subjects it is not possible to leave all your revision to the last minute. Students
who prepare well in advance have a better chance of passing their exams on the first sitting.
Unprepared students find that they are under time pressure in the exam. Therefore it is
important to find ways of maximising your score in the shortest possible time. Part of your
preparation should be to practise a large number of exam-style questions under timed conditions
as soon as possible. This will:
help you to develop the necessary understanding of the techniques required
highlight the key topics, which crop up regularly in many different contexts and questions
help you to practise the specific skills that you will need to pass the exam.
There are many sources of exam-style questions. You can use past exam papers, the Practice
Questions at the end of each chapter (which include many past exam questions), assignments,
mock exams, the Revision Notes and ASET.
Exam question skill levels
Exam questions are not designed to be of similar difficulty. The Institute and Faculty of Actuaries
specifies different skill levels at which questions may be set.
Questions may be set at any skill level:
Knowledge: demonstration of a detailed knowledge and understanding of the topic
Application: demonstration of an ability to apply the principles underlying the topic
within a given context
Higher Order: demonstration of an ability to perform deeper analysis and assessment of
situations, including forming judgements, taking into account different points of view,
comparing and contrasting situations, suggesting possible solutions and actions, and
making recommendations.
Command verbs
The Institute and Faculty of Actuaries use command verbs (such as "Define", "Discuss" and
"Explain") to help students to identify what the question requires. The profession has produced a
document, "Command verbs used in the Associate and Fellowship examinations", to help students
to understand what each command verb is asking them to do.
It also gives the following advice:
The use of a specific command verb within a syllabus objective does not indicate that this
is the only form of question which can be asked on the topic covered by that objective.
The examiners may ask a question on any syllabus topic using any of the agreed command
verbs, as are defined in the document.
You can find the relevant document on the profession's website at:
www.actuaries.org.uk/studying/prepare-your-exams
Past exam papers
You can download some past exam papers and Examiners' Reports from the profession's website
at www.actuaries.org.uk. However, please be aware that the majority of these exam papers are
for the pre-2019 syllabus and so not all questions will be relevant.
The examination
IMPORTANT NOTE: The following advice was correct at the time of publication, however it is
important to keep up-to-date with any changes. See the IFoA's website for the latest guidance.
There is a lot of useful information about the exams at:
www.actuaries.org.uk/studying/my-exams/ifoa-exams
including an Examinations Handbook that gives guidance specific to sitting exams online.
For the exam, ensure you have ready:
your reference materials, with helpful bookmarks
rough paper and a pen / pencil
a calculator / Excel (or equivalent)
a printer (if you wish to print out the exam paper)
a copy of the Tables.
Please also refer to the profession's website and your examination instructions for details about
what you will need for the practical Paper B exam.
2.5 Queries and feedback
Questions and queries
From time to time you may come across something in the study material that is unclear to you.
The easiest way to solve such problems is often through discussion with friends, colleagues and
peers; they will probably have had similar experiences whilst studying. If there's no-one at work
to talk to then use our discussion forum at www.ActEd.co.uk/forums (or use the link from our
home page at www.ActEd.co.uk).
Our online forum is dedicated to actuarial students so that you can get help from fellow students
on any aspect of your studies from technical issues to study advice. You could also use it to get
ideas for revision or for further reading around the subject that you are studying. ActEd tutors
will visit the site from time to time to ensure that you are not being led astray and we also post
other frequently asked questions from students on the forum as they arise.
If you are still stuck, then you can send queries by email to the relevant subject email address (see
Section 1.5), but we recommend that you try the forum first. We will endeavour to contact you as
soon as possible after receiving your query but you should be aware that it may take some time to
reply to queries, particularly when tutors are away from the office running tutorials. At the
busiest teaching times of year, it may take us more than a week to get back to you.
If you have many queries on the course material, you should raise them at a tutorial or book a
personal tuition session with an ActEd tutor. Information about personal tuition is set out in our
current brochure. Please email ActEd@bpp.com for more details.
Feedback
If you find an error in the course, please check the corrections page of our website
(www.ActEd.co.uk/paper_corrections.html) to see if the correction has already been dealt with.
Otherwise please send details via email to the relevant subject email address (see Section 1.5).
Each year our tutors work hard to improve the quality of the study material and to ensure that
the courses are as clear as possible and free from errors. We are always happy to receive
feedback from students, particularly details concerning any errors, contradictions or unclear
statements in the courses. If you have any comments on this course please email them to the
relevant subject email address (see Section 1.5).
Our tutors also work with the profession to suggest developments and improvements to the
Syllabus and Core Reading. If you have any comments or concerns about the Syllabus or Core
Reading, these can be passed on via ActEd. Alternatively, you can send them directly to the
Institute and Faculty of Actuaries' Examination Team by email to
education.services@actuaries.org.uk.
CS1-01: Data analysis
Data analysis
Syllabus objectives
2.1 Data analysis
2.1.1 Describe the possible aims of data analysis (eg descriptive, inferential and
predictive).
2.1.2 Describe the stages of conducting a data analysis to solve real-world
problems in a scientific manner and describe tools suitable for each stage.
2.1.3 Describe sources of data and explain the characteristics of different data
sources, including extremely large data sets.
2.1.4 Explain the meaning and value of reproducible research and describe the
elements required to ensure a data analysis is reproducible.
0 Introduction
This chapter provides an introduction to the underlying principles of data analysis, in particular
within an actuarial context.
Data analysis is the process by which data is gathered in its raw state and analysed or
processed into information which can be used for specific purposes. This chapter will
describe some of the different forms of data analysis, the steps involved in the process and
consider some of the practical problems encountered in data analytics.
Although this chapter looks at the general principles involved in data analysis, it does not deal
with the statistical techniques required to perform a data analysis. These are covered elsewhere,
in Subjects CS1 and CS2.
1 Aims of a data analysis
Three key forms of data analysis will be covered in this section:
descriptive;
inferential; and
predictive.
1.1 Descriptive analysis
Data presented in its raw state can be difficult to manage and draw meaningful conclusions
from, particularly where there is a large volume of data to work with. A descriptive analysis
solves this problem by presenting the data in a simpler format, more easily understood and
interpreted by the user.
Simply put, this might involve summarising the data or presenting it in a format which
highlights any patterns or trends. A descriptive analysis is not intended to enable the user
to draw any specific conclusions. Rather, it describes the data actually presented.
For example, it is likely to be easier to understand the trend and variation in the sterling/euro
exchange rate over the past year by looking at a graph of the daily exchange rate rather than a list
of values. The graph is likely to make the information easier to absorb.
Two key measures, or parameters, used in a descriptive analysis are the measure of central
tendency and the dispersion. The most common measurements of central tendency are the
mean, the median and the mode. Typical measurements of the dispersion are the standard
deviation and ranges such as the interquartile range.
Measures of central tendency tell us about the "average" value of a data set, whereas measures of
dispersion tell us about the "spread" of the values. We will use many of these measures later in
the course.
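As an illustration of the measures named above, the following is a minimal sketch in Python
using only the standard library statistics module. The data values are invented for the example,
and the quartiles use the module's default ("exclusive") method, which is only one of several
conventions and is an assumption here rather than anything prescribed by the Core Reading.

```python
import statistics

# Hypothetical data set, invented purely for illustration.
data = [2, 4, 4, 5, 7, 9, 11]

# Measures of central tendency.
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Measures of dispersion.
sample_sd = statistics.stdev(data)  # sample standard deviation (divisor n - 1)
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartiles, default 'exclusive' method
iqr = q3 - q1  # interquartile range

print(f"mean = {mean}, median = {median}, mode = {mode}")
print(f"sample standard deviation = {sample_sd:.4f}, interquartile range = {iqr}")
```

Other quartile conventions (eg the "inclusive" method) can give slightly different values for the
interquartile range on small data sets, so it is worth stating which one has been used.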
It can also be important to describe other aspects of the shape of the (empirical) distribution of the data, for example by calculating measures of skewness and kurtosis.
Empirical means 'based on observation'. So an empirical distribution relates to the distribution of the actual data points collected, rather than any assumed underlying theoretical distribution. Skewness is a measure of how symmetrical a data set is, and kurtosis is a measure of how likely extreme values are to appear (ie those in the tails of the distribution). We shall touch on these later.
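The measures described in this section can be illustrated with a short sketch. The following is a minimal example in Python, not part of the Core Reading. The function name `describe` and the exchange rate figures are hypothetical, and the skewness and kurtosis shown are the simple moment-based versions (other definitions exist).

```python
import statistics

def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    # Quartiles (Python's default 'exclusive' method); the IQR is Q3 - Q1
    q1, _, q3 = statistics.quantiles(values, n=4)
    # Moment-based measures of shape (an assumed choice of definition)
    skewness = sum((x - mean) ** 3 for x in values) / (n * sd ** 3)
    kurtosis = sum((x - mean) ** 4 for x in values) / (n * sd ** 4)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "standard_deviation": sd,
        "interquartile_range": q3 - q1,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }

# Hypothetical daily sterling/euro exchange rates
rates = [1.15, 1.17, 1.16, 1.18, 1.21, 1.19, 1.16, 1.30]
summary = describe(rates)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

A symmetrical data set such as 1, 2, 3, 4, 5 gives a skewness of zero under this definition, whereas the single large value in the hypothetical rates produces positive skewness.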
1.2 Inferential analysis
Often it is not feasible or practical to collect data in respect of the whole population, particularly when that population is very large. For example, when conducting an opinion poll in a large country, it may not be cost effective to survey every citizen. A practical solution to this problem might be to gather data in respect of a sample, which is used to represent the wider population. The analysis of the data from this sample is called inferential analysis.
The sample analysis involves estimating the parameters as described in Section 1.1 above and testing hypotheses. It is generally accepted that if the sample is large and taken at random (selected without prejudice), then it quite accurately represents the statistics of the population, such as distribution, probability, mean, standard deviation. However, this is also contingent upon the user making reasonably correct hypotheses about the population in order to perform the inferential analysis.
Care may need to be taken to ensure that the sample selected is likely to be representative of the whole population. For example, an opinion poll on a national issue conducted in urban locations on weekday afternoons between 2pm and 4pm may not accurately reflect the views of the whole population. This is because those living in rural areas and those who regularly work during that period are unlikely to have been surveyed, and these people might tend to have a different viewpoint to those who have been surveyed.
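As a rough illustration of using a random sample to estimate a population parameter, the sketch below (in Python, not part of the Core Reading) draws a simple random sample from a simulated population and estimates the population mean. The population, its distribution and the sample size are all hypothetical assumptions made purely for illustration.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch gives repeatable output

# Hypothetical population of 100,000 values (assumed roughly normal)
population = [rng.gauss(50, 10) for _ in range(100_000)]

# Simple random sample of 1,000 cases, selected without replacement
sample = rng.sample(population, 1_000)

# Point estimate of the population mean, with its estimated standard error
estimate = statistics.mean(sample)
std_error = statistics.stdev(sample) / math.sqrt(len(sample))

print(f"sample mean = {estimate:.2f}, standard error = {std_error:.2f}")
print(f"population mean = {statistics.mean(population):.2f}")
```

If the sample were instead drawn only from one part of the population (as in the weekday afternoon opinion poll), the same calculation would still run, but the estimate could be biased.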
Sampling, inferential analysis and parameter estimation are covered in more detail later.
1.3 Predictive analysis
Predictive analysis extends the principles behind inferential analysis in order for the user to analyse past data and make predictions about future events.
It achieves this by using an existing set of data with known attributes (also known as features), known as the training set, in order to discover potentially predictive relationships. Those relationships are tested using a different set of data, known as the test set, to assess the strength of those relationships.
A typical example of a predictive analysis is regression analysis, which is covered in more detail later. The simplest form of this is linear regression, where the relationship between a scalar dependent variable and an explanatory or independent variable is assumed to be linear and the training set is used to determine the slope and intercept of the line. A practical example might be the relationship between a car's braking distance against speed.
In this example, the car's speed is the explanatory (or independent) variable and the braking distance is the dependent variable.
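The idea of fitting a line on a training set and then checking it against a test set can be sketched as follows. This is a minimal illustration in Python using the usual least squares formulae. The function name `fit_line` and all of the speed and braking distance figures are hypothetical.

```python
def fit_line(xs, ys):
    """Least squares estimates of the slope and intercept of y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical (speed, braking distance) observations
training_set = [(20, 6.0), (30, 14.0), (40, 24.0), (50, 38.0), (60, 55.0)]
test_set = [(35, 19.0), (55, 46.0)]

# Use the training set to determine the slope and intercept
slope, intercept = fit_line(
    [speed for speed, _ in training_set],
    [distance for _, distance in training_set],
)

# Use the test set to assess the strength of the fitted relationship
test_mse = sum(
    (distance - (intercept + slope * speed)) ** 2
    for speed, distance in test_set
) / len(test_set)

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"mean squared error on test set = {test_mse:.3f}")
```

A large error on the test set relative to the training set would suggest that the assumed linear relationship does not generalise well.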
Question
Based on data gathered at a particular weather station on the monthly rainfall in mm ($r$) and the average number of hours of sunshine per day ($s$), a researcher has determined the following explanatory relationship:
$s = 9 - 0.1r$
Using this model:
(i) Estimate the average number of hours of sunshine per day, if the monthly rainfall is 50mm.
(ii) State the impact on the average number of hours of sunshine per day of each extra millimetre of rainfall in a month.
Solution
(i) When $r = 50$:
$s = 9 - 0.1 \times 50 = 4$
ie there are 4 hours of sunshine per day on average.
(ii) For each extra millimetre of rainfall in a month, the average number of hours of sunshine per day falls by 0.1 hours, or 6 minutes.
2 The data analysis process
While the process to analyse data does not follow a set pattern of steps, it is helpful to consider the key stages which might be used by actuaries when collecting and analysing data.
The key steps in a data analysis process can be described as follows:
1. Develop a well-defined set of objectives which need to be met by the results of the data analysis.
The objective may be to summarise the claims from a sickness insurance product by age, gender and cause of claim, or to predict the outcome of the next national parliamentary election.
2. Identify the data items required for the analysis.
3. Collection of the data from appropriate sources.
The relevant data may be available internally (eg from an insurance company's administration department) or may need to be gathered from external sources (eg from a local council office or government statistical service).
4. Processing and formatting data for analysis, eg inputting into a spreadsheet, database or other model.
5. Cleaning data, eg addressing unusual, missing or inconsistent values.
6. Exploratory data analysis, which may include:
(a) Descriptive analysis; producing summary statistics on central tendency and spread of the data.
(b) Inferential analysis; estimating summary parameters of the wider population of data, testing hypotheses.
(c) Predictive analysis; analysing data to make predictions about future events or other data sets.
7. Modelling the data.
8. Communicating the results.
It will be important when communicating the results to make it clear what data was used, what analyses were performed, what assumptions were made, the conclusion of the analysis, and any limitations of the analysis.
9. Monitoring the process; updating the data and repeating the process if required.
A data analysis is not necessarily just a one-off exercise. An insurance company analysing the claims from its sickness policies may wish to do this every few years to allow for the new data gathered and to look for trends. An opinion poll company attempting to predict an election result is likely to repeat the poll a number of times in the weeks before the election to monitor any changes in views during the campaign period.
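As an illustration of the cleaning step (Step 5), the sketch below separates records that look usable from those that need investigation. It is a minimal example in Python and not a prescribed method. The record layout, the field names `claim_id`, `age` and `cause`, and the plausible age range are all hypothetical assumptions.

```python
def split_clean_and_queries(records, min_age=0, max_age=110):
    """Separate usable records from those needing investigation."""
    seen_ids = set()
    clean, queries = [], []
    for rec in records:
        age = rec.get("age")
        if rec["claim_id"] in seen_ids:
            queries.append((rec["claim_id"], "duplicate record"))
        elif age is None:
            queries.append((rec["claim_id"], "missing age"))
        elif not min_age <= age <= max_age:
            queries.append((rec["claim_id"], "implausible age"))
        else:
            clean.append(rec)
        seen_ids.add(rec["claim_id"])
    return clean, queries

# Hypothetical sickness claim records
claims = [
    {"claim_id": 1, "age": 34, "cause": "injury"},
    {"claim_id": 2, "age": None, "cause": "illness"},
    {"claim_id": 3, "age": 340, "cause": "illness"},
    {"claim_id": 1, "age": 34, "cause": "injury"},
    {"claim_id": 4, "age": 58, "cause": "illness"},
]

clean, queries = split_clean_and_queries(claims)
print("usable claim ids:", [rec["claim_id"] for rec in clean])
print("to investigate:", queries)
```

Keeping the queried records (rather than silently deleting them) leaves a record of the cleaning decisions that were made.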
Throughout the process, the modelling team needs to ensure that any relevant professional guidance has been complied with. For example, the Financial Reporting Council has issued a Technical Actuarial Standard (TAS) on the principles for Technical Actuarial Work (TAS100) which includes principles for the use of data in technical actuarial work. Knowledge of the detail of this TAS is not required for CS1.
Further, the modelling team should also remain aware of any legal requirement to be complied with. Such legal requirement may include aspects around consumer/customer data protection and gender discrimination.
3 Data sources
Step 3 of the process described in Section 2 above refers to collection of the data needed to meet the objectives of the analysis from appropriate sources. As consideration of Steps 3, 4, and 5 makes clear, getting data into a form ready for analysis is a process, not a single event. Consequently, what is seen as the source of data can depend on your viewpoint.
Suppose you are conducting an analysis which involves collecting survey data from a sample of people in the hope of drawing inferences about a wider population. If you are in charge of the whole process, including collecting the primary data from your selected sample, you would probably view the 'source' of the data as being the people in your sample. Having collected, cleaned and possibly summarised the data you might make it available to other investigators in JavaScript object notation (JSON) format via a web Application programming interface (API). You will then have created a secondary source for others to use.
In this section we discuss how the characteristics of the data are determined both by the primary source and the steps carried out to prepare it for analysis, which may include the steps on the journey from primary to secondary source. Details of particular data formats (such as JSON), or of the mechanisms for getting data from an external source into a local data structure suitable for analysis, are not covered in CS1.
Primary data can be gathered as the outcome of a designed experiment or from an observational study (which could include a survey of responses to specific questions). In all cases, knowledge of the details of the collection process is important for a complete understanding of the data, including possible sources of bias or inaccuracy. Issues that the analyst should be aware of include:
whether the process was manual or automated;
limitations on the precision of the data recorded;
whether there was any validation at source; and
if data wasn't collected automatically, how was it converted to an electronic form.
These factors can affect the accuracy and reliability of the data collected.
For example:
in a survey, an individual's salary may be specified as falling into given bands, eg 20,000 - 29,999, 30,000 - 39,999 etc, rather than the precise value being recorded
if responses were collected on handwritten forms, and then manually input into a database, there is greater scope for errors to appear.
Where randomisation has been used to reduce the effect of bias or confounding variables it is important to know the sampling scheme used:
simple random sampling;
stratified sampling; or
another sampling method.
Question
A researcher wishes to survey 10% of a company's workforce.
Describe how the sample could be selected using:
(a) simple random sampling
(b) stratified sampling.
Solution
(a) Simple random sampling
Using simple random sampling, each employee would have an equal chance of being selected. This could be achieved by taking a list of the employees, allocating each a number, and then selecting 10% of the numbers at random (either manually, or using a computer-generated process).
(b) Stratified sampling
Using stratified sampling, the workforce would first be split into groups (or strata) defined by specific criteria, eg level of seniority. Then 10% of each group would be selected using simple random sampling. In this way, the resulting sample would reflect the structure of the company by seniority.
This aims to overcome one of the issues with simple random sampling, ie that the sample obtained does not fully reflect the characteristics of the population. With a simple random sample, it would be possible for all those selected to be at the same level of seniority, and so be unrepresentative of the workforce as a whole.
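The two sampling schemes can be sketched in code as follows. This is a minimal illustration in Python of one computer-generated process. The workforce of 200 employees, the two seniority levels and the function names are hypothetical.

```python
import random

def simple_random_sample(employees, fraction, rng):
    """Select the given fraction of employees, each equally likely."""
    k = round(len(employees) * fraction)
    return rng.sample(employees, k)

def stratified_sample(employees, fraction, key, rng):
    """Split employees into strata using key, then sample within each."""
    strata = {}
    for employee in employees:
        strata.setdefault(key(employee), []).append(employee)
    sample = []
    for group in strata.values():
        k = round(len(group) * fraction)
        sample.extend(rng.sample(group, k))
    return sample

rng = random.Random(2022)  # fixed seed so the selection can be repeated

# Hypothetical workforce: 20 senior and 180 junior employees
employees = [
    {"id": i, "level": "senior" if i < 20 else "junior"} for i in range(200)
]

srs = simple_random_sample(employees, 0.10, rng)
strat = stratified_sample(employees, 0.10, lambda e: e["level"], rng)

seniors_in_srs = sum(e["level"] == "senior" for e in srs)
seniors_in_strat = sum(e["level"] == "senior" for e in strat)
print(f"simple random sample: {len(srs)} selected, {seniors_in_srs} senior")
print(f"stratified sample: {len(strat)} selected, {seniors_in_strat} senior")
```

Under the stratified scheme the number of senior employees selected is always 2 (10% of 20), whereas under simple random sampling it varies from sample to sample.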
Data may have undergone some form of pre-processing. A common example is grouping (eg by geographical area or age band). In the past, this was often done to reduce the amount of storage required and to make the number of calculations manageable. The scale of computing power available now means that this is less often an issue, but data may still be grouped: perhaps to anonymise it, or to remove the possibility of extracting sensitive (or perhaps commercially sensitive) details.
Other aspects of the data which are determined by the collection process, and which affect the way it is analysed include the following:
Cross-sectional data involves recording values of the variables of interest for each case in the sample at a single moment in time.
For example, recording the amount spent in a supermarket by each member of a loyalty card scheme this week.
Longitudinal data involves recording values at intervals over time.
For example, recording the amount spent in a supermarket by a particular member of a loyalty card scheme each week for a year.
Censored data occurs when the value of a variable is only partially known, for example, if a subject in a survival study withdraws, or survives beyond the end of the study: here a lower bound for the survival period is known but the exact value isn't.
Censoring is dealt with in detail in Subject CS2.
Truncated data occurs when measurements on some variables are not recorded so are completely unknown.
For example, if we were collecting data on the periods of time for which a user's internet connection was disrupted, but only recorded the duration of periods of disruption that lasted 5 minutes or longer, we would have a truncated data set.
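The difference between censored and truncated data can be seen in a small sketch. The example below is in Python and uses hypothetical disruption durations (in minutes). The monitoring limit of 20 minutes used to create the censoring is an assumption made purely for illustration.

```python
# Hypothetical true durations of internet disruption, in minutes
true_durations = [2, 7, 12, 30, 4, 45]

MONITORING_LIMIT = 20   # assumed: observation of each disruption stops here
RECORDING_THRESHOLD = 5  # disruptions shorter than this are never recorded

# Censoring: every case is recorded, but long durations are only known to
# exceed the limit. Each entry is (recorded value, censored flag).
censored = [
    (min(d, MONITORING_LIMIT), d > MONITORING_LIMIT) for d in true_durations
]

# Truncation: cases below the threshold do not appear in the data at all
truncated = [d for d in true_durations if d >= RECORDING_THRESHOLD]

print("censored data: ", censored)
print("truncated data:", truncated)
```

With censoring, all six cases appear and a lower bound is known for the two long disruptions. With truncation, the two short disruptions are missing entirely, so the analyst may not even know how many there were.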
3.1 Big data
The term big data is not well defined but has come to be used to describe data with characteristics that make it impossible to apply traditional methods of analysis (for example, those which rely on a single, well-structured data set which can be manipulated and analysed on a single computer). Typically, this means automatically collected data with characteristics that have to be inferred from the data itself rather than known in advance from the design of an experiment.
Given the description above, the properties that can lead data to be classified as big include:
size, not only does big data include a very large number of individual cases, but each might include very many variables, a high proportion of which might have empty (or null) values leading to sparse data;
speed, the data to be analysed might be arriving in real time at a very fast rate, for example, from an array of sensors taking measurements thousands of times every second;
variety, big data is often composed of elements from many different sources which could have very different structures or is often largely unstructured;
reliability, given the above three characteristics we can see that the reliability of individual data elements might be difficult to ascertain and could vary over time (for example, an internet connected sensor could go offline for a period).
Examples of big data are:
the information held by large online retailers on items viewed, purchased and recommended by each of its customers
measurements of atmospheric pressure from sensors monitored by a national meteorological organisation
the data held by an insurance company received from the personal activity trackers (that monitor daily exercise, food intake and sleep, for example) of its policyholders.
Although the four points above (size, speed, variety, reliability) have been presented in the context of big data, they are characteristics that should be considered for any data source. For example, an actuary may need to decide if it is advisable to increase the volume of data available for a given investigation by combining an internal data set with data available externally. In this case, the extra processing complexity required to handle a variety of data, plus any issues of reliability of the external data, will need to be considered.
3.2 Data security, privacy and regulation
In the design of any investigation, consideration of issues related to data security, privacy and complying with relevant regulations should be paramount. It is especially important to be aware that combining different data from different 'anonymised' sources can mean that individual cases become identifiable.
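The way in which combining sources can identify individuals can be sketched as follows. This is a deliberately simplified illustration in Python. Both data sets, the field names and the function `link` are hypothetical, and real cases typically involve more fields and less exact matching.

```python
# Hypothetical 'anonymised' data: no names, but postcode and birth year remain
health = [
    {"postcode": "AB1", "birth_year": 1980, "condition": "asthma"},
    {"postcode": "CD2", "birth_year": 1975, "condition": "diabetes"},
]

# Hypothetical second source holding names alongside the same two fields
members = [
    {"name": "Person A", "postcode": "AB1", "birth_year": 1980},
    {"name": "Person B", "postcode": "EF3", "birth_year": 1990},
]

def link(left, right, keys):
    """Join rows of left to rows of right that match uniquely on keys."""
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    linked = []
    for row in left:
        matches = index.get(tuple(row[k] for k in keys), [])
        if len(matches) == 1:  # a unique match identifies the individual
            linked.append({**matches[0], **row})
    return linked

reidentified = link(health, members, ["postcode", "birth_year"])
print(reidentified)
```

Neither source on its own links a name to a medical condition, but the combination does for any individual whose postcode and birth year are unique.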
Another point to be aware of is that just because data has been made available on the internet, doesn't mean that others are free to use it as they wish. This is a very complex area and laws vary between jurisdictions.
4 Reproducible research
An example reference for this section is in Peng (2016). For the full reference, see the end of this section.
4.1 The meaning of reproducible research
Reproducibility refers to the idea that when the results of a statistical analysis are reported, sufficient information is provided so that an independent third party can repeat the analysis and arrive at the same results.
In science, reproducibility is linked to the concept of replication which refers to someone repeating an experiment and obtaining the same (or at least consistent) results. Replication can be hard, or expensive or impossible, for example if:
the study is big;
the study relies on data collected at great expense or over many years; or
the study is of a unique occurrence (eg the standards of healthcare in the aftermath of a particular event).
Due to the possible difficulties of replication, reproducibility of the statistical analysis is often a reasonable alternative standard.
So, rather than the results of the analysis being validated by an independent third party completely replicating the study from scratch (including gathering a new data set), the validation is achieved by an independent third party reproducing the same results based on the same data set.
4.2 Elements required for reproducibility
Typically, reproducibility requires the original data and the computer code to be made available (or fully specified) so that other people can repeat the analysis and verify the results. In all but the most trivial cases, it will be necessary to include full documentation (eg description of each data variable, an audit trail describing the decisions made when cleaning and processing the data, and full documented code).
Documentation of models is covered in Subject CP2.
Full documented code can be achieved through literate statistical programming (as defined by Knuth, 1992) where the program includes an explanation of the program in plain language, interspersed with code snippets. Within the R environment, a tool which allows this is R Markdown.
R Markdown enables documents to be produced that include the code used, an explanation of that code, and, if desired, the output from that code.
As a simpler example, it may be possible to document the work carried out in a spreadsheet by adding comments or annotations to explain the operations performed in particular cells, rows or columns.
Although not strictly required to meet the definition of reproducibility, a good version control process can ensure evolving drafts of code, documentation and reports are kept in alignment between the various stages of development and review, and changes are reversible if necessary. There are many tools that are used for version control. A popular tool used for version control is git.
A detailed knowledge of the version control tool git is not required for Subject CS1.
In addition to version control, documenting the software environment, the computing architecture, the operating system, the software toolchain, external dependencies and version numbers can all be important in ensuring reproducibility. As an example, in the R programming language, the command:
sessionInfo()
provides information about the operating system, version of R and version of all R packages being used.
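The same idea applies in other languages. As a rough analogue (not part of the Core Reading), the Python sketch below records similar information using the standard library. The function name `session_info` and the choice of package to report on are hypothetical.

```python
import platform
from importlib import metadata

def session_info(packages=()):
    """Collect basic details of the software environment in a dictionary."""
    info = {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "operating_system": platform.platform(),
        "machine": platform.machine(),
        "packages": {},
    }
    for name in packages:
        try:
            info["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info["packages"][name] = "not installed"
    return info

# Report on an example package; any installed package name could be used
info = session_info(["pip"])
for key, value in info.items():
    print(f"{key}: {value}")
```

Saving this output alongside the results of an analysis gives a later reviewer the version information needed to recreate the environment.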
Question
Give a reason why documenting the version number of the software used can be important for reproducibility of a data analysis.
Solution
Some functions might:
be available in one version of a package that are not available in another (older) version, or
behave differently in different versions of a package.
This could prevent someone being able to reproduce the analysis.
Where there is randomness in the statistical or machine learning techniques being used (for example random forests or neural networks) or where simulation is used, replication will require the random seed to be set.
Machine learning is covered in Subject CS2.
Simulation will be dealt with in more detail later in the course. At this point, it is sufficient to know that each simulation that is run will be based on a series of pseudo-random numbers. So, for example, one simulation will be based on one particular series of pseudo-random numbers, but unless explicitly coded otherwise, a different simulation will be based on a different series of pseudo-random numbers. The second simulation will then produce different results, rather than replicating the original results, which is the desired outcome here.
To ensure the two simulations give the same results, they would both need to be based on the same series of pseudo-random numbers. This is known as setting the random seed. We will do this regularly when using R to carry out a simulation.
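The effect of setting the random seed can be demonstrated with a short sketch. The example below is in Python rather than R, purely for illustration. The function `simulate_claims`, the exponential claim amounts and the seed values are hypothetical.

```python
import random

def simulate_claims(n, seed):
    """Simulate n claim amounts from an exponential distribution (mean 1,000)."""
    rng = random.Random(seed)  # generator initialised with the given seed
    return [rng.expovariate(1 / 1000) for _ in range(n)]

run_1 = simulate_claims(5, seed=42)
run_2 = simulate_claims(5, seed=42)  # same seed: same pseudo-random series
run_3 = simulate_claims(5, seed=43)  # different seed: different series

print("runs 1 and 2 identical:", run_1 == run_2)
print("runs 1 and 3 identical:", run_1 == run_3)
```

Recording the seed used, together with the code and software versions, allows a third party to reproduce the simulated results exactly.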
Doing things 'by hand' is very likely to create problems in reproducing the work. Examples of doing things by hand are:
manually editing spreadsheets (rather than reading the raw data into a programming environment and making the changes there);
editing tables and figures (rather than ensuring that the programming environment creates them exactly as needed);
downloading data manually from a website (rather than doing it programmatically); and
pointing and clicking (unless the software used creates an audit trail of what has been clicked).
'Pointing and clicking' relates to choosing a particular operation from an on-screen menu, for example. This action would not ordinarily be recorded electronically.
The main thing to note here is that the more of the analysis that is performed in an automated way, the easier it will be to reproduce by another individual. Manual interventions may be forgotten altogether, and even if they are remembered, can be difficult to document clearly.
4.3 The value of reproducibility
Many actuarial analyses are undertaken for commercial, not scientific, reasons and are not published, but reproducibility is still valuable:
reproducibility is necessary for a complete technical work review (which in many cases will be a professional requirement) to ensure the analysis has been correctly carried out and the conclusions are justified by the data and analysis;
reproducibility may be required by external regulators and auditors;
reproducible research is more easily extended to investigate the effect of changes to the analysis, or to incorporate new data;
it is often desirable to compare the results of an investigation with a similar one carried out in the past; if the earlier investigation was reported reproducibly, an analysis of the differences between the two can be carried out with confidence;
the discipline of reproducible research, with its emphasis on good documentation of processes and data storage, can lead to fewer errors that need correcting in the original work and, hence, greater efficiency.
There are some issues that reproducibility does not address:
Reproducibility does not mean that the analysis is correct. For example, if an incorrect distribution is assumed, the results may be wrong even though they can be reproduced by making the same incorrect assumption about the distribution. However, by making clear how the results are achieved, it does allow transparency so that incorrect analysis can be appropriately challenged.
If activities involved in reproducibility happen only at the end of an analysis, this may be too late for resulting challenges to be dealt with. For example, resources may have been moved on to other projects.
4.4 References
Further information on the material in this section is given in the references:
Knuth, Donald E. (1992). Literate Programming. California: Stanford University Center for the Study of Language and Information. ISBN 978-0-937073-80-3.
Peng, R. D. (2016). Report Writing for Data Science in R. www.Leanpub.com/reportwriting
Chapter 1 Summary
The three key forms of data analysis are:
descriptive analysis: producing summary statistics (eg measures of central tendency and dispersion) and presenting the data in a simpler format
inferential analysis: using a data sample to estimate summary parameters for the wider population from which the sample was taken, and testing hypotheses
predictive analysis: extends the principles of inferential analysis to analyse past data and make predictions about future events.
The key steps in the data analysis process are:
1. Develop a well-defined set of objectives which need to be met by the results of the data analysis.
2. Identify the data items required for the analysis.
3. Collection of the data from appropriate sources.
4. Processing and formatting data for analysis, eg inputting into a spreadsheet, database or other model.
5. Cleaning data, eg addressing unusual, missing or inconsistent values.
6. Exploratory data analysis, which may include descriptive analysis, inferential analysis or predictive analysis.
7. Modelling the data.
8. Communicating the results.
9. Monitoring the process; updating the data and repeating the process if required.
In the data collection process, the primary source of the data is the population (or population sample) from which the raw data is obtained. If, once the information is collected, cleaned and possibly summarised, it is made available for others to use via a web interface, this is then a secondary source of data.
Other aspects of the data determined by the collection process that may affect the analysis are:
Cross-sectional data involves recording values of the variables of interest for each case in the sample at a single moment in time.
Longitudinal data involves recording values at intervals over time.
Censored data occurs when the value of a variable is only partially known.
Truncated data occurs when measurements on some variables are not recorded so are completely unknown.
The term big data can be used to describe data with characteristics that make it impossible to apply traditional methods of analysis. Typically, this means automatically collected data with characteristics that have to be inferred from the data itself rather than known in advance from the design of the experiment.
Properties that can lead to data being classified as big include:
size of the data set
speed of arrival of the data
variety of different sources from which the data is drawn
reliability of the data elements, which might be difficult to ascertain.
Replication refers to an independent third party repeating an experiment and obtaining the same (or at least consistent) results. Replication of a data analysis can be difficult, expensive or impossible, so reproducibility is often used as a reasonable alternative standard.
Reproducibility refers to reporting the results of a statistical analysis in sufficient detail that an independent third party can repeat the analysis on the same data set and arrive at the same results.
Elements required for reproducibility:
the original data and fully documented computer code need to be made available
good version control
documentation of the software used, computing architecture, operating system, external dependencies and version numbers
where randomness is involved in the process, replication will require the random seed to be set
limiting the amount of work done by hand.
Chapter 1 Practice Questions
1.1 The data analysis department of a mobile phone messaging app provider has gathered data on the number of messages sent by each user of the app on each day over the past 5 years. The geographical location of each user (by country) is also known.
(i) Describe each of the following terms as it relates to a data set, and give an example of each as it relates to the app provider's data:
(a) cross-sectional
(b) longitudinal.
(ii) Give an example of each of the following types of data analysis that could be carried out using the app provider's data:
(a) descriptive
(b) inferential
(c) predictive.
1.2 Explain the regulatory and legal requirements that should be observed when conducting a data analysis exercise.
1.3 Exam style
A car insurer wishes to investigate whether young drivers (aged 17-25) are more likely to have an accident in a given year than older drivers.
Describe the steps that would be followed in the analysis of data for this investigation. [7]
1.4 Exam style
(i) In the context of data analysis, define the terms replication and reproducibility. [2]
(ii) Give three reasons why replication of a data analysis can be difficult to achieve in practice. [3]
[Total 5]
Chapter 1 Solutions
1.1 (i)(a) Cross-sectional
Cross-sectional data involves recording the values of the variables of interest for each case in the sample at a single moment in time.
In this data set, this relates to the number of messages sent by each user on any particular day.
(i)(b) Longitudinal
Longitudinal data involves recording the values of the variables of interest at intervals over time.
In this data set, this relates to the number of messages sent by a particular user on each day over the 5-year period.
(ii)(a) Descriptive analysis
Examples of descriptive analysis that could be carried out on this data set include:
calculating the mean and standard deviation of the number of messages sent each day by users in each country
plotting a graph of the total messages sent each day worldwide, to illustrate the overall trend in the number of messages sent over the 5 years
calculating what proportion of the total messages sent in each year originate in each country.
(ii)(b) Inferential analysis
Examples of inferential analysis that could be carried out on this data set include:
testing the hypothesis that more messages are sent at weekends than on weekdays
assessing whether there is a significant difference in the rate of growth of the number of messages sent each day by users in different countries over the 5-year period.
(ii)(c) Predictive analysis
Examples of predictive analysis that could be carried out on this data set include:
forecasting which countries will be the major users of the app in 5 years' time, and will therefore need the most technical support staff
predicting the number of messages sent on the app's busiest day (eg New Year's Eve) next year, to ensure that the provider continues to have sufficient capacity.
1.2 Throughout the data analysis process, it is important to ensure that any relevant professional guidance has been complied with. For example, the UK's Financial Reporting Council has issued a Technical Actuarial Standard (TAS) on the principles for Technical Actuarial Work (TAS100). This describes the principles that should be adhered to when using data in technical actuarial work.
The data analysis team must also be aware of any legal requirements to be complied with relating to, for example:
protection of an individual's personal data and privacy
discrimination on the grounds of gender, age, or other reasons.
With regard to privacy regulations, it is important to note that combining data from different sources may mean that individuals can be identified, even if they are anonymous in the original data sources.
Finally, data that have been made available on the internet cannot necessarily be used for any purpose. Any legal restrictions should be checked before using the data, noting that laws can vary between jurisdictions.
1.3 The key steps in the data analysis process in this scenario are:
1. Develop a well-defined set of objectives that need to be met by the results of the data analysis. [1/2]
Here, the objective is to determine whether young drivers are more likely to have an accident in a given year than older drivers. [1/2]
2. Identify the data items required for the analysis. [1/2]
The data items needed would include the number of drivers of each age during the investigation period and the number of accidents they had. [1/2]
3. Collection of the data from appropriate sources. [1/2]
The insurer will have its own internal data from its administration department on the number of policyholders of each age during the investigation period and which of them had accidents. [1/2]
The insurer may also be able to source data externally, eg from an industry body that collates information from a number of insurers. [1/2]
4. Processing and formatting the data for analysis, eg inputting into a spreadsheet, database or other model. [1/2]
The data will need to be extracted from the administration system and loaded into whichever statistical package is being used for the analysis. [1/2]
If different data sets are being combined, they will need to be put into a consistent format and any duplicates (ie the same record appearing in different data sets) will need to be removed. [1/2]
The Actuarial
Education
Compan
CS1-01:
Data analysis
5.
Page 23
Cleaning data, eg addressing unusual,
missing or inconsistent
values.
[1/2]
For example, the age of the driver might be missing,or be too low or high to be plausible.
These cases will needinvestigation.
[1/2]
6.
Exploratory data analysis, which here takes the form ofinferential
analysis...
[1/2]
... as we are testing the hypothesis that younger drivers are morelikely to have an
accident than older drivers.
7.
[1/2]
Modellingthe data.
[1/2]
This mayinvolve fitting a distribution to the annual number of accidents arising from the
policyholders in each age group.
[1/2]
8.
Communicating
the results.
[1/2]
This willinvolve describing the data sources used,the modeland analyses performed, and
the conclusion of the analysis(ie whether young drivers areindeed morelikely to have an
accident than older drivers), along with anylimitations of the analysis.
[1/2]
9.
Monitoring the process
updating the data and repeating the process if required.
[1/2]
The carinsurer may wishto repeat the process againin afew years time, usingthe data
gathered overthat period, to ensure that the conclusions of the original analysis remain
valid.
10.
[1/2]
Ensuring that any relevant
has been complied
professional
guidance and legislation
(eg on age discrimination)
with.
[1/2]
[Maximum 7]
1.4
(i) Definitions
Replication refers to an independent third party repeating an analysis from scratch (including gathering an independent data sample) and obtaining the same (or at least consistent) results. [1]
Reproducibility refers to reporting the results of a statistical analysis in sufficient detail that an independent third party can repeat the analysis on the same data set and arrive at the same results. [1]
[Total 2]
(ii) Three reasons why replication is difficult
Replication of a data analysis can be difficult if:
the study is big; [1]
the study relies on data collected at great expense or over many years; or [1]
the study is of a unique occurrence (eg the standards of healthcare in the aftermath of a particular event). [1]
[Total 3]
CS1-02: Probability distributions
Syllabus objectives
1.1 Define basic univariate distributions and use them to calculate probabilities, quantiles and moments.
1.1.1 Define and explain the key characteristics of the discrete distributions: geometric, binomial, negative binomial, hypergeometric, Poisson and uniform on a finite set.
1.1.2 Define and explain the key characteristics of the continuous distributions: normal, lognormal, exponential, gamma, chi-square, t, F, beta and uniform on an interval.
1.1.3 Evaluate probabilities and quantiles associated with distributions (by calculation or using statistical software as appropriate).
1.1.4 Define and explain the key characteristics of the Poisson process and explain the connection between the Poisson process and the Poisson distribution.
1.1.5 Generate basic discrete and continuous random variables using the inverse transform method.
1.1.6 Generate discrete and continuous random variables using statistical software.
0 Introduction
This unit introduces the standard distributions that are used in actuarial work.
We look in this chapter at all the standard probability distributions used in Subject CS1.
This chapter does assume that you have some basic knowledge of statistics and probability. If your knowledge in this area is rusty, you can purchase additional ActEd materials to remind you of those statistical ideas. Please see the ActEd website for further details.
There is a book called Formulae and Tables for Examinations (simply denoted the Tables in this course) available in the exam, which contains many relevant formulae for the distributions in this chapter as well as probability tables. This is available from the IFoA and you should purchase a copy as soon as possible (if you have not already done so) as it is essential to your studying of the Subject CS1 course.
Many of the formulae in this course are contained in the Tables. So you should concentrate on being able to apply them to calculate means, variances, coefficients of skewness and probabilities, rather than memorising them.
If you have studied statistics to A-Level standard or equivalent you should find this chapter straightforward. However, some of the standard distributions (eg lognormal and gamma) that are used frequently in statistical work in finance and insurance, may be new to you. Since we will be using the properties of these distributions in the rest of the course, it is vital that you feel confident with them.
1 Important discrete distributions
In this section we will look at the standard discrete distributions that we will use in actuarial modelling work.
Remember all of these results are given in the Tables. Concentrate on understanding and applying them, particularly to calculating probabilities, rather than memorising them.
The distributions considered here are all models for the number of something, eg number of successes, number of trials, number of deaths, number of claims. The values assumed by the variables are integers from the set {0, 1, 2, 3, ...}; such variables are often referred to as counting variables.
1.1 Uniform distribution
Sample space $S = \{1, 2, 3, \ldots, k\}$.
Probability measure: equal assignment ($1/k$) to all outcomes, ie all outcomes are equally likely.
Random variable $X$ defined by $X(i) = i$, $i = 1, 2, 3, \ldots, k$.
Distribution: $P(X = x) = \frac{1}{k}$, $x = 1, 2, 3, \ldots, k$.
For example, if $X$ is the score on a fair die, $P(X = x) = \frac{1}{6}$ for $x = 1, 2, \ldots, 6$.
Moments:
$$\mu = E[X] = \frac{1 + 2 + \cdots + k}{k} = \frac{\frac{1}{2}k(k+1)}{k} = \frac{k+1}{2}$$
To see that $1 + 2 + \cdots + k = \frac{1}{2}k(k+1)$, suppose that:
$$S = 1 + 2 + \cdots + (k-1) + k$$
Rearranging the terms on the right-hand side, we see that:
$$S = k + (k-1) + \cdots + 2 + 1$$
Adding these two expressions for $S$ gives:
$$2S = \underbrace{(1 + k) + (2 + k - 1) + \cdots + (k - 1 + 2) + (k + 1)}_{k \text{ terms}} = k(k+1)$$
So:
$$S = 1 + 2 + \cdots + k = \tfrac{1}{2}k(k+1)$$
The second moment is:
$$E[X^2] = \frac{1^2 + 2^2 + \cdots + k^2}{k} = \frac{\frac{1}{6}k(k+1)(2k+1)}{k} = \frac{(k+1)(2k+1)}{6}$$
The result $1^2 + 2^2 + \cdots + k^2 = \frac{1}{6}k(k+1)(2k+1)$ can be proved by induction, but this is beyond the scope of Subject CS1.
Using the formulae for $E(X)$ and $E(X^2)$ given above, we can show that the variance of $X$ is:
$$\sigma^2 = \frac{k^2 - 1}{12}$$
Question
Verify that $\sigma^2 = \frac{k^2 - 1}{12}$ for the uniform distribution.
Solution
$\sigma$ is the standard deviation, and $\sigma^2$ is the variance, which is calculated as:
$$\sigma^2 = E(X^2) - [E(X)]^2 = \frac{(k+1)(2k+1)}{6} - \frac{(k+1)^2}{4} = \frac{2(2k^2 + 3k + 1) - 3(k^2 + 2k + 1)}{12} = \frac{k^2 - 1}{12}$$
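The mean and variance formulae for the discrete uniform distribution can be checked numerically. The sketch below is illustrative only: the value k = 20, the seed and the sample size are arbitrary choices, and the variable names are not from the Core Reading. It compares the formulae with moments computed directly from the probability function and with a simulated sample.

```r
# Discrete uniform distribution on {1, ..., k}: check the moment formulae
k <- 20                          # illustrative choice of k
S <- 1:k                         # sample space

theo_mean <- (k + 1) / 2         # formula for the mean
theo_var  <- (k^2 - 1) / 12      # formula for the variance

# Moments computed directly from P(X = x) = 1/k
exact_mean <- sum(S) / k
exact_var  <- sum(S^2) / k - exact_mean^2

# Moments estimated from a large simulated sample (arbitrary seed)
set.seed(1)
x <- sample(S, 100000, replace = TRUE)

c(formula = theo_mean, exact = exact_mean, simulated = mean(x))
c(formula = theo_var,  exact = exact_var,  simulated = var(x))
```

The first two entries in each vector should agree exactly, while the simulated values should be close but will vary with the seed.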
R code for simulating a random sample from the discrete uniform distribution
To generate a vector for sample space S = {1, 2, 3, ..., 20}:
S = 1:20
To simulate 100 values from this sample space:
sample(S, 100, replace = TRUE)
1.2 Bernoulli distribution
A Bernoulli trial is an experiment which has (or can be regarded as having) only two possible outcomes: $s$ (success) and $f$ (failure).
Sample space $S = \{s, f\}$. The words success and failure are merely labels; they do not necessarily carry with them the ordinary meanings of the words.
For example in life insurance, a success could mean a death.
Probability measure: $P(\{s\}) = p$, $P(\{f\}) = 1 - p$.
Random variable $X$ defined by $X(s) = 1$, $X(f) = 0$. $X$ is the number of successes that occur (0 or 1).
Distribution: $P(X = x) = p^x (1-p)^{1-x}$, $x = 0, 1$; $0 < p < 1$.
Moments: $\mu = p$, $\sigma^2 = p - p^2 = p(1-p)$.
A Bernoulli variable is also called an indicator variable; its value can be used to indicate whether or not some specified event, for example $A$, occurs. Set $X = 1$ if $A$ occurs, 0 if $A$ does not occur. If $P(A) = p$ then $X$ has the above Bernoulli distribution. The event $A$ could, for example, be the survival of an assured life over one year.
An assured life is a person with an insurance policy that makes a payment on death.
Another example of a Bernoulli random variable occurs when a fair die is thrown once. If $X$ is the number of sixes obtained, $p = \frac{1}{6}$, $P(X = 0) = \frac{5}{6}$ and $P(X = 1) = \frac{1}{6}$.
See R code for Binomial distribution.
1.3 Binomial distribution
Consider a sequence of $n$ Bernoulli trials as above such that:
(i) the trials are independent of one another, ie the outcome of any trial does not depend on the outcomes of any other trials
and:
(ii) the trials are identical, ie at each trial $P(\{s\}) = p$.
Such a sequence is called a sequence of $n$ independent, identical, Bernoulli ($p$) trials or, for short, a sequence of $n$ Bernoulli ($p$) trials.
A quick way of saying independent and identically distributed is IID. We will need this idea later.
The independence allows the probability of a joint outcome involving two or more trials to be expressed as the product of the probabilities of the outcomes associated with each separate trial concerned.
Sample space $S$: the joint set of outcomes of all $n$ trials.
Probability measure: as above for each trial.
Random variable $X$ is the number of successes that occur in the $n$ trials.
Distribution: $P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$, $x = 0, 1, 2, \ldots, n$; $0 < p < 1$.
The coefficients here are the same as in the binomial expansion that can be obtained using the numbers from Pascal's triangle, ie $\binom{n}{x} = {}^nC_x = \frac{n!}{(n-x)!\,x!}$. We can work out these quantities using the nCr function on a calculator.
If $X$ is distributed binomially with parameters $n$ and $p$, then we can write $X \sim Bin(n, p)$.
The fact that a $Bin(n, p)$ distribution arises from the sum of $n$ independent and identical Bernoulli ($p$) trials is important and will be used later to prove some important results.
Moments: $\mu = np$, $\sigma^2 = np(1-p)$.
Very often when using the binomial distribution we will write $1 - p = q$.
As an example of the binomial distribution, suppose that $X$ is the number of sixes obtained when a fair die is thrown 10 times. Then $P(X = x) = {}^{10}C_x \left(\frac{1}{6}\right)^x \left(\frac{5}{6}\right)^{10-x}$ and the probability of exactly one six in ten throws is ${}^{10}C_1 \left(\frac{1}{6}\right)^1 \left(\frac{5}{6}\right)^9 = 0.3230$. There are 10 ($= {}^{10}C_1$) ways of obtaining exactly one six, ie the six could be on the first throw, the second throw, ... or the tenth throw.
Question
Calculate the probability that at least 9 out of a group of 10 people who have been infected by a serious disease will survive, if the survival probability for the disease is 70%.
Solution
The number of survivors is distributed binomially with parameters $n = 10$ and $p = 0.7$. If $X$ is the number of survivors, then:
$$P(X \ge 9) = P(X = 9 \text{ or } 10) = \binom{10}{9} 0.7^9 \times 0.3 + \binom{10}{10} 0.7^{10} = 0.1493$$
Alternatively, we could use the cumulative binomial probabilities given on page 187 of the Tables. The figure for $x = 8$ in the Tables for the $Bin(10, 0.7)$ distribution is 0.8507. Subtracting this from 1, we get $1 - 0.8507 = 0.1493$ as before.
The R code for simulating values and calculating probabilities and quantiles from the binomial distribution uses the R functions rbinom, dbinom, pbinom and qbinom. The prefixes r, d, p, and q stand for random generation, density, distribution and quantile functions respectively.
The R code for simulating a random sample of 100 values from the binomial distribution with $n = 20$ and $p = 0.3$ is:
n = 20
p = 0.3
rbinom(100, n, p)
To calculate $P(X = 2)$:
dbinom(2, n, p)
Similarly, the cumulative distribution function (CDF) and quantiles can be calculated with pbinom and qbinom.
For a Bernoulli distribution the parameter $n$ is set to $n = 1$.
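As an illustration of these functions, the survival question and the die example from this section can be reproduced in R. This is a sketch rather than part of the Core Reading; the variable names are chosen here for illustration.

```r
# At least 9 survivors out of 10, each surviving independently with probability 0.7
n <- 10
p <- 0.7
p_sum   <- dbinom(9, n, p) + dbinom(10, n, p)     # P(X = 9) + P(X = 10)
p_cdf   <- 1 - pbinom(8, n, p)                    # 1 - P(X <= 8), as with the Tables
p_upper <- pbinom(8, n, p, lower.tail = FALSE)    # P(X > 8) computed directly
round(c(p_sum, p_cdf, p_upper), 4)

# Die example: probability of exactly one six in ten throws
p_one_six <- dbinom(1, 10, 1/6)
round(p_one_six, 4)

# A quantile: the smallest x with P(X <= x) >= 0.5 for the Bin(20, 0.3) distribution
qbinom(0.5, 20, 0.3)
```

The three ways of computing the survival probability should agree, and match the 0.1493 obtained above.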
1.4 Geometric distribution
Consider again a sequence of independent, identical Bernoulli trials with $P(\{s\}) = p$. The variable of interest now is the number of trials that has to be performed until the first success occurs. Because trials are performed one after the other and a success is awaited, this distribution is one of a class of distributions called waiting-time distributions.
Random variable $X$: Number of the trial on which the first success occurs.
Distribution: For $X = x$ there must be a run of $(x - 1)$ failures followed by a success, so $P(X = x) = p(1-p)^{x-1}$, $x = 1, 2, 3, \ldots$ $(0 < p < 1)$.
Moments: $\mu = \frac{1}{p}$, $\sigma^2 = \frac{1-p}{p^2}$.
For example, if the probability that a phone call leads to a sale is $\frac{1}{4}$ and $X$ is the number of phone calls required to make the first sale, then $P(X = 3) = \frac{1}{4}\left(\frac{3}{4}\right)^2 = 0.140625$.
Question
If the probability of having a male or female child is equal, calculate the probability that a woman's fourth child is her first son.
Solution
The probability is $\left(\frac{1}{2}\right)^3 \times \frac{1}{2} = 0.0625$.
Consider the conditional probability $P(X > x + n \mid X > n)$.
Given that there have already been $n$ trials without a success, what is the probability that more than $x$ additional trials are required to get a success?
To answer this, we will need the conditional probability formula $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$.
The intersection of the events $X > n$ and $X > x + n$ is just $X > x + n$, so:
$$P(X > x + n \mid X > n) = \frac{P(X > x + n)}{P(X > n)} = \frac{(1-p)^{x+n}}{(1-p)^n} = (1-p)^x = P(X > x)$$
ie just the same as the original probability that more than $x$ trials are required.
The lack of success on the first $n$ trials is irrelevant; under this model the chances of success are not any better because there has been a run of bad luck.
This characteristic, a reflection of the independent, identical trials structure, is important, and is referred to as the memoryless property.
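The memoryless property can be checked numerically. The sketch below uses illustrative values of p, n and x that are not from the text, and a helper function `surv` introduced here purely for illustration. Note that R's geometric functions count the number of failures before the first success (one fewer than the trial number X), so P(X > m) corresponds to more than m − 1 failures.

```r
# Numerical check of the memoryless property of the geometric distribution
p <- 0.25   # illustrative success probability
n <- 4      # trials already performed without a success (illustrative)
x <- 3      # additional trials (illustrative)

# P(X > m) for the trial-number formulation, using R's failure-count formulation
surv <- function(m) pgeom(m - 1, p, lower.tail = FALSE)

lhs <- surv(x + n) / surv(n)   # P(X > x + n | X > n)
rhs <- surv(x)                 # P(X > x)
c(conditional = lhs, unconditional = rhs, formula = (1 - p)^x)
```

All three values should coincide, whatever values of p, n and x are used.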
Question
The probability of having a male or female child is equal. A woman has two boys and a girl. Calculate the probability that her next two children are girls.
Solution
Due to the memoryless property, the children she has so far are irrelevant when it comes to working out the probability that the next two are girls. So the probability is $\left(\frac{1}{2}\right)^2 = 0.25$.
Another formulation of the geometric distribution is sometimes used. Let $Y$ be the number of failures before the first success. Then $P(Y = y) = p(1-p)^y$, $y = 0, 1, 2, 3, \ldots$ with mean $\mu = \frac{1-p}{p}$. $Y = X - 1$, where $X$ is defined as above.
Question
Determine the variance for this formulation.
Solution
Since $Y = X - 1$:
$$\text{var}(Y) = \text{var}(X) = \frac{1-p}{p^2}$$
Subtracting a constant from a random variable does not change the spread of the distribution.
The R code for simulating values and calculating probabilities and quantiles from the geometric distribution is similar to the R code used for the binomial distribution using the R functions rgeom, dgeom, pgeom and qgeom.
For example:
dgeom(10, 0.3)
calculates the probability $P(Y = 10)$ for $p = 0.3$.
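Because dgeom works with Y, the number of failures before the first success, probabilities for the trial-number formulation X are obtained by evaluating it at x − 1. As a sketch (variable names are illustrative), the two worked examples from this section can be reproduced as follows:

```r
# Phone-call example: first sale on the 3rd call, p = 1/4
# X = 3 trials corresponds to Y = 2 failures before the first success
p_sale <- dgeom(3 - 1, 1/4)
p_sale

# Fourth child is the first son, p = 1/2: Y = 3 failures (daughters) first
p_son <- dgeom(4 - 1, 1/2)
p_son

# Mean and variance of Y and of X = Y + 1 for p = 1/4, from the formulae
p <- 1/4
c(mean_Y = (1 - p) / p, mean_X = 1 / p, var_both = (1 - p) / p^2)
```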
1.5 Negative binomial distribution
This is a generalisation of the geometric distribution.
The random variable $X$ is the number of the trial on which the $k$th success occurs, where $k$ is a positive integer.
For example, in a telesales company, $X$ might be the number of phone calls required to make the fifth sale.
Distribution: $P(X = x) = \binom{x-1}{k-1} p^k (1-p)^{x-k}$, $x = k, k+1, \ldots$; $0 < p < 1$.
We say that $X$ has a Type 1 negative binomial $(k, p)$ distribution.
The probabilities satisfy the recurrence relationship:
$$P(X = x) = \frac{x-1}{x-k}(1-p)\,P(X = x - 1)$$
Note that in applying this model, the value of $k$ is known.
Moments: $\mu = \frac{k}{p}$ and $\sigma^2 = \frac{k(1-p)}{p^2}$.
Note: The mean and variance are just $k$ times those for the geometric ($p$) variable, which is itself a special case of this random variable (with $k = 1$). Further, the negative binomial variable can be expressed as the sum of $k$ geometric variables (the number of trials to the first success, plus the number of additional trials to the second success, plus ... to the $(k-1)$th success, plus the number of additional trials to the $k$th success).
Question
If the probability that a person will believe a rumour about a scandal in politics is 0.8, calculate the probability that the ninth person to hear the rumour will be the fourth person to believe it.
Solution
Let $X$ be the position of the fourth person who believes it. Then $p = 0.8$, $x = 9$ and $k = 4$, and we have:
$$P(X = 9) = \binom{8}{3} 0.8^4 \times 0.2^5 = 0.00734$$
Another formulation of the negative binomial distribution is sometimes used.
Let $Y$ be the number of failures before the $k$th success. Then $P(Y = y) = \binom{k+y-1}{y} p^k (1-p)^y$, $y = 0, 1, 2, 3, \ldots$, with mean $\mu = \frac{k(1-p)}{p}$.
$Y = X - k$, where $X$ is defined as above.
This formulation is called the Type 2 negative binomial distribution and can be found on page 9 of the Tables. It should be noted that in the Tables the combinatorial factor has been rewritten in terms of the gamma function (defined later in this chapter).
The previous formulation is known as the Type 1 negative binomial distribution. The formulae for this version are given on page 8 of the Tables.
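The link $Y = X - k$ between the two formulations can be illustrated in R, whose dnbinom function works with the Type 2 (number of failures) version. The sketch below, with illustrative variable names, evaluates the rumour question both from the Type 1 formula and via dnbinom at $y = x - k$.

```r
# Rumour question: ninth person to hear is the fourth to believe, p = 0.8
k <- 4
p <- 0.8
x <- 9

# Type 1 formula evaluated directly
p_type1 <- choose(x - 1, k - 1) * p^k * (1 - p)^(x - k)

# Type 2 formulation: y = x - k failures before the k-th success
p_type2 <- dnbinom(x - k, size = k, prob = p)

round(c(type1 = p_type1, type2 = p_type2), 5)
```

Both values should equal the 0.00734 found in the solution above.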
The R code for simulating values and calculating probabilities and quantiles from the negative binomial distribution is similar to the R code used for the binomial distribution using the R functions rnbinom, dnbinom, pnbinom and qnbinom.
For example:
dnbinom(15, 10, 0.3)
calculates the probability $P(Y = 15) = 0.0366544$ for $p = 0.3$ and $k = 10$.
By default, R uses the Type 2 version of the negative binomial distribution.
1.6 Hypergeometric distribution
This is the finite population equivalent of the binomial distribution, in the following sense. Suppose objects are selected at random, one after another, without replacement, from a finite population consisting of $k$ successes and $N - k$ failures. The trials are not independent, since the result of one trial (the selection of a success or a failure) affects the make-up of the population from which the next selection is made.
Random variable $X$: is the number of successes in a sample of size $n$ from a population of size $N$ that has $k$ successes and $N - k$ failures.
Distribution: $P(X = x) = \dfrac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}$, $x = 0, 1, 2, 3, \ldots$
Moments: $\mu = \frac{nk}{N}$, $\sigma^2 = \frac{nk(N-k)(N-n)}{N^2(N-1)}$.
(The details of the derivation of the mean and variance of the number of successes are not required by the syllabus).
Note that the mean is given by $\mu = \frac{nk}{N}$, which parallels the $\mu = np$ result for the binomial distribution, the initial proportion of successes here being $\frac{k}{N}$.
In the above context, the binomial is the model appropriate to selecting with replacement, which is equivalent to selecting from an infinite population $N \to \infty$ for which $P(\text{success}) = p = \frac{k}{N}$ is kept fixed.
Hence, the binomial, with $p = \frac{k}{N}$, provides a good approximation to the hypergeometric when $N$ is large compared to $n$.
The hypergeometric distribution is used in the grouping of signs test in Subject CS2.
The R code for simulating values and calculating probabilities and quantiles from the hypergeometric distribution is similar to the R code used for other distributions using the R functions rhyper, dhyper, phyper and qhyper.
For example:
rhyper(20, 15, 10, 5)
simulates 20 values from samples of size 5 from a population in which $k = 15$ and $N - k = 10$.
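The quality of the binomial approximation as $N$ grows can be explored numerically. The following sketch is illustrative only: the population sizes and the fixed proportion $k/N = 0.6$ are arbitrary choices, and the variable names are not from the Core Reading. It compares dhyper with dbinom for a sample of size 5, and also checks the mean formula $nk/N$ for the population used in the rhyper example.

```r
# Compare hypergeometric probabilities with the binomial approximation (p = k/N)
n <- 5
x <- 0:n
N_values <- c(25, 250, 2500)              # illustrative population sizes
max_diff <- numeric(length(N_values))

for (i in seq_along(N_values)) {
  N <- N_values[i]
  k <- 3 * N / 5                          # keep the proportion k/N fixed at 0.6
  exact  <- dhyper(x, k, N - k, n)        # arguments: x, successes, failures, sample size
  approx <- dbinom(x, n, k / N)
  max_diff[i] <- max(abs(exact - approx)) # largest error over x = 0, ..., n
}
data.frame(N = N_values, max_diff = max_diff)

# Mean computed from the probability function for k = 15, N - k = 10, n = 5
hyper_mean <- sum(x * dhyper(x, 15, 10, n))
c(from_pf = hyper_mean, formula = n * 15 / 25)
```

The maximum difference should shrink as $N$ increases relative to $n$.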
Question
Among the 58 people applying for a job, only 30 have a particular qualification. If 5 of the group are randomly selected for a survey about the job application procedure, determine the probability that none of the group selected have the qualification.
Calculate the answer:
(i) exactly
(ii) using a binomial approximation.
Solution
(i) Let $X$ denote the number of applicants from the group of 5 that have the qualification. Using the probability function of the hypergeometric distribution with $N = 58$, $k = 30$ and $n = 5$:
$$P(X = 0) = \frac{\binom{30}{0}\binom{28}{5}}{\binom{58}{5}} = 0.0214$$
Alternatively, we could consider in turn the probabilities that each candidate is unqualified, and multiply the probabilities together:
$$\frac{28}{58} \times \frac{27}{57} \times \frac{26}{56} \times \frac{25}{55} \times \frac{24}{54} = 0.0214$$
(ii) Using the hypergeometric distribution, $N = 58$ and $k = 30$, so we will use a binomial approximation with $n = 5$ and $p = \frac{30}{58}$:
$$P(X = 0) \approx \binom{5}{0} p^0 q^5 = \left(\frac{28}{58}\right)^5 = 0.0262$$
1.7 Poisson distribution
This distribution models the number of events that occur in a specified interval of time, when the events occur one after another in time in a well-defined manner. This manner presumes that the events occur singly, at a constant rate, and that the numbers of events that occur in separate (ie non-overlapping) time intervals are independent of one another. These conditions can be described loosely by saying that the events occur randomly, at a rate of .. per .., and such events are said to occur according to a Poisson process. We will formally define this later in this chapter.
Another approach to the Poisson distribution uses arguments which appear at first sight to be unrelated to the above. Consider a sequence of $Binomial(n, p)$ distributions as $n \to \infty$ and $p \to 0$ together, such that the mean $np$ is held constant at the value $\lambda$. The limit leads to the distribution of the Poisson variable, with parameter $\lambda$. Here $\lambda = np$.
Distribution: $P(X = x) = \frac{\lambda^x}{x!} e^{-\lambda}$, $x = 0, 1, 2, 3, \ldots$; $\lambda > 0$.
The probabilities satisfy the recurrence relationship:
$$P(X = x) = \frac{\lambda}{x} P(X = x - 1)$$
If $X$ has a Poisson distribution with parameter $\lambda$, then we can write $X \sim Poi(\lambda)$.
Moments: Since the binomial mean is held constant at $\lambda$ through the limiting process, it is reasonable to suggest that the distribution of $X$ (the limiting distribution) also has mean $\lambda$. This is in fact the case.
The binomial variance is:
$$np(1-p) = n\,\frac{\lambda}{n}\left(1 - \frac{\lambda}{n}\right) = \lambda\left(1 - \frac{\lambda}{n}\right) \to \lambda \text{ as } n \to \infty$$
This suggests that $X$ has variance $\lambda$. This is in fact also the case. So $\mu = \sigma^2 = \lambda$.
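Question
Using the probability function for the Poisson distribution, prove the formulae for the mean and variance.
Hint: for the variance, consider $E[X(X-1)]$.
Solution
The mean is:
$$E(X) = \sum_x x P(X = x) = 0 \times e^{-\lambda} + 1 \times \lambda e^{-\lambda} + 2 \times \frac{\lambda^2}{2!} e^{-\lambda} + 3 \times \frac{\lambda^3}{3!} e^{-\lambda} + 4 \times \frac{\lambda^4}{4!} e^{-\lambda} + \cdots$$
$$= \lambda e^{-\lambda}\left(1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \cdots\right)$$
Since $e^{\lambda} = 1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \cdots$, we obtain:
$$E(X) = \lambda e^{-\lambda} e^{\lambda} = \lambda$$
For the variance we need to work out $E(X^2)$. However, the easiest way to work out the variance is actually to consider $E[X(X-1)]$:
$$E[X(X-1)] = \sum_x x(x-1) P(X = x) = 2 \times 1 \times \frac{\lambda^2}{2!} e^{-\lambda} + 3 \times 2 \times \frac{\lambda^3}{3!} e^{-\lambda} + 4 \times 3 \times \frac{\lambda^4}{4!} e^{-\lambda} + \cdots$$
$$= \lambda^2 e^{-\lambda}\left(1 + \lambda + \frac{\lambda^2}{2!} + \cdots\right) = \lambda^2 e^{-\lambda} e^{\lambda} = \lambda^2$$
Now:
$$E[X(X-1)] = E(X^2) - E(X) = \lambda^2 \;\Rightarrow\; E(X^2) = \lambda^2 + E(X) = \lambda^2 + \lambda$$
So:
$$\text{var}(X) = E(X^2) - [E(X)]^2 = \lambda^2 + \lambda - \lambda^2 = \lambda$$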
We can calculate Poisson probabilities in the usual way, using the probability function or the cumulative probabilities given in the Tables.
Question
If goals are scored randomly in a game of football at a constant rate of three per match, calculate the probability that more than 5 goals are scored in a match.
Solution
The number of goals in a match can be modelled as a Poisson distribution with mean $\lambda = 3$.
$$P(X > 5) = 1 - P(X \le 5)$$
We can use the recurrence relationship given:
$P(X = 0) = e^{-3} = 0.0498$
$P(X = 1) = \frac{3}{1} \times 0.0498 = 0.1494$
$P(X = 2) = \frac{3}{2} \times 0.1494 = 0.2240$
$P(X = 3) = \frac{3}{3} \times 0.2240 = 0.2240$
$P(X = 4) = \frac{3}{4} \times 0.2240 = 0.1680$
$P(X = 5) = \frac{3}{5} \times 0.1680 = 0.1008$
So we have $P(X > 5) = 1 - 0.9161 = 0.0839$.
Alternatively, we could obtain this directly using the cumulative Poisson probabilities given on page 176 of the Tables. For $Poi(3)$, the figure for $x = 5$ is 0.91608, and $1 - 0.91608 = 0.08392$.
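The recurrence calculation above can be mirrored in R and compared with the built-in Poisson functions. This is a sketch with illustrative variable names; `probs[x + 1]` holds $P(X = x)$ because R vectors are indexed from 1.

```r
# Goals question: X ~ Poi(3), find P(X > 5) using the recurrence relationship
lambda <- 3
probs <- numeric(6)                 # will hold P(X = 0), ..., P(X = 5)
probs[1] <- exp(-lambda)            # P(X = 0)
for (x in 1:5) {
  probs[x + 1] <- lambda / x * probs[x]   # P(X = x) = (lambda / x) * P(X = x - 1)
}
round(probs, 4)

p_recurrence <- 1 - sum(probs)      # P(X > 5) from the recurrence
p_builtin    <- 1 - ppois(5, lambda)  # P(X > 5) from the cumulative distribution function
round(c(p_recurrence, p_builtin), 5)
```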
The Poisson distribution provides a very good approximation to the binomial when $n$ is large and $p$ is small; typical applications have $n = 100$ or more and $p = 0.05$ or less. The approximation depends only on the product $\lambda$ $(= np)$; the individual values of $n$ and $p$ are irrelevant. So, for example, the value of $P(X = x)$ in the case $n = 200$ and $p = 0.02$ is effectively the same as the value of $P(X = x)$ in the case $n = 400$ and $p = 0.01$. When dealing with large numbers of opportunities for the occurrence of rare events (under binomial assumptions), the distribution of the number that occurs depends only on the expected number.
We will look at other approximations in Chapter 6.
Question
If each of the 55 million people in the UK independently has probability $1 \times 10^{-8}$ of being killed by a falling meteorite in a given year, use an approximation to calculate the probability of exactly 2 such deaths occurring in a given year.
Solution
If $X$ is the number of people killed by a meteorite in a year then $X$ follows the binomial distribution with $n = 55{,}000{,}000$ and $p = 1 \times 10^{-8}$. We can approximate this by using the Poisson distribution with:
$$\lambda = np = 55{,}000{,}000 \times 1 \times 10^{-8} = 0.55$$
Hence:
$$P(X = 2) \approx \frac{0.55^2}{2!} e^{-0.55} = 0.0873$$
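The closeness of the approximation can be seen by computing the exact binomial probability alongside the Poisson one. The sketch below (illustrative variable names) does this for the meteorite question, and then tabulates the two binomial cases with $np = 4$ mentioned earlier against the $Poi(4)$ probabilities.

```r
# Meteorite question: exact binomial versus Poisson approximation
n <- 55e6
p <- 1e-8
lambda <- n * p                      # 0.55
p_binom <- dbinom(2, n, p)           # exact binomial probability
p_pois  <- dpois(2, lambda)          # Poisson approximation
c(binomial = p_binom, poisson = p_pois)

# Bin(200, 0.02) and Bin(400, 0.01) both have np = 4: compare with Poi(4)
x <- 0:10
comparison <- cbind(x,
                    bin200 = dbinom(x, 200, 0.02),
                    bin400 = dbinom(x, 400, 0.01),
                    poi4   = dpois(x, 4))
round(comparison, 4)
```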
The Poisson distribution is often used to model the number of claims that an insurance company receives per unit of time. It is also used to model the number of accidents along a particular stretch of road.
When events are described as occurring as a Poisson process with rate $\lambda$ or randomly, at a rate of $\lambda$ per unit time then the number of events that occur in a time period of length $t$ has a Poisson distribution with mean $\lambda t$.
The Poisson process is discussed in more detail in Section 3.
The R code for simulating values and calculating probabilities and quantiles from the Poisson distribution is similar to the R code used for other distributions using the R functions rpois, dpois, ppois and qpois.
For example, to calculate $P(X \le 5) = 0.9432683$ for $\lambda = 2.7$, use the R code:
ppois(5, 2.7)
Question
The number of home insurance claims a company receives in a month is distributed as a Poisson random variable with mean 2. Calculate the probability that the company receives exactly 30 claims in a year. Treat all months as if they are of equal length.
Solution
Let $X$ denote the number of home insurance claims received in a year. Since the number of claims in a month follows the $Poi(2)$ distribution, $X \sim Poi(24)$. The required probability is:
$$P(X = 30) = \frac{24^{30}}{30!} e^{-24} = 0.0363$$
Alternatively, we could use the cumulative Poisson probabilities given on page 184 of the Tables:
$$P(X = 30) = P(X \le 30) - P(X \le 29) = 0.90415 - 0.86788 = 0.0363$$
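Both routes to this answer can be reproduced in R. The sketch below uses illustrative variable names; the scaling of the monthly mean to an annual mean follows the solution above.

```r
# Home insurance claims: monthly mean 2, so the annual count is Poi(12 * 2) = Poi(24)
lambda_year <- 12 * 2

p_direct <- dpois(30, lambda_year)                            # probability function
p_cumul  <- ppois(30, lambda_year) - ppois(29, lambda_year)   # difference of CDF values
round(c(p_direct, p_cumul), 4)
```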
2 Important continuous distributions
2.1 Uniform distribution
values between two
specified
Probability density function:
numbers
a and
1
fxX()=
x
a
say.
<<
- a
X?
(, ) is often written asshorthand for the random variable X has a continuous uniform
U a
distribution overthe interval
a
Moments:
=
s
2
+
, by symmetry,
2
=
(),a
.
the
mid-point
of the range
of possible
values
-ba () 2
12
Question
Prove the variance result, by considering $E[(X - \mu)^2]$ directly.
Solution
The variance is:
$$\text{var}[X] = E[(X - \mu)^2] = \int_\alpha^\beta \left(x - \tfrac{1}{2}(\alpha + \beta)\right)^2 f(x)\,dx = \frac{1}{\beta - \alpha}\int_\alpha^\beta \left(x - \tfrac{1}{2}(\alpha + \beta)\right)^2 dx$$
$$= \frac{1}{\beta - \alpha}\left[\tfrac{1}{3}\left(x - \tfrac{1}{2}(\alpha + \beta)\right)^3\right]_\alpha^\beta = \frac{1}{3(\beta - \alpha)}\left[\left(\tfrac{1}{2}(\beta - \alpha)\right)^3 - \left(-\tfrac{1}{2}(\beta - \alpha)\right)^3\right]$$
$$= \frac{1}{3(\beta - \alpha)} \times \tfrac{1}{4}(\beta - \alpha)^3 = \tfrac{1}{12}(\beta - \alpha)^2$$
In this model, the total probability of 1 is spread evenly between the two limits, so that subintervals of the same length have the same probability.
Question
If $Y \sim U(50, 150)$, calculate $P(Y > 74)$ and $P(50 < Y < 126)$.
Solution
The PDF is given by $f(y) = \frac{1}{150 - 50} = \frac{1}{100}$ for $50 < y < 150$. This gives:
$$P(Y > 74) = \frac{76}{100} = 0.76$$
Similarly, $P(50 < Y < 126) = 0.76$. This probability is the same since the two subintervals have the same length.
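These two probabilities can be checked in R with the uniform CDF function punif. The sketch below uses illustrative variable names.

```r
# Y ~ U(50, 150)
p_upper <- 1 - punif(74, min = 50, max = 150)                    # P(Y > 74)
p_inner <- punif(126, min = 50, max = 150) -
           punif(50,  min = 50, max = 150)                       # P(50 < Y < 126)
c(p_upper, p_inner)
```

Both subintervals have length 76, so both probabilities should equal 0.76.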
The R code for simulating 100 values from a $U(0, 3)$ distribution is:
runif(100, min=0, max=3)
The PDF is obtained by dunif(x, min=0, max=3) and is useful for graphing.
To calculate probabilities for a continuous distribution we use the CDF which is obtained by punif.
For example, to calculate $P(X \le 1.8) = 0.6$ for $U(0, 3)$ use the R code:
punif(1.8, min=0, max=3)
Similarly, the quantiles can be calculated with qunif.
Although there are not many real-life examples of the continuous uniform distribution, it is nevertheless an important distribution. A sample of random numbers from $U(0, 1)$ is often used to generate random samples from other distributions. We will do this in Section 4.
2.2 Gamma (including exponential and chi-square) distributions
The gamma family of distributions has 2 positive parameters and is a versatile family. The PDF can take different shapes depending on the values of the parameters. The range of the variable is $\{x : x > 0\}$.
The parameter $\alpha$ changes the shape of the graph of the PDF, and the parameter $\lambda$ changes the $x$ scale. The gamma distribution may be written in shorthand as $Gamma(\alpha, \lambda)$, or $Ga(\alpha, \lambda)$.
First note that the gamma function $\Gamma(\alpha)$ is defined for $\alpha > 0$ as follows:
$$\Gamma(\alpha) = \int_0^\infty t^{\alpha - 1} e^{-t}\,dt$$
Note in particular that $\Gamma(1) = 1$, $\Gamma(\alpha) = (\alpha - 1)\Gamma(\alpha - 1)$ for $\alpha > 1$ (ie $\Gamma(\alpha) = (\alpha - 1)!$ if $\alpha$ is an integer), and $\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}$.
These results are given on page 5 of the Tables and are all that is required to answer examination questions.
The R code for the gamma function $\Gamma(n)$ is gamma(n).
The PDF of the gamma distribution with parameters $\alpha$ and $\lambda$ is defined by:
$$f_X(x) = \frac{\lambda^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\lambda x} \quad \text{for } x > 0$$
Moments: $\mu = \frac{\alpha}{\lambda}$, $\sigma^2 = \frac{\alpha}{\lambda^2}$.
Question
Prove the formulae given for the mean and variance.
Solution
Remembering that the formulae for mean and variance are EX
()
=?xf (x ) dx , and
x
var Xx)=-E? 22(
f ( x ) dx
[ (X )] , using appropriate
limits, we have:
x
8
a
EX=?G() x
?
()
a?
e-
x
dx
a
0
a
Using integration by parts withux
??
EX
()
??
=-
8
a?
=
?a
11
e
aa
??
GG()
a?
??
??0
xa?
and
a
G()
a
8
xx
dv
dx
8
- ? ()
0
a
ax
a
x
= e -?
1
, we obtain:
- ?-e
dx
?
a
?G() xe
a?x --
1
dx
0
The integral is the integral
EX
()=
?
=
of the PDF over the whole range which is 1, giving:
a
?
The Actuarial
Education
Company
IFE: 2022 Examination
Page 20
CS1-02:
Probability
distributions
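For the variance, we need $E(X^2)$:
$$E(X^2) = \int_0^\infty \frac{\lambda^\alpha}{\Gamma(\alpha)}\, x^{\alpha+1} e^{-\lambda x}\,dx$$
Using integration by parts with $u = x^{\alpha+1}$, we obtain:
$$E(X^2) = \left[-\frac{\lambda^{\alpha-1}}{\Gamma(\alpha)}\, x^{\alpha+1} e^{-\lambda x}\right]_0^\infty + \int_0^\infty \frac{\lambda^{\alpha-1}}{\Gamma(\alpha)}\, (\alpha+1) x^{\alpha} e^{-\lambda x}\,dx = \frac{\alpha+1}{\lambda}\int_0^\infty \frac{\lambda^\alpha}{\Gamma(\alpha)}\, x^{\alpha} e^{-\lambda x}\,dx$$
The integral is $E(X)$, so we have $E(X^2) = \dfrac{\alpha+1}{\lambda} \times \dfrac{\alpha}{\lambda}$, hence:
$$\text{var}(X) = \frac{\alpha(\alpha+1)}{\lambda^2} - \left(\frac{\alpha}{\lambda}\right)^2 = \frac{\alpha}{\lambda^2}$$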
We shall see in a later chapter that these results can be proved far more easily using moment generating functions.
We can calculate gamma probabilities in simple cases by integrating the PDF.
Question
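If $X \sim Gamma(2, 1.5)$, calculate $P(X > 4)$.
Solution
Using integration by parts:
$$P(X > 4) = \int_4^\infty \frac{1.5^2}{\Gamma(2)}\, x e^{-1.5x}\,dx$$
$$= 2.25\left\{\left[-\frac{1}{1.5}\, x e^{-1.5x}\right]_4^\infty + \int_4^\infty \frac{1}{1.5}\, e^{-1.5x}\,dx\right\}$$
$$= 2.25\left\{\frac{4}{1.5}\, e^{-6} + \left[-\frac{1}{1.5^2}\, e^{-1.5x}\right]_4^\infty\right\}$$
$$= 2.25\left\{\frac{4}{1.5}\, e^{-6} + \frac{1}{2.25}\, e^{-6}\right\}$$
$$= 0.0174$$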
We will see a quicker way to do this question later in the chapter.
The R code for simulating a random sample of 100 values from the gamma distribution with $\alpha = 2$ and $\lambda = 0.25$ is:
rgamma(100, 2, 0.25)
Similarly, the PDF, cumulative distribution function (CDF) and quantiles can be obtained using the R functions dgamma, pgamma and qgamma.
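As a rough illustration of these functions, the sketch below recomputes $P(X > 4)$ for $Gamma(2, 1.5)$ with pgamma and compares it with the closed form $7e^{-6}$ that the integration by parts above simplifies to. The variable names are illustrative only; note that R labels $\alpha$ as `shape` and $\lambda$ as `rate`.

```r
# Gamma(alpha = 2, lambda = 1.5); R calls alpha the "shape" and lambda the "rate"
alpha  <- 2
lambda <- 1.5

# P(X > 4) taken directly from the upper tail of the CDF
p_upper <- pgamma(4, shape = alpha, rate = lambda, lower.tail = FALSE)

# Closed form of the integration-by-parts answer: 2.25 * (4/1.5 + 1/2.25) * exp(-6)
p_exact <- 7 * exp(-6)

# Median (a quantile) and the density evaluated at the median
med  <- qgamma(0.5, shape = alpha, rate = lambda)
dens <- dgamma(med, shape = alpha, rate = lambda)

print(c(p_upper = p_upper, p_exact = p_exact, median = med, density = dens))
```

Using `lower.tail = FALSE` avoids computing `1 - pgamma(...)` by hand, which is slightly more accurate for small tail probabilities.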
Special case 1: exponential distribution
Gamma with $\alpha = 1$.
PDF: $f_X(x) = \lambda e^{-\lambda x}$, $x > 0$
$X \sim Exp(\lambda)$ is often written as shorthand for 'the random variable X has an exponential distribution with parameter $\lambda$'.
Moments: $\mu = \dfrac{1}{\lambda}$, $\sigma^2 = \dfrac{1}{\lambda^2}$
CDF: $F_X(x) = \displaystyle\int_0^x \lambda e^{-\lambda t}\,dt = 1 - e^{-\lambda x}$
For many of the continuous distributions, the CDF is given in the Tables.
Question
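Determine the median of the $Exp(\lambda)$ distribution. (The median is the value of $m$ such that $P(X \le m) = \tfrac{1}{2}$.)
Solution
Since $P(X \le m) = F_X(m) = 1 - e^{-\lambda m}$, we have:
$$0.5 = 1 - e^{-\lambda m} \;\Rightarrow\; 0.5 = e^{-\lambda m} \;\Rightarrow\; -\lambda m = \ln 0.5 \;\Rightarrow\; m = -\frac{1}{\lambda}\ln 0.5$$
Since $\ln 0.5 = -\ln 2$, we can say $m = \dfrac{\ln 2}{\lambda}$.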
The exponential distribution is used as a simple model for the lifetimes of certain types of equipment. Very importantly, it also gives the distribution of the waiting time, T, from one event to the next in a Poisson process with rate $\lambda$. This is proved in Section 3 of this chapter.
The R code for simulating values and obtaining the PDF, CDF and quantiles from the exponential distribution is similar to the R code used for other continuous distributions, using the R functions rexp, dexp, pexp and qexp.
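The median result $m = \ln 2 / \lambda$ can be checked with these functions. The following is a minimal sketch; the rate of 3 is an arbitrary illustrative value, not one taken from the text.

```r
lambda <- 3   # illustrative rate (an assumption for this sketch)

# Median from the formula derived above
m_formula <- log(2) / lambda

# Median as the 50% quantile of Exp(lambda)
m_qexp <- qexp(0.5, rate = lambda)

# The CDF evaluated at the median should equal one half
cdf_at_m <- pexp(m_formula, rate = lambda)

print(c(m_formula = m_formula, m_qexp = m_qexp, cdf_at_m = cdf_at_m))
```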
Question
Claims to a general insurance company's 24-hour call centre occur according to a Poisson process with a rate of 3 per hour.
Calculate the probability that the next call arrives after more than 1/2 hour.
Solution
The number of claims, X, in an hour can be modelled as a Poisson distribution with mean $\lambda = 3$. Hence, the waiting time, T, between claims can be modelled as an exponential distribution with $\lambda = 3$. So:
$$P(T > 1/2) = \int_{1/2}^\infty 3e^{-3x}\,dx = \left[-e^{-3x}\right]_{1/2}^\infty = 0 - (-e^{-1.5}) = 0.2231$$
In fact the time from any specified starting point (not necessarily the time at which the last event occurred) to the next event occurring has this exponential distribution. This property can also be expressed as the 'memoryless' property.
Recall that the geometric distribution in Section 1.4 has the memoryless property. For the exponential distribution we can also show that:
$$P(X > x + n \mid X > n) = P(X > x)$$
For example, the probability that we wait at least a further 10 minutes given that we have already waited 20 minutes is equal to the unconditional probability of waiting at least 10 minutes.
Question
Prove that if $X \sim Exp(\lambda)$, then $P(X > x + n \mid X > n) = P(X > x)$.
Solution
$$P(X > x + n \mid X > n) = \frac{P(X > x + n,\, X > n)}{P(X > n)} = \frac{P(X > x + n)}{P(X > n)} = \frac{e^{-\lambda(x+n)}}{e^{-\lambda n}} = e^{-\lambda x} = P(X > x)$$
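The memoryless property can also be confirmed numerically. The sketch below uses the 10-minute and 20-minute waits from the example above with an assumed rate of 0.1 per minute; the rate and the helper name `surv` are illustrative choices, not values from the text.

```r
lambda <- 0.1   # assumed rate per minute (illustrative)
x <- 10         # further wait, in minutes
n <- 20         # time already waited, in minutes

# Survival function P(X > t) for the exponential distribution
surv <- function(t) pexp(t, rate = lambda, lower.tail = FALSE)

# Conditional probability P(X > x + n | X > n)
p_conditional <- surv(x + n) / surv(n)

# Unconditional probability P(X > x)
p_unconditional <- surv(x)

print(c(conditional = p_conditional, unconditional = p_unconditional))
```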
Note: A gamma variable with parameters $\alpha = k$ (a positive integer) and $\lambda$ can be expressed as the sum of k exponential variables, each with parameter $\lambda$. This gamma distribution is in fact the model for the time from any specified starting point to the occurrence of the kth event in a Poisson process with rate $\lambda$.
The fact that a $Gamma(\alpha, \lambda)$ random variable can be thought of as the sum of $\alpha$ independent and identical $Exp(\lambda)$ random variables is important and will be used in a later chapter to prove some important results.
Special case 2: chi-square ($\chi^2_\nu$) distribution with parameter 'degrees of freedom' $\nu$
Gamma with $\alpha = \dfrac{\nu}{2}$, where $\nu$ is a positive integer, and $\lambda = \dfrac{1}{2}$.
So the PDF of the chi-square distribution is:
$$f_X(x) = \frac{(1/2)^{\nu/2}}{\Gamma(\nu/2)}\, x^{\nu/2 - 1} e^{-x/2} \quad \text{for } x > 0$$
Moments: $\mu = \nu$, $\sigma^2 = 2\nu$
Note: A $\chi^2$ variable with $\nu = 2$ is the same as an exponential variable with mean 2.
Since integrating the PDF isn't straightforward, extensive probability tables for the chi-square distribution are given in the Tables. These can be found on pages 164-166.
Another result that we will use in later work is:
If $W \sim Gamma(\alpha, \lambda)$, then $2\lambda W$ has a $\chi^2_{2\alpha}$ distribution (ie a chi-square distribution with $2\alpha$ degrees of freedom) provided that $2\alpha$ is an integer.
This result is also in the Tables (on page 12).
We can prove this result using moment generating functions, which we will meet in a later chapter.
This is an important result as it is the only practical way we can calculate probabilities for a gamma distribution in an exam. We can look up probabilities associated with the $\chi^2$ distribution, for certain degrees of freedom, in the Tables.
The R code for simulating values and obtaining the PDF, CDF and quantiles from the chi-square distribution is similar to the R code used for other continuous distributions, using the R functions rchisq, dchisq, pchisq and qchisq.
Question
If the random variable X follows the $\chi^2_5$ distribution, calculate:
(a) $P(X > 6.5)$
(b) $P(X < 11.8)$.
Solution
Using the $\chi^2$ probabilities given on pages 164-166 of the Tables, we obtain:
(a) $P(X > 6.5) = 1 - 0.7394 = 0.2606$
(b) Here we need to interpolate between the two closest probabilities given, $P(X < 11.5) = 0.9577$ and $P(X < 12) = 0.9652$, so:
$$P(X < 11.8) \approx 0.9577 + \frac{11.8 - 11.5}{12 - 11.5}\,(0.9652 - 0.9577) = 0.9622$$
Alternatively, we could use interpolation on the $\chi^2$ percentage points tables given on pages 168-169 of the Tables. These give the approximate answers of 0.2644 and 0.9604.
We now repeat an earlier question using the $\chi^2$ result.
Question
If $X \sim Gamma(2, 1.5)$, calculate $P(X > 4)$ by using the chi-square tables.
Solution
Since $X \sim Gamma(2, 1.5)$, we know that $2\lambda X = 3X \sim \chi^2_4$. So:
$$P(X > 4) = P(3X > 12) = P(\chi^2_4 > 12) = 1 - 0.9826 = 0.0174$$
using the $\chi^2$ probability tables given on page 165 of the Tables.
This gives us the same answer as we obtained earlier, but without the integration by parts.
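The relationship between the gamma and chi-square distributions can be sketched in R as follows. The helper `gamma_tail_via_chisq` is a hypothetical name introduced for this illustration; it simply applies the result $2\lambda W \sim \chi^2_{2\alpha}$ and is compared against pgamma.

```r
# P(W > w) for W ~ Gamma(alpha, lambda), computed through the chi-square
# distribution of 2 * lambda * W (hypothetical helper, for illustration only)
gamma_tail_via_chisq <- function(w, alpha, lambda) {
  # the result requires 2 * alpha to be an integer
  stopifnot(2 * alpha == round(2 * alpha))
  pchisq(2 * lambda * w, df = 2 * alpha, lower.tail = FALSE)
}

# The question above: X ~ Gamma(2, 1.5), P(X > 4) = P(chi-square(4) > 12)
via_chisq <- gamma_tail_via_chisq(4, alpha = 2, lambda = 1.5)

# Direct calculation from the gamma CDF for comparison
direct <- pgamma(4, shape = 2, rate = 1.5, lower.tail = FALSE)

print(c(via_chisq = via_chisq, direct = direct))
```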
2.3 Beta distribution
This is another versatile family of distributions with two positive parameters. The range of the variable is $\{x : 0 < x < 1\}$.
First note that the beta function $B(\alpha, \beta)$ is defined by:
$$B(\alpha, \beta) = \int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\,dx$$
The relationship between beta functions and gamma functions is:
$$B(\alpha, \beta) = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)}$$
The R code for the beta function $B(a, b)$ is beta(a,b).
The PDF of the beta distribution is defined by:
$$f_X(x) = \frac{1}{B(\alpha, \beta)}\, x^{\alpha-1}(1-x)^{\beta-1} \quad \text{for } 0 < x < 1$$
Moments: $\mu = \dfrac{\alpha}{\alpha+\beta}$, $\sigma^2 = \dfrac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$
The (continuous) uniform distribution on (0,1) is a special case (with $\alpha = \beta = 1$).
The beta distribution is a useful distribution because it can be rescaled and shifted to create a wide range of shapes, from straight lines to curves, and from symmetrical distributions to skewed distributions. Since the random variable can only take values between 0 and 1, it is often used to model proportions, such as the proportion of a batch that is defective or the percentage of claims that are over 1,000.
Question
The random variable X has PDF $f_X(x) = kx^3(1-x)^2$, $0 < x < 1$, where k is a constant. Determine the value of k.
Solution
Comparing the PDF directly with that of the beta distribution, we can see that $\alpha = 4$ and $\beta = 3$. So:
$$k = \frac{\Gamma(7)}{\Gamma(4)\,\Gamma(3)} = 60$$
k can also be found directly from $\int_0^1 kx^3(1-x)^2\,dx = 1$ by multiplying out the bracket first and then integrating.
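Both routes to the constant can be checked quickly in R. This is only a sketch: `beta_par` is an illustrative variable name (chosen to avoid masking the built-in `beta` function), and the integral is evaluated numerically rather than by multiplying out the bracket.

```r
alpha    <- 4
beta_par <- 3   # named beta_par so that the beta() function is not masked

# k from the gamma-function relationship
k_gamma <- gamma(alpha + beta_par) / (gamma(alpha) * gamma(beta_par))

# k as the reciprocal of the beta function
k_beta <- 1 / beta(alpha, beta_par)

# With this k, the PDF k * x^3 * (1 - x)^2 should integrate to 1 over (0, 1)
area <- integrate(function(x) k_beta * x^3 * (1 - x)^2, lower = 0, upper = 1)$value

print(c(k_gamma = k_gamma, k_beta = k_beta, area = area))
```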
The R code for simulating values and obtaining the PDF, CDF and quantiles from the beta distribution is similar to the R code used for other continuous distributions, using the R functions rbeta, dbeta, pbeta and qbeta.
2.4 Normal distribution
This distribution, with its symmetrical 'bell-shaped' density curve, is of fundamental importance in both statistical theory and practice. Its roles include the following:
(i) it is a good model for the distribution of measurements that occur in practice in a wide variety of different situations, for example heights, weights, IQ scores or exam scores.
(ii) it provides good approximations to various other distributions; in particular it is a limiting form of the binomial $(n, p)$. It is also used to approximate the Poisson distribution. Both of these approximations are covered in Chapter 6.
(iii) it provides a model for the sampling distributions of various statistics (see Chapter 7).
(iv) much of large sample statistical inference is based on it, and some procedures require an assumption that a variable is normally distributed. We will look at this in Chapters 9 and 10.
(v) it is a building block for many other distributions.
The distribution has two parameters, which can conveniently be expressed directly as the mean $\mu$ and the standard deviation $\sigma$ of the distribution. The distribution is symmetrical about $\mu$.
The notation used for the Normal distribution is $X \sim N(\mu, \sigma^2)$.
The PDF of the normal distribution is defined by:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right\} \quad \text{for } -\infty < x < \infty$$
A linear function of a normal variable is also a normal variable, ie if X is normally distributed, so is $Y = aX + b$.
This result can be proved using moment generating functions, which we will meet in the next chapter.
It is not possible to find an explicit expression for $F_X(x) = P(X \le x)$, so tables have to be used. These are provided for the distribution of $Z = \dfrac{X - \mu}{\sigma}$, which is the standard normal variable: it has mean 0 and standard deviation 1. The distribution is symmetrical about 0.
We can also prove this result using moment generating functions.
The x-values $\mu$, $\mu + \sigma$, $\mu + 2\sigma$, $\mu + 3\sigma$ correspond to the z-values 0, 1, 2, 3 respectively, and so on. The z-value measures how many standard deviations the corresponding x value is above or below the mean. For example the value $x = 30$ from a normal distribution with mean 20 and standard deviation 5 has z-value +2 (30 is 2 standard deviations above the mean of 20).
The calculation of a probability for a normal variable is always done in the same way: transform to the standard normal via $z = \dfrac{x - \mu}{\sigma}$ and look up in the tables.
Standard normal probabilities are given on pages 160-161 of the Tables.
The probabilities in the table are left hand probabilities, in other words they give $P(Z < z)$, the cumulative distribution function of Z. We sometimes use $\Phi(z)$ for the CDF of Z.
Since Z is symmetrical about zero, it follows that:
$$P(Z < -z) = P(Z > z) = 1 - P(Z < z)$$
$$P(Z > -z) = P(Z < z)$$
Question
If $X \sim N(25, 36)$, calculate:
(i) $P(X < 28)$
(ii) $P(X > 30)$
(iii) $P(X < 20)$
(iv) $P(|X - 25| < 4)$.
Solution
(i) $P(X < 28) = P\left(Z < \dfrac{28 - 25}{\sqrt{36}}\right) = P(Z < 0.5) = 0.69146$
The following answers use interpolation between tabulated values:
(ii) $P(X > 30) = P(Z > 0.833) = 1 - P(Z < 0.833) = 1 - 0.7976 = 0.2024$
(iii) $P(X < 20) = P(Z < -0.833) = 1 - P(Z < 0.833) = 1 - 0.7976 = 0.2024$
(iv) We need to simplify the expression involving the absolute value:
$$P(|X - 25| < 4) = P(-4 < X - 25 < 4) = P(21 < X < 29) = P(X < 29) - P(X < 21)$$
$$= P(Z < 0.667) - P(Z < -0.667) = P(Z < 0.667) - [1 - P(Z < 0.667)] = 0.4952$$
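These four probabilities can be reproduced with pnorm, as in the sketch below. One point to watch (and the reason for the explicit `sqrt`) is that pnorm takes the standard deviation, whereas the notation $N(25, 36)$ quotes the variance. Small differences from the answers above are expected, because the tabulated answers rely on interpolation.

```r
mu    <- 25
sigma <- sqrt(36)   # pnorm expects the standard deviation, not the variance

p_i   <- pnorm(28, mean = mu, sd = sigma)                      # P(X < 28)
p_ii  <- pnorm(30, mean = mu, sd = sigma, lower.tail = FALSE)  # P(X > 30)
p_iii <- pnorm(20, mean = mu, sd = sigma)                      # P(X < 20)
p_iv  <- pnorm(29, mean = mu, sd = sigma) -
         pnorm(21, mean = mu, sd = sigma)                      # P(21 < X < 29)

print(round(c(i = p_i, ii = p_ii, iii = p_iii, iv = p_iv), 4))
```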
The normal distribution is used in many areas of statistics and often we need to find values of the standard normal distribution connected to certain probabilities, for example the value of a such that $P(-a < Z < a) = 0.99$. Common examples of this type of calculation are now given.
95% and 99% intervals:
$P(Z < 1.96) = 0.97500$, so $P(0 < Z < 1.96) = 0.97500 - 0.5 = 0.47500$
$\Rightarrow P(-1.96 < Z < 1.96) = 2 \times 0.47500 = 0.95$
Similarly $P(-2.5758 < Z < 2.5758) = 0.99$.
So (approximately): 95% of a normal distribution is contained in the interval 1.96 standard deviations on either side of the mean, and 99% is contained in the interval 2.5758 standard deviations on either side of the mean.
Note: All but 0.3% of the distribution is contained in the interval $(\mu - 3\sigma, \mu + 3\sigma)$, the so-called '$3\sigma$ limits'. (The range of a large set of observations from a normal distribution is usually about 6 or 7 standard deviations).
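The multipliers quoted here are simply quantiles of the standard normal distribution, so they can be recovered with qnorm. The following sketch is illustrative only.

```r
# Two-sided 95% and 99% intervals leave 2.5% and 0.5% in each tail
z_95 <- qnorm(0.975)
z_99 <- qnorm(0.995)

# Probability lying outside the 3-sigma limits
outside_3_sigma <- 1 - (pnorm(3) - pnorm(-3))

print(c(z_95 = z_95, z_99 = z_99, outside_3_sigma = outside_3_sigma))
```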
Finally, we note that, if X has the standard normal distribution, then $X^2$ has the chi-squared distribution (the special case of the gamma distribution given above).
In fact $X^2 \sim \chi^2_1$ here. This result can be used to find $E(Z^2)$ and $\text{var}(Z^2)$.
The R code for simulating values and obtaining the PDF, CDF and quantiles from the normal distribution is similar to the R code used for other continuous distributions, using the R functions rnorm, dnorm, pnorm and qnorm.
2.5 Lognormal distribution
If X represents, for example, claim size and $Y = \log X$ has a normal distribution, then X is said to have a lognormal distribution.
$\log X$ here refers to natural log, or log to base e, ie $\ln X$.
If X has a lognormal distribution with parameters $\mu$ and $\sigma$, then we write $X \sim \log N(\mu, \sigma^2)$.
The PDF of the lognormal distribution is defined by:
$$f(x) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left(\frac{\log x - \mu}{\sigma}\right)^2\right\} \quad \text{for } 0 < x < \infty$$
The lower limit for x is 0 and not $-\infty$, as it is for the normal distribution. This is because $\log x$ is not defined for $x \le 0$.
Question
If $W \sim \log N(5, 6)$, calculate $P(W > 3{,}000)$.
Solution
If $W \sim \log N(5, 6)$, then $\ln W \sim N(5, 6)$. This gives:
$$P(W > 3{,}000) = P(\ln W > 8.006) = P(Z > 1.227) = 1 - 0.8901 = 0.1099$$
The mean and variance of the lognormal distribution are not $\mu$ and $\sigma^2$ but are given by $E[X] = e^{\mu + \frac{1}{2}\sigma^2}$ and $\text{var}[X] = e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right)$.
Question
If the mean of the lognormal distribution is 9.97 and the variance is 635.61, calculate the parameters $\mu$ and $\sigma^2$.
Solution
$e^{\mu + \frac{1}{2}\sigma^2} = 9.97$ and $e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right) = 635.61$, so $9.97^2\left(e^{\sigma^2} - 1\right) = 635.61$.
This can be rearranged to give $\sigma^2 = 2$.
Substituting into the equation for the mean, we get $e^{\mu + 1} = 9.97$. Taking logs gives us $\mu = 1.3$.
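The rearrangement used in this solution can be written as a small R function. This is a sketch only: `lognormal_params` is a hypothetical helper name, and it just inverts the two moment formulae given above, using $\sigma^2 = \ln(1 + v/m^2)$ and $\mu = \ln m - \tfrac{1}{2}\sigma^2$ for mean m and variance v.

```r
# Recover (mu, sigma^2) of a lognormal distribution from its mean m and
# variance v (hypothetical helper, for illustration only)
lognormal_params <- function(m, v) {
  sigma2 <- log(1 + v / m^2)    # from m^2 * (exp(sigma^2) - 1) = v
  mu     <- log(m) - sigma2 / 2 # from exp(mu + sigma^2 / 2) = m
  c(mu = mu, sigma2 = sigma2)
}

params <- lognormal_params(m = 9.97, v = 635.61)
print(round(params, 2))

# Round trip: the fitted parameters should reproduce the original mean
mean_check <- exp(params[["mu"]] + params[["sigma2"]] / 2)
```

Note that rlnorm, dlnorm, plnorm and qlnorm take `sdlog`, the square root of $\sigma^2$, rather than $\sigma^2$ itself.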
The lognormal distribution is positively skewed and is therefore a good model for the distribution of claim sizes. We also use the lognormal distribution in Subject CM2 to calculate the probabilities associated with accumulating funds.
The R code for simulating values and obtaining the PDF, CDF and quantiles from the lognormal distribution is similar to the R code used for other continuous distributions, using the R functions rlnorm, dlnorm, plnorm and qlnorm.
2.6 t distribution
If the variable X has a $\chi^2_\nu$ distribution and another independent variable Z has the standard normal distribution of the form $N(0,1)$ then the function:
$$\frac{Z}{\sqrt{X/\nu}}$$
is said to have a t distribution with parameter 'degrees of freedom' $\nu$.
The t distribution, like the normal, is symmetrical about 0.
You do not need to know the PDF of the t distribution for the exam. It is in fact given on page 163 of the Tables.
Calculating probabilities by integrating this PDF is not easy. Fortunately, we will only be expected to look up probabilities using page 163 of the Tables.
Question
Use the t tables to calculate:
(i) $P(t_{15} < 1.341)$
(ii) the value of a such that $P(t_8 > a) = 0.01$
(iii) $P(t_{24} < -0.5314)$.
Solution
From the Tables:
(i) $P(t_{15} > 1.341) = 10\%$, so $P(t_{15} < 1.341) = 90\%$.
(ii) $a = 2.896$
(iii) By symmetry: $P(t_{24} < -0.5314) = P(t_{24} > 0.5314) = 30\%$
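The same three answers can be obtained in R with pt and qt instead of the Tables. The sketch below is illustrative; the variable names are arbitrary.

```r
# (i) lower-tail probability for t with 15 degrees of freedom
p_i <- pt(1.341, df = 15)

# (ii) the point with 1% of the t(8) distribution above it
a_ii <- qt(0.01, df = 8, lower.tail = FALSE)

# (iii) lower-tail probability at a negative value; by symmetry this
# equals the upper-tail probability at +0.5314
p_iii       <- pt(-0.5314, df = 24)
p_iii_upper <- pt(0.5314, df = 24, lower.tail = FALSE)

print(round(c(i = p_i, ii = a_ii, iii = p_iii), 3))
```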
The t distribution is used to find confidence intervals and carry out hypothesis tests on the mean of a distribution. We will meet it again in Chapters 7, 9 and 10.
The R code for simulating values and obtaining the PDF, CDF and quantiles from the t distribution is similar to the R code used for other continuous distributions, using the R functions rt, dt, pt and qt.
2.7 F distribution
If two independent random variables, X and Y, have $\chi^2$ distributions with parameters $n_1$ and $n_2$ respectively, then the function:
$$\frac{X/n_1}{Y/n_2}$$
is said to have an F distribution with parameters (degrees of freedom) $n_1$ and $n_2$.
Once again, it is not necessary to know the PDF of this distribution. We find probabilities by using the F tables given on pages 170-174 of the Tables.
The F distribution is not symmetrical. Given that only upper tail probabilities are given in the Tables, we will need to know the fact that $P(F_{a,b} > c) = P\left(\dfrac{1}{F_{a,b}} < \dfrac{1}{c}\right) = P\left(F_{b,a} < \dfrac{1}{c}\right)$ to find lower tail probabilities.
This will be covered in greater detail in Chapter 7.
Question
Use the F tables to calculate:
(i) $P(F_{5,12} < 3.106)$
(ii) the value of a such that $P(F_{7,4} > a) = 0.01$.
Solution
From the Tables:
(i) $P(F_{5,12} > 3.106) = 5\%$, so $P(F_{5,12} < 3.106) = 95\%$
(ii) $a = 14.98$.
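As a sketch, the two answers and the reciprocal relationship used for lower tail probabilities can be checked with pf and qf. The value `c_val = 0.25` is an arbitrary illustrative choice, not taken from the text.

```r
# (i) lower-tail probability for F with (5, 12) degrees of freedom
p_i <- pf(3.106, df1 = 5, df2 = 12)

# (ii) the point with 1% of the F(7, 4) distribution above it
a_ii <- qf(0.01, df1 = 7, df2 = 4, lower.tail = FALSE)

# Reciprocal relationship: P(F_{a,b} < c) = P(F_{b,a} > 1/c)
c_val      <- 0.25   # illustrative value
lower_tail <- pf(c_val, df1 = 7, df2 = 4)
via_swap   <- pf(1 / c_val, df1 = 4, df2 = 7, lower.tail = FALSE)

print(round(c(i = p_i, ii = a_ii, lower_tail = lower_tail, via_swap = via_swap), 4))
```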
This distribution is used to find confidence intervals and carry out hypothesis tests on the variances of two distributions. We will meet it again in Chapters 7, 9, 10, and 12.
The R code for simulating values and obtaining the PDF, CDF and quantiles from the F distribution is similar to the R code used for other continuous distributions, using the R functions rf, df, pf and qf.
3 The Poisson process
Earlier, in Section 1.7, we met the Poisson distribution, $X \sim Poi(\lambda)$, with probability function (PF):
$$P(X = x) = \frac{\lambda^x}{x!}\, e^{-\lambda}, \quad x = 0, 1, 2, \ldots$$
This is useful for modelling the number of events (eg claims or deaths) occurring per unit time. For $X \sim Poi(\lambda)$, we have events occurring at a rate of $\lambda$ per unit time.
A Poisson process occurs when we let the time period vary. So instead of looking at the number of events occurring per unit time, we now look at the number of events occurring up to time t.
Question
An insurer receives car claims at a rate of 8 per calendar week. Write down the distribution of the number of claims received:
(i) per day
(ii) per year.
Solution
The number of car claims per week has a Poi(8) distribution, therefore the number of:
(i) car claims per day has a $Poi\left(\tfrac{8}{7}\right)$ distribution
(ii) car claims per year has a Poi(416) distribution (using 52 weeks in a year).
From the previous question it should be clear to see that if we have $X \sim Poi(\lambda)$ modelling the number of claims per unit time, then $X(t) \sim Poi(\lambda t)$ will model the number of claims up to time t.
$$P(X(t) = x) = \frac{(\lambda t)^x}{x!}\, e^{-\lambda t}, \quad x = 0, 1, 2, \ldots$$
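In R, this probability function is just dpois evaluated with mean $\lambda t$. The sketch below wraps that in a hypothetical helper, `pois_process_pmf`, and applies it to the car claims example (rate 8 per week, with time measured in weeks).

```r
# P(X(t) = x) for a Poisson process with rate lambda per unit time
# (hypothetical helper, for illustration only)
pois_process_pmf <- function(x, lambda, t) {
  dpois(x, lambda = lambda * t)
}

rate_per_week <- 8

# Probability of no claims in one day (t = 1/7 of a week)
p_no_claims_day <- pois_process_pmf(0, rate_per_week, t = 1 / 7)

# Poisson parameter for a 52-week year
yearly_mean <- rate_per_week * 52

# The same probability from the formula, for comparison
p_formula <- exp(-rate_per_week / 7)

print(c(p_no_claims_day = p_no_claims_day, p_formula = p_formula, yearly_mean = yearly_mean))
```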
Question
The number of deaths amongst retired members of a pension scheme occurs at a rate of 3 per calendar month. Calculate the probability of:
(i) 5 deaths in January to March inclusive
(ii) 12 deaths in June to October inclusive.
Solution
Let $X(t)$ be the number of deaths in a t-month period. The number of deaths per calendar month follows the Poi(3) distribution, therefore:
(i) the number of deaths in January to March inclusive follows the Poi(9) distribution:
$$P(X(3) = 5) = \frac{9^5}{5!}\, e^{-9} = 0.0607$$
(ii) the number of deaths in June to October inclusive follows the Poi(15) distribution:
$$P(X(5) = 12) = \frac{15^{12}}{12!}\, e^{-15} = 0.0829$$
3.1 Deriving Poisson process formulae
In Section 1.7, we stated that the Poisson distribution could be used to model events occurring randomly one after another in time at a constant rate and that the numbers of events that occur in separate (ie non-overlapping) time intervals are independent of one another. However, we derived the distribution from the binomial distribution.
In this section, we shall look at the Poisson process, $N(t)$, by considering events occurring in a small interval of time. To start with, we shall define mathematically the properties of a counting process and a Poisson process.
The Poisson process is an example of a counting process. Here the number of events occurring is of interest. Since the number of events is being counted over time, the event number process $\{N(t)\}_{t \ge 0}$ must satisfy the following conditions.
(i) $N(0) = 0$, ie no events have occurred at time 0.
(ii) For any $t > 0$, $N(t)$ must be integer valued.
ie we can't have 2.3 claims.
(iii) When $s < t$, $N(s) \le N(t)$, ie the number of events over time is non-decreasing.
ie if we have counted, say, 5 deaths in 2 months, then the number of deaths counted in a 3 month period which includes the 2 month period must be at least 5.
(iv) When $s < t$, $N(t) - N(s)$ represents the number of events occurring in the time interval $(s, t)$.
ie we have counted $N(t)$ events up to time t and $N(s)$ events up to time s, so there are $N(t) - N(s)$ events counted between time s and time t.
These are the mathematical properties of any counting process. We will now define the mathematical properties for a Poisson process.
The event number process $\{N(t)\}_{t \ge 0}$ is defined to be a Poisson process with parameter $\lambda$ if the following three conditions are satisfied:
(i) $N(0) = 0$, and $N(s) \le N(t)$ when $s < t$.
These are just properties (i) and (iii) from above for any counting process.
(ii) $P(N(t+h) = r \mid N(t) = r) = 1 - \lambda h + o(h)$
$P(N(t+h) = r + 1 \mid N(t) = r) = \lambda h + o(h)$ (2.1)
$P(N(t+h) > r + 1 \mid N(t) = r) = o(h)$
(Note that a function $f(h)$ is described as $o(h)$ if $\lim_{h \to 0} \dfrac{f(h)}{h} = 0$.)
(iii) When $s < t$, the number of events in the time interval $(s, t]$ is independent of the number of events up to time s.
In other words, the numbers of events that occur in separate (ie non-overlapping) time intervals are independent of one another.
Condition (ii) states that in a very short time interval of length h, the only possible numbers of events are zero or one. Condition (ii) also implies that the number of events in a time interval of length h does not depend on when that time interval starts.
Question
Explain how motor insurance claims could be represented by a Poisson process.
Solution
The events in this case are occurrences of claim events (ie accidents, fires, thefts, etc) reported to the insurer. The parameter $\lambda$ represents the average rate of occurrence of claims (eg 50 per day), which we are assuming remains constant throughout the year and at different times of day. The assumption that, in a sufficiently short time interval, there can be at most one claim is satisfied if we assume that claim events cannot lead to multiple claims (ie no motorway pile-ups, etc).
The reason why a process satisfying conditions (i) to (iii) is called a Poisson process is that for a fixed value of t, the random variable $N(t)$ has a Poisson distribution with parameter $\lambda t$. This is proved as follows.
First we need a little shorthand. Let $p_n(t) = P(N(t) = n)$.
So, if $N(t)$ satisfies conditions (i) to (iii) given above, then $N(t) \sim Poi(\lambda t)$ with probability function:
$$p_n(t) = \exp\{-\lambda t\}\,\frac{(\lambda t)^n}{n!} \qquad (2.2)$$
This will be proved by deriving 'differential-difference' equations from the conditions and then showing that (2.2) is their solution.
Recall that for a partition $B_1, \ldots, B_k$, the probability of any event A is:
$$P(A) = P(A \mid B_1)P(B_1) + \cdots + P(A \mid B_k)P(B_k)$$
For a fixed value of $t > 0$ and a small positive value of h, condition on the number of events at time t and write:
$$P(n \text{ by time } t+h) = P(n \text{ by time } t+h \mid n \text{ by time } t)\,P(n \text{ by time } t)$$
$$\qquad + P(n \text{ by time } t+h \mid n-1 \text{ by time } t)\,P(n-1 \text{ by time } t) + \cdots$$
Hence using (2.1) and the $p_n(t)$ notation, we obtain:
$$p_n(t+h) = p_n(t)[1 - \lambda h + o(h)] + p_{n-1}(t)[\lambda h + o(h)] + o(h)$$
$$= [1 - \lambda h]\,p_n(t) + \lambda h\, p_{n-1}(t) + o(h)$$
Thus:
$$p_n(t+h) - p_n(t) = \lambda h\,[p_{n-1}(t) - p_n(t)] + o(h) \qquad (2.3)$$
and this identity holds for $n = 1, 2, 3, \ldots$.
Now recall the formal definition of differentiation:
$$\frac{d}{dt} f(t) = \lim_{h \to 0} \frac{f(t+h) - f(t)}{h}$$
Now divide (2.3) by h, and let h go to zero from above to get the differential-difference equation:
$$\lim_{h \to 0^+} \frac{p_n(t+h) - p_n(t)}{h} = \lim_{h \to 0^+} \frac{\lambda h\,[p_{n-1}(t) - p_n(t)] + o(h)}{h}$$
ie:
$$\frac{d}{dt}\, p_n(t) = \lambda\,[p_{n-1}(t) - p_n(t)] \qquad (2.4)$$
By definition $\lim_{h \to 0^+} \dfrac{o(h)}{h} = 0$.
We will now consider the special case when $n = 0$. There is only one possibility in this case:
$$P(0 \text{ by time } t+h) = P(0 \text{ by time } t+h \mid 0 \text{ by time } t)\,P(0 \text{ by time } t)$$
ie:
$$p_0(t+h) = p_0(t)[1 - \lambda h + o(h)]$$
So:
$$p_0(t+h) - p_0(t) = -\lambda h\, p_0(t) + o(h)$$
and therefore:
$$\lim_{h \to 0^+} \frac{p_0(t+h) - p_0(t)}{h} = \lim_{h \to 0^+} \frac{-\lambda h\, p_0(t) + o(h)}{h}$$
When $n = 0$, an identical analysis yields:
$$\frac{d}{dt}\, p_0(t) = -\lambda\, p_0(t) \qquad (2.5)$$
with initial condition $p_0(0) = 1$.
It is now straightforward to verify that the suggested solution (2.2) satisfies both the differential equations (2.4) and (2.5) as well as the initial conditions.
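Before the algebraic verification, a numerical sanity check may be helpful. The sketch below approximates the derivative of $p_n(t)$ by a central finite difference and compares it with the right-hand sides of (2.4) and (2.5). The values of $\lambda$, t, n and h are arbitrary illustrative choices, and `p` is a hypothetical helper that evaluates (2.2) using dpois.

```r
# p_n(t) from equation (2.2), evaluated via the Poisson probability function
p <- function(n, t, lambda) dpois(n, lambda = lambda * t)

lambda <- 2      # illustrative rate
t0     <- 1.5    # illustrative time
n      <- 3      # illustrative number of events
h      <- 1e-6   # small step for the finite difference

# Equation (2.4): d/dt p_n(t) = lambda * [p_{n-1}(t) - p_n(t)]
lhs_n <- (p(n, t0 + h, lambda) - p(n, t0 - h, lambda)) / (2 * h)
rhs_n <- lambda * (p(n - 1, t0, lambda) - p(n, t0, lambda))

# Equation (2.5): d/dt p_0(t) = -lambda * p_0(t)
lhs_0 <- (p(0, t0 + h, lambda) - p(0, t0 - h, lambda)) / (2 * h)
rhs_0 <- -lambda * p(0, t0, lambda)

# Initial condition p_0(0) = 1
p0_at_zero <- p(0, 0, lambda)

print(c(lhs_n = lhs_n, rhs_n = rhs_n, lhs_0 = lhs_0, rhs_0 = rhs_0))
```

A numerical agreement of this kind is not a proof; it only guards against slips in the algebra.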
Question
Show that $p_n(t) = e^{-\lambda t}\,\dfrac{(\lambda t)^n}{n!}$ satisfies the differential equations:
$$\frac{d}{dt}\, p_n(t) = \lambda\,[p_{n-1}(t) - p_n(t)]$$
and:
$$\frac{d}{dt}\, p_0(t) = -\lambda\, p_0(t)$$
Solution
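We have $p_n(t) = e^{-\lambda t}\,\dfrac{(\lambda t)^n}{n!}$. Calculating the derivative using the product rule gives:
$$\frac{d}{dt}\, p_n(t) = \frac{d}{dt}\left\{e^{-\lambda t}\,\frac{(\lambda t)^n}{n!}\right\} = -\lambda e^{-\lambda t}\,\frac{(\lambda t)^n}{n!} + e^{-\lambda t}\,\frac{n\lambda(\lambda t)^{n-1}}{n!}$$
$$= \lambda\left[e^{-\lambda t}\,\frac{(\lambda t)^{n-1}}{(n-1)!} - e^{-\lambda t}\,\frac{(\lambda t)^n}{n!}\right] = \lambda\,[p_{n-1}(t) - p_n(t)]$$
Similarly, $p_0(t) = e^{-\lambda t}$, which gives a derivative of:
$$\frac{d}{dt}\, p_0(t) = \frac{d}{dt}\left\{e^{-\lambda t}\right\} = -\lambda e^{-\lambda t} = -\lambda\, p_0(t)$$
3.2 Waiting times between events in a Poisson process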
This study of the Poisson process concludes by considering the distribution of the time to the first event, $T_1$, and the times between events, $T_2, T_3, \ldots$. These inter-event times are often called the waiting times or holding times.
In Section 2.2, we said that the waiting time between consecutive events in a Poisson process has an $Exp(\lambda)$ distribution.
$P(T_1 > t)$ is the probability that no events occur between time 0 and time t. Hence:
$$P(T_1 > t) = P(N(t) = 0) = \exp\{-\lambda t\}$$
So the distribution function of $T_1$ is:
$$F(t) = P(T_1 \le t) = 1 - \exp\{-\lambda t\}$$
so that $T_1$ has an exponential distribution with parameter $\lambda$.
Now we consider the distribution of the time between the first and the second event.
Consider the conditional distribution of $T_2$ given the value of $T_1$.
$$P(T_2 > t \mid T_1 = r) = P(T_1 + T_2 > t + r \mid T_1 = r)$$
$$= P(N(t+r) = 1 \mid N(r) = 1)$$
$$= P(N(t+r) - N(r) = 0 \mid N(r) = 1)$$
Because the number of events in the time interval is independent of the number of events up to the start of that time interval (condition (iii) above):
$$P(N(t+r) - N(r) = 0 \mid N(r) = 1) = P(N(t+r) - N(r) = 0)$$
Since the number of events in a time interval of length t does not depend on when that time interval starts (condition (ii) above, equations (2.1)), we have:
$$P(N(t+r) - N(r) = 0) = P(N(t) = 0) = \exp\{-\lambda t\}$$
Hence, $T_2$ has an exponential distribution with parameter $\lambda$ and $T_2$ is independent of $T_1$.
This calculation can be repeated for $T_2, T_3, \ldots$.
We have now shown that each waiting time has an $Exp(\lambda)$ distribution.
The inter-event time is independent of the absolute time. In other words, the time until the next event has the same distribution, irrespective of the time since the last event or the number of events that have already occurred.
Question
If reported claims follow a Poisson process with rate 5 per day (and the insurer has a 24-hour hotline), calculate the probability that:
(i) there will be fewer than 2 claims reported on a given day
(ii) the time until the next reported claim is less than an hour.
Solution
(i) The number of claims per day, X, follows the Poi(5) distribution, so:
$$P(X < 2) = P(X = 0) + P(X = 1) = e^{-5} + 5e^{-5} = 0.0404$$
Alternatively, we could read the value of $P(X \le 1)$ from the cumulative Poisson tables listed on page 176 of the Tables.
(ii) The number of claims per hour, Y, follows the $Poi\left(\tfrac{5}{24}\right)$ distribution, so the waiting time (in hours), T, follows the $Exp\left(\tfrac{5}{24}\right)$ distribution. Hence:
$$P(T < 1) = \int_0^1 \tfrac{5}{24}\, e^{-\frac{5}{24}t}\,dt = \left[-e^{-\frac{5}{24}t}\right]_0^1 = 1 - e^{-\frac{5}{24}} = 0.188$$
Alternatively, we could use the cumulative distribution function for the exponential distribution given on page 11 of the Tables.
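Both parts of this solution correspond to a single R call each, as the following sketch shows. The variable names are illustrative.

```r
rate_per_day <- 5

# (i) P(X < 2) = P(X <= 1) for X ~ Poi(5), from the cumulative Poisson distribution
p_fewer_than_2 <- ppois(1, lambda = rate_per_day)

# (ii) P(T < 1) for the waiting time in hours, T ~ Exp(5/24)
rate_per_hour <- rate_per_day / 24
p_within_hour <- pexp(1, rate = rate_per_hour)

print(round(c(i = p_fewer_than_2, ii = p_within_hour), 4))
```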
4 Monte Carlo simulation
With the advent of high-speed personal computers Monte Carlo simulations have become one of the most valuable tools of the actuarial profession. This is because the vast majority of the practically important problems are not amenable to analytical solution.
We have already seen that we can simulate samples from distributions listed in Sections 2 and 3 using the R functions rbinom, rgeom, rnbinom, rhyper, rpois, runif, rgamma, rexp, rchisq, rbeta, rnorm, rlnorm, rt and rf.
Below we outline one basic simulation technique that can be used to simulate values from most of these distributions.
This is known as the inverse transform method. It can be applied to both continuous and discrete distributions.
4.1 Inverse transform method for continuous distributions
The method works by first generating a random number from a uniform distribution on the interval (0,1). We then use the cumulative distribution function of the distribution we are trying to simulate to obtain a random value from that distribution.
First we generate a random number, U, from the $U(0,1)$ distribution. We can use this to simulate a random variate X with PDF $f(x)$ by using the CDF, $F(x)$.
Let U be the probability that X takes on a value less than or equal to x, ie:
$$U = P(X \le x) = F(x)$$
Then x can be derived as:
$$x = F^{-1}(u)$$
Hence, the following two-step algorithm is used to generate a random variate x from a continuous distribution with CDF $F(x)$:
1. generate a random number u from $U(0,1)$,
2. return $x = F^{-1}(u)$.
We can represent this on a diagram as follows.
We have a random value, u, between 0 and 1. Recall that the cumulative distribution, $F(x)$, increases from 0 to 1 as x increases:
[Figure: graph of the CDF $F(x)$ against x, rising from 0 to 1, with a value u on the vertical axis mapped across to $x = F^{-1}(u)$ on the horizontal axis.]
If we set $u = F(x)$ we can obtain a random value, x, by inverting the cumulative distribution, $x = F^{-1}(u)$. Hence this method is called the inverse transform method.
This method requires that our distribution has a cumulative distribution function $F(x)$ for which we can write down an algebraic formula. This rules out the gamma, normal, lognormal and beta distributions.
Formally, we can prove that the random variable $X = F^{-1}(U)$ has the CDF $F(x)$, as follows:
$$P(X \le x) = P[F^{-1}(U) \le x] = P[U \le F(x)] = F(x)$$
Example
Generate a random variate from the exponential distribution with parameter $\lambda$.
The distribution function of X is given by:
$$F_X(x) = 1 - e^{-\lambda x}$$
Hence:
$$x = F^{-1}(u) = -\frac{\log(1 - u)}{\lambda}$$
Thus, to generate a random variate x from an exponential distribution we can use the following algorithm:
1. generate a random variate u from $U(0,1)$
2. return $x = -\dfrac{\log(1 - u)}{\lambda}$.
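The two-step algorithm for the exponential distribution can be sketched in R as below. The function name `r_exp_inverse`, the seed, the sample size and the rate of 0.5 are all illustrative assumptions; in practice rexp would normally be used instead.

```r
# Inverse transform sampling for Exp(lambda) (hypothetical helper).
# u defaults to fresh U(0,1) values but can be supplied for reproducibility.
r_exp_inverse <- function(n, lambda, u = runif(n)) {
  # step 2 of the algorithm: x = F^{-1}(u) = -log(1 - u) / lambda
  -log(1 - u) / lambda
}

set.seed(1)                                  # arbitrary seed, for repeatability only
x <- r_exp_inverse(10000, lambda = 0.5)

# The sample mean should be close to the theoretical mean 1 / lambda = 2
sample_mean <- mean(x)

# Supplying a specific uniform value reproduces a hand calculation
x_single <- r_exp_inverse(1, lambda = 0.1, u = 0.113)
```

Because $1 - U$ is also uniform on (0,1), `-log(u) / lambda` would give an equally valid simulated value; the form above is kept because it matches the algorithm as stated.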
The main disadvantage ofthe inverse transform methodis the necessity to have an explicit
expression for the inverse ofthe distribution function
Fx() . Forinstance, to generate a
random variate from the standard normal distribution
we need the inverse of the distribution function:
Fx() =
1
x
transform
method
e -t 2/2 dt
?-8
2p
However,
using the inverse
no explicit
solution
to the equation
uF ()x=can be found in this
case.
However, it is possible to generate simulated values from a standard normal distribution. The procedure is as follows.
1. Generate a random number $u$ from the $U(0,1)$ distribution.
2. If $u > 0.5$, use the Tables directly to find $z$ such that $P(Z < z) = u$. In this case our simulated value from $N(0,1)$ is $z$.
3. If $u < 0.5$, use the Tables to find $z$ such that $P(Z < z) = 1 - u$. In this case our simulated value from $N(0,1)$ is $-z$.
We can generalise this method to generate a value from any normal distribution $X \sim N(\mu, \sigma^2)$ by using the transformation $x = \mu + \sigma z$.
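The table-based procedure for the normal distribution can be mimicked in code. In the sketch below, Python's statistics.NormalDist().inv_cdf stands in for looking up $z$ in the Tables; that substitution, and the function names, are assumptions made for illustration only.

```python
from statistics import NormalDist


def standard_normal_variate(u):
    """Follow the three-step procedure, with inv_cdf replacing the Tables."""
    std = NormalDist()  # N(0, 1)
    if u > 0.5:
        # Find z with P(Z < z) = u and return z.
        return std.inv_cdf(u)
    # Find z with P(Z < z) = 1 - u and return -z.
    return -std.inv_cdf(1.0 - u)


def normal_variate(u, mu, sigma):
    """Transform a simulated N(0, 1) value z into x = mu + sigma * z."""
    return mu + sigma * standard_normal_variate(u)
```

Here sigma is the standard deviation, not the variance, so a value from $N(10, 4)$ would use sigma = 2.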
Question
Simulate three values from the $Exp(0.1)$ distribution using the values 0.113, 0.608 and 0.003 from $U(0,1)$.
Solution
Using the inverse transform method, we have:
$x = -\frac{1}{0.1}\ln(1-u) = -10\ln(1-u)$
This gives:
$x = -10\ln(1 - 0.113) = 1.20$
$x = -10\ln(1 - 0.608) = 9.36$
$x = -10\ln(1 - 0.003) = 0.03$
We can also generate random samples from other distributions, for example the uniform distribution.
Question
Generate three random values from the $U(-1,4)$ distribution using the following random values from $U(0,1)$:
0.07   0.628   0.461
Solution
The distribution function for the $U(-1,4)$ distribution is:
$F(x) = \frac{x+1}{5}$
We now set our random value, $u$, equal to this and rearrange:
$u = \frac{x+1}{5} \Rightarrow x = 5u - 1$
Substituting, we obtain:
$x = 5 \times 0.07 - 1 = -0.65$
$x = 5 \times 0.628 - 1 = 2.14$
$x = 5 \times 0.461 - 1 = 1.305$
We can also see intuitively that if we start by generating a random number from $U(0,1)$, then if we multiply it by 5 it will become a random number from $U(0,5)$. If we then subtract 1, it will become a random number from $U(-1,4)$.
Example
Generate a random variate $X$ from the double exponential distribution with density function:
$f(x) = \frac{1}{2}\lambda e^{-\lambda |x|}, \quad -\infty < x < \infty$
It is possible in this case to find the distribution function $F$ corresponding to $f$ and to use the inverse transform method, but an alternative method is presented here. The density $f$ is symmetric about 0; we can therefore generate a variate $Y$ having the same distribution as $|X|$ and set $X = +Y$ or $X = -Y$ with equal probability.
The density of $|X|$ is:
$f_{|X|}(y) = \lambda e^{-\lambda y}, \quad y > 0$
which is easily recognised as the density of the exponential distribution. The following algorithm therefore generates a value for $X$:
1. generate $u_1$ and $u_2$ from $U(0,1)$,
2. if $u_1 < 0.5$ return $y = -\frac{\ln(1-u_2)}{\lambda}$, otherwise return $y = \frac{\ln(1-u_2)}{\lambda}$.
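A minimal sketch of the double exponential algorithm is given below. The function name and the choice to pass $u_1$ and $u_2$ in as arguments (rather than drawing them inside the function) are assumptions made so that the result can be checked by hand.

```python
import math


def double_exponential_variate(lam, u1, u2):
    """Simulate X with density 0.5 * lam * exp(-lam * |x|).

    u2 generates |X| from Exp(lam) by the inverse transform;
    u1 decides the sign, each sign having probability 0.5.
    """
    magnitude = -math.log(1.0 - u2) / lam  # exponential variate, always >= 0
    return magnitude if u1 < 0.5 else -magnitude
```

For example, with $\lambda = 0.1$ and $u_2 = 0.608$ the magnitude is about 9.36, returned as positive when $u_1 < 0.5$ and negative otherwise.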
4.2 Discrete distributions
We cannot algebraically invert the distribution function of a discrete random variable, as it is a step function.
The distribution function, $F(x)$, is the sum of the probabilities so far, eg:
$F(5) = P(X \le 5) = P(X = 0) + P(X = 1) + \cdots + P(X = 5)$
Given a random value, $u$, from $U(0,1)$ we can read off the $x$ value from the distribution function graph as follows:
[Figure: step-function graph of $F(x)$ for $x = 0, 1, \ldots, 6$, with the value $u$ marked on the vertical axis between $F(2)$ and $F(3)$.]
From the graph, we can see that in this particular case our value of $u$ lies between $F(2)$ and $F(3)$. This gives $x = 3$ as our simulated value.
So in general, if our value $u$ lies between $F(x_{j-1})$ and $F(x_j)$ then our simulated value is $x_j$. If the value of $u$ corresponds exactly to the position of a step, then by convention we use the lower of the $x$ values, ie the point corresponding to the left-hand end of the step.
Let $X$ be a discrete random variable which can take only the values $x_1, x_2, \ldots, x_N$, where $x_1 < x_2 < \cdots < x_N$.
The first step is to generate a random number, $U$, from the $U(0,1)$ distribution. We can use this to simulate a random variate $X$ with PDF $f(x)$ by using the CDF, $F(x)$.
Let $U$ be the probability that $X$ takes on a value less than or equal to $x$. Then $X = x_j$ if:
$F(x_{j-1}) < U \le F(x_j)$
ie
$P(X = x_1) + P(X = x_2) + \cdots + P(X = x_{j-1}) < U \le P(X = x_1) + P(X = x_2) + \cdots + P(X = x_j)$
Note that for $x < x_1$ we have $F(x) = 0$.
Hence, the following three-step algorithm is used to generate a random variate $x$ from a discrete distribution with CDF $F(x)$:
1. generate a random number $u$ from $U(0,1)$.
2. find the positive integer $i$ such that $F(x_{i-1}) < u \le F(x_i)$.
3. return $x = x_i$.
We can see that the algorithm can return only variates $x$ from the range $\{x_1, x_2, \ldots, x_N\}$ and that the probability that a particular value $x = x_i$ is returned is given by:
$P(\text{value returned is } x_i) = P[F(x_{i-1}) < U \le F(x_i)] = F(x_i) - F(x_{i-1}) = P(X = x_i)$
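The three-step algorithm can be sketched generically as below. The function name, the argument layout and the small three-point distribution used in the usage note are hypothetical, chosen only to illustrate the search for $i$ with $F(x_{i-1}) < u \le F(x_i)$.

```python
from bisect import bisect_left
from itertools import accumulate


def discrete_variate(values, probs, u):
    """Return x_i where F(x_{i-1}) < u <= F(x_i).

    values: possible outcomes in increasing order
    probs:  matching probabilities, assumed to sum to 1
    u:      a number from U(0, 1)
    """
    cdf = list(accumulate(probs))  # F(x_1), F(x_2), ..., F(x_N)
    i = bisect_left(cdf, u)        # first index with F(x_i) >= u
    # Guard against floating-point rounding leaving the last F just below 1.
    return values[min(i, len(values) - 1)]
```

With the hypothetical outcomes 10, 20 and 50 having probabilities 0.2, 0.5 and 0.3, a value $u = 0.65$ returns 20, and $u = 0.2$ (exactly on a step) returns the lower value 10, in line with the convention described above.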
Question
Simulate two random values from the $Poi(2)$ distribution using the random values 0.721 and 0.128.
Solution
$P(X = x) = \frac{2^x}{x!}e^{-2}, \quad x = 0, 1, 2, \ldots$
$P(X = 0) = e^{-2} = 0.1353 \Rightarrow F(0) = 0.1353$
$P(X = 1) = 2e^{-2} = 0.2707 \Rightarrow F(1) = 0.4060$
$P(X = 2) = \frac{2^2}{2!}e^{-2} = 0.2707 \Rightarrow F(2) = 0.6767$
$P(X = 3) = \frac{2^3}{3!}e^{-2} = 0.1804 \Rightarrow F(3) = 0.8571$, etc.
Since $F(2) < 0.721 < F(3)$, the first simulated value is 3.
Since $0.128 < F(0)$, the second simulated value is 0.
Alternatively, we could use the cumulative Poisson tables on page 175 of the Tables instead of calculating the values.
We can use a similar approach for the binomial distribution.
Question
Generate three random values from the $Bin(4, 0.6)$ distribution using the following random values from $U(0,1)$:
0.588   0.222   0.906
Solution
The probability function for the $Bin(4, 0.6)$ distribution is:
$P(X = x) = \binom{4}{x} 0.6^x \, 0.4^{4-x}, \quad x = 0, 1, 2, 3, 4$
Calculating the probabilities and the cumulative distribution function:
$P(X = 0) = 0.4^4 = 0.0256 \Rightarrow F(0) = 0.0256$
$P(X = 1) = 4 \times 0.6 \times 0.4^3 = 0.1536 \Rightarrow F(1) = 0.1792$
$P(X = 2) = 6 \times 0.6^2 \times 0.4^2 = 0.3456 \Rightarrow F(2) = 0.5248$
$P(X = 3) = 4 \times 0.6^3 \times 0.4 = 0.3456 \Rightarrow F(3) = 0.8704$
$P(X = 4) = 0.6^4 = 0.1296 \Rightarrow F(4) = 1$
Since $F(2) < 0.588 < F(3)$, our first simulated value is 3.
Since $F(1) < 0.222 < F(2)$, our second simulated value is 2.
Since $F(3) < 0.906 < F(4)$, our third simulated value is 4.
Alternatively, it is much quicker to use the cumulative binomial probabilities given on page 186 of the Tables.
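As a cross-check on the hand calculation, the binomial simulation can be reproduced in a few lines. This is a sketch only; the function name binomial_variate is an assumption, and the code simply builds the cumulative probabilities and locates $u$ among them.

```python
from bisect import bisect_left
from itertools import accumulate
from math import comb


def binomial_variate(n, p, u):
    """Simulate from Bin(n, p) by locating u in the cumulative probabilities."""
    pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
    cdf = list(accumulate(pmf))  # F(0), F(1), ..., F(n)
    # First x with F(x) >= u; capped at n in case of rounding in F(n).
    return min(bisect_left(cdf, u), n)


print([binomial_variate(4, 0.6, u) for u in (0.588, 0.222, 0.906)])
# → [3, 2, 4]
```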
Chapter 2 Summary
Standard discrete distributions covered in this course are the discrete uniform, Bernoulli, binomial, geometric, negative binomial, hypergeometric and Poisson.
Standard continuous distributions covered in this course are the continuous uniform, gamma, exponential, chi-square, normal, lognormal, beta, t and F.
The geometric and exponential distributions have the memoryless property:
$P(X > x + n \mid X > n) = P(X > x)$
The properties of the distributions are summarised on the next page.
The t distribution with $k$ degrees of freedom is defined as:
$t_k = \frac{N(0,1)}{\sqrt{\chi^2_k / k}}$
The F distribution with $m, n$ degrees of freedom is defined as:
$F_{m,n} = \frac{\chi^2_m / m}{\chi^2_n / n}$
The Poisson process counts events occurring up to and including time $t$:
$N(t) \sim Poi(\lambda t)$
To calculate probabilities we consider events occurring in a small time interval $h$. The waiting times between events in a Poisson process have exponential distributions.
Random variables can be simulated by using the inverse transform method. First we take a random number, $u$, from $U(0,1)$ then we set:
continuous: $x = F^{-1}(u)$
discrete: $x = x_j$ where $F(x_{j-1}) < u \le F(x_j)$
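Summary of distributions (PF or PDF, mean, variance)

Discrete distributions
Discrete uniform: $\frac{1}{k}$, $x = 1, 2, \ldots, k$; mean $\frac{k+1}{2}$; variance $\frac{k^2-1}{12}$
Bernoulli: $p^x(1-p)^{1-x}$; mean $p$; variance $p(1-p)$
Binomial: $\binom{n}{x}p^x(1-p)^{n-x}$; mean $np$; variance $np(1-p)$
Geometric (Type 1): $p(1-p)^{x-1}$; mean $\frac{1}{p}$; variance $\frac{1-p}{p^2}$
Negative binomial (Type 1): $\binom{x-1}{k-1}p^k(1-p)^{x-k}$; mean $\frac{k}{p}$; variance $\frac{k(1-p)}{p^2}$
Poisson: $\frac{\lambda^x e^{-\lambda}}{x!}$; mean $\lambda$; variance $\lambda$

Continuous distributions
Continuous uniform: $\frac{1}{b-a}$; mean $\frac{1}{2}(a+b)$; variance $\frac{1}{12}(b-a)^2$
Gamma: $\frac{\lambda^\alpha}{\Gamma(\alpha)}x^{\alpha-1}e^{-\lambda x}$; mean $\frac{\alpha}{\lambda}$; variance $\frac{\alpha}{\lambda^2}$
Exponential: $\lambda e^{-\lambda x}$; mean $\frac{1}{\lambda}$; variance $\frac{1}{\lambda^2}$
Chi-square: $\frac{(1/2)^{\nu/2}}{\Gamma(\nu/2)}x^{\nu/2-1}e^{-x/2}$; mean $\nu$; variance $2\nu$
Beta: $\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1}$; mean $\frac{\alpha}{\alpha+\beta}$; variance $\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$
Lognormal: $\frac{1}{x\sigma\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{\log x-\mu}{\sigma}\right)^2\right]$; mean $e^{\mu+\frac{1}{2}\sigma^2}$; variance $e^{2\mu+\sigma^2}\left(e^{\sigma^2}-1\right)$
Normal: $\frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]$; mean $\mu$; variance $\sigma^2$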
Chapter 2 Practice Questions
2.1
If $X \sim N(14, 20)$, calculate:
(i) $P(X < 14)$
(ii) $P(X > 20)$
(iii) $P(X < 9)$
(iv) $r$ such that $P(X > r) = 0.41294$.
2.2
Determine the third non-central moment of the normal distribution with mean 10 and variance 25.
2.3
Calculate $P(X < 8)$ if:
(i) $X \sim U(5, 10)$
(ii) $X \sim N(10, 5)$
(iii) $X \sim Exp(0.5)$
(iv) $X \sim \chi^2_5$
(v) $X \sim Gamma(8, 2)$
(vi) $X \sim \log N(2, 5)$.
2.4
A random variable $X$ follows the $Poi(3.6)$ distribution.
(i) Calculate the mode of the probability distribution.
(ii) Calculate the standard deviation of the distribution.
(iii) State, with reasons, whether the distribution is positively or negatively skewed.
2.5
$U$ denotes a continuous random variable that is uniformly distributed over the range $(-1, 1)$ and $V$ denotes a discrete random variable that is equally likely to take any of the values $\{-1, -\frac{1}{2}, 0, \frac{1}{2}, 1\}$.
(a) Calculate $var(U)$ and $var(V)$.
(b) Comment on your answers to part (a).
2.6 (Exam style)
An analyst is interested in using a gamma distribution with parameters $\alpha = 2$ and $\lambda = 1/2$, that is, with density function
$f(x) = \frac{1}{4}xe^{-\frac{1}{2}x}, \quad 0 < x < \infty$.
(i)
(a) State the mean and standard deviation of this distribution.
(b) Hence comment briefly on its shape. [2]
(ii) Show that the cumulative distribution function is given by:
$F(x) = 1 - \left(1 + \frac{1}{2}x\right)e^{-\frac{1}{2}x}, \quad 0 < x < \infty$ (zero otherwise). [3]
The analyst wishes to simulate values $x$ from this gamma distribution and is able to generate random numbers $u$ from a uniform distribution on $(0,1)$.
(iii)
(a) Specify an equation involving $x$ and $u$, the solution of which will yield the simulated value $x$.
(b) Comment briefly on how this equation might be solved.
(c) The graph below gives $F(x)$ plotted against $x$. Use this graph to determine the simulated value of $x$ corresponding to the random number $u = 0.66$. [3]
[Figure: graph of $F(x)$ against $x$ for $0 \le x \le 20$, with the vertical axis running from 0 to 1.2.]
[Total 8]
2.7
Calculate $P(X < 8)$ in each of the following cases:
(i) $X$ is the number of claims reported in a year by 20 policyholders. Claims reporting from each policyholder occurs randomly at a rate of 0.2 per year independently of the other policyholders.
(ii) $X$ is the number of claims examined up to and including the fourth claim that exceeds 20,000. The probability that any claim received exceeds 20,000 is 0.3 independently of any other claim.
(iii) $X$ is the number of deaths in the coming year amongst a group of 500 policyholders. Each policyholder has a 0.01 probability of dying in the coming year independently of any other policyholder.
(iv) $X$ is the number of phone calls made before an agent makes the first sale. The probability that any phone call leads to a sale is 0.01 independently of any other call.
2.8
A random variable follows the lognormal distribution with mean 10 and variance 4. Calculate the probability that the variable will take a value between 7.5 and 12.5.
2.9
The random variable $N$ has a Poisson distribution with parameter $\lambda$ and $P(N = 1 \mid N \ge 1) = 0.4$.
Calculate the value of $\lambda$ to 2 decimal places.
2.10
Simulate two observations from the distribution with probability density function:
$f(x) = \frac{50}{(5+x)^3}, \quad x > 0$
using the random numbers 0.863 and 0.447 selected from the uniform distribution on the interval $(0,1)$.
2.11 (Exam style)
Claim amounts are modelled as an exponential random variable with mean 1,000.
(i) Calculate the probability that a randomly selected claim amount is greater than 5,000. [1]
(ii) Calculate the probability that a randomly selected claim amount is greater than 5,000 given that it is greater than 1,000. [2]
[Total 3]
2.12 (Exam style)
The ratio of the standard deviation to the mean of a random variable is called the coefficient of variation.
For each of the following distributions, decide whether increasing the mean of the random variable increases, decreases, or has no effect on the value of the coefficient of variation:
(a) Poisson with mean $\lambda$
(b) exponential with mean $\mu$
(c) chi-square with $n$ degrees of freedom. [6]
2.13 (Exam style)
Consider the following simple model for the number of claims, $N$, which occur in a year on a policy:
$n$:          0     1     2     3
$P(N = n)$:   0.55  0.25  0.15  0.05
(a) Explain how you would simulate an observation of $N$ using a number $r$, an observation of a random variable that is uniformly distributed on $(0,1)$.
(b) Illustrate your method described in (a) by simulating three observations of $N$ using the following random numbers between 0 and 1:
0.6221, 0.1472, 0.9862 [4]
2.14 (Exam style)
It is assumed that claims arising on an industrial policy can be modelled as a Poisson process at a rate of 0.5 per year.
(i) Determine the probability that no claims arise in a single year. [1]
(ii) Determine the probability that, in three consecutive years, there is one or more claims in one of the years and no claims in each of the other two years. [2]
Suppose a claim has just occurred.
(iii) Determine the probability that more than two years will elapse before the next claim occurs. [2]
[Total 5]
2.15 (Exam style)
Consider the following three probability statements concerning an F variable with 6 and 12 degrees of freedom.
(a) $P(F_{6,12} > 0.250) = 0.95$
(b) $P(F_{6,12} < 4.821) = 0.99$
(c) $P(F_{6,12} < 0.13) = 0.01$
State, with reasons, whether each of these statements is true. [3]
Chapter 2 Solutions
2.1
(i) Since 14 is the mean, the probability is 0.5.
(ii) $P(X > 20) = P\left(Z > \frac{20-14}{\sqrt{20}}\right) = P(Z > 1.342) = 1 - 0.91020 = 0.0898$
(iii) $P(X < 9) = P\left(Z < \frac{9-14}{\sqrt{20}}\right) = P(Z < -1.118) = 1 - 0.86821 = 0.1318$
(iv) $P(X > r) = P\left(Z > \frac{r-14}{\sqrt{20}}\right) = 0.41294$, which gives:
$P\left(Z < \frac{r-14}{\sqrt{20}}\right) = 0.58706 \Rightarrow \frac{r-14}{\sqrt{20}} = 0.22 \Rightarrow r = 14.98$
2.2
The third non-central moment is $E[X^3]$. The formula for the skewness is:
$E[(X-\mu)^3] = E[X^3] - 3\mu E[X^2] + 2\mu^3$
We also know that the skewness of the normal distribution is zero, so:
$0 = E[X^3] - 3 \times 10 \times (25 + 10^2) + 2 \times 10^3 \Rightarrow E[X^3] = 1,750$
We have worked out $E[X^2]$ here by turning around the relationship $var(X) = E[X^2] - (E[X])^2$.
2.3
(i) Uniform
$P(X < 8) = \int_5^8 0.2 \, dx = [0.2x]_5^8 = 0.6$
Alternatively, we could use the DF given on page 13 of the Tables.
$P(X < 8) = F(8) = \frac{8-5}{10-5} = 0.6$.
(ii) Normal
$P(X < 8) = P\left(Z < \frac{8-10}{\sqrt{5}}\right) = P(Z < -0.894) = 1 - P(Z < 0.894) = 1 - 0.81434 = 0.1857$
(iii) Exponential
$P(X < 8) = \int_0^8 0.5e^{-0.5x} \, dx = \left[-e^{-0.5x}\right]_0^8 = 1 - e^{-4} = 0.98168$
Alternatively, we could use the DF given on page 11.
$P(X < 8) = F(8) = 1 - e^{-0.5 \times 8} = 0.98168$.
(iv) Chi-square
Using the $\chi^2_5$ tables on page 165 of the Tables gives $P(X < 8) = 0.8438$.
(v) Gamma
The only practical way in a written exam to calculate probabilities involving a gamma random variable is to use the relationship $2\lambda X \sim \chi^2_{2\alpha}$ and then read off the probability from the $\chi^2$ tables.
$P(X < 8) = P(4X < 32) = P(\chi^2_{16} < 32) = 0.9900$
(vi) Lognormal
Using the fact that if $X \sim \log N(\mu, \sigma^2)$ then $\ln X \sim N(\mu, \sigma^2)$:
$P(X < 8) = P(\ln X < \ln 8) = P\left(Z < \frac{\ln 8 - 2}{\sqrt{5}}\right) = P(Z < 0.036) = 0.5144$
2.4
(i) Mode
We can find the mode by calculating probabilities and seeing which value has the highest probability.
$P(X = 0) = e^{-3.6} = 0.02732$
Using the iterative formula for the Poisson distribution gives:
$P(X = 1) = \frac{3.6}{1} \times 0.02732 = 0.09837$
$P(X = 2) = \frac{3.6}{2} \times 0.09837 = 0.17706$
$P(X = 3) = \frac{3.6}{3} \times 0.17706 = 0.21247$
$P(X = 4) = \frac{3.6}{4} \times 0.21247 = 0.19122$
$P(X = 5) = \frac{3.6}{5} \times 0.19122 = 0.13768$
etc
We can see that 3 is the mode.
(ii) Standard deviation
The variance of the $Poi(\lambda)$ distribution is $\lambda$. So the standard deviation of the Poisson(3.6) distribution is $\sqrt{3.6} = 1.8974$.
(iii) Skewness
The Poisson distribution is positively skewed as the mode of 3 is lower than the mean of 3.6. In fact the Poisson distribution is always positively skewed.
For most positively skewed distributions, we find that mode < median < mean.
[Figure: a positively skewed density with the mode, median and mean marked from left to right.]
2.5
(a) Mean and variance
The probability density function of $U$ is constant, ie $f_U(u) = \frac{1}{2}$, $-1 < u < 1$.
The probability function of $V$ is constant, ie $f_V(v) = \frac{1}{5}$, $v = -1, -\frac{1}{2}, 0, \frac{1}{2}, 1$.
By symmetry the mean value of both variables is zero.
Alternatively:
$E(U) = \int_{-1}^{1} u f(u) \, du = \int_{-1}^{1} \frac{1}{2}u \, du = \left[\frac{1}{4}u^2\right]_{-1}^{1} = \frac{1}{4} - \frac{1}{4} = 0$
$E(V) = \sum v P(V = v) = \left(-1 \times \frac{1}{5}\right) + \left(-\frac{1}{2} \times \frac{1}{5}\right) + \left(0 \times \frac{1}{5}\right) + \left(\frac{1}{2} \times \frac{1}{5}\right) + \left(1 \times \frac{1}{5}\right) = 0$
So the variance of $U$ is calculated from:
$E(U^2) = \int_{-1}^{1} \frac{1}{2}u^2 \, du = \left[\frac{1}{6}u^3\right]_{-1}^{1} = \frac{1}{3} \Rightarrow var(U) = \frac{1}{3} - 0^2 = \frac{1}{3}$
Alternatively, we could use the formula $\frac{1}{12}(b-a)^2$ from page 13 of the Tables.
So:
$E(V^2) = \sum v^2 P(V = v) = \frac{1}{5}\left[(-1)^2 + \left(-\frac{1}{2}\right)^2 + 0^2 + \left(\frac{1}{2}\right)^2 + 1^2\right] = \frac{1}{2} \Rightarrow var(V) = \frac{1}{2} - 0^2 = \frac{1}{2}$
(b) Comment
The variance is a measure of the spread of values. Both distributions take values in the range from $-1$ to $+1$ and are centred around zero. However, the variance of $V$ is greater than the variance of $U$ because there is a greater probability of obtaining the extreme values $-1$ and $+1$.
2.6
(i)(a) The mean and standard deviation of the distribution
For a gamma distribution with $\alpha = 2$ and $\lambda = 0.5$:
$E(X) = \frac{\alpha}{\lambda} = \frac{2}{0.5} = 4$ and $sd(X) = \sqrt{\frac{\alpha}{\lambda^2}} = \sqrt{\frac{2}{0.5^2}} = \sqrt{8} = 2.828$ [1]
(i)(b) The shape of the distribution
Since $X$ cannot take negative values and the standard deviation is large relative to the mean, the gamma distribution with $\alpha = 2$ and $\lambda = 0.5$ is positively skewed. [1]
(ii) Cumulative distribution function
The cumulative distribution function, $F_X(x)$, is:
$F_X(x) = P(X \le x) = \int_{t=0}^{x} \frac{1}{4}te^{-\frac{1}{2}t} \, dt, \quad x > 0$ [1]
Using integration by parts, with $u = \frac{1}{4}t$ and $\frac{dv}{dt} = e^{-\frac{1}{2}t}$:
$F_X(x) = \int_{t=0}^{x} \frac{1}{4}te^{-\frac{1}{2}t} \, dt$
$= \left[-\frac{1}{2}te^{-\frac{1}{2}t}\right]_0^x + \int_0^x \frac{1}{2}e^{-\frac{1}{2}t} \, dt$
$= -\frac{1}{2}xe^{-\frac{1}{2}x} - \left[e^{-\frac{1}{2}t}\right]_0^x$
$= 1 - \left(1 + \frac{1}{2}x\right)e^{-\frac{1}{2}x}, \quad x > 0$ [2]
(iii)(a) Equation to simulate values of x
We equate the random number $u$ to the cumulative distribution function:
$u = 1 - e^{-\frac{1}{2}x}\left(1 + \frac{1}{2}x\right), \quad x > 0$ [1]
(iii)(b) Solving the equation
For a given random number, ie a given value of $u$, we could solve for $x$ by:
trial and error
using Table Mode on the calculator
using the Newton-Raphson method or some other iterative approach. [1]
Alternatively, the function for $u$ could be plotted against $x$, and then used to determine the $x$ value corresponding to a $u$ value.
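One "other iterative approach" is bisection, sketched below for this particular CDF. The function names, the search interval $[0, 100]$ and the tolerance are assumptions chosen for illustration; bisection works here because $F(x)$ is increasing in $x$.

```python
import math


def gamma_cdf(x):
    """F(x) = 1 - (1 + x/2) * exp(-x/2) for the Gamma(2, 1/2) distribution."""
    return 1.0 - (1.0 + x / 2.0) * math.exp(-x / 2.0)


def solve_for_x(u, lo=0.0, hi=100.0, tol=1e-9):
    """Solve u = F(x) by repeatedly halving the interval [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if gamma_cdf(mid) < u:
            lo = mid  # F(mid) too small, so the solution lies to the right
        else:
            hi = mid  # F(mid) large enough, so the solution lies to the left
    return (lo + hi) / 2.0


x = solve_for_x(0.66)
print(x, gamma_cdf(x))
```

For $u = 0.66$ this gives a value of $x$ a little above 4.5, which is consistent with a value read by eye from a graph of $F(x)$.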
(iii)(c) Using the graph
[Figure: graph of $F(x)$ against $x$ for $0 \le x \le 20$, with a horizontal line at $u = 0.66$ meeting the curve at $x = 4.5$.]
When $u = 0.66$, $x = 4.5$.
So, the simulated value of $x$ is 4.5. [1]
2.7
(i) Poisson
The number of claims incurred by each policyholder follows the Poisson distribution with mean 0.2. Therefore the number of claims for the 20 policyholders follows the $Poi(4)$ distribution.
Since the Poisson distribution only takes integer values, $P(X < 8) = P(X \le 7)$. Using the Poisson cumulative probability tables gives 0.94887.
Alternatively, we could use $P(X = x) = \frac{\lambda^x}{x!}e^{-\lambda}$ to calculate the values of $P(X = 0)$, $P(X = 1)$, ..., $P(X = 7)$ (the iterative formula would speed up this process), and then add them up.
(ii) Negative binomial
We are counting the number of trials up to and including the 4th success. This describes the Type 1 negative binomial distribution with $k = 4$ and $p = 0.3$.
$P(X = x) = \binom{x-1}{3} 0.3^4 \, 0.7^{x-4}, \quad x = 4, 5, 6, \ldots$
So $P(X < 8) = P(X = 4) + \cdots + P(X = 7)$.
$P(X = 4) = \binom{3}{3} 0.3^4 = 0.0081$
Now using the iterative formula $P(X = x) = \frac{x-1}{x-k} \, q \, P(X = x-1)$, we get:
$P(X = 5) = \frac{4}{1} \times 0.7 \times 0.0081 = 0.02268$
$P(X = 6) = \frac{5}{2} \times 0.7 \times 0.02268 = 0.03969$
$P(X = 7) = \frac{6}{3} \times 0.7 \times 0.03969 = 0.05557$
Hence, $P(X < 8) = 0.0081 + 0.02268 + 0.03969 + 0.05557 = 0.12604$.
Alternatively, we could have calculated each of the probabilities using the probability function.
(iii) Binomial
Here we have the binomial distribution with $n = 500$ and $p = 0.01$. Since $n$ is large and $p$ is small we could use a Poisson approximation and then use the cumulative Poisson tables (as we did in part (i)).
$Bin(500, 0.01) \approx Poi(5)$ (approximately)
Using the cumulative Poisson tables gives $P(X < 8) = P(X \le 7) = 0.86663$.
Alternatively, we could calculate this accurately, starting with the probability of no deaths:
$P(X = 0) = \binom{500}{0} 0.99^{500} = 0.00657$
Now using the iterative formula $P(X = x) = \frac{n-x+1}{x} \times \frac{p}{q} \times P(X = x-1)$:
$P(X = 1) = \frac{500}{1} \times \frac{0.01}{0.99} \times 0.00657 = 0.03318$
$P(X = 2) = \frac{499}{2} \times \frac{0.01}{0.99} \times 0.03318 = 0.08363$
$P(X = 3) = \frac{498}{3} \times \frac{0.01}{0.99} \times 0.08363 = 0.14023$
$P(X = 4) = \frac{497}{4} \times \frac{0.01}{0.99} \times 0.14023 = 0.17600$
$P(X = 5) = \frac{496}{5} \times \frac{0.01}{0.99} \times 0.17600 = 0.17635$
$P(X = 6) = \frac{495}{6} \times \frac{0.01}{0.99} \times 0.17635 = 0.14696$
$P(X = 7) = \frac{494}{7} \times \frac{0.01}{0.99} \times 0.14696 = 0.10476$
Hence, $P(X < 8) = P(X = 0) + \cdots + P(X = 7) = 0.86768$.
(iv) Geometric
We are counting the number of trials up to, but not including, the 1st success. This describes the Type 2 geometric distribution with $p = 0.01$.
$P(X = x) = 0.01 \times 0.99^x, \quad x = 0, 1, 2, \ldots$
Now:
$P(X < 8) = P(X = 0) + \cdots + P(X = 7) = 0.01 + 0.01 \times 0.99 + 0.01 \times 0.99^2 + \cdots + 0.01 \times 0.99^7$
This is a geometric series, so the quickest way to add this up is to use the formula for the sum of a geometric series $S_n = \frac{a(1-r^n)}{1-r}$. This gives:
$P(X < 8) = \frac{0.01(1 - 0.99^8)}{1 - 0.99} = 0.07726$
2.8
Let $X$ denote the random variable.
Using the formulae for the mean and variance of a lognormal distribution:
$E[X] = e^{\mu + \frac{1}{2}\sigma^2} = 10$ (1)
$var(X) = e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right) = 4$ (2)
Squaring equation (1) and substituting into equation (2):
$var(X) = 10^2\left(e^{\sigma^2} - 1\right) = 4 \Rightarrow e^{\sigma^2} = 1.04 \Rightarrow \sigma^2 = \log 1.04 = 0.03922$
Substituting this into equation (1) gives:
$\mu = \log 10 - \frac{1}{2}\sigma^2 = 2.2830$
So the required probability is:
$P(7.5 < X < 12.5) = P(X < 12.5) - P(X < 7.5)$
$= P(\ln X < \ln 12.5) - P(\ln X < \ln 7.5)$
$= P\left(Z < \frac{\log 12.5 - 2.2830}{\sqrt{0.03922}}\right) - P\left(Z < \frac{\log 7.5 - 2.2830}{\sqrt{0.03922}}\right)$
$= \Phi(1.226) - \Phi(-1.354)$
$= 0.88990 - 0.08787 = 0.8020$
2.9
The conditional probability is:
$P(N = 1 \mid N \ge 1) = \frac{P(N = 1)}{P(N \ge 1)} = \frac{\lambda e^{-\lambda}}{1 - e^{-\lambda}} = \frac{\lambda}{e^{\lambda} - 1}$
Trial and error gives $\frac{1.62}{e^{1.62} - 1} = 0.3997$. So $\lambda = 1.62$.
2.10
To simulate a random variable we require the distribution function, $F(x)$:
$F(x) = P(X \le x) = \int_0^x 50(5+t)^{-3} \, dt = \left[-25(5+t)^{-2}\right]_0^x = 1 - \frac{25}{(5+x)^2}$
We can now use the inverse transform method:
$u = 1 - \frac{25}{(5+x)^2} \Rightarrow x = \sqrt{\frac{25}{1-u}} - 5$
Substituting in our values of $u$, we obtain:
$x_1 = \sqrt{\frac{25}{1 - 0.863}} - 5 = 8.51$
$x_2 = \sqrt{\frac{25}{1 - 0.447}} - 5 = 1.72$
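The inversion in 2.10 is easy to check numerically. In the sketch below the function name simulate_2_10 is hypothetical; the formula is the one obtained by rearranging $u = 1 - 25/(5+x)^2$ for $x$.

```python
import math


def simulate_2_10(u):
    """Inverse transform for f(x) = 50 / (5 + x)^3, x > 0."""
    return math.sqrt(25.0 / (1.0 - u)) - 5.0


print(round(simulate_2_10(0.863), 2), round(simulate_2_10(0.447), 2))
# → 8.51 1.72
```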
2.11
(i) Probability
$P(X > 5,000) = 1 - F(5,000) = e^{-0.001 \times 5,000} = e^{-5} = 0.00674$ [1]
(ii) Conditional probability
$P(X > 5,000 \mid X > 1,000) = \frac{P(X > 5,000 \cap X > 1,000)}{P(X > 1,000)} = \frac{P(X > 5,000)}{P(X > 1,000)}$ [1]
We have already found the numerator, we just need to find the denominator:
$P(X > 1,000) = 1 - F(1,000) = e^{-0.001 \times 1,000} = e^{-1}$
So the required probability is:
$P(X > 5,000 \mid X > 1,000) = \frac{e^{-5}}{e^{-1}} = e^{-4} = 0.0183$ [1]
2.12
(a) Poisson
The $Poi(\lambda)$ distribution has mean $\lambda$ and variance $\lambda$, so:
coefficient of variation $= \frac{\sqrt{\lambda}}{\lambda} = \frac{1}{\sqrt{\lambda}}$ [1]
As the mean, $\lambda$, increases the coefficient of variation, $\frac{1}{\sqrt{\lambda}}$, decreases. [1]
(b) Exponential
We are given the mean of the exponential distribution, which is $\mu = \frac{1}{\lambda}$. So working in terms of $\mu$ the mean is $\mu$ and the variance is $\mu^2$. Hence:
coefficient of variation $= \frac{\sqrt{\mu^2}}{\mu} = 1$ [1]
As the mean, $\mu$, increases the coefficient of variation, 1, is unchanged, ie changing the mean has no effect on the coefficient. [1]
(c) Chi-square
The $\chi^2_n$ distribution has a mean of $n$ and a variance of $2n$. Hence:
coefficient of variation $= \frac{\sqrt{2n}}{n} = \sqrt{\frac{2}{n}}$ [1]
As the mean, $n$, increases the coefficient of variation, $\sqrt{2/n}$, decreases. [1]
2.13
(a) Method for simulating an observation
To simulate a value from a discrete distribution, we follow these two steps:
1. Calculate the DF, $F(n)$.
2. If $F(n-1) < r \le F(n)$, then the simulated value is $n$. [1]
The CDF is:
$n$:            0     1     2     3
$P(N \le n)$:   0.55  0.8   0.95  1
So the simulated value is given by:
$n = 0$ if $0 \le r \le 0.55$
$n = 1$ if $0.55 < r \le 0.8$
$n = 2$ if $0.8 < r \le 0.95$
$n = 3$ if $0.95 < r \le 1$ [1]
(b) Simulating three values
Since $0.55 < 0.6221 \le 0.8$, the first simulated value is 1. Since $0 < 0.1472 \le 0.55$, the second simulated value is 0. Since $0.95 < 0.9862 \le 1$, the third simulated value is 3. [2]
2.14
(i) Probability of no claims in one year
The distribution of the number of claims, $N$, in one year is $Poi(0.5)$. Hence the probability of no claims in one year is:
$P(N = 0) = \frac{0.5^0}{0!}e^{-0.5} = 0.60653$ [1]
(ii) Probability of no claims in two of three years
Using our result from part (i), the probability of one or more claims in one year is:
$P(N \ge 1) = 1 - P(N = 0) = 1 - 0.60653 = 0.39347$
If $X$ is the number of years with one or more claim, then:
$X \sim Bin(3, 0.39347)$
So we have:
$P(X = 1) = \binom{3}{1} \times 0.39347 \times 0.60653^2 = 0.43425$ [2]
(iii) Probability that more than two years will elapse before the next claim
The waiting time, $T$, in years follows the $Exp(0.5)$ distribution.
$P(T > 2) = 1 - F(2) = e^{-0.5 \times 2} = e^{-1} = 0.36788$ [2]
2.15
In this question we will use the notation $F_{a,b,\alpha}$ to be the upper $\alpha\%$ point of the $F_{a,b}$ distribution.
(a) Statement (a), true or false?
We know that $F_{6,12,95} = \frac{1}{F_{12,6,5}}$. From the Tables, $\frac{1}{F_{12,6,5}} = \frac{1}{4.000} = 0.250$, so (a) is true. [1]
(b) Statement (b), true or false?
From the Tables, $F_{6,12,1} = 4.821$, so (b) is true. [1]
(c) Statement (c), true or false?
We know that $F_{6,12,99} = \frac{1}{F_{12,6,1}}$. From the Tables, $\frac{1}{F_{12,6,1}} = \frac{1}{7.718} = 0.13$, so (c) is true. [1]
CS1-03: Generating functions
Generating functions
Syllabus objectives
1.4 Generating functions
1.4.1 Define and determine the moment generating function of random variables.
1.4.2 Define and determine the cumulant generating function of random variables.
1.4.3 Use generating functions to determine the moments and cumulants of random variables, by expansion as a series or by differentiation, as appropriate.
1.4.4 Identify the applications for which a moment generating function, a cumulant generating function and cumulants are used, and the reasons why they are used.
0 Introduction
Generating functions provide a neat way of working out various properties of probability distributions without having to use integration repeatedly. For example, they can be used to:
(a) find the mean, variance and higher moments of a probability distribution. This will recap and build upon the work of the previous chapter.
(b) find the distribution of a linear combination of independent random variables, eg $X + Y$ where $X \sim Poi(\lambda)$ and $Y \sim Poi(\mu)$. This will be covered in a later chapter.
(c) determine the properties of compound distributions. We will meet these in Subject CS2.
In this chapter we will introduce two types of generating functions: moment generating functions (MGFs) and cumulant generating functions (CGFs). We will use them to derive formulae for the moments of statistical distributions. MGFs are used to generate moments (and so are the most useful to us at this point) and CGFs are used to generate cumulants. For our present purposes, all we need to know is that the first three cumulants are the mean, variance and skewness.
A lot of students get confused in this chapter, as they want to know where these definitions come from. Basically, they were invented to make calculations of means and variances easier.
We saw some examples in the previous chapter of how to calculate the mean and variance for different well-known probability distributions. In this chapter we will see how we can use generating functions to derive many of these results.
The syllabus says "define and determine", so make sure you know the definitions of MGFs and CGFs and can find them (where they exist) for all the distributions met in the previous chapter. In addition, the syllabus requires us to determine the moments and cumulants, so ensure you can calculate $E(X)$ and $var(X)$ for each of these distributions.
1 Moment generating functions
1.1 General formula
A moment generating function (MGF) can be used to generate moments (about the origin) of the distribution of a random variable (discrete or continuous), ie $E(X), E(X^2), E(X^3), \ldots$.
Although the moments of most distributions can be determined directly by evaluation using the necessary integrals or summation, utilising moment generating functions sometimes provides considerable simplifications.
Definition
The moment generating function, $M_X(t)$, of a random variable $X$ is given by:
$M_X(t) = E\left[e^{tX}\right]$
for all values of $t$ for which the expectation exists.
MGFs can be defined for both discrete and continuous random variables.
Question
Write down the value of $M_X(0)$.
Solution
$M_X(0) = E\left[e^0\right] = E[1] = 1$
This is true for any random variable $X$.
This can be a useful check in the exam. Make sure that the expression you obtain for the MGF gives 1 when $t = 0$.
We have defined the expectation of a function of a random variable, $g(X)$, to be $E[g(X)] = \sum_x g(x)P(X = x)$ or $\int_x g(x)f_X(x) \, dx$. So the MGF is given by:
$M_X(t) = E\left[e^{tX}\right] = \sum_x e^{tx}P(X = x)$ or $\int_x e^{tx}f_X(x) \, dx$
Question
Derive the MGF of the random variable $X$ with probability function:
$P(X = x) = \frac{3}{4^x}, \quad x = 1, 2, 3, \ldots$
Solution
The MGF is:
$M_X(t) = E\left(e^{tX}\right) = \sum_{x=1}^{\infty} e^{tx}\frac{3}{4^x} = \frac{3}{4}e^t + \frac{3}{16}e^{2t} + \frac{3}{64}e^{3t} + \cdots$
This is an infinite geometric series with first term $a = \frac{3}{4}e^t$ and common ratio $r = \frac{1}{4}e^t$. Summing this geometric series to infinity using the formula $S_\infty = \frac{a}{1-r}$ for $-1 < r < 1$ gives:
$M_X(t) = \frac{\frac{3}{4}e^t}{1 - \frac{1}{4}e^t} = \frac{3e^t}{4 - e^t}$
where $-1 < \frac{1}{4}e^t < 1 \Rightarrow -4 < e^t < 4 \Rightarrow t < \ln 4$.
Now let's derive the MGF of a continuous random variable.
Question
Derive the MGF of the random variable $X$ with probability density function:
$f(x) = \frac{1}{2}(1 - x), \quad -1 \le x \le 1$
Solution
The MGF is:
$M_X(t) = E\left(e^{tX}\right) = \int_x e^{tx}f(x) \, dx = \int_{-1}^{1} \frac{1}{2}(1-x)e^{tx} \, dx = \frac{1}{2}\int_{-1}^{1} e^{tx} \, dx - \frac{1}{2}\int_{-1}^{1} xe^{tx} \, dx$
Using integration by parts on the second integral we obtain:
$M_X(t) = \frac{1}{2}\left[\frac{1}{t}e^{tx}\right]_{-1}^{1} - \frac{1}{2}\left\{\left[\frac{1}{t}xe^{tx}\right]_{-1}^{1} - \int_{-1}^{1} \frac{1}{t}e^{tx} \, dx\right\}$
$= \frac{1}{2t}\left(e^t - e^{-t}\right) - \frac{1}{2t}\left(e^t + e^{-t}\right) + \frac{1}{2t^2}\left(e^t - e^{-t}\right)$
$= -\frac{1}{t}e^{-t} + \frac{1}{2t^2}\left(e^t - e^{-t}\right)$
This is known as the triangular distribution. Try sketching the PDF to see why.
In a moment we'll look at how to obtain the MGFs of the standard distributions given in the previous chapter, but first let's find out how we can use MGFs to calculate moments.
The method is to differentiate the MGF with respect
derivative giving the r th moment about the origin.
For example,
MtX
()
'
Xe []tXE
so
=
(0)
=
to t
and then
set 0t = , the r th
MEX'
()X .
Similarly:
M
'' ()
t
=
[EX etX]
?
M
'' (0)
XX
=
22 ]
[EX
M
'''()t
=
[EX etX ]
?
M
'''(0)
XX
=
33
[EX
]
etc
Question
Calculate the mean and variance of a random variable, $X$, with MGF given by:
$M_X(t) = \left(1 - \frac{t}{5}\right)^{-1}, \quad t < 5$
Solution
Differentiating the MGF:
$M'_X(t) = \frac{1}{5}\left(1 - \frac{t}{5}\right)^{-2} \Rightarrow E(X) = M'_X(0) = \frac{1}{5}$
$M''_X(t) = \frac{2}{25}\left(1 - \frac{t}{5}\right)^{-3} \Rightarrow E(X^2) = M''_X(0) = \frac{2}{25}$
So:
$var(X) = E(X^2) - [E(X)]^2 = \frac{2}{25} - \left(\frac{1}{5}\right)^2 = \frac{1}{25}$
We now look at an alternative method that uses a series expansion of the MGF. Although it might appear to be long-winded, it can be useful if differentiation is particularly complicated.
Expanding the exponential function and taking expected values throughout (a procedure which is justifiable for the distributions here) gives:
$$M_X(t) = E(e^{tX}) = E\left[1 + tX + \frac{t^2}{2!}X^2 + \frac{t^3}{3!}X^3 + \cdots\right] = 1 + tE[X] + \frac{t^2}{2!}E[X^2] + \frac{t^3}{3!}E[X^3] + \cdots$$
from which it is seen that the r th moment of the distribution about the origin, $E[X^r]$, is obtainable as the coefficient of $\frac{t^r}{r!}$ in the power series expansion of the MGF.
To use this method to find moments, we need to obtain a series expansion of the MGF. We then equate the coefficients of the powers of t with the above expression.
Question
Use a series expansion to derive $E(X)$, $E(X^2)$ and $E(X^3)$, where the MGF of X is given by:
$$M_X(t) = \left(1 - \frac{t}{5}\right)^{-1}, \quad t < 5$$
Solution
Using the binomial expansion given on page 2 of the Tables:
$$M_X(t) = \left(1 - \frac{t}{5}\right)^{-1} = 1 + (-1)\left(-\frac{t}{5}\right) + \frac{(-1)(-2)}{2!}\left(-\frac{t}{5}\right)^2 + \frac{(-1)(-2)(-3)}{3!}\left(-\frac{t}{5}\right)^3 + \cdots$$
$$= 1 + \frac{1}{5}t + \frac{1}{25}t^2 + \frac{1}{125}t^3 + \cdots$$
The MGF can also be written as:
$$M_X(t) = 1 + tE(X) + \frac{t^2}{2!}E(X^2) + \frac{t^3}{3!}E(X^3) + \cdots$$
Equating the coefficients gives:
$$E(X) = \frac{1}{5}$$
$$\frac{1}{2!}E(X^2) = \frac{1}{25} \Rightarrow E(X^2) = \frac{2}{25}$$
$$\frac{1}{3!}E(X^3) = \frac{1}{125} \Rightarrow E(X^3) = \frac{6}{125}$$
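The coefficient-matching step can be sketched in code: if the power series of the MGF has coefficient c_r on t^r, then E[X^r] = r! × c_r. The helper name below is hypothetical, and exact fractions are used so that the results can be compared with the values above.

```python
from fractions import Fraction
from math import factorial

def moments_from_series(coeffs):
    """Given coefficients c_r of M_X(t) = sum of c_r * t^r, return E[X^r] = r! * c_r."""
    return [factorial(r) * c for r, c in enumerate(coeffs)]

# Coefficients of (1 - t/5)^(-1) = 1 + t/5 + t^2/25 + t^3/125 + ...
coeffs = [Fraction(1, 5**r) for r in range(4)]
moments = moments_from_series(coeffs)

mean = moments[1]
variance = moments[2] - moments[1] ** 2
print(moments[1:], variance)
```

The variance computed this way agrees with the value 1/25 obtained earlier by differentiation.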
If we differentiate the series expansion for the MGF with respect to t and then substitute $t = 0$, this gives $M_X'(0) = E(X)$, $M_X''(0) = E(X^2)$, ... as before.
$$M_X(t) = 1 + tE(X) + \frac{t^2}{2!}E(X^2) + \frac{t^3}{3!}E(X^3) + \cdots$$
$$M_X'(t) = E(X) + tE(X^2) + \frac{t^2}{2!}E(X^3) + \cdots \Rightarrow M_X'(0) = E(X)$$
$$M_X''(t) = E(X^2) + tE(X^3) + \cdots \Rightarrow M_X''(0) = E(X^2)$$
etc
The uniqueness property of MGFs
If the distribution of a random variable X is known, in theory at least, all moments of the distribution that exist can be calculated. If the moments are specified, then the distribution can be identified. Without going deeply into mathematical rigour, it can in fact be said that if all moments of a random variable exist (and if they satisfy a certain convergence condition) then the sequence of moments uniquely determines the distribution of X.
Further, if a moment generating function has been found, then there is a unique distribution with that MGF. Thus an MGF can be recognised as the MGF of a particular distribution. (There is a one-to-one correspondence between MGFs and distributions with MGFs).
This uniqueness property will be used in a number of proofs in future chapters.
Question
A random variable, X, has MGF given by $M_X(t) = \exp\{5t + 3t^2\}$.
Use the MGFs listed in the Tables and the uniqueness property to identify the distribution of X.
Solution
Examining the MGFs given in the Tables we want one that involves an exponential term. The normal distribution has the following MGF:
$$M_X(t) = \exp\{\mu t + \tfrac{1}{2}\sigma^2 t^2\}$$
Equating coefficients, we see that $\mu = 5$ and $\sigma^2 = 6$. Since X has the same MGF as the $N(5,6)$ distribution, the uniqueness property tells us that $X \sim N(5,6)$.
We can also identify a distribution by the series expansion of its MGF.
Question
Identify the continuous distribution for which $E[X^k] = \frac{k!}{\lambda^k}$, where $k = 1, 2, 3, \ldots$ and $\lambda > 0$.
Solution
The moment generating function of X is:
$$M_X(t) = 1 + tE[X] + \frac{t^2}{2!}E[X^2] + \frac{t^3}{3!}E[X^3] + \cdots$$
Substituting in the values of the moments given:
$$M_X(t) = 1 + t\,\frac{1}{\lambda} + \frac{t^2}{2!}\,\frac{2!}{\lambda^2} + \frac{t^3}{3!}\,\frac{3!}{\lambda^3} + \cdots = 1 + \frac{t}{\lambda} + \frac{t^2}{\lambda^2} + \frac{t^3}{\lambda^3} + \cdots$$
This is $\left(1 - \frac{t}{\lambda}\right)^{-1}$. By comparing this to standard MGFs, we can see that the distribution is exponential with parameter $\lambda$.
1.2 Important examples - discrete distributions
The MGFs for some of the distributions introduced earlier are found as follows.
Discrete uniform
The probability function for the discrete uniform distribution on the integers 1, 2, ..., k is:
$$P(X = x) = \frac{1}{k}, \quad x = 1, 2, 3, \ldots, k$$
So the MGF is:
$$M_X(t) = E(e^{tX}) = \frac{1}{k}\left(e^{t} + e^{2t} + \cdots + e^{kt}\right) = \frac{e^{t}(1 - e^{kt})}{k(1 - e^{t})} \quad \text{for } t \ne 0$$
Binomial (n, p) (including Bernoulli, for which n = 1)
The probability function for the $Bin(n,p)$ distribution is:
$$P(X = x) = \binom{n}{x}p^x(1-p)^{n-x}, \quad x = 0, 1, 2, \ldots, n$$
So the moment generating function is:
$$M_X(t) = \sum_{x=0}^{n}\binom{n}{x}(pe^t)^x q^{n-x} = (q + pe^t)^n$$
where $q = 1 - p$.
Negative binomial (k, p) (including geometric, for which k = 1)
The probability function is:
$$P(X = x) = \binom{x-1}{k-1}p^k q^{x-k}, \quad x = k, k+1, k+2, \ldots$$
So the MGF is:
$$M_X(t) = \sum_{x=k}^{\infty} e^{tx}\binom{x-1}{k-1}p^k q^{x-k} = (pe^t)^k\sum_{x=k}^{\infty}\binom{x-1}{k-1}(qe^t)^{x-k} = (pe^t)^k(1 - qe^t)^{-k} = \left[\frac{pe^t}{1 - qe^t}\right]^k$$
Note: The summation is valid for $qe^t < 1$, ie for $t < \ln(1/q)$.
Hypergeometric
MGF not used.
Poisson ($\lambda$)
The probability function is:
$$P(X = x) = \frac{\lambda^x \exp(-\lambda)}{x!}, \quad x = 0, 1, 2, 3, \ldots$$
So the MGF is:
$$M_X(t) = \sum_{x=0}^{\infty} e^{tx}\,\frac{\lambda^x e^{-\lambda}}{x!} = e^{-\lambda}\sum_{x=0}^{\infty}\frac{(\lambda e^t)^x}{x!} = e^{-\lambda}e^{\lambda e^t} = e^{\lambda(e^t - 1)}$$
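The Poisson result can be checked numerically by summing the defining series term by term and comparing with the closed form. This is only a sketch: the function names, the parameter values and the truncation point are assumptions made for illustration.

```python
import math

def poisson_mgf_direct(lam, t, terms=200):
    """E[e^{tX}] for X ~ Poisson(lam), summed term by term.
    Each term is built from the previous one to avoid large factorials."""
    term = math.exp(-lam)  # the x = 0 term
    total = term
    for x in range(1, terms):
        term *= lam * math.exp(t) / x
        total += term
    return total

def poisson_mgf_closed(lam, t):
    """Closed form exp(lam * (e^t - 1))."""
    return math.exp(lam * (math.exp(t) - 1))

print(poisson_mgf_direct(3.0, 0.5), poisson_mgf_closed(3.0, 0.5))
```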
1.3 Important examples - continuous variables
We will now look at how to calculate the MGF of some standard continuous distributions. Here we will be integrating to obtain the MGF.
Uniform (a, b)
Multiplying the PDF by $e^{tx}$ and integrating:
$$M_X(t) = \int_a^b e^{tx}\,\frac{1}{b-a}\,dx = \frac{e^{bt} - e^{at}}{(b-a)t}$$
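Gamma ($\alpha$, $\lambda$)
Integrate $e^{tx}f(x)$ from 0 to $\infty$. This gives:
$$M_X(t) = \int_0^\infty e^{tx}\,\frac{\lambda^\alpha}{\Gamma(\alpha)}x^{\alpha-1}e^{-\lambda x}\,dx = \frac{\lambda^\alpha}{\Gamma(\alpha)}\int_0^\infty x^{\alpha-1}e^{-(\lambda - t)x}\,dx$$
Writing out the integral and substituting $y = (\lambda - t)x$, so that $\frac{dy}{dx} = \lambda - t$, we have:
$$M_X(t) = \frac{\lambda^\alpha}{\Gamma(\alpha)}\int_0^\infty\left(\frac{y}{\lambda - t}\right)^{\alpha-1}e^{-y}\,\frac{dy}{\lambda - t} = \frac{\lambda^\alpha}{\Gamma(\alpha)(\lambda - t)^\alpha}\int_0^\infty y^{\alpha-1}e^{-y}\,dy$$
$$= \frac{\lambda^\alpha}{\Gamma(\alpha)(\lambda - t)^\alpha}\,\Gamma(\alpha) = \left(\frac{\lambda}{\lambda - t}\right)^\alpha$$
In the second line we've used the definition of the gamma function, which is given on page 5 of the Tables. We now see that:
$$M_X(t) = \left(\frac{\lambda}{\lambda - t}\right)^\alpha = \left(1 - \frac{t}{\lambda}\right)^{-\alpha}$$
This formula only holds when $t < \lambda$. It is given on page 12 of the Tables.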
Question
Describe what happens if we try to evaluate $E(e^{tX})$ for the gamma distribution when $t \ge \lambda$.
Solution
$$M_X(t) = E(e^{tX}) = \int_0^\infty\frac{\lambda^\alpha}{\Gamma(\alpha)}x^{\alpha-1}e^{-(\lambda - t)x}\,dx$$
If $t \ge \lambda$, then the power in the exponential factor in the integral is non-negative, so the integrand does not decay as x increases and therefore the answer is infinite. So the MGF does not exist in this case.
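From this:
$$M_X'(t) = \alpha\lambda^\alpha(\lambda - t)^{-\alpha-1} \quad \text{so} \quad E[X] = M_X'(0) = \frac{\alpha}{\lambda}$$
$$M_X''(t) = \alpha(\alpha+1)\lambda^\alpha(\lambda - t)^{-\alpha-2} \quad \text{so} \quad E[X^2] = M_X''(0) = \frac{\alpha(\alpha+1)}{\lambda^2}$$
Hence:
$$\mu = \frac{\alpha}{\lambda}, \quad \sigma^2 = \frac{\alpha(\alpha+1)}{\lambda^2} - \left(\frac{\alpha}{\lambda}\right)^2 = \frac{\alpha}{\lambda^2}$$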
It follows that the MGF of the exponential distribution with mean $\mu$ is given by:
$$M_X(t) = (1 - \mu t)^{-1}$$
Remember that the exponential distribution is a special case of the gamma distribution when $\alpha = 1$. The mean is $\mu = \frac{1}{\lambda}$.
Note: The MGF of the chi-square $\chi^2_\nu$ distribution is given by $M_X(t) = (1 - 2t)^{-\nu/2}$.
Question
Show that this is true.
Solution
$\chi^2_\nu$ is gamma with $\alpha = \frac{\nu}{2}$ and $\lambda = \frac{1}{2}$. So it has moment generating function:
$$M_X(t) = \left(\frac{\lambda}{\lambda - t}\right)^\alpha = \left(\frac{\frac{1}{2}}{\frac{1}{2} - t}\right)^{\nu/2} = \left(\frac{1}{1 - 2t}\right)^{\nu/2} = (1 - 2t)^{-\nu/2}$$
Normal ($\mu$, $\sigma^2$)
The two crucial steps in evaluating the integral to obtain the MGF for the normal distribution are (i) completing the square in the exponent, and (ii) recognising that the resulting integral is simply that of a normal density and hence equal to 1. The derivation is not given in the Core Reading, but is covered in the next question.
The result is:
$$M_X(t) = \exp\left(\mu t + \tfrac{1}{2}\sigma^2 t^2\right)$$
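Question
Prove this result.
Solution
The moment generating function of the $N(\mu, \sigma^2)$ distribution is given by:
$$M_X(t) = \int_{-\infty}^{\infty} e^{tx}\,\frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right\}dx$$
First we need to complete the square:
$$M_X(t) = \int_{-\infty}^{\infty}\frac{1}{\sigma\sqrt{2\pi}}\exp\left\{tx - \frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right\}dx$$
$$= \int_{-\infty}^{\infty}\frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{1}{2\sigma^2}\left(x^2 - 2\mu x + \mu^2 - 2\sigma^2 tx\right)\right\}dx$$
$$= \int_{-\infty}^{\infty}\frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{1}{2\sigma^2}\left(x^2 - 2(\mu + \sigma^2 t)x + \mu^2\right)\right\}dx$$
$$= \int_{-\infty}^{\infty}\frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{1}{2\sigma^2}\left[\left(x - (\mu + \sigma^2 t)\right)^2 - (\mu + \sigma^2 t)^2 + \mu^2\right]\right\}dx$$
$$= \int_{-\infty}^{\infty}\frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{1}{2\sigma^2}\left(x - (\mu + \sigma^2 t)\right)^2\right\}\exp\left\{-\frac{1}{2\sigma^2}\left(-2\mu\sigma^2 t - \sigma^4 t^2\right)\right\}dx$$
Since the second factor in the integral does not depend on x, we can take it outside the integral:
$$M_X(t) = \exp\left\{-\frac{1}{2\sigma^2}\left(-2\mu\sigma^2 t - \sigma^4 t^2\right)\right\}\int_{-\infty}^{\infty}\frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{1}{2\sigma^2}\left(x - (\mu + \sigma^2 t)\right)^2\right\}dx$$
$$= \exp\left\{\mu t + \tfrac{1}{2}\sigma^2 t^2\right\}\int_{-\infty}^{\infty}\frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{1}{2}\left(\frac{x - (\mu + \sigma^2 t)}{\sigma}\right)^2\right\}dx$$
The function now being integrated is the PDF of the normal distribution with mean $\mu + \sigma^2 t$ and standard deviation $\sigma$. So the integral is 1, and hence:
$$M_X(t) = \exp\left\{\mu t + \tfrac{1}{2}\sigma^2 t^2\right\}$$
as required.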
We can obtain the moments of the normal distribution from the MGF.
Expanding:
$$M_X(t) = 1 + \left(\mu t + \tfrac{1}{2}\sigma^2 t^2\right) + \frac{\left(\mu t + \tfrac{1}{2}\sigma^2 t^2\right)^2}{2!} + \cdots$$
$E[X]$ = coefficient of $t$ = $\mu$ (confirming that the parameter $\mu$ does indeed represent the mean).
$E[X^2]$ = coefficient of $\frac{t^2}{2!}$ = $\sigma^2 + \mu^2$, so $\text{var}[X] = \sigma^2 + \mu^2 - \mu^2 = \sigma^2$ (confirming that the parameter $\sigma$ does indeed represent the standard deviation).
Alternatively, we could differentiate the MGF to obtain the mean and variance. However, the series method is actually quicker in this case.
By setting $\mu = 0$ and $\sigma^2 = 1$, we can see that:
The standard normal random variable Z has MGF:
$$M_Z(t) = \exp\left(\tfrac{1}{2}t^2\right) = 1 + \tfrac{1}{2}t^2 + \frac{\left(\tfrac{1}{2}t^2\right)^2}{2!} + \cdots$$
Hence $E[Z] = 0$, $E[Z^2] = 1$, $E[Z^3] = 0$, $E[Z^4] = 3$ (coefficient of $t^4/4!$), ...
Now $X = \mu + \sigma Z$, and it follows that $E[(X - \mu)^3] = 0$ and $E[(X - \mu)^4] = 3\sigma^4$.
Remember that $E[(X - \mu)^3]$ is the skewness. We stated earlier that the normal distribution is symmetrical, hence we expect $E[(X - \mu)^3]$ to be zero. This has now been proved.
Question
In this last result, we have used the fact that if we standardise a normal random variable X by setting $Z = \frac{X - \mu}{\sigma}$, then Z has the standard normal distribution. Use moment generating functions to show that this is true.
Solution
The MGF of $Z = \frac{X - \mu}{\sigma}$ is:
$$M_Z(t) = E[e^{tZ}] = E\left[e^{t\left(\frac{X - \mu}{\sigma}\right)}\right] = e^{-\frac{\mu t}{\sigma}}E\left[e^{\frac{t}{\sigma}X}\right] = e^{-\frac{\mu t}{\sigma}}M_X\left(\frac{t}{\sigma}\right)$$
Using the formula for the MGF of the $N(\mu, \sigma^2)$ distribution gives:
$$M_Z(t) = e^{-\frac{\mu t}{\sigma}}\,e^{\mu\frac{t}{\sigma} + \frac{1}{2}\sigma^2\left(\frac{t}{\sigma}\right)^2} = e^{\frac{1}{2}t^2}$$
which we recognise as the MGF of $N(0,1)$. So, using the uniqueness property of MGFs, we can conclude that $\frac{X - \mu}{\sigma}$ follows the standard normal distribution.
The MGFs do not exist in closed form for the Beta and lognormal distributions. Hence, they are excluded from this section.
2 Cumulant generating functions
For many random variables the cumulant generating function (CGF) is easier to use than the MGF in evaluating the mean and variance.
Definition
The cumulant generating function, $C_X(t)$, of a random variable X is given by:
$$C_X(t) = \ln M_X(t)$$
We can treat this as the definition of the CGF.
Question
The MGF of the $Bin(n,p)$ distribution is given by:
$$M_X(t) = (q + pe^t)^n$$
State the CGF of the $Bin(n,p)$ distribution.
Solution
$$C_X(t) = \ln M_X(t) = \ln(q + pe^t)^n = n\ln(q + pe^t)$$
As a result, if $C_X(t)$ is known, it is easy to determine $M_X(t)$. We have $M_X(t) = e^{C_X(t)}$.
Calculating moments
The first three derivatives of $C_X(t)$ evaluated at $t = 0$ give the mean, variance and skewness of X directly.
These results can be proved as follows:
$$C_X'(t) = \frac{M_X'(t)}{M_X(t)}$$
$$C_X''(t) = \frac{M_X''(t)M_X(t) - (M_X'(t))^2}{(M_X(t))^2}$$
and
$$C_X'''(t) = \frac{M_X'''(t)(M_X(t))^3 - 3(M_X(t))^2 M_X'(t)M_X''(t) + 2M_X(t)(M_X'(t))^3}{(M_X(t))^4}$$
Now $M_X(0) = 1$, so:
$$C_X'(0) = \frac{M_X'(0)}{M_X(0)} = \frac{E[X]}{1} = E[X]$$
$$C_X''(0) = \frac{M_X''(0)M_X(0) - (M_X'(0))^2}{(M_X(0))^2} = \frac{E[X^2](1) - (E[X])^2}{1^2} = \text{var}[X]$$
and
$$C_X'''(0) = \frac{M_X'''(0)(M_X(0))^3 - 3(M_X(0))^2 M_X'(0)M_X''(0) + 2M_X(0)(M_X'(0))^3}{(M_X(0))^4}$$
$$= \frac{E[X^3](1)^3 - 3(1)^2E[X]E[X^2] + 2(1)(E[X])^3}{1^4} = \text{skew}(X)$$
Question
State the CGF of X where $X \sim Gamma(\alpha, \lambda)$. Hence prove that $E(X) = \frac{\alpha}{\lambda}$, $\text{var}(X) = \frac{\alpha}{\lambda^2}$ and $\text{skew}(X) = \frac{2\alpha}{\lambda^3}$.
Solution
$$M_X(t) = \left(\frac{\lambda}{\lambda - t}\right)^\alpha = \left(1 - \frac{t}{\lambda}\right)^{-\alpha} \Rightarrow C_X(t) = -\alpha\ln\left(1 - \frac{t}{\lambda}\right), \quad t < \lambda$$
Differentiating with respect to t:
$$C_X'(t) = \frac{\alpha}{\lambda}\left(1 - \frac{t}{\lambda}\right)^{-1} \Rightarrow C_X'(0) = \frac{\alpha}{\lambda} = E(X)$$
$$C_X''(t) = \frac{\alpha}{\lambda^2}\left(1 - \frac{t}{\lambda}\right)^{-2} \Rightarrow C_X''(0) = \frac{\alpha}{\lambda^2} = \text{var}(X)$$
$$C_X'''(t) = \frac{2\alpha}{\lambda^3}\left(1 - \frac{t}{\lambda}\right)^{-3} \Rightarrow C_X'''(0) = \frac{2\alpha}{\lambda^3} = \text{skew}(X)$$
The coefficient of $\frac{t^r}{r!}$ in the Maclaurin series of $C_X(t) = \ln M_X(t)$ is called the r th cumulant and is denoted by $\kappa_r$.
Another method for finding the cumulants is to differentiate the CGF with respect to t and then set $t = 0$. The r th derivative then gives the r th cumulant, $\kappa_r$.
So:
$$\kappa_1 = C_X'(0), \quad \kappa_2 = C_X''(0), \quad \kappa_3 = C_X'''(0)$$
etc
Cumulants are similar to moments. The first three cumulants are the mean, the variance and the skewness.
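The relationship between the derivatives of the CGF and the first three cumulants can be illustrated numerically. The sketch below estimates the derivatives at t = 0 by central finite differences for a gamma CGF; the parameter values (alpha = 3, lambda = 2), the step size and the function names are illustrative assumptions.

```python
import math

def gamma_cgf(t, alpha, lam):
    """C_X(t) = -alpha * ln(1 - t/lam), defined for t < lam."""
    return -alpha * math.log(1 - t / lam)

def first_three_cumulants(cgf, h=1e-2):
    """Central finite-difference estimates of C'(0), C''(0) and C'''(0)."""
    k1 = (cgf(h) - cgf(-h)) / (2 * h)
    k2 = (cgf(h) - 2 * cgf(0.0) + cgf(-h)) / h**2
    k3 = (cgf(2 * h) - 2 * cgf(h) + 2 * cgf(-h) - cgf(-2 * h)) / (2 * h**3)
    return k1, k2, k3

alpha, lam = 3.0, 2.0
k1, k2, k3 = first_three_cumulants(lambda t: gamma_cgf(t, alpha, lam))
print(k1, alpha / lam)           # estimate of the mean vs alpha/lambda
print(k2, alpha / lam**2)        # estimate of the variance vs alpha/lambda^2
print(k3, 2 * alpha / lam**3)    # estimate of the skewness vs 2*alpha/lambda^3
```

The estimates are approximate (the error shrinks with the step size h), so they should be read as a sanity check rather than as exact values.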
Question
By using the CGF of the $Poi(\mu)$ distribution, derive the 2nd, 3rd and 4th cumulants.
Solution
For the Poisson distribution:
$$M_X(t) = e^{\mu(e^t - 1)} \Rightarrow C_X(t) = \ln M_X(t) = \mu(e^t - 1)$$
Differentiating and setting $t = 0$ we obtain:
$$C_X'(t) = \mu e^t$$
$$C_X''(t) = \mu e^t \Rightarrow \kappa_2 = C_X''(0) = \mu e^0 = \mu$$
$$C_X'''(t) = \mu e^t \Rightarrow \kappa_3 = C_X'''(0) = \mu$$
$$C_X''''(t) = \mu e^t \Rightarrow \kappa_4 = C_X''''(0) = \mu$$
So the second, third and fourth cumulants of the Poisson distribution are all equal to $\mu$. In fact, all the cumulants are the same.
We can see that the CGF is particularly useful when the MGF is an exponential function, as it makes the differentiation a lot easier.
3 Linear functions
Suppose X has MGF $M_X(t)$ and the distribution of a linear function $Y = a + bX$ is of interest. The MGF of Y, $M_Y(t)$ say, can be obtained from that of X as follows:
$$M_Y(t) = E[e^{tY}] = E[e^{t(a + bX)}] = e^{at}E[e^{btX}] = e^{at}M_X(bt)$$
Question
(i) Use MGFs to show that if $X \sim Gamma(\alpha, \lambda)$ and $2\alpha$ is an integer, then $2\lambda X \sim \chi^2_{2\alpha}$.
(ii) Estimate $P(X > 75)$ when $X \sim Gamma(20, 0.4)$.
Solution
(i) Proof
The MGF of the $Gamma(\alpha, \lambda)$ distribution is $M_X(t) = \left(1 - \frac{t}{\lambda}\right)^{-\alpha}$.
If $Y = a + bX$, then $M_Y(t) = e^{at}M_X(bt)$.
In this question, $a = 0$ and $b = 2\lambda$, so:
$$M_Y(t) = M_X(2\lambda t) = \left(1 - \frac{2\lambda t}{\lambda}\right)^{-\alpha} = (1 - 2t)^{-\alpha}$$
This is the moment generating function of the chi-square distribution with $2\alpha$ degrees of freedom, so by the uniqueness of MGFs, we can say that $2\lambda X$ follows the $\chi^2_{2\alpha}$ distribution.
(ii) Probability
If X follows the $Gamma(20, 0.4)$ distribution, then $0.8X \sim \chi^2_{40}$. So:
$$P(X > 75) = P(0.8X > 60) = P(\chi^2_{40} > 60)$$
From the percentage points table of the $\chi^2_{40}$ distribution, we see that this probability is just less than 0.025.
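The table look-up in part (ii) can be cross-checked without tables. For an even number of degrees of freedom 2n, the chi-square upper tail equals a finite Poisson sum: P(chi-square with 2n degrees of freedom > x) is the sum over k = 0, ..., n-1 of e^{-x/2}(x/2)^k / k!. The sketch below uses that identity; the function name is hypothetical and the shortcut does not apply to odd degrees of freedom.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(chi-square_df > x) for even df, using the finite Poisson-sum identity."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only handles positive even df")
    half = x / 2
    term = math.exp(-half)  # the k = 0 term
    total = term
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return total

p = chi2_upper_tail_even_df(60, 40)
print(p)  # should come out a little below 0.025, in line with the Tables
```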
Question
If X follows the gamma distribution with parameters $\alpha = 2$ and $\lambda = 0.4$, calculate $P(X > 10)$ using direct integration.
Solution
The PDF of X is:
$$f_X(x) = \frac{1}{\Gamma(2)}\,0.4^2\,x\,e^{-0.4x} = 0.16xe^{-0.4x}, \quad x > 0$$
Integrating the PDF using integration by parts:
$$P(X < 10) = \int_0^{10} 0.16xe^{-0.4x}\,dx = \left[-\frac{0.16x}{0.4}e^{-0.4x}\right]_0^{10} + \int_0^{10}\frac{0.16}{0.4}e^{-0.4x}\,dx$$
$$= -4e^{-4} + \left[-e^{-0.4x}\right]_0^{10} = -4e^{-4} - e^{-4} + 1 = 1 - 5e^{-4}$$
So we have:
$$P(X > 10) = 1 - (1 - 5e^{-4}) = 5e^{-4} = 0.09158$$
We can check this result by obtaining $P(\chi^2_4 > 8)$ using page 165 of the Tables.
Alternatively we can obtain $P(X > 10)$ directly by integrating between the limits of 10 and $\infty$.
This method rapidly becomes tedious (or impossible) for values of $\alpha$ other than very small integers.
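The integration by parts can be sanity-checked with a simple numerical rule. The sketch below applies a composite midpoint rule to the PDF 0.16x e^{-0.4x}; the helper names and the number of strips are arbitrary illustrative choices.

```python
import math

def gamma_2_pdf(x, lam=0.4):
    """PDF of Gamma(alpha=2, lambda=lam): lam^2 * x * exp(-lam * x), since Gamma(2) = 1."""
    return lam**2 * x * math.exp(-lam * x)

def midpoint_integral(f, a, b, n=100_000):
    """Composite midpoint rule on [a, b] with n equal strips."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

p_less = midpoint_integral(gamma_2_pdf, 0.0, 10.0)
p_more = 1 - p_less
print(p_more, 5 * math.exp(-4))  # numerical estimate vs the exact value 5e^{-4}
```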
We can also obtain the CGF of a linear function.
Question
If $Y = a + bX$, derive and simplify an expression for $C_Y(t)$ in terms of $C_X(t)$.
Solution
Since $C_Y(t) = \ln M_Y(t)$, using the expression for $M_Y(t)$, we have:
$$C_Y(t) = \ln M_Y(t) = \ln[e^{at}M_X(bt)] = at + \ln M_X(bt) = at + C_X(bt)$$
4 Further applications of generating functions
Generating functions can be used to establish the distribution of linear combinations of random variables. This will be covered in detail in a later chapter.
A linear combination of the random variables $X_1, \ldots, X_n$ is an expression of the form:
$$c_1X_1 + \cdots + c_nX_n$$
where $c_1, \ldots, c_n$ are constants.
We can use MGFs (or CGFs) to obtain the distribution of such a linear combination. For example, we can show that if $X_1 \sim Poi(\lambda_1)$ and $X_2 \sim Poi(\lambda_2)$, and $X_1$ and $X_2$ are independent random variables, then $X_1 + X_2 \sim Poi(\lambda_1 + \lambda_2)$. We will prove results such as this later in the course.
Moment generating functions can also be used to calculate moments for and specify compound distributions. This will be covered in detail in Subject CS2.
Chapter 3 Summary
Generating functions are used to make it easier to find moments of distributions.
The moment generating function (MGF) of a random variable is defined to be:
$$M_X(t) = E[e^{tX}]$$
The series expansion for MGFs is:
$$M_X(t) = 1 + tE(X) + \frac{t^2}{2!}E(X^2) + \frac{t^3}{3!}E(X^3) + \cdots$$
The formulae for the mean and variance are:
$$E(X) = M_X'(0) \qquad \text{var}(X) = M_X''(0) - [M_X'(0)]^2$$
The cumulant generating function (CGF) of a random variable is defined to be:
$$C_X(t) = \ln M_X(t)$$
The formulae for the moments are:
$$E(X) = C_X'(0) \qquad \text{var}(X) = C_X''(0) \qquad \text{skew}(X) = C_X'''(0)$$
The uniqueness property means that if two variables have the same MGF or CGF then they have the same distribution.
If $Y = a + bX$, then:
$$M_Y(t) = e^{at}M_X(bt) \quad \text{and} \quad C_Y(t) = at + C_X(bt)$$
Chapter 3 Practice Questions
3.1 (Exam style)
(i) Determine the moment generating function of the two-parameter exponential random variable X, defined by the probability density function:
$$f(x) = \lambda e^{-\lambda(x - \alpha)}, \quad x \ge \alpha$$
where $\alpha, \lambda > 0$. [3]
(ii) Hence, or otherwise, determine the mean and variance of the random variable X. [4]
[Total 7]
3.2
Derive from first principles the moment generating function of a random variable X, where:
$$P(X = x) = \theta(1 - \theta)^{x-1}, \quad x = 1, 2, 3, \ldots$$
3.3
(i) Determine the cumulant generating function of the $N(\mu, \sigma^2)$ distribution.
(ii) Hence determine the mean, variance and coefficient of skewness of this distribution.
3.4 (Exam style)
Suppose that the moment generating function of X is $M_X(t)$.
(i) Derive an expression for the moment generating function of $2X + 3$ in terms of $M_X(t)$. [2]
Now suppose that X is normally distributed with mean $\mu$ and variance $\sigma^2$.
(ii) Derive the distribution of $2X + 3$. [2]
[Total 4]
3.5 (Exam style)
The moment generating function, $M_Y(t)$, of a random variable, Y, is given by:
$$M_Y(t) = (1 - 4t)^{-2}, \quad t < 0.25$$
Calculate:
(i) $E(Y)$ [1]
(ii) the standard deviation of Y [2]
(iii) $E(Y^6)$. [2]
[Total 5]
3.6 (Exam style)
The random variable U has a geometric distribution with probability function:
$$P(U = u) = pq^{u-1}, \quad u = 1, 2, 3, \ldots \quad \text{where } p + q = 1$$
(i) Derive the moment generating function of U. [2]
(ii) Write down the CGF of U, and hence show that $E(U) = 1/p$. [3]
[Total 5]
3.7 (Exam style)
A random variable X has probability density function:
$$f(x) = ke^{-2x}, \quad x > R$$
where R and k are positive constants.
(i) (a) Derive a formula for the moment generating function of X.
(b) State the values of t for which the formula in part (i)(a) is valid. [4]
(ii) Hence determine the value of the constant k in terms of R. [1]
[Total 5]
3.8 (Exam style)
(i) Derive, from first principles, the moment generating function of a $Gamma(\alpha, \lambda)$ random variable. [3]
(ii) Show, using the moment generating function, that the mean and variance of a $Gamma(\alpha, \lambda)$ random variable are $\frac{\alpha}{\lambda}$ and $\frac{\alpha}{\lambda^2}$, respectively. [2]
[Total 5]
3.9 (Exam style)
X is normally distributed with mean $\mu$ and variance $\sigma^2$.
Determine the fourth central moment of X. [3]
3.10 (Exam style)
The claim amount X in units of 1,000 for a certain type of industrial policy is modelled as a gamma variable with parameters $\alpha = 3$ and $\lambda = 1/4$.
(i) Use moment generating functions to show that $\frac{1}{2}X \sim \chi^2_6$. [3]
(ii) Calculate the probability that a randomly chosen claim amount exceeds 20,000. [2]
[Total 5]
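Chapter 3 Solutions
3.1
(i) Using the definition of an MGF:
$$M_X(t) = E[e^{tX}] = \int_\alpha^\infty e^{tx}\,\lambda e^{-\lambda(x - \alpha)}\,dx = \lambda e^{\lambda\alpha}\int_\alpha^\infty e^{-(\lambda - t)x}\,dx$$
$$= \lambda e^{\lambda\alpha}\left[-\frac{e^{-(\lambda - t)x}}{\lambda - t}\right]_\alpha^\infty = \frac{\lambda e^{\alpha t}}{\lambda - t} \quad \text{provided } t < \lambda$$ [3]
(ii) Re-writing the MGF to make it easier to differentiate:
$$M_X(t) = \left(1 - \frac{t}{\lambda}\right)^{-1}e^{\alpha t}$$
$$M_X'(t) = \frac{1}{\lambda}\left(1 - \frac{t}{\lambda}\right)^{-2}e^{\alpha t} + \alpha\left(1 - \frac{t}{\lambda}\right)^{-1}e^{\alpha t} \Rightarrow E(X) = M_X'(0) = \frac{1}{\lambda} + \alpha$$ [2]
$$M_X''(t) = \frac{2}{\lambda^2}\left(1 - \frac{t}{\lambda}\right)^{-3}e^{\alpha t} + \frac{2\alpha}{\lambda}\left(1 - \frac{t}{\lambda}\right)^{-2}e^{\alpha t} + \alpha^2\left(1 - \frac{t}{\lambda}\right)^{-1}e^{\alpha t}$$
$$\Rightarrow E(X^2) = M_X''(0) = \frac{2}{\lambda^2} + \frac{2\alpha}{\lambda} + \alpha^2$$
$$\text{var}(X) = \frac{2}{\lambda^2} + \frac{2\alpha}{\lambda} + \alpha^2 - \left(\frac{1}{\lambda} + \alpha\right)^2 = \frac{1}{\lambda^2}$$ [2]
3.2
The MGF is given by:
$$M_X(t) = E(e^{tX}) = \sum_{x=1}^{\infty} e^{tx}P(X = x) = \sum_{x=1}^{\infty} e^{tx}\,\theta(1 - \theta)^{x-1} = \theta\left(e^{t} + (1 - \theta)e^{2t} + (1 - \theta)^2e^{3t} + \cdots\right)$$
The expression in the brackets is an infinite geometric series with $a = e^t$ and $r = (1 - \theta)e^t$. Summing it gives:
$$M_X(t) = \frac{\theta e^t}{1 - (1 - \theta)e^t} \quad \text{where } -1 < (1 - \theta)e^t < 1$$
3.3
(i) For the normal distribution:
$$M_X(t) = \exp\left(\mu t + \tfrac{1}{2}\sigma^2 t^2\right) \Rightarrow C_X(t) = \ln M_X(t) = \mu t + \tfrac{1}{2}\sigma^2 t^2$$
(ii) Differentiating and setting $t = 0$ gives:
$$C_X'(t) = \mu + \sigma^2 t \Rightarrow E(X) = C_X'(0) = \mu$$
$$C_X''(t) = \sigma^2 \Rightarrow \text{var}(X) = C_X''(0) = \sigma^2$$
$$C_X'''(t) = 0 \Rightarrow \text{skew}(X) = C_X'''(0) = 0$$
Since the skewness is zero, the coefficient of skewness is also 0.
3.4
(i) MGF
The MGF of X is $M_X(t) = E(e^{tX})$. So the MGF of $2X + 3$ is:
$$M_{2X+3}(t) = E[e^{t(2X+3)}] = E[e^{(2t)X + 3t}] = e^{3t}E[e^{(2t)X}] = e^{3t}M_X(2t)$$ [2]
(ii) Distribution
The MGF for a $N(\mu, \sigma^2)$ random variable is $M_X(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}$. Using the formula derived in part (i):
$$M_{2X+3}(t) = e^{3t}M_X(2t) = e^{3t}e^{2\mu t + \frac{1}{2}\sigma^2(2t)^2} = e^{(2\mu + 3)t + \frac{1}{2}(4\sigma^2)t^2}$$
This is the MGF of the $N(2\mu + 3, 4\sigma^2)$ distribution. Therefore by the uniqueness property of MGFs, $2X + 3$ follows the $N(2\mu + 3, 4\sigma^2)$ distribution. [2]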
3.5
(i) Expectation
$$M_Y'(t) = 8(1 - 4t)^{-3} \Rightarrow E(Y) = M_Y'(0) = 8$$ [1]
(ii) Standard deviation
$$M_Y''(t) = 96(1 - 4t)^{-4} \Rightarrow E(Y^2) = 96 \Rightarrow \text{var}(Y) = 96 - 8^2 = 32 \Rightarrow \text{standard deviation} = \sqrt{32} = 5.6569$$ [2]
(iii) Sixth moment
Recall that $E(Y^6)$ is the coefficient of $\frac{t^6}{6!}$ in the expansion of $M_Y(t)$. From the binomial expansion of the MGF, $(1 - 4t)^{-2}$ (using the formula given on page 2 of the Tables), the term is:
$$\frac{(-2)(-3)(-4)(-5)(-6)(-7)}{6!}(-4t)^6$$ [1]
Hence, $E(Y^6) = (-2)(-3)(-4)(-5)(-6)(-7)(-4)^6 = 20{,}643{,}840$. [1]
Alternatively, we can use $E(Y^6) = M_Y^{(6)}(0)$ but this requires us to differentiate the MGF six times.
3.6
(i) MGF
The MGF of U is:
$$M_U(t) = E[e^{tU}] = \sum_{u=1}^{\infty} e^{tu}P(U = u) = \sum_{u=1}^{\infty} e^{tu}pq^{u-1} = pe^t + pqe^{2t} + pq^2e^{3t} + \cdots$$ [1]
This is an infinite geometric series with $a = pe^t$ and $r = qe^t$, so using the formula $S_\infty = \frac{a}{1 - r}$ gives:
$$M_U(t) = \frac{pe^t}{1 - qe^t}$$ [1]
(ii) CGF and mean
We have:
$$C_U(t) = \ln\left(\frac{pe^t}{1 - qe^t}\right) = \ln p + t - \ln(1 - qe^t)$$ [1]
Differentiating the CGF:
$$C_U'(t) = 1 + \frac{qe^t}{1 - qe^t} = \frac{1}{1 - qe^t}$$ [1]
Substituting in $t = 0$:
$$E(U) = C_U'(0) = 1 + \frac{q}{1 - q} = \frac{1}{1 - q} = \frac{1}{p}$$ [1]
3.7
(i)(a) MGF
The MGF of X is:
$$M_X(t) = E[e^{tX}] = \int_R^\infty e^{tx}\,ke^{-2x}\,dx$$ [1]
$$= \int_R^\infty ke^{-(2 - t)x}\,dx = k\left[-\frac{e^{-(2 - t)x}}{2 - t}\right]_R^\infty$$ [1]
$$= \frac{ke^{-(2 - t)R}}{2 - t}$$ [1]
(i)(b) Values of t for which valid
The integral converges as $x \to \infty$ only if $2 - t$ is positive. So the MGF is valid for $t < 2$. [1]
(ii) Evaluate k
Putting $t = 0$ gives $M_X(0) = \frac{1}{2}ke^{-2R}$. [1/2]
Since $M_X(0)$ must equal 1, this tells us that $k = 2e^{2R}$. [1/2]
3.8
(i) MGF
$$M_X(t) = E(e^{tX}) = \int_0^\infty e^{tx}\,\frac{\lambda^\alpha}{\Gamma(\alpha)}x^{\alpha-1}e^{-\lambda x}\,dx = \int_0^\infty\frac{\lambda^\alpha}{\Gamma(\alpha)}x^{\alpha-1}e^{-(\lambda - t)x}\,dx$$ [1]
The integral looks like the PDF of a $Gamma(\alpha, \lambda - t)$ random variable, so putting in the appropriate constants:
$$M_X(t) = \frac{\lambda^\alpha}{(\lambda - t)^\alpha}\int_0^\infty\frac{(\lambda - t)^\alpha}{\Gamma(\alpha)}x^{\alpha-1}e^{-(\lambda - t)x}\,dx = \left(\frac{\lambda}{\lambda - t}\right)^\alpha = \left(1 - \frac{t}{\lambda}\right)^{-\alpha}$$
provided $t < \lambda$, since the integral of a Gamma PDF over the whole range is 1. [2]
Alternatively, we can use the substitution method.
(ii) Mean and variance
Using the results $E(X) = M_X'(0)$ and $E(X^2) = M_X''(0)$:
$$M_X'(t) = \frac{\alpha}{\lambda}\left(1 - \frac{t}{\lambda}\right)^{-\alpha-1} \Rightarrow E(X) = M_X'(0) = \frac{\alpha}{\lambda}$$ [1]
$$M_X''(t) = \frac{\alpha(\alpha+1)}{\lambda^2}\left(1 - \frac{t}{\lambda}\right)^{-\alpha-2} \Rightarrow E(X^2) = M_X''(0) = \frac{\alpha(\alpha+1)}{\lambda^2}$$
$$\Rightarrow \text{var}(X) = \frac{\alpha(\alpha+1)}{\lambda^2} - \frac{\alpha^2}{\lambda^2} = \frac{\alpha}{\lambda^2}$$ [1]
3.9
The MGF of $X - \mu$, which follows the $N(0, \sigma^2)$ distribution, is:
$$M_{X-\mu}(t) = e^{\frac{1}{2}\sigma^2 t^2} = 1 + \tfrac{1}{2}\sigma^2 t^2 + \frac{\left(\tfrac{1}{2}\sigma^2 t^2\right)^2}{2!} + \cdots$$ [1]
So $E[(X - \mu)^4]$ is the coefficient of $\frac{t^4}{4!}$ in this series, ie:
$$E[(X - \mu)^4] = 4! \times \frac{1}{2!} \times \frac{1}{4}\sigma^4 = 3\sigma^4$$ [2]
Alternatively, we can differentiate the MGF four times and substitute $t = 0$ each time to obtain $E(X)$, $E(X^2)$, $E(X^3)$ and $E(X^4)$. We then use the expansion of $E[(X - \mu)^4]$:
$$E[(X - \mu)^4] = E(X^4) - 4\mu E(X^3) + 6\mu^2E(X^2) - 3\mu^4$$
It might be tempting to use the CGF as it gives the second and third central moments $\text{var}(X) = C_X''(0)$ and $\text{skew}(X) = C_X'''(0)$. However, thereafter the CGF does not give central moments.
3.10
(i) MGF
We are given that $X \sim Gamma(3, \frac{1}{4})$. From the Tables:
$$M_X(t) = \left(1 - \frac{t}{\lambda}\right)^{-\alpha} = (1 - 4t)^{-3}$$ [1]
Let $Y = \frac{1}{2}X$. Then:
$$M_Y(t) = E[e^{tY}] = E\left[e^{\frac{1}{2}tX}\right] = M_X\left(\tfrac{1}{2}t\right) = \left(1 - 4\left(\tfrac{1}{2}t\right)\right)^{-3} = (1 - 2t)^{-3}$$ [1]
So the moment generating function of $Y = \frac{1}{2}X$ is $(1 - 2t)^{-3}$.
By comparing this with the MGF of the gamma distribution in the Tables, we see that this is the same as the MGF of a $Gamma(3, \frac{1}{2})$ distribution. Looking also at the definition of the chi-square distribution, we see that $Gamma(3, \frac{1}{2})$ is the definition of a chi-square distribution with 6 degrees of freedom.
By the uniqueness property of moment generating functions, therefore, we have shown that $\frac{1}{2}X \sim \chi^2_6$. [1]
(ii) The probability that a claim exceeds 20,000
We are given that $X \sim Gamma(3, \frac{1}{4})$, where X is the claim amount in units of 1,000.
We also know that $2\lambda X \sim \chi^2_{2\alpha}$, ie $\frac{1}{2}X \sim \chi^2_6$. Therefore, using the Tables:
$$P(X > 20) = P\left(\tfrac{1}{2}X > 10\right) = P(\chi^2_6 > 10) = 1 - 0.8753 = 0.1247$$ [2]
CS1-04: Joint distributions
Joint distributions
Syllabus objectives
1.2 Independence, joint and conditional distributions, linear combinations of random variables
1.2.1 Explain what is meant by jointly distributed random variables, marginal distributions and conditional distributions.
1.2.2 Define the probability function/density function of a marginal distribution and of a conditional distribution.
1.2.3 Specify the conditions under which random variables are independent.
1.2.4 Define the expected value of a function of two jointly distributed random variables, the covariance and correlation coefficient between two variables, and calculate such quantities.
1.2.5 Define the probability function/density function of the sum of two independent random variables as the convolution of two functions.
1.2.6 Derive the mean and variance of linear combinations of random variables.
1.2.7 Use generating functions to establish the distribution of linear combinations of independent random variables.
0 Introduction
As yet, we have only considered situations involving one random variable. In this chapter we will look at some general results involving two or more random variables.
This chapter is quite long, and contains a large amount of material. It may therefore be helpful to notice the parallels with the single random variable notation, in order to aid understanding of the overall structure of the chapter.
Firstly we will define a joint probability (density) function $P(X = x, Y = y)$ or $f(x,y)$. We will see how we can obtain a marginal distribution, ie $P(X = x)$ or $f(x)$, from the joint distribution. Then we will look at conditional distributions $P(X = x \mid Y = y)$ or $f(x \mid y)$. The study of conditional distributions continues in the next chapter. It might be worth studying the next chapter with this one as the material in the two chapters is quite closely linked.
Given a distribution involving two random variables, we will explain how to work out the mean and variance of each random variable, and the covariance of the two random variables. We will also define the correlation coefficient. This work will be continued in a later chapter, where we will attempt to estimate what the correlation is from a sample.
Finally, we will extend our work on MGFs from the previous chapter to combine distributions together. This will give us easier ways of obtaining results for the binomial, negative binomial and gamma distributions, amongst others.
1 Joint distributions
1.1 Joint probability (density) functions
Defining several random variables simultaneously on a sample space gives rise to a multivariate distribution. In the case of just two variables, it is a bivariate distribution.
Discrete case
To illustrate this for a pair of discrete variables, X and Y, the probabilities associated with the various values of $(x,y)$ are as follows:

            x
   y     1      2      3
   1   0.10   0.10   0.05
   2   0.15   0.10   0.05
   3   0.20   0.05    -
   4   0.15   0.05    -

So, for example, $P(X = 3, Y = 1) = 0.05$, and $P(X = 1, Y = 3) = 0.20$.
The function $f(x,y) = P(X = x, Y = y)$ for all values of $(x,y)$ is the (joint/bivariate) probability function of $(X,Y)$. It specifies how the total probability of 1 is divided up amongst the possible values of $(x,y)$ and so gives the (joint/bivariate) probability distribution of $(X,Y)$.
The requirements for a function to qualify as the probability function of a pair of discrete random variables are:
$f(x,y) \ge 0$ for all values of x and y in the domain
$\sum_x\sum_y f(x,y) = 1$
This parallels earlier results, where the probability function was $P(X = x)$. We saw that $P(X = x) \ge 0$ for all values of x and $\sum_x P(X = x) = 1$.
For example, consider the discrete random variables M and N with joint probability function:
$$P(M = m, N = n) = \frac{m}{35 \times 2^{n-2}}, \quad \text{where } m = 1, 2, 3, 4 \text{ and } n = 1, 2, 3$$
Let's draw up a table showing the values of the joint probability function for M and N.
Starting with the smallest possible values of M and N:
$$P(M = 1, N = 1) = \frac{1}{35 \times 2^{1-2}} = \frac{2}{35}$$
Calculating the joint probability for all combinations of M and N, we get the table shown below.

                M
   N      1      2      3      4
   1    2/35   4/35   6/35   8/35
   2    1/35   2/35   3/35   4/35
   3    1/70   1/35   3/70   2/35
Question
Use the table of probabilities given above to calculate:
(i) $P(M = 3, N = 1 \text{ or } 2)$
(ii) $P(N = 3)$
(iii) $P(M = 2 \mid N = 3)$.
Solution
(i) Since the events $N = 1$ and $N = 2$ are mutually exclusive, we have:
$$P(M = 3, N = 1 \text{ or } 2) = P(M = 3, N = 1) + P(M = 3, N = 2) = \frac{6}{35} + \frac{3}{35} = \frac{9}{35}$$
(ii) We require $P(N = 3)$, and since this does not depend on the value of M it is the same as finding $P(N = 3, M = 1, 2, 3 \text{ or } 4)$, ie we are summing over all possible values of M:
$$P(N = 3) = \frac{1}{70} + \frac{1}{35} + \frac{3}{70} + \frac{2}{35} = \frac{1}{7}$$
(iii) Using the formula for conditional probability, $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, gives:
$$P(M = 2 \mid N = 3) = \frac{P(M = 2, N = 3)}{P(N = 3)} = \frac{1/35}{1/7} = \frac{1}{5}$$
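These three calculations can be reproduced with exact fractions. The sketch below encodes the joint probability function m / (35 × 2^(n-2)) directly; the function and variable names are illustrative only.

```python
from fractions import Fraction

def joint_pmf(m, n):
    """P(M = m, N = n) = m / (35 * 2^(n-2)) for m in 1..4 and n in 1..3."""
    return Fraction(m, 35) / Fraction(2) ** (n - 2)

M_VALUES = range(1, 5)
N_VALUES = range(1, 4)

# The joint probabilities must sum to 1 over the whole table.
total = sum(joint_pmf(m, n) for m in M_VALUES for n in N_VALUES)

p_i = joint_pmf(3, 1) + joint_pmf(3, 2)          # (i)   P(M = 3, N = 1 or 2)
p_n3 = sum(joint_pmf(m, 3) for m in M_VALUES)    # (ii)  P(N = 3), summing over M
p_iii = joint_pmf(2, 3) / p_n3                   # (iii) P(M = 2 | N = 3)

print(total, p_i, p_n3, p_iii)
```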
Continuous case
In the case of a pair of continuous variables, the distribution of probability over a specified area in the $(x,y)$ plane is given by the (joint) probability density function $f(x,y)$. The probability that the pair $(X,Y)$ takes values in some specified region A is obtained by integrating $f(x,y)$ over A; this integral is a double integral.
Thus:
$$P(x_1 < X < x_2,\ y_1 < Y < y_2) = \int_{y_1}^{y_2}\int_{x_1}^{x_2} f(x,y)\,dx\,dy$$
The joint distribution function $F(x,y)$ is defined by:
$$F(x,y) = P(X \le x, Y \le y)$$
and it is related to the joint density function by:
$$f(x,y) = \frac{\partial^2}{\partial x\,\partial y}F(x,y)$$
The conditions for a function to qualify as a joint probability density function of a pair of continuous random variables are:
$f(x,y) \ge 0$ for all values of x and y in the domain
$\int_x\int_y f(x,y)\,dx\,dy = 1$
These results parallel those for a single random variable, where the probability density function was $f(x)$. We saw that $f(x) \ge 0$ for all values of x and $\int_x f(x)\,dx = 1$. Recall also that probabilities were calculated using the formula $P(a < X < b) = \int_{x=a}^{b} f(x)\,dx$.
The next question involves the use of double integrals.
Question

The continuous random variables $U$ and $V$ have joint probability density function:

$$f_{U,V}(u, v) = \frac{2u + v}{3{,}000}, \quad \text{where } 10 < u < 20 \text{ and } -5 < v < 5$$

Calculate $P(10 < U < 15,\ V > 0)$.
Solution

From the formula for the joint probability density function:

$$P(10 < U < 15,\ V > 0) = \int_{u=10}^{15} \int_{v=0}^{5} \frac{2u + v}{3{,}000}\, dv\, du$$

This can be integrated with respect to either $u$ or $v$ first. If we do $v$ first, we get:

$$\int_{u=10}^{15} \int_{v=0}^{5} \frac{2u + v}{3{,}000}\, dv\, du = \int_{u=10}^{15} \frac{1}{3{,}000}\left[2uv + \tfrac{1}{2}v^2\right]_{v=0}^{5} du = \int_{u=10}^{15} \frac{10u + 12.5}{3{,}000}\, du$$

$$= \frac{1}{3{,}000}\left[5u^2 + 12.5u\right]_{10}^{15} = 0.229$$

If we integrate first with respect to $u$ and then with respect to $v$, we obtain the same answer as before.
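As a rough numerical cross-check of this double integral, the sketch below applies a simple midpoint rule, first over $v$ and then over $u$. The function names `midpoint`, `joint_pdf` and `inner` are illustrative assumptions; only the density and the limits come from the question.

```python
def midpoint(f, a, b, n=200):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def joint_pdf(u, v):
    """Joint PDF (2u + v) / 3000 on 10 < u < 20, -5 < v < 5."""
    return (2 * u + v) / 3000

def inner(u):
    # Integrate out v over (0, 5) for a fixed value of u
    return midpoint(lambda v: joint_pdf(u, v), 0.0, 5.0)

prob = midpoint(inner, 10.0, 15.0)
print(round(prob, 3))
```

Because the integrand is linear in each variable, the midpoint rule is essentially exact here; for a general density the step count `n` would need more care.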
1.2 Marginal probability (density) functions

Discrete case

The marginal distribution of a discrete random variable $X$ is defined to be:

$$f_X(x) = \sum_y f(x, y)$$

This is the distribution of $X$ alone without considering the values that $Y$ can take.

This is what we were doing in the first question in this chapter when we calculated the probability that $N = 3$. If we want the marginal distribution for $X$, we sum over all the values that $Y$ can take.
Let $X$ and $Y$ have the joint probability function given in the Core Reading at the start of this section:

          x = 1    x = 2    x = 3
y = 1     0.10     0.10     0.05
y = 2     0.15     0.10     0.05
y = 3     0.20     0.05     0
y = 4     0.15     0.05     0

Let's find the marginal probability distribution of $X$.

The marginal probabilities are:

$P(X=1) = 0.1 + 0.15 + 0.2 + 0.15 = 0.6$
$P(X=2) = 0.1 + 0.1 + 0.05 + 0.05 = 0.3$
$P(X=3) = 0.05 + 0.05 = 0.1$

So the probability distribution of $X$ is:

x           1      2      3
P(X = x)    0.6    0.3    0.1

We are just adding up the numbers in each column. For the marginal distribution of $Y$ we would calculate the row totals.

We can also do this if we are given the joint distribution in the form of a function.
Question

Obtain the probability functions for the marginal distributions of $M$ and $N$, where:

$$P(M=m, N=n) = \frac{m}{35 \times 2^{n-2}}, \quad \text{for } m = 1, 2, 3, 4 \text{ and } n = 1, 2, 3$$

Solution

Summing over the values of $N$ gives:

$$P(M=m) = \sum_{n=1}^{3} P(M=m, N=n) = \sum_{n=1}^{3} \frac{m}{35 \times 2^{n-2}} = \frac{m}{35}\left(2 + 1 + \frac{1}{2}\right) = \frac{m}{10}$$

Summing over the values of $M$ gives:

$$P(N=n) = \sum_{m=1}^{4} P(M=m, N=n) = \sum_{m=1}^{4} \frac{m}{35 \times 2^{n-2}} = \frac{1 + 2 + 3 + 4}{35 \times 2^{n-2}} = \frac{1}{7 \times 2^{n-3}}$$
Continuous case

In the case of continuous variables the marginal probability density function (PDF) of $X$, $f_X(x)$, is obtained by integrating over $y$ (for the given value of $x$) the joint PDF $f(x, y)$.

This means that $f_X(x) = \displaystyle\int_y f(x, y)\, dy$.

The resulting $f_X(x)$ is a proper PDF: it integrates to 1. Similarly for $f_Y(y)$, we obtain this by integrating over $x$ (for the given value of $y$).

In some cases the region of definition of $(X, Y)$ may be such that the limits of integration for one variable will involve the other variable.

We will see an example like this in Section 1.4.
Question

Determine the marginal probability density functions for $U$ and $V$, where:

$$f_{U,V}(u, v) = \frac{2u + v}{3{,}000}, \quad \text{for } 10 < u < 20 \text{ and } -5 < v < 5$$

Solution

To obtain the PDF of the marginal distribution of $U$, we integrate out $V$:

$$f_U(u) = \int_{v=-5}^{5} \frac{2u + v}{3{,}000}\, dv = \frac{1}{3{,}000}\left[2uv + \tfrac{1}{2}v^2\right]_{v=-5}^{5} = \frac{(10u + 12.5) - (-10u + 12.5)}{3{,}000} = \frac{u}{150}$$

Therefore the marginal distribution of $U$ is $f_U(u) = \dfrac{u}{150}$, $10 < u < 20$.

Similarly for $V$, we integrate out $U$:

$$f_V(v) = \int_{u=10}^{20} \frac{2u + v}{3{,}000}\, du = \frac{1}{3{,}000}\left[u^2 + uv\right]_{u=10}^{20} = \frac{(400 + 20v) - (100 + 10v)}{3{,}000} = \frac{30 + v}{300}$$

Therefore the marginal distribution of $V$ is $f_V(v) = \dfrac{30 + v}{300}$, $-5 < v < 5$.

To check that these functions are PDFs, we can integrate them over the appropriate range. The answers should both be 1.
1.3 Conditional probability (density) functions

The distribution of $X$ for a particular value of $Y$ is called the conditional distribution of $X$ given $y$.

Discrete case

The probability function $P_{X|Y=y}(x \mid y)$ for the conditional distribution of $X$ given $Y = y$ for discrete random variables $X$ and $Y$ is:

$$P_{X|Y=y}(x \mid y) = P(X = x \mid Y = y) = \frac{P_{X,Y}(x, y)}{P_Y(y)}$$

for all values $x$ in the range of $X$.

This is what we were doing earlier when we calculated $P(M=2 \mid N=3)$ in a previous example.
Question

A bivariate distribution has the following probability function:

          X = 0    X = 1    X = 2
Y = 1     0.1      0.1      0
Y = 2     0.1      0.1      0.2
Y = 3     0.2      0.1      0.1

Determine:

(i)   the marginal distribution of $X$
(ii)  the conditional distribution of $X \mid Y = 2$.
Solution

(i)   The marginal distribution of $X$ can be found by summing the columns in the table.

      $P(X=0) = 0.4$,  $P(X=1) = 0.3$,  $P(X=2) = 0.3$

(ii)  Using the definition of conditional probability:

$$P(X=0 \mid Y=2) = \frac{P(X=0, Y=2)}{P(Y=2)} = \frac{0.1}{0.4} = 0.25$$

$$P(X=1 \mid Y=2) = \frac{P(X=1, Y=2)}{P(Y=2)} = \frac{0.1}{0.4} = 0.25$$

$$P(X=2 \mid Y=2) = \frac{P(X=2, Y=2)}{P(Y=2)} = \frac{0.2}{0.4} = 0.5$$

Alternatively, we could scale up the probabilities in the second row so that they add to one, eg $P(X=0 \mid Y=2) = \dfrac{0.1}{0.1 + 0.1 + 0.2} = \dfrac{0.1}{0.4} = 0.25$.
Continuous case

The probability density function $f_{X|Y=y}(x \mid y)$ for the conditional distribution of $X$ given $Y = y$ for the continuous variables $X$ and $Y$ is a function such that:

$$\int_{x=x_1}^{x_2} f_{X|Y=y}(x \mid y)\, dx = P(x_1 < X < x_2 \mid Y = y)$$

for all values $x$ in the range of $X$.

This conditional distribution is, in both instances, only defined for those values of $y$ for which $f_Y(y) > 0$.

We calculate the form of the conditional PDF similarly to the method we used in the discrete case, ie we divide the joint PDF by the marginal PDF. So:

$$f_{X|Y=y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}$$
Question

Let $X$ and $Y$ have joint density function:

$$f(x, y) = \frac{1}{16}(x + 3y), \quad 0 < x < 2,\ 0 < y < 2$$

Determine the conditional density function of $X$ given $Y = y$.
Solution

The marginal PDF of $Y$ is:

$$f_Y(y) = \int_{x=0}^{2} \frac{1}{16}(x + 3y)\, dx = \frac{1}{16}\left[\tfrac{1}{2}x^2 + 3xy\right]_{x=0}^{2} = \frac{1}{16}(2 + 6y)$$

So:

$$f_{X|Y=y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} = \frac{\tfrac{1}{16}(x + 3y)}{\tfrac{1}{16}(2 + 6y)} = \frac{x + 3y}{2(1 + 3y)}, \quad 0 < x < 2$$

1.4 Distributions defined on more complex domains

For all the distributions we have seen so far, the limits on both the $x$ and the $y$ integrals have been numbers. So, in the previous example, the joint PDF is defined over the rectangle whose vertices are at the points (0, 0), (0, 2), (2, 0), and (2, 2).

It is possible for a joint distribution to be defined over a non-rectangular area. In these cases, the limits for $y$ may be dependent on $x$, or vice versa. Care needs to be taken in these cases to ensure that the correct limits are used when integrating.
Question

Let $X$ and $Y$ have joint density function:

$$f(x, y) = k(x^2 + xy), \quad 0 < y < x < 2$$

(i)   Calculate the value of $k$.
(ii)  Determine the PDFs of the marginal distributions for $X$ and $Y$.
(iii) Determine the conditional density function of $Y \mid X = x$.
Solution

(i)   Let's integrate with respect to $x$ first, then $y$. We see from the inequality $0 < y < x < 2$ that, when $X$ is considered as a variable, the limits for $X$ will be from $X = y$ to $X = 2$. Once $x$ has been integrated out, the limits for $y$ will be from $y = 0$ to $y = 2$.

      So, integrating first with respect to $x$:

$$\int_{y}^{2} k(x^2 + xy)\, dx = k\left[\frac{x^3}{3} + \frac{x^2 y}{2}\right]_{x=y}^{2} = k\left[\left(\frac{8}{3} + 2y\right) - \left(\frac{y^3}{3} + \frac{y^3}{2}\right)\right] = k\left(\frac{8}{3} + 2y - \frac{5y^3}{6}\right)$$

      We now integrate this expression with respect to $y$, using the limits 0 and 2:

$$\int_{0}^{2} k\left(\frac{8}{3} + 2y - \frac{5y^3}{6}\right) dy = k\left[\frac{8y}{3} + y^2 - \frac{5y^4}{24}\right]_{0}^{2} = k\left(\frac{16}{3} + 4 - \frac{10}{3}\right) = 6k$$

      Since this must be equal to 1, we see that $k = \dfrac{1}{6}$.

(ii)  By integrating first with respect to $x$, and setting $k = 1/6$, we have already obtained the marginal distribution for $Y$:

$$f_Y(y) = \frac{1}{6}\left(\frac{8}{3} + 2y - \frac{5y^3}{6}\right), \quad 0 < y < 2$$

      To obtain the marginal distribution for $X$, we must integrate first with respect to $y$. We see from the inequality given in the question that the limits for $y$ are now 0 and $x$. So:

$$f_X(x) = \int_{0}^{x} \frac{1}{6}(x^2 + xy)\, dy = \frac{1}{6}\left[x^2 y + \tfrac{1}{2}xy^2\right]_{y=0}^{x} = \frac{1}{6}\left(x^3 + \tfrac{1}{2}x^3\right) = \frac{1}{4}x^3, \quad 0 < x < 2$$

(iii) The PDF of $Y \mid X = x$ is obtained by dividing the joint PDF by the marginal PDF for $X$:

$$f_{Y|X=x}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)} = \frac{\tfrac{1}{6}(x^2 + xy)}{\tfrac{1}{4}x^3} = \frac{2}{3}\left(\frac{1}{x} + \frac{y}{x^2}\right), \quad 0 < y < x < 2$$
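The care needed with limits on a triangular region can be illustrated numerically. This is a minimal sketch, assuming a hand-rolled midpoint rule (the names `integrate`, `joint` and `marginal_x` are hypothetical): the inner integral over $y$ runs from 0 to $x$, so its upper limit depends on the outer variable.

```python
def integrate(f, a, b, n=400):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def joint(x, y, k=1 / 6):
    """Joint PDF k(x^2 + xy) on the triangle 0 < y < x < 2."""
    return k * (x * x + x * y)

def marginal_x(x):
    # The upper limit for y is x, not 2: the region is 0 < y < x
    return integrate(lambda y: joint(x, y), 0.0, x)

total = integrate(marginal_x, 0.0, 2.0)
print(round(total, 4))           # close to 1 when k = 1/6
print(marginal_x(1.2), 1.2 ** 3 / 4)
```

Replacing the inner upper limit by 2 would integrate over the whole square and give a total well above 1, which is the kind of error the text warns about.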
1.5 Independence of random variables

Consider a pair of variables $(X, Y)$, and suppose that the conditional distribution of $Y$ given $X = x$ does not actually depend on $x$ at all. It follows that the probability function/PDF $f(y \mid x)$ must be simply that of the marginal distribution of $Y$, $f_Y(y)$.

Here $f(y \mid x)$ is an abbreviation for $f_{Y|X=x}(y \mid x)$.

So, if conditional is equivalent to marginal, then:

$$f_Y(y) = f_{Y|X=x}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)}$$

ie $f_{X,Y}(x, y) = f_X(x)\, f_Y(y)$

So the joint PF/PDF is the product of the marginals.

This motivates the definition, which is given here for two variables.

Definition

The random variables $X$ and $Y$ are independent if, and only if, the joint probability function/PDF is the product of the two marginal probability functions/PDFs for all $(x, y)$ in the range of the variables, ie:

$$f_{X,Y}(x, y) = f_X(x)\, f_Y(y) \quad \text{for all } (x, y) \text{ in the range}$$

Discrete case

It follows that probability statements about values assumed by $(X, Y)$ can be broken down into statements about $X$ and $Y$ separately. So if $X$ and $Y$ are independent discrete variables then:

$$P(X = x, Y = y) = P(X = x)\, P(Y = y)$$
Question

Determine whether the variables $X$ and $Y$ given below are independent.

          X = 0    X = 1    X = 2
Y = 1     0.1      0.1      0
Y = 2     0.1      0.1      0.2
Y = 3     0.2      0.1      0.1
Solution

They are not independent. For example, we saw earlier that the conditional distribution of $X$ given $Y = 2$ is not the same as the marginal distribution of $X$.

Alternatively, we can see that, for example, $P(X=0, Y=1) = 0.1$. However, $P(X=0) = 0.4$ and $P(Y=1) = 0.2$. Since $0.4 \times 0.2 \ne 0.1$, the two random variables are not independent.

To show that the random variables are not independent, we only need to show that the joint probability is not equal to the product of the marginal probabilities in any one particular case. If we wish to show that they are independent, we need to show that the multiplication works for all possible values of $x$ and $y$.

As a quick check in the discrete case, note that, for independence, the probabilities in each row in the table must be in the same ratios as the probabilities in every other row (and similarly for the columns). This is not the case here.
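The "check every cell" requirement lends itself to a short script. The sketch below is illustrative only (the function names `marginals` and `is_independent` are assumptions); it tests the table above and, for contrast, the $M$, $N$ probability function used earlier in the chapter.

```python
from fractions import Fraction as F

# Joint table from the question, keyed by (x, y)
joint_xy = {
    (0, 1): F(1, 10), (1, 1): F(1, 10), (2, 1): F(0),
    (0, 2): F(1, 10), (1, 2): F(1, 10), (2, 2): F(2, 10),
    (0, 3): F(2, 10), (1, 3): F(1, 10), (2, 3): F(1, 10),
}

# P(M=m, N=n) = m / (35 * 2^(n-2)), keyed by (m, n)
joint_mn = {(m, n): F(m, 35) * F(2) ** (2 - n)
            for m in range(1, 5) for n in range(1, 4)}

def marginals(table):
    """Return the two marginal probability functions of a joint table."""
    first, second = {}, {}
    for (a, b), p in table.items():
        first[a] = first.get(a, 0) + p
        second[b] = second.get(b, 0) + p
    return first, second

def is_independent(table):
    """True only if joint = product of marginals in every cell."""
    first, second = marginals(table)
    return all(p == first[a] * second[b] for (a, b), p in table.items())

print(is_independent(joint_xy), is_independent(joint_mn))
```

Exact fractions are used so that the equality test in each cell is not affected by rounding.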
Question

The joint probability function of $M$ and $N$ is:

$$P(M=m, N=n) = \frac{m}{35 \times 2^{n-2}}, \quad \text{where } m = 1, 2, 3, 4 \text{ and } n = 1, 2, 3$$

Determine whether the variables $M$ and $N$ are independent.

Solution

They are independent. We saw earlier that $P(M=m) = \dfrac{m}{10}$ and $P(N=n) = \dfrac{1}{7 \times 2^{n-3}}$. Hence the joint probability distribution is the product of the two marginal distributions. So the variables are independent.
Continuous case

If $X$ and $Y$ are continuous, the double integral required to evaluate a joint probability splits into the product of two separate integrals, one for $X$ and one for $Y$, and we have:

$$P(x_1 < X < x_2,\ y_1 < Y < y_2) = P(x_1 < X < x_2)\, P(y_1 < Y < y_2)$$

This means that in the continuous case, if the two random variables are independent, we can factorise the joint PDF into two separate expressions, one of which will be a function of $x$ only, and the other will be a function of $y$ only.
Question

The joint PDF of $U$ and $V$ is:

$$f_{U,V}(u, v) = \frac{2u + v}{3{,}000}, \quad \text{where } 10 < u < 20 \text{ and } -5 < v < 5$$

Determine whether the variables $U$ and $V$ are independent.

Solution

They are not independent. If they were it would be possible to factorise $\dfrac{2u + v}{3{,}000}$ into two functions of the form $g(u)\, h(v)$. As this is not possible, the random variables are not independent.
Functions of random variables

If the random variables $X$ and $Y$ are independent, then any functions $g(X)$ and $h(Y)$ are also independent.

This should be intuitively obvious if we think of independence as meaning that the quantities have no influence on each other.

Several variables

When considering three or more variables, the definition of independence involves the factorisation of the joint probability function into the product of all the individual marginal probability functions. For $X$, $Y$, and $Z$ to be independent it is not sufficient that they are independent taken two at a time (pairwise independent).
Question

Consider the joint probability density function of $X$, $Y$ and $Z$ given by:

$$f(x, y, z) = \begin{cases} (x + y)e^{-z} & \text{for } 0 < x < 1,\ 0 < y < 1,\ z > 0 \\ 0 & \text{otherwise} \end{cases}$$

Verify that the random variables $X$, $Y$ and $Z$ are not independent, but that the two random variables $X$ and $Z$ are pairwise independent, and also that the two random variables $Y$ and $Z$ are pairwise independent.
Solution

The joint density function of $X$ and $Z$ is:

$$f_{X,Z}(x, z) = \int_{0}^{1} (x + y)e^{-z}\, dy = e^{-z}\left[xy + \tfrac{1}{2}y^2\right]_{0}^{1} = e^{-z}\left(x + \tfrac{1}{2}\right)$$

The joint density function of $Y$ and $Z$ is:

$$f_{Y,Z}(y, z) = \int_{0}^{1} (x + y)e^{-z}\, dx = e^{-z}\left[\tfrac{1}{2}x^2 + xy\right]_{0}^{1} = e^{-z}\left(y + \tfrac{1}{2}\right)$$

The marginal density functions are:

$$f_X(x) = \int_{0}^{\infty} e^{-z}\left(x + \tfrac{1}{2}\right) dz = \left[-e^{-z}\left(x + \tfrac{1}{2}\right)\right]_{0}^{\infty} = x + \tfrac{1}{2}$$

$$f_Z(z) = \int_{0}^{1} e^{-z}\left(x + \tfrac{1}{2}\right) dx = e^{-z}\left[\tfrac{1}{2}x^2 + \tfrac{1}{2}x\right]_{0}^{1} = e^{-z}$$

$$f_Y(y) = \int_{0}^{\infty} e^{-z}\left(y + \tfrac{1}{2}\right) dz = \left[-e^{-z}\left(y + \tfrac{1}{2}\right)\right]_{0}^{\infty} = y + \tfrac{1}{2}$$

If we multiply together the marginal PDFs for $X$, $Y$ and $Z$, we obtain:

$$f_X(x)\, f_Y(y)\, f_Z(z) = \left(x + \tfrac{1}{2}\right)\left(y + \tfrac{1}{2}\right)e^{-z}$$

Comparing this with the joint distribution PDF $f(x, y, z) = (x + y)e^{-z}$, we see that they are not the same. So $X$, $Y$ and $Z$ are not independent.

However the products of the marginal distribution PDFs for $X$ and $Z$, and for $Y$ and $Z$, do give the respective joint PDFs. So $X$ and $Z$ are pairwise independent, and $Y$ and $Z$ are also pairwise independent.
2 Expectations of functions of two variables

2.1 Expectations

The expression for the expected value of a function $g(X, Y)$ of the random variables $(X, Y)$ is found by summing (discrete case) or integrating (continuous case) the product:

value × probability of assuming that value

over all values (or combinations of) $(x, y)$. The summation is a double summation, the integral a double integral.

Thus for discrete variables:

$$E[g(X, Y)] = \sum_x \sum_y g(x, y)\, p_{X,Y}(x, y) = \sum_x \sum_y g(x, y)\, P(X = x, Y = y)$$

where the summation is over all possible values of $x$ and $y$.

This result parallels that for single random variables, where the expected value of a function of a discrete random variable was defined to be $E[g(X)] = \displaystyle\sum_x g(x)\, P(X = x)$.
Question

Calculate the expected value of $\dfrac{N+1}{M}$, where the joint distribution of $M$ and $N$ is:

          M = 1    M = 2    M = 3    M = 4
N = 1     2/35     4/35     6/35     8/35
N = 2     1/35     2/35     3/35     4/35
N = 3     1/70     1/35     3/70     2/35

ie $P(M=m, N=n) = \dfrac{m}{35 \times 2^{n-2}}$.
Solution

From the table of values, working across from the top left gives:

$$E\left[\frac{N+1}{M}\right] = \frac{2}{1} \times \frac{2}{35} + \frac{2}{2} \times \frac{4}{35} + \cdots + \frac{4}{3} \times \frac{3}{70} + \frac{4}{4} \times \frac{2}{35} = \frac{36}{35}$$

Alternatively, we could work from the formula:

$$P(M=m, N=n) = \frac{m}{35 \times 2^{n-2}}, \quad \text{where } m = 1, 2, 3, 4 \text{ and } n = 1, 2, 3$$

This gives:

$$E\left[\frac{N+1}{M}\right] = \sum_{m=1}^{4} \sum_{n=1}^{3} \frac{n+1}{m}\, P(M=m, N=n) = \sum_{m=1}^{4} \sum_{n=1}^{3} \frac{n+1}{m} \times \frac{m}{35 \times 2^{n-2}} = \sum_{m=1}^{4} \sum_{n=1}^{3} \frac{n+1}{35 \times 2^{n-2}}$$

$$= \frac{4}{35}\left(\frac{1+1}{2^{-1}} + \frac{2+1}{2^{0}} + \frac{3+1}{2^{1}}\right) = \frac{36}{35}$$

For continuous variables:

$$E[g(X, Y)] = \int_x \int_y g(x, y)\, f_{X,Y}(x, y)\, dy\, dx$$

where the integration is over all possible values of $x$ and $y$.

This result parallels that for single random variables, where the expected value of a function of a continuous random variable was defined to be $E[g(X)] = \displaystyle\int_x g(x)\, f(x)\, dx$.
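The discrete double summation for $E[(N+1)/M]$ can be written almost verbatim in code. This is a sketch under the assumption of a small helper `expect` (a hypothetical name) that takes any function $g(m, n)$; the probability function is the one from the question.

```python
from fractions import Fraction

def joint(m, n):
    """P(M=m, N=n) = m / (35 * 2^(n-2))."""
    return Fraction(m, 35) * Fraction(2) ** (2 - n)

def expect(g):
    """E[g(M, N)] as a double sum of value x probability."""
    return sum(g(m, n) * joint(m, n)
               for m in range(1, 5) for n in range(1, 4))

e_ratio = expect(lambda m, n: Fraction(n + 1, m))   # E[(N+1)/M]
check = expect(lambda m, n: 1)                      # probabilities sum to 1

print(e_ratio, check)
```

Passing a different `g` gives any other expectation for the same joint distribution without changing the summation.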
Question

$U$ and $V$ have joint density function:

$$f_{U,V}(u, v) = \frac{2u + v}{3{,}000}, \quad \text{where } 10 < u < 20 \text{ and } -5 < v < 5$$

(i)   Calculate $E(U)$ and $E(V)$:

      (a) using $f_{U,V}(u, v)$
      (b) using $f_U(u)$ and $f_V(v)$.

(ii)  Comment on your answers.
Solution

(i)(a) Integrating with respect to $v$ first and then with respect to $u$, we obtain:

$$E(U) = \int_{u=10}^{20} \int_{v=-5}^{5} u\, \frac{2u + v}{3{,}000}\, dv\, du = \int_{u=10}^{20} \frac{1}{3{,}000}\left[2u^2 v + \tfrac{1}{2}uv^2\right]_{v=-5}^{5} du = \int_{u=10}^{20} \frac{u^2}{150}\, du = \left[\frac{u^3}{450}\right]_{10}^{20} = \frac{140}{9}$$

       Similarly:

$$E(V) = \int_{u=10}^{20} \int_{v=-5}^{5} v\, \frac{2u + v}{3{,}000}\, dv\, du = \int_{u=10}^{20} \frac{1}{3{,}000}\left[uv^2 + \tfrac{1}{3}v^3\right]_{v=-5}^{5} du = \int_{u=10}^{20} \frac{1}{36}\, du = \left[\frac{u}{36}\right]_{10}^{20} = \frac{5}{18}$$

(i)(b) We have already found that $f_U(u) = \dfrac{u}{150}$ and $f_V(v) = \dfrac{30 + v}{300}$. Hence:

$$E(U) = \int_{u=10}^{20} u\, \frac{u}{150}\, du = \left[\frac{u^3}{450}\right]_{10}^{20} = \frac{140}{9}$$

$$E(V) = \int_{v=-5}^{5} v\, \frac{30 + v}{300}\, dv = \frac{1}{300}\left[15v^2 + \tfrac{1}{3}v^3\right]_{v=-5}^{5} = \frac{5}{18}$$

(ii)  Both methods are equivalent.

2.2 Expectation of a sum
It follows that:

$$E[a\, g(X) + b\, h(Y)] = a\, E[g(X)] + b\, E[h(Y)]$$

where $a$ and $b$ are constants, so handling the expected value of a linear combination of functions is no more difficult than handling the expected values of the individual functions.

The definition of expected value and this last result (on the expected value of a sum of functions) extend to functions of more than 2 variables.

In particular $E[g(X) + h(Y)] = E[g(X)] + E[h(Y)]$ and these expected values will usually be easier to find from the respective marginal distributions, so there would be no need for double sums/integrals.

If we have the sum of a number of random variables:

$$S = X_1 + X_2 + \cdots + X_n$$

then, by extension of the result above:

$$E(S) = E(X_1) + E(X_2) + \cdots + E(X_n)$$

So the expectation of the sum is equal to the sum of the expectations. This is true whether or not the random variables $X_i$ are independent.
Question

Verify that $E[X^2 + 2Y] = E[X^2] + E[2Y]$, for the random variables $X$ and $Y$ given here:

          X = 0    X = 1    X = 2
Y = 1     0.1      0.1      0
Y = 2     0.1      0.1      0.2
Y = 3     0.2      0.1      0.1
Solution

Reading the values from the table, we have:

$$E[X^2 + 2Y] = (0^2 + 2 \times 1) \times 0.1 + (1^2 + 2 \times 1) \times 0.1 + \cdots + (2^2 + 2 \times 3) \times 0.1 = 5.9$$

Using the marginal distributions of $X$ and $Y$:

$$E[X^2] = 0 \times 0.4 + 1 \times 0.3 + 4 \times 0.3 = 1.5$$

$$E[2Y] = 2 \times 0.2 + 4 \times 0.4 + 6 \times 0.4 = 4.4$$

Thus $E[X^2] + E[2Y] = 5.9$, and the result has been verified.

2.3 Expectation of a product

For independent random variables $X$ and $Y$:

$$E[g(X)\, h(Y)] = E[g(X)]\, E[h(Y)]$$

since the joint density function factorises into the two marginal density functions.

This result is true only for INDEPENDENT random variables.
Question

The joint probability function of $M$ and $N$ is given by:

$$P(M=m, N=n) = \frac{m}{35 \times 2^{n-2}}, \quad \text{where } m = 1, 2, 3, 4 \text{ and } n = 1, 2, 3$$

          M = 1    M = 2    M = 3    M = 4
N = 1     2/35     4/35     6/35     8/35
N = 2     1/35     2/35     3/35     4/35
N = 3     1/70     1/35     3/70     2/35

Verify that $E\left[\dfrac{N+1}{M}\right] = E\left[\dfrac{1}{M}\right] E[N+1]$.
Solution

We have previously seen that $E\left[\dfrac{N+1}{M}\right] = \dfrac{36}{35}$.

We can calculate $E\left[\dfrac{1}{M}\right]$ and $E[N+1]$ using the marginal probability functions:

$$E\left[\frac{1}{M}\right] = \sum_{m=1}^{4} \frac{1}{m}\, P(M=m) = \sum_{m=1}^{4} \frac{1}{m} \times \frac{m}{10} = \frac{4}{10} = \frac{2}{5}$$

and

$$E[N+1] = \sum_{n=1}^{3} (n+1)\, P(N=n) = 2 \times \frac{4}{7} + 3 \times \frac{2}{7} + 4 \times \frac{1}{7} = \frac{18}{7}$$

This gives $E\left[\dfrac{1}{M}\right] E[N+1] = \dfrac{2}{5} \times \dfrac{18}{7} = \dfrac{36}{35}$, which verifies the result.

This should not be surprising, since we showed earlier that $M$ and $N$ are independent random variables.

Note that $E\left[\dfrac{N+1}{M}\right] \ne \dfrac{E(N+1)}{E(M)}$.

If we take the functions to be $g(X) = X$ and $h(Y) = Y$, these last two results give us some simple relationships between two random variables $X$ and $Y$:

(a)   $E[X + Y] = E[X] + E[Y]$
(b)   if $X$ and $Y$ are independent, then $E[XY] = E[X]\, E[Y]$.

2.4 Covariance and correlation coefficient
The covariance $\text{cov}[X, Y]$ of two random variables $X$ and $Y$ is defined by:

$$\text{cov}[X, Y] = E[(X - E[X])(Y - E[Y])]$$

This simplifies to:

$$\text{cov}[X, Y] = E[XY] - E[X]\, E[Y]$$

Notice the similarity between the covariance defined here and the definition of the variance:

$$\text{var}(X) = E[(X - E(X))^2] = E(X^2) - [E(X)]^2$$
Question

Show that the simplification $\text{cov}[X, Y] = E[XY] - E[X]\, E[Y]$ is correct.

Solution

If we expand the definition of the covariance, we obtain:

$$\text{cov}(X, Y) = E[(X - E[X])(Y - E[Y])]$$

$$= E\big[XY - X E[Y] - Y E[X] + E[X] E[Y]\big]$$

$$= E[XY] - E[X] E[Y] - E[Y] E[X] + E[X] E[Y]$$

$$= E[XY] - E[X]\, E[Y]$$

It is often easier to use the formula $\text{cov}(X, Y) = E[XY] - E[X]\, E[Y]$ when calculating covariances, rather than using the formula given in the definition above.

If we rearrange this formula it tells us how to find $E[XY]$ for random variables that are not independent, ie $E[XY] = E[X]\, E[Y] + \text{cov}[X, Y]$. We will return to independence shortly.

Note: The units of $\text{cov}(X, Y)$ are the product of those of $X$ and $Y$. So for example if $X$ is a time in hours, and $Y$ is a sum of money in £, then cov is in £ × hours. Note also that $\text{cov}[X, X] = \text{var}[X]$.
Question

Calculate the covariance of the random variables $X$ and $Y$ whose joint distribution is as follows:

          X = 0    X = 1    X = 2
Y = 1     0.1      0.1      0
Y = 2     0.1      0.1      0.2
Y = 3     0.2      0.1      0.1
Solution

We will use the formula $\text{cov}(X, Y) = E[XY] - E[X]\, E[Y]$.

From the table of values:

$$E[XY] = 0 \times 1 \times 0.1 + \cdots + 2 \times 3 \times 0.1 = 2$$

The (marginal) probability distribution of $X$ is:

x           0      1      2
P(X = x)    0.4    0.3    0.3

So:

$$E[X] = 0 \times 0.4 + 1 \times 0.3 + 2 \times 0.3 = 0.9$$

The (marginal) probability distribution of $Y$ is:

y           1      2      3
P(Y = y)    0.2    0.4    0.4

So:

$$E[Y] = 1 \times 0.2 + 2 \times 0.4 + 3 \times 0.4 = 2.2$$

Hence:

$$\text{cov}(X, Y) = 2 - 0.9 \times 2.2 = 0.02$$
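The covariance calculation above can be reproduced directly from the joint table. The following is a sketch only; the dictionary layout and the variable names are assumptions made for illustration, and exact fractions are used so that 0.02 appears as 1/50.

```python
from fractions import Fraction as F

# Joint table from the question, keyed by (x, y)
joint = {
    (0, 1): F(1, 10), (1, 1): F(1, 10), (2, 1): F(0),
    (0, 2): F(1, 10), (1, 2): F(1, 10), (2, 2): F(2, 10),
    (0, 3): F(2, 10), (1, 3): F(1, 10), (2, 3): F(1, 10),
}

e_xy = sum(x * y * p for (x, y), p in joint.items())
e_x = sum(x * p for (x, y), p in joint.items())
e_y = sum(y * p for (x, y), p in joint.items())

# cov(X, Y) = E[XY] - E[X] E[Y]
cov_xy = e_xy - e_x * e_y

print(e_xy, e_x, e_y, cov_xy)
```

Note that $E[X]$ and $E[Y]$ are computed here from the joint table rather than from the marginals; both routes give the same values.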
Useful results on handling covariances

(a)   $\text{cov}[aX + b,\ cY + d] = ac\, \text{cov}[X, Y]$

Proof:

$E[aX + b] = aE[X] + b$ and $E[cY + d] = cE[Y] + d$

so $aX + b - E[aX + b] = a(X - E[X])$ and $cY + d - E[cY + d] = c(Y - E[Y])$

$$\text{cov}[aX + b,\ cY + d] = E\big[a(X - E[X])\, c(Y - E[Y])\big] = ac\, \text{cov}[X, Y]$$

Note: The changes of origin ($b$ and $d$) have no effect, because we are using deviations from means. The changes of scale ($a$ and $c$) carry through.

This means that constants that are added or subtracted can be ignored and constants that are multiplied or divided are pulled out. Note the similarity between this result and that for the variance: $\text{var}[aX + b] = a^2\, \text{var}[X]$.

(b)   $\text{cov}[X,\ Y + Z] = \text{cov}[X, Y] + \text{cov}[X, Z]$

Proof:

$E[X(Y + Z)] = E[XY] + E[XZ]$ and $E[Y + Z] = E[Y] + E[Z]$

$$\text{cov}[X,\ Y + Z] = E[XY] + E[XZ] - E[X]\big(E[Y] + E[Z]\big)$$

$$= E[XY] - E[X]E[Y] + E[XZ] - E[X]E[Z]$$

$$= \text{cov}[X, Y] + \text{cov}[X, Z]$$

These two results hold for any random variables $X$, $Y$ and $Z$ (whenever the covariances exist).

This result is just like multiplying out brackets using the distributive law: $x(y + z) = xy + xz$.

These results will be useful in Subject CS2.
Question

Write down the formula for $\text{cov}[X + Y,\ W + Z]$.

Solution

$$\text{cov}(X + Y,\ W + Z) = \text{cov}(X, W) + \text{cov}(X, Z) + \text{cov}(Y, W) + \text{cov}(Y, Z)$$

The next result concerns random variables that are independent.

(c)   If $X$ and $Y$ are independent, $\text{cov}[X, Y] = 0$.

Proof:

$$\text{cov}[X, Y] = E[XY] - E[X]\, E[Y] = 0$$

The covariance of $M$ and $N$ used in earlier examples is zero as they are independent.

The result $E[XY] = E[X]\, E[Y]$ extends to the expected value of the product of any finite number of independent variables, ie $E[X_1 \cdots X_n] = E[X_1] \cdots E[X_n]$.
The covariance between $X$ and $Y$ is a measure of the strength of the linear association or linear relationship between the variables. However it suffers from the fact that its value is dependent on the units of measurement of the variables.

A related quantity to the covariance is the correlation coefficient, which is a dimensionless quantity (ie it has no units).

The correlation coefficient (written as $\text{corr}(X, Y)$ or $\rho(X, Y)$) of two random variables $X$ and $Y$ is defined by:

$$\text{corr}(X, Y) = \frac{\text{cov}(X, Y)}{\sqrt{\text{var}(X)\, \text{var}(Y)}}$$
Question

Calculate the correlation coefficient of $U$ and $V$, where:

$$f_{U,V}(u, v) = \frac{2u + v}{3{,}000}, \quad \text{where } 10 < u < 20 \text{ and } -5 < v < 5$$

You are given that $E(U) = \dfrac{140}{9}$ and $E(V) = \dfrac{5}{18}$.
Solution

First we need $E[UV]$:

$$E[UV] = \int_{u=10}^{20} \int_{v=-5}^{5} uv\, \frac{2u + v}{3{,}000}\, dv\, du$$

Integrating first with respect to $u$:

$$E[UV] = \int_{v=-5}^{5} \frac{1}{3{,}000}\left[\tfrac{2}{3}u^3 v + \tfrac{1}{2}u^2 v^2\right]_{u=10}^{20} dv = \int_{v=-5}^{5} \left(\frac{14v}{9} + \frac{v^2}{20}\right) dv = \left[\frac{14v^2}{18} + \frac{v^3}{60}\right]_{-5}^{5} = \frac{25}{6}$$

So the covariance of $U$ and $V$ is:

$$\text{cov}(U, V) = \frac{25}{6} - \frac{140}{9} \times \frac{5}{18} = -\frac{25}{162}$$

We now need the variance of $U$ and $V$:

$$\text{var}(U) = \int_{10}^{20} u^2\, \frac{u}{150}\, du - \left(\frac{140}{9}\right)^2 = \left[\frac{u^4}{600}\right]_{10}^{20} - \left(\frac{140}{9}\right)^2 = \frac{650}{81}$$

Similarly:

$$\text{var}(V) = \int_{-5}^{5} v^2\, \frac{30 + v}{300}\, dv - \left(\frac{5}{18}\right)^2 = \left[\frac{v^3}{30} + \frac{v^4}{1{,}200}\right]_{-5}^{5} - \left(\frac{5}{18}\right)^2 = \frac{2{,}675}{324}$$

So the correlation coefficient is:

$$\text{corr}(U, V) = \frac{-\dfrac{25}{162}}{\sqrt{\dfrac{650}{81} \times \dfrac{2{,}675}{324}}} = -\frac{1}{\sqrt{2{,}782}} = -0.019$$

The correlation coefficient takes a value in the range $-1 \le \rho \le 1$. It reflects the degree of association between the two variables.

We can use this range to do a reasonableness check for any numerical answer. Any figure outside this range is automatically wrong.

A value for $\rho$ of $\pm 1$ indicates that the variables have perfect linear correlation. What this means is that one variable is actually a linear function of the other (with probability 1). If $\rho = 0$, the random variables are said to be uncorrelated. Independent variables are uncorrelated (but not all uncorrelated variables are independent).
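The arithmetic behind the correlation coefficient of $U$ and $V$ can be checked from the moments alone. This is a sketch: the second moments 250 and 25/3 are the values of the two integrals evaluated in the solution, and the variable names are illustrative assumptions.

```python
from fractions import Fraction
from math import sqrt

e_u, e_v = Fraction(140, 9), Fraction(5, 18)   # given in the question
e_uv = Fraction(25, 6)                         # E[UV] from the solution
e_u2 = Fraction(250)                           # integral of u^2 * u/150 over (10, 20)
e_v2 = Fraction(25, 3)                         # integral of v^2 * (30+v)/300 over (-5, 5)

cov = e_uv - e_u * e_v
var_u = e_u2 - e_u ** 2
var_v = e_v2 - e_v ** 2

# Squared correlation stays an exact fraction; the square root is taken in floats
corr_sq = cov ** 2 / (var_u * var_v)
corr = float(cov) / sqrt(float(var_u * var_v))

print(cov, var_u, var_v, corr_sq, round(corr, 3))
```

The final value lies comfortably inside the range from −1 to 1, which is the reasonableness check described above.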
This means that the converse of Result (c) given previously is not true. If $X$ and $Y$ are independent, their covariance is equal to zero. However, if the covariance of $X$ and $Y$ is zero, this does not necessarily mean that they are independent.

In simple terms, independent means that probabilities factorise, and uncorrelated means that expectations factorise.
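A standard small illustration of "uncorrelated but not independent" is sketched below. It is not taken from the course text: the choice of $X$ uniform on $\{-1, 0, 1\}$ with $Y = X^2$ is an assumption made purely for illustration.

```python
from fractions import Fraction

third = Fraction(1, 3)

# Hypothetical example: X is uniform on {-1, 0, 1} and Y = X^2,
# so only three (x, y) pairs have positive probability.
joint = {(-1, 1): third, (0, 0): third, (1, 1): third}

e_x = sum(x * p for (x, y), p in joint.items())
e_y = sum(y * p for (x, y), p in joint.items())
e_xy = sum(x * y * p for (x, y), p in joint.items())
cov = e_xy - e_x * e_y                 # expectations factorise: cov is 0

p_x0 = sum(p for (x, y), p in joint.items() if x == 0)
p_y0 = sum(p for (x, y), p in joint.items() if y == 0)
p_x0_y0 = joint[(0, 0)]                # probabilities do not factorise

print(cov, p_x0_y0, p_x0 * p_y0)
```

Here the covariance is zero even though $Y$ is completely determined by $X$, which shows why zero covariance on its own says nothing about independence.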
Question

A bivariate distribution has the following probability function:

           P = −1    P = 0    P = 1
Q = −1     0.1       0.6      0.1
Q = 1      0.1       0        0.1

Show that $P$ and $Q$ are uncorrelated but not independent.
Solution

We have:

$E[P] = 0$, $E[Q] = -0.6$ and $E[PQ] = 0$

so the covariance is zero.

However the conditional probability distribution of, say, $P \mid Q = 1$ takes the values $-1$ and $1$ each with probability 0.5, whereas the marginal distribution of $P$ takes the values $-1$, $0$ and $1$ with probabilities 0.2, 0.6 and 0.2 respectively. So the marginal distributions are different from the conditional distributions, and $P$ and $Q$ are not independent.

Alternatively, we could compare, for example, $P(P = -1, Q = 1)$ with $P(P = -1)\, P(Q = 1)$.

This confirms the sentence given in the Core Reading previously. This is an example of random variables that are uncorrelated, but not independent.

2.5 Variance of a sum
For any random variables $X$ and $Y$:

$$\text{var}[X + Y] = \text{var}[X] + \text{var}[Y] + 2\,\text{cov}[X, Y]$$

This can be proved from the definitions of variance and covariance.

One possible proof is as follows:

$$\text{var}[X + Y] = E\big[\big((X + Y) - E[X + Y]\big)^2\big]$$

$$= E\big[\big((X - E[X]) + (Y - E[Y])\big)^2\big]$$

$$= E\big[(X - E[X])^2\big] + E\big[(Y - E[Y])^2\big] + 2E\big[(X - E[X])(Y - E[Y])\big]$$

$$= \text{var}[X] + \text{var}[Y] + 2\,\text{cov}(X, Y)$$
Question

Set out an alternative proof of the above result starting from $\text{var}(X + Y) = \text{cov}(X + Y,\ X + Y)$.

Solution

$$\text{var}(X + Y) = \text{cov}(X + Y,\ X + Y)$$

$$= \text{cov}(X, X) + \text{cov}(X, Y) + \text{cov}(Y, X) + \text{cov}(Y, Y)$$

$$= \text{var}(X) + 2\,\text{cov}(X, Y) + \text{var}(Y)$$

For independent random variables, this can be simplified:

$$\text{var}[X + Y] = \text{var}[X] + \text{var}[Y]$$

since $\text{cov}[X, Y] = 0$.
Question

Show from first principles that the random variables $M$, $N$, whose joint probability function is given below, satisfy $\text{var}[M + N] = \text{var}[M] + \text{var}[N]$.

          M = 1    M = 2    M = 3    M = 4
N = 1     2/35     4/35     6/35     8/35
N = 2     1/35     2/35     3/35     4/35
N = 3     1/70     1/35     3/70     2/35
Solution

By adding up the probabilities from the table, we see that the random variable $M + N$ has the following distribution:

m + n               2       3       4        5        6        7
P(M + N = m + n)    2/35    5/35    17/70    12/35    11/70    2/35

The expectation of $M + N$ is:

$$E[M + N] = 2 \times \frac{2}{35} + \cdots + 7 \times \frac{2}{35} = \frac{32}{7}$$

The variance of $M + N$ is:

$$\text{var}(M + N) = 2^2 \times \frac{2}{35} + \cdots + 7^2 \times \frac{2}{35} - \left(\frac{32}{7}\right)^2 = \frac{75}{49}$$

From the marginal distributions:

$$\text{var}(M) = 1^2 \times \frac{1}{10} + \cdots + 4^2 \times \frac{4}{10} - 3^2 = 1$$

$$\text{var}(N) = 1^2 \times \frac{4}{7} + \cdots + 3^2 \times \frac{1}{7} - \left(\frac{11}{7}\right)^2 = \frac{26}{49}$$

So $M$ and $N$ satisfy the given relationship, since $1 + \dfrac{26}{49} = \dfrac{75}{49}$.
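The three variances above can be confirmed from the joint probability function without first tabulating the distribution of $M + N$. This is a minimal sketch; the helper `var` (a hypothetical name) computes $E[g^2] - (E[g])^2$ for any function $g(m, n)$.

```python
from fractions import Fraction

def joint(m, n):
    """P(M=m, N=n) = m / (35 * 2^(n-2))."""
    return Fraction(m, 35) * Fraction(2) ** (2 - n)

pairs = [(m, n) for m in range(1, 5) for n in range(1, 4)]

def var(g):
    """Variance of g(M, N), computed as E[g^2] - (E[g])^2."""
    mean = sum(g(m, n) * joint(m, n) for m, n in pairs)
    second = sum(g(m, n) ** 2 * joint(m, n) for m, n in pairs)
    return second - mean ** 2

var_m = var(lambda m, n: m)
var_n = var(lambda m, n: n)
var_sum = var(lambda m, n: m + n)

print(var_m, var_n, var_sum)
```

The equality `var_sum == var_m + var_n` holds here because $M$ and $N$ are independent; for a dependent pair the covariance term would also be needed.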
Similarly, it can be shown that:

$$\text{var}(X - Y) = \text{var}(X) + \text{var}(Y) - 2\,\text{cov}(X, Y)$$

and so for independent random variables, we get:

$$\text{var}(X - Y) = \text{var}(X) + \text{var}(Y)$$

The variance of a difference is not equal to the difference in the variances. For independent random variables it is equal to the sum of the variances. This must be true, since the difference in the variances could be negative but a variance can never be negative.
3 Convolutions

3.1 Introduction

Much of statistical theory involves the distributions of sums of random variables. In particular the sum of a number of independent variables is especially important.

Discrete case

Consider the sum of two discrete random variables, so let $Z = X + Y$, where $(X, Y)$ has joint probability function $P(x, y)$.

Then $P(Z = z)$ is found by summing $P(x, y)$ over all values of $(x, y)$ such that $x + y = z$, ie:

$$P_Z(z) = \sum_x P(x,\ z - x)$$

We did this when we calculated the distribution of $M + N$ in the previous question.

Now suppose that $X$ and $Y$ are independent variables. Then $P(x, y)$ is the product of the two marginal probability functions, so:

$$P_Z(z) = \sum_x P_X(x)\, P_Y(z - x)$$

Definition

When a function $P_Z$ can be expressed as a sum of this form, then $P_Z$ is called the convolution of the functions $P_X$ and $P_Y$. This is written symbolically as $P_Z = P_X * P_Y$.

So here, the probability function of $Z = X + Y$ is the convolution of the (marginal) probability functions of $X$ and $Y$.

Continuous case

In the case where $X$ and $Y$ are independent continuous variables with joint probability density function $f(x, y)$, the corresponding expression is:

$$f_Z(z) = \int_x f_X(x)\, f_Y(z - x)\, dx$$

Question

If $X \sim Poi(\lambda)$ and $Y \sim Poi(\mu)$ are independent random variables, obtain the probability function of $Z = X + Y$.
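Solution

Using the convolution formula for discrete random variables:

$$P(Z = z) = \sum_{x=0}^{z} P(X = x)\, P(Y = z - x)$$

$$= \sum_{x=0}^{z} \frac{\lambda^x e^{-\lambda}}{x!} \times \frac{\mu^{z-x} e^{-\mu}}{(z - x)!}$$

$$= \frac{e^{-(\lambda + \mu)}}{z!} \sum_{x=0}^{z} \frac{z!}{x!\,(z - x)!}\, \lambda^x \mu^{z-x}$$

$$= \frac{e^{-(\lambda + \mu)}}{z!} \sum_{x=0}^{z} \binom{z}{x} \lambda^x \mu^{z-x} = \frac{e^{-(\lambda + \mu)}}{z!}\, (\lambda + \mu)^z$$

Since this matches the probability function for the $Poi(\lambda + \mu)$ distribution (and $Z$ can take the values $0, 1, 2, \ldots$), it follows that $Z \sim Poi(\lambda + \mu)$.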
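The discrete convolution sum can also be evaluated numerically and compared with the Poisson probability function for the combined parameter. This is only a sketch: the parameter values 2 and 3 are arbitrary assumptions chosen for illustration, and `poisson_pmf` and `convolve_at` are hypothetical helper names.

```python
from math import exp, factorial

def poisson_pmf(k, rate):
    """P(K = k) for a Poisson distribution with the given rate."""
    return exp(-rate) * rate ** k / factorial(k)

def convolve_at(z, rate_x, rate_y):
    """P(Z = z) for Z = X + Y via the convolution sum over x = 0..z."""
    return sum(poisson_pmf(x, rate_x) * poisson_pmf(z - x, rate_y)
               for x in range(z + 1))

lam, mu = 2.0, 3.0   # illustrative parameter values only

for z in range(6):
    print(z, convolve_at(z, lam, mu), poisson_pmf(z, lam + mu))
```

The two printed columns agree to floating-point accuracy, in line with the algebraic result.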
We can also use a convolution approach to derive the distribution of the sum of two continuous random variables.

Question

If $X \sim Exp(\lambda)$ and $Y \sim Exp(\mu)$ are independent random variables, obtain the PDF of $Z = X + Y$.

Solution

Using the formula for continuous random variables (and assuming that $\lambda \ne \mu$):

$$f_Z(z) = \int_{0}^{z} \lambda e^{-\lambda x}\, \mu e^{-\mu(z - x)}\, dx = \lambda\mu\, e^{-\mu z} \int_{0}^{z} e^{-(\lambda - \mu)x}\, dx$$

$$= \frac{\lambda\mu}{\lambda - \mu}\, e^{-\mu z}\left(1 - e^{-(\lambda - \mu)z}\right) = \frac{\lambda\mu}{\lambda - \mu}\left(e^{-\mu z} - e^{-\lambda z}\right)$$

If $\lambda = \mu$, we get $\lambda^2 z e^{-\lambda z}$, which is the PDF of the $Gamma(2, \lambda)$ distribution.

We can also use MGFs to find the distribution of a sum of random variables. This will be dealt with in Section 4. The MGF method is much easier than the convolution method.

3.2 Moments of linear combinations of random variables
In the last section we looked at the properties of functions of two random variables. We can now extend these results to more than two variables.

Mean

If $X_1, X_2, \ldots, X_n$ are any random variables (not necessarily independent), then:

$$E(c_1 X_1 + c_2 X_2 + \cdots + c_n X_n) = c_1 E(X_1) + c_2 E(X_2) + \cdots + c_n E(X_n)$$

where $c_1, c_2, \ldots, c_n$ are any constants.

ie $E\left[\displaystyle\sum_{i=1}^{n} c_i X_i\right] = \displaystyle\sum_{i=1}^{n} c_i E(X_i)$

This is an extension of the result concerning the expectation of a function of two random variables that we saw earlier, ie $E[a\, g(X) + b\, h(Y)] = a\, E[g(X)] + b\, E[h(Y)]$.

Variance

Let $Y = c_1 X_1 + c_2 X_2 + \cdots + c_n X_n$, where the variables are not necessarily independent, and let us now consider the variance:

$$\text{var}(Y) = \text{cov}(Y, Y) = \text{cov}(c_1 X_1 + c_2 X_2 + \cdots + c_n X_n,\ c_1 X_1 + c_2 X_2 + \cdots + c_n X_n)$$

$$= \sum_i c_i^2\, \text{cov}(X_i, X_i) + 2 \sum_{i<j} c_i c_j\, \text{cov}(X_i, X_j)$$

This is an extension of the result $\text{var}(X_1 + X_2) = \text{var}(X_1) + \text{var}(X_2) + 2\,\text{cov}(X_1, X_2)$.

If $X_1, X_2, \ldots, X_n$ are pairwise uncorrelated (and hence certainly if they are independent) random variables, then:

$$\text{var}(c_1 X_1 + c_2 X_2 + \cdots + c_n X_n) = c_1^2\, \text{var}(X_1) + c_2^2\, \text{var}(X_2) + \cdots + c_n^2\, \text{var}(X_n)$$

ie $\text{var}\left[\displaystyle\sum_{i=1}^{n} c_i X_i\right] = \displaystyle\sum_{i=1}^{n} c_i^2\, \text{var}(X_i)$
Question
The random variables X, Y and Z have means $\mu_X = 4$, $\mu_Y = -5$, $\mu_Z = 6$ and variances $\sigma_X^2 = 1$, $\sigma_Y^2 = 4$, $\sigma_Z^2 = 3$. The covariances are as follows:
cov(X, Y) = -3
cov(X, Z) = -2
cov(Y, Z) = 1
Calculate the mean and variance of $W = X - 2Y + 3Z$.
Solution
The mean is:
$$E[W] = E[X] - 2E[Y] + 3E[Z] = 4 - (2 \times -5) + (3 \times 6) = 32$$
Since the random variables X, Y and Z are not independent, we can see that:
$$\text{var}(W) \neq \text{var}(X) + 4\,\text{var}(Y) + 9\,\text{var}(Z)$$
Instead we have:
$$\text{var}(W) = \text{var}(X - 2Y + 3Z) = \text{cov}(X - 2Y + 3Z,\; X - 2Y + 3Z)$$
$$= \text{var}(X) + 4\,\text{var}(Y) + 9\,\text{var}(Z) - 4\,\text{cov}(X,Y) + 6\,\text{cov}(X,Z) - 12\,\text{cov}(Y,Z)$$
$$= 1 + (4 \times 4) + (9 \times 3) - (4 \times -3) + (6 \times -2) - (12 \times 1) = 32$$
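The calculation above is an instance of the general formula for the variance of a linear combination, which sums $c_ic_j\,\text{cov}(X_i, X_j)$ over all pairs. The following minimal sketch applies that formula to the figures in this question. The variable names are illustrative assumptions, and the matrix simply records the variances and covariances given in the question.

```python
# Coefficients of W = X - 2Y + 3Z, in the order (X, Y, Z)
coeffs = [1, -2, 3]
means = [4, -5, 6]

# Covariance matrix: variances on the diagonal, covariances off the diagonal
cov = [
    [1, -3, -2],
    [-3, 4, 1],
    [-2, 1, 3],
]

# E[W] = sum of c_i * E[X_i]
mean_w = sum(c * m for c, m in zip(coeffs, means))

# var(W) = sum over all i, j of c_i * c_j * cov(X_i, X_j)
var_w = sum(
    coeffs[i] * coeffs[j] * cov[i][j] for i in range(3) for j in range(3)
)

print(mean_w, var_w)
```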
It is important to note that there is a distinction between adding up random variables, and multiplying a random variable by a constant.
Question
If $X_1, \ldots, X_n$ are independent random variables with mean $\mu$ and variance $\sigma^2$, obtain the mean and variance of $S = X_1 + X_2 + \cdots + X_n$ and $T = nX_1$.
Solution
The mean and variance of S (which is a sum of independent random variables) are:
$$E[S] = E[X_1 + \cdots + X_n] = E[X_1] + E[X_2] + \cdots + E[X_n] = n\mu$$
$$\text{var}(S) = \text{var}(X_1 + \cdots + X_n) = \text{var}[X_1] + \text{var}[X_2] + \cdots + \text{var}[X_n] = n\sigma^2$$
The mean and variance of T (which is a single random variable multiplied by a constant) are:
$$E[T] = E[nX_1] = nE[X_1] = n\mu$$
$$\text{var}(T) = \text{var}(nX_1) = n^2\,\text{var}(X_1) = n^2\sigma^2$$
The means are the same but the variance of S is smaller.
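To make the distinction concrete, the sketch below works through a hypothetical example that is not taken from the course: each variable is the score on a fair six-sided die and n = 3. It enumerates every outcome and uses exact fractions, so the sum of three independent scores can be compared with three times a single score.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)  # hypothetical example: one fair six-sided die
n = 3                # number of independent copies being added

def mean_var(dist):
    """Exact mean and variance of a list of (value, probability) pairs."""
    mean = sum(v * p for v, p in dist)
    var = sum((v - mean) ** 2 * p for v, p in dist)
    return mean, var

# S = X1 + X2 + X3: every ordered triple of faces is equally likely
dist_s = [(sum(outcome), Fraction(1, 6**n)) for outcome in product(faces, repeat=n)]

# T = 3 * X1: a single score multiplied by a constant
dist_t = [(n * x, Fraction(1, 6)) for x in faces]

mean_s, var_s = mean_var(dist_s)
mean_t, var_t = mean_var(dist_t)
print(mean_s, var_s, mean_t, var_t)
```

With a single-die variance of 35/12, the two variances come out as 3 times and 9 times that figure respectively, in line with the general result.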
4 Using generating functions to derive distributions of linear combinations of independent random variables
In the last section, we saw that we can find the distribution of a sum of a number of independent random variables using convolutions. In this section we look at an alternative and frankly much easier method.
In many cases generating functions may make it possible to specify the actual distribution of Y, where $Y = c_1X_1 + c_2X_2 + \cdots + c_nX_n$.

4.1 Moment generating functions
Suppose $X_1$ and $X_2$ are independent random variables with MGFs $M_{X_1}(t)$ and $M_{X_2}(t)$ respectively, and let $S = c_1X_1 + c_2X_2$.
Then:
$$M_S(t) = E\left[e^{(c_1X_1 + c_2X_2)t}\right] = E\left[e^{c_1X_1t}\right]E\left[e^{c_2X_2t}\right] = M_{X_1}(c_1t)\,M_{X_2}(c_2t)$$
In the case of a simple sum $Z = X + Y$, we have:
$$M_Z(t) = M_X(t)\,M_Y(t)$$
so the MGF of the sum of two independent variables is the product of the individual MGFs.
The result extends to the sum of more than two variables.
Let $Y = X_1 + X_2 + \cdots + X_n$ where the $X_i$ are independent and $X_i$ has MGF $M_i(t)$. Then:
$$M_Y(t) = M_1(t)\,M_2(t)\cdots M_n(t)$$
(and if $X_i$ in the sum is replaced by $c_iX_i$, then $M_i(t)$ in the product is replaced by $M_i(c_it)$).
If, in addition, the $X_i$'s are identically distributed, each with MGF $M(t)$, and $Y = X_1 + X_2 + \cdots + X_n$, then:
$$M_Y(t) = [M(t)]^n$$
Both of the last two results are important to remember and are quotable in the Subject CS1 exam.

4.2 Using MGFs to derive relationships among variables
Bernoulli/binomial
We will now derive the MGF of a $Bin(n, p)$ distribution using an alternative method to that used in the previous chapter. This method uses the fact that a $Bin(n, p)$ random variable is the sum of n independent $Bernoulli(p)$ random variables.
Let $X_i$, $i = 1, 2, \ldots, n$, be independent $Bernoulli(p)$ variables.
Then each has MGF $M(t) = q + pe^t$.
So $Y = X_1 + X_2 + \cdots + X_n$ has MGF $(q + pe^t)^n$, which is the MGF of a $Bin(n, p)$ variable.
So the $Bin(n, p)$ random variable is the sum of n independent $Bernoulli(p)$ random variables.
Each Bernoulli variable has mean p and variance pq; hence the binomial has mean np and variance npq.
Physically, the number of successes in n trials is the sum of the numbers of successes (0 or 1) at each trial.
Further, the sum of two independent binomial variables, one $Bin(n, p)$ and the other $Bin(m, p)$, is a $Bin(n + m, p)$ variable.
Question
Show that if $X \sim Bin(m, p)$ and $Y \sim Bin(n, p)$ are independent random variables, then $X + Y$ also has a binomial distribution.
Solution
The moment generating functions for X and Y are:
$$M_X(t) = (q + pe^t)^m \quad \text{and} \quad M_Y(t) = (q + pe^t)^n$$
Since X and Y are independent, we can write:
$$M_{X+Y}(t) = (q + pe^t)^m (q + pe^t)^n = (q + pe^t)^{m+n}$$
which we recognise as the MGF of a $Bin(m + n, p)$ distribution. Hence, by uniqueness of MGFs, $X + Y$ follows the binomial distribution with parameters $m + n$ and p.
This result should be obvious. If we toss a coin 10 times in the morning and count the number of heads, the number would be distributed as $Bin(10, \tfrac12)$. If we toss the coin a further 20 times in the afternoon, the number of heads will be distributed as $Bin(20, \tfrac12)$. Adding the totals together is obviously the same as the $Bin(30, \tfrac12)$ distribution that we would expect for the whole day.
The phrase 'by uniqueness of MGFs' is important here. What we are saying here is that it is not possible for two different distributions to have the same MGF. If it were, then, once we had found the MGF of the sum, it would not be possible to determine which of the two distributions having the same MGF was the one for the variable $X + Y$. Fortunately, MGFs do uniquely define the distribution to which they belong, and so we know the exact distribution of $X + Y$.
Geometric/negative binomial
We will now derive the MGF of a negative binomial distribution with parameters k and p using an alternative method to that used in the previous chapter. This method uses the fact that a negative binomial $(k, p)$ random variable is the sum of k independent geometric random variables with parameter p.
Let $X_i$, $i = 1, 2, \ldots, k$, be independent geometric$(p)$ variables.
Then each has MGF $M(t) = \dfrac{pe^t}{1 - qe^t}$.
So $Y = X_1 + X_2 + \cdots + X_k$ has MGF $\left[\dfrac{pe^t}{1 - qe^t}\right]^k$, which is the MGF of a negative binomial $(k, p)$ variable.
So the negative binomial $(k, p)$ random variable is the sum of k independent geometric$(p)$ random variables.
Each geometric variable has mean $\dfrac{1}{p}$ and variance $\dfrac{q}{p^2}$; hence the negative binomial has mean $\dfrac{k}{p}$ and variance $\dfrac{kq}{p^2}$.
Physically, the number of trials up to the kth success is the sum of the number of trials to the first success, plus the additional number to the second success, ..., plus the additional number to the kth success.
Further, the sum of two independent negative binomial variables, one $(k, p)$ and the other $(m, p)$, is a negative binomial $(k + m, p)$ variable.
This is straightforward to prove using MGFs.
Poisson
We will now find the distribution of the sum of two independent Poisson random variables using MGFs. This is an alternative method to the convolution method.
Let X and Z be independent $Poi(\lambda)$ and $Poi(\mu)$ variables.
Then X has MGF $M_X(t) = \exp\{\lambda(e^t - 1)\}$, Z has MGF $M_Z(t) = \exp\{\mu(e^t - 1)\}$.
So the sum $X + Z$ has MGF $\exp\{\lambda(e^t - 1)\}\exp\{\mu(e^t - 1)\} = \exp\{(\lambda + \mu)(e^t - 1)\}$, which is the MGF of a $Poi(\lambda + \mu)$ variable.
So the sum of independent Poisson variables is a Poisson variable.
X has mean = variance = $\lambda$, Z has mean = variance = $\mu$, and the sum has mean = variance = $\lambda + \mu$.
This is an important result to remember and is quotable in the Subject CS1 exam. It can be extended (in an obvious way) to the sum of more than two Poisson random variables.
Question
A company has three telephone lines coming into its switchboard. The first line rings on average 3.5 times per half-hour, the second rings on average 3.9 times per half-hour, and the third line rings on average 2.1 times per half-hour. Assuming that the numbers of calls are independent random variables having Poisson distributions, calculate the probability that in half an hour the switchboard will receive:
(i) at least 5 calls
(ii) exactly 7 calls.
Solution
Summing the Poisson variables, the total number of telephone calls coming in follows the Poisson distribution with mean 3.5 + 3.9 + 2.1 = 9.5.
(i) $P(X \ge 5) = 1 - P(X \le 4) = 1 - 0.04026 = 0.95974$
(ii) $P(X = 7) = P(X \le 7) - P(X \le 6) = 0.26866 - 0.16495 = 0.10371$
These figures are taken from the cumulative Poisson table on page 178 of the Tables. Alternatively, we could use the Poisson probability formula.
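The two switchboard probabilities can also be reproduced from the Poisson probability formula rather than from the Tables. The sketch below is one way of doing this using only the Python standard library. The helper name poisson_cdf is an illustrative assumption.

```python
import math

def poisson_cdf(k, mean):
    """P(N <= k) for N ~ Poisson(mean), summing the probability formula directly."""
    return sum(math.exp(-mean) * mean**j / math.factorial(j) for j in range(k + 1))

# Total calls in half an hour: sum of three independent Poisson variables
total_mean = 3.5 + 3.9 + 2.1

p_at_least_5 = 1 - poisson_cdf(4, total_mean)
p_exactly_7 = poisson_cdf(7, total_mean) - poisson_cdf(6, total_mean)

print(round(p_at_least_5, 5), round(p_exactly_7, 5))
```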
Exponential/gamma
We will now derive the MGF of the $Gamma(\alpha, \lambda)$ distribution using an alternative method to that used earlier. This method uses the fact that a $Gamma(\alpha, \lambda)$ random variable can be regarded as the sum of $\alpha$ independent $Exp(\lambda)$ random variables.
Let $X_i$, $i = 1, 2, \ldots, k$, be independent $Exp(\lambda)$ variables.
Then each has MGF $M(t) = \lambda(\lambda - t)^{-1}$.
So $Y = X_1 + X_2 + \cdots + X_k$ has MGF $\left[\lambda(\lambda - t)^{-1}\right]^k$, which is the MGF of a $Gamma(k, \lambda)$ variable.
So the $Gamma(k, \lambda)$ random variable (for k a positive integer) is the sum of k independent $Exp(\lambda)$ random variables.
Each exponential variable has mean $\dfrac{1}{\lambda}$ and variance $\dfrac{1}{\lambda^2}$; hence the $Gamma(k, \lambda)$ has mean $\dfrac{k}{\lambda}$ and variance $\dfrac{k}{\lambda^2}$.
Physically, the time to the kth event in a Poisson process with rate $\lambda$ is the sum of k individual inter-event times.
Further, the sum of two independent gamma variables, one $(\alpha, \lambda)$ and the other $(\delta, \lambda)$, is a $Gamma(\alpha + \delta, \lambda)$ variable.
This result can also be proved using MGFs.
Question
If the number of minutes it takes for a mechanic to check a tyre is a random variable having an exponential distribution with mean 5, obtain the probability that the mechanic will take:
(i) more than eight minutes to check two tyres
(ii) at least fifteen minutes to check three tyres.
Solution
(i) The sum of two independent exponential random variables with mean 5 has a gamma distribution with parameters $\alpha = 2$ and $\lambda = \tfrac15$. If we let X be the total time taken for the mechanic to check the tyres, then:
$$P(X > 8) = \int_8^\infty \frac{\left(\tfrac15\right)^2}{\Gamma(2)}\,x e^{-\frac15 x}\,dx$$
Integrating by parts, using $u = x$, we obtain:
$$P(X > 8) = \frac{1}{25}\left\{\left[-5xe^{-\frac15 x}\right]_8^\infty + 5\int_8^\infty e^{-\frac15 x}\,dx\right\} = \frac{1}{25}\left\{40e^{-\frac85} + \left[-25e^{-\frac15 x}\right]_8^\infty\right\}$$
$$= \frac{1}{25}\left\{40e^{-\frac85} + 25e^{-\frac85}\right\} = 0.525$$
Alternatively, we could use the Poisson process. If we let Y be the number of tyres checked in a time period of t minutes, then $Y \sim Poi(0.2t)$. The probability that it takes more than 8 minutes to check two tyres is equivalent to the probability that the number of tyres checked in 8 minutes is only 0 or 1. Using $Y \sim Poi(0.2 \times 8)$, the required probability is therefore:
$$P(Y = 0 \text{ or } 1) = e^{-1.6} + 1.6e^{-1.6} = 0.525$$
(ii) The sum of three independent exponential random variables with mean 5 has a gamma distribution with parameters $\alpha = 3$ and $\lambda = \tfrac15$. If we let X be the total time taken for the mechanic to check the tyres, then we require $P(X > 15)$.
We could solve this by integrating the PDF, but this would require integration by parts (twice).
An easier way to find this probability is to use the gamma-chi squared relationship proved earlier. If $X \sim Gamma(\alpha, \lambda)$ and $2\alpha$ is a positive integer, then $2\lambda X \sim \chi^2_{2\alpha}$. So:
$$P(X > 15) = P(2\lambda X > 30\lambda) = P\left(\chi^2_{2\alpha} > 30\lambda\right)$$
Substituting $\alpha = 3$ and $\lambda = \tfrac15$, and using the $\chi^2$ values given on page 166 of the Tables, we obtain:
$$P(X > 15) = P\left(\chi^2_6 > 6\right) = 1 - 0.5768 = 0.4232$$
Alternatively, we could use the Poisson distribution with mean $0.2 \times 15$ and calculate the probability of 0, 1 or 2 tyres checked within 15 minutes.
The difference in the wording in the two parts of the question ('more than' versus 'at least') is not significant here. Since we are working in continuous time, the probability that an event occurs at exactly time 8 (or time 15) is zero.
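The Poisson process alternative mentioned in this solution is easy to automate. For a gamma variable with a positive integer shape parameter k and rate λ, the probability of exceeding time t equals the probability that a Poisson variable with mean λt takes a value of at most k - 1. The sketch below applies this to both parts of the question. The function name is an illustrative assumption.

```python
import math

def gamma_tail_integer_shape(k, rate, t):
    """P(X > t) for X ~ Gamma(k, rate) with integer k, via P(Poisson(rate * t) <= k - 1)."""
    m = rate * t
    return sum(math.exp(-m) * m**j / math.factorial(j) for j in range(k))

# (i) two tyres, more than 8 minutes: shape 2, rate 1/5
p_two_tyres = gamma_tail_integer_shape(2, 0.2, 8)

# (ii) three tyres, at least 15 minutes: shape 3, rate 1/5
p_three_tyres = gamma_tail_integer_shape(3, 0.2, 15)

print(round(p_two_tyres, 3), round(p_three_tyres, 4))
```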
Chi-square
From the above result with $\lambda = \tfrac12$, it follows that the sum of a chi-square$(n)$ and an independent chi-square$(m)$ is a chi-square$(n + m)$ variable.
So the sum of independent chi-square variables is a chi-square variable.
Question
Suppose that $X_1$ and $X_2$ are independent random variables such that $X_1 \sim \chi^2_m$ and $X_2 \sim \chi^2_n$, and let $X = X_1 + X_2$. Use MGFs to prove that $X \sim \chi^2_{m+n}$.
Solution
Since $\chi^2_n = Gamma\left(\tfrac{n}{2}, \tfrac12\right)$ we have:
$$M_{X_1}(t) = (1 - 2t)^{-\frac{m}{2}} \quad \text{and} \quad M_{X_2}(t) = (1 - 2t)^{-\frac{n}{2}}$$
Since $X_1$ and $X_2$ are independent:
$$M_X(t) = E\left(e^{tX}\right) = E\left(e^{tX_1 + tX_2}\right) = E\left(e^{tX_1}e^{tX_2}\right) = E\left(e^{tX_1}\right)E\left(e^{tX_2}\right) = M_{X_1}(t)\,M_{X_2}(t)$$
So:
$$M_X(t) = (1 - 2t)^{-\frac{m}{2}}(1 - 2t)^{-\frac{n}{2}} = (1 - 2t)^{-\frac{m+n}{2}}$$
This is the MGF of the $\chi^2_{m+n}$ distribution. By the uniqueness property of MGFs, it follows that $X \sim \chi^2_{m+n}$.
This result is useful in many areas of statistics, including generalised linear models (which we will study later in the course).
Normal
Let X be a normal random variable with mean $\mu_X$ and standard deviation $\sigma_X$, and let Y be a normal random variable with mean $\mu_Y$ and standard deviation $\sigma_Y$. Let X and Y be independent, and let $Z = X + Y$.
X has MGF $M_X(t) = \exp\left(\mu_Xt + \tfrac12\sigma_X^2t^2\right)$.
Y has MGF $M_Y(t) = \exp\left(\mu_Yt + \tfrac12\sigma_Y^2t^2\right)$.
So the sum $Z = X + Y$ has MGF:
$$\exp\left(\mu_Xt + \tfrac12\sigma_X^2t^2\right)\exp\left(\mu_Yt + \tfrac12\sigma_Y^2t^2\right) = \exp\left\{(\mu_X + \mu_Y)t + \tfrac12\left(\sigma_X^2 + \sigma_Y^2\right)t^2\right\}$$
which is the MGF of a normal variable (with mean $\mu_X + \mu_Y$ and variance $\sigma_X^2 + \sigma_Y^2$).
So the sum of independent normal variables is a normal variable.
ie $$X + Y \sim N\left(\mu_X + \mu_Y,\; \sigma_X^2 + \sigma_Y^2\right)$$
Similarly, it can be shown that:
$$X - Y \sim N\left(\mu_X - \mu_Y,\; \sigma_X^2 + \sigma_Y^2\right)$$
The variance of the difference is the sum of the variances (as we saw in the general case earlier).
These are important results to remember and are quotable in the Subject CS1 exam.
Question
If X and Y are independent standard normal variables, determine the distribution of $2X - Y$.
Solution
The resulting distribution is normal. We just need to fill in the mean and variance to obtain:
$$2X - Y \sim N\left(2 \times 0 - 0,\; 2^2 \times 1 + (-1)^2 \times 1\right) = N(0, 5)$$
Chapter 4 Summary
Two discrete random variables X and Y have joint probability function (PF), $P(X = x, Y = y)$. This defines how the probability is split between the different combinations of the variables. The joint PF satisfies:
$$\sum_x\sum_y P(X = x, Y = y) = 1 \quad \text{and} \quad P(X = x, Y = y) \ge 0$$
Two continuous random variables X and Y have joint probability density function (PDF), $f_{X,Y}(x, y)$. The joint PDF satisfies:
$$\int_y\int_x f_{X,Y}(x, y)\,dx\,dy = 1 \quad \text{and} \quad f_{X,Y}(x, y) \ge 0$$
We can use the joint PDF to calculate probabilities as follows:
$$P(x_1 < X < x_2,\; y_1 < Y < y_2) = \int_{y_1}^{y_2}\int_{x_1}^{x_2} f(x, y)\,dx\,dy$$
The joint distribution function, for both discrete and continuous random variables, is given by:
$$F(x, y) = P(X \le x, Y \le y)$$
For continuous random variables $f(x, y) = \dfrac{\partial^2}{\partial x\,\partial y}F(x, y)$.
The marginal distribution, eg $P(X = x)$ or $f_X(x)$, can be calculated using:
$$P(X = x) = \sum_y P(X = x, Y = y) \quad \text{or} \quad f_X(x) = \int_y f_{X,Y}(x, y)\,dy$$
The conditional distribution, eg $P(X = x | Y = y)$ or $f_{X|Y=y}(x|y)$, can be calculated using:
$$P(X = x | Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)} \quad \text{or} \quad f_{X|Y=y}(x|y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}$$
The expectation of any function, $E[g(X, Y)]$, can be calculated using:
$$E[g(X, Y)] = \sum_x\sum_y g(x, y)\,P(X = x, Y = y) \quad \text{or} \quad \int_y\int_x g(x, y)\,f_{X,Y}(x, y)\,dx\,dy$$
The covariance, $\text{cov}(X, Y)$, can be calculated using:
$$\text{cov}(X, Y) = E\left[(X - E(X))(Y - E(Y))\right] = E(XY) - E(X)E(Y)$$
The correlation coefficient, $\text{corr}(X, Y)$ or $\rho(X, Y)$, is given by:
$$\text{corr}(X, Y) = \frac{\text{cov}(X, Y)}{\sqrt{\text{var}(X)\,\text{var}(Y)}}$$
The random variables X and Y are uncorrelated if and only if:
$$\text{corr}(X, Y) = 0 \iff \text{cov}(X, Y) = 0 \iff E(XY) = E(X)E(Y)$$
The random variables X and Y are independent if, and only if:
$$P(X = x, Y = y) = P(X = x)\,P(Y = y) \quad \text{or} \quad f_{X,Y}(x, y) = f_X(x)\,f_Y(y)$$
for all values of x and y.
Independent random variables are always uncorrelated. Uncorrelated random variables are not necessarily independent.
Expectations of sums and products can be calculated using:
$$E(X + Y) = E(X) + E(Y)$$
$$E(XY) = E(X)E(Y) + \text{cov}(X, Y)$$
$$E(XY) = E(X)E(Y) \quad \text{if } X, Y \text{ independent}$$
The above are also true for functions $g(X)$ and $h(Y)$ of the random variables.
Variances of sums can be calculated using:
$$\text{var}(X + Y) = \text{var}(X) + \text{var}(Y) + 2\,\text{cov}(X, Y)$$
$$\text{var}(X + Y) = \text{var}(X) + \text{var}(Y) \quad \text{if } X, Y \text{ independent}$$
The convolution of the marginal probability (density) functions of X and Y is the probability (density) function of $Z = X + Y$. $P(Z = z)$ or $f_Z(z)$ is given using the formulae:
$$f_Z = f_X * f_Y = \sum_x P(X = x)\,P(Y = z - x) \quad \text{or} \quad \int_x f_X(x)\,f_Y(z - x)\,dx$$
For independent random variables $X_1, \ldots, X_n$, and for any constants $c_1, c_2, \ldots, c_n$:
$$E(c_1X_1 + \cdots + c_nX_n) = c_1E(X_1) + \cdots + c_nE(X_n)$$
$$\text{var}(c_1X_1 + \cdots + c_nX_n) = c_1^2\,\text{var}(X_1) + \cdots + c_n^2\,\text{var}(X_n)$$
For independent random variables $X_1, \ldots, X_n$, if $Y = X_1 + \cdots + X_n$:
$$M_Y(t) = M_{X_1}(t) \times \cdots \times M_{X_n}(t)$$
$$M_Y(t) = [M_X(t)]^n \quad \text{if the } X_i\text{'s are also identical}$$
For independent random variables:
Bernoulli(p) + ... + Bernoulli(p) is $Bin(n, p)$
$Bin(n, p) + Bin(m, p)$ is $Bin(n + m, p)$
Geo(p) + ... + Geo(p) is $NBin(k, p)$
$NBin(k, p) + NBin(m, p)$ is $NBin(k + m, p)$
$Exp(\lambda)$ + ... + $Exp(\lambda)$ is $Gamma(\alpha, \lambda)$
$Gamma(\alpha, \lambda) + Gamma(\delta, \lambda)$ is $Gamma(\alpha + \delta, \lambda)$
$\chi^2_m + \chi^2_n$ is $\chi^2_{m+n}$
$Poi(\lambda) + Poi(\mu)$ is $Poi(\lambda + \mu)$
$N(\mu_1, \sigma_1^2) + N(\mu_2, \sigma_2^2)$ is $N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$
Some of the notation used here for the linear combinations of random variables is non-standard and is used simply to convey the results in a concise format.
Chapter 4 Practice Questions

4.1
Let X and Y have joint density function given by:
$$f(x, y) = c(x + 3y), \quad 0 < x < 2,\; 0 < y < 2$$
(i) Calculate the value of c.
(ii) Hence, calculate $P(X < 1, Y > 0.5)$.

4.2  Exam style
The continuous random variables X, Y have the bivariate PDF:
$$f(x, y) = 2, \quad x + y < 1,\; x > 0,\; y > 0$$
(i) Derive the marginal PDF of Y. [2]
(ii) Derive the conditional PDF of X given $Y = y$ using the result from part (i). [1]
[Total 3]

4.3
The continuous random variables X and Y have joint PDF:
$$f(x, y) = \tfrac16\left(x^2 + xy\right), \quad 0 < y < x < 2$$
(i) Determine the PDF of the conditional distribution $X | Y = y$.
(ii) Calculate the conditional probability $P(1 < X < 1.5 | Y = 1)$.

4.4
Show that, for the joint random variables M, N, where:
$$P(M = m, N = n) = \frac{m}{35 \times 2^{n-2}}, \quad \text{for } m = 1, 2, 3, 4 \text{ and } n = 1, 2, 3$$
the conditional probability functions for M given $N = n$ and for N given $M = m$ are equal to the corresponding marginal distributions.

4.5  Exam style
Let X and Y have joint density function:
$$f_{X,Y}(x, y) = \tfrac45\left(3x^2 + xy\right), \quad 0 < x < 1,\; 0 < y < 1$$
Determine:
(i) the marginal density function of X [2]
(ii) the conditional density function of Y given $X = x$ [1]
(iii) the covariance of X and Y. [5]
[Total 8]

4.6
Calculate the correlation coefficient of X and Y, where X and Y have the joint distribution:

              X
          0     1     2
Y   1    0.1   0.1   0
    2    0.1   0.1   0.2
    3    0.2   0.1   0.1

4.7
Claim sizes on a home insurance policy are normally distributed about a mean of 800 and with a standard deviation of 100. Claim sizes on a car insurance policy are normally distributed about a mean of 1,200 and with a standard deviation of 300. All claim sizes are assumed to be independent.
To date, there have already been home claims amounting to 800, but no car claims. Calculate the probability that after the next 4 home claims and 3 car claims the total size of car claims exceeds the total size of the home claims.

4.8  Exam style
Two discrete random variables, X and Y, have the following joint probability function:

              X
          1     2     3
Y   1    0.2   0     0.2
    2    0     0.2   0
    3    0.2   0     0.2

Determine:
(i) $E(X)$ [1]
(ii) the probability distribution of $Y | X = 1$ [1]
(iii) whether X and Y are correlated or not [2]
(iv) whether X and Y are independent or not. [1]
[Total 5]

4.9
The random variables X and Y have joint density function given by:
$$f(x, y) = kx^{-\alpha}e^{-y/\beta}, \quad 1 < x < \infty,\; 1 < y < \infty$$
where $\alpha > 1$, $\beta > 0$, and k is a constant.
Derive an expression for k in terms of $\alpha$ and $\beta$.

4.10
Show using convolutions that if X and Y are independent random variables and X has a $\chi^2_m$ distribution and Y has a $\chi^2_n$ distribution, then $X + Y$ has a $\chi^2_{m+n}$ distribution.

4.11  Exam style
Let X be a random variable with mean 3 and standard deviation 2, and let Y be a random variable with mean 4 and standard deviation 1. X and Y have a correlation coefficient of -0.3. Let $Z = X + Y$.
Calculate:
(i) $\text{cov}(X, Z)$ [2]
(ii) $\text{var}(Z)$. [2]
[Total 4]

4.12  Exam style
X has a Poisson distribution with mean 5 and Y has a Poisson distribution with mean 10. If $\text{cov}(X, Y) = -12$, calculate the variance of Z where $Z = X - 2Y + 3$. [2]

4.13
Show that if X has a negative binomial distribution with parameters k and p, and Y has a negative binomial distribution with parameters m and p, and X and Y are independent, then $X + Y$ also has a negative binomial distribution, and specify its parameters.

4.14  Exam style
For a certain company, claim sizes on car policies are normally distributed about a mean of 1,800 and with standard deviation 300, whereas claim sizes on home policies are normally distributed about a mean of 1,200 and with standard deviation 500. Claim sizes are assumed to be independent.
Calculate the probability that a car claim is at least twice the size of a home claim. [4]

Chapter 4 Solutions
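4.1
(i) Using the result $\int\int f(x, y)\,dx\,dy = 1$ gives:
$$\int_{y=0}^{2}\int_{x=0}^{2} c(x + 3y)\,dx\,dy = c\int_{y=0}^{2}\left[\tfrac12x^2 + 3xy\right]_{x=0}^{2}dy = c\int_{y=0}^{2}(2 + 6y)\,dy = c\left[2y + 3y^2\right]_{y=0}^{2} = 16c = 1$$
$$\Rightarrow c = \frac{1}{16}$$
(ii) The probability is:
$$P(X < 1, Y > 0.5) = \int_{y=0.5}^{2}\int_{x=0}^{1}\tfrac{1}{16}(x + 3y)\,dx\,dy = \frac{1}{16}\int_{y=0.5}^{2}\left[\tfrac12x^2 + 3xy\right]_{x=0}^{1}dy$$
$$= \frac{1}{16}\int_{y=0.5}^{2}\left(\tfrac12 + 3y\right)dy = \frac{1}{16}\left[\tfrac12y + \tfrac32y^2\right]_{y=0.5}^{2} = \frac{51}{128} = 0.398$$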

4.2
(i) The marginal PDF of Y is:
$$f_Y(y) = \int_0^{1-y} 2\,dx = [2x]_0^{1-y} = 2(1 - y), \quad 0 < y < 1$$ [2]
(ii) The conditional PDF of X given $Y = y$ is:
$$f_{X|Y=y}(x, y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} = \frac{2}{2(1 - y)} = \frac{1}{1 - y}, \quad 0 < x < 1 - y$$ [1]

4.3
(i) We saw in Section 1.4 of the chapter that the marginal distribution for Y is:
$$f_Y(y) = \frac16\left(\frac83 + 2y - \frac{5y^3}{6}\right), \quad 0 < y < 2$$
So the PDF for the conditional distribution $X | Y = y$ is the joint PDF divided by the marginal PDF:
$$f_{X|Y=y}(x, y) = \frac{\frac16\left(x^2 + xy\right)}{\frac16\left(\frac83 + 2y - \frac{5y^3}{6}\right)} = \frac{x^2 + xy}{\frac83 + 2y - \frac{5y^3}{6}}, \quad y < x < 2$$
(ii) So the general conditional probability is given by:
$$P(1 < X < 1.5 | Y = y) = \int_1^{1.5}\frac{x^2 + xy}{\frac83 + 2y - \frac{5y^3}{6}}\,dx = \frac{\left[\frac13x^3 + \frac12x^2y\right]_1^{1.5}}{\frac83 + 2y - \frac{5y^3}{6}} = \frac{(1.125 + 1.125y) - \left(\frac13 + \frac12y\right)}{\frac83 + 2y - \frac{5y^3}{6}}$$
Substituting in $y = 1$, we obtain:
$$P(1 < X < 1.5 | Y = 1) = \frac{(1.125 + 1.125) - \left(\frac13 + \frac12\right)}{\frac83 + 2 - \frac56} = \frac{1.4167}{3.8333} = 0.3696$$

4.4
In the chapter, we saw that the marginal probability functions for M and N are:
$$P_M(m) = \frac{m}{10} \quad \text{for } m = 1, 2, 3, 4$$
and:
$$P_N(n) = \frac{1}{7 \times 2^{n-3}} \quad \text{for } n = 1, 2, 3$$
So, dividing the joint probability function by the marginal probability function for N, we obtain:
$$P_{M|N=n}(m, n) = \frac{P_{M,N}(m, n)}{P_N(n)} = \frac{m}{35 \times 2^{n-2}} \div \frac{1}{7 \times 2^{n-3}} = \frac{m}{10}, \quad m = 1, 2, 3, 4$$
for the conditional probability function of M given $N = n$.
Similarly:
$$P_{N|M=m}(m, n) = \frac{P(N = n, M = m)}{P(M = m)} = \frac{m}{35 \times 2^{n-2}} \div \frac{m}{10} = \frac{1}{7 \times 2^{n-3}}, \quad n = 1, 2, 3$$
is the conditional probability function of N given $M = m$.
These are identical to the marginal distributions obtained in the chapter text.
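The claim in Question 4.4 can also be verified exhaustively with exact arithmetic, since M and N only take a handful of values. The sketch below builds the joint probability function $m / (35 \times 2^{n-2})$, sums it to get the marginals, and checks that each conditional probability equals the corresponding marginal. The names used are illustrative assumptions.

```python
from fractions import Fraction

M_VALUES = [1, 2, 3, 4]
N_VALUES = [1, 2, 3]

def joint(m, n):
    """P(M = m, N = n) = m / (35 * 2^(n - 2)), held as an exact fraction."""
    return Fraction(m, 35) / Fraction(2) ** (n - 2)

# Marginal probability functions, obtained by summing the joint PF
p_m = {m: sum(joint(m, n) for n in N_VALUES) for m in M_VALUES}
p_n = {n: sum(joint(m, n) for m in M_VALUES) for n in N_VALUES}

# Each conditional PF should coincide with the corresponding marginal PF
conditionals_match = all(
    joint(m, n) / p_n[n] == p_m[m] and joint(m, n) / p_m[m] == p_n[n]
    for m in M_VALUES
    for n in N_VALUES
)
print(p_m, p_n, conditionals_match)
```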

4.5
(i) Marginal density
$$f_X(x) = \int_{y=0}^{1}\tfrac45\left(3x^2 + xy\right)dy = \tfrac45\left[3x^2y + \tfrac12xy^2\right]_{y=0}^{1} = \tfrac45\left(3x^2 + \tfrac12x\right) \quad \text{for } 0 < x < 1$$ [2]
(ii) Conditional density
$$f_{Y|X=x}(x, y) = \frac{f_{X,Y}(x, y)}{f_X(x)} = \frac{\tfrac45\left(3x^2 + xy\right)}{\tfrac45\left(3x^2 + \tfrac12x\right)} = \frac{3x + y}{3x + \tfrac12} \quad \text{for } 0 < y < 1$$ [1]
(iii) Covariance
Using the marginal density function of X:
$$E(X) = \int_{x=0}^{1}\tfrac45\left(3x^3 + \tfrac12x^2\right)dx = \tfrac45\left[\tfrac34x^4 + \tfrac16x^3\right]_{x=0}^{1} = \frac{11}{15}$$ [1]
Obtaining the marginal density function of Y:
$$f_Y(y) = \int_{x=0}^{1}\tfrac45\left(3x^2 + xy\right)dx = \tfrac45\left[x^3 + \tfrac12x^2y\right]_{x=0}^{1} = \tfrac45\left(1 + \tfrac12y\right) \quad \text{for } 0 < y < 1$$
So:
$$E(Y) = \int_{y=0}^{1}\tfrac45\left(y + \tfrac12y^2\right)dy = \tfrac45\left[\tfrac12y^2 + \tfrac16y^3\right]_{y=0}^{1} = \frac{8}{15}$$ [1]
Now:
$$E(XY) = \int_{x=0}^{1}\int_{y=0}^{1}\tfrac45\left(3x^3y + x^2y^2\right)dy\,dx = \tfrac45\int_{x=0}^{1}\left[\tfrac32x^3y^2 + \tfrac13x^2y^3\right]_{y=0}^{1}dx$$
$$= \tfrac45\int_{x=0}^{1}\left(\tfrac32x^3 + \tfrac13x^2\right)dx = \tfrac45\left[\tfrac38x^4 + \tfrac19x^3\right]_{x=0}^{1} = \frac{7}{18}$$ [2]
Hence:
$$\text{cov}(X, Y) = \frac{7}{18} - \frac{11}{15} \times \frac{8}{15} = -\frac{1}{450}$$ [1]

4.6
The covariance of X and Y was obtained in Section 2.4 to be $\text{cov}(X, Y) = 0.02$. The variances of the marginal distributions are:
$$\text{var}(X) = E(X^2) - [E(X)]^2 = 0^2 \times 0.4 + 1^2 \times 0.3 + 2^2 \times 0.3 - (0.9)^2 = 0.69$$
and:
$$\text{var}(Y) = E(Y^2) - [E(Y)]^2 = 1^2 \times 0.2 + 2^2 \times 0.4 + 3^2 \times 0.4 - (2.2)^2 = 0.56$$
So the correlation coefficient is:
$$\text{corr}(X, Y) = \frac{\text{cov}(X, Y)}{\sqrt{\text{var}(X)\,\text{var}(Y)}} = \frac{0.02}{\sqrt{0.69 \times 0.56}} = 0.0322$$

4.7
Let X be the amount of a home insurance claim and Y the amount of a car insurance claim. Then:
$$X \sim N(800, 100^2) \quad \text{and} \quad Y \sim N(1200, 300^2)$$
We require:
$$P\left[(Y_1 + Y_2 + Y_3) > (X_1 + X_2 + X_3 + X_4) + 800\right] = P\left[(Y_1 + Y_2 + Y_3) - (X_1 + X_2 + X_3 + X_4) > 800\right]$$
So we need the distribution of $(Y_1 + Y_2 + Y_3) - (X_1 + X_2 + X_3 + X_4)$:
$$(Y_1 + Y_2 + Y_3) - (X_1 + X_2 + X_3 + X_4) \sim N\left(3 \times 1200 - 4 \times 800,\; 3 \times 300^2 + 4 \times 100^2\right)$$
ie $$(Y_1 + Y_2 + Y_3) - (X_1 + X_2 + X_3 + X_4) \sim N(400, 310000)$$
Therefore:
$$P\left[(Y_1 + Y_2 + Y_3) - (X_1 + X_2 + X_3 + X_4) > 800\right] = P\left(Z > \frac{800 - 400}{\sqrt{310{,}000}}\right) = P(Z > 0.718) = 1 - P(Z < 0.718) = 0.236$$

4.8
(i) Mean
$$E(X) = 1 \times 0.4 + 2 \times 0.2 + 3 \times 0.4 = 2$$ [1]
Alternatively, we could use the fact that the distribution of X is symmetrical about 2.
(ii) Probability distribution of $Y | X = 1$
Using $P(Y = y | X = 1) = \dfrac{P(X = 1, Y = y)}{P(X = 1)}$ and $P(X = 1) = 0.4$ gives:

Y = 1 | X = 1     Y = 2 | X = 1     Y = 3 | X = 1
     0.5               0                 0.5
[1]
(iii) Correlated?
To calculate the correlation coefficient, we first require the covariance.
$E(X) = 2$ from part (i)
$$E(Y) = 1 \times 0.4 + 2 \times 0.2 + 3 \times 0.4 = 2$$
$$E(XY) = 1 \times 0.2 + 2 \times 0 + 3 \times 0.2 + 2 \times 0 + 4 \times 0.2 + 6 \times 0 + 3 \times 0.2 + 6 \times 0 + 9 \times 0.2 = 4$$
So $\text{cov}(X, Y) = E(XY) - E(X)E(Y) = 4 - 2 \times 2 = 0$. [1]
Hence $\text{corr}(X, Y) = \dfrac{\text{cov}(X, Y)}{\sqrt{\text{var}(X)\,\text{var}(Y)}} = 0$.
Therefore X and Y are uncorrelated. [1]
(iv) Independent?
X and Y are independent if $P(X = x, Y = y) = P(X = x)P(Y = y)$ for all x and y.
However $P(X = 1, Y = 1) = 0.2 \neq 0.4 \times 0.4 = P(X = 1)P(Y = 1)$.
So X and Y are not independent. [1]
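Question 4.8 is an example of a pair of variables that are uncorrelated but not independent, and both properties can be checked mechanically from the joint table. The sketch below is one such check. The dictionary layout and helper names are illustrative assumptions.

```python
# Joint probability function from Question 4.8, keyed by (x, y)
joint = {
    (1, 1): 0.2, (2, 1): 0.0, (3, 1): 0.2,
    (1, 2): 0.0, (2, 2): 0.2, (3, 2): 0.0,
    (1, 3): 0.2, (2, 3): 0.0, (3, 3): 0.2,
}

def expect(g):
    """E[g(X, Y)] computed from the joint probability function."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

def marginal_x(x0):
    return sum(p for (x, _), p in joint.items() if x == x0)

def marginal_y(y0):
    return sum(p for (_, y), p in joint.items() if y == y0)

# Covariance: E(XY) - E(X)E(Y)
cov_xy = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

# Independence requires the joint PF to factorise for every (x, y) pair
independent = all(
    abs(p - marginal_x(x) * marginal_y(y)) < 1e-12 for (x, y), p in joint.items()
)
print(cov_xy, independent)
```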
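
4.9
Since the PDF must integrate to 1:
$$\int_{y=1}^{\infty}\int_{x=1}^{\infty} kx^{-\alpha}e^{-y/\beta}\,dx\,dy = 1$$
Integrating over the x values gives:
$$\int_{x=1}^{\infty} kx^{-\alpha}e^{-y/\beta}\,dx = ke^{-y/\beta}\left[\frac{x^{-\alpha+1}}{-\alpha+1}\right]_1^{\infty} = \frac{ke^{-y/\beta}}{\alpha - 1}$$
Integrating this over the y values gives:
$$\int_{y=1}^{\infty}\frac{ke^{-y/\beta}}{\alpha - 1}\,dy = \frac{k}{\alpha - 1}\left[-\beta e^{-y/\beta}\right]_1^{\infty} = \frac{k\beta e^{-1/\beta}}{\alpha - 1}$$
Equating this to 1:
$$\frac{k\beta e^{-1/\beta}}{\alpha - 1} = 1 \;\Rightarrow\; k = \frac{(\alpha - 1)e^{1/\beta}}{\beta}$$

4.10
The chi-square distribution is a continuous distribution that can take any positive value. The chi-square distribution with parameter m is the same as a gamma distribution with parameters $\tfrac12m$ and $\tfrac12$.
So, using the PDF of the gamma distribution, the PDF of the sum $Z = X + Y$ is given by the convolution formula:
$$f_Z(z) = \int_0^z f_X(x)\,f_Y(z - x)\,dx$$
$$= \int_0^z \frac{\left(\tfrac12\right)^{\frac12m}}{\Gamma\left(\tfrac12m\right)}x^{\frac12m - 1}e^{-\frac12x}\cdot\frac{\left(\tfrac12\right)^{\frac12n}}{\Gamma\left(\tfrac12n\right)}(z - x)^{\frac12n - 1}e^{-\frac12(z - x)}\,dx$$
$$= \frac{\left(\tfrac12\right)^{\frac12(m+n)}}{\Gamma\left(\tfrac12m\right)\Gamma\left(\tfrac12n\right)}e^{-\frac12z}\int_0^z x^{\frac12m - 1}(z - x)^{\frac12n - 1}\,dx$$
Using the substitution $t = x/z$ gives:
$$f_Z(z) = \frac{\left(\tfrac12\right)^{\frac12(m+n)}}{\Gamma\left(\tfrac12m\right)\Gamma\left(\tfrac12n\right)}e^{-\frac12z}\int_0^1 (zt)^{\frac12m - 1}(z - zt)^{\frac12n - 1}z\,dt$$
$$= \frac{\left(\tfrac12\right)^{\frac12(m+n)}}{\Gamma\left(\tfrac12(m+n)\right)}z^{\frac12(m+n) - 1}e^{-\frac12z}\int_0^1\frac{\Gamma\left(\tfrac12(m+n)\right)}{\Gamma\left(\tfrac12m\right)\Gamma\left(\tfrac12n\right)}t^{\frac12m - 1}(1 - t)^{\frac12n - 1}\,dt$$
Since the last integral represents the total probability for the $Beta\left(\tfrac12m, \tfrac12n\right)$ distribution, we get:
$$f_Z(z) = \frac{\left(\tfrac12\right)^{\frac12(m+n)}}{\Gamma\left(\tfrac12(m+n)\right)}z^{\frac12(m+n) - 1}e^{-\frac12z} \times P\left[0 < Beta\left(\tfrac12m, \tfrac12n\right) < 1\right] = \frac{\left(\tfrac12\right)^{\frac12(m+n)}}{\Gamma\left(\tfrac12(m+n)\right)}z^{\frac12(m+n) - 1}e^{-\frac12z}$$
Since this matches the PDF of the $\chi^2_{m+n}$ distribution (and Z can take any positive value), Z is a $\chi^2_{m+n}$ random variable.
It is much easier to prove this result using MGFs.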

4.11
(i) Covariance
We have:
$$\text{cov}(X, Z) = \text{cov}(X, X + Y) = \text{cov}(X, X) + \text{cov}(X, Y) = \text{var}(X) + \text{cov}(X, Y)$$
Using the correlation coefficient between X and Y gives:
$$\text{corr}(X, Y) = -0.3 = \frac{\text{cov}(X, Y)}{\sqrt{\text{var}(X)\,\text{var}(Y)}} = \frac{\text{cov}(X, Y)}{\sqrt{4 \times 1}} \;\Rightarrow\; \text{cov}(X, Y) = -0.6$$
Hence:
$$\text{cov}(X, Z) = 4 - 0.6 = 3.4$$ [2]
(ii) Variance
Using $\text{var}(Z) = \text{cov}(Z, Z)$:
$$\text{var}(Z) = \text{cov}(X + Y, X + Y) = \text{cov}(X, X) + 2\,\text{cov}(X, Y) + \text{cov}(Y, Y)$$
$$= \text{var}(X) + 2\,\text{cov}(X, Y) + \text{var}(Y) = 4 + (2 \times -0.6) + 1 = 3.8$$ [2]
Note: $\text{var}(Z) \neq \text{var}(X) + \text{var}(Y)$ as X and Y are not independent.

4.12
The +3 term does not affect the variance, so:
$$\text{var}(Z) = \text{var}(X - 2Y + 3) = \text{var}(X - 2Y)$$
Now:
$$\text{var}(X + Y) = \text{var}(X) + \text{var}(Y) + 2\,\text{cov}(X, Y)$$
and:
$$\text{cov}(aX, bY) = ab\,\text{cov}(X, Y)$$
So:
$$\text{var}(X - 2Y) = \text{var}(X) + 4\,\text{var}(Y) - 2 \times 2\,\text{cov}(X, Y)$$ [1]
$$= 5 + 4 \times 10 - 4 \times (-12) = 93$$ [1]

4.13
The moment generating function of X is:
$$M_X(t) = \left(\frac{pe^t}{1 - qe^t}\right)^k$$
Similarly, the MGF of Y is:
$$M_Y(t) = \left(\frac{pe^t}{1 - qe^t}\right)^m$$
Since X and Y are independent, we have:
$$M_{X+Y}(t) = M_X(t)\,M_Y(t) = \left(\frac{pe^t}{1 - qe^t}\right)^k\left(\frac{pe^t}{1 - qe^t}\right)^m = \left(\frac{pe^t}{1 - qe^t}\right)^{k+m}$$
This is the MGF of another negative binomial distribution with parameters p and $k + m$. Hence, by uniqueness of MGFs, $X + Y$ has this distribution.

4.14
Let X be the claim size on car policies, so that $X \sim N(1800, 300^2)$.
Let Y be the claim size on home policies, so that $Y \sim N(1200, 500^2)$.
We want:
$$P(X > 2Y) = P(X - 2Y > 0)$$ [1]
So we need the distribution of $X - 2Y$:
$$X - 2Y \sim N\left(1800 - 2 \times 1200,\; 300^2 + 4 \times 500^2\right)$$
$$X - 2Y \sim N(-600, 1090000)$$ [2]
Standardising:
$$z = \frac{0 - (-600)}{\sqrt{1{,}090{,}000}} = 0.575$$
So:
$$P(X - 2Y > 0) = P(Z > 0.575) = 1 - P(Z < 0.575) = 1 - 0.71735 = 0.283$$ [1]
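The final probability in Solution 4.14 can be reproduced without normal tables. The sketch below uses `statistics.NormalDist` from the Python standard library to build the distribution of $X - 2Y$ from the stated means and standard deviations. The variable names are illustrative assumptions.

```python
import math
from statistics import NormalDist

# X ~ N(1800, 300^2) for car claims, Y ~ N(1200, 500^2) for home claims, independent
mean_diff = 1800 - 2 * 1200          # E[X - 2Y]
var_diff = 300**2 + 4 * 500**2       # var(X - 2Y) = var(X) + 4 var(Y)

diff = NormalDist(mu=mean_diff, sigma=math.sqrt(var_diff))

# P(X > 2Y) = P(X - 2Y > 0)
p_car_at_least_twice_home = 1 - diff.cdf(0)
print(round(p_car_at_least_twice_home, 3))
```

The same pattern (work out the mean and variance of the linear combination, then evaluate a normal probability) applies to Solution 4.7 as well.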
Conditional
expectation
Syllabus objectives
1.3
Expectations,
1.3.1
conditional
Definethe conditional expectation of one random variable given the value of
another random
1.3.2
The Actuarial
Education
expectations
variable, and calculate such a quantity.
Show how the meanand variance of arandom variable can be obtained from
expected values of conditional expected values, and apply this.
Company
IFE: 2022 Examination
Page 2
0
CS1-05: Conditional
expectation
Introduction
In this short chapter, we will return to the conditional PDF, $f_{Y|X=x}(x, y)$, that we met in Chapter 4. We will explain how to determine the conditional expectation, $E(Y|X=x)$, and the conditional variance, $\mathrm{var}(Y|X=x)$.

We will then see how we can obtain the unconditional values $E(Y)$ and $\mathrm{var}(Y)$ from the conditional mean and variance.

We will use conditional expectation in a later chapter when we define the regression line $E[Y|x] = \alpha + \beta x$. The idea will also feature in other actuarial subjects. In particular, in Subject CS2 we will introduce the idea of a compound random variable, which is the sum of a random number of random variables. Compound random variables can be used to model total claim amounts. We will need the results from this chapter to derive some of the standard formulae for compound random variables.
1 The conditional expectation E[Y|X=x]
Definition: The conditional expectation of $Y$ given $X = x$ is the mean of the conditional distribution of $Y$ given $X = x$.

This mean is denoted $E[Y|X=x]$, or just $E[Y|x]$.

If $X$ and $Y$ are discrete random variables, this is:

$$E[Y|X=x] = \sum_y y\,P[Y=y|X=x] = \sum_y y\,\frac{P[Y=y, X=x]}{P[X=x]}$$

Question
Write down the equivalent expression in the case when $X$ and $Y$ are continuous random variables.

Solution
$$E[Y|X=x] = \int_y y\,f(y|x)\,dy = \int_y y\,\frac{f(x,y)}{f(x)}\,dy$$

We can calculate numerical values for conditional expectations.
Question
Two random variables $X$ and $Y$ have the following discrete joint distribution:

           Y = 10   Y = 20   Y = 30
  X = 1     0.2      0.2      0.1
  X = 2     0.2      0.3      0

Calculate $E(Y|X=1)$.
Solution
$$E(Y|X=1) = \sum_y y\,P(Y=y|X=1)$$

$$= 10\,P(Y=10|X=1) + 20\,P(Y=20|X=1) + 30\,P(Y=30|X=1)$$

$$= 10 \times \frac{0.2}{0.5} + 20 \times \frac{0.2}{0.5} + 30 \times \frac{0.1}{0.5}$$

$$= 10 \times 0.4 + 20 \times 0.4 + 30 \times 0.2 = 18$$
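The discrete formula can also be applied mechanically. The sketch below is illustrative only: it stores the joint distribution from the question in a dictionary (a representation chosen here, not one used in the notes) and divides the joint probabilities by the marginal $P(X=x)$.

```python
# Joint probabilities P(X = x, Y = y) from the table in the question.
joint = {
    (1, 10): 0.2, (1, 20): 0.2, (1, 30): 0.1,
    (2, 10): 0.2, (2, 20): 0.3, (2, 30): 0.0,
}

def cond_mean_y_given_x(joint, x):
    """Return E[Y | X = x] for a discrete joint distribution."""
    # Marginal probability P(X = x): sum the row of the table.
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)
    # Sum of y * P(X = x, Y = y), divided by P(X = x).
    return sum(y * p for (xi, y), p in joint.items() if xi == x) / p_x

print(cond_mean_y_given_x(joint, 1))  # should agree with the value 18 found above
```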
We can also calculate conditional expectations for continuous joint random variables.
Question
Suppose $X$ and $Y$ have joint density function given by:

$$f(x,y) = \frac{3}{5}x(x+y), \qquad 0 < x < 1,\ 0 < y < 2$$

Determine the conditional expectation $E[Y|X=x]$.
Solution
Using $E[Y|X=x] = \int_y y\,\frac{f(x,y)}{f(x)}\,dy$ and recalling from Chapter 4 that $f(x) = \int_y f(x,y)\,dy$:

$$f(x) = \int_{y=0}^{2} \frac{3}{5}x(x+y)\,dy = \frac{3}{5}\left[x^2 y + \tfrac{1}{2}xy^2\right]_{y=0}^{2} = \frac{3}{5}\left(2x^2 + 2x\right) = \frac{6}{5}x(x+1)$$

Hence:

$$E[Y|X=x] = \int_{y=0}^{2} y\,\frac{\frac{3}{5}x(x+y)}{\frac{6}{5}x(x+1)}\,dy = \int_{y=0}^{2} \frac{y(x+y)}{2(x+1)}\,dy$$

$$= \frac{1}{2(x+1)}\left[\tfrac{1}{2}xy^2 + \tfrac{1}{3}y^3\right]_{y=0}^{2} = \frac{2x + \frac{8}{3}}{2(x+1)} = \frac{x + \frac{4}{3}}{x+1} = \frac{3x+4}{3(x+1)}$$
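As a sanity check on the algebra, the conditional mean can be approximated by numerical integration and compared with the closed form $\frac{3x+4}{3(x+1)}$. This is a rough sketch using a simple midpoint rule; the function names and the number of steps are arbitrary choices made for illustration.

```python
def joint_pdf(x, y):
    """f(x, y) = (3/5) x (x + y) on 0 < x < 1, 0 < y < 2."""
    return 0.6 * x * (x + y)

def cond_mean_numeric(x, steps=20_000):
    """Approximate E[Y | X = x] with the midpoint rule over 0 < y < 2."""
    h = 2.0 / steps
    num = den = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        f = joint_pdf(x, y)
        num += y * f * h      # integral of y f(x, y) dy
        den += f * h          # integral of f(x, y) dy, ie the marginal f(x)
    return num / den

def cond_mean_formula(x):
    """Closed form derived in the solution above."""
    return (3 * x + 4) / (3 * (x + 1))

for x in (0.25, 0.5, 0.9):
    print(x, cond_mean_numeric(x), cond_mean_formula(x))
```

The two columns should agree to several decimal places for any $x$ in $(0,1)$.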
We can also calculate conditional expectations in the case where the limits for one variable depend on the other.
Question
Let $X$ and $Y$ have joint density function given by:

$$f_{X,Y}(x,y) = \frac{1}{6}x(x+y), \qquad 0 < y < x < 2$$

Determine the conditional expectation $E[Y|X=x]$.

Solution
We saw in Section 1.4 of Chapter 4 that the PDF of $Y|X=x$ is:

$$f_{Y|X=x}(x,y) = \frac{2}{3}\left(\frac{1}{x} + \frac{y}{x^2}\right), \qquad 0 < y < x < 2$$

So the conditional expectation is:

$$E(Y|X=x) = \int_0^x y \times \frac{2}{3}\left(\frac{1}{x} + \frac{y}{x^2}\right)dy = \int_0^x \left(\frac{2y}{3x} + \frac{2y^2}{3x^2}\right)dy$$

$$= \left[\frac{y^2}{3x} + \frac{2y^3}{9x^2}\right]_0^x = \frac{x^2}{3x} + \frac{2x^3}{9x^2} = \frac{5}{9}x, \qquad 0 < x < 2$$
2 The random variable E[Y|X]
The conditional expectation $E[Y|X=x] = g(x)$, say, is, in general, a function of $x$. It can be thought of as the observed value of a random variable $g(X)$. The random variable $g(X)$ is denoted $E[Y|X]$.

We saw in a previous question that $E[Y|X=x] = \frac{3x+4}{3(x+1)}$. This is a function of $x$.

So $E[Y|X] = \frac{3X+4}{3(X+1)}$, and this is a function of the random variable $X$.

Note: $E[Y|X]$ is also referred to as the regression of $Y$ on $X$.

In a later chapter the regression line will be defined as $E[Y|x] = \alpha + \beta x$.

$E[Y|X]$, like any other function of $X$, has its own distribution, whose properties depend on those of the distribution of $X$ itself. Of particular importance is the expected value (the mean) of the distribution of $E[Y|X]$. The usefulness of considering this expected value, $E[E[Y|X]]$, comes from the following result, proved here in the case of continuous variables, but true in general.

Theorem: $E[E[Y|X]] = E[Y]$
Proof:

$$E[E[Y|X]] = \int E[Y|x]\,f_X(x)\,dx = \int\left(\int y\,f(y|x)\,dy\right)f_X(x)\,dx = \iint y\,f(x,y)\,dx\,dy = E[Y]$$

We are integrating here over all possible values of $x$ and $y$.

Here $f(y|x)$ represents the density function of the conditional distribution of $Y|X=x$. This was written as $f_{Y|X}(x,y)$ in Chapter 4.

The last two steps follow by noting that $f(y|x) = \frac{f(x,y)}{f_X(x)}$ and $\int f(x,y)\,dx = f_Y(y)$, the marginal PDF of $Y$.

This formula is given on page 16 of the Tables.
Question
(i) Calculate $E[Y]$ from first principles given that the joint density function of $X$ and $Y$ is:

$$f(x,y) = \frac{3}{5}x(x+y), \qquad 0 < x < 1,\ 0 < y < 2$$

(ii) Given that $E[Y|X=x] = \frac{3x+4}{3(x+1)}$, calculate $E[E(Y|X)]$.

(iii) Hence, confirm that $E[Y] = E[E(Y|X)]$ for this distribution.
Solution
(i) $E(Y) = \int_y y\,f(y)\,dy$, and $f(y) = \int_x f(x,y)\,dx$.

$$f(y) = \int_{x=0}^{1} \frac{3}{5}x(x+y)\,dx = \frac{3}{5}\left[\tfrac{1}{3}x^3 + \tfrac{1}{2}x^2 y\right]_{x=0}^{1} = \frac{1}{5} + \frac{3}{10}y$$

$$E(Y) = \int_{y=0}^{2}\left(\frac{1}{5}y + \frac{3}{10}y^2\right)dy = \left[\frac{1}{10}y^2 + \frac{1}{10}y^3\right]_{y=0}^{2} = \frac{6}{5} = 1.2$$

(ii) $$E[E(Y|X)] = E\left[\frac{3X+4}{3(X+1)}\right] = \int_0^1 \frac{3x+4}{3(x+1)}\,f(x)\,dx$$

As we saw in an earlier question, $f(x) = \frac{6}{5}x(x+1)$. So:

$$E[E(Y|X)] = \int_0^1 \frac{3x+4}{3(x+1)} \times \frac{6}{5}x(x+1)\,dx = \frac{2}{5}\int_0^1 \left(3x^2 + 4x\right)dx = \frac{2}{5}\left[x^3 + 2x^2\right]_0^1 = \frac{6}{5} = 1.2$$

(iii) Comparing the answers in parts (i) and (ii), we can see that $E[Y] = E[E(Y|X)]$.
We can also deal with situations where a random variable depends on the value of a parameter, which can itself be treated as a random quantity.

For example, consider a portfolio of motor policies. Claim amounts arising in the portfolio might have a gamma distribution with parameters $\alpha$ and $\lambda$. However, different policyholders might have different values of $\alpha$. If this is true, we can represent the variability of $\alpha$ over the portfolio by giving it its own probability distribution. So we might decide that $\alpha$ could be treated as having an exponential distribution over the whole portfolio. We can then deduce the mean and variance of a randomly chosen claim amount.
Question
The random variable $K$ has an $Exp(\lambda)$ distribution. For a given value of $K$, the random variable $X$ has a $Poisson(K)$ distribution.

(i) Obtain an expression for $E[X|K]$.

(ii) Hence, calculate $E[X]$.

Solution
(i) If $K = k$, then $X$ has a $Poisson(k)$ distribution, which has mean $k$. So $E[X|K=k] = k$, and this can be written as $E[X|K] = K$.

(ii) $E[X] = E[E[X|K]] = E[K] = \frac{1}{\lambda}$.
3 The random variable var[Y|X] and the 'E[V] + var[E]' result
The variance of the conditional distribution of $Y$ given $X = x$ is denoted $\mathrm{var}[Y|x]$, where:

$$\mathrm{var}[Y|x] = E\left[\{Y - E[Y|x]\}^2 \,\middle|\, x\right] = E[Y^2|x] - \left(E[Y|x]\right)^2$$

$\mathrm{var}[Y|x]$ is the observed value of a random variable $\mathrm{var}[Y|X]$ where:

$$\mathrm{var}[Y|X] = E[Y^2|X] - \left(E[Y|X]\right)^2 = E[Y^2|X] - \{g(X)\}^2$$

Hence $E[\mathrm{var}[Y|X]] = E[E[Y^2|X]] - E[\{g(X)\}^2] = E[Y^2] - E[\{g(X)\}^2]$ and so:

$$E[Y^2] = E[\mathrm{var}[Y|X]] + E[\{g(X)\}^2]$$

So the variance of $Y$, $\mathrm{var}[Y] = E[Y^2] - (E[Y])^2$, is given by:

$$E[\mathrm{var}[Y|X]] + E[\{g(X)\}^2] - \left(E[g(X)]\right)^2 = E[\mathrm{var}[Y|X]] + \mathrm{var}[g(X)]$$

ie $\mathrm{var}[Y] = E[\mathrm{var}[Y|X]] + \mathrm{var}[E[Y|X]]$.

This formula is given on page 16 of the Tables.
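To see the decomposition at work on concrete numbers, the sketch below evaluates both sides of $\mathrm{var}[Y] = E[\mathrm{var}[Y|X]] + \mathrm{var}[E[Y|X]]$ for the discrete joint distribution used in the Section 1 question (rows $X = 1, 2$, columns $Y = 10, 20, 30$). The dictionary layout and helper names are illustrative assumptions, not part of the Core Reading.

```python
# Joint probabilities P(X = x, Y = y) from the Section 1 question.
joint = {
    (1, 10): 0.2, (1, 20): 0.2, (1, 30): 0.1,
    (2, 10): 0.2, (2, 20): 0.3, (2, 30): 0.0,
}
xs = sorted({x for x, _ in joint})
p_x = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in xs}

def cond_moment(x, k):
    """E[Y^k | X = x]."""
    return sum(y ** k * p for (xi, y), p in joint.items() if xi == x) / p_x[x]

cond_mean = {x: cond_moment(x, 1) for x in xs}
cond_var = {x: cond_moment(x, 2) - cond_mean[x] ** 2 for x in xs}

# Right-hand side: E[var(Y|X)] + var[E(Y|X)], averaging over the distribution of X.
e_of_var = sum(p_x[x] * cond_var[x] for x in xs)
e_of_mean = sum(p_x[x] * cond_mean[x] for x in xs)
var_of_mean = sum(p_x[x] * cond_mean[x] ** 2 for x in xs) - e_of_mean ** 2

# Left-hand side: var(Y) computed directly from the marginal distribution of Y.
mean_y = sum(y * p for (_, y), p in joint.items())
var_y = sum(y ** 2 * p for (_, y), p in joint.items()) - mean_y ** 2

print(var_y, e_of_var + var_of_mean)
```

Both printed values should coincide (up to floating-point rounding), and `e_of_mean` should equal `mean_y`, illustrating $E[Y] = E[E[Y|X]]$ as well.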
Question
Evaluate $\mathrm{var}[Y|X=1]$ given the joint distribution:

           Y = 10   Y = 20   Y = 30
  X = 1     0.2      0.2      0.1
  X = 2     0.2      0.3      0

Solution
$$\mathrm{var}[Y|X=1] = E(Y^2|X=1) - E^2(Y|X=1)$$

In Section 1, we saw that $E(Y|X=1) = 18$. Similarly:

$$E(Y^2|X=1) = \sum_y y^2\,P(Y=y|X=1)$$

$$= 10^2\,P(Y=10|X=1) + 20^2\,P(Y=20|X=1) + 30^2\,P(Y=30|X=1)$$

$$= 100 \times \frac{0.2}{0.5} + 400 \times \frac{0.2}{0.5} + 900 \times \frac{0.1}{0.5} = 380$$

So:

$$\mathrm{var}[Y|X=1] = 380 - 18^2 = 56$$
Question
The random variable $K$ has an $Exp(\lambda)$ distribution. For a given value of $K$, the random variable $X$ has a $Poisson(K)$ distribution.

Obtain an expression for $\mathrm{var}[X|K]$. Hence derive an expression for $\mathrm{var}(X)$.

Solution
If $K = k$, $X$ has a $Poisson(k)$ distribution, which has variance $k$.

So $\mathrm{var}[X|K=k] = k$ and hence $\mathrm{var}[X|K] = K$.

Using the result given in this section, we have:

$$\mathrm{var}[X] = E[\mathrm{var}(X|K)] + \mathrm{var}[E(X|K)] = E[K] + \mathrm{var}[K]$$

Since $K \sim Exp(\lambda)$, it follows that:

$$\mathrm{var}[X] = E[K] + \mathrm{var}[K] = \frac{1}{\lambda} + \frac{1}{\lambda^2} = \frac{\lambda + 1}{\lambda^2}$$
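The results $E[X] = 1/\lambda$ and $\mathrm{var}[X] = 1/\lambda + 1/\lambda^2$ for this mixture can be checked by simulation. The following is a rough sketch using only the Python standard library; the value $\lambda = 2$, the seed and the sample size are arbitrary choices, and the Poisson sampler (Knuth's multiplication method) is a technique brought in here for illustration rather than anything from the Core Reading.

```python
import math
import random

def poisson_sample(rng, mu):
    """Draw one Poisson(mu) value using Knuth's multiplication method."""
    limit = math.exp(-mu)
    n, prod = 0, rng.random()
    while prod > limit:
        n += 1
        prod *= rng.random()
    return n

def simulate(lam, size, seed=2022):
    """Simulate X where K ~ Exp(lam) and X | K ~ Poisson(K)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(size):
        k = rng.expovariate(lam)            # K has mean 1/lam
        draws.append(poisson_sample(rng, k))
    return draws

lam = 2.0
xs = simulate(lam, 200_000)
sample_mean = sum(xs) / len(xs)
sample_var = sum((x - sample_mean) ** 2 for x in xs) / len(xs)

print("mean    ", sample_mean, "theory", 1 / lam)
print("variance", sample_var, "theory", 1 / lam + 1 / lam ** 2)
```

With a sample of this size the simulated moments should lie close to the theoretical values, though they will not match exactly.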
Chapter 5 Summary
$E(Y|X)$ is the mean of the conditional distribution of $Y$ given $X$ (which was defined in Chapter 4). The formulae for the conditional mean are:

$$E[Y|X=x] = \sum_y y\,P[Y=y|X=x] = \sum_y y\,\frac{P[Y=y, X=x]}{P[X=x]} \qquad \text{(discrete case)}$$

$$E[Y|X=x] = \int_y y\,f(y|x)\,dy = \int_y y\,\frac{f(x,y)}{f(x)}\,dy \qquad \text{(continuous case)}$$

$\mathrm{var}(Y|X)$ is the variance of the conditional distribution of $Y$ given $X$. It is given by:

$$\mathrm{var}(Y|X) = E(Y^2|X) - \left[E(Y|X)\right]^2$$

The unconditional mean and variance can be found from the conditional mean and variance using the formulae:

$$E[Y] = E[E(Y|X)]$$

$$\mathrm{var}[Y] = E[\mathrm{var}(Y|X)] + \mathrm{var}[E(Y|X)]$$
Chapter 5 Practice Questions
5.1
Calculate $E(X|Y=10)$ given the joint distribution:

           Y = 10   Y = 20   Y = 30
  X = 1     0.2      0.2      0.1
  X = 2     0.2      0.3      0

5.2 (Exam style)
The random variable $V$ follows the Poisson distribution with mean 5. For a given value of $V$, the random variable $U$ is distributed as follows:

$$U|(V=v) \sim U(0, v)$$

Determine the unconditional mean and variance of $U$. [4]

5.3 (Exam style)
Suppose that $X$ and $Y$ are continuous random variables.

(i) Prove from first principles that:

$$E(Y) = E[E(Y|X)]$$ [3]

(ii) Suppose that the random variable $X$ follows the gamma distribution with parameters $\alpha = 3$ and $\lambda = 2$. $Y$ is a related variable with conditional mean and variance of:

$$E(Y|X=x) = 3x + 1 \qquad \mathrm{var}(Y|X=x) = 2x^2 + 5$$

Calculate the unconditional mean and standard deviation of $Y$. [5]
[Total 8]

5.4 (Exam style)
Suppose that $X$ is a standard normal random variable, and the conditional distribution of a Poisson random variable $Y$, given the value of $X = x$, has expectation $x^2 + 1$.

Determine $E(Y)$ and $\mathrm{var}(Y)$. [5]

5.5
The table below shows the bivariate probability distribution for two discrete random variables $X$ and $Y$:

           X = 0   X = 1   X = 2
  Y = 1    0.15    0.20    0.25
  Y = 2    0.05    0.15    0.20

Calculate the value of $E(X|Y=2)$.

5.6 (Exam style)
Two discrete random variables, $X$ and $Y$, have the following joint probability function:

           X = 1   X = 2   X = 3   X = 4
  Y = 1    0.2     0       0.05    0.15
  Y = 2    0       0.3     0.1     0.2

(i) Determine $\mathrm{var}(X|Y=2)$. [3]

Let $U$ and $V$ have joint density function:

$$f_{U,V}(u,v) = 6\left(2uv - u^2\right), \qquad 0 < u < v < 1$$

(ii) Determine $E(U|V=v)$. [3]
[Total 6]
Chapter 5 Solutions
5.1
$$E(X|Y=10) = \sum_x x\,P(X=x|Y=10)$$

$$= 1 \times P(X=1|Y=10) + 2 \times P(X=2|Y=10)$$

$$= 1 \times \frac{0.2}{0.4} + 2 \times \frac{0.2}{0.4} = 1 \times 0.5 + 2 \times 0.5 = 1.5$$

Alternatively, we can see this directly by noting that if we know that $Y = 10$, then $X$ is equally likely to be 1 or 2. Since this is a symmetrical distribution, the conditional mean is 1.5.
5.2
We are given in the question that:

$$V \sim Poi(5) \qquad U|V=v \sim U(0, v)$$

So:

$$E(V) = \mathrm{var}(V) = 5$$ [½]

and:

$$E(U|V) = \tfrac{1}{2}V \qquad \mathrm{var}(U|V) = \tfrac{1}{12}V^2$$ [½]

Using the formulae on page 16 of the Tables, we have:

$$E[U] = E\big[E[U|V]\big]$$

$$\mathrm{var}[U] = \mathrm{var}\big[E[U|V]\big] + E\big[\mathrm{var}[U|V]\big]$$

Therefore:

$$E[U] = E[E(U|V)] = E\left[\tfrac{1}{2}V\right] = \tfrac{1}{2}E[V] = 2\tfrac{1}{2}$$ [1]

$$\mathrm{var}[U] = \mathrm{var}[E(U|V)] + E[\mathrm{var}(U|V)] = \mathrm{var}\left[\tfrac{1}{2}V\right] + E\left[\tfrac{1}{12}V^2\right] = \tfrac{1}{4}\mathrm{var}[V] + \tfrac{1}{12}E[V^2]$$ [1]

Since $E[V^2] = \mathrm{var}[V] + \big[E[V]\big]^2$, we have:

$$\mathrm{var}[U] = \tfrac{1}{4} \times 5 + \tfrac{1}{12}\left(5 + 5^2\right) = 3\tfrac{3}{4}$$ [1]
5.3
(i) Proof

$E(Y|X=x)$ is a function of $x$. So, using $E[g(X)] = \int_x g(x)\,f(x)\,dx$, we have:

$$E[E(Y|X)] = \int_x E(Y|x)\,f(x)\,dx$$ [1]

Using the definition $E(Y|X=x) = \int_y y\,f(y|x)\,dy$ gives:

$$E[E(Y|X)] = \int_x \left(\int_y y\,f(y|x)\,dy\right) f(x)\,dx$$

Using the definition $f(y|x) = \frac{f(x,y)}{f(x)}$ gives:

$$E[E(Y|X)] = \int_x \left(\int_y y\,\frac{f(x,y)}{f(x)}\,dy\right) f(x)\,dx = \int_x \int_y y\,f(x,y)\,dy\,dx$$ [1]

$$= \int_y y\left(\int_x f(x,y)\,dx\right)dy$$

Since integrating the joint density function, $f(x,y)$, over all values of $x$ gives the marginal density function, $f(y)$, we have:

$$E[E(Y|X)] = \int_y y\,f(y)\,dy = E(Y)$$ [1]

(ii) Calculate the unconditional mean and variance

The mean and variance of $X$ are given by:

$$E(X) = \frac{\alpha}{\lambda} = \frac{3}{2} = 1.5 \qquad \mathrm{var}(X) = \frac{\alpha}{\lambda^2} = \frac{3}{4} = 0.75$$ [1]

Using the result from part (i), ie $E(Y) = E[E(Y|X)]$:

$$E(Y) = E[3X + 1] = 3E[X] + 1 = 3 \times 1.5 + 1 = 5.5$$ [1]

Using the result $\mathrm{var}(Y) = E[\mathrm{var}(Y|X)] + \mathrm{var}[E(Y|X)]$ from page 16 of the Tables:

$$\mathrm{var}(Y) = E[2X^2 + 5] + \mathrm{var}[3X + 1] = 2E[X^2] + 5 + 9\,\mathrm{var}[X]$$ [1]

Using the fact that $E(X^2) = \mathrm{var}(X) + E^2(X) = 0.75 + 1.5^2 = 3$:

$$\mathrm{var}(Y) = 2 \times 3 + 5 + 9 \times 0.75 = 17.75$$ [1]

So the standard deviation is $\sqrt{17.75} = 4.21$. [1]

5.4
We have $X \sim N(0,1)$. So:

$$E(X) = 0 \quad \text{and} \quad \mathrm{var}(X) = 1$$

We also have $(Y|X=x) \sim Poi(x^2 + 1)$. Hence:

$$E(Y|X=x) = x^2 + 1 \quad \text{and} \quad \mathrm{var}(Y|X=x) = x^2 + 1$$ [1]

Using the expectation formula gives:

$$E(Y) = E[E(Y|X)] = E[X^2 + 1] = E(X^2) + 1 = 1 + 1 = 2$$ [1]

Now using the variance formula:

$$\mathrm{var}(Y) = E[\mathrm{var}(Y|X)] + \mathrm{var}[E(Y|X)] = E(X^2 + 1) + \mathrm{var}(X^2 + 1)$$ [1]

$$= E(X^2) + 1 + \mathrm{var}(X^2)$$

Now $E(X^2) = \mathrm{var}(X) + [E(X)]^2 = 1 + 0 = 1$. However, we'll have to do $\mathrm{var}(X^2)$ from first principles:

$$\mathrm{var}(X^2) = E(X^4) - [E(X^2)]^2$$

From the formula for the moments of a standard normal random variable (given on page 10 of the Tables), we see that:

$$E(X^4) = \frac{1}{2^2}\,\frac{\Gamma(1+4)}{\Gamma\left(1 + \frac{4}{2}\right)} = \frac{1}{4}\,\frac{\Gamma(5)}{\Gamma(3)} = \frac{1}{4} \times \frac{4!}{2!} = 3$$

Using $E(X^2) = \mathrm{var}(X) + E^2(X) = 1 + 0 = 1$ again gives:

$$\mathrm{var}(X^2) = E(X^4) - E^2(X^2) = 3 - 1^2 = 2$$

Hence:

$$\mathrm{var}(Y) = E(X^2) + 1 + \mathrm{var}(X^2) = 1 + 1 + 2 = 4$$ [2]
5.5
$$E(X|Y=2) = \sum_x x\,P(X=x|Y=2) = \sum_x x\,\frac{P(X=x, Y=2)}{P(Y=2)}$$

$$= 0 \times \frac{0.05}{0.4} + 1 \times \frac{0.15}{0.4} + 2 \times \frac{0.2}{0.4} = 1.375$$

5.6
(i) Conditional variance

$$\mathrm{var}(X|Y=2) = E(X^2|Y=2) - E^2(X|Y=2)$$

$$E(X|Y=2) = \sum_x x\,P(X=x|Y=2) = \sum_x x\,\frac{P(X=x \cap Y=2)}{P(Y=2)}$$

$$= 1 \times \frac{0}{0.6} + 2 \times \frac{0.3}{0.6} + 3 \times \frac{0.1}{0.6} + 4 \times \frac{0.2}{0.6} = 2\tfrac{5}{6}$$ [1]

$$E(X^2|Y=2) = \sum_x x^2\,P(X=x|Y=2) = \sum_x x^2\,\frac{P(X=x \cap Y=2)}{P(Y=2)}$$

$$= 1^2 \times \frac{0}{0.6} + 2^2 \times \frac{0.3}{0.6} + 3^2 \times \frac{0.1}{0.6} + 4^2 \times \frac{0.2}{0.6} = 8\tfrac{5}{6}$$ [1]

So $\mathrm{var}(X|Y=2) = 8\tfrac{5}{6} - \left(2\tfrac{5}{6}\right)^2 = \frac{29}{36} = 0.80556$. [1]

(ii) Conditional expectation

We require:

$$E(U|V=v) = \int_u u\,f(u|v)\,du$$

Now:

$$f_V(v) = \int_{u=0}^{v} 6\left(2uv - u^2\right)du = 6\left[u^2 v - \tfrac{1}{3}u^3\right]_{u=0}^{v} = 6\left(v^3 - \tfrac{1}{3}v^3\right) = 4v^3$$ [1]

$$f(u|v) = \frac{f(u,v)}{f_V(v)} = \frac{6\left(2uv - u^2\right)}{4v^3} = \frac{3\left(2uv - u^2\right)}{2v^3} \quad \text{for } 0 < u < v < 1$$ [1]

So:

$$E(U|V=v) = \int_{u=0}^{v} \frac{3\left(2u^2 v - u^3\right)}{2v^3}\,du = \frac{3}{2v^3}\left[\tfrac{2}{3}u^3 v - \tfrac{1}{4}u^4\right]_{u=0}^{v} = \frac{3}{2v^3}\left(\tfrac{2}{3}v^4 - \tfrac{1}{4}v^4\right) = \frac{5}{8}v \quad \text{for } 0 < v < 1$$ [1]
End of Part 1

What next?

1. Briefly review the key areas of Part 1 and/or re-read the summaries at the end of Chapters 1 to 5.
2. Ensure you have attempted some of the Practice Questions at the end of each chapter in Part 1. If you don't have time to do them all, you could save the remainder for use as part of your revision.
3. Attempt Assignment X1.
4. Work through the Chapter 2 and 5 material (discrete distributions, continuous distributions and conditional expectation) of the Paper B Online Resources (PBOR).
Time to consider ... 'learning and revision' products

Marking: Recall that you can buy Series Marking or more flexible Marking Vouchers to have your assignments marked by ActEd. Results of surveys suggest that attempting the assignments and having them marked improves your chances of passing the exam. One student said:

'The insight into my interpretation of the questions compared with that of the model solutions was helpful. Also, the pointers as to how to shorten the amount of work required to reach an answer were appreciated.'

Face-to-face and Live Online Tutorials: If you haven't yet booked a tutorial, then maybe now is the time to do so. Feedback on ActEd tutorials is extremely positive:

'I would not pass exams without ActEd's lovely, clever, patient tutors. I don't know how you managed to find so many great teachers. Thank you!'

Online Classroom: Alternatively / additionally, you might consider the Online Classroom to give you access to ActEd's expert tuition and additional support:

'Please do an online classroom for everything. It is amazing.'

You can find lots more information, including demos and our Tuition Bulletin, on our website at www.ActEd.co.uk.

Buy online at www.ActEd.co.uk/estore
The Central Limit Theorem

Syllabus objectives

1.5 Central Limit Theorem - statement and application

1.5.1 State the Central Limit Theorem for a sequence of independent, identically distributed random variables.

1.5.2 Generate simulated values from a given distribution and compare the sampling distribution with the normal.
0 Introduction
The Central Limit Theorem is perhaps the most important result in statistics. It provides the basis for large-sample inference about a population mean when the population distribution is unknown and more importantly does not need to be known. It also provides the basis for large-sample inference about a population proportion, for example, in initial mortality rates at given age $x$, or in opinion polls and surveys. It is one of the reasons for the importance of the normal distribution in statistics.

We will study statistical inference in Chapter 10 (Hypothesis testing).

Basically, the Central Limit Theorem gives us an approximate distribution of the sample mean, $\bar{X}$, from any distribution. The usefulness of this, though not apparent now, will become clear in the next four chapters.

The Central Limit Theorem can also be used to give approximations to other distributions. This is useful if we are calculating probabilities that would take too long otherwise. For example, $P(X < 30)$ where $X \sim Bin(100, 0.3)$ would require us to work out 30 probabilities and then add them all up. If we use a normal approximation, the calculation of the probability is much simpler, and the loss of accuracy is slight.
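The $Bin(100, 0.3)$ example can be made concrete with a few lines of code. The sketch below computes $P(X < 30)$ exactly by summing the 30 binomial probabilities and compares it with a normal approximation with the same mean and variance. It evaluates the normal CDF at 29.5, which anticipates the continuity correction discussed in Section 3 of this chapter; the variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.3

# Exact: P(X < 30) = P(X <= 29), a sum of 30 binomial probabilities.
exact = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(30))

# Normal approximation N(np, np(1-p)), evaluated at 29.5
# (continuity correction, see Section 3).
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p))).cdf(29.5)

print(round(exact, 4), round(approx, 4))
```

On a computer the exact sum is no harder than the approximation, which is the point made later in the chapter about these approximations being less important than they used to be.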
1 The Central Limit Theorem

1.1 Definition

If $X_1, X_2, \ldots, X_n$ is a sequence of independent, identically distributed (iid) random variables with finite mean $\mu$ and finite (non-zero) variance $\sigma^2$ then the distribution of $\frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$ approaches the standard normal distribution, $N(0,1)$, as $n \to \infty$.

It is not necessary to be able to prove this result.

Remember that $\bar{X}$ is the sample mean, calculated as $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.

1.2 Practical uses
The way the Central Limit Theorem is used in practice is to provide useful normal approximations to the distributions of particular functions of a set of iid random variables.

Therefore both $\frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$ and $\frac{\sum X_i - n\mu}{\sqrt{n\sigma^2}}$ are approximately distributed as $N(0,1)$ for large $n$.

The second of these expressions can be obtained from the first just by multiplying top and bottom through by the sample size $n$.

Alternatively, the unstandardised forms can be used. Thus $\bar{X}$ is approximately $N(\mu, \sigma^2/n)$ and $\sum X_i$ is approximately $N(n\mu, n\sigma^2)$.

In fact, the expressions for the mean and variance are exact, ie $E(\bar{X}) = \mu$ and $\mathrm{var}(\bar{X}) = \frac{\sigma^2}{n}$. It is the shape of the curve that is approximate.

As a notation the symbol $\mathrel{\dot\sim}$ is used to mean 'is approximately distributed', so we can write the statements in the preceding paragraph as $\bar{X} \mathrel{\dot\sim} N(\mu, \sigma^2/n)$ and $\sum X_i \mathrel{\dot\sim} N(n\mu, n\sigma^2)$.

An obvious question is: what is large $n$?

A common answer is simply $n = 30$ but this is too simple an answer. A fuller answer is that it depends on the shape of the population, that is, the distribution of $X_i$, and in particular how skewed it is.

If this population distribution is fairly symmetric even though non-normal, then $n = 10$ may be large enough; whereas if the distribution is very skewed, $n = 50$ or more may be necessary.

In other words, the closer the original distribution is to being symmetrical, the better the approximation given by the Central Limit Theorem.
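In the spirit of syllabus objective 1.5.2, the behaviour described above can be explored by simulation. The sketch below draws repeated samples of size $n = 50$ from a skewed population, compares the mean and variance of the simulated sample means with $\mu$ and $\sigma^2/n$, and compares the proportion of standardised sample means below 1 with $\Phi(1)$. The choice of an $Exp(1)$ population, the seed and the number of repetitions are assumptions made purely for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def sample_means(n, reps, seed=1):
    """Simulate `reps` sample means, each from n iid Exp(1) values."""
    rng = random.Random(seed)
    return [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

n, reps = 50, 10_000
mu, sigma = 1.0, 1.0                     # mean and sd of the Exp(1) population
means = sample_means(n, reps)

grand_mean = fmean(means)
var_of_means = fmean((m - grand_mean) ** 2 for m in means)

# Proportion of standardised sample means below 1, compared with Phi(1).
z = [(m - mu) / (sigma / sqrt(n)) for m in means]
prop_below_1 = sum(v < 1 for v in z) / reps

print(grand_mean, var_of_means, sigma ** 2 / n)
print(prop_below_1, NormalDist().cdf(1))
```

Reducing `n` (say to 5) and re-running gives a feel for how the skewness of the population slows the approach to normality.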
Question
It is assumed that the number of claims arriving at an insurance company per working day has a mean of 40 and a standard deviation of 12. A survey is to be conducted over 50 working days.

Calculate the probability that the sample mean number of claims arriving per working day is less than 35.

Solution
Using the notation given in the Core Reading, $\mu = 40$, $\sigma = 12$, $n = 50$.

By the Central Limit Theorem, $\bar{X} \mathrel{\dot\sim} N\left(40, \frac{12^2}{50}\right)$.

We want $P(\bar{X} < 35)$. Standardising in the usual way:

$$P(\bar{X} < 35) = P\left(Z < \frac{35 - 40}{\sqrt{12^2/50}}\right) = P(Z < -2.946) = 1 - P(Z < 2.946) = 1 - 0.99839 = 0.00161$$

We can also use the Central Limit Theorem to answer questions about the distribution of $\sum X_i$, rather than $\bar{X}$.
Question
The cost of repairing a vehicle following an accident has mean \$6,200 and standard deviation \$650. A study was carried out into 65 vehicles that had been involved in accidents.

Calculate the probability that the total repair bill for the vehicles exceeded \$400,000.

Solution
Using the notation given in the Core Reading, we have $\mu = 6{,}200$, $\sigma = 650$, $n = 65$. Also let $Z \sim N(0,1)$.

We want the probability that the total repair bill, $T$, is greater than 400,000. The Central Limit Theorem states that:

$$T \mathrel{\dot\sim} N\left(65 \times 6200,\ 65 \times 650^2\right) = N\left(403{,}000,\ 5{,}240^2\right)$$

So the probability is calculated as follows:

$$P(T > 400{,}000) = P\left(Z > \frac{400{,}000 - 403{,}000}{5{,}240}\right) = P(Z > -0.572) = P(Z < 0.572) = 0.71634$$
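Both of the calculations above follow the same pattern: standardise using the exact mean and variance of $\bar{X}$ or $\sum X_i$, then read off a standard normal probability. A minimal sketch of that pattern is given below; the two helper functions are hypothetical names introduced here for illustration.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()

def prob_sample_mean_below(x, mu, sigma, n):
    """CLT approximation to P(sample mean < x) for n iid observations."""
    return std_normal.cdf((x - mu) / (sigma / sqrt(n)))

def prob_total_above(t, mu, sigma, n):
    """CLT approximation to P(sum of n iid observations > t)."""
    return 1 - std_normal.cdf((t - n * mu) / (sigma * sqrt(n)))

# Claims example: mean 40, sd 12, 50 working days.
print(prob_sample_mean_below(35, mu=40, sigma=12, n=50))
# Repair costs example: mean 6,200, sd 650, 65 vehicles.
print(prob_total_above(400_000, mu=6200, sigma=650, n=65))
```

The results should agree with the tabulated answers up to the rounding of the $z$-values used when reading the Tables.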
2 Normal approximations

We can use the Central Limit Theorem to obtain approximations to the binomial, Poisson and gamma distributions. This is useful for calculating probabilities and obtaining confidence intervals and carrying out hypothesis tests on a piece of paper. However, it is easy for a computer to calculate exact probabilities, confidence intervals and hypothesis tests. Hence, these approximations are not as important as they used to be.

2.1 Binomial distribution, Bin(n,p)
Let $X_i$ be iid Bernoulli random variables, that is, $Bin(1, p)$, so that:

$$P(X_i = 1) = p \qquad P(X_i = 0) = 1 - p$$

In other words $X_i$ is the number of successes in a single Bernoulli trial.

Consider $X_1, X_2, \ldots, X_n$, a sequence of such variables. This is precisely the binomial situation and $X = \sum X_i$ is the number of successes in the $n$ trials.

So $X = \sum X_i \sim Bin(n, p)$. Also note that $\bar{X} = \frac{X}{n}$.

As a result of the Central Limit Theorem it can be said that, for large $n$:

$$\bar{X} \mathrel{\dot\sim} N\left(\mu, \sigma^2/n\right) \quad \text{or} \quad \sum X_i \mathrel{\dot\sim} N\left(n\mu, n\sigma^2\right)$$

For the Bernoulli distribution:

$$\mu = E[X_i] = p \quad \text{and} \quad \sigma^2 = \mathrm{var}[X_i] = p(1-p)$$

Therefore $\sum X_i \mathrel{\dot\sim} N(np, np(1-p))$ for large $n$, which is of course the normal approximation to the binomial.

Basically, we approximate using a normal distribution, which has the same mean and variance as the binomial distribution.

Question
Given that $X \sim Bin(n, p)$, derive the mean and variance of $\bar{X}$, and hence write down an approximate distribution for $\bar{X}$ using the Central Limit Theorem assuming $n$ is sufficiently large.
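Solution
Since $\bar{X} = \frac{1}{n}\sum X_i = \frac{X}{n}$, we have:

$$E(\bar{X}) = E\left(\frac{\sum X_i}{n}\right) = \frac{1}{n}E\left(\sum X_i\right) = \frac{1}{n}\,np = p$$

$$\mathrm{var}(\bar{X}) = \mathrm{var}\left(\frac{\sum X_i}{n}\right) = \frac{1}{n^2}\mathrm{var}\left(\sum X_i\right) = \frac{1}{n^2}\,np(1-p) = \frac{p(1-p)}{n}$$

So, by the Central Limit Theorem, $\bar{X} \mathrel{\dot\sim} N\left(p, \frac{p(1-p)}{n}\right)$.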
What is 'large $n$'? A commonly quoted rule of thumb is that the approximation can be used only when both $np$ and $n(1-p)$ are greater than 5. The 'only when' is a bit severe. It is more a case of the approximation is less good if either is less than 5. However, this rule of thumb agrees with the answer that it depends on the symmetry/skewness of the population.

Note that when $p = 0.5$ the Bernoulli distribution is symmetrical. In this case both $np$ and $n(1-p)$ equal 5 when $n = 10$, and so the rule of thumb suggests that $n = 10$ is large enough. As $p$ moves away from 0.5 towards either 0 or 1 the Bernoulli distribution becomes more severely skewed. For example, when $p = 0.2$ or 0.8 the rule of thumb gives $n = 25$ as large enough, but, when $p = 0.05$ or 0.95 the rule of thumb gives $n = 100$ as large enough.

Recall from Chapter 2, that the binomial distribution can also be approximated by the Poisson distribution. This approximation is valid when $n$ is large and $p$ is small. This contrasts with the normal approximation, which requires $n$ to be large and $p$ to be close to ½ (although, as $n$ gets larger the normal approximation works well even if $p$ is not close to ½).
2.2 Poisson distribution

Let $X_i$, $i = 1, 2, \ldots, n$ be iid $Poi(\lambda)$ random variables.

So $\mu = E[X_i] = \lambda$ and $\sigma^2 = \mathrm{var}[X_i] = \lambda$.

The Central Limit Theorem implies that:

$$\sum X_i \mathrel{\dot\sim} N(n\lambda, n\lambda) \quad \text{for large } n$$

But $\sum X_i \sim Poi(n\lambda)$ and so, for large $n$, $Poi(n\lambda) \mathrel{\dot\sim} N(n\lambda, n\lambda)$, or, equivalently, $Poi(\lambda) \mathrel{\dot\sim} N(\lambda, \lambda)$ for large $\lambda$.

We are approximating using a normal distribution with the same mean and variance as the Poisson distribution.
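Question
Show that $\sum X_i \sim Poi(n\lambda)$, where $X_i$ is $Poi(\lambda)$ for all $i$.

Solution
Recall that the Poisson distribution is additive, ie for independent $X$ and $Y$:

$$X \sim Poi(\lambda) \text{ and } Y \sim Poi(\mu) \ \Rightarrow\ X + Y \sim Poi(\lambda + \mu)$$

Therefore, since the $X_i$ are independent, $\sum X_i \sim Poi(n\lambda)$.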
A rule of thumb for this one is that the approximation is good if $\lambda > 5$. However since extensive tables for a range of values of $\lambda$ are available, it is only needed in practice for much larger values of $\lambda$.

Remember that the Poisson distribution is the limiting case of the binomial with $\lambda = np$ as $n \to \infty$ and $p \to 0$. So this is consistent with the rule for the binomial.

The normal approximations to the binomial and Poisson distributions (both discrete) are the most commonly used in practice, and they are needed as the direct calculation of probabilities is computationally awkward without them.

This is the point mentioned in the introduction. To calculate $P(X < 30)$ where $X \sim Bin(100, 0.3)$, we'd need to work out 30 probabilities and then add them all up.

2.3 Gamma distribution
Let $X_i$, $i = 1, 2, \ldots, n$ be a sequence of iid $Exp(\lambda)$ variables and let $Y$ be their sum.

The exponential distribution has mean $\mu = 1/\lambda$ and variance $\sigma^2 = 1/\lambda^2$.

Therefore for large $n$, $Y = \sum X_i \mathrel{\dot\sim} N\left(\frac{n}{\lambda}, \frac{n}{\lambda^2}\right)$.

Recall that, if $X_i \sim Exp(\lambda)$, then $\sum X_i \sim Gamma(n, \lambda)$.

Therefore $Y$, which is $Gamma(n, \lambda)$, will have a normal approximation for large values of $n$.

In fact, $Gamma(\alpha, \lambda)$ can be approximated by $N\left(\frac{\alpha}{\lambda}, \frac{\alpha}{\lambda^2}\right)$ provided $\alpha$ is large. $\alpha$ need not be an integer.

Since $\chi^2_k = Gamma\left(\frac{k}{2}, \frac{1}{2}\right)$, $\chi^2_k$ will have a normal approximation $N(k, 2k)$ for large values of its degrees of freedom $k$.
These approximations are poorer than those used for the binomial and Poisson distributions owing to the skewness of the gamma distribution. It is therefore preferable to make use of the exact result from Chapter 3 that if $X \sim Gamma(\alpha, \lambda)$ and $2\alpha$ is an integer, then $2\lambda X \sim \chi^2_{2\alpha}$. We can then use the $\chi^2$ tables to obtain the probabilities.

3 The continuity correction
When dealing with the normal approximations to the binomial and Poisson distributions, which are both discrete, a discrete distribution is being approximated by a continuous one. When using such an approximation the change from discrete to continuous must be allowed for.

For an integer-valued discrete distribution, such as the binomial or Poisson, it is perfectly reasonable to consider individual probabilities such as $P(X = 4)$. However if $X$ is continuous, such as the normal, $P(X = 4)$ is not meaningful and is taken to be zero. For a continuous variable it is sensible to consider only the probability that $X$ lies in some interval.

For a continuous distribution, it is not useful to think about the probability of a random variable being exactly equal to a value. For example, for a continuous distribution:

$$P(X = 4) = P(4 \le X \le 4) = \int_4^4 f(x)\,dx = 0$$

To allow for this a continuity correction must be used. Essentially it corresponds to treating the integer values as being rounded to the nearest integer.
The diagram below illustrates the problem. The bars correspond to the probabilities for the $Bin(10, 0.5)$ distribution, whereas the graph corresponds to the probability density function for the normal approximation.

[Figure: bar chart of the Bin(10, 0.5) probabilities with the density of the normal approximation overlaid; horizontal axis x, vertical axis f(x)]

Since the binomial is a discrete distribution, there are no probabilities for non-integer values, whereas the normal approximation can take any value. To compensate for the gaps between the bars, we suppose that they are actually rounded to the nearest integer. For example, the $x = 6$ bar is assumed to represent values between $x = 5.5$ and $x = 6.5$.

So to use the continuity correction in practice, for example:

  $X = 4$      is equivalent to   '$3.5 < X < 4.5$'
  $X > 15$     is equivalent to   '$X > 15.5$'
  $X \ge 15$   is equivalent to   '$X > 14.5$'
Take the first example. All values that are contained in the interval $3.5 < X < 4.5$ become 4 when rounded to the nearest whole number. Similarly, values in the interval $X > 15.5$ become values in the interval $X > 15$ when rounded to the nearest whole number.

Alternatively, considering the bars on the graph:

[Figure: number-line diagrams marking 3.5, 4 and 4.5 for X = 4, and 14.5, 15 and 15.5 for X > 15 and X ≥ 15]

$X = 4$ must, obviously, include all of the $X = 4$ bar which goes from 3.5 to 4.5.

$X > 15$ must not include the $X = 15$ bar (as it is a strict inequality), therefore it should start from 15.5 (the upper end of the 15 bar).

$X \ge 15$ includes the $X = 15$ bar and higher, therefore it should start from 14.5 (the lower end of the 15 bar).
Question
Draw the corresponding diagrams for:

(i) $X < 8$

(ii) $X \le 8$

Hence give each inequality with the continuity correction applied.

Solution
[Figure: number-line diagrams marking 7.5, 8 and 8.5 for X < 8 and X ≤ 8]

(i) $X < 8$ must not include the $X = 8$ bar (as it is a strict inequality). So it should start from 7.5 (the lower end of the 8 bar). This gives $X < 7.5$.

(ii) $X \le 8$ includes the $X = 8$ bar and lower. So it should start from 8.5 (the upper end of the 8 bar). This gives $X < 8.5$.
Let's now see how to calculate a normal approximation to a probability involving a discrete random variable, allowing correctly for the continuity correction.

Question
Let $X$ be a Poisson variable with parameter 20. Use the normal approximation to obtain a value for $P(X \le 15)$ and use tables to compare with the exact value.

Solution
We have:

$$X \sim Poi(20) \ \Rightarrow\ X \mathrel{\dot\sim} N(20, 20) \ \Rightarrow\ \frac{X - 20}{\sqrt{20}} \mathrel{\dot\sim} N(0,1)$$

$$P(X \le 15) = P(X < 15.5) \quad \text{(using the continuity correction)}$$

$$= P\left(Z < \frac{15.5 - 20}{\sqrt{20}}\right) = P(Z < -1.006)$$

$$= 1 - 0.84279 \quad \text{(interpolating in the tables to be as accurate as possible)}$$

$$= 0.15721$$

From Poisson tables, $P(X \le 15) = 0.15651$.

Error = 0.0007, or a 0.45% relative error.
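The comparison in this solution is easy to reproduce without tables. The sketch below sums the Poisson probabilities for $k = 0, \ldots, 15$ directly and evaluates the $N(20, 20)$ CDF at 15.5; the variable names are illustrative.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

lam = 20

# Exact: P(X <= 15) as a sum of Poisson(20) probabilities.
exact = sum(exp(-lam) * lam ** k / factorial(k) for k in range(16))

# Normal approximation N(20, 20) with the continuity correction: P(X < 15.5).
approx = NormalDist(mu=lam, sigma=sqrt(lam)).cdf(15.5)

print(round(exact, 5), round(approx, 5), round(approx - exact, 5))
```

The approximate value differs very slightly from the tabulated 0.15721 because the code does not round the $z$-value before evaluating the normal CDF.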
It was mentioned earlier that approximations to the binomial and Poisson distributions are used because the direct calculation of probabilities is computationally awkward. We are now in a position to look at the following example.

Question
The average number of calls received per hour by an insurance company's switchboard is 5.

Calculate the probability that in a working day of eight hours, the number of telephone calls received will be:

(i) exactly 36

(ii) between 42 and 45 inclusive.

Assuming that the number of calls has a Poisson distribution, calculate the exact probabilities and also the approximate probabilities using a normal approximation.
The Actuarial
Education
Company
IFE: 2021 Examination
Page 12
CS1-06: The Central Limit
Theorem
Solution
If the number of calls per day is X, then $X \sim Poi(40)$. The exact probabilities are:
(i)
$$P(X = 36) = \frac{40^{36} e^{-40}}{36!} = 0.0539$$
(ii) In order to calculate this, we sum the probabilities of getting 42, 43, 44 and 45:
$$P(42 \le X \le 45) = \frac{40^{42} e^{-40}}{42!} + \frac{40^{43} e^{-40}}{43!} + \frac{40^{44} e^{-40}}{44!} + \frac{40^{45} e^{-40}}{45!}$$
$$= 0.0585 + 0.0544 + 0.0495 + 0.0440 = 0.2064$$
The normal approximation to this Poisson distribution is $N(40,40)$. Calculating the probabilities again, using continuity corrections:
(i)
$$P(X = 36) \approx P(35.5 < X < 36.5) = P\left(\frac{35.5-40}{\sqrt{40}} < Z < \frac{36.5-40}{\sqrt{40}}\right)$$
$$= \Phi(-0.553) - \Phi(-0.712) = 0.7617 - 0.7099 = 0.0518$$
(ii)
$$P(42 \le X \le 45) \approx P(41.5 < X < 45.5) = P\left(\frac{41.5-40}{\sqrt{40}} < Z < \frac{45.5-40}{\sqrt{40}}\right)$$
$$= P(0.237 < Z < 0.870) = \Phi(0.870) - \Phi(0.237) = 0.8078 - 0.5937 = 0.2141$$
It is evident that in most cases using an approximation makes the calculations easier, and that the values obtained are fairly close to the exact probabilities.
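The switchboard calculation can be reproduced in a few lines. This is a sketch only, assuming base R's dpois and pnorm; the object names (lambda, exact_i and so on) are hypothetical labels chosen for this illustration.

```r
lambda <- 5 * 8                      # expected number of calls in an eight-hour day
sd_x   <- sqrt(lambda)               # standard deviation of the N(40, 40) approximation

# Exact Poisson probabilities
exact_i  <- dpois(36, lambda)        # P(X = 36)
exact_ii <- sum(dpois(42:45, lambda))  # P(42 <= X <= 45)

# Normal approximations with continuity corrections
approx_i  <- pnorm(36.5, lambda, sd_x) - pnorm(35.5, lambda, sd_x)
approx_ii <- pnorm(45.5, lambda, sd_x) - pnorm(41.5, lambda, sd_x)

round(c(exact_i = exact_i, approx_i = approx_i,
        exact_ii = exact_ii, approx_ii = approx_ii), 4)
```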
Question
Use a normal approximation to calculate an approximate value for the probability that an observation from a $Gamma(25,50)$ random variable falls between 0.4 and 0.8.
Solution
The mean and variance of a general gamma distribution are $\frac{\alpha}{\lambda}$ and $\frac{\alpha}{\lambda^2}$, so here the mean and variance are 0.5 and 0.01 respectively.
If $X \sim Gamma(25,50)$, then $X \,\dot\sim\, N(0.5, 0.01)$ and:
$$P(0.4 < X < 0.8) \approx P(-1 < Z < 3) = \Phi(3) - \Phi(-1) = \Phi(3) - [1 - \Phi(1)]$$
$$= 0.99865 - 0.15866 = 0.840$$
No continuity correction is required, as we started with a continuous distribution.
The exact answer is 0.8387.
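As a cross-check of the exact and approximate figures quoted above, here is a short sketch using base R's pgamma and pnorm (with the shape and rate arguments corresponding to $\alpha$ and $\lambda$). The variable names are assumptions made for this illustration.

```r
alpha  <- 25
lambda <- 50
mu     <- alpha / lambda          # mean of the gamma distribution, 0.5
sigma  <- sqrt(alpha) / lambda    # standard deviation, 0.1

# Exact gamma probability P(0.4 < X < 0.8)
exact <- pgamma(0.8, shape = alpha, rate = lambda) -
         pgamma(0.4, shape = alpha, rate = lambda)

# Normal approximation; no continuity correction for a continuous distribution
approx <- pnorm(0.8, mu, sigma) - pnorm(0.4, mu, sigma)

round(c(exact = exact, approx = approx), 4)
```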
We can also use the Central Limit Theorem to calculate approximate probabilities relating to a sample mean obtained from a random sample from a continuous distribution.
Question
Calculate the approximate probability that the mean of a sample of 10 observations from a $Beta(10,10)$ random variable falls between 0.48 and 0.52.
Solution
Using the formulae on page 13 of the Tables, the $Beta(10,10)$ distribution has mean:
$$\mu = \frac{10}{10+10} = 0.5$$
and variance:
$$\sigma^2 = \frac{10 \times 10}{(10+10)^2 (10+10+1)} = 0.01190$$
We have a sample of 10 values. From the Central Limit Theorem, $\bar{X} \,\dot\sim\, N\left(\mu, \frac{\sigma^2}{n}\right)$, so here $\bar{X} \,\dot\sim\, N\left(0.5, \frac{0.01190}{10}\right)$, and:
$$P(0.48 < \bar{X} < 0.52) \approx P(-0.5798 < Z < 0.5798) = \Phi(0.5798) - \Phi(-0.5798)$$
$$= \Phi(0.5798) - (1 - \Phi(0.5798)) = 0.71897 - 0.28103 = 0.43794$$
No continuity correction is required as the beta distribution is continuous.
4 Comparing simulated samples
This section of the Core Reading refers to the use of R to simulate random samples. This material is not explained in detail here; we cover it in the PBOR resources for Subject CS1.
We saw in a previous chapter how to use R to simulate samples from standard distributions. We can then obtain the sum or mean of each of these samples.
The following R code uses a loop to obtain the means of 1,000 samples of size 40 from a Poisson distribution with mean 5. It then stores these sample means in the vector xbar:
set.seed(23)
xbar <- rep(0, 1000)
for (i in 1:1000) {
  x <- rpois(40, 5)
  xbar[i] <- mean(x)
}
Note that we have used the set.seed function so that you can obtain exactly the same results for your simulation.
The Central Limit Theorem tells us that the distribution of the sample means will approximately have a $N(5, 0.125)$ distribution. The simulated mean and variance of $\bar{x}$ are 5.01135 and 0.1250763, which are very close.
We can compare our observed distribution of the sample means with the Central Limit Theorem by a histogram of the sample means (using the R function hist) and superimposing the normal distribution curve (using the R function curve):
hist(xbar, prob=TRUE, ylim=c(0,1.2))
curve(dnorm(x,mean=5,sd=sqrt(0.125)), add=TRUE, lwd=2, col="red")
Another method of comparing the distribution of our sample means, $\bar{x}$, with the normal distribution is to examine the quantiles.
In R we can find the quantiles of $\bar{x}$ using the quantile function. Using the default setting (type 7) to obtain the sample lower quartile, median and upper quartile gives 4.775, 5.000 and 5.250, respectively. However in Subject CS1 we prefer to use type 5 or type 6.
In R, we can find the quartiles of the normal distribution using the qnorm function. This gives a lower quartile, median and upper quartile of 4.762, 5.000 and 5.238, respectively.
The quantiles obtained here are those of a normal distribution with mean 5 and variance 5/40.
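The quantile comparison described above might be coded as follows. This is a hedged sketch: it regenerates the sample means with replicate rather than the loop used earlier, and the object names (probs, q7, q6, qn) are illustrative assumptions. Only the qnorm figures are guaranteed to match the values quoted in the text.

```r
set.seed(23)
# 1,000 means of samples of size 40 from a Poisson distribution with mean 5
xbar <- replicate(1000, mean(rpois(40, 5)))

probs <- c(0.25, 0.5, 0.75)

# Sample quartiles under two of R's quantile definitions
q7 <- quantile(xbar, probs, type = 7)   # R's default definition
q6 <- quantile(xbar, probs, type = 6)   # one of the types preferred in Subject CS1

# Quartiles of the approximating normal distribution N(5, 5/40)
qn <- qnorm(probs, mean = 5, sd = sqrt(5 / 40))

round(rbind(type7 = q7, type6 = q6, normal = qn), 3)
```

With 1,000 sample means the type 6 and type 7 quartiles are usually very close to each other, which is the point made below about large samples.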
There is no universal agreement amongst statisticians over how to define sample quantiles. The lower quartile, for example, is sometimes defined to be the position of the $\frac{n+1}{4}$th sample value, where n is the sample size. Others may use the $\frac{n+2}{4}$th sample value, or even the $\frac{n+3}{4}$th value.
In R, if we do not specify, R will use definitions $\frac{n+3}{4}$ and $\frac{3n+1}{4}$ for the lower and upper quartiles. Other definitions can be used by specifying them in the R code. In fact, when we use R, we are often using quite large sample sizes, in which case the differences between the different definitions will be minimal.
We observe that the distribution of the sample means is slightly more spread out in the tails. This is what we observed in the previous diagram.
A quick way to compare all the quantiles in one go is by drawing a QQ-plot using the R function qqnorm.
If the sample quantiles coincide with the quantiles of the normal distribution, we would observe a perfect diagonal line (which we have added to the diagram for clarity). For our example we can see that $\bar{x}$ and the normal distribution are very similar, except in the tails, where we see that $\bar{x}$ has a lighter lower tail and a heavier upper tail than the normal distribution.
The QQ plot gives us sample quantiles which are very close to the diagonal line.
Care needs to be taken when interpreting a QQ plot. In this example, we see that, at the top end of the distribution, the sample quantiles are slightly larger than we would expect them to be. This suggests that our sample has slightly more weight in the upper tail than the corresponding normal distribution.
At the lower end, again the sample quantiles are slightly larger than we would expect. This suggests that our sample has slightly less weight in the lower tail than the corresponding normal distribution.
This might be the case if the sample distribution was (very slightly) positively skewed.
If we use R to calculate the coefficient of skewness for this sample, we obtain a figure of 0.0731. This confirms the very slight positive sample skewness.
Chapter 6 Summary
Central Limit Theorem
If $X_1, \ldots, X_n$ are independent and identically distributed random variables with mean $\mu$ and variance $\sigma^2$, and n is sufficiently large, then:
$$\sum X_i \,\dot\sim\, N(n\mu, n\sigma^2) \quad\text{and}\quad \frac{\sum X_i - n\mu}{\sqrt{n\sigma^2}} \,\dot\sim\, N(0,1)$$
$$\bar{X} \,\dot\sim\, N\left(\mu, \frac{\sigma^2}{n}\right) \quad\text{and}\quad \frac{\bar{X} - \mu}{\sqrt{\sigma^2/n}} \,\dot\sim\, N(0,1)$$
Normal approximations
$Bin(n,p)$ can be approximated by $N(np, npq)$ if $np > 5$, $nq > 5$ (with continuity correction)
$Poi(\lambda)$ can be approximated by $N(\lambda, \lambda)$ if $\lambda$ large (with continuity correction)
$Gamma(\alpha, \lambda)$ can be approximated by $N\left(\frac{\alpha}{\lambda}, \frac{\alpha}{\lambda^2}\right)$ if $\alpha$ large
$\chi^2_k$ can be approximated by $N(k, 2k)$ if k large
Chapter 6 Practice Questions
6.1
The number of claims arising in a month under a home insurance policy follows the Poisson distribution with mean 0.075.
Calculate the approximate probability that at least 50 claims in total arise in a month under a group of 500 independent such policies.
6.2 (Exam style)
If X follows the gamma distribution with parameters $\alpha = 10$ and $\lambda = 0.2$, calculate the probability that X exceeds 80:
(a) using a normal distribution
(b) using a chi-squared distribution.
Explain which of these answers is more accurate. [5]
6.3
When using the continuity correction with a random variable X that can take any integer value, write down expressions that are equivalent to the following:
(i) $X < 7$
(ii) $X = 0$
(iii) $X \ge -2$
(iv) $5 < X \le 10$
(v) $3 \le X < 8$
(vi) $4 \le 10X < 48$.
6.4
The probability of any given policy in a portfolio of term assurance policies lapsing before it expires is considered to be 0.15. Consider a random sample of 100 such policies.
Calculate the approximate probability that more than 20 policies will lapse before they expire.
6.5 (Exam style)
A company issues questionnaires to clients to obtain feedback on the clarity of their brochure. It is thought that 5% of clients do not find the brochure helpful.
Let N denote the number of clients who do not find the brochure helpful in a sample of 1,000 responses.
Calculate the approximate probability that $40 < N < 70$. [5]
6.6 (Exam style)
In a certain large population 45% of people have blood group A. A random sample of 300 individuals is chosen from this population.
Calculate an approximate value for the probability that more than 115 of the sample have blood group A. [3]
6.7 (Exam style)
Consider a random sample of size 16 taken from a normal distribution with mean $\mu = 25$ and variance $\sigma^2 = 4$. Let the sample mean be denoted $\bar{X}$.
State the distribution of $\bar{X}$ and hence calculate the probability that $\bar{X}$ assumes a value greater than 26. [3]
6.8 (Exam style)
Suppose that the sums assured under policies of a certain type are modelled by a distribution with mean 8,000 and standard deviation 3,000. Consider a group of 100 independent policies of this type.
Calculate the approximate probability that the total sum assured under this group of policies exceeds 845,000. [3]
6.9 (Exam style)
A computer routine selects one of the integers 1, 2, 3, 4, 5 at random and replicates the process a total of 100 times. Let S denote the sum of the 100 numbers selected.
Calculate the approximate probability that S assumes a value between 280 and 320 inclusive. [5]
6.10
The random variable Y has a gamma distribution with parameters $\alpha$ ($> 1$) and $\lambda$.
(i) (a) Show that the mode of Y is given by:
$$\frac{\alpha - 1}{\lambda}$$
(b) By considering the relative locations of the mean and mode using sketches of the gamma distribution, state how you would expect the distribution to behave in the limit as $\alpha \to \infty$, but where $\lambda$ is varied so that the mean $\frac{\alpha}{\lambda}$ has a constant value $\mu$.
(ii) Given that $\alpha = 50$ and $\lambda = 0.2$, calculate the value of $P(Y > 350)$ using:
(a) the chi-squared distribution
(b) the Central Limit Theorem.
(iii) Explain the reason for the difference between the answers obtained in part (ii).
6.11
$X_1, \ldots, X_n$ are independent and identically distributed $Gamma(\alpha, \lambda)$ random variables.
(i) Show, using moment generating functions, that $\bar{X}$ has a $Gamma(n\alpha, n\lambda)$ distribution.
(ii) The random variable T, representing the total lifetime of an individual light bulb, follows the $Exp(\lambda)$ distribution, where $1/\lambda = 2{,}000$ hours.
Calculate the probability that the average lifetime of 10 bulbs will exceed 4,000 hours.
Chapter 6 Solutions
6.1
The number of claims arising from an individual policy in a month follows the $Poi(0.075)$ distribution. Hence, the number of claims arising in a month from 500 independent such policies follows the $Poi(37.5)$ distribution. This is approximated by $N(37.5, 37.5)$.
$P(X \ge 50)$ becomes $P(X > 49.5)$ (continuity correction):
$$P(X > 49.5) = P\left(Z > \frac{49.5 - 37.5}{\sqrt{37.5}}\right) = P(Z > 1.960) = 1 - \Phi(1.960) = 0.025$$
6.2
(a) If $X \sim Gamma(10, 0.2)$, then $E(X) = \frac{10}{0.2} = 50$ and $var(X) = \frac{10}{0.2^2} = 250$. So we will use the $N(50, 250)$ distribution as an approximation. [1]
So:
$$P(X > 80) \approx P[N(50,250) > 80] = P\left[Z > \frac{80-50}{\sqrt{250}}\right] = 1 - P[N(0,1) \le 1.89737]$$
Interpolating in the normal distribution tables between the values for $z = 1.89$ and $z = 1.90$, we find that:
$$P(X > 80) \approx 1 - 0.97111 = 0.02889$$ [1]
ie about 2.9%.
(b) We now use the result that if X is $Gamma(\alpha, \lambda)$, then $2\lambda X$ has a $\chi^2_{2\alpha}$ distribution. So if X is $Gamma(10, 0.2)$, then $0.4X$ is $\chi^2_{20}$, and the required probability is:
$$P(X > 80) = P(0.4X > 32) = P(\chi^2_{20} > 32)$$ [1]
From page 166 of the Tables, we see that the probability that $\chi^2_{20}$ is less than 32 is 0.9567. So the required probability is $1 - 0.9567 = 0.0433$, or about 4.3%. [1]
The answer in (b) is more accurate, since we have not used an approximation. The chi-squared result is exact. [1]
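The claim that the chi-squared route is exact while the normal route is approximate can be illustrated numerically. The sketch below assumes base R's pnorm, pchisq and pgamma; the names p_normal, p_chisq and p_gamma are labels invented for this example.

```r
alpha  <- 10
lambda <- 0.2

# (a) Normal approximation N(alpha/lambda, alpha/lambda^2)
p_normal <- pnorm(80, mean = alpha / lambda, sd = sqrt(alpha) / lambda,
                  lower.tail = FALSE)

# (b) Chi-squared route: 2 * lambda * X has a chi-squared distribution on 2 * alpha df
p_chisq <- pchisq(2 * lambda * 80, df = 2 * alpha, lower.tail = FALSE)

# Direct gamma tail probability, for comparison with (b)
p_gamma <- pgamma(80, shape = alpha, rate = lambda, lower.tail = FALSE)

round(c(normal = p_normal, chisq = p_chisq, gamma = p_gamma), 4)
```

The chi-squared and direct gamma figures agree because they are the same probability written in two ways, whereas the normal figure understates the upper tail of this positively skewed distribution.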
6.3
(i) $X < 7$ becomes $X < 6.5$
(ii) $X = 0$ becomes $-0.5 < X < 0.5$
(iii) $X \ge -2$ becomes $X > -2.5$
(iv) $5 < X \le 10$ becomes $5.5 < X < 10.5$
(v) $3 \le X < 8$ becomes $2.5 < X < 7.5$
(vi) If X can take integer values then $10X$ takes values such as 10, 20, 30, ... . So from the inequality in the question, $10X$ can actually be 10, 20, 30 or 40, which means that X can be 1, 2, 3 or 4. So $1 \le X < 5$, and using a continuity correction on these values, this becomes $0.5 < X < 4.5$.
6.4
Let X be the number of policies lapsing before they expire. $X \sim Bin(100, 0.15)$, which is approximately $N(15, 12.75)$.
Using a continuity correction, $P(X > 20)$ becomes $P(X > 20.5)$:
$$P(X > 20.5) = P\left(Z > \frac{20.5 - 15}{\sqrt{12.75}}\right) = 1 - \Phi(1.54) = 1 - 0.93822 = 0.06178$$
So the approximate probability that more than 20 policies will lapse is 0.062.
The exact answer is 0.0663.
6.5
We have $N \sim Bin(1000, 0.05)$.
Using a normal approximation:
$$N \,\dot\sim\, N(50, 47.5)$$ [2]
Using a continuity correction, $P(40 < N < 70)$ becomes $P(40.5 < N < 69.5)$. Hence: [1]
$$P(40 < N < 70) \approx P(N < 69.5) - P(N < 40.5)$$
$$= P\left(Z < \frac{69.5 - 50}{\sqrt{47.5}}\right) - P\left(Z < \frac{40.5 - 50}{\sqrt{47.5}}\right)$$
$$= P(Z < 2.829) - [1 - P(Z < 1.378)]$$
$$= 0.99766 - [1 - 0.9159] = 0.91356$$ [2]
6.6
Let X be the number of individuals with blood group A.
$$X \sim Bin(300, 0.45) \,\dot\sim\, N(135, 74.25)$$ [1]
Using a continuity correction, $P(X > 115)$ becomes $P(X > 115.5)$: [1]
$$P\left(Z > \frac{115.5 - 135}{\sqrt{74.25}}\right) = P(Z > -2.263) = P(Z < 2.263) = 0.988$$ [1]
6.7
If our population is normal, we do not need the central limit theorem. The distribution of $\bar{X}$ is exactly normal:
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$ [1]
Hence:
$$P(\bar{X} > 26) = P\left(Z > \frac{26 - 25}{2/\sqrt{16}}\right) = P(Z > 2) = 1 - 0.97725 = 0.02275$$ [2]
6.8
Let $X_i$ be the sum assured under the $i$th policy.
We require:
$$P\left(\sum_{i=1}^{100} X_i > 845{,}000\right)$$
Now, according to the Central Limit Theorem:
$$\sum_{i=1}^{100} X_i \,\dot\sim\, N(100 \times 8000, \; 100 \times 3000^2) \quad \text{(approximately)}$$ [1]
Therefore:
$$P\left(\sum_{i=1}^{100} X_i > 845{,}000\right) \approx P\left(Z > \frac{845{,}000 - 800{,}000}{30{,}000}\right) = P(Z > 1.5) = 1 - 0.93319 = 0.06681$$ [2]
6.9
We have the sum of 100 discrete uniform random variables, $X_i$ $(i = 1, 2, \ldots, 100)$. Using the formulae from page 10 of the Tables, with $a = 1$, $b = 5$ and $h = 1$, we get:
$$E(X_i) = \frac{1 + 5}{2} = 3, \qquad var(X_i) = \frac{1}{12}(5 - 1)(5 - 1 + 2) = 2$$ [1]
Using the Central Limit Theorem:
$$S = \sum_{i=1}^{100} X_i \,\dot\sim\, N(300, 200)$$ [1]
Using a continuity correction, the probability is:
$$P(280 \le S \le 320) \approx P(279.5 < S < 320.5)$$ [1]
Standardising this:
$$P(279.5 < S < 320.5) = P(S < 320.5) - P(S < 279.5)$$
$$= P\left(Z < \frac{320.5 - 300}{\sqrt{200}}\right) - P\left(Z < \frac{279.5 - 300}{\sqrt{200}}\right)$$
$$= P(Z < 1.44957) - P(Z < -1.44957)$$
$$= P(Z < 1.44957) - [1 - P(Z < 1.44957)]$$
$$= 2P(Z < 1.44957) - 1 = 2 \times 0.92641 - 1 = 0.85282$$ [2]
6.10
(i)(a) Mode
The mode is the maximum of the PDF $f(y)$:
$$f(y) = \frac{\lambda^\alpha}{\Gamma(\alpha)} y^{\alpha-1} e^{-\lambda y}, \quad y > 0$$
Differentiating and setting the derivative equal to zero gives:
$$\frac{d}{dy} f(y) = \frac{\lambda^\alpha}{\Gamma(\alpha)}\left[(\alpha-1) y^{\alpha-2} e^{-\lambda y} - \lambda y^{\alpha-1} e^{-\lambda y}\right] = \frac{\lambda^\alpha}{\Gamma(\alpha)} y^{\alpha-2} e^{-\lambda y}\left[(\alpha-1) - \lambda y\right] = 0$$
Alternatively, we could differentiate the log of the PDF.
This gives:
$$y = 0 \quad\text{or}\quad y = \frac{\alpha-1}{\lambda}$$
Since $f(y) \ge 0$ and $f(0) = 0$, the first solution of zero must be a minimum and therefore the second solution must be a maximum.
Alternatively, the second solution can be shown to be a maximum by considering the second derivative:
$$\frac{d^2}{dy^2} f(y) = \frac{\lambda^\alpha}{\Gamma(\alpha)} e^{-\lambda y}\left[(\alpha-1)(\alpha-2) y^{\alpha-3} - 2\lambda(\alpha-1) y^{\alpha-2} + \lambda^2 y^{\alpha-1}\right]$$
Substituting $y = \frac{\alpha-1}{\lambda}$ gives:
$$\frac{d^2}{dy^2} f(y) = \frac{\lambda^3}{\Gamma(\alpha)} (\alpha-1)^{\alpha-2} e^{-(\alpha-1)}\left[(\alpha-2) - 2(\alpha-1) + (\alpha-1)\right] = -\frac{\lambda^3}{\Gamma(\alpha)} (\alpha-1)^{\alpha-2} e^{-(\alpha-1)}$$
To ensure this is negative, we require $(\alpha - 1)$ to be positive, hence we have a maximum if $\alpha > 1$, which was given in the question.
(i)(b) Sketch locations of mode and mean
We are letting $\alpha \to \infty$, but keeping $\mu$ constant. The mean is $\frac{\alpha}{\lambda} = \mu$, which will remain constant.
The mode is $\frac{\alpha-1}{\lambda} = \mu\,\frac{\alpha-1}{\alpha}$, which will be less than the mean $\mu$, but will tend to $\mu$ as $\alpha \to \infty$.
So, for large $\alpha$, the distribution looks like this:
[Figure: sketch of $f(y)$ against $y$, with the mode at $\frac{\alpha-1}{\lambda}$ marked just to the left of the mean at $\frac{\alpha}{\lambda}$.]
The mean and mode are very close together.
In fact, the distribution approaches a normal distribution in the limit.
(ii)(a) Probability using chi-squared distribution
$Y \sim Gamma(50, 0.2)$.
Using the relationship $Y \sim Gamma(\alpha, \lambda) \Rightarrow 2\lambda Y \sim \chi^2_{2\alpha}$:
$$P(Y > 350) = P(0.4Y > 140) = P(\chi^2_{100} > 140)$$
Using the $\chi^2$ probabilities on page 169 of the Tables gives a value of approximately 0.5%.
(ii)(b) Probability using normal approximation
The mean and variance of the gamma distribution are:
$$E(Y) = \frac{\alpha}{\lambda} = \frac{50}{0.2} = 250, \qquad var(Y) = \frac{\alpha}{\lambda^2} = \frac{50}{0.2^2} = 1{,}250$$
By the CLT, the gamma distribution can be approximated by a normal distribution provided $\alpha$ is sufficiently large. Here $\alpha = 50$, which is fairly large, so:
$$Y \,\dot\sim\, N(250, 1250)$$
Hence:
$$P(Y > 350) \approx P\left(Z > \frac{350 - 250}{\sqrt{1{,}250}}\right) = P(Z > 2.828) = 1 - 0.99766 = 0.234\%$$
(iii) Explain the differences
The gamma distribution is always positively skewed, although it becomes more symmetrical as $\alpha \to \infty$.
As a consequence, its upper tail is thicker than that of a symmetrical distribution and the corresponding tail probabilities are higher.
6.11
(i) Show mean has a gamma distribution
We have:
$$M_{\bar{X}}(t) = E\left[e^{t\bar{X}}\right] = E\left[e^{\frac{t}{n}(X_1 + \cdots + X_n)}\right]$$
$$= E\left[e^{\frac{t}{n}X_1}\right] \times \cdots \times E\left[e^{\frac{t}{n}X_n}\right] \quad \text{by independence}$$
$$= M_{X_1}\left(\tfrac{t}{n}\right) \times \cdots \times M_{X_n}\left(\tfrac{t}{n}\right) = \left[M_X\left(\tfrac{t}{n}\right)\right]^n \quad \text{as the } X_i\text{'s are identical}$$
$$= \left(1 - \frac{t}{n\lambda}\right)^{-n\alpha}$$
This is the MGF of the $Gamma(n\alpha, n\lambda)$ distribution. Hence, by the uniqueness property of MGFs, $\bar{X}$ follows the $Gamma(n\alpha, n\lambda)$ distribution.
(ii) Probability that average lifetime of 10 bulbs exceeds 4,000 hours
The individual lifetimes T follow the $Exp(\lambda)$ distribution, which is the same as the $Gamma(1, \lambda)$ distribution. So, using the result from part (i) we have:
$$\bar{T} \sim Gamma\left(10 \times 1, \; 10 \times \tfrac{1}{2{,}000}\right) = Gamma(10, 0.005)$$
Using the result from page 12 of the Tables, the probability that the average lifetime $\bar{T}$ will exceed 4,000 hours is:
$$P(\bar{T} > 4{,}000) = P(\chi^2_{20} > 2 \times 0.005 \times 4{,}000) = P(\chi^2_{20} > 40)$$
From page 166 of the Tables, this is 0.005. So the probability that the average lifetime will exceed 4,000 hours is 0.5%.
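The result in part (i) can be sanity-checked by simulation. The sketch below is an illustration under stated assumptions: it simulates 100,000 batches of 10 exponential lifetimes (the number of batches, the seed and all object names are arbitrary choices) and compares the empirical tail probability with the exact gamma and chi-squared values.

```r
set.seed(1)
n      <- 10          # bulbs per batch
lambda <- 1 / 2000    # rate of the exponential lifetime distribution
reps   <- 100000      # number of simulated batches (arbitrary)

# Each row holds the lifetimes of one batch of 10 bulbs
lifetimes <- matrix(rexp(reps * n, rate = lambda), ncol = n)
tbar      <- rowMeans(lifetimes)

p_sim   <- mean(tbar > 4000)                                   # empirical estimate
p_gamma <- pgamma(4000, shape = n, rate = n * lambda,
                  lower.tail = FALSE)                          # Gamma(10, 0.005) tail
p_chisq <- pchisq(2 * n * lambda * 4000, df = 2 * n,
                  lower.tail = FALSE)                          # P(chi-squared_20 > 40)

c(simulated = p_sim, gamma = p_gamma, chisq = p_chisq)
```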
CS1-07: Sampling and statistical inference
Syllabus objectives
2.3 Random sampling and sampling distributions
2.3.1 Explain what is meant by a sample, a population and statistical inference.
2.3.2 Define a random sample from a distribution of a random variable.
2.3.3 Explain what is meant by a statistic and its sampling distribution.
2.3.4 Determine the mean and variance of a sample mean and the mean of a sample variance in terms of the population mean and variance and the sample size.
2.3.5 State and use the basic sampling distributions for the sample mean and the sample variance for random samples from a normal distribution.
2.3.6 State and use the distribution of the t-statistic for random samples from a normal distribution.
2.3.7 State and use the F distribution for the ratio of two sample variances from independent samples taken from normal distributions.
0 Introduction
When a sample is taken from a population the sample information can be used to infer certain things about the population. For example, to estimate a population quantity or test the validity of a statement made about the population.
A population quantity could be its mean or variance, for example. So we might be testing the mean of a normal distribution, say.
In this chapter, we will consider taking a sample from a distribution and calculating its mean and variance. If we were to keep taking samples from the same distribution and calculating the mean and variance for each of the samples, we would find that these values also form probability distributions.
The distributions of the sample mean and sample variance are called sampling distributions and will be used extensively in Chapters 9 and 10 to construct confidence intervals and carry out hypothesis tests.
Part of this work will explain mathematically why the sample variance is usually defined to be $S^2 = \frac{1}{n-1}\left[\sum X_i^2 - n\bar{X}^2\right]$ rather than $S^2 = \frac{1}{n}\left[\sum X_i^2 - n\bar{X}^2\right]$.
We will also make use of the Central Limit Theorem from Chapter 6 to obtain the asymptotic distribution of the sample mean.
Finally, this chapter will look at the t distribution and the F distribution in greater detail. You will require a copy of the Formulae and Tables for the Actuarial Examinations to be able to progress through this chapter.
1 Basic definitions
The statistical method for testing assertions such as "smoking reduces life expectancy" involves selecting a sample of individuals from the population and, on the basis of the attributes of the sample, making statistical inferences about the corresponding attributes of the parent population.
This is done by assuming that the variation in the attribute in the parent population can be modelled using a statistical distribution. The inference can then be carried out on the basis of the properties of this distribution.
Theoretically this (technique) deals with samples from infinite populations. Actuaries are concerned with sampling from populations of policyholders, policies, claims, buildings, employees, etc. Such populations may be looked upon as conceptually infinite but even without doing so, they will be very large populations of many thousands and so the methods for infinite populations will be more than adequate.
1.1 Random samples
A set of items selected from a parent population is a random sample if:
the probability that any item in the population is included in the sample is proportional to its frequency in the parent population and
the inclusion/exclusion of any item in the sample operates independently of the inclusion/exclusion of any other item.
A random sample is made up of (iid) random variables and so they are denoted by capital X's. We will use the shorthand notation $\underline{X}$ to denote a random sample, that is, $\underline{X} = (X_1, X_2, \ldots, X_n)$. An observed sample will be denoted by $\underline{x} = (x_1, x_2, \ldots, x_n)$. The population distribution will be specified by a density (or probability function) denoted by $f(x; \theta)$, where $\theta$ denotes the parameter(s) of the distribution.
Due to the Central Limit Theorem, inference concerning a population mean can be considered without specifying the form of the population, provided the sample size is large enough.
Question
Identify the population, the sample and the statistical inference in each of the following examples.
(i) We are studying 10 cities to establish whether air pollution levels are acceptable in UK cities.
(ii) We are analysing the burglary claims for last January to get a feel for what the total range of claims might be for the whole year.
Solution
(i) Air pollution
The population consists of all cities in the UK.
The sample consists of the 10 cities selected for study (and the measurements of the pollution levels for these).
The statistical inference required here is to assess whether there are unacceptable pollution levels in UK cities in general.
This is an example of a statistical test.
(ii) Burglary claims
The population consists of all possible claims that could arise during the year.
The sample consists of the amounts paid for each of the January claims.
The statistical inference required here is to find an approximate range for the total claim amount for the year.
This is an example of a confidence interval.
1.2 Definition of a statistic
A statistic is a function of $\underline{X}$ only and does not involve any unknown parameters. Thus $\bar{X} = \frac{1}{n}\sum X_i$ and $S^2 = \frac{1}{n-1}\sum (X_i - \bar{X})^2$ are statistics whereas $\frac{1}{n}\sum (X_i - \mu)^2$ is not, unless of course $\mu$ is known.
Note here the difference between $\mu$, which is the population mean (ie the mean for all possible observations, which is usually unknown) and $\bar{X}$, which is the sample mean (ie the mean of the sample values which we can calculate for any given sample).
We might also be interested in statistics such as $\max X_i$, the highest value in the sample.
A statistic can be generally denoted by $g(\underline{X})$. Since a statistic is a function of random variables, it will be a random variable itself and will have a distribution, its sampling distribution.
2 Moments of the sample mean and variance
In the following section we will look at the statistical properties of the sample mean and sample variance, which are the most important sample statistics.
2.1 The sample mean
Suppose $X_i$ has mean $\mu$ and variance $\sigma^2$. Recall that the sample mean is $\bar{X} = \frac{\sum X_i}{n}$.
Consider first $\sum X_i$:
$$E\left[\sum X_i\right] = \sum E[X_i] = n\mu \quad \text{since they are identically distributed}$$
$$var\left[\sum X_i\right] = \sum var[X_i] \quad \text{since they are independent}$$
$$= n\sigma^2 \quad \text{since they are identically distributed}$$
We are using the results from Chapter 4 that $E[X_1 + \cdots + X_n] = E[X_1] + \cdots + E[X_n]$, and if $X_1, \ldots, X_n$ are independent $var[X_1 + \cdots + X_n] = var[X_1] + \cdots + var[X_n]$.
As $\bar{X} = \frac{1}{n}\sum X_i$, we can now write down that $E[\bar{X}] = \frac{n\mu}{n} = \mu$ and $var[\bar{X}] = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$.
Note: $sd[\bar{X}] = \frac{\sigma}{\sqrt{n}}$ is called the standard error of the sample mean.
We have established that the sample mean $\bar{X}$ has an expected value of $\mu$ (ie the same as the population mean) and a variance of $\frac{\sigma^2}{n}$ (ie the population variance divided by the sample size).
These are very important results and will be used extensively in Chapters 9 and 10.
A consequence of the result for the variance of $\bar{X}$ is that as the sample gets bigger the variance gets smaller. This should be intuitive since a bigger sample produces more accurate results.
2.2 The sample variance
Recall that the sample variance is $S^2 = \frac{1}{n-1}\sum (X_i - \bar{X})^2$.
Considering only the mean of $S^2$, it can be proved that $E[S^2] = \sigma^2$ as follows:
$$S^2 = \frac{1}{n-1}\left[\sum X_i^2 - n\bar{X}^2\right]$$
Taking expectations and noting that for any random variable Y, $E[Y^2] = var[Y] + (E[Y])^2$ (obtained by rearranging $var(Y) = E(Y^2) - [E(Y)]^2$) leads to:
$$E[S^2] = \frac{1}{n-1}\left\{\sum E[X_i^2] - nE[\bar{X}^2]\right\}$$
$$= \frac{1}{n-1}\left\{\sum (\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} + \mu^2\right)\right\}$$
$$= \frac{1}{n-1}\left\{n\sigma^2 + n\mu^2 - \sigma^2 - n\mu^2\right\}$$
$$= \frac{1}{n-1}(n-1)\sigma^2 = \sigma^2$$
as required.
To work out $E[\bar{X}^2]$, we've used the general formula just mentioned, which tells us that $E[\bar{X}^2] = var(\bar{X}) + (E[\bar{X}])^2$ and then we've used the results we just derived for the sample mean.
The denominator of $n - 1$ is used to make the mean of $S^2$ equal to the true value of $\sigma^2$. This is the motivation behind the definition of the sample variance. Later in Chapter 8, we will discover that this result means that the sample variance is an unbiased estimator of the population variance.
There is no general formula for $var[S^2]$. This depends on the specific distribution of the population. The only one that you will be required to know for Subject CS1 is for a normal population. This is covered in Section 3.2.
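The effect of the $n - 1$ divisor can be seen in a small simulation. This is a sketch under assumed settings (normal data with $\sigma^2 = 4$, samples of size 5, 20,000 replications and an arbitrary seed); none of these values come from the Core Reading.

```r
set.seed(1)
n      <- 5        # sample size
sigma2 <- 4        # true population variance (assumed for illustration)
reps   <- 20000    # number of simulated samples

# Each row is one sample of size n from N(10, sigma2)
samples <- matrix(rnorm(reps * n, mean = 10, sd = sqrt(sigma2)), ncol = n)

s2_unbiased <- apply(samples, 1, var)        # R's var() uses the n - 1 divisor
s2_biased   <- s2_unbiased * (n - 1) / n     # the same sums of squares divided by n

c(mean_unbiased = mean(s2_unbiased),         # should be close to sigma2
  mean_biased   = mean(s2_biased),           # should be close to (n - 1) / n * sigma2
  target        = sigma2)
```

With the divisor n the average of the simulated values settles near $\frac{n-1}{n}\sigma^2$, below the true variance, which is the bias that the $n - 1$ divisor removes.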
Question
The total number of new motor insurance claims reported to a particular branch of an insurance company on successive days during a randomly selected month can be considered to come from the Poisson distribution with $\lambda = 5$. Calculate the mean and variance of a sample mean based on 30 days' figures.
Solution
The Poisson distribution in the question has mean and variance of 5.
If the sample size is 30 then $E[\bar{X}] = 5$ and $var[\bar{X}] = \frac{5}{30} = 0.167$.
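These two moments can be checked by simulating many 30-day samples. The following is a minimal sketch, assuming 10,000 simulated months and an arbitrary seed; the object name xbar is illustrative.

```r
set.seed(1)
# 10,000 sample means, each based on 30 daily counts from a Poisson(5) distribution
xbar <- replicate(10000, mean(rpois(30, lambda = 5)))

c(simulated_mean     = mean(xbar),   # theory: 5
  simulated_variance = var(xbar),    # theory: 5 / 30
  theoretical_var    = 5 / 30)
```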
We can apply the same theory to situations involving a continuous distribution.
Question
Calculate the mean and variance of the sample mean for samples of size 110 from a parent population which is Pareto with parameters $\alpha = 5$ and $\lambda = 3{,}000$.
Solution
The Pareto distribution has a mean of $\frac{\lambda}{\alpha - 1}$, and variance of $\frac{\alpha\lambda^2}{(\alpha-1)^2(\alpha-2)}$, so the distribution in the question has $\mu = 750$ and $\sigma^2 = 937{,}500$.
Thus $E[\bar{X}] = 750$ and $var[\bar{X}] = \frac{937{,}500}{110} = 8{,}522.7$.
The formulae for the mean and variance of a Pareto distribution are given on page 14 of the Tables.
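The Pareto arithmetic above is easy to script. In the sketch below, pareto_mean and pareto_var are hypothetical helper functions written for this illustration from the formulae quoted in the solution; they are not functions supplied by base R.

```r
# Mean of a Pareto(alpha, lambda) distribution (requires alpha > 1)
pareto_mean <- function(alpha, lambda) lambda / (alpha - 1)

# Variance of a Pareto(alpha, lambda) distribution (requires alpha > 2)
pareto_var <- function(alpha, lambda) {
  alpha * lambda^2 / ((alpha - 1)^2 * (alpha - 2))
}

mu     <- pareto_mean(5, 3000)   # population mean
sigma2 <- pareto_var(5, 3000)    # population variance
n      <- 110                    # sample size

c(mean_xbar = mu, var_xbar = sigma2 / n)
```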
3 Sampling distributions for the normal
3.1 The sample mean
The Central Limit Theorem provides a large-sample approximate sampling distribution for $\bar{X}$ without the need for any distributional assumptions about the population. So for large n:
$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \,\dot\sim\, N(0,1) \quad\text{or}\quad \bar{X} \,\dot\sim\, N(\mu, \sigma^2/n)$$
This result is often called the z result.
It transpires that the above result gives the exact sampling distribution of $\bar{X}$ for random samples from a normal population.
3.2 The sample variance
The sampling distribution of $S^2$ when sampling from a normal population, with mean $\mu$ and variance $\sigma^2$, is:
$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$
This is a more advanced result. Its proof is beyond the scope of Subject CS1.
Whereas the distribution of $\bar{X}$ is normal and hence symmetrical, the distribution of $S^2$ is positively skewed, especially so for small n, but becoming symmetrical for large n.
[Figure: PDFs $f(x)$ of the $\chi^2_4$ and $\chi^2_{20}$ distributions plotted against x.]
Using the $\chi^2$ result to investigate the first and second order moments of $S^2$, when sampling from a normal population, and the fact that the mean and variance of $\chi^2_k$ are k and 2k, respectively:
$$E\left[\frac{(n-1)S^2}{\sigma^2}\right] = n - 1 \;\Rightarrow\; E[S^2] = \frac{\sigma^2}{n-1}(n-1) = \sigma^2$$
This is the result in Section 2.2, in the context of a normal distribution.
We also have:
$$var\left[\frac{(n-1)S^2}{\sigma^2}\right] = 2(n-1) \;\Rightarrow\; var[S^2] = \frac{\sigma^4}{(n-1)^2}\, 2(n-1) = \frac{2\sigma^4}{n-1}$$
These results are important.
For both $\bar{X}$ and $S^2$ the variances decrease and tend to zero as the sample size n increases. Added to the facts that $E[\bar{X}] = \mu$ and $E[S^2] = \sigma^2$, these imply that $\bar{X}$ gets closer to $\mu$ and $S^2$ gets closer to $\sigma^2$ as the sample size increases. These are desirable properties of estimators of $\mu$ and $\sigma^2$.
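The formula $var[S^2] = \frac{2\sigma^4}{n-1}$ and its decrease with n can be illustrated by simulation. This is a sketch under assumed settings ($\sigma^2 = 4$, three arbitrary sample sizes, 20,000 replications each); sim_var_s2 is a hypothetical helper defined only for this example.

```r
set.seed(1)
sigma2 <- 4   # true variance of the normal population (assumed for illustration)

# Simulate `reps` samples of size n and return the variance of the sample variances
sim_var_s2 <- function(n, reps = 20000) {
  x <- matrix(rnorm(reps * n, mean = 0, sd = sqrt(sigma2)), ncol = n)
  var(apply(x, 1, var))
}

n_values  <- c(5, 20, 80)
simulated <- sapply(n_values, sim_var_s2)
theory    <- 2 * sigma2^2 / (n_values - 1)   # 2 * sigma^4 / (n - 1)

rbind(n = n_values, simulated = simulated, theory = theory)
```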
Question
Calculate the probability that, for a random sample of 5 values taken from a $N(100, 25^2)$ population:
(i) $\bar{X}$ will be between 80 and 120
(ii) S will exceed 41.7.
Solution
(i) Since $\bar{X} \sim N(100, 25^2/5) = N(100, 125)$:
$$P(80 < \bar{X} < 120) = P\left(\frac{80 - 100}{\sqrt{125}} < Z < \frac{120 - 100}{\sqrt{125}}\right) = P(-1.789 < Z < 1.789)$$
$$= \Phi(1.789) - \Phi(-1.789) = 0.96319 - (1 - 0.96319) = 0.926$$
(ii) Since $\frac{4S^2}{\sigma^2} \sim \chi^2_4$, we have:
$$P(S > 41.7) = P\left(\frac{4S^2}{\sigma^2} > \frac{4 \times 41.7^2}{25^2}\right) = P(\chi^2_4 > 11.13) = 1 - P(\chi^2_4 < 11.13)$$
Interpolating between values taken from page 165 of the Tables gives:
$$P(S > 41.7) \approx 0.0253$$
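Both parts of this question can be evaluated directly. The sketch below assumes base R's pnorm and pchisq; p_i and p_ii are illustrative names. Because R computes the chi-squared probability directly rather than interpolating, the figure for part (ii) may differ slightly in the fourth decimal place from the Tables-based value.

```r
n     <- 5
mu    <- 100
sigma <- 25

# (i) Sample mean is N(mu, sigma^2 / n), so its standard deviation is sigma / sqrt(n)
se  <- sigma / sqrt(n)
p_i <- pnorm(120, mu, se) - pnorm(80, mu, se)

# (ii) (n - 1) * S^2 / sigma^2 has a chi-squared distribution on n - 1 df
stat <- (n - 1) * 41.7^2 / sigma^2
p_ii <- pchisq(stat, df = n - 1, lower.tail = FALSE)

round(c(p_i = p_i, chisq_stat = stat, p_ii = p_ii), 4)
```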
3.3 Independence of the sample mean and variance
The other important feature when sampling from normal populations is the independence of $\bar{X}$ and $S^2$. A full proof of this is not trivial but it is a result that is easily appreciated as follows.
Suppose that a sample from some normal distribution has been simulated. The value of $\bar{x}$ does not give any information about the value of $s^2$.
Remember that changing the mean of a normal distribution shifts the graph to the left or right. Changing the variance squashes the graph up or stretches it out.
However, if the sample is from some exponential distribution, the value of $\bar{x}$ does give information about the value of $s^2$, as $\mu$ and $\sigma^2$ are related.
For the exponential distribution, these are directly linked since $\mu = \dfrac{1}{\lambda}$ and $\sigma^2 = \dfrac{1}{\lambda^2}$.
Other cases such as Poisson, binomial and gamma can be considered in a similar way, but only the normal has the independence property.
Question
Calculate the probability that, for the sample in the previous question, (i) and (ii) will both occur.
Solution
Since $\bar{X}$ and $S^2$ are independent, we can factorise the probability:

$$P(80 < \bar{X} < 120 \,\cap\, S > 41.7) = P(80 < \bar{X} < 120) \times P(S > 41.7)$$

Referring back to the previous question, we have already found the probabilities. So:

$$P(80 < \bar{X} < 120 \,\cap\, S > 41.7) = 0.926 \times 0.0253 = 0.023$$
4 The t result
The sampling distribution for $\bar{X}$, that is, $\dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim N(0,1)$ or $\bar{X} \sim N(\mu, \sigma^2/n)$, will be used in subsequent units for inference concerning $\mu$ when the population variance $\sigma^2$ is known.
However this is rare in practice, and another result is needed for the realistic situation when $\sigma^2$ is unknown. This is the t result or the t sampling distribution.
The t result is similar to the z result but with $\sigma$ replaced by $S$ and $N(0,1)$ replaced by $t_{n-1}$.

Thus $\dfrac{\bar{X}-\mu}{S/\sqrt{n}} \sim t_{n-1}$.

It is not a sampling distribution for $\bar{X}$ alone as it involves a combination of $\bar{X}$ and $S$.
The $t_k$ variable is defined by:

$$t_k = \frac{N(0,1)}{\sqrt{\chi^2_k / k}}$$

where the $N(0,1)$ and $\chi^2_k$ random variables are independent.
Then the t result above follows from the sampling distributions of the last section, that is, $\dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}}$ is the $N(0,1)$ and $\dfrac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$ is the $\chi^2_k$, together with their independence, to obtain $\dfrac{\bar{X}-\mu}{S/\sqrt{n}} \sim t_{n-1}$ when sampling from a normal population.
The t distribution is symmetrical about zero and its critical points are tabulated.
Percentage points (or critical points) for the t distribution can be found on page 163 of the Tables. The t distribution has one parameter, which, like the $\chi^2$ distribution, is called the number of degrees of freedom. When using the t distribution, the number of degrees of freedom is the same as the number we divide by when estimating the variance.
A graph of the t distribution is also given on page 163 of the Tables.
It looks similar to the standard normal (ie symmetrical) especially for large values of degrees of freedom. The following picture shows a $t_2$ density, a $t_{10}$ density and a $N(0,1)$ density for comparison.
In fact, as $k \to \infty$, $t_k \to N(0,1)$.
The $t_1$ distribution is also called the Cauchy distribution and is peculiar in that none of its moments exist, not even its mean. However since samples of size 2 are unrealistic, it should not arise as a sampling distribution.
For $k > 2$, the $t_k$ distribution has mean 0 and variance $k/(k-2)$.
Question
State the distribution of $\dfrac{\bar{X} - 100}{S/\sqrt{5}}$ for a random sample of 5 values taken from a $N(100, \sigma^2)$ population. Calculate the probability that this quantity exceeds 1.533.
Solution
From previous results $\dfrac{\bar{X} - 100}{S/\sqrt{5}} \sim t_4$.
From the Tables, we see that the probability that this quantity will exceed 1.533 is 10%.
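The tabulated 10% figure can be reproduced numerically. The sketch below integrates the standard $t_k$ density with Simpson's rule. The function names and the number of integration steps are assumptions made for this illustration, and in an exam the Tables would be used instead.

```python
import math

def t_pdf(x, k):
    """Density of the t distribution with k degrees of freedom."""
    c = math.gamma((k + 1) / 2) / (math.sqrt(k * math.pi) * math.gamma(k / 2))
    return c * (1 + x * x / k) ** (-(k + 1) / 2)

def t_upper_tail(x, k, steps=2000):
    """P(t_k > x) for x >= 0: symmetry gives 0.5 minus the integral over [0, x]."""
    h = x / steps
    total = t_pdf(0.0, k) + t_pdf(x, k)
    for i in range(1, steps):
        # Simpson weights: 4 for odd interior points, 2 for even ones
        total += (4 if i % 2 else 2) * t_pdf(i * h, k)
    return 0.5 - total * h / 3

p = t_upper_tail(1.533, 4)
print("P(t_4 > 1.533) =", round(p, 3))
```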
We now consider the situation involving two samples from different normal populations.
Question
Independent random samples of size $n_1$ and $n_2$ are taken from the normal populations $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$ respectively.
(i) Write down the sampling distributions of $\bar{X}_1$ and $\bar{X}_2$ and hence determine the sampling distribution of $\bar{X}_1 - \bar{X}_2$, the difference between the sample means.
(ii) Now assume that $\sigma_1^2 = \sigma_2^2 = \sigma^2$.
(a) Express the sampling distribution of $\bar{X}_1 - \bar{X}_2$ in standard normal form.
(b) State the sampling distribution of $\dfrac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{\sigma^2}$.
(c) Using the $N(0,1)$ distribution from (a) and the $\chi^2$ distribution from (b), apply the definition of the t distribution to find the sampling distribution of $\bar{X}_1 - \bar{X}_2$ when $\sigma^2$ is unknown.
Solution
(i) $\bar{X}_1$ is $N(\mu_1, \sigma_1^2/n_1)$ and $\bar{X}_2$ is $N(\mu_2, \sigma_2^2/n_2)$.
$\bar{X}_1 - \bar{X}_2$ is the difference between two independent normal variables and so is itself normal, with mean $\mu_1 - \mu_2$ and variance $\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$.
(ii)(a) The variance of $\bar{X}_1 - \bar{X}_2$ is now $\sigma^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)$ and so standardising gives:

$$\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim N(0,1)$$

(ii)(b) As $\dfrac{(n_1-1)S_1^2}{\sigma^2} \sim \chi^2_{n_1-1}$ and $\dfrac{(n_2-1)S_2^2}{\sigma^2} \sim \chi^2_{n_2-1}$ are independent (because the samples are independent), their sum is also $\chi^2$, with $n_1 + n_2 - 2$ degrees of freedom.
This is using the additive property of independent $\chi^2$ distributions (ie $\chi^2_m + \chi^2_n \sim \chi^2_{m+n}$), which we proved in Chapter 4, Section 4.2.
(ii)(c) We use the definition of the t distribution:

$$t_k = \frac{N(0,1)}{\sqrt{\chi^2_k / k}}$$

The distribution in part (ii)(a) is $N(0,1)$, and the distribution in part (ii)(b) is $\chi^2_{n_1+n_2-2}$. So:

$$\frac{\dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}}{\sqrt{\dfrac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{\sigma^2 (n_1+n_2-2)}}} \sim t_{n_1+n_2-2}$$

The $\sigma$'s cancel to give:

$$\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \sim t_{n_1+n_2-2}$$

We will see that $\dfrac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}$, which appears in the denominator, is the pooled variance of the two samples. It is a weighted average of the individual sample variances, using the degrees of freedom as the weightings.
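The pooled variance and the resulting two-sample t statistic translate directly into code. In the sketch below the function names and the two small data sets are hypothetical, chosen only so that the arithmetic is easy to follow by hand.

```python
import math
import statistics

def pooled_variance(sample1, sample2):
    """Weighted average of the sample variances, weighted by degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    s1_sq = statistics.variance(sample1)   # divisor n1 - 1
    s2_sq = statistics.variance(sample2)   # divisor n2 - 1
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

def two_sample_t(sample1, sample2, mean_difference=0.0):
    """Statistic that follows t with n1 + n2 - 2 degrees of freedom when both
    populations are normal with equal variances and mu1 - mu2 = mean_difference."""
    n1, n2 = len(sample1), len(sample2)
    sp_sq = pooled_variance(sample1, sample2)
    diff = statistics.fmean(sample1) - statistics.fmean(sample2)
    return (diff - mean_difference) / math.sqrt(sp_sq * (1 / n1 + 1 / n2))

# Hypothetical data, for illustration only
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0, 8.0]
sp2 = pooled_variance(a, b)
t_stat = two_sample_t(a, b)
print("pooled variance:", sp2)
print("t statistic:    ", round(t_stat, 4))
```

Here the sample variances are 1 and 20/3, so the pooled variance is (2 × 1 + 3 × 20/3)/5 = 4.4, which lies between the two individual values.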
5 The F result for variance ratios
The F distribution is defined by $F = \dfrac{U/\nu_1}{V/\nu_2}$, where $U$ and $V$ are independent $\chi^2$ random variables with $\nu_1$ and $\nu_2$ degrees of freedom respectively. Thus if independent random samples of size $n_1$ and $n_2$ respectively are taken from normal populations with variances $\sigma_1^2$ and $\sigma_2^2$, then:

$$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{n_1-1,\,n_2-1}$$

The F distribution gives us the distribution of the variance ratio for two normal populations. $\nu_1$ and $\nu_2$ can be referred to as the number of degrees of freedom in the numerator and denominator, respectively.
It should be noted that it is arbitrary which one is the numerator and which is the denominator and so:

$$\frac{S_2^2/\sigma_2^2}{S_1^2/\sigma_1^2} \sim F_{n_2-1,\,n_1-1}$$

Since it is arbitrary which value is the numerator and which is the denominator, and since only the upper percentage points are tabulated, it is usually easier to put the larger value of the sample variance into the numerator and the smaller sample variance into the denominator.
Alternatively:

$$F \sim F_{n_1-1,\,n_2-1} \;\Rightarrow\; \frac{1}{F} \sim F_{n_2-1,\,n_1-1}$$

This reciprocal form is needed when using tables of critical points, as only upper tail points are tabulated. See Formulae and Tables.
This is an important result and will be used in Chapter 9 in the work on confidence intervals and Chapter 10 in the work on hypothesis tests.
The percentage points for the F distribution can be found on pages 170-174 of the Tables.
Question
Determine:
(i) $P(F_{9,10} > 3.779)$
(ii) $P(F_{12,14} < 3.8)$
(iii) $P(F_{11,8} < 0.3392)$
(iv) the value of $p$ such that $P(F_{14,6} < p) = 0.01$.
Solution
By referring to the Tables on pages 170 to 174:
(i) 3.779 is greater than 1, so we use the table of upper percentage points to see that:

$$P(F_{9,10} > 3.779) = 0.025$$

ie 3.779 is the 2½% point of the $F_{9,10}$ distribution (page 173).
(ii) Since 3.8 is greater than 1, it is again an upper value and so we use the Tables directly. We turn the probability around as follows:

$$P(F_{12,14} < 3.8) = 1 - P(F_{12,14} > 3.8) = 1 - 0.01 = 0.99$$

(iii) Since this is a lower percentage point, we need to use the $\dfrac{1}{F_{m,n}}$ result:

$$P(F_{11,8} < 0.3392) = P\left(\frac{1}{F_{11,8}} > \frac{1}{0.3392}\right) = P(F_{8,11} > 2.948) = 0.05$$

(iv) Since only 1% of the distribution is below $p$, this implies that $p$ must be a lower percentage point. So we use the $\dfrac{1}{F_{m,n}}$ result again:

$$P(F_{14,6} < p) = P\left(F_{6,14} > \frac{1}{p}\right) = 0.01 \;\Rightarrow\; \frac{1}{p} = 4.456 \;\Rightarrow\; p = 0.2244$$
The mean of the F distribution is close to 1 (for $F_{m,n}$ with $n > 2$ it equals $n/(n-2)$). So values such as 0.3392 and 0.2244 given above are values in the lower tail, whereas 3.779 and 3.8 are upper tail values.
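The reciprocal rule for lower tail probabilities can be verified numerically. The sketch below uses the standard link between the F distribution function and the regularised incomplete beta function, evaluated by Simpson's rule. The helper names `reg_inc_beta` and `f_cdf` are assumptions for this illustration, and the simple integration scheme assumes both degrees of freedom are at least 2.

```python
import math

def reg_inc_beta(x, a, b, steps=2000):
    """Regularised incomplete beta I_x(a, b) by Simpson's rule (assumes a, b >= 1)."""
    h = x / steps

    def f(t):
        return t ** (a - 1) * (1 - t) ** (b - 1)

    total = f(0.0) + f(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    beta = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return total * h / 3 / beta

def f_cdf(x, m, n):
    """P(F_{m,n} <= x) = I_{mx/(mx+n)}(m/2, n/2)."""
    return reg_inc_beta(m * x / (m * x + n), m / 2, n / 2)

lower = f_cdf(0.3392, 11, 8)            # P(F_{11,8} < 0.3392)
upper = 1 - f_cdf(1 / 0.3392, 8, 11)    # P(F_{8,11} > 1/0.3392)
print(round(lower, 3), round(upper, 3))
```

Both calls should return the same probability, close to the tabulated 5%, which is the content of the reciprocal result.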
We now apply the F result to problems involving sample variances.
Question
For random samples of size 10 and 25 from two normal populations with equal variances, use the F distribution to determine the values of $\alpha$ and $\beta$ such that $P\left(\dfrac{S_1^2}{S_2^2} > \alpha\right) = 0.05$ and $P\left(\dfrac{S_1^2}{S_2^2} < \beta\right) = 0.05$, where subscript 1 represents the sample of size 10 and subscript 2 represents the sample of size 25.
Solution
Since the population variances are equal, $\dfrac{S_1^2}{S_2^2} \sim F_{9,24}$ and $\dfrac{S_2^2}{S_1^2} \sim F_{24,9}$.
From the table of 5% points for the F distribution on page 172 of the Tables, we find that $P(F_{9,24} > 2.300) = 0.05$, and therefore $\alpha = 2.300$.
Now we know that $\dfrac{S_1^2}{S_2^2} < \beta$ is equivalent to $\dfrac{S_2^2}{S_1^2} > \dfrac{1}{\beta}$ and $P(F_{24,9} > 2.900) = 0.05$, giving $\beta = \dfrac{1}{2.900} = 0.345$.
We can use the F distribution to obtain probabilities relating to the ratio of two different sample variances.
Question
Calculate the probability that the sample variance of a sample of 10 values from a normal distribution will be more than 6 times the sample variance of a sample of 5 values from an independent normal distribution with the same variance.
Solution
If $X$ denotes the sample with 10 values and $Y$ denotes the sample with 5 values, we know that as these are from independent normal distributions, $\dfrac{S_X^2/\sigma_X^2}{S_Y^2/\sigma_Y^2} \sim F_{9,4}$.
Since the population variances are equal, this means that $\dfrac{S_X^2}{S_Y^2} \sim F_{9,4}$.

So $P(S_X^2 > 6 S_Y^2) = P\left(\dfrac{S_X^2}{S_Y^2} > 6\right) = P(F_{9,4} > 6)$.

From the Tables page 172 we see that the upper 5% point of $F_{9,4}$ is 5.999. So the required probability is just less than 5%.
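A quick Monte Carlo check of this answer is sketched below. Because the two populations have the same variance, the common value does not matter, so unit variance is used. The function name, the seed and the number of repetitions are arbitrary choices for the illustration.

```python
import random
import statistics

def prob_variance_ratio_exceeds(n_x, n_y, ratio, reps=20000, seed=1):
    """Estimate P(S_X^2 > ratio * S_Y^2) for independent normal samples
    of sizes n_x and n_y with equal population variances."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        s2_x = statistics.variance([rng.gauss(0.0, 1.0) for _ in range(n_x)])
        s2_y = statistics.variance([rng.gauss(0.0, 1.0) for _ in range(n_y)])
        if s2_x > ratio * s2_y:
            hits += 1
    return hits / reps

estimate = prob_variance_ratio_exceeds(10, 5, 6.0)
print("estimated P(S_X^2 > 6 S_Y^2):", estimate)
```

The estimate is subject to simulation error of roughly ±0.003 at this number of repetitions, so it should be read as consistent with "just less than 5%" rather than as an exact figure.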
Chapter 7 Summary
The sample mean and sample variance are given by:

$$\bar{X} = \frac{1}{n}\sum X_i \quad \text{and} \quad S^2 = \frac{1}{n-1}\sum (X_i - \bar{X})^2 = \frac{1}{n-1}\left(\sum X_i^2 - n\bar{X}^2\right)$$

We can find their sampling means and variances.
For any distribution:

$$E(\bar{X}) = \mu \qquad \text{var}(\bar{X}) = \frac{\sigma^2}{n} \qquad E(S^2) = \sigma^2$$

For a normal distribution only:

$$\text{var}(S^2) = \frac{2\sigma^4}{n-1}$$

The standard deviation of the sample mean is known as the standard error of the sample mean.
To find probabilities involving $\bar{X}$ or $S^2$, we need their distributions. For a large sample from any distribution:

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \text{ approximately}$$

$\bar{X}$ has this exact distribution (rather than it being approximate) for any size of sample from a normal distribution.
When sampling from a normal population, the sample mean and variance are independent.
For a random sample from a normal population, if $\sigma^2$ is known:

$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$$

If $\sigma^2$ is unknown:

$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$$

For a random sample from a normal population:

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$

If we take random samples from two independent normal populations:

$$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{n_1-1,\,n_2-1}$$

The t and F distributions are defined as:

$$t_k = \frac{N(0,1)}{\sqrt{\chi^2_k / k}}, \qquad F_{m,n} = \frac{\chi^2_m / m}{\chi^2_n / n}$$

To determine probabilities involving the lower tail of the F distribution, we use the result:

$$P(F_{m,n} < k) = P(F_{n,m} > 1/k)$$
Chapter 7 Practice Questions
7.1
A random sample of $n$ observations is taken from a normal distribution with mean $\mu$ and variance $\sigma^2$. The sample variance is an observation of a random variable $S^2$.
Derive expressions for $E(S^2)$ and $\text{var}(S^2)$ using the relationship between the gamma and chi-squared distributions given on page 12 of the Tables.
7.2
(i) Determine:
(a) $P(F_{3,9} < 3.863)$
(b) $P(F_{10,10} < 0.269)$
(ii) Determine the value of $p$ such that:
(a) $P(F_{24,30} > p) = 0.10$
(b) $P(F_{18,9} > p) = 99\%$
7.3
Exam style
A random sample of 10 observations is drawn from the normal distribution with mean $\mu$ and standard deviation 15. Independently, a random sample of 25 observations is drawn from the normal distribution with mean $\mu$ and standard deviation 12. Let $\bar{X}$ and $\bar{Y}$ denote the respective sample means.
Evaluate $P(\bar{X} - \bar{Y} > 3)$. [3]
7.4
Calculate:
(a) $P(F_{6,8} > 6.371)$
(b) $P(F_{7,12} > 0.3748)$.
7.5
(i) (a) State the definition of the $t_k$ distribution.
(b) Show that:

$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$$

using $\bar{X} \sim N(\mu, \sigma^2/n)$ and $\dfrac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$.
(ii) (a) State the definition of the $F_{m,n}$ distribution.
(b) Show that for suitably defined samples:

$$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{m-1,\,n-1}$$

using the fact that $\dfrac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$.
7.6
Evaluate $c$ such that:
(a) $P(F_{2,15} < c) = 97.5\%$
(b) $P(F_{8,5} < c) = 5\%$.
7.7
Show that:

$$P(F_{m,n} > a) = b \;\Rightarrow\; P\left(F_{n,m} < \frac{1}{a}\right) = b$$

7.8
A random sample $X_1, \ldots, X_{10}$ is drawn from the $N(5,4)$ distribution.
Evaluate:
(i) $P\left(\sum X_i > 60\right)$
(ii) $P\left(\sum (X_i - \bar{X})^2 > 34\right)$
(iii) $P\left(\bar{X} > 4 \text{ and } \sqrt{\tfrac{1}{9}\sum (X_i - \bar{X})^2} < 2.6\right)$.
7.9
Exam style
Let $(X_1, X_2, \ldots, X_9)$ be a random sample from a $N(0, \sigma^2)$ distribution. Let $\bar{X}$ and $S^2$ denote the sample mean and variance respectively.
Calculate the approximate value of $P(\bar{X} > S)$ by referring to an appropriate statistical table. [3]
7.10
Exam style
House prices in region X are normally distributed with a mean of 100,000 and a standard deviation of 10,000. House prices in region Y are normally distributed with a mean of 90,000 and a standard deviation of 5,000. A random sample of 10 houses is taken from region X and a random sample of 5 houses from region Y.
Calculate the probability that:
(i) the region X sample mean is greater than the region Y sample mean [3]
(ii) the difference between the sample means is less than 5,000 [3]
(iii) the region X sample variance is less than the region Y sample variance [3]
(iv) the region X sample standard deviation is more than four times greater than the region Y sample standard deviation. [2]
[Total 11]
7.11
Exam style
The time taken to process simple home insurance claims has a mean of 20 mins and a standard deviation of 5 mins.
Calculate, stating any assumptions, the probability that:
(i) the sample mean of the times to process 5 claims is less than 15 mins [2]
(ii) the sample mean of the times to process 50 claims is greater than 22 mins [2]
(iii) the sample variance of the time to process 5 claims is greater than 6.65 mins$^2$ [2]
(iv) the sample standard deviation of the time to process 30 claims is less than 7 mins [2]
(v) both (i) and (iii) occur for the same sample of 5 claims. [1]
[Total 9]
7.12
Exam style
A statistician suggests that, since a t variable with $k$ degrees of freedom is symmetrical with mean 0 and variance $\dfrac{k}{k-2}$ for $k > 2$, one can approximate the distribution using the normal variable $N\left(0, \dfrac{k}{k-2}\right)$.
(i) Use this to obtain an approximation for the upper 5% percentage points for a t variable with:
(a) 4 degrees of freedom, and
(b) 40 degrees of freedom. [2]
(ii) Compare your answers with the exact values from tables and comment briefly on the result. [2]
[Total 4]
Chapter 7 Solutions
7.1
The sampling distribution for $S^2$ is:

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$

The $\chi^2_k$ distribution is the same as the gamma distribution with $\alpha = \dfrac{k}{2}$ and $\lambda = \dfrac{1}{2}$. Therefore:

$$E(\chi^2_k) = \frac{\alpha}{\lambda} = \frac{k/2}{1/2} = k \quad \text{and} \quad \text{var}(\chi^2_k) = \frac{\alpha}{\lambda^2} = \frac{k/2}{(1/2)^2} = 2k$$

So:

$$E\left[\frac{(n-1)S^2}{\sigma^2}\right] = E(\chi^2_{n-1}) = n-1 \;\Rightarrow\; \frac{n-1}{\sigma^2}E(S^2) = n-1 \;\Rightarrow\; E(S^2) = \sigma^2$$

and:

$$\text{var}\left[\frac{(n-1)S^2}{\sigma^2}\right] = \text{var}(\chi^2_{n-1}) = 2(n-1) \;\Rightarrow\; \frac{(n-1)^2}{\sigma^4}\text{var}(S^2) = 2(n-1) \;\Rightarrow\; \text{var}(S^2) = \frac{2(n-1)\sigma^4}{(n-1)^2} = \frac{2\sigma^4}{n-1}$$

7.2
(i)(a) 3.863 is greater than 1. So, using the upper percentage points from the Tables:

$$P(F_{3,9} < 3.863) = 1 - P(F_{3,9} > 3.863) = 1 - 0.05 = 0.95$$

(i)(b) Since this is a lower percentage point, we need to use the $\dfrac{1}{F_{m,n}}$ result:

$$P(F_{10,10} < 0.269) = P\left(F_{10,10} > \frac{1}{0.269}\right) = P(F_{10,10} > 3.717) = 0.025$$

(ii)(a) Since only 10% of the distribution is above $p$, $p$ must be in the upper tail. So reading off from the 10% tables gives:

$$P(F_{24,30} > p) = 0.10 \;\Rightarrow\; p = 1.638$$

(ii)(b) Since 99% of the distribution is greater than $p$, $p$ must be in the lower tail. So we need to use the $\dfrac{1}{F_{m,n}}$ result:

$$P(F_{18,9} > p) = P\left(F_{9,18} < \frac{1}{p}\right) = 0.99 \;\Rightarrow\; P\left(F_{9,18} > \frac{1}{p}\right) = 0.01$$
Reading off the 1% tables:

$$\frac{1}{p} = 3.597 \;\Rightarrow\; p = 0.278$$

7.3
We require $P(\bar{X} - \bar{Y} > 3)$, therefore we need the distribution of $\bar{X} - \bar{Y}$. The distributions of the sample means are:

$$\bar{X} \sim N\left(\mu, \frac{15^2}{10}\right) \qquad \bar{Y} \sim N\left(\mu, \frac{12^2}{25}\right)$$ [1]

The mean of the difference is the difference of the means, and the variance of the difference is the sum of the variances:

$$\bar{X} - \bar{Y} \sim N\left(0, \frac{15^2}{10} + \frac{12^2}{25}\right) = N(0, 28.26)$$ [1]

$$P(\bar{X} - \bar{Y} > 3) = P(Z > 0.564) = 1 - P(Z < 0.564) = 1 - 0.71362 = 0.28638$$ [1]

7.4
(a) Probability
6.371 is greater than 1. Using the upper percentage points from the Tables:

$$P(F_{6,8} > 6.371) = 0.01$$

(b) Probability
Since this is a lower percentage point, we need to use the $\dfrac{1}{F_{m,n}}$ result:

$$P(F_{7,12} > 0.3748) = P\left(F_{12,7} < \frac{1}{0.3748}\right) = P(F_{12,7} < 2.668) = 1 - P(F_{12,7} > 2.668) = 1 - 0.1 = 0.9$$

7.5
(i)(a) Definition of t distribution
If $Z \sim N(0,1)$ and $W \sim \chi^2_k$, and $Z$ and $W$ are independent, then:

$$\frac{Z}{\sqrt{W/k}} \sim t_k$$

(i)(b) Show t result
Standardising, we get:

$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$$
We also have:

$$\frac{S^2}{\sigma^2} \sim \frac{\chi^2_{n-1}}{n-1}$$

Substituting these into the definition of the $t_{n-1}$ distribution:

$$\frac{\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}}{\sqrt{\dfrac{S^2}{\sigma^2}}} = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$$

(ii)(a) Definition of F distribution
If $U \sim \chi^2_m$ and $V \sim \chi^2_n$, and $U$ and $V$ are independent, then:

$$\frac{U/m}{V/n} \sim F_{m,n}$$

(ii)(b) Show result is F distribution
Assuming two samples of size $m$ and $n$, with sample variances $S_1^2$ and $S_2^2$ from normal distributions with variances $\sigma_1^2$ and $\sigma_2^2$, respectively, we have:

$$\frac{(m-1)S_1^2}{\sigma_1^2} \sim \chi^2_{m-1} \;\Rightarrow\; \frac{S_1^2}{\sigma_1^2} \sim \frac{\chi^2_{m-1}}{m-1}$$

$$\frac{(n-1)S_2^2}{\sigma_2^2} \sim \chi^2_{n-1} \;\Rightarrow\; \frac{S_2^2}{\sigma_2^2} \sim \frac{\chi^2_{n-1}}{n-1}$$

Hence, by the definition of the $F_{m-1,\,n-1}$ distribution:

$$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{m-1,\,n-1}$$

7.6
(a) Since 97.5% of the distribution is below $c$, $c$ must be on the upper tail. So reading from the 2½% tables gives:

$$P(F_{2,15} < c) = 0.975 \;\Rightarrow\; P(F_{2,15} > c) = 0.025 \;\Rightarrow\; c = 4.765$$

(b) Since only 5% of the distribution is below $c$, $c$ must be in the lower tail. So we need to use the $\dfrac{1}{F_{m,n}}$ result:

$$P(F_{8,5} < c) = P\left(F_{5,8} > \frac{1}{c}\right) = 0.05 \;\Rightarrow\; \frac{1}{c} = 3.688 \;\Rightarrow\; c = 0.2711$$
7.7
Taking reciprocals, we obtain:

$$P(F_{m,n} > a) = b \;\Rightarrow\; P\left(\frac{1}{F_{m,n}} < \frac{1}{a}\right) = b$$

From the definition of the F distribution:

$$F_{m,n} = \frac{\chi^2_m / m}{\chi^2_n / n} \;\Rightarrow\; \frac{1}{F_{m,n}} = \frac{\chi^2_n / n}{\chi^2_m / m} = F_{n,m}$$

Hence:

$$P(F_{m,n} > a) = b \;\Rightarrow\; P\left(\frac{1}{F_{m,n}} < \frac{1}{a}\right) = b \;\Rightarrow\; P\left(F_{n,m} < \frac{1}{a}\right) = b$$

7.8
(i) Probability of sum
Using the result that $\sum X_i \sim N(n\mu, n\sigma^2) = N(50, 40)$:

$$P\left(\sum X_i > 60\right) = P\left(Z > \frac{60 - 50}{\sqrt{40}}\right) = P(Z > 1.581) = 1 - \Phi(1.581) = 0.0569$$

(ii) Probability of central moment
Since $S^2 = \dfrac{1}{n-1}\sum (X_i - \bar{X})^2$ and $\dfrac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$, we obtain:

$$P\left[\sum (X_i - \bar{X})^2 > 34\right] = P[9S^2 > 34] = P\left[\frac{9S^2}{4} > \frac{34}{4}\right] = P[\chi^2_9 > 8.5] = 1 - 0.5154 = 0.485$$

(iii) Joint probability
Since $S^2 = \dfrac{1}{9}\sum (X_i - \bar{X})^2$ and the fact that $\bar{X}$ and $S^2$ are independent when we are sampling from a normal distribution:

$$P[\bar{X} > 4 \text{ and } S < 2.6] = P[\bar{X} > 4] \times P[S < 2.6]$$

Now:

$$\text{var}(\bar{X}) = \frac{\sigma^2}{n} = \frac{4}{10} = 0.4$$

So $\bar{X} \sim N(5, 0.4)$, and:

$$P[\bar{X} > 4] = P\left[Z > \frac{4 - 5}{\sqrt{0.4}}\right] = P(Z > -1.581) = \Phi(1.581) = 0.9431$$
Also using $\dfrac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$:

$$P[S < 2.6] = P\left[\frac{9S^2}{4} < \frac{9 \times 2.6^2}{4}\right] = P[\chi^2_9 < 15.21] = 0.9145$$

Hence $P[\bar{X} > 4 \text{ and } S < 2.6] = 0.9431 \times 0.9145 = 0.862$.
7.9
Using $\dfrac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$:

$$\frac{\bar{X}}{S/3} = \frac{3\bar{X}}{S} \sim t_8$$ [1]

Considering the probability in the question:

$$P(\bar{X} > S) = P\left(\frac{\bar{X}}{S} > 1\right) = P\left(\frac{3\bar{X}}{S} > 3\right) = P(t_8 > 3)$$ [1]

From the Tables, we can see that this probability lies between 1% and 0.5%. By interpolation we find that the probability is approximately 0.89%. [1]
7.10
(i) Probability that the mean of X is greater than the mean of Y
We require $P(\bar{X} > \bar{Y}) = P(\bar{X} - \bar{Y} > 0)$, therefore we need the distribution of $\bar{X} - \bar{Y}$. Working in 1,000s, the distributions of the sample means are:

$$\bar{X} \sim N(100, 10) \quad \text{and} \quad \bar{Y} \sim N(90, 5)$$

So:

$$\bar{X} - \bar{Y} \sim N(100 - 90, 10 + 5) = N(10, 15)$$ [1]

and:

$$P(\bar{X} - \bar{Y} > 0) = P\left(Z > \frac{0 - 10}{\sqrt{15}}\right) = P(Z > -2.582) = \Phi(2.582) = 0.995$$ [2]
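The calculation in part (i) can be reproduced with the standard library's `NormalDist` class. This is a sketch only. It works in 1,000s as the solution does, and the variable names are chosen for the illustration.

```python
import math
from statistics import NormalDist

# Working in 1,000s, as in the solution
var_xbar = 10**2 / 10   # variance of the region X sample mean (n = 10)
var_ybar = 5**2 / 5     # variance of the region Y sample mean (n = 5)

# X-bar - Y-bar is normal with mean 100 - 90 and variance equal to the sum
diff = NormalDist(100 - 90, math.sqrt(var_xbar + var_ybar))

p_x_greater = 1 - diff.cdf(0)
print("P(X-bar > Y-bar) =", round(p_x_greater, 3))
```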
(ii) Probability that the difference between means is less than 5,000
Using the distribution of $\bar{X} - \bar{Y}$ from part (i):

$$P(|\bar{X} - \bar{Y}| < 5) = P(-5 < \bar{X} - \bar{Y} < 5) = P(\bar{X} - \bar{Y} < 5) - P(\bar{X} - \bar{Y} < -5)$$ [1]

$$= P\left(Z < \frac{5 - 10}{\sqrt{15}}\right) - P\left(Z < \frac{-5 - 10}{\sqrt{15}}\right) = P(Z < -1.291) - P(Z < -3.873)$$ [1]

$$= [1 - \Phi(1.291)] - [1 - \Phi(3.873)] = \Phi(3.873) - \Phi(1.291) = 0.0983$$ [1]

(iii) Probability that the sample variance of X is less than the sample variance of Y
We require $P(S_X^2 < S_Y^2) = P\left(\dfrac{S_X^2}{S_Y^2} < 1\right)$. Using the definition of the F distribution, we get:

$$\frac{S_X^2/\sigma_X^2}{S_Y^2/\sigma_Y^2} = \frac{S_X^2/10^2}{S_Y^2/5^2} = \frac{S_X^2/S_Y^2}{4} \sim F_{9,4}$$ [1]

Hence:

$$P\left(\frac{S_X^2}{S_Y^2} < 1\right) = P\left(\frac{S_X^2/S_Y^2}{4} < 0.25\right) = P(F_{9,4} < 0.25)$$ [1]

Since this is in the lower tail we need to use the $\dfrac{1}{F_{m,n}}$ result:

$$P(F_{9,4} < 0.25) = P\left(F_{4,9} > \frac{1}{0.25}\right) = P(F_{4,9} > 4)$$

This value is between 2½% and 5%, and, interpolating, we find that the probability is approximately 4.2%. [1]
(iv) Probability that the sample s.d. of X is greater than four times the sample s.d. of Y
We require:

$$P(S_X > 4S_Y) = P\left(\frac{S_X}{S_Y} > 4\right) = P\left(\frac{S_X^2}{S_Y^2} > 16\right)$$ [1]

Using the result from (iii) we get:

$$P\left(\frac{S_X^2/S_Y^2}{4} > 4\right) = P(F_{9,4} > 4)$$ [1]
From the Tables:

$$P(F_{9,4} > 3.936) = 10\%$$

So the required probability is approximately 10%.
7.11
(i) Probability of sample mean ($n = 5$)
$\bar{X} \sim N(\mu, \sigma^2/n)$ holds exactly for samples from the normal distribution and approximately for any distribution if $n$ is large. [1]
Since we only have a sample of size 5, we require that we are sampling from a normal distribution.

$$P(\bar{X} < 15) = P\left(Z < \frac{15 - 20}{\sqrt{5}}\right) \quad \text{since } \bar{X} \sim N(20, 5)$$

$$= P(Z < -2.236) = 1 - \Phi(2.236) = 0.0127$$ [2]

(ii) Probability of sample mean ($n = 50$)
As $n$ is large, we require no assumptions other than it being a random sample, although the answer will be approximate if the sample is not from a normal distribution.

$$P(\bar{X} > 22) = P\left(Z > \frac{22 - 20}{\sqrt{0.5}}\right) \quad \text{since } \bar{X} \sim N(20, 0.5)$$

$$= P(Z > 2.828) = 1 - \Phi(2.828) = 0.00234$$ [2]

(iii) Probability of sample variance
$\dfrac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$ only holds for samples from a normal distribution. Therefore we require that we are sampling from a normal distribution.

$$P(S^2 > 6.65) = P\left(\frac{4S^2}{\sigma^2} > \frac{4 \times 6.65}{5^2}\right) = P(\chi^2_4 > 1.064) = 0.9 \quad \text{(from page 168 of the Tables)}$$ [2]
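Parts (i) to (iii) can be checked numerically under the same normality assumption. The sketch below uses `NormalDist` for the sample means and, for part (iii), the closed-form tail $P(\chi^2_4 > x) = e^{-x/2}(1 + x/2)$, which holds for 4 degrees of freedom and is not quoted in the notes. The helper name `sample_mean_dist` is made up for the example.

```python
import math
from statistics import NormalDist

mu, sigma = 20.0, 5.0

def sample_mean_dist(n):
    """Distribution of the mean of n claims: N(mu, sigma^2 / n)."""
    return NormalDist(mu, sigma / math.sqrt(n))

p_i = sample_mean_dist(5).cdf(15)          # (i)  P(X-bar < 15), n = 5
p_ii = 1 - sample_mean_dist(50).cdf(22)    # (ii) P(X-bar > 22), n = 50

# (iii) 4 S^2 / sigma^2 ~ chi-square(4); closed-form upper tail for 4 d.f.
x = 4 * 6.65 / sigma**2
p_iii = math.exp(-x / 2) * (1 + x / 2)

print(round(p_i, 4), round(p_ii, 5), round(p_iii, 3))
```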
(iv) Probability of sample standard deviation
Again we require that we are sampling from a normal distribution:

$$P(S < 7) = P\left(\frac{29S^2}{\sigma^2} < \frac{29 \times 7^2}{5^2}\right) = P(\chi^2_{29} < 56.84)$$ [1]

Using the figures from page 169 of the Tables, and interpolating, we find that $P(S < 7) \approx 0.998$. [1]
(v) Probability of (i) and (iii) both occurring
$\bar{X}$ and $S^2$ are independent if we are sampling from a normal distribution. So making this assumption, we get:

$$P(\bar{X} < 15 \,\cap\, S^2 > 6.65) = P(\bar{X} < 15) \times P(S^2 > 6.65) = 0.0127 \times 0.9 = 0.0114$$ [1]

7.12
(i)(a) Normal approximation for $t_4$
We have:

$$t_4 \sim N(0, 2) \text{ (approximately)}$$

We require the value $a$ such that $P(t_4 > a) = 0.05$. Using our approximation, we get:

$$P\left(Z > \frac{a - 0}{\sqrt{2}}\right) = 0.05 \;\Rightarrow\; \frac{a}{\sqrt{2}} = 1.6449 \;\Rightarrow\; a = 2.326$$ [1]

(i)(b) Normal approximation for $t_{40}$
We have:

$$t_{40} \sim N\left(0, \frac{40}{38}\right) \text{ (approximately)}$$

We require the value $b$ such that $P(t_{40} > b) = 0.05$. Using the approximation, we get:

$$P\left(Z > \frac{b - 0}{\sqrt{40/38}}\right) = 0.05 \;\Rightarrow\; \frac{b}{\sqrt{40/38}} = 1.6449 \;\Rightarrow\; b = 1.688$$ [1]

(ii) Compare approximate results with the exact values
From the t tables, we see that:

$$P(t_4 > 2.132) = 0.05 \quad \text{ie } a = 2.132$$

$$P(t_{40} > 1.684) = 0.05 \quad \text{ie } b = 1.684$$ [1]
We can see that the approximation of 2.326 for the upper 5% point of the $t_4$ distribution is poor, whereas the approximation of 1.688 for the upper 5% point of the $t_{40}$ distribution is quite good.
This suggests that the t distribution tends towards the standard normal distribution as the number of degrees of freedom increases. [1]
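The comparison in this solution is easy to reproduce. The sketch below computes the statistician's normal approximation to the upper 5% point and sets it beside the exact values quoted from the t tables. The function name `approx_t_upper_5pct` is an assumption for the example.

```python
import math
from statistics import NormalDist

z_95 = NormalDist().inv_cdf(0.95)   # upper 5% point of N(0,1), about 1.6449

def approx_t_upper_5pct(k):
    """Upper 5% point of N(0, k/(k-2)), used as an approximation to t_k (k > 2)."""
    return z_95 * math.sqrt(k / (k - 2))

# Exact values are those quoted from the t tables in the solution
for k, exact in [(4, 2.132), (40, 1.684)]:
    approx = approx_t_upper_5pct(k)
    print(k, round(approx, 3), exact, "error:", round(approx - exact, 3))
```

The error is about 0.19 for 4 degrees of freedom but only about 0.004 for 40, which matches the comment that the approximation improves as the degrees of freedom increase.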
CS1-08: Point estimation
Point estimation
Syllabus objectives
3.1 Estimation and estimators
3.1.1 Describe and apply the method of moments for constructing estimators of population parameters.
3.1.2 Describe and apply the method of maximum likelihood for constructing estimators of population parameters.
3.1.3 Define the following terms: efficiency, bias, consistency and mean square error.
3.1.4 Define and apply the property of unbiasedness of an estimator.
3.1.5 Define the mean square error of an estimator and use it to compare estimators.
3.1.6 Describe and apply the asymptotic distribution of maximum likelihood estimators.
0 Introduction
In many situations we will be interested in the value of an unknown population parameter. For example, we might be interested in the number of claims from a certain portfolio that we receive in a month. Suppose we have the following data relating to 100 one-month periods:

Claims                           0    1    2    3    4    5    6
Frequency (number of months)     9   22   26   21   13    6    3

It may be that we know that the Poisson distribution is a good model for the number of claims received, but the natural question is 'what is the value of the Poisson parameter $\lambda$?'.
This chapter gives two methods that can be used to estimate the value of the unknown parameter using the information provided by a sample.
The first method is called the method of moments and involves equating the sample moments to the population moments.
The second method is called the method of maximum likelihood and uses differentiation to find the parameter value that would maximise the probability of us getting the particular sample that we observed.
These are not the only methods of obtaining estimates (for example in Subject CS2 we will meet the method of percentiles). The two methods we meet here do not always give the same value for the estimate (although they often do).
Later in this chapter we will look at how to decide whether the formulae that we obtain for the parameter estimates give good estimates based upon their average value and their spread.
The expression 'point estimation' refers to the problem of finding a single number to estimate the parameter value. This contrasts with 'confidence interval estimation' (covered in the next chapter) where we wish to find a range of possible values.
This is a key topic in most statistics courses.
1 The method of moments
The basic principle is to equate population moments (ie the means, variances, etc of the theoretical model) to corresponding sample moments (ie the means, variances, etc of the sample data observed) and solve for the parameter(s).
1.1 The one-parameter case
This is the simplest case: to equate population mean, $E(X)$, to sample mean, $\bar{x}$, and solve for the parameter, ie:

$$E[X] = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Question
A random sample from an $Exp(\lambda)$ distribution is as follows:
14.84, 0.19, 11.75, 1.18, 2.44, 0.53
Calculate the method of moments estimate for $\lambda$.
Solution
The population mean for an $Exp(\lambda)$ distribution from page 11 of the Tables is $E(X) = \dfrac{1}{\lambda}$.
The sample mean is:

$$\bar{x} = \frac{14.84 + 0.19 + 11.75 + 1.18 + 2.44 + 0.53}{6} = 5.155$$

Equating these gives us the method of moments estimate:

$$\frac{1}{\hat{\lambda}} = 5.155 \;\Rightarrow\; \hat{\lambda} = 0.1940$$

This is an estimate of $\lambda$ rather than the true value, and we distinguish this by putting a 'hat' or similar over the parameter.
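The same calculation in code, as a minimal sketch using the sample values from the question (the variable names are chosen for the illustration):

```python
import statistics

sample = [14.84, 0.19, 11.75, 1.18, 2.44, 0.53]

x_bar = statistics.fmean(sample)   # sample mean
lambda_hat = 1 / x_bar             # solves E(X) = 1/lambda = x_bar

print("sample mean:", round(x_bar, 3))
print("lambda-hat: ", round(lambda_hat, 4))
```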
We can apply this method to a number of different single parameter distributions. For example, the method works well with a random sample from a Poisson distribution.
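As an illustration of the Poisson case, the sketch below applies the method to the claims data from the introduction. Since the mean of a Poisson distribution is $\lambda$, the method of moments estimate is just the sample mean. The notes do not carry out this calculation themselves, so the result should be read as an example rather than a quoted answer.

```python
# Frequency table from the introduction: number of claims in a month,
# and the number of months (out of 100) in which that count was observed
claims = [0, 1, 2, 3, 4, 5, 6]
months = [9, 22, 26, 21, 13, 6, 3]

total_months = sum(months)
total_claims = sum(c * m for c, m in zip(claims, months))

# For a Poisson distribution E(X) = lambda, so lambda-hat is the sample mean
lambda_hat = total_claims / total_months
print("lambda-hat:", lambda_hat)
```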
Note: For some populations the mean does not involve the parameter, such as the uniform on $(-\theta, \theta)$ or the normal $N(0, \sigma^2)$, in which case a higher-order moment must be used. However such cases are rarely of practical importance.
For example the $U(-\theta, \theta)$ distribution has $E(X) = \tfrac{1}{2}(-\theta + \theta) = 0$. Setting this equal to the sample mean is not going to be helpful. So what we should do is to use, say, the variance, $\text{var}(X) = \tfrac{1}{12}[\theta - (-\theta)]^2 = \tfrac{1}{3}\theta^2$, as this involves the parameter. We could then equate this to the sample variance.
Question
The random sample:
2.6, 1.9, 3.8, 4.1, 0.2, 0.7, 1.1, 6.9
is taken from a $U(-\theta, \theta)$ distribution.
By equating the sample and population variances, calculate an estimate for $\theta$.
Solution
For these sample values, $\sum x_i = 11.3$ and $\sum x_i^2 = 90.97$. So the sample variance is:

$$s^2 = \frac{1}{7}\left[90.97 - 8\left(\frac{11.3}{8}\right)^2\right] = 10.7155$$

So using the formula for the population variance given above, we have:

$$\frac{1}{3}\hat{\theta}^2 = 10.7155$$

Solving this, we find that $\hat{\theta} = 5.67$.
The estimator is written in upper case as it is a random variable and will have a sampling distribution. The estimate is written in lower case as it comes from an actual sample of numerical values.
Be careful to distinguish between the words estimate and estimator. Estimate refers to a particular numerical value that results from using the formula, eg $\hat{\mu} = \bar{x}$ (the lower case denotes actual sample values being used). On the other hand, estimator refers to the random variable representing any sample, eg $\hat{\mu} = \bar{X}$.
Thetwo-parameter case
Withtwo unknown
parameters,
we will require two equations.
This involves
equating the first and second-order
moments
sample, and solving the resulting
pair of equations.
of the population
and the
Moments about the origin can be used but the solution is the same (and often more easily
obtained) using
mean itself.
IFE: 2022 Examinations
moments about the
mean
apart from the first-order
moment being the
The Actuarial
Education
Compan
CS1-08: Point estimation
The first-order equation is the same as in the one-parameter case:

$$E[X] = \frac{1}{n}\sum_{i=1}^{n} x_i$$

The second-order equation is:

$$E[X^2] = \frac{1}{n}\sum_{i=1}^{n} x_i^2$$

or equivalently:

$$E\left[(X - E[X])^2\right] = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

ie:

$$\text{var}(X) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2$$

We are not equating sample and population variances here; we are using a denominator of $n$ on the right hand side of the final equation, whereas the sample variance uses a denominator of $n-1$.
Question
Show that these two second-order equations give the same answers for the parameter estimates.
Solution
Starting with the last Core Reading equation above, our two equations are:

$$E(X) = \bar{x} \quad \text{and} \quad \text{var}(X) = \frac{1}{n}\sum (x_i - \bar{x})^2$$

Expanding the brackets in the second equation gives:

$$\text{var}(X) = \frac{1}{n}\sum (x_i - \bar{x})^2 = \frac{1}{n}\left\{\sum x_i^2 - n\bar{x}^2\right\} = \frac{1}{n}\sum x_i^2 - \bar{x}^2$$

Since our first equation is $E(X) = \bar{x}$, we have:

$$\text{var}(X) = \frac{1}{n}\sum x_i^2 - [E(X)]^2 \quad \text{ie} \quad \text{var}(X) + [E(X)]^2 = \frac{1}{n}\sum x_i^2$$

Since $E(X^2) = \text{var}(X) + [E(X)]^2$, we now have:

$$E(X^2) = \frac{1}{n}\sum x_i^2$$

This is the other second-order equation. So the two second-order equations are equivalent.
We can now find method of moments estimators in the two-parameter case.
Question

A random sample from a $Bin(n, p)$ distribution yields the following values:

4, 2, 7, 4, 1, 4, 5, 4

Calculate method of moments estimates of $n$ and $p$.

Solution

There are two unknown parameters so we need two equations. The population mean for the $Bin(n, p)$ distribution from page 6 of the Tables is $E(X) = np$. The sample mean is $\bar{x} = \frac{31}{8} = 3.875$. Equating these gives:

\[ E[X] = \bar{x} \;\Rightarrow\; np = 3.875 \qquad (1) \]

There is no formula for $E(X^2)$ on page 6 of the Tables. However, since $\text{var}(X) = E(X^2) - [E(X)]^2$, we have:

\[ E(X^2) = \text{var}(X) + [E(X)]^2 = np(1-p) + (np)^2 \]

We also have $\frac{1}{8}\sum x_i^2 = \frac{143}{8} = 17.875$. Equating this to $E(X^2)$:

\[ np(1-p) + (np)^2 = 17.875 \qquad (2) \]

Substituting equation (1) into (2) gives:

\[ 3.875(1-p) + 3.875^2 = 17.875 \;\Rightarrow\; \hat{p} = 0.2621 \]

Hence, $\hat{n} = 14.78$. Since $n$ is the number of trials, the true value cannot be 14.78. Therefore it is likely to be 14 or 15.

Alternatively, using the second of the second-order equations gives $\text{var}(X) = np(1-p)$ and:

\[ \frac{1}{8}\sum x_i^2 - \bar{x}^2 = \frac{143}{8} - 3.875^2 = 2.859375 \]

Equating these gives:

\[ np(1-p) = 2.859375 \qquad (3) \]

Substituting equation (1) into (3) gives:

\[ 3.875(1-p) = 2.859375 \;\Rightarrow\; \hat{p} = 0.2621 \]

and hence $\hat{n} = 14.78$ as before.
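The arithmetic in this binomial question can be checked with a short script. This is only a sketch: the function name `binomial_mom` is made up for illustration, and it uses the divisor-$n$ second central moment, as in equation (3).

```python
# Sample values from the binomial question above.
data = [4, 2, 7, 4, 1, 4, 5, 4]

def binomial_mom(xs):
    """Method of moments estimates (n_hat, p_hat) for Bin(n, p).

    Uses the sample mean and the second central sample moment
    (divisor = sample size), as in equations (1) and (3).
    """
    size = len(xs)
    mean = sum(xs) / size
    second_central = sum(x * x for x in xs) / size - mean ** 2
    # np = mean and np(1 - p) = second_central, so 1 - p = second_central / mean
    p_hat = 1 - second_central / mean
    n_hat = mean / p_hat
    return n_hat, p_hat

n_hat, p_hat = binomial_mom(data)
print(round(p_hat, 4), round(n_hat, 2))
```

The estimate of the number of trials is not forced to be a whole number here, which matches the remark in the solution that the true value is likely to be 14 or 15.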
We can apply the method of moments to other distributions.
Question

A random sample of size 10 from a Type 2 negative binomial distribution with parameters $k$ and $p$ is as follows:

1, 1, 0, 1, 1, 1, 3, 2, 0, 5

Calculate method of moments estimates of $k$ and $p$.

Solution

There are two unknown parameters so we need two equations. The population mean for the Type 2 $NBin(k, p)$ distribution from page 9 of the Tables is $E(X) = \frac{k(1-p)}{p}$. The sample mean is $\bar{x} = \frac{15}{10} = 1.5$. Equating these gives:

\[ E[X] = \bar{x} \;\Rightarrow\; \frac{k(1-p)}{p} = 1.5 \qquad (1) \]

There is no formula for $E(X^2)$ on page 9 of the Tables. However, since $\text{var}(X) = E(X^2) - [E(X)]^2$, we have:

\[ E(X^2) = \text{var}(X) + [E(X)]^2 = \frac{k(1-p)}{p^2} + \left(\frac{k(1-p)}{p}\right)^2 \]

We also have $\frac{1}{10}\sum x_i^2 = \frac{43}{10} = 4.3$. Equating these gives:

\[ \frac{k(1-p)}{p^2} + \left(\frac{k(1-p)}{p}\right)^2 = 4.3 \qquad (2) \]

Substituting equation (1) into (2) gives:

\[ \frac{1.5}{p} + 1.5^2 = 4.3 \;\Rightarrow\; \hat{p} = 0.7317 \]

Hence, equation (1) gives $\hat{k} = 4.091$.

Alternatively, using the second of the second-order equations gives $\text{var}(X) = \frac{k(1-p)}{p^2}$ and:

\[ \frac{1}{10}\sum x_i^2 - \bar{x}^2 = \frac{43}{10} - 1.5^2 = 2.05 \]

Equating these gives:

\[ \frac{k(1-p)}{p^2} = 2.05 \qquad (3) \]

Substituting equation (1) into (3) gives:

\[ \frac{1.5}{p} = 2.05 \;\Rightarrow\; \hat{p} = 0.7317 \]

and hence $\hat{k} = 4.091$ as before.
Note that $s^2$ with divisor $(n-1)$ is often used in place of the second central sample moment, ie we often use the definition of the sample variance quoted on page 22 of the Tables.

So the second-order equation is now:

\[ \text{var}(X) = s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n-1}\left\{ \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \right\} \]

Using this version will not give the same estimates as those obtained using the previous second-order equations. However, if $n$ is large there is little difference between the estimates obtained.

The advantage of this method is that $S^2$ is an unbiased estimator of the population variance. The importance of this property is covered in more detail later.
Question

A random sample from a $Bin(n, p)$ distribution yields the following values:

4, 2, 7, 4, 1, 4, 5, 4

Calculate method of moments estimates of $n$ and $p$ using $\bar{x}$ and $s^2$ (use a denominator of $n-1$ for the sample variance).

Solution

We have sample mean and variance of:

\[ \bar{x} = 3.875 \quad \text{and} \quad s^2 = \frac{1}{7}\left( 143 - 8 \times 3.875^2 \right) = 3.26786 \]

The population mean and variance are:

\[ E[X] = np \quad \text{and} \quad \text{var}[X] = np(1-p) \]

Equating population and sample statistics gives:

\[ np = 3.875 \quad \text{and} \quad np(1-p) = 3.26786 \]

Solving gives $\hat{p} = 0.1567$ and $\hat{n} = 24.73$ (which are different from the values calculated previously).

We can also apply the method of moments to continuous distributions.
Question

The sample mean and sample variance for a large random sample from a $Gamma(\alpha, \lambda)$ distribution are 10 and 25, respectively.

Use the method of moments to estimate $\alpha$ and $\lambda$.

Solution

Equating the mean and variance, we get:

\[ \frac{\alpha}{\lambda} = 10 \quad \text{and} \quad \frac{\alpha}{\lambda^2} = 25 \]

Dividing the first equation by the second gives:

\[ \hat\lambda = \frac{10}{25} = 0.4 \;\Rightarrow\; \hat\alpha = 10 \times 0.4 = 4 \]

For cases with more than two parameters, moments about zero should be used.

For example, if we have three parameters to estimate, we would use the set of equations:

\[ E[X] = \frac{1}{n}\sum x_i \qquad E[X^2] = \frac{1}{n}\sum x_i^2 \qquad E[X^3] = \frac{1}{n}\sum x_i^3 \]

This approach can be extended in an obvious way for more than three parameters.
2 The method of maximum likelihood

The method of maximum likelihood is widely regarded as the best general method of finding estimators. In particular maximum likelihood estimators have excellent and usually easily determined asymptotic properties and so are especially good in the large-sample situation.

Asymptotic here means when the samples are very large.

2.1 The one-parameter case
The most important stage in applying the method is that of writing down the likelihood:

\[ L(\theta) = \prod_{i=1}^{n} f(x_i; \theta) \]

for a random sample $x_1, x_2, \ldots, x_n$ from a population with density or probability function $f(x; \theta)$.

$\prod$ means product, so $\prod_{i=1}^{n} f(x_i)$ would mean $f(x_1) \times f(x_2) \times f(x_3) \times \cdots \times f(x_n)$. The above statement is saying that the likelihood function is the product of the densities (or the probability functions in the case of discrete distributions) calculated for each sample value.

Remember that $\theta$ is the parameter whose value we are trying to estimate.

The likelihood is the probability of observing the sample in the discrete case, and is proportional to the probability of observing values in the neighbourhood of the sample in the continuous case.

The likelihood function is a function of the unknown parameter $\theta$. So different values of $\theta$ would give different values for the likelihood. The maximum likelihood approach is to find the value of $\theta$ that would have been most likely to give us the particular sample we got. In other words, we need to find the value of $\theta$ that maximises the likelihood function.

For a continuous distribution the probability of getting any exact value is zero, but since:

\[ P(X = x) \approx \int_{x-\varepsilon}^{x+\varepsilon} f(t)\,dt \approx 2\varepsilon f(x) \]

we can see that it is proportional to the PDF.

In most cases taking logs greatly simplifies the determination of the maximum likelihood estimator (MLE) $\hat\theta$.

Differentiating the likelihood or log likelihood with respect to the parameter and setting the derivative to zero gives the maximum likelihood estimator for the parameter.
Example

Given a random sample of size $n$ (ie $x_1, \ldots, x_n$) from the exponential population with density $f(x) = \lambda e^{-\lambda x}$, $x > 0$, the MLE, $\hat\lambda$, is found as follows:

\[ L(\lambda) = \prod_{i=1}^{n} f(x_i) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i} = \lambda^n e^{-\lambda \sum x_i} \]

\[ \Rightarrow\; \log L(\lambda) = n\log\lambda - \lambda\sum x_i \]

\[ \Rightarrow\; \frac{\partial}{\partial\lambda}\log L(\lambda) = \frac{n}{\lambda} - \sum x_i \]

Equating to zero:

\[ \frac{n}{\hat\lambda} - \sum x_i = 0 \;\Rightarrow\; \hat\lambda = \frac{n}{\sum x_i} = \frac{1}{\bar{x}} \]

MLE is $\frac{1}{\bar{X}}$.

Note that $\frac{1}{\bar{x}}$ is a maximum likelihood estimate, ie a numerical value, whereas $\frac{1}{\bar{X}}$ is a maximum likelihood estimator, ie a random variable.

It is necessary to check, either formally or through simple logic, that the turning point is a maximum. Generally the likelihood starts at zero, finishes at or tends to zero, and is non-negative. Therefore if there is one turning point it must be a maximum.

The formal approach would be to check that the second derivative is negative. For the above example we get:

\[ \frac{d^2}{d\lambda^2}\log L(\lambda) = -\frac{n}{\lambda^2} < 0 \;\Rightarrow\; \text{max} \]
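As an informal numerical check of the exponential example, the sketch below evaluates the log-likelihood at the closed-form estimate and at a few nearby values. The sample is hypothetical (made up purely for illustration) and the function name is not from the course material.

```python
import math

def exp_log_likelihood(lam, xs):
    """Log-likelihood n*log(lam) - lam*sum(x) for a sample from Exp(lam)."""
    return len(xs) * math.log(lam) - lam * sum(xs)

# Hypothetical sample, invented for this illustration only.
sample = [0.8, 2.1, 0.3, 1.7, 1.1]

# Closed-form MLE from the derivation above: n / sum(x) = 1 / sample mean.
lam_hat = len(sample) / sum(sample)

# The log-likelihood at lam_hat should exceed its value on either side.
for lam in (0.5 * lam_hat, 0.9 * lam_hat, 1.1 * lam_hat, 2.0 * lam_hat):
    assert exp_log_likelihood(lam, sample) < exp_log_likelihood(lam_hat, sample)

print(lam_hat)
```

A check like this does not replace the second-derivative argument; it only confirms that the turning point behaves like a maximum for one particular data set.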
It is important that we do check, whether formally or through simple logic, and state this (together with our working/reasoning) in the exam to receive all the marks.

At the differentiation stage, any terms that do not contain the parameter ($\lambda$ in this case) will disappear. So when the log-likelihood is written down, any terms that don't contain the parameter can be thought of as a constant.
We can calculate maximum likelihood estimates for parameters from discrete distributions too.
Question

A random sample of size $n$ (ie $x_1, x_2, \ldots, x_n$) is taken from a $Poi(\mu)$ distribution.

(i) Derive the maximum likelihood estimator of $\mu$.

(ii) The sum of a sample of 10 observations from a $Poisson(\mu)$ distribution is 24. Calculate the maximum likelihood estimate, $\hat\mu$.

Solution

(i) The likelihood function is:

\[ L(\mu) = \prod_{i=1}^{n} \frac{e^{-\mu}\mu^{x_i}}{x_i!} = \text{constant} \times e^{-n\mu}\mu^{\sum x_i} \]

Taking logs:

\[ \ln L(\mu) = \text{constant} - n\mu + \sum x_i \ln\mu \]

Differentiating with respect to $\mu$:

\[ \frac{d}{d\mu}\ln L(\mu) = -n + \frac{\sum x_i}{\mu} \]

This derivative is equal to zero when:

\[ \hat\mu = \frac{\sum x_i}{n} = \bar{x} \]

Differentiating again (to check that it is a maximum):

\[ \frac{d^2}{d\mu^2}\ln L(\mu) = -\frac{\sum x_i}{\mu^2} < 0 \;\Rightarrow\; \text{max} \]

So the estimate (the value obtained for a particular sample) is $\hat\mu = \bar{x}$. The estimator (the random variable) is $\hat\mu = \bar{X}$.

(ii) We have $n = 10$ and $\sum x_i = 24$. Hence the estimate is:

\[ \hat\mu = \bar{x} = \frac{24}{10} = 2.4 \]
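The result in part (ii) can be cross-checked by maximising the Poisson log-likelihood over a grid. This is a rough sketch only: the grid spacing of 0.01 and the function name are arbitrary choices for illustration.

```python
import math

def poisson_loglik(mu, n, total):
    """Poisson log-likelihood up to an additive constant: -n*mu + total*log(mu)."""
    return -n * mu + total * math.log(mu)

n, total = 10, 24          # sample size and sum of observations from part (ii)
mu_hat = total / n         # closed-form MLE: the sample mean

# Crude grid search over mu = 0.01, 0.02, ..., 10.00 as an independent check.
grid = [k / 100 for k in range(1, 1001)]
best = max(grid, key=lambda mu: poisson_loglik(mu, n, total))

print(mu_hat, best)
```

Only the sample size and the total enter the log-likelihood, which is why the question can be answered without the individual observations.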
MLEs display the invariance property, which means that if $\hat\theta$ is the MLE of $\theta$ then the MLE of a function $g(\theta)$ is $g(\hat\theta)$.

For example, the MLE of $2\theta^2 - 1$ is $2\hat\theta^2 - 1$.

Question

The MLEs of the parameters of a lognormal distribution have been found to be $\hat\mu = 2$ and $\hat\sigma^2 = 0.25$. Derive the maximum likelihood estimate of the mean of the lognormal distribution.

Solution

The formula for the mean $\theta$ (say) of a lognormal distribution is (from page 14 of the Tables):

\[ \theta = e^{\mu + \frac{1}{2}\sigma^2} \]

The invariance property tells us that the MLEs of $\theta$, $\mu$ and $\sigma^2$ are related by the same equation:

\[ \hat\theta = e^{\hat\mu + \frac{1}{2}\hat\sigma^2} \]

So the MLE of the mean is:

\[ \hat\theta = e^{2 + \frac{1}{2} \times 0.25} = 8.37 \]
2.2 The two-parameter case

This is straightforward in principle and the method is the same as the one-parameter case, but the solution of the resulting equations may be more awkward, perhaps requiring an iterative or numerical solution.

The only difference is that a partial derivative is taken with respect to each parameter, before equating each to zero and solving the resulting system of simultaneous equations for the parameters.

So in summary, the steps for finding the maximum likelihood estimator in straightforward cases are:

Write down the likelihood function, $L$.
Find $\ln L$ and simplify the resulting expression.
Partially differentiate $\ln L$ with respect to each parameter to be estimated.
Set the derivatives equal to zero.
Solve these equations simultaneously.

In the two-parameter case, the second-order condition that is used to check for maxima is more complicated, and we shall not discuss it here.
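Question

Derive the MLEs of $\mu$ and $\sigma$ for a sample of $n$ IID observations from a $N(\mu, \sigma^2)$ distribution.

Solution

The likelihood function is:

\[ L = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{1}{2}\left(\frac{x_i - \mu}{\sigma}\right)^2 \right\} = \text{constant} \times \sigma^{-n} \exp\left\{ -\frac{1}{2\sigma^2}\sum_{i=1}^{n} (x_i - \mu)^2 \right\} \]

Taking logs:

\[ \log L = -n\log\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (x_i - \mu)^2 + \text{constant} \]

Differentiating with respect to $\mu$ and $\sigma$ gives:

\[ \frac{\partial}{\partial\mu}\log L = \frac{1}{\sigma^2}\sum_{i=1}^{n} (x_i - \mu) = \frac{1}{\sigma^2}\left( \sum_{i=1}^{n} x_i - n\mu \right) \]

\[ \frac{\partial}{\partial\sigma}\log L = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n} (x_i - \mu)^2 \]

Setting these to zero and assuming these are maxima gives:

\[ \hat\mu = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x} \]

Also:

\[ \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{n-1}{n}s^2 \;\Rightarrow\; \hat\sigma = \sqrt{\frac{n-1}{n}}\,s \]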
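The relationship between the normal MLE of the variance and the usual sample variance can be illustrated numerically. The data below are hypothetical, chosen only to show the calculation.

```python
import statistics

# Hypothetical sample, invented for this illustration only.
sample = [4.1, 5.3, 3.8, 6.0, 5.2, 4.6]
n = len(sample)

# MLEs for a normal sample: the sample mean and the divisor-n variance.
mu_hat = sum(sample) / n
sigma2_hat = sum((x - mu_hat) ** 2 for x in sample) / n

# statistics.variance uses divisor n - 1, ie the usual sample variance s^2.
s2 = statistics.variance(sample)

print(mu_hat, sigma2_hat, (n - 1) / n * s2)
```

The last two printed values agree up to rounding, reflecting the factor $(n-1)/n$ between the two variance estimates.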
2.3 A special case - the uniform distribution

For populations where the range of the random variable involves the parameter, care must be taken to specify when the likelihood is zero and non-zero. Often a plot of the likelihood is helpful.

An example of a random variable where the range involves the parameter is the uniform distribution:

\[ f(x) = \frac{1}{b-a}, \quad a < x < b \]

We look at this in the next question. Note how we specify when the likelihood is zero (ie it does not exist for the specified values of the parameter) and non-zero (ie where it does exist for the specified values of the parameter).

The second important feature about this question is that the usual route for finding the maximum using differentiation breaks down.

Question

Derive the maximum likelihood estimate of $\theta$ for $U[0, \theta]$ based on a random sample of values $x_1, x_2, \ldots, x_n$.

Solution

For a sample from the $U[0, \theta]$ distribution we must have $0 \le x_1, \ldots, x_n \le \theta$. Hence $\max x_i \le \theta$.

Thus the likelihood for a sample of size $n$ is:

\[ L(\theta) = \begin{cases} \dfrac{1}{\theta^n} & \text{if } \theta \ge \max x_i \\ 0 & \text{otherwise} \end{cases} \]

Differentiation doesn't work because $\frac{d}{d\theta}\ln L(\theta) = -\frac{n}{\theta}$, which gives a turning point of $\theta \to \infty$. The second derivative shows the problem:

\[ \frac{d^2}{d\theta^2}\ln L(\theta) = \frac{n}{\theta^2} > 0 \]

We have a minimum as $\theta \to \infty$.

So using common sense, we must find the value of $\theta$ that maximises $L(\theta) = \frac{1}{\theta^n}$. We want $\theta$ to be as small as possible subject to the constraint that $\theta \ge \max x_i$. Hence $\hat\theta = \max x_i$.
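Since a plot of the likelihood is often helpful in this situation, the sketch below tabulates the $U[0, \theta]$ likelihood on a grid instead of differentiating. The sample and the grid are hypothetical choices made for illustration.

```python
def uniform_likelihood(theta, xs):
    """Likelihood of theta for a U[0, theta] sample: theta^(-n) if theta >= max(x), else 0."""
    if theta <= 0 or theta < max(xs):
        return 0.0
    return theta ** (-len(xs))

# Hypothetical sample, invented for this illustration only.
sample = [2.3, 0.7, 4.1, 3.6]
theta_hat = max(sample)  # the MLE derived above

# Evaluate the likelihood on the grid 0.1, 0.2, ..., 10.0 and pick the largest.
grid = [k / 10 for k in range(1, 101)]
best = max(grid, key=lambda t: uniform_likelihood(t, sample))

print(theta_hat, best)
```

The likelihood is zero below the sample maximum and strictly decreasing above it, so the grid maximum sits exactly at the largest observation.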
2.4 Incomplete samples

The method of maximum likelihood can be applied in situations where the sample is incomplete. For example, truncated data or censored data in which observations are known to be greater than a certain value, or multiple claims where the number of claims is known to be two or more.

Censored data arise when we have information about the full range of possible values but that information is not complete (eg when we only know that there are, say, 6 values greater than 500). Truncated data arise when we have no information about part of the range of possible values (eg when we have no information at all about values greater than 500).

In these situations, as long as the likelihood (the probability of observing the given information) can be written as a function of the parameter(s), then the method can be used. Again in such cases the solution may be more complex, perhaps requiring numerical methods.

For example, suppose a sample yields $n$ observations $(x_1, x_2, \ldots, x_n)$ and $m$ observations greater than the value $y$, then the likelihood is given by:

\[ L(\theta) = \prod_{i=1}^{n} f(x_i, \theta) \times [P(X > y)]^m \]

Our estimate will be as accurate as possible if we use all the information that we have available.

For incomplete samples, we don't know what the values above $y$ are. All we know is that they are greater than $y$. Since the values above $y$ are unknown we cannot use $L(\theta) = \prod_{i=1}^{n+m} f(x_i, \theta)$. We instead use the formula given.

If the information is more detailed than 'greater than $y$' we can use a more detailed likelihood function. For example, if we have $m$ observed values between $y$ and $z$, and $p$ observed values above $z$, in addition to the $n$ known values, then we would use:

\[ L(\theta) = \prod_{i=1}^{n} f(x_i, \theta) \times [P(y < X < z)]^m \times [P(X > z)]^p \]
Question

Claims (in 000s) on a particular policy have a distribution with PDF given by:

\[ f(x) = 2cxe^{-cx^2}, \quad x > 0 \]

Seven of the last ten claims are given below:

1.05, 3.38, 3.26, 3.22, 2.71, 2.37, 1.85

The three remaining claims were known to be greater than 6,000.

Calculate the maximum likelihood estimate of $c$.

Solution

We have 7 known claims and 3 claims greater than 6. So the likelihood is:

\[ L(c) = \prod_{i=1}^{7} f(x_i) \times [P(X > 6)]^3 \]

Since:

\[ P(X > 6) = \int_6^\infty 2cxe^{-cx^2}\,dx = \left[ -e^{-cx^2} \right]_6^\infty = e^{-36c} \]

and $\sum_{i=1}^{7} x_i^2 = 49.91$, the likelihood function is:

\[ L(c) = \prod_{i=1}^{7} 2cx_i e^{-cx_i^2} \times \left( e^{-36c} \right)^3 = \text{constant} \times c^7 e^{-c\sum x_i^2} \times e^{-108c} = \text{constant} \times c^7 e^{-157.91c} \]

The log-likelihood is:

\[ \ln L(c) = \text{constant} + 7\ln c - 157.91c \]

Differentiating the log likelihood gives:

\[ \frac{d}{dc}\ln L(c) = \frac{7}{c} - 157.91 \]

This derivative is equal to zero when:

\[ \frac{7}{c} - 157.91 = 0 \;\Rightarrow\; c = \frac{7}{157.91} = 0.0443 \]

Differentiating again to check we get a maximum:

\[ \frac{d^2}{dc^2}\ln L(c) = -\frac{7}{c^2} < 0 \;\Rightarrow\; \text{max} \]

So $\hat{c} = 0.0443$.
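The figures in this censored-claims solution can be reproduced directly from the data. The sketch below is illustrative only; the variable names are not from the course material.

```python
# Seven fully observed claims (in 000s) and three claims censored at 6.
observed = [1.05, 3.38, 3.26, 3.22, 2.71, 2.37, 1.85]
n_censored = 3
threshold = 6.0

# ln L(c) = constant + 7 ln c - c * (sum of x_i^2 + 3 * 6^2),
# because each censored claim contributes P(X > 6) = exp(-36c).
coefficient = sum(x * x for x in observed) + n_censored * threshold ** 2

# Setting the derivative 7/c - coefficient to zero gives the MLE.
c_hat = len(observed) / coefficient

print(round(coefficient, 2), round(c_hat, 4))
```

Each censored claim adds the squared censoring point to the coefficient of $c$, exactly as a fully observed claim adds its own square.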
If we have some claims about which nothing is known (ie we don't even know whether there are any claims of a particular type), then the data are said to be truncated, rather than censored. We need to take a slightly different approach here.
Question

The number of claims in a year on a pet insurance policy are distributed as follows:

No. of claims, n      0         1         2         3 or more
P(N = n)              5θ        3θ        θ         1 − 9θ

Information from the claims file for a particular year showed that there were 60 policies with 1 claim, 24 policies with 2 claims and 16 policies with 3 or more claims. There was no information about the number of policies with no claims.

Calculate the maximum likelihood estimate of $\theta$.

Solution

Since we have no information at all about zero claims, we need to determine the truncated distribution. All we do is omit the zero claims probability and scale up the remaining probabilities (which only total to $1 - 5\theta$) so that they now total to 1:

No. of claims, n      1                 2                3 or more
P(N = n | N > 0)      3θ/(1 − 5θ)       θ/(1 − 5θ)       (1 − 9θ)/(1 − 5θ)

These probabilities can also be thought of as conditional probabilities, ie the first probability in the table is actually $P(N = 1 \mid N > 0)$. Using the definition of conditional probability, we obtain:

\[ P(N = 1 \mid N > 0) = \frac{P(N = 1)}{P(N > 0)} = \frac{3\theta}{1 - 5\theta} \]

and we obtain the same probabilities as before.

The likelihood is:

\[ L(\theta \mid N > 0) = \text{constant} \times [P(N = 1 \mid N > 0)]^{60}\,[P(N = 2 \mid N > 0)]^{24}\,[P(N \ge 3 \mid N > 0)]^{16} \]

So:

\[ L(\theta \mid N > 0) = \text{constant} \times \left( \frac{3\theta}{1 - 5\theta} \right)^{60} \left( \frac{\theta}{1 - 5\theta} \right)^{24} \left( \frac{1 - 9\theta}{1 - 5\theta} \right)^{16} = \text{constant} \times \frac{\theta^{84}(1 - 9\theta)^{16}}{(1 - 5\theta)^{100}} \]

The constant arises from the fact that we don't know which of the 60 policies had 1 claim, etc and so there is some combinatorial factor to account for this.

The log-likelihood is:

\[ \ln L(\theta \mid N > 0) = \text{constant} + 84\ln\theta + 16\ln(1 - 9\theta) - 100\ln(1 - 5\theta) \]

Differentiating:

\[ \frac{d}{d\theta}\ln L(\theta \mid N > 0) = \frac{84}{\theta} - \frac{9 \times 16}{1 - 9\theta} + \frac{5 \times 100}{1 - 5\theta} \]

Setting this equal to 0:

\[ 84(1 - 9\theta)(1 - 5\theta) - 144\theta(1 - 5\theta) + 500\theta(1 - 9\theta) = 0 \]

\[ \Rightarrow\; 84 - 820\theta = 0 \;\Rightarrow\; \theta = 0.102 \]

Differentiating again to check we get a maximum:

\[ \frac{d^2}{d\theta^2}\ln L(\theta \mid N > 0) = -\frac{84}{\theta^2} - \frac{9 \times 9 \times 16}{(1 - 9\theta)^2} + \frac{5 \times 5 \times 100}{(1 - 5\theta)^2} < 0 \text{ when } \theta = 0.102 \;\Rightarrow\; \text{max} \]

So $\hat\theta = 0.102$.
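The root of the score equation in this truncated-data question can also be located numerically, which is the kind of approach needed when the algebra does not simplify so neatly. The sketch below uses bisection on the derivative of the log-likelihood; the bracketing interval and iteration count are arbitrary choices for illustration.

```python
def score(theta):
    """Derivative of the truncated log-likelihood: 84/t - 144/(1 - 9t) + 500/(1 - 5t)."""
    return 84 / theta - 9 * 16 / (1 - 9 * theta) + 5 * 100 / (1 - 5 * theta)

# theta must lie in (0, 1/9) for every probability in the table to be positive.
lo, hi = 1e-6, 1 / 9 - 1e-6

# Bisection: the score is positive near 0 and negative near 1/9.
for _ in range(100):
    mid = (lo + hi) / 2
    if score(mid) > 0:
        lo = mid
    else:
        hi = mid

theta_hat = (lo + hi) / 2
print(round(theta_hat, 3))
```

The numerical root agrees with the closed-form value $84/820$ obtained in the solution.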
2.5 Independent samples

For independent samples from two populations which share a common parameter, the overall likelihood is the product of the two separate likelihoods.

Question

The number of claims, $X$, per year arising from a low-risk policy has a Poisson distribution with mean $\mu$. The number of claims, $Y$, per year arising from a high-risk policy has a Poisson distribution with mean $2\mu$.

A sample of 15 low-risk policies had a total of 48 claims in a year and a sample of 10 high-risk policies had a total of 59 claims in a year. Determine the maximum likelihood estimate of $\mu$ based on this information.

Solution

The likelihood for these 15 low-risk and 10 high-risk policies is:

\[ L(\mu) = \prod_{i=1}^{15} P(X = x_i) \times \prod_{j=1}^{10} P(Y = y_j) = \prod_{i=1}^{15} \frac{e^{-\mu}\mu^{x_i}}{x_i!} \times \prod_{j=1}^{10} \frac{e^{-2\mu}(2\mu)^{y_j}}{y_j!} \]

\[ = \text{constant} \times \mu^{\sum x_i} e^{-15\mu} \times \mu^{\sum y_j} e^{-20\mu} = \text{constant} \times \mu^{48} e^{-15\mu} \times \mu^{59} e^{-20\mu} = \text{constant} \times \mu^{107} e^{-35\mu} \]

The log-likelihood is:

\[ \ln L(\mu) = \text{constant} + 107\ln\mu - 35\mu \]

Differentiating:

\[ \frac{d}{d\mu}\ln L(\mu) = \frac{107}{\mu} - 35 \]

This is equal to 0 when:

\[ \mu = \frac{107}{35} = 3.057 \]

Differentiating again to check we get a maximum:

\[ \frac{d^2}{d\mu^2}\ln L(\mu) = -\frac{107}{\mu^2} < 0 \;\Rightarrow\; \text{max} \]

So $\hat\mu = 3.057$.
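The combined-sample estimate has a simple structure: total claims divided by the total of (number of policies times the multiple of $\mu$). The sketch below expresses this; the tuple layout is an assumption made for illustration.

```python
# Each group: (number of policies, total claims, multiple of mu giving the Poisson mean).
groups = [
    (15, 48, 1),  # low-risk policies, mean mu
    (10, 59, 2),  # high-risk policies, mean 2 * mu
]

# ln L(mu) = constant + (total claims) * ln(mu) - mu * sum(policies * multiple)
total_claims = sum(claims for _, claims, _ in groups)
total_rate = sum(policies * multiple for policies, _, multiple in groups)

mu_hat = total_claims / total_rate
print(total_claims, total_rate, round(mu_hat, 3))
```

Written this way, adding a third independent group with its own multiple of $\mu$ would only require one more tuple.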
3 Unbiasedness

Consideration of the sampling distribution of an estimator can give an indication of how good it is as an estimator. Clearly the aim is for the sampling distribution of the estimator to be located near the true value and have a small spread.

If we have a random sample $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ from a distribution with an unknown parameter $\theta$ and $g(\mathbf{X})$ is an estimator of $\theta$, it seems desirable that $E[g(\mathbf{X})] = \theta$.

This is the property of unbiasedness.

We can think of an unbiased estimator as one whose mean value equals the true parameter value.

Question

Show that the estimator for $\mu$ obtained in the question on page 12 (the Poisson question in Section 2.1) is unbiased.

Solution

In this question we have a random sample from the $Poi(\mu)$ distribution and the estimator is $\hat\mu = \bar{X}$. To show that this is unbiased we need to show that $E(\hat\mu) = \mu$, ie $E(\bar{X}) = \mu$.

We have:

\[ E(\hat\mu) = E(\bar{X}) = E\left[ \frac{1}{n}\sum_{i=1}^{n} X_i \right] = \frac{1}{n}\sum_{i=1}^{n} E(X_i) \]

Since $X_i \sim Poi(\mu)$ we have $E(X_i) = \mu$. Hence:

\[ E(\bar{X}) = \frac{1}{n}\sum_{i=1}^{n} \mu = \frac{1}{n}\,n\mu = \mu \]

So the estimator is unbiased.

If an estimator is biased, its bias is given by $E[g(\mathbf{X})] - \theta$, ie it is a measure of the difference between the expected value of the estimator and the parameter being estimated.

If the bias is greater than zero, the estimator is said to be positively biased, ie it tends to overestimate the true value. Alternatively, the bias could be less than zero, leading to a negatively biased estimator that would tend to underestimate the true value.
Question

The following are estimators for the variance of a distribution having mean $\mu$ and variance $\sigma^2$. Obtain the bias for each estimator:

(i) $S^2 = \dfrac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$

(ii) $\hat\sigma^2 = \dfrac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2$

Solution

(i) The formula for the bias of $S^2$ is:

\[ \text{bias}(S^2) = E(S^2) - \sigma^2 \]

Consider $E(S^2)$:

\[ E(S^2) = E\left[ \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2 \right] = \frac{1}{n-1}E\left[ \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 \right] = \frac{1}{n-1}\left\{ \sum_{i=1}^{n} E(X_i^2) - nE(\bar{X}^2) \right\} \]

Since:

\[ E(X_i^2) = \text{var}(X_i) + E^2(X_i) = \sigma^2 + \mu^2 \]

and:

\[ E(\bar{X}^2) = \text{var}(\bar{X}) + E^2(\bar{X}) = \frac{\sigma^2}{n} + \mu^2 \]

we get:

\[ E(S^2) = \frac{1}{n-1}\left\{ \sum_{i=1}^{n} (\sigma^2 + \mu^2) - n\left( \frac{\sigma^2}{n} + \mu^2 \right) \right\} = \frac{1}{n-1}\left\{ n\sigma^2 + n\mu^2 - \sigma^2 - n\mu^2 \right\} = \frac{1}{n-1}(n-1)\sigma^2 = \sigma^2 \]

So:

\[ \text{bias}(S^2) = E(S^2) - \sigma^2 = \sigma^2 - \sigma^2 = 0 \]

This means that $S^2$ is an unbiased estimator of $\sigma^2$.

(ii) Since $\hat\sigma^2 = \frac{n-1}{n}S^2$, we can use the result from part (i) to get:

\[ E(\hat\sigma^2) = E\left( \frac{n-1}{n}S^2 \right) = \frac{n-1}{n}E(S^2) = \frac{n-1}{n}\sigma^2 \]

So:

\[ \text{bias}(\hat\sigma^2) = E(\hat\sigma^2) - \sigma^2 = \frac{n-1}{n}\sigma^2 - \sigma^2 = -\frac{1}{n}\sigma^2 \]

The property of unbiasedness is not preserved under non-linear transformations of the estimator/parameter.

So, for example, the fact that $S^2$ is an unbiased estimator of the population variance does not mean that $S$ is an unbiased estimator of the population standard deviation.
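The two bias results can be confirmed exactly for a small discrete population by enumerating every possible sample. The sketch below assumes a hypothetical population (one fair die) and a sample size of 3, both chosen only to keep the enumeration small; exact fractions avoid any rounding.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population: the six equally likely faces of a fair die.
values = [Fraction(v) for v in range(1, 7)]
n = 3  # sample size, kept small so all 6^3 samples can be listed

pop_mean = sum(values) / len(values)
pop_var = sum((v - pop_mean) ** 2 for v in values) / len(values)

# Average each estimator over every equally likely sample (with replacement).
samples = list(product(values, repeat=n))
exp_s2 = Fraction(0)
exp_sigma2_hat = Fraction(0)
for sample in samples:
    xbar = sum(sample) / n
    ss = sum((x - xbar) ** 2 for x in sample)
    exp_s2 += ss / (n - 1)        # divisor n - 1
    exp_sigma2_hat += ss / n      # divisor n
exp_s2 /= len(samples)
exp_sigma2_hat /= len(samples)

bias_s2 = exp_s2 - pop_var
bias_sigma2_hat = exp_sigma2_hat - pop_var
print(bias_s2, bias_sigma2_hat, -pop_var / n)
```

The divisor-$(n-1)$ estimator has zero bias and the divisor-$n$ estimator has bias $-\sigma^2/n$, in line with parts (i) and (ii).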
As indicated earlier unbiasedness seems to be a desirable property. However it is not necessarily an essential property for an estimator. There are many common situations in which a biased estimator is better than an unbiased one, and, in fact, better than the best unbiased estimator.

The importance of unbiasedness is secondary to that of having a small mean square error.

An unbiased estimator is one that for different samples will give the true value on average. However, it could be that some of the estimates are too large and some are too small, but on average they give the true value. So we need some way of measuring the spread of the estimates obtained for different samples. That measure is the mean square error and is covered in the next section.

A biased estimator whose value does not deviate very far from the true value (ie has a small spread) is preferable to an unbiased one whose values are all over the place.
4 Mean square error

As biased estimators can be better than unbiased ones a measure of efficiency is needed to compare estimators generally. That measure is the mean square error.

The mean square error (MSE) of an estimator $g(\mathbf{X})$ for $\theta$ is defined by:

\[ MSE(g(\mathbf{X})) = E\left[ (g(\mathbf{X}) - \theta)^2 \right] \]

Note that this is a function of $\theta$.

Thus the mean square error is the second moment of $g(\mathbf{X})$ about $\theta$ and an estimator with a lower MSE is said to be more efficient.

The MSE of a particular estimator can be worked out directly as an integral using the density of the sampling distribution of $g(\mathbf{X})$, or using the density of $\mathbf{X}$ itself. However it is usually much easier to use the alternative expression:

\[ MSE = \text{variance} + \text{bias}^2 \]

as this makes use of quantities that are already known or can easily be obtained.

This expression can be proved as follows:

(Simplifying things by dropping the $(\mathbf{X})$ and writing simply $g$.)

\[ MSE(g) = E\left[ (g - \theta)^2 \right] = E\left[ \left\{ (g - E[g]) + (E[g] - \theta) \right\}^2 \right] \]

\[ = E\left[ (g - E[g])^2 \right] + 2(E[g] - \theta)\,E\left[ g - E[g] \right] + (E[g] - \theta)^2 \]

\[ = \text{var}[g] + 0 + \text{bias}^2[g] \quad \text{as required} \]

Note: If the estimator $g(\mathbf{X})$ is unbiased, then MSE = variance.
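The decomposition of the MSE into variance plus squared bias can be verified exactly on a toy example. The sketch below assumes a hypothetical population (one fair die), a sample of size 2 and a deliberately biased estimator of the mean; none of these come from the course material.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population: a fair die, so the true mean is 7/2.
values = [Fraction(v) for v in range(1, 7)]
n = 2
theta = sum(values) / len(values)

def estimator(sample):
    """A deliberately biased estimator of the mean: total divided by n + 1."""
    return sum(sample) / (len(sample) + 1)

# Every equally likely sample of size n gives one value of the estimator.
estimates = [estimator(s) for s in product(values, repeat=n)]
count = len(estimates)

mean_g = sum(estimates) / count
variance = sum((g - mean_g) ** 2 for g in estimates) / count
bias = mean_g - theta
mse = sum((g - theta) ** 2 for g in estimates) / count

print(mse, variance + bias ** 2)
```

Because exact fractions are used, the two printed values are identical rather than merely close.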
Question

Obtain the MSE of the estimator for $\mu$ obtained in the question on page 12 (the Poisson question in Section 2.1).

Solution

In this question, we have a random sample from the $Poi(\mu)$ distribution and the estimator is $\hat\mu = \bar{X}$.

The MSE is given by:

\[ MSE(\hat\mu) = \text{var}(\hat\mu) + \text{bias}^2(\hat\mu) \]

In Section 3 we showed that the estimator is unbiased, ie $\text{bias}(\hat\mu) = 0$. So:

\[ MSE(\hat\mu) = \text{var}(\hat\mu) + 0^2 = \text{var}(\hat\mu) = \text{var}(\bar{X}) \]

Now:

\[ \text{var}(\bar{X}) = \text{var}\left( \frac{1}{n}\sum_{i=1}^{n} X_i \right) = \frac{1}{n^2}\sum_{i=1}^{n} \text{var}(X_i) \quad \text{since the } X_i \text{ are independent} \]

As $X_i \sim Poi(\mu)$, we know that $\text{var}(X_i) = \mu$. Hence:

\[ \text{var}(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^{n} \mu = \frac{1}{n^2}\,n\mu = \frac{\mu}{n} \]

So the MSE is $\frac{\mu}{n}$.

The following diagram gives the sampling distributions of two estimators: one is unbiased but has a large variance, the other is biased with a much smaller variance. This illustrates a situation in which a biased estimator is better than an unbiased one.

It is clear that an estimator with a small MSE is a good estimator. It is also desirable that an estimator gets better as the sample size increases. Putting these together suggests that it is desirable that $MSE \to 0$ as $n \to \infty$. This property is known as consistency.
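Question

The estimator, $\hat\sigma^2$, is used to estimate the variance of a $N(\mu, \sigma^2)$ distribution based on a random sample of $n$ observations:

\[ \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2 \]

(i) Determine the mean square error of $\hat\sigma^2$.

(ii) Determine whether $\hat\sigma^2$ is consistent.

Solution

(i) Relating $\hat\sigma^2$ to the usual sample variance:

\[ \hat\sigma^2 = \frac{n-1}{n}S^2 \quad \text{and} \quad \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1} \;\Rightarrow\; \frac{n\hat\sigma^2}{\sigma^2} \sim \chi^2_{n-1} \]

Hence the mean of $\hat\sigma^2$ is obtained from:

\[ E\left( \frac{n\hat\sigma^2}{\sigma^2} \right) = n - 1 \;\Rightarrow\; E(\hat\sigma^2) = \frac{n-1}{n}\sigma^2 \]

So:

\[ \text{bias}(\hat\sigma^2) = E(\hat\sigma^2) - \sigma^2 = \frac{n-1}{n}\sigma^2 - \sigma^2 = -\frac{\sigma^2}{n} \]

The variance of $\hat\sigma^2$ is determined as follows:

\[ \text{var}\left( \frac{n\hat\sigma^2}{\sigma^2} \right) = 2(n-1) \;\Rightarrow\; \text{var}(\hat\sigma^2) = \frac{2(n-1)}{n^2}\sigma^4 \]

So:

\[ MSE(\hat\sigma^2) = \text{var}(\hat\sigma^2) + \text{bias}^2(\hat\sigma^2) = \frac{2(n-1)}{n^2}\sigma^4 + \frac{\sigma^4}{n^2} = \frac{2n-1}{n^2}\sigma^4 = \left( \frac{2}{n} - \frac{1}{n^2} \right)\sigma^4 \]

(ii) Since the MSE, $\frac{2n-1}{n^2}\sigma^4$, tends to zero as $n \to \infty$, the estimator is consistent.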
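To see the consistency numerically, the MSE formula from part (i) can be evaluated for increasing sample sizes. The true variance used below is a hypothetical value chosen for illustration.

```python
def mse_sigma2_hat(n, sigma2):
    """MSE of the divisor-n variance estimator for a normal sample: (2n - 1) * sigma^4 / n^2."""
    return (2 * n - 1) * sigma2 ** 2 / n ** 2

sigma2 = 4.0  # hypothetical true variance, so sigma^4 = 16
sizes = [5, 50, 500, 5000]
mses = [mse_sigma2_hat(n, sigma2) for n in sizes]

for n, m in zip(sizes, mses):
    print(n, m)
```

The MSE shrinks roughly in proportion to $1/n$ for large $n$, since the leading term is $2\sigma^4/n$.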
5 Asymptotic distribution of maximum likelihood estimators

Given a random sample of size $n$ from a distribution with density (or probability function in the discrete case) $f(x; \theta)$, the maximum likelihood estimator $\hat\theta$ is such that, for large $n$, $\hat\theta$ is approximately normal, and is unbiased with variance given by the Cramér-Rao lower bound, that is:

\[ \hat\theta \sim N(\theta, CRLB) \]

where:

\[ CRLB = \frac{1}{nE\left[ \left( \dfrac{\partial}{\partial\theta}\log f(X; \theta) \right)^2 \right]} \]

The MLE can therefore be called asymptotically efficient in that, for large $n$, it is unbiased with a variance equal to the lowest possible value of unbiased estimators.

The Core Reading is saying that the CRLB gives a lower bound for the variance of an unbiased estimator of a parameter (which is the same as its mean square error). So no unbiased estimator can have a smaller variance than the CRLB.

This is potentially a very useful result as it provides an approximate distribution for the MLE when the true sampling distribution may be unknown or impossible to determine easily, and hence may be used to obtain approximate confidence intervals.

Confidence intervals will be covered in a later chapter.

The result holds under very general conditions with only one major exclusion: it does not apply in cases where the range of the distribution involves the parameter, such as the uniform distribution.

This is due to a discontinuity, so the derivative in the formula doesn't make sense.

There are two useful alternative expressions for the CRLB based on the likelihood itself. Noting that $L(\theta)$ is really $L(\theta, \mathbf{X})$, these are:

\[ CRLB = \frac{1}{E\left[ \left( \dfrac{\partial}{\partial\theta}\log L(\theta, \mathbf{X}) \right)^2 \right]} \quad \text{and} \quad CRLB = -\frac{1}{E\left[ \dfrac{\partial^2}{\partial\theta^2}\log L(\theta, \mathbf{X}) \right]} \]

The second formula is normally easier to work with (as we would have calculated the second derivative of the log-likelihood when checking that we get a maximum). This formula is given on page 23 of the Tables.
Question

Derive the CRLB for estimators of $\mu$, for a sample $X_1, \ldots, X_n$ from a $Poi(\mu)$ distribution.

Solution

The likelihood is:

\[ L(\mu) = \prod_{i=1}^{n} \frac{e^{-\mu}\mu^{X_i}}{X_i!} = \text{constant} \times e^{-n\mu}\mu^{\sum X_i} \]

So:

\[ \ln L(\mu) = \text{constant} - n\mu + \sum X_i \ln\mu \]

Differentiating with respect to $\mu$ gives:

\[ \frac{d}{d\mu}\ln L(\mu) = -n + \frac{\sum X_i}{\mu} \]

Setting this equal to zero would give the MLE of $\hat\mu = \bar{X}$.

Differentiating again (which we would have done to check we get a maximum):

\[ \frac{d^2}{d\mu^2}\ln L(\mu) = -\frac{\sum X_i}{\mu^2} \]

Finding the expectation of this (noting that only the $X_i$'s are random variables):

\[ E\left[ \frac{d^2}{d\mu^2}\ln L(\mu) \right] = -\frac{1}{\mu^2}\sum E[X_i] = -\frac{1}{\mu^2}\,n\mu = -\frac{n}{\mu} \]

So, from the second formula for the CRLB:

\[ CRLB = -\frac{1}{E\left[ \dfrac{d^2}{d\mu^2}\ln L(\mu) \right]} = \frac{\mu}{n} \]

In fact, in this case, the maximum likelihood estimator $\hat\mu = \bar{X}$ is unbiased and has variance $\frac{\mu}{n}$. So, the estimator attains the CRLB.
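The Poisson CRLB can be checked numerically by computing the expectation in the second formula as a sum over the probability function. In the sketch below the values $\mu = 2.4$ and $n = 10$ are illustrative, and the sum is truncated at an arbitrary upper limit where the remaining probability is negligible.

```python
import math

def poisson_pmf(x, mu):
    """Poisson probability P(X = x), computed on the log scale for stability."""
    return math.exp(-mu + x * math.log(mu) - math.lgamma(x + 1))

def information_per_observation(mu, upper=200):
    """E[-d^2/dmu^2 log f(X; mu)], where log f = -mu + x*log(mu) - log(x!).

    The second derivative is -x / mu^2, so we average x / mu^2 over the pmf.
    """
    return sum(poisson_pmf(x, mu) * x / mu ** 2 for x in range(upper))

mu, n = 2.4, 10
crlb = 1 / (n * information_per_observation(mu))
print(crlb, mu / n)
```

Both printed values should agree closely, matching the algebraic result CRLB $= \mu/n$.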
We can find the CRLB for estimators of parameters from continuous distributions.
Question

(i) Show that the CRLB for unbiased estimators of $\mu$, based on a random sample of $n$ observations from a $N(\mu, \sigma^2)$ distribution with known variance $\sigma^2$, is given by $\frac{\sigma^2}{n}$.

(ii) Show that the maximum likelihood estimator $\hat\mu = \bar{X}$ attains the CRLB.

Solution

(i) From the question on page 14 (the normal distribution question in Section 2.2), we see that:

\[ \frac{\partial}{\partial\mu}\ln L(\mu) = \frac{1}{\sigma^2}\sum_{i=1}^{n} (X_i - \mu) = \frac{1}{\sigma^2}\left( \sum_{i=1}^{n} X_i - n\mu \right) \]

Setting this equal to zero and rearranging gives the MLE $\hat\mu = \bar{X}$.

Note: We have changed $x_i$ to $X_i$ as we are working with the estimator.

Differentiating again gives:

\[ \frac{\partial^2}{\partial\mu^2}\ln L(\mu) = -\frac{n}{\sigma^2} \]

Since there are no $X_i$'s, all the values are constants, and hence:

\[ E\left[ \frac{\partial^2}{\partial\mu^2}\ln L(\mu) \right] = E\left[ -\frac{n}{\sigma^2} \right] = -\frac{n}{\sigma^2} \]

So, from the second formula for the CRLB:

\[ CRLB = -\frac{1}{E\left[ \dfrac{\partial^2}{\partial\mu^2}\ln L(\mu) \right]} = \frac{\sigma^2}{n} \]

(ii) From a previous chapter we saw that if $X \sim N(\mu, \sigma^2)$ then $\bar{X} \sim N\left( \mu, \frac{\sigma^2}{n} \right)$, so $\text{var}(\bar{X}) = \frac{\sigma^2}{n}$.

Hence the MLE attains the CRLB.
What follows now is an example to illustrate the fact that if we want to obtain the CRLB for the variance, $\sigma^2$, we can't just take the CRLB for the standard deviation, $\sigma$, and square it. The reason for this is that the formula for the CRLB of $\sigma$ is:

\[ CRLB(\sigma) = -\frac{1}{E\left[ \dfrac{d^2}{d\sigma^2}\ln L(\sigma) \right]} \]

whereas the formula for the CRLB of $v = \sigma^2$ is:

\[ CRLB(v) = -\frac{1}{E\left[ \dfrac{d^2}{dv^2}\ln L(v) \right]} \]

There is no simple connection between the derivatives.
Question

Derive the CRLB for estimators of the variance of a $N(\mu, \sigma^2)$ distribution, where $\mu$ is known, based on a random sample of $n$ observations.

Solution

We need to work in terms of the population variance $\sigma^2$, which we will write as $v$. The likelihood function is:

\[ L(v) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi v}} \exp\left\{ -\frac{(X_i - \mu)^2}{2v} \right\} = \text{constant} \times v^{-n/2} \exp\left\{ -\frac{1}{2v}\sum_{i=1}^{n} (X_i - \mu)^2 \right\} \]

Taking logs:

\[ \ln L(v) = -\frac{n}{2}\ln v - \frac{1}{2v}\sum_{i=1}^{n} (X_i - \mu)^2 + \text{constant} \]

Differentiating with respect to $v$ gives:

\[ \frac{\partial}{\partial v}\ln L(v) = -\frac{n}{2v} + \frac{1}{2v^2}\sum_{i=1}^{n} (X_i - \mu)^2 \]

Differentiating again:

\[ \frac{\partial^2}{\partial v^2}\ln L(v) = \frac{n}{2v^2} - \frac{1}{v^3}\sum_{i=1}^{n} (X_i - \mu)^2 \quad \text{ie} \quad \frac{n}{2v^2} - \frac{1}{v^2}\sum_{i=1}^{n} \left( \frac{X_i - \mu}{\sigma} \right)^2 \]

We need to determine the expectation of this. We will use the fact that $X_i \sim N(\mu, \sigma^2)$, so $Z_i = \frac{X_i - \mu}{\sigma} \sim N(0, 1)$ and hence:

\[ E(Z_i^2) = \text{var}(Z_i) + E^2(Z_i) = 1 + 0^2 = 1 \]

So we have:

\[ E\left[ \frac{\partial^2}{\partial v^2}\ln L(v) \right] = \frac{n}{2v^2} - \frac{1}{v^2}\sum_{i=1}^{n} E\left[ Z_i^2 \right] = \frac{n}{2v^2} - \frac{1}{v^2}\sum_{i=1}^{n} 1 = \frac{n}{2v^2} - \frac{n}{v^2} = -\frac{n}{2v^2} \]

Hence:

\[ CRLB = -\frac{1}{E\left[ \dfrac{\partial^2}{\partial v^2}\log L \right]} = \frac{2v^2}{n} = \frac{2\sigma^4}{n} \]
We now consider the CRLB for a random sample of observations from an exponential distribution.

Question

Given a random sample of $n$ observations from an $Exp(\lambda)$ distribution, determine the CRLB for unbiased estimators of:

(i) $\lambda$

(ii) the population mean, $\mu = \frac{1}{\lambda}$.

Comment on the results.

Solution

(i) Using the Core Reading example from page 11 (the exponential example in Section 2.1), we have:

\[ L(\lambda) = \prod_{i=1}^{n} \lambda e^{-\lambda X_i} = \lambda^n e^{-\lambda\sum X_i} \;\Rightarrow\; \ln L(\lambda) = n\ln\lambda - \lambda\sum_{i=1}^{n} X_i \]

Differentiating this with respect to $\lambda$ gives:

\[ \frac{d}{d\lambda}\ln L(\lambda) = \frac{n}{\lambda} - \sum_{i=1}^{n} X_i \]

Setting this equal to zero gives the estimator $\hat\lambda = \frac{1}{\bar{X}}$.

Differentiating again with respect to $\lambda$ gives:

\[ \frac{d^2}{d\lambda^2}\ln L(\lambda) = -\frac{n}{\lambda^2} \]

Since there are no $X_i$'s, all the values are constants and hence:

\[ E\left[ \frac{d^2}{d\lambda^2}\ln L(\lambda) \right] = E\left[ -\frac{n}{\lambda^2} \right] = -\frac{n}{\lambda^2} \]

So, from the second formula for the CRLB:

\[ CRLB = -\frac{1}{E\left[ \dfrac{d^2}{d\lambda^2}\ln L(\lambda) \right]} = \frac{\lambda^2}{n} \]

(ii) We are estimating the mean of an $Exp(\lambda)$ distribution, ie $\mu = \frac{1}{\lambda}$, therefore we need to work in terms of $\mu$ and differentiate with respect to $\mu$.

The likelihood function for the sample is:

\[ L(\mu) = \prod_{i=1}^{n} \frac{1}{\mu}e^{-X_i/\mu} = \mu^{-n} e^{-\frac{1}{\mu}\sum X_i} \;\Rightarrow\; \ln L(\mu) = -n\ln\mu - \frac{1}{\mu}\sum X_i \]

Differentiating with respect to $\mu$:

\[ \frac{d}{d\mu}\ln L(\mu) = -\frac{n}{\mu} + \frac{\sum X_i}{\mu^2} \]

Differentiating again with respect to $\mu$:

\[ \frac{d^2}{d\mu^2}\ln L(\mu) = \frac{n}{\mu^2} - \frac{2\sum X_i}{\mu^3} \]

Finding the expectation of this:

\[ E\left[ \frac{d^2}{d\mu^2}\ln L(\mu) \right] = \frac{n}{\mu^2} - \frac{2}{\mu^3}\sum E[X_i] \]

Since each $X_i$ has an exponential distribution with mean $\mu$, we have $E(X_i) = \mu$ and hence:

\[ E\left[ \frac{d^2}{d\mu^2}\ln L(\mu) \right] = \frac{n}{\mu^2} - \frac{2n\mu}{\mu^3} = \frac{n}{\mu^2} - \frac{2n}{\mu^2} = -\frac{n}{\mu^2} \]

So, from the second formula for the CRLB:

\[ CRLB = -\frac{1}{E\left[ \dfrac{d^2}{d\mu^2}\log L \right]} = \frac{\mu^2}{n} \]

Comment

Although $\mu = \frac{1}{\lambda}$ we see that $CRLB(\mu) \ne \frac{1}{CRLB(\lambda)}$.

In fact, we actually have $CRLB(\mu) = \frac{\mu^2}{n} = \frac{1}{n\lambda^2}$.
6 Comparing the method of moments with ML estimation
We now compare the method of moments and the method of maximum likelihood.
Essentially maximum likelihood is regarded as the better method.
In the usual one-parameter case the method of moments estimator is always a function of the sample mean $\bar X$ and this must limit its usefulness in some situations. For example in the case of the uniform distribution on $[0, \theta]$ the method of moments estimator is $2\bar X$ and this can result in inadmissible estimates which are greater than $\theta$.
For example, supposing we had the following data from $U[0, \theta]$:
4.5, 1.8, 2.7, 0.9, 1.3
This gives $\bar x = 2.24$. Since the method of moments estimator is $\hat\theta = 2\bar X$, we have $\hat\theta = 4.48$. This estimate for the upper limit is inadmissible as one of the data values is greater than 4.48.
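The check in this example can be reproduced in R. This is a minimal sketch; the variable names `x` and `theta.mom` are our own and not part of the Core Reading.

```r
x <- c(4.5, 1.8, 2.7, 0.9, 1.3)   # data from the example above

theta.mom <- 2 * mean(x)          # method of moments estimate of theta
theta.mom                         # 4.48
max(x)                            # 4.5

# TRUE means the estimated upper limit lies below an observed value
theta.mom < max(x)
```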
Nevertheless in many common applications such as the binomial, Poisson, exponential and normal cases both methods yield the same estimator.
In some situations such as the gamma with two unknown parameters the simplicity of the method of moments gives it a possible advantage over maximum likelihood which may require a complicated numerical solution.
To obtain the MLE of $\alpha$ from a gamma distribution requires the differentiation of $\Gamma(\alpha)$, which requires numerical methods.
7 The bootstrap method
This section of the Core Reading refers to the use of R in bootstrapping. This material is not explained in detail here; we cover it in the PBOR resources for Subject CS1.
7.1 Introduction to bootstrap
The bootstrap method is a computer intensive estimation method and can be used to estimate the properties of an estimator. It is mainly distinguished in two types: parametric and non-parametric bootstrap.
Suppose that we want to make inferences about a parameter $\theta$ using observed data $(y_1, y_2, \ldots, y_n)$ which follow a distribution with cumulative distribution function $F(y; \theta)$. Usually inference is based on the sampling distribution of an estimator $\hat\theta$. A sampling distribution is obtained either by theoretical results, or is based on a large number of samples from $F(y; \theta)$.
For example, suppose we have a sample $(y_1, y_2, \ldots, y_n)$ from an exponential distribution with parameter $\lambda$ and we wish to make inferences about $\lambda$. The CLT tells us that asymptotically $\bar Y \sim N\left(\dfrac{1}{\lambda}, \dfrac{1}{n\lambda^2}\right)$ and we can use this sampling distribution to estimate quantities of interest (eg for confidence intervals or tests about $\lambda$). However, there will be cases where assumptions or asymptotic results may not hold (or we may not want to use them, eg when samples are small). Then one alternative option is to use the bootstrap method. Bootstrap allows us to avoid making assumptions about the sampling distribution of a statistic of interest, by instead forming an empirical sampling distribution of the statistic. This is generally achieved by resampling based on the available sample.
7.2 Non-parametric (full) bootstrap
The main idea behind non-parametric bootstrap, when estimating a parameter $\theta$, can be described as follows.
Construct the empirical distribution, $\hat F_n$, of the data:
$$\hat F_n(y) = \frac{1}{n}\{\text{Number of } y_i \le y\}$$
Then perform the following steps:
1. Draw a sample of size $n$ from $\hat F_n$. This is the bootstrap sample $(y_1^*, y_2^*, \ldots, y_n^*)$ with $y^*$ selected with replacement from $(y_1, y_2, \ldots, y_n)$.
2. Obtain an estimate $\hat\theta^*$ from the bootstrap sample. This is done in the same way as $\hat\theta$ is obtained from the original sample.
Repeat steps 1 and 2, say, $B$ times.
Provided that $B$ is sufficiently large, the output set of estimates $(\hat\theta_1^*, \hat\theta_2^*, \ldots, \hat\theta_B^*)$ will provide the empirical distribution of $\hat\theta^*$, which serves as an estimate of the sampling distribution of $\hat\theta$, and is referred to as the bootstrap empirical distribution of $\hat\theta$.
Schematically, this can be thought of as:
original sample $(y_1, y_2, \ldots, y_n)$
sample 1: $(y_1^*, y_2^*, \ldots, y_n^*) \rightarrow \hat\theta_1^*$
sample 2: $(y_1^*, y_2^*, \ldots, y_n^*) \rightarrow \hat\theta_2^*$
...
sample B: $(y_1^*, y_2^*, \ldots, y_n^*) \rightarrow \hat\theta_B^*$
Bootstrap empirical distribution of $\hat\theta$: $(\hat\theta_1^*, \hat\theta_2^*, \ldots, \hat\theta_B^*)$
The bootstrap distribution of $\hat\theta$ can then be used for any desired inference regarding the estimator $\hat\theta$, and particularly to estimate its properties. For example we can:
estimate the mean of estimator $\hat\theta$ by using the sample mean of the bootstrap estimates $(\hat\theta_1^*, \hat\theta_2^*, \ldots, \hat\theta_B^*)$:
$$\hat E(\hat\theta) = \frac{1}{B}\sum_{j=1}^{B}\hat\theta_j^*\,;$$
estimate its median, using the 0.5 empirical quantile of the bootstrap estimates $\hat\theta_j^*$;
estimate the variance of estimator $\hat\theta$ by using the sample variance of the bootstrap estimates $(\hat\theta_1^*, \hat\theta_2^*, \ldots, \hat\theta_B^*)$:
$$\widehat{\text{var}}(\hat\theta) = \frac{1}{B-1}\left[\sum_{j=1}^{B}\left(\hat\theta_j^*\right)^2 - \frac{1}{B}\left(\sum_{j=1}^{B}\hat\theta_j^*\right)^2\right];$$
estimate a $100(1-\alpha)\%$ confidence interval for $\theta$ by:
$$\left(k_{\alpha/2},\; k_{1-\alpha/2}\right)$$
where $k_\alpha$ denotes the $\alpha$th empirical quantile of the bootstrap values $\hat\theta^*$.
Confidence intervals are described in Chapter 9.
Example
Suppose we have the following sample of 10 values (to 2 DP) from an $Exp(\lambda)$ distribution with unknown parameter $\lambda$:
0.61 6.47 2.56 5.44 2.72 0.87 2.77 6.00 0.14 0.75
We can use the following R code to obtain a single resample with replacement from this original sample.
    sample.data <- c(0.61, 6.47, 2.56, 5.44, 2.72, 0.87, 2.77, 6.00, 0.14, 0.75)
    sample(sample.data, replace=TRUE)
If we do this, R automatically gives us a sample of the same size as the original data sample, ie we obtain a sample of size 10 in this case.
Note that this is non-parametric as we are ignoring the $Exp(\lambda)$ assumption to obtain a new sample.
The following R code obtains $B = 1{,}000$ estimates $(\hat\lambda_1^*, \hat\lambda_2^*, \ldots, \hat\lambda_{1{,}000}^*)$ using $\hat\lambda_j^* = 1/\bar y_j^*$ and stores them in the vector estimate:
    set.seed(47)
    estimate <- rep(0, 1000)
    for (i in 1:1000) {
      x <- sample(sample.data, replace=TRUE)
      estimate[i] <- 1/mean(x)
    }
An alternative would be to use:
    set.seed(47)
    estimate <- replicate(1000, 1/mean(sample(sample.data, replace=TRUE)))
This gives us the following empirical sampling distribution of $\hat\lambda$:
[Figure not reproduced: empirical sampling distribution of $\hat\lambda$]
We can obtain estimates for the mean, standard error and 95% confidence interval of the estimator $\hat\lambda$ using the following R code:
    mean(estimate)
    sd(estimate)
    quantile(estimate, c(0.025,0.975))
7.3 Parametric bootstrap
If we are prepared to assume that the sample is considered to come from a given distribution, we first obtain an estimate of the parameter of interest $\hat\theta$ (eg using maximum likelihood, or method of moments). Then we use the assumed distribution, with parameter equal to $\hat\theta$, to draw the bootstrap samples. Once the bootstrap samples are available, we proceed as with the non-parametric method before.
Example
Using our sample of 10 values (to 2 DP) from an $Exp(\lambda)$ distribution with unknown parameter $\lambda$:
0.61 6.47 2.56 5.44 2.72 0.87 2.77 6.00 0.14 0.75
our estimate for $\lambda$ would be $\hat\lambda = 1/\bar y = 1/2.833 = 0.3530$.
We now use the $Exp(0.3530)$ distribution to generate the bootstrap samples.
Note that this is parametric as we are using the exponential distribution to obtain new samples.
We can use the following R code to obtain $B = 1{,}000$ estimates $(\hat\lambda_1^*, \hat\lambda_2^*, \ldots, \hat\lambda_{1{,}000}^*)$ using $\hat\lambda_j^* = 1/\bar y_j^*$ and store them in the vector param.estimate:
    set.seed(47)
    param.estimate <- rep(0, 1000)
    for (i in 1:1000) {
      x <- rexp(10, rate=1/mean(sample.data))
      param.estimate[i] <- 1/mean(x)
    }
An alternative would be to use:
    param.estimate <- replicate(1000, 1/mean(rexp(10, rate=1/mean(sample.data))))
This gives us the following empirical sampling distribution of $\hat\lambda$:
[Figure not reproduced: empirical sampling distribution of $\hat\lambda$]
Various inferences can then be made using the bootstrap estimates $(\hat\lambda_1^*, \hat\lambda_2^*, \ldots, \hat\lambda_B^*)$ as before.
Bootstrap methodology can also be used in other, more complicated, scenarios, for example in regression analysis or generalised linear model settings.
Chapter 8 Summary
Method of moments
The method of moments technique equates the population moments to the sample moments using the formulae:
1 parameter:
$$E(X) = \frac{1}{n}\sum_{i=1}^{n}X_i$$
2 parameters:
$$E(X) = \bar X \quad\text{and}\quad E(X^2) = \frac{1}{n}\sum_{i=1}^{n}X_i^2 \quad\text{or}\quad \text{var}(X) = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar X)^2$$
alternatively:
$$E(X) = \bar X \quad\text{and}\quad \text{var}(X) = S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar X)^2$$
Maximum likelihood estimation
The method of maximum likelihood has the following stages:
find the likelihood $L(\theta) = \prod_{i=1}^{n} f(x_i; \theta)$
find $\ln L$
find $\theta$ that solves $\dfrac{\partial}{\partial\theta}\ln L(\theta) = 0$
check for maximum: $\dfrac{\partial^2}{\partial\theta^2}\ln L(\theta) < 0$.
If the range of the distribution is a function of the parameter, the maximum must be found from first principles.
Properties of estimators
The bias of an estimator is given by $E[g(X)] - \theta$ where $g(X)$ is the estimator.
$g(X)$ is an unbiased estimator of $\theta$ if $E[g(X)] = \theta$.
The mean square error of an estimator is given by $E[(g(X) - \theta)^2]$ where $g(X)$ is the estimator.
An easier formula is $\text{var}[g(X)] + \text{bias}^2[g(X)]$.
An estimator is consistent if the mean square error tends to zero as $n$ tends to infinity, where $n$ is the size of the sample.
A good estimator has a small MSE, is unbiased and consistent.
The Cramér-Rao lower bound gives a lower bound for the variance of an unbiased estimator. It can be used to obtain confidence intervals. Its formula is:
$$CRLB(\theta) = -\frac{1}{E\left[\dfrac{\partial^2}{\partial\theta^2}\ln L(\theta, X)\right]}$$
The value of the CRLB depends on the parameter you are estimating. To use this formula, the likelihood must be expressed in terms of the correct parameter.
The asymptotic distribution of an MLE is:
$$\hat\theta \sim N(\theta, CRLB)$$
Chapter 8 Practice Questions
8.1
A random sample from a $Poisson(\mu)$ distribution is as follows:
4, 2, 7, 3, 1, 2, 5, 4, 0, 2
Calculate the method of moments estimate for $\mu$.
8.2
The heights of 10-year-old children are assumed to conform to a normal distribution. The heights of a random sample of 5 such children are:
124cm, 122cm, 130cm, 125cm and 132cm
Estimate the mean and variance of the heights of 10-year-old children using the method of moments.
8.3
Waiting times in a post office queue have an $Exp(\lambda)$ distribution. Ten people had waiting times (in minutes) of:
1.6 0.9 1.1 2.1 0.7 1.5 2.3 1.7 3.0 3.4
A further six people had waiting times of more than 4 minutes.
Calculate the maximum likelihood estimate of $\lambda$ based on these data.
8.4
Exam style
The number of claims arising in a year on a certain type of insurance policy has a Poisson distribution with parameter $\lambda$.
The insurer's claim file shows that claims were made on 238 policies during the last year with the following frequency distribution for the number of claims:
Number of claims    1     2     3     4     ≥5
Frequency           174   50    10    4     0
No information is available from the policy file, that is, only data concerning those policies on which claims were made can be used in the estimation of the claim rate $\lambda$. (This is why there is no entry for the number of claims being 0 in the table.)
(i) Show that the truncated probability function is given by:
$$P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!\,(1 - e^{-\lambda})}, \quad x = 1, 2, 3, \ldots$$ [3]
(ii) Show that both the method of moments estimate and the MLE of $\lambda$ satisfy the equation $\lambda = \bar x(1 - e^{-\lambda})$, where $\bar x$ is the mean number of claims for policies that have at least one claim. [7]
(iii) Solve this equation, by any means, for the given data and calculate the resulting estimate of $\lambda$ to two decimal places. [3]
(iv) Hence, estimate the percentage of all policies with no claims during the year. [1]
[Total 14]
8.5
Determine the mean square error of $\bar X$, which is used to estimate the mean $\mu$ of a $N(\mu, \sigma^2)$ distribution based on a random sample of $n$ observations.
8.6
Exam style
Suppose that unbiased estimators $X_1$ and $X_2$ of a parameter $\theta$ have been determined by two independent methods, and suppose that $\text{var}(X_1) = \sigma^2$ and that $\text{var}(X_2) = \phi\sigma^2$, where $\phi > 0$.
Let $Y$ be the combination given by $Y = \alpha X_1 + \beta X_2$, where $\alpha$ and $\beta$ denote non-negative weights.
(i) Derive the relationship satisfied by $\alpha$ and $\beta$ so that $Y$ is also an unbiased estimator of $\theta$. [2]
(ii) Determine the variance of $Y$ in terms of $\phi$ and $\sigma^2$ if, additionally, the weights are chosen such that the variance of $Y$ is a minimum. [4]
8.7
Exam style
A random sample $x_1, x_2, \ldots, x_n$ is taken from a population, which has the probability distribution function $F(x)$ and the density function $f(x)$. The values in the sample are arranged in order and the minimum and maximum values $x_{MIN}$ and $x_{MAX}$ are recorded.
(i) Show that the distribution function of $X_{MAX}$ is $[F(x)]^n$, and find a corresponding formula for the distribution function of $X_{MIN}$. [3]
The original distribution is now believed to be a $Pareto(\alpha, 1)$ distribution, ie the probability density function is:
$$f(x) = \frac{\alpha}{(1+x)^{\alpha+1}}, \quad x > 0$$
(ii) Determine the distribution function of $X$, and hence determine the distribution function of $X_{MAX}$. [2]
(iii) Show that the probability density function for the distribution of $X_{MIN}$ is:
$$f_{X_{MIN}}(x) = \frac{n\alpha}{(1+x)^{n\alpha+1}}, \quad x > 0$$ [2]
A random sample of 25 values gives a sample value for $x_{MIN}$ of 23.
(iv) Obtain a maximum likelihood estimate of $\alpha$ using the distribution of $X_{MIN}$. [3]
The same random sample gives a value of $x_{MAX}$ of 770.
(v) Obtain an equation for the maximum likelihood estimator of $\alpha$ using $x_{MAX}$. Comment on the difficulty of solving this equation. [3]
(vi) Outline what further information you would need here in order to obtain a method of moments estimate of $\alpha$. [1]
[Total 14]
8.8
A random sample of eight observations from a distribution is given below:
4.8 7.6 1.2 3.5 2.9 0.8 0.5 2.3
(i) Derive the method of moments estimates for:
(a) $\lambda$ from an $Exp(\lambda)$ distribution
(b) $\nu$ from a $\chi^2_\nu$ distribution.
(ii) Derive the method of moments estimators for:
(a) $k$ and $p$ from a Type 2 negative binomial distribution
(b) $\mu$ and $\sigma^2$ from a lognormal distribution.
8.9
Show that the likelihood that an observation from a $Poisson(\lambda)$ distribution takes an odd value (ie 1, 3, 5, ...) is $\frac{1}{2}(1 - e^{-2\lambda})$.
8.10
A discrete random variable has a probability function given by:
x           2                4                5
P(X = x)    1/8 + 2α         1/2 − 3α         3/8 + α
(i) Give the range of possible values for the unknown parameter $\alpha$.
A random sample of 30 observations gave respective frequencies of 7, 6 and 17.
(ii) Calculate the method of moments estimate of $\alpha$.
(iii) Write down an expression for the likelihood of these data and hence show that the maximum likelihood estimate $\hat\alpha$ satisfies the quadratic equation:
$$180\alpha^2 + \frac{111}{8}\alpha - \frac{91}{32} = 0$$
(iv) Hence determine the maximum likelihood estimate and explain why one root is rejected as a possible estimate of $\alpha$.
8.11
Exam style
A motor insurance portfolio produces claim incidence data for 100,000 policies over one year. The table below shows the observed number of policyholders making 0, 1, 2, 3, 4, 5, and 6 or more claims in a year.
No. of claims    No. of policies
0                87,889
1                11,000
2                1,000
3                100
4                10
5                1
≥6               –
Total            100,000
(i) (a) Estimate the parameter of the Poisson distribution to fit the above data using the method of moments.
(b) Hence calculate the expected number of policies giving rise to the different numbers of claims assuming the Poisson model. [3]
(ii) Show that the estimate of the Poisson parameter calculated from the above data using the method of moments is also the maximum likelihood estimate of this parameter. [4]
(iii) (a) Estimate the two parameters of the Type 2 negative binomial distribution to fit the above data using the method of moments.
(b) Hence calculate the expected number of policies giving rise to the different numbers of claims assuming a negative binomial model. [6]
You may use the relationship:
$$P(X = x) = \frac{k + x - 1}{x}\, q\, P(X = x - 1)$$
for the negative binomial distribution.
(iv) Explain briefly why you would expect a negative binomial distribution to fit the above data better than a Poisson distribution. [2]
[Total 15]
8.12
Exam style
A random sample $X_1, \ldots, X_n$ is taken from the normal distribution which has mean $\mu$ and variance $\sigma^2$.
(i) State the distribution of $\dfrac{\sum (X_i - \bar X)^2}{\sigma^2}$. [1]
It is decided to estimate the variance, $\sigma^2$, using the following estimator:
$$\hat\sigma^2 = \frac{1}{n+b}\sum (X_i - \bar X)^2$$
where $b$ is a constant.
(ii) (a) Use part (i) to obtain the bias of $\hat\sigma^2$.
(b) Hence, show that $\hat\sigma^2$ is unbiased when $b = -1$. [3]
(iii) (a) Show, using parts (i) and (ii)(a), that the mean square error of $\hat\sigma^2$ is given by:
$$MSE(\hat\sigma^2) = \frac{2(n-1) + (1+b)^2}{(n+b)^2}\,\sigma^4$$
(b) Determine whether the estimator, $\hat\sigma^2$, is consistent.
(c) Show that the mean square error of $\hat\sigma^2$ is minimised when $b = 1$. You may assume that the turning point is a minimum. [7]
(iv) Comment on the best choice for the value of $b$. [2]
[Total 13]
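Chapter 8 Solutions
8.1
The population mean for a $Poisson(\mu)$ distribution from page 7 of the Tables is $\mu$.
The sample mean is $\bar x = \dfrac{30}{10} = 3$.
Equating population mean to sample mean gives $\mu = 3$. Since this is an estimate of the true value of $\mu$, we write $\hat\mu = 3$.
8.2
The sample moments are:
$$\frac{1}{n}\sum_{i=1}^{n}x_i = \frac{633}{5} = 126.6 \quad\text{and}\quad \frac{1}{n}\sum_{i=1}^{n}x_i^2 = \frac{80{,}209}{5} = 16{,}041.8$$
The population moments are $E(X) = \mu$ and $E(X^2) = \text{var}(X) + [E(X)]^2 = \sigma^2 + \mu^2$. Equating the sample and population moments gives:
$$\hat\mu = 126.6$$
$$\hat\sigma^2 + \hat\mu^2 = 16{,}041.8 \quad\Rightarrow\quad \hat\sigma^2 = 14.24$$
Alternatively, using $\bar x = 126.6$ and $s^2 = \frac{1}{4}\left\{80{,}209 - 5 \times 126.6^2\right\} = 17.8$ and equating these to the population moments of $E(X) = \mu$ and $\text{var}(X) = \sigma^2$ gives:
$$\hat\mu = \bar x = 126.6 \quad\text{and}\quad \hat\sigma^2 = s^2 = 17.8$$
8.3
Using the likelihood formula given for censored data in Section 2.4:
$$L(\lambda) = \left[\prod_{i=1}^{10} f(x_i)\right] \times \left[P(X > 4)\right]^6 = \lambda^{10} e^{-\lambda\sum x_i} \times \left(e^{-4\lambda}\right)^6$$
since $f(x_i) = \lambda e^{-\lambda x_i}$ and $P(X > 4) = \int_4^\infty \lambda e^{-\lambda x}\,dx = \left[-e^{-\lambda x}\right]_4^\infty = e^{-4\lambda}$.
Taking logs:
$$\ln L(\lambda) = 10\ln\lambda - \lambda\sum x_i - 24\lambda$$
Since $\sum x_i = 18.3$ we get:
$$\ln L(\lambda) = 10\ln\lambda - 42.3\lambda$$
Differentiating:
$$\frac{d}{d\lambda}\ln L(\lambda) = \frac{10}{\lambda} - 42.3$$
This is equal to 0 when $\lambda = \dfrac{10}{42.3} = 0.2364$.
Differentiating again:
$$\frac{d^2}{d\lambda^2}\ln L(\lambda) = -\frac{10}{\lambda^2} < 0 \quad\Rightarrow\quad \text{max}$$
So the maximum likelihood estimate is $\hat\lambda = \dfrac{10}{42.3} = 0.2364$.
8.4
(i)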
Since only policies with claims are included, we must use a truncated Poisson distribution:
$$P(X = x) = k\,\frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 1, 2, 3, \ldots$$ [1]
where $k$ is the constant of proportionality to ensure that the sum of the probabilities is 1.
For the ordinary Poisson distribution:
$$\sum_{x=1}^{\infty} P(X = x) = P(X \ge 1) = 1 - P(X = 0) = 1 - e^{-\lambda}$$ [1]
So for our truncated probability function we need:
$$\sum_{x=1}^{\infty} k\,P(X = x) = 1 \quad\Rightarrow\quad k(1 - e^{-\lambda}) = 1 \quad\Rightarrow\quad k = \frac{1}{1 - e^{-\lambda}}$$ [1]
(ii)
We will first use the method of moments technique, so we need the mean of the truncated Poisson distribution:
$$E[X] = \sum_{x=1}^{\infty} x\,\frac{\lambda^x e^{-\lambda}}{x!\,(1 - e^{-\lambda})} = \frac{1}{1 - e^{-\lambda}}\sum_{x=0}^{\infty} x\,\frac{\lambda^x e^{-\lambda}}{x!}$$ [2]
since the $x = 0$ term is zero.
The sum is the mean of the Poisson distribution (found by summing $x \times PF$), so we get:
$$E[X] = \frac{1}{1 - e^{-\lambda}}\sum_{x=0}^{\infty} x\,\frac{\lambda^x e^{-\lambda}}{x!} = \frac{\lambda}{1 - e^{-\lambda}}$$ [1]
So the method of moments equation is $\bar x = \dfrac{\lambda}{1 - e^{-\lambda}}$ or $\lambda = \bar x(1 - e^{-\lambda})$, as required.
The likelihood function is:
$$L(\lambda) = \prod_{i=1}^{n}\frac{\lambda^{x_i} e^{-\lambda}}{x_i!\,(1 - e^{-\lambda})} = \text{constant} \times \frac{\lambda^{\sum x_i} e^{-n\lambda}}{(1 - e^{-\lambda})^n}$$ [1]
where the constant incorporates the factorial factor.
Taking logs:
$$\log L(\lambda) = \text{constant} + \sum x_i \log\lambda - n\lambda - n\log(1 - e^{-\lambda})$$ [1]
Differentiating with respect to $\lambda$:
$$\frac{d}{d\lambda}\log L = \frac{\sum x_i}{\lambda} - n - \frac{n e^{-\lambda}}{1 - e^{-\lambda}} = \frac{n\bar x}{\lambda} - \frac{n}{1 - e^{-\lambda}} = \frac{n\bar x(1 - e^{-\lambda}) - n\lambda}{\lambda(1 - e^{-\lambda})}$$
Equating to zero gives $\bar x(1 - e^{-\lambda}) = \lambda$ as required. [2]
(iii)
From the data:
$$\bar x = \frac{174 \times 1 + 50 \times 2 + 10 \times 3 + 4 \times 4}{238} = \frac{320}{238}$$
So $\dfrac{320}{238} = \dfrac{\lambda}{1 - e^{-\lambda}}$ or $\dfrac{\lambda}{1 - e^{-\lambda}} - \dfrac{320}{238} = 0$. [1]
Using trial and error on the second equation we get:
$\lambda = 0.6 \Rightarrow LHS = -0.0147$
$\lambda = 0.7 \Rightarrow LHS = 0.0460$
Using linear interpolation:
$$\hat\lambda = 0.6 + \frac{0 - (-0.0147)}{0.0460 - (-0.0147)} \times (0.7 - 0.6) = 0.624$$ [2]
Alternatively we could use a systematic method such as Newton-Raphson.
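As one possible systematic approach for part (iii), the equation can be solved numerically in R using the base function `uniroot`. This is a minimal sketch; the function name `g` and the variable names are our own, and the bracketing interval is taken from the trial values above.

```r
x.bar <- 320 / 238   # mean number of claims among policies with at least one claim

# Difference between the two sides of lambda / (1 - exp(-lambda)) = x.bar
g <- function(lambda) lambda / (1 - exp(-lambda)) - x.bar

# The trial values above show that the root lies between 0.6 and 0.7
lambda.hat <- uniroot(g, interval = c(0.6, 0.7), tol = 1e-10)$root
round(lambda.hat, 2)   # 0.62
```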
(iv)
Now $P(X = 0) = e^{-\lambda}$. By the invariance property, the maximum likelihood estimate of this probability is:
$$e^{-\hat\lambda} = e^{-0.624} = 0.536$$
So we estimate that 54% of policies have no claims. [1]
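8.5
The MSE is given by:
$$MSE(\bar X) = \text{var}(\bar X) + \text{bias}^2(\bar X)$$
where:
$$\text{bias}(\bar X) = E(\bar X) - \mu$$
When $X \sim N(\mu, \sigma^2)$ we have $\bar X \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$ so $E(\bar X) = \mu$ and $\text{var}(\bar X) = \dfrac{\sigma^2}{n}$.
Hence:
$$\text{bias}(\bar X) = \mu - \mu = 0$$
Therefore:
$$MSE(\bar X) = \text{var}(\bar X) + 0^2 = \frac{\sigma^2}{n}$$
8.6
(i)
Since $X_1$ and $X_2$ are unbiased estimators of $\theta$ this means that:
$$E[X_1] = E[X_2] = \theta$$
$$\Rightarrow\quad E(Y) = E(\alpha X_1 + \beta X_2) = \alpha E(X_1) + \beta E(X_2) = (\alpha + \beta)\theta$$ [1]
Hence, if $Y$ is unbiased for $\theta$, then $\alpha + \beta = 1$. [1]
(ii)
Now we have $\text{var}(X_1) = \sigma^2$ and $\text{var}(X_2) = \phi\sigma^2$. Since $X_1$ and $X_2$ are independent:
$$\text{var}(Y) = \text{var}(\alpha X_1 + \beta X_2) = \alpha^2\,\text{var}(X_1) + \beta^2\,\text{var}(X_2) = \sigma^2\left[\alpha^2 + (1-\alpha)^2\phi\right]$$ [1]
To obtain the minimum, we set the derivative equal to zero:
$$\frac{d}{d\alpha}\text{var}(Y) = \sigma^2\left[2\alpha - 2(1-\alpha)\phi\right] = 0 \quad\Rightarrow\quad \alpha = (1-\alpha)\phi \quad\Rightarrow\quad \alpha = \frac{\phi}{1+\phi}$$ [1]
Checking it's a minimum:
$$\frac{d^2}{d\alpha^2}\text{var}(Y) = \sigma^2\left[2 + 2\phi\right] > 0 \quad\Rightarrow\quad \text{min}$$ [1]
So:
$$\text{var}(Y) = \sigma^2\left[\left(\frac{\phi}{1+\phi}\right)^2 + \left(\frac{1}{1+\phi}\right)^2\phi\right] = \sigma^2\,\frac{\phi^2 + \phi}{(1+\phi)^2} = \frac{\phi\sigma^2}{1+\phi}$$ [1]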
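The result for Question 8.6 can be checked numerically in R with the base function `optimize`. This is a sketch only; the values `phi <- 3` and `sigma2 <- 1` are arbitrary assumptions chosen for illustration, and the function name `var.y` is our own.

```r
phi <- 3; sigma2 <- 1   # illustrative values (assumptions, not from the question)

# Variance of Y = alpha*X1 + (1 - alpha)*X2 as a function of the weight alpha
var.y <- function(a) sigma2 * (a^2 + (1 - a)^2 * phi)

opt <- optimize(var.y, interval = c(0, 1))

# Numerical minimiser and minimum against the formulae derived above
c(alpha.numeric = opt$minimum,   alpha.formula = phi / (1 + phi))
c(var.numeric   = opt$objective, var.formula   = phi * sigma2 / (1 + phi))
```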
8.7
(i)
Consider the value of $X_{MAX}$. This will be less than some value $x$, say, if and only if all the sample values are less than $x$. The probability of this happening is just $[F(x)]^n$. So:
$$F_{X_{MAX}}(x) = [F(x)]^n$$ [1]
Using similar logic, $X_{MIN}$ will be greater than some number $x$ if and only if all the sample values are greater than $x$. So:
$$P(X_{MIN} > x) = P(\text{all } X_i > x) = [1 - F(x)]^n$$
$$\Rightarrow\quad F_{X_{MIN}}(x) = 1 - [1 - F(x)]^n$$ [2]
(ii)
The distribution function of $X$ is given by:
$$F(x) = \int_0^x f(t)\,dt = \int_0^x \alpha(1+t)^{-\alpha-1}\,dt = \left[-(1+t)^{-\alpha}\right]_0^x = 1 - (1+x)^{-\alpha}$$ [1]
where $x > 0$.
Hence $F_{X_{MAX}}(x) = [F(x)]^n = \left[1 - (1+x)^{-\alpha}\right]^n$. [1]
(iii)
Similarly:
$$F_{X_{MIN}}(x) = 1 - [1 - F(x)]^n = 1 - \left[(1+x)^{-\alpha}\right]^n = 1 - (1+x)^{-n\alpha}, \quad x > 0$$ [1]
This has the same form as the original distribution function, so $X_{MIN}$ has the Pareto distribution with parameters $n\alpha$ and 1. So the density function of $X_{MIN}$ is:
$$f_{X_{MIN}}(x) = \frac{n\alpha}{(1+x)^{n\alpha+1}}, \quad x > 0$$ [1]
(iv)
The likelihood function for $\alpha$, based on a single value of $X_{MIN}$, is:
$$L(\alpha) = \frac{n\alpha}{(1+x)^{n\alpha+1}}$$
$$\Rightarrow\quad \log L(\alpha) = \log n + \log\alpha - (n\alpha + 1)\log(1+x)$$
$$\Rightarrow\quad \frac{\partial}{\partial\alpha}\log L(\alpha) = \frac{1}{\alpha} - n\log(1+x)$$
$$\Rightarrow\quad \hat\alpha = \frac{1}{n\log(1+x)}$$ [2]
Substituting in $n = 25$ and $x = 23$, we get $\hat\alpha = 0.01259$. [1]
(v)
Applying the same approach to $X_{MAX}$, we have (using the derivative of $F_{X_{MAX}}(x)$ from earlier) a likelihood function of:
$$L(\alpha) = f_{X_{MAX}}(x) = n\left[1 - (1+x)^{-\alpha}\right]^{n-1}\alpha(1+x)^{-\alpha-1}$$
$$\Rightarrow\quad \log L(\alpha) = \log n + (n-1)\log\left[1 - (1+x)^{-\alpha}\right] + \log\alpha - (\alpha+1)\log(1+x)$$
$$\Rightarrow\quad \frac{\partial}{\partial\alpha}\log L(\alpha) = (n-1)\frac{(1+x)^{-\alpha}\log(1+x)}{1 - (1+x)^{-\alpha}} + \frac{1}{\alpha} - \log(1+x) = 0$$ [2]
Substituting in $n = 25$ and $x = 770$ we get:
$$24\,\frac{771^{-\alpha}\log 771}{1 - 771^{-\alpha}} + \frac{1}{\alpha} - \log 771 = 0$$ [1]
This equation cannot be solved algebraically. A numerical method will be needed to solve it.
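To illustrate the kind of numerical method that could be used for part (v), the equation can be passed to R's base root-finder `uniroot`. This is a sketch, not part of the model solution; the function name `h` and the search interval `c(0.1, 2)` are our own assumptions. The interval works because the left-hand side is positive at 0.1, negative at 2 and decreasing in $\alpha$.

```r
n <- 25; x.max <- 770

# Left-hand side of the likelihood equation for alpha based on x.max
h <- function(a) {
  u <- (1 + x.max)^(-a)
  (n - 1) * u * log(1 + x.max) / (1 - u) + 1 / a - log(1 + x.max)
}

# h is positive at 0.1 and negative at 2, so the root is bracketed
alpha.hat <- uniroot(h, interval = c(0.1, 2), tol = 1e-10)$root
alpha.hat
```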
(vi)
We cannot use the usual method of moments approach unless we know all the individual sample values (or at least the mean of the sample). So we do not have sufficient information to use the method of moments approach here.
[1]
8.8
(i)
Method of moments (one unknown)
We have one unknown and so require only one equation:
$$E(X) = \bar x = \frac{1}{n}\sum x_i$$
Here we have $\bar x = 2.95$.
(i)(a)
Exponential
Using the formula for the mean of an exponential distribution:
$$\frac{1}{\lambda} = 2.95 \quad\Rightarrow\quad \hat\lambda = 0.33898$$
(i)(b)
Chi-square
Since $\chi^2_\nu = Gamma\left(\frac{\nu}{2}, \frac{1}{2}\right)$, we get $E(X) = \dfrac{\alpha}{\lambda} = \dfrac{\nu/2}{1/2} = \nu$. Hence $\hat\nu = 2.95$.
(ii)
Method of moments (two unknowns)
We have two unknowns and so require two equations.
Either:
$$E(X) = \bar x = \frac{1}{n}\sum x_i \quad\text{and}\quad E(X^2) = \frac{1}{n}\sum x_i^2$$
or:
$$E(X) = \bar x = \frac{1}{n}\sum x_i \quad\text{and}\quad \text{var}(X) = s^2$$
For our data we have $\bar x = 2.95$, $\frac{1}{8}\sum x_i^2 = 13.635$ and $s^2 = 5.6371$.
(ii)(a)
Negative binomial
Using the first method gives:
$$E(X) = \frac{k(1-p)}{p} = 2.95$$
$$E(X^2) = \text{var}(X) + [E(X)]^2 = \frac{k(1-p)}{p^2} + \left(\frac{k(1-p)}{p}\right)^2 = 13.635$$
Substituting the first equation into the second gives:
$$\frac{2.95}{p} + 2.95^2 = 13.635 \quad\Rightarrow\quad \frac{2.95}{p} = 4.9325 \quad\Rightarrow\quad \hat p = 0.59807$$
Hence, substituting this back into the first equation gives $\hat k = 4.3896$.
Using the second method gives:
$$E(X) = \frac{k(1-p)}{p} = 2.95 \quad\text{and}\quad \text{var}(X) = \frac{k(1-p)}{p^2} = 5.6371$$
Substituting the first equation into the second gives:
$$\frac{2.95}{p} = 5.6371 \quad\Rightarrow\quad \hat p = 0.52331$$
Hence, substituting this back into the first equation gives $\hat k = 3.2386$.
(ii)(b)
Lognormal
Using the first method gives:
$$E(X) = e^{\mu + \frac{1}{2}\sigma^2} = 2.95 \quad\text{and}\quad E(X^2) = e^{2\mu + 2\sigma^2} = 13.635$$
Rewriting the second equation gives:
$$\left(e^{\mu + \frac{1}{2}\sigma^2}\right)^2 e^{\sigma^2} = 2.95^2\,e^{\sigma^2} = 13.635 \quad\Rightarrow\quad \hat\sigma^2 = 0.44903$$
Substituting this into the first equation gives $\hat\mu = 0.85729$.
Using the second method gives:
$$E(X) = e^{\mu + \frac{1}{2}\sigma^2} = 2.95 \quad\text{and}\quad \text{var}(X) = e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right) = 5.6371$$
Substituting the first equation into the second gives:
$$2.95^2\left(e^{\sigma^2} - 1\right) = 5.6371 \quad\Rightarrow\quad \hat\sigma^2 = 0.49942$$
Hence, substituting this into the first equation gives $\hat\mu = 0.83210$.
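The estimates from the first method (matching $E(X)$ and $E(X^2)$) can be reproduced in R. This is a minimal sketch using the eight observations from the question; all variable names are our own.

```r
x <- c(4.8, 7.6, 1.2, 3.5, 2.9, 0.8, 0.5, 2.3)
m1 <- mean(x)      # first sample moment
m2 <- mean(x^2)    # second non-central sample moment

# (i) one-parameter cases
lambda.hat <- 1 / m1   # Exp(lambda): E(X) = 1/lambda
nu.hat     <- m1       # chi-square with nu degrees of freedom: E(X) = nu

# (ii)(a) Type 2 negative binomial: k(1-p)/p = m1 and k(1-p)/p^2 = m2 - m1^2
p.hat <- m1 / (m2 - m1^2)
k.hat <- m1 * p.hat / (1 - p.hat)

# (ii)(b) lognormal: exp(sigma^2) = m2 / m1^2 and mu = log(m1) - sigma^2 / 2
sigma2.hat <- log(m2 / m1^2)
mu.hat     <- log(m1) - sigma2.hat / 2

round(c(lambda = lambda.hat, nu = nu.hat, p = p.hat, k = k.hat,
        sigma2 = sigma2.hat, mu = mu.hat), 5)
```

Replacing `m2 - m1^2` with `var(x)` gives the second method, which uses the sample variance $s^2$ instead.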
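8.9
Defining $X$ to be an observation from a $Poisson(\lambda)$ distribution, we have:
$$P(X = 1) + P(X = 3) + P(X = 5) + \cdots = e^{-\lambda}\left[\lambda + \frac{\lambda^3}{3!} + \frac{\lambda^5}{5!} + \cdots\right]$$
To sum the series in the square bracket, note that:
$$e^{\lambda} = 1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \cdots$$
$$e^{-\lambda} = 1 - \lambda + \frac{\lambda^2}{2!} - \frac{\lambda^3}{3!} + \cdots$$
So:
$$\frac{1}{2}\left(e^{\lambda} - e^{-\lambda}\right) = \lambda + \frac{\lambda^3}{3!} + \cdots$$
which is the required series.
So the required probability is:
$$e^{-\lambda} \times \frac{1}{2}\left(e^{\lambda} - e^{-\lambda}\right) = \frac{1}{2}\left(1 - e^{-2\lambda}\right)$$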
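8.10
(i)
Range of values
Since $0 \le P(X = x) \le 1$, using this for each of the probabilities gives lower bounds for $\alpha$ of $-\frac{1}{16}$, $-\frac{1}{6}$ and $-\frac{3}{8}$. Hence, $\alpha \ge -\frac{1}{16}$.
We also obtain upper bounds for $\alpha$ of $\frac{7}{16}$, $\frac{1}{6}$ and $\frac{5}{8}$. Hence $\alpha \le \frac{1}{6}$.
(ii)
Method of moments estimator
We have one unknown, so we will use $E(X) = \bar x$.
$$E(X) = 2\left(\tfrac{1}{8} + 2\alpha\right) + 4\left(\tfrac{1}{2} - 3\alpha\right) + 5\left(\tfrac{3}{8} + \alpha\right) = \tfrac{33}{8} - 3\alpha$$
From the data, we have:
$$\bar x = \frac{7 \times 2 + 6 \times 4 + 17 \times 5}{30} = \frac{123}{30} = 4.1$$
Therefore:
$$\tfrac{33}{8} - 3\hat\alpha = 4.1 \quad\Rightarrow\quad \hat\alpha = 0.0083$$
This value lies between the limits derived in part (i).
(iii)
Maximum likelihood
The likelihood of obtaining the observed results is:
$$L(\alpha) = \text{constant} \times \left(\tfrac{1}{8} + 2\alpha\right)^7\left(\tfrac{1}{2} - 3\alpha\right)^6\left(\tfrac{3}{8} + \alpha\right)^{17}$$
Taking logs and differentiating:
$$\ln L(\alpha) = \text{constant} + 7\ln\left(\tfrac{1}{8} + 2\alpha\right) + 6\ln\left(\tfrac{1}{2} - 3\alpha\right) + 17\ln\left(\tfrac{3}{8} + \alpha\right)$$
$$\Rightarrow\quad \frac{d}{d\alpha}\ln L(\alpha) = \frac{14}{\frac{1}{8} + 2\alpha} - \frac{18}{\frac{1}{2} - 3\alpha} + \frac{17}{\frac{3}{8} + \alpha}$$
Equating this to zero to find the maximum value of $\alpha$ gives:
$$\frac{14}{\frac{1}{8} + 2\alpha} - \frac{18}{\frac{1}{2} - 3\alpha} + \frac{17}{\frac{3}{8} + \alpha} = 0$$
$$\Rightarrow\quad 14\left(\tfrac{1}{2} - 3\alpha\right)\left(\tfrac{3}{8} + \alpha\right) - 18\left(\tfrac{1}{8} + 2\alpha\right)\left(\tfrac{3}{8} + \alpha\right) + 17\left(\tfrac{1}{8} + 2\alpha\right)\left(\tfrac{1}{2} - 3\alpha\right) = 0$$
$$\Rightarrow\quad 14\left(\tfrac{3}{16} - \tfrac{5}{8}\alpha - 3\alpha^2\right) - 18\left(\tfrac{3}{64} + \tfrac{7}{8}\alpha + 2\alpha^2\right) + 17\left(\tfrac{1}{16} + \tfrac{5}{8}\alpha - 6\alpha^2\right) = 0$$
$$\Rightarrow\quad 180\alpha^2 + \tfrac{111}{8}\alpha - \tfrac{91}{32} = 0$$
(iv)
MLE
Solving the quadratic equation gives:
$$\alpha = \frac{-\frac{111}{8} \pm \sqrt{\left(\frac{111}{8}\right)^2 - 4 \times 180 \times \left(-\frac{91}{32}\right)}}{360} = -0.170,\; 0.0929$$
The maximum likelihood estimate is 0.0929.
The other solution of $-0.170$ does not lie between the bounds calculated in (i). It is not feasible as it is less than the smallest possible value for $\alpha$ of $-0.0625$.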
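The roots in part (iv) and the feasibility check against the range from part (i) can be reproduced in R. This is a small sketch; the names `qa`, `qb`, `qc`, `roots` and `feasible` are our own.

```r
# Coefficients of 180 a^2 + (111/8) a - 91/32 = 0
qa <- 180; qb <- 111 / 8; qc <- -91 / 32

# Quadratic formula, giving the smaller root first
roots <- (-qb + c(-1, 1) * sqrt(qb^2 - 4 * qa * qc)) / (2 * qa)
round(roots, 4)

# Keep only roots inside the admissible range -1/16 <= alpha <= 1/6 from part (i)
feasible <- roots[roots >= -1/16 & roots <= 1/6]
round(feasible, 4)
```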
8.11
(i)(a)
Method of moments estimate of Poisson parameter
The sample mean is:
$$\frac{1}{100{,}000}\left(87{,}889 \times 0 + 11{,}000 \times 1 + 1{,}000 \times 2 + \cdots\right) = 0.13345$$
The mean of the $Poi(\lambda)$ distribution is $\lambda$.
So the method of moments estimate of $\lambda$ is 0.13345. [1]
(i)(b)
Expected results using method of moments (Poisson)
For the Poisson distribution, probabilities can be calculated iteratively using the relationship:
$$P(X = x) = \frac{\lambda}{x}P(X = x - 1), \quad x = 1, 2, 3, \ldots$$
The expected numbers, based on this estimate, are:
$x = 0$: $100{,}000\,e^{-0.13345} = 87{,}507$
$x = 1$: $0.13345 \times 87{,}507 = 11{,}678$
$x = 2$: $\frac{0.13345}{2} \times 11{,}678 = 779$
$x = 3$: $\frac{0.13345}{3} \times 779 = 35$
$x = 4$: $\frac{0.13345}{4} \times 35 = 1$
$x = 5$: $\frac{0.13345}{5} \times 1 = 0$
$x \ge 6$: $100{,}000 - 87{,}507 - 11{,}678 - 779 - 35 - 1 - 0 = 0$ [2]
(ii)
MLE (Poisson)
The likelihood of obtaining $n_0$ 0's, $n_1$ 1's etc (making a total of $n$), assuming the numbers conform to a Poisson distribution, is the multinomial probability:
$$L(\lambda) = \frac{n!}{n_0!\,n_1!\,n_2!\cdots}\left(e^{-\lambda}\right)^{n_0}\left(\lambda e^{-\lambda}\right)^{n_1}\left(\frac{\lambda^2 e^{-\lambda}}{2!}\right)^{n_2}\cdots$$
$$= \text{constant} \times \lambda^{n_1 + 2n_2 + 3n_3 + \cdots}\,e^{-\lambda(n_0 + n_1 + n_2 + \cdots)}$$
$$= \text{constant} \times \lambda^{13{,}345}\,e^{-100{,}000\lambda}$$ [1]
So the log likelihood is:
$$\ln L(\lambda) = 13{,}345\ln\lambda - 100{,}000\lambda + \text{constant}$$ [1]
Differentiating with respect to $\lambda$ to maximise this:
$$\frac{d}{d\lambda}\ln L(\lambda) = \frac{13{,}345}{\lambda} - 100{,}000$$ [1]
This is zero when:
$$\hat\lambda = 13{,}345/100{,}000 = 0.13345$$ [1]
Since the second derivative is negative, this is the maximum likelihood estimate of $\lambda$. It is the same as the method of moments estimate.
(iii)(a)
Method of moments estimate of negative binomial parameters
The second (non-central) sample moment for the data is:
$$\frac{1}{100{,}000}\left(87{,}889 \times 0^2 + 11{,}000 \times 1^2 + 1{,}000 \times 2^2 + \cdots\right) = 0.16085$$
The mean and second non-central moment of the negative binomial distribution with parameters $k$ and $p$ are $\dfrac{kq}{p}$ and $\dfrac{kq}{p^2} + \left(\dfrac{kq}{p}\right)^2$.
So the method of moments estimators of $k$ and $p$ satisfy the equations:
$$\frac{kq}{p} = 0.13345 \quad\text{and}\quad \frac{kq}{p^2} + \left(\frac{kq}{p}\right)^2 = 0.16085$$ [2]
From the second equation:
$$\frac{kq}{p^2} = 0.16085 - \left(\frac{kq}{p}\right)^2 = 0.16085 - (0.13345)^2 = 0.14304$$ [1/2]
Using the first equation gives:
$$\hat p = \frac{0.13345}{0.14304} = 0.93295$$ [1/2]
$$\hat q = 1 - \hat p = 1 - 0.93295 = 0.06705$$ [1/2]
and
$$\hat k = \frac{0.13345 \times 0.93295}{0.06705} = 1.8569$$ [1/2]
(iii)(b)
Expected results using method of moments (negative binomial)
The expected numbers, based on these estimates, are:
$x = 0$: $100{,}000(0.93295)^{1.8569} = 87{,}909$
$x = 1$: $\frac{1.8569}{1} \times 0.06705 \times 87{,}909 = 10{,}945$
$x = 2$: $\frac{2.8569}{2} \times 0.06705 \times 10{,}945 = 1{,}048$
$x = 3$: $\frac{3.8569}{3} \times 0.06705 \times 1{,}048 = 90$
$x = 4$: $\frac{4.8569}{4} \times 0.06705 \times 90 = 7$
$x = 5$: $\frac{5.8569}{5} \times 0.06705 \times 7 = 1$
$x \ge 6$: $100{,}000 - 87{,}909 - 10{,}945 - 1{,}048 - 90 - 7 - 1 = 0$
We have made use of the negative binomial recursive relationship given in the question. [2]
(iv)
Why negative binomial is a better fit
For a Poisson distribution, the mean and variance are the same. Since the sample mean and variance (which, for a sample as large as this, should be very close to the true values) are 0.13345 and 0.14304, which differ significantly, this suggests that the Poisson distribution may not be a suitable model here. [1]
The negative binomial distribution has more flexibility and can accommodate different values for the mean and variance (provided the variance exceeds the mean). [1]
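The expected frequencies in parts (i)(b) and (iii)(b) of Question 8.11 can be compared side by side in R. This is a sketch only: it uses R's built-in `dpois` and `dnbinom` functions rather than the recursive relationships, so the figures may differ from the hand calculation by a unit or so because the hand calculation rounds at each step. The `size`/`prob` form of `dnbinom` counts failures before the `size`-th success, which is assumed here to correspond to the Type 2 negative binomial with parameters $k$ and $p$.

```r
n.pol  <- 100000
counts <- 0:5

# Poisson model with the method of moments / maximum likelihood estimate
lambda   <- 0.13345
pois.exp <- n.pol * dpois(counts, lambda)

# Type 2 negative binomial with the method of moments estimates
k <- 1.8569; p <- 0.93295
nb.exp <- n.pol * dnbinom(counts, size = k, prob = p)

observed <- c(87889, 11000, 1000, 100, 10, 1)
round(cbind(claims = counts, observed, poisson = pois.exp, neg.binomial = nb.exp))
```

The negative binomial column tracks the observed counts more closely in the tail, which is consistent with the comment in part (iv).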
8.13
(i)
Page 61
Distribution
Usingthe result given on page 22 of the Tables:
= ? i -XX() 2
-(1 )nS2
ss
(ii)(a)
2
? ?n1
22
[1]
Bias
The bias of 2s is given by bi
?
()
22()as
ssE
2
=-s
. From part (i) we have:
XXi
() 2??s
??=-En (1)
2
nb
Since (+=s
[1/2]
??
??
, we have:
? Xi - X)
22()
nbs
() 2??+
s
??=- En
2
+nb
()
?
s
(1)
??
??
2
[] (En=- 1)
s
2
? E[]22
n -(1)
=
[1]
ss
+nb
()
Therefore the bias is given by:
(1)
bias()22
(ii)(b)
ss
nb
()
=-
s
2
= -
(1-+ nb)
()
++ nb
s
2
[1/2]
Unbiased
Substituting =-1binto the bias gives:
bias()22
Hence,
s2
(1
=-
-
1)
ss
n-(1)
=0
is an unbiased estimator
[1]
of
s2
when
=-1b
.
(iii)(a) Mean square error
The mean square error of $\hat{\sigma}^2$ is given by $MSE(\hat{\sigma}^2) = var(\hat{\sigma}^2) + bias^2(\hat{\sigma}^2)$.
From part (i) we have:
$$var\left[\frac{\sum (X_i - \bar{X})^2}{\sigma^2}\right] = 2(n-1)$$
[1]
Since $\sum (X_i - \bar{X})^2 = (n+b)\hat{\sigma}^2$, we have:
$$var\left[\frac{(n+b)\hat{\sigma}^2}{\sigma^2}\right] = 2(n-1) \;\Rightarrow\; \frac{(n+b)^2}{\sigma^4}\,var[\hat{\sigma}^2] = 2(n-1) \;\Rightarrow\; var[\hat{\sigma}^2] = \frac{2(n-1)}{(n+b)^2}\,\sigma^4$$
[1]
Using this and the bias from (ii)(a), the mean square error is given by:
$$MSE(\hat{\sigma}^2) = \frac{2(n-1)}{(n+b)^2}\,\sigma^4 + \frac{(1+b)^2}{(n+b)^2}\,\sigma^4 = \frac{2(n-1) + (1+b)^2}{(n+b)^2}\,\sigma^4$$
[1]
(iii)(b) Consistent
As $n \to \infty$, the mean square error becomes:
$$MSE(\hat{\sigma}^2) \to 0$$
So $\hat{\sigma}^2$ is consistent.
[1]
(iii)(c) Minimum mean square error
Differentiating with respect to $b$ using the quotient rule gives:
$$\frac{d}{db} MSE(\hat{\sigma}^2) = \frac{2(1+b)(n+b)^2 - [2(n-1) + (1+b)^2]\,2(n+b)}{(n+b)^4}\,\sigma^4$$
[2]
Substituting $b = 1$ into this expression gives:
$$\left.\frac{d}{db} MSE(\hat{\sigma}^2)\right|_{b=1} = \frac{4(n+1)^2 - [2(n-1) + 4]\,2(n+1)}{(n+1)^4}\,\sigma^4 = \frac{4(n+1)^2 - 4(n+1)^2}{(n+1)^4}\,\sigma^4 = 0$$
So the MSE is minimised when $b = 1$.
[1]
Alternatively, we could attempt to find the value of $b$ that makes this zero as follows:
$$2(1+b)(n+b)^2 = [2(n-1) + (1+b)^2]\,2(n+b)$$
$$\Rightarrow (1+b)(n+b) = 2(n-1) + (1+b)^2$$
$$\Rightarrow n + b + nb + b^2 = 2n - 1 + 2b + b^2$$
$$\Rightarrow b(n-1) = n - 1$$
$$\Rightarrow b = 1$$
(iv) Best estimator
All values of $b$ give consistent estimators.
When $b = -1$, the estimator $\frac{1}{n-1}\sum (X_i - \bar{X})^2$ is unbiased, whereas when $b = 1$, the estimator $\frac{1}{n+1}\sum (X_i - \bar{X})^2$ has the smallest MSE, but it is biased.
Since a smaller MSE is more important than being unbiased, we should choose $b = 1$.
[1]
However, there will be little difference between the estimators when $n$ is large as the mean square errors and biases both tend to zero.
[1]
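The conclusions of parts (ii) and (iii) can be checked numerically. This is a minimal sketch, assuming the bias and MSE formulae derived above; the function names and the sample size n = 10 are illustrative assumptions, not part of the question.

```r
# MSE of sigma-hat^2 = sum((X_i - Xbar)^2) / (n + b), in units of sigma^4
mse_factor <- function(b, n) (2 * (n - 1) + (1 + b)^2) / (n + b)^2

# Bias of the same estimator, in units of sigma^2
bias_factor <- function(b, n) -(1 + b) / (n + b)

n <- 10  # illustrative sample size

# Numerical minimisation of the MSE over b
best <- optimize(mse_factor, interval = c(-0.9, 5), n = n)
best$minimum

# Compare the divisors n - 1, n and n + 1
sapply(c(-1, 0, 1), mse_factor, n = n)
sapply(c(-1, 0, 1), bias_factor, n = n)
```

The numerical minimiser should sit very close to b = 1, and the bias should be zero only for b = -1, in line with the algebra.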
CS1-09: Confidence intervals
Confidence intervals and prediction intervals
Syllabus objectives
3.2 Confidence intervals
3.2.1 Define in general terms a confidence interval for an unknown parameter of a distribution based on a random sample.
3.2.2 Define in general terms a prediction interval for a future observation based on a random sample.
3.2.3 Derive a confidence interval for an unknown parameter using a given sampling distribution.
3.2.4 Calculate confidence intervals for the mean and the variance of a normal distribution.
3.2.5 Calculate confidence intervals for a binomial probability and a Poisson mean, including the use of the normal approximation in both cases.
3.2.6 Calculate confidence intervals for two-sample situations involving the normal distribution, and the binomial and Poisson distributions using the normal approximation.
3.2.7 Calculate confidence intervals for a difference between two means from paired data.
3.2.8 Use the bootstrap method to obtain confidence intervals.
0 Introduction
In the previous chapter we used the method of moments and the method of maximum likelihood to obtain estimates for the population parameter(s). For example, we might have the following numbers of claims from a certain portfolio that we receive in 100 different monthly periods:
Claims      0   1   2   3   4   5   6
Frequency   9  22  26  21  13   6   3
Assuming a Poisson distribution with parameter $\mu$ for the number of claims in a month, our estimate of $\mu$ using the methods given in the previous chapter would be $\hat{\mu} = \bar{x} = 2.37$.
The problem is that this might not be the correct value of $\mu$. In this chapter we look at constructing confidence intervals that have a high probability of containing the correct value. For example, a 95% confidence interval for $\mu$ means that there is a 95% probability that it contains the true value of $\mu$.
Confidence intervals will be constructed using the sampling distributions given in Chapter 7. For example, when sampling from a $N(\mu, \sigma^2)$ distribution where $\sigma^2$ is known:
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \;\Rightarrow\; Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$$
[Figure: standard normal density with the central 95% lying between $z_1$ and $z_2$ and 2½% in each tail]
If we require a 95% confidence interval, then we can read off the upper 2.5% point of the standard normal distribution from page 162 of the Tables to get +1.96. We can then use the symmetry of the standard normal distribution to deduce that the lower 2.5% point is -1.96.
It is important to realise that the formula for the endpoints of this interval contains $\bar{X}$, and so the endpoints are random variables. We can obtain numerical values for these endpoints by collecting some sample data and replacing $\bar{X}$ by the observed sample mean $\bar{x}$. Naturally, different samples may lead to different endpoints. If we sample repeatedly, 95% of the intervals we obtain should contain the true value of $\mu$.
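The repeated-sampling interpretation can be illustrated by simulation. The following is a minimal sketch; the values of the true mean, standard deviation, sample size and number of repetitions are arbitrary assumptions chosen for illustration.

```r
set.seed(1)

mu    <- 5      # true mean (assumed for the simulation)
sigma <- 2      # known standard deviation (assumed)
n     <- 25     # sample size (assumed)

# For each simulated sample, build the 95% interval and record
# whether it contains the true mean
covered <- replicate(10000, {
  x <- rnorm(n, mean = mu, sd = sigma)
  half_width <- qnorm(0.975) * sigma / sqrt(n)
  (mean(x) - half_width < mu) && (mu < mean(x) + half_width)
})

# Proportion of intervals containing the true mean
mean(covered)
```

The proportion should be close to 0.95, although it will vary a little with the seed.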
1 Confidence intervals in general
A confidence interval provides an interval estimate of an unknown parameter (as opposed to a point estimate). It is designed to contain the parameter's value with some stated probability. The width of the interval provides a measure of the precision of the estimator involved.
A $100(1-\alpha)\%$ confidence interval for $\theta$ is defined by specifying random variables $\hat{\theta}_1(\underline{X})$, $\hat{\theta}_2(\underline{X})$ depending on the sample $\underline{X} = (X_1, \ldots, X_n)$ such that:
$$P\left(\hat{\theta}_1(\underline{X}) < \theta < \hat{\theta}_2(\underline{X})\right) = 1 - \alpha$$
Rightly or wrongly, $\alpha = 0.05$, leading to a 95% confidence interval, is by far the most common case used in practice and we will tend to use this in most of our illustrations.
Thus $P\left(\hat{\theta}_1(\underline{X}) < \theta < \hat{\theta}_2(\underline{X})\right) = 0.95$ specifies $\left(\hat{\theta}_1(\underline{X}), \hat{\theta}_2(\underline{X})\right)$ as a 95% confidence interval for $\theta$.
This emphasises the fact that it is the interval and not $\theta$ that is random. In the long run, 95% of the realisations of such intervals will include $\theta$ and 5% of the realisations will not include $\theta$.
Suppose we take a random sample from a particular population at a fixed moment in time and, based on this sample, we calculate a 95% confidence interval for the mean of the population to be (25,30). Suppose we then take another random sample from this population at the same moment in time, and this second sample gives a 95% confidence interval for $\mu$ of (23,29). It is important to appreciate that the limits of any confidence interval depend on the sample values collected.
If we repeat the sampling process many times and calculate a 95% CI from each sample, then 95% of these confidence intervals will contain the true value of $\mu$.
It is important to understand that, since the mean of the population is constant (not a random variable), it doesn't make sense to make statements of the form:
$$P(25 < \mu < 30) = 0.95$$
Whenever we write down a probability statement, we must make sure that it contains at least one random variable. If there is no random variable, the statement is nonsense.
To illustrate this, consider the score obtained when a fair die is thrown. Let $X$ = score obtained on the next throw. Since $X$ represents a future outcome, its value isn't yet known. There is more than one possible value that $X$ could take and its value is down to chance. So $X$ is a random variable and it makes sense to consider probabilities involving $X$, eg: $P(2 \le X < 4)$.
Now suppose that the score on the last throw of the die was 5. This is a past value, which has already been determined and recorded. Let's denote this by $y$. Consider the statement:
$$P(y < 4)$$
This doesn't make any sense as $y$ is a fixed amount $(y = 5)$, in the same way as the mean of a population at any given moment of time is a fixed amount.
Confidence intervals are not unique. In general they should be obtained via the sampling distribution of a good estimator, in particular the maximum likelihood estimator. Even then there is a choice between one-sided and two-sided intervals and between equal-tailed and shortest-length intervals although these are often the same, eg for sampling distributions that are symmetrical about the unknown value of the parameter.
We will see some examples of these shortly.
Often, we are more interested in statements about future observations than about the parameters underlying the distribution of these observations.
This arises in the context of regression models, for example, when a fitted model is being used to make predictions about future observations. Even if the parameter $\theta$ equals the unknown mean of the distribution, it will not be the case that a future observation will fall within a 95% confidence interval with probability 95%. For this, a prediction interval is required.
A confidence interval gives us some information about the value of a fixed parameter, $\theta$, from a particular distribution, but a prediction interval gives us information about the next future value from that distribution, $X$.
A $100(1-\alpha)\%$ prediction interval for $X_{n+1}$ is defined by random variables $l(\underline{X})$, $h(\underline{X})$ such that:
$$P\left(l(\underline{X}) < X_{n+1} < h(\underline{X})\right) = 1 - \alpha$$
Prediction intervals are, like confidence intervals, not unique but typical choices are one-sided or symmetric. Prediction intervals can be defined more generally for functions of one or more future observations.
For example, in Chapter 12, we will predict the output of the function $\alpha + \beta x$.
2 Derivation of confidence and prediction intervals
2.1 The pivotal method
There is a general method of constructing confidence intervals called the pivotal method. This method requires a pivotal quantity of the form $g(\underline{X}, \theta)$ to be found with the following properties:
(1) it is a function of the sample values and the unknown parameter $\theta$
(2) its distribution is completely known
(3) it is monotonic in $\theta$.
The distribution in condition (2) must not depend on $\theta$. Monotonic means that the function either consistently increases or decreases with $\theta$.
The equation:
$$\int_{g_1}^{g_2} f(t)\,dt = 0.95 \quad \text{(where } f(t) \text{ is the known probability (density) of } g(\underline{X}, \theta)\text{)}$$
defines two values, $g_1$ and $g_2$, such that:
$$P\left(g_1 < g(\underline{X}, \theta) < g_2\right) = 0.95$$
$g_1$ and $g_2$ are usually constants.
We are assuming here that $X$ has a continuous distribution. We will look shortly at examples based on discrete distributions.
If $g(\underline{X}, \theta)$ is monotonic increasing in $\theta$, then:
$g(\underline{X}, \theta) < g_2 \Leftrightarrow \theta < \theta_2$ for some number $\theta_2$
$g_1 < g(\underline{X}, \theta) \Leftrightarrow \theta_1 < \theta$ for some number $\theta_1$
and if $g(\underline{X}, \theta)$ is monotonic decreasing in $\theta$, then:
$g(\underline{X}, \theta) < g_2 \Leftrightarrow \theta_1 < \theta$
$g_1 < g(\underline{X}, \theta) \Leftrightarrow \theta < \theta_2$
resulting in $(\theta_1, \theta_2)$ being a 95% confidence interval for $\theta$.
Fortunately, in most practical situations such quantities $g(\underline{X}, \theta)$ do exist, although an approximation to the method is needed for the binomial and Poisson cases.
In sampling from a $N(\mu, \sigma^2)$ distribution with known value of $\sigma^2$, the expression $\bar{X} - \mu$ could be used as a pivotal quantity, but for easier comparison with the case of unknown variance below we use:
$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
whose distribution is $N(0,1)$.
For example, given a random sample of size 20 from the normal population $N(\mu, 10^2)$ which yields a sample mean of 62.75, an equal-tailed 95% confidence interval for $\mu$ is:
$$\bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}} = 62.75 \pm 1.96 \times \frac{10}{\sqrt{20}} = 62.75 \pm 4.38$$
This is a symmetrical confidence interval since it is of the form $\hat{\theta} \pm \varepsilon$. For symmetrical confidence intervals, we can write down the interval using the $\pm$ notation, where the two values indicate the upper and lower limits. Alternatively, we can write this confidence interval in the form (58.37, 67.13). Here we are using the pivotal quantity $\frac{\bar{X} - \mu}{10/\sqrt{20}}$, which follows the $N(0,1)$ distribution, irrespective of the value of $\mu$.
The normal mean illustration shows that confidence intervals are not unique.
Another 95% interval, with unequal tails, is $\left(\bar{X} - 1.8808\,\frac{\sigma}{\sqrt{n}},\; \bar{X} + 2.0537\,\frac{\sigma}{\sqrt{n}}\right)$.
However, there would not be much reason to use this one in practice.
Question
Show that both this and the first interval given above are 95% confidence intervals. Calculate the width of each of these intervals.
Solution
For the second confidence interval:
$$P(-1.8808 < Z < 2.0537) = P(Z < 2.0537) - P(Z < -1.8808)$$
$$= P(Z < 2.0537) - \left(1 - P(Z < 1.8808)\right)$$
$$= 0.98000 - (1 - 0.97000) = 0.95$$
This interval has width $3.9345\,\frac{\sigma}{\sqrt{n}}$.
For the first confidence interval:
$$P(-1.96 < Z < 1.96) = P(Z < 1.96) - P(Z < -1.96) = 2P(Z < 1.96) - 1 = 2 \times 0.975 - 1 = 0.95$$
This interval has length $3.92\,\frac{\sigma}{\sqrt{n}}$.
Other intervals which are of some use in practice are the one-sided 95% intervals:
$$\left(-\infty,\; \bar{X} + 1.6449\,\frac{\sigma}{\sqrt{n}}\right) \quad \text{and} \quad \left(\bar{X} - 1.6449\,\frac{\sigma}{\sqrt{n}},\; \infty\right)$$
Since the normal distribution is symmetrical about the value of the unknown parameter, it is quite easy to see that the equal-tailed interval is the shortest-length interval for that level of confidence.
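The probabilities and widths quoted for the two intervals can be confirmed with `pnorm()` and `qnorm()` rather than the Tables. This is a minimal sketch; the variable names are illustrative.

```r
# Coverage of the unequal-tailed and equal-tailed intervals
coverage_unequal <- pnorm(2.0537) - pnorm(-1.8808)
coverage_equal   <- pnorm(1.96) - pnorm(-1.96)

# Widths, in units of sigma / sqrt(n)
width_unequal <- 2.0537 + 1.8808
width_equal   <- 2 * 1.96

c(coverage_unequal, coverage_equal)
c(width_unequal, width_equal)

# Upper 5% point of N(0,1), as used in the one-sided 95% intervals
qnorm(0.95)
```

Both coverages should come out at 0.95 to about four decimal places, with the equal-tailed interval the narrower of the two.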
Question
The average IQ of a random sample of 50 university students is found to be 132. Calculate a symmetrical 95% confidence interval for the average IQ of university students, assuming that IQs are normally distributed. It is known from previous studies that the standard deviation of IQs among students is approximately 20.
Solution
Since the distribution is normal, we know that $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$, when $\sigma$ is known.
From the Tables we know that $0.95 = P(-1.96 < Z < 1.96)$, so:
$$0.95 = P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right)$$
Rearranging to obtain limits for $\mu$:
$$0.95 = P\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right)$$
Using $n = 50$, $\sigma = 20$ and $\bar{X} = 132$ from the question, we obtain the interval $132 \pm 5.5$, or (126.5, 137.5).
So a symmetrical 95% confidence interval for the average IQ is (126.5, 137.5).
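The known-variance interval is easy to compute directly in R. The following is a minimal sketch; `z_interval` is a hypothetical helper written for illustration, not a built-in function.

```r
# Equal-tailed confidence interval for a normal mean with known sigma
z_interval <- function(xbar, sigma, n, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)        # eg 1.96 for a 95% interval
  half_width <- z * sigma / sqrt(n)
  c(lower = xbar - half_width, upper = xbar + half_width)
}

# IQ question: n = 50, sigma = 20, sample mean 132
iq_ci <- z_interval(132, 20, 50)
iq_ci

# Earlier example: n = 20, sigma = 10, sample mean 62.75
example_ci <- z_interval(62.75, 10, 20)
example_ci
```

These should agree with (126.5, 137.5) and (58.37, 67.13) after rounding.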
With prediction intervals, we are predicting a single future value from the distribution. Since we already have a sample of values $(X_1, \ldots, X_n)$ from this distribution, we'll call this new predicted value $X_{n+1}$.
A similar approach can be used for prediction intervals. In the example above, of sampling from a normal distribution with known variance, $\bar{X} - X_{n+1}$ has a distribution that does not depend on $\mu$, and in fact:
$$\frac{\bar{X} - X_{n+1}}{\sigma\sqrt{1 + 1/n}} \sim N(0,1)$$
The predicted value comes from a normal distribution, $X_{n+1} \sim N(\mu, \sigma^2)$. The Central Limit Theorem tells us that for samples from a normal distribution, $\bar{X} \sim N(\mu, \sigma^2/n)$. Hence, using the result from Chapter 4 for a linear combination of normal distributions:
$$\bar{X} - X_{n+1} \sim N\left(\mu - \mu,\; \frac{\sigma^2}{n} + \sigma^2\right) = N\left(0,\; \sigma^2\left(\frac{1}{n} + 1\right)\right)$$
Standardising this gives the result above.
The previous derivations therefore give prediction intervals for $X_{n+1}$ if we replace $\frac{\sigma}{\sqrt{n}}$ with $\sigma\sqrt{1 + 1/n}$: a 95% prediction interval for the random sample of size 20 above is:
$$\bar{X} \pm 1.96\,\sigma\sqrt{1 + 1/n} = 62.75 \pm 1.96 \times 10 \times \sqrt{1 + 1/20} = 62.75 \pm 20.08$$
A less formal way to consider this is as follows. The predicted value comes from a $N(\mu, \sigma^2)$ distribution. Since $P\left(-1.96 < \frac{X - \mu}{\sigma} < 1.96\right) = 0.95$, we know that 95% of the values from that distribution lie between $\mu \pm 1.96\sigma$.
However, we do not know the true value of $\mu$ but a 95% confidence interval for it is given by $\bar{X} \pm 1.96\,\frac{\sigma}{\sqrt{n}}$. Putting these two together, a 95% confidence interval for a predicted value $X_{n+1}$ is:
$$\bar{X} \pm 1.96\sqrt{\sigma^2 + \frac{\sigma^2}{n}} = \bar{X} \pm 1.96\,\sigma\sqrt{1 + \frac{1}{n}}$$
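The known-variance prediction interval differs from the confidence interval only in its standard error. This is a minimal sketch of the formula above; `z_prediction_interval` is a hypothetical helper name used for illustration.

```r
# Prediction interval for one future observation from a normal
# distribution with known sigma, based on a sample of size n
z_prediction_interval <- function(xbar, sigma, n, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  half_width <- z * sigma * sqrt(1 + 1 / n)
  c(lower = xbar - half_width, upper = xbar + half_width)
}

# Sample of size 20 from N(mu, 10^2) with sample mean 62.75
pi_20 <- z_prediction_interval(62.75, 10, 20)
pi_20

# Half-width of the interval
unname(diff(pi_20)) / 2
```

The half-width should be about 20.08, far wider than the 4.38 obtained for the confidence interval for the mean.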
Question
The average IQ of a random sample of 50 university students is found to be 132. Calculate a symmetrical 99% prediction interval for the average IQ of university students, assuming that IQs are normally distributed. It is known from previous studies that the standard deviation of IQs among students is approximately 20.
Solution
Since the distribution is normal, we use $\frac{\bar{X} - X_{n+1}}{\sigma\sqrt{1 + 1/n}} \sim N(0,1)$, when $\sigma$ is known.
From the Tables we know that $0.99 = P(-2.5758 < Z < 2.5758)$, so:
$$0.99 = P\left(-2.5758 < \frac{\bar{X} - X_{n+1}}{\sigma\sqrt{1 + 1/n}} < 2.5758\right)$$
Rearranging to obtain limits for $X_{n+1}$:
$$0.99 = P\left(\bar{X} - 2.5758\,\sigma\sqrt{1 + 1/n} < X_{n+1} < \bar{X} + 2.5758\,\sigma\sqrt{1 + 1/n}\right)$$
Using $n = 50$, $\sigma = 20$ and $\bar{X} = 132$ from the question, we obtain the interval $132 \pm 52.03$, or (80.0, 184.0).
So a symmetrical 99% prediction interval for the average IQ is (80.0, 184.0).
2.2 Confidence limits
The 95% confidence interval $\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}},\; \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right)$ for $\mu$ is often expressed as:
$$\bar{X} \pm 1.96\,\frac{\sigma}{\sqrt{n}}$$
This is quite informative as it gives the point estimator $\bar{X}$ together with the indication of its accuracy. However, this cannot always be done so simply using a confidence interval.
Also one-sided confidence intervals correspond to specifying an upper or lower confidence limit only.
If an exam question asks for a confidence interval, it means a two-sided symmetrical confidence interval. If the examiners require any other type of confidence interval, they will explicitly ask for it.
2.3 Sample size
A very common question asked of a statistician is:
How large a sample is needed?
This question cannot be answered without further information, namely:
(1) the accuracy of estimation required
(2) an indication of the size of the population standard deviation $\sigma$.
The latter information may not readily be available, in which case a small pilot sample may be needed or a rough guess based on previous studies in similar populations.
As a consequence of the Central Limit Theorem, a confidence interval that is derived from a large sample will tend to be narrower than the corresponding interval derived from a small sample, since the variation in the observed values will tend to average out as the sample size is increased. Market research companies often need to be confident that their results are accurate to within a given margin (eg ±3%). In order to do this, they will need to estimate how big a sample is required in order to obtain a narrow enough confidence interval.
Example
A company wishes to estimate the mean claim amount for claims under a certain class of policy during the past year. Extensive past records from previous years suggest that the standard deviation of claim amounts is likely to be about £45.
If the company wishes to estimate the mean claim amount such that a 95% confidence interval is of width ±£5, determine the sample size needed to achieve this accuracy of estimation.
Solution
The resulting confidence interval will be $\bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}}$.
The standard deviation $\sigma$ can be taken to be 45 and so we require $n$ such that:
$$1.96 \times \frac{45}{\sqrt{n}} = 5 \;\Rightarrow\; \sqrt{n} = 1.96 \times \frac{45}{5} = 17.64 \;\Rightarrow\; n = 311.2$$
So a sample of size 312, or perhaps 320 to err on the safe side (since the variance is only a rough guess) would be required.
Question
Calculate how big a sample would be needed to have a 99% confidence interval of width ±£1.
Solution
The answer can be calculated from the equation:
$$2.5758 \times \frac{45}{\sqrt{n}} = 1 \;\Rightarrow\; n = 13{,}436$$
The figure of 2.5758 can be found on page 162 of the Tables.
In this case we need a substantially bigger sample size.
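The two sample size calculations follow the same pattern, so they can be wrapped in a small function. This is a minimal sketch; `sample_size` is a hypothetical helper, and it rounds up to the next whole observation.

```r
# Smallest n for which z * sigma / sqrt(n) <= margin
sample_size <- function(sigma, margin, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  ceiling((z * sigma / margin)^2)
}

n_95 <- sample_size(sigma = 45, margin = 5)                 # 95% interval
n_99 <- sample_size(sigma = 45, margin = 1, level = 0.99)   # 99% interval
c(n_95, n_99)
```

These should reproduce the sample sizes of 312 and 13,436 found above.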
3 Confidence and prediction intervals for the normal distribution
3.1 The mean
The previous section dealt with confidence intervals for a normal mean in the case where the standard deviation $\sigma$ was known. In practice this is unlikely to be the case and so a different pivotal quantity is needed for the realistic case when $\sigma$ is unknown.
Fortunately, there is a similar pivotal quantity readily available and that is the $t$ result:
$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$$
where $S$ is the sample standard deviation.
The resulting confidence interval, in the form of symmetrical 95% confidence limits, is:
$$\bar{X} \pm t_{0.025,n-1}\,\frac{S}{\sqrt{n}}$$
$t_{0.025,n-1}$ is used to denote the upper 2.5% point of the $t$ distribution with $n-1$ degrees of freedom, and is defined by:
$$P\left(t_{n-1} > t_{0.025,n-1}\right) = 0.025$$
For example, from page 163 of the Tables, $t_{0.025,10}$ is equal to 2.228.
This is a small sample confidence interval for $\mu$. For large samples $t_{n-1}$ becomes like $N(0,1)$ and the Central Limit Theorem justifies the resulting interval without the requirement that the population is normal.
The normality of the population is an important assumption for the validity of the $t$ interval especially when the sample size is very small, for example, in single figures. However the $t$ interval is quite robust against departures from normality especially as the sample size increases. Normality can be checked by inspecting a diagram, such as a dotplot, of the data. This can also be used to identify substantial skewness or outliers which may invalidate the analysis.
Question
Calculate a 95% confidence interval for the average height of 10-year-old children, assuming that heights have a $N(\mu, \sigma^2)$ distribution (where $\mu$ and $\sigma$ are unknown), based on a random sample of 5 children whose heights are: 124cm, 122cm, 130cm, 125cm and 132cm.
Solution
Since the sample comes from a normal distribution, we know that $\frac{\bar{X} - \mu}{S/\sqrt{n}}$ has a $t_{n-1}$ distribution, where $S^2$ is the sample variance.
From the Tables, we find that $t_{0.025,4} = 2.776$, ie $0.95 = P(-2.776 < t_4 < 2.776)$.
So:
$$0.95 = P\left(-2.776 < \frac{\bar{X} - \mu}{S/\sqrt{n}} < 2.776\right)$$
Rearranging the inequality to isolate $\mu$ gives:
$$0.95 = P\left(\bar{X} - 2.776\,\frac{S}{\sqrt{n}} < \mu < \bar{X} + 2.776\,\frac{S}{\sqrt{n}}\right)$$
Using the calculated values for the sample ($n = 5$, $\bar{x} = 126.6$, and $s^2 = 17.8$) gives:
(121.4, 131.8)
When calculating a numerical confidence interval, we must drop the probability notation. This is required since $\mu$ is not a random variable and hence expressions such as $P(121.4 < \mu < 131.8) = 0.95$ do not make sense.
The R function for a symmetrical 95% confidence interval for the mean with unknown variance is:
t.test(<sample data>, conf=0.95)
For small samples from a non-normal distribution, confidence intervals can be constructed empirically in R using the bootstrap method used in Chapter 8 Section 7. For example, a non-parametric 95% confidence interval for the mean could be obtained by:
quantile(replicate(1000,mean(sample(<sample data>,replace=TRUE))),probs=c(0.025,0.975))
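Applied to the five heights from the question, the two R commands above can be run as follows. This is a minimal sketch: the from-scratch calculation is included only as a cross-check, and the seed is an arbitrary assumption so that the bootstrap result is reproducible.

```r
heights <- c(124, 122, 130, 125, 132)
n <- length(heights)

# t interval from the formula xbar +/- t * s / sqrt(n)
t_crit <- qt(0.975, df = n - 1)
ci_scratch <- mean(heights) + c(-1, 1) * t_crit * sd(heights) / sqrt(n)

# Same interval from the built-in function
ci_builtin <- t.test(heights, conf.level = 0.95)$conf.int

ci_scratch
ci_builtin

# Non-parametric bootstrap interval for the mean
set.seed(123)
boot_means <- replicate(1000, mean(sample(heights, replace = TRUE)))
ci_boot <- quantile(boot_means, probs = c(0.025, 0.975))
ci_boot
```

With only five observations the bootstrap interval is likely to be noticeably narrower than the t interval, so it should be treated with caution here.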
3.2 The variance
For the estimation of a normal variance $\sigma^2$, there is again a pivotal quantity readily available:
$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$
The resulting 95% confidence interval for the variance $\sigma^2$ is:
$$\left(\frac{(n-1)S^2}{\chi^2_{0.025,n-1}},\; \frac{(n-1)S^2}{\chi^2_{0.975,n-1}}\right)$$
or for the standard deviation $\sigma$:
$$\left(\sqrt{\frac{(n-1)S^2}{\chi^2_{0.025,n-1}}},\; \sqrt{\frac{(n-1)S^2}{\chi^2_{0.975,n-1}}}\right)$$
Note: Due to the skewness of the $\chi^2$ distribution, these confidence intervals are not symmetrical about the point estimator $S^2$, and are also not the shortest-length intervals.
So we can't write these using the $\pm$ notation.
The above intervals require the normality assumption for the population but are considered fairly robust against departures from normality for reasonable sample sizes.
There is no built-in function for calculating confidence intervals for the variance in R. We can use R to calculate the results of the formula from scratch or use a bootstrap method if the assumptions are not met.
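Calculating the formula from scratch might look like the following minimal sketch. `var_interval` is a hypothetical helper name, and the data are the five heights used earlier in this section.

```r
# Equal-tailed confidence interval for a normal variance
var_interval <- function(x, level = 0.95) {
  n <- length(x)
  alpha <- 1 - level
  ss <- (n - 1) * var(x)                       # (n - 1) S^2
  c(lower = ss / qchisq(1 - alpha / 2, df = n - 1),
    upper = ss / qchisq(alpha / 2, df = n - 1))
}

heights <- c(124, 122, 130, 125, 132)

var_ci <- var_interval(heights)   # interval for sigma^2
sd_ci  <- sqrt(var_ci)            # interval for sigma
var_ci
sd_ci
```

Note that `qchisq()` takes a lower-tail probability, so the upper 2.5% point is `qchisq(0.975, df)`.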
Question
Calculate:
(i) an equal-tailed 95% confidence interval and
(ii) a 95% confidence interval of the form $(0, L)$
for the standard deviation of the heights of the children in the population based on the information given in the last question.
Solution
Since the sample comes from a normal distribution, we know that $\frac{4S^2}{\sigma^2} \sim \chi^2_4$.
(i) From the Tables, we find that:
$$0.95 = P(0.4844 < \chi^2_4 < 11.14)$$
So:
$$0.95 = P\left(0.4844 < \frac{4S^2}{\sigma^2} < 11.14\right)$$
Replacing $S^2$ by 17.8, the sample variance calculated in the solution to the previous question, and dropping the probability notation (since $\sigma^2$ is not a random variable), we have:
$$0.4844 < \frac{4 \times 17.8}{\sigma^2} < 11.14 \;\Rightarrow\; \frac{71.2}{11.14} < \sigma^2 < \frac{71.2}{0.4844} \;\Rightarrow\; 6.39 < \sigma^2 < 147.0 \;\Rightarrow\; 2.53 < \sigma < 12.1$$
So, an equal-tailed 95% confidence interval for the standard deviation is (2.53, 12.1).
(ii) From the Tables we find that:
$$0.95 = P(0.7107 < \chi^2_4)$$
So:
$$0.95 = P\left(0.7107 < \frac{4S^2}{\sigma^2}\right)$$
Replacing $S^2$ by 17.8, the sample variance calculated in the solution to the previous question, and dropping the probability notation (since $\sigma^2$ is not a random variable), we have:
$$0.7107 < \frac{4 \times 17.8}{\sigma^2} \;\Rightarrow\; \sigma^2 < 100.2 \;\Rightarrow\; \sigma < 10.0$$
So, a one-sided 95% confidence interval for the standard deviation is (0, 10.0).
To find a confidence interval with an upper limit, ie of the form $(0, L)$, we need to start with the lower 5% point of the $\chi^2_4$ distribution, ie the point which is exceeded by 95% of the distribution.
If we want to find a confidence interval with a lower limit, ie of the form $(L, \infty)$, we would need to start by finding the upper 5% point of the $\chi^2_4$ distribution, ie the point which is exceeded by only 5% of the distribution.
3.3 Prediction interval for normal distribution
We've already seen that:
$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$$
Replacing $\mu$ with $X_{n+1}$ and adjusting the denominator produces a pivotal quantity with the same distribution:
$$\frac{\bar{X} - X_{n+1}}{S\sqrt{1 + 1/n}} \sim t_{n-1}$$
A prediction interval for $X_{n+1}$ can therefore take the form:
$$\bar{X} \pm t_{0.025,n-1}\,S\sqrt{1 + 1/n}$$
Question
The heights of 10-year-old children are normally distributed. The heights of a random sample of five children (in cm) are: 124cm, 122cm, 130cm, 125cm and 132cm.
Calculate a 90% confidence interval for the predicted height of a 10-year-old child based on these data values.
Solution
Since the sample comes from a normal distribution, we know that $\frac{\bar{X} - X_{n+1}}{S\sqrt{1 + 1/n}}$ has a $t_{n-1}$ distribution, where $S^2$ is the sample variance.
From the Tables, we find that $t_{0.05,4} = 2.132$, ie $0.90 = P(-2.132 < t_4 < 2.132)$. So:
$$0.90 = P\left(-2.132 < \frac{\bar{X} - X_{n+1}}{S\sqrt{1 + 1/n}} < 2.132\right)$$
Rearranging the inequality to isolate $X_{n+1}$ gives:
$$0.90 = P\left(\bar{X} - 2.132\,S\sqrt{1 + 1/n} < X_{n+1} < \bar{X} + 2.132\,S\sqrt{1 + 1/n}\right)$$
For this sample, we have $n = 5$, $\bar{x} = 126.6$, and $s^2 = 17.8$. Using these values gives a 90% prediction interval of:
(116.7, 136.5)
There is no simple function for calculating prediction intervals for fitted distributions in R. Prediction intervals can either be calculated by implementing the above formula from scratch or alternatively by leveraging the functionality in R for calculating prediction intervals for linear regressions (by regressing on a constant):
# create random sample of 10 observations
set.seed(23)
x <- rnorm(10)
# calculate sample mean and square root of sample variance
mu <- mean(x)
sigma <- sqrt(var(x))
# confidence and prediction intervals from scratch
confidence_interval <- c(mu + sigma * sqrt(1/10) * qt(0.025,9),
                         mu + sigma * sqrt(1/10) * qt(0.975,9))
prediction_interval <- c(mu + sigma * sqrt(1+1/10) * qt(0.025,9),
                         mu + sigma * sqrt(1+1/10) * qt(0.975,9))
# confidence and prediction intervals using linear regression (lm) functionality
# data.frame(1) is just dummy data in formulae below
predict(lm(x~1),data.frame(1),interval = "confidence")
predict(lm(x~1),data.frame(1),interval = "prediction")
The linear regression approach is covered in Chapter 12.
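The same two approaches can be applied to the heights data from the question above to check the 90% prediction interval. This is a minimal sketch; it assumes the `level` argument of `predict()` is used to request a 90% rather than the default 95% interval.

```r
heights <- c(124, 122, 130, 125, 132)
n <- length(heights)

# 90% prediction interval from the formula xbar +/- t * s * sqrt(1 + 1/n)
t_crit <- qt(0.95, df = n - 1)
pi_scratch <- mean(heights) + c(-1, 1) * t_crit * sd(heights) * sqrt(1 + 1 / n)

# Same interval by regressing on a constant
# (data.frame(1) is dummy data, as in the code above)
pi_lm <- predict(lm(heights ~ 1), data.frame(1),
                 interval = "prediction", level = 0.90)

pi_scratch
pi_lm
```

Both should agree with (116.7, 136.5) after rounding.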
4 Confidence intervals for binomial & Poisson parameters
Both these situations involve a discrete distribution which introduces the difficulty of probabilities not being exactly 0.95, and so 'at least 0.95' is used instead. Also, when not using the large-sample normal approximations, the pivotal quantity method must be adjusted.
One approach is to use a quantity $h(\underline{X})$ whose distribution involves $\theta$ such that:
$$P\left(h_1(\theta) < h(\underline{X}) < h_2(\theta)\right) \ge 0.95$$
Then if both $h_1(\theta)$ and $h_2(\theta)$ are monotonic increasing (or both decreasing), the inequalities can be inverted to obtain a confidence interval as before.
4.1 The binomial distribution
If $X$ is a single observation from $Bin(n, \theta)$, the maximum likelihood estimator is:
$$\hat{\theta} = \frac{X}{n}$$
What follows is a slight diversion from our aim of obtaining a confidence interval for $\theta$. It is just demonstrating that the method is sound.
Using $X$ as the quantity $h(\underline{X})$, it is necessary to find if $h_1(\theta)$ and $h_2(\theta)$ exist such that $P\left(h_1(\theta) < X < h_2(\theta)\right) \ge 0.95$, where with equal tails $P\left(X \le h_1(\theta)\right) \le 0.025$ and $P\left(X \ge h_2(\theta)\right) \le 0.025$.
[Figure: distribution of $X$ with at least 95% lying between $h_1(\theta)$ and $h_2(\theta)$ and at most 2½% in each tail]
We can have at most 2.5% in the lower (or upper) tail, so we need to be very careful about finding the values of $h_1$ and $h_2$.
There is no explicit expression for the pivotal quantity $h(\underline{X})$.
For the $Bin(20, 0.3)$ case:
$$P(X \le 1) = 0.0076 \text{ and } P(X \le 2) = 0.0355 \;\Rightarrow\; h_1(\theta) = 1$$
Also:
$$P(X \ge 11) = 0.0171,\; P(X \ge 10) = 0.0480 \;\Rightarrow\; h_2(\theta) = 11$$
Question
Calculate the values of $h_1$ and $h_2$ for the binomial distribution with parameters $n = 20$ and $\theta = 0.4$.
Solution
If $X \sim Bin(20, 0.4)$, then (using page 188 of the Tables) $P(X \le 3) = 0.0160$ and $P(X \le 4) = 0.0510$. So $h_1 = 3$.
Also $P(X \ge 13) = 0.0210$ and $P(X \ge 12) = 0.0565$. So $h_2 = 13$.
$h_1$ and $h_2$ have higher values than for the $Bin(20, 0.3)$ case.
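The tail points for both binomial cases can be found in R instead of from the Tables. This is a minimal sketch; `tail_points` is a hypothetical helper that searches for the largest value with at most 2.5% at or below it and the smallest value with at most 2.5% at or above it.

```r
# h1 = largest k with P(X <= k) <= tail
# h2 = smallest k with P(X >= k) <= tail
tail_points <- function(n, theta, tail = 0.025) {
  k <- 0:n
  lower_prob <- pbinom(k, n, theta)          # P(X <= k)
  upper_prob <- 1 - pbinom(k - 1, n, theta)  # P(X >= k)
  c(h1 = max(k[lower_prob <= tail]), h2 = min(k[upper_prob <= tail]))
}

tail_points(20, 0.3)
tail_points(20, 0.4)
```

Increasing the parameter from 0.3 to 0.4 moves both points upwards, illustrating that both are increasing in the parameter.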
So $h_1(\theta)$ and $h_2(\theta)$ do exist and increase with $\theta$.
We're back on track. We can move on to obtain our confidence interval for $\theta$.
Therefore the inequalities can be inverted as follows:
$$X \le h_1(\theta) \;\Leftrightarrow\; \theta \ge \theta_1(X)$$
$$X \ge h_2(\theta) \;\Leftrightarrow\; \theta \le \theta_2(X)$$
These are the tail probabilities. So the inequalities involving $\theta_1$ and $\theta_2$ are defining the tails. Our confidence interval is the region not covered by these tail inequalities:
This gives a 95% confidence interval of the form $\theta_2(X) < \theta < \theta_1(X)$.
Note: The lower limit $\theta_2(X)$ comes from the upper tail probabilities and the upper limit $\theta_1(X)$ from the lower tail probabilities.
We'll see this is the case in the question below.
However since there are no explicit expressions for $h_1(\theta)$ and $h_2(\theta)$, there are no expressions for $\theta_1(X)$ and $\theta_2(X)$ and they will have to be calculated numerically.
So, adopting the convention of including the observed $x$ in the tails, $\theta_1$ and $\theta_2$ can be found by solving:
$$\sum_{r=x}^{n} b(r; n, \theta_1) = 0.025 \quad \text{and} \quad \sum_{r=0}^{x} b(r; n, \theta_2) = 0.025$$
Here $b(r; n, \theta)$ denotes $P(X = r)$ when $X \sim Bin(n, \theta)$.
These can be expressed in terms of the distribution function $F(x; \theta)$:
$$1 - F(x - 1; \theta_1) = 0.025 \quad \text{and} \quad F(x; \theta_2) = 0.025$$
Note: Equality can be attained as $\theta$ has a continuous range (0,1) and the discrete problem does not arise.
The R function for an exact 95% confidence interval for the proportion is:
binom.test(x,n, conf=0.95)
Question
We have obtained a value of 1 from the binomial distribution with parameters $n = 20$ and $\theta$. Construct a 95% symmetrical confidence interval for $\theta$.
Solution
We need $\theta_1$ such that $P(X \le 1) = 0.025$ under $Bin(20, \theta_1)$, and $\theta_2$ such that $P(X \ge 1) = 0.025$ under $Bin(20, \theta_2)$.
For the first equation, we have $(1 - \theta_1)^{20} + 20\theta_1(1 - \theta_1)^{19} = 0.025$.
Solving this we obtain $\theta_1 = 0.249$.
A numerical method will be needed here, or trial and improvement. One approach would be to write the equation in the form $(1 - \theta_1)^{19}(1 + 19\theta_1) = 0.025$, then iterate using:
$$\theta_{n+1} = 1 - \left(\frac{0.025}{1 + 19\theta_n}\right)^{1/19}$$
starting with $\theta_1 = 0.5$.
For the second equation we have $(1 - \theta_2)^{20} = 0.975$.
Solving this we obtain $\theta_2 = 0.00127$.
Our confidence interval is then (0.00127, 0.249).
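The two tail equations in this solution can also be solved numerically in R. The following is a minimal sketch using `uniroot()`; the search interval and tolerance are arbitrary assumptions, and `binom.test()` is included as a cross-check.

```r
x <- 1
n <- 20

# Upper limit: P(X <= x) = 0.025
upper <- uniroot(function(theta) pbinom(x, n, theta) - 0.025,
                 interval = c(1e-6, 1 - 1e-6), tol = 1e-10)$root

# Lower limit: P(X >= x) = 0.025
lower <- uniroot(function(theta) (1 - pbinom(x - 1, n, theta)) - 0.025,
                 interval = c(1e-6, 1 - 1e-6), tol = 1e-10)$root

c(lower, upper)

# Exact interval from the built-in function
exact_ci <- binom.test(x, n, conf.level = 0.95)$conf.int
exact_ci
```

The two sets of limits should agree closely, since `binom.test()` reports the exact (Clopper-Pearson) interval, which is defined by the same tail equations.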
The normal approximation
It is easy for a computer to calculate an exact confidence interval for the binomial parameter $p$ even if $n$ is large. However, on a piece of paper we use the normal approximation to the binomial distribution.
$\frac{X - n\theta}{\sqrt{n\theta(1-\theta)}}$ can be used as a pivotal quantity.
Solving the resulting equations for $\theta$ would not be easy.
However $\frac{X - n\theta}{\sqrt{n\hat{\theta}(1-\hat{\theta})}}$, with $\hat{\theta}$ in place of $\theta$ (in the denominator only), can be used in a simpler way and yields the standard 95% confidence interval used in practice, namely:
$$\frac{X \pm 1.96\sqrt{n\hat{\theta}(1-\hat{\theta})}}{n} \quad \text{or} \quad \hat{\theta} \pm 1.96\sqrt{\frac{\hat{\theta}(1-\hat{\theta})}{n}}, \quad \text{where } \hat{\theta} = \frac{X}{n}$$
Question
In a one-year mortality investigation, 45 of the 250 ninety-year-olds present at the start of the investigation died before the end of the year. Assuming that the number of deaths has a binomial distribution with parameters $n = 250$ and $q$, calculate a symmetrical 90% confidence interval for the unknown mortality rate $q$.
Solution
Since 250 is a large sample, we know that:

$$\frac{X - nq}{\sqrt{nq(1 - q)}} \sim N(0,1) \text{ approximately}$$

Since $P(-1.6449 < Z < 1.6449) = 0.90$, we can say that:

$$P\left(-1.6449 < \frac{X - 250q}{\sqrt{250q(1 - q)}} < 1.6449\right) = 0.90$$

Rearranging this:

$$P\left(\frac{X}{250} - 1.6449\sqrt{\frac{q(1 - q)}{250}} < q < \frac{X}{250} + 1.6449\sqrt{\frac{q(1 - q)}{250}}\right) = 0.90$$

Replacing $X$ by the observed value of 45 gives $\hat q = \frac{45}{250}$. Using $\hat q$ in place of $q$ under the square roots, a symmetrical 90% confidence interval for $q$ is $(0.140, 0.220)$.
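The arithmetic behind this interval can be checked with a few lines of Python. This is only an illustrative sketch using the standard library; the function name `approx_proportion_ci` is hypothetical, and the normal percentage point is computed with `statistics.NormalDist` rather than read from the Tables.

```python
from math import sqrt
from statistics import NormalDist


def approx_proportion_ci(x, n, level):
    """Normal-approximation interval for a binomial proportion, using the estimate in the standard error."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # e.g. about 1.6449 for a 90% interval
    p_hat = x / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


lower, upper = approx_proportion_ci(45, 250, level=0.90)
print(f"({lower:.3f}, {upper:.3f})")
```

With 45 deaths out of 250 lives this should reproduce the interval quoted in the solution.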
Question
Repeat the question on page 15 using the normal approximation. Comment on the answer obtained.
Solution
A 95% symmetrical confidence interval is given by:

$$\frac{X}{n} \pm 1.96\sqrt{\frac{\frac{X}{n}\left(1 - \frac{X}{n}\right)}{n}}$$

From the question, we know that $x = 1$ and $n = 20$. Substituting these into the formula, we get the confidence interval to be $(-0.046, 0.146)$.

Since the value of $n$ is so small, the normal approximation is not really appropriate. This is highlighted by the lower limit which is not sensible, as $p$ must be between 0 and 1. The upper limit is not even close to the accurate value either.

The reason why the accuracy is so poor in this case is that the distribution is skew. Since we got 1 out of 20, the value of $p$ can be estimated as 0.05. So the value of $np = 20 \times 0.05 = 1$ is nowhere near big enough to justify a normal approximation, where we usually require $np \ge 5$.

4.2 The Poisson distribution
The Poisson situation can be tackled in a very similar way to the binomial for both large and small sample sizes.

If $X_i$, $i = 1, 2, \ldots, n$ are independent $\text{Poi}(\lambda)$ variables, that is, a random sample of size $n$ from $\text{Poi}(\lambda)$, then $\sum X_i \sim \text{Poi}(n\lambda)$.

Using $\sum X_i$ as a single observation from $\text{Poi}(n\lambda)$ is equivalent to the random sample of size $n$ from $\text{Poi}(\lambda)$. This is similar to the single binomial situation.

Recall that a $\text{Bin}(n, p)$ distribution arises from the sum of $n$ Bernoulli trials with probability of success $p$.
Given a single observation $X$ from a $\text{Poi}(\lambda)$ distribution, then:

$$P(h_1(\lambda) < X < h_2(\lambda)) \ge 0.95$$

where $h_1(\lambda)$ and $h_2(\lambda)$ are increasing functions of $\lambda$.

Inverting this gives:

$$P(\lambda_1(X) < \lambda < \lambda_2(X)) \ge 0.95$$
The resulting 95% confidence interval for $\lambda$ is given by $(\lambda_1, \lambda_2)$ where:

$$\sum_{r=x}^{\infty} p(r; \lambda_1) = 0.025 \quad \text{and} \quad \sum_{r=0}^{x} p(r; \lambda_2) = 0.025$$

Here $p(r; \lambda_1)$ denotes $P(X = r)$ where $X \sim \text{Poi}(\lambda_1)$.

or:

$$1 - F(x - 1; \lambda_1) = 0.025 \quad \text{and} \quad F(x; \lambda_2) = 0.025$$

The R function for an exact 95% confidence interval for the Poisson parameter $\lambda$ is:

poisson.test(x, n, conf=0.95)
Question
Suppose that we have obtained a value of 1 from $\text{Poi}(\lambda)$. Calculate a symmetrical 90% confidence interval for $\lambda$.
Solution
We need $P(X \ge 1) = 0.05$ under $\text{Poi}(\lambda_1)$, and $P(X \le 1) = 0.05$ under $\text{Poi}(\lambda_2)$.

The first equation is $1 - e^{-\lambda_1} = 0.05 \Rightarrow e^{-\lambda_1} = 0.95$, which gives $\lambda_1 = -\log 0.95 = 0.0513$.

The second equation is $e^{-\lambda_2} + \lambda_2 e^{-\lambda_2} = 0.05$. Solving this numerically, for example by using the iterative equation $\lambda_{n+1} = \log\left(\frac{1 + \lambda_n}{0.05}\right)$, we obtain $\lambda_2 = 4.74$.

Therefore a symmetrical 90% confidence interval for $\lambda$ is $(0.0513, 4.74)$. Not surprisingly this is very wide, since we only have 1 sample value.
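As an illustration of solving the second equation numerically, the Python sketch below rearranges $e^{-\lambda}(1 + \lambda) = 0.05$ into the fixed-point form $\lambda = \log((1 + \lambda)/0.05)$ and iterates it. The starting value of 1.0, the tolerance and the variable names are assumptions made for this sketch only.

```python
from math import exp, log

TAIL = 0.05  # probability in each tail of a 90% interval

# Lower limit: 1 - exp(-lam) = TAIL has a closed-form solution.
lower = -log(1 - TAIL)

# Upper limit: exp(-lam) * (1 + lam) = TAIL, iterated as lam = log((1 + lam) / TAIL).
lam = 1.0  # arbitrary starting value (an assumption of this sketch)
for _ in range(200):
    next_lam = log((1 + lam) / TAIL)
    converged = abs(next_lam - lam) < 1e-12
    lam = next_lam
    if converged:
        break
upper = lam

# Substitute the upper limit back into the tail equation as a check.
residual = exp(-upper) * (1 + upper) - TAIL
print(f"({lower:.4f}, {upper:.2f}), residual = {residual:.1e}")
```

The iteration settles quickly because the derivative of $\log((1 + \lambda)/0.05)$ is $1/(1 + \lambda)$, which is well below 1 near the solution.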
The normal approximation

Again, it is easy for a computer to calculate an exact confidence interval for $\lambda$ even for a large sample from $\text{Poisson}(\lambda)$, or a single observation from $\text{Poisson}(\lambda)$ where $\lambda$ is large.

However, on a piece of paper a normal approximation can be used either from $\sum X_i \sim \text{Poi}(n\lambda) \to N(n\lambda, n\lambda)$ or from the Central Limit Theorem as $\bar X \approx N\left(\lambda, \frac{\lambda}{n}\right)$.

$$\frac{\bar X - \lambda}{\sqrt{\lambda / n}}$$

can then be used as a pivotal quantity yielding a confidence interval. However, as in the binomial case, the standard confidence interval in practical use comes from:

$$\frac{\bar X - \lambda}{\sqrt{\hat\lambda / n}}, \quad \text{where } \hat\lambda = \bar X$$

This clearly gives $\bar X \pm 1.96\sqrt{\frac{\bar X}{n}}$ as an approximate 95% confidence interval for $\lambda$.
Question
In a one-year investigation of claim frequencies for a particular category of motorists, the total number of claims made under 5,000 policies was 800. Assuming that the number of claims made by individual motorists has a $\text{Poi}(\lambda)$ distribution, calculate a symmetrical 90% confidence interval for the unknown average claim frequency $\lambda$.
Solution
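Since the sample comes from a Poisson distribution, we know that:

$$\frac{\bar X - \lambda}{\sqrt{\lambda / n}} \sim N(0,1) \text{ approximately}$$

Here $n = 5{,}000$.

From the Tables, we find that $P(-1.6449 < Z < 1.6449) = 0.90$. So:

$$P\left(-1.6449 < \frac{\bar X - \lambda}{\sqrt{\lambda / n}} < 1.6449\right) = 0.90$$

Rearranging so that only $\lambda$ lies in the middle of the double inequality:

$$P\left(\bar X - 1.6449\sqrt{\frac{\lambda}{n}} < \lambda < \bar X + 1.6449\sqrt{\frac{\lambda}{n}}\right) = 0.90$$

Replacing $n$ by 5,000, $\bar X$ by 0.16 and $\lambda$ (under the square roots) by 0.16 gives a confidence interval of $(0.151, 0.169)$.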
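The same calculation can be sketched in Python as a quick check. The function name `approx_poisson_ci` is hypothetical, and the sample mean is used in place of the unknown parameter under the square root, as in the solution.

```python
from math import sqrt
from statistics import NormalDist


def approx_poisson_ci(total_claims, n, level):
    """Normal-approximation interval for a Poisson mean from n observations."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    x_bar = total_claims / n
    half_width = z * sqrt(x_bar / n)  # estimated standard error of the sample mean
    return x_bar - half_width, x_bar + half_width


lower, upper = approx_poisson_ci(800, 5000, level=0.90)
print(f"({lower:.3f}, {upper:.3f})")
```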
5 Confidence intervals for two-sample problems
A comparison of the parameters of two populations can be considered by taking independent random samples from each population.

The importance of the independence is illustrated by noting that:

$$\text{var}(\bar X_1 - \bar X_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$

when the samples are independent.

If the samples are not independent, then a covariance term will be included:

$$\text{var}(\bar X_1 - \bar X_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} - 2\,\text{cov}(\bar X_1, \bar X_2)$$

This covariance term can clearly have a substantial effect in the non-independent case.

The most common form of non-independence is due to paired data.
5.1 Two normal means

Case 1 (known population variances)

If $\bar X_1$ and $\bar X_2$ are the means from independent random samples of size $n_1$ and $n_2$ respectively, taken from normal populations which have known variances $\sigma_1^2$ and $\sigma_2^2$ respectively, then the equal-tailed $100(1 - \alpha)\%$ confidence interval for the difference in the population means is given by:

$$(\bar X_1 - \bar X_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

So for example, when $\alpha = 5\%$, we have $z_{\alpha/2} = z_{2.5\%} = 1.9600$.

There is no built-in function for calculating the above confidence interval in R. We can use R to calculate the results of the formula from scratch or use a bootstrap method if the assumptions are not met.
Case 2 (unknown population variances)

If $\bar X_1$, $\bar X_2$, $S_1$ and $S_2$ are the means and standard deviations from independent random samples of size $n_1$ and $n_2$ respectively, taken from normal populations which have equal variances, then the equal-tailed $100(1 - \alpha)\%$ confidence interval for the difference in the population means is given by:

$$(\bar X_1 - \bar X_2) \pm t_{\alpha/2,\, n_1 + n_2 - 2} \cdot S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

where:

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$

This formula is given on page 23 of the Tables.

In any practical situation consideration must be made as to whether $n_1$ and $n_2$ are large or small and whether $\sigma_1^2$ and $\sigma_2^2$ are known or unknown.

In the case of the t result it should be noted that there is the additional assumption of equality of variances and this should be checked by plotting the data in a suitable way and/or using the formal approach in Section 5.2.

Note: The pooled estimator $S_p^2$ is based on the maximum likelihood estimator but adjusted to give an unbiased estimator.

Remember that the number of degrees of freedom for the t distribution is the same as the number used in the denominator of the pooled sample variance formula. $S_1^2$ and $S_2^2$ are the sample variances calculated in the usual way.
Question
A motor company runs tests to investigate the fuel consumption of cars using a newly developed fuel additive. Sixteen cars of the same make and age are used, eight with the new additive and eight as controls. The results, in miles per gallon over a test track under regulated conditions, are as follows:

Control:   27.0  32.2  30.4  28.0  26.5  25.5  29.6  27.2
Additive:  31.4  29.9  33.2  34.4  32.0  28.7  26.1  30.3

Calculate a 95% confidence interval for the increase in miles per gallon achieved by cars with the additive. State clearly any assumptions required for this analysis.
Solution
Assuming that the samples come from normal distributions with the same variance and that the samples are independent, we know that:

$$\frac{(\bar A - \bar C) - (\mu_A - \mu_C)}{S_p\sqrt{\frac{1}{n_A} + \frac{1}{n_C}}} \sim t_{n_A + n_C - 2}$$

where $\bar A$ and $\bar C$ are the sample means, $\mu_A$ and $\mu_C$ are the underlying population means, $n_A$ and $n_C$ are the sample sizes and $S_p$ is the pooled sample standard deviation.

We now replace $\bar A$ by 30.75, $\bar C$ by 28.3, $S_p^2$ by 5.96, and $n_A$ and $n_C$ by 8 to obtain the confidence interval.

(The individual sample variances are $s_A^2 = \frac{48.06}{7}$ and $s_C^2 = \frac{35.38}{7}$.)

For a symmetric confidence interval, we need the upper 2.5% point of $t_{14}$, which is 2.145.

Substituting these values in gives a symmetrical 95% confidence interval of:

$$2.45 \pm 2.145\sqrt{\frac{5.96}{4}} = (-0.168, 5.068)$$
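The pooled-variance calculation can be reproduced from the raw data with the short Python sketch below. It is an illustration only: the function name `pooled_t_interval` is hypothetical, and the t percentage point 2.145 is taken from the solution above rather than computed, because the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, variance

control = [27.0, 32.2, 30.4, 28.0, 26.5, 25.5, 29.6, 27.2]
additive = [31.4, 29.9, 33.2, 34.4, 32.0, 28.7, 26.1, 30.3]

T_CRIT_14 = 2.145  # upper 2.5% point of t with 14 degrees of freedom, as quoted in the solution


def pooled_t_interval(sample_1, sample_2, t_crit):
    """Equal-variance interval for the difference in means (sample_1 minus sample_2)."""
    n1, n2 = len(sample_1), len(sample_2)
    # statistics.variance uses the n - 1 divisor, i.e. the usual sample variance.
    pooled_var = ((n1 - 1) * variance(sample_1) + (n2 - 1) * variance(sample_2)) / (n1 + n2 - 2)
    diff = mean(sample_1) - mean(sample_2)
    half_width = t_crit * sqrt(pooled_var * (1 / n1 + 1 / n2))
    return diff - half_width, diff + half_width, pooled_var


lower, upper, pooled_var = pooled_t_interval(additive, control, T_CRIT_14)
print(f"pooled variance = {pooled_var:.2f}, interval = ({lower:.3f}, {upper:.3f})")
```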
In R we can use the function t.test with the argument var.equal = TRUE to obtain a confidence interval for the difference between the means with unknown but equal variances.

The t.test function can also obtain confidence intervals for the difference between the means with unknown but non-equal variances.

Again, we could use the bootstrap method to construct empirical confidence intervals if the assumptions of the above formulae are not met.

5.2 Two population variances
For the comparison of two population variances, it is more natural to consider the ratio $\sigma_1^2 / \sigma_2^2$ than the difference $\sigma_1^2 - \sigma_2^2$. This follows logically from the concept of variance, but also from a technical point of view there is a pivotal quantity readily available for the ratio of normal variances but not for their difference.

It is:

$$\frac{S_1^2 / S_2^2}{\sigma_1^2 / \sigma_2^2} \sim F_{n_1 - 1,\, n_2 - 1}$$

The resulting confidence interval is given by:

$$\frac{S_1^2}{S_2^2} \cdot \frac{1}{F_{n_1 - 1,\, n_2 - 1}} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{S_1^2}{S_2^2} \cdot F_{n_2 - 1,\, n_1 - 1}$$

where $F_{n_1 - 1,\, n_2 - 1}$ is the relevant percentage point from the F distribution.

Notice that the order of the degrees of freedom is different in the two F distributions here.
It should be said that in practice the estimation of $\sigma_1^2 / \sigma_2^2$ is not a common objective. However the same F result is used for the more common objective of testing whether $\sigma_1^2$ and $\sigma_2^2$ may be equal, which is relevant for the t result for comparing population means. The acceptability of the hypothesis $H_0: \sigma_1^2 = \sigma_2^2$ can be determined simply by confirming that the value 1 is included in the confidence interval for $\sigma_1^2 / \sigma_2^2$.

What this is saying is that if the number 1 lies in the confidence interval, then 1 is one of the many reasonable values that the variance ratio can take. So we are not unhappy about the assumption that $\sigma_1^2 / \sigma_2^2 = 1$, ie $\sigma_1^2 = \sigma_2^2$. The alternative way of checking equality is to use the hypothesis test detailed in Chapter 10.
Question
For the fuel additive data in the previous question, calculate a 90% confidence interval for the ratio $\sigma_C^2 / \sigma_A^2$ of the variances of the fuel consumption distributions both without and with the additive, and comment on the equality of variances assumption needed for the analysis in that question.
Solution
For two independent random samples from $N(\mu_A, \sigma_A^2)$ and $N(\mu_C, \sigma_C^2)$:

$$\frac{S_A^2 / \sigma_A^2}{S_C^2 / \sigma_C^2} \sim F_{n_A - 1,\, n_C - 1}$$

where $n_A$ and $n_C$ are the sample sizes, and $S_A^2$ and $S_C^2$ are the sample variances.

From the previous question, $s_A^2 = 6.8657$ and $s_C^2 = 5.0543$.

From the Tables, we know that $0.90 = P\left(\frac{1}{3.787} < F_{7,7} < 3.787\right)$, which gives us:

$$0.90 = P\left(\frac{1}{3.787} < \frac{S_A^2 / \sigma_A^2}{S_C^2 / \sigma_C^2} < 3.787\right)$$

Rearranging this to give $\sigma_C^2 / \sigma_A^2$ (and dropping the probability notation since $\sigma_C^2 / \sigma_A^2$ is not a random variable), we get:

$$0.1944 < \frac{\sigma_C^2}{\sigma_A^2} < 2.788$$

So the confidence interval is therefore $(0.1944, 2.788)$.

Since the value of 1 lies well within this interval, the assumption of equality of variances needed in the previous question appears to be justified.
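A short Python sketch of the same rearrangement is given below, purely as a check on the arithmetic. The function name `variance_ratio_interval` is hypothetical, and the F percentage point 3.787 is the value quoted from the Tables in the solution, since the Python standard library has no F distribution.

```python
def variance_ratio_interval(s2_num, s2_den, f_num_den, f_den_num):
    """Interval for sigma_num^2 / sigma_den^2 from two sample variances.

    f_num_den is the upper percentage point of F with (n_num - 1, n_den - 1)
    degrees of freedom and f_den_num the point with the degrees of freedom reversed.
    """
    ratio = s2_num / s2_den
    return ratio / f_num_den, ratio * f_den_num


s2_control = 35.38 / 7   # sample variance of the control group
s2_additive = 48.06 / 7  # sample variance of the additive group
F_7_7 = 3.787            # upper 5% point of F(7,7), as quoted in the solution

# Both samples have size 8, so the same percentage point applies in each direction.
lower, upper = variance_ratio_interval(s2_control, s2_additive, F_7_7, F_7_7)
contains_one = lower < 1 < upper
print(f"({lower:.4f}, {upper:.3f}), contains 1: {contains_one}")
```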
The R code for this confidence interval is var.test or we could use a bootstrap method if the assumptions are not met.

5.3 Two population proportions

The comparison of population proportions corresponds to comparing two binomial probabilities on the basis of single observations $X_1$, $X_2$ from $\text{Bin}(n_1, \theta_1)$ and $\text{Bin}(n_2, \theta_2)$ respectively.

Considering only the case where $n_1$ and $n_2$ are large, so that the normal approximation can be used, the pivotal quantity used in practice is:

$$\frac{(\hat\theta_1 - \hat\theta_2) - (\theta_1 - \theta_2)}{\sqrt{\frac{\hat\theta_1(1 - \hat\theta_1)}{n_1} + \frac{\hat\theta_2(1 - \hat\theta_2)}{n_2}}} \sim N(0,1) \text{ approximately}$$

where $\hat\theta_1$, $\hat\theta_2$ are the MLEs $\frac{X_1}{n_1}$, $\frac{X_2}{n_2}$, respectively.

The R code for this confidence interval is prop.test with the argument correct=FALSE.
Question
In a one-year mortality investigation, 25 of the 100 ninety-year-old males and 20 of the 150 ninety-year-old females present at the start of the investigation died before the end of the year. Assuming that the numbers of deaths follow independent binomial distributions, calculate a symmetrical 95% confidence interval for the difference between male and female mortality rates at this age.
Solution
Since the samples come from independent binomial distributions, we know that, approximately:

$$\frac{\left(\frac{X_1}{n_1} - \frac{X_2}{n_2}\right) - (p_1 - p_2)}{\sqrt{\frac{\frac{X_1}{n_1}\left(1 - \frac{X_1}{n_1}\right)}{n_1} + \frac{\frac{X_2}{n_2}\left(1 - \frac{X_2}{n_2}\right)}{n_2}}} \sim N(0,1)$$

Calling $\frac{X_1}{n_1} = \hat p_1$ and $\frac{X_2}{n_2} = \hat p_2$, and using the Tables, we know that:

$$0.95 = P\left(-1.96 < \frac{(\hat p_1 - \hat p_2) - (p_1 - p_2)}{\sqrt{\frac{\hat p_1(1 - \hat p_1)}{n_1} + \frac{\hat p_2(1 - \hat p_2)}{n_2}}} < 1.96\right)$$

Replacing $\hat p_1$ by 0.25 and $\hat p_2$ by 0.133, the inequality becomes:

$$0.016 < p_1 - p_2 < 0.218$$

So a symmetrical 95% confidence interval for the difference in mortality rates is $(0.016, 0.218)$.
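The calculation in this solution can be sketched in Python as follows. This is an illustration only; the function name `two_proportion_ci` is hypothetical and no continuity correction is applied.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, level=0.95):
    """Normal-approximation interval for the difference between two binomial proportions."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    p1, p2 = x1 / n1, x2 / n2
    std_err = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * std_err, diff + z * std_err


# 25 deaths out of 100 males and 20 deaths out of 150 females.
lower, upper = two_proportion_ci(25, 100, 20, 150)
print(f"({lower:.4f}, {upper:.4f})")
```

Because the sketch keeps the female rate unrounded (20/150) rather than using 0.133, its upper limit can differ from the quoted interval in the third decimal place.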
5.4 Two Poisson parameters

Considering the comparison of two Poisson parameters ($\lambda_1$ and $\lambda_2$) when the normal approximation can be used:

$\bar X_i$ is an estimator of $\lambda_i$ such that $\bar X_i \approx N\left(\lambda_i, \frac{\lambda_i}{n_i}\right)$.

Therefore $\bar X_1 - \bar X_2$ is an estimator of $\lambda_1 - \lambda_2$ such that:

$$\bar X_1 - \bar X_2 \approx N\left(\lambda_1 - \lambda_2,\ \frac{\lambda_1}{n_1} + \frac{\lambda_2}{n_2}\right)$$

Using $\hat\lambda_i = \bar X_i$, an approximate 95% confidence interval for $\lambda_1 - \lambda_2$ is given by:

$$\bar X_1 - \bar X_2 \pm 1.96\sqrt{\frac{\bar X_1}{n_1} + \frac{\bar X_2}{n_2}}$$

We are assuming that the two samples are independent.

There is no built-in function for calculating the above confidence interval in R. We can use R to calculate the results of the formula from scratch. However, the function poisson.test can be used to obtain a confidence interval for the ratio of the two Poisson parameters.
Question
In a one-year investigation of claim frequencies for a particular category of motorists, there were 150 claims from the 500 policyholders aged under 25 and 650 claims from the 4,500 remaining policyholders. Assuming that the numbers of claims made by the individual motorists in each category have independent Poisson distributions, calculate a 99% confidence interval for the difference between the two Poisson parameters.
Solution
Since the samples come from independent Poisson distributions, we know that:

$$\frac{(\bar X_1 - \bar X_2) - (\lambda_1 - \lambda_2)}{\sqrt{\frac{\bar X_1}{n_1} + \frac{\bar X_2}{n_2}}} \sim N(0,1) \text{ approximately}$$

where subscripts 1 and 2 refer to young and old drivers respectively.

From the Tables, we know that $0.99 = P(-2.5758 < Z < 2.5758)$. This gives us:

$$0.99 = P\left(-2.5758 < \frac{(\bar X_1 - \bar X_2) - (\lambda_1 - \lambda_2)}{\sqrt{\frac{\bar X_1}{n_1} + \frac{\bar X_2}{n_2}}} < 2.5758\right)$$

Replacing $\bar X_1$ by 0.3, $\bar X_2$ by 0.1444, $n_1$ by 500 and $n_2$ by 4,500 and rearranging, the inequality becomes:

$$0.0908 < \lambda_1 - \lambda_2 < 0.2203$$

So the confidence interval for $\lambda_1 - \lambda_2$ is $(0.0908, 0.2203)$.
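As a check on the arithmetic, the Python sketch below evaluates the same interval from the claim counts. The function name `two_poisson_ci` is hypothetical and the code is only an illustration of the formula used in this solution.

```python
from math import sqrt
from statistics import NormalDist


def two_poisson_ci(claims_1, n1, claims_2, n2, level=0.99):
    """Normal-approximation interval for the difference between two Poisson means."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    mean_1, mean_2 = claims_1 / n1, claims_2 / n2
    std_err = sqrt(mean_1 / n1 + mean_2 / n2)  # sample means estimate the Poisson variances
    diff = mean_1 - mean_2
    return diff - z * std_err, diff + z * std_err


# 150 claims from 500 young policyholders, 650 claims from the other 4,500.
lower, upper = two_poisson_ci(150, 500, 650, 4500)
print(f"({lower:.4f}, {upper:.4f})")
```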
6 Paired data

Paired data is a common example of comparison using non-independent samples.

Essentially having paired or matched data means that there is one sample:

$$(X_{11}, X_{21}), (X_{12}, X_{22}), (X_{13}, X_{23}), \ldots, (X_{1n}, X_{2n})$$

rather than two separate samples:

$$(X_{11}, X_{12}, X_{13}, \ldots, X_{1n}) \quad \text{and} \quad (X_{21}, X_{22}, X_{23}, \ldots, X_{2n})$$

The paired situation is really a single sample problem, that is, a problem based on a sample of $n$ pairs of observations. (In the independent two-sample situation the sample sizes need not, of course, be equal.)

Paired data can arise in the form of before and after comparisons.

We will see one of these in the next question.

Investigations using paired data are usually better than two-sample investigations in the sense that the estimation is more accurate.

This means the confidence interval derived from the paired data will usually be narrower.

Paired data are analysed using the differences $D_i = X_{1i} - X_{2i}$ and estimation of $\mu_D = \mu_1 - \mu_2$ is considered.

A z result or a t result can be used, but the latter will be more common as it is unlikely that the variances of the differences will be known. Assuming normality of the population of such differences (but not necessarily the normality of the $X_1$ and $X_2$ populations), the pivotal quantity for the t result is:

$$\frac{\bar D - \mu_D}{S_D / \sqrt{n}} \sim t_{n-1}$$

$S_D$ is calculated from the values of $D$.

The resulting 95% confidence interval for $\mu_D$ will be $\bar D \pm t_{0.025,\, n-1}\frac{S_D}{\sqrt{n}}$.
Question
The average blood pressure $\bar b$ for a group of 10 patients was 77.0 mmHg. The average blood pressure $\bar a$ after they were put on a special diet was 75.0 mmHg. Assuming that variation in blood pressure follows a normal distribution, calculate a 95% symmetrical confidence interval for the reduction in blood pressure attributable to the special diet. Assess the effectiveness of the diet in reducing the patients' blood pressure. It is known that $\sum (b_i - a_i)^2 = 68$.
Solution
Since this is a paired sample from a normal distribution, we know that:

$$\frac{\bar D - (\mu_A - \mu_B)}{S_D / \sqrt{n}} \sim t_{n-1}, \quad \text{where } D = A - B$$

From the Tables, we know that $0.95 = P(-2.262 < t_9 < 2.262)$, so:

$$0.95 = P\left(-2.262 < \frac{\bar D - (\mu_A - \mu_B)}{S_D / \sqrt{n}} < 2.262\right)$$

We can now replace $n$ by 10, $\bar D$ by $75.0 - 77.0 = -2.0$ and $S_D$ by:

$$s_D = \sqrt{\frac{\sum (b_i - a_i)^2 - n(\bar b - \bar a)^2}{n - 1}} = \sqrt{\frac{68 - 10 \times 4}{9}} = 1.764$$

Rearranging gives a 95% confidence interval for $\mu_A - \mu_B$ of $(-3.26, -0.74)$.

Since this interval does not include the value 0 (which would be the value if there was no difference in the average blood pressure before and after), the diet seems to be effective.
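The paired calculation can be sketched in Python from the summary figures given in the question. This is an illustration only: the function name `paired_t_interval` is hypothetical, the differences are taken as after minus before, and the t percentage point 2.262 is the value quoted from the Tables in the solution.

```python
from math import sqrt


def paired_t_interval(n, mean_diff, sum_sq_diff, t_crit):
    """Interval for the mean difference from summary statistics of the n paired differences."""
    # Sample standard deviation of the differences via the sum-of-squares identity.
    s_d = sqrt((sum_sq_diff - n * mean_diff**2) / (n - 1))
    half_width = t_crit * s_d / sqrt(n)
    return mean_diff - half_width, mean_diff + half_width, s_d


T_CRIT_9 = 2.262  # upper 2.5% point of t with 9 degrees of freedom, as quoted in the solution

lower, upper, s_d = paired_t_interval(n=10, mean_diff=75.0 - 77.0, sum_sq_diff=68.0, t_crit=T_CRIT_9)
print(f"s_D = {s_d:.3f}, interval = ({lower:.2f}, {upper:.2f})")
```

The sum of squared differences is unchanged by the sign convention, so the same figure of 68 applies whether the differences are taken as before minus after or after minus before.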
A plot of the sample differences can be used to check on normality but recall that the t result is robust as $n$ increases. Also the Central Limit Theorem means that it can be safely used for large $n$.

From a practical viewpoint:

(i) When confronted with two-sample data, consideration should be made of whether the data may in fact be paired. One way is to draw a scatterplot and calculate the correlation coefficient to see whether there is any relationship in the pairs of data points. If there is a strong relationship, the data source should be checked to see if the data were paired by design.

(ii) If a paired problem is analysed as though it involved independent samples, then the results would be invalid because the assumption of independence is violated. On the other hand, if independent samples are analysed as though they were paired, then the results would be valid although they would be making inefficient use of the data due to the discarding of possible information about the means and variances of the two separate populations.

The ideal approach is to ask the person who collected the data whether any pairing was used.

The R code for this confidence interval is t.test with the argument paired=TRUE.
The fuel consumption data given earlier in the chapter were not paired data. There is no way to link a specific item of data in the control group to the corresponding item of data in the additive group. So we analyse the data using the two-sample t situation.

On the other hand, suppose that we had measured the fuel consumption of 8 cars without a fuel additive, and then re-measured the fuel consumption of the same 8 cars with the fuel additive. This is now a paired situation. A data item from the first sample is linked to a specific item in the second sample. In this situation we would treat the data as being paired, and would subtract the figure for control consumption for each car from the figure for the same car when using the additive.
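Chapter 9 Summary

Confidence intervals

A confidence interval gives us a range of values in which we believe the true parameter value lies, together with an associated probability. There are a number of different situations for which we can find confidence intervals.

For a single sample from a normal distribution:

$$\frac{\bar X - \mu}{\sigma / \sqrt{n}} \sim N(0,1) \quad (\sigma^2 \text{ known}) \qquad \frac{\bar X - \mu}{S / \sqrt{n}} \sim t_{n-1} \quad (\sigma^2 \text{ unknown})$$

$$\frac{(n - 1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$

For samples from two independent normal distributions:

$$\frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0,1) \quad (\sigma_1^2, \sigma_2^2 \text{ known})$$

$$\frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{n_1 + n_2 - 2} \quad (\sigma_1^2, \sigma_2^2 \text{ unknown, assuming equal variances})$$

where:

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$

To compare the variances of two independent normal populations:

$$\frac{S_1^2 / \sigma_1^2}{S_2^2 / \sigma_2^2} \sim F_{n_1 - 1,\, n_2 - 1}$$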
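For a sample from a binomial distribution:

$$\frac{\hat p - p}{\sqrt{\hat p \hat q / n}} \sim N(0,1) \quad \text{or} \quad \frac{X - np}{\sqrt{n \hat p \hat q}} \sim N(0,1) \quad \text{(approximately)}$$

For samples from two independent binomial distributions:

$$\frac{(\hat p_1 - \hat p_2) - (p_1 - p_2)}{\sqrt{\frac{\hat p_1 \hat q_1}{n_1} + \frac{\hat p_2 \hat q_2}{n_2}}} \sim N(0,1) \quad \text{(approximately)}, \quad \text{where } \hat p_1 = \frac{X_1}{n_1},\ \hat p_2 = \frac{X_2}{n_2}$$

For a sample from a Poisson distribution:

$$\frac{\hat\lambda - \lambda}{\sqrt{\hat\lambda / n}} \sim N(0,1) \quad \text{or} \quad \frac{\sum X_i - n\lambda}{\sqrt{n\hat\lambda}} \sim N(0,1) \quad \text{(approximately)}$$

For samples from two independent Poisson distributions:

$$\frac{(\hat\lambda_1 - \hat\lambda_2) - (\lambda_1 - \lambda_2)}{\sqrt{\frac{\hat\lambda_1}{n_1} + \frac{\hat\lambda_2}{n_2}}} \sim N(0,1) \quad \text{(approximately)}, \quad \text{where } \hat\lambda_1 = \bar X_1,\ \hat\lambda_2 = \bar X_2$$

General confidence intervals for parameters can be found, using the pivotal method, and the formulae given above.

For paired data we subtract the paired values to come up with a new variable, $D$, and then follow one of the other standard confidence interval calculations:

$$\frac{\bar X_D - \mu_D}{S_D / \sqrt{n}} \sim t_{n-1} \quad (\sigma^2 \text{ unknown})$$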
Prediction intervals

A prediction interval gives us a range of values for a future predicted value, together with an associated probability.

For a single sample from a normal distribution:

$$\frac{X_{n+1} - \bar X}{\sigma\sqrt{1 + \frac{1}{n}}} \sim N(0,1) \quad (\sigma^2 \text{ known}) \qquad \frac{X_{n+1} - \bar X}{S\sqrt{1 + \frac{1}{n}}} \sim t_{n-1} \quad (\sigma^2 \text{ unknown})$$
Chapter 9 Practice Questions

9.1 A survey was carried out to find out the number of hours that actuarial students spend watching television per week. It was discovered that for a sample of 10 students, the following times were spent watching television:

8, 4, 7, 5, 9, 7, 6, 9, 5, 7

(i) (a) Calculate a symmetrical 95% confidence interval for the mean time an actuarial student spends watching television per week.
(b) Write down the assumptions needed to calculate the confidence interval in part (a).

(ii) Calculate a symmetrical 95% prediction interval for the time an actuarial student spends watching television per week.

(iii) (a) Describe the limiting case of the formulae for the intervals in parts (i)(a) and (ii) as $n$ tends to infinity.
(b) Explain which of the two intervals calculated will be more sensitive to the assumptions in part (i)(b).

9.2 A researcher investigating attitudes to Sunday shopping reports that, in a sample of 8 interviewees, 7 were in favour of more opportunities to shop on Sunday.

Calculate an exact 95% confidence interval for the underlying proportion in favour of this idea using the binomial distribution.
9.3 An opinion poll of 1,000 voters found that 450 favoured Party P. Calculate an approximate 99% confidence interval for the proportion of voters who favour Party P.

Comment on the likelihood of more than 50% of the voters voting for Party P in an election.
9.4 (Exam style) Two inspectors carry out property valuations for an estate agency. Over a particular week they each go out to similar properties. The table below shows their valuations (in 000s):

A:  102  98  93  86  92  94  89  97
B:   86  88  92  95  98  97  94  92  91

The dotplots for these two inspectors are as follows:

[Figure: dotplots of valuation ('000) for Inspector A and Inspector B, each on an axis running from 85 to 105]

(i) (a) Comment on the possible assumption of normality and equal variances for the two underlying populations using the diagrams.
(b) Calculate a 95% confidence interval for this common variance using the equal variance assumption from part (a).
(c) Calculate a 95% confidence interval for the mean difference between the valuations by A and B, commenting briefly on the result. [10]

The estate agency employing the inspectors decides to test their valuations by sending them each to the same set of eight houses, independently and without knowledge that the other is going. The resulting valuations (in 000s) follow:

Property:    1    2    3    4    5    6    7    8
A:          94   98  102  132  118  121  106  123
B:          92   96  111  129  111  122  101  118

(ii) Calculate a 90% confidence interval for the mean difference between valuations by A and B, commenting briefly on the result. [4]
[Total 14]
9.5 (Exam style) The ordered remission times (in weeks) of 20 leukaemia patients are given in the table:

 1   1   2   2   3   4   4   5   5   8
 8   8  11  11  12  12  15  17  22  23

Suppose the remission times can be regarded as a random sample from an exponential distribution with density:

$$f(x; \lambda) = \lambda e^{-\lambda x}, \quad x > 0$$

(i) (a) Determine the maximum likelihood estimator $\hat\lambda$ of $\lambda$.
(b) Calculate the large-sample approximate variance of $\hat\lambda$.
(c) Hence calculate an approximate 95% confidence interval for $\lambda$. [7]

(ii) (a) Calculate an exact 95% confidence interval for $\lambda$ using the fact that $2\lambda n \bar X$ has a $\chi^2_{2n}$ distribution.
(b) Comment briefly on how it compares with your interval in (i)(c). [3]
[Total 10]
9.6 Heights of males with classic congenital adrenal hyperplasia (CAH) are assumed to be normally distributed.

(i) Determine the minimum sample size to ensure that a 95% confidence interval for the mean height has a maximum width of 10cm, if:
(a) a previous sample has a standard deviation of 8.4 cm
(b) the population standard deviation is 8.4 cm.

(ii) Determine the minimum sample size to ensure that a 95% prediction interval for the height of a male with CAH has a maximum width of 38cm, if:
(a) a previous sample has a standard deviation of 8.4 cm
(b) the population standard deviation is 8.4 cm.

(iii) Comment on the difference in sample sizes required for parts (i) and (ii).

9.7 A sample value of 2 is obtained from a Poisson distribution.

(i) Calculate an exact two-sided 90% confidence interval for the mean of the distribution.

A sample of 30 values from the same Poisson distribution has a mean of 2.

(ii) Use these data values to construct an approximate 90% confidence interval for the mean of the distribution.

9.8 An office manager wants to analyse the variability in the time taken for her typists to complete a given task. She has given seven typists the task and the results are as follows (in minutes):

15, 17.2, 13.7, 11.2, 18, 15.1, 14

The manager wants a 95% confidence interval for the true standard deviation of time taken of the form $(k, \infty)$.

Calculate the value of $k$.
9.9 (Exam style) The amounts of individual claims arising under a certain type of general insurance policy are known from past experience to conform to a lognormal distribution in which the standard deviation is 1.8 times the mean. An actuary has found that the lower and upper limits of a 95% confidence interval for the mean claim amount are 4,250 and 4,750.

Evaluate the lower and upper limits of a 95% confidence interval for the lognormal parameter $\mu$. [3]
9.10 (Exam style) A general insurance company is debating introducing a new screening programme to reduce the claim amounts that it needs to pay out. The programme consists of a much more detailed application form that takes longer for the new client department to process. The screening is applied to a test group of clients as a trial whilst other clients continue to fill in the old application form. It can be assumed that claim payments follow a normal distribution.

The claim payments data for samples of the two groups of clients are (in 100 per year):

Without screening:  24.5  21.7  35.2  15.9  23.7  34.2  29.3  21.1  23.5  28.3
With screening:     22.4  21.2  36.3  15.7  21.5   7.3  12.8  21.2  23.9  18.4

(i) (a) Calculate a 95% confidence interval for the difference between the mean claim amounts.
(b) Comment on your answer. [6]

(ii) (a) Calculate a 95% confidence interval for the ratio of the population variances.
(b) Hence, comment on the assumption of equal variances required in part (i). [4]

Assume that the sample sizes taken from the clients with and without screening are always equal to keep processing easy.

(iii) Calculate the minimum sample size so that the width of a 95% confidence interval for the difference between mean claim amounts is less than 10, assuming that the samples have the same variances as in part (i). [3]
[Total 13]
Chapter 9 Solutions

9.1 (i)(a) The sample mean and variance are:

$$\bar x = \frac{67}{10} = 6.7 \qquad s^2 = \frac{1}{9}\left\{475 - 10 \times 6.7^2\right\} = 2.9$$

So the confidence interval is given by:

$$6.7 \pm t_{0.025;9}\sqrt{\frac{2.9}{10}}$$

From the Tables with $\alpha = 0.025$, $t_{0.025,9} = 2.262$, so our confidence interval is $(5.48, 7.92)$.

(i)(b) We have assumed that the number of hours that actuarial students spend watching television has a normal distribution.

(ii) The prediction interval is given by:

$$6.7 \pm t_{0.025;9}\sqrt{2.9\left(1 + \frac{1}{10}\right)}$$

This gives a prediction interval of $(2.66, 10.7)$.

(iii)(a) For large samples, the confidence interval for the mean will eventually converge on the sample mean which should be equal to the true mean, whereas the prediction interval will not converge to a single value but down to an interval of the distribution.

(iii)(b) Unlike confidence intervals for the mean, which are concerned with the centre of the distribution, prediction intervals also take account of the tails as well as the centre. Hence, prediction intervals have greater sensitivity to the assumption of normality.
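The contrast between the two intervals in Solution 9.1 can be seen by computing them side by side. The Python sketch below is an illustration only; the variable names are hypothetical and the t percentage point 2.262 is the value quoted from the Tables in the solution.

```python
from math import sqrt
from statistics import mean, variance

hours = [8.0, 4.0, 7.0, 5.0, 9.0, 7.0, 6.0, 9.0, 5.0, 7.0]
T_CRIT_9 = 2.262  # upper 2.5% point of t with 9 degrees of freedom, as quoted in the solution

n = len(hours)
x_bar = mean(hours)
s2 = variance(hours)  # sample variance with the n - 1 divisor

# Confidence interval for the mean: the half-width shrinks like 1 / sqrt(n).
ci_half_width = T_CRIT_9 * sqrt(s2 / n)
# Prediction interval for a new observation: the extra 1 keeps the half-width from shrinking to zero.
pi_half_width = T_CRIT_9 * sqrt(s2 * (1 + 1 / n))

ci = (x_bar - ci_half_width, x_bar + ci_half_width)
pi = (x_bar - pi_half_width, x_bar + pi_half_width)
print(f"CI = ({ci[0]:.2f}, {ci[1]:.2f}), PI = ({pi[0]:.2f}, {pi[1]:.2f})")
```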
9.2
The number in a sample of 8 who are in favour has a $Bin(8,p)$ distribution, where $p$ is the true underlying proportion in favour.
We want the value of $p$ for which the probability of getting 7 or more in favour in a sample of 8 is 0.025. This will give the lower end of the confidence interval for $p$.
We also want the value of $p$ for which the probability of getting 7 or fewer in favour is 0.025. This will give us the upper end of the interval.
The probability of getting 7 or more in favour is:
$$\binom{8}{7}p^7(1-p) + p^8 = 0.025$$
Rearranging the equation:
$$p^7(8-7p) = 0.025$$
Using trial and error, or goalseek in Excel, to solve this equation we obtain:
$$p = 0.4735$$
For the upper end of the interval, we have:
$$1 - p^8 = 0.025$$
which we can solve directly to give $p = 0.9968$. So a 95% confidence interval for $p$ is (0.4735, 0.9968).
9.3
Assuming that the sample comes from a binomial distribution, we know that the quantity:
$$\frac{X-np}{\sqrt{np(1-p)}} \sim N(0,1) \quad\text{or}\quad \frac{\frac{X}{n}-p}{\sqrt{\frac{p(1-p)}{n}}} \sim N(0,1)$$
Here $n = 1{,}000$ and $X$ is the number who favour Party P.
From the Tables we find that $0.99 = P(-2.5758 < Z < 2.5758)$, so:
$$0.99 = P\left(-2.5758 < \frac{\frac{X}{n}-p}{\sqrt{\frac{p(1-p)}{n}}} < 2.5758\right)$$
Rearranging this to give us $p$, and replacing $p$ by $\hat{p}$ under the square root:
$$0.99 = P\left(\frac{X}{n} - 2.5758\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < p < \frac{X}{n} + 2.5758\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$$
Replacing $X$ by 450 and $\hat{p}$ by $\frac{450}{1{,}000}$, we get the confidence interval to be (0.409, 0.491).
Since this 99% confidence interval doesn't contain the value $p = 0.5$ (or higher values of $p$), it is unlikely that Party P will gain more than 50% of the votes.
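The arithmetic in Solution 9.3 can be checked with a short Python sketch. The function name `proportion_ci` is purely illustrative (it is not part of the Core Reading), and the normal percentage point is taken from the standard library rather than the Tables.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, n, level=0.99):
    """Approximate (normal-based) confidence interval for a binomial proportion."""
    p_hat = x / n
    # Two-sided percentage point, eg 2.5758 for a 99% interval
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # p is replaced by its estimate under the square root, as in the solution
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


lower, upper = proportion_ci(450, 1000)
print(round(lower, 3), round(upper, 3))
```

The whole interval lies below 0.5, which is the basis of the conclusion about Party P.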
9.4
(i)(a)
B appears to have a slightly smaller spread (but it is hard to tell with so few data points). The difference in the spread doesn't appear to be significant, so the assumption of equal variances can be allowed to stand. [1]
There are no outliers and so there is nothing to suggest non-normality. [1]
(i)(b)
For Inspector A, we have $n_A = 8$, $\sum x_A = 751$, $\sum x_A^2 = 70{,}683$, giving:
$$s_A^2 = \frac{1}{7}\left(70{,}683 - \frac{751^2}{8}\right) = 26.125$$ [1]
For Inspector B, we have $n_B = 9$, $\sum x_B = 833$, $\sum x_B^2 = 77{,}223$, giving:
$$s_B^2 = \frac{1}{8}\left(77{,}223 - \frac{833^2}{9}\right) = 15.528$$ [1]
The common (or pooled) variance is given by:
$$s_P^2 = \frac{7 \times 26.125 + 8 \times 15.528}{7+8} = 20.473$$ [1]
The pivotal quantity is $\frac{15 S_P^2}{\sigma^2} \sim \chi^2_{15}$. This gives a 95% confidence interval for $\sigma^2$ of:
$$\left(\frac{15 \times 20.473}{27.49}, \frac{15 \times 20.473}{6.262}\right) = (11.2, 49.0)$$ [1]
(i)(c)
The confidence interval is calculated using:
$$(\bar{x}_A - \bar{x}_B) \pm t_{0.025,15}\, s_P \sqrt{\frac{1}{n_1}+\frac{1}{n_2}} = \left(\frac{751}{8} - \frac{833}{9}\right) \pm 2.131\sqrt{20.473}\sqrt{\frac{1}{8}+\frac{1}{9}}$$ [2]
This gives a confidence interval of (-3.37, 6.00). [1]
Since this interval contains zero, there is insufficient evidence at the 5% level to suggest that there is a difference in the valuations given by each of the two inspectors. [1]
(ii)
For the differences we have $n_D = 8$, $\sum x_D = 14$, $\sum x_D^2 = 198$, giving:
$$\bar{x}_D = 1.75 \qquad s_D^2 = \frac{1}{7}\left(198 - \frac{14^2}{8}\right) = 24.786$$ [1]
The confidence interval is calculated using:
$$\bar{x}_D \pm t_{0.05,7}\sqrt{\frac{s_D^2}{n_D}} = \frac{14}{8} \pm 1.895\sqrt{\frac{24.786}{8}}$$ [1]
This gives a confidence interval of (-1.59, 5.09). Since this interval contains zero there is insufficient evidence to suggest that A and B give different valuations.
9.5
(i)(a)
The likelihood function is:
$$L(\lambda) = \prod_{i=1}^{n}\lambda e^{-\lambda x_i} = \lambda^n e^{-\lambda\sum x_i}$$ [2]
Taking logs and differentiating:
$$\ln L(\lambda) = n\ln\lambda - \lambda\sum x_i \qquad \frac{d}{d\lambda}\ln L(\lambda) = \frac{n}{\lambda} - \sum x_i$$ [2]
Setting the derivative equal to zero:
$$\frac{n}{\hat\lambda} - \sum x_i = 0 \;\Rightarrow\; \hat\lambda = \frac{n}{\sum x_i} = \frac{1}{\bar{X}}$$ [1]
Checking it's a maximum:
$$\frac{d^2}{d\lambda^2}\ln L(\lambda) = -\frac{n}{\lambda^2} < 0 \;\Rightarrow\; \text{max}$$ [1/2]
For these data, $\hat\lambda = \frac{1}{8.7} = 0.11494$. [1/2]
(i)(b)
$$CRLB = -\frac{1}{E\left[\frac{d^2}{d\lambda^2}\ln L(\lambda)\right]} = \frac{1}{E\left[\frac{n}{\lambda^2}\right]} = \frac{\lambda^2}{n}$$ [1]
Using these data values, the estimate of the CRLB is $\hat\lambda^2/n = 0.000661$.
(i)(c)
Since $\hat\lambda \sim N(\lambda, CRLB)$ approximately, the confidence interval is given by:
$$\hat\lambda \pm 1.96\sqrt{CRLB}$$ [1]
which, using the CRLB estimate, gives (0.06457, 0.1653).
(ii)(a)
Since $2\lambda n\bar{X} \sim \chi^2_{2n}$, we have $40\lambda\bar{X} \sim \chi^2_{40}$. [1]
The lower and upper 2.5% points of $\chi^2_{40}$ are 24.43 and 59.34. So:
$$P(24.43 < 2n\lambda\bar{X} < 59.34) = 0.95$$
Hence a 95% confidence interval for $\lambda$ is:
$$\left(\frac{24.43}{40\bar{x}}, \frac{59.34}{40\bar{x}}\right) = \left(\frac{24.43}{348}, \frac{59.34}{348}\right) = (0.07020, 0.1705)$$ [1]
(ii)(b)
This confidence interval is narrower as it is based upon the exact result, whereas in part (i)(c) it was based on a relatively small sample of 20. A larger sample would have given a narrower interval. [2]
9.6
(i)(a)
Sample size needed (unknown variance)
Using the result $\frac{\bar{X}-\mu}{S/\sqrt{n}} \sim t_{n-1}$ gives a 95% confidence interval of:
$$\bar{x} \pm t_{0.025;n-1}\frac{s}{\sqrt{n}}$$
The width of this confidence interval is $2\,t_{0.025;n-1}\frac{s}{\sqrt{n}}$, so we require:
$$2\,t_{0.025;n-1}\frac{8.4}{\sqrt{n}} < 10 \;\Rightarrow\; \frac{t_{0.025;n-1}}{\sqrt{n}} < 0.5952$$
Using the values from page 163 of the Tables, we find that:
$$\frac{t_{0.025;12}}{\sqrt{13}} = \frac{2.179}{\sqrt{13}} = 0.6043$$
and:
$$\frac{t_{0.025;13}}{\sqrt{14}} = \frac{2.160}{\sqrt{14}} = 0.5773$$
Therefore we need a sample of at least 14 individuals.
(i)(b)
Sample size needed (known variance)
Using the result $\frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim N(0,1)$ gives a 95% confidence interval of:
$$\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$$
The width of this confidence interval is $2 \times 1.96\frac{\sigma}{\sqrt{n}}$. So we require:
$$2 \times 1.96 \times \frac{8.4}{\sqrt{n}} < 10 \;\Rightarrow\; 3.29 < \sqrt{n} \;\Rightarrow\; n > 10.8$$
Therefore we need a sample of at least 11 individuals.
(ii)(a)
Sample size needed (unknown variance)
Using the result $\frac{X_{n+1}-\bar{X}}{S\sqrt{1+\frac{1}{n}}} \sim t_{n-1}$ gives a 95% confidence interval of:
$$\bar{x} \pm t_{0.025;n-1}\, s\sqrt{1+\frac{1}{n}}$$
The width of this confidence interval is $2\,t_{0.025;n-1}\, s\sqrt{1+\frac{1}{n}}$, so we require:
$$2\,t_{0.025;n-1} \times 8.4\sqrt{1+\frac{1}{n}} < 38 \;\Rightarrow\; t_{0.025;n-1}\sqrt{1+\frac{1}{n}} < 2.262$$
Using the values from page 163 of the Tables, we find that:
$$t_{0.025;11}\sqrt{1+\frac{1}{12}} = 2.201\sqrt{1+\frac{1}{12}} = 2.291$$
and:
$$t_{0.025;12}\sqrt{1+\frac{1}{13}} = 2.179\sqrt{1+\frac{1}{13}} = 2.261$$
Therefore we need a sample of at least 13 individuals.
(ii)(b)
Sample size needed (known variance)
Using the result $\frac{X_{n+1}-\bar{X}}{\sigma\sqrt{1+\frac{1}{n}}} \sim N(0,1)$ gives a 95% confidence interval of:
$$\bar{x} \pm 1.96\,\sigma\sqrt{1+\frac{1}{n}}$$
The width of this confidence interval is $2 \times 1.96\,\sigma\sqrt{1+\frac{1}{n}}$. So we require:
$$2 \times 1.96 \times 8.4\sqrt{1+\frac{1}{n}} < 38 \;\Rightarrow\; \sqrt{1+\frac{1}{n}} < 1.1540 \;\Rightarrow\; \frac{1}{n} < 0.33179 \;\Rightarrow\; n > 3.01$$
Therefore we need a sample size of at least 4 individuals.
(ii)(c)
Comment
For the confidence intervals the sample sizes are similar, but larger in the case where less information is known. In general, prediction intervals are wider than confidence intervals and so a larger sample is needed to get the same width. However, in this case, the prediction intervals vary due to the vast difference in the tails of the t distribution.
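The two known-variance sample sizes in Solution 9.6 can be reproduced with a minimal Python sketch. The function names are illustrative assumptions, and only the normal-based cases are covered because the Python standard library has no t distribution percentage points; the t-based cases still need the Tables.

```python
from math import floor
from statistics import NormalDist

# Upper 2.5% point of N(0,1), about 1.96
Z = NormalDist().inv_cdf(0.975)


def n_for_confidence_width(sigma, max_width):
    """Smallest n with 2 * z * sigma / sqrt(n) < max_width."""
    bound = (2 * Z * sigma / max_width) ** 2  # we need n > bound
    return floor(bound) + 1


def n_for_prediction_width(sigma, max_width):
    """Smallest n with 2 * z * sigma * sqrt(1 + 1/n) < max_width."""
    ratio = max_width / (2 * Z * sigma)  # we need sqrt(1 + 1/n) < ratio
    if ratio <= 1:
        raise ValueError("no sample size achieves this width")
    bound = 1 / (ratio ** 2 - 1)  # we need n > bound
    return floor(bound) + 1


n_conf = n_for_confidence_width(8.4, 10)
n_pred = n_for_prediction_width(8.4, 38)
print(n_conf, n_pred)
```

The `ValueError` branch reflects the fact that a prediction interval can never be narrower than $2 \times 1.96\,\sigma$, however large the sample.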
9.7
(i)
Exact confidence interval
We require:
$$P(X \ge 2) = 0.05 \text{ under } Poi(\lambda_1)$$
$$P(X \le 2) = 0.05 \text{ under } Poi(\lambda_2)$$
From the first equation:
$$0.95 = P(X=0) + P(X=1) = e^{-\lambda_1} + \lambda_1 e^{-\lambda_1}$$
Solving this numerically we obtain $\lambda_1 = 0.36$.
From the second equation:
$$0.05 = P(X=0) + P(X=1) + P(X=2) = e^{-\lambda_2} + \lambda_2 e^{-\lambda_2} + \frac{\lambda_2^2}{2}e^{-\lambda_2}$$
Solving this numerically we obtain $\lambda_2 = 6.3$.
So the confidence interval is (0.36, 6.3).
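The phrase "solving this numerically" in Solution 9.7(i) can be made concrete with a bisection search. This is only a sketch of one possible numerical method (the solution does not say which was used), and the helper names `poisson_cdf` and `solve_decreasing` are illustrative.

```python
from math import exp, factorial


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(exp(-lam) * lam ** j / factorial(j) for j in range(k + 1))


def solve_decreasing(f, target, lo, hi, tol=1e-10):
    """Find x in [lo, hi] with f(x) = target, where f is decreasing in x."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too big, so the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


# P(X >= 2) = 0.05 is the same as P(X <= 1) = 0.95
lam1 = solve_decreasing(lambda lam: poisson_cdf(1, lam), 0.95, 0.0, 20.0)
# P(X <= 2) = 0.05
lam2 = solve_decreasing(lambda lam: poisson_cdf(2, lam), 0.05, 0.0, 20.0)
print(round(lam1, 2), round(lam2, 1))
```

Bisection works here because the Poisson distribution function is a decreasing function of the parameter for a fixed value of k.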
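(ii)
Approximate confidence interval
Since $n$ is large enough to use a normal approximation, the pivotal quantity is:
$$\frac{\sum X - n\lambda}{\sqrt{n\lambda}} \sim N(0,1) \quad\text{or}\quad \frac{\bar{X} - \lambda}{\sqrt{\lambda/n}} \sim N(0,1) \quad\text{(approximately)}$$
where $\hat\lambda = \bar{X}$.
Hence, estimating $\lambda$ by $\hat\lambda$ under the square root, a 90% confidence interval can be obtained for $\lambda$ from:
$$\frac{\sum X \pm 1.6449\sqrt{n\hat\lambda}}{n} \quad\text{or}\quad \bar{X} \pm 1.6449\sqrt{\frac{\hat\lambda}{n}}$$
Replacing $n$ by 30, $\sum X$ by 60 and $\hat\lambda$ by 2 gives:
$$\frac{60 \pm 1.6449\sqrt{30 \times 2}}{30} \quad\text{or}\quad 2 \pm 1.6449\sqrt{\frac{2}{30}}$$
So a 90% confidence interval is:
(1.58, 2.42)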
9.8
The confidence interval is based on the distributional result:
$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$
We have:
$$\bar{x} = \frac{104.2}{7} = 14.88571 \qquad s^2 = \frac{1}{6}\left\{1{,}581.98 - 7 \times 14.88571^2\right\} = 5.148$$
So a 95% one-sided confidence interval for the variance is given by:
$$\left(\frac{6 \times 5.148}{\chi^2_{0.05;6}}, \infty\right) = \left(\frac{30.888}{12.59}, \infty\right) = (2.45, \infty)$$
So a 95% one-sided confidence interval for the standard deviation is $(1.57, \infty)$.
9.9
The formulae for the mean and variance of a lognormal distribution are:
$$E[X] = e^{\mu + \frac{1}{2}\sigma^2} \quad\text{and}\quad var(X) = e^{2\mu+\sigma^2}\left(e^{\sigma^2}-1\right)$$
Since the standard deviation equals 1.8 times the mean, we know that:
$$e^{\mu+\frac{1}{2}\sigma^2}\left(e^{\sigma^2}-1\right)^{1/2} = 1.8\,e^{\mu+\frac{1}{2}\sigma^2}$$ [1]
So:
$$\left(e^{\sigma^2}-1\right)^{1/2} = 1.8 \;\Rightarrow\; \sigma^2 = 1.4446$$ [1]
The 95% confidence interval for the mean corresponds to the inequality:
$$4{,}250 < e^{\mu+\frac{1}{2}\sigma^2} < 4{,}750$$
Solving for $\mu$ gives:
$$\log 4{,}250 - \tfrac{1}{2}\sigma^2 < \mu < \log 4{,}750 - \tfrac{1}{2}\sigma^2$$
Using the value found for $\sigma^2$, this is:
$$7.632 < \mu < 7.744$$ [1]
So the lower limit of the confidence interval for $\mu$ is 7.632 and the upper limit is 7.744.
9.10
(i)(a)
Mean difference confidence interval
Using the subscript 1 to refer to "without screening", and 2 to refer to "with screening", the pivotal quantity is:
$$\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_P\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}} \sim t_{n_1+n_2-2}$$ [1]
Calculating the required values:
$$\bar{x}_1 = \frac{257.4}{10} = 25.74 \qquad \bar{x}_2 = \frac{200.7}{10} = 20.07$$ [1]
$$s_1^2 = \frac{1}{9}\left\{6{,}951.16 - 10 \times 25.74^2\right\} = 36.1871$$ [1/2]
$$s_2^2 = \frac{1}{9}\left\{4{,}553.97 - 10 \times 20.07^2\right\} = 58.4357$$ [1/2]
The pooled sample variance is given by:
$$s_P^2 = \frac{1}{18}(9 \times 36.1871 + 9 \times 58.4357) = 47.3114$$ [1]
Hence, a 95% confidence interval is given by:
$$(25.74 - 20.07) \pm 2.101\sqrt{47.3114 \times \frac{2}{10}} = (-0.793, 12.1)$$ [1]
Alternatively, the confidence interval for $\mu_2 - \mu_1$ is (-12.1, 0.793).
(i)(b)
Comment
Since the confidence interval contains the value 0, there is insufficient evidence to conclude that the new screening programme significantly reduces the mean claim amount. [1]
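The pooled-variance interval in part (i) of Solution 9.10 can be checked from the summary statistics with a short Python sketch. The function name is illustrative, and the percentage point $t_{0.025;18} = 2.101$ is passed in as an argument because it comes from the Tables rather than from the standard library.

```python
from math import sqrt


def pooled_t_interval(n1, sum1, sumsq1, n2, sum2, sumsq2, t_crit):
    """Two-sample confidence interval for mu1 - mu2, assuming equal variances."""
    mean1, mean2 = sum1 / n1, sum2 / n2
    # Sample variances from the sums and sums of squares
    var1 = (sumsq1 - n1 * mean1 ** 2) / (n1 - 1)
    var2 = (sumsq2 - n2 * mean2 ** 2) / (n2 - 1)
    pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    half_width = t_crit * sqrt(pooled * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff - half_width, diff + half_width, pooled


lower, upper, pooled_var = pooled_t_interval(
    10, 257.4, 6951.16, 10, 200.7, 4553.97, t_crit=2.101
)
print(round(pooled_var, 4), round(lower, 3), round(upper, 1))
```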
(ii)(a)
Ratio of variances confidence interval
The pivotal quantity is:
$$\frac{S_1^2/S_2^2}{\sigma_1^2/\sigma_2^2} \sim F_{n_1-1,\,n_2-1}$$ [1]
Hence, a 95% confidence interval is given by:
$$\frac{S_1^2/S_2^2}{F_{0.025;\,n_1-1,\,n_2-1}} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{S_1^2/S_2^2}{F_{0.975;\,n_1-1,\,n_2-1}}$$
Replacing $S_1^2$ by 36.1871 and $S_2^2$ by 58.4357, we obtain:
$$\frac{0.6193}{4.026} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{0.6193}{1/4.026}$$ [2]
So the confidence interval is (0.154, 2.49).
Alternatively, the confidence interval for $\sigma_2^2/\sigma_1^2$ is (0.401, 6.50).
(ii)(b)
Comment
Since the confidence interval contains 1, this means that we are reasonably confident that the population variances are the same. [1]
(iii)
Sample size
The width of the confidence interval is:
$$2\,t_{2.5\%;2n-2}\sqrt{\frac{36.1871(n-1) + 58.4357(n-1)}{2n-2} \times \frac{2}{n}} = \frac{19.455\,t_{2.5\%;2n-2}}{\sqrt{n}}$$ [1]
This must be less than 10, so using the percentage points of the t distribution from page 163 of the Tables, we see that:
$$n = 15 \;\Rightarrow\; \frac{19.455\,t_{0.025,2n-2}}{\sqrt{n}} = \frac{19.455 \times 2.048}{\sqrt{15}} = 10.3 > 10$$
and:
$$n = 16 \;\Rightarrow\; \frac{19.455 \times 2.042}{\sqrt{16}} = 9.93 < 10$$
The minimum sample size is 16. [2]
End of Part 2
What next?
1. Briefly review the key areas of Part 2 and/or re-read the summaries at the end of Chapters 6 to 9.
2. Ensure you have attempted some of the Practice Questions at the end of each chapter in Part 2. If you don't have time to do them all, you could save the remainder for use as part of your revision.
3. Attempt Assignment X2.
4. Work through the Chapter 6 to 9 material (Central Limit Theorem, sampling distributions, estimation and confidence intervals) of the Paper B Online Resources (PBOR).
5. Attempt Assignment Y1.
Time to consider...
... revision products
Flashcards
These are available in both paper and eBook format. One student said: "The paper-based Flashcards are brilliant."
You can find lots more information, including samples, on our website at www.ActEd.co.uk.
Buy online at www.ActEd.co.uk/estore
CS1-10: Hypothesis testing
Syllabus objectives
3.3 Hypothesis testing and goodness of fit
3.3.1 Explain what is meant by the following terms: null and alternative hypotheses, simple and composite hypotheses, type I and type II errors, sensitivity, specificity, test statistic, likelihood ratio, critical region, level of significance, probability-value and power of a test.
3.3.2 Apply basic tests for the one-sample and two-sample situations involving the normal, binomial and Poisson distributions, and apply basic tests for paired data.
3.3.3 Apply the permutation approach to non-parametric hypothesis tests.
3.3.4 Use a chi-square test to test the hypothesis that a random sample is from a particular distribution, including cases where parameters are unknown.
3.3.5 Explain what is meant by a contingency (or two-way) table, and use a chi-square test to test the independence of two classification criteria.
0 Introduction
In many research areas, such as medicine, education, advertising and insurance, it is necessary to carry out statistical tests. These tests enable researchers to use the results of their experiments to answer questions such as:
• Is Drug A a more effective treatment for AIDS than Drug B?
• Does Training programme T lead to improved staff efficiency?
• Are the severities of large individual private motor insurance claims consistent with a lognormal distribution?
A hypothesis is where we make a statement about something, for example the mean lifetime of smokers is less than that of non-smokers. A hypothesis test is where we collect a representative sample and examine it to see if our hypothesis holds true.
Hypothesis tests are closely linked to the confidence intervals we developed in Chapter 9. For example, when we were sampling from a $N(\mu, \sigma^2)$ distribution ($\sigma^2$ known) we used:
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \;\Rightarrow\; Z = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim N(0,1)$$
By substituting in $\bar{X}$, $\sigma$ and $n$, we found the values of $\mu$ that corresponded to 95% of the data being in the centre.
For hypothesis tests, we now assume a value of $\mu$ based on our hypothesis and can calculate a probability value for the test assuming our initial value of $\mu$ is correct. If we find that our sample mean is unlikely to occur given our hypothesised value of $\mu$, we naturally conclude that it is likely that our sample does not come from this distribution with the assumed value of $\mu$. In this case we would reject the null hypothesis. If, however, our sample mean is not very extreme, it would be fair to say that it probably does have the assumed value of $\mu$. In this case we would not reject the null hypothesis.
[Figure: normal density centred on the assumed value, with a 2½% tail beyond each of $z_1$ and $z_2$ labelled "extreme values: reject null hypothesis".]
Most of the formulae used in this chapter are identical to those in Chapter 9. The only exceptions are for the binomial and Poisson distributions.
Finally, we can develop our estimation work from Chapter 8. For example, suppose we have recorded the following numbers of claims from a certain portfolio over the last 100 months:

Claims      0   1   2   3   4   5   6
Frequency   9  22  26  21  13   6   3

Assuming a Poisson distribution with parameter $\lambda$, the estimate using the methods given in Chapter 8 would be $\hat\lambda = \bar{X} = 2.37$. We obtained a confidence interval for the mean in Chapter 9.
But all of this work is appropriate only if the distribution is Poisson. We will see in this chapter how to carry out a test of whether our sample does or does not conform to this distribution.
The material in this chapter has traditionally been examined in one of the longer questions of the Statistics exam. Spend your time wisely.
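As a quick check of the Poisson estimate quoted for the claims table above, the sample mean can be computed from the frequency table. This is a minimal Python sketch; the variable names are illustrative.

```python
# Claim counts and the number of months in which each count was observed
claims = [0, 1, 2, 3, 4, 5, 6]
frequency = [9, 22, 26, 21, 13, 6, 3]

months = sum(frequency)
total_claims = sum(c * f for c, f in zip(claims, frequency))

# For a Poisson sample the estimate of the parameter is the sample mean
lambda_hat = total_claims / months
print(months, total_claims, lambda_hat)
```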
1 Hypotheses, test statistics, decisions and errors
1.1 The testing procedure
The standard approach to carrying out a statistical test involves the following steps:
• specify the hypothesis to be tested
• select a suitable statistical model
• design and carry out an experiment/study
• calculate a test statistic
• calculate the probability value, or decide whether the value of the test statistic lies within the rejection region
• determine the conclusion of the test.
We will not be concerned here with the design of the experiment. We will assume that an experiment, based on an appropriate statistical model, has already been conducted and the results are available.
1.2 Hypotheses
In Sections 1-6 of this chapter a hypothesis is a statement about the value of an unknown parameter in the model.
The basic hypothesis being tested is the null hypothesis, denoted $H_0$. It can sometimes be regarded as representing the current state of knowledge or belief about the value of the parameter being tested (the "status quo" hypothesis). In many situations a difference between two populations is being tested and the null hypothesis is that there is no difference.
In a test, the null hypothesis is contrasted with the alternative hypothesis, denoted $H_1$.
Where a hypothesis completely specifies the distribution, it is called a simple hypothesis. Otherwise it is called a composite hypothesis.
For example, when testing the null hypothesis $H_0: \theta = 0.8$ against the alternative hypothesis $H_1: \theta = 0.6$, both of the hypotheses are simple. However when testing $H_0: \theta = 0.8$ against $H_1: \theta < 0.8$, $H_1$ is a composite hypothesis.
A test is a rule which divides the sample space (the set of possible values of the data) into two subsets, a region in which the data are judged to be consistent with $H_0$, and its complement, in which the data are judged to be inconsistent with $H_0$. The tests discussed here are designed to answer the question "Do the data provide sufficient evidence to justify our rejecting $H_0$?".
1.3 One-sided and two-sided tests
In a test of whether smoking reduces life expectancies, the hypotheses are:
$H_0$: smoking makes no difference to life expectancy
$H_1$: smoking reduces life expectancy
This is an example of a one-sided test, since we are only considering the possibility of a reduction in life expectancy, ie a change in one direction.
However we could have specified the hypotheses as follows:
$H_0$: smoking makes no difference to life expectancy
$H_1$: smoking affects life expectancy
This is a two-sided test since the alternative hypothesis considers the possibility of a change in either direction, ie an increase or a decrease.
1.4 Test statistics
The actual decision is based on the value of a suitable function of the data, the test statistic. The set of possible values of the test statistic itself divides into two subsets, a region in which the value of the test statistic is judged consistent with $H_0$, and its complement, the critical region (or rejection region), in which the value of the test statistic is judged inconsistent with $H_0$. If the test statistic has a value in the critical region, $H_0$ is rejected. The test statistic (like any statistic) must be such that its distribution is completely specified when the value of the parameter itself is specified (and in particular under $H_0$, ie when $H_0$ is true).
In exam questions the test statistic is generally calculated from data given in the question. For details of how to reach a conclusion in practice, see Section 3.1.
1.5 Errors
It is rare for data to enable discrimination with certainty between the two hypotheses. The result of performing a test may be the correct decision, but two kinds of error could arise:
• Type I error: reject $H_0$ when it is true; and
• Type II error: fail to reject $H_0$ when it is false.
The level of significance of the test, denoted $\alpha$, is the probability of committing a Type I error, ie it is the probability of rejecting $H_0$ when it is in fact true. The probability of committing a Type II error, denoted $\beta$, is the probability of accepting $H_0$ when it is false.
An ideal test would be one which simultaneously minimises $\alpha$ and $\beta$. This ideal however is not attainable in practice.
Question
A random variable $X$ is believed to follow an $Exp(\lambda)$ distribution. In order to test the null hypothesis $\mu = 20$ against the alternative hypothesis $\mu = 30$, where $\mu = 1/\lambda$, a single value is observed from the distribution. If this value is less than 28, $H_0$ is not rejected, otherwise $H_0$ is rejected.
Calculate the probabilities of:
(i) a Type I error
(ii) a Type II error.
Solution
(i) The probability of a Type I error is given by:
$$P(\text{reject } H_0 \text{ when } H_0 \text{ true}) = P(X > 28 \text{ when } X \sim Exp(1/20)) = 1 - F_X(28) = e^{-28/20} = 0.2466$$
The CDF of the exponential distribution is given on page 11 of the Tables.
(ii) The probability of a Type II error is given by:
$$P(\text{do not reject } H_0 \text{ when } H_0 \text{ false}) = P(X < 28 \text{ when } X \sim Exp(1/30)) = F_X(28) = 1 - e^{-28/30} = 0.6068$$
In this case we were forced to choose between $H_0: \mu = 20$ and $H_1: \mu = 30$. So saying that $H_0$ is false is the same as saying that $\mu = 30$.
Since we've only got one value in our sample here, not surprisingly, the probabilities of Type I and Type II errors are quite big.
The probability of a Type I error is also referred to as the "size" of the test, which will normally be a small number such as 0.05 (say).
The power of a test is the probability of rejecting $H_0$ when it is false, so that the power equals $1 - \beta$. In general, this will be a function of the unknown parameter value. For simple hypotheses the power is a single value, but for composite hypotheses it is a function being defined at all points in the alternative hypothesis.
A test with a high power is said to be "powerful" as it is very effective at demonstrating a positive result.
Question
Give an expression in terms of $\mu$ for the power of the test in the previous question. Comment on how the power is affected by the value of $\mu$.
Solution
The power is the probability of rejecting $H_0$ when the true value of the parameter is some value other than $\mu = 20$. In terms of $\mu$ this is:
$$P(X > 28 \mid X \sim Exp(1/\mu)) = 1 - F_X(28) = e^{-28/\mu}$$
If $\mu$ is large (1,000, say), then the power will be close to 1, since the test will reject $H_0: \mu = 20$ very easily. Conversely if $\mu$ is small (10, say), then the power will be close to 0, since the test will not reject $H_0: \mu = 20$ very easily.
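The error probabilities and the power function for this single-observation exponential test can be evaluated with a few lines of Python. This is a sketch only: the function name `power` and the constant `THRESHOLD` are illustrative, with the rejection rule "reject if the observation exceeds 28" taken from the question.

```python
from math import exp

THRESHOLD = 28.0  # reject H0 when the single observation exceeds this value


def power(mu, threshold=THRESHOLD):
    """P(X > threshold) when X is exponential with mean mu."""
    return exp(-threshold / mu)


alpha = power(20.0)     # Type I error: reject when the mean really is 20
beta = 1 - power(30.0)  # Type II error: fail to reject when the mean is 30

print(round(alpha, 4), round(beta, 4))
for mu in (10, 20, 30, 100, 1000):
    print(mu, round(power(mu), 4))
```

The loop illustrates the comment in the solution: the power rises towards 1 as the true mean increases and falls towards 0 as it decreases.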
Type I and II errors can also arise in the context of binary classification, a common situation in healthcare as well as in machine learning contexts. Here, rather than gathering a data sample consisting of multiple observations to assess whether a (population-level) hypothesis holds, a decision is required for each individual observation.
In a medical context, the classification is into "healthy" and "diseased" based on a binary test result. In these contexts:
• A Type I error, known as a false positive, occurs when a healthy individual receives a positive test result; and
• A Type II error, known as a false negative, occurs when a diseased individual tests negative for the disease.
The equivalent null hypothesis in this case is that the individual is healthy, and we are carrying out a test to ascertain whether this is the case. If the null hypothesis is true (ie the individual is actually healthy) but the test is positive (indicating that the individual has the disease), then we would be rejecting a true hypothesis and making a Type I error.
If the null hypothesis is false (ie the individual is sick) but the test is negative (indicating that the individual does not have the disease), then we would be failing to reject a false hypothesis and making a Type II error.
The table below shows all the possible outcomes from a medical test result:

                                    Test result predicts patient as having disease
                                    YES                        NO
Patient actually        YES         True positive (TP)         False negative (FN)
has disease                                                    Type II error
                        NO          False positive (FP)        True negative (TN)
                                    Type I error

The probability of a diseased individual testing positive for the disease (ie a true positive rate), is the sensitivity of the test:
$$\text{Sensitivity} = \frac{\text{Number of true positives}}{\text{Number of true positives} + \text{Number of false negatives}} = \frac{\text{Number of true positives}}{\text{Total number of people with the disease}}$$
$$= P(\text{positive test} \mid \text{individual has the disease}) = 1 - P(\text{negative test} \mid \text{individual has the disease})$$
$$= 1 - P(\text{Type II error}) = \text{Power of the test}$$
The probability of a healthy individual testing negative (ie a true negative rate), which is 1 minus the probability of a false positive, is called the specificity of the test.
$$\text{Specificity} = \frac{\text{Number of true negatives}}{\text{Number of true negatives} + \text{Number of false positives}} = \frac{\text{Number of true negatives}}{\text{Total number of people who do not have the disease}}$$
$$= P(\text{negative test} \mid \text{individual does not have the disease}) = 1 - P(\text{positive test} \mid \text{individual does not have the disease})$$
$$= 1 - P(\text{Type I error})$$
Question
A short screening test has just been developed for depression. An independent blind comparison was made with a gold-standard test for diagnosis of depression among 200 psychiatric outpatients.
Among the 50 outpatients found to be depressed according to the gold-standard test, 35 patients tested positive under the new short test. Among 150 patients found not to be depressed according to the gold-standard test, 30 patients tested positive under the new short test.
Calculate the sensitivity and specificity of the short screening test, assuming that the gold-standard test correctly classifies each individual.
Solution
$$\text{Sensitivity} = \frac{\text{Number of true positives}}{\text{Total number of people with depression}} = \frac{35}{50} = 70\%$$
$$\text{Specificity} = \frac{\text{Number of true negatives}}{\text{Total number of individuals without depression}} = \frac{150-30}{150} = 80\%$$
Examples of binary classifications in machine learning contexts include:
• classifying emails according to whether they are spam
• assessing whether claims received by an insurance company are fraudulent.
One method of making such predictions is to use a generalised linear model with a binomial distribution. We'll cover this in Chapter 13. Other methods are covered in Subject CS2.
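The sensitivity and specificity calculation for the depression screening question can be written directly in terms of the four cells of the outcome table. The following Python sketch uses illustrative function names; the counts are those given in the question.

```python
def sensitivity(true_pos, false_neg):
    """True positive rate: P(positive test | individual has the condition)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """True negative rate: P(negative test | individual does not have the condition)."""
    return true_neg / (true_neg + false_pos)


# 50 depressed outpatients, of whom 35 tested positive
tp, fn = 35, 50 - 35
# 150 non-depressed outpatients, of whom 30 tested positive
fp, tn = 30, 150 - 30

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
print(f"P(Type II error) = {1 - sens:.0%}, P(Type I error) = {1 - spec:.0%}")
```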
Although the contexts are different in important respects (eg hypothesis testing seeks to make inferences, classifiers seek to make predictions; the true state is usually known with certainty, at least for a training set, in classification problems), understanding the trade-offs of minimising Type I versus Type II errors plays an important role in test selection in both cases.
For example, in the case of using a smear test to identify cervical cancer, it is vital to have a test with a high sensitivity (currently it's 86%-100%), as cervical cancer is a serious but treatable condition if caught early.
However, smear tests have a much lower specificity (currently 30%-87%), which means that a high proportion of women with a positive cervical smear test who go on to have further investigation subsequently find that there is no cause for concern. This is considered a small price to pay compared to the alternative.
R can calculate the power of a one-sample t test (covered in Section 3.1) using the function:
power.t.test
2 Classical testing, significance and p-values
2.1 "Best" tests
The classical approach to finding a "good" test (called the Neyman-Pearson theory) fixes the value of $\alpha$, ie the level of significance required, and then tries to find such a test for which the other error probability, $\beta$, is as small as possible for every value of the parameter specified by the alternative hypothesis. This can also be described as finding the "most powerful" test.
The key result in the search for such a test is the Neyman-Pearson lemma, which provides the "best" test (smallest $\beta$) in the case of two simple hypotheses. For a given level, the critical region (and in fact the test statistic) for the best test is determined by setting an upper bound on the likelihood ratio $L_0/L_1$, where $L_0$ and $L_1$ are the likelihood functions of the data under $H_0$ and $H_1$ respectively.
The Neyman-Pearson lemma
Formally, if $C$ is a critical region of size $\alpha$ and there exists a constant $k$ such that $L_0/L_1 \le k$ inside $C$ and $L_0/L_1 \ge k$ outside $C$, then $C$ is a most powerful critical region of size $\alpha$ for testing the simple hypothesis $\theta = \theta_0$ against the simple alternative hypothesis $\theta = \theta_1$.
So a Neyman-Pearson test rejects $H_0$ if:
$$\frac{\text{Likelihood under } H_0}{\text{Likelihood under } H_1} < \text{critical value}$$
Question
A random variable $X$ is believed to follow an $Exp(\lambda)$ distribution. In order to test the null hypothesis $\mu = 20$ against the alternative hypothesis $\mu = 30$, where $\mu = 1/\lambda$, a single value is observed from the distribution. If this value is less than 28, $H_0$ is not rejected, otherwise $H_0$ is rejected.
Show that this is a Neyman-Pearson test.
Solution
Given a single value from an exponential distribution, the Neyman-Pearson criterion is "reject $H_0$ if $L_0/L_1 <$ critical value". Using the null and alternative hypotheses, the test becomes:
$$\frac{\frac{1}{20}e^{-x/20}}{\frac{1}{30}e^{-x/30}} < \text{constant}$$
This reduces to $e^{-x/60} <$ constant, or $x >$ constant. This was exactly the form of the test that we used (we rejected $H_0$ when $x > 28$). So this is a Neyman-Pearson test.
Common tests are often such that the null hypothesis is simple, eg $H_0: \theta = \theta_0$, against a composite alternative, eg $H_1: \theta \ne \theta_0$, which is two-sided, and $H_1: \theta > \theta_0$ or $H_1: \theta < \theta_0$, which are one-sided.
Here it is only in certain special cases (usually one-sided cases) that a single test is available which is best (ie uniformly most powerful) for all parameter values. In cases where a single best test in the sense of the Neyman-Pearson Lemma is unavailable, another approach is used to derive sensible tests. This approach, which is a generalisation of the lemma, produces tests which are referred to as likelihood ratio tests.
Likelihood ratio tests
The critical region (and test statistic) for the test are determined by setting an upper bound on the ratio $\max L_0 / \max L$, where $\max L_0$ is the maximum value of the likelihood $L$ under the restrictions imposed by the null hypothesis, and $\max L$ is the overall maximum value of $L$ for all allowable values of all parameters involved. Likelihood ratio tests are used, for example, in survival models with covariates (see Subject CS2).
In the most common case when $H_0$ and $H_1$ together cover all possible values for the parameters, this generalised test rejects $H_0$ if:
$$\frac{\max(\text{Likelihood under } H_0)}{\max(\text{Likelihood under } H_0 + H_1)} < \text{critical value}$$
Important results include the case of sampling from a $N(\mu, \sigma^2)$ distribution. The method leads to the test statistic:
$$\frac{\bar{X} - \mu_0}{S/\sqrt{n}} \sim t_{n-1} \text{ under } H_0: \mu = \mu_0$$
for tests on the value of the mean $\mu$.
We're assuming here that $\sigma^2$ is unknown. If it is known, then the z-test is the best test.
The method also leads to the test statistic:
$$\frac{(n-1)S^2}{\sigma_0^2} \sim \chi^2_{n-1} \text{ under } H_0: \sigma^2 = \sigma_0^2$$
for tests on the value of the variance $\sigma^2$.
2.2 p-values
Under the "classical" Neyman-Pearson approach, with a fixed predetermined value of $\alpha$, a test will produce a decision as to whether to reject $H_0$. But merely comparing the observed test statistic with some critical value and concluding eg "using a 5% test, reject $H_0$" or "reject $H_0$ with significance level 5%" or "result significant at 5%" (all equivalent statements) does not provide the recipient of the results with clear detailed information on the strength of the evidence against $H_0$.
A more informative approach is to calculate and quote the probability value (p-value) of the observed test statistic. This is the observed significance level of the test statistic: the probability, assuming $H_0$ is true, of observing a test statistic at least as extreme (inconsistent with $H_0$) as the value observed.
The p-value is the lowest level at which $H_0$ can be rejected.
The smaller the p-value, the stronger is the evidence against the null hypothesis.
For example, when testing $H_0: \theta = 0.5$ vs $H_1: \theta = 0.4$, where $\theta$ is the probability of a coin coming up heads, and 82 heads have been observed in 200 tosses, the p-value of the result is:
$$P(X \le 82) \text{ where } X \sim Bin(200, 0.5)$$
$$\approx P\left(Z < \frac{82.5 - 100}{\sqrt{50}}\right) = P(Z < -2.475) = 0.0067$$
A result this extreme is therefore very unlikely under $H_0$ (probability < 0.01) and there is very strong evidence against $H_0$ and in favour of $H_1$. A good way of expressing the result is: "we have very strong evidence against the hypothesis that the coin is fair (p-value 0.007) and conclude that it is biased against heads".
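The coin example can be checked numerically. The sketch below, using only the Python standard library, computes the exact binomial probability alongside the continuity-corrected normal approximation used in the text; the variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p0, observed = 200, 0.5, 82

# Exact p-value: P(X <= 82) for X ~ Bin(200, 0.5)
p_exact = sum(
    comb(n, k) * p0 ** k * (1 - p0) ** (n - k) for k in range(observed + 1)
)

# Normal approximation with a continuity correction, as in the text
mean = n * p0
sd = sqrt(n * p0 * (1 - p0))
z = (observed + 0.5 - mean) / sd
p_normal = NormalDist().cdf(z)

print(round(z, 3), round(p_normal, 4), round(p_exact, 4))
```

Both versions give a p-value well below 0.01, so the strength of the conclusion does not depend on the approximation.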
Testing does not prove that any hypothesis is true or untrue. Failure to detect a departure from $H_0$ means that there is not enough evidence to justify rejecting $H_0$, so $H_0$ is accepted in this sense only, whilst realising that it may not be true. This attitude to the acceptance of $H_0$ is a feature of the fact that $H_0$ is usually a precise statement, which is almost certainly not exactly true.
Question
A random variable $X$ is believed to follow an $Exp(\lambda)$ distribution. In order to test the null hypothesis $\mu = 20$ against the alternative hypothesis $\mu = 30$, where $\mu = 1/\lambda$, a single value is observed from the distribution. If this value of $X$ is less than $k$, $H_0$ is not rejected, otherwise $H_0$ is rejected.
(i) Calculate the value of $k$ that gives a test of size 5%.
(ii) Determine the probability of a Type II error in this case.
Solution
(i) We want:
$$0.05 = \int_k^\infty \frac{1}{20}e^{-x/20}\,dx = \left[-e^{-x/20}\right]_k^\infty = e^{-k/20}$$
So:
$$k = -20\ln 0.05 = 59.9$$
(ii) The probability of a Type II error is:
$$\int_0^k \frac{1}{30}e^{-x/30}\,dx = \left[-e^{-x/30}\right]_0^k = 1 - e^{-1.997} = 0.864$$
A p-value of less than 5% is considered "significant", so that the null hypothesis is rejected. If an exam question does not state the level of the test, assume that it is 5%.
3 Basic tests - single samples
3.1 Testing the value of a population mean
Situation: random sample, size $n$, from $N(\mu, \sigma^2)$, with sample mean $\bar{X}$.
Testing: $H_0: \mu = \mu_0$
(a) $\sigma$ known: test statistic is $\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$, and $\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0,1)$ under $H_0$
(b) $\sigma$ unknown: test statistic is $\frac{\bar{X} - \mu_0}{S/\sqrt{n}} \sim t_{n-1}$ under $H_0$
For large samples, $N(0,1)$ can be used in place of $t_{n-1}$. Further, the Central Limit Theorem justifies the use of a normal approximation for the distribution of $\bar{X}$ in sampling from any reasonable population, and $S^2$ is a good estimate of $\sigma^2$, so the requirement that we are sampling from a normal distribution is not necessary in either case (a) or (b) when we have a large sample.
Question
The average IQ of a sample of 50 university students was found to be 105. Carry out a statistical test to conclude whether the average IQ of university students is greater than 100, assuming that IQs are normally distributed. It is known from previous studies that the standard deviation of IQs among students is approximately 20.
Solution
We are testing:
$H_0: \mu = 100$ vs $H_1: \mu > 100$
Under $H_0$:
$$\frac{\bar{X} - 100}{\sigma/\sqrt{n}} \sim N(0,1) \quad (\sigma \text{ known})$$
The test statistic is:
$$\frac{105 - 100}{20/\sqrt{50}} = 1.768$$
We need to draw a conclusion and there are two ways of doing this.
Method 1:
Calculate the probability of getting a result as extreme as the test statistic (ie the p-value). If $Z \sim N(0,1)$:
$$P(Z > 1.768) = 1 - 0.96147 = 0.03853$$
We are carrying out a 5% one-tailed test. The probability we have obtained is less than 5%, so we have sufficient evidence to reject $H_0$ at the 5% level. Therefore it is reasonable to conclude that the average IQ of university students is greater than 100.
Method 2:
From the Tables, $P(Z > 1.6449) = 0.05$, so 1.6449 is the critical value for a one-tailed 5% test. The test statistic of 1.768 exceeds this critical value, so we reach the same conclusion as we did for Method 1.
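Both methods for the IQ example can be sketched in base R. The variable names below are illustrative assumptions; only `pnorm` and `qnorm` are used.

```r
# One-sided z test for the IQ example (sigma assumed known)
xbar  <- 105
mu0   <- 100
sigma <- 20
n     <- 50

z <- (xbar - mu0) / (sigma / sqrt(n))

# Method 1: p-value, the upper-tail probability of the observed statistic
p_value <- pnorm(z, lower.tail = FALSE)

# Method 2: compare with the upper 5% point of N(0,1)
crit   <- qnorm(0.95)
reject <- z > crit

c(z = z, p_value = p_value, crit = crit, reject = reject)
```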
Question
Test using a 5% significance level whether the average IQ of university students is greater than 103, based on the sample in the previous question.
Solution
We are testing:
$H_0: \mu = 103$ vs $H_1: \mu > 103$
Under $H_0$:
$$\frac{\bar{X} - 103}{\sigma/\sqrt{n}} \sim N(0,1)$$
The observed value of the test statistic is:
$$\frac{105 - 103}{20/\sqrt{50}} = 0.707$$
This is less than 1.6449 (the upper 5% point of a $N(0,1)$ distribution) so we have insufficient evidence to reject $H_0$ at the 5% level. Therefore it is reasonable to conclude that the average IQ of university students is not more than 103.
Alternatively, using probability values, we have $P(Z > 0.707) = 0.24$. This is greater than 0.05, so we have insufficient evidence to reject $H_0$ at the 5% level.
Question
The annual rainfall in centimetres at a certain weather station over the last ten years has been as follows:
17.2  28.1  25.3  26.2  30.7  19.2  23.4  27.5  29.5  31.6
Scientists at the weather station wish to test whether the average annual rainfall has increased from its former long-term value of 22 cm. Test this hypothesis at the 5% level, stating any assumptions that you make.
Solution
We are testing:
$H_0: \mu = 22$ vs $H_1: \mu > 22$
Assuming that annual rainfall measurements are independent and normally distributed, then under $H_0$:
$$\frac{\bar{X} - 22}{S/\sqrt{n}} \sim t_{n-1}$$
We have:
$$s^2 = \frac{1}{9}\left(6{,}895.73 - 10 \times 25.87^2\right) = 22.57$$
So the observed value of the test statistic is:
$$\frac{25.87 - 22}{\sqrt{22.57/10}} = 2.576$$
Since this is greater than 1.833 (the upper 5% point of the $t_9$ distribution), we have sufficient evidence to reject $H_0$ at the 5% level. Therefore it is reasonable to conclude that the long-term average annual rainfall has increased from its former level.
Alternatively, using probability values, we have $P(t_9 > 2.576) \approx 0.0166$. This is less than 0.05, so we have sufficient evidence to reject $H_0$ at the 5% level.
R can carry out a hypothesis test for the mean with unknown variance at the 5% level using:
t.test(<sample data>, conf=0.95)
For small samples from a non-normal distribution then an empirical distribution of the statistic can be constructed in R using the bootstrap method (see Chapter 8, Section 7), from which we can calculate the critical value(s) and obtain an estimate of the p-value.
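As an illustration, the rainfall test can be run with `t.test`. This is a sketch under the same normality assumption as the solution; the vector name `rain` is chosen here, and `mu` and `alternative` are standard `t.test` arguments for the hypothesised mean and the direction of the test.

```r
# Rainfall example: H0: mu = 22 vs H1: mu > 22
rain <- c(17.2, 28.1, 25.3, 26.2, 30.7, 19.2, 23.4, 27.5, 29.5, 31.6)

res <- t.test(rain, mu = 22, alternative = "greater")

res$statistic   # observed t statistic
res$parameter   # degrees of freedom (n - 1)
res$p.value     # one-sided p-value
```

The p-value returned by R is computed exactly, so it can differ slightly from a value obtained by interpolating in the Tables.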
3.2 Testing the value of a population variance
Situation: random sample, size $n$, from $N(\mu, \sigma^2)$, sample variance $S^2$.
Testing: $H_0: \sigma^2 = \sigma_0^2$
Test statistic is
$$\frac{(n-1)S^2}{\sigma_0^2} \sim \chi^2_{n-1} \text{ under } H_0$$
For large samples, the test works well even if the population is not normally distributed.
Question
Carry out a statistical test to assess whether the standard deviation of the heights of 10-year-old children is equal to 3cm, based on the random sample of 5 heights in cm given below. Assume that heights are normally distributed.
124, 122, 130, 125, 132
Solution
We are testing:
$H_0: \sigma = 3$ vs $H_1: \sigma \ne 3$
Under $H_0$:
$$\frac{4S^2}{3^2} \sim \chi^2_4$$
We have:
$$s^2 = \frac{1}{4}\left(80{,}209 - 5 \times 126.6^2\right) = 17.8$$
So the observed value of the test statistic is:
$$\frac{4 \times 17.8}{3^2} = 7.91$$
Our statistic of 7.91 lies between 0.4844 and 11.14 (the lower and upper 2½% points of the $\chi^2_4$ distribution). So we have insufficient evidence to reject $H_0$ at the 5% level. Therefore it is reasonable to conclude that the standard deviation of the heights of 10-year-old children is 3cm.
Alternatively, we have $P(\chi^2_4 > 7.91) = 0.0952$. Since this test is two-sided, the probability of obtaining a value at least as extreme as that actually obtained is $2 \times 0.0952 = 0.190$, which is greater than 0.05. So we have insufficient evidence to reject $H_0$ at the 5% level.
Question
The annual rainfall in centimetres at a certain weather station over the last ten years has been as follows:
17.2  28.1  25.3  26.2  30.7  19.2  23.4  27.5  29.5  31.6
Assuming these data values are taken from a normal distribution, test at the 5% level whether the standard deviation of the annual rainfall at the weather station is equal to 4 cm.
Solution
We are testing:
$H_0: \sigma = 4$ vs $H_1: \sigma \ne 4$
The test is two-sided. Assuming independence and normality, then under $H_0$:
$$\frac{9S^2}{4^2} \sim \chi^2_9$$
Using the sample variance calculated earlier, the observed value of the test statistic is:
$$\frac{9 \times 22.57}{16} = 12.69$$
This is between the lower and upper 2½% points of $\chi^2_9$ (2.700 and 19.02), so we have insufficient evidence to reject $H_0$ at the 5% level. Therefore it is reasonable to conclude that the standard deviation of the rainfall is 4 cm.
Alternatively, using probability values, we have $P(\chi^2_9 > 12.69) = 0.1775$. Since this test is two-sided, the probability of obtaining a value at least as extreme as that actually obtained is $2 \times 0.1775 = 0.355$. This is greater than 0.05, so we have insufficient evidence to reject $H_0$ at the 5% level.
There is no built-in function to carry out a hypothesis test for the variance in R. We can use R to calculate the value of the statistic from scratch or use a bootstrap method if the assumptions are not met.
For example, if we are unsure whether the sample comes from a normal distribution, a bootstrap method would be more appropriate here.
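Calculating the variance test "from scratch" might look like the sketch below, which uses the rainfall data and assumes normality as in the solution. The names `stat`, `crit` and `p_two` are illustrative; only base R functions (`var`, `qchisq`, `pchisq`) are used.

```r
# Two-sided chi-square test of H0: sigma = 4 for the rainfall data
rain   <- c(17.2, 28.1, 25.3, 26.2, 30.7, 19.2, 23.4, 27.5, 29.5, 31.6)
n      <- length(rain)
sigma0 <- 4

# Test statistic (n - 1) S^2 / sigma0^2, chi-square with n - 1 df under H0
stat <- (n - 1) * var(rain) / sigma0^2

# Lower and upper 2.5% points of chi-square with 9 df
crit <- qchisq(c(0.025, 0.975), df = n - 1)

# Two-sided p-value: double the smaller tail probability
p_upper <- pchisq(stat, df = n - 1, lower.tail = FALSE)
p_two   <- 2 * min(p_upper, 1 - p_upper)

list(stat = stat, crit = crit, p_two = p_two)
```

Because R works with the unrounded sample variance, the statistic can differ from the hand calculation in the last decimal place.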
3.3 Testing the value of a population proportion
Situation: $n$ binomial trials with $P(\text{success}) = p$; we observe $x$ successes.
Testing: $H_0: p = p_0$.
Test statistic is $X \sim Bin(n, p_0)$ under $H_0$.
For large $n$, use the normal approximation to the binomial (with continuity correction), ie use:
$$\frac{\dfrac{X \pm \frac{1}{2}}{n} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}} \sim N(0,1)$$
or:
$$\frac{X \pm \frac{1}{2} - np_0}{\sqrt{np_0(1 - p_0)}} \sim N(0,1)$$
When carrying out tests of this type we can work out whether we need to add or subtract the ½ in the continuity correction if we remember that we always adjust the value of $X$ towards the mean of the distribution under $H_0$. For large values of $n$, this will make little difference unless the test statistic is close to the critical value.
Question
In a one-year mortality investigation, 45 of the 250 ninety-year-olds present at the start of the investigation died before the end of the year. Assuming that the number of deaths has a $Bin(250, q)$ distribution, test whether this result is consistent with a mortality rate of $q = 0.2$ for this age.
Solution
We are testing:
$H_0: q = 0.2$ vs $H_1: q \ne 0.2$
Under $H_0$:
$$\frac{X/n - 0.2}{\sqrt{0.2 \times 0.8 / n}} \sim N(0,1) \text{ approximately}$$
Using the observed values, $n = 250$ and $x = 45$, the test statistic with continuity correction is:
$$\frac{45.5/250 - 0.2}{\sqrt{0.2 \times 0.8 / 250}} = -0.712$$
Since the mean is $np = 250 \times 0.2 = 50$, the continuity correction involves adjusting 45 towards the mean. So we have to add 0.5.
Our statistic of $-0.712$ lies between $\pm 1.960$ (the lower and upper 2½% points of the $N(0,1)$ distribution). So we have insufficient evidence to reject $H_0$ at the 5% level. Therefore it is reasonable to conclude that the true mortality rate for this age is 0.2.
Alternatively, using probability values, we have $P(Z < -0.712) = 0.238$. Since this test is two-sided, the probability of obtaining a value at least as extreme as the one actually obtained is $2 \times 0.238 = 0.48$. This is greater than 0.05, so we have insufficient evidence to reject $H_0$ at the 5% level.
Question
A new gene has been identified that makes carriers of it particularly susceptible to a particular degenerative disease. In a random sample of 250 adult males born in the UK, 8 were found to be carriers of the gene. Test whether the proportion of adult males born in the UK carrying the gene is less than 10%.
Solution
We are testing:
$H_0: p = 0.1$ vs $H_1: p < 0.1$
Under $H_0$:
$$\frac{X/n - 0.1}{\sqrt{0.1 \times 0.9 / n}} \sim N(0,1)$$
The observed value of the test statistic, with continuity correction adjusted towards the mean, is:
$$\frac{8.5/250 - 0.1}{\sqrt{0.1 \times 0.9 / 250}} = -3.479$$
We are carrying out a one-sided test. The value of the test statistic is less than $-1.6449$ (the lower 5% point of the $N(0,1)$ distribution) so we have sufficient evidence to reject $H_0$ at the 5% level. Therefore it is reasonable to conclude that the proportion of male carriers in the population is less than 10%.
Alternatively, using probability values, we have $P(Z < -3.479) \approx 0.00025$. This is less than 0.05, so we have sufficient evidence to reject $H_0$ at the 5% level. In fact, we have sufficient evidence to reject $H_0$ at even the 0.05% level.
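The gene-carrier test can be checked in R, both with the normal approximation used in the solution and with an exact binomial test. This is a sketch only; the variable names are illustrative, and `p` and `alternative` are standard `binom.test` arguments for the hypothesised proportion and the direction of the test.

```r
# Proportion example: 8 carriers out of 250, H0: p = 0.1 vs H1: p < 0.1
n  <- 250
x  <- 8
p0 <- 0.1

# Normal approximation; x is adjusted towards the mean np0 = 25, so add 0.5
z        <- ((x + 0.5) / n - p0) / sqrt(p0 * (1 - p0) / n)
p_approx <- pnorm(z)

# Exact one-sided binomial test
exact <- binom.test(x, n, p = p0, alternative = "less")

c(z = z, p_approx = p_approx, p_exact = exact$p.value)
```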
R can carry out an exact hypothesis test for $p$ at the 5% level using:
binom.test(x,n, conf=0.95)
3.4 Testing the value of the mean of a Poisson distribution
Situation: random sample, size $n$, from $Poi(\lambda)$ distribution.
Testing: $H_0: \lambda = \lambda_0$
Test statistic is sample sum $\sum X_i \sim Poi(n\lambda_0)$ under $H_0$. In the case where $n$ is small and $n\lambda_0$ is of moderate size, probabilities can be evaluated directly (or found from tables, if available).
For large samples (or indeed whenever the Poisson mean is large) a normal approximation can be used for the distribution of the sample sum or sample mean. Recall that $\sum X_i \sim Poi(n\lambda) \approx N(n\lambda, n\lambda)$.
Test statistic is $\bar{X}$, and
$$\frac{\bar{X} - \lambda_0}{\sqrt{\lambda_0/n}} \sim N(0,1) \text{ under } H_0$$
or we can use $\sum X_i$, and
$$\frac{\sum X_i - n\lambda_0}{\sqrt{n\lambda_0}} \sim N(0,1) \text{ under } H_0.$$
Using the second version it is easier to incorporate a continuity correction.
The first version has continuity correction $0.5/n$, whereas the second version has continuity correction 0.5.
Question
In a one-year investigation of claim frequencies for a particular category of motorists, the total number of claims made under 5,000 policies was 800. Assuming that the number of claims made by individual motorists has a $Poi(\lambda)$ distribution, test at the 1% level whether the unknown average claim frequency $\lambda$ is less than 0.175.
Solution
We are testing:
$H_0: \lambda = 0.175$ vs $H_1: \lambda < 0.175$
Under $H_0$:
$$\frac{\bar{X} - 0.175}{\sqrt{0.175/n}} \sim N(0,1)$$
Using the observed values, $n = 5{,}000$ and $\bar{x} = 0.16$, the test statistic, with continuity correction, is:
$$\frac{800.5/5{,}000 - 0.175}{\sqrt{0.175/5{,}000}} = -2.519$$
This is less than $-2.3263$, the lower 1% point of the $N(0,1)$ distribution. So we have sufficient evidence at the 1% level to reject $H_0$. Therefore it is reasonable to conclude that the true claim frequency is less than 0.175.
Alternatively, using probability values, we have $P(Z < -2.519) = 0.0059$. Since this is less than 0.01, we have sufficient evidence to reject $H_0$ at the 1% level.
Question
A random sample of 500 policies of a particular kind revealed a total of 116 claims during the last year. Test the null hypothesis $H_0: \lambda = 0.18$ against the alternative $H_1: \lambda > 0.18$, where $\lambda$ is the annual claim frequency, ie the average number of claims per policy.
Solution
We are testing:
$H_0: \lambda = 0.18$ vs $H_1: \lambda > 0.18$
Assuming that the underlying claim frequency has a Poisson distribution, then under $H_0$:
$$\frac{\bar{X} - 0.18}{\sqrt{0.18/n}} \sim N(0,1)$$
The observed value of the test statistic, with continuity correction, is:
$$\frac{115.5/500 - 0.18}{\sqrt{0.18/500}} = 2.688$$
We are carrying out a one-sided test. The value of the test statistic is greater than 1.6449 (the upper 5% point of the $N(0,1)$ distribution) so we have sufficient evidence to reject $H_0$ at the 5% level. Therefore it is reasonable to conclude that the true claim frequency is more than 0.18.
Alternatively, using probability values, we have $P(Z > 2.688) = 0.0036$, ie 0.36%. This is less than 0.05, so we have sufficient evidence to reject $H_0$ at the 5% level. In fact, we have sufficient evidence to reject $H_0$ even at the 0.5% level.
R can carry out an exact hypothesis test for $\lambda$ at the 5% level using:
poisson.test(x,n, conf=0.95)
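For the 500-policy example, the normal approximation and the exact Poisson test can be compared in R. This is a sketch with illustrative variable names; in `poisson.test`, `T` is the exposure (here the number of policies) and `r` is the hypothesised rate.

```r
# Poisson example: 116 claims from 500 policies, H0: lambda = 0.18 vs H1: lambda > 0.18
x       <- 116
n       <- 500
lambda0 <- 0.18

# Normal approximation; the total is adjusted towards the mean n*lambda0 = 90, so subtract 0.5
z        <- ((x - 0.5) / n - lambda0) / sqrt(lambda0 / n)
p_approx <- pnorm(z, lower.tail = FALSE)

# Exact one-sided test based on the Poisson distribution of the claim total
exact <- poisson.test(x, T = n, r = lambda0, alternative = "greater")

c(z = z, p_approx = p_approx, p_exact = exact$p.value)
```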
4 Basic tests – two independent samples
4.1 Testing the value of the difference between two population means
Situation: independent random samples, sizes $n_1$ and $n_2$ from $N(\mu_1, \sigma_1^2)$, $N(\mu_2, \sigma_2^2)$ respectively.
Testing: $H_0: \mu_1 - \mu_2 = \delta$
(a) $\sigma_1^2, \sigma_2^2$ known
test statistic:
$$z = \frac{\bar{x}_1 - \bar{x}_2 - \delta}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
There is no built-in function for calculating the above hypothesis test in R. We can use R to calculate the results of the statistic from scratch or use a bootstrap method if the assumptions are not met.
(b) $\sigma_1^2, \sigma_2^2$ unknown (much the more usual situation)
Large samples: use $S_i^2$ to estimate $\sigma_i^2$.
Further, the Central Limit Theorem justifies the use of a normal approximation for the distribution of the test statistic in sampling from any reasonable populations, so the requirement that we are sampling from normal distributions is not necessary when we have large samples.
Small samples: We will now use a $t$ distribution. Under the assumption $\sigma_1^2 = \sigma_2^2$ ($= \sigma^2$, say), this common variance is estimated by $S_p^2$, and the test statistic is
$$t = \frac{\bar{x}_1 - \bar{x}_2 - \delta}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
which is distributed as $t$ with $n_1 + n_2 - 2$ degrees of freedom under $H_0$.
Remember that:
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}.$$
R can carry out a hypothesis test for the difference between the means with unknown variance using the function t.test. We set the argument var.equal = TRUE for small samples.
Again, we could use the bootstrap method to construct an empirical distribution of the statistic if the assumptions are not met.
Question
The average blood pressure for a control group $C$ of 10 patients was 77.0 mmHg. The average blood pressure in a similar group $T$ of 10 patients on a special diet was 75.0 mmHg. Carry out a statistical test to assess whether patients on the special diet have lower blood pressure.
You are given that $\sum_{i=1}^{10} c_i^2 = 59{,}420$ and $\sum_{i=1}^{10} t_i^2 = 56{,}390$.
Solution
We are testing:
$H_0: \mu_C = \mu_T$ vs $H_1: \mu_C > \mu_T$
If we assume that blood pressures are normally distributed and that the variance of the underlying distribution for each group is the same, then under $H_0$:
$$\frac{(\bar{C} - \bar{T}) - 0}{S_P\sqrt{\dfrac{1}{m} + \dfrac{1}{n}}} \sim t_{m+n-2}$$
We have:
$$s_P^2 = \frac{1}{m+n-2}\left[(m-1)s_C^2 + (n-1)s_T^2\right] = \frac{1}{m+n-2}\left[\left(\sum_{i=1}^{m} c_i^2 - m\bar{c}^2\right) + \left(\sum_{i=1}^{n} t_i^2 - n\bar{t}^2\right)\right]$$
$$= \frac{1}{10+10-2}\left[\left(59{,}420 - 10 \times 77.0^2\right) + \left(56{,}390 - 10 \times 75.0^2\right)\right] = 15.00 = 3.873^2$$
As mentioned previously, the number of degrees of freedom to use with a $t$ test is the same as the denominator used when calculating the estimate of the variance, ie 18 in this case.
Using the observed values of $m = 10$, $n = 10$, $\bar{t} = 75.0$, $\bar{c} = 77.0$, and $s_P^2 = 3.873^2$, the value of the test statistic is:
$$\frac{77.0 - 75.0}{3.873\sqrt{\dfrac{1}{10} + \dfrac{1}{10}}} = 1.15$$
This is less than 1.734, the upper 5% point of the $t_{18}$ distribution. So we have insufficient evidence to reject $H_0$ at the 5% level. Therefore it is reasonable to conclude that patients on the special diet have the same blood pressure as patients on the normal diet.
Alternatively, using probability values, we have $P(t_{18} > 1.15) = 0.134$. Since this is greater than 0.05, we have insufficient evidence to reject $H_0$ at the 5% level.
We will not always be testing for equality between the two sample means.
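Since only summary statistics are given for the blood pressure question, `t.test` cannot be applied directly, but the pooled calculation is easy to reproduce by hand in R. The following is a sketch with illustrative variable names.

```r
# Pooled two-sample t test from summary statistics (blood pressure example)
m <- 10; n <- 10              # sample sizes: control, special diet
cbar <- 77.0; tbar <- 75.0    # sample means
sum_c2 <- 59420               # sum of squared control observations
sum_t2 <- 56390               # sum of squared special-diet observations

# Pooled variance estimate with m + n - 2 degrees of freedom
sp2 <- ((sum_c2 - m * cbar^2) + (sum_t2 - n * tbar^2)) / (m + n - 2)

# Test statistic for H0: mu_C = mu_T vs H1: mu_C > mu_T
t_stat <- (cbar - tbar) / sqrt(sp2 * (1 / m + 1 / n))

crit    <- qt(0.95, df = m + n - 2)                     # upper 5% point of t_18
p_value <- pt(t_stat, df = m + n - 2, lower.tail = FALSE)

c(sp2 = sp2, t_stat = t_stat, crit = crit, p_value = p_value)
```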
Question
A car manufacturer runs tests to investigate the fuel consumption of cars using a newly developed fuel additive. Sixteen cars of the same make and age are used, eight with the new additive and eight as controls. The results, in miles per gallon over a test track under regulated conditions, are as follows:
Control   27.0  32.2  30.4  28.0  26.5  25.5  29.6  27.2
Additive  31.4  29.9  33.2  34.4  32.0  28.7  26.1  30.3
If $\mu_C$ is the mean number of miles per gallon achieved by cars in the control group, and $\mu_A$ is the mean number of miles per gallon achieved by cars in the group with fuel additive, test:
(i) $H_0: \mu_A - \mu_C = 0$ vs $H_1: \mu_A - \mu_C > 0$
(ii) $H_0: \mu_A - \mu_C = 6$ vs $H_1: \mu_A - \mu_C \ne 6$
Solution
Using $C_i$ for the number of miles per gallon of the cars in the control group and $A_i$ for the number of miles per gallon of the cars with additive, we have:
$\sum c_i = 226.4$, $\sum c_i^2 = 6{,}442.5$, $\sum a_i = 246$, $\sum a_i^2 = 7{,}612.56$
The estimate of the pooled sample variance is:
$$s_P^2 = \frac{1}{m+n-2}\left[\left(\sum c_i^2 - n\bar{c}^2\right) + \left(\sum a_i^2 - m\bar{a}^2\right)\right] = \frac{1}{14}\left[\left(6{,}442.5 - 8 \times 28.3^2\right) + \left(7{,}612.56 - 8 \times 30.75^2\right)\right] = 5.96$$
(i) We are testing:
$H_0: \mu_A - \mu_C = 0$ vs $H_1: \mu_A - \mu_C > 0$
Assuming that the underlying distributions are normal, then under $H_0$:
$$\frac{(\bar{A} - \bar{C}) - 0}{S_P\sqrt{\dfrac{1}{m} + \dfrac{1}{n}}} \sim t_{m+n-2}$$
The observed value of the test statistic is:
$$\frac{30.75 - 28.3}{\sqrt{5.96}\sqrt{\dfrac{1}{8} + \dfrac{1}{8}}} = 2.007$$
This is greater than 1.761 (the upper 5% point of the $t_{14}$ distribution) so we have sufficient evidence to reject $H_0$ at the 5% level. Therefore it is reasonable to conclude that the mean performance is greater with the additive than without.
Alternatively, using probability values, we have $P(t_{14} > 2.007) = 0.0340$. This is less than 0.05, so we have sufficient evidence to reject $H_0$ at the 5% level.
(ii) We are now testing:
$H_0: \mu_A - \mu_C = 6$ vs $H_1: \mu_A - \mu_C \ne 6$
Making the same assumptions as before, under $H_0$:
$$\frac{(\bar{A} - \bar{C}) - 6}{S_P\sqrt{\dfrac{1}{m} + \dfrac{1}{n}}} \sim t_{m+n-2}$$
The observed value of the test statistic is now:
$$\frac{(30.75 - 28.3) - 6}{\sqrt{5.96}\sqrt{\dfrac{1}{8} + \dfrac{1}{8}}} = -2.908$$
This is a two-sided test and the statistic is less than $-2.145$ (the lower 2.5% point of the $t_{14}$ distribution) so we have sufficient evidence to reject $H_0$ at the 5% level. Therefore it is reasonable to conclude that the difference in the means is not equal to 6.
Alternatively, using probability values, we have $P(t_{14} < -2.908) = 0.00598$. Since this test is two-sided, the probability of obtaining a value at least as extreme as the one actually obtained is $2 \times 0.00598 = 0.0120$. This is less than 0.05, so we have sufficient evidence to reject $H_0$ at the 5% level. In fact, we have sufficient evidence to reject $H_0$ even at the 2.5% level.
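Both parts of the fuel additive question can be run with `t.test` and `var.equal = TRUE`. This is a sketch; the vector names are illustrative, and `mu` is the `t.test` argument giving the hypothesised difference in means.

```r
# Fuel additive example: pooled two-sample t tests
control  <- c(27.0, 32.2, 30.4, 28.0, 26.5, 25.5, 29.6, 27.2)
additive <- c(31.4, 29.9, 33.2, 34.4, 32.0, 28.7, 26.1, 30.3)

# (i) H0: mu_A - mu_C = 0 vs H1: mu_A - mu_C > 0
res_i <- t.test(additive, control, var.equal = TRUE, alternative = "greater")

# (ii) H0: mu_A - mu_C = 6 vs H1: mu_A - mu_C != 6 (two-sided is the default)
res_ii <- t.test(additive, control, var.equal = TRUE, mu = 6)

c(t_i = unname(res_i$statistic),  p_i = res_i$p.value,
  t_ii = unname(res_ii$statistic), p_ii = res_ii$p.value)
```

The p-values from R are exact, so they may differ a little from values interpolated from the Tables.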
4.2 Testing the value of the ratio of two population variances
Situation: independent random samples, sizes $n_1$ and $n_2$ from $N(\mu_1, \sigma_1^2)$, $N(\mu_2, \sigma_2^2)$ respectively. Sample variances $S_1^2$ and $S_2^2$.
Testing: $H_0: \sigma_1^2 = \sigma_2^2$ vs $H_1: \sigma_1^2 \ne \sigma_2^2$
This test is a formal prerequisite for the two-sample $t$ test, for which the assumption $\sigma_1^2 = \sigma_2^2$ is required. In practice, however, a simple plot of the data is often sufficient to justify the assumption; only if the population variances are very different in size is there any problem with the $t$ test.
Test statistic: $S_1^2 / S_2^2 \sim F_{n_1 - 1,\, n_2 - 1}$ under $H_0$
We saw in Chapter 9 that:
$$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{n_1 - 1,\, n_2 - 1}$$
so it follows that if we are testing the hypothesis $\sigma_1^2 = \sigma_2^2$, we can use the test statistic $S_1^2/S_2^2$ and compare it with the critical points in the appropriate $F$ table.
Question
The average blood pressure for a control group $C$ of 10 patients was 77.0 mmHg. The average blood pressure in a similar group $T$ of 10 patients on a special diet was 75.0 mmHg. Test whether the variances in the two populations can be considered to be equal.
You are given that $\sum_{i=1}^{10} c_i^2 = 59{,}420$ and $\sum_{i=1}^{10} t_i^2 = 56{,}390$.
Solution
We are testing:
$H_0: \sigma_C^2 = \sigma_T^2$ vs $H_1: \sigma_C^2 \ne \sigma_T^2$
Assuming that blood pressures are normally distributed, then under $H_0$, both populations have the same variance, so that:
$$\frac{S_T^2/\sigma_T^2}{S_C^2/\sigma_C^2} = \frac{S_T^2}{S_C^2} \sim F_{m-1,\, n-1}$$
Using the given data values, we have:
$$s_T^2 = \frac{1}{9}\left(56{,}390 - 10 \times 75^2\right) = 15.56$$
$$s_C^2 = \frac{1}{9}\left(59{,}420 - 10 \times 77^2\right) = 14.44$$
The observed value of the test statistic is:
$$\frac{15.56}{14.44} = 1.077$$
This is a two-sided test and our statistic is between $\frac{1}{4.026} = 0.2484$ and 4.026 (the lower and upper 2½% values from the $F_{9,9}$ distribution). So there is insufficient evidence to reject $H_0$ at the 5% level. Therefore it is reasonable to conclude that there is no difference in the variances of the two populations.
Alternatively, we can see from page 171 of the Tables that $P(F_{9,9} > 1.077)$ is greater than 0.1. Since the test is two-sided, the p-value is greater than $2 \times 0.1 = 0.2$. This is greater than 0.05, so we have insufficient evidence to reject $H_0$ at the 5% level.
This means that we were justified in carrying out the two-sample $t$ test previously, which assumes equal variances.
Had we used $\frac{s_C^2}{s_T^2} = \frac{14.44}{15.56} = 0.9280$, we would have reached the same conclusion.
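With only summary statistics available, the variance ratio test can be computed from scratch in R using the $F$ distribution functions. This is a sketch; the variable names are illustrative.

```r
# Variance ratio test from summary statistics (blood pressure example)
s2_T <- (56390 - 10 * 75^2) / 9   # sample variance, special diet group
s2_C <- (59420 - 10 * 77^2) / 9   # sample variance, control group

F_stat <- s2_T / s2_C

# Lower and upper 2.5% points of F with (9, 9) degrees of freedom
crit <- qf(c(0.025, 0.975), df1 = 9, df2 = 9)

# Two-sided p-value: double the smaller tail probability
p_upper <- pf(F_stat, df1 = 9, df2 = 9, lower.tail = FALSE)
p_two   <- 2 * min(p_upper, 1 - p_upper)

list(F_stat = F_stat, crit = crit, p_two = p_two)
```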
R can carry out a hypothesis test for the ratio of the variances using var.test or we could use a bootstrap method if the assumptions are not met.
4.3 Testing the value of the difference between two population proportions
Both one-sided and two-sided tests can easily be performed on the difference between two binomial probabilities, at least for large samples.
Situation: $n_1$ (large) trials with $P(\text{success}) = p_1$; observe $x_1$ successes.
$n_2$ (large) trials with $P(\text{success}) = p_2$; observe $x_2$ successes.
Testing: $H_0: p_1 = p_2$.
Test statistic is
$$\frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n_1} + \dfrac{\hat{p}(1 - \hat{p})}{n_2}}} \sim N(0,1) \text{ under } H_0$$
where $\hat{p}_1, \hat{p}_2$ are the maximum likelihood estimates (MLEs) of $p_1$ and $p_2$ respectively (the sample proportions $\frac{X_1}{n_1}, \frac{X_2}{n_2}$), and $\hat{p}$ is the MLE of the common $p$ under the null hypothesis, which is the overall sample proportion, namely $\frac{X_1 + X_2}{n_1 + n_2}$.
In some textbooks an alternative test statistic is used, namely:
$$\frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}} \sim N(0,1).$$
The denominator in the Core Reading expression is found by pooling the sample proportions, whereas in the alternative version, the values of $\hat{p}_1$ and $\hat{p}_2$ are used separately. Both approximations are valid. In the exam we would advise you to use the version shown in the Core Reading.
Question
In a one-year mortality investigation, 25 of the 100 ninety-year-old males and 20 of the 150 ninety-year-old females present at the start of the investigation died before the end of the year. Assuming that the numbers of deaths follow binomial distributions, test whether there is a difference between male and female mortality rates at this age.
Solution
We are testing:
$H_0: q_M = q_F$ vs $H_1: q_M \ne q_F$
If $X_M$ and $X_F$ denote the number of deaths among the males and females, $m$ and $f$ are the sample sizes, and $\hat{q}$ the pooled sample proportion, then, under $H_0$:
$$\frac{\left(\dfrac{X_M}{m} - \dfrac{X_F}{f}\right) - 0}{\sqrt{\dfrac{\hat{q}(1 - \hat{q})}{m} + \dfrac{\hat{q}(1 - \hat{q})}{f}}} \sim N(0,1)$$
Using the observed values of $m = 100$, $f = 150$, $x_M = 25$, $x_F = 20$, and $\hat{q} = \frac{45}{250}$, the value of the test statistic is:
$$\frac{0.25 - 0.1333}{\sqrt{\dfrac{0.18 \times 0.82}{100} + \dfrac{0.18 \times 0.82}{150}}} = 2.35$$
This is greater than 1.960 (the upper 2½% point of the $N(0,1)$ distribution). So we have sufficient evidence to reject $H_0$ at the 5% level. Therefore it is reasonable to conclude that male and female mortality rates are different at this age.
Alternatively, using probability values, we have $P(Z > 2.35) = 0.0093$. Since this test is two-sided, the probability of obtaining a value at least as extreme as the one actually obtained is $2 \times 0.0093 = 0.019$. As $0.019 < 0.05$, we have sufficient evidence to reject $H_0$ at the 5% level.
This test can also be one-tailed.
Question
A sample of 100 claims on household policies made during the year just ended showed that 62 were due to burglary. A sample of 200 claims made during the previous year had 115 due to burglary.
Test the hypothesis that the underlying proportion of claims that are due to burglary is higher in the second year than in the first.
Solution
We are testing:
$H_0: p_2 = p_1$ vs $H_1: p_2 > p_1$ (ie $p_1 - p_2 < 0$)
where $p_1$ and $p_2$ are the proportions of claims due to burglaries in the previous and current years respectively.
If $N_1$ and $N_2$ denote the numbers of claims due to burglaries in each year, then, under $H_0$:
$$\frac{\left(\dfrac{N_1}{200} - \dfrac{N_2}{100}\right) - 0}{\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{200} + \dfrac{\hat{p}(1 - \hat{p})}{100}}} \sim N(0,1)$$
The observed value of the test statistic is:
$$\frac{\left(\dfrac{115}{200} - \dfrac{62}{100}\right) - 0}{\sqrt{\dfrac{0.59(1 - 0.59)}{200} + \dfrac{0.59(1 - 0.59)}{100}}} = -0.747$$
We are carrying out a one-sided test and the value of our statistic is greater than $-1.6449$ (the lower 5% point of the $N(0,1)$ distribution). So we have insufficient evidence to reject $H_0$ at the 5% level. Therefore it is reasonable to conclude that the proportion of claims due to burglaries in the year just ended is not greater than the proportion in the previous year.
Alternatively, using probability values, we have $P(Z < -0.747) = 0.228$. Since this is greater than 0.05, we have insufficient evidence to reject $H_0$ at the 5% level.
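The pooled two-proportion statistic can be computed by hand in R and compared with `prop.test`. The sketch below uses the male/female mortality data from the earlier question; the variable names are illustrative. With `correct = FALSE`, `prop.test` reports a chi-square statistic, which for two groups is the square of the pooled $z$ statistic.

```r
# Two-proportion test: 25 deaths of 100 males, 20 deaths of 150 females
x  <- c(25, 20)
nn <- c(100, 150)

# Pooled estimate of the common proportion under H0
q_hat <- sum(x) / sum(nn)

# Pooled z statistic and two-sided p-value
z     <- (x[1] / nn[1] - x[2] / nn[2]) / sqrt(q_hat * (1 - q_hat) * sum(1 / nn))
p_two <- 2 * pnorm(abs(z), lower.tail = FALSE)

# Built-in test without continuity correction
res <- prop.test(x, nn, correct = FALSE)

c(z = z, p_two = p_two,
  z_from_prop_test = sqrt(unname(res$statistic)), p_prop_test = res$p.value)
```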
R can carry out a hypothesis test for the difference in proportions using prop.test with the argument correct=FALSE.
4.4 Testing the value of the difference between two Poisson means
Situation: independent random samples, sizes $n_1$ and $n_2$, from $Poi(\lambda_1)$ and $Poi(\lambda_2)$ distributions.
Considering the case in which normal approximations can be used, which is so whenever the sample sizes are large and/or the parameter values are large:
Testing: $H_0: \lambda_1 = \lambda_2$.
Test statistic is
$$\frac{\hat{\lambda}_1 - \hat{\lambda}_2}{\sqrt{\dfrac{\hat{\lambda}}{n_1} + \dfrac{\hat{\lambda}}{n_2}}} \sim N(0,1) \text{ under } H_0$$
where $\hat{\lambda}_1, \hat{\lambda}_2$ are the MLEs (the sample means $\bar{X}_1, \bar{X}_2$, respectively) and $\hat{\lambda}$ is the MLE of the common $\lambda$ under the null hypothesis, which is the overall sample mean.
Again, in some textbooks you may see an alternative test statistic, namely:
$$\frac{\hat{\lambda}_1 - \hat{\lambda}_2}{\sqrt{\dfrac{\hat{\lambda}_1}{n_1} + \dfrac{\hat{\lambda}_2}{n_2}}} \sim N(0,1).$$
Similarly to the last section, the Core Reading version has a pooled value for the parameter, whereas the alternative version doesn't. Both are valid approximations.
Question
In a one-year investigation of claim frequencies for a particular category of motorists, there were 150 claims from the 500 policyholders aged under 25 and 650 claims from the 4,500 remaining policyholders. Assuming that the number of claims made by individual motorists in each category has a Poisson distribution, test at the 1% level whether the claim frequency is the same for drivers under age 25 and over age 25.
Solution
We are testing:
$H_0: \lambda_Y = \lambda_O$ vs $H_1: \lambda_Y \ne \lambda_O$
where we are using $Y$ to represent young and $O$ to represent old.
Under $H_0$:
$$\frac{(\bar{Y} - \bar{O}) - 0}{\sqrt{\dfrac{\hat{\lambda}}{m} + \dfrac{\hat{\lambda}}{n}}} \sim N(0,1) \text{ where } m \text{ and } n \text{ are the sample sizes}$$
The observed value of the test statistic is:
$$\frac{0.300 - 0.1444}{\sqrt{\dfrac{0.16}{500} + \dfrac{0.16}{4{,}500}}} = 8.25$$
We are carrying out a two-sided test and our statistic is much greater than 2.5758 (the upper ½% point of the $N(0,1)$ distribution). So we easily have sufficient evidence to reject $H_0$ at the 1% level. Therefore it is reasonable to conclude that the claim frequencies are different for younger and older drivers.
Alternatively, using probability values, we have $P(Z > 8.25) \ll 0.0005\%$. Doubling this (as this test is two-sided) gives a p-value that is still less than 0.001%. So we have sufficient evidence to reject $H_0$, even at the 0.001% level.
In fact, although the hypotheses weren't posed in this way in the question, we can conclude that the claim frequency is higher for the younger drivers.
There is no built-in function for calculating the above statistic in R. We can use R to calculate the result from scratch or use a bootstrap method. However, R can carry out a hypothesis test for the ratio of the two Poisson parameters using poisson.test.
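A from-scratch calculation of the pooled Poisson statistic, alongside the rate-ratio test from `poisson.test`, might look like the sketch below. The variable names are illustrative; in the two-sample form of `poisson.test`, `x` holds the two claim counts, `T` the two exposures, and the default `r = 1` tests whether the ratio of the rates is 1.

```r
# Two-sample Poisson test: 150 claims from 500 young, 650 claims from 4,500 older policyholders
x_Y <- 150; m <- 500
x_O <- 650; n <- 4500

lam_Y <- x_Y / m                   # sample claim frequency, young
lam_O <- x_O / n                   # sample claim frequency, old
lam   <- (x_Y + x_O) / (m + n)     # pooled estimate under H0

# Pooled z statistic and two-sided p-value
z     <- (lam_Y - lam_O) / sqrt(lam / m + lam / n)
p_two <- 2 * pnorm(abs(z), lower.tail = FALSE)

# Exact test that the ratio of the two Poisson rates equals 1
ratio_test <- poisson.test(c(x_Y, x_O), T = c(m, n))

c(z = z, p_two = p_two, p_ratio = ratio_test$p.value)
```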
5 Basic test – paired data
In testing for a difference between two population means, the use of independent samples can have a major drawback. Even if a real difference does exist, the variability among the responses within each sample can be large enough to mask it. The random variation within the samples will mask the real difference between the populations from which they come.
One way to control this variability external to the issue in question is to use a pair of responses from each subject, and then work with the differences within the pairs. The aim is to remove as far as possible the subject-to-subject variation from the analysis, and thus to home in on any real difference between the populations.
Assumption: differences constitute a random sample from a normal distribution.
Testing: $H_0: \mu_D\ (= \mu_1 - \mu_2) = \delta$
Test statistic is
$$\frac{\bar{D} - \delta}{S_D/\sqrt{n}} \sim t_{n-1} \text{ under } H_0.$$
We can use $N(0,1)$ for $t$, and do not require the normal assumption, if $n$ is large.
Question
The average blood pressure $B$ for a group of 10 patients was 77.0 mmHg. The average blood pressure $A$ after they were put on a special diet was 75.0 mmHg. Carry out a statistical test to assess whether the special diet reduces blood pressure.
You are given that $\sum_{i=1}^{10} (b_i - a_i)^2 = 68.0$.
Solution
We are testing:
$H_0: \mu_A = \mu_B$ vs $H_1: \mu_A < \mu_B$
where $A$ is after and $B$ is before.
We can calculate the difference in blood pressure within each pair, ie $D_i = A_i - B_i$. If we assume that blood pressures are normally distributed, then under $H_0$, the $D_i$'s also have a normal distribution. So we can apply a one-sample $t$ test to the $D_i$'s, based on the sample variance $s_D^2$:
$$\frac{\bar{D} - (\mu_A - \mu_B)}{S_D/\sqrt{n}} \sim t_{n-1}$$
For our samples:
$$\bar{d} = \bar{a} - \bar{b} = 75.0 - 77.0 = -2.0$$
$$s_D^2 = \frac{1}{n-1}\sum_{i=1}^{n}(d_i - \bar{d})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} d_i^2 - n\bar{d}^2\right] = \frac{1}{9}\left[68.0 - 10(-2.0)^2\right] = 3.111 = 1.764^2$$
So, the observed value of the test statistic is:
$$\frac{75.0 - 77.0}{1.764/\sqrt{10}} = -3.59$$
This is less than $-1.833$, the lower 5% point of the $t_9$ distribution. So we have sufficient evidence to reject $H_0$ at the 5% level. Therefore it is reasonable to conclude that the special diet does reduce blood pressure.
Alternatively, using probability values, we have $P(t_9 < -3.59) = 0.0037$, which is less than 0.05. So we have sufficient evidence to reject $H_0$ at the 5% level. In fact, we have sufficient evidence to reject it at even the 0.5% level.
When we performed the two-sample $t$ test earlier, we were unable to reach this conclusion because the reduction was masked by other factors.
Sometimes care is needed to carry out the test the right way round.
Question
In order to increase the efficiency with which employees in a certain organisation can carry out a task, 5 employees are sent on a training course. The time in seconds to carry out the task both before and after the training course is given below for the 5 employees:
          A    B    C    D    E
Before    42   51   37   43   45
After     38   37   32   40   48
Test whether the training course has had the desired effect.
Solution
Weare testing:
AB=<
where Ais After
The Actuarial
Education
Hvs
H
01
BA
(ie
A::-
B <
0)
and Bis Before.
Company
IFE: 2022 Examination
Page 36
CS1-10: Hypothesis
Taking the differences
performance),
a=db
(so that a positive value of d represents
5
3
testing
an improvement
in
we have:
4
14
3
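Applying a one-sample t test to the $D$ values (and assuming that the underlying distributions are normal):

$$\frac{\bar{D} - (\mu_B - \mu_A)}{S_D / \sqrt{n}} \sim t_{n-1}$$

For the sample values:

$$\bar{d} = \frac{23}{5} = 4.6 \quad \text{and} \quad s_D^2 = \frac{1}{4}\sum (d_i - \bar{d})^2 = \frac{1}{4}\left(255 - 5 \times 4.6^2\right) = 6.107^2$$

So the observed value of the test statistic is:

$$\frac{4.6 - 0}{6.107 / \sqrt{5}} = 1.684$$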
This is a one-sided test and the observed value of the test statistic is less than 2.132 (the upper 5% point of the $t_4$ distribution). So we have insufficient evidence to reject $H_0$ at the 5% level. Therefore it is reasonable to conclude that the training course does not increase employees' efficiency.

Alternatively, using probability values, we have $P(t_4 > 1.684) = 0.0874$, which is greater than 0.05. So we have insufficient evidence to reject $H_0$ at the 5% level.

R can carry out this hypothesis test using t.test with the argument paired=TRUE.
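As a rough illustration, the paired test in the training-course question might be run in R as follows. The vector names `before` and `after` are chosen here for illustration; the exact p-value R reports will be close to, but not identical with, the figure of 0.0874 obtained by interpolating in the tables.

```r
# Task times (seconds) for employees A to E, copied from the question above
before <- c(42, 51, 37, 43, 45)
after  <- c(38, 37, 32, 40, 48)

# H1: mean of (before - after) > 0, ie the training reduces the time taken
res <- t.test(before, after, paired = TRUE, alternative = "greater")

res$statistic   # observed t statistic (about 1.684)
res$parameter   # degrees of freedom (4)
res$p.value     # exact one-sided p-value
```

Passing `before` first makes each difference `before - after`, matching $d = b - a$ in the solution, so a one-sided alternative of "greater" corresponds to an improvement.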
6 Tests and confidence intervals
You may have noticed that we've been using some of the same examples in this chapter as in Chapter 9. This is because statistical tests and confidence intervals are very closely related. The methods are basically the same in each case, except that they work opposite ways round. Confidence intervals start from a probability and find a range of parameters associated with this. Statistical tests start with a possible value (or values) for the parameter and associate a probability value with this.

There are very close parallels between the inferential methods for tests and confidence intervals. In many situations there is a direct link between a confidence interval for a parameter and tests of hypothesised values for it.

A confidence interval for $\theta$ can be regarded as a set of acceptable hypothetical values for $\theta$, so a value $\theta_0$ contained in the confidence interval should be such that the hypothesis $H_0: \theta = \theta_0$ will be accepted in a corresponding test. This generally proves to be the case.

In some situations there is a difference between the manner of construction of the confidence interval and that of the construction of the test statistic which is actually used. For example the confidence interval for the difference between two proportions (based on normal approximations) is constructed in a different way from that used for the test statistic in the corresponding test, where an estimate of a common proportion (under $H_0$) is used. As a result, in this and similar cases there is only an approximate match (albeit a good one) between the confidence interval and the corresponding test.

One useful consequence of this relationship between tests and confidence intervals is that if we have a 95% confidence interval for a parameter, we can immediately apply a 5% test on the value of that parameter by observing whether or not the interval contains the proposed value.
Question
A researcher has found 95% confidence intervals for the average daily vitamin C consumption (in milligrams) in three countries. For country A it is (75,95), for country B it is (40,50) and for country C it is (55,65). On the basis of this information, are people getting sufficient vitamin C in each country? The recommended daily allowance is 60mg.

Solution

Country A

The 95% confidence interval is (75,95), which contains only values above 60. So in a 5% test of $H_0: \mu = 60$ vs $H_1: \mu > 60$ we reject $H_0$ and conclude that people are getting more than enough vitamin C.

Country B

The 95% confidence interval is (40,50), which contains only values below 60. So in a 5% test of $H_0: \mu = 60$ vs $H_1: \mu < 60$ we reject $H_0$ and conclude that people are not getting enough vitamin C.

Country C

The 95% confidence interval is (55,65), which contains the value 60. So in a 5% test we cannot reject $H_0$ and we conclude that people are getting the recommended daily allowance.
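The interval-based reasoning in this solution can be written as a short R sketch. The names `ci`, `rda` and `decision` are illustrative only, and the rule simply checks whether the hypothesised value of 60 lies inside each 95% interval.

```r
# 95% confidence intervals from the question above (mg of vitamin C per day)
ci <- list(A = c(75, 95), B = c(40, 50), C = c(55, 65))
rda <- 60   # recommended daily allowance

# A hypothesised value lying outside the 95% interval is rejected at the 5% level
decision <- sapply(ci, function(int) {
  if (rda < int[1]) {
    "reject H0: mean above 60"
  } else if (rda > int[2]) {
    "reject H0: mean below 60"
  } else {
    "do not reject H0"
  }
})
decision
```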
7 Non-parametric tests
The tests we have been considering so far all make assumptions about the distribution of the variables of interest within the population. If these assumptions are not correct, then the level of statistical significance can be affected. It is possible to devise tests which make no distributional assumptions. Such tests are termed non-parametric. They have the advantage of being applicable under conditions in which the tests in the previous sections should not be used.

For example, whilst the two-sample t test is robust to departures from normality and equal variances for large samples, it is not appropriate for small samples with a non-normal distribution. Hence, we need to use a test which doesn't make any distributional assumptions about the data or the test statistic. These tests are called non-parametric tests.

However, some non-parametric tests do not use all the information available. For example, the Signs Test in Subject CS2 uses the signs of the differences between two samples while ignoring their magnitude. By using only some of the information, the test won't be as accurate.
7.1 Permutation approach
One way of constructing a non-parametric test is to consider all possible permutations of the data subject to some criterion. For example, consider a test of the difference between the means of two independent samples of sizes $n_A$ and $n_B$. The null hypothesis is that there is no difference in the mean of the two samples.

Label the two samples as A and B, and consider all possible ways of selecting the $n_A + n_B$ elements of the combined sample such that $n_A$ of them are in category A and $n_B$ are in category B. Each of these permutations will produce a test statistic (the mean difference), and the mean differences from all possible permutations will provide a distribution of mean differences. Assuming that each permutation is equally likely, we can calculate the p-value of the mean difference in the data we have (the permutation actually observed).

The null hypothesis is that the distributions of both categories are the same and hence the means (or any other statistic such as the medians) are the same. In which case, a data point is equally likely to have been assigned to either group.

We can then calculate the p-value for our observed statistic of the sampling distribution. This will be the proportion of permutations that lead to test statistics at least as extreme (relative to an alternative hypothesis) as the actual labelling of the data.

If the two samples are stored in vectors xA and xB, then sample R code for obtaining the permutation sampling distribution for the difference in the means is as follows:
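    results <- c(xA, xB)
    nA <- length(xA)              # size of sample A
    index <- 1:length(results)
    p <- combn(index, nA)         # each column is one choice of positions for group A
    n <- ncol(p)
    dif <- rep(0, n)
    for (i in 1:n) {
      dif[i] <- mean(results[p[, i]]) - mean(results[-p[, i]])
    }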
If our observed statistic is T and our alternative hypothesis is $H_1: \mu_1 > \mu_2$ then the p-value is calculated as follows:
length(dif[dif>=T])/length(dif)
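To see the whole calculation run end to end, here is a minimal sketch on a small made-up data set. The values in `xA` and `xB` are hypothetical, and the observed statistic is stored as `T_obs` rather than `T` so that R's shorthand `T` for `TRUE` is not masked.

```r
# Hypothetical data, chosen only to illustrate the method
xA <- c(12, 15, 14, 18)
xB <- c(10, 11, 13)

results <- c(xA, xB)
nA <- length(xA)
p <- combn(length(results), nA)   # 35 ways of choosing which values go in group A

# Difference in means for every possible labelling
dif <- apply(p, 2, function(idx) mean(results[idx]) - mean(results[-idx]))

T_obs <- mean(xA) - mean(xB)      # observed difference in means
p_value <- mean(dif >= T_obs)     # one-sided p-value for H1: mean of A > mean of B
p_value
```

With only 35 labellings, the smallest p-value this test could ever give is 1/35, which shows how coarse the permutation distribution is for very small samples.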
Alternatively, we can use the permTS function in the perm package or the oneway_test function in the coin package or the perm.test function in the exactRankTests package (though this only works if the observed values are integers).

Similar approaches can be used for tests for paired data where the pairs are kept together. This is equivalent to calculating the permutations of the signs of the differences of the pairs.

The permutation approach is not new, but it has become much more feasible in recent years with the advent of powerful computers, which can undertake the calculation of the many permutations involved in all but the smallest problems.

However, for larger samples the number of permutations grows rapidly and this becomes computationally expensive. For example, two groups of size 20 result in 137,846,528,820 combinations. Hence, we usually resort to resampling methods.

Resampling methods reduce the number of combinations and thus the computation time. These techniques will be described more fully in the CS1B course.
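The count quoted above can be checked with `choose`. The rest of the sketch shows one common resampling idea, a Monte Carlo approximation that uses a random subset of relabellings. This is an assumption about what such a method might look like, not necessarily the approach taken in CS1B, and the simulated data and the choice of `B` are purely illustrative.

```r
choose(40, 20)   # number of ways of splitting 40 values into two groups of 20

# Hypothetical data: two samples of size 20
set.seed(1)
xA <- rnorm(20, mean = 1)
xB <- rnorm(20, mean = 0)
results <- c(xA, xB)
T_obs <- mean(xA) - mean(xB)

# Approximate the permutation distribution using B random relabellings
B <- 10000
dif <- replicate(B, {
  idx <- sample(length(results), length(xA))
  mean(results[idx]) - mean(results[-idx])
})
p_value <- mean(dif >= T_obs)   # approximate one-sided p-value
```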
8 Chi-square tests
These tests are relevant to category or count data. Each sample value falls into one or other of several categories or cells. The test is then based on comparing the frequencies actually observed in the categories/cells with the frequencies expected under some hypothesis, using the test statistic

$$\sum \frac{(f_i - e_i)^2}{e_i}$$

where $f_i$ and $e_i$ are the observed and expected frequencies respectively in the ith category/cell, and the summation is taken over all categories/cells involved. This statistic has, approximately, a chi-square ($\chi^2$) distribution under the hypothesis on the basis of which the expected frequencies were calculated.

The statistic is often written as $\sum \frac{(O_i - E_i)^2}{E_i}$, to show which is the observed value. The values of $O_i$ and $E_i$ should be numbers rather than proportions or percentages.
8.1 Goodness of fit
This is investigating whether it is reasonable to regard a random sample as coming from a specified distribution, ie whether a particular model provides a good fit to the data.

Degrees of freedom

Suppose there are k cells, so k terms in the summation which produces the statistic, and that the sample size is $n = \sum f_i$. The expected frequencies also sum to n, so knowing any k - 1 of them automatically gives you the last one. There is a dependence built in to the k terms which are added up to produce the statistic and this is the reason why the degrees of freedom of the basic statistic is k - 1 and not k.

Further, for each parameter of the distribution specified by the null hypothesis which must be estimated from the observed data, another degree of dependence is introduced in the expected frequencies: for each parameter estimated another degree of freedom is lost. The theory behind this assumes that the maximum likelihood estimators are used. So the number of degrees of freedom is reduced by the number of parameters estimated from the observed data.
The accuracy of the chi-square approximation

The test statistic is only approximately, not exactly, distributed as $\chi^2$. The presence of the expected frequencies $e_i$ in the denominators of the terms to be added up is important. Dividing by very small $e_i$ values causes the resulting terms to be somewhat large and erratic, and the tail of the distribution of the statistic may not match that of the $\chi^2$ distribution very well. So, in practice, it is best not to have too many small $e_i$ values, which can be done by combining cells and suffering the consequent loss of information/degrees of freedom.

The most common recommendation is not to use any $e_i$ which is less than 5. (However, the statistic is more robust than that and in practice a less conservative approach, such as ensuring that all $e_i$ are greater than 1 and that not more than 20% of them are less than 5, may be taken.)
Question
In testing whether a die is fair, a suitable model is:

$$P(X = i) = \frac{1}{6}, \quad i = 1,2,3,4,5,6$$

where X is the number thrown, and the hypotheses may be:

H0: Number thrown has the distribution specified in the model
H1: Number thrown does not have the distribution specified in the model

If the die is thrown 300 times, with the following results,

x:       1    2    3    4    5    6
f_i:    43   56   54   47   41   59

carry out a $\chi^2$ test to assess whether the data comes from a fair die.
Solution
Under $H_0$, $300 \times \frac{1}{6} = 50$ occurrences of each face of the die would be expected, so $e_i = 50$, $i = 1,2,3,4,5,6$. The values of $(f_i - e_i)$, the differences between observed and expected frequencies, are then:

-7, 6, 4, -3, -9, 9

which of course sum to zero.

The value of the test statistic is then:

$$\frac{49}{50} + \frac{36}{50} + \frac{16}{50} + \frac{9}{50} + \frac{81}{50} + \frac{81}{50} = \frac{272}{50} = 5.44$$

In this illustration, with 6 cells and a fully specified model (no parameters to estimate), the distribution of the test statistic under $H_0$ is $\chi^2_5$.

This is a one-sided test. We reject $H_0$ for large values of the statistic (ie when the observed and expected values are very different).

Since 5.44 is less than 11.07 (the upper 5% point of the $\chi^2_5$ distribution) we have insufficient evidence to reject $H_0$ at the 5% level.

Alternatively, the p value is $P(\chi^2_5 > 5.44)$. The probability tables (on page 165) show that $P(\chi^2_5 > 5.5) = 0.358$, so $P(\chi^2_5 > 5.44)$ is about 0.36. Note also that a $\chi^2_5$ variable has mean 5, so we have observed a value much in line with what is expected under the model.

We have no evidence that the die is not fair. $H_0$ can stand.
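For comparison, the die example can be reproduced in R with `chisq.test`, supplying the hypothesised probabilities through the `p` argument. The vector name `f` is illustrative.

```r
f <- c(43, 56, 54, 47, 41, 59)          # observed frequencies for faces 1 to 6
res <- chisq.test(f, p = rep(1/6, 6))   # fair-die probabilities under H0

res$expected    # 50 for each face
res$statistic   # 5.44
res$parameter   # 5 degrees of freedom
res$p.value     # about 0.36
```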
Question
The table below shows the causes of death in elderly men derived from a study in the 1970s. Carry out a chi-square test to determine whether these percentages can still be considered to provide an accurate description of causes of death in 2000.

Cause of death               Proportion of deaths in 1975   Number of deaths in 2000
Cancer                                  8%                           286
Heart disease                          22%                           805
Other circulatory disease              40%                         1,548
Respiratory diseases                   19%                           755
Other causes                           11%                           464
Solution
We are testing:

H0: the causes of death in 2000 conform to the percentages shown
vs
H1: the causes of death in 2000 do not conform to the percentages shown

Under $H_0$:

$$\sum \frac{(O_i - E_i)^2}{E_i} \sim \chi^2_f$$

where f is the number of degrees of freedom.

The expected values for each category are calculated by multiplying the total number of deaths by the percentage for that category. For example the expected number of deaths from heart disease is $0.22 \times 3{,}858 = 848.8$.

The table below shows the observed and expected figures (where $C_i = \frac{(O_i - E_i)^2}{E_i}$):

Cause of death               Actual, O_i   Expected, E_i    C_i
Cancer                           286           308.6        1.66
Heart disease                    805           848.8        2.26
Other circulatory diseases     1,548         1,543.2        0.01
Respiratory diseases             755           733.0        0.66
Other causes                     464           424.4        3.70
Total                          3,858         3,858          8.29

There are no small groups.

The value of the chi-square statistic is 8.29. There are 5 categories. The $E_i$'s were calculated from the total number of observations. We haven't estimated any parameters. So the number of degrees of freedom is 5 - 1 = 4.

Chi-square goodness-of-fit tests are one-sided tests. The observed value of the test statistic is less than 9.488, the upper 5% point of the $\chi^2_4$ distribution. So we have insufficient evidence to reject $H_0$ at the 5% level. Therefore it is reasonable to conclude that there has been no change in the pattern of causes of death.

Alternatively, using probability values, we have $P(\chi^2_4 > 8.29) = 0.0819$, which is greater than 0.05. So we have insufficient evidence to reject $H_0$ at the 5% level.
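A sketch of the same test in R is given below, with the 1975 proportions supplied as the null probabilities. The names `deaths` and `p1975` are illustrative, and R's exact p-value will differ slightly from the value interpolated from the tables.

```r
deaths <- c(cancer = 286, heart = 805, circulatory = 1548,
            respiratory = 755, other = 464)
p1975 <- c(0.08, 0.22, 0.40, 0.19, 0.11)   # proportions from the 1975 study

res <- chisq.test(deaths, p = p1975)
round(res$expected, 1)   # expected numbers of deaths in 2000
res$statistic            # about 8.29
res$p.value              # a little above 0.08
```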
We can apply the test to data from other distributions, for example the Poisson distribution.
Question
The numbers of claims made last year by individual motor insurance policyholders were:

Number of claims               0     1    2    3   4+
Number of policyholders    2,962   382   47   25    4

Carry out a chi-square test to determine whether these frequencies can be considered to conform to a Poisson distribution.
Solution
We are testing:

H0: the number of claims conforms to a Poisson distribution
vs
H1: the number of claims doesn't conform to a Poisson distribution

Under $H_0$:

$$\sum \frac{(O_i - E_i)^2}{E_i} \sim \chi^2_f$$

To find the expected numbers, we must estimate the unknown mean of the Poisson distribution. The MLE of the mean of a Poisson distribution is the mean number of claims. If we assume that no policyholders made more than 4 claims, this is:

$$\hat{\lambda} = \frac{2{,}962 \times 0 + 382 \times 1 + 47 \times 2 + 25 \times 3 + 4 \times 4}{3{,}420} = 0.1658$$

The expected values are found by applying the Poisson probabilities calculated using this value for the parameter to the total observed number of policyholders, ie 3,420.

The table showing the observed and expected figures is:

Number of claims    Actual    Expected
0                   2,962     2,897.5
1                     382       480.4
2                      47        39.8
3                      25         2.2
4 or more               4         0.1
Total               3,420     3,420

We calculate the last expected figure by subtraction.

The expected numbers in the last two groups are very small, so we need to combine the last three groups to form a "2 or more" group.

The value of the chi-square statistic is:

$$\chi^2 = \frac{(2{,}962 - 2{,}897.5)^2}{2{,}897.5} + \frac{(382 - 480.4)^2}{480.4} + \frac{(76 - 42.1)^2}{42.1} = 48.89$$

There are now 3 groups. The $E_i$'s were calculated from the total number of observations. We have estimated one parameter. So the number of degrees of freedom is 3 - 1 - 1 = 1.

We are carrying out a one-sided test. The observed value of the test statistic far exceeds 7.879, the upper 0.5% point of the $\chi^2_1$ distribution. So we have sufficient evidence to reject $H_0$ at the 0.5% level. Therefore it is reasonable to conclude that a Poisson model does not provide a good model for the number of claims.
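The calculation can be sketched in R by hand. `chisq.test` is deliberately not used for the final step here, because it would take k - 1 degrees of freedom and would not deduct one for the estimated parameter. The variable names are illustrative, and because the expected values are not rounded the statistic comes out marginally different from 48.89.

```r
claims  <- 0:4
holders <- c(2962, 382, 47, 25, 4)
n <- sum(holders)                            # 3,420 policyholders

lambda_hat <- sum(claims * holders) / n      # MLE, treating "4+" as exactly 4

# Expected frequencies for 0, 1 and "2 or more" claims (last one by subtraction)
e <- n * dpois(0:1, lambda_hat)
e <- c(e, n - sum(e))
o <- c(holders[1:2], sum(holders[3:5]))      # 2962, 382, 76

stat <- sum((o - e)^2 / e)
df <- length(o) - 1 - 1                      # one parameter estimated
p_value <- pchisq(stat, df, lower.tail = FALSE)
```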
Question
On a particular run of a process which bottles a drink, it is thought that the cleansing process of the bottles has partially failed. The bottles have been boxed into crates, each containing six bottles. It is thought that each bottle, independently of all others, has the same chance of containing impurities.

A survey has been conducted, and each bottle in a random sample of 200 crates has been tested for impurities. The table below gives the numbers of crates in the sample which had the respective number of bottles which contained impurities:

Number of impure bottles:    0    1    2    3    4    5    6
Number of crates:           38   70   58   25    6    2    1

Test the goodness of fit of a binomial distribution to these observations.
Solution
We first need an estimate of $\theta$, the proportion of bottles containing impurities. We get this by finding the MLE for $\theta$ based on the random sample.

Perhaps the simplest way to calculate the MLE, $\hat{\theta}$, is:

$$\hat{\theta} = \frac{\text{total number of successes (impure bottles)}}{\text{total number of bottles}} = \frac{301}{1{,}200} = 0.25083333$$

Alternatively, we can see that $6\hat{\theta} = \bar{x}$, where $\bar{x}$ is the mean number of impure bottles per crate.

From the data, $\bar{x} = \frac{301}{200} = 1.505$, so, given that there are six bottles in each crate, $\hat{\theta} = \bar{x}/6 = 0.25083333$.

An alternative approach to deriving the MLE is as follows.

Let the number of bottles with impurities in each crate in a random sample of 200 crates be $x_1, x_2, \ldots, x_{200}$. Each $x_i$ is an observation from a $Bin(6, \theta)$ distribution, and so the likelihood function for $\theta$ is:

$$L(\theta) = \binom{6}{x_1}\theta^{x_1}(1-\theta)^{6-x_1} \cdots \binom{6}{x_{200}}\theta^{x_{200}}(1-\theta)^{6-x_{200}} = \text{constant} \times \theta^{\sum x_i}(1-\theta)^{1{,}200 - \sum x_i}$$

Taking logs:

$$\log L = \text{constant} + \sum x_i \log\theta + \left(1{,}200 - \sum x_i\right)\log(1-\theta)$$

Differentiating with respect to $\theta$ and setting the result equal to zero:

$$\frac{\partial}{\partial\theta}\log L = \frac{\sum x_i}{\theta} - \frac{1{,}200 - \sum x_i}{1-\theta} = 0$$

Solving this we get $\hat{\theta} = \frac{\sum x_i}{1{,}200} = \frac{301}{1{,}200} = 0.25083$.
We can now calculate the expected frequencies. We calculate the probabilities from a $Bin(6, 0.25083)$ distribution, and multiply each probability by 200:

Number of bottles with impurities    Observed    Expected
0                                       38         35.36
1                                       70         71.03
2                                       58         59.46
3                                       25         26.54
4 or more                                9          7.61
Total                                  200        200

We have combined the last three groups since the expected frequencies are small. In fact we anticipated that the last two groups were going to have small expected numbers and calculated the expected number for the "4 or more" group by subtraction from 200.

The observed value of the chi-square statistic is:

$$\chi^2 = \frac{(38 - 35.36)^2}{35.36} + \frac{(70 - 71.03)^2}{71.03} + \frac{(58 - 59.46)^2}{59.46} + \frac{(25 - 26.54)^2}{26.54} + \frac{(9 - 7.61)^2}{7.61} = 0.59$$

There are now 5 groups. The $E_i$'s were calculated from the total number of observations. We have estimated one parameter. So the number of degrees of freedom is 5 - 1 - 1 = 3.

We are carrying out a one-sided test. The observed value of the test statistic has a p-value of about 90%. So we have insufficient evidence to reject $H_0$ at any conventional significance level. Therefore it is reasonable to conclude that the underlying distribution is binomial.

Indeed the fit is almost too good. The resulting value of the test statistic is suspiciously small.

R can carry out a $\chi^2$ goodness-of-fit test using:

    chisq.test(<observed freq>, p=<expected probabilities>)
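For the bottling example, a hand calculation in R might look like the sketch below. As in the worked solution, one degree of freedom is deducted for the estimated parameter, which is why `pchisq` is called directly rather than relying on the degrees of freedom that `chisq.test` would report. Variable names are illustrative.

```r
bottles <- 0:6
crates  <- c(38, 70, 58, 25, 6, 2, 1)

theta_hat <- sum(bottles * crates) / (6 * sum(crates))   # 301 / 1200

# Expected frequencies for 0, 1, 2, 3 and "4 or more" impure bottles
e <- 200 * dbinom(0:3, size = 6, prob = theta_hat)
e <- c(e, 200 - sum(e))                   # last group by subtraction
o <- c(crates[1:4], sum(crates[5:7]))     # 38, 70, 58, 25, 9

stat <- sum((o - e)^2 / e)                # about 0.59
p_value <- pchisq(stat, df = 5 - 1 - 1, lower.tail = FALSE)   # about 0.90
```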
8.2 Contingency tables
A contingency table is a two-way table of counts obtained when sample items (people, companies, policies, claims etc) are classified according to two category variables. The question of interest is whether the two classification criteria are independent.

H0: the two classification criteria are independent.

The simple rule for calculating the expected frequency for any cell is then:

$$\frac{\text{row total} \times \text{column total}}{\text{table total}}$$

(ie the proportion of data in row i is $\sum_j f_{ij} \Big/ \sum_i \sum_j f_{ij}$, so if the criteria are independent, the number expected in cell (i, j) is $\left(\sum_j f_{ij} \Big/ \sum_i \sum_j f_{ij}\right) \times \sum_i f_{ij}$.)

The degrees of freedom associated with a table with r rows and c columns is:

$$(rc - 1) - (r - 1) - (c - 1) = (r - 1)(c - 1)$$

since the column totals and row totals reduce the number of degrees of freedom.

An important use of this method is with a table of dimension $2 \times c$ (or $r \times 2$) which gives a test for differences among 2 or more population proportions.
Question
For each of three insurance companies, A, B, and C, a random sample of non-life policies of a particular kind is examined. It turns out that a claim (or claims) have arisen in the past year in 23% of the sampled policies for A, in 28% of those for B, and in 20% of those for C.

Test for differences in the underlying proportions of policies of this kind which have given rise to claims in the past year among the three companies in the two situations:

(a) the sample sizes were 100, 100, and 200 respectively

(b) the sample sizes were 300, 300, and 600 respectively.

Comment briefly on your results.
Solution
H0: population proportions are all equal
H1: population proportions are not all equal

(a) Observed frequencies:

              A      B      C    Total
Claims       23     28     40       91
No claims    77     72    160      309
Total       100    100    200      400

Expected frequencies under $H_0$:

              A        B        C     Total
Claims      22.75    22.75    45.50      91
No claims   77.25    77.25   154.50     309
Total      100      100      200        400

Values of $f_i - e_i$:

 0.25    5.25   -5.5
-0.25   -5.25    5.5

So:

$$\chi^2 = \frac{0.25^2}{22.75} + \frac{5.25^2}{22.75} + \frac{5.5^2}{45.50} + \frac{0.25^2}{77.25} + \frac{5.25^2}{77.25} + \frac{5.5^2}{154.50}$$

$$= 0.003 + 1.212 + 0.665 + 0.001 + 0.357 + 0.196 = 2.43 \text{ on 2 df.}$$

Here df stands for degrees of freedom.

This is an unremarkable value for $\chi^2_2$. We have no evidence against $H_0$, which can stand. No differences among the population proportions have been detected.

(b) The sample sizes are increased by a factor of 3, but the same percentages with claims as in (a) are assumed. $f_i$, $e_i$ and $(f_i - e_i)$ all increase by a factor of 3, so each component of $\chi^2$, and the resulting value, also increase by a factor of 3. So now $\chi^2 = 7.3$.

p-value $= P(\chi^2_2 > 7.3)$, which is just a bit bigger than 0.025.

There is quite strong evidence against $H_0$: we conclude that the population proportions are not all equal (p-value about 0.03).

Comments: The observed sample proportions 23%, 28%, and 20% are not significantly different when based on sample sizes of 100, 100, and 200, but are significantly different when based on sample sizes which are considerably bigger (300, 300, and 600).
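Both situations can be checked in R by passing the table of counts to `chisq.test`, which accepts a matrix. The object names and row labels below are illustrative.

```r
# Rows: policies with / without a claim; columns: companies A, B, C
tab_a <- matrix(c(23, 77,
                  28, 72,
                  40, 160), nrow = 2,
                dimnames = list(c("claim", "no claim"), c("A", "B", "C")))

res_a <- chisq.test(tab_a)        # situation (a)
res_b <- chisq.test(3 * tab_a)    # situation (b): same proportions, triple the sample sizes

c(res_a$statistic, res_b$statistic)   # about 2.43 and 7.30
c(res_a$p.value, res_b$p.value)
```

Multiplying the table by 3 makes the scaling argument in part (b) explicit: the statistic triples while the degrees of freedom stay at 2.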
Question
In an investigation into the effectiveness of car seat belts, 292 accident victims were classified according to the severity of their injuries and whether they were wearing a seat belt at the time of the accident. The results were as follows:

                 Wearing a seatbelt    Not wearing a seatbelt
Death                     3                      47
Severe injury            78                      32
Minor injury            103                      29

Determine whether the severity of injuries sustained is dependent on whether the victims are wearing a seat belt.
Solution
The hypotheses are:

H0: severity of injuries is independent of wearing a seatbelt
H1: severity of injuries is not independent of wearing a seatbelt

We can calculate the expected frequencies in each category by multiplying the row and column totals, and dividing by the overall total:

Expected freq    Wearing a seatbelt    Not wearing a seatbelt
Death                  31.5                    18.5
Severe injury          69.3                    40.7
Minor injury           83.2                    48.8

For example, $\frac{184 \times 50}{292} = 31.507$. So we can now calculate the value of the chi-square statistic:

$$\chi^2 = \frac{(3 - 31.5)^2}{31.5} + \cdots + \frac{(29 - 48.8)^2}{48.8} = 85.39$$

The number of degrees of freedom is (3 - 1)(2 - 1) = 2.

We are carrying out a one-sided test. Our observed value of the test statistic is far in excess of 10.60, the upper 0.5% point of the $\chi^2_2$ distribution. In fact we could have stopped after working out the first term in the $\chi^2$ value which is already 25.79. So we have sufficient evidence to reject $H_0$ at the 0.5% level. Therefore it is reasonable to conclude that the level of injury is almost certainly dependent on whether the victim is wearing a seatbelt.
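The expected frequencies and the statistic for the seat belt data can be reproduced with the short R sketch below (object names are illustrative). Because R does not round the expected values, the statistic it reports differs from 85.39 in the second decimal place.

```r
injuries <- matrix(c(3, 78, 103,
                     47, 32, 29), ncol = 2,
                   dimnames = list(c("Death", "Severe injury", "Minor injury"),
                                   c("Seatbelt", "No seatbelt")))

res <- chisq.test(injuries)
round(res$expected, 1)   # row total x column total / overall total
res$statistic            # about 85.4
res$parameter            # 2 degrees of freedom
```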
Question
The table below shows the numbers of births during one month at a particular hospital classified according to whether a particular medical characteristic was or wasn't present during childbirth.

Age of mother             ≤ 20   21-25   26-30   31-35   36+   Total
Characteristic present     10      12       9       4      3      38
Characteristic absent       5      51      38      25      5     124
Total                      15      63      47      29      8     162

Assess whether the presence of this characteristic is dependent on the age of the mother.
Solution
The hypotheses are:

H0: the characteristic is independent of the mother's age
H1: the characteristic is not independent of the mother's age

The observed frequencies are:

Age of mother             ≤ 20   21-25   26-30   31-35   36+   Total
Characteristic present     10      12       9       4      3      38
Characteristic absent       5      51      38      25      5     124
Total                      15      63      47      29      8     162
We can calculate the expected frequencies in each category by multiplying the row and column totals, and dividing by 162:

Age of mother             ≤ 20    21-25   26-30   31-35   36+    Total
Characteristic present     3.52   14.78   11.02    6.80   1.88      38
Characteristic absent     11.48   48.22   35.98   22.20   6.12     124
Total                     15      63      47      29      8        162

In contingency tables the totals are always the same in the observed and expected tables. This means that in a table with only 2 rows, if we calculate the entries in one of the rows first, we can work out the entries in the other row by subtraction.

Two cells out of 10 cells have expected frequencies less than 5. Since this is not more than 20% we can use the table as it is.

So we can now calculate the value of the chi-square statistic:

$$\chi^2 = \frac{(10 - 3.5)^2}{3.5} + \cdots + \frac{(5 - 6.1)^2}{6.1} = 19.2$$

The number of degrees of freedom is (5 - 1)(2 - 1) = 4.

We are carrying out a one-sided test. Our observed value of the test statistic exceeds 18.47, the upper 0.1% point of the $\chi^2_4$ distribution. So we have sufficient evidence to reject $H_0$ at the 0.1% level. Therefore it is reasonable to conclude that the characteristic is dependent on the mother's age.
If we decided to combine cells because of the expected values being less than 5, we could do this by combining adjacent groups as follows:

Age of mother             ≤ 25   26-30   31+   Total
Characteristic present     22       9      7      38
Characteristic absent      56      38     30     124
Total                      78      47     37     162
The expected values are:

Age of mother             ≤ 25    26-30    31+    Total
Characteristic present    18.30   11.02    8.68      38
Characteristic absent     59.70   35.98   28.32     124
Total                     78      47      37        162

We can now calculate the value of the chi-square statistic:

$$\chi^2 = \frac{(22 - 18.30)^2}{18.30} + \cdots + \frac{(30 - 28.32)^2}{28.32} = 1.89$$

The number of degrees of freedom is (3 - 1)(2 - 1) = 2.

We are carrying out a one-sided test. Our observed value of the test statistic does not exceed 5.991, the upper 5% point of the $\chi^2_2$ distribution. So we have insufficient evidence to reject $H_0$ at the 5% level. Therefore it is reasonable to conclude that the characteristic is not dependent on the mother's age.
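Both versions of the test, before and after pooling, can be compared in R with the sketch below. The object names are illustrative, and `suppressWarnings` is used only to hide the message R gives about small expected values in the full table.

```r
births <- matrix(c(10, 5, 12, 51, 9, 38, 4, 25, 3, 5), nrow = 2,
                 dimnames = list(c("present", "absent"),
                                 c("<=20", "21-25", "26-30", "31-35", "36+")))

# Full 2 x 5 table: two expected values are below 5
full <- suppressWarnings(chisq.test(births))

# Pool adjacent age groups: (<=20, 21-25), 26-30, (31-35, 36+)
pooled <- cbind("<=25"  = births[, 1] + births[, 2],
                "26-30" = births[, 3],
                "31+"   = births[, 4] + births[, 5])
comb <- chisq.test(pooled)

c(full$statistic, comb$statistic)   # about 19.2 and 1.89
```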
The results are so different because of the effect of the small expected values.

R can carry out a $\chi^2$ contingency table test using chisq.test(<table>). Since this automatically applies a continuity correction for $2 \times 2$ tables we would need to set the argument correct=FALSE if we wished to prevent this.

8.3 Fisher's exact test
A non-parametric permutation approach to contingency tables was devised more than 80 years ago by the great statistician R.A. Fisher. Consider two categorical variables X and Y, each with two categories, $X_1$, $X_2$, $Y_1$ and $Y_2$. Suppose we have data for n observations, and that of these

$n_{X_1}$ are in category $X_1$ on variable X,
$n_{X_2}$ are in category $X_2$ on variable X,
$n_{Y_1}$ are in category $Y_1$ on variable Y, and
$n_{Y_2}$ are in category $Y_2$ on variable Y.
These data can be represented in a $2 \times 2$ contingency table as shown below.

                            Variable X
                         X_1         X_2
Variable Y    Y_1     (shaded)    (shaded)    n_Y1
              Y_2     (shaded)    (shaded)    n_Y2
                        n_X1        n_X2        n

Fisher proposed testing the association between the two categorical variables by working out the probability of each possible permutation of values in the shaded cells consistent with the marginal totals $n_{X_1}$, $n_{X_2}$, $n_{Y_1}$ and $n_{Y_2}$. Then, under the null hypothesis of no association, the distribution of ways of allocating the data to the four shaded cells is hypergeometric.

This means that, if the number of individuals which have the value $X_1$ on variable X and the value $Y_1$ on variable Y is $n_{X_1Y_1}$, then the probability of obtaining this number is given by:

$$P(n_{X_1Y_1}) = \frac{\binom{n_{X_1}}{n_{X_1Y_1}}\binom{n_{X_2}}{n_{Y_1} - n_{X_1Y_1}}}{\binom{n}{n_{Y_1}}} \quad \text{for } n_{X_1Y_1} = 0, 1, \ldots, \min(n_{X_1}, n_{Y_1})$$

The stronger the association between X and Y the more heavily the observations should be concentrated in either cells $\{X_1, Y_1\}$ and $\{X_2, Y_2\}$ or $\{X_1, Y_2\}$ and $\{X_2, Y_1\}$ (ie in two opposite corners of the contingency table).
Consider a sample of 10 people, 6 men and 4 women. Of these 3 are colour blind:

           Colour blind    Not    Total
Male             2           4       6
Female           1           3       4
Total            3           7      10

Using the formula above, the probability of observing 2 colour-blind men from this sample is:

$$\frac{\binom{3}{2}\binom{7}{4}}{\binom{10}{6}} = \frac{3 \times 35}{210} = \frac{1}{2}$$

$\binom{10}{6}$ is the total number of ways of choosing the 6 men from the 10 people.

$\binom{3}{2}\binom{7}{4}$ is the number of ways of choosing 2 men from the 3 colour blind people and the 4 men from the 7 non-colour blind people.

Hence this expression gives us the probability of observing 2 colour blind men from this group of 10 people.
A test can then be constructed by considering the probability of getting a distribution with the observed or a more extreme concentration of observations in two opposite corners.

The only four outcomes which produce a $2 \times 2$ table with the same row and column totals are:

3  3      2  4      1  5      0  6
0  4      1  3      2  2      3  1

Using the formula we can calculate the probabilities of each of these outcomes:

$$\frac{\binom{3}{3}\binom{7}{3}}{\binom{10}{6}}, \quad \frac{\binom{3}{2}\binom{7}{4}}{\binom{10}{6}}, \quad \frac{\binom{3}{1}\binom{7}{5}}{\binom{10}{6}}, \quad \frac{\binom{3}{0}\binom{7}{6}}{\binom{10}{6}}$$

These are $\frac{1}{6}$, $\frac{1}{2}$, $\frac{3}{10}$ and $\frac{1}{30}$ respectively.

For a one-tailed test it suffices to consider only distributions which are extreme in the same direction as the observed table, whereas for a two-tailed test distributions which are extreme in the opposite direction should be considered (this can cause complications with small tables as the sampling distribution is not symmetrical).

At the 5% level of significance we should reject the null hypothesis of no association if the probability of getting a distribution which is the same or more extreme than that observed is less than 0.05.

In our example, the probability of observing this result or more extreme (ie 2 or more men with colour blindness) is $\frac{1}{2} + \frac{1}{6} = \frac{2}{3}$. This is not less than 5% (it is actually very likely) and so based on these results we conclude that gender and colour blindness are independent.
On the other hand, if we were to find that our result was rare, we would conclude that the result is not just due to chance: there is some connection between the variables.

Fisher's test was extended to a general $R \times C$ table by Freeman and Halton.

We chose an example with a very small sample as otherwise there would be many combinations which will be time consuming on a piece of paper. However, this test is no problem for a computer.

R can carry out Fisher's Exact Test using the command fisher.test(<table>).
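As a check on the colour blindness example, the sketch below computes the hypergeometric probabilities directly with `dhyper` and compares the one-sided tail with the p-value from `fisher.test`. The object names are illustrative, and the one-sided alternative "greater" is chosen here to match the "2 or more colour-blind men" tail used in the text.

```r
colour <- matrix(c(2, 1,
                   4, 3), nrow = 2,
                 dimnames = list(c("Male", "Female"), c("Colour blind", "Not")))

# Probabilities of 0, 1, 2, 3 colour-blind men when 6 of the 10 people are men
probs <- dhyper(0:3, m = 3, n = 7, k = 6)   # 1/30, 3/10, 1/2, 1/6

# One-sided p-value: 2 or more colour-blind men
p_manual <- sum(probs[3:4])
p_fisher <- fisher.test(colour, alternative = "greater")$p.value
c(p_manual, p_fisher)                       # both equal 2/3
```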
Question
A certain company employs both graduates and non-graduates. A small sample of employees are entered for a certain test, with the following results. Of the four graduates taking the test, all passed. Of the eight non-graduates taking the test, five passed. Using Fisher's exact test, assess whether graduates are more likely to pass the test than non-graduates.
Solution
Given that we had nine passes, the number of ways of choosing four graduates to pass is $\binom{9}{4}$. Given that we had three fails, the number of ways of choosing no graduates to fail is $\binom{3}{0}$. The total number of ways of choosing four graduates out of 12 employees is $\binom{12}{4}$.
So the probability of obtaining four graduate passes is:
$\frac{\binom{9}{4}\binom{3}{0}}{\binom{12}{4}} = \frac{14}{55} = 0.2545$
Since we cannot obtain more than four graduate passes when we only have four graduates, this is the most extreme result possible, and the total probability of obtaining as extreme a result as this is 0.2545.
Since this is not less than 5%, we have insufficient evidence to conclude that graduates are more likely to pass than non-graduates.
We can see that in this case it will never be possible to obtain a significant result based on the small sample numbers we have here. Fisher's exact test needs much bigger samples for it to be usable to obtain satisfactory statistical results.
Chapter 10 Summary
Statistical tests can be used to test assertions about populations.
The process of statistical testing involves setting up a null hypothesis and an alternative hypothesis, calculating a test statistic and using this to determine a p-value.
The probability of a Type I error is the probability of rejecting $H_0$ when it is true. This is also called the size (or level) of the test. The probability of a Type II error is the probability of not rejecting $H_0$ when it is false. The power of a test is the probability of rejecting $H_0$ when it is false.
Errors can also occur in the context of binary classifications, for example when an individual is classified as testing positive or negative for a particular disease. The null hypothesis is that the individual does not have the disease. A Type I error is a false positive and a Type II error is a false negative. The sensitivity of this test is the true positive rate (which is $1 - P(\text{Type II error})$ = power of the test). The specificity of this test is the true negative rate (which is $1 - P(\text{Type I error})$).
The best test can be found using the likelihood ratio criterion. This leads to the tests detailed overleaf.
The test for two normal means (unknown variances) requires that the variances are the same and uses the pooled sample variance:
$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
$\chi^2$ tests can be carried out to test for goodness of fit or to test whether two factors are independent (using contingency tables). The statistic is $\sum \frac{(O_i - E_i)^2}{E_i}$.
To find the number of degrees of freedom for the goodness of fit test, take the number of cells, subtract 1 if the total of the observed figures has been used in the calculation of the expected numbers (which is usually the case), and then subtract the number of parameters estimated.
To find the number of degrees of freedom for a contingency table calculate $(r - 1)(c - 1)$. If the expected numbers in some cells are small, these should be grouped. One degree of
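One-sample normal distribution
$\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0,1)$ ($\sigma^2$ known)
$\frac{\bar{X} - \mu_0}{S/\sqrt{n}} \sim t_{n-1}$ ($\sigma^2$ unknown)
$\frac{(n-1)S^2}{\sigma_0^2} \sim \chi^2_{n-1}$

Two-sample normal distribution
$\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0,1)$ ($\sigma_1^2, \sigma_2^2$ known)
$\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{n_1+n_2-2}$ ($\sigma_1^2, \sigma_2^2$ unknown)
$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{n_1-1,\,n_2-1}$

One-sample binomial
$\frac{\hat{p} - p_0}{\sqrt{p_0 q_0/n}} \approx N(0,1)$ or $\frac{X - np_0}{\sqrt{np_0 q_0}} \approx N(0,1)$ with continuity correction

Two-sample binomial
$\frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{\hat{p}\hat{q}}{n_1} + \frac{\hat{p}\hat{q}}{n_2}}} \approx N(0,1)$, where $\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$ is the overall sample proportion

One-sample Poisson
$\frac{\bar{X} - \lambda_0}{\sqrt{\lambda_0/n}} \approx N(0,1)$ or $\frac{\sum X - n\lambda_0}{\sqrt{n\lambda_0}} \approx N(0,1)$ with continuity correction

Two-sample Poisson
$\frac{(\hat{\lambda}_1 - \hat{\lambda}_2) - (\lambda_1 - \lambda_2)}{\sqrt{\frac{\hat{\lambda}}{n_1} + \frac{\hat{\lambda}}{n_2}}} \approx N(0,1)$, where $\hat{\lambda} = \frac{n_1\hat{\lambda}_1 + n_2\hat{\lambda}_2}{n_1 + n_2}$ is the overall sample mean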
Chapter 10 Practice Questions
10.1
A statistical test is used to determine whether or not an anti-smoking campaign carried out 5 years ago has led to a significant reduction in the mean number of smoking related illnesses. The probability value of the test statistic is 7%.
Determine the conclusion for a test of size:
(i) 10%
(ii) 5%.
10.2
A random sample, $x_1, \ldots, x_{10}$, from a normal population gives the following values:
9.5   18.2   4.69   3.76   14.2   17.13   15.69   13.9   15.7   7.42
$\sum x_i = 120.19$, $\sum x_i^2 = 1,693.6331$
(i) Test at the 5% level whether the mean of the whole population is 15 if the variance is:
(a) unknown
(b) 20.
(ii) Test at the 5% level whether the population variance is 20.
10.3
A professional gambler has said: "Flipping a coin into the air is fair, since the coin rotates about a horizontal axis, and it is equally likely to be either way up when it first clips the ground. So a flicked coin is equally likely to land showing heads or tails. However, spinning a coin on a table is not fair, since the coin rotates about a vertical axis, and there is a systematic bias causing it to tilt towards the side where the embossed pattern is heavier. In fact, when a new coin is spun, it is more than twice as likely to land showing tails as it is to land showing heads."
After hearing this, an experiment was carried out, spinning a new coin 25 times on a polished table; the coin showed tails 18 times.
Comment on whether the results of the experiment support the gambler's claims about the probabilities when a coin is spun.
10.4
The sample variances of two independent samples from normal populations A and B, which have the same population variance, are $s_A^2 = 12.4$ and $s_B^2 = 25.8$. The sample sizes are $n_A = 10$ and $n_B = 5$ and the sample means are found to differ by 4.5.
Test whether the population means are equal.
10.5
Two populations X and Y are known to have the same variance, but the precise distributions are not known. A sample of 5 values from population X and 10 values from population Y had sample variances of $s_X^2 = 47.0$ and $s_Y^2 = 12.6$.
Carry out a statistical test based on the F distribution to assess whether both populations can be considered to be normally distributed.
10.6
Determine the form of the best test of $H_0: \mu = \mu_0$ vs $H_1: \mu = \mu_1$, where $\mu_1 > \mu_0$, assuming the distribution of the underlying population is $N(\mu, \sigma^2)$, based on a sample of size n.
10.7
A blood test has been used on 1,000 people to detect whether they have a particular condition. Of the 427 people who had a positive result, 369 of them had the condition. Of the 573 people who had a negative result, 15 of them had the condition.
(i) (a) Calculate the sensitivity of the blood test.
(b) Calculate the specificity of the blood test.
A second blood test is used on 1,000 people which has a sensitivity of 80% and a specificity of 60%. For this blood test, 544 people had a positive result.
(ii) (a) Calculate the number of true positives.
(b) Calculate the number of false positives.
10.8
Exam style
The lengths of a random sample of 12 worms of a particular species have a mean of 8.54 cm and a standard deviation of 2.97 cm. Let $\mu$ denote the mean length of a worm of this species. It is required to test:
$H_0: \mu = 7$ cm vs $H_1: \mu \neq 7$ cm
The lengths of worms are assumed to be normally distributed.
Calculate the probability-value of these sample results. [3]
10.9
Exam style
A general insurance company is debating introducing a new screening programme to reduce the claim amounts that it needs to pay out. The programme consists of a much more detailed application form that takes longer for the new client department to process. The screening is applied to a test group of clients as a trial whilst other clients continue to fill in the old application form. It can be assumed that claim payments follow a normal distribution.
The claim payments data for samples of the two groups of clients are (in £100 per year):
Without screening   24.5   21.7   45.2   15.9   23.7   34.2   29.3   21.1   23.5   28.3
With screening      22.4   21.2   36.3   15.7   21.5    7.3   12.8   21.2   23.9   18.4
(i) Test the hypothesis that the new screening programme reduces the mean claim amount. [5]
(ii) Test the assumption of equal variances required in part (i). [3]
[Total 8]
10.10
Exam style
An environmentalist is investigating the possibility that oestrogenic chemicals are leading to a particular type of deformity in a species of amphibians living in a lake. The usual proportion of deformed animals living in unpolluted water is 0.5%. In a sample of 1,000 animals examined, 15 were found to have deformities.
(i) Test whether this provides evidence of the presence of harmful chemicals in the lake. [3]
Following an extensive campaign to reduce these chemicals in the lake a further sample of 800 animals was examined and 10 were found to have deformities.
(ii) Test whether there has been a significant reduction in the proportion of deformed animals in the lake. [3]
[Total 6]
10.11
Exam style
The total claim amounts (in £m) for home and car insurance over a year for similar sized companies are collected by an independent advisor:
Home   13.3   19.2   12.9   15.8   17.6
Car    14.3   21.0   12.8   17.4   22.8
(i) Test whether the mean home and car claims are equal. State clearly your probability value. [5]
It was subsequently discovered that the results were actually 5 consecutive years from the same company.
(ii) Carry out an appropriate test of whether the mean home and car claims are equal. [3]
[Total 8]
10.12
Exam style
A random variable X is believed to have probability density function, $f(x)$, where:
$f(x) = 3\lambda^3(\lambda + x)^{-4}$, $x > 0$
In order to test the null hypothesis $\lambda = 50$ against the alternative hypothesis $\lambda = 60$, a single value is observed. If this value is greater than 93.5, $H_0$ is rejected.
(i) Calculate the size of the test. [2]
(ii) Calculate the power of the test. [2]
[Total 4]
10.13
Exam style
In an extrasensory perception experiment carried out in a live television interview, the interviewee who claimed to have extrasensory powers was required to identify the pattern on each of 10 cards, which had been randomly assigned with one of five different patterns. The cards were visible only to the audience who were asked to "transmit" the patterns to the interviewee.
When the interviewee failed to identify any of the cards correctly, she claimed that this was clear proof of the existence of ESP, since there was a strong mind in the audience who was willing her to get the answers wrong.
(i) (a) State the hypotheses implied by the interviewee's conclusion and carry out a 5% test on this basis.
(b) Comment on your answer. [3]
(ii) (a) State precisely the hypotheses that the interviewer could have specified before the experiment to prevent the interviewee from cheating in this way.
(b) Determine the number of cards that would have to be identified correctly to demonstrate the existence of ESP at the 5% level. [2]
[Total 5]
10.14
Exam style
An insurer believes that the distribution of the number of claims on a particular type of policy is binomial with parameters $n = 3$ and $p$. A random sample of the number of claims on 153 policies revealed the following results:
Number of claims      0    1    2   3
Number of policies   60   75   16   2
(i) Derive the maximum likelihood estimate of $p$. [4]
(ii) Carry out a goodness-of-fit test for the binomial model specified in part (i) for the number of claims on each policy. [5]
[Total 9]
10.15
Exam style
In an investigation into a patient's red corpuscle count, the number of such corpuscles appearing in each of 400 cells of a haemocytometer was counted. The results were as follows:
No. of red blood corpuscles    0    1    2    3    4    5    6   7   8
No. of cells                  40   66   93   94   62   25   14   5   1
It is thought that a Poisson distribution with mean $\lambda$ provides an appropriate model for this situation.
(i) (a) Estimate $\lambda$.
(b) Test the fit of the Poisson model. [8]
For a healthy person, the mean count per cell is known to be equal to 3. For a patient with certain types of anaemia, the number of red blood corpuscles is known to be lower than this.
(ii) Test whether this patient has one of these types of anaemia. [3]
[Total 11]
10.16
Exam style
In a recent study investigating a possible genetic link between individuals' susceptibility to developing symptoms of AIDS, 549 men who had been diagnosed HIV positive were classified according to whether they carried two particular alleles (DRB1*0702 and DQA1*0201). The results were as follows:
                    Condition of individual
                    Free of symptoms   Early symptoms   Suffering from AIDS   Total
Alleles present            24                 7                 17              48
Alleles absent             98                93                310             501
Total                     122               100                327             549
Test whether there is an association between the presence of the alleles and the classification into the three AIDS statuses using these results. [5]
10.17
Exam style
Insurance claims (in £) arriving at an office over the last month have been analysed. The results are as follows:
Claim size, c    0 ≤ c < 500   500 ≤ c < 1,000   1,000 ≤ c < 2,500   over 2,500
No. of claims        75              51                 22                5
(i) Assuming that the maximum claim amount is £10,000:
(a) calculate the sample mean of the data
(b) test at the 5% level whether an exponential distribution with parameter $\lambda$ is an appropriate distribution for the claim sizes. You should estimate the value of $\lambda$ using the method of moments. [6]
An actuary decides to investigate whether claim sizes vary according to the postcode of residence of the claimant. She splits the data into the three different postcodes observed. The results for the first two postcodes are given below:
Postcode 1:
Claim size, c    0 ≤ c < 500   500 ≤ c < 1,000   1,000 ≤ c < 2,500   over 2,500
No. of claims        23              14                  7                3
Postcode 2:
Claim size, c    0 ≤ c < 500   500 ≤ c < 1,000   1,000 ≤ c < 2,500   over 2,500
No. of claims        30              16                 11                1
(ii) Test at the 5% level whether claim sizes are independent of the postcodes. [8]
[Total 14]
10.18
Exam style
A politician has said: "A recent study in a particular area showed that 25% of the 400 teenagers who were living in single-parent families had been in trouble with the police, compared with only 20% of the 1,200 teenagers who were living in two-parent families. Our aim is to reduce the number of single-parent families in order to reduce the crime rates during the next decade."
(i) Carry out a contingency table test at the 5% significance level to assess whether there is a significant association between living in a single-parent family and getting into trouble with the police. [5]
(ii) Comment on the politician's statement. [1]
[Total 6]
10.19
Exam style
A certain species of plant produces flowers which are either red, white or pink. It also produces leaves which may be either plain or variegated. For a sample of 500 plants, the distribution of flower colour and leaf type was:
              Red   White   Pink
Plain          97      42     77
Variegated    105     148     31
(i) Test whether these results indicate any association between flower colour and leaf type. [6]
A genetic model suggests that the proportions of each combination should be as follows:
              Red   White    Pink
Plain          q     q/2     (1 - 3q)/2
Variegated     q     3q/2    (1 - 5q)/2
where $q$ ($0 < q < 1/5$) is an unknown parameter.
(ii) (a) Show that the maximum likelihood estimate for $q$ is 0.181.
(b) Test whether this genetic model fits the data well. [12]
(iii) Comment briefly on your conclusions. [3]
[Total 21]
10.20
Exam style
A particular
area in a town suffers a high burglary rate. A sample of 100 streets is taken, and in each of the sampled streets, a sample of six similar houses is taken. The table below shows the number of sampled houses which have had burglaries during the last six months.
No. of houses burgled, x    0    1    2   3   4   5   6
No. of streets, f          39   38   18   4   0   1   0
(i) (a) State any assumptions needed to justify the use of a binomial model for the number of houses per street which have been burgled during the last six months.
(b) Derive the maximum likelihood estimate of $p$, the probability that a house of the type sampled has been burgled during the last six months.
(c) Determine the probabilities for the binomial model using your estimate of $p$.
(d) Comment on the fit without doing a formal test. [10]
An insurance company works on the basis that the probability of a house being burgled over a six-month period is 0.18.
(ii) Carry out a test to investigate whether the binomial model with this value of $p$ provides a good fit for the data. [7]
[Total 17]
10.21
Exam style
It is desired to investigate the level of premium charged by two companies for contents policies for houses in a certain area. Random samples of 10 houses insured by Company A are compared with 10 similar houses insured by Company B. The premiums charged in each case are as follows:
Company A   117   154   166   189   190   202   233   263   289   331
Company B   142   160   166   188   221   241   276   279   284   302
The line plots below show the sample values for the two companies:
[Line plots for Company A and Company B, each on an axis running from 100 to 350]
(i) Comment briefly on the validity of the assumptions required for a two-sample t test for the premiums of these two companies using the plots. [2]
For these data: $\sum A = 2,134$, $\sum A^2 = 494,126$, $\sum B = 2,259$, $\sum B^2 = 541,463$.
(ii) Carry out a formal test to check that it is appropriate to apply a two-sample t test to these data, assuming that the premiums are normally distributed. [4]
(iii) Test whether the level of premiums charged by Company B was significantly higher than that charged by Company A, stating the p value and conclusion clearly. [3]
(iv) (a) Calculate a 95% confidence interval for the difference between the proportions of premiums of each company that are in excess of £200.
(b) Comment briefly on your result to part (iv)(a). [3]
The average premium charged by Company A in the previous year was £170.
(v) Test whether Company A appears to have increased its premiums since the previous year. [3]
[Total 15]
Chapter 10 Solutions
10.1
The hypotheses are:
$H_0$: The campaign has not led to a reduction in smoking related illnesses.
$H_1$: The campaign has led to a reduction in smoking related illnesses.
(i) Conclusion for a test of size 10%
Since the calculated probability value (7%) is less than the size of the test (10%), we have sufficient evidence at the 10% level to reject $H_0$. Therefore the campaign has led to a reduction in the mean number of smoking related illnesses at the 10% level.
(ii) Conclusion for a test of size 5%
Since the calculated probability value (7%) is greater than the size of the test (5%), we have insufficient evidence at the 5% level to reject $H_0$. Therefore the campaign has not led to a reduction in the mean number of smoking related illnesses at the 5% level.
10.2
(i)(a) Test mean when population variance unknown
We are testing:
$H_0: \mu = 15$ vs $H_1: \mu \neq 15$
Since the variance is unknown, the test statistic is $\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$. From the data, we have:
$\bar{x} = \frac{120.19}{10} = 12.019$
$s^2 = \frac{1}{9}\left(1,693.6331 - 10 \times 12.019^2\right) = 27.674$
This gives a statistic of:
$t = \frac{12.019 - 15}{\sqrt{27.674/10}} = -1.792$
This is greater than the $t_9$ critical value of $-2.262$ so there is insufficient evidence at the 5% level to reject $H_0$. Therefore it is reasonable to conclude that $\mu = 15$.
Alternatively, using probability values, we have $P(t_9 < -1.792) = 0.055$. This test is two-sided, so the probability of obtaining a value at least as extreme as the one actually obtained is $2 \times 0.055 = 0.11$. This is greater than 0.05 so we have insufficient evidence to reject $H_0$ at the 5% level.
(i)(b) Test mean when population variance known
We are testing:
$H_0: \mu = 15$ vs $H_1: \mu \neq 15$
Since the variance is known we can use $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$. This gives:
$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{12.019 - 15}{\sqrt{20/10}} = -2.108$
This is less than the critical value of $-1.96$ so there is sufficient evidence at the 5% level to reject $H_0$. Therefore it is reasonable to conclude that $\mu \neq 15$.
Alternatively, using probability values, we have $P(Z < -2.108) = 0.0175$. This test is two-sided, so the probability of obtaining a value at least as extreme as the one actually obtained is $2 \times 0.0175 = 0.035$, which is less than 0.05. So we have sufficient evidence to reject $H_0$ at the 5% level.
(ii) Test variance
We are testing:
$H_0: \sigma^2 = 20$ vs $H_1: \sigma^2 \neq 20$
We know that $\frac{(n-1)S^2}{\sigma^2}$ has a $\chi^2_{n-1}$ distribution. The observed value of the test statistic is:
$\frac{9 \times 27.674}{20} = 12.45$
The critical values of $\chi^2_9$ are 2.700 and 19.02 for a two-sided test. So we have insufficient evidence at the 5% level to reject $H_0$. Therefore it is reasonable to conclude that $\sigma^2 = 20$.
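As a rough check on the arithmetic in Solution 10.2, the three test statistics can be recomputed from the summary figures. This is only a sketch: the variable names are illustrative, and the critical values still have to be looked up in the Tables.

```python
from math import sqrt

n = 10
sum_x = 120.19        # sum of the observations
sum_x2 = 1693.6331    # sum of the squared observations
mu_0 = 15.0           # hypothesised mean
sigma2_0 = 20.0       # known variance in (i)(b), hypothesised variance in (ii)

x_bar = sum_x / n
s2 = (sum_x2 - n * x_bar ** 2) / (n - 1)     # sample variance

t_stat = (x_bar - mu_0) / sqrt(s2 / n)           # (i)(a): compare with t(9)
z_stat = (x_bar - mu_0) / sqrt(sigma2_0 / n)     # (i)(b): compare with N(0,1)
chi2_stat = (n - 1) * s2 / sigma2_0              # (ii): compare with chi-square(9)

print(round(t_stat, 3), round(z_stat, 3), round(chi2_stat, 2))
```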
10.3
To test whether tails is more than twice as likely, we use the hypotheses:
$H_0: p = \frac{2}{3}$ vs $H_1: p > \frac{2}{3}$
Let X be the number of tails obtained in the experiment, then:
$X \sim Bin(25, p) \approx N(25p, 25pq) \Rightarrow \frac{X - 25p}{\sqrt{25pq}} \approx N(0,1)$
Under $H_0$, the statistic with continuity correction is:
$z = \frac{17\frac{1}{2} - 16\frac{2}{3}}{\sqrt{5\frac{5}{9}}} = 0.354$
This is less than the critical value of 1.645, so there is insufficient evidence at the 5% level to reject $H_0$. Therefore it is reasonable to conclude that $p = \frac{2}{3}$, ie the experiment does not provide enough evidence to show that tails is more than twice as likely as heads.
Alternatively, using probability values, we have $P(Z > 0.354) = 0.362$, which is greater than 0.05. So we have insufficient evidence to reject $H_0$ at the 5% level.
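The continuity-corrected calculation in Solution 10.3 can be sketched as follows. The standard normal tail is obtained from `math.erfc`, a standard-library route chosen here instead of the Tables; the variable names are illustrative.

```python
from math import erfc, sqrt

n = 25            # number of spins
p0 = 2 / 3        # probability of tails under H0
observed = 18     # tails observed

mean = n * p0
sd = sqrt(n * p0 * (1 - p0))

# P(X >= 18) is approximated by P(N > 17.5): the continuity correction.
z = (observed - 0.5 - mean) / sd

# Upper-tail probability of the standard normal distribution.
p_value = 0.5 * erfc(z / sqrt(2))

print(round(z, 3), round(p_value, 3))
```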
10.4
We are testing:
$H_0: \mu_A = \mu_B$ vs $H_1: \mu_A \neq \mu_B$
The test statistic is:
$\frac{(\bar{X}_A - \bar{X}_B) - (\mu_A - \mu_B)}{S_P\sqrt{\frac{1}{n_A} + \frac{1}{n_B}}} \sim t_{n_A+n_B-2}$ where $S_P^2 = \frac{(n_A - 1)S_A^2 + (n_B - 1)S_B^2}{n_A + n_B - 2}$
The observed value of the pooled variance is:
$s_P^2 = \frac{9 \times 12.4 + 4 \times 25.8}{13} = 16.52$
So the value of the test statistic is:
$\frac{4.5 - 0}{\sqrt{16.52\left(\frac{1}{10} + \frac{1}{5}\right)}} = 2.021$
This lies between the $t_{13}$ critical values of $\pm 2.160$, so we have insufficient evidence at the 5% level to reject $H_0$. Therefore we conclude that $\mu_A = \mu_B$.
Alternatively, using probability values, we have $P(t_{13} > 2.021) = 0.034$. This test is two-sided, so the probability of obtaining a value at least as extreme as the one actually obtained is $2 \times 0.034 = 0.068$. This is greater than 0.05, so we have insufficient evidence to reject $H_0$ at the 5% level.
10.5
We are testing:
$H_0$: The populations both have normal distributions
vs
$H_1$: At least one of the populations does not have a normal distribution.
If $H_0$ is true, then $\frac{S_X^2/\sigma_X^2}{S_Y^2/\sigma_Y^2} \sim F_{4,9}$.
Since we know that $\sigma_X^2 = \sigma_Y^2$, this test statistic is just $S_X^2/S_Y^2$, which has an observed value of $47.0/12.6 = 3.730$.
The 5% critical values for an $F_{4,9}$ distribution are 0.1123 and 4.718. Since 3.730 lies between these, we have insufficient evidence at the 5% level to reject $H_0$. Therefore we conclude that the populations are both normal.
This is a slightly unusual application of the F test, which is usually used to test variances for populations that are assumed to have a normal distribution.
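10.6
The hypotheses are:
$H_0: \mu = \mu_0$ vs $H_1: \mu = \mu_1$ (where $\mu_1 > \mu_0$)
Here, we can use the likelihood ratio criterion, which says that we should reject $H_0$ if:
$\frac{\text{Likelihood under } H_0}{\text{Likelihood under } H_1} < \text{critical value}$
Since the populations are normal, this is:
$\frac{\prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x_i - \mu_0}{\sigma}\right)^2}}{\prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x_i - \mu_1}{\sigma}\right)^2}} < \text{constant}$
Cancelling the constants reduces this to:
$\frac{e^{-\frac{1}{2\sigma^2}\sum(x_i - \mu_0)^2}}{e^{-\frac{1}{2\sigma^2}\sum(x_i - \mu_1)^2}} < \text{constant}$
Taking logs:
$-\frac{1}{2\sigma^2}\sum(x_i - \mu_0)^2 + \frac{1}{2\sigma^2}\sum(x_i - \mu_1)^2 < \text{constant}$
Multiplying through by $2\sigma^2$ and expanding the squares:
$-\sum(x_i^2 - 2\mu_0 x_i + \mu_0^2) + \sum(x_i^2 - 2\mu_1 x_i + \mu_1^2) < \text{constant}$
Simplifying this gives $(\mu_0 - \mu_1)\sum x_i < \text{constant}$.
Since $\mu_1 > \mu_0$, we have to reverse the inequality when we divide through by the negative $\mu_0 - \mu_1$, and the test criterion reduces to:
$\bar{x} > \text{constant}$
So the best test requires us to reject $H_0$ if the sample mean exceeds a specified critical value.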
10.7
(i) From the question, we have the following outcomes from the blood test:
                                  Blood test result
                                  Positive   Negative
Patient actually        Yes         369         15
has condition           No           58        558
Total                               427        573
(a) Sensitivity $= \frac{\text{Number of true positives}}{\text{Total number of people with the condition}} = \frac{369}{369 + 15} = 96.1\%$
(b) Specificity $= \frac{\text{Number of true negatives}}{\text{Total number of people without the condition}} = \frac{558}{58 + 558} = 90.6\%$
(ii) From the question, we have the following outcomes from the blood test:
                                  Blood test result
                                  Positive              Negative
Patient actually        Yes       True positive (TP)    False negative (FN)
has condition           No        False positive (FP)   True negative (TN)
Total                             544                   456                   1000
Rearranging the sensitivity to express the false negatives in terms of true positives:
Sensitivity $= \frac{TP}{TP + FN} = 80\% \Rightarrow TP = 0.8(TP + FN) \Rightarrow FN = \frac{1}{4}TP$
Rearranging the specificity to express the true negatives in terms of false positives:
Specificity $= \frac{TN}{FP + TN} = 60\% \Rightarrow TN = 0.6(FP + TN) \Rightarrow TN = \frac{3}{2}FP$
Using the given number of positive test results, we get:
$TP + FP = 544$   (1)
Working with the number of negative test results and expressing in terms of numbers of true and false positives:
$FN + TN = 456 \Rightarrow \frac{1}{4}TP + \frac{3}{2}FP = 456 \Rightarrow TP + 6FP = 1,824$   (2)
(b) Subtracting the first equation from the second gives:
$5FP = 1,280 \Rightarrow FP = 256$
(a) Substituting the false positives into the first equation gives $TP = 288$.
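The calculations in Solution 10.7 can be sketched in code as a check. The helper `split_positives` is hypothetical: it solves the same pair of simultaneous equations, (1) and (2), for given sensitivity, specificity and numbers of positive and negative results, using exact fractions.

```python
from fractions import Fraction

# Part (i): counts from the first blood test.
tp, fn, fp, tn = 369, 15, 58, 558
sensitivity = Fraction(tp, tp + fn)   # true positive rate
specificity = Fraction(tn, tn + fp)   # true negative rate


def split_positives(sens, spec, positives, negatives):
    """Return (TP, FP) given the sensitivity, the specificity and the
    numbers of positive and negative test results."""
    a = (1 - sens) / sens      # FN = a * TP
    b = spec / (1 - spec)      # TN = b * FP
    # Solve TP + FP = positives and a*TP + b*FP = negatives.
    false_pos = (negatives - a * positives) / (b - a)
    true_pos = positives - false_pos
    return true_pos, false_pos


# Part (ii): sensitivity 80%, specificity 60%, 544 positives out of 1,000.
tp2, fp2 = split_positives(Fraction(4, 5), Fraction(3, 5), 544, 456)

print(float(sensitivity), float(specificity), tp2, fp2)
```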
10.8
We are testing:
$H_0: \mu = 7$ cm vs $H_1: \mu \neq 7$ cm ($\sigma^2$ unknown)
Under $H_0$, the statistic $\frac{\bar{X} - \mu}{S/\sqrt{n}}$ has a $t_{11}$ distribution.
So the value of our test statistic is:
$\frac{8.54 - 7}{2.97/\sqrt{12}} = 1.796$ [1]
Comparing this with the tables of the $t_{11}$ distribution, we find that $P(t_{11} > 1.796) = 5\%$. [1]
Hence, we have a probability value of $2 \times 5\% = 10\%$, as the test is two-sided. [1]
10.9
(i) Test whether new screening programme reduces mean claim amount
We are testing:
$H_0: \mu_1 = \mu_2$ vs $H_1: \mu_2 < \mu_1$ [½]
where subscript 1 refers to "without screening".
The test statistic is:
$\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_P\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{n_1+n_2-2}$ where $S_P^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$ [½]
Calculating the observed values:
$\bar{x}_1 = \frac{267.4}{10} = 26.74$, $\bar{x}_2 = \frac{200.7}{10} = 20.07$ [½]
$s_1^2 = \frac{1}{9}\left(7,755.16 - 10 \times 26.74^2\right) = 67.2093$ [½]
$s_2^2 = \frac{1}{9}\left(4,553.97 - 10 \times 20.07^2\right) = 58.4357$ [½]
$s_P^2 = \frac{9 \times 67.2093 + 9 \times 58.4357}{18} = 62.8225$ [½]
So the value of the test statistic is:
$\frac{(26.74 - 20.07) - 0}{\sqrt{62.8225 \times \frac{2}{10}}} = 1.882$ [1]
This is greater than the $t_{18}$ critical value of 1.734, so we have sufficient evidence at the 5% level to reject $H_0$. Therefore we conclude that $\mu_2 < \mu_1$. [1]
Alternatively, using probability values, we have $P(t_{18} > 1.882) \approx 0.04$, which is less than 0.05. So we have sufficient evidence to reject $H_0$ at the 5% level.
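A minimal sketch of the pooled two-sample calculation for part (i), working from the raw claim figures given in the question. It uses only the standard library (`statistics.variance` returns the sample variance with divisor n - 1); the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

without = [24.5, 21.7, 45.2, 15.9, 23.7, 34.2, 29.3, 21.1, 23.5, 28.3]
with_screening = [22.4, 21.2, 36.3, 15.7, 21.5, 7.3, 12.8, 21.2, 23.9, 18.4]

n1, n2 = len(without), len(with_screening)
s1_sq = variance(without)            # sample variance, divisor n1 - 1
s2_sq = variance(with_screening)     # sample variance, divisor n2 - 1

# Pooled variance, valid under the equal-variance assumption.
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

# Test statistic under H0 (no difference in means); compare with t(18).
t_stat = (mean(without) - mean(with_screening)) / sqrt(sp_sq * (1 / n1 + 1 / n2))

print(round(s1_sq, 4), round(s2_sq, 4), round(sp_sq, 4), round(t_stat, 3))
```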
(ii) Test equality of variances
We are testing:
$H_0: \sigma_1^2 = \sigma_2^2$ vs $H_1: \sigma_1^2 \neq \sigma_2^2$ [½]
The test statistic is:
$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{n_1-1,\,n_2-1}$ [½]
Under $H_0$, the value of the test statistic is:
$\frac{67.2093}{58.4357} = 1.150$ [1]
The 5% critical values for an $F_{9,9}$ distribution are 0.2484 and 4.026. Since 1.150 lies between these, we have insufficient evidence at the 5% level to reject $H_0$. Therefore we conclude that $\sigma_1^2 = \sigma_2^2$ (and hence the assumption required for part (i) seems valid). [1]
10.10
(i) Test if chemicals are present
We are testing the proportion $p$ of deformed animals using the hypotheses:
$H_0: p = 0.005$ vs $H_1: p > 0.005$ [½]
Let X be the number of deformed animals obtained, then:
$X \sim Bin(1000, p) \approx N(1000p, 1000pq) \Rightarrow \frac{X - 1,000p}{\sqrt{1,000pq}} \approx N(0,1)$ [½]
Under $H_0$, the statistic with continuity correction is:
$\frac{14.5 - 5}{\sqrt{4.975}} = 4.26$ [1]
This is greater than the 1% critical value of 2.3263, so there is sufficient evidence at the 1% level to reject $H_0$. Therefore we conclude that $p > 0.005$, ie there are harmful chemicals present in the lake. [1]
Alternatively, using probability values, we have $P(Z > 4.26) = 0.00001$, which is very significant.
(ii) Test if there has been a significant reduction in deformed animals
We are testing:
$H_0: p_1 = p_2$ vs $H_1: p_2 < p_1$ [½]
where the subscript 1 refers to "before" and 2 refers to "after".
The test statistic is:
$\frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{\hat{p}(1 - \hat{p})}{n_1} + \frac{\hat{p}(1 - \hat{p})}{n_2}}} \approx N(0,1)$ [½]
Here we have:
$\hat{p}_1 = \frac{15}{1,000} = 0.015$, $\hat{p}_2 = \frac{10}{800} = 0.0125$, $\hat{p} = \frac{25}{1,800} = 0.0139$ [½]
which gives us a value of 0.450 for our test statistic. [½]
This is less than the critical value of 1.6449, so there is insufficient evidence at the 5% level to reject $H_0$. Therefore it is reasonable to conclude that $p_1 = p_2$ (ie there has not been a significant reduction in the proportion of deformed animals in the lake). [1]
Alternatively, using probability values, we have $P(Z > 0.450) = 0.326$, which is greater than 0.05 so we have insufficient evidence to reject $H_0$ at the 5% level.
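The two-sample proportion statistic in part (ii) can be sketched as below. The function names `two_proportion_z` and `upper_tail` are hypothetical, and the normal tail probability is computed with `math.erfc` instead of being read from the Tables.

```python
from math import erfc, sqrt


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2, using the overall sample proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_hat = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / standard_error


def upper_tail(z):
    """P(Z > z) for a standard normal random variable."""
    return 0.5 * erfc(z / sqrt(2))


# 15 deformed out of 1,000 before the campaign, 10 out of 800 after it.
z = two_proportion_z(15, 1000, 10, 800)
p_value = upper_tail(z)

print(round(z, 3), round(p_value, 3))
```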
10.11
(i) Test whether mean home and car claims are equal
We are testing:
$H_0: \mu_H = \mu_C$ vs $H_1: \mu_H \neq \mu_C$ [½]
The test statistic is:
$\frac{(\bar{X}_H - \bar{X}_C) - (\mu_H - \mu_C)}{S_P\sqrt{\frac{1}{n_H} + \frac{1}{n_C}}} \sim t_{n_H+n_C-2}$ where $S_P^2 = \frac{(n_H - 1)S_H^2 + (n_C - 1)S_C^2}{n_H + n_C - 2}$ [½]
Calculating the observed values:
$\bar{x}_H = \frac{78.8}{5} = 15.76$, $\bar{x}_C = \frac{88.3}{5} = 17.66$ [½]
$s_H^2 = \frac{1}{4}\left(1,271.34 - 5 \times 15.76^2\right) = 7.363$ [½]
$s_C^2 = \frac{1}{4}\left(1,631.93 - 5 \times 17.66^2\right) = 18.138$ [½]
$s_P^2 = \frac{4 \times 7.363 + 4 \times 18.138}{8} = 12.7505$ [½]
The value of the test statistic is:
$\frac{(15.76 - 17.66) - 0}{\sqrt{12.7505 \times \frac{2}{5}}} = -0.841$ [1]
This lies between the $t_8$ critical values of $\pm 2.306$, so we have insufficient evidence at the 5% level to reject $H_0$. Therefore we conclude that $\mu_H = \mu_C$. [1]
Alternatively, using probability values, we have $P(t_8 < -0.841) = 0.21$. This test is two-sided, so the probability of obtaining a value at least as extreme as the one obtained is $2 \times 0.21 = 0.42$. This is much greater than 0.05, so we have insufficient evidence to reject $H_0$ at the 5% level.
(ii) Paired t-test
Since the data are paired, we are testing:
$H_0: \mu_D = 0$ vs $H_1: \mu_D \neq 0$
The differences D (Sample 2 - Sample 1) for each pair are:
1.0   1.8   -0.1   1.6   5.2 [½]
Now:
$\bar{x}_D = \frac{9.5}{5} = 1.9$, $s_D^2 = \frac{1}{4}\left(33.85 - 5 \times 1.9^2\right) = 3.95$ [½]
So the observed value of the test statistic is:
$\frac{\bar{x}_D - \mu_D}{s_D/\sqrt{n}} = \frac{1.9 - 0}{\sqrt{3.95/5}} = 2.138$ [1]
This lies between the $t_4$ critical values of $\pm 2.776$, so we have insufficient evidence at the 5% level to reject $H_0$. Therefore we conclude that $\mu_D = 0$. [1]
Alternatively, using probability values, we have $P(t_4 > 2.138) = 0.05$. This test is two-sided, so the probability of obtaining a value at least as extreme as the one actually obtained is $2 \times 0.05 = 0.1$. This is greater than 0.05, so we have insufficient evidence to reject $H_0$ at the 5% level.
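The paired calculation can be sketched from the raw data in the question. The differences are taken as car minus home, which reproduces the values used in the solution; this is a sketch using the standard library only, with illustrative variable names.

```python
from math import sqrt
from statistics import mean, variance

home = [13.3, 19.2, 12.9, 15.8, 17.6]
car = [14.3, 21.0, 12.8, 17.4, 22.8]

# Paired differences, one per year (car minus home).
diffs = [c - h for h, c in zip(home, car)]
n = len(diffs)

d_bar = mean(diffs)
s_d_sq = variance(diffs)     # sample variance of the differences

# Test statistic under H0: mean difference is zero; compare with t(4).
t_stat = d_bar / sqrt(s_d_sq / n)

print(round(d_bar, 2), round(s_d_sq, 2), round(t_stat, 3))
```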
10.12
(i) Size of the test
The size of a test, $\alpha$, is the probability of a Type I error, ie the probability of rejecting $H_0$ when it is true:
$\alpha = P(X > 93.5 \text{ when } \lambda = 50)$ [1]
$= \int_{93.5}^{\infty} 3 \times 50^3 (50 + x)^{-4}\,dx = \left[-50^3(50 + x)^{-3}\right]_{93.5}^{\infty} = 0.0423$
The size of the test is 4.23%. [1]
(ii) Power of the test
The power of a test, $1 - \beta$, is the probability of rejecting $H_0$ when it is false.
$1 - \beta = P(X > 93.5 \text{ when } \lambda = 60)$ [1]
$= \int_{93.5}^{\infty} 3 \times 60^3 (60 + x)^{-4}\,dx = \left[-60^3(60 + x)^{-3}\right]_{93.5}^{\infty} = 0.0597$
The power of the test is 5.97%. [1]
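Both integrals have the same closed form, $P(X > x) = \left(\frac{\lambda}{\lambda + x}\right)^3$, so the size and power can be checked with a short sketch. The function name `survival` is illustrative.

```python
def survival(x, lam):
    """P(X > x) for the density f(x) = 3 * lam**3 * (lam + x)**-4, x > 0."""
    return (lam / (lam + x)) ** 3


CRITICAL_VALUE = 93.5

size = survival(CRITICAL_VALUE, 50)    # P(reject H0 | lambda = 50)
power = survival(CRITICAL_VALUE, 60)   # P(reject H0 | lambda = 60)

print(round(size, 4), round(power, 4))
```

The power is only slightly larger than the size, which reflects how little information a single observation carries about $\lambda$.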
10.13 (i)(a) State the interviewee's hypotheses

The interviewee appears to be assuming (with the benefit of hindsight) a two-sided alternative hypothesis that includes both very good results and very bad results, ie the hypotheses (expressed in terms of the probability of a correct identification $p$) would be:

$$H_0: p = 0.2 \quad \text{vs} \quad H_1: p \neq 0.2$$ [1]

(i)(b) Interviewee's test

Under $H_0$, the number of correctly identified patterns has a $Bin(10, 0.2)$ distribution. The probability of getting as few as 0 correct is:

$$\binom{10}{0}(0.2)^0(0.8)^{10} = 0.107$$

The additional probability for the other tail can only increase this value. So the result is not significant even at the 10% level. [1]

So, even after bending the rules, the interviewee has failed to demonstrate her powers. [1]

(ii)(a) Correct hypotheses

The hypotheses to use in a one-sided test designed to convince non-believers should be:

$$H_0: p = 0.2 \quad \text{vs} \quad H_1: p > 0.2$$

(ii)(b) Number of cards required to be correct

Calculating the probabilities for the $Bin(10, 0.2)$ distribution (iteratively) shows that:

$$P[Bin(10, 0.2) \le 4] = 0.1074 + 0.2684 + 0.3020 + 0.2013 + 0.0881 = 0.9672$$ [1]

So the interviewee would have to identify at least 5 cards correctly to demonstrate the existence of ESP at the 5% level. (The actual size of the test is 3.28%.) [1]
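The search for the smallest number of correct cards can be reproduced directly from the binomial probability function. The following Python sketch is illustrative only; the names `binom_pmf`, `upper_tail` and `critical_count` are ours.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def upper_tail(c, n, p):
    """P(X >= c) for X ~ Bin(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(c, n + 1))

n, p0, level = 10, 0.2, 0.05

# Smallest c such that P(X >= c | H0) does not exceed the 5% level.
critical_count = next(c for c in range(n + 1) if upper_tail(c, n, p0) <= level)
actual_size = upper_tail(critical_count, n, p0)

# Probability of zero correct, used in the interviewee's own two-sided argument.
p_zero = binom_pmf(0, n, p0)

print(critical_count, round(actual_size, 4), round(p_zero, 3))
```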
10.14 (i) Maximum likelihood estimate of p

The likelihood of observing the given sample is:

$$L(p) = C \times \left[(1-p)^3\right]^{60} \times \left[3p(1-p)^2\right]^{75} \times \left[3p^2(1-p)\right]^{16} \times \left[p^3\right]^{2}$$

$$= K(1-p)^{180} \times p^{75}(1-p)^{150} \times p^{32}(1-p)^{16} \times p^{6} = K(1-p)^{346}\,p^{113}$$ [1]

where $C$ is a constant arising from the fact that the sample can occur in different orders.

Taking logs:

$$\ln L = \ln K + 346\ln(1-p) + 113\ln p$$

Differentiating with respect to $p$:

$$\frac{d \ln L}{dp} = -\frac{346}{1-p} + \frac{113}{p}$$ [1]

Setting this equal to zero:

$$346p = 113(1-p) \;\Rightarrow\; 459p = 113 \;\Rightarrow\; \hat{p} = \frac{113}{459} = 0.246$$ [1]

Checking that we do have a maximum:

$$\frac{d^2 \ln L}{dp^2} = -\frac{346}{(1-p)^2} - \frac{113}{p^2} < 0 \;\Rightarrow\; \max$$ [1]

So $\hat{p} = \frac{113}{459}$.
(ii) Goodness-of-fit test

We are testing the following hypotheses using a $\chi^2$ goodness-of-fit test:

$H_0$: the probabilities conform to a $Bin(3, p)$ distribution
vs
$H_1$: the probabilities do not conform to a $Bin(3, p)$ distribution

Using $\hat{p} = \frac{113}{459}$ from part (i), the probabilities for this binomial distribution are:

$$P(X=0) = (1-p)^3 = 0.4283 \qquad P(X=1) = 3p(1-p)^2 = 0.4197$$

$$P(X=2) = 3p^2(1-p) = 0.1371 \qquad P(X=3) = p^3 = 0.0149$$ [1]

Multiplying these by 153 we obtain expected values of 65.54, 64.21, 20.97, 2.283.

Since the last one of these expected values is less than 5 we need to combine this with another group, say the third one. This gives:

Number of claims            0        1        2 and 3
Observed no. of policies    60       75       18
Expected no. of policies    65.54    64.21    23.25
[1]

The number of degrees of freedom is $3 - 1 - 1 = 1$. [1]

The observed value of the test statistic is:

$$\sum \frac{(O_i - E_i)^2}{E_i} = \frac{(60 - 65.54)^2}{65.54} + \frac{(75 - 64.21)^2}{64.21} + \frac{(18 - 23.25)^2}{23.25} = 0.4683 + 1.813 + 1.185 = 3.47$$ [1]

Since this is less than the 5% critical value of 3.841, we have insufficient evidence at the 5% level to reject $H_0$. We therefore conclude that the model is a good fit. [1]
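The steps of this goodness-of-fit test (fit, expected counts, pooling, statistic) can be sketched in Python as a check on the arithmetic. The observed counts 60, 75, 16 and 2 are those appearing as exponents in the likelihood in part (i); the variable names are ours and the critical value 3.841 is taken from the text, not computed.

```python
from math import comb

observed = [60, 75, 16, 2]          # policies with 0, 1, 2, 3 claims
n_policies = sum(observed)          # 153
p_hat = 113 / 459                   # MLE from part (i)

# Expected counts under Bin(3, p_hat).
probs = [comb(3, k) * p_hat**k * (1 - p_hat) ** (3 - k) for k in range(4)]
expected = [n_policies * q for q in probs]

# Pool the last two groups because the final expected count is below 5.
obs_pooled = observed[:2] + [sum(observed[2:])]
exp_pooled = expected[:2] + [sum(expected[2:])]

chi2_stat = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))
dof = len(obs_pooled) - 1 - 1       # groups - 1 - one estimated parameter

CHI2_1_CRIT_5PC = 3.841             # quoted in the text
reject_h0 = chi2_stat > CHI2_1_CRIT_5PC

print([round(e, 2) for e in expected], round(chi2_stat, 2), dof, reject_h0)
```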
10.15 (i)(a) Estimate the Poisson parameter

The maximum likelihood estimator of the Poisson parameter (representing the average number of corpuscles in each square) is the sample mean, which is:

$$\frac{0 \times 40 + 1 \times 66 + 2 \times 93 + \cdots + 8 \times 1}{400} = \frac{1{,}034}{400} = 2.585$$ [1]

(i)(b) Goodness-of-fit test

The hypotheses are:

$H_0$: The observed numbers conform to a Poisson distribution
vs
$H_1$: The observed numbers don't conform to a Poisson distribution. [½]

We can use the estimate from part (i)(a) to calculate the expected numbers using the Poisson PF, eg:

$$P(X = 0) = e^{-2.585} = 0.07540 \;\Rightarrow\; 30.16 \text{ cells}$$ [½]

Corpuscle count    0      1      2       3      4      5      6      7     ≥8
Actual number      40     66     93      94     62     25     14     5     1
Expected number    30.2   78.0   100.8   86.8   56.1   29.0   12.5   4.6   2.0
[2]

If we pool the groups for counts of 7 or more, the observed value of the test statistic is:

$$\sum \frac{(O - E)^2}{E} = \frac{(40 - 30.2)^2}{30.2} + \frac{(66 - 78.0)^2}{78.0} + \cdots + \frac{(6 - 6.6)^2}{6.6}$$

$$= 3.180 + 1.846 + 0.604 + 0.597 + 0.620 + 0.552 + 0.18 + 0.055 = 7.63$$ [2]

The number of degrees of freedom is $8 - 1 - 1 = 6$. [1]

Since this is less than the 5% critical value of 12.59, we have insufficient evidence at the 5% level to reject $H_0$. We therefore conclude that the model is a good fit. [1]
(ii) Test if patient has anaemia

We are testing:

$$H_0: \lambda = 3 \quad \text{vs} \quad H_1: \lambda < 3$$ [½]

Let $X$ be the count per cell, then:

$$\sum X \sim Poi(400\lambda) \approx N(400\lambda, 400\lambda) \;\Rightarrow\; \frac{\sum X - 400\lambda}{\sqrt{400\lambda}} \approx N(0,1)$$ [½]

Under $H_0$, the statistic with continuity correction is:

$$z = \frac{1{,}034.5 - 1{,}200}{\sqrt{1{,}200}} = -4.78$$ [1]

This is less than the 1% critical value of $-2.3263$, so there is sufficient evidence at the 1% level to reject $H_0$. Therefore we conclude that $\lambda < 3$, ie the patient does have anaemia. [1]

Alternatively, using probability values, we have $P(Z < -4.78) < 0.0005\%$, which is highly significant.
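The normal approximation with continuity correction can be verified numerically. This is a Python sketch for illustration, using `statistics.NormalDist` from the standard library for the standard normal CDF; the variable names are ours.

```python
from math import sqrt
from statistics import NormalDist

cells = 400          # number of squares counted
total_count = 1034   # total corpuscles observed
lam0 = 3             # Poisson mean per cell under H0

mean0 = cells * lam0                       # mean and variance of the total under H0
z = (total_count + 0.5 - mean0) / sqrt(mean0)   # continuity-corrected statistic

p_value = NormalDist().cdf(z)              # one-sided (lower tail) p-value
reject_at_1pc = z < -2.3263                # 1% critical value quoted in the text

print(round(z, 2), p_value, reject_at_1pc)
```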
10.16 Here we are testing:

$H_0$: The classification into the three AIDS statuses is independent of the presence or absence of the alleles
vs
$H_1$: The classification into the three AIDS statuses is not independent of the presence or absence of the alleles. [½]

The expected frequencies, calculated using (row total × column total) / grand total, are:

EXPECTED           Free of symptoms   Early symptoms   Suffering from AIDS   Total
Alleles present    10.7               8.7              28.6                  48
Alleles absent     111.3              91.3             298.4                 501
Total              122                100              327                   549
[1]

The value of the chi-square test statistic is:

$$\sum \frac{(O_i - E_i)^2}{E_i} = \frac{(24 - 10.7)^2}{10.7} + \cdots + \frac{(310 - 298.4)^2}{298.4} = 23.79$$ [2]

The test statistic is sensitive to rounding.

The number of degrees of freedom is given by $(2-1)(3-1) = 2$. [½]

Since the test statistic is greater than the ½% $\chi^2_2$ critical value of 10.60, we can reject $H_0$ at the ½% level, and conclude that the classification into the three AIDS statuses is not independent of the presence or absence of the alleles. [1]

10.17 (i)(a) Sample mean
The sample mean is:

$$\frac{250 \times 75 + 750 \times 51 + 1{,}750 \times 22 + 6{,}250 \times 5}{75 + 51 + 22 + 5} = \frac{126{,}750}{153} = 828.43$$

(i)(b) Goodness of fit of exponential distribution

We are testing:

$H_0$: the exponential is a suitable distribution
vs
$H_1$: the exponential is not a suitable distribution.

We first need to estimate the value of $\lambda$ using the method of moments. The mean of the claim amount distribution is $1/\lambda$. Setting this equal to the sample mean gives a value of 0.0012071 for $\lambda$. [1]

The probability that an exponential random variable lies between $a$ and $b$ is:

$$F(b) - F(a) = e^{-\lambda a} - e^{-\lambda b}$$

So if the claim amount is $X$ we have:

$$P(0 < X < 500) = e^{0} - e^{-500\lambda} = 1 - e^{-500\lambda} = 0.4531$$

$$P(500 < X < 1{,}000) = e^{-500\lambda} - e^{-1{,}000\lambda} = 0.2478$$

$$P(1{,}000 < X < 2{,}500) = e^{-1{,}000\lambda} - e^{-2{,}500\lambda} = 0.2502$$

$$P(2{,}500 < X < 10{,}000) = e^{-2{,}500\lambda} - e^{-10{,}000\lambda} = 0.0489$$ [1]

Multiplying these figures by 153, we obtain the expected values 69.33, 37.91, 38.27 and 7.48 respectively.

We then calculate the test statistic $\sum \frac{(O_i - E_i)^2}{E_i}$:

$$\frac{(75 - 69.33)^2}{69.33} + \frac{(51 - 37.91)^2}{37.91} + \frac{(22 - 38.27)^2}{38.27} + \frac{(5 - 7.48)^2}{7.48} = 12.7$$ [1]

The underlying distribution is $\chi^2$ with $4 - 1 - 1 = 2$ degrees of freedom (since we have set the total and estimated the mean from the data). [1]

The critical value of the $\chi^2_2$ distribution is 5.991, so we have evidence to reject $H_0$ at the 5% level and conclude that the exponential is not an appropriate distribution. [1]
(ii) Contingency table

We are testing:

$H_0$: the claim size is independent of postcode
vs
$H_1$: the claim size is not independent of postcode

The observed values in each of the categories are:

Claim size, c   0 ≤ c < 500   500 ≤ c < 1,000   1,000 ≤ c < 2,500   2,500 ≤ c < 10,000   Total
Postcode 1      23            14                7                   3                    47
Postcode 2      30            16                11                  1                    58
Postcode 3      22            21                4                   1                    48
Total           75            51                22                  5                    153
[1]

We can calculate the expected frequencies in each category by multiplying the row and column totals, and dividing by 153:

Claim size, c   0 ≤ c < 500   500 ≤ c < 1,000   1,000 ≤ c < 2,500   2,500 ≤ c < 10,000
Postcode 1      23.04         15.67             6.76                1.54
Postcode 2      28.43         19.33             8.34                1.90
Postcode 3      23.53         16.00             6.90                1.57
[3]

Since there are three cells containing less than 5, we will combine the last two columns.

Claim size, c   0 ≤ c < 500   500 ≤ c < 1,000   1,000 ≤ c < 10,000
Postcode 1      23.04         15.67             8.29
Postcode 2      28.43         19.33             10.24
Postcode 3      23.53         16.00             8.47
[1]

We can now calculate the observed value of the test statistic:

$$\sum \frac{(O - E)^2}{E} = \frac{(23 - 23.04)^2}{23.04} + \cdots + \frac{(5 - 8.47)^2}{8.47} = 4.58$$ [1]

The number of degrees of freedom is $(3-1)(3-1) = 4$. [1]

The observed value of the test statistic does not exceed 9.488, the upper 5% point of the $\chi^2_4$ distribution. So we have insufficient evidence at the 5% level to reject $H_0$. Therefore we conclude that the claim size is independent of the postcode. [1]

10.18 (i) Test for association
The hypotheses for the test are:

$H_0$: There is no association between living in a single parent family and getting into trouble with the police
vs
$H_1$: There is an association between living in a single parent family and getting into trouble with the police. [½]

The actual numbers in each category are:

ACTUAL          In trouble   Not in trouble   Total
Single parent   100          300              400
Two parent      240          960              1,200
Total           340          1,260            1,600

The expected numbers for each category are:

EXPECTED        In trouble   Not in trouble   Total
Single parent   85           315              400
Two parent      255          945              1,200
Total           340          1,260            1,600
[1]

The observed value of the test statistic is:

$$\sum \frac{(O - E)^2}{E} = \frac{(100 - 85)^2}{85} + \frac{(300 - 315)^2}{315} + \frac{(240 - 255)^2}{255} + \frac{(960 - 945)^2}{945} = 4.482$$ [2]

The number of degrees of freedom is $(2-1)(2-1) = 1$. [½]

Since the observed value of the test statistic exceeds 3.841, the upper 5% point of the $\chi^2_1$ distribution, we can reject the null hypothesis and conclude that there is an association between single parent families and being in trouble with the police. [1]

(ii) Comment

However, the presence of an association does not justify the politician's assumption that single parents cause crime. There may be some other underlying causes (eg education levels, poverty) that influence family circumstances and crime rates together. [1]
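The contingency table calculation in part (i) (expected counts from the margins, then the chi-square statistic) can be sketched in Python as follows. This is only an illustrative check with names of our choosing; the critical value 3.841 is the one quoted in the solution.

```python
# Observed counts: rows = single parent / two parent, columns = in trouble / not in trouble.
observed = [[100, 300],
            [240, 960]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell = row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi2_stat = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
dof = (len(observed) - 1) * (len(observed[0]) - 1)

CHI2_1_CRIT_5PC = 3.841   # upper 5% point of chi-square with 1 df, from the text
reject_h0 = chi2_stat > CHI2_1_CRIT_5PC

print(expected, round(chi2_stat, 3), dof, reject_h0)
```

Note that the code, like the test itself, only detects association; it says nothing about causation, which is the point made in part (ii).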
10.19 (i) Test for association

The test required is a $\chi^2$ contingency table test. The hypotheses are:

$H_0$: There is no association between flower colour and leaf type
vs
$H_1$: There is some association between flower colour and leaf type. [1]

The expected frequencies are:

             Red     White   Pink
Plain        87.3    82.1    46.7
Variegated   114.7   107.9   61.3
[2]

So the observed value of the test statistic is:

$$\sum \frac{(O - E)^2}{E} = \frac{(97 - 87.3)^2}{87.3} + \cdots + \frac{(31 - 61.3)^2}{61.3} = 71.0$$ [2]

Comparing this with the figures in the Tables for the $\chi^2_2$ distribution, we see that this figure is far larger than the ½% point of the distribution. We have overwhelming evidence against the null hypothesis, and we conclude that there is almost certainly some association between flower colour and leaf type. [1]

(ii)(a) Maximum likelihood estimate of q
Assuming that this genetic model is correct, the likelihood function is:

$$L(q) = q^{97} \left(\frac{q}{2}\right)^{42} \left(\frac{1 - 3q}{2}\right)^{77} \left(\frac{3q}{2}\right)^{148} q^{105} \left(\frac{1 - 5q}{2}\right)^{31} \times \text{constant}$$

$$= q^{392}(1 - 3q)^{77}(1 - 5q)^{31} \times \text{constant}$$ [1]

Taking logs:

$$\log L = 392 \log q + 77 \log(1 - 3q) + 31 \log(1 - 5q) + \text{constant}$$ [1]

Differentiating with respect to $q$:

$$\frac{d}{dq} \log L = \frac{392}{q} - \frac{231}{1 - 3q} - \frac{155}{1 - 5q}$$ [1]

Setting this equal to zero, and multiplying through by $q(1 - 3q)(1 - 5q)$, we obtain:

$$392(1 - 3q)(1 - 5q) - 231q(1 - 5q) - 155q(1 - 3q) = 0$$ [1]

Gathering terms:

$$392 - 3{,}522q + 7{,}500q^2 = 0$$

Solving the quadratic equation:

$$q = \frac{3{,}522 \pm \sqrt{3{,}522^2 - 4 \times 7{,}500 \times 392}}{15{,}000} = 0.18128 \text{ or } 0.28832$$

If $q = 0.28832$, then $1 - 5q$ is negative, so we can ignore the larger root. [2]

We can check that this does indeed give a maximum:

$$\frac{d^2}{dq^2} \log L = -\frac{392}{q^2} - \frac{693}{(1 - 3q)^2} - \frac{775}{(1 - 5q)^2} < 0 \;\Rightarrow\; \max$$ [1]

So the maximum likelihood estimate for $q$ is $\hat{q} = 0.181$.
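As a numerical check on the root selection, the quadratic $392 - 3{,}522q + 7{,}500q^2 = 0$ can be solved directly and the admissible root confirmed against the log-likelihood. This Python sketch is illustrative; the function name `log_lik` is ours and the additive constant is dropped.

```python
from math import log, sqrt

def log_lik(q):
    """Log-likelihood up to an additive constant; needs 0 < q < 1/5."""
    return 392 * log(q) + 77 * log(1 - 3 * q) + 31 * log(1 - 5 * q)

# Coefficients of 7500 q^2 - 3522 q + 392 = 0.
a, b, c = 7500, -3522, 392
disc = b * b - 4 * a * c
roots = sorted([(-b - sqrt(disc)) / (2 * a), (-b + sqrt(disc)) / (2 * a)])

# Every cell probability must be positive, which requires 1 - 5q > 0.
admissible = [q for q in roots if 0 < q < 1 / 5]
q_hat = admissible[0]

# The log-likelihood should be lower a little to either side of q_hat.
is_local_max = log_lik(q_hat) > max(log_lik(q_hat - 0.005), log_lik(q_hat + 0.005))

print([round(q, 5) for q in roots], round(q_hat, 3), is_local_max)
```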
(ii)(b) Test goodness of fit

Using $\hat{q} = 0.181$, we can find the expected frequencies by multiplying the probabilities by 500. This gives the following table of expected frequencies:

             Red    White   Pink
Plain        90.6   45.3    114.0
Variegated   90.6   136.0   23.4
[2]

Using a chi-squared test, the hypotheses are:

$H_0$: The probabilities of each plant type conform to this genetic model
vs
$H_1$: The probabilities of each plant type do not conform to this genetic model.

The observed value of the test statistic is:

$$\sum \frac{(O - E)^2}{E} = \frac{(97 - 90.6)^2}{90.6} + \cdots + \frac{(31 - 23.4)^2}{23.4} = 18.5$$ [2]

Comparing this value with the appropriate points of the $\chi^2_4$ distribution, we see that again we have strong evidence to reject $H_0$, and we conclude at the ½% level that this genetic model does not appear to fit the data well. [1]

This time we are not testing for association, ie it is an ordinary chi-square goodness of fit test. So the number of degrees of freedom is the number of cells minus the number of estimated parameters minus 1. This gives us $6 - 1 - 1 = 4$ degrees of freedom here.

(iii) Comment

None of the models suggested here appear to fit the data well. Of the pink flowers, there appear to be far too many with plain leaves and far too few with variegated leaves than we would expect under the assumption of independence. However, the genetic model in part (ii) appears to overcompensate for this, with the result that the actual number of pink flowers with plain leaves is smaller than that predicted by the model. A further model somewhere between the two models we have tried so far might give a better fit to the observed data. [3]

10.20 (i)(a) State assumptions
Each house independently must have the same probability of being burgled. [1]

(i)(b) Derive the maximum likelihood estimate of p

$$L(p) = [P(X=0)]^{39}\,[P(X=1)]^{38}\,[P(X=2)]^{18}\,[P(X=3)]^{4}\,P(X=5)$$ [1]

Using a $Bin(6, p)$ distribution to calculate the probabilities:

$$L(p) = c\,[(1-p)^6]^{39}\,[p(1-p)^5]^{38}\,[p^2(1-p)^4]^{18}\,[p^3(1-p)^3]^{4}\,p^5(1-p) = c\,p^{91}(1-p)^{509}$$

$$\Rightarrow \ln L(p) = \ln c + 91 \ln p + 509 \ln(1-p)$$ [1]

$$\Rightarrow \frac{d}{dp} \ln L(p) = \frac{91}{p} - \frac{509}{1-p}$$ [½]

Setting the differential equal to zero to obtain the maximum:

$$\frac{91}{p} = \frac{509}{1-p} \;\Rightarrow\; \hat{p} = \frac{91}{600}$$ [1½]

Checking it's a maximum:

$$\frac{d^2}{dp^2} \ln L(p) = -\frac{91}{p^2} - \frac{509}{(1-p)^2} < 0 \;\Rightarrow\; \max$$ [1]

Alternatively, since the binomial distribution is additive, we could have looked at a single $Bin(600, p)$ distribution instead.
(i)(c) Fit the binomial model and comment

Using the estimate $\hat{p} = \frac{91}{600}$ we get frequencies of 37.3, 40.0, 17.9, 4.3, 0.6, 0.0, 0.0, using:

$$P(X = x) = \binom{6}{x} p^x (1-p)^{6-x}$$ [3½]

(i)(d) Comment

These are very similar to the observed frequencies, which implies that the model is a good fit. [½]

(ii) Test whether binomial model with p = 0.18 is a good fit for the data

Using $p = 0.18$ and $P(X = x) = \binom{6}{x}\,0.18^x\,0.82^{6-x}$ we get:

Houses burgled   0       1       2       3      4      5      6
Observed         39      38      18      4      0      1      0
Expected         30.40   40.04   21.97   6.43   1.06   0.09   0.00
[2]
Since the expected frequencies are less than five for 4, 5 and 6 houses burgled, we need to combine these columns together with the 3 column:

Houses burgled   0       1       2       3+
Observed         39      38      18      5
Expected         30.40   40.04   21.97   7.58
[1]

The observed value of the test statistic is:

$$\sum \frac{(O - E)^2}{E} = \frac{(39 - 30.40)^2}{30.40} + \cdots + \frac{(5 - 7.58)^2}{7.58} = 4.13$$ [2]

There are now 4 groups so the number of degrees of freedom is $4 - 1 = 3$. Remember that the value for $p$ of 0.18 was given and was not estimated using this data. [1]

We are carrying out a one-sided test. Our observed value of the test statistic is less than the 5% critical value of 7.815. So we have insufficient evidence to reject $H_0$ at the 5% level. Therefore it is reasonable to conclude that the model is a good fit. [1]

10.21 (i) Illustrate data and comment on the assumptions

There is perhaps some very slight evidence of concentration at the centre of the distribution for A, but the sample sizes are small and it is difficult to tell whether an assumption of normality is reasonable.

The variance of the data from Company B looks slightly smaller than that from Company A. However, it is unlikely that such a small difference is significant. There are no outliers in either distribution. [2]

(ii) Test whether appropriate to apply a two-sample t test
We require the variances to be equal, so we are testing:

$$H_0: \sigma_A^2 = \sigma_B^2 \quad \text{vs} \quad H_1: \sigma_A^2 \neq \sigma_B^2$$ [½]

$$s_A^2 = \frac{1}{9}\left(494{,}126 - \frac{2{,}134^2}{10}\right) = 4{,}303.4 \qquad s_B^2 = \frac{1}{9}\left(541{,}463 - \frac{2{,}259^2}{10}\right) = 3{,}461.7$$ [1]

The test is based on the result $\dfrac{S_A^2/\sigma_A^2}{S_B^2/\sigma_B^2} \sim F_{n_A - 1,\,n_B - 1}$. The observed value of the test statistic is:

$$\frac{4{,}303.4}{3{,}461.7} = 1.243$$ [1]

We are carrying out a two-sided test. Comparing the statistic with the $F_{9,9}$ distribution, we see that it is less than the 5% critical value of 4.026. So we have insufficient evidence at the 5% level to reject the null hypothesis. Therefore it is reasonable to conclude that $\sigma_A^2 = \sigma_B^2$. [1½]
(iii) Test whether premiums charged by Company B were higher than those by Company A

We are testing:

$$H_0: \mu_A = \mu_B \quad \text{vs} \quad H_1: \mu_B > \mu_A$$ [½]

Under this null hypothesis, we use:

$$\frac{\bar{X}_B - \bar{X}_A}{S_P\sqrt{\dfrac{1}{n_A} + \dfrac{1}{n_B}}} \sim t_{n_A + n_B - 2}$$

Substituting in the values, we get a test statistic of:

$$\frac{225.9 - 213.4}{\sqrt{\dfrac{9 \times 4{,}303.4 + 9 \times 3{,}461.7}{18}\left(\dfrac{1}{10} + \dfrac{1}{10}\right)}} = 0.4486$$ [1]

Comparing this with the $t_{18}$ values gives a p-value in excess of 30%. So we have insufficient evidence to reject our null hypothesis at the 30% level. Therefore it is reasonable to conclude that the level of premiums charged by Company B is the same as that charged by Company A. [1½]
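The variance-ratio statistic from part (ii) and the pooled two-sample t statistic from part (iii) can be reproduced from the summary figures quoted in the solution. The Python below is a sketch for checking the arithmetic only; the variable names are ours and no p-values or critical values are computed.

```python
from math import sqrt

n_a = n_b = 10
mean_a, mean_b = 213.4, 225.9      # sample means
var_a, var_b = 4303.4, 3461.7      # sample variances from part (ii)

# Part (ii): F statistic for equality of variances.
f_stat = var_a / var_b

# Part (iii): pooled variance and two-sample t statistic.
pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
t_stat = (mean_b - mean_a) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
dof = n_a + n_b - 2

print(round(f_stat, 3), round(pooled_var, 2), round(t_stat, 4), dof)
```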
(iv)(a) Confidence interval for the difference between the proportions

Using the pivotal value, from Chapter 8, of:

$$\frac{(\hat{p}_A - \hat{p}_B) - (p_A - p_B)}{\sqrt{\dfrac{\hat{p}_A \hat{q}_A}{n_A} + \dfrac{\hat{p}_B \hat{q}_B}{n_B}}} \approx N(0,1)$$ [½]

We have:

$$\hat{p}_A = 0.5, \quad \hat{q}_A = 0.5, \quad \hat{p}_B = 0.6, \quad \hat{q}_B = 0.4, \quad n_A = n_B = 10$$ [½]

We obtain a 95% confidence interval of:

$$-0.1 \pm 1.96\sqrt{\frac{0.25}{10} + \frac{0.24}{10}} = (-0.53,\ 0.33)$$ [1]

(iv)(b) Comment

Since this confidence interval contains zero, we cannot conclude that the proportions of premiums in excess of 200 are different for the two companies. [1]
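The interval in part (iv)(a) can be checked with a few lines of Python. This is an illustrative sketch using the sample proportions quoted above and the 1.96 normal percentage point; the variable names are ours.

```python
from math import sqrt

p_a, p_b = 0.5, 0.6     # sample proportions of premiums in excess of 200
n_a = n_b = 10
z_975 = 1.96            # upper 2.5% point of N(0,1)

std_err = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
diff = p_a - p_b

lower = diff - z_975 * std_err
upper = diff + z_975 * std_err
contains_zero = lower < 0 < upper

print(round(lower, 2), round(upper, 2), contains_zero)
```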
(v) Test whether Company A appears to have increased its premiums

We now carry out a single sample t-test on the data for Company A. We are testing:

$$H_0: \mu_A = 170 \quad \text{vs} \quad H_1: \mu_A > 170$$ [½]

The observed value of the test statistic is:

$$\frac{213.4 - 170}{\sqrt{4{,}303.4/10}} = 2.092$$ [1]

Comparing this with values of the $t_9$ distribution, we find that we have a result that is significant at a level somewhere between 2.5% and 5%. So we have sufficient evidence to reject $H_0$ at the 5% level. Therefore it is reasonable to conclude that the company has increased its premiums since the previous year. [1½]
CS1-11: Correlation

Correlation

Syllabus objectives

2.2 Exploratory data analysis

2.2.1 Describe the purpose of exploratory data analysis.

2.2.2 Use appropriate tools to calculate suitable summary statistics and undertake exploratory data visualizations.

2.2.3 Define and calculate Pearson's, Spearman's and Kendall's measures of correlation for bivariate data, explain their interpretation and perform statistical inference as appropriate.

2.2.4 Use principal components analysis to reduce the dimensionality of a complex data set.
0 Introduction

Actuaries, statisticians and many other professionals are increasingly engaged in analysing and interpreting large data sets, in order to determine whether there is any relationship between variables, and to assess the strength of that relationship. The methods in this and the following three chapters are perhaps more widely applied than any other statistical methods.

Exploratory data analysis (EDA) is the process of analysing data to gain further insight into the nature of the data, its patterns and relationships between the variables, before any formal statistical techniques are applied.

That is, we approach the data free of any pre-conceived assumptions or hypotheses. We first see the patterns in the data before we impose any views on it and fit models.

In addition to discovering the underlying structure of the data and any relationships between variables, exploratory data analysis can also be used to:
• detect any errors (outliers or anomalies) in the data
• check the assumptions made by any models or statistical tests
• identify the most important/influential variables
• develop parsimonious models, that is models that explain the data with the minimum number of variables necessary.

For numerical data, this process will include the calculation of summary statistics and the use of data visualisations. Transformation of the original data may be necessary as part of this process.

For a single variable, EDA will involve calculating summary statistics (such as mean, median, quartiles, standard deviation, IQR and skewness) and drawing suitable diagrams (such as histograms, boxplots, quantile-quantile (Q-Q) plots and a line chart for time series/ordered data).

For bivariate or multivariate data, EDA will involve calculating the summary statistics for each variable and calculating correlation coefficients between each pair of variables. Data visualisation will typically involve scatterplots between each pair of variables.

Linear correlation between a pair of variables looks at the strength of the linear relationship between them. The diagrams below show the various degrees of positive correlation:

[Figure: three scatterplots showing perfect positive correlation, strong positive correlation and weak positive correlation]

Recall that we met correlation in Chapter 4 and defined it for a population. In this chapter we will explain how to obtain the sample correlation, and then how to use it to make inferences about the population's correlation. This is similar to what we did with the sample mean, $\bar{X}$, and the population mean, $\mu$, in Chapters 7 to 10.

For multivariate data sets with large dimensionality, various techniques such as cluster analysis and principal components analysis (also called factor analysis) can be used to reduce the complexity of the data set.

Subject CS1 assumes that students can carry out EDA on univariate data sets. This includes calculation of summary statistics (eg mean and variance) and construction of diagrams (eg histograms), which are assumed knowledge for Subject CS1.

This chapter covers three aspects of EDA:
• using scatterplots to assess the shape of any correlation for bivariate data sets,
• calculating correlation coefficients to measure the strength of that correlation, and
• using principal components analysis (PCA) to identify the most important variables for multivariate data sets.

Some results in this chapter are quoted without proof. Students are expected to memorise these and apply them in the exam.
1 Bivariate correlation analysis

In a bivariate correlation analysis the problem of interest is an assessment of the strength of the relationship between the two variables $Y$ and $X$.

In any analysis, it is assumed that measurements (or counts) have been made, and are available, on the variables, giving us bivariate data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.

1.1 Data visualisation

The starting point is always to visualise the data. For bivariate data, the simplest way to do this is to draw a scatterplot and get a feel for the relationship (if any) between the variables as revealed/suggested by the data.

The R code to draw a scatterplot for a bivariate data frame, <data>, is:

    plot(<data>)

We are particularly interested in whether there is a linear relationship between $Y$, the response (or dependent) variable, and $X$, the explanatory (or independent, or regressor) variable. That is, the expected value of $Y$, for any given value $x$ of $X$, is a linear function of that value $x$, ie:

$$E[Y \mid x] = \alpha + \beta x$$

Recall from Chapter 5 that $E[Y \mid x]$ is a conditional mean, which represents the average value of $Y$ corresponding to a given value of $x$.

If a linear relationship (even a weak one) is indicated by the data, the methods of Chapter 12 (Linear Regression) can be used to fit a linear model, with a view to exploiting the relationship between the variables to help estimate the expected response for a given value of the explanatory variable.

We now look at two examples (one linear and one non-linear) which we will analyse throughout the chapter.

Question

A sample of ten claims and corresponding payments on settlement for household policies is taken from the business of an insurance company. The amounts, in units of 100, are as follows:

Claim x      2.10   2.40   2.50   3.20   3.60   3.80   4.10   4.20   4.50   5.00
Payment y    2.18   2.06   2.54   2.61   3.67   3.25   4.02   3.71   4.38   4.45

Draw a scatterplot and comment on the relationship between claims and payments.
Solution

The scatterplot for these data is as follows:

[Figure: scatterplot of payment y against claim x]

Here we can see that there appears to be a strong positive linear relationship. The plotted data points lie roughly in a straight line.

We can see from the graph that there appears to be a linear relationship between the claims and payments (ie the rate of change in payment is constant for a rate of change in the claim). So we will be able to use our linear regression work on these data values in the next chapter.

The next example contains a non-linear relationship between the variables.

A well-chosen transformation of $y$ (or $x$, or even both) may however bring the data into a linear relationship. This then allows us to use the linear regression techniques in the next chapter.
Question

The rate of interest of borrowing, over the next five years, for ten companies is compared to each company's leverage ratio (its debt to equity ratio). The data is as follows:

Leverage ratio, x       0.1   0.4   0.5   0.8   1.0   1.8   2.0    2.5    2.8    3.0
Interest rate (%), y    2.8   3.4   3.5   3.6   4.6   6.3   10.2   19.7   31.3   42.9

Draw a scatterplot and comment on the relationship between company borrowing (leverage) and interest rate. Hence apply a transformation to obtain a linear relationship.

Solution

The scatterplot for these data is as follows:

[Figure: scatterplot of interest rate y against leverage ratio x]

It can clearly be seen that the data displays a non-linear relationship, since the rate of change in the interest rate increases with the leverage ratio.

In this case, the log of the interest rate against the leverage ratio produces a far more linear relationship:

[Figure: scatterplot of log interest rate against leverage ratio x]
1.2 Sample correlation coefficients

The degree of association between the $x$ and $y$ values is summarised by the value of an appropriate correlation coefficient, each of which takes values from $-1$ to $+1$.

The coefficient of linear correlation provides a measure of how well a linear regression model explains the relationship between two variables. The values of $r$ can be interpreted as follows:

Value          Interpretation
r = 1          The two variables move together in the same direction in a perfect linear relationship.
0 < r < 1      The two variables tend to move together in the same direction but there is not a direct relationship.
r = 0          The two variables can move in either direction and show no linear relationship.
-1 < r < 0     The two variables tend to move together in opposite directions but there is not a direct relationship.
r = -1         The two variables move together in opposite directions in a perfect linear relationship.
In this section we look at three correlation coefficients: Pearson, Spearman's rank and Kendall's rank.

It is always important in data analysis to note that simply finding a mathematical relationship between variables tells one nothing in itself about the causality of that relationship or its continuing persistence through time. Qualitative as well as quantitative analysis is essential before making predictions or taking action.

Jumping to a cause and effect conclusion (that a change in one variable causes a change in the other) is a common misinterpretation of correlation coefficients. For example, the correlation may be spurious, or there may be another variable not part of the analysis that is causal.
Pearson's correlation coefficient

Pearson's correlation coefficient $r$ (also called Pearson's product-moment correlation coefficient) measures the strength of linear relationship between two variables and is given by:

$$r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}$$

where:

$$S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}$$

$$S_{yy} = \sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n}$$

$$S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}$$

$S_{xx}$ and $S_{yy}$, the sums of squares of $x$ and $y$ respectively, are the sample variances of $x$ and $y$ multiplied by $(n-1)$. Similarly $S_{xy}$ is the sample covariance multiplied by $(n-1)$.

$\sum_{i=1}^{n} x_i^2$ is often abbreviated to $\sum x^2$, etc.

Question

Show that:

$$S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n} = \sum x_i^2 - n\bar{x}^2$$
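Solution

Expanding the bracket and splitting up the summation, we have:

$$S_{xx} = \sum (x_i - \bar{x})^2 = \sum \left(x_i^2 - 2\bar{x}x_i + \bar{x}^2\right) = \sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2$$

$$= \sum x_i^2 - \frac{2}{n}\left(\sum x_i\right)^2 + \frac{1}{n}\left(\sum x_i\right)^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}$$

Since $\sum x_i = n\bar{x}$, we get:

$$S_{xx} = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n} = \sum x_i^2 - \frac{(n\bar{x})^2}{n} = \sum x_i^2 - n\bar{x}^2$$

These formulae are given on page 24 of the Tables in the $S_{xx} = \sum x_i^2 - n\bar{x}^2$ format.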
Recall from Chapter 4 that the population correlation coefficient is defined to be:

$$\rho = corr(X, Y) = \frac{cov(X, Y)}{\sqrt{var(X)\,var(Y)}}$$

Pearson's sample correlation coefficient, $r$, is an estimate of the population correlation coefficient, $\rho$, in the same way as $\bar{x}$ is an estimate of $\mu$ or $s^2$ is an estimate of $\sigma^2$.

The formula for the sample correlation coefficient, $r$, is given on page 25 of the Tables.

Let's now calculate this correlation coefficient for the examples we met earlier.
Question

For the claims settlement data, we have:

Claim (100s) x      2.10   2.40   2.50   3.20   3.60   3.80   4.10   4.20   4.50   5.00
Payment (100s) y    2.18   2.06   2.54   2.61   3.67   3.25   4.02   3.71   4.38   4.45

Number of pairs of observations $n = 10$.

$$\sum x = 35.4, \quad \sum x^2 = 133.76, \quad \sum y = 32.87, \quad \sum y^2 = 115.2025, \quad \sum xy = 123.81$$

Calculate Pearson's correlation coefficient for the claims settlement data and comment on its value.

Solution

$$S_{xx} = \sum x^2 - n\bar{x}^2 = 133.76 - \frac{35.4^2}{10} = 8.444$$

$$S_{yy} = \sum y^2 - n\bar{y}^2 = 115.2025 - \frac{32.87^2}{10} = 7.15881$$

$$S_{xy} = \sum xy - n\bar{x}\bar{y} = 123.81 - \frac{35.4 \times 32.87}{10} = 7.4502$$

$$\Rightarrow r = \frac{7.4502}{\sqrt{8.444 \times 7.15881}} = 0.95824$$

As expected, this is high (close to $+1$), and indicates a strong positive linear relationship.
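The calculation above follows the $S_{xx}$, $S_{yy}$, $S_{xy}$ route exactly, so it translates directly into code. The following is a Python sketch for checking the figures (the course itself quotes R's cor function); the function name `pearson_r` is ours.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r computed as Sxy / sqrt(Sxx * Syy) from raw sums."""
    n = len(x)
    s_xx = sum(v * v for v in x) - sum(x) ** 2 / n
    s_yy = sum(v * v for v in y) - sum(y) ** 2 / n
    s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    return s_xy / sqrt(s_xx * s_yy)

# Claims settlement data from the question (units of 100).
claims = [2.10, 2.40, 2.50, 3.20, 3.60, 3.80, 4.10, 4.20, 4.50, 5.00]
payments = [2.18, 2.06, 2.54, 2.61, 3.67, 3.25, 4.02, 3.71, 4.38, 4.45]

r_claims = pearson_r(claims, payments)
print(round(r_claims, 5))
```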
Question

For the original borrowing rate data:

Leverage ratio, x    0.1     0.4     0.5     0.8     1.0     1.8     2.0     2.5     2.8     3.0
Interest rate y      0.028   0.034   0.035   0.036   0.046   0.063   0.102   0.197   0.313   0.429

Number of pairs of observations $n = 10$.

$$\sum x = 14.9, \quad \sum x^2 = 32.39, \quad \sum y = 1.283, \quad \sum y^2 = 0.341769, \quad \sum xy = 3.082$$

Calculate Pearson's correlation coefficient for the borrowing rate data.

Solution

$$S_{xx} = \sum x^2 - n\bar{x}^2 = 32.39 - 10\left(\frac{14.9}{10}\right)^2 = 10.189$$

$$S_{yy} = \sum y^2 - n\bar{y}^2 = 0.341769 - 10\left(\frac{1.283}{10}\right)^2 = 0.1771601$$

$$S_{xy} = \sum xy - n\bar{x}\bar{y} = 3.082 - 10\left(\frac{14.9}{10}\right)\left(\frac{1.283}{10}\right) = 1.17033$$

$$\Rightarrow r = \frac{1.17033}{\sqrt{10.189 \times 0.1771601}} = 0.87108$$
Since Pearson's correlation coefficient measures linear association, it may give a low result when variables have a strong, but non-linear relationship. Whilst the value for the borrowing rate data is high, it is materially lower than in the first example, due to the non-linearity of the relationship.
The moral of the story however is always to plot the data first. For example, the following scatterplots (from the statistician Francis Anscombe) all have a correlation coefficient of 0.816:
Reference: Anscombe, F.J. (1973). Graphs in Statistical Analysis. American Statistician 27 (1): 17-21. JSTOR 2682899
The R code for calculating a Pearson correlation coefficient for variables x and y is:
cor(x, y, method = "pearson")
Spearman's rank correlation coefficient
Spearman's rank correlation coefficient $r_s$ measures the strength of monotonic (but not necessarily linear) relationship between two variables.
Formally, it is the Pearson correlation coefficient applied to the ranks, $r(X_i)$ and $r(Y_i)$, rather than the raw values, $(X_i, Y_i)$, of the bivariate data.
So it just uses their relative sizes in relation to each other. We usually order them from smallest to largest.
If all the $X_i$s are unique, and separately all of the $Y_i$s are unique, ie there are no ties, then this calculation simplifies to:
$$r_s = 1 - \frac{6\sum_i d_i^2}{n(n^2-1)}$$
where $d_i = r(X_i) - r(Y_i)$.
Since Spearman's rank correlation coefficient only considers ranks rather than the actual values, the value of the coefficient is less affected by extreme values/outliers in the data than Pearson's correlation coefficient. Hence this statistic is more robust.
Let's now calculate Spearman's rank correlation coefficient for the examples we met earlier.
Question
Calculate Spearman's rank correlation coefficient for the claims settlement data and comment.

Claim (100s), x     2.10  2.40  2.50  3.20  3.60  3.80  4.10  4.20  4.50  5.00
Payment (100s), y   2.18  2.06  2.54  2.61  3.67  3.25  4.02  3.71  4.38  4.45
Solution
For the claims settlement data:

Claim x   Payment y   Rank x   Rank y    d    d^2
2.1       2.18         1        2       -1    1
2.4       2.06         2        1        1    1
2.5       2.54         3        3        0    0
3.2       2.61         4        4        0    0
3.6       3.67         5        6       -1    1
3.8       3.25         6        5        1    1
4.1       4.02         7        8       -1    1
4.2       3.71         8        7        1    1
4.5       4.38         9        9        0    0
5         4.45        10       10        0    0

This gives $\sum d^2 = 6$ and:
$$r_s = 1 - \frac{6 \times 6}{10(10^2 - 1)} = 0.9636$$
As expected, the Spearman's rank correlation coefficient is very high, since it is known from the calculation of the Pearson's correlation coefficient that there is a strong positive linear relationship (hence a strong monotonically increasing relationship).
The Spearman's correlation coefficient may give a value that is substantially different from the Pearson's coefficient for the same data.
Question
Calculate Spearman's rank correlation coefficient for the original borrowing rate data and comment.

Leverage ratio, x       0.1   0.4   0.5   0.8   1.0   1.8   2.0    2.5    2.8    3.0
Interest rate (%), y    2.8   3.4   3.5   3.6   4.6   6.3   10.2   19.7   31.3   42.9
Solution
For the corporate borrowing data, the ranks of the two data sets are exactly equal, hence Spearman's rank correlation coefficient is trivially equal to 1.
The reason that this is materially higher than the equivalent Pearson coefficient is that the non-linearity of the relationship does not feature in the calculation, only the fact that it is monotonically increasing.
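The two Spearman calculations above can be reproduced with a short sketch in R. It applies the no-ties formula to the claims data and compares Spearman with Pearson for the borrowing rate data; the object names (`rs_formula`, `lev`, `rate` and so on) are illustrative choices, not part of the original material.

```r
# Claims settlement data
x <- c(2.10, 2.40, 2.50, 3.20, 3.60, 3.80, 4.10, 4.20, 4.50, 5.00)
y <- c(2.18, 2.06, 2.54, 2.61, 3.67, 3.25, 4.02, 3.71, 4.38, 4.45)
n <- length(x)

# Differences in ranks and the no-ties formula
d <- rank(x) - rank(y)
rs_formula <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))
rs_builtin <- cor(x, y, method = "spearman")

# Borrowing rate data: monotonic but non-linear
lev  <- c(0.1, 0.4, 0.5, 0.8, 1.0, 1.8, 2.0, 2.5, 2.8, 3.0)
rate <- c(2.8, 3.4, 3.5, 3.6, 4.6, 6.3, 10.2, 19.7, 31.3, 42.9)
rs_borrow <- cor(lev, rate, method = "spearman")  # ranks agree exactly
r_borrow  <- cor(lev, rate)                       # Pearson, for comparison
```

Because the correlation coefficient is unchanged by rescaling, expressing the interest rate in per cent rather than as a decimal does not affect either result.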
The R code for calculating a Spearman rank correlation coefficient for variables x and y is:
cor(x, y, method = "spearman")
The Kendall rank correlation coefficient
Kendall's rank correlation coefficient $\tau$ measures the strength of monotonic relationship between two variables.
Like the Spearman rank correlation coefficient, the Kendall rank correlation coefficient considers only the relative values of the bivariate data, and not their actual values. It is far more intensive from a calculation viewpoint, however, since it considers the relative values of all possible pairs of bivariate data, not simply the rank of $X_i$ and $Y_i$ for a given $i$.
Despite the more complicated calculation, it is considered to have better statistical properties than Spearman's rank correlation coefficient, particularly for small data sets with large numbers of tied ranks.
Any pair of observations $(X_i, Y_i)$; $(X_j, Y_j)$, where $i \neq j$, is said to be concordant if the ranks for both elements agree, ie $X_i > X_j$ and $Y_i > Y_j$, or $X_i < X_j$ and $Y_i < Y_j$; otherwise they are said to be discordant.
Consider the settlement payments for claims example. Suppose Claim A is greater than Claim B. If the settlement for Claim A is also greater than the settlement for Claim B then they have the same relative rank orders, and we say that A and B are a concordant pair with respect to the random variables claims and settlement.
Let $n_c$ be the number of concordant pairs, and let $n_d$ be the number of discordant pairs.
Assuming that there are no ties, the Kendall coefficient $\tau$ is defined as:
$$\tau = \frac{n_c - n_d}{n(n-1)/2}$$
The numerator is the difference in the number of concordant and discordant pairs.
The denominator is the total number of combinations of pairing each $(X_i, Y_i)$ with each $(X_j, Y_j)$.
This could also be defined as $n_c + n_d$.
For example, if there were 3 observations of X and Y then there would be $3(3-1)/2 = 3$ combinations:
$(X_1, Y_1)$ and $(X_2, Y_2)$
$(X_1, Y_1)$ and $(X_3, Y_3)$
$(X_2, Y_2)$ and $(X_3, Y_3)$
So $\tau$ can be interpreted as the difference between the probability of these objects being in the same order and the probability of these objects being in a different order.
Therefore, a value of $-1$ indicates all discordant pairs and $+1$ indicates all concordant pairs.
Intuitively, it is clear that if the number of concordant pairs is much larger than the number of discordant pairs, then the random variables are positively correlated. On the other hand, if the number of concordant pairs is much less than the number of discordant pairs, then the variables are negatively correlated.
Let's now calculate the Kendall rank correlation coefficient for the examples we met earlier.
Question
Calculate Kendall's rank correlation coefficient for the claims settlement data and comment.

Claim (100s), x     2.10  2.40  2.50  3.20  3.60  3.80  4.10  4.20  4.50  5.00
Payment (100s), y   2.18  2.06  2.54  2.61  3.67  3.25  4.02  3.71  4.38  4.45
Solution
For the example claims data:

(x,y)      2.4,2.06  2.5,2.54  3.2,2.61  3.6,3.67  3.8,3.25  4.1,4.02  4.2,3.71  4.5,4.38  5.0,4.45
2.1,2.18      d         c         c         c         c         c         c         c         c
2.4,2.06                c         c         c         c         c         c         c         c
2.5,2.54                          c         c         c         c         c         c         c
3.2,2.61                                    c         c         c         c         c         c
3.6,3.67                                              d         c         c         c         c
3.8,3.25                                                        c         c         c         c
4.1,4.02                                                                  d         c         c
4.2,3.71                                                                            c         c
4.5,4.38                                                                                      c

where c represents a concordant pair, and d represents a discordant pair.
Here $n_c = 42$, $n_d = 3$, so:
$$\tau = \frac{n_c - n_d}{n(n-1)/2} = \frac{42 - 3}{(10 \times 9)/2} = 0.8667$$
Alternatively using $\tau = \dfrac{n_c - n_d}{n_c + n_d}$ gives $\tau = \dfrac{42 - 3}{42 + 3} = 0.8667$.
Again the relatively high value demonstrates the strong correlation between the variables.
It's often easier to determine concordant and discordant pairs by using the ranks instead of the actual numbers.
First arrange the values in order of rank for x. Then the number of concordant pairs (C) is the number of observations below which have a higher rank for y and the number of discordant pairs (D) is the number of observations below which have a lower rank for y.
(x, y)      Rank x   Rank y    C    D
2.1, 2.18    1        2        8    1
2.4, 2.06    2        1        8    0
2.5, 2.54    3        3        7    0
3.2, 2.61    4        4        6    0
3.6, 3.67    5        6        4    1
3.8, 3.25    6        5        4    0
4.1, 4.02    7        8        2    1
4.2, 3.71    8        7        2    0
4.5, 4.38    9        9        1    0
5.0, 4.45   10       10        0    0
Total                         42    3

Totalling the columns gives $n_c = 42$, $n_d = 3$ as before.
Question
Calculate Kendall's rank correlation coefficient for the original borrowing rate data and comment on its value.

Leverage ratio, x       0.1   0.4   0.5   0.8   1.0   1.8   2.0    2.5    2.8    3.0
Interest rate (%), y    2.8   3.4   3.5   3.6   4.6   6.3   10.2   19.7   31.3   42.9
Solution
For the corporate borrowing data, clearly all the pairs are concordant, and so $\tau$ is trivially equal to 1.
The R code for calculating a Kendall rank correlation coefficient for variables x and y is:
cor(x, y, method = "kendall")
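The counting of concordant and discordant pairs can also be sketched in R. The approach below (using `combn` to list every pair of observations and the product of signs to classify it) is one possible way of doing the count, shown here for the claims settlement data; the object names are illustrative.

```r
# Claims settlement data
x <- c(2.10, 2.40, 2.50, 3.20, 3.60, 3.80, 4.10, 4.20, 4.50, 5.00)
y <- c(2.18, 2.06, 2.54, 2.61, 3.67, 3.25, 4.02, 3.71, 4.38, 4.45)
n <- length(x)

# Each column of pair_idx is one pair (i, j) with i < j: n(n-1)/2 columns in total
pair_idx <- combn(n, 2)
i <- pair_idx[1, ]
j <- pair_idx[2, ]

# A pair is concordant if x and y move in the same direction, discordant otherwise
s   <- sign(x[i] - x[j]) * sign(y[i] - y[j])
n_c <- sum(s > 0)
n_d <- sum(s < 0)

tau_formula <- (n_c - n_d) / (n * (n - 1) / 2)
tau_builtin <- cor(x, y, method = "kendall")
```

With no ties in either variable, the hand formula and the built-in function should give the same value.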
1.3 Inference
To go further than a mere description/summary of the data, a model is required for the distribution of the underlying variables $(X, Y)$.
Inference under Pearson's correlation
The appropriate model is this: the distribution of $(X, Y)$ is bivariate normal, with parameters $\mu_X$, $\mu_Y$, $\sigma_X$, $\sigma_Y$ and $\rho$.
Assuming a bivariate normal distribution means that we have continuous data, each (marginal) distribution is also normal, the variance is constant and we have a linear relationship between X and Y. If any of these assumptions are not met then inference will give misleading results.
In the bivariate normal model, both variables are considered to be random. However, they are correlated, so their values are linked.
The bivariate normal model assumes that the values of $(X_i, Y_i)$ have a joint normal distribution with joint PDF $f_{X,Y}(x, y)$ $(-\infty < x, y < \infty)$ given by:
$$f_{X,Y}(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_X}{\sigma_X}\right)^2 - 2\rho\left(\frac{x-\mu_X}{\sigma_X}\right)\left(\frac{y-\mu_Y}{\sigma_Y}\right) + \left(\frac{y-\mu_Y}{\sigma_Y}\right)^2\right]\right\}$$
where $\rho$ is the correlation parameter, where $-1 < \rho < 1$.
In the case where $\rho = 0$, the cross term is zero and the PDF factorises into the product of the PDFs for two independent variables, one with a $N(\mu_X, \sigma_X^2)$ distribution and one with a $N(\mu_Y, \sigma_Y^2)$ distribution.
In the case where $\rho = \pm 1$, the bivariate distribution degenerates into a single line $\dfrac{y - \mu_Y}{\sigma_Y} = \pm\dfrac{x - \mu_X}{\sigma_X}$, ie the values of X and Y are directly linked.
If we integrate over all possible values of y to find the conditional expectation, we get the following result:
$$E(Y \mid X = x) = \mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X)$$
The important thing to note here is that the expression on the RHS is a linear function of x.
To assess the significance of any calculated r, the sampling distribution of this statistic is needed. The distribution of r is negatively skewed and has high spread/variability.
Two results are available.
Result 1
Under $H_0: \rho = 0$, $\dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$ has a t distribution with $\nu = n - 2$ degrees of freedom.
From this result a test of $H_0: \rho = 0$ (the hypothesis of no linear relationship between the variables) can be performed by working out the value of r which is significant at a given level of testing, or by finding the probability value of the observed r.
This result is given on page 25 of the Tables.
Question
Test $H_0: \rho = 0$ vs $H_1: \rho \neq 0$ for the claims settlement data. Recall that $r = 0.95824$.
Solution
For the given data $n = 10$ and $r = 0.958$. So the value of the test statistic is:
$$\frac{0.95824\sqrt{8}}{\sqrt{1 - 0.95824^2}} = 9.478$$
Under $H_0$ this should be a value from the $t_8$ distribution. The p-value of $2P(t_8 > 9.478)$ is less than 0.1%. We have extremely strong evidence to reject $H_0$ and conclude that $\rho \neq 0$.
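As a rough numerical check of the test above, the statistic and its two-sided p-value can be computed directly in R. This sketch takes the sample correlation and sample size quoted in the question as given; `t_stat` and `p_value` are illustrative names.

```r
# Values quoted in the question
r <- 0.95824
n <- 10

# Result 1: r * sqrt(n - 2) / sqrt(1 - r^2) follows t with n - 2 df under H0: rho = 0
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value, 2 * P(t_{n-2} > |t_stat|)
p_value <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)
```

Given the raw data vectors, `cor.test(x, y, method = "pearson")` reports the same statistic and p-value.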
Result 2 (Fisher's transformation of r)
This is a more general result: it is not restricted to the case $\rho = 0$.
If $W = \dfrac{1}{2}\ln\dfrac{1+r}{1-r}$, then W has (approximately) a normal distribution with mean $\dfrac{1}{2}\ln\dfrac{1+\rho}{1-\rho}$ and standard deviation $\dfrac{1}{\sqrt{n-3}}$.
This is usually referred to as the Fisher Z transformation (because the resulting z values are approximately normal). Accordingly, the letter Z is usually used.
Note that W can also be written as $\tanh^{-1} r$. This is the inverse hyperbolic tangent function, which, on modern Casio calculators, is accessed by pressing hyp and then choosing Option 6 to get $\tanh^{-1}$.
From the result on W, tests of $H_0: \rho = \rho_0$ can be performed. Confidence intervals for w and hence for $\rho$ can also be found.
This result is given on page 25 of the Tables.
Question
Considering the data on claims and settlements, carry out the test:
$H_0: \rho = 0.9$ vs $H_1: \rho > 0.9$
for the population of all claims/payments of this type.
Solution
For the given data:
$n = 10$, $r = 0.95824$, observed value of $W = \tanh^{-1} 0.95824 = 1.9239$
Under $H_0$, W has a normal distribution with mean $\tanh^{-1} 0.9 = 1.4722$ and standard deviation $1/\sqrt{10 - 3} = 0.37796$.
So:
$$P(W > 1.9239) = P\left(Z > \frac{1.9239 - 1.4722}{0.37796}\right) = P(Z > 1.195) = 0.12$$
So the p-value of $r = 0.958$ is about 0.12.
There is insufficient evidence to justify rejecting $H_0$, which can stand.
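The Fisher transformation test above can be sketched in R using `atanh` for the inverse hyperbolic tangent. The confidence interval at the end is an added illustration of the remark that intervals for the correlation parameter can be found by back-transforming with `tanh`; the 95% level and all object names are assumptions made for this sketch.

```r
# Values from the question
r    <- 0.95824
n    <- 10
rho0 <- 0.9

w   <- atanh(r)          # observed value of W = (1/2) ln((1 + r) / (1 - r))
mu0 <- atanh(rho0)       # mean of W under H0: rho = 0.9
se  <- 1 / sqrt(n - 3)   # approximate standard deviation of W

# One-sided test of H0: rho = 0.9 vs H1: rho > 0.9
z       <- (w - mu0) / se
p_value <- pnorm(z, lower.tail = FALSE)

# Approximate 95% confidence interval for rho (illustrative level)
ci <- tanh(w + c(-1, 1) * qnorm(0.975) * se)
```

The interval is not symmetric about r because the transformation is non-linear.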
Notes:
(a) The bivariate normal assumption.
The presence of outliers (data points far away from the main body of the data) may indicate that the distributional assumption underlying the above methods is highly questionable.
(b) Influence
Just as a single observation can have a marked effect on the value of a sample mean and standard deviation, so a single observation separated from the bulk of the data can have a marked effect on the value of a sample correlation coefficient.
The R code for carrying out any hypothesis test using the Pearson correlation coefficient for variables x and y is:
cor.test(x, y, method = "pearson")
Inference under Spearman's rank correlation
Since we are using ranks rather than the actual data, no assumption is needed about the distribution of X, Y or $(X, Y)$, ie it is a non-parametric test.
However, non-parametric tests are less powerful than parametric tests (ie ones that do assume a distribution) as we have less information. So we would need to obtain a more extreme result before we are able to reject $H_0$. On the plus side, the test is less affected by outliers.
Under a null hypothesis of no association/no monotonic relationship between X and Y the sampling distribution of $r_s$ can (for small values of n) be determined precisely using permutations. This does not have the form of a common statistical distribution.
For example, if we had a sample size of 4, there would be $4! = 24$ ways of arranging the ranks of the Y variables, so each arrangement has a probability of $\frac{1}{24}$ of occurring. We then calculate $\sum d^2$ for each arrangement and hence obtain the probabilities of getting each value of $\sum d^2$.
We can then carry out a hypothesis test. For example, if we are testing $H_0: \rho = 0$ vs $H_1: \rho > 0$ with a 5% significance level, where the data values give $\sum d^2 = 3$, we can calculate $P\left(\sum d^2 \le 3\right)$ using the probabilities obtained above and if we get less than 5% we would reject $H_0$. However, for large n this will be time consuming.
For larger values of n ($> 20$) we can use Result 1 from above.
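The permutation argument for a sample of size 4 can be made concrete with a short sketch in R. The helper `perms` below is a hypothetical function written for this illustration (base R has no built-in permutation generator); it lists all 24 arrangements of the Y ranks so that the exact null distribution of the sum of squared rank differences can be tabulated.

```r
# Hypothetical helper: all permutations of a vector, one per row (base R only)
perms <- function(v) {
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  out <- NULL
  for (i in seq_along(v)) {
    out <- rbind(out, cbind(v[i], perms(v[-i])))
  }
  out
}

n      <- 4
rank_x <- 1:n
all_y  <- perms(1:n)   # every possible arrangement of the Y ranks

# Sum of squared rank differences for each arrangement
sum_d2 <- apply(all_y, 1, function(rank_y) sum((rank_x - rank_y)^2))

# Exact null distribution: each arrangement has probability 1/24
dist    <- table(sum_d2) / nrow(all_y)
p_lower <- cumsum(dist)   # lower-tail probabilities, as used for H1: rho > 0
```

Small values of the sum correspond to strong positive association, which is why the lower tail is the relevant one for a one-sided test against positive correlation.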
So, under the null hypothesis that the variables are uncorrelated:
$$\frac{r_s\sqrt{n-2}}{\sqrt{1-r_s^2}} \mathrel{\dot\sim} t_{n-2} \quad \text{provided } n > 20$$
Recall that Spearman's rank correlation coefficient is derived by applying Pearson's correlation coefficient to the ranks rather than the original data.
The limiting normal distribution has a mean of 0 and a variance of $1/(n-1)$.
This means that for very large values of n, the sampling distribution of $r_s$ can be approximated by the $N\left(0, \frac{1}{n-1}\right)$ distribution.
The R code for carrying out a hypothesis test using Spearman's rank correlation coefficient for variables x and y is:
cor.test(x, y, method = "spearman")
Inference under Kendall's rank correlation
Again, since we are using ranks, we have a non-parametric test.
Under the null hypothesis of independence of X and Y, the sampling distribution of $\tau$ can be determined precisely using permutations for small values of n.
We can carry out a hypothesis test in the same way as described above but calculating $n_c - n_d$ for each arrangement. However, again, for large n this will be time consuming.
For larger values of n ($> 10$), use of the Central Limit Theorem means that an approximate normal distribution can be used, with mean 0 and variance $\dfrac{2(2n+5)}{9n(n-1)}$.
The R code for carrying out a hypothesis test using the Kendall rank correlation coefficient for variables x and y is:
cor.test(x, y, method = "kendall")
Note that cor.test will determine exact p-values if $n < 50$ (ignoring tied values); for larger samples the test statistic is approximately normally distributed.
Question
An actuary wants to investigate if there is any correlation between students' scores in the CS1 mock exam and the CS2 mock exam. Data values from 22 students were collected and the results are:

Student           1   2   3   4   5   6   7   8   9  10  11
CS1 mock score   51  43  39  80  56  57  26  68  54  75  72
CS2 mock score   52  42  58  56  47  72  16  63  48  80  68

Student          12  13  14  15  16  17  18  19  20  21  22
CS1 mock score   85  48  27  63  76  64  55  78  82  52  60
CS2 mock score   82  54  38  57  71  50  45  60  59  49  61

You are given that $\sum d^2 = 494$, $n_c = 174$ and $n_d = 57$.
Test $H_0: \rho = 0$ vs $H_1: \rho > 0$ for the mock score data using the Spearman's rank correlation coefficient and the Kendall's rank correlation coefficient.
Solution
For the given data values:
$n = 22$
$$r_s = 1 - \frac{6\sum d^2}{n(n^2-1)} = 1 - \frac{6 \times 494}{22(22^2 - 1)} = 0.72106$$
$$\tau = \frac{n_c - n_d}{n(n-1)/2} = \frac{174 - 57}{(22 \times 21)/2} = 0.50649$$
Under $H_0$, $\dfrac{r_s\sqrt{n-2}}{\sqrt{1-r_s^2}} \mathrel{\dot\sim} t_{n-2}$.
The observed value of the test statistic is:
$$\frac{0.72106\sqrt{20}}{\sqrt{1 - 0.72106^2}} = 4.654$$
This exceeds even the upper 0.05% point of $t_{20}$ (which is 3.850). Percentage points for the t distribution are given on page 163 of the Tables.
So we have very strong evidence to reject $H_0$, and we conclude that the mock scores in CS1 and CS2 are positively correlated.
Under $H_0$, the sampling distribution of Kendall's rank correlation coefficient is approximately normal with mean 0 and variance $\dfrac{2(2n+5)}{9n(n-1)}$. The observed value of the test statistic is:
$$\frac{0.50649 - 0}{\sqrt{\dfrac{2 \times 49}{9 \times 22 \times 21}}} = 3.299$$
This exceeds the upper 0.05% point of the standard normal distribution (which is 3.2905). Percentage points for the standard normal distribution are given on page 162 of the Tables.
So we have very strong evidence to reject $H_0$, and we conclude that the mock scores in CS1 and CS2 are positively correlated.
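Both test statistics in this solution can be reproduced from the summary figures given in the question. The sketch below in R works only from those figures (it does not recompute them from the raw scores), and the object names are illustrative.

```r
# Summary values given in the question
n      <- 22
sum_d2 <- 494
n_c    <- 174
n_d    <- 57

# Spearman: coefficient, t statistic on n - 2 df, and one-sided p-value
rs         <- 1 - 6 * sum_d2 / (n * (n^2 - 1))
t_stat     <- rs * sqrt(n - 2) / sqrt(1 - rs^2)
p_spearman <- pt(t_stat, df = n - 2, lower.tail = FALSE)

# Kendall: coefficient, normal approximation with variance 2(2n + 5) / (9n(n - 1))
tau       <- (n_c - n_d) / (n * (n - 1) / 2)
z_stat    <- tau / sqrt(2 * (2 * n + 5) / (9 * n * (n - 1)))
p_kendall <- pnorm(z_stat, lower.tail = FALSE)
```

Both p-values fall below 0.05%, in line with the comparison against the tabulated percentage points.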
2 Multivariate correlation analysis
So far, we have only considered bivariate data. In most practical applications, there are many variables to consider. We now consider the case $(X, Y)$, where Y remains the variable of interest, but X is now a vector of possible explanatory variables.
For example, in motor insurance we may wish to see the connection between the claim amounts and a number of explanatory variables such as age, number of years driving, size of the engine and annual number of miles driven.
2.1 Data visualisation
Again, the starting point is always to visualise the data. For multivariate cases it is no problem for a computer package to plot a scattergraph matrix, ie scattergraphs between each pair of variables to make the relationships between them clear.
The R code to draw scatterplots for all pairs from a multivariate data frame, <data>, is:
plot(<data>)
or it is possible to use:
pairs(<data>)
Now let's look at a set of multivariate data.
Consider a set of equity returns from four different markets across 12 time periods (X).

Market 1   Market 2   Market 3   Market 4
(Mkt_1)    (Mkt_2)    (Mkt_3)    (Mkt_4)
0.83%      4.27%      1.79%      0.39%
0.12%      3.72%      0.90%      0.26%
5.49%      5.21%      4.62%      5.67%
2.75%      6.26%      3.38%      1.40%
5.68%      7.37%      5.21%      5.05%
3.70%      1.60%      2.34%      2.66%
5.75%      5.08%      6.03%      5.48%
1.03%      1.38%      2.37%      1.47%
0.69%      0.17%      0.38%      0.10%
4.03%      3.26%      3.04%      2.59%
0.54%      2.22%      1.42%      1.37%
3.03%      9.47%      2.95%      2.99%
We can use R to obtain a matrix of scatterplots, by plotting the market returns in pairs. We wish to consider the relationship between Market 4 and the other markets.
This gives the following scattergraph matrix:
The bottom row has Market 4 as the response variable with the other three markets as the explanatory variables. We can see that there appear to be positive linear relationships between the response variable and explanatory variables.
There are strong positive linear relationships between Market 4 and explanatory variables Markets 1 and 3. Since Markets 1 and 3 move together there may be some overlap between their influence on Market 4.
We will look at how we can strip this overlap out in the Principal Components Analysis (PCA) section later in this chapter.
2.2 Sample correlation coefficient matrix
Similarly it is no problem for a computer package to calculate correlation coefficients between each pair of variables and display them in a matrix.
The R code for calculation of a Pearson correlation coefficient matrix for a multivariate numeric data frame <data> is:
cor(<data>, method = "pearson")
We can also use R on the equity return data to obtain the Pearson correlation coefficient matrix.
The Pearson correlation coefficient matrix for the four markets as produced in R output is:

          Mkt_1      Mkt_2      Mkt_3      Mkt_4
Mkt_1 1.0000000  0.6508163  0.9538019  0.9727972
Mkt_2 0.6508163  1.0000000  0.5321185  0.6893932
Mkt_3 0.9538019  0.5321185  1.0000000  0.9681911
Mkt_4 0.9727972  0.6893932  0.9681911  1.0000000

Notice that the diagonal elements are all 1. That's because there is perfect correlation between, say, Market 1 and Market 1. Notice also that it is symmetrical as $\text{corr}(X, Y) = \text{corr}(Y, X)$.
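The two properties just noted (unit diagonal and symmetry) hold for any correlation matrix. The sketch below in R illustrates this on a small simulated data frame; the data and the column names `Mkt_A`, `Mkt_B` and `Mkt_C` are hypothetical and are not the equity returns from the example.

```r
# Hypothetical data: two series sharing a common driver, plus one unrelated series
set.seed(1)
common  <- rnorm(12)
returns <- data.frame(
  Mkt_A = common + rnorm(12, sd = 0.3),
  Mkt_B = common + rnorm(12, sd = 0.3),
  Mkt_C = rnorm(12)
)

# Pearson correlation matrix for every pair of columns
corr <- cor(returns, method = "pearson")
```

Calling `pairs(returns)` on the same data frame would draw the corresponding scattergraph matrix.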
2.3 Inference
We can carry out tests on the correlation for each pair of variables using the methods described in Section 1.3.
3 Principal component analysis
Principal component analysis is most easily tackled using a computer. In this section the Core Reading runs through the theory and gives an example, but this topic will be dealt with in more depth in the Paper B Online Resources (PBOR).
Until now we have considered the variables in separate pairs, but in practice the amount of analysis required in this approach grows rapidly with each additional variable (p variables give $p(p-1)/2$ pairs).
Principal component analysis (PCA), also called factor analysis, provides a method for reducing the dimensionality of the data set, X. In other words, it seeks to identify the key components necessary to model and understand the data.
For many multivariate datasets there is correlation between each of the variables. This means there is some overlap between the information that each of the variables provide. The technical phrase is that there is redundancy in the data. PCA gives us a process to remove this overlap.
The idea is that we create new uncorrelated variables, and we should find that only some of these new variables are needed to explain most of the variability observed in the data. The key thing is that each new variable is a linear combination of the old variables, so if we eliminate any of the new variables we are still retaining the most important bits of information.
We then rewrite the data in terms of these new variables, which are called principal components.
These components are chosen to be uncorrelated linear combinations of the variables of the data which maximise the variance.
The next section of Core Reading explains the process of how a PCA is carried out and contains some matrix theory. In parallel with the text, we will work through a simple matrix as an example so that you can see what is happening. The Core Reading starts with a reminder of how to determine eigenvectors and eigenvalues. This is assumed knowledge for the actuarial exams.
The eigenvalues of matrix A are the values $\lambda$ such that $\det(A - \lambda I) = 0$ where I is the identity matrix. The corresponding eigenvector, v, of an eigenvalue $\lambda$ satisfies the equation $(A - \lambda I)v = 0$.
Consider an $n \times p$ centred data matrix X. Using standard techniques from linear algebra, define W as a $p \times p$ matrix, whose columns are the eigenvectors of $X^TX$. The intuition for doing this is that $X^TX$ represents the (scaled) covariance of the data.
Here p represents the number of variables and n represents the number of observations of each variable. In a centred data matrix, the entries in each column have a mean of zero. We can obtain a centred matrix from the original matrix by subtracting the appropriate column mean from each entry. The sample variance/covariance matrix is $X^TX$ divided by $(n-1)$.
Suppose we are trying to model the chances of a student passing the CS1 exam. We are going to include in our model the average number of days per week each student does some studying ($X_1$) and the average number of hours each student studies at the weekend ($X_2$). The data values for one student are $x_1 = 2$, $x_2 = 10$ and for another student we have $x_1 = 4$, $x_2 = 6$. The original data matrix is therefore:
$$\begin{pmatrix} 2 & 10 \\ 4 & 6 \end{pmatrix}$$
The mean of the entries in the first column is 3 and the mean of the entries in the second column is 8, so the centred matrix is:
$$X = \begin{pmatrix} -1 & 2 \\ 1 & -2 \end{pmatrix}$$
We now need to calculate $X^TX$:
$$X^TX = \begin{pmatrix} -1 & 1 \\ 2 & -2 \end{pmatrix}\begin{pmatrix} -1 & 2 \\ 1 & -2 \end{pmatrix} = \begin{pmatrix} 2 & -4 \\ -4 & 8 \end{pmatrix}$$
We can see that this is the covariance matrix for the data in X. The variance of the data set $(-1, 1)$ is 2, the variance of the data set $(2, -2)$ is 8 and the covariance between the data sets is calculated as follows:
$$\sum_{j=1}^{n}(x_{j1} - \bar{x}_1)(x_{j2} - \bar{x}_2) = \sum_{j=1}^{2} x_{j1}x_{j2} = (-1)(2) + (1)(-2) = -4$$
Here the scaled covariance is the same as the non-scaled covariance because the sample size is 2 so $n - 1 = 1$.
Next we determine the eigenvalues, and from there the eigenvectors, for the matrix $X^TX$:
$$\begin{vmatrix} 2-\lambda & -4 \\ -4 & 8-\lambda \end{vmatrix} = 0 \;\Rightarrow\; (2-\lambda)(8-\lambda) - 16 = 0 \;\Rightarrow\; \lambda^2 - 10\lambda = 0 \;\Rightarrow\; \lambda = 0 \text{ or } 10$$
When $\lambda = 0$:
$$\begin{pmatrix} 2 & -4 \\ -4 & 8 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \;\Rightarrow\; \begin{aligned} 2x - 4y &= 0 \\ -4x + 8y &= 0 \end{aligned} \;\Rightarrow\; x = 2y \;\Rightarrow\; \text{one eigenvector is } \begin{pmatrix} 2 \\ 1 \end{pmatrix}$$
When $\lambda = 10$:
$$\begin{pmatrix} -8 & -4 \\ -4 & -2 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \;\Rightarrow\; \begin{aligned} -8x - 4y &= 0 \\ -4x - 2y &= 0 \end{aligned} \;\Rightarrow\; y = -2x \;\Rightarrow\; \text{one eigenvector is } \begin{pmatrix} 1 \\ -2 \end{pmatrix}$$
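The unit eigenvectors are:
$$\frac{1}{\sqrt{2^2 + 1^2}}\begin{pmatrix} 2 \\ 1 \end{pmatrix} = \frac{1}{\sqrt{5}}\begin{pmatrix} 2 \\ 1 \end{pmatrix} \qquad \frac{1}{\sqrt{1^2 + (-2)^2}}\begin{pmatrix} 1 \\ -2 \end{pmatrix} = \frac{1}{\sqrt{5}}\begin{pmatrix} 1 \\ -2 \end{pmatrix}$$
By definition:
$$W = \frac{1}{\sqrt{5}}\begin{pmatrix} 1 & 2 \\ -2 & 1 \end{pmatrix}$$
The principal components decomposition P of X is then defined as $P = XW$:
$$P = XW = \begin{pmatrix} -1 & 2 \\ 1 & -2 \end{pmatrix}\frac{1}{\sqrt{5}}\begin{pmatrix} 1 & 2 \\ -2 & 1 \end{pmatrix} = \frac{1}{\sqrt{5}}\begin{pmatrix} -5 & 0 \\ 5 & 0 \end{pmatrix} = \begin{pmatrix} -\sqrt{5} & 0 \\ \sqrt{5} & 0 \end{pmatrix}$$
It is obvious that the second column doesn't provide any useful information but we consider the deletion of components below.
We have now transformed the data into a set of p orthogonal components. In order to make this more useful, ensure that the eigenvectors in W are ranked by the largest eigenvalue, ie the components which have the most explanatory power come first.
W is orthogonal if $W^{-1} = W^T$. It follows from this definition that the columns of W are orthogonal vectors, each with magnitude 1. In our example we did construct W ranked by the largest eigenvalue.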
The goal is now to move on from simply transforming the data, and instead to use fewer than p components, so that we reduce the dimensionality of the problem. By eliminating those components with the least explanatory power we won't sacrifice too much information.
To assess the explanatory power of each component, consider $S = P^TP$. This is a diagonal matrix where each diagonal element is the (scaled) variance of each component of the transformed data (the covariance between components is zero by construction).
$$P^TP = \begin{pmatrix} -\sqrt{5} & \sqrt{5} \\ 0 & 0 \end{pmatrix}\begin{pmatrix} -\sqrt{5} & 0 \\ \sqrt{5} & 0 \end{pmatrix} = \begin{pmatrix} 10 & 0 \\ 0 & 0 \end{pmatrix}$$
Recall that $X^TX$ gives the (scaled) covariance of the data using the original variables. Hence $P^TP$ gives the (scaled) covariance of the data using the new variables (components). Since the components are uncorrelated, the covariances between them are zero. The diagonal elements give the (scaled) variances of each component (the values in matrix P). The sample variances are the diagonal elements divided by $(n-1)$, which in this example is just 1. Incidentally, it should be noted that the sample variances are equal to the corresponding eigenvalues.
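The steps of the two-student example can be followed in R with base matrix functions. This is a sketch only: `eigen` may return an eigenvector with the opposite sign to the one chosen by hand, so individual entries of W and P can differ in sign from those shown, although $P^TP$ is unaffected. The object names are illustrative.

```r
# Original data matrix: one row per student, one column per variable
raw <- matrix(c(2, 10,
                4,  6), nrow = 2, byrow = TRUE)

# Centre each column by subtracting its mean
X <- scale(raw, center = TRUE, scale = FALSE)

# (Scaled) covariance matrix and its eigen-decomposition
XtX <- t(X) %*% X
eig <- eigen(XtX, symmetric = TRUE)   # eigenvalues come back in decreasing order

W <- eig$vectors      # unit eigenvectors as columns, largest eigenvalue first
P <- X %*% W          # principal components decomposition
S <- t(P) %*% P       # diagonal matrix of (scaled) component variances
```

The diagonal of `S` should reproduce the eigenvalues 10 and 0 found by hand.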
For a given q, it is useful to consider the proportion of the variance that is explained by the first q principal components. This is given by the sum of the first q diagonal elements of S divided by the sum of all the diagonal elements. It is often the case that even for large data sets with many variables, the first two or three principal components explain 90% or even more of the total variance which is sufficient to model the situation.
In our example, 100% of the variance is explained by the first component.
There's no hard and fast rule for deciding exactly how many principal components we should use. One criterion is to keep as many components as necessary in order to explain at least 90% of the total variance. Other criteria are covered after the Core Reading example on the following pages and will be considered further in PBOR.
Since W is orthogonal by construction, $X = PW^T$.
This allows us to reconstruct the original (centred) data using all or just the reduced number of components.
In the general case, we would set the columns in P of the components we are eliminating to zero.
The R function for PCA on a numeric data frame <data> is:
prcomp(<data>)
Technically this uses the more numerically stable method of singular value decomposition (SVD) of the data matrix which obtains the same answers as using the eigenvalues of the covariance matrix.
An alternative R function for PCA is princomp(<data>) which does use eigenvalues but also uses n rather than n - 1 as the divisor in the variance/covariance matrix. Hence it will give slightly different results to prcomp(<data>).
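As an illustration of `prcomp` and of the reconstruction $X = PW^T$, the sketch below applies it to the small two-student example. The data frame and its column names `X1` and `X2` are chosen here for illustration; in the fitted object, `rotation` plays the role of W and `x` plays the role of P.

```r
# Two-student example as a data frame
raw <- data.frame(X1 = c(2, 4), X2 = c(10, 6))

pca <- prcomp(raw)          # centres the data by default; no scaling

# Sample variances of the components (divisor n - 1)
sample_var <- pca$sdev^2

# Reconstruct the centred data from all components: X = P W^T
X_centred <- scale(raw, center = TRUE, scale = FALSE)
X_rebuilt <- pca$x %*% t(pca$rotation)

# Keep only the first component by zeroing the other columns of P
P_reduced <- pca$x
P_reduced[, -1] <- 0
X_rank1 <- P_reduced %*% t(pca$rotation)
```

In this example the second component has zero variance, so the reduced reconstruction loses nothing; with real data it would only approximate the centred matrix.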
Notes:
1. Since the principal components are linear combinations of the variables it is not useful for reducing the dimensionality where there are non-linear relationships. A suitable transformation (such as log) should be applied first.
2. Since the loadings of each variable that make up the components are chosen by maximising variance, variables that have the highest variance will be given more weight. It is often good practice (especially if different units of measurement are used for each variable) to scale the data before applying PCA.
3. No explanation has been provided for what these components represent in a practical, real-world sense. Intuitively, the first component is the overall trend of the data. For the second component onwards, intuition for this must be sought elsewhere. This is often done by regressing the components against variables external to the data which the statistical analyst has an a priori cause to believe may have explanatory power.
There is now a Core Reading example based on the equity returns data from Section 2.1. It is very hard to check these figures manually due to the amount of data. We recommend you try to follow what is being done without attempting to check the numbers.
Consider our set of equity returns, X, from four different markets across the 12 time periods.
We obtain X by first centering the data (ie by subtracting the means of each market).
$$X^TX = \begin{pmatrix} 0.01431 & 0.01310 & 0.01308 & 0.01249 \\ 0.01310 & 0.02830 & 0.01026 & 0.01245 \\ 0.01308 & 0.01026 & 0.01315 & 0.01192 \\ 0.01249 & 0.01245 & 0.01192 & 0.01153 \end{pmatrix}$$
This is the variance/covariance matrix of the centred data. Since we have 12 observations we need to divide each of these figures by $(n-1) = 11$ to get the sample variance/covariances.
The eigenvectors are:
$$W = \begin{pmatrix} 0.48118 & -0.33488 & 0.80202 & -0.11440 \\ 0.62118 & 0.77332 & -0.06531 & -0.10879 \\ 0.43394 & -0.47122 & -0.53559 & -0.55026 \\ 0.44078 & -0.26035 & -0.25621 & 0.81993 \end{pmatrix}$$
These eigenvectors have magnitude of 1.
Hence the
principal
component
-
0.02822
?
0.01630
0.00286
0.02875
0.09663
0.01781
0.00049
0.07401
0.00291
0.00630
0.00195
0.04243
?
0.04287
is:
0.01374
0.00237
0.11079
PXW==
decomposition
--
---
0.02115
0.11673 -- 0.01947
0.02033 -- 0.02593
0.01066 -- 0.00197
0.05706
0.01253
0.00110 ??
0.00732 ??
-
0.00744
??
-
??
0.00639
-??
?--0.10657
0.04458
0.00350
0.00116 ??
0.00145??
??
0.00113??
??
0.00172??0.00545 ?
-
0.00828
0.00166??
0.00358??
--
-0.00169
- 0.00545
0.00571
0.00543
?-- 0.03578
??
??
0.00084
0.00219
?
?
0.00369 ?
Now consider:
$$S = P^TP = \begin{pmatrix} 0.05445 & 0 & 0 & 0 \\ 0 & 0.01218 & 0 & 0 \\ 0 & 0 & 0.00051 & 0 \\ 0 & 0 & 0 & 0.00013 \end{pmatrix}$$
This is the (scaled) variance/covariance matrix of the principal components. The diagonal elements are the scaled variances (the sample variances are these figures divided by 11) and the other elements are the scaled covariances (which are all zero as the components are uncorrelated).
The total (scaled) variance is the sum of the diagonals, which is 0.06727.
We can now calculate how much of this total variance each principal component explains.
The first principal component explains 80.9% of the total variance, the first two 99.0%, and the first three 99.8%.
We obtain these figures as follows:
$$\frac{0.05445}{0.06727} = 80.9\%, \qquad \frac{0.05445 + 0.01218}{0.06727} = 99.0\%, \qquad \frac{0.05445 + 0.01218 + 0.00051}{0.06727} = 99.8\%$$
It would therefore seem reasonable in this example to reduce the dimensionality to 2, using the first two components of $P$.
The decision criterion used here is to choose principal components that explain at least 90% of the total variance.
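The same percentages can be reproduced in a few lines of R. This is a minimal sketch that takes the four scaled variances from the diagonal of $S$ as given; the names v, cum_prop and k are illustrative, and the 90% threshold is the criterion described above.

```r
# Scaled variances of the four principal components (diagonal of S)
v <- c(0.05445, 0.01218, 0.00051, 0.00013)

# Cumulative proportion of the total variance explained
cum_prop <- cumsum(v) / sum(v)
round(100 * cum_prop, 1)

# Smallest number of components explaining at least 90% of the variance
k <- which(cum_prop >= 0.90)[1]
k
```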
We would now continue our modelling using methods such as linear regression and GLMs on this reduced data set.
Whilst the first component is the trend, the second component will need to be regressed against one or several variables to determine an intuitive explanation.
To choose which components to keep we could also use a Scree test. This involves the examination of a line chart of the variances of each component (called a Scree diagram). The Scree test retains only those principal components before the variances level off (which is easy to observe from the Scree diagram).
For the Core Reading example, the Scree diagram is:
Since the scree plot levels off after the first two components, this would imply that these two components are enough.
A further alternative is the Kaiser test. This suggests only using components with variances greater than 1. This method is only suitable if the data values are scaled and hence is not appropriate here, as the data has only been centred (not scaled).
Chapter 11 Summary
Exploratory data analysis (EDA) is the process of analysing data to gain further insight into the nature of the data, its patterns and relationships between the variables, before any formal statistical techniques are applied.
Scatterplots are the first step to visualise the data and assess the shape of any correlation between a pair of variables. The strength of that correlation is measured by the sample correlation coefficient, which takes a value from -1 to +1.
Pearson's correlation coefficient
Measures the strength of linear relationship between $x$ and $y$.
$$r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}$$
We can carry out hypothesis tests on the true Pearson correlation coefficient, $\rho$, between two variables using the t result, the Fisher Z test or permutations.
Under $H_0: \rho = 0$:
$$\frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t_{n-2}$$
Otherwise (approximately):
$$\tanh^{-1} r \sim N\left(\tanh^{-1}\rho, \ \frac{1}{n-3}\right)$$
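Both tests can be sketched in R. The example below uses the gestation/weight data from Practice Question 11.1 and, for the Fisher test, the null value $\rho = 0.9$ from part (v) of that question; the variable names are our own, and cor.test() is used only as a cross-check of the hand-computed t statistic.

```r
# Data from Practice Question 11.1
x <- seq(30, 40, by = 2)               # gestation period (weeks)
y <- c(1.6, 1.7, 2.5, 2.8, 3.2, 3.5)   # estimated baby weight (kg)
n <- length(x)

r <- cor(x, y)                         # Pearson's sample correlation

# t result: test of H0: rho = 0 against H1: rho > 0
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_t    <- pt(t_stat, df = n - 2, lower.tail = FALSE)

# Fisher Z test of H0: rho = rho0 against H1: rho > rho0
rho0   <- 0.9
z_stat <- (atanh(r) - atanh(rho0)) * sqrt(n - 3)
p_z    <- pnorm(z_stat, lower.tail = FALSE)

# Built-in check of the t test
ct <- cor.test(x, y, alternative = "greater")
c(r = r, t = t_stat, z = z_stat)
```

Here atanh() is R's inverse hyperbolic tangent, so atanh(r) is the Fisher transform of $r$.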
Spearman's rank correlation coefficient
Measures the strength of monotonic relationship. Uses Pearson's formula but with ranks. If there are no ties in the data:
$$r_s = 1 - \frac{6\sum_i d_i^2}{n(n^2-1)} \qquad \text{where } d_i = r(X_i) - r(Y_i)$$
For inference use permutations or, for $n > 20$, Pearson's formulae with ranks.
For very large values of $n$, the sampling distribution of $r_s$ can be approximated by the $N\left(0, \frac{1}{n-1}\right)$ distribution.
Kendall's rank correlation coefficient
Measures the strength of dependence of rank correlation. If there are no ties in the data, then:
$$\tau = \frac{n_c - n_d}{n(n-1)/2}$$
where $n_c$ is the number of concordant pairs (where the ranks of both elements agree) and $n_d$ is the number of discordant pairs.
For inference use permutations or, for $n > 10$, a $N\left(0, \frac{2(2n+5)}{9n(n-1)}\right)$ distribution.
Principal components analysis
Principal component analysis (PCA) is a method for reducing the dimensionality of a data set, $X$, by identifying the key components necessary to model and understand the data. These components are chosen to be uncorrelated linear combinations of the variables of the data which maximise the variance.
The principal components decomposition, $P$, of $X$ (an $n \times p$ centred data matrix) is defined to be $P = XW$, where $W$ is a $p \times p$ matrix whose columns are the eigenvectors of the matrix $X^TX$.
The (scaled) covariance $S = P^TP$ gives the explanatory power of each component.
We might then decide to retain components one by one until some target percentage (eg 90%) of the total variance has been explained. Alternatively, we could use a Scree diagram or the Kaiser test to help us decide which components to keep.
Chapter 11 Practice Questions
11.1
A new computerised ultrasound scanning technique has enabled doctors to monitor the weights of unborn babies. The table below shows the estimated weights for one particular baby at fortnightly intervals during the pregnancy.

Gestation period (weeks)      30    32    34    36    38    40
Estimated baby weight (kg)    1.6   1.7   2.5   2.8   3.2   3.5

$\sum x = 210$, $\sum x^2 = 7,420$, $\sum y = 15.3$, $\sum y^2 = 42.03$, $\sum xy = 549.8$
(i) Show that $S_{xx} = 70$, $S_{yy} = 3.015$ and $S_{xy} = 14.3$.
(ii) Show that the (Pearson's) linear correlation coefficient is equal to 0.984 and comment.
(iii) Explain why the Spearman's and Kendall's rank correlation coefficients are both equal to 1.
(iv) Carry out a test of $H_0: \rho = 0$ vs $H_1: \rho > 0$ using Pearson's correlation coefficient and:
(a) the t test
(b) Fisher's transformation.
(v) Test whether Pearson's sample correlation coefficient supports the hypothesis that the true correlation parameter is greater than 0.9.
11.2
Exam style
A schoolteacher is investigating the claim that class size does not affect GCSE results. His observations of nine GCSE classes are as follows:

Class                                     X1    X2    X3    X4    Y1    Y2    Y3    Y4    Y5
Students in class (c)                     35    32    27    21    34    30    28    24    7
Average GCSE point score for class (p)    5.9   4.1   2.4   1.7   6.3   5.3   3.5   2.6   1.6

$\sum c = 238$, $\sum c^2 = 6,884$, $\sum p = 33.4$, $\sum p^2 = 149.62$, $\sum cp = 983$
(i)
(a) Calculate Pearson's, Spearman's and Kendall's correlation coefficients.
(b) Use Pearson's correlation coefficient to test whether or not the data agree with the claim that class size does not affect GCSE results.
[10]
Following his investigation, the teacher concludes, "bigger class sizes improve GCSE results".
(ii) Comment on this statement. [2]
[Total 12]
11.3
Exam style
A university wishes to analyse the performance of its students on a particular degree course. It records the scores obtained by a sample of 12 students at entry to the course, and the scores obtained in their final examinations by the same students. The results are as follows:

Student                      A    B    C    D    E    F    G    H    I    J    K    L
Entrance exam score x (%)    86   53   71   60   62   79   66   84   90   55   58   72
Finals paper score y (%)     75   60   74   68   70   75   78   90   85   60   62   70

$\sum x = 836$, $\sum y = 867$, $\sum x^2 = 60,016$, $\sum y^2 = 63,603$, $\sum (x - \bar{x})(y - \bar{y}) = 1,122$
(i)
(a) Explain why Spearman's and Kendall's rank correlation coefficients cannot be calculated here using the simplified formula.
(b) Calculate the Pearson's correlation coefficient.
[3]
(ii) Test whether this data comes from a population with Pearson's correlation coefficient equal to 0.75. [3]
[Total 6]
Chapter 11 Solutions
11.1
(i) Calculate summary statistics
$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 7,420 - \frac{210^2}{6} = 70$$
$$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 42.03 - \frac{15.3^2}{6} = 3.015$$
$$S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 549.8 - \frac{210 \times 15.3}{6} = 14.3$$
(ii) Calculate (Pearson's) linear correlation coefficient and comment
Using the results from part (i):
$$r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}} = \frac{14.3}{\sqrt{70 \times 3.015}} = 0.984336$$
There is a strong linear association between gestation period and foetal weight.
(iii) Explain why the Spearman and Kendall rank correlation coefficients are both equal to 1
The ranks of the two variables (gestation period and weight) are exactly equal, hence Spearman's rank correlation coefficient is equal to 1.
This means that all the pairs are concordant, and so $\tau$ is also equal to 1.
(iv)(a) Test $\rho > 0$ using Pearson's correlation coefficient and the t test
We are testing $H_0: \rho = 0$ vs $H_1: \rho > 0$.
If $H_0$ is true, then the test statistic $\dfrac{r\sqrt{4}}{\sqrt{1-r^2}}$ has a $t_4$ distribution.
The observed value of this statistic is $\dfrac{0.984336 \times 2}{\sqrt{1 - 0.984336^2}} = 11.17$. This is much greater than 8.610, the upper 0.05% point of the $t_4$ distribution.
So, we reject $H_0$ at the 0.05% level and conclude that there is very strong evidence that $\rho > 0$, ie that there is a positive linear correlation between the baby's weight and gestation period.
(iv)(b) Test $\rho > 0$ using Pearson's correlation coefficient and Fisher's transformation
If $H_0$ is true, then the test statistic $Z_r = \tanh^{-1} r$ has a $N\left(0, \frac{1}{3}\right)$ distribution.
The observed value of this statistic is $\tanh^{-1} 0.984336 = 2.4208$, which corresponds to a value of $\dfrac{2.4208}{\sqrt{1/3}} = 4.193$ on the $N(0,1)$ distribution. This is much greater than 3.090, the upper 0.1% point of the standard normal distribution.
So, we reject $H_0$ at the 0.1% level and conclude that there is very strong evidence that $\rho > 0$, ie that there is a positive correlation between the baby's weight and gestation period.
(v) Test whether Pearson's sample correlation coefficient supports $\rho > 0.9$
We are testing $H_0: \rho = 0.9$ vs $H_1: \rho > 0.9$.
If $H_0$ is true, then the test statistic $Z_r$ has a $N\left(z_\rho, \frac{1}{3}\right)$ distribution, where $z_\rho = \tanh^{-1} 0.9 = 1.4722$.
The observed value of this statistic is $\tanh^{-1} 0.984336 = 2.4208$, which corresponds to a value of $\dfrac{2.4208 - 1.4722}{\sqrt{1/3}} = 1.643$ on the $N(0,1)$ distribution. This is just less than 1.645, the upper 5% point of the standard normal distribution.
So, we cannot reject $H_0$ at the 5% level, ie the data does not provide enough evidence to conclude that the correlation parameter between the baby's weight and gestation period exceeds 0.9.
11.2
(i)(a)
Calculate the correlation coefficients
Pearson correlation coefficient
$$S_{cc} = \sum c^2 - \frac{(\sum c)^2}{n} = 6,884 - \frac{238^2}{9} = 590.2222 \qquad [½]$$
$$S_{pp} = \sum p^2 - \frac{(\sum p)^2}{n} = 149.62 - \frac{33.4^2}{9} = 25.66889 \qquad [½]$$
$$S_{cp} = \sum cp - \frac{(\sum c)(\sum p)}{n} = 983 - \frac{238 \times 33.4}{9} = 99.75556 \qquad [½]$$
$$\Rightarrow r = \frac{S_{cp}}{\sqrt{S_{cc}S_{pp}}} = \frac{99.75556}{\sqrt{590.2222 \times 25.66889}} = 0.81045 \qquad [1½]$$
Spearman rank correlation coefficient
The ranks (from lowest to highest) and differences are as follows:

Class                                     X1   X2   X3   X4   Y1   Y2   Y3   Y4   Y5
Students in class (c)                     9    7    4    2    8    6    5    3    1
Average GCSE point score for class (p)    8    6    3    2    9    7    5    4    1
Differences                               1    1    1    0    1    1    0    1    0
[1]
Hence:
$$r_s = 1 - \frac{6 \times 6}{9(9^2 - 1)} = 0.95 \qquad [1]$$
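These coefficients can be checked in R. The sketch below enters the class-size data from Question 11.2 directly (the vector names are our own) and computes Spearman's coefficient both from the simplified formula and with cor(); the same call with method = "kendall" gives the Kendall coefficient derived next. With no ties in the data, R's values should agree with the simplified formulae.

```r
# Data from Question 11.2
c_size  <- c(35, 32, 27, 21, 34, 30, 28, 24, 7)             # students in class
p_score <- c(5.9, 4.1, 2.4, 1.7, 6.3, 5.3, 3.5, 2.6, 1.6)   # average GCSE points

# Pearson's coefficient
r_pearson <- cor(c_size, p_score)

# Spearman's coefficient from the simplified (no ties) formula
d <- rank(c_size) - rank(p_score)
n <- length(d)
r_s_formula <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))

# Spearman's and Kendall's coefficients using the built-in methods
r_s <- cor(c_size, p_score, method = "spearman")
tau <- cor(c_size, p_score, method = "kendall")

c(pearson = r_pearson, spearman = r_s, kendall = tau)
```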
Kendall rank correlation coefficient
Arranging in order of class rank:

Class                                     Y5   X4   Y4   X3   Y3   Y2   X2   Y1   X1
Students in class (c)                     1    2    3    4    5    6    7    8    9
Average GCSE point score for class (p)    1    2    4    3    5    7    6    9    8
# concordant pairs                        8    7    5    5    4    2    2    0    0
# discordant pairs                        0    0    1    0    0    1    0    1    0
[1]
Totalling the rows gives $n_c = 33$, $n_d = 3$. Hence:
$$\tau = \frac{33 - 3}{9(9-1)/2} = 0.83 \qquad [1]$$
(i)(b) Test whether class size does not affect GCSE results
We are testing:
$$H_0: \rho = 0 \quad \text{vs} \quad H_1: \rho \neq 0 \qquad [½]$$
Under $H_0$:
$$\frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t_{n-2} \qquad [½]$$
The observed value of the test statistic is:
$$\frac{0.81045\sqrt{7}}{\sqrt{1 - 0.81045^2}} = 3.660 \qquad [1]$$
This is greater than 3.499, the upper 0.5% point of the $t_7$ distribution. [½]
Therefore, we have sufficient evidence at the 1% level to reject $H_0$. Therefore we conclude that there is a correlation between class size and GCSE results (ie class size does affect GCSE results). [½]
[Total 10]
We could use Fisher's transformation. However this is only an approximation, so when testing the hypothesis that $\rho = 0$ it is better to use the accurate version.
(ii) Comment
There is strong positive correlation between class size and GCSE results (ie bigger classes have better GCSE results). [1]
However, correlation does not necessarily imply causation, ie whilst bigger classes have better results, it is not necessarily the class size that causes the improvement. [1]
[Total 2]
11.3
(i)(a)
Why we can't use simplified formulae
The ranks (from lowest to highest) are as follows:

Student                      A    B    C    D    E    F    G    H    I    J    K    L
Entrance exam score x (%)    11   1    7    4    5    9    6    10   12   2    3    8
Finals paper score y (%)     8    1    7    4    5    8    10   12   11   1    3    5

Since we have tied ranks we cannot use the simplified formula for Spearman or Kendall. [1]
We would have to use a correction, which is best handled by a computer.
(i)(b) Pearson's correlation coefficient
$$S_{xx} = 60,016 - \frac{836^2}{12} = 1,774.67$$
$$S_{yy} = 63,603 - \frac{867^2}{12} = 962.25$$
$$S_{xy} = 1,122 \qquad [1]$$
Therefore:
$$r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}} = \frac{1,122}{\sqrt{1,774.67 \times 962.25}} = 0.85860 \qquad [1]$$
[Total 3]
(ii) Hypothesis test
We are testing $H_0: \rho = 0.75$ vs $H_1: \rho \neq 0.75$. [½]
If $H_0$ is true, then the test statistic $Z_r$ follows the $N\left(z_\rho, \frac{1}{9}\right)$ distribution, where $z_\rho = \tanh^{-1} 0.75 = 0.97296$. [½]
The observed value of this statistic is $\tanh^{-1} 0.85860 = 1.2880$, which corresponds to a value of $\dfrac{1.2880 - 0.97296}{\sqrt{1/9}} = 0.945$ on the $N(0,1)$ distribution. [1]
This is clearly less than 1.96, the upper 2.5% point of the standard normal distribution. [½]
So, we have insufficient evidence at the 5% level to reject $H_0$, ie the data do not provide enough evidence to conclude that the correlation parameter is any different from 0.75. [½]
[Total 3]
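The calculations for Question 11.3 can be reproduced in R as a check. This is a sketch only: the scores are typed in from the question, the variable names are illustrative, and the two-sided p-value is an extra that the written solution does not quote.

```r
# Data from Question 11.3 (students A to L)
x <- c(86, 53, 71, 60, 62, 79, 66, 84, 90, 55, 58, 72)   # entrance exam score (%)
y <- c(75, 60, 74, 68, 70, 75, 78, 90, 85, 60, 62, 70)   # finals paper score (%)
n <- length(x)

# Tied values are what rule out the simplified rank formulae
any_ties <- anyDuplicated(y) > 0

# Pearson's coefficient
r <- cor(x, y)

# Fisher Z test of H0: rho = 0.75 against a two-sided alternative
rho0    <- 0.75
z_stat  <- (atanh(r) - atanh(rho0)) * sqrt(n - 3)
p_value <- 2 * pnorm(abs(z_stat), lower.tail = FALSE)

c(r = r, z = z_stat, p = p_value)
```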
CS1-12: Linear regression
Linear regression
Syllabus objectives
4.1 Linear regression
4.1.1 Explain what is meant by response and explanatory variables.
4.1.2 State the simple regression model (with a single explanatory variable).
4.1.3 Derive the least squares estimates of the slope and intercept parameters in a simple linear regression model.
4.1.4 Use appropriate statistical software to fit a simple linear regression model to a data set and interpret the output.
      Perform statistical inference on the slope parameter.
      Describe the use of various measures of goodness of fit of a linear regression model ($R^2$).
      Use a fitted linear relationship to predict a mean response or an individual response with confidence limits.
      Use residuals to check the suitability and validity of a linear regression model.
4.1.5 State the multiple linear regression model (with several explanatory variables).
4.1.6 Use appropriate software to fit a multiple linear regression model to a data set and interpret the output.
4.1.7 Use measures of model fit to select an appropriate set of explanatory variables.
0 Introduction
In the last chapter we examined the correlation between two variables.
If there is a suitably strong enough correlation between the two variables (and there is cause and effect) we can justifiably calculate a regression line, which gives the mathematical form of this relationship:
[Figure: plot of Y against X showing the regression line $E[Y|X] = \alpha + \beta X$]
Much of this chapter is concerned with obtaining estimates of the variables associated with this regression line and giving confidence intervals for our estimates using the methods from Chapter 9. Due to the mathematically rigorous nature of this work, a number of results are quoted without proof, and students are expected to memorise and apply these results in the exam.
This is a long chapter and will probably require two study sessions to cover it in detail. In the past, this material often formed one of the longer questions in the Statistics exam.
In the previous chapter we carried out correlation analysis on bivariate and multivariate data to assess the strength of the relationship between variables. In this unit we look at regression analysis to assess the nature of the relationship between $Y$, the response (or dependent) variable, and $X$, the explanatory (or independent, or regressor) variable(s).
The values of the response variable (our principal variable of interest) depend on, or are, in part, explained by, the values of the other variable(s), which is referred to as the explanatory variable(s).
Ideally, the values used for the explanatory variable(s) are controlled by the experimenter (in the analysis they are in fact assumed to be error-free constants, as opposed to random variables with distributions).
Regression analysis consists of choosing and fitting an appropriate model usually with a view to estimating the mean response (ie the mean value of the response variable) for specified values of the explanatory variable(s). A prediction of the value of an individual response may also be needed.
In this chapter only linear relationships will be considered which assume that the expected value of $Y$, for any given value $x$ of $X$, is a linear function of that value $x$. For the bivariate case this simplifies to:
$$E[Y|x] = \alpha + \beta x$$
For the multivariate case with $k$ explanatory variables, this is:
$$E[Y|x_1, x_2, \ldots, x_k] = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$$
Recall from Chapter 5 that $E[Y|x]$ is a conditional mean, which represents the average value of $Y$ corresponding to a given value of $x$.
As always, before selecting and fitting a model, the data must be examined (eg in scatterplots) to see which types of model (and model assumptions) may or may not be reasonable.
Question
A sample of ten claims and corresponding payments on settlement for household policies is taken from the business of an insurance company. The amounts, in units of 100, are as follows:

Claim x      2.10  2.40  2.50  3.20  3.60  3.80  4.10  4.20  4.50  5.00
Payment y    2.18  2.06  2.54  2.61  3.67  3.25  4.02  3.71  4.38  4.45

The scatterplot from the previous unit was as follows:
Discuss whether a linear regression model is appropriate.
Solution
There appears to be a strong positive linear relationship and so fitting a linear regression model is appropriate.
If a non-linear relationship (or no relationship) between the variables is indicated by the data, then the methods of analysis discussed here are not applicable for the data as they stand. However a well-chosen transformation of $y$ (or $x$, or even both) may bring the data into a form for which these methods are applicable.
The purpose of the transformation is to change the relationship into linear form, ie into the form $Y = a + bX$.
Question
Explain how to transform the relationship $Y = ab^x$ to a linear form.
Solution
If we take logs, the relationship becomes:
$$\log Y = \log a + x \log b$$
So if we work in terms of the variable $Y' = \log Y$, we have a linear relationship:
$$Y' = \log a + x \log b$$
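A short R sketch of this transformation is given below. The data are hypothetical and noise-free, generated from assumed values $a = 2$ and $b = 1.5$ purely for illustration, so the fitted line on the log scale recovers the parameters exactly (up to rounding).

```r
# Hypothetical parameters and noise-free data from Y = a * b^x
a <- 2
b <- 1.5
x <- 0:5
y <- a * b^x

# Regress Y' = log(Y) on x: intercept = log(a), slope = log(b)
fit <- lm(log(y) ~ x)

# Back-transform the coefficients to recover a and b
a_hat <- unname(exp(coef(fit)[1]))
b_hat <- unname(exp(coef(fit)[2]))
c(a_hat = a_hat, b_hat = b_hat)
```

With real data the recovered values would be estimates rather than exact, and the error structure is assumed to be additive on the log scale.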
1 The simple bivariate linear model
1.1 Model specification
Given a set of $n$ pairs of data $(x_i, y_i)$, $i = 1, 2, \ldots, n$, the $y_i$ are regarded as observations of a response variable $Y_i$. For the purposes of the analysis the $x_i$, the values of an explanatory variable, are regarded as constant.
The simple linear regression model (with one explanatory variable)
The response variable $Y_i$ is related to the value $x_i$ by:
$$Y_i = \alpha + \beta x_i + e_i, \qquad i = 1, 2, \ldots, n$$
where the $e_i$ are uncorrelated error variables with mean 0 and common variance $\sigma^2$.
So $E[e_i] = 0$, $\text{var}[e_i] = \sigma^2$, $i = 1, 2, \ldots, n$.
$\beta$ is the slope parameter, $\alpha$ the intercept parameter.
This is equivalent to saying that $y = mx + c$, where $m$ is the gradient or slope and $c$ is the intercept, ie where the line crosses the y-axis.
1.2 Fitting the model
We can estimate the parameters in a regression model using the method of least squares.
Fitting the model involves:
(a) estimating the parameters $\alpha$ and $\beta$, and
(b) estimating the error variance $\sigma^2$.
The fitted regression line, which gives the estimated value of $Y$ for a fixed $x$, is given by:
$$\hat{y} = \hat\alpha + \hat\beta x$$
where $\hat\beta = \dfrac{S_{xy}}{S_{xx}}$ and $\hat\alpha = \bar{y} - \hat\beta\bar{x}$.
These are the equations we use to calculate the best values of $\alpha$ and $\beta$. They are given in the Tables.
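Recall from the previous chapter that:
$$S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{(\sum x_i)^2}{n} = \sum x_i^2 - n\bar{x}^2$$
$$S_{yy} = \sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{(\sum y_i)^2}{n} = \sum y_i^2 - n\bar{y}^2$$
$$S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n} = \sum x_i y_i - n\bar{x}\bar{y}$$
In regression questions, $\sum_{i=1}^{n} x_i^2$ is often abbreviated to $\sum x^2$, etc to simplify the notation.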
Question
Show that the first of these relationships is true, ie that:
$$S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{(\sum x_i)^2}{n} = \sum x_i^2 - n\bar{x}^2$$
Solution
Expanding the bracket and splitting up the summation, we have:
$$S_{xx} = \sum (x_i - \bar{x})^2 = \sum \left(x_i^2 - 2\bar{x}x_i + \bar{x}^2\right) = \sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2$$
Now since $\sum x_i = n\bar{x}$, we have:
$$S_{xx} = \sum x_i^2 - \frac{2(\sum x_i)^2}{n} + \frac{(\sum x_i)^2}{n} = \sum x_i^2 - \frac{(\sum x_i)^2}{n} = \sum x_i^2 - n\bar{x}^2$$
These formulae are given on page 24 of the Tables in the $S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - n\bar{x}^2$ format.
For a set of paired data $\{(x_i, y_i);\ i = 1, 2, \ldots, n\}$, the least squares estimates of the regression coefficients are the values $\hat\alpha$ and $\hat\beta$ for which:
$$q = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left[y_i - (\alpha + \beta x_i)\right]^2$$
is a minimum.
In fact, for any model $Y_i = g(x_i) + e_i$, the least squares estimates of the regression coefficients can be determined as the values for which $q = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left[y_i - g(x_i)\right]^2$ is a minimum.
The equations we need to solve to find the values of $\hat\alpha$ and $\hat\beta$ are sometimes called normal equations.
Differentiating $q$ partially with respect to $\alpha$ and $\beta$, and equating to zero, gives the normal equations:
$$\sum_{i=1}^{n} y_i = n\hat\alpha + \hat\beta\sum_{i=1}^{n} x_i$$
$$\sum_{i=1}^{n} x_i y_i = \hat\alpha\sum_{i=1}^{n} x_i + \hat\beta\sum_{i=1}^{n} x_i^2$$
Solving these equations by using determinants or the method of elimination then gives the least squares estimate of $\beta$ as:
$$\hat\beta = \frac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{n\sum x_i^2 - \left(\sum x_i\right)^2}$$
This can also be written as $\hat\beta = \dfrac{S_{xy}}{S_{xx}}$.
The first of the two normal equations gives $\hat\alpha$ as:
$$\hat\alpha = \frac{\sum y_i - \hat\beta\sum x_i}{n} = \bar{y} - \hat\beta\bar{x}$$
Being able to produce a full derivation of these results is important, as it has been examined many times in the past.
Note that a fitted line will pass through the point $(\bar{x}, \bar{y})$.
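As an illustration, the two normal equations can be solved numerically as a $2 \times 2$ linear system. The sketch below uses the summations for the ten claims and payments used in the questions of this chapter ($n = 10$, $\sum x = 35.4$, $\sum x^2 = 133.76$, $\sum y = 32.87$, $\sum xy = 123.81$); the object names are our own.

```r
# Summations for the claims/payments data
n      <- 10
sum_x  <- 35.4
sum_x2 <- 133.76
sum_y  <- 32.87
sum_xy <- 123.81

# Normal equations written as A %*% c(alpha, beta) = rhs
A   <- matrix(c(n,     sum_x,
                sum_x, sum_x2), nrow = 2, byrow = TRUE)
rhs <- c(sum_y, sum_xy)

est       <- solve(A, rhs)
alpha_hat <- est[1]
beta_hat  <- est[2]

# The fitted line passes through (x-bar, y-bar)
alpha_hat + beta_hat * (sum_x / n)   # equals y-bar = 3.287
```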
Question
Show that the fitted line $\hat{y} = \hat\alpha + \hat\beta x$ passes through the point $(\bar{x}, \bar{y})$.
Solution
Substituting $x = \bar{x}$ into the RHS of $\hat{y} = \hat\alpha + \hat\beta x$ gives:
$$\hat{y} = \hat\alpha + \hat\beta\bar{x}$$
Since $\hat\alpha = \bar{y} - \hat\beta\bar{x}$, it follows that:
$$\hat{y} = (\bar{y} - \hat\beta\bar{x}) + \hat\beta\bar{x} = \bar{y}$$
$\hat\beta$ is the observed value of a statistic $B$ whose sampling distribution has the following properties:
$$E[B] = \beta, \qquad \text{var}[B] = \frac{\sigma^2}{S_{xx}}$$
The estimate of the error variance $\sigma^2$ is based on the sum of squares of the estimated errors:
$$\hat\sigma^2 = \frac{1}{n-2}\sum (y_i - \hat{y}_i)^2$$
Alternatively, this can be calculated more easily using this equivalent formula:
$$\hat\sigma^2 = \frac{1}{n-2}\left(S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right)$$
This is given on page 24 of the Tables. We will see later that this is an unbiased estimator of $\sigma^2$.
The R code to fit a linear model to a bivariate data frame, <data>=c(X,Y) and assign it to the object model, is:
model <- lm(Y ~ X)
Then the estimates of the coefficients and error standard deviation can be obtained by:
summary(model)
The R code to draw the fitted regression line on an existing plot is:
abline(model)
Question
The sample of ten claims and payments above (in units of 100) has the following summations:
$\sum x = 35.4$, $\sum x^2 = 133.76$, $\sum y = 32.87$, $\sum y^2 = 115.2025$, $\sum xy = 123.81$.
Calculate the fitted regression line and the estimated error variance.
Solution
Number of pairs of observations $n = 10$.
$$S_{xx} = \sum x^2 - n\bar{x}^2 = 133.76 - \frac{35.4^2}{10} = 8.444$$
$$S_{yy} = \sum y^2 - n\bar{y}^2 = 115.2025 - \frac{32.87^2}{10} = 7.1588$$
$$S_{xy} = \sum xy - n\bar{x}\bar{y} = 123.81 - \frac{35.4 \times 32.87}{10} = 7.4502$$
$$\hat\beta = \frac{S_{xy}}{S_{xx}} = \frac{7.4502}{8.444} = 0.88231$$
$$\hat\alpha = \bar{y} - \hat\beta\bar{x} = 3.287 - (0.88231 \times 3.54) = 0.164$$
$$\hat\sigma^2 = \frac{1}{n-2}\left(S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right) = \frac{1}{8}\left(7.1588 - \frac{7.4502^2}{8.444}\right) = 0.0732$$
So the fitted regression line is $\hat{y} = 0.164 + 0.8823x$, which is shown on the graph below.
[Graph: scatterplot of the claims data with the fitted regression line]
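The same fit can be obtained in R with lm(). This is a sketch using the ten claims and payments quoted earlier in the chapter; the data frame name claims and the object sigma2_hat are our own choices.

```r
# Claims (x) and payments on settlement (y), in units of 100
claims <- data.frame(
  x = c(2.10, 2.40, 2.50, 3.20, 3.60, 3.80, 4.10, 4.20, 4.50, 5.00),
  y = c(2.18, 2.06, 2.54, 2.61, 3.67, 3.25, 4.02, 3.71, 4.38, 4.45)
)

# Least squares fit of y on x
model <- lm(y ~ x, data = claims)
coef(model)                      # intercept (alpha-hat) and slope (beta-hat)

# Estimated error variance: residual sum of squares divided by n - 2
sigma2_hat <- sum(resid(model)^2) / df.residual(model)
sigma2_hat
```

summary(model) reports the square root of sigma2_hat as the residual standard error.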
Once we have worked out the estimates of $\alpha$ and $\beta$, we can calculate predicted values of $y$ corresponding to $x_i$ using the formula $\hat{y}_i = \hat\alpha + \hat\beta x_i$.
Question
For the claims settlement question above, calculate the expected payment on settlement for a claim of 350.
Solution
Since we are working in units of 100, a claim of 350 corresponds to $x = 3.5$.
Substituting this into the regression line gives:
$$\hat{y} = 0.164 + 0.8823 \times 3.5 = 3.25$$
So we would expect the settlement payment to be 325.
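In R, the prediction can be sketched with predict(). The block refits the claims model so that it runs on its own; the names claims, model and pred are illustrative.

```r
# Claims (x) and payments (y), in units of 100
claims <- data.frame(
  x = c(2.10, 2.40, 2.50, 3.20, 3.60, 3.80, 4.10, 4.20, 4.50, 5.00),
  y = c(2.18, 2.06, 2.54, 2.61, 3.67, 3.25, 4.02, 3.71, 4.38, 4.45)
)
model <- lm(y ~ x, data = claims)

# Expected payment for a claim of 350, ie x = 3.5 in units of 100
pred <- predict(model, newdata = data.frame(x = 3.5))
unname(pred)
```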
1.3 Partitioning the variability of the responses
To help understand the goodness of fit of the model to the data, the total variation in the responses, as given by $S_{yy} = \sum (y_i - \bar{y})^2$, should be studied.
Some of the variation in the responses can be attributed to the relationship with $x$ (eg $y$ may tend to be high when $x$ is high, low when $x$ is low) and some is random variation (unmodellable) above and beyond that. Just how much is attributable to the relationship, or explained by the model, is a measure of the goodness of fit of the model.
We start from an identity involving $y_i$ (the observed $y$ value), $\bar{y}$ (the overall average of the $y$ values) and $\hat{y}_i$ (the predicted value of $y$).
Squaring and summing both sides of:
$$y_i - \bar{y} = (y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})$$
gives:
$$\sum (y_i - \bar{y})^2 = \sum (y_i - \hat{y}_i)^2 + \sum (\hat{y}_i - \bar{y})^2$$
the cross-product term vanishing.
The sum on the left is the total sum of squares of the responses, denoted here by $SS_{TOT}$.
The second sum on the right is the sum of the squares of the deviations of the fitted responses (the estimates of the conditional means) from the overall mean response (the estimate of the overall mean). It summarises the variability accounted for, or explained by the model. It is called the regression sum of squares, denoted here by $SS_{REG}$.
The first sum on the right is the sum of the squares of the estimated errors (response - fitted response, generally referred to in statistics as a residual from the fit). It summarises the remaining variability, that between the responses and their fitted values and so unexplained by the model. It is called the residual sum of squares, denoted here by $SS_{RES}$. The estimate of $\sigma^2$ is based on it. It is $\dfrac{SS_{RES}}{n-2}$.
So:
$$SS_{TOT} = SS_{RES} + SS_{REG}$$
$SS_{RES}$ is often also written as $SS_{ERR}$ (error).
For computational purposes $SS_{TOT} = S_{yy}$ and:
$$SS_{REG} = \sum \left[(\hat\alpha + \hat\beta x_i) - (\hat\alpha + \hat\beta\bar{x})\right]^2 = \hat\beta^2 S_{xx} = \frac{S_{xy}^2}{S_{xx}}$$
The last step uses the fact that $\hat\beta = \dfrac{S_{xy}}{S_{xx}}$.
So $SS_{RES} = S_{yy} - \dfrac{S_{xy}^2}{S_{xx}}$.
Question
Determine the split of total variation in the claims and payments model between the residual sum of squares and the regression sum of squares.
Recall that:
$n = 10$, $\sum x = 35.4$, $\sum y = 32.87$, $S_{xx} = 8.444$, $S_{yy} = 7.1588$, $S_{xy} = 7.4502$
Solution
$$SS_{TOT} = S_{yy} = 7.1588$$
$$SS_{REG} = \frac{S_{xy}^2}{S_{xx}} = \frac{7.4502^2}{8.444} = 6.5734$$
$$\Rightarrow SS_{RES} = SS_{TOT} - SS_{REG} = 0.5854$$
$\dfrac{SS_{RES}}{n-2} = \dfrac{0.5854}{8} = 0.0732$ gives the same value of $\hat\sigma^2$ that we obtained earlier using the alternative formula $\left(S_{yy} - \dfrac{S_{xy}^2}{S_{xx}}\right) \Big/ (n-2)$.
It can then be shown that:
$$E[SS_{TOT}] = (n-1)\sigma^2 + \beta^2 S_{xx}, \qquad E[SS_{REG}] = \sigma^2 + \beta^2 S_{xx}$$
from which it follows that $E[SS_{RES}] = (n-2)\sigma^2$.
Hence:
$$E[\hat\sigma^2] = E\left[\frac{1}{n-2}\left(S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right)\right] = E\left[\frac{SS_{RES}}{n-2}\right] = \frac{1}{n-2}E[SS_{RES}] = \frac{(n-2)\sigma^2}{n-2} = \sigma^2$$
So $\hat\sigma^2$ is an unbiased estimator of $\sigma^2$.
In the case that the data are close to a line ($|r|$ high, a strong linear relationship) the model fits well, the fitted responses (the values on the fitted line) are close to the observed responses, and so $SS_{REG}$ is relatively high with $SS_{RES}$ relatively low.
$r$ is referring to Pearson's correlation coefficient, which we calculated in Chapter 11.
In the case that the data are not close to a line ($|r|$ low, a weak linear relationship) the model does not fit so well, the fitted responses are not so close to the observed responses, and so $SS_{REG}$ is relatively low and $SS_{RES}$ relatively high.
The proportion of the total variability of the responses explained by a model is called the coefficient of determination, denoted $R^2$. Here, the proportion is:
$$R^2 = \frac{SS_{REG}}{SS_{TOT}} = \frac{S_{xy}^2}{S_{xx}S_{yy}}$$
[The value of the proportion $R^2$ is usually quoted as a percentage].
$R^2$ can take values between 0% and 100% inclusive.
Question
Calculate the coefficient of determination for the claims and payments model and comment on it.
Recall that:
$SS_{TOT} = 7.1588$, $SS_{REG} = 6.5734$, $SS_{RES} = 0.5854$
Solution
$$R^2 = \frac{SS_{REG}}{SS_{TOT}} = \frac{6.5734}{7.1588} = 0.918 \ (91.8\%)$$
This value is very high and so indicates the overwhelming majority of the variation is explained by the model (and hence very little is left over in residual variation). Hence the linear regression model is a good fit to the data.
In this case (the simple linear regression model), note that the value of the coefficient of determination is the square of Pearson's correlation coefficient for the data since:
$$r = \frac{S_{xy}}{(S_{xx}S_{yy})^{1/2}}$$
The Pearson's sample correlation coefficient was introduced in the previous chapter and is given on page 24 of the Tables.
Question
Calculate the correlation coefficient for the claims and payment data by using the coefficient of determination from the previous question.
Solution
$$r = \sqrt{0.918} = 0.958$$
Since we saw earlier that there was a positive relationship between claims and the settlement payments, we take the positive root, giving a correlation coefficient of 0.958.
The R code to obtain the regression and residual sum of squares for a linear model assigned to the object model, is:
anova(model)
The coefficient of determination is given in the output of:
summary(model)
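For the claims data, these commands can be sketched as follows. The block refits the model so that it is self-contained; the row and column labels used to index the ANOVA table ("x", "Residuals", "Sum Sq") are those R produces for a model fitted as y ~ x, and the other names are our own.

```r
# Claims (x) and payments (y), in units of 100
claims <- data.frame(
  x = c(2.10, 2.40, 2.50, 3.20, 3.60, 3.80, 4.10, 4.20, 4.50, 5.00),
  y = c(2.18, 2.06, 2.54, 2.61, 3.67, 3.25, 4.02, 3.71, 4.38, 4.45)
)
model <- lm(y ~ x, data = claims)

# Regression and residual sums of squares from the ANOVA table
tab    <- anova(model)
ss_reg <- tab["x", "Sum Sq"]
ss_res <- tab["Residuals", "Sum Sq"]

# Coefficient of determination, by hand and from summary()
r2 <- ss_reg / (ss_reg + ss_res)
c(SS_REG = ss_reg, SS_RES = ss_res, R2 = r2,
  R2_summary = summary(model)$r.squared)
```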
2 The full normal model and inference
2.1 The full normal model
The model must be specified further in order to make inferences concerning the responses based on the fitted model. In particular, information on the distribution of the $Y_i$'s is required.
In the full model, we now assume that the errors, $e_i$, are independent and identically distributed as $N(0, \sigma^2)$ variables. This will then allow us to obtain the distributions for $\hat\beta$ and the $Y_i$'s. We can then use these to construct confidence intervals and carry out statistical inference.
For the full model the following additional assumptions are made:
The error variables $e_i$ are: (a) independent, and (b) normally distributed.
Under this full model, the $e_i$'s are independent, identically distributed random variables, each with a normal distribution with mean 0 and variance $\sigma^2$. It follows that the $Y_i$'s are independent, normally distributed random variables, with $E[Y_i] = \alpha + \beta x_i$ and $\text{var}[Y_i] = \sigma^2$.
$B$, being a linear combination of independent normal variables, itself has a normal distribution, with mean and variance as noted earlier.
The further results required are:
(1) $B$ and $\hat\sigma^2$ are independent
(2) $\dfrac{(n-2)\hat\sigma^2}{\sigma^2}$ has a $\chi^2$ distribution with $\nu = n-2$.
Note: With the full model in place the $Y_i$'s have normal distributions and it is possible to derive maximum likelihood estimators of the parameters $\alpha$, $\beta$, and $\sigma^2$ (since maximum likelihood estimation requires us to know the distribution whereas least squares estimation does not).
It is possible to show that the maximum likelihood estimators of $\alpha$ and $\beta$ are the same as the least squares estimators, but the MLE of $\sigma^2$ has a different denominator from the least squares estimator.
2.2 Inferences on the slope parameter $\beta$
To conform to usual practice the distinction between $B$, the random variable, and its value $\hat\beta$, will now be dropped. Only one symbol, namely $\hat\beta$, will be used.
Using the fact that $E(\hat\beta) = \beta$ and $\text{var}(\hat\beta) = \sigma^2 / S_{xx}$ from Section 1.2:
$$A = \frac{\hat\beta - \beta}{(\sigma^2 / S_{xx})^{1/2}} \text{ is a standard normal variable}$$
Repeating result (2) from Section 2.1:
$$B = \frac{(n-2)\hat\sigma^2}{\sigma^2} \text{ is a } \chi^2 \text{ variable with } \nu = n-2 \text{ degrees of freedom}$$
Now, since $\hat\beta$ and $\hat\sigma^2$ are independent, it follows that $A \big/ \{B/(n-2)\}^{1/2}$ has a $t$ distribution with $\nu = n-2$, ie:
$$\frac{\hat\beta - \beta}{se(\hat\beta)} \text{ has a } t \text{ distribution with } \nu = n-2 \qquad (3)$$
where the symbol $se(\hat\beta)$ denotes the estimated standard error of $\hat\beta$, namely $(\hat\sigma^2 / S_{xx})^{1/2}$.
Result (3) can now be used for the construction of confidence intervals, and for tests, on the value of $\beta$, the slope coefficient in the model. $H_0: \beta = 0$ is the 'no linear relationship' hypothesis.
Since $\hat\beta = \dfrac{S_{xy}}{S_{xx}}$ and $r = \dfrac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}$, if $\hat\beta = 0$ then $S_{xy} = 0$ and $r = 0$ too.
This $t$ distribution result for the estimator of $\beta$ is given on page 24 of the Tables.
Question

For the claims/settlements data:
(a) calculate a two-sided 95% confidence interval for $\beta$, the slope of the true regression line
(b) test the hypothesis $H_0: \beta = 1$ vs $H_1: \beta \ne 1$.

Recall that:
$S_{xx} = 8.444$, $S_{yy} = 7.1588$, $S_{xy} = 7.4502$, $\hat\alpha = 0.164$, $\hat\beta = 0.88231$, $\hat\sigma^2 = 0.0732$
Solution

(a) $\mathrm{se}(\hat\beta) = (0.0732 / 8.444)^{1/2} = 0.0931$

A 95% confidence interval for $\beta$ is $\hat\beta \pm \{t_{0.025,8} \times \mathrm{se}(\hat\beta)\}$

ie $0.8823 \pm (2.306 \times 0.0931)$ ie $0.8823 \pm 0.2147$

So a 95% confidence interval is (0.668, 1.10).

(b) The 95% two-sided confidence interval in (a) contains the value 1, so the two-sided test in (b) conducted at the 5% level results in $H_0$ being accepted.
In R, the statistic and p-value for the test of $H_0: \beta = 0$ for a linear regression model are displayed in summary(model).

In R, 95% confidence intervals for the parameters $\alpha$ and $\beta$ from a linear regression model are given by:

confint(model,level=0.95)
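As a sketch of how result (3) and confint() relate, the following R code refits the claims/settlements model and computes the slope interval both by hand and with the built-in function. The data vectors are the ten claims and payments quoted later in this chapter; the variable names (ci_hand, t_stat and so on) are illustrative choices, not part of the course material.

```r
# Claims/settlements data from this chapter (10 observations)
x <- c(2.10, 2.40, 2.50, 3.20, 3.60, 3.80, 4.10, 4.20, 4.50, 5.00)
y <- c(2.18, 2.06, 2.54, 2.61, 3.67, 3.25, 4.02, 3.71, 4.38, 4.45)
model <- lm(y ~ x)

# Hand calculation of result (3): beta_hat +/- t_{0.025, n-2} * se(beta_hat)
n        <- length(x)
Sxx      <- sum((x - mean(x))^2)
beta_hat <- unname(coef(model)["x"])
s2       <- sum(residuals(model)^2) / (n - 2)   # estimate of sigma^2
se_beta  <- sqrt(s2 / Sxx)
ci_hand  <- beta_hat + c(-1, 1) * qt(0.975, df = n - 2) * se_beta

# Built-in equivalent for the slope only
ci_r <- confint(model, "x", level = 0.95)

# Two-sided t test of H0: beta = 1
# (summary(model) reports only the test of beta = 0)
t_stat  <- (beta_hat - 1) / se_beta
p_value <- 2 * pt(-abs(t_stat), df = n - 2)
```

Printing ci_hand and ci_r shows the same interval, and p_value gives the significance of the test in part (b) directly rather than via the confidence interval.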
2.3 Analysis of variance (ANOVA)

In Section 1.3 we partitioned the variability into that arising from the regression model ($SS_{REG}$) and the left-over residual (or error) variability ($SS_{RES}$ or $SS_{ERR}$). We then calculated a ratio of the variances to obtain the proportion of variability that was determined (or explained) by the model (called the coefficient of determination, $R^2$). This gave us a crude measure of fit.

With the distributional assumptions underlying the full regression model we can now do a more formal test of fit. Recall from Chapter 7 that the variance of a sample taken from a normal distribution has a chi-square distribution, $\dfrac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$, and the ratio of variances of two samples from a normal distribution has an $F$ distribution, $\dfrac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{n_1-1,\,n_2-1}$. We can therefore use an $F$ test to compare the regression variance to the residual variance.
Another method of testing the 'no linear relationship' hypothesis (ie $H_0: \beta = 0$) is to analyse the sum of squares from Section 1.3.

In Section 2.1, we saw under the full normal model that $\dfrac{(n-2)\hat\sigma^2}{\sigma^2} \sim \chi^2_{n-2}$. Since $\hat\sigma^2 = \dfrac{SS_{RES}}{n-2}$ we have $\dfrac{SS_{RES}}{\sigma^2} \sim \chi^2_{n-2}$.

When $H_0$ is true, $\dfrac{SS_{TOT}}{n-1}$ is the overall sample variance and so $\dfrac{SS_{TOT}}{\sigma^2}$ is $\chi^2_{n-1}$.

Since $SS_{REG}$ and $SS_{RES}$ are in fact independent and $\dfrac{SS_{RES}}{\sigma^2}$ is $\chi^2_{n-2}$ it follows that $\dfrac{SS_{REG}}{\sigma^2}$ is $\chi^2_1$.

Therefore:

$\dfrac{SS_{REG}}{SS_{RES}/(n-2)} = \dfrac{\text{regression mean sum of squares}}{\text{residual mean sum of squares}} = \dfrac{MSS_{REG}}{MSS_{RES}}$

is $F_{1,\,n-2}$ and $H_0$ is rejected for large values of this ratio.

The mean sum of squares (sometimes just called the mean square) is where we divide the sum of squares by the degrees of freedom. The sample variance, $s^2 = \dfrac{1}{n-1}\sum (x_i - \bar x)^2$, is actually a mean sum of squares.
Unlike the coefficient of determination, we divide the regression variability by the residual rather than the total variability.

A large value of this ratio would mean that the majority of the variability is explained by the linear regression model. Therefore we would reject the null hypothesis of no linear relationship.
The results are usually set out in an ANOVA table:

Source of variation    Degrees of Freedom    Sum of Squares    Mean Sum of Squares
Regression             1                     $SS_{REG}$        $SS_{REG}/1$
Residual               $n-2$                 $SS_{RES}$        $SS_{RES}/(n-2)$
Total                  $n-1$                 $SS_{TOT}$

The test statistic is the ratio of the values in the last column.
In R, the F statistic and its p-value for a regression model are given in the output of both anova(model) and summary(model).
Question

For the data set of 10 claims and their settlement payments, we had:

$SS_{TOT} = 7.1588$    $SS_{REG} = 6.5734$    $SS_{RES} = 0.5854$

Construct the ANOVA table and carry out an $F$ test to assess whether $\beta = 0$.
Solution

The ANOVA table is:

Source of variation    d.f.    SS        MSS
Regression             1       6.5734    6.5734
Residual               8       0.5854    0.0732
Total                  9       7.1588

Under $H_0: \beta = 0$ we have $F = \dfrac{6.5734}{0.0732} = 89.8$ on (1, 8) degrees of freedom.

The p-value of $F = 89.8$ is less than even 0.01, so $H_0$ is rejected at the 1% level.
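The arithmetic in this solution can be checked with a few lines of R. This is a minimal sketch that starts from the sums of squares quoted in the question; the variable names are illustrative.

```r
# Sums of squares quoted in the question (n = 10 claims)
ss_reg <- 6.5734
ss_res <- 0.5854
n      <- 10

# Mean sums of squares: sum of squares divided by degrees of freedom
mss_reg <- ss_reg / 1
mss_res <- ss_res / (n - 2)

# F statistic and its upper-tail p-value on (1, n - 2) degrees of freedom
f_stat  <- mss_reg / mss_res
p_value <- pf(f_stat, df1 = 1, df2 = n - 2, lower.tail = FALSE)
```

In simple linear regression this F statistic is the square of the t statistic for $H_0: \beta = 0$ from Section 2.2, so the two tests always agree.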
2.4 Estimating a mean response and predicting an individual response
(a) Mean response

This is often the main issue: the whole point of the modelling exercise. For example, the expected settlement for claims of 460 can be estimated as follows:

If $\mu_0$ is the expected (mean) response for a value $x_0$ of the explanatory variable (ie $\mu_0 = E[Y \mid x_0] = \alpha + \beta x_0$), $\mu_0$ is estimated by $\hat\mu_0 = \hat\alpha + \hat\beta x_0$, which is an unbiased estimator.

The variance of the estimator is given by:

$\mathrm{var}(\hat\mu_0) = \left\{\dfrac{1}{n} + \dfrac{(x_0 - \bar x)^2}{S_{xx}}\right\}\sigma^2$

This result is given on page 25 of the Tables.
The distribution actually used is a $t$ distribution. The argument is similar to that described in Section 2.2:

$(\hat\mu_0 - \mu_0) / \mathrm{se}[\hat\mu_0]$ has a $t$ distribution with $\nu = n-2$    (4)

where $\mathrm{se}[\hat\mu_0]$ denotes the estimated standard error of the estimate, namely:

$\mathrm{se}[\hat\mu_0] = \left[\left\{\dfrac{1}{n} + \dfrac{(x_0 - \bar x)^2}{S_{xx}}\right\}\hat\sigma^2\right]^{1/2}$

Result (4) can be used for the construction of confidence intervals for the value of the expected response when $x = x_0$.
(b) Individual response

Rather than estimating an expected response $E[Y \mid x_0]$ an estimate, or prediction, of an individual response $y_0$ (for $x = x_0$) is sometimes required. The actual estimate is the same as in (a), namely:

$\hat y_0 = \hat\alpha + \hat\beta x_0$

but the uncertainty associated with this estimator (as measured by the variance) is greater than in (a) since the value of an individual response $y_0$ rather than the more stable mean response is required. To cater for the extra variation of an individual response about the mean, an extra term $\sigma^2$ has to be added into the expression for the variance of the estimator of a mean response.

In other words, the variance of the individual response estimator is:

$\mathrm{var}(\hat y_0) = \left\{1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar x)^2}{S_{xx}}\right\}\sigma^2$
The result is:

$(\hat y_0 - y_0) / \mathrm{se}[\hat y_0]$ has a $t$ distribution with $\nu = n-2$    (5)

where $\mathrm{se}[\hat y_0]$ denotes the estimated standard error of the estimate, namely:

$\mathrm{se}[\hat y_0] = \left[\left\{1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar x)^2}{S_{xx}}\right\}\hat\sigma^2\right]^{1/2}$

Result (5) can then be used for the construction of confidence intervals (or prediction intervals) for the value of a response when $x = x_0$.

The resulting interval for an individual response $y_0$ is wider than the corresponding interval for the mean response $\mu_0$.

Recall that for an individual response value we have $y_i = \alpha + \beta x_i + e_i$, which is the regression line $\alpha + \beta x_i$ plus an error term, $e_i$. Since $e_i \sim N(0, \sigma^2)$ an individual point is on the regression line on average. Hence we have the same estimate $\hat\alpha + \hat\beta x_0$ as for the mean response. However, we can see that there is an additional $\sigma^2$ in the expression for the variance.
Question

Consider again the claims/settlements example. Calculate:
(a) a 95% confidence interval for the expected payments on claims of 460.
(b) a 95% confidence interval for the predicted actual payments on claims of 460.

Recall that $\hat\alpha = 0.164$, $\hat\beta = 0.88231$, $\hat\sigma^2 = 0.0732$ and $S_{xx} = 8.444$.
Solution

(a) Estimate of expected payment $= 0.1636 + 0.88231(4.6) = 4.222$

se of estimate $= \left[\left\{\dfrac{1}{10} + \dfrac{(4.6 - 3.54)^2}{8.444}\right\} \times 0.0732\right]^{1/2} = 0.1306$

$t_{0.025,8} = 2.306$

So confidence interval is $4.222 \pm (2.306 \times 0.1306)$

ie $4.222 \pm 0.301$ ie (3.921, 4.523) ie (392, 452)

(b) Predicted payment $= 4.222$

se of estimate $= \left[\left\{1 + \dfrac{1}{10} + \dfrac{(4.6 - 3.54)^2}{8.444}\right\} \times 0.0732\right]^{1/2} = 0.3004$

So confidence interval is $4.222 \pm (2.306 \times 0.3004)$

ie $4.222 \pm 0.693$ ie (3.529, 4.915) ie (353, 492)

In R, predicted $y$ values for, say, $x_0 = 4$ in a linear regression model fitted to a data frame with columns X and Y can be obtained as follows:

newdata <- data.frame(X=4)
predict(model,newdata)

The R code for 95% confidence intervals for the mean and individual response is:

predict(model,newdata,interval="confidence",level=0.95)
predict(model,newdata,interval="predict",level=0.95)
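To see these predict() calls reproduce the worked solution, here is a short sketch using the claims/settlements data listed in Section 2.5. The data frame name claims and the result names are illustrative; the interval type is written out in full as "prediction", which predict() treats the same as the abbreviated "predict".

```r
# Claims (X) and payments (Y) from this chapter
claims <- data.frame(
  X = c(2.10, 2.40, 2.50, 3.20, 3.60, 3.80, 4.10, 4.20, 4.50, 5.00),
  Y = c(2.18, 2.06, 2.54, 2.61, 3.67, 3.25, 4.02, 3.71, 4.38, 4.45)
)
model <- lm(Y ~ X, data = claims)

# Claims of 460 correspond to X = 4.6 in the units of the data
newdata <- data.frame(X = 4.6)

# Each call returns a one-row matrix with columns fit, lwr and upr
ci_mean <- predict(model, newdata, interval = "confidence", level = 0.95)
pi_ind  <- predict(model, newdata, interval = "prediction", level = 0.95)
```

Both results share the same fit value; only the lower and upper limits differ, reflecting the extra $\hat\sigma^2$ term in the standard error for an individual response.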
2.5 Checking the model

The residual from the fit at $x_i$ is the estimated error, the difference between the response $y_i$ and the fitted value, ie:

residual at $x_i$ is $\hat e_i = y_i - \hat y_i$

The R code for obtaining the fitted values and the residuals of a linear regression model is:

fitted(model)
residuals(model)
By examining the residuals it is possible to investigate the validity of the assumptions in the model about (i) the true errors $e_i$ (which are assumed to be independent normal variables with means 0 and the same variance $\sigma^2$), and (ii) the nature of the relationship between the response and explanatory variables.

Plotting the residuals along a line may suggest a departure from normality for the error distribution. The sizes of the residuals should also be looked at, bearing in mind that the value of $\hat\sigma$ estimates the standard deviation of the error distribution.

Ideally, we would expect the residuals to be symmetrical about 0 and no more than 3 standard deviations from it. So skewed residuals or outliers would indicate non-normality.

Alternatively, a quantile-quantile (Q-Q) plot of the residuals against a normal distribution should form a straight line.
Recall that Q-Q plots were introduced in Chapter 6. They are far superior to dotplots, but will require the use of R to produce them using the function qqnorm.

Scatter plots of the residuals against the values of the explanatory variable (or against the values of the fitted responses) are also most informative. If the residuals do not have a random scatter (ie if there is a pattern) then this suggests an inadequacy in the model.
Question

The claims/settlement data values were as follows:

Claim x      2.10 2.40 2.50 3.20 3.60 3.80 4.10 4.20 4.50 5.00
Payment y    2.18 2.06 2.54 2.61 3.67 3.25 4.02 3.71 4.38 4.45

Calculate the residuals for the fitted regression model $\hat y = 0.164 + 0.8823x$.
Solution

The residuals $\hat e_i = y_i - \hat y_i$ are given in the table below:

$x_i$    $\hat e_i$
2.10     0.163
2.40     -0.221
2.50     0.171
3.20     -0.377
3.60     0.330
3.80     -0.266
4.10     0.239
4.20     -0.159
4.50     0.246
5.00     -0.125

The dotplot and the Q-Q plot of the residuals and the plot of the residuals against the explanatory variable are as follows:

[Figures: dotplot of the residuals; Q-Q plot of the residuals; residuals against the explanatory variable]
There is nothing to suggest non-normality in the first diagram. The dotplot is symmetrical about 0 and within $3\hat\sigma = 0.811$ either side, so there are no outliers.

Ideally we would expect more values in the middle and fewer at the edges, but this is unlikely with such a small data set.

Nor does there appear to be a pattern in the third diagram. There appears to be no connection between the residuals and the explanatory variable (claims).

However, the Q-Q plot does possibly indicate some deficiency in at least one of the values. If the residuals were normally distributed, we would expect the Q-Q plot to be along the diagonal line whereas one of the values is some way from the line.
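The residuals and the three diagnostic plots discussed above can be reproduced along the following lines. This is a sketch using base R graphics; stripchart() is used here as one way of drawing a dotplot, and the plot titles and object names are illustrative.

```r
# Claims/settlements data from the question
x <- c(2.10, 2.40, 2.50, 3.20, 3.60, 3.80, 4.10, 4.20, 4.50, 5.00)
y <- c(2.18, 2.06, 2.54, 2.61, 3.67, 3.25, 4.02, 3.71, 4.38, 4.45)
model <- lm(y ~ x)

# Residuals and the estimated error standard deviation
res       <- residuals(model)
sigma_hat <- sqrt(sum(res^2) / (length(x) - 2))
round(res, 3)

# Dotplot of the residuals along a line
stripchart(res, method = "stack", pch = 16, main = "Dotplot of residuals")

# Q-Q plot against the normal distribution, with a reference line
qqnorm(res)
qqline(res)

# Residuals against the explanatory variable
plot(x, res, xlab = "Claim x", ylab = "Residual")
abline(h = 0, lty = 2)
```

Comparing abs(res) with 3 * sigma_hat gives a quick numerical version of the outlier check made from the dotplot.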
The Core Reading now considers a different set of data.

Suppose the plot of the residuals against the explanatory variable was as follows:

[Figure: residuals against the explanatory variable]

We can see that the size of the residuals tends to increase as $x$ increases. This suggests that the error variance is not in fact constant, but is increasing with $x$. (A transformation of the responses may stabilise the error variance in a situation like this.)

Typically, we would log the data in a situation like this.

2.6 Extending the scope of the linear model
In certain 'growth models' the appropriate model is that the expected response is related to the explanatory value through an exponential function, $E[Y_i \mid x_i] = \alpha \exp(\beta x_i)$. In such a case the response data can be transformed using $w_i = \log y_i$ and the linear model:

$W_i = \eta + \beta x_i + e_i$    (where $\eta = \log \alpha$)

is then fitted to the data $(x_i, w_i)$. The fact that the error structure is additive in this representation implies that it plays a multiplicative role in the original form of the model. If such a structure is considered invalid, different methods from those covered in this chapter would have to be used.

The concept of error structure is touching on the subject of generalised linear models, which we will study in Chapter 13.

In R we can apply a transformation at the model stage. For example:

model <- lm(Y ~ log(X))
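As a sketch of the growth-model transformation described above, the following R code fits the linear model to log-transformed responses and converts the intercept back to the original scale. The data are made up for illustration and generated exactly from $y = 2\exp(0.5x)$ with no error term, so the fit should recover those two values.

```r
# Illustrative (made-up) data generated exactly from y = 2 * exp(0.5 * x)
x <- 1:8
y <- 2 * exp(0.5 * x)

# Transform the responses and fit W = eta + beta * x
w     <- log(y)
model <- lm(w ~ x)

eta_hat   <- unname(coef(model)[1])   # estimate of eta = log(alpha)
beta_hat  <- unname(coef(model)[2])   # estimate of beta
alpha_hat <- exp(eta_hat)             # back-transform to the original scale
```

The same fit can be written in one step as lm(log(y) ~ x), applying the transformation at the model stage.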
3 The multiple linear regression model

3.1 Introduction

Previously we examined the relationship between $Y$, the response (or dependent) variable and one explanatory (or independent or regressor) variable $X$. We now consider a model with $k$ explanatory variables, $X_1, X_2, \ldots, X_k$.

There are many problems where one variable can quite accurately be predicted in terms of another. However, the use of additional relevant information should improve predictions. There are many different formulae used to express regression relationships between more than two variables. Most are of the form:

$E[Y \mid X_1, X_2, \ldots, X_k] = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$

As with the simple linear regression model discussed earlier $Y$ is a random variable whose values are to be predicted in terms of given data values $x_1, x_2, \ldots, x_k$.

$\beta_1, \beta_2, \ldots, \beta_k$ are known as the multiple regression coefficients. They are numerical constants which can be determined from observed data.

3.2 Fitting the model
As for the simple linear model, the multiple regression coefficients are usually estimated by the method of least squares.

The response variable $Y_i$ is related to the values $x_{i1}, x_{i2}, \ldots, x_{ik}$ by:

$Y_i = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + e_i$,    $i = 1, \ldots, n$

So the least squares estimates of $\alpha, \beta_1, \beta_2, \ldots, \beta_k$ are the values $\hat\alpha, \hat\beta_1, \hat\beta_2, \ldots, \hat\beta_k$ for which:

$q = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(y_i - \alpha - \beta_1 x_{i1} - \beta_2 x_{i2} - \cdots - \beta_k x_{ik}\right)^2$

is minimised.

As for the simple linear model, to find the estimates the above is differentiated partially with respect to $\alpha$ and $\beta_1, \beta_2, \ldots, \beta_k$ in turn and the results are equated to zero.
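Setting these partial derivatives to zero gives a set of linear equations in the unknown coefficients, usually called the normal equations. Written in matrix form they are $X^T X \mathbf{b} = X^T \mathbf{y}$, where $X$ has a column of ones followed by one column per explanatory variable. The sketch below solves them directly and compares the answer with lm(); the data are made up for illustration and the object names are arbitrary.

```r
# Illustrative (made-up) data with two explanatory variables
x1 <- c(1, 2, 3, 4, 5, 6)
x2 <- c(2, 1, 4, 3, 6, 5)
y  <- c(3.1, 3.9, 7.2, 7.8, 11.1, 12.0)

# Design matrix: a column of ones for alpha, then x1 and x2
X <- cbind(1, x1, x2)

# Solve the normal equations (X'X) b = X'y
beta_normal <- drop(solve(t(X) %*% X, t(X) %*% y))

# Least squares fit from lm() for comparison
beta_lm <- coef(lm(y ~ x1 + x2))
```

The two sets of estimates agree to numerical precision. In practice lm() is preferred because it uses a more stable decomposition than forming $X^T X$ explicitly.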
Question

A senior actuary wants to analyse the salaries of the 50 actuarial students employed by her company, using a linear model based on number of exam passes and years of experience. Express this model and the available data in terms of the notation given here.
Solution

The basic model is:

$E[Y \mid x_1, x_2] = \alpha + \beta_1 x_1 + \beta_2 x_2$

Here $x_1$ represents the number of exam passes, $x_2$ represents the number of years' experience and $Y$ would represent the corresponding salary.

$\alpha$, $\beta_1$ and $\beta_2$ are constants where:
$\alpha$ reflects the average salary for a new student (with no exam passes or experience)
$\beta_1$ and $\beta_2$ reflect the changes in pay associated with an extra exam pass and an extra year's experience, respectively.

Since the data relates to $n = 50$ students, we need to introduce an extra subscript $i$ corresponding to the $i$th student. So the actual salary for the $i$th student will be:

$Y_i = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + e_i$

where $e_i$ is the difference between the student's actual salary and the theoretical salary for someone with the same number of exam passes and experience.
Manually solving the equations becomes complicated even with $k = 2$. As a result, such multiple linear regressions are usually carried out using a computer package.

So in a paper-based exam we can only really test the general principles (as in the question above) rather than the actual modelling. We'll use R to do the modelling.

The R code to fit a multiple linear model to a multivariate data frame is:

model <- lm(Y ~ X1+X2+...+Xk)

Then the estimates of the coefficients and error standard deviation can be obtained from:

summary(model)
Let's return now to the market equity returns data that we saw in Chapter 11.

Consider a set of equity returns from four different markets (Mkt 1, Mkt 2, Mkt 3 and Mkt 4) across 12 time periods ($X$).

0.83% 4.27% 0.12% 3.72% 5.49% 5.21% 1.79% 0.90%
4.62% 5.68% 7.37% 5.21% 5.05% 3.70% 1.60% 2.34%
2.66% 5.75% 5.08% 6.03% 5.48% 1.03% 1.38% 2.37%
1.47% 0.69% 0.17% 0.38% 2.59% 2.22% 1.42% 1.37%
3.03% 9.47% 2.95% 2.99% 0.10% 0.54% 5.67% 1.40%
0.26% 3.38% 3.04% 0.39% 6.26% 3.26% 2.75% 4.03%

The scatterplot from Chapter 11 was as follows:

[Figure: scatterplot of the returns for the four markets]

Model the bottom row Mkt_4 as the response variable ($Y$) with the other three markets as the explanatory variables ($X_1, X_2, X_3$).
The basic form of the model is:

$Y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$

where:
$x_i$ = return from Market $i$, $i = 1, 2, 3$
$Y$ = return from Market 4

We can use R to estimate the parameters $\alpha, \beta_1, \beta_2, \beta_3$ for this model. For further details, see the CS1B PBOR course.

We can also estimate the error variance; when we do so we obtain the following numbers.

Modelling this (using R) gives:

$\hat y = -0.001954 + 0.211472x_1 + 0.125051x_2 + 0.598636x_3$ with $\hat\sigma^2 = 0.004928$

Given the strong positive correlation between the first and third market, we may have been able to use principal components analysis (from Chapter 11) to reduce the number of variables before fitting our multiple linear regression model.

3.3 $R^2$ in the multiple regression case
In the bivariate case (Section 1.3) we noted that the proportion of the total variation of the responses explained by a model, called the coefficient of determination, denoted $R^2$, was equal to the square of the correlation coefficient between the dependent variable $Y$ and the single independent variable $x$.

In the case of multiple regression with a single dependent variable, $Y$, and several independent variables, $x_1, x_2, \ldots, x_k$, $R^2$ measures the proportion of the total variation in $Y$ explained by the combination of explanatory variables in the model. The value of $R^2$ lies between 0 and 1. It will generally increase (and cannot decrease) as the number of explanatory variables $k$ increases. If $R^2 = 1$ the model perfectly predicts the values of $Y$: 100% of the variation in $Y$ is explained by variation in $x_1, x_2, \ldots, x_k$.

Because $R^2$ cannot decrease as more explanatory variables are added to the model, if it is used alone to assess the adequacy of the model, there will always be a tendency to add more explanatory variables. However, these may increase the value of $R^2$ by a small amount, while adding to the complexity of the model. Increased complexity is generally considered to be undesirable.
We prefer to use the principle of parsimony when fitting models, which means we choose the simplest model that does the job. So we need to introduce a new measure that prevents us from adding new variables unnecessarily.

To take account of the undesirability of increased complexity, computer packages will often quote an 'adjusted $R^2$' statistic. This is a correction of the $R^2$ statistic which is based on the mean square errors (ie the residual mean sum of squares, $MSS_{RES}$) and takes account of the number of predictors, $k$, and the number of data points the model is based on.
If we have $k$ predictors, and $n$ observations:

Adjusted $R^2 = 1 - \dfrac{MSS_{RES}}{MSS_{TOT}} = 1 - \left(\dfrac{n-1}{n-k-1}\right)(1 - R^2)$

So $\dfrac{MSS_{RES}}{MSS_{TOT}}$ gives a measure of how much variability is explained by the residuals (or errors) and takes values between 0 and 1. Hence $1 - \dfrac{MSS_{RES}}{MSS_{TOT}}$ gives a measure of how much variability is explained by the regression model. So it is a similar measure to the original coefficient of determination, $R^2$.

Recall that the mean sum of squares (MSS) is the sum of squares divided by the degrees of freedom. So $MSS_{RES} = \dfrac{SS_{RES}}{n-k-1}$ and $MSS_{TOT} = \dfrac{SS_{TOT}}{n-1}$.
The model which maximises the adjusted $R^2$ statistic can be regarded in some sense as the 'best' model. Note, however, that the adjusted $R^2$ cannot be interpreted as the proportion of the variation in $Y$ which is explained by variation in the $x_1, x_2, \ldots, x_k$.

The R code to obtain the regression and residual sum of squares for a linear model assigned to the object model, is:

anova(model)

The adjusted $R^2$ is given in the output of:

summary(model)
Question

Calculate the adjusted $R^2$ for the equity returns from four different markets, given that $R^2 = 0.9831$.

Solution

We have 12 periods of data ($n = 12$) and we are modelling market 4 from the other 3 markets ($k = 3$).

Hence, we have an adjusted $R^2$ of:

$1 - \left(\dfrac{n-1}{n-k-1}\right)(1 - R^2) = 1 - \left(\dfrac{12-1}{12-3-1}\right)(1 - 0.9831) = 0.9768$
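The same calculation can be wrapped in a small R helper. The function name adj_r2 is an illustrative choice, not a built-in; summary(model) reports the same quantity as adj.r.squared for a fitted model.

```r
# Adjusted R^2 from R^2, the number of observations n and predictors k
adj_r2 <- function(r2, n, k) {
  1 - (n - 1) / (n - k - 1) * (1 - r2)
}

# Equity returns example: 12 periods, 3 explanatory markets
adj_equity <- adj_r2(0.9831, n = 12, k = 3)
round(adj_equity, 4)
```

Because the factor $(n-1)/(n-k-1)$ exceeds 1 whenever $k \ge 1$, the adjusted value is always below $R^2$ unless $R^2 = 1$.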
4 The full normal model and inference

4.1 The full normal model

Again, to make inferences concerning the responses based on the fitted model, we need to specify the model further.

We make the same assumptions as for the linear model:

The error variables $e_i$ are: (a) independent, and (b) normally distributed.

In the full model, we now assume that the errors, $e_i$, are independent and identically distributed $N(0, \sigma^2)$ random variables. This will then allow us to obtain the distributions for the $\hat\beta_j$s and the $Y_i$s. We can then use these to construct confidence intervals and carry out statistical inference.

Under this full model, the $e_i$s are independent, identically distributed random variables, each with a normal distribution with mean 0 and variance $\sigma^2$. It follows that the $Y_i$s are independent, normally distributed random variables, with:

$E[Y_i] = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik}$ and $\mathrm{var}[Y_i] = \sigma^2$

This mimics the bivariate linear regression model but with the mean dependent on $k$ explanatory variables.
4.2 Testing hypotheses on individual covariates

In multiple regression the coefficients $\beta_1, \beta_2, \ldots, \beta_k$ describe the effect of each explanatory variable on the dependent variable $Y$ after controlling for the effects of other explanatory variables in the model.

Each coefficient $\beta_j$ measures the increase in the value of the response variable $y$ for a unit increase in the value of $x_j$, with the other covariates held fixed.

As in the bivariate case, hypotheses about the values of $\beta_1, \beta_2, \ldots, \beta_k$ can be tested, notably the hypothesis $\beta_i = 0$ which states that, after controlling for the effects of other variables, the variable $x_i$ has no linear relationship with $Y$.

Recall that in the bivariate case a hypothesis of $\beta = 0$ is equivalent to $\rho = 0$.

Generally speaking, it is not useful to include in a multiple regression model a covariate $x_i$ for which we cannot reject the hypothesis that $\beta_i = 0$.

In R, the statistic and p-value for the tests of $H_0: \beta_i = 0$ are given in the output of summary(model).
Question

For our equity returns from four different markets, we have the following output from R:

[R output: coefficient table from summary(model)]

By considering the p-values given in the final column comment on the significance of the parameters.

Solution

A p-value of less than 0.05 (helpfully indicated by asterisks) indicates a significant result.

We can see that $\beta_2$ and $\beta_3$ are significantly different from zero.

4.3 Analysis of variance (ANOVA)
In Section 3.3 we partitioned the variability into that arising from the regression model ($SS_{REG}$) and the left-over residual (or error) variability ($SS_{RES}$ or $SS_{ERR}$). We then calculated a ratio of the variances to obtain the proportion of variability that was determined (or explained) by the model (called the coefficient of determination, $R^2$). This gave us a crude measure of fit.

We can use ANOVA to test $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$ against the alternative $H_1: \beta_j \ne 0$ for at least one $j$.
The ANOVA table is now:

Source of variation    Degrees of Freedom    Sum of Squares    Mean Sum of Squares
Regression             $k$                   $SS_{REG}$        $SS_{REG}/k$
Residual               $n-k-1$               $SS_{RES}$        $SS_{RES}/(n-k-1)$
Total                  $n-1$                 $SS_{TOT}$

On statistical computer packages the regression sum of squares is often subdivided into the sum of squares from each explanatory variable.
Our statistic is now:

$\dfrac{SS_{REG}/k}{SS_{RES}/(n-k-1)} = \dfrac{\text{regression mean square}}{\text{residual mean square}}$

which is $F_{k,\,n-k-1}$ where $H_0$ is rejected for large values of this ratio.

The test statistic is just the ratio of the values in the last column.

Unlike the adjusted $R^2$ we divide the regression mean variability by the residual rather than the total mean variability.

A large value of this ratio means that the majority of the variability is explained by the multiple linear regression model. Therefore we would reject the null hypothesis of no linear relationship. At least one of the predictors must be explaining the variability.

In R, the F statistic and its p-value for a regression model are given in the output of both anova(model) and summary(model).
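Since $SS_{REG} = R^2 \, SS_{TOT}$ and $SS_{RES} = (1 - R^2) \, SS_{TOT}$, the F statistic can also be written in terms of $R^2$ alone, as $\dfrac{R^2/k}{(1-R^2)/(n-k-1)}$. The sketch below applies this to the equity returns model using the rounded value $R^2 = 0.9831$ quoted in Section 3.3, so the result is only approximate; the variable names are illustrative.

```r
# Equity returns model: R^2 as quoted in Section 3.3, 12 periods, 3 covariates
r2 <- 0.9831
n  <- 12
k  <- 3

# F statistic expressed through R^2 (SS_TOT cancels from the ratio)
f_stat  <- (r2 / k) / ((1 - r2) / (n - k - 1))

# Upper-tail p-value on (k, n - k - 1) degrees of freedom
p_value <- pf(f_stat, df1 = k, df2 = n - k - 1, lower.tail = FALSE)
```

A very large F value with a tiny p-value is what we would expect for a model with $R^2$ this close to 1.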
Question

For our equity returns from four different markets, where we model Mkt_4 using all of the other markets we have the following output:

[R output: F statistic and p-value for the model]

Explain this result.

Solution

So we can reject $H_0: \beta_1 = \beta_2 = \beta_3 = 0$, since the p-value is extremely small.

So at least one of the coefficients is non-zero, ie there is some relationship with at least one of the covariates.

4.4 Estimating a mean response and predicting an individual response
The whole point of the modelling exercise is so that we can estimate values of the response variable $Y$ given the input variables $x_1, x_2, \ldots, x_k$.

Mean response

As with the linear model we can estimate the expected (mean) response, $\mu_0$, for a multiple linear regression model given a vector of explanatory variables, $\mathbf{x}_0$.

$\mu_0 = E[Y \mid \mathbf{x}_0] = \alpha + \beta_1 x_{01} + \beta_2 x_{02} + \cdots + \beta_k x_{0k}$ is estimated by $\hat\mu_0 = \hat\alpha + \hat\beta_1 x_{01} + \hat\beta_2 x_{02} + \cdots + \hat\beta_k x_{0k}$, which is an unbiased estimator.
Recall that the multivariate linear regression model assumes that the $Y_i$s are independent, normally distributed random variables, with $E[Y_i] = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik}$. We have used this expected value to obtain an estimated mean response corresponding to the vector $\mathbf{x}_0$.

We are using vector notation here:

$\mathbf{x}_0 = (x_{01}, x_{02}, \ldots, x_{0k})$
Individual response

Similarly, we could predict an individual response $y_0$ (for $\mathbf{x} = \mathbf{x}_0$) using the same estimate $\hat y_0 = \hat\alpha + \hat\beta_1 x_{01} + \hat\beta_2 x_{02} + \cdots + \hat\beta_k x_{0k}$ but with an extra $\sigma^2$ in the expression for the variance of the estimator compared to the mean response.

Recall that for an individual response value we have $y_i = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + e_i$. Each individual response value is associated with an error term from the regression line. Since $e_i \sim N(0, \sigma^2)$ an individual point is on the regression line on average, hence we have the same estimate $\hat\alpha + \hat\beta_1 x_{01} + \hat\beta_2 x_{02} + \cdots + \hat\beta_k x_{0k}$ as for the mean response. However, there is an additional $\sigma^2$ for the variance.
Question

For the equity returns from four different markets, we had the following model:

$\hat y = -0.001954 + 0.211472x_1 + 0.125051x_2 + 0.598636x_3$

where Market 4 is the response variable ($Y$) and the other three markets are the explanatory variables ($X_1, X_2, X_3$).

Use this model to construct an estimate for the return on Market 4 when the returns on Market 1, Market 2 and Market 3 are 8%, 4% and -1%, respectively.

Solution

Substituting these values into our equation gives:

$\hat y = -0.001954 + 0.211472 \times 0.08 + 0.125051 \times 0.04 + 0.598636 \times (-0.01) = 0.0140$

ie 1.40%.
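The substitution above is a dot product between the coefficient vector and the new observation (with a leading 1 for the intercept). A minimal R sketch, using the fitted coefficients quoted in the question and illustrative object names:

```r
# Fitted coefficients quoted in the question
coefs <- c(intercept = -0.001954,
           Mkt_1     =  0.211472,
           Mkt_2     =  0.125051,
           Mkt_3     =  0.598636)

# New observation: 1 for the intercept, then returns of 8%, 4% and -1%
x0 <- c(1, 0.08, 0.04, -0.01)

# Estimated return on Market 4
y0_hat <- sum(coefs * x0)
round(100 * y0_hat, 2)   # as a percentage
```

With a fitted model object available, predict(model, newdata) performs the same calculation.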
We will use R to calculate confidence intervals for the mean and individual responses. These are beyond the scope of the written exam.

For the equity returns from four different markets, we can obtain 95% confidence intervals for the mean and individual response when the returns on Market 1, Market 2 and Market 3 are 8%, 4% and -1%, respectively, using R as follows:

newdata <- data.frame(Mkt_1=0.08, Mkt_2=0.04, Mkt_3=-0.01)
predict(model,newdata,interval="confidence",level=0.95)
predict(model,newdata,interval="predict",level=0.95)

These give (-1.95%, 4.74%) and (-2.14%, 4.93%), respectively to 3 SF.

4.5 Checking the model
As we did for the linear regression model, we can also calculate the residuals from the fit at each $x_i$ which is the estimated error, $\hat e_i = y_i - \hat y_i$. We can then examine them to see if they are normally distributed and also independent of the explanatory variables.
Question

For the equity returns from four different markets, we had the following model:

$\hat y = -0.001954 + 0.211472x_1 + 0.125051x_2 + 0.598636x_3$

where Market 4 is the response variable ($Y$) and the other three markets are the explanatory variables ($X_1, X_2, X_3$).

The equity returns from four different markets for the first time period were:

Mkt 1    Mkt 2    Mkt 3     Mkt 4
0.83%    4.27%    -1.79%    -0.39%

Calculate the residual for this first time period.

Solution

Substituting the values of markets 1 to 3 during the first time period into our equation gives:

$\hat y = -0.001954 + 0.211472 \times 0.0083 + 0.125051 \times 0.0427 + 0.598636 \times (-0.0179) = -0.0056$

So the residual is:

$-0.0039 - (-0.0056) = 0.0017$
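The residual calculation can be checked in R as follows. This sketch works from the quoted coefficients and the first-period returns used in the solution; the object names are illustrative.

```r
# Fitted coefficients (intercept, Mkt_1, Mkt_2, Mkt_3)
coefs <- c(-0.001954, 0.211472, 0.125051, 0.598636)

# First time period: returns on Markets 1 to 3, and the observed Market 4 return
x_first <- c(1, 0.0083, 0.0427, -0.0179)   # leading 1 for the intercept
y_first <- -0.0039

# Fitted value and residual (observed minus fitted)
fitted_first   <- sum(coefs * x_first)
residual_first <- y_first - fitted_first
round(c(fitted = fitted_first, residual = residual_first), 4)
```

With the fitted model object, residuals(model)[1] gives this value without the rounding in the quoted coefficients.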
A Q-Q plot can be used to test whether the residuals are normally distributed. A plot of the residuals against the fitted values can be used to determine if the variance is constant and whether they are independent of the explanatory variables.
Question

For our equity returns from four different markets, the Q-Q plot of the residuals and the plot of the residuals against the fitted values are given here.

[Figures: Q-Q plot of the residuals; residuals against the fitted values]

Comment on these results.

Solution

The Q-Q plot suggests that one of the data points does not fit the assumption of normality. The plot of residuals against fitted values possibly suggests greater variance for the lower fitted values. This may indicate dependency and thus imply the model has some deficiencies.

4.6 The process of selecting explanatory variables
Selecting the optimal set of explanatory variables is not always easy. Two general approaches can be outlined:

(1) Forward selection. Start with the single covariate that is most closely related to the dependent variable $Y$. Add that to the model. Then search among the remaining covariates to find the one which improves the adjusted $R^2$ the most when added to the model. Continue adding covariates until adding any more causes the adjusted $R^2$ to fall.
In Chapter 11, we saw that the Pearson correlation coefficient matrix for the returns on the four equity markets was:

         Mkt_1        Mkt_2        Mkt_3        Mkt_4
Mkt_1    1.0000000    0.6508163    0.9538019    0.9727972
Mkt_2    0.6508163    1.0000000    0.5321185    0.6893932
Mkt_3    0.9538019    0.5321185    1.0000000    0.9681911
Mkt_4    0.9727972    0.6893932    0.9681911    1.0000000

Let's use a forward selection approach to creating a multiple linear regression model. Mkt_4 is the response variable ($Y$) and the other three markets are the explanatory variables $X_1, X_2, X_3$.
First covariate

We start with Mkt_1 as that has the highest correlation with Mkt_4.

Using R we get the model $\hat y = -0.000140 + 0.873309x_1$ which has an adjusted $R^2$ of 0.941.

Second covariate

Using R, adding Mkt_2 gives the model $\hat y = -0.000063 + 0.816265x_1 + 0.062317x_2$ which has an adjusted $R^2$ of 0.9411.

Whereas adding Mkt_3 gives the model $\hat y = -0.001692 + 0.490675x_1 + 0.418474x_3$ which has an adjusted $R^2$ of 0.9564.

Since adding the covariate Mkt_3 improves the adjusted $R^2$ the most, we would go for this model.
Third covariate

Now we have a model with both Mkt_1 and Mkt_3 as covariates, we will see if adding Mkt_2 produces an improvement.

Using R, adding Mkt_2 gives the model $\hat y = -0.001954 + 0.211472x_1 + 0.125051x_2 + 0.598636x_3$ which has an adjusted $R^2$ of 0.9768.

However, whilst this maximises the adjusted $R^2$, one of the coefficients of this model is not significantly different from zero.
(2) Backward selection. Start by adding all available covariates. Then remove covariates one by one for which the hypothesis that $\beta_i = 0$ cannot be rejected until the adjusted $R^2$ value reaches a maximum, and all the remaining covariates have a statistically significant impact on $Y$.

For the equity returns from four different markets, using R, the model with all covariates added is $\hat y = -0.001954 + 0.211472x_1 + 0.125051x_2 + 0.598636x_3$ which has an adjusted $R^2$ of 0.9768.
Wecan see that the coefficient for Mkt_1is not significantly different from zero. Removing this
coefficient using Rgivesthe model =-0.002598 + 0.155102yx + 0.785578 x23 whichhasan
adjusted2R of 0.97652.
Soremoving this covariate causesthe adjusted 2R to decrease not increase. So wed probably
keepit.
The problem is the high correlation between Mkt_1 and Mkt_3 meaningthat there is some
overlap between them in descriptive
ability. Ideally
we would use Principal Components
Analysis
from Chapter 11 to reduce the number of covariates by removing this overlap.
4.7
Extending the scope of the multiple linear model
Interaction between terms
So far we have only considered each variable $X_j$ as a main effect, that is where we incorporate each new variable via an additive term, $\beta_j X_j$. This means that a unit increase in $X_j$ will increase the average response by $\beta_j$ regardless of the other variables.
So in our multiple linear regression model with Mkt_4 as the response variable ($Y$), the three other markets ($X_1, X_2, X_3$) are included as main effects only:
$$\hat{y} = -0.001954 + 0.211472x_1 + 0.125051x_2 + 0.598636x_3$$
So an increase in, say Mkt_1, by 1% would lead to an increase in Mkt_4 of $0.211472 \times 0.01$.
However, it is often the case that the effect of one predictor variable, say $X_1$, on the response variable, $Y$, depends on the value of another predictor variable, say $X_2$. This is called interaction.
That is, we observe an additional effect when both predictors are present.
We model this by including an interaction term, denoted $X_1.X_2$ on paper, which corresponds to the term $\beta_{12}X_1X_2$ in the regression function.
The regression function for the two variables, $X_1$ and $X_2$, as main effects and their interaction is:
$$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_{12} X_1 X_2$$
Note that when an interaction term is used in a model, both main effects must also be included.
R uses a colon to denote interaction, hence the code to fit the multiple linear model above is:
    model <- lm(Y ~ X1+X2+X1:X2)
The shorthand notation for the main effects and the interaction between them is denoted X1*X2, which corresponds to the whole model above.
So the equivalent way of specifying the above model in R is:
    model <- lm(Y ~ X1*X2)
Interaction effects are described in greater detail in the Generalised Linear Models chapter.
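To see that the two ways of specifying the interaction model are equivalent, we can fit both to the same data and compare the coefficients. The sketch below uses simulated values for `X1`, `X2` and `Y`; the true coefficients chosen are arbitrary assumptions for illustration only.

```r
# Simulated (hypothetical) data with a genuine interaction effect
set.seed(42)
X1 <- runif(50)
X2 <- runif(50)
Y  <- 1 + 2 * X1 - X2 + 3 * X1 * X2 + rnorm(50, sd = 0.1)

# Main effects written out, with the colon denoting the interaction
model_long  <- lm(Y ~ X1 + X2 + X1:X2)

# Shorthand: X1*X2 expands to X1 + X2 + X1:X2
model_short <- lm(Y ~ X1 * X2)

coef(model_long)
coef(model_short)
```

Both calls return an intercept, the two main effects and a coefficient labelled `X1:X2`, which is the estimated interaction parameter.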
Polynomial regression
Finally, the term linear in linear regression means that the regression function is linear in the coefficients $\alpha$ and $\beta_1, \beta_2, \ldots, \beta_k$ rather than linear in terms of the $X_j$s.
For example, we could fit a quadratic model:
$$Y = \alpha + \beta_1 X + \beta_2 X^2$$
Although this is a bivariate model (having only two measured variables $X$ and $Y$) we model it as a multiple linear regression model treating $X$ and $X^2$ as different variables.
We use the I( ) function in R to treat a term as a different variable. So the R code to fit this quadratic model is:
    model <- lm(Y ~ X+I(X^2))
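The reason I( ) is needed is that, inside an R model formula, `^` is a formula operator rather than arithmetic, so `X^2` on its own reduces to `X`. The sketch below illustrates this on simulated data (the values of `X` and `Y` are assumptions for illustration).

```r
# Simulated (hypothetical) data with a quadratic relationship
set.seed(7)
X <- seq(-2, 2, length.out = 40)
Y <- 1 + 0.5 * X + 2 * X^2 + rnorm(40, sd = 0.2)

# With I(): X^2 is evaluated arithmetically and treated as a second covariate
quad_fit <- lm(Y ~ X + I(X^2))

# Without I(): the formula X + X^2 collapses to X, so no quadratic term is fitted
no_I_fit <- lm(Y ~ X + X^2)

names(coef(quad_fit))   # intercept, X and I(X^2)
names(coef(no_I_fit))   # intercept and X only
```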
Chapter 12 Summary
A regression model, such as the simple linear regression model, can be used to model the response when an explanatory variable operates at a given level, or to model bivariate data points.
Simple linear regression
The linear regression model is given by:
$$Y_i = \alpha + \beta x_i + e_i \quad \text{where } e_i \sim N(0, \sigma^2)$$
The parameters $\alpha$, $\beta$ and $\sigma^2$ can be estimated using the formulae:
$$\hat\beta = \frac{S_{xy}}{S_{xx}} \qquad \hat\alpha = \bar{y} - \hat\beta\bar{x}$$
$$\hat\sigma^2 = \frac{1}{n-2}\sum (y_i - \hat{y}_i)^2 = \frac{1}{n-2}\left(S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right)$$
These are given on page 24 of the Tables.
Confidence intervals can be obtained for $\beta$ using the result:
$$\frac{\hat\beta - \beta}{\sqrt{\hat\sigma^2 / S_{xx}}} \sim t_{n-2}$$
Prediction intervals for a mean response $\mu_0$ or an individual response $y_0$ can be obtained using the results:
$$\frac{\hat\mu_0 - \mu_0}{\sqrt{\left(\dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{xx}}\right)\hat\sigma^2}} \sim t_{n-2} \quad \text{and} \quad \frac{\hat{y}_0 - y_0}{\sqrt{\left(1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{xx}}\right)\hat\sigma^2}} \sim t_{n-2}$$
These are also in the Tables.
The fit of the linear regression model can be analysed by partitioning the total variance, $SS_{TOT}$, into that which is explained by the model, $SS_{REG}$, and that which is not, $SS_{RES}$. The formulae for these are as follows:
$$SS_{TOT} = \sum (y_i - \bar{y})^2 = S_{yy}$$
$$SS_{REG} = \sum (\hat{y}_i - \bar{y})^2 = \frac{S_{xy}^2}{S_{xx}}$$
$$SS_{RES} = \sum (y_i - \hat{y}_i)^2 = S_{yy} - \frac{S_{xy}^2}{S_{xx}}$$
$$SS_{TOT} = SS_{REG} + SS_{RES}$$
The coefficient of determination, $R^2$, gives the percentage of this variance which is explained by the model:
$$R^2 = \frac{SS_{REG}}{SS_{TOT}} = \frac{S_{xy}^2}{S_{xx}S_{yy}}$$
Examining the residuals, $\hat{e}_i = y_i - \hat{y}_i$, we would expect them to be normally distributed about zero and to have no relationship with the $x$ values. Both of these features can be examined using diagrams.
Multiple linear regression
The linear multiple regression model is given by:
$$Y_i = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + e_i \quad \text{where } e_i \sim N(0, \sigma^2)$$
The parameters $\alpha, \beta_1, \ldots, \beta_k$ and $\sigma^2$ can be estimated using a computer package.
Confidence intervals and tests can be carried out for $\beta_1, \ldots, \beta_k$.
ANOVA can be used to test $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$ against the alternative $H_1: \beta_j \neq 0$ for at least one $j$:
$$F = \frac{MSS_{REG}}{MSS_{RES}} = \frac{SS_{REG}/k}{SS_{RES}/(n-k-1)} \sim F_{k,\,n-k-1}$$
We can partition the total variance, $SS_{TOT}$, into that which is explained by the model, $SS_{REG}$, and that which is not, $SS_{RES}$. The coefficient of determination, $R^2$, gives the percentage of this variance which is explained by the combination of explanatory variables in the model.
However, since $R^2$ cannot decrease as more explanatory variables are added to the model, if it is used alone to assess the adequacy of the model, there will always be a tendency to add more explanatory variables, which is undesirable. Hence, computer packages quote an adjusted $R^2$ statistic which is based on the mean square errors and takes account of the number of predictors, $k$, and the number of data points the model is based on.
$$\text{'Adjusted' } R^2 = 1 - \frac{MSS_{RES}}{MSS_{TOT}} = 1 - \frac{n-1}{n-k-1}\left(1 - R^2\right)$$
If the model is a good fit then we would expect the residuals, $\hat{e}_i = y_i - \hat{y}_i$, to be normally distributed about zero, have constant variance and no relationship with the $x$ values. These can be examined using diagrams.
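As a quick check of the adjusted $R^2$ formula, the sketch below compares a hand-coded version with the value R reports. The helper `adjusted_r2()` is a hypothetical function written for this example, and R's built-in `mtcars` dataset is used purely as convenient illustrative data (it is not part of the course material).

```r
# Adjusted R^2 from R^2, the number of data points n and the number of predictors k
adjusted_r2 <- function(r2, n, k) {
  1 - (n - 1) / (n - k - 1) * (1 - r2)
}

# Illustrative fit with k = 2 predictors on a built-in dataset
fit <- lm(mpg ~ wt + hp, data = mtcars)
s   <- summary(fit)

by_formula <- adjusted_r2(s$r.squared, n = nrow(mtcars), k = 2)
by_formula
s$adj.r.squared   # the value quoted by R
```

The two values should agree, and both are smaller than the unadjusted $R^2$ because of the penalty for the number of predictors.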
We can use one of the following approaches to select an optimal set of explanatory variables:
(1) Forward selection. Start with the single covariate that is most closely related to the dependent variable $Y$. Add that to the model. Then search among the remaining covariates to find the one which improves the adjusted $R^2$ the most when added to the model. Continue adding covariates until adding any more causes the adjusted $R^2$ to fall.
(2) Backward selection. Start by adding all available covariates. Then remove covariates one by one for which the hypothesis that $\beta_i = 0$ cannot be rejected until the adjusted $R^2$ value reaches a maximum, and all the remaining covariates have a statistically significant impact on $Y$.
Interactive terms of the form $\beta_{ab}x_{ai}x_{bi}$ should be added where the effect of one predictor on the response depends on the value of another predictor.
Chapter 12 Practice Questions
12.1
A new computerised ultrasound scanning technique has enabled doctors to monitor the weights of unborn babies. The table below shows the estimated weights for one particular baby at fortnightly intervals during the pregnancy.

Gestation period (weeks)     30    32    34    36    38    40
Estimated baby weight (kg)   1.6   1.7   2.5   2.8   3.2   3.5

$\sum x = 210$, $\sum x^2 = 7{,}420$, $\sum y = 15.3$, $\sum y^2 = 42.03$, $\sum xy = 549.8$
(i) Show that:
(a) $S_{xx} = 70$, $S_{yy} = 3.015$ and $S_{xy} = 14.3$.
(b) the fitted regression line is $\hat{y} = -4.60 + 0.2043x$.
(c) $\hat\sigma^2 = 0.0234$.
(ii) Calculate the baby's expected weight at 42 weeks (assuming it hasn't been born by then).
(iii)
(a) Calculate the residual sum of squares and the regression sum of squares for these data.
(b) Calculate the coefficient of determination, $R^2$, and comment on its value.
(iv) Carry out a test of $H_0: \beta = 0$ vs $H_1: \beta > 0$, assuming a linear model is appropriate.
(v) Construct an ANOVA table for the sum of squares from part (iii)(a) and carry out an F-test, stating the conclusion clearly.
(vi)
(a) Estimate the mean weight of a baby at 33 weeks.
(b) Calculate the variance of this mean predicted response.
(c) Hence, calculate a 90% confidence interval for the mean weight of a baby at 33 weeks.
(vii)
(a) Estimate the actual weight of an individual baby at 33 weeks.
(b) Calculate the variance of this individual predicted response.
(c) Hence, calculate a 90% confidence interval for the weight of an individual baby at 33 weeks.
(viii) The table below shows some of the residuals:

Gestation period (weeks)   30     32    34    36     38     40
Residual                   0.07               0.05   0.04   -0.07

(a) Calculate the missing residuals.
Two plots of the residuals are as follows:
(b) Comment on the first dotplot of the residuals.
(c) Comment on the fit of the model using the plot of the residuals against the $x$ values.
(d) Comment on the Q-Q plot of the residuals given below:
12.2
An analysis using the simple linear regression model based on 19 data points gave:
$s_{xx} = 12.2$, $s_{yy} = 10.6$, $s_{xy} = 8.1$
(i)
(a) Calculate $\hat\beta$.
(b) Test whether $\beta$ is significantly different from zero.
(ii)
(a) Calculate $r$.
(b) Test whether $\rho$ is significantly different from zero.
(iii) Comment on the results of the tests in parts (i) and (ii).
12.3
The sums of the squares of the errors in a regression analysis are found to be:
$SS_{REG} = \sum(\hat{y}_i - \bar{y})^2 = 6.4$, $SS_{RES} = \sum(y_i - \hat{y}_i)^2 = 3.6$, $SS_{TOT} = \sum(y_i - \bar{y})^2 = 10.0$
Calculate the coefficient of determination and explain what this represents.
12.4
Explain how to transform the following models to linear form:
(i) $y_i = a + bx_i^2 + e_i$
(ii) $y_i = ae^{bx_i}$
12.5
Exam style
A university wishes to analyse the performance of its students on a particular degree course. It records the scores obtained by a sample of 12 students at entry to the course, and the scores obtained in their final examinations by the same students. The results are as follows:

Student                      A    B    C    D    E    F    G    H    I    J    K    L
Entrance exam score x (%)    86   53   71   60   62   79   66   84   90   55   58   72
Finals paper score y (%)     75   60   74   68   70   75   78   90   85   60   62   70

$\sum x = 836$, $\sum y = 867$, $\sum x^2 = 60{,}016$, $\sum y^2 = 63{,}603$, $\sum (x - \bar{x})(y - \bar{y}) = 1{,}122$
(i) Calculate the fitted linear regression equation of $y$ on $x$. [3]
Now assume that the full normal model holds.
(ii)
(a) Calculate an estimate of the error variance $\sigma^2$.
(b) Hence, obtain a 90% confidence interval for $\sigma^2$. [3]
(iii) Test whether the data are positively correlated by considering the slope parameter. [3]
(iv) Calculate a 95% confidence interval for the mean finals paper score corresponding to an individual entrance score of 53. [3]
(v)
(a) Calculate the proportion of variation explained by the model.
(b) Hence, comment on the fit of the model. [2]
[Total 14]
12.6
Exam style
The share price, in pence, of a certain company is monitored over an 8-year period. The results are shown in the table below:

Time (years)     0     1     2     3     4     5     6     7     8
Price (pence)    100   131   183   247   330   454   601   819   1,095

$\sum (x_i - \bar{x})^2 = 60$, $\sum (y_i - \bar{y})^2 = 925{,}262$, $\sum (x_i - \bar{x})(y_i - \bar{y}) = 7{,}087$
An actuary fits the following simple linear regression model to the data:
$$y_i = \alpha + \beta x_i + e_i \qquad i = 0, 1, \ldots, 8$$
where $\{e_i\}$ are independent normal random variables with mean zero and variance $\sigma^2$.
(i) Determine the fitted regression line in which the price is modelled as the response and the time as an explanatory variable. [2]
(ii) Calculate a 99% confidence interval for:
(a) $\beta$, the true underlying slope parameter
(b) $\sigma^2$, the true underlying error variance. [5]
(iii)
(a) State the total sum of squares and calculate its partition into the regression sum of squares and the residual sum of squares.
(b) Calculate the proportion of variability explained by the model using the values in part (iii)(a).
(c) Comment on the result in part (iii)(b). [5]
(iv) The actuary decides to check the fit of the model by calculating the residuals.
(a) Complete the table of residuals (rounding to the nearest integer):

Time (years)   0     1     2     3     4     5      6     7     8
Residual       132         -21   -75         -104   -75   25

A dotplot of the residuals is shown below:
[Figure: dotplot of the residuals; horizontal axis from -150 to 200]
(b) Comment on the assumption of normality using the dotplot.
A plot of the residuals against time is given below:
[Figure: plot of the residuals (vertical axis, -150 to 200) against time (horizontal axis, 0 to 8)]
(c) Comment on the appropriateness of the linear model by referring to the plot of the residuals against time. [5]
[Total 17]
12.7
Exam style
A schoolteacher is investigating the claim that class size does not affect GCSE results. His observations of nine GCSE classes are as follows:

Class                                      X1    X2    X3    X4    Y1    Y2    Y3    Y4    Y5
Students in class (c)                      35    32    27    21    34    30    28    24    7
Average GCSE point score for class (p)     5.9   4.1   2.4   1.7   6.3   5.3   3.5   2.6   1.6

$\sum c = 238$, $\sum c^2 = 6{,}884$, $\sum p = 33.4$, $\sum p^2 = 149.62$, $\sum cp = 983$
(i) Determine the fitted regression line for $p$ on $c$. [3]
Class X5 was not included in the results above and contains 15 students.
(ii)
(a) Calculate an estimate of the average GCSE point score for this individual class.
(b) Calculate the standard error for the estimate in part (ii)(a) assuming the full normal model. [4]
[Total 7]
12.8
Exam style
An actuary is fitting the following linear regression model through the origin:
$$Y_i = \beta x_i + e_i \qquad e_i \sim N(0, \sigma^2) \qquad i = 1, 2, \ldots, n$$
(i) Show that the least squares estimator of $\beta$ is given by:
$$\hat\beta = \frac{\sum x_i Y_i}{\sum x_i^2}$$
[3]
(ii) Derive the bias and mean square error of $\hat\beta$ under this model. [4]
[Total 7]
12.9
Exam style
A life assurance company is examining the force of mortality, $\mu_x$, of a particular group of policyholders. It is thought that it is related to the age, $x$, of the policyholders by the formula:
$$\mu_x = Bc^x$$
It is decided to analyse this assumption by using the linear regression model:
$$Y_i = \alpha + \beta x_i + e_i$$
where $e_i \sim N(0, \sigma^2)$ are independently distributed.
The summary results for eight ages were as follows:

Age, x                                      30      32      34      36      38      40      42      44
Force of mortality, mu_x (x 10^-4)          5.84    6.10    6.48    7.05    7.87    9.03    10.56   12.66
ln mu_x (3 sf)                              -7.45   -7.40   -7.34   -7.26   -7.15   -7.01   -6.85   -6.67

$\sum x_i = 296$, $\sum x_i^2 = 11{,}120$, $\sum \ln\mu_{x_i} = -57.129$, $\sum (\ln\mu_{x_i})^2 = 408.50$, $\sum x_i \ln\mu_{x_i} = -2{,}104.5$
(i)
(a) Apply a transformation to the original formula, $\mu_x = Bc^x$, to make it suitable for analysis by linear regression.
(b) Write down expressions for $Y$, $\alpha$ and $\beta$ in terms of $\mu_x$, $B$ and $c$ using the transformation given in part (i)(a). [2]
The graph of $\ln\mu_x$ against the age of the policyholder, $x$, is shown below:
[Figure: plot of ln mu_x (vertical axis, -8 to -6) against age x (horizontal axis, 25 to 45)]
(ii) Comment on the suitability of the regression model and state how this supports the transformation in part (i)(a). [1]
(iii) Use the data to calculate least squares estimates of $B$ and $c$ in the original formula. [3]
(iv)
(a) Calculate the coefficient of determination between $\ln\mu_x$ and $x$.
(b) Hence comment on the fit of the model to the data.
(c) Complete the table of residuals below.
(d) Comment on the fit by considering the residuals. [5]

Age, x              30     32    34      36    38      40    42     44
Residual, e_i       0.08         -0.03         -0.06         0.02   0.09

(v)
(a) Calculate a 95% confidence interval for the mean predicted response $\ln\mu_{35}$.
(b) Hence obtain a 95% confidence interval for the mean predicted value of $\mu_{35}$. [4]
[Total 16]
12.10
Exam style
The government of a country suffering from hyperinflation has sponsored an economist to monitor the price of a basket of items in the population's staple diet over a one-year period. As part of his study, the economist selected six days during the year and on each of these days visited a single nightclub, where he recorded the price of a pint of lager. His report showed the following prices:

Day (i)          8        29       57       92       141      148
Price (P_i)      15       17       22       51       88       95
ln P_i           2.7081   2.8332   3.0910   3.9318   4.4773   4.5539

$\sum i = 475$, $\sum i^2 = 54{,}403$, $\sum \ln P_i = 21.5953$, $\sum (\ln P_i)^2 = 81.1584$, $\sum i \ln P_i = 1{,}947.020$
The economist believes that the price of a pint of lager in a given bar on day $i$ can be modelled by:
$$\ln P_i = a + bi + e_i$$
where $a$ and $b$ are constants and the $e_i$s are uncorrelated $N(0, \sigma^2)$ random variables.
(i) Estimate the values of $a$, $b$ and $\sigma^2$. [5]
(ii) Calculate the linear correlation coefficient $r$. [1]
(iii) Calculate a 99% confidence interval for $b$. [2]
(iv) Determine a 95% confidence interval for the average price of a pint of lager on day 365:
(a) in the country as a whole
(b) in a randomly selected bar. [7]
[Total 15]
12.11
Exam style
(i) Show that the maximum likelihood estimates (MLEs) of $\alpha$ and $\beta$ in the simple linear regression model are identical to the least squares estimates. [5]
(ii) Show that the MLE of $\sigma^2$ has a different denominator from the least squares estimate. [4]
[Total 9]
12.12
The effectiveness of a tablet containing $x_1$ mg of drug 1 and $x_2$ mg of drug 2 is being tested. In trials the following results are obtained:

% effectiveness, y    x1      x2
92.5                  50.9    20.8
94.9                  54.1    16.9
89.3                  47.3    25.2
94.1                  45.1    49.7
98.9                  37.6    95.2

$\sum y = 469.7$, $\sum x_1 = 235$, $\sum x_2 = 207.8$, $\sum x_1^2 = 11{,}202.68$, $\sum x_2^2 = 12{,}886.42$
$\sum x_1 x_2 = 8{,}985.96$, $\sum yx_1 = 22{,}028.78$, $\sum yx_2 = 19{,}870.22$
(i) Using the multiple linear least squares regression model:
$$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + e$$
(a) Show that the least squares estimates of $\alpha$, $\beta_1$ and $\beta_2$ satisfy:
$$\sum y_i = n\hat\alpha + \hat\beta_1 \sum x_{i1} + \hat\beta_2 \sum x_{i2}$$
$$\sum y_i x_{i1} = \hat\alpha \sum x_{i1} + \hat\beta_1 \sum x_{i1}^2 + \hat\beta_2 \sum x_{i1}x_{i2}$$
$$\sum y_i x_{i2} = \hat\alpha \sum x_{i2} + \hat\beta_1 \sum x_{i1}x_{i2} + \hat\beta_2 \sum x_{i2}^2$$
(b) Hence, using the above data, show that the fitted model is:
$$\hat{y} = 25.31 + 1.194x_1 + 0.3015x_2$$
[7]
(ii) Comment on the significance of the parameters by considering the following output from R for this model. [2]
The coefficient of determination for the fitted model is $R^2 = 0.9992$.
(iii) Calculate the adjusted $R^2$. [2]
The ANOVA table for the model is:

Source of variation   Degrees of Freedom   Sum of Squares   Mean Sum of Squares
Regression            2                    49.1137          *
Residual              2                    0.0383           *
Total                 4                    49.152

(iv) Calculate the missing values, the F statistic and then carry out the F test, stating the conclusion clearly. [4]
(v) Calculate the percentage effectiveness for a tablet containing 51.3 mg of drug 1 and 18.3 mg of drug 2. [2]
The plot of the residuals against the fitted values and the Q-Q plot of the residuals are given below.
(vi) Comment on the fit of the model, making reference to the plots given above. [2]
It is thought that the two drugs might have an interactive effect.
(vii)
(a) Explain what this means.
(b) Write down the formula for the regression model that has the two drugs as main effects and also their interaction.
The model in part (vii)(b) has an adjusted $R^2$ of 0.9969.
(c) Comment on whether the new model is an improvement. [4]
[Total 23]
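Chapter 12 Solutions
12.1
(i)(a) Calculate sums
$$S_{xx} = \sum x^2 - \frac{1}{n}\left(\sum x\right)^2 = 7{,}420 - \frac{1}{6}\times 210^2 = 70$$
$$S_{yy} = \sum y^2 - \frac{1}{n}\left(\sum y\right)^2 = 42.03 - \frac{1}{6}\times 15.3^2 = 3.015$$
$$S_{xy} = \sum xy - \frac{1}{n}\left(\sum x\right)\left(\sum y\right) = 549.8 - \frac{1}{6}\times 210 \times 15.3 = 14.3$$
(i)(b) Fitted regression line
Using the values from part (i)(a):
$$\hat\beta = \frac{S_{xy}}{S_{xx}} = \frac{14.3}{70} = 0.2043$$
The mean values are:
$$\bar{x} = \frac{1}{n}\sum x = \frac{210}{6} = 35 \quad \text{and} \quad \bar{y} = \frac{1}{n}\sum y = \frac{15.3}{6} = 2.55$$
So:
$$\hat\alpha = \bar{y} - \hat\beta\bar{x} = 2.55 - 0.2043 \times 35 = -4.60$$
Hence the fitted regression line is $\hat{y} = -4.60 + 0.2043x$.
(i)(c) Error variance
$$\hat\sigma^2 = \frac{1}{n-2}\left(S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right) = \frac{1}{4}\left(3.015 - \frac{14.3^2}{70}\right) = 0.0234$$
(ii) Estimated weight at 42 weeks
Using the regression line from part (i)(b):
$$\hat{y} = \hat\alpha + \hat\beta x = -4.60 + 0.2043 \times 42 = 3.98 \text{ kg}$$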
(iii)(a) Partition of the variability
For the baby weights, $S_{xx} = 70$, $S_{yy} = 3.015$ and $S_{xy} = 14.3$. So:
$$SS_{TOT} = S_{yy} = 3.015$$
$$SS_{REG} = \frac{S_{xy}^2}{S_{xx}} = \frac{14.3^2}{70} = 2.921$$
$$SS_{RES} = SS_{TOT} - SS_{REG} = 3.015 - 2.921 = 0.094$$
(iii)(b) Coefficient of determination
$$R^2 = \frac{SS_{REG}}{SS_{TOT}} = \frac{2.921}{3.015} = 0.969 \text{ or } 96.9\%$$
So we see that, in this case, most of the variation is explained by the model. The model is an excellent fit to the data.
(iv) Test for $\beta$
If $H_0$ is true, then the test statistic $\dfrac{\hat\beta - 0}{\sqrt{\hat\sigma^2 / S_{xx}}}$ follows the $t_4$ distribution.
The observed value of this statistic is $\dfrac{0.2043 - 0}{\sqrt{0.0234/70}} = 11.2$, which is much greater than 8.610, the upper 0.05% point of the $t_4$ distribution.
So, we reject $H_0$ at the 0.05% level and conclude that there is extremely strong evidence that $\beta > 0$, ie that the baby's weight is increasing over time.
(v) ANOVA
The ANOVA table is:

Source of variation   df   SS       MSS
Regression            1    2.921    2.921
Residual              4    0.0937   0.0234
Total                 5    3.015

Under $H_0: \beta = 0$ we have $F = \dfrac{2.921}{0.0234} = 124.7$ on (1, 4) degrees of freedom.
The p-value of $F = 124.7$ is much less than even 0.01, so $H_0$ is rejected at the 1% level.
Therefore it is reasonable to assume that $\beta \neq 0$, ie there is a linear relationship.
(vi)(a) Estimate of mean response
Using the least squares regression line of $\hat{y} = -4.60 + 0.2043x$, when $x_0 = 33$ we have:
$$\hat\mu_0 = -4.60 + 0.2043 \times 33 = 2.141$$
ie the mean weight of a baby at 33 weeks is expected to be 2.141 kg.
(vi)(b) Variance of mean response
The variance of this estimator is calculated as:
$$\text{var}(\hat\mu_0) = \left(\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right)\hat\sigma^2 = \left(\frac{1}{6} + \frac{(33 - 35)^2}{70}\right) \times 0.0234 = 0.00524$$
(vi)(c) Confidence interval
The 90% confidence interval will be:
$$\hat\mu_0 \pm t_{0.05,4} \times \text{s.e.}(\hat\mu_0) = 2.141 \pm 2.132\sqrt{0.00524} = (1.99, 2.30)$$
(vii)(a) Estimate of individual response
The individual predicted response is also $\hat{y}_0 = 2.141$ kg.
(vii)(b) Variance of individual response
The variance of this estimator is calculated as:
$$\text{var}(\hat{y}_0) = \left(1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right)\hat\sigma^2 = \left(1 + \frac{1}{6} + \frac{(33 - 35)^2}{70}\right) \times 0.0234 = 0.0287$$
(vii)(c) Confidence interval
The 90% confidence interval will be:
$$\hat{y}_0 \pm t_{0.05,4} \times \text{s.e.}(\hat{y}_0) = 2.141 \pm 2.132\sqrt{0.0287} = (1.78, 2.50)$$
(viii)(a) Residuals
The completed table is:

Gestation period (weeks)   30     32      34     36     38     40
Residual                   0.07   -0.24   0.15   0.05   0.04   -0.07

(viii)(b) Comment on first dotplot
All values are between $\pm 3\hat\sigma = \pm 3\sqrt{0.0234} = \pm 0.46$, so there appear to be no outliers.
There may be possible skewness but it's difficult to tell with such a small dataset.
(viii)(c) Comment on the plot of residuals against explanatory variable
The plot appears to be patternless, which implies a good fit.
(viii)(d) Interpret Q-Q plot
One of the values is way off the diagonal line, which indicates that the data set may be non-normal and hence the full normal linear regression model may not be appropriate.
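The calculations in Solution 12.1 can be cross-checked in R. The following is a minimal sketch using `lm()` and `predict()` on the six data points from the question; the object names (`weeks`, `weight`, `fit`, and so on) are chosen here for illustration.

```r
# Baby weight data from Question 12.1
weeks  <- c(30, 32, 34, 36, 38, 40)
weight <- c(1.6, 1.7, 2.5, 2.8, 3.2, 3.5)

fit <- lm(weight ~ weeks)

coef(fit)                 # intercept and slope, parts (i)(b)
sigma(fit)^2              # estimated error variance, part (i)(c)
summary(fit)$r.squared    # coefficient of determination, part (iii)(b)
anova(fit)                # ANOVA table and F statistic, part (v)
round(residuals(fit), 2)  # residuals, part (viii)(a)

# 90% intervals at 33 weeks, parts (vi)(c) and (vii)(c)
new     <- data.frame(weeks = 33)
ci_mean <- predict(fit, new, interval = "confidence", level = 0.90)
pi_ind  <- predict(fit, new, interval = "prediction", level = 0.90)
ci_mean
pi_ind
```

Small differences in the final decimal place compared with the hand calculations are due to the rounding of intermediate values in the solution.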
12.2
(i)(a) Calculate slope parameter estimate
Using the formula given on page 24 of the Tables:
$$\hat\beta = \frac{S_{xy}}{S_{xx}} = \frac{8.1}{12.2} = 0.66393$$
(i)(b) Test whether slope parameter is significantly different from zero
We are testing:
$$H_0: \beta = 0 \quad \text{vs} \quad H_1: \beta \neq 0$$
Under $H_0$, $\dfrac{\hat\beta - \beta}{\sqrt{\hat\sigma^2 / S_{xx}}}$ has a $t_{n-2}$ distribution.
Now:
$$\hat\sigma^2 = \frac{1}{n-2}\left(S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right) = \frac{1}{17}\left(10.6 - \frac{8.1^2}{12.2}\right) = 0.30718$$
So the observed value of the test statistic is:
$$\frac{0.66393 - 0}{\sqrt{0.30718/12.2}} = 4.184$$
Since this is much greater than 2.898, the upper 0.5% point of the $t_{17}$ distribution, we have sufficient evidence to reject $H_0$ at the 1% level. Therefore it is reasonable to conclude that $\beta \neq 0$.
(ii)(a) Calculate the correlation coefficient
Using the formula on page 25 of the Tables:
$$r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}} = \frac{8.1}{\sqrt{12.2 \times 10.6}} = 0.71228$$
(ii)(b) Test whether correlation coefficient is significantly different from zero
We are testing:
$$H_0: \rho = 0 \quad \text{vs} \quad H_1: \rho \neq 0$$
Under $H_0$, $\dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$ follows the $t_{n-2}$ distribution.
So the observed value of the test statistic is:
$$\frac{0.71228\sqrt{17}}{\sqrt{1 - 0.71228^2}} = 4.184$$
Since this is much greater than 2.898, the upper 0.5% point of the $t_{17}$ distribution, we have sufficient evidence to reject $H_0$ at the 1% level. Therefore it is reasonable to conclude that $\rho \neq 0$.
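The two test statistics above can be reproduced from the summary statistics alone. The sketch below is illustrative only (the variable names are chosen here); it computes both statistics from the values given in Question 12.2 and shows that they coincide.

```r
# Summary statistics from Question 12.2
n   <- 19
sxx <- 12.2
syy <- 10.6
sxy <- 8.1

# Test based on the slope parameter
beta_hat   <- sxy / sxx
sigma2_hat <- (syy - sxy^2 / sxx) / (n - 2)
t_slope    <- beta_hat / sqrt(sigma2_hat / sxx)

# Test based on the correlation coefficient
r      <- sxy / sqrt(sxx * syy)
t_corr <- r * sqrt(n - 2) / sqrt(1 - r^2)

t_slope
t_corr

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_value <- 2 * pt(-abs(t_slope), df = n - 2)
p_value
```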
(iii) Comment
These tests are equivalent. Testing whether there is any correlation is equivalent to testing if the slope is not zero (ie it is sloping upwards and there is positive correlation or it is sloping downwards and there is negative correlation). So the tests give the same statistic and p-value.
12.3
The coefficient of determination is given by:
$$R^2 = \frac{SS_{REG}}{SS_{TOT}} = \frac{6.4}{10.0} = 0.64$$
This gives the proportion of the total variance explained by the model. So 64% of the variance can be explained by the model, leaving 36% of the total variance unexplained.
12.4
(i) Transform quadratic to linear form
Let $Y_i = y_i$ and $X_i = x_i^2$.
Then the model becomes $Y_i = a + bX_i + e_i$.
(ii) Transform exponential to linear form
Taking logs gives:
$$\ln y_i = \ln a + bx_i$$
Let $Y_i = \ln y_i$ and $X_i = x_i$.
Then the model becomes $Y_i = \alpha + \beta X_i$ where $\alpha = \ln a$ and $\beta = b$.
12.5
(i) Fitted regression line
Calculating the sums of squares:
$$S_{xx} = 60{,}016 - \frac{836^2}{12} = 1{,}774.67 \qquad S_{xy} = 1{,}122$$
[½]
$$\hat\beta = \frac{S_{xy}}{S_{xx}} = \frac{1{,}122}{1{,}774.67} = 0.63223$$
[1]
$$\hat\alpha = \bar{y} - \hat\beta\bar{x} = 72.25 - 0.63223 \times 69.667 = 28.205$$
[½]
Hence, the fitted regression equation of $y$ on $x$ is $\hat{y} = 28.205 + 0.63223x$. [1]
(ii)(a) Estimate of error variance
We have $S_{yy} = 63{,}603 - \dfrac{867^2}{12} = 962.25$, so:
$$\hat\sigma^2 = \frac{1}{n-2}\left(S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right) = \frac{1}{10}\left(962.25 - \frac{1{,}122^2}{1{,}774.67}\right) = 25.289$$
[1]
(ii)(b) Confidence interval for variance
Now $\dfrac{10\hat\sigma^2}{\sigma^2} \sim \chi^2_{10}$, which gives a confidence interval for $\sigma^2$ of:
$$\left(\frac{10 \times 25.289}{18.31}, \frac{10 \times 25.289}{3.94}\right) = (13.8, 64.2)$$
[2]
(iii) Test whether data are positively correlated
We are testing $H_0: \beta = 0$ vs $H_1: \beta > 0$.
Now $\dfrac{\hat\beta - \beta}{\sqrt{\hat\sigma^2 / S_{xx}}} \sim t_{10}$. The observed value of the test statistic is:
$$\frac{0.63223 - 0}{\sqrt{25.289/1{,}774.67}} = 5.296$$
[2]
This is a highly significant result, which exceeds the 0.5% critical value of the $t_{10}$ distribution of 3.169. So we have sufficient evidence at the 0.5% level to reject $H_0$ and we conclude that $\beta > 0$ (ie the data are positively correlated). [1]
(iv) Confidence interval for the mean finals paper score
The variance of the distribution of the mean finals score corresponding to an entrance score of 53 is:
$$\left(\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right)\hat\sigma^2 = \left(\frac{1}{12} + \frac{(53 - 69.667)^2}{1{,}774.67}\right) \times 25.289 = 6.0657$$
[1]
The predicted value is $28.205 + 0.63223 \times 53 = 61.713$. [½]
We have a $t_{10}$ distribution, so the 95% confidence interval is:
$$61.713 \pm 2.228\sqrt{6.0657} = (56.2, 67.2)$$
[1½]
(v)(a) Calculate the proportion of variation explained by the model
The proportion of variability explained by the model is given by:
$$R^2 = \frac{S_{xy}^2}{S_{xx}S_{yy}} = \frac{1{,}122^2}{1{,}774.67 \times 962.25} = 73.7\%$$
[1]
(v)(b) Comment
73.7% of the variation is explained by the model, which indicates that the fit is fairly good. It still might be worthwhile to examine the residuals to double check that a linear model is appropriate. [1]
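The chi-square confidence interval in part (ii)(b) of Solution 12.5 can be checked in R using `qchisq()` in place of the Tables. This is a sketch from the summary statistics given in the question; the variable names are chosen here for illustration.

```r
# Summary statistics from Question 12.5
n   <- 12
sxx <- 60016 - 836^2 / n
syy <- 63603 - 867^2 / n
sxy <- 1122

# Estimate of the error variance, part (ii)(a)
sigma2_hat <- (syy - sxy^2 / sxx) / (n - 2)
sigma2_hat

# 90% confidence interval for sigma^2, part (ii)(b):
# dividing by the upper 5% point gives the lower limit,
# dividing by the lower 5% point gives the upper limit
ci_var <- (n - 2) * sigma2_hat / qchisq(c(0.95, 0.05), df = n - 2)
ci_var
```

Note that the interval is not symmetric about the point estimate, because the chi-square distribution is skewed.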
12.6
(i) Regression line
We are given:
$s_{xx} = 60$, $s_{yy} = 925{,}262$, $s_{xy} = 7{,}087$
So:
$$\hat\beta = \frac{s_{xy}}{s_{xx}} = \frac{7{,}087}{60} = 118.117$$
[1]
Since $\bar{x} = \dfrac{36}{9} = 4$ and $\bar{y} = \dfrac{3{,}960}{9} = 440$, we get:
$$\hat\alpha = \bar{y} - \hat\beta\bar{x} = 440 - 118.117 \times 4 = -32.47$$
[1]
So the regression line is:
$$\hat{y} = -32.47 + 118.117x$$
(ii)(a) Confidence interval for slope parameter
The pivotal quantity is given by:
$$\frac{\hat\beta - \beta}{\sqrt{\hat\sigma^2 / s_{xx}}} \sim t_{n-2}$$
A 99% confidence interval is given by:
$$\hat\beta \pm t_{n-2;0.005}\sqrt{\frac{\hat\sigma^2}{s_{xx}}}$$
From our data:
$$\hat\sigma^2 = \frac{1}{7}\left(925{,}262 - \frac{7{,}087^2}{60}\right) = 12{,}595.6$$
[1]
So the 99% confidence interval is given by:
$$118.117 \pm 3.499\sqrt{\frac{12{,}595.6}{60}} = 118.117 \pm 50.696 = (67.4, 169)$$
[2]
(ii)(b) Confidence interval for variance
The pivotal quantity is given by:
$$\frac{(n-2)\hat\sigma^2}{\sigma^2} \sim \chi^2_{n-2}$$
[1]
So:
$$0.99 = P\left(\chi^2_{n-2;0.995} < \frac{(n-2)\hat\sigma^2}{\sigma^2} < \chi^2_{n-2;0.005}\right)$$
which gives a confidence interval of:
$$\left(\frac{(n-2)\hat\sigma^2}{\chi^2_{n-2;0.005}}, \frac{(n-2)\hat\sigma^2}{\chi^2_{n-2;0.995}}\right)$$
Substituting in, the confidence interval (to 3 SF) is:
$$\left(\frac{7 \times 12{,}595.6}{20.28}, \frac{7 \times 12{,}595.6}{0.9893}\right) = (4350, 89100)$$
[1]
(iii)(a) Partition
The total sum of squares, $SS_{TOT} = \sum (y_i - \bar{y})^2$, is given by $s_{yy}$, which is 925,262. [1]
The partition given at the bottom of page 25 in the Tables is:
$$\sum (y_i - \bar{y})^2 = \sum (y_i - \hat{y}_i)^2 + \sum (\hat{y}_i - \bar{y})^2$$
ie $SS_{TOT} = SS_{RES} + SS_{REG}$
Now, modifying the $\hat\sigma^2$ formula on page 24 of the Tables, we have:
$$SS_{RES} = \sum (y_i - \hat{y}_i)^2 = s_{yy} - \frac{s_{xy}^2}{s_{xx}} = 925{,}262 - \frac{7{,}087^2}{60} = 88{,}169$$
[1]
Alternatively, using $\hat\sigma^2$ from part (ii), we get $SS_{RES} = (n-2)\hat\sigma^2 = 7 \times 12{,}595.6$.
$$\Rightarrow SS_{REG} = 925{,}262 - 88{,}169 = 837{,}093$$
[1]
Alternatively, this could be calculated as $SS_{REG} = \dfrac{s_{xy}^2}{s_{xx}} = \dfrac{7{,}087^2}{60} = 837{,}093$.
(iii)(b) Proportion of variability explained by the model
This is the coefficient of determination, $R^2$, which is given by:
$$R^2 = \frac{SS_{REG}}{SS_{TOT}} = \frac{837{,}093}{925{,}262} = 90.5\%$$
[1]
(iii)(c) Comment
This tells us that 90.5% of the variation in the prices is explained by the model. Since this leaves only 9.5% from other non-model sources, it would appear that the model is a very good fit to the data. [1]
(iv)(a) Residuals
The residuals, $\hat{e}_i$, can be calculated from the actual prices, $y_i$, and the predicted prices, $\hat{y}_i$:
$$\hat{e}_i = y_i - \hat{y}_i$$
Using the regression line $\hat{y}_i = -32.47 + 118.117x_i$ from part (i), we get:
$x = 1 \Rightarrow \hat{y} = -32.47 + 118.117 \times 1 = 86 \Rightarrow \hat{e} = 131 - 86 = 45$ [1]
$x = 4 \Rightarrow \hat{y} = -32.47 + 118.117 \times 4 = 440 \Rightarrow \hat{e} = 330 - 440 = -110$ [1]
$x = 8 \Rightarrow \hat{y} = -32.47 + 118.117 \times 8 = 912 \Rightarrow \hat{e} = 1{,}095 - 912 = 183$ [1]
(iv)(b) Dotplot of residuals
Since $e_i \sim N(0, \sigma^2)$ we would expect the dotplot to be normally distributed about zero. This does not appear to be the case, but it is difficult to tell with such a small data set. [1]
(iv)(c) Plot of residuals against time
Clearly this is not patternless. The residuals are not independent of the time. This means that the linear model is definitely missing something and is not appropriate to these data. [1]
A plot of the original data (with the regression line) shows what's happening:
[Figure: plot of the share prices against time with the fitted regression line; vertical axis from -200 to 1,200, horizontal axis from 0 to 9]
The price increases in an exponential (rather than linear) way. We should have used the log of the price against time instead.
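The closing remark can be explored in R by fitting both the original linear model and a model for the log of the price. This is a sketch only: the log-linear fit is not part of the original solution, and the object names are chosen here for illustration.

```r
# Share price data from Question 12.6
time  <- 0:8
price <- c(100, 131, 183, 247, 330, 454, 601, 819, 1095)

# Linear model used in the question
lin_fit <- lm(price ~ time)
r2_lin  <- summary(lin_fit)$r.squared
round(residuals(lin_fit))   # residual table from part (iv)(a)

# Alternative suggested above: log of the price against time
log_fit <- lm(log(price) ~ time)
r2_log  <- summary(log_fit)$r.squared

c(linear = r2_lin, log_linear = r2_log)
```

Plotting `residuals(log_fit)` against `time` would be the natural next check of whether the pattern seen in the linear model's residuals has been removed.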
12.7
(i) Obtain the fitted regression line
The regression line for $p$ on $c$ is given by:
$$\hat{p} = \hat\alpha + \hat\beta c \quad \text{where } \hat\beta = \frac{S_{cp}}{S_{cc}} \text{ and } \hat\alpha = \bar{p} - \hat\beta\bar{c}$$
$$S_{cc} = \sum c^2 - \frac{\left(\sum c\right)^2}{n} = 6{,}884 - \frac{238^2}{9} = 590.2222$$
$$S_{cp} = \sum cp - \frac{\left(\sum c\right)\left(\sum p\right)}{n} = 983 - \frac{238 \times 33.4}{9} = 99.75556$$
[1]
So:
$$\hat\beta = \frac{99.75556}{590.2222} = 0.16901$$
[½]
$$\hat\alpha = \frac{33.4}{9} - 0.16901 \times \frac{238}{9} = -0.75836$$
[½]
Hence, the fitted regression line is:
$$\hat{p} = 0.16901c - 0.75836$$
[1]
(ii)(a) Estimate the GCSE score
The estimate of the average GCSE point score is obtained from the regression line:
$$\hat{P} = -0.75836 + 0.16901 \times 15 = 1.78$$
[1]
(ii)(b) Standard error of the predicted GCSE score
The standard error of this individual response is given by:
$$\sqrt{\left(1 + \frac{1}{n} + \frac{(c_0 - \bar{c})^2}{S_{cc}}\right)\hat\sigma^2}$$
[1]
where:
$$\hat\sigma^2 = \frac{1}{n-2}\left(S_{pp} - \frac{S_{cp}^2}{S_{cc}}\right) = \frac{1}{7}\left(25.66889 - \frac{99.75556^2}{590.2222}\right) = 1.25841$$
[1]
Hence, the standard error is given by:
$$\sqrt{\left(1 + \frac{1}{9} + \frac{\left(15 - \frac{238}{9}\right)^2}{590.2222}\right) \times 1.25841} = \sqrt{1.33302 \times 1.25841} = \sqrt{1.67748} = 1.29518$$
[1]
12.8
(i)
Least squares estimate of slope parameter
The least squares estimate minimises $q = \sum e_i^2$. Now:
$q = \sum e_i^2 = \sum (Y_i - \beta x_i)^2$
[1/2]
Differentiating this gives:
$\frac{dq}{d\beta} = -2\sum x_i (Y_i - \beta x_i)$
[1]
Setting this equal to zero:
$\sum x_i (Y_i - \hat\beta x_i) = 0 \Rightarrow \sum x_i Y_i - \hat\beta \sum x_i^2 = 0 \Rightarrow \hat\beta = \frac{\sum x_i Y_i}{\sum x_i^2}$
[1]
The second derivative is $2\sum x_i^2 > 0$, so we do have a minimum.
[1/2]
(ii)
Bias and mean square error
The expectation of $\hat\beta$ is:
$E(\hat\beta) = E\left(\frac{\sum x_i Y_i}{\sum x_i^2}\right) = \frac{\sum x_i E(Y_i)}{\sum x_i^2}$
[1/2]
Now $E(Y_i) = E(\beta x_i + e_i) = \beta x_i + 0 = \beta x_i$.
[1/2]
So:
$E(\hat\beta) = \frac{\sum x_i \beta x_i}{\sum x_i^2} = \beta$
[1/2]
Hence:
$\text{bias}(\hat\beta) = E(\hat\beta) - \beta = 0$
[1/2]
The variance of $\hat\beta$ is:
$\text{var}(\hat\beta) = \text{var}\left(\frac{\sum x_i Y_i}{\sum x_i^2}\right) = \frac{\sum x_i^2 \,\text{var}(Y_i)}{\left(\sum x_i^2\right)^2}$
[1/2]
Now $\text{var}(Y_i) = \text{var}(\beta x_i + e_i) = \text{var}(e_i) = \sigma^2$. So:
$\text{var}(\hat\beta) = \frac{\sum x_i^2 \sigma^2}{\left(\sum x_i^2\right)^2} = \frac{\sigma^2}{\sum x_i^2}$
[1/2]
Hence:
$\text{MSE}(\hat\beta) = \text{var}(\hat\beta) + \text{bias}^2(\hat\beta) = \frac{\sigma^2}{\sum x_i^2}$
[1]
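The closed-form estimate $\hat\beta = \sum x_i Y_i / \sum x_i^2$ derived in Solution 12.8 can be checked numerically. The sketch below uses a small made-up data set (the values of `x` and `y` are hypothetical and are not taken from the question) and confirms that the sum of squares `q` is no smaller at nearby values of the slope.

```python
# Hypothetical data, used only to illustrate the formula (not from the question).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def q(beta):
    """Sum of squared errors for the model Y_i = beta * x_i + e_i."""
    return sum((yi - beta * xi) ** 2 for xi, yi in zip(x, y))


# Closed-form least squares estimate: sum(x * y) / sum(x^2).
beta_hat = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)

# q at beta_hat should not exceed q at slightly smaller or larger slopes.
is_minimum = q(beta_hat) <= min(q(beta_hat - 0.01), q(beta_hat + 0.01))

print(round(beta_hat, 4), is_minimum)
```

Because `q` is a quadratic in the slope with positive leading coefficient $\sum x_i^2$, checking either side of `beta_hat` is enough to illustrate the minimum.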
12.9
(i)(a)
Transformation
Taking logs of the original expression gives:
$\ln \mu_x = \ln B + x \ln c$
[1]
(i)(b)
Expressions for parameters
This expression is now linear in x. Comparing the expression with $Y = \alpha + \beta x$ gives:
$Y = \ln \mu_x$, $\alpha = \ln B$, $\beta = \ln c$
[1]
(ii)
Comment
The graph appears to show an approximately linear relationship and this supports the transformation in part (i)(a). However, it does appear to have a slight curve and this would warrant closer inspection of the model to see if it is appropriate for the data.
[1]
(iii)
Least squares estimates
Obtaining the estimates of $\alpha$ and $\beta$ using the formulae given on page 24 of the Tables with $y = \ln \mu_x$:
$s_{xx} = \sum x^2 - n\bar{x}^2 = 11,120 - 8\left(\frac{296}{8}\right)^2 = 168$
$s_{xy} = \sum xy - n\bar{x}\bar{y} = -2,104.5 - 8\left(\frac{296}{8}\right)\left(\frac{-57.129}{8}\right) = 9.273$
$\hat\beta = \frac{s_{xy}}{s_{xx}} = \frac{9.273}{168} = 0.055196$
[1]
$\hat\alpha = \bar{y} - \hat\beta\bar{x} = \frac{-57.129}{8} - 0.055196 \times \frac{296}{8} = -9.1834$
[1]
Therefore, we obtain:
$\hat{B} = e^{\hat\alpha} = e^{-9.1834} = 0.000103$ and $\hat{c} = e^{\hat\beta} = e^{0.055196} = 1.06$
[1]
(iv)(a)
Coefficient of determination
The coefficient of determination is given by:
$R^2 = r^2 = \frac{s_{xy}^2}{s_{xx}s_{yy}} = \frac{9.273^2}{168 \times 0.53467} = 95.7\%$
[1]
where $s_{yy} = \sum y^2 - n\bar{y}^2 = 408.50 - 8\left(\frac{-57.129}{8}\right)^2 = 0.53467$.
(iv)(b)
Comment
This tells us that 95.7% of the variation in the data can be explained by the model and so indicates an extremely good overall fit of the model.
[1]
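As a cross-check on parts (iii) and (iv)(a), the estimates can be recomputed from the summary statistics quoted in the solution ($n = 8$, $\sum x = 296$, $\sum x^2 = 11,120$, $\sum y = -57.129$, $\sum y^2 = 408.50$, $\sum xy = -2,104.5$). This is only a sketch; the variable names are illustrative choices.

```python
import math

# Summary statistics quoted in Solution 12.9.
n = 8
sum_x, sum_x2 = 296.0, 11120.0
sum_y, sum_y2 = -57.129, 408.50
sum_xy = -2104.5

x_bar, y_bar = sum_x / n, sum_y / n

# Corrected sums of squares and products.
s_xx = sum_x2 - n * x_bar ** 2
s_xy = sum_xy - n * x_bar * y_bar
s_yy = sum_y2 - n * y_bar ** 2

# Least squares estimates for the log-linear model.
beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar

# Coefficient of determination.
r_squared = s_xy ** 2 / (s_xx * s_yy)

# Back-transform to the parameters of the original expression.
B_hat, c_hat = math.exp(alpha_hat), math.exp(beta_hat)

print(round(beta_hat, 6), round(alpha_hat, 4), round(r_squared, 3))
```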
(iv)(c)
Calculate residuals
The completed table of residuals using $\hat{e}_i = y_i - \hat{y}_i$ is:
Age, x                   30     32     34     36     38     40     42     44
Residual, $\hat{e}_i$    0.08   0.02   -0.03  -0.06  -0.06  -0.03  0.02   0.09
[1]
Age 32 yrs: $(-7.40) - (-9.1834 + 0.055196 \times 32) = 0.02$
Age 36 yrs: $(-7.26) - (-9.1834 + 0.055196 \times 36) = -0.06$
Age 40 yrs: $(-7.01) - (-9.1834 + 0.055196 \times 40) = -0.03$
[1]
(iv)(d)
Comment
The residuals should be patternless when plotted against x. However, it is clear that some pattern exists. This indicates that the linear model is not a good fit and that there is some other variable at work here.
[1]
(v)(a)
Confidence interval for the log of the mean predicted value
Using the formula given on page 25 of the Tables, the variance of the mean predicted response is:
$\left(\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right)\hat\sigma^2 = \left(\frac{1}{8} + \frac{(35 - 37)^2}{168}\right) \times 0.0038056 = 0.0005663$
[1]
where $\hat\sigma^2 = \frac{1}{n-2}\left(s_{yy} - \frac{s_{xy}^2}{s_{xx}}\right) = \frac{1}{6}\left(0.53467 - \frac{9.273^2}{168}\right) = 0.0038056$.
[1]
The estimate is $\hat{Y} = \ln \hat\mu_{35} = -9.1834 + 0.055196 \times 35 = -7.251$. Using the $t_6$ distribution, a 95% confidence interval for $Y = \ln \mu_{35}$ is:
$-7.251 \pm 2.447\sqrt{0.0005663} = (-7.309, -7.193)$
[1]
(v)(b)
Confidence interval for mean predicted value
The corresponding 95% confidence interval for $\mu_{35}$ is (0.000669, 0.000752).
[1]
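12.10
(i)
Estimate parameters
Now using x for i and y for $\ln P_i$, we get:
$s_{xx} = \sum x^2 - n\bar{x}^2 = 16,799$
$s_{xy} = \sum xy - n\bar{x}\bar{y} = 237.39$
$s_{yy} = \sum y^2 - n\bar{y}^2 = 3.4322$
[2]
So the estimates for a, b and $\sigma^2$ are:
$\hat{b} = \frac{s_{xy}}{s_{xx}} = \frac{237.39}{16,799} = 0.01413$
[1]
$\hat{a} = \bar{y} - \hat{b}\bar{x} = \frac{21.5953}{6} - 0.01413 \times \frac{475}{6} = 2.4805$
[1]
$\hat\sigma^2 = \frac{1}{n-2}\left(s_{yy} - \frac{s_{xy}^2}{s_{xx}}\right) = \frac{1}{4}\left(3.4322 - \frac{237.39^2}{16,799}\right) = 0.01940$
[1]
(ii)
Correlation coefficient
The correlation coefficient is:
$r = \frac{s_{xy}}{\sqrt{s_{xx}s_{yy}}} = \frac{237.39}{\sqrt{16,799 \times 3.4322}} = 0.989$
[1]
(iii)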
Confidence interval for slope parameter
Using the result given on page 24 of the Tables, we have:
$\hat{b} \pm t_{4;0.005}\sqrt{\frac{\hat\sigma^2}{S_{xx}}} = 0.01413 \pm 4.604\sqrt{\frac{0.01940}{16,799}}$
[1]
This gives a confidence interval for b of (0.00918, 0.0191).
[1]
(iv)(a)
Confidence interval for mean response
If $y_{365}$ denotes the log of the average price of a pint of lager in the country as a whole on day 365, the predicted value for $y_{365}$ is:
$\hat{y}_{365} = 2.4805 + 0.01413 \times 365 = 7.638$
[1]
The distribution of $\frac{\hat{y}_{365} - y_{365}}{s_{365}}$ is $t_4$, where:
$s_{365}^2 = \left(\frac{1}{n} + \frac{(365 - \bar{x})^2}{S_{xx}}\right)\hat\sigma^2 = \left(\frac{1}{6} + \frac{[365 - (475/6)]^2}{16,799}\right) \times 0.01940 = 0.09758$
[1]
So a symmetrical 95% confidence interval for $y_{365}$ is:
$7.638 \pm 2.776\sqrt{0.09758} = 7.638 \pm 0.867 = (6.77, 8.51)$
[1]
and the corresponding confidence interval for $P_{365}$ is:
$(e^{6.771}, e^{8.505}) = (870, 4940)$
[1]
(iv)(b)
Confidence interval for individual response
If $y^*_{365}$ denotes the log of the observed price of a pint of lager in a randomly selected bar on day 365, then $\frac{\hat{y}_{365} - y^*_{365}}{s^*_{365}}$ has a $t_4$ distribution, where:
$s^{*2}_{365} = \left(1 + \frac{1}{n} + \frac{(365 - \bar{x})^2}{S_{xx}}\right)\hat\sigma^2 = s_{365}^2 + \hat\sigma^2 = 0.09758 + 0.01940 = 0.11698$
[1]
This gives a confidence interval of:
$7.638 \pm 2.776\sqrt{0.11698} = 7.638 \pm 0.949 = (6.69, 8.59)$
[1]
So the confidence interval for $P^*_{365}$ is:
$(e^{6.689}, e^{8.587}) = (800, 5360)$
[1]
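The two intervals in part (iv) differ only in the extra $\hat\sigma^2$ term for an individual response. The sketch below reproduces both from the figures quoted in Solution 12.10. The critical value 2.776 is copied from the solution rather than computed, and the helper name `interval` is an illustrative choice.

```python
import math

# Figures quoted in Solution 12.10.
n, x_bar = 6, 475 / 6
s_xx, sigma2_hat = 16799.0, 0.01940
a_hat, b_hat = 2.4805, 0.01413
t_crit = 2.776  # upper 2.5% point of t_4, as quoted in the solution
x0 = 365

y_hat = a_hat + b_hat * x0

# Variance for the mean response, and for an individual response.
var_mean = (1 / n + (x0 - x_bar) ** 2 / s_xx) * sigma2_hat
var_indiv = var_mean + sigma2_hat


def interval(centre, variance):
    """Symmetrical interval: centre +/- t_crit * sqrt(variance)."""
    half_width = t_crit * math.sqrt(variance)
    return centre - half_width, centre + half_width


ci_mean = interval(y_hat, var_mean)
ci_indiv = interval(y_hat, var_indiv)

# Back-transform from log prices to prices.
price_mean = tuple(math.exp(v) for v in ci_mean)
price_indiv = tuple(math.exp(v) for v in ci_indiv)

print(ci_mean, ci_indiv)
```

The individual-response interval is always the wider of the two, since it allows for the variation of a single observation about the regression line as well as the uncertainty in the line itself.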
12.11
(i)
MLEs of $\alpha$ and $\beta$
Each $Y_i$ has a $N(\alpha + \beta x_i, \sigma^2)$ distribution, so the joint likelihood function is:
$L = \prod_{i=1}^n \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{y_i - \alpha - \beta x_i}{\sigma}\right)^2\right] = \frac{1}{\sigma^n (2\pi)^{n/2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - \alpha - \beta x_i)^2\right]$
[1]
Taking logs we get:
$\log L = -n\log\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - \alpha - \beta x_i)^2 + \text{constant}$
[1/2]
Differentiating with respect to $\alpha$:
$\frac{\partial \log L}{\partial \alpha} = -\frac{1}{2\sigma^2}\sum_{i=1}^n 2(y_i - \alpha - \beta x_i)(-1)$
[1]
By setting $\frac{\partial \log L}{\partial \alpha}$ equal to 0 we get:
$\sum_{i=1}^n y_i - n\alpha - \beta\sum_{i=1}^n x_i = 0$
[1/2]
Differentiating with respect to $\beta$:
$\frac{\partial \log L}{\partial \beta} = -\frac{1}{2\sigma^2}\sum_{i=1}^n 2(y_i - \alpha - \beta x_i)(-x_i)$
[1/2]
By setting $\frac{\partial \log L}{\partial \beta}$ equal to 0 we get:
$\sum_{i=1}^n x_i y_i - \alpha\sum_{i=1}^n x_i - \beta\sum_{i=1}^n x_i^2 = 0$
[1/2]
These are the same normal equations that we got before, so the MLEs are as before, ie:
$\hat\beta = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2} = \frac{S_{xy}}{S_{xx}}$ and $\hat\alpha = \bar{y} - \hat\beta\bar{x}$
[1]
(ii)
Show MLE of $\sigma^2$ has a different denominator from the least squares estimate
Now differentiating the log likelihood with respect to $\sigma$:
$\frac{\partial \log L}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^n (y_i - \alpha - \beta x_i)^2$
[1]
By setting $\frac{\partial \log L}{\partial \sigma}$ equal to 0 and substituting $\hat\alpha = \bar{y} - \hat\beta\bar{x}$, we obtain:
$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n (y_i - \hat\alpha - \hat\beta x_i)^2$
$= \frac{1}{n}\sum_{i=1}^n (y_i - \bar{y} + \hat\beta\bar{x} - \hat\beta x_i)^2$
$= \frac{1}{n}\sum_{i=1}^n \left[(y_i - \bar{y}) - \hat\beta(x_i - \bar{x})\right]^2$
$= \frac{1}{n}\left[\sum_{i=1}^n (y_i - \bar{y})^2 - 2\hat\beta\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) + \hat\beta^2\sum_{i=1}^n (x_i - \bar{x})^2\right]$
$= \frac{1}{n}\left[S_{yy} - 2\frac{S_{xy}}{S_{xx}}S_{xy} + \left(\frac{S_{xy}}{S_{xx}}\right)^2 S_{xx}\right]$
$= \frac{1}{n}\left(S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right)$
[3]
which has a different denominator from before (and therefore is a biased estimator).
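To see the size of the difference between the two denominators, the sketch below evaluates both estimators on the summary figures from Solution 12.10 ($n = 6$, $S_{xx} = 16,799$, $S_{xy} = 237.39$, $S_{yy} = 3.4322$). Reusing those figures here is an illustrative choice; Question 12.11 itself is algebraic.

```python
# Summary figures borrowed from Solution 12.10 for illustration.
n = 6
s_xx, s_xy, s_yy = 16799.0, 237.39, 3.4322

# Residual sum of squares: S_yy - S_xy^2 / S_xx.
rss = s_yy - s_xy ** 2 / s_xx

sigma2_unbiased = rss / (n - 2)  # least squares estimate, denominator n - 2
sigma2_mle = rss / n             # maximum likelihood estimate, denominator n

print(round(sigma2_unbiased, 5), round(sigma2_mle, 5))
```

The MLE is smaller by the factor $(n-2)/n$, which is why it underestimates $\sigma^2$ on average; the effect fades as $n$ grows.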
12.12
(i)(a)
Least squares estimates equations
We need to minimise the expression:
$Q = \sum (y_i - (\alpha + \beta_1 x_{i1} + \beta_2 x_{i2}))^2$
[1]
To do this, we need to differentiate the expression with respect to the parameters and set the expressions equal to zero:
$\frac{\partial Q}{\partial \alpha} = -2\sum (y_i - (\alpha + \beta_1 x_{i1} + \beta_2 x_{i2})) = 0 \Rightarrow \sum y_i = n\alpha + \beta_1\sum x_{i1} + \beta_2\sum x_{i2}$   eqn (1)
$\frac{\partial Q}{\partial \beta_1} = -2\sum x_{i1}(y_i - (\alpha + \beta_1 x_{i1} + \beta_2 x_{i2})) = 0 \Rightarrow \sum x_{i1}y_i = \alpha\sum x_{i1} + \beta_1\sum x_{i1}^2 + \beta_2\sum x_{i1}x_{i2}$   eqn (2)
$\frac{\partial Q}{\partial \beta_2} = -2\sum x_{i2}(y_i - (\alpha + \beta_1 x_{i1} + \beta_2 x_{i2})) = 0 \Rightarrow \sum x_{i2}y_i = \alpha\sum x_{i2} + \beta_1\sum x_{i1}x_{i2} + \beta_2\sum x_{i2}^2$   eqn (3)
[3]
(i)(b)
Evaluate the least squares estimates
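Substituting these values into the equations above, we get:
(1) $469.7 = 5\alpha + 235\beta_1 + 207.8\beta_2$
(2) $22,028.78 = 235\alpha + 11,202.68\beta_1 + 8,985.96\beta_2$
(3) $19,870.22 = 207.8\alpha + 8,985.96\beta_1 + 12,886.42\beta_2$
Solving these simultaneously:
$47 \times (1) - (2) \Rightarrow 47.12 = -157.68\beta_1 + 780.64\beta_2$   eqn (4)
$207.8 \times (1) - 5 \times (3) \Rightarrow -1,747.44 = 3,903.2\beta_1 - 21,251.26\beta_2$   eqn (5)
[1]
Eliminating $\beta_1$ from equations (4) and (5), eg using $3,903.2 \times (4) + 157.68 \times (5)$, gives $\hat\beta_2 = 0.301468$.
Substituting this back in, we get $\hat\beta_1 = 1.19367$ and $\hat\alpha = 25.3084$, which gives us a regression line of $\hat{y} = 25.31 + 1.194x_1 + 0.3015x_2$.
[2]
(ii)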
Significance of the parameters
The p-values for all the parameters are less than 0.05 and so they are all significantly different from zero.
[2]
(iii)
Adjusted $R^2$
We have $n = 5$ trials and $k = 2$ predictors. Hence:
adjusted $R^2 = 1 - \left(\frac{n-1}{n-k-1}\right)(1 - R^2) = 1 - \left(\frac{5-1}{5-2-1}\right)(1 - 0.9992) = 0.9984$
[2]
(iv)
ANOVA
The completed ANOVA table for the model is:
Source of variation    Degrees of freedom    Sum of squares    Mean sum of squares
Regression             2                     49.1137           24.5569
Residual               2                     0.0383            0.0192
Total                  4                     49.152
[1]
The F statistic is:
$F = \frac{SS_{REG}/k}{SS_{RES}/(n-k-1)} = \frac{24.5569}{0.0192} = 1,280$ (3 SF)
[1]
This is far in excess of even the 1% $F_{2,2}$ critical value of 99.00. Hence we can reject the null hypothesis that $\beta_1 = \beta_2 = 0$.
[2]
(v)
Predict the percentage effectiveness
Substituting in the values given:
$\hat{y} = 25.31 + (1.194 \times 51.3) + (0.3015 \times 18.3) = 92.1\%$
[2]
(vi)
Interpret plots
The first plot appears to be random and there is no discernible increase in the variance, so this would imply that the model meets these assumptions. Point 1 (92.5%) does appear to be an outlier, but it is difficult to tell with such a small data set.
[1]
With the exception of point 1, the rest of the values lie along the diagonal line, thus implying a normal distribution is appropriate.
[1]
(vii)(a)
Interaction
If there is interaction between the two drugs then there is an additional effect caused when both are present compared with what would be expected if they were each administered singly.
[1]
(vii)(b)
Formula
The formula is $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \gamma_{12} X_1 X_2$.
[1]
(vii)(c)
Compare models
The model with just the two drugs as main effects had an adjusted $R^2$ of 0.9984 in part (iii), whereas the new model with the interactive effect has an adjusted $R^2$ of 0.9969.
Since there is a decrease in the value of the adjusted $R^2$, the previous model would be considered the best model, as the interaction term does not improve the fit enough to justify the extra parameter.
[2]
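The three normal equations in part (i)(b) of Solution 12.12 form a small linear system, so the hand elimination can be checked by machine. The sketch below uses plain Gaussian elimination with partial pivoting, written out in full so that nothing beyond the standard library is needed. The function name `solve_linear_system` is an illustrative choice, and the coefficients are the sums quoted in the solution.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


# Normal equations (1) to (3): rows are coefficients of alpha, beta_1, beta_2.
coefficients = [
    [5.0, 235.0, 207.8],
    [235.0, 11202.68, 8985.96],
    [207.8, 8985.96, 12886.42],
]
totals = [469.7, 22028.78, 19870.22]

alpha_hat, beta1_hat, beta2_hat = solve_linear_system(coefficients, totals)

# Prediction from part (v), using the unrounded estimates.
y_pred = alpha_hat + beta1_hat * 51.3 + beta2_hat * 18.3

print(round(alpha_hat, 3), round(beta1_hat, 4), round(beta2_hat, 4), round(y_pred, 1))
```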
End of Part 3
What next?
1. Briefly review the key areas of Part 3 and/or re-read the summaries at the end of Chapters 10 to 12.
2. Ensure you have attempted some of the Practice Questions at the end of each chapter in Part 3. If you don't have time to do them all, you could save the remainder for use as part of your revision.
3. Attempt Assignment X3.
4. Work through the Chapter 10 to 12 material (hypothesis tests, correlation and regression) of the Paper B Online Resources (PBOR).
Time to consider...
... revision and rehearsal products
Revision Notes: Each booklet covers one main theme of the course and includes integrated questions testing Core Reading, relevant past exam questions and other useful revision aids. One student said:
"Revision books are the most useful ActEd resource."
ASET: This contains past exam papers with detailed solutions and explanations, plus lots of comments about exam technique. One student said:
"ASET is the single most useful tool ActEd produces. The answers do go into far more detail than necessary for the exams, but this is a good source of learning and I am sure it has helped me gain extra marks in the exam."
You can find lots more information, including samples, on our website at www.ActEd.co.uk.
Buy online at www.ActEd.co.uk/estore
Generalised linear models
Syllabus objectives
4.2 Generalised linear models
4.2.1 Define an exponential family of distributions. Show that the following distributions may be written in this form: binomial, Poisson, exponential, gamma and normal.
4.2.2 State the mean and variance for an exponential family, and define the variance function and the scale parameter. Derive these quantities for the distributions above.
4.2.3 Explain what is meant by the link function and the canonical link function, referring to the distributions above.
4.2.4 Explain what is meant by a variable, a factor taking categorical values and an interaction term. Define the linear predictor, illustrating its form for simple models, including polynomial models and models involving factors.
4.2.5 Define the deviance and scaled deviance and state how the parameters of a GLM may be estimated. Describe how a suitable model may be chosen by using an analysis of deviance and by examining the significance of the parameters.
4.2.6 Define the Pearson and deviance residuals and describe how they may be used.
4.2.7 Apply statistical tests to determine the acceptability of a fitted model: Pearson's chi-square test and the likelihood-ratio test.
4.2.8 Fit a generalised linear model to a data set and interpret the output.
0 Introduction
In Chapter 12 we introduced regression models. The multiple linear model built on the simple linear model by adding more explanatory variables and then we extended this further by allowing functions of these variables, including interaction.
Recall that the bivariate linear regression model is $Y_i = \alpha + \beta x_i + e_i$ and the multiple linear regression model with k explanatory variables is $Y_i = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + e_i$. In the full normal model we assume the error terms are normally distributed with mean 0 and variance $\sigma^2$. Hence, the response variable, $Y_i$, is also normally distributed.
Generalised linear models (GLMs) extend this further by allowing the distribution of the data to be non-normal. This is particularly important in actuarial work where the data very often do not have a normal distribution.
For example, in mortality, the Poisson distribution is used in modelling the force of mortality, $\mu_x$, and the exponential is used for survival analysis. In general insurance, the Poisson distribution is often used for modelling the claim frequency and the gamma or lognormal distribution for the claim severity. Finally, in all forms of insurance, the binomial distribution is used to model propensity.
Claim severity is just another term for the size of a claim, claim frequency refers to the rate at which claims are received and propensity refers to the probability of an event happening.
In this chapter we also introduce the idea of categorical explanatory variables (sometimes called factors).
GLMs are widely used both in general and life insurance. They are used to:
decide which rating factors to use (rating factors are measurable or categorical factors that are used as proxies for risk in setting premiums, eg age or gender)
estimate an appropriate premium to charge for a particular policy given the level of risk present.
For example, in motor insurance, there are many factors that may be used as proxies for the level of risk (type of car driven, age of driver, number of years' past driving experience, etc). We can use a GLM both to decide which of these factors are significant to the assessment of risk (and hence which should be included), and to suggest an appropriate premium to charge for a risk that represents a particular combination of these factors.
Question
Suggest rating factors that an insurance company may consider in the pricing of a single life annuity contract.
Solution
Rating factors that might be used in the pricing of a single life annuity include:
age
sex (if permitted by legislation)
size of fund with which to purchase an annuity
postcode
health status (for impaired life annuities).
We have only used continuous variables so far in linear regression, such as weight, height and size of claim. Categorical explanatory variables can only take categories, such as gender and type of car driven.
1 Generalised linear models
Generalised linear models (GLMs) relate the response variable that we want to predict to the explanatory variables or factors (called predictors, covariates or independent variables) about which we have information.
In other words, a GLM has inputs (explanatory variables) and is used to predict an output (response variable).
To fully define a GLM, we need to specify the following three components.
1. A distribution for the response variable
For linear models the response variable had a normal distribution, $Y \sim N(\mu, \sigma^2)$. We now extend this to a general form of distributions known as the exponential family.
For example, we might choose a gamma distribution to model the sizes of motor insurance claims or a Poisson distribution to model the number of claims, or a binomial distribution to model the probability of contracting a certain disease.
2. A linear predictor
The linear predictor, $\eta$, is a function of the covariates. For the bivariate linear regression model this was $\beta_0 + \beta_1 x$. For the multivariate linear regression model this was $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$, which we then extended to functions of the explanatory variables.
$\eta$ is the Greek letter eta.
For example, if the response variable is weight, a linear predictor of $\beta_0 + \beta_1 x$ would be appropriate for a model where we thought the only covariate was height, x.
A linear predictor is linear in the parameters $\beta_0$ and $\beta_1$. It does not have to be linear in the covariates. For example, $\eta = \beta_0 + \beta_1 x^2$ is also a linear predictor.
3. A link function
The link function connects the mean response to the linear predictor, $g(\mu) = \eta$, where $\mu = E(Y)$. For linear models the mean response was equal to the linear predictor, eg $E(Y) = \mu = \beta_0 + \beta_1 x$, so the link function is the identity function, $g(\mu) = \mu$.
The link function, like its name suggests, is the link between the linear predictor (input) and the mean of the distribution (output).
Remember that what we are trying to do in a GLM is find a relationship between the mean of the response variable and the covariates. By setting the link function $g(\mu) = \eta$, then, assuming that the link function is invertible, we can make the mean $\mu$ the subject of the formula:
$\mu = g^{-1}(\eta)$
The notation is not straightforward to get to grips with. An example may help.
Example
Suppose that we are trying to model the number of claims on car insurance policies. The response variable, $Y_i$, is the number of claims from Policy i. We decide that a Poisson distribution is appropriate:
$Y_i \sim Poi(\mu_i)$
Consider a model where we believe that the only covariate is the age, $x_i$, of the policyholder. The linear predictor is $\eta_i = \alpha + \beta x_i$.
A link function that is commonly used with the Poisson distribution (see page 27 of the Tables) is:
$g(\mu) = \log\mu$
We set this equal to the linear predictor. So, for Policy i:
$g(\mu_i) = \log\mu_i = \eta_i = \alpha + \beta x_i$
Now we invert the formula so that $\mu_i$ is the subject of the formula:
$\mu_i = \exp(\eta_i) = \exp(\alpha + \beta x_i)$
We now have a relationship between the mean of the response variable and the covariate.
Components of a generalised linear model
The three components of a GLM are:
1. a distribution for the data (Poisson, exponential, gamma, normal or binomial)
2. a linear predictor (a function of the covariates that is linear in the parameters)
3. a link function (that links the mean of the response variable to the linear predictor).
In order to understand how these three components fit together, we give a couple of further examples below.
Example
Suppose that we are setting up a model to predict the pass rate for a particular student in a particular actuarial exam. We might expect there to be many factors that affect whether a student is likely to pass or not. We might decide to set up a three-factor model, so that the probability of passing is a function of:
the number of assignments N submitted by the student (a value from 0 to 4)
the student's mark on the mock exam S (on a scale from 0 to 100)
whether the student had attended tutorials or not (Yes/No).
We might then decide to use the linear predictor:
$\eta = \alpha_i + \beta_1 N + \beta_2 S$
where $\alpha_i$ takes one value for those attending tutorials and a different value for those who do not.
We now need a link function. $\eta$ here will not necessarily take a value in the interval (0, 1). Depending on the values of $\alpha_i$, $\beta_1$ and $\beta_2$, $\eta$ might take any value. If we use the link function $g(\mu) = \log\left(\frac{\mu}{1-\mu}\right)$ and set this equal to the linear predictor $\eta$, we have $\log\left(\frac{\mu}{1-\mu}\right) = \eta$. We can invert this function to make $\mu$ the subject to give:
$\mu = \frac{e^\eta}{1 + e^\eta} = (1 + e^{-\eta})^{-1}$
We can now see that $\mu$ will lie in the range from zero to one, and so can be used as a pass rate.
We now use maximum likelihood estimation to estimate the four parameter values: $\alpha_Y$, $\alpha_N$ (the parameters corresponding to having attended tutorials and not having attended tutorials, respectively), $\beta_1$ (the parameter for the number of assignments) and $\beta_2$ (the parameter for the mock mark). To do this we need (ideally) the actual exam results of a large sample of students who fall into each of the categories.
Having done this for a set of data, we might come up with the following parameter values for the linear predictor:
$\alpha_Y = -1.501$, $\alpha_N = -3.196$, $\beta_1 = 0.5459$, $\beta_2 = 0.0251$
We can now use the linear predictor and link function to predict pass rates for groups of students with a particular characteristic. For example, for a student who attends tutorials, submits three assignments and scores 65% on the mock, we have:
$\eta = -1.501 + 0.5459 \times 3 + 0.0251 \times 65 = 1.7682$
We now use the inverse of the link function to calculate $\mu$:
$\mu = (1 + e^{-1.7682})^{-1} = 0.8542$
So the model predicts an 85% probability of passing for a student in this situation. So in this particular situation, the linear predictor is $\eta = \alpha_i + \beta_1 N + \beta_2 S$ and the link function is $g(\mu) = \log\left(\frac{\mu}{1-\mu}\right)$.
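The worked figure above can be reproduced with a short sketch of the linear predictor and inverse link. The parameter values are those quoted in the example; the function and variable names are illustrative choices.

```python
import math

# Parameter values quoted in the example.
ALPHA = {"Y": -1.501, "N": -3.196}  # attended tutorials: Yes / No
BETA_1 = 0.5459                      # per assignment submitted
BETA_2 = 0.0251                      # per mark on the mock exam


def linear_predictor(tutorials, assignments, mock_mark):
    """eta = alpha_i + beta_1 * N + beta_2 * S."""
    return ALPHA[tutorials] + BETA_1 * assignments + BETA_2 * mock_mark


def pass_probability(tutorials, assignments, mock_mark):
    """Invert the logit link: mu = 1 / (1 + exp(-eta))."""
    eta = linear_predictor(tutorials, assignments, mock_mark)
    return 1.0 / (1.0 + math.exp(-eta))


eta = linear_predictor("Y", 3, 65)
mu = pass_probability("Y", 3, 65)
print(round(eta, 4), round(mu, 4))
```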
Question
Using the model outlined above, answer the following questions.
(i) Calculate the predicted pass probability for a student who attends tutorials, submits three assignments and scores 60% on the mock exam.
(ii) Calculate how much the probability would go up if the fourth assignment were submitted.
(iii) Calculate the highest pass probability for someone who does not attend tutorials.
(iv) Determine whether anyone gets a probability of 0 or 1 under this model. If not, calculate the minimum and maximum pass rates.
(v) State the underlying probability distribution.
Solution
(i) Using the values $N = 3$, $S = 60$ and $\alpha_Y = -1.501$, we get $\eta = 1.6427$, so that $\mu = 0.83790$. So the model predicts an 84% pass rate.
(ii) If the fourth assignment was submitted, we use $N = 4$ instead of $N = 3$ and get $\eta = 2.1886$, so that $\mu = 0.8992$. So the pass rate goes up by about 6%.
(iii) Using $\alpha_N = -3.196$, $N = 4$ and $S = 100$, we get $\eta = 1.4976$, so that $\mu = 0.8172$. So the highest possible pass rate for someone who does not attend tutorials is about 82%.
(iv) No. The minimum probability (for someone who does not attend tutorials or submit assignments and who scores zero on the mock) is obtained from a value of $\eta = -3.196$, which gives a pass probability of about 4%. The maximum probability of passing (for someone who goes to tutorials, submits all the assignments and scores 100% on the mock) comes from a value of $\eta = 3.1926$, which gives a pass rate of about 96%. So these are the maximum and minimum pass rates predicted by the model.
(v) In fact, what we are doing here is estimating a parameter of a binomial distribution. For any group of students with the same characteristics (ie all having the same values for all of the 3 factors), the number who pass may be well-modelled using a binomial distribution. The parameter $\mu$ of the binomial distribution $Z \sim Bin(n, \mu)$ that we are trying to find is the value of $\mu = \frac{e^\eta}{1 + e^\eta}$ that we found above.
We are again using $\mu$ to denote a probability as well as the mean of the response variable $Y = Z/n$.
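Part (iv) of the solution can also be confirmed by brute force: evaluate the model at every allowed combination of the three factors and pick out the extremes. The sketch below assumes whole-number mock marks from 0 to 100, which is an assumption made for the grid rather than something stated in the example.

```python
import math
from itertools import product

# Parameter values quoted in the example.
ALPHA = {"Y": -1.501, "N": -3.196}
BETA_1, BETA_2 = 0.5459, 0.0251


def pass_probability(tutorials, assignments, mock_mark):
    """Inverse logit of eta = alpha_i + beta_1 * N + beta_2 * S."""
    eta = ALPHA[tutorials] + BETA_1 * assignments + BETA_2 * mock_mark
    return 1.0 / (1.0 + math.exp(-eta))


# Every combination: tutorials Y/N, 0 to 4 assignments, integer marks 0 to 100.
grid = product(ALPHA, range(0, 5), range(0, 101))
probs = {combo: pass_probability(*combo) for combo in grid}

lowest = min(probs, key=probs.get)
highest = max(probs, key=probs.get)

print(lowest, round(probs[lowest], 4))
print(highest, round(probs[highest], 4))
```

Since every coefficient on `assignments` and `mock_mark` is positive, the extremes sit at the corners of the grid, which matches the reasoning in the solution.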
Example
A statistician is analysing data on truancy rates for different school pupils. She believes that the number of unexplained days off school in a year (ie those not due to sickness etc) for a particular pupil may have a Poisson distribution with parameter $\mu$. However, she believes that there are a number of factors that may affect $\mu$, for example: age of pupil, whether he/she lives within the catchment area, and sex.
She builds a generalised linear model based on these characteristics, using data from a large group of pupils. Her model will take the form:
$\eta = \alpha_i + \beta_j + \gamma x$
where x = age, and $\alpha_i$ and $\beta_j$ are numerical variables corresponding to the different characteristics for location and sex respectively.
She has collected the data shown in the table below. Each figure gives the average number of unexplained absences in a year for 16 different groups of pupils, all the pupils within each group having the same characteristics.
Average number of unexplained absences per pupil in a year
                                     Age last birthday
                                     8      10     12     14
Within catchment area     Male       1.8    2.0    6.3    14.1
                          Female     0.5    1.6    5.0    16.2
Outside catchment area    Male       2.1    7.5    25.5   72.0
                          Female     2.8    6.2    19.6   68.2
By carrying out a maximum likelihood estimation analysis, she calculates the values of the parameters that fit the model best. As a result she can find a value of $\eta$ for any particular pupil, which she can use to find the appropriate Poisson parameter using the link function. In this case she needs a function that converts a number $\eta$ that may take any value into a positive number (since the Poisson parameter must always be positive). So for example she could use the link function $g(\mu) = \log\mu$, so that when this is set equal to the linear predictor $\eta$ and inverted, $\mu = e^\eta$. This will give her a positive value for $\mu$, which she can use for her Poisson parameter.
So she might come up with the following values for the parameters:
$\alpha_{WC} = -2.64$, $\alpha_{OC} = -1.14$, $\beta_M = -3.26$, $\beta_F = -3.54$, $\gamma = 0.64$
where WC = Within catchment, OC = Outside catchment, M = Male, F = Female. She can now use the model to predict possible truancy rates for students with particular characteristics.
The link function $g(\mu) = \log\mu$ is called the canonical link function for the Poisson distribution.
Canonical means the accepted form of the function. It is a natural function to use, and will often give sensible results.
In fact, it is not compulsory to use the canonical link function and there may be situations where a different link function is more appropriate. Each case must be judged on its merits.
Question
Determine the expected number of unexplained days' absence for a female pupil living within the catchment area who is 12 years old.
Solution
For this combination of factors, we have:
$\eta = \alpha_{WC} + \beta_F + 12\gamma = 1.5$
Using the link function given, we have:
$\mu = e^{1.5} = 4.48$
So the expected number of days' unexplained absence in this case is about 4.5.
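The same calculation can be repeated for every group of pupils. The sketch below evaluates the fitted model at each combination of location, sex and age using the parameter values quoted in the example; the dictionary keys and function name are illustrative choices.

```python
import math

# Parameter values quoted in the truancy example.
ALPHA = {"within": -2.64, "outside": -1.14}  # catchment area
BETA = {"male": -3.26, "female": -3.54}       # sex
GAMMA = 0.64                                  # per year of age


def expected_absences(location, sex, age):
    """mu = exp(eta), with eta = alpha_i + beta_j + gamma * x."""
    eta = ALPHA[location] + BETA[sex] + GAMMA * age
    return math.exp(eta)


for location in ALPHA:
    for sex in BETA:
        fitted = [round(expected_absences(location, sex, age), 1) for age in (8, 10, 12, 14)]
        print(location, sex, fitted)
```

Comparing these fitted values with the observed averages in the table gives a first informal impression of how well the model describes the data.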
We will consider each of the components of a GLM (distribution, linear predictor, link function) in the next three sections.
In practice, the distribution of the data is usually specified at the outset (often defined by the data), the linear predictor may be chosen according to what is thought appropriate or convenient, and then the best model structure is found by looking at a range of linear predictors.
Of course, these are not rules which must be adhered to: it may be that it is possible that more than one distribution could be appropriate, and these should be investigated before making a final decision. It could be unclear which link function should be used, and again a range of functions can be investigated.
The R code to fit a generalised linear model to a multivariate data frame and assign it to the object model, is:
model <- glm(Y ~ ..., family = ... (link = ... ))
We will specify the inputs for the blanks in the following three sections.
2 Exponential family
Recall that the distribution of the response variable, Y, in a GLM is a member of the exponential family.
The exponential family is the set of distributions whose probability function, or probability density function (PDF), can be written in the following form:
$f_Y(y; \theta, \varphi) = \exp\left[\frac{y\theta - b(\theta)}{a(\varphi)} + c(y, \varphi)\right]$   (1)
where $a(\varphi)$, $b(\theta)$ and $c(y, \varphi)$ are specific functions.
This formula is given on page 27 of the Tables.
Note that $\varphi$ is another way of writing the Greek letter phi, usually written as $\phi$.
There are two parameters in the above PDF. $\theta$, which is called the natural parameter, is the one which is relevant to the model for relating the response (Y) to the covariates, and $\varphi$ is known as the scale parameter or dispersion parameter.
When trying to show that a distribution is a member of the exponential family, it is important to remember that $\theta$ is a function of $\mu = E(Y)$ only. We shall see later in the chapter exactly how $\theta$ is used to relate the response to the covariates.
Where a distribution has two parameters, such as the $N(\mu, \sigma^2)$, one approach to determining the scale parameter $\varphi$ is to take $\varphi$ to be the other parameter in the distribution, ie the parameter other than the mean. For example, in the case of the normal distribution, we take $\varphi = \sigma^2$.
Where a distribution has one parameter, such as $Poi(\mu)$, we take $\varphi = 1$.
Consider the following statement about a continuous PDF:
$\int_y f(y)\,dy = 1$   (2)
By substituting the expression from (1) and differentiating this with respect to $\theta$, it can be shown that the mean and variance of Y are:
$E[Y] = b'(\theta)$ and $\text{var}(Y) = a(\varphi)\,b''(\theta)$.
These formulae can also be found on page 27 of the Tables.
Question
Prove these two results for a member of the exponential family, using the results given above.
Solution
Mean
Differentiating both sides of equation (2) with respect to $\theta$ gives:
$\int_y \frac{y - b'(\theta)}{a(\varphi)} f(y)\,dy = 0$   (3)
Simplifying:
$\frac{1}{a(\varphi)}\int_y y f(y)\,dy - \frac{b'(\theta)}{a(\varphi)}\int_y f(y)\,dy = 0$
Since $\int_y y f(y)\,dy = E(Y)$, and $\int_y f(y)\,dy = 1$, we have:
$\frac{1}{a(\varphi)}E(Y) - \frac{b'(\theta)}{a(\varphi)} = 0$
Hence:
$E(Y) - b'(\theta) = 0 \Rightarrow E(Y) = b'(\theta)$
Variance
Using the product rule to differentiate equation (3) with respect to $\theta$ gives:
$\int_y \left[-\frac{b''(\theta)}{a(\varphi)} f(y) + \left(\frac{y - b'(\theta)}{a(\varphi)}\right)^2 f(y)\right] dy = 0$
Splitting this into two separate integrals gives:
$\frac{1}{[a(\varphi)]^2}\int_y (y - b'(\theta))^2 f(y)\,dy - \frac{b''(\theta)}{a(\varphi)}\int_y f(y)\,dy = 0$
Since $E(Y) = b'(\theta)$ then $\int_y (y - b'(\theta))^2 f(y)\,dy = \text{var}(Y)$. Again $\int_y f(y)\,dy = 1$, so we have:
$\frac{1}{[a(\varphi)]^2}\text{var}(Y) - \frac{b''(\theta)}{a(\varphi)} = 0$
Rearranging gives:
$\text{var}(Y) = a(\varphi)\,b''(\theta)$
In general, note that the mean does not depend on $\varphi$, so when predicting Y it is $\theta$ which is of importance. Also, the variance of the data has two components: one which involves the scale parameter, and the other which determines the way the variance depends on the mean.
The variance of the normal distribution does not depend on the mean. That's because $\mu$ and $\sigma^2$ are independent.
For other distributions, however, the variance does depend on the mean.
For example, the Poisson distribution has mean and variance both equal to the parameter $\mu$. So knowing the mean of a Poisson distribution tells us the variance as well.
To emphasise this dependence on the mean the variance is often written as $\text{var}(Y) = a(\varphi)V(\mu)$, where the variance function is defined as $V(\mu) = b''(\theta)$.
The variance function does not give the variance directly unless $a(\varphi) = 1$.
2.1 Normal distribution
To motivate these definitions and the subsequent developments, we consider first the normal distribution.

For members of an exponential family, we want to be able to find formulae for the mean and variance of the distribution from the general parameters.

First we will rewrite the normal distribution in the form of equation (1) and then consider other distributions as exponential families. Note that we use $f$, in a slight abuse of notation, for both continuous and discrete distributions.

We have seen this style of notation before in this subject. The alternative notation is to use $p(x)$ for a probability function and $f(x)$ for a density function. Provided that the method is clear, either notation is acceptable.

$$f_Y(y; \theta, \phi) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(y-\mu)^2}{2\sigma^2} \right\} = \exp\left\{ \frac{y\mu - \tfrac{1}{2}\mu^2}{\sigma^2} - \frac{1}{2}\left( \frac{y^2}{\sigma^2} + \log 2\pi\sigma^2 \right) \right\}$$
This is in the form of (1), with:

$$\theta = \mu, \qquad \phi = \sigma^2, \qquad a(\phi) = \phi, \qquad b(\theta) = \frac{\theta^2}{2}, \qquad c(y, \phi) = -\frac{1}{2}\left( \frac{y^2}{\phi} + \log 2\pi\phi \right)$$

Thus, the natural parameter for the normal distribution is $\mu$ and the scale parameter is $\sigma^2$.

Alternatively, we could have said $\phi = \sigma$ and $a(\phi) = \phi^2$. There is no unique parameterisation.

Using the formulae above, the mean is $E(Y) = b'(\theta) = \theta = \mu$ and the variance is $\operatorname{var}(Y) = a(\phi)\, b''(\theta) = \phi = \sigma^2$.

So these do give us the results that we expect for the normal distribution.
Question
Show that if we reparameterise the normal distribution using $\theta = 2\mu$, we still get the same results for the mean and variance of the distribution.
Solution
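If we put $\theta = 2\mu$, we get the following expressions for the various functions:

$$a(\phi) = 2\phi, \qquad b(\theta) = \frac{\theta^2}{4}, \qquad c(y, \phi) = -\frac{1}{2}\left( \frac{y^2}{\phi} + \log 2\pi\phi \right)$$

Using the formulae for the mean and variance, as before:

$$E(Y) = b'(\theta) = \frac{2\theta}{4} = \frac{\theta}{2} \quad \text{and} \quad \operatorname{var}(Y) = a(\phi)\, b''(\theta) = 2\phi \times \frac{1}{2} = \phi$$

So the mean and variance are $\mu$ and $\sigma^2$, as before.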
As mentioned above, for the normal distribution, the variance of $Y$ does not depend on the mean (the variance function is $V(\mu) = b''(\theta) = 1$), whereas for other distributions the variance does depend on the mean.

In R, to use a normal distribution in the glm command, we set family=gaussian (or omit this option as this distribution is the default).

2.2 Poisson distribution
For the Poisson distribution:

$$f_Y(y; \theta, \phi) = \frac{\mu^y e^{-\mu}}{y!} = \exp\left[\, y \log\mu - \mu - \log y! \,\right]$$

which is in the form of (1), with:

$$\theta = \log\mu, \qquad \phi = 1 \text{ so that } a(\phi) = \phi = 1, \qquad b(\theta) = e^{\theta}, \qquad c(y, \phi) = -\log y!$$

Thus, the natural parameter for the Poisson distribution is $\log\mu$, the mean is $E(Y) = b'(\theta) = e^{\theta} = \mu$ and the variance function is $V(\mu) = b''(\theta) = e^{\theta} = \mu$.

The variance function tells us that the variance is proportional to the mean. We can see that the variance is actually equal to the mean since $a(\phi) = 1$.
Question
Comment on whether or not we can re-parameterise the Poisson distribution using $\phi = 2$, say.
Solution
Yes. Just as before with the normal distribution, there is more than one way to set up the parameters. However the natural approach is to use $\phi = 1$ rather than $\phi = 2$, and this is the most sensible approach to use in the exam.

In R, to use a Poisson distribution in the glm command, we set family=poisson.

2.3 Binomial distribution
This is slightly more awkward to deal with, since we have to first divide the binomial random variable by $n$. Thus, suppose $Z \sim \text{Bin}(n, \mu)$. Let $Y = Z/n$, so that $Z = nY$. The distribution of $Z$ is

$$f_Z(z; \theta, \phi) = \binom{n}{z} \mu^z (1-\mu)^{n-z}$$

and by substituting for $z$, the distribution of $Y$ is:

$$f_Y(y; \theta, \phi) = \binom{n}{ny} \mu^{ny} (1-\mu)^{n-ny}$$

$$= \exp\left[ n\left( y\log\mu + (1-y)\log(1-\mu) \right) + \log\binom{n}{ny} \right]$$

$$= \exp\left[ n\left( y\log\frac{\mu}{1-\mu} + \log(1-\mu) \right) + \log\binom{n}{ny} \right]$$

which is in the form of (1), with:

$$\theta = \log\frac{\mu}{1-\mu} \quad \left(\text{note that the inverse of this is } \mu = \frac{e^{\theta}}{1+e^{\theta}}\right)$$

$$\phi = n, \qquad a(\phi) = \frac{1}{\phi}, \qquad b(\theta) = \log(1 + e^{\theta}), \qquad c(y, \phi) = \log\binom{n}{ny}$$

The reason for all this is that $\theta$ is a function of $\mu$, the distribution mean only. However, the binomial distribution as we typically quote it, $\text{Bin}(n, p)$, does not have $\mu$ as one of its parameters.

So we start by considering $\text{Bin}(n, \mu)$, which does have $\mu$ as a parameter, but has mean $n\mu$. We then divide this by $n$ to get a distribution with $\mu$ in its probability function and which also has mean $\mu$.

Here $\phi = n$, the other parameter in the distribution (ie the parameter other than the mean).
Question
Verify that the formulae given in the Core Reading are correct.
Solution
If $\theta = \log\dfrac{\mu}{1-\mu}$, then to obtain the factor of $n$ multiplying the bracket we need $a(\phi) = 1/\phi$ with $\phi = n$. Similarly, $b(\theta)$ must be given by:

$$-b(\theta) = \log(1-\mu) = \log\left( 1 - \frac{e^{\theta}}{1+e^{\theta}} \right) = \log\left( \frac{1}{1+e^{\theta}} \right)$$

So $b(\theta) = \log(1 + e^{\theta})$ as required, and $c(y, \phi) = \log\dbinom{n}{ny}$.

Thus, the natural parameter for the binomial distribution is $\log\dfrac{\mu}{1-\mu}$, the mean is:

$$E[Y] = b'(\theta) = \frac{e^{\theta}}{1+e^{\theta}} = \mu$$

and the variance function is:

$$V(\mu) = b''(\theta) = \frac{e^{\theta}}{(1+e^{\theta})^2} = \mu(1-\mu)$$

We can get the second derivative of $b(\theta)$ most easily by writing $b'(\theta) = 1 - (1 + e^{\theta})^{-1}$.
Question
Comment on whether these are the results we would expect.
Solution
Yes. Since $Z$ is binomial with mean $n\mu$ and variance $n\mu(1-\mu)$ and $Z = nY$, we should have:

$$E(Y) = \frac{1}{n} E(Z) = \frac{1}{n}\, n\mu = \mu$$

and:

$$\operatorname{var}(Y) = \frac{1}{n^2} \operatorname{var}(Z) = \frac{n\mu(1-\mu)}{n^2} = \frac{\mu(1-\mu)}{n} = a(\phi)\, V(\mu)$$

These agree with the results that we actually got.

In R, to use a binomial distribution in the glm command, we set family=binomial.

2.4 Gamma distribution
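The best way to consider the Gamma distribution is to change the parameters from $\alpha$ and $\lambda$ to $\alpha$ and $\mu = \alpha/\lambda$, ie $\lambda = \alpha/\mu$.

Recall that $\theta$ must always be expressed as a function of $\mu$, so the best way to start is to ensure that $\mu$ appears in the PDF formula. We can do this by replacing the $\lambda$:

$$f_Y(y; \theta, \phi) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\, y^{\alpha-1} e^{-\lambda y} = \frac{\alpha^{\alpha}}{\mu^{\alpha}\, \Gamma(\alpha)}\, y^{\alpha-1} e^{-y\alpha/\mu}$$

$$= \exp\left[ \left( -\frac{y}{\mu} - \log\mu \right)\alpha + (\alpha-1)\log y + \alpha\log\alpha - \log\Gamma(\alpha) \right]$$

which is in the form of (1), with:

$$\theta = -\frac{1}{\mu}, \qquad \phi = \alpha, \qquad a(\phi) = \frac{1}{\phi}, \qquad b(\theta) = -\log(-\theta)$$

$$c(y, \phi) = (\phi-1)\log y + \phi\log\phi - \log\Gamma(\phi)$$

Since $\theta$ is negative, $\log(-\theta)$ is well-defined.

Thus, the natural parameter for the gamma distribution is $\dfrac{1}{\mu}$, ignoring the minus sign. The mean is $E[Y] = b'(\theta) = -\dfrac{1}{\theta} = \mu$.

The variance function is $V(\mu) = b''(\theta) = \dfrac{1}{\theta^2} = \mu^2$ and so the variance is $\dfrac{\mu^2}{\alpha}$.

Here $\phi = \alpha$, the other parameter in the distribution.

In R, to use a gamma distribution in the glm command, we set family=Gamma.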
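The results $E[Y] = b'(\theta)$ and $\operatorname{var}(Y) = a(\phi)\,b''(\theta)$ can be checked numerically for the four distributions in this section. The following is a minimal Python sketch (not part of the Core Reading, and separate from R's glm) that differentiates each $b(\theta)$ by central finite differences. The parameter values and the helper names `derivs` and `mean_var` are illustrative assumptions.

```python
import math

def derivs(b, theta, h=1e-4):
    """Central finite-difference estimates of b'(theta) and b''(theta)."""
    b1 = (b(theta + h) - b(theta - h)) / (2 * h)
    b2 = (b(theta + h) - 2 * b(theta) + b(theta - h)) / h ** 2
    return b1, b2

def mean_var(b, theta, a_phi):
    """Return (E[Y], var(Y)) = (b'(theta), a(phi) * b''(theta))."""
    b1, b2 = derivs(b, theta)
    return b1, a_phi * b2

# Normal with mu = 3, sigma^2 = 4: theta = mu, b = theta^2 / 2, a(phi) = sigma^2
normal = mean_var(lambda t: t * t / 2, 3.0, 4.0)

# Poisson with mu = 2.5: theta = log(mu), b = exp(theta), a(phi) = 1
poisson = mean_var(math.exp, math.log(2.5), 1.0)

# Binomial proportion with n = 10, mu = 0.3:
# theta = log(mu / (1 - mu)), b = log(1 + exp(theta)), a(phi) = 1 / n
binomial = mean_var(lambda t: math.log(1 + math.exp(t)), math.log(0.3 / 0.7), 1 / 10)

# Gamma with alpha = 2, mu = 5: theta = -1 / mu, b = -log(-theta), a(phi) = 1 / alpha
gamma = mean_var(lambda t: -math.log(-t), -1 / 5, 1 / 2)

for name, (m, v) in [("normal", normal), ("Poisson", poisson),
                     ("binomial", binomial), ("gamma", gamma)]:
    print(f"{name}: mean ~ {m:.4f}, variance ~ {v:.4f}")
```

Up to finite-difference error, the output should agree with the closed forms: $(\mu, \sigma^2)$ for the normal, $(\mu, \mu)$ for the Poisson, $(\mu, \mu(1-\mu)/n)$ for the binomial proportion and $(\mu, \mu^2/\alpha)$ for the gamma.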
Question
Show that the exponential distribution can be written in the form of a member of the exponential family.
Solution
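We can write the PDF of an exponential distribution as:

$$f(y) = \lambda e^{-\lambda y} = e^{-\lambda y + \log\lambda}$$

Since $E(Y) = \mu = \dfrac{1}{\lambda}$, this is in the appropriate form with:

$$\theta = -\frac{1}{\mu} = -\lambda, \qquad b(\theta) = -\log(-\theta) = -\log\lambda, \qquad \phi = 1 \text{ and } a(\phi) = 1, \qquad c(y, \phi) = 0$$

The $\text{Exp}(\lambda)$ distribution is equivalent to the $\text{Gamma}(1, \lambda)$ distribution, so the results are consistent with those for the gamma distribution.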
2.5 Lognormal distribution

Finally, the lognormal distribution is often used, for example in general insurance to model the distribution of claim sizes. This can be incorporated in the framework of GLMs since if $Y \sim \text{lognormal}$, $\log Y \sim \text{normal}$. Thus, if the lognormal distribution is to be used, the data should first be logged and then the normal modelling distribution can be applied.

In R we could either set another variable, say Z, equal to log(Y) and then model glm(Z ~ ..., family = gaussian(link = ...)) or we use glm(log(Y) ~ ..., family = gaussian(link = ...)).
Syllabus objective 4.2.1 requires students to show that the binomial, Poisson, exponential, gamma and normal distributions are members of the exponential family.

Having the distribution of the response variable belonging to the exponential family ensures the calculations are easier when estimating the parameters using maximum likelihood. It also ensures that the model possesses good statistical properties.

3 Linear predictor
The second component of a GLM is the linear predictor, $\eta$, which is a function of the covariates, ie the input variables to the model.

The covariates (also known as explanatory, predictor or independent variables), enter the model through the linear predictor. This is also where the parameters occur which have to be estimated. The requirement is that it is linear in the parameters that we are estimating.

There are two kinds of covariates used in GLMs: variables and factors.
3.1 Variables

In general, variables are covariates where the actual value of a variable enters the linear predictor. The age of the policyholder is an actuarial example of a variable. So far in our linear models we have only met continuous variables.

A variable is a type of covariate whose real numerical value enters the linear predictor directly, such as age ($x$). Other examples of variables in a car insurance context are annual mileage and number of years for which a driving licence has been held.

The bivariate linear model had a single continuous explanatory variable $x$ with a linear predictor of $\beta_0 + \beta_1 x$. To fit this model it is necessary to estimate the parameters, $\beta_0$ and $\beta_1$, and so the actual value of $x$ matters. For the multivariate linear regression model with $k$ continuous main effect variables this was $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$. Again the values of $x_1, x_2, \dots, x_k$ matter.

We use the same R formulae in the glm function as we did in the lm function:

glm(Y ~ X, ...)
glm(Y ~ X1+X2+...+Xk, ...)

As in the previous unit we can extend our models to include polynomials, to functions of the variable and to linear predictors including more than one variable.

Recall that the linear predictor is linear in the parameters (eg $\beta_0$ and $\beta_1$) and not necessarily linear in the covariates (eg $\eta = \beta_0 + \beta_1 x^2$ is also a linear predictor).
Some examples, where age ($x_1$) and duration ($x_2$) are treated as variables, are shown in the table below together with the formula used in the glm function in R.

| Model | Linear predictor | R formula |
|---|---|---|
| 1 (null model) | $\beta_0$ | Y ~ 1 |
| age | $\beta_0 + \beta_1 x_1$ | Y ~ X1 |
| age$^2$ | $\beta_0 + \beta_1 x_1^2$ | Y ~ I(X1^2) |
| age + age$^2$ | $\beta_0 + \beta_1 x_1 + \beta_2 x_1^2$ | Y ~ X1 + I(X1^2) |
| age + duration | $\beta_0 + \beta_1 x_1 + \beta_2 x_2$ | Y ~ X1 + X2 |
| log(age) | $\beta_0 + \beta_1 \log x_1$ | Y ~ log(X1) |
The null model has no covariates and so there is just the intercept parameter. This is estimated as the sample mean of the response values.

It's fairly easy to see that we start with an intercept parameter and then add a new term with a slope parameter multiplied by the covariate. However, there is actually a little more happening before we get to this simplified linear predictor.

Suppose the linear predictor for age only is $\alpha_1 + \beta_1 x_1$ and the linear predictor for duration only is $\alpha_2 + \beta_2 x_2$. We could then obtain a linear predictor for both of these covariates by summing their individual linear predictors:

$$(\alpha_1 + \beta_1 x_1) + (\alpha_2 + \beta_2 x_2) = (\alpha_1 + \alpha_2) + \beta_1 x_1 + \beta_2 x_2$$

The final simplified version given in the table above, $\beta_0 + \beta_1 x_1 + \beta_2 x_2$, combines the two constants together, ie $\beta_0 = \alpha_1 + \alpha_2$. This simplified formula gives the same final values as the uncombined formula and is more efficient as it requires us to estimate three rather than four parameters.

However, it is actually impossible to estimate $\alpha_1$ and $\alpha_2$ individually from any given data and hence we have to combine them in the linear predictor to overcome this issue. We demonstrate this in the following question, where we give four values of the linear predictor which should be sufficient to estimate four parameters.
Question
The table below shows the value of the linear predictor $\eta = (\alpha_1 + \alpha_2) + \beta_1 x_1 + \beta_2 x_2$ for different values of age ($x_1$) and duration ($x_2$).

| Linear predictor, $\eta$ | Age ($x_1$) | Duration ($x_2$) |
|---|---|---|
| 35 | 20 | 0 |
| 37 | 20 | 1 |
| 45 | 30 | 0 |
| 55 | 30 | 5 |

Show that it is impossible to individually estimate all the parameters in the linear predictor.
Solution
Substituting the given values into the formula for the linear predictor gives the following four equations:

$$35 = (\alpha_1 + \alpha_2) + 20\beta_1 \qquad (1)$$
$$37 = (\alpha_1 + \alpha_2) + 20\beta_1 + \beta_2 \qquad (2)$$
$$45 = (\alpha_1 + \alpha_2) + 30\beta_1 \qquad (3)$$
$$55 = (\alpha_1 + \alpha_2) + 30\beta_1 + 5\beta_2 \qquad (4)$$

Subtracting equations (1) and (2) gives $\beta_2 = 2$.

Subtracting equations (1) and (3) gives $10\beta_1 = 10$ and hence $\beta_1 = 1$.

However, substituting the values of $\beta_1 = 1$ and $\beta_2 = 2$ into all four equations gives $(\alpha_1 + \alpha_2) = 15$. Hence we can only estimate their total $\beta_0 = \alpha_1 + \alpha_2 = 15$ and we are unable to give individual estimates for $\alpha_1$ and $\alpha_2$.

Incidentally, the actual values used in the question above were $\alpha_1 = 5.5$ and $\alpha_2 = 9.5$. It is easy to see that the simplification above gives exactly the same answers as the original values.

Other models can also be fitted, including, for example, a model for age with no intercept term.

We omit the intercept in R by adding a -1 (ie negative one) to the formula. It's unusual to have models with no intercept term as these would give a value of zero when a covariate is zero.
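The identifiability problem in the question above can also be seen from the rank of the coefficient matrix. The following is a minimal Python sketch (an illustration only, not how R fits the model): the four equations in $(\alpha_1, \alpha_2, \beta_1, \beta_2)$ have rank 3, so four parameters cannot all be recovered, whereas the combined form in $(\beta_0, \beta_1, \beta_2)$ has full column rank. The helper `rank` is a hypothetical name introduced here.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix via Gauss-Jordan elimination in exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0  # number of pivot rows found so far
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# Columns: alpha1, alpha2, beta1 (age), beta2 (duration); one row per data point.
uncombined = [[1, 1, 20, 0], [1, 1, 20, 1], [1, 1, 30, 0], [1, 1, 30, 5]]
# Columns: beta0 = alpha1 + alpha2, beta1, beta2.
combined = [[1, 20, 0], [1, 20, 1], [1, 30, 0], [1, 30, 5]]

uncombined_rank = rank(uncombined)  # fewer than 4 columns' worth of information
combined_rank = rank(combined)      # equals the number of columns

# The combined estimates from the solution reproduce every value of eta.
eta = [35, 37, 45, 55]
fitted = [15 * b0 + 1 * x1 + 2 * x2 for b0, x1, x2 in combined]

print(uncombined_rank, combined_rank, fitted == eta)
```

The two $\alpha$ columns are identical, which is exactly why only their sum can be estimated.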
3.2 Interaction between variables

In addition to considering variables as the main effect we can include interactions between variables like we did in Chapter 12, Section 4.7 (ie where the effect on the response variable of one predictor variable depends on the value of another predictor variable).

So far, each covariate has been incorporated into the linear predictor through an additive term $\beta_i x_i$. Such a term is called the main effect for that covariate. For a main effect, the covariate increases the linear predictor by $\beta_i$ for each unit increase in $x_i$ independently of all the other covariates.

When there is interaction between two variables, say $x_i$ and $x_j$, they are not independent. So the effect of one covariate, $x_i$, on the linear predictor depends on the value of the other covariate, $x_j$. We model this using an additive term of the form $\beta_{ij} x_i x_j$ in the linear predictor.

Recall that when an interaction term is used in a model, both main effects must also be included.

Otherwise we are saying that the variables don't contribute anything independently. This implies that they are perfectly correlated and hence one of them is unnecessary.

| Model | Linear predictor | R formula |
|---|---|---|
| age + duration + age.duration | $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$ | Y ~ X1+X2+X1:X2 |
| age * duration | $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$ | Y ~ X1*X2 |

The two models in the table above are equivalent, and have been shown separately to illustrate the use of the dot and star model notation in R.

An interaction term is denoted using dot notation. In the example above, age.duration denotes the interaction between age and duration (although in R a colon is used to prevent confusion with a decimal point).

The star notation is used to denote the main effects and the interaction term. In the example above, age*duration = age + duration + age.duration.

3.3 Factors and interaction between factors
The other main type of covariate is a factor, which takes a categorical value. For example, the sex of the policyholder is either male or female, which constitutes a factor with two categories (or levels).

Other examples of factors in a car insurance context are postcode and car type.

This type of covariate can be parameterised so that the linear predictor has a term $\alpha_1$ for a male, and a term $\alpha_2$ for a female (ie $\alpha_i$ where $i = 1$ for a male and $i = 2$ for a female). In general, there is a parameter for each level that the factor may take.

Factors are typically non-numerical (eg sex). Even for those that are (eg vehicle rating group), it doesn't make any sense to include their value in the linear predictor. Instead we assign parameter values for each possible category the factor can take.

In the following table sex and vehicle rating group (vrg) are factors. If there is more than one factor in the model, then the inclusion of an interaction term implies that the effect of each factor depends on the level of the other factor.
| Model | Linear predictor | R formula |
|---|---|---|
| sex | $\alpha_i$ | Y ~ sex |
| vehicle rating group | $\beta_j$ | Y ~ vrg |
| sex + vehicle rating group | $\alpha_i + \beta_j$ | Y ~ sex + vrg |
| sex + vehicle rating group + sex.vehicle rating group | $\alpha_i + \beta_j + \gamma_{ij}$ | Y ~ sex+vrg+sex:vrg |
| sex*vehicle rating group | $\alpha_i + \beta_j + \gamma_{ij}$ | Y ~ sex * vrg |

Again, the last two models are identical.
As mentioned above, sex is a factor with a parameter assigned to each of its two categories ($\alpha_i$ where $i = 1$ for a male and $i = 2$ for a female).

Similarly, vehicle rating group is a factor with a parameter assigned to each of its categories. For example, if there were three categories (Group 1, Group 2 and Group 3), then we would have $\beta_j$, $j = 1, 2, 3$.

We use a different subscript from the one used for sex since the main effects give the change to the linear predictor independently of all the other covariates. If we used $i$ as the subscript for both, it would mean that males were always in Group 1 and females were always in Group 2.

Again we can construct linear predictors that involve more than one covariate by summing the linear predictors for each individual covariate. Hence we would use $\alpha_i + \beta_j$ for sex + vrg.

However, it is again impossible to estimate $\alpha_1$, $\alpha_2$, $\beta_1$, $\beta_2$ and $\beta_3$ individually from any given data set, and so we have to combine constants together to overcome this issue. This means that one of those constants effectively becomes zero. This is called the base assumption in the model.

We demonstrate this in the following question, where we give five values of the linear predictor which should be sufficient to estimate five parameters.
Question
The table below shows the value of the linear predictor $\eta = \alpha_i + \beta_j$ for different values of $i$ and $j$:

| Linear predictor, $\eta$ | Sex, $i$ | Group, $j$ |
|---|---|---|
| 0.5 | Male | 1 |
| 0.48 | Male | 2 |
| 0.41 | Male | 3 |
| 0.6 | Female | 1 |
| 0.58 | Female | 2 |

Show that it is impossible to individually estimate all the parameters in the linear predictor.
Solution
Substituting the given values into the formula for the linear predictor gives the following five equations:

$$0.5 = \alpha_1 + \beta_1 \qquad (1)$$
$$0.48 = \alpha_1 + \beta_2 \qquad (2)$$
$$0.41 = \alpha_1 + \beta_3 \qquad (3)$$
$$0.6 = \alpha_2 + \beta_1 \qquad (4)$$
$$0.58 = \alpha_2 + \beta_2 \qquad (5)$$

Subtracting (1) and (4) (or subtracting (2) and (5)) gives $\alpha_2 = \alpha_1 + 0.1$.

Subtracting (1) and (2) (or subtracting (4) and (5)) gives $\beta_2 = \beta_1 - 0.02$.

Subtracting (1) and (3) gives $\beta_3 = \beta_1 - 0.09$.

We can try other combinations but none of them will be able to give us estimates of all five parameters.

If, however, we set one constant to zero, say $\alpha_1 = 0$ (ie our base assumption is that the policyholder is male) and the other parameters are calculated relative to this level, we obtain:

$$\alpha_i = \begin{cases} 0 & i = 1 \text{ (male)} \\ 0.1 & i = 2 \text{ (female)} \end{cases} \qquad \text{and} \qquad \beta_j = \begin{cases} 0.5 & j = 1 \\ 0.48 & j = 2 \\ 0.41 & j = 3 \end{cases}$$

The actual values used in creating the above question were:

$$\alpha_i = \begin{cases} 0.45 & i = 1 \text{ (male)} \\ 0.55 & i = 2 \text{ (female)} \end{cases} \qquad \text{and} \qquad \beta_j = \begin{cases} 0.05 & j = 1 \\ 0.03 & j = 2 \\ -0.04 & j = 3 \end{cases}$$

It is worth spending a moment checking that every combination of sex and vehicle rating group gives exactly the same answer as the estimated values.
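The check suggested above can be sketched in a few lines of Python. The two parameter sets are the ones quoted in the solution (base level $\alpha_1 = 0$, and the values used to create the question); the dictionary layout and the function name `eta` are illustrative choices, not part of the course material.

```python
import math

# Estimates relative to the base level alpha_1 = 0 (male).
base = {"alpha": {1: 0.0, 2: 0.1},
        "beta": {1: 0.5, 2: 0.48, 3: 0.41}}

# Values used to construct the question.
actual = {"alpha": {1: 0.45, 2: 0.55},
          "beta": {1: 0.05, 2: 0.03, 3: -0.04}}

def eta(params, i, j):
    """Linear predictor alpha_i + beta_j for sex i and vehicle rating group j."""
    return params["alpha"][i] + params["beta"][j]

# Compare all six sex / group combinations, including the unobserved (female, 3).
same_predictor = all(
    math.isclose(eta(base, i, j), eta(actual, i, j), abs_tol=1e-12)
    for i in (1, 2) for j in (1, 2, 3)
)
print(same_predictor)
```

Both parameterisations give the same linear predictor in every cell, even though the individual parameters differ, which is why only the combined values can be estimated.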
Finally, we consider an interaction between two factors:

sex*vehicle rating group = sex + vehicle rating group + sex.vehicle rating group

We start by summing the linear predictors for each of the three terms: sex, vehicle rating group and sex.vehicle rating group separately.

We already know that the linear predictor for sex alone is $\eta = \alpha_i$, $i = 1, 2$. Similarly, the linear predictor for vehicle group alone is $\eta = \beta_j$, $j = 1, 2, 3$. We also need a linear predictor for the interaction effect $\eta = \alpha_i . \beta_j$, $i = 1, 2$ and $j = 1, 2, 3$. The dot notation here does not mean multiply. We have written it in this format for now to indicate an interaction.

$$\eta = \alpha_i + \beta_j + \alpha_i . \beta_j$$

An alternative (and more commonly used) notation for the interaction term that depends on both $i$ and $j$ is $\gamma_{ij}$, so that:

$$\eta = \alpha_i + \beta_j + \gamma_{ij}$$
Interaction between sex and vehicle group indicates that the difference in risk levels for male and female drivers varies for different vehicle groups.

For example, if the response variable is the number of claims on a car insurance policy, the effect of being male might depend on whether the car being driven is a Porsche (where the driver might be tempted to show off) or a Mini (where the driver might drive more carefully).

However, it is again impossible to estimate all the parameters ($\alpha_1$, $\alpha_2$, $\beta_1$, $\beta_2$, $\beta_3$, $\gamma_{11}$, $\gamma_{12}$, $\gamma_{13}$, $\gamma_{21}$, $\gamma_{22}$ and $\gamma_{23}$) individually from any given data set. We have to combine constants together to overcome this issue. This time it is harder as there are only six combinations that we can observe:

| | Group 1 | Group 2 | Group 3 |
|---|---|---|---|
| male | $\alpha_1 + \beta_1 + \gamma_{11}$ | $\alpha_1 + \beta_2 + \gamma_{12}$ | $\alpha_1 + \beta_3 + \gamma_{13}$ |
| female | $\alpha_2 + \beta_1 + \gamma_{21}$ | $\alpha_2 + \beta_2 + \gamma_{22}$ | $\alpha_2 + \beta_3 + \gamma_{23}$ |

There are 11 parameters to estimate so we will have to set five of the parameters equal to zero to be able to solve the relevant equations.
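For example, we might get the following:

$$\alpha_i = \begin{cases} 0 & i = 1 \text{ (male)} \\ 0.66 & i = 2 \text{ (female)} \end{cases} \qquad \beta_j = \begin{cases} 0 & j = 1 \\ 0.5 & j = 2 \\ 0.4 & j = 3 \end{cases}$$

| $\gamma_{ij}$ | $j = 1$ | $j = 2$ | $j = 3$ |
|---|---|---|---|
| $i = 1$ | 0.55 | 0 | 0 |
| $i = 2$ | 0 | -0.55 | -0.58 |

when the true values might be:

$$\alpha_i = \begin{cases} 0.45 & i = 1 \text{ (male)} \\ 0.55 & i = 2 \text{ (female)} \end{cases} \qquad \beta_j = \begin{cases} 0.05 & j = 1 \\ 0.03 & j = 2 \\ -0.04 & j = 3 \end{cases}$$

| $\gamma_{ij}$ | $j = 1$ | $j = 2$ | $j = 3$ |
|---|---|---|---|
| $i = 1$ | 0.05 | 0.02 | -0.01 |
| $i = 2$ | 0.06 | 0.03 | -0.03 |

Again, it is worth checking that every combination of sex and vehicle rating group gives exactly the same answer as the estimated values.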
An alternative linear predictor that gives identical results to $\eta = \alpha_i + \beta_j + \gamma_{ij}$ is:

$$\eta = \delta_{ij}$$

where:

| $\delta_{ij}$ | $j = 1$ | $j = 2$ | $j = 3$ |
|---|---|---|---|
| $i = 1$ | 0.55 | 0.5 | 0.4 |
| $i = 2$ | 0.66 | 0.61 | 0.48 |

This gives the six possible combinations directly. Again, do spend a short while checking that every combination of sex and vehicle rating group gives exactly the same answer as before.
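As a hedged illustration of that check, the Python sketch below adds up $\alpha_i + \beta_j + \gamma_{ij}$ using the "true" values quoted above and compares each cell with the $\delta_{ij}$ table. The cell-by-cell layout of $\gamma_{ij}$ is this sketch's reading of the values (rows $i$, columns $j$), and the variable names are illustrative.

```python
import math

# "True" values quoted in the text (i = 1 male, i = 2 female; j = vehicle group).
alpha = {1: 0.45, 2: 0.55}
beta = {1: 0.05, 2: 0.03, 3: -0.04}
gamma = {(1, 1): 0.05, (1, 2): 0.02, (1, 3): -0.01,
         (2, 1): 0.06, (2, 2): 0.03, (2, 3): -0.03}

# Direct parameterisation: one parameter per observable cell.
delta = {(1, 1): 0.55, (1, 2): 0.5, (1, 3): 0.4,
         (2, 1): 0.66, (2, 2): 0.61, (2, 3): 0.48}

cells_match = all(
    math.isclose(alpha[i] + beta[j] + gamma[(i, j)], delta[(i, j)], abs_tol=1e-12)
    for (i, j) in delta
)

n_parameters = len(alpha) + len(beta) + len(gamma)  # parameters in the full form
n_cells = len(delta)                                # combinations we can observe
print(cells_match, n_parameters, n_cells)
```

The 11 parameters of the additive form collapse to six cell values, which is why five of them have to be fixed (for example at zero) before the rest can be estimated.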
3.4 Predictors with variables and factors and interaction

Finally, we'll look at models that contain both variables and factors.

For example, a model that includes an age effect and an effect for the sex of the policyholder could have a linear predictor:

$$\alpha_i + \beta x$$

As above, we can construct linear predictors that involve more than one covariate by summing the linear predictors for each individual covariate.

Here, the linear predictor for age is $\eta = \beta_0 + \beta_1 x$ and the linear predictor for sex is $\eta = \alpha_i$, $i = 1, 2$. Summing these gives:

$$\eta = \beta_0 + \beta_1 x + \alpha_i = (\beta_0 + \alpha_i) + \beta_1 x$$

Again it is impossible to estimate the parameters $\beta_0$ and $\alpha_i$ individually, so we have to combine them together:

$$\eta = \alpha_i' + \beta_1 x$$

We have added a dash to indicate that the values here are not the same as in the original $\alpha_i$. The Core Reading skips straight to the simplified result, $\alpha_i + \beta x$.
For example, suppose we have the following values of the linear predictor $\eta = (\beta_0 + \alpha_i) + \beta_1 x$ for different values of age ($x$) and different genders.

| Linear predictor, $\eta$ | Age ($x$) | Sex |
|---|---|---|
| 1.45 | 20 | Male |
| 1.95 | 30 | Male |
| 1.55 | 20 | Female |
| 2.05 | 30 | Female |

Since there are four unknown parameters to estimate, four data points should be sufficient. Substituting the given values into the formula for the linear predictor gives the following four equations:

$$1.45 = (\beta_0 + \alpha_1) + 20\beta_1 \qquad (1)$$
$$1.95 = (\beta_0 + \alpha_1) + 30\beta_1 \qquad (2)$$
$$1.55 = (\beta_0 + \alpha_2) + 20\beta_1 \qquad (3)$$
$$2.05 = (\beta_0 + \alpha_2) + 30\beta_1 \qquad (4)$$

Subtracting equations (1) and (2) (or subtracting equations (3) and (4)) gives $10\beta_1 = 0.5$, and hence $\beta_1 = 0.05$.

Subtracting equations (1) and (3) (or subtracting equations (2) and (4)) gives $\alpha_2 - \alpha_1 = 0.1$, and hence $\alpha_2 = \alpha_1 + 0.1$.

Substituting $\beta_1 = 0.05$ and $\alpha_2 = \alpha_1 + 0.1$ into both the other equations gives $(\beta_0 + \alpha_2) = 0.55$ and so we are unable to estimate these parameters separately.

We absorb the $\beta_0$ into the sex parameters to resolve this issue.

Notice that the parameter $\beta_0$ is redundant and has not been included (it could not be estimated separately from $\alpha_1$ and $\alpha_2$).

This gives:

$$\beta_1 = 0.05 \quad \text{and} \quad \alpha_i' = \begin{cases} 0.45 & \text{if } i = 1 \text{ (male)} \\ 0.55 & \text{if } i = 2 \text{ (female)} \end{cases}$$

The actual values used to construct the question above were:

$$\beta_0 = 0.5, \quad \beta_1 = 0.05 \quad \text{and} \quad \alpha_i = \begin{cases} -0.05 & i = 1 \text{ (male)} \\ 0.05 & i = 2 \text{ (female)} \end{cases}$$

Again, it's worth spending a short while checking that every combination of age and sex gives exactly the same answer as the estimated values.
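A minimal Python sketch of this check follows. It evaluates both the original form $(\beta_0 + \alpha_i) + \beta_1 x$ and the reduced form $\alpha_i' + \beta_1 x$ at the four tabulated points, using the values quoted above; the names `eta_full` and `eta_reduced` are illustrative.

```python
import math

# Values quoted in the text.
beta0, beta1 = 0.5, 0.05
alpha = {"male": -0.05, "female": 0.05}       # original sex effects
alpha_dash = {"male": 0.45, "female": 0.55}   # sex effects with beta0 absorbed

# Tabulated values of the linear predictor, keyed by (age, sex).
observed = {(20, "male"): 1.45, (30, "male"): 1.95,
            (20, "female"): 1.55, (30, "female"): 2.05}

def eta_full(x, sex):
    """Uncombined form: (beta0 + alpha_i) + beta1 * x."""
    return (beta0 + alpha[sex]) + beta1 * x

def eta_reduced(x, sex):
    """Combined form: alpha'_i + beta1 * x."""
    return alpha_dash[sex] + beta1 * x

all_match = all(
    math.isclose(eta_full(x, s), v, abs_tol=1e-9)
    and math.isclose(eta_reduced(x, s), v, abs_tol=1e-9)
    for (x, s), v in observed.items()
)
print(all_match)
```

Both forms reproduce the table, so dropping $\beta_0$ loses nothing.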
Notice also that the effect of the age of the policyholder is the same whether the policyholder is male or female.

In other words, age and sex are independent covariates. There is no interaction between them.

In this case if we were to draw a graph of the linear predictor, it would consist of two parallel straight lines (one for males and one for females).

[Figure: linear predictor $\eta$ against age ($x$), showing two parallel straight lines, $\alpha_1 + \beta x$ and $\alpha_2 + \beta x$.]
Including the interaction between the age and sex would lead to a linear predictor of:

$$\alpha_i + \beta_i x, \qquad i = 1, 2$$

Recall that an interaction is where the effect of one covariate (eg age) on the linear predictor depends on the value that another covariate (eg sex) takes.

In this case, the effect of the age of the policyholder is different for males and females.

The graph for the model with interaction would consist of two non-parallel straight lines.

[Figure: linear predictor $\eta$ against age ($x$), showing two non-parallel straight lines, $\alpha_1 + \beta_1 x$ and $\alpha_2 + \beta_2 x$.]
For example, if the response variable is the number of accidents claimed for on a car insurance policy, it might be the case that young men are more prone to accidents than young women but, as men get older, there is a steeper drop off in the number of accidents.

Let's now consider how we would construct the linear predictor $\alpha_i + \beta_i x$ for this model.
We start by summing the linear predictors for each of the three terms: age, sex and age.sex separately.

We already know that the linear predictor for age alone is $\eta = \beta_0 + \beta_1 x$ and that the linear predictor for sex alone is $\eta = \alpha_i$, $i = 1, 2$. We also need a linear predictor for the interaction effect $\eta = (\beta_0 + \beta_1 x).\alpha_i$, $i = 1, 2$. The dot notation here does not mean multiply. We have written it in this format for now to indicate an interaction.

We add the three of these together:

$$\eta = \beta_0 + \beta_1 x + \alpha_i + \beta_0.\alpha_i + (\beta_1.\alpha_i)\, x$$

We could then use MLE on a set of past data to come up with estimates for each of the parameters. For example, these might be:

$$\beta_0 = 0.5, \qquad \beta_1 = -0.05, \qquad \alpha_i = \begin{cases} -0.05 & \text{if } i = 1 \text{ (male)} \\ 0.05 & \text{if } i = 2 \text{ (female)} \end{cases}$$

with interaction terms:

$$\beta_0.\alpha_i = \begin{cases} 0.35 & \text{if } i = 1 \text{ (male)} \\ 0.05 & \text{if } i = 2 \text{ (female)} \end{cases} \qquad \text{and} \qquad \beta_1.\alpha_i = \begin{cases} -0.15 & \text{if } i = 1 \text{ (male)} \\ -0.02 & \text{if } i = 2 \text{ (female)} \end{cases}$$

This approach is rather artificial and would involve estimating eight non-zero parameters.
However, there is a more efficient way. We can combine the parameters $\beta_0$, $\alpha_i$ and $\beta_0.\alpha_i$ as these terms are not attached to $x$ in the linear predictor. Similarly, we can combine the terms $\beta_1$ and $\beta_1.\alpha_i$.

A linear predictor that gives identical results is:

$$\eta = \alpha_i + \beta_i x$$

where:

$$\alpha_i = \begin{cases} 0.8 & \text{if } i = 1 \text{ (male)} \\ 0.6 & \text{if } i = 2 \text{ (female)} \end{cases} \qquad \text{and} \qquad \beta_i = \begin{cases} -0.2 & \text{if } i = 1 \text{ (male)} \\ -0.07 & \text{if } i = 2 \text{ (female)} \end{cases}$$

The following table summarises the different models involving age (as a variable) and the factor of sex:

| Model | Linear predictor | R formula |
|---|---|---|
| age | $\beta_0 + \beta_1 x$ | Y ~ X1 |
| sex | $\alpha_i$ | Y ~ sex |
| age + sex | $\alpha_i + \beta x$ | Y ~ X1 + sex |
| age + sex + age.sex | $\alpha_i + \beta_i x$ | Y ~ X1+sex+X1:sex |
| age*sex | $\alpha_i + \beta_i x$ | Y ~ X1 * sex |
Question
In UK motor insurance business, vehicle-rating group is also used as a factor. Vehicles are divided into twenty categories numbered 1 to 20, with group 20 including those vehicles that are most expensive to repair.

Suppose that we have a three-factor model specified as age*(sex + vehicle group).

Determine the linear predictor for a model of this type.
Solution
A helpful starting point is to consider the linear predictor for sex + vehicle group on its own. Summing the linear predictors for both of these main effects gives:

$$\eta = \alpha_i + \beta_j$$

We don't attempt to simplify this to $\alpha_{ij}$ as this notation is reserved for an interaction between sex and vehicle group, which we are not considering here.

Now we consider the linear predictor for age*(sex + vehicle group). Recall that this can also be written as:

age + (sex + vehicle group) + age.(sex + vehicle group)

We sum the linear predictors for each of these three components:

$$\eta = (\gamma_0 + \gamma_1 x) + (\alpha_i + \beta_j) + (\gamma_0 + \gamma_1 x).(\alpha_i + \beta_j)$$

Finally, we simplify by combining parameters:

$$\eta = \alpha_i + \beta_j + \gamma_i x + \delta_j x$$

Note that we have: combined $\alpha_i$, $\gamma_0$ and $\alpha_i.\gamma_0$ into a new $\alpha_i$; left $\beta_j$ alone; combined $\gamma_1$ and $\alpha_i.\gamma_1$ into $\gamma_i$; and renamed $\beta_j.\gamma_1$ as $\delta_j$.
In general, when we add a new main effect, we add $n - 1$ parameters (or equivalently lose $n - 1$ degrees of freedom), where $n$ is the number of parameters that we would have used had the main effect stood on its own. In the case where the main effect is a factor, $n$ is also the number of categories.

When we add an interactive factor, we add $(n-1)(m-1)$ parameters (or equivalently lose $(n-1)(m-1)$ degrees of freedom), where $n$ and $m$ are the number of parameters that we would have used had each of the main effects stood on their own. In the case where both these main effects are factors, $n$ and $m$ are also the number of possible categories for each factor.
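These counting rules can be tied back to the sex*vehicle rating group example with three groups. The short Python sketch below is an illustration only (the function names are hypothetical): starting from one base-level parameter, the two main effects and their interaction add up to the six cell values that can actually be observed.

```python
def main_effect_params(n):
    """Parameters added by a main effect that would need n parameters on its own."""
    return n - 1

def interaction_params(n, m):
    """Parameters added by the interaction of two such main effects."""
    return (n - 1) * (m - 1)

sex_levels, group_levels = 2, 3  # male/female; vehicle rating Groups 1 to 3

total = (1                                              # base level
         + main_effect_params(sex_levels)               # sex
         + main_effect_params(group_levels)             # vehicle rating group
         + interaction_params(sex_levels, group_levels))  # sex.vehicle rating group

print(total, sex_levels * group_levels)
```

The total matches the number of sex and group combinations, consistent with needing to fix five of the 11 parameters in that example.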
4 Link functions
Recall that the link function connects the mean response to the linear predictor, $g(\mu) = \eta$, where $\mu = E(Y)$. Technically, it is necessary for the link function to be differentiable and invertible in order to fit a model.

An invertible function is one that is one-to-one, so that for any value of $\eta$ there is a unique value of $\mu$. We have seen already that it is important to be able to invert the link function in order to use the model to make predictions about the future.

Beyond these basic requirements, there are a number of functions which are appropriate for the distributions above.

However, it is sensible to choose link functions to ensure our predicted response variables stay within sensible bounds and ideally minimise the residual variance.

In R we specify the link function by setting link equal to the appropriate function: identity, log, sqrt, logit, inverse, 1/mu^2, etc.

For each distribution, the natural, or canonical, link function is defined $g(\mu) = \theta(\mu)$.

Remember that $\theta$ is the natural parameter for the exponential family form and that $\theta$ is a function of the mean of the distribution $\mu$.

If no link function is specified in R then these will be the default option.

Hence the canonical link functions for the distributions given in Section 2 are:

| Distribution | Link | $g(\mu)$ |
|---|---|---|
| normal | identity | $g(\mu) = \mu$ |
| Poisson | log | $g(\mu) = \log\mu$ |
| binomial | logit | $g(\mu) = \log\dfrac{\mu}{1-\mu}$ |
| gamma | inverse | $g(\mu) = \dfrac{1}{\mu}$ |

Earlier, we showed that $\theta = -\dfrac{1}{\mu}$ for the gamma distribution. The minus sign is dropped in the canonical link function. This doesn't affect anything since constants will be absorbed into the parameters in the linear predictor.

The canonical link functions are given on page 27 of the Tables.

These link functions work well for each of the above distributions, but it is not obligatory that they are used in each case. For example, we could use the identity link function in conjunction with the Poisson distribution, we could use the log link function for data which had a gamma distribution, and so on.
However, we need to consider the implications of the choice of the link function on the possible values for $\mu$. For example, if the data have a Poisson distribution then $\mu$ must be positive. If we use the log link function, then $\eta = \log(\mu)$ and $\mu = e^{\eta}$. Thus, $\mu$ is guaranteed to be positive, whatever value (positive or negative) the linear predictor takes. The same is not true if we use the identity link function.

Other link functions exist, and can be quite complex for specific modelling purposes. As a basis for actuarial applications, the above four functions are often sufficient.
Question
Determine the inverse of the link function $g(\mu) = \log\left(\frac{\mu}{1-\mu}\right)$ by setting it equal to $\eta$ and comment on why this might be an appropriate link function for the binomial distribution.
Solution
We used this inverse function in the actuarial exam pass rates example. It is:
$\mu = \frac{e^{\eta}}{1+e^{\eta}} = \frac{1}{1+e^{-\eta}} = \left(1+e^{-\eta}\right)^{-1}$
It is an appropriate link function for the binomial distribution since it results in values of $\mu$, the probability parameter, between 0 and 1.
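The effect of the link on the range of the mean can be checked numerically. The following is a minimal sketch, assuming base R's family objects (binomial() and poisson()) and a made-up vector of linear predictor values called eta; none of these names come from the Core Reading.

```r
# Family objects in base R carry both the link g and its inverse g^{-1}
logit_link <- binomial(link = "logit")
log_link <- poisson(link = "log")

# Illustrative linear predictor values (hypothetical, positive and negative)
eta <- c(-5, -1, 0, 1, 5)

# Inverse logit: e^eta / (1 + e^eta), always strictly between 0 and 1
mu_binomial <- logit_link$linkinv(eta)

# Inverse log: e^eta, always positive
mu_poisson <- log_link$linkinv(eta)

# Applying the link to the mean recovers the linear predictor
eta_back <- logit_link$linkfun(mu_binomial)

print(round(mu_binomial, 4))
print(round(mu_poisson, 4))
```

Whatever the sign of eta, the inverse logit stays inside (0, 1) and the inverse log stays positive, which is the point made in the solution above.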
5 Model fitting and comparison
The process of choosing a model also uses methods which are approximations, based on maximum likelihood theory, and this section outlines this process.
5.1 Obtaining the estimates
The parameters in a GLM are usually estimated using maximum likelihood estimation. The log-likelihood function, $\ell(y; \theta, \phi) = \log f_Y(y; \theta, \phi)$, depends on the parameters in the linear predictor through the link function. Thus, maximum likelihood estimates of the parameters may be obtained by maximising $\ell$ with respect to the parameters in the linear predictor.
This depends on the invariance property from Chapter 8, that the MLE of a function is equal to the function of the MLE. We really want to find the MLE of the final parameter $\mu$. However, because of the invariance property it is permissible to find the MLE of the linear predictor $\eta$, and translate this into the MLE for $\mu$.
Question
Claim amounts for medical insurance claims for hamsters are believed to have an exponential distribution with mean $\mu_i$:
$f(y_i) = \frac{1}{\mu_i} e^{-y_i/\mu_i} = \exp\left\{-\frac{y_i}{\mu_i} - \log \mu_i\right\}$
We have the following data for hamsters' medical claims, using the model above:
age $x_i$ (months)      4     8    10    11    17
claim amount $y_i$     50    52   119    41   163
The insurer believes that a linear function of age affects the claim amount:
$\eta_i = \alpha + \beta x_i$
Using the canonical link function, write down (but do not try to solve) the equations satisfied by the maximum likelihood estimates for $\alpha$ and $\beta$, based on the above data.
Solution
The log of the likelihood function is:
$\log L(\mu) = \sum_i \left(-\frac{y_i}{\mu_i} - \log \mu_i\right)$
The canonical link function for the exponential distribution is $g(\mu_i) = \frac{1}{\mu_i}$. Recall that the link function connects the mean response to the linear predictor, $g(\mu_i) = \eta_i$.
Hence, we have:
$\frac{1}{\mu_i} = \alpha + \beta x_i$
Rearranging this gives:
$\mu_i = \frac{1}{\alpha + \beta x_i}$
This enables us to write the log-likelihood function in terms of $\alpha$ and $\beta$:
$\log L(\alpha, \beta) = \sum_i \left\{-y_i(\alpha + \beta x_i) + \log(\alpha + \beta x_i)\right\}$
We can now differentiate this with respect to $\alpha$ and $\beta$:
$\frac{\partial}{\partial \alpha} \log L(\alpha, \beta) = \sum_i \left\{-y_i + \frac{1}{\alpha + \beta x_i}\right\}$
$\frac{\partial}{\partial \beta} \log L(\alpha, \beta) = \sum_i \left\{-x_i y_i + \frac{x_i}{\alpha + \beta x_i}\right\}$
So the equations satisfied by the MLEs of $\alpha$ and $\beta$ are:
$\sum_i \left\{-y_i + \frac{1}{\hat\alpha + \hat\beta x_i}\right\} = 0$
and:
$\sum_i \left\{-x_i y_i + \frac{x_i}{\hat\alpha + \hat\beta x_i}\right\} = 0$
Substituting in the given data values gives the following equations:
$\frac{1}{\hat\alpha + 4\hat\beta} + \frac{1}{\hat\alpha + 8\hat\beta} + \frac{1}{\hat\alpha + 10\hat\beta} + \frac{1}{\hat\alpha + 11\hat\beta} + \frac{1}{\hat\alpha + 17\hat\beta} - 425 = 0$
and:
$\frac{4}{\hat\alpha + 4\hat\beta} + \frac{8}{\hat\alpha + 8\hat\beta} + \frac{10}{\hat\alpha + 10\hat\beta} + \frac{11}{\hat\alpha + 11\hat\beta} + \frac{17}{\hat\alpha + 17\hat\beta} - 5,028 = 0$
These are not particularly easy to solve without computer assistance. Using R to solve the equations gives $\hat\alpha = 0.160134$ and $\hat\beta = -0.000598$.
We can then estimate the mean claim amounts for various ages using:
$\hat\mu_i = \frac{1}{\hat\alpha + \hat\beta x_i}$
Doing so gives estimates for the claim amounts of 6.34, 6.44, 6.49, 6.51 and 6.67, which are very poor indeed. So the model does not appear to be appropriate at all.
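One way to obtain numerical solutions of the two likelihood equations is sketched below. It rests on an assumption that is not in the Core Reading: the exponential distribution is a gamma distribution with shape 1, and the gamma score equations for $\alpha$ and $\beta$ do not involve the shape, so base R's glm() with the Gamma family and inverse link should solve the same pair of equations. The object names (fit, score_alpha, score_beta) are illustrative.

```r
# Hamster data from the question above
x <- c(4, 8, 10, 11, 17)        # age in months
y <- c(50, 52, 119, 41, 163)    # claim amounts

# Assumption: a gamma GLM with the inverse link has the same score
# equations for alpha and beta as the exponential model in the question
fit <- glm(y ~ x, family = Gamma(link = "inverse"))
alpha_hat <- unname(coef(fit)[1])
beta_hat <- unname(coef(fit)[2])

# Left-hand sides of the two likelihood equations, evaluated at the estimates
score_alpha <- sum(1 / (alpha_hat + beta_hat * x)) - sum(y)
score_beta <- sum(x / (alpha_hat + beta_hat * x)) - sum(x * y)

# Fitted mean claim amounts, mu_i = 1 / (alpha + beta * x_i)
mu_hat <- 1 / (alpha_hat + beta_hat * x)

print(c(alpha = alpha_hat, beta = beta_hat))
print(c(score_alpha = score_alpha, score_beta = score_beta))
print(round(mu_hat, 2))
```

Evaluating score_alpha and score_beta is a direct check on any candidate pair of estimates: at a genuine solution both are zero up to numerical tolerance, which also means the fitted means sum to the total of the observed claims.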
The R code to fit a generalised linear model to a multivariate data frame and assign it to the object model, is:
model <- glm(Y ~ ..., family = ...(link = ...))
Then the estimates of the parameters and their approximate standard errors can be obtained by:
summary(model)
An example of a part of the summary output is shown below, which we can see is identical to the multivariate model output:
[Summary output not reproduced.]
5.2 Significance of the parameters
As for the multiple linear regression model we can test whether each of the parameters is significantly different from zero. Generally speaking, it is not useful to include a covariate for which we cannot reject the hypothesis that $\beta = 0$.
Approximate standard errors of the parameters can be obtained using asymptotic maximum likelihood theory.
Recall from Chapter 8 that estimators are in general asymptotically normal and unbiased with variance equal to the Cramér-Rao lower bound:
$\hat\beta \sim N(\beta, CRLB)$ approximately
Hence, when testing $H_0: \beta = 0$ vs $H_1: \beta \neq 0$, we use the result:
$\frac{\hat\beta - 0}{s.e.(\hat\beta)} \sim N(0,1)$ for large $n$
For a two-tailed test the critical values are $\pm 1.96$. So we have a significant value if $|\hat\beta| > 1.96 \times s.e.(\hat\beta)$. We could approximate the 1.96 by 2 for simplicity.
As a rough guide, an indication of the significance of the parameters is given by twice the standard error. Thus, if:
$|\hat\beta| > 2 \times$ standard error($\hat\beta$)
the parameter is significant and should be retained in the model. Otherwise, the parameter is a candidate for being discarded.
It should be noted that in some cases, a parameter may appear to be unnecessary using this criterion, but the model without it does not provide a good enough fit to the data.
As with any model we need to look at the whole situation and not just one aspect in isolation.
In R, the statistic and p-value for the tests of $H_0: \beta = 0$ are given in the output of summary(model).
So for the above printout we can see that the covariate disp is significant, whereas the covariate wt is not. Therefore, we would remove wt from the model and see if it still provides a good enough fit to the data.
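The test statistics that summary(model) reports can be rebuilt from the estimates and standard errors. The sketch below assumes, purely for illustration, a binomial GLM of vs on wt and disp using the mtcars data supplied with R; the printout discussed above may come from a different fit.

```r
# Assumed illustration: binomial GLM on the built-in mtcars data
model <- glm(vs ~ wt + disp, data = mtcars, family = binomial)

# Coefficient table: Estimate, Std. Error, z value, Pr(>|z|)
coef_table <- summary(model)$coefficients
estimate <- coef_table[, "Estimate"]
std_error <- coef_table[, "Std. Error"]

# Test statistic (beta_hat - 0) / s.e.(beta_hat), approximately N(0,1)
z_stat <- estimate / std_error

# Two-tailed p-value and the 1.96 rule for significance at the 5% level
p_value <- 2 * pnorm(-abs(z_stat))
significant <- abs(z_stat) > 1.96

print(cbind(estimate, std_error, z_stat, p_value, significant))
```

The rebuilt z_stat and p_value columns should agree with the "z value" and "Pr(>|z|)" columns of the summary output, which shows that R is applying the asymptotic normal result described in this section.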
Recall from Chapter 10 that a p-value is significant if it is less than 5% (ie 0.05).
5.3 The saturated model
To compare models, we need a measure of the fit of a model to the data.
To do this, we compare our model to a model that is a perfect fit to the data. The model that fits perfectly is called the saturated model.
A saturated model is defined to be a model in which there are as many parameters as observations, so that the fitted values are equal to the observed values.
Key information
In the saturated model we have $\hat\mu_i = y_i$, ie the fitted values are equal to the observed values.
Question
Claim amounts for medical insurance claims for hamsters are believed to have an exponential distribution with mean $\mu_i$:
$f(y_i) = \frac{1}{\mu_i} e^{-y_i/\mu_i} = \exp\left\{-\frac{y_i}{\mu_i} - \log \mu_i\right\}$
We are given the following data for hamsters' medical claims, using the model above:
age $x_i$ (months)      4     8    10    11    17
claim amount $y_i$     50    52   119    41   163
The insurer believes that a model with 5 categories for age is sufficiently accurate:
$\eta_i = \alpha_i$,  $i = 1, 2, 3, 4, 5$
Using the canonical link function, show that the fitted values ($\hat\mu_i$) are the observed claim amounts, $y_i$.
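Solution
The log of the likelihood function is:
$\log L(\mu) = \sum_i \left(-\frac{y_i}{\mu_i} - \log \mu_i\right)$
Setting the canonical link function for the exponential distribution, $g(\mu_i) = \frac{1}{\mu_i}$, equal to the linear predictor, $\eta_i = \alpha_i$, gives:
$\frac{1}{\mu_i} = \alpha_i \Rightarrow \mu_i = \frac{1}{\alpha_i}$
This enables us to write the log-likelihood function in terms of $\alpha_i$:
$\log L(\alpha) = \sum_i \left(-y_i \alpha_i + \log \alpha_i\right)$
We can now differentiate this with respect to $\alpha_i$:
$\frac{\partial}{\partial \alpha_i} \log L(\alpha) = -y_i + \frac{1}{\alpha_i}$
All the terms other than those involving the specific $\alpha_i$ we are looking at disappear.
So the equations satisfied by the MLEs of $\alpha_i$ are:
$-y_i + \frac{1}{\hat\alpha_i} = 0 \Rightarrow \hat\alpha_i = \frac{1}{y_i}$
Hence, the fitted values are:
$\hat\mu_i = \frac{1}{\hat\alpha_i} = y_i$
The fitted values, $\hat\mu_i$, are equal to the observed values, $y_i$.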
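The same conclusion can be seen numerically. This is a minimal sketch, assuming (as an illustration, not as the Core Reading's method) that the five-category model is fitted in base R by treating age as a factor in a gamma GLM with the inverse link; R parameterises this as an intercept plus four contrasts rather than five separate $\alpha_i$, but it is the same saturated model.

```r
# Hamster data from the question above
x <- c(4, 8, 10, 11, 17)
y <- c(50, 52, 119, 41, 163)

# One parameter per observation: age treated as a factor with 5 levels
saturated <- glm(y ~ factor(x), family = Gamma(link = "inverse"))

# Fitted means and the deviance of the saturated model
fitted_values <- unname(fitted(saturated))
saturated_deviance <- deviance(saturated)

print(fitted_values)
print(saturated_deviance)
print(df.residual(saturated))
```

The fitted values reproduce the observed claim amounts, the deviance is zero up to rounding and there are no residual degrees of freedom left.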
However, a model that fits the data perfectly is not necessarily a satisfactory model.
Suppose we are trying to model weight (Y) and height (x) using a linear model, $\beta_0 + \beta_1 x$. This is shown below on the left, whereas a graphical representation of the saturated model is on the right.
[Figure: two scatter plots of Weight (Y) against Height (x). Left: the straight line $\beta_0 + \beta_1 x$ fitted through the data points. Right: the saturated model, which passes through every data point.]
The saturated model is a perfect fit to the data. However, since the fitted value is the observed value, we cannot predict a value without first knowing what it is. The saturated model has no predictive ability for other heights, but it does provide an excellent benchmark against which to compare the fit of other models.
5.4 Scaled deviance (or likelihood ratio)
In order to assess the adequacy of a model for describing a set of data, we can compare the likelihood under this model with the likelihood under the saturated model.
The saturated model uses the same distribution and link function as the current model, but has as many parameters as there are data points. As such it fits the data perfectly. We can then compare our model to the saturated model to see how good a fit it is.
Suppose that $L_S$ and $L_M$ denote the likelihood functions of the saturated and current models, evaluated at their respective optimal parameter values. The likelihood ratio statistic is given by $L_S / L_M$. If the current model describes the data well then the value of $L_M$ should be close to the value of $L_S$. If the model is poor then the value of $L_M$ will be much smaller than the value of $L_S$ and the likelihood ratio statistic will be large.
Alternatively, we could examine the natural log of the likelihood ratio statistic:
$\log \frac{L_S}{L_M} = \ell_S - \ell_M$
where $\ell_S = \log L_S$ and $\ell_M = \log L_M$.
The scaled deviance is defined as twice the difference between the log-likelihood of the model under consideration (known as the current model) and the saturated model.
Scaled deviance
The scaled deviance for a particular model M is defined as:
$SD_M = 2(\ell_S - \ell_M)$
The deviance for the current model, $D_M$, is defined such that:
scaled deviance $= \frac{D_M}{\phi}$
Remember that $\phi$ is a scale parameter, so it seems sensible that it should be used to connect the deviance with the scaled deviance. For the Poisson and exponential distributions, $\phi = 1$, so the scaled deviance and the deviance are identical.
The smaller the deviance, the better the model from the point of view of model fit.
However, there will be a trade-off here. A model with many parameters will fit the data well. However a model with too many parameters will be difficult and complex to build, and will not necessarily lead to better prediction in the future. It is possible for models to be over-parameterised, ie factors are included that lead to a slightly, but not significantly, better fit. When choosing linear models, we will usually need to strike a balance between a model with too few parameters (which will not take account of factors that have a substantial impact on the data, and will therefore not be sensitive enough) and one with too many parameters (which will be too sensitive to factors that really do not have much effect on the results). We use the principle of parsimony here, ie we choose the simplest model that does the job.
This can be illustrated by considering the case when the data are normally distributed. In this case, the log-likelihood for a sample of size n is:
$\ell(y; \theta, \phi) = \sum_{i=1}^{n} \log f_Y(y_i; \theta_i, \phi) = -\frac{n}{2} \log(2\pi\sigma^2) - \sum_{i=1}^{n} \frac{(y_i - \mu_i)^2}{2\sigma^2}$
The likelihood function for a random sample of size n is $f(y_1) f(y_2) \ldots f(y_n)$. When we take logs, we add the logs of the individual PDFs. Recall that for the normal distribution the natural parameter is the mean, ie $\theta_i = \mu_i$.
For the saturated model, the parameter $\mu_i$ is estimated by $y_i$, and so the second term disappears. Thus, the scaled deviance (twice the difference between the values of the log-likelihood under the current and saturated models) is:
$\sum_{i=1}^{n} \frac{(y_i - \hat\mu_i)^2}{\sigma^2}$
where $\hat\mu_i$ is the fitted value for the current model.
The deviance (remembering that the scale parameter $\phi = \sigma^2$) is the well-known residual sum of squares:
$\sum_{i=1}^{n} (y_i - \hat\mu_i)^2$
This is why the deviance is defined with a factor of two in it, so that for the normal model the deviance is equal to the residual sum of squares that we met in linear regression.
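The link between the deviance of a normal GLM and the residual sum of squares can be checked directly. The sketch below assumes an illustrative model of mpg on wt from the mtcars data supplied with R; this particular model is not taken from the Core Reading.

```r
# Assumed illustration: normal (gaussian) GLM and the equivalent linear model
normal_glm <- glm(mpg ~ wt, data = mtcars, family = gaussian)
linear_fit <- lm(mpg ~ wt, data = mtcars)

# Residual sum of squares from first principles: sum of (y_i - mu_hat_i)^2
rss_by_hand <- sum((mtcars$mpg - fitted(normal_glm))^2)

# Residual sum of squares from the linear regression fit
rss_lm <- sum(resid(linear_fit)^2)

print(c(deviance = deviance(normal_glm), rss_by_hand = rss_by_hand, rss_lm = rss_lm))
```

All three figures coincide, which is the sense in which the deviance generalises the residual sum of squares.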
The residual deviance (ie the deviance after all the covariates have been included) is displayed as part of the results from summary(model). For example:
[Summary output not reproduced.]
In R we can obtain a breakdown of how the deviance is reduced by each covariate added sequentially by using the command anova(model). However, unlike for linear regression, this command does not automatically carry out a test.
Also recall that the smaller the residual (left over) deviance, the better the fit of the model.
The residual deviance outputted by the glm() function is a measure of fit, similar to the scaled deviance and deviance defined earlier. However, this output won't necessarily match the scaled deviance or deviance calculated from first principles using the formulae in this section.
5.5 Using scaled deviance and Akaike's Information Criterion to choose between models
Adding more covariates will always improve the fit and thus decrease the deviance, however we need to determine whether adding a particular covariate leads to a significant decrease in the deviance.
For normally distributed data, the scaled deviance has a $\chi^2$ distribution. Since the scale parameter for the normal, $\phi = \sigma^2$, must be estimated, we compare models by taking ratios of sum-of-squares and using F tests (as in the analysis of variance for linear regression models).
We covered this in Section 4.3 from the previous chapter.
Thus, if we want to decide if Model 2 (which has $p + q$ parameters and scaled deviance $S_2$) is a significant improvement over Model 1 (which has $p$ parameters and scaled deviance $S_1$), we see if:
$\frac{(S_1 - S_2)/q}{S_2/(n - p - q)}$
is greater than the 5% value for the $F_{q,\, n-p-q}$ distribution.
The code for comparing two normally distributed models, model1 and model2, in R is:
anova(model1, model2, test="F")
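The F statistic can also be built by hand and compared with the anova() output. This is a sketch under stated assumptions: the two nested normal models below (mpg on wt, and mpg on wt, hp and qsec, from the mtcars data supplied with R) are illustrative choices, not models from the Core Reading. Because $\sigma^2$ cancels in the ratio, the deviances (residual sums of squares) can be used in place of the scaled deviances.

```r
# Assumed illustration: two nested normal models on the mtcars data
model1 <- glm(mpg ~ wt, data = mtcars, family = gaussian)              # p parameters
model2 <- glm(mpg ~ wt + hp + qsec, data = mtcars, family = gaussian)  # p + q parameters

n <- nrow(mtcars)
p <- length(coef(model1))
q <- length(coef(model2)) - p

# sigma^2 cancels in the ratio, so unscaled deviances can be used
d1 <- deviance(model1)
d2 <- deviance(model2)
f_stat <- ((d1 - d2) / q) / (d2 / (n - p - q))

# 5% critical value and p-value from the F distribution with (q, n - p - q) df
critical <- qf(0.95, q, n - p - q)
p_value <- pf(f_stat, q, n - p - q, lower.tail = FALSE)

# The same test carried out by R
comparison <- anova(model1, model2, test = "F")

print(c(f_stat = f_stat, critical = critical, p_value = p_value))
print(comparison)
```

Model 2 is judged a significant improvement when f_stat exceeds critical, or equivalently when p_value is below 0.05.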
In the case of data that are not normally distributed, the scale parameter may be known (for example, for the Poisson distribution $\phi = 1$), and the deviance is only asymptotically a $\chi^2$ distribution. For these reasons, the common procedure is to compare two models by looking at the difference in the scaled deviance and comparing with a $\chi^2$ distribution.
Since the distributions are only asymptotically normal, the F test will not be very accurate. We get a better result by comparing two approximate $\chi^2$ distributions.
To be more precise, it's the absolute difference between the scaled deviances that is compared with $\chi^2$.
Thus, if we want to decide if Model 2 (which has $p + q$ parameters and scaled deviance $S_2$) is a significant improvement over Model 1 (which has $p$ parameters and scaled deviance $S_1$), we see if $S_1 - S_2$ is greater than the 5% value for the $\chi^2_q$ distribution.
Recall that we subtract one degree of freedom for each extra parameter introduced. So it's the difference between $p$ and $p + q$ that matters.
Since $\chi^2_p + \chi^2_q \sim \chi^2_{p+q}$ (provided the random variables are independent), it makes sense to say that the difference in the scaled deviances has a $\chi^2_q$ distribution.
What we are trying to do here is to decide whether the added complexity results in significant additional accuracy. If not, then it would be preferable to use the model with fewer parameters.
Alternatively, we could express this test in terms of the log-likelihood functions. If we let $\ell_p$ and $\ell_{p+q}$ denote the log-likelihoods of the models with $p$ and $p + q$ parameters respectively, then the test statistic can be written as:
$S_1 - S_2 = 2(\ell_S - \ell_p) - 2(\ell_S - \ell_{p+q}) = -2(\ell_p - \ell_{p+q})$
This is the format given on page 23 of the Tables and will be used in Subject CS2 to compare Cox regression models.
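Both forms of the test statistic can be computed in a few lines. The sketch below assumes two nested binomial models on the mtcars data supplied with R (vs on disp, and vs on disp and wt); the object names are illustrative. For the binomial distribution $\phi = 1$, so the deviance and the scaled deviance coincide.

```r
# Assumed illustration: two nested binomial models on the mtcars data
model1 <- glm(vs ~ disp, data = mtcars, family = binomial)       # p parameters
model2 <- glm(vs ~ disp + wt, data = mtcars, family = binomial)  # p + q parameters

q <- length(coef(model2)) - length(coef(model1))

# Difference in (scaled) deviances, S1 - S2
diff_deviance <- deviance(model1) - deviance(model2)

# The same statistic from the log-likelihoods: -2 * (l_p - l_{p+q})
diff_loglik <- -2 * (as.numeric(logLik(model1)) - as.numeric(logLik(model2)))

# Compare with the chi-square distribution on q degrees of freedom
critical <- qchisq(0.95, df = q)
p_value <- pchisq(diff_deviance, df = q, lower.tail = FALSE)

print(c(diff_deviance = diff_deviance, diff_loglik = diff_loglik,
        critical = critical, p_value = p_value))
```

The two versions of the statistic agree, and the extra covariate is only worth keeping if the statistic exceeds critical.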
Question
Explain why the test statistic will always be positive.
Solution
As we have mentioned before, adding more parameters will improve the fit of the model to the data. Therefore we would expect the value of the likelihood function to be larger for models with more parameters. Hence, $\ell_{p+q} > \ell_p$ and so the statistic will be positive.
The code for comparing these two (non-normally distributed) models, model1 and model2, in R is:
anova(model1, model2, test="Chi")
A very important point is that this method of comparison can only be used for nested models. In other words, Model 1 must be a submodel of Model 2. Thus, we can compare two models for which the distribution of the data and the link function are the same, but the linear predictor has one extra parameter in Model 2. For example $\beta_0 + \beta_1 x$ and $\beta_0 + \beta_1 x + \beta_2 x^2$. But we could not compare in this way if the distribution of the data or the link function are different, or, for example, when the linear predictors are $\beta_0 + \beta_1 x + \beta_2 x^2$ and $\beta_0 + \beta_3 \log x$. It should be clear that we can gauge the importance of factors by examining the scaled deviances, but we cannot use the testing procedure outlined above.
In the first case, the difference between the models is $\beta_2 x^2$, and so a significant difference between the models tells us that the quadratic term should be included. In the second case, the difference between the models is $\beta_3 \log x - \beta_1 x - \beta_2 x^2$, and so a significant difference doesn't tell us which parameter is significant.
An alternative method of comparing models is to use Akaike's Information Criterion (AIC).
Since the deviance will always decrease as more covariates are added to the model, there will always be a tendency to add more covariates. However this will increase the complexity of the model which is generally considered to be undesirable. To take account of the undesirability of increased complexity, computer packages will often quote the AIC, which is a penalised log-likelihood:
AIC $= -2 \times \log L_M + 2 \times$ number of parameters
where $\log L_M$ is the log-likelihood of the model under consideration.
When comparing two models, the smaller the AIC, the better the fit. So if the change in deviance is more than twice the change in the number of parameters then it would give a smaller AIC.
This is approximately equivalent to checking whether the difference in deviance is greater than the 5% value of the $\chi^2$ distribution for degrees of freedom between 5 and 15. However, it has the added advantage of being a simple way to compare GLMs without formal testing. This is similar to comparing the adjusted $R^2$ for multiple linear regression models in the previous chapter and hence is displayed as part of the output of a computer fitted GLM.
In R the AIC is displayed as part of the results from summary(model).
An example of this is given in the R box at the end of Section 5.4.
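The AIC formula can be reproduced from the log-likelihood. This is a minimal sketch, assuming an illustrative binomial model of vs on disp from the mtcars data supplied with R; for the binomial family there is no estimated scale parameter, so the number of parameters is just the number of coefficients.

```r
# Assumed illustration: binomial GLM on the mtcars data
model <- glm(vs ~ disp, data = mtcars, family = binomial)

# Maximised log-likelihood and number of estimated parameters
log_lik <- as.numeric(logLik(model))
n_params <- length(coef(model))

# AIC = -2 * log L_M + 2 * number of parameters
aic_by_hand <- -2 * log_lik + 2 * n_params

print(c(by_hand = aic_by_hand, from_R = AIC(model)))
```

The hand calculation agrees with the value R reports through AIC(model) and in the summary output.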
5.6 The process of selecting explanatory variables
As for multiple linear regression the process of selecting the optimal set of covariates for a GLM is not always easy. Again, we could use one of the two following approaches:
(1) Forward selection. Add the covariate that reduces the AIC the most or causes a significant decrease in the deviance. Continue in this way until adding any more causes the AIC to rise or does not lead to a significant improvement in the deviance. Note we should start with main effects before interaction terms and linear terms before polynomial.
Suppose we are modelling the number of claims on a motor insurance portfolio and we have data on the driver's age, sex and vehicle group. We would start with the null model (ie a single constant equal to the sample mean). Then we would try each of the single covariate models (linear function of age or the factors sex or vehicle group) to see which produces the most significant improvement in a $\chi^2$ test or reduces the AIC the most. Suppose this was sex. Then we would try adding a second covariate (linear function of age or the factor vehicle group). Suppose this was age. Then we would try adding the third covariate (vehicle group). We might then try a quadratic function of the variable age (and maybe higher powers) or each of the 2 term interactions (eg sex*age or sex*group or age*group). Finally we would try the 3 term interaction (ie sex*age*group).
(2) Backward selection. Start by adding all available covariates and interactions. Then remove covariates one by one starting with the least significant until the AIC reaches a minimum or there is no significant improvement in the deviance, and all the remaining covariates have a statistically significant impact on the response.
So with the last example we would start with the 3 term interaction sex*age*group and look at which parameter has the largest p-value (in a test of it being zero) and remove that. We should see a significant improvement in a $\chi^2$ test and the AIC should fall. Then we remove the next parameter with the largest p-value and so on.
The Core Reading uses R to demonstrate this procedure. Whilst this will be covered in the CS1 PBOR, it's important to understand the process here.
Example
We demonstrate both of these methods in R using a binomial model on the mtcars dataset (supplied with R in the datasets package) to determine whether a car has a V engine or an S engine (vs) using weight in 1000 lbs (wt) and engine displacement in cubic inches (disp) as covariates.
Forward selection
Starting with the null model:
model0 <- glm(vs ~ 1, data=mtcars, family=binomial)
The AIC of this model (which would be displayed using summary(model0)) is 45.86.
We have to choose whether we add disp or wt first. We try each and see which has the greatest improvement in the deviance.
model1 <- update(model0, ~.+ disp)
anova(model0, model1, test="Chi")
[Output not reproduced.]
model2 <- update(model0, ~.+ wt)
anova(model0, model2, test="Chi")
[Output not reproduced.]
So we can see that disp has produced the more significant result so we add that covariate first.
R always calls the models we are comparing Model 1 and Model 2, irrespective of how we have named them. This can lead to confusion if we are not careful.
The AIC of model 1 (adding disp) is 26.7 whereas the AIC of model 2 (adding wt) is 35.37. Therefore adding disp reduces the AIC more from model 0's value of 45.86.
Let us now see if adding wt to disp produces a significant improvement:
model3 <- update(model1, ~.+ wt)
anova(model1, model3, test="Chi")
[Output not reproduced.]
This has not led to a significant improvement in the deviance so we would not add wt (and therefore we definitely would not add an interaction term between disp and wt).
The AIC of model 3 (adding wt) is 27.4 which is worse than model1's AIC of 26.7. Therefore we would not add it.
Incidentally the AIC for models 0, 1, 2, 3 are 45.86, 26.7, 35.37 and 27.4. So using these would have given the same results (as Model 1 produces a smaller AIC than Model 2, and then Model 3 increases the AIC and so we would not have selected it).
Backward selection
Starting with all the possibilities:
modelA <- glm(vs ~ wt * disp, data=mtcars, family=binomial)
The output is:
[Summary output not reproduced.]
None of these covariates are significant.
The parameter of the interaction term has the highest p-value (0.829), and so is most likely to be zero.
We first remove the interaction term wt:disp, as this is the least significant parameter:
modelB <- update(modelA, ~.-wt:disp)
The AIC has fallen from 29.361 to 27.4.
Alternatively, carrying out a $\chi^2$ test using anova(modelA, modelB, test="Chi") would show that there is no significant difference between the models (p-value of 0.8417) and therefore we are correct to remove the interaction term between wt and disp.
The wt term is not significant so removing that:
modelC <- update(modelB, ~.-wt)
Both of these coefficients are significant and the AIC has fallen from 27.4 to 26.696.
Alternatively, carrying out a $\chi^2$ test using anova(modelB, modelC, test="Chi") would show that there is no significant difference between the models (p-value of 0.255) and therefore we are correct to remove the wt covariate.
We would stop at this model. If we remove the disp term (to give the null model), the AIC increases to 45.86.
Alternatively, carrying out a $\chi^2$ test between these two models would show a very significant difference (p-value of less than 0.001) and therefore we should not remove the disp covariate.
We can see that both forward and backward selection lead to the same model being chosen in this case.
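The AIC comparison across the candidate models can be collected in one place. The sketch below refits the models from the example on the mtcars data supplied with R and tabulates their AICs; the list name candidates and the labels are illustrative, and the tabulation itself is a convenience rather than part of the Core Reading.

```r
# Refit the candidate models from the example
model0 <- glm(vs ~ 1, data = mtcars, family = binomial)
model1 <- update(model0, ~ . + disp)
model2 <- update(model0, ~ . + wt)
model3 <- update(model1, ~ . + wt)
modelA <- glm(vs ~ wt * disp, data = mtcars, family = binomial)

# Collect the models under descriptive (hypothetical) labels
candidates <- list(
  "vs ~ 1" = model0,
  "vs ~ disp" = model1,
  "vs ~ wt" = model2,
  "vs ~ disp + wt" = model3,
  "vs ~ wt * disp" = modelA
)

# AIC of each candidate, and the label of the model with the smallest AIC
aic_table <- sapply(candidates, AIC)
best <- names(which.min(aic_table))

print(round(aic_table, 3))
print(best)
```

Picking the smallest entry of aic_table gives the same model as the forward and backward searches in the example.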
5.7 Estimating the response variable
Once we have obtained our model and its estimates, we are then able to calculate the value of the linear predictor, $\eta$, and by using the inverse of the link function we can calculate our estimate of the response variable $\hat\mu = g^{-1}(\hat\eta)$.
Substituting the estimated parameters into the linear predictor gives the estimated value of the linear predictor for different individuals. The link function links the linear predictor to the mean of the distribution. Hence we can obtain an estimate for the mean of the distribution of Y for that individual.
Let's now return to the Core Reading example in Section 5.6 (page 45).
Suppose we wish to estimate the probability of having a V engine for a car with weight 2,100 lbs and displacement 180 cubic inches.
Using our linear predictor $\beta_0 + \beta_1\,$disp (ie vs ~ disp), we obtained estimates of $\hat\beta_0 = 4.137827$ and $\hat\beta_1 = -0.021600$.
These coefficients are displayed as part of the summary output of Model C in the example above.
Hence, for displacement 180 we have $\hat\eta = 4.137827 - 0.021600 \times 180 = 0.24983$. We did not specify the link function so we shall use the canonical binomial link function which is the logit function:
$0.24983 = \log\left(\frac{\hat\mu}{1 - \hat\mu}\right) \Rightarrow \hat\mu = \frac{e^{0.24983}}{1 + e^{0.24983}} = 0.562$
Recall that the mean for a binomial model is the probability. So the probability of having a V engine for a car with weight 2,100 lbs and displacement 180 cubic inches is 56.2%.
The figure 2,100 does not enter the calculation because we removed the weight covariate.
In R we can obtain this as follows:
newdata <- data.frame(disp=180)
predict(modelC, newdata, type="response")
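The hand calculation and the predict() call can be set side by side. This sketch refits the disp-only binomial model on the mtcars data supplied with R under the name modelC, computes the linear predictor and inverse logit from the fitted coefficients, and compares the result with predict(); the intermediate names (eta_hat, mu_by_hand, mu_predict) are illustrative.

```r
# Refit the final model from the example (vs ~ disp with the default logit link)
modelC <- glm(vs ~ disp, data = mtcars, family = binomial)
beta <- coef(modelC)

# Linear predictor for displacement 180: beta_0 + beta_1 * 180
eta_hat <- unname(beta["(Intercept)"] + beta["disp"] * 180)

# Inverse of the logit link: e^eta / (1 + e^eta)
mu_by_hand <- exp(eta_hat) / (1 + exp(eta_hat))

# The same estimate from predict() on the response scale
newdata <- data.frame(disp = 180)
mu_predict <- unname(predict(modelC, newdata, type = "response"))

print(c(eta_hat = eta_hat, by_hand = mu_by_hand, predict = mu_predict))
```

Using type = "response" asks predict() to apply the inverse link; leaving it out would return the linear predictor eta_hat instead.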
6 Residuals analysis and assessment of model fit
Once a possible model has been found it should be checked by looking at the residuals.
The residuals are based on the differences between the observed responses, $y$, and the fitted responses, $\hat\mu$.
The fitted responses are obtained by applying the inverse of the link function to the linear predictor with the fitted values of the parameters.
We looked at how we could obtain predicted response values in the previous section. The fitted values are the predicted Y values for the observed data set, x.
The R code for obtaining the fitted values of a GLM is:
fitted(model)
For example, in the actuarial pass rates model detailed on page 6, we could calculate from the model what the pass rate ought to be for students who have attended tutorials, submitted three assignments and scored 60% on the mock exam. The difference between this theoretical pass rate and the actual pass rate observed for students who match the criteria exactly will give us the residuals.
Question
Draw up a table showing the differences between the actual and expected values of the truancy rates in the example on page 9.
Solution
Recall that the expected number of unexplained absences in a year were modelled by:
$\log \mu_{ij} = \alpha_i + \beta_j + \gamma x$
where $x$ = age, and the parameter values are as follows:
$\alpha_{WC} = -2.64$,  $\alpha_{OC} = -1.14$,  $\beta_M = -3.26$,  $\beta_F = -3.54$  and  $\gamma = 0.64$
where WC = Within catchment, OC = Outside catchment, M = Male, F = Female.
This gives expected values of:
                                   Age last birthday
                                   8       10      12      14
Within catchment area    Male      0.46    1.65    5.93    21.33
                         Female    0.35    1.25    4.48    16.12
Outside catchment area   Male      2.05    7.39    26.58   95.58
                         Female    1.55    5.58    20.09   72.24
So the differences between the actual values (given on page 9) and expected values are:
                                   Age last birthday
                                   8       10      12      14
Within catchment area    Male      1.34    0.35    0.37    7.23
                         Female    0.15    0.35    0.52    0.08
Outside catchment area   Male      0.05    0.11    1.08    23.58
                         Female    1.25    0.62    0.49    4.04
The procedure here is a natural extension of the way we calculated residuals for linear regression models covered in the previous chapter. However, because of the different distributions used, we need to transform these raw residuals so we are able to interpret them meaningfully.
There are two kinds of residuals: Pearson and deviance.
6.1 Pearson residuals
The Pearson residuals are defined as:
$\frac{y - \hat\mu}{\sqrt{var(\hat\mu)}}$
The $var(\hat\mu)$ in the denominator refers to the variance of the response distribution, $var(Y)$, using the fitted values, $\hat\mu$, in the formula. For example, since the variance of the exponential distribution is $\mu^2$, we have $var(\hat\mu) = \hat\mu^2$ in that case.
The Pearson residual, which is often used for normally distributed data, has the disadvantage that its distribution is often skewed for non-normal data. This makes the interpretation of residual plots difficult.
The R code for obtaining the Pearson residuals is:
residuals(model, type="pearson")
The Pearson residuals returned by R are calculated slightly differently from the definition given in this section. Therefore, this output won't necessarily match the Pearson residuals calculated from first principles using $\frac{y - \hat\mu}{\sqrt{var(\hat\mu)}}$.
If the data come from a normal distribution, then the Pearson residuals will follow the standard normal distribution. By comparing these residuals to a standard normal (eg by using a Q-Q plot), we can determine whether the model is a good fit.
However, for non-normal data the Pearson residuals will not follow the standard normal distribution and won't even be symmetrical. This makes it difficult to determine whether the model is a good fit. Hence we will need to use a different type of residual.
6.2 Deviance residuals
Deviance residuals are defined as the product of the sign of $y - \hat\mu$ and the square root of the contribution of $y$ to the scaled deviance. Thus, the deviance residual is:
$sign(y - \hat\mu)\, d_i$
where the scaled deviance is $\sum d_i^2$.
Recall that:
$sign(x) = +1$ if $x > 0$, and $sign(x) = -1$ if $x < 0$
Deviance residuals are usually more likely to be symmetrically distributed and to have approximately normal distributions, and are preferred for actuarial applications.
The R code for obtaining the deviance residuals is:
residuals(model)
The deviance residuals returned by R are calculated slightly differently from the definition given in this section. Therefore, this output won't necessarily match the deviance residuals calculated from first principles using the formulae in this section.
We can see that deviance residuals are more likely to be symmetrically distributed by considering the following result: If $\{X_i\}$ is a set of independent standard normal random variables, then $Y = \sum X_i^2$ will have a $\chi^2$ distribution. Therefore, since $\sum d_i^2$ (ie the scaled deviance) is approximately $\chi^2$, it follows that $d_i$ (and also the deviance residual) is likely to be approximately normal.
Note that for normally distributed data, the Pearson and deviance residuals are identical.
Question
Show that, for normally distributed data, the Pearson and deviance residuals are identical.
Solution
If $Y_i \sim N(\mu_i, \sigma^2)$, then from Section 6.1, the Pearson residuals are:
$\frac{y_i - \hat\mu_i}{\sqrt{var(\hat\mu_i)}} = \frac{y_i - \hat\mu_i}{\sigma}$
In Section 5.4, we saw that the scaled deviance was:
$\sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n} \frac{(y_i - \hat\mu_i)^2}{\sigma^2}$
So the deviance residuals are given by:
$sign(y_i - \hat\mu_i)\, d_i = sign(y_i - \hat\mu_i)\, \frac{|y_i - \hat\mu_i|}{\sigma} = \frac{y_i - \hat\mu_i}{\sigma}$
Hence the Pearson residuals and the deviance residuals are the same.
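The equality of the two types of residual for normal data can be confirmed in R. The sketch below assumes an illustrative normal GLM of mpg on wt from the mtcars data supplied with R. One point to be aware of: for the gaussian family R does not divide either type of residual by $\sigma$, so both come back as the raw differences $y_i - \hat\mu_i$, which is one reason the R output need not match the first-principles definitions.

```r
# Assumed illustration: normal (gaussian) GLM on the mtcars data
normal_glm <- glm(mpg ~ wt, data = mtcars, family = gaussian)

# Raw differences between observed and fitted values
raw <- mtcars$mpg - fitted(normal_glm)

# The two types of residual as returned by R
pearson_res <- residuals(normal_glm, type = "pearson")
deviance_res <- residuals(normal_glm, type = "deviance")

# Largest absolute difference between the two sets of residuals
max_gap <- max(abs(pearson_res - deviance_res))

print(head(cbind(raw, pearson_res, deviance_res)))
print(max_gap)
```

The squared deviance residuals also add up to deviance(normal_glm), consistent with the residual sum of squares result in Section 5.4.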
6.3 Using residual plots to check the fit
The assumptions of a GLM require that the residuals should show no patterns. The presence of a pattern implies that something has been missed in the relationship between the predictors and the response. If this is the case, other model specifications should be tried.
So, in addition to the residuals being symmetrical, we would expect no connection between the residuals and the explanatory covariates. Rather than plotting the residuals against each of the covariates, we could just see if there is a pattern when plotted against the fitted values.
For our model above (on the mtcars dataset), a plot of the residuals against the fitted values is as follows:
[Figure: residuals plotted against fitted values]
There does appear to be some pattern here and the three named points on the graph might be outliers.
The line shows the trend. Ideally this should be horizontal, which indicates no pattern. Also the residuals should be small. If R names them, then they are considered to be too large.
A histogram of the residuals, or another similar diagnostic plot, should also be examined in order to assess whether the distributional assumptions are justified.
Whilst a Q-Q plot is produced as an output of the GLMs process, there is some controversy over whether this is appropriate for non-normal distributions such as the binomial distribution in the Core Reading example above. Hence it has not been included in the Core Reading.
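The diagnostic plots described in this section could be produced along the following lines. This is a sketch only: the binomial model `vs ~ mpg` on mtcars is an assumed stand-in, since the exact model used for the plot above is not restated here.

```r
# Assumed stand-in model (not necessarily the Core Reading model)
fit <- glm(vs ~ mpg, family = binomial, data = mtcars)

res <- residuals(fit)   # deviance residuals are the default for a glm
fv  <- fitted(fit)      # fitted means on the response scale

# Residuals against fitted values, with a smoothed trend line
plot(fv, res, xlab = "Fitted values", ylab = "Deviance residuals")
abline(h = 0, lty = 2)                  # ideal trend: horizontal at zero
lines(lowess(fv, res), col = "red")     # observed trend

# Histogram of the residuals to check the shape of their distribution
hist(res, main = "Histogram of deviance residuals", xlab = "Residual")
```

A trend line that departs noticeably from the horizontal suggests that the linear predictor is missing some structure.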
6.4 Acceptability of a fitted model
In addition to comparing models, statistical tests can be used to determine the acceptability of a particular model, once fitted. Pearson's chi-square test and the likelihood ratio test are typically used. These are described in Chapter 10, Sections 8 and 2 respectively.
The tests for overall fit involve comparing the scaled deviance of the fitted model with the scaled deviance of the null model (with no covariates). The extent by which the fitted model reduces the scaled deviance (per additional parameter estimated) is a measure of how much the fitted model is an improvement on the null model.
Considerable flexibility in the interpretation of the tests based on statistical inference theory is sometimes necessary in order to arrive at a suitable model. Thus, the interpretation of deviances, residuals and significance of parameters given above should be viewed as useful guides in selecting a model, rather than strict rules which must be adhered to.
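As a rough illustration of the overall-fit comparison described above, the sketch below compares a fitted model with the null model using the reduction in deviance. The model `vs ~ mpg` on mtcars is an assumption chosen for illustration. For a binomial model the scale parameter is 1, so the deviance and the scaled deviance coincide.

```r
# Assumed illustrative model
fit <- glm(vs ~ mpg, family = binomial, data = mtcars)

# Reduction in (scaled) deviance relative to the null model
drop_dev  <- fit$null.deviance - deviance(fit)
extra_par <- fit$df.null - fit$df.residual   # additional parameters estimated

# Approximate chi-square test of the improvement over the null model
p_value <- pchisq(drop_dev, df = extra_par, lower.tail = FALSE)
print(c(drop = drop_dev, df = extra_par, p = p_value))
```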
Chapter 13 Summary
Exponential family
There is a wide variety of distributions (normal, Poisson, binomial, gamma and exponential) that have a common form, called the exponential family.
If the distribution of $Y$ is a member of the exponential family, then the density function of $Y$ can be written in the form:
$$f_Y(y;\theta,\varphi)=\exp\left[\frac{y\theta-b(\theta)}{a(\varphi)}+c(y,\varphi)\right]$$
where $\theta$ is the natural parameter, which is a function of the mean $\mu=E(Y)$ only of the distribution, and $\varphi$ is a scale parameter. Where the distribution has two parameters (such as the normal, gamma and binomial distributions), we can take $\varphi$ to be the parameter other than the mean. Where the distribution has one parameter (such as the Poisson and exponential distributions), we can take $\varphi=1$. However, the parameterisations are not unique.
Mean, variance and variance function
$$E(Y)=b'(\theta) \qquad \operatorname{var}(Y)=a(\varphi)\,b''(\theta)$$
The variance function is a function of the mean $\mu=E(Y)$ and gives a measure of how $\operatorname{var}(Y)$ relates to $\mu$:
$$V(\mu)=b''(\theta)$$
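As a brief worked illustration of these formulae (using the Poisson parameterisation derived in the solution to Question 13.4, with $\theta=\log\mu$, $b(\theta)=e^\theta$ and $a(\varphi)=1$):

```latex
\begin{align*}
E(Y) &= b'(\theta) = e^{\theta} = \mu \\
\operatorname{var}(Y) &= a(\varphi)\,b''(\theta) = 1 \times e^{\theta} = \mu \\
V(\mu) &= b''(\theta) = \mu
\end{align*}
```

This recovers the familiar result that the mean and variance of a Poisson distribution are equal.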
Generalised linear models (GLMs)
A GLM takes multiple regression one step further by allowing the data to be non-normally distributed. Instead, we can use any of the distributions in the exponential family.
A GLM consists of three components:
1) a distribution for the data (Poisson, exponential, gamma, normal or binomial)
2) a linear predictor (a function of the covariates that is linear in the parameters)
3) a link function (that links the mean of the response variable to the linear predictor).
Maximum likelihood estimation can be used to estimate the values of the parameters in the linear predictor.
Link functions
For each underlying distribution there is one link function that appears more natural to use than any other, usually because it will result in values for $\mu$ that are appropriate to the distribution under consideration. This is called the canonical link function, which means the accepted link function.
The canonical link functions are given on page 27 of the Tables. They are equivalent to the natural parameter $\theta$ from the exponential family formulation of the PDF.
Covariates
A variable (eg age) is a type of covariate whose real numerical value enters the linear predictor directly, and a factor (eg sex) is a type of covariate that takes categorical values.
Linear predictors
Linear predictors are functions of the covariates. They are linear in the parameters and not necessarily in the covariates.
The simplest linear predictor is that for the constant model: $\eta=\alpha$, which is used if it is thought that the mean of the response variable is the same for all cases.
An interaction term is used in the predictor when two covariates are believed not to be independent. In other words, the effect of one covariate (eg the age of an individual) is thought to depend on the value of another covariate (eg whether the sex of an individual is male or female).
The dot (.) notation is used to indicate an interaction, eg age.sex is the interactive term between age and sex.
The star (*) notation is used to indicate the main effects as well as the interaction, eg:
age*sex = age + sex + age.sex
An interaction (dot) term never appears on its own.
Saturated model
The model that provides the perfect fit to the data is called the saturated model. The saturated model has as many parameters as data points. The fitted values $\hat\mu_i$ are equal to the observed values $y_i$. The saturated model is not useful from a predictive point of view. It is, however, a good benchmark against which to compare the fit of other models via the scaled deviance.
Scaled deviance
The scaled deviance (or likelihood ratio) is used to compare the fit of the saturated model with the fit of another model. The scaled deviance of Model 1 is defined as:
$$SD_1=2(\ln L_S-\ln L_1)$$
where $L_S$ is the likelihood of the saturated model.
The poorer the fit of Model 1, the bigger the scaled deviance will be.
Comparing models
Where the data are normally distributed, it can be shown that, for two nested models, Models 1 and 2, where Model 1 has $p$ parameters and Model 2 has $p+q$ parameters:
$$SD_1-SD_2\sim\chi^2_q$$
For other distributions, the difference in the scaled deviances has an approximate (asymptotic) chi-square distribution with $q$ degrees of freedom.
Alternatively, we can compare the reduction in the AIC of the two models.
The process of selecting explanatory variables
(1) Forward selection. Add the covariate that reduces the AIC the most or causes a significant decrease in the deviance. Continue in this way until adding any more causes the AIC to rise or does not lead to a significant improvement in the deviance. It is usual to consider main effects before interaction terms and linear terms before polynomials.
(2) Backward selection. Start by adding all available covariates and interactions. Then remove covariates one by one starting with the least significant until the AIC reaches a minimum or there is no significant improvement in the deviance, and all the remaining covariates have a statistically significant impact on the response.
Rules for determining the number of parameters in a model
The constant model has 1 parameter.
A model consisting of one main effect that is a variable (eg age) has two parameters (eg $\beta_0$ and $\beta_1$).
A model consisting of one main effect that is a factor (eg sex) has as many parameters as there are categories (eg $\alpha_i$, $i=1$ (male) and $i=2$ (female)).
When a new main effect is added to a model (eg age + sex), we add $n-1$ parameters, where $n$ is the number of parameters if the main effect were on its own (eg for age + sex, the number of parameters is $2+(2-1)=3$).
When an interactive effect (a dot term) is added to a model (eg age + sex + age.sex), we add $(m-1)(n-1)$ parameters for the interactive effect (eg for age + sex + age.sex, the number of parameters is $2+(2-1)+(2-1)(2-1)=4$).
A model consisting of a star term only (eg age*sex) has $mn$ parameters, where $m$ and $n$ are the number of parameters if the main effects were on their own (eg for age*sex, the number of parameters is $4=2\times2$).
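These counting rules can be checked by counting the columns of the design matrix that R builds for each formula. The sketch below uses a small made-up data frame (the values are hypothetical and only the structure matters). Note that R writes the dot interaction as `age:sex`.

```r
# Hypothetical data: one numerical variable and one two-level factor
d <- data.frame(
  age = c(25, 40, 33, 58),
  sex = factor(c("male", "female", "female", "male"))
)

# Number of parameters = number of columns in the design matrix
n_par <- function(formula) ncol(model.matrix(formula, data = d))

counts <- c(
  constant    = n_par(~ 1),
  age         = n_par(~ age),
  sex         = n_par(~ sex),
  age_sex     = n_par(~ age + sex),
  with_dot    = n_par(~ age + sex + age:sex),
  star        = n_par(~ age * sex)
)
print(counts)
```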
Residuals
A residual is a measure of the difference between the observed values $y_i$ and the fitted values $\hat\mu_i$. Two commonly used residuals for GLMs are the Pearson residual and the deviance residual.
Pearson residuals
These are:
$$\frac{y-\hat\mu}{\sqrt{\operatorname{var}(\hat\mu)}}$$
where $\operatorname{var}(\hat\mu)$ is $\operatorname{var}(Y)$ with $\mu$ replaced by the corresponding fitted value $\hat\mu$.
The Pearson residual, which is often used for normally distributed data, has the disadvantage that its distribution is often skewed for non-normal data. This makes the interpretation of residuals plots difficult.
Deviance residuals
These are $\operatorname{sign}(y_i-\hat\mu_i)\,d_i$, where $\sum d_i^2$ is the scaled deviance of the model.
Deviance residuals are usually more likely to be symmetrically distributed and to have approximately normal distributions, and are preferred for actuarial applications.
For normally distributed data, the Pearson and deviance residuals are identical.
Testing whether a parameter is significantly different from zero
As a general rule, we can conclude that a parameter is significantly different from zero if it is at least twice as big in absolute terms as its standard error, ie if:
$$|\hat\beta|>2\,se(\hat\beta)$$
Chapter 13 Practice Questions
13.1
Explain why the link function $g(\mu)=\log\mu$ is appropriate for the Poisson distribution by considering the range of values that it results in $\mu$ taking.
13.2
Explain the difference between the two types of covariate: a variable and a factor.
13.3 (Exam style)
A random variable $Y$ has density of exponential family form:
$$f(y)=\exp\left[\frac{y\theta-b(\theta)}{a(\varphi)}+c(y,\varphi)\right]$$
(i) State the mean and variance of $Y$ in terms of $b(\theta)$ and its derivatives and $a(\varphi)$. [1]
(ii) (a) Show that an exponentially distributed random variable with mean $\mu$ has a density that can be written in the above form.
(b) Determine the natural parameter and the variance function. [3]
[Total 4]
13.4 (Exam style)
An insurer wishes to use a generalised linear model to analyse the claim numbers on its motor portfolio. It has collected the following data on claim numbers $y_i$, $i=1,2,\ldots,35$ from three different classes of policy:
Class I     1 2 0 2 1 0 2 2 1 0
Class II    1 0 1 1 0
Class III   0 0 0 0 1 0 1 0 1 0
            1 0 0 0 0 0 0 0 0 0
For these data values:
$$\sum_{i=1}^{10}y_i=11 \qquad \sum_{i=11}^{15}y_i=3 \qquad \sum_{i=16}^{35}y_i=4$$
The company wishes to use a Poisson model to analyse these data.
(i) Show that the Poisson distribution is a member of the exponential family of distributions. [2]
The insurer decides to use a model (Model A) for which:
$$\log\mu_i=\begin{cases}\alpha & i=1,2,\ldots,10\\ \beta & i=11,12,\ldots,15\\ \gamma & i=16,17,\ldots,35\end{cases}$$
where $\mu_i$ is the mean of the relevant Poisson distribution.
(ii) Derive the likelihood function for this model, and hence find the maximum likelihood estimates for $\alpha$, $\beta$ and $\gamma$. [4]
The insurer now analyses the simpler model $\log\mu_i=\alpha$, for all policies.
(iii) Calculate the maximum likelihood estimate for $\alpha$ under this model (Model B). [2]
(iv) (a) Show that the scaled deviance for Model A is 24.93.
(b) Calculate the scaled deviance for Model B. [5]
It can be assumed that $f(y)=y\log y$ is equal to zero when $y=0$.
(v) Compare Model A directly with Model B, by calculating an appropriate test statistic. [2]
[Total 15]
13.5 (Exam style)
In the context of generalised linear models, consider the exponential distribution with density function $f(x)$, where:
$$f(x)=\frac{1}{\mu}e^{-x/\mu}\qquad(x>0)$$
(i) Show that $f(x)$ can be written in the form of the exponential family of distributions. [1]
(ii) Show that the canonical link function, $\theta$, is given by $\theta=\dfrac{1}{\mu}$. [1]
(iii) Determine the variance function and the dispersion parameter. [3]
[Total 5]
13.6 (Exam style)
The random variable $Z_i$ has a binomial distribution with parameters $n$ and $\mu_i$, where $0<\mu_i<1$. A second random variable, $Y_i$, is defined as $Y_i=Z_i/n$.
(i) Show that the distribution of $Y_i$ is a member of the exponential family, stating clearly the natural and scale parameters and their functions $a(\varphi)$, $b(\theta)$ and $c(y,\varphi)$. [4]
(ii) Determine the variance function of $Y_i$. [2]
[Total 6]
13.7 (Exam style)
A statistical distribution is said to be a member of the exponential family if its probability function or probability density function can be expressed in the form:
$$f_Y(y;\theta,\varphi)=\exp\left[\frac{y\theta-b(\theta)}{a(\varphi)}+c(y,\varphi)\right]$$
(i) Show that the mean of such a distribution is $b'(\theta)$ and derive the corresponding formula for the variance by differentiating the following expression with respect to $\theta$:
$$\int_y f(y)\,dy=1$$ [4]
(ii) Use this method to determine formulae for the mean and variance of the gamma distribution with density function:
$$f(x)=\frac{\alpha^\alpha}{\mu^\alpha\,\Gamma(\alpha)}\,x^{\alpha-1}e^{-\alpha x/\mu}\qquad(x>0)$$ [3]
[Total 7]
13.8 (Exam style)
Independent claim amounts $Y_1,Y_2,\ldots,Y_n$ are modelled as exponential random variables with $E(Y_i)=\mu_i$, $i=1,2,\ldots,n$. The fitted values for a particular model are denoted by $\hat\mu_i$.
Derive an expression for the scaled deviance. [5]
13.9 (Exam style)
A small insurer wishes to model its claim costs for motor insurance using a simple generalised linear model based on the three factors:
$YO_i$: $i=1$ for 'young' drivers, $i=0$ for 'old' drivers
$FS_j$: $j=1$ for 'fast' cars, $j=0$ for 'slow' cars
$TC_k$: $k=1$ for 'town' areas, $k=0$ for 'country' areas
The insurer is considering three possible models for the linear predictor:
Model 1: YO + FS + TC
Model 2: YO + FS + YO.FS + TC
Model 3: YO*FS*TC
(i) Write each of these models in parameterised form, stating how many non-zero parameter values are present in each model. [6]
(ii) Explain why Model 1 might not be appropriate and why the insurer may wish to avoid using Model 3. [2]
The student fitting the models has said "We are assuming a normal error structure and we are using the canonical link function."
(iii) Explain what this means. [3]
The table below shows the student's calculated values of the scaled deviance for these three models and the constant model.
Model                   Scaled deviance   Degrees of freedom
1                       50                7
YO + FS + TC            10
YO + FS + YO.FS + TC    5
YO*FS*TC                0
(iv) (a) Complete the table by filling in the missing entries in the degrees of freedom column.
(b) Carry out the calculations necessary to determine which model would be the most appropriate. [5]
[Total 16]
13.10 (Exam style)
The following study was carried out into the mortality of leukaemia sufferers. A white blood cell count was taken from each of 17 patients and their survival times were recorded.
Suppose that $Y_i$ represents the survival time (in weeks) of the $i$th patient and $x_i$ represents the logarithm (to the base 10) of the $i$th patient's initial white blood cell count ($i=1,2,\ldots,17$).
The response variables $Y_i$ are assumed to be exponentially distributed. A possible specification for $E(Y_i)$ is $E(Y_i)=\exp(\alpha+\beta x_i)$. This will ensure that $E(Y_i)$ is non-negative for all values of $x_i$.
(i) Write down the natural link function associated with the linear predictor $\eta_i=\alpha+\beta x_i$. [2]
(ii) Use this link function and linear predictor to derive the equations that must be solved in order to obtain the maximum likelihood estimates of $\alpha$ and $\beta$. [4]
The maximum likelihood estimate of $\alpha$ derived from the experimental data is $\hat\alpha=8.477$, with estimated standard error 1.655.
(iii) Construct an approximate 95% confidence interval for $\alpha$ and interpret this result. [2]
The following two models are now to be compared:
Model 1: $E(Y_i)=\alpha$
Model 2: $E(Y_i)=\alpha+\beta x_i$
The scaled deviance for Model 1 is found to be 26.282 and the scaled deviance for Model 2 is 19.457.
(iv) Test the null hypothesis that $\beta=0$ against the alternative hypothesis that $\beta\neq0$, stating any conclusions clearly. [3]
[Total 11]
Chapter 13 Solutions
13.1
When we set the link function $g(\mu)=\log\mu$ equal to the linear predictor $\eta$ and then invert to make $\mu$ the subject, we get $\mu=e^\eta$. This results in positive values only for $\mu$, which is sensible for a $Poi(\mu)$ distribution where $\mu$ is defined to be greater than 0.
13.2
A variable is a type of covariate (eg age) whose actual numerical value enters the linear predictor directly, and a factor is a type of covariate (eg sex) that takes categorical values.
13.3
This question is taken from Subject 106, April 2003, Question 3.
(i) Mean and variance
We have:
$$E[Y]=b'(\theta)\qquad\operatorname{var}[Y]=a(\varphi)\,b''(\theta)$$ [1]
(ii)(a) Exponential form
The PDF of the exponential distribution with mean $\mu$ is:
$$f(y)=\frac{1}{\mu}\exp\left(-\frac{y}{\mu}\right)$$
This can be written as an exponential:
$$f(y)=\exp\left(-\frac{y}{\mu}-\ln\mu\right)$$ [½]
Comparing this to the standard form given in part (i), we can define:
$$\theta=-\frac{1}{\mu},\quad a(\varphi)=1,\quad b(\theta)=\ln\mu=-\ln(-\theta),\quad c(y,\varphi)=0$$ [1]
(ii)(b) Natural parameter and variance function
The natural parameter is $\theta$, so here the natural parameter is $-\dfrac{1}{\mu}$. [½]
The variance function is (by definition) $b''(\theta)$, so here we find:
$$b'(\theta)=-\frac{1}{\theta}\quad\Rightarrow\quad b''(\theta)=\frac{1}{\theta^2}=\mu^2$$ [1]
[Total 3]
13.4
(i) Exponential family
For the Poisson distribution, we have:
$$f(y)=\frac{e^{-\mu}\mu^y}{y!}$$
We wish to write this in the form:
$$g(y)=\exp\left[\frac{y\theta-b(\theta)}{a(\varphi)}+c(y,\varphi)\right]$$
Rearranging the Poisson formula:
$$f(y)=\exp\left[\frac{y\log\mu-\mu}{1}-\log y!\right]$$ [1]
We can see that this has the correct form with:
$$\theta=\log\mu,\quad b(\theta)=\mu=e^\theta,\quad a(\varphi)=\varphi=1,\quad c(y,\varphi)=-\log y!$$ [1]
(ii) Maximum likelihood estimates
Using the rearranged form for the Poisson distribution from part (i), we see that the log of the likelihood function can be written:
$$\log L(\alpha,\beta,\gamma)=\sum_i y_i\log\mu_i-\sum_i\mu_i-\sum_i\log y_i! \qquad (*)$$
This now becomes, for Model A:
$$\log L=\sum_{i=1}^{10}y_i\,\alpha+\sum_{i=11}^{15}y_i\,\beta+\sum_{i=16}^{35}y_i\,\gamma-10e^\alpha-5e^\beta-20e^\gamma-\sum_{i=1}^{35}\log y_i!$$ [1]
$$=11\alpha+3\beta+4\gamma-10e^\alpha-5e^\beta-20e^\gamma-\sum_{i=1}^{35}\log y_i! \qquad (**)$$
Differentiating this log-likelihood function in turn with respect to $\alpha$, $\beta$ and $\gamma$, we get:
$$\frac{\partial}{\partial\alpha}\log L=11-10e^\alpha$$ [½]
$$\frac{\partial}{\partial\beta}\log L=3-5e^\beta$$ [½]
and:
$$\frac{\partial}{\partial\gamma}\log L=4-20e^\gamma$$ [½]
Setting each of these expressions equal to zero in turn, we find that:
$$\hat\alpha=\log1.1=0.09531$$ [½]
$$\hat\beta=\log0.6=-0.51083$$ [½]
and:
$$\hat\gamma=\log0.2=-1.60944$$ [½]
These are the maximum likelihood estimates for $\alpha$, $\beta$ and $\gamma$.
(iii) Simpler model
In this case the log-likelihood function reduces to:
$$\log L=\sum_{i=1}^{35}y_i\,\alpha-35e^\alpha-\sum_{i=1}^{35}\log y_i!=18\alpha-35e^\alpha-\sum_{i=1}^{35}\log y_i! \qquad (***)$$ [1]
Differentiating this with respect to $\alpha$, and setting the result equal to zero, we find that:
$$18-35e^{\hat\alpha}=0\quad\Rightarrow\quad\hat\alpha=\log\left(\frac{18}{35}\right)=-0.66498$$ [1]
(iv)(a) Scaled deviance for Model A
The scaled deviance for Model A is given by:
$$\text{Scaled deviance}=2(\log L_S-\log L_A)$$
where $\log L_S$ is the value of the log-likelihood function for the saturated model, and $\log L_A$ is the value of the log-likelihood function for Model A.
For the saturated model, we replace the $\mu_i$'s with the $y_i$'s in Equation (*). So:
$$\log L_S=\sum y_i\log y_i-\sum y_i-\sum\log y_i!=4\times2\log2-18-4\log2=4\log2-18=-15.2274$$ [1]
We use the hint in the question here. $y_i\log y_i$ is zero when $y=0$, and also when $y=1$. So the only contribution to the first term is when $y=2$, giving 4 lots of $2\log2$.
For the log-likelihood for Model A, we replace the parameters $\alpha$, $\beta$ and $\gamma$ with their estimates $\hat\alpha$, $\hat\beta$ and $\hat\gamma$ in Equation (**):
$$\log L_A=11\hat\alpha+3\hat\beta+4\hat\gamma-10e^{\hat\alpha}-5e^{\hat\beta}-20e^{\hat\gamma}-\sum_{i=1}^{35}\log y_i!$$
$$=11\log1.1+3\log0.6+4\log0.2-11-3-4-4\log2=-27.6944$$ [1]
The corresponding value for $\log L_A$ without the final term is $-24.9218$.
So the scaled deviance is twice the difference in the log-likelihoods:
$$\text{Scaled deviance}=2(\log L_S-\log L_A)=2\left[(-15.2274)-(-27.6944)\right]=24.93$$ [1]
(iv)(b) Scaled deviance for Model B
We now repeat the process for Model B.
Using Equation (***), the log-likelihood for Model B is:
$$\log L_B=18\hat\alpha-35e^{\hat\alpha}-\sum_{i=1}^{35}\log y_i!=18\log\left(\frac{18}{35}\right)-18-4\log2=-32.7422$$ [1]
The value without the final term is $-29.9696$.
The scaled deviance is again twice the difference in the log-likelihoods:
$$\text{Scaled deviance}=2(\log L_S-\log L_B)=2\left[(-15.2274)-(-32.7422)\right]=35.03$$ [1]
(v) Comparing A with B
We can use the chi-squared distribution to compare Model A with Model B. We calculate the difference in the scaled deviances (which is just $2(\log L_A-\log L_B)$):
$$35.03-24.93=10.10$$ [1]
This should have a chi-squared distribution with $3-1=2$ degrees of freedom, which has a critical value at the upper 5% level of 5.991. Our value is significant here, since $10.10>5.991$, so this suggests that Model A is a significant improvement over Model B. We prefer Model A here. [1]
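The arithmetic in Solution 13.4 can be reproduced with a short R sketch. The Class I and Class II values are as given in the question; the ordering of the Class III values is an assumption (only their total of 4 ones among 20 policies matters). The variable names are illustrative.

```r
# Claim numbers: Class I (sum 11), Class II (sum 3), Class III (sum 4)
y <- c(1, 2, 0, 2, 1, 0, 2, 2, 1, 0,
       1, 0, 1, 1, 0,
       rep(c(1, 0), times = c(4, 16)))   # Class III order is immaterial
cls <- factor(rep(c("I", "II", "III"), times = c(10, 5, 20)))

# Poisson log-likelihood for a vector of fitted means
loglik <- function(mu) sum(y * log(mu) - mu - lfactorial(y))

# Saturated model, using the convention y*log(y) = 0 when y = 0
ylogy <- ifelse(y > 0, y * log(y), 0)
ll_S  <- sum(ylogy - y - lfactorial(y))

ll_A <- loglik(ave(y, cls))               # Model A: class means 1.1, 0.6, 0.2
ll_B <- loglik(rep(mean(y), length(y)))   # Model B: overall mean 18/35

sd_A <- 2 * (ll_S - ll_A)
sd_B <- 2 * (ll_S - ll_B)
print(round(c(sd_A = sd_A, sd_B = sd_B, difference = sd_B - sd_A), 2))

# Cross-check against R's own Poisson GLM fit and the 5% critical value
glm_dev_A <- deviance(glm(y ~ cls, family = poisson))
crit <- qchisq(0.95, df = 2)
```

The first-principles deviance should agree with the value reported by `glm`, since the scale parameter is 1 for the Poisson distribution.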
13.5
This is Subject 106, September 2000, Question 2.
(i) Exponential family
We need to express the density function in the form $\exp\left[\dfrac{y\theta-b(\theta)}{a(\varphi)}+c(y,\varphi)\right]$.
We can write the density function as:
$$f(y)=\exp\left(-\frac{y}{\mu}-\log\mu\right)$$
So if we let:
$$\theta=-\frac{1}{\mu},\quad b(\theta)=\log\mu=-\log(-\theta),\quad\varphi=1,\quad a(\varphi)=1,\quad c(y,\varphi)=0$$
then the density function will have the appropriate form. [1]
(ii) Canonical link function
We see from part (i) that $\theta=-1/\mu$. [1]
(iii) Variance function and dispersion parameter
The variance function is $b''(\theta)$. Differentiating $b(\theta)$ twice, we find that $b''(\theta)=1/\theta^2=\mu^2$. So the variance function is $\mu^2$. [2]
The dispersion parameter or scale parameter is $\varphi=1$. [1]
13.6
(i) Show $Y_i$ is a member of the exponential family
The PF of $Z_i$ is:
$$f(z_i)=\binom{n}{z_i}\mu_i^{z_i}(1-\mu_i)^{n-z_i}$$
The PF of $Y_i$ can be obtained by replacing $z_i$ with $ny_i$:
$$f(y_i)=\binom{n}{ny_i}\mu_i^{ny_i}(1-\mu_i)^{n-ny_i}$$ [½]
This can be written as:
$$f(y_i)=\exp\left[\ln\binom{n}{ny_i}+ny_i\ln\mu_i+n(1-y_i)\ln(1-\mu_i)\right]$$
$$=\exp\left[ny_i\ln\left(\frac{\mu_i}{1-\mu_i}\right)+n\ln(1-\mu_i)+\ln\binom{n}{ny_i}\right]$$ [½]
$$=\exp\left[\frac{y_i\ln\left(\frac{\mu_i}{1-\mu_i}\right)+\ln(1-\mu_i)}{1/n}+\ln\binom{n}{ny_i}\right]$$ [1]
Comparing this to the expression on page 27 of the Tables, we see that the natural parameter is:
$$\theta_i=\ln\left(\frac{\mu_i}{1-\mu_i}\right)$$
Rearranging this gives $\mu_i=\dfrac{e^{\theta_i}}{1+e^{\theta_i}}$, so the function $b(\theta_i)$ is given by:
$$b(\theta_i)=-\ln(1-\mu_i)=-\ln\left(1-\frac{e^{\theta_i}}{1+e^{\theta_i}}\right)=-\ln\left(\frac{1}{1+e^{\theta_i}}\right)=\ln\left(1+e^{\theta_i}\right)$$ [1]
The scale parameter and its functions are:
$$\varphi=n,\quad a(\varphi)=\frac{1}{\varphi},\quad c(y_i,\varphi)=\ln\binom{n}{ny_i}=\ln\binom{\varphi}{\varphi y_i}$$ [1]
(ii) The variance function
The variance function is given by $V(\mu)=b''(\theta)$.
Differentiating $b(\theta)$ gives:
$$b'(\theta)=\frac{e^\theta}{1+e^\theta}$$ [½]
$$b''(\theta)=\frac{e^\theta(1+e^\theta)-e^\theta e^\theta}{(1+e^\theta)^2}=\frac{e^\theta}{(1+e^\theta)^2}$$ [½]
Substituting in $\theta=\ln\left(\dfrac{\mu_i}{1-\mu_i}\right)$ gives:
$$b''(\theta)=\frac{\frac{\mu_i}{1-\mu_i}}{\left(1+\frac{\mu_i}{1-\mu_i}\right)^2}=\frac{\mu_i}{1-\mu_i}(1-\mu_i)^2=\mu_i(1-\mu_i)$$ [1]
[Total 6]
13.7
(i) Derive mean and variance
Mean
Differentiating both sides with respect to $\theta$ gives:
$$\int_y\frac{y-b'(\theta)}{a(\varphi)}f(y)\,dy=0$$ [1]
Simplifying:
$$\frac{1}{a(\varphi)}\int_y yf(y)\,dy-\frac{b'(\theta)}{a(\varphi)}\int_y f(y)\,dy=0$$
Since $\int_y yf(y)\,dy=E(Y)$, and $\int_y f(y)\,dy=1$, we have:
$$\frac{E(Y)}{a(\varphi)}-\frac{b'(\theta)}{a(\varphi)}=0$$ [½]
Hence:
$$E(Y)-b'(\theta)=0\quad\Rightarrow\quad E(Y)=b'(\theta)$$ [½]
Variance
Using the product rule to differentiate the above equation with respect to $\theta$ gives:
$$\frac{d^2}{d\theta^2}\int_y f(y)\,dy=\int_y\left[\left(\frac{y-b'(\theta)}{a(\varphi)}\right)^2-\frac{b''(\theta)}{a(\varphi)}\right]f(y)\,dy=0$$ [1]
Splitting this into two separate integrals gives:
$$\frac{1}{[a(\varphi)]^2}\int_y\left(y-b'(\theta)\right)^2f(y)\,dy-\frac{b''(\theta)}{a(\varphi)}\int_y f(y)\,dy=0$$
Since $b'(\theta)=E(Y)$, then $\int_y\left(y-b'(\theta)\right)^2f(y)\,dy=\operatorname{var}(Y)$. Again $\int_y f(y)\,dy=1$, so we have:
$$\frac{\operatorname{var}(Y)}{[a(\varphi)]^2}-\frac{b''(\theta)}{a(\varphi)}=0$$ [½]
Rearranging gives:
$$\operatorname{var}(Y)=a(\varphi)\,b''(\theta)$$ [½]
(ii) Mean and variance of gamma
The log of the PDF given is:
$$\log f(x)=\alpha(\log\alpha-\log\mu)-\log\Gamma(\alpha)+(\alpha-1)\log x-\frac{\alpha x}{\mu}$$
which can be written as:
$$f(x)=\exp\left[\frac{-\frac{x}{\mu}-\log\mu}{1/\alpha}+\alpha\log\alpha-\log\Gamma(\alpha)+(\alpha-1)\log x\right]$$
This conforms to the definition of the exponential family, with:
$$\theta=-\frac{1}{\mu},\quad\varphi=\alpha,\quad b(\theta)=-\log(-\theta),\quad a(\varphi)=\frac{1}{\varphi},\quad c(x,\varphi)=\alpha\log\alpha-\log\Gamma(\alpha)+(\alpha-1)\log x$$ [1]
Applying the results in (i):
$$E(X)=b'(\theta)=-\frac{1}{-\theta}(-1)=-\frac{1}{\theta}=\mu$$ [1]
and:
$$\operatorname{var}(X)=a(\varphi)\,b''(\theta)=\frac{1}{\varphi}\cdot\frac{1}{\theta^2}=\frac{\mu^2}{\alpha}$$ [1]
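13.8
The scaled deviance is given by:
$$\text{scaled deviance}=2\left[\ln L_S-\ln L_M\right]$$ [½]
where $L_S$ is the likelihood function for the saturated model, and $L_M$ is the likelihood function for the fitted model.
First we need the log-likelihoods:
$$L(\mu)=f(y_1)\times\cdots\times f(y_n)=\frac{1}{\mu_1}e^{-y_1/\mu_1}\times\cdots\times\frac{1}{\mu_n}e^{-y_n/\mu_n}=\frac{1}{\mu_1\cdots\mu_n}\,e^{-\sum y_i/\mu_i}$$ [1]
Taking logs:
$$\ln L(\mu)=-\sum\ln\mu_i-\sum\frac{y_i}{\mu_i}$$ [½]
So the log-likelihood of the fitted model, $\ln L_M$, is given by:
$$\ln L_M=-\sum\ln\hat\mu_i-\sum\frac{y_i}{\hat\mu_i}$$ [1]
For the log-likelihood of the saturated model, $\ln L_S$, the fitted values, $\hat\mu_i$, are the observed values, $y_i$. Hence:
$$\ln L_S=-\sum\ln y_i-\sum\frac{y_i}{y_i}=-\sum\ln y_i-\sum1$$ [1]
So the scaled deviance is:
$$\text{scaled deviance}=2\left\{\left(-\sum\ln y_i-\sum1\right)-\left(-\sum\ln\hat\mu_i-\sum\frac{y_i}{\hat\mu_i}\right)\right\}$$
$$=2\sum\left\{-\ln y_i-1+\ln\hat\mu_i+\frac{y_i}{\hat\mu_i}\right\}=2\sum\left\{-\ln\left(\frac{y_i}{\hat\mu_i}\right)-1+\frac{y_i}{\hat\mu_i}\right\}$$ [1]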
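The expression derived in Solution 13.8 translates directly into a small R function. This is a sketch only: the claim amounts below are made up for illustration, and the function name is hypothetical.

```r
# Scaled deviance for exponentially distributed data, per the formula above
scaled_deviance_exp <- function(y, mu_hat) {
  2 * sum(y / mu_hat - 1 - log(y / mu_hat))
}

# Exponential log-likelihood, used as a first-principles cross-check
loglik_exp <- function(y, mu) sum(-log(mu) - y / mu)

y      <- c(120, 45, 300, 80)          # hypothetical claim amounts
mu_hat <- rep(mean(y), length(y))      # fitted values under a constant model

sd_saturated <- scaled_deviance_exp(y, y)        # saturated model: zero
sd_constant  <- scaled_deviance_exp(y, mu_hat)
sd_check     <- 2 * (loglik_exp(y, y) - loglik_exp(y, mu_hat))
print(c(saturated = sd_saturated, constant = sd_constant, check = sd_check))
```

The scaled deviance is zero when the fitted values equal the observations and grows as the fit becomes poorer.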
13.9
(i) Parameterised form
In parameterised form, the linear predictors are (with $i$, $j$ and $k$ corresponding to the levels of YO, FS and TC respectively):
Model 1: $\alpha_i+\beta_j+\gamma_k$ (4 parameters)
There is one parameter to set the base level for the combination $YO_0$, $FS_0$, $TC_0$ and one additional parameter for each of the higher levels of the three factors.
Model 2: $\alpha_{ij}+\gamma_k$ (5 parameters)
There are four parameters for the $2\times2$ combinations of YO and FS (assuming $TC_0$) and one additional parameter for the higher level of TC.
Model 3: $\alpha_{ijk}$ (8 parameters)
There are eight parameters for the $2\times2\times2$ combinations of YO, FS and TC.
[2 for each model]
(ii) Problems with Model 1 and Model 3
Model 1 does not allow for the possibility that there may be interactions (correlations) between some of the factors. For example, it may be the case that young drivers tend to drive fast cars and to live in towns. [1]
With Model 3, which is a saturated model, it would be possible to fit the average values for each group exactly, ie there are no degrees of freedom left. This defeats the purpose of applying a statistical model, as it would not smooth out any anomalous results. [1]
The problem referred to with Model 3 corresponds to the idea of undergraduation in Subject CS2.
(iii) Explaining normal error structure and canonical link function
Normal error structure means that the randomness present in the observed values in each category (eg young/fast/town) is assumed to follow a normal distribution. [1]
The link function is the function applied to the linear estimator to obtain the predicted values. Associated with each type of error structure is a "canonical" or "natural" link function. In the case of a normal error structure, the canonical link function is the identity function, $g(\mu)=\mu$. [2]
(iv)(a) Completed table
The completed table, together with the differences in the scaled deviance and degrees of freedom, is shown below.
Model                            Scaled deviance   DF   Δ Scaled deviance   Δ DF
Constant: 1                      50                7
Model 1: YO + FS + TC            10                4    40                  3
Model 2: YO + FS + YO.FS + TC    5                 3    5                   1
Model 3: YO*FS*TC                0                 0    5                   3
[3]
(iv)(b) Compare models
Comparing the constant model and Model 1
The difference in the scaled deviances is 40. This is greater than 7.815, the upper 5% point of the $\chi^2_3$ distribution. So Model 1 is a significant improvement over the constant model. [½]
Alternatively, if we use the AIC to compare models, we find that since $\Delta(\text{deviance})>2\times\Delta(\text{parameters})$, Model 1 is a significant improvement over the constant model.
Comparing Model 1 and Model 2
The difference in the scaled deviances is 5. This is greater than 3.841, the upper 5% point of the $\chi^2_1$ distribution. So Model 2 is a significant improvement over Model 1. [½]
Alternatively, if we use the AIC to compare models, we find that since $\Delta(\text{deviance})>2\times\Delta(\text{parameters})$, Model 2 is a significant improvement over Model 1.
Comparing Model 2 and Model 3
The difference in the scaled deviances is 5. This is less than 7.815, the upper 5% point of the $\chi^2_3$ distribution. So Model 3 is not a significant improvement over Model 2. [½]
Alternatively, if we use the AIC to compare models, we find that since $\Delta(\text{deviance})<2\times\Delta(\text{parameters})$, Model 3 is not a significant improvement over Model 2.
So Model 2 is the most appropriate in this case. [½]
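The comparisons in Solution 13.9 (iv)(b) can be laid out as a short R calculation. The scaled deviances and parameter counts are those in the completed table; the variable names are illustrative.

```r
# Scaled deviances and parameter counts from the completed table
scaled_dev <- c(constant = 50, model1 = 10, model2 = 5, model3 = 0)
n_params   <- c(constant = 1,  model1 = 4,  model2 = 5, model3 = 8)

drop_dev  <- -diff(scaled_dev)   # reduction in scaled deviance at each step
extra_par <- diff(n_params)      # extra parameters at each step

# Chi-square test at the 5% level for each successive pair of models
critical    <- qchisq(0.95, df = extra_par)
significant <- drop_dev > critical

# AIC-style rule: the drop in deviance must exceed twice the extra parameters
aic_improves <- drop_dev > 2 * extra_par

print(data.frame(drop_dev, extra_par, critical = round(critical, 3),
                 significant, aic_improves))
```

Both rules lead to the same conclusion as the solution: each of the first two steps is worthwhile, but moving from Model 2 to Model 3 is not.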
13.10
(i) Natural link function
Using the linear predictor $\eta_i=\alpha+\beta x_i$, we have $E(Y_i)=\mu_i=e^{\eta_i}$. So $\eta_i=g(\mu_i)=\ln\mu_i$ is the natural link function. [2]
(ii) Equations
Assuming that $Y_i\sim Exp(\lambda_i)$, we have the following likelihood function:
$$L=\prod_{i=1}^{17}f(y_i)=\prod_{i=1}^{17}\lambda_i e^{-\lambda_i y_i}$$
Taking logs:
$$\ln L=\sum_{i=1}^{17}\ln\lambda_i-\sum_{i=1}^{17}\lambda_i y_i=-\sum_{i=1}^{17}\ln\mu_i-\sum_{i=1}^{17}\frac{y_i}{\mu_i}\qquad\text{since }\mu_i=E(Y_i)=\frac{1}{\lambda_i}$$
$$=-\sum_{i=1}^{17}(\alpha+\beta x_i)-\sum_{i=1}^{17}y_i e^{-(\alpha+\beta x_i)}$$ [1]
Differentiating with respect to $\alpha$ gives:
$$\frac{\partial\ln L}{\partial\alpha}=-17+\sum_{i=1}^{17}y_i e^{-(\alpha+\beta x_i)}$$ [1]
and differentiating with respect to $\beta$ gives:
$$\frac{\partial\ln L}{\partial\beta}=-\sum_{i=1}^{17}x_i+\sum_{i=1}^{17}x_i y_i e^{-(\alpha+\beta x_i)}$$ [1]
Setting these expressions equal to 0, we obtain:
$$\sum_{i=1}^{17}y_i e^{-(\hat\alpha+\hat\beta x_i)}=17$$
$$\sum_{i=1}^{17}x_i y_i e^{-(\hat\alpha+\hat\beta x_i)}=\sum_{i=1}^{17}x_i$$ [1]
(iii) Confidence interval
An approximate 95% confidence interval for $\alpha$ is:
$$\hat\alpha\pm1.96\,se(\hat\alpha)=8.477\pm(1.96\times1.655)=8.477\pm3.244=(5.233,\ 11.721)$$ [1]
Since this confidence interval does not contain zero, it is reasonable to assume that the parameter is non-zero and should be kept.
This is equivalent to the significance of a parameter test: $|\hat\alpha|>2\,se(\hat\alpha)$. [1]
(iv) Comparing models
Test $H_0:\beta=0$ against $H_1:\beta\neq0$. The test statistic is:
$$\Delta\text{dev}=26.282-19.457=6.825$$ [1]
Comparing with $\chi^2_1$ we find that the value of the test statistic exceeds the upper 1% point (6.635) of this distribution. We therefore reject the null hypothesis and conclude that Model 2 significantly reduces the scaled deviance (ie it is a significantly better fit to the data), and that survival time is dependent on initial white blood cell count. [2]
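The numerical parts of Solution 13.10 can be checked with a few lines of R, using only the figures quoted in the question (the variable names are illustrative).

```r
# Part (iii): approximate 95% confidence interval for alpha
alpha_hat <- 8.477
se_alpha  <- 1.655
ci <- alpha_hat + c(-1, 1) * qnorm(0.975) * se_alpha
print(round(ci, 3))

# Part (iv): test of beta = 0 using the difference in scaled deviances
test_stat <- 26.282 - 19.457
crit_1pct <- qchisq(0.99, df = 1)
p_value   <- pchisq(test_stat, df = 1, lower.tail = FALSE)
print(c(statistic = test_stat, critical = crit_1pct, p = p_value))
```

The test statistic exceeds the upper 1% point, which matches the conclusion reached in the solution.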
CS1-14: Bayesian statistics
Bayesian statistics
Syllabus objectives
5.1 Explain the fundamental concepts of Bayesian statistics and use these concepts to calculate Bayesian estimates.
5.1.1 Use Bayes' theorem to calculate simple conditional probabilities.
5.1.2 Explain what is meant by a prior distribution, a posterior distribution and a conjugate prior distribution.
5.1.3 Derive the posterior distribution for a parameter in simple cases.
5.1.4 Explain what is meant by a loss function.
5.1.5 Use simple loss functions to derive Bayesian estimates of parameters.
5.1.6 Derive credible intervals in simple cases.
0 Introduction
Earlier in this course we looked at the classical approach to statistical estimation, when we introduced the method of maximum likelihood and the method of moments. There we assumed that the parameters to be estimated were fixed quantities.
In this chapter we describe the Bayesian approach. This will also be used in Chapter 15.
The Bayesian philosophy involves a completely different approach to statistics, compared to classical statistical methods. The Bayesian version of estimation is considered here for the basic situation concerning the estimation of a parameter given a random sample from a particular distribution. Classical estimation involves the method of maximum likelihood.
The fundamental difference between Bayesian and classical methods is that the parameter $\theta$ is considered to be a random variable in Bayesian methods.
We might have some knowledge about the likely value of $\theta$ and we can represent this using a distribution. For example, we might believe that $\theta$ is equally likely to take any of the values 1, 2 or 3. In this respect, we can treat $\theta$ as a random variable.
In classical statistics, $\theta$ is a fixed but unknown quantity. This leads to difficulties such as the careful interpretation required for classical confidence intervals, where it is the interval that is random. As soon as the data are observed and a numerical interval is calculated, there is no probability involved. A statement such as $P(10.45<\theta<13.26)=0.95$ cannot be made because $\theta$ is not a random variable.
In classical statistics $\theta$ either lies within the interval or it does not. There can be no probability associated with such a statement.
In Bayesian statistics no such difficulties arise and probability statements can be made concerning the values of a parameter $\theta$.
In fact, we can calculate a Bayesian confidence interval for a parameter, which we call a credible interval. We cover this in Section 5.
Another advantage of Bayesianstatistics is that it enables us to makeuse of anyinformation that
wealready have about the situation under investigation. Often researchers investigating an
unknown population parameter haveinformation available from other sources in advance of their
study.
take.
Thisinformation
might provide a strong indication
The classical statistical
of what values the parameter is likely to
approach offers no scope for researchers
to take this additional
information into account. The Bayesianapproach, however, does allow for the use of this
information.
For example, suppose that aninsurance company is reviewing its premium rates for a particular
type of policy and has access to results from
other insurers,
as well as from its own policyholders.
Thisinformation from other insurers cannot be taken into account directly becausethe terms and
conditions of the policies for other companies maybe slightly different. However, this additional
information might be very useful, and hence should not beignored.
1 Bayes' theorem
If $B_1, B_2, \ldots, B_k$ constitute a partition of a sample space $S$ and $P(B_i) \neq 0$ for $i = 1, 2, \ldots, k$, then for any event $A$ in $S$ such that $P(A) \neq 0$:
$$P(B_r \mid A) = \frac{P(A \mid B_r)\,P(B_r)}{P(A)} \quad \text{for } r = 1, 2, \ldots, k$$
where:
$$P(A) = \sum_{i=1}^{k} P(A \mid B_i)\,P(B_i)$$
A partition of a sample space is a collection of events that are mutually exclusive and exhaustive, ie they do not overlap and they cover the entire range of possible outcomes.
The result above is known as Bayes' theorem (or Bayes' formula) and is given on page 5 of the Tables. It follows easily from the result:
$$P(A \cap B) = P(A)\,P(B \mid A)$$
which rearranges to give the conditional probability formula:
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$
However:
$$P(A \cap B) = P(B \cap A) = P(B)\,P(A \mid B)$$
Now, replacing $B$ by $B_r$, we have:
$$P(B_r \mid A) = \frac{P(B_r \cap A)}{P(A)} = \frac{P(B_r)\,P(A \mid B_r)}{P(A)}$$
and, from the law of total probability:
$$P(A) = \sum_i P(A \mid B_i)\,P(B_i)$$
Bayes' formula allows us to turn round a conditional probability, ie it allows us to calculate $P(B \mid A)$ if we know $P(A \mid B)$.
Question
Three manufacturers supply clothing to a retailer. 60% of the stock comes from Manufacturer 1, 30% from Manufacturer 2 and 10% from Manufacturer 3. 10% of the clothing from Manufacturer 1 is faulty, 5% from Manufacturer 2 is faulty and 15% from Manufacturer 3 is faulty.
What is the probability that a faulty garment comes from Manufacturer 3?
Solution
Let $A$ be the event that a garment is faulty and let $B_i$ be the event that the garment comes from Manufacturer $i$.
Substituting the figures into the formula for Bayes' theorem:
$$P(B_3 \mid A) = \frac{(0.15)(0.1)}{(0.1)(0.6) + (0.05)(0.3) + (0.15)(0.1)} = \frac{0.015}{0.09} = 0.167$$
Although Manufacturer 3 supplies only 10% of the garments to the retailer, nearly 17% of the faulty garments come from that manufacturer.
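The calculation above can be reproduced in R. This is a minimal sketch rather than part of the Core Reading; the variable names `prior`, `p_faulty`, `p_A` and `posterior` are illustrative choices.

```r
# Prior probabilities that a garment comes from Manufacturers 1, 2 and 3
prior <- c(0.6, 0.3, 0.1)

# Conditional probabilities that a garment is faulty, given the manufacturer
p_faulty <- c(0.10, 0.05, 0.15)

# Law of total probability: P(A), the overall probability of a faulty garment
p_A <- sum(p_faulty * prior)

# Bayes' theorem: P(B_r | A) for r = 1, 2, 3
posterior <- p_faulty * prior / p_A

print(p_A)
print(round(posterior, 3))
```

The third element of `posterior` is the probability asked for in the question, and the three elements sum to 1 because the manufacturers form a partition.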
An alternative way of tackling this question is to draw a tree diagram. There are 3 manufacturers so we start with 3 branches in our tree and mark on the associated probabilities:
[Tree diagram: three branches from the root, leading to B1 (probability 0.6), B2 (probability 0.3) and B3 (probability 0.1).]
Each garment is either faulty (event $A$) or perfect (event $A'$). These outcomes and their (conditional) probabilities are now added to the diagram:
[Tree diagram: each of B1, B2 and B3 splits into A and A'. From B1: A with probability 0.1, A' with probability 0.9. From B2: A with probability 0.05, A' with probability 0.95. From B3: A with probability 0.15, A' with probability 0.85.]
The required probability is:
$$P(B_3 \mid A) = \frac{P(B_3 \cap A)}{P(A)}$$
From the diagram we can see that $P(B_3 \cap A) = 0.1 \times 0.15 = 0.015$. (This is obtained by multiplying the appropriate branch weights.) We can also see that there are three ways in which event $A$ can occur. Since these are mutually exclusive, we can calculate $P(A)$ by summing the three associated probabilities. Hence:
$$P(A) = (0.6 \times 0.1) + (0.3 \times 0.05) + (0.1 \times 0.15) = 0.09$$
and it follows that:
$$P(B_3 \mid A) = \frac{0.015}{0.09} = 0.167$$
as before.
Bayes' theorem can be adapted to deal with continuous random variables. If $X$ and $Y$ are continuous, then the conditional PDF of $Y$ given $X$ is:
$$f_{Y|X}(x,y) = \frac{f_{X,Y}(x,y)}{f_X(x)} = \frac{f_{X|Y}(x,y)\,f_Y(y)}{f_X(x)}$$
where:
$$f_X(x) = \int_y f_{X,Y}(x,y)\,dy = \int_y f_{X|Y}(x,y)\,f_Y(y)\,dy$$
2 Prior and posterior distributions
Suppose $\underline{X} = (X_1, X_2, \ldots, X_n)$ is a random sample from a population specified by the density or probability function $f(x; \theta)$ and it is required to estimate $\theta$.
Recall that a random sample is a set of IID random variables. Here the Core Reading is using the letter $f$ for both the density function of a continuous distribution and the probability function of a discrete distribution.
As a result of the parameter $\theta$ being a random variable, it will have a distribution. This allows the use of any knowledge available about possible values for $\theta$ before the collection of any data. This knowledge is quantified by expressing it as the prior distribution of $\theta$.
The prior distribution summarises what we know about $\theta$ before we collect any data from the relevant population.
Then after collecting appropriate data, the posterior distribution of $\theta$ is determined, and this forms the basis of all inference concerning $\theta$.
The Bayesian approach combines the sample data with the prior distribution. The conditional distribution of $\theta$ given the observed data is called the posterior distribution of $\theta$.
2.1 Notation
As $\theta$ is a random variable, it should really be denoted by the capital $\Theta$, and its prior density written as $f_\Theta(\theta)$. However, for simplicity no distinction will be made between $\Theta$ and $\theta$, and the density will simply be denoted by $f(\theta)$. Note that referring to a density here implies that $\theta$ is continuous. In most applications this will be the case, as even when $X$ is discrete (like the binomial or Poisson), the parameter ($p$ or $\lambda$) will vary in a continuous space ($(0,1)$ or $(0,\infty)$, respectively).
Also the population density or probability function will be denoted by $f(x \mid \theta)$ rather than the earlier $f(x; \theta)$ as it represents the conditional distribution of $X$ given $\theta$.
The prior and posterior distributions of $\theta$ always have the same support (or domain). In other words, the set of possible values of $\theta$ is the same for both its prior and posterior distributions. So, if the prior distribution is continuous, then the posterior distribution is also continuous. Similarly, if the prior distribution is discrete, then the posterior distribution is also discrete.
Suppose, for example, that the parameter $\theta$ must take a value between 0 and 1. We might use a beta distribution as the prior distribution (as a beta random variable must take a value between 0 and 1). The posterior PDF of $\theta$ is then also non-zero for values of $\theta$ in the interval $(0,1)$ only.
2.2 Continuous prior distributions
Suppose that $\underline{X}$ is a random sample from a population specified by $f(x \mid \theta)$ and that $\theta$ has the prior density $f(\theta)$.
In other words, $X_1, \ldots, X_n$ is a set of IID random variables whose distribution depends on the value of $\theta$. Each of these random variables has PDF $f(x \mid \theta)$.
Determining the posterior density
The posterior density of $\theta \mid \underline{X}$ is determined by applying the basic definition of a conditional density:
$$f(\theta \mid \underline{X}) = \frac{f(\theta, \underline{X})}{f(\underline{X})} = \frac{f(\underline{X} \mid \theta)\,f(\theta)}{f(\underline{X})}$$
Note that $f(\underline{X}) = \int f(\underline{X} \mid \theta)\,f(\theta)\,d\theta$. This result is like a continuous version of Bayes' theorem. We saw this result at the end of Section 1.
A useful way of expressing the posterior density is to use proportionality. $f(\underline{X})$ does not involve $\theta$ and is just the constant needed to make it a proper density that integrates to unity, so:
$$f(\theta \mid \underline{X}) \propto f(\underline{X} \mid \theta)\,f(\theta)$$
This formula is given on page 28 of the Tables.
Also, $f(\underline{X} \mid \theta)$, being the joint density of the sample values, is none other than the likelihood, making the posterior proportional to the product of the likelihood and the prior.
This idea is really the key to answering questions involving continuous prior distributions. The formula for the posterior PDF can also be expressed as follows:
$$f_{post}(\theta) = C \times f_{prior}(\theta) \times L$$
where:
$f_{prior}(\theta)$ is the prior PDF of $\theta$
$f_{post}(\theta)$ is the posterior PDF of $\theta$
$L$ is the likelihood function obtained from the sample data
$C$ is a constant that makes the posterior PDF integrate to 1.
Question
The annual number of claims arising from a particular group of policies follows a Poisson distribution with mean $\mu$. The prior distribution of $\mu$ is exponential with mean 30.
In the previous two years, the numbers of claims arising from the group were 28 and 26, respectively.
Determine the posterior distribution of $\mu$.
Solution
We are told that $\mu$ has an exponential distribution with a mean of 30. So $\mu \sim Exp(1/30)$ and the prior PDF of $\mu$ is:
$$f_{prior}(\mu) = \frac{1}{30} e^{-\mu/30}, \quad \mu > 0$$
Let $X_j$ represent the number of claims in year $j$. Then $X_j \sim Poisson(\mu)$ and the likelihood function obtained from the sample data is:
$$L(\mu) = P(X_1 = 28)\,P(X_2 = 26) = \frac{e^{-\mu}\mu^{28}}{28!} \times \frac{e^{-\mu}\mu^{26}}{26!} = C e^{-2\mu}\mu^{54}$$
where $C$ is a constant.
Combining the prior distribution and the sample data, we see that the posterior PDF of $\mu$ is:
$$f_{post}(\mu) = K e^{-61\mu/30}\mu^{54}, \quad \mu > 0$$
for some constant $K$.
Comparing this with the formula for the gamma PDF given on page 12 of the Tables, we see that the posterior distribution of $\mu$ is $Gamma\left(55, \frac{61}{30}\right)$.
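The result can be checked numerically in R. The sketch below, which is not part of the Core Reading, multiplies the prior PDF by the likelihood and confirms that the product is a constant multiple of the $Gamma(55, 61/30)$ PDF. The function name `unnorm_post` and the grid of $\mu$ values are illustrative assumptions.

```r
# Unnormalised posterior: exponential prior (mean 30) multiplied by the
# Poisson likelihood for the observed claim numbers 28 and 26
unnorm_post <- function(mu) {
  dexp(mu, rate = 1 / 30) * dpois(28, mu) * dpois(26, mu)
}

# A few trial values of mu (an arbitrary grid chosen for illustration)
mu <- seq(15, 40, by = 5)

# If the posterior is Gamma(55, 61/30), this ratio is the same for every mu
ratio <- unnorm_post(mu) / dgamma(mu, shape = 55, rate = 61 / 30)
print(ratio / ratio[1])

# Posterior mean of the gamma distribution: shape / rate
print(55 / (61 / 30))
```

Every entry of `ratio / ratio[1]` should equal 1 up to rounding, which is what proportionality means here.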
The table in Section 4 of this chapter shows the posterior distribution for some common combinations of likelihoods and prior distributions. You do not need to learn this table. However, you should check that you can derive some of the results in the table, working along the lines shown in the solution above.
Conjugate priors
For a given likelihood, if the prior distribution leads to a posterior distribution belonging to the same family as the prior distribution, then this prior is called the conjugate prior for this likelihood.
The likelihood function determines which family of distributions will lead to a conjugate pair, ie a prior and posterior distribution that come from the same family. Conjugate distributions can be found by selecting a family of distributions that has the same algebraic form as the likelihood function, treating the unknown parameter as the random variable.
Question
Suppose that $X_1, X_2, \ldots, X_n$ is a random sample from a Type 1 geometric distribution with parameter $p$, where $p$ is a random variable.
Determine a family of distributions for $p$ that would result in conjugate prior and posterior distributions.
Solution
Each of the random variables $X_i$ has probability function:
$$P(X = x) = p(1-p)^{x-1}, \quad x = 1, 2, 3, \ldots$$
If the observed values of $X_1, X_2, \ldots, X_n$ are $x_1, x_2, \ldots, x_n$, then the likelihood function is:
$$L(p) = \prod_{i=1}^{n} P(X_i = x_i) = \prod_{i=1}^{n} p(1-p)^{x_i - 1} = p^n (1-p)^{\sum x_i - n}$$
We know that $p$ must take a value between 0 and 1. To result in a conjugate prior, the PDF of $p$ must be of the form:
$$p^{\text{something}}(1-p)^{\text{something}}, \quad \text{for } 0 < p < 1$$
ie $p$ must have a beta distribution.
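To see the conjugacy explicitly, the update can be written out. This is a sketch under the assumption that the prior is $Beta(\alpha, \beta)$; the symbols $\alpha$ and $\beta$ are introduced here for illustration and do not appear in the question.

```latex
\begin{align*}
f_{prior}(p) &\propto p^{\alpha-1}(1-p)^{\beta-1}, \quad 0<p<1 \\
f_{post}(p) &\propto L(p)\, f_{prior}(p)
             = p^{n}(1-p)^{\sum x_i - n} \times p^{\alpha-1}(1-p)^{\beta-1} \\
            &= p^{\,n+\alpha-1}(1-p)^{\sum x_i - n + \beta - 1} \\
\Rightarrow \quad p \mid \underline{x} &\sim Beta\left(\alpha + n,\ \beta + \textstyle\sum x_i - n\right)
\end{align*}
```

Both the prior and the posterior are beta distributions, so under this assumption the beta family is conjugate for the Type 1 geometric likelihood.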
Using conjugate distributions often makes Bayesian calculations simpler. Conjugate distributions may also be appropriate to use where there is a family of distributions that might be expected to provide a natural model for the unknown parameter, eg in the previous example where the probability parameter $p$ had to lie in the range $0 < p < 1$ (which is the range of values over which the beta distribution is defined).
Uninformative prior distributions
An uninformative prior distribution assumes that an unknown parameter is equally likely to take any value from a given set. In other words, the parameter is modelled using a uniform distribution.
As an example, suppose that we have a random sample $X_1, X_2, \ldots, X_n$ from a normal population with mean $\mu$, but we have no prior information about $\mu$. In this case it would be natural to model $\mu$ using a uniform distribution. Since $\mu$ can take any value between $-\infty$ and $\infty$, the appropriate uniform prior is $U(-\infty, \infty)$. This leads to a problem, however, since the PDF of this distribution is 0 everywhere.
We can get round this problem by using the distribution $U(-m, m)$ and then letting $m \to \infty$. If $\mu \sim U(-m, m)$, then the prior PDF of $\mu$ is:
$$f_{prior}(\mu) = \begin{cases} \dfrac{1}{2m} & \text{if } -m < \mu < m \\ 0 & \text{otherwise} \end{cases}$$
Also, since the data values come from a normal population, the likelihood function is:
$$L(\mu) = \prod_{i=1}^{n} f(x_i) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x_i - \mu}{\sigma}\right)^2\right]$$
As usual, we are using $\sigma$ to denote the population standard deviation. The formula for the PDF of $N(\mu, \sigma^2)$ is given on page 11 of the Tables.
The likelihood function can alternatively be expressed as:
$$L(\mu) = C \exp\left[-\frac{1}{2}\sum_{i=1}^{n}\left(\frac{x_i - \mu}{\sigma}\right)^2\right]$$
where $C$ is a constant that does not depend on $\mu$.
Combining the prior PDF with the likelihood function gives:
$$f_{post}(\mu) = \begin{cases} K \exp\left[-\dfrac{1}{2}\displaystyle\sum_{i=1}^{n}\left(\dfrac{x_i - \mu}{\sigma}\right)^2\right] & \text{if } -m < \mu < m \\ 0 & \text{otherwise} \end{cases}$$
where $K$ is also a constant that does not depend on $\mu$. This constant is required to ensure that the PDF integrates to 1.
Letting $m \to \infty$, we see that the posterior PDF is proportional to:
$$\exp\left[-\frac{1}{2}\sum_{i=1}^{n}\left(\frac{x_i - \mu}{\sigma}\right)^2\right], \quad \text{for } -\infty < \mu < \infty$$
Notice that the PDF of this posterior distribution is proportional to the likelihood function. This should be intuitive as, by definition, a posterior distribution is obtained by combining two pieces of information:
prior knowledge of the parameter, and
the sample data.
However, in this case we are using an uninformative prior as we have no prior knowledge of the parameter. The posterior distribution is therefore determined solely by the sample data.
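The posterior distribution in this example can be identified by completing the square in $\mu$. The following is a sketch of that step, which the text leaves implicit; $\bar{x}$ denotes the sample mean, and it is assumed (as in the example) that $\sigma$ is known.

```latex
\begin{align*}
\sum_{i=1}^{n}(x_i-\mu)^2
  &= \sum_{i=1}^{n}(x_i-\bar{x})^2 + n(\bar{x}-\mu)^2 \\
\Rightarrow \quad
\exp\left[-\frac{1}{2}\sum_{i=1}^{n}\left(\frac{x_i-\mu}{\sigma}\right)^2\right]
  &\propto \exp\left[-\frac{1}{2}\,\frac{(\mu-\bar{x})^2}{\sigma^2/n}\right] \\
\Rightarrow \quad
\mu \mid \underline{x} &\sim N\left(\bar{x},\ \frac{\sigma^2}{n}\right)
\end{align*}
```

The term $\sum (x_i - \bar{x})^2$ does not involve $\mu$, so it is absorbed into the constant of proportionality.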
2.3 Discrete prior distributions
When the prior distribution is discrete, the posterior distribution is also discrete. To determine the posterior distribution, we must calculate a set of conditional probabilities. This can be done using Bayes' formula.
Question
The number of claims received per week from a certain portfolio has a Poisson distribution with mean $\lambda$. The prior distribution of $\lambda$ is as follows:
$\lambda$            1     2     3
Prior probability    0.3   0.5   0.2
Given that 3 claims were received last week, determine the posterior distribution of $\lambda$.
Solution
Let $X$ be the number of claims received in a week. To determine the posterior distribution of $\lambda$, we must calculate the conditional probabilities $P(\lambda = 1 \mid X = 3)$, $P(\lambda = 2 \mid X = 3)$ and $P(\lambda = 3 \mid X = 3)$. The first of these is:
$$P(\lambda = 1 \mid X = 3) = \frac{P(\lambda = 1, X = 3)}{P(X = 3)} = \frac{P(X = 3 \mid \lambda = 1)\,P(\lambda = 1)}{P(X = 3)}$$
Since $X \mid \lambda \sim Poisson(\lambda)$:
$$P(X = 3 \mid \lambda = 1) = \frac{e^{-1}1^3}{3!} = \frac{1}{6}e^{-1}$$
and, from the given prior distribution, we know that:
$$P(\lambda = 1) = 0.3$$
So:
$$P(\lambda = 1 \mid X = 3) = \frac{\frac{1}{6}e^{-1} \times 0.3}{P(X = 3)} = \frac{\frac{1}{20}e^{-1}}{P(X = 3)}$$
Similarly:
$$P(\lambda = 2 \mid X = 3) = \frac{P(X = 3 \mid \lambda = 2)\,P(\lambda = 2)}{P(X = 3)} = \frac{\frac{e^{-2}2^3}{3!} \times 0.5}{P(X = 3)} = \frac{\frac{2}{3}e^{-2}}{P(X = 3)}$$
and:
$$P(\lambda = 3 \mid X = 3) = \frac{P(X = 3 \mid \lambda = 3)\,P(\lambda = 3)}{P(X = 3)} = \frac{\frac{e^{-3}3^3}{3!} \times 0.2}{P(X = 3)} = \frac{\frac{9}{10}e^{-3}}{P(X = 3)}$$
Since these conditional probabilities must sum to 1, the denominator must be the sum of the three numerators, ie:
$$P(X = 3) = \frac{1}{20}e^{-1} + \frac{2}{3}e^{-2} + \frac{9}{10}e^{-3} = 0.15343$$
This can also be seen using the law of total probability:
$$P(X = 3) = P(X = 3, \lambda = 1) + P(X = 3, \lambda = 2) + P(X = 3, \lambda = 3)$$
$$= P(X = 3 \mid \lambda = 1)P(\lambda = 1) + P(X = 3 \mid \lambda = 2)P(\lambda = 2) + P(X = 3 \mid \lambda = 3)P(\lambda = 3)$$
So the posterior probabilities are:
$$P(\lambda = 1 \mid X = 3) = \frac{\frac{1}{20}e^{-1}}{0.15343} = 0.11989$$
$$P(\lambda = 2 \mid X = 3) = \frac{\frac{2}{3}e^{-2}}{0.15343} = 0.58806$$
$$P(\lambda = 3 \mid X = 3) = \frac{\frac{9}{10}e^{-3}}{0.15343} = 0.29205$$
Alternatively, we could use a proportionality argument to determine the posterior probabilities. The posterior probabilities are proportional to the likelihood multiplied by the prior probability:
$$P(\lambda = 1 \mid X = 3) \propto P(X = 3 \mid \lambda = 1)P(\lambda = 1) = \frac{e^{-1}1^3}{3!} \times 0.3 = \frac{1}{20}e^{-1} = 0.01839$$
$$P(\lambda = 2 \mid X = 3) \propto P(X = 3 \mid \lambda = 2)P(\lambda = 2) = \frac{e^{-2}2^3}{3!} \times 0.5 = \frac{2}{3}e^{-2} = 0.09022$$
$$P(\lambda = 3 \mid X = 3) \propto P(X = 3 \mid \lambda = 3)P(\lambda = 3) = \frac{e^{-3}3^3}{3!} \times 0.2 = \frac{9}{10}e^{-3} = 0.04481$$
Rescaling so the probabilities sum to 1 we get:
$$P(\lambda = 1 \mid X = 3) = \frac{0.01839}{0.01839 + 0.09022 + 0.04481} = 0.11989$$
$$P(\lambda = 2 \mid X = 3) = \frac{0.09022}{0.01839 + 0.09022 + 0.04481} = 0.58806$$
$$P(\lambda = 3 \mid X = 3) = \frac{0.04481}{0.01839 + 0.09022 + 0.04481} = 0.29205$$
The posterior probability that $\lambda = 1$ is lower than the corresponding prior probability. The other two posterior probabilities are higher than their corresponding prior probabilities. This is to be expected given that the observed number of claims was 3.
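The proportionality argument translates directly into a few lines of R. This is a minimal sketch, not part of the Core Reading; the variable names are illustrative.

```r
# Possible values of the Poisson mean and their prior probabilities
lambda <- c(1, 2, 3)
prior <- c(0.3, 0.5, 0.2)

# Likelihood of observing 3 claims in a week under each value of lambda
likelihood <- dpois(3, lambda)

# Posterior is proportional to likelihood multiplied by prior
numerators <- likelihood * prior

# Rescale so that the posterior probabilities sum to 1
posterior <- numerators / sum(numerators)

print(round(sum(numerators), 5))   # P(X = 3)
print(round(posterior, 5))
```

Here `sum(numerators)` plays the role of $P(X = 3)$ in the law of total probability.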
Once we have determined the posterior distribution of a parameter, we can use this distribution to estimate the parameter value. As we are about to see, the estimate will depend on the chosen loss function.
3 The loss function
To obtain an estimator of $\theta$, a loss function must first be specified. This is a measure of the loss incurred when $g(\underline{X})$ is used as an estimator of $\theta$. A loss function is sought which is zero when the estimation is exactly correct, that is, $g(\underline{X}) = \theta$, and which is positive and does not decrease as $g(\underline{X})$ gets further away from $\theta$. There is one very commonly used loss function, called quadratic or squared error loss. Two others are also used in practice.
Then the Bayesian estimator is the $g(\underline{X})$ that minimises the expected loss with respect to the posterior distribution.
The main loss function is quadratic loss defined by:
$$L(g(\underline{x}), \theta) = [g(\underline{x}) - \theta]^2$$
So, when using quadratic loss, the aim is to minimise:
$$E\left[(g(\underline{x}) - \theta)^2\right] = \int (g(\underline{x}) - \theta)^2 f_{post}(\theta)\,d\theta$$
This is related to mean square error from classical statistics.
Recall that, if $\hat{\theta}$ is an estimator of $\theta$, then:
$$MSE(\hat{\theta}) = E\left[(\hat{\theta} - \theta)^2\right] = var(\hat{\theta}) + \left[bias(\hat{\theta})\right]^2$$
The formula for the squared error loss implies that as we move further away from the true parameter value, the loss increases at an increasing rate. The graph of the loss function is a parabola with a minimum of zero at the true parameter value.
[Graph: loss plotted against $g$; a parabola with its minimum of zero at $g = \theta$.]
A second loss function is absolute error loss defined by:
$$L(g(\underline{x}), \theta) = |g(\underline{x}) - \theta|$$
Here the graph of the loss function is two straight lines that meet at the point $(\theta, 0)$. As we move away from the true value in either direction, the loss increases at a constant rate.
[Graph: loss plotted against $g$; two straight lines forming a V that meet at $(\theta, 0)$.]
A third loss function is 0/1 or all-or-nothing loss defined by:
$$L(g(\underline{x}), \theta) = \begin{cases} 0 & \text{if } g(\underline{x}) = \theta \\ 1 & \text{if } g(\underline{x}) \neq \theta \end{cases}$$
In this case there is a constant loss of 1 for any parameter estimate that is not equal to the true underlying parameter value. If we hit the parameter value exactly, then the loss is zero.
[Graph: loss plotted against $g$; a constant loss of 1 everywhere except at $g = \theta$, where the loss is 0.]
The Bayesian estimator that arises by minimising the expected loss for each of these loss functions in turn is the mean, median and mode, respectively, of the posterior distribution, each of which is a measure of location of the posterior distribution.
We will prove these results shortly.
The expected posterior loss is:
$$EPL = E[L(g(\underline{x}), \theta)] = \int L(g(\underline{x}), \theta)\,f(\theta \mid \underline{x})\,d\theta$$
The lower limit of the integral is the smallest possible value of $\theta$ and the upper limit is the largest possible value of $\theta$.
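The link between the loss functions and the mean, median and mode can be illustrated numerically before it is proved. The R sketch below is not part of the Core Reading. It assumes a hypothetical $Gamma(3, 1)$ posterior, chosen purely for illustration, truncates the integral at 40 (beyond which this posterior has negligible probability) and minimises the expected posterior loss with `optimize`. The function names `post_pdf`, `epl`, `quadratic` and `absolute` are illustrative.

```r
# Hypothetical posterior, chosen purely for illustration: Gamma(shape 3, rate 1)
post_pdf <- function(theta) dgamma(theta, shape = 3, rate = 1)

# Expected posterior loss of the estimate g under a given loss function
epl <- function(g, loss) {
  integrand <- function(theta) loss(g, theta) * post_pdf(theta)
  # Split the range at g so that each piece of the integrand is smooth
  integrate(integrand, 0, g)$value + integrate(integrand, g, 40)$value
}

quadratic <- function(g, theta) (g - theta)^2
absolute  <- function(g, theta) abs(g - theta)

# Values of g that minimise the expected posterior loss
g_quad <- optimize(epl, interval = c(0.1, 10), loss = quadratic)$minimum
g_abs  <- optimize(epl, interval = c(0.1, 10), loss = absolute)$minimum

# For all-or-nothing loss the estimate is the point of maximum posterior density
g_mode <- optimize(post_pdf, interval = c(0.1, 10), maximum = TRUE)$maximum

# Compare with the posterior mean, median and mode
print(c(g_quad, 3 / 1))
print(c(g_abs, qgamma(0.5, shape = 3, rate = 1)))
print(c(g_mode, (3 - 1) / 1))
```

Each pair of printed values should agree to within the numerical tolerance of `optimize`.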
3.1 Quadratic loss
For simplicity, $g$ will be written instead of $g(\underline{x})$.
In other words, we are assuming that $g$ is our estimate of $\theta$.
So:
$$EPL = \int (g - \theta)^2 f(\theta \mid \underline{x})\,d\theta$$
We want to determine the value of $g$ that minimises the EPL, so we differentiate the EPL with respect to $g$. Using the formula for differentiating an integral (which is given on page 3 of the Tables), we see that:
$$\frac{d}{dg}EPL = 2\int (g - \theta) f(\theta \mid \underline{x})\,d\theta$$
Equating to zero:
$$g\int f(\theta \mid \underline{x})\,d\theta = \int \theta f(\theta \mid \underline{x})\,d\theta$$
But $\int f(\theta \mid \underline{x})\,d\theta = 1$.
This is because $f(\theta \mid \underline{x})$ is the PDF of the posterior distribution. Integrating the PDF over all possible values of $\theta$ gives the value 1.
So:
$$g = \int \theta f(\theta \mid \underline{x})\,d\theta = E(\theta \mid \underline{x})$$
Clearly this minimises EPL.
We can see this from the graph of the loss function or by differentiating the EPL a second time:
$$\frac{d^2}{dg^2}EPL = 2\int f(\theta \mid \underline{x})\,d\theta = 2 > 0 \quad \Rightarrow \quad \text{min}$$
Therefore the Bayesian estimator under quadratic loss is the mean of the posterior distribution.
Question
For the estimation of a binomial probability $\theta$ from a single observation $x$ of the random variable $X$ with the prior distribution of $\theta$ being beta with parameters $\alpha$ and $\beta$, investigate the form of the posterior distribution of $\theta$ and determine the Bayesian estimate of $\theta$ under quadratic loss.
Solution
The proportionality argument will be used and any constants simply omitted as appropriate.
Prior:
$$f(\theta) \propto \theta^{\alpha-1}(1-\theta)^{\beta-1}$$
omitting the constant $\dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}$.
Likelihood:
$$f(x \mid \theta) \propto \theta^{x}(1-\theta)^{n-x}$$
omitting the constant $\dbinom{n}{x}$.
Combining the prior PDF with the likelihood function gives the posterior PDF:
$$f(\theta \mid x) \propto \theta^{x}(1-\theta)^{n-x} \cdot \theta^{\alpha-1}(1-\theta)^{\beta-1} = \theta^{x+\alpha-1}(1-\theta)^{n-x+\beta-1}$$
Now it can be seen that, apart from the appropriate constant of proportionality, this is the density of a beta random variable. Therefore the immediate conclusion is that the posterior distribution of $\theta$ given $X = x$ is beta with parameters $x + \alpha$ and $n - x + \beta$.
It can also be seen that the posterior density and the prior density belong to the same family of distributions. Thus the conjugate prior for the binomial distribution is the beta distribution.
The Bayesian estimate under quadratic loss is the mean of this distribution, that is:
$$\frac{x+\alpha}{(x+\alpha)+(n-x+\beta)} = \frac{x+\alpha}{n+\alpha+\beta}$$
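We can use R to simulate this Bayesian estimate.
The R code to obtain the Monte Carlo Bayesian estimate of the above is:
    # M (the number of simulations), n, alpha and beta must be defined beforehand
    pm <- rep(0, M)
    for (i in 1:M) {
      theta <- rbeta(1, alpha, beta)             # simulate theta from the beta prior
      x <- rbinom(1, n, theta)                   # simulate one binomial observation
      pm[i] <- (x + alpha) / (n + alpha + beta)  # posterior mean given x
    }
The average of these Bayesian estimates under quadratic loss is given by:
    mean(pm)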
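For a concrete run, values must be supplied for `M`, `n`, `alpha` and `beta`. The sketch below uses hypothetical values (they are assumptions, not taken from the Core Reading) and a vectorised form of the same simulation. Because each $\theta$ is drawn from the prior, the average of the estimates should be close to the prior mean $\alpha/(\alpha+\beta)$.

```r
set.seed(1)   # fixed seed so that the run is reproducible

# Hypothetical values, chosen purely for illustration
M <- 10000    # number of simulations
n <- 20       # number of binomial trials
alpha <- 2    # first parameter of the beta prior
beta <- 3     # second parameter of the beta prior

# Vectorised version of the loop: M prior draws, one observation for each
theta <- rbeta(M, alpha, beta)
x <- rbinom(M, n, theta)

# Bayesian estimate under quadratic loss for each simulated observation
pm <- (x + alpha) / (n + alpha + beta)

print(mean(pm))                 # average of the simulated estimates
print(alpha / (alpha + beta))   # prior mean, for comparison
```

The vectorised calls replace the explicit `for` loop but generate the same kind of sample.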
Question
A random sample of size 10 from a Poisson distribution with mean $\lambda$ yields the following data values:
3, 4, 3, 1, 5, 5, 2, 3, 3, 2
The prior distribution of $\lambda$ is $Gamma(5,2)$.
Calculate the Bayesian estimate of $\lambda$ under squared error loss.
Solution
Using the formula for the PDF of the gamma distribution given on page 12 of the Tables, we see that the prior PDF of $\lambda$ is:
$$f_{prior}(\lambda) = \frac{2^5}{\Gamma(5)}\lambda^4 e^{-2\lambda}, \quad \lambda > 0$$
Alternatively, we could say:
$$f_{prior}(\lambda) \propto \lambda^4 e^{-2\lambda}, \quad \lambda > 0$$
The likelihood function obtained from the data is:
$$L(\lambda) = P(X_1 = 3)\,P(X_2 = 4) \cdots P(X_{10} = 2)$$
where $X_1, \ldots, X_{10}$ are independent $Poisson(\lambda)$ random variables. So:
$$L(\lambda) = \frac{e^{-\lambda}\lambda^3}{3!} \times \frac{e^{-\lambda}\lambda^4}{4!} \times \cdots \times \frac{e^{-\lambda}\lambda^2}{2!} = C e^{-10\lambda}\lambda^{31}$$
where $C$ is a constant. (31 is the sum of the observed data values.)
Combining the prior distribution and the sample data, we see that:
$$f_{post}(\lambda) \propto \lambda^{35} e^{-12\lambda}, \quad \lambda > 0$$
Comparing this with the formula for the gamma PDF, we see that the posterior distribution of $\lambda$ is $Gamma(36,12)$. The Bayesian estimate of $\lambda$ under squared error loss is the mean of this gamma distribution, ie $\frac{36}{12} = 3$.
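The update from prior to posterior parameters in this solution can be written as a short R calculation. This is a minimal sketch, not part of the Core Reading; the variable names are illustrative.

```r
# Observed data values
x <- c(3, 4, 3, 1, 5, 5, 2, 3, 3, 2)

# Parameters of the Gamma(5, 2) prior
prior_shape <- 5
prior_rate <- 2

# Poisson likelihood with a gamma prior: add the sum of the data to the
# shape parameter and the sample size to the rate parameter
post_shape <- prior_shape + sum(x)
post_rate <- prior_rate + length(x)

# Bayesian estimate under squared error loss: the posterior mean
post_mean <- post_shape / post_rate

print(c(post_shape, post_rate, post_mean))
```

The posterior mean of 3 lies between the prior mean (5/2 = 2.5) and the sample mean (31/10 = 3.1).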
3.2 Absolute error loss
Again, $g$ will be written instead of $g(\underline{x})$. So:
$$EPL = \int |g - \theta|\,f(\theta \mid \underline{x})\,d\theta$$
Assuming the range for $\theta$ is $(-\infty, \infty)$, then:
$$EPL = \int_{-\infty}^{g} (g - \theta) f(\theta \mid \underline{x})\,d\theta + \int_{g}^{\infty} (\theta - g) f(\theta \mid \underline{x})\,d\theta$$
We need to split the integral into two parts so that we can remove the modulus sign. The first integral covers the interval where $\theta \le g$. Here $|g - \theta| = g - \theta$. The second integral covers the interval where $\theta \ge g$. Here $|g - \theta| = \theta - g$.
Again, we want to determine the value of $g$ that minimises the EPL, so we differentiate the EPL with respect to $g$.
[Recall that:
$$\frac{d}{dy}\int_{a(y)}^{b(y)} f(x,y)\,dx = \int_{a(y)}^{b(y)} \frac{\partial}{\partial y}f(x,y)\,dx + b'(y)\,f(b(y),y) - a'(y)\,f(a(y),y)]$$
(This is the formula for differentiating an integral, given on page 3 of the Tables.)
Replacing $x$ by $\theta$ and $y$ by $g$ in the formula for differentiating an integral, we see that:
$$\frac{d}{dg}\int_{-\infty}^{g} (g - \theta) f(\theta \mid \underline{x})\,d\theta = \int_{-\infty}^{g} f(\theta \mid \underline{x})\,d\theta + (g - g) f(g \mid \underline{x}) - 0 = \int_{-\infty}^{g} f(\theta \mid \underline{x})\,d\theta$$
and:
$$\frac{d}{dg}\int_{g}^{\infty} (\theta - g) f(\theta \mid \underline{x})\,d\theta = \int_{g}^{\infty} (-1) f(\theta \mid \underline{x})\,d\theta + 0 - (g - g) f(g \mid \underline{x}) = -\int_{g}^{\infty} f(\theta \mid \underline{x})\,d\theta$$
So:
$$\frac{d}{dg}EPL = \int_{-\infty}^{g} f(\theta \mid \underline{x})\,d\theta - \int_{g}^{\infty} f(\theta \mid \underline{x})\,d\theta$$
Equating to zero:
$$\int_{-\infty}^{g} f(\theta \mid \underline{x})\,d\theta = \int_{g}^{\infty} f(\theta \mid \underline{x})\,d\theta$$
that is, $P(\theta \le g) = P(\theta \ge g)$, which specifies the median of the posterior distribution.
Recall that the median of a continuous distribution is the value of $M$ that divides the distribution into two parts, with a 50% probability of being less than (or equal to) $M$ and a 50% probability of being greater than (or equal to) $M$.
Question
A random sample of size 10 from a Poisson distribution with mean $\lambda$ yields the following data values:
3, 4, 3, 1, 5, 5, 2, 3, 3, 2
The prior distribution of $\lambda$ is $Gamma(5,2)$. Calculate the Bayesian estimate of $\lambda$ under absolute error loss.
Solution
From the solution to the previous question, we know that the posterior distribution of $\lambda$ is $Gamma(36,12)$. The Bayesian estimate of $\lambda$ under absolute error loss is the median of this distribution, which can be obtained very quickly using R. The command qgamma(0.5,36,12) gives the answer to be 2.972268.
We use the R command q to calculate the percentiles of a distribution. We follow q with the name of the distribution. Here we want the median, or the 50th percentile, so the first argument is 0.5. The second and third arguments are the parameters of the gamma distribution.
Alternatively, the median could be calculated (approximately) using the Tables. To do this, we have to use the relationship between the gamma distribution and the chi-squared distribution (which is given in the Tables on page 12). Here we know that:
$$\lambda \mid \underline{x} \sim Gamma(36,12)$$
For notational convenience, let $W = \lambda \mid \underline{x}$. Then:
$$W \sim Gamma(36,12) \quad \Rightarrow \quad 2 \times 12\,W = 24W \sim \chi^2_{72}$$
The median of the posterior distribution is the value of $M$ such that:
$$P(W < M) = 0.5$$
or equivalently:
$$P(\chi^2_{72} < 24M) = 0.5$$
From page 169 of the Tables, we see that the 50th percentile of $\chi^2_{70}$ is 69.33 and the 50th percentile of $\chi^2_{80}$ is 79.33. Interpolating between these values, we find that the 50th percentile of $\chi^2_{72}$ is approximately:
$$(1 - 0.2) \times 69.33 + 0.2 \times 79.33 = 71.33$$
So:
$$24M \approx 71.33$$
Hence:
$$M \approx 2.972$$
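Both routes to the median can be compared in R. The sketch below is not part of the Core Reading: it evaluates the gamma quantile directly, obtains the same value through the chi-squared relationship, and repeats the interpolation from the Tables. The variable names are illustrative.

```r
# Median of the Gamma(36, 12) posterior, calculated directly
median_exact <- qgamma(0.5, shape = 36, rate = 12)

# Same median via the chi-squared relationship: 24W has a chi-squared
# distribution with 72 degrees of freedom
median_chisq <- qchisq(0.5, df = 72) / 24

# Approximation from the Tables: interpolate between 70 and 80 degrees
# of freedom, then divide by 24
median_tables <- ((1 - 0.2) * 69.33 + 0.2 * 79.33) / 24

print(c(median_exact, median_chisq, median_tables))
```

The first two values should coincide, and the interpolated value should agree with them to three decimal places.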
3.3 All-or-nothing loss
Here the differentiation approach cannot be used. Instead a direct approach will be used with a limiting argument.
Consider:
$$L(g(\underline{x}), \theta) = \begin{cases} 0 & \text{if } g - \varepsilon < \theta < g + \varepsilon \\ 1 & \text{otherwise} \end{cases}$$
so that, in the limit as $\varepsilon \to 0$, this tends to the required loss function.
Then the expected posterior loss is:
$$EPL = 1 - \int_{g-\varepsilon}^{g+\varepsilon} f(\theta \mid \underline{x})\,d\theta \approx 1 - 2\varepsilon\,f(g \mid \underline{x}) \quad \text{for small } \varepsilon$$
This is saying that, for a narrow strip, the area under the function is approximately equal to the area of a rectangle whose width is $2\varepsilon$ and whose height is equal to the average value of the function over that strip.
Again, the Bayesian estimate is the value of $g$ that minimises the EPL. To minimise the EPL, we need to maximise $2\varepsilon\,f(g \mid \underline{x})$. This occurs when $f(g \mid \underline{x})$ is maximised, ie at the mode of the posterior distribution.
The EPL is minimised by taking $g$ to be the mode of $f(\theta \mid \underline{x})$.
Question
A random sample of size 10 from a Poisson distribution with mean $\lambda$ yields the following data values:
3, 4, 3, 1, 5, 5, 2, 3, 3, 2
The prior distribution of $\lambda$ is $Gamma(5,2)$. Calculate the Bayesian estimate of $\lambda$ under all-or-nothing loss.
Solution
From a previous question, we know that the posterior distribution of $\lambda$ is $Gamma(36,12)$. The Bayesian estimate of $\lambda$ under all-or-nothing loss is the mode of this distribution. To calculate the mode, we need to differentiate the posterior PDF (or the log of this PDF) and set the derivative equal to 0.
We have already seen that:
$$f_{post}(\lambda) = C\lambda^{35}e^{-12\lambda}$$
Taking logs (to make the differentiation easier):
$$\ln f_{post}(\lambda) = \ln C + 35\ln\lambda - 12\lambda$$
Differentiating:
$$\frac{d}{d\lambda}\ln f_{post}(\lambda) = \frac{35}{\lambda} - 12$$
The derivative is equal to 0 when $\lambda = \frac{35}{12}$.
Differentiating again:
$$\frac{d^2}{d\lambda^2}\ln f_{post}(\lambda) = -\frac{35}{\lambda^2} < 0$$
Since the second derivative is negative, the posterior PDF is maximised when $\lambda = \frac{35}{12}$. So the Bayesian estimate of $\lambda$ under all-or-nothing loss is $\frac{35}{12}$ or 2.917.
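The mode can also be located numerically, which gives a quick check on the calculus. The R sketch below is not part of the Core Reading; it maximises the log of the posterior PDF (omitting the constant $\ln C$) over an assumed search interval of (0.1, 10). The names `log_post`, `mode_numeric` and `mode_exact` are illustrative.

```r
# Log of the Gamma(36, 12) posterior PDF, omitting the constant ln C
log_post <- function(lambda) 35 * log(lambda) - 12 * lambda

# Numerical maximisation over an assumed search interval
mode_numeric <- optimize(log_post, interval = c(0.1, 10), maximum = TRUE)$maximum

# Value obtained by differentiation
mode_exact <- 35 / 12

print(c(mode_numeric, mode_exact))
```

Omitting the constant does not affect the location of the maximum, since it only shifts the log PDF up or down.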
4 Some Bayesian posterior distributions
In this section we give a table of situations in which the Bayesian approach may work well. The likelihood function is given, together with the distributions of the prior and the posterior. Do not attempt to learn all the results given in this table. The results are here for reference purposes only, and you will not be expected to be able to quote all these results in the examination.
However, you may like to select one or two of the results given here and check that you can prove that the distribution of the posterior is as stated. You could use the table as a way of generating extra questions on particular Bayesian results.
The negative binomial distribution referenced here is that described in the Tables as the Type 2 negative binomial distribution. The results for the Type 2 geometric distribution can be obtained from those for the Type 2 negative binomial by setting $k = 1$. You may like to work out the corresponding results for the Type 1 negative binomial and geometric distributions.
Notice that despite the large number of examples given, the posterior distribution in all these cases turns out to be gamma, beta or normal. So, in most Bayesian questions it is worth checking whether the posterior PDF takes the form of one of these three distributions before you start thinking about other possibilities.
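Likelihood of IID random variables $X_1, \ldots, X_n$ | Unknown parameter | Prior distribution of parameter | Posterior distribution of parameter
$Poisson(\lambda)$ | $\lambda > 0$ | $U(0, \infty)$ | $Gamma(\sum x_i + 1,\ n)$
$Poisson(\lambda)$ | $\lambda > 0$ | $Exp(\lambda')$ | $Gamma(\sum x_i + 1,\ n + \lambda')$
$Poisson(\lambda)$ | $\lambda > 0$ | $Gamma(\alpha', \lambda')$ | $Gamma(\sum x_i + \alpha',\ n + \lambda')$
$Exp(\lambda)$ | $\lambda > 0$ | $U(0, \infty)$ | $Gamma(n + 1,\ \sum x_i)$
$Exp(\lambda)$ | $\lambda > 0$ | $Exp(\lambda')$ | $Gamma(n + 1,\ \sum x_i + \lambda')$
$Exp(\lambda)$ | $\lambda > 0$ | $Gamma(\alpha', \lambda')$ | $Gamma(n + \alpha',\ \sum x_i + \lambda')$
$Gamma(\alpha, \lambda)$ | $\lambda > 0$ | $U(0, \infty)$ | $Gamma(n\alpha + 1,\ \sum x_i)$
$Gamma(\alpha, \lambda)$ | $\lambda > 0$ | $Exp(\lambda')$ | $Gamma(n\alpha + 1,\ \sum x_i + \lambda')$
$Gamma(\alpha, \lambda)$ | $\lambda > 0$ | $Gamma(\alpha', \lambda')$ | $Gamma(n\alpha + \alpha',\ \sum x_i + \lambda')$
$N(\mu, \sigma^2)$ | $-\infty < \mu < \infty$ | $U(-\infty, \infty)$ | $N\left(\bar{x},\ \dfrac{\sigma^2}{n}\right)$
$N(\mu, \sigma^2)$ | $-\infty < \mu < \infty$ | $N(\mu', \sigma'^2)$ | $N\left(\dfrac{\frac{n\bar{x}}{\sigma^2} + \frac{\mu'}{\sigma'^2}}{\frac{n}{\sigma^2} + \frac{1}{\sigma'^2}},\ \dfrac{1}{\frac{n}{\sigma^2} + \frac{1}{\sigma'^2}}\right)$
$LogN(\mu, \sigma^2)$ | $-\infty < \mu < \infty$ | $U(-\infty, \infty)$ | $N\left(\overline{\log x},\ \dfrac{\sigma^2}{n}\right)$
$LogN(\mu, \sigma^2)$ | $-\infty < \mu < \infty$ | $N(\mu', \sigma'^2)$ | $N\left(\dfrac{\frac{n\,\overline{\log x}}{\sigma^2} + \frac{\mu'}{\sigma'^2}}{\frac{n}{\sigma^2} + \frac{1}{\sigma'^2}},\ \dfrac{1}{\frac{n}{\sigma^2} + \frac{1}{\sigma'^2}}\right)$
$Bin(m, p)$ | $0 < p < 1$ | $U(0, 1)$ | $Beta(\sum x_i + 1,\ nm - \sum x_i + 1)$
$Bin(m, p)$ | $0 < p < 1$ | $Beta(\alpha', \beta')$ | $Beta(\sum x_i + \alpha',\ nm - \sum x_i + \beta')$
$NB(k, p)$ | $0 < p < 1$ | $U(0, 1)$ | $Beta(nk + 1,\ \sum x_i + 1)$
$NB(k, p)$ | $0 < p < 1$ | $Beta(\alpha', \beta')$ | $Beta(nk + \alpha',\ \sum x_i + \beta')$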
CS1-14: Bayesian statistics
5
Page 25
CredibleIntervals
Having derived the posterior distribution of a parameter $\theta$, there are several ways in which we can summarise inferences about $\theta$. For single parameters, a plot of the posterior density is very informative and shows clearly the range of values consistent with our posterior beliefs.

In Section 5.1 below, the Core Reading considers a numerical example where the posterior distribution is Gamma(15,5.3). A plot of the PDF of this distribution is given below:

As described earlier, we can also quote quantities such as the posterior mean of a parameter or the posterior variance.

For the Gamma(15,5.3) distribution pictured above, the mean is 2.83, the variance is 0.534 and the standard deviation is 0.731.

For expressing and quantifying uncertainty about the values of $\theta$, a natural analogue of the classical confidence interval is the Bayesian credible interval.

In Chapter 8, we saw how to estimate parameters using the method of moments and the method of maximum likelihood. In Chapter 9, we used confidence intervals to express the uncertainty in these estimates. Earlier in this chapter, we estimated a parameter using the mean, mode or median of its posterior distribution. We will now explain how to express the uncertainty in these estimates.
Suppose that, given data $x$, we derive the posterior density of $\theta$ as $f(\theta \mid x)$. Then, for $0 < \alpha < 1$, a $100(1-\alpha)\%$ credible interval for $\theta$ is a region of $\theta$, say $A$, which is such that:

$$P(\theta \in A \mid x) = \int_A f(\theta \mid x)\, d\theta = 1 - \alpha$$

So, a $100(1-\alpha)\%$ credible interval is an interval whose posterior probability of containing $\theta$ is $1-\alpha$.

5.1 Equal-tailed credible intervals
Often, we quote an equal-tailed credible interval, obtained by using the $100(\alpha/2)\%$ and $100(1-\alpha/2)\%$ critical points of the posterior distribution. For example, with $\alpha = 0.05$, the 2.5% and 97.5% critical points of the posterior distribution would give a 95% credible interval.

This is similar to the approach we used in Chapter 9 to calculate confidence intervals. If we want a two-sided 95% confidence interval, we split the remaining 5% equally between the two tails.

By definition, an equal-tailed credible interval must contain the median of the posterior distribution, ie the posterior estimate for $\theta$ under absolute loss.

To calculate equal-tailed credible intervals for a parameter we need the cumulative distribution function of its posterior distribution. When the posterior distribution has a convenient form, such as a normal, beta or gamma distribution, we can usually use statistical tables, or standard functions from a computer package such as R to do the calculations.

There are no tables for the beta distribution in the Tables, so we have to use R to obtain credible intervals based on a beta posterior distribution. We can, however, use the standard normal tables for a normal posterior, and the chi-square tables, along with the gamma-chi relationship, for a gamma posterior.
Example

Suppose that, given data $x$, the posterior distribution of the parameter $\theta$ is a gamma distribution with parameters 15 and 5.3, ie $\theta \mid x \sim \text{Gamma}(15, 5.3)$. For an equal-tailed 90% credible interval of $\theta$, we need the 5% and 95% critical points of the Gamma(15,5.3) distribution.

In R we can use:
qgamma(0.05,15,5.3)
qgamma(0.95,15,5.3)
to obtain the 90% equal-tailed credible interval as (1.74,4.13).
Notice that, in this case, we can also use the relationship between the gamma and the chi-square distribution to calculate the interval. In particular, we have:

$$(2 \times 5.3)\,\theta \mid x = 10.6\,\theta \mid x \sim \text{Gamma}(15, 1/2), \quad \text{ie } 10.6\,\theta \mid x \sim \chi^2_{30}$$

From statistical tables, we have that the 5% and 95% critical points of the $\chi^2_{30}$ distribution are 18.49 and 43.77, respectively. So a 90% equal-tailed credible interval for $10.6\,\theta \mid x$ is (18.49, 43.77), and therefore a 90% equal-tailed credible interval for $\theta \mid x$ is:

$$\left(\frac{18.49}{10.6}, \frac{43.77}{10.6}\right) = (1.74, 4.13)$$

exactly as before.
We can similarly obtain a 95% equal-tailed credible interval for $\theta \mid x$:

The credible interval is (1.58, 4.43). 95% of the distribution (the shaded area in the diagram above) lies between these values, with 2.5% on either side. The areas under the graph in the two tails are equal, ie $P(\theta < 1.58 \mid x) = P(\theta > 4.43 \mid x) = 0.025$.
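The 95% interval can be reproduced in the same way as the 90% interval in the example. The following is a minimal sketch in base R; the variable names are illustrative only, and the second calculation simply restates the gamma-chi-square relationship described above.

```r
# Posterior from the example: theta | x ~ Gamma(15, 5.3)
shape <- 15
rate <- 5.3

# Equal-tailed 95% interval: 2.5% and 97.5% points of the posterior
ci_gamma <- qgamma(c(0.025, 0.975), shape, rate)

# Same interval via the gamma-chi-square relationship:
# 2 * rate * theta | x ~ chi-square with 2 * shape degrees of freedom
ci_chisq <- qchisq(c(0.025, 0.975), df = 2 * shape) / (2 * rate)

round(ci_gamma, 2)
round(ci_chisq, 2)
```

Both vectors should agree with the interval (1.58, 4.43) quoted above.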
Question

A random sample of size 15 from a normal distribution with mean $\mu$ and standard deviation 3 yields the following data values:

10.75 -0.29 5.37 6.68 8.77 1.69 7.12 4.89 6.45 4.27 9.37 5.68 3.87 7.70 6.98

The prior distribution of $\mu$ is $N(5, 2^2)$.

Calculate an equal-tailed 95% Bayesian credible interval for $\mu$ based on these data values. You are given that the posterior distribution of $\mu$ is $N(5.83, 0.722^2)$.
Solution

From the Tables, we have $P(-1.96 < Z < 1.96) = 0.95$. So the lower and upper 2.5% points of $N(5.83, 0.722^2)$ are:

$$5.83 - 1.96 \times 0.722 = 4.41$$

and:

$$5.83 + 1.96 \times 0.722 = 7.24$$

So an equal-tailed 95% credible interval for $\mu$ is (4.41, 7.24).
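As a check on the figures in this solution, the posterior parameters can be recomputed from the data. This is a minimal sketch in base R, assuming the standard normal-normal conjugate result (normal likelihood with known variance and a normal prior); the variable names are illustrative.

```r
# Data values from the question
x <- c(10.75, -0.29, 5.37, 6.68, 8.77, 1.69, 7.12, 4.89,
       6.45, 4.27, 9.37, 5.68, 3.87, 7.70, 6.98)
n <- length(x)
sigma <- 3        # known standard deviation of the data
prior_mean <- 5   # prior is N(5, 2^2)
prior_sd <- 2

# Posterior precision, mean and standard deviation (normal-normal model)
post_prec <- n / sigma^2 + 1 / prior_sd^2
post_mean <- (sum(x) / sigma^2 + prior_mean / prior_sd^2) / post_prec
post_sd <- sqrt(1 / post_prec)

# Equal-tailed 95% credible interval
ci <- qnorm(c(0.025, 0.975), post_mean, post_sd)
round(c(post_mean, post_sd), 3)
round(ci, 2)
```

The posterior mean and standard deviation should come out close to 5.83 and 0.722, and the interval close to (4.41, 7.24).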
5.2 Highest posterior density intervals
As an alternative to an equal-tailed credible interval, a $100(1-\alpha)\%$ highest posterior density interval for $\theta$ could be quoted. In addition to satisfying $P(\theta \in A \mid x) = 1 - \alpha$, this interval is such that the minimum density of any point within the interval $A$ is equal to or higher than the density of any point outside that interval.

The following diagram shows a 95% highest posterior density interval for $\theta \mid x$:
Calculating highest posterior density intervals for non-symmetrical distributions is not straightforward. In R, the package bayestestR has the function hdi that calculates the highest density interval for a parameter. This is beyond the scope of Subject CS1, but for interested students, the code used to generate the 95% highest posterior density interval in this example is given below:

install.packages("bayestestR")
library("bayestestR")
set.seed(3)
# simulate a large sample from the Gamma(15,5.3) posterior
x <- rgamma(100000,15,5.3)
# 95% highest density interval estimated from the sample
hdi(x,ci=0.95)

The credible interval is (1.48, 4.29). The areas under the graph in the two tails are not equal, ie $P(\theta < 1.48 \mid x) \ne P(\theta > 4.29 \mid x) \ne 0.025$, although the probabilities do sum to 5%.
For unimodal distributions (such as the gamma distribution), the two endpoints of a highest posterior density interval have the same height (ie density). In the example above:

$$f(1.48) = f(4.29) = 0.080$$
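For readers who prefer not to rely on simulation, the same interval can be approximated directly from the quantile function. This is a sketch only, not the method used in the notes: it assumes a unimodal posterior, for which the highest posterior density interval is the shortest interval with the required probability, and it uses base R's optimize to search over the probability left in the lower tail. The helper name width is hypothetical.

```r
# Posterior from the example: theta | x ~ Gamma(15, 5.3)
shape <- 15
rate <- 5.3
level <- 0.95

# Width of the interval that leaves probability p in the lower tail
width <- function(p) {
  qgamma(p + level, shape, rate) - qgamma(p, shape, rate)
}

# Search for the lower-tail probability giving the shortest interval
p_low <- optimize(width, interval = c(0, 1 - level))$minimum
hpd <- qgamma(c(p_low, p_low + level), shape, rate)

# Equal-tailed interval for comparison
equal_tailed <- qgamma(c(0.025, 0.975), shape, rate)

round(hpd, 2)
round(dgamma(hpd, shape, rate), 3)  # densities at the two endpoints
diff(hpd) < diff(equal_tailed)      # the HPD interval is the shorter one
```

The endpoints should be close to the simulated interval (1.48, 4.29), with approximately equal densities at the two ends.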
The densities of all the values in a highest posterior density interval are larger than the densities of those outside the interval (ie the graph is higher in the interval). So, a highest posterior density interval contains a collection of most likely values of the parameter $\theta$, which is a desirable property. By definition, a highest posterior density interval must contain the mode, ie the posterior estimate for $\theta$ under 0-1 loss.
For a unimodal distribution, the highest posterior density interval is the shortest interval amongst all Bayesian credible intervals. For symmetrical distributions, such as a normal posterior distribution, the equal-tailed credible interval and highest posterior density interval are identical when based on the same data set. For skewed distributions, such as the gamma and most beta posterior distributions, the highest posterior density interval is not the same as the equal-tailed interval (as we have seen in the example above involving the Gamma(15,5.3) distribution).
Chapter 14 Summary

Bayesian estimation v classical estimation

A common problem in statistics is to estimate the value of some unknown parameter $\theta$.

The classical approach to this problem is to treat $\theta$ as a fixed, but unknown, constant and use sample data to estimate its value. For example, if $\theta$ represents some population mean then its value may be estimated by a sample mean.

The Bayesian approach is to treat $\theta$ as a random variable.

Prior distribution

The prior distribution of $\theta$ represents the knowledge available about the possible values of $\theta$ before the collection of any sample data.

Likelihood function

A likelihood function, $L$, is then determined, based on a random sample $\underline{X} = (X_1, X_2, \dots, X_n)$. The likelihood function is the joint PDF (or, in the discrete case, the joint probability) of $X_1, X_2, \dots, X_n \mid \theta$.

Posterior distribution

The prior distribution and the likelihood function are combined to obtain the posterior distribution of $\theta$.

When $\theta$ is a continuous random variable:

$$f_{\text{post}}(\theta) \propto f_{\text{prior}}(\theta) \times L(\theta)$$

When $\theta$ is a discrete random variable, the posterior distribution is a set of conditional probabilities.

Conjugate distributions

For a given likelihood, if the prior distribution leads to a posterior distribution belonging to the same family as the prior, then this prior is called the conjugate prior for this likelihood.

Uninformative prior distributions

If we have no prior knowledge about $\theta$, a uniform prior distribution should be used. This is sometimes referred to as an uninformative prior distribution. When the prior distribution is uniform, the posterior PDF is proportional to the likelihood function.

Loss functions
A loss function, such as quadratic (or squared) error loss, absolute error loss or all-or-nothing (0/1) loss gives a measure of the loss incurred when $\hat{\theta}$ is used as an estimator of the true value of $\theta$. In other words, it measures the seriousness of an incorrect estimator.

Under squared error loss, the mean of the posterior distribution minimises the expected loss function.

Under absolute error loss, the median of the posterior distribution minimises the expected loss function.

Under all-or-nothing loss, the mode of the posterior distribution minimises the expected loss function.

Credible intervals

A Bayesian credible interval quantifies uncertainty about the values of parameter $\theta$. A $100(1-\alpha)\%$ credible interval is an interval whose posterior probability of containing $\theta$ is $1-\alpha$.

These can be equal-tailed intervals or highest posterior density intervals.

The endpoints of an equal-tailed 95% credible interval for $\theta$ are the lower and upper 2.5% points of the posterior distribution of $\theta$. If the posterior distribution is a standard distribution with tabulated values, we can calculate equal-tailed credible intervals algebraically.

The densities of all points within a highest posterior density interval are greater than or equal to the densities of all points that lie outside the interval. We can use R to calculate highest posterior density intervals.
Chapter 14 Practice Questions

14.1
The punctuality of trains has been investigated by considering a number of train journeys. In the sample, 60% of trains had a destination of Manchester, 20% Edinburgh and 20% Birmingham. The probabilities of a train arriving late in Manchester, Edinburgh or Birmingham are 30%, 20% and 25%, respectively.

A late train is picked at random from the group under consideration.

Calculate the probability that it terminated in Manchester.

14.2
A random variable $X$ has a Poisson distribution with mean $\lambda$, which is initially assumed to have a chi-squared distribution with 4 degrees of freedom.

Determine the posterior distribution of $\lambda$ after observing a single value $x$ of the random variable $X$.
14.3
The number of claims in a week arising from a certain group of insurance policies has a Poisson distribution with mean $\mu$. Seven claims were incurred in the last week. The prior distribution of $\mu$ is uniform on the integers 8, 10 and 12.

(i) Determine the posterior distribution of $\mu$.

(ii) Calculate the Bayesian estimate of $\mu$ under squared error loss.

14.4 Exam style
For the estimation of a population proportion $p$, a sample of $n$ is taken and yields $x$ successes. A suitable prior distribution for $p$ is beta with parameters 4 and 4.

(i) Show that the posterior distribution of $p$ given $x$ is beta and specify its parameters. [2]

11 successes are observed in a sample of size 25.

(ii) Calculate the Bayesian estimate under all-or-nothing (0/1) loss. [4]
[Total 6]

14.5 Exam style
The annual number of claims from a particular risk has a Poisson distribution with mean $\mu$. The prior distribution for $\mu$ has a gamma distribution with $\alpha = 2$ and $\lambda = 5$.

Claim numbers $x_1, \dots, x_n$ over the last $n$ years have been recorded.

(i) Show that the posterior distribution is gamma and determine its parameters. [3]

Now suppose that $n = 8$ and $\sum_{i=1}^{8} x_i = 5$.

(ii) Determine the Bayesian estimate for $\mu$ under:
(a) squared-error loss
(b) all-or-nothing loss
(c) absolute error loss. [5]

(iii) Calculate a 95% equal-tailed credible interval for $\mu$. [2]
[Total 10]

14.6 Exam style
A single observation, $x$, is drawn from a distribution with the probability density function:

$$f(x \mid \theta) = \begin{cases} \dfrac{1}{\theta} & 0 < x < \theta \\ 0 & \text{otherwise} \end{cases}$$

The prior PDF of $\theta$ is given by:

$$f(\theta) = \theta \exp(-\theta), \quad \theta > 0$$

Derive an expression in terms of $x$ for the Bayesian estimate of $\theta$ under absolute error loss. [4]

14.7 Exam style
A proportion $p$ of packets of a rather dull breakfast cereal contain an exciting toy (independently from packet to packet). An actuary has been persuaded by his children to begin buying packets of this cereal. His prior beliefs about $p$ before opening any packets are given by a uniform distribution on the interval [0,1]. It turns out the first toy is found in the $n_1$th packet of cereal.

(i) Determine the posterior distribution of $p$ after the first toy is found. [3]

A further toy was found after opening another $n_2$ packets, another toy after opening another $n_3$ packets and so on until the fifth toy was found after opening a grand total of $n_1 + n_2 + n_3 + n_4 + n_5$ packets.

(ii) Determine the posterior distribution of $p$ after the fifth toy is found. [2]

(iii) Show the Bayes estimate of $p$ under quadratic loss is not the same as the maximum likelihood estimate and comment on this result. [5]
[Total 10]

14.8 Exam style
An actuary has a tendency to be late for work. If he gets up late then he arrives at work $X$ minutes late where $X$ is exponentially distributed with mean 15. If he gets up on time then he arrives at work $Y$ minutes late where $Y$ is uniformly distributed on [0,25]. The office manager believes that the actuary gets up late one third of the time.

Calculate the posterior probability that the actuary did in fact get up late given that he arrives more than 20 minutes late at work. [5]
Chapter 14 Solutions

14.1
Let $M$ denote the event "a train chosen at random terminates in Manchester" (and let $E$ and $B$ have corresponding definitions). In addition, let $L$ denote the event "a train chosen at random runs late".

The situation can then be represented using the following tree diagram:

M (0.6): L (0.3), L' (0.7)
E (0.2): L (0.2), L' (0.8)
B (0.2): L (0.25), L' (0.75)

The required probability is:

$$P(M \mid L) = \frac{P(M \cap L)}{P(L)}$$

From the diagram, we see that:

$$P(M \cap L) = 0.6 \times 0.3 = 0.18$$

and:

$$P(L) = 0.6 \times 0.3 + 0.2 \times 0.2 + 0.2 \times 0.25 = 0.27$$

So:

$$P(M \mid L) = \frac{0.18}{0.27} = \frac{2}{3}$$

Alternatively, we can calculate the probability using Bayes' formula:

$$P(M \mid L) = \frac{P(M)P(L \mid M)}{P(M)P(L \mid M) + P(E)P(L \mid E) + P(B)P(L \mid B)} = \frac{0.6 \times 0.3}{(0.6 \times 0.3) + (0.2 \times 0.2) + (0.2 \times 0.25)} = \frac{2}{3}$$
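The Bayes' formula calculation in Solution 14.1 can be laid out as a short vector calculation. This is a minimal sketch in base R; the vector names are illustrative only.

```r
# Prior probabilities of each destination and P(late | destination)
prior <- c(M = 0.6, E = 0.2, B = 0.2)
p_late <- c(M = 0.3, E = 0.2, B = 0.25)

# Joint probabilities P(destination and late), then normalise by P(late)
joint <- prior * p_late
posterior <- joint / sum(joint)

sum(joint)      # P(L)
posterior["M"]  # P(M | L)
```

Here sum(joint) corresponds to P(L) = 0.27 and posterior["M"] to P(M | L) = 2/3.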
14.2
The prior distribution of $\lambda$ is $\chi^2_4$, which is the same as Gamma(2, 1/2). So:

$$f_{\text{prior}}(\lambda) \propto \lambda e^{-\lambda/2}$$

The likelihood function for a single observation $x$ from a Poisson($\lambda$) distribution is proportional to:

$$\lambda^x e^{-\lambda}$$

So:

$$f_{\text{post}}(\lambda) \propto \lambda e^{-\lambda/2} \times \lambda^x e^{-\lambda} = \lambda^{x+1} e^{-\frac{3}{2}\lambda}$$

Hence the posterior distribution of $\lambda$ is Gamma($x + 2$, 3/2).

14.3
(i) Posterior distribution

Let $X$ be the number of claims received in a week. To determine the posterior distribution of $\mu$, we must calculate the conditional probabilities $P(\mu = 8 \mid X = 7)$, $P(\mu = 10 \mid X = 7)$ and $P(\mu = 12 \mid X = 7)$. The first of these is:

$$P(\mu = 8 \mid X = 7) = \frac{P(\mu = 8, X = 7)}{P(X = 7)} = \frac{P(X = 7 \mid \mu = 8)P(\mu = 8)}{P(X = 7)}$$

Since $X \sim \text{Poisson}(\mu)$:

$$P(X = 7 \mid \mu = 8) = \frac{e^{-8} 8^7}{7!}$$

and since the prior distribution is uniform on the integers 8, 10 and 12:

$$P(\mu = 8) = \frac{1}{3}$$

So:

$$P(\mu = 8 \mid X = 7) = \frac{\frac{e^{-8} 8^7}{7!} \times \frac{1}{3}}{P(X = 7)} = \frac{0.04653}{P(X = 7)}$$

Similarly:

$$P(\mu = 10 \mid X = 7) = \frac{P(X = 7 \mid \mu = 10)P(\mu = 10)}{P(X = 7)} = \frac{\frac{e^{-10} 10^7}{7!} \times \frac{1}{3}}{P(X = 7)} = \frac{0.03003}{P(X = 7)}$$

and:

$$P(\mu = 12 \mid X = 7) = \frac{P(X = 7 \mid \mu = 12)P(\mu = 12)}{P(X = 7)} = \frac{\frac{e^{-12} 12^7}{7!} \times \frac{1}{3}}{P(X = 7)} = \frac{0.01456}{P(X = 7)}$$

Since these conditional probabilities must sum to 1, the denominator $P(X = 7)$ must be the sum of the numerators, ie:

$$P(X = 7) = 0.04653 + 0.03003 + 0.01456 = 0.09112$$

So the posterior probabilities are:

$$P(\mu = 8 \mid X = 7) = \frac{0.04653}{0.09112} = 0.51066$$

$$P(\mu = 10 \mid X = 7) = \frac{0.03003}{0.09112} = 0.32954$$

$$P(\mu = 12 \mid X = 7) = \frac{0.01456}{0.09112} = 0.15980$$

(ii) Bayesian estimate under squared error loss

The Bayesian estimate under squared error loss is the mean of the posterior distribution:

$$8 \times 0.51066 + 10 \times 0.32954 + 12 \times 0.15980 = 9.29830$$

14.4
(i) Posterior distribution

Since the prior distribution of $p$ is Beta(4,4):

$$f_{\text{prior}}(p) \propto p^3(1-p)^3$$ [1/2]

Now let $X$ denote the number of successes from a sample of size $n$. Then $X \sim \text{Binomial}(n, p)$. Since $x$ successes have been observed, the likelihood function is:

$$L(p) = P(X = x) = \binom{n}{x} p^x (1-p)^{n-x} \propto p^x (1-p)^{n-x}$$ [1/2]

Combining the prior PDF with the likelihood function gives:

$$f_{\text{post}}(p) \propto p^3(1-p)^3 \times p^x(1-p)^{n-x} = p^{x+3}(1-p)^{n-x+3}$$ [1/2]

Comparing this with the PDF of the beta distribution (given on page 13 of the Tables), we see that the posterior distribution of $p$ is Beta($x + 4$, $n - x + 4$). [1/2]
[Total 2]

(ii) Bayesian estimate under all-or-nothing loss

The Bayesian estimate under all-or-nothing loss is the mode of the posterior distribution, ie the value of $p$ that maximises the posterior PDF. To find the mode, we need to differentiate the PDF (or equivalently differentiate the log of the PDF) and equate it to zero.

Given that $x = 11$ and $n = 25$, the posterior of $p$ is Beta(15,18) and:

$$f_{\text{post}}(p) = C p^{14}(1-p)^{17}$$ [1]

Taking logs (to make the differentiation easier):

$$\ln f(p) = \ln C + 14 \ln p + 17 \ln(1-p)$$ [1]

Differentiating:

$$\frac{d}{dp} \ln f(p) = \frac{14}{p} - \frac{17}{1-p}$$ [1/2]

The derivative is equal to 0 when:

$$14(1-p) = 17p$$

ie when:

$$p = \frac{14}{31}$$ [1/2]

Differentiating again:

$$\frac{d^2}{dp^2} \ln f(p) = -\frac{14}{p^2} - \frac{17}{(1-p)^2} < 0 \Rightarrow \max$$ [1/2]

So the Bayesian estimate of $p$ under all-or-nothing loss is $\frac{14}{31}$ or 0.45161. [1/2]
[Total 4]
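The mode found in Solution 14.4(ii) can be checked numerically. This is a sketch in base R, not part of the marked solution: it maximises the Beta(15,18) density with optimize and compares the result with 14/31. The variable names are illustrative.

```r
# Posterior parameters: Beta(x + 4, n - x + 4) with x = 11, n = 25
a <- 11 + 4
b <- 25 - 11 + 4

# Closed form from the derivation above: 14(1 - p) = 17p
mode_closed <- (a - 1) / (a + b - 2)

# Numerical check: maximise the posterior density over (0, 1)
mode_numeric <- optimize(function(p) dbeta(p, a, b),
                         interval = c(0, 1), maximum = TRUE)$maximum

round(c(mode_closed, mode_numeric), 4)
```

Both values should be close to 0.4516.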
14.5
(i) Posterior distribution

Since the prior distribution of $\mu$ is Gamma(2,5):

$$f_{\text{prior}}(\mu) = \frac{5^2}{\Gamma(2)} \mu e^{-5\mu} \propto \mu e^{-5\mu}$$ [1/2]

The likelihood is the product of Poisson probabilities:

$$L(\mu) = \frac{\mu^{x_1} e^{-\mu}}{x_1!} \times \cdots \times \frac{\mu^{x_n} e^{-\mu}}{x_n!} \propto \mu^{\sum x_i} e^{-n\mu}$$ [1]

So:

$$f_{\text{post}}(\mu) \propto \mu e^{-5\mu} \times \mu^{\sum x_i} e^{-n\mu} = \mu^{\sum x_i + 1} e^{-(n+5)\mu}$$ [1]

Comparing this with the PDF of the gamma distribution (given on page 12 of the Tables), we see that the posterior distribution of $\mu$ is Gamma($\sum x_i + 2$, $n + 5$). [1/2]
[Total 3]

(ii)(a) Squared-error loss

When $n = 8$ and $\sum x_i = 5$, the posterior distribution of $\mu$ is Gamma(7,13). [1/2]

The Bayesian estimate of $\mu$ under squared error loss is the mean of the posterior distribution, ie $\frac{7}{13}$ or 0.538. [1/2]

(ii)(b) All-or-nothing loss

The Bayesian estimate of $\mu$ under all-or-nothing loss is the mode of the posterior distribution, ie the value of $\mu$ that maximises the posterior PDF. To find the mode, we need to differentiate the PDF (or equivalently differentiate the log of the PDF) and equate it to zero.

Since the posterior distribution of $\mu$ is Gamma(7,13):

$$f_{\text{post}}(\mu) = C \mu^6 e^{-13\mu}$$

where $C$ is a constant.

Taking logs:

$$\ln f_{\text{post}}(\mu) = \ln C + 6 \ln \mu - 13\mu$$ [1/2]

Differentiating:

$$\frac{d}{d\mu} \ln f_{\text{post}}(\mu) = \frac{6}{\mu} - 13$$ [1/2]

The derivative is equal to 0 when $\mu = \frac{6}{13}$. [1/2]

Differentiating again:

$$\frac{d^2}{d\mu^2} \ln f_{\text{post}}(\mu) = -\frac{6}{\mu^2} < 0 \Rightarrow \max$$ [1/2]

So the Bayesian estimate of $\mu$ under all-or-nothing loss is $\frac{6}{13}$ or 0.462.

The mode of Gamma($\alpha$, $\lambda$) is $\frac{\alpha - 1}{\lambda}$ provided that $\alpha > 1$.

(ii)(c) Absolute error loss

The Bayesian estimate of $\mu$ under absolute error loss is the median of the posterior distribution. Here we know that:

$$\mu \mid x \sim \text{Gamma}(7, 13)$$

For notational convenience, let $W = \mu \mid x$. Then:

$$W \sim \text{Gamma}(7, 13) \Rightarrow 2 \times 13 W = 26W \sim \chi^2_{14}$$ [1/2]

The median of the posterior distribution is the value of $M$ such that:

$$P(W < M) = 0.5$$

or equivalently:

$$P(\chi^2_{14} < 26M) = 0.5$$ [1/2]

From page 169 of the Tables, we see that the 50th percentile of $\chi^2_{14}$ is 13.34:

$$26M = 13.34 \Rightarrow M = \frac{13.34}{26} = 0.513$$ [1/2]

So the Bayesian estimate of $\mu$ under absolute error loss is 0.513. [1/2]
[Total 5]

(iii) Credible interval

In part (ii)(c), we noted that:

$$W = \mu \mid x \sim \text{Gamma}(7, 13) \Rightarrow 26W \sim \chi^2_{14}$$

From pages 168 and 169 of the Tables, we have:

$$P(5.629 < \chi^2_{14} < 26.12) = 0.95$$ [1]

Therefore a 95% equal-tailed credible interval for $\mu \mid x$ is:

$$\left(\frac{5.629}{26}, \frac{26.12}{26}\right) = (0.217, 1.00)$$ [1]
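The three Bayesian estimates and the credible interval in Solution 14.5 can be cross-checked with R's gamma functions instead of the chi-square tables. This is a minimal sketch in base R with illustrative variable names.

```r
# Posterior: Gamma(sum(x) + 2, n + 5) with sum(x) = 5 and n = 8
shape <- 5 + 2
rate <- 8 + 5

post_mean <- shape / rate                 # squared error loss
post_mode <- (shape - 1) / rate           # all-or-nothing loss (shape > 1)
post_median <- qgamma(0.5, shape, rate)   # absolute error loss

# Equal-tailed 95% credible interval
ci <- qgamma(c(0.025, 0.975), shape, rate)

round(c(post_mean, post_mode, post_median), 3)
round(ci, 3)
```

The results should agree with 0.538, 0.462 and 0.513, and with the interval (0.217, 1.00), up to the rounding of the tabulated chi-square values.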
14.6
Since we only have a single observation, the likelihood function is equal to the PDF of the distribution from which the observation came, ie:

$$L(\theta) = \begin{cases} \theta^{-1} & 0 < x < \theta \\ 0 & \text{otherwise} \end{cases}$$ [1/2]

Also, since $f_{\text{prior}}(\theta) = \theta \exp(-\theta)$ for $\theta > 0$, it follows that:

$$f_{\text{post}}(\theta) = C f_{\text{prior}}(\theta) L(\theta) = \begin{cases} C e^{-\theta} & 0 < x < \theta \\ 0 & \text{otherwise} \end{cases}$$ [1/2]

where $C$ is a constant. This distribution is not in the Tables, so we will have to work from first principles to determine the value of the constant.

Integrating the posterior PDF over all possible values of $\theta$ gives 1:

$$\int_x^\infty C e^{-\theta}\, d\theta = 1 \Rightarrow \left[-C e^{-\theta}\right]_x^\infty = 1 \Rightarrow C e^{-x} = 1 \Rightarrow C = e^x$$ [1]

So the posterior PDF is:

$$f_{\text{post}}(\theta) = e^{-(\theta - x)}, \quad \theta > x$$ [1/2]

The Bayesian estimate of $\theta$ under absolute error loss is the median of the posterior distribution. The median, $m$, satisfies the equation:

$$\int_m^\infty e^{-(\theta - x)}\, d\theta = 1/2$$ [1/2]

Integrating:

$$\left[-e^{-(\theta - x)}\right]_m^\infty = 1/2 \Rightarrow e^{-(m - x)} = 1/2 \Rightarrow m - x = \log 2 \Rightarrow m = x + \log 2$$ [1]

ie the Bayesian estimate of $\theta$ under absolute error loss is $x + \log 2$.
[Total 4]
14.7
This question is Subject CT6, April 2012, Question 6.

(i) Posterior distribution of p

Let $X$ be the number of packets of cereal that must be opened in order to find a toy. Then $X \mid p$ has a Type 1 geometric distribution with parameter $p$. The prior distribution for $p$ is uniform over the interval [0,1], so:

$$f_{\text{prior}}(p) = 1, \quad 0 \le p \le 1$$

ie the prior PDF is constant over the interval [0,1]. [1/2]

The sample consists of one observation, $n_1$. So the likelihood function is:

$$L(p) = P(X = n_1) = p(1-p)^{n_1 - 1}$$ [1]

Combining the prior PDF and the likelihood function, we see that:

$$f_{\text{post}}(p) \propto f_{\text{prior}}(p) \times L(p) = p(1-p)^{n_1 - 1}$$ [1/2]

So the posterior distribution of $p$ is Beta(2, $n_1$). [1]
[Total 3]

(ii) Posterior distribution after the fifth toy is found

The likelihood function is now:

$$L(p) = P(X_1 = n_1)P(X_2 = n_2)P(X_3 = n_3)P(X_4 = n_4)P(X_5 = n_5)$$
$$= p(1-p)^{n_1 - 1} \times p(1-p)^{n_2 - 1} \times p(1-p)^{n_3 - 1} \times p(1-p)^{n_4 - 1} \times p(1-p)^{n_5 - 1} = p^5(1-p)^{\sum n_i - 5}$$ [1]

and hence:

$$f_{\text{post}}(p) \propto p^5(1-p)^{\sum n_i - 5}$$ [1/2]

So the posterior distribution of $p$ is Beta$\left(6, \sum_{i=1}^{5} n_i - 4\right)$. [1/2]
[Total 2]

(iii) Bayesian estimate under quadratic loss v maximum likelihood estimate

The Bayesian estimate of $p$ under quadratic loss is the mean of the posterior distribution:

$$\frac{6}{6 + \sum_{i=1}^{5} n_i - 4} = \frac{6}{\sum_{i=1}^{5} n_i + 2}$$ [1/2]

The maximum likelihood estimate of $p$ is the value of $p$ that maximises the likelihood function:

$$L(p) = p^5(1-p)^{\sum n_i - 5}$$

This is the same as the value of $p$ that maximises the log-likelihood function:

$$\log L(p) = 5 \ln p + \left(\sum_{i=1}^{5} n_i - 5\right) \ln(1-p)$$

Differentiating with respect to $p$:

$$\frac{d}{dp} \log L(p) = \frac{5}{p} - \frac{\sum_{i=1}^{5} n_i - 5}{1-p}$$ [1]

The derivative is equal to 0 when:

$$\frac{5}{p} = \frac{\sum_{i=1}^{5} n_i - 5}{1-p}$$

ie when:

$$5(1-p) - p\left(\sum_{i=1}^{5} n_i - 5\right) = 0$$

The solution of this equation is:

$$p = \frac{5}{\sum_{i=1}^{5} n_i}$$ [1]

We can check this is a maximum by differentiating a second time with respect to $p$:

$$\frac{d^2}{dp^2} \log L(p) = -\frac{5}{p^2} - \frac{\sum_{i=1}^{5} n_i - 5}{(1-p)^2}$$ [1/2]

Since each $n_i \ge 1$, both terms in the expression are negative, so we have a maximum. Hence the maximum likelihood estimate of $p$ is $5 \big/ \sum_{i=1}^{5} n_i$. [1/2]

The two estimates are different.

The Bayesian estimate of $p$ under quadratic loss is the value of $g$ that minimises the expected posterior loss:

$$\int_0^1 (g - p)^2 f_{\text{post}}(p)\, dp$$ [1/2]

The maximum likelihood estimate of $p$ is the value of $p$ that maximises the likelihood function. [1/2]

We would expect the estimates to be different since they are calculated in different ways. [1/2]
[Total 5]
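14.8
This question is Subject CT6, April 2013, Question 3.

The required probability is:

$$P(\text{up late} \mid {>}20 \text{ mins late}) = \frac{P({>}20 \text{ mins late} \mid \text{up late})P(\text{up late})}{P({>}20 \text{ mins late} \mid \text{up late})P(\text{up late}) + P({>}20 \text{ mins late} \mid \text{up on time})P(\text{up on time})}$$ [1]

Using the fact that when the actuary gets up late, he arrives at work $X$ minutes late and when he gets up on time, he arrives at work $Y$ minutes late, we have:

$$P(\text{up late} \mid {>}20 \text{ mins late}) = \frac{P(X > 20)P(\text{up late})}{P(X > 20)P(\text{up late}) + P(Y > 20)P(\text{up on time})}$$ [1]

Since $X \sim \text{Exp}(1/15)$:

$$P(X > 20) = 1 - F_X(20) = 1 - \left(1 - e^{-\frac{1}{15} \times 20}\right) = e^{-4/3}$$ [1]

Also $Y \sim U(0, 25)$, so:

$$P(Y > 20) = 1 - F_Y(20) = 1 - \frac{20 - 0}{25 - 0} = \frac{1}{5}$$ [1]

Substituting these in gives:

$$P(\text{up late} \mid {>}20 \text{ mins late}) = \frac{e^{-4/3} \times \frac{1}{3}}{e^{-4/3} \times \frac{1}{3} + \frac{1}{5} \times \frac{2}{3}} = 0.39722$$ [1]
[Total 5]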
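The arithmetic in Solution 14.8 can be confirmed using R's distribution functions. This is a minimal sketch in base R; the variable names are illustrative only.

```r
# Prior probability that the actuary gets up late
p_up_late <- 1 / 3

# P(X > 20) for X ~ Exp(1/15), ie exponential with mean 15
p_x <- pexp(20, rate = 1 / 15, lower.tail = FALSE)

# P(Y > 20) for Y ~ U(0, 25)
p_y <- punif(20, min = 0, max = 25, lower.tail = FALSE)

# Bayes' formula for P(up late | more than 20 minutes late)
posterior <- p_x * p_up_late /
  (p_x * p_up_late + p_y * (1 - p_up_late))

round(posterior, 5)
```

The result should match the value 0.39722 obtained above.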
CS1-15: Credibility theory

Credibility theory

Syllabus objectives

5.1 Explain the fundamental concepts of Bayesian statistics and use these concepts to calculate Bayesian estimates.

5.1.7 Explain what is meant by the credibility premium formula and describe the role played by the credibility factor.

5.1.8 Explain the Bayesian approach to credibility theory and use it to derive credibility premiums in simple cases.

5.1.10 Explain the differences between the two approaches (ie the Bayesian approach and the empirical Bayes approach) and state the assumptions underlying each of them.
0 Introduction

In this chapter we will discuss credibility theory and explain how it can be used to calculate premiums or to estimate claim frequencies in general insurance. Here we will concentrate on the Bayesian approach to credibility. We will be using the theory of Bayesian estimation developed in Chapter 14 as well as some results from Chapter 5 involving conditional random variables.

1 Recap of conditional expectation results

Recall from Chapter 5 that if $X$ and $Y$ are discrete random variables, then:

$$E(X \mid Y = y) = \sum_x x P(X = x \mid Y = y)$$

Similarly, if $X$ and $Y$ are continuous random variables, then:

$$E(X \mid Y = y) = \int_x x f_{X \mid Y}(x \mid y)\, dx$$

Manipulation of conditional expectations is an important technique in credibility theory, as it is in many other areas of actuarial science. Some results are:

For any random variables $X$ and $Y$ (for which the relevant moments exist):

$$E[X] = E[E(X \mid Y)] \qquad (15.1.1)$$

This result is easy to demonstrate. If $X$ and $Y$ are discrete random variables, then:

$$E[E(X \mid Y)] = \sum_y E(X \mid Y = y) P(Y = y)$$
$$= \sum_y \left\{ \sum_x x P(X = x \mid Y = y) \right\} P(Y = y)$$
$$= \sum_y \sum_x x P(X = x, Y = y)$$
$$= \sum_x \sum_y x P(X = x, Y = y)$$
$$= \sum_x x P(X = x)$$
$$= E(X)$$

A similar approach using integrals can be used if $X$ and $Y$ are continuous random variables.
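Result (15.1.1) can be illustrated numerically for a small discrete case. The joint distribution below is entirely hypothetical (it does not come from the Course Notes) and is chosen only so that the two sides of the identity can be computed and compared in base R.

```r
# Hypothetical joint pmf of (X, Y): rows are x = 0, 1, 2; columns are y = 1, 2
x_vals <- c(0, 1, 2)
joint <- matrix(c(0.10, 0.20,
                  0.30, 0.15,
                  0.05, 0.20), nrow = 3, byrow = TRUE)

p_y <- colSums(joint)                       # P(Y = y)
cond_mean <- colSums(x_vals * joint) / p_y  # E(X | Y = y) for each y

lhs <- sum(cond_mean * p_y)                 # E[E(X | Y)]
rhs <- sum(x_vals * rowSums(joint))         # E[X] from the marginal of X

c(lhs, rhs)
```

The two numbers agree, as (15.1.1) requires, whatever joint probabilities are chosen (provided they sum to 1).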
Another important concept is that of conditional independence. If two random variables $X_1$ and $X_2$ are conditionally independent given a third random variable $Y$, then:

$$E[X_1 X_2 \mid Y] = E[X_1 \mid Y]\, E[X_2 \mid Y] \qquad (15.1.2)$$

Intuitively this says that both $X_1$ and $X_2$ depend on $Y$, but, if the value taken by $Y$ is known, then $X_1$ and $X_2$ are independent. This does not imply that $X_1$ and $X_2$ are unconditionally independent, ie independent if the value taken by $Y$ is not known. Hence, it may be the case that:

$$E[X_1 X_2] \ne E[X_1]\, E[X_2]$$

even though (15.1.2) holds.
2 Credibility

2.1 The credibility premium formula

The basic idea underlying the credibility premium formula is intuitively very simple and very appealing. Consider an extremely simple example.

Example

A local authority in a small town has run a fleet of ten buses for a number of years. The local authority wishes to insure this fleet for the coming year against claims arising from accidents involving these buses. The pure premium for this insurance needs to be calculated, ie the expected cost of claims in the coming year.

In order to make this calculation, the following data are available to you:

For the past five years for this fleet of buses the average cost of claims per annum (for the ten buses) has been 1,600.

Data relating to a large number of local authority bus fleets from all over the United Kingdom show that the average cost of claims per annum per bus is 250, so that the average cost of claims per annum for a fleet of ten buses is 2,500. However, while this figure of 2,500 is based on many more fleets of buses than the figure of 1,600, some of the fleets of buses included in this large data set operate under very different conditions (eg in large cities or in rural areas) from the fleet which is of concern here, and these different conditions are thought to affect the number and size of claims.

There are two extreme choices for the pure premium for the coming year:

(i) 1,600 could be chosen as it is based on the most appropriate data

(ii) 2,500 could be chosen because it is based on more data, so might be considered a more reliable figure.

The credibility approach to this problem is to take a weighted average of these two extreme answers, ie to calculate the pure premium as:

$$Z \times 1{,}600 + (1 - Z) \times 2{,}500$$

where $Z$ is some number between zero and one. $Z$ is known as the credibility factor.

Purely for the sake of illustration, suppose $Z$ is set equal to 0.6 so that the pure premium is calculated to be 1,960.

This example will be revisited to illustrate some points in the next section but now the above ideas will be expressed a little more formally. The problem is to estimate the expected aggregate claims, or, possibly, just the expected number of claims, in the coming year from a risk. By a risk we mean a single policy or a group of policies. These policies are, typically, short term policies and, for convenience, the term of the policies will be taken to be one year, although it could equally well be any other short period.
The following information is available:

$\bar{X}$ is an estimate of the expected aggregate claims / number of claims for the coming year based solely on data from the risk itself.

$\mu$ is an estimate of the expected aggregate claims / number of claims for the coming year based on collateral data, ie data for risks similar to, but not necessarily identical to, the particular risk under consideration.

The credibility premium formula (or credibility estimate of the aggregate claims / number of claims) for this risk is:

$$Z\bar{X} + (1 - Z)\mu \qquad (15.2.1)$$

where $Z$ is a number between zero and one and is known as the credibility factor. The attractive features of the credibility premium formula are its simplicity and, provided $\bar{X}$ and $\mu$ are obviously reasonable alternatives, the ease with which it can be explained to a lay person.
Question
A specialist insurer that provides insurance against breakdown of photocopying equipment calculates its premiums using a credibility formula. Based on the company's recent experience of all models of copiers, the premium for this year should be 100 per machine. The company's experience for a new model of copier, which is considered to be more reliable, indicates that the premium should be 60 per machine.
Given that the credibility factor is 0.75, calculate the premium that should be charged for insuring the new model.
Solution
The premium based on the collateral data (including all machines) is:
$$\mu = 100$$
The premium based on the direct data (the new model) is:
$$x = 60$$
So, using the credibility formula with $Z = 0.75$, the premium that should be charged is:
$$P = Z x + (1 - Z)\mu = 0.75 \times 60 + 0.25 \times 100 = 70$$
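As a cross-check, the same premium can be read as the collateral premium adjusted part of the way towards the direct premium. This is only an algebraic rearrangement of formula (15.2.1), writing $\mu$ for the collateral estimate and $x$ for the direct estimate (symbols assumed here for the illustration):

```latex
P = Z x + (1 - Z)\mu = \mu + Z\,(x - \mu) = 100 + 0.75 \times (60 - 100) = 70
```

With Z = 0.75, the premium moves three quarters of the way from 100 towards 60.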
Examples of situations where an insurer might determine a premium rate by combining direct data for a risk with collateral data include the following:
New type of cover
An insurer offering a new type of cover (eg protection against damage caused by driverless vehicles) would not have enough direct data available initially from the claims from the new policies to judge the premium accurately. The insurer could use claims data from similar well-established types of cover (eg vehicles with drivers) as collateral data in the first few years. As the company sold more of the new policies, the pattern of claims arising from driverless vehicles would become clearer and the insurer could put more emphasis on the direct data.
Unusual risk
An insurer insuring a small number of yachts of a particular model would not have enough direct data for this model of yacht to set an appropriate premium rate. The insurer could use past claims experience from similar types of boats as collateral data. The insurer may never have enough experience for this particular model to assess the risk purely on the basis of the direct data.
Experience rating
An insurer insuring a fleet of motor vehicles operated by a medium sized company may wish to charge a premium that is based on the collateral data provided by motor fleets as a whole, but also takes into account the past experience provided by the direct data for this particular fleet. If the safety record for the company has been good, the company will pay a lower-than-average premium.
2.2 The credibility factor
The credibility factor Z is just a weighting factor. Its value reflects how much trust is placed in the data from the risk itself, $x$, compared with the data from the larger group, $\mu$, as an estimate of next year's expected aggregate claims or number of claims: the higher the value of Z, the more trust is placed in $x$ compared with $\mu$, and vice versa. This idea will be clarified by going back to the simple example in Section 2.1.
Suppose that data from the particular fleet of buses under consideration had been available for more than just five years. For example, suppose that the estimate of the aggregate claims in the coming year based on data from this fleet itself were 1,600, as before, but that this is now based on ten years' data rather than just five. In this case, the figure of 1,600 is considered more trustworthy than the figure of 2,500, and this means giving the credibility factor a higher value, say 0.75 rather than 0.6. The resulting credibility estimate of the aggregate claims would be 1,825.
Now suppose the figure of 1,600 is based on just five years' data, but the figure of 2,500 is based only on data from bus fleets operating in towns of roughly the same size as the one under consideration, ie it no longer includes data from large cities or rural areas. (It is still assumed that the figure of 2,500 is based on considerably more data than the figure of 1,600.) In this case the collateral data would be regarded as more relevant than it was in Section 2.1 and so the credibility factor would be correspondingly reduced, for example to 0.4 from 0.6, giving a credibility premium of 2,140.
The models discussed in this chapter do not allow any scope for this kind of subjective adjustment.
Finally, suppose the situation is exactly as in Section 2.1 except that the figure of 2,500 is based only on data from bus fleets operating in London and Glasgow. In this case the collateral data might be regarded as less relevant than in Section 2.1 and so the credibility factor would be correspondingly increased, for example to 0.8 from 0.6, giving a credibility premium of 1,780.
So the amount of the collateral data is also a factor. If there is a great deal of (relevant) collateral data, the credibility factor may be reduced to allow for this.
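The variations on the bus-fleet example can be reproduced with a short Python sketch. The credibility factors are those quoted in the text; the function name, the scenario labels and the variable names are hypothetical and chosen only for this illustration.

```python
def credibility_premium(z, direct, collateral):
    """Credibility-weighted average of the direct and collateral estimates."""
    return z * direct + (1 - z) * collateral


DIRECT, COLLATERAL = 1600, 2500  # bus-fleet figures from Section 2.1

# Credibility factors quoted in the text for each variation of the example.
scenarios = {
    "base case (five years of direct data)": 0.6,
    "ten years of direct data": 0.75,
    "more relevant collateral data": 0.4,
    "less relevant collateral data": 0.8,
}

premiums = {
    label: credibility_premium(z, DIRECT, COLLATERAL)
    for label, z in scenarios.items()
}

for label, premium in premiums.items():
    print(f"{label}: Z = {scenarios[label]}, premium = {premium:,.0f}")
```

A higher Z pulls the premium towards the fleet's own figure of 1,600, and a lower Z pulls it towards the collateral figure of 2,500.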
From these simple examples it can be seen that, in general terms, the credibility factor in formula (15.2.1) would be expected to behave as follows:
The more data there are from the risk itself, the higher should be the value of the credibility factor.
The more relevant the collateral data, the lower should be the value of the credibility factor.
One final point to be made about the credibility factor is that, while its value should reflect the amount of data available from the risk itself, its value should not depend on the actual data from the risk itself, ie on the value of $x$. If Z were allowed to depend on $x$ then any estimate of the aggregate claims/number of claims, say $f$, taking a value between $x$ and $\mu$ could be written in the form of (15.2.1) by choosing Z to be equal to $(f - \mu)/(x - \mu)$.
This is easily verified. Setting:
$$Z = \frac{f - \mu}{x - \mu}$$
we see that:
$$Z x + (1 - Z)\mu = \frac{f - \mu}{x - \mu}\,x + \left(1 - \frac{f - \mu}{x - \mu}\right)\mu = \frac{f - \mu}{x - \mu}\,x + \frac{x - f}{x - \mu}\,\mu = \frac{f\,(x - \mu)}{x - \mu} = f$$
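The verification can also be checked numerically. The sketch below reuses the bus-fleet figures (1,600 and 2,500) as the direct and collateral estimates; the helper name `implied_z`, the symbol `mu` for the collateral estimate and the target values in the loop are assumptions made for this illustration, and the direct and collateral estimates must differ for the ratio to be defined.

```python
def implied_z(f: float, x: float, mu: float) -> float:
    """Credibility factor that makes Z*x + (1 - Z)*mu equal to the target f."""
    if x == mu:
        raise ValueError("x and mu must differ for Z to be defined")
    return (f - mu) / (x - mu)


x, mu = 1600.0, 2500.0  # direct and collateral estimates (bus-fleet example)

results = []
for f in (1600.0, 1825.0, 2000.0, 2500.0):  # arbitrary targets between x and mu
    z = implied_z(f, x, mu)
    reproduced = z * x + (1 - z) * mu
    results.append((f, z, reproduced))
    print(f"target {f:,.0f}: Z = {z:.4f}, formula gives {reproduced:,.0f}")
```

Every target between the two estimates is reproduced by some Z between zero and one, which is why a credibility factor that is allowed to depend on the data places no real restriction on the estimate.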
The problems remain of how to measure the relevance of collateral data and how to calculate the credibility factor Z. There are two approaches to these problems: Bayesian credibility and empirical Bayes credibility theory.
The first of these is covered in the remainder of this chapter.
The second is discussed in