Actuarial Statistics
Combined Materials Pack
for exams in 2022

The Actuarial Education Company
on behalf of the Institute and Faculty of Actuaries

All study material produced by ActEd is copyright and is sold for the exclusive use of the purchaser. The copyright is owned by Institute and Faculty Education Limited, a subsidiary of the Institute and Faculty of Actuaries.

Unless prior authority is granted by ActEd, you may not hire out, lend, give out, sell, store or transmit electronically or photocopy any part of the study material.

You must take care of your study material to ensure that it is not used or copied by anybody else.

Legal action will be taken if these terms are infringed. In addition, we may seek to take disciplinary action through the profession or through your employer.

These conditions remain in force after you have finished using the course.

IFE: 2022 Examinations    The Actuarial Education Company
CS1: Study Guide    Page 1

Subject CS1
2022 Study Guide

Introduction

This Study Guide has been created to help you navigate your way through Subject CS1. It contains all the information you will need before starting to study Subject CS1 for the 2022 exams and you may also find it useful to refer to throughout your studies.

The guide is split into two parts:
- Part 1 contains specific information about Subject CS1
- Part 2 contains general information about the Core Principles subjects.

Please read this Study Guide carefully before reading the Course Notes, even if you have studied for some actuarial exams before. While you may have already read (the majority of) the Part 2 material in previous subjects, the information in Part 1 is unique to this course.
Contents

Part 1
Section 1  Subject CS1 background and contents       Page 2
Section 2  Subject CS1 Syllabus and Core Reading     Page 4
Section 3  Subject CS1 summary of ActEd products     Page 12
Section 4  Subject CS1 skills and assessment         Page 13
Section 5  Subject CS1 frequently asked questions    Page 14

Part 2
Section 1  Before you start                          Page 15
Section 2  Core study material                       Page 16
Section 3  ActEd study support                       Page 18
Section 4  Study skills and assessment               Page 25
Section 5  Queries and feedback                      Page 31

1.1 Subject CS1 background and contents

History

The Actuarial Statistics subjects (Subjects CS1 and CS2) were introduced in the Institute and Faculty of Actuaries 2019 Curriculum.

Subject CS1 is Actuarial Statistics.

Predecessors

The topics in the Actuarial Statistics subjects cover content previously in Subjects CT3, CT4, CT6 and a small amount from Subject ST9:
- Subject CS1 contains material from Subjects CT3 and CT6.
- Subject CS2 contains material from Subjects CT4, CT6 and ST9.

Exemptions

In order to be eligible for a pass in Subject CS1, you will need:
- to have passed or been granted an exemption from Subject CT3 during the transfer process
- to have met the profession's requirements based on the current curriculum.

See the profession's website for further details:
www.actuaries.org.uk/studying/exam-exemptions

Prerequisites / required knowledge

The CS1 course assumes that students have a certain level of statistical knowledge before they start. More detail on this is given in the CS1 Syllabus (see pages 4-11 in this document).

If you feel that you do not have this level of background, you may want to consider ordering the ActEd course Pure Maths and Statistics for Actuarial Studies. More information on prerequisites is given later (see page 5 of this document). Alternatively, a good A-level statistics textbook would help to fill any gaps.
An extra chapter covering the assumed statistical knowledge for Subject CS1 is available on the ActEd website. A link is given below:
www.ActEd.co.uk/help_and_advice_CS1_assumed_knowledge.html

Links to other subjects

- Subject CS2 (Risk Modelling and Survival Analysis) builds directly on the material in this subject.
- Subjects CM1 and CM2 (Actuarial Mathematics, and Financial Engineering and Loss Reserving) apply the material in this subject to actuarial and financial modelling.

Contents

There are four parts to the Subject CS1 course. The parts cover related topics and are broken down into chapters. At the end of each part there are assignments testing the material from that part.

The following table shows how the parts and chapters relate to each other. The final column shows how the chapters relate to the days of the regular tutorials. This table should help you plan your progress across the study session.

Chapter  Title                                 No of pages
1        Data analysis                         23
2        Probability distributions             63
3        Generating functions                  30
4        Joint distributions                   59
5        Conditional expectation               20
6        Central Limit Theorem                 27
7        Sampling and statistical inference    33
8        Point estimation                      63
9        Confidence intervals                  50
10       Hypothesis testing                    89
11       Correlation                           41
12       Linear regression                     77
13       Generalised linear models             73
14       Bayesian statistics                   44
15       Credibility theory                    34
16       Empirical Bayes credibility theory    54

Part: 1, 2, 3, 4    X Asst: X1, X2, X3, X4    Y Asst: Y1, Y2    Tutorial (4 days): 1, 2, 3, 4

1.2 Subject CS1 Syllabus and Core Reading

Syllabus

The Syllabus for Subject CS1 is given here. To the right of each objective are the chapter numbers in which the objective is covered in the ActEd course.

Aim

The aim of the Actuarial Statistics 1 subject is to provide a grounding in mathematical and statistical techniques that are of particular relevance to actuarial work.
Competences

On successful completion of this subject, a student will be able to:
1. describe the essential features of statistical distributions
2. summarise data using appropriate statistical analysis, descriptive statistics and graphical presentation
3. describe and apply the principles of statistical inference
4. describe, apply and interpret the results of the linear regression model and generalised linear models
5. explain the fundamental concepts of Bayesian statistics and use them to compute Bayesian estimators.

Syllabus topics

1. Random variables and distributions (20%)
2. Data analysis (15%)
3. Statistical inference (20%)
4. Regression theory and applications (30%)
5. Bayesian statistics (15%)

The weightings are indicative of the approximate balance of the assessment of this subject between the main syllabus topics, averaged over a number of examination sessions.

The weightings also have a correspondence with the amount of learning material underlying each syllabus topic. However, this will also reflect aspects such as:
- the relative complexity of each topic, and hence the amount of explanation and support required for it
- the need to provide thorough foundation understanding on which to build the other objectives
- the extent of prior knowledge which is expected
- the degree to which each topic area is more knowledge or application based.

Assumed knowledge

This subject assumes that a student will be competent in the following elements of foundational mathematics and basic statistics:

1 Summarise the main features of a data set (exploratory data analysis)
1.1 Summarise a set of data using a table or frequency distribution, and display it graphically using a line plot, a box plot, a bar chart, histogram, stem and leaf plot, or other appropriate elementary device.
1.2 Describe the level/location of a set of data using the mean, median, mode, as appropriate.
1.3 Describe the spread/variability of a set of data using the standard deviation, range and interquartile range, as appropriate.
1.4 Explain what is meant by symmetry and skewness for the distribution of a set of data.

2 Probability
2.1 Set functions and sample spaces for an experiment and an event.
2.2 Probability as a set function on a collection of events and its basic properties.
2.3 Calculate probabilities of events in simple situations.
2.4 Derive and use the addition rule for the probability of the union of two events.
2.5 Define and calculate the conditional probability of one event given the occurrence of another event.
2.6 Derive and use Bayes' Theorem for events.
2.7 Define independence for two events, and calculate probabilities in situations involving independence.

3 Random variables
3.1 Explain what is meant by a discrete random variable, define the distribution function and the probability function of such a variable, and use these functions to calculate probabilities.
3.2 Explain what is meant by a continuous random variable, define the distribution function and the probability density function of such a variable, and use these functions to calculate probabilities.
3.3 Define the expected value of a function of a random variable, the mean, the variance, the standard deviation, the coefficient of skewness and the moments of a random variable, and calculate such quantities.
3.4 Evaluate probabilities associated with distributions (by calculation or by referring to tables as appropriate).
3.5 Derive the distribution of a function of a random variable from the distribution of the random variable.

Detailed syllabus objectives

1 Random variables and distributions (20%)

1.1 Define basic univariate distributions and use them to calculate probabilities, quantiles and moments. (Chapter 2)
1.1.1 Define and explain the key characteristics of the discrete distributions: geometric, binomial, negative binomial, hypergeometric, Poisson and uniform on a finite set.
1.1.2 Define and explain the key characteristics of the continuous distributions: normal, lognormal, exponential, gamma, chi-square, t, F, beta and uniform on an interval.
1.1.3 Evaluate probabilities and quantiles associated with distributions (by calculation or using statistical software as appropriate).
1.1.4 Define and explain the key characteristics of the Poisson process and explain the connection between the Poisson process and the Poisson distribution.
1.1.5 Generate basic discrete and continuous random variables using the inverse transform method.
1.1.6 Generate discrete and continuous random variables using statistical software.

1.2 Independence, joint and conditional distributions, linear combinations of random variables (Chapter 4)
1.2.1 Explain what is meant by jointly distributed random variables, marginal distributions and conditional distributions.
1.2.2 Define the probability function/density function of a marginal distribution and of a conditional distribution.
1.2.3 Specify the conditions under which random variables are independent.
1.2.4 Define the expected value of a function of two jointly distributed random variables, the covariance and correlation coefficient between two variables, and calculate such quantities.
1.2.5 Define the probability function/density function of the sum of two independent random variables as the convolution of two functions.
1.2.6 Derive the mean and variance of linear combinations of random variables.
1.2.7 Use generating functions to establish the distribution of linear combinations of independent random variables.
1.3 Expectations, conditional expectations (Chapter 5)
1.3.1 Define the conditional expectation of one random variable given the value of another random variable, and calculate such a quantity.
1.3.2 Show how the mean and variance of a random variable can be obtained from expected values of conditional expected values, and apply this.

1.4 Generating functions (Chapter 3)
1.4.1 Define and determine the moment generating function of random variables.
1.4.2 Define and determine the cumulant generating function of random variables.
1.4.3 Use generating functions to determine the moments and cumulants of random variables, by expansion as a series or by differentiation, as appropriate.
1.4.4 Identify the applications for which a moment generating function, a cumulant generating function and cumulants are used, and the reasons why they are used.

1.5 Central Limit Theorem: statement and application (Chapter 6)
1.5.1 State the Central Limit Theorem for a sequence of independent, identically distributed random variables.
1.5.2 Generate simulated samples from a given distribution and compare the sampling distribution with the normal.

2 Data analysis (10%)

2.1 Data analysis (Chapter 1)
2.1.1 Describe the possible aims of data analysis (eg descriptive, inferential, and predictive).
2.1.2 Describe the stages of conducting a data analysis to solve real-world problems in a scientific manner and describe tools suitable for each stage.
2.1.3 Describe sources of data and explain the characteristics of different data sources, including extremely large data sets.
2.1.4 Explain the meaning and value of reproducible research and describe the elements required to ensure a data analysis is reproducible.

2.2 Exploratory data analysis (Chapter 11)
2.2.1 Describe the purpose of exploratory data analysis.
2.2.2 Use appropriate tools to calculate suitable summary statistics and undertake exploratory data visualizations.
2.2.3 Define and calculate Pearson's, Spearman's and Kendall's measures of correlation for bivariate data, explain their interpretation and perform statistical inference as appropriate.
2.2.4 Use principal components analysis to reduce the dimensionality of a complex data set.

2.3 Random sampling and sampling distributions (Chapter 7)
2.3.1 Explain what is meant by a sample, a population and statistical inference.
2.3.2 Define a random sample from a distribution of a random variable.
2.3.3 Explain what is meant by a statistic and its sampling distribution.
2.3.4 Determine the mean and variance of a sample mean and the mean of a sample variance in terms of the population mean, variance and sample size.
2.3.5 State and use the basic sampling distributions for the sample mean and the sample variance for random samples from a normal distribution.
2.3.6 State and use the distribution of the t-statistic for random samples from a normal distribution.
2.3.7 State and use the F distribution for the ratio of two sample variances from independent samples taken from normal distributions.

3 Statistical inference (25%)

3.1 Estimation and estimators (Chapter 8)
3.1.1 Describe and apply the method of moments for constructing estimators of population parameters.
3.1.2 Describe and apply the method of maximum likelihood for constructing estimators of population parameters.
3.1.3 Define the following terms: efficiency, bias, consistency and mean square error.
3.1.4 Define and apply the property of unbiasedness of an estimator.
3.1.5 Define the mean square error of an estimator, and use it to compare estimators.
3.1.6 Describe and apply the asymptotic distribution of maximum likelihood estimators.
3.1.7 Use the bootstrap method to estimate properties of an estimator.

3.2 Confidence intervals and prediction intervals (Chapter 9)
3.2.1 Define in general terms a confidence interval for an unknown parameter of a distribution based on a random sample.
3.2.2 Define in general terms a prediction interval for a future observation based on a random sample.
3.2.3 Derive a confidence interval for an unknown parameter using a given sampling distribution.
3.2.4 Calculate confidence intervals for the mean and the variance of a normal distribution.
3.2.5 Calculate confidence intervals for a binomial probability and a Poisson mean, including the use of the normal approximation in both cases.
3.2.6 Calculate confidence intervals for two-sample situations involving the normal distribution, and the binomial and Poisson distributions using the normal approximation.
3.2.7 Calculate confidence intervals for a difference between two means from paired data.
3.2.8 Use the bootstrap method to obtain confidence intervals.

3.3 Hypothesis testing and goodness of fit (Chapter 10)
3.3.1 Explain what is meant by the following terms: null and alternative hypotheses, simple and composite hypotheses, type I and type II errors, sensitivity, specificity, test statistic, likelihood ratio, critical region, level of significance, probability value and power of a test.
3.3.2 Apply basic tests for the one-sample and two-sample situations involving the normal, binomial and Poisson distributions, and apply basic tests for paired data.
3.3.3 Apply the permutation approach to non-parametric hypothesis tests.
3.3.4 Use a chi-square test to test the hypothesis that a random sample is from a particular distribution, including cases where parameters are unknown.
3.3.5 Explain what is meant by a contingency (or two-way) table, and use a chi-square test to test the independence of two classification criteria.

4 Regression theory and applications (30%)

4.1 Linear regression (Chapter 12)
4.1.1 Explain what is meant by response and explanatory variables.
4.1.2 State the simple regression model (with a single explanatory variable).
4.1.3 Derive the least squares estimates of the slope and intercept parameters in a simple linear regression model.
4.1.4 Use appropriate software to fit a simple linear regression model to a data set and interpret the output.
  - Perform statistical inference on the slope parameter.
  - Describe the use of measures of goodness of fit of a linear regression model.
  - Use a fitted linear relationship to predict a mean response or an individual response with confidence limits.
  - Use residuals to check the suitability and validity of a linear regression model.
4.1.5 State the multiple linear regression model (with several explanatory variables).
4.1.6 Use appropriate software to fit a multiple linear regression model to a data set and interpret the output.
4.1.7 Use measures of model fit to select an appropriate set of explanatory variables.

4.2 Generalised linear models (Chapter 13)
4.2.1 Define an exponential family of distributions. Show that the following distributions may be written in this form: binomial, Poisson, exponential, gamma and normal.
4.2.2 State the mean and variance for an exponential family, and define the variance function and the scale parameter. Derive these quantities for the distributions above.
4.2.3 Explain what is meant by the link function and the canonical link function, referring to the distributions above.
4.2.4 Explain what is meant by a variable, a factor taking categorical values and an interaction term. Define the linear predictor, illustrating its form for simple models, including polynomial models and models involving factors.
4.2.5 Define the deviance and scaled deviance and state how the parameters of a generalised linear model may be estimated. Describe how a suitable model may be chosen by using an analysis of deviance and by examining the significance of the parameters.
4.2.6 Define the Pearson and deviance residuals and describe how they may be used.
4.2.7 Apply statistical tests to determine the acceptability of a fitted model: Pearson's chi-square test and the likelihood ratio test.
4.2.8 Fit a generalised linear model to a data set and interpret the output.

5 Bayesian statistics (15%)

5.1 Explain the fundamental concepts of Bayesian statistics and use these concepts to calculate Bayesian estimators. (Chapters 14, 15 and 16)
5.1.1 Use Bayes' theorem to calculate simple conditional probabilities.
5.1.2 Explain what is meant by a prior distribution, a posterior distribution and a conjugate prior distribution.
5.1.3 Derive the posterior distribution for a parameter in simple cases.
5.1.4 Explain what is meant by a loss function.
5.1.5 Use simple loss functions to derive Bayesian estimates of parameters.
5.1.6 Derive credible intervals in simple cases.
5.1.7 Explain what is meant by the credibility premium formula and describe the role played by the credibility factor.
5.1.8 Explain the Bayesian approach to credibility theory and use it to derive credibility premiums in simple cases.
5.1.9 Explain the empirical Bayes approach to credibility theory and use it to derive credibility premiums in simple cases.
5.1.10 Explain the differences between the two approaches and state the assumptions underlying each of them.

Core Reading

The Subject CS1 Course Notes include the Core Reading in full, integrated throughout the course.

Further reading

The exam will be based on the relevant Syllabus and Core Reading and the ActEd course material will be the main source of tuition for students.
1.3 Subject CS1 summary of ActEd products

The following products are available for Subject CS1:
- Course Notes
- Paper B Online Resources (PBOR), including the Y Assignments
- X Assignments: four assignments:
    X1, X2: 80-mark tests (you are allowed 2¾ hours to complete these)
    X3, X4: 100-mark tests (you are allowed 3¼ hours to complete these)
- Y Assignments: two assignments:
    Y1, Y2: 100-mark tests (you are allowed 1¾ hours to complete these)
- Series X Marking
- Series Y Marking
- Online Classroom: over 150 tutorial units
- Flashcards
- Revision Notes: seven A5 booklets
- ASET (2014-17 papers): four years of exam papers, ie eight sittings, covering the period April 2014 to September 2017
- ASET (2019-21 papers): three years of exam papers, covering the period April 2019 to September 2021
- Mini ASET: covering the April 2022 exam paper
- Mock Exam: one 100-mark test for the Paper A examination and a separate 100-mark test for the practical Paper B exam
- Additional Mock Pack (AMP): two additional 100-mark Paper A tests and two additional 100-mark Paper B tests
- Mock Exam Marking
- Marking Vouchers.

Products are generally available in both paper and eBook format. Visit www.ActEd.co.uk for full details about available eBooks, software requirements and restrictions.

The following tutorials are typically available for Subject CS1:
- Regular Tutorials (four days)
- Block Tutorials (four days)
- a Preparation Day for the practical exam.

Full details are set out in our Tuition Bulletin, which is available on our website at www.ActEd.co.uk.

1.4 Subject CS1 skills and assessment

Technical skills

Subjects CS1 and CS2 are very mathematical and have relatively few questions requiring wordy answers.

Exam skills

Exam question skill levels

In the CS subjects, the approximate split of assessment across the three skill types is:
- Knowledge: 20%
- Application: 65%
- Higher Order skills: 15%.
Assessment

Assessment consists of a combination of a 3¼-hour examination and a 1¾-hour practical data analysis and statistical modelling examination.

1.5 Subject CS1 frequently asked questions

Q: What knowledge of earlier subjects should I have?
A: No knowledge of earlier subjects is required.

Q: What level of mathematics is required?
A: The level of maths you need for this course is broadly A-level standard. However, there may be some symbols (eg the gamma function) that are not usually included on A-level syllabuses. You will find the course (and the exam) much easier if you feel comfortable with the mathematical techniques (eg integration by parts) used in the course and you feel confident in applying them yourself.

If your maths or statistics is a little rusty you may wish to consider purchasing additional material to help you get up to speed. The course Pure Maths and Statistics for Actuarial Studies is available from ActEd and it covers the mathematical techniques that are required for the Core Principles subjects, some of which are beyond A-Level (or Higher) standard. You do not need to work through the whole course in order; you can just refer to it when you need help on a particular topic. An initial assessment to test your mathematical skills and further details regarding the course can be found on our website.

You may also find this Assumed Knowledge chapter useful:
www.ActEd.co.uk/help_and_advice_CS1_assumed_knowledge.html

Q: What should I do if I discover an error in the course?
A: If you find an error in the course, please check our website at:
www.ActEd.co.uk/paper_corrections.html
to see if the correction has already been dealt with. Otherwise please send details via email to CS1@bpp.com.

Q: Who should I send feedback to?
A: We are always happy to receive feedback from students, particularly details concerning any errors, contradictions or unclear statements in the courses.
If you have any comments on this course in general, please email to CS1@bpp.com.

If you have any comments or concerns about the Syllabus or Core Reading, these can be passed on to the profession via ActEd. Alternatively, you can send them directly to the Institute and Faculty of Actuaries' Examination Team by email to education.services@actuaries.org.uk.

2.1 Before you start

When studying for the Institute and Faculty of Actuaries' exams, you will need:
- a copy of the Formulae and Tables for Examinations of the Faculty of Actuaries and the Institute of Actuaries, 2nd Edition (2002); these are referred to simply as the Tables
- a scientific calculator or Excel.

The Tables are available from the Institute and Faculty of Actuaries' eShop. Please visit www.actuaries.org.uk.

2.2 Core study material

This section explains the role of the Syllabus, Core Reading and supplementary ActEd text. It also gives guidance on how to use these materials most effectively in order to pass the exam.

Some of the information below is also contained in the introduction to the Core Reading produced by the Institute and Faculty of Actuaries.

Syllabus

The Syllabus for Subject CS1 has been produced by the Institute and Faculty of Actuaries. The relevant individual syllabus objectives are included at the start of each course chapter and a complete copy of the Syllabus is included in Section 1.2 of this Study Guide. We recommend that you use the Syllabus as an important part of your study.

Core Reading

The Core Reading has been produced by the Institute and Faculty of Actuaries. The purpose of the Core Reading is to assist in ensuring that tutors, students and examiners have clear shared appreciation of the requirements of the Syllabus for the qualification examinations for Fellowship of the Institute and Faculty of Actuaries.
The Core Reading supports coverage of the Syllabus in helping to ensure that both depth and breadth are reinforced. It is therefore important that students have a good understanding of the concepts covered by the Core Reading.

The examinations require students to demonstrate their understanding of the concepts given in the Syllabus and described in the Core Reading; this will be based on the legislation, professional guidance, etc that are in force when the Core Reading is published, ie on 31 May in the year preceding the examinations.

Therefore the exams in April and September 2022 will be based on the Syllabus and Core Reading as at 31 May 2021. We recommend that you always use the up-to-date Core Reading to prepare for the exams.

Examiners will have this Core Reading when setting the papers. In preparing for examinations, students are advised to work through past examination questions and will find additional tuition helpful. The Core Reading will be updated each year to reflect changes in the Syllabus, to reflect current practice, and in the interest of clarity.

Accreditation

The Institute and Faculty of Actuaries would like to thank the numerous people who have helped in the development of the material contained in this Core Reading.

ActEd text

Core Reading deals with each syllabus objective and covers what is needed to pass the exam. However, the tuition material that has been written by ActEd enhances it by giving examples and further explanation of key points.

Here is an excerpt from some ActEd Course Notes to show you how to identify Core Reading and the ActEd material. Core Reading is shown in this bold font.

In the example given above, the index will fall if the actual share price goes below the theoretical ex-rights share price. Again, this is consistent with what would happen to an underlying portfolio.
After allowing for chain-linking, the formula for the investment index then becomes:

$$I(t) = \frac{\sum_i N_{i,t} P_{i,t}}{B(t)}$$

where $N_{i,t}$ is the number of shares issued for the ith constituent at time t; $B(t)$ is the base value, or divisor, at time t.

In the printed excerpt, the callout "This is ActEd text" marks the passage in normal font and the callout "This is Core Reading" marks the passage in bold font.

Here is an excerpt from some ActEd Course Notes to show you how to identify Core Reading for R code.

The R code to draw a scatterplot for a bivariate data frame, <data>, is:

    plot(<data>)

Further explanation on the use of R will not be provided in the Course Notes, but instead be picked up in the Paper B Online Resources (PBOR). We recommend that you refer to and use PBOR at the end of each chapter, or couple of chapters, that contains a significant number of R references.

Copyright

All study material produced by ActEd is copyright and is sold for the exclusive use of the purchaser. The copyright is owned by Institute and Faculty Education Limited, a subsidiary of the Institute and Faculty of Actuaries. Unless prior authority is granted by ActEd, you may not hire out, lend, give out, sell, store or transmit electronically or photocopy any part of the study material. You must take care of your study material to ensure that it is not used or copied by anybody else.

Legal action will be taken if these terms are infringed. In addition, we may seek to take disciplinary action through the Institute and Faculty of Actuaries or through your employer.

These conditions remain in force after you have finished using the course.

2.3 ActEd study support

This section gives a description of the products offered by ActEd.

Successful students tend to undertake three main study activities:
1. Learning: study and understanding of subject material
2. Revision: learning subject material and preparing to tackle exam-style questions
3. Rehearsal: answering exam-style questions, culminating in answering questions at exam speed.
Different approaches suit different people. For example, you may like to revise material gradually over the months running up to the exams or you may do your revision in a shorter period just before the exams. Also, these three activities will almost certainly overlap.

We offer a flexible range of products to suit you and let you control your own learning and exam preparation. The following table shows the products that we produce. Not all products are available for all subjects.

LEARNING:              Course Notes; Paper B Online Resources (PBOR)
LEARNING & REVISION:   Assignments; Combined Materials Pack (CMP); Assignment Marking; Tutorials; Online Classroom
REVISION:              Flashcards; Sound Revision
REVISION & REHEARSAL:  Revision Notes; ASET
REHEARSAL:             Mock Exam; Additional Mock Pack (AMP); Mock Marking

The products and services are described in more detail below.

Learning products

Course Notes

The Course Notes will help you develop the basic knowledge and understanding of principles needed to pass the exam. They incorporate the complete Core Reading and include full explanation of all the syllabus objectives, with worked examples and questions (including some past exam questions) to test your understanding.

Each chapter includes:
- the relevant syllabus objectives
- a chapter summary
- a page of important formulae or definitions (where appropriate)
- practice questions with full solutions.

Paper B Online Resources (PBOR)

The Paper B Online Resources (PBOR) will help you prepare for the practical paper. Delivered through a virtual learning environment (VLE), you will have access to worked examples and practice questions. PBOR will also include the Y Assignments, which are two exam-style assessments.

Learning & revision products

X Assignments

The Series X Assignments are assessments that cover the material in each part of the course in turn. They can be used to develop and test your understanding of the material.
Y Assignments

The Series Y Assignments are exam-style assessments that cover material across the whole course.

Combined Materials Pack (CMP)

The Combined Materials Pack (CMP) comprises the Course Notes, PBOR and the Series X Assignments.

CMP Upgrade

The purpose of the CMP Upgrade is to enable you to amend last year's study material to make it suitable for study for this year. Wherever possible, it lists the changes to the syllabus objectives, Core Reading, the Course Notes and the X / Y Assignments since last year that might realistically affect your chance of success in the exam. It is produced so that you can manually amend your notes. The upgrade includes replacement pages and additional pages where appropriate.

However, if a large number of changes have been made to the Course Notes and X / Y Assignments, it is not practical to produce a full upgrade, and the upgrade will only outline the most significant changes. In this case, we recommend that you purchase a replacement CMP (printed copy or eBook) or Course Notes at a significantly reduced price.

The CMP Upgrade can be downloaded free of charge on our website at www.ActEd.co.uk.

A separate upgrade for eBooks is not produced but a significant discount is available for retakers wishing to re-purchase the latest eBook.

X / Y Assignment Marking

We are happy to mark your attempts at the X and/or Y assignments. Marking is not included with the Assignments or the CMP and you need to order both Series X and Series Y Marking separately. You should submit your script as an attachment to an email, in the format detailed in your assignment instructions. You will be able to download your marker's feedback via a secure link.

Don't underestimate the benefits of attempting and submitting assignments:

Question practice during this phase of your study gives an early focus on the end goal of answering exam-style questions.
You're incentivised to keep up with your study plan and get a regular, realistic assessment of your progress.

Objective, personalised feedback from a high quality marker will highlight areas on which to work and help with exam technique.

In a recent study, we found that students who attempt more than half the assignments have significantly higher pass rates.

There are two different types of marking product: Series Marking and Marking Vouchers.

Series Marking

Series Marking applies to a specified subject, session and student. If you purchase Series Marking, you will not be able to defer the marking to a future exam sitting or transfer it to a different subject or student.

We typically provide full solutions with the Series Assignments. However, if you order Series Marking at the same time as you order the Series Assignments, you can choose whether or not to receive a copy of the solutions in advance. If you choose not to receive them with the study material, you will be able to download the solutions via a secure link when your marked script is returned (or following the final deadline date if you do not submit a script).

If you are having your attempts at the assignments marked by ActEd, you should submit your scripts regularly throughout the session, in accordance with the schedule of recommended dates set out on our website at www.ActEd.co.uk. This will help you to pace your study throughout the session and leave an adequate amount of time for revision and question practice.

The recommended submission dates are realistic targets for the majority of students. Your scripts will be returned more quickly if you submit them well before the final deadline dates.

Any script submitted after the relevant final deadline date will not be marked. It is your responsibility to ensure that we receive scripts in good time.
Marking Vouchers

Marking Vouchers give the holder the right to submit a script for marking at any time, irrespective of the individual assignment deadlines, study session, subject or person.

Marking Vouchers can be used for any assignment. They are valid for four years from the date of purchase and can be refunded at any time up to the expiry date.

Although you may submit your script with a Marking Voucher at any time, you will need to adhere to the explicit Marking Voucher deadline dates to ensure that your script is returned before the date of the exam. The deadline dates are provided on our website at www.ActEd.co.uk.

Tutorials

Our tutorials are specifically designed to develop the knowledge that you will acquire from the course material into the higher-level understanding that is needed to pass the exam.

We run a range of different tutorials including face-to-face tutorials at various locations, and Live Online tutorials. Full details are set out in our Tuition Bulletin, which is available on our website at www.ActEd.co.uk.

Regular and Block Tutorials

In preparation for these tutorials, we expect you to have read the relevant part(s) of the Course Notes before attending the tutorial so that the group can spend time on exam questions and discussion to develop understanding rather than basic bookwork.

You can choose one of the following types of tutorial:
- Regular Tutorials spread over the session
- a Block Tutorial held two to eight weeks before the exam.

The tutorials outlined above will focus on and develop the skills required for the Paper A examination. Students wishing for some additional tutor support working through exam-style questions for Paper B may wish to attend a Preparation Day. These will be available Live Online or face-to-face, where students will need to provide their own device capable of running R.

Online Classroom

The Online Classroom acts as either a valuable add-on or a great alternative to a face-to-face or Live Online tutorial, focussing on the Paper A examination.
At the heart of the Online Classroom in each subject is a comprehensive, easily-searched collection of tutorial units. These are a mix of:
- teaching units, helping you to really get to grips with the course material, and
- guided questions, enabling you to learn the most efficient ways to answer questions and avoid common exam pitfalls.

The best way to discover the Online Classroom is to see it in action. You can watch a sample of the Online Classroom tutorial units on our website at www.ActEd.co.uk.

Revision products

For most subjects, there is a lot of material to revise. Finding a way to fit revision into your routine as painlessly as possible has got to be a good strategy. Flashcards and Sound Revision are inexpensive options that can provide a massive boost. They can also provide a variation in activities during a study day, and so help you to maintain concentration and effectiveness.

Flashcards

Flashcards are a set of A6-sized cards that cover the key points of the subject that most students want to commit to memory. Each flashcard has questions on one side and the answers on the reverse. We recommend that you use the cards actively and test yourself as you go.

Sound Revision

It is reported that only 30% of information that is read is retained but this rises to 50% if the information is also heard. Sound Revision is a set of audio files, designed to help you remember the most important aspects of the Core Reading. The files cover the majority of the course, split into a number of manageable topics based on the chapters in the Course Notes. Each section lasts no longer than a few minutes.

Choice of revision product

Different students will have preferences for different revision products. So, what might influence your choice between these study aids?

The following questions and comments might help you to choose the revision products that are most suitable for you:

Do you have a regular train or bus journey?
Flashcards are ideal for regular bursts of revision on the move.

Do you want to fit more study into your routine?
Flashcards are a good option for dead time, eg using flashcards on your phone or sticking them on the wall in your study.

Do you find yourself cramming for exams (even if that's not your original plan)?
Flashcards are an extremely efficient way to do your pre-exam preparation.

Do you have some regular time where carrying other materials isn't practical, eg commuting, at the gym, walking the dog?
Sound Revision is an ideal hands-free revision tool.

Do you have a preference for auditory learning, eg do you remember conversations more easily than emails?
Sound Revision will suit your preferred style and be especially effective for you.

Choosing more than one revision product

As there is some degree of overlap between revision products, we do not necessarily recommend using them all simultaneously. However, if you are retaking a subject, then you might consider using a different product than on a previous attempt to keep your revision fresh and effective.

Revision & rehearsal products

Revision Notes

Our Revision Notes have been designed with input from students to help you revise efficiently. They are suitable for first-time sitters who have worked through the ActEd Course Notes or for retakers (who should find them much more useful and challenging than simply reading through the course again).

The Revision Notes are a set of A5 booklets, perfect for revising in places where taking large amounts of study material with you is not practical. Each booklet covers one main theme or a set of related topics from the course and includes:
- Core Reading to develop your bookwork knowledge
- relevant past exam questions with concise solutions from the last ten years
- other useful revision aids.

ActEd Solutions with Exam Technique (ASET)

The ActEd Solutions with Exam Technique (ASET) contains our solutions to a number of past exam papers, plus comment and explanation. In particular, it highlights how questions might have been analysed and interpreted so as to produce a good solution with a wide range of relevant points. This will be valuable in approaching questions in subsequent examinations.

Rehearsal products

Mock Exam

The Mock Exam consists of two papers. There is a 100-mark mock exam for the Paper A examination and a separate mock exam for the practical Paper B exam. These provide a realistic test of your exam readiness.

It is based on the Mock Exam from last year but it has been updated to reflect any changes to the Syllabus and Core Reading.

Additional Mock Pack (AMP)

The Additional Mock Pack (AMP) consists of four further 100-mark mock exam papers: Mock Exam 2 (Papers A and B) and Mock Exam 3 (Papers A and B). This is ideal if you are retaking and have already sat the Mock Exam, or if you just want some extra question practice.

Mock Marking

We are happy to mark your attempts at the mock exams. The same general principles apply as for the Assignment Marking. In particular:
- Mock Exam Marking applies to a specified subject, session and student. In this subject it covers the marking of both Paper A and Paper B.
- Marking Vouchers can be used for each mock exam paper. You will need two marking vouchers in order to have both Paper A and Paper B marked.

Marking vouchers have to be used for marking the AMP mocks and can be used for marking the Mock Exam.

Recall that:
- marking is not included with the products themselves and you need to order it separately
- you should submit your script via email in the format detailed in the mock exam instructions
- you will be able to download the feedback on your marked script via a secure link.
2.4 Study skills and assessment

Technical skills

The Core Reading and exam papers for these subjects tend to be very technical. The exams themselves therefore have many calculation and manipulation questions. The emphasis in the exam will be on understanding the mathematical techniques and applying them to various, frequently unfamiliar, situations. It is important to have a feel for what the numerical answer should be by having a deep understanding of the material and by doing reasonableness checks.

As a high level of pure mathematics and statistics is generally required for the Core Principles subjects, it is important that your mathematical skills are extremely good. If you are a little rusty you may wish to consider purchasing additional material to help you get up to speed. The course Pure Maths and Statistics for Actuarial Studies is available from ActEd and it covers the mathematical techniques that are required for the Core Principles subjects, some of which are beyond A-Level (or Higher) standard. You do not need to work through the whole course in order; you can just refer to it when you need help on a particular topic. An initial assessment to test your mathematical skills and further details regarding the course can be found on our website at www.ActEd.co.uk.

Study skills

Overall study plan

We suggest that you develop a realistic study plan, building in time for relaxation and allowing some time for contingencies. Be aware of busy times at work, when you may not be able to take as much study leave as you would like. Once you have set your plan, be determined to stick to it. You don't have to be too prescriptive at this stage about what precisely you do on each study day. The main thing is to be clear that you will cover all the important activities in an appropriate manner and leave plenty of time for revision and question practice.
Aim to manage your study so as to allow plenty of time for the concepts you meet in these courses to bed down in your mind. Most successful students will probably aim to complete the courses at least a month before the exam, thereby leaving a sufficient amount of time for revision. By finishing the courses as quickly as possible, you will have a much clearer view of the big picture. It will also allow you to structure your revision so that you can concentrate on the important and difficult areas.

You can also try looking at our discussion forum, which can be accessed at www.ActEd.co.uk/forums (or use the link from our home page at www.ActEd.co.uk). There are some good suggestions from students on how to study.

Study sessions

Only do activities that will increase your chance of passing. Try to avoid including activities for the sake of it and don't spend time reviewing material that you already understand. You will only improve your chances of passing the exam by getting on top of the material that you currently find difficult.

Ideally, each study session should have a specific purpose and be based on a specific task, eg "Finish reading Chapter 3 and attempt Practice Questions 3.4, 3.7 and 3.12", as opposed to a specific amount of time, eg "Three hours studying the material in Chapter 3".

Try to study somewhere quiet and free from distractions (eg an area at home dedicated to study). Find out when you operate at your peak, and endeavour to study at those times of the day. This might be between 8am and 10am or could be in the evening. Take short breaks during your study to remain focused. It's definitely time for a short break if you find that your brain is tired and that your concentration has started to drift from the information in front of you.

Order of study

We suggest that you work through each of the chapters in turn. To get the maximum benefit from each chapter you should proceed in the following order:
1. Read the syllabus objectives. These are set out in the box at the start of each chapter.
2. Read the Chapter Summary at the end of each chapter. This will give you a useful overview of the material that you are about to study and help you to appreciate the context of the ideas that you meet.
3. Study the Course Notes in detail, annotating them and possibly making your own notes. Try the self-assessment questions as you come to them. As you study, pay particular attention to the listing of the Syllabus Objectives and to the Core Reading.
4. Read the Chapter Summary again carefully. If there are any ideas that you can't remember covering in the Course Notes, read the relevant section of the notes again to refresh your memory.
5. Attempt (at least some of) the Practice Questions that appear at the end of the chapter.
6. Where relevant, work through the relevant Paper B Online Resources for the chapter(s). You will need to have a good understanding of the relevant section of the course before you attempt the corresponding section of PBOR.
7. Think about what specifically you might want to include from that chapter in the reference materials that you choose to have to hand during the exam. For example, you might want to put together some easy-reference lists of key concepts or formulae that can be referred to quickly and conveniently.

It's a fact that people are more likely to absorb something if they review it several times. So, do look over the chapters you have studied so far from time to time. It is useful to re-read the Chapter Summaries or to try the Practice Questions again a few days after reading the chapter itself. It's a good idea to annotate the questions with details of when you attempted each one. This makes it easier to ensure that you try all of the questions as part of your revision without repeating any that you got right first time.
Once you've read the relevant part of the notes and tried a selection of questions from the Practice Questions (and attended a tutorial, if appropriate) you should attempt the corresponding assignment. If you submit your assignment for marking, spend some time looking through it carefully when it is returned. It can seem a bit depressing to analyse the errors you made, but you will increase your chances of passing the exam by learning from your mistakes. The markers will try their best to provide practical comments to help you to improve.

To be really prepared for the exam, you should not only know and understand the Core Reading but also be aware of what the examiners will expect. Your revision programme should include plenty of question practice so that you are aware of the typical style, content and marking structure of exam questions. You should attempt as many past exam questions as you can.

Active study

Here are some techniques that may help you to study actively.
1. Don't believe everything you read. Good students tend to question everything that they read. They will ask "why, how, what for, when?" when confronted with a new concept, and they will apply their own judgement. This contrasts with those who unquestioningly believe what they are told, learn it thoroughly, and reproduce it (unquestioningly?) in response to exam questions.
2. Another useful technique as you read the Course Notes is to think of possible questions that the examiners could ask. This will help you to understand the examiners' point of view and should mean that there are fewer nasty surprises in the exam. Use the Syllabus to help you make up questions.
3. Annotate your notes with your own ideas and questions. This will make you study more actively and will help when you come to review and revise the material. These notes may also be useful to refer to in the exam. Do not simply copy out the notes without thinking about the issues.
4.
Attempt the questions in the notes as you work through the course. Produce your answer before you refer to the solution.
5. Attempt other questions and assignments on a similar basis, ie produce your answer before looking at the solution provided. Attempting the assignments under timed conditions has some particular benefits:
   - It forces you to think and act in a way that is similar to how you will behave in the exam.
   - When you have your assignments marked it is much more useful if the marker's comments can show you how to improve your performance under timed conditions than your performance when you are under no time pressure.
   - The knowledge that you are going to do an assignment under timed conditions and then submit it (however good or bad) for marking can act as a powerful incentive to make you study each part as well as possible.
   - It is also quicker than trying to produce perfect answers.
6. Sit a mock exam four to six weeks before the real exam to identify your weaknesses and work to improve them. You could use a mock exam written by ActEd or a past exam paper. Ensure that you have your reference materials handy, as you plan to in the actual exam, so that you can practise finding what you need in them quickly and efficiently. (You might even be able to add to / modify your reference materials to increase their usefulness.)

You can find further information on how to study in the profession's Student Handbook, which you can download from their website at:
www.actuaries.org.uk/studying

Revision and exam skills

Revision skills

You will have sat many exams before and will have mastered the exam and revision techniques that suit you. However, it is important to note that due to the high volume of work involved in the Core Principles subjects it is not possible to leave all your revision to the last minute. Students who prepare well in advance have a better chance of passing their exams on the first sitting.
Unprepared students find that they are under time pressure in the exam. Therefore it is important to find ways of maximising your score in the shortest possible time. Part of your preparation should be to practise a large number of exam-style questions under timed conditions as soon as possible. This will:
- help you to develop the necessary understanding of the techniques required
- highlight the key topics, which crop up regularly in many different contexts and questions
- help you to practise the specific skills that you will need to pass the exam.

There are many sources of exam-style questions. You can use past exam papers, the Practice Questions at the end of each chapter (which include many past exam questions), assignments, mock exams, the Revision Notes and ASET.

Exam question skill levels

Exam questions are not designed to be of similar difficulty. The Institute and Faculty of Actuaries specifies different skill levels at which questions may be set.

Questions may be set at any skill level:
- Knowledge: demonstration of a detailed knowledge and understanding of the topic
- Application: demonstration of an ability to apply the principles underlying the topic within a given context
- Higher Order: demonstration of an ability to perform deeper analysis and assessment of situations, including forming judgements, taking into account different points of view, comparing and contrasting situations, suggesting possible solutions and actions, and making recommendations.

Command verbs

The Institute and Faculty of Actuaries use command verbs (such as "Define", "Discuss" and "Explain") to help students to identify what the question requires. The profession has produced a document, "Command verbs used in the Associate and Fellowship examinations", to help students to understand what each command verb is asking them to do.
It also gives the following advice:
- The use of a specific command verb within a syllabus objective does not indicate that this is the only form of question which can be asked on the topic covered by that objective.
- The examiners may ask a question on any syllabus topic using any of the agreed command verbs, as are defined in the document.

You can find the relevant document on the profession's website at:
www.actuaries.org.uk/studying/prepare-your-exams

Past exam papers

You can download some past exam papers and Examiners' Reports from the profession's website at www.actuaries.org.uk. However, please be aware that the majority of these exam papers are for the pre-2019 syllabus and so not all questions will be relevant.

The examination

IMPORTANT NOTE: The following advice was correct at the time of publication, however it is important to keep up-to-date with any changes. See the IFoA's website for the latest guidance.

There is a lot of useful information about the exams at:
www.actuaries.org.uk/studying/my-exams/ifoa-exams
including an Examinations Handbook that gives guidance specific to sitting exams online.

For the exam, ensure you have ready:
- your reference materials, with helpful bookmarks
- rough paper and a pen / pencil
- a calculator / Excel (or equivalent)
- a printer (if you wish to print out the exam paper)
- a copy of the Tables.

Please also refer to the profession's website and your examination instructions for details about what you will need for the practical Paper B exam.

2.5 Queries and feedback

Questions and queries

From time to time you may come across something in the study material that is unclear to you. The easiest way to solve such problems is often through discussion with friends, colleagues and peers, as they will probably have had similar experiences whilst studying. If there's no-one at work to talk to then use our discussion forum at www.ActEd.co.uk/forums (or use the link from our home page at www.ActEd.co.uk).

Our online forum is dedicated to actuarial students so that you can get help from fellow students on any aspect of your studies, from technical issues to study advice. You could also use it to get ideas for revision or for further reading around the subject that you are studying. ActEd tutors will visit the site from time to time to ensure that you are not being led astray and we also post other frequently asked questions from students on the forum as they arise.

If you are still stuck, then you can send queries by email to the relevant subject email address (see Section 1.5), but we recommend that you try the forum first. We will endeavour to contact you as soon as possible after receiving your query but you should be aware that it may take some time to reply to queries, particularly when tutors are away from the office running tutorials. At the busiest teaching times of year, it may take us more than a week to get back to you.

If you have many queries on the course material, you should raise them at a tutorial or book a personal tuition session with an ActEd tutor. Information about personal tuition is set out in our current brochure. Please email ActEd@bpp.com for more details.

Feedback

If you find an error in the course, please check the corrections page of our website (www.ActEd.co.uk/paper_corrections.html) to see if the correction has already been dealt with. Otherwise please send details via email to the relevant subject email address (see Section 1.5).

Each year our tutors work hard to improve the quality of the study material and to ensure that the courses are as clear as possible and free from errors. We are always happy to receive feedback from students, particularly details concerning any errors, contradictions or unclear statements in the courses.
If you have any comments on this course please email them to the relevant subject email address (see Section 1.5).

Our tutors also work with the profession to suggest developments and improvements to the Syllabus and Core Reading. If you have any comments or concerns about the Syllabus or Core Reading, these can be passed on via ActEd. Alternatively, you can send them directly to the Institute and Faculty of Actuaries' Examination Team by email to education.services@actuaries.org.uk.

CS1-01: Data analysis

Data analysis

Syllabus objectives

2.1 Data analysis

2.1.1 Describe the possible aims of data analysis (eg descriptive, inferential and predictive).
2.1.2 Describe the stages of conducting a data analysis to solve real-world problems in a scientific manner and describe tools suitable for each stage.
2.1.3 Describe sources of data and explain the characteristics of different data sources, including extremely large data sets.
2.1.4 Explain the meaning and value of reproducible research and describe the elements required to ensure a data analysis is reproducible.
0 Introduction

This chapter provides an introduction to the underlying principles of data analysis, in particular within an actuarial context.

Data analysis is the process by which data is gathered in its raw state and analysed or processed into information which can be used for specific purposes. This chapter will describe some of the different forms of data analysis, the steps involved in the process and consider some of the practical problems encountered in data analytics.

Although this chapter looks at the general principles involved in data analysis, it does not deal with the statistical techniques required to perform a data analysis. These are covered elsewhere, in Subjects CS1 and CS2.

1 Aims of a data analysis

Three key forms of data analysis will be covered in this section:
- descriptive;
- inferential; and
- predictive.

1.1 Descriptive analysis

Data presented in its raw state can be difficult to manage and draw meaningful conclusions from, particularly where there is a large volume of data to work with. A descriptive analysis solves this problem by presenting the data in a simpler format, more easily understood and interpreted by the user.

Simply put, this might involve summarising the data or presenting it in a format which highlights any patterns or trends. A descriptive analysis is not intended to enable the user to draw any specific conclusions. Rather, it describes the data actually presented.

For example, it is likely to be easier to understand the trend and variation in the sterling/euro exchange rate over the past year by looking at a graph of the daily exchange rate rather than a list of values. The graph is likely to make the information easier to absorb.

Two key measures, or parameters, used in a descriptive analysis are the measure of central tendency and the dispersion. The most common measurements of central tendency are the mean, the median and the mode. Typical measurements of the dispersion are the standard deviation and ranges such as the interquartile range.

Measures of central tendency tell us about the "average" value of a data set, whereas measures of dispersion tell us about the "spread" of the values. We will use many of these measures later in the course.

It can also be important to describe other aspects of the shape of the (empirical) distribution of the data, for example by calculating measures of skewness and kurtosis.

Empirical means "based on observation". So an empirical distribution relates to the distribution of the actual data points collected, rather than any assumed underlying theoretical distribution.

Skewness is a measure of how symmetrical a data set is, and kurtosis is a measure of how likely extreme values are to appear (ie those in the tails of the distribution). We shall touch on these later.

1.2 Inferential analysis

Often it is not feasible or practical to collect data in respect of the whole population, particularly when that population is very large. For example, when conducting an opinion poll in a large country, it may not be cost effective to survey every citizen. A practical solution to this problem might be to gather data in respect of a sample, which is used to represent the wider population. The analysis of the data from this sample is called inferential analysis.

The sample analysis involves estimating the parameters as described in Section 1.1 above and testing hypotheses.
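As a rough illustration of how the descriptive measures from Section 1.1 can be estimated from a sample, the sketch below computes them for an invented population and for a random sample drawn from it. It is written in Python rather than the R used in the course, and the population of 1,000 claim amounts, the seed, the sample size and the function name `summarise` are all assumptions made purely for illustration. The skewness shown is a simple moment-based measure, which may differ from the definition used later in the course.

```python
import random
import statistics


def summarise(data):
    """Return common descriptive measures for a list of numbers."""
    values = sorted(data)
    mean = statistics.mean(values)
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    # Population-style central moments, used only for a simple skewness measure.
    m2 = sum((x - mean) ** 2 for x in values) / len(values)
    m3 = sum((x - mean) ** 3 for x in values) / len(values)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "iqr": q3 - q1,                  # interquartile range
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
    }


# Hypothetical population: 1,000 claim amounts, invented for illustration.
random.seed(2022)
population = [round(random.expovariate(1 / 500)) for _ in range(1000)]

# Inferential step: describe a random sample and compare with the population.
sample = random.sample(population, 200)
pop_summary = summarise(population)
sample_summary = summarise(sample)

for key in ("mean", "median", "sd", "iqr"):
    print(f"{key:>6}: population {pop_summary[key]:8.1f}"
          f"   sample {sample_summary[key]:8.1f}")
```

In practice the population figures would not be available; they are computed here only so that the sample estimates can be compared against them.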
It is generally accepted that if the sample is large and taken at random (selected without prejudice), then it quite accurately represents the statistics of the population, such as distribution, probability, mean, standard deviation. However, this is also contingent upon the user making reasonably correct hypotheses about the population in order to perform the inferential analysis.

Care may need to be taken to ensure that the sample selected is likely to be representative of the whole population. For example, an opinion poll on a national issue conducted in urban locations on weekday afternoons between 2pm and 4pm may not accurately reflect the views of the whole population. This is because those living in rural areas and those who regularly work during that period are unlikely to have been surveyed, and these people might tend to have a different viewpoint to those who have been surveyed.

Sampling, inferential analysis and parameter estimation are covered in more detail later.

1.3 Predictive analysis

Predictive analysis extends the principles behind inferential analysis in order for the user to analyse past data and make predictions about future events.

It achieves this by using an existing set of data with known attributes (also known as features), known as the training set, in order to discover potentially predictive relationships. Those relationships are tested using a different set of data, known as the test set, to assess the strength of those relationships.

A typical example of a predictive analysis is regression analysis, which is covered in more detail later. The simplest form of this is linear regression, where the relationship between a scalar dependent variable and an explanatory or independent variable is assumed to be linear and the training set is used to determine the slope and intercept of the line. A practical example might be the relationship between a car's braking distance and its speed.

In this example, the car's speed is the explanatory (or independent)
variable and the braking distance is the dependent variable.

Question

Based on data gathered at a particular weather station on the monthly rainfall in mm ($r$) and the average number of hours of sunshine per day ($s$), a researcher has determined the following explanatory relationship:

$s = 9 - 0.1r$

Using this model:
(i) Estimate the average number of hours of sunshine per day, if the monthly rainfall is 50mm.
(ii) State the impact on the average number of hours of sunshine per day of each extra millimetre of rainfall in a month.

Solution

(i) When $r = 50$:

$s = 9 - 0.1 \times 50 = 4$

ie there are 4 hours of sunshine per day on average.

(ii) For each extra millimetre of rainfall in a month, the average number of hours of sunshine per day falls by 0.1 hours, or 6 minutes.

2 The data analysis process

While the process to analyse data does not follow a set pattern of steps, it is helpful to consider the key stages which might be used by actuaries when collecting and analysing data.

The key steps in a data analysis process can be described as follows:

1. Develop a well-defined set of objectives which need to be met by the results of the data analysis.

The objective may be to summarise the claims from a sickness insurance product by age, gender and cause of claim, or to predict the outcome of the next national parliamentary election.

2. Identify the data items required for the analysis.

3. Collection of the data from appropriate sources.

The relevant data may be available internally (eg from an insurance company's administration department) or may need to be gathered from external sources (eg from a local council office or government statistical service).

4. Processing and formatting data for analysis, eg inputting into a spreadsheet, database or other model.

5. Cleaning data, eg addressing unusual, missing or inconsistent values.
6. Exploratory data analysis, which may include:
(a) Descriptive analysis; producing summary statistics on central tendency and spread of the data.
(b) Inferential analysis; estimating summary parameters of the wider population of data, testing hypotheses.
(c) Predictive analysis; analysing data to make predictions about future events or other data sets.

7. Modelling the data.

8. Communicating the results.

It will be important when communicating the results to make it clear what data was used, what analyses were performed, what assumptions were made, the conclusion of the analysis, and any limitations of the analysis.

9. Monitoring the process; updating the data and repeating the process if required.

A data analysis is not necessarily just a one-off exercise. An insurance company analysing the claims from its sickness policies may wish to do this every few years to allow for the new data gathered and to look for trends. An opinion poll company attempting to predict an election result is likely to repeat the poll a number of times in the weeks before the election to monitor any changes in views during the campaign period.

Throughout the process, the modelling team needs to ensure that any relevant professional guidance has been complied with. For example, the Financial Reporting Council has issued a Technical Actuarial Standard (TAS) on the principles for Technical Actuarial Work (TAS100) which includes principles for the use of data in technical actuarial work.

Knowledge of the detail of this TAS is not required for CS1.

Further, the modelling team should also remain aware of any legal requirement to be complied with. Such legal requirement may include aspects around consumer/customer data protection and gender discrimination.
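The processing, cleaning and descriptive steps (steps 4 to 6(a)) listed earlier in this section can be sketched in R as follows. This is only a minimal illustration: the data frame, its column names and the plausible age range of 16 to 110 are hypothetical assumptions, not part of the Core Reading.

```r
# Step 4: hypothetical claims records, held in a data frame for analysis.
# In practice these would be read in from an administration system or file.
claims <- data.frame(
  age    = c(34, 51, NA, 45, 230, 29),   # NA is missing; 230 is implausible
  gender = c("F", "M", "M", "F", "F", "M"),
  amount = c(1200, 850, 430, 1975, 660, 1510)
)

# Step 5: cleaning. Flag records whose age is missing or outside an
# assumed plausible range, and set them aside for investigation.
needs_review <- is.na(claims$age) | claims$age < 16 | claims$age > 110
clean_claims <- claims[!needs_review, ]

# Step 6(a): descriptive analysis of the cleaned data
overall_mean   <- mean(clean_claims$amount)
overall_sd     <- sd(clean_claims$amount)
mean_by_gender <- tapply(clean_claims$amount, clean_claims$gender, mean)

print(sum(needs_review))   # number of records set aside
print(mean_by_gender)      # mean claim amount for each gender
```

Carrying out the cleaning in code, rather than by editing the records directly, leaves a record of exactly which cases were excluded and why.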
3 Data sources

Step 3 of the process described in Section 2 above refers to collection of the data needed to meet the objectives of the analysis from appropriate sources.

As consideration of Steps 3, 4, and 5 makes clear, getting data into a form ready for analysis is a process, not a single event. Consequently, what is seen as the source of data can depend on your viewpoint.

Suppose you are conducting an analysis which involves collecting survey data from a sample of people in the hope of drawing inferences about a wider population. If you are in charge of the whole process, including collecting the primary data from your selected sample, you would probably view the source of the data as being the people in your sample. Having collected, cleaned and possibly summarised the data you might make it available to other investigators in JavaScript object notation (JSON) format via a web application programming interface (API). You will then have created a secondary source for others to use.

In this section we discuss how the characteristics of the data are determined both by the primary source and the steps carried out to prepare it for analysis, which may include the steps on the journey from primary to secondary source. Details of particular data formats (such as JSON), or of the mechanisms for getting data from an external source into a local data structure suitable for analysis, are not covered in CS1.

Primary data can be gathered as the outcome of a designed experiment or from an observational study (which could include a survey of responses to specific questions). In all cases, knowledge of the details of the collection process is important for a complete understanding of the data, including possible sources of bias or inaccuracy.
Issues that the analyst should be aware of include:
- whether the process was manual or automated;
- limitations on the precision of the data recorded;
- whether there was any validation at source; and
- if data wasn't collected automatically, how was it converted to an electronic form.

These factors can affect the accuracy and reliability of the data collected. For example:
- in a survey, an individual's salary may be specified as falling into given bands, eg 20,000 - 29,999, 30,000 - 39,999 etc, rather than the precise value being recorded
- if responses were collected on handwritten forms, and then manually input into a database, there is greater scope for errors to appear.

Where randomisation has been used to reduce the effect of bias or confounding variables it is important to know the sampling scheme used:
- simple random sampling;
- stratified sampling; or
- another sampling method.

Question

A researcher wishes to survey 10% of a company's workforce.

Describe how the sample could be selected using:
(a) simple random sampling
(b) stratified sampling.

Solution

(a) Simple random sampling

Using simple random sampling, each employee would have an equal chance of being selected. This could be achieved by taking a list of the employees, allocating each a number, and then selecting 10% of the numbers at random (either manually, or using a computer-generated process).

(b) Stratified sampling

Using stratified sampling, the workforce would first be split into groups (or strata) defined by specific criteria, eg level of seniority. Then 10% of each group would be selected using simple random sampling. In this way, the resulting sample would reflect the structure of the company by seniority.

This aims to overcome one of the issues with simple random sampling, ie that the sample obtained does not fully reflect the characteristics of the population.
With a simple random sample, it would be possible for all those selected to be at the same level of seniority, and so be unrepresentative of the workforce as a whole.

Data may have undergone some form of pre-processing. A common example is grouping (eg by geographical area or age band). In the past, this was often done to reduce the amount of storage required and to make the number of calculations manageable. The scale of computing power available now means that this is less often an issue, but data may still be grouped: perhaps to anonymise it, or to remove the possibility of extracting sensitive (or perhaps commercially sensitive) details.

Other aspects of the data which are determined by the collection process, and which affect the way it is analysed, include the following:

- Cross-sectional data involves recording values of the variables of interest for each case in the sample at a single moment in time.

For example, recording the amount spent in a supermarket by each member of a loyalty card scheme this week.

- Longitudinal data involves recording values at intervals over time.

For example, recording the amount spent in a supermarket by a particular member of a loyalty card scheme each week for a year.

- Censored data occurs when the value of a variable is only partially known, for example, if a subject in a survival study withdraws, or survives beyond the end of the study: here a lower bound for the survival period is known but the exact value isn't.

Censoring is dealt with in detail in Subject CS2.

- Truncated data occurs when measurements on some variables are not recorded so are completely unknown.

For example, if we were collecting data on the periods of time for which a user's internet connection was disrupted, but only recorded the duration of periods of disruption that lasted 5 minutes or longer, we would have a truncated data set.
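The difference between truncated and censored data can be seen in a small R sketch based on the internet disruption example. The durations and the 30-minute observation limit are invented purely for illustration.

```r
# Hypothetical disruption durations in minutes (not real data)
durations <- c(2, 7, 12, 3, 45, 6, 90, 1)

# Truncation: disruptions shorter than 5 minutes are never recorded at all,
# so the analyst does not even know that they happened
truncated <- durations[durations >= 5]

# Right censoring: suppose (as an assumption) that observation stops after
# 30 minutes, so longer disruptions are recorded only as "at least 30"
cutoff   <- 30
observed <- pmin(truncated, cutoff)   # recorded value (a lower bound if censored)
censored <- truncated > cutoff        # TRUE where only a lower bound is known

print(data.frame(observed, censored))
```

In the truncated data set three of the eight disruptions are missing entirely, whereas each censored case still appears in the data with a known lower bound.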
3.1 Big data

The term 'big data' is not well defined but has come to be used to describe data with characteristics that make it impossible to apply traditional methods of analysis (for example, those which rely on a single, well-structured data set which can be manipulated and analysed on a single computer). Typically, this means automatically collected data with characteristics that have to be inferred from the data itself rather than known in advance from the design of an experiment.

Given the description above, the properties that can lead data to be classified as 'big' include:
- size, not only does big data include a very large number of individual cases, but each might include very many variables, a high proportion of which might have empty (or null) values leading to sparse data;
- speed, the data to be analysed might be arriving in real time at a very fast rate, for example, from an array of sensors taking measurements thousands of times every second;
- variety, big data is often composed of elements from many different sources which could have very different structures or is often largely unstructured;
- reliability, given the above three characteristics we can see that the reliability of individual data elements might be difficult to ascertain and could vary over time (for example, an internet connected sensor could go offline for a period).

Examples of big data are:
- the information held by large online retailers on items viewed, purchased and recommended by each of its customers
- measurements of atmospheric pressure from sensors monitored by a national meteorological organisation
- the data held by an insurance company received from the personal activity trackers (that monitor daily exercise, food intake and sleep, for example) of its policyholders.
Although the four points above (size, speed, variety, reliability) have been presented in the context of big data, they are characteristics that should be considered for any data source. For example, an actuary may need to decide if it is advisable to increase the volume of data available for a given investigation by combining an internal data set with data available externally. In this case, the extra processing complexity required to handle a variety of data, plus any issues of reliability of the external data, will need to be considered.

3.2 Data security, privacy and regulation

In the design of any investigation, consideration of issues related to data security, privacy and complying with relevant regulations should be paramount. It is especially important to be aware that combining different data from different 'anonymised' sources can mean that individual cases become identifiable.

Another point to be aware of is that just because data has been made available on the internet, it doesn't mean that others are free to use it as they wish. This is a very complex area and laws vary between jurisdictions.

4 Reproducible research

An example reference for this section is in Peng (2016). For the full reference, see the end of this section.

4.1 The meaning of reproducible research

Reproducibility refers to the idea that when the results of a statistical analysis are reported, sufficient information is provided so that an independent third party can repeat the analysis and arrive at the same results.

In science, reproducibility is linked to the concept of replication which refers to someone repeating an experiment and obtaining the same (or at least consistent) results.
Replication can be hard, or expensive or impossible, for example if:
- the study is big;
- the study relies on data collected at great expense or over many years; or
- the study is of a unique occurrence (eg the standards of healthcare in the aftermath of a particular event).

Due to the possible difficulties of replication, reproducibility of the statistical analysis is often a reasonable alternative standard.

So, rather than the results of the analysis being validated by an independent third party completely replicating the study from scratch (including gathering a new data set), the validation is achieved by an independent third party reproducing the same results based on the same data set.

4.2 Elements required for reproducibility

Typically, reproducibility requires the original data and the computer code to be made available (or fully specified) so that other people can repeat the analysis and verify the results. In all but the most trivial cases, it will be necessary to include full documentation (eg description of each data variable, an audit trail describing the decisions made when cleaning and processing the data, and fully documented code).

Documentation of models is covered in Subject CP2.

Fully documented code can be achieved through literate statistical programming (as defined by Knuth, 1992) where the program includes an explanation of the program in plain language, interspersed with code snippets. Within the R environment, a tool which allows this is R Markdown.

R Markdown enables documents to be produced that include the code used, an explanation of that code, and, if desired, the output from that code.

As a simpler example, it may be possible to document the work carried out in a spreadsheet by adding comments or annotations to explain the operations performed in particular cells, rows or columns.
Although not strictly required to meet the definition of reproducibility, a good version control process can ensure evolving drafts of code, documentation and reports are kept in alignment between the various stages of development and review, and changes are reversible if necessary. There are many tools that are used for version control. A popular tool used for version control is git.

A detailed knowledge of the version control tool git is not required for Subject CS1.

In addition to version control, documenting the software environment, the computing architecture, the operating system, the software toolchain, external dependencies and version numbers can all be important in ensuring reproducibility.

As an example, in the R programming language, the command:

    sessionInfo()

provides information about the operating system, version of R and version of all R packages being used.

Question

Give a reason why documenting the version number of the software used can be important for reproducibility of a data analysis.

Solution

Some functions might:
- be available in one version of a package that are not available in another (older) version, or
- behave differently in different versions of a package.

This could prevent someone being able to reproduce the analysis.

Where there is randomness in the statistical or machine learning techniques being used (for example random forests or neural networks) or where simulation is used, replication will require the random seed to be set.

Machine learning is covered in Subject CS2.

Simulation will be dealt with in more detail later in the course. At this point, it is sufficient to know that each simulation that is run will be based on a series of pseudo-random numbers.
So, for example, one simulation will be based on one particular series of pseudo-random numbers, but unless explicitly coded otherwise, a different simulation will be based on a different series of pseudo-random numbers. The second simulation will then produce different results, rather than replicating the original results, which is the desired outcome here.

To ensure the two simulations give the same results, they would both need to be based on the same series of pseudo-random numbers. This is known as setting the random seed. We will do this regularly when using R to carry out a simulation.

Doing things by hand is very likely to create problems in reproducing the work. Examples of doing things by hand are:
- manually editing spreadsheets (rather than reading the raw data into a programming environment and making the changes there);
- editing tables and figures (rather than ensuring that the programming environment creates them exactly as needed);
- downloading data manually from a website (rather than doing it programmatically); and
- pointing and clicking (unless the software used creates an audit trail of what has been clicked).

Pointing and clicking relates to choosing a particular operation from an on-screen menu, for example. This action would not ordinarily be recorded electronically.

The main thing to note here is that the more of the analysis that is performed in an automated way, the easier it will be to reproduce by another individual. Manual interventions may be forgotten altogether, and even if they are remembered, can be difficult to document clearly.
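The effect of setting the random seed, described earlier in this section, can be seen in a short R sketch. The seed value 123 and the use of five normal variates are arbitrary choices made for illustration.

```r
# Two runs with no seed control: each uses a different series of
# pseudo-random numbers, so the results differ
run_a <- rnorm(5)
run_b <- rnorm(5)

# Two runs that each set the same (arbitrary) seed first: both are based
# on the same series of pseudo-random numbers, so the results match
set.seed(123)
run_c <- rnorm(5)
set.seed(123)
run_d <- rnorm(5)

print(identical(run_a, run_b))   # expected to be FALSE
print(identical(run_c, run_d))   # TRUE
```

Recording the seed in the code, alongside the output of sessionInfo(), gives a third party what they need to regenerate the same simulated values.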
4.3 The value of reproducibility

Many actuarial analyses are undertaken for commercial, not scientific, reasons and are not published, but reproducibility is still valuable:
- reproducibility is necessary for a complete technical work review (which in many cases will be a professional requirement) to ensure the analysis has been correctly carried out and the conclusions are justified by the data and analysis;
- reproducibility may be required by external regulators and auditors;
- reproducible research is more easily extended to investigate the effect of changes to the analysis, or to incorporate new data;
- it is often desirable to compare the results of an investigation with a similar one carried out in the past; if the earlier investigation was reported reproducibly, an analysis of the differences between the two can be carried out with confidence;
- the discipline of reproducible research, with its emphasis on good documentation of processes and data storage, can lead to fewer errors that need correcting in the original work and, hence, greater efficiency.

There are some issues that reproducibility does not address:
- Reproducibility does not mean that the analysis is correct. For example, if an incorrect distribution is assumed, the results may be wrong even though they can be reproduced by making the same incorrect assumption about the distribution. However, by making clear how the results are achieved, it does allow transparency so that incorrect analysis can be appropriately challenged.
- If activities involved in reproducibility happen only at the end of an analysis, this may be too late for resulting challenges to be dealt with. For example, resources may have been moved on to other projects.

4.4 References

Further information on the material in this section is given in the references:

Knuth, Donald E. (1992). Literate Programming. California: Center for the Study of Language and Information.
Stanford University. ISBN 978-0-937073-80-3.

Peng, R. D., 2016, Report Writing for Data Science in R, www.Leanpub.com/reportwriting

Chapter 1 Summary

The three key forms of data analysis are:
- descriptive analysis: producing summary statistics (eg measures of central tendency and dispersion) and presenting the data in a simpler format
- inferential analysis: using a data sample to estimate summary parameters for the wider population from which the sample was taken, and testing hypotheses
- predictive analysis: extends the principles of inferential analysis to analyse past data and make predictions about future events.

The key steps in the data analysis process are:
1. Develop a well-defined set of objectives which need to be met by the results of the data analysis.
2. Identify the data items required for the analysis.
3. Collection of the data from appropriate sources.
4. Processing and formatting data for analysis, eg inputting into a spreadsheet, database or other model.
5. Cleaning data, eg addressing unusual, missing or inconsistent values.
6. Exploratory data analysis, which may include descriptive analysis, inferential analysis or predictive analysis.
7. Modelling the data.
8. Communicating the results.
9. Monitoring the process; updating the data and repeating the process if required.

In the data collection process, the primary source of the data is the population (or population sample) from which the raw data is obtained. If, once the information is collected, cleaned and possibly summarised, it is made available for others to use via a web interface, this is then a secondary source of data.
Other aspects of the data determined by the collection process that may affect the analysis are:
- Cross-sectional data involves recording values of the variables of interest for each case in the sample at a single moment in time.
- Longitudinal data involves recording values at intervals over time.
- Censored data occurs when the value of a variable is only partially known.
- Truncated data occurs when measurements on some variables are not recorded so are completely unknown.

The term 'big data' can be used to describe data with characteristics that make it impossible to apply traditional methods of analysis. Typically, this means automatically collected data with characteristics that have to be inferred from the data itself rather than known in advance from the design of the experiment.

Properties that can lead to data being classified as 'big' include:
- size of the data set
- speed of arrival of the data
- variety of different sources from which the data is drawn
- reliability of the data elements might be difficult to ascertain.

Replication refers to an independent third party repeating an experiment and obtaining the same (or at least consistent) results. Replication of a data analysis can be difficult, expensive or impossible, so reproducibility is often used as a reasonable alternative standard. Reproducibility refers to reporting the results of a statistical analysis in sufficient detail that an independent third party can repeat the analysis on the same data set and arrive at the same results.

Elements required for reproducibility:
- the original data and fully documented computer code need to be made available
- good version control
- documentation of the software used, computing architecture, operating system, external dependencies and version numbers
- where randomness is involved in the process, replication will require the random seed to be set
- limiting the amount of work done by hand.
Chapter 1 Practice Questions

1.1 The data analysis department of a mobile phone messaging app provider has gathered data on the number of messages sent by each user of the app on each day over the past 5 years. The geographical location of each user (by country) is also known.

(i) Describe each of the following terms as it relates to a data set, and give an example of each as it relates to the app provider's data:
(a) cross-sectional
(b) longitudinal.

(ii) Give an example of each of the following types of data analysis that could be carried out using the app provider's data:
(a) descriptive
(b) inferential
(c) predictive.

1.2 Explain the regulatory and legal requirements that should be observed when conducting a data analysis exercise.

1.3 (Exam style) A car insurer wishes to investigate whether young drivers (aged 17-25) are more likely to have an accident in a given year than older drivers.

Describe the steps that would be followed in the analysis of data for this investigation. [7]

1.4 (Exam style)
(i) In the context of data analysis, define the terms replication and reproducibility. [2]
(ii) Give three reasons why replication of a data analysis can be difficult to achieve in practice. [3]
[Total 5]

Chapter 1 Solutions

1.1 (i)(a) Cross-sectional

Cross-sectional data involves recording the values of the variables of interest for each case in the sample at a single moment in time.

In this data set, this relates to the number of messages sent by each user on any particular day.

(i)(b) Longitudinal

Longitudinal data involves recording the values of the variables of interest at intervals over time.
In this data set, this relates to the number of messages sent by a particular user on each day over the 5-year period.

(ii)(a) Descriptive analysis

Examples of descriptive analysis that could be carried out on this data set include:
- calculating the mean and standard deviation of the number of messages sent each day by users in each country
- plotting a graph of the total messages sent each day worldwide, to illustrate the overall trend in the number of messages sent over the 5 years
- calculating what proportion of the total messages sent in each year originate in each country.

(ii)(b) Inferential analysis

Examples of inferential analysis that could be carried out on this data set include:
- testing the hypothesis that more messages are sent at weekends than on weekdays
- assessing whether there is a significant difference in the rate of growth of the number of messages sent each day by users in different countries over the 5-year period.

(ii)(c) Predictive analysis

Examples of predictive analysis that could be carried out on this data set include:
- forecasting which countries will be the major users of the app in 5 years' time, and will therefore need the most technical support staff
- predicting the number of messages sent on the app's busiest day (eg New Year's Eve) next year, to ensure that the provider continues to have sufficient capacity.

1.2 Throughout the data analysis process, it is important to ensure that any relevant professional guidance has been complied with. For example, the UK's Financial Reporting Council has issued a Technical Actuarial Standard (TAS) on the principles for Technical Actuarial Work (TAS100). This describes the principles that should be adhered to when using data in technical actuarial work.
The data analysis team must also be aware of any legal requirements to be complied with relating to, for example:
- protection of an individual's personal data and privacy
- discrimination on the grounds of gender, age, or other reasons.

With regard to privacy regulations, it is important to note that combining data from different sources may mean that individuals can be identified, even if they are anonymous in the original data sources.

Finally, data that have been made available on the internet cannot necessarily be used for any purpose. Any legal restrictions should be checked before using the data, noting that laws can vary between jurisdictions.

1.3 The key steps in the data analysis process in this scenario are:

1. Develop a well-defined set of objectives that need to be met by the results of the data analysis. [1/2]

Here, the objective is to determine whether young drivers are more likely to have an accident in a given year than older drivers. [1/2]

2. Identify the data items required for the analysis. [1/2]

The data items needed would include the number of drivers of each age during the investigation period and the number of accidents they had. [1/2]

3. Collection of the data from appropriate sources. [1/2]

The insurer will have its own internal data from its administration department on the number of policyholders of each age during the investigation period and which of them had accidents. [1/2]

The insurer may also be able to source data externally, eg from an industry body that collates information from a number of insurers. [1/2]

4. Processing and formatting the data for analysis, eg inputting into a spreadsheet, database or other model. [1/2]

The data will need to be extracted from the administration system and loaded into whichever statistical package is being used for the analysis.
[1/2]

If different data sets are being combined, they will need to be put into a consistent format and any duplicates (ie the same record appearing in different data sets) will need to be removed. [1/2]

5. Cleaning data, eg addressing unusual, missing or inconsistent values. [1/2]

For example, the age of the driver might be missing, or be too low or high to be plausible. These cases will need investigation. [1/2]

6. Exploratory data analysis, which here takes the form of inferential analysis... [1/2]

... as we are testing the hypothesis that younger drivers are more likely to have an accident than older drivers. [1/2]

7. Modelling the data. [1/2]

This may involve fitting a distribution to the annual number of accidents arising from the policyholders in each age group. [1/2]

8. Communicating the results. [1/2]

This will involve describing the data sources used, the model and analyses performed, and the conclusion of the analysis (ie whether young drivers are indeed more likely to have an accident than older drivers), along with any limitations of the analysis. [1/2]

9. Monitoring the process; updating the data and repeating the process if required. [1/2]

The car insurer may wish to repeat the process again in a few years' time, using the data gathered over that period, to ensure that the conclusions of the original analysis remain valid. [1/2]

10. Ensuring that any relevant professional guidance and legislation (eg on age discrimination) has been complied with. [1/2]

[Maximum 7]

1.4 (i) Definitions

Replication refers to an independent third party repeating an analysis from scratch (including gathering an independent data sample) and obtaining the same (or at least consistent) results. [1]

Reproducibility refers to reporting the results of a statistical analysis in sufficient detail that an independent third party can repeat the analysis on the same data set and arrive at the same results.
[1]
[Total 2]
(ii) Three reasons why replication is difficult
Replication of a data analysis can be difficult if:
  - the study is big; [1]
  - the study relies on data collected at great expense or over many years; or [1]
  - the study is of a unique occurrence (eg the standards of healthcare in the aftermath of a particular event). [1]
[Total 3]

CS1-02: Probability distributions

Syllabus objectives
1.1 Define basic univariate distributions and use them to calculate probabilities, quantiles and moments.
1.1.1 Define and explain the key characteristics of the discrete distributions: geometric, binomial, negative binomial, hypergeometric, Poisson and uniform on a finite set.
1.1.2 Define and explain the key characteristics of the continuous distributions: normal, lognormal, exponential, gamma, chi-square, t, F, beta and uniform on an interval.
1.1.3 Evaluate probabilities and quantiles associated with distributions (by calculation or using statistical software as appropriate).
1.1.4 Define and explain the key characteristics of the Poisson process and explain the connection between the Poisson process and the Poisson distribution.
1.1.5 Generate basic discrete and continuous random variables using the inverse transform method.
1.1.6 Generate discrete and continuous random variables using statistical software.

0 Introduction
This unit introduces the standard distributions that are used in actuarial work.
We look in this chapter at all the standard probability distributions used in Subject CS1. This chapter does assume that you have some basic knowledge of statistics and probability. If your knowledge in this area is rusty, you can purchase additional ActEd materials to remind you of those statistical ideas. Please see the ActEd website for further details.
There is a book called 'Formulae and Tables for Examinations' (simply denoted the Tables in this course) available in the exam, which contains many relevant formulae for the distributions in this chapter as well as probability tables. This is available from the IFoA and you should purchase a copy as soon as possible (if you have not already done so) as it is essential to your studying of the Subject CS1 course.
Many of the formulae in this course are contained in the Tables. So you should concentrate on being able to apply them to calculate means, variances, coefficients of skewness and probabilities, rather than memorising them.
If you have studied statistics to A-Level standard or equivalent you should find this chapter straightforward. However, some of the standard distributions (eg lognormal and gamma), which are used frequently in statistical work in finance and insurance, may be new to you. Since we will be using the properties of these distributions in the rest of the course, it is vital that you feel confident with them.

1 Important discrete distributions
In this section we will look at the standard discrete distributions that we will use in actuarial modelling work.
Remember, all of these results are given in the Tables, so concentrate on understanding and applying them, particularly to calculating probabilities, rather than memorising them.
The distributions considered here are all models for the number of something, eg number of successes, number of trials, number of deaths, number of claims. The values assumed by the variables are integers from the set {0, 1, 2, 3, ...}; such variables are often referred to as counting variables.

1.1 Uniform distribution
Sample space $S = \{1, 2, 3, \ldots, k\}$.
Probability measure: equal assignment ($1/k$) to all outcomes, ie all outcomes are equally likely.
Random variable $X$ defined by $X(i) = i$, $i = 1, 2, 3, \ldots, k$.
Distribution: $P(X = x) = \frac{1}{k}$, $x = 1, 2, 3, \ldots, k$
For example, if $X$ is the score on a fair die, $P(X = x) = \frac{1}{6}$ for $x = 1, 2, \ldots, 6$.
Moments:
$$\mu = E[X] = \frac{1}{k}(1 + 2 + \cdots + k) = \frac{1}{k} \times \frac{1}{2}k(k+1) = \frac{k+1}{2}$$
To see that $1 + 2 + \cdots + k = \frac{1}{2}k(k+1)$, suppose that:
$$S = 1 + 2 + \cdots + (k-1) + k$$
Rearranging the terms on the right-hand side, we see that:
$$S = k + (k-1) + \cdots + 2 + 1$$
Adding these two expressions for $S$ gives a sum of $k$ terms, each equal to $k+1$:
$$2S = (1 + k) + (2 + k - 1) + \cdots + (k - 1 + 2) + (k + 1) = k(k+1)$$
So:
$$S = 1 + 2 + \cdots + k = \frac{1}{2}k(k+1)$$
The second moment is:
$$E[X^2] = \frac{1}{k}(1^2 + 2^2 + \cdots + k^2) = \frac{1}{k} \times \frac{k(k+1)(2k+1)}{6} = \frac{(k+1)(2k+1)}{6}$$
The result $1^2 + 2^2 + \cdots + k^2 = \frac{1}{6}k(k+1)(2k+1)$ can be proved by induction, but this is beyond the scope of Subject CS1.
Using the formulae for $E(X)$ and $E(X^2)$ given above, we can show that the variance of $X$ is:
$$\sigma^2 = \frac{k^2 - 1}{12}$$

Question
Verify that $\sigma^2 = \frac{k^2 - 1}{12}$ for the uniform distribution.
Solution
$\sigma$ is the standard deviation, and $\sigma^2$ is the variance, which is calculated as:
$$\sigma^2 = E(X^2) - [E(X)]^2 = \frac{(k+1)(2k+1)}{6} - \frac{(k+1)^2}{4} = \frac{2(2k^2 + 3k + 1) - 3(k^2 + 2k + 1)}{12} = \frac{k^2 - 1}{12}$$

R code for simulating a random sample from the discrete uniform distribution
To generate a vector for sample space $S = \{1, 2, 3, \ldots, 20\}$:
    S = 1:20
To simulate 100 values from this sample space:
    sample(S, 100, replace = TRUE)

1.2 Bernoulli distribution
A Bernoulli trial is an experiment which has (or can be regarded as having) only two possible outcomes: $s$ ('success') and $f$ ('failure').
Sample space $S = \{s, f\}$. The words 'success' and 'failure' are merely labels; they do not necessarily carry with them the ordinary meanings of the words.
For example, in life insurance, a success could mean a death.
Probability measure: $P(\{s\}) = p$, $P(\{f\}) = 1 - p$
Random variable $X$ defined by $X(s) = 1$, $X(f) = 0$. $X$ is the number of successes that occur (0 or 1).
Distribution: $P(X = x) = p^x (1-p)^{1-x}$, $x = 0, 1$; $0 < p < 1$
Moments: $\mu = p$, $\sigma^2 = p - p^2 = p(1-p)$
A Bernoulli variable is also called an indicator variable: its value can be used to indicate whether or not some specified event, for example $A$, occurs. Set $X = 1$ if $A$ occurs, 0 if $A$ does not occur. If $P(A) = p$ then $X$ has the above Bernoulli distribution. The event $A$ could, for example, be the survival of an assured life over one year.
An assured life is a person with an insurance policy that makes a payment on death.
Another example of a Bernoulli random variable occurs when a fair die is thrown once. If $X$ is the number of sixes obtained, then $p = \frac{1}{6}$, $P(X = 0) = \frac{5}{6}$ and $P(X = 1) = \frac{1}{6}$.
See R code for Binomial distribution.

1.3 Binomial distribution
Consider a sequence of $n$ Bernoulli trials as above such that:
(i) the trials are independent of one another, ie the outcome of any trial does not depend on the outcomes of any other trials
and:
(ii) the trials are identical, ie at each trial $P(\{s\}) = p$.
Such a sequence is called a 'sequence of $n$ independent, identical, Bernoulli($p$) trials' or, for short, a 'sequence of $n$ Bernoulli($p$) trials'.
A quick way of saying 'independent and identically distributed' is IID.
The independence allows the probability of a joint outcome involving two or more trials to be expressed as the product of the probabilities of the outcomes associated with each separate trial concerned. We will need this idea later.
Sample space $S$: the joint set of outcomes of all $n$ trials.
Probability measure: as above for each trial.
Random variable $X$ is the number of successes that occur in the $n$ trials.
Distribution: $P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$, $x = 0, 1, 2, \ldots, n$; $0 < p < 1$
The coefficients here are the same as in the binomial expansion that can be obtained using the numbers from Pascal's triangle, ie:
$$\binom{n}{x} = {}^nC_x = \frac{n!}{(n-x)!\,x!}$$
We can work out these quantities using the nCr function on a calculator.
If $X$ is distributed binomially with parameters $n$ and $p$, then we can write $X \sim Bin(n, p)$.
The fact that a $Bin(n, p)$ distribution arises from the sum of $n$ independent and identical Bernoulli($p$) trials is important and will be used later to prove some important results.
Moments: $\mu = np$, $\sigma^2 = np(1-p)$
Very often when using the binomial distribution we will write $1 - p = q$.
As an example of the binomial distribution, suppose that $X$ is the number of sixes obtained when a fair die is thrown 10 times. Then $P(X = x) = {}^{10}C_x \left(\frac{1}{6}\right)^x \left(\frac{5}{6}\right)^{10-x}$ and the probability of exactly one six in ten throws is ${}^{10}C_1 \left(\frac{1}{6}\right)^1 \left(\frac{5}{6}\right)^9 = 0.3230$. There are $10 = {}^{10}C_1$ ways of obtaining exactly one six, ie the six could be on the first throw, the second throw, ... or the tenth throw.

Question
Calculate the probability that at least 9 out of a group of 10 people who have been infected by a serious disease will survive, if the survival probability for the disease is 70%.
Solution
The number of survivors is distributed binomially with parameters $n = 10$ and $p = 0.7$. If $X$ is the number of survivors, then:
$$P(X \ge 9) = P(X = 9 \text{ or } 10) = \binom{10}{9} 0.7^9 \times 0.3 + \binom{10}{10} 0.7^{10} = 0.1493$$
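As a rough check on the two binomial figures quoted above (0.3230 and 0.1493), the probabilities can be reproduced with the R functions dbinom (binomial probability function) and pbinom (binomial CDF). This is only an illustrative sketch; the variable names are assumptions and do not come from the course.

```r
# Probability of exactly one six in 10 throws of a fair die: X ~ Bin(10, 1/6)
p_one_six = dbinom(1, size = 10, prob = 1/6)

# P(X >= 9) for X ~ Bin(10, 0.7), computed two ways
p_direct = sum(dbinom(9:10, size = 10, prob = 0.7))  # P(X = 9) + P(X = 10)
p_cdf    = 1 - pbinom(8, size = 10, prob = 0.7)      # 1 - P(X <= 8)

# Display the three values rounded to 4 decimal places
print(round(c(p_one_six, p_direct, p_cdf), 4))
```

The two routes to the survival probability should agree with each other and, after rounding, with the hand calculation.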
Alternatively, we could use the cumulative binomial probabilities given on page 187 of the Tables. The figure for $x = 8$ in the Tables for the $Bin(10, 0.7)$ distribution is 0.8507. Subtracting this from 1, we get $1 - 0.8507 = 0.1493$ as before.
The R code for simulating values and calculating probabilities and quantiles from the binomial distribution uses the R functions rbinom, dbinom, pbinom and qbinom. The prefixes r, d, p, and q stand for random generation, density, distribution and quantile functions respectively.
The R code for simulating a random sample of 100 values from the binomial distribution with $n = 20$ and $p = 0.3$ is:
    n = 20
    p = 0.3
    rbinom(100, n, p)
To calculate $P(X = 2)$:
    dbinom(2, n, p)
Similarly, the cumulative distribution function (CDF) and quantiles can be calculated with pbinom and qbinom.
For a Bernoulli distribution the parameter n is set to $n = 1$.

1.4 Geometric distribution
Consider again a sequence of independent, identical Bernoulli trials with $P(\{s\}) = p$. The variable of interest now is the number of trials that has to be performed until the first success occurs. Because trials are performed one after the other and a success is awaited, this distribution is one of a class of distributions called waiting-time distributions.
Random variable $X$: Number of the trial on which the first success occurs
Distribution: For $X = x$ there must be a run of $(x - 1)$ failures followed by a success, so $P(X = x) = p(1-p)^{x-1}$, $x = 1, 2, 3, \ldots$ ($0 < p < 1$)
Moments: $\mu = \frac{1}{p}$, $\sigma^2 = \frac{1-p}{p^2}$
For example, if the probability that a phone call leads to a sale is $\frac{1}{4}$ and $X$ is the number of phone calls required to make the first sale, then $P(X = 3) = \left(\frac{3}{4}\right)^2 \times \frac{1}{4} = 0.140625$.

Question
If the probability of having a male or female child is equal, calculate the probability that a woman's fourth child is her first son.
Solution
The probability is $\left(\frac{1}{2}\right)^3 \times \frac{1}{2} = 0.0625$.
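The two geometric probabilities above can be checked with R's dgeom function. One point to be aware of: dgeom(y, prob) returns $P(Y = y)$ where $Y$ is the number of failures before the first success, so for the 'number of the trial' variable $X$ used here we evaluate it at $x - 1$. The sketch below is illustrative only and the variable names are assumptions.

```r
# R's dgeom(y, prob) gives P(Y = y), where Y counts the failures before
# the first success. For X = trial number of the first success,
# P(X = x) = dgeom(x - 1, prob).

# First sale on the third phone call, p = 1/4
p_third_call = dgeom(3 - 1, prob = 1/4)

# Fourth child is the first son, p = 1/2
p_fourth_child = dgeom(4 - 1, prob = 1/2)

print(c(p_third_call, p_fourth_child))
```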
Consider the conditional probability $P(X > x + n \mid X > n)$. Given that there have already been $n$ trials without a success, what is the probability that more than $x$ additional trials are required to get a success?
To answer this, we will need the conditional probability formula $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$.
The intersection of the events $X > n$ and $X > x + n$ is just $X > x + n$. Also, $X > n$ means that the first $n$ trials are all failures, so $P(X > n) = (1-p)^n$. So:
$$P(X > x + n \mid X > n) = \frac{P(X > x + n)}{P(X > n)} = \frac{(1-p)^{x+n}}{(1-p)^n} = (1-p)^x = P(X > x)$$
ie just the same as the original probability that more than $x$ trials are required.
The lack of success on the first $n$ trials is irrelevant: under this model the chances of success are not any better because there has been a run of bad luck.
This characteristic, a reflection of the 'independent, identical trials' structure, is important, and is referred to as the memoryless property.

Question
The probability of having a male or female child is equal. A woman has two boys and a girl. Calculate the probability that her next two children are girls.
Solution
Due to the memoryless property, the children she has so far are irrelevant when it comes to working out the probability that the next two are girls. So the probability is $\left(\frac{1}{2}\right)^2 = 0.25$.

Another formulation of the geometric distribution is sometimes used. Let $Y$ be the number of failures before the first success. Then $P(Y = y) = p(1-p)^y$, $y = 0, 1, 2, 3, \ldots$ with mean $\mu = \frac{1-p}{p}$. Here $Y = X - 1$, where $X$ is defined as above.

Question
Determine the variance for this formulation.
Solution
Since $Y = X - 1$:
$$\text{var}(Y) = \text{var}(X) = \frac{1-p}{p^2}$$
Subtracting a constant from a random variable does not change the spread of the distribution.
The R code for simulating values and calculating probabilities and quantiles from the geometric distribution is similar to the R code used for the binomial distribution using the R functions rgeom, dgeom, pgeom and qgeom.
For example:
    dgeom(10, 0.3)
calculates the probability $P(Y = 10)$ for $p = 0.3$.

1.5 Negative binomial distribution
This is a generalisation of the geometric distribution.
The random variable $X$ is the number of the trial on which the $k$th success occurs, where $k$ is a positive integer.
For example, in a telesales company, $X$ might be the number of phone calls required to make the fifth sale.
Distribution: $P(X = x) = \binom{x-1}{k-1} p^k (1-p)^{x-k}$, $x = k, k+1, \ldots$; $0 < p < 1$
We say that $X$ has a Type 1 negative binomial $(k, p)$ distribution.
The probabilities satisfy the recurrence relationship:
$$P(X = x) = \frac{x-1}{x-k}(1-p)\,P(X = x-1)$$
Note that in applying this model, the value of $k$ is known.
Moments: $\mu = \frac{k}{p}$ and $\sigma^2 = \frac{k(1-p)}{p^2}$
Note: The mean and variance are just $k$ times those for the geometric($p$) variable, which is itself a special case of this random variable (with $k = 1$). Further, the negative binomial variable can be expressed as the sum of $k$ geometric variables (the number of trials to the first success, plus the number of additional trials to the second success, plus ... to the $(k-1)$th success, plus the number of additional trials to the $k$th success).

Question
If the probability that a person will believe a rumour about a scandal in politics is 0.8, calculate the probability that the ninth person to hear the rumour will be the fourth person to believe it.
Solution
Let $X$ be the position of the fourth person who believes it. Then $p = 0.8$, $x = 9$ and $k = 4$, and we have:
$$P(X = 9) = \binom{8}{3} \times 0.8^4 \times 0.2^5 = 0.00734$$
Another formulation of the negative binomial distribution is sometimes used. Let $Y$ be the number of failures before the $k$th success. Then $P(Y = y) = \binom{k+y-1}{y} p^k (1-p)^y$, $y = 0, 1, 2, 3, \ldots$, with mean $\mu = \frac{k(1-p)}{p}$. Here $Y = X - k$, where $X$ is defined as above.
This formulation is called the Type 2 negative binomial distribution and can be found on page 9 of the Tables.
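The link between the two formulations can be illustrated with the rumour example. R's dnbinom(y, size, prob) counts failures before the size-th success, so a Type 1 probability $P(X = x)$ corresponds to dnbinom evaluated at $y = x - k$. The following is a minimal sketch with assumed variable names, not code from the course.

```r
k = 4      # number of successes required
p = 0.8    # probability of success on each trial
x = 9      # trial on which the k-th success occurs

# Type 1 probability function written out directly
p_type1 = choose(x - 1, k - 1) * p^k * (1 - p)^(x - k)

# Same probability via the failure-count form: y = x - k failures
# before the k-th success
p_type2 = dnbinom(x - k, size = k, prob = p)

print(c(p_type1, p_type2))
```

Both values should round to the 0.00734 obtained in the solution.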
It should be noted that in the Tables the combinatorial factor has been rewritten in terms of the gamma function (defined later in this chapter).
The previous formulation is known as the Type 1 negative binomial distribution. The formulae for this version are given on page 8 of the Tables.
The R code for simulating values and calculating probabilities and quantiles from the negative binomial distribution is similar to the R code used for the binomial distribution using the R functions rnbinom, dnbinom, pnbinom and qnbinom.
For example:
    dnbinom(15, 10, 0.3)
calculates the probability $P(Y = 15) = 0.0366544$ for $p = 0.3$ and $k = 10$.
By default, R uses the Type 2 version of the negative binomial distribution.

1.6 Hypergeometric distribution
This is the 'finite population' equivalent of the binomial distribution, in the following sense. Suppose objects are selected at random, one after another, without replacement, from a finite population consisting of $k$ 'successes' and $N - k$ 'failures'. The trials are not independent, since the result of one trial (the selection of a success or a failure) affects the make-up of the population from which the next selection is made.
Random variable $X$: is the number of successes in a sample of size $n$ from a population of size $N$ that has $k$ successes and $N - k$ failures.
Distribution:
$$P(X = x) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}, \quad x = 0, 1, 2, 3, \ldots;\ 0 < p < 1$$
Moments: $\mu = \frac{nk}{N}$, $\sigma^2 = \frac{nk(N-k)(N-n)}{N^2(N-1)}$
(The details of the derivation of the mean and variance of the number of successes are not required by the syllabus.)
Note that the mean is given by $\mu = \frac{nk}{N}$, which parallels the $\mu = np$ result for the binomial distribution, the initial proportion of successes here being $\frac{k}{N}$.
In the above context, the binomial is the model appropriate to selecting with replacement, which is equivalent to selecting from an infinite population ($N \to \infty$) for which $P(\text{success}) = p = \frac{k}{N}$ is kept fixed.
Hence, the binomial, with $p = \frac{k}{N}$, provides a good approximation to the hypergeometric when $N$ is large compared to $n$.
The hypergeometric distribution is used in the grouping of signs test in Subject CS2.
The R code for simulating values and calculating probabilities and quantiles from the hypergeometric distribution is similar to the R code used for other distributions using the R functions rhyper, dhyper, phyper and qhyper.
For example:
    rhyper(20, 15, 10, 5)
simulates 20 values from samples of size 5 from a population in which $k = 15$ and $N - k = 10$.

Question
Among the 58 people applying for a job, only 30 have a particular qualification. If 5 of the group are randomly selected for a survey about the job application procedure, determine the probability that none of the group selected have the qualification. Calculate the answer:
(i) exactly
(ii) using a binomial approximation.
Solution
(i) Let $X$ denote the number of applicants from the group of 5 that have the qualification.
Using the probability function of the hypergeometric distribution with $N = 58$, $k = 30$, and $n = 5$:
$$P(X = 0) = \frac{\binom{30}{0}\binom{28}{5}}{\binom{58}{5}} = 0.0214$$
Alternatively, we could consider in turn the probabilities that each candidate is unqualified, and multiply the probabilities together:
$$\frac{28}{58} \times \frac{27}{57} \times \frac{26}{56} \times \frac{25}{55} \times \frac{24}{54} = 0.0214$$
(ii) Using the hypergeometric distribution, $N = 58$ and $k = 30$, so we will use a binomial approximation with $n = 5$ and $p = \frac{30}{58}$:
$$P(X = 0) \approx \binom{5}{0} p^0 q^5 = \left(\frac{28}{58}\right)^5 = 0.0262$$

1.7 Poisson distribution
This distribution models the number of events that occur in a specified interval of time, when the events occur one after another in time in a well-defined manner. This manner presumes that the events occur singly, at a constant rate, and that the numbers of events that occur in separate (ie non-overlapping) time intervals are independent of one another.
These conditions can be described loosely by saying that the events occur 'randomly, at a rate of .. per ..', and such events are said to occur according to a Poisson process. We will formally define this later in this chapter.
Another approach to the Poisson distribution uses arguments which appear at first sight to be unrelated to the above. Consider a sequence of Binomial$(n, p)$ distributions as $n \to \infty$ and $p \to 0$ together, such that the mean $np$ is held constant at the value $\lambda$. The limit leads to the distribution of the Poisson variable, with parameter $\lambda$. Here $\lambda = np$.
Distribution: $P(X = x) = \frac{\lambda^x}{x!} e^{-\lambda}$, $x = 0, 1, 2, 3, \ldots$; $\lambda > 0$
The probabilities satisfy the recurrence relationship:
$$P(X = x) = \frac{\lambda}{x} P(X = x - 1)$$
If $X$ has a Poisson distribution with parameter $\lambda$, then we can write $X \sim Poi(\lambda)$.
Moments: Since the binomial mean is held constant at $\lambda$ through the limiting process, it is reasonable to suggest that the distribution of $X$ (the limiting distribution) also has mean $\lambda$. This is in fact the case.
The binomial variance is:
$$np(1-p) = n \cdot \frac{\lambda}{n}\left(1 - \frac{\lambda}{n}\right) = \lambda\left(1 - \frac{\lambda}{n}\right) \to \lambda \text{ as } n \to \infty$$
This suggests that $X$ has variance $\lambda$. This is in fact also the case. So $\mu = \sigma^2 = \lambda$.

Question
Using the probability function for the Poisson distribution, prove the formulae for the mean and variance. Hint: for the variance, consider $E[X(X-1)]$.
Solution
The mean is:
$$E(X) = \sum_x x P(X = x) = 0 \cdot e^{-\lambda} + 1 \cdot \lambda e^{-\lambda} + 2 \cdot \frac{\lambda^2}{2!} e^{-\lambda} + 3 \cdot \frac{\lambda^3}{3!} e^{-\lambda} + 4 \cdot \frac{\lambda^4}{4!} e^{-\lambda} + \cdots$$
$$= \lambda e^{-\lambda} + \lambda^2 e^{-\lambda} + \frac{\lambda^3}{2!} e^{-\lambda} + \frac{\lambda^4}{3!} e^{-\lambda} + \cdots = \lambda e^{-\lambda}\left(1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \cdots\right)$$
Since $e^{\lambda} = 1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \cdots$, we obtain:
$$E(X) = \lambda e^{-\lambda} e^{\lambda} = \lambda$$
For the variance we need to work out $E(X^2)$. However, the easiest way to work out the variance is actually to consider $E[X(X-1)]$:
$$E[X(X-1)] = \sum_x x(x-1) P(X = x) = 2 \cdot 1 \cdot \frac{\lambda^2}{2!} e^{-\lambda} + 3 \cdot 2 \cdot \frac{\lambda^3}{3!} e^{-\lambda} + 4 \cdot 3 \cdot \frac{\lambda^4}{4!} e^{-\lambda} + \cdots$$
$$= \lambda^2 e^{-\lambda}\left(1 + \lambda + \frac{\lambda^2}{2!} + \cdots\right) = \lambda^2 e^{-\lambda} e^{\lambda} = \lambda^2$$
Now:
$$E[X(X-1)] = E(X^2) - E(X) \quad\Rightarrow\quad E(X^2) = \lambda^2 + E(X) = \lambda^2 + \lambda$$
So:
$$\text{var}(X) = E(X^2) - [E(X)]^2 = \lambda^2 + \lambda - \lambda^2 = \lambda$$
We can calculate Poisson probabilities in the usual way, using the probability function or the cumulative probabilities given in the Tables.

Question
If goals are scored randomly in a game of football at a constant rate of three per match, calculate the probability that more than 5 goals are scored in a match.
Solution
The number of goals in a match can be modelled as a Poisson distribution with mean $\lambda = 3$.
$$P(X > 5) = 1 - P(X \le 5)$$
We can use the recurrence relationship given:
$P(X = 0) = e^{-3} = 0.0498$
$P(X = 1) = \frac{3}{1} \times 0.0498 = 0.1494$
$P(X = 2) = \frac{3}{2} \times 0.1494 = 0.2240$
$P(X = 3) = \frac{3}{3} \times 0.2240 = 0.2240$
$P(X = 4) = \frac{3}{4} \times 0.2240 = 0.1680$
$P(X = 5) = \frac{3}{5} \times 0.1680 = 0.1008$
So we have $P(X > 5) = 1 - 0.9161 = 0.0839$.
Alternatively, we could obtain this directly using the cumulative Poisson probabilities given on page 176 of the Tables. For $Poi(3)$, the figure for $x = 5$ is 0.91608, and $1 - 0.91608 = 0.08392$.

The Poisson distribution provides a very good approximation to the binomial when $n$ is large and $p$ is small; typical applications have $n = 100$ or more and $p = 0.05$ or less. The approximation depends only on the product $np$ $(= \lambda)$; the individual values of $n$ and $p$ are irrelevant. So, for example, the value of $P(X = x)$ in the case $n = 200$ and $p = 0.02$ is effectively the same as the value of $P(X = x)$ in the case $n = 400$ and $p = 0.01$. When dealing with large numbers of opportunities for the occurrence of 'rare' events (under 'binomial assumptions'), the distribution of the number that occurs depends only on the expected number.
We will look at other approximations in Chapter 6.
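The claim that only the product $np$ matters can be explored numerically. The sketch below tabulates the probability functions for the two binomial cases mentioned above ($n = 200$, $p = 0.02$ and $n = 400$, $p = 0.01$) alongside $Poi(4)$. The range of $x$ values and the variable names are arbitrary choices made for illustration.

```r
x = 0:15          # values of x at which to compare the probabilities
lambda = 4        # common mean: 200 * 0.02 = 400 * 0.01 = 4

bin_200 = dbinom(x, size = 200, prob = 0.02)
bin_400 = dbinom(x, size = 400, prob = 0.01)
poi_4   = dpois(x, lambda)

# Side-by-side table of the three probability functions
comparison = data.frame(x, bin_200, bin_400, poi_4)
print(round(comparison, 4))

# Largest absolute difference between either binomial and the Poisson
max_diff = max(abs(bin_200 - poi_4), abs(bin_400 - poi_4))
print(max_diff)
```

The three columns should be close to one another, with the $n = 400$ case lying nearer to the Poisson values than the $n = 200$ case.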
Question
If each of the 55 million people in the UK independently has probability $1 \times 10^{-8}$ of being killed by a falling meteorite in a given year, use an approximation to calculate the probability of exactly 2 such deaths occurring in a given year.
Solution
If $X$ is the number of people killed by a meteorite in a year then $X$ follows the binomial distribution with $n = 55{,}000{,}000$ and $p = 1 \times 10^{-8}$. We can approximate this by using the Poisson distribution with:
$$\lambda = np = 55{,}000{,}000 \times 1 \times 10^{-8} = 0.55$$
Hence:
$$P(X = 2) \approx \frac{0.55^2}{2!} e^{-0.55} = 0.0873$$
The Poisson distribution is often used to model the number of claims that an insurance company receives per unit of time. It is also used to model the number of accidents along a particular stretch of road.
When events are described as occurring 'as a Poisson process with rate $\lambda$' or 'randomly, at a rate of $\lambda$ per unit time' then the number of events that occur in a time period of length $t$ has a Poisson distribution with mean $\lambda t$.
The Poisson process is discussed in more detail in Section 3.
The R code for simulating values and calculating probabilities and quantiles from the Poisson distribution is similar to the R code used for other distributions using the R functions rpois, dpois, ppois and qpois.
For example, to calculate $P(X \le 5) = 0.9432683$ for $\lambda = 2.7$, use the R code:
    ppois(5, 2.7)

Question
The number of home insurance claims a company receives in a month is distributed as a Poisson random variable with mean 2. Calculate the probability that the company receives exactly 30 claims in a year. Treat all months as if they are of equal length.
Solution
Let $X$ denote the number of home insurance claims received in a year. Since the number of claims in a month follows the $Poi(2)$ distribution, $X \sim Poi(24)$. The required probability is:
$$P(X = 30) = \frac{24^{30}}{30!} e^{-24} = 0.0363$$
Alternatively, we could use the cumulative Poisson probabilities given on page 184 of the Tables:
$$P(X = 30) = P(X \le 30) - P(X \le 29) = 0.90415 - 0.86788 = 0.0363$$

2 Important continuous distributions

2.1 Uniform distribution
$X$ takes values between two specified numbers $\alpha$ and $\beta$ say.
Probability density function: $f_X(x) = \frac{1}{\beta - \alpha}$, $\alpha < x < \beta$
$X \sim U(\alpha, \beta)$ is often written as shorthand for 'the random variable $X$ has a continuous uniform distribution over the interval $(\alpha, \beta)$'.
Moments: $\mu = \frac{\alpha + \beta}{2}$ (by symmetry, the mid-point of the range of possible values), $\sigma^2 = \frac{(\beta - \alpha)^2}{12}$

Question
Prove the variance result, by considering $E[(X - \mu)^2]$ directly.
Solution
The variance is:
$$\text{var}[X] = E[(X - \mu)^2] = \int_\alpha^\beta (x - \mu)^2 f(x)\,dx = \int_\alpha^\beta \frac{\left(x - \frac{1}{2}(\alpha + \beta)\right)^2}{\beta - \alpha}\,dx$$
$$= \left[\frac{\left(x - \frac{1}{2}(\alpha + \beta)\right)^3}{3(\beta - \alpha)}\right]_\alpha^\beta = \frac{\left(\frac{1}{2}(\beta - \alpha)\right)^3}{3(\beta - \alpha)} - \frac{\left(-\frac{1}{2}(\beta - \alpha)\right)^3}{3(\beta - \alpha)}$$
$$= \frac{1}{24}(\beta - \alpha)^2 + \frac{1}{24}(\beta - \alpha)^2 = \frac{1}{12}(\beta - \alpha)^2$$
In this model, the total probability of 1 is spread evenly between the two limits, so that subintervals of the same length have the same probability.

Question
If $Y \sim U(50, 150)$, calculate $P(Y > 74)$ and $P(50 < Y < 126)$.
Solution
The PDF is given by $f(y) = \frac{1}{150 - 50} = \frac{1}{100}$ for $50 < y < 150$. This gives:
$$P(Y > 74) = \frac{76}{100} = 0.76$$
Similarly, $P(50 < Y < 126) = 0.76$. This probability is the same since the two subintervals have the same length.
The R code for simulating 100 values from a $U(0, 3)$ distribution is:
    runif(100, min=0, max=3)
The PDF is obtained by dunif(x, min=0, max=3) and is useful for graphing.
To calculate probabilities for a continuous distribution we use the CDF, which is obtained by punif. For example, to calculate $P(X \le 1.8) = 0.6$ for $U(0, 3)$ use the R code:
    punif(1.8, min=0, max=3)
Similarly, the quantiles can be calculated with qunif.
Although there are not many real-life examples of the continuous uniform distribution, it is nevertheless an important distribution. A sample of random numbers from $U(0, 1)$ is often used to generate random samples from other distributions. We will do this in Section 4.

2.2 Gamma (including exponential and chi-square) distributions
The gamma family of distributions has 2 positive parameters and is a versatile family. The PDF can take different shapes depending on the values of the parameters. The range of the variable is $\{x : x > 0\}$.
The parameter $\alpha$ changes the shape of the graph of the PDF, and the parameter $\lambda$ changes the $x$ scale.
The gamma distribution may be written in shorthand as $Gamma(\alpha, \lambda)$, or $Ga(\alpha, \lambda)$.
First note that the gamma function $\Gamma(\alpha)$ is defined for $\alpha > 0$ as follows:
$$\Gamma(\alpha) = \int_0^\infty t^{\alpha - 1} e^{-t}\,dt$$
Note in particular that $\Gamma(1) = 1$, $\Gamma(\alpha) = (\alpha - 1)\Gamma(\alpha - 1)$ for $\alpha > 1$ (ie $\Gamma(\alpha) = (\alpha - 1)!$ if $\alpha$ is an integer), and $\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}$.
These results are given on page 5 of the Tables and are all that is required to answer examination questions. The R code for the gamma function $\Gamma(n)$ is gamma(n).
The PDF of the gamma distribution with parameters $\alpha$ and $\lambda$ is defined by:
$$f_X(x) = \frac{\lambda^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\lambda x} \quad \text{for } x > 0$$
Moments: $\mu = \frac{\alpha}{\lambda}$, $\sigma^2 = \frac{\alpha}{\lambda^2}$

Question
Prove the formulae given for the mean and variance.
Solution
Remembering that the formulae for mean and variance are $E(X) = \int_x x f(x)\,dx$, and $\text{var}(X) = \int_x x^2 f(x)\,dx - [E(X)]^2$, using appropriate limits, we have:
$$E(X) = \int_0^\infty \frac{\lambda^\alpha}{\Gamma(\alpha)} x^\alpha e^{-\lambda x}\,dx$$
Using integration by parts with $u = x^\alpha$ and $\frac{dv}{dx} = \frac{\lambda^\alpha}{\Gamma(\alpha)} e^{-\lambda x}$, we obtain:
$$E(X) = \left[-\frac{\lambda^{\alpha - 1}}{\Gamma(\alpha)} x^\alpha e^{-\lambda x}\right]_0^\infty + \int_0^\infty \frac{\alpha \lambda^{\alpha - 1}}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\lambda x}\,dx = 0 + \frac{\alpha}{\lambda}\int_0^\infty \frac{\lambda^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\lambda x}\,dx$$
The integral is the integral of the PDF over the whole range, which is 1, giving:
$$E(X) = \frac{\alpha}{\lambda}$$
For the variance, we need $E(X^2)$:
$$E(X^2) = \int_0^\infty \frac{\lambda^\alpha}{\Gamma(\alpha)} x^{\alpha + 1} e^{-\lambda x}\,dx$$
Using integration by parts with $u = x^{\alpha + 1}$ and $\frac{dv}{dx} = \frac{\lambda^\alpha}{\Gamma(\alpha)} e^{-\lambda x}$, we obtain:
$$E(X^2) = \left[-\frac{\lambda^{\alpha - 1}}{\Gamma(\alpha)} x^{\alpha + 1} e^{-\lambda x}\right]_0^\infty + \int_0^\infty \frac{(\alpha + 1)\lambda^{\alpha - 1}}{\Gamma(\alpha)} x^\alpha e^{-\lambda x}\,dx = 0 + \frac{\alpha + 1}{\lambda}\int_0^\infty \frac{\lambda^\alpha}{\Gamma(\alpha)} x^\alpha e^{-\lambda x}\,dx$$
The integral is $E(X)$, so we have $E(X^2) = \frac{\alpha + 1}{\lambda} \times \frac{\alpha}{\lambda}$, hence:
$$\text{var}(X) = \frac{\alpha(\alpha + 1)}{\lambda^2} - \left(\frac{\alpha}{\lambda}\right)^2 = \frac{\alpha}{\lambda^2}$$
We shall see in a later chapter that these results can be proved far more easily using moment generating functions.
We can calculate gamma probabilities in simple cases by integrating the PDF.

Question
If $X \sim Gamma(2, 1.5)$, calculate $P(X > 4)$.
Solution
Using integration by parts:
$$P(X > 4) = \int_4^\infty \frac{1.5^2}{\Gamma(2)} x e^{-1.5x}\,dx = 2.25\left\{\left[-\frac{1}{1.5} x e^{-1.5x}\right]_4^\infty + \frac{1}{1.5}\int_4^\infty e^{-1.5x}\,dx\right\}$$
$$= 2.25\left\{\frac{4}{1.5} e^{-6} + \frac{1}{1.5}\left[-\frac{1}{1.5} e^{-1.5x}\right]_4^\infty\right\} = 2.25\left\{\frac{4}{1.5} e^{-6} + \frac{1}{1.5^2} e^{-6}\right\} = 0.0174$$
We will see a quicker way to do this question later in the chapter.
The R code for simulating a random sample of 100 values from the gamma distribution with $\alpha = 2$ and $\lambda = 0.25$ is:
    rgamma(100, 2, 0.25)
Similarly, the PDF, cumulative distribution function (CDF) and quantiles can be obtained using the R functions dgamma, pgamma and qgamma.

Special case 1: exponential distribution
Gamma with $\alpha = 1$.
PDF: $f_X(x) = \lambda e^{-\lambda x}$, $x > 0$
$X \sim Exp(\lambda)$ is often written as shorthand for 'the random variable $X$ has an exponential distribution with parameter $\lambda$'.
Moments: $\mu = \frac{1}{\lambda}$, $\sigma^2 = \frac{1}{\lambda^2}$
$$F_X(x) = \int_0^x \lambda e^{-\lambda t}\,dt = 1 - e^{-\lambda x}$$
For many of the continuous distributions, the CDF is given in the Tables.

Question
Determine the median of the $Exp(\lambda)$ distribution. (The median is the value of $m$ such that $P(X \le m) = \frac{1}{2}$.)
Solution
Since $P(X \le m) = F_X(m) = 1 - e^{-\lambda m}$, we have:
$$0.5 = 1 - e^{-\lambda m} \;\Rightarrow\; e^{-\lambda m} = 0.5 \;\Rightarrow\; -\lambda m = \ln 0.5 \;\Rightarrow\; m = -\frac{1}{\lambda}\ln 0.5$$
Since $\ln 0.5 = -\ln 2$, we can say $m = \frac{\ln 2}{\lambda}$.
The exponential distribution is used as a simple model for the lifetimes of certain types of equipment.
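The gamma tail probability and the exponential median derived above can both be checked in R. In the sketch below, note that R names the gamma parameters shape ($\alpha$) and rate ($\lambda$); the value $\lambda = 3$ used for the median is a hypothetical choice made purely for illustration, as are the variable names.

```r
# P(X > 4) for X ~ Gamma(alpha = 2, lambda = 1.5)
p_tail = pgamma(4, shape = 2, rate = 1.5, lower.tail = FALSE)

# Median of Exp(lambda): compare R's quantile function with ln(2) / lambda
lambda = 3                               # hypothetical rate, for illustration
median_r       = qexp(0.5, rate = lambda)
median_formula = log(2) / lambda

print(c(p_tail, median_r, median_formula))
```

The first value should round to the 0.0174 found by integration by parts, and the last two should coincide.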
Very importantly, it also gives the distribution of the waiting time, $T$, from one event to the next in a Poisson process with rate $\lambda$. This is proved in Section 3 of this chapter.
The R code for simulating values and obtaining the PDF, CDF and quantiles from the exponential distribution is similar to the R code used for other continuous distributions using the R functions rexp, dexp, pexp and qexp.

Question
Claims to a general insurance company's 24-hour call centre occur according to a Poisson process with a rate of 3 per hour. Calculate the probability that the next call arrives after more than 1/2 hour.
Solution
The number of claims, $X$, in an hour can be modelled as a Poisson distribution with mean $\lambda = 3$. Hence, the waiting time, $T$, between claims can be modelled as an exponential distribution with $\lambda = 3$. So:
$$P(T > 1/2) = \int_{1/2}^\infty 3e^{-3x}\,dx = \left[-e^{-3x}\right]_{1/2}^\infty = 0 - \left(-e^{-1.5}\right) = 0.2231$$
In fact the time from any specified starting point (not necessarily the time at which the last event occurred) to the next event occurring has this exponential distribution. This property can also be expressed as the 'memoryless' property.
Recall that the geometric distribution in Section 1.4 has the memoryless property. For the exponential distribution we can also show that:
$$P(X > x + n \mid X > n) = P(X > x)$$
For example, the probability that we wait at least a further 10 minutes given that we have already waited 20 minutes is equal to the unconditional probability of waiting at least 10 minutes.

Question
Prove that if $X \sim Exp(\lambda)$, then $P(X > x + n \mid X > n) = P(X > x)$.
Solution
$$P(X > x + n \mid X > n) = \frac{P(X > x + n,\ X > n)}{P(X > n)} = \frac{P(X > x + n)}{P(X > n)} = \frac{e^{-\lambda(x + n)}}{e^{-\lambda n}} = e^{-\lambda x} = P(X > x)$$
Note: A gamma variable with parameters $\alpha = k$ (a positive integer) and $\lambda$
can be expressed as the sum of $k$ exponential variables, each with parameter $\lambda$. This gamma distribution is in fact the model for the time from any specified starting point to the occurrence of the $k$th event in a Poisson process with rate $\lambda$.

The fact that a $Gamma(\alpha, \lambda)$ random variable can be thought of as the sum of $\alpha$ independent and identical $Exp(\lambda)$ random variables is important and will be used in a later chapter to prove some important results.

Special case 2: chi-square ($\chi^2$) distribution

Gamma with $\alpha = \nu/2$, where $\nu$ is a positive integer, and $\lambda = 1/2$.

So the PDF of the chi-square distribution with $\nu$ degrees of freedom ($\chi^2_\nu$) is:

$$f_X(x) = \frac{(1/2)^{\nu/2}}{\Gamma(\nu/2)}\, x^{\nu/2 - 1} e^{-x/2} \quad \text{for } x > 0$$

Moments: $\mu = \nu$, $\sigma^2 = 2\nu$

Note: A $\chi^2_\nu$ variable with $\nu = 2$ is the same as an exponential variable with mean 2.

Since integrating the PDF isn't straightforward, extensive probability tables for the chi-square distribution are given in the Tables. These can be found on pages 164-166.

Another result that we will use in later work is:

If $W \sim Gamma(\alpha, \lambda)$, then $2\lambda W$ has a $\chi^2_{2\alpha}$ distribution (ie a chi-square distribution with $2\alpha$ degrees of freedom), provided that $2\alpha$ is an integer.

This result is also in the Tables (on page 12). We can prove this result using moment generating functions, which we will meet in a later chapter. This is an important result as it is the only practical way we can calculate probabilities for a gamma distribution in an exam. We can look up probabilities associated with the $\chi^2$ distribution, for certain degrees of freedom, in the Tables.

The R code for simulating values and obtaining the PDF, CDF and quantiles from the chi-square distribution is similar to the R code used for other continuous distributions, using the R functions rchisq, dchisq, pchisq and qchisq.

Question
If the random variable $X$ follows the $\chi^2_5$ distribution, calculate:
(a) $P(X > 6.5)$
(b) $P(X < 11.8)$.

Solution
Using the $\chi^2$ probabilities given on pages 164-166 of the Tables, we obtain:

(a) $1 - 0.7394 = 0.2606$

(b) Here we need to interpolate between the two closest probabilities given, $P(X < 11.5) = 0.9577$ and $P(X < 12) = 0.9652$, so:

$$P(X < 11.8) \approx 0.9577 + \frac{11.8 - 11.5}{12 - 11.5}\,(0.9652 - 0.9577) = 0.9622$$

Alternatively, we could use interpolation on the $\chi^2$ percentage points tables given on pages 168-169 of the Tables. These give the approximate answers of 0.2644 and 0.9604.

We now repeat an earlier question using the $\chi^2$ result.

Question
If $X \sim Gamma(2, 1.5)$, calculate $P(X > 4)$ by using the chi-square tables.

Solution
Since $X \sim Gamma(2, 1.5)$, we know that $3X \sim \chi^2_4$. So:

$$P(X > 4) = P(3X > 12) = P(\chi^2_4 > 12) = 1 - 0.9826 = 0.0174$$

using the $\chi^2$ probability tables given on page 165 of the Tables. This gives us the same answer as we obtained earlier, but without the integration by parts.

2.3 Beta distribution

This is another versatile family of distributions with two positive parameters. The range of the variable is $\{x : 0 < x < 1\}$.

First note that the beta function $B(\alpha, \beta)$ is defined by:

$$B(\alpha, \beta) = \int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\,dx$$

The relationship between beta functions and gamma functions is:

$$B(\alpha, \beta) = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)}$$

The R code for the beta function $B(a, b)$ is beta(a,b).

The PDF of the beta distribution is defined by:

$$f_X(x) = \frac{1}{B(\alpha, \beta)}\, x^{\alpha-1}(1-x)^{\beta-1} \quad \text{for } 0 < x < 1$$

Moments: $\mu = \dfrac{\alpha}{\alpha+\beta}$, $\sigma^2 = \dfrac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$

The (continuous) uniform distribution on $(0,1)$ is a special case (with $\alpha = \beta = 1$).

The beta distribution is a useful distribution because it can be rescaled and shifted to create a wide range of shapes, from straight lines to curves, and from symmetrical distributions to skewed distributions.
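The beta function identity and the moment formulas above can be checked numerically. The sketch below uses illustrative parameter values ($\alpha = 4$, $\beta = 3$, chosen here as an assumption for the example) and compares the closed-form mean and variance with values obtained by numerical integration of the PDF.

```r
a <- 4; b <- 3  # illustrative parameter values (assumed for this example)

# Beta function via R's beta() and via the gamma-function identity
lhs <- beta(a, b)
rhs <- gamma(a) * gamma(b) / gamma(a + b)

# Mean and variance from the closed-form expressions
mean_formula <- a / (a + b)
var_formula  <- a * b / ((a + b)^2 * (a + b + 1))

# The same moments by numerically integrating against the beta PDF
mean_numeric <- integrate(function(x) x * dbeta(x, a, b), 0, 1)$value
ex2_numeric  <- integrate(function(x) x^2 * dbeta(x, a, b), 0, 1)$value
var_numeric  <- ex2_numeric - mean_numeric^2

print(c(lhs, rhs))
print(c(mean_formula, mean_numeric))
print(c(var_formula, var_numeric))
```

Each printed pair should agree to within numerical integration error.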
Since the random variable can only take values between 0 and 1, it is often used to model proportions, such as the proportion of a batch that is defective or the percentage of claims that are over 1,000.

Question
The random variable $X$ has PDF $f_X(x) = kx^3(1-x)^2$, $0 < x < 1$, where $k$ is a constant. Determine the value of $k$.

Solution
Comparing the PDF directly with that of the beta distribution, we can see that $\alpha = 4$ and $\beta = 3$. So:

$$k = \frac{\Gamma(7)}{\Gamma(4)\,\Gamma(3)} = 60$$

$k$ can also be found directly from $\displaystyle\int_0^1 kx^3(1-x)^2\,dx = 1$ by multiplying out the bracket first and then integrating.

The R code for simulating values and obtaining the PDF, CDF and quantiles from the beta distribution is similar to the R code used for other continuous distributions, using the R functions rbeta, dbeta, pbeta and qbeta.

2.4 Normal distribution

This distribution, with its symmetrical "bell-shaped" density curve, is of fundamental importance in both statistical theory and practice. Its roles include the following:

(i) it is a good model for the distribution of measurements that occur in practice in a wide variety of different situations, for example heights, weights, IQ scores or exam scores.

(ii) it provides good approximations to various other distributions; in particular it is a limiting form of the binomial $(n, p)$. It is also used to approximate the Poisson distribution. Both of these approximations are covered in Chapter 6.

(iii) it provides a model for the sampling distributions of various statistics; see Chapter 7.

(iv) much of large sample statistical inference is based on it, and some procedures require an assumption that a variable is normally distributed. We will look at this in Chapters 9 and 10.

(v) it is a building block for many other distributions.

The distribution has two parameters, which can conveniently be expressed directly as the mean $\mu$ and the standard deviation $\sigma$ of the distribution. The distribution is symmetrical about $\mu$.

The notation used for the normal distribution is $X \sim N(\mu, \sigma^2)$.

The PDF of the normal distribution is defined by:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right\} \quad \text{for } -\infty < x < \infty$$

A linear function of a normal variable is also a normal variable, ie if $X$ is normally distributed, so is $Y = aX + b$. This result can be proved using moment generating functions, which we will meet in the next chapter.

It is not possible to find an explicit expression for $F_X(x) = P(X \le x)$, so tables have to be used. These are provided for the distribution of $Z = \dfrac{X - \mu}{\sigma}$, which is the standard normal variable: it has mean 0 and standard deviation 1. The distribution is symmetrical about 0. We can also prove this result using moment generating functions.

The $x$-values $\mu$, $\mu + \sigma$, $\mu + 2\sigma$, $\mu + 3\sigma$ correspond to the $z$-values 0, 1, 2, 3 respectively, and so on. The $z$-value measures how many standard deviations the corresponding $x$-value is above or below the mean. For example, the value $x = 30$ from a normal distribution with mean 20 and standard deviation 5 has $z$-value $+2$ (30 is 2 standard deviations above the mean of 20).

The calculation of a probability for a normal variable is always done in the same way: transform to the standard normal via $z = \dfrac{x - \mu}{\sigma}$ and look up in the tables.

Standard normal probabilities are given on pages 160-161 of the Tables. The probabilities in the table are left-hand probabilities; in other words they give $P(Z < z)$, the cumulative distribution function of $Z$. We sometimes use $\Phi(z)$ for the CDF of $Z$.

Since $Z$ is symmetrical about zero, it follows that:

$$P(Z < -z) = P(Z > z) = 1 - P(Z < z)$$
$$P(Z > -z) = P(Z < z)$$

Question
If $X \sim N(25, 36)$, calculate:
(i) $P(X < 28)$
(ii) $P(X > 30)$
(iii) $P(X < 20)$
(iv) $P(|X - 25| < 4)$.

Solution
(i) $P(X < 28) = P\left(Z < \dfrac{28 - 25}{\sqrt{36}}\right) = P(Z < 0.5) = 0.69146$

The following answers use interpolation between tabulated values:

(ii) $P(X > 30) = P(Z > 0.833) = 1 - P(Z < 0.833) = 1 - 0.7976 = 0.2024$

(iii) $P(X < 20) = P(Z < -0.833) = 1 - P(Z < 0.833) = 1 - 0.7976 = 0.2024$

(iv) We need to simplify the expression involving the absolute value:

$$P(|X - 25| < 4) = P(-4 < X - 25 < 4) = P(21 < X < 29) = P(X < 29) - P(X < 21)$$
$$= P(Z < 0.667) - P(Z < -0.667) = P(Z < 0.667) - [1 - P(Z < 0.667)] = 0.4952$$

The normal distribution is used in many areas of statistics and often we need to find values of the standard normal distribution connected to certain probabilities, for example the value of $a$ such that $P(-a < Z < a) = 0.99$. Common examples of this type of calculation are now given.

95% and 99% intervals:

$P(Z < 1.96) = 0.97500$, so $P(0 < Z < 1.96) = 0.97500 - 0.5 = 0.47500$

$\Rightarrow\; P(-1.96 < Z < 1.96) = 2 \times 0.47500 = 0.95$

Similarly, $P(-2.5758 < Z < 2.5758) = 0.99$.

So (approximately): 95% of a normal distribution is contained in the interval "1.96 standard deviations on either side of the mean", and 99% is contained in the interval "2.5758 standard deviations on either side of the mean".

Note: All but 0.3% of the distribution is contained in the interval $(\mu - 3\sigma, \mu + 3\sigma)$, the so-called "$3\sigma$ limits". (The range of a large set of observations from a normal distribution is usually about 6 or 7 standard deviations.)

Finally, we note that, if $X$ has the standard normal distribution, then $X^2$ has the chi-squared distribution (the special case of the gamma distribution given above). In fact $X^2 \sim \chi^2_1$ here. This result can be used to find $E(Z^2)$ and $\text{var}(Z^2)$.

The R code for simulating values and obtaining the PDF, CDF and quantiles from the normal distribution is similar to the R code used for other continuous distributions, using the R functions rnorm, dnorm, pnorm and qnorm.

2.5 Lognormal distribution

If $X$ represents, for example, claim size and $Y = \log X$ has a normal distribution, then $X$ is said to have a lognormal distribution.

$\log X$ here refers to natural log, or log to base $e$, ie $\ln X$.

If $X$ has a lognormal distribution with parameters $\mu$ and $\sigma$, then we write $X \sim \log N(\mu, \sigma^2)$.

The PDF of the lognormal distribution is defined by:

$$f(x) = \frac{1}{x\sigma\sqrt{2\pi}}\exp\left\{-\frac{1}{2}\left(\frac{\log x - \mu}{\sigma}\right)^2\right\} \quad \text{for } 0 < x < \infty$$

The lower limit for $x$ is 0 and not $-\infty$, as it is for the normal distribution. This is because $\log x$ is not defined for $x \le 0$.

Question
If $W \sim \log N(5, 6)$, calculate $P(W > 3{,}000)$.

Solution
If $W \sim \log N(5, 6)$, then $\ln W \sim N(5, 6)$. This gives:

$$P(W > 3{,}000) = P(\ln W > 8.006) = P(Z > 1.227) = 1 - 0.8901 = 0.1099$$

The mean and variance of the lognormal distribution are not $\mu$ and $\sigma^2$ but are given by:

$$E[X] = e^{\mu + \frac{1}{2}\sigma^2} \quad \text{and} \quad \text{var}[X] = e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right)$$

Question
If the mean of the lognormal distribution is 9.97 and the variance is 635.61, calculate the parameters $\mu$ and $\sigma^2$.

Solution
$e^{\mu + \frac{1}{2}\sigma^2} = 9.97$ and $e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right) = 635.61$, so $9.97^2\left(e^{\sigma^2} - 1\right) = 635.61$.

This can be rearranged to give $\sigma^2 = 2$.

Substituting into the equation for the mean, we get $e^{\mu + 1} = 9.97$. Taking logs gives us $\mu = 1.3$.

The lognormal distribution is positively skewed and is therefore a good model for the distribution of claim sizes. We also use the lognormal distribution in Subject CM2 to calculate the probabilities associated with accumulating funds.

The R code for simulating values and obtaining the PDF, CDF and quantiles from the lognormal distribution is similar to the R code used for other continuous distributions, using the R functions rlnorm, dlnorm, plnorm and qlnorm.

2.6 t distribution

If the variable $X$ has a $\chi^2_\nu$ distribution and another independent variable $Z$ has the standard normal distribution of the form $N(0,1)$, then the function:

$$\frac{Z}{\sqrt{X/\nu}}$$

is said to have a t distribution with parameter "degrees of freedom" $\nu$.

The t distribution, like the normal, is symmetrical about 0.

You do not need to know the PDF of the t distribution for the exam. It is in fact given on page 163 of the Tables. Calculating probabilities by integrating this PDF is not easy. Fortunately, we will only be expected to look up probabilities using page 163 of the Tables.

Question
Use the t tables to calculate:
(i) $P(t_{15} < 1.341)$
(ii) the value of $a$ such that $P(t_8 > a) = 0.01$
(iii) $P(t_{24} < -0.5314)$.

Solution
From the Tables:
(i) $P(t_{15} > 1.341) = 10\%$, so $P(t_{15} < 1.341) = 90\%$.
(ii) $a = 2.896$
(iii) By symmetry: $P(t_{24} < -0.5314) = P(t_{24} > 0.5314) = 30\%$

The t distribution is used to find confidence intervals and carry out hypothesis tests on the mean of a distribution. We will meet it again in Chapters 7, 9 and 10.

The R code for simulating values and obtaining the PDF, CDF and quantiles from the t distribution is similar to the R code used for other continuous distributions, using the R functions rt, dt, pt and qt.

2.7 F distribution

If two independent random variables, $X$ and $Y$, have $\chi^2$ distributions with parameters $n_1$ and $n_2$ respectively, then the function:

$$\frac{X/n_1}{Y/n_2}$$

is said to have an F distribution with parameters (degrees of freedom) $n_1$ and $n_2$.

Once again, it is not necessary to know the PDF of this distribution. We find probabilities by using the F tables given on pages 170-174 of the Tables.

The F distribution is not symmetrical. Given that only upper tail probabilities, $P(F_{a,b} > c)$, are given in the Tables, we will need to know the fact that:

$$P(F_{a,b} < c) = P\left(\frac{1}{F_{a,b}} > \frac{1}{c}\right) = P\left(F_{b,a} > \frac{1}{c}\right)$$

to find lower tail probabilities. This will be covered in greater detail in Chapter 7.
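The t-table look-ups above, and the reciprocal relationship used for lower-tail F probabilities, can be reproduced with R's pt, qt and pf functions. This is a minimal sketch; the F example values (3 and 8 degrees of freedom, point 0.25) are arbitrary assumptions chosen only to illustrate the identity.

```r
# t distribution look-ups
p1 <- pt(1.341, df = 15)                      # P(t_15 < 1.341)
a  <- qt(0.01, df = 8, lower.tail = FALSE)    # a such that P(t_8 > a) = 0.01
p3 <- pt(-0.5314, df = 24)                    # P(t_24 < -0.5314)

# F distribution: a lower tail probability computed two ways
c0 <- 0.25; d1 <- 3; d2 <- 8                  # illustrative values (assumed)
lower_direct <- pf(c0, df1 = d1, df2 = d2)                           # P(F_{d1,d2} < c0)
lower_recip  <- pf(1 / c0, df1 = d2, df2 = d1, lower.tail = FALSE)   # P(F_{d2,d1} > 1/c0)

print(round(c(p1, a, p3), 4))
print(c(lower_direct, lower_recip))
```

Note that the degrees of freedom swap places when the reciprocal is taken, which is why the second call passes d2 before d1.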
Question
Use the F tables to calculate:
(i) $P(F_{5,12} < 3.106)$
(ii) the value of $a$ such that $P(F_{7,4} > a) = 0.01$.

Solution
From the Tables:
(i) $P(F_{5,12} > 3.106) = 5\%$, so $P(F_{5,12} < 3.106) = 95\%$
(ii) $a = 14.98$.

This distribution is used to find confidence intervals and carry out hypothesis tests on the variances of two distributions. We will meet it again in Chapters 7, 9, 10 and 12.

The R code for simulating values and obtaining the PDF, CDF and quantiles from the F distribution is similar to the R code used for other continuous distributions, using the R functions rf, df, pf and qf.

3 The Poisson process

Earlier, in Section 1.7, we met the Poisson distribution, $X \sim Poi(\lambda)$, with probability function (PF):

$$P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots$$

This is useful for modelling the number of events (eg claims or deaths) occurring per unit time. For $X \sim Poi(\lambda)$, we have events occurring at a rate of $\lambda$ per unit time.

A Poisson process occurs when we let the time period vary. So instead of looking at the number of events occurring per unit time, we now look at the number of events occurring up to time $t$.

Question
An insurer receives car claims at a rate of 8 per calendar week. Write down the distribution of the number of claims received:
(i) per day
(ii) per year.

Solution
The number of car claims per week has a $Poi(8)$ distribution, therefore the number of:
(i) car claims per day has a $Poi\left(\tfrac{8}{7}\right)$ distribution
(ii) car claims per year has a $Poi(416)$ distribution (using 52 weeks in a year).

From the previous question it should be clear to see that if we have $X \sim Poi(\lambda)$ modelling the number of claims per unit time, then $X(t) \sim Poi(\lambda t)$ will model the number of claims up to time $t$:

$$P(X(t) = x) = \frac{(\lambda t)^x e^{-\lambda t}}{x!}, \quad x = 0, 1, 2, \ldots$$

Question
The number of deaths amongst retired members of a pension scheme occurs at a rate of 3 per calendar month.
Calculate the probability of:
(i) 5 deaths in January to March inclusive
(ii) 12 deaths in June to October inclusive.

Solution
Let $X(t)$ be the number of deaths in a $t$-month period. The number of deaths per calendar month follows the $Poi(3)$ distribution, therefore:

(i) the number of deaths in January to March inclusive follows the $Poi(9)$ distribution:

$$P(X(3) = 5) = \frac{9^5}{5!}e^{-9} = 0.0607$$

(ii) the number of deaths in June to October inclusive follows the $Poi(15)$ distribution:

$$P(X(5) = 12) = \frac{15^{12}}{12!}e^{-15} = 0.0829$$

3.1 Deriving Poisson process formulae

In Section 1.7, we stated that the Poisson distribution could be used to model events occurring randomly one after another in time at a constant rate and that the numbers of events that occur in separate (ie non-overlapping) time intervals are independent of one another. However, we derived the distribution from the binomial distribution.

In this section, we shall look at the Poisson process, $N(t)$, by considering events occurring in a small interval of time. To start with, we shall define mathematically the properties of a counting process and a Poisson process.

The Poisson process is an example of a counting process. Here the number of events occurring is of interest. Since the number of events is being counted over time, the event number process $\{N(t)\}_{t \ge 0}$ must satisfy the following conditions.

(i) $N(0) = 0$, ie no events have occurred at time 0.

(ii) For any $t > 0$, $N(t)$ must be integer valued.
ie we can't have 2.3 claims.

(iii) When $s < t$, $N(s) \le N(t)$, ie the number of events over time is non-decreasing.
ie if we have counted, say, 5 deaths in 2 months, then the number of deaths counted in a 3-month period which includes the 2-month period must be at least 5.

(iv) When $s < t$, $N(t) - N(s)$ represents the number of events occurring in the time interval $(s, t]$.
ie we have counted $N(t)$ events up to time $t$ and $N(s)$ events up to time $s$, so there are $N(t) - N(s)$ events counted between time $s$ and time $t$.

These are the mathematical properties of any counting process. We will now define the mathematical properties for a Poisson process.

The event number process $\{N(t)\}_{t \ge 0}$ is defined to be a Poisson process with parameter $\lambda$ if the following three conditions are satisfied:

(i) $N(0) = 0$, and $N(s) \le N(t)$ when $s < t$.
These are just properties (i) and (iii) from above for any counting process.

(ii)
$$P(N(t+h) = r \mid N(t) = r) = 1 - \lambda h + o(h)$$
$$P(N(t+h) = r + 1 \mid N(t) = r) = \lambda h + o(h) \qquad (2.1)$$
$$P(N(t+h) > r + 1 \mid N(t) = r) = o(h)$$

(Note that a function $f(h)$ is described as $o(h)$ if $\lim_{h \to 0} \dfrac{f(h)}{h} = 0$.)

(iii) When $s < t$, the number of events in the time interval $(s, t]$ is independent of the number of events up to time $s$.
In other words, the numbers of events that occur in separate (ie non-overlapping) time intervals are independent of one another.

Condition (ii) states that in a very short time interval of length $h$, the only possible numbers of events are zero or one. Condition (ii) also implies that the number of events in a time interval of length $h$ does not depend on when that time interval starts.

Question
Explain how motor insurance claims could be represented by a Poisson process.

Solution
The events in this case are occurrences of claim events (ie accidents, fires, thefts, etc) reported to the insurer. The parameter $\lambda$ represents the average rate of occurrence of claims (eg 50 per day), which we are assuming remains constant throughout the year and at different times of day. The assumption that, in a sufficiently short time interval, there can be at most one claim is satisfied if we assume that claim events cannot lead to multiple claims (ie no motorway pile-ups, etc).
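Condition (ii) can be illustrated by simulation. If the interval $(0, t]$ is split into many short intervals of length $h$, each containing one event with probability $\lambda h$ and otherwise none, the total count should behave approximately like a $Poi(\lambda t)$ variable. The sketch below is an approximation that ignores the $o(h)$ terms; the values $\lambda = 3$, $t = 1$, $h = 0.0001$, the seed and the number of simulations are all assumptions chosen for illustration.

```r
set.seed(1)
lambda <- 3       # assumed event rate per unit time
t_end  <- 1       # assumed length of the observation period
h      <- 1e-4    # length of each short interval
n_steps <- round(t_end / h)
n_sims  <- 10000

# Each short interval holds one event with probability lambda * h, else none;
# summing the independent indicators over all intervals gives a binomial count
counts <- rbinom(n_sims, size = n_steps, prob = lambda * h)

sim_mean <- mean(counts)          # compare with lambda * t_end
sim_var  <- var(counts)           # compare with lambda * t_end
sim_p0   <- mean(counts == 0)     # compare with exp(-lambda * t_end)

print(c(sim_mean, sim_var, lambda * t_end))
print(c(sim_p0, exp(-lambda * t_end)))
```

The simulated mean and variance should both be close to $\lambda t$, as expected for a Poisson count.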
The reason why a process satisfying conditions (i) to (iii) is called a Poisson process is that for a fixed value of $t$, the random variable $N(t)$ has a Poisson distribution with parameter $\lambda t$. This is proved as follows.

First we need a little shorthand. Let $p_n(t) = P(N(t) = n)$.

So, if $N(t)$ satisfies conditions (i) to (iii) given above, then $N(t) \sim Poi(\lambda t)$ with probability function:

$$p_n(t) = \exp\{-\lambda t\}\frac{(\lambda t)^n}{n!} \qquad (2.2)$$

This will be proved by deriving "differential-difference" equations from the conditions and then showing that (2.2) is their solution.

Recall that for a partition $B_1, \ldots, B_k$, the probability of any event $A$ is:

$$P(A) = P(A \mid B_1)P(B_1) + \cdots + P(A \mid B_k)P(B_k)$$

For a fixed value of $t > 0$ and a small positive value of $h$, condition on the number of events at time $t$ and write:

$$P(n \text{ by time } t+h) = P(n \text{ by time } t+h \mid n \text{ by time } t)\,P(n \text{ by time } t) + P(n \text{ by time } t+h \mid n-1 \text{ by time } t)\,P(n-1 \text{ by time } t) + \cdots$$

Hence using (2.1) and the $p_n(t)$ notation, we obtain:

$$p_n(t+h) = p_n(t)[1 - \lambda h + o(h)] + p_{n-1}(t)[\lambda h + o(h)] + o(h) = [1 - \lambda h]\,p_n(t) + \lambda h\,p_{n-1}(t) + o(h)$$

Thus:

$$p_n(t+h) - p_n(t) = \lambda h\,[p_{n-1}(t) - p_n(t)] + o(h) \qquad (2.3)$$

and this identity holds for $n = 1, 2, 3, \ldots$.

Now recall the formal definition of differentiation:

$$\frac{d}{dt}f(t) = \lim_{h \to 0}\frac{f(t+h) - f(t)}{h}$$

Now divide (2.3) by $h$, and let $h$ go to zero from above to get the differential-difference equation:

$$\lim_{h \to 0^+}\frac{p_n(t+h) - p_n(t)}{h} = \lim_{h \to 0^+}\frac{\lambda h\,[p_{n-1}(t) - p_n(t)] + o(h)}{h}$$

ie:

$$\frac{d}{dt}p_n(t) = \lambda\,[p_{n-1}(t) - p_n(t)] \qquad (2.4)$$

By definition $\lim_{h \to 0^+}\dfrac{o(h)}{h} = 0$.

We will now consider the special case when $n = 0$. There is only one possibility in this case:

$$P(0 \text{ by time } t+h) = P(0 \text{ by time } t+h \mid 0 \text{ by time } t)\,P(0 \text{ by time } t)$$

ie:

$$p_0(t+h) = p_0(t)[1 - \lambda h + o(h)]$$

So:

$$p_0(t+h) - p_0(t) = -\lambda h\,p_0(t) + o(h)$$
and therefore:

$$\lim_{h \to 0^+}\frac{p_0(t+h) - p_0(t)}{h} = \lim_{h \to 0^+}\frac{-\lambda h\,p_0(t) + o(h)}{h}$$

When $n = 0$, an identical analysis yields:

$$\frac{d}{dt}p_0(t) = -\lambda\,p_0(t) \qquad (2.5)$$

with initial condition $p_0(0) = 1$.

It is now straightforward to verify that the suggested solution (2.2) satisfies both the differential equations (2.4) and (2.5) as well as the initial conditions.

Question
Show that $p_n(t) = e^{-\lambda t}\dfrac{(\lambda t)^n}{n!}$ satisfies the differential equations:

$$\frac{d}{dt}p_n(t) = \lambda\,[p_{n-1}(t) - p_n(t)]$$

and:

$$\frac{d}{dt}p_0(t) = -\lambda\,p_0(t)$$

Solution
We have $p_n(t) = e^{-\lambda t}\dfrac{(\lambda t)^n}{n!}$. Calculating the derivative using the product rule gives:

$$\frac{d}{dt}p_n(t) = \frac{d}{dt}\left[e^{-\lambda t}\right]\frac{(\lambda t)^n}{n!} + e^{-\lambda t}\frac{d}{dt}\left[\frac{(\lambda t)^n}{n!}\right] = -\lambda e^{-\lambda t}\frac{(\lambda t)^n}{n!} + e^{-\lambda t}\frac{\lambda n(\lambda t)^{n-1}}{n!}$$

$$= \lambda\left[e^{-\lambda t}\frac{(\lambda t)^{n-1}}{(n-1)!} - e^{-\lambda t}\frac{(\lambda t)^n}{n!}\right] = \lambda\,[p_{n-1}(t) - p_n(t)]$$

Similarly, $p_0(t) = e^{-\lambda t}$, which gives a derivative of:

$$\frac{d}{dt}p_0(t) = \frac{d}{dt}e^{-\lambda t} = -\lambda e^{-\lambda t} = -\lambda\,p_0(t)$$

3.2 Waiting times between events in a Poisson process

This study of the Poisson process concludes by considering the distribution of the time to the first event, $T_1$, and the times between events, $T_2, T_3, \ldots$. These inter-event times are often called the waiting times or holding times.

In Section 2.2, we said that the waiting time between consecutive events in a Poisson process has an $Exp(\lambda)$ distribution.

$P(T_1 > t)$ is the probability that no events occur between time 0 and time $t$. Hence:

$$P(T_1 > t) = P(N(t) = 0) = \exp\{-\lambda t\}$$

So the distribution function of $T_1$ is:

$$F(t) = P(T_1 \le t) = 1 - \exp\{-\lambda t\}$$

so that $T_1$ has an exponential distribution with parameter $\lambda$.

Now we consider the distribution of the time between the first and the second event. Consider the conditional distribution of $T_2$ given the value of $T_1$:

$$P(T_2 > t \mid T_1 = r) = P(T_1 + T_2 > t + r \mid T_1 = r) = P(N(t+r) = 1 \mid N(r) = 1) = P(N(t+r) - N(r) = 0 \mid N(r) = 1)$$

Because the number of events in the time interval is independent of the number of events up to the start of that time interval (condition (iii) above):

$$P(N(t+r) - N(r) = 0 \mid N(r) = 1) = P(N(t+r) - N(r) = 0)$$

Since the number of events in a time interval of length $t$ does not depend on when that time interval starts (condition (ii) above, equations (2.1)), we have:

$$P(N(t+r) - N(r) = 0) = P(N(t) = 0) = \exp\{-\lambda t\}$$

Hence, $T_2$ has an exponential distribution with parameter $\lambda$ and $T_2$ is independent of $T_1$.

This calculation can be repeated for $T_3, T_4, \ldots$.

We have now shown that each waiting time has an $Exp(\lambda)$ distribution.

The inter-event time is independent of the absolute time. In other words, the time until the next event has the same distribution, irrespective of the time since the last event or the number of events that have already occurred.

Question
If reported claims follow a Poisson process with rate 5 per day (and the insurer has a 24-hour hotline), calculate the probability that:
(i) there will be fewer than 2 claims reported on a given day
(ii) the time until the next reported claim is less than an hour.

Solution
(i) The number of claims per day, $X$, follows the $Poi(5)$ distribution, so:

$$P(X < 2) = P(X = 0) + P(X = 1) = e^{-5} + 5e^{-5} = 0.0404$$

Alternatively, we could read the value of $P(X \le 1)$ from the cumulative Poisson tables listed on page 176 of the Tables.

(ii) The number of claims per hour, $Y$, follows the $Poi\left(\tfrac{5}{24}\right)$ distribution, so the waiting time (in hours), $T$, follows the $Exp\left(\tfrac{5}{24}\right)$ distribution. Hence:

$$P(T < 1) = \int_0^1 \frac{5}{24}e^{-\frac{5}{24}t}\,dt = \left[-e^{-\frac{5}{24}t}\right]_0^1 = 1 - e^{-\frac{5}{24}} = 0.188$$

Alternatively, we could use the cumulative distribution function for the exponential distribution given on page 11 of the Tables.
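The two answers in the last solution, and the link between exponential waiting times and Poisson counts, can be checked in R. This is a sketch only: the seed, the number of simulations and the cap of 40 gaps per simulated day are arbitrary assumptions (40 gaps is far more than would plausibly be needed at a rate of 5 per day).

```r
rate_day <- 5   # claims per day, as in the question above

# Direct calculations
p_fewer_than_2 <- ppois(1, lambda = rate_day)     # P(X <= 1) = P(X < 2)
p_within_hour  <- pexp(1, rate = rate_day / 24)   # P(T < 1), with T in hours

# Simulation: build one day's event times from Exp(5) waiting times (in days)
# and count how many events fall within the day
set.seed(42)
n_sims <- 20000
claims_in_day <- replicate(n_sims, {
  gaps <- rexp(40, rate = rate_day)
  sum(cumsum(gaps) <= 1)
})
sim_p <- mean(claims_in_day < 2)

print(round(c(p_fewer_than_2, p_within_hour), 4))
print(c(sim_p, p_fewer_than_2))
```

The simulated proportion of days with fewer than 2 claims should be close to the Poisson probability, which is what the waiting-time result implies.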
4 Monte Carlo simulation

With the advent of high-speed personal computers, Monte Carlo simulations have become one of the most valuable tools of the actuarial profession. This is because the vast majority of the practically important problems are not amenable to analytical solution.

We have already seen that we can simulate samples from distributions listed in Sections 2 and 3 using the R functions rbinom, rgeom, rnbinom, rhyper, rpois, runif, rgamma, rexp, rchisq, rbeta, rnorm, rlnorm, rt and rf.

Below we outline one basic simulation technique that can be used to simulate values from most of these distributions.

This is known as the inverse transform method. It can be applied to both continuous and discrete distributions.

4.1 Inverse transform method for continuous distributions

The method works by first generating a random number from a uniform distribution on the interval $(0,1)$. We then use the cumulative distribution function of the distribution we are trying to simulate to obtain a random value from that distribution.

First we generate a random number, $U$, from the $U(0,1)$ distribution. We can use this to simulate a random variate $X$ with PDF $f(x)$ by using the CDF, $F(x)$.

Let $U$ be the probability that $X$ takes on a value less than or equal to $x$, ie:

$$U = P(X \le x) = F(x)$$

Then $x$ can be derived as:

$$x = F^{-1}(u)$$

Hence, the following two-step algorithm is used to generate a random variate $x$ from a continuous distribution with CDF $F(x)$:

1. generate a random number $u$ from $U(0,1)$,
2. return $x = F^{-1}(u)$.

We can represent this on a diagram as follows. We have a random value, $u$, between 0 and 1. Recall that the cumulative distribution, $F(x)$, increases from 0 to 1 as $x$ increases:

[Figure: graph of $F(x)$ against $x$, with the value $u$ on the vertical axis mapped to $x = F^{-1}(u)$ on the horizontal axis.]

If we set $u = F(x)$ we can obtain a random value, $x$, by inverting the cumulative distribution, $x = F^{-1}(u)$. Hence this method is called the "inverse transform" method.

This method requires that our distribution has a cumulative distribution function $F(x)$ for which we can write down an algebraic formula. This rules out the gamma, normal, lognormal and beta distributions.

Formally, we can prove that the random variable $X = F^{-1}(U)$ has the CDF $F(x)$, as follows:

$$P(X \le x) = P[F^{-1}(U) \le x] = P[U \le F(x)] = F(x)$$

Example
Generate a random variate from the exponential distribution with parameter $\lambda$.

The distribution function of $X$ is given by:

$$F_X(x) = 1 - e^{-\lambda x}$$

Hence:

$$x = F^{-1}(u) = -\frac{\log(1-u)}{\lambda}$$

Thus, to generate a random variate $x$ from an exponential distribution we can use the following algorithm:

1. generate a random variate $u$ from $U(0,1)$
2. return $x = -\log(1-u)/\lambda$.

The main disadvantage of the inverse transform method is the necessity to have an explicit expression for the inverse of the distribution function $F(x)$. For instance, to generate a random variate from the standard normal distribution using the inverse transform method we need the inverse of the distribution function:

$$F(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^x e^{-t^2/2}\,dt$$

However, no explicit solution to the equation $u = F(x)$ can be found in this case.

However, it is possible to generate simulated values from a standard normal distribution. The procedure is as follows.

1. Generate a random number $u$ from the $U(0,1)$ distribution.
2. If $u > 0.5$, use the Tables directly to find $z$ such that $P(Z \le z) = u$. In this case our simulated value from $N(0,1)$ is $z$.
3. If $u < 0.5$, use the Tables to find $z$ such that $P(Z \le z) = 1 - u$. In this case our simulated value from $N(0,1)$ is $-z$.

We can generalise this method to generate a value from any normal distribution $X \sim N(\mu, \sigma^2)$ by using the transformation $x = \mu + \sigma z$.

Question
Simulate three values from the $Exp(0.1)$ distribution using the values 0.113, 0.608 and 0.003 from $U(0,1)$.
Solution
Using the inverse transform method, we have:

$$x = -\frac{1}{0.1}\ln(1 - u) = -10\ln(1 - u)$$

This gives:

$x = -10\ln(1 - 0.113) = 1.20$
$x = -10\ln(1 - 0.608) = 9.36$
$x = -10\ln(1 - 0.003) = 0.03$

We can also generate random samples from other distributions, for example the uniform distribution.

Question
Generate three random values from the $U(-1, 4)$ distribution using the following random values from $U(0,1)$:

0.07   0.628   0.461

Solution
The distribution function for the $U(-1, 4)$ distribution is:

$$F(x) = \frac{x + 1}{5}$$

We now set our random value, $u$, equal to this and rearrange:

$$u = \frac{x + 1}{5} \;\Rightarrow\; x = 5u - 1$$

Substituting, we obtain:

$x = 5 \times 0.07 - 1 = -0.65$
$x = 5 \times 0.628 - 1 = 2.14$
$x = 5 \times 0.461 - 1 = 1.305$

We can also see intuitively that if we start by generating a random number from $U(0,1)$, then if we multiply it by 5 it will become a random number from $U(0,5)$. If we then subtract 1, it will become a random number from $U(-1, 4)$.

Example
Generate a random variate $X$ from the double exponential distribution with density function:

$$f(x) = \tfrac{1}{2}\lambda e^{-\lambda|x|}, \quad -\infty < x < \infty$$

It is possible in this case to find the distribution function $F$ corresponding to $f$ and to use the inverse transform method, but an alternative method is presented here.

The density $f$ is symmetric about 0; we can therefore generate a variate $Y$ having the same distribution as $|X|$ and set $X = +Y$ or $X = -Y$ with equal probability.

The density of $|X|$ is:

$$f_{|X|}(y) = \lambda e^{-\lambda y}, \quad y > 0$$

which is easily recognised as the density of the exponential distribution. The following algorithm therefore generates a value for $X$:

1. generate $u_1$ and $u_2$ from $U(0,1)$,
2. set $y = -\ln(1 - u_2)/\lambda$; if $u_1 < 0.5$ return $x = -y$, otherwise return $x = y$.

4.2 Discrete distributions

We cannot algebraically invert the distribution function of a discrete random variable, as it is a step function.

The distribution function, $F(x)$, is the sum of the probabilities so far, eg:

$$F(5) = P(X \le 5) = P(X = 0) + P(X = 1) + \cdots + P(X = 5)$$

Given a random value, $u$, from $U(0,1)$ we can read off the $x$ value from the distribution function graph as follows:

[Figure: step-function graph of $F(x)$ against $x = 0, 1, \ldots, 6$, with the value $u$ marked on the vertical axis.]

From the graph, we can see that in this particular case our value of $u$ lies between $F(2)$ and $F(3)$. This gives $x = 3$ as our simulated value.

So in general, if our value $u$ lies between $F(x_{j-1})$ and $F(x_j)$ then our simulated value is $x_j$. If the value of $u$ corresponds exactly to the position of a step, then by convention we use the lower of the $x$ values, ie the point corresponding to the left-hand end of the step.

Let $X$ be a discrete random variable which can take only the values $x_1, x_2, \ldots, x_N$, where $x_1 < x_2 < \cdots < x_N$.

The first step is to generate a random number, $U$, from the $U(0,1)$ distribution. We can use this to simulate a random variate $X$ with probability function $f(x)$ by using the CDF, $F(x)$.

Let $U$ be the probability that $X$ takes on a value less than or equal to $x$. Then $X = x_j$ if:

$$F(x_{j-1}) < U \le F(x_j)$$

ie:

$$P(X = x_1) + \cdots + P(X = x_{j-1}) < U \le P(X = x_1) + \cdots + P(X = x_j)$$

Note that for $x < x_1$ we have $F(x) = 0$.

Hence, the following three-step algorithm is used to generate a random variate $x$ from a discrete distribution with CDF $F(x)$:

1. generate a random number $u$ from $U(0,1)$.
2. find the positive integer $i$ such that $F(x_{i-1}) < u \le F(x_i)$.
3. return $x = x_i$.

We can see that the algorithm can return only variates $x$ from the range $\{x_1, x_2, \ldots, x_N\}$ and that the probability that a particular value $x = x_i$ is returned is given by:

$$P(\text{value returned is } x_i) = P[F(x_{i-1}) < U \le F(x_i)] = F(x_i) - F(x_{i-1}) = P(X = x_i)$$

Question
Simulate two random values from the $Poi(2)$ distribution using the random values 0.721 and 0.128.

Solution

$$P(X = x) = \frac{2^x}{x!}e^{-2}, \quad x = 0, 1, 2, \ldots$$

$P(X = 0) = e^{-2} = 0.1353 \;\Rightarrow\; F(0) = 0.1353$
$P(X = 1) = 2e^{-2} = 0.2707 \;\Rightarrow\; F(1) = 0.4060$
$P(X = 2) = \dfrac{2^2}{2!}e^{-2} = 0.2707 \;\Rightarrow\; F(2) = 0.6767$
$P(X = 3) = \dfrac{2^3}{3!}e^{-2} = 0.1804 \;\Rightarrow\; F(3) = 0.8571$
etc.

Since $F(2) < 0.721 < F(3)$, the first simulated value is 3.

Since $0.128 < F(0)$, the second simulated value is 0.

Alternatively, we could use the cumulative Poisson tables on page 175 of the Tables instead of calculating the values.

We can use a similar approach for the binomial distribution.

Question
Generate three random values from the $Bin(4, 0.6)$ distribution using the following random values from $U(0,1)$:

0.588   0.222   0.906

Solution
The probability function for the $Bin(4, 0.6)$ distribution is:

$$P(X = x) = \binom{4}{x}0.6^x\,0.4^{4-x}, \quad x = 0, 1, 2, 3, 4$$

Calculating the probabilities and the cumulative distribution function:

$P(X = 0) = 0.4^4 = 0.0256 \;\Rightarrow\; F(0) = 0.0256$
$P(X = 1) = 4 \times 0.6 \times 0.4^3 = 0.1536 \;\Rightarrow\; F(1) = 0.1792$
$P(X = 2) = 6 \times 0.6^2 \times 0.4^2 = 0.3456 \;\Rightarrow\; F(2) = 0.5248$
$P(X = 3) = 4 \times 0.6^3 \times 0.4 = 0.3456 \;\Rightarrow\; F(3) = 0.8704$
$P(X = 4) = 0.6^4 = 0.1296 \;\Rightarrow\; F(4) = 1$

Since $F(2) < 0.588 < F(3)$, our first simulated value is 3.

Since $F(1) < 0.222 < F(2)$, our second simulated value is 2.

Since $F(3) < 0.906 < F(4)$, our third simulated value is 4.

Alternatively, it is much quicker to use the cumulative binomial probabilities given on page 186 of the Tables.

Chapter 2 Summary

Standard discrete distributions covered in this course are the discrete uniform, Bernoulli, binomial, geometric, negative binomial, hypergeometric and Poisson.

Standard continuous distributions covered in this course are the continuous uniform, gamma, exponential, chi-square, normal, lognormal, beta, t and F.

The geometric and exponential distributions have the memoryless property:

$$P(X > x + n \mid X > n) = P(X > x)$$

The properties of the distributions are summarised in the table at the end of this summary.
The $t$ distribution with $k$ degrees of freedom is defined as:

$$t_k = \frac{N(0,1)}{\sqrt{\chi^2_k / k}}$$

The $F$ distribution with $m, n$ degrees of freedom is defined as:

$$F_{m,n} = \frac{\chi^2_m / m}{\chi^2_n / n}$$

The Poisson process counts events occurring up to and including time $t$:

$$N(t) \sim Poi(\lambda t)$$

To calculate probabilities we consider events occurring in a small time interval $h$.

The waiting times between events in a Poisson process have exponential distributions.

Random variables can be simulated by using the inverse transform method. First we take a random number, $u$, from $U(0,1)$ then we set:

continuous: $x = F^{-1}(u)$
discrete: $x = x_j$ where $F(x_{j-1}) < u \le F(x_j)$

Distributions

| Type | Distribution | PF or PDF | Mean | Variance |
| Discrete | Discrete uniform | $\frac{1}{k}$ | $\frac{k+1}{2}$ | $\frac{k^2-1}{12}$ |
| Discrete | Bernoulli | $p^x(1-p)^{1-x}$ | $p$ | $p(1-p)$ |
| Discrete | Binomial | $\binom{n}{x}p^x(1-p)^{n-x}$ | $np$ | $np(1-p)$ |
| Discrete | Geometric (Type 1) | $p(1-p)^{x-1}$ | $\frac{1}{p}$ | $\frac{1-p}{p^2}$ |
| Discrete | Negative binomial (Type 1) | $\binom{x-1}{k-1}p^k(1-p)^{x-k}$ | $\frac{k}{p}$ | $\frac{k(1-p)}{p^2}$ |
| Discrete | Poisson | $\frac{\lambda^x e^{-\lambda}}{x!}$ | $\lambda$ | $\lambda$ |
| Continuous | Continuous uniform | $\frac{1}{b-a}$ | $\frac{1}{2}(a+b)$ | $\frac{1}{12}(b-a)^2$ |
| Continuous | Gamma | $\frac{\lambda^\alpha}{\Gamma(\alpha)}x^{\alpha-1}e^{-\lambda x}$ | $\frac{\alpha}{\lambda}$ | $\frac{\alpha}{\lambda^2}$ |
| Continuous | Exponential | $\lambda e^{-\lambda x}$ | $\frac{1}{\lambda}$ | $\frac{1}{\lambda^2}$ |
| Continuous | Chi-square | $\frac{(1/2)^{\nu/2}}{\Gamma(\nu/2)}x^{\nu/2-1}e^{-x/2}$ | $\nu$ | $2\nu$ |
| Continuous | Beta | $\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1}$ | $\frac{\alpha}{\alpha+\beta}$ | $\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$ |
| Continuous | Lognormal | $\frac{1}{x\sigma\sqrt{2\pi}}\exp\left\{-\frac{1}{2}\left(\frac{\log x-\mu}{\sigma}\right)^2\right\}$ | $e^{\mu+\frac{1}{2}\sigma^2}$ | $e^{2\mu+\sigma^2}\left(e^{\sigma^2}-1\right)$ |
| Continuous | Normal | $\frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right\}$ | $\mu$ | $\sigma^2$ |

Chapter 2 Practice Questions

2.1 If $X \sim N(14,20)$, calculate:
(i) $P(X < 14)$
(ii) $P(X > 20)$
(iii) $P(X < 9)$
(iv) $r$ such that $P(X > r) = 0.41294$.

2.2 Determine the third non-central moment of the normal distribution with mean 10 and variance 25.

2.3 Calculate $P(X < 8)$ if:
(i) $X \sim U(5,10)$
(ii) $X \sim N(10,5)$
(iii) $X \sim Exp(0.5)$
(iv) $X \sim \chi^2_5$
(v) $X \sim Gamma(8,2)$
(vi) $X \sim \log N(2,5)$.

2.4 A random variable $X$ follows the $Poi(3.6)$ distribution.
(i) Calculate the mode of the probability distribution.
(ii) Calculate the standard deviation of the distribution.
(iii) State, with reasons, whether the distribution is positively or negatively skewed.

2.5 $U$ denotes a continuous random variable that is uniformly distributed over the range $(-1,1)$ and $V$ denotes a discrete random variable that is equally likely to take any of the values $\{-1, -\frac{1}{2}, 0, \frac{1}{2}, 1\}$.
(a) Calculate $\text{var}(U)$ and $\text{var}(V)$.
(b) Comment on your answers to part (a).

2.6 Exam style
An analyst is interested in using a gamma distribution with parameters $\alpha = 2$ and $\lambda = 1/2$, that is, with density function:

$$f(x) = \tfrac{1}{4}xe^{-\frac{1}{2}x}, \quad 0 < x < \infty$$

(i) (a) State the mean and standard deviation of this distribution.
(b) Hence comment briefly on its shape. [2]
(ii) Show that the cumulative distribution function is given by:

$$F(x) = 1 - \left(1 + \tfrac{1}{2}x\right)e^{-\frac{1}{2}x}, \quad 0 < x < \infty \quad \text{(zero otherwise)}$$ [3]

The analyst wishes to simulate values $x$ from this gamma distribution and is able to generate random numbers $u$ from a uniform distribution on (0,1).
(iii) (a) Specify an equation involving $x$ and $u$, the solution of which will yield the simulated value $x$.
(b) Comment briefly on how this equation might be solved.
(c) The graph below gives $F(x)$ plotted against $x$. Use this graph to determine the simulated value of $x$ corresponding to the random number $u = 0.66$.

[Figure: graph of F(x) against x, for x from 0 to 20, with F(x) on a vertical axis running from 0 to 1.2.] [3]
[Total 8]

2.7 Calculate $P(X < 8)$ in each of the following cases:
(i) $X$ is the number of claims reported in a year by 20 policyholders. Claims reporting from each policyholder occurs randomly at a rate of 0.2 per year independently of the other policyholders.
(ii) $X$ is the number of claims examined up to and including the fourth claim that exceeds 20,000. The probability that any claim received exceeds 20,000 is 0.3 independently of any other claim.
(iii) $X$ is the number of deaths in the coming year amongst a group of 500 policyholders. Each policyholder has a 0.01 probability of dying in the coming year independently of any other policyholder.
(iv) $X$ is the number of phone calls made before an agent makes the first sale. The probability that any phone call leads to a sale is 0.01 independently of any other call.

2.8 A random variable follows the lognormal distribution with mean 10 and variance 4. Calculate the probability that the variable will take a value between 7.5 and 12.5.

2.9 The random variable $N$ has a Poisson distribution with parameter $\lambda$ and $P(N = 1 \mid N \ge 1) = 0.4$. Calculate the value of $\lambda$ to 2 decimal places.

2.10 Simulate two observations from the distribution with probability density function:

$$f(x) = \frac{50}{(5+x)^3}, \quad x > 0$$

using the random numbers 0.863 and 0.447 selected from the uniform distribution on the interval (0,1).

2.11 Exam style
Claim amounts are modelled as an exponential random variable with mean 1,000.
(i) Calculate the probability that a randomly selected claim amount is greater than 5,000. [1]
(ii) Calculate the probability that a randomly selected claim amount is greater than 5,000 given that it is greater than 1,000. [2]
[Total 3]

2.12 Exam style
The ratio of the standard deviation to the mean of a random variable is called the coefficient of variation.
For each of the following distributions, decide whether increasing the mean of the random variable increases, decreases, or has no effect on the value of the coefficient of variation:
(a) Poisson with mean $\lambda$
(b) exponential with mean $\mu$
(c) chi-square with $n$ degrees of freedom.
[6]

2.13 Exam style
Consider the following simple model for the number of claims, $N$, which occur in a year on a policy:

| $n$ | 0 | 1 | 2 | 3 |
| $P(N=n)$ | 0.55 | 0.25 | 0.15 | 0.05 |

(a) Explain how you would simulate an observation of $N$ using a number $r$, an observation of a random variable that is uniformly distributed on (0,1).
(b) Illustrate your method described in (a) by simulating three observations of $N$ using the following random numbers between 0 and 1:
0.6221, 0.1472, 0.9862 [4]

2.14 Exam style
It is assumed that claims arising on an industrial policy can be modelled as a Poisson process at a rate of 0.5 per year.
(i) Determine the probability that no claims arise in a single year. [1]
(ii) Determine the probability that, in three consecutive years, there is one or more claims in one of the years and no claims in each of the other two years. [2]
Suppose a claim has just occurred.
(iii) Determine the probability that more than two years will elapse before the next claim occurs. [2]
[Total 5]

2.15 Exam style
Consider the following three probability statements concerning an $F$ variable with 6 and 12 degrees of freedom.
(a) $P(F_{6,12} > 0.250) = 0.95$
(b) $P(F_{6,12} < 4.821) = 0.99$
(c) $P(F_{6,12} < 0.13) = 0.01$
State, with reasons, whether each of these statements is true. [3]

Chapter 2 Solutions

2.1 (i) Since 14 is the mean, the probability is 0.5.

(ii) $$P(X > 20) = P\left(Z > \frac{20-14}{\sqrt{20}}\right) = P(Z > 1.342) = 1 - 0.91020 = 0.0898$$

(iii) $$P(X < 9) = P\left(Z < \frac{9-14}{\sqrt{20}}\right) = P(Z < -1.118) = 1 - 0.86821 = 0.1318$$

(iv) $$P(X > r) = P\left(Z > \frac{r-14}{\sqrt{20}}\right) = 0.41294$$

which gives:

$$P\left(Z < \frac{r-14}{\sqrt{20}}\right) = 0.58706 \quad\Rightarrow\quad \frac{r-14}{\sqrt{20}} = 0.22 \quad\Rightarrow\quad r = 14.98$$

2.2 The third non-central moment is $E[X^3]$. The formula for the skewness is:

$$E[(X-\mu)^3] = E[X^3] - 3\mu E[X^2] + 2\mu^3$$

We also know that the skewness of the normal distribution is zero, so:

$$0 = E[X^3] - 3 \times 10 \times (25 + 10^2) + 2 \times 10^3 \quad\Rightarrow\quad E[X^3] = 1{,}750$$

We have worked out $E[X^2]$ here by turning around the relationship $\text{var}(X) = E[X^2] - (E[X])^2$.

2.3 (i) Uniform

$$P(X < 8) = \int_5^8 0.2\,dx = \left[0.2x\right]_5^8 = 0.6$$

Alternatively, we could use the DF given on page 13 of the Tables:

$$P(X < 8) = F(8) = \frac{8-5}{10-5} = 0.6$$

(ii) Normal

$$P(X < 8) = P\left(Z < \frac{8-10}{\sqrt{5}}\right) = P(Z < -0.894) = 1 - P(Z < 0.894) = 1 - 0.81434 = 0.1857$$

(iii) Exponential

$$P(X < 8) = \int_0^8 0.5e^{-0.5x}\,dx = \left[-e^{-0.5x}\right]_0^8 = 1 - e^{-4} = 0.98168$$

Alternatively, we could use the DF given on page 11:

$$P(X < 8) = F(8) = 1 - e^{-0.5 \times 8} = 0.98168$$

(iv) Chi-square

Using the $\chi^2$ tables on page 165 of the Tables gives $P(X < 8) = 0.8438$.

(v) Gamma

The only practical way in a written exam to calculate probabilities involving a gamma random variable is to use the relationship $2\lambda X \sim \chi^2_{2\alpha}$ and then read off the probability from the $\chi^2$ tables.

$$P(X < 8) = P(2\lambda X < 16\lambda) = P(4X < 32) = P(\chi^2_{16} < 32) = 0.9900$$

(vi) Lognormal

Using the fact that if $X \sim \log N(\mu, \sigma^2)$ then $\ln X \sim N(\mu, \sigma^2)$:

$$P(X < 8) = P(\ln X < \ln 8) = P\left(Z < \frac{\ln 8 - 2}{\sqrt{5}}\right) = P(Z < 0.036) = 0.5144$$

2.4 (i) Mode

We can find the mode by calculating probabilities and seeing which value has the highest probability.

$$P(X=0) = e^{-3.6} = 0.02732$$

Using the iterative formula for the Poisson distribution gives:

$$P(X=1) = \frac{3.6}{1} \times 0.02732 = 0.09837$$
$$P(X=2) = \frac{3.6}{2} \times 0.09837 = 0.17706$$
$$P(X=3) = \frac{3.6}{3} \times 0.17706 = 0.21247$$
$$P(X=4) = \frac{3.6}{4} \times 0.21247 = 0.19122$$
$$P(X=5) = \frac{3.6}{5} \times 0.19122 = 0.13768$$

etc.

We can see that 3 is the mode.

(ii) Standard deviation

The variance of the $Poi(\lambda)$ distribution is $\lambda$. So the standard deviation of the $Poisson(3.6)$ distribution is $\sqrt{3.6} = 1.8974$.

(iii) Skewness

The Poisson distribution is positively skewed as the mode of 3 is lower than the mean of 3.6. In fact the Poisson distribution is always positively skewed.

For most positively skewed distributions, we find that mode < median < mean.
[Figure: a positively skewed density with the mode, median and mean marked in that order from left to right.]

2.5 (a) Mean and variance

The probability density function of $U$ is constant, ie $f_U(x) = \frac{1}{2}$, $-1 < x < 1$.

The probability function of $V$ is constant, ie $f_V(x) = \frac{1}{5}$, $x = -1, -\frac{1}{2}, 0, \frac{1}{2}, 1$.

By symmetry the mean value of both variables is zero. Alternatively:

$$E(U) = \int u f(u)\,du = \int_{-1}^{1} \tfrac{1}{2}u\,du = \left[\tfrac{1}{4}u^2\right]_{-1}^{1} = \tfrac{1}{4} - \tfrac{1}{4} = 0$$

$$E(V) = \sum v P(V=v) = (-1)\left(\tfrac{1}{5}\right) + \left(-\tfrac{1}{2}\right)\left(\tfrac{1}{5}\right) + (0)\left(\tfrac{1}{5}\right) + \left(\tfrac{1}{2}\right)\left(\tfrac{1}{5}\right) + (1)\left(\tfrac{1}{5}\right) = 0$$

So the variance of $U$ is calculated from:

$$E(U^2) = \int_{-1}^{1} \tfrac{1}{2}u^2\,du = \left[\tfrac{1}{6}u^3\right]_{-1}^{1} = \tfrac{1}{6} - \left(-\tfrac{1}{6}\right) = \tfrac{1}{3} \quad\Rightarrow\quad \text{var}(U) = \tfrac{1}{3} - 0^2 = \tfrac{1}{3}$$

Alternatively, we could use the formula $\frac{1}{12}(b-a)^2$ from page 13 of the Tables.

So:

$$E(V^2) = \sum v^2 P(V=v) = \tfrac{1}{5}\left[(-1)^2 + \left(-\tfrac{1}{2}\right)^2 + 0^2 + \left(\tfrac{1}{2}\right)^2 + 1^2\right] = \tfrac{1}{2} \quad\Rightarrow\quad \text{var}(V) = \tfrac{1}{2} - 0^2 = \tfrac{1}{2}$$

(b) Comment

The variance is a measure of the spread of values. Both distributions take values in the range from $-1$ to $+1$ and are centred around zero. However, the variance of $V$ is greater than the variance of $U$ because there is a greater probability of obtaining the extreme values $-1$ and $+1$.

2.6 (i)(a) The mean and standard deviation of the distribution

For a gamma distribution with $\alpha = 2$ and $\lambda = 0.5$:

$$E(X) = \frac{\alpha}{\lambda} = \frac{2}{0.5} = 4 \quad\text{and}\quad sd(X) = \sqrt{\frac{\alpha}{\lambda^2}} = \sqrt{\frac{2}{0.5^2}} = \sqrt{8} = 2.828$$ [1]

(i)(b) The shape of the distribution

Since $X$ cannot take negative values and the standard deviation is large relative to the mean, the gamma distribution with $\alpha = 2$ and $\lambda = 0.5$ is positively skewed. [1]

(ii) Cumulative distribution function

The cumulative distribution function, $F_X(x)$, is:

$$F_X(x) = P(X \le x) = \int_{t=0}^{x} \tfrac{1}{4}te^{-\frac{1}{2}t}\,dt, \quad x > 0$$ [1]

Using integration by parts, with $u = \frac{1}{4}t$ and $\frac{dv}{dt} = e^{-\frac{1}{2}t}$:

$$F_X(x) = \left[-\tfrac{1}{2}te^{-\frac{1}{2}t}\right]_0^x + \int_0^x \tfrac{1}{2}e^{-\frac{1}{2}t}\,dt$$
$$= -\tfrac{1}{2}xe^{-\frac{1}{2}x} + \left[-e^{-\frac{1}{2}t}\right]_0^x$$
$$= -\tfrac{1}{2}xe^{-\frac{1}{2}x} - e^{-\frac{1}{2}x} + 1$$
$$= 1 - \left(1 + \tfrac{1}{2}x\right)e^{-\frac{1}{2}x}, \quad x > 0$$ [2]

(iii)(a) Equation to simulate values of x

We equate the random number $u$ to the cumulative distribution function:

$$u = 1 - \left(1 + \tfrac{1}{2}x\right)e^{-\frac{1}{2}x}, \quad x > 0$$ [1]

(iii)(b) Solving the equation

For a given random number, ie a given value of $u$, we could solve for $x$ by:
- trial and error
- using Table Mode on the calculator
- using the Newton-Raphson method or some other iterative approach. [1]

Alternatively, the function for $u$ could be plotted against $x$, and then used to determine the $x$ value corresponding to a $u$ value.

(iii)(c) Using the graph

[Figure: graph of F(x) against x, for x from 0 to 20, with the level u = 0.66 read across to the curve and down to x = 4.5.]

When $u = 0.66$, $x = 4.5$. So, the simulated value of $x$ is 4.5. [1]

2.7 (i) Poisson

The number of claims incurred by each policyholder follows the Poisson distribution with mean 0.2. Therefore the number of claims for the 20 policyholders follows the $Poi(4)$ distribution.

Since the Poisson distribution only takes integer values, $P(X < 8) = P(X \le 7)$. Using the cumulative Poisson tables gives probability 0.94887.

Alternatively, we could use $P(X=x) = \frac{\lambda^x}{x!}e^{-\lambda}$ to calculate the values of $P(X=0)$, $P(X=1)$, ..., $P(X=7)$ (the iterative formula would speed up this process), and then add them up.

(ii) Negative binomial

We are counting the number of trials up to and including the 4th success. This describes the Type 1 negative binomial distribution with $k = 4$ and $p = 0.3$.

$$P(X=x) = \binom{x-1}{3} 0.3^4\, 0.7^{x-4}, \quad x = 4, 5, 6, \ldots$$

So $P(X < 8) = P(X=4) + \cdots + P(X=7)$.

$$P(X=4) = \binom{3}{3} 0.3^4 = 0.0081$$

Now using the iterative formula $P(X=x) = \frac{x-1}{x-4}\, q\, P(X=x-1)$, we get:

$$P(X=5) = \frac{4}{1} \times 0.7 \times 0.0081 = 0.02268$$
$$P(X=6) = \frac{5}{2} \times 0.7 \times 0.02268 = 0.03969$$
$$P(X=7) = \frac{6}{3} \times 0.7 \times 0.03969 = 0.05557$$

Hence, $P(X < 8) = 0.0081 + 0.02268 + 0.03969 + 0.05557 = 0.12604$.

Alternatively, we could have calculated each of the probabilities using the probability function.
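The two probabilities in parts (i) and (ii) can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `poisson_cdf` and `neg_binomial_type1_pmf` are illustrative choices made here, not part of the Tables or the Core Reading.

```python
from math import comb, exp, factorial


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poi(lam), summing the probability function directly."""
    return sum(lam**x * exp(-lam) / factorial(x) for x in range(k + 1))


def neg_binomial_type1_pmf(x, k, p):
    """P(X = x) where X counts trials up to and including the k-th success."""
    return comb(x - 1, k - 1) * p**k * (1 - p) ** (x - k)


# Part (i): P(X < 8) = P(X <= 7) for Poi(4)
p_poisson = poisson_cdf(7, 4.0)

# Part (ii): P(X < 8) = P(X = 4) + ... + P(X = 7) with k = 4, p = 0.3
p_negbin = sum(neg_binomial_type1_pmf(x, 4, 0.3) for x in range(4, 8))

print(round(p_poisson, 5), round(p_negbin, 5))
```

Summing the probability function directly mirrors the "alternative" routes described in the solutions, so the results should agree with the tabulated 0.94887 and the hand-calculated 0.12604 up to rounding.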
(iii) Binomial

Here we have the binomial distribution with $n = 500$ and $p = 0.01$.

Since $n$ is large and $p$ is small we could use a Poisson approximation and then use the cumulative Poisson tables (as we did in part (i)).

$$Bin(500, 0.01) \approx Poi(5) \quad \text{(approximately)}$$

Using the cumulative Poisson tables gives $P(X < 8) = P(X \le 7) = 0.86663$.

Alternatively, we could calculate this accurately, starting with the probability of no deaths:

$$P(X=0) = \binom{500}{0} 0.99^{500} = 0.00657$$

Now using the iterative formula $P(X=x) = \frac{n-x+1}{x} \times \frac{p}{q} \times P(X=x-1)$:

$$P(X=1) = \frac{500}{1} \times \frac{0.01}{0.99} \times 0.00657 = 0.03318$$
$$P(X=2) = \frac{499}{2} \times \frac{0.01}{0.99} \times 0.03318 = 0.08363$$
$$P(X=3) = \frac{498}{3} \times \frac{0.01}{0.99} \times 0.08363 = 0.14023$$
$$P(X=4) = \frac{497}{4} \times \frac{0.01}{0.99} \times 0.14023 = 0.17600$$
$$P(X=5) = \frac{496}{5} \times \frac{0.01}{0.99} \times 0.17600 = 0.17635$$
$$P(X=6) = \frac{495}{6} \times \frac{0.01}{0.99} \times 0.17635 = 0.14696$$
$$P(X=7) = \frac{494}{7} \times \frac{0.01}{0.99} \times 0.14696 = 0.10476$$

Hence, $P(X < 8) = P(X=0) + \cdots + P(X=7) = 0.86768$.

(iv) Geometric

We are counting the number of trials up to, but not including, the 1st success. This describes the Type 2 geometric distribution with $p = 0.01$.

$$P(X=x) = 0.01 \times 0.99^x, \quad x = 0, 1, 2, \ldots$$

Now:

$$P(X < 8) = P(X \le 7) = P(X=0) + \cdots + P(X=7) = 0.01 + 0.01 \times 0.99 + 0.01 \times 0.99^2 + \cdots + 0.01 \times 0.99^7$$

This is a geometric series, so the quickest way to add this up is to use the formula for the sum of a geometric series $S_n = \frac{a(1-r^n)}{1-r}$. This gives:

$$P(X < 8) = \frac{0.01(1 - 0.99^8)}{1 - 0.99} = 0.07726$$

2.8 Let $X$ denote the random variable. Using the formulae for the mean and variance of a lognormal distribution:

$$E[X] = e^{\mu + \frac{1}{2}\sigma^2} = 10 \quad (1)$$
$$\text{var}(X) = e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right) = 4 \quad (2)$$

Squaring equation (1) and substituting into equation (2):

$$\text{var}(X) = 10^2\left(e^{\sigma^2} - 1\right) = 4 \quad\Rightarrow\quad e^{\sigma^2} = 1.04 \quad\Rightarrow\quad \sigma^2 = \log 1.04 = 0.03922$$

Substituting this into equation (1) gives:

$$\mu = \log 10 - \tfrac{1}{2}\sigma^2 = 2.2830$$

So the required probability is:

$$P(7.5 < X < 12.5) = P(X < 12.5) - P(X < 7.5)$$
$$= P(\ln X < \ln 12.5) - P(\ln X < \ln 7.5)$$
$$= P\left(Z < \frac{\log 12.5 - 2.2830}{\sqrt{0.03922}}\right) - P\left(Z < \frac{\log 7.5 - 2.2830}{\sqrt{0.03922}}\right)$$
$$= \Phi(1.226) - \Phi(-1.354)$$
$$= 0.88990 - 0.08787 = 0.8020$$

2.9 The conditional probability is:

$$P(N = 1 \mid N \ge 1) = \frac{P(N=1)}{P(N \ge 1)} = \frac{\lambda e^{-\lambda}}{1 - e^{-\lambda}} = \frac{\lambda}{e^{\lambda} - 1}$$

Trial and error gives $\frac{1.62}{e^{1.62} - 1} = 0.3997$. So $\lambda = 1.62$.

2.10 To simulate a random variable we require the distribution function, $F(x)$:

$$F(x) = P(X \le x) = \int_0^x 50(5+t)^{-3}\,dt = \left[-25(5+t)^{-2}\right]_0^x = 1 - \frac{25}{(5+x)^2}$$

We can now use the inverse transform method:

$$u = 1 - \frac{25}{(5+x)^2} \quad\Rightarrow\quad x = \sqrt{\frac{25}{1-u}} - 5$$

Substituting in our values of $u$, we obtain:

$$x_1 = \sqrt{\frac{25}{1-0.863}} - 5 = 8.51$$
$$x_2 = \sqrt{\frac{25}{1-0.447}} - 5 = 1.72$$

2.11 (i) Probability

$$P(X > 5{,}000) = 1 - F(5{,}000) = e^{-0.001 \times 5{,}000} = e^{-5} = 0.00674$$ [1]

(ii) Conditional probability

$$P(X > 5{,}000 \mid X > 1{,}000) = \frac{P(X > 5{,}000 \cap X > 1{,}000)}{P(X > 1{,}000)} = \frac{P(X > 5{,}000)}{P(X > 1{,}000)}$$ [1]

We have already found the numerator, we just need to find the denominator:

$$P(X > 1{,}000) = 1 - F(1{,}000) = e^{-0.001 \times 1{,}000} = e^{-1}$$

So the required probability is:

$$P(X > 5{,}000 \mid X > 1{,}000) = \frac{e^{-5}}{e^{-1}} = e^{-4} = 0.0183$$ [1]

2.12 (a) Poisson

The $Poi(\lambda)$ distribution has mean $\lambda$ and variance $\lambda$, so:

$$\text{coefficient of variation} = \frac{\sqrt{\lambda}}{\lambda} = \frac{1}{\sqrt{\lambda}}$$ [1]

As the mean, $\lambda$, increases the coefficient of variation, $\frac{1}{\sqrt{\lambda}}$, decreases. [1]

(b) Exponential

We are given the mean of the exponential distribution, which is $\mu = \frac{1}{\lambda}$. So working in terms of $\mu$ the mean is $\mu$ and the variance is $\mu^2$. Hence:

$$\text{coefficient of variation} = \frac{\sqrt{\mu^2}}{\mu} = 1$$ [1]

As the mean, $\mu$, increases the coefficient of variation, 1, is unchanged, ie changing the mean has no effect on the coefficient. [1]

(c) Chi-square

The $\chi^2_n$ distribution has a mean of $n$ and a variance of $2n$. Hence:

$$\text{coefficient of variation} = \frac{\sqrt{2n}}{n} = \sqrt{\frac{2}{n}}$$ [1]

As the mean, $n$, increases the coefficient of variation, $\sqrt{\frac{2}{n}}$, decreases. [1]

2.13 (a) Method for simulating an observation

To simulate a value from a discrete distribution, we follow these two steps:
1. Calculate the DF, $F(n)$.
2. If $F(n-1) < r \le F(n)$, then the simulated value is $n$. [1]

The CDF is:

| $n$ | 0 | 1 | 2 | 3 |
| $P(N \le n)$ | 0.55 | 0.8 | 0.95 | 1 |

So the simulated value is given by:

$$n = \begin{cases} 0 & \text{if } 0 \le r \le 0.55 \\ 1 & \text{if } 0.55 < r \le 0.8 \\ 2 & \text{if } 0.8 < r \le 0.95 \\ 3 & \text{if } 0.95 < r \le 1 \end{cases}$$ [1]

(b) Simulating three values

Since $0.55 < 0.6221 \le 0.8$, the first simulated value is 1. Since $0 \le 0.1472 \le 0.55$, the second simulated value is 0. Since $0.95 < 0.9862 \le 1$, the third simulated value is 3. [2]

2.14 (i) Probability of no claims in one year

The distribution of the number of claims, $N$, in one year is $Poi(0.5)$. Hence the probability of no claims in one year is:

$$P(N=0) = \frac{0.5^0}{0!}e^{-0.5} = 0.60653$$ [1]

(ii) Probability of no claims in two of three years

Using our result from part (i), the probability of one or more claims in one year is:

$$P(N \ge 1) = 1 - P(N=0) = 1 - 0.60653 = 0.39347$$

If $X$ is the number of years with one or more claim, then:

$$X \sim Bin(3, 0.39347)$$

So we have:

$$P(X=1) = \binom{3}{1} \times 0.39347 \times 0.60653^2 = 0.43425$$ [2]

(iii) Probability that more than two years will elapse before the next claim

The waiting time, $T$, in years follows the $Exp(0.5)$ distribution.

$$P(T > 2) = 1 - F(2) = e^{-0.5 \times 2} = e^{-1} = 0.36788$$ [2]

2.15 In this question we will use the notation $F_{a,b,\alpha}$ to be the upper $\alpha\%$ point of the $F_{a,b}$ distribution.

(a) Statement (a), true or false?

We know that $F_{6,12,95} = \frac{1}{F_{12,6,5}}$. From the Tables, $\frac{1}{F_{12,6,5}} = \frac{1}{4.0} = 0.250$, so (a) is true. [1]

(b) Statement (b), true or false?

From the Tables, $F_{6,12,1} = 4.821$, so (b) is true. [1]

(c) Statement (c), true or false?

We know that $F_{6,12,99} = \frac{1}{F_{12,6,1}}$. From the Tables, $\frac{1}{F_{12,6,1}} = \frac{1}{7.718} = 0.13$, so (c) is true.
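As a recap of the simulation methods used in this chapter, the inverse transform method for both the continuous and the discrete case can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `simulate_exponential` and `simulate_discrete` are hypothetical, the rate 0.1 is taken from the $-10\ln(1-u)$ exponential example, and the Poisson support is truncated at 20 as a simplifying assumption.

```python
from math import comb, exp, factorial, log


def simulate_exponential(u, lam):
    """Continuous case: solve u = F(x) = 1 - exp(-lam * x) for x."""
    return -log(1.0 - u) / lam


def simulate_discrete(u, values, probs):
    """Discrete case: return the first x_j with F(x_{j-1}) < u <= F(x_j)."""
    cumulative = 0.0
    for x, p in zip(values, probs):
        cumulative += p
        if u <= cumulative:
            return x
    return values[-1]  # fallback for rounding error when u is very close to 1


# Poi(2), truncated at 20 (an assumption; the omitted tail is negligible here)
poisson_values = list(range(21))
poisson_probs = [2.0**x * exp(-2.0) / factorial(x) for x in poisson_values]

# Bin(4, 0.6)
binomial_values = list(range(5))
binomial_probs = [comb(4, x) * 0.6**x * 0.4 ** (4 - x) for x in binomial_values]

exponential_draws = [round(simulate_exponential(u, 0.1), 2) for u in (0.113, 0.608, 0.003)]
poisson_draws = [simulate_discrete(u, poisson_values, poisson_probs) for u in (0.721, 0.128)]
binomial_draws = [simulate_discrete(u, binomial_values, binomial_probs) for u in (0.588, 0.222, 0.906)]
claims_draws = [simulate_discrete(r, [0, 1, 2, 3], [0.55, 0.25, 0.15, 0.05]) for r in (0.6221, 0.1472, 0.9862)]

print(exponential_draws, poisson_draws, binomial_draws, claims_draws)
```

The comparison `u <= cumulative` implements the convention that a value of $u$ lying exactly on a step is assigned the lower of the two $x$ values. In practice the fixed random numbers above would be replaced by draws from a uniform random number generator.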
[1]

CS1-03: Generating functions

Generating functions

Syllabus objectives

1.4 Generating functions
1.4.1 Define and determine the moment generating function of random variables.
1.4.2 Define and determine the cumulant generating function of random variables.
1.4.3 Use generating functions to determine the moments and cumulants of random variables, by expansion as a series or by differentiation, as appropriate.
1.4.4 Identify the applications for which a moment generating function, a cumulant generating function and cumulants are used, and the reasons why they are used.

0 Introduction

Generating functions provide a neat way of working out various properties of probability distributions without having to use integration repeatedly. For example, they can be used to:
(a) find the mean, variance and higher moments of a probability distribution. This will recap and build upon the work of the previous chapter.
(b) find the distribution of a linear combination of independent random variables, eg $X + Y$ where $X \sim Poi(\lambda)$ and $Y \sim Poi(\mu)$. This will be covered in a later chapter.
(c) determine the properties of compound distributions. We will meet these in Subject CS2.

In this chapter we will introduce two types of generating functions: moment generating functions (MGFs) and cumulant generating functions (CGFs). We will use them to derive formulae for the moments of statistical distributions.

MGFs are used to generate moments (and so are the most useful to us at this point) and CGFs are used to generate cumulants. For our present purposes, all we need to know is that the first three cumulants are the mean, variance and skewness.

A lot of students get confused in this chapter, as they want to know where these definitions come from. Basically, they were invented to make calculations of means and variances easier.

We saw some examples in the previous chapter of how to calculate the mean and variance for different well-known probability distributions. In this chapter we will see how we can use generating functions to derive many of these results.

The syllabus says "define and determine", so make sure you know the definitions of MGFs and CGFs and can find them (where they exist) for all the distributions met in the previous chapter. In addition, the syllabus requires us to determine the moments and cumulants, so ensure you can calculate $E(X)$ and $\text{var}(X)$ for each of these distributions.

1 Moment generating functions

1.1 General formula

A moment generating function (MGF) can be used to generate moments (about the origin) of the distribution of a random variable (discrete or continuous), ie $E(X), E(X^2), E(X^3), \ldots$.

Although the moments of most distributions can be determined directly by evaluation using the necessary integrals or summation, utilising moment generating functions sometimes provides considerable simplifications.

Definition

The moment generating function, $M_X(t)$, of a random variable $X$ is given by:

$$M_X(t) = E\left[e^{tX}\right]$$

for all values of $t$ for which the expectation exists.
MGFs can be defined for both discrete and continuous random variables.

Question

Write down the value of $M_X(0)$.

Solution

$$M_X(0) = E\left[e^{0X}\right] = E[1] = 1$$

This is true for any random variable $X$. This can be a useful check in the exam. Make sure that the expression you obtain for the MGF gives 1 when $t = 0$.

We have defined the expectation of a function of a random variable, $g(X)$, to be $E[g(X)] = \sum_x g(x)P(X=x)$ or $\int_x g(x)f_X(x)\,dx$. So the MGF is given by:

$$M_X(t) = E\left[e^{tX}\right] = \sum_x e^{tx}P(X=x) \quad\text{or}\quad \int_x e^{tx}f_X(x)\,dx$$

Question

Derive the MGF of the random variable $X$ with probability function:

$$P(X=x) = \frac{3}{4^x}, \quad x = 1, 2, 3, \ldots$$

Solution

The MGF is:

$$M_X(t) = E\left(e^{tX}\right) = \sum_{x=1}^{\infty} e^{tx}\frac{3}{4^x} = \frac{3}{4}e^{t} + \frac{3}{16}e^{2t} + \frac{3}{64}e^{3t} + \cdots$$

This is an infinite geometric series with first term $a = \frac{3}{4}e^t$ and common ratio $r = \frac{1}{4}e^t$. Summing this geometric series to infinity using the formula $S_\infty = \frac{a}{1-r}$ for $-1 < r < 1$ gives:

$$M_X(t) = \frac{\frac{3}{4}e^t}{1 - \frac{1}{4}e^t} = \frac{3e^t}{4 - e^t}$$

where $-1 < \frac{1}{4}e^t < 1 \Rightarrow -4 < e^t < 4 \Rightarrow t < \ln 4$.

Now let's derive the MGF of a continuous random variable.

Question

Derive the MGF of the random variable $X$ with probability density function:

$$f(x) = \tfrac{1}{2}(1-x), \quad -1 \le x \le 1$$

Solution

The MGF is:

$$M_X(t) = E\left(e^{tX}\right) = \int_x e^{tx}f(x)\,dx = \int_{-1}^{1} \tfrac{1}{2}(1-x)e^{tx}\,dx = \int_{-1}^{1} \tfrac{1}{2}e^{tx}\,dx - \int_{-1}^{1} \tfrac{1}{2}xe^{tx}\,dx$$

Using integration by parts on the second integral we obtain:

$$M_X(t) = \left[\frac{1}{2t}e^{tx}\right]_{-1}^{1} - \left\{\left[\frac{1}{2t}xe^{tx}\right]_{-1}^{1} - \int_{-1}^{1}\frac{1}{2t}e^{tx}\,dx\right\}$$
$$= \frac{1}{2t}\left(e^t - e^{-t}\right) - \frac{1}{2t}\left(e^t + e^{-t}\right) + \left[\frac{1}{2t^2}e^{tx}\right]_{-1}^{1}$$
$$= \frac{1}{2t^2}\left(e^t - e^{-t}\right) - \frac{1}{t}e^{-t}$$

This is known as the triangular distribution. Try sketching the PDF to see why.

In a moment we'll look at how to obtain the MGFs of the standard distributions given in the previous chapter, but first let's find out how we can use MGFs to calculate moments.
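As a rough numerical check of the two MGFs derived above, we can evaluate $E[e^{tX}]$ directly from the definition and compare it with the closed forms $\frac{3e^t}{4-e^t}$ and $\frac{1}{2t^2}(e^t - e^{-t}) - \frac{1}{t}e^{-t}$. The sketch below is illustrative only: the function names are hypothetical, the infinite sum is truncated at 200 terms and the integral is approximated by the midpoint rule, both of which are assumptions made here for simplicity.

```python
from math import exp


def mgf_discrete(t, terms=200):
    """Truncated sum of e^{tx} * 3 / 4^x over x = 1, ..., terms (needs t < ln 4)."""
    return sum(exp(t * x) * 3.0 / 4.0**x for x in range(1, terms + 1))


def mgf_discrete_closed(t):
    """Closed form 3e^t / (4 - e^t), valid for t < ln 4."""
    return 3.0 * exp(t) / (4.0 - exp(t))


def mgf_continuous(t, steps=20000):
    """Midpoint-rule approximation to the integral of e^{tx} * (1 - x) / 2 over [-1, 1]."""
    h = 2.0 / steps
    total = 0.0
    for i in range(steps):
        x = -1.0 + (i + 0.5) * h
        total += exp(t * x) * 0.5 * (1.0 - x)
    return total * h


def mgf_continuous_closed(t):
    """Closed form (e^t - e^{-t}) / (2 t^2) - e^{-t} / t, for t != 0."""
    return (exp(t) - exp(-t)) / (2.0 * t**2) - exp(-t) / t


for t in (-1.0, 0.5, 1.0):
    print(t, mgf_discrete(t), mgf_discrete_closed(t), mgf_continuous(t), mgf_continuous_closed(t))
```

The closed form for the continuous case is undefined at $t = 0$ as written, so the check $M_X(0) = 1$ has to be made by taking the limit $t \to 0$, or by evaluating the numerical version at $t = 0$.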
Calculating moments

The method is to differentiate the MGF with respect to $t$ and then set $t = 0$, the $r$th derivative giving the $r$th moment about the origin.

For example, $M_X'(t) = E\left[Xe^{tX}\right]$ so $M_X'(0) = E[X]$. Similarly:

$$M_X''(t) = E\left[X^2e^{tX}\right] \quad\Rightarrow\quad M_X''(0) = E[X^2]$$
$$M_X'''(t) = E\left[X^3e^{tX}\right] \quad\Rightarrow\quad M_X'''(0) = E[X^3]$$

etc.

Question

Calculate the mean and variance of a random variable, $X$, with MGF given by:

$$M_X(t) = \left(1 - \frac{t}{5}\right)^{-1}, \quad t < 5$$

Solution

Differentiating the MGF:

$$M_X'(t) = \frac{1}{5}\left(1 - \frac{t}{5}\right)^{-2} \quad\Rightarrow\quad E(X) = M_X'(0) = \frac{1}{5}$$
$$M_X''(t) = \frac{2}{25}\left(1 - \frac{t}{5}\right)^{-3} \quad\Rightarrow\quad E(X^2) = M_X''(0) = \frac{2}{25}$$

So:

$$\text{var}(X) = E(X^2) - [E(X)]^2 = \frac{2}{25} - \left(\frac{1}{5}\right)^2 = \frac{1}{25}$$

We now look at an alternative method that uses a series expansion of the MGF. Although it might appear to be long-winded, it can be useful if differentiation is particularly complicated.

Expanding the exponential function and taking expected values throughout (a procedure which is justifiable for the distributions here) gives:

$$M_X(t) = E\left[e^{tX}\right] = E\left[1 + tX + \frac{t^2}{2!}X^2 + \frac{t^3}{3!}X^3 + \cdots\right] = 1 + tE[X] + \frac{t^2}{2!}E[X^2] + \frac{t^3}{3!}E[X^3] + \cdots$$

from which it is seen that the $r$th moment of the distribution about the origin, $E[X^r]$, is obtainable as the coefficient of $\frac{t^r}{r!}$ in the power series expansion of the MGF.

To use this method to find moments, we need to obtain a series expansion of the MGF. We then equate the coefficients of the powers of $t$ with the above expression.

Question

Use a series expansion to derive $E(X)$, $E(X^2)$ and $E(X^3)$, where the MGF of $X$ is given by:

$$M_X(t) = \left(1 - \frac{t}{5}\right)^{-1}, \quad t < 5$$

Solution

Using the binomial expansion given on page 2 of the Tables:

$$M_X(t) = \left(1 - \frac{t}{5}\right)^{-1} = 1 + (-1)\left(-\frac{t}{5}\right) + \frac{(-1)(-2)}{2!}\left(-\frac{t}{5}\right)^2 + \frac{(-1)(-2)(-3)}{3!}\left(-\frac{t}{5}\right)^3 + \cdots$$
$$= 1 + \frac{1}{5}t + \frac{1}{25}t^2 + \frac{1}{125}t^3 + \cdots$$

The MGF can also be written as:

$$M_X(t) = 1 + tE(X) + \frac{t^2}{2!}E(X^2) + \frac{t^3}{3!}E(X^3) + \cdots$$

Equating the coefficients gives:

$$E(X) = \frac{1}{5}$$
$$\frac{1}{2!}E(X^2) = \frac{1}{25} \quad\Rightarrow\quad E(X^2) = \frac{2}{25}$$
$$\frac{1}{3!}E(X^3) = \frac{1}{125} \quad\Rightarrow\quad E(X^3) = \frac{6}{125}$$

If we differentiate the series expansion for the MGF with respect to $t$ and then substitute $t = 0$ this gives $M_X'(0) = E(X)$, $M_X''(0) = E(X^2)$, ... as before:

$$M_X(t) = 1 + tE(X) + \frac{t^2}{2!}E(X^2) + \frac{t^3}{3!}E(X^3) + \cdots$$
$$M_X'(t) = E(X) + tE(X^2) + \frac{t^2}{2!}E(X^3) + \cdots \quad\Rightarrow\quad M_X'(0) = E(X)$$
$$M_X''(t) = E(X^2) + tE(X^3) + \cdots \quad\Rightarrow\quad M_X''(0) = E(X^2)$$

etc.

The uniqueness property of MGFs

If the distribution of a random variable $X$ is known, in theory at least, all moments of the distribution that exist can be calculated. If the moments are specified, then the distribution can be identified. Without going deeply into mathematical rigour, it can in fact be said that if all moments of a random variable exist (and if they satisfy a certain convergence condition) then the sequence of moments uniquely determines the distribution of $X$.

Further, if a moment generating function has been found, then there is a unique distribution with that MGF. Thus an MGF can be recognised as the MGF of a particular distribution. (There is a one-to-one correspondence between MGFs and distributions with MGFs).

This uniqueness property will be used in a number of proofs in future chapters.

Question

A random variable, $X$, has MGF given by $M_X(t) = \exp\{5t + 3t^2\}$.

Use the MGFs listed in the Tables and the uniqueness property to identify the distribution of $X$.

Solution

Examining the MGFs given in the Tables we want one that involves an exponential term. The normal distribution has the following MGF:

$$M(t) = \exp\left\{\mu t + \tfrac{1}{2}\sigma^2t^2\right\}$$

Equating coefficients, we see that $\mu = 5$ and $\sigma^2 = 6$. Since $X$ has the same MGF as the $N(5,6)$ distribution, the uniqueness property tells us that $X \sim N(5,6)$.

We can also identify a distribution by the series expansion of its MGF.
Question

Identify the continuous distribution for which $E[X^k] = \dfrac{k!}{\lambda^k}$, where $k = 1, 2, 3, \ldots$ and $\lambda > 0$.

Solution

The moment generating function of $X$ is:

$$M_X(t) = 1 + tE[X] + \frac{t^2}{2!}E[X^2] + \frac{t^3}{3!}E[X^3] + \cdots$$

Substituting in the values of the moments given:

$$M_X(t) = 1 + t\,\frac{1}{\lambda} + \frac{t^2}{2!}\,\frac{2!}{\lambda^2} + \frac{t^3}{3!}\,\frac{3!}{\lambda^3} + \cdots = 1 + \frac{t}{\lambda} + \frac{t^2}{\lambda^2} + \frac{t^3}{\lambda^3} + \cdots$$

This is $\left(1 - \dfrac{t}{\lambda}\right)^{-1}$. By comparing this to standard MGFs, we can see that the distribution is exponential with parameter $\lambda$.

1.2 Important examples: discrete distributions

The MGFs for some of the distributions introduced earlier are found as follows.

Discrete uniform

The probability function for the discrete uniform distribution on the integers $1, 2, \ldots, k$ is:

$$P(X = x) = \frac{1}{k}, \quad x = 1, 2, 3, \ldots, k$$

So the MGF is:

$$M_X(t) = E(e^{tX}) = \frac{1}{k}\left(e^t + e^{2t} + \cdots + e^{kt}\right) = \frac{e^t(1 - e^{kt})}{k(1 - e^t)} \quad \text{for } t \neq 0$$

Binomial $(n, p)$ (including Bernoulli, for which $n = 1$)

The probability function for the $Bin(n,p)$ distribution is:

$$P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, 2, \ldots, n$$

So the moment generating function is:

$$M_X(t) = \sum_{x=0}^{n} \binom{n}{x} (pe^t)^x q^{n-x} = (q + pe^t)^n$$

Negative binomial $(k, p)$ (including geometric, for which $k = 1$)

The probability function is:

$$P(X = x) = \binom{x-1}{k-1} p^k q^{x-k}, \quad x = k, k+1, k+2, \ldots$$

So the MGF is:

$$M_X(t) = \sum_{x=k}^{\infty} e^{tx}\binom{x-1}{k-1} p^k q^{x-k} = (pe^t)^k \sum_{x=k}^{\infty} \binom{x-1}{k-1} (qe^t)^{x-k} = (pe^t)^k (1 - qe^t)^{-k} = \left[\frac{pe^t}{1 - qe^t}\right]^k$$

Note: The summation is valid for $qe^t < 1$, ie for $t < \ln(1/q)$.

Hypergeometric

MGF not used.

Poisson $(\lambda)$

The probability function is:

$$P(X = x) = \frac{\lambda^x \exp(-\lambda)}{x!}, \quad x = 0, 1, 2, 3, \ldots$$

So the MGF is:

$$M_X(t) = e^{-\lambda}\sum_{x=0}^{\infty} \frac{(\lambda e^t)^x}{x!} = e^{-\lambda} e^{\lambda e^t} = e^{\lambda(e^t - 1)}$$

1.3 Important examples: continuous variables

We will now look at how to calculate the MGF of some standard continuous distributions. Here we will be integrating to obtain the MGF.

Uniform $(a, b)$

Multiplying the PDF by $e^{tx}$ and integrating:

$$M_X(t) = \int_a^b e^{tx}\,\frac{1}{b-a}\,dx = \frac{e^{bt} - e^{at}}{(b-a)t}$$

Gamma $(\alpha, \lambda)$

Integrate $e^{tx}f(x)$ from 0 to $\infty$. This gives:

$$M_X(t) = \int_0^\infty e^{tx}\,\frac{\lambda^\alpha}{\Gamma(\alpha)}\,x^{\alpha-1}e^{-\lambda x}\,dx = \frac{\lambda^\alpha}{\Gamma(\alpha)}\int_0^\infty x^{\alpha-1}e^{-(\lambda-t)x}\,dx$$

Writing out the integral and substituting $y = (\lambda - t)x$, so that $\dfrac{dy}{dx} = \lambda - t$, we have:

$$\begin{aligned}
M_X(t) &= \frac{\lambda^\alpha}{\Gamma(\alpha)}\int_0^\infty \left(\frac{y}{\lambda-t}\right)^{\alpha-1} e^{-y}\,\frac{dy}{\lambda-t} \\
&= \frac{\lambda^\alpha}{\Gamma(\alpha)(\lambda-t)^\alpha}\int_0^\infty y^{\alpha-1}e^{-y}\,dy = \frac{\lambda^\alpha}{\Gamma(\alpha)(\lambda-t)^\alpha}\,\Gamma(\alpha) \\
&= \left(\frac{\lambda}{\lambda-t}\right)^\alpha
\end{aligned}$$

In the second line we've used the definition of the gamma function, which is given on page 5 of the Tables.

We now see that:

$$M_X(t) = \left(\frac{\lambda}{\lambda-t}\right)^\alpha = \left(\frac{1}{1 - t/\lambda}\right)^\alpha = \left(1 - \frac{t}{\lambda}\right)^{-\alpha}$$

This formula only holds when $t < \lambda$. It is given on page 12 of the Tables.

Question

Describe what happens if we try to evaluate $E(e^{tX})$ for the gamma distribution when $t \ge \lambda$.

Solution

$$M_X(t) = E(e^{tX}) = \int_0^\infty \frac{\lambda^\alpha}{\Gamma(\alpha)}\,x^{\alpha-1}e^{-(\lambda-t)x}\,dx$$

If $t \ge \lambda$, then the power in the exponential factor in the integral is non-negative, so the integrand does not decay as $x \to \infty$ and therefore the answer is infinite. So the MGF does not exist in this case.

From this:

$$M_X'(t) = \alpha\lambda^\alpha(\lambda - t)^{-\alpha-1} \quad\text{so}\quad E[X] = M_X'(0) = \frac{\alpha}{\lambda}$$

$$M_X''(t) = \alpha(\alpha+1)\lambda^\alpha(\lambda - t)^{-\alpha-2} \quad\text{so}\quad E[X^2] = M_X''(0) = \frac{\alpha(\alpha+1)}{\lambda^2}$$

Hence:

$$\mu = \frac{\alpha}{\lambda}, \qquad \sigma^2 = \frac{\alpha(\alpha+1)}{\lambda^2} - \left(\frac{\alpha}{\lambda}\right)^2 = \frac{\alpha}{\lambda^2}$$

It follows that the MGF of the exponential distribution with mean $\mu$ is given by:

$$M_X(t) = (1 - \mu t)^{-1}$$

Remember that the exponential distribution is a special case of the gamma distribution when $\alpha = 1$. The mean is $\mu = \dfrac{1}{\lambda}$.

Note: The MGF of the chi-square $\chi^2_\nu$ distribution is given by $M_X(t) = (1 - 2t)^{-\nu/2}$.

Question

Show that this is true.

Solution

$\chi^2_\nu$ is gamma with $\alpha = \dfrac{\nu}{2}$ and $\lambda = \dfrac{1}{2}$. So it has moment generating function:

$$M_X(t) = \left(\frac{\tfrac{1}{2}}{\tfrac{1}{2} - t}\right)^{\nu/2} = \left(\frac{1}{1 - 2t}\right)^{\nu/2} = (1 - 2t)^{-\nu/2}$$

Normal $(\mu, \sigma^2)$

The two crucial steps in evaluating the integral to obtain the MGF for the normal distribution are (i) completing the square in the exponent, and (ii) recognising that the resulting integral is simply that of a normal density and hence equal to 1.

The derivation is not given in the Core Reading, but is covered in the next question. The result is:

$$M_X(t) = \exp\left(\mu t + \tfrac{1}{2}\sigma^2 t^2\right)$$

Question

Prove this result.

Solution

The moment generating function of the $N(\mu, \sigma^2)$ distribution is given by:

$$M_X(t) = \int_{-\infty}^{\infty} e^{tx}\,\frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right\}dx$$

First we need to complete the square:

$$\begin{aligned}
M_X(t) &= \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{tx - \frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right\}dx \\
&= \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{1}{2\sigma^2}\left(x^2 - 2\mu x + \mu^2 - 2\sigma^2 tx\right)\right\}dx \\
&= \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{1}{2\sigma^2}\left(x^2 - 2(\mu + \sigma^2 t)x + \mu^2\right)\right\}dx \\
&= \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{1}{2\sigma^2}\left[\left(x - (\mu + \sigma^2 t)\right)^2 - (\mu + \sigma^2 t)^2 + \mu^2\right]\right\}dx \\
&= \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{1}{2\sigma^2}\left(x - (\mu + \sigma^2 t)\right)^2\right\}\exp\left\{-\frac{1}{2\sigma^2}\left(-2\mu\sigma^2 t - \sigma^4 t^2\right)\right\}dx
\end{aligned}$$

Since the second factor in the integral does not depend on $x$, we can take it outside the integral:

$$\begin{aligned}
M_X(t) &= \exp\left\{-\frac{1}{2\sigma^2}\left(-2\mu\sigma^2 t - \sigma^4 t^2\right)\right\}\int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{1}{2\sigma^2}\left(x - (\mu + \sigma^2 t)\right)^2\right\}dx \\
&= \exp\left\{\mu t + \tfrac{1}{2}\sigma^2 t^2\right\}\int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{1}{2}\left(\frac{x - (\mu + \sigma^2 t)}{\sigma}\right)^2\right\}dx
\end{aligned}$$

The function now being integrated is the PDF of the normal distribution with mean $\mu + \sigma^2 t$ and standard deviation $\sigma$. So the integral is 1, and hence:

$$M_X(t) = \exp\left\{\mu t + \tfrac{1}{2}\sigma^2 t^2\right\}$$

as required.

We can obtain the moments of the normal distribution from the MGF. Expanding:

$$M_X(t) = 1 + \left(\mu t + \tfrac{1}{2}\sigma^2 t^2\right) + \frac{\left(\mu t + \tfrac{1}{2}\sigma^2 t^2\right)^2}{2!} + \cdots$$

$E[X]$ = coefficient of $t$ $= \mu$ (confirming that the parameter $\mu$ does indeed represent the mean).

$E[X^2]$ = coefficient of $\dfrac{t^2}{2!}$ $= \sigma^2 + \mu^2$, so $\text{var}[X] = \sigma^2 + \mu^2 - \mu^2 = \sigma^2$ (confirming that the parameter $\sigma$ does indeed represent the standard deviation).

Alternatively, we could differentiate the MGF to obtain the mean and variance.
However, the series method is actually quicker in this case.

By setting $\mu = 0$ and $\sigma^2 = 1$, we can see that the standard normal random variable $Z$ has MGF:

$$M_Z(t) = \exp\left(\tfrac{1}{2}t^2\right) = 1 + \tfrac{1}{2}t^2 + \frac{\left(\tfrac{1}{2}t^2\right)^2}{2!} + \cdots$$

Hence $E[Z] = 0$, $E[Z^2] = 1$, $E[Z^3] = 0$, $E[Z^4] = 3$ (coefficient of $t^4/4!$), ...

Now $X = \mu + \sigma Z$, and it follows that $E[(X-\mu)^3] = 0$ and $E[(X-\mu)^4] = 3\sigma^4$.

Remember that $E[(X-\mu)^3]$ is the skewness. We stated earlier that the normal distribution is symmetrical, hence we expect $E[(X-\mu)^3]$ to be zero. This has now been proved.

Question

In this last result, we have used the fact that if we standardise a normal random variable $X$ by setting $Z = \dfrac{X - \mu}{\sigma}$, then $Z$ has the standard normal distribution. Use moment generating functions to show that this is true.

Solution

The MGF of $Z = \dfrac{X - \mu}{\sigma}$ is:

$$M_Z(t) = E[e^{tZ}] = E\left[e^{t(X-\mu)/\sigma}\right] = e^{-\mu t/\sigma}\,E\left[e^{(t/\sigma)X}\right] = e^{-\mu t/\sigma}\,M_X\!\left(\frac{t}{\sigma}\right)$$

Using the formula for the MGF of the $N(\mu, \sigma^2)$ distribution gives:

$$M_Z(t) = e^{-\mu t/\sigma}\,e^{\mu (t/\sigma) + \frac{1}{2}\sigma^2 (t/\sigma)^2} = e^{\frac{1}{2}t^2}$$

which we recognise as the MGF of $N(0,1)$. So, using the uniqueness property of MGFs, we can conclude that $\dfrac{X - \mu}{\sigma}$ follows the standard normal distribution.

The MGFs do not exist in closed form for the Beta and lognormal distributions. Hence, they are excluded from this section.

2 Cumulant generating functions

For many random variables the cumulant generating function (CGF) is easier to use than the MGF in evaluating the mean and variance.

Definition

The cumulant generating function, $C_X(t)$, of a random variable $X$ is given by:

$$C_X(t) = \ln M_X(t)$$

We can treat this as the definition of the CGF.

Question

The MGF of the $Bin(n,p)$ distribution is given by:

$$M_X(t) = (q + pe^t)^n$$

State the CGF of the $Bin(n,p)$ distribution.

Solution

$$C_X(t) = \ln M_X(t) = \ln (q + pe^t)^n = n\ln(q + pe^t)$$

As a result, if $C_X(t)$ is known, it is easy to determine $M_X(t)$. We have $M_X(t) = e^{C_X(t)}$.

Calculating moments

The first three derivatives of $C_X(t)$ evaluated at $t = 0$ give the mean, variance and skewness of $X$ directly.

These results can be proved as follows:

$$C_X'(t) = \frac{M_X'(t)}{M_X(t)}$$

$$C_X''(t) = \frac{M_X''(t)M_X(t) - \left(M_X'(t)\right)^2}{\left(M_X(t)\right)^2}$$

and

$$C_X'''(t) = \frac{M_X'''(t)\left(M_X(t)\right)^3 - 3\left(M_X(t)\right)^2 M_X'(t)M_X''(t) + 2M_X(t)\left(M_X'(t)\right)^3}{\left(M_X(t)\right)^4}$$

Now $M_X(0) = 1$, so:

$$C_X'(0) = \frac{M_X'(0)}{M_X(0)} = \frac{E[X]}{1} = E[X]$$

$$C_X''(0) = \frac{M_X''(0)M_X(0) - \left(M_X'(0)\right)^2}{\left(M_X(0)\right)^2} = \frac{E[X^2](1) - (E[X])^2}{1^2} = \text{var}[X]$$

and

$$C_X'''(0) = \frac{M_X'''(0)\left(M_X(0)\right)^3 - 3\left(M_X(0)\right)^2 M_X'(0)M_X''(0) + 2M_X(0)\left(M_X'(0)\right)^3}{\left(M_X(0)\right)^4} = \frac{E[X^3](1)^3 - 3(1)^2E[X]E[X^2] + 2(1)(E[X])^3}{1^4} = \text{skew}(X)$$

Question

State the CGF of $X$ where $X \sim Gamma(\alpha, \lambda)$. Hence prove that $E(X) = \dfrac{\alpha}{\lambda}$, $\text{var}(X) = \dfrac{\alpha}{\lambda^2}$ and $\text{skew}(X) = \dfrac{2\alpha}{\lambda^3}$.

Solution

$$M_X(t) = \left(1 - \frac{t}{\lambda}\right)^{-\alpha} \;\Rightarrow\; C_X(t) = -\alpha\ln\left(1 - \frac{t}{\lambda}\right), \quad t < \lambda$$

Differentiating with respect to $t$:

$$C_X'(t) = \frac{\alpha}{\lambda}\left(1 - \frac{t}{\lambda}\right)^{-1} \;\Rightarrow\; C_X'(0) = \frac{\alpha}{\lambda} = E(X)$$

$$C_X''(t) = \frac{\alpha}{\lambda^2}\left(1 - \frac{t}{\lambda}\right)^{-2} \;\Rightarrow\; C_X''(0) = \frac{\alpha}{\lambda^2} = \text{var}(X)$$

$$C_X'''(t) = \frac{2\alpha}{\lambda^3}\left(1 - \frac{t}{\lambda}\right)^{-3} \;\Rightarrow\; C_X'''(0) = \frac{2\alpha}{\lambda^3} = \text{skew}(X)$$

The coefficient of $\dfrac{t^r}{r!}$ in the Maclaurin series of $C_X(t) = \ln M_X(t)$ is called the $r$th cumulant and is denoted by $\kappa_r$.

Another method for finding the cumulants is to differentiate the CGF with respect to $t$ and then set $t = 0$. The $r$th derivative then gives the $r$th cumulant, $\kappa_r$. So:

$$\kappa_1 = C_X'(0), \qquad \kappa_2 = C_X''(0), \qquad \kappa_3 = C_X'''(0) \quad \text{etc}$$

Cumulants are similar to moments. The first three cumulants are the mean, the variance and the skewness.

Question

By using the CGF of the $Poi(\lambda)$ distribution, derive the 2nd, 3rd and 4th cumulants.

Solution

For the Poisson distribution:

$$M_X(t) = e^{\lambda(e^t - 1)} \;\Rightarrow\; C_X(t) = \ln M_X(t) = \lambda(e^t - 1)$$

Differentiating and setting $t = 0$ we obtain:

$$C_X'(t) = \lambda e^t$$

$$C_X''(t) = \lambda e^t \;\Rightarrow\; \kappa_2 = C_X''(0) = \lambda e^0 = \lambda$$

$$C_X'''(t) = \lambda e^t \;\Rightarrow\; \kappa_3 = C_X'''(0) = \lambda$$

$$C_X''''(t) = \lambda e^t \;\Rightarrow\; \kappa_4 = C_X''''(0) = \lambda$$

So the second, third and fourth cumulants of the Poisson distribution are all equal to $\lambda$. In fact, all the cumulants are the same.

We can see that the CGF is particularly useful when the MGF is an exponential function, as it makes the differentiation a lot easier.

3 Linear functions

Suppose $X$ has MGF $M_X(t)$ and the distribution of a linear function $Y = a + bX$ is of interest. The MGF of $Y$, $M_Y(t)$ say, can be obtained from that of $X$ as follows:

$$M_Y(t) = E[e^{tY}] = E\left[e^{t(a + bX)}\right] = e^{at}E\left[e^{btX}\right] = e^{at}M_X(bt)$$

Question

(i) Use MGFs to show that if $X \sim Gamma(\alpha, \lambda)$ and $2\alpha$ is an integer, then $2\lambda X \sim \chi^2_{2\alpha}$.

(ii) Estimate $P(X > 75)$ when $X \sim Gamma(20, 0.4)$.

Solution

(i) Proof

The MGF of the $Gamma(\alpha, \lambda)$ distribution is $M_X(t) = \left(1 - \dfrac{t}{\lambda}\right)^{-\alpha}$.

If $Y = a + bX$, then $M_Y(t) = e^{at}M_X(bt)$. In this question, $a = 0$ and $b = 2\lambda$, so:

$$M_Y(t) = M_X(2\lambda t) = \left(1 - \frac{2\lambda t}{\lambda}\right)^{-\alpha} = (1 - 2t)^{-\alpha}$$

This is the moment generating function of the chi-square distribution with $2\alpha$ degrees of freedom, so by the uniqueness of MGFs, we can say that $2\lambda X$ follows the $\chi^2_{2\alpha}$ distribution.

(ii) Probability

If $X$ follows the $Gamma(20, 0.4)$ distribution, then $0.8X \sim \chi^2_{40}$. So:

$$P(X > 75) = P(0.8X > 60) = P(\chi^2_{40} > 60)$$

From the percentage points table of the $\chi^2_{40}$ distribution, we see that this probability is just less than 0.025.

Question

If $X$ follows the gamma distribution with parameters $\alpha = 2$ and $\lambda = 0.4$, calculate $P(X > 10)$ using direct integration.

Solution

The PDF of $X$ is:

$$f_X(x) = \frac{0.4^2}{\Gamma(2)}\,xe^{-0.4x} = 0.16xe^{-0.4x}, \quad x \ge 0$$

Integrating the PDF using integration by parts:

$$\begin{aligned}
P(X < 10) &= \int_0^{10} 0.16xe^{-0.4x}\,dx \\
&= \left[0.16x\,\frac{e^{-0.4x}}{-0.4}\right]_0^{10} - \int_0^{10} 0.16\,\frac{e^{-0.4x}}{-0.4}\,dx \\
&= -4e^{-4} + \left[-e^{-0.4x}\right]_0^{10} \\
&= -4e^{-4} - \left(e^{-4} - 1\right) = 1 - 5e^{-4}
\end{aligned}$$

So we have:

$$P(X > 10) = 1 - \left(1 - 5e^{-4}\right) = 5e^{-4} = 0.09158$$

We can check this result by obtaining $P(\chi^2_4 > 8)$ using page 165 of the Tables.

Alternatively we can obtain $P(X > 10)$ directly by integrating between the limits of 10 and $\infty$.

This method rapidly becomes tedious (or impossible) for values of $\alpha$ other than very small integers.

We can also obtain the CGF of a linear function.

Question

If $Y = a + bX$, derive and simplify an expression for $C_Y(t)$ in terms of $C_X(t)$.

Solution

Since $C_Y(t) = \ln M_Y(t)$, using the expression for $M_Y(t)$, we have:

$$C_Y(t) = \ln M_Y(t) = \ln\left[e^{at}M_X(bt)\right] = at + \ln M_X(bt) = at + C_X(bt)$$

4 Further applications of generating functions

Generating functions can be used to establish the distribution of linear combinations of random variables. This will be covered in detail in a later chapter.

A linear combination of the random variables $X_1, \ldots, X_n$ is an expression of the form:

$$c_1X_1 + \cdots + c_nX_n$$

where $c_1, \ldots, c_n$ are constants.

We can use MGFs (or CGFs) to obtain the distribution of such a linear combination. For example, we can show that if $X_1 \sim Poi(\lambda_1)$ and $X_2 \sim Poi(\lambda_2)$, and $X_1$ and $X_2$ are independent random variables, then $X_1 + X_2 \sim Poi(\lambda_1 + \lambda_2)$. We will prove results such as this later in the course.

Moment generating functions can also be used to calculate moments for and specify compound distributions. This will be covered in detail in Subject CS2.

Chapter 3 Summary

Generating functions are used to make it easier to find moments of distributions.

The moment generating function (MGF) of a random variable is defined to be:

$$M_X(t) = E\left[e^{tX}\right]$$

The series expansion for MGFs is:

$$M_X(t) = 1 + tE(X) + \frac{t^2}{2!}E(X^2) + \frac{t^3}{3!}E(X^3) + \cdots$$

The formulae for the mean and variance are:

$$E(X) = M_X'(0) \qquad \text{var}(X) = M_X''(0) - \left[M_X'(0)\right]^2$$

The cumulant generating function (CGF) of a random variable is defined to be:

$$C_X(t) = \ln M_X(t)$$

The formulae for the moments are:

$$E(X) = C_X'(0) \qquad \text{var}(X) = C_X''(0) \qquad \text{skew}(X) = C_X'''(0)$$

The uniqueness property means that if two variables have the same MGF or CGF then they have the same distribution.

If $Y = a + bX$, then:

$$M_Y(t) = e^{at}M_X(bt) \quad\text{and}\quad C_Y(t) = at + C_X(bt)$$

Chapter 3 Practice Questions

3.1 (Exam style)

(i) Determine the moment generating function of the two-parameter exponential random variable $X$, defined by the probability density function:

$$f(x) = \lambda e^{-\lambda(x - \alpha)}, \quad x \ge \alpha$$

where $\alpha, \lambda > 0$. [3]

(ii) Hence, or otherwise, determine the mean and variance of the random variable $X$. [4]
[Total 7]

3.2 Derive from first principles the moment generating function of a random variable $X$, where:

$$P(X = x) = \theta(1 - \theta)^{x-1}, \quad x = 1, 2, 3, \ldots$$

3.3 (i) Determine the cumulant generating function of the $N(\mu, \sigma^2)$ distribution.

(ii) Hence determine the mean, variance and coefficient of skewness of this distribution.

3.4 (Exam style)

Suppose that the moment generating function of $X$ is $M_X(t)$.

(i) Derive an expression for the moment generating function of $2X + 3$ in terms of $M_X(t)$. [2]

(ii) Now suppose that $X$ is normally distributed with mean $\mu$ and variance $\sigma^2$. Derive the distribution of $2X + 3$. [2]
[Total 4]

3.5 (Exam style)

The moment generating function, $M_Y(t)$, of a random variable, $Y$, is given by:

$$M_Y(t) = (1 - 4t)^{-2}, \quad t < 0.25$$

Calculate:

(i) $E(Y)$ [1]

(ii) the standard deviation of $Y$ [2]

(iii) $E(Y^6)$. [2]
[Total 5]

3.6 (Exam style)

The random variable $U$ has a geometric distribution with probability function:

$$P(U = u) = pq^{u-1}, \quad u = 1, 2, 3, \ldots \quad \text{where } p + q = 1$$

(i) Derive the moment generating function of $U$. [2]

(ii) Write down the CGF of $U$, and hence show that $E(U) = 1/p$. [3]
[Total 5]

3.7 (Exam style)

A random variable $X$ has probability density function:

$$f(x) = ke^{-2x}, \quad x > R$$

where $R$ and $k$ are positive constants.

(i) (a) Derive a formula for the moment generating function of $X$.

(b) State the values of $t$ for which the formula in part (i)(a) is valid. [4]

(ii) Hence determine the value of the constant $k$ in terms of $R$. [1]
[Total 5]

3.8 (Exam style)

(i) Derive, from first principles, the moment generating function of a $Gamma(\alpha, \lambda)$ random variable. [3]

(ii) Show, using the moment generating function, that the mean and variance of a $Gamma(\alpha, \lambda)$ random variable are $\dfrac{\alpha}{\lambda}$ and $\dfrac{\alpha}{\lambda^2}$, respectively. [2]
[Total 5]

3.9 (Exam style)

$X$ is normally distributed with mean $\mu$ and variance $\sigma^2$. Determine the fourth central moment of $X$. [3]

3.10 (Exam style)

The claim amount $X$ in units of 1,000 for a certain type of industrial policy is modelled as a gamma variable with parameters $\alpha = 3$ and $\lambda = 1/4$.

(i) Use moment generating functions to show that $\tfrac{1}{2}X \sim \chi^2_6$. [3]

(ii) Calculate the probability that a randomly chosen claim amount exceeds 20,000. [2]
[Total 5]

Chapter 3 Solutions

3.1 (i) Using the definition of an MGF:

$$\begin{aligned}
M_X(t) = E\left[e^{tX}\right] &= \int_\alpha^\infty e^{tx}\,\lambda e^{-\lambda(x-\alpha)}\,dx = \lambda e^{\lambda\alpha}\int_\alpha^\infty e^{-(\lambda - t)x}\,dx \\
&= \lambda e^{\lambda\alpha}\left[-\frac{e^{-(\lambda-t)x}}{\lambda - t}\right]_\alpha^\infty = \frac{\lambda e^{\alpha t}}{\lambda - t} \quad \text{provided } t < \lambda
\end{aligned}$$
[3]

(ii) Re-writing the MGF to make it easier to differentiate:

$$M_X(t) = e^{\alpha t}\left(1 - \frac{t}{\lambda}\right)^{-1}$$

$$M_X'(t) = \alpha e^{\alpha t}\left(1 - \frac{t}{\lambda}\right)^{-1} + \frac{1}{\lambda}e^{\alpha t}\left(1 - \frac{t}{\lambda}\right)^{-2} \;\Rightarrow\; E(X) = M_X'(0) = \alpha + \frac{1}{\lambda}$$
[2]

$$M_X''(t) = \alpha^2 e^{\alpha t}\left(1 - \frac{t}{\lambda}\right)^{-1} + \frac{2\alpha}{\lambda}e^{\alpha t}\left(1 - \frac{t}{\lambda}\right)^{-2} + \frac{2}{\lambda^2}e^{\alpha t}\left(1 - \frac{t}{\lambda}\right)^{-3} \;\Rightarrow\; E(X^2) = M_X''(0) = \alpha^2 + \frac{2\alpha}{\lambda} + \frac{2}{\lambda^2}$$

$$\text{var}(X) = \alpha^2 + \frac{2\alpha}{\lambda} + \frac{2}{\lambda^2} - \left(\alpha + \frac{1}{\lambda}\right)^2 = \frac{1}{\lambda^2}$$
[2]

3.2 The MGF is given by:

$$M_X(t) = E(e^{tX}) = \sum_{x=1}^\infty e^{tx}P(X = x) = \sum_{x=1}^\infty e^{tx}\,\theta(1-\theta)^{x-1} = \theta\left(e^t + e^{2t}(1-\theta) + e^{3t}(1-\theta)^2 + \cdots\right)$$

The expression in the brackets is an infinite geometric series with $a = e^t$ and $r = e^t(1 - \theta)$. Summing it gives:

$$M_X(t) = \frac{\theta e^t}{1 - (1-\theta)e^t} \quad \text{where } (1-\theta)e^t < 1$$

3.3 (i) For the normal distribution:

$$M_X(t) = \exp\left\{\mu t + \tfrac{1}{2}\sigma^2 t^2\right\} \;\Rightarrow\; C_X(t) = \ln M_X(t) = \mu t + \tfrac{1}{2}\sigma^2 t^2$$

(ii) Differentiating and setting $t = 0$ gives:

$$C_X'(t) = \mu + \sigma^2 t \;\Rightarrow\; C_X'(0) = \mu = E(X)$$

$$C_X''(t) = \sigma^2 \;\Rightarrow\; C_X''(0) = \sigma^2 = \text{var}(X)$$

$$C_X'''(t) = 0 \;\Rightarrow\; C_X'''(0) = 0 = \text{skew}(X)$$

Since the skewness is zero, the coefficient of skewness is also 0.

3.4 (i) MGF

The MGF of $X$ is $M_X(t) = E(e^{tX})$. So the MGF of $2X + 3$ is:

$$M_{2X+3}(t) = E\left[e^{t(2X+3)}\right] = E\left[e^{(2t)X + 3t}\right] = e^{3t}E\left[e^{(2t)X}\right] = e^{3t}M_X(2t)$$
[2]

(ii) Distribution

The MGF for a $N(\mu, \sigma^2)$ random variable is $M_X(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}$. Using the formula derived in part (i):

$$M_{2X+3}(t) = e^{3t}M_X(2t) = e^{3t}\,e^{2\mu t + \frac{1}{2}\sigma^2 (2t)^2} = e^{(2\mu + 3)t + \frac{1}{2}(4\sigma^2)t^2}$$

This is the MGF of the $N(2\mu + 3, 4\sigma^2)$ distribution. Therefore by the uniqueness property of MGFs, $2X + 3$ follows the $N(2\mu + 3, 4\sigma^2)$ distribution. [2]

3.5 (i) Expectation

$$M_Y'(t) = 8(1 - 4t)^{-3} \;\Rightarrow\; E(Y) = M_Y'(0) = 8$$
[1]

(ii) Standard deviation

$$M_Y''(t) = 96(1 - 4t)^{-4} \;\Rightarrow\; E(Y^2) = 96 \;\Rightarrow\; \text{var}(Y) = 96 - 8^2 = 32 \;\Rightarrow\; \text{standard deviation} = \sqrt{32} = 5.6569$$
[2]

(iii) Sixth moment

Recall that $E(Y^6)$ is the coefficient of $\dfrac{t^6}{6!}$ in the expansion of $M_Y(t)$. From the binomial expansion of the MGF, $(1 - 4t)^{-2}$, (using the formula given on page 2 of the Tables) the $t^6$ term is:

$$\frac{(-2)(-3)(-4)(-5)(-6)(-7)}{6!}(-4t)^6$$
[1]

Hence, $E(Y^6) = (-2)(-3)(-4)(-5)(-6)(-7)(-4)^6 = 20{,}643{,}840$. [1]

Alternatively, we can use $E(Y^6) = M_Y^{(6)}(0)$ but this requires us to differentiate the MGF six times.

3.6 (i) MGF

The MGF of $U$ is:

$$M_U(t) = E\left[e^{tU}\right] = \sum_{u=1}^\infty e^{tu}P(U = u) = \sum_{u=1}^\infty e^{tu}pq^{u-1} = pe^t + pqe^{2t} + pq^2e^{3t} + \cdots$$
[1]

This is an infinite geometric series with $a = pe^t$ and $r = qe^t$, so using the formula $S_\infty = \dfrac{a}{1 - r}$ gives:

$$M_U(t) = \frac{pe^t}{1 - qe^t}$$
[1]

(ii) CGF and mean

We have:

$$C_U(t) = \ln M_U(t) = \ln\left(\frac{pe^t}{1 - qe^t}\right) = \ln p + t - \ln\left(1 - qe^t\right)$$
[1]

Differentiating the CGF:

$$C_U'(t) = 1 + \frac{qe^t}{1 - qe^t}$$
[1]

Substituting in $t = 0$:

$$E(U) = C_U'(0) = 1 + \frac{q}{1 - q} = \frac{1}{1 - q} = \frac{1}{p}$$
[1]

3.7 (i)(a) MGF

The MGF of $X$ is:

$$M_X(t) = E[e^{tX}] = \int_R^\infty e^{tx}ke^{-2x}\,dx$$
[1]

$$= k\int_R^\infty e^{-(2-t)x}\,dx = k\left[-\frac{e^{-(2-t)x}}{2 - t}\right]_R^\infty$$
[1]

$$= \frac{ke^{-(2-t)R}}{2 - t}$$
[1]

(i)(b) Values of $t$ for which valid

The integral converges as $x \to \infty$ only if $2 - t$ is positive. So the MGF is valid for $t < 2$. [1]

(ii) Evaluate $k$

Putting $t = 0$ gives $M_X(0) = \tfrac{1}{2}ke^{-2R}$. [½]

Since $M_X(0)$ must equal 1, this tells us that $k = 2e^{2R}$. [½]

3.8 (i) MGF

$$M_X(t) = E(e^{tX}) = \int_0^\infty e^{tx}\,\frac{\lambda^\alpha}{\Gamma(\alpha)}\,x^{\alpha-1}e^{-\lambda x}\,dx = \int_0^\infty \frac{\lambda^\alpha}{\Gamma(\alpha)}\,x^{\alpha-1}e^{-(\lambda - t)x}\,dx$$
[1]

The integral looks like the PDF of a $Gamma(\alpha, \lambda - t)$ random variable, so putting in the appropriate constants:

$$M_X(t) = \frac{\lambda^\alpha}{(\lambda - t)^\alpha}\int_0^\infty \frac{(\lambda - t)^\alpha}{\Gamma(\alpha)}\,x^{\alpha-1}e^{-(\lambda-t)x}\,dx = \left(\frac{\lambda}{\lambda - t}\right)^\alpha = \left(1 - \frac{t}{\lambda}\right)^{-\alpha}$$

provided $t < \lambda$, since the integral of a Gamma PDF over the whole range is 1. [2]

Alternatively, we can use the substitution method.

(ii) Mean and variance

Using the results $E(X) = M_X'(0)$ and $E(X^2) = M_X''(0)$:

$$M_X'(t) = \frac{\alpha}{\lambda}\left(1 - \frac{t}{\lambda}\right)^{-\alpha-1} \;\Rightarrow\; E(X) = M_X'(0) = \frac{\alpha}{\lambda}$$
[1]

$$M_X''(t) = \frac{\alpha(\alpha+1)}{\lambda^2}\left(1 - \frac{t}{\lambda}\right)^{-\alpha-2} \;\Rightarrow\; E(X^2) = M_X''(0) = \frac{\alpha(\alpha+1)}{\lambda^2}$$

So:

$$\text{var}(X) = \frac{\alpha(\alpha+1)}{\lambda^2} - \left(\frac{\alpha}{\lambda}\right)^2 = \frac{\alpha}{\lambda^2}$$
[1]

3.9 The MGF of $X - \mu$, which follows the $N(0, \sigma^2)$ distribution, is:

$$M_{X-\mu}(t) = e^{\frac{1}{2}\sigma^2 t^2} = 1 + \tfrac{1}{2}\sigma^2 t^2 + \frac{\left(\tfrac{1}{2}\sigma^2 t^2\right)^2}{2!} + \cdots$$
[1]

$E[(X - \mu)^4]$ is the coefficient of $\dfrac{t^4}{4!}$ in this series, ie:

$$E[(X - \mu)^4] = \frac{4!}{2!}\left(\tfrac{1}{2}\sigma^2\right)^2 = 3\sigma^4$$
[2]

Alternatively, we can differentiate the MGF four times and substitute $t = 0$ each time to obtain $E(X)$, $E(X^2)$, $E(X^3)$ and $E(X^4)$. We then use the expansion of $E[(X - \mu)^4]$:

$$E[(X - \mu)^4] = E(X^4) - 4\mu E(X^3) + 6\mu^2 E(X^2) - 3\mu^4$$

It might be tempting to use the CGF as it gives the second and third central moments, $\text{var}(X) = C_X''(0)$ and $\text{skew}(X) = C_X'''(0)$. However, thereafter the CGF does not give central moments.

3.10 (i) MGF

We are given that $X \sim Gamma\left(3, \tfrac{1}{4}\right)$. From the Tables:

$$M_X(t) = \left(1 - \frac{t}{\lambda}\right)^{-\alpha} = (1 - 4t)^{-3}$$
[1]

Let $Y = \tfrac{1}{2}X$. Then:

$$M_Y(t) = E\left[e^{tY}\right] = E\left[e^{\frac{1}{2}tX}\right] = M_X\left(\tfrac{1}{2}t\right) = \left(1 - 4\left(\tfrac{1}{2}t\right)\right)^{-3} = (1 - 2t)^{-3}$$
[1]

So the moment generating function of $Y = \tfrac{1}{2}X$ is $(1 - 2t)^{-3}$.

By comparing this with the MGF of the gamma distribution in the Tables, we see that this is the same as the MGF of a $Gamma(3, 1/2)$ distribution. Looking also at the definition of the chi-square distribution, we see that $Gamma(3, 1/2)$ is the definition of a chi-square distribution with 6 degrees of freedom.

By the uniqueness property of moment generating functions, therefore, we have shown that $\tfrac{1}{2}X \sim \chi^2_6$. [1]

(ii) The probability that a claim exceeds 20,000

We are given that $X \sim Gamma\left(3, \tfrac{1}{4}\right)$, where $X$ is the claim amount in units of 1,000.

We also know that $2\lambda X \sim \chi^2_{2\alpha}$, ie $\tfrac{1}{2}X \sim \chi^2_6$.

Therefore, using the Tables:

$$P(X > 20) = P\left(\tfrac{1}{2}X > 10\right) = P(\chi^2_6 > 10) = 1 - 0.8753 = 0.1247$$
[2]

CS1-04: Joint distributions

Syllabus objectives

1.2 Independence, joint and conditional distributions, linear combinations of random variables

1.2.1 Explain what is meant by jointly distributed random variables, marginal distributions and conditional distributions.

1.2.2 Define the probability function/density function of a marginal distribution and of a conditional distribution.

1.2.3 Specify the conditions under which random variables are independent.
1.2.4 Define the expected value of a function of two jointly distributed random variables, the covariance and correlation coefficient between two variables, and calculate such quantities.

1.2.5 Define the probability function/density function of the sum of two independent random variables as the convolution of two functions.

1.2.6 Derive the mean and variance of linear combinations of random variables.

1.2.7 Use generating functions to establish the distribution of linear combinations of independent random variables.

0 Introduction

As yet, we have only considered situations involving one random variable. In this chapter we will look at some general results involving two or more random variables.

This chapter is quite long, and contains a large amount of material. It may therefore be helpful to notice the parallels with the single random variable notation, in order to aid understanding of the overall structure of the chapter.

Firstly we will define a joint probability (density) function $P(X = x, Y = y)$ or $f(x,y)$. We will see how we can obtain a marginal distribution, ie $P(X = x)$ or $f(x)$, from the joint distribution. Then we will look at conditional distributions $P(X = x \mid Y = y)$ or $f(x \mid y)$. The study of conditional distributions continues in the next chapter. It might be worth studying the next chapter with this one as the material in the two chapters is quite closely linked.

Given a distribution involving two random variables, we will explain how to work out the mean and variance of each random variable, and the covariance of the two random variables. We will also define the correlation coefficient. This work will be continued in a later chapter, where we will attempt to estimate what the correlation is from a sample.

Finally, we will extend our work on MGFs from the previous chapter to combine distributions together.
This will give us easier ways of obtaining results for the binomial, negative binomial and gamma distributions, amongst others.

1 Joint distributions

1.1 Joint probability (density) functions

Defining several random variables simultaneously on a sample space gives rise to a multivariate distribution. In the case of just two variables, it is a bivariate distribution.

Discrete case

To illustrate this for a pair of discrete variables, $X$ and $Y$, the probabilities associated with the various values of $(x,y)$ are as follows:

           x = 1    x = 2    x = 3
  y = 1    0.10     0.10     0.05
  y = 2    0.15     0.10     0.05
  y = 3    0.20     0.05     -
  y = 4    0.15     0.05     -

So, for example, $P(X = 3, Y = 1) = 0.05$, and $P(X = 1, Y = 3) = 0.20$.

The function $f(x,y) = P(X = x, Y = y)$ for all values of $(x,y)$ is the (joint/bivariate) probability function of $(X,Y)$: it specifies how the total probability of 1 is divided up amongst the possible values of $(x,y)$ and so gives the (joint/bivariate) probability distribution of $(X,Y)$.

The requirements for a function to qualify as the probability function of a pair of discrete random variables are:

$f(x,y) \ge 0$ for all values of $x$ and $y$ in the domain

$\sum_x \sum_y f(x,y) = 1$

This parallels earlier results, where the probability function was $P(X = x)$. We saw that $P(X = x) \ge 0$ for all values of $x$ and $\sum_x P(X = x) = 1$.

For example, consider the discrete random variables $M$ and $N$ with joint probability function:

$$P(M = m, N = n) = \frac{m}{35 \times 2^{n-2}}, \quad \text{where } m = 1, 2, 3, 4 \text{ and } n = 1, 2, 3$$

Let's draw up a table showing the values of the joint probability function for $M$ and $N$.

Starting with the smallest possible values of $M$ and $N$:

$$P(M = 1, N = 1) = \frac{1}{35 \times 2^{1-2}} = \frac{2}{35}$$

Calculating the joint probability for all combinations of $M$ and $N$, we get the table shown below.
           M = 1    M = 2    M = 3    M = 4
  N = 1    2/35     4/35     6/35     8/35
  N = 2    1/35     2/35     3/35     4/35
  N = 3    1/70     1/35     3/70     2/35

Question

Use the table of probabilities given above to calculate:

(i) $P(M = 3, N = 1 \text{ or } 2)$

(ii) $P(N = 3)$

(iii) $P(M = 2 \mid N = 3)$.

Solution

(i) Since the events $N = 1$ and $N = 2$ are mutually exclusive, we have:

$$P(M = 3, N = 1 \text{ or } 2) = P(M = 3, N = 1) + P(M = 3, N = 2) = \frac{6}{35} + \frac{3}{35} = \frac{9}{35}$$

(ii) We require $P(N = 3)$, and since this does not depend on the value of $M$ it is the same as finding $P(N = 3, M = 1, 2, 3 \text{ or } 4)$, ie we are summing over all possible values of $M$:

$$P(N = 3) = \frac{1}{70} + \frac{1}{35} + \frac{3}{70} + \frac{2}{35} = \frac{1}{7}$$

(iii) Using the formula for conditional probability, $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$, gives:

$$P(M = 2 \mid N = 3) = \frac{P(M = 2, N = 3)}{P(N = 3)} = \frac{1/35}{1/7} = \frac{1}{5}$$

Continuous case

In the case of a pair of continuous variables, the distribution of probability over a specified area in the $(x,y)$ plane is given by the (joint) probability density function $f(x,y)$. The probability that the pair $(X,Y)$ takes values in some specified region $A$ is obtained by integrating $f(x,y)$ over $A$: this integral is a double integral.

Thus:

$$P(x_1 < X < x_2,\; y_1 < Y < y_2) = \int_{y_1}^{y_2}\int_{x_1}^{x_2} f(x,y)\,dx\,dy$$

The joint distribution function $F(x,y)$ is defined by:

$$F(x,y) = P(X \le x, Y \le y)$$

and it is related to the joint density function by:

$$f(x,y) = \frac{\partial^2}{\partial x\,\partial y}F(x,y)$$

The conditions for a function to qualify as a joint probability density function of a pair of continuous random variables are:

$f(x,y) \ge 0$ for all values of $x$ and $y$ in the domain

$\int_x \int_y f(x,y)\,dx\,dy = 1$

These results parallel those for a single random variable, where the probability density function was $f(x)$. We saw that $f(x) \ge 0$ for all values of $x$ and $\int_x f(x)\,dx = 1$. Recall also that probabilities were calculated using the formula $P(a < X < b) = \int_{x=a}^{b} f(x)\,dx$.

The next question involves the use of double integrals.
Question

The continuous random variables $U$ and $V$ have joint probability density function:

$$f_{U,V}(u,v) = \frac{2u + v}{3{,}000}, \quad \text{where } 10 < u < 20 \text{ and } -5 < v < 5$$

Calculate $P(10 < U < 15, V > 0)$.

Solution

From the formula for the joint probability function:

$$P(10 < U < 15, V > 0) = \int_{u=10}^{15}\int_{v=0}^{5} \frac{2u + v}{3{,}000}\,dv\,du$$

This can be integrated with respect to either $u$ or $v$ first. If we do $v$ first, we get:

$$\begin{aligned}
\int_{u=10}^{15}\int_{v=0}^{5} \frac{2u + v}{3{,}000}\,dv\,du &= \int_{u=10}^{15} \frac{1}{3{,}000}\left[2uv + \tfrac{1}{2}v^2\right]_{v=0}^{5}du \\
&= \int_{u=10}^{15} \frac{10u + 12.5}{3{,}000}\,du \\
&= \left[\frac{5u^2 + 12.5u}{3{,}000}\right]_{10}^{15} = 0.229
\end{aligned}$$

If we integrate first with respect to $u$ and then with respect to $v$, we obtain the same answer as before.

1.2 Marginal probability (density) functions

Discrete case

The marginal distribution of a discrete random variable $X$ is defined to be:

$$f_X(x) = \sum_y f(x,y)$$

This is the distribution of $X$ alone without considering the values that $Y$ can take.

This is what we were doing in the first question in this chapter when we calculated the probability that $N = 3$. If we want the marginal distribution for $X$, we sum over all the values that $Y$ can take.

Let $X$ and $Y$ have the joint probability function given in the Core Reading at the start of this section:

           x = 1    x = 2    x = 3
  y = 1    0.10     0.10     0.05
  y = 2    0.15     0.10     0.05
  y = 3    0.20     0.05     0
  y = 4    0.15     0.05     0

Let's find the marginal probability distribution of $X$.

The marginal probabilities are:

$$P(X = 1) = 0.1 + 0.15 + 0.2 + 0.15 = 0.6$$

$$P(X = 2) = 0.1 + 0.1 + 0.05 + 0.05 = 0.3$$

$$P(X = 3) = 0.05 + 0.05 = 0.1$$

So the probability distribution of $X$ is:

  x           1      2      3
  P(X = x)    0.6    0.3    0.1

We are just adding up the numbers in each column. For the marginal distribution of $Y$ we would calculate the row totals.

We can also do this if we are given the joint distribution in the form of a function.
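The column and row totals described above can be reproduced with a short sketch. This is only an illustration: the dictionary layout and the helper name `marginal` are assumptions made here, not part of the course material, and exact fractions are used to avoid rounding in the totals.

```python
from fractions import Fraction

# Joint probabilities P(X = x, Y = y) from the table above, keyed by (x, y).
# Decimal strings are converted to exact fractions.
joint = {
    (1, 1): "0.10", (2, 1): "0.10", (3, 1): "0.05",
    (1, 2): "0.15", (2, 2): "0.10", (3, 2): "0.05",
    (1, 3): "0.20", (2, 3): "0.05", (3, 3): "0",
    (1, 4): "0.15", (2, 4): "0.05", (3, 4): "0",
}
joint = {xy: Fraction(p) for xy, p in joint.items()}

def marginal(joint_probs, axis):
    """Sum the joint probabilities over the other variable.

    axis=0 gives the marginal of X (column totals),
    axis=1 gives the marginal of Y (row totals).
    """
    totals = {}
    for key, p in joint_probs.items():
        totals[key[axis]] = totals.get(key[axis], Fraction(0)) + p
    return totals

marginal_x = marginal(joint, 0)
marginal_y = marginal(joint, 1)

for x, p in sorted(marginal_x.items()):
    print(f"P(X = {x}) = {float(p)}")
for y, p in sorted(marginal_y.items()):
    print(f"P(Y = {y}) = {float(p)}")
```

The totals for $X$ match the values 0.6, 0.3 and 0.1 found above, and each set of marginal probabilities sums to 1.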
Question

Obtain the probability functions for the marginal distributions of $M$ and $N$, where:

$$P(M = m, N = n) = \frac{m}{35 \times 2^{n-2}}, \quad \text{for } m = 1, 2, 3, 4 \text{ and } n = 1, 2, 3$$

Solution

Summing over the values of $N$ gives:

$$P(M = m) = \sum_{n=1}^{3} P(M = m, N = n) = \sum_{n=1}^{3} \frac{m}{35 \times 2^{n-2}} = \frac{m}{35}\left(2 + 1 + \frac{1}{2}\right) = \frac{m}{10}$$

Summing over the values of $M$ gives:

$$P(N = n) = \sum_{m=1}^{4} P(M = m, N = n) = \sum_{m=1}^{4} \frac{m}{35 \times 2^{n-2}} = \frac{1 + 2 + 3 + 4}{35 \times 2^{n-2}} = \frac{1}{7 \times 2^{n-3}}$$

Continuous case

In the case of continuous variables the marginal probability density function (PDF) of $X$, $f_X(x)$, is obtained by integrating over $y$ (for the given value of $x$) the joint PDF $f(x,y)$.

This means that $f_X(x) = \int_y f(x,y)\,dy$.

The resulting $f_X(x)$ is a proper PDF: it integrates to 1. Similarly for $f_Y(y)$, we obtain this by integrating over $x$ (for the given value of $y$).

In some cases the region of definition of $(X,Y)$ may be such that the limits of integration for one variable will involve the other variable.

We will see an example like this in Section 1.4.

Question

Determine the marginal probability density functions for $U$ and $V$, where:

$$f_{U,V}(u,v) = \frac{2u + v}{3{,}000}, \quad \text{for } 10 < u < 20 \text{ and } -5 < v < 5$$

Solution

To obtain the PDF of the marginal distribution of $U$, we integrate out $V$:

$$f_U(u) = \int_{v=-5}^{5} \frac{2u + v}{3{,}000}\,dv = \frac{1}{3{,}000}\left[2uv + \tfrac{1}{2}v^2\right]_{v=-5}^{5} = \frac{(10u + 12.5) - (-10u + 12.5)}{3{,}000} = \frac{u}{150}$$

Therefore the marginal distribution of $U$ is $f_U(u) = \dfrac{u}{150}$, $10 < u < 20$.

Similarly for $V$, we integrate out $U$:

$$f_V(v) = \int_{u=10}^{20} \frac{2u + v}{3{,}000}\,du = \frac{1}{3{,}000}\left[u^2 + uv\right]_{u=10}^{20} = \frac{(400 + 20v) - (100 + 10v)}{3{,}000} = \frac{30 + v}{300}$$

Therefore the marginal distribution of $V$ is $f_V(v) = \dfrac{30 + v}{300}$, $-5 < v < 5$.

To check that these functions are PDFs, we can integrate them over the appropriate range. The answers should both be 1.
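The check suggested above can also be carried out numerically. The sketch below uses a simple midpoint rule, which is an illustrative choice rather than a method from the course; the helper name `midpoint_integral` and the number of steps are assumptions.

```python
def midpoint_integral(f, lower, upper, steps=1000):
    """Approximate the integral of f over [lower, upper] by the midpoint rule."""
    width = (upper - lower) / steps
    return sum(f(lower + (i + 0.5) * width) for i in range(steps)) * width

def f_u(u):
    # Marginal PDF of U from the solution above, valid for 10 < u < 20
    return u / 150

def f_v(v):
    # Marginal PDF of V from the solution above, valid for -5 < v < 5
    return (30 + v) / 300

total_u = midpoint_integral(f_u, 10, 20)
total_v = midpoint_integral(f_v, -5, 5)

print(round(total_u, 6), round(total_v, 6))
```

Both totals should be 1 up to floating-point rounding, since the midpoint rule has no truncation error for linear integrands such as these.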
1.3 Conditional probability (density) functions

The distribution of $X$ for a particular value of $Y$ is called the conditional distribution of $X$ given $y$.

Discrete case

The probability function $P_{X|Y=y}(x \mid y)$ for the conditional distribution of $X$ given $Y = y$ for discrete random variables $X$ and $Y$ is:

$$P_{X|Y=y}(x \mid y) = P(X = x \mid Y = y) = \frac{P_{X,Y}(x,y)}{P_Y(y)}$$

for all values $x$ in the range of $X$.

This is what we were doing earlier when we calculated $P(M = 2 \mid N = 3)$ in a previous example.

Question

A bivariate distribution has the following probability function:

           X = 0    X = 1    X = 2
  Y = 1    0.1      0.1      0
  Y = 2    0.1      0.1      0.2
  Y = 3    0.2      0.1      0.1

Determine:

(i) the marginal distribution of $X$

(ii) the conditional distribution of $X \mid Y = 2$.

Solution

(i) The marginal distribution of $X$ can be found by summing the columns in the table:

$$P(X = 0) = 0.4, \qquad P(X = 1) = 0.3, \qquad P(X = 2) = 0.3$$

(ii) Using the definition of conditional probability:

$$P(X = 0 \mid Y = 2) = \frac{P(X = 0, Y = 2)}{P(Y = 2)} = \frac{0.1}{0.4} = 0.25$$

$$P(X = 1 \mid Y = 2) = \frac{P(X = 1, Y = 2)}{P(Y = 2)} = \frac{0.1}{0.4} = 0.25$$

$$P(X = 2 \mid Y = 2) = \frac{P(X = 2, Y = 2)}{P(Y = 2)} = \frac{0.2}{0.4} = 0.5$$

Alternatively, we could scale up the probabilities in the second row so that they add to one, eg $P(X = 0 \mid Y = 2) = \dfrac{0.1}{0.1 + 0.1 + 0.2} = \dfrac{0.1}{0.4} = 0.25$.

Continuous case

The probability density function $f_{X|Y=y}(x \mid y)$ for the conditional distribution of $X$ given $Y = y$ for the continuous variables $X$ and $Y$ is a function such that:

$$\int_{x_1}^{x_2} f_{X|Y=y}(x \mid y)\,dx = P(x_1 < X < x_2 \mid Y = y)$$

for all values $x$ in the range of $X$.

This conditional distribution, in both instances, is only defined for those values of $y$ for which $f_Y(y) > 0$.

We calculate the form of the conditional PDF similarly to the method we used in the discrete case, ie we divide the joint PDF by the marginal PDF. So:

$$f_{X|Y=y}(x \mid y) = \frac{f_{X,Y}(x,y)}{f_Y(y)}$$

Question

Let $X$ and $Y$ have joint density function:

$$f(x,y) = \frac{1}{16}(x + 3y), \quad 0 < x < 2,\; 0 < y < 2$$

Determine the conditional density function of $X$ given $Y = y$.
Solution

The marginal PDF of Y is:

$$f_Y(y) = \int_{x=0}^{2} \tfrac{1}{16}(x + 3y)\,dx = \tfrac{1}{16}\left[\tfrac{1}{2}x^2 + 3xy\right]_{x=0}^{2} = \tfrac{1}{16}(2 + 6y)$$

So:

$$f_{X|Y=y}(x, y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} = \frac{\tfrac{1}{16}(x + 3y)}{\tfrac{1}{16}(2 + 6y)} = \frac{x + 3y}{2(1 + 3y)}, \quad 0 < x < 2$$

1.4 Distributions defined on more complex domains

For all the distributions we have seen so far, the limits on both the x and the y integrals have been numbers. So, in the previous example, the joint PDF is defined over the rectangle whose vertices are at the points (0, 0), (0, 2), (2, 0), and (2, 2).

It is possible for a joint distribution to be defined over a non-rectangular area. In these cases, the limits for y may be dependent on x, or vice versa. Care needs to be taken in these cases to ensure that the correct limits are used when integrating.

Question

Let X and Y have joint density function:

$$f(x, y) = k(x^2 + xy), \quad 0 < y < x < 2$$

(i) Calculate the value of k.

(ii) Determine the PDFs of the marginal distributions of X and Y.

(iii) Determine the conditional density function of $Y \mid X = x$.

Solution

(i) Let's integrate with respect to x first, then y. We see from the inequality $0 < y < x < 2$ that, when x is considered as a variable, the limits for x will be from $x = y$ to $x = 2$. Once x has been integrated out, the limits for y will be from $y = 0$ to $y = 2$.

So, integrating first with respect to x:

$$\int_{x=y}^{2} k(x^2 + xy)\,dx = k\left[\tfrac{1}{3}x^3 + \tfrac{1}{2}x^2 y\right]_{x=y}^{2} = k\left[\left(\tfrac{8}{3} + 2y\right) - \left(\tfrac{1}{3}y^3 + \tfrac{1}{2}y^3\right)\right] = k\left(\tfrac{8}{3} + 2y - \tfrac{5}{6}y^3\right)$$

We now integrate this expression with respect to y, using the limits 0 and 2:

$$\int_{0}^{2} k\left(\tfrac{8}{3} + 2y - \tfrac{5}{6}y^3\right)dy = k\left[\tfrac{8}{3}y + y^2 - \tfrac{5}{24}y^4\right]_{0}^{2} = k\left(\tfrac{16}{3} + 4 - \tfrac{10}{3}\right) = 6k$$

Since this must be equal to 1, we see that $k = \tfrac{1}{6}$.

(ii) By integrating first with respect to x, and setting $k = 1/6$, we have already obtained the marginal distribution for Y:

$$f_Y(y) = \tfrac{1}{6}\left(\tfrac{8}{3} + 2y - \tfrac{5}{6}y^3\right), \quad 0 < y < 2$$

To obtain the
marginal distribution for X, we must integrate first with respect to y. We see from the inequality given in the question that the limits for y are now 0 and x. So:

$$f_X(x) = \int_{0}^{x} \tfrac{1}{6}(x^2 + xy)\,dy = \tfrac{1}{6}\left[x^2 y + \tfrac{1}{2}xy^2\right]_{0}^{x} = \tfrac{1}{6}\left(x^3 + \tfrac{1}{2}x^3\right) = \tfrac{1}{4}x^3, \quad 0 < x < 2$$

(iii) The PDF of $Y \mid X = x$ is obtained by dividing the joint PDF by the marginal PDF for X:

$$f_{Y|X=x}(x, y) = \frac{f_{X,Y}(x, y)}{f_X(x)} = \frac{\tfrac{1}{6}(x^2 + xy)}{\tfrac{1}{4}x^3} = \frac{2}{3}\left(\frac{1}{x} + \frac{y}{x^2}\right), \quad 0 < y < x < 2$$

1.5 Independence of random variables

Consider a pair of variables $(X, Y)$, and suppose that the conditional distribution of Y given $X = x$ does not actually depend on x at all. It follows that the probability function/PDF $f(y \mid x)$ must be simply that of the marginal distribution of Y, $f_Y(y)$.

Here $f(y \mid x)$ is an abbreviation for $f_{Y|X=x}(x, y)$.

So, if conditional is equivalent to marginal, then:

$$f_Y(y) = f_{Y|X=x}(x, y) = \frac{f_{X,Y}(x, y)}{f_X(x)} \quad \text{ie} \quad f_{X,Y}(x, y) = f_X(x)\,f_Y(y)$$

So the joint PF/PDF is the product of the marginals.

This motivates the definition, which is given here for two variables.

Definition

The random variables X and Y are independent if, and only if, the joint probability function/PDF is the product of the two marginal probability functions/PDFs for all $(x, y)$ in the range of the variables, ie:

$$f_{X,Y}(x, y) = f_X(x)\,f_Y(y) \quad \text{for all } (x, y) \text{ in the range}$$

Discrete case

It follows that probability statements about values assumed by $(X, Y)$ can be broken down into statements about X and Y separately. So if X and Y are independent discrete variables then:

$$P(X = x, Y = y) = P(X = x)\,P(Y = y)$$

Question

Determine whether the variables X and Y given below are independent.

            X
  Y      0      1      2
  1     0.1    0.1    0
  2     0.1    0.1    0.2
  3     0.2    0.1    0.1

Solution

They are not independent. For example, we saw earlier that the conditional distribution of X given $Y = 2$ is not the same as the marginal distribution of X.

Alternatively, we can see that, for example, $P(X = 0, Y = 1) = 0.1$. However, $P(X = 0) = 0.4$ and $P(Y = 1) = 0.2$.
Since $P(X = 0)\,P(Y = 1) = 0.4 \times 0.2 = 0.08 \neq 0.1 = P(X = 0, Y = 1)$, the two random variables are not independent.

To show that the random variables are not independent, we only need to show that the joint probability is not equal to the product of the marginal probabilities in any one particular case. If we wish to show that they are independent, we need to show that the multiplication works for all possible values of x and y.

As a quick check in the discrete case, note that, for independence, the probabilities in each row in the table must be in the same ratios as the probabilities in every other row (and similarly for the columns). This is not the case here.

Question

The joint probability function of M and N is:

$$P(M = m, N = n) = \frac{m}{35 \times 2^{n-2}}, \quad \text{where } m = 1, 2, 3, 4 \text{ and } n = 1, 2, 3$$

Determine whether the variables M and N are independent.

Solution

They are independent. We saw earlier that $P(M = m) = \dfrac{m}{10}$ and $P(N = n) = \dfrac{1}{7 \times 2^{n-3}}$. Hence the joint probability distribution is the product of the two marginal distributions. So the variables are independent.

Continuous case

If X and Y are continuous, the double integral required to evaluate a joint probability splits into the product of two separate integrals, one for X and one for Y, and we have:

$$P(x_1 < X < x_2, \; y_1 < Y < y_2) = P(x_1 < X < x_2)\,P(y_1 < Y < y_2)$$

This means that in the continuous case, if the two random variables are independent, we can factorise the joint PDF into two separate expressions, one of which will be a function of x only, and the other will be a function of y only.

Question

The joint PDF of U and V is:

$$f_{U,V}(u, v) = \frac{2u + v}{3{,}000}, \quad \text{where } 10 < u < 20 \text{ and } -5 < v < 5$$

Determine whether the variables U and V are independent.

Solution

They are not independent. If they were it would be possible to factorise $\dfrac{2u + v}{3{,}000}$ into two functions of the form $g(u)\,h(v)$.
As this is not possible, the random variables are not independent.

Functions of random variables

If the random variables X and Y are independent, then any functions $g(X)$ and $h(Y)$ are also independent.

This should be intuitively obvious if we think of independence as meaning that the quantities have no influence on each other.

Several variables

When considering three or more variables, the definition of independence involves the factorisation of the joint probability function into the product of all the individual marginal probability functions. For X, Y, and Z to be independent it is not sufficient that they are independent taken two at a time (pairwise independent).

Question

Consider the joint probability density function of X, Y and Z given by:

$$f(x, y, z) = \begin{cases} (x + y)e^{-z} & \text{for } 0 < x < 1, \; 0 < y < 1, \; z > 0 \\ 0 & \text{otherwise} \end{cases}$$

Verify that the random variables X, Y and Z are not independent, but that the two random variables X and Z are pairwise independent, and also that the two random variables Y and Z are pairwise independent.

Solution

The joint density function of X and Z is:

$$f_{X,Z}(x, z) = \int_{0}^{1} (x + y)e^{-z}\,dy = e^{-z}\left[xy + \tfrac{1}{2}y^2\right]_{0}^{1} = e^{-z}\left(x + \tfrac{1}{2}\right)$$

The joint density function of Y and Z is:

$$f_{Y,Z}(y, z) = \int_{0}^{1} (x + y)e^{-z}\,dx = e^{-z}\left[\tfrac{1}{2}x^2 + yx\right]_{0}^{1} = e^{-z}\left(y + \tfrac{1}{2}\right)$$

The marginal density functions are:

$$f_X(x) = \int_{0}^{\infty} e^{-z}\left(x + \tfrac{1}{2}\right)dz = \left[-e^{-z}\left(x + \tfrac{1}{2}\right)\right]_{0}^{\infty} = x + \tfrac{1}{2}$$

$$f_Y(y) = \int_{0}^{\infty} e^{-z}\left(y + \tfrac{1}{2}\right)dz = \left[-e^{-z}\left(y + \tfrac{1}{2}\right)\right]_{0}^{\infty} = y + \tfrac{1}{2}$$

$$f_Z(z) = \int_{0}^{1} e^{-z}\left(x + \tfrac{1}{2}\right)dx = e^{-z}\left[\tfrac{1}{2}x^2 + \tfrac{1}{2}x\right]_{0}^{1} = e^{-z}$$

If we multiply together the marginal PDFs for X, Y and Z, we obtain:

$$f_X(x)\,f_Y(y)\,f_Z(z) = \left(x + \tfrac{1}{2}\right)\left(y + \tfrac{1}{2}\right)e^{-z}$$

Comparing this with the joint PDF $f(x, y, z) = (x + y)e^{-z}$, we see that they are not the same. So X, Y and Z are not independent.

However the product of the marginal distribution PDFs for X and Z, and for Y and Z, do give the respective joint PDFs.
So X and Z are pairwise independent, and Y and Z are also pairwise independent.

2 Expectations of functions of two variables

2.1 Expectations

The expression for the expected value of a function $g(X, Y)$ of the random variables $(X, Y)$ is found by summing (discrete case) or integrating (continuous case) the product:

value × probability of assuming that value

over all values (or combinations of) $(x, y)$.

The summation is a double summation, the integral a double integral.

Thus for discrete variables:

$$E[g(X, Y)] = \sum_x \sum_y g(x, y)\,P_{X,Y}(x, y) = \sum_x \sum_y g(x, y)\,P(X = x, Y = y)$$

where the summation is over all possible values of x and y.

This result parallels that for single random variables, where the expected value of a function of a discrete random variable was defined to be $E[g(X)] = \sum_x g(x)\,P(X = x)$.

Question

Calculate the expected value of $\dfrac{N + 1}{M}$, where the joint distribution of M and N is:

                  M
  N       1       2       3       4
  1     2/35    4/35    6/35    8/35
  2     1/35    2/35    3/35    4/35
  3     1/70    1/35    3/70    2/35

ie $P(M = m, N = n) = \dfrac{m}{35 \times 2^{n-2}}$.

Solution

From the table of values, working across from the top left gives:

$$E\left[\frac{N + 1}{M}\right] = \frac{2}{1} \times \frac{2}{35} + \frac{2}{2} \times \frac{4}{35} + \cdots + \frac{4}{3} \times \frac{3}{70} + \frac{4}{4} \times \frac{2}{35} = \frac{36}{35}$$

Alternatively, we could work from the formula:

$$P(M = m, N = n) = \frac{m}{35 \times 2^{n-2}}, \quad \text{where } m = 1, 2, 3, 4 \text{ and } n = 1, 2, 3$$

This gives:

$$E\left[\frac{N + 1}{M}\right] = \sum_{m=1}^{4} \sum_{n=1}^{3} \frac{n + 1}{m}\,P(M = m, N = n) = \sum_{m=1}^{4} \sum_{n=1}^{3} \frac{n + 1}{m} \times \frac{m}{35 \times 2^{n-2}} = \frac{1}{35}\sum_{m=1}^{4} \sum_{n=1}^{3} \frac{n + 1}{2^{n-2}}$$

$$= \frac{4}{35}\left(\frac{1 + 1}{2^{-1}} + \frac{2 + 1}{2^{0}} + \frac{3 + 1}{2^{1}}\right) = \frac{36}{35}$$

For continuous variables:

$$E[g(X, Y)] = \int_x \int_y g(x, y)\,f_{X,Y}(x, y)\,dy\,dx$$

where the integration is over all possible values of x and y.

This result parallels that for single random variables, where the expected value of a function of a continuous random variable was defined to be $E[g(X)] = \int_x g(x)\,f(x)\,dx$.

Question

U and V have joint density function:

$$f_{U,V}(u, v) = \frac{2u + v}{3{,}000}, \quad \text{where } 10 < u < 20 \text{ and } -5 < v < 5$$

(i) Calculate $E(U)$ and $E(V)$:

  (a) using $f_{U,V}(u, v)$

  (b) using $f_U(u)$ and $f_V(v)$.

(ii) Comment on your answers.

Solution

(i)(a) Integrating with respect to v first and then with respect to u, we obtain:

$$E(U) = \int_{u=10}^{20} \int_{v=-5}^{5} u\,\frac{2u + v}{3{,}000}\,dv\,du = \int_{u=10}^{20} \frac{1}{3{,}000}\left[2u^2 v + \tfrac{1}{2}uv^2\right]_{v=-5}^{5} du = \int_{10}^{20} \frac{u^2}{150}\,du = \left[\frac{u^3}{450}\right]_{10}^{20} = \frac{140}{9}$$

Similarly:

$$E(V) = \int_{u=10}^{20} \int_{v=-5}^{5} v\,\frac{2u + v}{3{,}000}\,dv\,du = \int_{u=10}^{20} \frac{1}{3{,}000}\left[uv^2 + \tfrac{1}{3}v^3\right]_{v=-5}^{5} du = \int_{10}^{20} \frac{1}{36}\,du = \left[\frac{u}{36}\right]_{10}^{20} = \frac{5}{18}$$

(i)(b) We have already found that $f_U(u) = \dfrac{u}{150}$ and $f_V(v) = \dfrac{30 + v}{300}$. Hence:

$$E(U) = \int_{10}^{20} u \times \frac{u}{150}\,du = \left[\frac{u^3}{450}\right]_{10}^{20} = \frac{140}{9}$$

$$E(V) = \int_{-5}^{5} v \times \frac{30 + v}{300}\,dv = \frac{1}{300}\left[15v^2 + \tfrac{1}{3}v^3\right]_{-5}^{5} = \frac{5}{18}$$

(ii) Both methods are equivalent.

2.2 Expectation of a sum

It follows that:

$$E[a\,g(X) + b\,h(Y)] = a\,E[g(X)] + b\,E[h(Y)]$$

where a and b are constants, so handling the expected value of a linear combination of functions is no more difficult than handling the expected values of the individual functions.

The definition of expected value and this last result (on the expected value of a sum of functions) extend to functions of more than 2 variables.

In particular $E[g(X) + h(Y)] = E[g(X)] + E[h(Y)]$ and these expected values will usually be easier to find from the respective marginal distributions, so there would be no need for double sums/integrals.
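The double-sum definition of Section 2.1 translates directly into a few lines of code. The sketch below is illustrative only (the names `joint` and `expectation` are assumptions, not taken from the course notes). It evaluates $E[(N+1)/M]$ from the joint probability function, and also computes $E[M]$ both as a double sum and from the marginal distribution $P(M = m) = m/10$.

```python
from fractions import Fraction

# Joint probability function P(M = m, N = n) = m / (35 * 2**(n - 2)).
def joint(m, n):
    return Fraction(m, 35) * Fraction(2) ** (2 - n)

# All (m, n) pairs in the range of the variables.
PAIRS = [(m, n) for m in range(1, 5) for n in range(1, 4)]

def expectation(g):
    """E[g(M, N)] as a double sum over the joint probability function."""
    return sum(g(m, n) * joint(m, n) for m, n in PAIRS)

# E[(N + 1) / M], the quantity calculated in the discrete question.
e_ratio = expectation(lambda m, n: Fraction(n + 1, m))

# E[M] two ways: from the joint distribution and from the marginal of M.
e_m_joint = expectation(lambda m, n: m)
e_m_marginal = sum(m * Fraction(m, 10) for m in range(1, 5))

print(e_ratio, e_m_joint, e_m_marginal)
```

Working with `Fraction` keeps the arithmetic exact, so the two routes to $E[M]$ can be compared with `==` rather than a tolerance.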
If we have the sum of a number of random variables:

$$S = X_1 + X_2 + \cdots + X_n$$

then, by extension of the result above:

$$E(S) = E(X_1) + E(X_2) + \cdots + E(X_n)$$

So the expectation of the sum is equal to the sum of the expectations. This is true whether or not the random variables $X_i$ are independent.

Question

Verify that $E[X^2 + 2Y] = E[X^2] + E[2Y]$, for the random variables X and Y given here:

            X
  Y      0      1      2
  1     0.1    0.1    0
  2     0.1    0.1    0.2
  3     0.2    0.1    0.1

Solution

Reading the values from the table, we have:

$$E[X^2 + 2Y] = (0^2 + 2 \times 1) \times 0.1 + (1^2 + 2 \times 1) \times 0.1 + \cdots + (2^2 + 2 \times 3) \times 0.1 = 5.9$$

Using the marginal distributions of X and Y:

$$E[X^2] = 0 \times 0.4 + 1 \times 0.3 + 4 \times 0.3 = 1.5$$

$$E[2Y] = 2 \times 0.2 + 4 \times 0.4 + 6 \times 0.4 = 4.4$$

Thus $E[X^2] + E[2Y] = 5.9$, and the result has been verified.

2.3 Expectation of a product

For independent random variables X and Y:

$$E[g(X)\,h(Y)] = E[g(X)]\,E[h(Y)]$$

since the joint density function factorises into the two marginal density functions.

This result is true only for INDEPENDENT random variables.

Question

The joint probability function of M and N is given by:

$$P(M = m, N = n) = \frac{m}{35 \times 2^{n-2}}, \quad \text{where } m = 1, 2, 3, 4 \text{ and } n = 1, 2, 3$$

                  M
  N       1       2       3       4
  1     2/35    4/35    6/35    8/35
  2     1/35    2/35    3/35    4/35
  3     1/70    1/35    3/70    2/35

Verify that $E\left[\dfrac{N + 1}{M}\right] = E\left[\dfrac{1}{M}\right] E[N + 1]$.

Solution

We have previously seen that $E\left[\dfrac{N + 1}{M}\right] = \dfrac{36}{35}$.

We can calculate $E\left[\dfrac{1}{M}\right]$ and $E[N + 1]$ using the marginal probability functions:

$$E\left[\frac{1}{M}\right] = \sum_{m=1}^{4} \frac{1}{m}\,P(M = m) = \sum_{m=1}^{4} \frac{1}{m} \times \frac{m}{10} = 4 \times \frac{1}{10} = \frac{2}{5}$$

$$E[N + 1] = \sum_{n=1}^{3} (n + 1)\,P(N = n) = 2 \times \frac{4}{7} + 3 \times \frac{2}{7} + 4 \times \frac{1}{7} = \frac{18}{7}$$

This gives $E\left[\dfrac{1}{M}\right] E[N + 1] = \dfrac{2}{5} \times \dfrac{18}{7} = \dfrac{36}{35}$, which verifies the result.

Note that $E\left[\dfrac{N + 1}{M}\right] \neq \dfrac{E(N + 1)}{E(M)}$.

The verified result should not be surprising, since we showed earlier that M and N are independent random variables.
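The contrast between the sum rule (always valid) and the product rule (valid for independent variables) can be seen numerically. The sketch below is an illustration under assumed names (`TABLE`, `expectation`); it uses the X, Y table from the Section 2.2 question, where X and Y are not independent.

```python
from fractions import Fraction

tenth = Fraction(1, 10)

# Joint table P(X = x, Y = y) from the Section 2.2 question; keys are (x, y).
TABLE = {
    (0, 1): 1 * tenth, (1, 1): 1 * tenth, (2, 1): 0 * tenth,
    (0, 2): 1 * tenth, (1, 2): 1 * tenth, (2, 2): 2 * tenth,
    (0, 3): 2 * tenth, (1, 3): 1 * tenth, (2, 3): 1 * tenth,
}

def expectation(g):
    """E[g(X, Y)] as a double sum over the joint table."""
    return sum(g(x, y) * p for (x, y), p in TABLE.items())

# Sum rule: holds whether or not X and Y are independent.
lhs_sum = expectation(lambda x, y: x**2 + 2 * y)
rhs_sum = expectation(lambda x, y: x**2) + expectation(lambda x, y: 2 * y)

# Product rule: need not hold here, because X and Y are dependent.
e_xy = expectation(lambda x, y: x * y)
e_x_times_e_y = expectation(lambda x, y: x) * expectation(lambda x, y: y)

print(lhs_sum, rhs_sum)
print(e_xy, e_x_times_e_y)
```

For this table the two sides of the sum rule agree exactly, while $E[XY]$ and $E[X]E[Y]$ differ slightly; that difference is the covariance introduced in Section 2.4.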
If we take the functions to be $g(X) = X$ and $h(Y) = Y$, these last two results give us some simple relationships between two random variables X and Y:

(a) $E[X + Y] = E[X] + E[Y]$

(b) if X and Y are independent, then $E[XY] = E[X]\,E[Y]$.

2.4 Covariance and correlation coefficient

The covariance $\text{cov}[X, Y]$ of two random variables X and Y is defined by:

$$\text{cov}[X, Y] = E[(X - E[X])(Y - E[Y])]$$

This simplifies to:

$$\text{cov}[X, Y] = E[XY] - E[X]\,E[Y]$$

Notice the similarity between the covariance defined here and the definition of the variance:

$$\text{var}(X) = E[(X - E(X))^2] = E(X^2) - [E(X)]^2$$

Question

Show that the simplification $\text{cov}[X, Y] = E[XY] - E[X]\,E[Y]$ is correct.

Solution

If we expand the definition of the covariance, we obtain:

$$\text{cov}(X, Y) = E[(X - E[X])(Y - E[Y])]$$

$$= E\big[XY - X\,E[Y] - Y\,E[X] + E[X]\,E[Y]\big]$$

$$= E[XY] - E[X]\,E[Y] - E[Y]\,E[X] + E[X]\,E[Y]$$

$$= E[XY] - E[X]\,E[Y]$$

It is often easier to use the formula $\text{cov}(X, Y) = E[XY] - E[X]\,E[Y]$ when calculating covariances, rather than using the formula given in the definition above.

If we rearrange this formula it tells us how to find $E[XY]$ for random variables that are not independent, ie $E[XY] = E[X]\,E[Y] + \text{cov}[X, Y]$. We will return to independence shortly.

Note: The units of $\text{cov}(X, Y)$ are the product of those of X and Y. So for example if X is a time in hours, and Y is a sum of money in £, then cov is in £ × hours. Note also that $\text{cov}[X, X] = \text{var}[X]$.

Question

Calculate the covariance of the random variables X and Y whose joint distribution is as follows:

            X
  Y      0      1      2
  1     0.1    0.1    0
  2     0.1    0.1    0.2
  3     0.2    0.1    0.1

Solution

We will use the formula $\text{cov}(X, Y) = E[XY] - E[X]\,E[Y]$.

From the table of values:

$$E[XY] = 0 \times 1 \times 0.1 + \cdots + 2 \times 3 \times 0.1 = 2$$

The (marginal) probability distribution of X is:

  x              0      1      2
  P(X = x)      0.4    0.3    0.3

So:

$$E[X] = 0 \times 0.4 + 1 \times 0.3 + 2 \times 0.3 = 0.9$$

The (marginal) probability distribution of Y is:

  y              1      2      3
  P(Y = y)      0.2    0.4    0.4

So:

$$E[Y] = 1 \times 0.2 + 2 \times 0.4 + 3 \times 0.4 = 2.2$$

Hence:

$$\text{cov}(X, Y) = 2 - 0.9 \times 2.2 = 0.02$$

Useful results on handling covariances

(a) $\text{cov}[aX + b, cY + d] = ac\,\text{cov}[X, Y]$

Proof:

$E[aX + b] = a\,E[X] + b$ and $E[cY + d] = c\,E[Y] + d$

so $aX + b - E[aX + b] = a(X - E[X])$ and $cY + d - E[cY + d] = c(Y - E[Y])$

$$\text{cov}[aX + b, cY + d] = E\big[a(X - E[X])\,c(Y - E[Y])\big] = ac\,\text{cov}[X, Y]$$

Note: The changes of origin (b and d) have no effect, because we are using deviations from means. The changes of scale (a and c) carry through.

This means that constants that are added or subtracted can be ignored and constants that are multiplied or divided are pulled out. Note the similarity between this result and that for the variance: $\text{var}[aX + b] = a^2\,\text{var}[X]$.

(b) $\text{cov}[X, Y + Z] = \text{cov}[X, Y] + \text{cov}[X, Z]$

Proof:

$E[X(Y + Z)] = E[XY] + E[XZ]$ and $E[Y + Z] = E[Y] + E[Z]$

$$\text{cov}[X, Y + Z] = E[XY] + E[XZ] - E[X]\big(E[Y] + E[Z]\big)$$

$$= E[XY] - E[X]\,E[Y] + E[XZ] - E[X]\,E[Z]$$

$$= \text{cov}[X, Y] + \text{cov}[X, Z]$$

These two results hold for any random variables X, Y and Z (whenever the covariances exist).

This result is just like multiplying out brackets using the distributive law: $x(y + z) = xy + xz$.

These results will be useful in Subject CS2.

Question

Write down the formula for $\text{cov}[X + Y, W + Z]$.

Solution

$$\text{cov}(X + Y, W + Z) = \text{cov}(X, W) + \text{cov}(X, Z) + \text{cov}(Y, W) + \text{cov}(Y, Z)$$

The next result concerns random variables that are independent.

(c) If X and Y are independent, $\text{cov}[X, Y] = 0$.

Proof:

$$\text{cov}[X, Y] = E[XY] - E[X]\,E[Y] = 0$$

The covariance of M and N used in earlier examples is zero as they are independent.
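The covariance results above can be checked with a small helper. This is a sketch under assumed names (`expectation`, `covariance`, `XY`, `MN`), not a prescribed method: it computes $\text{cov}(X, Y)$ for the table in the question, checks result (a) with the arbitrary constants $a = 3$, $b = 1$, $c = -2$, $d = 5$, and confirms that the independent pair M, N has zero covariance.

```python
from fractions import Fraction

def expectation(joint, g):
    """E[g(X, Y)] for a joint table mapping (x, y) -> probability."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

def covariance(joint, g, h):
    """cov[g, h] = E[gh] - E[g]E[h], where g and h are functions of (x, y)."""
    e_gh = expectation(joint, lambda x, y: g(x, y) * h(x, y))
    return e_gh - expectation(joint, g) * expectation(joint, h)

tenth = Fraction(1, 10)

# X, Y table from the covariance question; keys are (x, y).
XY = {
    (0, 1): 1 * tenth, (1, 1): 1 * tenth, (2, 1): 0 * tenth,
    (0, 2): 1 * tenth, (1, 2): 1 * tenth, (2, 2): 2 * tenth,
    (0, 3): 2 * tenth, (1, 3): 1 * tenth, (2, 3): 1 * tenth,
}

# M, N joint probability function used throughout; keys are (m, n).
MN = {(m, n): Fraction(m, 35) * Fraction(2) ** (2 - n)
      for m in range(1, 5) for n in range(1, 4)}

cov_xy = covariance(XY, lambda x, y: x, lambda x, y: y)

# Result (a): cov[aX + b, cY + d] = ac cov[X, Y], here with a = 3, c = -2.
cov_scaled = covariance(XY, lambda x, y: 3 * x + 1, lambda x, y: -2 * y + 5)

# Result (c): independent variables have zero covariance.
cov_mn = covariance(MN, lambda m, n: m, lambda m, n: n)

print(cov_xy, cov_scaled, cov_mn)
```

Passing `g` and `h` as functions of the pair keeps one helper usable for plain covariances, for linear transformations, and for sums such as $\text{cov}[X, Y + Z]$ if a third variable were added to the table.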
The result $E[XY] = E[X]\,E[Y]$ extends to the expected value of the product of any finite number of independent variables, ie $E[X_1 X_2 \cdots X_n] = E[X_1]\,E[X_2] \cdots E[X_n]$.

The covariance between X and Y is a measure of the strength of the linear association or linear relationship between the variables. However it suffers from the fact that its value is dependent on the units of measurement of the variables. A related quantity to the covariance is the correlation coefficient, which is a dimensionless quantity (ie it has no units).

The correlation coefficient (written as $\text{corr}(X, Y)$ or $\rho(X, Y)$) of two random variables X and Y is defined by:

$$\text{corr}(X, Y) = \frac{\text{cov}(X, Y)}{\sqrt{\text{var}(X)\,\text{var}(Y)}}$$

Question

Calculate the correlation coefficient of U and V, where:

$$f_{U,V}(u, v) = \frac{2u + v}{3{,}000}, \quad \text{where } 10 < u < 20 \text{ and } -5 < v < 5$$

You are given that $E(U) = \dfrac{140}{9}$ and $E(V) = \dfrac{5}{18}$.

Solution

First we need $E[UV]$:

$$E[UV] = \int_{u=10}^{20} \int_{v=-5}^{5} uv\,\frac{2u + v}{3{,}000}\,dv\,du$$

Integrating first with respect to u:

$$E[UV] = \int_{v=-5}^{5} \frac{1}{3{,}000}\left[\tfrac{2}{3}u^3 v + \tfrac{1}{2}u^2 v^2\right]_{u=10}^{20} dv = \int_{v=-5}^{5} \left(\frac{14v}{9} + \frac{v^2}{20}\right)dv = \left[\frac{14v^2}{18} + \frac{v^3}{60}\right]_{-5}^{5} = \frac{25}{6}$$

So the covariance of U and V is:

$$\text{cov}(U, V) = \frac{25}{6} - \frac{140}{9} \times \frac{5}{18} = -\frac{25}{162}$$

We now need the variance of U and V:

$$\text{var}(U) = \int_{10}^{20} u^2\,\frac{u}{150}\,du - \left(\frac{140}{9}\right)^2 = \left[\frac{u^4}{600}\right]_{10}^{20} - \left(\frac{140}{9}\right)^2 = \frac{650}{81}$$

Similarly:

$$\text{var}(V) = \int_{-5}^{5} v^2\,\frac{30 + v}{300}\,dv - \left(\frac{5}{18}\right)^2 = \left[\frac{v^3}{30} + \frac{v^4}{1{,}200}\right]_{-5}^{5} - \left(\frac{5}{18}\right)^2 = \frac{2{,}675}{324}$$

So the correlation coefficient is:

$$\text{corr}(U, V) = \frac{-\frac{25}{162}}{\sqrt{\frac{650}{81} \times \frac{2{,}675}{324}}} = -\frac{1}{\sqrt{2{,}782}} = -0.019$$

The correlation coefficient takes a value in the range $-1 \leq \rho \leq 1$. It reflects the degree of association between the two variables.

We can use this range to do a reasonableness check for any numerical answer. Any figure outside this range is automatically wrong.

A value for $\rho$ of $\pm 1$ indicates that the variables have perfect linear correlation. What this means is that one variable is actually a linear function of the other (with probability 1).

If $\rho = 0$, the random variables are said to be uncorrelated. Independent variables are uncorrelated (but not all uncorrelated variables are independent).

This means that the converse of Result (c) given previously is not true. If X and Y are independent, their covariance is equal to zero. However, if the covariance of X and Y is zero, this does not necessarily mean that they are independent. In simple terms, independent means that probabilities factorise, and uncorrelated means that expectations factorise.

Question

A bivariate distribution has the following probability function:

              P
  Q      -1      0      1
  -1     0.1    0.6    0.1
   1     0.1    0      0.1

Show that P and Q are uncorrelated but not independent.

Solution

We have:

$$E[P] = 0, \quad E[Q] = -0.6 \quad \text{and} \quad E[PQ] = 0$$

so the covariance is zero.

However the conditional probability distribution of, say, $P \mid Q = 1$ takes the values $-1$ and 1 each with probability 0.5, whereas the marginal distribution of P takes the values $-1$, 0 and 1 with probabilities 0.2, 0.6 and 0.2 respectively. So the marginal distributions are different from the conditional distributions, and P and Q are not independent.

Alternatively, we could compare, for example, $P(P = -1, Q = 1)$ with $P(P = -1)\,P(Q = 1)$.

This confirms the sentence given in the Core Reading previously. This is an example of random variables that are uncorrelated, but not independent.

2.5 Variance of a sum

For any random variables X and Y:

$$\text{var}[X + Y] = \text{var}[X] + \text{var}[Y] + 2\,\text{cov}[X, Y]$$

This can be proved from the definitions of variance and covariance.

One possible proof is as follows:

$$\text{var}[X + Y] = E\big[(X + Y - E[X + Y])^2\big]$$

$$= E\big[\big((X - E[X]) + (Y - E[Y])\big)^2\big]$$

$$= E\big[(X - E[X])^2\big] + E\big[(Y - E[Y])^2\big] + 2E\big[(X - E[X])(Y - E[Y])\big]$$

$$= \text{var}[X] + \text{var}[Y] + 2\,\text{cov}(X, Y)$$

Question

Set out an alternative proof of the above result starting from $\text{var}(X + Y) = \text{cov}(X + Y, X + Y)$.

Solution

$$\text{var}(X + Y) = \text{cov}(X + Y, X + Y)$$

$$= \text{cov}(X, X) + \text{cov}(X, Y) + \text{cov}(Y, X) + \text{cov}(Y, Y)$$

$$= \text{var}(X) + 2\,\text{cov}(X, Y) + \text{var}(Y)$$

For independent random variables, this can be simplified:

$$\text{var}[X + Y] = \text{var}[X] + \text{var}[Y]$$

since $\text{cov}[X, Y] = 0$.

Question

Show from first principles that the random variables M, N, whose joint probability function is given below, satisfy $\text{var}[M + N] = \text{var}[M] + \text{var}[N]$.

                  M
  N       1       2       3       4
  1     2/35    4/35    6/35    8/35
  2     1/35    2/35    3/35    4/35
  3     1/70    1/35    3/70    2/35

Solution

By adding up the probabilities from the table, we see that the random variable $M + N$ has the following distribution:

  m + n               2       3       4       5       6       7
  P(M + N = m + n)   2/35    5/35   17/70   12/35   11/70    2/35

The expectation of $M + N$ is:

$$E[M + N] = 2 \times \frac{2}{35} + \cdots + 7 \times \frac{2}{35} = \frac{32}{7}$$

The variance of $M + N$ is:

$$\text{var}(M + N) = 2^2 \times \frac{2}{35} + \cdots + 7^2 \times \frac{2}{35} - \left(\frac{32}{7}\right)^2 = \frac{75}{49}$$

From the marginal distributions:

$$\text{var}(M) = 1^2 \times \frac{1}{10} + \cdots + 4^2 \times \frac{4}{10} - 3^2 = 1$$

$$\text{var}(N) = 1^2 \times \frac{4}{7} + \cdots + 3^2 \times \frac{1}{7} - \left(\frac{11}{7}\right)^2 = \frac{26}{49}$$

So M and N satisfy the given relationship, since $1 + \dfrac{26}{49} = \dfrac{75}{49}$.

Similarly, it can be shown that:

$$\text{var}(X - Y) = \text{var}(X) + \text{var}(Y) - 2\,\text{cov}(X, Y)$$

and so for independent random variables, we get:

$$\text{var}(X - Y) = \text{var}(X) + \text{var}(Y)$$

The variance of a difference is not equal to the difference in the variances. For independent random variables, it is equal to the sum of the variances. This must be true, since the difference in the variances could be negative but variance is never negative.

3 Convolutions

3.1 Introduction

Much of statistical theory involves the distributions of sums of random variables. In particular, the sum of a number of independent variables is especially important.
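As a concrete illustration of the distribution of a sum, the following sketch rebuilds the distribution of $M + N$ from the Section 2.5 question by accumulating joint probabilities, then checks the variance relationship for independent variables. The names used (`joint`, `dist_sum`, `variance`) are illustrative assumptions rather than anything defined in the notes.

```python
from collections import defaultdict
from fractions import Fraction

# Joint probability function P(M = m, N = n) = m / (35 * 2**(n - 2)).
def joint(m, n):
    return Fraction(m, 35) * Fraction(2) ** (2 - n)

M_VALUES = range(1, 5)
N_VALUES = range(1, 4)

# Distribution of the sum: add up P(M = m, N = n) over all pairs with m + n = s.
dist_sum = defaultdict(Fraction)
for m in M_VALUES:
    for n in N_VALUES:
        dist_sum[m + n] += joint(m, n)

def variance(dist):
    """Variance of a discrete distribution given as {value: probability}."""
    mean = sum(v * p for v, p in dist.items())
    return sum(v * v * p for v, p in dist.items()) - mean**2

# Marginal distributions of M and N, obtained by summing out the other variable.
dist_m = {m: sum(joint(m, n) for n in N_VALUES) for m in M_VALUES}
dist_n = {n: sum(joint(m, n) for m in M_VALUES) for n in N_VALUES}

var_sum = variance(dist_sum)
var_m = variance(dist_m)
var_n = variance(dist_n)

print(dict(dist_sum))
print(var_sum, var_m, var_n)
```

Because M and N are independent, `var_sum` equals `var_m + var_n` exactly; for a dependent pair the same code would show a gap of twice the covariance.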
Discrete case

Consider the sum of two discrete random variables, so let $Z = X + Y$, where $(X, Y)$ has joint probability function $P(x, y)$.

Then $P(Z = z)$ is found by summing $P(x, y)$ over all values of $(x, y)$ such that $x + y = z$, ie:

$$P_Z(z) = \sum_x P(x, z - x)$$

We did this when we calculated the distribution of $M + N$ in the previous question.

Now suppose that X and Y are independent variables. Then $P(x, y)$ is the product of the two marginal probability functions, so:

$$P_Z(z) = \sum_x P_X(x)\,P_Y(z - x)$$

Definition

When a function $P_Z$ can be expressed as a sum of this form, then $P_Z$ is called the convolution of the functions $P_X$ and $P_Y$. This is written symbolically as $P_Z = P_X * P_Y$.

So here, the probability function of $Z = X + Y$ is the convolution of the (marginal) probability functions of X and Y.

Continuous case

In the case where X and Y are independent continuous variables with joint probability density function $f(x, y)$, the corresponding expression is:

$$f_Z(z) = \int_x f_X(x)\,f_Y(z - x)\,dx$$

Question

If $X \sim Poi(\lambda)$ and $Y \sim Poi(\mu)$ are independent random variables, obtain the probability function of $Z = X + Y$.

Solution

Using the convolution formula for discrete random variables:

$$P(Z = z) = \sum_{x=0}^{z} P(X = x)\,P(Y = z - x) = \sum_{x=0}^{z} \frac{e^{-\lambda}\lambda^x}{x!} \times \frac{e^{-\mu}\mu^{z-x}}{(z - x)!}$$

$$= \frac{e^{-(\lambda + \mu)}}{z!} \sum_{x=0}^{z} \frac{z!}{x!\,(z - x)!}\,\lambda^x \mu^{z-x} = \frac{e^{-(\lambda + \mu)}}{z!} \sum_{x=0}^{z} \binom{z}{x} \lambda^x \mu^{z-x} = \frac{e^{-(\lambda + \mu)}(\lambda + \mu)^z}{z!}$$

Since this matches the probability function for the $Poi(\lambda + \mu)$ distribution (and Z can take the values $0, 1, 2, \ldots$), it follows that $Z \sim Poi(\lambda + \mu)$.

We can also use a convolution approach to derive the sum of two continuous random variables.

Question

If $X \sim Exp(\lambda)$ and $Y \sim Exp(\mu)$ are independent random variables, obtain the PDF of $Z = X + Y$.

Solution

Using the formula for continuous random variables (and assuming that $\lambda \neq \mu$):

$$f_Z(z) = \int_{0}^{z} \lambda e^{-\lambda x}\,\mu e^{-\mu(z - x)}\,dx = \lambda\mu e^{-\mu z} \int_{0}^{z} e^{-(\lambda - \mu)x}\,dx$$

$$= \frac{\lambda\mu}{\lambda - \mu}\,e^{-\mu z}\left(1 - e^{-(\lambda - \mu)z}\right) = \frac{\lambda\mu}{\lambda - \mu}\left(e^{-\mu z} - e^{-\lambda z}\right)$$

If $\lambda = \mu$, we get $f_Z(z) = \lambda^2 z e^{-\lambda z}$, which is the PDF of the $Gamma(2, \lambda)$ distribution.

We can also use MGFs to find the distribution of a sum of random variables. This will be dealt with in Section 4. The MGF method is much easier than the convolution method.

3.2 Moments of linear combinations of random variables

In the last section we looked at the properties of functions of two random variables. We can now extend these results to more than two variables.

Mean

If $X_1, X_2, \ldots, X_n$ are any random variables (not necessarily independent), then:

$$E(c_1 X_1 + c_2 X_2 + \cdots + c_n X_n) = c_1 E(X_1) + c_2 E(X_2) + \cdots + c_n E(X_n)$$

where $c_1, c_2, \ldots, c_n$ are any constants,

$$\text{ie} \quad E\left[\sum_{i=1}^{n} c_i X_i\right] = \sum_{i=1}^{n} c_i E(X_i)$$

This is an extension of the result concerning the expectation of a function of two random variables that we saw earlier, ie $E[a\,g(X) + b\,h(Y)] = a\,E[g(X)] + b\,E[h(Y)]$.

Variance

Let $Y = c_1 X_1 + c_2 X_2 + \cdots + c_n X_n$, where the variables are not necessarily independent, and let us now consider the variance:

$$\text{var}(Y) = \text{cov}(Y, Y) = \text{cov}\left(c_1 X_1 + \cdots + c_n X_n, \; c_1 X_1 + \cdots + c_n X_n\right) = \sum_{i=1}^{n} c_i^2\,\text{var}(X_i) + 2\sum_{i<j} c_i c_j\,\text{cov}(X_i, X_j)$$

This is an extension of the result $\text{var}(X_1 + X_2) = \text{var}(X_1) + \text{var}(X_2) + 2\,\text{cov}(X_1, X_2)$.

If $X_1, X_2, \ldots, X_n$ are pairwise uncorrelated random variables (and hence certainly if they are independent) then:

$$\text{var}(c_1 X_1 + c_2 X_2 + \cdots + c_n X_n) = c_1^2\,\text{var}(X_1) + c_2^2\,\text{var}(X_2) + \cdots + c_n^2\,\text{var}(X_n)$$

$$\text{ie} \quad \text{var}\left[\sum_{i=1}^{n} c_i X_i\right] = \sum_{i=1}^{n} c_i^2\,\text{var}(X_i)$$

Question

The random variables X, Y, and Z have means $\mu_X = 4$, $\mu_Y = -5$, $\mu_Z = 6$, and variances $\sigma_X^2 = 1$, $\sigma_Y^2 = 4$ and $\sigma_Z^2 = 3$. The covariances are as follows:

$$\text{cov}(X, Y) = -3, \quad \text{cov}(X, Z) = -2, \quad \text{cov}(Y, Z) = 1$$

Calculate the mean and variance of $W = X - 2Y + 3Z$.
Solution

The mean is:

$$E[W] = E[X] - 2E[Y] + 3E[Z] = 4 - (2 \times -5) + (3 \times 6) = 32$$

Since the random variables X, Y and Z are not independent, we can see that:

$$\text{var}(W) \neq \text{var}(X) + 4\,\text{var}(Y) + 9\,\text{var}(Z)$$

Instead we have:

$$\text{var}(W) = \text{var}(X - 2Y + 3Z) = \text{cov}(X - 2Y + 3Z, \; X - 2Y + 3Z)$$

$$= \text{var}(X) + 4\,\text{var}(Y) + 9\,\text{var}(Z) - 4\,\text{cov}(X, Y) + 6\,\text{cov}(X, Z) - 12\,\text{cov}(Y, Z)$$

$$= 1 + (4 \times 4) + (9 \times 3) - (4 \times -3) + (6 \times -2) - (12 \times 1) = 32$$

It is important to note that there is a distinction between adding up random variables, and multiplying a random variable by a constant.

Question

If $X_1, X_2, \ldots, X_n$ are independent random variables with mean $\mu$ and variance $\sigma^2$, obtain the mean and variance of $S = X_1 + X_2 + \cdots + X_n$ and $T = nX_1$.

Solution

The mean and variance of S (which is a sum of independent random variables) are:

$$E[S] = E[X_1 + X_2 + \cdots + X_n] = E[X_1] + E[X_2] + \cdots + E[X_n] = n\mu$$

$$\text{var}(S) = \text{var}(X_1 + X_2 + \cdots + X_n) = \text{var}[X_1] + \text{var}[X_2] + \cdots + \text{var}[X_n] = n\sigma^2$$

The mean and variance of T (which is a single random variable multiplied by a constant) are:

$$E[T] = E[nX_1] = nE[X_1] = n\mu$$

$$\text{var}(T) = \text{var}(nX_1) = n^2\,\text{var}(X_1) = n^2\sigma^2$$

The means are the same but the variance of S is smaller.

4 Using generating functions to derive distributions of linear combinations of independent random variables

In the last section, we saw that we can find the distribution of a sum of a number of independent random variables using convolutions. In this section we look at an alternative and frankly much easier method.

In many cases generating functions may make it possible to specify the actual distribution of Y, where $Y = c_1 X_1 + c_2 X_2 + \cdots + c_n X_n$.

4.1 Moment generating functions

Suppose $X_1$ and $X_2$ are independent random variables with MGFs $M_{X_1}(t)$ and $M_{X_2}(t)$ respectively, and let $S = c_1 X_1 + c_2 X_2$.

Then:

$$M_S(t) = E\left[e^{(c_1 X_1 + c_2 X_2)t}\right] = E\left[e^{c_1 X_1 t}\right] E\left[e^{c_2 X_2 t}\right] = M_{X_1}(c_1 t)\,M_{X_2}(c_2 t)$$

In the case of a simple sum $Z = X + Y$, we have:

$$M_Z(t) = M_X(t)\,M_Y(t)$$

so the MGF of the sum of two independent variables is the product of the individual MGFs.

The result extends to the sum of more than two variables.

Let $Y = X_1 + X_2 + \cdots + X_n$ where the $X_i$ are independent and $X_i$ has MGF $M_i(t)$. Then:

$$M_Y(t) = M_1(t)\,M_2(t) \cdots M_n(t)$$

(and if $X_i$ in the sum is replaced by $c_i X_i$, then $M_i(t)$ in the product is replaced by $M_i(c_i t)$).

If, in addition, the $X_i$'s are identically distributed, each with MGF $M(t)$, and $Y = X_1 + X_2 + \cdots + X_n$, then:

$$M_Y(t) = [M(t)]^n$$

Both of the last two results are important to remember and are quotable in the Subject CS1 exam.

4.2 Using MGFs to derive relationships among variables

Bernoulli/binomial

We will now derive the MGF of a $Bin(n, p)$ distribution using an alternative method to that used in the previous chapter. This method uses the fact that a $Bin(n, p)$ random variable is the sum of n independent Bernoulli random variables.

Let $X_i$, $i = 1, 2, \ldots, n$, be independent $Bernoulli(p)$ variables.

Then each has MGF $M_X(t) = q + pe^t$.

So $Y = X_1 + X_2 + \cdots + X_n$ has MGF $(q + pe^t)^n$, which is the MGF of a $Bin(n, p)$ variable.

So the $Bin(n, p)$ random variable is the sum of n independent $Bernoulli(p)$ random variables. Each Bernoulli variable has mean p and variance pq; hence the binomial has mean np and variance npq.

Physically, the number of successes in n trials is the sum of the numbers of successes (0 or 1) at each trial.

Further, the sum of two independent binomial variables, one $Bin(n, p)$ and the other $Bin(m, p)$, is a $Bin(n + m, p)$ variable.

Question

Show that if $X \sim Bin(m, p)$ and $Y \sim Bin(n, p)$ are independent random variables, then $X + Y$ also has a binomial distribution.
Solution

The moment generating functions for X and Y are:

$$M_X(t) = (q + pe^t)^m \quad \text{and} \quad M_Y(t) = (q + pe^t)^n$$

Since X and Y are independent, we can write:

$$M_{X+Y}(t) = (q + pe^t)^m (q + pe^t)^n = (q + pe^t)^{m+n}$$

which we recognise as the MGF of a $Bin(m + n, p)$ distribution. Hence, by uniqueness of MGFs, $X + Y$ follows the binomial distribution with parameters $m + n$ and p.

This result should be obvious. If we toss a coin 10 times in the morning and count the number of heads, the number would be distributed as $Bin(10, \tfrac{1}{2})$. If we toss the coin a further 20 times in the afternoon, the number of heads will be distributed as $Bin(20, \tfrac{1}{2})$. Adding the totals together is obviously the same as the $Bin(30, \tfrac{1}{2})$ distribution that we would expect for the whole day.

The phrase "by uniqueness of MGFs" is important here. What we are saying here is that it is not possible for two different distributions to have the same MGF. If it were, then, once we had found the MGF of the sum, it would not be possible to determine which of the two distributions having the same MGF was the one for the variable $X + Y$. Fortunately, MGFs do uniquely define the distribution to which they belong, and so we know the exact distribution of $X + Y$.

Geometric/negative binomial

We will now derive the MGF of a negative binomial distribution with parameters k and p using an alternative method to that used in the previous chapter. This method uses the fact that a negative binomial $(k, p)$ random variable is the sum of k independent geometric random variables with parameter p.

Let $X_i$, $i = 1, 2, \ldots, k$, be independent $geometric(p)$ variables.

Then each has MGF $M_X(t) = \dfrac{pe^t}{1 - qe^t}$.

So $Y = X_1 + X_2 + \cdots + X_k$ has MGF $\left(\dfrac{pe^t}{1 - qe^t}\right)^k$, which is the MGF of a negative binomial $(k, p)$ variable.

So the negative binomial $(k, p)$ random variable is the sum of k independent $geometric(p)$ variables.
k p and variance 1 p and variance q p2 ; hence the negative binomial has kq p2 . Physically, the number of trials up to the kth success is the sum of the number of trials to the first success, plus the additional number to the second success,..., plus the additional number to the kth success. Further, the sum oftwo independent (),mp , is a negative Thisis straightforward IFE: 2022 Examinations binomial negative binomial variables, one (),kp and the other kp+ m (), variable. to prove using MGFs. The Actuarial Education Compan CS1-04: Joint distributions Page 37 Poisson We will now find the distribution ofthe sum of two independent Poissonrandom variables using MGFs. This is an alternative method to the convolution Let X and Z beindependent ThenXhasMGF MtX () exp ()Poi ? and Poi ? ()? method. variables. 1 }, Z hasMGF MtZ (){=() et exp tt -- (){?? 1 ??? exp 1?} exp Sothesum +X Z hasMGFexp{ =ee()} ? ????? ? ? the MGF of a ? Poisson variables is a Poisson variance ==mean ?, Z has mean variance == This is animportant ? et ? +? 1 (){=}. 1t e - }, which is ()(){ ()+Poi ? variable. So the sum of independent X has ? variable. variance ==mean ?, and the sum has +? . result to remember and is quotable in the Subject CS1 exam. It can be extended (in an obvious way)to the sum of morethan two Poissonrandom variables. Question Acompany has three telephone lines coming into its switchboard. The first line rings on average 3.5 times per half-hour, the second rings on average 3.9times per half-hour, and the third line rings on average 2.1 times per half-hour. Assumingthat the numbers of calls areindependent random variables having Poisson distributions, calculate the probability that in half an hour the switchboard will receive: (i) at least 5 calls (ii) exactly 7 calls. Solution Summing the Poisson variables, the total distribution (i) PX== ( 5) (ii) (PX These figures Alternatively, The Actuarial number of telephone calls coming in follows the Poisson with mean 3.5 3.9 2.1 = 9.5++ . 
7)== 1 - P( X (PX = = 4) = 1 - 0.04026 = 0.95974 7) - P( X = 6) = 0.26866 are taken from the cumulative - 0.16495 = 0.10371 Poisson table on page 178 of the Tables. we could use the Poisson probability formula. Education Company IFE: 2022 Examination Page 38 CS1-04: Joint distributions Exponential/gamma We will now derive the MGFof the Gammaa? (, ) distribution using an alternative methodto that used earlier. This method usesthe fact that a Gammaa? (, ) random variable can be regarded as the sum of Let , 2,...,ik, = Xi,1 aindependent be independent Then each has MGF =+kX X? + + YX12 So So the Ex ?()Exp random variables. Mt () (?? =- t )-1 . k has MGF Gamma ()k , ? random ()p ? random ()? variables. Exp 1 , which is the MGF of a Gamma )k ( ,? t()-??-?? ?? ?? variable (for k a positive integer) ? k 1 ? 2 Further, the sum oftwo independent , 2 ; hence the Gamma(),k ? has . Physically, the time to the kth event in a Poisson individual inter-event times. ?+ ad ? k and variance ? Gamma ( is the sum of k independent variables. Each exponential variable has mean 1 and variance mean variable. process gamma variables, one with rate ?is the sum of k (),a? and the other (),d? , is a ) variable. Thisresult can also be proved using MGFs. Question If the number of minutesit takes for a mechanicto check a tyre is arandom variable having an exponential distribution with mean5, obtain the probability that the mechanic willtake: (i) (ii) more than eight at least fifteen minutes to check two tyres minutes to check three tyres. Solution (i) The sum oftwo independent exponential random variables with mean5, has a gamma 1 distribution with parameters a 2= and ?= . If welet X bethe total time taken for the 5 mechanicto check the tyres, then: 2 8 () xe dx 1 5 -1 x (8) ? G(2) PX>= 5 8 IFE: 2022 Examinations The Actuarial Education Compan CS1-04: Joint distributions Page 39 Integrating by parts, using PX>= (8) 1 ?? 25 ?? + ??8 1 25 1 25 = Alternatively, ux=, we obtain: 8 88 ?? 
11xx ??-5xe 5?e dx -- 55 ?? ?? 8 8 --1 x ???? 55 =- 25ee ?? ?? 40 ??8 ?? ?? -- 40 =+ 88 ?? ee 55?? 25 ?? 0.525 we could use the Poisson process. If checkedin a time period of t minutes,then welet Y be the number of tyres ?YPoi(0.2 t ) . The probability that it takes more than 8 minutes to check two tyres is equivalent to the probability oftyres checked in 8 minutesis only 0 or 1. Using ?YPoi(0.2 that the number 8), the required probability is therefore: PY (ii) 1.6 0 or 1 = 0.525 ()== e 1.6 +1.6 --e The sum of three independent exponential random variables with mean5 has a gamma distribution with parameters and a 3= 1 5 ?= . If welet X be the total time taken for the mechanicto check the tyres, then werequire ( > 15)PX . Wecould solve this byintegrating the PDF, but this would require integration by parts (twice). An easier wayto find this probability is to use the gamma-chi squared relationship proved earlier. If X ? Gammaa? ( , ) and 2a is a positive integer, then 2 X? ?? 2 2a . So: (PX Substituting 15)>= P(2 3= a X > 30 ) =(P??? 22 a >30?) and ?= 1 , and usingthe 5 ?2 values given on page 166 of the Tables, we obtain: PX( 15)>=P ( ? 62 > 6) = 1 - 0.5768 = 0.4232 Alternatively, wecould usethe Poisson distribution with mean 0.2 15 and calculate the probability of 0, 1 or 2tyres checked within 15 minutes. The difference in the wording in the two parts of the question more than versus at least is not significant here. Since we are working in continuous time, the probability that an event occurs at exactly time 8(or time 15)is zero. The Actuarial Education Company IFE: 2022 Examination Page 40 CS1-04: Joint distributions Chi-square From the above result independent with ?= 2, 1 it follows that the sum of a chi- square n() and an chi - square m () is a c - square Sothe sum ofindependent m n+ ()hi variable. chi-square variables is a chi-square variable. Question Supposethat 1X and2X areindependent random variables such that let X =+12XX . Use MGFsto prove that ? X 2 ? 
X1 m?? 2 and ?? X2 2 n, and mn . + Solution Since 2 =Gamma ? n 22,n ()1 we have: m 1 MtX () 1=- 2t()- n and 2 2 MtX () 1=- 2t()- 2 Since1X and2X areindependent: Ee ( tX ) MXX t() tX +tX Ee== ( tX 1 etX 2=) ( 12) =Ee mn So MtX () =- 12t() Ee t MX ( tX1) E( etX2) = M () 12t() m+ n 12t() 22-- 1=- 2t()- 2 . Thisis the X ? 2 ? MGFof the 2 ? + mn distribution. Bythe uniquenessproperty of MGFs, it follows that mn. + Thisresult is usefulin manyareas of statistics, including generalised linear study later in the course). models(which we will Normal Let X be a normal random variable with meanX be a normal random variable with meanY independent, and let IFE: 2022 Examinations , and let Y and standard deviation Ys . Let X and Y be ZX=+ Y. X has MGF tXXMt() exp Y has MGFtYYMt() and standard deviation Xs exp =+ 2s =+ s 2 X221 t (). Yt221 () . The Actuarial Education Compan CS1-04: Joint distributions So the sum exp ( which is the Page 41 ZX=+ Y has MGF: s tXY 22 ttX ) exp MGF of a normal So the sum ofindependent XY ? Y(, ie ++N XY ++ 2 22 sYt 2 variable (with () = exp11 mean +X Y ()t+ +X Y 1 2 2 2t } s2 +sX Y (){ and variance + 22 ss XY ). normal variables is a normal variable. X 22) ss+ Similarly,it can be shown that: XY ? Y(, --N XY X ss+ 22) The variance ofthe difference is the sum ofthe variances (as wesaw in the general case earlier). These areimportant results to remember and are quotable in the Subject CS1exam. Question If X and Y areindependent standard normal variables, determine the distribution of 2-X Y. Solution The resulting distribution is normal. 0-XY ? N 2( 2 The Actuarial Education Company -0,2 Wejust need to fill in the mean and variance to obtain: 1 +( - 1) 22 1) =N(0,5) IFE: 2022 Examination Page 42 CS1-04: Joint distributions The chapter summary starts on the next page so that you can keep all the chapter summaries together for revision purposes. 
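The additive results in this section can be checked numerically. The sketch below uses only the Python standard library; the helper names `poisson_pmf` and `convolve` are illustrative and are not part of the course material. It convolves the three Poisson probability functions from the telephone question and compares the result with a single Poisson distribution with mean 9.5.

```python
import math

def poisson_pmf(mean, n_max):
    """Return [P(N = 0), ..., P(N = n_max)] for a Poisson(mean) variable."""
    return [math.exp(-mean) * mean**k / math.factorial(k) for k in range(n_max + 1)]

def convolve(p, q):
    """Probability function of the sum of two independent counts.

    p and q must have the same length; the result is truncated to that length.
    """
    return [sum(p[i] * q[k - i] for i in range(k + 1)) for k in range(len(p))]

N_MAX = 40  # truncation point (an arbitrary choice); terms 0..N_MAX of the sum are still exact

lines = [poisson_pmf(m, N_MAX) for m in (3.5, 3.9, 2.1)]
total = convolve(convolve(lines[0], lines[1]), lines[2])
direct = poisson_pmf(9.5, N_MAX)

max_diff = max(abs(a - b) for a, b in zip(total, direct))
p_at_least_5 = 1 - sum(total[:5])
p_exactly_7 = total[7]

print(f"largest difference from Poisson(9.5): {max_diff:.2e}")
print(f"P(X >= 5) = {p_at_least_5:.5f}")
print(f"P(X = 7)  = {p_exactly_7:.5f}")
```

The two probabilities should agree, to rounding, with the values read from the Tables in the telephone question. The same `convolve` helper could be applied to other discrete families in this section, for example Bernoulli probabilities to build up a binomial.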
Chapter 4 Summary

Two discrete random variables $X$ and $Y$ have joint probability function (PF), $P(X = x, Y = y)$. This defines how the probability is split between the different combinations of the variables. The joint PF satisfies:

$$\sum_x\sum_y P(X = x, Y = y) = 1 \quad\text{and}\quad P(X = x, Y = y) \ge 0$$

Two continuous random variables $X$ and $Y$ have joint probability density function (PDF), $f_{X,Y}(x, y)$. The joint PDF satisfies:

$$\int_y\int_x f_{X,Y}(x, y)\,dx\,dy = 1 \quad\text{and}\quad f_{X,Y}(x, y) \ge 0$$

We can use the joint PDF to calculate probabilities as follows:

$$P(x_1 < X < x_2,\ y_1 < Y < y_2) = \int_{y_1}^{y_2}\int_{x_1}^{x_2} f_{X,Y}(x, y)\,dx\,dy$$

The joint distribution function, for both discrete and continuous random variables, is given by:

$$F(x, y) = P(X \le x, Y \le y)$$

For continuous random variables:

$$f(x, y) = \frac{\partial^2}{\partial x\,\partial y}F(x, y)$$

The marginal distribution, eg $P(X = x)$ or $f_X(x)$, can be calculated using:

$$P(X = x) = \sum_y P(X = x, Y = y) \quad\text{or}\quad f_X(x) = \int_y f_{X,Y}(x, y)\,dy$$

The conditional distribution, eg $P(X = x \mid Y = y)$ or $f_{X|Y=y}(x, y)$, can be calculated using:

$$P(X = x \mid Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)} \quad\text{or}\quad f_{X|Y=y}(x, y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}$$

The expectation of any function, $E[g(X, Y)]$, can be calculated using:

$$E[g(X, Y)] = \sum_x\sum_y g(x, y)\,P(X = x, Y = y) \quad\text{or}\quad \int_y\int_x g(x, y)\,f_{X,Y}(x, y)\,dx\,dy$$

The covariance, $\text{cov}(X, Y)$, can be calculated using:

$$\text{cov}(X, Y) = E\left[(X - E(X))(Y - E(Y))\right] = E(XY) - E(X)E(Y)$$

The correlation coefficient, $\text{corr}(X, Y) = \rho(X, Y)$, is given by:

$$\text{corr}(X, Y) = \frac{\text{cov}(X, Y)}{\sqrt{\text{var}(X)\,\text{var}(Y)}}$$

The random variables $X$ and $Y$ are uncorrelated if and only if:

$$\text{corr}(X, Y) = 0 \iff \text{cov}(X, Y) = 0 \iff E(XY) = E(X)E(Y)$$

The random variables $X$ and $Y$ are independent if, and only if:

$$P(X = x, Y = y) = P(X = x)P(Y = y) \quad\text{or}\quad f_{X,Y}(x, y) = f_X(x)f_Y(y)$$

for all values of $x$ and $y$.

Independent random variables are always uncorrelated. Uncorrelated random variables are not necessarily independent.
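The last two statements can be illustrated with a small example. The sketch below uses a hypothetical joint distribution that is not taken from the course material: $X$ takes the values −1, 0 and 1 with equal probability and $Y = X^2$. The helper names are illustrative only. The covariance is zero, yet the joint probabilities do not factorise into the product of the marginals.

```python
from fractions import Fraction

# Hypothetical joint probability function: X uniform on {-1, 0, 1} and Y = X^2.
third = Fraction(1, 3)
joint = {(-1, 1): third, (0, 0): third, (1, 1): third}

def expectation(g):
    """E[g(X, Y)] under the joint probability function."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

def marginal_x(x0):
    """P(X = x0), summing the joint probabilities over y."""
    return sum(p for (x, _), p in joint.items() if x == x0)

def marginal_y(y0):
    """P(Y = y0), summing the joint probabilities over x."""
    return sum(p for (_, y), p in joint.items() if y == y0)

# cov(X, Y) = E(XY) - E(X)E(Y)
cov = expectation(lambda x, y: x * y) - expectation(lambda x, y: x) * expectation(lambda x, y: y)

# Independence needs P(X=x, Y=y) = P(X=x) P(Y=y) for every pair (x, y).
independent = all(
    joint.get((x, y), Fraction(0)) == marginal_x(x) * marginal_y(y)
    for x in (-1, 0, 1)
    for y in (0, 1)
)

print("cov(X, Y) =", cov)
print("independent:", independent)
```

Exact fractions are used so that the comparison of joint and product probabilities is not affected by floating-point rounding.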
Expectations of sums and products can be calculated using:

$$E(X + Y) = E(X) + E(Y)$$

$$E(XY) = E(X)E(Y) + \text{cov}(X, Y) = E(X)E(Y) \text{ if } X, Y \text{ independent}$$

The above are also true for functions $g(X)$ and $h(Y)$ of the random variables.

Variances of sums can be calculated using:

$$\text{var}(X + Y) = \text{var}(X) + \text{var}(Y) + 2\,\text{cov}(X, Y) = \text{var}(X) + \text{var}(Y) \text{ if } X, Y \text{ independent}$$

The convolution of the marginal probability (density) functions of $X$ and $Y$ is the probability (density) function of $Z = X + Y$. $P(Z = z)$ or $f_Z(z)$ is given using the formulae:

$$P(Z = z) = \sum_x P(X = x)P(Y = z - x) \quad\text{or}\quad f_Z(z) = (f_X * f_Y)(z) = \int_x f_X(x)f_Y(z - x)\,dx$$

For independent random variables $X_1, \ldots, X_n$, and for any constants $c_1, c_2, \ldots, c_n$:

$$E(c_1X_1 + \cdots + c_nX_n) = c_1E(X_1) + \cdots + c_nE(X_n)$$

$$\text{var}(c_1X_1 + \cdots + c_nX_n) = c_1^2\,\text{var}(X_1) + \cdots + c_n^2\,\text{var}(X_n)$$

For independent random variables $X_1, \ldots, X_n$ and $Y = X_1 + \cdots + X_n$:

$$M_Y(t) = M_{X_1}(t)\cdots M_{X_n}(t) = [M_X(t)]^n \text{ if the } X_i\text{s are also identical}$$

For independent random variables:

$Bernoulli(p) + \cdots + Bernoulli(p)$ is $Bin(n, p)$
$Bin(n, p) + Bin(m, p)$ is $Bin(n + m, p)$
$Geo(p) + \cdots + Geo(p)$ is $NBin(k, p)$
$NBin(k, p) + NBin(m, p)$ is $NBin(k + m, p)$
$Exp(\lambda) + \cdots + Exp(\lambda)$ is $Gamma(\alpha, \lambda)$
$Gamma(\alpha, \lambda) + Gamma(\delta, \lambda)$ is $Gamma(\alpha + \delta, \lambda)$
$\chi^2_m + \chi^2_n$ is $\chi^2_{m+n}$
$Poi(\lambda) + Poi(\gamma)$ is $Poi(\lambda + \gamma)$
$N(\mu_1, \sigma_1^2) + N(\mu_2, \sigma_2^2)$ is $N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$

Some of the notation used here for the linear combinations of random variables is non-standard and is used simply to convey the results in a concise format.

Chapter 4 Practice Questions

4.1 Let $X$ and $Y$ have joint density function given by:

$$f(x, y) = c(x + 3y) \qquad 0 < x < 2,\ 0 < y < 2$$

(i) Calculate the value of $c$.
(ii) Hence, calculate $P(X < 1, Y > 0.5)$.

4.2 (Exam style) The continuous random variables $X$, $Y$ have the bivariate PDF:

$$f(x, y) = 2 \qquad x + y < 1,\ x > 0,\ y > 0$$

(i) Derive the marginal PDF of $Y$. [2]
(ii) Derive the conditional PDF of $X$ given $Y = y$ using the result from part (i). [1]
[Total 3]

4.3 The continuous random variables $X$ and $Y$ have joint PDF:

$$f(x, y) = \tfrac16(x^2 + xy) \qquad 0 < y < x < 2$$

(i) Determine the PDF of the conditional distribution $X \mid Y = y$.
(ii) Calculate the conditional probability $P(1 < X < 1.5 \mid Y = 1)$.

4.4 Show that, for the joint random variables $M$, $N$, where:

$$P(M = m, N = n) = \frac{m}{35 \times 2^{n-2}}, \quad\text{for } m = 1, 2, 3, 4 \text{ and } n = 1, 2, 3$$

the conditional probability functions for $M$ given $N = n$ and for $N$ given $M = m$ are equal to the corresponding marginal distributions.

4.5 (Exam style) Let $X$ and $Y$ have joint density function:

$$f_{X,Y}(x, y) = \tfrac45(3x^2 + xy) \qquad 0 < x < 1,\ 0 < y < 1$$

Determine:

(i) the marginal density function of $X$ [2]
(ii) the conditional density function of $Y$ given $X = x$ [1]
(iii) the covariance of $X$ and $Y$. [5]
[Total 8]

4.6 Calculate the correlation coefficient of $X$ and $Y$, where $X$ and $Y$ have the joint distribution:

           X = 0    X = 1    X = 2
  Y = 1     0.1      0.1      0
  Y = 2     0.1      0.1      0.2
  Y = 3     0.2      0.1      0.1

4.7 Claim sizes on a home insurance policy are normally distributed about a mean of 800 and with a standard deviation of 100. Claim sizes on a car insurance policy are normally distributed about a mean of 1,200 and with a standard deviation of 300. All claim sizes are assumed to be independent. To date, there have already been home claims amounting to 800, but no car claims. Calculate the probability that after the next 4 home claims and 3 car claims the total size of car claims exceeds the total size of the home claims.
4.8 (Exam style) Two discrete random variables, $X$ and $Y$, have the following joint probability function:

           X = 1    X = 2    X = 3
  Y = 1     0.2      0        0.2
  Y = 2     0        0.2      0
  Y = 3     0.2      0        0.2

Determine:

(i) $E(X)$ [1]
(ii) the probability distribution of $Y \mid X = 1$ [1]
(iii) whether $X$ and $Y$ are correlated or not [2]
(iv) whether $X$ and $Y$ are independent or not. [1]
[Total 5]

4.9 The random variables $X$ and $Y$ have joint density function given by:

$$f(x, y) = kx^{-\alpha}e^{-y/\beta} \qquad 1 < x < \infty,\ 1 < y < \infty$$

where $\alpha > 1$, $\beta > 0$, and $k$ is a constant. Derive an expression for $k$ in terms of $\alpha$ and $\beta$.

4.10 Show using convolutions that if $X$ and $Y$ are independent random variables and $X$ has a $\chi^2_m$ distribution and $Y$ has a $\chi^2_n$ distribution, then $X + Y$ has a $\chi^2_{m+n}$ distribution.

4.11 (Exam style) Let $X$ be a random variable with mean 3 and standard deviation 2, and let $Y$ be a random variable with mean 4 and standard deviation 1. $X$ and $Y$ have a correlation coefficient of $-0.3$. Let $Z = X + Y$.

Calculate:

(i) $\text{cov}(X, Z)$ [2]
(ii) $\text{var}(Z)$. [2]
[Total 4]

4.12 (Exam style) $X$ has a Poisson distribution with mean 5 and $Y$ has a Poisson distribution with mean 10. If $\text{cov}(X, Y) = -12$, calculate the variance of $Z$ where $Z = X - 2Y + 3$. [2]

4.13 Show that if $X$ has a negative binomial distribution with parameters $k$ and $p$, and $Y$ has a negative binomial distribution with parameters $m$ and $p$, and $X$ and $Y$ are independent, then $X + Y$ also has a negative binomial distribution, and specify its parameters.

4.14 (Exam style) For a certain company, claim sizes on car policies are normally distributed about a mean of 1,800 and with standard deviation 300, whereas claim sizes on home policies are normally distributed about a mean of 1,200 and with standard deviation 500. Claim sizes are assumed to be independent.

Calculate the probability that a car claim is at least twice the size of a home claim.
[4]

Chapter 4 Solutions

4.1 (i) Using the result $\int_y\int_x f(x, y)\,dx\,dy = 1$ gives:

$$\int_{y=0}^{2}\int_{x=0}^{2} c(x + 3y)\,dx\,dy = c\int_{y=0}^{2}\left[\tfrac12x^2 + 3xy\right]_{x=0}^{2}dy = c\int_{y=0}^{2}(2 + 6y)\,dy = c\left[2y + 3y^2\right]_{y=0}^{2} = 16c = 1 \ \Rightarrow\ c = \tfrac{1}{16}$$

(ii) The probability is:

$$P(X < 1, Y > 0.5) = \int_{y=0.5}^{2}\int_{x=0}^{1}\tfrac{1}{16}(x + 3y)\,dx\,dy = \tfrac{1}{16}\int_{y=0.5}^{2}\left[\tfrac12x^2 + 3xy\right]_{x=0}^{1}dy = \tfrac{1}{16}\int_{y=0.5}^{2}\left(\tfrac12 + 3y\right)dy = \tfrac{1}{16}\left[\tfrac12y + \tfrac32y^2\right]_{y=0.5}^{2} = \tfrac{51}{128} = 0.398$$

4.2 (i) The marginal PDF of $Y$ is:

$$f_Y(y) = \int_0^{1-y}2\,dx = \left[2x\right]_0^{1-y} = 2(1 - y), \quad 0 < y < 1 \qquad [2]$$

(ii) The conditional PDF of $X$ given $Y = y$ is:

$$f_{X|Y=y}(x, y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} = \frac{2}{2(1 - y)} = \frac{1}{1 - y}, \quad 0 < x < 1 - y \qquad [1]$$

4.3 (i) We saw in Section 1.4 of the chapter that the marginal distribution for $Y$ is:

$$f_Y(y) = \tfrac16\left(\tfrac83 + 2y - \tfrac56y^3\right) \qquad 0 < y < 2$$

So the PDF for the conditional distribution $X \mid Y = y$ is the joint PDF divided by the marginal PDF:

$$f_{X|Y=y}(x, y) = \frac{\tfrac16(x^2 + xy)}{\tfrac16\left(\tfrac83 + 2y - \tfrac56y^3\right)} = \frac{x^2 + xy}{\tfrac83 + 2y - \tfrac56y^3} \qquad y < x < 2$$

(ii) So the general conditional probability is given by:

$$P(1 < X < 1.5 \mid Y = y) = \int_1^{1.5}\frac{x^2 + xy}{\tfrac83 + 2y - \tfrac56y^3}\,dx = \frac{\left[\tfrac13x^3 + \tfrac12x^2y\right]_1^{1.5}}{\tfrac83 + 2y - \tfrac56y^3} = \frac{(1.125 + 1.125y) - \left(\tfrac13 + \tfrac12y\right)}{\tfrac83 + 2y - \tfrac56y^3}$$

Substituting in $y = 1$, we obtain:

$$P(1 < X < 1.5 \mid Y = 1) = \frac{(1.125 + 1.125) - \left(\tfrac13 + \tfrac12\right)}{\tfrac83 + 2 - \tfrac56} = \frac{1.4167}{3.8333} = 0.3696$$

4.4 In the chapter, we saw that the marginal probability functions for $M$ and $N$ are:

$$P_M(m) = \frac{m}{10} \quad\text{for } m = 1, 2, 3, 4$$

and:

$$P_N(n) = \frac{1}{7 \times 2^{n-3}} \quad\text{for } n = 1, 2, 3$$

So, dividing the joint probability function by the marginal probability function for $N$, we obtain:

$$P_{M|N=n}(m, n) = \frac{P_{M,N}(m, n)}{P_N(n)} = \frac{m}{35 \times 2^{n-2}} \Big/ \frac{1}{7 \times 2^{n-3}} = \frac{m}{10}, \quad m = 1, 2, 3, 4$$

for the conditional probability function of $M$ given $N = n$.

Similarly:

$$P_{N|M=m}(m, n) = \frac{P(M = m, N = n)}{P(M = m)} = \frac{m}{35 \times 2^{n-2}} \Big/ \frac{m}{10} = \frac{1}{7 \times 2^{n-3}}, \quad n = 1, 2, 3$$

is the conditional probability function of $N$ given $M = m$.

These are identical to the marginal distributions obtained in the chapter text.

4.5 (i) Marginal density

$$f_X(x) = \int_{y=0}^{1}\tfrac45(3x^2 + xy)\,dy = \tfrac45\left[3x^2y + \tfrac12xy^2\right]_{y=0}^{1} = \tfrac45\left(3x^2 + \tfrac12x\right) \quad\text{for } 0 < x < 1 \qquad [2]$$

(ii) Conditional density

$$f_{Y|X=x}(x, y) = \frac{f_{X,Y}(x, y)}{f_X(x)} = \frac{\tfrac45(3x^2 + xy)}{\tfrac45\left(3x^2 + \tfrac12x\right)} = \frac{3x + y}{3x + \tfrac12} \quad\text{for } 0 < y < 1 \qquad [1]$$

(iii) Covariance

Using the marginal density function of $X$:

$$E(X) = \int_{x=0}^{1}\tfrac45\left(3x^3 + \tfrac12x^2\right)dx = \tfrac45\left[\tfrac34x^4 + \tfrac16x^3\right]_{x=0}^{1} = \tfrac{11}{15} \qquad [1]$$

Obtaining the marginal density function of $Y$:

$$f_Y(y) = \int_{x=0}^{1}\tfrac45(3x^2 + xy)\,dx = \tfrac45\left[x^3 + \tfrac12x^2y\right]_{x=0}^{1} = \tfrac45\left(1 + \tfrac12y\right) \quad\text{for } 0 < y < 1$$

So:

$$E(Y) = \int_{y=0}^{1}\tfrac45\left(y + \tfrac12y^2\right)dy = \tfrac45\left[\tfrac12y^2 + \tfrac16y^3\right]_{y=0}^{1} = \tfrac{8}{15} \qquad [1]$$

Now:

$$E(XY) = \int_{x=0}^{1}\int_{y=0}^{1}\tfrac45(3x^3y + x^2y^2)\,dy\,dx = \tfrac45\int_{x=0}^{1}\left[\tfrac32x^3y^2 + \tfrac13x^2y^3\right]_{y=0}^{1}dx = \tfrac45\int_{x=0}^{1}\left(\tfrac32x^3 + \tfrac13x^2\right)dx = \tfrac45\left[\tfrac38x^4 + \tfrac19x^3\right]_{x=0}^{1} = \tfrac{7}{18} \qquad [2]$$

Hence:

$$\text{cov}(X, Y) = \tfrac{7}{18} - \tfrac{11}{15} \times \tfrac{8}{15} = -\tfrac{1}{450} \qquad [1]$$

4.6 The covariance of $X$ and $Y$ was obtained in Section 2.4 to be $\text{cov}(X, Y) = 0.02$. The variances of the marginal distributions are:

$$\text{var}(X) = E(X^2) - [E(X)]^2 = 0^2 \times 0.4 + 1^2 \times 0.3 + 2^2 \times 0.3 - (0.9)^2 = 0.69$$

and:

$$\text{var}(Y) = E(Y^2) - [E(Y)]^2 = 1^2 \times 0.2 + 2^2 \times 0.4 + 3^2 \times 0.4 - (2.2)^2 = 0.56$$

So the correlation coefficient is:

$$\text{corr}(X, Y) = \frac{\text{cov}(X, Y)}{\sqrt{\text{var}(X)\,\text{var}(Y)}} = \frac{0.02}{\sqrt{0.69 \times 0.56}} = 0.0322$$

4.7 Let $X$ be the amount of a home insurance claim and $Y$ the amount of a car insurance claim.
Then:

$$X \sim N(800, 100^2) \quad\text{and}\quad Y \sim N(1200, 300^2)$$

We require:

$$P\left((Y_1 + Y_2 + Y_3) > (X_1 + X_2 + X_3 + X_4) + 800\right) = P\left((Y_1 + Y_2 + Y_3) - (X_1 + X_2 + X_3 + X_4) > 800\right)$$

So we need the distribution of $(Y_1 + Y_2 + Y_3) - (X_1 + X_2 + X_3 + X_4)$:

$$(Y_1 + Y_2 + Y_3) - (X_1 + X_2 + X_3 + X_4) \sim N(3 \times 1200 - 4 \times 800,\ 3 \times 300^2 + 4 \times 100^2)$$

ie:

$$(Y_1 + Y_2 + Y_3) - (X_1 + X_2 + X_3 + X_4) \sim N(400, 310000)$$

Therefore:

$$P\left((Y_1 + Y_2 + Y_3) - (X_1 + X_2 + X_3 + X_4) > 800\right) = P\left(Z > \frac{800 - 400}{\sqrt{310{,}000}}\right) = P(Z > 0.718) = 1 - P(Z < 0.718) = 0.236$$

4.8 (i) Mean

$$E(X) = 1 \times 0.4 + 2 \times 0.2 + 3 \times 0.4 = 2 \qquad [1]$$

Alternatively, we could use the fact that the distribution of $X$ is symmetrical about 2.

(ii) Probability distribution of $Y \mid X = 1$

Using $P(Y = y \mid X = 1) = \dfrac{P(X = 1, Y = y)}{P(X = 1)}$ and $P(X = 1) = 0.4$ gives:

  Y | X = 1        1      2      3
  Probability     0.5     0     0.5
[1]

(iii) Correlated?

To calculate the correlation coefficient, we first require the covariance.

$E(X) = 2$ from part (i)

$E(Y) = 1 \times 0.4 + 2 \times 0.2 + 3 \times 0.4 = 2$

$E(XY) = 1 \times 0.2 + 2 \times 0 + 3 \times 0.2 + 2 \times 0 + 4 \times 0.2 + 6 \times 0 + 3 \times 0.2 + 6 \times 0 + 9 \times 0.2 = 4$ [1]

So $\text{cov}(X, Y) = E(XY) - E(X)E(Y) = 4 - 2 \times 2 = 0$.

Hence $\text{corr}(X, Y) = \dfrac{\text{cov}(X, Y)}{\sqrt{\text{var}(X)\,\text{var}(Y)}} = 0$.

Therefore $X$ and $Y$ are uncorrelated. [1]

(iv) Independent?

$X$ and $Y$ are independent if $P(X = x, Y = y) = P(X = x)P(Y = y)$ for all $x$ and $y$.

However, $P(X = 1, Y = 1) = 0.2 \ne 0.4 \times 0.4 = P(X = 1)P(Y = 1)$.

So $X$ and $Y$ are not independent. [1]

4.9 Since the PDF must integrate to 1:

$$\int_{y=1}^{\infty}\int_{x=1}^{\infty}kx^{-\alpha}e^{-y/\beta}\,dx\,dy = 1$$

Integrating over the $x$ values gives:

$$\int_{x=1}^{\infty}kx^{-\alpha}e^{-y/\beta}\,dx = \left[\frac{ke^{-y/\beta}x^{-\alpha+1}}{-\alpha + 1}\right]_{x=1}^{\infty} = \frac{ke^{-y/\beta}}{\alpha - 1}$$

Integrating this over the $y$ values gives:

$$\int_{y=1}^{\infty}\frac{ke^{-y/\beta}}{\alpha - 1}\,dy = \left[\frac{-k\beta e^{-y/\beta}}{\alpha - 1}\right]_{y=1}^{\infty} = \frac{k\beta e^{-1/\beta}}{\alpha - 1}$$

Equating this to 1:

$$\frac{k\beta e^{-1/\beta}}{\alpha - 1} = 1 \ \Rightarrow\ k = \frac{(\alpha - 1)e^{1/\beta}}{\beta}$$

4.10 The chi-square distribution is a continuous distribution that can take any positive value.
The chi-square distribution with parameter $m$ is the same as a gamma distribution with parameters $\tfrac12m$ and $\tfrac12$. So, using the PDF of the gamma distribution, the PDF of the sum $Z = X + Y$ is given by the convolution formula:

$$f_Z(z) = \int_0^z f_X(x)f_Y(z - x)\,dx = \int_0^z\frac{(\tfrac12)^{m/2}}{\Gamma(\tfrac12m)}x^{m/2-1}e^{-x/2}\,\frac{(\tfrac12)^{n/2}}{\Gamma(\tfrac12n)}(z - x)^{n/2-1}e^{-(z-x)/2}\,dx$$

$$= \frac{(\tfrac12)^{(m+n)/2}}{\Gamma(\tfrac12m)\Gamma(\tfrac12n)}e^{-z/2}\int_0^z x^{m/2-1}(z - x)^{n/2-1}\,dx$$

Using the substitution $t = x/z$ gives:

$$f_Z(z) = \frac{(\tfrac12)^{(m+n)/2}}{\Gamma(\tfrac12m)\Gamma(\tfrac12n)}e^{-z/2}\int_0^1(zt)^{m/2-1}(z - zt)^{n/2-1}z\,dt$$

$$= \frac{(\tfrac12)^{(m+n)/2}}{\Gamma(\tfrac12(m+n))}z^{(m+n)/2-1}e^{-z/2}\int_0^1\frac{\Gamma(\tfrac12(m+n))}{\Gamma(\tfrac12m)\Gamma(\tfrac12n)}t^{m/2-1}(1 - t)^{n/2-1}\,dt$$

Since the last integral represents the total probability for the $Beta(\tfrac12m, \tfrac12n)$ distribution, we get:

$$f_Z(z) = \frac{(\tfrac12)^{(m+n)/2}}{\Gamma(\tfrac12(m+n))}z^{(m+n)/2-1}e^{-z/2} \times P\left[0 < Beta(\tfrac12m, \tfrac12n) < 1\right] = \frac{(\tfrac12)^{(m+n)/2}}{\Gamma(\tfrac12(m+n))}z^{(m+n)/2-1}e^{-z/2}$$

Since this matches the PDF of the $\chi^2_{m+n}$ distribution (and $Z$ can take any positive value), $Z$ is a $\chi^2_{m+n}$ random variable.

It is much easier to prove this result using MGFs.

4.11 (i) Covariance

We have:

$$\text{cov}(X, Z) = \text{cov}(X, X + Y) = \text{cov}(X, X) + \text{cov}(X, Y) = \text{var}(X) + \text{cov}(X, Y)$$

Using the correlation coefficient between $X$ and $Y$ gives:

$$\text{corr}(X, Y) = \frac{\text{cov}(X, Y)}{\sqrt{\text{var}(X)\,\text{var}(Y)}} = \frac{\text{cov}(X, Y)}{\sqrt{4 \times 1}} = -0.3 \ \Rightarrow\ \text{cov}(X, Y) = -0.6$$

Hence:

$$\text{cov}(X, Z) = 4 - 0.6 = 3.4 \qquad [2]$$

(ii) Variance

Using $\text{var}(Z) = \text{cov}(Z, Z)$:

$$\text{var}(Z) = \text{cov}(X + Y, X + Y) = \text{cov}(X, X) + 2\,\text{cov}(X, Y) + \text{cov}(Y, Y) = \text{var}(X) + 2\,\text{cov}(X, Y) + \text{var}(Y) = 4 + 2 \times (-0.6) + 1 = 3.8 \qquad [2]$$

Note: $\text{var}(X + Y) \ne \text{var}(X) + \text{var}(Y)$ as $X$ and $Y$ are not independent.

4.12
The $+3$ term does not affect the variance, so:

$$\text{var}(Z) = \text{var}(X - 2Y + 3) = \text{var}(X - 2Y)$$

Now:

$$\text{var}(X + Y) = \text{var}(X) + \text{var}(Y) + 2\,\text{cov}(X, Y)$$

and:

$$\text{cov}(aX, bY) = ab\,\text{cov}(X, Y)$$

So:

$$\text{var}(X - 2Y) = \text{var}(X) + 4\,\text{var}(Y) - 2 \times 2\,\text{cov}(X, Y) \qquad [1]$$

$$= 5 + 4 \times 10 - 4 \times (-12) = 93 \qquad [1]$$

4.13 The moment generating function of $X$ is:

$$M_X(t) = \left[\frac{pe^t}{1 - qe^t}\right]^k$$

Similarly, the MGF of $Y$ is:

$$M_Y(t) = \left[\frac{pe^t}{1 - qe^t}\right]^m$$

Since $X$ and $Y$ are independent, we have:

$$M_{X+Y}(t) = M_X(t)\,M_Y(t) = \left[\frac{pe^t}{1 - qe^t}\right]^k\left[\frac{pe^t}{1 - qe^t}\right]^m = \left[\frac{pe^t}{1 - qe^t}\right]^{k+m}$$

This is the MGF of another negative binomial distribution with parameters $p$ and $k + m$. Hence, by uniqueness of MGFs, $X + Y$ has this distribution.

4.14 Let $X$ be the claim size on car policies, so that $X \sim N(1800, 300^2)$.

Let $Y$ be the claim size on home policies, so that $Y \sim N(1200, 500^2)$.

We want:

$$P(X > 2Y) = P(X - 2Y > 0) \qquad [1]$$

So we need the distribution of $X - 2Y$:

$$X - 2Y \sim N(1800 - 2 \times 1200,\ 300^2 + 4 \times 500^2)$$

$$X - 2Y \sim N(-600, 1090000) \qquad [2]$$

Standardising:

$$z = \frac{0 - (-600)}{\sqrt{1{,}090{,}000}} = 0.575$$

So:

$$P(X - 2Y > 0) = P(Z > 0.575) = 1 - P(Z < 0.575) = 1 - 0.71735 = 0.283 \qquad [1]$$
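The normal calculations in Solutions 4.7 and 4.14 lend themselves to a quick numerical check. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The helper `linear_combination` is an illustrative name, not something defined in the course material; it applies the mean and variance rules for a linear combination of independent normal variables.

```python
from math import sqrt
from statistics import NormalDist

def linear_combination(terms):
    """Distribution of sum(c * X) over independent normal variables X.

    `terms` is a list of (c, mean, sd) triples, one per independent variable.
    """
    mean = sum(c * mu for c, mu, _ in terms)
    variance = sum(c**2 * sd**2 for c, _, sd in terms)
    return NormalDist(mean, sqrt(variance))

# Solution 4.14: X ~ N(1800, 300^2) is a car claim, Y ~ N(1200, 500^2) a home claim.
diff = linear_combination([(1, 1800, 300), (-2, 1200, 500)])
p_414 = 1 - diff.cdf(0)

# Solution 4.7: three car claims minus four home claims must exceed 800.
total = linear_combination([(1, 1200, 300)] * 3 + [(-1, 800, 100)] * 4)
p_47 = 1 - total.cdf(800)

print(f"X - 2Y ~ N({diff.mean:.0f}, {diff.variance:.0f}),  P(X > 2Y) = {p_414:.3f}")
print(f"P(car total exceeds home total) = {p_47:.3f}")
```

Small differences in the fourth decimal place relative to the worked solutions are to be expected, because the solutions round the standardised value before looking it up in the Tables.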
CS1-05: Conditional expectation

Syllabus objectives

1.3 Expectations, conditional expectations

1.3.1 Define the conditional expectation of one random variable given the value of another random variable, and calculate such a quantity.

1.3.2 Show how the mean and variance of a random variable can be obtained from expected values of conditional expected values, and apply this.

0 Introduction

In this short chapter, we will return to the conditional PDF, $f_{Y|X=x}(x, y)$, that we met in Chapter 4. We will explain how to determine the conditional expectation, $E(Y \mid X = x)$, and the conditional variance, $\text{var}(Y \mid X = x)$. We will then see how we can obtain the unconditional values $E(Y)$ and $\text{var}(Y)$ from the conditional mean and variance.

We will use conditional expectation in a later chapter when we define the regression line $E[Y \mid x] = \alpha + \beta x$.

The idea will also feature in other actuarial subjects. In particular, in Subject CS2 we will introduce the idea of a compound random variable, which is the sum of a random number of random variables. Compound random variables can be used to model total claim amounts. We will need the results from this chapter to derive some of the standard formulae for compound random variables.

1 The conditional expectation E[Y|X=x]

Definition: The conditional expectation of $Y$ given $X = x$ is the mean of the conditional distribution of $Y$ given $X = x$.

This mean is denoted $E[Y \mid X = x]$, or just $E[Y \mid x]$.

If $X$ and $Y$ are discrete random variables, this is:

$$E[Y \mid X = x] = \sum_y y\,P[Y = y \mid X = x] = \sum_y y\,\frac{P[Y = y, X = x]}{P[X = x]}$$

Question

Write down the equivalent expression in the case when $X$ and $Y$ are continuous random variables.
Solution

$$E[Y \mid X = x] = \int_y y\,f(y \mid x)\,dy = \int_y y\,\frac{f(x, y)}{f(x)}\,dy$$

We can calculate numerical values for conditional expectations.

Question

Two random variables $X$ and $Y$ have the following discrete joint distribution:

           Y = 10   Y = 20   Y = 30
  X = 1     0.2      0.2      0.1
  X = 2     0.2      0.3      0

Calculate $E(Y \mid X = 1)$.

Solution

$$E(Y \mid X = 1) = \sum_y y\,P(Y = y \mid X = 1) = 10\,P(Y = 10 \mid X = 1) + 20\,P(Y = 20 \mid X = 1) + 30\,P(Y = 30 \mid X = 1)$$

$$= 10 \times \frac{0.2}{0.5} + 20 \times \frac{0.2}{0.5} + 30 \times \frac{0.1}{0.5} = 10 \times 0.4 + 20 \times 0.4 + 30 \times 0.2 = 18$$

We can also calculate conditional expectations for continuous joint random variables.

Question

Suppose $X$ and $Y$ have joint density function given by:

$$f(x, y) = \tfrac35x(x + y) \qquad 0 < x < 1,\ 0 < y < 2$$

Determine the conditional expectation $E[Y \mid X = x]$.

Solution

Using $E[Y \mid X = x] = \int_y y\,\dfrac{f(x, y)}{f(x)}\,dy$ and recalling from Chapter 4 that $f_X(x) = \int_y f(x, y)\,dy$:

$$f(x) = \int_{y=0}^{2}\tfrac35(x^2 + xy)\,dy = \tfrac35\left[x^2y + \tfrac12xy^2\right]_{y=0}^{2} = \tfrac35(2x^2 + 2x) = \tfrac65x(x + 1)$$

Hence:

$$E[Y \mid X = x] = \int_{y=0}^{2}y\,\frac{\tfrac35x(x + y)}{\tfrac65x(x + 1)}\,dy = \int_{y=0}^{2}\frac{xy + y^2}{2(x + 1)}\,dy = \left[\frac{\tfrac12xy^2 + \tfrac13y^3}{2(x + 1)}\right]_{y=0}^{2} = \frac{2x + \tfrac83}{2(x + 1)} = \frac{3x + 4}{3(x + 1)}$$

We can also calculate conditional expectations in the case where the limits for one variable depend on the other.

Question

Let $X$ and $Y$ have joint density function given by:

$$f_{X,Y}(x, y) = \tfrac16(x^2 + xy) \qquad 0 < y < x < 2$$

Determine the conditional expectation $E[Y \mid X = x]$.

Solution

We saw in Section 1.4 of Chapter 4 that the PDF of $Y \mid X = x$ is:

$$f_{Y|X}(x, y) = \frac23\left(\frac1x + \frac{y}{x^2}\right) \qquad 0 < y < x$$

So the conditional expectation is:

$$E(Y \mid X = x) = \int_0^x y \times \frac23\left(\frac1x + \frac{y}{x^2}\right)dy = \frac23\int_0^x\left(\frac{y}{x} + \frac{y^2}{x^2}\right)dy = \frac23\left[\frac{y^2}{2x} + \frac{y^3}{3x^2}\right]_0^x = \frac{x}{3} + \frac{2x}{9} = \frac{5x}{9} \qquad 0 < x < 2$$

2 The random variable E[Y|X]

The conditional expectation $E[Y \mid X = x] = g(x)$, say, is, in general, a function of $x$. It can be thought of as the observed value of a random variable $g(X)$. The random variable $g(X)$ is denoted $E[Y \mid X]$.

We saw in a previous question that $E[Y \mid X = x] = \dfrac{3x + 4}{3(x + 1)}$. This is a function of $x$. So $E[Y \mid X] = \dfrac{3X + 4}{3(X + 1)}$, and this is a function of the random variable $X$.

Note: $E[Y \mid X]$ is also referred to as the regression of $Y$ on $X$.

In a later chapter the regression line will be defined as $E[Y \mid x] = \alpha + \beta x$.

$E[Y \mid X]$, like any other function of $X$, has its own distribution, whose properties depend on those of the distribution of $X$ itself. Of particular importance is the expected value (the mean) of the distribution of $E[Y \mid X]$. The usefulness of considering this expected value, $E[E[Y \mid X]]$, comes from the following result, proved here in the case of continuous variables, but true in general.

Theorem: $E[E[Y \mid X]] = E[Y]$

Proof:

$$E[E[Y \mid X]] = \int E[Y \mid x]\,f_X(x)\,dx = \int\left(\int y\,f(y \mid x)\,dy\right)f_X(x)\,dx = \int\int y\,f(x, y)\,dx\,dy = E[Y]$$

We are integrating here over all possible values of $x$ and $y$. Here $f(y \mid x)$ represents the density function of the conditional distribution of $Y \mid X = x$. This was written as $f_{Y|X}(x, y)$ in Chapter 4. The last two steps follow by noting that $f(y \mid x) = \dfrac{f(x, y)}{f_X(x)}$ and $\int f(x, y)\,dx = f_Y(y)$, the marginal PDF of $Y$.

This formula is given on page 16 of the Tables.

Question

(i) Calculate $E[Y]$ from first principles given that the joint density function of $X$ and $Y$ is:

$$f(x, y) = \tfrac35x(x + y) \qquad 0 < x < 1,\ 0 < y < 2$$

(ii) Given that $E[Y \mid X = x] = \dfrac{3x + 4}{3(x + 1)}$, calculate $E[E(Y \mid X)]$.

(iii) Hence, confirm that $E[Y] = E[E(Y \mid X)]$ for this distribution.

Solution

(i) $E(Y) = \int_y y\,f(y)\,dy$, and $f(y) = \int_x f(x, y)\,dx$. So:

$$f(y) = \int_{x=0}^{1}\tfrac35(x^2 + xy)\,dx = \tfrac35\left[\tfrac13x^3 + \tfrac12x^2y\right]_{x=0}^{1} = \tfrac15 + \tfrac{3}{10}y$$

$$E(Y) = \int_{y=0}^{2}\left(\tfrac15y + \tfrac{3}{10}y^2\right)dy = \left[\tfrac{1}{10}y^2 + \tfrac{1}{10}y^3\right]_{y=0}^{2} = \tfrac65 = 1.2$$

(ii)

$$E[E(Y \mid X)] = E\left[\frac{3X + 4}{3(X + 1)}\right] = \int_x\frac{3x + 4}{3(x + 1)}\,f(x)\,dx$$

As we saw in an earlier question, $f(x) = \tfrac65x(x + 1)$. So:

$$E[E(Y \mid X)] = \int_{x=0}^{1}\frac{3x + 4}{3(x + 1)} \times \tfrac65x(x + 1)\,dx = \tfrac25\int_{x=0}^{1}(3x^2 + 4x)\,dx = \tfrac25\left[x^3 + 2x^2\right]_{x=0}^{1} = \tfrac65 = 1.2$$

(iii) Comparing the answers in parts (i) and (ii), we can see that $E[Y] = E[E(Y \mid X)]$.

We can also deal with situations where a random variable depends on the value of a parameter, which can itself be treated as a random quantity.

For example, consider a portfolio of motor policies. Claim amounts arising in the portfolio might have a gamma distribution with parameters $\alpha$ and $\lambda$. However, different policyholders might have different values of $\alpha$. If this is true, we can represent the variability of $\alpha$ over the portfolio by giving it its own probability distribution. So we might decide that $\alpha$ could be treated as having an exponential distribution over the whole portfolio. We can then deduce the mean and variance of a randomly chosen claim amount.

Question

The random variable $K$ has an $Exp(\lambda)$ distribution. For a given value of $K$, the random variable $X$ has a $Poisson(K)$ distribution.

(i) Obtain an expression for $E[X \mid K]$.
(ii) Hence, calculate $E[X]$.

Solution

(i) If $K = k$, then $X$ has a $Poisson(k)$ distribution, which has mean $k$. So $E[X \mid K = k] = k$, and this can be written as $E[X \mid K] = K$.

(ii) $E[X] = E[E[X \mid K]] = E[K] = \dfrac{1}{\lambda}$.

3 The random variable var[Y|X] and the 'E[V] + var[E]' result

The variance of the conditional distribution of $Y$ given $X = x$ is denoted $\text{var}[Y \mid x]$, where:

$$\text{var}[Y \mid x] = E\left[\{Y - E[Y \mid x]\}^2 \mid x\right] = E[Y^2 \mid x] - (E[Y \mid x])^2$$

$\text{var}[Y \mid x]$ is the observed value of a random variable $\text{var}[Y \mid X]$, where:

$$\text{var}[Y \mid X] = E[Y^2 \mid X] - (E[Y \mid X])^2 = E[Y^2 \mid X] - \{g(X)\}^2$$

Hence:

$$E[\text{var}[Y \mid X]] = E[E[Y^2 \mid X]] - E[\{g(X)\}^2] = E[Y^2] - E[\{g(X)\}^2]$$

and so:

$$E[Y^2] = E[\text{var}[Y \mid X]] + E[\{g(X)\}^2]$$

So the variance of $Y$, $\text{var}[Y] = E[Y^2] - (E[Y])^2$, is given by:

$$\text{var}[Y] = E[\text{var}[Y \mid X]] + E[\{g(X)\}^2] - (E[g(X)])^2 = E[\text{var}(Y \mid X)] + \text{var}[g(X)]$$

ie:

$$\text{var}[Y] = E[\text{var}[Y \mid X]] + \text{var}[E[Y \mid X]]$$

This formula is given on page 16 of the Tables.

Question

Evaluate $\text{var}[Y \mid X = 1]$ given the joint distribution:

           Y = 10   Y = 20   Y = 30
  X = 1     0.2      0.2      0.1
  X = 2     0.2      0.3      0

Solution

$$\text{var}[Y \mid X = 1] = E(Y^2 \mid X = 1) - E^2(Y \mid X = 1)$$

In Section 1, we saw that $E(Y \mid X = 1) = 18$. Similarly:

$$E(Y^2 \mid X = 1) = \sum_y y^2\,P(Y = y \mid X = 1) = 10^2\,P(Y = 10 \mid X = 1) + 20^2\,P(Y = 20 \mid X = 1) + 30^2\,P(Y = 30 \mid X = 1)$$

$$= 100 \times \frac{0.2}{0.5} + 400 \times \frac{0.2}{0.5} + 900 \times \frac{0.1}{0.5} = 380$$

So:

$$\text{var}[Y \mid X = 1] = 380 - 18^2 = 56$$

Question

The random variable $K$ has an $Exp(\lambda)$ distribution. For a given value of $K$, the random variable $X$ has a $Poisson(K)$ distribution.

Obtain an expression for $\text{var}[X \mid K]$. Hence derive an expression for $\text{var}(X)$.

Solution

If $K = k$, $X$ has a $Poisson(k)$ distribution, which has variance $k$. So $\text{var}[X \mid K = k] = k$ and hence $\text{var}[X \mid K] = K$.

Using the result given in this section, we have:

$$\text{var}[X] = E[\text{var}(X \mid K)] + \text{var}[E(X \mid K)] = E[K] + \text{var}[K]$$

Since $K \sim Exp(\lambda)$, it follows that:

$$\text{var}[X] = E[K] + \text{var}[K] = \frac{1}{\lambda} + \frac{1}{\lambda^2} = \frac{\lambda + 1}{\lambda^2}$$

Chapter 5 Summary

$E(Y \mid X)$ is the mean of the conditional distribution of $Y$ given $X$ (which was defined in Chapter 4). The formulae for the conditional mean are:

$$E[Y \mid X = x] = \sum_y y\,P[Y = y \mid X = x] = \sum_y y\,\frac{P[Y = y, X = x]}{P[X = x]} \quad\text{(discrete case)}$$

$$E[Y \mid X = x] = \int_y y\,f(y \mid x)\,dy = \int_y y\,\frac{f(x, y)}{f(x)}\,dy \quad\text{(continuous case)}$$

$\text{var}(Y \mid X)$ is the variance of the conditional distribution of $Y$ given $X$.
It is given by:

$$\text{var}(Y|X) = E(Y^2|X) - [E(Y|X)]^2$$

The unconditional mean and variance can be found from the conditional mean and variance using the formulae:

$$E[Y] = E[E(Y|X)]$$

$$\text{var}[Y] = E[\text{var}(Y|X)] + \text{var}[E(Y|X)]$$

Chapter 5 Practice Questions

5.1 Calculate $E(X|Y=10)$ given the joint distribution:

           Y = 10   Y = 20   Y = 30
  X = 1     0.2      0.2      0.1
  X = 2     0.2      0.3      0

5.2 (Exam style) The random variable $V$ follows the Poisson distribution with mean 5. For a given value of $V$, the random variable $U$ is distributed as follows:

$$U|(V=v) \sim U(0,v)$$

Determine the unconditional mean and variance of $U$. [4]

5.3 (Exam style) Suppose that $X$ and $Y$ are continuous random variables.

(i) Prove from first principles that:

$$E(Y) = E[E(Y|X)]$$ [3]

(ii) Suppose that the random variable $X$ follows the gamma distribution with parameters $\alpha = 3$ and $\lambda = 2$. $Y$ is a related variable with conditional mean and variance of:

$$E(Y|X=x) = 3x + 1 \qquad \text{var}(Y|X=x) = 2x^2 + 5$$

Calculate the unconditional mean and standard deviation of $Y$. [5]
[Total 8]

5.4 (Exam style) Suppose that $X$ is a standard normal random variable, and the conditional distribution of a Poisson random variable $Y$, given the value of $X = x$, has expectation $x^2 + 1$.

Determine $E(Y)$ and $\text{var}(Y)$. [5]

5.5 The table below shows the bivariate probability distribution for two discrete random variables $X$ and $Y$:

           X = 0   X = 1   X = 2
  Y = 1    0.15    0.20    0.25
  Y = 2    0.05    0.15    0.20

Calculate the value of $E(X|Y=2)$.

5.6 (Exam style)

(i) Two discrete random variables, $X$ and $Y$, have the following joint probability function:

           X = 1   X = 2   X = 3   X = 4
  Y = 1    0.2     0       0.05    0.15
  Y = 2    0       0.3     0.1     0.2

Determine $\text{var}(X|Y=2)$.
[3]

(ii) Let $U$ and $V$ have joint density function:

$$f_{U,V}(u,v) = 6(2uv - u^2), \quad 0 < u < v < 1$$

Determine $E(U|V=v)$. [3]
[Total 6]

Chapter 5 Solutions

5.1

$$E(X|Y=10) = \sum x\, P(X=x|Y=10) = 1 \times P(X=1|Y=10) + 2 \times P(X=2|Y=10)$$

$$= 1 \times \frac{0.2}{0.4} + 2 \times \frac{0.2}{0.4} = 1 \times 0.5 + 2 \times 0.5 = 1.5$$

Alternatively, we can see this directly by noting that if we know that $Y = 10$, then $X$ is equally likely to be 1 or 2. Since this is a symmetrical distribution, the conditional mean is 1.5.

5.2 We are given in the question that:

$$U|(V=v) \sim U(0,v) \qquad V \sim Poi(5)$$

So:

$$E(V) = \text{var}(V) = 5$$ [½]

and:

$$E(U|V) = \tfrac{1}{2}V \qquad \text{var}(U|V) = \tfrac{1}{12}V^2$$ [½]

Using the formulae on page 16 of the Tables, we have:

$$E[U] = E[E[U|V]] \qquad \text{var}[U] = \text{var}[E[U|V]] + E[\text{var}[U|V]]$$

Therefore:

$$E[U] = E[E(U|V)] = E\left[\tfrac{1}{2}V\right] = \tfrac{1}{2}E[V] = 2\tfrac{1}{2}$$ [1]

$$\text{var}[U] = \text{var}[E(U|V)] + E[\text{var}(U|V)] = \text{var}\left[\tfrac{1}{2}V\right] + E\left[\tfrac{1}{12}V^2\right] = \tfrac{1}{4}\text{var}[V] + \tfrac{1}{12}E[V^2]$$ [1]

Since $E[V^2] = \text{var}[V] + (E[V])^2$, we have:

$$\text{var}[U] = \tfrac{1}{4} \times 5 + \tfrac{1}{12}(5 + 5^2) = 3\tfrac{3}{4}$$ [1]

5.3 (i) Proof

$E(Y|X=x)$ is a function of $x$. So, using $E[g(X)] = \int_x g(x) f(x)\,dx$, we have:

$$E[E(Y|X)] = \int_x E(Y|x)\, f(x)\,dx$$ [1]

Using the definition $E(Y|X=x) = \int_y y\, f(y|x)\,dy$ gives:

$$E[E(Y|X)] = \int_x \left(\int_y y\, f(y|x)\,dy\right) f(x)\,dx$$

Using the definition $f(y|x) = \dfrac{f(x,y)}{f(x)}$ gives:

$$E[E(Y|X)] = \int_x \left(\int_y y\, \frac{f(x,y)}{f(x)}\,dy\right) f(x)\,dx = \int_x \int_y y\, f(x,y)\,dy\,dx = \int_y y \left(\int_x f(x,y)\,dx\right) dy$$ [1]

Since integrating the joint density function, $f(x,y)$, over all values of $x$ gives the marginal density function, $f(y)$, we have:

$$E[E(Y|X)] = \int_y y\, f(y)\,dy = E(Y)$$ [1]

(ii) Calculate the unconditional mean and variance

The mean and variance of $X$ are given by:

$$E(X) = \frac{\alpha}{\lambda} = \frac{3}{2} = 1.5 \qquad \text{var}(X) = \frac{\alpha}{\lambda^2} = \frac{3}{4} = 0.75$$ [1]

Using the result from part (i), ie $E(Y) = E[E(Y|X)]$:

$$E(Y) = E[3X + 1] = 3E[X] + 1 = 3 \times 1.5 + 1 = 5.5$$ [1]

Using the result $\text{var}(Y) = E[\text{var}(Y|X)] + \text{var}[E(Y|X)]$ from page 16 of the Tables:

$$\text{var}(Y) = E[2X^2 + 5] + \text{var}[3X + 1] = 2E[X^2] + 5 + 9\,\text{var}[X]$$ [1]

Using the fact that $E(X^2) = \text{var}(X) + E^2(X) = 0.75 + 1.5^2 = 3$: [1]

$$\text{var}(Y) = 2 \times 3 + 5 + 9 \times 0.75 = 17.75$$

So the standard deviation is $\sqrt{17.75} = 4.21$. [1]

5.4 We have $X \sim N(0,1)$. So:

$$E(X) = 0 \quad \text{and} \quad \text{var}(X) = 1$$

We also have $(Y|X=x) \sim Poi(x^2+1)$. Hence:

$$E(Y|X=x) = x^2 + 1 \quad \text{and} \quad \text{var}(Y|X=x) = x^2 + 1$$ [1]

Using the expectation formula gives:

$$E(Y) = E[E(Y|X)] = E[X^2 + 1] = E(X^2) + 1 = 1 + 1 = 2$$ [1]

Now using the variance formula:

$$\text{var}(Y) = E[\text{var}(Y|X)] + \text{var}[E(Y|X)] = E(X^2 + 1) + \text{var}(X^2 + 1) = E(X^2) + 1 + \text{var}(X^2)$$ [1]

Now $E(X^2) = \text{var}(X) + [E(X)]^2 = 1 + 0 = 1$. However, we'll have to do $\text{var}(X^2)$ from first principles:

$$\text{var}(X^2) = E(X^4) - [E(X^2)]^2$$

From the formula for the moments of a standard normal random variable (given on page 10 of the Tables), we see that:

$$E(X^4) = \frac{1}{2^2} \times \frac{\Gamma(1+4)}{\Gamma\left(1+\frac{4}{2}\right)} = \frac{1}{4} \times \frac{\Gamma(5)}{\Gamma(3)} = \frac{1}{4} \times \frac{4!}{2!} = 3$$

Using $E(X^2) = \text{var}(X) + E^2(X) = 1 + 0 = 1$ again gives:

$$\text{var}(X^2) = E(X^4) - E^2(X^2) = 3 - 1^2 = 2$$

Hence:

$$\text{var}(Y) = E(X^2) + 1 + \text{var}(X^2) = 1 + 1 + 2 = 4$$ [2]

5.5

$$E(X|Y=2) = \sum_x x\, P(X=x|Y=2) = \sum_x x\, \frac{P(X=x, Y=2)}{P(Y=2)} = 0 \times \frac{0.05}{0.4} + 1 \times \frac{0.15}{0.4} + 2 \times \frac{0.2}{0.4} = 1.375$$

5.6 (i) Conditional variance

$$\text{var}(X|Y=2) = E(X^2|Y=2) - E^2(X|Y=2)$$

$$E(X|Y=2) = \sum x\, P(X=x|Y=2) = \sum x\, \frac{P(X=x \cap Y=2)}{P(Y=2)} = 1 \times \frac{0}{0.6} + 2 \times \frac{0.3}{0.6} + 3 \times \frac{0.1}{0.6} + 4 \times \frac{0.2}{0.6} = 2\tfrac{5}{6}$$ [1]

$$E(X^2|Y=2) = \sum x^2\, \frac{P(X=x \cap Y=2)}{P(Y=2)} = 1^2 \times \frac{0}{0.6} + 2^2 \times \frac{0.3}{0.6} + 3^2 \times \frac{0.1}{0.6} + 4^2 \times \frac{0.2}{0.6} = 8\tfrac{5}{6}$$ [1]

So $\text{var}(X|Y=2) = 8\tfrac{5}{6} - \left(2\tfrac{5}{6}\right)^2 = \tfrac{29}{36} = 0.80556$. [1]

(ii) Conditional expectation

We require:

$$E(U|V=v) = \int_u u\, f(u|v)\,du$$

Now:

$$f(v) = \int_{u=0}^{v} 6(2uv - u^2)\,du = 6\left[u^2 v - \tfrac{1}{3}u^3\right]_{u=0}^{v} = 6\left(v^3 - \tfrac{1}{3}v^3\right) = 4v^3$$ [1]

$$f(u|v) = \frac{f(u,v)}{f(v)} = \frac{6(2uv - u^2)}{4v^3} = \frac{3(2uv - u^2)}{2v^3} \quad \text{for } 0 < u < v < 1$$ [1]

So:

$$E(U|V=v) = \int_{u=0}^{v} u\, \frac{3(2uv - u^2)}{2v^3}\,du = \frac{3}{2v^3}\left[\tfrac{2}{3}u^3 v - \tfrac{1}{4}u^4\right]_{u=0}^{v} = \frac{3}{2v^3}\left(\tfrac{2}{3}v^4 - \tfrac{1}{4}v^4\right) = \tfrac{5}{8}v \quad \text{for } 0 < v < 1$$ [1]

End of Part 1

What next?

1. Briefly review the key areas of Part 1 and/or re-read the summaries at the end of Chapters 1 to 5.

2. Ensure you have attempted some of the Practice Questions at the end of each chapter in Part 1. If you don't have time to do them all, you could save the remainder for use as part of your revision.

3. Attempt Assignment X1.

4. Work through the Chapter 2 and 5 material (discrete distributions, continuous distributions and conditional expectation) of the Paper B Online Resources (PBOR).

Time to consider ... learning and revision products

Marking

Recall that you can buy Series Marking or more flexible Marking Vouchers to have your assignments marked by ActEd. Results of surveys suggest that attempting the assignments and having them marked improves your chances of passing the exam. One student said:

"The insight into my interpretation of the questions compared with that of the model solutions was helpful. Also, the pointers as to how to shorten the amount of work required to reach an answer were appreciated."

Face-to-face and Live Online Tutorials

If you haven't yet booked a tutorial, then maybe now is the time to do so. Feedback on ActEd tutorials is extremely positive:

"I would not pass exams without ActEd's lovely, clever, patient tutors. I don't know how you managed to find so many great teachers. Thank you!"

Online Classroom

Alternatively / additionally, you
might consider the Online Classroom to give you access to ActEd's expert tuition and additional support:

"Please do an online classroom for everything. It is amazing."

You can find lots more information, including demos and our Tuition Bulletin, on our website at www.ActEd.co.uk.

Buy online at www.ActEd.co.uk/estore

CS1-06: The Central Limit Theorem

Syllabus objectives

1.5 Central Limit Theorem - statement and application

1.5.1 State the Central Limit Theorem for a sequence of independent, identically distributed random variables.

1.5.2 Generate simulated values from a given distribution and compare the sampling distribution with the normal.

0 Introduction

The Central Limit Theorem is perhaps the most important result in statistics. It provides the basis for large-sample inference about a population mean when the population distribution is unknown and more importantly does not need to be known. It also provides the basis for large-sample inference about a population proportion, for example, in initial mortality rates at given age $x$, or in opinion polls and surveys. It is one of the reasons for the importance of the normal distribution in statistics.

We will study statistical inference in Chapter 10 (Hypothesis testing).

Basically, the Central Limit Theorem gives us an approximate distribution of the sample mean, $\bar{X}$, from any distribution. The usefulness of this, though not apparent now, will become clear in the next four chapters.

The Central Limit Theorem can also be used to give approximations to other distributions. This is useful if we are calculating probabilities that would take too long otherwise. For example, $P(X < 30)$ where $X \sim Bin(100, 0.3)$ would require us to work out 30 probabilities and then add them all up. If we use a normal approximation, the calculation of the probability is much simpler, and the loss of accuracy is slight.

1 The Central Limit Theorem

1.1 Definition

If $X_1, X_2, \ldots, X_n$ is a sequence of independent, identically distributed (iid) random variables with finite mean $\mu$ and finite (non-zero) variance $\sigma^2$ then the distribution of $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ approaches the standard normal distribution, $N(0,1)$, as $n \to \infty$.

It is not necessary to be able to prove this result.

Remember that $\bar{X}$ is the sample mean, calculated as $\bar{X} = \dfrac{1}{n}\sum_{i=1}^{n} X_i$.

1.2 Practical uses

The way the Central Limit Theorem is used in practice is to provide useful normal approximations to the distributions of particular functions of a set of iid random variables.

Therefore both $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ and $\dfrac{\sum X_i - n\mu}{\sqrt{n\sigma^2}}$ are approximately distributed as $N(0,1)$ for large $n$.

The second of these expressions can be obtained from the first just by multiplying top and bottom through by the sample size $n$.

Alternatively, the unstandardised forms can be used. Thus $\bar{X}$ is approximately $N(\mu, \sigma^2/n)$ and $\sum X_i$ is approximately $N(n\mu, n\sigma^2)$.

In fact, the expressions for the mean and variance are exact, ie $E(\bar{X}) = \mu$ and $\text{var}(\bar{X}) = \dfrac{\sigma^2}{n}$. It is the shape of the curve that is approximate.

As a notation the symbol $\mathrel{\dot\sim}$ is used to mean 'is approximately distributed', so we can write the statements in the preceding paragraph as $\bar{X} \mathrel{\dot\sim} N(\mu, \sigma^2/n)$ and $\sum X_i \mathrel{\dot\sim} N(n\mu, n\sigma^2)$.

An obvious question is: what is large $n$?

A common answer is simply $n \geq 30$ but this is too simple an answer. A fuller answer is that it depends on the shape of the population, that is, the distribution of $X_i$, and in particular how skewed it is.

If this population distribution is fairly symmetric even though non-normal, then $n = 10$ may be large enough; whereas if the distribution is very skewed, $n = 50$ or more may be necessary.
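The effect of skewness on the sample size needed can be checked in one case where the exact distribution of the sample mean is known. The R sketch below is an illustration under stated assumptions, not part of the Core Reading: it assumes an $Exp(1)$ population (mean 1, variance 1, strongly skewed), and the names `max_gap`, `n_values` and `gaps`, together with the grid of $z$ values, are illustrative choices. It uses the fact that the sum of $n$ iid $Exp(1)$ variables has a $Gamma(n, 1)$ distribution, and reports the largest difference between the exact CDF of the standardised sample mean and the $N(0,1)$ CDF.

```r
# Largest gap between the exact CDF of the standardised sample mean and the
# N(0,1) CDF, for samples of size n from an Exp(1) population (assumed here
# purely for illustration; it has mean 1 and variance 1).
max_gap <- function(n) {
  z <- seq(-3, 3, by = 0.01)
  # The sum of n iid Exp(1) variables is Gamma(n, 1), so
  # P((Xbar - 1) / (1 / sqrt(n)) <= z) = P(sum <= n + z * sqrt(n)).
  exact <- pgamma(n + z * sqrt(n), shape = n, rate = 1)
  max(abs(exact - pnorm(z)))
}

n_values <- c(10, 30, 50)
gaps <- sapply(n_values, max_gap)

# The gap shrinks as n grows, but only slowly for a skewed population.
print(data.frame(n = n_values, max_gap = round(gaps, 4)))
```

Repeating the calculation for a symmetric population would need a different exact distribution, so this sketch only shows the skewed side of the comparison.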
In other words, the closer the original distribution is to being symmetrical, the better the approximation given by the Central Limit Theorem.

Question

It is assumed that the number of claims arriving at an insurance company per working day has a mean of 40 and a standard deviation of 12. A survey is to be conducted over 50 working days. Calculate the probability that the sample mean number of claims arriving per working day is less than 35.

Solution

Using the notation given in the Core Reading, $\mu = 40$, $\sigma = 12$, $n = 50$.

By the Central Limit Theorem, $\bar{X} \mathrel{\dot\sim} N(40, 12^2/50)$.

We want $P(\bar{X} < 35)$. Standardising in the usual way:

$$P(\bar{X} < 35) = P\left(Z < \frac{35 - 40}{\sqrt{12^2/50}}\right) = P(Z < -2.946) = 1 - P(Z < 2.946) = 1 - 0.99839 = 0.00161$$

We can also use the Central Limit Theorem to answer questions about the distribution of $\sum X_i$, rather than $\bar{X}$.

Question

The cost of repairing a vehicle following an accident has mean $6,200 and standard deviation $650. A study was carried out into 65 vehicles that had been involved in accidents. Calculate the probability that the total repair bill for the vehicles exceeded $400,000.

Solution

Using the notation given in the Core Reading, we have $\mu = 6{,}200$, $\sigma = 650$, $n = 65$. Also let $Z \sim N(0,1)$.

We want the probability that the total repair bill, $T$, is greater than 400,000.

The Central Limit Theorem states that:

$$T \mathrel{\dot\sim} N(65 \times 6{,}200,\; 65 \times 650^2) = N(403{,}000,\; 5{,}240^2)$$

So the probability is calculated as follows:

$$P(T > 400{,}000) = P\left(Z > \frac{400{,}000 - 403{,}000}{5{,}240}\right) = P(Z > -0.572) = P(Z < 0.572) = 0.71634$$

2 Normal approximations

We can use the Central Limit Theorem to obtain approximations to the binomial, Poisson and gamma distributions. This is useful for calculating probabilities and obtaining confidence intervals and carrying out hypothesis tests on a piece of paper. However, it is easy for a computer to calculate exact probabilities, confidence intervals and hypothesis tests. Hence, these approximations are not as important as they used to be.

2.1 Binomial distribution, Bin(n,p)

Let $X_i$ be iid Bernoulli random variables, that is, $Bin(1,p)$, so that:

$$P(X_i = 1) = p \qquad P(X_i = 0) = 1 - p$$

In other words $X_i$ is the number of successes in a single Bernoulli trial.

Consider $X_1, X_2, \ldots, X_n$, a sequence of such variables. This is precisely the binomial situation and $X = \sum X_i$ is the number of successes in the $n$ trials.

So $X = \sum X_i \sim Bin(n,p)$. Also note that it can be said that $\bar{X} = \dfrac{X}{n}$.

As a result of the Central Limit Theorem, for large $n$:

$$\bar{X} \mathrel{\dot\sim} N\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{or} \quad \sum X_i \mathrel{\dot\sim} N(n\mu, n\sigma^2)$$

For the Bernoulli distribution:

$$\mu = E[X_i] = p \quad \text{and} \quad \sigma^2 = \text{var}[X_i] = p(1-p)$$

Therefore $\sum X_i \mathrel{\dot\sim} N(np, np(1-p))$ for large $n$, which is of course the normal approximation to the binomial.

Basically, we approximate the binomial distribution using a normal distribution, which has the same mean and variance as the binomial distribution.

Question

Given that $X \sim Bin(n,p)$, derive the mean and variance of $\bar{X}$, and hence write down an approximate distribution for $\bar{X}$ using the Central Limit Theorem assuming $n$ is sufficiently large.

Solution

Since $\bar{X} = \dfrac{\sum X_i}{n}$, we have:

$$E(\bar{X}) = E\left(\frac{\sum X_i}{n}\right) = \frac{1}{n}E\left(\sum X_i\right) = \frac{1}{n}\,np = p$$

$$\text{var}(\bar{X}) = \text{var}\left(\frac{\sum X_i}{n}\right) = \frac{1}{n^2}\text{var}\left(\sum X_i\right) = \frac{1}{n^2}\,np(1-p) = \frac{p(1-p)}{n}$$

So, by the Central Limit Theorem, $\bar{X} \mathrel{\dot\sim} N\left(p, \dfrac{p(1-p)}{n}\right)$.

What is large $n$?

A commonly quoted rule of thumb is that the approximation can be used only when both $np$ and $n(1-p)$ are greater than 5. The 'only when' is a bit severe. It is more a case of the approximation is less good if either is less than 5.

However, this rule of thumb agrees with the answer that it depends on the symmetry/skewness of the population. Note that when $p = 0.5$ the Bernoulli distribution is symmetrical. In this case both $np$ and $n(1-p)$ equal 5 when $n = 10$, and so the rule of thumb suggests that $n = 10$ is large enough.
As $p$ moves away from 0.5 towards either 0 or 1 the Bernoulli distribution becomes more severely skewed. For example, when $p = 0.2$ or 0.8 the rule of thumb gives $n = 25$ as large enough, but, when $p = 0.05$ or 0.95 the rule of thumb gives $n = 100$ as large enough.

Recall from Chapter 2, that the binomial distribution can also be approximated by the Poisson distribution. This approximation is valid when $n$ is large and $p$ is small. This contrasts with the normal approximation, which requires $n$ to be large and $p$ to be close to ½ (although, as $n$ gets larger the normal approximation works well even if $p$ is not close to ½).

2.2 Poisson distribution

Let $X_i$, $i = 1, 2, \ldots, n$ be iid $Poi(\lambda)$ random variables.

So $\mu = E[X_i] = \lambda$ and $\sigma^2 = \text{var}[X_i] = \lambda$.

The Central Limit Theorem implies that:

$$\sum X_i \mathrel{\dot\sim} N(n\lambda, n\lambda) \quad \text{for large } n$$

But $\sum X_i \sim Poi(n\lambda)$ and so, for large $n$, $Poi(n\lambda) \mathrel{\dot\sim} N(n\lambda, n\lambda)$, or, equivalently, $Poi(\lambda) \mathrel{\dot\sim} N(\lambda, \lambda)$ for large $\lambda$.

We are approximating using a normal distribution with the same mean and variance as the Poisson distribution.

Question

Show that $\sum X_i \sim Poi(n\lambda)$, where $X_i$ is $Poi(\lambda)$ for all $i$.

Solution

Recall that the Poisson distribution is additive, ie for independent random variables:

$$X \sim Poi(\lambda) \text{ and } Y \sim Poi(\mu) \;\Rightarrow\; X + Y \sim Poi(\lambda + \mu)$$

Therefore $\sum X_i \sim Poi(n\lambda)$.

A rule of thumb for this one is that the approximation is good if $\lambda > 5$. However since extensive tables for a range of values of $\lambda$ are available, it is only needed in practice for much larger values of $\lambda$.

Remember that the Poisson distribution is the limiting case of the binomial with $\lambda = np$ as $n \to \infty$ and $p \to 0$. So this is consistent with the rule for the binomial.

The normal approximations to the binomial and Poisson distributions (both discrete) are the most commonly used in practice, and they are needed as the direct calculation of probabilities is computationally awkward without them.

This is the point mentioned in the introduction.
To calculate $P(X < 30)$ where $X \sim Bin(100, 0.3)$, we'd need to work out 30 probabilities, and then add them all up.

2.3 Gamma distribution

Let $X_i$, $i = 1, 2, \ldots, n$ be a sequence of iid $Exp(\lambda)$ variables and let $Y$ be their sum.

The exponential distribution has mean $\mu = 1/\lambda$ and variance $\sigma^2 = 1/\lambda^2$.

Therefore for large $n$, $Y = \sum X_i \mathrel{\dot\sim} N\left(\dfrac{n}{\lambda}, \dfrac{n}{\lambda^2}\right)$.

Recall that, if $X_i \sim Exp(\lambda)$, then $\sum X_i \sim Gamma(n, \lambda)$.

Therefore $Y$, which is $Gamma(n, \lambda)$, will have a normal approximation for large values of $n$.

In fact, $Gamma(\alpha, \lambda)$ can be approximated by $N\left(\dfrac{\alpha}{\lambda}, \dfrac{\alpha}{\lambda^2}\right)$ provided $\alpha$ is large. $\alpha$ need not be an integer.

Since $\chi^2_k = Gamma\left(\tfrac{k}{2}, \tfrac{1}{2}\right)$, $\chi^2_k$ will have a normal approximation $N(k, 2k)$ for large values of its degrees of freedom $k$.

These approximations are poorer than those used for the binomial and Poisson distributions owing to the skewness of the gamma distribution. It is therefore preferable to make use of the exact result from Chapter 3 that if $X \sim Gamma(\alpha, \lambda)$ and $2\alpha$ is an integer, then $2\lambda X \sim \chi^2_{2\alpha}$. We can then use the $\chi^2$ tables to obtain the probabilities.

3 The continuity correction

When dealing with the normal approximations to the binomial and Poisson distributions, which are both discrete, a discrete distribution is being approximated by a continuous one. When using such an approximation the change from discrete to continuous must be allowed for.

For an integer-valued discrete distribution, such as the binomial or Poisson, it is perfectly reasonable to consider individual probabilities such as $P(X = 4)$. However if $X$ is continuous, such as the normal, $P(X = 4)$ is not meaningful and is taken to be zero. For a continuous variable it is sensible to consider only the probability that $X$ lies in some interval.
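Both points made so far about $P(X < 30)$ for $X \sim Bin(100, 0.3)$, that the direct calculation means adding 30 probabilities and that a discrete distribution is being replaced by a continuous one, can be seen in a short R sketch. This is an illustration rather than part of the Core Reading; the variable names are our own, and the adjusted boundary of 29.5 anticipates the continuity correction explained in the rest of this section.

```r
n <- 100
p <- 0.3

# Direct calculation: add up P(X = 0), P(X = 1), ..., P(X = 29).
exact_sum <- sum(dbinom(0:29, size = n, prob = p))
# The same probability from the binomial CDF, P(X <= 29).
exact_cdf <- pbinom(29, size = n, prob = p)

# Normal approximation with the same mean and variance as the binomial.
mu <- n * p
sigma <- sqrt(n * p * (1 - p))
approx_plain <- pnorm(30, mean = mu, sd = sigma)    # boundary left at 30
approx_cc <- pnorm(29.5, mean = mu, sd = sigma)     # boundary moved to 29.5

print(round(c(exact = exact_cdf, plain = approx_plain,
              corrected = approx_cc), 4))
```

Because the mean is exactly 30, leaving the boundary at 30 gives a probability of one half whatever the exact answer is; moving it to 29.5 brings the approximation much closer to the exact value.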
For a continuous distribution, it is not useful to think about the probability of a random variable being exactly equal to a value. For example, for a continuous distribution:

$$P(X = 4) = P(4 \leq X \leq 4) = \int_4^4 f(x)\,dx = 0$$

To allow for this a continuity correction must be used. Essentially it corresponds to treating the integer values as being rounded to the nearest integer.

The diagram below illustrates the problem. The bars correspond to the probabilities for the $Bin(10, 0.5)$ distribution, whereas the graph corresponds to the probability density function for the normal approximation.

[Figure: bar chart of the Bin(10, 0.5) probabilities with the PDF of the normal approximation superimposed; horizontal axis x, vertical axis f(x)]

Since the binomial is a discrete distribution, there are no probabilities for non-integer values, whereas the normal approximation can take any value. To compensate for the gaps between the bars, we suppose that they are actually rounded to the nearest integer. For example, the $x = 6$ bar is assumed to represent values between $x = 5.5$ and $x = 6.5$.

So to use the continuity correction in practice, for example:

$X = 4$ is equivalent to '$3.5 < X < 4.5$'

$X > 15$ is equivalent to '$X > 15.5$'

$X \geq 15$ is equivalent to '$X > 14.5$'

Take the first example. All values that are contained in the interval $3.5 < X < 4.5$ become $X = 4$ when rounded to the nearest whole number. Similarly, values in the interval $X > 15.5$ become values $X > 15$ when rounded to the nearest whole number.

Alternatively, considering the bars on the graph:

[Figure: three bar diagrams showing X = 4 (shaded from 3.5 to 4.5), X > 15 (shaded from 15.5) and X ≥ 15 (shaded from 14.5)]

$X = 4$ must, obviously, include all of the $X = 4$ bar which goes from 3.5 to 4.5.

$X > 15$ must not include the $X = 15$ bar (as it is a strict inequality), therefore it should start from 15.5 (the upper end of the 15 bar).

$X \geq 15$ includes the $X = 15$ bar and higher, therefore it should start from 14.5 (the lower end of the 15 bar).
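The rounding rule described above can be written as a small R helper and checked against the $Bin(10, 0.5)$ distribution used in the diagram. This is a minimal sketch under our own naming: `cc_normal_prob` is a hypothetical helper, not a built-in R function, and it simply widens an integer range by 0.5 at each end before applying the normal CDF.

```r
# Normal approximation to P(lower <= X <= upper) for an integer-valued X,
# widening the range by 0.5 at each end (the continuity correction).
# cc_normal_prob is an illustrative helper, not part of base R.
cc_normal_prob <- function(lower, upper, mean, sd) {
  pnorm(upper + 0.5, mean, sd) - pnorm(lower - 0.5, mean, sd)
}

n <- 10
p <- 0.5
mu <- n * p
sigma <- sqrt(n * p * (1 - p))

# X = 4 is treated as 3.5 < X < 4.5.
approx_eq4 <- cc_normal_prob(4, 4, mu, sigma)
exact_eq4 <- dbinom(4, n, p)

# X > 6 (ie X >= 7) is treated as X > 6.5.
approx_gt6 <- cc_normal_prob(7, Inf, mu, sigma)
exact_gt6 <- 1 - pbinom(6, n, p)

# For contrast: the same probability with the boundary left at 6.
plain_gt6 <- 1 - pnorm(6, mu, sigma)

print(round(c(exact_eq4 = exact_eq4, approx_eq4 = approx_eq4), 4))
print(round(c(exact_gt6 = exact_gt6, approx_gt6 = approx_gt6,
              plain_gt6 = plain_gt6), 4))
```

Passing `Inf` as the upper limit handles one-sided events, since `pnorm(Inf, ...)` is 1.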
Question

Draw the corresponding diagrams for:

(i) $X < 8$

(ii) $X \leq 8$

Hence give each inequality with the continuity correction applied.

Solution

[Figure: two bar diagrams showing X < 8 (shaded up to 7.5) and X ≤ 8 (shaded up to 8.5)]

(i) $X < 8$ must not include the $X = 8$ bar (as it is a strict inequality). So it should start from 7.5 (the lower end of the 8 bar). This gives $X < 7.5$.

(ii) $X \leq 8$ includes the $X = 8$ bar and lower. So it should start from 8.5 (the upper end of the 8 bar). This gives $X < 8.5$.

Let's now see how to calculate a normal approximation to a probability involving a discrete random variable, allowing correctly for the continuity correction.

Question

Let $X$ be a Poisson variable with parameter 20. Use the normal approximation to obtain a value for $P(X \leq 15)$ and use tables to compare with the exact value.

Solution

We have:

$$X \sim Poi(20) \;\Rightarrow\; X \mathrel{\dot\sim} N(20, 20) \;\Rightarrow\; \frac{X - 20}{\sqrt{20}} \mathrel{\dot\sim} N(0,1)$$

Using the continuity correction:

$$P(X \leq 15) \approx P(X < 15.5) = P\left(Z < \frac{15.5 - 20}{\sqrt{20}}\right) = P(Z < -1.006) = 1 - 0.84279 = 0.15721$$

interpolating in tables to be as accurate as possible.

From Poisson tables, $P(X \leq 15) = 0.15651$.

Error = 0.0007, or a 0.45% relative error.

It was mentioned earlier that approximations to the binomial and Poisson distributions are used because the direct calculation of probabilities is computationally awkward. We are now in a position to look at the following example.

Question

The average number of calls received per hour by an insurance company's switchboard is 5. Calculate the probability that in a working day of eight hours, the number of telephone calls received will be:

(i) exactly 36

(ii) between 42 and 45 inclusive.

Assuming that the number of calls has a Poisson distribution, calculate the exact probabilities and also the approximate probabilities using a normal approximation.

Solution

If the number of calls per day is $X$, then $X \sim Poi(40)$.

The exact probabilities are:

(i) $P(X = 36) = \dfrac{40^{36} e^{-40}}{36!} = 0.0539$

(ii) In order to calculate this, we sum the probabilities of getting 42, 43, 44 and 45:

$$P(42 \leq X \leq 45) = \frac{40^{42} e^{-40}}{42!} + \frac{40^{43} e^{-40}}{43!} + \frac{40^{44} e^{-40}}{44!} + \frac{40^{45} e^{-40}}{45!} = 0.0585 + 0.0544 + 0.0495 + 0.0440 = 0.2064$$

The normal approximation to this Poisson distribution is $N(40, 40)$. Calculating the probabilities again, using continuity corrections:

(i) $$P(X = 36) \approx P(35.5 < X < 36.5) = P\left(\frac{35.5 - 40}{\sqrt{40}} < Z < \frac{36.5 - 40}{\sqrt{40}}\right) = \Phi(-0.553) - \Phi(-0.712)$$

$$= \Phi(0.712) - \Phi(0.553) = 0.7617 - 0.7099 = 0.0518$$

(ii) $$P(42 \leq X \leq 45) \approx P(41.5 < X < 45.5) = P\left(\frac{41.5 - 40}{\sqrt{40}} < Z < \frac{45.5 - 40}{\sqrt{40}}\right) = P(0.237 < Z < 0.870)$$

$$= \Phi(0.870) - \Phi(0.237) = 0.8078 - 0.5937 = 0.2141$$

It is evident that in most cases using an approximation makes the calculations easier, and that the values obtained are fairly close to the exact probabilities.

Question

Use a normal approximation to calculate an approximate value for the probability that an observation from a $Gamma(25, 50)$ random variable falls between 0.4 and 0.8.

Solution

The mean and variance of a general gamma distribution are $\dfrac{\alpha}{\lambda}$ and $\dfrac{\alpha}{\lambda^2}$, so here the mean and variance are 0.5 and 0.01 respectively.

If $X \sim Gamma(25, 50)$, then $X \mathrel{\dot\sim} N(0.5, 0.01)$ and:

$$P(0.4 < X < 0.8) \approx P(-1 < Z < 3) = \Phi(3) - \Phi(-1) = \Phi(3) - [1 - \Phi(1)] = 0.99865 - 0.15866 = 0.840$$

No continuity correction is required, as we started with a continuous distribution. The exact answer is 0.8387.

We can also use the Central Limit Theorem to calculate approximate probabilities relating to a sample mean obtained from a random sample from a continuous distribution.

Question

Calculate the approximate probability that the mean of a sample of 10 observations from a $Beta(10, 10)$ random variable falls between 0.48 and 0.52.

Solution

Using the formulae on page 13 of the Tables, the $Beta(10, 10)$ distribution has mean $\dfrac{10}{10 + 10} = 0.5$ and variance:

$$\frac{10 \times 10}{(10 + 10)^2 (10 + 10 + 1)} = 0.01190$$

We have a sample of 10 values.
From the Central Limit Theorem, $\bar{X} \mathrel{\dot\sim} N\left(\mu, \dfrac{\sigma^2}{n}\right)$, so here $\bar{X} \mathrel{\dot\sim} N\left(0.5, \dfrac{0.01190}{10}\right)$, and:

$$P(0.48 < \bar{X} < 0.52) \approx P(-0.5798 < Z < 0.5798) = \Phi(0.5798) - \Phi(-0.5798)$$

$$= \Phi(0.5798) - (1 - \Phi(0.5798)) = 0.71897 - 0.28103 = 0.43794$$

No continuity correction is required as the beta distribution is continuous.

4 Comparing simulated samples

This section of the Core Reading refers to the use of R to simulate random samples. This material is not explained in detail here; we cover it in the PBOR resources for Subject CS1.

We saw in a previous chapter how to use R to simulate samples from standard distributions. We can then obtain the sum or mean of each of these samples.

The following R code uses a loop to obtain the means of 1,000 samples of size 40 from a Poisson distribution with mean 5. It then stores these sample means in the vector xbar:

    set.seed(23)
    xbar <- rep(0, 1000)
    for (i in 1:1000) {
      x <- rpois(40, 5)     # one sample of size 40 from Poi(5)
      xbar[i] <- mean(x)    # store its sample mean
    }

Note that we have used the set.seed function so that you can obtain exactly the same results for your simulation.

The Central Limit Theorem tells us that the distribution of the sample means will approximately have a $N(5, 0.125)$ distribution. The simulated mean and variance of $\bar{x}$ are 5.01135 and 0.1250763 which are very close.

We can compare our observed distribution of the sample means with the Central Limit Theorem by a histogram of the sample means (using the R function hist) and superimposing the normal distribution curve (using the R function curve):

    hist(xbar, prob = TRUE, ylim = c(0, 1.2))
    curve(dnorm(x, mean = 5, sd = sqrt(0.125)), add = TRUE, lwd = 2, col = "red")

Another method of comparing the distribution of our sample means, $\bar{x}$, with the normal distribution is to examine the quantiles.

In R we can find the quantiles of $\bar{x}$ using the quantile function.
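The quantile comparison can be sketched in R as follows. This block is an illustration rather than Core Reading: it regenerates `xbar` with the same seed so that it runs on its own, and the object names `probs`, `q_sample` and `q_normal` are our own. It uses the `type` argument of `quantile` to show several sample quantile definitions side by side, and `qnorm` for the corresponding quartiles of the $N(5, 5/40)$ distribution.

```r
# Regenerate the 1,000 sample means (samples of size 40 from Poi(5)).
set.seed(23)
xbar <- rep(0, 1000)
for (i in 1:1000) {
  xbar[i] <- mean(rpois(40, 5))
}

probs <- c(0.25, 0.5, 0.75)

# Sample quartiles under three of R's quantile definitions.
# Type 7 is R's default setting.
q_sample <- sapply(5:7, function(type) quantile(xbar, probs, type = type))
colnames(q_sample) <- paste0("type", 5:7)

# Quartiles of the approximating normal distribution, N(5, 5/40).
q_normal <- qnorm(probs, mean = 5, sd = sqrt(5 / 40))

print(round(q_sample, 3))
print(round(q_normal, 3))
```

With 1,000 sample means the three definitions should differ very little, which is why the choice of type matters less for large samples.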
Using the default setting (type 7) to obtain the sample lower quartile, median and upper quartile gives 4.775, 5.000 and 5.250, respectively. However in Subject CS1 we prefer to use type 5 or type 6.

In R, we can find the quartiles of the normal distribution using the qnorm function. This gives a lower quartile, median and upper quartile of 4.762, 5.000 and 5.238, respectively.

The quantiles obtained here are those of a normal distribution with mean 5 and variance 5/40.

There is no universal agreement amongst statisticians over how to define sample quantiles. The lower quartile, for example, is sometimes defined to be the position of the $\dfrac{n+1}{4}$th sample value, where $n$ is the sample size. Others may use the $\dfrac{n+2}{4}$th sample value, or even the $\dfrac{n+3}{4}$th value. In R, if we do not specify, R will use $\dfrac{n+3}{4}$ and $\dfrac{3n+1}{4}$ for the lower and upper quartiles. Other definitions can be used by specifying them in the R code.

In fact, when we use R, we are often using quite large sample sizes, in which case the differences between the different definitions will be minimal.

We observe that the distribution of the sample means is slightly more spread out in the tails. This is what we observed in the previous diagram.

A quick way to compare all the quantiles in one go is by drawing a QQ-plot using the R function qqnorm.

If the sample quantiles coincide with the quantiles of the normal distribution, we would observe a perfect diagonal line (which we have added to the diagram for clarity). For our example we can see that $\bar{x}$ and the normal distribution are very similar, except in the tails, where we see that $\bar{x}$ has a lighter lower tail and a heavier upper tail than the normal distribution.

The QQ plot gives us sample quantiles which are very close to the diagonal line.

Care needs to be taken when interpreting a QQ plot. In this example, we see that, at the top end of the distribution, the sample quantiles are slightly larger than we would expect them to be. This suggests that our sample has slightly more weight in the upper tail than the corresponding normal distribution.

At the lower end, again the sample quantiles are slightly larger than we would expect. This suggests that our sample has slightly less weight in the lower tail than the corresponding normal distribution. This might be the case if the sample distribution was (very slightly) positively skewed.

If we use R to calculate the coefficient of skewness for this sample, we obtain a figure of 0.0731. This confirms the very slight positive sample skewness.

Chapter 6 Summary

Central Limit Theorem

If $X_1, \ldots, X_n$ are independent and identically distributed random variables with mean $\mu$ and variance $\sigma^2$ and $n$ is sufficiently large, then:

$$\sum X_i \mathrel{\dot\sim} N(n\mu, n\sigma^2) \quad \text{and} \quad \frac{\sum X_i - n\mu}{\sqrt{n\sigma^2}} \mathrel{\dot\sim} N(0,1)$$

$$\bar{X} \mathrel{\dot\sim} N\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{and} \quad \frac{\bar{X} - \mu}{\sqrt{\sigma^2/n}} \mathrel{\dot\sim} N(0,1)$$

Normal approximations

$Bin(n,p)$ can be approximated by $N(np, npq)$ if $np > 5$, $nq > 5$ (with continuity correction)

$Poi(\lambda)$ can be approximated by $N(\lambda, \lambda)$ if $\lambda$ large (with continuity correction)

$Gamma(\alpha, \lambda)$ can be approximated by $N\left(\dfrac{\alpha}{\lambda}, \dfrac{\alpha}{\lambda^2}\right)$ if $\alpha$ large

$\chi^2_k$ can be approximated by $N(k, 2k)$ if $k$ large

Chapter 6 Practice Questions

6.1 The number of claims arising in a month under a home insurance policy follows the Poisson distribution with mean 0.075. Calculate the approximate probability that at least 50 claims in total arise in a month under a group of 500 independent such policies.
6.2 (Exam style) If $X$ follows the gamma distribution with parameters $\alpha = 10$ and $\lambda = 0.2$, calculate the probability that $X$ exceeds 80:
(a) using a normal distribution
(b) using a chi-squared distribution.
Explain which of these answers is more accurate. [5]

6.3 When using the continuity correction with a random variable $X$ that can take any integer value, write down expressions that are equivalent to the following:
(i) $X < 7$
(ii) $X = 0$
(iii) $X \ge -2$
(iv) $5 < X \le 10$
(v) $3 \le X < 8$
(vi) $4 \le 10X < 48$.

6.4 The probability of any given policy in a portfolio of term assurance policies lapsing before it expires is considered to be 0.15. Consider a random sample of 100 such policies. Calculate the approximate probability that more than 20 policies will lapse before they expire.

6.5 (Exam style) A company issues questionnaires to clients to obtain feedback on the clarity of their brochure. It is thought that 5% of clients do not find the brochure helpful. Let $N$ denote the number of clients who do not find the brochure helpful in a sample of 1,000 responses. Calculate the approximate probability that $40 < N < 70$. [5]

6.6 (Exam style) In a certain large population 45% of people have blood group A. A random sample of 300 individuals is chosen from this population. Calculate an approximate value for the probability that more than 115 of the sample have blood group A. [3]

6.7 (Exam style) Consider a random sample of size 16 taken from a normal distribution with mean $\mu = 25$ and variance $\sigma^2 = 4$. Let the sample mean be denoted $\bar{X}$. State the distribution of $\bar{X}$ and hence calculate the probability that $\bar{X}$ assumes a value greater than 26. [3]

6.8 (Exam style) Suppose that the sums assured under policies of a certain type are modelled by a distribution with mean 8,000 and standard deviation 3,000.
Consider a group of 100 independent policies of this type. Calculate the approximate probability that the total sum assured under this group of policies exceeds 845,000. [3]

6.9 (Exam style) A computer routine selects one of the integers 1, 2, 3, 4, 5 at random and replicates the process a total of 100 times. Let $S$ denote the sum of the 100 numbers selected. Calculate the approximate probability that $S$ assumes a value between 280 and 320 inclusive. [5]

6.10 The random variable $Y$ has a gamma distribution with parameters $\alpha$ ($> 1$) and $\lambda$.
(i) (a) Show that the mode of $Y$ is given by $\frac{\alpha - 1}{\lambda}$.
(b) By considering the relative locations of the mean and mode using sketches of the gamma distribution, state how you would expect the distribution to behave in the limit as $\alpha \to \infty$, but where $\lambda$ is varied so that the mean $\frac{\alpha}{\lambda}$ has a constant value $\mu$.
(ii) Given that $\alpha = 50$ and $\lambda = 0.2$, calculate the value of $P(Y > 350)$ using:
(a) the chi-squared distribution
(b) the Central Limit Theorem.
(iii) Explain the reason for the difference between the answers obtained in part (ii).

6.11 $X_1, \ldots, X_n$ are independent and identically distributed $Gamma(\alpha, \lambda)$ random variables.
(i) Show, using moment generating functions, that $\bar{X}$ has a $Gamma(n\alpha, n\lambda)$ distribution.
(ii) The random variable $T$, representing the total lifetime of an individual light bulb, follows the $Exp(\lambda)$ distribution, where $1/\lambda = 2{,}000$ hours. Calculate the probability that the average lifetime of 10 bulbs will exceed 4,000 hours.

Chapter 6 Solutions

6.1 The number of claims arising from an individual policy in a month follows the $Poi(0.075)$ distribution. Hence, the number of claims arising in a month from 500 independent such policies follows the $Poi(37.5)$ distribution. This is approximated by $N(37.5, 37.5)$.

$P(X \ge 50)$ becomes $P(X > 49.5)$ (continuity correction):

$$P\!\left(Z > \frac{49.5 - 37.5}{\sqrt{37.5}}\right) = P(Z > 1.960) = 1 - \Phi(1.960) = 0.025$$

6.2 (a) If $X \sim \text{Gamma}$
$(10, 0.2)$, then $E(X) = \frac{10}{0.2} = 50$ and $\text{var}(X) = \frac{10}{0.2^2} = 250$. So we will use the $N(50, 250)$ distribution as an approximation. [1]

So:

$$P(X > 80) \approx P\big(N(50,250) > 80\big) = P\!\left(Z > \frac{80 - 50}{\sqrt{250}}\right) = 1 - P\big(N(0,1) < 1.89737\big)$$ [1]

Interpolating in the normal distribution tables between the values for $z = 1.89$ and $z = 1.90$, we find that:

$$P(X > 80) \approx 1 - 0.97111 = 0.02889$$ [1]

ie about 2.9%.

(b) We now use the result that if $X$ is $Gamma(\alpha, \lambda)$, then $2\lambda X$ has a $\chi^2_{2\alpha}$ distribution. So if $X$ is $Gamma(10, 0.2)$, then $0.4X$ is $\chi^2_{20}$, and the required probability is:

$$P(X > 80) = P(0.4X > 32) = P(\chi^2_{20} > 32)$$ [1]

From page 166 of the Tables, we see that the probability that $\chi^2_{20}$ is less than 32 is 0.9567. So the required probability is $1 - 0.9567 = 0.0433$, or about 4.3%. [1]

The answer in (b) is more accurate, since we have not used an approximation. The chi-squared result is exact. [1]

6.3
(i) $X < 7$ becomes $X < 6.5$
(ii) $X = 0$ becomes $-0.5 < X < 0.5$
(iii) $X \ge -2$ becomes $X > -2.5$
(iv) $5 < X \le 10$ becomes $5.5 < X < 10.5$
(v) $3 \le X < 8$ becomes $2.5 < X < 7.5$
(vi) If $X$ can take integer values then $10X$ takes values such as 10, 20, 30, ... . So from the inequality in the question, $10X$ can actually be 10, 20, 30 or 40, which means that $X$ can be 1, 2, 3 or 4. So $1 \le X < 5$, and using a continuity correction on these values, this becomes $0.5 < X < 4.5$.

6.4 Let $X$ be the number of policies lapsing before they expire. $X \sim Bin(100, 0.15)$, which is approximately $N(15, 12.75)$.

Using a continuity correction, $P(X > 20)$ becomes $P(X > 20.5)$:

$$P\!\left(Z > \frac{20.5 - 15}{\sqrt{12.75}}\right) = 1 - \Phi(1.54) = 1 - 0.93822 = 0.06178$$

So the approximate probability that more than 20 policies will lapse is 0.062. The exact answer is 0.0663.

6.5 We have $N \sim Bin(1000, 0.05)$. Using a normal approximation:

$$N \;\dot\sim\; N(50, 47.5)$$ [2]

Using a continuity correction, $P(40 < N < 70)$ becomes $P(40.5 < N < 69.5)$. [1]

Hence:

$$P(40.5 < N < 69.5) = P(N < 69.5) - P(N < 40.5) = P\!\left(Z < \frac{69.5 - 50}{\sqrt{47.5}}\right) - P\!\left(Z < \frac{40.5 - 50}{\sqrt{47.5}}\right)$$
$$= P(Z < 2.829) - [1 - P(Z < 1.378)] = 0.99766 - [1 - 0.9159] = 0.91356$$ [2]

6.6 Let $X$ be the number of individuals with blood group A.

$$X \sim Bin(300, 0.45) \;\dot\sim\; N(135, 74.25)$$ [1]

Using a continuity correction, $P(X > 115)$ becomes $P(X > 115.5)$: [1]

$$P\!\left(Z > \frac{115.5 - 135}{\sqrt{74.25}}\right) = P(Z > -2.263) = P(Z < 2.263) = 0.988$$ [1]

6.7 If our population is normal, we do not need the Central Limit Theorem. The distribution of $\bar{X}$ is exactly normal:

$$\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$$ [1]

Hence:

$$P(\bar{X} > 26) = P\!\left(Z > \frac{26 - 25}{2/\sqrt{16}}\right) = P(Z > 2) = 1 - 0.97725 = 0.02275$$ [2]

6.8 Let $X_i$ be the sum assured under the $i$th policy. We require:

$$P\!\left(\sum_{i=1}^{100} X_i > 845{,}000\right)$$

Now, according to the Central Limit Theorem:

$$\sum_{i=1}^{100} X_i \sim N(100 \times 8000,\; 100 \times 3000^2) \quad\text{(approximately)}$$ [1]

Therefore:

$$P\!\left(\sum_{i=1}^{100} X_i > 845{,}000\right) = P\!\left(Z > \frac{845{,}000 - 800{,}000}{30{,}000}\right) = P(Z > 1.5) = 1 - 0.93319 = 0.06681$$ [2]

6.9 We have the sum of 100 discrete uniform random variables, $X_i$ $(i = 1, 2, \ldots, 100)$. Using the formulae from page 10 of the Tables, with $a = 1$, $b = 5$ and $h = 1$, we get:

$$E(X_i) = \frac{1 + 5}{2} = 3 \qquad \text{var}(X_i) = \frac{1}{12}(5 - 1)(5 - 1 + 2) = 2$$ [1]

Using the Central Limit Theorem:

$$S = \sum_{i=1}^{100} X_i \;\dot\sim\; N(300, 200)$$ [1]

Using a continuity correction, the probability is:

$$P(280 \le S \le 320) = P(279.5 < S < 320.5)$$ [1]

Standardising this:

$$P(279.5 < S < 320.5) = P(S < 320.5) - P(S < 279.5)$$
$$= P\!\left(Z < \frac{320.5 - 300}{\sqrt{200}}\right) - P\!\left(Z < \frac{279.5 - 300}{\sqrt{200}}\right)$$
$$= P(Z < 1.44957) - P(Z < -1.44957)$$
$$= P(Z < 1.44957) - [1 - P(Z < 1.44957)]$$
$$= 2P(Z < 1.44957) - 1 = 2 \times 0.92641 - 1 = 0.85282$$ [2]

6.10 (i)(a) Mode

The mode is the maximum of the PDF $f(y)$:

$$f(y) = \frac{\lambda^\alpha}{\Gamma(\alpha)}\, y^{\alpha - 1} e^{-\lambda y}, \quad y > 0$$

Differentiating and setting the derivative equal to zero gives:

$$\frac{d}{dy} f(y) = \frac{\lambda^\alpha}{\Gamma(\alpha)} \left[(\alpha - 1) y^{\alpha - 2} e^{-\lambda y} - \lambda y^{\alpha - 1} e^{-\lambda y}\right] = \frac{\lambda^\alpha}{\Gamma(\alpha)}\, y^{\alpha - 2} e^{-\lambda y} \left[(\alpha - 1) - \lambda y\right] = 0$$

Alternatively, we could differentiate the log of the PDF.
Setting the derivative equal to zero gives:

$$y = 0 \quad\text{or}\quad y = \frac{\alpha - 1}{\lambda}$$

Since $f(y) \ge 0$ and $f(0) = 0$, the first solution of zero must be a minimum and therefore the second solution must be a maximum.

Alternatively, the second solution can be shown to be a maximum by considering the second derivative:

$$\frac{d^2}{dy^2} f(y) = \frac{\lambda^\alpha}{\Gamma(\alpha)}\, y^{\alpha - 3} e^{-\lambda y} \left[(\alpha - 1)(\alpha - 2) - 2\lambda(\alpha - 1) y + \lambda^2 y^2\right]$$

Substituting $y = \frac{\alpha - 1}{\lambda}$ gives:

$$\frac{d^2}{dy^2} f(y) = \frac{\lambda^\alpha}{\Gamma(\alpha)} \left(\frac{\alpha - 1}{\lambda}\right)^{\alpha - 3} e^{-(\alpha - 1)} \left[(\alpha - 1)(\alpha - 2) - 2(\alpha - 1)^2 + (\alpha - 1)^2\right] = -\frac{\lambda^3}{\Gamma(\alpha)} (\alpha - 1)^{\alpha - 2} e^{-(\alpha - 1)}$$

To ensure this is negative, we require $(\alpha - 1)$ to be positive, hence we have a maximum if $\alpha > 1$, which was given in the question.

(i)(b) Sketch locations of mode and mean

We are letting $\alpha \to \infty$, but keeping $\mu$ constant.

The mean is $\frac{\alpha}{\lambda} = \mu$, which will remain constant.

The mode is $\frac{\alpha - 1}{\lambda} = \frac{\alpha - 1}{\alpha}\,\mu$, which will be less than the mean $\mu$, but will tend to $\mu$ as $\alpha \to \infty$.

So, for large $\alpha$, the distribution looks like this:

[Figure: gamma density $f(y)$ against $y$, with the mode $\frac{\alpha - 1}{\lambda}$ marked just below the mean $\frac{\alpha}{\lambda}$]

The mean and mode are very close together. In fact, the distribution approaches a normal distribution in the limit.

(ii)(a) Probability using chi-squared distribution

$Y \sim Gamma(50, 0.2)$. Using the relationship $Y \sim Gamma(\alpha, \lambda) \Rightarrow 2\lambda Y \sim \chi^2_{2\alpha}$:

$$P(Y > 350) = P(2 \times 0.2\,Y > 2 \times 0.2 \times 350) = P(0.4Y > 140) = P(\chi^2_{100} > 140)$$

Using the $\chi^2$ probabilities on page 169 of the Tables gives a value of approximately 0.5%.

(ii)(b) Probability using normal approximation

The mean and variance of the gamma distribution are:

$$E(Y) = \frac{\alpha}{\lambda} = \frac{50}{0.2} = 250 \qquad \text{var}(Y) = \frac{\alpha}{\lambda^2} = \frac{50}{0.2^2} = 1{,}250$$

By the CLT, the gamma distribution can be approximated by a normal distribution provided $\alpha$ is sufficiently large. Here $\alpha = 50$, which is fairly large, so:

$$Y \;\dot\sim\; N(250, 1250)$$

Hence:

$$P(Y > 350) \approx P\!\left(Z > \frac{350 - 250}{\sqrt{1{,}250}}\right) = P(Z > 2.828) = 1 - 0.99766$$
$= 0.00234$, ie 0.234%.

(iii) Explain the differences

The gamma distribution is always positively skewed, although it becomes more symmetrical as $\alpha \to \infty$. As a consequence, its upper tail is thicker than that of a symmetrical distribution and the corresponding tail probabilities are higher.

6.11 (i) Show mean has a gamma distribution

We have:

$$M_{\bar{X}}(t) = E\!\left[e^{t\bar{X}}\right] = E\!\left[e^{\frac{t}{n}(X_1 + \cdots + X_n)}\right]$$
$$= E\!\left[e^{\frac{t}{n}X_1}\right] \cdots E\!\left[e^{\frac{t}{n}X_n}\right] \quad\text{by independence}$$
$$= M_{X_1}\!\left(\tfrac{t}{n}\right) \cdots M_{X_n}\!\left(\tfrac{t}{n}\right) = \left[M_X\!\left(\tfrac{t}{n}\right)\right]^n \quad\text{as the } X_i\text{'s are identical}$$
$$= \left[\left(1 - \frac{t}{n\lambda}\right)^{-\alpha}\right]^n = \left(1 - \frac{t}{n\lambda}\right)^{-n\alpha}$$

This is the MGF of the $Gamma(n\alpha, n\lambda)$ distribution. Hence, by the uniqueness property of MGFs, $\bar{X}$ follows the $Gamma(n\alpha, n\lambda)$ distribution.

(ii) Probability that average lifetime of 10 bulbs exceeds 4,000 hours

The individual lifetimes $T$ follow the $Exp(\lambda)$ distribution, which is the same as the $Gamma(1, \lambda)$ distribution. So, using the result from part (i) we have:

$$\bar{T} \sim Gamma\!\left(10 \times 1,\; 10 \times \tfrac{1}{2{,}000}\right) = Gamma(10, 0.005)$$

Using the result from page 12 of the Tables, the probability that the average lifetime $\bar{T}$ will exceed 4,000 hours is:

$$P(\bar{T} > 4{,}000) = P(\chi^2_{20} > 2 \times 0.005 \times 4{,}000) = P(\chi^2_{20} > 40)$$

From page 166 of the Tables, this is 0.005. So the probability that the average lifetime will exceed 4,000 hours is 0.5%.
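The continuity-corrected normal approximation used in Solution 6.1 can be checked numerically. The sketch below is only an illustration using Python's standard library (the helper names `poisson_at_least` and `normal_approx_at_least` are made up for this example); it compares the approximation with the Poisson tail summed directly from the probability function.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist


def poisson_at_least(k, mean):
    """Exact P(X >= k) for X ~ Poisson(mean), summing the pmf over 0..k-1."""
    below = sum(exp(i * log(mean) - mean - lgamma(i + 1)) for i in range(k))
    return 1.0 - below


def normal_approx_at_least(k, mean, variance):
    """P(X >= k) under N(mean, variance), continuity-corrected to P(X > k - 0.5)."""
    return 1.0 - NormalDist(mean, sqrt(variance)).cdf(k - 0.5)


# Solution 6.1: 500 independent Poi(0.075) policies give a Poi(37.5) total.
total_mean = 500 * 0.075
approx = normal_approx_at_least(50, total_mean, total_mean)
exact = poisson_at_least(50, total_mean)

print(f"normal approximation: {approx:.4f}")
print(f"exact Poisson tail:   {exact:.4f}")
```

The approximation reproduces the 0.025 obtained above. The directly summed tail comes out somewhat larger, which is consistent with the positive skewness of the Poisson distribution.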
CS1-07: Sampling and statistical inference

Syllabus objectives

2.3 Random sampling and sampling distributions
2.3.1 Explain what is meant by a sample, a population and statistical inference.
2.3.2 Define a random sample from a distribution of a random variable.
2.3.3 Explain what is meant by a statistic and its sampling distribution.
2.3.4 Determine the mean and variance of a sample mean and the mean of a sample variance in terms of the population mean and variance and the sample size.
2.3.5 State and use the basic sampling distributions for the sample mean and the sample variance for random samples from a normal distribution.
2.3.6 State and use the distribution of the t-statistic for random samples from a normal distribution.
2.3.7 State and use the F distribution for the ratio of two sample variances from independent samples taken from normal distributions.

0 Introduction

When a sample is taken from a population the sample information can be used to infer certain things about the population. For example, to estimate a population quantity or test the validity of a statement made about the population.

A population quantity could be its mean or variance, for example. So we might be testing the mean of a normal distribution, say.

In this chapter, we will consider taking a sample from a distribution and calculating its mean and variance. If we were to keep taking samples from the same distribution and calculating the mean and variance for each of the samples, we would find that these values also form probability distributions.
The distributions of the sample mean and sample variance are called sampling distributions and will be used extensively in Chapters 9 and 10 to construct confidence intervals and carry out hypothesis tests.

Part of this work will explain mathematically why the sample variance is usually defined to be $S^2 = \frac{1}{n-1}\left\{\sum X^2 - n\bar{X}^2\right\}$ rather than $S^2 = \frac{1}{n}\left\{\sum X^2 - n\bar{X}^2\right\}$.

We will also make use of the Central Limit Theorem from Chapter 6 to obtain the asymptotic distribution of the sample mean. Finally, this chapter will look at the t distribution and the F distribution in greater detail.

You will require a copy of the Formulae and Tables for the Actuarial Examinations to be able to progress through this chapter.

1 Basic definitions

The statistical method for testing assertions such as 'smoking reduces life expectancy' involves selecting a sample of individuals from the population and, on the basis of the attributes of the sample, making statistical inferences about the corresponding attributes of the parent population.

This is done by assuming that the variation in the attribute in the parent population can be modelled using a statistical distribution. The inference can then be carried out on the basis of the properties of this distribution.

Theoretically this (technique) deals with samples from infinite populations. Actuaries are concerned with sampling from populations of policyholders, policies, claims, buildings, employees, etc. Such populations may be looked upon as conceptually infinite but even without doing so, they will be very large populations of many thousands and so the methods for infinite populations will be more than adequate.
1.1 Random samples

A set of items selected from a parent population is a random sample if:
the probability that any item in the population is included in the sample is proportional to its frequency in the parent population, and
the inclusion/exclusion of any item in the sample operates independently of the inclusion/exclusion of any other item.

A random sample is made up of (iid) random variables and so they are denoted by capital X's. We will use the shorthand notation $\underline{X}$ to denote a random sample, that is, $\underline{X} = (X_1, X_2, \ldots, X_n)$. An observed sample will be denoted by $\underline{x} = (x_1, x_2, \ldots, x_n)$. The population distribution will be specified by a density (or probability function) denoted by $f(x; \theta)$, where $\theta$ denotes the parameter(s) of the distribution.

Due to the Central Limit Theorem, inference concerning a population mean can be considered without specifying the form of the population, provided the sample size is large enough.

Question

Identify the population, the sample and the statistical inference in each of the following examples.
(i) We are studying 10 cities to establish whether air pollution levels are acceptable in UK cities.
(ii) We are analysing the burglary claims for last January to get a feel for what the total range of claims might be for the whole year.

Solution

(i) Air pollution

The population consists of all cities in the UK.

The sample consists of the 10 cities selected for study (and the measurements of the pollution levels for these).

The statistical inference required here is to assess whether there are unacceptable pollution levels in UK cities in general. This is an example of a statistical test.

(ii) Burglary claims

The population consists of all possible claims that could arise during the year.

The sample consists of the amounts paid for each of the January claims.
The statistical inference required here is to find an approximate range for the total claim amount for the year. This is an example of a confidence interval.

1.2 Definition of a statistic

A statistic is a function of $\underline{X}$ only and does not involve any unknown parameters. Thus $\bar{X} = \frac{\sum X_i}{n}$ and $S^2 = \frac{1}{n-1}\sum (X_i - \bar{X})^2$ are statistics whereas $\frac{1}{n}\sum (X_i - \mu)^2$ is not, unless of course $\mu$ is known.

Note here the difference between $\mu$, which is the population mean (ie the mean for all possible observations, which is usually unknown) and $\bar{X}$, which is the sample mean (ie the mean of the sample values which we can calculate for any given sample). We might also be interested in statistics such as $\max X_i$, the highest value in the sample.

A statistic can be generally denoted by $g(\underline{X})$. Since a statistic is a function of random variables, it will be a random variable itself and will have a distribution, its sampling distribution.

2 Moments of the sample mean and variance

In the following section we will look at the statistical properties of the sample mean and sample variance, which are the most important sample statistics.

2.1 The sample mean

Suppose $X_i$ has mean $\mu$ and variance $\sigma^2$. Recall that the sample mean is $\bar{X} = \frac{\sum X_i}{n}$.

Consider first $\sum X_i$:

$$E\!\left[\sum X_i\right] = \sum E[X_i] = n\mu \quad\text{since they are identically distributed}$$
$$\text{var}\!\left[\sum X_i\right] = \sum \text{var}[X_i] \quad\text{since they are independent}$$
$$= n\sigma^2 \quad\text{since they are identically distributed}$$

We are using the results from Chapter 4 that $E[X_1 + \cdots + X_n] = E[X_1] + \cdots + E[X_n]$, and if $X_1, \ldots, X_n$ are independent, $\text{var}[X_1 + \cdots + X_n] = \text{var}[X_1] + \cdots + \text{var}[X_n]$.

As $\bar{X} = \frac{1}{n}\sum X_i$, we can now write down that:

$$E[\bar{X}] = \mu \quad\text{and}\quad \text{var}[\bar{X}] = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n}$$

Note: $\text{sd}[\bar{X}] = \frac{\sigma}{\sqrt{n}}$ is called the standard error of the sample mean.
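The two moment results for the sample mean derived above hold for any population with finite variance, so they can be checked by simulation. The following is a minimal sketch in Python's standard library, assuming an exponential population with rate 0.5 (so $\mu = 2$ and $\sigma^2 = 4$); the function name `simulate_sample_means`, the seed and the sample sizes are illustrative choices, not part of the course material.

```python
import random
from statistics import fmean, pvariance


def simulate_sample_means(n, reps, rate, seed=2022):
    """Return `reps` sample means, each computed from n iid Exp(rate) draws."""
    rng = random.Random(seed)
    return [fmean([rng.expovariate(rate) for _ in range(n)]) for _ in range(reps)]


n, reps, rate = 30, 20_000, 0.5            # population mean 2, variance 4
means = simulate_sample_means(n, reps, rate)

mu, sigma2 = 1 / rate, 1 / rate**2
print(f"mean of sample means:     {fmean(means):.4f}  (theory {mu:.4f})")
print(f"variance of sample means: {pvariance(means):.4f}  (theory {sigma2 / n:.4f})")
```

The simulated values should sit close to $\mu$ and $\sigma^2/n$, even though the exponential population is far from symmetrical.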
We have established that the sample mean $\bar{X}$ has an expected value of $\mu$ (ie the same as the population mean) and a variance of $\frac{\sigma^2}{n}$ (ie the population variance divided by the sample size).

These are very important results and will be used extensively in Chapters 9 and 10.

A consequence of the result for the variance of $\bar{X}$ is that as the sample gets bigger the variance gets smaller. This should be intuitive since a bigger sample produces more accurate results.

2.2 The sample variance

Recall that the sample variance is $S^2 = \frac{1}{n-1}\sum (X_i - \bar{X})^2$.

Considering only the mean of $S^2$, it can be proved that $E[S^2] = \sigma^2$ as follows:

$$S^2 = \frac{1}{n-1}\left\{\sum X_i^2 - n\bar{X}^2\right\}$$

Taking expectations and noting that for any random variable $Y$, $E[Y^2] = \text{var}[Y] + (E[Y])^2$ (obtained by rearranging $\text{var}(Y) = E(Y^2) - [E(Y)]^2$) leads to:

$$E[S^2] = \frac{1}{n-1}\left\{\sum E[X_i^2] - nE[\bar{X}^2]\right\}$$
$$= \frac{1}{n-1}\left\{\sum (\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} + \mu^2\right)\right\}$$
$$= \frac{1}{n-1}\left\{n\sigma^2 + n\mu^2 - \sigma^2 - n\mu^2\right\}$$
$$= \frac{1}{n-1}\left\{(n-1)\sigma^2\right\} = \sigma^2$$

as required.

To work out $E[\bar{X}^2]$, we've used the general formula just mentioned, which tells us that $E[\bar{X}^2] = \text{var}(\bar{X}) + (E[\bar{X}])^2$, and then we've used the results we just derived for the sample mean.

The denominator of $n-1$ is used to make the mean of $S^2$ equal to the true value of $\sigma^2$.

This is the motivation behind the definition of the sample variance. Later in Chapter 8, we will discover that this result means that the sample variance is an unbiased estimator of the population variance.

There is no general formula for $\text{var}[S^2]$. This depends on the specific distribution of the population. The only one that you will be required to know for Subject CS1 is for a normal population. This is covered in Section 3.2.

Question

The total number of new motor insurance claims reported to a particular branch of an insurance company on successive days during a randomly selected month can be considered to come from the Poisson distribution with $\lambda = 5$. Calculate the mean and variance of a sample mean based on 30 days' figures.
Solution

The Poisson distribution in the question has mean and variance of 5. If the sample size is 30 then $E[\bar{X}] = 5$ and $\text{var}[\bar{X}] = \frac{5}{30} = 0.167$.

We can apply the same theory to situations involving a continuous distribution.

Question

Calculate the mean and variance of the sample mean for samples of size 110 from a parent population which is Pareto with parameters $\alpha = 5$ and $\lambda = 3{,}000$.

Solution

The Pareto distribution has a mean of $\frac{\lambda}{\alpha - 1}$ and variance of $\frac{\alpha\lambda^2}{(\alpha - 1)^2(\alpha - 2)}$, so the distribution in the question has $\mu = 750$ and $\sigma^2 = 937{,}500$.

Thus $E[\bar{X}] = 750$ and $\text{var}[\bar{X}] = \frac{937{,}500}{110} = 8{,}522.7$.

The formulae for the mean and variance of a Pareto distribution are given on page 14 of the Tables.

3 Sampling distributions for the normal

3.1 The sample mean

The Central Limit Theorem provides a large-sample approximate sampling distribution for $\bar{X}$ without the need for any distributional assumptions about the population. So for large $n$:

$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \;\dot\sim\; N(0,1) \quad\text{or}\quad \bar{X} \;\dot\sim\; N\!\left(\mu, \frac{\sigma^2}{n}\right)$$

This result is often called the z result.

It transpires that the above result gives the exact sampling distribution of $\bar{X}$ for random samples from a normal population.

3.2 The sample variance

The sampling distribution of $S^2$ when sampling from a normal population, with mean $\mu$ and variance $\sigma^2$, is:

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$

This is a more advanced result. Its proof is beyond the scope of Subject CS1.

Whereas the distribution of $\bar{X}$ is normal and hence symmetrical, the distribution of $S^2$ is positively skewed, especially so for small $n$, but becoming symmetrical for large $n$.
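Although the proof is outside the syllabus, the $\chi^2_{n-1}$ result above can be made plausible by simulation. The sketch below is an illustration only: it assumes samples of size 5 from $N(100, 25^2)$, uses the standard fact that a $\chi^2_k$ variable has mean $k$ and variance $2k$, and the function name `scaled_sample_variances` is hypothetical.

```python
import random
from statistics import fmean, pvariance, variance


def scaled_sample_variances(n, reps, mu, sigma, seed=2022):
    """Return `reps` simulated values of (n - 1) * S^2 / sigma^2 for N(mu, sigma^2) samples."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        out.append((n - 1) * variance(sample) / sigma**2)   # variance() divides by n - 1
    return out


n = 5
values = scaled_sample_variances(n, reps=20_000, mu=100, sigma=25)
centre = fmean(values)
# Coefficient of skewness: third central moment over (variance)^(3/2)
skew = fmean([(v - centre) ** 3 for v in values]) / pvariance(values) ** 1.5

print(f"mean     {centre:.3f}  (chi-squared value: {n - 1})")
print(f"variance {pvariance(values):.3f}  (chi-squared value: {2 * (n - 1)})")
print(f"skewness {skew:.3f}  (positive for a small sample size)")
```

Rerunning with a larger `n` should show the skewness shrinking, in line with the remark that the distribution becomes more symmetrical for large $n$.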
[Figure: two density plots of $f(x)$ against $x$, labelled $\chi^2_4$ and $\chi^2_{20}$]

Using the $\chi^2$ result to investigate the first and second order moments of $S^2$, when sampling from a normal population, and the fact that the mean and variance of $\chi^2_k$ are $k$ and $2k$, respectively:

$$E\!\left[\frac{(n-1)S^2}{\sigma^2}\right] = n - 1 \;\Rightarrow\; E[S^2] = \frac{\sigma^2}{n-1}(n-1) = \sigma^2$$

This is the result in Section 2.2, in the context of a normal distribution.

We also have:

$$\text{var}\!\left[\frac{(n-1)S^2}{\sigma^2}\right] = 2(n-1) \;\Rightarrow\; \text{var}[S^2] = \frac{\sigma^4}{(n-1)^2}\, 2(n-1) = \frac{2\sigma^4}{n-1}$$

These results are important.

For both $\bar{X}$ and $S^2$ the variances decrease and tend to zero as the sample size $n$ increases. Added to the facts that $E[\bar{X}] = \mu$ and $E[S^2] = \sigma^2$, these imply that $\bar{X}$ gets closer to $\mu$ and $S^2$ gets closer to $\sigma^2$ as the sample size increases. These are desirable properties of estimators of $\mu$ and $\sigma^2$.

Question

Calculate the probability that, for a random sample of 5 values taken from a $N(100, 25^2)$ population:
(i) $\bar{X}$ will be between 80 and 120
(ii) $S$ will exceed 41.7.

Solution

(i) Since $\bar{X} \sim N\!\left(100, \frac{25^2}{5}\right) = N(100, 125)$:

$$P(80 < \bar{X} < 120) = P\!\left(\frac{80 - 100}{\sqrt{125}} < Z < \frac{120 - 100}{\sqrt{125}}\right) = P(-1.789 < Z < 1.789)$$
$$= \Phi(1.789) - \Phi(-1.789) = 0.96319 - (1 - 0.96319) = 0.926$$

(ii) Since $\frac{4S^2}{\sigma^2} \sim \chi^2_4$, we have:

$$P(S > 41.7) = P\!\left(\frac{4S^2}{\sigma^2} > \frac{4 \times 41.7^2}{25^2}\right) = P(\chi^2_4 > 11.13) = 1 - P(\chi^2_4 < 11.13)$$

Interpolating between values taken from page 165 of the Tables gives:

$$P(S > 41.7) \approx 0.0253$$

3.3 Independence of the sample mean and variance

The other important feature when sampling from normal populations is the independence of $\bar{X}$ and $S^2$. A full proof of this is not trivial but it is a result that is easily appreciated as follows.

Suppose that a sample from some normal distribution has been simulated. The value of $\bar{x}$ does not give any information about the value of $s^2$.
Remember that changing the mean of a normal distribution shifts the graph to the left or right. Changing the variance squashes the graph up or stretches it out.

However, if the sample is from some exponential distribution, the value of $\bar{x}$ does give information about the value of $s^2$, as $\mu$ and $\sigma^2$ are related.

For the exponential distribution, these are directly linked since $\mu = \frac{1}{\lambda}$ and $\sigma^2 = \frac{1}{\lambda^2}$.

Other cases such as Poisson, binomial and gamma can be considered in a similar way, but only the normal has the independence property.

Question

Calculate the probability that, for the sample in the previous question, (i) and (ii) will both occur.

Solution

Since $\bar{X}$ and $S^2$ are independent, we can factorise the probability:

$$P(80 < \bar{X} < 120 \,\cap\, S > 41.7) = P(80 < \bar{X} < 120) \times P(S > 41.7)$$

Referring back to the previous question, we have already found the probabilities. So:

$$P(80 < \bar{X} < 120 \,\cap\, S > 41.7) = 0.926 \times 0.0253 = 0.023$$

4 The t result

The sampling distribution for $\bar{X}$, that is, $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$ or $\bar{X} \sim N(\mu, \sigma^2/n)$, will be used in subsequent units for inference concerning $\mu$ when the population variance $\sigma^2$ is known. However this is rare in practice, and another result is needed for the realistic situation when $\sigma^2$ is unknown. This is the t result or the t sampling distribution.

The t result is similar to the z result but with $\sigma$ replaced by $S$ and $N(0,1)$ replaced by $t_{n-1}$. Thus:

$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$$

It is not a sampling distribution for $\bar{X}$ alone as it involves a combination of $\bar{X}$ and $S$.

The $t_k$ variable is defined by:

$$t_k = \frac{N(0,1)}{\sqrt{\chi^2_k / k}}$$

where the $N(0,1)$ and $\chi^2_k$ random variables are independent.

Then the t result above follows from the sampling distributions of the last section, that is, $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is the $N(0,1)$ and $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$ is the $\chi^2_k$, together with their independence, to obtain $\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$ when sampling from a normal population.
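The t result can also be illustrated by simulation. The sketch below assumes samples of size 5 from $N(100, 25^2)$ and takes 1.533 as the tabulated upper 10% point of the $t_4$ distribution; the function name `t_statistics`, the seed and the number of repetitions are illustrative assumptions. It estimates the probability that the t statistic exceeds 1.533 and compares it with the corresponding standard normal tail.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, stdev


def t_statistics(n, reps, mu, sigma, seed=2022):
    """Return `reps` values of (Xbar - mu) / (S / sqrt(n)) for N(mu, sigma^2) samples."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        # stdev() uses the n - 1 denominator, matching the definition of S
        out.append((fmean(sample) - mu) / (stdev(sample) / sqrt(n)))
    return out


stats = t_statistics(n=5, reps=40_000, mu=100, sigma=25)
tail = sum(t > 1.533 for t in stats) / len(stats)
normal_tail = 1 - NormalDist().cdf(1.533)

print(f"simulated P(T > 1.533):   {tail:.4f}")
print(f"N(0,1) tail beyond 1.533: {normal_tail:.4f}")
```

The simulated proportion should be close to 10%, noticeably above the standard normal tail probability, reflecting the extra variability introduced by replacing $\sigma$ with $S$.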
The t distribution is symmetrical about zero and its critical points are tabulated.

Percentage points (or critical points) for the t distribution can be found on page 163 of the Tables. The t distribution has one parameter, which, like the $\chi^2$ distribution, is called the number of degrees of freedom. When using the t distribution, the number of degrees of freedom is the same as the number we divide by when estimating the variance.

A graph of the t distribution is also given on page 163 of the Tables. It looks similar to the standard normal (ie symmetrical) especially for large values of degrees of freedom. The following picture shows a $t_2$ density, a $t_{10}$ density and a $N(0,1)$ density for comparison.

In fact, as $k \to \infty$, $t_k \to N(0,1)$.

The $t_1$ distribution is also called the Cauchy distribution and is peculiar in that none of its moments exist, not even its mean. However since samples of size 2 are unrealistic, it should not arise as a sampling distribution.

For $k > 2$, the $t_k$ distribution has mean 0 and variance $k/(k-2)$.

Question

State the distribution of $\frac{\bar{X} - 100}{S/\sqrt{5}}$ for a random sample of 5 values taken from a $N(100, \sigma^2)$ population.

Calculate the probability that this quantity exceeds 1.533.

Solution

From previous results $\frac{\bar{X} - 100}{S/\sqrt{5}} \sim t_4$.

From the Tables, we see that the probability that this quantity will exceed 1.533 is 10%.

We now consider the situation involving two samples from different normal populations.

Question

Independent random samples of size $n_1$ and $n_2$ are taken from the normal populations $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$ respectively.
(i) Write down the sampling distributions of $\bar{X}_1$ and $\bar{X}_2$ and hence determine the sampling distribution of $\bar{X}_1 - \bar{X}_2$, the difference between the sample means.
(ii) Now assume that $\sigma_1^2 = \sigma_2^2 = \sigma^2$.
(a) Express the sampling distribution of $\bar{X}_1 - \bar{X}_2$ in standard normal form.
(b) State the sampling distribution of $\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{\sigma^2}$.
(c) Using the $N(0,1)$ distribution from (a) and the $\chi^2$ distribution from (b), apply the definition of the t distribution to find the sampling distribution of $\bar{X}_1 - \bar{X}_2$ when $\sigma^2$ is unknown.

Solution

(i) $\bar{X}_1$ is $N\!\left(\mu_1, \frac{\sigma_1^2}{n_1}\right)$ and $\bar{X}_2$ is $N\!\left(\mu_2, \frac{\sigma_2^2}{n_2}\right)$.

$\bar{X}_1 - \bar{X}_2$ is the difference between two independent normal variables and so is itself normal, with mean $\mu_1 - \mu_2$ and variance $\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$.

(ii)(a) The variance of $\bar{X}_1 - \bar{X}_2$ is now $\sigma^2\!\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$ and so standardising gives:

$$\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim N(0,1)$$

(ii)(b) As $\frac{(n_1 - 1)S_1^2}{\sigma^2} \sim \chi^2_{n_1 - 1}$ and $\frac{(n_2 - 1)S_2^2}{\sigma^2} \sim \chi^2_{n_2 - 1}$ are independent (because the samples are independent), their sum is also $\chi^2$, with $n_1 + n_2 - 2$ degrees of freedom. This is using the additive property of independent $\chi^2$ distributions (ie $\chi^2_m + \chi^2_n = \chi^2_{m+n}$), which we proved in Chapter 4, Section 4.2.

(ii)(c) We use the definition of the t distribution:

$$t_k = \frac{N(0,1)}{\sqrt{\chi^2_k / k}}$$

The distribution in part (ii)(a) is $N(0,1)$, and the distribution in part (ii)(b) is $\chi^2_{n_1 + n_2 - 2}$. So:

$$\frac{\dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}}{\sqrt{\dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{\sigma^2 (n_1 + n_2 - 2)}}} \sim t_{n_1 + n_2 - 2}$$

The $\sigma$'s cancel to give:

$$\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \sim t_{n_1 + n_2 - 2}$$

We will see that $\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$, which appears in the denominator, is the pooled variance of the two samples. It is a weighted average of the individual sample variances, using the degrees of freedom as the weightings.

5 The F result for variance ratios

The F distribution is defined by $F = \frac{U/\nu_1}{V/\nu_2}$, where $U$ and $V$ are independent $\chi^2$ random variables with $\nu_1$ and $\nu_2$ degrees of freedom respectively.
Thus if independent random samples of size $n_1$ and $n_2$ respectively are taken from normal populations with variances $\sigma_1^2$ and $\sigma_2^2$, then:

$$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{n_1 - 1,\, n_2 - 1}$$

The F distribution gives us the distribution of the variance ratio for two normal populations. $\nu_1$ and $\nu_2$ can be referred to as the number of degrees of freedom in the numerator and denominator, respectively.

It should be noted that it is arbitrary which one is the numerator and which is the denominator and so:

$$\frac{S_2^2/\sigma_2^2}{S_1^2/\sigma_1^2} \sim F_{n_2 - 1,\, n_1 - 1}$$

Since it is arbitrary which value is the numerator and which is the denominator, and since only the upper percentage points are tabulated, it is usually easier to put the larger value of the sample variance into the numerator and the smaller sample variance into the denominator.

Alternatively, $F \sim F_{n_1 - 1,\, n_2 - 1} \Rightarrow \frac{1}{F} \sim F_{n_2 - 1,\, n_1 - 1}$.

This reciprocal form is needed when using tables of critical points, as only upper tail points are tabulated. See Formulae and Tables.

This is an important result and will be used in Chapter 9 in the work on confidence intervals and Chapter 10 in the work on hypothesis tests.

The percentage points for the F distribution can be found on pages 170-174 of the Tables.

Question

Determine:
(i) $P(F_{9,10} > 3.779)$
(ii) $P(F_{12,14} < 3.8)$
(iii) $P(F_{11,8} < 0.3392)$
(iv) the value of $p$ such that $P(F_{14,6} < p) = 0.01$.

Solution

By referring to the Tables on pages 170 to 174:

(i) 3.779 is greater than 1, so we use the table of upper percentage points to see that:

$$P(F_{9,10} > 3.779) = 0.025$$

ie 3.779 is the 2½% point of the $F_{9,10}$ distribution (page 173).

(ii) Since 3.8 is greater than 1, it is again an upper value and so we use the Tables directly. We turn the probability around as follows:

$$P(F_{12,14} < 3.8) = 1 - P(F_{12,14} > 3.8) = 1 - 0.01 = 0.99$$

(iii) Since this is a lower percentage point, we need to use the $\frac{1}{F_{m,n}}$ result:

$$P(F_{11,8} < 0.3392) = P\!\left(\frac{1}{F_{11,8}} > \frac{1}{0.3392}\right)$$
$$P(F_{11,8} < 0.3392) = P\left(\frac{1}{F_{11,8}} > \frac{1}{0.3392}\right) = P(F_{8,11} > 2.948) = 0.05$$

(iv) Since only 1% of the distribution is below $p$, this implies that $p$ must be a lower percentage point. So we use the $\dfrac{1}{F_{m,n}}$ result again:

$$P(F_{14,6} < p) = P\left(F_{6,14} > \frac{1}{p}\right) = 0.01 \;\Rightarrow\; \frac{1}{p} = 4.456 \;\Rightarrow\; p = 0.2244$$

The mean of the $F_{m,n}$ distribution is $n/(n-2)$ for $n > 2$, which is close to 1 unless the denominator degrees of freedom are small. So values such as 0.3392 and 0.2244 given above are values in the lower tail, whereas 3.779 and 3.8 are upper tail values.

We now apply the $F$ result to problems involving sample variances.

Question

For random samples of size 10 and 25 from two normal populations with equal variances, use the $F$ distribution to determine the values of $\alpha$ and $\beta$ such that $P\left(\dfrac{S_1^2}{S_2^2} > \alpha\right) = 0.05$ and $P\left(\dfrac{S_1^2}{S_2^2} < \beta\right) = 0.05$, where subscript 1 represents the sample of size 10 and subscript 2 represents the sample of size 25.

Solution

Since the population variances are equal, $\dfrac{S_1^2}{S_2^2} \sim F_{9,24}$ and $\dfrac{S_2^2}{S_1^2} \sim F_{24,9}$.

From the table of 5% points for the $F$ distribution on page 172 of the Tables, we find that $P(F_{9,24} > 2.300) = 0.05$, and therefore $\alpha = 2.300$.

Now we know that $\dfrac{S_1^2}{S_2^2} < \beta$ is equivalent to $\dfrac{S_2^2}{S_1^2} > \dfrac{1}{\beta}$, and $P(F_{24,9} > 2.900) = 0.05$, giving $\beta = \dfrac{1}{2.900} = 0.345$.

We can use the $F$ distribution to obtain probabilities relating to the ratio of two different sample variances.

Question

Calculate the probability that the sample variance of a sample of 10 values from a normal distribution will be more than 6 times the sample variance of a sample of 5 values from an independent normal distribution with the same variance.

Solution

If $X$ denotes the sample with 10 values and $Y$ denotes the sample with 5 values, we know that as these are from independent normal distributions:

$$\frac{S_X^2/\sigma_X^2}{S_Y^2/\sigma_Y^2} \sim F_{9,4}$$

Since the population variances are equal, this means that $\dfrac{S_X^2}{S_Y^2} \sim F_{9,4}$. So:

$$P(S_X^2 > 6 S_Y^2) = P\left(\frac{S_X^2}{S_Y^2} > 6\right) = P(F_{9,4} > 6)$$

From page 172 of the Tables we see that the upper 5% point of $F_{9,4}$ is 5.999.
So the required probability is just less than 5%.

Chapter 7 Summary

The sample mean and sample variance are given by:

$$\bar X = \frac{1}{n}\sum X_i \qquad\text{and}\qquad S^2 = \frac{1}{n-1}\sum (X_i - \bar X)^2 = \frac{1}{n-1}\left(\sum X_i^2 - n\bar X^2\right)$$

We can find their sampling means and variances.

For any distribution:

$$E(\bar X) = \mu \qquad \operatorname{var}(\bar X) = \frac{\sigma^2}{n} \qquad E(S^2) = \sigma^2$$

For a normal distribution only:

$$\operatorname{var}(S^2) = \frac{2\sigma^4}{n-1}$$

The standard deviation of the sample mean is known as the standard error of the sample mean.

To find probabilities involving $\bar X$ or $S^2$, we need their distributions. For a large sample from any distribution:

$$\bar X \sim N\left(\mu, \frac{\sigma^2}{n}\right) \text{ approximately}$$

$\bar X$ has this exact distribution (rather than it being approximate) for any size of sample from a normal distribution.

When sampling from a normal population, the sample mean and variance are independent.

For a random sample from a normal population, if $\sigma^2$ is known:

$$\frac{\bar X - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$$

If $\sigma^2$ is unknown:

$$\frac{\bar X - \mu}{S/\sqrt{n}} \sim t_{n-1}$$

For a random sample from a normal population:

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$

If we take random samples from two independent normal populations:

$$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{n_1-1,\,n_2-1}$$

The $t$ and $F$ distributions are defined as:

$$t_k = \frac{N(0,1)}{\sqrt{\chi^2_k/k}} \qquad F_{m,n} = \frac{\chi^2_m/m}{\chi^2_n/n}$$

To determine probabilities involving the lower tail of the $F$ distribution, we use the result:

$$P(F_{m,n} < k) = P(F_{n,m} > 1/k)$$

Chapter 7 Practice Questions

7.1 A random sample of $n$ observations is taken from a normal distribution with mean $\mu$ and variance $\sigma^2$. The sample variance is an observation of a random variable $S^2$.
Derive expressions for $E(S^2)$ and $\operatorname{var}(S^2)$ using the relationship between the gamma and chi-squared distributions given on page 12 of the Tables.

7.2 (i) Determine:
(a) $P(F_{3,9} < 3.863)$
(b) $P(F_{10,10} < 0.269)$

(ii) Determine the value of $p$ such that:
(a) $P(F_{24,30} > p) = 0.10$
(b) $P(F_{18,9} > p) = 99\%$

7.3 (Exam style) A random sample of 10 observations is drawn from the normal distribution with mean $\mu$ and standard deviation 15. Independently, a random sample of 25 observations is drawn from the normal distribution with mean $\mu$ and standard deviation 12. Let $\bar X$ and $\bar Y$ denote the respective sample means.

Evaluate $P(\bar X - \bar Y > 3)$. [3]

7.4 Calculate:
(a) $P(F_{6,8} > 6.371)$
(b) $P(F_{7,12} > 0.3748)$.

7.5 (i) (a) State the definition of the $t_k$ distribution.
(b) Show that $\dfrac{\bar X - \mu}{S/\sqrt{n}} \sim t_{n-1}$ using $\bar X \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$ and $\dfrac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$.

(ii) (a) State the definition of the $F_{m,n}$ distribution.
(b) Show that for suitably defined samples $\dfrac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{m-1,\,n-1}$ using the fact that $\dfrac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$.

7.6 Evaluate $c$ such that:
(a) $P(F_{2,15} < c) = 97.5\%$
(b) $P(F_{8,5} < c) = 5\%$.

7.7 Show that:

$$P(F_{m,n} > a) = b \;\Leftrightarrow\; P\left(F_{n,m} < \frac{1}{a}\right) = b$$

7.8 A random sample $X_1, \ldots, X_{10}$ is drawn from the $N(5,4)$ distribution. Evaluate:
(i) $P\left(\sum X_i > 60\right)$
(ii) $P\left(\sum (X_i - \bar X)^2 > 34\right)$
(iii) $P\left(\bar X > 4 \text{ and } \sqrt{\tfrac{1}{9}\sum (X_i - \bar X)^2} < 2.6\right)$.

7.9 (Exam style) Let $(X_1, X_2, \ldots, X_9)$ be a random sample from a $N(0, \sigma^2)$ distribution. Let $\bar X$ and $S^2$ denote the sample mean and variance respectively.

Calculate the approximate value of $P(\bar X > S)$ by referring to an appropriate statistical table. [3]

7.10 (Exam style) House prices in region X are normally distributed with a mean of 100,000 and a standard deviation of 10,000. House prices in region Y are normally distributed with a mean of 90,000 and a standard deviation of 5,000. A random sample of 10 houses is taken from region X and a random sample of 5 houses from region Y.

Calculate the probability that:
(i) the region X sample mean is greater than the region Y sample mean [3]
(ii) the difference between the sample means is less than 5,000 [3]
(iii) the region X sample variance is less than the region Y sample variance [3]
(iv) the region X sample standard deviation is more than four times greater than the region Y sample standard deviation. [2]
[Total 11]

7.11 (Exam style) The time taken to process simple home insurance claims has a mean of 20 mins and a standard deviation of 5 mins.

Calculate, stating any assumptions, the probability that:
(i) the sample mean of the times to process 5 claims is less than 15 mins [2]
(ii) the sample mean of the times to process 50 claims is greater than 22 mins [2]
(iii) the sample variance of the time to process 5 claims is greater than 6.65 mins$^2$ [2]
(iv) the sample standard deviation of the time to process 30 claims is less than 7 mins [2]
(v) both (i) and (iii) occur for the same sample of 5 claims. [1]
[Total 9]

7.12 (Exam style) A statistician suggests that, since a $t$ variable with $k$ degrees of freedom is symmetrical with mean 0 and variance $\dfrac{k}{k-2}$ for $k > 2$, one can approximate the distribution using the normal variable $N\left(0, \dfrac{k}{k-2}\right)$.

(i) Use this to obtain an approximation for the upper 5% percentage points for a $t$ variable with:
(a) 4 degrees of freedom, and
(b) 40 degrees of freedom. [2]

(ii) Compare your answers with the exact values from tables and comment briefly on the result. [2]
[Total 4]
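Several of the questions above rely on the reciprocal result for the lower tail of the $F$ distribution. The following Python sketch, which is not part of the Core Reading, checks that result numerically using values from the worked question in Section 5. The helper name `f_cdf` and the choice of Simpson's rule are our own assumptions, and the sketch assumes both degrees of freedom exceed 2 so that the integrand vanishes at zero.

```python
import math

def f_cdf(x, m, n, steps=20_000):
    """Approximate P(F_{m,n} <= x) for m, n > 2 (hypothetical helper).

    Uses the fact that if F ~ F_{m,n} then m*F / (m*F + n) has a
    Beta(m/2, n/2) distribution, and integrates the beta density
    with Simpson's rule (steps must be even).
    """
    if x <= 0:
        return 0.0
    a, b = m / 2, n / 2
    upper = m * x / (m * x + n)  # F value mapped onto the (0, 1) beta scale
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)

    def density(t):
        if t <= 0.0 or t >= 1.0:
            return 0.0  # valid because a > 1 and b > 1 are assumed
        return math.exp(log_norm + (a - 1) * math.log(t) + (b - 1) * math.log(1 - t))

    h = upper / steps
    total = density(0.0) + density(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return total * h / 3

# Upper tail point quoted in the Tables: P(F_{9,10} > 3.779) is tabulated as 0.025
p_upper = 1 - f_cdf(3.779, 9, 10)

# Lower tail probability, and the same probability via the reciprocal result
p_lower = f_cdf(0.3392, 11, 8)
p_recip = 1 - f_cdf(1 / 0.3392, 8, 11)

print(round(p_upper, 3), round(p_lower, 3), round(p_recip, 3))
```

The last two printed values should agree, since $P(F_{11,8} < 0.3392)$ and $P(F_{8,11} > 1/0.3392)$ are the same probability written in two ways.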
Chapter 7 Solutions

7.1 The sampling distribution for $S^2$ is:

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$

The $\chi^2_k$ distribution is the same as the gamma distribution with $\alpha = \dfrac{k}{2}$ and $\lambda = \dfrac{1}{2}$. Therefore:

$$E(\chi^2_k) = \frac{\alpha}{\lambda} = \frac{k/2}{1/2} = k \;\Rightarrow\; E\left[\frac{(n-1)S^2}{\sigma^2}\right] = E(\chi^2_{n-1}) = n-1$$

$$\operatorname{var}(\chi^2_k) = \frac{\alpha}{\lambda^2} = \frac{k/2}{(1/2)^2} = 2k \;\Rightarrow\; \operatorname{var}\left[\frac{(n-1)S^2}{\sigma^2}\right] = \operatorname{var}(\chi^2_{n-1}) = 2(n-1)$$

So:

$$\frac{n-1}{\sigma^2}E(S^2) = n-1 \;\Rightarrow\; E(S^2) = \sigma^2$$

and:

$$\frac{(n-1)^2}{\sigma^4}\operatorname{var}(S^2) = 2(n-1) \;\Rightarrow\; \operatorname{var}(S^2) = \frac{2(n-1)\sigma^4}{(n-1)^2} = \frac{2\sigma^4}{n-1}$$

7.2 (i)(a) 3.863 is greater than 1. So, using the upper percentage points from the Tables:

$$P(F_{3,9} < 3.863) = 1 - P(F_{3,9} > 3.863) = 1 - 0.05 = 0.95$$

(i)(b) Since this is a lower percentage point, we need to use the $\dfrac{1}{F_{m,n}}$ result:

$$P(F_{10,10} < 0.269) = P\left(F_{10,10} > \frac{1}{0.269}\right) = P(F_{10,10} > 3.717) = 0.025$$

(ii)(a) Since only 10% of the distribution is above $p$, $p$ must be in the upper tail. So reading off from the 10% tables gives:

$$P(F_{24,30} > p) = 0.10 \;\Rightarrow\; p = 1.638$$

(ii)(b) Since 99% of the distribution is greater than $p$, $p$ must be in the lower tail. So we need to use the $\dfrac{1}{F_{m,n}}$ result:

$$P(F_{18,9} > p) = P\left(F_{9,18} < \frac{1}{p}\right) = 0.99 \;\Rightarrow\; P\left(F_{9,18} > \frac{1}{p}\right) = 0.01$$

Reading off the 1% tables:

$$\frac{1}{p} = 3.597 \;\Rightarrow\; p = 0.278$$

7.3 We require $P(\bar X - \bar Y > 3)$, therefore we need the distribution of $\bar X - \bar Y$. The distributions of the sample means are:

$$\bar X \sim N\left(\mu, \frac{15^2}{10}\right) \qquad \bar Y \sim N\left(\mu, \frac{12^2}{25}\right) \qquad [1]$$

The mean of the difference is the difference of the means, and the variance of the difference is the sum of the variances:

$$\bar X - \bar Y \sim N\left(0, \frac{15^2}{10} + \frac{12^2}{25}\right) = N(0, 28.26) \qquad [1]$$

$$P(\bar X - \bar Y > 3) = P(Z > 0.564) = 1 - P(Z < 0.564) = 1 - 0.71362 = 0.28638 \qquad [1]$$

7.4 (a) Probability

6.371 is greater than 1.
Using the upper percentage points from the Tables:

$$P(F_{6,8} > 6.371) = 0.01$$

(b) Probability

Since this is a lower percentage point, we need to use the $\dfrac{1}{F_{m,n}}$ result:

$$P(F_{7,12} > 0.3748) = P\left(F_{12,7} < \frac{1}{0.3748}\right) = P(F_{12,7} < 2.668) = 1 - P(F_{12,7} > 2.668) = 1 - 0.1 = 0.9$$

7.5 (i)(a) Definition of t distribution

If $Z \sim N(0,1)$ and $W \sim \chi^2_k$, and $Z$ and $W$ are independent, then:

$$\frac{Z}{\sqrt{W/k}} \sim t_k$$

(i)(b) Show t result

Standardising, we get:

$$\frac{\bar X - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$$

We also have:

$$\frac{S^2}{\sigma^2} \sim \frac{\chi^2_{n-1}}{n-1}$$

Since $\bar X$ and $S^2$ are independent when sampling from a normal distribution, substituting these into the definition of the $t_{n-1}$ distribution gives:

$$\frac{(\bar X - \mu)\big/(\sigma/\sqrt{n})}{\sqrt{S^2/\sigma^2}} = \frac{\bar X - \mu}{S/\sqrt{n}} \sim t_{n-1}$$

(ii)(a) Definition of F distribution

If $U \sim \chi^2_m$ and $V \sim \chi^2_n$, and $U$ and $V$ are independent, then:

$$\frac{U/m}{V/n} \sim F_{m,n}$$

(ii)(b) Show result is F distribution

Assuming two samples of size $m$ and $n$, with sample variances $S_1^2$ and $S_2^2$ from normal distributions with variances $\sigma_1^2$ and $\sigma_2^2$, respectively, we have:

$$\frac{(m-1)S_1^2}{\sigma_1^2} \sim \chi^2_{m-1} \;\Rightarrow\; \frac{S_1^2}{\sigma_1^2} \sim \frac{\chi^2_{m-1}}{m-1}$$

$$\frac{(n-1)S_2^2}{\sigma_2^2} \sim \chi^2_{n-1} \;\Rightarrow\; \frac{S_2^2}{\sigma_2^2} \sim \frac{\chi^2_{n-1}}{n-1}$$

Hence, by the definition of the $F_{m-1,\,n-1}$ distribution:

$$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{m-1,\,n-1}$$

7.6 (a) Since 97.5% of the distribution is below $c$, $c$ must be in the upper tail. So reading from the 2½% tables gives:

$$P(F_{2,15} < c) = 0.975 \;\Rightarrow\; P(F_{2,15} > c) = 0.025 \;\Rightarrow\; c = 4.765$$

(b) Since only 5% of the distribution is below $c$, $c$ must be in the lower tail. So we need to use the $\dfrac{1}{F_{m,n}}$ result:

$$P(F_{8,5} < c) = P\left(F_{5,8} > \frac{1}{c}\right) = 0.05 \;\Rightarrow\; \frac{1}{c} = 3.688 \;\Rightarrow\; c = 0.2711$$

7.7 Taking reciprocals, we obtain:

$$P(F_{m,n} > a) = b \;\Leftrightarrow\; P\left(\frac{1}{F_{m,n}} < \frac{1}{a}\right) = b$$

From the definition of the $F$ distribution:

$$F_{m,n} = \frac{\chi^2_m/m}{\chi^2_n/n} \;\Rightarrow\; \frac{1}{F_{m,n}} = \frac{\chi^2_n/n}{\chi^2_m/m} = F_{n,m}$$

Hence:

$$P(F_{m,n} > a) = b \;\Leftrightarrow\; P\left(\frac{1}{F_{m,n}} < \frac{1}{a}\right) = b \;\Leftrightarrow\; P\left(F_{n,m} < \frac{1}{a}\right) = b$$

7.8 (i) Probability of sum

Using the result that $\sum X_i \sim N(n\mu, n\sigma^2) = N(50, 40)$:

$$P\left(\sum X_i > 60\right) = P\left(Z > \frac{60 - 50}{\sqrt{40}}\right) = P(Z > 1.581) = 1 - \Phi(1.581) = 0.0569$$

(ii) Probability of central moment

Since $(n-1)S^2 = \sum (X_i - \bar X)^2$ and $\dfrac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$, we obtain:

$$P\left(\sum (X_i - \bar X)^2 > 34\right) = P(9S^2 > 34) = P\left(\frac{9S^2}{4} > \frac{34}{4}\right) = P(\chi^2_9 > 8.5) = 1 - 0.5154 = 0.485$$

(iii) Joint probability

Since $S^2 = \dfrac{1}{9}\sum (X_i - \bar X)^2$, and using the fact that $\bar X$ and $S^2$ are independent when we are sampling from a normal distribution:

$$P(\bar X > 4 \text{ and } S < 2.6) = P(\bar X > 4) \times P(S < 2.6)$$

Now:

$$\operatorname{var}(\bar X) = \frac{\sigma^2}{n} = \frac{4}{10} = 0.4$$

So $\bar X \sim N(5, 0.4)$, and:

$$P(\bar X > 4) = P\left(Z > \frac{4 - 5}{\sqrt{0.4}}\right) = P(Z > -1.581) = \Phi(1.581) = 0.9431$$

Also using $\dfrac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$:

$$P(S < 2.6) = P\left(\frac{9S^2}{4} < \frac{9 \times 2.6^2}{4}\right) = P(\chi^2_9 < 15.21) = 0.9145$$

Hence:

$$P(\bar X > 4 \text{ and } S < 2.6) = 0.9431 \times 0.9145 = 0.862$$

7.9 Using $\dfrac{\bar X - \mu}{S/\sqrt{n}} \sim t_{n-1}$:

$$\frac{\bar X}{S/3} = \frac{3\bar X}{S} \sim t_8 \qquad [1]$$

Considering the probability in the question:

$$P(\bar X > S) = P\left(\frac{\bar X}{S} > 1\right) = P\left(\frac{3\bar X}{S} > 3\right) = P(t_8 > 3) \qquad [1]$$

From the Tables, we can see that this probability lies between 1% and 0.5%. By interpolation we find that the probability is approximately 0.89%. [1]

7.10 (i) Probability that the mean of X is greater than the mean of Y

We require $P(\bar X > \bar Y) = P(\bar X - \bar Y > 0)$, therefore we need the distribution of $\bar X - \bar Y$. Working in 1,000s, the distributions of the sample means are:

$$\bar X \sim N(100, 10) \qquad\text{and}\qquad \bar Y \sim N(90, 5)$$

So:

$$\bar X - \bar Y \sim N(100 - 90,\; 10 + 5) = N(10, 15) \qquad [1]$$

and:

$$P(\bar X - \bar Y > 0) = P\left(Z > \frac{0 - 10}{\sqrt{15}}\right) = P(Z > -2.582) = \Phi(2.582) = 0.995 \qquad [2]$$

(ii) Probability that the difference between means is less than 5,000

Using the distribution of $\bar X - \bar Y$ from part (i):

$$P(|\bar X - \bar Y| < 5) = P(-5 < \bar X - \bar Y < 5) = P(\bar X - \bar Y < 5) - P(\bar X - \bar Y < -5) \qquad [1]$$

$$= P\left(Z < \frac{5 - 10}{\sqrt{15}}\right) - P\left(Z < \frac{-5 - 10}{\sqrt{15}}\right) = P(Z < -1.291) - P(Z < -3.873) \qquad [1]$$

$$= [1 - \Phi(1.291)] - [1 - \Phi(3.873)] = \Phi(3.873) - \Phi(1.291) = 0.0983 \qquad [1]$$

(iii) Probability that the sample variance of X is less than the sample variance of Y

We require $P(S_X^2 < S_Y^2) = P\left(\dfrac{S_X^2}{S_Y^2} < 1\right)$.
Using the definition of the $F$ distribution, we get:

$$\frac{S_X^2/\sigma_X^2}{S_Y^2/\sigma_Y^2} = \frac{S_X^2/10^2}{S_Y^2/5^2} = \frac{S_X^2}{4S_Y^2} \sim F_{9,4} \qquad [1]$$

Hence:

$$P\left(\frac{S_X^2}{S_Y^2} < 1\right) = P\left(\frac{S_X^2}{4S_Y^2} < 0.25\right) = P(F_{9,4} < 0.25)$$

Since this is in the lower tail we need to use the $\dfrac{1}{F_{m,n}}$ result:

$$P(F_{9,4} < 0.25) = P\left(F_{4,9} > \frac{1}{0.25}\right) = P(F_{4,9} > 4) \qquad [1]$$

This value is between 2½% and 5%, and, interpolating, we find that the probability is approximately 4.2%. [1]

(iv) Probability that the sample s.d. of X is greater than four times the sample s.d. of Y

We require:

$$P(S_X > 4S_Y) = P\left(\frac{S_X}{S_Y} > 4\right) = P\left(\frac{S_X^2}{S_Y^2} > 16\right)$$

Using the result from (iii) we get:

$$P\left(\frac{S_X^2}{4S_Y^2} > 4\right) = P(F_{9,4} > 4) \qquad [1]$$

From the Tables:

$$P(F_{9,4} > 3.936) = 10\%$$

So the required probability is approximately 10%. [1]

7.11 (i) Probability of sample mean ($n = 5$)

The $\bar X \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$ distribution holds exactly for samples from the normal distribution, and approximately for any distribution if $n$ is large. Since we only have a sample of size 5, we require that we are sampling from a normal distribution.

$$P(\bar X < 15) = P\left(Z < \frac{15 - 20}{\sqrt{5}}\right) \quad\text{since } \bar X \sim N(20, 5)$$

$$= P(Z < -2.236) = 1 - \Phi(2.236) = 0.0127 \qquad [2]$$

(ii) Probability of sample mean ($n = 50$)

As $n$ is large, we require no assumptions other than it being a random sample, although the answer will be approximate if the sample is not from a normal distribution.

$$P(\bar X > 22) = P\left(Z > \frac{22 - 20}{\sqrt{0.5}}\right) \quad\text{since } \bar X \sim N(20, 0.5)$$

$$= P(Z > 2.828) = 1 - \Phi(2.828) = 0.00234 \qquad [2]$$

(iii) Probability of sample variance

$\dfrac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$ only holds for samples from a normal distribution. Therefore we require that we are sampling from a normal distribution.

$$P(S^2 > 6.65) = P\left(\frac{4S^2}{\sigma^2} > \frac{4 \times 6.65}{5^2}\right) = P(\chi^2_4 > 1.064) = 0.9 \quad\text{(from page 168 of the Tables)} \qquad [2]$$

(iv) Probability of sample standard deviation

Again we require that we are sampling from a normal distribution:

$$P(S < 7) = P\left(\frac{29S^2}{\sigma^2} < \frac{29 \times 7^2}{5^2}\right) = P(\chi^2_{29} < 56.84) \qquad [1]$$

Using the figures from page 169 of the Tables, and interpolating, we find that $P(S < 7) \approx 0.998$. [1]

(v) Probability of (i) and (iii) both occurring

$\bar X$ and $S^2$ are independent if we are sampling from a normal distribution. So making this assumption, we get:

$$P(\bar X < 15 \text{ and } S^2 > 6.65) = P(\bar X < 15) \times P(S^2 > 6.65) = 0.0127 \times 0.9 = 0.0114 \qquad [1]$$

7.12 (i)(a) Normal approximation for $t_4$

We have:

$$t_4 \sim N(0, 2) \quad\text{(approximately)}$$

We require the value $a$ such that $P(t_4 > a) = 0.05$. Using our approximation, we get:

$$P\left(Z > \frac{a - 0}{\sqrt{2}}\right) = 0.05 \;\Rightarrow\; \frac{a}{\sqrt{2}} = 1.6449 \;\Rightarrow\; a = 2.326 \qquad [1]$$

(i)(b) Normal approximation for $t_{40}$

We have:

$$t_{40} \sim N\left(0, \frac{40}{38}\right) \quad\text{(approximately)}$$

We require the value $b$ such that $P(t_{40} > b) = 0.05$. Using the approximation, we get:

$$P\left(Z > \frac{b - 0}{\sqrt{40/38}}\right) = 0.05 \;\Rightarrow\; \frac{b}{\sqrt{40/38}} = 1.6449 \;\Rightarrow\; b = 1.688 \qquad [1]$$

(ii) Compare approximate results with the exact values

From the $t$ tables, we see that:

$P(t_4 > 2.132) = 0.05$, ie $a = 2.132$
$P(t_{40} > 1.684) = 0.05$, ie $b = 1.684$ [1]

We can see that the approximation of 2.326 for the upper 5% point of the $t_4$ distribution is poor, whereas the approximation of 1.688 for the upper 5% point of the $t_{40}$ distribution is quite good. This suggests that the $t$ distribution tends towards the standard normal distribution as the number of degrees of freedom increases. [1]
CS1-08: Point estimation

Syllabus objectives

3.1 Estimation and estimators
3.1.1 Describe and apply the method of moments for constructing estimators of population parameters.
3.1.2 Describe and apply the method of maximum likelihood for constructing estimators of population parameters.
3.1.3 Define the following terms: efficiency, bias, consistency and mean square error.
3.1.4 Define and apply the property of unbiasedness of an estimator.
3.1.5 Define the mean square error of an estimator and use it to compare estimators.
3.1.6 Describe and apply the asymptotic distribution of maximum likelihood estimators.

0 Introduction

In many situations we will be interested in the value of an unknown population parameter. For example, we might be interested in the number of claims from a certain portfolio that we receive in a month.

Suppose we have the following data relating to 100 one-month periods:

Claims                          0   1   2   3   4   5   6
Frequency (number of months)    9  22  26  21  13   6   3

It may be that we know that the Poisson distribution is a good model for the number of claims received, but the natural question is 'what is the value of the Poisson parameter $\mu$?'

This chapter gives two methods that can be used to estimate the value of the unknown parameter using the information provided by a sample.

The first method is called the method of moments and involves equating the sample moments to the population moments.

The second method is called the method of maximum likelihood and uses differentiation to find the parameter value that would maximise the probability of us getting the particular sample that we observed.

These are not the only methods of obtaining estimates (for example in Subject CS2 we will meet the method of percentiles). The two methods we meet here do not always give the same value for the estimate (although they often do).

Later in this chapter we will look at how to decide whether the formulae that we obtain for the parameter estimates give good estimates based upon their average value and their spread.

The expression 'point estimation' refers to the problem of finding a single number to estimate the parameter value. This contrasts with 'confidence interval estimation' (covered in the next chapter) where we wish to find a range of possible values.

This is a key topic in most statistics courses.

1 The method of moments

The basic principle is to equate population moments (ie the means, variances, etc of the theoretical model) to corresponding sample moments (ie the means, variances, etc of the sample data observed) and solve for the parameter(s).

1.1 The one-parameter case

This is the simplest case: to equate population mean, $E(X)$, to sample mean, $\bar x$, and solve for the parameter, ie:

$$E[X] = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Question

A random sample from an $Exp(\lambda)$ distribution is as follows:

14.84, 0.19, 11.75, 1.18, 2.44, 0.53

Calculate the method of moments estimate for $\lambda$.

Solution

The population mean for an $Exp(\lambda)$ distribution from page 11 of the Tables is $E(X) = \dfrac{1}{\lambda}$.

The sample mean is $\bar x = \dfrac{14.84 + 0.19 + 11.75 + 1.18 + 2.44 + 0.53}{6} = 5.155$.

Equating these gives us the method of moments estimate:

$$\frac{1}{\hat\lambda} = 5.155 \;\Rightarrow\; \hat\lambda = 0.1940$$
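The arithmetic in the solution above can be reproduced in a few lines. This is a minimal Python sketch rather than part of the course material; the variable names `sample`, `x_bar` and `lambda_hat` are our own.

```python
from statistics import mean

# Sample from the Exp(lambda) question above
sample = [14.84, 0.19, 11.75, 1.18, 2.44, 0.53]

x_bar = mean(sample)      # sample mean (first sample moment)
lambda_hat = 1 / x_bar    # solve E(X) = 1/lambda = x_bar for lambda

print(f"sample mean = {x_bar:.3f}")
print(f"method of moments estimate of lambda = {lambda_hat:.4f}")
```

The same two steps (compute the sample moment, then invert the formula for the population moment) apply to any one-parameter distribution whose mean involves the parameter.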
This is an estimate of $\lambda$ rather than the true value, and we distinguish this by putting a hat or similar over the parameter.

We can apply this method to a number of different single parameter distributions. For example, the method works well with a random sample from a Poisson distribution.

Note: For some populations the mean does not involve the parameter, such as the uniform on $(-\theta, \theta)$ or the normal $N(0, \sigma^2)$, in which case a higher-order moment must be used. However such cases are rarely of practical importance.

For example the $U(-\theta, \theta)$ distribution has $E(X) = \tfrac{1}{2}(-\theta + \theta) = 0$. Setting this equal to the sample mean is not going to be helpful. So what we should do is to use, say, the variance, $\operatorname{var}(X) = \tfrac{1}{12}[\theta - (-\theta)]^2 = \tfrac{1}{3}\theta^2$, as this involves the parameter. We could then equate this to the sample variance.

Question

The random sample:

2.6, 1.9, 3.8, -4.1, -0.2, -0.7, 1.1, 6.9

is taken from a $U(-\theta, \theta)$ distribution.

By equating the sample and population variances, calculate an estimate for $\theta$.

Solution

For these sample values, $\sum x_i = 11.3$ and $\sum x_i^2 = 90.97$. So the sample variance is:

$$s^2 = \frac{1}{7}\left[90.97 - 8\left(\frac{11.3}{8}\right)^2\right] = 10.7155$$

So using the formula for the population variance given above, we have:

$$\frac{1}{3}\hat\theta^2 = 10.7155$$

Solving this, we find that $\hat\theta = 5.67$.

The estimator is written in upper case as it is a random variable and will have a sampling distribution. The estimate is written in lower case as it comes from an actual sample of numerical values.

Be careful to distinguish between the words 'estimate' and 'estimator'. 'Estimate' refers to a particular numerical value that results from using the formula, eg $\hat\mu = \bar x$ (the lower case denotes actual sample values being used). On the other hand, 'estimator' refers to the random variable representing any sample, eg $\hat\mu = \bar X$.

1.2 The two-parameter case

With two unknown parameters, we will require two equations.
This involves equating the first and second-order moments of the population and the sample, and solving the resulting pair of equations.

Moments about the origin can be used but the solution is the same (and often more easily obtained) using moments about the mean, apart from the first-order moment being the mean itself.

The first-order equation is the same as in the one-parameter case:

$$E[X] = \frac{1}{n}\sum_{i=1}^{n} x_i$$

The second-order equation is:

$$E[X^2] = \frac{1}{n}\sum_{i=1}^{n} x_i^2$$

or equivalently:

$$E\left[(X - \mu)^2\right] = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar x)^2$$

ie:

$$\operatorname{var}(X) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar x)^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar x^2$$

We are not equating sample and population variances here; we are using a denominator of $n$ on the right hand side of the final equation, whereas the sample variance uses a denominator of $n - 1$.

Question

Show that these two second-order equations give the same answers for the parameter estimates.

Solution

Starting with the last Core Reading equation above, our two equations are:

$$E(X) = \bar x \qquad\text{and}\qquad \operatorname{var}(X) = \frac{1}{n}\sum (x_i - \bar x)^2$$

Expanding the brackets in the second equation gives:

$$\operatorname{var}(X) = \frac{1}{n}\sum (x_i - \bar x)^2 = \frac{1}{n}\left\{\sum x_i^2 - n\bar x^2\right\} = \frac{1}{n}\sum x_i^2 - \bar x^2$$

Since our first equation is $E(X) = \bar x$, we have:

$$\operatorname{var}(X) = \frac{1}{n}\sum x_i^2 - [E(X)]^2 \quad\text{ie}\quad \frac{1}{n}\sum x_i^2 = \operatorname{var}(X) + [E(X)]^2$$

Since $E(X^2) = \operatorname{var}(X) + [E(X)]^2$, we now have:

$$E(X^2) = \frac{1}{n}\sum x_i^2$$

This is the other second-order equation. So the two second-order equations are equivalent.

We can now find method of moments estimators in the two-parameter case.

Question

A random sample from a $Bin(n, p)$ distribution yields the following values:

4, 2, 7, 4, 1, 4, 5, 4

Calculate method of moments estimates of $n$ and $p$.

Solution

There are two unknown parameters so we need two equations.

The population mean for the $Bin(n, p)$ distribution from page 6 of the Tables is $E(X) = np$. The sample mean is $\bar x = \dfrac{31}{8} = 3.875$. Equating these gives:

$$np = 3.875 \qquad (1)$$

There is no formula for $E(X^2)$ on page 6 of the Tables. However, since $\operatorname{var}(X) = E(X^2) - [E(X)]^2$, we have:

$$E(X^2) = \operatorname{var}(X) + [E(X)]^2 = np(1-p) + (np)^2$$

We also have $\dfrac{1}{8}\sum x_i^2 = \dfrac{143}{8} = 17.875$. Equating this to $E(X^2)$:

$$np(1-p) + (np)^2 = 17.875 \qquad (2)$$

Substituting equation (1) into (2) gives:

$$3.875(1-p) + 3.875^2 = 17.875 \;\Rightarrow\; p = 0.2621$$

Hence, $n = 14.78$. Since $n$ is the number of trials, the true value cannot be 14.78. Therefore it is likely to be 14 or 15.

Alternatively, using the second of the second-order equations gives $\operatorname{var}(X) = np(1-p)$ and $\dfrac{1}{8}\sum x_i^2 - \bar x^2 = \dfrac{143}{8} - 3.875^2 = 2.859375$. Equating these gives:

$$np(1-p) = 2.859375 \qquad (3)$$

Substituting equation (1) into (3) gives:

$$3.875(1-p) = 2.859375 \;\Rightarrow\; p = 0.2621$$

and hence $n = 14.78$ as before.

We can apply the method of moments to other distributions.

Question

A random sample of size 10 from a Type 2 negative binomial distribution with parameters $k$ and $p$ is as follows:

1, 1, 0, 1, 1, 1, 3, 2, 0, 5

Calculate method of moments estimates of $k$ and $p$.

Solution

There are two unknown parameters so we need two equations.

The population mean for the Type 2 $NBin(k, p)$ distribution from page 9 of the Tables is $E(X) = \dfrac{k(1-p)}{p}$. The sample mean is $\bar x = \dfrac{15}{10} = 1.5$. Equating these gives:

$$\frac{k(1-p)}{p} = 1.5 \qquad (1)$$

There is no formula for $E(X^2)$ on page 9 of the Tables. However, since $\operatorname{var}(X) = E(X^2) - [E(X)]^2$, we have:

$$E(X^2) = \operatorname{var}(X) + [E(X)]^2 = \frac{k(1-p)}{p^2} + \left[\frac{k(1-p)}{p}\right]^2$$

We also have $\dfrac{1}{10}\sum x_i^2 = \dfrac{43}{10} = 4.3$. Equating these gives:

$$\frac{k(1-p)}{p^2} + \left[\frac{k(1-p)}{p}\right]^2 = 4.3 \qquad (2)$$

Substituting equation (1) into (2) gives:

$$\frac{1.5}{p} + 1.5^2 = 4.3 \;\Rightarrow\; p = 0.7317$$

Hence, equation (1) gives $k = 4.091$.

Alternatively, using the second of the second-order equations gives $\operatorname{var}(X) = \dfrac{k(1-p)}{p^2}$ and $\dfrac{1}{10}\sum x_i^2 - \bar x^2 = \dfrac{43}{10} - 1.5^2 = 2.05$.
Equating these gives:

$$\frac{k(1-p)}{p^2} = 2.05 \qquad (3)$$

Substituting equation (1) into (3) gives:

$$\frac{1.5}{p} = 2.05 \;\Rightarrow\; p = 0.7317$$

and hence $k = 4.091$ as before.

Note that $s^2$ with divisor $(n-1)$ is often used in place of the second central sample moment, ie we often use the definition of the sample variance quoted on page 22 of the Tables.

So the second-order equation is now:

$$\operatorname{var}(X) = s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar x)^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar x^2\right)$$

Using this version will not give the same estimates as those obtained using the previous second-order equations. However, if $n$ is large there is little difference between the estimates obtained.

The advantage of this method is that $S^2$ is an unbiased estimator of the population variance. The importance of this property is covered in more detail later.

Question

A random sample from a $Bin(n, p)$ distribution yields the following values:

4, 2, 7, 4, 1, 4, 5, 4

Calculate method of moments estimates of $n$ and $p$ using $\bar x$ and $s^2$ (for the sample variance, use a denominator of one less than the sample size).

Solution

We have sample mean and variance of:

$$\bar x = 3.875 \qquad\text{and}\qquad s^2 = \frac{1}{7}\left(143 - 8 \times 3.875^2\right) = 3.26786$$

The population mean and variance are:

$$E[X] = np \qquad\text{and}\qquad \operatorname{var}[X] = np(1-p)$$

Equating population and sample statistics gives:

$$np = 3.875 \qquad\text{and}\qquad np(1-p) = 3.26786$$

Solving gives $p = 0.1567$ and $n = 24.73$ (which are different from the values calculated previously).

We can also apply the method of moments to continuous distributions.

Question

The sample mean and sample variance for a large random sample from a $Gamma(\alpha, \lambda)$ distribution are 10 and 25, respectively.

Use the method of moments to estimate $\alpha$ and $\lambda$.

Solution

Equating the mean and variance, we get:

$$\frac{\alpha}{\lambda} = 10 \qquad\text{and}\qquad \frac{\alpha}{\lambda^2} = 25$$

Dividing the first equation by the second gives:

$$\hat\lambda = \frac{10}{25} = 0.4 \;\Rightarrow\; \hat\alpha = 10 \times 0.4 = 4$$

For cases with more than two parameters, moments about zero should be used.

For example, if we have three parameters to estimate, we would use the set of equations:
$$E[X] = \frac{1}{n}\sum x_i \qquad E[X^2] = \frac{1}{n}\sum x_i^2 \qquad E[X^3] = \frac{1}{n}\sum x_i^3$$

This approach can be extended in an obvious way for more than three parameters.

2 The method of maximum likelihood

The method of maximum likelihood is widely regarded as the best general method of finding estimators. In particular maximum likelihood estimators have excellent and usually easily determined asymptotic properties and so are especially good in the large-sample situation.

'Asymptotic' here means when the samples are very large.

2.1 The one-parameter case

The most important stage in applying the method is that of writing down the likelihood:

$$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta)$$

for a random sample $x_1, x_2, \ldots, x_n$ from a population with density or probability function $f(x; \theta)$.

$\prod$ means product, so $\prod_{i=1}^{n} f(x_i)$ would mean $f(x_1) \times f(x_2) \times f(x_3) \times \cdots \times f(x_n)$. The above statement is saying that the likelihood function is the product of the densities (or the probability functions in the case of discrete distributions) calculated for each sample value. Remember that $\theta$ is the parameter whose value we are trying to estimate.

The likelihood is the probability of observing the sample in the discrete case, and is proportional to the probability of observing values in the neighbourhood of the sample in the continuous case.

The likelihood function is a function of the unknown parameter $\theta$. So different values of $\theta$ would give different values for the likelihood. The maximum likelihood approach is to find the value of $\theta$ that would have been most likely to give us the particular sample we got. In other words, we need to find the value of $\theta$ that maximises the likelihood function.

For a continuous distribution the probability of getting any exact value is zero, but since:

$$P(x - \varepsilon < X < x + \varepsilon) = \int_{x-\varepsilon}^{x+\varepsilon} f(t)\,dt \approx 2\varepsilon f(x)$$

we can see that it is proportional to the PDF.

In most cases taking logs greatly simplifies the determination of the maximum likelihood estimator (MLE) $\hat\theta$.

Differentiating the likelihood or log likelihood with respect to the parameter and setting the derivative to zero gives the maximum likelihood estimator for the parameter.

Example

Given a random sample of size $n$ (ie $x_1, \ldots, x_n$) from the exponential population with density $f(x) = \lambda e^{-\lambda x}$, $x > 0$, the MLE, $\hat\lambda$, is found as follows:

$$L(\lambda) = \prod_{i=1}^{n} f(x_i) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i} = \lambda^n e^{-\lambda \sum x_i}$$

$$\log L(\lambda) = n\log\lambda - \lambda\sum x_i$$

$$\frac{\partial}{\partial\lambda}\log L(\lambda) = \frac{n}{\lambda} - \sum x_i$$

Equating to zero:

$$\frac{n}{\hat\lambda} - \sum x_i = 0 \;\Rightarrow\; \hat\lambda = \frac{n}{\sum x_i} = \frac{1}{\bar x}$$

MLE is $\hat\lambda = \dfrac{1}{\bar X}$.

Note that $\dfrac{1}{\bar x}$ is a maximum likelihood estimate, ie a numerical value, whereas $\dfrac{1}{\bar X}$ is a maximum likelihood estimator, ie a random variable.

It is necessary to check, either formally or through simple logic, that the turning point is a maximum. Generally the likelihood starts at zero, finishes at or tends to zero, and is non-negative. Therefore if there is one turning point it must be a maximum.

The formal approach would be to check that the second derivative is negative. For the above example we get:

$$\frac{d^2}{d\lambda^2}\log L(\lambda) = -\frac{n}{\lambda^2} < 0 \;\Rightarrow\; \text{max}$$

It is important that we do check, whether formally or through simple logic, and state this (together with our working/reasoning) in the exam to receive all the marks.

At the differentiation stage, any terms that do not contain the parameter ($\lambda$ in this case) will disappear. So when the log-likelihood is written down, any terms that don't contain the parameter can be thought of as a constant.

We can calculate maximum likelihood estimates for parameters from discrete distributions too.

Question

A random sample of size $n$ (ie $x_1, x_2, \ldots, x_n$) is taken from a $Poi(\mu)$ distribution.

(i) Derive the maximum likelihood estimator of $\mu$.

(ii) The sum of a sample of 10 observations from a $Poisson(\mu)$ distribution is 24. Calculate the maximum likelihood estimate, $\hat\mu$.

Solution

(i) The likelihood function is:
e xi - constant Le -n xi ! i= 1 ?xi Takinglogs: ln ( )=- constant Differentiating d d +?Ln i lnx with respect to: ln ( ) =+ ?xi Ln This derivative is equal to zero when: ==?xi x n Differentiating again(to check that it is a maximum): d2 ln L( ) =- d ?xi< 0 22 ? max Sothe estimate (the value obtained for a particular sample) is =x . The estimator (the random variable)is X. (ii) Wehave n == IFE: 2022 Examinations 10= x and ?xi 24= . Hence the 24 10 = estimate is: 2.4 The Actuarial Education Compan CS1-08: Point estimation Page 13 MLEsdisplaytheinvarianceproperty,whichmeans thatif ?is the MLEof ?thenthe MLE of a function ()g ? is g() ? . For example,the MLEof -221 ? is -221. ? Question The MLEsof the parameters =2 s of alognormal distribution have been found to be = 2 and 0.25. Derivethe maximumlikelihood estimate ofthe meanofthe lognormal distribution. Solution The formula for the ?= e 1 + s 2 mean ?(say) of alognormal distribution =e page 14 of the Tables): 2 Theinvariance property tells usthat the MLEsof ,? ,and ? is (from s arerelated bythe same equation: 1 2s 2 + Sothe MLEof the meanis: 2.2 1 + e ? .25 202 == 8.37 Thetwo-parametercase This is straightforward in principle and the but the solution iterative of the resulting methodis the same as the one-parameter case, equations may be more awkward, perhaps requiring an or numerical solution. The only difference is that a partial derivative is taken before equating each to zero and solving the resulting with respect to each parameter, system of simultaneous equations for the parameters. Soin summary, the steps for finding the maximumlikelihood estimator in straightforward cases are: Write down the likelihood function, L. Find ln L and simplify the resulting expression. Partially differentiate ln L withrespect to each parameter to be estimated. Set the derivatives equal to zero. Solvethese equations simultaneously. 
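When the simultaneous equations are awkward, the steps above can also be carried out numerically. The following is a minimal sketch in R, not taken from the Core Reading: the data vector `x` and the helper `neg_loglik` are hypothetical names introduced for illustration. It minimises the negative log-likelihood of a normal sample with `optim` and compares the answer with the sample mean and the divisor-n standard deviation, which are the standard closed-form MLEs for a normal sample.

```r
# Hypothetical data, for illustration only
x <- c(4.1, 5.3, 3.8, 6.0, 4.7, 5.5, 4.9, 5.2)

# Negative log-likelihood of N(mu, sigma^2).
# par[2] holds log(sigma) so that sigma stays positive during the search.
neg_loglik <- function(par, data) {
  mu <- par[1]
  sigma <- exp(par[2])
  -sum(dnorm(data, mean = mu, sd = sigma, log = TRUE))
}

# Numerical maximisation of the log-likelihood (minimise its negative)
fit <- optim(par = c(mean(x), log(sd(x))), fn = neg_loglik,
             data = x, method = "BFGS")
mu_hat <- fit$par[1]
sigma_hat <- exp(fit$par[2])

# Closed-form values for comparison
mu_closed <- mean(x)
sigma_closed <- sqrt(mean((x - mean(x))^2))

round(c(mu_hat = mu_hat, sigma_hat = sigma_hat,
        mu_closed = mu_closed, sigma_closed = sigma_closed), 4)
```

Working on the log scale for the standard deviation is a design choice that avoids the optimiser trying negative values; any other positive-valued reparameterisation would serve equally well.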
In the two-parameter case, the second-order condition that is used to check for maxima is more complicated, and we shall not discuss it here.

Question

Derive the MLEs of $\mu$ and $\sigma$ for a sample of $n$ IID observations from a $N(\mu, \sigma^2)$ distribution.

Solution

The likelihood function is:

$$L = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left(\frac{x_i - \mu}{\sigma}\right)^2\right\} = \text{constant} \times \sigma^{-n} \exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2\right\}$$

Taking logs:

$$\log L = -n \log \sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2 + \text{constant}$$

Differentiating with respect to $\mu$ and $\sigma$ gives:

$$\frac{\partial}{\partial \mu} \log L = -\frac{1}{2\sigma^2}\sum_{i=1}^{n} 2(x_i - \mu)(-1) = \frac{1}{\sigma^2}\left(\sum_{i=1}^{n} x_i - n\mu\right)$$

$$\frac{\partial}{\partial \sigma} \log L = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n}(x_i - \mu)^2$$

Setting these to zero and assuming these are maxima gives:

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}$$

Also:

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{n-1}{n} s^2 \;\Rightarrow\; \hat{\sigma} = s\sqrt{\frac{n-1}{n}}$$

where $s^2$ is the sample variance (with divisor $n - 1$).

2.3 A special case - the uniform distribution

For populations where the range of the random variable involves the parameter, care must be taken to specify when the likelihood is zero and non-zero. Often a plot of the likelihood is helpful.

An example of a random variable where the range involves the parameter is the uniform distribution:

$$f(x) = \frac{1}{b - a}, \quad a < x < b$$

We look at this in the next question. Note how we specify when the likelihood is zero (ie it does not exist for the specified values of the parameter) and non-zero (ie where it does exist for the specified values of the parameter). The second important feature about this question is that the usual route for finding the maximum using differentiation breaks down.

Question

Derive the maximum likelihood estimate of $\theta$ for $U[0, \theta]$ based on a random sample of values $x_1, x_2, \ldots, x_n$.

Solution

For a sample from the $U[0, \theta]$ distribution we must have $0 \le x_1, \ldots, x_n \le \theta$. Hence $\theta \ge \max x_i$. Thus the likelihood for a sample of size $n$ is:

$$L(\theta) = \begin{cases} \dfrac{1}{\theta^n} & \text{if } \theta \ge \max x_i \\ 0 & \text{otherwise} \end{cases}$$
Differentiation doesn't work because:

$$\frac{d}{d\theta} \ln L(\theta) = -\frac{n}{\theta}$$

which gives a turning point as $\theta \to \infty$. The second derivative shows the problem:

$$\frac{d^2}{d\theta^2} \ln L(\theta) = \frac{n}{\theta^2} > 0$$

We have a minimum as $\theta \to \infty$.

So using common sense, we must find the value of $\theta$ that maximises $L(\theta) = \dfrac{1}{\theta^n}$. We want $\theta$ to be as small as possible subject to the constraint that $\theta \ge \max x_i$. Hence $\hat{\theta} = \max x_i$.

2.4 Incomplete samples

The method of maximum likelihood can be applied in situations where the sample is incomplete. For example, truncated data or censored data in which observations are known to be greater than a certain value, or multiple claims where the number of claims is known to be two or more.

Censored data arise when we have information about the full range of possible values but that information is not complete (eg when we only know that there are, say, 6 values greater than 500). Truncated data arise when we have no information about part of the range of possible values (eg when we have no information at all about values greater than 500).

In these situations, as long as the likelihood (the probability of observing the given information) can be written as a function of the parameter(s), then the method can be used. Again in such cases the solution may be more complex, perhaps requiring numerical methods.

For example, suppose a sample yields $n$ observations $(x_1, x_2, \ldots, x_n)$ and $m$ observations greater than the value $y$, then the likelihood is given by:

$$L(\theta) = \left[\prod_{i=1}^{n} f(x_i; \theta)\right] \times \left[P(X > y)\right]^m$$

Our estimate will be as accurate as possible if we use all the information that we have available.

For incomplete samples, we don't know what the values above $y$ are. All we know is that they are greater than $y$. Since the values above $y$ are unknown we cannot use $L(\theta) = \prod_{i=1}^{n+m} f(x_i; \theta)$. We instead use the formula given.
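The censored likelihood above can be maximised numerically. The sketch below is illustrative only: it assumes an exponential model with rate `lambda`, and the data `x`, the count `m`, the threshold `y` and the helper `cens_loglik` are all hypothetical. For this assumed model, setting the derivative of the log-likelihood to zero gives the closed form $n / (\sum x_i + my)$, which is used as a check.

```r
# Hypothetical data: 5 fully observed values, plus m = 2 values
# known only to exceed y = 3
x <- c(1.2, 0.4, 2.7, 0.9, 1.8)
m <- 2
y <- 3

# Log-likelihood: sum of log densities for the observed values
# plus m * log P(X > y) for the censored ones
cens_loglik <- function(lambda) {
  sum(dexp(x, rate = lambda, log = TRUE)) +
    m * pexp(y, rate = lambda, lower.tail = FALSE, log.p = TRUE)
}

# One-dimensional numerical maximisation over a plausible range
fit <- optimise(cens_loglik, interval = c(0.001, 5), maximum = TRUE)
lambda_hat <- fit$maximum

# Closed form for the assumed exponential model
lambda_closed <- length(x) / (sum(x) + m * y)

c(numerical = lambda_hat, closed_form = lambda_closed)
```

The same pattern applies to other densities: only the `dexp` and `pexp` calls would change, although a closed form may then no longer be available.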
If the information is more detailed than "greater than $y$" we can use a more detailed likelihood function. For example, if we have $m$ observed values between $y$ and $z$, and $p$ observed values above $z$, in addition to the $n$ known values, then we would use:

$$L(\theta) = \left[\prod_{i=1}^{n} f(x_i; \theta)\right] \times \left[P(y < X < z)\right]^m \times \left[P(X > z)\right]^p$$

Question

Claims (in £000s) on a particular policy have a distribution with PDF given by:

$$f(x) = 2cx e^{-cx^2}, \quad x > 0$$

Seven of the last ten claims are given below:

1.05, 3.38, 3.26, 3.22, 2.71, 2.37, 1.85

The three remaining claims were known to be greater than £6,000.

Calculate the maximum likelihood estimate of $c$.

Solution

We have 7 known claims and 3 claims greater than 6. So the likelihood is:

$$L(c) = \prod_{i=1}^{7} f(x_i) \times \left[P(X > 6)\right]^3$$

Since:

$$P(X > 6) = \int_6^\infty 2cx e^{-cx^2}\,dx = \left[-e^{-cx^2}\right]_6^\infty = e^{-36c}$$

and $\sum_{i=1}^{7} x_i^2 = 49.91$, the likelihood function is:

$$L(c) = \prod_{i=1}^{7} 2c x_i e^{-c x_i^2} \times \left(e^{-36c}\right)^3 = \text{constant} \times c^7 e^{-c \sum x_i^2} \times e^{-108c} = \text{constant} \times c^7 e^{-157.91c}$$

The log-likelihood is:

$$\ln L(c) = \text{constant} + 7 \ln c - 157.91c$$

Differentiating the log-likelihood gives:

$$\frac{d}{dc} \ln L(c) = \frac{7}{c} - 157.91$$

This derivative is equal to zero when:

$$\frac{7}{c} - 157.91 = 0 \;\Rightarrow\; \hat{c} = \frac{7}{157.91} = 0.0443$$

Differentiating again to check we get a maximum:

$$\frac{d^2}{dc^2} \ln L(c) = -\frac{7}{c^2} < 0 \;\Rightarrow\; \text{max}$$

So $\hat{c} = 0.0443$.

If we have some claims about which nothing is known (ie we don't even know whether there are any claims of a particular type), then the data are said to be truncated, rather than censored. We need to take a slightly different approach here.

Question

The number of claims in a year on a pet insurance policy are distributed as follows:

No. of claims, $n$:   0            1            2           $\ge 3$
$P(N = n)$:           $5\theta$    $3\theta$    $\theta$    $1 - 9\theta$

Information from the claims file for a particular year showed that there were 60 policies with 1 claim, 24 policies with 2 claims and 16 policies with 3 or more claims. There was no information about the number of policies with no claims.
Calculate the maximum likelihood estimate of $\theta$.

Solution

Since we have no information at all about zero claims, we need to determine the truncated distribution. All we do is omit the zero claims probability and scale up the remaining probabilities (which only total to $1 - 5\theta$) so that they now total to 1:

No. of claims, $n$:     1                                2                               $\ge 3$
$P(N = n \mid N > 0)$:  $\dfrac{3\theta}{1 - 5\theta}$   $\dfrac{\theta}{1 - 5\theta}$   $\dfrac{1 - 9\theta}{1 - 5\theta}$

These probabilities can also be thought of as conditional probabilities, ie the first probability in the table is actually $P(N = 1 \mid N > 0)$. Using the definition of conditional probability, we obtain:

$$P(N = 1 \mid N > 0) = \frac{P(N = 1)}{P(N > 0)} = \frac{3\theta}{1 - 5\theta}$$

and we obtain the same probabilities as before.

The likelihood is:

$$L(\theta \mid N > 0) = \text{constant} \times \left[P(N = 1 \mid N > 0)\right]^{60} \left[P(N = 2 \mid N > 0)\right]^{24} \left[P(N \ge 3 \mid N > 0)\right]^{16}$$

So:

$$L(\theta \mid N > 0) = \text{constant} \times \left(\frac{3\theta}{1 - 5\theta}\right)^{60} \left(\frac{\theta}{1 - 5\theta}\right)^{24} \left(\frac{1 - 9\theta}{1 - 5\theta}\right)^{16} = \text{constant} \times \frac{\theta^{84}(1 - 9\theta)^{16}}{(1 - 5\theta)^{100}}$$

The constant arises from the fact that we don't know which of the 60 policies had 1 claim, etc and so there is some combinatorial factor to account for this.

The log-likelihood is:

$$\ln L(\theta \mid N > 0) = \text{constant} + 84 \ln \theta + 16 \ln(1 - 9\theta) - 100 \ln(1 - 5\theta)$$

Differentiating:

$$\frac{d}{d\theta} \ln L(\theta \mid N > 0) = \frac{84}{\theta} - \frac{9 \times 16}{1 - 9\theta} + \frac{5 \times 100}{1 - 5\theta}$$

Setting this equal to 0 and multiplying through by $\theta(1 - 9\theta)(1 - 5\theta)$:

$$84(1 - 9\theta)(1 - 5\theta) - 144\theta(1 - 5\theta) + 500\theta(1 - 9\theta) = 0$$

The $\theta^2$ terms cancel ($3780 + 720 - 4500 = 0$), leaving:

$$84 - 820\theta = 0 \;\Rightarrow\; \hat{\theta} = \frac{84}{820} = 0.102$$

Differentiating again to check we get a maximum:

$$\frac{d^2}{d\theta^2} \ln L(\theta \mid N > 0) = -\frac{84}{\theta^2} - \frac{9 \times 9 \times 16}{(1 - 9\theta)^2} + \frac{5 \times 5 \times 100}{(1 - 5\theta)^2} < 0 \text{ when } \theta = 0.102 \;\Rightarrow\; \text{max}$$

So $\hat{\theta} = 0.102$.

2.5 Independent samples

For independent samples from two populations which share a common parameter, the overall likelihood is the product of the two separate likelihoods.

Question

The number of claims, $X$, per year arising from a low-risk policy has a Poisson distribution with mean $\mu$. The number of claims, $Y$, per year arising from a high-risk policy has a Poisson distribution with mean $2\mu$.
A sample of 15 low-risk policies had a total of 48 claims in a year and a sample of 10 high-risk policies had a total of 59 claims in a year.

Determine the maximum likelihood estimate of $\mu$ based on this information.

Solution

The likelihood for these 15 low-risk and 10 high-risk policies is:

$$L(\mu) = \prod_{i=1}^{15} P(X = x_i) \times \prod_{j=1}^{10} P(Y = y_j) = \prod_{i=1}^{15} \frac{e^{-\mu}\mu^{x_i}}{x_i!} \times \prod_{j=1}^{10} \frac{e^{-2\mu}(2\mu)^{y_j}}{y_j!}$$

$$= \text{constant} \times \mu^{\sum x_i} e^{-15\mu} \times \mu^{\sum y_j} e^{-20\mu} = \text{constant} \times \mu^{48} e^{-15\mu} \times \mu^{59} e^{-20\mu} = \text{constant} \times \mu^{107} e^{-35\mu}$$

The log-likelihood is:

$$\ln L(\mu) = \text{constant} + 107 \ln \mu - 35\mu$$

Differentiating:

$$\frac{d}{d\mu} \ln L(\mu) = \frac{107}{\mu} - 35$$

This is equal to 0 when:

$$\mu = \frac{107}{35} = 3.057$$

Differentiating again to check we get a maximum:

$$\frac{d^2}{d\mu^2} \ln L(\mu) = -\frac{107}{\mu^2} < 0 \;\Rightarrow\; \text{max}$$

So $\hat{\mu} = 3.057$.

3 Unbiasedness

Consideration of the sampling distribution of an estimator can give an indication of how good it is as an estimator. Clearly the aim is for the sampling distribution of the estimator to be located near the true value and have a small spread.

If we have a random sample $\underline{X} = (X_1, X_2, \ldots, X_n)$ from a distribution with an unknown parameter $\theta$ and $g(\underline{X})$ is an estimator of $\theta$, it seems desirable that $E[g(\underline{X})] = \theta$. This is the property of unbiasedness.

We can think of an unbiased estimator as one whose mean value equals the true parameter value.

Question

Show that the estimator for $\mu$ obtained in the Poisson question on page 12 is unbiased.

Solution

In this question we have a random sample from the $Poi(\mu)$ distribution and the estimator is $\hat{\mu} = \bar{X}$.

To show that this is unbiased we need to show that $E(\hat{\mu}) = \mu$, ie $E(\bar{X}) = \mu$. We have:

$$E(\hat{\mu}) = E(\bar{X}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i)$$

Since $X_i \sim Poi(\mu)$ we have $E(X_i) = \mu$. Hence:

$$E(\bar{X}) = \frac{1}{n}\sum_{i=1}^{n} \mu = \frac{1}{n} \times n\mu = \mu$$

So the estimator is unbiased.

If an estimator is biased, its bias is given by $E[g(\underline{X})] - \theta$, ie it is a measure of the difference between the expected value of the estimator and the parameter being estimated.
If the bias is greater than zero, the estimator is said to be positively biased, ie it tends to overestimate the true value. Alternatively, the bias could be less than zero, leading to a negatively biased estimator that would tend to underestimate the true value.

Question

The following are estimators for the variance of a distribution having mean $\mu$ and variance $\sigma^2$. Obtain the bias for each estimator:

(i) $S^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$

(ii) $\hat{\sigma}^2 = \dfrac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$

Solution

(i) The formula for the bias of $S^2$ is:

$$\text{bias}(S^2) = E(S^2) - \sigma^2$$

Consider $E(S^2)$:

$$E(S^2) = E\left[\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2\right] = \frac{1}{n-1} E\left[\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right] = \frac{1}{n-1}\left[\sum_{i=1}^{n} E(X_i^2) - nE(\bar{X}^2)\right]$$

Since:

$$E(X_i^2) = \text{var}(X_i) + E^2(X_i) = \sigma^2 + \mu^2$$

and:

$$E(\bar{X}^2) = \text{var}(\bar{X}) + E^2(\bar{X}) = \frac{\sigma^2}{n} + \mu^2$$

we get:

$$E(S^2) = \frac{1}{n-1}\left[\sum_{i=1}^{n}(\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} + \mu^2\right)\right] = \frac{1}{n-1}\left[n\sigma^2 + n\mu^2 - \sigma^2 - n\mu^2\right] = \frac{1}{n-1}(n-1)\sigma^2 = \sigma^2$$

So:

$$\text{bias}(S^2) = E(S^2) - \sigma^2 = \sigma^2 - \sigma^2 = 0$$

This means that $S^2$ is an unbiased estimator of $\sigma^2$.

(ii) Since $\hat{\sigma}^2 = \dfrac{n-1}{n} S^2$, we can use the result from part (i) to get:

$$E(\hat{\sigma}^2) = E\left(\frac{n-1}{n} S^2\right) = \frac{n-1}{n} E(S^2) = \frac{n-1}{n}\sigma^2$$

So:

$$\text{bias}(\hat{\sigma}^2) = E(\hat{\sigma}^2) - \sigma^2 = \frac{n-1}{n}\sigma^2 - \sigma^2 = -\frac{1}{n}\sigma^2$$

The property of unbiasedness is not preserved under non-linear transformations of the estimator/parameter.

So, for example, the fact that $S^2$ is an unbiased estimator of the population variance does not mean that $S$ is an unbiased estimator of the population standard deviation.

As indicated earlier, unbiasedness seems to be a desirable property. However, it is not necessarily an essential property for an estimator. There are many common situations in which a biased estimator is better than an unbiased one, and, in fact, better than the best unbiased estimator. The importance of unbiasedness is secondary to that of having a small mean square error.
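The bias results for the two variance estimators can be checked by simulation. The following R sketch is an illustration rather than part of the Core Reading: the sample size, the true variance, the number of repetitions, the choice of a normal population and the seed are all assumptions made for the example.

```r
set.seed(1)
n <- 5           # sample size (assumed)
sigma2 <- 4      # true variance (assumed)
reps <- 20000    # number of simulated samples (assumed)

# For each simulated sample, compute both estimators of the variance
sims <- replicate(reps, {
  x <- rnorm(n, mean = 10, sd = sqrt(sigma2))
  s2 <- var(x)                               # divisor n - 1
  c(S2 = s2, sigma2_hat = (n - 1) / n * s2)  # divisor n
})

# Simulated bias = average estimate minus true value
bias_est <- rowMeans(sims) - sigma2

# Theoretical bias: 0 for S^2 and -sigma^2 / n for sigma2_hat
bias_theory <- c(S2 = 0, sigma2_hat = -sigma2 / n)

round(rbind(bias_est, bias_theory), 3)
```

The simulated values will differ slightly from the theoretical ones because of sampling error, which shrinks as `reps` increases.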
An unbiased estimator is one that for different samples will give the true value on average. However, it could be that some of the estimates are too large and some are too small, but on average they give the true value. So we need some way of measuring the spread of the estimates obtained for different samples. That measure is the mean square error and is covered in the next section. A biased estimator whose value does not deviate very far from the true value (ie has a small spread) is preferable to an unbiased one whose values are "all over the place".

4 Mean square error

As biased estimators can be better than unbiased ones, a measure of efficiency is needed to compare estimators generally. That measure is the mean square error.

The mean square error (MSE) of an estimator $g(\underline{X})$ for $\theta$ is defined by:

$$MSE(g(\underline{X})) = E\left[(g(\underline{X}) - \theta)^2\right]$$

Note that this is a function of $\theta$.

Thus the mean square error is the second moment of $g(\underline{X})$ about $\theta$, and an estimator with a lower MSE is said to be more efficient.

The MSE of a particular estimator can be worked out directly as an integral using the density of the sampling distribution of $g(\underline{X})$, or using the density of $\underline{X}$ itself.

However, it is usually much easier to use the alternative expression:

$$MSE = \text{variance} + \text{bias}^2$$

as this makes use of quantities that are already known or can easily be obtained.

This expression can be proved as follows. (Simplifying things by dropping the $(\underline{X})$ and writing simply $g$.)

$$MSE(g) = E\left[(g - \theta)^2\right] = E\left[\left\{(g - E[g]) + (E[g] - \theta)\right\}^2\right]$$

$$= E\left[(g - E[g])^2\right] + 2(E[g] - \theta)\,E\left[g - E[g]\right] + (E[g] - \theta)^2$$

$$= \text{var}[g] + 0 + \text{bias}^2[g] \quad \text{as required}$$

Note: If the estimator $g(\underline{X})$ is unbiased, then MSE = variance.

Question

Obtain the MSE of the estimator for $\mu$ obtained in the Poisson question on page 12.

Solution

In this question, we have a random sample from the $Poi(\mu)$ distribution and the estimator is $\hat{\mu} = \bar{X}$.

The MSE is given by:

$$MSE(\hat{\mu}) = \text{var}(\hat{\mu}) + \text{bias}^2(\hat{\mu})$$

In Section 3 we showed that the estimator is unbiased, ie $\text{bias}(\hat{\mu}) = 0$. So:

$$MSE(\hat{\mu}) = \text{var}(\hat{\mu}) + 0^2 = \text{var}(\bar{X})$$

Now:

$$\text{var}(\bar{X}) = \text{var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} \text{var}(X_i) \quad \text{since the } X_i \text{ are independent}$$

As $X_i \sim Poi(\mu)$, we know that $\text{var}(X_i) = \mu$. Hence:

$$\text{var}(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^{n} \mu = \frac{1}{n^2} \times n\mu = \frac{\mu}{n}$$

So the MSE is $\dfrac{\mu}{n}$.

The following diagram gives the sampling distributions of two estimators: one is unbiased but has a large variance, the other is biased with a much smaller variance.

[Diagram: sampling distributions of the two estimators]

This illustrates a situation in which a biased estimator is better than an unbiased one.

It is clear that an estimator with a small MSE is a good estimator. It is also desirable that an estimator gets better as the sample size increases. Putting these together suggests that it is desirable that $MSE \to 0$ as $n \to \infty$. This property is known as consistency.

Question

The estimator, $\hat{\sigma}^2$, is used to estimate the variance of a $N(\mu, \sigma^2)$ distribution based on a random sample of $n$ observations:

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$$

(i) Determine the mean square error of $\hat{\sigma}^2$.

(ii) Determine whether $\hat{\sigma}^2$ is consistent.

Solution

(i) Relating $\hat{\sigma}^2$ to the usual sample variance $S^2$:

$$\hat{\sigma}^2 = \frac{n-1}{n} S^2 \;\Rightarrow\; \frac{n\hat{\sigma}^2}{\sigma^2} = \frac{(n-1)S^2}{\sigma^2}$$

and:

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1} \;\Rightarrow\; \frac{n\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-1}$$

Hence the mean of $\hat{\sigma}^2$ is obtained from:

$$E\left(\frac{n\hat{\sigma}^2}{\sigma^2}\right) = n - 1 \;\Rightarrow\; E(\hat{\sigma}^2) = \frac{n-1}{n}\sigma^2$$

So:

$$\text{bias}(\hat{\sigma}^2) = E(\hat{\sigma}^2) - \sigma^2 = \frac{n-1}{n}\sigma^2 - \sigma^2 = -\frac{\sigma^2}{n}$$

The variance of $\hat{\sigma}^2$ is determined as follows:

$$\text{var}\left(\frac{n\hat{\sigma}^2}{\sigma^2}\right) = 2(n-1) \;\Rightarrow\; \text{var}(\hat{\sigma}^2) = \frac{2(n-1)}{n^2}\sigma^4$$

So:

$$MSE(\hat{\sigma}^2) = \text{var}(\hat{\sigma}^2) + \text{bias}^2(\hat{\sigma}^2) = \frac{2(n-1)}{n^2}\sigma^4 + \left(-\frac{\sigma^2}{n}\right)^2 = \frac{2n-1}{n^2}\sigma^4$$

(ii) Since the MSE, $\dfrac{2n-1}{n^2}\sigma^4$, tends to zero as $n \to \infty$, the estimator is consistent.
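The formula $MSE = \text{variance} + \text{bias}^2$ can be evaluated directly for the divisor-$n$ estimator of a normal variance, giving $(2n-1)\sigma^4/n^2$. The R sketch below does this and compares the result with the MSE of $S^2$, which is unbiased with variance $2\sigma^4/(n-1)$ because $(n-1)S^2/\sigma^2 \sim \chi^2_{n-1}$. The function names `mse_sigma2_hat` and `mse_S2` and the choice $\sigma^2 = 1$ are hypothetical, introduced only for illustration.

```r
# MSE of the divisor-n estimator: variance + bias^2
mse_sigma2_hat <- function(n, sigma2) {
  variance <- 2 * (n - 1) * sigma2^2 / n^2
  bias <- -sigma2 / n
  variance + bias^2
}

# MSE of S^2 (divisor n - 1): unbiased, so MSE = variance
mse_S2 <- function(n, sigma2) {
  2 * sigma2^2 / (n - 1)
}

n <- c(5, 10, 100, 1000)
mse_table <- data.frame(
  n = n,
  mse_sigma2_hat = mse_sigma2_hat(n, sigma2 = 1),
  mse_S2 = mse_S2(n, sigma2 = 1)
)
mse_table
```

Under these assumptions the biased estimator has the smaller MSE at every sample size shown, and both columns shrink towards zero as $n$ grows, which is the consistency property.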
5 Asymptotic distribution of maximum likelihood estimators

Given a random sample of size $n$ from a distribution with density (or probability function in the discrete case) $f(x; \theta)$, the maximum likelihood estimator $\hat{\theta}$ is such that, for large $n$, $\hat{\theta}$ is approximately normal, and is unbiased with variance given by the Cramér-Rao lower bound, that is:

$$\hat{\theta} \sim N(\theta, CRLB)$$

where:

$$CRLB = \frac{1}{nE\left\{\left[\dfrac{\partial}{\partial\theta} \log f(X; \theta)\right]^2\right\}}$$

The MLE can therefore be called asymptotically efficient in that, for large $n$, it is unbiased with a variance equal to the lowest possible value of unbiased estimators.

The Core Reading is saying that the CRLB gives a lower bound for the variance of an unbiased estimator of a parameter (which is the same as its mean square error). So no unbiased estimator can have a smaller variance than the CRLB.

This is potentially a very useful result as it provides an approximate distribution for the MLE when the true sampling distribution may be unknown or impossible to determine easily, and hence may be used to obtain approximate confidence intervals.

Confidence intervals will be covered in a later chapter.

The result holds under very general conditions with only one major exclusion: it does not apply in cases where the support of the distribution involves the parameter, such as the uniform distribution.

This is due to a discontinuity, so the derivative in the formula doesn't make sense.

There are two useful alternative expressions for the CRLB based on the likelihood itself. Noting that $L(\theta)$ is really $L(\theta, \underline{X})$, these are:

$$CRLB = \frac{1}{E\left\{\left[\dfrac{\partial}{\partial\theta} \log L(\theta, \underline{X})\right]^2\right\}}$$

and:

$$CRLB = -\frac{1}{E\left[\dfrac{\partial^2}{\partial\theta^2} \log L(\theta, \underline{X})\right]}$$

The second formula is normally easier to work with (as we would have calculated the second derivative of the log-likelihood when checking that we get a maximum). This formula is given on page 23 of the Tables.
Question

Derive the CRLB for estimators of $\mu$, for a sample $X_1, \ldots, X_n$ from a $Poi(\mu)$ distribution.

Solution

The likelihood is:

$$L(\mu) = \prod_{i=1}^{n} \frac{e^{-\mu}\mu^{X_i}}{X_i!} = \text{constant} \times e^{-n\mu}\mu^{\sum X_i}$$

So:

$$\ln L(\mu) = \text{constant} - n\mu + \sum X_i \ln \mu$$

Differentiating with respect to $\mu$ gives:

$$\frac{d}{d\mu} \ln L(\mu) = -n + \frac{\sum X_i}{\mu}$$

Setting this equal to zero would give the MLE $\hat{\mu} = \bar{X}$.

Differentiating again (which we would have done to check we get a maximum):

$$\frac{d^2}{d\mu^2} \ln L(\mu) = -\frac{\sum X_i}{\mu^2}$$

Finding the expectation of this (noting that only the $X_i$'s are random variables):

$$E\left[\frac{d^2}{d\mu^2} \ln L(\mu)\right] = -\frac{1}{\mu^2}\sum_{i=1}^{n} E[X_i] = -\frac{1}{\mu^2} \times n\mu = -\frac{n}{\mu}$$

So, from the second formula for the CRLB:

$$CRLB = -\frac{1}{E\left[\dfrac{d^2}{d\mu^2} \ln L(\mu)\right]} = \frac{\mu}{n}$$

In fact, in this case, the maximum likelihood estimator $\hat{\mu} = \bar{X}$ is unbiased and has variance $\dfrac{\mu}{n}$. So, the estimator attains the CRLB.

We can find the CRLB for estimators of parameters from continuous distributions.

Question

(i) Show that the CRLB for unbiased estimators of $\mu$, based on a random sample of $n$ observations from a $N(\mu, \sigma^2)$ distribution with known variance $\sigma^2$, is given by $\dfrac{\sigma^2}{n}$.

(ii) Show that the maximum likelihood estimator $\hat{\mu} = \bar{X}$ attains the CRLB.

Solution

(i) From the question on page 14, we see that:

$$\frac{\partial}{\partial\mu} \ln L(\mu) = -\frac{1}{2\sigma^2}\sum_{i=1}^{n} 2(X_i - \mu)(-1) = \frac{1}{\sigma^2}\left(\sum_{i=1}^{n} X_i - n\mu\right)$$

Setting this equal to zero and rearranging gives the MLE $\hat{\mu} = \bar{X}$.

Note: We have changed $x_i$ to $X_i$ as we are working with the estimator.

Differentiating again gives:

$$\frac{\partial^2}{\partial\mu^2} \ln L(\mu) = -\frac{n}{\sigma^2}$$

Since there are no $X_i$'s, all the values are constants, and hence:

$$E\left[\frac{\partial^2}{\partial\mu^2} \ln L(\mu)\right] = E\left[-\frac{n}{\sigma^2}\right] = -\frac{n}{\sigma^2}$$

So, from the second formula for the CRLB:

$$CRLB = -\frac{1}{E\left[\dfrac{\partial^2}{\partial\mu^2} \ln L(\mu)\right]} = \frac{\sigma^2}{n}$$

(ii) From a previous chapter we saw that if $X \sim N(\mu, \sigma^2)$ then $\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$, so $\text{var}(\bar{X}) = \dfrac{\sigma^2}{n}$.

Hence the MLE attains the CRLB.
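The Poisson result above, that the sample mean is unbiased with variance equal to the CRLB of $\mu/n$, can be checked by simulation. This R sketch is illustrative only: the values of `mu`, `n` and `reps` and the seed are assumptions, not values from the Core Reading.

```r
set.seed(2)
mu <- 3          # true Poisson mean (assumed)
n <- 20          # sample size (assumed)
reps <- 20000    # number of simulated samples (assumed)

# Sample mean (the MLE) for each simulated Poisson sample
xbar <- replicate(reps, mean(rpois(n, lambda = mu)))

crlb <- mu / n   # Cramer-Rao lower bound derived above

round(c(simulated_mean = mean(xbar),
        simulated_var = var(xbar),
        crlb = crlb), 4)
```

The simulated variance should sit close to `crlb`, with the gap attributable to simulation error rather than to any inefficiency of the estimator.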
What follows now is an example to illustrate the fact that if we want to obtain the CRLB for the variance, $\sigma^2$, we can't just take the CRLB for the standard deviation, $\sigma$, and square it.

The reason for this is that the formula for the CRLB of $\sigma$ is:

$$CRLB(\sigma) = -\frac{1}{E\left[\dfrac{d^2}{d\sigma^2} \ln L(\sigma)\right]}$$

whereas the formula for the CRLB of $v = \sigma^2$ is:

$$CRLB(v) = -\frac{1}{E\left[\dfrac{d^2}{dv^2} \ln L(v)\right]}$$

There is no simple connection between the derivatives.

Question

Derive the CRLB for estimators of the variance of a $N(\mu, \sigma^2)$ distribution, where $\mu$ is known, based on a random sample of $n$ observations.

Solution

We need to work in terms of the population variance $\sigma^2$, which we will write as $v$. The likelihood function is:

$$L(v) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi v}} \exp\left\{-\frac{(X_i - \mu)^2}{2v}\right\} = \text{constant} \times v^{-n/2} \exp\left\{-\frac{1}{2v}\sum_{i=1}^{n}(X_i - \mu)^2\right\}$$

Taking logs:

$$\ln L(v) = -\frac{n}{2}\ln v - \frac{1}{2v}\sum_{i=1}^{n}(X_i - \mu)^2 + \text{constant}$$

Differentiating with respect to $v$ gives:

$$\frac{\partial}{\partial v} \ln L(v) = -\frac{n}{2v} + \frac{1}{2v^2}\sum_{i=1}^{n}(X_i - \mu)^2$$

Differentiating again:

$$\frac{\partial^2}{\partial v^2} \ln L(v) = \frac{n}{2v^2} - \frac{1}{v^3}\sum_{i=1}^{n}(X_i - \mu)^2$$

ie, since $v = \sigma^2$:

$$\frac{\partial^2}{\partial v^2} \ln L(v) = \frac{n}{2v^2} - \frac{1}{v^2}\sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2$$

We need to determine the expectation of this. We will use the fact that $X_i \sim N(\mu, \sigma^2)$, so $Z_i = \dfrac{X_i - \mu}{\sigma} \sim N(0, 1)$ and hence:

$$E(Z_i^2) = \text{var}(Z_i) + E^2(Z_i) = 1 + 0^2 = 1$$

So we have:

$$E\left[\frac{\partial^2}{\partial v^2} \ln L(v)\right] = \frac{n}{2v^2} - \frac{1}{v^2}\sum_{i=1}^{n} E\left[Z_i^2\right] = \frac{n}{2v^2} - \frac{1}{v^2}\sum_{i=1}^{n} 1 = \frac{n}{2v^2} - \frac{n}{v^2} = -\frac{n}{2v^2}$$

Hence:

$$CRLB = -\frac{1}{E\left[\dfrac{\partial^2}{\partial v^2} \log L\right]} = \frac{2v^2}{n} = \frac{2\sigma^4}{n}$$

We now consider the CRLB for a random sample of observations from an exponential distribution.

Question

Given a random sample of $n$ observations from an $Exp(\lambda)$ distribution, determine the CRLB for unbiased estimators of:

(i) $\lambda$

(ii) the population mean, $\mu = \dfrac{1}{\lambda}$.

Comment on the results.

Solution

(i) Using the Core Reading example from page 11, we have:

$$L(\lambda) = \prod_{i=1}^{n} \lambda e^{-\lambda X_i} = \lambda^n e^{-\lambda \sum X_i}$$
$$\ln L(\lambda) = n \ln \lambda - \lambda \sum_{i=1}^{n} X_i$$

Differentiating this with respect to $\lambda$ gives:

$$\frac{d}{d\lambda} \ln L(\lambda) = \frac{n}{\lambda} - \sum_{i=1}^{n} X_i$$

Setting this equal to zero gives the estimator $\hat{\lambda} = \dfrac{1}{\bar{X}}$.

Differentiating again with respect to $\lambda$ gives:

$$\frac{d^2}{d\lambda^2} \ln L(\lambda) = -\frac{n}{\lambda^2}$$

Since there are no $X_i$'s, all the values are constants and hence:

$$E\left[\frac{d^2}{d\lambda^2} \ln L(\lambda)\right] = E\left[-\frac{n}{\lambda^2}\right] = -\frac{n}{\lambda^2}$$

So, from the second formula for the CRLB:

$$CRLB = -\frac{1}{E\left[\dfrac{d^2}{d\lambda^2} \ln L(\lambda)\right]} = \frac{\lambda^2}{n}$$

(ii) We are estimating the mean of an $Exp(\lambda)$ distribution, ie $\mu = \dfrac{1}{\lambda}$, therefore we need to work in terms of $\mu$ and differentiate with respect to $\mu$.

The likelihood function for the sample is:

$$L(\mu) = \prod_{i=1}^{n} \frac{1}{\mu} e^{-X_i/\mu} = \mu^{-n} e^{-\frac{1}{\mu}\sum X_i}$$

$$\ln L(\mu) = -n \ln \mu - \frac{1}{\mu}\sum X_i$$

Differentiating with respect to $\mu$:

$$\frac{d}{d\mu} \ln L(\mu) = -\frac{n}{\mu} + \frac{\sum X_i}{\mu^2}$$

Differentiating again with respect to $\mu$:

$$\frac{d^2}{d\mu^2} \ln L(\mu) = \frac{n}{\mu^2} - \frac{2\sum X_i}{\mu^3}$$

Finding the expectation of this:

$$E\left[\frac{d^2}{d\mu^2} \ln L(\mu)\right] = \frac{n}{\mu^2} - \frac{2}{\mu^3}\sum_{i=1}^{n} E[X_i]$$

Since $X_i \sim Exp(\lambda)$, we have $E(X_i) = \dfrac{1}{\lambda} = \mu$ and hence:

$$E\left[\frac{d^2}{d\mu^2} \ln L(\mu)\right] = \frac{n}{\mu^2} - \frac{2}{\mu^3} \times n\mu = \frac{n}{\mu^2} - \frac{2n}{\mu^2} = -\frac{n}{\mu^2}$$

So, from the second formula for the CRLB:

$$CRLB = -\frac{1}{E\left[\dfrac{d^2}{d\mu^2} \log L\right]} = \frac{\mu^2}{n}$$

Comment

Although $\mu = \dfrac{1}{\lambda}$, we see that $CRLB(\mu) \ne \dfrac{1}{CRLB(\lambda)}$.

In fact, we actually have $CRLB(\mu) = \dfrac{\mu^2}{n} = \dfrac{1}{n\lambda^2}$.

6 Comparing the method of moments with ML estimation

We now compare the method of moments and the method of maximum likelihood.

Essentially maximum likelihood is regarded as the better method.

In the usual one-parameter case the method of moments estimator is always a function of the sample mean $\bar{X}$ and this must limit its usefulness in some situations. For example, in the case of the uniform distribution on $[0, \theta]$ the method of moments estimator is $2\bar{X}$ and this can result in inadmissible estimates, where one or more of the observed values is greater than the estimated value of $\theta$.
For example, supposing we had the following data from $U[0, \theta]$:

4.5, 1.8, 2.7, 0.9, 1.3

This gives $\bar{x} = 2.24$. Since the method of moments estimator is $\hat{\theta} = 2\bar{X}$, we have $\hat{\theta} = 4.48$. This estimate for the upper limit is inadmissible as one of the data values is greater than 4.48.

Nevertheless in many common applications such as the binomial, Poisson, exponential and normal cases both methods yield the same estimator.

In some situations such as the gamma with two unknown parameters the simplicity of the method of moments gives it a possible advantage over maximum likelihood, which may require a complicated numerical solution.

To obtain the MLE of $\alpha$ from a gamma distribution requires the differentiation of $\Gamma(\alpha)$, which requires numerical methods.

7 The bootstrap method

This section of the Core Reading refers to the use of R in bootstrapping. This material is not explained in detail here; we cover it in the PBOR resources for Subject CS1.

7.1 Introduction to bootstrap

The bootstrap method is a computer-intensive estimation method and can be used to estimate the properties of an estimator. There are two main types: parametric and non-parametric bootstrap.

Suppose that we want to make inferences about a parameter $\theta$ using observed data $(y_1, y_2, \ldots, y_n)$ which follow a distribution with cumulative distribution function $F(y; \theta)$. Usually inference is based on the sampling distribution of an estimator $\hat{\theta}$. A sampling distribution is obtained either by theoretical results, or is based on a large number of samples from $F(y; \theta)$.

For example, suppose we have a sample $(y_1, y_2, \ldots, y_n)$ from an exponential distribution with parameter $\lambda$ and we wish to make inferences about $\lambda$. The CLT tells us that asymptotically:

$$\bar{Y} \sim N\left(\frac{1}{\lambda}, \frac{1}{n\lambda^2}\right)$$

and we can use this sampling distribution to estimate quantities of interest (eg for confidence intervals or tests about $\lambda$).

However, there will be cases where assumptions or asymptotic results may not hold (or we may not want to use them, eg when samples are small). Then one alternative option is to use the bootstrap method. Bootstrap allows us to avoid making assumptions about the sampling distribution of a statistic of interest, by instead forming an empirical sampling distribution of the statistic. This is generally achieved by resampling based on the available sample.

7.2 Non-parametric (full) bootstrap

The main idea behind non-parametric bootstrap, when estimating a parameter $\theta$, can be described as follows.

Construct the empirical distribution, $\hat{F}_n$, of the data:

$$\hat{F}_n(y) = \frac{1}{n}\left\{\text{Number of } y_i \le y\right\}$$

Then perform the following steps:

1. Draw a sample of size $n$ from $\hat{F}_n$. This is the bootstrap sample $(y_1^*, y_2^*, \ldots, y_n^*)$ with each $y^*$ selected with replacement from $(y_1, y_2, \ldots, y_n)$.

2. Obtain an estimate $\hat{\theta}^*$ from the bootstrap sample. This is done in the same way as $\hat{\theta}$ is obtained from the original sample.

Repeat steps 1 and 2, say, $B$ times.

Provided that $B$ is sufficiently large, the output set of estimates $(\hat{\theta}_1^*, \hat{\theta}_2^*, \ldots, \hat{\theta}_B^*)$ will provide the empirical distribution of $\hat{\theta}^*$, which serves as an estimate of the sampling distribution of $\hat{\theta}$, and is referred to as the bootstrap empirical distribution of $\hat{\theta}$.

Schematically, this can be thought of as:

    original sample:  (y_1, y_2, ..., y_n)
    sample 1:         (y*_1, y*_2, ..., y*_n)  ->  theta-hat*_1
    sample 2:         (y*_1, y*_2, ..., y*_n)  ->  theta-hat*_2
    ...
    sample B:         (y*_1, y*_2, ..., y*_n)  ->  theta-hat*_B
    (theta-hat*_1, ..., theta-hat*_B)  ->  bootstrap empirical distribution of theta-hat

The bootstrap distribution of $\hat{\theta}$ can then be used for any desired inference regarding the estimator $\hat{\theta}$, and particularly to estimate its properties. For example we can:

estimate the mean of estimator $\hat{\theta}$ by using the sample mean of the bootstrap estimates $(\hat{\theta}_1^*, \hat{\theta}_2^*, \ldots, \hat{\theta}_B^*)$:

$$\hat{E}(\hat{\theta}) = \frac{1}{B}\sum_{j=1}^{B} \hat{\theta}_j^*$$

estimate its median, using the 0.5 empirical quantile of the bootstrap estimates $\hat{\theta}_j^*$;

estimate the variance of estimator $\hat{\theta}$ by using the sample variance of the bootstrap estimates $(\hat{\theta}_1^*, \hat{\theta}_2^*, \ldots, \hat{\theta}_B^*)$:

$$\widehat{\text{var}}(\hat{\theta}) = \frac{1}{B-1}\sum_{j=1}^{B}\left(\hat{\theta}_j^* - \frac{1}{B}\sum_{j=1}^{B}\hat{\theta}_j^*\right)^2$$

estimate a $100(1-\alpha)\%$ confidence interval for $\theta$ by:

$$\left(k_{\alpha/2},\; k_{1-\alpha/2}\right)$$

where $k_\alpha$ denotes the $\alpha$th empirical quantile of the bootstrap values $\hat{\theta}^*$.

Confidence intervals are described in Chapter 9.

Example

Suppose we have the following sample of 10 values (to 2 DP) from an $Exp(\lambda)$ distribution with unknown parameter $\lambda$:

0.61 6.47 2.56 5.44 2.72 0.87 2.77 6.00 0.14 0.75

We can use the following R code to obtain a single resample with replacement from this original sample.

    sample.data <- c(0.61, 6.47, 2.56, 5.44, 2.72, 0.87, 2.77, 6.00, 0.14, 0.75)
    sample(sample.data, replace=TRUE)

If we do this, R automatically gives us a sample of the same size as the original data sample, ie we obtain a sample of size 10 in this case. Note that this is non-parametric as we are ignoring the $Exp(\lambda)$ assumption to obtain a new sample.

The following R code obtains $B = 1{,}000$ estimates $(\hat{\lambda}_1^*, \hat{\lambda}_2^*, \ldots, \hat{\lambda}_{1{,}000}^*)$ using $\hat{\lambda}_j^* = 1/\bar{y}_j^*$ and stores them in the vector estimate:

    set.seed(47)
    estimate <- rep(0, 1000)
    for (i in 1:1000) {
      x <- sample(sample.data, replace=TRUE)
      estimate[i] <- 1/mean(x)
    }

An alternative would be to use:

    set.seed(47)
    estimate <- replicate(1000, 1/mean(sample(sample.data, replace=TRUE)))

This gives us the following empirical sampling distribution of $\hat{\lambda}$:

[Figure: empirical sampling distribution of $\hat{\lambda}$ from the non-parametric bootstrap]

We can obtain estimates for the mean, standard error and 95% confidence interval of the estimator $\hat{\lambda}$ using the following R code:

    mean(estimate)
    sd(estimate)
    quantile(estimate, c(0.025, 0.975))

7.3 Parametric bootstrap

If we are prepared to assume that the sample comes from a given distribution, we first obtain an estimate $\hat{\theta}$ of the parameter of interest (eg using maximum likelihood, or method of moments). Then we use the assumed distribution, with parameter equal to $\hat{\theta}$, to draw the bootstrap samples. Once the bootstrap samples are available, we proceed as with the non-parametric method before.

Example

Using our sample of 10 values (to 2 DP) from an $Exp(\lambda)$ distribution with unknown parameter $\lambda$:

0.61 6.47 2.56 5.44 2.72 0.87 2.77 6.00 0.14 0.75

our estimate for $\lambda$ would be $\hat{\lambda} = \dfrac{1}{\bar{y}} = \dfrac{1}{2.833} = 0.3530$. We now use the $Exp(0.3530)$ distribution to generate the bootstrap samples.

Note that this is parametric as we are using the exponential distribution to obtain new samples.

We can use the following R code to obtain $B = 1{,}000$ estimates $(\hat{\lambda}_1^*, \hat{\lambda}_2^*, \ldots, \hat{\lambda}_{1{,}000}^*)$ using $\hat{\lambda}_j^* = 1/\bar{y}_j^*$ and store them in the vector param.estimate:

    set.seed(47)
    param.estimate <- rep(0, 1000)
    for (i in 1:1000) {
      x <- rexp(10, rate=1/mean(sample.data))
      param.estimate[i] <- 1/mean(x)
    }

An alternative would be to use:

    param.estimate <- replicate(1000, 1/mean(rexp(10, rate=1/mean(sample.data))))

This gives us the following empirical sampling distribution of $\hat{\lambda}$:

[Figure: empirical sampling distribution of $\hat{\lambda}$ from the parametric bootstrap]

Various inferences can then be made using the bootstrap estimates $(\hat{\lambda}_1^*, \hat{\lambda}_2^*, \ldots, \hat{\lambda}_B^*)$, as before.

Bootstrap methodology can also be used in other, more complicated, scenarios, for example in regression analysis or generalised linear model settings.
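The two bootstrap approaches can be run side by side on the same ten data values and summarised in one table. The R sketch below is an illustration, not Core Reading code: the helper `boot_summary` and the object names `nonpar` and `param` are hypothetical, and the bias column (mean of the bootstrap estimates minus the original estimate) is the usual bootstrap estimate of bias, added here as an extra property beyond those computed above.

```r
sample.data <- c(0.61, 6.47, 2.56, 5.44, 2.72, 0.87, 2.77, 6.00, 0.14, 0.75)
lambda_hat <- 1 / mean(sample.data)   # estimate from the original sample

# Hypothetical helper: summarise a vector of bootstrap estimates
boot_summary <- function(boot, original) {
  c(mean  = mean(boot),
    bias  = mean(boot) - original,            # bootstrap estimate of bias
    se    = sd(boot),                         # bootstrap standard error
    lower = unname(quantile(boot, 0.025)),    # 95% percentile interval
    upper = unname(quantile(boot, 0.975)))
}

set.seed(47)
# Non-parametric: resample the data with replacement
nonpar <- replicate(1000, 1 / mean(sample(sample.data, replace = TRUE)))
# Parametric: simulate from Exp(lambda_hat)
param <- replicate(1000,
                   1 / mean(rexp(length(sample.data), rate = lambda_hat)))

results <- rbind(nonparametric = boot_summary(nonpar, lambda_hat),
                 parametric    = boot_summary(param, lambda_hat))
round(results, 4)
```

Because both sets of estimates are drawn after a single `set.seed` call, the numbers will not match those produced by the separate code fragments earlier in this section.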
Chapter 8 Summary

Method of moments

The method of moments technique equates the population moments to the sample moments using the formulae:

1 parameter: $E(X) = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}$

2 parameters: $E(X) = \bar{X}$ and $E(X^2) = \frac{1}{n}\sum_{i=1}^{n} X_i^2$, or $\mathrm{var}(X) = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$

alternatively: $E(X) = \bar{X}$ and $\mathrm{var}(X) = S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$

Maximum likelihood estimation

The method of maximum likelihood has the following stages:

find the likelihood $L(\theta) = \prod_{i=1}^{n} f(x_i; \theta)$

find $\ln L$

find $\theta$ that solves $\frac{\partial}{\partial\theta}\ln L(\theta) = 0$

check for maximum: $\frac{\partial^2}{\partial\theta^2}\ln L(\theta) < 0$.

If the range of the distribution is a function of the parameter, the maximum must be found from first principles.

Properties of estimators

The bias of an estimator is given by $E[g(X)] - \theta$, where $g(X)$ is the estimator.

$g(X)$ is an unbiased estimator of $\theta$ if $E[g(X)] = \theta$.

The mean square error of an estimator is given by $E[(g(X) - \theta)^2]$, where $g(X)$ is the estimator. An easier formula is $\mathrm{var}[g(X)] + \mathrm{bias}^2[g(X)]$.

An estimator is consistent if the mean square error tends to zero as $n$ tends to infinity, where $n$ is the size of the sample.

A good estimator has a small MSE, is unbiased and consistent.

The Cramér-Rao lower bound gives a lower bound for the variance of an unbiased estimator. It can be used to obtain confidence intervals. Its formula is:

$$CRLB(\theta) = -\frac{1}{E\left[\frac{\partial^2}{\partial\theta^2}\ln L(\theta, X)\right]}$$

The value of the CRLB depends on the parameter you are estimating. To use this formula, the likelihood must be expressed in terms of the correct parameter.

The asymptotic distribution of an MLE is:

$$\hat\theta \sim N(\theta, CRLB) \quad \text{(approximately)}$$
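The maximum likelihood stages and the CRLB in the summary can be illustrated numerically. The R sketch below is a hedged illustration rather than the course's own method: it assumes the $Exp(\lambda)$ sample of 10 values used in the bootstrap example earlier in the chapter, maximises the log-likelihood with optimize, and evaluates the CRLB at the estimate (for the exponential distribution the second derivative of $\ln L$ is $-n/\lambda^2$). The names loglik, lambda.mle, crlb and approx.ci are hypothetical.

```r
# Sample assumed to come from Exp(lambda)
x <- c(0.61, 6.47, 2.56, 5.44, 2.72, 0.87, 2.77, 6.00, 0.14, 0.75)
n <- length(x)

# Log-likelihood: ln L(lambda) = n ln(lambda) - lambda * sum(x)
loglik <- function(lambda) n * log(lambda) - lambda * sum(x)

# Numerical maximisation (the search interval is an arbitrary assumption)
fit <- optimize(loglik, interval = c(0.001, 5), maximum = TRUE, tol = 1e-8)
lambda.mle <- fit$maximum

# Second derivative of ln L is -n / lambda^2, which is negative, so a maximum
d2 <- -n / lambda.mle^2

# CRLB = -1 / E[d2 ln L / d lambda^2] = lambda^2 / n, evaluated at the MLE
crlb <- -1 / d2

# Approximate 95% interval from the asymptotic normal distribution of the MLE
approx.ci <- lambda.mle + c(-1, 1) * qnorm(0.975) * sqrt(crlb)

print(c(mle = lambda.mle, crlb = crlb))
print(approx.ci)
```

The numerical maximum should agree with the closed-form estimate $1/\bar{x}$, which is a useful check on the algebra.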
Chapter 8 Practice Questions

8.1 A random sample from a $Poi(\mu)$ distribution is as follows:

4, 2, 7, 3, 1, 2, 5, 4, 0, 2

Calculate the method of moments estimate for $\mu$.

8.2 The heights of 10-year-old children are assumed to conform to a normal distribution. The heights of a random sample of 5 such children are:

124cm, 122cm, 130cm, 125cm and 132cm

Estimate the mean and variance of the heights of 10-year-old children using the method of moments.

8.3 Waiting times in a post office queue have an $Exp(\lambda)$ distribution. Ten people had waiting times (in minutes) of:

1.6  0.9  1.1  2.1  0.7  1.5  2.3  1.7  3.0  3.4

A further six people had waiting times of more than 4 minutes.

Calculate the maximum likelihood estimate of $\lambda$ based on these data.

8.4 (Exam style) The number of claims arising in a year on a certain type of insurance policy has a Poisson distribution with parameter $\lambda$. The insurer's claim file shows that claims were made on 238 policies during the last year with the following frequency distribution for the number of claims:

Number of claims    1     2     3     4     ≥5
Frequency           174   50    10    4     0

No information is available from the policy file, that is, only data concerning those policies on which claims were made can be used in the estimation of the claim rate $\lambda$. (This is why there is no entry for the number of claims being 0 in the table.)

(i) Show that the truncated probability function is given by:

$$P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!\,(1 - e^{-\lambda})}, \quad x = 1, 2, 3, \ldots$$ [3]

(ii) Show that both the method of moments estimate and the MLE of $\lambda$ satisfy the equation $\lambda = \bar{x}(1 - e^{-\lambda})$, where $\bar{x}$ is the mean number of claims for policies that have at least one claim. [7]

(iii) Solve this equation, by any means, for the given data and calculate the resulting estimate of $\lambda$ to two decimal places. [3]

(iv) Hence, estimate the percentage of all policies with no claims during the year.
[1]
[Total 14]

8.5 Determine the mean square error of $\bar{X}$, which is used to estimate the mean $\mu$ of a $N(\mu, \sigma^2)$ distribution based on a random sample of $n$ observations.

8.6 (Exam style) Suppose that unbiased estimators $X_1$ and $X_2$ of a parameter $\theta$ have been determined by two independent methods, and suppose that $\mathrm{var}(X_1) = \sigma^2$ and $\mathrm{var}(X_2) = \phi\sigma^2$, where $\phi > 0$.

Let $Y$ be the combination given by $Y = \alpha X_1 + \beta X_2$, where $\alpha$ and $\beta$ denote non-negative weights.

(i) Derive the relationship satisfied by $\alpha$ and $\beta$ so that $Y$ is also an unbiased estimator of $\theta$. [2]

(ii) Determine the variance of $Y$ in terms of $\phi$ and $\sigma^2$ if, additionally, the weights are chosen such that the variance of $Y$ is a minimum. [4]

8.7 (Exam style) A random sample $x_1, x_2, \ldots, x_n$ is taken from a population, which has the probability distribution function $F(x)$ and the density function $f(x)$. The values in the sample are arranged in order and the minimum and maximum values $x_{MIN}$ and $x_{MAX}$ are recorded.

(i) Show that the distribution function of $X_{MAX}$ is $[F(x)]^n$, and find a corresponding formula for the distribution function of $X_{MIN}$. [3]

The original distribution is now believed to be a $Pareto(\alpha, 1)$ distribution, ie the probability density function is:

$$f(x) = \frac{\alpha}{(1+x)^{\alpha+1}}, \quad x > 0$$

(ii) Determine the distribution function of $X$, and hence determine the distribution function of $X_{MAX}$. [2]

(iii) Show that the probability density function for the distribution of $X_{MIN}$ is:

$$f_{X_{MIN}}(x) = \frac{n\alpha}{(1+x)^{n\alpha+1}}, \quad x > 0$$ [2]

A random sample of 25 values gives a sample value for $x_{MIN}$ of 23.

(iv) Obtain a maximum likelihood estimate of $\alpha$ using the distribution of $X_{MIN}$. [3]

The same random sample gives a value of $x_{MAX}$ of 770.

(v) Obtain an equation for the maximum likelihood estimator of $\alpha$ using $x_{MAX}$. Comment on the difficulty of solving this equation. [3]

(vi) Outline what further information you would need here in order to obtain a method of moments estimate of $\alpha$.
[1]
[Total 14]

8.8 A random sample of eight observations from a distribution is given below:

4.8  7.6  3.5  2.9  0.8  0.5  2.3  1.2

(i) Derive the method of moments estimates for:

(a) $\lambda$ from an $Exp(\lambda)$ distribution

(b) $\nu$ from a $\chi^2_\nu$ distribution.

(ii) Derive the method of moments estimators for:

(a) $k$ and $p$ from a Type 2 negative binomial distribution

(b) $\mu$ and $\sigma^2$ from a lognormal distribution.

8.9 Show that the likelihood that an observation from a $Poisson(\lambda)$ distribution takes an odd value (ie 1, 3, 5, ...) is $\frac{1}{2}(1 - e^{-2\lambda})$.

8.10 A discrete random variable has a probability function given by:

x           2                4                5
P(X = x)    1/8 + 2α         1/2 - 3α         3/8 + α

(i) Give the range of possible values for the unknown parameter $\alpha$.

A random sample of 30 observations gave respective frequencies of 7, 6 and 17.

(ii) Calculate the method of moments estimate of $\alpha$.

(iii) Write down an expression for the likelihood of these data and hence show that the maximum likelihood estimate $\hat\alpha$ satisfies the quadratic equation:

$$180\alpha^2 + \frac{111}{8}\alpha - \frac{91}{32} = 0$$

(iv) Hence determine the maximum likelihood estimate and explain why one root is rejected as a possible estimate of $\alpha$.

8.11 (Exam style) A motor insurance portfolio produces claim incidence data for 100,000 policies over one year. The table below shows the observed number of policyholders making 0, 1, 2, 3, 4, 5, and 6 or more claims in a year.

No. of claims    No. of policies
0                87,889
1                11,000
2                1,000
3                100
4                10
5                1
≥6               0
Total            100,000

(i) (a) Estimate the parameter of the Poisson distribution to fit the above data using the method of moments.

(b) Hence calculate the expected number of policies giving rise to the different numbers of claims assuming the Poisson model. [3]

(ii) Show that the estimate of the Poisson parameter calculated from the above data using the method of moments is also the maximum likelihood estimate of this parameter.
[4]

(iii) (a) Estimate the two parameters of the Type 2 negative binomial distribution to fit the above data using the method of moments.

(b) Hence calculate the expected number of policies giving rise to the different numbers of claims assuming a negative binomial model. [6]

You may use the relationship:

$$P(X = x) = \frac{k + x - 1}{x}\, q\, P(X = x - 1)$$

for the negative binomial distribution.

(iv) Explain briefly why you would expect a negative binomial distribution to fit the above data better than a Poisson distribution. [2]
[Total 15]

8.12 (Exam style) A random sample $X_1, \ldots, X_n$ is taken from the normal distribution which has mean $\mu$ and variance $\sigma^2$.

(i) State the distribution of $\dfrac{\sum (X_i - \bar{X})^2}{\sigma^2}$. [1]

It is decided to estimate the variance, $\sigma^2$, using the following estimator:

$$\hat\sigma^2 = \frac{1}{n+b}\sum (X_i - \bar{X})^2$$

where $b$ is a constant.

(ii) (a) Use part (i) to obtain the bias of $\hat\sigma^2$.

(b) Hence, show that $\hat\sigma^2$ is unbiased when $b = -1$. [3]

(iii) (a) Show, using parts (i) and (ii)(a), that the mean square error of $\hat\sigma^2$ is given by:

$$MSE(\hat\sigma^2) = \frac{2(n-1) + (1+b)^2}{(n+b)^2}\,\sigma^4$$

(b) Determine whether the estimator, $\hat\sigma^2$, is consistent.

(c) Show that the mean square error of $\hat\sigma^2$ is minimised when $b = 1$. You may assume that the turning point is a minimum. [7]

(iv) Comment on the best choice for the value of $b$. [2]
[Total 13]

Chapter 8 Solutions

8.1 The population mean for a $Poi(\mu)$ distribution from page 7 of the Tables is $\mu$.

The sample mean is $\bar{x} = \frac{30}{10} = 3$.

Equating population mean to sample mean gives $\mu = 3$.

Since this is an estimate of the true value of $\mu$, we write $\hat\mu = 3$.

8.2
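The sample moments are:

$$\frac{1}{n}\sum_{i=1}^{n} x_i = \frac{633}{5} = 126.6 \quad\text{and}\quad \frac{1}{n}\sum_{i=1}^{n} x_i^2 = \frac{80{,}209}{5} = 16{,}041.8$$

The population moments are $E(X) = \mu$ and $E(X^2) = \mathrm{var}(X) + [E(X)]^2 = \sigma^2 + \mu^2$. Equating the sample and population moments gives:

$$\hat\mu = 126.6$$
$$\hat\sigma^2 + \hat\mu^2 = 16{,}041.8 \;\Rightarrow\; \hat\sigma^2 = 14.24$$

Alternatively, using $\bar{x} = 126.6$ and $s^2 = \frac{1}{4}\{80{,}209 - 5 \times 126.6^2\} = 17.8$ and equating these to the population moments of $E(X) = \mu$ and $\mathrm{var}(X) = \sigma^2$ gives:

$$\hat\mu = \bar{x} = 126.6 \quad\text{and}\quad \hat\sigma^2 = s^2 = 17.8$$

8.3 Using the likelihood formula given for censored data in Section 2.4:

$$L(\lambda) = \left[\prod_{i=1}^{10} f(x_i)\right] \times [P(X > 4)]^6 = \lambda^{10} e^{-\lambda\sum x_i} \times (e^{-4\lambda})^6$$

since $f(x_i) = \lambda e^{-\lambda x_i}$ and $P(X > 4) = \int_4^\infty \lambda e^{-\lambda x}\,dx = \left[-e^{-\lambda x}\right]_4^\infty = e^{-4\lambda}$.

Taking logs:

$$\ln L(\lambda) = 10\ln\lambda - \lambda\sum x_i - 24\lambda$$

Since $\sum x_i = 18.3$ we get:

$$\ln L(\lambda) = 10\ln\lambda - 42.3\lambda$$

Differentiating:

$$\frac{d}{d\lambda}\ln L(\lambda) = \frac{10}{\lambda} - 42.3$$

This is equal to 0 when $\lambda = \frac{10}{42.3} = 0.2364$.

Differentiating again:

$$\frac{d^2}{d\lambda^2}\ln L(\lambda) = -\frac{10}{\lambda^2} < 0 \;\Rightarrow\; \text{max}$$

So the maximum likelihood estimate is $\hat\lambda = \frac{10}{42.3} = 0.2364$.

8.4 (i) Since only policies with claims are included, we must use a truncated Poisson distribution:

$$P(X = x) = k\,\frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 1, 2, 3, \ldots$$ [1]

where $k$ is the constant of proportionality to ensure that the sum of the probabilities is 1.

For the ordinary Poisson distribution:

$$\sum_{x=1}^{\infty} P(X = x) = P(X \ge 1) = 1 - P(X = 0) = 1 - e^{-\lambda}$$ [1]

So:

$$\sum_{x=1}^{\infty} k\,P(X = x) = 1 \;\Rightarrow\; k(1 - e^{-\lambda}) = 1 \;\Rightarrow\; k = \frac{1}{1 - e^{-\lambda}}$$ [1]

and our probability function can be written as $P(X = x) = \dfrac{\lambda^x e^{-\lambda}}{x!\,(1 - e^{-\lambda})}$, $x = 1, 2, 3, \ldots$

(ii) We will first use the method of moments technique, so we need the mean of the truncated Poisson distribution:

$$E[X] = \sum_{x=1}^{\infty} x\,\frac{\lambda^x e^{-\lambda}}{x!\,(1 - e^{-\lambda})} = \frac{1}{1 - e^{-\lambda}}\sum_{x=0}^{\infty} x\,\frac{\lambda^x e^{-\lambda}}{x!}$$ [2]

since the $x = 0$ term is zero. The sum is the mean of the Poisson distribution (found by summing $x$ times the probability function), so we get:

$$E[X] = \frac{1}{1 - e^{-\lambda}}\sum_{x=0}^{\infty} x\,\frac{\lambda^x e^{-\lambda}}{x!} = \frac{\lambda}{1 - e^{-\lambda}}$$ [1]

So the method of moments equation is $\bar{x} = \dfrac{\lambda}{1 - e^{-\lambda}}$ or $\lambda = \bar{x}(1 - e^{-\lambda})$, as required.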
The likelihood function is:

$$L(\lambda) = \prod_{i=1}^{n} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!\,(1 - e^{-\lambda})} = \text{constant} \times \frac{\lambda^{\sum x_i} e^{-n\lambda}}{(1 - e^{-\lambda})^n}$$ [1]

where the constant incorporates the factorial factor. Taking logs:

$$\log L(\lambda) = \text{constant} + \sum x_i \log\lambda - n\lambda - n\log(1 - e^{-\lambda})$$ [1]

Differentiating with respect to $\lambda$:

$$\frac{d}{d\lambda}\log L = \frac{\sum x_i}{\lambda} - n - \frac{n e^{-\lambda}}{1 - e^{-\lambda}} = \frac{n\bar{x}(1 - e^{-\lambda}) - n\lambda(1 - e^{-\lambda}) - n\lambda e^{-\lambda}}{\lambda(1 - e^{-\lambda})} = \frac{n\bar{x}(1 - e^{-\lambda}) - n\lambda}{\lambda(1 - e^{-\lambda})}$$

Equating to zero gives $\lambda = \bar{x}(1 - e^{-\lambda})$ as required. [2]

(iii) From the data:

$$\bar{x} = \frac{174 \times 1 + 50 \times 2 + 10 \times 3 + 4 \times 4}{238} = \frac{320}{238}$$

So:

$$\frac{320}{238} = \frac{\lambda}{1 - e^{-\lambda}} \quad\text{or}\quad \frac{\lambda}{1 - e^{-\lambda}} - \frac{320}{238} = 0$$ [1]

Using trial and error on the second equation we get:

$\lambda = 0.6 \;\Rightarrow\; \text{LHS} = -0.0147$

$\lambda = 0.7 \;\Rightarrow\; \text{LHS} = 0.0460$

Using linear interpolation:

$$\hat\lambda = 0.6 + \frac{0 - (-0.0147)}{0.0460 - (-0.0147)}(0.7 - 0.6) = 0.624$$ [2]

Alternatively we could use a systematic method such as Newton-Raphson.

(iv) Now $P(X = 0) = e^{-\lambda}$. By the invariance property, the maximum likelihood estimate of this probability is:

$$e^{-\hat\lambda} = e^{-0.624} = 0.536$$

So we estimate that 54% of policies have no claims. [1]

8.5 The MSE is given by:

$$MSE(\hat\mu) = \mathrm{var}(\hat\mu) + \mathrm{bias}^2(\hat\mu) = \mathrm{var}(\bar{X}) + \mathrm{bias}^2(\bar{X})$$

where:

$$\mathrm{bias}(\bar{X}) = E(\bar{X}) - \mu$$

When $X \sim N(\mu, \sigma^2)$ we have $\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$, so $E(\bar{X}) = \mu$ and $\mathrm{var}(\bar{X}) = \frac{\sigma^2}{n}$. Hence:

$$\mathrm{bias}(\bar{X}) = \mu - \mu = 0$$

Therefore:

$$MSE(\bar{X}) = \mathrm{var}(\bar{X}) + 0^2 = \frac{\sigma^2}{n}$$

8.6 (i) Since $X_1$ and $X_2$ are unbiased estimators of $\theta$, this means that:

$$E[X_1] = E[X_2] = \theta$$ [1]

Now we have:

$$E(Y) = E(\alpha X_1 + \beta X_2) = \alpha E(X_1) + \beta E(X_2) = (\alpha + \beta)\theta$$

Hence, if $Y$ is unbiased for $\theta$, then $\alpha + \beta = 1$. [1]

(ii) $\mathrm{var}(X_1) = \sigma^2$ and $\mathrm{var}(X_2) = \phi\sigma^2$. Since $X_1$ and $X_2$ are independent:

$$\mathrm{var}(Y) = \mathrm{var}(\alpha X_1 + \beta X_2) = \alpha^2\,\mathrm{var}(X_1) + \beta^2\,\mathrm{var}(X_2) = [\alpha^2 + (1 - \alpha)^2\phi]\,\sigma^2$$ [1]

To obtain the minimum, we set the derivative of $\mathrm{var}(Y)$ equal to zero:

$$\frac{d}{d\alpha}\mathrm{var}(Y) = \sigma^2[2\alpha - 2(1 - \alpha)\phi] = 0 \;\Rightarrow\; \alpha = (1 - \alpha)\phi \;\Rightarrow\; \alpha = \frac{\phi}{1 + \phi}$$ [1]

Checking it's a minimum:

$$\frac{d^2}{d\alpha^2}\mathrm{var}(Y) = \sigma^2[2 + 2\phi] > 0 \;\Rightarrow\; \text{min}$$ [1]

So:

$$\mathrm{var}(Y) = \sigma^2\left[\left(\frac{\phi}{1 + \phi}\right)^2 + \left(1 - \frac{\phi}{1 + \phi}\right)^2\phi\right] = \sigma^2\left[\frac{\phi^2}{(1 + \phi)^2} + \frac{\phi}{(1 + \phi)^2}\right] = \frac{\phi\,\sigma^2}{1 + \phi}$$ [1]

8.7 (i) Consider the value of $X_{MAX}$. This will be less than some value $x$, say, if and only if all the sample values are less than $x$. The probability of this happening is just $[F(x)]^n$. So:

$$F_{X_{MAX}}(x) = [F(x)]^n$$ [1]

Using similar logic, $X_{MIN}$ will be greater than some number $x$ if and only if all the sample values are greater than $x$. So:

$$P(X_{MIN} > x) = P(\text{all } X_i > x) = [1 - F(x)]^n \;\Rightarrow\; F_{X_{MIN}}(x) = 1 - [1 - F(x)]^n$$ [2]

(ii) The distribution function of $X$ is given by:

$$F(x) = \int_0^x f(t)\,dt = \int_0^x \alpha(1 + t)^{-\alpha - 1}\,dt = \left[-(1 + t)^{-\alpha}\right]_0^x = 1 - (1 + x)^{-\alpha}$$ [1]

where $x \ge 0$. Hence:

$$F_{X_{MAX}}(x) = [F(x)]^n = \left[1 - (1 + x)^{-\alpha}\right]^n$$ [1]

(iii) Similarly:

$$F_{X_{MIN}}(x) = 1 - [1 - F(x)]^n = 1 - \left[(1 + x)^{-\alpha}\right]^n = 1 - (1 + x)^{-n\alpha}, \quad x \ge 0$$ [1]

This has the same form as the original distribution function, so $X_{MIN}$ has the Pareto distribution with parameters $n\alpha$ and 1. So the density function of $X_{MIN}$ is:

$$f_{X_{MIN}}(x) = \frac{n\alpha}{(1 + x)^{n\alpha + 1}}, \quad x > 0$$ [1]

(iv) The likelihood function for $\alpha$, based on a single value of $X_{MIN}$, is:

$$L(\alpha) = \frac{n\alpha}{(1 + x)^{n\alpha + 1}}$$
$$\Rightarrow\; \log L(\alpha) = \log n + \log\alpha - (n\alpha + 1)\log(1 + x)$$
$$\Rightarrow\; \frac{\partial}{\partial\alpha}\log L(\alpha) = \frac{1}{\alpha} - n\log(1 + x)$$
$$\Rightarrow\; \hat\alpha = \frac{1}{n\log(1 + x)}$$ [2]

Substituting in $n = 25$ and $x = 23$, we get $\hat\alpha = 0.01259$. [1]

(v) Applying the same approach to $X_{MAX}$, we have (using the derivative of $F_{X_{MAX}}(x)$ from earlier) a likelihood function of:

$$L(\alpha) = f_{X_{MAX}}(x) = n\left[1 - (1 + x)^{-\alpha}\right]^{n-1}\alpha(1 + x)^{-\alpha - 1}$$
$$\Rightarrow\; \log L(\alpha) = \log n + (n - 1)\log\left[1 - (1 + x)^{-\alpha}\right] + \log\alpha - (\alpha + 1)\log(1 + x)$$
$$\Rightarrow\; \frac{\partial}{\partial\alpha}\log L(\alpha) = (n - 1)\frac{(1 + x)^{-\alpha}\log(1 + x)}{1 - (1 + x)^{-\alpha}} + \frac{1}{\alpha} - \log(1 + x) = 0$$ [2]

Substituting in $n = 25$ and $x = 770$ we get:

$$24\,\frac{771^{-\alpha}\log 771}{1 - 771^{-\alpha}} + \frac{1}{\alpha} - \log 771 = 0$$ [1]

This equation cannot be solved algebraically. A numerical method will be needed to solve it.

(vi) We cannot use the usual method of moments approach unless we know all the individual sample values (or at least the mean of the sample).
So we do not have sufficient information to use the method of moments approach here. [1]

8.8 (i) Method of moments (one unknown)

We have one unknown and so require only one equation:

$$E(X) = \bar{x} = \frac{1}{n}\sum x_i$$

Here we have $\bar{x} = 2.95$.

(i)(a) Exponential

Using the formula for the mean of an exponential distribution:

$$\frac{1}{\hat\lambda} = 2.95 \;\Rightarrow\; \hat\lambda = 0.33898$$

(i)(b) Chi-square

Since $\chi^2_\nu = Gamma\!\left(\frac{\nu}{2}, \frac{1}{2}\right)$, we get $E(X) = \frac{\alpha}{\lambda} = \frac{\nu/2}{1/2} = \nu$. Hence $\hat\nu = 2.95$.

(ii) Method of moments (two unknowns)

We have two unknowns and so require two equations.

Either: $E(X) = \bar{x} = \frac{1}{n}\sum x_i$ and $E(X^2) = \frac{1}{n}\sum x_i^2$

or: $E(X) = \bar{x} = \frac{1}{n}\sum x_i$ and $\mathrm{var}(X) = s^2$

For our data we have $\bar{x} = 2.95$, $\frac{1}{8}\sum x_i^2 = 13.635$ and $s^2 = 5.6371$.

(ii)(a) Negative binomial

Using the first method gives:

$$E(X) = \frac{k(1 - p)}{p} = 2.95$$
$$E(X^2) = \mathrm{var}(X) + [E(X)]^2 = \frac{k(1 - p)}{p^2} + \left(\frac{k(1 - p)}{p}\right)^2 = 13.635$$

Substituting the first equation into the second gives:

$$\frac{2.95}{p} + 2.95^2 = 13.635 \;\Rightarrow\; \frac{2.95}{p} = 4.9325 \;\Rightarrow\; \hat{p} = 0.59807$$

Hence, substituting this back into the first equation gives $\hat{k} = 4.3896$.

Using the second method gives:

$$E(X) = \frac{k(1 - p)}{p} = 2.95 \quad\text{and}\quad \mathrm{var}(X) = \frac{k(1 - p)}{p^2} = 5.6371$$

Substituting the first equation into the second gives:

$$\frac{2.95}{p} = 5.6371 \;\Rightarrow\; \hat{p} = 0.52331$$

Hence, substituting this back into the first equation gives $\hat{k} = 3.2386$.

(ii)(b) Lognormal

Using the first method gives:

$$E(X) = e^{\mu + \frac{1}{2}\sigma^2} = 2.95 \quad\text{and}\quad E(X^2) = e^{2\mu + 2\sigma^2} = 13.635$$

Rewriting the second equation gives:

$$\left(e^{\mu + \frac{1}{2}\sigma^2}\right)^2 e^{\sigma^2} = 2.95^2\,e^{\sigma^2} = 13.635 \;\Rightarrow\; \hat\sigma^2 = 0.44903$$

Substituting this into the first equation gives $\hat\mu = 0.85729$.

Using the second method gives:

$$E(X) = e^{\mu + \frac{1}{2}\sigma^2} = 2.95 \quad\text{and}\quad \mathrm{var}(X) = e^{2\mu + \sigma^2}(e^{\sigma^2} - 1) = 5.6371$$

Substituting the first equation into the second gives:

$$2.95^2(e^{\sigma^2} - 1) = 5.6371 \;\Rightarrow\; \hat\sigma^2 = 0.49942$$

Hence, substituting this into the first equation gives $\hat\mu = 0.83210$.
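The arithmetic in Solution 8.8 can be checked in R. This is only a sketch of the calculations above; the object names (m1, m2, s2, p1, k1 and so on) are hypothetical, and the data vector assumes the eight observations quoted in Question 8.8.

```r
# Eight observations from Question 8.8
x <- c(4.8, 7.6, 3.5, 2.9, 0.8, 0.5, 2.3, 1.2)

m1 <- mean(x)      # first sample moment (x-bar)
m2 <- mean(x^2)    # second non-central sample moment
s2 <- var(x)       # sample variance (divisor n - 1)

# (i) one unknown: equate E(X) to x-bar
lambda.hat <- 1 / m1   # Exp(lambda): E(X) = 1 / lambda
nu.hat <- m1           # chi-square with nu degrees of freedom: E(X) = nu

# (ii)(a) Type 2 negative binomial: E(X) = k(1-p)/p, var(X) = k(1-p)/p^2
p1 <- m1 / (m2 - m1^2)        # first method (uses E(X^2))
k1 <- m1 * p1 / (1 - p1)
p2 <- m1 / s2                 # second method (uses s^2)
k2 <- m1 * p2 / (1 - p2)

# (ii)(b) lognormal: E(X) = exp(mu + sigma^2 / 2)
sig2.1 <- log(m2 / m1^2)      # first method
mu1 <- log(m1) - sig2.1 / 2
sig2.2 <- log(1 + s2 / m1^2)  # second method
mu2 <- log(m1) - sig2.2 / 2

print(round(c(lambda = lambda.hat, nu = nu.hat), 5))
print(round(c(p1 = p1, k1 = k1, p2 = p2, k2 = k2), 4))
print(round(c(sig2.1 = sig2.1, mu1 = mu1, sig2.2 = sig2.2, mu2 = mu2), 5))
```

Small differences in the last decimal place against the written solution can arise because the solution substitutes rounded intermediate values.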
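8.9 Defining $X$ to be an observation from a $Poisson(\lambda)$ distribution, we have:

$$P(X = 1) + P(X = 3) + P(X = 5) + \cdots = e^{-\lambda}\left[\lambda + \frac{\lambda^3}{3!} + \frac{\lambda^5}{5!} + \cdots\right]$$

To sum the series in the square bracket, note that:

$$e^{\lambda} = 1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \cdots$$
$$e^{-\lambda} = 1 - \lambda + \frac{\lambda^2}{2!} - \frac{\lambda^3}{3!} + \cdots$$

So $\frac{1}{2}(e^{\lambda} - e^{-\lambda}) = \lambda + \frac{\lambda^3}{3!} + \cdots$, which is the required series.

So the required probability is:

$$e^{-\lambda} \times \tfrac{1}{2}(e^{\lambda} - e^{-\lambda}) = \tfrac{1}{2}(1 - e^{-2\lambda})$$

8.10 (i) Range of values

Since $0 \le P(X = x) \le 1$, using this for each of the probabilities gives lower bounds for $\alpha$ of $-\frac{1}{16}$, $-\frac{1}{6}$ and $-\frac{3}{8}$. Hence, $\alpha \ge -\frac{1}{16}$.

We also obtain upper bounds for $\alpha$ of $\frac{7}{16}$, $\frac{1}{6}$ and $\frac{5}{8}$. Hence $\alpha \le \frac{1}{6}$.

(ii) Method of moments estimator

We have one unknown, so we will use $E(X) = \bar{x}$.

$$E(X) = 2\left(\tfrac{1}{8} + 2\alpha\right) + 4\left(\tfrac{1}{2} - 3\alpha\right) + 5\left(\tfrac{3}{8} + \alpha\right) = \tfrac{33}{8} - 3\alpha$$

From the data, we have:

$$\bar{x} = \frac{7 \times 2 + 6 \times 4 + 17 \times 5}{30} = \frac{123}{30} = 4.1$$

Therefore:

$$\tfrac{33}{8} - 3\hat\alpha = 4.1 \;\Rightarrow\; \hat\alpha = 0.0083$$

This value lies between the limits derived in part (i).

(iii) Maximum likelihood

The likelihood of obtaining the observed results is:

$$L(\alpha) = \text{constant} \times \left(\tfrac{1}{8} + 2\alpha\right)^7\left(\tfrac{1}{2} - 3\alpha\right)^6\left(\tfrac{3}{8} + \alpha\right)^{17}$$

Taking logs and differentiating:

$$\ln L(\alpha) = \text{constant} + 7\ln\left(\tfrac{1}{8} + 2\alpha\right) + 6\ln\left(\tfrac{1}{2} - 3\alpha\right) + 17\ln\left(\tfrac{3}{8} + \alpha\right)$$
$$\frac{d}{d\alpha}\ln L(\alpha) = \frac{14}{\tfrac{1}{8} + 2\alpha} - \frac{18}{\tfrac{1}{2} - 3\alpha} + \frac{17}{\tfrac{3}{8} + \alpha}$$

Equating this to zero to find the value of $\alpha$ that maximises the likelihood gives:

$$\frac{14}{\tfrac{1}{8} + 2\alpha} - \frac{18}{\tfrac{1}{2} - 3\alpha} + \frac{17}{\tfrac{3}{8} + \alpha} = 0$$
$$\Rightarrow\; 14\left(\tfrac{1}{2} - 3\alpha\right)\left(\tfrac{3}{8} + \alpha\right) - 18\left(\tfrac{1}{8} + 2\alpha\right)\left(\tfrac{3}{8} + \alpha\right) + 17\left(\tfrac{1}{8} + 2\alpha\right)\left(\tfrac{1}{2} - 3\alpha\right) = 0$$
$$\Rightarrow\; 14\left(\tfrac{3}{16} - \tfrac{5}{8}\alpha - 3\alpha^2\right) - 18\left(\tfrac{3}{64} + \tfrac{7}{8}\alpha + 2\alpha^2\right) + 17\left(\tfrac{1}{16} + \tfrac{5}{8}\alpha - 6\alpha^2\right) = 0$$
$$\Rightarrow\; 180\alpha^2 + \tfrac{111}{8}\alpha - \tfrac{91}{32} = 0$$

(iv) MLE

Solving the quadratic equation gives:

$$\alpha = \frac{-\tfrac{111}{8} \pm \sqrt{\left(\tfrac{111}{8}\right)^2 - 4 \times 180 \times \left(-\tfrac{91}{32}\right)}}{360} = -0.170,\; 0.0929$$

The maximum likelihood estimate is 0.0929.

The other solution of $-0.170$ does not lie between the bounds calculated in (i). It is not feasible as it is less than the smallest possible value for $\alpha$ of $-0.0625$.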
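Solution 8.10 can be cross-checked numerically in two ways: by finding the roots of the quadratic, and by maximising the log-likelihood directly over the feasible range from part (i). The R sketch below is illustrative; the object names are hypothetical.

```r
# Observed frequencies for x = 2, 4, 5 (Question 8.10)
vals <- c(2, 4, 5)
freq <- c(7, 6, 17)

# (ii) Method of moments: E(X) = 33/8 - 3 * alpha, equated to the sample mean
xbar <- sum(vals * freq) / sum(freq)
a.mom <- (33 / 8 - xbar) / 3

# (iii)/(iv) Roots of 180 a^2 + (111/8) a - 91/32 = 0
# polyroot() takes the coefficients in increasing order of power
roots <- Re(polyroot(c(-91 / 32, 111 / 8, 180)))
feasible <- roots[roots > -1 / 16 & roots < 1 / 6]

# Direct maximisation of the log-likelihood over the feasible range
loglik <- function(a) {
  freq[1] * log(1 / 8 + 2 * a) +
    freq[2] * log(1 / 2 - 3 * a) +
    freq[3] * log(3 / 8 + a)
}
fit <- optimize(loglik, interval = c(-1 / 16, 1 / 6), maximum = TRUE, tol = 1e-8)

print(round(a.mom, 4))
print(round(sort(roots), 4))
print(round(fit$maximum, 4))
```

Only one root falls inside the range of permitted values, and it should coincide with the numerical maximum of the log-likelihood.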
8.11 (i)(a) Method of moments estimate of Poisson parameter

The sample mean is:

$$\frac{1}{100{,}000}(87{,}889 \times 0 + 11{,}000 \times 1 + 1{,}000 \times 2 + \cdots) = 0.13345$$

The mean of the $Poi(\lambda)$ distribution is $\lambda$. So the method of moments estimate of $\lambda$ is 0.13345. [1]

(i)(b) Expected results using method of moments (Poisson)

For the Poisson distribution, probabilities can be calculated iteratively using the relationship:

$$P(X = x) = \frac{\lambda}{x}\,P(X = x - 1), \quad x = 1, 2, 3, \ldots$$

The expected numbers, based on this estimate, are:

$x = 0$: $100{,}000\,e^{-0.13345} = 87{,}507$

$x = 1$: $0.13345 \times 87{,}507 = 11{,}678$

$x = 2$: $\frac{0.13345}{2} \times 11{,}678 = 779$

$x = 3$: $\frac{0.13345}{3} \times 779 = 35$

$x = 4$: $\frac{0.13345}{4} \times 35 = 1$

$x = 5$: $\frac{0.13345}{5} \times 1 = 0$

$x \ge 6$: $100{,}000 - 87{,}507 - 11{,}678 - 779 - 35 - 1 - 0 = 0$ [2]

(ii) MLE (Poisson)

The likelihood of obtaining $n_0$ 0s, $n_1$ 1s etc (making a total of $n$), assuming the numbers conform to a Poisson distribution, is the multinomial probability:

$$L(\lambda) = \frac{n!}{n_0!\,n_1!\,n_2!\cdots}\,(e^{-\lambda})^{n_0}(\lambda e^{-\lambda})^{n_1}\left(\frac{\lambda^2 e^{-\lambda}}{2!}\right)^{n_2}\cdots$$
$$= \text{constant} \times \lambda^{n_1 + 2n_2 + 3n_3 + \cdots}\,e^{-\lambda(n_0 + n_1 + n_2 + \cdots)}$$
$$= \text{constant} \times \lambda^{13{,}345}\,e^{-100{,}000\lambda}$$ [1]

So the log likelihood is:

$$\ln L(\lambda) = 13{,}345\ln\lambda - 100{,}000\lambda + \text{constant}$$ [1]

Differentiating with respect to $\lambda$ to maximise this:

$$\frac{d}{d\lambda}\ln L(\lambda) = \frac{13{,}345}{\lambda} - 100{,}000$$ [1]

This is zero when:

$$\lambda = 13{,}345 / 100{,}000 = 0.13345$$ [1]

Since the second derivative is negative, this is the maximum likelihood estimate of $\lambda$. It is the same as the method of moments estimate.

(iii)(a) Method of moments estimate of negative binomial parameters

The second (non-central) sample moment for the data is:

$$\frac{1}{100{,}000}(87{,}889 \times 0^2 + 11{,}000 \times 1^2 + 1{,}000 \times 2^2 + \cdots) = 0.16085$$

The mean and second non-central moment of the negative binomial distribution with parameters $k$ and $p$ are $\dfrac{kq}{p}$ and $\dfrac{kq}{p^2} + \left(\dfrac{kq}{p}\right)^2$.

So the method of moments estimators of $k$ and $p$ satisfy the equations:

$$\frac{kq}{p} = 0.13345 \quad\text{and}\quad \frac{kq}{p^2} + \left(\frac{kq}{p}\right)^2 = 0.16085$$ [2]

From the second equation:

$$\frac{kq}{p^2} = 0.16085 - \left(\frac{kq}{p}\right)^2 = 0.16085 - (0.13345)^2 = 0.14304$$ [½]

Using the first equation gives:

$$\hat{p} = \frac{0.13345}{0.14304} = 0.93295$$ [½]

$$\hat{q} = 1 - \hat{p} = 1 - 0.93295 = 0.06705$$ [½]

and

$$\hat{k} = \frac{0.13345 \times 0.93295}{0.06705} = 1.8569$$ [½]

(iii)(b) Expected results using method of moments (negative binomial)

The expected numbers, based on these estimates, are:

$x = 0$: $100{,}000\,(0.93295)^{1.8569} = 87{,}909$

$x = 1$: $\frac{1.8569}{1} \times 0.06705 \times 87{,}909 = 10{,}945$

$x = 2$: $\frac{2.8569}{2} \times 0.06705 \times 10{,}945 = 1{,}048$

$x = 3$: $\frac{3.8569}{3} \times 0.06705 \times 1{,}048 = 90$

$x = 4$: $\frac{4.8569}{4} \times 0.06705 \times 90 = 7$

$x = 5$: $\frac{5.8569}{5} \times 0.06705 \times 7 = 1$

$x \ge 6$: $100{,}000 - 87{,}909 - 10{,}945 - 1{,}048 - 90 - 7 - 1 = 0$ [2]

We have made use of the negative binomial recursive relationship given in the question.

(iv) Why negative binomial is a better fit

For a Poisson distribution, the mean and variance are the same. Since the sample mean and variance (which, for a sample as large as this, should be very close to the true values) are 0.13345 and 0.14304, which differ significantly, this suggests that the Poisson distribution may not be a suitable model here. [1]

The negative binomial distribution has more flexibility and can accommodate different values for the mean and variance (provided the variance exceeds the mean). [1]

8.12 (i) Distribution

Using the result given on page 22 of the Tables:

$$\frac{\sum (X_i - \bar{X})^2}{\sigma^2} = \frac{(n - 1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$ [1]

(ii)(a) Bias

The bias of $\hat\sigma^2$ is given by $\mathrm{bias}(\hat\sigma^2) = E(\hat\sigma^2) - \sigma^2$. From part (i) we have:

$$E\left[\frac{\sum (X_i - \bar{X})^2}{\sigma^2}\right] = n - 1$$ [½]

Since $\sum (X_i - \bar{X})^2 = (n + b)\hat\sigma^2$, we have:

$$E\left[\frac{(n + b)\hat\sigma^2}{\sigma^2}\right] = n - 1 \;\Rightarrow\; \frac{n + b}{\sigma^2}\,E[\hat\sigma^2] = n - 1 \;\Rightarrow\; E[\hat\sigma^2] = \frac{(n - 1)}{(n + b)}\,\sigma^2$$ [1]

Therefore the bias is given by:

$$\mathrm{bias}(\hat\sigma^2) = \frac{(n - 1)}{(n + b)}\,\sigma^2 - \sigma^2 = -\frac{(1 + b)}{(n + b)}\,\sigma^2$$ [½]

(ii)(b) Unbiased

Substituting $b = -1$ into the bias gives:

$$\mathrm{bias}(\hat\sigma^2) = -\frac{(1 - 1)}{(n - 1)}\,\sigma^2 = 0$$

Hence, $\hat\sigma^2$ is an unbiased estimator of $\sigma^2$ when $b = -1$. [1]

(iii)(a) Mean square error

The mean square error of $\hat\sigma^2$ is given by $MSE(\hat\sigma^2) = \mathrm{var}(\hat\sigma^2) + \mathrm{bias}^2(\hat\sigma^2)$. From part (i) we have:

$$\mathrm{var}\left[\frac{\sum (X_i - \bar{X})^2}{\sigma^2}\right] = 2(n - 1)$$ [1]

Since $\sum (X_i - \bar{X})^2 = (n + b)\hat\sigma^2$, we have:

$$\mathrm{var}\left[\frac{(n + b)\hat\sigma^2}{\sigma^2}\right] = 2(n - 1) \;\Rightarrow\; \frac{(n + b)^2}{\sigma^4}\,\mathrm{var}[\hat\sigma^2] = 2(n - 1) \;\Rightarrow\; \mathrm{var}[\hat\sigma^2] = \frac{2(n - 1)}{(n + b)^2}\,\sigma^4$$ [1]

Using this and the bias from (ii)(a), the mean square error is given by:

$$MSE(\hat\sigma^2) = \frac{2(n - 1)}{(n + b)^2}\,\sigma^4 + \frac{(1 + b)^2}{(n + b)^2}\,\sigma^4 = \frac{2(n - 1) + (1 + b)^2}{(n + b)^2}\,\sigma^4$$ [1]

(iii)(b) Consistent

As $n \to \infty$, the mean square error becomes:

$$MSE(\hat\sigma^2) = \frac{2(n - 1) + (1 + b)^2}{(n + b)^2}\,\sigma^4 \to 0$$

So $\hat\sigma^2$ is consistent. [1]

(iii)(c) Minimum mean square error

Differentiating with respect to $b$ using the quotient rule gives:

$$\frac{d}{db}MSE(\hat\sigma^2) = \frac{2(1 + b)(n + b)^2 - [2(n - 1) + (1 + b)^2]\,2(n + b)}{(n + b)^4}\,\sigma^4$$ [2]

Substituting $b = 1$ into this expression gives:

$$\left.\frac{d}{db}MSE(\hat\sigma^2)\right|_{b=1} = \frac{4(n + 1)^2 - [2(n - 1) + 4]\,2(n + 1)}{(n + 1)^4}\,\sigma^4 = \frac{4(n + 1)^2 - 4(n + 1)^2}{(n + 1)^4}\,\sigma^4 = 0$$

So the MSE is minimised when $b = 1$. [1]

Alternatively, we could attempt to find the value of $b$ that makes this zero as follows:

$$2(1 + b)(n + b)^2 = [2(n - 1) + (1 + b)^2]\,2(n + b)$$
$$\Rightarrow\; (1 + b)(n + b) = 2(n - 1) + (1 + b)^2$$
$$\Rightarrow\; n + b + nb + b^2 = 2n - 1 + 2b + b^2$$
$$\Rightarrow\; b(n - 1) = n - 1$$
$$\Rightarrow\; b = 1$$

(iv) Best estimator

All values of $b$ give consistent estimators. When $b = -1$, the estimator $\hat\sigma^2 = \frac{1}{n - 1}\sum (X_i - \bar{X})^2$ is unbiased, whereas when $b = 1$, the estimator $\hat\sigma^2 = \frac{1}{n + 1}\sum (X_i - \bar{X})^2$ has the smallest MSE, but it is biased. Since a smaller MSE is more important than being unbiased, we should choose $b = 1$. [1]

However, there will be little difference between the estimators when $n$ is large as the mean square errors and biases both tend to zero. [1]
CS1-09: Confidence intervals

Confidence intervals and prediction intervals

Syllabus objectives

3.2 Confidence intervals

3.2.1 Define in general terms a confidence interval for an unknown parameter of a distribution based on a random sample.

3.2.2 Define in general terms a prediction interval for a future observation based on a random sample.

3.2.3 Derive a confidence interval for an unknown parameter using a given sampling distribution.

3.2.4 Calculate confidence intervals for the mean and the variance of a normal distribution.

3.2.5 Calculate confidence intervals for a binomial probability and a Poisson mean, including the use of the normal approximation in both cases.

3.2.6 Calculate confidence intervals for two-sample situations involving the normal distribution, and the binomial and Poisson distributions using the normal approximation.

3.2.7 Calculate confidence intervals for a difference between two means from paired data.

3.2.8 Use the bootstrap method to obtain confidence intervals.

0 Introduction

In the previous chapter we used the method of moments and the method of maximum likelihood to obtain estimates for the population parameter(s).
For example, we might have the following numbers of claims from a certain portfolio that we receive in 100 different monthly periods:

Claims       0    1    2    3    4    5    6
Frequency    9    22   26   21   13   6    3

Assuming a Poisson distribution with parameter $\mu$ for the number of claims in a month, our estimate of $\mu$ using the methods given in the previous chapter would be $\hat\mu = \bar{x} = 2.37$.

The problem is that this might not be the correct value of $\mu$. In this chapter we look at constructing confidence intervals that have a high probability of containing the correct value. For example, a 95% confidence interval for $\mu$ means that there is a 95% probability that it contains the true value of $\mu$.

Confidence intervals will be constructed using the sampling distributions given in Chapter 7. For example, when sampling from a $N(\mu, \sigma^2)$ distribution where $\sigma^2$ is known:

$$\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right) \;\Rightarrow\; Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$$

[Figure: standard normal density with 95% of the area between $z_1$ and $z_2$ and 2½% in each tail]

If we require a 95% confidence interval, then we can read off the upper 2.5% point of the standard normal distribution from page 162 of the Tables to get $+1.96$. We can then use the symmetry of the standard normal distribution to deduce that the lower 2.5% point is $-1.96$.

It is important to realise that the formula for the endpoints of this interval contains $\bar{X}$, and so the endpoints are random variables. We can obtain numerical values for these endpoints by collecting some sample data and replacing $\bar{X}$ by the observed sample mean $\bar{x}$. Naturally, different samples may lead to different endpoints. If we sample repeatedly, 95% of the intervals we obtain should contain the true value of $\mu$.

1 Confidence intervals in general

A confidence interval provides an 'interval estimate' of an unknown parameter (as opposed to a 'point estimate'). It is designed to contain the parameter's value with some stated probability. The width of the interval provides a measure of the precision of the estimator involved.

A $100(1 - \alpha)\%$ confidence interval for $\theta$ is defined by specifying random variables $\hat\theta_1(\underline{X})$, $\hat\theta_2(\underline{X})$ depending on the sample $\underline{X} = (X_1, X_2, \ldots, X_n)$ such that $P(\hat\theta_1(\underline{X}) < \theta < \hat\theta_2(\underline{X})) = 1 - \alpha$.

Rightly or wrongly, $\alpha = 0.05$, leading to a 95% confidence interval, is by far the most common case used in practice and we will tend to use this in most of our illustrations.

Thus $P(\hat\theta_1(\underline{X}) < \theta < \hat\theta_2(\underline{X})) = 0.95$ specifies $(\hat\theta_1(\underline{X}), \hat\theta_2(\underline{X}))$ as a 95% confidence interval for $\theta$. This emphasises the fact that it is the interval and not $\theta$ that is random. In the long run, 95% of the realisations of such intervals will include $\theta$ and 5% of the realisations will not include $\theta$.

Suppose we take a random sample from a particular population at a fixed moment in time and, based on this sample, we calculate a 95% confidence interval for the mean of the population to be (25, 30). Suppose we then take another random sample from this population at the same moment in time, and this second sample gives a 95% confidence interval of (23, 29). It is important to appreciate that the limits of any confidence interval for $\mu$ depend on the sample values collected. If we repeat the sampling process many times and calculate a 95% CI from each sample, then 95% of these confidence intervals will contain the true value of $\mu$.

It is important to understand that, since the mean of the population is constant (not a random variable), it doesn't make sense to make statements of the form:

$$P(25 < \mu < 30) = 0.95$$

Whenever we write down a probability statement, we must make sure that it contains at least one random variable. If there is no random variable, the statement is nonsense.

To illustrate this, consider the score obtained when a fair die is thrown. Let $X$ = score obtained on the next throw. Since $X$ represents a future outcome, its value isn't yet known. There is more than one possible value that $X$ could take and its value is down to chance. So $X$ is a random variable and it makes sense to consider probabilities involving $X$, eg:

$$P(X < 4) = \tfrac{1}{2}$$

Now suppose that the score on the last throw of the die was 5. This is a past value, which has already been determined and recorded. Let's denote this by $y$. Consider the statement:

$$P(y < 4)$$

This doesn't make any sense as $y$ is a fixed amount $(y = 5)$, in the same way as the mean of a population at any given moment of time is a fixed amount.

Confidence intervals are not unique. In general they should be obtained via the sampling distribution of a good estimator, in particular the maximum likelihood estimator. Even then there is a choice between one-sided and two-sided intervals and between equal-tailed and shortest-length intervals, although these are often the same, eg for sampling distributions that are symmetrical about the unknown value of the parameter.

We will see some examples of these shortly.

Often, we are more interested in statements about future observations than about the parameters underlying the distribution of these observations. This arises in the context of regression models, for example, when a fitted model is being used to make predictions about future observations.

Even if the parameter $\theta$ equals the unknown mean of the distribution, it will not be the case that a future observation will fall within a 95% confidence interval with probability 95%. For this, a prediction interval is required.

A confidence interval gives us some information about the value of a fixed parameter, $\theta$, from a particular distribution, but a prediction interval gives us information about the next future value from that distribution, $X$.

A $100(1 - \alpha)\%$ prediction interval for $X_{n+1}$ is defined by random variables $l(\underline{X})$, $h(\underline{X})$ such that $P(l(\underline{X}) < X_{n+1} < h(\underline{X})) = 1 - \alpha$. Prediction intervals are, like confidence intervals, not unique but typical choices are one-sided or symmetric. Prediction intervals can be defined more generally for functions of one or more future observations.
For example, in Chapter 12, we will predict the output of the function $\alpha + \beta x$.

2 Derivation of confidence and prediction intervals

2.1 The pivotal method

There is a general method of constructing confidence intervals called the pivotal method. This method requires a pivotal quantity of the form $g(X, \theta)$ to be found with the following properties:

(1) it is a function of the sample values and the unknown parameter $\theta$
(2) its distribution is completely known
(3) it is monotonic in $\theta$.

The distribution in condition (2) must not depend on $\theta$. Monotonic means that the function either consistently increases or decreases with $\theta$.

The equation:

$$\int_{g_1}^{g_2} f(t)\,dt = 0.95$$

(where $f(t)$ is the known probability (density) function of $g(X, \theta)$) defines two values, $g_1$ and $g_2$, such that:

$$P\left(g_1 < g(X, \theta) < g_2\right) = 0.95$$

$g_1$ and $g_2$ are usually constants.

We are assuming here that $X$ has a continuous distribution. We will look shortly at examples based on discrete distributions.

If $g(X, \theta)$ is monotonic increasing in $\theta$, then:

$$g(X, \theta) < g_2 \iff \theta < \theta_2 \text{ for some number } \theta_2$$
$$g_1 < g(X, \theta) \iff \theta_1 < \theta \text{ for some number } \theta_1$$

and if $g(X, \theta)$ is monotonic decreasing in $\theta$, then:

$$g(X, \theta) < g_2 \iff \theta_1 < \theta$$
$$g_1 < g(X, \theta) \iff \theta < \theta_2$$

resulting in $(\theta_1, \theta_2)$ being a 95% confidence interval for $\theta$.

Fortunately, in most practical situations such quantities $g(X, \theta)$ do exist, although an approximation to the method is needed for the binomial and Poisson cases.

In sampling from a $N(\mu, \sigma^2)$ distribution with known value of $\sigma^2$, the expression $\bar{X} - \mu$ could be used as a pivotal quantity, but for easier comparison with the case of unknown variance below we use:

$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

whose distribution is $N(0,1)$.
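The long-run interpretation of a 95% confidence interval, and the role of the $N(0,1)$ pivotal quantity, can be illustrated by simulation. The following is a minimal sketch in R: the true mean, standard deviation, sample size and number of repetitions are all illustrative assumptions, and in practice $\mu$ would of course be unknown.

```r
# Simulate the long-run coverage of xbar +/- z * sigma / sqrt(n)
# for a normal mean with known sigma.
# All parameter values below are illustrative assumptions.
set.seed(1)
mu    <- 50     # true mean (unknown in practice)
sigma <- 10     # known standard deviation
n     <- 20     # sample size
reps  <- 10000  # number of simulated samples

z <- qnorm(0.975)  # upper 2.5% point of N(0,1), approximately 1.96

covered <- replicate(reps, {
  x          <- rnorm(n, mean = mu, sd = sigma)
  xbar       <- mean(x)
  half_width <- z * sigma / sqrt(n)
  (xbar - half_width < mu) && (mu < xbar + half_width)
})

coverage <- mean(covered)  # proportion of intervals that contain mu
coverage
```

The proportion should be close to 0.95. It is the interval that varies from sample to sample, while `mu` stays fixed throughout.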
For example, given a random sample of size 20 from the normal population $N(\mu, 10^2)$ which yields a sample mean of 62.75, an equal-tailed 95% confidence interval for $\mu$ is:

$$\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}} = 62.75 \pm 1.96 \times \frac{10}{\sqrt{20}} = 62.75 \pm 4.38$$

This is a symmetrical confidence interval since it is of the form $\bar{x} \pm c$. For symmetrical confidence intervals, we can write down the interval using the $\pm$ notation, where the two values indicate the upper and lower limits. Alternatively, we can write this confidence interval in the form (58.37, 67.13).

Here we are using the pivotal quantity $\dfrac{\bar{X} - \mu}{10/\sqrt{20}}$, which follows the $N(0,1)$ distribution, irrespective of the value of $\mu$.

The normal mean illustration shows that confidence intervals are not unique. Another 95% interval, with unequal tails, is:

$$\left(\bar{X} - 1.8808\frac{\sigma}{\sqrt{n}},\ \bar{X} + 2.0537\frac{\sigma}{\sqrt{n}}\right)$$

However, there would not be much reason to use this one in practice.

Question

Show that both this and the first interval given above are 95% confidence intervals. Calculate the width of each of these intervals.

Solution

For the second confidence interval:

$$P(-1.8808 < Z < 2.0537) = P(Z < 2.0537) - P(Z < -1.8808)$$
$$= P(Z < 2.0537) - \left(1 - P(Z < 1.8808)\right)$$
$$= 0.98000 - (1 - 0.97000) = 0.95$$

This interval has width $3.9345\,\sigma/\sqrt{n}$.

For the first confidence interval:

$$P(-1.96 < Z < 1.96) = P(Z < 1.96) - P(Z < -1.96) = 2P(Z < 1.96) - 1 = 2 \times 0.975 - 1 = 0.95$$

This interval has length $3.92\,\sigma/\sqrt{n}$.

Other intervals which are of some use in practice are the one-sided 95% intervals:

$$\left(-\infty,\ \bar{X} + 1.6449\frac{\sigma}{\sqrt{n}}\right) \quad \text{and} \quad \left(\bar{X} - 1.6449\frac{\sigma}{\sqrt{n}},\ \infty\right)$$

Since the normal distribution is symmetrical about the value of the unknown parameter, it is quite easy to see that the equal-tailed interval is the shortest-length interval for that level of confidence.

Question

The average IQ of a random sample of 50 university students is found to be 132.
Calculate a symmetrical 95% confidence interval for the average IQ of university students, assuming that IQs are normally distributed. It is known from previous studies that the standard deviation of IQs among students is approximately 20.

Solution

Since the distribution is normal, we know that $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$, when $\sigma$ is known.

From the Tables we know that $0.95 = P(-1.96 < Z < 1.96)$, so:

$$0.95 = P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right)$$

Rearranging to obtain limits for $\mu$:

$$0.95 = P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right)$$

Using $n = 50$, $\sigma = 20$ and $\bar{X} = 132$ from the question, we obtain the interval $132 \pm 5.5$, or (126.5, 137.5).

So a symmetrical 95% confidence interval for the average IQ is (126.5, 137.5).

With prediction intervals, we are predicting a single future value from the distribution. Since we already have a sample of values $(X_1, \ldots, X_n)$ from this distribution, we'll call this new predicted value $X_{n+1}$.

A similar approach can be used for prediction intervals. In the example above, of sampling from a normal distribution with known variance, $\bar{X} - X_{n+1}$ has a distribution that does not depend on $\mu$, and in fact:

$$\frac{\bar{X} - X_{n+1}}{\sigma\sqrt{1 + \frac{1}{n}}} \sim N(0,1)$$

The predicted value comes from a normal distribution, $X_{n+1} \sim N(\mu, \sigma^2)$. The Central Limit Theorem tells us that for samples from a normal distribution, $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$. Hence, using the result from Chapter 4 for a linear combination of normal distributions:

$$\bar{X} - X_{n+1} \sim N\left(\mu - \mu,\ \frac{\sigma^2}{n} + \sigma^2\right) = N\left(0,\ \sigma^2\left(1 + \frac{1}{n}\right)\right)$$

Standardising this gives the result above.

The previous derivations therefore give prediction intervals for $X_{n+1}$ if we replace $\dfrac{\sigma}{\sqrt{n}}$ with $\sigma\sqrt{1 + \dfrac{1}{n}}$: a 95% prediction interval for the random sample of size 20 above is:

$$\bar{X} \pm 1.96\,\sigma\sqrt{1 + \frac{1}{n}} = 62.75 \pm 1.96 \times 10 \times \sqrt{1 + \frac{1}{20}} = 62.75 \pm 20.08$$

A less formal way to consider this is as follows. The predicted value comes from a $N(\mu, \sigma^2)$ distribution. Since $P(-1.96 < Z < 1.96) = 0.95$, we know that 95% of the values from that distribution lie within $\mu \pm 1.96\sigma$. However,
we do not know the true value of $\mu$, but a 95% confidence interval for it is given by $\bar{X} \pm 1.96\dfrac{\sigma}{\sqrt{n}}$. Putting these two sources of uncertainty together (their variances, $\sigma^2/n$ and $\sigma^2$, add), a 95% prediction interval for a predicted value $X_{n+1}$ is:

$$\bar{X} \pm 1.96\sqrt{\frac{\sigma^2}{n} + \sigma^2} = \bar{X} \pm 1.96\,\sigma\sqrt{\frac{1}{n} + 1}$$

Question

The average IQ of a random sample of 50 university students is found to be 132. Calculate a symmetrical 99% prediction interval for the IQ of another university student chosen at random, assuming that IQs are normally distributed. It is known from previous studies that the standard deviation of IQs among students is approximately 20.

Solution

Since the distribution is normal, we use $\dfrac{\bar{X} - X_{n+1}}{\sigma\sqrt{1 + \frac{1}{n}}} \sim N(0,1)$, when $\sigma$ is known.

From the Tables we know that $0.99 = P(-2.5758 < Z < 2.5758)$, so:

$$0.99 = P\left(-2.5758 < \frac{\bar{X} - X_{n+1}}{\sigma\sqrt{1 + \frac{1}{n}}} < 2.5758\right)$$

Rearranging to obtain limits for $X_{n+1}$:

$$0.99 = P\left(\bar{X} - 2.5758\,\sigma\sqrt{1 + \tfrac{1}{n}} < X_{n+1} < \bar{X} + 2.5758\,\sigma\sqrt{1 + \tfrac{1}{n}}\right)$$

Using $n = 50$, $\sigma = 20$ and $\bar{X} = 132$ from the question, we obtain the interval $132 \pm 52.03$, or (80.0, 184.0).

So a symmetrical 99% prediction interval for the IQ of a further student is (80.0, 184.0).

2.2 Confidence limits

The 95% confidence interval $\left(\bar{X} - 1.96\dfrac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\dfrac{\sigma}{\sqrt{n}}\right)$ for $\mu$ is often expressed as:

$$\bar{X} \pm 1.96\frac{\sigma}{\sqrt{n}}$$

This is quite informative as it gives the point estimator $\bar{X}$ together with the indication of its accuracy. However, this cannot always be done so simply using a confidence interval. Also one-sided confidence intervals correspond to specifying an upper or lower confidence limit only.

If an exam question asks for a confidence interval, it means a two-sided symmetrical confidence interval. If the examiners require any other type of confidence interval, they will explicitly ask for it.

2.3 Sample size

A very common question asked of a statistician is: "How large a sample is needed?"
This question cannot be answered without further information, namely:

(1) the accuracy of estimation required
(2) an indication of the size of the population standard deviation $\sigma$.

The latter information may not readily be available, in which case a small pilot sample may be needed or a rough guess based on previous studies in similar populations.

As a consequence of the Central Limit Theorem, a confidence interval that is derived from a large sample will tend to be narrower than the corresponding interval derived from a small sample, since the variation in the observed values will tend to average out as the sample size is increased.

Market research companies often need to be confident that their results are accurate to within a given margin (eg $\pm 3\%$). In order to do this, they will need to estimate how big a sample is required in order to obtain a narrow enough confidence interval.

Example

A company wishes to estimate the mean claim amount for claims under a certain class of policy during the past year. Extensive past records from previous years suggest that the standard deviation of claim amounts is likely to be about 45. If the company wishes to estimate the mean claim amount such that a 95% confidence interval is of width $\pm 5$, determine the sample size needed to achieve this accuracy of estimation.

Solution

The resulting confidence interval will be $\bar{x} \pm 1.96\dfrac{\sigma}{\sqrt{n}}$. The standard deviation $\sigma$ can be taken to be 45 and so we require $n$ such that:

$$1.96 \times \frac{45}{\sqrt{n}} = 5 \ \Rightarrow\ \sqrt{n} = \frac{1.96 \times 45}{5} = 17.64 \ \Rightarrow\ n = 311.2$$

So a sample of size 312, or perhaps 320 to err on the safe side (since the variance is only a rough guess), would be required.

Question

Calculate how big a sample would be needed to have a 99% confidence interval of width $\pm 1$.

Solution

The answer can be calculated from the equation:

$$2.5758 \times \frac{45}{\sqrt{n}} = 1 \ \Rightarrow\ n = 13{,}436$$

The figure of 2.5758 can be found on page 162 of the Tables.
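Both sample-size calculations above follow the same pattern, which can be sketched in R. The helper name `sample_size` and its argument names are illustrative assumptions, not part of the course material. `qnorm` supplies the normal percentage point that would otherwise be read from the Tables.

```r
# Smallest n such that z * sigma / sqrt(n) <= margin, for a normal mean
# with known sigma. sample_size() is a hypothetical helper, not a base R function.
sample_size <- function(sigma, margin, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)  # two-sided percentage point
  ceiling((z * sigma / margin)^2)       # round up to a whole observation
}

n_95 <- sample_size(sigma = 45, margin = 5)                     # 95%, width +/- 5
n_99 <- sample_size(sigma = 45, margin = 1, conf_level = 0.99)  # 99%, width +/- 1
c(n_95, n_99)
```

Rounding up with `ceiling` ensures the resulting interval is no wider than requested. Since $\sigma$ is usually only a rough guess, a further margin of safety may still be sensible.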
In this case we need a substantially bigger sample size.

3 Confidence and prediction intervals for the normal distribution

3.1 The mean

The previous section dealt with confidence intervals for a normal mean in the case where the standard deviation $\sigma$ was known. In practice this is unlikely to be the case and so a different pivotal quantity is needed for the realistic case when $\sigma$ is unknown.

Fortunately, there is a similar pivotal quantity readily available and that is the $t$ result:

$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$$

where $S$ is the sample standard deviation.

The resulting confidence interval, in the form of symmetrical 95% confidence limits, is:

$$\bar{X} \pm t_{0.025,\,n-1}\frac{S}{\sqrt{n}}$$

$t_{0.025,\,n-1}$ is used to denote the upper 2.5% point of the $t$ distribution with $n-1$ degrees of freedom, and is defined by:

$$P\left(t_{n-1} > t_{0.025,\,n-1}\right) = 0.025$$

For example, from page 163 of the Tables, $t_{0.025,10}$ is equal to 2.228.

This is a small sample confidence interval for $\mu$. For large samples $t_{n-1}$ becomes like $N(0,1)$ and the Central Limit Theorem justifies the resulting interval without the requirement that the population is normal.

The normality of the population is an important assumption for the validity of the $t$ interval especially when the sample size is very small, for example, in single figures. However the $t$ interval is quite robust against departures from normality especially as the sample size increases. Normality can be checked by inspecting a diagram, such as a dotplot, of the data. This can also be used to identify substantial skewness or outliers which may invalidate the analysis.

Question

Calculate a 95% confidence interval for the average height of 10-year-old children, assuming that heights have a $N(\mu, \sigma^2)$ distribution (where $\mu$ and $\sigma$ are unknown), based on a random sample of 5 children whose heights are: 124cm, 122cm, 130cm, 125cm and 132cm.
Solution

Since the sample comes from a normal distribution, we know that $\dfrac{\bar{X} - \mu}{S/\sqrt{n}}$ has a $t_{n-1}$ distribution, where $S^2$ is the sample variance.

From the Tables, we find that $t_{0.025,4} = 2.776$, ie $0.95 = P(-2.776 < t_4 < 2.776)$. So:

$$0.95 = P\left(-2.776 < \frac{\bar{X} - \mu}{S/\sqrt{n}} < 2.776\right)$$

Rearranging the inequality to isolate $\mu$ gives:

$$0.95 = P\left(\bar{X} - 2.776\frac{S}{\sqrt{n}} < \mu < \bar{X} + 2.776\frac{S}{\sqrt{n}}\right)$$

Using the calculated values for the sample ($n = 5$, $\bar{x} = 126.6$, and $s^2 = 17.8$) gives:

(121.4, 131.8)

When calculating a numerical confidence interval, we must drop the probability notation. This is required since $\mu$ is not a random variable and hence expressions such as $P(121.4 < \mu < 131.8) = 0.95$ do not make sense.

The R function for a symmetrical 95% confidence interval for the mean with unknown variance is:

    t.test(<sample data>, conf.level = 0.95)

For small samples from a non-normal distribution, confidence intervals can be constructed empirically in R using the bootstrap method used in Chapter 8 Section 7. For example, a non-parametric 95% confidence interval for the mean could be obtained by:

    quantile(replicate(1000, mean(sample(<sample data>, replace = TRUE))), probs = c(0.025, 0.975))

3.2 The variance

For the estimation of a normal variance $\sigma^2$, there is again a pivotal quantity readily available:

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$

The resulting 95% confidence interval for the variance $\sigma^2$ is:

$$\left(\frac{(n-1)S^2}{\chi^2_{0.025,\,n-1}},\ \frac{(n-1)S^2}{\chi^2_{0.975,\,n-1}}\right)$$

or for the standard deviation $\sigma$:

$$\left(\sqrt{\frac{(n-1)S^2}{\chi^2_{0.025,\,n-1}}},\ \sqrt{\frac{(n-1)S^2}{\chi^2_{0.975,\,n-1}}}\right)$$

Here $\chi^2_{0.025,\,n-1}$ and $\chi^2_{0.975,\,n-1}$ denote the points exceeded with probability 0.025 and 0.975 respectively, matching the notation used for $t_{0.025,\,n-1}$.

Note: Due to the skewness of the $\chi^2$ distribution, these confidence intervals are not symmetrical about the point estimator $S^2$, and are also not the shortest-length intervals.

So we can't write these using the $\pm$ notation.
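Since the variance interval has to be computed from the formula, a short R sketch may help. The function name `var_ci` is a hypothetical helper introduced here for illustration. It uses `qchisq`, which returns lower-tail quantiles, so the upper 2.5% point is `qchisq(0.975, df)`. The data are the five heights from the question in Section 3.1.

```r
# Equal-tailed confidence interval for a normal variance, from the pivotal
# quantity (n - 1) S^2 / sigma^2 ~ chi-squared with n - 1 degrees of freedom.
# var_ci() is a hypothetical helper name, not a base R function.
var_ci <- function(x, conf_level = 0.95) {
  n     <- length(x)
  alpha <- 1 - conf_level
  ss    <- (n - 1) * var(x)                      # (n - 1) S^2
  c(lower = ss / qchisq(1 - alpha / 2, n - 1),   # divide by the upper point
    upper = ss / qchisq(alpha / 2, n - 1))       # divide by the lower point
}

heights <- c(124, 122, 130, 125, 132)  # heights (cm) of the five children

ci_var <- var_ci(heights)  # interval for the variance
ci_sd  <- sqrt(ci_var)     # interval for the standard deviation
ci_var
ci_sd
```

Taking square roots of the variance limits gives the interval for $\sigma$ because the square root is an increasing function.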
The above intervals require the normality assumption for the population but are considered fairly robust against departures from normality for reasonable sample sizes.

There is no built-in function for calculating confidence intervals for the variance in R. We can use R to calculate the results of the formula from scratch or use a bootstrap method if the assumptions are not met.

Question

Calculate:

(i) an equal-tailed 95% confidence interval and
(ii) a 95% confidence interval of the form $(0, L)$

for the standard deviation of the heights of the children in the population based on the information given in the last question.

Solution

Since the sample comes from a normal distribution, we know that $\dfrac{4S^2}{\sigma^2} \sim \chi^2_4$.

(i) From the Tables, we find that:

$$0.95 = P(0.4844 < \chi^2_4 < 11.14)$$

So:

$$0.95 = P\left(0.4844 < \frac{4S^2}{\sigma^2} < 11.14\right)$$

Replacing $S^2$ by 17.8, the sample variance calculated in the solution to the previous question, and dropping the probability notation (since $\sigma^2$ is not a random variable), we have:

$$0.4844 < \frac{4 \times 17.8}{\sigma^2} < 11.14$$
$$\Rightarrow \frac{71.2}{11.14} < \sigma^2 < \frac{71.2}{0.4844}$$
$$\Rightarrow 6.39 < \sigma^2 < 147.0$$
$$\Rightarrow 2.53 < \sigma < 12.1$$

So, an equal-tailed 95% confidence interval for the standard deviation is (2.53, 12.1).

(ii) From the Tables we find that:

$$0.95 = P(0.7107 < \chi^2_4)$$

So:

$$0.95 = P\left(0.7107 < \frac{4S^2}{\sigma^2}\right)$$

Replacing $S^2$ by 17.8, the sample variance calculated in the solution to the previous question, and dropping the probability notation (since $\sigma^2$ is not a random variable), we have:

$$0.7107 < \frac{4 \times 17.8}{\sigma^2} \ \Rightarrow\ \sigma^2 < 100.2 \ \Rightarrow\ \sigma < 10.0$$

So, a one-sided 95% confidence interval for the standard deviation is (0, 10.0).

To find a confidence interval with an upper limit, ie of the form $(0, L)$, we need to start with the lower 5% point of the $\chi^2_4$ distribution, ie the point which is exceeded by 95% of the distribution.

If we want to find a confidence interval with a lower limit, ie of the form $(L, \infty)$, we would need to start by finding the upper 5% point of the $\chi^2_4$ distribution, ie the point which is exceeded by only 5% of the distribution.

3.3 Prediction interval for normal distribution

We've already seen that:

$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$$

Replacing $\mu$ with $X_{n+1}$ and adjusting the denominator produces a pivotal quantity with the same distribution:

$$\frac{\bar{X} - X_{n+1}}{S\sqrt{1 + \frac{1}{n}}} \sim t_{n-1}$$

A prediction interval for $X_{n+1}$ can therefore take the form:

$$\bar{X} \pm t_{0.025,\,n-1}\,S\sqrt{1 + \frac{1}{n}}$$

Question

The heights of 10-year-old children are normally distributed. The heights of a random sample of five children (in cm) are:

124cm, 122cm, 130cm, 125cm and 132cm.

Calculate a 90% prediction interval for the height of a further 10-year-old child based on these data values.

Solution

Since the sample comes from a normal distribution, we know that $\dfrac{\bar{X} - X_{n+1}}{S\sqrt{1 + \frac{1}{n}}}$ has a $t_{n-1}$ distribution, where $S^2$ is the sample variance.

From the Tables, we find that $t_{0.05,4} = 2.132$, ie $0.90 = P(-2.132 < t_4 < 2.132)$. So:

$$0.90 = P\left(-2.132 < \frac{\bar{X} - X_{n+1}}{S\sqrt{1 + \frac{1}{n}}} < 2.132\right)$$

Rearranging the inequality to isolate $X_{n+1}$ gives:

$$0.90 = P\left(\bar{X} - 2.132\,S\sqrt{1 + \tfrac{1}{n}} < X_{n+1} < \bar{X} + 2.132\,S\sqrt{1 + \tfrac{1}{n}}\right)$$

For this sample, we have $n = 5$, $\bar{x} = 126.6$, and $s^2 = 17.8$. Using these values gives a 90% prediction interval of:

(116.7, 136.5)

There is no simple function for calculating prediction intervals for fitted distributions in R. Prediction intervals can either be calculated by implementing
the above formula from scratch or alternatively by leveraging the functionality in R for calculating prediction intervals for linear regressions (by regressing on a constant):

    # create random sample of 10 observations
    set.seed(23)
    x <- rnorm(10)

    # calculate confidence and prediction intervals from scratch
    mu <- mean(x)          # sample mean
    sigma <- sqrt(var(x))  # square root of sample variance
    confidence_interval <- c(mu + sigma * sqrt(1/10) * qt(0.025, 9),
                             mu + sigma * sqrt(1/10) * qt(0.975, 9))
    prediction_interval <- c(mu + sigma * sqrt(1 + 1/10) * qt(0.025, 9),
                             mu + sigma * sqrt(1 + 1/10) * qt(0.975, 9))

    # calculate confidence and prediction intervals using linear regression functionality in R (lm)
    # data.frame(1) is just dummy data in formulae below
    predict(lm(x ~ 1), data.frame(1), interval = "confidence")
    predict(lm(x ~ 1), data.frame(1), interval = "prediction")

The linear regression approach is covered in Chapter 12.

4 Confidence intervals for binomial & Poisson parameters

Both these situations involve a discrete distribution, which introduces the difficulty of probabilities not being exactly 0.95, and so "at least 0.95" is used instead. Also, when not using the large-sample normal approximations, the pivotal quantity method must be adjusted.

One approach is to use a quantity $h(X)$ whose distribution involves $\theta$ such that:

$$P\left(h_1(\theta) < h(X) < h_2(\theta)\right) \ge 0.95$$

Then if both $h_1(\theta)$ and $h_2(\theta)$ are monotonic increasing (or both decreasing), the inequalities can be inverted to obtain a confidence interval as before.

4.1 The binomial distribution

If $X$ is a single observation from $Bin(n, \theta)$, the maximum likelihood estimator is:

$$\hat\theta = \frac{X}{n}$$

What follows is a slight diversion from our aim of obtaining a confidence interval for $\theta$. It is just demonstrating that the method is sound.

Using $X$ as the quantity $h(X)$, it is necessary to find if $h_1(\theta)$ and $h_2(\theta)$
exist such that $P\left(h_1(\theta) < X < h_2(\theta)\right) \ge 0.95$, where with equal tails $P\left(X \le h_1(\theta)\right) \le 0.025$ and $P\left(X \ge h_2(\theta)\right) \le 0.025$.

[Figure: distribution of $X$ with a central area of 95% between $h_1(\theta)$ and $h_2(\theta)$ and 2½% in each tail.]

We can have at most 2.5% in the lower (or upper) tail, so we need to be very careful about finding the values of $h_1$ and $h_2$.

There is no explicit expression for the pivotal quantity $h(X)$.

For the $Bin(20, 0.3)$ case:

$$P(X \le 1) = 0.0076 \text{ and } P(X \le 2) = 0.0355 \ \Rightarrow\ h_1(\theta) = 1$$

Also:

$$P(X \ge 11) = 0.0171,\ P(X \ge 10) = 0.0480 \ \Rightarrow\ h_2(\theta) = 11$$

Question

Calculate the values of $h_1$ and $h_2$ for the binomial distribution with parameters $n = 20$ and $\theta = 0.4$.

Solution

If $X \sim Bin(20, 0.4)$, then (using page 188 of the Tables) $P(X \le 3) = 0.0160$ and $P(X \le 4) = 0.0510$. So $h_1 = 3$.

Also $P(X \ge 13) = 0.0210$ and $P(X \ge 12) = 0.0565$. So $h_2 = 13$.

$h_1$ and $h_2$ have higher values than for the $Bin(20, 0.3)$ case.

So $h_1(\theta)$ and $h_2(\theta)$ do exist and increase with $\theta$.

We're back on track. We can move on to obtain our confidence interval for $\theta$.

Therefore the inequalities can be inverted as follows:

$$X \le h_1(\theta) \iff \theta \ge \theta_2(X)$$
$$X \ge h_2(\theta) \iff \theta \le \theta_1(X)$$

These are the tail probabilities. So the inequalities involving $\theta_1$ and $\theta_2$ are defining the tails. Our confidence interval is the region not covered by these tail inequalities.

This gives a 95% confidence interval of the form $\theta_1(X) < \theta < \theta_2(X)$.

Note: The lower limit $\theta_1(X)$ comes from the upper tail probabilities and the upper limit $\theta_2(X)$ from the lower tail probabilities.

We'll see this is the case in the question that follows.

However since there are no explicit expressions for $h_1(\theta)$ and $h_2(\theta)$, there are no explicit expressions for $\theta_1(X)$ and $\theta_2(X)$ and they will have to be calculated numerically.

So, adopting the convention of including the observed $x$ in the tails, $\theta_1$ and $\theta_2$ can be found by solving:

$$\sum_{r=x}^{n} b(r;\, n, \theta_1) = 0.025 \quad \text{and} \quad \sum_{r=0}^{x} b(r;\, n, \theta_2) = 0.025$$

Here $b(r;\, n, \theta)$ denotes $P(X = r)$ when $X \sim Bin(n, \theta)$.
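The two tail equations have no closed-form solution in general, so they are normally solved numerically. The following is a minimal sketch in R using `uniroot` and `pbinom`. The observation of $x = 1$ from $n = 20$ trials is used purely as an illustration, and the labelling (`theta1` for the lower limit, `theta2` for the upper limit) is the convention assumed in this sketch.

```r
# Exact equal-tailed 95% limits for a binomial probability, found by solving
# the two tail equations numerically. x and n are illustrative values.
x <- 1
n <- 20

# Lower limit theta1: upper tail P(X >= x) = 0.025 under Bin(n, theta1)
upper_tail <- function(theta) pbinom(x - 1, n, theta, lower.tail = FALSE) - 0.025
theta1 <- uniroot(upper_tail, interval = c(1e-8, 1 - 1e-8), tol = 1e-10)$root

# Upper limit theta2: lower tail P(X <= x) = 0.025 under Bin(n, theta2)
lower_tail <- function(theta) pbinom(x, n, theta) - 0.025
theta2 <- uniroot(lower_tail, interval = c(1e-8, 1 - 1e-8), tol = 1e-10)$root

c(theta1, theta2)

# Cross-check against the built-in exact interval
builtin <- as.numeric(binom.test(x, n, conf.level = 0.95)$conf.int)
builtin
```

The sketch assumes $0 < x < n$. At $x = 0$ or $x = n$ one of the tail equations has no root and the corresponding limit is taken as 0 or 1.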
These can be expressed in terms of the distribution function $F(x;\, \theta)$:

$$1 - F(x - 1;\, \theta_1) = 0.025 \quad \text{and} \quad F(x;\, \theta_2) = 0.025$$

Note: Equality can be attained as $\theta$ has a continuous range (0, 1) and the discrete problem does not arise.

The R function for an exact 95% confidence interval for the proportion is:

    binom.test(x, n, conf.level = 0.95)

Question

We have obtained a value of 1 from the binomial distribution with parameters $n = 20$ and $\theta$. Construct a 95% symmetrical confidence interval for $\theta$.

Solution

We need $\theta_1$ such that $P(X \ge 1) = 0.025$ under $Bin(20, \theta_1)$, and $\theta_2$ such that $P(X \le 1) = 0.025$ under $Bin(20, \theta_2)$.

For the first equation we have $1 - (1 - \theta_1)^{20} = 0.025$, ie $(1 - \theta_1)^{20} = 0.975$. Solving this we obtain $\theta_1 = 0.00127$.

For the second equation, we have $(1 - \theta_2)^{20} + 20\theta_2(1 - \theta_2)^{19} = 0.025$. Solving this we obtain $\theta_2 = 0.249$.

A numerical method will be needed here, or trial and improvement. One approach would be to write the equation in the form $(1 - \theta_2)^{19}(1 + 19\theta_2) = 0.025$, then iterate using:

$$\theta^{(k+1)} = 1 - \left(\frac{0.025}{1 + 19\theta^{(k)}}\right)^{1/19}$$

starting with $\theta^{(0)} = 0.5$.

Our confidence interval is then (0.00127, 0.249).

The normal approximation

It is easy for a computer to calculate an exact confidence interval for the binomial parameter $\theta$ even if $n$ is large. However, on a piece of paper we use the normal approximation to the binomial distribution.

$\dfrac{X - n\theta}{\sqrt{n\theta(1 - \theta)}}$ can be used as a pivotal quantity.

Solving the resulting equations for $\theta$ would not be easy.

However $\dfrac{X - n\theta}{\sqrt{n\hat\theta(1 - \hat\theta)}}$, with $\hat\theta$ in place of $\theta$ (in the denominator only), can be used in a simpler way and yields the standard 95% confidence interval used in practice, namely:

$$\frac{X}{n} \pm 1.96\sqrt{\frac{\hat\theta(1 - \hat\theta)}{n}} \quad \text{or} \quad \hat\theta \pm 1.96\sqrt{\frac{\hat\theta(1 - \hat\theta)}{n}}, \quad \text{where } \hat\theta = \frac{X}{n}$$

Question

In a one-year mortality investigation, 45 of the 250 ninety-year-olds present at the start of the investigation died before the end of the year.
Assuming that the number of deaths has a binomial distribution with parameters $n = 250$ and $q$, calculate a symmetrical 90% confidence interval for the unknown mortality rate $q$.

Solution

Since 250 is a large sample, we know that $\dfrac{X - nq}{\sqrt{n\hat{q}(1 - \hat{q})}} \sim N(0,1)$ approximately.

Since $P(-1.6449 < Z < 1.6449) = 0.90$, we can say that:

$$P\left(-1.6449 < \frac{X - 250q}{\sqrt{250\hat{q}(1 - \hat{q})}} < 1.6449\right) = 0.90$$

Rearranging this:

$$P\left(\frac{X}{250} - 1.6449\sqrt{\frac{\hat{q}(1 - \hat{q})}{250}} < q < \frac{X}{250} + 1.6449\sqrt{\frac{\hat{q}(1 - \hat{q})}{250}}\right) = 0.90$$

Replacing $X$ by the observed value of 45 gives $\hat{q} = \dfrac{45}{250} = 0.18$.

Therefore a symmetrical 90% confidence interval for $q$ is (0.140, 0.220).

Question

Repeat the earlier question (a value of 1 observed from $Bin(20, \theta)$) using the normal approximation. Comment on the answer obtained.

Solution

A 95% symmetrical confidence interval is given by:

$$\frac{X}{n} \pm 1.96\sqrt{\frac{\frac{X}{n}\left(1 - \frac{X}{n}\right)}{n}}$$

From the question, we know that $x = 1$ and $n = 20$. Substituting these into the formula, we get the confidence interval to be (-0.046, 0.146).

Since the value of $n$ is so small, the normal approximation is not really appropriate. This is highlighted by the lower limit which is not sensible, as $p$ must be between 0 and 1. The upper limit is not even close to the accurate value either.

The reason why the accuracy is so poor in this case is that the distribution is skew. Since we got 1 out of 20, the value of $p$ can be estimated as 0.05. So the value of $np = 20 \times 0.05 = 1$ is nowhere near big enough to justify a normal approximation, where we usually require $np \ge 5$.

4.2 The Poisson distribution

The Poisson situation can be tackled in a very similar way to the binomial for both large and small sample sizes.

If $X_i$, $i = 1, 2, \ldots, n$ are independent $Poi(\lambda)$ variables, that is, a random sample of size $n$ from $Poi(\lambda)$, then $\sum X_i \sim Poi(n\lambda)$.

Using $\sum X_i$ as a single observation from $Poi(n\lambda)$ is equivalent to the random sample of size $n$ from $Poi(\lambda)$.

This is similar to the single binomial situation.
Recall that a $Bin(n, p)$ distribution arises from the sum of $n$ Bernoulli trials with probability of success $p$.

Given a single observation $X$ from a $Poi(\lambda)$ distribution, then $P\left(h_1(\lambda) < X < h_2(\lambda)\right) \ge 0.95$, where $h_1(\lambda)$ and $h_2(\lambda)$ are increasing functions of $\lambda$.

Inverting this gives $P\left(\lambda_1(X) < \lambda < \lambda_2(X)\right) \ge 0.95$.

The resulting 95% confidence interval for $\lambda$ is given by $(\lambda_1, \lambda_2)$ where:

$$\sum_{r=x}^{\infty} p(r;\, \lambda_1) = 0.025 \quad \text{and} \quad \sum_{r=0}^{x} p(r;\, \lambda_2) = 0.025$$

Here $p(r;\, \lambda_1)$ denotes $P(X = r)$ where $X \sim Poi(\lambda_1)$.

or:

$$1 - F(x - 1;\, \lambda_1) = 0.025 \quad \text{and} \quad F(x;\, \lambda_2) = 0.025$$

The R function for an exact 95% confidence interval for the Poisson parameter $\lambda$ is:

    poisson.test(x, n, conf.level = 0.95)

Question

Suppose that we have obtained a value of 1 from $Poi(\lambda)$. Calculate a symmetrical 90% confidence interval for $\lambda$.

Solution

We need $\lambda_1$ such that $P(X \ge 1) = 0.05$ under $Poi(\lambda_1)$, and $\lambda_2$ such that $P(X \le 1) = 0.05$ under $Poi(\lambda_2)$.

The first equation is $1 - e^{-\lambda_1} = 0.05$, ie $e^{-\lambda_1} = 0.95$, which gives $\lambda_1 = -\log 0.95 = 0.0513$.

The second equation is $e^{-\lambda_2} + \lambda_2 e^{-\lambda_2} = 0.05$. Solving this numerically, for example by using the iterative equation $\lambda = \log\left(\dfrac{1 + \lambda}{0.05}\right)$, we obtain $\lambda_2 = 4.74$.

Therefore a symmetrical 90% confidence interval for $\lambda$ is (0.0513, 4.74).

Not surprisingly this is very wide, since we only have 1 sample value.

The normal approximation

Again, it is easy for a computer to calculate an exact confidence interval even for a large sample from $Poisson(\lambda)$, or a single observation from $Poisson(\lambda)$ where $\lambda$ is large.

However, on a piece of paper a normal approximation can be used either from $\sum X_i \sim Poi(n\lambda) \approx N(n\lambda, n\lambda)$ or from the Central Limit Theorem as $\bar{X} \approx N\left(\lambda, \dfrac{\lambda}{n}\right)$.

$\dfrac{\bar{X} - \lambda}{\sqrt{\lambda/n}}$ can then be used as a pivotal quantity yielding a confidence interval. However, as in the binomial case, the standard confidence interval in practical use comes from $\dfrac{\bar{X} - \lambda}{\sqrt{\hat\lambda/n}}$ where $\hat\lambda = \bar{X}$.
This clearly gives $\bar{X} \pm 1.96\sqrt{\dfrac{\bar{X}}{n}}$ as an approximate 95% confidence interval for $\lambda$.

Question

In a one-year investigation of claim frequencies for a particular category of motorists, the total number of claims made under 5,000 policies was 800. Assuming that the number of claims made by individual motorists has a $Poi(\lambda)$ distribution, calculate a symmetrical 90% confidence interval for the unknown average claim frequency $\lambda$.

Solution

Since the sample comes from a Poisson distribution, we know that $\dfrac{\bar{X} - \lambda}{\sqrt{\hat\lambda/n}} \sim N(0,1)$ approximately. Here $n = 5{,}000$.

From the Tables, we find that $P(-1.6449 < Z < 1.6449) = 0.90$. So:

$$P\left(-1.6449 < \frac{\bar{X} - \lambda}{\sqrt{\hat\lambda/n}} < 1.6449\right) = 0.90$$

Rearranging so that only $\lambda$ lies in the middle of the double inequality:

$$P\left(\bar{X} - 1.6449\sqrt{\frac{\hat\lambda}{n}} < \lambda < \bar{X} + 1.6449\sqrt{\frac{\hat\lambda}{n}}\right) = 0.90$$

Replacing $n$ by 5,000, $\bar{X}$ by 0.16 and $\hat\lambda$ by 0.16 gives a confidence interval of (0.151, 0.169).

5 Confidence intervals for two-sample problems

A comparison of the parameters of two populations can be considered by taking independent random samples from each population.

The importance of the independence is illustrated by noting that:

$$\text{var}\left(\bar{X}_1 - \bar{X}_2\right) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$

when the samples are independent.

If the samples are not independent, then a covariance term will be included:

$$\text{var}\left(\bar{X}_1 - \bar{X}_2\right) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} - 2\,\text{cov}\left(\bar{X}_1, \bar{X}_2\right)$$

This covariance term can clearly have a substantial effect in the non-independent case. The most common form of non-independence is due to paired data.

5.1 Two normal means

Case 1 (known population variances)

If $\bar{X}_1$ and $\bar{X}_2$ are the means from independent random samples of size $n_1$ and $n_2$ respectively taken from normal populations which have known variances $\sigma_1^2$ and $\sigma_2^2$ respectively, then the equal-tailed $100(1-\alpha)\%$ confidence interval for the difference in the population means is given by:

$$\left(\bar{X}_1 - \bar{X}_2\right) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

So for example, when $\alpha = 5\%$, we have $z_{\alpha/2} = z_{2.5\%} = 1.9600$.

There is no built-in function for calculating
the above confidence interval in R. We can use R to calculate the results of the formula from scratch or use a bootstrap method if the assumptions are not met.

Case 2 (unknown population variances)

If $\bar{X}_1$, $\bar{X}_2$, $S_1$ and $S_2$ are the means and standard deviations from independent random samples of size $n_1$ and $n_2$ respectively taken from normal populations which have equal variances, then the equal-tailed $100(1-\alpha)\%$ confidence interval for the difference in the population means is given by:

$$\left(\bar{X}_1 - \bar{X}_2\right) \pm t_{\alpha/2,\,n_1+n_2-2}\; S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

where:

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$

This formula is given on page 23 of the Tables.

In any practical situation consideration must be made as to whether $n_1$ and $n_2$ are large or small and whether $\sigma_1^2$ and $\sigma_2^2$ are known or unknown. In the case of the $t$ result it should be noted that there is the additional assumption of equality of variances and this should be checked by plotting the data in a suitable way and/or using the formal approach in Section 5.2.

Note: The pooled estimator $S_p^2$ is based on the maximum likelihood estimator but adjusted to give an unbiased estimator.

Remember that the number of degrees of freedom for the $t$ distribution is the same as the number used in the denominator of the pooled sample variance formula. $S_1^2$ and $S_2^2$ are the sample variances calculated in the usual way.

Question

A motor company runs tests to investigate the fuel consumption of cars using a newly developed fuel additive. Sixteen cars of the same make and age are used, eight with the new additive and eight as controls. The results, in miles per gallon over a test track under regulated conditions, are as follows:

Control   27.0  32.2  30.4  28.0  26.5  25.5  29.6  27.2
Additive  31.4  29.9  33.2  34.4  32.0  28.7  26.1  30.3

Calculate a 95% confidence interval for the increase in miles per gallon achieved by cars with the additive. State clearly any assumptions required for this analysis.
Solution

Assuming that the samples come from normal distributions with the same variance and that the samples are independent, we know that:

$$\frac{\left(\bar{X}_A - \bar{X}_C\right) - \left(\mu_A - \mu_C\right)}{S_p\sqrt{\dfrac{1}{n_A} + \dfrac{1}{n_C}}} \sim t_{n_A + n_C - 2}$$

where $\bar{X}_A$ and $\bar{X}_C$ are the sample means, $\mu_A$ and $\mu_C$ are the underlying population means, $n_A$ and $n_C$ are the sample sizes and $S_p$ is the pooled sample standard deviation.

We now replace $\bar{X}_A$ by 30.75, $\bar{X}_C$ by 28.3, $S_p^2$ by 5.96, and $n_A$ and $n_C$ by 8 to obtain the confidence interval.

(The individual sample variances are $s_A^2 = \dfrac{48.06}{7}$ and $s_C^2 = \dfrac{35.38}{7}$.)

For a symmetric confidence interval, we need the upper 2.5% point of $t_{14}$, which is 2.145.

Substituting these values in gives a symmetrical 95% confidence interval of:

$$2.45 \pm 2.145\sqrt{\frac{5.96}{4}} = (-0.168,\ 5.068)$$

In R we can use the function t.test with the argument var.equal = TRUE to obtain a confidence interval for the difference between the means with unknown but equal variances.

The t.test function can also obtain confidence intervals for the difference between the means with unknown but non-equal variances.

Again, we could use the bootstrap method to construct empirical confidence intervals if the assumptions of the above formulae are not met.

5.2 Two population variances

For the comparison of two population variances, it is more natural to consider the ratio $\sigma_1^2/\sigma_2^2$ than the difference $\sigma_1^2 - \sigma_2^2$. This follows logically from the concept of variance, but also from a technical point of view there is a pivotal quantity readily available for the ratio of normal variances but not for their difference. It is:

$$\frac{S_1^2/S_2^2}{\sigma_1^2/\sigma_2^2} \sim F_{n_1-1,\,n_2-1}$$

The resulting confidence interval is given by:

$$\frac{S_1^2}{S_2^2} \cdot \frac{1}{F_{n_1-1,\,n_2-1}} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{S_1^2}{S_2^2} \cdot F_{n_2-1,\,n_1-1}$$

where $F_{n_1-1,\,n_2-1}$ is the relevant percentage point from the $F$ distribution.

Notice that the order of the degrees of freedom is different in the two $F$ distributions here.
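The change in the order of the degrees of freedom can be seen by inverting the pivotal quantity directly. In the sketch below, $F_{a,b}$ denotes the upper 2.5% point of the $F$ distribution with $(a, b)$ degrees of freedom, which is a notational assumption made here for a 95% interval. The derivation relies on the standard result that the lower 2.5% point of $F_{a,b}$ equals $1/F_{b,a}$.

```latex
\begin{align*}
0.95 &= P\left( \frac{1}{F_{n_2-1,\,n_1-1}}
          < \frac{S_1^2/S_2^2}{\sigma_1^2/\sigma_2^2}
          < F_{n_1-1,\,n_2-1} \right) \\[4pt]
     &= P\left( \frac{S_1^2}{S_2^2}\cdot\frac{1}{F_{n_1-1,\,n_2-1}}
          < \frac{\sigma_1^2}{\sigma_2^2}
          < \frac{S_1^2}{S_2^2}\cdot F_{n_2-1,\,n_1-1} \right)
\end{align*}
```

Dividing through by the pivotal quantity's bounds swaps them, so the upper point of $F_{n_1-1,\,n_2-1}$ ends up in the lower limit and the upper point of $F_{n_2-1,\,n_1-1}$ in the upper limit.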
It should be said that in practice the estimation of $\sigma_1^2/\sigma_2^2$ is not a common objective. However the same F result is used for the more common objective of testing whether $\sigma_1^2$ and $\sigma_2^2$ may be equal, which is relevant for the t result for comparing population means. The acceptability of the hypothesis $H_0: \sigma_1^2 = \sigma_2^2$ can be determined simply by confirming that the value 1 is included in the confidence interval for $\sigma_1^2/\sigma_2^2$.

What this is saying is that if the number 1 lies in the confidence interval, then 1 is one of the many reasonable values that the variance ratio can take. So we are not unhappy about the assumption that $\sigma_1^2/\sigma_2^2 = 1$, ie $\sigma_1^2 = \sigma_2^2$. The alternative way of checking equality is to use the hypothesis test detailed in Chapter 10.

Question

For the fuel additive data in the previous question, calculate a 90% confidence interval for the ratio $\sigma_C^2/\sigma_A^2$ of the variances of the fuel consumption distributions both without and with the additive, and comment on the equality of variances assumption needed for the analysis in that question.

Solution

For two independent random samples from $N(\mu_A, \sigma_A^2)$ and $N(\mu_C, \sigma_C^2)$:

$$\frac{S_A^2/\sigma_A^2}{S_C^2/\sigma_C^2} \sim F_{n_A-1,\,n_C-1}$$

where $n_A$ and $n_C$ are the sample sizes, and $S_A^2$ and $S_C^2$ are the sample variances.

From the previous question, $s_A^2 = 6.8657$ and $s_C^2 = 5.0543$.

From the Tables, we know that $0.90 = P\left(\frac{1}{3.787} < F_{7,7} < 3.787\right)$, which gives us:

$$0.90 = P\left(\frac{1}{3.787} < \frac{S_A^2/\sigma_A^2}{S_C^2/\sigma_C^2} < 3.787\right)$$

Rearranging this to give $\sigma_C^2/\sigma_A^2$ (and dropping the probability notation since $\sigma_C^2/\sigma_A^2$ is not a random variable), we get:

$$0.1944 < \frac{\sigma_C^2}{\sigma_A^2} < 2.788$$

So the confidence interval is therefore (0.1944, 2.788).

Since the value of 1 lies well within this interval, the assumption of equality of variances needed in the previous question appears to be justified.
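The variance ratio interval in the solution above can be checked numerically. The following is a minimal Python sketch rather than anything from the Core Reading (which works in R): the function name `variance_ratio_interval` is hypothetical, and the percentage point 3.787 is taken from the solution (read from the Tables) rather than computed.

```python
from statistics import variance

# Fuel consumption data (miles per gallon) from the fuel additive question
control = [27.0, 32.2, 30.4, 28.0, 26.5, 25.5, 29.6, 27.2]
additive = [31.4, 29.9, 33.2, 34.4, 32.0, 28.7, 26.1, 30.3]

# Upper 5% point of F(7,7), as quoted in the solution (from the Tables)
F_7_7_UPPER_5PC = 3.787


def variance_ratio_interval(s2_top, s2_bottom, f_top_bottom, f_bottom_top):
    """Equal-tailed interval for sigma_top^2 / sigma_bottom^2.

    f_top_bottom: upper percentage point of F with (n_top - 1, n_bottom - 1) df
    f_bottom_top: upper percentage point of F with (n_bottom - 1, n_top - 1) df
    """
    ratio = s2_top / s2_bottom
    return ratio / f_top_bottom, ratio * f_bottom_top


s2_control = variance(control)    # sample variance, divisor n - 1
s2_additive = variance(additive)

# Both samples have size 8, so the same F(7,7) point is used at each end
lower, upper = variance_ratio_interval(
    s2_control, s2_additive, F_7_7_UPPER_5PC, F_7_7_UPPER_5PC
)
print(f"s_C^2 = {s2_control:.4f}, s_A^2 = {s2_additive:.4f}")
print(f"90% interval for sigma_C^2 / sigma_A^2: ({lower:.4f}, {upper:.3f})")
```

With unequal sample sizes the two F arguments would differ, which is the point of the remark about the order of the degrees of freedom.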
The R code for this confidence interval is var.test, or we could use a bootstrap method if the assumptions are not met.

5.3 Two population proportions

The comparison of population proportions corresponds to comparing two binomial probabilities on the basis of single observations $X_1$, $X_2$ from $Bin(n_1, \theta_1)$ and $Bin(n_2, \theta_2)$ respectively.

Considering only the case where $n_1$ and $n_2$ are large, so that the normal approximation can be used, the pivotal quantity used in practice is:

$$\frac{(\hat\theta_1 - \hat\theta_2) - (\theta_1 - \theta_2)}{\sqrt{\dfrac{\hat\theta_1(1-\hat\theta_1)}{n_1} + \dfrac{\hat\theta_2(1-\hat\theta_2)}{n_2}}} \sim N(0,1) \quad \text{(approximately)}$$

where $\hat\theta_1 = \dfrac{X_1}{n_1}$, $\hat\theta_2 = \dfrac{X_2}{n_2}$ are the MLEs of $\theta_1$, $\theta_2$, respectively.

The R code for this confidence interval is prop.test with the argument correct=FALSE.

Question

In a one-year mortality investigation, 25 of the 100 ninety-year-old males and 20 of the 150 ninety-year-old females present at the start of the investigation died before the end of the year. Assuming that the numbers of deaths follow independent binomial distributions, calculate a symmetrical 95% confidence interval for the difference between male and female mortality rates at this age.

Solution

Since the samples come from independent binomial distributions, we know that, approximately:

$$\frac{\left(\dfrac{X_1}{n_1} - \dfrac{X_2}{n_2}\right) - (p_1 - p_2)}{\sqrt{\dfrac{\frac{X_1}{n_1}\left(1-\frac{X_1}{n_1}\right)}{n_1} + \dfrac{\frac{X_2}{n_2}\left(1-\frac{X_2}{n_2}\right)}{n_2}}} \sim N(0,1)$$

Calling $\dfrac{X_1}{n_1} = \hat{p}_1$ and $\dfrac{X_2}{n_2} = \hat{p}_2$, and using the Tables, we know that:

$$0.95 = P\left(-1.96 < \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}} < 1.96\right)$$

Replacing $\hat{p}_1$ by 0.25 and $\hat{p}_2$ by 0.133, the inequality becomes:

$$0.016 < p_1 - p_2 < 0.218$$

So a symmetrical 95% confidence interval for the difference in mortality rates is (0.016, 0.218).

5.4 Two Poisson parameters

Considering the comparison of two Poisson parameters ($\lambda_1$ and $\lambda_2$) when the normal approximation can be used:

$\bar{X}_i$ is an estimator of $\lambda_i$ such that $\bar{X}_i \sim N\left(\lambda_i, \dfrac{\lambda_i}{n_i}\right)$ approximately.

Therefore $\bar{X}_1 - \bar{X}_2$ is an estimator of $\lambda_1 - \lambda_2$ such that:

$$\bar{X}_1 - \bar{X}_2 \sim N\left(\lambda_1 - \lambda_2,\ \frac{\lambda_1}{n_1} + \frac{\lambda_2}{n_2}\right)$$

Using $\hat\lambda_i = \bar{X}_i$, an approximate 95% confidence interval for $\lambda_1 - \lambda_2$ is given by:

$$(\bar{X}_1 - \bar{X}_2) \pm 1.96\sqrt{\frac{\bar{X}_1}{n_1} + \frac{\bar{X}_2}{n_2}}$$

We are assuming that the two samples are independent.

There is no built in function for calculating the above confidence interval in R. We can use R to calculate the results of the formula from scratch. However, the function poisson.test can be used to obtain a confidence interval for the ratio of the two Poisson parameters.

Question

In a one-year investigation of claim frequencies for a particular category of motorists, there were 150 claims from the 500 policyholders aged under 25 and 650 claims from the 4,500 remaining policyholders. Assuming that the numbers of claims made by the individual motorists in each category have independent Poisson distributions, calculate a 99% confidence interval for the difference between the two Poisson parameters.

Solution

Since the samples come from independent Poisson distributions, we know that, approximately:

$$\frac{(\bar{X}_1 - \bar{X}_2) - (\lambda_1 - \lambda_2)}{\sqrt{\dfrac{\bar{X}_1}{n_1} + \dfrac{\bar{X}_2}{n_2}}} \sim N(0,1)$$

where subscripts 1 and 2 refer to young and old drivers respectively.

From the Tables, we know that $0.99 = P(-2.5758 < Z < 2.5758)$. This gives us:

$$0.99 = P\left(-2.5758 < \frac{(\bar{X}_1 - \bar{X}_2) - (\lambda_1 - \lambda_2)}{\sqrt{\dfrac{\bar{X}_1}{n_1} + \dfrac{\bar{X}_2}{n_2}}} < 2.5758\right)$$

Replacing $\bar{X}_1$ by 0.3, $\bar{X}_2$ by 0.1444, $n_1$ by 500 and $n_2$ by 4,500 and rearranging, the inequality becomes:

$$0.0908 < \lambda_1 - \lambda_2 < 0.2203$$

So the confidence interval for $\lambda_1 - \lambda_2$ is (0.0908, 0.2203).

6 Paired data

Paired data is a common example of comparison using non-independent samples.

Essentially having paired or matched data means that there is one sample:

$$(X_{11}, X_{21}),\ (X_{12}, X_{22}),\ (X_{13}, X_{23}),\ \ldots,\ (X_{1n}, X_{2n})$$

rather than two separate samples:

$$(X_{11}, X_{12}, X_{13}, \ldots, X_{1n}) \quad \text{and} \quad (X_{21}, X_{22}, X_{23}, \ldots, X_{2n})$$

The paired situation is really a single sample problem, that is, a problem based on a sample of n pairs of observations.
(In the independent two-sample situation the sample sizes need not, of course, be equal.) Paired data can arise in the form of "before and after" comparisons.

We will see one of these in the next question.

Investigations using paired data are usually better than two-sample investigations in the sense that the estimation is more accurate.

This means the confidence interval derived from the paired data will usually be narrower.

Paired data are analysed using the differences $D_i = X_{1i} - X_{2i}$ and estimation of $\mu_D = \mu_1 - \mu_2$ is considered. A z result or a t result can be used, but the latter will be more common as it is unlikely that the variances of the differences will be known.

Assuming normality of the population of such differences (but not necessarily the normality of the $X_1$ and $X_2$ populations), the pivotal quantity for the t result is:

$$\frac{\bar{D} - \mu_D}{S_D/\sqrt{n}} \sim t_{n-1}$$

$S_D$ is calculated from the values of D.

The resulting 95% confidence interval for $\mu_D$ will be $\bar{D} \pm t_{0.025,\,n-1}\dfrac{S_D}{\sqrt{n}}$.

Question

The average blood pressure $\bar{b}$ for a group of 10 patients was 77.0 mmHg. The average blood pressure $\bar{a}$ after they were put on a special diet was 75.0 mmHg. Assuming that variation in blood pressure follows a normal distribution, calculate a 95% symmetrical confidence interval for the reduction in blood pressure attributable to the special diet. Assess the effectiveness of the diet in reducing the patients' blood pressure.

It is known that $\sum (b_i - a_i)^2 = 68$.

Solution

Since this is a paired sample from a normal distribution, we know that:

$$\frac{\bar{D} - \mu_D}{S_D/\sqrt{n}} \sim t_{n-1}$$

where $D = B - A$ (blood pressure before the diet minus blood pressure after) and $\mu_D = \mu_B - \mu_A$.

From the Tables, we know that $0.95 = P(-2.262 < t_9 < 2.262)$, so:

$$0.95 = P\left(-2.262 < \frac{\bar{D} - \mu_D}{S_D/\sqrt{n}} < 2.262\right)$$

We can now replace n by 10, $\bar{D}$ by 2.0 and $S_D$ by:

$$s_D = \sqrt{\frac{\sum (b_i - a_i)^2 - n(\bar{b} - \bar{a})^2}{n-1}} = \sqrt{\frac{68 - 10 \times 4}{9}} = 1.764$$

Rearranging gives a 95% confidence interval for $\mu_B - \mu_A$ of:

$$2.0 \pm 2.262 \times \frac{1.764}{\sqrt{10}} = (0.74,\ 3.26)$$
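The arithmetic in this solution can be reproduced from the summary figures alone. The following is a minimal Python sketch, not part of the Core Reading: the function name `paired_t_interval` is hypothetical, and the percentage point 2.262 is copied from the solution (read from the Tables) rather than computed.

```python
from math import sqrt

# Summary figures from the blood pressure question
n = 10
mean_before = 77.0
mean_after = 75.0
sum_sq_diff = 68.0   # sum of (b_i - a_i)^2, given in the question

# Upper 2.5% point of t with 9 degrees of freedom, as quoted from the Tables
T_9_UPPER_2_5PC = 2.262


def paired_t_interval(n, mean_diff, sum_sq_diff, t_point):
    """Interval for the mean difference, built from summary statistics."""
    # Sample variance of the differences: (sum d^2 - n * dbar^2) / (n - 1)
    var_diff = (sum_sq_diff - n * mean_diff ** 2) / (n - 1)
    half_width = t_point * sqrt(var_diff / n)
    return mean_diff - half_width, mean_diff + half_width


lower, upper = paired_t_interval(
    n, mean_before - mean_after, sum_sq_diff, T_9_UPPER_2_5PC
)
print(f"95% interval for the mean reduction: ({lower:.2f}, {upper:.2f})")
```

The printed limits should agree with (0.74, 3.26) to two decimal places.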
Since this interval does not include the value 0 (which would be the value if there was no difference in the average blood pressure before and after), the diet seems to be effective.

A plot of the sample differences can be used to check on normality but recall that the t result is robust as n increases.

Also the Central Limit Theorem means that it can be safely used for large n.

From a practical viewpoint:

(i) When confronted with two-sample data, consideration should be made of whether the data may in fact be paired. One way is to draw a scatterplot of data and calculate the correlation coefficient to see whether there is any relationship in the pairs of data points. If there is a strong relationship, the data source should be checked to see if the data were paired by design.

(ii) If a paired problem is analysed as though it involved independent samples, then the results would be invalid because the assumption of independence is violated. On the other hand, if independent samples are analysed as though they were paired, then the results would be valid although they would be making inefficient use of the data due to the discarding of possible information about the means and variances of the two separate populations.

The ideal approach is to ask the person who collected the data whether any pairing was used.

The R code for this confidence interval is t.test with the argument paired=TRUE.

The fuel consumption data given earlier in the chapter were not paired data. There is no way to link a specific item of data in the control group to the corresponding item of data in the additive group. So we analyse the data using the two-sample t situation.

On the other hand, suppose that we had measured the fuel consumption of 8 cars without a fuel additive, and then re-measured the fuel consumption of the same 8 cars with the fuel additive. This is now a paired situation.
A data item from the first sample is linked to a specific item in the second sample. In this situation we would treat the data as being paired, and would subtract the figure for control consumption for each car from the figure for the same car when using the additive.

Chapter 9 Summary

Confidence intervals

A confidence interval gives us a range of values in which we believe the true parameter value lies, together with an associated probability. There are a number of different situations for which we can find confidence intervals.

For a single sample from a normal distribution:

$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1) \quad (\sigma^2 \text{ known}) \qquad \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1} \quad (\sigma^2 \text{ unknown})$$

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$

For samples from two independent normal distributions:

$$\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \sim N(0,1) \quad (\sigma_1^2, \sigma_2^2 \text{ known})$$

$$\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim t_{n_1+n_2-2} \quad (\sigma_1^2, \sigma_2^2 \text{ unknown, assuming equal variances})$$

where:

$$S_p^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}$$

To compare the variances of two independent normal populations:

$$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{n_1-1,\,n_2-1}$$

For a sample from a binomial distribution:

$$\frac{\hat{p} - p}{\sqrt{\hat{p}\hat{q}/n}} \sim N(0,1) \quad \text{or} \quad \frac{X - np}{\sqrt{n\hat{p}\hat{q}}} \sim N(0,1) \quad \text{(approximately)}$$

For samples from two independent binomial distributions:

$$\frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{\hat{p}_1\hat{q}_1}{n_1} + \dfrac{\hat{p}_2\hat{q}_2}{n_2}}} \sim N(0,1) \quad \text{(approximately)} \quad \text{where } \hat{p}_1 = \frac{X_1}{n_1},\ \hat{p}_2 = \frac{X_2}{n_2}$$

For a sample from a Poisson distribution:

$$\frac{\hat\lambda - \lambda}{\sqrt{\hat\lambda/n}} \sim N(0,1) \quad \text{or} \quad \frac{\sum X - n\lambda}{\sqrt{n\hat\lambda}} \sim N(0,1) \quad \text{(approximately)} \quad \text{where } \hat\lambda = \bar{X}$$

For samples from two independent Poisson distributions:

$$\frac{(\hat\lambda_1 - \hat\lambda_2) - (\lambda_1 - \lambda_2)}{\sqrt{\dfrac{\hat\lambda_1}{n_1} + \dfrac{\hat\lambda_2}{n_2}}} \sim N(0,1) \quad \text{(approximately)} \quad \text{where } \hat\lambda_1 = \bar{X}_1,\ \hat\lambda_2 = \bar{X}_2$$

General confidence intervals for parameters can be found using the pivotal method and the formulae given above.

For paired data we subtract the paired values to come up with a new variable, D, and then follow one of the other standard confidence interval calculations:

$$\frac{\bar{X}_D - \mu_D}{S_D/\sqrt{n}} \sim t_{n-1} \quad (\sigma_D^2 \text{ unknown})$$

Prediction intervals

A prediction interval gives us a range of values for a future predicted value, together with an associated probability.

For a single sample from a normal distribution:

$$\frac{X_{n+1} - \bar{X}}{\sigma\sqrt{1 + \dfrac{1}{n}}} \sim N(0,1) \quad (\sigma^2 \text{ known}) \qquad \frac{X_{n+1} - \bar{X}}{S\sqrt{1 + \dfrac{1}{n}}} \sim t_{n-1} \quad (\sigma^2 \text{ unknown})$$

Chapter 9 Practice Questions

9.1 A survey was carried out to find out the number of hours that actuarial students spend watching television per week. It was discovered that for a sample of 10 students, the following times were spent watching television:

8, 4, 7, 5, 9, 7, 6, 9, 5, 7

(i) (a) Calculate a symmetrical 95% confidence interval for the mean time an actuarial student spends watching television per week.
    (b) Write down the assumptions needed to calculate the confidence interval in part (a).

(ii) Calculate a symmetrical 95% prediction interval for the time an actuarial student spends watching television per week.

(iii) (a) Describe the limiting case of the formulae for the intervals in parts (i)(a) and (ii) as n tends to infinity.
      (b) Explain which of the two intervals calculated will be more sensitive to the assumptions in part (i)(b).

9.2 A researcher investigating attitudes to Sunday shopping reports that, in a sample of 8 interviewees, 7 were in favour of more opportunities to shop on Sunday. Calculate an exact 95% confidence interval for the underlying proportion in favour of this idea using the binomial distribution.

9.3 An opinion poll of 1,000 voters found that 450 favoured Party P. Calculate an approximate 99% confidence interval for the proportion of voters who favour Party P. Comment on the likelihood of more than 50% of the voters voting for Party P in an election.
9.4 (Exam style) Two inspectors carry out property valuations for an estate agency. Over a particular week they each go out to similar properties. The table below shows their valuations (in £000s):

A   102   98   93   86   92   94   89   97
B    86   88   92   95   98   97   94   92   91

The dotplots for these two inspectors are as follows:

[Figure: dotplots for Inspector A and Inspector B, each on a horizontal axis of valuation (£000) running from 85 to 105]

(i) (a) Comment on the possible assumption of normality and equal variances for the two underlying populations using the diagrams.
    (b) Calculate a 95% confidence interval for this common variance using the equal variance assumption from part (a).
    (c) Calculate a 95% confidence interval for the mean difference between the valuations by A and B, commenting briefly on the result. [10]

The estate agency employing the inspectors decides to test their valuations by sending them each to the same set of eight houses, independently and without knowledge that the other is going. The resulting valuations (in £000s) follow:

Property     1    2    3    4    5    6    7    8
A           94   98  102  132  118  121  106  123
B           92   96  111  129  111  122  101  118

(ii) Calculate a 90% confidence interval for the mean difference between valuations by A and B, commenting briefly on the result. [4]
[Total 14]

9.5 (Exam style) The ordered remission times (in weeks) of 20 leukaemia patients are given in the table:

 1   1   2   2   3   4   4   5   5   8
 8   8  11  11  12  12  15  17  22  23

Suppose the remission times can be regarded as a random sample from an exponential distribution with density:

$$f(x; \lambda) = \lambda e^{-\lambda x}, \quad x > 0$$

(i) (a) Determine the maximum likelihood estimator $\hat\lambda$ of $\lambda$.
    (b) Calculate the large-sample approximate variance of $\hat\lambda$.
    (c) Hence calculate an approximate 95% confidence interval for $\lambda$. [7]

(ii) (a) Calculate an exact 95% confidence interval for $\lambda$ using the fact that $2\lambda n\bar{X}$ has a $\chi^2_{2n}$ distribution.
     (b) Comment briefly on how it compares with your interval in (i)(c). [3]
[Total 10]

9.6 Heights of males with classic congenital adrenal hyperplasia (CAH) are assumed to be normally distributed.

(i) Determine the minimum sample size to ensure that a 95% confidence interval for the mean height has a maximum width of 10cm, if:
    (a) a previous sample has a standard deviation of 8.4 cm
    (b) the population standard deviation is 8.4 cm.

(ii) Determine the minimum sample size to ensure that a 95% prediction interval for the height of a male with CAH has a maximum width of 38cm, if:
    (a) a previous sample has a standard deviation of 8.4 cm
    (b) the population standard deviation is 8.4 cm.

(iii) Comment on the difference in sample sizes required for parts (i) and (ii).

9.7 A sample value of 2 is obtained from a Poisson distribution with mean $\mu$.

(i) Calculate an exact two-sided 90% confidence interval for $\mu$.

A sample of 30 values from the same Poisson distribution has a mean of 2.

(ii) Use these data values to construct an approximate 90% confidence interval for $\mu$.

9.8 An office manager wants to analyse the variability in the time taken for her typists to complete a given task. She has given seven typists the task and the results are as follows (in minutes):

15, 17.2, 13.7, 11.2, 18, 15.1, 14

The manager wants a 95% confidence interval for the true standard deviation of time taken of the form $(k, \infty)$. Calculate the value of k.

9.9 (Exam style) The amounts of individual claims arising under a certain type of general insurance policy are known from past experience to conform to a lognormal distribution in which the standard deviation is 1.8 times the mean. An actuary has found that the lower and upper limits of a 95% confidence interval for the mean claim amount are 4,250 and 4,750. Evaluate the lower and upper limits of a 95% confidence interval for the lognormal parameter $\mu$.
[3]

9.10 (Exam style) A general insurance company is debating introducing a new screening programme to reduce the claim amounts that it needs to pay out. The programme consists of a much more detailed application form that takes longer for the new client department to process. The screening is applied to a test group of clients as a trial whilst other clients continue to fill in the old application form.

It can be assumed that claim payments follow a normal distribution.

The claim payments data for samples of the two groups of clients are (in £100 per year):

Without screening   24.5  21.7  35.2  15.9  23.7  34.2  29.3  21.1  23.5  28.3
With screening      22.4  21.2  36.3  15.7  21.5   7.3  12.8  21.2  23.9  18.4

(i) (a) Calculate a 95% confidence interval for the difference between the mean claim amounts.
    (b) Comment on your answer. [6]

(ii) (a) Calculate a 95% confidence interval for the ratio of the population variances.
     (b) Hence, comment on the assumption of equal variances required in part (i). [4]

Assume that the sample sizes taken from the clients with and without screening are always equal to keep processing easy.

(iii) Calculate the minimum sample size so that the width of a 95% confidence interval for the difference between mean claim amounts is less than 10, assuming that the samples have the same variances as in part (i). [3]
[Total 13]

Chapter 9 Solutions

9.1 (i)(a) The sample mean and variance are:

$$\bar{x} = \frac{67}{10} = 6.7 \qquad s^2 = \frac{1}{9}\left\{475 - 10 \times 6.7^2\right\} = 2.9$$

So the confidence interval is given by:

$$6.7 \pm t_{0.025;9}\sqrt{\frac{2.9}{10}}$$

From the Tables with $\alpha = 0.025$, $t_{0.025,9} = 2.262$, so our confidence interval is (5.48, 7.92).
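As a check on the arithmetic in solution 9.1(i)(a), the interval can be recomputed from the raw data. This is a minimal Python sketch, not part of the course material: the function name `mean_t_interval` is hypothetical, and the percentage point 2.262 is copied from the solution (read from the Tables) rather than computed.

```python
from math import sqrt
from statistics import mean, variance

# Hours of television watched per week by the 10 students in Question 9.1
hours = [8, 4, 7, 5, 9, 7, 6, 9, 5, 7]

# Upper 2.5% point of t with 9 degrees of freedom, as quoted from the Tables
T_9_UPPER_2_5PC = 2.262


def mean_t_interval(data, t_point):
    """Symmetrical interval for the mean of a normal sample, variance unknown."""
    n = len(data)
    centre = mean(data)
    half_width = t_point * sqrt(variance(data) / n)  # variance uses divisor n - 1
    return centre - half_width, centre + half_width


lower, upper = mean_t_interval(hours, T_9_UPPER_2_5PC)
print(f"mean = {mean(hours):.1f}, s^2 = {variance(hours):.1f}")
print(f"95% confidence interval for the mean: ({lower:.2f}, {upper:.2f})")
```

The printed limits should agree with (5.48, 7.92) to two decimal places.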
(i)(b) We have assumed that the number of hours that actuarial students spend watching television has a normal distribution.

(ii) The prediction interval is given by:

$$6.7 \pm t_{0.025;9}\sqrt{2.9\left(1 + \frac{1}{10}\right)}$$

This gives a prediction interval of (2.66, 10.7).

(iii)(a) For large samples, the confidence interval for the mean will eventually converge on the sample mean, which should be equal to the true mean, whereas the prediction interval will not converge to a single value but down to an interval of the distribution.

(iii)(b) Unlike confidence intervals for the mean, which are concerned with the centre of the distribution, prediction intervals also take account of the tails as well as the centre. Hence, prediction intervals have greater sensitivity to the assumption of normality.

9.2 The number in a sample of 8 who are in favour has a $Bin(8, p)$ distribution, where p is the true underlying proportion in favour.

We want the value of p for which the probability of getting 7 or more in favour in a sample of 8 is 0.025. This will give the lower end of the confidence interval for p.

We also want the value of p for which the probability of getting 7 or fewer in favour is 0.025. This will give us the upper end of the interval.

The probability of getting 7 or more in favour is:

$$\binom{8}{7}p^7(1-p) + p^8 = 0.025$$

Rearranging the equation:

$$p^7(8 - 7p) = 0.025$$

Using trial and error, or goalseek in Excel, to solve this equation we obtain:

$$p = 0.4735$$

For the upper end of the interval, we have:

$$1 - p^8 = 0.025$$

which we can solve directly to give $p = 0.9968$. So a 95% confidence interval for p is (0.4735, 0.9968).

9.3 Assuming that the sample comes from a binomial distribution, we know that approximately:

$$\frac{X - np}{\sqrt{np(1-p)}} \sim N(0,1) \quad \text{or} \quad \frac{\dfrac{X}{n} - p}{\sqrt{\dfrac{p(1-p)}{n}}} \sim N(0,1)$$

Here $n = 1{,}000$ and X is the number who favour Party P.

From the Tables we find that $0.99 = P(-2.5758 < Z < 2.5758)$, so:

$$0.99 = P\left(-2.5758 < \frac{\dfrac{X}{n} - p}{\sqrt{\dfrac{p(1-p)}{n}}} < 2.5758\right)$$

Rearranging this to give us p, and replacing p by $\hat{p}$ under the square root:

$$0.99 = P\left(\frac{X}{n} - 2.5758\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < p < \frac{X}{n} + 2.5758\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$$

Replacing X by 450 and $\hat{p}$ by $\frac{450}{1{,}000}$, we get the confidence interval to be (0.409, 0.491).

Since this 99% confidence interval doesn't contain the value $p = 0.5$ (or higher values of p), it is unlikely that Party P will gain more than 50% of the votes.

9.4 (i)(a) B appears to have a slightly smaller spread (but it is hard to tell with so few data points). The difference in the spread doesn't appear to be significant, so the assumption of equal variances can be allowed to stand. [1]

There are no outliers and so there is nothing to suggest non-normality. [1]

(i)(b) For Inspector A, we have $n_A = 8$, $\sum x_A = 751$, $\sum x_A^2 = 70{,}683$, giving:

$$s_A^2 = \frac{1}{7}\left(70{,}683 - \frac{751^2}{8}\right) = 26.125$$ [1]

For Inspector B, we have $n_B = 9$, $\sum x_B = 833$, $\sum x_B^2 = 77{,}223$, giving:

$$s_B^2 = \frac{1}{8}\left(77{,}223 - \frac{833^2}{9}\right) = 15.528$$ [1]

The common (or pooled) variance is given by:

$$s_P^2 = \frac{7 \times 26.125 + 8 \times 15.528}{7 + 8} = 20.473$$ [1]

The pivotal quantity is $\dfrac{15 S_P^2}{\sigma^2} \sim \chi^2_{15}$. This gives a 95% confidence interval for the common variance $\sigma^2$ of:

$$\left(\frac{15 \times 20.473}{27.49},\ \frac{15 \times 20.473}{6.262}\right) = (11.2,\ 49.0)$$ [1]

(i)(c) The confidence interval is calculated using:

$$(\bar{x}_A - \bar{x}_B) \pm t_{0.025,15}\sqrt{s_P^2\left(\frac{1}{n_A} + \frac{1}{n_B}\right)} = \left(\frac{751}{8} - \frac{833}{9}\right) \pm 2.131\sqrt{20.473\left(\frac{1}{8} + \frac{1}{9}\right)}$$ [2]

This gives a confidence interval of (-3.37, 6.00). [1]

Since this interval contains zero, there is insufficient evidence at the 5% level to suggest that there is a difference in the valuations given by each of the two inspectors. [1]

(ii) For the differences we have $n_D = 8$, $\sum x_D = 14$, $\sum x_D^2 = 198$, giving:

$$\bar{x}_D = \frac{14}{8} = 1.75 \qquad s_D^2 = \frac{1}{7}\left(198 - \frac{14^2}{8}\right) = 24.786$$ [1]

The confidence interval is calculated using:

$$\bar{x}_D \pm t_{0.05,7}\sqrt{\frac{s_D^2}{n_D}} = \frac{14}{8} \pm 1.895\sqrt{\frac{24.786}{8}}$$ [1]

This gives a confidence interval of (-1.59, 5.09).
Since this interval contains zero there is insufficient evidence to suggest that A and B give different valuations. [2]

9.5 (i)(a) The likelihood function is:

$$L(\lambda) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i} = \lambda^n e^{-\lambda \sum x_i}$$

$$\Rightarrow \ln L(\lambda) = n\ln\lambda - \lambda\sum x_i \quad \Rightarrow \quad \frac{d}{d\lambda}\ln L(\lambda) = \frac{n}{\lambda} - \sum x_i$$ [2]

Setting the derivative equal to zero:

$$\frac{n}{\hat\lambda} - \sum x_i = 0 \quad \Rightarrow \quad \hat\lambda = \frac{n}{\sum x_i} = \frac{1}{\bar{X}}$$ [1]

Checking it's a maximum:

$$\frac{d^2}{d\lambda^2}\ln L(\lambda) = -\frac{n}{\lambda^2} < 0 \quad \Rightarrow \quad \text{max}$$ [1/2]

For these data, $\hat\lambda = \dfrac{1}{8.7} = 0.11494$. [1/2]

(i)(b)

$$CRLB = -\frac{1}{E\left[\dfrac{d^2}{d\lambda^2}\ln L(\lambda)\right]} = \frac{1}{E\left[\dfrac{n}{\lambda^2}\right]} = \frac{\lambda^2}{n}$$ [1]

Using these data values, the estimate of the CRLB is $\dfrac{\hat\lambda^2}{n} = 0.000661$. [1]

(i)(c) Since $\hat\lambda \sim N(\lambda, CRLB)$ approximately, the confidence interval is given by $\hat\lambda \pm 1.96\sqrt{CRLB}$ which, using the CRLB estimate, gives (0.06457, 0.1653). [1]

(ii)(a) Since $2\lambda n\bar{X} \sim \chi^2_{2n}$, we have $40\lambda\bar{X} \sim \chi^2_{40}$.

The lower and upper 2.5% points of $\chi^2_{40}$ are 24.43 and 59.34. So:

$$P(24.43 < 2n\lambda\bar{X} < 59.34) = 0.95$$

Hence a 95% confidence interval for $\lambda$ is:

$$\left(\frac{24.43}{40\bar{x}},\ \frac{59.34}{40\bar{x}}\right) = \left(\frac{24.43}{348},\ \frac{59.34}{348}\right) = (0.07020,\ 0.1705)$$ [1]

(ii)(b) This confidence interval is narrower as it is based upon the exact result, whereas in part (i)(c) it was based on a relatively small sample of 20. A larger sample would have given a narrower interval. [2]

9.6 (i)(a) Sample size needed (unknown variance)

Using the result $\dfrac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$ gives a 95% confidence interval of:

$$\bar{x} \pm t_{0.025;n-1}\frac{s}{\sqrt{n}}$$

The width of this confidence interval is $2\,t_{0.025;n-1}\dfrac{s}{\sqrt{n}}$, so we require:

$$2\,t_{0.025;n-1}\frac{8.4}{\sqrt{n}} < 10 \quad \Rightarrow \quad \frac{t_{0.025;n-1}}{\sqrt{n}} < 0.5952$$

Using the values from page 163 of the Tables, we find that:

$$\frac{t_{0.025;12}}{\sqrt{13}} = \frac{2.179}{\sqrt{13}} = 0.6043 \quad \text{and} \quad \frac{t_{0.025;13}}{\sqrt{14}} = \frac{2.160}{\sqrt{14}} = 0.5773$$

Therefore we need a sample of at least 14 individuals.

(i)(b) Sample size needed (known variance)

Using the result $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$ gives a 95% confidence interval of:

$$\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$$

The width of this confidence interval is $2 \times 1.96\dfrac{\sigma}{\sqrt{n}}$.
So we require:

$$2 \times 1.96 \times \frac{8.4}{\sqrt{n}} < 10 \quad \Rightarrow \quad 3.29 < \sqrt{n} \quad \Rightarrow \quad n > 10.8$$

Therefore we need a sample of at least 11 individuals.

(ii)(a) Sample size needed (unknown variance)

Using the result $\dfrac{X_{n+1} - \bar{X}}{S\sqrt{1 + \frac{1}{n}}} \sim t_{n-1}$ gives a 95% prediction interval of:

$$\bar{x} \pm t_{0.025;n-1}\, s\sqrt{1 + \frac{1}{n}}$$

The width of this prediction interval is $2\,t_{0.025;n-1}\, s\sqrt{1 + \frac{1}{n}}$, so we require:

$$2\,t_{0.025;n-1} \times 8.4\sqrt{1 + \frac{1}{n}} < 38 \quad \Rightarrow \quad t_{0.025;n-1}\sqrt{1 + \frac{1}{n}} < 2.262$$

Using the values from page 163 of the Tables, we find that:

$$t_{0.025;11}\sqrt{1 + \frac{1}{12}} = 2.201\sqrt{1 + \frac{1}{12}} = 2.291 \quad \text{and} \quad t_{0.025;12}\sqrt{1 + \frac{1}{13}} = 2.179\sqrt{1 + \frac{1}{13}} = 2.261$$

Therefore we need a sample of at least 13 individuals.

(ii)(b) Sample size needed (known variance)

Using the result $\dfrac{X_{n+1} - \bar{X}}{\sigma\sqrt{1 + \frac{1}{n}}} \sim N(0,1)$ gives a 95% prediction interval of:

$$\bar{x} \pm 1.96\,\sigma\sqrt{1 + \frac{1}{n}}$$

The width of this prediction interval is $2 \times 1.96\,\sigma\sqrt{1 + \frac{1}{n}}$. So we require:

$$2 \times 1.96 \times 8.4\sqrt{1 + \frac{1}{n}} = 38 \quad \Rightarrow \quad \sqrt{1 + \frac{1}{n}} = 1.1540 \quad \Rightarrow \quad \frac{1}{n} = 0.33179 \quad \Rightarrow \quad n = 3.01$$

Therefore we need a sample size of at least 4 individuals.

(iii) Comment

For the confidence intervals the sample sizes are similar, but larger in the case where less information is known.

In general, prediction intervals are wider than confidence intervals and so a larger sample is needed to get the same width. However, in this case, the prediction intervals vary due to the vast difference in the tails of the t distribution.

9.7 (i) Exact confidence interval

We require:

$$P(X \ge 2) = 0.05 \text{ under } Po(\mu_1) \qquad P(X \le 2) = 0.05 \text{ under } Po(\mu_2)$$

From the first equation:

$$0.95 = P(X = 0) + P(X = 1) = e^{-\mu_1} + \mu_1 e^{-\mu_1}$$

Solving this numerically we obtain $\mu_1 = 0.36$.

From the second equation:

$$0.05 = P(X = 0) + P(X = 1) + P(X = 2) = e^{-\mu_2} + \mu_2 e^{-\mu_2} + \frac{\mu_2^2}{2}e^{-\mu_2}$$

Solving this numerically we obtain $\mu_2 = 6.3$.

So the confidence interval is (0.36, 6.3).

(ii) Approximate confidence interval

Since n is large enough to use a normal approximation, the pivotal quantity is:

$$\frac{\sum X - n\mu}{\sqrt{n\hat\mu}} \sim N(0,1) \quad \text{or} \quad \frac{\bar{X} - \mu}{\sqrt{\hat\mu/n}} \sim N(0,1) \quad \text{(approximately)}$$

where $\hat\mu = \bar{X}$.

Hence, a 90% confidence interval for $\mu$ can be obtained from:

$$\sum X \pm 1.6449\sqrt{n\hat\mu} \quad \text{or} \quad \bar{X} \pm 1.6449\sqrt{\frac{\hat\mu}{n}}$$

the first of these being an interval for $n\mu$, which is then divided by n. Replacing n by 30, $\sum X$ by 60 and $\hat\mu$ by 2 gives:

$$60 \pm 1.6449\sqrt{30 \times 2} \quad \text{or} \quad 2 \pm 1.6449\sqrt{\frac{2}{30}}$$

So a 90% confidence interval is (1.58, 2.42).

9.8 The confidence interval is based on the distributional result:

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$

We have:

$$\bar{x} = \frac{104.2}{7} = 14.88571 \qquad s^2 = \frac{1}{6}\left\{1{,}581.98 - 7 \times 14.88571^2\right\} = 5.148$$

So a 95% one-sided confidence interval for the variance is given by:

$$\left(\frac{6 \times 5.148}{\chi^2_{0.05;6}},\ \infty\right) = \left(\frac{30.888}{12.59},\ \infty\right) = (2.45,\ \infty)$$

So a 95% one-sided confidence interval for the standard deviation is $(1.57, \infty)$.

9.9 The formulae for the mean and variance of a lognormal distribution are:

$$E[X] = e^{\mu + \frac{1}{2}\sigma^2} \quad \text{and} \quad \text{var}(X) = e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right)$$

Since the standard deviation equals 1.8 times the mean, we know that:

$$e^{\mu + \frac{1}{2}\sigma^2}\left(e^{\sigma^2} - 1\right)^{1/2} = 1.8\,e^{\mu + \frac{1}{2}\sigma^2}$$ [1]

So:

$$\left(e^{\sigma^2} - 1\right)^{1/2} = 1.8 \quad \Rightarrow \quad \sigma^2 = 1.4446$$ [1]

The 95% confidence interval for the mean corresponds to the inequality:

$$4{,}250 < e^{\mu + \frac{1}{2}\sigma^2} < 4{,}750$$

Solving for $\mu$ gives:

$$\log 4{,}250 - \tfrac{1}{2}\sigma^2 < \mu < \log 4{,}750 - \tfrac{1}{2}\sigma^2$$

Using the value found for $\sigma^2$, this is:

$$7.632 < \mu < 7.744$$ [1]

So the lower limit of the confidence interval for $\mu$ is 7.632 and the upper limit is 7.744.

9.10 (i)(a) Mean difference confidence interval

Using the subscript 1 to refer to "without screening", and 2 to refer to "with screening", the pivotal quantity is:

$$\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_P\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim t_{n_1+n_2-2}$$ [1]

Calculating the required values:

$$\bar{x}_1 = \frac{257.4}{10} = 25.74 \qquad \bar{x}_2 = \frac{200.7}{10} = 20.07$$ [1]

$$s_1^2 = \frac{1}{9}\left\{6{,}951.16 - 10 \times 25.74^2\right\} = 36.1871$$ [1/2]

$$s_2^2 = \frac{1}{9}\left\{4{,}553.97 - 10 \times 20.07^2\right\} = 58.4357$$ [1/2]

The pooled sample variance is given by:

$$s_P^2 = \frac{1}{18}\left(9 \times 36.1871 + 9 \times 58.4357\right) = 47.3114$$ [1]

Hence, a 95% confidence interval is given by:

$$(25.74 - 20.07) \pm 2.101\sqrt{47.3114 \times \frac{2}{10}} = (-0.793,\ 12.1)$$ [1]

Alternatively, the confidence interval for $\mu_2 - \mu_1$ is (-12.1, 0.793).
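The pooled two-sample calculation in solution 9.10(i)(a) can be checked from the raw claim data. This is a minimal Python sketch under stated assumptions, not part of the course material: the function names `pooled_variance` and `pooled_t_interval` are hypothetical, and the percentage point 2.101 is copied from the solution (read from the Tables) rather than computed.

```python
from math import sqrt
from statistics import mean, variance

# Claim payments from Question 9.10 (units as given in the question)
without_screening = [24.5, 21.7, 35.2, 15.9, 23.7, 34.2, 29.3, 21.1, 23.5, 28.3]
with_screening = [22.4, 21.2, 36.3, 15.7, 21.5, 7.3, 12.8, 21.2, 23.9, 18.4]

# Upper 2.5% point of t with 18 degrees of freedom, as quoted from the Tables
T_18_UPPER_2_5PC = 2.101


def pooled_variance(x, y):
    """Pooled sample variance, weighting each sample variance by its df."""
    nx, ny = len(x), len(y)
    return ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)


def pooled_t_interval(x, y, t_point):
    """Equal-variance interval for mean(x) - mean(y)."""
    half_width = t_point * sqrt(pooled_variance(x, y) * (1 / len(x) + 1 / len(y)))
    diff = mean(x) - mean(y)
    return diff - half_width, diff + half_width


sp2 = pooled_variance(without_screening, with_screening)
lower, upper = pooled_t_interval(without_screening, with_screening, T_18_UPPER_2_5PC)
print(f"pooled variance = {sp2:.4f}")
print(f"95% interval for mu_1 - mu_2: ({lower:.3f}, {upper:.1f})")
```

Swapping the order of the two arguments gives the interval for the difference taken the other way round.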
(i)(b) Comment

Since the confidence interval contains the value 0, there is insufficient evidence to conclude that the new screening programme significantly reduces the mean claim amount. [1]

(ii)(a) Ratio of variances confidence interval

The pivotal quantity is:

$$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{n_1-1,\,n_2-1}$$ [1]

Hence, a 95% confidence interval is given by:

$$\frac{S_1^2/S_2^2}{F_{0.025;\,n_1-1,\,n_2-1}} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{S_1^2/S_2^2}{F_{0.975;\,n_1-1,\,n_2-1}}$$

Replacing $S_1^2$ by 36.1871 and $S_2^2$ by 58.4357, we obtain:

$$0.6193 \times \frac{1}{4.026} < \frac{\sigma_1^2}{\sigma_2^2} < 0.6193 \times 4.026$$ [2]

So the confidence interval is (0.154, 2.49).

Alternatively, the confidence interval for $\sigma_2^2/\sigma_1^2$ is (0.401, 6.50).

(ii)(b) Comment

Since the confidence interval contains 1, this means that we are reasonably confident that the population variances are the same. [1]

(iii) Sample size

The width of the confidence interval is:

$$2\,t_{2.5\%;2n-2}\sqrt{\frac{36.1871(n-1) + 58.4357(n-1)}{2n-2} \times \frac{2}{n}} = \frac{19.455\,t_{2.5\%;2n-2}}{\sqrt{n}}$$ [1]

This must be less than 10, so using the percentage points of the t distribution from page 163 of the Tables, we see that:

$$n = 15 \ \Rightarrow \ \frac{19.455\,t_{0.025,28}}{\sqrt{15}} = \frac{19.455 \times 2.048}{\sqrt{15}} = 10.3 > 10$$

and:

$$n = 16 \ \Rightarrow \ \frac{19.455\,t_{0.025,30}}{\sqrt{16}} = \frac{19.455 \times 2.042}{\sqrt{16}} = 9.93 < 10$$

The minimum sample size is 16. [2]

End of Part 2

What next?

1. Briefly review the key areas of Part 2 and/or re-read the summaries at the end of Chapters 6 to 9.
2. Ensure you have attempted some of the Practice Questions at the end of each chapter in Part 2. If you don't have time to do them all, you could save the remainder for use as part of your revision.
3. Attempt Assignment X2.
4. Work through the Chapter 6 to 9 material (Central Limit Theorem, sampling distributions, estimation and confidence intervals) of the Paper B Online Resources (PBOR).
5. Attempt Assignment Y1.

Time to consider... "revision" products

Flashcards: These are available in both paper and eBook format. One student said: "The paper-based Flashcards are brilliant."

You can find lots more information, including samples, on our website at www.ActEd.co.uk.

Buy online at www.ActEd.co.uk/estore

Hypothesis testing

Syllabus objectives

3.3 Hypothesis testing and goodness of fit

3.3.1 Explain what is meant by the following terms: null and alternative hypotheses, simple and composite hypotheses, type I and type II errors, sensitivity, specificity, test statistic, likelihood ratio, critical region, level of significance, probability-value and power of a test.

3.3.2 Apply basic tests for the one-sample and two-sample situations involving the normal, binomial and Poisson distributions, and apply basic tests for paired data.

3.3.3 Apply the permutation approach to non-parametric hypothesis tests.

3.3.4 Use a chi-square test to test the hypothesis that a random sample is from a particular distribution, including cases where parameters are unknown.

3.3.5 Explain what is meant by a contingency (or two-way) table, and use a chi-square test to test the independence of two classification criteria.

0 Introduction

In many research areas, such as medicine, education, advertising and insurance, it is necessary to carry out statistical tests. These tests enable researchers to use the results of their experiments to answer questions such as:

• Is Drug A a more effective treatment for AIDS than Drug B?
• Does Training programme T lead to improved staff efficiency?
• Are the severities of large individual private motor insurance claims consistent with a lognormal distribution?

A hypothesis is where we make a statement about something, for example "the mean lifetime of smokers is less than that of non-smokers". A hypothesis test is where we collect a representative sample and examine it to see if our hypothesis holds true.
Hypothesis tests are closely linked to the confidence intervals we developed in Chapter 9. For example, when we were sampling from a $N(\mu, \sigma^2)$ distribution ($\sigma^2$ known) we used:

$$\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right) \;\Rightarrow\; Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0,1)$$

By substituting in $\bar{X}$, $\sigma$ and $n$, we found the values of $\mu$ that corresponded to 95% of the data with $\bar{X}$ being in the centre.

For hypothesis tests, we now assume a value of $\mu$ based on our hypothesis and can calculate a probability value for the test assuming our initial value of $\mu$ is correct.

If we find that our sample mean is unlikely to occur given our hypothesised value of $\mu$, we naturally conclude that it is likely that our sample does not come from this distribution with the assumed value of $\mu$. In this case we would reject the null hypothesis.

If, however, our sample mean is not very extreme, it would be fair to say that it probably does have the assumed value of $\mu$. In this case we would not reject the null hypothesis.

[Figure: density centred on the assumed value, with critical values $z_1$ and $z_2$; each tail of 2½% is labelled "extreme values: reject null hypothesis".]

Most of the formulae used in this chapter are identical to those in Chapter 9. The only exceptions are for the binomial and Poisson distributions.

Finally, we can develop our estimation work from Chapter 8. For example, suppose we have recorded the following numbers of claims from a certain portfolio over the last 100 months:

Claims       0    1    2    3    4    5    6
Frequency    9   22   26   21   13    6    3

Assuming a Poisson distribution with parameter $\mu$, the estimate using the methods given in Chapter 8 would be $\hat{\mu} = \bar{X} = 2.37$. We obtained a confidence interval for the mean in Chapter 9. But all of this work is appropriate only if the distribution is Poisson. We will see in this chapter how to carry out a test of whether our sample does or does not conform to this distribution.

The material in this chapter has traditionally been examined in one of the longer questions of the Statistics exam. Spend your time wisely.
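The Poisson estimate of 2.37 quoted for the claims table can be checked directly. The following is a minimal sketch in Python (the course itself works in R); the function name `grouped_mean` is purely illustrative and is not part of the course material.

```python
# Claim counts per month and the number of months (out of 100) in which
# each count was observed, as given in the table in the text.
claims = [0, 1, 2, 3, 4, 5, 6]
frequency = [9, 22, 26, 21, 13, 6, 3]


def grouped_mean(values, counts):
    """Sample mean of data supplied as a frequency table."""
    total = sum(v * c for v, c in zip(values, counts))
    return total / sum(counts)


# For a Poisson distribution the sample mean is the usual estimate of the parameter.
mu_hat = grouped_mean(claims, frequency)
print(f"Number of months: {sum(frequency)}")
print(f"Estimated Poisson mean: {mu_hat:.2f}")
```

The estimate is simply the total number of claims (237) divided by the number of months (100).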
1 Hypotheses, test statistics, decisions and errors

1.1 The testing procedure

The standard approach to carrying out a statistical test involves the following steps:

- specify the hypothesis to be tested
- select a suitable statistical model
- design and carry out an experiment/study
- calculate a test statistic
- calculate the probability value, or decide whether the value of the test statistic lies within the rejection region
- determine the conclusion of the test.

We will not be concerned here with the design of the experiment. We will assume that an experiment, based on an appropriate statistical model, has already been conducted and the results are available.

1.2 Hypotheses

In Sections 1-6 of this chapter a hypothesis is a statement about the value of an unknown parameter in the model.

The basic hypothesis being tested is the null hypothesis, denoted $H_0$. It can sometimes be regarded as representing the current state of knowledge or belief about the value of the parameter being tested (the "status quo" hypothesis). In many situations a difference between two populations is being tested and the null hypothesis is that there is no difference.

In a test, the null hypothesis is contrasted with the alternative hypothesis, denoted $H_1$.

Where a hypothesis completely specifies the distribution, it is called a simple hypothesis. Otherwise it is called a composite hypothesis.

For example, when testing the null hypothesis $H_0: \mu = 0.8$ against the alternative hypothesis $H_1: \mu = 0.6$, both of the hypotheses are simple. However, when testing $H_0: \mu = 0.8$ against $H_1: \mu < 0.8$, $H_1$ is a composite hypothesis.

A test is a rule which divides the sample space (the set of possible values of the data) into two subsets, a region in which the data are judged to be consistent with $H_0$, and its complement, in which the data are judged to be inconsistent with $H_0$. The tests discussed here are designed to answer the question "Do the data provide sufficient evidence to justify our rejecting $H_0$?".

1.3 One-sided and two-sided tests

In a test of whether smoking reduces life expectancies, the hypotheses are:

$H_0$: smoking makes no difference to life expectancy
$H_1$: smoking reduces life expectancy

This is an example of a one-sided test, since we are only considering the possibility of a reduction in life expectancy, ie a change in one direction.

However, we could have specified the hypotheses as follows:

$H_0$: smoking makes no difference to life expectancy
$H_1$: smoking affects life expectancy

This is a two-sided test since the alternative hypothesis considers the possibility of a change in either direction, ie an increase or a decrease.

1.4 Test statistics

The actual decision is based on the value of a suitable function of the data, the test statistic. The set of possible values of the test statistic itself divides into two subsets, a region in which the value of the test statistic is judged consistent with $H_0$, and its complement, the critical region (or rejection region), in which the value of the test statistic is judged inconsistent with $H_0$. If the test statistic has a value in the critical region, $H_0$ is rejected. The test statistic (like any statistic) must be such that its distribution is completely specified when the value of the parameter itself is specified (and in particular "under $H_0$", ie when $H_0$ is true).

In exam questions the test statistic is generally calculated from data given in the question. For details of how to reach a conclusion in practice, see Section 3.1.

1.5 Errors

It is rare for data to enable discrimination with certainty between the two hypotheses. The result of performing a test may be the correct decision, but two kinds of error could arise:

- Type I error: reject $H_0$ when it is true; and
- Type II error: fail to reject $H_0$ when it is false.
The level of significance of the test, denoted $\alpha$, is the probability of committing a Type I error, ie it is the probability of rejecting $H_0$ when it is in fact true. The probability of committing a Type II error, denoted $\beta$, is the probability of accepting $H_0$ when it is false.

An ideal test would be one which simultaneously minimises $\alpha$ and $\beta$. This ideal however is not attainable in practice.

Question

A random variable $X$ is believed to follow an $Exp(\lambda)$ distribution. In order to test the null hypothesis $\mu = 20$ against the alternative hypothesis $\mu = 30$, where $\mu = 1/\lambda$, a single value is observed from the distribution. If this value is less than 28, $H_0$ is not rejected, otherwise $H_0$ is rejected.

Calculate the probabilities of:
(i) a Type I error
(ii) a Type II error.

Solution

(i) The probability of a Type I error is given by:

$$P(\text{reject } H_0 \text{ when } H_0 \text{ true}) = P\big(X > 28 \text{ when } X \sim Exp(1/20)\big) = 1 - F_X(28) = e^{-28/20} = 0.2466$$

The CDF of the exponential distribution is given on page 11 of the Tables.

(ii) The probability of a Type II error is given by:

$$P(\text{do not reject } H_0 \text{ when } H_0 \text{ false}) = P\big(X < 28 \text{ when } X \sim Exp(1/30)\big) = F_X(28) = 1 - e^{-28/30} = 0.6068$$

In this case we were forced to choose between $H_0: \mu = 20$ and $H_1: \mu = 30$. So saying that $H_0$ is false is the same as saying that $\mu = 30$.

Since we've only got one value in our sample here, not surprisingly, the probabilities of Type I and Type II errors are quite big.

The probability of a Type I error is also referred to as the "size" of the test, which will normally be a small number such as 0.05 (say).

The power of a test is the probability of rejecting $H_0$ when it is false, so that the power equals $1 - \beta$. In general, this will be a function of the unknown parameter value. For simple hypotheses the power is a single value, but for composite hypotheses it is a function being defined at all points in the alternative hypothesis.

A test with a high power is said to be "powerful" as it is very effective at demonstrating a positive result.

Question

Give an expression in terms of $\mu$ for the power of the test in the previous question. Comment on how the power is affected by the value of $\mu$.

Solution

The power is the probability of rejecting $H_0$ when the true value of the parameter is some value other than $\mu = 20$. In terms of $\mu$ this is:

$$P\big(X > 28 \mid X \sim Exp(1/\mu)\big) = 1 - F_X(28) = e^{-28/\mu}$$

If $\mu$ is large (1,000, say), then the power will be close to 1, since the test will reject $H_0: \mu = 20$ very easily.

Conversely, if $\mu$ is small (10, say), then the power will be close to 0, since the test will not reject $H_0: \mu = 20$ very easily.

Type I and II errors can also arise in the context of binary classification, in healthcare as well as in machine learning contexts. Here, rather than gathering a data sample consisting of multiple observations to assess whether a (population-level) hypothesis holds, a decision is required for each individual observation. In a medical context, a common situation is the classification into "healthy" and "diseased" based on a binary test result. In these contexts:

- A Type I error, known as a "false positive", occurs when a healthy individual receives a positive test result; and
- A Type II error, known as a "false negative", occurs when a diseased individual tests negative for the disease.

The equivalent null hypothesis in this case is that the individual is healthy, and we are carrying out a test to ascertain whether this is the case.

If the null hypothesis is true (ie the individual is actually healthy) but the test is positive (indicating that the individual has the disease), then we would be rejecting a true hypothesis and making a Type I error.
If the null hypothesis is false (ie the individual is sick) but the test is negative (indicating that the individual does not have the disease), then we would be failing to reject a false hypothesis and making a Type II error.

The table below shows all the possible outcomes from a medical test result:

                                Test result predicts patient as having disease
                                YES                                 NO
Patient actually      YES       True positive (TP)                  False negative (FN): Type II error
has disease           NO        False positive (FP): Type I error   True negative (TN)

The probability of a diseased individual testing positive for the disease (ie a true positive rate) is the sensitivity of the test:

$$\text{Sensitivity} = \frac{\text{Number of true positives}}{\text{Number of true positives} + \text{Number of false negatives}} = \frac{\text{Number of true positives}}{\text{Total number of people with the disease}}$$

$$= P(\text{positive test} \mid \text{individual has the disease}) = 1 - P(\text{negative test} \mid \text{individual has the disease})$$

$$= 1 - P(\text{Type II error}) = \text{Power of the test}$$

The probability of a healthy individual testing negative (ie a true negative rate), which is 1 minus the probability of a false positive, is called the specificity of the test.

$$\text{Specificity} = \frac{\text{Number of true negatives}}{\text{Number of true negatives} + \text{Number of false positives}} = \frac{\text{Number of true negatives}}{\text{Total number of people who do not have the disease}}$$

$$= P(\text{negative test} \mid \text{individual does not have the disease}) = 1 - P(\text{positive test} \mid \text{individual does not have the disease})$$

$$= 1 - P(\text{Type I error})$$

Question

A short screening test has just been developed for depression. An independent blind comparison was made with a gold-standard test for diagnosis of depression among 200 psychiatric outpatients. Among the 50 outpatients found to be depressed according to the gold-standard test, 35 patients tested positive under the new short test. Among 150 patients found not to be depressed according to the gold-standard test, 30 patients tested positive under the new short test.
Calculate the sensitivity and specificity of the short screening test, assuming that the gold-standard test correctly classifies each individual.

Solution

$$\text{Sensitivity} = \frac{\text{Number of true positives}}{\text{Total number of people with depression}} = \frac{35}{50} = 70\%$$

$$\text{Specificity} = \frac{\text{Number of true negatives}}{\text{Total number of individuals without depression}} = \frac{150 - 30}{150} = 80\%$$

Examples of binary classifications in machine learning contexts include:

- classifying emails according to whether they are spam
- assessing whether claims received by an insurance company are fraudulent.

One method of making such predictions is to use a generalised linear model with a binomial distribution. We'll cover this in Chapter 13. Other methods are covered in Subject CS2.

Although the contexts are different in important respects (eg hypothesis testing seeks to make inferences, classifiers seek to make predictions; the true state is usually known with certainty, at least for a training set, in classification problems), understanding the trade-offs of minimising Type I versus Type II errors plays an important role in test selection in both cases.

For example, in the case of using a smear test to identify cervical cancer, it is vital to have a test with a high sensitivity (currently it's 86%-100%), as cervical cancer is a serious but treatable condition if caught early. However, smear tests have a much lower specificity (currently 30%-87%), which means that a high proportion of women with a positive cervical smear test who go on to have further investigation subsequently find that there is no cause for concern. This is considered a small price to pay compared to the alternative.
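The sensitivity and specificity figures from the depression screening question can be reproduced from the four cells of the outcome table. This is a minimal Python sketch (not R, which the course uses); the function and variable names are illustrative assumptions.

```python
def sensitivity(true_pos, false_neg):
    """True positive rate: P(positive test | condition present)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """True negative rate: P(negative test | condition absent)."""
    return true_neg / (true_neg + false_pos)


# Depression screening question: 50 depressed outpatients (35 test positive)
# and 150 non-depressed outpatients (30 test positive).
tp, fn = 35, 50 - 35
fp, tn = 30, 150 - 30

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)

# Treating "not depressed" as the null hypothesis for each individual:
type_i_rate = 1 - spec   # false positive rate
type_ii_rate = 1 - sens  # false negative rate

print(f"Sensitivity: {sens:.0%}, specificity: {spec:.0%}")
print(f"P(Type I error): {type_i_rate:.0%}, P(Type II error): {type_ii_rate:.0%}")
```

Writing the error rates as complements makes the link explicit: raising sensitivity lowers the Type II error rate, and raising specificity lowers the Type I error rate.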
R can calculate the power of a one-sample $t$ test (covered in Section 3.1) using the function:

power.t.test

2 Classical testing, significance and p-values

2.1 "Best" tests

The classical approach to finding a "good" test (called the Neyman-Pearson theory) fixes the value of $\alpha$, ie the level of significance required, and then tries to find such a test for which the other error probability, $\beta$, is as small as possible for every value of the parameter specified by the alternative hypothesis. This can also be described as finding the "most powerful" test.

The key result in the search for such a test is the Neyman-Pearson lemma, which provides the "best" test (smallest $\beta$) in the case of two simple hypotheses. For a given level, the critical region (and in fact the test statistic) for the best test is determined by setting an upper bound on the likelihood ratio $L_0 / L_1$, where $L_0$ and $L_1$ are the likelihood functions of the data under $H_0$ and $H_1$ respectively.

The Neyman-Pearson lemma

Formally, if $C$ is a critical region of size $\alpha$ and there exists a constant $k$ such that $L_0 / L_1 \le k$ inside $C$ and $L_0 / L_1 \ge k$ outside $C$, then $C$ is a most powerful critical region of size $\alpha$ for testing the simple hypothesis $\theta = \theta_0$ against the simple alternative hypothesis $\theta = \theta_1$.

So a Neyman-Pearson test rejects $H_0$ if:

$$\frac{\text{Likelihood under } H_0}{\text{Likelihood under } H_1} < \text{critical value}$$

Question

A random variable $X$ is believed to follow an $Exp(\lambda)$ distribution. In order to test the null hypothesis $\mu = 20$ against the alternative hypothesis $\mu = 30$, where $\mu = 1/\lambda$, a single value is observed from the distribution. If this value is less than 28, $H_0$ is not rejected, otherwise $H_0$ is rejected.

Show that this is a Neyman-Pearson test.

Solution

Given a single value from an exponential distribution, the Neyman-Pearson criterion is "reject $H_0$ if $L_0 / L_1 <$ critical value".
Using the null and alternative hypotheses, the test becomes:

$$\frac{\frac{1}{20} e^{-x/20}}{\frac{1}{30} e^{-x/30}} < \text{constant}$$

This reduces to $e^{-x/60} <$ constant, or $x >$ constant. This was exactly the form of the test that we used (we rejected $H_0$ when $x > 28$). So this is a Neyman-Pearson test.

Common tests are often such that the null hypothesis is simple, eg $H_0: \theta = \theta_0$, against a composite alternative, eg $H_1: \theta \ne \theta_0$, which is two-sided, and $H_1: \theta > \theta_0$, or $H_1: \theta < \theta_0$, which are one-sided.

Here it is only in certain special cases (usually one-sided cases) that a single test is available which is best (ie uniformly most powerful) for all parameter values. In cases where a single best test in the sense of the Neyman-Pearson lemma is unavailable, another approach is used to derive sensible tests. This approach, which is a generalisation of the lemma, produces tests which are referred to as likelihood ratio tests.

Likelihood ratio tests

The critical region (and test statistic) for the test are determined by setting an upper bound on the ratio $\max L_0 / \max L$, where $\max L_0$ is the maximum value of the likelihood $L$ under the restrictions imposed by the null hypothesis, and $\max L$ is the overall maximum value of $L$ for all allowable values of all parameters involved.

Likelihood ratio tests are used, for example, in survival models with covariates (see Subject CS2).

In the most common case when $H_0$ and $H_1$ together cover all possible values for the parameters, this generalised test rejects $H_0$ if:

$$\frac{\max(\text{Likelihood under } H_0)}{\max(\text{Likelihood under } H_0 + H_1)} < \text{critical value}$$

Important results include the case of sampling from a $N(\mu, \sigma^2)$ distribution. The method leads to the test statistic:

$$\frac{\bar{X} - \mu_0}{S / \sqrt{n}} \sim t_{n-1} \quad \text{under } H_0: \mu = \mu_0$$

for tests on the value of the mean $\mu$.

We're assuming here that $\sigma^2$ is unknown. If it is known, then the $z$-test is the best test.

The method also leads to the test statistic:

$$\frac{(n-1) S^2}{\sigma_0^2} \sim \chi^2_{n-1} \quad \text{under } H_0: \sigma^2 = \sigma_0^2$$

for tests on the value of the variance $\sigma^2$.

2.2 p-values

Under the classical Neyman-Pearson approach, with a fixed predetermined value of $\alpha$, a test will produce a decision as to whether to reject $H_0$. But merely comparing the observed test statistic with some critical value and concluding eg "using a 5% test, reject $H_0$" or "reject $H_0$ with significance level 5%" or "result significant at 5%" (all equivalent statements) does not provide the recipient of the results with clear detailed information on the strength of the evidence against $H_0$.

A more informative approach is to calculate and quote the probability value (p-value) of the observed test statistic. This is the observed significance level of the test statistic: the probability, assuming $H_0$ is true, of observing a test statistic at least as "extreme" (inconsistent with $H_0$) as the value observed. The p-value is the lowest level at which $H_0$ can be rejected.

The smaller the p-value, the stronger is the evidence against the null hypothesis.

For example, when testing $H_0: \theta = 0.5$ v $H_1: \theta = 0.4$, where $\theta$ is the probability of a coin coming up heads, and 82 heads have been observed in 200 tosses, the p-value of the result is:

$$P(X \le 82) \text{ where } X \sim Bin(200, 0.5) \;\approx\; P\!\left(Z < \frac{82.5 - 100}{\sqrt{50}}\right) = P(Z < -2.475) = 0.0067$$

A result this extreme is therefore very unlikely if $H_0$ is true (probability < 0.01), and there is very strong evidence against $H_0$ and in favour of $H_1$. A good way of expressing the result is: "we have very strong evidence against the hypothesis that the coin is fair (p-value 0.007) and conclude that it is biased against heads".

Testing does not prove that any hypothesis is true or untrue. Failure to detect a departure from $H_0$ means that there is not enough evidence to justify rejecting $H_0$, so $H_0$ is accepted in this sense only, whilst realising that it may not be true.
This attitude to the acceptance of $H_0$ is a feature of the fact that $H_0$ is usually a precise statement, which is almost certainly not exactly true.

Question

A random variable $X$ is believed to follow an $Exp(\lambda)$ distribution. In order to test the null hypothesis $\mu = 20$ against the alternative hypothesis $\mu = 30$, where $\mu = 1/\lambda$, a single value is observed from the distribution. If this value of $X$ is less than $k$, $H_0$ is not rejected, otherwise $H_0$ is rejected.

(i) Calculate the value of $k$ that gives a test of size 5%.
(ii) Determine the probability of a Type II error in this case.

Solution

(i) We want:

$$0.05 = P(X > k) = \int_k^\infty \frac{1}{20} e^{-x/20}\,dx = \Big[-e^{-x/20}\Big]_k^\infty = e^{-k/20}$$

So:

$$k = -20 \ln 0.05 = 59.9$$

(ii) The probability of a Type II error is:

$$\int_0^k \frac{1}{30} e^{-x/30}\,dx = \Big[-e^{-x/30}\Big]_0^k = 1 - e^{-1.997} = 0.864$$

A p-value of less than 5% is considered "significant", so that the null hypothesis is rejected. If an exam question does not state the level of the test, assume that it is 5%.

3 Basic tests - single samples

3.1 Testing the value of a population mean

Situation: random sample, size $n$, from $N(\mu, \sigma^2)$; sample mean $\bar{X}$.

Testing: $H_0: \mu = \mu_0$

(a) $\sigma$ known: test statistic is $\bar{X}$, and $\dfrac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} \sim N(0,1)$ under $H_0$

(b) $\sigma$ unknown: test statistic is $\dfrac{\bar{X} - \mu_0}{S / \sqrt{n}} \sim t_{n-1}$ under $H_0$

For large samples, $N(0,1)$ can be used in place of $t_{n-1}$. Further, the Central Limit Theorem justifies the use of a normal approximation for the distribution of $\bar{X}$ in sampling from any reasonable population, and $S^2$ is a good estimate of $\sigma^2$, so the requirement that we are sampling from a normal distribution is not necessary in either case (a) or (b) when we have a large sample.

Question

The average IQ of a sample of 50 university students was found to be 105.
Carry out a statistical test to conclude whether the average IQ of university students is greater than 100, assuming that IQs are normally distributed. It is known from previous studies that the standard deviation of IQs among students is approximately 20.

Solution

We are testing:

$H_0: \mu = 100$ vs $H_1: \mu > 100$

Under $H_0$, $\dfrac{\bar{X} - 100}{\sigma / \sqrt{n}} \sim N(0,1)$ ($\sigma$ known).

The test statistic is $\dfrac{105 - 100}{20 / \sqrt{50}} = 1.768$.

We need to draw a conclusion and there are two ways of doing this.

Method 1: Calculate the probability of getting a result as extreme as the test statistic (ie the p-value).

If $Z \sim N(0,1)$:

$$P(Z > 1.768) = 1 - 0.96147 = 0.03853$$

We are carrying out a 5% one-tailed test. The probability we have obtained is less than 5%, so we have sufficient evidence to reject $H_0$ at the 5% level. Therefore it is reasonable to conclude that the average IQ of university students is greater than 100.

Method 2: From the Tables, $P(Z > 1.6449) = 0.05$, so 1.6449 is the critical value for a one-tailed 5% test. The test statistic of 1.768 exceeds this critical value, so we reach the same conclusion as we did for Method 1.

Question

Test using a 5% significance level whether the average IQ of university students is greater than 103, based on the sample in the previous question.

Solution

We are testing:

$H_0: \mu = 103$ vs $H_1: \mu > 103$

Under $H_0$:

$$\frac{\bar{X} - 103}{\sigma / \sqrt{n}} \sim N(0,1)$$

The observed value of the test statistic is:

$$\frac{105 - 103}{20 / \sqrt{50}} = 0.707$$

This is less than 1.6449 (the upper 5% point of a $N(0,1)$ distribution) so we have insufficient evidence to reject $H_0$ at the 5% level. Therefore it is reasonable to conclude that the average IQ of university students is not more than 103.

Alternatively, using probability values, we have $P(Z > 0.707) \approx 0.24$. This is greater than 0.05, so we have insufficient evidence to reject $H_0$ at the 5% level.
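Both IQ tests can be checked numerically. The sketch below uses Python's standard library rather than R, and the helper name `z_test_upper` is an illustrative assumption; it computes the test statistic, the one-sided p-value and the 5% critical value for a test of the mean with known standard deviation.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # N(0,1)


def z_test_upper(sample_mean, mu0, sigma, n):
    """Test H0: mu = mu0 against H1: mu > mu0 with sigma known.

    Returns the z statistic and the upper-tail p-value.
    """
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p_value = 1 - STANDARD_NORMAL.cdf(z)
    return z, p_value


# Upper 5% point of N(0,1), the critical value for a one-tailed 5% test.
critical = STANDARD_NORMAL.inv_cdf(0.95)

# Sample of 50 students with mean IQ 105 and known standard deviation 20.
z100, p100 = z_test_upper(105, 100, 20, 50)  # H0: mu = 100
z103, p103 = z_test_upper(105, 103, 20, 50)  # H0: mu = 103

for label, z, p in [("mu0 = 100", z100, p100), ("mu0 = 103", z103, p103)]:
    decision = "reject H0" if z > critical else "do not reject H0"
    print(f"{label}: z = {z:.3f}, p-value = {p:.4f}, {decision} at 5%")
```

The critical-value comparison (Method 2) and the p-value comparison (Method 1) always agree, because $z$ exceeds the upper 5% point exactly when the upper-tail probability is below 0.05.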
Question

The annual rainfall in centimetres at a certain weather station over the last ten years has been as follows:

17.2  28.1  25.3  26.2  30.7  19.2  23.4  27.5  29.5  31.6

Scientists at the weather station wish to test whether the average annual rainfall has increased from its former long-term value of 22 cm. Test this hypothesis at the 5% level, stating any assumptions that you make.

Solution

We are testing:

$H_0: \mu = 22$ vs $H_1: \mu > 22$

Assuming that annual rainfall measurements are independent and normally distributed, then under $H_0$:

$$\frac{\bar{X} - 22}{S / \sqrt{n}} \sim t_{n-1}$$

We have $\bar{x} = 25.87$ and:

$$s^2 = \frac{1}{9}\left(6{,}895.73 - 10 \times 25.87^2\right) = 22.57$$

So the observed value of the test statistic is:

$$\frac{25.87 - 22}{\sqrt{22.57 / 10}} = 2.576$$

Since this is greater than 1.833 (the upper 5% point of the $t_9$ distribution), we have sufficient evidence to reject $H_0$ at the 5% level. Therefore it is reasonable to conclude that the long-term average annual rainfall has increased from its former level.

Alternatively, using probability values, we have $P(t_9 > 2.576) \approx 0.0166$. This is less than 0.05, so we have sufficient evidence to reject $H_0$ at the 5% level.

R can carry out a hypothesis test for the mean with unknown variance at the 5% level using:

t.test(<sample data>, conf=0.95)

For small samples from a non-normal distribution, an empirical distribution of the statistic can be constructed in R using the bootstrap method (see Chapter 8, Section 7), from which we can calculate the critical value(s) and obtain an estimate of the p-value.

3.2 Testing the value of a population variance

Situation: random sample, size $n$, from $N(\mu, \sigma^2)$; sample variance $S^2$.

Testing: $H_0: \sigma^2 = \sigma_0^2$

Test statistic is $\dfrac{(n-1) S^2}{\sigma_0^2} \sim \chi^2_{n-1}$ under $H_0$

For large samples, the test works well even if the population is not normally distributed.
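As a rough numerical sketch of the variance test statistic above, the following Python snippet (standard library only, not R) applies it to the rainfall data from Section 3.1. The function name `variance_test_statistic` is an illustrative assumption, and the hypothesised standard deviation of 4 cm is used here simply as an example value.

```python
from statistics import mean, variance


def variance_test_statistic(data, sigma0):
    """Value of (n - 1) * S^2 / sigma0^2, which is chi-square with n - 1
    degrees of freedom under H0: sigma = sigma0 (normal data assumed)."""
    n = len(data)
    return (n - 1) * variance(data) / sigma0 ** 2


# Annual rainfall (cm) over the last ten years, from the question in Section 3.1.
rainfall = [17.2, 28.1, 25.3, 26.2, 30.7, 19.2, 23.4, 27.5, 29.5, 31.6]

x_bar = mean(rainfall)
s2 = variance(rainfall)  # sample variance, divisor n - 1
stat = variance_test_statistic(rainfall, sigma0=4.0)

print(f"Sample mean: {x_bar:.2f}")
print(f"Sample variance: {s2:.2f}")
print(f"Test statistic on {len(rainfall) - 1} degrees of freedom: {stat:.2f}")
```

The statistic would then be compared with the relevant percentage points of the $\chi^2_{n-1}$ distribution from the Tables; Python's standard library has no chi-square distribution function, so that step is left out of the sketch.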
Question

Carry out a statistical test to assess whether the standard deviation of the heights of 10-year-old children is equal to 3cm, based on the random sample of 5 heights in cm given below. Assume that heights are normally distributed.

124, 122, 130, 125, 132

Solution

We are testing:

$H_0: \sigma = 3$ vs $H_1: \sigma \ne 3$

Under $H_0$:

$$\frac{4 S^2}{3^2} \sim \chi^2_4$$

We have:

$$s^2 = \frac{1}{4}\left(80{,}209 - 5 \times 126.6^2\right) = 17.8$$

So the observed value of the test statistic is:

$$\frac{4 \times 17.8}{3^2} = 7.91$$

Our statistic of 7.91 lies between 0.4844 and 11.14 (the lower and upper 2½% points of the $\chi^2_4$ distribution). So we have insufficient evidence to reject $H_0$ at the 5% level. Therefore it is reasonable to conclude that the standard deviation of the heights of 10-year-old children is 3cm.

Alternatively, we have $P(\chi^2_4 > 7.91) \approx 0.0952$. Since this test is two-sided, the probability of obtaining a value at least as extreme as that actually obtained is $2 \times 0.0952 = 0.190$, which is greater than 0.05. So we have insufficient evidence to reject $H_0$ at the 5% level.

Question

The annual rainfall in centimetres at a certain weather station over the last ten years has been as follows:

17.2  28.1  25.3  26.2  30.7  19.2  23.4  27.5  29.5  31.6

Assuming these data values are taken from a normal distribution, test at the 5% level whether the standard deviation of the annual rainfall at the weather station is equal to 4 cm.

Solution

We are testing:

$H_0: \sigma = 4$ vs $H_1: \sigma \ne 4$

The test is two-sided. Assuming independence and normality, then under $H_0$:

$$\frac{9 S^2}{4^2} \sim \chi^2_9$$

Using the sample variance calculated earlier, the observed value of the test statistic is:

$$\frac{9 \times 22.57}{16} = 12.69$$

This is between the lower and upper 2½% points of $\chi^2_9$ (2.700 and 19.02), so we have insufficient evidence to reject $H_0$ at the 5% level. Therefore it is reasonable to conclude that the standard deviation of the rainfall is 4 cm.

Alternatively, using probability values, we have $P(\chi^2_9 > 12.69) = 0.1775$. Since this test is two-sided, the probability of obtaining a value at least as extreme as that actually obtained is $2 \times 0.1775 = 0.355$. This is greater than 0.05, so we have insufficient evidence to reject $H_0$ at the 5% level.

There is no built-in function to carry out a hypothesis test for the variance in R. We can use R to calculate the value of the statistic from scratch or use a bootstrap method if the assumptions are not met.

For example, if we are unsure whether the sample comes from a normal distribution, a bootstrap method would be more appropriate here.

3.3 Testing the value of a population proportion

Situation: $n$ binomial trials with $P(\text{success}) = p$; we observe $x$ successes.

Testing: $H_0: p = p_0$.

Test statistic is $X \sim Bin(n, p_0)$ under $H_0$.

For large $n$, use the normal approximation to the binomial (with continuity correction), ie use:

$$\frac{\dfrac{X \pm \frac{1}{2}}{n} - p_0}{\sqrt{\dfrac{p_0 (1 - p_0)}{n}}} \sim N(0,1) \text{ approximately}$$

or:

$$\frac{X \pm \frac{1}{2} - n p_0}{\sqrt{n p_0 (1 - p_0)}} \sim N(0,1) \text{ approximately}$$

When carrying out tests of this type we can work out whether we need to add or subtract the $\frac{1}{2}$ in the continuity correction if we remember that we always adjust the value of $X$ towards the mean of the distribution under $H_0$. For large values of $n$, this will make little difference unless the test statistic is close to the critical value.

Question

In a one-year mortality investigation, 45 of the 250 ninety-year-olds present at the start of the investigation died before the end of the year. Assuming that the number of deaths has a $Bin(250, q)$ distribution, test whether this result is consistent with a mortality rate of $q = 0.2$ for this age.

Solution

We are testing:

$H_0: q = 0.2$ vs $H_1: q \ne 0.2$

Under $H_0$:

$$\frac{X/n - 0.2}{\sqrt{\dfrac{0.2 \times 0.8}{n}}} \sim N(0,1) \text{ approximately}$$

Using the observed values, $n = 250$ and $x = 45$, the test statistic with continuity correction is:

$$\frac{45.5/250 - 0.2}{\sqrt{\dfrac{0.2 \times 0.8}{250}}} = -0.712$$

Since the mean is $np = 250 \times 0.2 = 50$, the continuity correction involves adjusting 45 towards the mean. So we have to add 0.5.

Our statistic of $-0.712$ lies between $\pm 1.960$ (the lower and upper 2½% points of the $N(0,1)$ distribution). So we have insufficient evidence to reject $H_0$ at the 5% level. Therefore it is reasonable to conclude that the true mortality rate for this age is 0.2.

Alternatively, using probability values, we have $P(Z < -0.712) = 0.238$. Since this test is two-sided, the probability of obtaining a value at least as extreme as the one actually obtained is $2 \times 0.238 = 0.48$. This is greater than 0.05, so we have insufficient evidence to reject $H_0$ at the 5% level.

Question

A new gene has been identified that makes carriers of it particularly susceptible to a particular degenerative disease. In a random sample of 250 adult males born in the UK, 8 were found to be carriers of the gene. Test whether the proportion of adult males born in the UK carrying the gene is less than 10%.

Solution

We are testing:

$H_0: p = 0.1$ vs $H_1: p < 0.1$

Under $H_0$:

$$\frac{X/n - 0.1}{\sqrt{\dfrac{0.1 \times 0.9}{n}}} \sim N(0,1) \text{ approximately}$$

The observed value of the test statistic, with continuity correction adjusted towards the mean, is:

$$\frac{8.5/250 - 0.1}{\sqrt{\dfrac{0.1 \times 0.9}{250}}} = -3.479$$

We are carrying out a one-sided test. The value of the test statistic is less than $-1.6449$ (the lower 5% point of the $N(0,1)$ distribution) so we have sufficient evidence to reject $H_0$ at the 5% level. Therefore it is reasonable to conclude that the proportion of male carriers in the population is less than 10%.

Alternatively, using probability values, we have $P(Z < -3.479) \approx 0.00025$. This is less than 0.05, so we have sufficient evidence to reject $H_0$ at the 5% level. In fact, we have sufficient evidence to reject $H_0$ at even the 0.05% level.

R can carry out an exact hypothesis test for $p$ at the 5% level using:

binom.test(x,n, conf=0.95)

3.4 Testing the value of the mean of a Poisson distribution

Situation: random sample, size $n$, from $Poi(\lambda)$ distribution.

Testing: $H_0: \lambda = \lambda_0$

Test statistic is sample sum $\sum X_i \sim Poi(n \lambda_0)$ under $H_0$.

In the case where $n$ is small and $n \lambda_0$ is of moderate size, probabilities can be evaluated directly (or found from tables, if available). For large samples (or indeed whenever the Poisson mean is large) a normal approximation can be used for the distribution of the sample sum or sample mean. Recall that $\sum X_i \sim Poi(n\lambda)$, which is approximately $N(n\lambda, n\lambda)$.

Test statistic is $\bar{X}$, and $\dfrac{\bar{X} - \lambda_0}{\sqrt{\lambda_0 / n}} \sim N(0,1)$ under $H_0$

or we can use $\sum X_i$, and $\dfrac{\sum X_i - n \lambda_0}{\sqrt{n \lambda_0}} \sim N(0,1)$ under $H_0$.

Using the second version it is easier to incorporate a continuity correction.

The first version has continuity correction $0.5/n$, whereas the second version has continuity correction 0.5.

Question

In a one-year investigation of claim frequencies for a particular category of motorists, the total number of claims made under 5,000 policies was 800. Assuming that the number of claims made by individual motorists has a $Poi(\lambda)$ distribution, test at the 1% level whether the unknown average claim frequency $\lambda$ is less than 0.175.

Solution

We are testing:

$H_0: \lambda = 0.175$ vs $H_1: \lambda < 0.175$

Under $H_0$:

$$\frac{\bar{X} - 0.175}{\sqrt{0.175 / n}} \sim N(0,1) \text{ approximately}$$

Using the observed values, $n = 5{,}000$ and $\bar{x} = 0.16$, the test statistic, with continuity correction, is:

$$\frac{800.5 / 5{,}000 - 0.175}{\sqrt{0.175 / 5{,}000}} = -2.519$$

This is less than $-2.3263$, the lower 1% point of the $N(0,1)$ distribution. So we have sufficient evidence at the 1% level to reject $H_0$. Therefore it is reasonable to conclude that the true claim frequency is less than 0.175.
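The arithmetic in this solution can be checked with a short calculation. The sketch below is in Python (standard library only) rather than R; the function name `poisson_mean_z` and its `correction` argument are illustrative assumptions, with the correction applied to the total number of claims and chosen to move it towards $n\lambda_0$.

```python
from math import sqrt
from statistics import NormalDist


def poisson_mean_z(total, n, lam0, correction):
    """Normal-approximation z statistic for H0: lambda = lam0.

    `total` is the observed sample sum; `correction` is +0.5 or -0.5,
    chosen so that the total is adjusted towards n * lam0.
    """
    return ((total + correction) / n - lam0) / sqrt(lam0 / n)


# 800 claims on 5,000 policies, testing H0: lambda = 0.175 vs H1: lambda < 0.175.
# The expected total under H0 is 875, so 800 is adjusted upwards by 0.5.
z = poisson_mean_z(total=800, n=5000, lam0=0.175, correction=+0.5)

lower_1pc_point = NormalDist().inv_cdf(0.01)  # lower 1% point of N(0,1)
p_value = NormalDist().cdf(z)                 # lower-tail probability

print(f"z = {z:.3f}, lower 1% point = {lower_1pc_point:.4f}, p-value = {p_value:.4f}")
print("reject H0 at 1%" if z < lower_1pc_point else "do not reject H0 at 1%")
```

The same function gives the sample-sum version of the statistic, since dividing numerator and denominator by $n$ leaves the value of $z$ unchanged.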
Alternatively, using probability values, we have $P(Z < -2.519) = 0.0059$. Since this is less than 0.01, we have sufficient evidence to reject $H_0$ at the 1% level.

Question

A random sample of 500 policies of a particular kind revealed a total of 116 claims during the last year. Test the null hypothesis $H_0: \lambda = 0.18$ against the alternative $H_1: \lambda > 0.18$, where $\lambda$ is the annual claim frequency, ie the average number of claims per policy.

Solution

We are testing:

$H_0: \lambda = 0.18$ vs $H_1: \lambda > 0.18$

Assuming that the underlying claim frequency has a Poisson distribution, then under $H_0$:

$$\frac{\bar{X} - 0.18}{\sqrt{0.18 / n}} \approx N(0,1)$$

The observed value of the test statistic, with continuity correction, is:

$$\frac{\frac{115.5}{500} - 0.18}{\sqrt{\frac{0.18}{500}}} = 2.688$$

We are carrying out a one-sided test. The value of the test statistic is greater than 1.6449 (the upper 5% point of the $N(0,1)$ distribution) so we have sufficient evidence to reject $H_0$ at the 5% level. Therefore it is reasonable to conclude that the true claim frequency is more than 0.18.

Alternatively, using probability values, we have $P(Z > 2.688) = 0.0036$, ie 0.36%. This is less than 0.05, so we have sufficient evidence to reject $H_0$ at the 5% level. In fact, we have sufficient evidence to reject $H_0$ even at the 0.5% level.

R can carry out an exact hypothesis test for $\lambda$ at the 5% level using:

    poisson.test(x,n, conf=0.95)

4 Basic tests - two independent samples

4.1 Testing the value of the difference between two population means

Situation: independent random samples, sizes $n_1$ and $n_2$ from $N(\mu_1, \sigma_1^2)$, $N(\mu_2, \sigma_2^2)$ respectively.

Testing: $H_0: \mu_1 - \mu_2 = \delta$

(a) $\sigma_1^2, \sigma_2^2$ known

test statistic:

$$z = \frac{\bar{x}_1 - \bar{x}_2 - \delta}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$

There is no built-in function for calculating the above hypothesis test in R. We can use R to calculate the results of the statistic from scratch or use a bootstrap method if the assumptions are not met.
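Calculating the known-variance statistic from scratch is straightforward. Below is a minimal Python sketch of the formula in (a). The function name `two_sample_z` and all the input figures (sample means of 52.0 and 50.0, known variances of 16 and 9, sample sizes of 40 and 30) are hypothetical values chosen purely for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(xbar1, xbar2, var1, var2, n1, n2, delta=0.0):
    """z statistic for H0: mu1 - mu2 = delta when both variances are known."""
    standard_error = sqrt(var1 / n1 + var2 / n2)
    return (xbar1 - xbar2 - delta) / standard_error


# Hypothetical data, not taken from the course notes.
z = two_sample_z(xbar1=52.0, xbar2=50.0, var1=16.0, var2=9.0, n1=40, n2=30)

# Two-sided p-value from the standard normal distribution.
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(z, 3), round(p_two_sided, 4))
```

For a one-sided alternative, only the relevant tail probability would be used instead of doubling.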
(b) $\sigma_1^2, \sigma_2^2$ unknown - much the more usual situation

Large samples: use $S_i^2$ to estimate $\sigma_i^2$. Further, the Central Limit Theorem justifies the use of a normal approximation for the distribution of the test statistic in sampling from any reasonable populations, so the requirement that we are sampling from normal distributions is not necessary when we have large samples.

We will now use a $t$ distribution.

Small samples: under the assumption $\sigma_1^2 = \sigma_2^2$ ($= \sigma^2$, say), this common variance is estimated by $S_p^2$, and the test statistic is

$$t = \frac{\bar{x}_1 - \bar{x}_2 - \delta}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

which is distributed as $t$ with $n_1 + n_2 - 2$ degrees of freedom under $H_0$.

Remember that $s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$.

R can carry out a hypothesis test for the difference between the means with unknown variance using the function t.test. We set the argument var.equal = TRUE for small samples.

Again, we could use the bootstrap method to construct an empirical distribution of the statistic if the assumptions are not met.

Question

The average blood pressure for a control group C of 10 patients was 77.0 mmHg. The average blood pressure in a similar group T of 10 patients on a special diet was 75.0 mmHg. Carry out a statistical test to assess whether patients on the special diet have lower blood pressure.

You are given that $\sum_{i=1}^{10} c_i^2 = 59{,}420$ and $\sum_{i=1}^{10} t_i^2 = 56{,}390$.

Solution

We are testing:

$H_0: \mu_C = \mu_T$ vs $H_1: \mu_C > \mu_T$

If we assume that blood pressures are normally distributed and that the variance of the underlying distribution for each group is the same, then under $H_0$:

$$\frac{(\bar{C} - \bar{T}) - 0}{S_P\sqrt{\frac{1}{m} + \frac{1}{n}}} \sim t_{m+n-2}$$

We have:

$$s_P^2 = \frac{(m-1)s_C^2 + (n-1)s_T^2}{m+n-2} = \frac{1}{m+n-2}\left[\left(\sum_{i=1}^{m} c_i^2 - m\bar{c}^2\right) + \left(\sum_{i=1}^{n} t_i^2 - n\bar{t}^2\right)\right]$$

$$= \frac{1}{18}\left[\left(59{,}420 - 10 \times 77.0^2\right) + \left(56{,}390 - 10 \times 75.0^2\right)\right] = 15.00 = 3.873^2$$

As mentioned previously, the number of degrees of freedom to use with a $t$ test is the same as the denominator used when calculating the estimate of the variance, ie 18 in this case.

Using the observed values of $m = 10$, $n = 10$, $\bar{t} = 75.0$, $\bar{c} = 77.0$, and $s_P^2 = 3.873^2$, the value of the test statistic is:

$$\frac{77.0 - 75.0}{3.873\sqrt{\frac{1}{10} + \frac{1}{10}}} = 1.15$$

This is less than 1.734, the upper 5% point of the $t_{18}$ distribution. So we have insufficient evidence to reject $H_0$ at the 5% level. Therefore it is reasonable to conclude that patients on the special diet have the same blood pressure as patients on the normal diet.

Alternatively, using probability values, we have $P(t_{18} > 1.15) \approx 0.134$. Since this is greater than 0.05, we have insufficient evidence to reject $H_0$ at the 5% level.

We will not always be testing for equality between the two sample means.

Question

A car manufacturer runs tests to investigate the fuel consumption of cars using a newly developed fuel additive. Sixteen cars of the same make and age are used, eight with the new additive and eight as controls. The results, in miles per gallon over a test track under regulated conditions, are as follows:

Control    27.0  32.2  30.4  28.0  26.5  25.5  29.6  27.2
Additive   31.4  29.9  33.2  34.4  32.0  28.7  26.1  30.3

If $\mu_C$ is the mean number of miles per gallon achieved by cars in the control group, and $\mu_A$ is the mean number of miles per gallon achieved by cars in the group with fuel additive, test:

(i) $H_0: \mu_A - \mu_C = 0$ vs $H_1: \mu_A - \mu_C > 0$

(ii) $H_0: \mu_A - \mu_C = 6$ vs $H_1: \mu_A - \mu_C \neq 6$

Solution

Using $C_i$ for the number of miles per gallon of the cars in the control group and $A_i$ for the number of miles per gallon of the cars with additive, we have:

$\sum c_i = 226.4$, $\sum c_i^2 = 6{,}442.5$, $\sum a_i = 246$, $\sum a_i^2 = 7{,}612.56$

The estimate of the pooled sample variance is:

$$s_P^2 = \frac{1}{m+n-2}\left[\left(\sum c_i^2 - n\bar{c}^2\right) + \left(\sum a_i^2 - m\bar{a}^2\right)\right] = \frac{1}{14}\left[\left(6{,}442.5 - 8 \times 28.3^2\right) + \left(7{,}612.56 - 8 \times 30.75^2\right)\right] = 5.96$$

(i) We are testing:

$H_0: \mu_A - \mu_C = 0$ vs $H_1: \mu_A - \mu_C > 0$

Assuming that the underlying distributions are normal, then under $H_0$:

$$\frac{(\bar{A} - \bar{C}) - 0}{S_P\sqrt{\frac{1}{m} + \frac{1}{n}}} \sim t_{m+n-2}$$

The observed value of the test statistic is:

$$\frac{30.75 - 28.3}{\sqrt{5.96}\sqrt{\frac{1}{8} + \frac{1}{8}}} = 2.007$$

This is greater than 1.761 (the upper 5% point of the $t_{14}$ distribution) so we have sufficient evidence to reject $H_0$ at the 5% level. Therefore it is reasonable to conclude that the mean performance is greater with the additive than without.

Alternatively, using probability values, we have $P(t_{14} > 2.007) = 0.0340$. This is less than 0.05, so we have sufficient evidence to reject $H_0$ at the 5% level.

(ii) We are now testing:

$H_0: \mu_A - \mu_C = 6$ vs $H_1: \mu_A - \mu_C \neq 6$

Making the same assumptions as before, under $H_0$:

$$\frac{(\bar{A} - \bar{C}) - 6}{S_P\sqrt{\frac{1}{m} + \frac{1}{n}}} \sim t_{m+n-2}$$

The observed value of the test statistic is now:

$$\frac{(30.75 - 28.3) - 6}{\sqrt{5.96}\sqrt{\frac{1}{8} + \frac{1}{8}}} = -2.908$$

This is a two-sided test and the statistic is less than $-2.145$ (the lower 2.5% point of the $t_{14}$ distribution) so we have sufficient evidence to reject $H_0$ at the 5% level. Therefore it is reasonable to conclude that the difference in the means is not equal to 6.

Alternatively, using probability values, we have $P(t_{14} < -2.908) = 0.00598$. Since this test is two-sided, the probability of obtaining a value at least as extreme as the one actually obtained is $2 \times 0.00598 = 0.0120$. This is less than 0.05, so we have sufficient evidence to reject $H_0$ at the 5% level. In fact, we have sufficient evidence to reject $H_0$ even at the 2.5% level.

4.2 Testing the value of the ratio of two population variances

Situation: independent random samples, sizes $n_1$ and $n_2$ from $N(\mu_1, \sigma_1^2)$, $N(\mu_2, \sigma_2^2)$ respectively. Sample variances $S_1^2$ and $S_2^2$.

Testing: $H_0: \sigma_1^2 = \sigma_2^2$ vs $H_1: \sigma_1^2 \neq \sigma_2^2$

This test is a formal prerequisite for the two-sample $t$ test, for which the assumption $\sigma_1^2 = \sigma_2^2$ is required. In practice, however, a simple plot of the data is often sufficient to justify the assumption; only if the population variances are very different in size is there any problem with the $t$ test.

Test statistic: $S_1^2 / S_2^2 \sim F_{n_1-1,\,n_2-1}$ under $H_0$

We saw in Chapter 9 that $\dfrac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{n_1-1,\,n_2-1}$, so it follows that if we are testing the hypothesis $\sigma_1^2 = \sigma_2^2$, we can use the test statistic $S_1^2/S_2^2$ and compare it with the critical points in the appropriate $F$ table.

Question

The average blood pressure for a control group C of 10 patients was 77.0 mmHg. The average blood pressure in a similar group T of 10 patients on a special diet was 75.0 mmHg. Test whether the variances in the two populations can be considered to be equal.

You are given that $\sum_{i=1}^{10} c_i^2 = 59{,}420$ and $\sum_{i=1}^{10} t_i^2 = 56{,}390$.

Solution

We are testing:

$H_0: \sigma_T^2 = \sigma_C^2$ vs $H_1: \sigma_T^2 \neq \sigma_C^2$

Assuming that blood pressures are normally distributed, then under $H_0$, both populations have the same variance, so that:

$$\frac{S_T^2/\sigma_T^2}{S_C^2/\sigma_C^2} = \frac{S_T^2}{S_C^2} \sim F_{m-1,\,n-1}$$

Using the given data values, we have:

$$s_T^2 = \frac{1}{9}\left(56{,}390 - 10 \times 75^2\right) = 15.56$$

$$s_C^2 = \frac{1}{9}\left(59{,}420 - 10 \times 77^2\right) = 14.44$$

The observed value of the test statistic is:

$$\frac{15.56}{14.44} = 1.077$$

This is a two-sided test and our statistic is between $\frac{1}{4.026} = 0.2484$ and 4.026 (the lower and upper 2½% values from the $F_{9,9}$ distribution). So there is insufficient evidence to reject $H_0$ at the 5% level. Therefore it is reasonable to conclude that there is no difference in the variances of the two populations.

Alternatively, we can see from page 171 of the Tables that $P(F_{9,9} > 1.077)$ is greater than 0.1. Since the test is two-sided, the p-value is greater than $2 \times 0.1 = 0.2$. This is greater than 0.05, so we have insufficient evidence to reject $H_0$ at the 5% level.

This means that we were justified in carrying out the two-sample $t$ test previously, which assumes equal variances.

Had we used $\dfrac{s_C^2}{s_T^2} = \dfrac{14.44}{15.56} = 0.9280$, we would have reached the same conclusion.

R can carry out a hypothesis test for the ratio of the variances using var.test or we could use a bootstrap method if the assumptions are not met.

4.3 Testing the value of the difference between two population proportions

Both one-sided and two-sided tests can easily be performed on the difference between two binomial probabilities at least for large samples.

Situation:

$n_1$ (large) trials with $P(\text{success}) = p_1$; observe $x_1$ successes.

$n_2$ (large) trials with $P(\text{success}) = p_2$; observe $x_2$ successes.

Testing: $H_0: p_1 = p_2$.

Test statistic is

$$\frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{\hat{p}(1-\hat{p})}{n_1} + \frac{\hat{p}(1-\hat{p})}{n_2}}} \sim N(0,1) \text{ under } H_0$$

where $\hat{p}_1, \hat{p}_2$ are the maximum likelihood estimates (MLEs) of $p_1$ and $p_2$ respectively (the sample proportions $\frac{X_1}{n_1}, \frac{X_2}{n_2}$), and $\hat{p}$ is the MLE of the common $p$ under the null hypothesis, which is the overall sample proportion, namely $\dfrac{X_1 + X_2}{n_1 + n_2}$.

In some textbooks an alternative test statistic is used, namely:

$$\frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}} \approx N(0,1)$$

The denominator in the Core Reading expression is found by pooling the sample proportions, whereas in the alternative version, the values of $\hat{p}_1$ and $\hat{p}_2$ are used separately. Both approximations are valid. In the exam we would advise you to use the version shown in the Core Reading.

Question

In a one-year mortality investigation, 25 of the 100 ninety-year-old males and 20 of the 150 ninety-year-old females present at the start of the investigation died before the end of the year. Assuming that the numbers of deaths follow binomial distributions, test whether there is a difference between male and female mortality rates at this age.

Solution

We are testing:

$H_0: q_M = q_F$ vs $H_1: q_M \neq q_F$

If $X_M$ and $X_F$ denote the number of deaths among the males and females, $m$ and $f$ are the sample sizes, and $\hat{q}$ the pooled sample proportion, then, under $H_0$:

$$\frac{\left(\frac{X_M}{m} - \frac{X_F}{f}\right) - 0}{\sqrt{\frac{\hat{q}(1-\hat{q})}{m} + \frac{\hat{q}(1-\hat{q})}{f}}} \approx N(0,1)$$

Using the observed values of $m = 100$, $f = 150$, $x_M = 25$, $x_F = 20$, and $\hat{q} = \frac{45}{250}$, the value of the test statistic is:

$$\frac{0.25 - 0.1333}{\sqrt{\frac{0.18 \times 0.82}{100} + \frac{0.18 \times 0.82}{150}}} = 2.35$$

This is greater than 1.960 (the upper 2½% point of the $N(0,1)$ distribution). So we have sufficient evidence to reject $H_0$ at the 5% level. Therefore it is reasonable to conclude that male and female mortality rates are different at this age.

Alternatively, using probability values, we have $P(Z > 2.35) = 0.0093$. Since this test is two-sided, the probability of obtaining a value at least as extreme as the one actually obtained is $2 \times 0.0093 = 0.019$. As $0.019 < 0.05$, we have sufficient evidence to reject $H_0$ at the 5% level.

This test can also be one-tailed.

Question

A sample of 100 claims on household policies made during the year just ended showed that 62 were due to burglary. A sample of 200 claims made during the previous year had 115 due to burglary. Test the hypothesis that the underlying proportion of claims that are due to burglary is higher in the second year than in the first.

Solution

We are testing:

$H_0: p_1 = p_2$ vs $H_1: p_2 > p_1$ (ie $p_1 - p_2 < 0$)

where $p_1$ and $p_2$ are the proportions of claims due to burglaries in the previous and current years respectively.

If $N_1$ and $N_2$ denote the numbers of claims due to burglaries in each year, then, under $H_0$:

$$\frac{\left(\frac{N_1}{200} - \frac{N_2}{100}\right) - 0}{\sqrt{\frac{\hat{p}(1-\hat{p})}{200} + \frac{\hat{p}(1-\hat{p})}{100}}} \approx N(0,1)$$

The observed value of the test statistic is:

$$\frac{\left(\frac{115}{200} - \frac{62}{100}\right) - 0}{\sqrt{\frac{0.59(1-0.59)}{200} + \frac{0.59(1-0.59)}{100}}} = -0.747$$

We are carrying out a one-sided test and the value of our statistic is greater than $-1.6449$ (the lower 5% point of the $N(0,1)$ distribution). So we have insufficient evidence to reject $H_0$ at the 5% level.
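The pooled-proportion statistic for the burglary data can be reproduced with a short calculation. The following Python sketch is an illustration only; the function name `pooled_two_proportion_z` is an assumption introduced here, and it implements the Core Reading (pooled) version of the statistic rather than the alternative unpooled version.

```python
from math import sqrt
from statistics import NormalDist


def pooled_two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2 using the pooled estimate of the common proportion."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # MLE of the common p under H0
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / standard_error


# Burglary question: sample 1 is the previous year, sample 2 is the year just ended.
z = pooled_two_proportion_z(x1=115, n1=200, x2=62, n2=100)
p_lower = NormalDist().cdf(z)  # lower tail, matching H1: p1 - p2 < 0

print(round(z, 3), round(p_lower, 3))
```

Calling the same function with the mortality figures (25 of 100 and 20 of 150) gives the statistic for the earlier two-sided question.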
Therefore it is reasonable to conclude that the proportion of claims due to burglaries in the year just ended is not greater than the proportion in the previous year.

Alternatively, using probability values, we have $P(Z < -0.747) = 0.228$. Since this is greater than 0.05, we have insufficient evidence to reject $H_0$ at the 5% level.

R can carry out a hypothesis test for the difference in proportions using prop.test with the argument correct=FALSE.

4.4 Testing the value of the difference between two Poisson means

Situation: independent random samples, sizes $n_1$ and $n_2$, from $\text{Poi}(\lambda_1)$ and $\text{Poi}(\lambda_2)$ distributions.

Considering the case in which normal approximations can be used, which is so whenever the sample sizes are large and/or the parameter values are large:

Testing: $H_0: \lambda_1 = \lambda_2$.

Test statistic is

$$\frac{\hat{\lambda}_1 - \hat{\lambda}_2}{\sqrt{\frac{\hat{\lambda}}{n_1} + \frac{\hat{\lambda}}{n_2}}} \approx N(0,1) \text{ under } H_0$$

where $\hat{\lambda}_1, \hat{\lambda}_2$ are the MLEs (the sample means $\bar{X}_1, \bar{X}_2$, respectively) and $\hat{\lambda}$ is the MLE of the common $\lambda$ under the null hypothesis, which is the overall sample mean.

Again, in some textbooks you may see an alternative test statistic, namely:

$$\frac{\hat{\lambda}_1 - \hat{\lambda}_2}{\sqrt{\frac{\hat{\lambda}_1}{n_1} + \frac{\hat{\lambda}_2}{n_2}}} \approx N(0,1).$$

Similarly to the last section, the Core Reading version has a pooled value for the parameter, whereas the alternative version doesn't. Both are valid approximations.

Question

In a one-year investigation of claim frequencies for a particular category of motorists, there were 150 claims from the 500 policyholders aged under 25 and 650 claims from the 4,500 remaining policyholders. Assuming that the number of claims made by individual motorists in each category has a Poisson distribution, test at the 1% level whether the claim frequency is the same for drivers under age 25 and over age 25.

Solution

We are testing:

$H_0: \lambda_Y = \lambda_O$ vs $H_1: \lambda_Y \neq \lambda_O$

where we are using $Y$ to represent young and $O$ to represent old.

Under $H_0$:

$$\frac{(\bar{Y} - \bar{O}) - 0}{\sqrt{\frac{\hat{\lambda}}{m} + \frac{\hat{\lambda}}{n}}} \approx N(0,1)$$

where $m$ and $n$ are the sample sizes.

The observed value of the test statistic is:

$$\frac{0.300 - 0.144}{\sqrt{\frac{0.16}{500} + \frac{0.16}{4{,}500}}} = 8.25$$

We are carrying out a two-sided test and our statistic is much greater than $+2.5758$ (the upper ½% point of the $N(0,1)$ distribution). So we easily have sufficient evidence to reject $H_0$ at the 1% level. Therefore it is reasonable to conclude that the claim frequencies are different for younger and older drivers.

Alternatively, using probability values, we have $P(Z > 8.25) \ll 0.0005\%$. Doubling this (as this test is two-sided) gives a p-value that is still less than 0.001%. So we have sufficient evidence to reject $H_0$, even at the 0.001% level.

In fact, although the hypotheses weren't posed in this way in the question, we can conclude that the claim frequency is higher for the younger drivers.

There is no built-in function for calculating the above statistic in R. We can use R to calculate the result from scratch or use a bootstrap method. However, R can carry out a hypothesis test for the ratio of the two Poisson parameters using poisson.test.

5 Basic test - paired data

In testing for a difference between two population means, the use of independent samples can have a major drawback. Even if a real difference does exist, the variability among the responses within each sample can be large enough to mask it.

The random variation within the samples will mask the real difference between the populations from which they come.

One way to control this variability external to the issue in question is to use a pair of responses from each subject, and then work with the differences within the pairs. The aim is to remove as far as possible the subject-to-subject variation from the analysis, and thus to home in on any real difference between the populations.
Assumption: differences constitute a random sample from a normal distribution.

Testing: $H_0: \mu_D \,(= \mu_1 - \mu_2) = \delta$

Test statistic is $\dfrac{\bar{D} - \delta}{S_D/\sqrt{n}} \sim t_{n-1}$ under $H_0$.

We can use $N(0,1)$ for $t$, and do not require the normal assumption, if $n$ is large.

Question

The average blood pressure $B$ for a group of 10 patients was 77.0 mmHg. The average blood pressure $A$ after they were put on a special diet was 75.0 mmHg. Carry out a statistical test to assess whether the special diet reduces blood pressure.

You are given that $\sum_{i=1}^{10} (b_i - a_i)^2 = 68.0$.

Solution

We are testing:

$H_0: \mu_A = \mu_B$ vs $H_1: \mu_A < \mu_B$

where $A$ is after and $B$ is before.

We can calculate the difference in blood pressure within each pair, ie $D_i = A_i - B_i$. If we assume that blood pressures are normally distributed, then under $H_0$, the $D_i$'s also have a normal distribution. So we can apply a one-sample $t$ test to the $D_i$'s, based on the sample variance $s_D^2$:

$$\frac{\bar{D} - (\mu_A - \mu_B)}{S_D/\sqrt{n}} \sim t_{n-1}$$

For our samples:

$$\bar{d} = \bar{a} - \bar{b} = 75.0 - 77.0 = -2.0$$

$$s_D^2 = \frac{1}{n-1}\sum_{i=1}^{n} (d_i - \bar{d})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} d_i^2 - n\bar{d}^2\right] = \frac{1}{9}\left[68.0 - 10(-2.0)^2\right] = 3.111 = 1.764^2$$

So, the observed value of the test statistic is:

$$\frac{75.0 - 77.0}{1.764/\sqrt{10}} = -3.59$$

This is less than $-1.833$, the lower 5% point of the $t_9$ distribution. So we have sufficient evidence to reject $H_0$ at the 5% level. Therefore it is reasonable to conclude that the special diet does reduce blood pressure.

Alternatively, using probability values, we have $P(t_9 < -3.59) \approx 0.0037$, which is less than 0.05. So we have sufficient evidence to reject $H_0$ at the 5% level. In fact, we have sufficient evidence to reject it at even the 0.5% level.

When we performed the two-sample $t$ test earlier, we were unable to reach this conclusion because the reduction was masked by other factors.

Sometimes care is needed to carry out the test the right way round.
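The paired blood-pressure calculation above works entirely from summary figures, and the same steps can be sketched in Python. This is an illustration only: the function name `paired_t_from_summary` is an assumption, and the critical value $-1.833$ is simply the $t_9$ figure quoted in the solution rather than something computed by the code.

```python
from math import sqrt


def paired_t_from_summary(n, mean_diff, sum_sq_diff, delta=0.0):
    """Paired t statistic from the mean and sum of squares of the differences.

    Returns the statistic and its degrees of freedom (n - 1).
    """
    # Sample variance of the differences: (sum d_i^2 - n * dbar^2) / (n - 1).
    var_d = (sum_sq_diff - n * mean_diff ** 2) / (n - 1)
    t_stat = (mean_diff - delta) / sqrt(var_d / n)
    return t_stat, n - 1


# Differences are after minus before, so the mean difference is 75.0 - 77.0.
t_stat, df = paired_t_from_summary(n=10, mean_diff=75.0 - 77.0, sum_sq_diff=68.0)

# Lower 5% point of the t distribution with 9 degrees of freedom, as quoted in the text.
reject_h0 = t_stat < -1.833

print(round(t_stat, 2), df, reject_h0)
```

Note that $\sum (b_i - a_i)^2$ and $\sum (a_i - b_i)^2$ are equal, so the given sum of squares can be used whichever way round the differences are defined.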
Question

In order to increase the efficiency with which employees in a certain organisation can carry out a task, 5 employees are sent on a training course. The time in seconds to carry out the task both before and after the training course is given below for the 5 employees:

Employee    A    B    C    D    E
Before     42   51   37   43   45
After      38   37   32   40   48

Test whether the training course has had the desired effect.

Solution

We are testing:

$H_0: \mu_A = \mu_B$ vs $H_1: \mu_A < \mu_B$ (ie $\mu_A - \mu_B < 0$)

where $A$ is After and $B$ is Before.

Taking the differences $d = b - a$ (so that a positive value of $d$ represents an improvement in performance), we have:

4   14   5   3   -3

Applying a one-sample $t$ test to the $D$ values (and assuming that the underlying distributions are normal):

$$\frac{\bar{D} - (\mu_B - \mu_A)}{S_D/\sqrt{n}} \sim t_{n-1}$$

For the sample values:

$$\bar{d} = \frac{23}{5} = 4.6 \quad \text{and} \quad s_D^2 = \frac{1}{4}\sum (d_i - \bar{d})^2 = \frac{1}{4}\left(255 - 5 \times 4.6^2\right) = 6.107^2$$

So the observed value of the test statistic is:

$$\frac{4.6 - 0}{6.107/\sqrt{5}} = 1.684$$

This is a one-sided test and the observed value of the test statistic is less than 2.132 (the upper 5% point of the $t_4$ distribution). So we have insufficient evidence to reject $H_0$ at the 5% level. Therefore it is reasonable to conclude that the training course does not increase employees' efficiency.

Alternatively, using probability values, we have $P(t_4 > 1.684) = 0.0874$, which is greater than 0.05. So we have insufficient evidence to reject $H_0$ at the 5% level.

R can carry out this hypothesis test using t.test with the argument paired=TRUE.

6 Tests and confidence intervals

You may have noticed that we've been using some of the same examples in this chapter as in Chapter 9. This is because statistical tests and confidence intervals are very closely related. The methods are basically the same in each case, except that they work opposite ways round.
Confidence intervals start from a probability and find a range of parameters associated with this. Statistical tests start with a possible value (or values) for the parameter and associate a probability value with this.

There are very close parallels between the inferential methods for tests and confidence intervals. In many situations there is a direct link between a confidence interval for a parameter and tests of hypothesised values for it.

A confidence interval for $\theta$ can be regarded as a set of acceptable hypothetical values for $\theta$, so a value $\theta_0$ contained in the confidence interval should be such that the hypothesis $H_0: \theta = \theta_0$ will be accepted in a corresponding test. This generally proves to be the case.

In some situations there is a difference between the manner of construction of the confidence interval and that of the construction of the test statistic which is actually used. For example the confidence interval for the difference between two proportions (based on normal approximations) is constructed in a different way from that used for the test statistic in the corresponding test, where an estimate of a common proportion (under $H_0$) is used. As a result, in this and similar cases there is only an approximate match (albeit a good one) between the confidence interval and the corresponding test.

One useful consequence of this relationship between tests and confidence intervals is that if we have a 95% confidence interval for a parameter, we can immediately apply a 5% test on the value of that parameter by observing whether or not the interval contains the proposed value.

Question

A researcher has found 95% confidence intervals for the average daily vitamin C consumption (in milligrams) in three countries. For country A it is (75,95), for country B it is (40,50) and for country C it is (55,65).

On the basis of this information, are people getting sufficient vitamin C in each country? The recommended daily allowance is 60mg.
Solution

Country A

The 95% confidence interval is (75,95), which contains only values above 60. So in a 5% test of $H_0: \mu = 60$ vs $H_1: \mu > 60$ we reject $H_0$ and conclude that people are getting more than enough vitamin C.

Country B

The 95% confidence interval is (40,50), which contains only values below 60. So in a 5% test of $H_0: \mu = 60$ vs $H_1: \mu < 60$ we reject $H_0$ and conclude that people are not getting enough vitamin C.

Country C

The 95% confidence interval is (55,65), which contains the value 60. So in a 5% test we cannot reject $H_0$ and we conclude that people are getting the recommended daily allowance.

7 Non-parametric tests

The tests we have been considering so far all make assumptions about the distribution of the variables of interest within the population. If these assumptions are not correct, then the level of statistical significance can be affected.

It is possible to devise tests which make no distributional assumptions. Such tests are termed non-parametric. They have the advantages of being applicable under conditions in which the tests in the previous sections should not be used.

For example, whilst the two sample $t$ test is robust for departures from normality and equal variances for large samples, it is not appropriate for small samples with a non-normal distribution. Hence, we need to use a test which doesn't make any distributional assumptions about the data or the test statistic. These tests are called non-parametric tests.

However, some non-parametric tests do not use all the information available. For example, the Signs Test in Subject CS2 uses the signs of the differences between two samples while ignoring their magnitude.

By using only some of the information, the test won't be as accurate.
7.1 Permutation approach

One way of constructing a non-parametric test is to consider all possible permutations of the data subject to some criterion.

For example, consider a test of the difference between the means of two independent samples of sizes $n_A$ and $n_B$. The null hypothesis is that there is no difference in the mean of the two samples. Label the two samples as A and B, and consider all possible ways of selecting the $n_A + n_B$ elements on the combined sample such that $n_A$ of them are in category A and $n_B$ are in category B. Each of these permutations will produce a test statistic (the mean difference), and the mean differences from all possible permutations will provide a distribution of mean differences. Assuming that each permutation is equally likely, we can calculate the p-value of the mean difference in the data we have (the permutation actually observed).

The null hypothesis is that the distributions of both categories are the same and hence the means (or any other statistic such as the medians) are the same. In which case, a data point is equally likely to have been assigned to either group.

We can then calculate the p-value for our observed statistic of the sampling distribution. This will be the proportion of permutations that lead to test statistics at least as extreme (relative to an alternative hypothesis) as the actual labelling of the data.
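The enumeration described above can be sketched in a few lines. The following Python illustration is a minimal sketch under stated assumptions: the function name `permutation_p_value` and the two small samples are hypothetical, the alternative hypothesis is taken to be that the mean of A exceeds the mean of B, and full enumeration is only practical for very small samples.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(sample_a, sample_b):
    """One-sided permutation p-value for H1: mean of A > mean of B, by full enumeration."""
    combined = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    observed = mean(sample_a) - mean(sample_b)
    indices = range(len(combined))

    diffs = []
    for chosen in combinations(indices, n_a):
        chosen_set = set(chosen)
        group_a = [combined[i] for i in chosen]
        group_b = [combined[i] for i in indices if i not in chosen_set]
        diffs.append(mean(group_a) - mean(group_b))

    # Proportion of relabellings at least as extreme as the one observed.
    # The small tolerance guards against floating-point ties.
    extreme = sum(d >= observed - 1e-12 for d in diffs)
    return extreme / len(diffs)


# Hypothetical data: two samples of size 3, giving 20 possible relabellings.
p_value = permutation_p_value([12, 15, 14], [9, 11, 10])

print(p_value)
```

With these particular figures the observed split is the most extreme of the 20 relabellings, so it is the only one counted.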
If the two samples are stored in vectors xA and xB, then sample R code for obtaining the permutation sampling distribution for the difference in the means is as follows:

    results <- c(xA,xB)                  # combined sample
    nA <- length(xA)                     # size of sample A
    index <- 1:length(results)
    p <- combn(index,nA)                 # each column is one choice of indices for group A
    n <- ncol(p)
    dif <- rep(0,n)
    for (i in 1:n) {
      # mean of relabelled group A minus mean of the remaining values
      dif[i] <- mean(results[p[,i]]) - mean(results[-p[,i]])
    }

If our observed statistic is T and our alternative hypothesis is $H_1: \mu_1 > \mu_2$ then the p-value is calculated as follows:

    length(dif[dif>=T])/length(dif)

Alternatively, we can use the permTS function in the perm package or the oneway_test function in the coin package or the perm.test function in the exactRankTests package (though this only works if the observed values are integers).

Similar approaches can be used for tests for paired data where the pairs are kept together. This is equivalent to calculating the permutations of the signs of the differences of the pairs.

The permutation approach is not new, but it has become much more feasible in recent years with the advent of powerful computers, which can undertake the calculation of the many permutations involved in all but the smallest problems.

However, for larger samples the number of permutations grows rapidly and this becomes computationally expensive. Hence, we usually resort to resampling methods.

For example, two groups of size 20 result in 137,846,528,820 combinations.

Resampling methods reduce the number of combinations and thus the computation time. These techniques will be described more fully in the CS1B course.

8 Chi-square tests

These tests are relevant to category or count data. Each sample value falls into one or other of several categories or cells.
The test is then based on comparing the frequencies actually observed in the categories/cells with the frequencies expected under some hypothesis, using the test statistic

$$\sum \frac{(f_i - e_i)^2}{e_i}$$

where $f_i$ and $e_i$ are the observed and expected frequencies respectively in the $i$th category/cell, and the summation is taken over all categories/cells involved. This statistic has, approximately, a chi-square ($\chi^2$) distribution under the hypothesis on the basis of which the expected frequencies were calculated.

The statistic is often written as $\sum \dfrac{(O_i - E_i)^2}{E_i}$, to show which is the observed value. The values of $O_i$ and $E_i$ should be numbers rather than proportions or percentages.

8.1 Goodness of fit

This is investigating whether it is reasonable to regard a random sample as coming from a specified distribution, ie whether a particular model provides a good fit to the data.

Degrees of freedom

Suppose there are $k$ cells, so $k$ terms in the summation which produces the statistic, and that the sample size is $n = \sum f_i$. The expected frequencies also sum to $n$, so knowing any $k - 1$ of them automatically gives you the last one. There is a dependence built in to the $k$ terms which are added up to produce the statistic and this is the reason why the degrees of freedom of the basic statistic is $k - 1$ and not $k$.

Further, for each parameter of the distribution specified by the null hypothesis which must be estimated from the observed data, another degree of dependence is introduced in the expected frequencies: for each parameter estimated another degree of freedom is lost. The theory behind this assumes that the maximum likelihood estimators are used.

So the number of degrees of freedom is reduced by the number of parameters estimated from the observed data.
The accuracy of the chi-square approximation

The test statistic is only approximately, not exactly, distributed as $\chi^2$. The presence of the expected frequencies $e_i$ in the denominators of the terms to be added up is important. Dividing by very small $e_i$ values causes the resulting terms to be somewhat large and erratic, and the tail of the distribution of the statistic may not match that of the $\chi^2$ very well. So, in practice, it is best not to have too many small $e_i$ values, which can be done by combining cells and suffering the consequent loss of information/degrees of freedom.

The most common recommendation is not to use any $e_i$ which is less than 5. (However, the statistic is more robust than that and in practice a less conservative approach, such as ensuring that all $e_i$ are greater than 1 and that not more than 20% of them are less than 5, may be taken.)

Question

In testing whether a die is fair, a suitable model is:

$$P(X = i) = \frac{1}{6}, \quad i = 1,2,3,4,5,6$$

where $X$ is the number thrown

and the hypotheses may be:

$H_0$: Number thrown has the distribution specified in the model

$H_1$: Number thrown does not have the distribution specified in the model

If the die is thrown 300 times, with the following results,

x:      1    2    3    4    5    6
f_i:   43   56   54   47   41   59

Carry out a $\chi^2$ test to assess whether the data comes from a fair die.

Solution

Under $H_0$, $300 \times \frac{1}{6} = 50$ occurrences of each face of the die would be expected, so $e_i = 50$, $i = 1,2,3,4,5,6$.

The values of $f_i - e_i$, the differences between observed and expected frequencies, are then:

$-7, 6, 4, -3, -9, 9$

which of course sum to zero.

The value of the test statistic is then:

$$\frac{49}{50} + \frac{36}{50} + \frac{16}{50} + \frac{9}{50} + \frac{81}{50} + \frac{81}{50} = \frac{272}{50} = 5.44$$

In this illustration, with 6 cells and a fully specified model (no parameters to estimate), the distribution of the test statistic under $H_0$ is $\chi^2_5$.

This is a one-sided test. We reject $H_0$ for large values of the statistic (ie when the observed and expected values are very different).

Since 5.44 is less than 11.07 (the upper 5% point of the $\chi^2_5$ distribution) we have insufficient evidence to reject $H_0$ at the 5% level.

Alternatively, the p value is $P(\chi^2_5 > 5.44)$. The probability tables (on page 165) show that $P(\chi^2_5 > 5.5) = 0.358$, so $P(\chi^2_5 > 5.44)$ is about 0.36. Note also that a $\chi^2_5$ variable has mean 5, so we have observed a value much in line with what is expected under the model.

We have no evidence that the die is not fair. $H_0$ can stand.

Question

The table below shows the causes of death in elderly men derived from a study in the 1970s. Carry out a chi-square test to determine whether these percentages can still be considered to provide an accurate description of causes of death in 2000.

Cause of death               Proportion of deaths in 1975    Number of deaths in 2000
Cancer                                  8%                            286
Heart disease                          22%                            805
Other circulatory disease              40%                          1,548
Respiratory diseases                   19%                            755
Other causes                           11%                            464

Solution

We are testing:

$H_0$: the causes of death in 2000 conform to the percentages shown

vs $H_1$: the causes of death in 2000 do not conform to the percentages shown

Under $H_0$:

$$\sum \frac{(O_i - E_i)^2}{E_i} \sim \chi^2_f$$

where $f$ is the number of degrees of freedom.

The expected values for each category are calculated by multiplying the total number of deaths by the percentage for that category. For example the expected number of deaths from heart disease is $0.22 \times 3{,}858 = 848.8$.
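The expected frequencies and the resulting statistic for the causes-of-death data can be computed directly. The Python sketch below is an illustration only; the variable names are assumptions introduced here, and the critical value 9.488 is the upper 5% point of the $\chi^2_4$ distribution as quoted from the Tables in the solution, not something the code derives.

```python
# Observed deaths in 2000 and the 1975 proportions, as given in the question.
observed = {
    "Cancer": 286,
    "Heart disease": 805,
    "Other circulatory disease": 1548,
    "Respiratory diseases": 755,
    "Other causes": 464,
}
proportions = {
    "Cancer": 0.08,
    "Heart disease": 0.22,
    "Other circulatory disease": 0.40,
    "Respiratory diseases": 0.19,
    "Other causes": 0.11,
}

total = sum(observed.values())

# Expected frequency = total number of deaths x hypothesised proportion.
expected = {cause: proportions[cause] * total for cause in observed}

# Chi-square statistic: sum over categories of (O - E)^2 / E.
statistic = sum(
    (observed[cause] - expected[cause]) ** 2 / expected[cause] for cause in observed
)

# No parameters were estimated, so only one degree of freedom is lost.
degrees_of_freedom = len(observed) - 1

critical_5pct = 9.488  # upper 5% point of chi-square(4), quoted from the Tables
reject_h0 = statistic > critical_5pct

print(total, round(statistic, 2), degrees_of_freedom, reject_h0)
```

Working with unrounded expected frequencies gives a statistic that agrees with the tabulated figure to two decimal places.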
The table below shows the observed and expected figures (where $C_i = \frac{(O_i - E_i)^2}{E_i}$):

Cause of death                Actual, O_i    Expected, E_i    C_i
Cancer                          286            308.6          1.66
Heart disease                   805            848.8          2.26
Other circulatory diseases    1,548          1,543.2          0.01
Respiratory disease             755            733.0          0.66
Other causes                    464            424.4          3.70
Total                         3,858          3,858            8.29

There are no small groups.

The value of the chi-square statistic is 8.29.

There are 5 categories. The $E_i$'s were calculated from the total number of observations. We haven't estimated any parameters. So the number of degrees of freedom is $5 - 1 = 4$.

Chi-square goodness-of-fit tests are one-sided tests. The observed value of the test statistic is less than 9.488, the upper 5% point of the $\chi^2_4$ distribution. So we have insufficient evidence to reject $H_0$ at the 5% level.

Therefore it is reasonable to conclude that there has been no change in the pattern of causes of death.

Alternatively, using probability values, we have $P(\chi^2_4 > 8.29) = 0.0819$, which is greater than 0.05. So we have insufficient evidence to reject $H_0$ at the 5% level.

We can apply the test to data from other distributions, for example the Poisson distribution.

Question

The numbers of claims made last year by individual motor insurance policyholders were:

Number of claims            0       1     2     3     4+
Number of policyholders     2,962   382   47    25    4

Carry out a chi-square test to determine whether these frequencies can be considered to conform to a Poisson distribution.

Solution

We are testing:

H0: the number of claims conform to a Poisson distribution
vs
H1: the number of claims don't conform to a Poisson distribution

Under $H_0$:

$\sum \frac{(O_i - E_i)^2}{E_i} \sim \chi^2_f$

To find the expected numbers, we must estimate the unknown mean of the Poisson distribution. The MLE of the mean of a Poisson distribution is the mean number of claims.
If we assume that no policyholders made more than 4 claims, this is:

$\hat{\lambda} = \frac{2{,}962 \times 0 + 382 \times 1 + 47 \times 2 + 25 \times 3 + 4 \times 4}{3{,}420} = \frac{567}{3{,}420} = 0.1658$

The expected values are found by applying the Poisson probabilities calculated using this value for the parameter to the total observed number of policyholders, ie 3,420.

The table showing the observed and expected figures is:

Number of claims    Actual    Expected
0                   2,962     2,897.5
1                     382       480.4
2                      47        39.8
3                      25         2.2
4 or more               4         0.1
Total               3,420     3,420

We calculate the last expected figure by subtraction.

The expected numbers in the last two groups are very small, so we need to combine the last three groups to form a "2 or more" group, with observed frequency 76 and expected frequency 42.1.

The value of the chi-square statistic is:

$\chi^2 = \frac{(2{,}962 - 2{,}897.5)^2}{2{,}897.5} + \frac{(382 - 480.4)^2}{480.4} + \frac{(76 - 42.1)^2}{42.1} = 48.89$

There are now 3 groups. The $E_i$'s were calculated from the total number of observations. We have estimated one parameter. So the number of degrees of freedom is $3 - 1 - 1 = 1$.

We are carrying out a one-sided test. The observed value of the test statistic far exceeds 7.879, the upper 0.5% point of the $\chi^2_1$ distribution. So we have sufficient evidence to reject $H_0$ at the 0.5% level.

Therefore it is reasonable to conclude that a Poisson model does not provide a good model for the number of claims.

Question

On a particular run of a process which bottles a drink, it is thought that the cleansing process of the bottles has partially failed. The bottles have been boxed into crates, each containing six bottles. It is thought that each bottle, independently of all others, has the same chance of containing impurities. A survey has been conducted, and each bottle in a random sample of 200 crates has been tested for impurities.
The table below gives the numbers of crates in the sample which had the respective number of bottles which contained impurities:

Number of impure bottles:   0    1    2    3    4    5    6
Number of crates:           38   70   58   25   6    2    1

Test the goodness of fit of a binomial distribution to these observations.

Solution

We first need an estimate of $\theta$, the proportion of bottles containing impurities. We get this by finding the MLE for $\theta$ based on the random sample.

Perhaps the simplest way to calculate the MLE, $\hat{\theta}$, is:

$\hat{\theta} = \frac{\text{total number of successes (impure bottles)}}{\text{total number of bottles}} = \frac{301}{1{,}200} = 0.25083333$

Alternatively, we can see that $6\hat{\theta} = \bar{x}$, where $\bar{x}$ is the mean number of impure bottles per crate. From the data, $\bar{x} = \frac{301}{200} = 1.505$, so, given that there are six bottles in each crate, $\hat{\theta} = \frac{\bar{x}}{6} = 0.25083333$.

An alternative approach to deriving the MLE is as follows.

Let the number of bottles with impurities in each crate in a random sample of 200 crates be $x_1, x_2, \ldots, x_{200}$. Each $x_i$ is an observation from a $Bin(6, \theta)$ distribution, and so the likelihood function for $\theta$ is:

$L(\theta) = \binom{6}{x_1}\theta^{x_1}(1-\theta)^{6-x_1} \times \cdots \times \binom{6}{x_{200}}\theta^{x_{200}}(1-\theta)^{6-x_{200}} = \text{constant} \times \theta^{\sum x_i}(1-\theta)^{1{,}200 - \sum x_i}$

Taking logs:

$\log L = \text{constant} + \sum x_i \log\theta + \left(1{,}200 - \sum x_i\right)\log(1-\theta)$

Differentiating with respect to $\theta$ and setting the result equal to zero:

$\frac{\partial}{\partial\theta}\log L = \frac{\sum x_i}{\theta} - \frac{1{,}200 - \sum x_i}{1-\theta} = 0$

Solving this, we get:

$\hat{\theta} = \frac{\sum x_i}{1{,}200} = \frac{301}{1{,}200} = 0.25083$

We can now calculate the expected frequencies. We calculate the probabilities from a $Bin(6, 0.25083)$ distribution, and multiply each probability by 200:

Number of bottles with impurities    Observed    Expected
0                                      38         35.36
1                                      70         71.03
2                                      58         59.46
3                                      25         26.54
4 or more                               9          7.61
Total                                 200        200

We have combined the last three groups since the expected frequencies are small. In fact we anticipated that the last two groups were going to have small expected numbers and calculated the expected number for the "4 or more" group by subtraction from 200.
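The estimation and expected-frequency steps above can be reproduced with a short Python sketch. It is a minimal illustration, assuming the crate counts given in the question; `binom_pmf` and the other names are our own helpers, not part of the course material.

```python
from math import comb

# Number of crates containing 0, 1, ..., 6 impure bottles
crates = [38, 70, 58, 25, 6, 2, 1]
bottles_per_crate = 6

n_crates = sum(crates)                              # 200 crates
impure = sum(k * f for k, f in enumerate(crates))   # 301 impure bottles

# MLE of the proportion of impure bottles: successes / total bottles
theta_hat = impure / (bottles_per_crate * n_crates)


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Expected frequencies for 0, 1, 2 and 3 impure bottles
expected = [n_crates * binom_pmf(k, bottles_per_crate, theta_hat) for k in range(4)]
# "4 or more" group found by subtraction from the total, as in the solution
expected.append(n_crates - sum(expected))

# Observed frequencies with the last three groups combined
observed = crates[:4] + [sum(crates[4:])]

for o, e in zip(observed, expected):
    print(o, round(e, 2))
```

Grouping by subtraction guarantees that the expected frequencies sum to 200 exactly, which is what makes "total used to compute expected numbers" cost one degree of freedom.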
The observed value of the chi-square statistic is:

$\chi^2 = \frac{(38 - 35.36)^2}{35.36} + \frac{(70 - 71.03)^2}{71.03} + \frac{(58 - 59.46)^2}{59.46} + \frac{(25 - 26.54)^2}{26.54} + \frac{(9 - 7.61)^2}{7.61} = 0.59$

There are now 5 groups. The $E_i$'s were calculated from the total number of observations. We have estimated one parameter. So the number of degrees of freedom is $5 - 1 - 1 = 3$.

We are carrying out a one-sided test. The observed value of the test statistic has a p-value of about 90%. So we have insufficient evidence to reject $H_0$ at any conventional significance level.

Therefore it is reasonable to conclude that the underlying distribution is binomial. Indeed the fit is almost too good. The resulting value of the test statistic is suspiciously small.

R can carry out a $\chi^2$ goodness-of-fit test using:

chisq.test(<observed freq>, p=<expected probabilities>)

8.2 Contingency tables

A contingency table is a two-way table of counts obtained when sample items (people, companies, policies, claims etc) are classified according to two category variables. The question of interest is whether the two classification criteria are independent.

H0: the two classification criteria are independent.

The simple rule for calculating the expected frequency for any cell is then:

$\frac{\text{row total} \times \text{column total}}{\text{table total}}$

(ie the proportion of data in row $i$ is $\frac{\sum_j f_{ij}}{\sum_i \sum_j f_{ij}}$, so if the criteria are independent, the number expected in cell $(i, j)$ is $\frac{\sum_j f_{ij}}{\sum_i \sum_j f_{ij}} \times \sum_i f_{ij}$.)

The degrees of freedom associated with a table with $r$ rows and $c$ columns is:

$(rc - 1) - (r - 1) - (c - 1) = (r - 1)(c - 1)$

since the column totals and row totals reduce the number of degrees of freedom.

An important use of this method is with a table of dimension $2 \times c$ (or $r \times 2$), which gives a test for differences among 2 or more population proportions.

Question

For each of three insurance companies, A, B, and C, a random sample of non-life policies of a particular kind is examined.
It turns out that a claim (or claims) have arisen in the past year in 23% of the sampled policies for A, in 28% of those for B, and in 20% of those for C.

Test for differences in the underlying proportions of policies of this kind which have given rise to claims in the past year among the three companies in the two situations:

(a) the sample sizes were 100, 100, and 200 respectively
(b) the sample sizes were 300, 300, and 600 respectively.

Comment briefly on your results.

Solution

H0: population proportions are all equal
H1: population proportions are not all equal

(a) Observed frequencies:

              A      B      C      Total
Claim         23     28     40      91
No claim      77     72    160     309
Total        100    100    200     400

Expected frequencies under $H_0$:

              A        B        C        Total
Claim         22.75    22.75    45.50     91
No claim      77.25    77.25   154.50    309
Total        100      100      200       400

Values of $f_i - e_i$:

              A        B        C
Claim         0.25     5.25    -5.5
No claim     -0.25    -5.25     5.5

So:

$\chi^2 = \frac{0.25^2}{22.75} + \frac{5.25^2}{22.75} + \frac{5.5^2}{45.50} + \frac{0.25^2}{77.25} + \frac{5.25^2}{77.25} + \frac{5.5^2}{154.50}$

$= 0.003 + 1.212 + 0.665 + 0.001 + 0.357 + 0.196 = 2.43$ on 2 df.

Here df stands for degrees of freedom.

This is an unremarkable value for $\chi^2_2$. We have no evidence against $H_0$, which can stand. No differences among the population proportions have been detected.

(b) The sample sizes are increased by a factor of 3, but the same percentages with claims as in (a) are assumed.

$f_i$, $e_i$ and $(f_i - e_i)$ all increase by a factor of 3, so each component of $\chi^2$, and the resulting value, also increase by a factor of 3. So now $\chi^2 = 7.3$.

p-value $= P(\chi^2_2 > 7.3)$, which is just a bit bigger than 0.025.

There is quite strong evidence against $H_0$. We conclude that the population proportions are not all equal (p-value about 0.03).
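The scaling argument in part (b) can be checked numerically. The sketch below is a plain Python illustration; the function name `chi_square_2xc` is our own, and the claim counts for (b) are simply three times those in (a).

```python
def chi_square_2xc(claims, sizes):
    """Chi-square statistic for a 2 x c table of (claim, no claim) counts.

    claims[j] is the number of sampled policies with a claim in company j,
    sizes[j] is the sample size for company j.
    """
    total_claims = sum(claims)
    n = sum(sizes)
    stat = 0.0
    for f, size in zip(claims, sizes):
        # Each column contributes a "claim" cell and a "no claim" cell
        for obs, row_total in ((f, total_claims), (size - f, n - total_claims)):
            exp = row_total * size / n  # row total x column total / table total
            stat += (obs - exp) ** 2 / exp
    return stat


stat_a = chi_square_2xc([23, 28, 40], [100, 100, 200])    # part (a)
stat_b = chi_square_2xc([69, 84, 120], [300, 300, 600])   # part (b)

print(round(stat_a, 2))           # 2.43
print(round(stat_b, 2))           # 7.3
print(round(stat_b / stat_a, 6))  # 3.0
```

The degrees of freedom stay at $(2-1)(3-1) = 2$ in both cases, so the same sample proportions move from unremarkable to significant purely because of the larger samples.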
Comments:

The observed sample proportions 23%, 28%, and 20% are not significantly different when based on sample sizes of 100, 100, and 200, but are significantly different when based on sample sizes which are considerably bigger (300, 300, and 600).

Question

In an investigation into the effectiveness of car seat belts, 292 accident victims were classified according to the severity of their injuries and whether they were wearing a seat belt at the time of the accident. The results were as follows:

                 Wearing a seatbelt    Not wearing a seatbelt
Death                    3                     47
Severe injury           78                     32
Minor injury           103                     29

Determine whether the severity of injuries sustained is dependent on whether the victims are wearing a seat belt.

Solution

The hypotheses are:

H0: severity of injuries is independent of wearing a seatbelt
H1: severity of injuries is not independent of wearing a seatbelt

We can calculate the expected frequencies in each category by multiplying the row and column totals, and dividing by the overall total:

Expected freq    Wearing a seatbelt    Not wearing a seatbelt
Death                  31.5                   18.5
Severe injury          69.3                   40.7
Minor injury           83.2                   48.8

For example, $\frac{184 \times 50}{292} = 31.507$.

So we can now calculate the value of the chi-square statistic:

$\chi^2 = \frac{(3 - 31.5)^2}{31.5} + \cdots + \frac{(29 - 48.8)^2}{48.8} = 85.39$

The number of degrees of freedom is $(3 - 1)(2 - 1) = 2$.

We are carrying out a one-sided test. Our observed value of the test statistic is far in excess of 10.60, the upper 0.5% point of the $\chi^2_2$ distribution. In fact we could have stopped after working out the first term in the $\chi^2$ value, which is already 25.79. So we have sufficient evidence to reject $H_0$ at the 0.5% level.

Therefore it is reasonable to conclude that the level of injury is almost certainly dependent on whether the victim is wearing a seatbelt.
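The "row total times column total over table total" rule used in the seat belt solution can be written out as a short Python sketch. This is an illustration only, with our own variable names; the critical value 10.60 is the tabulated upper 0.5% point of $\chi^2_2$ quoted in the solution.

```python
# Observed counts: rows are Death, Severe injury, Minor injury;
# columns are wearing / not wearing a seatbelt
observed = [
    [3, 47],
    [78, 32],
    [103, 29],
]

row_totals = [sum(row) for row in observed]        # [50, 110, 132]
col_totals = [sum(col) for col in zip(*observed)]  # [184, 108]
n = sum(row_totals)                                # 292

# Expected frequency under independence for each cell
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (3 - 1)(2 - 1) = 2
critical_half_pct = 10.60                           # from tables

print(round(expected[0][0], 3))    # 31.507
print(round(chi_sq, 2))            # 85.45
print(chi_sq > critical_half_pct)  # True: reject H0 at the 0.5% level
```

The sketch gives 85.45 rather than 85.39 because it keeps the expected frequencies unrounded, whereas the hand calculation uses values rounded to one decimal place. The conclusion is unaffected.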
Question

The table below shows the numbers of births during one month at a particular hospital, classified according to whether a particular medical characteristic was or wasn't present during childbirth.

Age of mother             < 20    21-25    26-30    31-35    36+    Total
Characteristic present     10      12        9        4       3      38
Characteristic absent       5      51       38       25       5     124
Total                      15      63       47       29       8     162

Assess whether the presence of this characteristic is dependent on the age of the mother.

Solution

The hypotheses are:

H0: the characteristic is independent of the mother's age
H1: the characteristic is not independent of the mother's age

The observed frequencies are as given in the table in the question.

We can calculate the expected frequencies in each category by multiplying the row and column totals, and dividing by 162:

Age of mother             < 20     21-25    26-30    31-35    36+     Total
Characteristic present     3.52    14.78    11.02     6.80    1.88     38
Characteristic absent     11.48    48.22    35.98    22.20    6.12    124
Total                     15       63       47       29       8       162

In contingency tables the totals are always the same in the observed and expected tables. This means that in a table with only 2 rows, if we calculate the entries in one of the rows first, we can work out the entries in the other row by subtraction.

Two cells out of 10 cells have expected frequencies less than 5. Since this is not more than 20% we can use the table as it is.

So we can now calculate the value of the chi-square statistic:

$\chi^2 = \frac{(10 - 3.5)^2}{3.5} + \cdots + \frac{(5 - 6.1)^2}{6.1} = 19.2$

The number of degrees of freedom is $(5 - 1)(2 - 1) = 4$.

We are carrying out a one-sided test. Our observed value of the test statistic exceeds 18.47, the upper 0.1% point of the $\chi^2_4$ distribution. So we have sufficient evidence to reject $H_0$ at the 0.1% level.

Therefore it is reasonable to conclude that the characteristic is dependent on the mother's age.
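The check on small expected frequencies can be automated. The Python sketch below applies the less conservative rule quoted earlier in the chapter (all expected frequencies greater than 1, and not more than 20% of them less than 5) to the births table. The helper name `expected_table` is our own.

```python
def expected_table(observed):
    """Expected frequencies under independence for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: characteristic present / absent; columns: the five age bands
observed = [
    [10, 12, 9, 4, 3],
    [5, 51, 38, 25, 5],
]

expected = expected_table(observed)
cells = [e for row in expected for e in row]

small = sum(1 for e in cells if e < 5)  # cells with expected frequency < 5
rule_ok = min(cells) > 1 and small / len(cells) <= 0.20

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print(small, len(cells), rule_ok)  # 2 10 True
print(round(chi_sq, 2))            # 19.15
```

With unrounded expected frequencies the statistic is 19.15, which agrees with the 19.2 quoted in the solution to one decimal place and still exceeds 18.47.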
If we decided to combine cells because of the expected values being less than 5, we could do this by combining adjacent groups as follows:

Age of mother             ≤ 25    26-30    31+    Total
Characteristic present     22       9       7      38
Characteristic absent      56      38      30     124
Total                      78      47      37     162

The expected values are:

Age of mother             ≤ 25     26-30    31+      Total
Characteristic present    18.30    11.02     8.68     38
Characteristic absent     59.70    35.98    28.32    124
Total                     78       47       37       162

We can now calculate the value of the chi-square statistic:

$\chi^2 = \frac{(22 - 18.30)^2}{18.30} + \cdots + \frac{(30 - 28.32)^2}{28.32} = 1.89$

The number of degrees of freedom is $(3 - 1)(2 - 1) = 2$.

We are carrying out a one-sided test. Our observed value of the test statistic does not exceed 5.991, the upper 5% point of the $\chi^2_2$ distribution. So we have insufficient evidence to reject $H_0$ at the 5% level.

Therefore it is reasonable to conclude that the characteristic is not dependent on the mother's age.

The results are so different because of the effect of the small expected values.

R can carry out a $\chi^2$ contingency table test using chisq.test(<table>). Since this automatically applies a continuity correction for $2 \times 2$ tables, we would need to set the argument correct=FALSE if we wished to prevent this.

8.3 Fisher's exact test

A non-parametric permutation approach to contingency tables was devised more than 80 years ago by the great statistician R.A. Fisher.

Consider two categorical variables $X$ and $Y$, each with two categories, $X_1$, $X_2$, $Y_1$ and $Y_2$. Suppose we have data for $n$ observations, and that of these $n_{X_1}$ are in category $X_1$ on variable $X$, $n_{X_2}$ are in category $X_2$ on variable $X$, $n_{Y_1}$ are in category $Y_1$ on variable $Y$, and $n_{Y_2}$ are in category $Y_2$ on variable $Y$.

These data can be represented in a $2 \times 2$ contingency table as shown below.
                      Variable X
                      X1         X2         Total
Variable Y    Y1      [shaded]   [shaded]   n_Y1
              Y2      [shaded]   [shaded]   n_Y2
              Total   n_X1       n_X2       n

Fisher proposed testing the association between the two categorical variables by working out the probability of each possible permutation of values in the shaded cells consistent with the marginal totals $n_{X_1}$, $n_{X_2}$, $n_{Y_1}$ and $n_{Y_2}$. Then, under the null hypothesis of no association, the distribution of ways of allocating the data to the four shaded cells is hypergeometric.

This means that, if the number of individuals which have the value $X_1$ on variable $X$ and the value $Y_1$ on variable $Y$ is $n_{X_1Y_1}$, then the probability of obtaining this number is given by:

$P(n_{X_1Y_1}) = \frac{\binom{n_{X_1}}{n_{X_1Y_1}} \binom{n_{X_2}}{n_{Y_1} - n_{X_1Y_1}}}{\binom{n}{n_{Y_1}}}$ for $\max(0, n_{Y_1} - n_{X_2}) \le n_{X_1Y_1} \le \min(n_{X_1}, n_{Y_1})$

The stronger the association between $X$ and $Y$, the more heavily the observations should be concentrated in either cells $\{X_1, Y_1\}$ and $\{X_2, Y_2\}$ or $\{X_1, Y_2\}$ and $\{X_2, Y_1\}$ (ie in two opposite corners of the contingency table).

Consider a sample of 10 people, 6 men and 4 women. Of these 3 are colour blind:

           Colour blind    Not    Total
Male            2           4       6
Female          1           3       4
Total           3           7      10

Using the formula above, the probability of observing 2 colour-blind men from this sample is:

$\frac{\binom{3}{2}\binom{7}{4}}{\binom{10}{6}} = \frac{3 \times 35}{210} = \frac{1}{2}$

$\binom{10}{6}$ is the total number of ways of choosing the 6 men from the 10 people.

$\binom{3}{2}\binom{7}{4}$ is the number of ways of choosing 2 men from the 3 colour blind people and the 4 men from the 7 non-colour blind people.

Hence this expression gives us the probability of observing 2 colour blind men from this group of 10 people.

A test can then be constructed by considering the probability of getting a distribution with the observed or a more extreme concentration of observations in two opposite corners.

The only four outcomes which produce a $2 \times 2$ table with the same row and column totals are:

3  3        2  4        1  5        0  6
0  4        1  3        2  2        3  1

Using the formula we can calculate the probabilities of each of these outcomes:

$\frac{\binom{3}{3}\binom{7}{3}}{\binom{10}{6}}, \quad \frac{\binom{3}{2}\binom{7}{4}}{\binom{10}{6}}, \quad \frac{\binom{3}{1}\binom{7}{5}}{\binom{10}{6}}, \quad \frac{\binom{3}{0}\binom{7}{6}}{\binom{10}{6}}$

These are $\frac{1}{6}$, $\frac{1}{2}$, $\frac{3}{10}$ and $\frac{1}{30}$ respectively.

For a one-tailed test it suffices to consider only distributions which are extreme in the same direction as the observed table, whereas for a two-tailed test distributions which are extreme in the opposite direction should be considered (this can cause complications with small tables as the sampling distribution is not symmetrical).

At the 5% level of significance we should reject the null hypothesis of no association if the probability of getting a distribution which is the same or more extreme than that observed is less than 0.05.

In our example, the probability of observing this result or more extreme (ie 2 or more men with colour blindness) is $\frac{1}{6} + \frac{1}{2} = \frac{2}{3}$. This is not less than 5% (it is actually very likely) and so based on these results we conclude that gender and colour blindness are independent.

On the other hand, if we were to find that our result was rare, we would conclude that the result is not just due to chance, and that there is some connection between the variables.

Fisher's test was extended to a general $R \times C$ table by Freeman and Halton.

We chose an example with a very small sample as otherwise there would be many combinations, which would be time consuming to work through on a piece of paper. However, this test is no problem for a computer.

R can carry out Fisher's Exact Test using the command fisher.test(<table>).

Question

A certain company employs both graduates and non-graduates. A small sample of employees are entered for a certain test, with the following results. Of the four graduates taking the test, all passed. Of the eight non-graduates taking the test, five passed.

Using Fisher's exact test, assess whether graduates are more likely to pass the test than non-graduates.
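The colour-blindness probabilities worked out earlier in this section can be checked with a short Python sketch. It is a minimal illustration of the hypergeometric formula using exact fractions; the function name `hypergeom_prob` and its argument names are our own and simply mirror the marginal totals $n_{X_1}$, $n_{X_2}$ and $n_{Y_1}$.

```python
from fractions import Fraction
from math import comb


def hypergeom_prob(k, n_x1, n_x2, n_y1):
    """P(top-left cell = k) given the marginal totals of a 2 x 2 table."""
    return Fraction(comb(n_x1, k) * comb(n_x2, n_y1 - k), comb(n_x1 + n_x2, n_y1))


# Colour-blindness example: 3 colour blind, 7 not, 6 men
n_x1, n_x2, n_y1 = 3, 7, 6

# Possible numbers of colour-blind men, from max(0, n_y1 - n_x2) to min(n_x1, n_y1)
support = range(max(0, n_y1 - n_x2), min(n_x1, n_y1) + 1)
probs = {k: hypergeom_prob(k, n_x1, n_x2, n_y1) for k in support}

# One-tailed p-value: observed value (2 colour-blind men) or more extreme
p_one_tailed = sum(p for k, p in probs.items() if k >= 2)

for k in sorted(probs, reverse=True):
    print(k, probs[k])   # 3 1/6, 2 1/2, 1 3/10, 0 1/30
print(p_one_tailed)      # 2/3
```

Using `Fraction` keeps the probabilities exact, so they can be compared directly with the hand calculation.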
Solution

Given that we had nine passes, the number of ways of choosing four graduates to pass is $\binom{9}{4}$.

Given that we had three fails, the number of ways of choosing no graduates to fail is $\binom{3}{0}$.

The total number of ways of choosing four graduates out of 12 employees is $\binom{12}{4}$.

So the probability of obtaining four graduate passes is:

$\frac{\binom{9}{4}\binom{3}{0}}{\binom{12}{4}} = \frac{14}{55} = 0.2545$

Since we cannot obtain more than four graduate passes when we only have four graduates, this is the most extreme result possible, and the total probability of obtaining as extreme a result as this is 0.2545.

Since this is not less than 5%, we have insufficient evidence to conclude that graduates are more likely to pass than non-graduates.

We can see that in this case it will never be possible to obtain a significant result based on the small sample numbers we have here. Fisher's exact test needs much bigger samples for it to be usable to obtain satisfactory statistical results.

Chapter 10 Summary

Statistical tests can be used to test assertions about populations. The process of statistical testing involves setting up a null hypothesis and an alternative hypothesis, calculating a test statistic and using this to determine a p-value.

The probability of a Type I error is the probability of rejecting $H_0$ when it is true. This is also called the size (or level) of the test. The probability of a Type II error is the probability of not rejecting $H_0$ when it is false. The power of a test is the probability of rejecting $H_0$ when it is false.

Errors can also occur in the context of binary classifications, for example when an individual is classified as testing positive or negative for a particular disease. The null hypothesis is that the individual does not have the disease. A Type I error is a false positive and a Type II error is a false negative.
The sensitivity of this test is the true positive rate (which is $1 - P(\text{Type II error})$ = power of the test). The specificity of this test is the true negative rate (which is $1 - P(\text{Type I error})$).

The "best" test can be found using the likelihood ratio criterion. This leads to the tests detailed overleaf.

The test for two normal means (unknown variances) requires that the variances are the same and uses the pooled sample variance:

$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$

$\chi^2$ tests can be carried out to test for goodness of fit or to test whether two factors are independent (using contingency tables). The statistic is $\sum \frac{(O_i - E_i)^2}{E_i}$.

To find the number of degrees of freedom for the goodness of fit test, take the number of cells, subtract 1 if the total of the observed figures has been used in the calculation of the expected numbers (which is usually the case), and then subtract the number of parameters estimated. To find the number of degrees of freedom for a contingency table calculate $(r - 1)(c - 1)$.

If the expected numbers in some cells are small, these should be grouped. One degree of freedom is lost for each cell that is lost through grouping.

One-sample normal distribution

$\frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} \sim N(0,1)$   ($\sigma^2$ known)

$\frac{\bar{X} - \mu_0}{S / \sqrt{n}} \sim t_{n-1}$   ($\sigma^2$ unknown)

$\frac{(n-1)S^2}{\sigma_0^2} \sim \chi^2_{n-1}$

Two-sample normal distribution

$\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0,1)$   ($\sigma_1^2$, $\sigma_2^2$ known)

$\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{n_1 + n_2 - 2}$   ($\sigma_1^2$, $\sigma_2^2$ unknown)

$\frac{S_1^2 / \sigma_1^2}{S_2^2 / \sigma_2^2} \sim F_{n_1 - 1,\, n_2 - 1}$

One-sample binomial (approximate)

$\frac{\hat{p} - p_0}{\sqrt{\frac{p_0 q_0}{n}}} \sim N(0,1)$ or $\frac{X - np_0}{\sqrt{np_0 q_0}} \sim N(0,1)$ with continuity correction

Two-sample binomial (approximate)

$\frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{\hat{p}\hat{q}}{n_1} + \frac{\hat{p}\hat{q}}{n_2}}} \sim N(0,1)$, where $\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$ is the overall sample proportion

One-sample Poisson (approximate)

$\frac{\bar{X} - \lambda_0}{\sqrt{\lambda_0 / n}} \sim N(0,1)$ or $\frac{\sum X_i - n\lambda_0}{\sqrt{n\lambda_0}} \sim N(0,1)$ with continuity correction

Two-sample Poisson (approximate)

$\frac{(\hat{\lambda}_1 - \hat{\lambda}_2) - (\lambda_1 - \lambda_2)}{\sqrt{\frac{\hat{\lambda}}{n_1} + \frac{\hat{\lambda}}{n_2}}} \sim N(0,1)$, where $\hat{\lambda} = \frac{n_1\hat{\lambda}_1 + n_2\hat{\lambda}_2}{n_1 + n_2}$ is the overall sample mean

Chapter 10 Practice Questions

10.1 A statistical test is used to determine whether or not an anti-smoking campaign carried out 5 years ago has led to a significant reduction in the mean number of smoking related illnesses. The probability value of the test statistic is 7%.

Determine the conclusion for a test of size:

(i) 10%
(ii) 5%.

10.2 A random sample, $x_1, \ldots, x_{10}$, from a normal population gives the following values:

9.5   18.2   4.69   3.76   14.2   17.13   15.69   13.9   15.7   7.42

$\sum x_i = 120.19 \qquad \sum x_i^2 = 1{,}693.6331$

(i) Test at the 5% level whether the mean of the whole population is 15 if the variance is:
(a) unknown
(b) 20.

(ii) Test at the 5% level whether the population variance is 20.

10.3 A professional gambler has said:

"Flipping a coin into the air is fair, since the coin rotates about a horizontal axis, and it is equally likely to be either way up when it first clips the ground. So a flicked coin is equally likely to land showing heads or tails.

However, spinning a coin on a table is not fair, since the coin rotates about a vertical axis, and there is a systematic bias causing it to tilt towards the side where the embossed pattern is heavier. In fact, when a new coin is spun, it is more than twice as likely to land showing tails as it is to land showing heads."

After hearing this, an experiment was carried out, spinning a new coin 25 times on a polished table; the coin showed tails 18 times.

Comment on whether the results of the experiment support the gambler's claims about the probabilities when a coin is spun.

10.4 The sample variances of two independent samples from normal populations A and B, which have the same population variance, are $s_A^2 = 12.4$ and $s_B^2 = 25.8$. The sample sizes are $n_A = 10$ and $n_B = 5$ and the sample means are found to differ by 4.5.

Test whether the population means are equal.

10.5 Two populations
X and Y are known to have the same variance, but the precise distributions are not known. A sample of 5 values from population X and 10 values from population Y had sample variances of $s_X^2 = 47.0$ and $s_Y^2 = 12.6$.

Carry out a statistical test based on the F distribution to assess whether both populations can be considered to be normally distributed.

10.6 Determine the form of the best test of $H_0: \mu = \mu_0$ vs $H_1: \mu = \mu_1$, where $\mu_1 > \mu_0$, assuming the distribution of the underlying population is $N(\mu, \sigma^2)$, based on a sample of size $n$.

10.7 A blood test has been used on 1,000 people to detect whether they have a particular condition. Of the 427 people who had a positive result, 369 of them had the condition. Of the 573 people who had a negative result, 15 of them had the condition.

(i) (a) Calculate the sensitivity of the blood test.
    (b) Calculate the specificity of the blood test.

A second blood test is used on 1,000 people which has a sensitivity of 80% and a specificity of 60%. For this blood test, 544 people had a positive result.

(ii) (a) Calculate the number of true positives.
     (b) Calculate the number of false positives.

10.8 (Exam style) The lengths of a random sample of 12 worms of a particular species have a mean of 8.54 cm and a standard deviation of 2.97 cm. Let $\mu$ denote the mean length of a worm of this species. It is required to test:

$H_0: \mu = 7$ cm vs $H_1: \mu \ne 7$ cm

The lengths of worms are assumed to be normally distributed.

Calculate the probability-value of these sample results. [3]

10.9 (Exam style) A general insurance company is debating introducing a new screening programme to reduce the claim amounts that it needs to pay out. The programme consists of a much more detailed application form that takes longer for the new client department to process. The screening is applied to a test group of clients as a trial whilst other clients continue to fill in the old application form.
It can be assumed that claim payments follow a normal distribution.

The claim payments data for samples of the two groups of clients are (in £100 per year):

Without screening   24.5   21.7   45.2   15.9   23.7   34.2   29.3   21.1   23.5   28.3
With screening      22.4   21.2   36.3   15.7   21.5    7.3   12.8   21.2   23.9   18.4

(i) Test the hypothesis that the new screening programme reduces the mean claim amount. [5]
(ii) Test the assumption of equal variances required in part (i). [3]
[Total 8]

10.10 (Exam style) An environmentalist is investigating the possibility that oestrogenic chemicals are leading to a particular type of deformity in a species of amphibians living in a lake. The usual proportion of deformed animals living in unpolluted water is 0.5%. In a sample of 1,000 animals examined, 15 were found to have deformities.

(i) Test whether this provides evidence of the presence of harmful chemicals in the lake. [3]

Following an extensive campaign to reduce these chemicals in the lake, a further sample of 800 animals was examined and 10 were found to have deformities.

(ii) Test whether there has been a significant reduction in the proportion of deformed animals in the lake. [3]
[Total 6]

10.11 (Exam style) The total claim amounts (in £m) for home and car insurance over a year for similar sized companies are collected by an independent advisor:

Home   13.3   19.2   12.9   15.8   17.6
Car    14.3   21.0   12.8   17.4   22.8

(i) Test whether the mean home and car claims are equal. State clearly your probability value. [5]

It was subsequently discovered that the results were actually 5 consecutive years from the same company.

(ii) Carry out an appropriate test of whether the mean home and car claims are equal. [3]
[Total 8]

10.12 (Exam style) A random variable $X$ is believed to have probability density function, $f(x)$, where:

$f(x) = 3\theta^3(\theta + x)^{-4}, \quad x > 0$

In order to test the null hypothesis $\theta = 50$ against the alternative hypothesis $\theta = 60$, a single value is observed.
If this value is greater than 93.5, $H_0$ is rejected.

(i) Calculate the size of the test. [2]
(ii) Calculate the power of the test. [2]
[Total 4]

10.13 (Exam style) In an extrasensory perception experiment carried out in a live television interview, the interviewee who claimed to have extrasensory powers was required to identify the pattern on each of 10 cards, which had been randomly assigned with one of five different patterns. The cards were visible only to the audience who were asked to "transmit" the patterns to the interviewee.

When the interviewee failed to identify any of the cards correctly, she claimed that this was clear proof of the existence of ESP, since there was a "strong mind" in the audience who was willing her to get the answers wrong.

(i) (a) State the hypotheses implied by the interviewee's conclusion and carry out a 5% test on this basis.
    (b) Comment on your answer. [3]

(ii) (a) State precisely the hypotheses that the interviewer could have specified before the experiment to prevent the interviewee from "cheating" in this way.
     (b) Determine the number of cards that would have to be identified correctly to demonstrate the existence of ESP at the 5% level. [2]
[Total 5]

10.14 (Exam style) An insurer believes that the distribution of the number of claims on a particular type of policy is binomial with parameters $n = 3$ and $p$. A random sample of the number of claims on 153 policies revealed the following results:

Number of claims      0     1     2     3
Number of policies    60    75    16    2

(i) Derive the maximum likelihood estimate of $p$. [4]
(ii) Carry out a goodness-of-fit test for the binomial model specified in part (i) for the number of claims on each policy. [5]
[Total 9]

10.15 (Exam style) In an investigation into a patient's red corpuscle count, the number of such corpuscles appearing in each of 400 cells of a haemocytometer was counted. The results were as follows:

No. of red blood corpuscles    0     1     2     3     4     5     6     7    8
No. of cells                   40    66    93    94    62    25    14    5    1

It is thought that a Poisson distribution with mean $\mu$ provides an appropriate model for this situation.

(i) (a) Estimate $\mu$.
    (b) Test the fit of the Poisson model. [8]

For a healthy person, the mean count per cell is known to be equal to 3. For a patient with certain types of anaemia, the number of red blood corpuscles is known to be lower than this.

(ii) Test whether this patient has one of these types of anaemia. [3]
[Total 11]

10.16 (Exam style) In a recent study investigating a possible genetic link between individuals' susceptibility to developing symptoms of AIDS, 549 men who had been diagnosed HIV positive were classified according to whether they carried two particular alleles (DRB1*0702 and DQA1*0201). The results were as follows:

                    Condition of individual
                    Free of symptoms    Early symptoms    Suffering from AIDS    Total
Alleles present           24                  7                  17                48
Alleles absent            98                 93                 310               501
Total                    122                100                 327               549

Test whether there is an association between the presence of the alleles and the classification into the three AIDS statuses using these results. [5]

10.17 (Exam style) Insurance claims (in £) arriving at an office over the last month have been analysed. The results are as follows:

Claim size, c    0 ≤ c < 500    500 ≤ c < 1,000    1,000 ≤ c < 2,500    over 2,500
No. of claims        75               51                  22                 5

(i) Assuming that the maximum claim amount is £10,000:
    (a) calculate the sample mean of the data
    (b) test at the 5% level whether an exponential distribution with parameter $\lambda$ is an appropriate distribution for the claim sizes. You should estimate the value of $\lambda$ using the method of moments. [6]

An actuary decides to investigate whether claim sizes vary according to the postcode of residence of the claimant. She splits the data into the three different postcodes observed. The results for the first two postcodes are given below:

Postcode 1:

Claim size, c    0 ≤ c < 500    500 ≤ c < 1,000    1,000 ≤ c < 2,500    over 2,500
No. of claims        23               14                   7                 3

Postcode 2:

Claim size, c    0 ≤ c < 500    500 ≤ c < 1,000    1,000 ≤ c < 2,500    over 2,500
No. of claims        30               16                  11                 1

(ii) Test at the 5% level whether claim sizes are independent of the postcodes. [8]
[Total 14]

10.18 (Exam style) A politician has said: "A recent study in a particular area showed that 25% of the 400 teenagers who were living in single-parent families had been in trouble with the police, compared with only 20% of the 1,200 teenagers who were living in two-parent families. Our aim is to reduce the number of single-parent families in order to reduce the crime rates during the next decade."

(i) Carry out a contingency table test at the 5% significance level to assess whether there is a significant association between living in a single-parent family and getting into trouble with the police. [5]
(ii) Comment on the politician's statement. [1]
[Total 6]

10.19 (Exam style) A certain species of plant produces flowers which are either red, white or pink. It also produces leaves which may be either plain or variegated. For a sample of 500 plants, the distribution of flower colour and leaf type was:

              Red    White    Pink
Plain          97      42      77
Variegated    105     148      31

(i) Test whether these results indicate any association between flower colour and leaf type. [6]

A genetic model suggests that the proportions of each combination should be as follows:

              Red    White     Pink
Plain          q      q/2      (1 - 3q)/2
Variegated     q      3q/2     (1 - 5q)/2

where $q$ $(0 < q < 1/5)$ is an unknown parameter.

(ii) (a) Show that the maximum likelihood estimate for $q$ is 0.181.
     (b) Test whether this genetic model fits the data well. [12]
(iii) Comment briefly on your conclusions. [3]
[Total 21]

10.20 (Exam style) A particular area in a town suffers a high burglary rate.
A sample of 100 streets is taken, and in each of the sampled streets, a sample of six similar houses is taken. The table below shows the number of sampled houses which have had burglaries during the last six months.

No. of houses burgled, x    0    1    2    3    4    5    6
No. of streets, f          39   38   18    4    0    1    0

(i) (a) State any assumptions needed to justify the use of a binomial model for the number of houses per street which have been burgled during the last six months.
    (b) Derive the maximum likelihood estimate of $p$, the probability that a house of the type sampled has been burgled during the last six months.
    (c) Determine the probabilities for the binomial model using your estimate of $p$.
    (d) Comment on the fit without doing a formal test. [10]

An insurance company works on the basis that the probability of a house being burgled over a six-month period is 0.18.

(ii) Carry out a test to investigate whether the binomial model with this value of $p$ provides a good fit for the data. [7]
[Total 17]

10.21 Exam style

It is desired to investigate the level of premium charged by two companies for contents policies for houses in a certain area. Random samples of 10 houses insured by Company A are compared with 10 similar houses insured by Company B. The premiums charged in each case are as follows:

Company A   117   154   166   189   190   202   233   263   289   331
Company B   142   160   166   188   221   241   276   279   284   302

The line plots below show the sample values for the two companies:

[Line plots: sample values for Company A and Company B, each on an axis running from 100 to 350]

(i) Comment briefly on the validity of the assumptions required for a two-sample t test for the premiums of these two companies using the plots. [2]

For these data: $\sum A = 2{,}134$, $\sum A^2 = 494{,}126$, $\sum B = 2{,}259$, $\sum B^2 = 541{,}463$.

(ii) Carry out a formal test to check that it is appropriate to apply a two-sample t test to these data, assuming that the premiums are normally distributed.
[4]

(iii) Test whether the level of premiums charged by Company B was significantly higher than that charged by Company A, stating the p value and conclusion clearly. [3]
(iv) (a) Calculate a 95% confidence interval for the difference between the proportions of premiums of each company that are in excess of £200.
     (b) Comment briefly on your result to part (iv)(a). [3]

The average premium charged by Company A in the previous year was £170.

(v) Test whether Company A appears to have increased its premiums since the previous year. [3]
[Total 15]

Chapter 10 Solutions

10.1

The hypotheses are:

$H_0$: The campaign has not led to a reduction in smoking related illnesses.
$H_1$: The campaign has led to a reduction in smoking related illnesses.

(i) Conclusion for a test of size 10%

Since the calculated probability value (7%) is less than the size of the test (10%), we have sufficient evidence at the 10% level to reject $H_0$. Therefore the campaign has led to a reduction in the mean number of smoking related illnesses at the 10% level.

(ii) Conclusion for a test of size 5%

Since the calculated probability value (7%) is greater than the size of the test (5%), we have insufficient evidence at the 5% level to reject $H_0$. Therefore the campaign has not led to a reduction in the mean number of smoking related illnesses at the 5% level.

10.2 (i)(a) Test mean when population variance unknown

We are testing:

$H_0: \mu = 15$ vs $H_1: \mu \ne 15$

Since the variance is unknown, the test statistic is:

$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$$

From the data, we have:

$$\bar{x} = \frac{120.19}{10} = 12.019, \qquad s^2 = \frac{1}{9}\left(1{,}693.6331 - 10 \times 12.019^2\right) = 27.674$$

This gives a statistic of:

$$t = \frac{12.019 - 15}{\sqrt{27.674/10}} = -1.792$$

This is greater than the $t_9$ critical value of $-2.262$, so there is insufficient evidence at the 5% level to reject $H_0$. Therefore it is reasonable to conclude that $\mu = 15$.

Alternatively, using probability values, we have $P(t_9 < -1.792) \approx 0.055$. This test is two-sided, so the probability of obtaining a value at least as extreme as the one actually obtained is $2 \times 0.055 = 0.11$. This is greater than 0.05, so we have insufficient evidence to reject $H_0$ at the 5% level.

(i)(b) Test mean when population variance known

We are testing:

$H_0: \mu = 15$ vs $H_1: \mu \ne 15$

Since the variance is known, we can use:

$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$$

This gives:

$$z = \frac{12.019 - 15}{\sqrt{20/10}} = -2.108$$

This is less than the critical value of $-1.96$, so there is sufficient evidence at the 5% level to reject $H_0$. Therefore it is reasonable to conclude that $\mu \ne 15$.

Alternatively, using probability values, we have $P(Z < -2.108) = 0.0175$. This test is two-sided, so the probability of obtaining a value at least as extreme as the one actually obtained is $2 \times 0.0175 = 0.035$, which is less than 0.05. So we have sufficient evidence to reject $H_0$ at the 5% level.

(ii) Test variance

We are testing:

$H_0: \sigma^2 = 20$ vs $H_1: \sigma^2 \ne 20$

We know that $\dfrac{(n-1)S^2}{\sigma^2}$ has a $\chi^2_{n-1}$ distribution. The observed value of the test statistic is:

$$\frac{9 \times 27.674}{20} = 12.45$$

The critical values of $\chi^2_9$ are 2.700 and 19.02 for a two-sided test. So we have insufficient evidence at the 5% level to reject $H_0$. Therefore it is reasonable to conclude that $\sigma^2 = 20$.

10.3

To test whether tails is more than twice as likely, we use the hypotheses:

$H_0: p = \tfrac{2}{3}$ vs $H_1: p > \tfrac{2}{3}$

Let $X$ be the number of tails obtained in the experiment. Then:

$$X \sim Bin(25, p) \approx N(25p,\, 25pq) \quad\Rightarrow\quad \frac{X - 25p}{\sqrt{25pq}} \approx N(0,1)$$

Under $H_0$, the statistic with continuity correction is:

$$z = \frac{17\tfrac{1}{2} - 16\tfrac{2}{3}}{\sqrt{5\tfrac{5}{9}}} = 0.354$$

This is less than the critical value of 1.645, so there is insufficient evidence at the 5% level to reject $H_0$. Therefore it is reasonable to conclude that $p = \tfrac{2}{3}$, ie the experiment does not provide enough evidence to show that tails is more than twice as likely as heads.

Alternatively, using probability values, we have $P(Z > 0.354) = 0.362$, which is greater than 0.05. So we have insufficient evidence to reject $H_0$ at the 5% level.

10.4

We are testing:

$H_0: \mu_A = \mu_B$ vs $H_1: \mu_A \ne \mu_B$

The test statistic is:

$$\frac{(\bar{X}_A - \bar{X}_B) - (\mu_A - \mu_B)}{S_P\sqrt{\dfrac{1}{n_A} + \dfrac{1}{n_B}}} \sim t_{n_A + n_B - 2} \qquad\text{where}\qquad S_P^2 = \frac{(n_A - 1)S_A^2 + (n_B - 1)S_B^2}{n_A + n_B - 2}$$

The observed value of the pooled variance is:

$$s_P^2 = \frac{9 \times 12.4 + 4 \times 25.8}{13} = 16.52$$

So the value of the test statistic is:

$$\frac{4.5 - 0}{\sqrt{16.52\left(\dfrac{1}{10} + \dfrac{1}{5}\right)}} = 2.021$$

This lies between the $t_{13}$ critical values of $\pm 2.160$, so we have insufficient evidence at the 5% level to reject $H_0$. Therefore we conclude that $\mu_A = \mu_B$.

Alternatively, using probability values, we have $P(t_{13} > 2.021) \approx 0.034$. This test is two-sided, so the probability of obtaining a value at least as extreme as the one actually obtained is $2 \times 0.034 = 0.068$. This is greater than 0.05, so we have insufficient evidence to reject $H_0$ at the 5% level.

10.5

We are testing:

$H_0$: The populations both have normal distributions
vs
$H_1$: At least one of the populations does not have a normal distribution.

If $H_0$ is true, then:

$$\frac{S_X^2/\sigma_X^2}{S_Y^2/\sigma_Y^2} \sim F_{4,9}$$

Since we know that $\sigma_X^2 = \sigma_Y^2$, this test statistic is just $S_X^2/S_Y^2$, which has an observed value of $47.0/12.6 = 3.730$.

The 5% critical values for an $F_{4,9}$ distribution are 0.1123 and 4.718. Since 3.730 lies between these, we have insufficient evidence at the 5% level to reject $H_0$. Therefore we conclude that the populations are both normal.

This is a slightly unusual application of the F test, which is usually used to test variances for populations that are assumed to have a normal distribution.

10.6

The hypotheses are:

$H_0: \mu = \mu_0$ vs $H_1: \mu = \mu_1$ (where $\mu_1 > \mu_0$)

Here, we can use the likelihood ratio criterion, which says that we should reject $H_0$ if:

$$\frac{\text{Likelihood under } H_0}{\text{Likelihood under } H_1} < \text{critical value}$$

Since the populations are normal, this is:

$$\frac{\prod_{i=1}^{n} \dfrac{1}{\sigma\sqrt{2\pi}} \exp\left[-\dfrac{1}{2}\left(\dfrac{x_i - \mu_0}{\sigma}\right)^2\right]}{\prod_{i=1}^{n} \dfrac{1}{\sigma\sqrt{2\pi}} \exp\left[-\dfrac{1}{2}\left(\dfrac{x_i - \mu_1}{\sigma}\right)^2\right]} < \text{constant}$$

Cancelling the constants reduces this to:

$$\frac{\exp\left[-\dfrac{1}{2\sigma^2}\sum (x_i - \mu_0)^2\right]}{\exp\left[-\dfrac{1}{2\sigma^2}\sum (x_i - \mu_1)^2\right]} < \text{constant}$$

Taking logs:

$$-\frac{1}{2\sigma^2}\sum (x_i - \mu_0)^2 + \frac{1}{2\sigma^2}\sum (x_i - \mu_1)^2 < \text{constant}$$

Multiplying through by $2\sigma^2$ and expanding the squares:

$$-\sum \left(x_i^2 - 2\mu_0 x_i + \mu_0^2\right) + \sum \left(x_i^2 - 2\mu_1 x_i + \mu_1^2\right) < \text{constant}$$

Simplifying this gives:

$$(\mu_0 - \mu_1)\sum x_i < \text{constant}$$

Since $\mu_1 > \mu_0$, we have to reverse the inequality when we divide through by the negative quantity $\mu_0 - \mu_1$, and the test criterion reduces to:

$$\bar{x} > \text{constant}$$

So the best test requires us to reject $H_0$ if the sample mean exceeds a specified critical value.

10.7 (i)

From the question, we have the following outcomes from the blood test:

                                    Blood test result
Patient actually has condition    Positive   Negative
Yes                                  369         15
No                                    58        558
Total                                427        573

(a) Sensitivity $= \dfrac{\text{Number of true positives}}{\text{Total number of people with the condition}} = \dfrac{369}{369 + 15} = 96.1\%$

(b) Specificity $= \dfrac{\text{Number of true negatives}}{\text{Total number of people without the condition}} = \dfrac{558}{58 + 558} = 90.6\%$

(ii)

From the question, we have the following outcomes from the blood test:

                                    Blood test result
Patient actually has condition    Positive              Negative
Yes                               True positive (TP)    False negative (FN)
No                                False positive (FP)   True negative (TN)
Total                             544                   456                   1000

Rearranging the sensitivity to express the false negatives in terms of true positives:

$$\text{Sensitivity} = \frac{TP}{TP + FN} = 80\% \;\Rightarrow\; TP = 0.8(TP + FN) \;\Rightarrow\; FN = \tfrac{1}{4}TP$$

Rearranging the specificity to express the true negatives in terms of false positives:

$$\text{Specificity} = \frac{TN}{FP + TN} = 60\% \;\Rightarrow\; TN = 0.6(FP + TN) \;\Rightarrow\; TN = \tfrac{3}{2}FP$$

Using the given number of positive test results, we get:

$$TP + FP = 544 \qquad (1)$$

Working with the number of negative test results and expressing in terms of numbers of true and false positives:

$$FN + TN = 456 \;\Rightarrow\; \tfrac{1}{4}TP + \tfrac{3}{2}FP = 456 \;\Rightarrow\; TP + 6FP = 1{,}824 \qquad (2)$$

(a) Subtracting the first equation from the second gives:

$$5FP = 1{,}280 \;\Rightarrow\; FP = 256$$

(b) Substituting the false positives into the first equation gives $TP = 288$.
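The arithmetic in Solution 10.7 can be checked with a short Python sketch. The function names (`sensitivity`, `specificity`, `solve_counts`) are illustrative assumptions and not part of the course material; exact fractions are used so that the results agree with the hand calculation.

```python
from fractions import Fraction


def sensitivity(tp, fn):
    """Proportion of people with the condition who test positive."""
    return Fraction(tp, tp + fn)


def specificity(tn, fp):
    """Proportion of people without the condition who test negative."""
    return Fraction(tn, tn + fp)


def solve_counts(sens, spec, positives, negatives):
    """Recover (TP, FP, FN, TN) from the two rates and the test-result totals.

    Uses FN = TP * (1 - sens) / sens and TN = FP * spec / (1 - spec),
    together with TP + FP = positives and FN + TN = negatives.
    """
    fn_per_tp = (1 - sens) / sens
    tn_per_fp = spec / (1 - spec)
    fp = (negatives - fn_per_tp * positives) / (tn_per_fp - fn_per_tp)
    tp = positives - fp
    return tp, fp, fn_per_tp * tp, tn_per_fp * fp


# Part (i): counts taken from the blood test table.
sens_i = sensitivity(tp=369, fn=15)
spec_i = specificity(tn=558, fp=58)
print(f"sensitivity = {float(sens_i):.1%}, specificity = {float(spec_i):.1%}")

# Part (ii): sensitivity 80%, specificity 60%, 544 positive and 456 negative results.
tp, fp, fn, tn = solve_counts(Fraction(4, 5), Fraction(3, 5), 544, 456)
print(f"TP = {tp}, FP = {fp}, FN = {fn}, TN = {tn}")
```

The four counts returned for part (ii) add up to the 1,000 people tested, which is a quick consistency check on the algebra.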
10.8

We are testing:

$H_0: \mu = 7\text{cm}$ vs $H_1: \mu \ne 7\text{cm}$ ($\sigma^2$ unknown)

Under $H_0$, the statistic $\dfrac{\bar{X} - \mu}{S/\sqrt{n}}$ has a $t_{11}$ distribution.

So the value of our test statistic is:

$$\frac{8.54 - 7}{2.97/\sqrt{12}} = 1.796 \qquad [1]$$

Comparing this with the tables of the $t_{11}$ distribution, we find that $P(t_{11} > 1.796) = 5\%$. [1]

Hence, we have a probability value of $2 \times 5\% = 10\%$, as the test is two-sided. [1]

10.9 (i) Test whether new screening programme reduces mean claim amount

We are testing:

$H_0: \mu_1 = \mu_2$ vs $H_1: \mu_2 < \mu_1$ [½]

where subscript 1 refers to without screening.

The test statistic is:

$$\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_P\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim t_{n_1 + n_2 - 2} \qquad\text{where}\qquad S_P^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} \qquad [½]$$

Calculating the observed values:

$$\bar{x}_1 = \frac{267.4}{10} = 26.74, \qquad \bar{x}_2 = \frac{200.7}{10} = 20.07 \qquad [½]$$

$$s_1^2 = \frac{1}{9}\left(7{,}755.16 - 10 \times 26.74^2\right) = 67.2093 \qquad [½]$$

$$s_2^2 = \frac{1}{9}\left(4{,}553.97 - 10 \times 20.07^2\right) = 58.4357 \qquad [½]$$

$$s_P^2 = \frac{9 \times 67.2093 + 9 \times 58.4357}{18} = 62.8225 \qquad [½]$$

So the value of the test statistic is:

$$\frac{(26.74 - 20.07) - 0}{\sqrt{62.8225 \times \dfrac{2}{10}}} = 1.882 \qquad [1]$$

This is greater than the $t_{18}$ critical value of 1.734, so we have sufficient evidence at the 5% level to reject $H_0$. Therefore we conclude that $\mu_2 < \mu_1$. [1]

Alternatively, using probability values, we have $P(t_{18} > 1.882) \approx 0.04$, which is less than 0.05. So we have sufficient evidence to reject $H_0$ at the 5% level.

(ii) Test equality of variances

We are testing:

$H_0: \sigma_1^2 = \sigma_2^2$ vs $H_1: \sigma_1^2 \ne \sigma_2^2$ [½]

The test statistic is:

$$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{n_1 - 1,\, n_2 - 1} \qquad [½]$$

Under $H_0$, the value of the test statistic is:

$$\frac{67.2093}{58.4357} = 1.150 \qquad [1]$$

The 5% critical values for an $F_{9,9}$ distribution are 0.2484 and 4.026. Since 1.150 lies between these, we have insufficient evidence at the 5% level to reject $H_0$. Therefore we conclude that $\sigma_1^2 = \sigma_2^2$ (and hence the assumption required for part (i) seems valid).
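The two statistics in Solution 10.9 can be reproduced from the summary figures with a minimal Python sketch. The helper names (`sample_variance`, `pooled_t`) are assumptions made for illustration, and the critical values are simply the tabulated figures quoted in the solution rather than values computed by the code.

```python
import math


def sample_variance(n, total, total_sq):
    """Unbiased sample variance from n, sum(x) and sum(x^2)."""
    mean = total / n
    return (total_sq - n * mean ** 2) / (n - 1)


def pooled_t(n1, mean1, var1, n2, mean2, var2):
    """Two-sample t statistic with pooled variance (equal variances assumed)."""
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


n1 = n2 = 10
mean1, mean2 = 267.4 / n1, 200.7 / n2
var1 = sample_variance(n1, 267.4, 7755.16)
var2 = sample_variance(n2, 200.7, 4553.97)

t_stat = pooled_t(n1, mean1, var1, n2, mean2, var2)
f_stat = var1 / var2

# Tabulated critical values quoted in the solution (t_18 upper 5%, F_9,9 two-sided 5%).
reject_equal_means = t_stat > 1.734
reject_equal_variances = not (0.2484 < f_stat < 4.026)
print(round(t_stat, 3), round(f_stat, 3), reject_equal_means, reject_equal_variances)
```

Running the variance-ratio check first mirrors the logic of the solution: the pooled t test in part (i) is only appropriate if the equal-variance assumption in part (ii) is not rejected.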
[1]

10.10 (i) Test if chemicals are present

We are testing the proportion $p$ of deformed animals using the hypotheses:

$H_0: p = 0.005$ vs $H_1: p > 0.005$ [½]

Let $X$ be the number of deformed animals obtained. Then:

$$X \sim Bin(1{,}000, p) \approx N(1{,}000p,\, 1{,}000pq) \quad\Rightarrow\quad \frac{X - 1{,}000p}{\sqrt{1{,}000pq}} \approx N(0,1) \qquad [½]$$

Under $H_0$, the statistic with continuity correction is:

$$\frac{14.5 - 5}{\sqrt{4.975}} = 4.26 \qquad [1]$$

This is greater than the 1% critical value of 2.3263, so there is sufficient evidence at the 1% level to reject $H_0$. Therefore we conclude that $p > 0.005$, ie there are harmful chemicals present in the lake. [1]

Alternatively, using probability values, we have $P(Z > 4.26) \approx 0.00001$, which is very significant.

(ii) Test if there has been a significant reduction in deformed animals

We are testing:

$H_0: p_1 = p_2$ vs $H_1: p_2 < p_1$ [½]

where the subscript 1 refers to before and 2 refers to after.

The test statistic is:

$$\frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n_1} + \dfrac{\hat{p}(1 - \hat{p})}{n_2}}} \approx N(0,1) \qquad [½]$$

Here we have:

$$\hat{p}_1 = \frac{15}{1{,}000} = 0.015, \qquad \hat{p}_2 = \frac{10}{800} = 0.0125, \qquad \hat{p} = \frac{25}{1{,}800} = 0.0139 \qquad [½]$$

which gives us a value of 0.450 for our test statistic. [½]

This is less than the critical value of 1.6449, so there is insufficient evidence at the 5% level to reject $H_0$. Therefore it is reasonable to conclude that $p_1 = p_2$ (ie there has not been a significant reduction in the proportion of deformed animals in the lake). [1]

Alternatively, using probability values, we have $P(Z > 0.450) = 0.326$, which is greater than 0.05, so we have insufficient evidence to reject $H_0$ at the 5% level.

10.11 (i) Test whether mean home and car claims are equal

We are testing:

$H_0: \mu_H = \mu_C$ vs $H_1: \mu_H \ne \mu_C$ [½]

The test statistic is:

$$\frac{(\bar{X}_H - \bar{X}_C) - (\mu_H - \mu_C)}{S_P\sqrt{\dfrac{1}{n_H} + \dfrac{1}{n_C}}} \sim t_{n_H + n_C - 2} \qquad\text{where}\qquad S_P^2 = \frac{(n_H - 1)S_H^2 + (n_C - 1)S_C^2}{n_H + n_C - 2} \qquad [½]$$

Calculating the observed values:

$$\bar{x}_H = \frac{78.8}{5} = 15.76, \qquad \bar{x}_C = \frac{88.3}{5} = 17.66 \qquad [½]$$

$$s_H^2 = \frac{1}{4}\left(1{,}271.34 - 5 \times 15.76^2\right) = 7.363 \qquad [½]$$

$$s_C^2 = \frac{1}{4}\left(1{,}631.93 - 5 \times 17.66^2\right) = 18.138 \qquad [½]$$

$$s_P^2 = \frac{4 \times 7.363 + 4 \times 18.138}{8} = 12.7505 \qquad [½]$$

The value of the test statistic is:

$$\frac{(15.76 - 17.66) - 0}{\sqrt{12.7505 \times \dfrac{2}{5}}} = -0.841 \qquad [1]$$

This lies between the $t_8$ critical values of $\pm 2.306$, so we have insufficient evidence at the 5% level to reject $H_0$. Therefore we conclude that $\mu_H = \mu_C$. [1]

Alternatively, using probability values, we have $P(t_8 < -0.841) \approx 0.21$. This test is two-sided, so the probability of obtaining a value at least as extreme as the one obtained is $2 \times 0.21 = 0.42$. This is much greater than 0.05, so we have insufficient evidence to reject $H_0$ at the 5% level.

(ii) Paired t-test

Since the data are paired, we are testing:

$H_0: \mu_D = 0$ vs $H_1: \mu_D \ne 0$

The differences $D$ for each pair are:

Sample 2 − Sample 1:   1.0   1.8   −0.1   1.6   5.2   [½]

(The third difference must be negative for the totals $\sum d = 9.5$ and $\sum d^2 = 33.85$ used below to hold.)

Now:

$$\bar{x}_D = \frac{9.5}{5} = 1.9, \qquad s_D^2 = \frac{1}{4}\left(33.85 - 5 \times 1.9^2\right) = 3.95 \qquad [½]$$

So the observed value of the test statistic is:

$$\frac{\bar{x}_D - \mu_D}{s_D/\sqrt{n}} = \frac{1.9 - 0}{\sqrt{3.95/5}} = 2.138 \qquad [1]$$

This lies between the $t_4$ critical values of $\pm 2.776$, so we have insufficient evidence at the 5% level to reject $H_0$. Therefore we conclude that $\mu_D = 0$. [1]

Alternatively, using probability values, we have $P(t_4 > 2.138) \approx 0.05$. This test is two-sided, so the probability of obtaining a value at least as extreme as the one actually obtained is $2 \times 0.05 = 0.1$. This is greater than 0.05, so we have insufficient evidence to reject $H_0$ at the 5% level.

10.12 (i) Size of the test

The size of a test, $\alpha$, is the probability of a Type I error, ie the probability of rejecting $H_0$ when it is true:

$$\alpha = P(X > 93.5 \text{ when } \lambda = 50) \qquad [1]$$

$$= \int_{93.5}^{\infty} 3 \times 50^3 (50 + x)^{-4}\, dx = \left[-50^3 (50 + x)^{-3}\right]_{93.5}^{\infty} = 0.0423$$

The size of the test is 4.23%. [1]

(ii) Power of the test

The power of a test, $1 - \beta$, is the probability of rejecting $H_0$ when it is false.

$$1 - \beta = P(X > 93.5 \text{ when } \lambda = 60) \qquad [1]$$

$$= \int_{93.5}^{\infty} 3 \times 60^3 (60 + x)^{-4}\, dx = \left[-60^3 (60 + x)^{-3}\right]_{93.5}^{\infty} = 0.0597$$

The power of the test is 5.97%. [1]

10.13 (i)(a) State the interviewee's hypotheses

The interviewee appears to be assuming (with the benefit of hindsight) a two-sided alternative hypothesis that includes both very good results and very bad results, ie the hypotheses (expressed in terms of the probability $p$ of a correct identification) would be:

$H_0: p = 0.2$ vs $H_1: p \ne 0.2$ [1]

(i)(b) Interviewee's test

Under $H_0$, the number of correctly identified patterns has a $Bin(10, 0.2)$ distribution. The probability of getting as few as 0 correct is:

$$\binom{10}{0}(0.2)^0(0.8)^{10} = 0.107$$

The additional probability for the other tail can only increase this value. So the result is not significant even at the 10% level. [1]

So, even after bending the rules, the interviewee has failed to demonstrate her powers. [1]

(ii)(a) Correct hypotheses

The hypotheses to use in a one-sided test designed to convince non-believers should be:

$H_0: p = 0.2$ vs $H_1: p > 0.2$

(ii)(b) Number of cards required to be correct

Calculating the probabilities for the $Bin(10, 0.2)$ distribution (iteratively) shows that:

$$P[Bin(10, 0.2) \le 4] = 0.1074 + 0.2684 + 0.3020 + 0.2013 + 0.0881 = 0.9672 \qquad [1]$$

So the interviewee would have to identify at least 5 cards correctly to demonstrate the existence of ESP at the 5% level. (The actual size of the test is 3.28%.) [1]

10.14 (i) Maximum likelihood estimate of p

The likelihood of observing the given sample is:

$$L = C\left[(1 - p)^3\right]^{60}\left[3p(1 - p)^2\right]^{75}\left[3p^2(1 - p)\right]^{16}\left[p^3\right]^{2}$$

$$= K(1 - p)^{180} \times p^{75}(1 - p)^{150} \times p^{32}(1 - p)^{16} \times p^{6} = K(1 - p)^{346}p^{113} \qquad [1]$$

where $C$ is a constant arising from the fact that the sample can occur in different orders.
Taking logs:

$$\ln L = \ln K + 346\ln(1 - p) + 113\ln p$$

Differentiating with respect to $p$:

$$\frac{d\ln L}{dp} = -\frac{346}{1 - p} + \frac{113}{p} \qquad [1]$$

Setting this equal to zero:

$$346p = 113(1 - p) \;\Rightarrow\; 459p = 113 \;\Rightarrow\; \hat{p} = \frac{113}{459} = 0.246 \qquad [1]$$

Checking that we do have a maximum:

$$\frac{d^2\ln L}{dp^2} = -\frac{346}{(1 - p)^2} - \frac{113}{p^2} < 0 \;\Rightarrow\; \text{max} \qquad [1]$$

So $\hat{p} = \dfrac{113}{459}$.

(ii) Goodness-of-fit test

We are testing the following hypotheses using a $\chi^2$ goodness-of-fit test:

$H_0$: the probabilities conform to a $Bin(3, p)$ distribution
vs
$H_1$: the probabilities do not conform to a $Bin(3, p)$ distribution

Using $\hat{p} = \dfrac{113}{459}$ from part (i), the probabilities for this binomial distribution are:

$$P(X = 0) = (1 - p)^3 = 0.4283$$
$$P(X = 1) = 3p(1 - p)^2 = 0.4197$$
$$P(X = 2) = 3p^2(1 - p) = 0.1371$$
$$P(X = 3) = p^3 = 0.0149 \qquad [1]$$

Multiplying these by 153, we obtain expected values of 65.54, 64.21, 20.97 and 2.283. Since the last of these expected values is less than 5, we need to combine this with another group, say the third one. This gives:

Number of claims             0       1      2 and 3
Observed no. of policies    60      75        18
Expected no. of policies   65.54   64.21     23.25      [1]

The number of degrees of freedom $= 3 - 1 - 1 = 1$. [1]

The observed value of the test statistic is:

$$\sum \frac{(O_i - E_i)^2}{E_i} = \frac{(60 - 65.54)^2}{65.54} + \frac{(75 - 64.21)^2}{64.21} + \frac{(18 - 23.25)^2}{23.25} = 0.4683 + 1.813 + 1.185 = 3.47 \qquad [1]$$

Since this is less than the 5% critical value of 3.841, we have insufficient evidence at the 5% level to reject $H_0$. We therefore conclude that the model is a good fit. [1]

10.15 (i)(a) Estimate the Poisson parameter

The maximum likelihood estimator of the Poisson parameter (representing the average number of corpuscles in each square) is the sample mean, which is:

$$\frac{0 \times 40 + 1 \times 66 + 2 \times 93 + \cdots + 8 \times 1}{400} = \frac{1{,}034}{400} = 2.585 \qquad [1]$$

(i)(b) Goodness-of-fit test

The hypotheses are:

$H_0$: The observed numbers conform to a Poisson distribution
vs
$H_1$: The observed numbers don't conform to a Poisson distribution. [½]

We can use the estimate from part (i)(a) to calculate the expected numbers using the Poisson PF, eg:

$$P(X = 0) = e^{-2.585} = 0.07540 \;\Rightarrow\; 30.16 \text{ cells} \qquad [½]$$

Corpuscle count      0      1       2      3      4      5      6     7    ≥8
Actual number       40     66      93     94     62     25     14     5     1
Expected number    30.2   78.0   100.8   86.8   56.1   29.0   12.5   4.6   2.0     [2]

If we pool the groups for counts of 7 or more, the observed value of the test statistic is:

$$\sum \frac{(O - E)^2}{E} = \frac{(40 - 30.2)^2}{30.2} + \frac{(66 - 78.0)^2}{78.0} + \cdots + \frac{(6 - 6.6)^2}{6.6}$$

$$= 3.180 + 1.846 + 0.604 + 0.597 + 0.620 + 0.552 + 0.18 + 0.055 = 7.63 \qquad [2]$$

The number of degrees of freedom is $8 - 1 - 1 = 6$. [1]

Since this is less than the 5% critical value of 12.59, we have insufficient evidence at the 5% level to reject $H_0$. We therefore conclude that the model is a good fit. [1]

(ii) Test if patient has anaemia

We are testing:

$H_0: \mu = 3$ vs $H_1: \mu < 3$ [½]

Let $X$ be the count per cell. Then:

$$\sum X \sim Poi(400\mu) \approx N(400\mu,\, 400\mu) \quad\Rightarrow\quad \frac{\sum X - 400\mu}{\sqrt{400\mu}} \approx N(0,1) \qquad [½]$$

Under $H_0$, the statistic with continuity correction is:

$$z = \frac{1{,}034.5 - 1{,}200}{\sqrt{1{,}200}} = -4.78 \qquad [1]$$

This is less than the 1% critical value of $-2.3263$, so there is sufficient evidence at the 1% level to reject $H_0$. Therefore we conclude that $\mu < 3$, ie the patient does have anaemia. [1]

Alternatively, using probability values, we have $P(Z < -4.78) < 0.0005\%$, which is highly significant.

10.16

Here we are testing:

$H_0$: The classification into the three AIDS statuses is independent of the presence or absence of the alleles
vs
$H_1$: The classification into the three AIDS statuses is not independent of the presence or absence of the alleles.
The expected frequencies, calculated using $\dfrac{\text{row total} \times \text{column total}}{\text{grand total}}$, are: [½]

EXPECTED           Free of symptoms   Early symptoms   Suffering from AIDS   Total
Alleles present          10.7               8.7               28.6             48
Alleles absent          111.3              91.3              298.4            501
Total                   122               100                327              549     [1]

The value of the chi-square test statistic is:

$$\sum \frac{(O_i - E_i)^2}{E_i} = \frac{(24 - 10.7)^2}{10.7} + \cdots + \frac{(310 - 298.4)^2}{298.4} = 23.79 \qquad [2]$$

The test statistic is sensitive to rounding.

The number of degrees of freedom is given by $(2 - 1)(3 - 1) = 2$. [½]

Since the test statistic is greater than the ½% $\chi^2_2$ critical value of 10.60, we can reject $H_0$ at the ½% level, and conclude that the classification into the three AIDS statuses is not independent of the presence or absence of the alleles. [1]

10.17 (i)(a) Sample mean

The sample mean is:

$$\frac{250 \times 75 + 750 \times 51 + 1{,}750 \times 22 + 6{,}250 \times 5}{75 + 51 + 22 + 5} = \frac{126{,}750}{153} = 828.43$$

(i)(b) Goodness of fit of exponential distribution

We are testing:

$H_0$: the exponential is a suitable distribution
vs
$H_1$: the exponential is not a suitable distribution.

We first need to estimate the value of $\lambda$ using the method of moments. The mean of the claim amount distribution is $1/\lambda$. Setting this equal to the sample mean gives a value of 0.0012071 for $\lambda$. [1]

The probability that an exponential random variable lies between $a$ and $b$ is:

$$F(b) - F(a) = e^{-\lambda a} - e^{-\lambda b}$$

So if the claim amount is $X$, we have:

$$P(0 < X < 500) = e^{0} - e^{-500\lambda} = 1 - e^{-500\lambda} = 0.4531$$
$$P(500 < X < 1{,}000) = e^{-500\lambda} - e^{-1{,}000\lambda} = 0.2478$$
$$P(1{,}000 < X < 2{,}500) = e^{-1{,}000\lambda} - e^{-2{,}500\lambda} = 0.2502$$
$$P(2{,}500 < X < 10{,}000) = e^{-2{,}500\lambda} - e^{-10{,}000\lambda} = 0.0489 \qquad [1]$$

Multiplying these figures by 153, we obtain the expected values 69.33, 37.91, 38.27 and 7.48 respectively.

We then calculate the test statistic $\sum \dfrac{(O_i - E_i)^2}{E_i}$:

$$\frac{(75 - 69.33)^2}{69.33} + \frac{(51 - 37.91)^2}{37.91} + \frac{(22 - 38.27)^2}{38.27} + \frac{(5 - 7.48)^2}{7.48} = 12.7 \qquad [1]$$

The underlying distribution is $\chi^2$ with $4 - 1 - 1 = 2$ degrees of freedom (since we have set the total and estimated the mean from the data). [1]

The critical value of the $\chi^2_2$ distribution is 5.991, so we have evidence to reject $H_0$ at the 5% level and conclude that the exponential is not an appropriate distribution. [1]

(ii) Contingency table

We are testing:

$H_0$: the claim size is independent of postcode
vs
$H_1$: the claim size is not independent of postcode

The observed values in each of the categories are:

Claim size, c   0 ≤ c < 500   500 ≤ c < 1,000   1,000 ≤ c < 2,500   2,500 ≤ c < 10,000   Total
Postcode 1          23              14                  7                    3             47
Postcode 2          30              16                 11                    1             58
Postcode 3          22              21                  4                    1             48
Total               75              51                 22                    5            153     [1]

We can calculate the expected frequencies in each category by multiplying the row and column totals, and dividing by 153:

Claim size, c   0 ≤ c < 500   500 ≤ c < 1,000   1,000 ≤ c < 2,500   2,500 ≤ c < 10,000
Postcode 1         23.04           15.67               6.76                 1.54
Postcode 2         28.43           19.33               8.34                 1.90
Postcode 3         23.53           16.00               6.90                 1.57            [3]

Since there are three cells containing less than 5, we will combine the last two columns.

Claim size, c   0 ≤ c < 500   500 ≤ c < 1,000   1,000 ≤ c < 10,000
Postcode 1         23.04           15.67                8.29
Postcode 2         28.43           19.33               10.24
Postcode 3         23.53           16.00                8.47                               [1]

We can now calculate the observed value of the test statistic:

$$\chi^2 = \frac{(23 - 23.04)^2}{23.04} + \cdots + \frac{(5 - 8.47)^2}{8.47} = 4.58 \qquad [1]$$

The number of degrees of freedom is $(3 - 1)(3 - 1) = 4$. [1]

The observed value of the test statistic does not exceed 9.488, the upper 5% point of the $\chi^2_4$ distribution. So we have insufficient evidence at the 5% level to reject $H_0$. Therefore we conclude that the claim size is independent of the postcode.
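Both chi-square calculations in Solution 10.17 can be checked with a minimal Python sketch using only the standard library. The variable and function names (`chi_square`, `gof_stat`, `indep_stat` and so on) are assumptions made for illustration. The code works with unrounded expected frequencies, so its results agree with the hand calculation only to the precision quoted there.

```python
import math


def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over matching cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Part (i): exponential fit, with lambda estimated by the method of moments.
bounds = [0, 500, 1000, 2500, 10000]
observed = [75, 51, 22, 5]
n = sum(observed)
midpoints = [(a + b) / 2 for a, b in zip(bounds, bounds[1:])]
sample_mean = sum(m * o for m, o in zip(midpoints, observed)) / n
lam = 1 / sample_mean
probs = [math.exp(-lam * a) - math.exp(-lam * b) for a, b in zip(bounds, bounds[1:])]
expected = [n * p for p in probs]
gof_stat = chi_square(observed, expected)
gof_dof = len(observed) - 1 - 1  # one parameter estimated from the data

# Part (ii): independence test after pooling the last two claim-size bands.
table = [[23, 14, 7, 3], [30, 16, 11, 1], [22, 21, 4, 1]]
pooled = [row[:2] + [row[2] + row[3]] for row in table]
row_totals = [sum(row) for row in pooled]
col_totals = [sum(col) for col in zip(*pooled)]
grand_total = sum(row_totals)
expected_table = [[r * c / grand_total for c in col_totals] for r in row_totals]
indep_stat = sum(chi_square(o, e) for o, e in zip(pooled, expected_table))
indep_dof = (len(pooled) - 1) * (len(pooled[0]) - 1)

print(round(sample_mean, 2), round(gof_stat, 1), gof_dof)
print(round(indep_stat, 2), indep_dof)
```

The critical values (5.991 and 9.488) still have to be read from the Tables; the sketch only reproduces the test statistics and degrees of freedom.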
10.18 (i) [1] Test for association The hypotheses for the test are: H0 : There is no association between living in a single parent family and getting into trouble with the police vs H1: There is an association between living in a single parent family and getting into trouble with the police. [1/2] The actual numbers in each category are: ACTUAL In trouble Notin trouble Total Single parent 100 300 400 Two parent 240 960 1,200 Total 340 1,260 1,600 IFE: 2022 Examinations The Actuarial Education Compan CS1-10: Hypothesis testing Page 83 The expected numbers for each category are: EXPECTED In trouble Notin trouble Total Single parent 85 315 400 Two parent 255 945 1,200 Total 340 1,260 1,600 [1] The observed value ofthe test statistic is: ?( 22 (100 - 85) -OE) E 85 =+(300 315)2 315 +(240--- 255)2 255 (960 + 945)2 945 = 4.482 [2] The number of degrees offreedom is (21--1)(2 1) =. [1/2] Sincethe observed value of the test statistic exceeds 3.841, the upper 5% point of the 2 ?1 distribution, wecan reject the null hypothesis and conclude that there is an association between single parent families and beingin trouble (ii) withthe police. [1] Comment However, the presence of an association does not justify the politicians assumption that single parents cause crime. There may be some other underlying causes (eg education levels, poverty) that influence family circumstances and crime rates together. 10.19 (i) [1] Testfor association The test required is a2? contingency table test. The hypotheses are: H0: Thereis no association between flower colour andleaf type H1: Thereis some association between flower colour andleaf type. vs [1] The expected frequencies are: Plain Variegated Red White Pink 87.3 82.1 46.7 114.7 107.9 61.3 [2] So the observed value of the test statistic is: ?( The Actuarial OE) (97-- 87.3) 22 E Education 87.3 Company =+ (31 - 61.3)2 61.3 +=? 
71.0 [2] IFE: 2022 Examination Page 84 CS1-10: Hypothesis Comparing this withthe figures in the Tablesfor the ?22 distribution, testing wesee that this figure is far larger than the 1/2%point ofthe distribution. Wehave overwhelming evidence against the null hypothesis, and weconclude that there is almost certainly some association between flower colour and leaf type. (ii)(a) [1] Maximumlikelihood estimate of q Assumingthat this genetic modelis correct, the likelihood function is: 97 ? ? ?22? 42 Lq ()=? q ?13 qq ? ? ? ? ? -qq392(1=- 3 ) 77(1 77 148 q105 3 q??? ??? 2 ??? 1--? 5q ? ? 2 ? 31 constant 5q) 31 constant [1] Takinglogs: log 392logLq=+ 77log(1 Differentiating - 3q) + 31log(1 - 5q) + constant [1] with respect to q: d logL=dq 392 q 231 155 13q 1-- 5q [1] Setting this equal to zero, and multiplying through 392(1 3 )(1-- 5 ) -qq by )qq (1q-- 3 )(1 5 , we obtain: 231 q(1 - 5 q) - 155 q(1 - 3 q) = 0 [1] Gathering terms: 392 2 3,5220qq -+ 7,500 = Solving the quadratic equation: 3,522 q==0.18128 If q = 0.28832 , then 3,522 2- 4 7,500 15,000 -15q 2 392 or 0.28832 is negative, so we can ignore the larger root. [2] Wecan check that this doesindeed give a maximum: d2 logL dq =- 392 q 22 693 - (1 3q) 775 2 - 2 <?0 (1-- 5q) max [1] Sothe maximumlikelihood estimate for qis q = 0.181. IFE: 2022 Examinations The Actuarial Education Compan CS1-10: Hypothesis (ii)(b) testing Page 85 Test goodness of fit Using q 0.181= , wecan find the expected frequencies by multiplying the probabilities by 500. This gives the following table of expected frequencies: Red White Pink Plain 90.6 45.3 114.0 Variegated 90.6 136.0 23.4 [2] Usinga chi-squared test, the hypotheses are: vs H0: The probabilities of each plant type conform to this genetic model H1: The probabilities of each plant type do not conform to this genetic model. The observed value of the test statistic is: ?( OE) (97-- 90.6) 22 =+ 90.6 E (31 23.4)2 +=? 
18.5 23.4 -   [2]

Comparing this value with the appropriate points of the $\chi^2_4$ distribution, we see that again we have strong evidence to reject $H_0$, and we conclude at the ½% level that this genetic model does not appear to fit the data well.   [1]

This time we are not testing for association, ie it is an ordinary chi-square goodness-of-fit test. So the number of degrees of freedom is the number of cells minus the number of estimated parameters minus 1. This gives us $6 - 1 - 1 = 4$ degrees of freedom here.

(iii) Comment

None of the models suggested here appear to fit the data well. Of the pink flowers, there appear to be far too many with plain leaves and far too few with variegated leaves than we would expect under the assumption of independence.

However, the genetic model in part (ii) appears to overcompensate for this, with the result that the actual number of pink flowers with plain leaves is smaller than that predicted by the model. A further model somewhere between the two models we have tried so far might give a better fit to the observed data.   [3]

10.20

(i)(a) State assumptions

Each house independently must have the same probability of being burgled.   [1]

(i)(b) Derive the maximum likelihood estimate of p

$$L(p) = [P(X=0)]^{39}\,[P(X=1)]^{38}\,[P(X=2)]^{18}\,[P(X=3)]^{4}\,P(X=5)$$   [1]

Using a $Bin(6, p)$ distribution to calculate the probabilities:

$$L(p) = c\,[(1-p)^6]^{39}\,[p(1-p)^5]^{38}\,[p^2(1-p)^4]^{18}\,[p^3(1-p)^3]^{4}\,p^5(1-p) = c\,p^{91}(1-p)^{509}$$

$$\Rightarrow \ln L(p) = \ln c + 91\ln p + 509\ln(1-p)$$   [1]

$$\Rightarrow \frac{\partial}{\partial p}\ln L(p) = \frac{91}{p} - \frac{509}{1-p}$$   [½]

Setting the differential equal to zero to obtain the maximum:   [½]

$$\frac{91}{\hat p} - \frac{509}{1-\hat p} = 0 \;\Rightarrow\; \hat p = \frac{91}{600}$$   [1]

Checking it's a maximum:

$$\frac{\partial^2}{\partial p^2}\ln L(p) = -\frac{91}{p^2} - \frac{509}{(1-p)^2} < 0 \;\Rightarrow\; \text{max}$$   [1]

Alternatively, since the binomial distribution is additive, we could have looked at a single $Bin(600, p)$ distribution instead.

(i)(c) Fit the binomial model and comment

Using the estimate $\hat p = \frac{91}{600}$ and $P(X=x) = \binom{6}{x} p^x (1-p)^{6-x}$ we get frequencies of 37.3, 40.0, 17.9, 4.3, 0.6, 0.0, 0.0.   [3½]

(i)(d) Comment

These are very similar to the observed frequencies, which implies that the model is a good fit.   [½]

(ii) Test whether binomial model with p = 0.18 is a good fit for the data

Using $p = 0.18$ and $P(X=x) = \binom{6}{x}\,0.18^x\,0.82^{6-x}$ we get:

              0       1       2       3       4       5       6
  Observed    39      38      18      4       0       1       0
  Expected    30.40   40.04   21.97   6.43    1.06    0.09    0.00
                                                                   [2]

Since the expected frequencies are less than five for 4, 5 and 6 houses burgled, we need to combine these columns together with the 3 column:

              0       1       2       3+
  Observed    39      38      18      5
  Expected    30.40   40.04   21.97   7.58
                                                                   [1]

The observed value of the test statistic is:

$$\chi^2 = \frac{(39-30.40)^2}{30.40} + \cdots + \frac{(5-7.58)^2}{7.58} = 4.13$$   [2]

There are now 4 groups so the number of degrees of freedom is $4 - 1 = 3$. Remember that the value for p of 0.18 was given and was not estimated using this data.   [1]

We are carrying out a one-sided test. Our observed value of the test statistic is less than the 5% critical value of 7.815. So we have insufficient evidence to reject $H_0$ at the 5% level. Therefore it is reasonable to conclude that the model is a good fit.   [1]

10.21

(i) Illustrate data and comment on the assumptions

There is perhaps some very slight evidence of concentration at the centre of the distribution for A, but the sample sizes are small and it is difficult to tell whether an assumption of normality is reasonable.

The variance of the data from Company B looks slightly smaller than that from Company A. However, it is unlikely that such a small difference is significant. There are no outliers in either distribution.   [2]

(ii) Test whether appropriate to apply a two-sample t test

We require the variances to be equal, so we are testing:

$$H_0: \sigma_A^2 = \sigma_B^2 \quad \text{vs} \quad H_1: \sigma_A^2 \ne \sigma_B^2$$

$$s_A^2 = \frac{1}{9}\left(494{,}126 - \frac{2{,}134^2}{10}\right) = 4{,}303.4 \qquad s_B^2 = \frac{1}{9}\left(541{,}463 - \frac{2{,}259^2}{10}\right) = 3{,}461.7$$
The test is based on the result:

$$\frac{S_A^2 / \sigma_A^2}{S_B^2 / \sigma_B^2} \sim F_{n_A - 1,\, n_B - 1}$$   [½]

The observed value of the test statistic is:

$$\frac{4{,}303.4}{3{,}461.7} = 1.243$$   [1]

We are carrying out a two-sided test. Comparing the statistic with the $F_{9,9}$ distribution, we see that it is less than the 5% critical value of 4.026.   [1]

So we have insufficient evidence at the 5% level to reject the null hypothesis. Therefore it is reasonable to conclude that $\sigma_A^2 = \sigma_B^2$.   [1½]

(iii) Test whether premiums charged by Company B were higher than those charged by Company A

We are testing:

$$H_0: \mu_A = \mu_B \quad \text{vs} \quad H_1: \mu_B > \mu_A$$   [½]

Under this null hypothesis, we use:

$$\frac{\bar X_B - \bar X_A}{S_P \sqrt{\dfrac{1}{n_A} + \dfrac{1}{n_B}}} \sim t_{n_A + n_B - 2}$$

Substituting in the values, we get a test statistic of:

$$\frac{225.9 - 213.4}{\sqrt{\dfrac{9 \times 4{,}303.4 + 9 \times 3{,}461.7}{18}\left(\dfrac{1}{10} + \dfrac{1}{10}\right)}} = 0.4486$$   [1]

Comparing this with the $t_{18}$ values gives a p-value in excess of 30%. So we have insufficient evidence to reject our null hypothesis at the 30% level. Therefore it is reasonable to conclude that the level of premiums charged by Company B is the same as that charged by Company A.   [1½]

(iv)(a) Confidence interval for the difference between the proportions

Using the pivotal value, from Chapter 8, of:

$$\frac{(\hat p_A - \hat p_B) - (p_A - p_B)}{\sqrt{\dfrac{\hat p_A \hat q_A}{n_A} + \dfrac{\hat p_B \hat q_B}{n_B}}} \approx N(0,1)$$   [½]

We have:

$$\hat p_A = 0.5,\quad \hat q_A = 0.5,\quad \hat p_B = 0.6,\quad \hat q_B = 0.4,\quad n_A = n_B = 10$$   [½]

We obtain a 95% confidence interval of:

$$-0.1 \pm 1.96\sqrt{\frac{0.25}{10} + \frac{0.24}{10}} = (-0.53,\ 0.33)$$   [1]

(iv)(b) Comment

Since this confidence interval contains zero, we cannot conclude that the proportions of premiums in excess of 200 are different for the two companies.   [1]

(v) Test whether Company A appears to have increased its premiums

We now carry out a single sample t-test on the data for Company A. We are testing:

$$H_0: \mu_A = 170 \quad \text{vs} \quad H_1: \mu_A > 170$$   [½]

The observed value of the test statistic is:

$$\frac{213.4 - 170}{\sqrt{4{,}303.4 / 10}} = 2.092$$   [1]

Comparing this with values of the $t_9$ distribution, we find that we have a result that is significant at a level somewhere between 2.5% and 5%. So we have sufficient evidence to reject $H_0$ at the 5% level. Therefore it is reasonable to conclude that the company has increased its premiums since the previous year.   [1½]

CS1-11: Correlation

Syllabus objectives

2.2 Exploratory data analysis

2.2.1 Describe the purpose of exploratory data analysis.

2.2.2 Use appropriate tools to calculate suitable summary statistics and undertake exploratory data visualizations.

2.2.3 Define and calculate Pearson's, Spearman's and Kendall's measures of correlation for bivariate data, explain their interpretation and perform statistical inference as appropriate.

2.2.4 Use principal components analysis to reduce the dimensionality of a complex data set.

0 Introduction

Actuaries, statisticians and many other professionals are increasingly engaged in analysing and interpreting large data sets, in order to determine whether there is any relationship between variables, and to assess the strength of that relationship. The methods in this and the following three chapters are perhaps more widely applied than any other statistical methods.

Exploratory data analysis (EDA) is the process of analysing data to gain further insight into the nature of the data, its patterns and relationships between the variables, before any formal statistical techniques are applied.

That is, we approach the data free of any pre-conceived assumptions or hypotheses. We first see the patterns in the data before we impose any views on it and fit models.

In addition to discovering the underlying structure of the data and any relationships between variables, exploratory data analysis can also be used to:
- detect any errors (outliers or anomalies) in the data
- check the assumptions made by any models or statistical tests
- identify the most important/influential variables
- develop parsimonious models, that is, models that explain the data with the minimum number of variables necessary.

For numerical data, this process will include the calculation of summary statistics and the use of data visualisations. Transformation of the original data may be necessary as part of this process.

For a single variable, EDA will involve calculating summary statistics (such as mean, median, quartiles, standard deviation, IQR and skewness) and drawing suitable diagrams (such as histograms, boxplots, quantile-quantile (Q-Q) plots and a line chart for time series/ordered data).

For bivariate or multivariate data, EDA will involve calculating the summary statistics for each variable and calculating correlation coefficients between each pair of variables. Data visualisation will typically involve scatterplots between each pair of variables.

Linear correlation between a pair of variables looks at the strength of the linear relationship between them. The diagrams below show the various degrees of positive correlation:

[Figure: three scatterplots showing perfect positive correlation, strong positive correlation and weak positive correlation]

Recall that we met correlation in Chapter 4 and defined it for a population. In this chapter we will explain how to obtain the sample correlation, and then how to use it to make inferences about the population's correlation. This is similar to what we did with the sample mean, $\bar X$, and the population mean, $\mu$, in Chapters 7 to 10.

For multivariate data sets with large dimensionality various techniques such as cluster analysis and principal components analysis (also called factor analysis) can be used to reduce the complexity of the data set.

Subject CS1 assumes that students can carry out EDA on univariate data sets. This includes calculation of summary statistics (eg mean and variance) and construction of diagrams (eg histograms), which are assumed knowledge for Subject CS1.

This chapter covers three aspects of EDA:
- using scatterplots to assess the shape of any correlation for bivariate data sets,
- calculating correlation coefficients to measure the strength of that correlation, and
- using principal components analysis (PCA) to identify the most important variables for multivariate data sets.

Some results in this chapter are quoted without proof. Students are expected to memorise these and apply them in the exam.
1 Bivariate correlation analysis

In a bivariate correlation analysis the problem of interest is an assessment of the strength of the relationship between the two variables Y and X.

In any analysis, it is assumed that measurements (or counts) have been made, and are available, on the variables, giving us bivariate data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.

1.1 Data visualisation

The starting point is always to visualise the data. For bivariate data, the simplest way to do this is to draw a scatterplot and get a feel for the relationship (if any) between the variables as revealed/suggested by the data.

The R code to draw a scatterplot for a bivariate data frame, <data>, is:

    plot(<data>)

We are particularly interested in whether there is a linear relationship between Y, the response (or dependent) variable, and X, the explanatory (or independent, or regressor) variable. That is, the expected value of Y, for any given value x of X, is a linear function of that value x, ie:

$$E[Y \mid x] = \alpha + \beta x$$

Recall from Chapter 5 that $E[Y \mid x]$ is a conditional mean, which represents the average value of Y corresponding to a given value of x.

If a linear relationship (even a weak one) is indicated by the data, the methods of Chapter 12 (Linear Regression) can be used to fit a linear model, with a view to exploiting the relationship between the variables to help estimate the expected response for a given value of the explanatory variable.

We now look at two examples (one linear and one non-linear) which we will analyse throughout the chapter.

Question

A sample of ten claims and corresponding payments on settlement for household policies is taken from the business of an insurance company.

The amounts, in units of 100, are as follows:

  Claim x     2.10  2.40  2.50  3.20  3.60  3.80  4.10  4.20  4.50  5.00
  Payment y   2.18  2.06  2.54  2.61  3.67  3.25  4.02  3.71  4.38  4.45

Draw a scatterplot and comment on the relationship between claims and payments.
Solution

The scatterplot for these data is as follows:

[Figure: scatterplot of payment y against claim x]

Here we can see that there appears to be a strong positive linear relationship. The plotted data points lie roughly in a straight line.

We can see from the graph that there appears to be a linear relationship between the claims and payments (ie the rate of change in payment is constant for a rate of change in the claim). So we will be able to use our linear regression work on these data values in the next chapter.

The next example contains a non-linear relationship between the variables.

A well-chosen transformation of y (or x, or even both) may however bring the data into a linear relationship. This then allows us to use the linear regression techniques in the next chapter.

Question

The rate of interest of borrowing, over the next five years, for ten companies is compared to each company's leverage ratio (its debt to equity ratio).

The data is as follows:

  Leverage ratio, x      0.1   0.4   0.5   0.8   1.0   1.8   2.0    2.5    2.8    3.0
  Interest rate (%), y   2.8   3.4   3.5   3.6   4.6   6.3   10.2   19.7   31.3   42.9

Draw a scatterplot and comment on the relationship between company borrowing (leverage) and interest rate. Hence apply a transformation to obtain a linear relationship.

Solution

The scatterplot for these data is as follows:

[Figure: scatterplot of interest rate y against leverage ratio x]

It can clearly be seen that the data displays a non-linear relationship, since the rate of change in the interest rate increases with the leverage ratio.

In this case, the log of the interest rate against the leverage ratio produces a far more linear relationship:

[Figure: scatterplot of log interest rate against leverage ratio x]

1.2 Sample correlation coefficients

The degree of association between the x and y values is summarised by the value of an appropriate correlation coefficient, each of which takes values from -1 to +1.
The coefficient of linear correlation provides a measure of how well a linear regression model explains the relationship between two variables. The values of r can be interpreted as follows:

  Value          Interpretation
  r = 1          The two variables move together in the same direction in a perfect linear relationship.
  0 < r < 1      The two variables tend to move together in the same direction but there is not a direct relationship.
  r = 0          The two variables can move in either direction and show no linear relationship.
  -1 < r < 0     The two variables tend to move together in opposite directions but there is not a direct relationship.
  r = -1         The two variables move together in opposite directions in a perfect linear relationship.

In this section we look at three correlation coefficients: Pearson, Spearman's rank and Kendall's rank.

It is always important in data analysis to note that simply finding a mathematical relationship between variables tells one nothing in itself about the causality of that relationship or its continuing persistence through time. Qualitative as well as quantitative analysis is essential before making predictions or taking action.

Jumping to a cause and effect conclusion - that a change in one variable causes a change in the other - is a common misinterpretation of correlation coefficients. For example, the correlation may be spurious, or there may be another variable not part of the analysis that is causal.

Pearson's correlation coefficient

Pearson's correlation coefficient r (also called Pearson's product-moment correlation coefficient) measures the strength of linear relationship between two variables and is given by:

$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$

where:

$$S_{xx} = \sum (x_i - \bar x)^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}$$

$$S_{yy} = \sum (y_i - \bar y)^2 = \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n}$$

$$S_{xy} = \sum (x_i - \bar x)(y_i - \bar y) = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}$$

$S_{xx}$ and $S_{yy}$, the sums of squares of x and y respectively, are the sample variances of x and y multiplied by $(n-1)$.
Similarly $S_{xy}$ is the sample covariance multiplied by $(n-1)$.

$\sum_{i=1}^{n} x_i^2$ is often abbreviated to $\sum x^2$, etc.

Question

Show that:

$$S_{xx} = \sum (x_i - \bar x)^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n} = \sum x_i^2 - n\bar x^2$$

Solution

Expanding the bracket and splitting up the summation, we have:

$$S_{xx} = \sum (x_i - \bar x)^2 = \sum \left(x_i^2 - 2\bar x x_i + \bar x^2\right) = \sum x_i^2 - 2\bar x \sum x_i + n\bar x^2$$

Since $\sum x_i = n\bar x$, we get:

$$S_{xx} = \sum x_i^2 - 2n\bar x^2 + n\bar x^2 = \sum x_i^2 - n\bar x^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}$$

These formulae are given on page 24 of the Tables in the $S_{xx} = \sum x_i^2 - n\bar x^2$ format.

Recall from Chapter 4 that the population correlation coefficient is defined to be:

$$\rho = \text{corr}(X, Y) = \frac{\text{cov}(X, Y)}{\sqrt{\text{var}(X)\,\text{var}(Y)}}$$

Pearson's sample correlation coefficient, r, is an estimate of the population correlation coefficient, $\rho$, in the same way as $\bar x$ is an estimate of $\mu$ or $s^2$ is an estimate of $\sigma^2$.

The formula for the sample correlation coefficient, r, is given on page 25 of the Tables.

Let's now calculate this correlation coefficient for the examples we met earlier.

Question

For the claims settlement data, we have:

  Claim (100s) x     2.10  2.40  2.50  3.20  3.60  3.80  4.10  4.20  4.50  5.00
  Payment (100s) y   2.18  2.06  2.54  2.61  3.67  3.25  4.02  3.71  4.38  4.45

Number of pairs of observations $n = 10$.

$$\sum x = 35.4,\quad \sum x^2 = 133.76,\quad \sum y = 32.87,\quad \sum y^2 = 115.2025,\quad \sum xy = 123.81$$

Calculate Pearson's correlation coefficient for the claims settlement data and comment on its value.

Solution

$$S_{xx} = \sum x^2 - n\bar x^2 = 133.76 - \frac{35.4^2}{10} = 8.444$$

$$S_{yy} = \sum y^2 - n\bar y^2 = 115.2025 - \frac{32.87^2}{10} = 7.15881$$

$$S_{xy} = \sum xy - n\bar x\bar y = 123.81 - \frac{35.4 \times 32.87}{10} = 7.4502$$

$$r = \frac{7.4502}{\sqrt{8.444 \times 7.15881}} = 0.95824$$

As expected, this is high (close to +1), and indicates a strong positive linear relationship.

Question

For the original borrowing rate data:

  Leverage ratio, x   0.1    0.4    0.5    0.8    1.0    1.8    2.0    2.5    2.8    3.0
  Interest rate y     0.028  0.034  0.035  0.036  0.046  0.063  0.102  0.197  0.313  0.429

Number of pairs of observations $n = 10$.
$$\sum x = 14.9,\quad \sum x^2 = 32.39,\quad \sum y = 1.283,\quad \sum y^2 = 0.341769,\quad \sum xy = 3.082$$

Calculate Pearson's correlation coefficient for the borrowing rate data.

Solution

$$S_{xx} = \sum x^2 - n\bar x^2 = 32.39 - 10\left(\frac{14.9}{10}\right)^2 = 10.189$$

$$S_{yy} = \sum y^2 - n\bar y^2 = 0.341769 - 10\left(\frac{1.283}{10}\right)^2 = 0.1771601$$

$$S_{xy} = \sum xy - n\bar x\bar y = 3.082 - 10\left(\frac{14.9}{10}\right)\left(\frac{1.283}{10}\right) = 1.17033$$

$$r = \frac{1.17033}{\sqrt{10.189 \times 0.1771601}} = 0.87108$$

Since Pearson's correlation coefficient measures linear association, it may give a low result when variables have a strong, but non-linear relationship.

Whilst the value for the borrowing rate data is high, it is materially lower than in the first example, due to the non-linearity of the relationship.

The moral of the story however is always to plot the data first. For example, the following scatterplots (from the statistician Francis Anscombe) all have a correlation coefficient of 0.816:

[Figure: Anscombe's four scatterplots]

Reference: Anscombe, F.J. (1973). Graphs in Statistical Analysis. American Statistician 27 (1): 17-21. JSTOR 2682899

The R code for calculating a Pearson correlation coefficient for variables x and y is:

    cor(x, y, method = "pearson")

Spearman's rank correlation coefficient

Spearman's rank correlation coefficient $r_s$ measures the strength of monotonic (but not necessarily linear) relationship between two variables.

Formally, it is the Pearson correlation coefficient applied to the ranks, $r(X_i)$ and $r(Y_i)$, rather than the raw values, $(X_i, Y_i)$, of the bivariate data.

So it just uses their relative sizes in relation to each other. We usually order them from smallest to largest.

If all the $X_i$'s are unique, and separately all of the $Y_i$'s are unique, ie there are no ties, then this calculation simplifies to:

$$r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$

where $d_i = r(X_i) - r(Y_i)$.
Since Spearman's rank correlation coefficient only considers ranks rather than the actual values, the value of the coefficient is less affected by extreme values/outliers in the data than Pearson's correlation coefficient. Hence this statistic is more robust.

Let's now calculate Spearman's rank correlation coefficient for the examples we met earlier.

Question

Calculate Spearman's rank correlation coefficient for the claims settlement data and comment.

  Claim (100s) x     2.10  2.40  2.50  3.20  3.60  3.80  4.10  4.20  4.50  5.00
  Payment (100s) y   2.18  2.06  2.54  2.61  3.67  3.25  4.02  3.71  4.38  4.45

Solution

For the claims settlement data:

  Claim x   Payment y   Rank x   Rank y    d    d^2
  2.1       2.18        1        2        -1    1
  2.4       2.06        2        1         1    1
  2.5       2.54        3        3         0    0
  3.2       2.61        4        4         0    0
  3.6       3.67        5        6        -1    1
  3.8       3.25        6        5         1    1
  4.1       4.02        7        8        -1    1
  4.2       3.71        8        7         1    1
  4.5       4.38        9        9         0    0
  5         4.45        10       10        0    0

This gives:

$$r_s = 1 - \frac{6 \times 6}{10(10^2 - 1)} = 0.9636$$

As expected, the Spearman's rank correlation coefficient is very high, since it is known from the calculation of the Pearson's correlation coefficient that there is a strong positive linear relationship (hence a strong monotonically increasing relationship).

The Spearman's correlation coefficient may give a value that is substantially different from the Pearson's coefficient for the same data.

Question

Calculate Spearman's rank correlation coefficient for the original borrowing rate data and comment.

  Leverage ratio, x      0.1   0.4   0.5   0.8   1.0   1.8   2.0    2.5    2.8    3.0
  Interest rate (%), y   2.8   3.4   3.5   3.6   4.6   6.3   10.2   19.7   31.3   42.9

Solution

For the corporate borrowing data, the ranks of the two data are exactly equal, hence Spearman's rank correlation coefficient is trivially equal to 1.

The reason that this is materially higher than the equivalent Pearson coefficient is because the non-linearity of the relationship does not feature in the calculation, only the fact that it is monotonically increasing.
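The two Spearman calculations above can be checked numerically. The following is a minimal sketch in base R, not part of the Core Reading: the variable names (`d`, `rs_shortcut`, `rs_ranks`, `lev`, `rate`, `rs_borrow`) are illustrative choices, and the data values are those given in the two questions. It computes the coefficient both from the no-ties shortcut formula and as the Pearson coefficient of the ranks.

```r
# Claims settlement data from the worked example (units of 100)
x <- c(2.10, 2.40, 2.50, 3.20, 3.60, 3.80, 4.10, 4.20, 4.50, 5.00)
y <- c(2.18, 2.06, 2.54, 2.61, 3.67, 3.25, 4.02, 3.71, 4.38, 4.45)

n <- length(x)
d <- rank(x) - rank(y)                 # d_i = r(X_i) - r(Y_i)

# Shortcut formula, valid here because there are no tied values
rs_shortcut <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))

# Definition: Pearson correlation coefficient applied to the ranks
rs_ranks <- cor(rank(x), rank(y))

round(rs_shortcut, 4)                  # 0.9636
round(rs_ranks, 4)                     # 0.9636

# Borrowing rate data: both variables have identical ranks
lev  <- c(0.1, 0.4, 0.5, 0.8, 1.0, 1.8, 2.0, 2.5, 2.8, 3.0)
rate <- c(2.8, 3.4, 3.5, 3.6, 4.6, 6.3, 10.2, 19.7, 31.3, 42.9)
rs_borrow <- cor(rank(lev), rank(rate))
rs_borrow                              # 1
```

Both routes agree for the claims data because there are no ties; with tied ranks only the Pearson-on-ranks definition would remain valid.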
The R code for calculating a Spearman rank correlation coefficient for variables x and y is:

    cor(x, y, method = "spearman")

The Kendall rank correlation coefficient

Kendall's rank correlation coefficient $\tau$ measures the strength of monotonic relationship between two variables.

Like the Spearman rank correlation coefficient, the Kendall rank correlation coefficient considers only the relative values of the bivariate data, and not their actual values. It is far more intensive from a calculation viewpoint, however, since it considers the relative values of all possible pairs of bivariate data, not simply the rank of $X_i$ and $Y_i$ for a given i.

Despite the more complicated calculation, it is considered to have better statistical properties than Spearman's rank correlation coefficient, particularly for small data sets with large numbers of tied ranks.

Any pair of observations $(X_i, Y_i); (X_j, Y_j)$ where $i \ne j$, is said to be concordant if the ranks for both elements agree, ie $X_i > X_j$ and $Y_i > Y_j$, or $X_i < X_j$ and $Y_i < Y_j$; otherwise they are said to be discordant.

Consider the settlement payments for claims example. Suppose Claim A is greater than Claim B. If the settlement for Claim A is also greater than the settlement for Claim B then they have the same relative rank orders, and we say that A and B are concordant pairs with respect to the random variables claims and settlement.

Let $n_c$ be the number of concordant pairs, and let $n_d$ be the number of discordant pairs.

Assuming that there are no ties, the Kendall coefficient $\tau$ is defined as:

$$\tau = \frac{n_c - n_d}{n(n-1)/2}$$

The numerator is the difference in the number of concordant and discordant pairs. The denominator is the total number of combinations of pairing each $(X_i, Y_i)$ with each $(X_j, Y_j)$.
This could also be defined as $n_c + n_d$.

For example, if there were 3 observations of X and Y then there would be $\frac{3 \times 2}{2} = 3$ combinations:

  $(X_1, Y_1)$ and $(X_2, Y_2)$
  $(X_1, Y_1)$ and $(X_3, Y_3)$
  $(X_2, Y_2)$ and $(X_3, Y_3)$

So $\tau$ can be interpreted as the difference between the probability of these objects being in the same order and the probability of these objects being in a different order. Therefore, a value of -1 indicates all discordant pairs and +1 indicates all concordant pairs.

Intuitively, it is clear that if the number of concordant pairs is much larger than the number of discordant pairs, then the random variables are positively correlated. On the other hand, if the number of concordant pairs is much less than the number of discordant pairs, then the variables are negatively correlated.

Let's now calculate the Kendall rank correlation coefficient for the examples we met earlier.

Question

Calculate Kendall's rank correlation coefficient for the claims settlement data and comment.

  Claim (100s) x     2.10  2.40  2.50  3.20  3.60  3.80  4.10  4.20  4.50  5.00
  Payment (100s) y   2.18  2.06  2.54  2.61  3.67  3.25  4.02  3.71  4.38  4.45

Solution

For the example claims data:

  (x,y)      2.4,2.06  2.5,2.54  3.2,2.61  3.6,3.67  3.8,3.25  4.1,4.02  4.2,3.71  4.5,4.38  5.0,4.45
  2.1,2.18   d         c         c         c         c         c         c         c         c
  2.4,2.06             c         c         c         c         c         c         c         c
  2.5,2.54                       c         c         c         c         c         c         c
  3.2,2.61                                 c         c         c         c         c         c
  3.6,3.67                                           d         c         c         c         c
  3.8,3.25                                                     c         c         c         c
  4.1,4.02                                                               d         c         c
  4.2,3.71                                                                         c         c
  4.5,4.38                                                                                   c

where c represents a concordant pair, and d represents a discordant pair.

Here $n_c = 42$, $n_d = 3$, so:

$$\tau = \frac{n_c - n_d}{n(n-1)/2} = \frac{42 - 3}{(10 \times 9)/2} = 0.8667$$

Alternatively, using $\tau = \dfrac{n_c - n_d}{n_c + n_d}$ gives $\tau = \dfrac{42 - 3}{42 + 3} = 0.8667$.

Again the relatively high value demonstrates the strong correlation between the variables.

It's often easier to determine concordant and discordant pairs by using the ranks instead of the actual numbers. First arrange the values in order of rank for x.
Then the number of concordant pairs (C) is the number of observations below which have a higher rank for the y and the number of discordant pairs (D) is the number of observations below which have a lower rank for the y.

  (x, y)       Rank x   Rank y   C    D
  2.1, 2.18    1        2        8    1
  2.4, 2.06    2        1        8    0
  2.5, 2.54    3        3        7    0
  3.2, 2.61    4        4        6    0
  3.6, 3.67    5        6        4    1
  3.8, 3.25    6        5        4    0
  4.1, 4.02    7        8        2    1
  4.2, 3.71    8        7        2    0
  4.5, 4.38    9        9        1    0
  5.0, 4.45    10       10
  Total                          42   3

Totalling the columns gives $n_c = 42$, $n_d = 3$ as before.

Question

Calculate Kendall's rank correlation coefficient for the original borrowing rate data and comment on its value.

  Leverage ratio, x      0.1   0.4   0.5   0.8   1.0   1.8   2.0    2.5    2.8    3.0
  Interest rate (%), y   2.8   3.4   3.5   3.6   4.6   6.3   10.2   19.7   31.3   42.9

Solution

For the corporate borrowing data, clearly all the pairs are concordant, and so $\tau$ is trivially equal to 1.

The R code for calculating a Kendall rank correlation coefficient for variables x and y is:

    cor(x, y, method = "kendall")

1.3 Inference

To go further than a mere description/summary of the data, a model is required for the distribution of the underlying variables $(X, Y)$.

Inference under Pearson's correlation

The appropriate model is this: the distribution of $(X, Y)$ is bivariate normal, with parameters $\mu_X$, $\mu_Y$, $\sigma_X$, $\sigma_Y$ and $\rho$.

Assuming a bivariate normal distribution means that we have continuous data, each (marginal) distribution is also normal, the variance is constant and we have a linear relationship between X and Y. If any of these assumptions are not met then inference will give misleading results.

In the bivariate normal model, both variables are considered to be random. However, they are correlated, so their values are linked.

The bivariate normal model assumes that the values of $(X_i, Y_i)$ have a joint normal distribution with joint PDF $f_{X,Y}(x, y)$, $-\infty < x, y < \infty$, given by:

$$f_{X,Y}(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \left(\frac{x-\mu_X}{\sigma_X}\right)^2 - 2\rho\left(\frac{x-\mu_X}{\sigma_X}\right)\left(\frac{y-\mu_Y}{\sigma_Y}\right) + \left(\frac{y-\mu_Y}{\sigma_Y}\right)^2 \right] \right\}$$

where $\rho$ is the correlation parameter, where $-1 < \rho < 1$.

In the case where $\rho = 0$, the cross term is zero and the PDF factorises into the product of the PDFs for two independent variables, one with a $N(\mu_X, \sigma_X^2)$ distribution and one with a $N(\mu_Y, \sigma_Y^2)$ distribution.

In the case where $\rho = \pm 1$, the bivariate distribution degenerates into a single line $\dfrac{y - \mu_Y}{\sigma_Y} = \pm\dfrac{x - \mu_X}{\sigma_X}$, ie the values of X and Y are directly linked.

If we integrate over all possible values of y to find the conditional expectation, we get the following result:

$$E(Y \mid X = x) = \mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X)$$

The important thing to note here is that the expression on the RHS is a linear function of x.

To assess the significance of any calculated r, the sampling distribution of this statistic is needed. The distribution of r is negatively skewed and has high spread/variability.

Two results are available.

Result 1

Under $H_0: \rho = 0$, $\dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$ has a t distribution with $\nu = n - 2$ degrees of freedom.

From this result a test of $H_0: \rho = 0$ (the hypothesis of no linear relationship between the variables) can be performed by working out the value of r which is significant at a given level of testing, or by finding the probability value of the observed r.

This result is given on page 25 of the Tables.

Question

Test $H_0: \rho = 0$ vs $H_1: \rho \ne 0$ for the claims settlement data. Recall that $r = 0.95824$.

Solution

For the given data $n = 10$ and $r = 0.95824$. So the value of the test statistic is:

$$\frac{0.95824\sqrt{8}}{\sqrt{1 - 0.95824^2}} = 9.478$$

Under $H_0$ this should be a value from the $t_8$ distribution. The p-value of $2P(t_8 > 9.478)$ is less than 0.1%.

We have extremely strong evidence to reject $H_0$ and conclude that $\rho \ne 0$.

Result 2 (Fisher's transformation of r)

This is a more general result - it is not restricted to the case $\rho = 0$.

If $W = \frac{1}{2}\ln\dfrac{1+r}{1-r}$, then W has (approximately) a normal distribution with mean $\frac{1}{2}\ln\dfrac{1+\rho}{1-\rho}$ and standard deviation $\dfrac{1}{\sqrt{n-3}}$.

This is usually referred to as the Fisher Z transformation (because the resulting z values are approximately normal). Accordingly, the letter Z is usually used.

W can also be written as $\tanh^{-1} r$. This is the inverse hyperbolic tangent function, which, on modern Casio calculators, is accessed by pressing hyp and then choosing Option 6 to get $\tanh^{-1}$.

From the result on W, tests of $H_0: \rho = \rho_0$ can be performed. Confidence intervals for $\mu_W$ and hence for $\rho$ can also be found.

This result is given on page 25 of the Tables.

Question

Considering the data on claims and settlements, carry out the test:

$$H_0: \rho = 0.9 \quad \text{vs} \quad H_1: \rho > 0.9$$

for the population of all claims/payments of this type.

Solution

For the given data:

$n = 10$, $r = 0.95824$, observed value of $W = \tanh^{-1} 0.95824 = 1.9239$

Under $H_0$, W has a normal distribution with mean $\tanh^{-1} 0.9 = 1.4722$ and standard deviation $\dfrac{1}{\sqrt{10-3}} = 0.37796$.

So:

$$P(W > 1.9239) = P\left(Z > \frac{1.9239 - 1.4722}{0.37796}\right) = P(Z > 1.195) \approx 0.12$$

So the p-value of $r = 0.958$ is about 0.12. There is insufficient evidence to justify rejecting $H_0$, which can stand.

Notes:

(a) The bivariate normal assumption. The presence of outliers - data points far away from the main body of the data - may indicate that the distributional assumption underlying the above methods is highly questionable.

(b) Influence. Just as a single observation can have a marked effect on the value of a sample mean and standard deviation, so a single observation separated from the bulk of the data can have a marked effect on the value of a sample correlation coefficient.

The R code for carrying out any hypothesis test using the Pearson correlation coefficient for variables x and y is:

    cor.test(x, y, method = "pearson")

Inference under Spearman's rank correlation

Since we are using ranks rather than the actual data, no assumption is needed about the distribution of X, Y or $(X, Y)$, ie it is a non-parametric test.

However, non-parametric tests are less powerful than parametric tests (ie ones that do assume a distribution) as we have less information. So we would need to obtain a more extreme result before we are able to reject $H_0$. On the plus side, the test is less affected by outliers.

Under a null hypothesis of no association/no monotonic relationship between X and Y the sampling distribution of $r_s$ can (for small values of n) be determined precisely using permutations. This does not have the form of a common statistical distribution.

For example, if we had a sample size of 4, there would be $4! = 24$ ways of arranging the ranks of the Y variables, so each arrangement has a probability of $\frac{1}{24}$ of occurring. We then calculate $\sum d^2$ for each arrangement and hence obtain the probabilities of getting each value of $\sum d^2$.

We can then carry out a hypothesis test. For example, if we are testing $H_0: \rho = 0$ vs $H_1: \rho > 0$ with a 5% significance level, where the data values give $\sum d^2 = 3$, we can calculate $P\left(\sum d^2 \le 3\right)$ using the probabilities obtained above and if we get less than 5% we would reject $H_0$. However, for large n this will be time consuming.

For larger values of n (> 20) we can use Result 1 from above.

So, under the null hypothesis that the variables are uncorrelated:

$$\frac{r_s\sqrt{n-2}}{\sqrt{1-r_s^2}} \approx t_{n-2} \quad \text{provided } n > 20$$

Recall that Spearman's rank correlation coefficient is derived by applying Pearson's correlation coefficient to the ranks rather than the original data.

The limiting normal distribution has a mean of 0 and a variance of $\dfrac{1}{n-1}$.
For very large values of n, the sampling distribution of $r_s$ can therefore be approximated by the $N\left(0, \frac{1}{n-1}\right)$ distribution.

The R code for carrying out a hypothesis test using Spearman's rank correlation coefficient for variables x and y is:

cor.test(x, y, method = "spearman")

Inference under Kendall's rank correlation

Again, since we are using ranks, we have a non-parametric test.

Under the null hypothesis of independence of X and Y, the sampling distribution of $\tau$ can be determined precisely using permutations for small values of n.

We can carry out a hypothesis test in the same way as described above but calculating $n_c - n_d$ for each arrangement. However, again, for large n this will be time consuming.

For larger values of n (> 10), use of the Central Limit Theorem means that an approximate normal distribution can be used, with mean 0 and variance $\frac{2(2n+5)}{9n(n-1)}$.

The R code for carrying out a hypothesis test using the Kendall rank correlation coefficient for variables x and y is:

cor.test(x, y, method = "kendall")

Note that cor.test will determine exact p-values if n < 50 (ignoring tied values); for larger samples the test statistic is approximately normally distributed.

Question

An actuary wants to investigate if there is any correlation between students' scores in the CS1 mock exam and the CS2 mock exam.

Data values from 22 students were collected and the results are:

Student           1   2   3   4   5   6   7   8   9  10  11
CS1 mock score   51  43  39  80  56  57  26  68  54  75  72
CS2 mock score   52  42  58  56  47  72  16  63  48  80  68

Student          12  13  14  15  16  17  18  19  20  21  22
CS1 mock score   85  48  27  63  76  64  55  78  82  52  60
CS2 mock score   82  54  38  57  71  50  45  60  59  49  61

You are given that $\sum d^2 = 494$, $n_c = 174$ and $n_d = 57$.

Test $H_0: \rho = 0$ vs $H_1: \rho > 0$ for the mock score data using Spearman's rank correlation coefficient and Kendall's rank correlation coefficient.
Solution

For the given data values:

n = 22

$$r_s = 1 - \frac{6\sum d^2}{n(n^2-1)} = 1 - \frac{6 \times 494}{22(22^2-1)} = 0.72106$$

$$\tau = \frac{n_c - n_d}{n(n-1)/2} = \frac{174 - 57}{22 \times 21/2} = 0.50649$$

Under $H_0$:

$$\frac{r_s\sqrt{n-2}}{\sqrt{1-r_s^2}} \sim t_{n-2}$$

The observed value of the test statistic is:

$$\frac{0.72106\sqrt{20}}{\sqrt{1 - 0.72106^2}} = 4.654$$

This exceeds even the upper 0.05% point of $t_{20}$ (which is 3.850). Percentage points for the t distribution are given on page 163 of the Tables.

So we have very strong evidence to reject $H_0$, and we conclude that the mock scores in CS1 and CS2 are positively correlated.

Under $H_0$, the sampling distribution of Kendall's rank correlation coefficient is approximately normal with mean 0 and variance $\frac{2(2n+5)}{9n(n-1)}$. The observed value of the test statistic is:

$$\frac{0.50649 - 0}{\sqrt{\dfrac{2 \times 49}{9 \times 22 \times 21}}} = 3.299$$

This exceeds the upper 0.05% point of the standard normal distribution (which is 3.2905). Percentage points for the standard normal distribution are given on page 162 of the Tables.

So we have very strong evidence to reject $H_0$, and we conclude that the mock scores in CS1 and CS2 are positively correlated.

2 Multivariate correlation analysis

So far, we have only considered bivariate data. In most practical applications, there are many variables to consider. We now consider the case $(\mathbf{X}, Y)$, where Y remains the variable of interest, but $\mathbf{X}$ is now a vector of possible explanatory variables.

For example, in motor insurance we may wish to see the connection between the claim amounts and a number of explanatory variables such as age, number of years driving, size of the engine and annual number of miles driven.

2.1 Data visualisation

Again, the starting point is always to visualise the data. For multivariate cases it is no problem for a computer package to plot a scattergraph matrix, ie scattergraphs between each pair of variables to make the relationships between them clear.
The R code to draw scatterplots for all pairs from a multivariate data frame, <data>, is:

plot(<data>)

or it is possible to use:

pairs(<data>)

Now let's look at a set of multivariate data.

Consider a set of equity returns from four different markets across 12 time periods (X).

Market 1 (Mkt_1), Market 2 (Mkt_2), Market 3 (Mkt_3), Market 4 (Mkt_4):

0.83% 4.27% 1.79% 0.39% 0.12% 3.72% 0.90% 0.26% 5.49% 5.21% 4.62% 5.67% 2.75% 6.26% 3.38% 1.40% 5.68% 7.37% 5.21% 5.05% 3.70% 1.60% 2.34% 2.66% 5.75% 5.08% 6.03% 5.48% 1.03% 1.38% 2.37% 1.47% 0.69% 0.17% 0.38% 4.03% 3.26% 3.04% 0.10% 2.59% 0.54% 2.22% 1.42% 1.37% 3.03% 9.47% 2.95% 2.99%

We can use R to obtain a matrix of scatterplots, by plotting the market returns in pairs. We wish to consider the relationship between Market 4 and the other markets.

This gives the following scattergraph matrix:

[Figure: scattergraph matrix of Mkt_1, Mkt_2, Mkt_3 and Mkt_4]

The bottom row has Market 4 as the response variable with the other three markets as the explanatory variables. We can see that there appear to be positive linear relationships between the response variable and explanatory variables.

There are strong positive linear relationships between Market 4 and explanatory variables Markets 1 and 3. Since Markets 1 and 3 move together there may be some overlap between their influence on Market 4. We will look at how we can strip this overlap out in the Principal Components Analysis (PCA) section later in this chapter.

2.2 Sample correlation coefficient matrix

Similarly it is no problem for a computer package to calculate the correlation coefficient between each pair of variables and display them in a matrix.
The R code for calculation of a Pearson correlation coefficient matrix for a multivariate numeric data frame <data> is:

cor(<data>, method = "pearson")

We can also use R on the equity return data to obtain the Pearson correlation coefficient matrix.

The Pearson correlation coefficient matrix for the four markets as produced in R output is:

          Mkt_1     Mkt_2     Mkt_3     Mkt_4
Mkt_1 1.0000000 0.6508163 0.9538019 0.9727972
Mkt_2 0.6508163 1.0000000 0.5321185 0.6893932
Mkt_3 0.9538019 0.5321185 1.0000000 0.9681911
Mkt_4 0.9727972 0.6893932 0.9681911 1.0000000

Notice that the diagonal elements are all 1. That's because there is perfect correlation between, say, Market 1 and Market 1. Notice also that it is symmetrical as corr(X, Y) = corr(Y, X).

2.3 Inference

We can carry out tests on the correlation for each pair of variables using the methods described in Section 1.3.

3 Principal component analysis

Principal component analysis is most easily tackled using a computer. In this section the Core Reading runs through the theory and gives an example, but this topic will be dealt with in more depth in the Paper B Online Resources (PBOR).

Until now we have considered the variables in separate pairs, but in practice the amount of analysis required in this approach grows rapidly with each additional variable.

Principal component analysis (PCA), also called factor analysis, provides a method for reducing the dimensionality of the data set, X. In other words, it seeks to identify the key components necessary to model and understand the data.

For many multivariate datasets there is correlation between each of the variables. This means there is some overlap between the information that each of the variables provide. The technical phrase is that there is redundancy in the data.
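The properties of the correlation matrix noted in Section 2.2, and the idea of redundancy, can be seen on any numeric data frame. The sketch below is an illustration only: it assumes the built-in R data set mtcars and an arbitrary choice of four of its columns, neither of which comes from the course material.

```r
# mtcars ships with base R; the four columns chosen here are an
# arbitrary illustration, not data from the course.
vars <- mtcars[, c("mpg", "disp", "hp", "wt")]

# Pearson correlation coefficient matrix, as in Section 2.2.
R <- cor(vars, method = "pearson")
print(round(R, 3))

# The two properties noted in Section 2.2: unit diagonal and symmetry.
print(all(abs(diag(R) - 1) < 1e-12))
print(isSymmetric(R))

# Off-diagonal entries that are large in absolute value flag pairs of
# variables carrying overlapping information, ie redundancy.
off_diag <- abs(R[upper.tri(R)])
print(round(off_diag, 3))
```

Pairs with a correlation close to +1 or -1 are the ones whose information overlaps most.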
PCA gives us a process to remove this overlap. The idea is that we create new uncorrelated variables, and we should find that only some of these new variables are needed to explain most of the variability observed in the data. The key thing is that each new variable is a linear combination of the old variables, so if we eliminate any of the new variables we are still retaining the most important bits of information. We then rewrite the data in terms of these new variables, which are called principal components.

These components are chosen to be uncorrelated linear combinations of the variables of the data which maximise the variance.

The next section of Core Reading explains the process of how a PCA is carried out and contains some matrix theory. In parallel with the text, we will work through a simple matrix as an example so that you can see what is happening.

The Core Reading starts with a reminder of how to determine eigenvectors and eigenvalues. This is assumed knowledge for the actuarial exams.

The eigenvalues of matrix A are the values $\lambda$ such that $\det(A - \lambda I) = 0$, where I is the identity matrix. The corresponding eigenvector, v, of an eigenvalue $\lambda$ satisfies the equation $(A - \lambda I)v = 0$.

Consider an $n \times p$ centred data matrix X. Using standard techniques from linear algebra, define W as a $p \times p$ matrix, whose columns are the eigenvectors of $X^T X$. The intuition for doing this is that $X^T X$ represents the (scaled) covariance of the data.

Here p represents the number of variables and n represents the number of observations of each variable. In a centred data matrix, the entries in each column have a mean of zero. We can obtain a centred matrix from the original matrix by subtracting the appropriate column mean from each entry. The sample variance/covariance matrix is $X^T X$ divided by $(n-1)$.

Suppose we are trying to model the chances of a student passing the CS1 exam.
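We are going to include in our model the average number of days per week each student does some studying ($X_1$) and the average number of hours each student studies at the weekend ($X_2$). The data values for one student are $x_1 = 2$, $x_2 = 10$ and for another student we have $x_1 = 4$, $x_2 = 6$. The original data matrix is therefore:

$$\begin{pmatrix} 2 & 10 \\ 4 & 6 \end{pmatrix}$$

The mean of the entries in the first column is 3 and the mean of the entries in the second column is 8, so the centred matrix is:

$$X = \begin{pmatrix} -1 & 2 \\ 1 & -2 \end{pmatrix}$$

We now need to calculate $X^T X$:

$$X^T X = \begin{pmatrix} -1 & 1 \\ 2 & -2 \end{pmatrix}\begin{pmatrix} -1 & 2 \\ 1 & -2 \end{pmatrix} = \begin{pmatrix} 2 & -4 \\ -4 & 8 \end{pmatrix}$$

We can see that this is the covariance matrix for the data in X. The variance of the data set (-1, 1) is 2, the variance of the data set (2, -2) is 8 and the covariance between the data sets is calculated as follows:

$$\frac{1}{n-1}\sum_{j=1}^{n}(x_{j1} - \bar{x}_1)(x_{j2} - \bar{x}_2) = \frac{1}{2-1}\sum_{j=1}^{2} x_{j1}x_{j2} = (-1)(2) + (1)(-2) = -4$$

Here the scaled covariance is the same as the non-scaled covariance because the sample size is 2, so $n - 1 = 1$.

Next we determine the eigenvalues, and from there the eigenvectors, for the matrix $X^T X$:

$$\det\begin{pmatrix} 2-\lambda & -4 \\ -4 & 8-\lambda \end{pmatrix} = 0 \;\Rightarrow\; (2-\lambda)(8-\lambda) - 16 = 0 \;\Rightarrow\; \lambda^2 - 10\lambda = 0 \;\Rightarrow\; \lambda = 0 \text{ or } 10$$

When $\lambda = 0$:

$$\begin{pmatrix} 2 & -4 \\ -4 & 8 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \;\Rightarrow\; 2x - 4y = 0,\; -4x + 8y = 0 \;\Rightarrow\; x = 2y \;\Rightarrow\; \text{one eigenvector is } \begin{pmatrix} 2 \\ 1 \end{pmatrix}$$

When $\lambda = 10$:

$$\begin{pmatrix} -8 & -4 \\ -4 & -2 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \;\Rightarrow\; -8x - 4y = 0,\; -4x - 2y = 0 \;\Rightarrow\; y = -2x \;\Rightarrow\; \text{one eigenvector is } \begin{pmatrix} 1 \\ -2 \end{pmatrix}$$

The unit eigenvectors are:

$$\frac{1}{\sqrt{2^2 + 1^2}}\begin{pmatrix} 2 \\ 1 \end{pmatrix} = \frac{1}{\sqrt{5}}\begin{pmatrix} 2 \\ 1 \end{pmatrix} \qquad \frac{1}{\sqrt{1^2 + (-2)^2}}\begin{pmatrix} 1 \\ -2 \end{pmatrix} = \frac{1}{\sqrt{5}}\begin{pmatrix} 1 \\ -2 \end{pmatrix}$$

By definition:

$$W = \frac{1}{\sqrt{5}}\begin{pmatrix} 1 & 2 \\ -2 & 1 \end{pmatrix}$$

The principal components decomposition P of X is then defined as P = XW:

$$P = XW = \begin{pmatrix} -1 & 2 \\ 1 & -2 \end{pmatrix}\frac{1}{\sqrt{5}}\begin{pmatrix} 1 & 2 \\ -2 & 1 \end{pmatrix} = \frac{1}{\sqrt{5}}\begin{pmatrix} -5 & 0 \\ 5 & 0 \end{pmatrix} = \begin{pmatrix} -\sqrt{5} & 0 \\ \sqrt{5} & 0 \end{pmatrix}$$

It is obvious that the second column doesn't provide any useful information, but we consider the deletion of components below.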
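The hand calculation above can be checked with a short R sketch. It is illustrative only: the variable names are arbitrary, and because eigen() returns eigenvectors that are determined only up to sign, the columns of W and P may appear with the opposite sign to the hand calculation.

```r
# Raw data from the worked example: rows are students, columns are
# X1 (study days per week) and X2 (weekend study hours).
raw <- matrix(c(2, 10,
                4,  6),
              nrow = 2, byrow = TRUE,
              dimnames = list(NULL, c("X1", "X2")))

# Centre each column by subtracting its mean (no rescaling).
X <- scale(raw, center = TRUE, scale = FALSE)

# Scaled covariance matrix X^T X and its eigen-decomposition.
XtX <- crossprod(X)                    # same as t(X) %*% X
eig <- eigen(XtX, symmetric = TRUE)    # eigenvalues are returned largest first

W <- eig$vectors                       # columns are unit eigenvectors
P <- X %*% W                           # principal components decomposition
S <- crossprod(P)                      # P^T P, diagonal up to rounding error

print(XtX)
print(round(eig$values, 10))
print(round(abs(P), 5))                # absolute values, as signs are arbitrary
print(round(S, 10))
```

The eigenvalues come out as 10 and 0 (up to floating-point error), ordered so that the component with the most explanatory power comes first, which matches the way W was constructed by hand.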
We have now transformed the data into a set of p orthogonal components. In order to make this more useful, ensure that the eigenvectors in W are ranked by the largest eigenvalue, ie the components which have the most explanatory power come first.

W is orthogonal if $W^{-1} = W^T$. It follows from this definition that the columns of W are orthogonal vectors, each with magnitude 1. In our example we did construct W ranked by the largest eigenvalue.

The goal is now to move on from simply transforming the data, and instead to use fewer than p components, so that we reduce the dimensionality of the problem.

By eliminating those components with the least explanatory power we won't sacrifice too much information.

To assess the explanatory power of each component, consider $S = P^T P$. This is a diagonal matrix where each diagonal element is the (scaled) variance of each component of the transformed data (the covariance between components is zero by construction).

$$P^T P = \begin{pmatrix} -\sqrt{5} & \sqrt{5} \\ 0 & 0 \end{pmatrix}\begin{pmatrix} -\sqrt{5} & 0 \\ \sqrt{5} & 0 \end{pmatrix} = \begin{pmatrix} 10 & 0 \\ 0 & 0 \end{pmatrix}$$

Recall that $X^T X$ gives the (scaled) covariance of the data using the original variables. Hence $P^T P$ gives the (scaled) covariance of the data using the new variables (components). Since the components are uncorrelated, the covariances between them are zero. The diagonal elements give the (scaled) variances of each component (the values in matrix P). The sample variances are the diagonal elements divided by $(n-1)$, which in this example is just 1.

Incidentally, it should be noted that the sample variances are equal to the corresponding eigenvalues.

For a given q, it is useful to consider the proportion of the variance that is explained by the first q principal components. This is given by the sum of the first q diagonal elements of S divided by the sum of all the diagonal elements.

It is often the case that even for large data sets with many variables, the first two or three principal components explain 90% or even more of the total variance, which is sufficient to model the situation.

In our example, 100% of the variance is explained by the first component.

There's no hard and fast rule for deciding exactly how many principal components we should use. One criterion is to keep as many components as necessary in order to explain at least 90% of the total variance. Other criteria are covered after the Core Reading example below and will be considered further in PBOR.

Since W is orthogonal by construction, $X = PW^T$. This allows us to reconstruct the original (centred) data using all or just the reduced number of components.

In the general case, we would set the columns in P of the components we are eliminating to zero.

The R function for PCA on a numeric data frame <data> is:

prcomp(<data>)

Technically this uses the more numerically stable method of singular value decomposition (SVD) of the data matrix, which obtains the same answers as using the eigenvalues of the covariance matrix.

An alternative R function for PCA is princomp(<data>), which does use eigenvalues but also uses n rather than n - 1 as the divisor in the variance/covariance matrix. Hence it will give slightly different results to prcomp(<data>).

Notes:

1. Since the principal components are linear combinations of the variables, PCA is not useful for reducing the dimensionality where there are non-linear relationships. A suitable transformation (such as log) should be applied first.

2. Since the loadings of each variable that make up the components are chosen by maximising variance, variables that have the highest variance will be given more weight. It is often good practice (especially if different units of measurement are used for each variable) to scale the data before applying PCA.

3. No explanation has been provided for what these components represent in a practical, real-world sense. Intuitively, the first component is the overall trend of the data. For the second component onwards, intuition for this must be sought elsewhere. This is often done by regressing the components against variables external to the data which the statistical analyst has an a priori cause to believe may have explanatory power.

There is now a Core Reading example based on the equity returns data from Section 2.1. It is very hard to check these figures manually due to the amount of data. We recommend you try to follow what is being done without attempting to check the numbers.

Consider our set of equity returns, X, from four different markets across the 12 time periods. We obtain X by first centering the data (ie by subtracting the mean of each market).

$$X^T X = \begin{pmatrix} 0.01431 & 0.01310 & 0.01308 & 0.01249 \\ 0.01310 & 0.02830 & 0.01026 & 0.01245 \\ 0.01308 & 0.01026 & 0.01315 & 0.01192 \\ 0.01249 & 0.01245 & 0.01192 & 0.01153 \end{pmatrix}$$

This is the (scaled) variance/covariance matrix of the centred data. Since we have 12 observations we need to divide each of these figures by $(n-1) = 11$ to get the sample variance/covariances.

The eigenvectors (each of which is determined only up to sign) are the columns of:

$$W = \begin{pmatrix} -0.48118 & -0.33488 & 0.80202 & -0.11440 \\ -0.62118 & 0.77332 & -0.06531 & -0.10879 \\ -0.43394 & -0.47122 & -0.53559 & -0.55026 \\ -0.44078 & -0.26035 & -0.25621 & 0.81993 \end{pmatrix}$$

These eigenvectors have magnitude of 1.

Hence the principal component decomposition is P = XW, a $12 \times 4$ matrix with one row for each time period and one column for each component.

Now consider:

$$S = P^T P = \begin{pmatrix} 0.05445 & 0 & 0 & 0 \\ 0 & 0.01218 & 0 & 0 \\ 0 & 0 & 0.00051 & 0 \\ 0 & 0 & 0 & 0.00013 \end{pmatrix}$$

This is the (scaled) variance/covariance matrix of the principal components. The diagonal elements are the scaled variances (the sample variances are these figures divided by 11) and the other elements are the scaled covariances (which are all zero as the components are uncorrelated).

The total (scaled) variance is the sum of the diagonals, which is 0.06727. We can now calculate how much of this total variance each principal component explains. The first principal component explains 80.9% of the total variance, the first two 99.0%, and the first three 99.8%.

We obtain these figures as follows:

$$\frac{0.05445}{0.06727} = 80.9\%, \qquad \frac{0.05445 + 0.01218}{0.06727} = 99.0\%, \qquad \frac{0.05445 + 0.01218 + 0.00051}{0.06727} = 99.8\%$$

It would therefore seem reasonable in this example to reduce the dimensionality to 2, using the first two components of P.

The decision criterion used here is to choose principal components that explain at least 90% of the total variance.

We would now continue our modelling using methods such as linear regression and GLMs on this reduced data set.

Whilst the first component is the trend, the second component will need to be regressed against one or several variables to determine an intuitive explanation.

To choose which components to keep we could also use a Scree test. This involves the examination of a line chart of the variances of each component (called a Scree diagram). The Scree test retains only those principal components before the variances level off (which is easy to observe from the Scree diagram).

For the Core Reading example, the Scree diagram is:

[Figure: Scree diagram of the variances of the four principal components]

Since the scree plot levels off after the first two components this would imply that these two components are enough.

A further alternative is the Kaiser test. This suggests only using components with variances greater than 1.
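The proportions of variance quoted for the Core Reading example, and the effect of the 90% criterion and the Kaiser threshold, can be checked with a short R sketch. The variable names are illustrative, and the only inputs are the four diagonal elements of S given above.

```r
# Diagonal elements of S = P^T P (scaled variances of the four components).
scaled_var <- c(0.05445, 0.01218, 0.00051, 0.00013)

# Proportion of the total variance explained by each component,
# and the cumulative proportion explained by the first q components.
prop     <- scaled_var / sum(scaled_var)
cum_prop <- cumsum(prop)
print(round(100 * cum_prop, 1))        # 80.9 99.0 99.8 100.0

# Smallest q whose first q components explain at least 90% of the variance.
q <- which(cum_prop >= 0.90)[1]
print(q)                               # 2

# Sample variances (divide by n - 1 = 11) and the Kaiser threshold of 1.
sample_var  <- scaled_var / 11
kaiser_keep <- which(sample_var > 1)
print(length(kaiser_keep))             # 0
```

The 90% criterion keeps two components. Every sample variance here is far below 1, so applying the Kaiser threshold to these figures would retain no components at all.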
The Kaiser test is only suitable if the data values are scaled, and hence is not appropriate here as the data have only been centred (not scaled).

Chapter 11 Summary

Exploratory data analysis (EDA) is the process of analysing data to gain further insight into the nature of the data, its patterns and relationships between the variables, before any formal statistical techniques are applied.

Scatterplots are the first step to visualise the data and assess the shape of any correlation between a pair of variables. The strength of that correlation is measured by the sample correlation coefficient, which takes a value from -1 to +1.

Pearson's correlation coefficient

Measures the strength of linear relationship between x and y.

$$r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}$$

We can carry out hypothesis tests on the true Pearson correlation coefficient, $\rho$, between two variables using the t result, the Fisher Z test or permutations.

Under $H_0: \rho = 0$:

$$\frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t_{n-2}$$

Otherwise:

$$\tanh^{-1} r \approx N\left(\tanh^{-1}\rho,\ \frac{1}{n-3}\right)$$

Spearman's rank correlation coefficient

Measures the strength of monotonic relationship. Uses Pearson's formula but with ranks. If there are no ties in the data:

$$r_s = 1 - \frac{6\sum_i d_i^2}{n(n^2-1)} \quad \text{where } d_i = r(X_i) - r(Y_i)$$

For inference use permutations or, for n > 20, Pearson's formulae with ranks. For very large values of n, the sampling distribution of $r_s$ can be approximated by the $N\left(0, \frac{1}{n-1}\right)$ distribution.

Kendall's rank correlation coefficient

Measures the strength of dependence of rank correlation.
If there are no ties in the data, then:

$$\tau = \frac{n_c - n_d}{n(n-1)/2}$$

where $n_c$ is the number of concordant pairs (where the ranks of both elements agree) and $n_d$ is the number of discordant pairs.

For inference use permutations or, for n > 10, a $N\left(0, \frac{2(2n+5)}{9n(n-1)}\right)$ distribution.

Principal components analysis

Principal component analysis (PCA) is a method for reducing the dimensionality of a data set, X, by identifying the key components necessary to model and understand the data.

These components are chosen to be uncorrelated linear combinations of the variables of the data which maximise the variance.

The principal components decomposition, P, of X (an $n \times p$ centred data matrix) is defined to be P = XW, where W is a $p \times p$ matrix, whose columns are the eigenvectors of the matrix $X^T X$.

The (scaled) covariance $S = P^T P$ gives the explanatory power of each component. We might then decide to retain components one by one until some target percentage (eg 90%) of the total variance has been explained. Alternatively, we could use a Scree diagram or the Kaiser test to help us decide which components to keep.

Chapter 11 Practice Questions

11.1 A new computerised ultrasound scanning technique has enabled doctors to monitor the weights of unborn babies. The table below shows the estimated weights for one particular baby at fortnightly intervals during the pregnancy.

Gestation period (weeks)      30   32   34   36   38   40
Estimated baby weight (kg)   1.6  1.7  2.5  2.8  3.2  3.5

$\sum x = 210$, $\sum x^2 = 7{,}420$, $\sum y = 15.3$, $\sum y^2 = 42.03$, $\sum xy = 549.8$

(i) Show that $S_{xx} = 70$, $S_{yy} = 3.015$ and $S_{xy} = 14.3$.

(ii) Show that the (Pearson's) linear correlation coefficient is equal to 0.984 and comment.

(iii) Explain why the Spearman's and Kendall's rank correlation coefficients are both equal to 1.

(iv) Carry out a test of $H_0: \rho = 0$ vs $H_1: \rho > 0$ using Pearson's correlation coefficient and:
(a) the t test
(b) Fisher's transformation.

(v) Test whether Pearson's sample correlation coefficient supports the hypothesis that the true correlation parameter is greater than 0.9.

11.2 (Exam style) A schoolteacher is investigating the claim that class size does not affect GCSE results. His observations of nine GCSE classes are as follows:

Class                                     X1   X2   X3   X4   Y1   Y2   Y3   Y4   Y5
Students in class (c)                     35   32   27   21   34   30   28   24    7
Average GCSE point score for class (p)   5.9  4.1  2.4  1.7  6.3  5.3  3.5  2.6  1.6

$\sum c = 238$, $\sum c^2 = 6{,}884$, $\sum p = 33.4$, $\sum p^2 = 149.62$, $\sum cp = 983$

(i) (a) Calculate Pearson's, Spearman's and Kendall's correlation coefficients.
(b) Use Pearson's correlation coefficient to test whether or not the data agree with the claim that class size does not affect GCSE results. [10]

Following his investigation, the teacher concludes, "bigger class sizes improve GCSE results".

(ii) Comment on this statement. [2]
[Total 12]

11.3 (Exam style) A university wishes to analyse the performance of its students on a particular degree course. It records the scores obtained by a sample of 12 students at entry to the course, and the scores obtained in their final examinations by the same students. The results are as follows:

Student                      A   B   C   D   E   F   G   H   I   J   K   L
Entrance exam score x (%)   86  53  71  60  62  79  66  84  90  55  58  72
Finals paper score y (%)    75  60  74  68  70  75  78  90  85  60  62  70

$\sum x = 836$, $\sum y = 867$, $\sum x^2 = 60{,}016$, $\sum y^2 = 63{,}603$, $\sum (x - \bar{x})(y - \bar{y}) = 1{,}122$

(i) (a) Explain why Spearman's and Kendall's rank correlation coefficients cannot be calculated here using the simplified formula.
(b) Calculate the Pearson's correlation coefficient. [3]

(ii) Test whether this data comes from a population with Pearson's correlation coefficient equal to 0.75. [3]
[Total 6]

Chapter 11 Solutions

11.1 (i) Calculate summary statistics

$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 7{,}420 - \frac{210^2}{6} = 70$$

$$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 42.03 - \frac{15.3^2}{6} = 3.015$$

$$S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 549.8 - \frac{210 \times 15.3}{6} = 14.3$$

(ii) Calculate (Pearson's) linear correlation coefficient and comment

Using the results from part (i):

$$r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}} = \frac{14.3}{\sqrt{70 \times 3.015}} = 0.984336$$

There is a strong linear association between gestation period and foetal weight.

(iii) Explain why the Spearman and Kendall rank correlation coefficients are both equal to 1

The ranks of the two variables (gestation period and weight) are exactly equal, hence Spearman's rank correlation coefficient is equal to 1. This means that all the pairs are concordant, and so $\tau$ is also equal to 1.

(iv)(a) Test $\rho > 0$ using Pearson's correlation coefficient and the t test

We are testing $H_0: \rho = 0$ vs $H_1: \rho > 0$.

If $H_0$ is true, then the test statistic $\frac{r\sqrt{4}}{\sqrt{1-r^2}}$ has a $t_4$ distribution.

The observed value of this statistic is $\frac{0.984336 \times 2}{\sqrt{1 - 0.984336^2}} = 11.17$. This is much greater than 8.610, the upper 0.05% point of the $t_4$ distribution.

So, we reject $H_0$ at the 0.05% level and conclude that there is very strong evidence that $\rho > 0$, ie that there is a positive linear correlation between the baby's weight and gestation period.

(iv)(b) Test $\rho > 0$ using Pearson's correlation coefficient and Fisher's transformation

If $H_0$ is true, then the test statistic $Z_r = \tanh^{-1} r$ has a $N\left(0, \frac{1}{3}\right)$ distribution.

The observed value of this statistic is $\tanh^{-1} 0.984336 = 2.4208$, which corresponds to a value of $\frac{2.4208}{\sqrt{1/3}} = 4.193$ on the N(0,1) distribution. This is much greater than 3.090, the upper 0.1% point of the standard normal distribution.

So, we reject $H_0$ at the 0.1% level and conclude that there is very strong evidence that $\rho > 0$, ie that there is a positive correlation between the baby's weight and gestation period.

(v) Test whether Pearson's sample correlation coefficient supports $\rho > 0.9$

We are testing $H_0: \rho = 0.9$ vs $H_1: \rho > 0.9$.

If $H_0$ is true, then the test statistic $Z_r$ has a $N\left(z_\rho, \frac{1}{3}\right)$ distribution, where $z_\rho = \tanh^{-1} 0.9 = 1.4722$.

The observed value of this statistic is $\tanh^{-1} 0.984336 = 2.4208$, which corresponds to a value of $\frac{2.4208 - 1.4722}{\sqrt{1/3}} = 1.643$ on the N(0,1) distribution. This is just less than 1.645, the upper 5% point of the standard normal distribution.

So, we cannot reject $H_0$ at the 5% level, ie the data does not provide enough evidence to conclude that the correlation parameter between the baby's weight and gestation period exceeds 0.9.

11.2 (i)(a) Calculate the correlation coefficients

Pearson correlation coefficient

$$S_{cc} = \sum c^2 - \frac{(\sum c)^2}{n} = 6{,}884 - \frac{238^2}{9} = 590.2222 \quad [½]$$

$$S_{cp} = \sum cp - \frac{(\sum c)(\sum p)}{n} = 983 - \frac{238 \times 33.4}{9} = 99.75556 \quad [½]$$

$$S_{pp} = \sum p^2 - \frac{(\sum p)^2}{n} = 149.62 - \frac{33.4^2}{9} = 25.66889 \quad [½]$$

$$r = \frac{S_{cp}}{\sqrt{S_{cc}S_{pp}}} = \frac{99.75556}{\sqrt{590.2222 \times 25.66889}} = 0.81045 \quad [1½]$$

Spearman rank correlation coefficient

The ranks (from lowest to highest) and differences are as follows:

Class                                    X1  X2  X3  X4  Y1  Y2  Y3  Y4  Y5
Students in class (c)                     9   7   4   2   8   6   5   3   1
Average GCSE point score for class (p)    8   6   3   2   9   7   5   4   1
Differences                               1   1   1   0  -1  -1   0  -1   0
[1]

Hence:

$$r_s = 1 - \frac{6 \times 6}{9(9^2 - 1)} = 0.95 \quad [1]$$

Kendall rank correlation coefficient

Arranging in order of class rank:

Class                                    Y5  X4  Y4  X3  Y3  Y2  X2  Y1  X1
Students in class (c)                     1   2   3   4   5   6   7   8   9
Average GCSE point score for class (p)    1   2   4   3   5   7   6   9   8
# concordant pairs                        8   7   5   5   4   2   2   0   0
# discordant pairs                        0   0   1   0   0   1   0   1   0
[1]

Totalling the rows gives $n_c = 33$, $n_d = 3$. Hence:

$$\tau = \frac{33 - 3}{9(9-1)/2} = 0.83 \quad [1]$$

(i)(b) Test whether class size does not affect GCSE results

We are testing:

$$H_0: \rho = 0 \quad \text{vs} \quad H_1: \rho \ne 0 \quad [½]$$

Under $H_0$:

$$\frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t_{n-2} \quad [½]$$

The observed value of the test statistic is:

$$\frac{0.81045\sqrt{7}}{\sqrt{1 - 0.81045^2}} = 3.660 \quad [1]$$

This is greater than 3.499, the upper 0.5% point of the $t_7$ distribution. [½]

Therefore, we have sufficient evidence at the 1% level to reject $H_0$.
Therefore we conclude that there is a correlation between class size and GCSE results (ie class size does affect GCSE results). [½]
[Total 10]

We could use Fisher's transformation. However, this is only an approximation; it is better to use the accurate version when testing the hypothesis that $\rho = 0$.

(ii) Comment

There is strong positive correlation between class size and GCSE results (ie bigger classes have better GCSE results). [1]

However, correlation does not necessarily imply causation, ie whilst bigger classes have better results, it is not necessarily the class size that causes the improvement. [1]
[Total 2]

11.3 (i)(a) Why we can't use simplified formulae

The ranks (from lowest to highest) and differences are as follows:

Student                      A   B   C   D   E   F   G   H   I   J   K   L
Entrance exam score x (%)   11   1   7   4   5   9   6  10  12   2   3   8
Finals paper score y (%)     8   1   7   4   5   8  10  12  11   1   3   5

Since we have tied ranks we cannot use the simplified formula for Spearman or Kendall. [1]

We would have to use a correction, which is best handled by a computer.

(i)(b) Pearson's correlation coefficient

$$S_{xx} = 60{,}016 - \frac{836^2}{12} = 1{,}774.67$$

$$S_{yy} = 63{,}603 - \frac{867^2}{12} = 962.25$$

$$S_{xy} = 1{,}122 \quad [1]$$

Therefore:

$$r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}} = \frac{1{,}122}{\sqrt{1{,}774.67 \times 962.25}} = 0.85860 \quad [1]$$
[Total 3]

(ii) Hypothesis test

We are testing $H_0: \rho = 0.75$ vs $H_1: \rho \ne 0.75$. [½]

If $H_0$ is true, then the test statistic $Z_r$ follows the $N\left(z_\rho, \frac{1}{9}\right)$ distribution, where $z_\rho = \tanh^{-1} 0.75 = 0.97296$. [½]

The observed value of this statistic is $\tanh^{-1} 0.85860 = 1.2880$, which corresponds to a value of $\frac{1.2880 - 0.97296}{\sqrt{1/9}} = 0.945$ on the N(0,1) distribution. [1]

This is clearly less than 1.96, the upper 2.5% point of the standard normal distribution. [½]

So, we have insufficient evidence at the 5% level to reject $H_0$, ie the data do not provide enough evidence to conclude that the correlation parameter is any different from 0.75.
[½]
[Total 3]

CS1-12: Linear regression

Syllabus objectives

4.1 Linear regression

4.1.1 Explain what is meant by response and explanatory variables.

4.1.2 State the simple regression model (with a single explanatory variable).

4.1.3 Derive the least squares estimates of the slope and intercept parameters in a simple linear regression model.

4.1.4 Use appropriate statistical software to fit a simple linear regression model to a data set and interpret the output.
    Perform statistical inference on the slope parameter.
    Describe the use of various measures of goodness of fit of a linear regression model ($R^2$).
    Use a fitted linear relationship to predict a mean response or an individual response with confidence limits.
    Use residuals to check the suitability and validity of a linear regression model.

4.1.5 State the multiple linear regression model (with several explanatory variables).

4.1.6 Use appropriate software to fit a multiple linear regression model to a data set and interpret the output.

4.1.7 Use measures of model fit to select an appropriate set of explanatory variables.
0 Introduction

In the last chapter we examined the correlation between two variables. If there is a suitably strong correlation between the two variables (and there is cause and effect), we can justifiably calculate a regression line, which gives the mathematical form of this relationship.

[Figure: scatterplot of Y against X with the regression line $E[Y|X] = \alpha + \beta X$.]

Much of this chapter is concerned with obtaining estimates of the parameters associated with this regression line and giving confidence intervals for our estimates using the methods from Chapter 9. Due to the mathematically rigorous nature of this work, a number of results are quoted without proof, and students are expected to memorise and apply these results in the exam.

This is a long chapter and will probably require two study sessions to cover it in detail. In the past, this material often formed one of the longer questions in the Statistics exam.

In the previous chapter we carried out correlation analysis on bivariate and multivariate data to assess the strength of the relationship between variables. In this unit we look at regression analysis to assess the nature of the relationship between Y, the response (or dependent) variable, and X, the explanatory (or independent, or regressor) variable(s).

The values of the response variable (our principal variable of interest) depend on, or are, in part, explained by, the values of the other variable(s), which is referred to as the explanatory variable(s). Ideally, the values used for the explanatory variable(s) are controlled by the experimenter (in the analysis they are in fact assumed to be error-free constants, as opposed to random variables with distributions).

Regression analysis consists of choosing and fitting an appropriate model, usually with a view to estimating the mean response (ie the mean value of the response variable) for specified values of the explanatory variable(s). A prediction of the value of an individual response may also be needed.

In this chapter only linear relationships will be considered, which assume that the expected value of Y, for any given value x of X, is a linear function of that value x. For the bivariate case this simplifies to:

$$E[Y|x] = \alpha + \beta x$$

For the multivariate case with k explanatory variables, this is:

$$E[Y|x_1, x_2, \ldots, x_k] = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$$

Recall from Chapter 5 that $E[Y|x]$ is a conditional mean, which represents the average value of Y corresponding to a given value of x.

As always, before selecting and fitting a model, the data must be examined (eg in scatterplots) to see which types of model (and model assumptions) may or may not be reasonable.

Question

A sample of ten claims and corresponding payments on settlement for household policies is taken from the business of an insurance company. The amounts, in units of 100, are as follows:

Claim x     2.10  2.40  2.50  3.20  3.60  3.80  4.10  4.20  4.50  5.00
Payment y   2.18  2.06  2.54  2.61  3.67  3.25  4.02  3.71  4.38  4.45

The scatterplot from the previous unit was as follows:

[Figure: scatterplot of the claims and payments data.]

Discuss whether a linear regression model is appropriate.

Solution

There appears to be a strong positive linear relationship and so fitting a linear regression model is appropriate.

If a non-linear relationship (or no relationship) between the variables is indicated by the data, then the methods of analysis discussed here are not applicable for the data as they stand. However, a well-chosen transformation of y (or x, or even both) may bring the data into a form for which these methods are applicable.

The purpose of the transformation is to change the relationship into linear form, ie into the form $Y = a + bX$.

Question

Explain how to transform the relationship $Y = a x^b$ to a linear form.
Solution

If we take logs, the relationship becomes:

$$\log Y = \log a + b \log x$$

So if we work in terms of $Y' = \log Y$ and $\log x$, we have a linear relationship:

$$Y' = \log a + b \log x$$

1 The simple bivariate linear model

1.1 Model specification

Given a set of n pairs of data $(x_i, y_i)$, $i = 1, 2, \ldots, n$, the $y_i$ are regarded as observations of a response variable $Y_i$. For the purposes of the analysis the $x_i$, the values of an explanatory variable, are regarded as constant.

The simple linear regression model (with one explanatory variable)

The response variable $Y_i$ is related to the value $x_i$ by:

$$Y_i = \alpha + \beta x_i + e_i, \quad i = 1, 2, \ldots, n$$

where the $e_i$ are uncorrelated error variables with mean 0 and common variance $\sigma^2$.

So $E[e_i] = 0$, $\text{var}[e_i] = \sigma^2$, $i = 1, 2, \ldots, n$.

$\beta$ is the slope parameter, $\alpha$ the intercept parameter.

This is equivalent to saying that $y = mx + c$, where m is the gradient or slope and c is the intercept, ie where the line crosses the y-axis.

1.2 Fitting the model

We can estimate the parameters in a regression model using the method of least squares.

Fitting the model involves:
(a) estimating the parameters $\alpha$ and $\beta$, and
(b) estimating the error variance $\sigma^2$.

The fitted regression line, which gives the estimated value of Y for a fixed x, is given by:

$$\hat{y} = \hat\alpha + \hat\beta x$$

where $\hat\beta = \dfrac{S_{xy}}{S_{xx}}$ and $\hat\alpha = \bar{y} - \hat\beta \bar{x}$.

These are the equations we use to calculate the best values of $\alpha$ and $\beta$. They are given in the Tables.

Recall from the previous chapter that:

$$S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n} = \sum x_i^2 - n\bar{x}^2$$

$$S_{yy} = \sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n} = \sum y_i^2 - n\bar{y}^2$$

$$S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n} = \sum x_i y_i - n\bar{x}\bar{y}$$

In regression questions, $\sum_{i=1}^{n} x_i^2$ is often abbreviated to $\sum x^2$, etc to simplify the notation.

Question

Show that the first of these relationships is true, ie that:

$$S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n} = \sum x_i^2 - n\bar{x}^2$$
Solution

Expanding the bracket and splitting up the summation, we have:

$$S_{xx} = \sum (x_i - \bar{x})^2 = \sum \left(x_i^2 - 2\bar{x}x_i + \bar{x}^2\right) = \sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2$$

Now since $\bar{x} = \dfrac{\sum x_i}{n}$, ie $\sum x_i = n\bar{x}$, we have:

$$S_{xx} = \sum x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 = \sum x_i^2 - n\bar{x}^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}$$

These formulae are given on page 24 of the Tables in the format $S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - n\bar{x}^2$.

For a set of paired data $\{(x_i, y_i);\ i = 1, 2, \ldots, n\}$, the least squares estimates of the regression coefficients are the values $\hat\alpha$ and $\hat\beta$ for which:

$$q = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left[y_i - (\alpha + \beta x_i)\right]^2$$

is a minimum.

In fact, for any model $Y_i = g(x_i) + e_i$, the least squares estimates of the regression coefficients can be determined as the values for which:

$$q = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left[y_i - g(x_i)\right]^2$$

is a minimum.

The equations we need to solve to find the values of $\hat\alpha$ and $\hat\beta$ are sometimes called normal equations.

Differentiating q partially with respect to $\alpha$ and $\beta$, and equating to zero, gives the normal equations:

$$\sum_{i=1}^{n} y_i = n\hat\alpha + \hat\beta \sum_{i=1}^{n} x_i$$

$$\sum_{i=1}^{n} x_i y_i = \hat\alpha \sum_{i=1}^{n} x_i + \hat\beta \sum_{i=1}^{n} x_i^2$$

Solving these equations by using determinants or the method of elimination then gives the least squares estimate of $\beta$ as:

$$\hat\beta = \frac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{n\sum x_i^2 - \left(\sum x_i\right)^2}$$

This can also be written as $\hat\beta = \dfrac{S_{xy}}{S_{xx}}$.

The first of the two normal equations gives $\hat\alpha$ as:

$$\hat\alpha = \frac{\sum y_i - \hat\beta \sum x_i}{n} = \bar{y} - \hat\beta\bar{x}$$

Being able to produce a full derivation of these results is important, as it has been examined many times in the past.

Note that a fitted line will pass through the point $(\bar{x}, \bar{y})$.

Question

Show that the fitted line $\hat{y} = \hat\alpha + \hat\beta x$ passes through the point $(\bar{x}, \bar{y})$.
Solution

Substituting $x = \bar{x}$ into the RHS of $\hat{y} = \hat\alpha + \hat\beta x$ gives:

$$\hat{y} = \hat\alpha + \hat\beta\bar{x}$$

Since $\hat\alpha = \bar{y} - \hat\beta\bar{x}$, it follows that:

$$\hat{y} = (\bar{y} - \hat\beta\bar{x}) + \hat\beta\bar{x} = \bar{y}$$

$\hat\beta$ is the observed value of a statistic $\hat{B}$ whose sampling distribution has the following properties:

$$E[\hat{B}] = \beta, \qquad \text{var}[\hat{B}] = \frac{\sigma^2}{S_{xx}}$$

The estimate of the error variance $\sigma^2$ is based on the sum of squares of the estimated errors:

$$\hat\sigma^2 = \frac{1}{n-2}\sum (y_i - \hat{y}_i)^2$$

Alternatively, this can be calculated more easily using this equivalent formula:

$$\hat\sigma^2 = \frac{1}{n-2}\left(S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right)$$

This is given on page 24 of the Tables. We will see later that this is an unbiased estimator of $\sigma^2$.

The R code to fit a linear model to a bivariate data frame, <data>=c(X,Y), and assign it to the object model, is:

    model <- lm(Y ~ X)

Then the estimates of the coefficients and error standard deviation can be obtained by:

    summary(model)

The R code to draw the fitted regression line on an existing plot is:

    abline(model)

Question

The sample of ten claims and payments above (in units of 100) has the following summations:

$$\sum x = 35.4, \quad \sum x^2 = 133.76, \quad \sum y = 32.87, \quad \sum y^2 = 115.2025, \quad \sum xy = 123.81$$

Calculate the fitted regression line and the estimated error variance.

Solution

Number of pairs of observations $n = 10$.

$$S_{xx} = \sum x^2 - n\bar{x}^2 = 133.76 - \frac{35.4^2}{10} = 8.444$$

$$S_{yy} = \sum y^2 - n\bar{y}^2 = 115.2025 - \frac{32.87^2}{10} = 7.1588$$

$$S_{xy} = \sum xy - n\bar{x}\bar{y} = 123.81 - \frac{35.4 \times 32.87}{10} = 7.4502$$

$$\hat\beta = \frac{S_{xy}}{S_{xx}} = \frac{7.4502}{8.444} = 0.88231$$

$$\hat\alpha = \bar{y} - \hat\beta\bar{x} = 3.287 - (0.88231 \times 3.54) = 0.164$$

$$\hat\sigma^2 = \frac{1}{n-2}\left(S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right) = \frac{1}{8}\left(7.1588 - \frac{7.4502^2}{8.444}\right) = 0.0732$$

So the fitted regression line is $\hat{y} = 0.164 + 0.8823x$, which is shown on the graph below.

[Figure: scatterplot of the claims and payments data with the fitted regression line.]

Once we have worked out the estimates of $\alpha$ and $\beta$, we can calculate predicted values of y corresponding to $x_i$ using the formula $\hat{y}_i = \hat\alpha + \hat\beta x_i$.
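As a rough check on the hand calculation of the fitted line above, the following R sketch recomputes $S_{xx}$, $S_{yy}$, $S_{xy}$, $\hat\alpha$, $\hat\beta$ and $\hat\sigma^2$ from the ten claims and payments and compares them with the output of lm. The variable names (x, y, beta_hat and so on) are illustrative choices made here, not names prescribed by the Core Reading.

```r
# Claims (x) and payments (y) from the worked example, in units of 100
x <- c(2.10, 2.40, 2.50, 3.20, 3.60, 3.80, 4.10, 4.20, 4.50, 5.00)
y <- c(2.18, 2.06, 2.54, 2.61, 3.67, 3.25, 4.02, 3.71, 4.38, 4.45)
n <- length(x)

# Sums of squares and cross-products about the means
Sxx <- sum(x^2) - n * mean(x)^2
Syy <- sum(y^2) - n * mean(y)^2
Sxy <- sum(x * y) - n * mean(x) * mean(y)

# Least squares estimates of the slope and intercept
beta_hat  <- Sxy / Sxx
alpha_hat <- mean(y) - beta_hat * mean(x)

# Estimate of the error variance (divisor n - 2)
sigma2_hat <- (Syy - Sxy^2 / Sxx) / (n - 2)

# Cross-check against R's built-in fit
model <- lm(y ~ x)
coef(model)              # intercept and slope
summary(model)$sigma^2   # estimated error variance
```

The two sets of estimates should agree up to rounding, with a slope of about 0.8823 and an intercept of about 0.164.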
Question

For the claims settlement question above, calculate the expected payment on settlement for a claim of 350.

Solution

Since we are working in units of 100, a claim of 350 corresponds to $x = 3.5$. Substituting this into the regression line gives:

$$\hat{y} = 0.164 + 0.8823 \times 3.5 = 3.25$$

So we would expect the settlement payment to be 325.

1.3 Partitioning the variability of the responses

To help understand the goodness of fit of the model to the data, the total variation in the responses, as given by $S_{yy} = \sum (y_i - \bar{y})^2$, should be studied.

Some of the variation in the responses can be attributed to the relationship with x (eg y may tend to be high when x is high, low when x is low) and some is random variation (unmodellable) above and beyond that. Just how much is attributable to the relationship, or explained by the model, is a measure of the goodness of fit of the model.

We start from an identity involving $y_i$ (the observed y value), $\bar{y}$ (the overall average of the y values) and $\hat{y}_i$ (the predicted value of y).

Squaring and summing both sides of:

$$y_i - \bar{y} = (y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})$$

gives:

$$\sum (y_i - \bar{y})^2 = \sum (y_i - \hat{y}_i)^2 + \sum (\hat{y}_i - \bar{y})^2$$

the cross-product term vanishing.

The sum on the left is the total sum of squares of the responses, denoted here by $SS_{TOT}$.

The second sum on the right is the sum of the squares of the deviations of the fitted responses (the estimates of the conditional means) from the overall mean response (the estimate of the overall mean). It summarises the variability accounted for, or explained by, the model. It is called the regression sum of squares, denoted here by $SS_{REG}$.

The first sum on the right is the sum of the squares of the estimated errors (response - fitted response, generally referred to in statistics as a residual from the fit). It summarises the remaining variability, that between the responses and their fitted values and so unexplained by the model.
It is called the residual sum of squares, denoted here by $SS_{RES}$. The estimate of $\sigma^2$ is based on it. It is $\hat\sigma^2 = \dfrac{SS_{RES}}{n-2}$.

So:

$$SS_{TOT} = SS_{RES} + SS_{REG}$$

$SS_{RES}$ is often also written as $SS_{ERR}$ (error).

For computational purposes, $SS_{TOT} = S_{yy}$ and:

$$SS_{REG} = \sum \left[(\hat\alpha + \hat\beta x_i) - (\hat\alpha + \hat\beta\bar{x})\right]^2 = \hat\beta^2 S_{xx} = \frac{S_{xy}^2}{S_{xx}}$$

The last step uses the fact that $\hat\beta = \dfrac{S_{xy}}{S_{xx}}$.

So $SS_{RES} = S_{yy} - \dfrac{S_{xy}^2}{S_{xx}}$.

Question

Determine the split of total variation in the claims and payments model between the residual sum of squares and the regression sum of squares. Recall that:

$$n = 10, \quad \sum x = 35.4, \quad \sum y = 32.87, \quad S_{xx} = 8.444, \quad S_{yy} = 7.1588, \quad S_{xy} = 7.4502$$

Solution

$$SS_{TOT} = S_{yy} = 7.1588$$

$$SS_{REG} = \frac{S_{xy}^2}{S_{xx}} = \frac{7.4502^2}{8.444} = 6.5734$$

$$SS_{RES} = SS_{TOT} - SS_{REG} = 0.5854$$

$$\Rightarrow \frac{SS_{RES}}{n-2} = \frac{0.5854}{8} = 0.0732$$

This gives the same value of $\hat\sigma^2$ that we obtained earlier using the alternative formula $\dfrac{1}{n-2}\left(S_{yy} - \dfrac{S_{xy}^2}{S_{xx}}\right)$.

It can then be shown that:

$$E[SS_{TOT}] = (n-1)\sigma^2 + \beta^2 S_{xx}, \qquad E[SS_{REG}] = \sigma^2 + \beta^2 S_{xx}$$

from which it follows that $E[SS_{RES}] = (n-2)\sigma^2$.

Hence:

$$E[\hat\sigma^2] = E\left[\frac{1}{n-2}\left(S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right)\right] = \frac{1}{n-2}E[SS_{RES}] = \frac{(n-2)\sigma^2}{n-2} = \sigma^2$$

So $\hat\sigma^2$ is an unbiased estimator of $\sigma^2$.

In the case that the data are close to a line ($|r|$ high, ie a strong linear relationship) the model fits well, the fitted responses (the values on the fitted line) are close to the observed responses, and so $SS_{REG}$ is relatively high with $SS_{RES}$ relatively low.

r is referring to Pearson's correlation coefficient, which we calculated in Chapter 11.

In the case that the data are not close to a line ($|r|$ low, ie a weak linear relationship) the model does not fit so well, the fitted responses are not so close to the observed responses, and so $SS_{REG}$ is relatively low and $SS_{RES}$ relatively high.

The proportion of the total variability of the responses explained by a model is called the coefficient of determination, denoted $R^2$.

Here, the proportion is:

$$R^2 = \frac{SS_{REG}}{SS_{TOT}} = \frac{S_{xy}^2}{S_{xx} S_{yy}}$$

[The value of the proportion $R^2$ is usually quoted as a percentage.]

$R^2$ can take values between 0% and 100% inclusive.

Question

Calculate the coefficient of determination for the claims and payments model and comment on it. Recall that:

$$SS_{TOT} = 7.1588, \quad SS_{REG} = 6.5734, \quad SS_{RES} = 0.5854$$

Solution

$$R^2 = \frac{SS_{REG}}{SS_{TOT}} = \frac{6.5734}{7.1588} = 0.918 \ (91.8\%)$$

This value is very high and so indicates that the overwhelming majority of the variation is explained by the model (and hence very little is left over in residual variation). Hence the linear regression model is a good fit to the data.

In this case (the simple linear regression model), note that the value of the coefficient of determination is the square of Pearson's correlation coefficient for the data since:

$$r = \frac{S_{xy}}{(S_{xx} S_{yy})^{1/2}}$$

Pearson's sample correlation coefficient was introduced in the previous chapter and is given on page 24 of the Tables.

Question

Calculate the correlation coefficient for the claims and payment data by using the coefficient of determination from the previous question.

Solution

$$r = \pm\sqrt{0.918} = \pm 0.958$$

Since we saw earlier that there was a positive relationship between claims and the settlement payments, we have a correlation coefficient of 0.958.

The R code to obtain the regression and residual sum of squares for a linear model assigned to the object model, is:

    anova(model)

The coefficient of determination is given in the output of:

    summary(model)

2 The full normal model and inference

2.1 The full normal model

The model must be specified further in order to make inferences concerning the responses based on the fitted model. In particular, information on the distribution of the $Y_i$'s is required.

In the full model, we now assume that the errors, $e_i$, are independent and identically distributed as $N(0, \sigma^2)$ variables.
This will then allow us to obtain the distributions for $\hat\beta$ and the $Y_i$'s. We can then use these to construct confidence intervals and carry out statistical inference.

For the full model the following additional assumptions are made:

The error variables $e_i$ are: (a) independent, and (b) normally distributed.

Under this full model, the $e_i$'s are independent, identically distributed random variables, each with a normal distribution with mean 0 and variance $\sigma^2$. It follows that the $Y_i$'s are independent, normally distributed random variables, with $E[Y_i] = \alpha + \beta x_i$ and $\text{var}[Y_i] = \sigma^2$.

$\hat{B}$, being a linear combination of independent normal variables, itself has a normal distribution, with mean and variance as noted earlier.

The further results required are:

(1) $\hat{B}$ and $\hat\sigma^2$ are independent
(2) $\dfrac{(n-2)\hat\sigma^2}{\sigma^2}$ has a $\chi^2$ distribution with $\nu = n - 2$.

Note: With the full model in place the $Y_i$'s have normal distributions and it is possible to derive maximum likelihood estimators of the parameters $\alpha$, $\beta$ and $\sigma^2$ (since maximum likelihood estimation requires us to know the distribution whereas least squares estimation does not).

It is possible to show that the maximum likelihood estimators of $\alpha$ and $\beta$ are the same as the least squares estimators, but the MLE of $\sigma^2$ has a different denominator from the least squares estimator.

2.2 Inferences on the slope parameter $\beta$

To conform to usual practice the distinction between $\hat{B}$, the random variable, and its value $\hat\beta$, will now be dropped. Only one symbol, namely $\hat\beta$, will be used.

Using the fact that $E(\hat\beta) = \beta$ and $\text{var}(\hat\beta) = \dfrac{\sigma^2}{S_{xx}}$ from Section 1.2:

$$A = \frac{\hat\beta - \beta}{(\sigma^2 / S_{xx})^{1/2}}$$

is a standard normal variable.

Repeating result (2) from Section 2.1:

$$B = \frac{(n-2)\hat\sigma^2}{\sigma^2}$$

is a $\chi^2$ variable with $\nu = n - 2$ degrees of freedom.

Now, since $\hat\beta$ and $\hat\sigma^2$ are independent, it follows that $\dfrac{A}{\{B/(n-2)\}^{1/2}}$ has a t distribution with $\nu = n - 2$, ie:

$$\frac{\hat\beta - \beta}{se(\hat\beta)} \text{ has a } t \text{ distribution with } \nu = n - 2 \qquad (3)$$

where the symbol $se(\hat\beta)$ denotes the estimated standard error of $\hat\beta$, namely $(\hat\sigma^2 / S_{xx})^{1/2}$.

Result (3) can now be used for the construction of confidence intervals, and for tests, on the value of $\beta$, the slope coefficient in the model. $H_0: \beta = 0$ is the 'no linear relationship' hypothesis.

Since $\hat\beta = \dfrac{S_{xy}}{S_{xx}}$ and $r = \dfrac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$, if $\hat\beta = 0$ then $S_{xy} = 0$ and $r = 0$ too.

This t distribution result for the estimator of $\beta$ is given on page 24 of the Tables.

Question

For the claims/settlements data:

(a) calculate a two-sided 95% confidence interval for $\beta$, the slope of the true regression line
(b) test the hypothesis $H_0: \beta = 1$ vs $H_1: \beta \ne 1$.

Recall that:

$$S_{xx} = 8.444, \quad S_{yy} = 7.1588, \quad S_{xy} = 7.4502, \quad \hat\alpha = 0.164, \quad \hat\beta = 0.88231, \quad \hat\sigma^2 = 0.0732$$

Solution

(a) $se(\hat\beta) = (0.0732 / 8.444)^{1/2} = 0.0931$

    95% confidence interval for $\beta$ is $\hat\beta \pm \{t_{0.025,8} \times se(\hat\beta)\}$

    ie $0.8823 \pm (2.306 \times 0.0931)$, ie $0.8823 \pm 0.2147$

    So a 95% confidence interval is (0.668, 1.10).

(b) The 95% two-sided confidence interval in (a) contains the value 1, so the two-sided test in (b) conducted at the 5% level results in $H_0$ being accepted.

In R, the statistic and p-value for the test of $H_0: \beta = 0$ for a linear regression model are displayed in summary(model).

In R, 95% confidence intervals for the parameters $\alpha$ and $\beta$ from a linear regression model are given by:

    confint(model, level=0.95)

2.3 Analysis of variance (ANOVA)

In Section 1.3 we partitioned the variability into that arising from the regression model ($SS_{REG}$) and the left-over residual (or error) variability ($SS_{RES}$ or $SS_{ERR}$). We then calculated a ratio of the variances to obtain the proportion of variability that was determined (or explained) by the model (called the coefficient of determination, $R^2$). This gave us a crude measure of fit.

With the distributional assumptions underlying the full regression model we can now do a more formal test of fit.
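The partition of variability from Section 1.3 can be checked numerically before moving on to the formal test. The sketch below assumes the ten claims and payments from Section 1 are entered as vectors x and y (names chosen here purely for illustration) and computes the three sums of squares directly from their definitions.

```r
# Claims (x) and payments (y), in units of 100
x <- c(2.10, 2.40, 2.50, 3.20, 3.60, 3.80, 4.10, 4.20, 4.50, 5.00)
y <- c(2.18, 2.06, 2.54, 2.61, 3.67, 3.25, 4.02, 3.71, 4.38, 4.45)

model <- lm(y ~ x)
y_hat <- fitted(model)                 # fitted responses on the regression line

SS_TOT <- sum((y - mean(y))^2)         # total sum of squares (equals Syy)
SS_REG <- sum((y_hat - mean(y))^2)     # variability explained by the model
SS_RES <- sum((y - y_hat)^2)           # residual (unexplained) variability

# Coefficient of determination
R2 <- SS_REG / SS_TOT

c(SS_TOT = SS_TOT, SS_REG = SS_REG, SS_RES = SS_RES, R2 = R2)
```

For these data R2 comes out at about 0.918, and it equals both the square of cor(x, y) and the value stored in summary(model)$r.squared.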
Recall from Chapter 7 that the variance of a sample taken from a normal distribution has a chi-square distribution, $\dfrac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$, and the ratio of variances of two samples from a normal distribution has an F distribution, $\dfrac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{n_1-1,\,n_2-1}$. We can therefore use an F test to compare the regression variance to the residual variance.

Another method of testing the 'no linear relationship' hypothesis (ie $H_0: \beta = 0$) is to analyse the sum of squares from Section 1.3.

In Section 2.1, we saw under the full normal model that $\dfrac{(n-2)\hat\sigma^2}{\sigma^2} \sim \chi^2_{n-2}$. Since $\hat\sigma^2 = \dfrac{SS_{RES}}{n-2}$ we have $\dfrac{SS_{RES}}{\sigma^2} \sim \chi^2_{n-2}$.

When $H_0$ is true, $\dfrac{SS_{TOT}}{n-1}$ is the overall sample variance and so $\dfrac{SS_{TOT}}{\sigma^2}$ is $\chi^2_{n-1}$. $SS_{REG}$ and $SS_{RES}$ are in fact independent and, since $\dfrac{SS_{RES}}{\sigma^2}$ is $\chi^2_{n-2}$, it follows that $\dfrac{SS_{REG}}{\sigma^2}$ is $\chi^2_1$.

Therefore:

$$\frac{SS_{REG}}{SS_{RES}/(n-2)} = \frac{\text{regression mean sum of squares}}{\text{residual mean sum of squares}} = \frac{MSS_{REG}}{MSS_{RES}}$$

is $F_{1,\,n-2}$ and $H_0$ is rejected for 'large' values of this ratio.

The mean sum of squares (sometimes just called the mean square) is where we divide the sum of squares by the degrees of freedom. The sample variance, $s^2 = \dfrac{1}{n-1}\sum (x_i - \bar{x})^2$, is actually a mean sum of squares.

Unlike the coefficient of determination, we divide the regression variability by the residual rather than the total variability. A large value of this ratio would mean that the majority of the variability is explained by the linear regression model. Therefore we would reject the null hypothesis of no linear relationship.

The results are usually set out in an ANOVA table:

Source of variation   Degrees of freedom   Sum of squares   Mean sum of squares
Regression            1                    $SS_{REG}$       $SS_{REG}/1$
Residual              n - 2                $SS_{RES}$       $SS_{RES}/(n-2)$
Total                 n - 1                $SS_{TOT}$

The test statistic is the ratio of the values in the last column.

In R, the F statistic and its p-value for a regression model are given in the output of both anova(model) and summary(model).
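As an illustration of the ANOVA calculation just described, the following sketch builds the F statistic from the sums of squares for the ten claims and payments of Section 1 and compares it with the value reported by anova. The object names (F_stat, p_value and so on) are assumptions made for this example only.

```r
# Claims (x) and payments (y), in units of 100
x <- c(2.10, 2.40, 2.50, 3.20, 3.60, 3.80, 4.10, 4.20, 4.50, 5.00)
y <- c(2.18, 2.06, 2.54, 2.61, 3.67, 3.25, 4.02, 3.71, 4.38, 4.45)
n <- length(y)

model <- lm(y ~ x)

# Regression and residual sums of squares
SS_REG <- sum((fitted(model) - mean(y))^2)
SS_RES <- sum(residuals(model)^2)

# Mean sums of squares: divide by the degrees of freedom (1 and n - 2)
MSS_REG <- SS_REG / 1
MSS_RES <- SS_RES / (n - 2)

# F statistic and its upper-tail p-value on (1, n - 2) degrees of freedom
F_stat  <- MSS_REG / MSS_RES
p_value <- pf(F_stat, df1 = 1, df2 = n - 2, lower.tail = FALSE)

# The same quantities as laid out by R
anova(model)
```

Here F_stat is roughly 89.8 on (1, 8) degrees of freedom, so the hypothesis of no linear relationship would be rejected at the 1% level.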
Question

For the data set of 10 claims and their settlement payments, we had:

$$SS_{TOT} = 7.1588, \quad SS_{REG} = 6.5734, \quad SS_{RES} = 0.5854$$

Construct the ANOVA table and carry out an F test to assess whether $\beta = 0$.

Solution

The ANOVA table is:

Source of variation   d.f.   SS       MSS
Regression            1      6.5734   6.5734
Residual              8      0.5854   0.0732
Total                 9      7.1588

Under $H_0: \beta = 0$ we have $F = \dfrac{6.5734}{0.0732} = 89.8$ on (1, 8) degrees of freedom.

The p-value of $F = 89.8$ is less than even 0.01, so $H_0$ is rejected at the 1% level.

2.4 Estimating a mean response and predicting an individual response

(a) Mean response

This is often the main issue, the whole point of the modelling exercise. For example, the expected settlement for claims of 460 can be estimated as follows:

If $\mu_0$ is the expected (mean) response for a value $x_0$ of the explanatory variable (ie $\mu_0 = E[Y|x_0] = \alpha + \beta x_0$), $\mu_0$ is estimated by $\hat\mu_0 = \hat\alpha + \hat\beta x_0$, which is an unbiased estimator.

The variance of the estimator is given by:

$$\text{var}(\hat\mu_0) = \left\{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right\}\sigma^2$$

This result is given on page 25 of the Tables.

The distribution actually used is a t distribution. The argument is similar to that described in Section 2.2:

$$\frac{\hat\mu_0 - \mu_0}{se[\hat\mu_0]} \text{ has a } t \text{ distribution with } \nu = n - 2 \qquad (4)$$

where $se[\hat\mu_0]$ denotes the estimated standard error of the estimate, namely:

$$se[\hat\mu_0] = \left[\left\{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right\}\hat\sigma^2\right]^{1/2}$$

Result (4) can be used for the construction of confidence intervals for the value of the expected response when $x = x_0$.

(b) Individual response

Rather than estimating an expected response $E[Y|x_0]$, an estimate, or prediction, of an individual response $y_0$ (for $x = x_0$) is sometimes required.
The actual estimate is the same as in (a), namely:

$$\hat{y}_0 = \hat\alpha + \hat\beta x_0$$

but the uncertainty associated with this estimator (as measured by the variance) is greater than in (a) since the value of an individual response $y_0$ rather than the more stable mean response is required. To cater for the extra variation of an individual response about the mean, an extra term $\sigma^2$ has to be added into the expression for the variance of the estimator of a mean response.

In other words, the variance of the individual response estimator is:

$$\text{var}(\hat{y}_0) = \left\{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right\}\sigma^2$$

The result is:

$$\frac{y_0 - \hat{y}_0}{se[\hat{y}_0]} \text{ has a } t \text{ distribution with } \nu = n - 2 \qquad (5)$$

where $se[\hat{y}_0]$ denotes the estimated standard error of the estimate, namely:

$$se[\hat{y}_0] = \left[\left\{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right\}\hat\sigma^2\right]^{1/2}$$

Result (5) can then be used for the construction of confidence intervals (or prediction intervals) for the value of a response when $x = x_0$.

The resulting interval for an individual response $y_0$ is wider than the corresponding interval for the mean response $\mu_0$.

Recall that for an individual response value we have $y_i = \alpha + \beta x_i + e_i$, which is the regression line plus an error term. Since $e_i \sim N(0, \sigma^2)$, an individual point is on the regression line on average. Hence we have the same estimate as for the mean response. However, we can see that there is an additional $\sigma^2$ in the expression for the variance.

Question

Consider again the claims/settlements example. Calculate:

(a) a 95% confidence interval for the expected payments on claims of 460
(b) a 95% confidence interval for the predicted actual payments on claims of 460.

Recall that $\hat\alpha = 0.164$, $\hat\beta = 0.88231$, $\hat\sigma^2 = 0.0732$ and $S_{xx} = 8.444$.

Solution

(a) Estimate of expected payment $= 0.1636 + 0.88231(4.6) = 4.222$

    se of estimate $= \left[\left\{\dfrac{1}{10} + \dfrac{(4.6 - 3.54)^2}{8.444}\right\} 0.0732\right]^{1/2} = 0.1306$
    $t_{0.025,8} = 2.306$

    So confidence interval is $4.222 \pm (2.306 \times 0.1306)$

    ie $4.222 \pm 0.301$, ie (3.921, 4.523), ie (392, 452)

(b) Predicted payment = 4.222

    se of estimate $= \left[\left\{1 + \dfrac{1}{10} + \dfrac{(4.6 - 3.54)^2}{8.444}\right\} 0.0732\right]^{1/2} = 0.3004$

    So confidence interval is $4.222 \pm (2.306 \times 0.3004)$

    ie $4.222 \pm 0.693$, ie (3.529, 4.915), ie (353, 492)

In R, predicted y values for, say, $x_0 = 4$ in a linear regression model fitted to a data frame c(X,Y) can be obtained as follows:

    newdata <- data.frame(X=4)
    predict(model, newdata)

The R code for 95% confidence intervals for the mean and individual response are:

    predict(model, newdata, interval="confidence", level=0.95)
    predict(model, newdata, interval="predict", level=0.95)

2.5 Checking the model

The residual from the fit at $x_i$ is the estimated error, the difference between the response value $y_i$ and the fitted value, ie:

    residual at $x_i$ is $\hat{e}_i = y_i - \hat{y}_i$

The R code for obtaining the fitted values and the residuals of a linear regression model is:

    fitted(model)
    residuals(model)

By examining the residuals it is possible to investigate the validity of the assumptions in the model about (i) the true errors $e_i$ (which are assumed to be independent normal variables with means 0 and the same variance $\sigma^2$), and (ii) the nature of the relationship between the response and explanatory variables.

Plotting the residuals along a line may suggest a departure from normality for the error distribution. The sizes of the residuals should also be looked at, bearing in mind that the value of $\hat\sigma$ estimates the standard deviation of the error distribution.

Ideally, we would expect the residuals to be symmetrical about 0 and no more than 3 standard deviations from it. So skewed residuals or outliers would indicate non-normality.

Alternatively, a quantile-quantile (Q-Q) plot of the residuals against a normal distribution should form a straight line.
Recall that Q-Q plots were introduced in Chapter 6. They are far superior to dotplots, but will require the use of R to produce them using the function qqnorm.

Scatter plots of the residuals against the values of the explanatory variable (or against the values of the fitted responses) are also most informative. If the residuals do not have a random scatter, ie if there is a pattern, then this suggests an inadequacy in the model.

Question

The claims/settlement data values were as follows:

Claim x     2.10  2.40  2.50  3.20  3.60  3.80  4.10  4.20  4.50  5.00
Payment y   2.18  2.06  2.54  2.61  3.67  3.25  4.02  3.71  4.38  4.45

Calculate the residuals for the fitted regression model $\hat{y} = 0.164 + 0.8823x$.

Solution

The residuals $\hat{e}_i = y_i - \hat{y}_i$ are given in the table below:

$x_i$         2.10    2.40    2.50    3.20    3.60    3.80    4.10    4.20    4.50    5.00
$\hat{e}_i$   0.163  -0.221   0.171  -0.377   0.330  -0.266   0.239  -0.159   0.246  -0.125

The dotplot and the Q-Q plot of the residuals and the plot of the residuals against the explanatory variable are as follows:

[Figures: dotplot of the residuals; normal Q-Q plot of the residuals; plot of the residuals against the explanatory variable.]

There is nothing to suggest non-normality in the first diagram. The dotplot is symmetrical about 0 and within $3\hat\sigma = 0.811$ either side, so there are no outliers. Ideally we would expect more values in the middle and fewer at the edge, but this is unlikely with such a small data set.

Nor does there appear to be a pattern in the third diagram. There appears to be no connection between the residuals and the explanatory variable (claims).

However, the Q-Q plot does possibly indicate some deficiency in at least one of the values. If the residuals were normally distributed, we would expect the Q-Q plot to be along the diagonal line whereas one of the values is some way from the line.

The Core Reading now considers a different set of data.
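For the claims data, the residual checks described in this section might be carried out in R along the following lines. This is only a sketch: the vectors x and y and the object names are illustrative, and the plots are the standard base R ones rather than necessarily those used to produce the diagrams in these notes.

```r
# Claims (x) and payments (y), in units of 100
x <- c(2.10, 2.40, 2.50, 3.20, 3.60, 3.80, 4.10, 4.20, 4.50, 5.00)
y <- c(2.18, 2.06, 2.54, 2.61, 3.67, 3.25, 4.02, 3.71, 4.38, 4.45)

model <- lm(y ~ x)

e <- residuals(model)        # estimated errors: observed minus fitted response
s <- summary(model)$sigma    # estimate of the error standard deviation

round(e, 3)                  # compare with the table of residuals
all(abs(e) < 3 * s)          # TRUE if no residual lies more than 3 s from 0

# Normal Q-Q plot of the residuals with a reference line
qqnorm(e)
qqline(e)

# Residuals against the explanatory variable; look for any pattern
plot(x, e, xlab = "Claim x", ylab = "Residual")
abline(h = 0, lty = 2)
```

With only ten observations these plots can give no more than a rough indication, so any apparent departure from the assumptions should be interpreted cautiously.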
Suppose the plot of the residuals against the explanatory variable was as follows:

[Figure: plot of the residuals against the explanatory variable x.]

We can see that the size of the residuals tends to increase as x increases. This suggests that the error variance is not in fact constant, but is increasing with x. (A transformation of the responses may stabilise the error variance in a situation like this.)

Typically, we would log the data in a situation like this.

2.6 Extending the scope of the linear model

In certain 'growth models' the appropriate model is that the expected response is related to the explanatory value through an exponential function, $E[Y_i|x_i] = \alpha \exp(\beta x_i)$.

In such a case the response data can be transformed using $w_i = \log y_i$ and the linear model:

$$W_i = \eta + \beta x_i + e_i \quad (\text{where } \eta = \log\alpha)$$

is then fitted to the data $(x_i, w_i)$.

The fact that the error structure is additive in this representation implies that it plays a multiplicative role in the original form of the model. If such a structure is considered invalid, different methods from those covered in this chapter would have to be used.

The concept of error structure is touching on the subject of generalised linear models, which we will study in Chapter 13.

In R we can apply a transformation at the model stage. For example:

    model <- lm(Y ~ log(X))

3 The multiple linear regression model

3.1 Introduction

Previously we examined the relationship between Y, the response (or dependent) variable and one explanatory (or independent or regressor) variable X. We now consider a model with k explanatory variables, $X_1, X_2, \ldots, X_k$.

There are many problems where one variable can quite accurately be predicted in terms of another. However, the use of additional relevant information should improve predictions. There are many different formulae used to express regression relationships between more than two variables.
Most are of the form:

$$E[Y|X_1, X_2, \ldots, X_k] = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$$

As with the simple linear regression model discussed earlier, Y is a random variable whose values are to be predicted in terms of given data values $x_1, x_2, \ldots, x_k$. $\beta_1, \beta_2, \ldots, \beta_k$ are known as the multiple regression coefficients. They are numerical constants which can be determined from observed data.

3.2 Fitting the model

As for the simple linear model, the multiple regression coefficients are usually estimated by the method of least squares.

The response variable $Y_i$ is related to the values $x_{i1}, x_{i2}, \ldots, x_{ik}$ by:

$$Y_i = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + e_i, \quad i = 1, \ldots, n$$

So the least squares estimates of the parameters $\alpha, \beta_1, \beta_2, \ldots, \beta_k$ are the values $\hat\alpha, \hat\beta_1, \hat\beta_2, \ldots, \hat\beta_k$ for which:

$$q = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left[y_i - (\alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik})\right]^2$$

is minimised.

As for the simple linear model, to find the estimates the above is differentiated partially with respect to $\alpha$ and $\beta_1, \beta_2, \ldots, \beta_k$ in turn and the results are equated to zero.

Question

A senior actuary wants to analyse the salaries of the 50 actuarial students employed by her company, using a linear model based on number of exam passes and years of experience.

Express this model and the available data in terms of the notation given here.

Solution

The basic model is:

$$E[Y|x_1, x_2] = \alpha + \beta_1 x_1 + \beta_2 x_2$$

Here $x_1$ represents the number of exam passes, $x_2$ represents the number of years' experience and Y would represent the corresponding salary.

$\alpha$, $\beta_1$ and $\beta_2$ are constants where:

    $\alpha$ reflects the average salary for a new student (with no exam passes or experience)
    $\beta_1$ and $\beta_2$ reflect the changes in pay associated with an extra exam pass and an extra year's experience, respectively.

Since the data relates to $n = 50$ students, we need to introduce an extra subscript i corresponding to the i th student.
So the actual salary for the i th student will be:

\[ Y_i = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + e_i \]

where $e_i$ is the difference between the student's actual salary and the theoretical salary for someone with the same number of exam passes and experience.

Manually solving the equations becomes complicated even with k = 2. As a result, such multiple linear regressions are usually carried out using a computer package.

So in a paper-based exam we can only really test the general principles (as in the question above) rather than the actual modelling. We'll use R to do the modelling.

The R code to fit a multiple linear model to a multivariate data frame is:

    model <- lm(Y ~ X1+X2+...+Xk)

Then the estimates of the coefficients and error standard deviation can be obtained from:

    summary(model)

Let's return now to the market equity returns data that we saw in Chapter 11. Consider a set of equity returns from four different markets across 12 time periods:

[Table: percentage returns for Mkt 1, Mkt 2, Mkt 3 and Mkt 4 in each of the 12 time periods]

The scatterplot we saw in Chapter 11 was as follows:

[Figure: scatterplot of the returns on the four markets]

Model the bottom row, Mkt_4, as the response variable (Y) with the other three markets as the explanatory variables ($X_1, X_2, X_3$).

The basic form of the model is:

\[ Y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 \]

where:

$x_i$ = return from Market i, i = 1, 2, 3
Y = return from Market 4

We can use R to estimate the parameters $\alpha, \beta_1, \beta_2, \beta_3$ for this model. For further details, see the CS1B PBOR course.

We can also estimate the error variance; when we do so we obtain the following numbers.
Modelling this (using R) gives:

\[ \hat{y} = -0.001954 + 0.211472 x_1 + 0.125051 x_2 + 0.598636 x_3 \]

with $\hat\sigma = 0.004928$ (the residual standard error, so that $\hat\sigma^2 = 0.004928^2 \approx 0.0000243$).

Given the strong positive correlation between the first and third market, we may have been able to use principal components analysis (from Chapter 11) to reduce the number of variables before fitting our multiple linear regression model.

3.3 $R^2$ in the multiple regression case

In the bivariate case (Section 1.3) we noted that the proportion of the total variation of the responses explained by a model, called the coefficient of determination, denoted $R^2$, was equal to the square of the correlation coefficient between the dependent variable Y and the single independent variable x.

In the case of multiple regression with a single dependent variable, Y, and several independent variables, $x_1, x_2, \ldots, x_k$, $R^2$ measures the proportion of the total variation in Y explained by the combination of explanatory variables in the model.

The value of $R^2$ lies between 0 and 1. It will generally increase (and cannot decrease) as the number of explanatory variables k increases. If $R^2 = 1$ the model perfectly predicts the values of Y: 100% of the variation in Y is explained by variation in $x_1, x_2, \ldots, x_k$.

Because $R^2$ cannot decrease as more explanatory variables are added to the model, if it is used alone to assess the adequacy of the model, there will always be a tendency to add more explanatory variables. However, these may increase the value of $R^2$ by a small amount, while adding to the complexity of the model. Increased complexity is generally considered to be undesirable.

We prefer to use the principle of parsimony when fitting models, which means we choose the simplest model that does the job. So we need to introduce a new measure that prevents us from adding new variables unnecessarily.

To take account of the undesirability of increased complexity, computer packages will often quote an adjusted $R^2$ statistic. This is a correction of the $R^2$ statistic which is based on the mean square errors (ie the residual mean sum of squares, $MSS_{RES}$) and takes account of the number of predictors, k, and the number of data points the model is based on.

If we have k predictors, and n observations:

\[ \text{Adjusted } R^2 = 1 - \frac{MSS_{RES}}{MSS_{TOT}} = 1 - \left( \frac{n-1}{n-k-1} \right) (1 - R^2) \]

So $MSS_{RES} / MSS_{TOT}$ gives a measure of how much variability is explained by the residuals (or errors) and takes values between 0 and 1. Hence $1 - MSS_{RES} / MSS_{TOT}$ gives a measure of how much variability is explained by the regression model. So it is a similar measure to the original coefficient of determination, $R^2$.

Recall that the mean sum of squares (MSS) is the sum of squares divided by the degrees of freedom. So $MSS_{RES} = SS_{RES} / (n-k-1)$ and $MSS_{TOT} = SS_{TOT} / (n-1)$.

The model which maximises the adjusted $R^2$ statistic can be regarded in some sense as the best model. Note, however, that the adjusted $R^2$ cannot be interpreted as the proportion of the variation in Y which is explained by variation in the $x_1, x_2, \ldots, x_k$.

The R code to obtain the regression and residual sum of squares for a linear model assigned to the object model, is:

    anova(model)

The adjusted $R^2$ is given in the output of:

    summary(model)

Question

Calculate the adjusted $R^2$ for the equity returns from four different markets, given that $R^2 = 0.9831$.

Solution

We have 12 periods of data (n = 12) and we are modelling market 4 from the other 3 markets (k = 3). Hence, we have an adjusted $R^2$ of:

\[ 1 - \left( \frac{n-1}{n-k-1} \right) (1 - R^2) = 1 - \left( \frac{12-1}{12-3-1} \right) (1 - 0.9831) = 0.9768 \]

4 The full normal model and inference

4.1 The full normal model

Again, to make inferences concerning the responses based on the fitted model, we need to specify the model further. We make the same assumptions as for the linear model:

In the full model, we now assume that the errors, $e_i$, are independent and identically distributed $N(0, \sigma^2)$ random variables. This will then allow us to obtain the distributions for the estimated coefficients and the $Y_i$s.

We can then use these to construct confidence intervals and carry out statistical inference.

The error variables $e_i$ are: (a) independent, and (b) normally distributed.

Under this full model, the $e_i$s are independent, identically distributed random variables, each with a normal distribution with mean 0 and variance $\sigma^2$. It follows that the $Y_i$s are independent, normally distributed random variables, with:

\[ E[Y_i] = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} \quad \text{and} \quad \text{var}[Y_i] = \sigma^2 \]

This mimics the bivariate linear regression model but with the mean dependent on k explanatory variables.

4.2 Testing hypotheses on individual covariates

In multiple regression the coefficients $\beta_1, \beta_2, \ldots, \beta_k$ describe the effect of each explanatory variable on the dependent variable Y after controlling for the effects of other explanatory variables in the model.

Each coefficient $\beta_j$ measures the increase in the value of the response variable y for a corresponding increase in the value of $x_j$, independent of the other covariates.

As in the bivariate case, hypotheses about the values of $\beta_1, \beta_2, \ldots, \beta_k$ can be tested, notably the hypothesis $\beta_i = 0$ which states that, after controlling for the effects of other variables, the variable $x_i$ has no linear relationship with Y.

Recall that in the bivariate case a hypothesis of $\beta = 0$ is equivalent to $\rho = 0$.

Generally speaking, it is not useful to include in a multiple regression model a covariate $x_i$ for which we cannot reject the hypothesis that $\beta_i = 0$.

In R, the statistic and p-value for the tests of $H_0: \beta_i = 0$ are given in the output of summary(model).
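As a minimal sketch of where these quantities sit in R's output, the code below fits a multiple linear model to simulated data and pulls out the coefficient table and the adjusted $R^2$. The data, the seed, the sample size and the names sim, coef_table, p_values and adj_manual are all illustrative assumptions; they are not the course's equity returns data.

```r
# Illustrative sketch only: simulated data, not the equity returns example.
set.seed(1)                        # arbitrary seed so the run is repeatable
n  <- 30
X1 <- rnorm(n)
X2 <- rnorm(n)
X3 <- rnorm(n)                     # X3 plays no part in generating Y
Y  <- 1 + 2 * X1 - 0.5 * X2 + rnorm(n, sd = 0.3)
sim <- data.frame(Y, X1, X2, X3)

model <- lm(Y ~ X1 + X2 + X3, data = sim)
fit   <- summary(model)

# Coefficient table: estimate, standard error, t statistic and p-value
coef_table <- coef(fit)
print(coef_table)

# p-values for the tests of H0: beta_i = 0 (final column of the table)
p_values <- coef_table[, "Pr(>|t|)"]

# Adjusted R^2 from the formula, alongside the value R reports
k <- 3
adj_manual <- 1 - (n - 1) / (n - k - 1) * (1 - fit$r.squared)
print(c(manual = adj_manual, reported = fit$adj.r.squared))
```

Since X3 is generated independently of Y here, its p-value would typically be large, although any single simulated sample can behave differently.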
Question

For our equity returns from four different markets, we have the following output from R:

[R output: coefficient table from summary(model)]

By considering the p-values given in the final column comment on the significance of the parameters.

Solution

A p-value of less than 0.05 (helpfully indicated by asterisks) indicates a significant result. We can see that $\beta_2$ and $\beta_3$ are significantly different from zero.

4.3 Analysis of variance (ANOVA)

In Section 3.3 we partitioned the variability into that arising from the regression model ($SS_{REG}$) and the left-over residual (or error) variability ($SS_{RES}$ or $SS_{ERR}$). We then calculated a ratio of the variances to obtain the proportion of variability that was determined (or explained) by the model (called the coefficient of determination, $R^2$). This gave us a crude measure of fit.

We can use ANOVA to test $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$ against the alternative $H_1: \beta_j \neq 0$ for at least one j. The ANOVA table is now:

    Source of variation   Degrees of Freedom   Sum of Squares   Mean Sum of Squares
    Regression            $k$                  $SS_{REG}$       $SS_{REG} / k$
    Residual              $n-k-1$              $SS_{RES}$       $SS_{RES} / (n-k-1)$
    Total                 $n-1$                $SS_{TOT}$

On statistical computer packages the regression sum of squares is often subdivided into the sum of squares from each explanatory variable.

Our statistic is now:

\[ \frac{SS_{REG} / k}{SS_{RES} / (n-k-1)} = \frac{\text{regression mean square}}{\text{residual mean square}} \]

which is $F_{k, n-k-1}$, where $H_0$ is rejected for large values of this ratio.

The test statistic is just the ratio of the values in the last column. Unlike the adjusted $R^2$, we divide the regression mean variability by the residual mean variability rather than by the total mean variability.

A large value of this ratio means that the majority of the variability is explained by the multiple linear regression model. Therefore we would reject the null hypothesis of no linear relationship. At least one of the predictors must be explaining the variability.
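It may help to see how this F statistic relates to the coefficient of determination. Since $SS_{REG} = R^2 \, SS_{TOT}$ and $SS_{RES} = (1 - R^2) \, SS_{TOT}$, the common factor $SS_{TOT}$ cancels. The numerical step below is only a rough illustration: it assumes the rounded value $R^2 = 0.9831$ with n = 12 and k = 3 from the equity returns example, so it approximates rather than reproduces the statistic a computer package would report.

\[ F = \frac{SS_{REG} / k}{SS_{RES} / (n-k-1)} = \frac{R^2 / k}{(1 - R^2) / (n-k-1)} \approx \frac{0.9831 / 3}{0.0169 / 8} \approx 155 \]

A value of this size lies far above the upper 1% point of the $F_{3,8}$ distribution (about 7.59), so $H_0$ would be rejected.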
In R, the F statistic and its p-value for a regression model are given in the output of both anova(model) and summary(model).

Question

For our equity returns from four different markets, where we model Mkt_4 using all of the other markets we have the following output:

[R output: F statistic and p-value for the model]

Explain this result.

Solution

So we can reject $H_0: \beta_1 = \beta_2 = \beta_3 = 0$, since the p-value is extremely small. So at least one of the coefficients is non-zero, ie there is some relationship with at least one of the covariates.

4.4 Estimating a mean response and predicting an individual response

The whole point of the modelling exercise is so that we can estimate values of the response variable Y given the input variables $x_1, x_2, \ldots, x_k$.

Mean response

As with the linear regression model we can estimate the expected (mean) response, $\mu_0$, for a multiple linear regression model given a vector of explanatory variables, $\mathbf{x}_0$.

\[ \mu_0 = E[Y \mid \mathbf{x}_0] = \alpha + \beta_1 x_{01} + \beta_2 x_{02} + \cdots + \beta_k x_{0k} \]

is estimated by:

\[ \hat\mu_0 = \hat\alpha + \hat\beta_1 x_{01} + \hat\beta_2 x_{02} + \cdots + \hat\beta_k x_{0k} \]

which is an unbiased estimator.

Recall that the multivariate linear regression model assumes that the $Y_i$s are independent, normally distributed random variables, with $E[Y_i] = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik}$. We have used this expected value to obtain an estimated mean response corresponding to the vector $\mathbf{x}_0$.

We are using vector notation here: $\mathbf{x}_0 = (x_{01}, x_{02}, \ldots, x_{0k})$.

Individual response

Similarly, we could predict an individual response $y_0$ (for $\mathbf{x} = \mathbf{x}_0$) using the same estimate $\hat{y}_0 = \hat\alpha + \hat\beta_1 x_{01} + \hat\beta_2 x_{02} + \cdots + \hat\beta_k x_{0k}$, but with an extra $\sigma^2$ in the expression for the variance of the estimator compared to the mean response.

Recall that for an individual response value we have $y_i = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + e_i$. Each individual response value is associated with an error term from the regression line.
Since $e_i \sim N(0, \sigma^2)$, an individual point is on the regression line on average, hence we have the same estimate $\hat\alpha + \hat\beta_1 x_{01} + \hat\beta_2 x_{02} + \cdots + \hat\beta_k x_{0k}$ as for the mean response. However, there is an additional $\sigma^2$ for the variance.

Question

For the equity returns from four different markets, we had the following model:

\[ \hat{y} = -0.001954 + 0.211472 x_1 + 0.125051 x_2 + 0.598636 x_3 \]

where Market 4 is the response variable (Y) and the other three markets are the explanatory variables ($X_1, X_2, X_3$).

Use this model to construct an estimate for the return on Market 4 when the returns on Market 1, Market 2 and Market 3 are 8%, 4% and -1%, respectively.

Solution

Substituting these values into our equation gives:

\[ \hat{y} = -0.001954 + 0.211472 \times 0.08 + 0.125051 \times 0.04 + 0.598636 \times (-0.01) = 0.0140 \]

ie 1.40%.

We will use R to calculate confidence intervals for the mean and individual responses. These are beyond the scope of the written exam.

For the equity returns from four different markets, we can obtain 95% confidence intervals for the mean and individual response when the returns on Market 1, Market 2 and Market 3 are 8%, 4% and -1%, respectively, using R as follows:

    newdata <- data.frame(Mkt_1=0.08, Mkt_2=0.04, Mkt_3=-0.01)
    predict(model,newdata,interval="confidence",level=0.95)
    predict(model,newdata,interval="predict",level=0.95)

These give (-1.95%, 4.74%) and (-2.14%, 4.93%), respectively, to 3 SF. Both intervals are centred on the point estimate of 1.40%, and the interval for the individual response is the wider of the two.

4.5 Checking the model

As we did for the linear regression model, we can also calculate the residuals from the fit at each $x_i$, which is the estimated error, $\hat{e}_i = y_i - \hat{y}_i$. We can then examine them to see if they are normally distributed and also independent of the explanatory variables.

Question

For the equity returns from four different markets, we had the following model:

\[ \hat{y} = -0.001954 + 0.211472 x_1 + 0.125051 x_2 + 0.598636 x_3 \]

where Market 4 is the response variable (Y) and the other three markets are the explanatory variables ($X_1, X_2, X_3$).
The equity returns from four different markets for the first time period were:

    Mkt 1     Mkt 2     Mkt 3     Mkt 4
    0.83%     4.27%     -1.79%    -0.39%

Calculate the residual for this first time period.

Solution

Substituting the values of markets 1 to 3 during the first time period into our equation gives:

\[ \hat{y} = -0.001954 + 0.211472 \times 0.0083 + 0.125051 \times 0.0427 + 0.598636 \times (-0.0179) = -0.0056 \]

So the residual is:

\[ -0.0039 - (-0.0056) = 0.0017 \]

A Q-Q plot can be used to test whether the residuals are normally distributed. A plot of the residuals against the fitted values can be used to determine if the variance is constant and whether they are independent of the explanatory variables.

Question

For our equity returns from four different markets, the Q-Q plot of the residuals and the plot of the residuals against the fitted values are given here.

[Figure: Q-Q plot of the residuals]

[Figure: plot of the residuals against the fitted values]

Comment on these results.

Solution

The Q-Q plot suggests that one of the data points does not fit the assumption of normality. The plot of residuals against fitted values possibly suggests greater variance for the lower fitted values. This may indicate dependency and thus imply the model has some deficiencies.

4.6 The process of selecting explanatory variables

Selecting the optimal set of explanatory variables is not always easy. Two general approaches can be outlined:

(1) Forward selection. Start with the single covariate that is most closely related to the dependent variable Y. Add that to the model. Then search among the remaining covariates to find the one which improves the adjusted $R^2$ the most when added to the model. Continue adding covariates until adding any more causes the adjusted $R^2$ to fall.
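The forward selection steps can be sketched in R along the following lines. This is only an illustration under stated assumptions: forward_select is a hypothetical helper written for this sketch (it is not a built-in R function), and the data frame sim is simulated rather than taken from the equity returns example. For the first covariate, picking the highest adjusted $R^2$ is the same as picking the covariate most strongly correlated with Y, since every one-covariate model has the same number of predictors.

```r
# Hypothetical helper: forward selection driven by the adjusted R^2.
forward_select <- function(data, response) {
  remaining <- setdiff(names(data), response)
  chosen    <- character(0)
  best_adj  <- -Inf
  while (length(remaining) > 0) {
    # Adjusted R^2 obtained by adding each remaining covariate in turn
    adj <- sapply(remaining, function(v) {
      f <- reformulate(c(chosen, v), response = response)
      summary(lm(f, data = data))$adj.r.squared
    })
    if (max(adj) <= best_adj) break   # no candidate improves the fit: stop
    best_adj  <- max(adj)
    chosen    <- c(chosen, names(which.max(adj)))
    remaining <- setdiff(remaining, chosen)
  }
  list(covariates = chosen, adj_r_squared = best_adj)
}

# Simulated illustration: Y depends on X1 and X2 but not on X3
set.seed(2)                        # arbitrary seed so the run is repeatable
n  <- 40
X1 <- rnorm(n)
X2 <- rnorm(n)
X3 <- rnorm(n)
Y  <- 3 * X1 + X2 + rnorm(n, sd = 0.5)
sim <- data.frame(Y, X1, X2, X3)

result <- forward_select(sim, "Y")
print(result)
```

The stopping rule here uses the adjusted $R^2$ alone, so a covariate can be retained even when its coefficient is not significantly different from zero; checking the p-values in summary() of the chosen model would be a sensible follow-up.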
In Chapter 11, we saw that the Pearson correlation coefficient matrix for the returns on the four equity markets was:

              Mkt_1     Mkt_2     Mkt_3     Mkt_4
    Mkt_1 1.0000000 0.6508163 0.9538019 0.9727972
    Mkt_2 0.6508163 1.0000000 0.5321185 0.6893932
    Mkt_3 0.9538019 0.5321185 1.0000000 0.9681911
    Mkt_4 0.9727972 0.6893932 0.9681911 1.0000000

Let's use a forward selection approach to creating a multiple linear regression model. Mkt_4 is the response variable (Y) and the other three markets are the explanatory variables $X_1, X_2, X_3$.

First covariate

We start with Mkt_1 as that has the highest correlation with Mkt_4. Using R we get the model $\hat{y} = -0.000140 + 0.873309 x_1$, which has an adjusted $R^2$ of 0.941.

Second covariate

Using R, adding Mkt_2 gives the model $\hat{y} = -0.000063 + 0.816265 x_1 + 0.062317 x_2$, which has an adjusted $R^2$ of 0.9411.

Whereas adding Mkt_3 gives the model $\hat{y} = -0.001692 + 0.490675 x_1 + 0.418474 x_3$, which has an adjusted $R^2$ of 0.9564.

Since adding the covariate Mkt_3 improves the adjusted $R^2$ the most, we would go for this model.

Third covariate

Now we have a model with both Mkt_1 and Mkt_3 as covariates, we will see if adding Mkt_2 produces an improvement.

Using R, adding Mkt_2 gives the model $\hat{y} = -0.001954 + 0.211472 x_1 + 0.125051 x_2 + 0.598636 x_3$, which has an adjusted $R^2$ of 0.9768.

However, whilst this maximises the adjusted $R^2$, one of the coefficients of this model is not significantly different from zero.

(2) Backward selection. Start by adding all available covariates. Then remove covariates one by one for which the hypothesis that $\beta_i = 0$ cannot be rejected until the adjusted $R^2$ value reaches a maximum, and all the remaining covariates have a statistically significant impact on Y.

For the equity returns from four different markets, using R, the model with all covariates added is $\hat{y} = -0.001954 + 0.211472 x_1 + 0.125051 x_2 + 0.598636 x_3$, which has an adjusted $R^2$ of 0.9768.
We can see that the coefficient for Mkt_1 is not significantly different from zero. Removing this coefficient using R gives the model $\hat{y} = -0.002598 + 0.155102 x_2 + 0.785578 x_3$, which has an adjusted $R^2$ of 0.97652.

So removing this covariate causes the adjusted $R^2$ to decrease, not increase. So we'd probably keep it.

The problem is the high correlation between Mkt_1 and Mkt_3, meaning that there is some overlap between them in descriptive ability. Ideally we would use Principal Components Analysis from Chapter 11 to reduce the number of covariates by removing this overlap.

4.7 Extending the scope of the multiple linear model

Interaction between terms

So far we have only considered each variable $X_j$ as a main effect, that is where we incorporate each new variable via an additive term, $\beta_j X_j$. This means that a unit increase in $X_j$ will increase the average response by $\beta_j$ regardless of the other variables.

So in our multiple linear regression model with Mkt_4 as the response variable (Y), the three other markets ($X_1, X_2, X_3$) are included as main effects only:

\[ \hat{y} = -0.001954 + 0.211472 x_1 + 0.125051 x_2 + 0.598636 x_3 \]

So an increase in, say, Mkt_1 by 1% would lead to an increase in Mkt_4 of $0.211472 \times 0.01$.

However, it is often the case that the effect of one predictor variable, say $X_1$, on the response variable, Y, depends on the value of another predictor variable, say $X_2$. This is called interaction.

That is, we observe an additional effect when both predictors are present.

We model this by including an interaction term, denoted $X_1.X_2$ on paper, which corresponds to the term $\gamma_{12} X_1 X_2$ in the regression function.

The regression function for the two variables, $X_1$ and $X_2$, as main effects and their interaction is:

\[ Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \gamma_{12} X_1 X_2 \]

Note that when an interaction term is used in a model, both main effects must also be included.
R uses a colon to denote interaction, hence the code to fit the multiple linear model above is:

    model <- lm(Y ~ X1+X2+X1:X2)

The shorthand notation for the main effects and the interaction between them is denoted X1*X2, which corresponds to the whole model above. So the equivalent way of specifying the above model in R is:

    model <- lm(Y ~ X1*X2)

Interaction effects are described in greater detail in the Generalised Linear Models chapter.

Polynomial regression

Finally, the term linear in linear regression means that the regression function is linear in the coefficients $\alpha$ and $\beta_1, \beta_2, \ldots, \beta_k$ rather than linear in terms of the $X_j$s. For example, we could fit a quadratic model $Y = \alpha + \beta_1 X + \beta_2 X^2$.

Although this is a bivariate model (having only two measured variables X and Y) we model it as a multiple linear regression model, treating X and $X^2$ as different variables.

We use the I( ) function in R to treat a term as a different variable. So the R code to fit this quadratic model is:

    model <- lm(Y ~ X+I(X^2))

Chapter 12 Summary

A regression model, such as the simple linear regression model, can be used to model the response when an explanatory variable operates at a given level, or to model bivariate data points.

Simple linear regression

The linear regression model is given by:

\[ Y_i = \alpha + \beta x_i + e_i \quad \text{where } e_i \sim N(0, \sigma^2) \]

The parameters $\alpha$, $\beta$ and $\sigma^2$ can be estimated using the formulae:

\[ \hat\beta = \frac{S_{xy}}{S_{xx}} \qquad \hat\alpha = \bar{y} - \hat\beta \bar{x} \qquad \hat\sigma^2 = \frac{1}{n-2} \sum (y_i - \hat{y}_i)^2 = \frac{1}{n-2} \left( S_{yy} - \frac{S_{xy}^2}{S_{xx}} \right) \]

These are given on page 24 of the Tables.

Confidence intervals can be obtained for $\beta$ using the result:

\[ \frac{\hat\beta - \beta}{\sqrt{\hat\sigma^2 / S_{xx}}} \sim t_{n-2} \]

Prediction intervals for a mean response $\mu_0$ or an individual response $y_0$ can be obtained using the results:

\[ \frac{\hat\mu_0 - \mu_0}{\sqrt{\left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right) \hat\sigma^2}} \sim t_{n-2} \quad \text{and} \quad \frac{\hat{y}_0 - y_0}{\sqrt{\left( 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right) \hat\sigma^2}} \sim t_{n-2} \]

These are also in the Tables.

The fit of the linear regression model can be analysed by partitioning the total variance, $SS_{TOT}$, into that which is explained by the model, $SS_{REG}$, and that which is not, $SS_{RES}$. The formulae for these are as follows:

\[ SS_{TOT} = \sum (y_i - \bar{y})^2 = S_{yy} \]

\[ SS_{REG} = \sum (\hat{y}_i - \bar{y})^2 = \frac{S_{xy}^2}{S_{xx}} \]

\[ SS_{RES} = \sum (y_i - \hat{y}_i)^2 = S_{yy} - \frac{S_{xy}^2}{S_{xx}} \]

\[ SS_{TOT} = SS_{REG} + SS_{RES} \]

The coefficient of determination, $R^2$, gives the percentage of this variance which is explained by the model:

\[ R^2 = \frac{SS_{REG}}{SS_{TOT}} = \frac{S_{xy}^2}{S_{xx} S_{yy}} \]

Examining the residuals, $\hat{e}_i = y_i - \hat{y}_i$, we would expect them to be normally distributed about zero and to have no relationship with the x values. Both of these features can be examined using diagrams.

Multiple linear regression

The linear multiple regression model is given by:

\[ Y_i = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + e_i \quad \text{where } e_i \sim N(0, \sigma^2) \]

The parameters $\alpha, \beta_1, \ldots, \beta_k$ and $\sigma^2$ can be estimated using a computer package.

Confidence intervals and tests can be carried out for $\beta_1, \ldots, \beta_k$.

ANOVA can be used to test $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$ against the alternative $H_1: \beta_j \neq 0$ for at least one j:

\[ F = \frac{MSS_{REG}}{MSS_{RES}} = \frac{SS_{REG} / k}{SS_{RES} / (n-k-1)} \sim F_{k, n-k-1} \]

We can partition the total variance, $SS_{TOT}$, into that which is explained by the model, $SS_{REG}$, and that which is not, $SS_{RES}$. The coefficient of determination, $R^2$, gives the percentage of this variance which is explained by the combination of explanatory variables in the model.

However, since $R^2$ cannot decrease as more explanatory variables are added to the model, if it is used alone to assess the adequacy of the model, there will always be a tendency to add more explanatory variables, which is undesirable.
Hence, computer packages quote an adjusted $R^2$ statistic which is based on the mean square errors and takes account of the number of predictors, k, and the number of data points the model is based on.

\[ \text{Adjusted } R^2 = 1 - \frac{MSS_{RES}}{MSS_{TOT}} = 1 - \left( \frac{n-1}{n-k-1} \right) (1 - R^2) \]

If the model is a good fit then we would expect the residuals, $\hat{e}_i = y_i - \hat{y}_i$, to be normally distributed about zero, have constant variance and no relationship with the x values. These can be examined using diagrams.

We can use one of the following approaches to select an optimal set of explanatory variables:

(1) Forward selection. Start with the single covariate that is most closely related to the dependent variable Y. Add that to the model. Then search among the remaining covariates to find the one which improves the adjusted $R^2$ the most when added to the model. Continue adding covariates until adding any more causes the adjusted $R^2$ to fall.

(2) Backward selection. Start by adding all available covariates. Then remove covariates one by one for which the hypothesis that $\beta_i = 0$ cannot be rejected until the adjusted $R^2$ value reaches a maximum, and all the remaining covariates have a statistically significant impact on Y.

Interactive terms of the form $\gamma_{ab} x_{ia} x_{ib}$ should be added where the effect of one predictor on the response depends on the value of another predictor.

Chapter 12 Practice Questions

12.1 A new computerised ultrasound scanning technique has enabled doctors to monitor the weights of unborn babies. The table below shows the estimated weights for one particular baby at fortnightly intervals during the pregnancy.
Gestation period (weeks) 30 32 34 36 38 40 Estimated baby weight(kg) 1.6 1.7 2.5 2.8 3.2 3.5 ??xx 21 (i) 7,420 ?y =?15.3 y220=42.03 ?xy=549.8 Show that: (a) (b) (c) (ii) (iii) == 70,SSyy == 3.015 and Sxy=14.3. xx the fitted regressionline is s 2 = =- 4.60 + 0.2043yx. 0.0234. Calculatethe babys expected weight at 42 weeks(assuming it hasnt been born by then). (a) Calculate the residual sum of squares and the regression sum of squares for these data. (b) (iv) Calculate the coefficient Carryout a test of (v) 0:0H = of determination, R2, and comment on its value. vs 1:0H > , assumingalinear modelis appropriate. Construct an ANOVA table for the sum of squares from part (iii)(a) and carry out an F-test stating the conclusion clearly. (vi) (a) Estimate the mean weight of a baby at 33 weeks. (b) Calculate the variance of this meanpredicted response. (c) Hence, calculate a 90%confidence interval for the mean weight of a baby at 33 weeks. (vii) (a) Estimate the actual weight of an individual (b) Calculate the variance ofthis individual predicted response. (c) Hence, calculate a 90%confidence interval for the weight of anindividual baby at 33 weeks. The Actuarial Education Company baby at 33 weeks. IFE: 2022 Examination Page 46 CS1-12: Linear regression The table below shows some of the residuals: (viii) Gestation period (weeks) 30 Residual 0.07 (a) 32 34 36 38 40 0.05 0.04 - 0.07 Calculatethe missingresiduals. Two plots of the residuals are asfollows: (b) Comment on the first dotplot ofthe residuals . (c) Comment on the fit of the model using the plot the residuals against the x values. (d) Comment on the Q-Qplot of the residuals given below: IFE: 2022 Examinations The Actuarial Education Compan CS1-12: Linear regression 12.2 Page 47 An analysis using the simple linear regression xx s==12.2 (i) (ii) Calculate (b) Test whether (a) Calculate r . (b) Test whether style is significantly ? different from zero. is significantly different from zero. 
( yi y) = of determination Explain how to transform the following +iie=+ ya bx 2 analysis are found to be: SSRES ??(yi =- yi)22 6.4 Calculate the coefficient and explain = SSTOT 3.6 ?(y y)2 = 10.0 i =- what this represents. models to linear form: i =yaiebxi (ii) Exam . The sums of the squares of the errors in a regression (i) 12.5 = 8.1 xy Comment on the results of the tests in parts (i) and (ii). SS REG=- 12.4 10.6 (a) (iii) 12.3 ssyy model based on 19 data points gave: A university wishesto analysethe performance ofits students on a particular degree course. It records the scores obtained by a sample of 12 students at entry to the course, and the scores obtained in their final examinations Student by the same students. The results are as follows: A B C D E F G H I J K L Entrance exam score x(%) 86 53 71 60 62 79 66 84 90 55 58 72 Finals paper score y (%) 75 60 74 68 70 75 78 90 85 60 62 70 x== 836 ?? (i) ?867xy 22 =63,603 =60,016 ?y (x?- x)(y- y)= 1,122 Calculatethe fitted linear regression equation of y on x. [3] Now assume that the full normal model holds. (ii) (a) Calculate an estimate of the error variance2s . (b) Hence, obtain a 90%confidence interval for 2s . [3] (iii) Test whether the data are positively correlated by considering the slope parameter. (iv) Calculate a 95% confidence interval for the meanfinals paper score corresponding to an individual The Actuarial Education entrance score of 53. Company [3] [3] IFE: 2022 Examination Page 48 (v) 12.6 CS1-12: Linear regression (a) Calculate the proportion of variation explained by the (b) Hence,comment on the fit of the model. model. [2] [Total 14] The share price,in pence, of a certain company is monitored over an 8-year period. The results are shown in the table below: Exam style Time (years) Price ( 0 1 2 3 4 5 6 7 8 100 131 183 247 330 454 601 819 1,095 xx ?? 
(y 60-= ?(xx i - y)(i - y) 22)= 925,262ii - y) = 7,087 Anactuary fits the following simple linear regression modelto the data: yx ii +ie =+ a where (i) i = 0,1, ?,8 {}ieareindependent normal random variables with meanzero and variance Determine the fitted regression line in s 2 . which the price is modelled as the response and the time as an explanatory variable. (ii) Calculate a 99%confidence interval for: (a) (b) (iii) [2] , the true underlying slope parameter s 2 , the true underlying error variance. [5] (a) State the total sum of squares and calculate its partition into the regression sum of squares and the residual sum of squares. (b) Calculatethe proportion of variability explained by the model usingthe valuesin part (iii)(a) to (c) (iv) Comment on the result in part (iii)(b). [5] The actuary decidesto check the fit ofthe modelby calculating the residuals. (a) Complete the table of residuals (rounding to the nearestinteger): Time (years) Residual IFE: 2022 Examinations 0 132 1 2 3 - 21 - 75 4 5 6 7 - 104 - 75 25 The Actuarial Education 8 Compan CS1-12: Linear regression A dotplot Page 49 of the residuals is shown below: -150 -100 (b) -50 Comment 0 on the assumption 50 100 of normality 150 200 using the dotplot. A plot of the residuals against time is given below: 200 150 100 50 0 Residua -50 -100 -150 012345678 Time (c) Comment on the appropriateness ofthe linear the residuals against time. model by referring to the plot of [5] [Total 17] The Actuarial Education Company IFE: 2022 Examinations Page 50 12.7 CS1-12: Linear regression Aschoolteacher is investigating the claim that class size does not affect GCSEresults. His observations of nine GCSE classes are asfollows: Exam style Class X1 X2 X3 X4 Y1 Y2 Y3 Y4 Y5 Students in class( c) 35 32 27 21 34 30 28 24 7 Average GCSEpoint score for class ( p ) 5.9 4.1 2.4 1.7 6.3 5.3 3.5 2.6 1.6 238 (i) ??cc == ?p =?33.4 p =149.62 ?cp=983 226,884 Determinethefitted regression linefor p on c. 
[3]

Class X5 was not included in the results above and contains 15 students.

(ii) (a) Calculate an estimate of the average GCSE point score for this individual class.
(b) Calculate the standard error for the estimate in part (ii)(a) assuming the full normal model. [4]
[Total 7]

12.8 (Exam style)
An actuary is fitting the following linear regression model through the origin:

$$Y_i = \beta x_i + e_i, \qquad e_i \sim N(0, \sigma^2), \qquad i = 1, 2, \ldots, n$$

(i) Show that the least squares estimator of $\beta$ is given by:

$$\hat{\beta} = \frac{\sum x_i Y_i}{\sum x_i^2}$$ [3]

(ii) Derive the bias and mean square error of $\hat{\beta}$ under this model. [4]
[Total 7]

12.9 (Exam style)
A life assurance company is examining the force of mortality, $\mu_x$, of a particular group of policyholders. It is thought that it is related to the age, $x$, of the policyholders by the formula:

$$\mu_x = Bc^x$$

It is decided to analyse this assumption by using the linear regression model:

$$Y_i = \alpha + \beta x_i + e_i$$

where $e_i \sim N(0, \sigma^2)$ are independently distributed.

The summary results for eight ages were as follows:

| Age, $x$ | 30 | 32 | 34 | 36 | 38 | 40 | 42 | 44 |
|---|---|---|---|---|---|---|---|---|
| Force of mortality, $\mu_x$ ($\times 10^{-4}$) | 5.84 | 6.10 | 6.48 | 7.05 | 7.87 | 9.03 | 10.56 | 12.66 |
| $\ln \mu_x$ (3 sf) | -7.45 | -7.40 | -7.34 | -7.26 | -7.15 | -7.01 | -6.85 | -6.67 |

$\sum x_i = 296$, $\sum x_i^2 = 11{,}120$, $\sum \ln \mu_{x_i} = -57.129$, $\sum (\ln \mu_{x_i})^2 = 408.50$, $\sum x_i \ln \mu_{x_i} = -2{,}104.5$

(i) (a) Apply a transformation to the original formula, $\mu_x = Bc^x$, to make it suitable for analysis by linear regression.
(b) Write down expressions for $Y$, $\alpha$ and $\beta$ in terms of $\mu_x$, $B$ and $c$ using the transformation given in part (i)(a). [2]

The graph of $\ln \mu_x$ against the age of the policyholder, $x$, is shown below:

[Graph of $\ln \mu_x$ (vertical axis, -8 to -6) against age $x$ (horizontal axis, 25 to 45)]

(ii) Comment on the suitability of the regression model and state how this supports the transformation in part (i)(a). [1]

(iii) Use the data to calculate least squares estimates of $B$ and $c$ in the original formula. [3]

(iv) (a) Calculate the coefficient of determination between $\ln \mu_x$ and $x$.
(b) Hence comment on the fit of the model to the data.
(c) Complete the table of residuals below.
(d) Comment on the fit by considering the residuals. [5]

| Age, $x$ | 30 | 32 | 34 | 36 | 38 | 40 | 42 | 44 |
|---|---|---|---|---|---|---|---|---|
| Residual, $\hat{e}_i$ | 0.08 | | -0.03 | | -0.06 | | 0.02 | 0.09 |

(v) (a) Calculate a 95% confidence interval for the mean predicted response $\ln \mu_{35}$.
(b) Hence obtain a 95% confidence interval for the mean predicted value of $\mu_{35}$. [4]
[Total 16]

12.10 (Exam style)
The government of a country suffering from hyperinflation has sponsored an economist to monitor the price of a "basket" of items in the population's staple diet over a one-year period. As part of his study, the economist selected six days during the year and on each of these days visited a single nightclub, where he recorded the price of a pint of lager. His report showed the following prices:

| Day ($i$) | 8 | 29 | 57 | 92 | 141 | 148 |
|---|---|---|---|---|---|---|
| Price ($P_i$) | 15 | 17 | 22 | 51 | 88 | 95 |
| $\ln P_i$ | 2.7081 | 2.8332 | 3.0910 | 3.9318 | 4.4773 | 4.5539 |

$\sum i = 475$, $\sum i^2 = 54{,}403$, $\sum \ln P_i = 21.5953$, $\sum (\ln P_i)^2 = 81.1584$, $\sum i \ln P_i = 1{,}947.020$

The economist believes that the price of a pint of lager in a given bar on day $i$ can be modelled by:

$$\ln P_i = a + bi + e_i$$

where $a$ and $b$ are constants and the $e_i$'s are uncorrelated $N(0, \sigma^2)$ random variables.

(i) Estimate the values of $a$, $b$ and $\sigma^2$. [5]
(ii) Calculate the linear correlation coefficient $r$. [1]
(iii) Calculate a 99% confidence interval for $b$. [2]
(iv) Determine a 95% confidence interval for the average price of a pint of lager on day 365:
(a) in the country as a whole
(b) in a randomly selected bar. [7]
[Total 15]

12.11 (Exam style)
(i) Show that the maximum likelihood estimates (MLEs) of $\alpha$ and $\beta$ in the simple linear regression model are identical to the least squares estimates.
[5]
(ii) Show that the MLE of $\sigma^2$ has a different denominator from the least squares estimate. [4]
[Total 9]

12.12
The effectiveness of a tablet containing $x_1$ mg of drug 1 and $x_2$ mg of drug 2 is being tested. In trials the following results are obtained:

| % effectiveness, $y$ | $x_1$ | $x_2$ |
|---|---|---|
| 92.5 | 50.9 | 20.8 |
| 94.9 | 54.1 | 16.9 |
| 89.3 | 47.3 | 25.2 |
| 94.1 | 45.1 | 49.7 |
| 98.9 | 37.6 | 95.2 |

$\sum y = 469.7$, $\sum x_1 = 235$, $\sum x_2 = 207.8$, $\sum x_1^2 = 11{,}202.68$, $\sum x_2^2 = 12{,}886.42$, $\sum x_1 x_2 = 8{,}985.96$, $\sum y x_1 = 22{,}028.78$, $\sum y x_2 = 19{,}870.22$

(i) Using the multiple linear least squares regression model:

$$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + e$$

(a) Show that the least squares estimates $\hat{\alpha}$, $\hat{\beta}_1$ and $\hat{\beta}_2$ satisfy:

$$\sum y_i = n\hat{\alpha} + \hat{\beta}_1 \sum x_{i1} + \hat{\beta}_2 \sum x_{i2}$$
$$\sum y_i x_{i1} = \hat{\alpha} \sum x_{i1} + \hat{\beta}_1 \sum x_{i1}^2 + \hat{\beta}_2 \sum x_{i1} x_{i2}$$
$$\sum y_i x_{i2} = \hat{\alpha} \sum x_{i2} + \hat{\beta}_1 \sum x_{i1} x_{i2} + \hat{\beta}_2 \sum x_{i2}^2$$

(b) Hence, using the above data, show that the fitted model is:

$$\hat{y} = 25.31 + 1.194 x_1 + 0.3015 x_2$$ [7]

(ii) Comment on the significance of the parameters by considering the following output from R for this model. [2]

(iii) The coefficient of determination for the fitted model is $R^2 = 0.9992$. Calculate the adjusted $R^2$. [2]

The ANOVA table for the model is:

| Source of variation | Degrees of Freedom | Sum of Squares | Mean Sum of Squares |
|---|---|---|---|
| Regression | 2 | 49.1137 | * |
| Residual | 2 | 0.0383 | * |
| Total | 4 | 49.152 | |

(iv) Calculate the missing values, the $F$ statistic and then carry out the $F$ test, stating the conclusion clearly. [4]

(v) Calculate the percentage effectiveness for a tablet containing 51.3 mg of drug 1 and 18.3 mg of drug 2. [2]

The plot of the residuals against the fitted values and the Q-Q plot of the residuals are given below.

(vi) Comment on the fit of the model, making reference to the plots given above. [2]

It is thought that the two drugs might have an interactive effect.

(vii) (a) Explain what this means.
(b) Write down the formula for the regression model that has the two drugs as main effects and also their interaction.
The model in part (vii)(b) has an adjusted $R^2$ of 0.9969.

(c) Comment on whether the new model is an improvement. [4]
[Total 23]

Chapter 12 Solutions

12.1 (i)(a) Calculate sums

$$S_{xx} = \sum x^2 - \frac{1}{n}\left(\sum x\right)^2 = 7{,}420 - \frac{210^2}{6} = 70$$
$$S_{yy} = \sum y^2 - \frac{1}{n}\left(\sum y\right)^2 = 42.03 - \frac{15.3^2}{6} = 3.015$$
$$S_{xy} = \sum xy - \frac{1}{n}\left(\sum x\right)\left(\sum y\right) = 549.8 - \frac{210 \times 15.3}{6} = 14.3$$

(i)(b) Fitted regression line

Using the values from part (i)(a):

$$\hat{\beta} = \frac{S_{xy}}{S_{xx}} = \frac{14.3}{70} = 0.2043$$

The mean values are:

$$\bar{x} = \frac{1}{n}\sum x = \frac{210}{6} = 35 \quad \text{and} \quad \bar{y} = \frac{1}{n}\sum y = \frac{15.3}{6} = 2.55$$

So:

$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = 2.55 - 0.2043 \times 35 = -4.60$$

Hence the fitted regression line is $\hat{y} = -4.60 + 0.2043x$.

(i)(c) Error variance

$$\hat{\sigma}^2 = \frac{1}{n-2}\left(S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right) = \frac{1}{4}\left(3.015 - \frac{14.3^2}{70}\right) = 0.0234$$

(ii) Estimated weight at 42 weeks

Using the regression line from part (i)(b):

$$\hat{y} = \hat{\alpha} + \hat{\beta}x = -4.60 + 0.2043 \times 42 = 3.98 \text{ kg}$$

(iii)(a) Partition of the variability

For the baby weights, $S_{xx} = 70$, $S_{yy} = 3.015$ and $S_{xy} = 14.3$. So:

$$SS_{TOT} = S_{yy} = 3.015$$
$$SS_{REG} = \frac{S_{xy}^2}{S_{xx}} = \frac{14.3^2}{70} = 2.921$$
$$SS_{RES} = SS_{TOT} - SS_{REG} = 3.015 - 2.921 = 0.094$$

(iii)(b) Coefficient of determination

$$R^2 = \frac{SS_{REG}}{SS_{TOT}} = \frac{2.921}{3.015} = 0.969 \text{ or } 96.9\%$$

So we see that, in this case, most of the variation is explained by the model. The model is an excellent fit to the data.

(iv) Test for $\beta > 0$

If $H_0$ is true, then the test statistic $\dfrac{\hat{\beta} - 0}{\sqrt{\hat{\sigma}^2 / S_{xx}}}$ follows the $t_4$ distribution.

The observed value of this statistic is $\dfrac{0.2043 - 0}{\sqrt{0.0234/70}} = 11.2$, which is much greater than 8.610, the upper 0.05% point of the $t_4$ distribution.

So, we reject $H_0$ at the 0.05% level and conclude that there is extremely strong evidence that $\beta > 0$, ie that the baby's weight is increasing over time.

(v) ANOVA

The ANOVA table is:

| Source of variation | df | SS | MSS |
|---|---|---|---|
| Regression | 1 | 2.921 | 2.921 |
| Residual | 4 | 0.0937 | 0.0234 |
| Total | 5 | 3.015 | |

Under $H_0: \beta = 0$ we have $F = \dfrac{2.921}{0.0234} = 124.7$ on (1, 4) degrees of freedom.

The p-value of $F = 124.7$ is much less than even 0.01, so $H_0$ is rejected at the 1% level. Therefore it is reasonable to assume that $\beta \neq 0$, ie there is a linear relationship.

(vi)(a) Estimate of mean response

Using the least squares regression line of $\hat{y} = -4.60 + 0.2043x$, when $x_0 = 33$ we have:

$$\hat{\mu}_0 = -4.60 + 0.2043 \times 33 = 2.141$$

ie the mean weight of a baby at 33 weeks is expected to be 2.141 kg.

(vi)(b) Variance of mean response

The variance of this estimator is calculated as:

$$\text{var}(\hat{\mu}_0) = \left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]\hat{\sigma}^2 = \left[\frac{1}{6} + \frac{(33 - 35)^2}{70}\right] \times 0.0234 = 0.00524$$

(vi)(b) Confidence interval

The 90% confidence interval will be:

$$\hat{\mu}_0 \pm t_{0.05,4} \times \text{s.e.}(\hat{\mu}_0) = 2.141 \pm 2.132\sqrt{0.00524} = (1.99, 2.30)$$

(vii)(a) Estimate of individual response

The individual predicted response is also $\hat{y}_0 = 2.141$ kg.

(vii)(b) Variance of individual response

The variance of this estimator is calculated as:

$$\text{var}(\hat{y}_0) = \left[1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]\hat{\sigma}^2 = \left[1 + \frac{1}{6} + \frac{(33 - 35)^2}{70}\right] \times 0.0234 = 0.0287$$

(vii)(b) Confidence interval

The 90% confidence interval will be:

$$\hat{y}_0 \pm t_{0.05,4} \times \text{s.e.}(\hat{y}_0) = 2.141 \pm 2.132\sqrt{0.0287} = (1.78, 2.50)$$

(viii)(a) Residuals

The completed table is:

| Gestation period (weeks) | 30 | 32 | 34 | 36 | 38 | 40 |
|---|---|---|---|---|---|---|
| Residual | 0.07 | -0.24 | 0.15 | 0.05 | 0.04 | -0.07 |

(viii)(b) Comment on the dotplot

All values are between $\pm 3\hat{\sigma} = \pm 3\sqrt{0.0234} = \pm 0.46$ so there appear to be no outliers. There may be possible skewness but it's difficult to tell with such a small dataset.

(viii)(c) Comment on the plot of residuals against explanatory variable

The plot appears to be patternless, which implies a good fit.

(viii)(d) Interpret Q-Q plot

One of the values is way off the diagonal line, which indicates that the data set may be non-normal and hence the full normal linear regression model may not be appropriate.

12.2 (i)(a) Calculate slope parameter estimate

Using the formula given on page 24 of the Tables:

$$\hat{\beta} = \frac{S_{xy}}{S_{xx}} = \frac{8.1}{12.2} = 0.66393$$

(i)(b) Test whether slope parameter is significantly different from zero

We are testing:

$$H_0: \beta = 0 \quad \text{vs} \quad H_1: \beta \neq 0$$

Under $H_0$, $\dfrac{\hat{\beta} - 0}{\sqrt{\hat{\sigma}^2 / S_{xx}}}$ has a $t_{n-2}$ distribution. Now:

$$\hat{\sigma}^2 = \frac{1}{n-2}\left(S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right) = \frac{1}{17}\left(10.6 - \frac{8.1^2}{12.2}\right) = 0.30718$$

So the observed value of the test statistic is:

$$\frac{0.66393 - 0}{\sqrt{0.30718/12.2}} = 4.184$$

Since this is much greater than 2.898, the upper 0.5% point of the $t_{17}$ distribution, we have sufficient evidence to reject $H_0$ at the 1% level. Therefore it is reasonable to conclude that $\beta \neq 0$.

(ii)(a) Calculate the correlation coefficient

Using the formula on page 25 of the Tables:

$$r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}} = \frac{8.1}{\sqrt{12.2 \times 10.6}} = 0.71228$$

(ii)(b) Test whether correlation coefficient is significantly different from zero

We are testing:

$$H_0: \rho = 0 \quad \text{vs} \quad H_1: \rho \neq 0$$

Under $H_0$, $\dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$ follows the $t_{n-2}$ distribution.

So the observed value of the test statistic is:

$$\frac{0.71228\sqrt{17}}{\sqrt{1 - 0.71228^2}} = 4.184$$

Since this is much greater than 2.898, the upper 0.5% point of the $t_{17}$ distribution, we have sufficient evidence to reject $H_0$ at the 1% level. Therefore it is reasonable to conclude that $\rho \neq 0$.

(iii) Comment

These tests are equivalent. Testing whether there is any correlation is equivalent to testing if the slope is not zero (ie it is sloping upwards and there is positive correlation or it is sloping downwards and there is negative correlation). So the tests give the same statistic and p-value.

12.3
The coefficient of determination is given by:

$$R^2 = \frac{SS_{REG}}{SS_{TOT}} = \frac{6.4}{10.0} = 0.64$$

This gives the proportion of the total variance explained by the model.
So 64% of the variance can be explained by the model, leaving 36% of the total variance unexplained.

12.4 (i) Transform quadratic to linear form

Let $Y_i = y_i$ and $X_i = x_i^2$. Then the model becomes $Y_i = a + bX_i + e_i$.

(ii) Transform exponential to linear form

Taking logs gives:

$$\ln y_i = \ln a + b x_i$$

Let $Y_i = \ln y_i$ and $X_i = x_i$. Then the model becomes $Y_i = \alpha + \beta X_i$ where $\alpha = \ln a$ and $\beta = b$.

12.5 (i) Fitted regression line

Calculating the sums of squares:

$$S_{xx} = 60{,}016 - \frac{836^2}{12} = 1{,}774.67$$ [½]
$$S_{xy} = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n} = 1{,}122$$ [½]
$$\hat{\beta} = \frac{S_{xy}}{S_{xx}} = \frac{1{,}122}{1{,}774.67} = 0.63223$$ [1]
$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = 72.25 - 0.63223 \times 69.667 = 28.205$$ [1]

Hence, the fitted regression equation of $y$ on $x$ is $\hat{y} = 28.205 + 0.63223x$.

(ii)(a) Estimate of error variance

We have $S_{yy} = 63{,}603 - \dfrac{867^2}{12} = 962.25$, so:

$$\hat{\sigma}^2 = \frac{1}{n-2}\left(S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right) = \frac{1}{10}\left(962.25 - \frac{1{,}122^2}{1{,}774.67}\right) = 25.289$$ [1]

(ii)(b) Confidence interval for variance

Now $\dfrac{10\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{10}$, which gives a confidence interval for $\sigma^2$ of:

$$\left(\frac{10 \times 25.289}{18.31}, \frac{10 \times 25.289}{3.94}\right) = (13.8, 64.2)$$ [2]

(iii) Test whether data are positively correlated

We are testing $H_0: \beta = 0$ vs $H_1: \beta > 0$.

Now $\dfrac{\hat{\beta} - \beta}{\sqrt{\hat{\sigma}^2 / S_{xx}}} \sim t_{10}$. The observed value of the test statistic is:

$$\frac{0.63223 - 0}{\sqrt{25.289/1{,}774.67}} = 5.296$$ [2]

This is a highly significant result, which exceeds the 0.5% critical value of the $t_{10}$ distribution of 3.169. So we have sufficient evidence at the 0.5% level to reject $H_0$ and we conclude that $\beta > 0$ (ie the data are positively correlated). [1]

(iv) Confidence interval for the mean finals paper score

The variance of the distribution of the mean finals score corresponding to an entrance score of 53 is:

$$\left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]\hat{\sigma}^2 = \left[\frac{1}{12} + \frac{(53 - 69.667)^2}{1{,}774.67}\right] \times 25.289 = 6.0657$$ [1]

The predicted value is $28.205 + 0.63223 \times 53 = 61.713$. [½]
We have a $t_{10}$ distribution, so the 95% confidence interval is:

$$61.713 \pm 2.228\sqrt{6.0657} = (56.2, 67.2)$$ [1½]

(v)(a) Calculate the proportion of variation explained by the model

The proportion of variability explained by the model is given by:

$$R^2 = \frac{S_{xy}^2}{S_{xx}S_{yy}} = \frac{1{,}122^2}{1{,}774.67 \times 962.25} = 73.7\%$$ [1]

(v)(b) Comment

73.7% of the variation is explained by the model, which indicates that the fit is fairly good. It still might be worthwhile to examine the residuals to double check that a linear model is appropriate. [1]

12.6 (i) Regression line

We are given:

$$s_{xx} = 60 \qquad s_{yy} = 925{,}262 \qquad s_{xy} = 7{,}087$$

So:

$$\hat{\beta} = \frac{s_{xy}}{s_{xx}} = \frac{7{,}087}{60} = 118.117$$ [1]

Since $\bar{x} = \dfrac{36}{9} = 4$ and $\bar{y} = \dfrac{3{,}960}{9} = 440$, we get:

$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = 440 - 118.117 \times 4 = -32.47$$ [1]

So the regression line is:

$$\hat{y} = -32.47 + 118.117x$$

(ii)(a) Confidence interval for slope parameter

The pivotal quantity is given by:

$$\frac{\hat{\beta} - \beta}{\sqrt{\hat{\sigma}^2 / s_{xx}}} \sim t_{n-2}$$

A 99% confidence interval is given by:

$$\hat{\beta} \pm t_{n-2;0.005}\sqrt{\frac{\hat{\sigma}^2}{s_{xx}}}$$

From our data:

$$\hat{\sigma}^2 = \frac{1}{7}\left(925{,}262 - \frac{7{,}087^2}{60}\right) = 12{,}595.6$$ [1]

So the 99% confidence interval is given by:

$$118.117 \pm 3.499\sqrt{\frac{12{,}595.6}{60}} = 118.117 \pm 50.696 = (67.4, 169)$$ [2]

(ii)(b) Confidence interval for variance

The pivotal quantity is given by:

$$\frac{(n-2)\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-2}$$ [1]

So:

$$0.99 = P\left(\chi^2_{n-2;0.995} < \frac{(n-2)\hat{\sigma}^2}{\sigma^2} < \chi^2_{n-2;0.005}\right)$$

which gives a confidence interval of:

$$\left(\frac{(n-2)\hat{\sigma}^2}{\chi^2_{n-2;0.005}}, \frac{(n-2)\hat{\sigma}^2}{\chi^2_{n-2;0.995}}\right)$$

Substituting in, the confidence interval (to 3 SF) is:

$$\left(\frac{7 \times 12{,}595.6}{20.28}, \frac{7 \times 12{,}595.6}{0.9893}\right) = (4350, 89100)$$ [1]

(iii)(a) Partition

The total sum of squares, $SS_{TOT} = \sum (y_i - \bar{y})^2$, is given by $s_{yy}$, which is 925,262. [1]

The partition given at the bottom of page 25 in the Tables is:

$$\sum (y_i - \bar{y})^2 = \sum (y_i - \hat{y}_i)^2 + \sum (\hat{y}_i - \bar{y})^2$$

ie $SS_{TOT} = SS_{RES} + SS_{REG}$

Now, modifying the $\hat{\sigma}^2$ formula on page 24 of the Tables, we have:

$$SS_{RES} = \sum (y_i - \hat{y}_i)^2 = s_{yy} - \frac{s_{xy}^2}{s_{xx}} = 925{,}262 - \frac{7{,}087^2}{60} = 88{,}169$$ [1]

Alternatively, using $\hat{\sigma}^2$ from part (ii), we get $SS_{RES} = (n-2)\hat{\sigma}^2 = 7 \times 12{,}595.6$.

$$\Rightarrow SS_{REG} = 925{,}262 - 88{,}169 = 837{,}093$$ [1]

Alternatively, this could be calculated as $SS_{REG} = \dfrac{s_{xy}^2}{s_{xx}} = \dfrac{7{,}087^2}{60} = 837{,}093$.

(iii)(b) Proportion of variability explained by the model

This is the coefficient of determination, $R^2$, which is given by:

$$R^2 = \frac{SS_{REG}}{SS_{TOT}} = \frac{837{,}093}{925{,}262} = 90.5\%$$ [1]

(iii)(c) Comment

This tells us that 90.5% of the variation in the prices is explained by the model. Since this leaves only 9.5% from other non-model sources, it would appear that the model is a very good fit to the data. [1]

(iv)(a) Residuals

The residuals, $\hat{e}_i$, can be calculated from the actual prices, $y_i$, and the predicted prices, $\hat{y}_i$:

$$\hat{e}_i = y_i - \hat{y}_i$$

Using the regression line $\hat{y}_i = -32.47 + 118.117x_i$ from part (i), we get:

$$x = 1 \Rightarrow \hat{y} = -32.47 + 118.117 \times 1 = 86 \Rightarrow \hat{e} = 131 - 86 = 45$$ [1]
$$x = 4 \Rightarrow \hat{y} = -32.47 + 118.117 \times 4 = 440 \Rightarrow \hat{e} = 330 - 440 = -110$$ [1]
$$x = 8 \Rightarrow \hat{y} = -32.47 + 118.117 \times 8 = 912 \Rightarrow \hat{e} = 1{,}095 - 912 = 183$$ [1]

(iv)(b) Dotplot of residuals

Since $e_i \sim N(0, \sigma^2)$ we would expect the dotplot to be normally distributed about zero. This does not appear to be the case, but it is difficult to tell with such a small data set. [1]

(iv)(c) Plot of residuals against time

Clearly this is not patternless. The residuals are not independent of the time. This means that the linear model is definitely missing something and is not appropriate to these data. [1]

A plot of the original data (with the regression line) shows what's happening:

[Plot of price (vertical axis, -200 to 1,200) against time (horizontal axis, 0 to 9), with the fitted regression line]

The price increases in an exponential (rather than linear) way.
We should have used the log of the price against time instead.

12.7 (i) Obtain the fitted regression line

The regression line for $p$ on $c$ is given by:

$$\hat{p} = \hat{\alpha} + \hat{\beta}c \quad \text{where } \hat{\beta} = \frac{S_{cp}}{S_{cc}} \text{ and } \hat{\alpha} = \bar{p} - \hat{\beta}\bar{c}$$

$$S_{cc} = \sum c^2 - \frac{\left(\sum c\right)^2}{n} = 6{,}884 - \frac{238^2}{9} = 590.2222$$
$$S_{cp} = \sum cp - \frac{\left(\sum c\right)\left(\sum p\right)}{n} = 983 - \frac{238 \times 33.4}{9} = 99.75556$$ [1]

So:

$$\hat{\beta} = \frac{99.75556}{590.2222} = 0.16901$$ [½]
$$\hat{\alpha} = \frac{33.4}{9} - 0.16901 \times \frac{238}{9} = -0.75836$$ [½]

Hence, the fitted regression line is:

$$\hat{p} = 0.16901c - 0.75836$$

(ii)(a) Estimate the GCSE score

The estimate of the average GCSE point score is obtained from the regression line:

$$\hat{P} = 0.16901 \times 15 - 0.75836 = 1.78$$ [1]

(ii)(b) Standard error of the predicted GCSE score

The standard error of this individual response is given by:

$$\sqrt{\left[1 + \frac{1}{n} + \frac{(c_0 - \bar{c})^2}{S_{cc}}\right]\hat{\sigma}^2}$$ [1]

where:

$$\hat{\sigma}^2 = \frac{1}{n-2}\left(S_{pp} - \frac{S_{cp}^2}{S_{cc}}\right) = \frac{1}{7}\left(25.66889 - \frac{99.75556^2}{590.2222}\right) = 1.25841$$ [1]

Hence, the standard error is given by:

$$\sqrt{\left[1 + \frac{1}{9} + \frac{\left(15 - \frac{238}{9}\right)^2}{590.2222}\right] \times 1.25841} = \sqrt{1.33302 \times 1.25841} = \sqrt{1.67748} = 1.29518$$ [1]

12.8 (i) Least squares estimate of slope parameter

The least squares estimate minimises $\sum e_i^2$. Now:

$$q = \sum e_i^2 = \sum (Y_i - \beta x_i)^2$$ [½]

Differentiating this gives:

$$\frac{dq}{d\beta} = -2\sum x_i (Y_i - \beta x_i)$$ [1]

Setting this equal to zero:

$$\sum x_i (Y_i - \hat{\beta}x_i) = 0 \Rightarrow \sum x_i Y_i - \hat{\beta}\sum x_i^2 = 0 \Rightarrow \hat{\beta} = \frac{\sum x_i Y_i}{\sum x_i^2}$$ [1]

The second derivative is $2\sum x_i^2 > 0$, so we do have a minimum. [½]

(ii) Bias and mean square error

The expectation of $\hat{\beta}$ is:

$$E(\hat{\beta}) = E\left(\frac{\sum x_i Y_i}{\sum x_i^2}\right) = \frac{\sum x_i E(Y_i)}{\sum x_i^2}$$ [½]

Now $E(Y_i) = E(\beta x_i + e_i) = \beta x_i + E(e_i) = \beta x_i + 0 = \beta x_i$. [½]

So:

$$E(\hat{\beta}) = \frac{\sum x_i (\beta x_i)}{\sum x_i^2} = \frac{\beta \sum x_i^2}{\sum x_i^2} = \beta$$ [½]

Hence:

$$\text{bias}(\hat{\beta}) = E(\hat{\beta}) - \beta = 0$$ [½]

The variance of $\hat{\beta}$ is:

$$\text{var}(\hat{\beta}) = \text{var}\left(\frac{\sum x_i Y_i}{\sum x_i^2}\right) = \frac{\sum x_i^2 \, \text{var}(Y_i)}{\left(\sum x_i^2\right)^2}$$ [½]

Now $\text{var}(Y_i) = \text{var}(\beta x_i + e_i) = \text{var}(e_i) = \sigma^2$. So:

$$\text{var}(\hat{\beta}) = \frac{\sigma^2 \sum x_i^2}{\left(\sum x_i^2\right)^2} = \frac{\sigma^2}{\sum x_i^2}$$ [½]

Hence:

$$MSE(\hat{\beta}) = \text{var}(\hat{\beta}) + \text{bias}^2(\hat{\beta}) = \frac{\sigma^2}{\sum x_i^2}$$ [1]

12.9 (i)(a) Transformation

Taking logs of the original expression gives:

$$\ln \mu_x = \ln B + x \ln c$$ [1]

(i)(b) Expressions for parameters

This expression is now linear in $x$. Comparing the expression with $Y = \alpha + \beta x$ gives:

$$Y = \ln \mu_x \qquad \alpha = \ln B \qquad \beta = \ln c$$ [1]

(ii) Comment

The graph appears to show an approximately linear relationship and this supports the transformation in part (i)(a). However, it does appear to have a slight curve and this would warrant closer inspection of the model to see if it is appropriate for the data. [1]

(iii) Least squares estimates

Obtaining the estimates of $\alpha$ and $\beta$ using the formulae given on page 24 of the Tables with $y = \ln \mu_x$:

$$s_{xx} = \sum x^2 - n\bar{x}^2 = 11{,}120 - 8 \times \left(\frac{296}{8}\right)^2 = 168$$
$$s_{xy} = \sum xy - n\bar{x}\bar{y} = -2{,}104.5 - 8 \times \frac{296}{8} \times \frac{-57.129}{8} = 9.273$$
$$\Rightarrow \hat{\beta} = \frac{s_{xy}}{s_{xx}} = \frac{9.273}{168} = 0.055196$$ [1]
$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = \frac{-57.129}{8} - 0.055196 \times \frac{296}{8} = -9.1834$$ [1]

Therefore, we obtain:

$$\hat{B} = e^{\hat{\alpha}} = e^{-9.1834} = 0.000103 \qquad \hat{c} = e^{\hat{\beta}} = e^{0.055196} = 1.06$$ [1]

(iv)(a) Coefficient of determination

The coefficient of determination is given by:

$$R^2 = r^2 = \frac{s_{xy}^2}{s_{xx}s_{yy}} = \frac{9.273^2}{168 \times 0.53467} = 95.7\%$$ [1]

where $s_{yy} = \sum y^2 - n\bar{y}^2 = 408.50 - 8 \times \left(\dfrac{-57.129}{8}\right)^2 = 0.53467$.

(iv)(b) Comment

This tells us that 95.7% of the variation in the data can be explained by the model and so indicates an extremely good overall fit of the model.
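The arithmetic in Solution 12.9 parts (iii) and (iv)(a) can be checked with a short script. This is only a sketch: it takes the summary sums quoted in Question 12.9 as given, and the variable names are illustrative rather than anything used in the course material.

```python
import math

# Summary statistics quoted in Question 12.9, with y = ln(mu_x).
n = 8
sum_x, sum_x2 = 296.0, 11_120.0
sum_y, sum_y2 = -57.129, 408.50
sum_xy = -2_104.5

x_bar = sum_x / n
y_bar = sum_y / n

# Corrected sums of squares and products (page 24 of the Tables).
s_xx = sum_x2 - n * x_bar ** 2
s_yy = sum_y2 - n * y_bar ** 2
s_xy = sum_xy - n * x_bar * y_bar

# Least squares estimates for Y = alpha + beta * x.
beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar

# Back-transform to the parameters of mu_x = B * c**x.
B_hat = math.exp(alpha_hat)
c_hat = math.exp(beta_hat)

# Coefficient of determination.
r_squared = s_xy ** 2 / (s_xx * s_yy)

print(f"s_xx = {s_xx:.0f}, s_xy = {s_xy:.3f}, s_yy = {s_yy:.5f}")
print(f"beta_hat = {beta_hat:.6f}, alpha_hat = {alpha_hat:.4f}")
print(f"B_hat = {B_hat:.6f}, c_hat = {c_hat:.2f}, R^2 = {r_squared:.1%}")
```

Working from the sums rather than the rounded table of $\ln \mu_x$ values mirrors the solution, so the results should agree with it to the accuracy quoted there.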
[1]

(iv)(c) Calculate residuals

The completed table of residuals using $\hat{e}_i = y_i - \hat{y}_i$ is:

| Age, $x$ | 30 | 32 | 34 | 36 | 38 | 40 | 42 | 44 |
|---|---|---|---|---|---|---|---|---|
| Residual, $\hat{e}_i$ | 0.08 | 0.02 | -0.03 | -0.06 | -0.06 | -0.03 | 0.02 | 0.09 |

[1]

Age 32 yrs: $(-7.40) - (-9.1834 + 0.055196 \times 32) = 0.02$
Age 36 yrs: $(-7.26) - (-9.1834 + 0.055196 \times 36) = -0.06$
Age 40 yrs: $(-7.01) - (-9.1834 + 0.055196 \times 40) = -0.03$ [1]

(iv)(d) Comment

The residuals should be patternless when plotted against $x$. However, it is clear to see that some pattern exists. This indicates that the linear model is not a good fit and that there is some other variable at work here. [1]

(v)(a) Confidence interval for the log of the mean predicted value

Using the formula given on page 25 of the Tables, the variance of the mean predicted response is:

$$\left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]\hat{\sigma}^2 = \left[\frac{1}{8} + \frac{(35 - 37)^2}{168}\right] \times 0.0038056 = 0.0005663$$ [1]

where $\hat{\sigma}^2 = \dfrac{1}{6}\left(0.53467 - \dfrac{9.273^2}{168}\right) = 0.0038056$. [1]

The estimate is $\hat{Y} = \ln \hat{\mu}_{35} = -9.1834 + 0.055196 \times 35 = -7.251$.

Using the $t_6$ distribution, a 95% confidence interval for $Y = \ln \mu_{35}$ is:

$$-7.251 \pm 2.447\sqrt{0.0005663} = (-7.309, -7.193)$$ [1]

(v)(b) Confidence interval for mean predicted value

The corresponding 95% confidence interval for $\mu_{35}$ is (0.000669, 0.000752). [1]

12.10 (i) Estimate parameters

Now using $x$ for $i$ and $y$ for $\ln P_i$, we get:

$$s_{xx} = \sum x^2 - n\bar{x}^2 = 16{,}799$$
$$s_{xy} = \sum xy - n\bar{x}\bar{y} = 237.39$$
$$s_{yy} = \sum y^2 - n\bar{y}^2 = 3.4322$$ [2]

So the estimates for $a$, $b$ and $\sigma^2$ are:

$$\hat{b} = \frac{s_{xy}}{s_{xx}} = \frac{237.39}{16{,}799} = 0.01413$$ [1]
$$\hat{a} = \bar{y} - \hat{b}\bar{x} = \frac{21.5953}{6} - 0.01413 \times \frac{475}{6} = 2.4805$$ [1]
$$\hat{\sigma}^2 = \frac{1}{n-2}\left(s_{yy} - \frac{s_{xy}^2}{s_{xx}}\right) = \frac{1}{4}\left(3.4322 - \frac{237.39^2}{16{,}799}\right) = 0.01940$$ [1]

(ii) Correlation coefficient

The correlation coefficient is:

$$r = \frac{s_{xy}}{\sqrt{s_{xx}s_{yy}}} = \frac{237.39}{\sqrt{16{,}799 \times 3.4322}} = 0.989$$ [1]

(iii) Confidence interval for slope parameter

Using the result given on page 24 of the Tables, we have:

$$\hat{b} \pm t_{4;0.005}\sqrt{\frac{\hat{\sigma}^2}{s_{xx}}} = 0.01413 \pm 4.604\sqrt{\frac{0.01940}{16{,}799}}$$ [1]

This gives a confidence interval for $b$ of (0.00918, 0.0191). [1]

(iv)(a) Confidence interval for mean response

If $y_{365}$ denotes the log of the average price of a pint of lager in the country as a whole on day 365, the predicted value for $y_{365}$ is:

$$\hat{y}_{365} = 2.4805 + 0.01413 \times 365 = 7.638$$ [1]

The distribution of $\dfrac{\hat{y}_{365} - y_{365}}{s_{365}}$ is $t_4$, where:

$$s_{365}^2 = \left[\frac{1}{n} + \frac{(365 - \bar{x})^2}{s_{xx}}\right]\hat{\sigma}^2 = \left[\frac{1}{6} + \frac{(365 - 475/6)^2}{16{,}799}\right] \times 0.01940 = 0.09758$$ [1]

So a symmetrical 95% confidence interval for $y_{365}$ is:

$$7.638 \pm 2.776\sqrt{0.09758} = 7.638 \pm 0.867 = (6.77, 8.51)$$ [1]

and the corresponding confidence interval for $P_{365}$ is:

$$(e^{6.771}, e^{8.505}) = (870, 4940)$$ [1]

(iv)(b) Confidence interval for individual response

If $y^*_{365}$ denotes the log of the observed price of a pint of lager in a randomly selected bar on day 365, then $\dfrac{\hat{y}^*_{365} - y^*_{365}}{s^*_{365}}$ has a $t_4$ distribution, where:

$$s^{*2}_{365} = \left[1 + \frac{1}{n} + \frac{(365 - \bar{x})^2}{s_{xx}}\right]\hat{\sigma}^2 = s_{365}^2 + \hat{\sigma}^2 = 0.09758 + 0.01940 = 0.11698$$ [1]

This gives a confidence interval of:

$$7.638 \pm 2.776\sqrt{0.11698} = 7.638 \pm 0.949 = (6.69, 8.59)$$ [1]

So the confidence interval for $P^*_{365}$ is:

$$(e^{6.689}, e^{8.587}) = (800, 5360)$$ [1]

12.11 (i) MLEs of $\alpha$ and $\beta$

Each $Y_i$ has a $N(\alpha + \beta x_i, \sigma^2)$ distribution, so the joint likelihood function is:

$$L = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left(\frac{y_i - \alpha - \beta x_i}{\sigma}\right)^2\right\} = \frac{1}{\sigma^n (2\pi)^{n/2}} \exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \alpha - \beta x_i)^2\right\}$$ [1]

Taking logs we get:

$$\log L = -n\log\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \alpha - \beta x_i)^2 + \text{constant}$$ [½]

Differentiating with respect to $\alpha$ and then $\beta$:

$$\frac{\partial \log L}{\partial \alpha} = -\frac{1}{2\sigma^2}\sum_{i=1}^{n} 2(y_i - \alpha - \beta x_i)(-1)$$ [½]
$$\frac{\partial \log L}{\partial \beta} = -\frac{1}{2\sigma^2}\sum_{i=1}^{n} 2(y_i - \alpha - \beta x_i)(-x_i)$$ [1]

By setting $\dfrac{\partial \log L}{\partial \alpha}$ equal to 0 we get $\sum y_i - n\hat{\alpha} - \hat{\beta}\sum x_i = 0$. [½]

By setting $\dfrac{\partial \log L}{\partial \beta}$ equal to 0 we get $\sum x_i y_i - \hat{\alpha}\sum x_i - \hat{\beta}\sum x_i^2 = 0$. [½]

These are the same normal equations that we got before, so the MLEs are as before, ie:

$$\hat{\beta} = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2} = \frac{S_{xy}}{S_{xx}} \quad \text{and} \quad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$ [1]

(ii) Show MLE of $\sigma^2$ has a different denominator from the least squares estimate

Now differentiating the log likelihood with respect to $\sigma$:

$$\frac{\partial \log L}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n}(y_i - \alpha - \beta x_i)^2$$ [1]

By setting $\dfrac{\partial \log L}{\partial \sigma}$ equal to 0 and substituting $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$, we obtain:

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{\alpha} - \hat{\beta}x_i)^2$$
$$= \frac{1}{n}\sum_{i=1}^{n}\left[(y_i - \bar{y}) - \hat{\beta}(x_i - \bar{x})\right]^2$$
$$= \frac{1}{n}\left[\sum_{i=1}^{n}(y_i - \bar{y})^2 - 2\hat{\beta}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) + \hat{\beta}^2\sum_{i=1}^{n}(x_i - \bar{x})^2\right]$$
$$= \frac{1}{n}\left[S_{yy} - 2\hat{\beta}S_{xy} + \hat{\beta}^2 S_{xx}\right]$$
$$= \frac{1}{n}\left[S_{yy} - 2\frac{S_{xy}}{S_{xx}}S_{xy} + \frac{S_{xy}^2}{S_{xx}^2}S_{xx}\right]$$
$$= \frac{1}{n}\left(S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right)$$ [3]

which has a different denominator from before (and therefore is a biased estimator).

12.12 (i)(a) Least squares estimates equations

We need to minimise the expression $Q = \sum \left(y_i - (\alpha + \beta_1 x_{i1} + \beta_2 x_{i2})\right)^2$. [1]

To do this, we need to differentiate the expression with respect to the parameters and set the expressions equal to zero:

$$\frac{\partial Q}{\partial \alpha} = -2\sum \left(y_i - (\hat{\alpha} + \hat{\beta}_1 x_{i1} + \hat{\beta}_2 x_{i2})\right) = 0 \Rightarrow \sum y_i = n\hat{\alpha} + \hat{\beta}_1\sum x_{i1} + \hat{\beta}_2\sum x_{i2} \quad \text{eqn (1)}$$
$$\frac{\partial Q}{\partial \beta_1} = -2\sum x_{i1}\left(y_i - (\hat{\alpha} + \hat{\beta}_1 x_{i1} + \hat{\beta}_2 x_{i2})\right) = 0 \Rightarrow \sum y_i x_{i1} = \hat{\alpha}\sum x_{i1} + \hat{\beta}_1\sum x_{i1}^2 + \hat{\beta}_2\sum x_{i1}x_{i2} \quad \text{eqn (2)}$$
$$\frac{\partial Q}{\partial \beta_2} = -2\sum x_{i2}\left(y_i - (\hat{\alpha} + \hat{\beta}_1 x_{i1} + \hat{\beta}_2 x_{i2})\right) = 0 \Rightarrow \sum y_i x_{i2} = \hat{\alpha}\sum x_{i2} + \hat{\beta}_1\sum x_{i1}x_{i2} + \hat{\beta}_2\sum x_{i2}^2 \quad \text{eqn (3)}$$ [3]

(i)(b) Evaluate the least squares estimates

Substituting these values into the equations above, we get:

$$(1) \quad 469.7 = 5\hat{\alpha} + 235\hat{\beta}_1 + 207.8\hat{\beta}_2$$
$$(2) \quad 22{,}028.78 = 235\hat{\alpha} + 11{,}202.68\hat{\beta}_1 + 8{,}985.96\hat{\beta}_2$$
$$(3) \quad 19{,}870.22 = 207.8\hat{\alpha} + 8{,}985.96\hat{\beta}_1 + 12{,}886.42\hat{\beta}_2$$

Solving these simultaneously:

$$47 \times (1) - (2) \Rightarrow 47.12 = -157.68\hat{\beta}_1 + 780.64\hat{\beta}_2 \quad \text{eqn (4)}$$
$$5 \times (3) - 207.8 \times (1) \Rightarrow 1{,}747.44 = -3{,}903.2\hat{\beta}_1 + 21{,}251.26\hat{\beta}_2 \quad \text{eqn (5)}$$
$$3{,}903.2 \times (4) - 157.68 \times (5) \Rightarrow \hat{\beta}_2 = 0.301468$$ [1]

Substituting this back in, we get $\hat{\beta}_1 = 1.19367$ and $\hat{\alpha} = 25.3084$, which gives us a regression line of $\hat{y} = 25.31 + 1.194x_1 + 0.3015x_2$.
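The hand elimination in Solution 12.12 (i)(b) can be checked numerically. The sketch below solves the three normal equations with a small Gaussian elimination routine; `solve_linear_system` is a helper written for this illustration (it is not part of the course material or of any library), and the coefficients are the sums quoted in the question.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Bring the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the column from every row below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


# Normal equations (1)-(3): unknowns are (alpha, beta_1, beta_2).
coefficients = [
    [5.0, 235.0, 207.8],
    [235.0, 11_202.68, 8_985.96],
    [207.8, 8_985.96, 12_886.42],
]
right_hand_side = [469.7, 22_028.78, 19_870.22]

alpha_hat, beta1_hat, beta2_hat = solve_linear_system(coefficients, right_hand_side)
print(f"alpha = {alpha_hat:.4f}, beta_1 = {beta1_hat:.5f}, beta_2 = {beta2_hat:.6f}")
```

Partial pivoting is used only for numerical stability; the answer is the same as eliminating in the order shown in the solution.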
[2]

(ii) Significance of the parameters

The p-values for all the parameters are less than 0.05 and so they are all significantly different from zero. [2]

(iii) Adjusted $R^2$

We have $n = 5$ trials and $k = 2$ predictors. Hence:

$$\text{adjusted } R^2 = 1 - \frac{n-1}{n-k-1}(1 - R^2) = 1 - \frac{5-1}{5-2-1}(1 - 0.9992) = 0.9984$$ [2]

(iv) ANOVA

The completed ANOVA table for the model is:

| Source of variation | Degrees of Freedom | Sum of Squares | Mean Sum of Squares |
|---|---|---|---|
| Regression | 2 | 49.1137 | 24.5569 |
| Residual | 2 | 0.0383 | 0.0192 |
| Total | 4 | 49.152 | |

[1]

The $F$ statistic is:

$$F = \frac{SS_{REG}/k}{SS_{RES}/(n-k-1)} = \frac{24.5569}{0.0192} = 1{,}280 \text{ (3 SF)}$$ [1]

This is far in excess of even the 1% $F_{2,2}$ critical value of 99.00. Hence we can reject the null hypothesis that $\beta_1 = \beta_2 = 0$. [2]

(v) Predict the percentage effectiveness

Substituting in the values given:

$$\hat{y} = 25.31 + (1.194 \times 51.3) + (0.3015 \times 18.3) = 92.1\%$$ [2]

(vi) Interpret plots

The first plot appears to be random and there is no discernible increase in the variance, so this would imply that the model meets these assumptions. Point 1 (92.5%) does appear to be an outlier. But it is difficult to tell with such a small dataset. [1]

With the exception of point 1, the rest of the values lie along the diagonal line, thus implying a normal distribution is appropriate. [1]

(vii)(a) Interaction

If there is interaction between the two drugs then there is an additional effect caused when both are present compared with what would be expected if they were each administered singly. [1]

(vii)(b) Formula

The formula is $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \gamma_{12} X_1 X_2$. [1]

(vii)(c) Compare models

The model with just the two drugs as main effects had an adjusted $R^2$ of 0.9984 in part (iii), whereas the new model with the interactive effect has an adjusted $R^2$ of 0.9969. Since there is a decrease in the value of the adjusted $R^2$, the previous model would be considered the best model, as the interaction term does not improve the fit enough to justify the extra parameter.
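The adjusted $R^2$ in part (iii), the $F$ statistic in part (iv) and the comparison in part (vii)(c) can be reproduced with a few lines of arithmetic. This is a sketch only: the function name is illustrative, the 1% critical value 99.00 is the figure quoted in the solution rather than something computed here, and the interaction model is assumed to have $k = 3$ terms when the formula is inverted as a cross-check.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 for a model with k predictors fitted to n observations."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r_squared)


n, k = 5, 2

# Part (iii): main-effects model.
adj_main = adjusted_r_squared(0.9992, n, k)

# Part (iv): mean sums of squares and the F statistic from the ANOVA table.
ss_reg, ss_res = 49.1137, 0.0383
ms_reg = ss_reg / k
ms_res = ss_res / (n - k - 1)
f_stat = ms_reg / ms_res

F_CRIT_1PC = 99.00  # 1% point of F(2, 2), as quoted in the solution
reject_h0 = f_stat > F_CRIT_1PC

# Part (vii)(c): adjusted R^2 of the interaction model, as quoted in the question.
adj_interaction = 0.9969
prefer_main_effects = adj_main > adj_interaction

print(f"adjusted R^2 = {adj_main:.4f}")
print(f"MS_REG = {ms_reg:.4f}, MS_RES = {ms_res:.5f}, F = {f_stat:.0f}")
print(f"reject H0 at 1%: {reject_h0}; prefer main-effects model: {prefer_main_effects}")
```

Using the unrounded residual mean square gives an $F$ value of roughly 1,282 rather than the 1,280 obtained in the solution from the rounded figure 0.0192; the conclusion of the test is unaffected.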
[2]

End of Part 3

What next?

1. Briefly review the key areas of Part 3 and/or re-read the summaries at the end of Chapters 10 to 12.
2. Ensure you have attempted some of the Practice Questions at the end of each chapter in Part 3. If you don't have time to do them all, you could save the remainder for use as part of your revision.
3. Attempt Assignment X3.
4. Work through the Chapter 10 to 12 material (hypothesis tests, correlation and regression) of the Paper B Online Resources (PBOR).

Time to consider ... revision and rehearsal products

Revision Notes
Each booklet covers one main theme of the course and includes integrated questions testing Core Reading, relevant past exam questions and other useful revision aids. One student said: "Revision books are the most useful ActEd resource."

ASET
This contains past exam papers with detailed solutions and explanations, plus lots of comments about exam technique. One student said: "ASET is the single most useful tool ActEd produces. The answers do go into far more detail than necessary for the exams, but this is a good source of learning and I am sure it has helped me gain extra marks in the exam."

You can find lots more information, including samples, on our website at www.ActEd.co.uk.

Buy online at www.ActEd.co.uk/estore

CS1-13: Generalised linear models

Generalised linear models

Syllabus objectives

4.2 Generalised linear models

4.2.1 Define an exponential family of distributions. Show that the following distributions may be written in this form: binomial, Poisson, exponential, gamma and normal.

4.2.2 State the mean and variance for an exponential family, and define the variance function and the scale parameter. Derive these quantities for the distributions above.

4.2.3 Explain what is meant by the link function and the canonical link function, referring to the distributions above.

4.2.4 Explain what is meant by a variable, a factor taking categorical values and an interaction term. Define the linear predictor, illustrating its form for simple models, including polynomial models and models involving factors.

4.2.5 Define the deviance and scaled deviance and state how the parameters of a GLM may be estimated. Describe how a suitable model may be chosen by using an analysis of deviance and by examining the significance of the parameters.

4.2.6 Define the Pearson and deviance residuals and describe how they may be used.

4.2.7 Apply statistical tests to determine the acceptability of a fitted model: Pearson's chi-square test and the likelihood-ratio test.

4.2.8 Fit a generalised linear model to a data set and interpret the output.

0 Introduction

In Chapter 12 we introduced linear models. The multiple linear model built on the simple linear model by adding more explanatory variables, and then we extended this further by allowing functions of these variables, including interaction.

Recall that the bivariate linear regression model is $Y_i = \alpha + \beta x_i + e_i$ and the multiple linear regression model with $k$ explanatory variables is $Y_i = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_k x_{ik} + e_i$. In the full normal model we assume the error terms are normally distributed with mean 0 and variance $\sigma^2$. Hence, the response variable, $Y_i$, is also normally distributed.

Generalised linear models (GLMs) extend this further by allowing the distribution of the data to be non-normal. This is particularly important in actuarial work where the data very often do not have a normal distribution. For example, in mortality, the Poisson distribution is used in modelling the force of mortality, $\mu_x$, and the exponential is used for survival analysis. In general insurance, the Poisson distribution is often used for modelling the claim frequency and the gamma or lognormal distribution for the claim severity. Finally, in all forms of insurance, the binomial distribution is used to model propensity.

Claim severity is just another term for the size of a claim, claim frequency refers to the rate at which claims are received and propensity refers to the probability of an event happening.

In this chapter we also introduce the idea of categorical explanatory variables (sometimes called factors).

GLMs are widely used both in general and life insurance. They are used to:
- decide which rating factors to use (rating factors are measurable or categorical factors that are used as proxies for risk in setting premiums, eg age or gender)
- estimate an appropriate premium to charge for a particular policy given the level of risk present.

For example, in motor insurance, there are many factors that may be used as proxies for the level of risk (type of car driven, age of driver, number of years' past driving experience, etc).
We can use a GLM both to decide which of these factors are significant to the assessment of risk (and hence which should be included), and to suggest an appropriate premium to charge for a risk that represents a particular combination of these factors.

Question

Suggest rating factors that an insurance company may consider in the pricing of a single life annuity contract.

Solution

Rating factors that might be used in the pricing of a single life annuity include:

- age
- sex (if permitted by legislation)
- size of fund with which to purchase an annuity
- postcode
- health status (for impaired life annuities).

We have only used continuous variables so far in linear regression, such as weight, height and size of claim. Categorical explanatory variables can only take values from a set of categories, such as gender and type of car driven.

1 Generalised linear models

Generalised linear models (GLMs) relate the response variable that we want to predict to the explanatory variables or factors (called predictors, covariates or independent variables) about which we have information.

In other words, a GLM has inputs (explanatory variables) and is used to predict an output (response variable).

To fully define a GLM, we need to specify the following three components.

1. A distribution for the response variable

For linear models the response variable had a normal distribution, $Y \sim N(\mu, \sigma^2)$. We now extend this to a general form of distributions known as the exponential family.

For example, we might choose a gamma distribution to model the sizes of motor insurance claims, a Poisson distribution to model the number of claims, or a binomial distribution to model the probability of contracting a certain disease.

2. A linear predictor

The linear predictor, $\eta$, is a function of the covariates. For the bivariate linear regression model this was $\beta_0 + \beta_1 x$. For the multivariate linear regression model this was $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$, which we then extended to functions of the explanatory variables.

$\eta$ is the Greek letter eta.

For example, if the response variable is weight, a linear predictor of $\beta_0 + \beta_1 x$ would be appropriate for a model where we thought the only covariate was height, $x$.

A linear predictor is linear in the parameters, eg $\beta_0$ and $\beta_1$. It does not have to be linear in the covariates. For example, $\eta = \beta_0 + \beta_1 x^2$ is also a linear predictor.

3. A link function

The link function connects the mean response to the linear predictor, $g(\mu) = \eta$, where $\mu = E(Y)$. For linear models the mean response was equal to the linear predictor, eg $E(Y) = \mu = \beta_0 + \beta_1 x$, so the link function is the identity function, $g(\mu) = \mu$.

The link function, as its name suggests, is the link between the linear predictor (input) and the mean of the distribution (output).

Remember that what we are trying to do in a GLM is find a relationship between the mean of the response variable and the covariates. By setting $g(\mu) = \eta$ and assuming that the link function is invertible, we can make the mean $\mu$ the subject of the formula:

$$\mu = g^{-1}(\eta)$$

The notation is not straightforward to get to grips with. An example may help.

Example

Suppose that we are trying to model the number of claims on car insurance policies. The response variable, $Y_i$, is the number of claims from Policy $i$. We decide that a Poisson distribution is appropriate:

$$Y_i \sim Poi(\mu_i)$$

Consider a model where we believe that the only covariate is the age, $x_i$, of the policyholder. The linear predictor is $\eta_i = \alpha + \beta x_i$.

A link function that is commonly used with the Poisson distribution (see page 27 of the Tables) is:

$$g(\mu) = \log \mu$$

We set this equal to the linear predictor.
So, for Policy $i$:

$$g(\mu_i) = \log \mu_i = \eta_i = \alpha + \beta x_i$$

Now we invert the formula so that $\mu_i$ is the subject of the formula:

$$\mu_i = \exp(\eta_i) = \exp(\alpha + \beta x_i)$$

We now have a relationship between the mean of the response variable and the covariate.

Components of a generalised linear model

The three components of a GLM are:

1. a distribution for the data (Poisson, exponential, gamma, normal or binomial)
2. a linear predictor (a function of the covariates that is linear in the parameters)
3. a link function (that links the mean of the response variable to the linear predictor).

In order to understand how these three components fit together, we give a couple of further examples below.

Example

Suppose that we are setting up a model to predict the pass rate for a particular student in a particular actuarial exam. We might expect there to be many factors that affect whether a student is likely to pass or not. We might decide to set up a three-factor model, so that the probability of passing is a function of:

- the number of assignments $N$ submitted by the student (a value from 0 to 4)
- the student's mark on the mock exam $S$ (on a scale from 0 to 100)
- whether the student had attended tutorials or not (Yes/No).

We might then decide to use the linear predictor:

$$\eta = \alpha_i + \beta_1 N + \beta_2 S$$

where $\alpha_i$ takes one value for those attending tutorials and a different value for those who do not.

We now need a link function. $\eta$ here will not necessarily take a value in the interval (0, 1). Depending on the values of $\alpha_i$, $\beta_1$ and $\beta_2$, $\eta$ might take any value. If we use the link function $g(\mu) = \log\left(\frac{\mu}{1-\mu}\right)$ and set this equal to the linear predictor $\eta$, we have $\log\left(\frac{\mu}{1-\mu}\right) = \eta$. We can invert this function to make $\mu$ the subject to give:

$$\mu = \frac{e^{\eta}}{1 + e^{\eta}} = \left(1 + e^{-\eta}\right)^{-1}$$

We now see that $\mu$ will lie in the range from zero to one, and so can be used as a pass rate.

We now use maximum likelihood estimation to estimate the four parameter values: $\alpha_Y$, $\alpha_N$ (the parameters corresponding to having attended tutorials and not having attended tutorials, respectively), $\beta_1$ (the parameter for the number of assignments) and $\beta_2$ (the parameter for the mock mark). To do this we need (ideally) the actual exam results of a large sample of students who fall into each of the categories.

Having done this for a set of data, we might come up with the following parameter values for the linear predictor:

$$\alpha_Y = -1.501 \qquad \alpha_N = -3.196 \qquad \beta_1 = 0.5459 \qquad \beta_2 = 0.0251$$

We can now use the linear predictor and link function to predict pass rates for groups of students with a particular characteristic. For example, for a student who attends tutorials, submits three assignments and scores 65% on the mock, we have:

$$\eta = -1.501 + 0.5459 \times 3 + 0.0251 \times 65 = 1.7682$$

We now use the inverse of the link function to calculate $\mu$:

$$\mu = \left(1 + e^{-1.7682}\right)^{-1} = 0.8542$$

So the model predicts an 85% probability of passing for a student in this situation.

So in this particular situation, the linear predictor is $\eta = \alpha_i + \beta_1 N + \beta_2 S$ and the link function is $g(\mu) = \log\left(\frac{\mu}{1-\mu}\right)$.

Question

Using the model outlined above, answer the following questions.

(i) Calculate the predicted pass probability for a student who attends tutorials, submits three assignments and scores 60% on the mock exam.

(ii) Calculate how much the probability would go up if the fourth assignment were submitted.

(iii) Calculate the highest pass probability for someone who does not attend tutorials.

(iv) Determine whether anyone gets a probability of 0 or 1 under this model. If not, calculate the minimum and maximum pass rates.

(v) State the underlying probability distribution.

Solution

(i) Using the values $N = 3$, $S = 60$ and $\alpha_Y = -1.501$, we get $\eta = 1.6427$, so that $\mu = 0.83790$. So the model predicts an 84% pass rate.

(ii) If the fourth assignment was submitted, we use $N = 4$ instead of $N = 3$ and get $\eta = 2.1886$, so that $\mu = 0.8992$. So the pass rate goes up by about 6%.

(iii) Using $\alpha_N = -3.196$, $N = 4$ and $S = 100$, we get $\eta = 1.4976$, so that $\mu = 0.8172$. So the highest possible pass rate for someone who does not attend tutorials is about 82%.

(iv) No. The minimum probability (for someone who does not attend tutorials or submit assignments and who scores zero on the mock) is obtained from a value of $\eta = -3.196$, which gives a pass probability of about 4%. The maximum probability of passing (for someone who goes to tutorials, submits all the assignments and scores 100% on the mock) comes from a value of $\eta = 3.1926$, which gives a pass rate of about 96%. So these are the maximum and minimum pass rates predicted by the model.

(v) In fact, what we are doing here is estimating a parameter of a binomial distribution. For any group of students with the same characteristics (ie all having the same values for all of the 3 factors), the number who pass may be well-modelled using a binomial distribution. The parameter $\mu$ of the binomial distribution $Z \sim Bin(n, \mu)$ that we are trying to find is the value of $\mu = \frac{e^{\eta}}{1 + e^{\eta}}$ that we found above. We are again using $\mu$ to denote a probability as well as the mean of the response variable $Y = Z/n$.

Example

A statistician is analysing data on truancy rates for different school pupils. She believes that the number of unexplained days off school in a year (ie those not due to sickness etc) for a particular pupil may have a Poisson distribution with parameter $\mu$. However, she believes that there are a number of factors that may affect $\mu$, for example: age of pupil, whether he/she lives within the catchment area, and sex.
She builds a generalised linear model based on these characteristics, using data from a large group of pupils. Her model will take the form:

$$\eta = \alpha_i + \beta_j + \gamma x$$

where $x$ = age, and $\alpha_i$ and $\beta_j$ are numerical parameters corresponding to the different characteristics for location and sex respectively.

She has collected the data shown in the table below. Each figure gives the average number of unexplained absences in a year for 16 different groups of pupils, all the pupils within each group having the same characteristics.

Average number of unexplained absences per pupil in a year

                                     Age last birthday
                                     8       10      12      14
Within catchment area     Male       1.8     2.0     6.3     14.1
                          Female     0.5     1.6     5.0     16.2
Outside catchment area    Male       2.1     7.5     25.5    72.0
                          Female     2.8     6.2     19.6    68.2

By carrying out a maximum likelihood estimation analysis, she calculates the values of the parameters that fit the model best. As a result she can find a value of $\eta$ for any particular pupil, which she can use to find the appropriate Poisson parameter $\mu$, using the link function. In this case she needs a function that converts a number $\eta$ that may take any value into a positive number (since the Poisson parameter must always be positive). So for example she could use the link function $g(\mu) = \log \mu$, so that when this is set equal to the linear predictor $\eta$ and inverted, $\mu = e^{\eta}$. This will give her a positive value for $\mu$, which she can use for her Poisson parameter.

So she might come up with the following values for the parameters:

$$\alpha_{WC} = -2.64 \qquad \alpha_{OC} = -1.14 \qquad \beta_M = -3.26 \qquad \beta_F = -3.54 \qquad \gamma = 0.64$$

where WC = Within catchment, OC = Outside catchment, M = Male, F = Female. She can now use the model to predict possible truancy rates for students with particular characteristics.

The link function $g(\mu) = \log \mu$ is called the canonical link function for the Poisson distribution.

Canonical means the accepted form of the function. It is a natural function to use, and will often give sensible results.
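As a rough illustration of how the log link turns a linear predictor into a Poisson mean, the sketch below evaluates $\mu = e^{\eta}$ using the parameter values quoted in the truancy example. The function and dictionary names are hypothetical, and the pupil chosen (a 10-year-old male living within the catchment area) is just an illustrative case.

```python
import math

# Parameter values quoted in the truancy example (the names are illustrative).
ALPHA = {"WC": -2.64, "OC": -1.14}   # location: within / outside catchment
BETA = {"M": -3.26, "F": -3.54}      # sex
GAMMA = 0.64                         # coefficient of age


def linear_predictor(location, sex, age):
    """eta = alpha_i + beta_j + gamma * age."""
    return ALPHA[location] + BETA[sex] + GAMMA * age


def poisson_mean(location, sex, age):
    """Invert the log link g(mu) = log(mu), giving mu = exp(eta)."""
    return math.exp(linear_predictor(location, sex, age))


eta = linear_predictor("WC", "M", 10)
mu = poisson_mean("WC", "M", 10)
print(f"eta = {eta:.2f}, mu = {mu:.3f}")
```

Because the exponential function is always positive, any value of the linear predictor gives a valid Poisson parameter.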
function to use, and will often In fact, it is not compulsory to usethe canonical link function and there maybe situations where a different link function is more appropriate. Each case mustbejudged onits merits. Question Determine the expected number of unexplained days absence for afemale pupilliving within the catchment area whois 12 years old. Solution For this combination offactors, wehave: ?aWF =+ +=12 ? Using the link function e1.5 == 1.5 given, we have: 4.48 Sothe expected number of days unexplained absencein this caseis about 4.5. We will consider each of the components function) in the next three sections. In practice, the distribution of a GLM (distribution, predictor, link ofthe data is usually specified atthe outset (often defined by the data), the linear predictor may be chosen according convenient, and then the best model structure is found predictors. linear to what is thought appropriate by looking at a range of linear Of course, these are not rules which must be adhered to: it or may bethat it is possible that more than one distribution could be appropriate, and these should be investigated before making a final decision. It could be unclear which link function should be used, and again arange offunctions The R code to fit a generalised object model, is: model <-glm(Y We will specify the inputs IFE: 2022 Examinations ~ ..., linear family can beinvestigated. model to a multivariate = ... (link for the blanks in the following = ... data frame and assign it to the )) three sections. The Actuarial Education Compan CS1-13: 2 Generalised linear models Page 11 Exponentialfamily Recallthat the distribution of the response variable, Y,in a GLMis a memberofthe exponential family. The exponential family is the set of distributions density function (PDF) Y(fy ;,?ff ) where f ()a, whose probability function, can be written in the following yb((?? )) cy( , =+ a()f exp ()b and )cy (, ? are specific f or probability form: ??- (1) )?? ?? functions. 
This formula is given on page 27 of the Tables. Note that $\varphi$ is another way of writing the Greek letter phi, usually written as $\phi$.

There are two parameters in the above PDF. $\theta$, which is called the natural parameter, is the one which is relevant to the model for relating the response ($Y$) to the covariates, and $\phi$ is known as the scale parameter or dispersion parameter.

When trying to show that a distribution is a member of the exponential family, it is important to remember that $\theta$ is a function of $\mu = E(Y)$ only. We shall see later in the chapter exactly how $\theta$ is used to relate the response to the covariates.

Where a distribution has two parameters, such as the $N(\mu, \sigma^2)$, one approach to determining the scale parameter $\phi$ is to take $\phi$ to be the other parameter in the distribution, ie the parameter other than the mean. For example, in the case of the normal distribution, we take $\phi = \sigma^2$. Where a distribution has one parameter, such as $Poi(\mu)$, we take $\phi = 1$.

Consider the following statement about a continuous PDF:

$$\int_y f(y)\, dy = 1 \qquad (2)$$

By substituting the expression from (1) and differentiating this with respect to $\theta$, it can be shown that the mean and variance of $Y$ are:

$$E[Y] = b'(\theta) \quad \text{and} \quad \text{var}(Y) = a(\phi)\, b''(\theta)$$

These formulae can also be found on page 27 of the Tables.

Question

Prove these two results for a member of the exponential family, using the results given above.

Solution

Mean

Differentiating both sides of equation (2) with respect to $\theta$ gives:

$$\int_y \frac{y - b'(\theta)}{a(\phi)}\, f(y)\, dy = 0 \qquad (3)$$

Simplifying:

$$\frac{1}{a(\phi)} \int_y y f(y)\, dy - \frac{b'(\theta)}{a(\phi)} \int_y f(y)\, dy = 0$$

Since $\int_y y f(y)\, dy = E(Y)$ and $\int_y f(y)\, dy = 1$, we have:

$$\frac{E(Y)}{a(\phi)} - \frac{b'(\theta)}{a(\phi)} = 0$$

Hence:

$$E(Y) - b'(\theta) = 0 \;\Rightarrow\; E(Y) = b'(\theta)$$

Variance

Using the product rule to differentiate equation (3) with respect to $\theta$ gives:

$$\int_y \left\{ -\frac{b''(\theta)}{a(\phi)}\, f(y) + \left[\frac{y - b'(\theta)}{a(\phi)}\right]^2 f(y) \right\} dy = 0$$

Splitting this into two separate integrals gives:

$$\frac{1}{[a(\phi)]^2} \int_y \left(y - b'(\theta)\right)^2 f(y)\, dy - \frac{b''(\theta)}{a(\phi)} \int_y f(y)\, dy = 0$$

Since $b'(\theta) = E(Y)$, then $\int_y \left(y - b'(\theta)\right)^2 f(y)\, dy = \text{var}(Y)$. Again $\int_y f(y)\, dy = 1$, so we have:

$$\frac{\text{var}(Y)}{[a(\phi)]^2} - \frac{b''(\theta)}{a(\phi)} = 0$$

Rearranging gives:

$$\text{var}(Y) = a(\phi)\, b''(\theta)$$

In general, note that the mean does not depend on $\phi$, so when predicting $Y$ it is $\theta$ which is of importance. Also, the variance of the data has two components: one which involves the scale parameter, and the other which determines the way the variance depends on the mean.

The variance of the normal distribution does not depend on the mean. That's because $\mu$ and $\sigma^2$ are independent.

For other distributions, however, the variance does depend on the mean. For example, the Poisson distribution has mean and variance both equal to the parameter $\mu$. So knowing the mean of a Poisson distribution tells us the variance as well.

To emphasise this dependence on the mean, the variance is often written as $\text{var}(Y) = a(\phi)\, V(\mu)$, where the variance function is defined as $V(\mu) = b''(\theta)$.

The variance function does not give the variance directly unless $a(\phi) = 1$.

2.1 Normal distribution

To motivate these definitions and the subsequent developments, we consider first the normal distribution.

For members of an exponential family, we want to be able to find formulae for the mean and variance of the distribution from the general parameters. First we will rewrite the normal distribution in the form of equation (1) and then consider other distributions as exponential families.

Note that we use $f$, in a slight abuse of notation, for both continuous and discrete distributions.

We have seen this style of notation before in this subject. The alternative notation is to use $p(x)$ for a probability function and $f(x)$ for a density function. Provided that the method is clear, either notation is acceptable.

$$f_Y(y; \theta, \phi) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(y - \mu)^2}{2\sigma^2}\right\} = \exp\left[\frac{y\mu - \mu^2/2}{\sigma^2} - \frac{1}{2}\left(\frac{y^2}{\sigma^2} + \log 2\pi\sigma^2\right)\right]$$

This is in the form of (1), with:

$$\theta = \mu \qquad \phi = \sigma^2 \qquad a(\phi) = \phi \qquad b(\theta) = \frac{\theta^2}{2} \qquad c(y, \phi) = -\frac{1}{2}\left(\frac{y^2}{\phi} + \log 2\pi\phi\right)$$

Thus, the natural parameter for the normal distribution is $\mu$ and the scale parameter is $\sigma^2$.

Alternatively, we could have said $\phi = \sigma$ and $a(\phi) = \phi^2$. There is no unique parameterisation.

Using the formulae above, the mean is $E(Y) = b'(\theta) = \theta = \mu$ and the variance is $\text{var}(Y) = a(\phi)\, b''(\theta) = \phi = \sigma^2$.

So these do give us the results that we expect for the normal distribution.

Question

Show that if we reparameterise the normal distribution using $\theta = 2\mu$, we still get the same results for the mean and variance of the distribution.

Solution

If we put $\theta = 2\mu$, we get the following expressions for the various functions:

$$a(\phi) = 2\phi \qquad b(\theta) = \frac{\theta^2}{4} \qquad c(y, \phi) = -\frac{1}{2}\left(\frac{y^2}{\phi} + \log 2\pi\phi\right)$$

Using the formulae for the mean and variance, as before:

$$E(Y) = b'(\theta) = \frac{2\theta}{4} = \frac{\theta}{2} = \mu \quad \text{and} \quad \text{var}(Y) = a(\phi)\, b''(\theta) = 2\phi \times \frac{1}{2} = \phi = \sigma^2$$

So the mean and variance are $\mu$ and $\sigma^2$, as before.

As mentioned above, for the normal distribution, the variance of $Y$ does not depend on the mean (the variance function is $V(\mu) = b''(\theta) = 1$), whereas for other distributions the variance does depend on the mean.

In R, to use a normal distribution in the glm command, we set family=gaussian (or omit this option as this distribution is the default).

2.2 Poisson distribution

For the Poisson distribution:

$$f_Y(y; \theta, \phi) = \frac{\mu^y e^{-\mu}}{y!} = \exp\left[y \log \mu - \mu - \log y!\right]$$

which is in the form of (1), with:

$$\theta = \log \mu \qquad \phi = 1, \text{ so that } a(\phi) = 1 \qquad b(\theta) = e^{\theta} \qquad c(y, \phi) = -\log y!$$

Thus, the natural parameter for the Poisson distribution is $\log \mu$, the mean is $E(Y) = b'(\theta) = e^{\theta} = \mu$ and the variance function is $V(\mu) = b''(\theta) = e^{\theta} = \mu$. The variance function tells us that the variance is proportional to the mean. We can see that the variance is actually equal to the mean since $a(\phi) = 1$.

Question

Comment on whether or not we can re-parameterise the Poisson distribution using $\phi = 2$, say.

Solution

Yes. Just as before with the normal distribution, there is more than one way to set up the parameters. However the natural approach is to use $\phi = 1$ rather than $\phi = 2$, and this is the most sensible approach to use in the exam.

In R, to use a Poisson distribution in the glm command, we set family=poisson.

2.3 Binomial distribution

This is slightly more awkward to deal with, since we have to first divide the binomial random variable by $n$. Thus, suppose $Z \sim Bin(n, \mu)$. Let $Y = Z/n$, so that $Z = nY$. The distribution of $Z$ is:

$$f_Z(z; \theta, \phi) = \binom{n}{z} \mu^z (1 - \mu)^{n - z}$$

and by substituting for $z$, the distribution of $Y$ is:

$$f_Y(y; \theta, \phi) = \binom{n}{ny} \mu^{ny} (1 - \mu)^{n - ny}$$

$$= \exp\left[n\left(y \log \mu + (1 - y)\log(1 - \mu)\right) + \log \binom{n}{ny}\right]$$

$$= \exp\left[n\left(y \log\left(\frac{\mu}{1 - \mu}\right) + \log(1 - \mu)\right) + \log \binom{n}{ny}\right]$$

which is in the form of (1), with:

$$\theta = \log\left(\frac{\mu}{1 - \mu}\right) \quad \left(\text{note that the inverse of this is } \mu = \frac{e^{\theta}}{1 + e^{\theta}}\right)$$

$$\phi = n \qquad a(\phi) = \frac{1}{\phi} \qquad b(\theta) = \log\left(1 + e^{\theta}\right) \qquad c(y, \phi) = \log \binom{n}{ny}$$

The reason for all this is that $\theta$ is a function of $\mu$, the distribution mean, only. However, the binomial distribution as we typically quote it, $Bin(n, p)$, does not have $\mu$ as one of its parameters. So we start by considering $Bin(n, \mu)$, which does have $\mu$ as a parameter, but has mean $n\mu$. We then divide this by $n$ to get a distribution with $\mu$ in its probability function and which also has mean $\mu$.

Here $\phi = n$, the other parameter in the distribution (ie the parameter other than the mean).

Question

Verify that the formulae given in the Core Reading are correct.

Solution

If $\theta = \log\left(\frac{\mu}{1 - \mu}\right)$, then to get $n$ in the denominator we need $a(\phi) = 1/\phi$ with $\phi = n$. Similarly, $b(\theta)$ must be given by:

$$-b(\theta) = \log(1 - \mu) = \log\left(1 - \frac{e^{\theta}}{1 + e^{\theta}}\right) = \log\left(\frac{1}{1 + e^{\theta}}\right) = -\log\left(1 + e^{\theta}\right)$$

So $b(\theta) = \log\left(1 + e^{\theta}\right)$ as required, and $c(y, \phi) = \log \binom{n}{ny}$.

Thus, the natural parameter for the binomial distribution is $\log\left(\frac{\mu}{1 - \mu}\right)$, the mean is:

$$E[Y] = b'(\theta) = \frac{e^{\theta}}{1 + e^{\theta}} = \mu$$

and the variance function is:

$$V(\mu) = b''(\theta) = \frac{e^{\theta}}{\left(1 + e^{\theta}\right)^2} = \mu(1 - \mu)$$

We can get the second derivative of $b(\theta)$ most easily by writing $b'(\theta) = 1 - \left(1 + e^{\theta}\right)^{-1}$.

Question

Comment on whether these are the results we would expect.

Solution

Yes. Since $Z$ is binomial with mean $n\mu$ and variance $n\mu(1 - \mu)$, and $Z = nY$, we should have:

$$E(Y) = \frac{1}{n} E(Z) = \frac{1}{n} \times n\mu = \mu$$

and:

$$\text{var}(Y) = \frac{1}{n^2} \text{var}(Z) = \frac{1}{n^2} \times n\mu(1 - \mu) = \frac{\mu(1 - \mu)}{n} = a(\phi)\, V(\mu)$$

These agree with the results that we actually got.

In R, to use a binomial distribution in the glm command, we set family=binomial.

2.4 Gamma distribution

The best way to consider the Gamma distribution is to change the parameters from $\alpha$ and $\lambda$ to $\alpha$ and $\mu = \frac{\alpha}{\lambda}$, ie $\lambda = \frac{\alpha}{\mu}$.

Recall that $\theta$ must always be expressed as a function of $\mu$, so the best way to start is to ensure that $\mu$ appears in the PDF formula. We can do this by replacing the $\lambda$:

$$f_Y(y; \theta, \phi) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\, y^{\alpha - 1} e^{-\lambda y} = \frac{\alpha^{\alpha}}{\mu^{\alpha}\, \Gamma(\alpha)}\, y^{\alpha - 1} e^{-\alpha y / \mu}$$

$$= \exp\left[\left(-\frac{y}{\mu} - \log \mu\right)\alpha + (\alpha - 1)\log y + \alpha \log \alpha - \log \Gamma(\alpha)\right]$$

which is in the form of (1), with:

$$\theta = -\frac{1}{\mu} \qquad \phi = \alpha \qquad a(\phi) = \frac{1}{\phi} \qquad b(\theta) = -\log(-\theta) \qquad c(y, \phi) = (\phi - 1)\log y + \phi \log \phi - \log \Gamma(\phi)$$

Since $\theta$ is negative, $\log(-\theta)$ is well-defined.

Thus, the natural parameter for the gamma distribution is $\frac{1}{\mu}$, ignoring the minus sign. The mean is $E[Y] = b'(\theta) = -\frac{1}{\theta} = \mu$. The variance function is $V(\mu) = b''(\theta) = \frac{1}{\theta^2} = \mu^2$ and so the variance is $\frac{\mu^2}{\alpha}$.

Here $\phi = \alpha$, the other parameter in the distribution.

In R, to use a gamma distribution in the glm command, we set family=Gamma.

Question

Show that the exponential distribution can be written in the form of a member of the exponential family.
Solution

We can write the PDF of an exponential distribution as:

$$f(y) = \lambda e^{-\lambda y} = \exp\left(-\lambda y + \log \lambda\right)$$

Since $E(Y) = \mu = \frac{1}{\lambda}$, this is in the appropriate form with:

$$\theta = -\frac{1}{\mu} = -\lambda \qquad b(\theta) = -\log(-\theta) = -\log \lambda \qquad \phi = 1, \text{ so that } a(\phi) = 1 \qquad c(y, \phi) = 0$$

The $Exp(\lambda)$ distribution is equivalent to a $Gamma(1, \lambda)$ distribution, so the results are consistent with those for the gamma distribution.

2.5 Lognormal distribution

Finally, the lognormal distribution is often used, for example in general insurance to model the distribution of claim sizes. This can be incorporated in the framework of GLMs since if $Y \sim$ lognormal, $\log Y \sim$ normal. Thus, if the lognormal distribution is to be used, the data should first be logged and then the normal modelling distribution can be applied.

In R we could either set another variable, say Z, equal to log(Y) and then model:

    glm(Z ~ ..., family = gaussian(link = ...))

or we use:

    glm(log(Y) ~ ..., family = gaussian(link = ...))

Syllabus objective 4.2.1 requires students to show that the binomial, Poisson, exponential, gamma and normal distributions are members of the exponential family.

Having the distribution of the response variable belonging to the exponential family ensures the calculations are easier when estimating the parameters using maximum likelihood. It also ensures that the model possesses good statistical properties.

3 Linear predictor

The second component of a GLM is the linear predictor, $\eta$, which is a function of the covariates, ie the input variables to the model.

The covariates (also known as explanatory, predictor or independent variables) enter the model through the linear predictor. This is also where the parameters occur which have to be estimated. The requirement is that it is linear in the parameters that we are estimating.
There are two kinds of covariates used in GLMs: variables and factors.

3.1 Variables

In general, variables are covariates where the actual value of a variable enters the linear predictor. The age of the policyholder is an actuarial example of a variable. So far in our linear models we have only met continuous variables.

A variable is a type of covariate whose real numerical value enters the linear predictor directly, such as age ($x$). Other examples of variables in a car insurance context are annual mileage and number of years for which a driving licence has been held.

The bivariate linear model had a single continuous explanatory main effect variable $x$ with a linear predictor of $\beta_0 + \beta_1 x$. To fit this model it is necessary to estimate the parameters, $\beta_0$ and $\beta_1$, and so the actual value of $x$ matters. For the multivariate linear regression model with $k$ continuous main effect variables this was $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$. Again the values of $x_1, x_2, \dots, x_k$ matter.

We use the same R formulae in the glm function as we did in the lm function:

    glm(Y ~ X, ...)
    glm(Y ~ X1+X2+...+Xk, ...)

As in the previous unit we can extend our models to include polynomials, functions of the variable, and linear predictors including more than one variable.

Recall that the linear predictor is linear in the parameters (eg $\beta_0$ and $\beta_1$) and not necessarily linear in the covariates (eg $\eta = \beta_0 + \beta_1 x^2$ is also a linear predictor).

Some examples, where age ($x_1$) and duration ($x_2$) are treated as variables, are shown in the table below together with the formula used in the glm function in R.

Model              Linear predictor                          R formula
1 (null model)     $\beta_0$                                 Y ~ 1
age                $\beta_0 + \beta_1 x_1$                   Y ~ X1
age^2              $\beta_0 + \beta_1 x_1^2$                 Y ~ I(X1^2)
age + age^2        $\beta_0 + \beta_1 x_1 + \beta_2 x_1^2$   Y ~ X1 + I(X1^2)
age + duration     $\beta_0 + \beta_1 x_1 + \beta_2 x_2$     Y ~ X1 + X2
log(age)           $\beta_0 + \beta_1 \log x_1$              Y ~ log(X1)

The null model has no covariates and so there is just the intercept parameter. This is estimated as the sample mean of the response values.
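The linear predictors for models of this kind can be sketched as rows of a design matrix, one column per parameter. The code below is only an illustration: the dictionary, the function name and the parameter values are hypothetical, and it evaluates $\eta$ for given parameters rather than fitting anything.

```python
import math

# One design-row builder per model (x1 = age, x2 = duration).
# The leading 1.0 in each row is the column for the intercept beta_0.
DESIGN_ROWS = {
    "1":              lambda x1, x2: [1.0],
    "age":            lambda x1, x2: [1.0, x1],
    "age^2":          lambda x1, x2: [1.0, x1 ** 2],
    "age + age^2":    lambda x1, x2: [1.0, x1, x1 ** 2],
    "age + duration": lambda x1, x2: [1.0, x1, x2],
    "log(age)":       lambda x1, x2: [1.0, math.log(x1)],
}


def linear_predictor(model, betas, x1, x2):
    """Return eta as the sum of parameter * column over the design row."""
    row = DESIGN_ROWS[model](x1, x2)
    if len(row) != len(betas):
        raise ValueError("one parameter is needed per column of the design row")
    return sum(b * v for b, v in zip(betas, row))


# Hypothetical parameter values, purely for illustration.
eta_example = linear_predictor("age + duration", [15.0, 1.0, 2.0], x1=30, x2=5)
print(eta_example)
```

However the covariates are transformed, each column is multiplied by a single parameter and the results are added, which is what "linear in the parameters" means.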
It's fairly easy to see that we start with an intercept parameter and then add a new term with a slope parameter multiplied by the covariate. However, there is actually a little more happening before we get to this simplified linear predictor.

Suppose the linear predictor for age only is $\alpha_1 + \beta_1 x_1$ and the linear predictor for duration only is $\alpha_2 + \beta_2 x_2$. We could then obtain a linear predictor for both of these covariates by summing their individual linear predictors:

$$(\alpha_1 + \beta_1 x_1) + (\alpha_2 + \beta_2 x_2) = (\alpha_1 + \alpha_2) + \beta_1 x_1 + \beta_2 x_2$$

The final simplified version given in the table above, $\beta_0 + \beta_1 x_1 + \beta_2 x_2$, combines the two constants together, ie $\beta_0 = \alpha_1 + \alpha_2$. This simplified formula gives the same final values as the uncombined formula and is more efficient as it requires us to estimate three rather than four parameters.

However, it is actually impossible to estimate $\alpha_1$ and $\alpha_2$ individually from any given data and hence we have to combine them in the linear predictor to overcome this issue. We demonstrate this in the following question, where we give four values of the linear predictor which should be sufficient to estimate four parameters.

Question

The table below shows the value of the linear predictor $\eta = (\alpha_1 + \alpha_2) + \beta_1 x_1 + \beta_2 x_2$ for different values of age ($x_1$) and duration ($x_2$).

Linear predictor, $\eta$    Age ($x_1$)    Duration ($x_2$)
35                          20             0
37                          20             1
45                          30             0
55                          30             5

Show that it is impossible to individually estimate all the parameters in the linear predictor.

Solution

Substituting the given values into the formula for the linear predictor gives the following four equations:

$$35 = (\alpha_1 + \alpha_2) + 20\beta_1 \qquad (1)$$

$$37 = (\alpha_1 + \alpha_2) + 20\beta_1 + \beta_2 \qquad (2)$$

$$45 = (\alpha_1 + \alpha_2) + 30\beta_1 \qquad (3)$$

$$55 = (\alpha_1 + \alpha_2) + 30\beta_1 + 5\beta_2 \qquad (4)$$

Subtracting equations (1) and (2) gives $\beta_2 = 2$. Subtracting equations (1) and (3) gives $10\beta_1 = 10$ and hence $\beta_1 = 1$.

However, substituting $\beta_1 = 1$ and $\beta_2 = 2$ into all four equations gives $(\alpha_1 + \alpha_2) = 15$. Hence we can only estimate the value of their total, $\beta_0 = \alpha_1 + \alpha_2 = 15$, and are unable to give individual estimates for $\alpha_1$ and $\alpha_2$.

Incidentally, the actual values used in the question above were $\alpha_1 = 5.5$ and $\alpha_2 = 9.5$. It is easy to see that the simplification above gives exactly the same answers as the original values.

Other models can also be fitted, including, for example, a model for age with no intercept term. We omit the intercept in R by adding a -1 (ie negative one) to the formula.

It's unusual to have models with no intercept term as these would give a value of zero when a covariate is zero.

3.2 Interaction between variables

In addition to considering variables as the main effect, we can include interactions between variables like we did in Chapter 12, Section 4.7 (ie where the effect on the response variable of one predictor variable depends on the value of another predictor variable).

So far, each covariate has been incorporated into the linear predictor through an additive term $\beta_i x_i$. Such a term is called the main effect for that covariate. For a main effect, the covariate increases the linear predictor by $\beta_i$ for each unit increase in $x_i$, independently of all the other covariates.

When there is interaction between two variables, say $x_i$ and $x_j$, they are not independent. So the effect of one covariate, $x_i$, on the linear predictor depends on the value of the other covariate, $x_j$. We model this using an additive term of the form $\beta_{ij} x_i x_j$ in the linear predictor.

Recall that when an interaction term is used in a model, both main effects must also be included.

Otherwise we are saying that the variables don't contribute anything independently. This implies that they are perfectly correlated and hence one of them is unnecessary.
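A small sketch of what an interaction term does, using hypothetical coefficients: with a product term in the linear predictor, the change in $\eta$ from a unit increase in $x_1$ is no longer a constant but depends on $x_2$. The function names and default values below are assumptions made purely for illustration.

```python
def eta(x1, x2, b0=0.5, b1=0.25, b2=0.125, b3=0.0625):
    """Two main effects plus an interaction: b0 + b1*x1 + b2*x2 + b3*x1*x2."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def effect_of_unit_increase_in_x1(x1, x2):
    """Change in eta when x1 increases by one unit, holding x2 fixed."""
    return eta(x1 + 1, x2) - eta(x1, x2)


# With x2 = 0 only the main effect b1 contributes.
print(effect_of_unit_increase_in_x1(10, 0))
# With x2 = 4 the interaction adds b3 * 4 on top of b1.
print(effect_of_unit_increase_in_x1(10, 4))
```

Setting `b3=0` in this sketch recovers the main-effects-only case, where the effect of $x_1$ is the same at every value of $x_2$.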
Model                              Linear predictor                                         R formula
age + duration + age.duration      $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$  Y ~ X1+X2+X1:X2
age * duration                     $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$  Y ~ X1*X2

The two models in the table above are equivalent, and have been shown separately to illustrate the use of the dot and star model notation in R.

An interaction term is denoted using dot notation. In the example above, age.duration denotes the interaction between age and duration (although in R a colon is used to prevent confusion with a decimal point).

The star notation is used to denote the main effects and the interaction term. In the example above, age*duration = age + duration + age.duration.

3.3 Factors and interaction between factors

The other main type of covariate is a factor, which takes a categorical value. For example, the sex of the policyholder is either male or female, which constitutes a factor with two categories (or levels).

Other examples of factors in a car insurance context are postcode and car type.

This type of covariate can be parameterised so that the linear predictor has a term $\alpha_1$ for a male, and a term $\alpha_2$ for a female (ie $\alpha_i$ where $i = 1$ for a male and $i = 2$ for a female). In general, there is a parameter for each level that the factor may take.

Factors are typically non-numerical (eg sex). Even for those that are numerical (eg vehicle rating group), it doesn't make any sense to include their value in the linear predictor. Instead we assign parameter values for each possible category the factor can take.

In the following table sex and vehicle rating group (vrg) are factors.

If there is more than one factor in the model, then the inclusion of an interaction term implies that the effect of each factor depends on the level of the other factor.
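To illustrate that last point, the sketch below stores one parameter per factor level and compares the female-minus-male difference in the linear predictor with and without an interaction term. All parameter values, and the names `ALPHA`, `BETA` and `GAMMA`, are hypothetical and are not taken from the text.

```python
# Hypothetical parameter values, one per factor level.
ALPHA = {"male": 0.0, "female": 0.125}     # sex
BETA = {1: 0.5, 2: 0.25, 3: 0.75}          # vehicle rating group
GAMMA = {("female", 3): 0.0625}            # interaction; unlisted pairs count as 0


def eta(sex, vrg, interaction=False):
    """Linear predictor: a sum of level parameters, plus an optional interaction."""
    value = ALPHA[sex] + BETA[vrg]
    if interaction:
        value += GAMMA.get((sex, vrg), 0.0)
    return value


def sex_effect(vrg, interaction):
    """Difference in eta between female and male within one rating group."""
    return eta("female", vrg, interaction) - eta("male", vrg, interaction)


print([sex_effect(g, interaction=False) for g in (1, 2, 3)])
print([sex_effect(g, interaction=True) for g in (1, 2, 3)])
```

Without the interaction the sex effect is the same in every rating group; with it, the effect differs in the group that has a non-zero interaction parameter.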
Model                                                   Linear predictor                         R formula
sex                                                     $\alpha_i$                               Y ~ sex
vehicle rating group                                    $\beta_j$                                Y ~ vrg
sex + vehicle rating group                              $\alpha_i + \beta_j$                     Y ~ sex + vrg
sex + vehicle rating group + sex.vehicle rating group   $\alpha_i + \beta_j + \gamma_{ij}$       Y ~ sex+vrg+sex:vrg
sex*vehicle rating group                                $\alpha_i + \beta_j + \gamma_{ij}$       Y ~ sex * vrg

Again, the last two models are identical.

As mentioned above, sex is a factor with a parameter assigned to each of its two categories ($\alpha_i$ where $i = 1$ for a male and $i = 2$ for a female). Similarly, vehicle rating group is a factor with a parameter assigned to each of its categories. For example, if there were three categories (Group 1, Group 2 and Group 3), then we would have $\beta_j$, $j = 1, 2, 3$.

We use a different subscript from the one used for sex since the main effects give the change to the linear predictor independently of all the other covariates. If we used $i$ as the subscript for both, it would mean that males were always in Group 1 and females were always in Group 2.

Again we can construct linear predictors that involve more than one covariate by summing the linear predictors for each individual covariate. Hence we would use $\alpha_i + \beta_j$ for sex + vrg.

However, it is again impossible to estimate $\alpha_1$, $\alpha_2$, $\beta_1$, $\beta_2$ and $\beta_3$ individually from any given data set, and so we have to combine constants together to overcome this issue. This means that one of those constants effectively becomes zero. This is called the base assumption in the model.

We demonstrate this in the following question, where we give five values of the linear predictor which should be sufficient to estimate five parameters.

Question

The table below shows the value of the linear predictor $\eta = \alpha_i + \beta_j$ for different values of $i$ and $j$:

Linear predictor, $\eta$   Sex, $i$   Group, $j$
0.5                        Male       1
0.48                       Male       2
0.41                       Male       3
0.6                        Female     1
0.58                       Female     2

Show that it is impossible to individually estimate all the parameters in the linear predictor.
Solution

Substituting the given values into the formula for the linear predictor gives the following five equations:

$0.5 = \alpha_1 + \beta_1$   (1)
$0.48 = \alpha_1 + \beta_2$   (2)
$0.41 = \alpha_1 + \beta_3$   (3)
$0.6 = \alpha_2 + \beta_1$   (4)
$0.58 = \alpha_2 + \beta_2$   (5)

Subtracting (1) and (4) (or subtracting (2) and (5)) gives $\alpha_2 = \alpha_1 + 0.1$.

Subtracting (1) and (2) (or subtracting (4) and (5)) gives $\beta_2 = \beta_1 - 0.02$.

Subtracting (1) and (3) gives $\beta_3 = \beta_1 - 0.09$.

We can try other combinations but none of them will be able to give us estimates of all five parameters.

If, however, we set one constant to zero, say $\alpha_1 = 0$ (ie our base assumption is that the policyholder is male) and the other parameters are calculated relative to this level, we obtain:

$$\alpha_i = \begin{cases} 0 & i = 1 \text{ (male)} \\ 0.1 & i = 2 \text{ (female)} \end{cases} \quad \text{and} \quad \beta_j = \begin{cases} 0.5 & j = 1 \\ 0.48 & j = 2 \\ 0.41 & j = 3 \end{cases}$$

The actual values used in creating the above question were:

$$\alpha_i = \begin{cases} 0.45 & i = 1 \text{ (male)} \\ 0.55 & i = 2 \text{ (female)} \end{cases} \quad \text{and} \quad \beta_j = \begin{cases} 0.05 & j = 1 \\ 0.03 & j = 2 \\ -0.04 & j = 3 \end{cases}$$

It is worth spending a moment checking that every combination of sex and vehicle rating group gives exactly the same answer as the estimated values.

Finally, we consider an interaction between two factors:

sex*vehicle rating group = sex + vehicle rating group + sex.vehicle rating group

We start by summing the linear predictors for each of the three terms: sex, vehicle rating group and sex.vehicle rating group separately.

We already know that the linear predictor for sex alone is $\eta = \alpha_i$, $i = 1, 2$. Similarly, the linear predictor for vehicle group alone is $\eta = \beta_j$, $j = 1, 2, 3$. We also need a linear predictor for the interaction effect $\eta = \alpha_i.\beta_j$, $i = 1, 2$ and $j = 1, 2, 3$. The dot notation here does not mean multiply. We have written it in this format for now to indicate an interaction.

$$\eta = \alpha_i + \beta_j + \alpha_i.\beta_j$$

An alternative (and more commonly used) notation for the interaction term that depends on both $i$ and $j$ is $\gamma_{ij}$, so that:

$$\eta = \alpha_i + \beta_j + \gamma_{ij}$$

Interaction between sex and vehicle group indicates that the difference in risk levels for male and female drivers varies for different vehicle groups. For example, if the response variable is the number of claims on a car insurance policy, the effect of being male might depend on whether the car being driven is a Porsche (where the driver might be tempted to show off) or a Mini (where the driver might drive more carefully).

However, it is again impossible to estimate all the parameters ($\alpha_1$, $\alpha_2$, $\beta_1$, $\beta_2$, $\beta_3$, $\gamma_{11}$, $\gamma_{12}$, $\gamma_{13}$, $\gamma_{21}$, $\gamma_{22}$ and $\gamma_{23}$) individually from any given data set.

We have to combine constants together to overcome this issue. This time it is harder as there are only six combinations that we can observe:

          Group 1                              Group 2                              Group 3
male      $\alpha_1 + \beta_1 + \gamma_{11}$   $\alpha_1 + \beta_2 + \gamma_{12}$   $\alpha_1 + \beta_3 + \gamma_{13}$
female    $\alpha_2 + \beta_1 + \gamma_{21}$   $\alpha_2 + \beta_2 + \gamma_{22}$   $\alpha_2 + \beta_3 + \gamma_{23}$

There are 11 parameters to estimate so we will have to set five of the parameters equal to zero to be able to solve the relevant equations.

For example, we might get the following:

$$\alpha_i = \begin{cases} 0 & i = 1 \text{ (male)} \\ 0.66 & i = 2 \text{ (female)} \end{cases} \qquad \beta_j = \begin{cases} 0 & j = 1 \\ 0.5 & j = 2 \\ 0.4 & j = 3 \end{cases}$$

$\gamma_{ij}$    $j = 1$   $j = 2$   $j = 3$
$i = 1$          0.55      0         0
$i = 2$          0         -0.55     -0.58

when the true values might be:

$$\alpha_i = \begin{cases} 0.45 & i = 1 \text{ (male)} \\ 0.55 & i = 2 \text{ (female)} \end{cases} \qquad \beta_j = \begin{cases} 0.05 & j = 1 \\ 0.03 & j = 2 \\ -0.04 & j = 3 \end{cases}$$

$\gamma_{ij}$    $j = 1$   $j = 2$   $j = 3$
$i = 1$          0.05      0.02      -0.01
$i = 2$          0.06      0.03      -0.03

Again, it is worth checking that every combination of sex and vehicle rating group gives exactly the same answer as the estimated values.

An alternative linear predictor that gives identical results to $\eta = \alpha_i + \beta_j + \gamma_{ij}$ is $\eta = \delta_{ij}$, where:

$\delta_{ij}$    $j = 1$   $j = 2$   $j = 3$
$i = 1$          0.55      0.5       0.4
$i = 2$          0.66      0.61      0.48

This gives the six possible combinations directly. Again, do spend a short while checking that every combination of sex and vehicle rating group gives exactly the same answer as before.
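The checks suggested above can be sketched in R. The parameter values below are the ones read from the worked example (true values, the version with five parameters set to zero, and the $\delta_{ij}$ table); the helper name linear_predictor is hypothetical, introduced only for this illustration.

```r
# "True" parameter values from the example
alpha_true <- c(male = 0.45, female = 0.55)
beta_true  <- c(0.05, 0.03, -0.04)
gamma_true <- rbind(male   = c(0.05, 0.02, -0.01),
                    female = c(0.06, 0.03, -0.03))

# Estimated values with five of the eleven parameters set to zero
alpha_est <- c(male = 0, female = 0.66)
beta_est  <- c(0, 0.5, 0.4)
gamma_est <- rbind(male   = c(0.55, 0, 0),
                   female = c(0, -0.55, -0.58))

# eta[i, j] = alpha[i] + beta[j] + gamma[i, j]  (hypothetical helper)
linear_predictor <- function(alpha, beta, gamma) {
  outer(alpha, beta, "+") + gamma
}

eta_true <- linear_predictor(alpha_true, beta_true, gamma_true)
eta_est  <- linear_predictor(alpha_est, beta_est, gamma_est)

# Single-table parameterisation delta[i, j]
delta <- rbind(male   = c(0.55, 0.50, 0.40),
               female = c(0.66, 0.61, 0.48))

print(all.equal(eta_true, delta, check.attributes = FALSE))
print(all.equal(eta_est,  delta, check.attributes = FALSE))
```

All three parameterisations describe the same six values of the linear predictor, which is why the data cannot distinguish between them.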
3.4 Predictors with variables and factors and interaction

Finally, we'll look at models that contain both variables and factors.

For example, a model that includes an age effect and an effect for the sex of the policyholder could have a linear predictor:

$$\alpha_i + \beta x$$

As above, we can construct linear predictors that involve more than one covariate by summing the linear predictors for each individual covariate.

Here, the linear predictor for age is $\eta = \beta_0 + \beta_1 x$ and the linear predictor for sex is $\eta = \alpha_i$, $i = 1, 2$. Summing these gives:

$$\eta = \beta_0 + \beta_1 x + \alpha_i = (\beta_0 + \alpha_i) + \beta_1 x$$

Again it is impossible to estimate the parameters $\beta_0$ and $\alpha_i$ individually, so we have to combine them together:

$$\eta = \alpha'_i + \beta_1 x$$

We have added a dash to indicate that the values here are not the same as in the original $\alpha_i$. The Core Reading skips straight to the simplified result, $\alpha_i + \beta x$.

For example, suppose we have the following values of the linear predictor $\eta = (\beta_0 + \alpha_i) + \beta_1 x$ for different values of age ($x$) and different genders.

Linear predictor, $\eta$   Age ($x$)   Sex
1.45                       20          Male
1.95                       30          Male
1.55                       20          Female
2.05                       30          Female

Since there are four unknown parameters to estimate, four data points should be sufficient.

Substituting the given values into the formula for the linear predictor gives the following four equations:

$1.45 = (\beta_0 + \alpha_1) + 20\beta_1$   (1)
$1.95 = (\beta_0 + \alpha_1) + 30\beta_1$   (2)
$1.55 = (\beta_0 + \alpha_2) + 20\beta_1$   (3)
$2.05 = (\beta_0 + \alpha_2) + 30\beta_1$   (4)

Subtracting equations (1) and (2) (or subtracting equations (3) and (4)) gives $10\beta_1 = 0.5$, and hence $\beta_1 = 0.05$.

Subtracting equations (1) and (3) (or subtracting equations (2) and (4)) gives $\alpha_2 - \alpha_1 = 0.1$, and hence $\alpha_2 = \alpha_1 + 0.1$.

Substituting $\beta_1 = 0.05$ and $\alpha_2 = \alpha_1 + 0.1$ into both the other equations gives $(\beta_0 + \alpha_1) = 0.45$ and $(\beta_0 + \alpha_2) = 0.55$, so we are unable to estimate these parameters separately. We absorb the $\beta_0$ into the sex parameters to resolve this issue.
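One way to see why the four parameters cannot be separated is to look at the rank of the design matrix. The following is a minimal sketch using the four data points above; the matrix and column names (X_full, X_red and so on) are hypothetical labels chosen for this illustration.

```r
age <- c(20, 30, 20, 30)
sex <- c("Male", "Male", "Female", "Female")
eta <- c(1.45, 1.95, 1.55, 2.05)

# Over-parameterised design: beta0, alpha1, alpha2, beta1
X_full <- cbind(beta0  = 1,
                alpha1 = as.numeric(sex == "Male"),
                alpha2 = as.numeric(sex == "Female"),
                beta1  = age)

# The beta0 column equals alpha1 + alpha2, so the rank is below 4
rank_full <- qr(X_full)$rank
print(rank_full)

# Absorb beta0 into the sex parameters: alpha'_1, alpha'_2, beta1
X_red <- X_full[, c("alpha1", "alpha2", "beta1")]
sol   <- qr.solve(X_red, eta)   # least-squares solution (exact for these data)
print(sol)
```

With the intercept absorbed, the reduced design has full column rank and the three remaining parameters are determined uniquely.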
Notice that the estimated parameter $\beta_0$ is redundant and has not been included (it could not be estimated separately from $\alpha_1$ and $\alpha_2$). This gives:

$$\beta_1 = 0.05 \quad \text{and} \quad \alpha'_i = \begin{cases} 0.45 & \text{if } i = 1 \text{ (male)} \\ 0.55 & \text{if } i = 2 \text{ (female)} \end{cases}$$

The actual values used to construct the question above were:

$$\beta_0 = 0.5, \quad \beta_1 = 0.05 \quad \text{and} \quad \alpha_i = \begin{cases} -0.05 & i = 1 \text{ (male)} \\ 0.05 & i = 2 \text{ (female)} \end{cases}$$

Again, it's worth spending a short while checking that every combination of age and sex gives exactly the same answer as the estimated values.

Notice also that the effect of the age of the policyholder is the same whether the policyholder is male or female. In other words, age and sex are independent covariates. There is no interaction between them.

In this case if we were to draw a graph of the linear predictor, it would consist of two parallel straight lines (one for males and one for females).

[Figure: linear predictor $\eta$ against age ($x$), showing two parallel straight lines $\alpha_1 + \beta x$ and $\alpha_2 + \beta x$.]

Including the interaction between the age and sex would lead to a linear predictor of:

$$\alpha_i + \beta_i x, \quad i = 1, 2$$

Recall that an interaction is where the effect of one covariate (eg age) on the linear predictor depends on the value that another covariate (eg sex) takes.

In this case, the effect of the age of the policyholder is different for males and females. The graph for the model with interaction would consist of two non-parallel straight lines.

[Figure: linear predictor $\eta$ against age ($x$), showing two non-parallel straight lines $\alpha_1 + \beta_1 x$ and $\alpha_2 + \beta_2 x$.]

For example, if the response variable is the number of accidents claimed for on a car insurance policy, it might be the case that young men are more prone to accidents than young women but, as men get older, there is a steeper drop off in the number of accidents.

Let's now consider how we would construct the linear predictor $\alpha_i + \beta_i x$ for this model.

We start by summing the linear predictors for each of the three terms: age, sex and age.sex separately.

We already know that the linear predictor for age alone is $\eta = \beta_0 + \beta_1 x$ and that the linear predictor for sex alone is $\eta = \alpha_i$, $i = 1, 2$. We also need a linear predictor for the interaction effect $\eta = (\beta_0 + \beta_1 x).\alpha_i$, $i = 1, 2$. The dot notation here does not mean multiply. We have written it in this format for now to indicate an interaction.

We add the three of these together:

$$\eta = \beta_0 + \beta_1 x + \alpha_i + (\beta_0 + \beta_1 x).\alpha_i$$

We could then use MLE on a set of past data to come up with estimates for each of the parameters. For example, these might be:

$$\beta_0 = 0.5, \quad \beta_1 = -0.05, \quad \alpha_i = \begin{cases} -0.05 & \text{if } i = 1 \text{ (male)} \\ 0.05 & \text{if } i = 2 \text{ (female)} \end{cases}$$

with interaction terms:

$$\beta_0.\alpha_i = \begin{cases} 0.35 & \text{if } i = 1 \text{ (male)} \\ 0.05 & \text{if } i = 2 \text{ (female)} \end{cases} \quad \text{and} \quad \beta_1.\alpha_i = \begin{cases} -0.15 & \text{if } i = 1 \text{ (male)} \\ -0.02 & \text{if } i = 2 \text{ (female)} \end{cases}$$

This approach is rather artificial and would involve estimating eight non-zero parameters.

However, there is a more efficient way. We can combine the parameters $\beta_0$, $\alpha_i$ and $\beta_0.\alpha_i$ as these terms are not attached to $x$ in the linear predictor. Similarly, we can combine the terms $\beta_1$ and $\beta_1.\alpha_i$.

A linear predictor that gives identical results is:

$$\eta = \alpha_i + \beta_i x$$

where:

$$\alpha_i = \begin{cases} 0.8 & \text{if } i = 1 \text{ (male)} \\ 0.6 & \text{if } i = 2 \text{ (female)} \end{cases} \quad \text{and} \quad \beta_i = \begin{cases} -0.2 & \text{if } i = 1 \text{ (male)} \\ -0.07 & \text{if } i = 2 \text{ (female)} \end{cases}$$

The following table summarises the different models involving age (as a variable) and the factor of sex:

Model                   Linear predictor         R formula
age                     $\beta_0 + \beta_1 x$    Y ~ X1
sex                     $\alpha_i$               Y ~ sex
age + sex               $\alpha_i + \beta x$     Y ~ X1 + sex
age + sex + age.sex     $\alpha_i + \beta_i x$   Y ~ X1+sex+X1:sex
age*sex                 $\alpha_i + \beta_i x$   Y ~ X1 * sex

Question

In UK motor insurance business, vehicle-rating group is also used as a factor. Vehicles are divided into twenty categories numbered 1 to 20, with group 20 including those vehicles that are most expensive to repair.

Suppose that we have a three-factor model specified as age*(sex + vehicle group).
Determine the linear predictor for a model of this type.

Solution

A helpful starting point is to consider the linear predictor for sex + vehicle group on its own. Summing the linear predictors for both of these main effects gives:

$$\eta = \alpha_i + \beta_j$$

We don't attempt to simplify this to $\alpha_{ij}$ as this notation is reserved for an interaction between sex and vehicle group, which we are not considering here.

Now we consider the linear predictor for age * (sex + vehicle group). Recall that this can also be written as:

age + (sex + vehicle group) + age.(sex + vehicle group)

We sum the linear predictors for each of these three components:

$$\eta = (\gamma_0 + \gamma_1 x) + (\alpha_i + \beta_j) + (\alpha_i + \beta_j).(\gamma_0 + \gamma_1 x)$$

Finally, we simplify by combining parameters:

$$\eta = \alpha_i + \beta_j + \gamma_i x + \delta_j x$$

Note that we have:
combined $\gamma_0$, $\alpha_i$ and $\alpha_i.\gamma_0$ into a new $\alpha_i$
left $\beta_j$ alone (with $\beta_j.\gamma_0$ absorbed into it)
combined $\gamma_1$ and $\alpha_i.\gamma_1$ into $\gamma_i$
renamed $\beta_j.\gamma_1$ as $\delta_j$.

In general, when we add a new main effect, we add $n - 1$ parameters (or equivalently lose $n - 1$ degrees of freedom), where $n$ is the number of parameters that we would have used had the main effect stood on its own. In the case where the main effect is a factor, $n$ is also the number of categories.

When we add an interactive factor, we add $(n - 1)(m - 1)$ parameters (or equivalently lose $(n - 1)(m - 1)$ degrees of freedom), where $n$ and $m$ are the number of parameters that we would have used had each of the main effects stood on their own. In the case where both these main effects are factors, $n$ and $m$ are also the number of possible categories for each factor.

4 Link functions

Recall that the link function connects the mean response to the linear predictor, $g(\mu) = \eta$, where $\mu = E(Y)$. Technically, it is necessary for the link function to be differentiable and invertible in order to fit a model.
An invertible function is one that is one-to-one, so that for any value of $\eta$ there is a unique value of $\mu$. We have seen already that it is important to be able to invert the link function in order to use the model to make predictions about the future.

Beyond these basic requirements, there are a number of functions which are appropriate for the distributions above. However, it is sensible to choose link functions to ensure our predicted response variables stay within sensible bounds and ideally minimise the residual variance.

In R we specify the link function by setting link equal to the appropriate function: identity, log, sqrt, logit, inverse, 1/mu^2, etc.

For each distribution, the natural, or canonical, link function is defined $g(\mu) = \theta(\mu)$.

Remember that $\theta$ is the natural parameter for the exponential family form and that it is a function of the mean $\mu$ of the distribution.

If no link function is specified in R then these will be the default option. Hence the canonical link functions for the distributions given in Section 2 are:

normal     identity   $g(\mu) = \mu$
Poisson    log        $g(\mu) = \log \mu$
binomial   logit      $g(\mu) = \log\left(\frac{\mu}{1 - \mu}\right)$
gamma      inverse    $g(\mu) = \frac{1}{\mu}$

Earlier, we showed that $\theta = -\frac{1}{\mu}$ for the gamma distribution. The minus sign is dropped in the canonical link function. This doesn't affect anything since constants will be absorbed into the parameters in the linear predictor.

The canonical link functions are given on page 27 of the Tables.

These link functions work well for each of the above distributions, but it is not obligatory that they are used in each case. For example, we could use the identity link function in conjunction with the Poisson distribution, we could use the log link function for data which had a gamma distribution, and so on.

However, we need to consider the implications of the choice of the link function on the possible values for $\mu$. For example, if the data have a Poisson distribution then $\mu$ must be positive. If we use the log link function, then $\eta = \log(\mu)$ and $\mu = e^{\eta}$. Thus, $\mu$ is guaranteed to be positive, whatever value (positive or negative) the linear predictor takes. The same is not true if we use the identity link function.

Other link functions exist, and can be quite complex for specific modelling purposes. As a basis for actuarial applications, the above four functions are often sufficient.

Question

Determine the inverse of the link function $g(\mu) = \log\left(\frac{\mu}{1 - \mu}\right)$ by setting it equal to $\eta$ and comment on why this might be an appropriate link function for the binomial distribution.

Solution

We used this inverse function in the actuarial exam pass rates example. It is:

$$\mu = \frac{e^{\eta}}{1 + e^{\eta}} = \frac{1}{1 + e^{-\eta}} = \left(1 + e^{-\eta}\right)^{-1}$$

It is an appropriate link function for the binomial distribution since it results in values of $\mu$, the probability parameter, between 0 and 1.

5 Model fitting and comparison

The process of choosing a model also uses methods which are approximations, based on maximum likelihood theory, and this section outlines this process.

5.1 Obtaining the estimates

The parameters in a GLM are usually estimated using maximum likelihood estimation. The log-likelihood function, $\ell(y; \theta, \varphi) = \log(f_Y(y; \theta, \varphi))$, depends on the parameters in the linear predictor through the link function. Thus, maximum likelihood estimates of the parameters may be obtained by maximising $\ell$ with respect to the parameters in the linear predictor.

This depends on the invariance property from Chapter 8, that the MLE of a function is equal to the function of the MLE. We really want to find the MLE of the final parameter $\mu$. However, because of the invariance property it is permissible to find the MLE of the linear predictor $\eta$, and translate this into the MLE for $\mu$.
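Translating from $\eta$ back to $\mu$ uses the inverse of the link function. The sketch below uses R's make.link() to illustrate this for a few arbitrary values of the linear predictor (the values of eta are assumptions chosen only to show the behaviour), and checks the default links of the standard family objects.

```r
eta <- c(-5, -1, 0, 1, 5)   # arbitrary values of the linear predictor

# Logit link: the inverse maps any eta to a value strictly between 0 and 1
logit_link <- make.link("logit")
mu_logit   <- logit_link$linkinv(eta)
print(all.equal(mu_logit, 1 / (1 + exp(-eta))))       # matches the formula above
print(all.equal(logit_link$linkfun(mu_logit), eta))   # g(g^{-1}(eta)) = eta

# Log link: the mean is always positive
mu_log <- make.link("log")$linkinv(eta)

# Identity link: the mean can be negative
mu_id <- make.link("identity")$linkinv(eta)

# Default (canonical) links used by R's family objects
default_links <- c(normal   = gaussian()$link,
                   Poisson  = poisson()$link,
                   binomial = binomial()$link,
                   gamma    = Gamma()$link)
print(default_links)
```

The log and logit inverses keep the fitted mean inside its permitted range, whereas the identity link gives no such guarantee.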
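Question

Claim amounts for medical insurance claims for hamsters are believed to have an exponential distribution with mean $\mu_i$:

$$f(y_i) = \frac{1}{\mu_i} e^{-y_i/\mu_i} = \exp\left(-\frac{y_i}{\mu_i} - \log \mu_i\right)$$

We have the following data for hamsters' medical claims, using the model above:

age $x_i$ (months)   4    8    10    11   17
claim amount         50   52   119   41   163

The insurer believes that a linear function of age affects the claim amount:

$$\eta_i = \alpha + \beta x_i$$

Using the canonical link function, write down (but do not try to solve) the equations satisfied by the maximum likelihood estimates for $\alpha$ and $\beta$, based on the above data.

Solution

The log of the likelihood function is:

$$\log L(\mu_i) = \sum_i \left(-\log \mu_i - \frac{y_i}{\mu_i}\right)$$

The canonical link function for the exponential distribution is $g(\mu_i) = \frac{1}{\mu_i}$. Recall that the link function connects the mean response to the linear predictor, $g(\mu_i) = \eta_i$.

Hence, we have:

$$\frac{1}{\mu_i} = \alpha + \beta x_i$$

Rearranging this gives:

$$\mu_i = \frac{1}{\alpha + \beta x_i}$$

This enables us to write the log-likelihood function in terms of $\alpha$ and $\beta$:

$$\log L(\alpha, \beta) = \sum_i \left(\log(\alpha + \beta x_i) - y_i(\alpha + \beta x_i)\right)$$

We can now differentiate this with respect to $\alpha$ and $\beta$:

$$\frac{\partial}{\partial \alpha} \log L(\alpha, \beta) = \sum_i \left(\frac{1}{\alpha + \beta x_i} - y_i\right)$$

$$\frac{\partial}{\partial \beta} \log L(\alpha, \beta) = \sum_i \left(\frac{x_i}{\alpha + \beta x_i} - x_i y_i\right)$$

So the equations satisfied by the MLEs of $\alpha$ and $\beta$ are:

$$\sum_i \left(\frac{1}{\hat\alpha + \hat\beta x_i} - y_i\right) = 0 \quad \text{and} \quad \sum_i \left(\frac{x_i}{\hat\alpha + \hat\beta x_i} - x_i y_i\right) = 0$$

Substituting in the given data values gives the following equations:

$$\frac{1}{\hat\alpha + 4\hat\beta} + \frac{1}{\hat\alpha + 8\hat\beta} + \frac{1}{\hat\alpha + 10\hat\beta} + \frac{1}{\hat\alpha + 11\hat\beta} + \frac{1}{\hat\alpha + 17\hat\beta} - 425 = 0$$

and:

$$\frac{4}{\hat\alpha + 4\hat\beta} + \frac{8}{\hat\alpha + 8\hat\beta} + \frac{10}{\hat\alpha + 10\hat\beta} + \frac{11}{\hat\alpha + 11\hat\beta} + \frac{17}{\hat\alpha + 17\hat\beta} - 5{,}028 = 0$$

These are not particularly easy to solve without computer assistance. Solving the equations numerically gives approximately $\hat\alpha = 0.0244$ and $\hat\beta = -0.00107$.

We can then estimate the mean claim amounts for various ages using:

$$\hat\mu_i = \frac{1}{\hat\alpha + \hat\beta x_i}$$

Doing so gives estimates for the claim amounts of approximately 49.6, 63.0, 72.9, 79.0 and 160.4. These sum to 425, as the first equation requires, but several are a long way from the observed amounts (eg 72.9 against 119 and 79.0 against 41). So the model does not appear to fit the individual observations well.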
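The two equations can be solved numerically. The following sketch is one possible approach, not necessarily the method used in the course. It relies on the fact that the exponential distribution is a gamma distribution with fixed shape, so R's Gamma family with the inverse link gives the same coefficient estimates. The variable names and the tightened convergence settings are choices made for this illustration. Whatever estimates are used, a useful check is that the fitted means sum to 425 and the age-weighted fitted means sum to 5,028.

```r
age   <- c(4, 8, 10, 11, 17)
claim <- c(50, 52, 119, 41, 163)

# Gamma family, canonical (inverse) link: 1 / mu_i = alpha + beta * x_i
fit <- glm(claim ~ age, family = Gamma(link = "inverse"),
           control = glm.control(epsilon = 1e-12, maxit = 100))

alpha_hat <- unname(coef(fit)[1])
beta_hat  <- unname(coef(fit)[2])
mu_hat    <- 1 / (alpha_hat + beta_hat * age)

# Left-hand sides of the two likelihood equations (should be close to zero)
score_alpha <- sum(1 / (alpha_hat + beta_hat * age)) - sum(claim)
score_beta  <- sum(age / (alpha_hat + beta_hat * age)) - sum(age * claim)

print(c(alpha = alpha_hat, beta = beta_hat))
print(round(mu_hat, 2))
print(c(score_alpha, score_beta))
```

Evaluating the two score expressions at any candidate pair of estimates is a quick way to confirm whether that pair really solves the equations.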
The R code to fit a generalised linear model to a multivariate data frame and assign it to the object model, is:

    model <- glm(Y ~ ..., family = ... (link = ...))

Then the estimates of the parameters and their approximate standard errors can be obtained by:

    summary(model)

An example of a part of the summary output is shown below, which we can see is identical to the multivariate model output:

5.2 Significance of the parameters

As for the multiple linear regression model we can test whether each of the parameters is significantly different from zero. Generally speaking, it is not useful to include a covariate for which we cannot reject the hypothesis that $\beta = 0$.

Approximate standard errors of the parameters can be obtained using asymptotic maximum likelihood theory.

Recall from Chapter 8 that estimators are in general asymptotically normal and unbiased with variance equal to the Cramér-Rao lower bound:

$$\hat\beta \sim N(\beta, \text{CRLB})$$

Hence, when testing $H_0: \beta = 0$ vs $H_1: \beta \neq 0$, we use the result (for large $n$):

$$\frac{\hat\beta - 0}{\text{s.e.}(\hat\beta)} \sim N(0, 1)$$

For a two-tailed test the critical values are $\pm 1.96$. So we have a significant value if $|\hat\beta| > 1.96 \, \text{s.e.}(\hat\beta)$. We could approximate the 1.96 by 2 for simplicity.

As a rough guide, an indication of the significance of the parameters is given by twice the standard error. Thus, if:

$$|\hat\beta| > 2 \times \text{standard error}(\hat\beta)$$

the parameter is significant and should be retained in the model. Otherwise, the parameter is a candidate for being discarded.

It should be noted that in some cases, a parameter may appear to be unnecessary using this criterion, but the model without it does not provide a good enough fit to the data.

As with any model we need to look at the whole situation and not just one aspect in isolation.

In R, the statistic and p-value for the tests of $H_0: \beta = 0$ are given in the output of summary(model).
So for the above printout we can see that the covariate disp is significant, whereas the covariate wt is not. Therefore, we would remove wt from the model and see if it still provides a good enough fit to the data.

Recall from Chapter 10 that a p-value is significant if it is less than 5% (ie 0.05).

5.3 The saturated model

To compare models, we need a measure of the fit of a model to the data. To do this, we compare our model to a model that is a perfect fit to the data. The model that fits perfectly is called the saturated model.

A saturated model is defined to be a model in which there are as many parameters as observations, so that the fitted values are equal to the observed values.

Key information

In the saturated model we have $\hat\mu_i = y_i$, ie the fitted values are equal to the observed values.

Question

Claim amounts for medical insurance claims for hamsters are believed to have an exponential distribution with mean $\mu_i$:

$$f(y_i) = \frac{1}{\mu_i} e^{-y_i/\mu_i} = \exp\left(-\frac{y_i}{\mu_i} - \log \mu_i\right)$$

We are given the following data for hamsters' medical claims, using the model above:

age $x_i$ (months)   4    8    10    11   17
claim amount         50   52   119   41   163

The insurer believes that a model with 5 categories for age is sufficiently accurate:

$$\eta_i = \alpha_i, \quad i = 1, 2, 3, 4, 5$$

Using the canonical link function, show that the fitted values ($\hat\mu_i$) are the observed claim amounts, $y_i$.

Solution

The log of the likelihood function is:

$$\log L(\mu_i) = \sum_i \left(-\log \mu_i - \frac{y_i}{\mu_i}\right)$$

Setting the canonical link function for the exponential distribution, $g(\mu_i) = \frac{1}{\mu_i}$, equal to the linear predictor, $\eta_i = \alpha_i$, gives:

$$\frac{1}{\mu_i} = \alpha_i \quad \Rightarrow \quad \mu_i = \frac{1}{\alpha_i}$$

This enables us to write the log-likelihood function in terms of $\alpha_i$:

$$\log L(\alpha_i) = \sum_i \left(-y_i \alpha_i + \log \alpha_i\right)$$

We can now differentiate this with respect to $\alpha_i$:

$$\frac{\partial}{\partial \alpha_i} \log L(\alpha_i) = -y_i + \frac{1}{\alpha_i}$$

All the terms other than those involving the specific $\alpha_i$ we are looking at disappear.

So the equations satisfied by the MLEs of $\alpha_i$ are:

$$-y_i + \frac{1}{\hat\alpha_i} = 0 \quad \Rightarrow \quad \hat\alpha_i = \frac{1}{y_i}$$

Hence, the fitted values are:

$$\hat\mu_i = \frac{1}{\hat\alpha_i} = y_i$$

The fitted values, $\hat\mu_i$, are equal to the observed values, $y_i$.

However, a model that fits the data perfectly is not necessarily a satisfactory model.

Suppose we are trying to model weight ($Y$) and height ($x$) using a linear model, $\beta_0 + \beta_1 x$. This is shown below on the left, whereas a graphical representation of the saturated model is on the right.

[Figure: two scatter plots of weight ($Y$) against height ($x$). Left: the fitted straight line $\beta_0 + \beta_1 x$. Right: the saturated model, passing through every data point.]

The saturated model is a perfect fit to the data. However, since the fitted value is the observed value, we cannot predict a value without first knowing what it is. The saturated model has no predictive ability for other heights, but it does provide an excellent benchmark against which to compare the fit of other models.

5.4 Scaled deviance (or likelihood ratio)

In order to assess the adequacy of a model for describing a set of data, we can compare the likelihood under this model with the likelihood under the saturated model.

The saturated model uses the same distribution and link function as the current model, but has as many parameters as there are data points. As such it fits the data perfectly. We can then compare our model to the saturated model to see how good a fit it is.

Suppose that $L_S$ and $L_M$ denote the likelihood functions of the saturated and current models, evaluated at their respective optimal parameter values. The likelihood ratio statistic is given by $L_S / L_M$. If the current model describes the data well then the value of $L_M$ should be close to the value of $L_S$. If the model is poor then the value of $L_M$ will be much smaller than the value of $L_S$ and the likelihood ratio statistic will be large.

Alternatively, we could examine the natural log of the likelihood ratio statistic:

$$\log \frac{L_S}{L_M} = \ell_S - \ell_M$$

where $\ell_S = \log L_S$ and $\ell_M = \log L_M$.
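As an illustration of the saturated model and the log-likelihood ratio, the sketch below reuses the hamster claims data. It maximises each term of the saturated log-likelihood numerically, compares the result with $1/y_i$, and then evaluates $\ell_S - \ell_M$ for the model $\eta_i = \alpha + \beta x_i$. The function name loglik_exp and the use of optimize() and the Gamma family are choices made for this sketch.

```r
age   <- c(4, 8, 10, 11, 17)
claim <- c(50, 52, 119, 41, 163)

# Exponential log-likelihood: sum of -log(mu_i) - y_i / mu_i
loglik_exp <- function(y, mu) sum(-log(mu) - y / mu)

# Saturated model: maximise log(alpha_i) - y_i * alpha_i for each observation
alpha_sat <- sapply(claim, function(y) {
  optimize(function(a) log(a) - y * a, interval = c(1e-6, 1),
           maximum = TRUE, tol = 1e-12)$maximum
})
mu_sat <- 1 / alpha_sat          # should reproduce the observed claims

# Current model: 1 / mu_i = alpha + beta * x_i (Gamma family, inverse link)
cur    <- glm(claim ~ age, family = Gamma(link = "inverse"))
mu_cur <- unname(fitted(cur))

l_S    <- loglik_exp(claim, mu_sat)
l_M    <- loglik_exp(claim, mu_cur)
log_lr <- l_S - l_M              # log(L_S / L_M)

print(round(mu_sat, 4))
print(log_lr)
```

Because the saturated model maximises every term separately, $\ell_S - \ell_M$ cannot be negative.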
The scaled deviance is defined as twice the difference between the log-likelihood of the model under consideration (known as the current model) and the saturated model.

Scaled deviance

The scaled deviance for a particular model $M$ is defined as:

$$SD_M = 2(\ell_S - \ell_M)$$

The deviance for the current model, $D_M$, is defined such that:

$$\text{scaled deviance} = \frac{D_M}{\varphi}$$

Remember that $\varphi$ is a scale parameter, so it seems sensible that it should be used to connect the deviance with the scaled deviance. For the Poisson and exponential distributions, $\varphi = 1$, so the scaled deviance and the deviance are identical.

The smaller the deviance, the better the model from the point of view of model fit.

However, there will be a trade-off here. A model with many parameters will fit the data well. However a model with too many parameters will be difficult and complex to build, and will not necessarily lead to better prediction in the future. It is possible for models to be over-parameterised, ie factors are included that lead to a slightly, but not significantly, better fit. When choosing linear models, we will usually need to strike a balance between a model with too few parameters (which will not take account of factors that have a substantial impact on the data, and will therefore not be sensitive enough) and one with too many parameters (which will be too sensitive to factors that really do not have much effect on the results). We use the principle of parsimony here, ie we choose the simplest model that does the job.

This can be illustrated by considering the case when the data are normally distributed. In this case, the log-likelihood for a sample of size $n$ is:

$$\ell(y; \theta, \varphi) = \sum_{i=1}^{n} \log f_Y(y_i; \theta_i, \varphi) = -\frac{n}{2} \log 2\pi\sigma^2 - \sum_{i=1}^{n} \frac{(y_i - \mu_i)^2}{2\sigma^2}$$

The likelihood function for a random sample of size $n$ is $f(y_1) f(y_2) \cdots f(y_n)$. When we take logs, we add the logs of the individual PDFs.
Recall that for the normal distribution the natural parameter is the mean, ie $\theta_i = \mu_i$.

For the saturated model, the parameter $\mu_i$ is estimated by $y_i$, and so the second term disappears. Thus, the scaled deviance (twice the difference between the values of the log-likelihood under the current and saturated models) is

$$\sum_{i=1}^{n} \frac{(y_i - \hat\mu_i)^2}{\sigma^2}$$

where $\hat\mu_i$ is the fitted value for the current model.

The deviance (remembering that the scale parameter $\varphi = \sigma^2$) is the well-known residual sum of squares:

$$\sum_{i=1}^{n} (y_i - \hat\mu_i)^2$$

This is why the deviance is defined with a factor of two in it, so that for the normal model the deviance is equal to the residual sum of squares that we met in linear regression.

The residual deviance (ie the deviance after all the covariates have been included) is displayed as part of the results from summary(model). For example:

In R we can obtain a breakdown of how the deviance is reduced by each covariate added sequentially by using the command anova(model). However, unlike for linear regression, this does not automatically carry out a test.

Also recall that the smaller the residual (left over) deviance, the better the fit of the model.

The residual deviance outputted by the glm() function is a measure of fit, similar to the scaled deviance and deviance defined earlier. However, this output won't necessarily match the scaled deviance or deviance calculated from first principles using the formulae in this section.

5.5 Using scaled deviance and Akaike's Information Criterion to choose between models

Adding more covariates will always improve the fit and thus decrease the deviance, however we need to determine whether adding a particular covariate leads to a significant decrease in the deviance.

For normally distributed data, the scaled deviance has a $\chi^2$ distribution. Since the scale parameter for the normal ($\varphi = \sigma^2$) must be estimated, we compare models by taking ratios of sum-of-squares and using $F$ tests (as in the analysis of variance for linear regression models).

We covered this in Section 4.3 from the previous chapter.

Thus, if we want to decide if Model 2 (which has $p + q$ parameters and scaled deviance $S_2$) is a significant improvement over Model 1 (which has $p$ parameters and scaled deviance $S_1$), we see if

$$\frac{(S_1 - S_2)/q}{S_2/(n - p - q)}$$

is greater than the 5% value for the $F_{q, n-p-q}$ distribution.

The code for comparing two normally distributed models, model1 and model2, in R is:

    anova(model1, model2, test="F")

In the case of data that are not normally distributed, the scale parameter may be known (for example, for the Poisson distribution $\varphi = 1$), and the deviance is only asymptotically a $\chi^2$ distribution. For these reasons, the common procedure is to compare two models by looking at the difference in the scaled deviance and comparing with a $\chi^2$ distribution.

Since the distributions are only asymptotically normal, the $F$ test will not be very accurate. We get a better result by comparing two approximate $\chi^2$ distributions.

To be more precise, it's the absolute difference between the scaled deviances that is compared with $\chi^2$.

Thus, if we want to decide if Model 2 (which has $p + q$ parameters and scaled deviance $S_2$) is a significant improvement over Model 1 (which has $p$ parameters and scaled deviance $S_1$), we see if $S_1 - S_2$ is greater than the 5% value for the $\chi^2_q$ distribution.

Recall that we subtract one degree of freedom for each extra parameter introduced. So it's the difference between $p$ and $p + q$ that matters.

Since $\chi^2_p + \chi^2_q = \chi^2_{p+q}$ (provided the random variables are independent), it makes sense to say that the difference in the scaled deviances has a $\chi^2_q$ distribution.

What we are trying to do here is to decide whether the added complexity results in significant additional accuracy.
If not, then it would be preferable to use the model with fewer parameters.

Alternatively, we could express this test in terms of the log-likelihood functions. If we let $\ell_p$ and $\ell_{p+q}$ denote the log-likelihoods of the models with $p$ and $p + q$ parameters respectively, then the test statistic can be written as:

$$S_1 - S_2 = 2(\ell_S - \ell_p) - 2(\ell_S - \ell_{p+q}) = -2(\ell_p - \ell_{p+q})$$

This is the format given on page 23 of the Tables and will be used in Subject CS2 to compare Cox regression models.

Question

Explain why the test statistic will always be positive.

Solution

As we have mentioned before, adding more parameters will improve the fit of the model to the data. Therefore we would expect the value of the likelihood function to be larger for models with more parameters. Hence $\ell_{p+q} > \ell_p$ and so the statistic will be positive.

The code for comparing these two (non-normally distributed) models, model1 and model2, in R is:

    anova(model1, model2, test="Chi")

A very important point is that this method of comparison can only be used for nested models. In other words, Model 1 must be a submodel of Model 2. Thus, we can compare two models for which the distribution of the data and the link function are the same, but the linear predictor has one extra parameter in Model 2. For example $\beta_0 + \beta_1 x$ and $\beta_0 + \beta_1 x + \beta_2 x^2$. But we could not compare in this way if the distribution of the data or the link function are different, or, for example, when the linear predictors are $\beta_0 + \beta_1 x + \beta_2 x^2$ and $\beta_0 + \beta_1 x + \beta_3 \log x$. It should be clear that we can gauge the importance of factors by examining the scaled deviances, but we cannot use the testing procedure outlined above.

In the first case, the difference between the models is $\beta_2 x^2$, and so a significant difference between the models tells us that the quadratic term should be included. In the second case, the difference between the models is $\beta_3 \log x - \beta_2 x^2$, and so a significant difference doesn't tell us which parameter is significant.
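A minimal sketch of the nested comparison on simulated Poisson data. The data, the seed and the variable names are assumptions made for this illustration. It computes the difference in deviances directly, confirms that it equals $-2(\ell_p - \ell_{p+q})$, and compares it with the 5% point of $\chi^2_q$, alongside the anova() call.

```r
set.seed(1)
n <- 200
x <- runif(n, 0, 2)
# Simulated counts; the true linear predictor has no quadratic term
y <- rpois(n, lambda = exp(0.3 + 0.5 * x))

model1 <- glm(y ~ x,          family = poisson)   # p parameters
model2 <- glm(y ~ x + I(x^2), family = poisson)   # p + q parameters

# For the Poisson, phi = 1, so the deviance is also the scaled deviance
S1 <- deviance(model1)
S2 <- deviance(model2)
q  <- length(coef(model2)) - length(coef(model1))

test_stat <- S1 - S2
alt_stat  <- -2 * (as.numeric(logLik(model1)) - as.numeric(logLik(model2)))
critical  <- qchisq(0.95, df = q)                 # 5% value of chi-square(q)
p_value   <- pchisq(test_stat, df = q, lower.tail = FALSE)

tab <- anova(model1, model2, test = "Chi")
print(tab)
print(c(statistic = test_stat, critical = critical, p_value = p_value))
```

Model 2 would be preferred only if the statistic exceeds the critical value; otherwise the simpler Model 1 is retained.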
An alternative method of comparing models is to use Akaike's Information Criterion (AIC). Since the deviance will always decrease as more covariates are added to the model, there will always be a tendency to add more covariates. However this will increase the complexity of the model which is generally considered to be undesirable. To take account of the undesirability of increased complexity, computer packages will often quote the AIC, which is a penalised log-likelihood:

$$\text{AIC} = -2 \times \log L_M + 2 \times \text{number of parameters}$$

where $\log L_M$ is the log-likelihood of the model under consideration.

When comparing two models, the smaller the AIC, the better the fit. So if the change in deviance is more than twice the change in the number of parameters then it would give a smaller AIC.

This is approximately equivalent to checking whether the difference in deviance is greater than the 5% value of the $\chi^2$ distribution for degrees of freedom between 5 and 15. However, it has the added advantage of being a simple way to compare GLMs without formal testing. This is similar to comparing the adjusted $R^2$ for multiple linear regression models in the previous chapter and hence is displayed as part of the output of a computer fitted GLM.

In R the AIC is displayed as part of the results from summary(model). An example of this is given in the R box at the end of Section 5.4.

5.6 The process of selecting explanatory variables

As for multiple linear regression the process of selecting the optimal set of covariates for a GLM is not always easy. Again, we could use one of the two following approaches:

(1) Forward selection. Add the covariate that reduces the AIC the most or causes a significant decrease in the deviance. Continue in this way until adding any more causes the AIC to rise or does not lead to a significant improvement in the deviance.
Note we should start with main effects before interaction terms and linear terms before polynomial.

Suppose we are modelling the number of claims on a motor insurance portfolio and we have data on the driver's age, sex and vehicle group. We would start with the null model (ie a single constant equal to the sample mean). Then we would try each of the single covariate models (linear function of age or the factors sex or vehicle group) to see which produces the most significant improvement in a $\chi^2$ test or reduces the AIC the most. Suppose this was sex. Then we would try adding a second covariate (linear function of age or the factor vehicle group). Suppose this was age. Then we would try adding the third covariate (vehicle group). We might then try a quadratic function of the variable age (and maybe higher powers) or each of the 2 term interactions (eg sex*age or sex*group or age*group). Finally we would try the 3 term interaction (ie sex*age*group).

(2) Backward selection. Start by adding all available covariates and interactions. Then remove covariates one by one starting with the least significant until the AIC reaches a minimum or there is no significant improvement in the deviance, and all the remaining covariates have a statistically significant impact on the response.

So with the last example we would start with the 3 term interaction sex*age*group and look at which parameter has the largest p-value (in a test of it being zero) and remove that. A $\chi^2$ test should then show no significant difference between the models before and after the removal, and the AIC should fall. Then we remove the next parameter with the largest p-value and so on.

The Core Reading uses R to demonstrate this procedure. Whilst this will be covered in the CS1 PBOR, it's important to understand the process here.

Example

We demonstrate both of these methods in R using a binomial model on the mtcars dataset (which is supplied with R in the datasets package) to determine whether a car has a V engine or an S engine (vs) using weight in 1000 lbs (wt) and engine displacement in cubic inches (disp) as covariates.

Forward selection

Starting with the null model:

    model0 <- glm(vs ~ 1, data=mtcars, family=binomial)

The AIC of this model (which would be displayed using summary(model0)) is 45.86.

We have to choose whether we add disp or wt first. We try each and see which has the greatest improvement in the deviance.

    model1 <- update(model0, ~.+ disp)
    anova(model0, model1, test="Chi")

    model2 <- update(model0, ~.+ wt)
    anova(model0, model2, test="Chi")

The anova output shows that disp produces the more significant result, so we add that covariate first. R always calls the models we are comparing Model 1 and Model 2, irrespective of how we have named them. This can lead to confusion if we are not careful.

The AIC of model 1 (adding disp) is 26.7 whereas the AIC of model 2 (adding wt) is 35.37. Therefore adding disp reduces the AIC more from model 0's value of 45.86.

Let us now see if adding wt to disp produces a significant improvement:

    model3 <- update(model1, ~.+ wt)
    anova(model1, model3, test="Chi")

This has not led to a significant improvement in the deviance so we would not add wt (and therefore we definitely would not add an interaction term between disp and wt).

The AIC of model 3 (adding wt) is 27.4 which is worse than model1's AIC of 26.7. Therefore we would not add it.

Incidentally the AICs for models 0, 1, 2 and 3 are 45.86, 26.7, 35.37 and 27.4. So using these would have given the same results (as Model 1 produces a smaller AIC than Model 2, and then Model 3 increases the AIC and so we would not have selected it).
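The AIC figures quoted for the forward selection example can be turned back into the quantities used by the $\chi^2$ test. Since the AIC is $-2 \log L$ plus twice the number of parameters, subtracting the penalty recovers $-2 \log L$, and the difference in $-2 \log L$ between two nested models is the test statistic. The sketch below does this in Python (used only as a calculator) with the rounded AIC values from the text, so the results are approximate. The parameter counts (1 for the null model, 2 for one covariate, 3 for two covariates) and the helper names are assumptions made for this illustration.

```python
import math

# AIC values quoted in the example (rounded), with assumed parameter counts.
models = {
    "null":      {"aic": 45.86, "params": 1},
    "disp":      {"aic": 26.7,  "params": 2},
    "wt":        {"aic": 35.37, "params": 2},
    "disp + wt": {"aic": 27.4,  "params": 3},
}

def minus_two_loglik(name):
    """Recover -2 log L by removing the AIC penalty of 2 per parameter."""
    m = models[name]
    return m["aic"] - 2 * m["params"]

def p_value_one_df(smaller, larger):
    """Chi-square(1) p-value for adding one parameter to `smaller`."""
    drop = minus_two_loglik(smaller) - minus_two_loglik(larger)
    # For 1 degree of freedom, P(chi2_1 > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(max(drop, 0.0) / 2.0))

p_disp = p_value_one_df("null", "disp")         # add disp to the null model
p_wt = p_value_one_df("null", "wt")             # add wt to the null model
p_add_wt = p_value_one_df("disp", "disp + wt")  # add wt to the disp model
print(p_disp, p_wt, p_add_wt)
```

On these figures both single-covariate additions are significant at the 5% level, with disp the stronger of the two, while adding wt to disp is not, which agrees with the conclusions reached from the R output.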
Backward selection

Starting with all the possibilities:

    modelA <- glm(vs ~ wt * disp, data=mtcars, family=binomial)

In the summary output for this model, none of the covariates are significant. The parameter of the interaction term has the highest p-value (0.829), and so is most likely to be zero.

We first remove the interaction term wt:disp, as this is the least significant parameter:

    modelB <- update(modelA, ~.-wt:disp)

The AIC has fallen from 29.361 to 27.4.

Alternatively, carrying out a $\chi^2$ test using anova(modelA, modelB, test="Chi") would show that there is no significant difference between the models (p-value of 0.8417) and therefore we are correct to remove the interaction term between wt and disp.

The wt term is not significant so removing that:

    modelC <- update(modelB, ~.-wt)

Both of the remaining coefficients are significant and the AIC has fallen from 27.4 to 26.696.

Alternatively, carrying out a $\chi^2$ test using anova(modelB, modelC, test="Chi") would show that there is no significant difference between the models (p-value of 0.255) and therefore we are correct to remove the wt covariate.

We would stop at this model. If we remove the disp term (to give the null model), the AIC increases to 45.86.

Alternatively, carrying out a $\chi^2$ test between these two models would show a very significant difference (p-value of less than 0.001) and therefore we should not remove the disp covariate.

We can see that both forward and backward selection lead to the same model being chosen in this case.

5.7 Estimating the response variable

Once we have obtained our model and its estimates, we are then able to calculate the value of the linear predictor, $\eta$, and by using the inverse of the link function we can calculate our estimate of the response variable $\hat{\mu} = g^{-1}(\hat{\eta})$.
Substituting the estimated parameters into the linear predictor gives the estimated value of the linear predictor for different individuals. The link function links the linear predictor to the mean of the distribution. Hence we can obtain an estimate for the mean of the distribution of $Y$ for that individual.

Let's now return to the Core Reading example on page 45. Suppose we wish to estimate the probability of having a V engine for a car with weight 2,100 lbs and displacement 180 cubic inches.

Using our linear predictor $\beta_0 + \beta_1 \times \text{disp}$ (ie vs ~ disp), we obtained estimates of $\hat{\beta}_0 = 4.137827$ and $\hat{\beta}_1 = -0.021600$. These coefficients are displayed as part of the summary output of Model C in the example above.

Hence, for displacement 180 we have:

$$\hat{\eta} = 4.137827 - 0.021600 \times 180 = 0.24983$$

We did not specify the link function so we shall use the canonical binomial link function which is the logit function:

$$0.24983 = \log\left(\frac{\hat{\mu}}{1 - \hat{\mu}}\right) \;\Rightarrow\; \hat{\mu} = \frac{e^{0.24983}}{1 + e^{0.24983}} = 0.562$$

Recall that the mean for a binomial model is the probability. So the probability of having a V engine for a car with weight 2,100 lbs and displacement 180 cubic inches is 56.2%. The figure 2,100 does not enter the calculation because we removed the weight covariate.

In R we can obtain this as follows:

    newdata <- data.frame(disp=180)
    predict(model, newdata, type="response")

6 Residuals analysis and assessment of model fit

Once a possible model has been found it should be checked by looking at the residuals.

The residuals are based on the differences between the observed responses, $y$, and the fitted responses, $\hat{\mu}$. The fitted responses are obtained by applying the inverse of the link function to the linear predictor with the fitted values of the parameters.

We looked at how we could obtain predicted response values in the previous section. The fitted values are the predicted $Y$ values for the observed data set, $x$.
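The step from linear predictor to fitted or predicted response can be sketched as follows. This Python snippet (used only as a calculator, not the R workflow of the Core Reading) applies the inverse of the logit link to the coefficient estimates quoted in Section 5.7 for a displacement of 180 cubic inches. The function name is introduced here for illustration only.

```python
import math

def inverse_logit(eta):
    """Inverse of the logit link: mu = e^eta / (1 + e^eta)."""
    return math.exp(eta) / (1.0 + math.exp(eta))

# Coefficient estimates quoted in the text for the model vs ~ disp.
beta0, beta1 = 4.137827, -0.021600

eta = beta0 + beta1 * 180   # linear predictor for disp = 180
mu = inverse_logit(eta)     # estimated probability of a V engine
print(round(eta, 5), round(mu, 3))
```

The same function applied to the linear predictor at each observed value of disp would give the fitted values for the data set.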
The R code for obtaining the fitted values of a GLM is:

    fitted(model)

For example, in the actuarial pass rates model detailed on page 6, we could calculate from the model what the pass rate ought to be for students who have attended tutorials, submitted three assignments and scored 60% on the mock exam. The difference between this theoretical pass rate and the actual pass rate observed for students who match the criteria exactly will give us the residuals.

Question

Draw up a table showing the differences between the actual and expected values of the truancy rates in the example on page 9.

Solution

Recall that the expected number of unexplained absences in a year were modelled using the linear predictor:

$$\eta_{ij} = \alpha_i + \beta_j + \gamma x$$

where $x$ = age, and the parameter estimates are as follows:

$$\alpha_{WC} = -2.64 \qquad \alpha_{OC} = -1.14 \qquad \beta_M = -3.26 \qquad \beta_F = -3.54 \qquad \gamma = 0.64$$

where WC = Within catchment, OC = Outside catchment, M = Male, F = Female.

Taking the exponential of the linear predictor gives expected values of:

                                     Age last birthday
                                     8       10      12      14
    Within catchment area   Male     0.46    1.65    5.93    21.33
                            Female   0.35    1.25    4.48    16.12
    Outside catchment area  Male     2.05    7.39    26.58   95.58
                            Female   1.55    5.58    20.09   72.24

So the differences between the actual values (given on page 9) and expected values are:

                                     Age last birthday
                                     8       10      12      14
    Within catchment area   Male     1.34    0.35    0.37    7.23
                            Female   0.15    0.35    0.52    0.08
    Outside catchment area  Male     0.05    0.11    1.08    23.58
                            Female   1.25    0.62    0.49    4.04

The procedure here is a natural extension of the way we calculated residuals for linear regression models covered in the previous chapter. However, because of the different distributions used, we need to transform these raw residuals so we are able to interpret them meaningfully.

There are two kinds of residuals: Pearson and deviance.

6.1 Pearson residuals

The Pearson residuals are defined as:

$$\frac{y - \hat{\mu}}{\sqrt{\text{var}(\hat{\mu})}}$$

The $\text{var}(\hat{\mu})$ in the denominator refers to the variance of the response distribution, $\text{var}(Y)$, using the fitted values, $\hat{\mu}$, in the formula. For example, since the variance of the exponential distribution is $\mu^2$, we have $\text{var}(\hat{\mu}) = \hat{\mu}^2$ in that case.

The Pearson residual, which is often used for normally distributed data, has the disadvantage that its distribution is often skewed for non-normal data. This makes the interpretation of residual plots difficult.

The R code for obtaining the Pearson residuals is:

    residuals(model, type="pearson")

The Pearson residuals returned by R are calculated slightly differently from the definition given in this section. Therefore, this output won't necessarily match the Pearson residuals calculated from first principles using $\dfrac{y - \hat{\mu}}{\sqrt{\text{var}(\hat{\mu})}}$.

If the data come from a normal distribution, then the Pearson residuals will follow the standard normal distribution. By comparing these residuals to a standard normal (eg by using a Q-Q plot), we can determine whether the model is a good fit.

However, for non-normal data the Pearson residuals will not follow the standard normal distribution and won't even be symmetrical. This makes it difficult to determine whether the model is a good fit. Hence we will need to use a different type of residual.

6.2 Deviance residuals

Deviance residuals are defined as the product of the sign of $y - \hat{\mu}$ and the square root of the contribution of $y$ to the scaled deviance. Thus, the deviance residual is:

$$\text{sign}(y - \hat{\mu})\, d_i$$

where the scaled deviance is $\sum d_i^2$.

Recall that:

$$\text{sign}(x) = \begin{cases} +1 & \text{if } x > 0 \\ -1 & \text{if } x < 0 \end{cases}$$

Deviance residuals are usually more likely to be symmetrically distributed and to have approximately normal distributions, and are preferred for actuarial applications.

The R code for obtaining the deviance residuals is:

    residuals(model)

The deviance residuals returned by R are calculated slightly differently from the definition given in this section. Therefore, this output won't necessarily match the deviance residuals calculated from first principles using the formulae in this section.
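To make the two definitions concrete, the sketch below computes Pearson and deviance residuals from first principles for a Poisson model in Python (used only as a calculator). It assumes the Poisson variance function $\text{var}(\mu) = \mu$ and the Poisson contribution to the scaled deviance $d_i^2 = 2\left[y_i \log(y_i/\hat{\mu}_i) - (y_i - \hat{\mu}_i)\right]$, with $y \log y$ taken as zero when $y = 0$. The observations and fitted values are hypothetical, and the function names are introduced here for illustration only.

```python
import math

def pearson_residual(y, mu):
    """(y - mu) / sqrt(var(mu)), with var(mu) = mu for the Poisson."""
    return (y - mu) / math.sqrt(mu)

def deviance_residual(y, mu):
    """sign(y - mu) * sqrt(d2), where d2 is y's contribution to the scaled deviance."""
    y_log_term = y * math.log(y / mu) if y > 0 else 0.0  # y log y -> 0 at y = 0
    d2 = 2.0 * (y_log_term - (y - mu))
    sign = 1.0 if y > mu else -1.0 if y < mu else 0.0
    return sign * math.sqrt(max(d2, 0.0))

# Hypothetical observed counts and fitted means.
observed = [0, 1, 2, 5]
fitted = [1.1, 1.1, 1.1, 2.5]

pearson = [pearson_residual(y, m) for y, m in zip(observed, fitted)]
deviance = [deviance_residual(y, m) for y, m in zip(observed, fitted)]
scaled_deviance = sum(d * d for d in deviance)  # sum of squared deviance residuals
print(pearson, deviance, scaled_deviance)
```

The two sets of residuals share the same signs but differ in size, and the squared deviance residuals add up to the scaled deviance, as the definition requires.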
We can see that deviance residuals are more likely to be symmetrically distributed by considering the following result:

If $\{X_i\}$ is a set of independent standard normal random variables, then $Y = \sum X_i^2$ will have a $\chi^2$ distribution.

Therefore, since $\sum d_i^2$ (ie the scaled deviance) is approximately $\chi^2$, it follows that $d_i$ (and also the deviance residual) is likely to be approximately normal.

Note that for normally distributed data, the Pearson and deviance residuals are identical.

Question

Show that, for normally distributed data, the Pearson and deviance residuals are identical.

Solution

If $Y_i \sim N(\mu_i, \sigma^2)$, then from Section 6.1, the Pearson residuals are:

$$\frac{y_i - \hat{\mu}_i}{\sqrt{\text{var}(\hat{\mu}_i)}} = \frac{y_i - \hat{\mu}_i}{\sigma}$$

In Section 5.4, we saw that the scaled deviance was:

$$\sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n} \frac{(y_i - \hat{\mu}_i)^2}{\sigma^2}$$

So the deviance residuals are given by:

$$\text{sign}(y_i - \hat{\mu}_i)\, d_i = \text{sign}(y_i - \hat{\mu}_i)\, \frac{|y_i - \hat{\mu}_i|}{\sigma} = \frac{y_i - \hat{\mu}_i}{\sigma}$$

Hence the Pearson residuals and the deviance residuals are the same.

6.3 Using residual plots to check the fit

The assumptions of a GLM require that the residuals should show no patterns. The presence of a pattern implies that something has been missed in the relationship between the predictors and the response. If this is the case, other model specifications should be tried.

So, in addition to the residuals being symmetrical, we would expect no connection between the residuals and the explanatory covariates. Rather than plotting the residuals against each of the covariates, we could just see if there is a pattern when plotted against the fitted values.

For our model above (on the mtcars dataset), a plot of the residuals against the fitted values is as follows:

[Figure: residuals plotted against fitted values for the mtcars model, with a trend line and three named points]

There does appear to be some pattern here and the three named points on the graph might be outliers.

The line shows the trend. Ideally this should be horizontal which indicates no pattern.
Also the residuals should be small. If R names them, then they are considered to be too large.

A histogram of the residuals, or another similar diagnostic plot, should also be examined in order to assess whether the distributional assumptions are justified.

Whilst a Q-Q plot is produced as an output of the GLM process, there is some controversy over whether this is appropriate for non-normal distributions such as the binomial distribution in the Core Reading example above. Hence it has not been included in the Core Reading.

6.4 Acceptability of a fitted model

In addition to comparing models, statistical tests can be used to determine the acceptability of a particular model, once fitted. Pearson's chi-square test and the likelihood ratio test are typically used. These are described in Chapter 10, Sections 8 and 2 respectively.

The tests for overall fit involve comparing the scaled deviance of the fitted model with the scaled deviance of the null model (with no covariates). The extent by which the fitted model reduces the scaled deviance (per additional parameter estimated) is a measure of how much the fitted model is an improvement on the null model.

Considerable flexibility in the interpretation of the tests based on statistical inference theory is sometimes necessary in order to arrive at a suitable model. Thus, the interpretation of deviances, residuals and significance of parameters given above should be viewed as useful guides in selecting a model, rather than strict rules which must be adhered to.

The chapter summary starts on the next page so that you can keep all the chapter summaries together for revision purposes.
Chapter 13 Summary

Exponential family

There is a wide variety of distributions (normal, Poisson, binomial, gamma and exponential) that have a common form, called the exponential family.

If the distribution of $Y$ is a member of the exponential family, then the density function of $Y$ can be written in the form:

$$f_Y(y; \theta, \varphi) = \exp\left[\frac{y\theta - b(\theta)}{a(\varphi)} + c(y, \varphi)\right]$$

where $\theta$ is the natural parameter which is a function of the mean $\mu = E(Y)$ only of the distribution, and $\varphi$ is a scale parameter.

Where the distribution has two parameters (such as the normal, gamma and binomial distributions), we can take $\varphi$ to be the parameter other than the mean. Where the distribution has one parameter (such as the Poisson and exponential distributions), we can take $\varphi = 1$. However, the parameterisations are not unique.

Mean, variance and variance function

$$E(Y) = b'(\theta) \qquad \text{var}(Y) = a(\varphi)\, b''(\theta)$$

The variance function $V(\mu) = b''(\theta)$ is a function of the mean $\mu = E(Y)$ and gives a measure of how $\text{var}(Y)$ relates to $\mu$.

Generalised linear models (GLMs)

A GLM takes multiple regression one step further by allowing the data to be non-normally distributed. Instead, we can use any of the distributions in the exponential family.

A GLM consists of three components:

1) a distribution for the data (Poisson, exponential, gamma, normal or binomial)
2) a linear predictor (a function of the covariates that is linear in the parameters)
3) a link function (that links the mean of the response variable to the linear predictor).

Maximum likelihood estimation can be used to estimate the values of the parameters in the linear predictor.

Link functions

For each underlying distribution there is one link function that appears more natural to use than any other, usually because it will result in values for $\mu$ that are appropriate to the distribution under consideration.
This is called the canonical link function, which means the accepted link function. The canonical link functions are given on page 27 of the Tables. They are equivalent to the natural parameter $\theta$ from the exponential family formulation of the PDF.

Covariates

A variable (eg age) is a type of covariate whose real numerical value enters the linear predictor directly, and a factor (eg sex) is a type of covariate that takes categorical values.

Linear predictors

Linear predictors are functions of the covariates. They are linear in the parameters and not necessarily in the covariates. The simplest linear predictor is that for the constant model, $\eta = \alpha$, which is used if it is thought that the mean of the response variable is the same for all cases.

An interaction term is used in the predictor when two covariates are believed not to be independent. In other words, the effect of one covariate (eg the age of an individual) is thought to depend on the value of another covariate (eg whether the sex of an individual is male or female).

The dot . notation is used to indicate an interaction, eg age.sex is the interactive term between age and sex. The star * notation is used to indicate the main effects as well as the interaction, eg:

    age*sex = age + sex + age.sex

An interaction (dot) term never appears on its own.

Saturated model

The model that provides the perfect fit to the data is called the saturated model. The saturated model has as many parameters as data points. The fitted values $\hat{\mu}_i$ are equal to the observed values $y_i$. The saturated model is not useful from a predictive point of view. It is, however, a good benchmark against which to compare the fit of other models via the scaled deviance.

Scaled deviance

The scaled deviance (or likelihood ratio) is used to compare the fit of the saturated model with the fit of another model.
The scaled deviance of Model 1 is defined as:

$$SD_1 = 2(\ln L_S - \ln L_1)$$

where $L_S$ is the likelihood of the saturated model. The poorer the fit of Model 1, the bigger the scaled deviance will be.

Comparing models

Where the data are normally distributed, it can be shown that, for two nested models, Models 1 and 2, where Model 1 has $p$ parameters and Model 2 has $p+q$ parameters:

$$SD_1 - SD_2 \sim \chi^2_q$$

For other distributions, the difference in the scaled deviances has an approximate (asymptotic) chi-square distribution with $q$ degrees of freedom.

Alternatively, we can compare the reduction in the AIC of the two models.

The process of selecting explanatory variables

(1) Forward selection. Add the covariate that reduces the AIC the most or causes a significant decrease in the deviance. Continue in this way until adding any more causes the AIC to rise or does not lead to a significant improvement in the deviance. It is usual to consider main effects before interaction terms and linear terms before polynomials.

(2) Backward selection. Start by adding all available covariates and interactions. Then remove covariates one by one starting with the least significant until the AIC reaches a minimum or there is no significant improvement in the deviance, and all the remaining covariates have a statistically significant impact on the response.

Rules for determining the number of parameters in a model

The constant model has 1 parameter.

A model consisting of one main effect that is a variable (eg age) has two parameters (eg $\beta_0$ and $\beta_1$).

A model consisting of one main effect that is a factor (eg sex) has as many parameters as there are categories (eg $\alpha_i$, $i = 1$ (male) and $i = 2$ (female)).

When a new main effect is added to a model (eg age + sex), we add $n - 1$ parameters where $n$ is the number of parameters if the main effect were on its own (eg for age + sex, the number of parameters is 2 + (2 − 1) = 3).
When an interactive effect (a dot term) is added to a model (eg age + sex + age.sex), we add $(m-1)(n-1)$ parameters for the interactive effect (eg for age + sex + age.sex, the number of parameters is 2 + (2 − 1) + (2 − 1)(2 − 1) = 4).

A model consisting of a star term only (eg age*sex) has $mn$ parameters where $m$ and $n$ are the number of parameters if the main effects were on their own (eg for age*sex, the number of parameters is 2 × 2 = 4).

Residuals

A residual is a measure of the difference between the observed values $y_i$ and the fitted values $\hat{\mu}_i$. Two commonly used residuals for GLMs are the Pearson residual and the deviance residual.

Pearson residuals

These are $\dfrac{y - \hat{\mu}}{\sqrt{\text{var}(\hat{\mu})}}$ where $\text{var}(\hat{\mu})$ is $\text{var}(Y)$ with $\mu$ replaced by the corresponding fitted value $\hat{\mu}$.

The Pearson residual, which is often used for normally distributed data, has the disadvantage that its distribution is often skewed for non-normal data. This makes the interpretation of residual plots difficult.

Deviance residuals

These are $\text{sign}(y - \hat{\mu})\, d_i$ where $\sum d_i^2$ is the scaled deviance of the model.

Deviance residuals are usually more likely to be symmetrically distributed and to have approximately normal distributions, and are preferred for actuarial applications.

For normally distributed data, the Pearson and deviance residuals are identical.

Testing whether a parameter is significantly different from zero

As a general rule, we can conclude that a parameter is significantly different from zero if it is at least twice as big in absolute terms as its standard error, ie if:

$$|\hat{\beta}| > 2\, \text{se}(\hat{\beta})$$

Chapter 13 Practice Questions

13.1 Explain why the link function $g(\mu) = \log \mu$ is appropriate for the Poisson distribution by considering the range of values that it results in $\mu$ taking.

13.2 Explain the difference between the two types of covariate: a variable and a factor.

13.3 (Exam style) A random variable $Y$ has density of exponential family form:

$$f(y) = \exp\left[\frac{y\theta - b(\theta)}{a(\varphi)} + c(y, \varphi)\right]$$
(i) State the mean and variance of $Y$ in terms of $b(\theta)$ and its derivatives and $a(\varphi)$. [1]

(ii) (a) Show that an exponentially distributed random variable with mean $\mu$ has a density that can be written in the above form.

(b) Determine the natural parameter and the variance function. [3]

[Total 4]

13.4 (Exam style) An insurer wishes to use a generalised linear model to analyse the claim numbers on its motor portfolio. It has collected the following data on claim numbers $y_i$, $i = 1, 2, \ldots, 35$ from three different classes of policy:

    Class I     1 2 0 2 1 0 2 2 1 0
    Class II    1 0 1 1 0
    Class III   0 0 0 0 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0

For these data values:

$$\sum_{i=1}^{10} y_i = 11 \qquad \sum_{i=11}^{15} y_i = 3 \qquad \sum_{i=16}^{35} y_i = 4$$

The company wishes to use a Poisson model to analyse these data.

(i) Show that the Poisson distribution is a member of the exponential family of distributions. [2]

The insurer decides to use a model (Model A) for which:

$$\log \mu_i = \begin{cases} \alpha & i = 1, 2, \ldots, 10 \\ \beta & i = 11, 12, \ldots, 15 \\ \gamma & i = 16, 17, \ldots, 35 \end{cases}$$

where $\mu_i$ is the mean of the relevant Poisson distribution.

(ii) Derive the likelihood function for this model, and hence find the maximum likelihood estimates for $\alpha$, $\beta$ and $\gamma$. [4]

The insurer now analyses the simpler model $\log \mu_i = \alpha$, for all policies.

(iii) Calculate the maximum likelihood estimate for $\alpha$ under this model (Model B). [2]

(iv) (a) Show that the scaled deviance for Model A is 24.93.

(b) Calculate the scaled deviance for Model B. [5]

It can be assumed that $f(y) = y \log y$ is equal to zero when $y = 0$.

(v) Compare Model A directly with Model B, by calculating an appropriate test statistic. [2]

[Total 15]

13.5 (Exam style) In the context of generalised linear models, consider the exponential distribution with density function $f(x)$, where:

$$f(x) = \frac{1}{\mu} e^{-x/\mu} \qquad (x > 0)$$

(i) Show that $f(x)$ can be written in the form of the exponential family of distributions. [1]

(ii) Show that the canonical link function, $\theta$, is given by $\theta = \dfrac{1}{\mu}$. [1]
(iii) Determine the variance function and the dispersion parameter. [3]

[Total 5]

13.6 (Exam style) The random variable $Z_i$ has a binomial distribution with parameters $n$ and $\mu_i$, where $0 < \mu_i < 1$. A second random variable, $Y_i$, is defined as $Y_i = Z_i / n$.

(i) Show that the distribution of $Y_i$ is a member of the exponential family, stating clearly the natural and scale parameters and their functions $a(\varphi)$, $b(\theta)$ and $c(y, \varphi)$. [4]

(ii) Determine the variance function of $Y_i$. [2]

[Total 6]

13.7 (Exam style) A statistical distribution is said to be a member of the exponential family if its probability function or probability density function can be expressed in the form:

$$f_Y(y; \theta, \varphi) = \exp\left[\frac{y\theta - b(\theta)}{a(\varphi)} + c(y, \varphi)\right]$$

(i) Show that the mean of such a distribution is $b'(\theta)$ and derive the corresponding formula for the variance by differentiating the following expression with respect to $\theta$:

$$\int_y f(y)\, dy = 1$$ [4]

(ii) Use this method to determine formulae for the mean and variance of the gamma distribution with density function:

$$f(x) = \frac{\alpha^\alpha}{\mu^\alpha\, \Gamma(\alpha)}\, x^{\alpha - 1} e^{-\alpha x / \mu} \qquad (x > 0)$$ [3]

[Total 7]

13.8 (Exam style) Independent claim amounts $Y_1, Y_2, \ldots, Y_n$ are modelled as exponential random variables with $E(Y_i) = \mu_i$, $i = 1, 2, \ldots, n$. The fitted values for a particular model are denoted by $\hat{\mu}_i$. Derive an expression for the scaled deviance. [5]

13.9 (Exam style) A small insurer wishes to model its claim costs for motor insurance using a simple generalised linear model based on the three factors:

    YO_i   i = 1 for 'young' drivers,  i = 0 for 'old' drivers
    FS_j   j = 1 for 'fast' cars,      j = 0 for 'slow' cars
    TC_k   k = 1 for 'town' areas,     k = 0 for 'country' areas

The insurer is considering three possible models for the linear predictor:

    Model 1: YO + FS + TC
    Model 2: YO + FS + YO.FS + TC
    Model 3: YO * FS * TC

(i) Write each of these models in parameterised form, stating how many non-zero parameter values are present in each model.
[6] (ii) Explain why Model 1 might not be appropriate and why the insurer may wish to avoid using Model 3. [2] The student fitting the modelshas saidWe are assuming a normal error structure and weare usingthe canonical link function. (iii) Explain what this The Actuarial Education Company means. [3] IFE: 2022 Examination Page 62 CS1-13: The table below shows the students calculated Generalised linear models values of the scaled deviance for these three modelsandthe constant model. Scaled Model Deviance 1 YO (iv) 50 FS TC++ 5 YO 0 (a) 7 10 FS++ YO.FS + TC YO Degrees offreedom **FS TC Complete the table by filling in the missing entries in the degrees of freedom column. (b) 13.10 The following Carry out the calculations necessaryto determine which model would bethe mostappropriate. [5] [Total 16] study was carried out into the mortality of leukaemia sufferers. A white blood cell count wastaken from each of 17 patients and their survival times wererecorded. Exam style Supposethat iY represents the survival time (in weeks)of the thi patient andix logarithm (to the base 10) ofthe thi patients initial represents the white blood cell count ( i = 1,2, ?,17 ). Theresponse variablesiY are assumedto be exponentially distributed. A possible specification for ()iEY is EYii)xa ()=+ exp( (i) (ii) . This will ensure that Write down the natural link function associated with the linear predictor ? =+ iix. [2] a Usethis link function andlinear predictor to derive the equations that mustbe solved in order to obtain the maximumlikelihood estimates of a and . [4] The maximumlikelihood estimate of estimated standard error 1.655. (iii) is non-negative for all values of ix . ()iEY a derived from the experimental datais Construct an approximate 95%confidence interval for Thefollowing two a 8.477= , with a and interpret this result. 
[2] modelsare now to be compared: Model 1: ()iEY a= Model2: ()iiEY=+ax The scaled deviancefor Model 1is found to be 26.282 and the scaled deviance for Model 2is 19.457. (iv) Test the null hypothesis that 0= against the alternative hypothesis that 0? stating any conclusions clearly. [3] [Total 11] IFE: 2022 Examinations The Actuarial Education Compan CS1-13: Generalised linear models Page 63 Chapter13Solutions 13.1 When weset the link function make the subject, weget for a 13.2 ()Poidistribution () log= equal to the linear predictor ? e?= . Thisresults in positive values only for where and then invert to , whichis sensible is defined to be greater than 0. Avariable is atype of covariate (eg age) whoseactual numerical value enters the linear predictor directly, 13.3 g and a factor is a type of covariate (eg sex) that takes categorical values. This questionis taken from Subject 106, April 2003, Question 3. (i) Mean and variance Wehave: EY [](== (ii)(a) )''f b ( ?) var[] Y [1] a b) ?' ( Exponential form The PDFof the exponential distribution with mean ( ) fy 1 is: y?? exp =-?? ?? This can be written as an exponential: fy=-()exp ln 1 y?? [1/2] ?? ?? Comparing this to the standard form givenin part (i), wecan define: =- ,(?f ) =1,( ab ? ) =- ln 11 = -ln( - ?), cy( f) = 0 [1] , (ii)(b) Natural parameter and variance function The natural parameter is ?, so here the natural parameter is: 1 [1/2] - The variance function is (by definition) ()=- bb()''' ?? ? ?'' ()b, so here wefind: 11 = ? 2 = 2 [1] [Total 3] The Actuarial Education Company IFE: 2022 Examination Page 64 13.4 (i) CS1-13: Exponential models family For the Poisson distribution, f () =ye Generalised linear wehave: y / y! - We wishto writethis in the form: yb?? () (gy=+( ) exp ??- c y,f) ?? ?? a()f Rearranging the Poissonformula: fyy () ylog exp - 1 ?? =-log ! ?? [1] ?? Wecan see that this hasthe correct form with: log? (ii) ( ? )== be a(f ) ? = = f = 1 fc( y, ) =- log y! 
[1]

(ii) Maximum likelihood estimates

Using the rearranged form for the Poisson distribution from part (i), we see that the log of the likelihood function can be written:

$$\log L = \sum_{i=1}^{35} \left(y_i \log \mu_i - \mu_i\right) - \sum_{i=1}^{35} \log y_i! \qquad (*)$$

This now becomes, for Model A:

$$\log L = \alpha \sum_{i=1}^{10} y_i + \beta \sum_{i=11}^{15} y_i + \gamma \sum_{i=16}^{35} y_i - 10e^{\alpha} - 5e^{\beta} - 20e^{\gamma} - \sum_{i=1}^{35} \log y_i!$$ [1]

$$= 11\alpha + 3\beta + 4\gamma - 10e^{\alpha} - 5e^{\beta} - 20e^{\gamma} - \sum_{i=1}^{35} \log y_i! \qquad (**)$$

Differentiating this log-likelihood function in turn with respect to $\alpha$, $\beta$ and $\gamma$, we get:

$\frac{\partial \log L}{\partial \alpha} = 11 - 10e^{\alpha}$ [1/2]

$\frac{\partial \log L}{\partial \beta} = 3 - 5e^{\beta}$ [1/2]

and:

$\frac{\partial \log L}{\partial \gamma} = 4 - 20e^{\gamma}$ [1/2]

Setting each of these expressions equal to zero in turn, we find that:

$\hat{\alpha} = \log 1.1 = 0.09531$ [1/2]

$\hat{\beta} = \log 0.6 = -0.51083$ [1/2]

and:

$\hat{\gamma} = \log 0.2 = -1.60944$ [1/2]

These are the maximum likelihood estimates for $\alpha$, $\beta$ and $\gamma$.

(iii) Simpler model

In this case the log-likelihood function reduces to:

$$\log L = \sum_{i=1}^{35} \left(y_i \alpha - e^{\alpha}\right) - \sum_{i=1}^{35} \log y_i! = 18\alpha - 35e^{\alpha} - \sum_{i=1}^{35} \log y_i! \qquad (***)$$ [1]

Differentiating this with respect to $\alpha$, and setting the result equal to zero, we find that:

$$18 - 35e^{\alpha} = 0 \;\Rightarrow\; \hat{\alpha} = \log\left(\frac{18}{35}\right) = -0.66498$$ [1]

(iv)(a) Scaled deviance for Model A

The scaled deviance for Model A is given by:

$$\text{Scaled Deviance} = 2(\log L_S - \log L_A)$$

where $\log L_S$ is the value of the log-likelihood function for the saturated model, and $\log L_A$ is the value of the log-likelihood function for Model A.

For the saturated model, we replace the $\mu_i$'s with the $y_i$'s in Equation (*). So:

$$\log L_S = \sum y_i \log y_i - \sum y_i - \sum \log y_i! = 4 \times 2\log 2 - 18 - 4\log 2 = 4\log 2 - 18 = -15.2274$$ [1]

We use the hint in the question here. $y_i \log y_i$ is zero when $y = 0$, and also when $y = 1$. So the only contribution to the first term is when $y = 2$, giving 4 lots of $2\log 2$.

For the log-likelihood for Model A, we replace the parameters $\alpha$, $\beta$ and $\gamma$ with their estimates $\hat{\alpha}$, $\hat{\beta}$ and $\hat{\gamma}$ in Equation (**):

$$\log L_A = 11\hat{\alpha} + 3\hat{\beta} + 4\hat{\gamma} - 10e^{\hat{\alpha}} - 5e^{\hat{\beta}} - 20e^{\hat{\gamma}} - \sum_{i=1}^{35} \log y_i!$$
$= 11\log 1.1 + 3\log 0.6 + 4\log 0.2 - 11 - 3 - 4 - 4\log 2 = -27.6944$ [1]

The corresponding value for $\log L_A$ without the final term is $-24.9218$.

So the scaled deviance is twice the difference in the log-likelihoods:

$$\text{Scaled Deviance} = 2(\log L_S - \log L_A) = 2\left((-15.2274) - (-27.6944)\right) = 24.93$$ [1]

(iv)(b) Scaled deviance for Model B

We now repeat the process for Model B.

Using Equation (***), the log-likelihood for Model B is:

$$\log L_B = 18\hat{\alpha} - 35e^{\hat{\alpha}} - \sum_{i=1}^{35} \log y_i! = 18\log\left(\frac{18}{35}\right) - 18 - 4\log 2 = -32.7422$$ [1]

The value without the final term is $-29.9696$.

The scaled deviance is again twice the difference in the log-likelihoods:

$$\text{Scaled Deviance} = 2(\log L_S - \log L_B) = 2\left((-15.2274) - (-32.7422)\right) = 35.03$$ [1]

(v) Comparing A with B

We can use the chi-squared distribution to compare Model A with Model B. We calculate the difference in the scaled deviances (which is just $2(\log L_A - \log L_B)$):

$35.03 - 24.93 = 10.10$ [1]

This should have a chi-squared distribution with $3 - 1 = 2$ degrees of freedom, which has a critical value at the upper 5% level of 5.991.

Our value is significant here, since $10.10 > 5.991$, so this suggests that Model A is a significant improvement over Model B. We prefer Model A here. [1]

13.5 This is Subject 106, September 2000, Question 2.

(i) Exponential family

We need to express the density function in the form $\exp\left\{\frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi)\right\}$.

We can write the density function as:

$$f(y) = \exp\left(-\frac{y}{\mu} - \log \mu\right)$$

So if we let:

$\theta = -\frac{1}{\mu}$, $b(\theta) = \log \mu = -\log(-\theta)$, $\phi = 1$, $a(\phi) = 1$, $c(y, \phi) = 0$

then the density function will have the appropriate form.

(ii) Canonical link function

We see from part (i) that $\theta = -1/\mu$. [1]

(iii) Variance function and dispersion parameter

The variance function is $b''(\theta)$. Differentiating $b(\theta)$ from part (i) twice, we find that $b''(\theta) = 1/\theta^2 = \mu^2$.
So the variance function is $\mu^2$. [2]

The dispersion parameter or scale parameter is $\phi = 1$. [1]

13.6 (i) Show $Y_i$ is a member of the exponential family

The PF of $Z_i$ is:

$$f(z_i) = \binom{n}{z_i} \mu_i^{z_i} (1 - \mu_i)^{n - z_i}$$

The PF of $Y_i$ can be obtained by replacing $z_i$ with $ny_i$:

$$f(y_i) = \binom{n}{ny_i} \mu_i^{ny_i} (1 - \mu_i)^{n - ny_i}$$ [1/2]

This can be written as:

$$f(y_i) = \exp\left\{ny_i \ln \mu_i + n(1 - y_i)\ln(1 - \mu_i) + \ln\binom{n}{ny_i}\right\}$$

$$= \exp\left\{ny_i \ln\left(\frac{\mu_i}{1 - \mu_i}\right) + n\ln(1 - \mu_i) + \ln\binom{n}{ny_i}\right\}$$ [1/2]

$$= \exp\left\{\frac{y_i \ln\left(\frac{\mu_i}{1 - \mu_i}\right) + \ln(1 - \mu_i)}{1/n} + \ln\binom{n}{ny_i}\right\}$$ [1]

Comparing this to the expression on page 27 of the Tables, we see that the natural parameter is:

$$\theta_i = \ln\left(\frac{\mu_i}{1 - \mu_i}\right)$$

Rearranging this gives $\mu_i = \frac{e^{\theta_i}}{1 + e^{\theta_i}}$, so the function $b(\theta_i)$ is given by:

$$b(\theta_i) = -\ln(1 - \mu_i) = -\ln\left(1 - \frac{e^{\theta_i}}{1 + e^{\theta_i}}\right) = -\ln\left(\frac{1}{1 + e^{\theta_i}}\right) = \ln\left(1 + e^{\theta_i}\right)$$ [1]

The scale parameter and its functions are:

$\phi = n$, $a(\phi) = \frac{1}{\phi}$, $c(y_i, \phi) = \ln\binom{n}{ny_i} = \ln\binom{\phi}{\phi y_i}$ [1]

(ii) The variance function

The variance function is given by $\text{var}(\mu) = b''(\theta)$. Differentiating $b(\theta)$ gives:

$$b'(\theta_i) = \frac{e^{\theta_i}}{1 + e^{\theta_i}}$$ [1/2]

$$b''(\theta_i) = \frac{e^{\theta_i}(1 + e^{\theta_i}) - e^{\theta_i} e^{\theta_i}}{(1 + e^{\theta_i})^2} = \frac{e^{\theta_i}}{(1 + e^{\theta_i})^2}$$ [1/2]

Substituting in $\theta_i = \ln\left(\frac{\mu_i}{1 - \mu_i}\right)$ gives:

$$b''(\theta_i) = \frac{\frac{\mu_i}{1 - \mu_i}}{\left(1 + \frac{\mu_i}{1 - \mu_i}\right)^2} = \frac{\mu_i}{1 - \mu_i}(1 - \mu_i)^2 = \mu_i(1 - \mu_i)$$ [1]
[Total 6]

13.7 (i) Derive mean and variance

Mean

Differentiating both sides with respect to $\theta$ gives:

$$\int_y \frac{y - b'(\theta)}{a(\phi)} f(y)\, dy = 0$$ [1]

Simplifying:

$$\frac{1}{a(\phi)} \int_y y f(y)\, dy - \frac{b'(\theta)}{a(\phi)} \int_y f(y)\, dy = 0$$

Since $\int_y y f(y)\, dy = E(Y)$ and $\int_y f(y)\, dy = 1$, we have:

$$\frac{E(Y)}{a(\phi)} - \frac{b'(\theta)}{a(\phi)} = 0$$ [1/2]

Hence:

$$E(Y) - b'(\theta) = 0 \;\Rightarrow\; E(Y) = b'(\theta)$$ [1/2]

Variance

Using the product rule to differentiate the above equation with respect to $\theta$ gives:

$$\int_y \left\{-\frac{b''(\theta)}{a(\phi)} f(y) + \left[\frac{y - b'(\theta)}{a(\phi)}\right]^2 f(y)\right\} dy = 0$$ [1]
Splitting this into two separate integrals gives:

$$\frac{1}{[a(\phi)]^2} \int_y \left(y - b'(\theta)\right)^2 f(y)\, dy - \frac{b''(\theta)}{a(\phi)} \int_y f(y)\, dy = 0$$

Since $E(Y) = b'(\theta)$ then $\int_y \left(y - b'(\theta)\right)^2 f(y)\, dy = \text{var}(Y)$. Again $\int_y f(y)\, dy = 1$, so we have:

$$\frac{\text{var}(Y)}{[a(\phi)]^2} - \frac{b''(\theta)}{a(\phi)} = 0$$ [1/2]

Rearranging gives:

$$\text{var}(Y) = a(\phi)\, b''(\theta)$$ [1/2]

(ii) Mean and variance of gamma

The log of the PDF given is:

$$\log f(x) = \alpha(\log \alpha - \log \mu) - \log \Gamma(\alpha) + (\alpha - 1)\log x - \frac{\alpha x}{\mu}$$

which can be written as:

$$f(x) = \exp\left\{\frac{-\frac{x}{\mu} - \log \mu}{1/\alpha} + \alpha \log \alpha - \log \Gamma(\alpha) + (\alpha - 1)\log x\right\}$$

This conforms to the definition of the exponential family, with:

$\theta = -\frac{1}{\mu}$, $\phi = \alpha$, $a(\phi) = \frac{1}{\phi}$, $b(\theta) = -\log(-\theta)$, $c(x, \phi) = \alpha \log \alpha - \log \Gamma(\alpha) + (\alpha - 1)\log x$ [1]

Applying the results in (i):

$$E(X) = b'(\theta) = -\frac{1}{\theta} = \mu$$ [1]

and:

$$\text{var}(X) = a(\phi)\, b''(\theta) = \frac{1}{\phi} \cdot \frac{1}{\theta^2} = \frac{\mu^2}{\alpha}$$ [1]

13.8 The scaled deviance is given by:

$$\text{scaled deviance} = 2[\ln L_S - \ln L_M]$$ [1/2]

where $L_S$ is the likelihood function for the saturated model, and $L_M$ is the likelihood function for the fitted model.

First we need the log-likelihoods:

$$L(\mu) = f(y_1) \times \cdots \times f(y_n) = \frac{1}{\mu_1} e^{-y_1/\mu_1} \times \cdots \times \frac{1}{\mu_n} e^{-y_n/\mu_n}$$ [1]

Taking logs:

$$\ln L(\mu) = -\sum_i \ln \mu_i - \sum_i \frac{y_i}{\mu_i}$$ [1/2]

So the log-likelihood of the fitted model, $\ln L_M$, is given by:

$$\ln L_M = -\sum_i \ln \hat{\mu}_i - \sum_i \frac{y_i}{\hat{\mu}_i}$$ [1]

For the log-likelihood of the saturated model, $\ln L_S$, the fitted values, $\hat{\mu}_i$, are the observed values, $y_i$. Hence:

$$\ln L_S = -\sum_i \ln y_i - \sum_i \frac{y_i}{y_i} = -\sum_i (\ln y_i + 1)$$ [1]

So the scaled deviance is:

$$\text{scaled deviance} = 2\left\{-\sum_i (\ln y_i + 1) + \sum_i \left(\ln \hat{\mu}_i + \frac{y_i}{\hat{\mu}_i}\right)\right\} = 2\sum_i \left\{\frac{y_i}{\hat{\mu}_i} - 1 - \ln\frac{y_i}{\hat{\mu}_i}\right\}$$ [1]

13.9 (i) Parameterised form

In parameterised form, the linear predictors are (with $i$, $j$ and $k$ corresponding to the levels of YO, FS and TC respectively):

Model 1: $\alpha_i + \beta_j + \gamma_k$ (4 parameters)
There is one parameter to set the base level for the combination $YO_0, FS_0, TC_0$ and one additional parameter for each of the higher levels of the three factors.

Model 2: $\alpha_{ij} + \gamma_k$ (5 parameters)

There are four parameters for the $2 \times 2$ combinations of YO and FS (assuming $TC_0$) and one additional parameter for the higher level of TC.

Model 3: $\alpha_{ijk}$ (8 parameters)

There are eight parameters for the $2 \times 2 \times 2$ combinations of YO, FS and TC. [2 for each model]

(ii) Problems with Model 1 and Model 3

Model 1 does not allow for the possibility that there may be interactions (correlations) between some of the factors. For example, it may be the case that young drivers tend to drive fast cars and to live in towns. [1]

With Model 3, which is a saturated model, it would be possible to fit the average values for each group exactly, ie there are no degrees of freedom left. This defeats the purpose of applying a statistical model, as it would not smooth out any anomalous results. [1]

The problem referred to with Model 3 corresponds to the idea of undergraduation in Subject CS2.

(iii) Explaining normal error structure and canonical link function

Normal error structure means that the randomness present in the observed values in each category (eg young/fast/town) is assumed to follow a normal distribution. [1]

The link function is the function that connects the mean response to the linear predictor, $g(\mu) = \eta$; inverting it gives the predicted values. Associated with each type of error structure is a canonical or natural link function. In the case of a normal error structure, the canonical link function is the identity function, $g(\mu) = \mu$. [2]

(iv)(a) Completed table

The completed table, together with the differences in the scaled deviance and degrees of freedom, is shown below.

Model | Scaled deviance | DF | Δ scaled deviance | Δ DF
Constant: 1 | 50 | 7 | |
Model 1: YO + FS + TC | 10 | 4 | 40 | 3
Model 2: YO + FS + YO.FS + TC | 5 | 3 | 5 | 1
Model 3: YO*FS*TC | 0 | 0 | 5 | 3
[3]

(iv)(b) Compare models

Comparing the constant model and Model 1

The difference in the scaled deviances is 40. This is greater than 7.815, the upper 5% point of the $\chi^2_3$ distribution. So Model 1 is a significant improvement over the constant model. [1/2]

Alternatively, if we use the AIC to compare models, we find that since $\Delta(\text{deviance}) > 2 \times \Delta(\text{parameters})$, Model 1 is a significant improvement over the constant model.

Comparing Model 1 and Model 2

The difference in the scaled deviances is 5. This is greater than 3.841, the upper 5% point of the $\chi^2_1$ distribution. So Model 2 is a significant improvement over Model 1. [1/2]

Alternatively, if we use the AIC to compare models, we find that since $\Delta(\text{deviance}) > 2 \times \Delta(\text{parameters})$, Model 2 is a significant improvement over Model 1.

Comparing Model 2 and Model 3

The difference in the scaled deviances is 5. This is less than 7.815, the upper 5% point of the $\chi^2_3$ distribution. So Model 3 is not a significant improvement over Model 2. [1/2]

Alternatively, if we use the AIC to compare models, we find that since $\Delta(\text{deviance}) < 2 \times \Delta(\text{parameters})$ (ie $5 < 6$), Model 3 is not a significant improvement over Model 2.

So Model 2 is the most appropriate in this case. [1/2]

13.10 (i) Natural link function

Using the linear predictor $\eta_i = \alpha + \beta x_i$, we have $E(Y_i) = \mu_i = e^{\eta_i}$. So $\eta_i = g(\mu_i) = \ln \mu_i$ is the natural link function. [2]

(ii) Equations

Assuming that $Y_i \sim Exp(\lambda_i)$, we have the following likelihood function:

$$L = \prod_{i=1}^{17} f(y_i) = \prod_{i=1}^{17} \lambda_i e^{-\lambda_i y_i}$$

Taking logs:

$$\ln L = \sum_{i=1}^{17} \ln \lambda_i - \sum_{i=1}^{17} \lambda_i y_i = -\sum_{i=1}^{17} (\alpha + \beta x_i) - \sum_{i=1}^{17} y_i e^{-(\alpha + \beta x_i)}$$

since $\frac{1}{\lambda_i} = E(Y_i) = \mu_i = e^{\alpha + \beta x_i}$.
[1]

Differentiating with respect to $\alpha$ gives:

$$\frac{\partial \ln L}{\partial \alpha} = -17 + \sum_{i=1}^{17} y_i e^{-(\alpha + \beta x_i)}$$ [1]

and differentiating with respect to $\beta$ gives:

$$\frac{\partial \ln L}{\partial \beta} = -\sum_{i=1}^{17} x_i + \sum_{i=1}^{17} x_i y_i e^{-(\alpha + \beta x_i)}$$ [1]

Setting these expressions equal to 0, we obtain:

$$\sum_{i=1}^{17} y_i e^{-(\alpha + \beta x_i)} = 17$$

$$\sum_{i=1}^{17} x_i y_i e^{-(\alpha + \beta x_i)} = \sum_{i=1}^{17} x_i$$ [1]

(iii) Confidence interval

An approximate 95% confidence interval for $\alpha$ is:

$$\hat{\alpha} \pm 1.96\, se(\hat{\alpha}) = 8.477 \pm (1.96 \times 1.655) = 8.477 \pm 3.244 = (5.233, 11.721)$$ [1]

Since this confidence interval does not contain zero, it is reasonable to assume that the parameter is non-zero and should be kept. This is equivalent to the significance of a parameter test: $|\hat{\alpha}| > 2\, se(\hat{\alpha})$. [1]

(iv) Test

Comparing models

We are testing $H_0: \beta = 0$ against $H_1: \beta \neq 0$. The test statistic is:

$$\Delta\text{dev} = 26.282 - 19.457 = 6.825$$ [1]

Comparing with $\chi^2_1$, we find that the value of the test statistic exceeds the upper 1% point (6.635) of this distribution.

We therefore reject the null hypothesis and conclude that Model 2 significantly reduces the scaled deviance (ie it is a significantly better fit to the data), and that survival time is dependent on initial white blood cell count. [2]
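The arithmetic in parts (iii) and (iv) of Solution 13.10 can be checked numerically. The following is a minimal sketch rather than part of the course material: the function names are illustrative assumptions, and the tail probability relies on the identity $P(\chi^2_1 > x) = \text{erfc}(\sqrt{x/2})$, which holds for one degree of freedom only.

```python
import math


def wald_interval(estimate, std_error, z=1.96):
    """Approximate 95% confidence interval: estimate +/- z * standard error."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


def chi2_1df_upper_tail(x):
    """P(chi-squared with 1 degree of freedom > x); not valid for other df."""
    return math.erfc(math.sqrt(x / 2.0))


# Figures quoted in the question
lower, upper = wald_interval(8.477, 1.655)
deviance_drop = 26.282 - 19.457
p_value = chi2_1df_upper_tail(deviance_drop)

print(f"95% CI for alpha: ({lower:.3f}, {upper:.3f})")
print(f"Drop in scaled deviance: {deviance_drop:.3f}, p-value: {p_value:.4f}")
```

A p-value below 0.01 corresponds to the test statistic exceeding the upper 1% point of the chi-squared distribution quoted in the solution.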
CS1-14: Bayesian statistics

Bayesian statistics

Syllabus objectives

5.1 Explain the fundamental concepts of Bayesian statistics and use these concepts to calculate Bayesian estimates.
5.1.1 Use Bayes' theorem to calculate simple conditional probabilities.
5.1.2 Explain what is meant by a prior distribution, a posterior distribution and a conjugate prior distribution.
5.1.3 Derive the posterior distribution for a parameter in simple cases.
5.1.4 Explain what is meant by a loss function.
5.1.5 Use simple loss functions to derive Bayesian estimates of parameters.
5.1.6 Derive credible intervals in simple cases.

0 Introduction

Earlier in this course we looked at the classical approach to statistical estimation, when we introduced the method of maximum likelihood and the method of moments. There we assumed that the parameters to be estimated were fixed quantities. In this chapter we describe the Bayesian approach. This will also be used in Chapter 15.

The Bayesian philosophy involves a completely different approach to statistics, compared to classical statistical methods. The Bayesian version of estimation is considered here for the basic situation concerning the estimation of a parameter given a random sample from a particular distribution. Classical estimation involves the method of maximum likelihood.

The fundamental difference between Bayesian and classical methods is that the parameter $\theta$ is considered to be a random variable in Bayesian methods.

We might have some knowledge about the likely value of $\theta$ and we can represent this using a distribution. For example, we might believe that $\theta$ is equally likely to take any of the values 1, 2 or 3. In this respect, we can treat $\theta$ as a random variable.

In classical statistics, $\theta$ is a fixed but unknown quantity. This leads to difficulties such as the careful interpretation required for classical confidence intervals, where it is the interval that is random. As soon as the data are observed and a numerical interval is calculated, there is no probability involved. A statement such as $P(10.45 < \theta < 13.26) = 0.95$ cannot be made because $\theta$ is not a random variable.

In classical statistics $\theta$ either lies within the interval or it does not. There can be no probability associated with such a statement.

In Bayesian statistics no such difficulties arise and probability statements can be made concerning the values of a parameter $\theta$.

In fact, we can calculate a Bayesian confidence interval for a parameter, which we call a credible interval. We cover this in Section 5.

Another advantage of Bayesian statistics is that it enables us to make use of any information that we already have about the situation under investigation. Often researchers investigating an unknown population parameter have information available from other sources in advance of their study. This information might provide a strong indication of what values the parameter is likely to take. The classical statistical approach offers no scope for researchers to take this additional information into account. The Bayesian approach, however, does allow for the use of this information.

For example, suppose that an insurance company is reviewing its premium rates for a particular type of policy and has access to results from other insurers, as well as from its own policyholders. This information from other insurers cannot be taken into account directly because the terms and conditions of the policies for other companies may be slightly different. However, this additional information might be very useful, and hence should not be ignored.

1 Bayes' theorem

If $B_1, B_2, \ldots, B_k$ constitute a partition of a sample space $S$ and $P(B_i) \neq 0$ for $i = 1, 2, \ldots, k$, then for any event $A$ in $S$ such that $P(A) \neq 0$:

$$P(B_r \mid A) = \frac{P(A \mid B_r)\, P(B_r)}{P(A)} \quad \text{for } r = 1, 2, \ldots, k$$
where $P(A) = \sum_{i=1}^{k} P(A \mid B_i)\, P(B_i)$.

A partition of a sample space is a collection of events that are mutually exclusive and exhaustive, ie they do not overlap and they cover the entire range of possible outcomes.

The result above is known as Bayes' theorem (or Bayes' formula) and is given on page 5 of the Tables. It follows easily from the result:

$$P(A \cap B) = P(A)\, P(B \mid A)$$

which rearranges to give the conditional probability formula:

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$

However:

$$P(A \cap B) = P(B \cap A) = P(B)\, P(A \mid B)$$

Now, replacing $B$ by $B_r$, we have:

$$P(B_r \mid A) = \frac{P(B_r \cap A)}{P(A)} = \frac{P(A \mid B_r)\, P(B_r)}{P(A)}$$

and, from the law of total probability:

$$P(A) = \sum_i P(A \mid B_i)\, P(B_i)$$

Bayes' formula allows us to turn round a conditional probability, ie it allows us to calculate $P(B \mid A)$ if we know $P(A \mid B)$.

Question

Three manufacturers supply clothing to a retailer. 60% of the stock comes from Manufacturer 1, 30% from Manufacturer 2 and 10% from Manufacturer 3. 10% of the clothing from Manufacturer 1 is faulty, 5% from Manufacturer 2 is faulty and 15% from Manufacturer 3 is faulty.

What is the probability that a faulty garment comes from Manufacturer 3?

Solution

Let $A$ be the event that a garment is faulty and let $B_i$ be the event that the garment comes from Manufacturer $i$.

Substituting the figures into the formula for Bayes' theorem:

$$P(B_3 \mid A) = \frac{(0.15)(0.1)}{(0.1)(0.6) + (0.05)(0.3) + (0.15)(0.1)} = \frac{0.015}{0.09} = 0.167$$

Although Manufacturer 3 supplies only 10% of the garments to the retailer, nearly 17% of the faulty garments come from that manufacturer.

An alternative way of tackling this question is to draw a tree diagram. There are 3 manufacturers so we start with 3 branches in our tree and mark on the associated probabilities:

[Tree diagram: three branches from the root, to $B_1$ (probability 0.6), $B_2$ (0.3) and $B_3$ (0.1).]

Each garment is either faulty (event $A$) or perfect (event $A'$).
These outcomes and their (conditional) probabilities are now added to the diagram:

[Tree diagram: from $B_1$ (0.6), branches to $A$ (0.1) and $A'$ (0.9); from $B_2$ (0.3), branches to $A$ (0.05) and $A'$ (0.95); from $B_3$ (0.1), branches to $A$ (0.15) and $A'$ (0.85).]

The required probability is:

$$P(B_3 \mid A) = \frac{P(B_3 \cap A)}{P(A)}$$

From the diagram we can see that $P(B_3 \cap A) = 0.1 \times 0.15 = 0.015$. (This is obtained by multiplying the appropriate branch weights.)

We can also see that there are three ways in which event $A$ can occur. Since these are mutually exclusive, we can calculate $P(A)$ by summing the three associated probabilities. Hence:

$$P(A) = (0.6 \times 0.1) + (0.3 \times 0.05) + (0.1 \times 0.15) = 0.09$$

and it follows that:

$$P(B_3 \mid A) = \frac{0.015}{0.09} = 0.167$$

as before.

Bayes' theorem can be adapted to deal with continuous random variables. If $X$ and $Y$ are continuous, then the conditional PDF of $Y$ given $X$ is:

$$f_{Y|X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)} = \frac{f_{X|Y}(x \mid y)\, f_Y(y)}{f_X(x)}$$

where:

$$f_X(x) = \int_y f_{X,Y}(x, y)\, dy = \int_y f_{X|Y}(x \mid y)\, f_Y(y)\, dy$$

2 Prior and posterior distributions

Suppose $X = (X_1, X_2, \ldots, X_n)$ is a random sample from a population specified by the density or probability function $f(x; \theta)$ and it is required to estimate $\theta$.

Recall that a random sample is a set of IID random variables. Here the Core Reading is using the letter $f$ for both the density function of a continuous distribution and the probability function of a discrete distribution.

As a result of the parameter $\theta$ being a random variable, it will have a distribution. This allows the use of any knowledge available about possible values for $\theta$ before the collection of any data. This knowledge is quantified by expressing it as the prior distribution of $\theta$.

The prior distribution summarises what we know about $\theta$ before we collect any data from the relevant population.

Then after collecting appropriate data, the posterior distribution of $\theta$ is determined, and this forms the basis of all inference concerning $\theta$.
The Bayesian approach combines the sample data with the prior distribution. The conditional distribution of $\theta$ given the observed data is called the posterior distribution of $\theta$.

2.1 Notation

As $\theta$ is a random variable, it should really be denoted by the capital $\Theta$, and its prior density written as $f_{\Theta}(\theta)$. However, for simplicity no distinction will be made between $\Theta$ and $\theta$, and the density will simply be denoted by $f(\theta)$. Note that referring to a density here implies that $\theta$ is continuous. In most applications this will be the case, as even when $X$ is discrete (like the binomial or Poisson), the parameter ($p$ or $\lambda$) will vary in a continuous space ($(0,1)$ or $(0,\infty)$, respectively).

Also the population density or probability function will be denoted by $f(x \mid \theta)$ rather than the earlier $f(x; \theta)$ as it represents the conditional distribution of $X$ given $\theta$.

The prior and posterior distributions of $\theta$ always have the same support (or domain). In other words, the set of possible values of $\theta$ is the same for both its prior and posterior distributions. So, if the prior distribution is continuous, then the posterior distribution is also continuous. Similarly, if the prior distribution is discrete, then the posterior distribution is also discrete.

Suppose, for example, that the parameter $\theta$ must take a value between 0 and 1. We might use a beta distribution as the prior distribution (as a beta random variable must take a value between 0 and 1). The posterior PDF of $\theta$ is then also non-zero for values of $\theta$ in the interval $(0,1)$ only.

2.2 Continuous prior distributions

Suppose that $X$ is a random sample from a population specified by $f(x \mid \theta)$ and that $\theta$ has the prior density $f(\theta)$.

In other words, $X_1, \ldots, X_n$ is a set of IID random variables whose distribution depends on the value of $\theta$. Each of these random variables has PDF $f(x \mid \theta)$.

Determining the posterior density

The posterior density of $\theta \mid X$
is determined by applying the basic definition of a conditional density:

$$f(\theta \mid X) = \frac{f(\theta, X)}{f(X)} = \frac{f(X \mid \theta)\, f(\theta)}{f(X)}$$

Note that $f(X) = \int f(X \mid \theta)\, f(\theta)\, d\theta$. This result is like a continuous version of Bayes' theorem.

We saw this result at the end of Section 1.

A useful way of expressing the posterior density is to use proportionality. $f(X)$ does not involve $\theta$ and is just the constant needed to make it a proper density that integrates to unity, so:

$$f(\theta \mid X) \propto f(X \mid \theta)\, f(\theta)$$

This formula is given on page 28 of the Tables.

Also, $f(X \mid \theta)$, being the joint density of the sample values, is none other than the likelihood, making the posterior proportional to the product of the likelihood and the prior.

This idea is really the key to answering questions involving continuous prior distributions. The formula for the posterior PDF can also be expressed as follows:

$$f_{post}(\theta) = C \times f_{prior}(\theta) \times L$$

where:
$f_{prior}(\theta)$ is the prior PDF of $\theta$
$f_{post}(\theta)$ is the posterior PDF of $\theta$
$L$ is the likelihood function obtained from the sample data
$C$ is a constant that makes the posterior PDF integrate to 1.

Question

The annual number of claims arising from a particular group of policies follows a Poisson distribution with mean $\mu$. The prior distribution of $\mu$ is exponential with mean 30.

In the previous two years, the numbers of claims arising from the group were 28 and 26, respectively.

Determine the posterior distribution of $\mu$.

Solution

We are told that $\mu$ has an exponential distribution with a mean of 30. So $\mu \sim Exp(1/30)$ and the prior PDF of $\mu$ is:

$$f_{prior}(\mu) = \frac{1}{30} e^{-\mu/30}, \quad \mu > 0$$

Let $X_j$ represent the number of claims in year $j$. Then $X_j \sim Poisson(\mu)$ and the likelihood function obtained from the sample data is:

$$L(\mu) = P(X_1 = 28)\, P(X_2 = 26) = \frac{e^{-\mu} \mu^{28}}{28!} \times \frac{e^{-\mu} \mu^{26}}{26!} = C e^{-2\mu} \mu^{54}$$

where $C$ is a constant.

Combining the prior distribution and the sample data, we see that the posterior PDF of $\mu$ is:

$$f_{post}(\mu) = K e^{-61\mu/30} \mu^{54}, \quad \mu > 0$$

for some constant $K$.
Comparing this with the formula for the gamma PDF given on page 12 of the Tables, we see that the posterior distribution of $\mu$ is $Gamma\left(55, \frac{61}{30}\right)$.

The table in Section 4 of this chapter shows the posterior distribution for some common combinations of likelihoods and prior distributions. You do not need to learn this table. However, you should check that you can derive some of the results in the table, working along the lines shown in the solution above.

Conjugate priors

For a given likelihood, if the prior distribution leads to a posterior distribution belonging to the same family as the prior distribution, then this prior is called the conjugate prior for this likelihood.

The likelihood function determines which family of distributions will lead to a conjugate pair, ie a prior and posterior distribution that come from the same family.

Conjugate distributions can be found by selecting a family of distributions that has the same algebraic form as the likelihood function, treating the unknown parameter as the random variable.

Question

Suppose that $X_1, X_2, \ldots, X_n$ is a random sample from a Type 1 geometric distribution with parameter $p$, where $p$ is a random variable.

Determine a family of distributions for $p$ that would result in conjugate prior and posterior distributions.

Solution

Each of the random variables $X_i$ has probability function:

$$P(X = x) = p(1 - p)^{x - 1}, \quad x = 1, 2, 3, \ldots$$

If the observed values of $X_1, X_2, \ldots, X_n$ are $x_1, x_2, \ldots, x_n$, then the likelihood function is:

$$L(p) = \prod_{i=1}^{n} P(X_i = x_i) = \prod_{i=1}^{n} p(1 - p)^{x_i - 1} = p^n (1 - p)^{\sum x_i - n}$$

We know that $p$ must take a value between 0 and 1. To result in a conjugate prior, the PDF of $p$ must be of the form:

$$p^{\text{something}} (1 - p)^{\text{something}}, \quad \text{for } 0 < p < 1$$

ie $p$ must have a beta distribution.

Using conjugate distributions often makes Bayesian calculations simpler.
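Both conjugate results in this section reduce to simple parameter updates, which can be sketched in code. The example below is illustrative only: the function names are assumptions, the gamma distribution uses the rate parameterisation $Gamma(\alpha, \lambda)$ with the exponential prior of mean 30 treated as $Gamma(1, 1/30)$, and the geometric case assumes a $Beta(a, b)$ prior so that the posterior is proportional to $p^{a+n-1}(1-p)^{b+\sum x_i - n - 1}$. The geometric data are hypothetical.

```python
from fractions import Fraction


def gamma_poisson_update(alpha, rate, counts):
    """Posterior Gamma(alpha', rate') for a Poisson mean with a Gamma(alpha, rate) prior."""
    return alpha + sum(counts), rate + len(counts)


def beta_geometric_update(a, b, observations):
    """Posterior Beta(a', b') for p with a Beta(a, b) prior and Type 1 geometric data."""
    n = len(observations)
    return a + n, b + sum(observations) - n


# Claims example: Exp(1/30) prior, ie Gamma(1, 1/30), with observed counts 28 and 26
post_alpha, post_rate = gamma_poisson_update(1, Fraction(1, 30), [28, 26])
print(post_alpha, post_rate)  # 55 61/30

# Hypothetical geometric observations with a uniform Beta(1, 1) prior
print(beta_geometric_update(1, 1, [3, 1, 4]))  # (4, 6)
```

Exact fractions are used so that the updated rate parameter can be compared directly with the value quoted in the claims example.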
Conjugate distributions may also be appropriate to use where there is a family of distributions that might be expected to provide a natural probability model for the unknown parameter, eg in the previous example where the parameter $p$ had to lie in the range $0 < p < 1$ (which is the range of values over which the beta distribution is defined).

Uninformative prior distributions

An uninformative prior distribution assumes that an unknown parameter is equally likely to take any value from a given set. In other words, the parameter is modelled using a uniform distribution.

As an example, suppose that we have a random sample $X_1, X_2, \ldots, X_n$ from a normal population with mean $\mu$, but we have no prior information about $\mu$. In this case it would be natural to model $\mu$ using a uniform distribution. Since $\mu$ can take any value between $-\infty$ and $\infty$, the appropriate uniform prior is $U(-\infty, \infty)$. This leads to a problem, however, since the PDF of this distribution is 0 everywhere.

We can get round this problem by using the distribution $U(-m, m)$ and then letting $m \to \infty$. If $\mu \sim U(-m, m)$, then the prior PDF of $\mu$ is:

$$f_{prior}(\mu) = \begin{cases} \dfrac{1}{2m} & \text{if } -m < \mu < m \\ 0 & \text{otherwise} \end{cases}$$

Also, since the data values come from a normal population, the likelihood function is:

$$L(\mu) = \prod_{i=1}^{n} f(x_i) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left(\frac{x_i - \mu}{\sigma}\right)^2\right\}$$

As usual, we are using $\sigma$ to denote the population standard deviation. The formula for the PDF of $N(\mu, \sigma^2)$ is given on page 11 of the Tables.

The likelihood function can alternatively be expressed as:

$$L(\mu) = C \exp\left\{-\frac{1}{2} \sum_{i=1}^{n} \left(\frac{x_i - \mu}{\sigma}\right)^2\right\}$$

where $C$ is a constant that does not depend on $\mu$.

Combining the prior PDF with the likelihood function gives:

$$f_{post}(\mu) = \begin{cases} K \exp\left\{-\dfrac{1}{2} \sum_{i=1}^{n} \left(\dfrac{x_i - \mu}{\sigma}\right)^2\right\} & \text{if } -m < \mu < m \\ 0 & \text{otherwise} \end{cases}$$

where $K$ is also a constant that does not depend on $\mu$. This constant is required to ensure that the PDF integrates to 1.

Letting
$m \to \infty$, we see that the posterior PDF is proportional to:

$$\exp\left\{-\frac{1}{2} \sum_{i=1}^{n} \left(\frac{x_i - \mu}{\sigma}\right)^2\right\}, \quad \text{for } -\infty < \mu < \infty$$

Notice that the PDF of this posterior distribution is proportional to the likelihood function. This should be intuitive as, by definition, a posterior distribution is obtained by combining two pieces of information: prior knowledge of the parameter, and the sample data. However, in this case we are using an uninformative prior as we have no prior knowledge of the parameter. The posterior distribution is therefore determined solely by the sample data.

2.3 Discrete prior distributions

When the prior distribution is discrete, the posterior distribution is also discrete. To determine the posterior distribution, we must calculate a set of conditional probabilities. This can be done using Bayes' formula.

Question

The number of claims received per week from a certain portfolio has a Poisson distribution with mean $\lambda$. The prior distribution of $\lambda$ is as follows:

$\lambda$ | 1 | 2 | 3
Prior probability | 0.3 | 0.5 | 0.2

Given that 3 claims were received last week, determine the posterior distribution of $\lambda$.

Solution

Let $X$ be the number of claims received in a week. To determine the posterior distribution of $\lambda$, we must calculate the conditional probabilities $P(\lambda = 1 \mid X = 3)$, $P(\lambda = 2 \mid X = 3)$ and $P(\lambda = 3 \mid X = 3)$. The first of these is:

$$P(\lambda = 1 \mid X = 3) = \frac{P(\lambda = 1, X = 3)}{P(X = 3)} = \frac{P(X = 3 \mid \lambda = 1)\, P(\lambda = 1)}{P(X = 3)}$$

Since $X \sim Poisson(\lambda)$:

$$P(X = 3 \mid \lambda = 1) = \frac{e^{-1} 1^3}{3!} = \frac{1}{6} e^{-1}$$

and, from the given prior distribution, we know that:

$$P(\lambda = 1) = 0.3$$

So:

$$P(\lambda = 1 \mid X = 3) = \frac{\frac{1}{6} e^{-1} \times 0.3}{P(X = 3)} = \frac{\frac{1}{20} e^{-1}}{P(X = 3)}$$

Similarly:

$$P(\lambda = 2 \mid X = 3) = \frac{P(X = 3 \mid \lambda = 2)\, P(\lambda = 2)}{P(X = 3)} = \frac{\frac{e^{-2} 2^3}{3!} \times 0.5}{P(X = 3)} = \frac{\frac{2}{3} e^{-2}}{P(X = 3)}$$

and:

$$P(\lambda = 3 \mid X = 3) = \frac{P(X = 3 \mid \lambda = 3)\, P(\lambda = 3)}{P(X = 3)} = \frac{\frac{e^{-3} 3^3}{3!} \times 0.2}{P(X = 3)}$$
$$= \frac{\frac{9}{10} e^{-3}}{P(X = 3)}$$

Since these conditional probabilities must sum to 1, the denominator must be the sum of the three numerators, ie:

$$P(X = 3) = \frac{1}{20} e^{-1} + \frac{2}{3} e^{-2} + \frac{9}{10} e^{-3} = 0.15343$$

This can also be seen using the law of total probability:

$$P(X = 3) = P(X = 3, \lambda = 1) + P(X = 3, \lambda = 2) + P(X = 3, \lambda = 3)$$

$$= P(X = 3 \mid \lambda = 1) P(\lambda = 1) + P(X = 3 \mid \lambda = 2) P(\lambda = 2) + P(X = 3 \mid \lambda = 3) P(\lambda = 3)$$

So the posterior probabilities are:

$$P(\lambda = 1 \mid X = 3) = \frac{\frac{1}{20} e^{-1}}{0.15343} = 0.11989$$

$$P(\lambda = 2 \mid X = 3) = \frac{\frac{2}{3} e^{-2}}{0.15343} = 0.58806$$

$$P(\lambda = 3 \mid X = 3) = \frac{\frac{9}{10} e^{-3}}{0.15343} = 0.29205$$

Alternatively, we could use a proportionality argument to determine the posterior probabilities. The posterior probabilities are proportional to the likelihood multiplied by the prior probability:

$$P(\lambda = 1 \mid X = 3) \propto P(X = 3 \mid \lambda = 1)\, P(\lambda = 1) = \frac{e^{-1} 1^3}{3!} \times 0.3 = \frac{1}{20} e^{-1} = 0.01839$$

$$P(\lambda = 2 \mid X = 3) \propto P(X = 3 \mid \lambda = 2)\, P(\lambda = 2) = \frac{e^{-2} 2^3}{3!} \times 0.5 = \frac{2}{3} e^{-2} = 0.09022$$

$$P(\lambda = 3 \mid X = 3) \propto P(X = 3 \mid \lambda = 3)\, P(\lambda = 3) = \frac{e^{-3} 3^3}{3!} \times 0.2 = \frac{9}{10} e^{-3} = 0.04481$$

Rescaling so the probabilities sum to 1 we get:

$$P(\lambda = 1 \mid X = 3) = \frac{0.01839}{0.01839 + 0.09022 + 0.04481} = 0.11989$$

$$P(\lambda = 2 \mid X = 3) = \frac{0.09022}{0.01839 + 0.09022 + 0.04481} = 0.58806$$

$$P(\lambda = 3 \mid X = 3) = \frac{0.04481}{0.01839 + 0.09022 + 0.04481} = 0.29205$$

The posterior probability that $\lambda = 1$ is lower than the corresponding prior probability. The other two posterior probabilities are higher than their corresponding prior probabilities. This is to be expected given that the observed number of claims was 3.

Once we have determined the posterior distribution of a parameter, we can use this distribution to estimate the parameter value. As we are about to see, the estimate will depend on the chosen loss function.
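The discrete posterior calculation is mechanical enough to sketch in code. The following is a minimal illustration, not part of the Core Reading: the helper names `poisson_pmf` and `discrete_posterior` are assumptions made for this example, and the prior is the one given in the question above.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**x / math.factorial(x)


def discrete_posterior(prior, x):
    """Posterior probabilities for a Poisson mean with a discrete prior.

    prior maps each candidate value of lambda to its prior probability.
    """
    # Numerators of Bayes' formula: likelihood multiplied by prior probability
    weights = {lam: poisson_pmf(x, lam) * p for lam, p in prior.items()}
    # Denominator P(X = x), from the law of total probability
    total = sum(weights.values())
    return {lam: w / total for lam, w in weights.items()}


prior = {1: 0.3, 2: 0.5, 3: 0.2}
posterior = discrete_posterior(prior, x=3)
for lam, prob in posterior.items():
    print(lam, round(prob, 5))
```

Dividing by the sum of the weights is the rescaling step of the proportionality argument, so the denominator never has to be worked out separately.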
3 The loss function

To obtain an estimator of $\theta$, a loss function must first be specified. This is a measure of the loss incurred when $g(X)$ is used as an estimator of $\theta$. A loss function is sought which is zero when the estimation is exactly correct, that is, $g(X)=\theta$, and which is positive and does not decrease as $g(X)$ gets further away from $\theta$. There is one very commonly used loss function, called quadratic or squared error loss. Two others are also used in practice.

Then the Bayesian estimator is the $g(X)$ that minimises the expected loss with respect to the posterior distribution.

The main loss function is quadratic loss defined by:

$$L(g(x),\theta) = [g(x)-\theta]^2$$

So, when using quadratic loss, the aim is to minimise:

$$E\left[(g(x)-\theta)^2\right] = \int (g(x)-\theta)^2 f_{post}(\theta)\,d\theta$$

This is related to mean square error from classical statistics.

Recall that, if $\hat\theta$ is an estimator of $\theta$, then:

$$MSE(\hat\theta) = E\left[(\hat\theta-\theta)^2\right] = \text{var}(\hat\theta) + \left[\text{bias}(\hat\theta)\right]^2$$

The formula for the squared error loss implies that as we move further away from the true parameter value, the loss increases at an increasing rate. The graph of the loss function is a parabola with a minimum of zero at the true parameter value.

[Figure: quadratic loss plotted against $g$, a parabola with its minimum of zero at $g=\theta$.]

A second loss function is absolute error loss defined by:

$$L(g(x),\theta) = |g(x)-\theta|$$

Here the graph of the loss function is two straight lines that meet at the point $(\theta,0)$. As we move away from the true value in either direction, the loss increases at a constant rate.

[Figure: absolute error loss plotted against $g$, two straight lines meeting at $(\theta,0)$.]

A third loss function is 0/1 or all-or-nothing loss defined by:

$$L(g(x),\theta) = \begin{cases} 0 & \text{if } g(x)=\theta \\ 1 & \text{if } g(x)\neq\theta \end{cases}$$

In this case there is a constant loss of 1 for any parameter estimate that is not equal to the true underlying parameter value. If we hit the parameter value exactly, then the loss is zero.

[Figure: all-or-nothing loss plotted against $g$, equal to 1 everywhere except at $g=\theta$, where it is 0.]
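The three loss functions defined above can be written as short R functions, which makes it easy to see how each one penalises an estimate. The following is only an illustrative sketch: the function names (quad_loss, abs_loss, zero_one_loss), the true value theta = 2 and the grid of estimates g are all assumptions made for the example.

```r
# Quadratic (squared error) loss
quad_loss <- function(g, theta) (g - theta)^2

# Absolute error loss
abs_loss <- function(g, theta) abs(g - theta)

# All-or-nothing (0/1) loss: 0 only when the estimate is exactly right
zero_one_loss <- function(g, theta) as.numeric(g != theta)

# Hypothetical true parameter value and a grid of candidate estimates
theta <- 2
g <- c(0, 1, 2, 3, 4)

losses <- data.frame(
  g = g,
  quadratic = quad_loss(g, theta),
  absolute = abs_loss(g, theta),
  zero_one = zero_one_loss(g, theta)
)
print(losses)
```

Moving from g = 3 to g = 4 doubles the absolute loss but quadruples the quadratic loss, while the 0/1 loss does not distinguish between the two estimates at all.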
The Bayesian estimator that arises by minimising the expected loss for each of these loss functions in turn is the mean, median and mode, respectively, of the posterior distribution, each of which is a measure of location of the posterior distribution.

We will prove these results shortly.

The expected posterior loss is:

$$EPL = E[L(g(x),\theta)] = \int L(g(x),\theta)\,f(\theta \mid x)\,d\theta$$

The lower limit of the integral is the smallest possible value of $\theta$ and the upper limit is the largest possible value of $\theta$.

3.1 Quadratic loss

For simplicity, $g$ will be written instead of $g(x)$.

In other words, we are assuming that $g$ is our estimate of $\theta$.

So:

$$EPL = \int (g-\theta)^2 f(\theta \mid x)\,d\theta$$

We want to determine the value of $g$ that minimises the EPL, so we differentiate the EPL with respect to $g$. Using the formula for differentiating an integral (which is given on page 3 of the Tables), we see that:

$$\frac{d}{dg}EPL = 2\int (g-\theta)\,f(\theta \mid x)\,d\theta$$

Equating to zero:

$$g\int f(\theta \mid x)\,d\theta = \int \theta\,f(\theta \mid x)\,d\theta$$

But $\int f(\theta \mid x)\,d\theta = 1$. This is because $f(\theta \mid x)$ is the PDF of the posterior distribution. Integrating the PDF over all possible values of $\theta$ gives the value 1.

So:

$$g = \int \theta\,f(\theta \mid x)\,d\theta = E(\theta \mid x)$$

Clearly this minimises EPL.

We can see this from the graph of the loss function or by differentiating the EPL a second time:

$$\frac{d^2}{dg^2}EPL = 2\int f(\theta \mid x)\,d\theta = 2 > 0 \ \Rightarrow\ \text{min}$$

Therefore the Bayesian estimator under quadratic loss is the mean of the posterior distribution.

Question

For the estimation of a binomial probability $\theta$ from a single observation $x$ of the random variable $X$ with the prior distribution of $\theta$ being beta with parameters $\alpha$ and $\beta$, investigate the form of the posterior distribution of $\theta$ and determine the Bayesian estimate of $\theta$ under quadratic loss.

Solution

The proportionality argument will be used and any constants simply omitted as appropriate.

Prior:

$$f(\theta) \propto \theta^{\alpha-1}(1-\theta)^{\beta-1}$$

omitting the constant $\dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}$.

Likelihood:

$$f(x \mid \theta) \propto \theta^{x}(1-\theta)^{n-x}$$

omitting the constant $\dbinom{n}{x}$.

Combining the prior PDF with the likelihood function gives the posterior PDF:

$$f(\theta \mid x) \propto \theta^{x}(1-\theta)^{n-x} \cdot \theta^{\alpha-1}(1-\theta)^{\beta-1} = \theta^{x+\alpha-1}(1-\theta)^{n-x+\beta-1}$$

Now it can be seen that, apart from the appropriate constant of proportionality, this is the density of a beta random variable. Therefore the immediate conclusion is that the posterior distribution of $\theta$ given $X=x$ is beta with parameters $x+\alpha$ and $n-x+\beta$.

It can also be seen that the posterior density and the prior density belong to the same family of distributions. Thus the conjugate prior for the binomial distribution is the beta distribution.

The Bayesian estimate under quadratic loss is the mean of this distribution, that is:

$$\frac{x+\alpha}{(x+\alpha)+(n-x+\beta)} = \frac{x+\alpha}{n+\alpha+\beta}$$

We can use R to simulate this Bayesian estimate. The R code to obtain the Monte Carlo Bayesian estimate of the above is:

    # M (number of simulations), n, alpha and beta must be assigned values first
    pm <- rep(0, M)
    for (i in 1:M) {
      theta <- rbeta(1, alpha, beta)             # draw theta from the prior
      x <- rbinom(1, n, theta)                   # simulate one observation given theta
      pm[i] <- (x + alpha) / (n + alpha + beta)  # posterior mean for this sample
    }

The average of these Bayesian estimates under quadratic loss is given by:

    mean(pm)

Question

A random sample of size 10 from a Poisson distribution with mean $\lambda$ yields the following data values:

3, 4, 3, 1, 5, 5, 2, 3, 3, 2

The prior distribution of $\lambda$ is $Gamma(5,2)$.

Calculate the Bayesian estimate of $\lambda$ under squared error loss.

Solution

Using the formula for the PDF of the gamma distribution given on page 12 of the Tables, we see that the prior PDF of $\lambda$ is:

$$f_{prior}(\lambda) = \frac{2^5}{\Gamma(5)}\lambda^{4}e^{-2\lambda}, \quad \lambda > 0$$

Alternatively, we could say:

$$f_{prior}(\lambda) \propto \lambda^{4}e^{-2\lambda}, \quad \lambda > 0$$

The likelihood function obtained from the data is:

$$L(\lambda) = P(X_1=3)\,P(X_2=4)\cdots P(X_{10}=2)$$

where $X_1,\ldots,X_{10}$ are independent $Poisson(\lambda)$ random variables. So:

$$L(\lambda) = \frac{e^{-\lambda}\lambda^{3}}{3!} \times \frac{e^{-\lambda}\lambda^{4}}{4!} \times \cdots \times \frac{e^{-\lambda}\lambda^{2}}{2!} = C\lambda^{31}e^{-10\lambda}$$

where $C$ is a constant. (31 is the sum of the observed data values.)
Combining the prior distribution and the sample data, we see that:

$$f_{post}(\lambda) \propto \lambda^{35}e^{-12\lambda}, \quad \lambda > 0$$

Comparing this with the formula for the gamma PDF, we see that the posterior distribution of $\lambda$ is $Gamma(36,12)$.

The Bayesian estimate of $\lambda$ under squared error loss is the mean of this gamma distribution, ie $\frac{36}{12} = 3$.

3.2 Absolute error loss

Again, $g$ will be written instead of $g(x)$. So:

$$EPL = \int |\theta-g|\,f(\theta \mid x)\,d\theta$$

Assuming the range for $\theta$ is $(-\infty,\infty)$, then:

$$EPL = \int_{-\infty}^{g}(g-\theta)\,f(\theta \mid x)\,d\theta + \int_{g}^{\infty}(\theta-g)\,f(\theta \mid x)\,d\theta$$

We need to split the integral into two parts so that we can remove the modulus sign. The first integral covers the interval where $\theta \le g$. Here $|\theta-g| = g-\theta$. The second integral covers the interval where $\theta \ge g$. Here $|\theta-g| = \theta-g$.

Again, we want to determine the value of $g$ that minimises the EPL, so we differentiate the EPL with respect to $g$.

Recall that:

$$\frac{d}{dy}\int_{a(y)}^{b(y)} f(x,y)\,dx = \int_{a(y)}^{b(y)} \frac{\partial}{\partial y}f(x,y)\,dx + b'(y)\,f(b(y),y) - a'(y)\,f(a(y),y)$$

(This is the formula for differentiating an integral, given on page 3 of the Tables.)

Replacing $x$ by $\theta$ and $y$ by $g$ in the formula for differentiating an integral, we see that:

$$\frac{d}{dg}\int_{-\infty}^{g}(g-\theta)\,f(\theta \mid x)\,d\theta = \int_{-\infty}^{g} f(\theta \mid x)\,d\theta + (g-g)\,f(g \mid x) - 0 = \int_{-\infty}^{g} f(\theta \mid x)\,d\theta$$

and:

$$\frac{d}{dg}\int_{g}^{\infty}(\theta-g)\,f(\theta \mid x)\,d\theta = \int_{g}^{\infty}(-1)\,f(\theta \mid x)\,d\theta + 0 - (g-g)\,f(g \mid x) = -\int_{g}^{\infty} f(\theta \mid x)\,d\theta$$

So:

$$\frac{d}{dg}EPL = \int_{-\infty}^{g} f(\theta \mid x)\,d\theta - \int_{g}^{\infty} f(\theta \mid x)\,d\theta$$

Equating to zero:

$$\int_{-\infty}^{g} f(\theta \mid x)\,d\theta = \int_{g}^{\infty} f(\theta \mid x)\,d\theta$$

that is, $P(\theta \le g) = P(\theta \ge g)$, which specifies the median of the posterior distribution.

Recall that the median of a continuous distribution is the value $M$ that divides the distribution into two parts, with a 50% probability of being less than (or equal to) $M$ and a 50% probability of being greater than (or equal to) $M$.

Question

A random sample of size 10 from a Poisson distribution with mean $\lambda$ yields the following data values:

3, 4, 3, 1, 5, 5, 2, 3, 3, 2

The prior distribution of $\lambda$ is $Gamma(5,2)$.

Calculate the Bayesian estimate of $\lambda$ under absolute error loss.

Solution

From the solution to the previous question, we know that the posterior distribution of $\lambda$ is $Gamma(36,12)$. The Bayesian estimate of $\lambda$ under absolute error loss is the median of this distribution, which can be obtained very quickly using R. The command qgamma(0.5,36,12) gives the answer to be 2.972268.

We use the R command q to calculate the percentiles of a distribution. We follow q with the name of the distribution. Here we want the median, or the 50th percentile, so the first argument is 0.5. The second and third arguments are the parameters of the gamma distribution.

Alternatively, the median could be calculated (approximately) using the Tables. To do this, we have to use the relationship between the gamma distribution and the chi-squared distribution (which is given in the Tables on page 12). Here we know that:

$$\lambda \mid x \sim Gamma(36,12)$$

For notational convenience, let $W = \lambda \mid x$. Then:

$$W \sim Gamma(36,12) \ \Rightarrow\ 2 \times 12\,W \sim \chi^2_{2 \times 36}, \quad \text{ie } 24W \sim \chi^2_{72}$$

The median of the posterior distribution is the value of $M$ such that:

$$P(W < M) = 0.5$$

or equivalently:

$$P\left(\chi^2_{72} < 24M\right) = 0.5$$

From page 169 of the Tables, we see that the 50th percentile of $\chi^2_{70}$ is 69.33 and the 50th percentile of $\chi^2_{80}$ is 79.33. Interpolating between these values, we find that the 50th percentile of $\chi^2_{72}$ is approximately:

$$(1-0.2) \times 69.33 + 0.2 \times 79.33 = 71.33$$

So:

$$24M \approx 71.33$$

Hence:

$$M \approx 2.972$$

3.3 All-or-nothing loss

Here the differentiation approach cannot be used. Instead a direct approach will be used with a limiting argument.

Consider:

$$L(g(x),\theta) = \begin{cases} 0 & \text{if } g-\varepsilon < \theta < g+\varepsilon \\ 1 & \text{otherwise} \end{cases}$$

so that, in the limit as $\varepsilon \to 0$, this tends to the required loss function.

Then the expected posterior loss is:

$$EPL = 1 - \int_{g-\varepsilon}^{g+\varepsilon} f(\theta \mid x)\,d\theta \approx 1 - 2\varepsilon\,f(g \mid x) \quad \text{for small } \varepsilon$$

This is saying that, for a narrow strip, the area under the function is approximately equal to the area of a rectangle whose width is $2\varepsilon$ and whose height is equal to the average value of the function over that strip.

Again, the Bayesian estimate is the value of $g$ that minimises the EPL. To minimise the EPL, we need to maximise $2\varepsilon\,f(g \mid x)$. This occurs when $f(g \mid x)$ is maximised, ie at the mode of the posterior distribution.

The EPL is minimised by taking $g$ to be the mode of $f(\theta \mid x)$.

Question

A random sample of size 10 from a Poisson distribution with mean $\lambda$ yields the following data values:

3, 4, 3, 1, 5, 5, 2, 3, 3, 2

The prior distribution of $\lambda$ is $Gamma(5,2)$.

Calculate the Bayesian estimate of $\lambda$ under all-or-nothing loss.

Solution

From a previous question, we know that the posterior distribution of $\lambda$ is $Gamma(36,12)$. The Bayesian estimate of $\lambda$ under all-or-nothing loss is the mode of this distribution. To calculate the mode, we need to differentiate the posterior PDF (or the log of this PDF) and set the derivative equal to 0.

We have already seen that:

$$f_{post}(\lambda) = C\lambda^{35}e^{-12\lambda}$$

Taking logs (to make the differentiation easier):

$$\ln f_{post}(\lambda) = \ln C + 35\ln\lambda - 12\lambda$$

Differentiating:

$$\frac{d}{d\lambda}\ln f_{post}(\lambda) = \frac{35}{\lambda} - 12$$

The derivative is equal to 0 when $\lambda = \frac{35}{12}$. Differentiating again:

$$\frac{d^2}{d\lambda^2}\ln f_{post}(\lambda) = -\frac{35}{\lambda^2} < 0$$

Since the second derivative is negative, the posterior PDF is maximised when $\lambda = \frac{35}{12}$. So the Bayesian estimate of $\lambda$ under all-or-nothing loss is $\frac{35}{12}$ or 2.917.

4 Some Bayesian posterior distributions

In this section we give a table of situations in which the Bayesian approach may work well. The likelihood function is given, together with the distributions of the prior and the posterior.

Do not attempt to learn all the results given in this table.
The results are here for reference purposes only, and you will not be expected to be able to quote all these results in the examination. However, you may like to select one or two of the results given here and check that you can prove that the distribution of the posterior is as stated. You could use the table as a way of generating extra questions on particular Bayesian results.

The negative binomial distribution referenced here is that described in the Tables as the Type 2 negative binomial distribution.

The results for the Type 2 geometric distribution can be obtained from those for the Type 2 negative binomial by setting $k=1$. You may like to work out the corresponding results for the Type 1 negative binomial and geometric distributions.

Notice that despite the large number of examples given, the posterior distribution in all these cases turns out to be gamma, beta or normal. So, in most Bayesian questions it is worth checking whether the posterior PDF takes the form of one of these three distributions before you start thinking about other possibilities.

Likelihood of IID random variables $X_1,\ldots,X_n$ | Unknown parameter | Prior | Posterior
$Poisson(\lambda)$ | $\lambda > 0$ | $U(0,\infty)$ | $Gamma\left(\sum x + 1,\ n\right)$
$Poisson(\lambda)$ | $\lambda > 0$ | $Exp(\lambda')$ | $Gamma\left(\sum x + 1,\ n + \lambda'\right)$
$Poisson(\lambda)$ | $\lambda > 0$ | $Gamma(\alpha',\lambda')$ | $Gamma\left(\sum x + \alpha',\ n + \lambda'\right)$
$Exp(\lambda)$ | $\lambda > 0$ | $U(0,\infty)$ | $Gamma\left(n + 1,\ \sum x\right)$
$Exp(\lambda)$ | $\lambda > 0$ | $Exp(\lambda')$ | $Gamma\left(n + 1,\ \sum x + \lambda'\right)$
$Exp(\lambda)$ | $\lambda > 0$ | $Gamma(\alpha',\lambda')$ | $Gamma\left(n + \alpha',\ \sum x + \lambda'\right)$
$Gamma(\alpha,\lambda)$ | $\lambda > 0$ | $U(0,\infty)$ | $Gamma\left(n\alpha + 1,\ \sum x\right)$
$Gamma(\alpha,\lambda)$ | $\lambda > 0$ | $Exp(\lambda')$ | $Gamma\left(n\alpha + 1,\ \sum x + \lambda'\right)$
$Gamma(\alpha,\lambda)$ | $\lambda > 0$ | $Gamma(\alpha',\lambda')$ | $Gamma\left(n\alpha + \alpha',\ \sum x + \lambda'\right)$
$N(\mu,\sigma^2)$ | $-\infty < \mu < \infty$ | $U(-\infty,\infty)$ | $N\left(\bar{x},\ \dfrac{\sigma^2}{n}\right)$
$N(\mu,\sigma^2)$ | $-\infty < \mu < \infty$ | $N(\mu',\sigma'^2)$ | $N\left(\dfrac{n\bar{x}/\sigma^2 + \mu'/\sigma'^2}{n/\sigma^2 + 1/\sigma'^2},\ \dfrac{1}{n/\sigma^2 + 1/\sigma'^2}\right)$
$LogN(\mu,\sigma^2)$ | $-\infty < \mu < \infty$ | $U(-\infty,\infty)$ | $N\left(\dfrac{\sum \log x}{n},\ \dfrac{\sigma^2}{n}\right)$
$LogN(\mu,\sigma^2)$ | $-\infty < \mu < \infty$ | $N(\mu',\sigma'^2)$ | $N\left(\dfrac{\sum \log x/\sigma^2 + \mu'/\sigma'^2}{n/\sigma^2 + 1/\sigma'^2},\ \dfrac{1}{n/\sigma^2 + 1/\sigma'^2}\right)$
$Bin(m,p)$ | $0 < p < 1$ | $U(0,1)$ | $Beta\left(\sum x + 1,\ nm - \sum x + 1\right)$
$Bin(m,p)$ | $0 < p < 1$ | $Beta(\alpha',\beta')$ | $Beta\left(\sum x + \alpha',\ nm - \sum x + \beta'\right)$
$NB(k,p)$ | $0 < p < 1$ | $U(0,1)$ | $Beta\left(nk + 1,\ \sum x + 1\right)$
$NB(k,p)$ | $0 < p < 1$ | $Beta(\alpha',\beta')$ | $Beta\left(nk + \alpha',\ \sum x + \beta'\right)$

5 Credible intervals

Having derived the posterior distribution of a parameter $\theta$, there are several ways in which we can summarise inferences about $\theta$. For single parameters, a plot of the posterior density is very informative and shows clearly the range of values consistent with our posterior beliefs.

In Section 5.1 below, the Core Reading considers a numerical example where the posterior distribution is $Gamma(15,5.3)$. A plot of the PDF of this distribution is given below:

[Figure: PDF of the $Gamma(15,5.3)$ distribution.]

As described earlier, we can also quote quantities such as the posterior mean of a parameter or the posterior variance.

For the $Gamma(15,5.3)$ distribution pictured above, the mean is 2.83, the variance is 0.534 and the standard deviation is 0.731.

For expressing and quantifying uncertainty about the values of $\theta$, a natural analogue of the classical confidence interval is the Bayesian credible interval.

In Chapter 8, we saw how to estimate parameters using the method of moments and the method of maximum likelihood. In Chapter 9, we used confidence intervals to express the uncertainty in these estimates. Earlier in this chapter, we estimated a parameter using the mean, mode or median of its posterior distribution. We will now explain how to express the uncertainty in these estimates.

Suppose that, given data $x$, we derive the posterior density of $\theta$ as $f(\theta \mid x)$. Then, for $0 < \alpha < 1$, a $100(1-\alpha)\%$ credible interval for $\theta$ is a region of $\theta$, say $A$, which is such that:

$$P(\theta \in A \mid x) = \int_A f(\theta \mid x)\,d\theta = 1-\alpha$$

So, a $100(1-\alpha)\%$ credible interval is an interval whose posterior probability of containing $\theta$ is $1-\alpha$.

5.1 Equal-tailed credible intervals

Often, we quote an equal-tailed credible interval, obtained by using the $100(\alpha/2)\%$ and $100(1-\alpha/2)\%$ critical points of the posterior distribution. For example, with $\alpha = 0.05$, the 2.5% and 97.5% critical points of the posterior distribution would give a 95% credible interval.

This is similar to the approach we used in Chapter 9 to calculate confidence intervals. If we want a two-sided 95% confidence interval, we split the remaining 5% equally between the two tails.

By definition, an equal-tailed credible interval must contain the median of the posterior distribution, ie the posterior estimate for $\theta$ under absolute loss.

To calculate equal-tailed credible intervals for a parameter we need the cumulative distribution function of its posterior distribution. When the posterior distribution has a convenient form, such as a normal, beta or gamma distribution, we can usually use statistical tables, or standard functions from a computer package such as R to do the calculations.

There are no tables for the beta distribution in the Tables, so we have to use R to obtain credible intervals based on a beta posterior distribution. We can, however, use the standard normal tables for a normal posterior, and the chi-square tables, along with the gamma-chi-square relationship, for a gamma posterior.

Example

Suppose that, given data $x$, the posterior distribution of the parameter $\theta$ is a gamma distribution with parameters 15 and 5.3, ie $\theta \mid x \sim Gamma(15,5.3)$. For an equal-tailed 90% credible interval of $\theta$, we need the 5% and 95% critical points of the $Gamma(15,5.3)$ distribution.

In R we can use:

    qgamma(0.05,15,5.3)
    qgamma(0.95,15,5.3)

to obtain the 90% equal-tailed credible interval as (1.74, 4.13).

Notice that, in this case, we can also use the relationship between the gamma distribution and the chi-square distribution to calculate the interval. In particular, we have:

$$(2 \times 5.3)\,\theta \mid x = 10.6\,\theta \mid x \sim Gamma(15, \tfrac{1}{2}), \quad \text{ie } 10.6\,\theta \mid x \sim \chi^2_{30}$$

From statistical tables, we have that the 5% and 95% critical points of the $\chi^2_{30}$ distribution are 18.49 and 43.77, respectively. So a 90% equal-tailed credible interval for $10.6\,\theta \mid x$ is (18.49, 43.77), and therefore a 90% equal-tailed credible interval for $\theta \mid x$ is:

$$\left(\frac{18.49}{10.6},\ \frac{43.77}{10.6}\right) = (1.74, 4.13)$$

exactly as before.

We can similarly obtain a 95% equal-tailed credible interval for $\theta \mid x$:

[Figure: PDF of the $Gamma(15,5.3)$ posterior with the central 95% of the distribution shaded.]

The credible interval is (1.58, 4.43). 95% of the distribution (the shaded area in the diagram above) lies between these values, with 2.5% on either side. The areas under the graph in the two tails are equal, ie $P(\theta < 1.58 \mid x) = P(\theta > 4.43 \mid x) = 0.025$.

Question

A random sample of size 15 from a normal distribution with mean $\mu$ and standard deviation 3 yields the following data values:

10.75  -0.29  5.37  6.68  8.77  1.69  7.12  4.89  6.45  4.27  9.37  5.68  3.87  7.70  6.98

The prior distribution of $\mu$ is $N(5,2^2)$.

Calculate an equal-tailed 95% Bayesian credible interval for $\mu$ based on these data values. You are given that the posterior distribution of $\mu$ is $N(5.83, 0.722^2)$.

Solution

From the Tables, we have $P(-1.96 < Z < 1.96) = 0.95$. So the lower and upper 2.5% points of $N(5.83, 0.722^2)$ are (working with the unrounded posterior parameters):

$$5.83 - 1.96 \times 0.722 = 4.41$$

and:

$$5.83 + 1.96 \times 0.722 = 7.24$$

So an equal-tailed 95% credible interval for $\mu$ is (4.41, 7.24).

5.2 Highest posterior density intervals

As an alternative to an equal-tailed credible interval, a $100(1-\alpha)\%$ highest posterior density interval for $\theta$ could be quoted. In addition to satisfying $P(\theta \in A \mid x) = 1-\alpha$, this interval $A$ is such that the minimum density of any point within the interval is equal to or higher than the density of any point outside that interval.

The following diagram shows a 95% highest posterior density interval for $\theta \mid x$:

[Figure: PDF of the $Gamma(15,5.3)$ posterior with the 95% highest posterior density interval shaded.]

Calculating highest posterior density intervals for non-symmetrical distributions is not straightforward.
In R, the package bayestestR has the function hdi that calculates the highest density interval for a parameter.

This is beyond the scope of Subject CS1, but for interested students, the code used to generate the 95% highest posterior density interval in this example is given below:

    install.packages("bayestestR")   # only needed once
    library("bayestestR")
    set.seed(3)                      # make the simulation reproducible
    x <- rgamma(100000,15,5.3)       # simulate from the Gamma(15,5.3) posterior
    hdi(x,ci=0.95)                   # 95% highest density interval of the sample

The credible interval is (1.48, 4.29). The areas under the graph in the two tails are not equal, ie $P(\theta < 1.48 \mid x) \neq P(\theta > 4.29 \mid x) \neq 0.025$, although the probabilities do sum to 5%.

For unimodal distributions (such as the gamma distribution), the two endpoints of a highest posterior density interval have the same height (ie density). In the example above:

$$f(1.48) = f(4.29) = 0.080$$

The densities of all the values in a highest posterior density interval are larger than the densities of those outside the interval (ie the graph is higher in the interval). So, a highest posterior density interval contains a collection of most likely values of the parameter $\theta$, which is a desirable property.

By definition, a highest posterior density interval must contain the mode, ie the posterior estimate for $\theta$ under 0-1 loss.

For a unimodal distribution, the highest posterior density interval is the shortest interval amongst all Bayesian credible intervals with the same posterior probability.

For symmetrical distributions, such as a normal posterior distribution, the equal-tailed credible interval and highest posterior density interval are identical when based on the same data set.

For skewed distributions, such as the gamma and most beta posterior distributions, the highest posterior density interval is not the same as the equal-tailed interval (as we have seen in the example above involving the $Gamma(15,5.3)$ distribution).
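Because the highest posterior density interval of a unimodal posterior is the shortest interval with the required probability, it can also be found without simulation by searching for the shortest 95% interval directly. The following base R sketch illustrates this for the $Gamma(15,5.3)$ example; it is an alternative to the bayestestR approach, not the method used to produce the figures above, and the names interval_width, hpd and equal_tailed are illustrative.

```r
# Assumed posterior: Gamma(15, 5.3), as in the example above
shape <- 15
rate <- 5.3
level <- 0.95

# Width of the interval that leaves probability p in the lower tail
# and has posterior probability `level`
interval_width <- function(p) {
  qgamma(p + level, shape, rate) - qgamma(p, shape, rate)
}

# Search over p strictly inside (0, 1 - level) to avoid an infinite endpoint
eps <- 1e-6
opt <- optimize(interval_width, interval = c(eps, 1 - level - eps), tol = 1e-8)

hpd <- qgamma(c(opt$minimum, opt$minimum + level), shape, rate)
equal_tailed <- qgamma(c(0.025, 0.975), shape, rate)

print(round(hpd, 2))             # shortest 95% interval
print(round(equal_tailed, 2))    # equal-tailed 95% interval, for comparison
print(dgamma(hpd, shape, rate))  # endpoint densities should nearly agree
```

The result can be compared with the simulation-based interval quoted above; small differences are to be expected because hdi works from a random sample rather than the exact distribution.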
Chapter 14 Summary

Bayesian estimation v classical estimation

A common problem in statistics is to estimate the value of some unknown parameter $\theta$.

The classical approach to this problem is to treat $\theta$ as a fixed, but unknown, constant and use sample data to estimate its value. For example, if $\theta$ represents some population mean then its value may be estimated by a sample mean.

The Bayesian approach is to treat $\theta$ as a random variable.

Prior distribution

The prior distribution of $\theta$ represents the knowledge available about the possible values of $\theta$ before the collection of any sample data.

Likelihood function

A likelihood function, $L$, is then determined, based on a random sample $X = (X_1, X_2, \ldots, X_n)$. The likelihood function is the joint PDF (or, in the discrete case, the joint probability) of $X_1, X_2, \ldots, X_n \mid \theta$.

Posterior distribution

The prior distribution and the likelihood function are combined to obtain the posterior distribution of $\theta$.

When $\theta$ is a continuous random variable:

$$f_{post}(\theta) \propto f_{prior}(\theta) \times L(\theta)$$

When $\theta$ is a discrete random variable, the posterior distribution is a set of conditional probabilities.

Conjugate distributions

For a given likelihood, if the prior distribution leads to a posterior distribution belonging to the same family as the prior, then this prior is called the conjugate prior for this likelihood.

Uninformative prior distributions

If we have no prior knowledge about $\theta$, a uniform prior distribution should be used. This is sometimes referred to as an uninformative prior distribution. When the prior distribution is uniform, the posterior PDF is proportional to the likelihood function.
The Actuarial Education Company IFE: 2022 Examination Page 32 CS1-14: Bayesian statistics Lossfunctions Aloss function, such as quadratic (or squared) error loss, absolute error loss or all-or-nothing (0/1) loss gives a measure of the loss incurred value of ? . In other words,it when ?is usedasanestimatorofthetrue measuresthe seriousness of anincorrect estimator. Undersquared error loss, the meanof the posterior distribution minimisesthe expected loss function. Under absolute error loss, the medianofthe posterior distribution minimisesthe expected loss function. Under all-or-nothing loss, the mode of the posterior distribution minimises the expected loss function. Credibleintervals A Bayesian credible interval 100(1 quantifies uncertainty )%a-credible interval is aninterval about the values of parameter ?. A whose posterior probability of containing ? is 1a. These can be equal-tailed intervals or highest posterior densityintervals. The endpoints of an equal-tailed 95%credible interval for points of the posterior distribution distribution with tabulated of ?. If the posterior values, we can calculate ? are the lower and upper 2.5% distribution equal-tailed is a standard confidence intervals algebraically. The densities of all points within a highest posterior densityinterval are greater than or equal to the densities of all points that lie outside the interval. Wecan use Rto calculate highest posterior IFE: 2022 Examinations density intervals. The Actuarial Education Compan CS1-14: Bayesian statistics Page 33 Chapter14 PracticeQuestions 14.1 The punctuality of trains has beeninvestigated by considering a number oftrain journeys. In the sample, 60% of trains had a destination of Manchester, 20% Edinburgh and 20% Birmingham. The probabilities of a train arriving late in Manchester, Edinburgh or Birmingham are 30%, 20% and 25%, respectively. Alate train is picked at random from the group under consideration. 
Calculate the probability 14.2 that it terminated in Manchester. Arandom variable X has a Poisson distribution with mean ?, whichis initially assumed to have a chi-squared distribution with 4 degrees of freedom. Determine the posterior distribution of ? after observing a single value x of the random variable X. 14.3 The number of claimsin a week arising from a certain group ofinsurance policies has a Poisson distribution with mean The prior distribution 14.4 Exam style . of Seven claims is uniform were incurred in the last on the integers (i) Determine the posterior distribution of (ii) Calculatethe Bayesian estimate of week. 8, 10 and 12. . under squared error loss. Forthe estimation of a population proportion p, asample of n is taken and yields x successes. Asuitable (i) prior distribution for p is beta with parameters Show that the posterior distribution 4 and 4. of p given x is beta and specify its parameters. [2] 11 successesare observedin a sample of size 25. (ii) 14.5 Exam style Calculatethe Bayesian estimate under all-or-nothing (0/1) loss. [4] [Total 6] The annual number of claims from a particular risk has a Poisson distribution prior distribution for has a gamma distribution with a2= and with mean . The ? 5= . Claim numbersnxx? 1,, (i) over the last n years have been recorded. Show that the posterior distribution is gamma and determine its parameters. [3] 8 Now suppose that n8= and ?=xi 5 i=1 The Actuarial Education Company IFE: 2022 Examination Page 34 (ii) (iii) CS1-14: Determine the Bayesian estimate for (a) squared-error loss (b) all-or-nothing loss (c) absolute error loss. Bayesian statistics under: [5] Calculate a 95% equal-tailed credible interval for . [2] [Total 14.6 Exam 10] Asingle observation, x, is drawn from a distribution with the probability density function: style fx(|?) ??? ? - 1 =? ?? 0<<x 0otherwise The prior PDFof ?is given by: f?? () exp(=- ? ), Derive an expression in terms 14.7 Exam style A proportion >0 ? 
of x for the Bayesian estimate of p of packets of a rather ? under absolute error loss. [4] dull breakfast cereal contain an exciting toy (independently from packet to packet). An actuary has been persuaded by his children to begin buying packets of this cereal. His prior beliefs about distribution on the interval (i) p before opening any packets are given by a uniform [0,1]. It turns out the first toy is found in the 1n th packet of cereal. Determine the posterior distribution of p after the first toy is found. [3] Afurther toy wasfound after opening another 2n packets, another toy after opening another3n packets and so on until the fifth toy wasfound after opening a grand total of n1 5nnn +234 +++ n packets. (ii) Determine the posterior distribution of p after the fifth toy is found. (iii) Show the Bayes estimate of p under quadratic loss is not the same as the [2] maximum likelihood estimate and comment on this result. [5] [Total 14.8 Anactuary has atendency to belate for work. If he gets uplate then he arrives at work X minutes late Exam style 10] where Xis exponentially distributed with mean 15. If he gets up on time then he arrives at work Y minuteslate where Y is uniformly distributed on[0,25]. The office manager believes that the actuary gets up late one third of the time. Calculatethe posterior probability that the actuary didin fact get uplate given that he arrives morethan 20 minuteslate at work. [5] IFE: 2022 Examinations The Actuarial Education Compan CS1-14: Bayesian statistics Page 35 Chapter14Solutions 14.1 Let M denote the event a train chosen at random terminates in Manchester (and let E and B have corresponding definitions). In addition, let L denote the event A train chosen at random runs late. The situation can then be represented using the following 0.3 L 0.7 L' tree diagram: M 0. 
0.2 0.2 L 0.8 L' 0.25 L 0.75 L' E 0.2 B The required probability is: PM L(| PM ) = From the diagram, nL) PM ( n L () PL () wesee that: 0.6== 0.3 0.18 and: PL ( ) 0.6= 0.3 + 0.2 0.2 + 0.2 0.25 = 0.27 So: 0.18 2 ==(| 0.27 3 PM L) Alternatively, L(| PM we can calculate the probability ) = = using Bayes formula: PM() P( L | M) P () M (PL| M ) (0.6 P( E)P( L| E)++ (PB)PL ( | B) 0.6 0.3 0.3)+ (0.2 0.2) (0.2+ 0.25) 2 = The Actuarial Education 3 Company IFE: 2022 Examinations Page 36 14.2 CS1-14: The prior distribution of pr ()ior ? fe -?? ? ? 2 is ?4 , Bayesian statistics whichis the same as Gamma(2,1/2) . So: /2 Thelikelihood function for a single observation x from a Poisson( ?) distribution is proportional to: ?x -e ? So: /2 post() e??? fe -- ?? ? Hencethe posterior distribution of 14.3 (i) = ?xx+1 e - ? ? 32 is Gamma x (22,3) . + Posterior distribution Let X bethe number of claimsreceivedin a week. To determinethe posterior distribution of , we mustcalculate the conditional probabilities (8| PX== 7), (10| == 7) andPX (12| PX== 7). Thefirst of these is: == (8, PX== 7) (8| PX 7) = PX== (7) P( X = = 7| = 8)P(8) = PX (7) Since X ? Poisson () : (7| PX 8)== = e- 878 7! and since the prior distribution is uniform P (8)== on the integers 8, 10 and 12: 1 3 So: e== (8| PX IFE: 2022 Examinations 7) = 8781 3 = 0.04653 PX==(7) PX (7) 7! The Actuarial Education Compan CS1-14: Bayesian statistics Page 37 Similarly: e== (10| PX PX 7| 7) = == 10)P( ( 7! = PX (7) == (12| PX 7) Since these conditional PX 7| = ( == 12)P( P X (7) probabilities - X( P = 7) 12712 1 12) = 3 = 0.03003 (PX== 7) e and: 10710 1 10) = 7! 
= 3 = 0.01456 PX== (7) must sum to 1,the denominator PX =(7) must be the sum of the numerators, ie: PX( 7)== 0.04653 + 0.03003 + 0.01456 = 0.09112 Sothe posterior probabilities are: 0.04653 = 0.51066 0.09112 8|PX( == 7) = ( 10|PX== 7) = 12|PX( == 7) (ii) 0.03003 0.09112 0.01456 = 0.09112 = 0.32954 = 0.15980 Bayesian estimate under squared error loss The Bayesian estimate under squared error loss is the meanof the posterior distribution: 8 14.4 (i) 0.51066 + 10 0.32954 + 12 0.15980 = 9.29830 Posterior distribution Since the prior distribution prior fp() p ?- of p is Beta(4,4) : [1/2] p33(1 ) Nowlet X denotethe numberofsuccesses from asampleofsize n. Then?X Binomialn (, p) . Since x successes have been observed, the likelihood Lp() P( X x)== = n?? ?? p x?? (1p- xn )-- x ? px(1 - p) n function is: x [1/2] Combining the prior PDF with the likelihood function gives: (fp ) post The Actuarial Education p (1?- p33 ) Company p (1p- x) n x =px-+ 3(1 - p) n - x +3 [1/2] IFE: 2022 Examination Page 38 CS1-14: Comparing this with the PDF of the beta distribution the posterior distribution of p is Beta +(4, x n+- x Bayesian statistics (given on page 13 of the Tables), wesee that 4). [1/2] [Total 2] (ii) Bayesian estimate under all-or-nothing loss The Bayesian estimate value of p that under all-or-nothing maximises the posterior loss is the mode of the posterior PDF. To find the distribution, ie the mode, we need to differentiate the PDF (or equivalently differentiate the log ofthe PDF)and equate it to zero. Giventhat =11xand = 25n , the posterior of p is Beta(15,18) and: fp()p=-post Cp14(1 Takinglogs (to )17 [1] makethe differentiation easier): ln f ( )p=+lnpC 14ln + 17ln(1 - p) [1] Differentiating: d dp ln)fp( 14 17 p 1- p =- [1/2] The derivative is equal to 0 when: 14(1 )-= 17pp ie when: p= 14 Differentiating d2 [1/2] 31 again: ln)fp( =- dp 14 22 p - 17 (1)- p 2 <?0 max [1/2] Sothe Bayesian estimate of p under all-or-nothing loss is 14 or 0.45161. 
[1/2]
[Total 4]

14.5 (i) Posterior distribution

Since the prior distribution of $\lambda$ is Gamma(2,5):

$$f_{prior}(\lambda) = \frac{5^{2}}{\Gamma(2)} \lambda e^{-5\lambda} \propto \lambda e^{-5\lambda}$$ [1/2]

The likelihood is the product of Poisson probabilities:

$$L(\lambda) = \frac{\lambda^{x_1}}{x_1!} e^{-\lambda} \times \cdots \times \frac{\lambda^{x_n}}{x_n!} e^{-\lambda} \propto \lambda^{\sum x_i} e^{-n\lambda}$$ [1]

So:

$$f_{post}(\lambda) \propto \lambda e^{-5\lambda} \times \lambda^{\sum x_i} e^{-n\lambda} = \lambda^{\sum x_i + 1} e^{-(n+5)\lambda}$$ [1]

Comparing this with the PDF of the gamma distribution (given on page 12 of the Tables), we see that the posterior distribution of $\lambda$ is Gamma($\sum x_i + 2$, n + 5). [1/2]
[Total 3]

(ii)(a) Squared-error loss

When n = 8 and $\sum x_i = 5$, the posterior distribution of $\lambda$ is Gamma(7,13). [1/2]

The Bayesian estimate of $\lambda$ under squared error loss is the mean of the posterior distribution, ie 7/13 or 0.538. [1/2]

(ii)(b) All-or-nothing loss

The Bayesian estimate of $\lambda$ under all-or-nothing loss is the mode of the posterior distribution, ie the value of $\lambda$ that maximises the posterior PDF. To find the mode, we need to differentiate the PDF (or equivalently differentiate the log of the PDF) and equate it to zero.

Since the posterior distribution of $\lambda$ is Gamma(7,13):

$$f_{post}(\lambda) = C \lambda^{6} e^{-13\lambda}$$

where C is a constant. Taking logs:

$$\ln f_{post}(\lambda) = \ln C + 6 \ln \lambda - 13\lambda$$ [1/2]

Differentiating:

$$\frac{d}{d\lambda} \ln f_{post}(\lambda) = \frac{6}{\lambda} - 13$$ [1/2]

The derivative is equal to 0 when $\lambda = 6/13$. [1/2]

Differentiating again:

$$\frac{d^{2}}{d\lambda^{2}} \ln f_{post}(\lambda) = -\frac{6}{\lambda^{2}} < 0 \Rightarrow \text{max}$$ [1/2]

So the Bayesian estimate of $\lambda$ under all-or-nothing loss is 6/13 or 0.462.

The mode of Gamma($\alpha$, $\lambda$) is $(\alpha - 1)/\lambda$ provided that $\alpha > 1$.

(ii)(c) Absolute error loss

The Bayesian estimate of $\lambda$ under absolute error loss is the median of the posterior distribution. Here we know that:

$$\lambda \mid x \sim \text{Gamma}(7,13)$$

For notational convenience, let $W = \lambda \mid x$. Then:

$$W \sim \text{Gamma}(7,13) \Rightarrow 2 \times 13\, W = 26W \sim \chi^2_{14}$$ [1/2]

The median of the posterior distribution is the value of M such that:

$$P(W < M) = 0.5$$

or equivalently:

$$P(\chi^2_{14} < 26M) = 0.5$$ [1/2]

From page 169 of the Tables, we see that the 50th percentile of $\chi^2_{14}$ is 13.34:

$$26M = 13.34 \Rightarrow M = \frac{13.34}{26} = 0.513$$ [1/2]

So the Bayesian estimate of $\lambda$ under absolute error loss is 0.513. [1/2]
[Total 5]

(iii) Credible interval

In part (ii)(c), we noted that:

$$W = \lambda \mid x \sim \text{Gamma}(7,13) \Rightarrow 26W \sim \chi^2_{14}$$

From pages 168 and 169 of the Tables, we have:

$$P(5.629 < \chi^2_{14} < 26.12) = 0.95$$ [1]

Therefore a 95% equal-tailed credible interval for $\lambda \mid x$ is:

$$\left( \frac{5.629}{26}, \frac{26.12}{26} \right) = (0.217, 1.00)$$ [1]

14.6

Since we only have a single observation, the likelihood function is equal to the PDF of the distribution from which the observation came, ie:

$$L(\theta) = \begin{cases} \theta^{-1} & 0 < x < \theta \\ 0 & \text{otherwise} \end{cases}$$ [1/2]

Also, since $f_{prior}(\theta) = \theta \exp(-\theta)$ for $\theta > 0$, it follows that:

$$f_{post}(\theta) = C f_{prior}(\theta) L(\theta) = \begin{cases} C e^{-\theta} & 0 < x < \theta \\ 0 & \text{otherwise} \end{cases}$$ [1/2]

where C is a constant. This distribution is not in the Tables, so we will have to work from first principles to determine the value of the constant. Integrating the posterior PDF over all possible values of $\theta$ gives 1:

$$\int_{x}^{\infty} C e^{-\theta}\, d\theta = 1 \Rightarrow \left[ -C e^{-\theta} \right]_{x}^{\infty} = C e^{-x} = 1 \Rightarrow C = e^{x}$$ [1]

So the posterior PDF is:

$$f_{post}(\theta) = e^{-(\theta - x)}, \quad \theta > x$$ [1/2]

The Bayesian estimate of $\theta$ under absolute error loss is the median of the posterior distribution. The median, m, satisfies the equation:

$$\int_{m}^{\infty} e^{-(\theta - x)}\, d\theta = 1/2$$ [1/2]

Integrating:

$$\left[ -e^{-(\theta - x)} \right]_{m}^{\infty} = 1/2 \Rightarrow e^{-(m - x)} = 1/2 \Rightarrow m - x = \log 2 \Rightarrow m = x + \log 2$$ [1]

ie the Bayesian estimate of $\theta$ under absolute error loss is $x + \log 2$.
[Total 4]

14.7

This question is Subject CT6, April 2012, Question 6.

(i) Posterior distribution of p

Let X be the number of packets of cereal that must be opened in order to find a toy. Then $X \mid p$ has a Type 1 geometric distribution with parameter p.
The prior distribution for p is uniform over the interval [0,1], so:

$$f_{prior}(p) = 1, \quad 0 \le p \le 1$$

ie the prior PDF is constant over the interval [0,1]. [1/2]

The sample consists of one observation, $n_1$. So the likelihood function is:

$$L(p) = P(X = n_1) = (1-p)^{n_1 - 1}\, p$$ [1]

Combining the prior PDF and the likelihood function, we see that:

$$f_{post}(p) \propto f_{prior}(p) \times L(p) = p\,(1-p)^{n_1 - 1}$$ [1/2]

So the posterior distribution of p is Beta(2, $n_1$). [1]
[Total 3]

(ii) Posterior distribution after the fifth toy is found

The likelihood function is now:

$$L(p) = P(X_1 = n_1)\,P(X_2 = n_2)\,P(X_3 = n_3)\,P(X_4 = n_4)\,P(X_5 = n_5)$$

$$= p(1-p)^{n_1 - 1} \times p(1-p)^{n_2 - 1} \times p(1-p)^{n_3 - 1} \times p(1-p)^{n_4 - 1} \times p(1-p)^{n_5 - 1} = p^{5}(1-p)^{\sum n_i - 5}$$ [1]

and hence:

$$f_{post}(p) \propto p^{5}(1-p)^{\sum n_i - 5}$$ [1/2]

So the posterior distribution of p is Beta$\left(6, \sum_{i=1}^{5} n_i - 4\right)$. [1/2]
[Total 2]

(iii) Bayesian estimate under quadratic loss v maximum likelihood estimate

The Bayesian estimate of p under quadratic loss is the mean of the posterior distribution:

$$\frac{6}{6 + \sum_{i=1}^{5} n_i - 4} = \frac{6}{\sum_{i=1}^{5} n_i + 2}$$ [1/2]

The maximum likelihood estimate of p is the value of p that maximises the likelihood function:

$$L(p) = p^{5}(1-p)^{\sum n_i - 5}$$

This is the same as the value of p that maximises the log-likelihood function:

$$\log L(p) = 5 \ln p + \left( \sum_{i=1}^{5} n_i - 5 \right) \ln(1-p)$$

Differentiating with respect to p:

$$\frac{d}{dp} \log L(p) = \frac{5}{p} - \frac{\sum_{i=1}^{5} n_i - 5}{1-p}$$ [1]

The derivative is equal to 0 when:

$$\frac{5}{p} = \frac{\sum_{i=1}^{5} n_i - 5}{1-p}$$

ie when:

$$5(1-p) - p\left( \sum_{i=1}^{5} n_i - 5 \right) = 0$$

The solution of this equation is:

$$p = \frac{5}{\sum_{i=1}^{5} n_i}$$ [1]

We can check this is a maximum by differentiating a second time with respect to p:

$$\frac{d^{2}}{dp^{2}} \log L(p) = -\frac{5}{p^{2}} - \frac{\sum_{i=1}^{5} n_i - 5}{(1-p)^{2}}$$ [1/2]

Since each $n_i \ge 1$, neither term in the expression is positive and the first is strictly negative, so we have a maximum. Hence the maximum likelihood estimate of p is $5 / \sum_{i=1}^{5} n_i$.
[1/2]

The two estimates are different. The Bayesian estimate of p under quadratic loss is the value of g that minimises the expected posterior loss:

$$\int_{0}^{1} (g - p)^{2} f_{post}(p)\, dp$$ [1/2]

The maximum likelihood estimate of p is the value of p that maximises the likelihood function. [1/2]

We would expect the estimates to be different since they are calculated in different ways. [1/2]
[Total 5]

14.8

This question is Subject CT6, April 2013, Question 3.

The required probability is:

$$P(\text{up late} \mid {>}20 \text{ mins late}) = \frac{P({>}20 \text{ mins late} \mid \text{up late})\,P(\text{up late})}{P({>}20 \text{ mins late} \mid \text{up late})\,P(\text{up late}) + P({>}20 \text{ mins late} \mid \text{up on time})\,P(\text{up on time})}$$ [1]

Using the fact that when the actuary gets up late, he arrives at work X minutes late and when he gets up on time, he arrives at work Y minutes late, we have:

$$P(\text{up late} \mid {>}20 \text{ mins late}) = \frac{P(X > 20)\,P(\text{up late})}{P(X > 20)\,P(\text{up late}) + P(Y > 20)\,P(\text{up on time})}$$ [1]

Since $X \sim$ Exp(1/15):

$$P(X > 20) = 1 - F_X(20) = 1 - \left( 1 - e^{-20/15} \right) = e^{-4/3}$$ [1]

Also $Y \sim$ U(0,25), so:

$$P(Y > 20) = 1 - F_Y(20) = 1 - \frac{20 - 0}{25 - 0} = \frac{1}{5}$$ [1]

Substituting these in gives:

$$P(\text{up late} \mid {>}20 \text{ mins late}) = \frac{e^{-4/3} \times \frac{1}{3}}{e^{-4/3} \times \frac{1}{3} + \frac{1}{5} \times \frac{2}{3}} = 0.39722$$ [1]
[Total 5]

CS1-15: Credibility theory

Syllabus objectives

5.1 Explain the fundamental concepts of Bayesian statistics and use these concepts to calculate Bayesian estimates.

5.1.7 Explain what is meant by the credibility premium formula and describe the role played by the credibility factor.

5.1.8 Explain the Bayesian approach to credibility theory and use it to derive credibility premiums in simple cases.

5.1.10 Explain the differences between the two approaches (ie the Bayesian approach and the empirical Bayes approach) and state the assumptions underlying each of them.
0 Introduction

In this chapter we will discuss credibility theory and explain how it can be used to calculate premiums or to estimate claim frequencies in general insurance. Here we will concentrate on the Bayesian approach to credibility. We will be using the theory of Bayesian estimation developed in Chapter 14 as well as some results from Chapter 5 involving conditional random variables.

1 Recap of conditional expectation results

Recall from Chapter 5 that if X and Y are discrete random variables, then:

$$E(X \mid Y = y) = \sum_{x} x\, P(X = x \mid Y = y)$$

Similarly, if X and Y are continuous random variables, then:

$$E(X \mid Y = y) = \int_{x} x\, f_{X \mid Y}(x \mid y)\, dx$$

Manipulation of conditional expectations is an important technique in credibility theory, as it is in many other areas of actuarial science. Some results are:

For any random variables X and Y (for which the relevant moments exist):

$$E[X] = E[E(X \mid Y)]$$ (15.1.1)

This result is easy to demonstrate. If X and Y are discrete random variables, then:

$$E[E(X \mid Y)] = \sum_{y} E(X \mid Y = y)\, P(Y = y)$$

$$= \sum_{y} \left( \sum_{x} x\, P(X = x \mid Y = y) \right) P(Y = y)$$

$$= \sum_{y} \sum_{x} x\, P(X = x, Y = y)$$

$$= \sum_{x} \sum_{y} x\, P(X = x, Y = y)$$

$$= \sum_{x} x\, P(X = x)$$

$$= E(X)$$

A similar approach using integrals can be used if X and Y are continuous random variables.

Another important concept is that of conditional independence. If two random variables $X_1$ and $X_2$ are conditionally independent given a third random variable Y, then:

$$E[X_1 X_2 \mid Y] = E[X_1 \mid Y]\, E[X_2 \mid Y]$$ (15.1.2)

Intuitively this says that both $X_1$ and $X_2$ depend on Y, but, if the value taken by Y is known, then $X_1$ and $X_2$ are independent. This does not imply that $X_1$ and $X_2$ are unconditionally independent, ie independent if the value taken by Y is not known.
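As a numerical illustration of results (15.1.1) and (15.1.2), the following is a minimal sketch using a small discrete model. The model is an assumption made purely for illustration and does not come from the text: Y takes the values 0 and 1 with equal probability and, given Y = y, $X_1$ and $X_2$ are independent Bernoulli variables whose success probability depends on y. Exact fractions are used so that the comparisons are not affected by rounding.

```python
from fractions import Fraction
from itertools import product

# Hypothetical model (illustrative assumption, not from the text):
# P(Y = 0) = P(Y = 1) = 1/2, and given Y = y, X1 and X2 are independent
# Bernoulli variables with success probability p_success[y].
p_y = {0: Fraction(1, 2), 1: Fraction(1, 2)}
p_success = {0: Fraction(1, 5), 1: Fraction(4, 5)}


def cond_pmf(x, y):
    """P(X = x | Y = y) for a Bernoulli(p_success[y]) variable."""
    return p_success[y] if x == 1 else 1 - p_success[y]


def joint(x1, x2, y):
    """P(X1 = x1, X2 = x2, Y = y), built using conditional independence."""
    return p_y[y] * cond_pmf(x1, y) * cond_pmf(x2, y)


def cond_mean(y):
    """E[X | Y = y], the sum over x of x * P(X = x | Y = y)."""
    return sum(x * cond_pmf(x, y) for x in (0, 1))


outcomes = list(product((0, 1), (0, 1), p_y))

# (15.1.1): E[X1] taken directly from the joint distribution ...
e_x1 = sum(x1 * joint(x1, x2, y) for x1, x2, y in outcomes)
# ... and as E[E(X1 | Y)].
e_x1_tower = sum(cond_mean(y) * p_y[y] for y in p_y)

# Unconditional E[X1 X2] compared with E[X1] E[X2].
e_x1x2 = sum(x1 * x2 * joint(x1, x2, y) for x1, x2, y in outcomes)
e_x2 = e_x1  # X1 and X2 are identically distributed in this model

print(e_x1, e_x1_tower)     # 1/2 1/2
print(e_x1x2, e_x1 * e_x2)  # 17/50 1/4
```

In this model the two ways of computing $E[X_1]$ agree, while $E[X_1 X_2]$ differs from $E[X_1]E[X_2]$ because both variables depend on the common value of Y.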
Hence, it may be the case that:

$$E[X_1 X_2] \ne E[X_1]\, E[X_2]$$

even though (15.1.2) holds.

2 Credibility

2.1 The credibility premium formula

The basic idea underlying the credibility premium formula is intuitively very simple and very appealing. Consider an extremely simple example.

Example

A local authority in a small town has run a fleet of ten buses for a number of years. The local authority wishes to insure this fleet for the coming year against claims arising from accidents involving these buses. The pure premium for this insurance needs to be calculated, ie the expected cost of claims in the coming year. In order to make this calculation, the following data are available to you:

For the past five years for this fleet of buses the average cost of claims per annum (for the ten buses) has been 1,600.

Data relating to a large number of local authority bus fleets from all over the United Kingdom show that the average cost of claims per annum per bus is 250, so that the average cost of claims per annum for a fleet of ten buses is 2,500. However, while this figure of 2,500 is based on many more fleets of buses than the figure of 1,600, some of the fleets of buses included in this large data set operate under very different conditions (eg in large cities or in rural areas) from the fleet which is of concern here, and these different conditions are thought to affect the number and size of claims.

There are two extreme choices for the pure premium for the coming year:

(i) 1,600 could be chosen as it is based on the most appropriate data

(ii) 2,500 could be chosen because it is based on more data, so might be considered a more reliable figure.

The credibility approach to this problem is to take a weighted average of these two extreme answers, ie to calculate the pure premium as:

$$Z \times 1{,}600 + (1 - Z) \times 2{,}500$$

where Z is some number between zero and one. Z is known as the credibility factor.
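The weighted average above can be sketched in a few lines of code. This is only an illustration of the arithmetic: the function name `credibility_premium` and the trial values of Z are assumptions chosen for the sketch, while the figures 1,600 and 2,500 are those of the bus fleet example.

```python
def credibility_premium(z, direct_estimate, collateral_estimate):
    """Return Z * direct + (1 - Z) * collateral, requiring 0 <= Z <= 1."""
    if not 0.0 <= z <= 1.0:
        raise ValueError("the credibility factor Z must lie between 0 and 1")
    return z * direct_estimate + (1.0 - z) * collateral_estimate


# Figures from the bus fleet example.
DIRECT = 1600.0      # average annual claims cost for this fleet
COLLATERAL = 2500.0  # average annual claims cost for a fleet of ten buses

# Hypothetical values of Z, chosen only to show the interpolation
# between the two extreme choices.
for z in (0.0, 0.5, 1.0):
    premium = credibility_premium(z, DIRECT, COLLATERAL)
    print(f"Z = {z:.2f}: pure premium = {premium:,.0f}")
# Z = 0.00: pure premium = 2,500
# Z = 0.50: pure premium = 2,050
# Z = 1.00: pure premium = 1,600
```

Setting Z = 0 or Z = 1 recovers the two extreme choices (ii) and (i) respectively, and any intermediate value gives a premium between them.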
Purely for the sake of illustration, suppose Z is set equal to 0.6 so that the pure premium is calculated to be 1,960.

This example will be revisited to illustrate some points in the next section but now the above ideas will be expressed a little more formally.

The problem is to estimate the expected aggregate claims, or, possibly, just the expected number of claims, in the coming year from a risk. By a risk we mean a single policy or a group of policies. These policies are, typically, short term policies and, for convenience, the term of the policies will be taken to be one year, although it could equally well be any other short period.

The following information is available:

$\bar{x}$ is an estimate of the expected aggregate claims / number of claims for the coming year based solely on data from the risk itself.

$\mu$ is an estimate of the expected aggregate claims / number of claims for the coming year based on collateral data, ie data for risks similar to, but not necessarily identical to, the particular risk under consideration.

The credibility premium formula (or credibility estimate of the aggregate claims / number of claims) for this risk is:

$$Z \bar{x} + (1 - Z) \mu$$ (15.2.1)

where Z is a number between zero and one and is known as the credibility factor.

The attractive features of the credibility premium formula are its simplicity and, provided $\bar{x}$ and $\mu$ are obviously reasonable alternatives, the ease with which it can be explained to a lay person.

Question

A specialist insurer that provides insurance against breakdown of photocopying equipment calculates its premiums using a credibility formula.

Based on the company's recent experience of all models of copiers, the premium for this year should be 100 per machine. The company's experience for a new model of copier, which is considered to be more reliable, indicates that the premium should be 60 per machine.
Given that the credibility factor is 0.75, calculate the premium that should be charged for insuring the new model.

Solution

The premium based on the collateral data (including all machines) is:

$$\mu = 100$$

The premium based on the direct data (the new model) is:

$$\bar{X} = 60$$

So, using the credibility formula with Z = 0.75, the premium that should be charged is:

$$P = Z \bar{X} + (1 - Z) \mu = 0.75 \times 60 + 0.25 \times 100 = 70$$

Examples of situations where an insurer might determine a premium rate by combining direct data for a risk with collateral data include the following:

New type of cover

An insurer offering a new type of cover (eg protection against damage caused by driverless vehicles) would not have enough direct data available initially from the claims from the new policies to judge the premium accurately. The insurer could use claims data from similar well-established types of cover (eg vehicles with drivers) as collateral data in the first few years. As the company sold more of the new policies, the pattern of claims arising from driverless vehicles would become clearer and the insurer could put more emphasis on the direct data.

Unusual risk

An insurer insuring a small number of yachts of a particular model would not have enough direct data for this model of yacht to set an appropriate premium rate. The insurer could use past claims experience from similar types of boats as collateral data. The insurer may never have enough experience for this particular model to assess the risk purely on the basis of the direct data.

Experience rating

An insurer insuring a fleet of motor vehicles operated by a medium sized company may wish to charge a premium that is based on the collateral data provided by motor fleets as a whole, but also takes into account the past experience provided by the direct data for this particular fleet. If the safety record for the company has been good, the company will pay a lower-than-average premium.

2.2 The credibility factor

The credibility factor Z is just a weighting factor. Its value reflects how much trust is placed in the data from the risk itself, $\bar{x}$, compared with the data from the larger group, $\mu$, as an estimate of next year's expected aggregate claims or number of claims: the higher the value of Z, the more trust is placed in $\bar{x}$ compared with $\mu$, and vice versa.

This idea will be clarified by going back to the simple example in Section 2.1. Suppose that data from the particular fleet of buses under consideration had been available for more than just five years. For example, suppose that the estimate of the aggregate claims in the coming year based on data from this fleet itself were 1,600, as before, but that this is now based on ten years' data rather than just five. In this case, the figure of 1,600 is considered more trustworthy than the figure of 2,500, and this means giving the credibility factor a higher value, say 0.75 rather than 0.6. The resulting credibility estimate of the aggregate claims would be 1,825.

Now suppose the figure of 1,600 is based on just five years' data, but the figure of 2,500 is based only on data from bus fleets operating in towns of roughly the same size as the one under consideration, ie it no longer includes data from large cities or rural areas. (It is still assumed that the figure of 2,500 is based on considerably more data than the figure of 1,600.) In this case the collateral data would be regarded as more relevant than it was in Section 2.1 and so the credibility factor would be correspondingly reduced, for example to 0.4 from 0.6, giving a credibility premium of 2,140.

The models discussed in this chapter do not allow any scope for this kind of subjective adjustment.

Finally, suppose the situation is exactly as in Section 2.1 except that the figure of 2,500 is based only on data from bus fleets operating in London and Glasgow.
In this case the collateral data might be regarded as less relevant than in Section 2.1 and so the credibility factor would be correspondingly increased, for example to 0.8 from 0.6, giving a credibility premium of 1,780.

So the amount of the collateral data is also a factor. If there is a great deal of (relevant) collateral data, the credibility factor may be reduced to allow for this.

From these simple examples it can be seen that, in general terms, the credibility factor in formula (15.2.1) would be expected to behave as follows:

The more data there are from the risk itself, the higher should be the value of the credibility factor.

The more relevant the collateral data, the lower should be the value of the credibility factor.

One final point to be made about the credibility factor is that, while its value should reflect the amount of data available from the risk itself, its value should not depend on the actual data from the risk itself, ie on the value of $\bar{x}$. If Z were allowed to depend on $\bar{x}$ then any estimate of the aggregate claims / number of claims, say f, taking a value between $\bar{x}$ and $\mu$ could be written in the form of (15.2.1) by choosing Z to be equal to $(f - \mu)/(\bar{x} - \mu)$.

This is easily verified. Setting $Z = \dfrac{f - \mu}{\bar{x} - \mu}$, we see that:

$$Z \bar{x} + (1 - Z) \mu = \frac{f - \mu}{\bar{x} - \mu}\, \bar{x} + \left( 1 - \frac{f - \mu}{\bar{x} - \mu} \right) \mu = \frac{f - \mu}{\bar{x} - \mu}\, \bar{x} + \frac{\bar{x} - f}{\bar{x} - \mu}\, \mu = \frac{f \bar{x} - \mu \bar{x} + \bar{x} \mu - f \mu}{\bar{x} - \mu} = \frac{f(\bar{x} - \mu)}{\bar{x} - \mu} = f$$

The problems remain of how to measure the relevance of collateral data and how to calculate the credibility factor Z. There are two approaches to these problems: Bayesian credibility and empirical Bayes credibility theory. The first of these is covered in the remainder of this chapter. The second is discussed in