Nielsen Out of Home Reach and Frequency Methodology

Contents

Introduction
Section 1: Calculation Methodology for a Single Campaign Flight
1.1 Overview
1.2 Notation
1.3 The Process
1.3.1 Preliminary Analyses to Generate Inputs
1.3.2 Algorithm to Estimate Model Parameters
1.3.3 Calculating Outputs
1.3.4 Demographic Consistency
1.3.5 Negative Reach Build
1.3.6 Schedule Reliability
1.3.7 Sites with Zero Passage
1.4 Example
Section 2: Schedule Reach Calculation for 2+ Flights
2.1 Overview
2.2 Methodology and Example
2.3 Generating the frequency distribution
2.4 Three or more Schedules

Introduction

This document describes the reach and frequency calculation methodology for SAARF/Nielsen Outdoor data. The data are collected by GPS tracking (via the Npod) of respondents' travel patterns over a nine-day collection period. They are converted to schedule reach and frequency by cross-referencing the tracks with outdoor advertising locations and then applying a probability model, which allows the full frequency distribution to be calculated. The first section describes the basic model as it applies to a single campaign flight; the second describes how it is applied to multiple campaign flights of different lengths.

Section 1: Calculation Methodology for a Single Campaign Flight

1.1 Overview

The methodology for calculating reach and frequency employs the Gamma-Poisson distribution (also known as the Negative Binomial Distribution, or NBD). It ensures consistency with the published nine-day GRPs and reach, and generates GRPs, reach and frequency distributions for any number of days. In summary, the methodology works as follows:

1. Published (9-day) GRPs are calculated for the schedule, using the Board Impressions File.
2. A weighted reach and frequency analysis is generated using the intab¹ reporting sample for Day 1.
   Persons who are not intab on subsequent days are deemed to have no exposures on those days. This analysis is used to generate the initial model parameters.
3. The initial model parameters are scaled so that the model produces results consistent with the published nine-day GRPs.
4. The model with the scaled parameters is then used to generate the schedule frequency distribution and total GRPs for the required number of days.

¹ An "intab" respondent is a respondent whose data for a given day have passed through a quality assurance process and been included in the reporting sample.

The algorithm also covers the case where the frequency distribution reduces to a Poisson distribution, and the extreme cases where reach is 100% or 0%.

We recommend that single-precision floating-point arithmetic be used. If fixed-point calculation is used, values should be retained to at least two decimal places if respondent weights are in units, and to at least five decimal places if weights are in thousands.

1.2 Notation

| Symbol | Meaning |
|---|---|
| $G_p$ | Published 9-day GRPs for the schedule. Throughout, we assume $G_p > 0$. |
| $G_r$ | 9-day GRPs calculated in the set-up reach and frequency analysis |
| $f_r(0)$ | Weighted proportion of persons with zero exposures over 9 days in the set-up reach and frequency analysis |
| $c$ | A constant used in the calculation of the NBD parameters |
| $x!$ | $x$ factorial |
| $\ln(x)$ | Natural logarithm of $x$ |
| $\mathrm{abs}(x)$ | Absolute value of $x$ |
| $a$, $k$ | NBD parameters |
| $a$ (scaled), $\alpha$ | Scaled NBD parameters: step 50 of the algorithm replaces $a$ by $a G_p / G_r$ and sets $\alpha = 1/a$ |
| $t$ | Unit of time used in the model, measured in units of 9 days: $t = d/9$ |
| $d$ | Number of days |
| $\lambda$ | Poisson parameter |
| $f_p(i)$ | Final Gold Standard proportion of individuals with $i$ exposures to the schedule, consistent with the published 9-day GRPs |
| $G_p(d)$ | Final Gold Standard schedule GRPs over $d$ days |

1.3 The Process

1.3.1 Preliminary Analyses to Generate Inputs

Analysis 1: 9-Day GRPs, Adults 16+

The Board Impressions File is used to calculate Adults 16+ total GRPs over nine days, as follows:

a) Sum the daily passage estimates in the file for the selected sites and multiply by nine. This gives Adults 16+ nine-day impressions in thousands (000s).
b) Divide the nine-day impressions by the Adults 16+ population estimate (in the same units) and multiply by 100 to obtain Adults 16+ GRPs.

This figure, denoted $G_p$, is an input to the reach and frequency model.

Demographic Analyses

To generate the nine-day GRPs for input into demographic analyses, first produce the following preliminary results:

| Symbol | Preliminary result |
|---|---|
| A | Smoothed Adults 16+ nine-day GRPs (as described above) |
| B | Survey-calculated Adults 16+ nine-day GRPs |
| C | Survey-calculated demographic nine-day GRPs |

The input to the reach and frequency model is then

$$G_p = \frac{A}{B} \times C$$

Analysis 2

Generate a reach and frequency analysis of the nine days of data using the weighted intab sample for Day 1. Note:

(i) Use the Day 1 weights for each day.
(ii) For any of days 2–9 on which a person is not intab, that person's exposures on that day are assumed to be zero.

This analysis generates the following model inputs:

a) The set-up weighted 9-day GRPs, denoted $G_r$.
b) The weighted proportion of persons with zero exposures, denoted $f_r(0)$.

The inputs $G_r$ and $f_r(0)$ should not be rounded or truncated.

1.3.2 Algorithm to Estimate Model Parameters

The NBD is a two-parameter model. There are various ways of estimating the parameters; one approach is outlined below. It also checks for special cases where the standard model is inappropriate (the Poisson case, and reach of 100% or 0%).
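The preliminary inputs of Section 1.3.1, the parameter algorithm in the table that follows, and the output formulas of Section 1.3.3 can be sketched together in Python. This is a minimal sketch under stated assumptions, not Nielsen's implementation: all function and class names are hypothetical, and respondent data are assumed to arrive as parallel lists of Day 1 weights and nine-day exposure counts (with non-intab days already zeroed). Step 41 is read here as a Newton-Raphson iteration on $a + c\ln(1 + a) = 0$, which is what its update formula works out to given $f_r(0) = (1 + a)^{-k}$ and $G_r/100 = ka$. The 0.0001 tolerance and the 0.001/0.999 amendments are taken from the algorithm.

```python
import math
from dataclasses import dataclass


def published_grps(daily_passages_000s, population_000s):
    """Analysis 1: Adults 16+ nine-day GRPs (Gp) from daily passage estimates."""
    impressions_9day = 9.0 * sum(daily_passages_000s)   # nine-day impressions, 000s
    return 100.0 * impressions_9day / population_000s


def demographic_gp(a, b, c):
    """Demographic input Gp = A / B x C."""
    return a / b * c


def setup_inputs(weights, exposures_9day):
    """Analysis 2: Gr and fr(0) from Day 1 weights and 9-day exposure counts.

    Assumes exposures on days when a respondent is not intab are already zero.
    """
    total = sum(weights)
    gr = 100.0 * sum(w * x for w, x in zip(weights, exposures_9day)) / total
    fr0 = sum(w for w, x in zip(weights, exposures_9day) if x == 0) / total
    return gr, fr0


@dataclass
class FittedModel:
    poisson: bool
    k: float = 0.0       # NBD parameter k
    alpha: float = 0.0   # scaled NBD parameter, 1 / (scaled a)
    lam: float = 0.0     # Poisson parameter


def fit_model(gp, gr, fr0, tol=1e-4):
    """Steps 10-60 of the parameter algorithm."""
    if fr0 == 0.0:                      # step 10: amend 100% reach
        fr0 = 0.001
    elif fr0 == 1.0:                    # amend 0% reach
        fr0 = 0.999
    c = gr / (100.0 * math.log(fr0))    # step 20
    if c >= -1.0:                       # step 30: Poisson condition
        return FittedModel(poisson=True, lam=gp / 100.0)   # step 60
    a = -2.0 * (1.0 + c)                # step 40: starting value
    while True:                         # step 41: Newton-Raphson on a + c*ln(1 + a) = 0
        b = a
        a = c * (a - (1.0 + a) * math.log1p(a)) / (1.0 + a + c)
        if abs(a - b) < tol:
            break
    k = gr / (100.0 * a)                # step 50
    a_scaled = a * gp / gr              # scale to the published GRPs
    return FittedModel(poisson=False, k=k, alpha=1.0 / a_scaled)


def frequency_distribution(model, days, n_max):
    """fp(0..n_max) for a schedule of `days` days (t = days / 9)."""
    t = days / 9.0
    if model.poisson:
        mu = model.lam * t
        probs = [math.exp(-mu)]
        for i in range(1, n_max + 1):
            probs.append(probs[-1] * mu / i)   # (mu^i e^-mu / i!) built recursively
        return probs
    p = t / (model.alpha + t)
    probs = [(model.alpha / (model.alpha + t)) ** model.k]
    for n in range(1, n_max + 1):
        probs.append(probs[-1] * (model.k + n - 1) / n * p)
    return probs


def reach_pct(model, days):
    """Schedule reach over `days` days, as a percentage."""
    return 100.0 * (1.0 - frequency_distribution(model, days, 0)[0])
```

For instance, `fit_model(150.0, 150.0, 0.4)` gives k ≈ 1 and α ≈ 2/3, and `reach_pct(model, 9)` then returns 60%, i.e. 1 − fr(0), as expected when $G_p = G_r$.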
| Step | Algorithm | Comment |
|---|---|---|
| 10 | if fr(0) = 0 then set fr(0) = 0.001 | Amend 100% reach |
| | if fr(0) = 1 then set fr(0) = 0.999 | Amend 0% reach |
| 20 | c = Gr / (100 * ln(fr(0))) | Calculate c |
| 30 | if c ≥ -1 then go to 60 | Check for Poisson condition |
| 40 | a = -2 * (1 + c) | Iterative procedure to estimate NBD parameter a (starting value) |
| 41 | b = a | b holds the previous estimate of a |
| | a = c * (a - (1 + a) * ln(1 + a)) / (1 + a + c) | |
| | if abs(a - b) < 0.0001 then go to 50 | |
| | go to 41 | |
| 50 | k = Gr / (100 * a) | NBD parameter k |
| | a = a * Gp / Gr | Scaled NBD parameters |
| | α = 1 / a | |
| | go to End | |
| 60 | λ = Gp / 100 | Poisson parameter |
| End | | |

1.3.3 Calculating Outputs

Negative Binomial Distribution

The full frequency distribution for time $t$ is obtained by first calculating the probability of zero exposures, and then calculating the probability of $n$ exposures from the probability of $n-1$ exposures.

Probability of zero exposures:

$$f_p(0) = \left(\frac{\alpha}{\alpha + t}\right)^{k}$$

Probability of $n$ exposures, for $n \ge 1$:

$$f_p(n) = \frac{k + n - 1}{n} \cdot \frac{t}{\alpha + t} \cdot f_p(n-1)$$

The average frequency AveF (the mean number of exposures per person across the whole population, equal to $G_p t / 100$) is given by

$$\mathrm{AveF} = \frac{k t}{\alpha}$$

Note that $t$ is measured in units of 9 days. Schedule reach over $d$ days, expressed as a percentage, is given by

$$\text{Reach} = 100\left(1 - \left(\frac{\alpha}{\alpha + d/9}\right)^{k}\right)$$

and the GRPs are given by

$$G_p(d) = \frac{d}{9}\, G_p(9)$$

Poisson Distribution

When the Poisson condition is indicated, the frequency distribution for time $t$ is

$$f_p(i) = \frac{(\lambda t)^{i} e^{-\lambda t}}{i!}$$

Schedule reach over $d$ days, expressed as a percentage, is therefore

$$\text{Reach} = 100\left(1 - e^{-\lambda d/9}\right)$$

and the GRPs delivered are

$$G_p(d) = \frac{d}{9}\, G_p(9)$$

1.3.4 Demographic Consistency

Reach estimates from separate reach and frequency runs are not always additive. For example, Men 16+ and Women 16+ reach estimates will not necessarily be consistent with the Adults 16+ reach estimate for the same schedule. Where consistency is required, a "top-down" balancing should be adopted.
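Top-down balancing amounts to a proportional rescaling: each demographic's reach (in 000s) is multiplied by the ratio of the parent (Adults 16+) reach to the sum of the demographic reaches. The following is a minimal sketch, assuming the demographics are mutually exclusive and together make up the parent group (as Men 16+ and Women 16+ do for Adults 16+); the function name `balance_top_down` is hypothetical. Applied to the figures in the example that follows, it reproduces the final reach values.

```python
def balance_top_down(total_reach_000s, demo_reach_000s):
    """Rescale demographic reach (000s) so that it sums to the parent total."""
    factor = total_reach_000s / sum(demo_reach_000s.values())
    return {name: reach * factor for name, reach in demo_reach_000s.items()}


# Figures from the Adults 16+ / Men 16+ / Women 16+ example.
ue_000s = {"Men 16+": 480.0, "Women 16+": 520.0}
initial_pct = {"Men 16+": 60.0, "Women 16+": 40.0}
initial_000s = {name: ue_000s[name] * initial_pct[name] / 100.0 for name in ue_000s}

balanced = balance_top_down(500.0, initial_000s)   # Adults 16+ reach = 500 (000s)
for name, reach in balanced.items():
    print(f"{name}: {reach:.1f} (000s), {100.0 * reach / ue_000s[name]:.1f}%")
# Men 16+: 290.3 (000s), 60.5%
# Women 16+: 209.7 (000s), 40.3%
```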
For example:

Adults 16+, Men 16+ and Women 16+ Schedule Analysis (initial)

| | Adults 16+ | Men 16+ | Women 16+ |
|---|---|---|---|
| UE (000s) | 1000 | 480 | 520 |
| Initial reach (%) | 50 | 60 | 40 |
| Initial reach (000s) | 500 | 288 | 208 |

The summed demographic reach is 288 + 208 = 496 (000s). For consistency, the demographic reach estimates are balanced to the Adults 16+ total of 500, i.e. each is multiplied by 500/496:

Adults 16+, Men 16+ and Women 16+ Schedule Analysis (balanced)

| | Adults 16+ | Men 16+ | Women 16+ |
|---|---|---|---|
| Summed weights (000s) | 1000 | 480 | 520 |
| Initial reach (%) | 50 | 60 | 40 |
| Initial reach (000s) | 500 | 288 | 208 |
| Final reach (000s) | 500 | 290.3 | 209.7 |
| Final reach (%) | 50.0 | 60.5 | 40.3 |

1.3.5 Negative Reach Build

The reach projection model can return an apparent negative reach build as boards are added to a campaign. This phenomenon affects all reach projection models to some extent. A negative reach build is a manifestation of forecast error in the model, and essentially means that there is no significant difference between the reach of the two schedules.

1.3.6 Schedule Reliability

The table below presents estimates of the standard error for campaigns of varying sizes for Adults 16+, based on the sample of 1,933 respondents in Gauteng and KZN. Estimates for demographics can be approximated from these data using the inverse square-root relationship between sample size and standard error: a demographic that is a quarter of the sample will have double the standard error. More generally, the standard error for a demographic can be estimated as

$$\text{SE}_{\text{demographic}} = \text{SE}_{\text{Adults 16+}} \times \sqrt{\frac{n_{\text{Adults 16+}}}{n_{\text{demographic}}}}$$

where $n$ denotes sample size.

| Number of boards | 28-day GRPs | Standard error (% of GRPs) | 95% confidence interval (GRPs) |
|---|---|---|---|
| 10 | 52 | 11 | 41–63 |
| 20 | 105 | 7 | 90–119 |
| 50 | 261 | 5 | 234–289 |
| 100 | 523 | 5 | 474–571 |
| 150 | 784 | 4 | 723–845 |
| 200 | 1045 | 4 | 965–1125 |

As a guide, we recommend that a campaign analysis require at least 40 separate respondents in the target demographic to have been exposed to the campaign.
In practice, this means that a few top-rated boards will support analysis on their own for broad demographics, but a typical campaign aimed at 16–34s (for example) would require five or more boards, and in some cases (e.g. lower-rated boards in rural areas) considerably more boards would need to be selected.

1.3.7 Sites with Zero Passage

A minority of sites have zero passage (no sample respondents were exposed to the site in the fieldwork period). The Board Impressions File sets the Adults 16+ value for each board, and this will be extended later in 2007 to include demographics. In the interim, some smaller campaigns will have zero impressions for some demographics. In this case, Nielsen recommends estimating the campaign's demographic profile by taking the average demographic profile of the site types in the campaign. This should be an acceptable approach for sex/age demographics, but will be less reliable for other characteristics such as race and income.

1.4 Example: Estimating Adults 35–49 GRPs for a 100 GRP Campaign

Consider a campaign aimed at Adults 35–49 that delivers 100 Adults 16+ GRPs, composed of 60 GRPs on Super Signs and 40 GRPs on 96 Sheets. From the survey data, Adults 35–49 GRPs index at 151 on Adults 16+ for all Super Signs and at 143 for 96 Sheets. Adults 35–49 GRPs for the campaign are therefore:

| Site type | Adults 16+ GRPs | Index | Adults 35–49 GRPs |
|---|---|---|---|
| Super Signs | 60 | 151 | 60 × 1.51 = 90.6 |
| 96 Sheets | 40 | 143 | 40 × 1.43 = 57.2 |
| Total campaign | 100 | | 147.8 |

Section 2: Schedule Reach Calculation for 2+ Flights

2.1 Overview

This section specifies the methodology for calculating reach and frequency for schedules with two or more components of different durations. An example is a four-week run of bus shelters combined with two weeks of billboards. The methodology produces the final delivered reach and frequency of multi-flight schedules.

There is no need to follow this procedure for two components of equal length (e.g. 4 weeks of bus shelters and 4 weeks of billboards); these can be combined and treated as a single schedule, even if they run at different times.

2.2 Methodology and Example

Example: the schedule has two components. Component 1 lasts four weeks and Component 2 lasts two weeks. We calculate reach and frequency for each using the NBD in the normal way, and label the results for use in the subsequent calculation:

| | Length | GRPs | Reach (%) | Label |
|---|---|---|---|---|
| Component 1 | 4 weeks | 1000 | 60 | R12 |
| Component 2 | 2 weeks | 400 | 40 | R21 |

GRPs

The first and easiest step is to calculate the combined GRP delivery, which is simply the sum of the two components: in this case, 1400 GRPs.

Reach

We first calculate the GRPs and reach for the two components combined, over 2 weeks and over 4 weeks:

| | Length | GRPs | Reach (%) | Label |
|---|---|---|---|---|
| Components 1/2 | 2 weeks | 900 | 60 | R31 |
| Components 1/2 | 4 weeks | 1800 | 70 | R32 |

We also calculate Component 1 over 2 weeks, and Component 2 over 4 weeks:

| | Length | GRPs | Reach (%) | Label |
|---|---|---|---|---|
| Component 1 | 2 weeks | 500 | 45 | R11 |
| Component 2 | 4 weeks | 800 | 50 | R22 |

In each label Rij, i identifies the component (3 denotes the two components combined) and j the period (1 = the first 2 weeks, 2 = the full 4 weeks).

These reach runs allow us to populate part of the following table, in which Component 1 periods form the rows and Component 2 periods the columns (all figures are percentages of the population):

| Component 1 / Component 2 | Period 1 (40) | Period 2, not Period 1 (10) | Neither period (50) |
|---|---|---|---|
| Period 1 (45) | 25.0 | ↑ | |
| Period 2, not Period 1 (15) | ← | 15.0 | |
| Neither period (40) | | | 30.0 |

The 15.0 is the combined total of its own cell and the two cells marked with arrows.

The values relate to the labels as follows:

| Component 1 / Component 2 | Period 1: R21 | Period 2, not Period 1: R22-R21 | Neither period: 100-R22 |
|---|---|---|---|
| Period 1: R11 | R11+R21-R31 | ↑ | |
| Period 2, not Period 1: R12-R11 | ← | (R12+R22-R32) - (R11+R21-R31) | |
| Neither period: 100-R12 | | | 100-R32 |

With this information, we need to find the best solution for the remaining cells, labelled a to g:

| Component 1 / Component 2 | Period 1: R21 | Period 2, not Period 1: R22-R21 | Neither period: 100-R22 |
|---|---|---|---|
| Period 1: R11 | R11+R21-R31 | a | d |
| Period 2, not Period 1: R12-R11 | b | c | e |
| Neither period: 100-R12 | f | g | 100-R32 |

The solution is as follows. We first calculate a, b and c assuming random duplication between the two components of the schedule, and then factor them to fit the results of the combined schedule analysis.
We then derive d, e, f and g directly (in fact, only f is needed).

First, we ensure consistency between the individual component analyses and the combined analysis:

| Step | Calculation |
|---|---|
| 100 | Set k1 = R11 + R21 - R31; if k1 < 0, set k1 = 0 |
| 200 | Set k2 = R12 + R22 - R32; if k2 < 0, set k2 = 0 |

Second, we calculate a, b and c assuming random duplication:

| Step | Calculation |
|---|---|
| 300 | a = R11 × (R22 - R21)/100 |
| | b = (R12 - R11) × R21/100 |
| | c = (R12 - R11) × (R22 - R21)/100 |
| | Set k3 = a + b + c |

Third, we factor a, b and c to fit the combined schedule analysis results. Since k2 is the total overlap across both periods (k1 + a + b + c), the factored values of a, b and c must sum to k2 - k1:

| Step | Calculation |
|---|---|
| 400 | Set k4 = (k2 - k1)/k3 |
| | a = a × k4 |
| | b = b × k4 |
| | c = c × k4 |

Fourth, we derive d, e, f and g, each as the remainder of its row or column total:

| Step | Calculation |
|---|---|
| 500 | d = R11 - k1 - a |
| | e = R12 - R11 - b - c |
| | f = R21 - k1 - b |
| | g = R22 - R21 - a - c |

The final task is to calculate the combined reach, which consists of:

- those reached by Component 1 in Period 1 or 2 (call this X1), so X1 = R12;
- those reached by Component 2 in Period 1 who were not reached by Component 1 in either period (X2), so X2 = f.

Combined reach is therefore R12 + f.

In this example, we get the following results:

| Component 1 / Component 2 | Period 1 (40) | Period 2, not Period 1 (10) | Neither period (50) |
|---|---|---|---|
| Period 1 (45) | 25.0 | 5.6 | 14.4 |
| Period 2, not Period 1 (15) | 7.5 | 1.9 | 5.6 |
| Neither period (40) | 7.5 | 2.5 | 30.0 |

The final schedule reach is then 60 + 7.5 = 67.5.

2.3 Generating the frequency distribution at the end of the schedule

The frequency distribution of the schedule can be calculated by fitting the NBD to the schedule, with the final reach and GRPs as the inputs for parameter calculation. Note:

i) There is no need to adjust the parameters, since the results are final.
ii) The distribution is obtained by setting t = 1.
iii) In multi-component schedule analyses, the NBD is valid only for the complete schedule. It cannot be used for reach build over time during the schedule, because the components cover different time periods.
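Under these notes, the end-of-schedule distribution can be sketched by running the parameter algorithm of Section 1.3.2 with the final GRPs as both $G_p$ and $G_r$ (so the scaling step leaves a unchanged), taking fr(0) = 1 − final reach/100, and evaluating the NBD recursion at t = 1. This is a hedged sketch: the function names are hypothetical, the Poisson branch is reduced to an error for brevity (an assumption that holds for these inputs), and the inputs 1400 GRPs and 67.5% reach are the totals of the two-component example above.

```python
import math


def fit_final_nbd(total_grps, final_reach_pct, tol=1e-4):
    """Fit NBD (k, alpha) to a complete schedule's final GRPs and reach.

    Gp and Gr are both the final GRPs, so scaling leaves a unchanged
    and alpha = 1 / a.
    """
    fr0 = 1.0 - final_reach_pct / 100.0
    if fr0 == 0.0:            # amend 100% reach, as in step 10
        fr0 = 0.001
    elif fr0 == 1.0:          # amend 0% reach
        fr0 = 0.999
    c = total_grps / (100.0 * math.log(fr0))
    if c >= -1.0:
        raise ValueError("Poisson condition: use lambda = total_grps / 100")
    a = -2.0 * (1.0 + c)      # starting value, as in step 40
    while True:               # Newton-Raphson iteration of step 41
        b = a
        a = c * (a - (1.0 + a) * math.log1p(a)) / (1.0 + a + c)
        if abs(a - b) < tol:
            break
    k = total_grps / (100.0 * a)
    return k, 1.0 / a


def final_distribution(k, alpha, n_max):
    """fp(0..n_max) for the complete schedule, i.e. t = 1."""
    p = 1.0 / (alpha + 1.0)
    probs = [(alpha / (alpha + 1.0)) ** k]
    for n in range(1, n_max + 1):
        probs.append(probs[-1] * (k + n - 1) / n * p)
    return probs


# Totals from the two-component example: 1000 + 400 GRPs, final reach 67.5%.
k, alpha = fit_final_nbd(1400.0, 67.5)
dist = final_distribution(k, alpha, 10)
print(f"reach = {100.0 * (1.0 - dist[0]):.1f}%")   # reach = 67.5%
for n in range(1, 4):
    print(f"exactly {n} exposures: {100.0 * dist[n]:.1f}%")
```

The fitted distribution reproduces the 67.5% reach and has a mean of 14 exposures per person (1400 GRPs / 100).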
The results that feed into the parameter calculation are:

- GRP total = GRPs of R12 + GRPs of R21 (1000 + 400 = 1400 in the example)
- Final reach = R12 + f (67.5 in the example)

2.4 Three or more Schedules

The process can be applied iteratively to three or more schedules. The two shortest schedules should be combined first and then treated as a single schedule, with the combined reach calculated as described above. The third schedule is then combined with this combined schedule in the same way, with the Period 1 inputs being the results for the first two schedules. The overlaps in Periods 1 and 2 between Schedules (1 and 2) and 3 are estimated by factoring.

Example, using the two-component schedule above and a third schedule. We have the following basic components; the 70.0 for Components 1/2 is R32 from the example above, i.e. the reach of these components over four weeks:

| Component | Duration | Reach (%) |
|---|---|---|
| 1/2 | 4 weeks | 70.0 |
| 1/2 | 6 weeks | 75.0 |
| 3 | 4 weeks | 35.0 |
| 3 | 6 weeks | 40.0 |
| 1/2/3 | 4 weeks | 80.0 |
| 1/2/3 | 6 weeks | 84.0 |

To take into account that Component 2 runs for only two weeks, so that the reach of Components 1 and 2 over the four weeks is 67.5 (as shown above) rather than 70.0, we factor the results as follows. The factored results are the inputs to the procedure, with the reference codes as used in Section 2.2:

| Component | Duration | Reach (%) | Factored | Result | Reference |
|---|---|---|---|---|---|
| 1/2 | 4 weeks | 70.0 | 67.5 | 67.5 | R21 |
| 1/2 | 6 weeks | 75.0 | (67.5/70) × 75 | 72.3 | R22 |
| 3 | 4 weeks | 35.0 | 35.0 | 35.0 | R11 |
| 3 | 6 weeks | 40.0 | 40.0 | 40.0 | R12 |
| 1/2/3 | 4 weeks | 80.0 | 80 × (67.5 + 35)/(70 + 35) | 78.1 | R31 |
| 1/2/3 | 6 weeks | 84.0 | 84 × (72.3 + 40)/(75 + 40) | 82.0 | R32 |

Using the method above, this delivers a total reach for the three components over the six weeks of 79.4.
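The two-component procedure of Section 2.2 (steps 100 to 500) and its iterative use for three components in Section 2.4 can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: `combine_two` is a hypothetical name, inputs are reach percentages labelled as in Section 2.2 (Component 1 being the longer component), and the sketch does not guard against k3 = 0. In the three-component part, the combined Components 1/2 take the shorter-component (R2x) role and Component 3 the longer (R1x) role, matching the reference codes in the table above. The six-week combined reach is factored with six-week values, (R22 + 40)/(75 + 40); this is an assumption on our part that mirrors the four-week row and reproduces the 82.0 tabulated for R32.

```python
def combine_two(r11, r12, r21, r22, r31, r32):
    """Steps 100-500: combined reach R12 + f for a longer Component 1 and shorter Component 2.

    Returns the combined reach and the factored cells a-g.
    A zero k3 would raise ZeroDivisionError; it is not handled here.
    """
    k1 = max(r11 + r21 - r31, 0.0)           # step 100
    k2 = max(r12 + r22 - r32, 0.0)           # step 200
    a = r11 * (r22 - r21) / 100.0            # step 300: random duplication
    b = (r12 - r11) * r21 / 100.0
    c = (r12 - r11) * (r22 - r21) / 100.0
    k3 = a + b + c
    k4 = (k2 - k1) / k3                      # step 400: factor a, b and c
    a, b, c = a * k4, b * k4, c * k4
    cells = {
        "a": a, "b": b, "c": c,
        "d": r11 - k1 - a,                   # step 500
        "e": r12 - r11 - b - c,
        "f": r21 - k1 - b,
        "g": r22 - r21 - a - c,
    }
    return r12 + cells["f"], cells


# Section 2.2: Component 1 (4 weeks) with Component 2 (2 weeks).
reach_12, cells = combine_two(r11=45.0, r12=60.0, r21=40.0, r22=50.0, r31=60.0, r32=70.0)
print(f"Components 1/2: {reach_12:.1f}")     # Components 1/2: 67.5

# Section 2.4: Component 3 (6 weeks) takes the R1x role and the combined
# Components 1/2 (4 weeks) the R2x role; Period 1 = 4 weeks, Period 2 = 6 weeks.
r21 = reach_12                               # 1/2 over 4 weeks, replacing the raw 70.0
r22 = reach_12 / 70.0 * 75.0                 # 1/2 over 6 weeks, factored
r11, r12 = 35.0, 40.0                        # Component 3 over 4 and 6 weeks
r31 = 80.0 * (r21 + r11) / (70.0 + r11)      # 1/2/3 over 4 weeks, factored
r32 = 84.0 * (r22 + r12) / (75.0 + r12)      # 1/2/3 over 6 weeks, factored (assumed form)
reach_123, _ = combine_two(r11, r12, r21, r22, r31, r32)
print(f"Components 1/2/3: {reach_123:.1f}")  # Components 1/2/3: 79.4
```

The cell values returned for the Section 2.2 example (a = 5.625, c = 1.875, d = 14.375, e = 5.625, g = 2.5) round to the figures in the results table.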