Estimation method for media audience duplication ESOMAR

ESOMAR

Cross Media Conference, Montreal, June 2005

www.esomar.org


Patricio Moyano Galdames and Orlando Muñoz Balmaceda

Time Ibope, Chile

Elias Selman Carranza

Ibope Time Pacific, Chile

OVERVIEW

Modeling the duplication of vehicle audiences has a long history in our field, with both successes and failures, from Agostini's method (1), with its controversial K constant, to the more broadly accepted Metheringham method (beta-binomial distribution) (2).

But this outstanding pioneer in the study of audience duplication left us with a serious problem: estimated reach can decline when a spot whose rating is lower than the schedule's current average rating is added (3).

Discussion of this problem has not advanced significantly. This undesirable effect, the decline of reach, has led many of our colleagues to improve their estimates using proprietary models of a similar type (4).

However, all of this work relates mainly to readership estimation, that is, to the problem of advertising in print media.

In television, the considerations are somewhat different, because people's exposure depends on dayparts and programming schedules. It is therefore necessary to analyze both intra-channel and inter-channel duplication. Initial investigations assumed constant duplication (5) across the set of media outlets analyzed.

A long-standing simplification is to assume that duplication is random, that is, that consumption of each media outlet, program, etc., is independent of the others. This solution, however, is questionable when estimating media reach.

With the rise of personal computers, there has been explosive growth in the analysis of media plans and in software for this purpose with evaluation functionalities (6) that provide statistics on reach, average frequency, exposure distributions, etc. These systems calculate real duplication from the raw database produced by research, especially by the People Meter system. In some cases a final adjustment is performed to match the published (daily) GRPs with those calculated from a constant sample panel, which is generally formed on the middle day (7) of the period being evaluated.

The need for combined media assessments has led market researchers to design a methodology generically known as “single source” (8).

Because a single interview asks about the consumption of different media, the same data can be used to perform multimedia evaluations on the same sample. This is unquestionably an adequate solution, but the information it provides serves media strategy (long-term) better than the purchase of space in vehicles, which relates more closely to media tactics (short-term). This is particularly true in television, where the fight for audience occurs on a daily basis and specialized studies, such as readership studies or the People Meter system, provide more accurate and detailed data.

Another alternative, both new and promising, is data fusion. This technique uses common elements to match up the contents of various databases, thereby creating a single database. The fused database can also be regarded as an adequate approximation for multimedia estimation, except that matching more than two databases is complex and requires the additional assumption that the matching variables are sufficient to establish a consistent fusion.

In short, there are different approaches to evaluating media plans, especially mixed-media plans, depending on the communication needs of campaigns for products and services.

Our approach takes into account the fact that specialized audience measurement studies provide the highest-quality and most detailed data, and that they are used intensively in the purchase of space. What is lacking is a consistent link that would allow the results of a media plan, viewed as a whole, to be consolidated with high precision and low information loss.

THE MODEL

We define f(x) as the frequency distribution of a given schedule for media outlet A, and g(y) as the frequency distribution of another schedule for media outlet B. The problem is thus to create a new “intermediate” distribution, which we call h(x,y), that describes the joint effect of the two schedules on the campaign's target group. In the following stage, h(x,y) is consolidated into a new distribution, which we call T(z).

The distributions of X and Y are thus the marginal distributions of the joint distribution obtained from the algorithm described below.

The expected value of each distribution, E(X) or E(Y), equals the GRPs of the corresponding schedule (see Equation 1).

The final distribution T(z) must then satisfy the following conditions: its GRPs must equal the sum of the GRPs of the two distributions, and its total reach must be at least as large as the larger of the two reaches and no greater than the reach produced under the hypothesis of independence.

Conditions:

1. Grps(T) = Grps(X) + Grps(Y)

2. Reach(T) ≥ Max(Reach(X), Reach(Y))

3. Reach(T) ≤ Reach(X) + Reach(Y) − Reach(X) · Reach(Y)

where Reach = 1 − f(0), that is, 1 minus the proportion of the target who do not see the schedule.
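The reach and GRP definitions and the three conditions can be checked mechanically. The following sketch is illustrative only: it assumes each frequency distribution is a list of proportions indexed by number of exposures (so dist[0] = f(0)), takes GRPs as 100 × E(X) because the proportions are on a 0 to 1 scale, and counts the open-ended top class (e.g. 9+) at its lower bound. The function names are hypothetical.

```python
from typing import Sequence

# Hypothetical helpers. dist[k] = proportion of the target exposed exactly k times.
# The open-ended top class is counted at its lower bound (an approximation).

def reach(dist: Sequence[float]) -> float:
    """Reach = 1 - f(0), as a proportion of the target."""
    return 1.0 - dist[0]

def grps(dist: Sequence[float]) -> float:
    """GRPs = 100 * E(X), the mean number of exposures expressed in rating points."""
    return 100.0 * sum(k * p for k, p in enumerate(dist))

def reach_bounds(reach_x: float, reach_y: float) -> tuple[float, float]:
    """Interval allowed for Reach(T) by conditions 2 and 3."""
    lower = max(reach_x, reach_y)                  # condition 2: full overlap
    upper = reach_x + reach_y - reach_x * reach_y  # condition 3: independence
    return lower, upper

def satisfies_conditions(dist_t: Sequence[float], dist_x: Sequence[float],
                         dist_y: Sequence[float], tol: float = 1e-6) -> bool:
    """Check conditions 1 to 3 for a candidate consolidated distribution T."""
    lower, upper = reach_bounds(reach(dist_x), reach(dist_y))
    grps_ok = abs(grps(dist_t) - (grps(dist_x) + grps(dist_y))) <= tol  # condition 1
    reach_ok = lower - tol <= reach(dist_t) <= upper + tol             # conditions 2, 3
    return grps_ok and reach_ok
```

For example, combining two distributions under full independence (their convolution) gives a reach exactly at the upper bound of condition 3, so it passes the check; any candidate whose GRPs differ from the sum fails condition 1.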

The procedure begins by calculating the new total reach, which must lie within the interval defined by the conditions above.

The Limits of the Reach from the Point of View of Set Theory

1. The maximum reach of both schedules (see Figure 4)

For the minimum level, we assume that the audience reached by schedule A is contained in (is a subset of) the audience reached by schedule B, so that the reach of A does not exceed the reach of B (see Equation 2).

This means that when both schedules are consolidated, the total reach equals the reach of B.

2. Schedule B is independent of schedule A (see Equation 3)

This means that the intersection of schedules A and B is the product of their probabilities (see Equation 4).

Consequently, the maximum level of reach corresponds to random duplication.
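Treating the reach of each schedule as the probability that a member of the target is reached, the two limits can be written as follows. This is a reconstruction from the set-theoretic argument in the text, not a copy of Equations 2 to 4.

```latex
\begin{aligned}
\text{Lower limit: } & A \subseteq B \;\Rightarrow\; A \cup B = B
  \;\Rightarrow\; \operatorname{Reach}(A \cup B) = \operatorname{Reach}(B)
  = \max\bigl(\operatorname{Reach}(A), \operatorname{Reach}(B)\bigr) \\
\text{Upper limit: } & P(A \cap B) = P(A)\,P(B) \;\Rightarrow\;
  \operatorname{Reach}(A \cup B) = P(A) + P(B) - P(A)\,P(B)
\end{aligned}
```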

CREATING THE FACTORS

First, the factors that determine the maximum and minimum levels of reach are calculated; together they define the interval within which the total reach (the result of consolidating both schedules) must lie.

Factor: Maximum Reach of Both Schedules. This is the factor from which the reach at the lower limit of the consolidated schedule is obtained (see Equation 5).

Factor: Random Reach (Independence). This is the factor from which the reach at the upper limit of the consolidated schedule is obtained (see Equation 6).

New Factor, or Probabilistic Factor. This is the factor most likely to occur (see Equation 7).

The new total reach, or mixed reach, of the consolidated model is determined using the probabilistic factor (see Equation 8).

Figure 1 depicts the curves of the three factors as a function of increasing GRPs. Reach increases with the GRPs, but with diminishing returns: the factor acts to decelerate the growth of reach as GRPs increase, since the factors are basically the OTS minus the GRPs.

Once the mixed reach of the schedules being consolidated has been estimated, the next step is to calculate the ratio from which the joint distribution of the two frequency distributions is built.

Estimation of the Ratio. This ratio makes it possible to distribute the joint proportions of the schedule distributions being consolidated (see Equation 9) (9).

Here random duplication (Dup_rdm) is given by Equation 10.

In probabilistic terms, this duplication assumes independence between the media being consolidated; in other words, consuming one media outlet has no influence on consuming the other.

Actual duplication (Dup_act) is given by Equation 11.
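The two duplication measures can be sketched in Python as follows. Random duplication follows directly from the independence assumption, and actual duplication from the relation, stated in the text, that the consolidated reach is the sum of the two reaches minus their duplication. The form of the ratio is an assumption: Equation 9 is not reproduced here, so the sketch takes it to be Dup_act / Dup_rdm. Reaches are proportions of the target, and the function names are hypothetical.

```python
# Sketch, assuming reaches are proportions of the target (0 to 1).

def random_duplication(reach_a: float, reach_b: float) -> float:
    """Dup_rdm: expected overlap if exposure to A and to B is independent."""
    return reach_a * reach_b

def actual_duplication(reach_a: float, reach_b: float, reach_mix: float) -> float:
    """Dup_act: overlap implied by the mixed reach (Reach_mix = Reach_a + Reach_b - Dup_act)."""
    return reach_a + reach_b - reach_mix

def duplication_ratio(reach_a: float, reach_b: float, reach_mix: float) -> float:
    """ASSUMED form of the ratio: Dup_act / Dup_rdm (Equation 9 is not reproduced)."""
    dup_rdm = random_duplication(reach_a, reach_b)
    if dup_rdm <= 0:
        raise ValueError("both schedules must have positive reach")
    return actual_duplication(reach_a, reach_b, reach_mix) / dup_rdm
```

With reaches of 0.6 and 0.5, a mixed reach at the independence bound (0.8) gives a ratio of 1, while a mixed reach at the lower bound (0.6) gives 1/0.6. Under this assumed form, any mixed reach inside the admissible interval yields a ratio between 1 and 1/max(Reach_a, Reach_b).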

In the following section we present an example that illustrates the application of this method.

ILLUSTRATION

In order to illustrate the methodology that consolidates the frequency distributions of various media, two schedules were created: a television schedule and a print media schedule. The schedules were then evaluated using the software currently available in the Chilean market, TVdata and PrintPlan, which are used for television and print media schedules, respectively.

The target group used here is all people in Chile aged 25 to 54; this universe comprises 2,149,519 people. Figure 2 shows the output of the software when applied to the two schedules being studied.

The frequency distributions of the individual media are presented in Table 1 (see also Table 2).

Estimation of the Maximum Factor. This is the factor that corresponds to the lower limit of the consolidated schedule. Here it is the minimum reach possible in the new distribution, that is, the larger of the reaches of the two media evaluated independently. Let Reach_max be the larger of the two schedules' reaches, GRP_tv the GRPs obtained by the television schedule, GRP_pr the GRPs obtained by the print media schedule, and GRP_mix = GRP_tv + GRP_pr (see Equation 12).

Estimation of the Random Factor. This is the factor that corresponds to the upper limit of the consolidated schedule. Here it corresponds to random duplication, which means assuming independence between the media. Let Reach_rdm be the reach obtained when assuming random duplication (Dup_rdm) (see Equation 13).

Estimation of the Probabilistic Factor. This is an average of the Maximum and Random factors, weighted according to their respective reaches (see Equation 14).
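The reach-weighted average can be sketched as follows. Equation 14 is not reproduced, so this is an interpretation of "weighted according to their respective reaches": the sketch assumes the weights are Reach_max and Reach_rdm themselves, and it takes the two factors as inputs because Equations 12 and 13 are not reproduced either. The function name is hypothetical.

```python
def probabilistic_factor(factor_max: float, reach_max: float,
                         factor_rdm: float, reach_rdm: float) -> float:
    """Reach-weighted average of the Maximum and Random factors.

    ASSUMPTION: the weights are Reach_max and Reach_rdm; this interprets the
    text's description of Equation 14 rather than reproducing it.
    """
    total = reach_max + reach_rdm
    if total <= 0:
        raise ValueError("reaches must be positive")
    return (factor_max * reach_max + factor_rdm * reach_rdm) / total
```

Because both weights are non-negative, the result always lies between the Maximum and Random factors, with more weight on the factor whose reach is larger.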

Estimation of the Reach of the Consolidated Schedule. The weighted factor is used to determine the reach of the consolidated schedule (Reach_mix), as shown in Equation 15.

Calculation of Actual Duplication. Once the reach of the consolidated schedule has been determined, the actual duplication of the television and print media schedules can be calculated. We designate this duplication as Dup_act. The reach of the consolidated schedule (Reach_mix) is the sum of the reaches of the television schedule (Reach_tv) and the print media schedule (Reach_pr), minus their duplication (Dup_act), that is (see Equation 16):

Reach_mix = Reach_tv + Reach_pr − Dup_act

This produces (see Equation 17):

Dup_act = Reach_tv + Reach_pr − Reach_mix

Estimation of the Frequency Distribution of the Consolidated Schedule. To estimate the consolidated frequency distribution, the proportion of people not reached (zero frequency) in each of the two distributions (television and print media) must be recalculated.

1. Estimation of the ratio (see Equation 18).

2. Re-estimation of the proportion of individuals not reached by each schedule (see Equation 19).

Where:

P*_{•,0}: re-estimated proportion of people not reached by the television schedule.

P_{j=0}: original proportion of people not reached by the television schedule.

P*_{0,•}: re-estimated proportion of people not reached by the print media schedule.

P_{i=0}: original proportion of people not reached by the print media schedule (see Tables 3 and 4).

The consolidated frequency distribution is obtained from a matrix that combines the frequency distributions of the two media, each with its recalculated zero frequency.

Let:

P_{i,j}: matrix cell (i,j), the proportion of individuals exposed i times to the print media schedule and j times to the television schedule, where i = 0, 1, ..., 11+ and j = 0, 1, ..., 9+.

P_{•,j}: frequency distribution of television with the recalculated zero frequency.

P_{i,•}: frequency distribution of print media with the recalculated zero frequency.

Cell (0,0) contains the proportion of individuals exposed to neither schedule, that is, the people not reached by the consolidated schedule. It is calculated as shown in Equation 20.

The remaining cells of the matrix are calculated in the same way; as an example, Equation 21 shows the calculation of cell (1,0).

Cell (i,j) gives the proportion of individuals reached i times by the print media schedule and j times by the television schedule. Summing the cells along each diagonal (all cells with i + j = z) gives the proportion reached z times by the consolidated schedule. For example, Table 4 shows the diagonal that corresponds to frequency 4 of the consolidated distribution (0.0056 + 0.0284 + 0.0387 + 0.0349 + 0.0145 = 0.1221).
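The zero-frequency re-estimation, the matrix and the diagonal summation can be sketched together as follows. This is a hypothetical construction under stated assumptions, not a reproduction of Equations 18 to 21: it assumes the ratio is k = Dup_act / Dup_rdm, scales every cell in which both schedules are seen by k, and sets the recalculated zero frequencies to P*_{•,0} = 1 − k·Reach_tv and P*_{0,•} = 1 − k·Reach_pr so that each medium's marginal distribution is preserved. Under these assumptions cell (0,0) equals 1 − Reach_mix and the consolidated GRPs equal the sum of the two schedules' GRPs, as required by the conditions. The function name `consolidate` is hypothetical.

```python
from typing import List, Sequence

def consolidate(print_dist: Sequence[float], tv_dist: Sequence[float],
                reach_mix: float) -> List[float]:
    """Hypothetical consolidation; returns T(z) = sum of cells with i + j = z.

    print_dist[i] = P_{i,.}, tv_dist[j] = P_{.,j} (proportions, index = exposures).
    ASSUMPTIONS (not the paper's Equations 18-21): k = Dup_act / Dup_rdm,
    both-media cells are scaled by k, and the zero frequencies are recalculated
    as 1 - k * reach so that both marginal distributions are preserved.
    """
    reach_pr = 1.0 - print_dist[0]
    reach_tv = 1.0 - tv_dist[0]
    if reach_pr <= 0 or reach_tv <= 0:
        raise ValueError("both schedules must have positive reach")
    lower = max(reach_pr, reach_tv)
    upper = reach_pr + reach_tv - reach_pr * reach_tv
    if not lower - 1e-12 <= reach_mix <= upper + 1e-12:
        raise ValueError("reach_mix outside the admissible interval")

    dup_act = reach_pr + reach_tv - reach_mix     # actual duplication
    k = dup_act / (reach_pr * reach_tv)           # ratio (assumed form)
    tv_zero_star = 1.0 - k * reach_tv             # recalculated P*_{.,0}
    pr_zero_star = 1.0 - k * reach_pr             # recalculated P*_{0,.}

    n_i, n_j = len(print_dist), len(tv_dist)
    matrix = [[0.0] * n_j for _ in range(n_i)]
    matrix[0][0] = 1.0 - reach_mix                # reached by neither schedule
    for i in range(1, n_i):
        matrix[i][0] = print_dist[i] * tv_zero_star   # print only
    for j in range(1, n_j):
        matrix[0][j] = tv_dist[j] * pr_zero_star      # television only
    for i in range(1, n_i):
        for j in range(1, n_j):
            matrix[i][j] = k * print_dist[i] * tv_dist[j]  # both media

    # Sum each diagonal i + j = z to obtain the consolidated distribution T(z).
    total = [0.0] * (n_i + n_j - 1)
    for i in range(n_i):
        for j in range(n_j):
            total[i + j] += matrix[i][j]
    return total
```

With reach_mix at the independence bound the ratio is 1 and the result reduces to the convolution of the two distributions (random duplication); lower values of reach_mix raise k and move probability toward the cells where both schedules are seen. When the inputs have open-ended top classes (11+, 9+), the upper frequencies of T(z) should be read as approximate.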

Table 5 shows the frequency distribution obtained by consolidating the television and print media distributions. It produces a reach of 75.04% (100 − 24.96) and 247 GRPs, which coincides with the sum of the GRPs obtained by the television and print media schedules. Figure 3 shows a graph of the consolidated frequency distribution.

Table 6 summarizes different evaluations of combined schedules, comparing the proposed method with the results obtained by treating the data as a single source.

CONCLUSION

This model incorporates a methodology that allows consistent evaluations to be performed with diverse data sources, especially specialized media studies, because it uses only the final frequency distribution of each schedule.

Table 6 compares the results of this method with those obtained by processing the data as a single source.

The differences between the two are not statistically significant, which validates the model.

FOOTNOTES

1. Agostini, J.M. How to Estimate Unduplicated Audiences. JAR, March 1963.

2. Metheringham, R.A. Measuring the Net Cumulative Coverage of a Print Campaign. JAR, December 1964.

3. Leckenby, J.D. and M.D. Rice. The Declining Reach Phenomenon in Exposure Distribution Models. Journal of Advertising (15), 1986.

4. Metrex, TruCume and MetherPlus are a few examples of improved estimation models.

5. See Goodhart and Ehrenberg's 1969 papers.

6. Time Ibope provides the TVdata software that was used to carry out these evaluations.

7. This adjustment generally uses negative binomial probability distributions.

8. For example, TGI (Target Group Index) and EGM (Estudio General de Medios, “General Media Study”).

9. For more information, see Katz and Lancaster, Strategic Media Planning.

NOTES & EXHIBITS

EQUATION 1

FIGURE 4

EQUATION 2

EQUATION 3

EQUATION 4

EQUATION 5

EQUATION 6

EQUATION 7

EQUATION 8

FIGURE 1: SOLUTION FOR THE COMBINED REACH

EQUATION 9

EQUATION 10

EQUATION 11

FIGURE 2: RESULTS PRODUCED BY TVDATA AND PRINTPLAN SOFTWARE

TABLE 1: DISTRIBUTION OF TELEVISION AND PRINT MEDIA FREQUENCY

TABLE 2: SUMMARY OF THE RESULTS OF EVALUATIONS OF A TELEVISION SCHEDULE AND A PRINT MEDIA SCHEDULE

EQUATION 12

EQUATION 13

EQUATION 14

EQUATION 15

EQUATION 16

EQUATION 17

EQUATION 18

EQUATION 19

TABLE 3: FREQUENCY DISTRIBUTIONS OF TELEVISION AND PRINT MEDIA WITH RECALCULATED ZERO FREQUENCIES

TABLE 4: COMBINATION OF THE FREQUENCY DISTRIBUTIONS OF TELEVISION AND PRINT MEDIA

EQUATION 20

EQUATION 21

TABLE 5: FREQUENCY DISTRIBUTION AFTER CONSOLIDATING TELEVISION AND PRINT MEDIA DISTRIBUTIONS

FIGURE 3: CONSOLIDATED FREQUENCY DISTRIBUTION

TABLE 6: SUMMARY OF EVALUATIONS OF COMBINED SCHEDULES

© Copyright ESOMAR 2005

ESOMAR

Eurocenter 2, 11th floor, Barbara Strozzilaan 384, 1083 HN Amsterdam, The Netherlands

Tel: +31 20 664 2141, Fax: +31 20 664 2922
