GEOGRAPHIC INFORMATION SYSTEMS (GIS), PUBLIC HEALTH DATA, AND SYNDICATED MARKET RESEARCH DATA BASES IN HEALTH COMMUNICATION

William E. Pollard, Ph.D.
Susan D. Kirby, Dr.P.H.
Office of Communication
Centers for Disease Control and Prevention
1600 Clifton Rd., N.E., Mailstop D-42
Atlanta, GA 30333

OVERVIEW

In this presentation we focus on statistical aspects of some new directions in health communication planning at the Centers for Disease Control and Prevention (CDC). In particular, we discuss how the integration of syndicated market research data with GIS data and public health data, within the geographic structure of U.S. census data, provides a comprehensive framework for data-driven health communication planning.

Syndicated Market Research Data Bases

While public health researchers are likely to be familiar with the latter three types of data, marketing data bases require some discussion here. These data bases are widely used in the commercial sector to develop messages to promote products and services to potential customers. They contain proprietary and public information on sociodemographic characteristics, consumer behavior, lifestyle activities, and media habits of potential customers, and are available through licensing and contractual agreements. A primary use of such data is that of audience segmentation. The data bases are used to identify segments of the population to target with the message, and to identify audience segments that may differ in interests, lifestyle, and media habits in order to design messages with the appropriate content, design, and media channels (Meyers, 1996; Weinstein, 1994). In this presentation the use of these data for audience segmentation in health communication planning is examined.

Health Communication at the CDC

The CDC is “the nation’s prevention agency” with a mission of promoting health through scientific, data-based prevention policies and activities. Since the creation of the CDC in 1946 with a focus on control of malaria, typhus, and other communicable diseases, the scope of health issues addressed has expanded to include such problems as injuries, noncommunicable lifestyle illnesses, environmental health, occupational health, and AIDS and sexually transmitted disease. A key element of prevention in these areas is behavior change.

One approach to changing behavior is through communication, and the field of health communication draws from the fields of social psychology and other social sciences, health education, mass communication, and marketing for the crafting and delivery of messages for prevention and health promotion. The tie to marketing is even stronger within the subfield of social marketing, or prevention marketing, which adapts commercial marketing technologies to the analysis, planning, execution, and evaluation of programs designed to influence target audiences. As noted above, data-based audience analysis is one of these important technologies.

For purposes of communication, marketing data bases complement existing health data. The CDC excels in the collection and analysis of epidemiological data, but by itself this type of data does not provide communication planners with the information needed to analyze and understand audiences. Integration of syndicated marketing data with health data and other kinds of geographic data provides a foundation for a systematic, data-driven approach to health communication. An Office of Communication was created within the Office of the Director in 1996, and one of its functions is to work with CDC Centers, Institutes, and Offices (CIOs) and state partners in the use of audience data.

GEODEMOGRAPHIC LIFESTYLE SEGMENTATION

A widely-used framework for audience analysis is based on geodemographic segmentation of the population into distinct lifestyle segments or clusters. The segmentation involves grouping together small geographic units on the basis of demographic and other characteristics that they have in common. Underlying this is the notion that “birds of a feather flock together,” or more precisely, persons living in the same neighborhood tend to be more similar to each other on a variety of demographic and lifestyle characteristics than are randomly selected individuals from the U.S. population (Curry, 1993, p. 209). The clusters derived through geodemographic segmentation provide relatively homogeneous and distinctive lifestyle groupings for communication planning and targeting.

PRIZM Lifestyle Clusters

The segmentation system used here is the PRIZM© system (Claritas, Inc., 1994; Weiss, 1989, 1999) developed by Claritas, Inc., which is currently part of one of the largest marketing information services in the U.S. The system was developed in the 1970's and has been reconstructed with every decennial census, which provides the demographic foundation of the system.

The first step in the development of the current PRIZM system was factor analysis of 1990 census data from the Summary Tape Files STF1 and STF3 for the more than 226,000 block groups in the U.S. (Barrett, 1994; Lavin, 1996) to obtain factors that account for most of the variation among block groups. Using these factors, a two-stage cluster analysis was conducted. The first stage resulted in 15 social groups varying along five levels of urbanization and three levels of socioeconomic status (SES). A second stage of domain-dependent clustering within each of these social groups subdivided them further into subgroups or lifestyle clusters on the basis of various demographic factors. The cluster solution was then tested and refined with large public and proprietary data bases involving several hundred million records on consumer behaviors involving purchases, media use, consumer credit, and other lifestyle data.

The analysis yielded 62 clusters with distinct demographic and behavioral characteristics, with each of the 15 social groups containing between two and five clusters. Each cluster contains between 0.5% and 3% of the U.S. population. Every block group falls into one of these clusters. The larger geographic units of census tracts and ZIP codes can also be given a single cluster designation based on their overall demographic characteristics, although there is likely to be more heterogeneity than there is within block groups. Larger geographic units such as counties are typically too large and diverse to be given a single cluster designation. Claritas conducts a proprietary update of the census data each year and areas are assigned to clusters based on their current-year demographics.

Claritas has assigned copyrighted nicknames to the clusters, and while these are not literal descriptions, they do give some sense of cluster characteristics. Names such as “Kids & Cul-de-Sacs”, “Grain Belt”, “Blue Blood Estates”, and “Mines & Mills” call to mind rather different neighborhood types and lifestyles. Some of the key differentiating factors, along with the general urbanization and SES factors mentioned above, are the distributions and modal characteristics within the clusters of income levels, family life cycle stages, age, education, occupation, and housing types. The clustering concept does not imply that individuals and households within a cluster are identical; rather, clusters group together people and areas that are similar in many respects, and knowledge of the cluster membership of one’s audience adds information to better predict behavior (Curry, 1993, p. 209).

A wide variety of syndicated data, public and proprietary, is available coded by PRIZM cluster. This project uses Simmons Market Research Bureau data from their Study of Media and Markets (SMM), which annually measures product, service, and media usage of a sample of 20,000 individuals age 18 and older, with updates every six months. The data sets include: Lifestyles, Magazines, Media Usage, Financial Product Usage, Product Usage, and Television. In addition, census demographic data are summarized by cluster. Thus, for each cluster, data are available for hundreds of consumer behaviors and demographic characteristics.

PRIZM Clusters and Demographics

Although the clusters are derived from demographics, they permit more precise targeting than do broad demographic categories, such as “the elderly”, “rural poor”, “men age 55+”, etc., that are sometimes used in planning and targeting. High concentrations of these groups occur in several different clusters with very different lifestyles, and different communication strategies may be necessary. Likewise the cluster framework provides finer distinctions for targeting populations defined in terms of race and ethnicity. The concentration of Black population, for example, exceeds the national average in 14 of the clusters and this population group is predominant in five of these clusters. The concentration of Hispanic population exceeds the national average in 21 clusters and this population group is predominant in five of these. The clusters within each group range from urban to rural and from upper-middle income to low income. Recognition of these finer distinctions can be of value in more clearly defining the target audience(s) and the necessary communication approaches.

CLUSTERS AND DATA INTEGRATION

The geographic aspect of the clusters is the key to data integration. A cluster contains a portion of the population with common demographic and lifestyle characteristics. However, it is also an area: the collection of block groups containing that population. The clusters partition the U.S. just as the states do, except that the area comprising the cluster is not contiguous. This geographic basis provides the link for integrating a variety of data sets. Any data with geographic identifiers can be linked to other data with these identifiers and various types of communication-relevant information can be brought to bear on a particular problem.

Data Sets

In this project, four different kinds of data are linked together through geographic identifiers:

(1) U.S. Census data: Demographic information on population and households from STF1 and STF3 for all block groups and higher levels of geography;

(2) GIS data: Geographic boundary files, roads, and landmarks;

(3) Cluster data: Market research and demographic data summarized by cluster;

(4) Health data: Any health-related data with geographic identifiers.

The first three data sets are included with the COMPASS software that is licensed for analysis using the PRIZM segmentation system. The fourth is user data that can be imported for particular projects and might actually encompass several data sets. For certain communication issues, the first three data sets may be all that are necessary, as some examples below illustrate.

Use of the Data for Communication Planning

Together, these data sets enable one to address the following communication questions:

Who are the target audiences? The data can help define and prioritize audience segments to be reached.

What are they like? The data contain a wide variety of demographic and lifestyle information on audience segments and on populations in specific locations.

Where are they? Target audiences can be located down to the block group level, and the distribution of audiences and salient audience characteristics can be displayed in maps.

How can they be reached? The data provide information on the media preferences of target audiences for magazines, television, radio, and outdoor advertising.
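The linkage of data sets through geographic identifiers described under Clusters and Data Integration can be illustrated with a minimal sketch. It assumes ZIP codes as the shared identifier and uses invented toy values throughout; the names `zip_to_cluster`, `cases_by_zip`, `cluster_profiles`, `cases_by_cluster`, and `link_profiles` are hypothetical and are not part of COMPASS or any Claritas product.

```python
from collections import defaultdict

# Hypothetical toy inputs; none of these values come from the project data.
# Geographic identifier (ZIP code) -> cluster designation.
zip_to_cluster = {"00001": "cluster_A", "00002": "cluster_B", "00003": "cluster_A"}

# Health data with a geographic identifier: case counts by ZIP code.
cases_by_zip = {"00001": 12, "00002": 3, "00003": 5}

# Cluster data: market research summarized by cluster (invented values).
cluster_profiles = {
    "cluster_A": {"top_media": "daytime television"},
    "cluster_B": {"top_media": "radio"},
}


def cases_by_cluster(cases, geo_to_cluster):
    """Sum case counts over all areas sharing a cluster designation."""
    totals = defaultdict(int)
    for geo_id, count in cases.items():
        cluster = geo_to_cluster.get(geo_id)
        if cluster is not None:  # skip areas with no cluster assignment
            totals[cluster] += count
    return dict(totals)


def link_profiles(cluster_totals, profiles):
    """Attach each cluster's profile to its case total, largest total first."""
    ranked = sorted(cluster_totals.items(), key=lambda kv: kv[1], reverse=True)
    return [
        {"cluster": name, "cases": total, **profiles.get(name, {})}
        for name, total in ranked
    ]


linked = link_profiles(cases_by_cluster(cases_by_zip, zip_to_cluster), cluster_profiles)
```

In this sketch the health data never need to carry lifestyle or media variables themselves; the geographic identifier is the only field the data sets must share.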
In addition, the data provide product use and lifestyle information useful in identifying partner organizations for message dissemination to specific audiences.

For health communication purposes, any item associated with a public health issue can be used as a starting point and then linked to other information to address particular issues. Possible starting points can be geographic locations, demographic categories, marketing data, and health data. Some examples of each are described here.

1. Geographic Starting Point

This type of application involved the collaboration of the Agency for Toxic Substances and Disease Registry (ATSDR) and the Office of Communication (OC). ATSDR is the principal federal public health agency involved with hazardous waste issues, and conducts and funds studies of communities to examine the relationship between exposure to hazardous substances and disease. The agency also works directly with communities to understand their health concerns and to assist them in taking steps to deal with these concerns. Given geographically defined exposure areas, PRIZM clusters occurring in those areas were identified and a wide range of lifestyle and communication-related information was summarized for use in planning community meetings and educational efforts. The community PRIZM cluster analysis complements other geographic, epidemiological, and toxicological analyses of the communities.

2. Demographic Starting Point

This project involved the collaboration of the National Immunization Program (NIP) and OC on an immunization initiative for children in six cities, with a focus on media and public health partnerships in promoting immunization. The target population consisted of low SES groups with young children. In each of the cities, census tracts having low SES populations and high concentrations of young children were identified from census data. The predominant PRIZM clusters in those areas were then identified, and media preferences and lifestyle characteristics were profiled for each of the clusters. Maps showing the target areas and different clusters were also prepared. The PRIZM analysis thus provided further breakdowns of the population within the overall demographic definition of the target audience. This information was made available to working groups with representatives from state and local immunization programs, media organizations, and state public health information officers for the six cities.

3. Marketing Data Starting Point

Many of the audiences that the CDC targets for disease prevention messages report television as an important source of health information. Given the power of the entertainment media to reach mass audiences and convey information through characters and story lines, the Office of Communication is establishing links to the entertainment industry to provide technical assistance and education on important health topics. One project involves working with writers and producers of soap operas, and an upcoming meeting with these groups will involve CDC speakers and awards for specific programs. Viewing rates for TV daytime drama vary considerably over the different PRIZM clusters, ranging from 10% of adults at the low end to more than 35% at the high end. The PRIZM cluster composition of the audience for daytime drama in general, as well as specific shows, and soap opera magazines was examined, and information on the lifestyle and demographic characteristics of the audiences was compiled for work in this area. It was found that some of the clusters in these audiences were the same clusters identified in the immunization promotion project described above, as well as clusters that had been identified for targeting in analyses of other health issues.

4. Health Data Starting Point

In this section two projects involving the integration of epidemiological data with other data for communication planning are described.

HIV Status Awareness. This project involved a collaboration among the National Center for HIV, STD and TB Prevention (NCHSTP), state health departments from pilot states, and OC. The objective was to identify PRIZM clusters with highest concentrations of at-risk population and relate this to communication information for promotion of HIV status awareness. The study involved twelve metropolitan statistical areas (MSAs) with a total population of 27,156,391 persons age 13 and above in 1996. Counts of AIDS cases diagnosed in 1995 - 1997 by ZIP code of residence were obtained, with a total of 26,218 cases. The case counts were examined with respect to the PRIZM cluster designation of ZIP codes.

Some preliminary results for the total sample are shown in Figure 1. Here PRIZM clusters are ranked by number of cases. It can be seen that a majority of the cases tend to occur in a small number of PRIZM clusters and that these audience segments could be priorities for targeting to reach at-risk populations for promoting HIV status awareness.

The gains due to segmentation can be seen in the Lorenz curve shown in Figure 2. A Lorenz curve allows one to compare the rate of accumulation in two cumulative distributions (Curry, 1993; Harris, 1996). The clusters are ranked by number of cases as in Figure 1. The solid line shows the percentage of the intended audience that would be targeted by directing the message to increasing numbers of clusters. It shows that by focusing on the top ten clusters, for example, one would be targeting over two-thirds of the intended audience, projecting from the case data. The dotted line shows, for comparison, how much of the intended audience one would expect to target with a random selection of clusters, or the equivalent portions of the population. It can be seen that by prioritizing audience segments in this manner, one simultaneously increases targeting of the intended audience and reduces the size of the population to which the message needs to be communicated. Roughly speaking, the greater the bulge in the solid line, the greater the benefits of segmentation; Curry (1993) references a Gini coefficient for quantifying this when comparing segmentation plans.

Actually, the concentration of cases within PRIZM clusters is even greater than Figure 1 indicates. Some ZIP codes in urban areas have large populations and can contain a number of different clusters at the block group level. While these block group clusters may have certain characteristics in common and may be similar to, or in some cases the same as, the overall ZIP code cluster designation, it is possible that the cases might be concentrated within only certain kinds of block groups. Some aspects of the data seemed to suggest this: while for some clusters in the initial analysis all the ZIP codes had uniformly high incidence rates, for other clusters some ZIP codes had high rates and others had low rates. This latter observation suggested that variation in block group cluster composition among ZIP codes in the same cluster might be a cause. The number of block group households of each cluster type within ZIP codes was available in the data base. Least squares was used to examine the relationship between ZIP code case counts and block group cluster composition by regressing the case count on the cluster household counts. The regression coefficients obtained are rates per household, and when these are multiplied by household counts, the product gives some indication of the extent to which block group clusters contribute to the overall ZIP code counts. The analysis indicated that five of the PRIZM clusters accounted for 63% of the cases, while at the same time these clusters contained only 15% of the population. The conclusion regarding the five primary PRIZM clusters to target was robust with respect to modifications and transformations to deal with outliers, influential cases, and heterogeneity of variance. Separate profiles of audience characteristics were prepared for each of the five clusters for use in communication planning. Recruitment from these clusters for focus groups for validating audience characteristics and preliminary message testing is now being planned.

Hantavirus Prevention. This project involved collaboration of the National Center for Infectious Diseases (NCID) and OC. The data here consisted of 164 hantavirus cases diagnosed in the U.S. from 1993 - 1997 and the ZIP code of residence. It can be seen in Figure 3 that cases here also tend to be concentrated in a small number of PRIZM clusters. Ten clusters contain 73% of the cases, but only 18% of the population, and would be priority audience segments to target.

CONCLUSION

Use of the integrated data sets discussed here brings together a wide variety of data for communication planning. The geographic links among the data sets allow information about populations, locations, behaviors, and communication factors to be simultaneously brought to bear on a particular problem. As Curry writes, “These links constitute the true power of these systems: they expand our knowledge about a cluster and relate this knowledge to other pieces of information. Links make a system ‘smart;’ they take seemingly unrelated facts and cross-reference them with others related to the same problem” (1993, p. 259).

The use of syndicated marketing data bases here adds a critical component for health communication. The data bases contain a wide range of information for understanding audiences. They are readily available in off-the-shelf form, and are very timely and cost effective considering what would be involved in collecting these data. The availability through syndication has the further advantage of reducing the need for government collection of personal data of this nature. Finally, combining the marketing data with other types of data in the integrated GIS framework discussed here allows one to get more out of existing geographic, demographic, and health data in communication planning.

REFERENCES

Barrett, R. E. (1994). Using the 1990 U.S. Census for Research. Thousand Oaks, CA: Sage Publications, Inc.

Claritas, Inc. (1994). PRIZM Methodology. Claritas, Inc.

Curry, D. J. (1993). The New Marketing Research Systems: How to Use Strategic Database Information for Better Marketing Decisions. New York: John Wiley & Sons, Inc.

Harris, R. L. (1996). Information Graphics: A Comprehensive Illustrated Reference. Visual Tools for Analyzing, Managing, and Communicating. Atlanta, GA: Management Graphics.

Lavin, M. R. (1996). Understanding the Census: A Guide for Marketers, Planners, Grant Writers and Other Data Base Users. Kenmore, NY: Epoch Books, Inc.

Meyers, J. H. (1996). Segmentation and Positioning for Strategic Marketing Decisions. Chicago, IL: American Marketing Association.

Weinstein, A. (1994). Market Segmentation: Using Demographics, Psychographics and Other Niche Marketing Techniques to Predict Consumer Behavior (Rev. ed.). Chicago, IL: Probus Publishing Co.

Weiss, M. J. (1989). The Clustering of America. New York: Harper & Row, Publishers, Inc.

Weiss, M. J. (1999). The Clustered World: A Guide to Lifestyles in America and Beyond. New York: Little Brown, Inc.

____________________________________

Dr. Pollard is a Health Communication Research Fellow. He serves as a market research data base specialist and statistical analyst, advising CIOs and state partners in the use of integrated data sets for public health communication.

Dr. Kirby is a Senior Health Communication Specialist and is a senior advisor in social marketing research and health communication planning. She is the primary contact for CIOs and state partners for using data bases for audience segmentation.
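As a supplement to the Lorenz curve comparison described under HIV Status Awareness, the following minimal sketch shows one way such a curve and a Gini-style summary could be computed. It makes several assumptions not stated in the paper: clusters are ranked by number of cases, the horizontal axis is cumulative population share, and the summary index is twice the area between the curve and the diagonal. The index used by Curry (1993) may be defined differently. The function names and the toy data are invented for illustration.

```python
def lorenz_points(clusters):
    """Return cumulative (population share, case share) points.

    `clusters` is a list of (cases, population) pairs, one per cluster.
    Clusters are ranked by number of cases, largest first.
    """
    ranked = sorted(clusters, key=lambda c: c[0], reverse=True)
    total_cases = sum(c for c, _ in ranked)
    total_pop = sum(p for _, p in ranked)
    points = [(0.0, 0.0)]
    cum_cases = cum_pop = 0
    for cases, pop in ranked:
        cum_cases += cases
        cum_pop += pop
        points.append((cum_pop / total_pop, cum_cases / total_cases))
    return points


def concentration_index(points):
    """Twice the area between the curve and the diagonal (trapezoid rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return 2.0 * area - 1.0


# Invented toy data: (cases, population) for four hypothetical clusters.
toy = [(60, 100), (25, 200), (10, 300), (5, 400)]
curve = lorenz_points(toy)
index = concentration_index(curve)
```

An index near zero corresponds to the dotted comparison line (cases spread in proportion to population), while larger values correspond to a greater bulge in the solid line.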
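The least squares step described under HIV Status Awareness, in which ZIP code case counts are regressed on block group cluster household counts, can likewise be sketched in a few lines. This is an illustration under stated assumptions, not the analysis actually run: it assumes a model with no intercept (so that each coefficient reads as a rate per household), solves the normal equations directly, and uses invented toy data constructed from rates of 0.02 and 0.001. All function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def least_squares(households, cases):
    """Fit cases ~ sum_k rate_k * households_k with no intercept (assumed).

    households: one row per ZIP code, one column per block group cluster type.
    Returns the estimated rate per household for each cluster type.
    """
    k = len(households[0])
    xtx = [[sum(row[i] * row[j] for row in households) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(households, cases)) for i in range(k)]
    return solve_linear(xtx, xty)


# Invented toy data: 4 ZIP codes, 2 cluster types.
households = [[100, 1000], [500, 200], [0, 3000], [800, 0]]
cases = [3.0, 10.2, 3.0, 16.0]
rates = least_squares(households, cases)

# Estimated contribution of each cluster type to each ZIP code's case count.
contributions = [[r * h for r, h in zip(rates, row)] for row in households]
```

Multiplying each estimated rate by the household counts, as in the last line, gives the per-cluster contributions to each ZIP code count that the text describes.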