GEOGRAPHIC INFORMATION SYSTEMS (GIS), PUBLIC HEALTH DATA, AND
SYNDICATED MARKET RESEARCH DATA BASES IN HEALTH
COMMUNICATION
William E. Pollard, Ph.D.
Susan D. Kirby, Dr.P.H.
Office of Communication
Centers for Disease Control and Prevention
1600 Clifton Rd., N.E., Mailstop D-42
Atlanta, GA 30333
OVERVIEW

In this presentation we focus on
statistical aspects of some new directions in
health communication planning at the Centers
for Disease Control and Prevention. In
particular, we discuss how the integration of
syndicated market research data with GIS
data and public health data, within the
geographic structure of U.S. census data,
provides a comprehensive framework for
data-driven health communication planning.

Syndicated Market Research Data Bases

While public health researchers are
likely to be familiar with the latter three types
of data, marketing data bases require some
discussion here. These data bases are widely
used in the commercial sector to develop
messages to promote products and services to
potential customers. They contain proprietary
and public information on sociodemographic
characteristics, consumer behavior, lifestyle
activities, and media habits of potential
customers, and are available through licensing
and contractual agreements. A primary use of
such data is that of audience segmentation.
The data bases are used to identify segments
of the population to target with the message,
and to identify audience segments that may
differ in interests, lifestyle, and media habits
in order to design messages with the
appropriate content, design, and media
channels (Meyers, 1996; Weinstein, 1994).
In this presentation the use of this data for
audience segmentation in health
communication planning is examined.

Health Communication at the CDC

The CDC is “the nation’s prevention
agency” with a mission of promoting health
through scientific, data-based prevention
policies and activities. Since the creation of
the CDC in 1946 with a focus on control of
malaria, typhus, and other communicable
diseases, the scope of health issues
addressed has expanded to include such
problems as injuries, noncommunicable
lifestyle illnesses, environmental health,
occupational health, and AIDS and sexually
transmitted disease. A key element of
prevention in these areas is behavior change.

One approach to changing behavior
is through communication, and the field of
health communication draws from the fields
of social psychology and other social
sciences, health education, mass
communication, and marketing for the
crafting and delivery of messages for
prevention and health promotion. The tie to
marketing is even stronger within the subfield
of social marketing, or prevention marketing,
which adapts commercial marketing
technologies to the analysis, planning,
execution, and evaluation of programs
designed to influence target audiences. As
noted above, data-based audience analysis is
one of these important technologies.

For purposes of communication,
marketing data bases complement existing
health data. The CDC excels in the collection
and analysis of epidemiological data, but by
itself this type of data does not provide
communication planners with the information
needed to analyze and understand audiences.
Integration of syndicated marketing data with
health data and other kinds of geographic data
provides a foundation for a systematic,
data-driven approach to health communication.
An Office of Communication was created
within the Office of the Director in 1996, and
one of its functions is to work with CDC
Centers, Institutes, and Offices (CIOs) and
state partners in the use of audience data.

GEODEMOGRAPHIC LIFESTYLE
SEGMENTATION

A widely used framework for
audience analysis is based on
geodemographic segmentation of the
population into distinct lifestyle segments or
clusters. The segmentation involves grouping
together small geographic units on the basis
of demographic and other characteristics that
they have in common. Underlying this is the
notion that “birds of a feather flock together,”
or more precisely, persons living in the same
neighborhood tend to be more similar to each
other on a variety of demographic and
lifestyle characteristics than are randomly
selected individuals from the U.S. population
(Curry, 1993, p. 209). The clusters derived
through geodemographic segmentation
provide relatively homogeneous and
distinctive lifestyle groupings for
communication planning and targeting.

PRIZM Lifestyle Clusters

The segmentation system used here
is the PRIZM© system (Claritas, Inc., 1994;
Weiss, 1989, 1999), developed by Claritas,
Inc., which is currently part of one of the
largest marketing information services in the
U.S. The system was developed in the
1970s and has been reconstructed with
every decennial census, which provides the
demographic foundation of the system.

The first step in the development of
the current PRIZM system was factor
analysis of 1990 census data from the
Summary Tape Files STF1 and STF3 for the
more than 226,000 block groups in the U.S.
(Barrett, 1994; Lavin, 1996) to obtain factors
that account for most of the variation among
block groups. Using these factors, a
two-stage cluster analysis was conducted. The
first stage resulted in 15 social groups
varying along five levels of urbanization and
three levels of socioeconomic status. A
second stage of domain-dependent
clustering within each of these social groups
subdivided them further into subgroups or
lifestyle clusters on the basis of various
demographic factors. The cluster solution
was then tested and refined with large public
and proprietary data bases involving several
hundred million records on consumer
behaviors involving purchases, media use,
consumer credit, and other lifestyle data.
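
The two-stage idea described above can be illustrated with a minimal sketch. It is not the proprietary Claritas procedure: it assumes the factor scores have already been computed, that each block group already carries a hypothetical urbanization level (0-4) and SES level (0-2), and it substitutes plain k-means for the unspecified second-stage clustering. All names and values are invented for illustration.

```python
import random
from collections import defaultdict

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on a list of equal-length tuples; returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Recompute each center as the mean of its members; keep it if it has none.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def two_stage_clusters(block_groups, k_within=2):
    """Return {block group id: (urban level, SES level, subcluster label)}."""
    # Stage 1 (assumed): social group = urbanization x SES, at most 5 x 3 = 15 groups.
    groups = defaultdict(list)
    for bg in block_groups:
        groups[(bg["urban"], bg["ses"])].append(bg)
    # Stage 2 (assumed): subdivide each social group on its factor scores.
    assignment = {}
    for key, members in sorted(groups.items()):
        k = min(k_within, len(members))
        labels = kmeans([bg["factors"] for bg in members], k)
        for bg, lab in zip(members, labels):
            assignment[bg["id"]] = (key[0], key[1], lab)
    return assignment

# Hypothetical block groups with made-up factor scores.
demo = [
    {"id": "bg1", "urban": 0, "ses": 2, "factors": (0.1, 0.2)},
    {"id": "bg2", "urban": 0, "ses": 2, "factors": (0.0, 0.3)},
    {"id": "bg3", "urban": 0, "ses": 2, "factors": (5.0, 5.1)},
    {"id": "bg4", "urban": 0, "ses": 2, "factors": (5.2, 4.9)},
    {"id": "bg5", "urban": 4, "ses": 0, "factors": (1.0, 1.0)},
]
result = two_stage_clusters(demo)
```

Clustering within social groups, rather than over all block groups at once, keeps every lifestyle cluster inside a single urbanization and SES stratum, which is consistent with the nested structure the text describes.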
The analysis yielded 62 clusters with
distinct demographic and behavioral
characteristics, with each of the 15 social
groups containing between two and five
clusters. Each cluster contains between 0.5%
and 3% of the U.S. population. Every block
group falls into one of these clusters. The
larger geographic units of census tracts and
ZIP codes can also be given a single cluster
designation based on their overall
demographic characteristics, although there is
likely to be more heterogeneity than there is
within block groups. Larger geographic units
such as counties are typically too large and
diverse to be given a single cluster
designation. Claritas conducts a proprietary
update of the census data each year and areas
are assigned to clusters based on their
current-year demographics.

Claritas has assigned copyrighted
nicknames to the clusters, and while not being
literal descriptions, they do give some sense
of cluster characteristics. Names such as
“Kids & Cul-de-Sacs”, “Grain Belt”, “Blue
Blood Estates”, and “Mines & Mills” call to
mind rather different neighborhood types and
lifestyles. Some of the key differentiating
factors, along with the general urbanization
and SES factors mentioned above, are the
distributions and modal characteristics within
the clusters of income levels, family life cycle
stages, age, education, occupation, and
housing types. The clustering concept does
not imply that individuals and households
within a cluster are identical; rather, clusters
group together people and areas that are
similar in many respects, and knowledge of
the cluster membership of one’s audience
adds information to better predict behavior
(Curry, 1993, p. 209).

A wide variety of syndicated data,
public and proprietary, is available coded by
PRIZM cluster. This project uses Simmons
Market Research Bureau data from their
Study of Media and Markets (SMM) which
annually measures product, service, and
media usage of a sample of 20,000
individuals age 18 and older, with updates
every six months. The data sets include:
Lifestyles, Magazines, Media Usage,
Financial Product Usage, Product Usage,
and Television. In addition, census
demographic data are summarized by
cluster. Thus, for each cluster, data are
available for hundreds of consumer
behaviors and demographic characteristics.

PRIZM Clusters and Demographics

Although the clusters are derived
from demographics, they permit more
precise targeting than do broad demographic
categories, such as “the elderly”, “rural
poor”, “men age 55+”, etc., that are
sometimes used in planning and targeting.
High concentrations of these groups occur in
several different clusters with very different
lifestyles, and different communication
strategies may be necessary. Likewise the
cluster framework provides finer distinctions
for targeting populations defined in terms of
race and ethnicity. The concentration of
Black population, for example, exceeds the
national average in 14 of the clusters and this
population group is predominant in five of
these clusters. The concentration of
Hispanic population exceeds the national
average in 21 clusters and this population
group is predominant in five of these. The
clusters within each group range from urban
to rural and from upper-middle income to low
income. Recognition of these finer
distinctions can be of value in more clearly
defining the target audience(s) and the
necessary communication approaches.

CLUSTERS AND DATA
INTEGRATION

The geographic aspect of the clusters
is the key to data integration. A cluster
contains a portion of the population with
common demographic and lifestyle
characteristics. However, it is also an area:
the collection of block groups containing that
population. The clusters partition the U.S.
just as the states do, except that the area
comprising the cluster is not contiguous. This
geographic basis provides the link for
integrating a variety of data sets. Any data
with geographic identifiers can be linked to
other data with these identifiers and various
types of communication-relevant information
can be brought to bear on a particular
problem.

Data Sets

In this project, four different kinds of
data are linked together through geographic
identifiers:

(1) U.S. Census data: Demographic
information on population and
households from STF1 and STF3 for
all block groups and higher levels of
geography;

(2) GIS data: Geographic boundary
files, roads, and landmarks;

(3) Cluster data: Market research and
demographic data summarized by
cluster;

(4) Health data: Any health-related
data with geographic identifiers.

The first three data sets are included with the
COMPASS software that is licensed for
analysis using the PRIZM segmentation
system. The fourth is user data that can be
imported for particular projects and might
actually encompass several data sets. For
certain communication issues, the first three
data sets may be all that are necessary, as
some examples below illustrate.

Use of the Data for Communication
Planning

Together, these data sets enable one
to address the following communication
questions:

Who are the target audiences? The data can
help define and prioritize audience segments
to be reached.

What are they like? The data contain a wide
variety of demographic and lifestyle
information on audience segments and on
populations in specific locations.

Where are they? Target audiences can be
located down to the block group level, and
the distribution of audiences and salient
audience characteristics can be displayed in
maps.

How can they be reached? The data provide
information on the media preferences of
target audiences for magazines, television,
radio, and outdoor advertising. In addition, it
provides product use and lifestyle information
useful in identifying partner organizations for
message dissemination to specific audiences.

For health communication purposes,
any item associated with a public health issue
can be used as a starting point and then linked
to other information to address particular
issues. Possible starting points can be
geographic locations, demographic
categories, marketing data, and health data.
Some examples of each are described here.

1. Geographic Starting Point

This type of application involved the
collaboration of the Agency for Toxic
Substances and Disease Registry (ATSDR)
and the Office of Communication (OC).
ATSDR is the principal federal public health
agency involved with hazardous waste issues,
and conducts and funds studies of
communities to examine the relationship
between exposure to hazardous substances
and disease. The agency also works directly
with communities to understand their health
concerns and to assist them in taking steps to
deal with these concerns. Given
geographically defined exposure areas,
PRIZM clusters occurring in those areas were
identified and a wide range of lifestyle and
communication-related information was
summarized for use in planning community
meetings and educational efforts. The
community PRIZM cluster analysis
complements other geographic,
epidemiological, and toxicological analyses of
the communities.

2. Demographic Starting Point

This project involved the collaboration
of the National Immunization Program (NIP)
and OC on an immunization initiative for
children in six cities, with a focus on media
and public health partnerships in promoting
immunization. The target population
consisted of low SES groups with young
children. In each of the cities, census tracts
having low SES populations and high
concentrations of young children were
identified from census data. The
predominant PRIZM clusters in those areas
were then identified, and media preferences
and lifestyle characteristics were profiled for
each of the clusters. Maps showing the
target areas and different clusters were also
prepared. The PRIZM analysis thus
provided further breakdowns of the
population within the overall demographic
definition of the target audience. This
information was made available to working
groups with representatives from state and
local immunization programs, media
organizations, and state public health
information officers for the six cities.

3. Marketing Data Starting Point

Many of the audiences that the CDC
targets for disease prevention messages
report television as an important source of
health information. Given the power of the
entertainment media to reach mass audiences
and convey information through characters
and story lines, the Office of Communication
is establishing links to the entertainment
industry to provide technical assistance and
education on important health topics. One
project involves working with writers and
producers of soap operas, and an upcoming
meeting with these groups will involve CDC
speakers and awards for specific programs.
Viewing rates for TV daytime drama vary
considerably over the different PRIZM
clusters, ranging from 10% of adults at the
low end to more than 35% at the high end.
The PRIZM cluster composition of the
audience for daytime drama in general, as
well as specific shows, and soap opera
magazines was examined, and information on
the lifestyle and demographic characteristics
of the audiences was compiled for work in
this area. It was found that some of the
clusters in these audiences were the same
clusters identified in the immunization
promotion project described above, as well as
clusters that had been identified for targeting
in analyses of other health issues.
4. Health Data Starting Point

In this section two projects involving
the integration of epidemiological data with
other data for communication planning are
described.

HIV Status Awareness. This project
involved a collaboration among the National
Center for HIV, STD and TB Prevention
(NCHSTP), state health departments from
pilot states, and OC. The objective was to
identify PRIZM clusters with highest
concentrations of at-risk population and relate
this to communication information for
promotion of HIV status awareness. The
study involved twelve metropolitan statistical
areas (MSAs) with a total population of
27,156,391 persons age 13 and above in
1996. Counts of AIDS cases diagnosed in
1995-1997 by ZIP codes of residence were
obtained, with a total of 26,218 cases. The
case counts were examined with respect to
the PRIZM cluster designation of ZIP codes.
Some preliminary results for the total sample
are shown in Figure 1. Here PRIZM clusters
are ranked by number of cases. It can be seen
that a majority of the cases tend to occur in a
small number of PRIZM clusters and that
these audience segments could be priorities
for targeting to reach at-risk populations for
promoting HIV status awareness.

The gains due to segmentation can be
seen in the Lorenz curve shown in Figure 2.
A Lorenz curve allows one to compare the
rate of accumulation in two cumulative
distributions (Curry, 1993; Harris, 1996).
The clusters are ranked by number of cases
as in Figure 1. The solid line shows the
percentage of the intended audience that
would be targeted by directing the message
to increasing numbers of clusters. It shows
that by focusing on the top ten clusters, for
example, one would be targeting over
two-thirds of the intended audience, projecting
from the information from the case data.
The dotted line shows, for comparison, how
much of the intended audience one
would expect to target with a random
selection of clusters, or the equivalent
portions of the population. It can be seen
that with prioritizing audience segments in
this manner, one simultaneously increases
targeting of the intended audience, while
reducing the size of the population to which
the message needs to be communicated.
Roughly speaking, the greater the bulge in
the solid line, the greater the benefits of
segmentation; Curry (1993) references a
Gini coefficient for quantifying this when
comparing segmentation plans.
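
The Lorenz curve comparison in the HIV status awareness analysis can be sketched as follows. This is an illustration only: the counts are invented, the horizontal axis is assumed to be cumulative population share (the "equivalent portions of the population" reading), and the summary statistic is a generic Gini-style area measure, not necessarily the exact coefficient Curry (1993) describes.

```python
def lorenz_points(cases, population):
    """Cumulative (population share, case share), clusters ranked by case count."""
    order = sorted(range(len(cases)), key=lambda i: cases[i], reverse=True)
    total_cases, total_pop = sum(cases), sum(population)
    points, cum_cases, cum_pop = [(0.0, 0.0)], 0, 0
    for i in order:
        cum_cases += cases[i]
        cum_pop += population[i]
        points.append((cum_pop / total_pop, cum_cases / total_cases))
    return points

def area_gain(points):
    """Twice the area between the curve and the diagonal (0 means no gain)."""
    # Trapezoid rule over consecutive points of the curve.
    area = sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return 2 * area - 1

# Hypothetical case counts and populations for three clusters.
pts = lorenz_points(cases=[60, 30, 10], population=[10, 30, 60])
gain = area_gain(pts)
```

With these made-up numbers the first cluster holds 10% of the population but 60% of the cases, so the curve rises steeply and the area measure is well above zero; equal case rates in every cluster would put the curve on the diagonal and give zero.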
Actually, the concentration of cases
within PRIZM clusters is even greater than
Figure 1 indicates. Some ZIP codes in urban
areas have large populations and can contain a
number of different clusters at the block
group level. While these block group clusters
may have certain characteristics in common
and may be similar to, or in some cases the
same as, the overall ZIP code cluster
designation, it is possible that the cases might
be concentrated within only certain kinds of
block groups. Some aspects of the data
seemed to suggest this: while for some
clusters in the initial analysis all the ZIP
codes had uniformly high incidence rates, for
other clusters some ZIP codes had high rates
and others had low rates. This latter
observation suggested that variation in block
group cluster composition among ZIP codes
in the same cluster might be a possible cause.
The number of block group
households of each cluster type within ZIP
codes was available in the data base. Least
squares was used to examine the relationship
between ZIP code case counts and block
group cluster composition by regressing the
case count on the cluster household counts.
The regression coefficients obtained are rates
per household, and when these are multiplied
by household counts, the product gives some
indication of the extent to which block group
clusters contribute to the overall ZIP code
counts. The analysis indicated that five of
the PRIZM clusters accounted for 63% of
the cases, while at the same time these
clusters contained only 15% of the
population. The conclusion regarding the
five primary PRIZM clusters to target was
robust with respect to modifications and
transformations to deal with outliers,
influential cases, and heterogeneity of
variance.

Separate profiles of audience
characteristics were prepared for each of the
five clusters for use in communication
planning. Recruitment from these clusters
for focus groups for validating audience
characteristics and preliminary message
testing is now being planned.
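
The least-squares step in the HIV analysis, regressing ZIP code case counts on the household counts of each block group cluster type, can be sketched as below. The sketch assumes a model without an intercept (so that coefficients read directly as cases per household), solves the normal equations with Gauss-Jordan elimination, and uses invented data; the paper does not state these details, and none of the robustness checks it mentions are reproduced here.

```python
def least_squares(X, y):
    """Coefficients b minimizing ||Xb - y||^2 (no intercept), via normal equations."""
    n = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(n)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

# Hypothetical data: rows are ZIP codes, columns are households of two cluster types.
households = [[1000, 0], [0, 2000], [500, 500], [2000, 1000]]
case_counts = [10, 4, 6, 22]

rates = least_squares(households, case_counts)          # cases per household
# Estimated contribution of each cluster type to the third ZIP code's count.
contrib = [r * h for r, h in zip(rates, households[2])]
```

In this toy example the third ZIP code has equal numbers of households of each type, yet most of its estimated cases come from the first type, which is the kind of within-ZIP concentration the analysis was looking for.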
Hantavirus Prevention. This
project involved collaboration of the National
Center for Infectious Diseases (NCID) and
OC. The data here consisted of 164
hantavirus cases diagnosed in the U.S. from
1993-1997 and the ZIP code of residence.
It can be seen in Figure 3 that cases here also
tend to be concentrated in a small number of
PRIZM clusters. Ten clusters contain 73%
of the cases, but only 18% of the population,
and would be priority audience segments to
target.
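
The linkage used in both health data projects, from case records to clusters by way of ZIP code of residence, followed by ranking clusters by case count, can be sketched as below. The ZIP codes, cluster labels, and populations are hypothetical, and the function name is ours; a real analysis would draw the ZIP-to-cluster designation from the licensed cluster data.

```python
from collections import Counter

def cluster_concentration(case_zips, zip_to_cluster, cluster_population, top_n):
    """Top clusters by case count, with their share of cases and of population."""
    # Link each case to a cluster through its ZIP code of residence.
    cases = Counter(zip_to_cluster[z] for z in case_zips if z in zip_to_cluster)
    ranked = [cluster for cluster, _ in cases.most_common(top_n)]
    case_share = sum(cases[c] for c in ranked) / sum(cases.values())
    pop_share = (sum(cluster_population[c] for c in ranked)
                 / sum(cluster_population.values()))
    return ranked, case_share, pop_share

# Hypothetical lookup tables and case records.
zip_to_cluster = {"00001": "A", "00002": "A", "00003": "B", "00004": "C"}
cluster_population = {"A": 100, "B": 300, "C": 600}
case_zips = ["00001", "00001", "00002", "00003", "00004", "00001"]

ranked, case_share, pop_share = cluster_concentration(
    case_zips, zip_to_cluster, cluster_population, top_n=1)
```

Dividing the case share by the population share gives a rough concentration ratio; for the hantavirus figures reported above, 0.73 / 0.18 is about 4, meaning the ten priority clusters hold roughly four times as many cases as their population share alone would suggest.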
CONCLUSION

Use of the integrated data sets
discussed here brings together a wide variety
of data for communication planning. The
geographic links among the data sets allow
information about populations, locations,
behaviors, and communication factors to be
simultaneously brought to bear on a particular
problem. As Curry writes, “These links
constitute the true power of these systems:
they expand our knowledge about a cluster
and relate this knowledge to other pieces of
information. Links make a system ‘smart;’
they take seemingly unrelated facts and
cross-reference them with others related to the same
problem” (1993, p. 259).

The use of syndicated marketing data
bases here adds a critical component for
health communication. The data bases
contain a wide range of information for
understanding audiences. They are readily
available in off-the-shelf form, and are very
timely and cost effective considering what
would be involved to collect this data. The
availability through syndication has the
further advantage of reducing the need for
government collection of personal data of this
nature. Finally, combining the marketing data
with other types of data in the integrated GIS
framework discussed here allows one to get
more out of existing geographic,
demographic, and health data in
communication planning.

REFERENCES

Barrett, R. E. (1994). Using the 1990
U.S. Census for Research. Thousand Oaks,
CA: Sage Publications, Inc.

Claritas, Inc. (1994). PRIZM
Methodology: Claritas, Inc.

Curry, D. J. (1993). The New
Marketing Research Systems: How to Use
Strategic Database Information for Better
Marketing Decisions. New York: John Wiley
& Sons, Inc.

Harris, R. L. (1996). Information
Graphics: A Comprehensive Illustrated
Reference. Visual Tools for Analyzing,
Managing, and Communicating. Atlanta,
GA: Management Graphics.

Lavin, M. R. (1996). Understanding
the Census: A Guide for Marketers,
Planners, Grant Writers and Other Data Base
Users. Kenmore, NY: Epoch Books, Inc.

Meyers, J. H. (1996). Segmentation
and Positioning for Strategic Marketing
Decisions. Chicago, IL: American Marketing
Association.

Weinstein, A. (1994). Market
Segmentation: Using Demographics,
Psychographics and Other Niche Marketing
Techniques to Predict Consumer Behavior
(Rev. ed.). Chicago, IL: Probus
Publishing Co.

Weiss, M. J. (1989). The Clustering
of America. New York: Harper & Row,
Publishers, Inc.

Weiss, M. J. (1999). The Clustered
World: A Guide to Lifestyles in America
and Beyond. New York: Little Brown, Inc.

____________________________________

Dr. Pollard is a Health Communication
Research Fellow. He serves as a market
research data base specialist and statistical
analyst, advising CIOs and state partners in
the use of integrated data sets for public
health communication.

Dr. Kirby is a Senior Health Communication
Specialist and is a senior advisor in social
marketing research and health communication
planning. She is the primary contact for
CIOs and state partners for using data bases
for audience segmentation.