External Data Sources
2008 CAS Ratemaking Seminar
March 17-18, 2008
John Stenmark
Consulting Actuary
Actuarial Data Management Services
Overview
Why use (Zip Code based) Demographic Data
Zip Code vs. Zip Code Tabulation Area Issues
Some Possible Methodologies to Address those Issues
Census.gov Data Guide
Cartographic Boundary File Guide

Why use (Zip Code based) Demographic Data
Predictive Modeling allows/encourages the use of data outside of the rating variables and, in fact, outside of the company.
The first external data that a company is likely to use is Demographic Data, usually by Zip Code.
Predictive Modeling has two phases: the Modeling itself (usually frequency and severity based) and the derivation of rates and relativities using the modeled data (frequency and severity combined into modeled pure premium).
Demographic data is used in the Modeling Phase.
It is especially useful in a multi-state database.
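As a small illustration of how the two modeled components combine in the second phase, the sketch below multiplies a modeled frequency by a modeled severity to get a modeled pure premium. The function name and the numbers are hypothetical and are not taken from the presentation.

```python
def modeled_pure_premium(frequency, severity):
    """Combine modeled frequency and severity into a modeled pure premium.

    frequency: expected claims per unit of exposure
    severity:  expected cost per claim
    """
    return frequency * severity


# Illustrative values only (not real experience data).
example = modeled_pure_premium(frequency=0.05, severity=4000.0)
print(round(example, 2))
```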
Zip Code vs. Zip Code Tabulation Area Issues
The Problem
Postal Zip Codes are not defined by boundaries, but by postal routes.
ZCTAs (as used in this presentation) were defined as boundaries by the U.S. Census Bureau during the 2000 census, based at least partially on the U.S. Postal Zip Codes at that time, and have not been changed since.
The Postal Zip Codes in your data, by contrast, change quite often.
Over time the postal Zip Codes of your insureds (and the territory boundaries defined from those codes) will increasingly diverge from the ZCTAs.
Therefore the Zip Code for a particular policy or claim may not have demographic data associated with it.
The Solution(s): County Fill, Then ZCTA Match
Assign Derived Demographic Data elements by County (filling the entire database).
Then assign data elements by ZCTA where there is a match.
Disadvantage: Slight inaccuracy where the Postal Zip is in a different geographic area from the ZCTA.
Disadvantage: Precision is inconsistent (county demographics for some insureds and Zip demographics for others).
Advantage: Easy to apply.
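The county-then-ZCTA assignment can be sketched in a few lines. This is only an illustration under assumed data layouts: the field names (`zip`, `county`, `median_income`) and all values are hypothetical, and a real implementation would run against the company database rather than Python dictionaries.

```python
def assign_demographics(policies, county_data, zcta_data):
    """Attach a demographic value to each policy: county first, ZCTA where matched."""
    results = []
    for policy in policies:
        # Step 1: fill from the county so that every record gets a value.
        value = county_data.get(policy["county"])
        source = "county"
        # Step 2: overwrite with the ZCTA value when the postal Zip matches a ZCTA.
        if policy["zip"] in zcta_data:
            value = zcta_data[policy["zip"]]
            source = "zcta"
        results.append({**policy, "median_income": value, "source": source})
    return results


# Hypothetical toy inputs (not real Census figures).
county_data = {"28049": 41000}
zcta_data = {"39201": 28000}
policies = [
    {"policy_id": 1, "zip": "39201", "county": "28049"},
    {"policy_id": 2, "zip": "39299", "county": "28049"},  # Zip with no matching ZCTA
]
assigned = assign_demographics(policies, county_data, zcta_data)
```

Keeping a `source` flag on each record makes the inconsistent precision visible, so a model can later distinguish county-level values from ZCTA-level ones.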
The Solution(s): Geocoding
Geocode the company data.
Assign each policy/claim to a ZCTA using U.S. Census Bureau boundary files.
Assign ZCTA demographic data to the policy/claim.
Disadvantage: More complex and time consuming (resource intensive).
Advantage: Far more accurate.
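Once a policy or claim has been geocoded to a longitude and latitude, assigning it to a ZCTA amounts to a point-in-polygon test against the boundary file. The sketch below uses the standard ray-casting test on toy squares. The ZCTA codes and coordinates are made up, and it assumes each ZCTA is a single simple polygon, whereas real boundary files can contain multi-part shapes and many more vertices.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) falls inside the polygon.

    polygon is a list of (lon, lat) vertices; the ring is closed implicitly.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_zcta(lon, lat, zcta_polygons):
    """Return the first ZCTA whose polygon contains the point, else None."""
    for zcta, polygon in zcta_polygons.items():
        if point_in_polygon(lon, lat, polygon):
            return zcta
    return None


# Hypothetical boundaries: two adjacent unit squares.
zcta_polygons = {
    "00001": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "00002": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
print(assign_zcta(0.5, 0.5, zcta_polygons))
```

A linear scan over every polygon is fine for a sketch; mapping software or a spatial index would normally do this step for a full book of business.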
Census Data
There are numerous sources of demographic data, but the source for most of these data is the U.S. Census Bureau at http://www.census.gov/ .
Many variables can be derived from these data.
Some possible variables appear below:
Average Education Years
Population Density
Mean Age
Percent Rural
Percent Farm
Travel Time
Median Income
Median year owner-occupied structure built
Median year householder moved into unit
Median value for all owner-occupied housing units
Median price asked
Median selected monthly owner costs
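Several of these variables are simple ratios of published counts. As one hedged example, Population Density can be derived from a population count and a land area. The sketch assumes the land area is expressed in square meters, which should be checked against the Census technical documentation for the file in use; the function name is hypothetical.

```python
SQ_METERS_PER_SQ_MILE = 2_589_988.11  # one square mile expressed in square meters


def population_density(population, land_area_sq_m):
    """People per square mile of land; None when there is no land area."""
    if land_area_sq_m <= 0:
        return None
    return population / (land_area_sq_m / SQ_METERS_PER_SQ_MILE)


# Illustrative values only: 1,000 people on exactly one square mile.
print(round(population_density(1000, 2_589_988.11), 1))
```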
So how do you use the data from these zipped (.zip) download files?
They are very cryptic text files.
An Access database is available (referenced in the Readme document).
The text below is from:
ftp://ftp2.census.gov/census_2000/datasets/Summary_File_3/Arkansas/0README_SF3.doc
(but any of the 0README documents will do)
■ For step-by-step instructions for moving the data and the structure into a data base format (including screen shots), please see www.census.gov/support/SF3ASCII.html .
■ Structure files in Access97 and other formats are available at http://www.census.gov/support/2000/SF3/ .
■ We are unable to provide one-on-one support for applications of the data to specific spreadsheets or data base software.
To download SF3.mdb, use the link shown on the original slide (the link target is not preserved in this text).
After downloading SF3.mdb, open the Access database.
It contains seventy-six tables corresponding to the seventy-six zipped ftp files.
In addition there are an SF3GEO table, an SF3GEO Dictionary table, and a Tables table.
These define the structure of the database.
Data is imported into the tables using the File – Get External Data – Import command.
You will need to change the file extensions from .uf3 to .txt for this to work.
The geo files are fixed width; the others are comma delimited.
The database has specs for each table, and these can (should) be accessed using the Advanced button on the import wizard.
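The extension change can be scripted rather than done by hand. The sketch below copies each .uf3 file to a .txt twin (leaving the download untouched) and shows how one of the comma-delimited data files could be read for a quick look. The helper names are hypothetical, and the fixed-width geo files are not handled here because their column layout comes from the specs in the database.

```python
import csv
import shutil
from pathlib import Path


def copy_uf3_as_txt(folder):
    """Copy every .uf3 file in folder to a .txt twin and return the new paths."""
    created = []
    for src in sorted(Path(folder).glob("*.uf3")):
        dst = src.with_suffix(".txt")
        shutil.copyfile(src, dst)  # keep the original download untouched
        created.append(dst)
    return created


def read_comma_delimited(path):
    """Read one comma-delimited (non-geo) data file into a list of rows."""
    with open(path, newline="") as handle:
        return list(csv.reader(handle))
```

Calling `copy_uf3_as_txt` on the folder holding the unzipped files would leave a matching .txt file next to each .uf3 file, ready for the import wizard.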
Use the “Tables” table to select the columns that you want, then determine which files you need to import.
Remember that the tables contain all geographic areas: State, County, Zip, Block, County/Zip, etc. You will need to work with one of those at a time.
Summing the entire file will overstate your results, because each geographic level repeats the same underlying counts.
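A toy example makes the double counting concrete. The rows below stand in for one imported table in which the same population appears once per geographic level. The `level` labels and all numbers are hypothetical and are not the Census Bureau's own codes.

```python
# Each geographic level independently adds up to the same 300 people.
rows = [
    {"level": "state",  "name": "Toy State", "population": 300},
    {"level": "county", "name": "County A",  "population": 200},
    {"level": "county", "name": "County B",  "population": 100},
    {"level": "zcta",   "name": "00001",     "population": 180},
    {"level": "zcta",   "name": "00002",     "population": 120},
]


def total_for_level(rows, level):
    """Sum the population over rows belonging to a single geographic level."""
    return sum(r["population"] for r in rows if r["level"] == level)


naive_total = sum(r["population"] for r in rows)  # counts everyone three times
zcta_total = total_for_level(rows, "zcta")        # counts everyone once
print(naive_total, zcta_total)
```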
The Tables Table
The TEXT column provides the description of each data element.
The TABLE column provides the table name (remember the data must be loaded into each needed table).
The FIELDNUM column provides the column name that will contain the data element.
In this case, to get one statistic (e.g. Average Education) a weighted average is required.
So to get Average Education, the entry for table P037 tells us we must load Table SF30003 from the file named la00003 and select P037001 thru P037035.
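The weighted average itself is straightforward once each selected column has been given a number of years. The sketch below uses three made-up attainment categories. The years assigned to each category are an assumption the analyst has to supply, and any total or subtotal columns in the selected range would need to be left out so that people are not counted twice (check the TEXT descriptions in the Tables table).

```python
def weighted_average(counts, weights):
    """Average of weights, weighted by counts; None when all counts are zero."""
    if len(counts) != len(weights):
        raise ValueError("counts and weights must have the same length")
    total = sum(counts)
    if total == 0:
        return None
    return sum(c * w for c, w in zip(counts, weights)) / total


# Hypothetical categories: years of education assigned to each, and people in each.
years_per_category = [10, 12, 16]
people_per_category = [100, 300, 100]
average_education = weighted_average(people_per_category, years_per_category)
print(round(average_education, 2))
```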
The SF3GEO Table
It is indexed on the LOGRECNO column.
The NAME column provides the description for each row.
The ZCTA5 column provides the five-digit Zip for the row.
Notice that there are partial Zips (split between Parishes/Counties).
Joining SF3GEO and the selected table on LOGRECNO produces a result with the demographic data in columns and the geographic areas in rows.
Note: Make sure the query selects only the geographic data desired, i.e. give it the smell test.
The following query:
SELECT SF3GEO.ZCTA5, SF3GEO.AREALAND,
SF3GEO.NAME, SF30003.P037001
FROM SF3GEO INNER JOIN SF30003 ON
SF3GEO.LOGRECNO = SF30003.LOGRECNO
WHERE (((SF3GEO.COUNTY) Is Null) AND
((SF3GEO.ZCTA5) Not Like "###XX" And
(SF3GEO.ZCTA5) Not Like "###HH"))
ORDER BY SF3GEO.ZCTA5;
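Yields one row per selected ZCTA5, with its land area, name, and P037001 value (the result screenshot is not reproduced here).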
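The same join and filter can be tried outside Access on toy data. The sketch below uses Python's built-in sqlite3 module with made-up rows. Because the Access `#` wildcard (any single digit) is not available in SQLite, `GLOB` with `[0-9]` is used as a stand-in. The table contents and the simplified column list are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SF3GEO (LOGRECNO INTEGER PRIMARY KEY, ZCTA5 TEXT, COUNTY TEXT,
                         AREALAND INTEGER, NAME TEXT);
    CREATE TABLE SF30003 (LOGRECNO INTEGER PRIMARY KEY, P037001 INTEGER);
    INSERT INTO SF3GEO VALUES
        (1, NULL,    NULL,  900, 'Toy State'),
        (2, '00001', NULL,  400, '00001 5-Digit ZCTA'),
        (3, '00001', '001', 250, '00001 5-Digit ZCTA (part)'),
        (4, '000XX', NULL,  300, '000XX 5-Digit ZCTA'),
        (5, '000HH', NULL,  200, '000HH 5-Digit ZCTA');
    INSERT INTO SF30003 VALUES (1, 500), (2, 120), (3, 70), (4, 30), (5, 10);
""")

# Keep only whole ZCTA rows: no county split, no XX or HH pseudo-codes.
query = """
    SELECT g.ZCTA5, g.AREALAND, g.NAME, d.P037001
    FROM SF3GEO AS g
    INNER JOIN SF30003 AS d ON g.LOGRECNO = d.LOGRECNO
    WHERE g.COUNTY IS NULL
      AND g.ZCTA5 NOT GLOB '[0-9][0-9][0-9]XX'
      AND g.ZCTA5 NOT GLOB '[0-9][0-9][0-9]HH'
    ORDER BY g.ZCTA5
"""
result_rows = conn.execute(query).fetchall()
for row in result_rows:
    print(row)
```

In this toy data the state row drops out because its ZCTA5 is null, and the county-split row drops out because its COUNTY is filled in, which is the kind of check the smell test is meant to catch.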
Cartographic Boundary Files
In addition to demographic data, the Census Bureau publishes boundary files for each of its boundaries.
Remember: since Postal Zip Codes and Postal Zip Code definitions change over time, and the Census Bureau redefined ZCTAs somewhat for the census, there will be a mismatch between the boundary files and the Zip Codes in your experience database.
First go to:
http://www.census.gov/geo/www/cob/bdy_files.html
From there:
For 5-Digit ZIP Code Tabulation Areas (ZCTAs) go to:
http://www.census.gov/geo/www/cob/z52000.html
For County and County Equivalent Areas go to:
http://www.census.gov/geo/www/cob/co2000.html
There are three types of files on each page. For ZCTAs they are:
Census 2000 5-Digit ZIP Code Tabulation Areas (ZCTAs) in ARC/INFO Export (.e00) format
Census 2000 5-Digit ZIP Code Tabulation Areas (ZCTAs) in ArcView Shapefile (.shp) format
Census 2000 5-Digit ZIP Code Tabulation Areas (ZCTAs) in ARC/INFO Ungenerate (ASCII) format
Most mapping software will read the Shapefile format.
John Stenmark
Consulting Actuary
Actuarial Data Management Services
(601) 955-3022
jstenmark@comcast.net