DSCI 4520/5240: Data Mining
Fall 2013 – Dr. Nick Evangelopoulos
DONOR_RAW: Data Description
Some slide material taken from: SAS Education
Original artwork © Nick Evangelopoulos, 2013
slide 1
DONOR_RAW: Data Overview
Determine who is likely to donate to a non-profit organization campaign and target them for donation solicitation.
The scenario is the same as the one that produced the data set MYRAW.
This time we have somewhat different data.
slide 2
DONOR_RAW: Nonprofit donation solicitation scenario
In 1997, a non-profit organization related to U.S. military veterans had a regular donation solicitation campaign called 97NK. For each person targeted by the campaign, certain information (at a personal or at a demographic level) was known beforehand. Solicitation response (whether they donated and, if yes, what amount) was recorded.
In 1998, the organization offered the full dataset to analysts (under certain conditions). The particular data set DONOR_RAW is a subset that includes 50 variables and about 19,400 observations.
slide 3
The Charity Donation Project
Business: A national veterans’ organization
Objective: From a population of lapsing donors, identify individuals worth continued solicitation.
Source: 1998 KDD-Cup Competition via UCI KDD Archive
slide 4
Data Preparation
[Diagram: Donor Master, Demographics, and Transaction Detail feed into the Raw Analysis Data (95,412 records, 481 fields)]
slide 5
Additional Data Preparation
[Diagram: Raw Analysis Data (95,412 records, 481 fields) is reduced to the Final Analysis Data, DONOR_RAW (19,372 records, 50 fields)]
slide 6
Analysis Data Definition
Donor master data
CONTROL_NUMBER
Unique donor ID
MONTHS_SINCE_ORIGIN
Elapsed time since first donation
IN_HOUSE
1=Given to In House program, 0=Not In House donor
slide 7
Analysis Data Definition
Demographic and other overlay data
OVERLAY_SOURCE
M=Metromail, P=Polk, B=both
DONOR_AGE
Age as of June 1997
DONOR_GENDER
Actual or inferred gender
PUBLISHED_PHONE
Published telephone listing
HOME_OWNER
H=homeowner, U=unknown
MOR_HIT
Mail order response hit rate
slide 8
Analysis Data Definition
Demographic and other overlay data
CLUSTER_CODE
54 Socio-economic cluster codes
SES
5 Socio-economic cluster codes
INCOME_GROUP
7 income group levels
MED_HOUSEHOLD_INCOME
Median income in $100’s
PER_CAPITA_INCOME
Income per capita in dollars
WEALTH_RATING
10 wealth rating groups
slide 9
Analysis Data Definition
Demographic and other overlay data
MED_HOME_VALUE
Median home value in $100’s
PCT_OWNER_OCCUPIED
Percent owner occupied housing
URBANICITY
U=urban, C=city, S=suburban, T=town, R=rural, ?=unknown
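A minimal sketch of how the URBANICITY codes above could be decoded in Python. The function name `decode_urbanicity` and the sample values are hypothetical, and treating `?` as a missing value is an assumption made for illustration, not something the slides prescribe.

```python
# Code-to-label mapping copied from the URBANICITY definition above.
URBANICITY_LABELS = {
    "U": "urban",
    "C": "city",
    "S": "suburban",
    "T": "town",
    "R": "rural",
}


def decode_urbanicity(code):
    """Return the label for a URBANICITY code, or None for an unknown level.

    Assumption: "?" (and any empty or unlisted code) is treated as missing.
    """
    if not code:
        return None
    return URBANICITY_LABELS.get(code.strip().upper())


# Hypothetical sample values, not taken from DONOR_RAW.
sample_codes = ["U", "?", "R", "s", ""]
decoded = [decode_urbanicity(c) for c in sample_codes]
print(decoded)
```

Keeping the unknown level as a missing value, rather than as a sixth category, is one possible choice; the appropriate treatment depends on the analysis.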
slide 10
Analysis Data Definition
Census overlay data
PCT_MALE_MILITARY
Percent male military in block
PCT_MALE_VETERANS
Percent male veterans in block
PCT_VIETNAM_VETERANS
Percent Vietnam veterans in block
PCT_WWII_VETERANS
Percent WWII veterans in block
slide 11
Analysis Data Definition
Transaction detail data
NUMBER_PROM_12
Number promotions last 12 mos.
CARD_PROM_12
Number card promotions last 12 mos.
[Timeline figure: 97NK marked on a time axis running from '94 to '98]
slide 12
Analysis Data Definition
Transaction detail data
FREQ_STATUS_97NK
Frequency status, June '97
RECENCY_STATUS_96NK
Recency status, June '96
MONTHS_SINCE_LAST
Months since last donation
LAST_GIFT_AMT
Amount of most recent donation
[Timeline figure: 96NK and 97NK marked on a time axis running from '94 to '98]
slide 13
Analysis Data Definition
RECENT transaction detail data
RESPONSE_PROP
Response proportion since June '94
RESPONSE_COUNT
Response count since June '94
AVG_GIFT_AMT
Average gift amount since June '94
RECENT_STAR_STATUS
STAR (1, 0) status since June '94
[Timeline figure: 94NK and 96NK marked on a time axis running from '94 to '98]
slide 14
Analysis Data Definition
RECENT transaction detail data
CARD_RESPONSE_PROP
Response proportion since June '94
CARD_RESPONSE_COUNT
Response count since June '94
CARD_AVG_GIFT_AMT
Average gift amount since June '94
[Timeline figure: 94NK and 96NK marked on a time axis running from '94 to '98]
slide 15
Analysis Data Definition
LIFETIME transaction detail data
PROM
Total number promotions ever
GIFT_COUNT
Total number donations ever
AVG_GIFT_AMT
Overall average gift amount
PEP_STAR
STAR status ever (1=yes, 0=no)
[Timeline figure: 94NK and 96NK marked on a time axis running from '94 to '98]
slide 16
Analysis Data Definition
LIFETIME transaction detail data
GIFT_AMOUNT
Total gift amount ever
GIFT_COUNT
Total number donations ever
MAX_GIFT
Maximum gift amount
GIFT_RANGE
Maximum less minimum gift amount
[Timeline figure: 94NK and 96NK marked on a time axis running from '94 to '98]
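To make the lifetime definitions above concrete, here is a small Python sketch that derives the four quantities from one donor's gift history. The function name `lifetime_gift_summary` and the gift amounts are hypothetical; how DONOR_RAW itself was computed is not shown in these slides.

```python
def lifetime_gift_summary(gift_amounts):
    """Summarize one donor's gift history (a list of dollar amounts).

    The keys follow the variable names on this slide. Returning None for
    MAX_GIFT and GIFT_RANGE when there are no gifts is an assumption.
    """
    if not gift_amounts:
        return {"GIFT_AMOUNT": 0.0, "GIFT_COUNT": 0,
                "MAX_GIFT": None, "GIFT_RANGE": None}
    return {
        "GIFT_AMOUNT": sum(gift_amounts),                        # total gift amount ever
        "GIFT_COUNT": len(gift_amounts),                         # total number of donations ever
        "MAX_GIFT": max(gift_amounts),                           # maximum gift amount
        "GIFT_RANGE": max(gift_amounts) - min(gift_amounts),     # maximum less minimum
    }


# Hypothetical gift history, not taken from DONOR_RAW.
gifts = [10.0, 25.0, 5.0, 20.0]
summary = lifetime_gift_summary(gifts)
print(summary)
```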
slide 17
Analysis Data Definition
KDD supplied LIFETIME transaction detail data
FILE_AVG_GIFT
Average gift from raw data
FILE_CARD_GIFT
Average card gift from raw data
MONTHS_SINCE_FIRST
Months since first donation, counted from June '97
MONTHS_SINCE_LAST
Months since last donation, counted from June '97
[Timeline figure: 94NK and 96NK marked on a time axis running from '94 to '98]
slide 18
Analysis Data Definition
Transaction detail data target definition
TARGET_B
Response to 97NK solicitation (1=yes, 0=no)
TARGET_D
Response amount to 97NK solicitation (missing if no response)
[Timeline figure: 97NK marked on a time axis running from '94 to '98]
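The relationship between the two targets can be sketched in Python as follows. The helper `make_targets` and the sample responses are hypothetical. The point illustrated is that TARGET_D is missing, not zero, for non-responders, so an average donation is taken over responders only.

```python
def make_targets(donation_amount):
    """Build (TARGET_B, TARGET_D) from a donor's response to 97NK.

    donation_amount is None when the donor did not respond.
    """
    if donation_amount is None:
        return 0, None          # no response: the amount is missing, not zero
    return 1, donation_amount   # response: flag is 1 and the amount is recorded


# Hypothetical responses, not taken from DONOR_RAW.
responses = [None, 15.0, None, 50.0]
targets = [make_targets(amount) for amount in responses]

# Response rate uses every solicited donor.
response_rate = sum(b for b, _ in targets) / len(targets)

# Average donation uses responders only, because TARGET_D is missing elsewhere.
donated = [d for _, d in targets if d is not None]
avg_gift_among_responders = sum(donated) / len(donated)

print(response_rate, avg_gift_among_responders)
```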
slide 19
PR1 assignment

Use the DONOR_RAW data table, found in the C:\4520data folder of the SAS EM5.3 server.

Follow analysis steps similar to those shown in the Getting Started with SAS Enterprise Miner 5.3 text, pp. 23-44, to start a new analysis project and make a Data Source called DONOR_RAW available.

Then follow pp. 45-60. Generate descriptive statistics (pp. 46-51), create exploratory plots (pp. 51-53), partition the raw data (pp. 54-55), explore missing values (pp. 55-58), and replace observations with unknown levels (pp. 58-60). Handout PR1 lists exactly what you need to turn in.
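The assignment itself is carried out in SAS Enterprise Miner. Purely as a rough analogue for intuition, the Python sketch below shows what partitioning, counting missing values, and recoding an unknown level amount to on a handful of rows. The function names, the 60/40 split, the seed, and the sample rows are all assumptions for illustration and do not reproduce the settings in the text or in Handout PR1.

```python
import random


def partition(rows, train_fraction=0.6, seed=12345):
    """Randomly split rows into training and validation lists."""
    shuffled = rows[:]                      # copy so the input order is untouched
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def count_missing(rows):
    """Count missing (None) values per variable."""
    counts = {}
    for row in rows:
        for name, value in row.items():
            counts[name] = counts.get(name, 0) + (1 if value is None else 0)
    return counts


def replace_unknown_levels(rows, variable, unknown="?"):
    """Return copies of rows with the unknown level recoded as missing."""
    return [
        {**row, variable: None if row[variable] == unknown else row[variable]}
        for row in rows
    ]


# Hypothetical rows, not taken from DONOR_RAW.
rows = [
    {"DONOR_AGE": 62, "URBANICITY": "S", "TARGET_B": 0},
    {"DONOR_AGE": None, "URBANICITY": "?", "TARGET_B": 1},
    {"DONOR_AGE": 45, "URBANICITY": "R", "TARGET_B": 0},
    {"DONOR_AGE": None, "URBANICITY": "C", "TARGET_B": 0},
    {"DONOR_AGE": 71, "URBANICITY": "?", "TARGET_B": 1},
]

train, valid = partition(rows)                       # partition the raw data
missing_before = count_missing(rows)                 # explore missing values
cleaned = replace_unknown_levels(rows, "URBANICITY") # recode the "?" level
missing_after = count_missing(cleaned)

print(len(train), len(valid))
print(missing_before)
print(missing_after)
```

Recoding `?` as missing makes the unknown level visible in the missing-value counts, which is why URBANICITY shows no missing values before the replacement and two afterwards in this toy data.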
slide 20