DSCI 4520/5240: Data Mining
Fall 2013 – Dr. Nick Evangelopoulos
Some slide material taken from: SAS Education
Original artwork © Nick Evangelopoulos, 2013
Determine who is likely to donate to a non-profit organization's campaign and target them for donation solicitation.
The scenario is the same as the one that produced the data set MYRAW.
This time, however, the data are somewhat different.
In 1997, a non-profit organization related to U.S. military veterans ran a regular donation solicitation campaign called 97NK. For each person targeted by the campaign, certain information (at a personal or a demographic level) was known beforehand. The solicitation response (whether the person donated and, if so, what amount) was recorded.
In 1998, the organization made the full data set available to analysts (under certain conditions). The data set DONOR_RAW is a subset that includes 50 variables and about 19,400 observations.
The Charity Donation Project
Business: A national veterans' organization
Objective: From a population of lapsing donors, identify individuals worth continued solicitation.
Source: 1998 KDD-Cup Competition via the UCI KDD Archive
Donor Master, Demographics, Transaction Detail → Data Preparation → Raw Analysis Data (95,412 records, 481 fields)
Raw Analysis Data (95,412 records, 481 fields) → Additional Data Preparation → Final Analysis Data, DONOR_RAW (19,372 records, 50 fields)
Analysis Data Definition
Donor master data
CONTROL_NUMBER Unique donor ID
MONTHS_SINCE_ORIGIN Elapsed time since first donation
IN_HOUSE 1=Has given to In House program, 0=Not an In House donor
Analysis Data Definition
Demographic and other overlay data
OVERLAY_SOURCE M=Metromail, P=Polk, B=both
DONOR_AGE Age as of June 1997
DONOR_GENDER Actual or inferred gender
PUBLISHED_PHONE Published telephone listing
HOME_OWNER H=homeowner, U=unknown
MOR_HIT Mail order response hit rate
Analysis Data Definition
Demographic and other overlay data
CLUSTER_CODE 54 socio-economic cluster codes
SES 5 socio-economic cluster codes
INCOME_GROUP 7 income group levels
MED_HOUSEHOLD_INCOME Median income in $100’s
PER_CAPITA_INCOME Income per capita in dollars
WEALTH_RATING 10 wealth rating groups
Analysis Data Definition
Demographic and other overlay data
MED_HOME_VALUE Median home value in $100’s
PCT_OWNER_OCCUPIED Percent owner-occupied housing
URBANICITY U=urban, C=city, S=suburban, T=town, R=rural, ?=unknown
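Some overlay fields use single-letter codes, and URBANICITY reserves `?` for an unknown level. The following is a minimal Python sketch, assuming the codes arrive as one-character strings exactly as listed above, that decodes URBANICITY and maps `?` to `None` so it can be handled as a missing value rather than as a sixth category. The names `URBANICITY_LABELS` and `decode_urbanicity` are hypothetical and are not part of SAS Enterprise Miner.

```python
from typing import Optional

# Code-to-label mapping copied from the URBANICITY definition above.
URBANICITY_LABELS = {
    "U": "urban",
    "C": "city",
    "S": "suburban",
    "T": "town",
    "R": "rural",
}


def decode_urbanicity(code: str) -> Optional[str]:
    """Return the URBANICITY label, or None when the level is unknown ('?')."""
    code = code.strip().upper()
    if code == "?":
        return None  # unknown level: treat as missing, not as its own category
    if code not in URBANICITY_LABELS:
        raise ValueError(f"unexpected URBANICITY code: {code!r}")
    return URBANICITY_LABELS[code]


if __name__ == "__main__":
    # Hypothetical sample values, not drawn from DONOR_RAW.
    for raw in ["U", "s", "?", "R"]:
        print(raw, "->", decode_urbanicity(raw))
```

Whether an unknown level becomes a missing value or stays as its own category is a modeling choice; the `U` code of HOME_OWNER raises the same question.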
Analysis Data Definition
Census overlay data
PCT_MALE_MILITARY Percent male military in block
PCT_MALE_VETERANS Percent male veterans in block
PCT_VIETNAM_VETERANS Percent Vietnam veterans in block
PCT_WWII_VETERANS Percent WWII veterans in block
Analysis Data Definition
Transaction detail data
NUMBER_PROM_12 Number of promotions in the last 12 months
CARD_PROM_12 Number of card promotions in the last 12 months
[Timeline figure: ’94 to ’98, with the 97NK campaign marked]
Analysis Data Definition
Transaction detail data
FREQ_STATUS_97NK Frequency status, June ’97
RECENCY_STATUS_96NK Recency status, June ’96
MONTHS_SINCE_LAST Months since last donation
LAST_GIFT_AMT Amount of most recent donation
[Timeline figure: ’94 to ’98, with the 96NK and 97NK campaigns marked]
Analysis Data Definition
RECENT transaction detail data
RESPONSE_PROP Response proportion since June ’94
RESPONSE_COUNT Response count since June ’94
AVG_GIFT_AMT Average gift amount since June ’94
RECENT_STAR_STATUS STAR status since June ’94 (1=yes, 0=no)
[Timeline figure: ’94 to ’98, with the 94NK and 96NK campaigns marked]
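The RECENT variables summarize each donor's behavior over a window that starts in June ’94. A minimal Python sketch of such a windowed summary follows. It assumes a hypothetical per-donor promotion history (the `Promotion` record is invented for illustration), assumes that history already stops before the 97NK campaign, and assumes RESPONSE_PROP means responses divided by promotions received in the window; the slide does not spell out the exact formulas.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Start of the RECENT window ("since June '94"); day of month assumed to be the 1st.
WINDOW_START = date(1994, 6, 1)


@dataclass
class Promotion:
    """One solicitation sent to a donor (hypothetical record layout)."""
    sent: date
    gift: float  # 0.0 when the donor did not respond


def recent_summary(history: List[Promotion]) -> dict:
    """Compute RESPONSE_COUNT, RESPONSE_PROP and AVG_GIFT_AMT since WINDOW_START.

    Assumption: RESPONSE_PROP = responses / promotions received in the window.
    """
    window = [p for p in history if p.sent >= WINDOW_START]
    gifts = [p.gift for p in window if p.gift > 0]
    count = len(gifts)
    prop: Optional[float] = count / len(window) if window else None
    avg: Optional[float] = sum(gifts) / count if count else None
    return {"RESPONSE_COUNT": count, "RESPONSE_PROP": prop, "AVG_GIFT_AMT": avg}


if __name__ == "__main__":
    # Hypothetical history for one donor, not taken from DONOR_RAW.
    history = [
        Promotion(date(1993, 11, 1), 25.0),  # before the window, ignored
        Promotion(date(1994, 9, 1), 10.0),
        Promotion(date(1995, 3, 1), 0.0),
        Promotion(date(1995, 12, 1), 20.0),
        Promotion(date(1996, 6, 1), 0.0),
    ]
    print(recent_summary(history))
    # {'RESPONSE_COUNT': 2, 'RESPONSE_PROP': 0.5, 'AVG_GIFT_AMT': 15.0}
```

The CARD_ variables that follow would apply the same logic restricted to card promotions.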
Analysis Data Definition
RECENT transaction detail data
CARD_RESPONSE_PROP Response proportion to card promotions since June ’94
CARD_RESPONSE_COUNT Response count to card promotions since June ’94
CARD_AVG_GIFT_AMT Average card gift amount since June ’94
[Timeline figure: ’94 to ’98, with the 94NK and 96NK campaigns marked]
Analysis Data Definition
LIFETIME transaction detail data
PROM Total number of promotions ever
GIFT_COUNT Total number of donations ever
AVG_GIFT_AMT Overall average gift amount
PEP_STAR STAR status ever (1=yes, 0=no)
[Timeline figure: ’94 to ’98, with the 94NK and 96NK campaigns marked]
Analysis Data Definition
LIFETIME transaction detail data
GIFT_AMOUNT Total gift amount ever
GIFT_COUNT Total number of donations ever
MAX_GIFT Maximum gift amount
GIFT_RANGE Maximum less minimum gift amount
[Timeline figure: ’94 to ’98, with the 94NK and 96NK campaigns marked]
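Most LIFETIME fields are simple aggregates over a donor's full gift history. A minimal Python sketch, assuming the history is available as a hypothetical list of gift amounts: GIFT_RANGE follows the slide's definition (maximum less minimum), while computing AVG_GIFT_AMT as total divided by count is an assumption about how the overall average was built. The function name `lifetime_summary` is illustrative only.

```python
from typing import Dict, List


def lifetime_summary(gifts: List[float]) -> Dict[str, float]:
    """LIFETIME gift statistics for one donor from all recorded gift amounts."""
    if not gifts:
        raise ValueError("donor has no recorded gifts")
    total = sum(gifts)
    return {
        "GIFT_AMOUNT": total,                   # total gift amount ever
        "GIFT_COUNT": len(gifts),               # total number of donations ever
        "AVG_GIFT_AMT": total / len(gifts),     # overall average gift amount
        "MAX_GIFT": max(gifts),                 # maximum gift amount
        "GIFT_RANGE": max(gifts) - min(gifts),  # maximum less minimum gift amount
    }


# Hypothetical gift history: four donations.
print(lifetime_summary([5.0, 10.0, 25.0, 10.0]))
# {'GIFT_AMOUNT': 50.0, 'GIFT_COUNT': 4, 'AVG_GIFT_AMT': 12.5, 'MAX_GIFT': 25.0, 'GIFT_RANGE': 20.0}
```

Under this assumption GIFT_AMOUNT equals GIFT_COUNT times AVG_GIFT_AMT, so the three fields are not independent model inputs.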
Analysis Data Definition
KDD-supplied LIFETIME transaction detail data
FILE_AVG_GIFT Average gift from raw data
FILE_CARD_GIFT Average card gift from raw data
MONTHS_SINCE_FIRST Months from first donation to June ’97
MONTHS_SINCE_LAST Months from last donation to June ’97
[Timeline figure: ’94 to ’98, with the 94NK and 96NK campaigns marked]
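MONTHS_SINCE_FIRST and MONTHS_SINCE_LAST are both measured back from June ’97. A minimal Python sketch, assuming elapsed time is counted in whole calendar months with the day of the month ignored; the KDD files may have used a different rounding rule, and `months_before_reference` is a hypothetical helper.

```python
from datetime import date

# Reference point for both fields ("from June '97").
REFERENCE = date(1997, 6, 1)


def months_before_reference(event: date, reference: date = REFERENCE) -> int:
    """Whole calendar months from `event` to `reference`, ignoring the day of month."""
    months = (reference.year - event.year) * 12 + (reference.month - event.month)
    if months < 0:
        raise ValueError("event falls after the reference date")
    return months


# Hypothetical donor: first gift in March 1989, most recent gift in February 1996.
print(months_before_reference(date(1989, 3, 15)))  # 99, as MONTHS_SINCE_FIRST
print(months_before_reference(date(1996, 2, 20)))  # 16, as MONTHS_SINCE_LAST
```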
Analysis Data Definition
Transaction detail data target definition
TARGET_B Response to 97NK solicitation (1=yes, 0=no)
TARGET_D Response amount to 97NK solicitation (missing if no response)
[Timeline figure: ’94 to ’98, with the 97NK campaign marked]
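The two targets are linked: TARGET_D is defined only when TARGET_B is 1. A minimal Python sketch of a record-level consistency check, assuming a missing TARGET_D arrives as either `None` or a floating-point NaN and assuming any recorded donation amount is positive; `is_missing` and `targets_consistent` are hypothetical helpers, not Enterprise Miner functions.

```python
import math
from typing import Optional


def is_missing(value: Optional[float]) -> bool:
    """Treat both None and NaN as a missing TARGET_D."""
    return value is None or math.isnan(value)


def targets_consistent(target_b: int, target_d: Optional[float]) -> bool:
    """Check one record against the TARGET_B / TARGET_D definitions."""
    if target_b == 1:
        # A responder should have a recorded (assumed positive) donation amount.
        return not is_missing(target_d) and target_d > 0
    if target_b == 0:
        # A non-responder's amount is missing by definition.
        return is_missing(target_d)
    return False  # TARGET_B must be 0 or 1


# Hypothetical records as (TARGET_B, TARGET_D) pairs.
records = [(1, 15.0), (0, None), (0, float("nan")), (1, None), (0, 5.0)]
print([targets_consistent(b, d) for b, d in records])
# [True, True, True, False, False]
```

Because TARGET_D is missing for every non-responder, a model for the donation amount would typically be fit on responders only.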
PR1 assignment
Use the DONOR_RAW data table, found in the C:\4520data folder of the SAS EM5.3 server.
Follow analysis steps similar to those shown in the Getting Started with SAS Enterprise Miner 5.3 text, pp. 23-44, to start a new analysis project and make a Data Source called DONOR_RAW available.
Then follow pp. 45-60: generate descriptive statistics (pp. 46-51), create exploratory plots (pp. 51-53), partition the raw data (pp. 54-55), explore missing values (pp. 55-58), and replace observations with unknown levels (pp. 58-60). Handout PR1 lists exactly what you need to turn in.
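In Enterprise Miner, partitioning is done through the software as described in the text. To illustrate the idea behind a stratified partition, here is a minimal Python sketch that splits records into training and validation sets while keeping the TARGET_B response rate the same in each stratum's share. The 50/50 split, the seed, and the function name `stratified_split` are hypothetical choices, not the settings required by handout PR1.

```python
import random
from typing import Dict, List, Tuple


def stratified_split(
    rows: List[Dict],
    target: str = "TARGET_B",
    train_frac: float = 0.5,
    seed: int = 12345,
) -> Tuple[List[Dict], List[Dict]]:
    """Split rows into training and validation sets, stratified on `target`."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    strata: Dict[object, List[Dict]] = {}
    for row in rows:
        strata.setdefault(row[target], []).append(row)

    train: List[Dict] = []
    valid: List[Dict] = []
    for group in strata.values():
        group = group[:]  # shuffle a copy, not the caller's rows
        rng.shuffle(group)
        cut = round(len(group) * train_frac)
        train.extend(group[:cut])
        valid.extend(group[cut:])
    return train, valid


# Hypothetical table: 100 donors, 20 of whom responded (TARGET_B = 1).
rows = [{"CONTROL_NUMBER": i, "TARGET_B": 1 if i < 20 else 0} for i in range(100)]
train, valid = stratified_split(rows)
print(len(train), len(valid), sum(r["TARGET_B"] for r in train))  # 50 50 10
```

Stratifying on the target keeps the response rate in each partition close to the overall rate, which a plain random split only achieves on average.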