DSCI 4520/5240


DSCI 4520/5240: Data Mining

Fall 2013 – Dr. Nick Evangelopoulos

DONOR_RAW: Data Description

Some slide material taken from: SAS Education

Original artwork © Nick Evangelopoulos, 2013

DONOR_RAW: Data Overview


Determine who is likely to donate to a nonprofit organization's campaign, and target those people for donation solicitation.

The scenario is the same as the one that produced the data set MYRAW.

This time, however, the data are somewhat different.


DONOR_RAW: Nonprofit donation solicitation scenario

In 1997, a non-profit organization related to U.S. military veterans ran a regular donation solicitation campaign called 97NK. For each person targeted by the campaign, certain information (at a personal or a demographic level) was known beforehand. The solicitation response (whether the person donated and, if so, what amount) was recorded.

In 1998, the organization offered the full data set to analysts (under certain conditions). DONOR_RAW is a subset of that data set with 50 variables and about 19,400 observations.


The Charity Donation Project

Business: A national veterans' organization

Objective: From a population of lapsing donors, identify individuals worth continued solicitation.

Source: 1998 KDD-Cup Competition via UCI KDD Archive


[Figure: Donor Master, Demographics, and Transaction Detail tables → Data Preparation → Raw Analysis Data (95,412 records, 481 fields)]


[Figure: Raw Analysis Data (95,412 records, 481 fields) → Additional Data Preparation → Final Analysis Data, DONOR_RAW (19,372 records, 50 fields)]

Analysis Data Definition


Donor master data

CONTROL_NUMBER Unique donor ID

MONTHS_SINCE_ORIGIN Elapsed time since first donation

IN_HOUSE 1=Given to In House program, 0=Not In House donor

Analysis Data Definition


Demographic and other overlay data

OVERLAY_SOURCE M=Metromail, P=Polk, B=both

DONOR_AGE Age as of June 1997

DONOR_GENDER Actual or inferred gender

PUBLISHED_PHONE Published telephone listing

HOME_OWNER H=homeowner, U=unknown

MOR_HIT Mail order response hit rate

Analysis Data Definition


Demographic and other overlay data

CLUSTER_CODE 54 socio-economic cluster codes

SES 5 socio-economic cluster codes

INCOME_GROUP 7 income group levels

MED_HOUSEHOLD_INCOME Median household income in hundreds of dollars

PER_CAPITA_INCOME Income per capita in dollars

WEALTH_RATING 10 wealth rating groups
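Because MED_HOUSEHOLD_INCOME is recorded in hundreds of dollars while PER_CAPITA_INCOME is recorded in dollars, the two must be put on the same scale before they are compared. A minimal Python sketch of the rescaling, assuming missing values arrive as None; the helper name `hundreds_to_dollars` is hypothetical:

```python
from typing import Optional


def hundreds_to_dollars(value: Optional[float]) -> Optional[float]:
    """Convert a value recorded in units of $100 (e.g. MED_HOUSEHOLD_INCOME) to dollars."""
    if value is None:
        return None  # leave missing values missing
    return value * 100.0


# A stored MED_HOUSEHOLD_INCOME of 350 corresponds to $35,000.
print(hundreds_to_dollars(350))  # 35000.0
```

The same conversion would apply to MED_HOME_VALUE, which the data definition also records in hundreds of dollars.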

Analysis Data Definition


Demographic and other overlay data

MED_HOME_VALUE Median home value in hundreds of dollars

PCT_OWNER_OCCUPIED Percent owner-occupied housing

URBANICITY U=urban, C=city, S=suburban, T=town, R=rural, ?=unknown
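URBANICITY mixes five real levels with a `?` code for unknown. One way to decode it is sketched below in Python, assuming codes arrive as one-character strings and that `?` should become a missing value (the kind of unknown level the PR1 assignment asks you to deal with); the table and function names are hypothetical:

```python
from typing import Optional

# Hypothetical lookup table built from the URBANICITY codes in the data definition.
URBANICITY_LABELS = {
    "U": "urban",
    "C": "city",
    "S": "suburban",
    "T": "town",
    "R": "rural",
}


def decode_urbanicity(code: str) -> Optional[str]:
    """Return the label for an URBANICITY code; '?' (unknown) maps to None."""
    code = code.strip().upper()
    if code == "?":
        return None  # treat the unknown level as a missing value
    if code not in URBANICITY_LABELS:
        raise ValueError(f"unexpected URBANICITY code: {code!r}")
    return URBANICITY_LABELS[code]


codes = ["U", "s", "?", "R"]
print([decode_urbanicity(c) for c in codes])
# ['urban', 'suburban', None, 'rural']
```

Raising on an unexpected code, rather than silently mapping it to None, makes data-entry problems visible instead of hiding them among the genuine unknowns.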

Analysis Data Definition


Census overlay data

PCT_MALE_MILITARY Percent male military in block

PCT_MALE_VETERANS Percent male veterans in block

PCT_VIETNAM_VETERANS Percent Vietnam veterans in block

PCT_WWII_VETERANS Percent WWII veterans in block


Analysis Data Definition

Transaction detail data

NUMBER_PROM_12 Number of promotions in the last 12 months

CARD_PROM_12 Number of card promotions in the last 12 months

[Timeline figure: ’94 to ’98, with 97NK marked]
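NUMBER_PROM_12 and CARD_PROM_12 are counts over a 12-month look-back window. The sketch below shows that windowing logic in Python, assuming the window ends at June 1997 (the month tied to 97NK elsewhere in the data definition), that only whole calendar months matter, and that promotions in the reference month itself are excluded; `number_prom_12` and the promotion dates are hypothetical, not fields in DONOR_RAW:

```python
from datetime import date
from typing import Iterable

# Assumed reference month for the 12-month window (June 1997).
REFERENCE = date(1997, 6, 1)


def months_before(d: date, reference: date = REFERENCE) -> int:
    """Whole calendar months from d back to the reference month (day ignored)."""
    return (reference.year - d.year) * 12 + (reference.month - d.month)


def number_prom_12(promo_dates: Iterable[date], reference: date = REFERENCE) -> int:
    """Count promotions dated 1 to 12 whole months before the reference month."""
    return sum(1 for d in promo_dates if 1 <= months_before(d, reference) <= 12)


promos = [date(1995, 11, 1), date(1996, 5, 10), date(1996, 6, 15),
          date(1997, 1, 3), date(1997, 5, 20)]
print(number_prom_12(promos))  # 3
```

Only the June 1996, January 1997, and May 1997 promotions fall inside the assumed window; CARD_PROM_12 would apply the same count to card promotions only.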


Analysis Data Definition

Transaction detail data

FREQ_STATUS_97NK Frequency status, June ’97

RECENCY_STATUS_96NK Recency status, June ’96

MONTHS_SINCE_LAST Months since last donation

LAST_GIFT_AMT Amount of most recent donation

[Timeline figure: ’94 to ’98, with 96NK and 97NK marked]


Analysis Data Definition

RECENT transaction detail data

RESPONSE_PROP Response proportion since June ’94

RESPONSE_COUNT Response count since June ’94

AVG_GIFT_AMT Average gift amount since June ’94

RECENT_STAR_STATUS STAR status since June ’94 (1=yes, 0=no)

[Timeline figure: ’94 to ’98, with 94NK and 96NK marked]
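A response proportion is naturally read as responses divided by promotions received. The Python sketch below assumes exactly that definition for RESPONSE_PROP; the denominator (promotions received since June ’94) is not one of the fields listed on this slide, so it appears here as a hypothetical input, and `response_proportion` is a hypothetical name:

```python
from typing import Optional


def response_proportion(response_count: int, promotions_received: int) -> Optional[float]:
    """Share of promotions that drew a donation; None if no promotions were received."""
    if response_count < 0 or promotions_received < 0:
        raise ValueError("counts must be non-negative")
    if response_count > promotions_received:
        raise ValueError("a donor cannot respond more often than promoted")
    if promotions_received == 0:
        return None  # proportion is undefined without any promotions
    return response_count / promotions_received


# A donor who responded to 3 of 12 promotions since June '94.
print(response_proportion(3, 12))  # 0.25
```

Returning None for zero promotions keeps an undefined proportion distinct from a genuine 0.0 (promoted but never responded).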


Analysis Data Definition

RECENT transaction detail data

CARD_RESPONSE_PROP Response proportion to card promotions since June ’94

CARD_RESPONSE_COUNT Response count to card promotions since June ’94

CARD_AVG_GIFT_AMT Average gift amount from card promotions since June ’94

[Timeline figure: ’94 to ’98, with 94NK and 96NK marked]


Analysis Data Definition

LIFETIME transaction detail data

PROM Total number of promotions ever

GIFT_COUNT Total number of donations ever

AVG_GIFT_AMT Overall average gift amount

PEP_STAR STAR status ever (1=yes, 0=no)

[Timeline figure: ’94 to ’98, with 94NK and 96NK marked]


Analysis Data Definition

LIFETIME transaction detail data

GIFT_AMOUNT Total gift amount ever

GIFT_COUNT Total number of donations ever

MAX_GIFT Maximum gift amount

GIFT_RANGE Maximum minus minimum gift amount

[Timeline figure: ’94 to ’98, with 94NK and 96NK marked]
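The lifetime fields on this slide and the previous one can all be derived from a donor's list of individual gifts. A Python sketch, assuming such a list is available (it is not part of DONOR_RAW) and that AVG_GIFT_AMT equals total gift amount divided by gift count; the class and function names are hypothetical:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LifetimeGiftSummary:
    gift_amount: float   # total gift amount ever (GIFT_AMOUNT)
    gift_count: int      # total number of donations ever (GIFT_COUNT)
    max_gift: float      # largest single gift (MAX_GIFT)
    gift_range: float    # maximum minus minimum gift (GIFT_RANGE)
    avg_gift_amt: float  # overall average gift amount (AVG_GIFT_AMT)


def summarize_gifts(gifts: List[float]) -> LifetimeGiftSummary:
    """Compute lifetime summaries from one donor's individual gift amounts."""
    if not gifts:
        raise ValueError("donor has no recorded gifts")
    total = sum(gifts)
    count = len(gifts)
    return LifetimeGiftSummary(
        gift_amount=total,
        gift_count=count,
        max_gift=max(gifts),
        gift_range=max(gifts) - min(gifts),
        avg_gift_amt=total / count,
    )


print(summarize_gifts([10.0, 25.0, 5.0, 20.0]))
# LifetimeGiftSummary(gift_amount=60.0, gift_count=4, max_gift=25.0, gift_range=20.0, avg_gift_amt=15.0)
```

Derived this way, GIFT_AMOUNT = GIFT_COUNT × AVG_GIFT_AMT holds by construction, which can serve as a sanity check on the supplied columns.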


Analysis Data Definition

KDD-supplied LIFETIME transaction detail data

FILE_AVG_GIFT Average gift from raw data

FILE_CARD_GIFT Average card gift from raw data

MONTHS_SINCE_FIRST Months from first donation to June ’97

MONTHS_SINCE_LAST Months from last donation to June ’97

[Timeline figure: ’94 to ’98, with 94NK and 96NK marked]
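MONTHS_SINCE_FIRST and MONTHS_SINCE_LAST measure elapsed time back from June ’97. A Python sketch, assuming whole calendar months (day of month ignored) and a hypothetical list of donation dates per donor; the function names are also hypothetical:

```python
from datetime import date
from typing import List, Tuple

# June 1997 snapshot month used by the data definition.
REFERENCE = date(1997, 6, 1)


def months_before(d: date, reference: date = REFERENCE) -> int:
    """Whole calendar months from d to the reference month (day ignored)."""
    return (reference.year - d.year) * 12 + (reference.month - d.month)


def months_since_first_and_last(donations: List[date]) -> Tuple[int, int]:
    """Return (MONTHS_SINCE_FIRST, MONTHS_SINCE_LAST) for one donor."""
    if not donations:
        raise ValueError("donor has no recorded donations")
    return months_before(min(donations)), months_before(max(donations))


history = [date(1990, 3, 14), date(1995, 12, 2), date(1996, 9, 30)]
print(months_since_first_and_last(history))  # (87, 9)
```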


Analysis Data Definition

Transaction detail data target definition

TARGET_B Response to 97NK solicitation (1=yes, 0=no)

TARGET_D Response amount to 97NK solicitation (missing if no response)

[Timeline figure: ’94 to ’98, with 97NK marked]
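Because TARGET_D is defined only for responders, a quick consistency check between the two targets is a useful first step. A Python sketch, assuming missing amounts arrive as None or NaN and that every recorded donation amount is positive; `target_codes_consistent` is a hypothetical name:

```python
import math
from typing import Optional


def target_codes_consistent(target_b: int, target_d: Optional[float]) -> bool:
    """Check one record against the TARGET_B / TARGET_D definitions."""
    missing = target_d is None or math.isnan(target_d)
    if target_b == 1:
        # Responders should have a recorded (assumed positive) donation amount.
        return not missing and target_d > 0
    if target_b == 0:
        # Non-responders should have a missing TARGET_D.
        return missing
    return False  # TARGET_B should only ever be 0 or 1


records = [(1, 20.0), (0, None), (0, float("nan")), (1, None), (0, 15.0)]
print([target_codes_consistent(b, d) for b, d in records])
# [True, True, True, False, False]
```

The last two records are flagged: a responder with no amount, and a non-responder with an amount.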

PR1 assignment

- Use the DONOR_RAW data table, found in the C:\4520data folder of the SAS EM5.3 server.

- Follow analysis steps similar to those shown in the Getting Started with SAS Enterprise Miner 5.3 text, pp. 23-44, to start a new analysis project and make a data source called DONOR_RAW available.

- Then follow pp. 45-60: generate descriptive statistics (pp. 46-51), create exploratory plots (pp. 51-53), partition the raw data (pp. 54-55), explore missing values (pp. 55-58), and replace unknown levels in observations (pp. 58-60). Handout PR1 lists exactly what you need to turn in.
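The partitioning step itself is done inside SAS Enterprise Miner. To show the idea outside the GUI, here is a stratified split in plain Python that keeps the TARGET_B mix the same in both partitions; the 50/50 split, the seed, the toy data, and the function name are assumptions for illustration, not settings taken from the Getting Started text:

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(rows: Sequence[dict], target: str = "TARGET_B",
                     train_frac: float = 0.5, seed: int = 12345
                     ) -> Tuple[List[dict], List[dict]]:
    """Split rows into training/validation sets, preserving the target's class mix."""
    if not 0.0 < train_frac < 1.0:
        raise ValueError("train_frac must be strictly between 0 and 1")
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    by_class: Dict[object, List[dict]] = defaultdict(list)
    for row in rows:
        by_class[row[target]].append(row)
    train: List[dict] = []
    valid: List[dict] = []
    for members in by_class.values():
        members = members[:]  # copy so the caller's list is not reordered
        rng.shuffle(members)
        cut = round(len(members) * train_frac)
        train.extend(members[:cut])
        valid.extend(members[cut:])
    return train, valid


# Toy data: 20 responders and 80 non-responders (made up for illustration).
toy = [{"CONTROL_NUMBER": i, "TARGET_B": 1 if i < 20 else 0} for i in range(100)]
train, valid = stratified_split(toy)
print(len(train), len(valid))             # 50 50
print(sum(r["TARGET_B"] for r in train))  # 10
```

Splitting within each TARGET_B class, rather than over the whole table, guarantees both partitions see responders at the same rate, which matters when responders are a minority.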
