NEO CHALLENGE COMPETITION DETAILS
The Neo Challenge competition focuses on the prediction of customer satisfaction.
1. Description of the tasks
The tasks this year use the “CLIENTES_fecha” and “SATISFACCION_fecha” data in the
training database. The data covers October 2012 to December 2014 for “CLIENTES_fecha” and
January 2013 to December for “SATISFACCION_fecha”, and reflects customer behavior during
those time periods.
The competition consists of two rounds, local and final:
• Local round: an internal competition within each country, in which teams attempt to solve a
problem labeled “Task 1” (the problem is the same in all six participating countries).
There will be one winner from each country, who will compete in the final round.
• Final round: the winning teams from the local rounds will be given a new challenge,
labeled “Task 2”.
NOTE: For both tasks, teams are required to submit a description of the methodology they used (see section 5. Methodology Annex detail).
Task 1: Determine which active customers will become inactive
One of the indicators to measure customer satisfaction is the customer activity level.
A customer's activity or inactivity is determined by the variables described below, which
are contained in the training database “CLIENTES_fecha” (the activity definition is given
below). The task is to determine which of the customers active in December 2014 will
become inactive by March 31, 2015. For this purpose, we will provide the number of active
customers at the close of December 2014 (N0) and the number of these active customers
that have stopped being active at the close of March 2015 (N1).
Each participating team must submit N1 customer IDs corresponding to the N1
customers it predicts will have become inactive by March 31, 2015.
NOTE: N1 includes not only the customers that go from active to inactive but also those
customers that have left the bank.
Example of a sample result:
If the number of customers that went from active on December 31, 2014 to
inactive on March 31, 2015 was N1 = 3, a possible result would be:
99999
11897
88854
These 3 numbers correspond to the customer IDs that the participating team
predicts will become inactive.
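Because the submission must contain exactly N1 IDs, one simple way to produce it is to rank customers by a model score and keep the top N1. The sketch below assumes a model has already produced, for each customer active in December 2014, an estimated probability of becoming inactive; the `scores` dictionary and the `select_top_n1` helper are hypothetical and not part of the competition material.

```python
def select_top_n1(scores: dict[str, float], n1: int) -> list[str]:
    """Return exactly n1 customer IDs with the highest predicted probability of becoming inactive.

    scores: hypothetical mapping of customer ID -> model probability of inactivity.
    """
    if n1 > len(scores):
        raise ValueError("N1 cannot exceed the number of candidate customers")
    # Highest score first; ties are broken by customer ID so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [customer_id for customer_id, _ in ranked[:n1]]


# Illustrative scores only (not real data).
scores = {"99999": 0.91, "11897": 0.85, "88854": 0.77, "12345": 0.10, "54321": 0.05}
print(select_top_n1(scores, 3))  # ['99999', '11897', '88854']
```

Customer IDs are kept as strings so that IDs with leading zeros (such as 05654 in the Task 2 example) are not altered.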
NOTE: New customers acquired during the first quarter of 2015 that were active and
then became inactive will not be part of the result. Likewise, customers already
inactive on December 31, 2014 will not be part of the calculation.
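The population rules above (active on December 31, 2014; inactive or gone on March 31, 2015; new and already-inactive customers excluded) can be expressed as a small labelling function. This is a sketch assuming that activity flags for each close have already been computed with the definition below; the dictionaries and the `task1_targets` name are hypothetical, and a customer missing from the later snapshot is assumed to have left the bank.

```python
def task1_targets(active_start: dict[str, bool], active_end: dict[str, bool]) -> set[str]:
    """IDs of customers active at the start close who are inactive or absent at the end close.

    active_start / active_end: hypothetical maps of customer ID -> activity flag
    (e.g. the December 2014 and March 2015 closes).
    """
    targets = set()
    # Iterating over the start snapshot only means customers who joined later
    # (e.g. new customers in Q1 2015) can never be selected.
    for customer_id, was_active in active_start.items():
        if not was_active:
            continue  # already inactive at the start close: excluded from the calculation
        # Missing from the end snapshot is treated as having left the bank (assumption).
        if not active_end.get(customer_id, False):
            targets.add(customer_id)
    return targets


dec_2014 = {"A": True, "B": True, "C": False, "D": True}
mar_2015 = {"A": True, "B": False, "C": False, "E": False}  # D left the bank; E is new
print(sorted(task1_targets(dec_2014, mar_2015)))  # ['B', 'D']
```

Applied to earlier pairs of closes in the training data (for example, two closes three months apart), the same function could be used to build labelled examples for a model; whether that is appropriate is a methodological choice for each team.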
Definition of an active customer for this exercise:
A customer is active if they either:
• performed at least 3 transactions with the account in the last 90 days (variable
MOV90_CTA with a value greater than or equal to 3), or
• have an average volume over the last 6 months greater than or equal to a predetermined
amount, which depends on the value of the customer's CUADRANTE variable.
The CUADRANTE reference amounts are the following:
If the CUADRANTE is S1, S2, S3, A1, A2 or A3, the reference amount is 143.
If the CUADRANTE is B1, B2 or B3, it is 36.
If the CUADRANTE is C1, C2, C3, D1, D2 or D3, it is 7.
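The activity rule can be sketched as a single function. This assumes MOV90_CTA and the 6-month average volume (computed as described next) are already available for the customer at a given close; the function and dictionary names are hypothetical. The rules do not say how CUADRANTE values outside the listed ones should be handled, so the sketch simply raises an error for them.

```python
# Reference amounts by CUADRANTE value, as listed in the definition above.
REFERENCE_AMOUNT = {
    **dict.fromkeys(["S1", "S2", "S3", "A1", "A2", "A3"], 143),
    **dict.fromkeys(["B1", "B2", "B3"], 36),
    **dict.fromkeys(["C1", "C2", "C3", "D1", "D2", "D3"], 7),
}


def is_active(mov90_cta: int, avg_volume_6m: float, cuadrante: str) -> bool:
    """Active if MOV90_CTA >= 3, or if the 6-month average volume reaches the CUADRANTE amount."""
    if mov90_cta >= 3:
        return True
    if cuadrante not in REFERENCE_AMOUNT:
        # Unlisted CUADRANTE values are not covered by the rules; fail loudly (assumption).
        raise ValueError(f"unknown CUADRANTE value: {cuadrante!r}")
    return avg_volume_6m >= REFERENCE_AMOUNT[cuadrante]


print(is_active(3, 0.0, "B1"))    # True: at least 3 transactions in the last 90 days
print(is_active(1, 100.0, "A2"))  # False: 100 is below the A2 amount of 143
print(is_active(1, 40.0, "B2"))   # True: 40 reaches the B2 amount of 36
```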
How do you calculate the average volume for the last 6 months?
It is calculated by summing the amounts and balances of the following
variables:
SALDO_ACTENOPAQ
SALDO_ACTEPAQ
SALDO_ACCD
SALDO_ACAHPAQ
SALDO_ACAHNOPAQ
SALDO_ACAD
PP_MONTO
PR_MONTO
PH_MONTO
PC_MONTO
SALDO_ADPF
SALDO_APFE
SALDO_FONDO_MONEDA1
SALDO_FONDO_MONEDA2
SALDO_ATIT
These amounts and balances are taken for the current month and the five
preceding months. The average volume is obtained by summing these variables
over the 6 months and dividing the total by the minimum of the value of the
ANTIGÜEDAD variable and 6.
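In formula form, for a close month t and the set V of the 15 variables listed above, the average volume described here is

$$\bar{V}_t = \frac{\sum_{k=0}^{5} \sum_{v \in V} v_{t-k}}{\min(\text{ANTIGÜEDAD}, 6)}$$

A minimal Python sketch follows. It assumes each month's values are available as a dictionary keyed by variable name, that ANTIGÜEDAD is expressed in months (the rules do not state its unit), and that a missing variable or month counts as 0; these are assumptions, as is the helper name `average_volume_6m`.

```python
# The 15 amount and balance variables listed in the definition above.
VOLUME_VARIABLES = [
    "SALDO_ACTENOPAQ", "SALDO_ACTEPAQ", "SALDO_ACCD", "SALDO_ACAHPAQ",
    "SALDO_ACAHNOPAQ", "SALDO_ACAD", "PP_MONTO", "PR_MONTO", "PH_MONTO",
    "PC_MONTO", "SALDO_ADPF", "SALDO_APFE", "SALDO_FONDO_MONEDA1",
    "SALDO_FONDO_MONEDA2", "SALDO_ATIT",
]


def average_volume_6m(monthly_rows: list[dict[str, float]], antiguedad: int) -> float:
    """Sum the volume variables over up to 6 monthly closes and divide by min(ANTIGÜEDAD, 6).

    monthly_rows: one dict per close (current month and the five preceding ones);
    variables absent from a row count as 0 (assumption).
    """
    if len(monthly_rows) > 6:
        raise ValueError("expected at most 6 monthly closes")
    divisor = min(antiguedad, 6)
    if divisor <= 0:
        # The rules do not cover ANTIGÜEDAD <= 0; this sketch refuses to divide.
        raise ValueError("ANTIGÜEDAD must be positive")
    total = sum(row.get(name, 0.0) for row in monthly_rows for name in VOLUME_VARIABLES)
    return total / divisor


# A customer with 12 months of seniority and 6 identical closes.
rows = [{"SALDO_ACCD": 10.0, "PP_MONTO": 2.0}] * 6
print(average_volume_6m(rows, antiguedad=12))  # 12.0
```

Dividing by min(ANTIGÜEDAD, 6) rather than always by 6 presumably keeps customers with fewer than six months of history from being penalized for months in which they were not yet customers.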
NOTE: The FECHA variable takes the values 01/01/2013, 01/02/2013, and so on, in
DD/MM/YYYY format (D: day, M: month, Y: year), indicating the month and year of
the monthly close.
For example, when we work with records that have FECHA = 01/12/2014, the
values correspond to the December 2014 close, and the MOV90_CTA
variable counts the transactions from December 2014, November 2014 and
October 2014. The amounts and balances correspond to the close of
that same month.
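FECHA values such as 01/12/2014 can be parsed with the standard library. The sketch below, with hypothetical helper names, also lists the three monthly closes that MOV90_CTA covers for a given close, matching the December/November/October 2014 example above; the "YYYY-MM" output format is an arbitrary choice for illustration.

```python
from datetime import date, datetime


def parse_fecha(fecha: str) -> date:
    """Parse a FECHA value in DD/MM/YYYY format, e.g. '01/12/2014' -> the December 2014 close."""
    return datetime.strptime(fecha, "%d/%m/%Y").date()


def mov90_months(fecha: str) -> list[str]:
    """Months covered by MOV90_CTA for a close: the close month and the two months before it."""
    close = parse_fecha(fecha)
    year, month = close.year, close.month
    months = []
    for _ in range(3):
        months.append(f"{year:04d}-{month:02d}")
        month -= 1
        if month == 0:  # step back across a year boundary
            year, month = year - 1, 12
    return months


print(mov90_months("01/12/2014"))  # ['2014-12', '2014-11', '2014-10']
print(mov90_months("01/02/2014"))  # ['2014-02', '2014-01', '2013-12']
```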
Task 2: Determine the customer satisfaction level
Another way to evaluate customer satisfaction is to conduct surveys for this
purpose.
The satisfaction level of the customers surveyed during the training
period is given in the “SATISFACCION_fecha” database by the NIVEL_SATISFACCION
variable, which takes the values 0, 1 or 2 (0: other, 1: satisfied, 2: very satisfied); this
information will be provided at the beginning of the competition. The task consists of
determining the satisfaction level of a group of customers who were surveyed during the
first quarter of 2015. Before the final round, we will provide a list of the customer IDs
of those surveyed during Q1 2015 (N2 customers), together with the number of customers
who responded 0 (N20), the number who responded 1 (N21) and the number who
responded 2 (N22). The participating teams will have to submit the same N2 customer IDs
with a predicted satisfaction level. The number of customers assigned to each satisfaction
level must match the N20, N21 and N22 provided.
Example of a sample result:
If N2 = 10 customers were surveyed during the first quarter of 2015, with
N20 = 4, N21 = 4 and N22 = 2, a possible result would be:
12399;0
14697;1
34554;0
67893;2
15557;1
05654;0
99993;2
08356;1
83747;0
11109;1
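Since the submitted levels must reproduce N20, N21 and N22 exactly, a model's per-class probabilities have to be turned into labels under those counts. One simple option, sketched below, is a greedy assignment: take all (customer, level) pairs in order of decreasing probability and accept a pair if the customer is still unassigned and that level's quota is not used up. This is one possible heuristic, not a method prescribed by the competition, and it is not guaranteed to be optimal; an exact alternative would be to solve it as a minimum-cost assignment (transportation) problem. The `probs` input and the function name are hypothetical.

```python
def assign_levels(probs: dict[str, tuple[float, float, float]],
                  quotas: tuple[int, int, int]) -> dict[str, int]:
    """Greedily assign levels 0/1/2 so that the level counts equal quotas (N20, N21, N22)."""
    if sum(quotas) != len(probs):
        raise ValueError("N20 + N21 + N22 must equal N2")
    remaining = list(quotas)
    # All (probability, customer, level) candidates, most confident first;
    # ties are broken by customer ID and level for a deterministic result.
    candidates = sorted(
        ((p, cid, level) for cid, ps in probs.items() for level, p in enumerate(ps)),
        key=lambda c: (-c[0], c[1], c[2]),
    )
    assignment: dict[str, int] = {}
    for _, cid, level in candidates:
        if cid not in assignment and remaining[level] > 0:
            assignment[cid] = level
            remaining[level] -= 1
    # Because the quotas sum to N2, every customer ends up with exactly one level.
    return assignment


probs = {"a": (0.9, 0.05, 0.05), "b": (0.8, 0.1, 0.1),
         "c": (0.2, 0.7, 0.1), "d": (0.3, 0.3, 0.4)}
print(assign_levels(probs, (1, 2, 1)))  # {'a': 0, 'c': 1, 'd': 2, 'b': 1}
```

Note that customer "b" is most likely a 0 according to the model but receives level 1, because the single level-0 slot was already taken by a more confident prediction.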
NOTE: In the satisfaction data, the FECHA variable takes the values 1T2013,
2T2013, and so on, in alphanumeric (text string) format, and indicates the
quarter and year in which the survey was performed (T = quarter).
For example, FECHA = 1T2014 corresponds to the first quarter of 2014.
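The quarterly FECHA values can be parsed into a sortable (year, quarter) pair, which also makes it easy to relate a survey to the monthly closes of that quarter in the customer data. A sketch with hypothetical helper names; the regular expression assumes the exact pattern quarter digit, "T", four-digit year shown above.

```python
import re


def parse_quarter(fecha: str) -> tuple[int, int]:
    """Parse a satisfaction FECHA such as '1T2014' into a sortable (year, quarter) pair."""
    match = re.fullmatch(r"([1-4])T(\d{4})", fecha.strip())
    if match is None:
        raise ValueError(f"unexpected FECHA value: {fecha!r}")
    return int(match.group(2)), int(match.group(1))


def quarter_months(fecha: str) -> list[int]:
    """Calendar months (1-12) that make up the quarter, e.g. '1T2014' -> [1, 2, 3]."""
    _, quarter = parse_quarter(fecha)
    first = 3 * (quarter - 1) + 1
    return [first, first + 1, first + 2]


print(parse_quarter("1T2014"))   # (2014, 1)
print(quarter_months("3T2013"))  # [7, 8, 9]
```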
NOTE: For both tasks, a description of the methodology is required.
Please consult section 4. Results Preparation for more details.
2. Selection of winning teams.
2.1. Selection of the winning team who will be sent to the final round
Only one team will be chosen per country: the team with the best score for
solving the problem in the local round. In the case of a tie, the winning team will be
the one that submitted its results first.
Task 1 Evaluation: The competition organizers will provide N1, which corresponds to
the number of customers that went from active on December 31, 2014 to inactive on
March 31, 2015. Each participating team must submit a list of exactly N1 customer
identifiers (customer IDs), and the winning team will be the one that correctly identifies
the most customers who went from active to inactive, that is, the team with the lowest
margin of error. The winning team will have the highest value of:
๐‘º ๐‘œ๐‘“ ๐‘๐‘ข๐‘ ๐‘ก๐‘œ๐‘š๐‘’๐‘Ÿ๐‘  ๐‘กโ„Ž๐‘Ž๐‘ก ๐‘Ž๐‘Ÿ๐‘’ ๐‘›๐‘œ ๐‘™๐‘œ๐‘›๐‘”๐‘’๐‘Ÿ ๐‘Ž๐‘๐‘ก๐‘–๐‘ฃ๐‘’ ๐‘Ž๐‘›๐‘‘ ๐‘Ž๐‘Ÿ๐‘’ ๐‘ก๐‘Ÿ๐‘ข๐‘™๐‘ฆ ๐‘–๐‘›๐‘Ž๐‘๐‘ก๐‘–๐‘ฃ๐‘’
๐‘1
After the winners of the local round have been judged, the correct answer will be published on
the web page.
2.2. Selection of the winning team from the final round
The winning team from each country will go to the final round, which will be held in Spain.
During this round, the winning teams must solve the “Task 2” challenge.
Task 2 Evaluation: Before the start of the final round, the competition organizers will provide
a list of N2 customer identifiers (customer IDs) corresponding to the customers who were
surveyed during the first quarter of 2015, together with the number of customers who
responded 0 (N20), the number who responded 1 (N21) and the number who
responded 2 (N22). The participating teams must submit the same N2
customer IDs with the predicted satisfaction levels, and the number of
customers at each satisfaction level must match the N20, N21 and N22
provided by the organizers.
The winning team will be the one that correctly predicts the satisfaction level for the most
customer IDs, that is, the team with the lowest margin of error across the 3 satisfaction
levels. The winning team will have the highest value of:
๐‘º ๐‘œ๐‘“ ๐‘๐‘ข๐‘ ๐‘ก๐‘œ๐‘š๐‘’๐‘Ÿ๐‘  ๐‘ค๐‘–๐‘กโ„Ž ๐‘กโ„Ž๐‘’ ๐‘๐‘Ÿ๐‘’๐‘‘๐‘–๐‘๐‘ก๐‘’๐‘‘ ๐‘ ๐‘Ž๐‘ก๐‘–๐‘ ๐‘“๐‘Ž๐‘๐‘ก๐‘–๐‘œ๐‘› ๐‘™๐‘’๐‘ฃ๐‘’๐‘™ , ๐‘ ๐‘Ž๐‘š๐‘’ ๐‘Ž๐‘  ๐‘กโ„Ž๐‘’๐‘–๐‘Ÿ ๐‘Ž๐‘๐‘ก๐‘ข๐‘Ž๐‘™ ๐‘Ÿ๐‘’๐‘ ๐‘๐‘œ๐‘›๐‘ ๐‘’
๐‘2
The training data necessary to solve this problem will be provided at the beginning of the
competition.
Final evaluation: the winning team of the competition will be the one with the best
combined score on both tasks and on the methodology employed, weighted as follows: 40%
for task 1, 40% for task 2 and 20% for the methodology employed in both
tasks.
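The final weighting is 0.4 × (task 1 score) + 0.4 × (task 2 score) + 0.2 × (methodology score). The rules do not say how each component is normalized before weighting, so the sketch below simply assumes all three are already on a common 0-1 scale; the function and constant names are hypothetical.

```python
# Weights stated in the final evaluation rule.
WEIGHTS = {"task1": 0.40, "task2": 0.40, "methodology": 0.20}


def final_score(task1: float, task2: float, methodology: float) -> float:
    """Weighted final score, assuming all components are on the same 0-1 scale."""
    return (WEIGHTS["task1"] * task1
            + WEIGHTS["task2"] * task2
            + WEIGHTS["methodology"] * methodology)


print(round(final_score(task1=0.5, task2=0.75, methodology=1.0), 4))  # 0.7
```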
3. Obtaining the training data and compiling the predictive result.
The necessary datasets will be available for download on the web page.
After November 1:
• Training database (txt format, with fields delimited by the “;” symbol):
o CLIENTES_fecha
o SATISFACCION_fecha
The description of each field of each file in the training database can
be found in the DICCIONARIO_DATOS_CLIENTE.xlsx and
DICCIONARIO_DATOS_SATISFACCION.xlsx files, which are available on the website.
The format of the fields can be found in the following files (on the website):
FORMATO_VARIABLES_CLIENTE.xlsx
FORMATO_VARIABLES_SATISFACCION.xlsx
• N0: 1,847,801 active customers as of December 31, 2014
• N1: 99,044 customers who have become inactive as of March 31, 2015
Note that the customers active as of December 31, 2014 (N0), and among them those who
are going to become inactive (the customers of the Task 1 exercise), can be obtained from
the training database.
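A minimal reading sketch using Python's standard `csv` module, assuming each file has a header row and UTF-8 encoding (neither is stated here; the data dictionaries and format files are the authority). The `ID` column name in the sample is hypothetical. For the full files, streaming rows from `csv.DictReader` one at a time, or using a dataframe library, may be preferable to building a list in memory.

```python
import csv
import io
from typing import Iterable


def read_rows(lines: Iterable[str]) -> list[dict[str, str]]:
    """Parse ';'-delimited text whose first line is a header into one dict per record.

    Values are kept as strings so that customer IDs keep any leading zeros.
    """
    return list(csv.DictReader(lines, delimiter=";"))


def read_training_file(path: str, encoding: str = "utf-8") -> list[dict[str, str]]:
    """Read a training file from disk; the encoding is an assumption to adjust if needed."""
    with open(path, newline="", encoding=encoding) as handle:
        return read_rows(handle)


# Illustrative content with a hypothetical ID column (see the data dictionaries for real names).
sample = io.StringIO("ID;MOV90_CTA;CUADRANTE\n05654;3;B1\n11109;0;A2\n")
rows = read_rows(sample)
print(rows[0]["ID"], rows[1]["CUADRANTE"])  # 05654 A2
```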
Before the final round:
• SATISFACCION_2015.txt: file containing the IDs of the customers surveyed during the
first quarter of 2015, whose satisfaction level must be predicted (these are the N2
customers).
• N20: number of customers that responded 0.
• N21: number of customers that responded 1.
• N22: number of customers that responded 2.
4. Results Preparation.
4.1. Results Preparation for the first local round.
Each team must submit one file with the results in txt format (results can be
resubmitted as many times as needed before the cutoff date). Only the last version
submitted before the cutoff date will be reviewed. In the case of a tie, the submission
date will be taken into account.
The file must be named with the following format: Id-Team-Task1.TXT, where Id-Team is the
identifier of the participating team. The file must have N1 lines separated by line breaks,
and each line must contain one customer identifier (customer ID).
Example for N1 = 3:
99999
11897
88854
The specified file name format must be followed. Results that do not follow the
proper format may be rejected.
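A sketch of producing the Task 1 file in the format above, with hypothetical helper names. It checks that exactly N1 distinct IDs are written, one per line; ending the file with a trailing newline and using UTF-8 are assumptions, since the rules do not specify either.

```python
from pathlib import Path


def task1_filename(team_id: str) -> str:
    """File name following the Id-Team-Task1.TXT convention (team_id is the team identifier)."""
    return f"{team_id}-Task1.TXT"


def format_task1(customer_ids: list[str], n1: int) -> str:
    """One customer ID per line; refuses anything other than exactly n1 distinct IDs."""
    if len(customer_ids) != n1 or len(set(customer_ids)) != n1:
        raise ValueError(f"expected exactly {n1} distinct customer IDs")
    return "".join(f"{customer_id}\n" for customer_id in customer_ids)


def write_task1(team_id: str, customer_ids: list[str], n1: int, directory: str = ".") -> Path:
    """Write the results file into directory and return its path."""
    path = Path(directory) / task1_filename(team_id)
    path.write_text(format_task1(customer_ids, n1), encoding="utf-8")
    return path


print(format_task1(["99999", "11897", "88854"], 3), end="")
```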
In addition to the results file, each team must submit one PDF file (“Id-Team-Metodology1.PDF”)
explaining the methodology developed for task 1. Please see section 5. Methodology Annex detail.
4.2. Results Preparation for the final round.
Each team must submit one file with the results in txt format (results can be
resubmitted as many times as needed before the cutoff date). Only the last version
submitted before the cutoff date will be reviewed. In the case of a tie, the submission
date will be taken into account.
The file must be named with the following format: Id-Team-Task2.TXT, where Id-Team is the
identifier of the participating team. The file must have N2 lines separated by line breaks,
and each line must contain a customer identifier (customer ID) and an estimate of the
customer satisfaction level, separated by a semicolon.
Example: for N2 = 10, N20 = 4, N21 = 4 and N22 = 2, the file would have the
following format:
12399;0
14697;1
34554;0
67893;2
15557;1
05654;0
99993;2
08356;1
83747;0
11109;1
The specified file name format must be followed. Results that do not follow the
proper format may be rejected.
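A sketch of producing the Task 2 file, with hypothetical helper names. It checks that the level counts match N20, N21 and N22 before writing "customer_id;level" lines; a further check worth adding is that the IDs are exactly those in SATISFACCION_2015.txt. The trailing newline is an assumption.

```python
from collections import Counter


def task2_filename(team_id: str) -> str:
    """File name following the Id-Team-Task2.TXT convention."""
    return f"{team_id}-Task2.TXT"


def format_task2(assignment: dict[str, int], n20: int, n21: int, n22: int) -> str:
    """Render 'customer_id;level' lines after checking the level counts against N20, N21, N22."""
    counts = Counter(assignment.values())
    if set(counts) - {0, 1, 2}:
        raise ValueError("satisfaction levels must be 0, 1 or 2")
    if (counts[0], counts[1], counts[2]) != (n20, n21, n22):
        raise ValueError(
            f"counts {counts[0]}/{counts[1]}/{counts[2]} do not match "
            f"N20/N21/N22 = {n20}/{n21}/{n22}"
        )
    return "".join(f"{customer_id};{level}\n" for customer_id, level in assignment.items())


# The example result from the rules (N2 = 10, N20 = 4, N21 = 4, N22 = 2).
example = {"12399": 0, "14697": 1, "34554": 0, "67893": 2, "15557": 1,
           "05654": 0, "99993": 2, "08356": 1, "83747": 0, "11109": 1}
print(format_task2(example, n20=4, n21=4, n22=2), end="")
```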
In addition to the results file, each team must submit one PDF file (“Id-Team-Metodology2.PDF”)
explaining the methodology developed for task 2. Please see section 5. Methodology Annex detail.
5. Methodology Annex detail.
An outline to explain the methodology for tasks 1 and 2:
• Procedure for generating the input table for modelling.
• The stages of the modelling process. A possible outline
could be:
o Sampling
o Exploration
o Modification: variable selection, optimal discretization of
variables, creation and/or transformation of variables,
treatment of missing values, treatment of outliers
o Modelling
o Validation and comparison of models
• Final model
• Procedure for selecting the N1 customers and procedure for
assigning N20, N21 and N22
• Software used to solve the problem
• Final generation of the results with the software
6. Calendar.
• November 1:
o Beginning of the sign-up period
o Training data (CLIENTES_fecha, SATISFACCION_fecha)
o Format and description of the fields of each of the files in the training
data
o N0: Number of active customers at the close of December 2014
o N1: Number of customers that have become inactive by March 31,
2015
Teams that sign up at a later date will be provided with the data at
that time.
• December 15, 2015:
o Close of the registration period.
• January 31, 2016:
o Final date to submit the task 1 results.
• February 3-15, 2016:
o Publication of the winners of the local first round
o Publication of the correct answer to task 1
• February 15, 2016:
o For task 2: list of the customers surveyed in the first quarter of 2015,
whose satisfaction level must be predicted
o N2: the number of customers surveyed, along with the number of customers
that responded 0 (N20), the number of customers that responded 1 (N21)
and the number of customers that responded 2 (N22)
• March 10, 2016:
o Final date to submit the task 2 results.
• March 2016:
o Final event