NEO CHALLENGE COMPETITION DETAILS

The Neo Challenge competition focuses on the prediction of customer satisfaction.

1. Description of the tasks

The tasks this year use the “CLIENTES_fecha” and “SATISFACCION_fecha” files in the training database. The data covers October 2012 to December 2014 for “CLIENTES_fecha” and January 2013 to December for “SATISFACCION_fecha”, and reflects customer behavior during those periods.

The competition consists of two rounds, a local round and a global (final) round:
• Local round: an internal competition within each country, in which teams attempt to solve a problem labeled “Task 1” (the problem is the same in all six participating countries). One winner from each country will go on to compete in the final round.
• Final round: the winning teams from the local rounds will be given a new challenge labeled “Task 2”.

NOTE: For both tasks, teams are required to submit the methodology they followed.

Task 1: Determine which active customers will become inactive

One of the indicators used to measure customer satisfaction is the customer activity level. Whether a customer is active or inactive is determined from the variables listed at the end of this task description, which are contained in the training database “CLIENTES_fecha” (the activity definition is given below). The task is to determine which of the customers active in December 2014 will become inactive by March 31, 2015. For this purpose we will provide the number of active customers at the close of December 2014 (N0) and the number of those customers that are no longer active at the close of March 2015 (N1). Each participating team must submit N1 customer IDs, corresponding to the N1 customers it predicts will have become inactive by March 31, 2015.
NOTE: N1 includes not only the customers that go from active to inactive but also those customers that have left the bank.

Example of a sample result: if the number of customers that went from active on December 31, 2014 to inactive on March 31, 2015 was N1=3, a possible result would be:

99999 11897 88854

These 3 numbers are the customer IDs that the participating team predicts will become inactive.

NOTE: New customers who joined during the first quarter of 2015, were active and then became inactive are not part of the result. Likewise, customers who were already inactive on December 31, 2014 are not part of the calculation.

Definition of an active customer for this exercise: the customer has performed at least 3 transactions with the account in the last 90 days (variable MOV90_CTA with a value greater than or equal to 3), or has an average volume over the last 6 months greater than or equal to a predetermined amount that depends on the value of the CUADRANTE variable assigned to the customer.

The CUADRANTE reference amounts are the following:
• CUADRANTE S1, S2, S3, A1, A2 or A3: 143
• CUADRANTE B1, B2 or B3: 36
• CUADRANTE C1, C2, C3, D1, D2 or D3: 7

How is the average volume for the last 6 months calculated? The amounts and balances of the following variables are summed for the current month and the five preceding months:

SALDO_ACTENOPAQ, SALDO_ACTEPAQ, SALDO_ACCD, SALDO_ACAHPAQ, SALDO_ACAHNOPAQ, SALDO_ACAD, PP_MONTO, PR_MONTO, PH_MONTO, PC_MONTO, SALDO_ADPF, SALDO_APFE, SALDO_FONDO_MONEDA1, SALDO_FONDO_MONEDA2, SALDO_ATIT

The average volume is then obtained by dividing this six-month sum by the minimum of the value of the ANTIGÜEDAD variable and 6.

NOTE: For the customer data, the FECHA variable takes the values 01/01/2013, 01/02/2013 and so on, in DD/MM/YYYY format (D: day, M: month, Y: year), indicating the month and year of the close.
For example, for records with FECHA = 01/12/2014 the values correspond to the December 2014 close, and the MOV90_CTA variable counts the transactions from December 2014, November 2014 and October 2014. The amounts and balances correspond to the close of that same month.

Task 2: Determine the customer satisfaction level

Another way to evaluate customer satisfaction is to conduct surveys for that purpose. The satisfaction level of the customers surveyed during the training period is in the “SATISFACCION_fecha” database, in the NIVEL_SATISFACCION variable, which takes the values 0, 1 or 2 (0: other, 1: satisfied, 2: very satisfied). This information will be provided at the beginning of the competition.

The task consists in determining the customer satisfaction level of a group of customers that were surveyed during the first quarter of 2015. Before the final round we will provide the final list of customer IDs of those surveyed during Q1 2015 (N2 customers), together with the number of customers who responded 0 (N20), the number who responded 1 (N21) and the number who responded 2 (N22). The participating teams must send back the same N2 customer IDs, each with its predicted satisfaction level. The number of customers assigned to each satisfaction level must match the N20, N21 and N22 provided at the beginning.

Example of a sample result: if the number of customers surveyed during the first quarter of 2015 was N2=10, with N20 = 4, N21 = 4 and N22 = 2, a possible result would be:

12399;0 14697;1 34554;0 67893;2 15557;1 05654;0 99993;2 08356;1 83747;0 11109;1

NOTE: For the satisfaction data, the FECHA variable takes the values 1T2013, 2T2013… in alphanumeric format (text string) and indicates the quarter and year in which the survey was conducted (T = quarter). For example, FECHA = 1T2014 means the values correspond to the first quarter of 2014.
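The quota constraint on a Task 2 result (same N2 customer IDs, with level counts equal to N20, N21 and N22) can be checked mechanically before submitting. The following is a minimal Python sketch and not part of the official rules; the function name `check_task2_result` and the in-memory representation of the result are assumptions made for illustration. It is run here on the sample result from the example above.

```python
from collections import Counter


def check_task2_result(lines, surveyed_ids, quotas):
    """Return a list of problems found in a Task 2 result (empty if none).

    lines        -- iterable of "customer_id;level" strings
    surveyed_ids -- set of the N2 customer IDs supplied by the organizers
    quotas       -- dict mapping each level (0, 1, 2) to N20, N21, N22
    """
    problems = []
    levels_by_id = {}
    for line in lines:
        # IDs are kept as strings so that leading zeros (e.g. 05654) survive.
        customer_id, _, level = line.strip().partition(";")
        if level not in {"0", "1", "2"}:
            problems.append(f"invalid level in line: {line!r}")
            continue
        if customer_id in levels_by_id:
            problems.append(f"duplicate customer ID: {customer_id}")
            continue
        levels_by_id[customer_id] = int(level)

    if set(levels_by_id) != set(surveyed_ids):
        problems.append("customer IDs do not match the surveyed list")

    counts = Counter(levels_by_id.values())
    for level, expected in quotas.items():
        actual = counts.get(level, 0)
        if actual != expected:
            problems.append(f"level {level}: expected {expected}, got {actual}")
    return problems


# Sample result from the example above: N2=10, N20=4, N21=4, N22=2.
sample = ["12399;0", "14697;1", "34554;0", "67893;2", "15557;1",
          "05654;0", "99993;2", "08356;1", "83747;0", "11109;1"]
surveyed = {line.split(";")[0] for line in sample}
print(check_task2_result(sample, surveyed, {0: 4, 1: 4, 2: 2}))  # prints []
```

An empty list means the result uses exactly the surveyed IDs and respects the three quotas; it says nothing about whether the predicted levels are correct.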
NOTE: For both tasks, the methodology followed must be submitted. Please consult section 4. Results Preparation for more details.

2. Selection of winning teams.

2.1. Selection of the winning team who will be sent to the final round

Only one team will be chosen per country: the team with the best score for solving the problem in the local round. In the case of a tie, the winning team will be the one that submitted its results first.

Task 1 Evaluation: The competition organizers will provide N1, the number of customers that went from active on December 31, 2014 to inactive on March 31, 2015. Each participating team must submit a list with exactly N1 customer identifiers (customer IDs), and the winning team will be the one whose list contains the most customer IDs of customers who actually went from active to inactive. In other words, the winning team has the lowest margin of error relative to the correct answer. The winning team will have the best value of the expression:

\frac{\sum \text{No. customers that are no longer active and are truly inactive}}{N_1}

After the winners of the local round have been judged, the correct answer will be available on the web page.

2.2. Selection of the winning team from the final round

The winning team of each country will go to the final round, which will take place in Spain. During this round the winning teams must solve the “Task 2” challenge.

Task 2 Evaluation: Before the start of the final round, the competition organizers will provide a list of N2 customer identifiers (customer IDs) corresponding to the customers who were surveyed during the first quarter of 2015, together with the number of customers who responded 0 (N20), the number who responded 1 (N21) and the number who responded 2 (N22). The participating teams have to submit the same N2 customer IDs with the predicted customer satisfaction levels, and the number of customers at each satisfaction level must match the N20, N21 and N22 provided at the beginning of the competition.
The winning team will be the one that has predicted the correct satisfaction level for the most customer IDs. In other words, the winning team will have the lowest margin of error relative to the correct answer across the 3 levels of satisfaction. The winning team will have the best value of the expression:

\frac{\sum \text{No. customers with the predicted satisfaction level same as their actual response}}{N_2}

The training data necessary to solve this problem will be provided at the beginning of the competition.

Final evaluation: the winning team of the competition will be the one with the best combined score on both tasks and the methodology employed, weighted as follows: 40% for Task 1, 40% for Task 2 and 20% for the methodology employed in both tasks.

3. Obtaining the training data and compiling the predictive result.

The necessary data sets will be available for download on the web page.

After November 1:
• Training database (txt format, with fields delimited by the “;” symbol):
  o CLIENTES_fecha
  o SATISFACCION_fecha
  The description of each field of each file in the training database can be found in the DICCIONARIO_DATOS_CLIENTE.xlsx and DICCIONARIO_DATOS_SATISFACCION.xlsx files, which are available on the website. The format of the fields can be found in the following files (on the website): FORMATO_VARIABLES_CLIENTE.xlsx and FORMATO_VARIABLES_SATISFACCION.xlsx.
• N0: 1,847,801 active customers as of December 31, 2014
• N1: 99,044 customers who have become inactive as of March 31, 2015

Note that the customers active as of December 31, 2014 (N0), and among them those who are going to become inactive (the customers sought in Task 1), are to be identified from the training database.
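Identifying the N0 active customers in the training database means applying the activity definition from section 1 to each customer's December 2014 record. The following is a minimal Python sketch of that rule, not an official implementation. Several points are assumptions made for illustration: each customer's history is passed as a list of monthly records (dicts keyed by variable name, newest month last), ANTIGÜEDAD is taken from the newest record and read as a number of months, missing amounts count as 0, and the divisor is kept at 1 or more to avoid division by zero. The function names are hypothetical.

```python
# Variables summed for the six-month volume (from the activity definition).
VOLUME_FIELDS = [
    "SALDO_ACTENOPAQ", "SALDO_ACTEPAQ", "SALDO_ACCD", "SALDO_ACAHPAQ",
    "SALDO_ACAHNOPAQ", "SALDO_ACAD", "PP_MONTO", "PR_MONTO", "PH_MONTO",
    "PC_MONTO", "SALDO_ADPF", "SALDO_APFE", "SALDO_FONDO_MONEDA1",
    "SALDO_FONDO_MONEDA2", "SALDO_ATIT",
]

# Reference amount per CUADRANTE value (from the activity definition).
REFERENCE_AMOUNT = {
    **dict.fromkeys(["S1", "S2", "S3", "A1", "A2", "A3"], 143),
    **dict.fromkeys(["B1", "B2", "B3"], 36),
    **dict.fromkeys(["C1", "C2", "C3", "D1", "D2", "D3"], 7),
}


def average_volume(last_six_months):
    """Sum the volume variables over the given months, divide by min(ANTIGÜEDAD, 6).

    last_six_months -- list of up to six monthly records, newest last.
    """
    total = sum(
        float(month.get(field) or 0.0)  # assumption: missing amounts count as 0
        for month in last_six_months
        for field in VOLUME_FIELDS
    )
    tenure = last_six_months[-1]["ANTIGÜEDAD"]  # assumption: months, newest record
    divisor = max(1, min(tenure, 6))            # assumption: guard against 0
    return total / divisor


def is_active(last_six_months):
    """Apply the activity rule to the newest month in the list."""
    current = last_six_months[-1]
    if current["MOV90_CTA"] >= 3:
        return True
    return average_volume(last_six_months) >= REFERENCE_AMOUNT[current["CUADRANTE"]]


# Hypothetical customer: one transaction, CUADRANTE B2, balance of 40 every month.
history = [{"SALDO_ACAD": 40.0} for _ in range(5)]
history.append({"SALDO_ACAD": 40.0, "MOV90_CTA": 1, "CUADRANTE": "B2", "ANTIGÜEDAD": 10})
print(is_active(history))  # True: 240 / 6 = 40, which is >= 36
```

Whether the comparison months, the tenure unit and the treatment of missing values match the organizers' calculation should be confirmed against the data dictionary files before relying on the counts.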
Before the final round:
• SATISFACCION_2015.txt: file containing the customer IDs of the customers surveyed during the first quarter of 2015, whose satisfaction level must be predicted (note that these are the N2 customers).
• N20: Number of customers that responded 0.
• N21: Number of customers that responded 1.
• N22: Number of customers that responded 2.

4. Results Preparation.

4.1. Results Preparation for the first local round.

Each team must present one file with the results in txt format (results may be submitted as many times as needed before the cutoff date). Only the last version submitted before the cutoff date will be reviewed. In the case of a tie, the submission date will be taken into account.

The file has to be named with the following format: Id-Team-Task1.TXT, where IdTeam is the identifier of the participating team. The file must have N1 lines separated by a blank line, and each line must contain one customer identifier (customer ID). Example in the case N1=3:

99999 11897 88854

The specified name format for the results must be followed. Results that do not follow the proper format may be rejected.

In addition to the results file, one pdf file (“Id-TeamMetodology1.PDF”) must be submitted explaining the methodology developed for Task 1. Please see section 5. Methodology Annex detail.

4.2. Results Preparation for the final round.

Each team must present one file with the results in txt format (results may be submitted as many times as needed before the cutoff date). Only the last version submitted before the cutoff date will be reviewed. In the case of a tie, the submission date will be taken into account.
The file has to be named with the following format: Id-Team-Task2.TXT, where IdTeam is the identifier of the participating team. The file must have N2 lines separated by a blank line, and each line must contain a customer identifier (customer ID) and an estimate of the customer satisfaction level (both values separated by a semicolon). Example: in the case N2=10, N20 = 4, N21 = 4 and N22 = 2, one file will be sent in the following format:

12399;0 14697;1 34554;0 67893;2 15557;1 05654;0 99993;2 08356;1 83747;0 11109;1

The specified name format for the results must be followed. Results that do not follow the proper format may be rejected.

In addition to the results file, one pdf file (“Id-TeamMetodology2.PDF”) must be submitted explaining the methodology developed for Task 2. Please see section 5. Methodology Annex detail.

5. Methodology Annex detail.

An outline to explain the methodology for Tasks 1 and 2:
• Procedure for generating the input table for the modelling.
• The stages followed in the modelling. A possible outline could be:
  o Sampling
  o Exploration
  o Modification: variable selection, optimal discretization of variables, creation and/or transformation of variables, treatment of missing values, treatment of outliers
  o Modelling
  o Validation and comparison of models
• Final model
• Procedure for the selection of the N1 customers and procedure for the assignment of N20, N21 and N22
• Software used to solve the problem
• Final generation of the results by the software

6. Calendar.

• November 1:
  o Beginning of the sign-up period
  o Training data (CLIENTES_fecha, SATISFACCION_fecha)
  o Format and description of the fields of each of the files in the training data
  o N0: Number of active customers at the close of December 2014
  o N1: Number of customers that have become inactive by March 31, 2015
  Teams that sign up at a later date will be provided the data at that time.
• December 15, 2015:
  o Close of the registration period.
• January 31, 2016:
  o Final date to submit the Task 1 results.
• February 3-15, 2016:
  o Publication of the winners of the local first round
  o Publication of the correct answer to Task 1
• February 15, 2016:
  o For Task 2, the list of customers that were surveyed in the first quarter of 2015, whose satisfaction level must be determined
  o N2: the number of customers surveyed, together with the number of customers that responded 0 (N20), the number that responded 1 (N21) and the number that responded 2 (N22)
• March 10, 2016:
  o Final date to submit the Task 2 results.
• March 2016:
  o Final event