JRC Data Validation tool for Data Calls collected under the EU Data Collection Framework DV Tool 4.15 User Manual JRC G.03 Data Collection Team EUR xxxxx EN The mission of the JRC-IPSC is to provide research results and to support EU policy-makers in their effort towards global security and towards protection of European citizens from accidents, deliberate attacks, fraud and illegal actions against EU policies. European Commission Joint Research Centre Institute for the Protection and Security of the Citizen Contact information Address: Via Fermi 2749, T.P. 051 – Ispra (VA), 21027 - Italy E-mail: datasubmission@jrc.ec.europa.eu Tel.: 0332 786479 Fax: 0332 789658 http://ipsc.jrc.ec.europa.eu/ http://www.jrc.ec.europa.eu/ Legal Notice Neither the European Commission nor any person acting on behalf of the Commission is responsible for the use which might be made of this publication. Europe Direct is a service to help you find answers to your questions about the European Union Freephone number (*): 00 800 6 7 8 9 10 11 (*) Certain mobile telephone operators do not allow access to 00 800 numbers or these calls may be billed. A great deal of additional information on the European Union is available on the Internet. It can be accessed through the Europa server http://europa.eu/ JRC [PUBSY request] EUR XXXXX LL ISBN X-XXXX-XXXX-X ISSN XXXX-XXXX doi:XXXXX Luxembourg: Publications Office of the European Union © European Union, 2015 Reproduction is authorised provided the source is acknowledged Printed in Italy DV Tool 4.16 User Manual TABLE OF CONTENTS 1. Introduction................................................................................................................................ 3 2. DV tool and Microsoft Office Excel versions ............................................................................... 4 3. How to use the DV tool features ................................................................................................. 4 3.1. 3.2. AT OPENING ................................................................................................................................................. 4 DV TOOL CONTROL PANEL............................................................................................................................. 5 3.2.1. Data Check area ........................................................................................................................... 6 3.2.1.1. 3.2.1.2. 3.2.1.3. 3.2.1.4. 3.2.1.5. 3.2.1.6. 3.2.1.7. 3.2.2. Step 1: add template................................................................................................................................ 6 Step 2: paste data into the template ........................................................................................................ 8 Step 3: check codification ........................................................................................................................ 9 Step 4: check duplication ....................................................................................................................... 10 Step 5: check Inter-column correspondences........................................................................................ 11 Step 6: check Inter-worksheets correspondences ................................................................................. 12 Step 7: export data ................................................................................................................................ 12 Support area................................................................................................................................ 14 4. Important Notes ....................................................................................................................... 15 5. Disclaimer of warranty ............................................................................................................. 15 1 DV Tool 4.16 User Manual LIST OF FIGURES Figure 3.1: The DV tool button in Excel’s Home ribbon. ......................................................................................... 4 Figure 3.2: The DV tool control panel. .................................................................................................................... 5 Figure 3.3: Buttons of the ‘support area’. ................................................................................................................ 6 Figure 3.4: Selection template window of the Fishing Effort Regime data call ....................................................... 7 Figure 3.5: Selection template window for the Fleet Economic data call. ............................................................... 7 Figure 3.6: Import of data from a file with comma separator as delimiter. .............................................................. 8 Figure 3.7: Prompt message during import process ............................................................................................... 9 Figure 3.8: Check Codification executed on the selected (active) worksheet. ....................................................... 9 Figure 3.10: Duplication check panel. ................................................................................................................... 10 Figure 3.11: Duplication check example: row 5 is a duplication of row 4 and row 12 is a duplication of row 11. ................................................................................................................................................ 10 Figure 3.12: Inter-columns check panel. ............................................................................................................... 11 Figure 3.13: Correspondence check example (Effort data call). ........................................................................... 11 Figure 3.12: Inter-worksheets check panel. .......................................................................................................... 12 Figure 3.14: Option to split the file to smaller size files. ........................................................................................ 13 Figure 3.15: Choose to split the data by year. ...................................................................................................... 13 Figure 3.16: Successful export message. ............................................................................................................. 13 Figure 3.17: Example of exported data in the created directory. .......................................................................... 14 2 DV Tool 4.16 User Manual 1. Introduction The Data Validation (DV) tool is a set of macros developed in Visual Basic for Applications (VBA) and embedded in a specifically designed Excel Workbook. The main purpose of this tool is to facilitate and support the Member States in uploading data to meet the requirements defined by DG MARE in official DCF data calls by STECF (Council Regulation 199/2008). The use of this tool is not mandatory. However, the data validations checks performed by the DV tool can greatly reduce the number of erroneous records contained in a file to be uploaded to the DCF web site, and hence facilitate the uploading procedure. The tool is capable of checking national data stored in Excel rows against certain codifications and rules as requested in the data call. Checks can be performed on partial or whole data sets to be uploaded to the DCF web site. The majority of the checks concern the use of valid codes as defined in the data call and the type of the data entered (numeric or text). Erroneous data are identified and can easily be corrected. The DV tool is design to check data that are provided in predefined worksheets depending on the type of data call. Each data call has a dedicated DV tool. The first version of the DV tool was developed to serve the needs of the Fishing Effort Regimes and the Mediterranean & Black Sea data calls. The DV tool was a set of excel files one for each set of data type requested by the data call. A second version was produced for the Fleet Economic data call in 2013. It contained improvements based on experience acquired with the previous data calls: • a single excel file containing all the data templates and with • the possibility to automatically update, directly from the database, the codes used by the DV tool for the validation of the data. It was realised that ideally the JRC should develop a DV tool that could serve different data calls while remaining easy to maintain. Therefore the original DV tool, developed by Nikolaos Mitrakis in 2011, has been re-engineered and re-designed from scratch to allow the flexibility and maintainability desired. However, the tool is still under development and any feedback from users that could help improve the tool according to their needs is most welcomed. Any changes in the data call imply coherent adaptation of the modules. The purpose of this manual is to provide the general concept surrounding the tool and guidelines for its use. Since the tool is integrated in Excel Workbooks, all Excel functions are still available. The tool is available for download from the Data Collection Framework web site: https://datacollection.jrc.ec.europa.eu 3 DV Tool 4.16 User Manual 2. DV tool and Microsoft Office Excel versions To benefit from the DV tool functionalities it is necessary to run it on a Windows operating system where Microsoft Excel software is installed. The DV tool can be used within Microsoft Excel 2003, 2007, 2010 and 2013; either 32bits or 64bits. 3. How to use the DV tool features The DV tool stands for Data Validation tool. This chapter explains the functionalities and the possible steps to be performed in order to create final excel data files that can be uploaded to the ‘Upload Facility’ available on the Data Collection web site. 3.1. At opening When opening the DV tool file with Microsoft Office 2007/2010 (see below for Microsoft Office 2003), a ‘Security Warning’ may appear with the message: “Macros have been disabled. Options…” This is a standard security option in all Microsoft’s applications, since macros can potentially have access to your data. In order to continue with the use of the tool, the user should choose: “OptionsEnable this content”. To change the security settings go to “DeveloperCodeMacro SecurityDisable” all macros with notification. However, we recommend that the default option not be changed (although, the DV tool is a trusted application, other macros can potentially cause several problems). Once the Macros are enabled the DV Tool ribbon is visible in the Home tab of the Excel application. Figure 3.1: The DV tool button in Excel’s Home ribbon. In order to use the DV tool with MS Office 2003, it is necessary to download and install (if not already installed) “Microsoft’s Office Compatibility Pack for Word, Excel and PowerPoint File Formats” which is available for free at: http://www.microsoft.com/en-us/download/details.aspx?id=3 Attention should be given to the restriction of a maximum number of 65,535 rows (the first row is always the header). However, this is the theoretical limit of rows in Excel 2003. In practise, and especially 4 DV Tool 4.16 User Manual when dealing with large files, large number of rows can cause significant computational time or even an Excel not responding status. Users are strongly advised to use Excel 2007 when dealing with more than 20,000 rows especially when dealing with the 'catch' data table and 'landings' data table. One other option is first to manually split the data set into smaller sizes (e.g. one sheet per year of data) and then insert or paste into the DV tool file. Apply the DV tool separately for each one of these smaller data sets. The environment is the same except for the option to recall and show the control panel from the Office ribbon since this is not available for pre-Office 2007 versions. In order to re-show the DV control panel, simply use this key combination: Ctrl + J or just save and re-open the template. To change the security settings for the Macros go to Tools: Macro Security Security Level Medium. 3.2. DV tool control panel The DV tool control panel is the first and the main window that pops-up when opening the Workbook (Figure 3.2). Here, users can access all the functions that the DV tool provides. Figure 3.2: The DV tool control panel. 5 DV Tool 4.16 User Manual The panel window consists of two main areas: (1) a ‘Data Checking’ area and (2) a ‘Support’ area. The ‘Data Check’ area is made of step-buttons (see the big coloured buttons in Figure 3.2) that can be performed on a single worksheet or on all worksheets at once. While the blue buttons act on the worksheet as a whole (create, export, import…), the other buttons act on the data contained in the worksheet(s) using the corresponding colour to identify the type of error-check. The ‘Support’ area is made of smaller buttons visible at the top right corner of the control panel. These buttons can be used to manipulate the results of the checks and to request help. Figure 3.3: Buttons of the ‘support area’. 3.2.1. Data Check area Users are strongly advised to follow each step in order to guarantee the appropriate data checks. The available buttons include: • Step 1 Add a template structure • Step 2 Paste Data Import Data • Step 3 Check Codification • Step 4 Check for Duplicated rows • Step 5 Check Correspondences between columns • Step 6 Check Correspondences between excel sheets • Step 7 Export Data = Exports the data in the workbook to files ready to be uploaded to the DCF web site Step1 and Step 2 are not necessary if the user intends to import existing excel worksheets by the use of excel functionalities. 3.2.1.1. Step 1: add template In order to upload the DCF data to the ‘Upload facility’ of the Data Collection web site the user needs to prepare the data in a specific format. This format can be different from data call to data call because of requests for a different set of data aggregated at different levels. The specific format is defined by templates; these templates are Excel worksheets with pre-defined headings. 6 DV Tool 4.16 User Manual By clicking on the ‘add worksheet template’ a pop-up window exposes the list of templates valid for the specific data call. The user needs to click one or more check-boxes in correspondence to the desired template(s) in order to create one or more formatted worksheet(s). When pressed, the ‘Execute’ button will add the selected Templates as worksheets into the workbook. Figure 3.4: Selection template window of the Fisheries Dependent Information data call Figure 3.5: Selection template window for the Fleet Economic data call. 7 DV Tool 4.16 User Manual Once the worksheets are created they can be filled with data; it is possible to copy and paste them from an external source (excel sheet or text file) or continue with step 2. Important: For large quantities of data (for example: 'catch' data for Fisheries Dependent Information data call or 'landings' data for the Fleet Economic data call) it is recommended to open and work on only one template at a time. For the other templates use the DV tool in another excel file. 3.2.1.2. Step 2: paste data into the template In order to populate the worksheets with data and check whether the data meet the requirements of the data call the first thing to do is to fill up the empty cells with the data. The user has three options: • type in data manually, • copy and paste selection from one sheet to the DV tool • using Excel functionalities: • o 'move or copy worksheet' function (found by right-clicking the sheet tab) o available from the 'Data' menu, press the ‘Import’ button on the control panel (step 2). The ‘Import data’ feature offered by the DV tool has the advantage that it imports the data just below the header row so that the user can benefit from the format definition and compare the headings with the headings (if they exist) from the external source. Several imports can be executed in the same worksheet, building up a set of data from different sources. The possible external sources can be of two types: • .csv files (comma separated values), • .txt files (text files). First, select the worksheet template by clicking on any cell (note: cell format is set to ‘General’, i.e. no specific number format). Second, select the character separator used by the external source to separate the data. Third, press the ‘Import data’ button and navigate to select the external source. When you click on the ‘Import data’ button, the default Dialog Window will pop up for selecting the file to import. After selecting the file click the ‘OK’ button and the application will execute the importing procedure if the separator is recognised among the external source data. Figure 3.6: Import of data from a file with comma separator as delimiter. A message window is shown to inform the user on the import process. Once the data is imported please check carefully that the imported data correspond to the headings; otherwise the worksheet will never pass the validation checks. During this process the system may prompt the user with messages; in these cases the user needs to confirm and continue with the process. One possible message could be the one represented in Figure 3.7 when working in Microsoft Excel 2013. 8 DV Tool 4.16 User Manual Figure 3.7: Prompt message during import process Important: The user can always copy and paste with the traditional windows style method. Another option for importing data as an Excel Worksheet is to use Excel’s embedded import tool. To use this option, click on the Add Template Worksheet, choose from Excel’s Data ribbon: Data Get External Data, select columns and rows to import, select from data type in Columns (General), and choose to import in the B2 Cell of the current worksheet (when necessary use also: Data Text to column). 3.2.1.3. Step 3: check codification Once data has been inserted in the template, the next step is to check the data against codifications defined under the data call. To check codifications click on the red button. The user can choose to perform this check on the selected worksheet or on all the worksheets in the workbook by clicking the corresponding radio button on the right panel of the ‘Check Codification’ button. If ‘all worksheets’ is selected the macro will go through all worksheets in the file, one by one. If the ‘current worksheet’ option is selected, the macro will check only the active worksheet. Figure 3.8: Check Codification executed on the selected (active) worksheet. Different types of checks are executed during this process: 1. Validity of the worksheet name. 2. Validity of the headers (same order, same spelling and same quantity as the ones defined in the templates) 3. Validity of the data below each header depending on their definition: a. checks on the type of value (numeric, alphanumeric, decimal, ...), b. checks on numeric range, c. checks on maximum number of characters, d. checks if empty cells are allowed, e. checks if code is found in predefined lists. Only after successfully passing points 1 and 2 above, does the procedure of checking continue on to point 3 to check if the data conforms to the codification scheme described in the data call. 9 DV Tool 4.16 User Manual During the check, the only enabled button is ‘Suspend’. Clicking this button while running the checks, stops the process and only the errors, if any, that have been detected until that moment are shown. After finishing this process, or after clicking ‘Suspend’, the user needs to click a second time on the button (now set as “back to start)” to get back to the original situation with all the other buttons again enabled. Clicking on a column heading and the icon allows the user to open a separate excel with the valid codes, if any, belonging to the selected heading (see 3.2.2 Support area). The user should also consult the data call specification and - for the Fisheries Dependent Information data call - the 'Actions and associated error messages for data errors' tables in the upload facility manual. Another way to correct the errors is manually, by simply using ‘Find and Replace’ or directly correcting the value in the cell. In case of doubts, the user can consult the ‘View list’ (see 3.2.2 Support area). Important: After correcting the errors, always re-evaluate to ensure that the changes have been made and are correct. 3.2.1.4. Step 4: check duplication Click the ‘Check Duplication’ button in the control panel to start the check of duplicated excel rows. The duplication is based on predefined columns that need to be different for the data to make sense. These sets of columns can differ from template to template. The columns considered in the duplication process have coloured headings (note: the coloured headings are visible only if step1 has been executed). Figure 3.9: Duplication check panel. The check can be executed for the selected worksheet or for all the worksheets if they are valid templates. This choice must be done from the side panel before pressing the ‘Check Duplication’ button. If no choice is made then only the selected worksheet is checked. A progressive report can be viewed during the execution. The results are reported in an ad-hoc column created for this purpose at the beginning of the sheet in column A. In this column, all duplicated rows are coloured in orange and the number of the current row is displayed coupled with the number of the row with which it matches. Figure 3.10: Duplication check example: row 5 is a duplication of row 4 and row 12 is a duplication of row 11. 10 DV Tool 4.16 User Manual Action: the user can choose to ignore the warnings or he can delete all duplicated rows. To examine the duplicated rows use the support button at the top of the dashboard (this avoids having to scroll down the sheet to find duplicates). The 'matched' rows retain their original row number in the ad-hoc column. A message box will pop-up for confirmation. Important: Always repeat the checking procedure after correcting the errors. If not corrected these errors will produce errors in the ‘Upload facility’ and the worksheet may be refused. 3.2.1.5. Step 5: check Inter-column correspondences This check involves more than one column on the same row; the set of columns is defined by data call. For a specific value in a column, only a few possible values can be inserted in certain other columns. It is also named a ‘horizontal check’ because it is done comparing cells row by row. Figure 3.11: Inter-columns check panel. Figure 3.12: Correspondence check example (Fisheries Dependent Information data call). This check can be done on the selected worksheet or for all valid worksheets. This choice needs to be made before pressing the ‘check inter-columns’ button. During this process, the cells included in the correspondence column set are coloured yellow when not correct and a message is displayed in the ‘check result’ column (A). To correct correspondence errors the user needs to change one of the values involved by clicking on the cell. By clicking on the heading of column 'A' followed by the icon it is possible to see the possible correspondences. The user should also consult the data call specification and - for the Fisheries Dependent Information data call - the 'Actions and associated error messages for data errors' tables in the upload facility manual. Important: If not corrected these errors will produce errors in the ‘Upload facility’ and the worksheet may be refused. 11 DV Tool 4.16 User Manual 3.2.1.6. Step 6: check Inter-worksheets correspondences This check involves more than one sheet on the same excel file. The sheets involved in this check are defined by the data call. The purpose of this check is to verify if specific data defined by the user in one worksheet (e.g. fleet segments defined in the 'Capacity' template for the Fleet Economic data call) are the only ones used to provide data in all the other worksheets; data not defined in the 'definition worksheet are highlighted as errors. The user must: 1. First compile the 'user list definition' worksheet (e.g. for the Fleet Economic data call this corresponds to the CAPACITY worksheet, for the Fisheries Dependent Information data call this correspond to the EFF_02_EFF worksheet) and run a 'codification' check on it (see 3.2.1.3). 2. Second execute the 'inter-worksheets' check over the other worksheets. Worksheets involved in this check are prompted by errors if they contain data that are not defined by the 'user list definition'. Important: If the DV tool control panel closes before completing point 2 then point 1 must be reexecuted. Figure 3.13: Inter-worksheets check panel. This check can be done on the selected worksheet or on all valid worksheets. This choice needs to be made before pressing the ‘check inter-sheets’ button. During the process, cells included in the correspondence column set are coloured yellow when not correct and a message is written in the ‘check result’ column (A). Action: the user can choose to ignore the warnings or to modify the error. To examine the errors use the support at the top (this avoids having to scroll down the sheet to find them). Important: If not corrected these errors will produce errors in the ‘Upload facility’ and the worksheet may be refused. 3.2.1.7. Step 7: export data An ‘Export Data’ option is available on the control panel. This option allows the data present in the DV tool to be saved into new excel files ready to be uploaded to the ‘Upload facility’ of the Data Collection web site. The new excel files exclude any embedded macros and the ‘check results’ columns. When a worksheet contains a large number of rows, the user is prompted to split the data into smaller files. . This choice can be done by number of rows or by years and produces more than one file per worksheet exported. 12 DV Tool 4.16 User Manual Figure 3.14: Option to split the file to smaller size files. Figure 3.15: Choose to split the data by year. After completing the operation, a message appears (Figure 3.16: Successful export message.) informing the users about the folder where the files are saved. By default, the application creates a new folder in the folder of the DV tool file, with the name of the template and the date/time. Figure 3.16: Successful export message. If the worksheets have been split into smaller files, the files are enumerated with the name of the worksheet. In case the user chooses to split the worksheets by year then the excel files produced are named also with the year number. An example is given in Figure 3.17. 13 DV Tool 4.16 User Manual Figure 3.17: Example of exported data in the created directory. Important: Before exporting the data set always delete any worksheet that you do not intend to keep. The ‘check result’ column and the macros are automatically removed from the worksheet exported but any coloured cells are kept if not removed by the user. 3.2.2. Support area The support area is a set of icons that the user can find in the upper right corner of the control panel. These icons and definitions are as follows: If pressed it erases all error and warning messages that have been produced after the checking. The effect is only on the selected worksheet. It pushes the error messages to the top of the worksheet to avoid scrolling down each time the user wants to see the affected rows. 14 DV Tool 4.16 User Manual Turn on and off the ‘Check result’ column situated as the first column on every worksheet. This column is automatically removed during the export process producing an external file that is suitable for the upload of data. By clicking this icon you will retrieve some of the code definitions, the list of possible valid codes to be inserted. This icon is a link to the Upload page of the Data Collection web site. This page lists all the data calls, the periods of the data calls and a link to download the manual of the ‘Upload facility’. This icon is a link to the open calls web page of the Data Collection web site. Here you can find information like: official letter, template explanations, links to relevant legislation and notes that can occur during the data call. 4. Important Notes • This manual is for the current version of the DV Tool 4.16. • Users are strongly advised to download and install the latest version of Microsoft’s Office Service Pack in case they face any compatibility problems. • The DV Tool has been tested successfully on Windows XP, Windows 7 and Windows 8 operating systems. • When dealing with a large number of records please be patient while checking or exporting data. • When the Office 2003 versions are used, the total number of rows is limited to 65535. However, the Office 2007 version can handle over 1 million rows of data. • For questions, suggestions, problems, bug reporting and support, send a direct email to the following email address: datasubmission@jrc.ec.europa.eu 5. Disclaimer of warranty THE APPLICATION AND MANUAL ARE PROVIDED ON AN "AS IS'' BASIS, WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, WITHOUT LIMITATION, WARRANTIES THAT THE CODE IS FREE OF DEFECTS, MERCHANTABLE, FIT FOR A PARTICULAR PURPOSE OR NON-INFRINGING. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE CODE IS WITH YOU. SHOULD ANY CODE PROVE DEFECTIVE IN ANY RESPECT, YOU (NOT THE INITIAL DEVELOPER OR ANY OTHER CONTRIBUTOR) ASSUME THE COST OF ANY NECESSARY SERVICING, REPAIR OR CORRECTION. NO USE OF ANY CODE, APPLICATION OR MANUAL IS AUTHORIZED HEREUNDER EXCEPT UNDER THIS DISCLAIMER. 15 European Commission EUR XXXXX LL – Joint Research Centre – Institute for the Protection and Security of the Citizen Title: JRC Data Validation Tool for the EU Data Collection Regulation Framework - DV Tool 4.16 - User Manual Author: JRC G.04 FISHREG Data Collection Team Luxembourg: Publications Office of the European Union 2015 –20 pp. – 21 x 29.7 cm EUR – Scientific and Technical Research series – ISSN XXXX-XXXX ISBN X-XXXX-XXXX-X doi:XXXXX Abstract The Data Validation (DV) tool is a set of macros developed in Visual Basic for Applications (VBA) and embedded in specifically designed template Excel Workbooks. The main purpose of this tool is to facilitate and support the Member States in uploading data which meet the requirements defined by DG MARE in the official DCF data calls. The use of these Excel Template files is not mandatory. However, the data validations checks performed by the DV tool can greatly reduce the number of erroneous records contained in a file to be uploaded to the DCF web site, and hence facilitate the uploading procedure. The current version of the DV tool is 4.14. The purpose of this user manual is to provide the concept of this tool and guidelines for its use. How to obtain EU publications Our priced publications are available from EU Bookshop (http://bookshop.europa.eu), where you can place an order with the sales agent of your choice. The Publications Office has a worldwide network of sales agents. You can obtain their contact details by sending a fax to (352) 29 29-42758. AA-BB-XXXXX-LL-C The mission of the JRC is to provide customer-driven scientific and technical support for the conception, development, implementation and monitoring of EU policies. As a service of the European Commission, the JRC functions as a reference centre of science and technology for the Union. Close to the policy-making process, it serves the common interest of the Member States, while being independent of special interests, whether private or national.