User Manual - Data Collection Framework

JRC Data Validation tool
for Data Calls collected under
the EU Data Collection Framework
DV Tool 4.16
User Manual
JRC G.03 Data Collection Team
EUR xxxxx EN
The mission of the JRC-IPSC is to provide research results and to support EU policy-makers in
their effort towards global security and towards protection of European citizens from accidents,
deliberate attacks, fraud and illegal actions against EU policies.
European Commission
Joint Research Centre
Institute for the Protection and Security of the Citizen
Contact information
Address: Via Fermi 2749, T.P. 051 – Ispra (VA), 21027 - Italy
E-mail: datasubmission@jrc.ec.europa.eu
Tel.: 0332 786479
Fax: 0332 789658
http://ipsc.jrc.ec.europa.eu/
http://www.jrc.ec.europa.eu/
Legal Notice
Neither the European Commission nor any person acting on behalf of the Commission
is responsible for the use which might be made of this publication.
Europe Direct is a service to help you find answers
to your questions about the European Union
Freephone number (*):
00 800 6 7 8 9 10 11
(*) Certain mobile telephone operators do not allow access to 00 800 numbers or these calls may be billed.
A great deal of additional information on the European Union is available on the Internet.
It can be accessed through the Europa server http://europa.eu/
JRC [PUBSY request]
EUR XXXXX LL
ISBN X-XXXX-XXXX-X
ISSN XXXX-XXXX
doi:XXXXX
Luxembourg: Publications Office of the European Union
© European Union, 2015
Reproduction is authorised provided the source is acknowledged
Printed in Italy
DV Tool 4.16 User Manual
TABLE OF CONTENTS
1. Introduction
2. DV tool and Microsoft Office Excel versions
3. How to use the DV tool features
3.1. At opening
3.2. DV tool control panel
3.2.1. Data Check area
3.2.1.1. Step 1: add template
3.2.1.2. Step 2: paste data into the template
3.2.1.3. Step 3: check codification
3.2.1.4. Step 4: check duplication
3.2.1.5. Step 5: check Inter-column correspondences
3.2.1.6. Step 6: check Inter-worksheets correspondences
3.2.1.7. Step 7: export data
3.2.2. Support area
4. Important Notes
5. Disclaimer of warranty
LIST OF FIGURES
Figure 3.1: The DV tool button in Excel’s Home ribbon.
Figure 3.2: The DV tool control panel.
Figure 3.3: Buttons of the ‘support area’.
Figure 3.4: Selection template window of the Fisheries Dependent Information data call.
Figure 3.5: Selection template window for the Fleet Economic data call.
Figure 3.6: Import of data from a file with comma separator as delimiter.
Figure 3.7: Prompt message during import process.
Figure 3.8: Check Codification executed on the selected (active) worksheet.
Figure 3.9: Duplication check panel.
Figure 3.10: Duplication check example: row 5 is a duplication of row 4 and row 12 is a duplication of row 11.
Figure 3.11: Inter-columns check panel.
Figure 3.12: Correspondence check example (Fisheries Dependent Information data call).
Figure 3.13: Inter-worksheets check panel.
Figure 3.14: Option to split the file to smaller size files.
Figure 3.15: Choose to split the data by year.
Figure 3.16: Successful export message.
Figure 3.17: Example of exported data in the created directory.
1. Introduction
The Data Validation (DV) tool is a set of macros developed in Visual Basic for Applications (VBA) and
embedded in a specifically designed Excel Workbook. The main purpose of this tool is to facilitate and
support the Member States in uploading data to meet the requirements defined by DG MARE in official
DCF data calls by STECF (Council Regulation 199/2008). The use of this tool is not mandatory.
However, the data validation checks performed by the DV tool can greatly reduce the number of
erroneous records contained in a file to be uploaded to the DCF web site, and hence facilitate the
uploading procedure.
The tool is capable of checking national data stored in Excel rows against certain codifications and
rules as requested in the data call. Checks can be performed on partial or whole data sets to be
uploaded to the DCF web site. The majority of the checks concern the use of valid codes as defined in
the data call and the type of the data entered (numeric or text). Erroneous data are identified and can
easily be corrected.
The DV tool is designed to check data that are provided in predefined worksheets depending on the
type of data call. Each data call has a dedicated DV tool.
The first version of the DV tool was developed to serve the needs of the Fishing Effort Regimes and
the Mediterranean & Black Sea data calls. The DV tool was a set of Excel files, one for each
data type requested by the data call. A second version was produced for the Fleet Economic data call in 2013.
It contained improvements based on experience acquired with the previous data calls:
• a single Excel file containing all the data templates, and
• the possibility to automatically update, directly from the database, the codes used by the DV
tool for the validation of the data.
It was realised that ideally the JRC should develop a DV tool that could serve different data calls while
remaining easy to maintain. Therefore the original DV tool, developed by Nikolaos Mitrakis in 2011, has
been re-engineered and re-designed from scratch to allow the flexibility and maintainability desired.
However, the tool is still under development and any feedback from users that could help improve the
tool according to their needs is most welcome. Any change in the data call requires a corresponding
adaptation of the modules.
The purpose of this manual is to provide the general concept surrounding the tool and guidelines for
its use. Since the tool is integrated in Excel Workbooks, all Excel functions are still available. The tool is
available for download from the Data Collection Framework web site:
https://datacollection.jrc.ec.europa.eu
2. DV tool and Microsoft Office Excel versions
To benefit from the DV tool functionalities it is necessary to run it on a Windows operating system
where Microsoft Excel software is installed. The DV tool can be used within Microsoft Excel 2003, 2007,
2010 and 2013, in either the 32-bit or the 64-bit edition.
3. How to use the DV tool features
DV stands for Data Validation. This chapter explains the functionalities and the possible
steps to be performed in order to create final Excel data files that can be uploaded to the ‘Upload Facility’
available on the Data Collection web site.
3.1. At opening
When opening the DV tool file with Microsoft Office 2007/2010 (see below for Microsoft Office 2003),
a ‘Security Warning’ may appear with the message:
“Macros have been disabled. Options…”
This is a standard security option in all Microsoft applications, since macros can potentially have
access to your data. In order to continue with the use of the tool, the user should choose:
“Options → Enable this content”.
To change the security settings go to “Developer → Code → Macro Security → Disable all macros with
notification”. However, we recommend that the default option not be changed (although the DV tool is a
trusted application, other macros can potentially cause problems). Once the macros are enabled
the DV Tool button is visible in the Home tab of the Excel application.
Figure 3.1: The DV tool button in Excel’s Home ribbon.
In order to use the DV tool with MS Office 2003, it is necessary to download and install (if not already
installed) “Microsoft’s Office Compatibility Pack for Word, Excel and PowerPoint File Formats” which is
available for free at: http://www.microsoft.com/en-us/download/details.aspx?id=3
Attention should be given to the restriction of a maximum of 65,535 data rows (the first row is
always the header). However, this is the theoretical limit of rows in Excel 2003. In practice, and especially
when dealing with large files, a large number of rows can cause significant computational time or even an
Excel 'not responding' status.
Users are strongly advised to use Excel 2007 or later when dealing with more than 20,000 rows, especially
when dealing with the 'catch' data table and the 'landings' data table. Another option is first to manually
split the data set into smaller sets (e.g. one sheet per year of data) and then insert or paste them into the DV
tool file. Apply the DV tool separately to each of these smaller data sets.
In Excel 2003 the environment is the same, except that the option to recall and show the control panel from the
Office ribbon is not available in pre-Office 2007 versions. In order to re-show the DV control
panel, simply use the key combination Ctrl + J, or just save and re-open the template.
To change the security settings for the macros in Excel 2003 go to:
Tools → Macro → Security → Security Level → Medium.
3.2. DV tool control panel
The DV tool control panel is the first and the main window that pops-up when opening the Workbook
(Figure 3.2). Here, users can access all the functions that the DV tool provides.
Figure 3.2: The DV tool control panel.
The panel window consists of two main areas: (1) a ‘Data Check’ area and (2) a ‘Support’ area.
The ‘Data Check’ area is made of step-buttons (see the big coloured buttons in Figure 3.2) whose actions can
be performed on a single worksheet or on all worksheets at once. While the blue buttons act on the
worksheet as a whole (create, export, import…), the other buttons act on the data contained in the
worksheet(s), using the corresponding colour to identify the type of error-check.
The ‘Support’ area is made of smaller buttons visible at the top right corner of the control panel.
These buttons can be used to manipulate the results of the checks and to request help.
Figure 3.3: Buttons of the ‘support area’.
3.2.1. Data Check area
Users are strongly advised to follow each step in order to guarantee the appropriate data checks. The
available buttons include:
• Step 1: Add a template structure
• Step 2: Paste Data / Import Data
• Step 3: Check Codification
• Step 4: Check for Duplicated rows
• Step 5: Check Correspondences between columns
• Step 6: Check Correspondences between Excel sheets
• Step 7: Export Data (exports the data in the workbook to files ready to be uploaded to the DCF web site)
Step 1 and Step 2 are not necessary if the user intends to import existing Excel worksheets by the use
of Excel functionalities.
3.2.1.1. Step 1: add template
In order to upload the DCF data to the ‘Upload facility’ of the Data Collection web site the user needs
to prepare the data in a specific format. This format can be different from data call to data call because of
requests for a different set of data aggregated at different levels. The specific format is defined by
templates; these templates are Excel worksheets with pre-defined headings.
Clicking the ‘add worksheet template’ button opens a pop-up window that lists the templates valid for
the specific data call. The user needs to tick the check-box of each desired
template in order to create one or more formatted worksheets. When pressed, the ‘Execute’ button
adds the selected templates as worksheets to the workbook.
Figure 3.4: Selection template window of the Fisheries Dependent Information data call
Figure 3.5: Selection template window for the Fleet Economic data call.
Once the worksheets are created they can be filled with data; it is possible to copy and paste the data
from an external source (Excel sheet or text file) or to continue with step 2.
Important: For large quantities of data (for example, 'catch' data for the Fisheries Dependent Information
data call or 'landings' data for the Fleet Economic data call) it is recommended to open and work on only
one template at a time. For the other templates use the DV tool in another Excel file.
3.2.1.2. Step 2: paste data into the template
In order to populate the worksheets and check whether the data meet the requirements of
the data call, the first thing to do is to fill the empty cells with data. The user has the following options:
• type in data manually,
• copy and paste a selection from another sheet into the DV tool,
• use Excel functionalities:
  o the 'move or copy worksheet' function (found by right-clicking the sheet tab),
  o the import functions available from the 'Data' menu,
• press the ‘Import’ button on the control panel (step 2).
The ‘Import data’ feature offered by the DV tool has the advantage that it imports the data just below
the header row so that the user can benefit from the format definition and compare the headings with the
headings (if they exist) from the external source. Several imports can be executed in the same
worksheet, building up a set of data from different sources. The possible external sources can be of two
types:
• .csv files (comma separated values),
• .txt files (text files).
First, select the worksheet template by clicking on any cell (note: cell format is set to ‘General’, i.e. no
specific number format). Second, select the separator character used by the external source to separate
the data. Third, press the ‘Import data’ button; the default dialog window pops up for selecting the file to
import. After selecting the file, click the ‘OK’ button and the application will execute the import procedure,
provided the separator is found in the external source data.
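The import procedure can be pictured with a short sketch. The Python below is only an illustration and is not the DV tool's VBA code; the function name `import_rows` and the sample headings are assumptions made for this example. It splits each line of the external source on the chosen separator, appends the rows below the header row, and refuses the import when the separator does not occur in the source.

```python
import csv
import io


def import_rows(sheet, source_text, separator=","):
    """Append rows parsed from source_text below the rows already in sheet.

    sheet is a list of rows; sheet[0] is the header row of the template.
    """
    if separator not in source_text:
        raise ValueError("separator not recognised in the external source")
    reader = csv.reader(io.StringIO(source_text), delimiter=separator)
    for row in reader:
        if row:  # skip blank lines
            sheet.append(row)
    return sheet


# Hypothetical template with three headings, filled from two sources.
sheet = [["COUNTRY", "YEAR", "VALUE"]]
import_rows(sheet, "ITA;2014;10.5\nFRA;2014;7\n", separator=";")
import_rows(sheet, "ESP,2013,3\n", separator=",")
```

As in the tool, several imports can be run on the same sheet, each one adding rows after those already present.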
Figure 3.6: Import of data from a file with comma separator as delimiter.
A message window is shown to inform the user about the import process. Once the data are imported,
please check carefully that the imported data correspond to the headings; otherwise the worksheet will
never pass the validation checks.
During this process the system may prompt the user with messages; in these cases the user needs to
confirm and continue with the process. One possible message, which may appear when working in
Microsoft Excel 2013, is shown in Figure 3.7.
Figure 3.7: Prompt message during import process
Important: The user can always copy and paste with the traditional Windows-style method. Another
option for importing data into an Excel worksheet is to use Excel’s embedded import tool. To use this
option, add the template worksheet and click on it, choose from Excel’s Data ribbon: Data → Get External
Data, select the columns and rows to import, set the column data format to General, and choose to
import into cell B2 of the current worksheet (when necessary, also use Data → Text to Columns).
3.2.1.3. Step 3: check codification
Once data has been inserted in the template, the next step is to check the data against codifications
defined under the data call.
To check codifications click on the red button. The user can choose to perform this check on the
selected worksheet or on all the worksheets in the workbook by clicking the corresponding radio button
on the right panel of the ‘Check Codification’ button. If ‘all worksheets’ is selected the macro will go
through all worksheets in the file, one by one. If the ‘current worksheet’ option is selected, the macro will
check only the active worksheet.
Figure 3.8: Check Codification executed on the selected (active) worksheet.
Different types of checks are executed during this process:
1. Validity of the worksheet name.
2. Validity of the headers (same order, same spelling and same quantity as the ones defined in the
templates).
3. Validity of the data below each header depending on their definition:
a. checks on the type of value (numeric, alphanumeric, decimal, ...),
b. checks on numeric range,
c. checks on maximum number of characters,
d. checks if empty cells are allowed,
e. checks if code is found in predefined lists.
Only after points 1 and 2 above have been passed successfully does the checking procedure continue to
point 3, to check whether the data conform to the codification scheme described in the data call.
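The order of these checks can be sketched as follows. This is a minimal Python illustration, not the tool's actual VBA implementation; the template layout, the rule names (`codes`, `range`, `max_len`, `allow_empty`) and the sample worksheet are all assumptions made for this example. The worksheet name and the headers are verified first, and the cell-level checks run only if both pass.

```python
def check_cell(value, col):
    """Return a short problem description for one cell, or None if valid."""
    if value == "":
        return None if col.get("allow_empty") else "empty cell not allowed"
    if len(value) > col.get("max_len", len(value)):
        return "too many characters"
    if col.get("numeric"):
        try:
            number = float(value)
        except ValueError:
            return "not numeric"
        low, high = col.get("range", (number, number))
        if not low <= number <= high:
            return "out of range"
    if "codes" in col and value not in col["codes"]:
        return "code not in list"
    return None


def check_codification(name, rows, template):
    """Return a list of error messages for one worksheet (rows[0] = headers)."""
    if name != template["name"]:                      # point 1
        return ["invalid worksheet name: " + name]
    expected = [col["header"] for col in template["columns"]]
    if rows[0] != expected:                           # point 2
        return ["headers do not match the template"]
    errors = []
    for r, row in enumerate(rows[1:], start=2):       # point 3; row 1 is the header
        for col, value in zip(template["columns"], row):
            problem = check_cell(value, col)
            if problem:
                errors.append("row %d, %s: %s" % (r, col["header"], problem))
    return errors


# Hypothetical template and data, for illustration only.
TEMPLATE = {
    "name": "EFFORT",
    "columns": [
        {"header": "COUNTRY", "codes": {"ITA", "FRA"}, "max_len": 3},
        {"header": "YEAR", "numeric": True, "range": (2000, 2015)},
        {"header": "GEAR", "codes": {"OTB", "GNS"}, "allow_empty": True},
    ],
}
rows = [
    ["COUNTRY", "YEAR", "GEAR"],
    ["ITA", "2014", "OTB"],
    ["XXX", "1999", ""],
]
errors = check_codification("EFFORT", rows, TEMPLATE)
```

In this example row 2 is valid, while row 3 yields one error for the unknown country code and one for the year outside the allowed range.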
During the check, the only enabled button is ‘Suspend’. Clicking this button while the checks are running
stops the process, and only the errors, if any, that have been detected until that moment are shown.
After finishing this process, or after clicking ‘Suspend’, the user needs to click a second time on the
button (now labelled “back to start”) to get back to the original situation with all the other buttons
enabled again.
Clicking on a column heading and then on the ‘View list’ icon allows the user to open a separate Excel
file with the valid codes, if any, belonging to the selected heading (see 3.2.2 Support area). The user
should also consult the data call specification and - for the Fisheries Dependent Information data call -
the 'Actions and associated error messages for data errors' tables in the upload facility manual.
Another way to correct the errors is manually, by simply using ‘Find and Replace’ or directly
correcting the value in the cell. In case of doubts, the user can consult the ‘View list’ (see 3.2.2 Support
area).
Important: After correcting the errors, always re-evaluate to ensure that the changes have been made
and are correct.
3.2.1.4. Step 4: check duplication
Click the ‘Check Duplication’ button in the control panel to start the check for duplicated Excel rows. The
check is based on a predefined set of columns whose combination of values must be unique for the data to
make sense. These sets of columns can differ from template to template. The columns considered in the
duplication check have coloured headings (note: the coloured headings are visible only if step 1 has been executed).
Figure 3.9: Duplication check panel.
The check can be executed for the selected worksheet or for all the worksheets if they are valid
templates. This choice must be made in the side panel before pressing the ‘Check Duplication’ button.
If no choice is made then only the selected worksheet is checked.
A progressive report can be viewed during the execution. The results are reported in an ad-hoc column
created for this purpose at the beginning of the sheet in column A. In this column, all duplicated rows are
coloured in orange and the number of the current row is displayed coupled with the number of the row
with which it matches.
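The logic of the duplication check can be sketched in a few lines of Python. This is an illustration only, not the tool's code; the function name, the column names and the choice to pair each duplicate with the first matching row are assumptions. Rows are compared on the key columns only, so two rows with different values in a non-key column still count as duplicates.

```python
def find_duplicates(rows, key_columns):
    """Map each duplicated row number to the row number it matches.

    rows[0] is the header row, so data rows are numbered from 2 as in Excel.
    """
    index = [rows[0].index(name) for name in key_columns]
    first_seen = {}
    duplicates = {}
    for number, row in enumerate(rows[1:], start=2):
        key = tuple(row[i] for i in index)
        if key in first_seen:
            duplicates[number] = first_seen[key]
        else:
            first_seen[key] = number
    return duplicates


# Hypothetical data: VALUE is not part of the key, so row 5 duplicates row 4.
rows = [
    ["COUNTRY", "YEAR", "GEAR", "VALUE"],
    ["ITA", "2013", "OTB", "5"],
    ["ITA", "2014", "OTB", "6"],
    ["FRA", "2014", "GNS", "1"],
    ["FRA", "2014", "GNS", "9"],
]
duplicates = find_duplicates(rows, ["COUNTRY", "YEAR", "GEAR"])  # {5: 4}
```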
Figure 3.10: Duplication check example: row 5 is a duplication of row 4 and row 12 is a duplication of row 11.
Action: the user can choose to ignore the warnings or to delete all duplicated rows. To examine the
duplicated rows use the corresponding support button at the top of the control panel (this avoids having
to scroll down the sheet to find duplicates). The 'matched' rows retain their original row number in the
ad-hoc column. A message box will pop up for confirmation.
Important: Always repeat the checking procedure after correcting the errors. If not corrected these errors
will produce errors in the ‘Upload facility’ and the worksheet may be refused.
3.2.1.5. Step 5: check Inter-column correspondences
This check involves more than one column on the same row; the set of columns is defined by data call.
For a specific value in a column, only a few possible values can be inserted in certain other columns. It is
also named a ‘horizontal check’ because it is done comparing cells row by row.
Figure 3.11: Inter-columns check panel.
Figure 3.12: Correspondence check example (Fisheries Dependent Information data call).
This check can be done on the selected worksheet or for all valid worksheets. This choice needs to be
made before pressing the ‘check inter-columns’ button.
During this process, the cells included in the correspondence column set are coloured yellow when not
correct and a message is displayed in the ‘check result’ column (A).
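A 'horizontal check' of this kind can be sketched as follows. The Python below is a simplified illustration under stated assumptions: the column names GEAR and MESH, the table of allowed combinations and the message wording are hypothetical and are not taken from any data call. For each row, the value in one column determines which values are permitted in another.

```python
def check_inter_columns(rows, source, target, allowed):
    """Return {row number: message} for rows whose target value is not
    permitted for the value found in the source column."""
    s, t = rows[0].index(source), rows[0].index(target)
    results = {}
    for number, row in enumerate(rows[1:], start=2):
        permitted = allowed.get(row[s])
        if permitted is not None and row[t] not in permitted:
            results[number] = "%s '%s' not valid for %s '%s'" % (
                target, row[t], source, row[s])
    return results


# Hypothetical correspondence table: gear code -> permitted mesh size ranges.
ALLOWED = {"OTB": {"16-31", "32-54"}, "GNS": {"50-59"}}
rows = [
    ["GEAR", "MESH"],
    ["OTB", "16-31"],
    ["GNS", "16-31"],
]
results = check_inter_columns(rows, "GEAR", "MESH", ALLOWED)
```

Here row 3 is reported, because the mesh size range given is not among those permitted for the gear in the same row.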
To correct correspondence errors the user needs to change one of the values involved by clicking on the
cell. Clicking on the heading of column 'A' and then on the ‘View list’ icon shows the possible
correspondences. The user should also consult the data call specification and - for the Fisheries
Dependent Information data call - the 'Actions and associated error messages for data errors' tables in
the upload facility manual.
Important: If not corrected these errors will produce errors in the ‘Upload facility’ and the worksheet may
be refused.
3.2.1.6. Step 6: check Inter-worksheets correspondences
This check involves more than one sheet in the same Excel file. The sheets involved in this check are
defined by the data call. The purpose of this check is to verify that specific data defined by the user in one
worksheet (e.g. fleet segments defined in the 'Capacity' template for the Fleet Economic data call) are
the only ones used to provide data in all the other worksheets; data not defined in the 'definition'
worksheet are highlighted as errors. The user must:
1. First compile the 'user list definition' worksheet (e.g. for the Fleet Economic data call this
corresponds to the CAPACITY worksheet, for the Fisheries Dependent Information data call this
corresponds to the EFF_02_EFF worksheet) and run a 'codification' check on it (see 3.2.1.3).
2. Second, execute the 'inter-worksheets' check over the other worksheets. Worksheets involved in
this check are flagged with errors if they contain data that are not defined in the 'user list
definition' worksheet.
Important: If the DV tool control panel closes before completing point 2 then point 1 must be re-executed.
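The two points above can be sketched as a simple set-membership test. This Python fragment is illustrative only; the column name FLEET_SEGMENT, the segment codes and the LANDINGS sheet are hypothetical. The values listed in the definition worksheet are collected first, and every other worksheet is then checked against them.

```python
def check_inter_worksheets(definition_rows, other_sheets, column):
    """Return {(sheet name, row number): value} for values of `column`
    that are not listed in the definition worksheet."""
    d = definition_rows[0].index(column)
    defined = {row[d] for row in definition_rows[1:]}
    errors = {}
    for name, rows in other_sheets.items():
        c = rows[0].index(column)
        for number, row in enumerate(rows[1:], start=2):
            if row[c] not in defined:
                errors[(name, number)] = row[c]
    return errors


# Hypothetical definition worksheet and one dependent worksheet.
capacity = [
    ["FLEET_SEGMENT", "VESSELS"],
    ["DTS1218", "12"],
    ["PG0612", "40"],
]
others = {
    "LANDINGS": [
        ["FLEET_SEGMENT", "SPECIES"],
        ["DTS1218", "HKE"],
        ["TM2440", "ANE"],
    ],
}
errors = check_inter_worksheets(capacity, others, "FLEET_SEGMENT")
```

In this example row 3 of the LANDINGS sheet is reported, since its fleet segment does not appear in the definition worksheet.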
Figure 3.13: Inter-worksheets check panel.
This check can be done on the selected worksheet or on all valid worksheets. This choice needs to be
made before pressing the ‘check inter-sheets’ button.
During the process, cells included in the correspondence column set are coloured yellow when not
correct and a message is written in the ‘check result’ column (A).
Action: the user can choose to ignore the warnings or to correct the errors. To examine the errors use the
corresponding support button at the top of the control panel (this avoids having to scroll down the sheet
to find them).
Important: If not corrected these errors will produce errors in the ‘Upload facility’ and the worksheet may
be refused.
3.2.1.7. Step 7: export data
An ‘Export Data’ option is available on the control panel. This option allows the data present in the DV
tool to be saved into new Excel files ready to be uploaded to the ‘Upload facility’ of the Data Collection
web site. The new Excel files exclude any embedded macros and the ‘check result’ columns.
When a worksheet contains a large number of rows, the user is prompted to split the data into smaller
files. The split can be made by number of rows or by year and produces more than one file per
exported worksheet.
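The two ways of splitting can be sketched as follows. This Python fragment is an illustration under assumptions: the function name, the YEAR column name and the group labels are hypothetical, and the sketch returns the groups of rows instead of writing Excel files. Each group keeps the header row, as each exported file needs its headings.

```python
def split_rows(rows, by_year=None, chunk_size=None):
    """Split data rows into groups, each starting with the header row.

    by_year: name of the year column; chunk_size: data rows per group.
    Returns {label: rows including header}.
    """
    header, data = rows[0], rows[1:]
    groups = {}
    if by_year is not None:
        y = header.index(by_year)
        for row in data:
            groups.setdefault(row[y], [header]).append(row)
    elif chunk_size is not None:
        for start in range(0, len(data), chunk_size):
            label = start // chunk_size + 1
            groups[label] = [header] + data[start:start + chunk_size]
    else:
        groups[1] = rows  # no split requested
    return groups


# Hypothetical worksheet with three data rows over two years.
rows = [
    ["YEAR", "VALUE"],
    ["2013", "1"],
    ["2014", "2"],
    ["2013", "3"],
]
per_year = split_rows(rows, by_year="YEAR")
per_chunk = split_rows(rows, chunk_size=2)
```

Splitting by year produces one group per distinct year, while splitting by number of rows produces consecutive numbered groups, the last of which may be shorter.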
Figure 3.14: Option to split the file to smaller size files.
Figure 3.15: Choose to split the data by year.
After completing the operation, a message appears (Figure 3.16)
informing the user of the folder where the files are saved. By default, the application creates a new
folder in the folder of the DV tool file, with the name of the template and the date/time.
Figure 3.16: Successful export message.
If the worksheets have been split into smaller files, the files are numbered and carry the name of the
worksheet. If the user chooses to split the worksheets by year, the names of the Excel files produced
also include the year. An example is given in Figure 3.17.
Figure 3.17: Example of exported data in the created directory.
Important: Before exporting the data set always delete any worksheet that you do not intend to keep.
The ‘check result’ column and the macros are automatically removed from the worksheet exported but
any coloured cells are kept if not removed by the user.
3.2.2. Support area
The support area is a set of icons that the user can find in the upper right corner of the control panel.
The functions of these icons are as follows:
• An icon that erases all error and warning messages produced by the checks. The effect is only on
the selected worksheet.
• An icon that pushes the error messages to the top of the worksheet, so that the user does not need
to scroll down each time to see the affected rows.
• An icon that turns on and off the ‘Check result’ column, which is the first column of every worksheet.
This column is automatically removed during the export process, producing an external
file that is suitable for the upload of data.
• An icon that retrieves some of the code definitions, i.e. the list of valid
codes that can be inserted.
• An icon that links to the Upload page of the Data Collection web site. This page lists all
the data calls, the periods of the data calls and a link to download the manual of the
‘Upload facility’.
• An icon that links to the open calls web page of the Data Collection web site. Here you
can find information such as the official letter, template explanations, links to relevant legislation
and notes that may be issued during the data call.
4. Important Notes
• This manual refers to the current version of the DV Tool, 4.16.
• Users are strongly advised to download and install the latest version of Microsoft’s Office Service
Pack in case they face any compatibility problems.
• The DV Tool has been tested successfully on Windows XP, Windows 7 and Windows 8 operating
systems.
• When dealing with a large number of records please be patient while checking or exporting data.
• When Office 2003 is used, the total number of rows is limited to 65,535. However,
Office 2007 can handle over 1 million rows of data.
• For questions, suggestions, problems, bug reporting and support, send an email to the following
address: datasubmission@jrc.ec.europa.eu
5. Disclaimer of warranty
THE APPLICATION AND MANUAL ARE PROVIDED ON AN "AS IS'' BASIS, WITHOUT
WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, WITHOUT
LIMITATION, WARRANTIES THAT THE CODE IS FREE OF DEFECTS, MERCHANTABLE,
FIT FOR A PARTICULAR PURPOSE OR NON-INFRINGING. THE ENTIRE RISK AS TO THE
QUALITY AND PERFORMANCE OF THE CODE IS WITH YOU. SHOULD ANY CODE
PROVE DEFECTIVE IN ANY RESPECT, YOU (NOT THE INITIAL DEVELOPER OR ANY
OTHER CONTRIBUTOR) ASSUME THE COST OF ANY NECESSARY SERVICING, REPAIR
OR CORRECTION. NO USE OF ANY CODE, APPLICATION OR MANUAL IS AUTHORIZED
HEREUNDER EXCEPT UNDER THIS DISCLAIMER.
European Commission
EUR XXXXX LL – Joint Research Centre – Institute for the Protection and Security of the Citizen
Title: JRC Data Validation Tool for the EU Data Collection Regulation Framework - DV Tool 4.16 - User Manual
Author: JRC G.04 FISHREG Data Collection Team
Luxembourg: Publications Office of the European Union
2015 –20 pp. – 21 x 29.7 cm
EUR – Scientific and Technical Research series – ISSN XXXX-XXXX
ISBN X-XXXX-XXXX-X
doi:XXXXX
Abstract
The Data Validation (DV) tool is a set of macros developed in Visual Basic for Applications (VBA) and
embedded in specifically designed template Excel Workbooks. The main purpose of this tool is to facilitate and
support the Member States in uploading data which meet the requirements defined by DG MARE in the official
DCF data calls. The use of these Excel Template files is not mandatory. However, the data validation checks
performed by the DV tool can greatly reduce the number of erroneous records contained in a file to be uploaded
to the DCF web site, and hence facilitate the uploading procedure. The current version of the DV tool is 4.16.
The purpose of this user manual is to provide the concept of this tool and guidelines for its use.
How to obtain EU publications
Our priced publications are available from EU Bookshop (http://bookshop.europa.eu), where you can place
an order with the sales agent of your choice.
The Publications Office has a worldwide network of sales agents. You can obtain their contact details by
sending a fax to (352) 29 29-42758.
AA-BB-XXXXX-LL-C
The mission of the JRC is to provide customer-driven scientific and technical support for the
conception, development, implementation and monitoring of EU policies. As a service of the
European Commission, the JRC functions as a reference centre of science and technology
for the Union. Close to the policy-making process, it serves the common interest of the
Member States, while being independent of special interests, whether private or national.