Data Masking Checklist

Selecting the Right Data Masking Tool
E: sales@grid-tools.com
www.grid-tools.com
UK: +44 (0)1865 884 600
US: (+1) 866 519 3751
Selecting Your Masking Tool
Complying with current data protection regulations and guidelines is now mandatory. Non-compliance not only
carries the risk of heavy fines and reputational damage, but also leaves your sensitive data inadequately protected against breaches.
Traditionally, many organizations have used manual techniques to mask (also called de-sensitizing, de-identifying, or obfuscating) full copies of
production data for use in development and testing. However, this is a labour-intensive, time-consuming and costly process that is prone to
human error and inconsistency. As a result, teams are often provided with poor-quality data that is both slow and expensive to create.
This lengthens your test cycles as testers wait for data, and it reduces quality, resulting in more potentially costly defects reaching production.
Organizations are therefore increasingly looking to data masking tools to improve the quality of their test data and
reduce the length and cost of their test cycles. However, there are a number of data masking tools on the market, so how do you choose the
right one for your project?
Below, we have set out a matrix containing a comprehensive list of the features you need to consider when ensuring that your testing and
development teams are provided with high-quality, compliant test data that can increase the quality and reduce the cost of your project. In
each case, we’ve noted how important the feature is, and how it can help solve the problems you are likely to face in the real world.
Feature: Application and Database Integrity
Weighting: Mandatory
In The Real World: Consistent masking across multiple applications is essential for integration and
end-to-end testing.
Y/N:

Feature: Cross-Platform Integrity
Weighting: Mandatory
In The Real World: Most large enterprises feed data across multiple platforms and technology stacks.
Consistent masking across multiple platforms is essential for integration and end-to-end testing.
Y/N:

Feature: Cross-System Data Relationship Discovery and Definition
Weighting: Medium
In The Real World: This is usually part of the set-up and understanding of applications. In our
experience, this can quickly be derived from pattern matching inside the data, naming standards
inside the catalogue, and documentation.
These relationships are interesting; however, the more focused ‘PII and Financial Discovery Scanning’
(see below) is far more important, as it takes far more time, is more prone to error and is subject
to changes over time.
Y/N:
Feature: PII and Financial Discovery Scanning
Weighting: Mandatory
In The Real World: The ability to scan all of the data, or a percentage of it, across multiple systems
and automatically identify which data is potentially problematic is essential.
Relying on users’ interpretation of reports and screens is not good enough to discover where hidden
data exists in the system. The alternative is to use a ‘double blind’ manual approach, where multiple
users arrive at the same conclusion about which data needs to be masked. However, this process is
extremely time-consuming and can result in project failure.
Y/N:
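The scanning idea above can be sketched in a few lines of Python. This is only an illustration: the patterns, the `scan_rows` helper, the sample rows and the 50% threshold are all assumptions made for the example, and a real scanner would need far richer rules (checksum validation, name dictionaries and so on).

```python
import re

# Hypothetical detection rules: a value counts as a hit only if the whole value matches.
PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}"),
    "card_number": re.compile(r"(?:\d[ -]?){13,19}"),
    "uk_phone": re.compile(r"(?:\+44|0)\d{9,10}"),
}

def scan_rows(rows, threshold=0.5):
    """Return {column: label} for columns where at least `threshold`
    of the non-empty sampled values match one of the patterns."""
    hits = {}  # (column, label) -> number of matching values
    seen = {}  # column -> number of non-empty values
    for row in rows:
        for column, value in row.items():
            text = str(value).strip()
            if not text:
                continue
            seen[column] = seen.get(column, 0) + 1
            for label, pattern in PATTERNS.items():
                if pattern.fullmatch(text):
                    hits[(column, label)] = hits.get((column, label), 0) + 1
    flagged = {}
    for (column, label), count in hits.items():
        if count / seen[column] >= threshold:
            flagged[column] = label
    return flagged

# Illustrative sample: the card numbers are "hidden" in a column called ref.
sample = [
    {"id": 1, "contact": "adam@example.com", "notes": "call back", "ref": "4111 1111 1111 1111"},
    {"id": 2, "contact": "jo@example.org", "notes": "", "ref": "5500-0000-0000-0004"},
]
print(scan_rows(sample))
```

Because the scan looks at the values rather than the column names, it flags the `ref` column even though nothing in its name suggests that it holds card numbers.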
Feature: Vendor-Provided App Packs of Rules
Weighting: Low
In The Real World: Relying on a pre-defined set of rules provided by a vendor means that you are
relying on their knowledge of a specific ERP. This is fraught with danger; remember, it is you who
is liable if there is a data leakage!
Also, these app packs don’t consider local customizations within your applications: the way you use
the system, as well as the normal usage of flex fields etc. It is better to use a robust PII scanning
tool to guarantee nothing is missed when masking.
Y/N:

Feature: Integration into BAU Development Structures
Weighting: High
In The Real World: The masking processes must fit easily into existing DBA data provisioning
procedures in a timely manner.
Y/N:

Feature: Masking Repeatability
Weighting: High
In The Real World: The ability to consistently mask data using either deterministic masking functions
or cross-reference tables means the data can be masked in a similar manner across applications.
Y/N:

Feature: Multiple Database and Platform Support
Weighting: High
In The Real World: Support for masking on only a single platform or a single database type will
result in different, inconsistent masking being set up across the enterprise.
Being able to mask data in legacy systems, such as IMS and VSAM, as well as SQL Server, for example,
is essential.
Y/N:
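As a rough illustration of the deterministic approach mentioned under Masking Repeatability, the Python sketch below derives a replacement surname from a keyed hash of the original value. The key, the replacement pool and the `mask_surname` function are all assumptions made for the example; they do not describe how any particular masking product works.

```python
import hashlib
import hmac

# Assumption: one secret key shared by every masking job and kept out of source control.
SECRET_KEY = b"replace-with-a-managed-secret"

# Hypothetical pool of replacement surnames. A small pool means that different
# real names will sometimes share a replacement.
SURNAMES = ["Archer", "Baker", "Carter", "Dyer", "Fisher", "Mason", "Potter", "Turner"]

def mask_surname(value: str) -> str:
    """Map a surname to a replacement; identical input always gives identical output."""
    normalised = value.strip().upper().encode("utf-8")
    digest = hmac.new(SECRET_KEY, normalised, hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(SURNAMES)
    return SURNAMES[index]

# The same source value, masked separately for two applications, gets the same result.
crm_value = mask_surname("Smith")
billing_value = mask_surname(" smith ")
print(crm_value == billing_value)  # True
```

Because the output depends only on the key and the normalised input, no lookup table has to be shared between systems. A cross-reference table achieves the same consistency by storing each mapping explicitly instead.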
Feature: Multiple Masking Technology Stacks
Weighting: High
In The Real World: One size definitely does not fit all. Some vendors provide a single method of
masking, for example in-place masking, or extracting into files, masking and returning.
In reality, masking very large or complex applications across multiple platforms means that different
technologies need to be used. These could include native database utilities, in-database functions,
or native mainframe masking.
Y/N:

Feature: Reporting and Auditing
Weighting: High
In The Real World: Reporting on what has been masked is required; however, a more important
consideration is who decided what needs to be masked, why, and when.
In addition, there needs to be an audit of exactly what technology was used to perform the masking.
Y/N:
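One way to picture the audit trail described under Reporting and Auditing is as one record per masking decision, capturing who, why, when and with what technology. The Python sketch below is a hypothetical layout only; the field names, the `MaskingAuditEntry` class and the sample values are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

@dataclass
class MaskingAuditEntry:
    table: str
    column: str
    decided_by: str   # who chose to mask this column
    reason: str       # why it needs to be masked
    decided_at: str   # when the decision was made (ISO 8601)
    technology: str   # what was used to perform the mask

def audit_line(entry: MaskingAuditEntry) -> str:
    """Serialise one decision as a single JSON line for an append-only log."""
    return json.dumps(asdict(entry), sort_keys=True)

# Illustrative values only.
entry = MaskingAuditEntry(
    table="CUSTOMER",
    column="SURNAME",
    decided_by="data.steward",
    reason="Personal name (PII)",
    decided_at=datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc).isoformat(),
    technology="in-place masking",
)
print(audit_line(entry))
```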
Feature: Flexible Masking Engines and Methodologies
Weighting: High
In The Real World: The masking product needs to provide multiple methods for the data team. Depending
on the size, urgency and potential risk, having simple-to-complex technology available means that
teams will be much more responsive.
Technology should include: in-place masking, extract and mask ‘in flight’, building shadow tables,
as well as dynamic masking via views and message layers.
Y/N:

Feature: Dynamic Masking
Weighting: High
In The Real World: In some cases, ad hoc queries need to be made against real data. Access to this
real data can be controlled by creating a masked transparency layer. This uses a set of views which
mask certain fields consistently across databases. These views can also be adjusted to control which
users have access to which data. In addition, development applications can be set up to use the
masked transparency layer so that data used by developers appears masked.
Dynamic masking can also be deployed at the message or SOAP level. This can be extremely useful for
TDM teams, as they can quickly provide access to web services via a proxy. The proxy masks the data
‘in flight’.
Y/N:

Feature: No SQL Masking Engine
Weighting: High
In The Real World: Some dynamic masking engines try to interpret the SQL and mask the data returned
from and to the database. All RDBMSs support the concept of views and synonyms, so using the RDBMS’s
own built-in functions is a much more sensible and standard approach.
Y/N:
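The view-based approach can be sketched using SQLite from Python. The table, the view name and the masking expressions are assumptions chosen for illustration. SQLite has no user accounts or grants, so the per-user access control mentioned above is not shown here; in a server RDBMS it would typically be handled by granting access to the view rather than to the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, card_number TEXT);
    INSERT INTO customer VALUES (1, 'Adam Smith', '4111111111111111');

    -- The view exposes the same column names as the table, but masks the
    -- sensitive columns using only the database's built-in functions.
    CREATE VIEW customer_masked AS
    SELECT id,
           substr(name, 1, 1) || '****' AS name,
           '************' || substr(card_number, -4) AS card_number
    FROM customer;
""")

# Ad hoc queries are pointed at the view instead of the base table.
rows = conn.execute("SELECT id, name, card_number FROM customer_masked").fetchall()
print(rows)  # [(1, 'A****', '************1111')]
```

No SQL is parsed or rewritten in this sketch: the query runs unchanged, and the masking lives entirely in the view definition.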
Feature: Subsetting in Conjunction with Masking
Weighting: High
In The Real World: A lot of current data legislation refers to ‘minimal data’ being used. Adding
subsetting to a masking project should be easy and is highly recommended. It can also quickly improve
the run times of data provisioning and agility of teams.
Y/N:
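A minimal sketch of subsetting combined with masking, assuming two related in-memory tables. The sampling rule (every third customer), the placeholder name format and the `subset_with_mask` function are all hypothetical. The point is only that child rows are kept if, and only if, their parent row is kept.

```python
# Illustrative data: six customers and twelve orders that reference them.
customers = [{"id": i, "name": f"Customer {i}"} for i in range(1, 7)]
orders = [{"order_id": 100 + i, "customer_id": (i % 6) + 1} for i in range(12)]

def subset_with_mask(customers, orders, step=3):
    """Keep every `step`-th customer, mask the names, and keep only their orders."""
    # dict(c, name=...) copies the row, so the source data is left untouched.
    kept = [dict(c, name=f"NAME_{c['id']:04d}") for c in customers[::step]]
    kept_ids = {c["id"] for c in kept}
    kept_orders = [o for o in orders if o["customer_id"] in kept_ids]
    return kept, kept_orders

subset_customers, subset_orders = subset_with_mask(customers, orders)
print(len(subset_customers), len(subset_orders))  # 2 4
```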
Feature: Complex Flat File Structures
Weighting: Medium
In The Real World: Being able to verify that flat file structures are valid (see Data Quality), as
well as fully understood, is key. Many enterprise systems will contain multiple definitions of files
and messages; being able to verify these and mask effectively is essential.
Y/N:

Feature: Being able to Mask Isolates
Weighting: High
In The Real World: Depending on the level of masking required, being able to mask isolated values,
for example high numbers with decimal places (134345567.12), is very important. You do not want a
single piece of information to be traceable back to a specific user or account.
Y/N:
Feature: Being able to Mask Trends
Weighting: Medium
In The Real World: If an entire masked database is lost, the general trends in the data still have
commercial value. Being able to mask these trends, while still maintaining application integrity,
is essential for fully secure masking.
Subsetting can help with this issue, as can using data constellations to provide the essence of all
the data without data trends.
Y/N:

Feature: Data Constellations
Weighting: High
In The Real World: For very highly regulated markets, shipping masked data offshore is very
problematic. The inability to send data offshore can result in increasing costs.
Using a data constellation that looks for data dimensions that exist in production (basically the
major attributes of transactions), linked with synthetic and/or masked PII data, allows
‘production-like’ data to be provisioned with none of the real content.
Y/N:
Feature: Richness of Functions, as well as Custom Masking Routines
Weighting: High
In The Real World: Most masking tools allow addresses and names to be masked. However, more complex
types of masks, such as IBAN numbers, checksums etc., need to be included.
In addition to this, the ability to build local custom masking routines or integrate existing masking
should be included.
Y/N:
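To show why checksum-bearing values such as IBANs need more than a simple substitution, the Python sketch below masks the digits of an IBAN and then recomputes its two check digits using the standard mod-97 calculation, so that the masked value still validates. The function names are hypothetical, and the unkeyed hash used to pick replacement digits is a simplifying assumption; a real routine would normally use a secret key or a cross-reference table.

```python
import hashlib

def _to_number(text: str) -> int:
    # Digits stay as they are; letters become two-digit numbers (A=10 ... Z=35).
    return int("".join(str(int(ch, 36)) for ch in text))

def iban_check_digits(country: str, bban: str) -> str:
    """Compute the two check digits for a country code and BBAN (mod-97 scheme)."""
    return f"{98 - _to_number(bban + country + '00') % 97:02d}"

def is_valid_iban(iban: str) -> bool:
    compact = iban.replace(" ", "").upper()
    return _to_number(compact[4:] + compact[:4]) % 97 == 1

def mask_iban(iban: str) -> str:
    """Replace every digit of the BBAN, keep its letters, and fix the check digits."""
    compact = iban.replace(" ", "").upper()
    country, bban = compact[:2], compact[4:]
    digest = hashlib.sha256(compact.encode("utf-8")).hexdigest()
    stream = iter(str(int(digest, 16)))  # a long run of hash-derived decimal digits
    masked_bban = "".join(next(stream) if ch.isdigit() else ch for ch in bban)
    return country + iban_check_digits(country, masked_bban) + masked_bban

original = "GB82 WEST 1234 5698 7654 32"  # widely published example IBAN
masked = mask_iban(original)
print(masked, is_valid_iban(masked))
```

If the check digits were copied across unchanged, the masked value would usually fail validation in any application that checks IBANs on input.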
Feature: Advanced Masking Functionality
Weighting: High
In The Real World: As a project develops, more complex types of masking requirements are often
discovered. The masking tool must be able to handle these complex needs.
A typical example would be multi-value, multi-column cross-referencing. For instance, the names
Adam Smith, A Smith and ASMITH need to be masked consistently. Many vendors do this by simply
hand-building SQL to be run prior to the mask.
Y/N:
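The Adam Smith example can be sketched as a small cross-reference table that maps every rendering of a real name to the matching rendering of one masked name. The three variant formats and the replacement name are assumptions made for illustration; real data usually has many more renderings to cover.

```python
def variants(first: str, last: str) -> dict:
    """Return the renderings of a name that this sketch knows about."""
    return {
        "full": f"{first} {last}",              # Adam Smith
        "initial": f"{first[0]} {last}",        # A Smith
        "login": f"{first[0]}{last}".upper(),   # ASMITH
    }

def build_cross_reference(real: tuple, masked: tuple) -> dict:
    """Map each rendering of the real name to the same rendering of the masked name."""
    real_forms, masked_forms = variants(*real), variants(*masked)
    return {real_forms[kind]: masked_forms[kind] for kind in real_forms}

def mask_value(value: str, xref: dict) -> str:
    # Values that are not in the cross-reference table pass through unchanged.
    return xref.get(value, value)

xref = build_cross_reference(("Adam", "Smith"), ("Brian", "Jones"))
print([mask_value(v, xref) for v in ["Adam Smith", "A Smith", "ASMITH"]])
# ['Brian Jones', 'B Jones', 'BJONES']
```

Applying the same table to every column and system that holds one of the variants keeps the masked person recognisably the same across all of them.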
Feature: Integration with a Test Data on Demand Strategy and Platform
Weighting: Medium
In The Real World: Masking can be time-consuming and tedious. Being able to use this work to provide
a better approach to test data delivery will improve the quality of development and reduce the number
of bugs that make it into production.
Y/N:

Feature: Performance
Weighting: Mandatory
In The Real World: The masking technology stack needs to be able to mask medium to large databases
very rapidly. Being able to fit a run into a nightly or on-demand window is essential; developers and
testers cannot wait days for production refreshes.
In some cases, we have seen our technology run 100 times faster than competitors’ technology. This is
particularly important in databases with multiple billions of rows.
Y/N:

Feature: Mainframe Support
Weighting: High
In The Real World: Local support on the mainframe is essential. Using ETL processes or ODBC layers
will not perform and does not fit in with normal mainframe batch estates.
Y/N:

Feature: Reversible Masking
Weighting: Low
In The Real World: Being able to work your way back from a masked value can be useful, and this can
be set up in a number of different ways.
Y/N:

Feature: Agile Development Support
Weighting: High
In The Real World: Masking tools that do not allow data to be delivered to Agile teams at the
beginning, middle and end of a sprint, with multiple database and meta model changes, should not be
considered.
Y/N:

Feature: No ETL Required
Weighting: Mandatory
In The Real World: Some products require that all data is transformed into another database, flat
file or platform before it can be masked. This causes very long delays and introduces a high level of
complexity that is not required for masking projects. In addition, a high level of CPU usage is needed
to move data back and forth.
Y/N:

Feature: Data Quality Management
Weighting: High
In The Real World: When masking data, the quality of the data must be considered. If production data
contains ‘bad’ data, then consideration should be given to retaining that ‘bad’ data in development.
Masking tools should be able to identify these outliers and then be configured to pass on the data.
For ETL projects, this is a must, as the migration code must be able to check for non-standard data.
Y/N:

Feature: Consulting Services
Weighting: Medium to Low
In The Real World: Products should be stand-alone and usable after training. Having consulting
services to bulk up local teams can be useful.
Y/N:
For more information about how
GT Fast Data Masker from GridTools can benefit you, contact us:
UK: +44 01865 884 600
US: +1 866 563 3120
E: sales@grid-tools.com
www.grid-tools.com