Data Masking Checklist
Selecting the Right Data Masking Tool

E: sales@grid-tools.com  www.grid-tools.com  UK: +44 (0)1865 884 600  US: (+1) 866 519 3751

Selecting Your Masking Tool

Ensuring compliance with current data protection regulations and guidelines has become mandatory. Non-compliance not only carries the risk of heavy fines and damage to public relations, but also leaves your sensitive data inadequately secured against breaches.

Traditionally, many organizations have used manual techniques to mask (also described as de-sensitizing, de-identifying or obfuscating) full copies of production data for use in development and testing. However, this is a labour-intensive, time-consuming and costly process that is prone to human error and inconsistency. As a result, teams are often provided with poor quality data that is both inefficient and expensive to create. This lengthens test cycles as testers wait for data, and it reduces quality, so that more potentially costly defects reach production.

Organizations are therefore increasingly looking to data masking tools to improve the quality of their data and to reduce the length and cost of their test cycles. However, there are a number of data masking tools on the market, so how do you choose the right one for your project?

Below, we have set out a matrix containing a comprehensive list of the features you need to consider when ensuring that your testing and development teams are provided with high quality, compliant test data that can increase the quality and reduce the cost of your project. In each case, we’ve noted how important the feature is and how it can help solve the problems you are likely to face in the real world.

Masking Feature: Application and Database Integrity
Weighting: Mandatory
Y/N:
In The Real World: Consistent masking across multiple applications is essential for integration and end-to-end testing.

Masking Feature: Cross-Platform Integrity
Weighting: Mandatory
Y/N:
In The Real World: Most large enterprises feed data across multiple platforms and technology stacks. Consistent masking across multiple platforms is essential for integration and end-to-end testing.

Masking Feature: Cross-System Data Relationship Discovery and Definition
Weighting: Medium
Y/N:
In The Real World: This is usually part of the set-up and understanding of applications. In our experience, these relationships can quickly be derived from pattern matching inside the data, naming standards inside the catalogue, and documentation. These relationships are interesting; however, more focused ‘PII and Financial Discovery Scanning’ (see below) is far more important, as it takes far more time, is more prone to error and is subject to change over time.

Masking Feature: PII and Financial Discovery Scanning
Weighting: Mandatory
Y/N:
In The Real World: The ability to scan all, or a percentage, of the data across multiple systems and automatically identify which data is potentially problematic is essential. Relying on users’ interpretation of reports and screens is not good enough to discover where hidden data exists in the system. The alternative is a ‘double blind’ manual approach, in which multiple users must independently arrive at the same conclusion about which data needs to be masked. However, this process is extremely time-consuming and can result in project failure.

Masking Feature: Vendor-Provided App Packs of Rules
Weighting: Low
Y/N:
In The Real World: Relying on a pre-defined set of rules provided by a vendor means that you are relying on their knowledge of a specific ERP. This is fraught with danger; remember, it is you who is liable if there is a data leak!
Also, these app packs don’t take account of local customizations within your applications, such as the way you use the system and the normal usage of flex fields. It is better to use a robust PII scanning tool to ensure nothing is missed when masking.

Masking Feature: Integration into BAU Development Structures
Weighting: High
Y/N:
In The Real World: The ability to fit the masking processes easily, and in a timely manner, into existing DBA data provisioning procedures.

Masking Feature: Masking Repeatability
Weighting: High
Y/N:
In The Real World: The ability to consistently mask data using either deterministic masking functions or cross-reference tables means the data can be masked in the same way across applications.

Masking Feature: Multiple Database and Platform Support
Weighting: High
Y/N:
In The Real World: Support for masking on only a single platform or database type will result in different, inconsistent masking being set up across the enterprise. Being able to mask data in legacy systems such as IMS and VSAM, as well as in SQL Server, for example, is essential.

Masking Feature: Multiple Masking Technology Stacks
Weighting: High
Y/N:
In The Real World: One size definitely does not fit all. Some vendors provide a single method of masking, for example in-place masking, or extracting into files, masking and returning. In reality, masking very large or complex applications across multiple platforms means that different technologies need to be used. These could include native database utilities, in-database functions or native mainframe masking.

Masking Feature: Reporting and Auditing
Weighting: High
Y/N:
In The Real World: Reporting on what has been masked is required; however, a more important consideration is who chose what needs to be masked, why, and when. In addition, there needs to be an audit of exactly what technology was used to perform the masking.

Masking Feature: Flexible Masking Engines and Methodologies
Weighting: High
Y/N:
In The Real World: The masking product needs to provide multiple methods for the data team.
Based on the size, urgency and potential risk of the work, having a range of simple to complex technology available means that teams can be much more responsive. Technology should include in-place masking, extract and mask ‘in flight’, building shadow tables, and dynamic masking via views and message layers.

Masking Feature: Dynamic Masking
Weighting: High
Y/N:
In The Real World: In some cases, ad hoc queries need to be made against real data. Access to this real data can be controlled by creating a masked transparency layer. This uses a set of views which mask certain fields consistently across databases. These views can also be adjusted to control which users have access to which data. In addition, development applications can be set up to use the masked transparency layer so that data used by developers appears masked. Dynamic masking can also be deployed at the message or SOAP level. This can be extremely useful for TDM teams, as they can quickly provide access to web services via a proxy. The proxy masks the data ‘in flight’.

Masking Feature: No SQL Masking Engine
Weighting: High
Y/N:
In The Real World: Some dynamic masking engines try to interpret the SQL and mask the data passing to and from the database. Virtually all RDBMSs support views, and many also support synonyms, so using the RDBMS’s own built-in functions is a much more sensible and standard approach.

Masking Feature: Subsetting in Conjunction with Masking
Weighting: High
Y/N:
In The Real World: Much current data protection legislation refers to ‘minimal data’ being used. Adding subsetting to a masking project should be easy and is highly recommended. It can also quickly improve the run times of data provisioning and the agility of teams.

Masking Feature: Complex Flat File Structures
Weighting: Medium
Y/N:
In The Real World: Being able to verify that flat file structures are valid (see Data Quality Management) as well as fully understood is key. Many enterprise systems contain multiple definitions of files and messages; being able to verify these and mask them effectively is essential.
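
The masked transparency layer described under Dynamic Masking can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the method used by any particular masking product: the customers table, its columns, the sample rows and the masking rules are all hypothetical, and SQLite (via Python's standard sqlite3 module) is used only because it runs anywhere. The point is simply that a native database view can present masked values while the underlying data stays unchanged.

```python
import sqlite3

# Hypothetical schema and sample rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, card_number TEXT)"
)
conn.executemany(
    "INSERT INTO customers (id, name, card_number) VALUES (?, ?, ?)",
    [(1, "Adam Smith", "4111111111111111"), (2, "Jane Doe", "5500000000000004")],
)

# The view acts as the masked layer: it exposes the same column names,
# but the name is replaced by a label derived from the key, and only the
# last four digits of the card number are shown.
conn.execute(
    """
    CREATE VIEW customers_masked AS
    SELECT id,
           'Customer ' || id AS name,
           'XXXXXXXXXXXX' || substr(card_number, -4) AS card_number
    FROM customers
    """
)


def fetch_masked(connection):
    """Return every row as seen through the masked view, ordered by id."""
    return connection.execute(
        "SELECT id, name, card_number FROM customers_masked ORDER BY id"
    ).fetchall()


masked_rows = fetch_masked(conn)
for row in masked_rows:
    print(row)
```

In a real deployment, developers and ad hoc query users would be granted access to the view rather than to the base table; how that access is granted depends on the database in use and is not shown here.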

Masking Feature: Being able to Mask Isolates
Weighting: High
Y/N:
In The Real World: Depending on the level of masking required, being able to mask isolated values, for example large numbers with decimal places (134345567.12), is very important. You do not want a single piece of information to be usable to trace back to a specific user or account.

Masking Feature: Being able to Mask Trends
Weighting: Medium
Y/N:
In The Real World: If an entire masked database is lost, the general trends in the data still have commercial value. Being able to mask these trends, so that application integrity is still maintained, is essential for fully secure masking. Subsetting can help with this issue, as can using data constellations to provide the essence of all the data without the data trends.

Masking Feature: Data Constellations
Weighting: High
Y/N:
In The Real World: For very highly regulated markets, shipping masked data offshore is very problematic. The inability to send data offshore can result in increased costs. Using a data constellation that looks for data dimensions that exist in production (basically the major attributes of transactions), linked with synthetic and/or masked PII data, allows ‘production-like’ data to be provisioned with none of the real content.

Masking Feature: Richness of Functions, as well as Custom Masking Routines
Weighting: High
Y/N:
In The Real World: Most masking tools allow addresses and names to be masked. However, more complex types of mask, such as IBAN numbers and checksums, need to be included. In addition, the ability to build local custom masking routines or integrate existing masking should be included.

Masking Feature: Advanced Masking Functionality
Weighting: High
Y/N:
In The Real World: As a project develops, more complex types of masking requirement are often discovered. The masking tool must be able to handle these complex needs. A typical example would be multi-value, multi-column cross-referencing. For instance, the names Adam Smith, A Smith and ASMITH need to be masked consistently.
Many vendors do this by simply hand-building SQL to be run prior to the mask.

Masking Feature: Integration with a Test Data on Demand Strategy and Platform
Weighting: Medium
Y/N:
In The Real World: Masking can be time-consuming and tedious. Being able to use this work to provide a better approach to test data delivery will improve the quality of development and reduce the number of bugs that make it into production.

Masking Feature: Performance
Weighting: Mandatory
Y/N:
In The Real World: The masking technology stack needs to be able to mask medium to large databases very rapidly. Being able to fit a run into a nightly or on-demand window is essential; developers and testers cannot wait days for production refreshes. In some cases, we have seen our technology run 100 times faster than competitors’ technology. This is particularly important in databases with multiple billions of rows.

Masking Feature: Mainframe Support
Weighting: High
Y/N:
In The Real World: Local support on the mainframe is essential. Using ETL processes or ODBC layers will not perform and does not fit in with normal mainframe batch estates.

Masking Feature: Reversible Masking
Weighting: Low
Y/N:
In The Real World: Being able to work your way back from a masked value can be useful, and this can be set up in a number of different ways.

Masking Feature: Agile Development Support
Weighting: High
Y/N:
In The Real World: Masking tools that do not allow data to be delivered to Agile teams at the beginning, middle and end of a sprint, with multiple database and meta model changes, should not be considered.

Masking Feature: No ETL Required
Weighting: Mandatory
Y/N:
In The Real World: Some products require that all data is transformed into another database, flat file or platform before it can be masked. This causes very long delays and introduces a high level of complexity that is not required for masking projects. In addition, a high level of CPU usage is needed to move data back and forth.

Masking Feature: Data Quality Management
Weighting: High
Y/N:
In The Real World: When masking data, the quality of the data must be considered.
If production data contains ‘bad’ data, then consideration should be given to retaining that ‘bad’ data in development. Masking tools should be able to identify these outliers and then be configured to pass the data on. For ETL projects this is a must, as the migration code must be able to check for non-standard data.

Masking Feature: Consulting Services
Weighting: Medium to Low
Y/N:
In The Real World: Products should be stand-alone and usable after training. Having consulting services to bulk up local teams can be useful.

For more information about how GT Fast Data Masker from GridTools can benefit you, contact us:
UK: +44 01865 884 600
US: +1 866 563 3120
E: sales@grid-tools.com
www.grid-tools.com