Application-screen Masking: A Hybrid Approach
Abigail Goldsteen, Ksenya Kveler, Tamar Domany, Igor Gokhman, Boris Rozenberg, Ariel Farkash
Information Privacy and Security, IBM Research – Haifa
Presented by Abigail Goldsteen
W2SP Workshop, San Jose, May 2014
© 2014 IBM Corporation

Agenda
• Problem
• Existing approaches
• Our approach
• Challenges and limitations
• Comparison between approaches
• Summary
• Questions

Problem
• How to share information while safeguarding the privacy and security of sensitive data
  [Diagram: Existing applications; New users / use cases]
• Need to prevent users from viewing information they are not authorized to see

Example
[Diagram: Data Center (Germany); customer record with Name: John Smith, National ID: 35, Balance: $127.50; Outsourced Call Center (India)]

Existing approaches
1. Redesign application
  o Can be very complicated and costly
  o Not always possible due to lack of skills
2. Mask values in database
  o Difficult to maintain several copies
  o May “break” the application
3. Mask application-screens
  o Sensitive values are removed/masked after the application has constructed the visual layout of the screen
[Diagram: Application server, Masking, Client]

Rule types
Content-based
• Based on the text value or its format
• Can be defined using
  o Regular expressions
  o Natural Language Processing (NLP)
  o Other data classification techniques
• Example:
  o A regular expression depicting email addresses
Context-based
• Based on the visual structure of the screen
• Can be defined using
  o UI constructs (labeled fields, table columns, drop-down boxes, etc.)
  o A relationship between two entities on the screen
  o Absolute locations
• Example:
  o Mask all labeled fields in which the label is “Email Address”

Existing application-screen masking approaches (1)
At the network level:
[Diagram: Client, Web Proxy with Masking, Application server; HTTP request, HTTP response, masked HTTP response, masked screen]
✓ Fast
✓ Secure
× Simplistic content-based rules

Existing application-screen masking approaches (2)
At the presentation level:
[Diagram: Client, VNC Server with OCR and Masking, Application server; HTTP request, HTTP response, unmasked Remote Framebuffer (RFB), masked RFB, masked screen]
✓ Context-based rules defined on screen
× Difficulties in handling complex screens
× Severe performance issues

Existing application-screen masking approaches (3)
At the operating system level:
[Diagram: Client with Masking, Application server; HTTP request, HTTP response, masked screen]
✓ Context-based rules defined on screen
× Installation on every end-user machine
× Security issues

Hybrid approach
• Masking at the network level
  ✓ Fast
  ✓ Secure
• Easy rule definition at the presentation level
  ✓ Context-based rules defined on screen
  ✓ Content-based rules are also supported

Some features
• All sensitive information is removed from the message and does not reach the browser
  o Cannot be viewed on screen or in page source
• Masking server and proxy are placed within the enterprise’s internal network
  o Sensitive information does not leave the premises
• Client requests are also intercepted to check whether they contain masked data
  o The request is reconstructed with the original data before it is sent to the server

Masking rules
• Rules are expressed in JavaScript
  o Powerful
    • Can define any type of context-based rule
  o Flexible
    • Can work on many payload formats (e.g., HTML, XML, JSON)
  o Fast
    • Executed using an existing, optimized engine [1]
• Each rule is executed on a specific HTTP message
  o Can be filtered based on URL, server or client IP, and username
• Several possible masking methods
  o Remove, Replace, Encrypt, etc.
[1] Mozilla SpiderMonkey, https://developer.mozilla.org/en-US/docs/SpiderMonkey

Visual rule authoring
• Creating JavaScript rules for individual HTTP messages is very difficult
  o Each displayed element (e.g., a table) may originate from several different messages
    • May have different formats
    • May come from AJAX requests
  o Need to use several tools to inspect network traffic, understand the underlying DOM, and associate the displayed element with the messages that created it
  o Need to write scripts that are syntactically correct and validate that masking is performed correctly
• A tool is needed to facilitate the rule authoring process

“Selection tool”
[Figure: screenshot of the selection tool]

“Selection tool” close-up
• Web-based tool, implemented in JavaScript
• A floating panel attached to the original application
• Intercepts mouse hover and click events to enable selection

Technical challenges (1)
• Automatically creating scripts from user selections
• Our solution: we devised an algorithm for detecting the origin of each screen element while the page is loading
  o Monitors all web page modifications, compares the DOM before and after the modifications, and captures the changes that were initiated by HTTP messages
  o Creates a map between each visual element and the message it came from, including the message’s URL and the location of the element within the message (e.g., XPath)

Technical challenges (2)
• Interacting with the target application without changing it
  o Need to catch DOM changes and add listeners for mouse events in the target application
  o Browsers’ same-origin policy prevents pages/frames from different origins from manipulating each other’s DOMs [2]
    • This prevents the naïve solution of presenting the target application in its own frame within a larger rule-authoring tool page
  o Possible solutions:
    • Browser add-on
    • Standalone tool
    (Both require installation on the rule author’s machine)
• Our solution is based on hidden frames and “injecting” the selection tool code into the application messages using the runtime proxy
[2] J. Ruderman, “The same origin policy”, http://www.mozilla.org/projects/security/components/same-origin.html

Limitations
1. Cannot mask information that does not flow over the network, i.e., information generated on the client side
  o Example: an average that is calculated in the browser using JavaScript
2. Cannot mask information that flows in binary format
  o Examples: images, Java applets, Adobe Flash objects, etc.
3. May fail client-side validation
  o Example: a field that checks for a valid email address
  o Solution: use format-preserving masking techniques

Comparison of approaches (1)
• Rule strength and granularity
  o We compare our context-based approach with content-based rules and database masking, based on 4 criteria:
    • Masking granularity: the ability to mask exactly what is needed
    • Logical rule coverage: the ability to describe a rule by its logical content (e.g., mask only patient emails)
    • Visual rule coverage: the ability to mask all or part of the elements in a given area of the screen
    • Visual screen context: the ability to create rules in the context of the presentation layer

Examples
• Masking granularity:
  o A content-based rule will always mask all phone numbers in the application
    • Cannot mask only patient phone numbers and not physician phone numbers
• Logical rule coverage:
  o At the DB layer, any data item can be specified for masking only once, even if it appears on several pages or has several different formats
    • Cannot support cases where a data item in a table appears in two different contexts, one that should be masked and one that shouldn’t
• Visual rule coverage:
  o Our approach enables masking all items in a given area of the screen, even though there may not be any correlation in the format or database table

Comparison of approaches (2)
• Rule enforcement mechanisms
  o We compare our network-level enforcement with masking at the database level and at the presentation layer (using OCR), based on 3 criteria:
    • Application integrity: effects on the proper functioning of the application
    • Role-based masking: different masking based on user roles
    • Impact of screen complexity: do complex screens make masking more difficult?

Examples
• Application integrity
  o At the DB layer, illegal or missing values can result in “breaking” the application
  o At the network layer, client-side validation or calculations may be compromised
• Impact of screen complexity
  o The difficulty of masking at the presentation layer is directly correlated with screen complexity
    • Overlapping or partially visible windows pose a significant challenge
  o Network-based masking is somewhat affected by application complexity, e.g., a screen constructed from many different messages
    • Masking is still possible, but rule definition is more complicated

Summary
• We showed a hybrid approach that combines context-based rule creation at the presentation level with enforcement at the network level
• This enables:
  o Powerful and flexible rule language
  o Easy and straightforward rule authoring process
  o Minimal performance impact at runtime
• Masking rules are defined in a simple and intuitive manner while navigating the target application and clicking on sensitive areas
• Requires minimal changes to the existing environment: no changes to the application or database

Questions?

Thank you
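
Illustrative rule sketch

The rule types and masking methods described in the slides can be illustrated with a minimal sketch. It is not the authors' implementation: every name in it (createVault, tokenFor, maskEmailsByContent, maskFieldByContext, restoreRequest) is hypothetical, the payload is assumed to be JSON, and the email pattern is deliberately simple. It shows a content-based rule, a context-based rule addressed by a location within the message, a format-preserving replacement that should still pass client-side email validation, and reconstruction of a client request that carries masked values.

```javascript
// Hypothetical sketch only: none of these names come from the presented system.

// Deliberately simple pattern for email addresses (content-based rule).
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// The vault remembers which token replaced which original value,
// so that masked values in later client requests can be restored.
function createVault() {
  return { byToken: new Map(), byValue: new Map() };
}

// Format-preserving replacement: the token is itself shaped like an email address.
function tokenFor(value, vault) {
  if (vault.byToken.has(value)) return value; // already a token, leave as-is
  if (!vault.byValue.has(value)) {
    const token = `masked${vault.byValue.size + 1}@example.invalid`;
    vault.byValue.set(value, token);
    vault.byToken.set(token, value);
  }
  return vault.byValue.get(value);
}

// Content-based rule: mask every email address found anywhere in the text.
function maskEmailsByContent(text, vault) {
  return text.replace(EMAIL_PATTERN, (match) => tokenFor(match, vault));
}

// Context-based rule: mask only the string stored at one dotted path
// in a parsed JSON message, e.g. "patient.email".
function maskFieldByContext(message, path, vault) {
  const keys = path.split(".");
  const last = keys.pop();
  let node = message;
  for (const key of keys) {
    if (node === null || typeof node !== "object") return message;
    node = node[key];
  }
  if (node !== null && typeof node === "object" && typeof node[last] === "string") {
    node[last] = tokenFor(node[last], vault);
  }
  return message;
}

// Request reconstruction: put the original values back before forwarding.
function restoreRequest(text, vault) {
  let restored = text;
  for (const [token, original] of vault.byToken) {
    restored = restored.split(token).join(original);
  }
  return restored;
}

// Usage: mask the patient's email but not the physician's.
const demoVault = createVault();
const response = {
  patient: { name: "John Smith", email: "john@example.com" },
  physician: { email: "dr.lee@example.com" },
};
maskFieldByContext(response, "patient.email", demoVault);
console.log(response.patient.email);   // masked1@example.invalid
console.log(response.physician.email); // dr.lee@example.com
console.log(restoreRequest(JSON.stringify({ to: response.patient.email }), demoVault));
```

The context-based rule can separate patient emails from physician emails, which the content-based rule cannot do. That is the masking-granularity point made in the comparison slides.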