Big Data Working Group Session
Praveen Murthy, Fujitsu Labs of America
Copyright © 2011 Cloud Security Alliance www.cloudsecurityalliance.org

The ‘freshman’ of the CSA working groups
• Lots of press & attention
• Leadership team:
  Chair - Sree Rajan, Fujitsu
  Co-chair - Neel Sundaresan, eBay
  Co-chair - Wilco van Ginkel, Verizon

Big Data Working Group (70+ members)
1: Data analytics for security
2: Privacy preserving/enhancing technologies
3: Big data-scale crypto
4: Big data Infrastructures' Attack Surface Analysis and Reduction
5: Policy and Governance
6: Legal Issues
7: Top 10
8: Framework and Taxonomy
https://basecamp.com/1825565/projects/511355-big-data-working

• Lead to crystallization of best practices for security and privacy in big data
• Support industry and government on adoption of best practices
• Establish liaisons with other organizations in order to coordinate the development of big data security and privacy standards
• Accelerate the adoption of novel research aimed to address security and privacy issues
• Identify scalable techniques for data-centric security and privacy problems
• Top 10 Big Data Security & Privacy Challenges developed for CSA Congress

Big Data Top-10
1) Secure computations in distributed programming frameworks
2) Security best practices for non-relational data stores
3) Secure data storage and transactions logs
4) End-point input validation/filtering
5) Real-time security/compliance monitoring
6) Scalable and composable privacy-preserving analytics
7) Crypto-enforced access control and secure communication
8) Granular access control
9) Granular audits
10) Data provenance

CSA Big Data Working Group Site
https://cloudsecurityalliance.org/research/big-data/
CSA, Big Data
LinkedIn
http://www.linkedin.com/groups?home=&gid=4458215&trk=anet_ug_hm
Basecamp Project Collaboration Site Request Form
https://cloudsecurityalliance.org/research/basecamp/

For any questions/remarks/feedback, please contact either:
Who | How
Sreeranga (Sree) Rajan (Fujitsu) | sree@us.fujitsu.com
Neel Sundaresan (eBay) | nsundaresan@ebay.com
Wilco van Ginkel (Verizon) | wilco.vanginkel@verizon.com

Help Us Secure Cloud Computing
www.cloudsecurityalliance.org
info@cloudsecurityalliance.org
LinkedIn: www.linkedin.com/groups?gid=1864210
Twitter: @cloudsa, @CSAResearchGuy

Alvaro Cardenas, UT Dallas
Pratyusa Manadhata, HP Labs

• Create a reference architecture educating new users on how big data analytics can be used for security (might include tutorials?)
• Explain what is new when compared to other traditional continuous monitoring approaches
• Crystallize best practices on big data analytics
• Identify big data analytics problems and technologies that can be standardized
• Identify gaps where new research and best practices are needed

• Intrusion Detection Systems (1990): network flows, host intrusion detection logs, etc.
• Security Information and Event Management (SIEM) (mid-2000): alarm correlation
• Big Data Security/Analytics (now): variety of data, security intelligence

Traditional Systems | Big Data Promise
More rigid, predefined schemas | Structured and unstructured data treated seamlessly
Data gets deleted | Keep data for historical correlation (e.g., 10 years)
Complex analyst queries take long to complete | Faster query response times
Hadoop is the de facto open standard for big data at rest

Big Data
• Cyber-Data
• Logs, events, network flows, user id. & activity, etc.
Analytics
• Models, Baselining
• Feature extraction
• Anomaly detection
• Context (external sources of information)
Dashboard
• Security analyst (human) looks at indicators
• Correlates with external sources of info to detect attacks

In 2011 >60% of respondents installed tools to gain a better view of what is on their network (McAfee Risk & Compliance Outlook 2012)
Examples:
• Database Activity Monitoring (DAM): monitors administrator activity, unusual database reads/updates, event aggregation, correlation and reporting
• Identity Access Management
• Risk-Management control room
• Security Information and Event Management (SIEM)
• Vulnerability Assessment Tools

• Currently on internal review
• Target: Open to external review Q1 2013
• Target: Final report published H1 2013
• Main thrust: the centrality of data analytics for combating APTs
• Contributions to the report (so far) by Symantec, AT&T, EMC, RSA, HP, IBM, Fujitsu, University of Luxembourg, University of Texas at Dallas.

Henry St.
Andre, inContact

Privacy Enhancing Technologies
• PET is a term for a set of computer tools and applications which, when integrated with online services, allow online users to protect the privacy of their personally identifiable information.
• The PET Team, or PETT, seeks to address significant problems around PET

Need volunteers!
Reach out on basecamp, or send email to sree.rajan@us.fujitsu.com
Basecamp Project Collaboration Site Request Form
https://cloudsecurityalliance.org/research/basecamp/

Arnab Roy, Fujitsu Labs

1. Communication protocols
2. Data-centric security
3. Big data privacy
4. Key management
5. Data integrity and poisoning concerns
6. Searching / filtering encrypted data
7. Secure data collection/aggregation
8. Secure collaboration
9. Proof of data storage
10. Secure outsourcing of computation

[Figure labels: Encrypter (PK), Filtering Token, Decrypter (SK)]
“Conjunctive, subset, and range queries on encrypted data” by Dan Boneh and Brent Waters, 2007

How to make collection of data private as well as authenticated?
• Can verify signature came from a group member
• Cannot infer which member
• In case of dispute, a trusted third party can trace the signature to an individual
• The technical problem is to make group signatures efficient and short
“Short Group Signatures” by Boneh, Boyen and Shacham, 2004

Private Searching on Streaming Data
Ostrovsky and Skeith, CRYPTO 2005
Problem Scenario:
• The intelligence gathering community needs to collect a useful subset of huge streaming sources of data
• The criteria for being useful may be classified (private criteria)
• Most of the streaming data is useless and storing it all may be impractical (filter at source)
• How do we keep the filtering criteria secret even if the filter is executing at the source?
Solution:
• Obfuscate the filtration code
• Even if the source falls into enemy hands, the enemy cannot figure out the criteria

[Figure: Secret Criteria are obfuscated into a Garbled Filter; the Garbled Filter runs in the cloud over blogs, net traffic and news feeds and outputs Encrypted Filtered Data, which is decrypted into Filtered Data]

Computing on Authenticated Data
• A signature scheme such that it is possible to derive signatures on “related” data from a signature on the original document
• For example, deriving signatures on a redacted version of a document, without knowing the signing key
“Computing on Authenticated Data” by Jae Hyun Ahn, Dan Boneh, Jan Camenisch, Susan Hohenberger, abhi shelat and Brent Waters.
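The redaction example above can be illustrated with a small sketch. This is not the construction from the Ahn et al. paper: it is a simpler salted-hash commitment scheme, and an HMAC from the Python standard library stands in for a public-key signature (so, unlike with a real signature, the verifier here needs the key). The function names sign_document, redact and verify are hypothetical and chosen only for illustration.

```python
import hashlib
import hmac
import os


def commit(salt: bytes, block: bytes) -> bytes:
    # Salted hash commitment to one block; the salt has a fixed length (16 bytes).
    return hashlib.sha256(salt + block).digest()


def sign_document(key: bytes, blocks):
    # ASSUMPTION: HMAC stands in for a real public-key signature.
    salts = [os.urandom(16) for _ in blocks]
    commitments = [commit(s, b) for s, b in zip(salts, blocks)]
    tag = hmac.new(key, b"".join(commitments), hashlib.sha256).digest()
    return {"entries": list(zip(salts, blocks)),
            "commitments": commitments,
            "tag": tag}


def redact(signed, index: int):
    # Anyone can redact: drop the block and its salt, keep the commitment.
    # The signing key is not needed and the tag is left unchanged.
    entries = list(signed["entries"])
    entries[index] = (None, None)
    return {"entries": entries,
            "commitments": signed["commitments"],
            "tag": signed["tag"]}


def verify(key: bytes, signed) -> bool:
    expected = hmac.new(key, b"".join(signed["commitments"]),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, signed["tag"]):
        return False
    if len(signed["entries"]) != len(signed["commitments"]):
        return False
    for (salt, block), c in zip(signed["entries"], signed["commitments"]):
        if block is None:
            continue  # redacted block: only its commitment remains
        if not hmac.compare_digest(commit(salt, block), c):
            return False
    return True


key = os.urandom(32)
doc = sign_document(key, [b"name: Alice", b"ssn: 000-00-0000", b"city: Dallas"])
redacted = redact(doc, 1)  # hide the second block without using the key
```

In this sketch both doc and redacted pass verify(key, ...), while changing any disclosed block makes verification fail. A deployed scheme would replace the HMAC with a publicly verifiable signature.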
[Figure: proof of data storage protocol]
• Client: N = pq; f = F mod φ(N)
• Client sends to server: File F; N
• Client sends to server: random g
• Server (holding F) returns: g^F mod N
• Client: check if g^f = g^F mod N
“PORs: Proofs of Retrievability for Large Files” by Juels and Kaliski
“Compact Proofs of Retrievability” by Shacham and Waters

Problem Scenario:
• A “weak client” wants to outsource a computation
• The provider returns the result along with a “proof” that the computation was carried out correctly
• Catch: verification of the proof should require substantially less computational effort than computing the result from scratch
References:
“Non-Interactive Verifiable Computing: Outsourcing Computation to Untrusted Workers” by Rosario Gennaro, Craig Gentry and Bryan Parno.
“Fully Homomorphic Message Authenticators” by Rosario Gennaro and Daniel Wichs.

Functional Encryption
• Identity-based encryption
• Attribute-based encryption
• Richer policies: Disjunction, Conjunction, Polynomials, Threshold, Predicates
“Predicate Encryption Supporting Disjunctions, Polynomial Equations, and Inner Products” - Jonathan Katz, Amit Sahai and Brent Waters.

We invite you to participate in the WG.
Contact:
Initiative Lead: Arnab Roy, Fujitsu Labs of America
Email: aroy@us.fujitsu.com
Many thanks to Dan Boneh, Mihai Christodorescu and Roy P.
D’Souza for discussion on this topic

Praveen Murthy, Fujitsu Labs
Bryan Payne, Nebula
Jesus Molina, Molina Consulting

• Provide analytical security metrics for Big Data infrastructure: Hadoop, OpenStack, …
• Analyze attack surface
• Idea is to be able to do differential analysis to determine how attack surface changes with various configurations
• Seed ideas and prototypes in this BDWG initiative in an open, transparent, architecture/brand neutral/agnostic manner
• Crowd-source for improving and standardizing metrics

• Big Data and Virtualization infrastructure are being hit from inside due to Advanced Targeted Attacks (Spear Phishing)
• Explore attack surface for these infrastructures for different configurations using open source implementations: OpenStack, Hadoop

• Create a highly reconfigurable implementation of the distributed infrastructure in a public cloud (OpenStack cloud in the cloud, Hadoop in the cloud)
• Evaluate attack surface for each configuration, evaluate open attack vectors

Usage: testbed [command] [options] [testbed-name]
    command     Description
    start       Start instances in a testbed
    stop        Stop instances in a testbed
    destroy     Destroy instances in a testbed
    list        List instances in a testbed
    describe    Describe testbed configuration (IP, vpc, etc)
    create      Create a new testbed
    configure   Configure an existing testbed
    ssh         Ssh to the controller nodes
    surface     Creates surface attack node
Eg: testbed create 5 –config openstack.conf openstacktestbed
Creates a VPC for the testbed in AWS cloud

[Figure: AWS Cloud containing a Virtual Private Cloud (10.0.0.0/24) with an Elastic IP]

• The attack surface is an enumeration of elements that could be utilized by an adversary to infiltrate the system
• Attack surface can be utilized as a security metric, and also to understand the possible attack vectors and reduce their risk.
• Currently evaluating three dimensions of the surface
  1. Enablers (open processes, files, …)
  2. Communication within the distributed system (exposed ports, protocols, …)
  3. Access rights

Howard, Michael, Jon Pincus, and Jeannette Wing. "Measuring relative attack surfaces." Computer Security in the 21st Century (2005): 109-137.
Create “snapshots” of system state – Windows Attack Surface Tool

[Figure: nodes in the Baseline, Configuration 1 and Configuration 2 each produce an attack surface report; the reports are compared to produce a difference report]

• Initial kickoff in Fall 2012 (OpenStack Summit in Oct 2012)
• Broad industry support and collaboration

• High interest expressed from participants at OpenStack summit in Fall 2012
• Initial work underway
• Aiming for v1 in Spring / Summer 2013
• Start small, grow with community involvement for future versions
• Attack Surface Modeling can help direct this security guide, providing a scientific basis for specific security recommendations.

We invite you to participate in the WG.
Contact:
Initiative Leads: Praveen Murthy, Fujitsu Labs of America; Bryan Payne, Nebula
Email: praveen.murthy@us.fujitsu.com

Srinivas Jaini, Kinetic Networks Inc.
Pratyusa K.
Manadhata, HP
Sarah Hendrickson, Dell

• Data analytics for security
• Privacy preserving/enhancing technologies
• Big data-scale crypto
• Cloud Attack Surface Reduction
• Policy and Governance: To address data governance challenges and contribute to development of standards in the areas of security and governance in big data technologies.
• Framework and Taxonomy: Define Big Data Framework & Taxonomy to (i) get a common understanding of Big Data terms & definitions and (ii) act as a structure to which all the Big Data Initiatives can be linked.
  (Policy and Governance and Framework and Taxonomy are two separate initiatives now, but may become one.)
• Top 10
• Legal Issues

Initiative 5: Policy and Governance is looking for volunteers to join.
• Kick-off meeting planned for the second week of March.
• Interested volunteers are encouraged to sign up on basecamp: https://cloudsecurityalliance.org/research/basecamp/
• Send email to Srinivas Jaini [srinivasjaini@gmail.com]

Sree Rajan, Fujitsu Labs
Arnab Roy, Fujitsu Labs
Alvaro Cardenas, UT Dallas
Jesus Molina, Molina Consulting
Praveen Murthy, Fujitsu Labs
Wilco van Ginkel, Verizon
Neel Sundaresan, eBay
Pratyusa Manadhata, HP Labs
Shiju Sathyadevan, Amrita University
Rongxing Lu, University of Waterloo
Adam Fuchs, Sqrrl
Yu Chen, SUNY Binghamton
Alan Lane, Securosis

Top 10 Challenges Identified by CSA BDWG
1) Secure computations in distributed programming frameworks
2) Security best practices for non-relational datastores
3) Secure data storage and transactions logs
4) End-point input validation/filtering
5) Real time security monitoring
6) Scalable and composable privacy-preserving data mining and analytics
7) Cryptographically enforced access control and secure communication
8) Granular access control
9) Granular audits
10) Data provenance
[Figure: the ten challenges placed on a big data architecture diagram. Recoverable labels: Data Storage; Public/Private/Hybrid Cloud. Challenge-number groups shown in the diagram: 4, 8, 9 · 1, 3, 5, 6, 7, 8, 9, 10 · 10 · 4, 10 · 2, 3, 5, 8, 9 · 5, 7, 8, 9]
Copyright 2013 FUJITSU LIMITED

A security and privacy challenge typically has three dimensions of difficulty:
• Modeling: formalizing a threat model that covers most of the cyber-attack or data-leakage scenarios
• Analysis: finding tractable solutions based on the threat model
• Implementation: implementing the solution in existing infrastructures.
Followed a three-step process to arrive at top challenges in big data:
• Interviewed Cloud Security Alliance members and surveyed security-practitioner oriented trade journals to draft an initial list of high priority security and privacy problems
• Studied published solutions
• Characterized a problem as a challenge if the proposed solution does not cover the problem scenarios.

Secure Computation in Distributed Programming Frameworks
How do we secure distributed frameworks which exploit parallelism in computation and storage?
Threats/Challenges | Current Mitigations
Malfunctioning compute worker nodes | Trust establishment: initiation, periodic trust update
Access to sensitive data | Mandatory access control
Privacy of output information | Privacy preserving transformations

Security Best Practices for Non-Relational Data Stores
How do we secure non-relational data stores which were not built with security in mind?
Threats/Challenges | Current Mitigations
Lack of stringent authentication and authorization mechanisms | Enforcement through middleware layer
Passwords should never be held in clear | Encrypted data at rest
Lack of secure communication between compute nodes | Protect communication using SSL/TLS

Secure Data Storage and Transaction Logs
How do we secure infrastructure for big data storage management?
Threats/Challenges | Current Mitigations
Data Confidentiality and Integrity | Encryption and Signatures
Availability | Proof of data possession
Consistency | Periodic audit and hash chains
Collusion | Policy based encryption

End-point Input Validation / Filtering
How can we trust data that is coming in from diverse endpoints like sensors, devices and applications?
Threats/Challenges | Current Mitigations
Adversary may tamper with device or software | Tamper-proof Software
Adversary may clone fake devices | Trust Certificate and Trusted Devices
Adversary may directly control source of data | Analytics to detect outliers
Adversary may compromise data in transmission | Cryptographic Protocols

Real-time Security Monitoring
How do we leverage big data analytics to help improve the security of systems?
Threats/Challenges | Current Mitigations
Security of the infrastructure | Discussed before
Security of the monitoring code itself | Secure coding practices
Security of the input sources | Discussed before
Adversary may cause data poisoning | Analytics to detect outliers

Scalable and Composable Privacy-Preserving Data Mining and Analytics
How do we leverage big data analytics to help improve the security of systems?
Threats/Challenges | Current Mitigations
Exploiting vulnerability at host | Encryption of data at rest, access control and authorization mechanisms
Insider threat | Separation of duty principles, clear policy for logging access to datasets
Outsourcing analytics to untrusted partners; unintended leakage through sharing of data | Awareness of re-identification issues, differential privacy

Cryptographically Enforced Data Centric Security
How do we enforce the protection of data end to end?
Threats/Challenges | Current Mitigations
Enforcing access control | Identity and Attribute-based encryptions
Search and filter | Encryption techniques supporting search and filter
Outsourcing of computation | Fully Homomorphic Encryption
Integrity of data and preservation of anonymity | Group signatures with trusted third parties

Granular Access Control
How do we control access to diverse datasets?
Threats/Challenges | Current Mitigations
Keeping track of secrecy requirements of individual data elements | Pick right level of granularity: row level, column level, cell level
Maintaining access labels across analytical transformations | At the minimum, conform to lattice of access restrictions.
More sophisticated data transforms are being considered in active research
Keeping track of roles and authorities of users | Authentication, authorization, mandatory access control

Granular Audits
How do we audit diverse and distributed systems?
Threats/Challenges:
• Completeness of audit information
• Timely access to audit information
• Integrity of audit information
• Authorized access to audit information
Current Mitigations:
• Infrastructure solutions as discussed before.
• Scaling of SIEM tools.

Data Provenance
How do we keep track of complex metadata?
Threats/Challenges | Current Mitigations
Secure collection of data | Authentication techniques
Consistency of data and metadata | Message digests
Insider threats | Access Control through systems and cryptography

Vivian Tero, Governance Risk & Compliance (GRC)

• Progress in the regulatory and legal track has been slow
• Difficult to find legal counsels conversant with, and willing to discuss, corporate compliance/risk management activities for their big data activities.
• Reached out to a couple of regulators (FTC and CFRB)
• Meetings planned within the next 3-4 weeks.
For more info about CSA CloudBytes: Top Challenges for Big Data
https://cloudsecurityalliance.org/research/big-data/

Help Us Secure Cloud Computing
www.cloudsecurityalliance.org
info@cloudsecurityalliance.org
LinkedIn: www.linkedin.com/groups?gid=1864210
Twitter: @cloudsa, @CSAResearchGuy

Most of our Research Projects are ideas from professionals like you
Do you have an idea for a research project on a cloud security topic? If so, please take the time to describe your concept by filling out our online form. This form is monitored by the CSA research team, who will review your proposal and respond to you with feedback.

Learn how you can participate in Cloud Security Alliance's goals to promote the use of best practices for providing security assurance within Cloud Computing
http://www.linkedin.com/groups?gid=1864210
https://cloudsecurityalliance.org/get-involved/