HDP with Advanced Security
Comprehensive Security for Enterprise Hadoop
Hortonworks. We do Hadoop.
© Hortonworks Inc. 2014

Agenda
• Our approach across security pillars
• Component Deep Dive
• Questions

Security needs are changing
• YARN unlocks the data lake
• Multi-tenant: Multiple applications for data access
• Changing and complex compliance environment
• ETL of non-sensitive data can yield sensitive data
• Fall 2013: Largely siloed deployments with single-workload clusters
• Summer 2014: 65% of clusters host multiple workloads

5 areas of security focus
• Administration: Central management & consistent security
• Authentication: Authenticate users and systems
• Authorization: Provision access to data
• Audit: Maintain a record of data access
• Data Protection: Protect data at rest and in motion

Security in Hadoop with HDP + Argus (XA Secure)
Centralized Security Administration
• Authentication (Who am I/prove it?)
  – HDP 2.1: Kerberos in native Apache Hadoop; HTTP/REST API secured with Apache Knox Gateway
  – Argus: As-is, works with current authentication methods
• Authorization (Restrict access to explicit data)
  – HDP 2.1: HDFS permissions, HDFS ACLs, Hive ATZ-NG
  – Argus: HDFS, Hive and HBase; fine grain access control; RBAC
• Audit (Understand who did what)
  – HDP 2.1: Audit logs in HDFS & MR
  – Argus: Centralized audit reporting; policy and access history
• Data Protection (Encrypt data at rest & in motion)
  – HDP 2.1: Wire encryption in Hadoop; open source initiatives; partner solutions
  – Argus: Future integration

Map to Nevada Energy Requirements
Question: HDP Security Component
End User Security
• LDAP integration: Kerberos, Argus (XA)
• Group level access: Argus (XA)
• Multiple levels of access: Argus (XA)
• Multiple environments: Argus (XA)
Developer Security
• Access control for creating tables: Argus (XA)
• Limits on creating schemas and folders: Argus (XA)

Security Features: HDP w/ Advanced Security
Authentication
• Kerberos support ✔
• Perimeter security for services and REST API ✔
Authorization
• Fine grained access control: HDFS, HBase and Hive
• Role based access control ✔
• Column level ✔
• Permission support: Create, Drop, Index, Lock, User
Auditing
• Resource access auditing: Extensive auditing
• Policy auditing ✔
Data Protection
• Wire encryption ✔
• Volume encryption ✔
• File/column encryption: Partners
Reporting
• Global view of policies and audit data ✔
Manage
• Global policy manager, Web UI ✔
• Delegated administration ✔
• User/group mapping ✔

Authentication w/ Kerberos

Kerberos Primer
Participants: Client (with its Kerberos ticket cache), KDC, NameNode (NN), DataNode (DN)
1. kinit: Log in and get a Ticket Granting Ticket (TGT) from the KDC
2. Client stores the TGT in its ticket cache
3. Get a NameNode Service Ticket (NN-ST) from the KDC
4. Client stores the NN-ST in its ticket cache
5. Read/write file on the NN given the NN-ST and file name; returns block locations, block IDs and Block Access Tokens if access is permitted
6. Read/write block on the DN given the Block Access Token and block ID

Kerberos Summary
• Provides strong authentication
• Establishes identity for users, services and hosts
• Prevents unauthorized impersonation of accounts
• Supports token delegation model
• Works with existing directory services
• Basis for authorization

User Management
• Most customers use LDAP for user info
  – LDAP guarantees that user information is consistent across the cluster
  – An easy way to manage users & groups
  – The standard user-to-group mapping comes from the OS on the NameNode
• Kerberos provides authentication
  – PAM can automatically log the user into Kerberos
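
The six steps on the Kerberos Primer slide can be traced in a toy, in-memory model. This is only a sketch: it uses no real cryptography, and every class and function name (`ToyKDC`, `ToyNameNode`, `ToyDataNode`, `read_file`) is hypothetical rather than part of Hadoop or any Kerberos library. In a real deployment, tickets are encrypted and each service validates them locally with its own key instead of asking the KDC.

```python
from dataclasses import dataclass
import secrets


@dataclass(frozen=True)
class Ticket:
    principal: str  # who the ticket was issued to
    service: str    # "krbtgt" for a TGT, otherwise the target service
    token: str      # opaque stand-in for the encrypted ticket body


class ToyKDC:
    """Stand-in for the KDC; remembers issued tickets instead of encrypting them."""

    def __init__(self, passwords):
        self._passwords = passwords
        self._issued = set()

    def login(self, principal, password):
        # Step 1: kinit, returns a Ticket Granting Ticket (TGT).
        if self._passwords.get(principal) != password:
            raise PermissionError("bad credentials")
        return self._issue(principal, "krbtgt")

    def service_ticket(self, tgt, service):
        # Step 3: exchange a valid TGT for a service ticket such as the NN-ST.
        if tgt.service != "krbtgt" or tgt not in self._issued:
            raise PermissionError("invalid TGT")
        return self._issue(tgt.principal, service)

    def is_valid(self, ticket, service):
        # Toy shortcut: a real service checks the ticket with its own key.
        return ticket in self._issued and ticket.service == service

    def _issue(self, principal, service):
        ticket = Ticket(principal, service, secrets.token_hex(8))
        self._issued.add(ticket)
        return ticket


class ToyNameNode:
    def __init__(self, kdc, files, acl):
        self._kdc = kdc
        self._files = files      # file name -> list of block IDs
        self._acl = acl          # file name -> set of permitted principals
        self.block_tokens = {}   # block access token -> block ID

    def open(self, nn_st, name):
        # Step 5: check the NN-ST and permissions, then hand out block tokens.
        if not self._kdc.is_valid(nn_st, "nn"):
            raise PermissionError("invalid NN-ST")
        if nn_st.principal not in self._acl.get(name, set()):
            raise PermissionError("access denied")
        grants = []
        for block_id in self._files[name]:
            token = secrets.token_hex(8)
            self.block_tokens[token] = block_id
            grants.append((block_id, token))
        return grants


class ToyDataNode:
    def __init__(self, namenode, blocks):
        self._nn = namenode      # toy shortcut for the secret shared by NN and DN
        self._blocks = blocks    # block ID -> bytes

    def read_block(self, block_id, token):
        # Step 6: serve the block only for a matching Block Access Token.
        if self._nn.block_tokens.get(token) != block_id:
            raise PermissionError("invalid block access token")
        return self._blocks[block_id]


def read_file(kdc, nn, dn, principal, password, name):
    cache = {}                                                # ticket cache
    cache["tgt"] = kdc.login(principal, password)             # steps 1-2
    cache["nn"] = kdc.service_ticket(cache["tgt"], "nn")      # steps 3-4
    grants = nn.open(cache["nn"], name)                       # step 5
    return b"".join(dn.read_block(b, t) for b, t in grants)   # step 6
```

Note that the client password is used only in step 1. Every later request carries a ticket or token, which is why Kerberos can serve as the basis for authorization and token delegation.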

Kerberos + Active Directory
• AD / LDAP (user store): Use existing directory tools to manage users
  – Users: smith@EXAMPLE.COM
• KDC: Use Kerberos tools to manage host + service principals
  – Hosts: host1@HADOOP.EXAMPLE.COM
  – Services: hdfs/host1@HADOOP.EXAMPLE.COM
• Cross-realm trust between AD / LDAP and the KDC
• Client authentication to the Hadoop cluster

Knox Gateway Overview
Perimeter REST API Security

What does Perimeter Security really mean?
• Firewall required at perimeter (today)
• Knox Gateway controls all Hadoop REST API access through the firewall
• Firewall only allows connections through specific ports from the Knox host
• Hadoop cluster mostly unaffected
Diagram: User, REST API, Firewall, Gateway, Firewall, REST API, Hadoop Services

Why Knox?
Enhanced Security
• Protect network details
• Partial SSL for non-SSL services
• WebApp vulnerability filter
Centralized Control
• Central REST API auditing
• Service-level authorization
• Alternative to SSH “edge node”
Simplified Access
• Kerberos encapsulation
• Extends API reach
• Single access point
• Multi-cluster support
• Single SSL certificate
Enterprise Integration
• LDAP integration
• Active Directory integration
• SSO integration
• Apache Shiro extensibility
• Custom extensibility

Current Hadoop Client Model
• FileSystem and MapReduce Java APIs
• HDFS, Pig, Hive and Oozie clients (that wrap the Java APIs)
• Typical use of the APIs is via an “Edge Node” that is “inside” the cluster
• Users SSH to the Edge Node and execute API commands from a shell
Diagram: User, SSH, Edge Node, Hadoop

Hadoop REST APIs
Service: API
• WebHDFS: Supports HDFS user operations including reading files, writing to files, making directories, changing permissions and renaming
• WebHCat: Job control for MapReduce, Pig and Hive jobs, and HCatalog DDL commands
• Hive: Hive REST API operations, JDBC/ODBC over HTTP
• HBase: HBase REST API operations
• Oozie: Job submission and management, and Oozie administration
Useful for connecting to Hadoop from outside the cluster

Hadoop REST API Security: Drill-Down
Diagram components:
• REST Client
• Firewall, DMZ, Firewall
• In the DMZ: HTTP load balancer (LB) and Knox Gateway instances (GW)
• Hadoop Cluster 1 and Hadoop Cluster 2, each with Masters (NN, RM, WebHCat, Oozie, HBase, HS2) and Slaves (DN, NM)
• Edge Node/Hadoop CLIs (RPC)
• Enterprise Identity Provider (LDAP/AD), reached over LDAP

Authorization and Auditing

Authorization and Audit
Authorization: Fine grain access control
• HDFS – Folder, File
• Hive – Database, Table, Column
• HBase – Table, Column Family, Column
Flexibility in defining policies
Control access into the system
Audit: Extensive user access auditing in HDFS, Hive and HBase
• IP address
• Resource type / resource
• Timestamp
• Access granted or denied

Central Security Administration
HDP Advanced Security
• Delivers a ‘single pane of glass’ for the security administrator
• Centralizes administration of security policy
• Ensures consistent coverage across the entire Hadoop stack

Set Up Authorization Policies
• File level access control, flexible definition
• Control permissions

Monitor through Auditing

Authorization and Auditing w/ XA
Diagram components:
• Enterprise Users
• XA Administration Portal, with XA Policy Server and XA Audit Server
• RDBMS and HDFS
• Integration API for Legacy Tools
• Hadoop components with an XA Plugin: HBase, Hive Server2, Hadoop distributed file system (HDFS)
• Hadoop components with an XA Plugin*: Knox, Storm, Falcon (* Future integration)
• YARN: Data Operating System
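
The combination of fine grain authorization and auditing described in this section can be sketched as a policy check that always leaves an audit record. This is a minimal illustration under stated assumptions, not XA Secure's actual data model or API: the `Policy` and `AuditRecord` classes, the `check_access` function and the glob-style resource patterns are all hypothetical. The audit fields mirror the ones listed on the slide (IP address, resource type and resource, timestamp, access granted or denied).

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Policy:
    resource_type: str      # "hdfs", "hive" or "hbase"
    resource: str           # glob pattern, e.g. "sales/orders/*" (assumed syntax)
    groups: frozenset       # groups the policy applies to
    permissions: frozenset  # access types granted, e.g. {"select"}


@dataclass(frozen=True)
class AuditRecord:
    user: str
    ip_address: str
    resource_type: str
    resource: str
    access: str
    timestamp: str
    granted: bool


def check_access(policies, audit_log, user, user_groups, ip_address,
                 resource_type, resource, access):
    """Return True if any policy grants the request; always append an audit record."""
    granted = any(
        p.resource_type == resource_type
        and fnmatchcase(resource, p.resource)
        and access in p.permissions
        and bool(p.groups & user_groups)
        for p in policies
    )
    audit_log.append(AuditRecord(
        user=user,
        ip_address=ip_address,
        resource_type=resource_type,
        resource=resource,
        access=access,
        timestamp=datetime.now(timezone.utc).isoformat(),
        granted=granted,
    ))
    return granted
```

For example, a policy on `"sales/orders/*"` for the group `analysts` with the permission `select` would allow an analyst to read the Hive column `sales/orders/amount` but deny an `update` on it, and both attempts would appear in `audit_log`. Recording denials as well as grants is what makes the log useful for compliance reporting.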

Data Protection
HDP allows you to apply data protection policy at three different layers across the Hadoop stack
Layer: What? / How?
• Storage: Encrypt data while it is at rest / Partners, OS-level encryption, custom code
• Transmission: Encrypt data as it moves / Supported in HDP 2.1
• Upon Access: Apply restrictions when accessed / Partners, open source initiatives

Points of Communication
Client to Hadoop cluster:
1. WebHDFS
2. DataTransferProtocol
3. RPC
4. JDBC/ODBC
Between nodes:
2. DataTransfer
3. RPC
4. M/R Shuffle

Data Transmission Protection in HDP 2.1
• WebHDFS
  – Provides read/write access to HDFS
  – Optionally enable HTTPS
  – Authenticated using SPNEGO (Kerberos for HTTP) filter
  – SSL based wire encryption
• RPC
  – Communications between NNs, DNs, etc. and clients
  – SASL based wire encryption
  – DTP encryption with SASL
• JDBC/ODBC
  – SSL based wire encryption
  – SASL based encryption also available
• Shuffle
  – Mapper to Reducer over HTTP(S) with SSL

Data Storage Protection
• Encrypt at the physical file system level (e.g. dm-crypt)
• Encrypt via custom HDFS “compression” codec
• Encrypt at application level (including security service/device)
Diagram: ETL sends plaintext (ABC) to a Security Service (Partner) to ENCRYPT; ciphertext (1a3d) is stored in HDFS; the App calls DECRYPT to recover ABC

Current Open Source Initiatives
• HDFS Encryption
  – Transparent encryption of data at rest in HDFS via encryption zones. Being worked on in the community
  – Dependency on Key Management Server and Keyshell
• Key Management Server
• Key Provider API
• Hive Column Level Encryption
• HBase Column Level Encryption
  – Transparent column encryption, needs more testing/validation
• Command line key operations

Resources

Security Page
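
As a rough companion to the Data Transmission Protection slide, the sketch below maps each communication channel to a configuration property that turns on wire encryption, and renders the properties as Hadoop-style site XML. The property names are standard Hadoop and Hive settings, but the values are only one plausible choice and should be checked against the documentation for the version in use. The `WIRE_ENCRYPTION` table and `render_site_xml` helper are hypothetical names introduced for illustration.

```python
from xml.etree import ElementTree as ET

# Channel labels follow the slide; values are illustrative assumptions,
# not a recommended or complete configuration.
WIRE_ENCRYPTION = {
    "core-site.xml": {
        "hadoop.rpc.protection": "privacy",        # RPC: SASL with encryption
    },
    "hdfs-site.xml": {
        "dfs.encrypt.data.transfer": "true",       # DataTransferProtocol
        "dfs.http.policy": "HTTPS_ONLY",           # WebHDFS over HTTPS
    },
    "mapred-site.xml": {
        "mapreduce.shuffle.ssl.enabled": "true",   # M/R shuffle over HTTPS
    },
    "hive-site.xml": {
        "hive.server2.use.SSL": "true",            # JDBC/ODBC to HiveServer2
    },
}


def render_site_xml(properties):
    """Render a dict of properties as a <configuration> document string."""
    root = ET.Element("configuration")
    for name, value in sorted(properties.items()):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    for filename, properties in WIRE_ENCRYPTION.items():
        print(filename)
        print(render_site_xml(properties))
```

SSL-based channels also need keystores and truststores configured on each node, which this sketch leaves out.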

Hortonworks Security Investment Plans
Investment themes
HDP + XA: Comprehensive Security for Enterprise Hadoop
Goals:
• Comprehensive Security: Meet all security requirements across Authentication, Authorization, Audit & Data Protection for all HDP components
• Central Administration: Provide one location for administering security policies and audit reporting for the entire platform
• Consistent Integration: Integrate with other security & identity management systems, for compliance with IT policies
…all IN Hadoop

Previous Phases
• Kerberos Authentication
• HDFS, Hive & HBase authorization
• Wire Encryption for data in motion
• Knox for perimeter security
• Basic Audit in HDFS & MR
• SQL Style Hive Authorization
• ACLs for HDFS

XA Secure Phase
• Centralized Security Admin for HDFS, Hive & HBase
• Centralized Audit Reporting
• Delegated Policy Administration

Future Phases
• Encryption in HDFS, Hive & HBase
• Centralized security administration of entire Hadoop platform
• Centralized auditing of entire platform
• Expand Authentication & SSO integration choices
• Tag based global policies (e.g. Policy for PII)