Cloudera Navigator Installation and User Guide

Cloudera, Inc.
220 Portage Avenue
Palo Alto, CA 94306
info@cloudera.com
US: 1-888-789-1488
Intl: 1-650-362-0488
www.cloudera.com

Important Notice

© 2010-2013 Cloudera, Inc. All rights reserved.

Cloudera, the Cloudera logo, Cloudera Impala, Impala, and any other product or service names or slogans contained in this document, except as otherwise disclaimed, are trademarks of Cloudera and its suppliers or licensors, and may not be copied, imitated or used, in whole or in part, without the prior written permission of Cloudera or the applicable trademark holder.

Hadoop and the Hadoop elephant logo are trademarks of the Apache Software Foundation. All other trademarks, registered trademarks, product names and company names or logos mentioned in this document are the property of their respective owners. Reference to any products, services, processes or other information, by trade name, trademark, manufacturer, supplier or otherwise does not constitute or imply endorsement, sponsorship or recommendation thereof by us.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Cloudera.

Cloudera may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Cloudera, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

The information in this document is subject to change without notice.
Cloudera shall not be liable for any damages resulting from technical errors or omissions which may be present in this document, or from use of this document.

Version: 1.0
Date: February 26, 2013

Contents

Introducing Cloudera Navigator
    Cloudera Navigator Architecture
    Service Versions and Audited Operations
        HDFS
        HBase
        Hive
        Hue
Cloudera Navigator 1.0.x Release Notes and FAQ
    New Features
    Known Issues in the Current Release
        The IP address in a Hue service audit log shows as "unknown".
        Hive service configuration impact on Hue service auditing
        Hive service configuration in Cloudera Navigator
    Frequently Asked Questions
        Is Cloudera Navigator a module of Cloudera Manager?
        Is Cloudera Navigator included as part of Cloudera Enterprise?
        Can Cloudera Navigator be purchased standalone, that is, without Cloudera Manager?
        What Cloudera Manager and CDH releases does Cloudera Navigator 1.0 work with?
        Is Cloudera Navigator open source or closed source?
        How are Cloudera Navigator logs different from Cloudera Manager logs?
        Will Cloudera Navigator be the tool to address all compliance issues in Hadoop?
Installing Cloudera Navigator
    New Installation
    Upgrading an Existing Cloudera Manager Installation
Configuring Navigator Server
Configuring Service Auditing
    Service Auditing Properties
    Configuring Service Auditing Properties
Audit Event Logs
    Viewing Audit Event Logs
    Filtering Events in Audit Event Logs
        Adding Filters
        Selecting a Time Range
        Removing Filters from Service Access Audit Event Logs
        Modifying Filters
    Downloading Audit Event Logs
Downloading HDFS Directory Access Permission Reports

This guide explains how to install, configure, and use Cloudera Navigator.
• Introducing Cloudera Navigator
• Cloudera Navigator 1.0.x Release Notes and FAQ
• Installing Cloudera Navigator
• Configuring Navigator Server
• Configuring Service Auditing
• Audit Event Logs
• Downloading HDFS Directory Access Permission Reports

Introducing Cloudera Navigator

Cloudera Navigator is the first fully integrated data management tool for the Hadoop platform. Cloudera Navigator 1.0 provides data governance capabilities such as verifying access privileges and auditing access to all data stored in Hadoop.
These capabilities are critical for enterprise customers that are in highly regulated industries and have stringent compliance requirements.

Cloudera Navigator tracks access permissions and actual accesses to all data objects in Hive, HBase, and HDFS to help answer questions such as: who has access to which data objects, which data objects were accessed by a user, when a data object was accessed and by whom, which data assets were accessed using a service, and which device was used for the access.

In the current release Cloudera Navigator supports tracking access to:
• HDFS data accessed through HDFS, Hive, and HBase operations
• Hive metadata

Cloudera Navigator allows administrators to configure, collect, and view audit events, to understand who accessed what data and how. The information in an audit event includes:
• Timestamp - The date and time of the access.
• Operation - The operation performed on the object. For example, list an HDFS directory, create a Hive table, or put an HBase object.
• Object accessed - The object that was accessed. For example, a Hive table, an HDFS file or directory, or an HBase table.
• User - The principal that accessed the object. Typically, this is a username. Where appropriate, this is annotated with the authentication mechanism.
• IP address - The address of the machine that accessed the object.
• Service - The service instance through which the data was accessed. For example, a Hive service instance.

Cloudera Navigator allows administrators to generate reports that list the HDFS access permissions granted to groups.

Cloudera Navigator Architecture

The architecture of Cloudera Navigator is illustrated below. Cloudera Navigator is implemented as an add-on to Cloudera Manager 4.5; all Cloudera Navigator functions (installation, configuration, and audit log review) are accessed through the Cloudera Manager Admin Console.
When Cloudera Navigator is installed, plug-ins that enable collection of audit events are added to each audited service. When data is accessed through a service for which auditing is enabled in Cloudera Navigator, audit events are generated and sent to the Navigator Server, which stores the events securely and durably in a database.

Service Versions and Audited Operations

This section describes the service versions and audited operations supported by Cloudera Navigator.

HDFS

Minimum supported CDH version: 4.0

The captured operations are:
• Operations that access or modify a file's or directory's data or metadata
• Operations denied due to lack of privileges

HBase

Minimum supported CDH version: 4.0

The captured operations are:
• Operations that require a privilege (except balance, balance switch, and append)
• Operations denied due to lack of privileges
• In CDH versions earlier than 4.2, for grant and revoke operations, the operation in log events is "ADMIN"
• In simple authentication mode, if the HBase Secure RPC Engine property is false (the default), the username in log events is "UNKNOWN"

Hive

Minimum supported CDH version: 4.2

The captured operations are:
• Operations (except grant, revoke, and metadata access only) sent to HiveServer2
• Operations denied due to lack of privileges are not captured
• Access via the Hive CLI is not supported
• In simple authentication mode, the username in log events is the username passed in the HiveServer2 connect command. If you do not pass a username in the connect command, the username in log events is "anonymous".

Hue

Minimum supported CDH version: 4.2

The captured operations are:
• Operations (except grant, revoke, and metadata access only) sent to Beeswax Server

You do not directly configure the Hue service for auditing.
Instead, when you configure the Hive service for auditing, operations sent to the Hive service through Beeswax appear in the Hue service audit log.

Cloudera Navigator 1.0.x Release Notes and FAQ

These release notes provide information on the new features, known issues, and frequently asked questions for Cloudera Navigator release 1.0.0.

New Features

• View access permissions for HDFS files
• Track and view audit information for Hive, HDFS, and HBase

Known Issues in the Current Release

The IP address in a Hue service audit log shows as "unknown".
The IP address in a Hue service audit log shows as "unknown".
Severity: Low
Anticipated Resolution: To be fixed in a future release.
Workaround: None.

Hive service configuration impact on Hue service auditing
If the audit configuration for a Hive service is changed, Beeswax must be restarted to pick up the change in the Hue service audit log.
Severity: Low
Anticipated Resolution: To be fixed in a future release.
Workaround: None.

Hive service configuration in Cloudera Navigator
For Hive services, Cloudera Navigator doesn't support the "shutdown" option for the "Queue Policy" property.
Severity: Low
Anticipated Resolution: To be fixed in a future release.
Workaround: None.

Frequently Asked Questions

Is Cloudera Navigator a module of Cloudera Manager?
Cloudera Navigator is a new product line that sits side-by-side with and complements Cloudera Manager. What Cloudera Manager is to managing services in CDH, Cloudera Navigator is to managing all the data stored in those CDH services.

Is Cloudera Navigator included as part of Cloudera Enterprise?
Cloudera Navigator can be purchased as an add-on to any of the Cloudera Enterprise products.

Can Cloudera Navigator be purchased standalone, that is, without Cloudera Manager?
Cloudera Navigator builds on top of Cloudera Manager.
Therefore, Cloudera Enterprise Core is a prerequisite for Cloudera Navigator support of HDFS and Hive, and Cloudera Enterprise RTD is a prerequisite for HBase support.

What Cloudera Manager and CDH releases does Cloudera Navigator 1.0 work with?
Cloudera Manager 4.5; CDH 4.0 for HDFS and HBase; CDH 4.2 for Hive.

Is Cloudera Navigator open source or closed source?
Cloudera Navigator is a closed-source management tool that adds to the Cloudera suite of management capabilities for Hadoop.

How are Cloudera Navigator logs different from Cloudera Manager logs?
Cloudera Navigator tracks and aggregates only the accesses to the data stored in CDH services; these logs are used for audit reports and analysis. Cloudera Manager monitors and logs all the activity performed by CDH services; these logs help administrators maintain the health of the cluster. The target audiences of these logs differ, but together they provide better visibility into both the data access and system activity for an enterprise cluster.

Will Cloudera Navigator be the tool to address all compliance issues in Hadoop?
Cloudera Navigator helps with compliance by tracking and providing a central dashboard to view accesses to all data in Hive, HDFS, and HBase. It is intended to complement existing cross-platform compliance tools by providing Hadoop-specific auditing information that can be imported into these tools.

Installing Cloudera Navigator

You can install Cloudera Navigator while installing Cloudera Manager for the first time or while upgrading an existing Cloudera Manager installation. When you install Cloudera Navigator you choose the database to store audit events. You can choose either an embedded PostgreSQL database or a standalone database. For information on setting up a standalone database, see Installing and Configuring Databases.

New Installation

1. Install Cloudera Manager following the instructions in the Cloudera Manager Installation Guide.
2. In the Add Cloudera Management Services area of the Choose the CDH4 services screen, check the Include Cloudera Navigator checkbox.

Upgrading an Existing Cloudera Manager Installation

1. Upgrade Cloudera Manager following the instructions in the Cloudera Manager Installation Guide.
2. Click the service in the Cloudera Management Services table.
3. Click the Instances tab.
4. Click the Add button.
5. Choose a host and select the Navigator Server radio button.
6. Click Continue.
7. Choose a database option and click Test Connection to verify the availability of the database.
8. Click Continue.
9. Click Accept.
10. Check the checkbox next to the navigator role.
11. Select Actions for Selected > Start.
12. Click Start.
13. Click Close.
14. Restart all audited services for auditing to go into effect.

Configuring Navigator Server

To configure Navigator Server, do one of the following:
• In the Cloudera Management Services table:
  a. Click the Navigator Server role.
  b. Select Configuration > View and Edit.
  c. Expand the navigator category and optionally choose a subcategory.
  d. Configure the server and click Save Changes.
• Select Services > Cloudera Management Service.
  a. Select Configuration > View and Edit.
  b. Expand the Navigator Server category and optionally choose a subcategory.
  c. Configure the server and click Save Changes.

For detailed information on service configuration, see Modifying Service Configurations and Configuring Monitoring Settings.

Configuring Service Auditing

You can configure services to:
• Enable and disable auditing
• Exclude and include auditing of files and directories, users, and tables
• Coalesce auditing events based on operation attributes (time, operation name), user attributes (username), and object attributes (path, table name, and so on)
• Specify what action to take when the audit event queue is full

Service Auditing Properties

Each service that supports auditing configuration has the following properties:
• Enable collection - A flag to enable collection of audit events.
• Event filter - A set of rules that capture properties of auditable events and actions to be performed when an event matches those properties.
• Event tracker - A set of rules for tracking and coalescing events.
• Queue policy - The action to take when the audit event queue is full. When the queue is full and the queue policy of the service is Shutdown, N audit events are discarded before the service shuts down, where N is the size of the Cloudera Navigator Server queue.

The Event Filter and Event Tracker rules for filtering and coalescing events are expressed as JSON objects. For information on the structure of the objects, see the description on the configuration screen.

Configuring Service Auditing Properties

To configure service auditing properties:
1. Click an HDFS, HBase, or Hive service.
2. Select Configuration > View and Edit.
3. Click the Cloudera Navigator category. The Service-Wide properties display.
4. Edit the properties and click Save Changes.

Audit Event Logs

In Cloudera Manager, audit event logs display service and role life cycle events recorded by Cloudera Manager management services, and service access events recorded by Cloudera Navigator. For information on the former, see Viewing the Audit History.

Viewing Audit Event Logs

You can view audit event logs for all services or for a specific service.

To view the audit event log for all services:
1. Click Audits in the banner.

To view the audit event log for a service:
1. Click an HDFS, HBase, Hive, or Hue service.
2. Click the Audits tab.

• When you mouse over a Hive or Hue service event, a popup displays the query that generated the event.
• Events that represent denied access are labeled Denied and have a pink background.

Filtering Events in Audit Event Logs

You filter events by adding filters and selecting a time range.

Adding Filters

To add a filter to an audit event log, do one of the following:
• Click an event property value link in an audit event. A filter containing the property and its value is added to the list of filters at the left and the audit log redisplays all events that match the filter. For example, clicking the Create Table link in the following audit log added the Create Table filter and filtered the log to show only those events.
• Click Add Filter to the left of the log. A filter control is added to the list of filters.
  a. Choose an event property in the property drop-down list.
  b. Choose an operator in the operator drop-down list.
  c. Type an event property value in the value text field. If you use the LIKE operator, specify combinations of literal strings and '%' in the value field. For example, the value 'THE%S' matches THEMOVIES and THEUSERS.
  d. Do one of the following:
     • Click Search. A filter containing the property, operator, and value is added to the list of filters at the left and the audit log redisplays all events that match the filter.
     • Click Add Another. A filter containing the property and its value is added to the list of filters at the left, the audit log redisplays all events that match the filter, and another filter control is added to the list of filters.

Selecting a Time Range

To specify a time range, do one of the following:
• Click a time range link on the left of the audit log.
• Specify a time range using the Time Range Selector. For information on the Time Range Selector, see Selecting a Time Range.

The audit log redisplays all events that match the time range.

Removing Filters from Service Access Audit Event Logs

To remove a filter from an audit event log:
1. Click the remove icon at the right of the filter. The filter is removed and the audit log redisplays all audit events that match the remaining filters. If there are no filters, the audit log displays all events.

Modifying Filters

To modify a filter:
1. Click the filter. The filter expands into separate property, operator, and value fields.
2. Modify the value of one or more fields.
3. Click Search. A filter containing the property, operator, and value is added to the list of filters at the left and the audit log redisplays all events that match the filter.

Downloading Audit Event Logs

1. Specify desired filters and time range.
2. Click the Download CSV button to the left of the audit log. A file with the following fields is downloaded: service, username, command, ipAddress, resource, allowed, timestamp.

The structure of the resource field depends on the type of the service as follows:
o HDFS - A file path.
o Hive and Hue - <database>:<tablename>
o HBase - <table> <family>:<qualifier>

Here is an example of an HDFS service audit log:

    service,username,command,ipAddress,resource,allowed,timestamp
    hdfs1,cloudera,setPermission,10.20.187.242,/user/hive,false,"2013-02-09T00:59:34.430Z"
    hdfs1,cloudera,getfileinfo,10.20.187.242,/user/cloudera,true,"2013-02-09T00:59:22.667Z"
    hdfs1,cloudera,getfileinfo,10.20.187.242,/,true,"2013-02-09T00:59:22.658Z"

In this example, access was denied for the first event, so its "allowed" property has the value "false".

Downloading HDFS Directory Access Permission Reports

For each HDFS service you can download a report that details the HDFS directories a group has permission to access.

To download a directory access permission report:
1. In Cloudera Manager, click Reports.
2. In the Directory Access by Group row, click CSV or XLS. The Download User Access Report pop-up displays.
   a. In the pop-up, type a group and directory.
   b. Click Download.
A report of the selected type is generated. For each directory within the specified directory that the specified group can access, the report lists the path, owner, permissions, and size.
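
Downloaded CSV files can be processed outside Cloudera Manager. As a minimal sketch, the following Python script parses an audit event log with the fields listed under Downloading Audit Event Logs and picks out the denied events. The sample rows are the HDFS example from that section; the helper names read_events and denied are hypothetical and are not part of Cloudera Navigator, and the script assumes the "allowed" column contains only "true" or "false".

```python
import csv
import io

# Sample rows taken from the HDFS audit log example in this guide.
SAMPLE = """\
service,username,command,ipAddress,resource,allowed,timestamp
hdfs1,cloudera,setPermission,10.20.187.242,/user/hive,false,"2013-02-09T00:59:34.430Z"
hdfs1,cloudera,getfileinfo,10.20.187.242,/user/cloudera,true,"2013-02-09T00:59:22.667Z"
hdfs1,cloudera,getfileinfo,10.20.187.242,/,true,"2013-02-09T00:59:22.658Z"
"""


def read_events(text):
    """Parse audit CSV text into dicts; 'allowed' becomes a bool.

    Hypothetical helper for illustration only.
    """
    events = []
    for row in csv.DictReader(io.StringIO(text)):
        # Assumption: the column holds the strings "true" or "false".
        row["allowed"] = row["allowed"].strip().lower() == "true"
        events.append(row)
    return events


def denied(events):
    """Return only the events whose access was denied."""
    return [e for e in events if not e["allowed"]]


events = read_events(SAMPLE)
denied_events = denied(events)

# Print one line per denied access.
for e in denied_events:
    print(e["timestamp"], e["username"], e["command"], e["resource"])
```

To process a real download, read the file contents and pass them to read_events in place of SAMPLE. For Hive, Hue, and HBase events the resource value follows the service-specific structure described earlier rather than a file path.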