Cloudera Navigator Installation and User Guide

Cloudera, Inc.
220 Portage Avenue
Palo Alto, CA 94306
info@cloudera.com
US: 1-888-789-1488
Intl: 1-650-362-0488
www.cloudera.com
Important Notice
© 2010-2013 Cloudera, Inc. All rights reserved.
Cloudera, the Cloudera logo, Cloudera Impala, Impala, and any other product or service names or
slogans contained in this document, except as otherwise disclaimed, are trademarks of Cloudera and its
suppliers or licensors, and may not be copied, imitated or used, in whole or in part, without the prior
written permission of Cloudera or the applicable trademark holder.
Hadoop and the Hadoop elephant logo are trademarks of the Apache Software Foundation. All other
trademarks, registered trademarks, product names and company names or logos mentioned in this
document are the property of their respective owners. Reference to any products, services, processes or
other information, by trade name, trademark, manufacturer, supplier or otherwise does not constitute
or imply endorsement, sponsorship or recommendation thereof by us.
Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights
under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval
system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or
otherwise), or for any purpose, without the express written permission of Cloudera.
Cloudera may have patents, patent applications, trademarks, copyrights, or other intellectual property
rights covering subject matter in this document. Except as expressly provided in any written license
agreement from Cloudera, the furnishing of this document does not give you any license to these
patents, trademarks, copyrights, or other intellectual property.
The information in this document is subject to change without notice. Cloudera shall not be liable for
any damages resulting from technical errors or omissions which may be present in this document, or
from use of this document.
Version: 1.0
Date: February 26, 2013
Contents
INTRODUCING CLOUDERA NAVIGATOR
  CLOUDERA NAVIGATOR ARCHITECTURE
  SERVICE VERSIONS AND AUDITED OPERATIONS
    HDFS
    HBase
    Hive
    Hue
CLOUDERA NAVIGATOR 1.0.X RELEASE NOTES AND FAQ
  NEW FEATURES
  KNOWN ISSUES IN THE CURRENT RELEASE
    The IP address in a Hue service audit log shows as "unknown".
    Hive service configuration impact on Hue service auditing
    Hive service configuration in Cloudera Navigator
  FREQUENTLY ASKED QUESTIONS
    Is Cloudera Navigator a module of Cloudera Manager?
    Is Cloudera Navigator included as part of Cloudera Enterprise?
    Can Cloudera Navigator be purchased standalone, that is, without Cloudera Manager?
    What Cloudera Manager and CDH releases does Cloudera Navigator 1.0 work with?
    Is Cloudera Navigator open source or closed source?
    How are Cloudera Navigator logs different from Cloudera Manager logs?
    Will Cloudera Navigator be the tool to address all compliance issues in Hadoop?
INSTALLING CLOUDERA NAVIGATOR
  NEW INSTALLATION
  UPGRADING AN EXISTING CLOUDERA MANAGER INSTALLATION
CONFIGURING NAVIGATOR SERVER
CONFIGURING SERVICE AUDITING
  SERVICE AUDITING PROPERTIES
  CONFIGURING SERVICE AUDITING PROPERTIES
AUDIT EVENT LOGS
  VIEWING AUDIT EVENT LOGS
  FILTERING EVENTS IN AUDIT EVENT LOGS
    Adding Filters
    Selecting a Time Range
    Removing Filters from Service Access Audit Event Logs
    Modifying Filters
  DOWNLOADING AUDIT EVENT LOGS
DOWNLOADING HDFS DIRECTORY ACCESS PERMISSION REPORTS
This guide explains how to install, configure, and use Cloudera Navigator.
• Introducing Cloudera Navigator
• Cloudera Navigator 1.0.x Release Notes and FAQ
• Installing Cloudera Navigator
• Configuring Navigator Server
• Configuring Service Auditing
• Audit Event Logs
• Downloading HDFS Directory Access Permission Reports
Introducing Cloudera Navigator
Cloudera Navigator is the first fully integrated data management tool for the Hadoop platform. Cloudera
Navigator 1.0 provides data governance capabilities such as verifying access privileges and auditing
access to all data stored in Hadoop. These capabilities are critical for enterprise customers that are in
highly regulated industries and have stringent compliance requirements.
Cloudera Navigator tracks access permissions and actual accesses to all data objects in Hive, HBase, and
HDFS to help answer questions such as: who has access to which data objects, which data objects were
accessed by a user, when a data object was accessed and by whom, what data assets were accessed using
a service, and which device was used to access them. In the current release, Cloudera Navigator
supports tracking access to:
• HDFS data accessed through HDFS, Hive, and HBase operations
• Hive metadata
Cloudera Navigator allows administrators to configure, collect, and view audit events so that they can
understand who accessed what data and how. The information in an audit event includes:
• Timestamp - The date and time of the access.
• Operation - The operation performed on the object. For example, list an HDFS directory, create
  a Hive table, or put an HBase object.
• Object accessed - The object that was accessed. For example, a Hive table, an HDFS file or
  directory, or an HBase table.
• User - The principal that accessed the object. Typically, this is a username. Where appropriate,
  this is annotated with the authentication mechanism.
• IP address - The address of the machine that accessed the object.
• Service - The service instance through which the data was accessed. For example, a Hive service
  instance.
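To show how these fields combine to answer the questions listed earlier, here is a minimal sketch that models an audit event as a Python dataclass and answers two of them: who accessed a given object, and which objects a given user accessed. The class `AuditEvent`, its attribute names, and the helper functions are hypothetical and are not part of any Cloudera Navigator API; the sample values are taken from the HDFS audit log example later in this guide.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Set


@dataclass(frozen=True)
class AuditEvent:
    """Hypothetical record with one attribute per audit event field."""
    timestamp: datetime  # date and time of the access
    operation: str       # operation performed, e.g. "getfileinfo"
    obj: str             # object accessed, e.g. an HDFS path or a Hive table
    user: str            # principal that accessed the object
    ip_address: str      # address of the machine that accessed the object
    service: str         # service instance, e.g. "hdfs1"


def who_accessed(events: Iterable[AuditEvent], obj: str) -> Set[str]:
    """Return the users that accessed the given object."""
    return {e.user for e in events if e.obj == obj}


def objects_accessed_by(events: Iterable[AuditEvent], user: str) -> Set[str]:
    """Return the objects that the given user accessed."""
    return {e.obj for e in events if e.user == user}


if __name__ == "__main__":
    utc = timezone.utc
    events = [
        AuditEvent(datetime(2013, 2, 9, 0, 59, 22, 667000, tzinfo=utc),
                   "getfileinfo", "/user/cloudera", "cloudera", "10.20.187.242", "hdfs1"),
        AuditEvent(datetime(2013, 2, 9, 0, 59, 22, 658000, tzinfo=utc),
                   "getfileinfo", "/", "cloudera", "10.20.187.242", "hdfs1"),
    ]
    print(who_accessed(events, "/user/cloudera"))           # {'cloudera'}
    print(sorted(objects_accessed_by(events, "cloudera")))  # ['/', '/user/cloudera']
```

A frozen dataclass is used so that events are immutable and hashable, which matches the read-only nature of an audit record.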
Cloudera Navigator allows administrators to generate reports that list the HDFS access permissions
granted to groups.
Cloudera Navigator Architecture
The architecture of Cloudera Navigator is illustrated below.
Cloudera Navigator is implemented as an add-on to Cloudera Manager 4.5; all Cloudera Navigator
functions (installation, configuration, and audit log review) are accessed through the Cloudera Manager
Admin Console.
When Cloudera Navigator is installed, plug-ins that enable collection of audit events are added to each
audited service. When data is accessed through a service for which Cloudera Navigator auditing is
enabled, audit events are generated and sent to the Navigator Server, which stores the events
securely and durably in a database.
Service Versions and Audited Operations
This section describes the service versions and audited operations supported by Cloudera Navigator.
HDFS
Minimum supported CDH version: 4.0
The captured operations are:
• Operations that access or modify a file's or directory's data or metadata
• Operations denied due to lack of privileges
HBase
Minimum supported CDH version: 4.0
The captured operations are:
• Operations that require a privilege (except balance, balance switch, and append)
• Operations denied due to lack of privileges
• In CDH versions earlier than 4.2, the operation recorded in log events for grant and revoke
  operations is "ADMIN".
• In simple authentication mode, if the HBase Secure RPC Engine property is false (the
  default), the username in log events is "UNKNOWN".
Hive
Minimum supported CDH version: 4.2
The captured operations are:
• Operations (except grant, revoke, and metadata access only) sent to HiveServer2
• Operations denied due to lack of privileges are not captured.
• Access through the Hive CLI is not supported.
• In simple authentication mode, the username in log events is the username passed in the
  HiveServer2 connect command. If you do not pass a username in the connect command,
  the username in log events is "anonymous".
Hue
Minimum supported CDH version: 4.2
The captured operations are:
• Operations (except grant, revoke, and metadata access only) sent to the Beeswax Server
You do not directly configure the Hue service for auditing. Instead, when you configure the Hive
service for auditing, operations sent to the Hive service through Beeswax appear in the Hue service
audit log.
Cloudera Navigator 1.0.x Release Notes and FAQ
These release notes provide information on the new features, known issues, and frequently asked
questions for Cloudera Navigator release 1.0.0.
New Features
• View access permissions for HDFS files
• Track and view audit information for Hive, HDFS, and HBase
Known Issues in the Current Release
— The IP address in a Hue service audit log shows as "unknown".
The IP address in a Hue service audit log shows as "unknown".
Severity: Low
Anticipated Resolution: To be fixed in a future release.
Workaround: None.
— Hive service configuration impact on Hue service auditing
If the audit configuration of a Hive service is changed, Beeswax must be restarted before the change
takes effect in the Hue service audit log.
Severity: Low
Anticipated Resolution: To be fixed in a future release.
Workaround: None.
— Hive service configuration in Cloudera Navigator
For Hive services, Cloudera Navigator doesn't support the "shutdown" option for the "Queue Policy"
property.
Severity: Low
Anticipated Resolution: To be fixed in a future release.
Workaround: None.
Frequently Asked Questions
Is Cloudera Navigator a module of Cloudera Manager?
Cloudera Navigator is a new product line that sits side-by-side with and complements Cloudera
Manager. What Cloudera Manager is to managing services in CDH, Cloudera Navigator is to managing all
the data stored in those CDH services.
Is Cloudera Navigator included as part of Cloudera Enterprise?
Cloudera Navigator can be purchased as an add-on to any of the Cloudera Enterprise products.
Can Cloudera Navigator be purchased standalone, that is, without Cloudera Manager?
Cloudera Navigator builds on top of Cloudera Manager. Therefore, Cloudera Enterprise Core is a
prerequisite for Cloudera Navigator support of HDFS and Hive, and Cloudera Enterprise RTD is a
prerequisite for HBase support.
What Cloudera Manager and CDH releases does Cloudera Navigator 1.0 work with?
Cloudera Manager 4.5; CDH 4.0 or later for HDFS and HBase; CDH 4.2 or later for Hive and Hue.
Is Cloudera Navigator open source or closed source?
Cloudera Navigator is a closed-source management tool that adds to the Cloudera suite of management
capabilities for Hadoop.
How are Cloudera Navigator logs different from Cloudera Manager logs?
Cloudera Navigator tracks and aggregates only accesses to the data stored in CDH services; this
information is used for audit reports and analysis. Cloudera Manager monitors and logs all the activity
performed by CDH services to help administrators maintain the health of the cluster. The two sets of
logs serve different audiences, but together they provide better visibility into both data access and
system activity for an enterprise cluster.
Will Cloudera Navigator be the tool to address all compliance issues in Hadoop?
Cloudera Navigator helps with compliance by tracking and providing a central dashboard to view
accesses to all data in Hive, HDFS, and HBase. It is intended to complement existing cross-platform
compliance tools by providing Hadoop-specific auditing information that can be imported into these
tools.
Installing Cloudera Navigator
You can install Cloudera Navigator while installing Cloudera Manager for the first time or while
upgrading an existing Cloudera Manager installation.
When you install Cloudera Navigator, you choose the database in which to store audit events: either an
embedded PostgreSQL database or a standalone database. For information on setting up a standalone
database, see Installing and Configuring Databases.
New Installation
1. Install Cloudera Manager following the instructions in the Cloudera Manager Installation Guide.
2. In the Add Cloudera Management Services area of the Choose the CDH4 services screen, check
the Include Cloudera Navigator checkbox.
Upgrading an Existing Cloudera Manager Installation
1. Upgrade Cloudera Manager following the instructions in the Cloudera Manager Installation
Guide.
2. Click the service in the Cloudera Management Services table.
3. Click the Instances tab.
4. Click the Add button.
5. Choose a host and select the Navigator Server radio button.
6. Click Continue.
7. Choose a database option and click Test Connection to verify the availability of the database.
8. Click Continue.
9. Click Accept.
10. Check the checkbox next to the navigator role.
11. Select Actions for Selected > Start.
12. Click Start.
13. Click Close.
14. Restart all audited services so that auditing takes effect.
Configuring Navigator Server
To configure Navigator Server, do one of the following:
• In the Cloudera Management Services table:
  a. Click the Navigator Server role.
  b. Select Configuration > View and Edit.
  c. Expand the navigator category and optionally choose a subcategory.
  d. Configure the server and click Save Changes.
• Select Services > Cloudera Management Service.
  a. Select Configuration > View and Edit.
  b. Expand the Navigator Server category and optionally choose a subcategory.
  c. Configure the server and click Save Changes.
For detailed information on service configuration, see Modifying Service Configurations and Configuring
Monitoring Settings.
Configuring Service Auditing
You can configure services to:
• Enable and disable auditing
• Exclude and include auditing of files and directories, users, and tables
• Coalesce audit events based on operation attributes (time, operation name), user attributes
  (username), and object attributes (path, table name, and so on)
• Specify what action to take when the audit event queue is full
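Coalescing collapses repeated events that share the same attributes into a single event. The sketch below illustrates the idea only: it keys events on (username, operation, path) and drops repeats that arrive within a fixed time window of the last kept event with the same key. The tuple layout, the window parameter, and the function name `coalesce` are assumptions for illustration; this is not how the Navigator Event Tracker is implemented or configured.

```python
from typing import Dict, List, Tuple

# Hypothetical event layout: (epoch_seconds, username, operation, path)
Event = Tuple[float, str, str, str]


def coalesce(events: List[Event], window_seconds: float) -> List[Event]:
    """Collapse repeats of the same (username, operation, path) combination.

    An event is kept if no event with the same key has been kept yet, or if
    at least window_seconds have passed since the last kept event with that
    key; otherwise it is treated as a duplicate and dropped.
    """
    last_kept: Dict[Tuple[str, str, str], float] = {}
    kept: List[Event] = []
    for event in sorted(events):  # tuples sort by timestamp first
        ts, user, operation, path = event
        key = (user, operation, path)
        previous = last_kept.get(key)
        if previous is None or ts - previous >= window_seconds:
            last_kept[key] = ts
            kept.append(event)
    return kept


if __name__ == "__main__":
    raw = [
        (0.0, "cloudera", "getfileinfo", "/user/cloudera"),
        (5.0, "cloudera", "getfileinfo", "/user/cloudera"),   # repeat within window
        (3.0, "cloudera", "getfileinfo", "/"),                # different path
        (12.0, "cloudera", "getfileinfo", "/user/cloudera"),  # window has elapsed
    ]
    for event in coalesce(raw, window_seconds=10.0):
        print(event)
```

Coalescing on other attributes, such as a table name, would only require extending the key tuple.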
Service Auditing Properties
Each service that supports auditing configuration has the following properties:
• Enable collection - A flag to enable collection of audit events.
• Event filter - A set of rules that capture properties of auditable events and actions to be
  performed when an event matches those properties.
• Event tracker - A set of rules for tracking and coalescing events.
• Queue policy - The action to take when the audit event queue is full. If the queue is full and
  the service's queue policy is Shutdown, N audit events are discarded before the service is shut
  down, where N is the size of the Cloudera Navigator Server queue.
The Event Filter and Event Tracker rules for filtering and coalescing events are expressed as JSON
objects. For information on the structure of the objects, see the description on the configuration screen.
Configuring Service Auditing Properties
To configure service auditing properties:
1. Click an HDFS, HBase, or Hive service.
2. Select Configuration > View and Edit.
3. Click the Cloudera Navigator category.
The Service-Wide properties display.
4. Edit the properties and click Save Changes.
Audit Event Logs
In Cloudera Manager, audit event logs display service and role lifecycle events recorded by the
Cloudera Manager management services, and service access events recorded by Cloudera Navigator. For
information on the former, see Viewing the Audit History.
Viewing Audit Event Logs
You can view audit event logs for all services or for a specific service.
To view the audit event log for all services:
1. Click Audits in the banner.
To view the audit event log for a service:
1. Click an HDFS, HBase, Hive, or Hue service.
2. Click the Audits tab.
• When you mouse over a Hive or Hue service event, a pop-up displays the query that
  generated the event.
• Events that represent denied access are labeled Denied and have a pink background.
Filtering Events in Audit Event Logs
You filter events by adding filters and selecting a time range.
Adding Filters
To add a filter to an audit event log, do one of the following:
• Click an event property value link in an audit event.
  A filter containing the property and its value is added to the list of filters at the left, and the audit
  log redisplays all events that match the filter. For example, clicking the Create Table link in the
  following audit log adds the Create Table filter and filters the log to show only those events.
• Click Add Filter to the left of the log.
  A filter control is added to the list of filters.
  a. Choose an event property in the property drop-down list.
  b. Choose an operator in the operator drop-down list.
  c. Type an event property value in the value text field. If you use the LIKE operator, specify
     combinations of literal strings and '%' in the value field. For example, the value 'THE%S'
     matches THEMOVIES and THEUSERS.
  d. Do one of the following:
     o Click Search.
       A filter containing the property, operator, and value is added to the list of
       filters at the left, and the audit log redisplays all events that match the filter.
     o Click Add Another.
       A filter containing the property and its value is added to the list of filters at the
       left, the audit log redisplays all events that match the filter, and another filter
       control is added to the list of filters.
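The LIKE example above can be made concrete with a short sketch. It assumes standard SQL LIKE semantics, in which '%' matches any sequence of zero or more characters and the whole value must match; whether Navigator also supports other wildcards (such as '_') or case-insensitive matching is not stated in this guide. The function `like_matches` is hypothetical.

```python
import re


def like_matches(pattern: str, value: str) -> bool:
    """Return True if value matches a LIKE pattern.

    Assumes '%' matches any sequence of zero or more characters, every other
    character is literal, and matching is case-sensitive over the whole value.
    """
    # Escape each literal segment, then rejoin the segments with '.*' for '%'.
    regex = ".*".join(re.escape(segment) for segment in pattern.split("%"))
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


if __name__ == "__main__":
    for candidate in ["THEMOVIES", "THEUSERS", "THEMOVIE", "MOVIES"]:
        print(candidate, like_matches("THE%S", candidate))
```

Escaping each literal segment keeps characters such as '.' from being interpreted as regular-expression syntax.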
Selecting a Time Range
To specify a time range, do one of the following:
• Click a time range link on the left of the audit log.
• Specify a time range using the Time Range Selector. For information on the Time Range Selector,
  see Selecting a Time Range.
The audit log redisplays all events that match the time range.
Removing Filters from Service Access Audit Event Logs
To remove a filter from an audit event log:
1. Click the icon at the right of the filter.
The filter is removed and the audit log redisplays all audit events that match the remaining
filters. If there are no filters, the audit log displays all events.
Modifying Filters
To modify a filter:
1. Click the filter.
The filter expands into separate property, operator, and value fields.
2. Modify the value of one or more fields.
3. Click Search.
A filter containing the property, operator, and value is added to the list of filters at the left, and
the audit log redisplays all events that match the filter.
Downloading Audit Event Logs
1. Specify the desired filters and time range.
2. Click the Download CSV button to the left of the audit log.
A file with the following fields is downloaded: service, username, command, ipAddress,
resource, allowed, timestamp. The structure of the resource field depends on the type of the
service as follows:
o HDFS - A file path.
o Hive and Hue - <database>:<tablename>
o HBase - <table> <family>:<qualifier>
Here is an example of an HDFS service audit log:
service,username,command,ipAddress,resource,allowed,timestamp
hdfs1,cloudera,setPermission,10.20.187.242,/user/hive,false,"2013-02-09T00:59:34.430Z"
hdfs1,cloudera,getfileinfo,10.20.187.242,/user/cloudera,true,"2013-02-09T00:59:22.667Z"
hdfs1,cloudera,getfileinfo,10.20.187.242,/,true,"2013-02-09T00:59:22.658Z"
In this example, the first access was denied, so the "allowed" field of that event has the value
"false".
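Because the downloaded file is ordinary CSV, it can be post-processed with standard tools. The following minimal sketch, assuming Python 3 and the column names shown in the example above, lists denied events, parses the timestamp as UTC, and splits the resource field according to the per-service formats described above. The helper names and the service-type labels passed to `parse_resource` ("HDFS", "HIVE", "HUE", "HBASE") are hypothetical; the CSV records only the service instance name (such as hdfs1), not the service type.

```python
import csv
import io
from datetime import datetime, timezone
from typing import Dict, List

# Sample rows taken from the HDFS service audit log example above.
SAMPLE = """service,username,command,ipAddress,resource,allowed,timestamp
hdfs1,cloudera,setPermission,10.20.187.242,/user/hive,false,"2013-02-09T00:59:34.430Z"
hdfs1,cloudera,getfileinfo,10.20.187.242,/user/cloudera,true,"2013-02-09T00:59:22.667Z"
hdfs1,cloudera,getfileinfo,10.20.187.242,/,true,"2013-02-09T00:59:22.658Z"
"""


def denied_events(csv_text: str) -> List[Dict[str, str]]:
    """Return the rows whose 'allowed' field is false (access was denied)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["allowed"].strip().lower() == "false"]


def parse_timestamp(value: str) -> datetime:
    """Parse a timestamp such as 2013-02-09T00:59:34.430Z as a UTC datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def parse_resource(service_type: str, resource: str) -> Dict[str, str]:
    """Split the resource field according to a (hypothetical) service type label."""
    kind = service_type.upper()
    if kind == "HDFS":  # a file path
        return {"path": resource}
    if kind in ("HIVE", "HUE"):  # <database>:<tablename>
        database, _, table = resource.partition(":")
        return {"database": database, "table": table}
    if kind == "HBASE":  # <table> <family>:<qualifier>
        table, _, column = resource.partition(" ")
        family, _, qualifier = column.partition(":")
        return {"table": table, "family": family, "qualifier": qualifier}
    raise ValueError(f"unsupported service type: {service_type}")


if __name__ == "__main__":
    for row in denied_events(SAMPLE):
        when = parse_timestamp(row["timestamp"])
        print(row["username"], row["command"],
              parse_resource("HDFS", row["resource"]), when.isoformat())
```

`csv.DictReader` removes the quotation marks around the timestamp field, so `parse_timestamp` receives the bare value.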
Downloading HDFS Directory Access Permission Reports
For each HDFS service you can download a report that details the HDFS directories a group has
permission to access.
To download a directory access permission report:
1. In Cloudera Manager, click Reports.
2. In the Directory Access by Group row, click CSV or XLS.
The Download User Access Report pop-up displays.
a. In the pop-up, type a group and directory.
b. Click Download.
A report in the selected format is generated. For each directory within the specified
directory that the specified group has access to, the report lists the path, owner,
permissions, and size.
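A downloaded CSV report can also be checked programmatically. The following minimal sketch flags directories that grant write access to "others". It assumes that the report has a header row named path, owner, permissions, and size and that permissions use the ls-style symbolic form (for example drwxr-xr-x); the column names, the permission format, the sample rows, and the function `world_writable` are all assumptions for illustration, not a documented report format.

```python
import csv
import io
from typing import List

# Hypothetical report contents; header names are assumed to match the
# field list above (path, owner, permissions, size).
SAMPLE_REPORT = """path,owner,permissions,size
/user/hive/warehouse,hive,drwxrwxr-x,1024
/user/hive/tmp,hive,drwxrwxrwx,0
/user/cloudera,cloudera,drwxr-xr-x,2048
"""


def world_writable(report_csv: str) -> List[str]:
    """Return paths whose permission string grants write access to others.

    Assumes 10-character ls-style strings such as drwxrwxrwx, in which the
    character at index 8 is the write bit for "others".
    """
    paths: List[str] = []
    for row in csv.DictReader(io.StringIO(report_csv)):
        perms = row["permissions"].strip()
        if len(perms) == 10 and perms[8] == "w":
            paths.append(row["path"])
    return paths


if __name__ == "__main__":
    for path in world_writable(SAMPLE_REPORT):
        print(path)
```

If the actual report uses octal permissions instead, only the check inside the loop would need to change.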