Uploaded by krishnapmishra2011

Definition of Pentaho and its usage

Definition of Pentaho and its usage.
Regarded as one of the most efficient and resourceful data integration (DI) tools, Pentaho
supports virtually all available data sources and allows scalable data clustering and data
mining. It is a lightweight Business Intelligence suite that provides Online Analytical
Processing (OLAP) services, ETL functions, report and dashboard creation, and other
data analysis and visualization operations.
Important features of Pentaho.

Pentaho is capable of creating advanced reporting algorithms regardless of the input and
output data formats.

It supports various report formats, including Excel spreadsheets, XML, PDF documents, and CSV
files.

It is professionally certified DI software provided by Pentaho, a company
headquartered in Florida, United States.

Offers enhanced functionality and in-Hadoop functionality.

Allows dynamic drill-down into larger and more detailed sets of information.

Optimizes for rapid, interactive response.

Lets users explore and view multidimensional data.
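
To make the drill-down and multidimensional-data features concrete, the following is a minimal sketch in plain Python. It does not use any Pentaho API; the sales rows, the dimension names and the helper functions `aggregate` and `drill_down` are all hypothetical and exist only to illustrate the idea of summarizing along one dimension and then drilling into a single member.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, sales amount).
SALES = [
    ("East", "Laptop", 1200),
    ("East", "Phone", 800),
    ("West", "Laptop", 500),
    ("West", "Phone", 700),
    ("East", "Laptop", 300),
]

# Position of each (hypothetical) dimension inside a fact row.
DIMENSIONS = {"region": 0, "product": 1}


def aggregate(rows, dimensions):
    """Sum the sales amount grouped by the given list of dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[DIMENSIONS[name]] for name in dimensions)
        totals[key] += row[2]
    return dict(totals)


def drill_down(rows, dimension, member, next_dimension):
    """Keep only one member of a dimension, then break it down further."""
    filtered = [row for row in rows if row[DIMENSIONS[dimension]] == member]
    return aggregate(filtered, [next_dimension])


# Top-level view: total sales per region.
by_region = aggregate(SALES, ["region"])

# Drill into the "East" region to see its sales per product.
east_by_product = drill_down(SALES, "region", "East", "product")

print(by_region)
print(east_by_product)
```

An OLAP engine performs this kind of grouping and filtering over much larger data sets and usually over precomputed aggregates; the sketch only shows the shape of the operation.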
Major applications comprising the Pentaho BI Project.

Business Intelligence Platform.

Dashboards and Visualizations.

Reporting.

Data Mining.

Data Analysis.

Data Integration and ETL (also called Kettle).

Data Discovery and Analysis (OLAP).
Architecture of Pentaho Data Integration
Spoon is the design interface for building ETL jobs and transformations. Spoon provides a drag-and-drop
interface that allows you to graphically describe what you want to take place in your transformations.
Transformations can then be executed locally within Spoon, on a dedicated Data Integration Server, or on a
cluster of servers.
The Data Integration Server is a dedicated ETL server whose primary functions are:

Execution. Executes ETL jobs and transformations using the Pentaho Data Integration engine.

Security. Allows you to manage users and roles (default security) or integrate security to your existing
security provider such as LDAP or Active Directory.

Content Management. Provides a centralized repository that allows you to manage your ETL jobs and
transformations. This includes full revision history on content and features such as sharing and locking
for collaborative development environments.

Scheduling. Provides the services allowing you to schedule and monitor activities on the Data Integration
Server from within the Spoon design environment.

Pentaho Data Integration is composed of the following primary components:

Spoon. Introduced earlier, Spoon is a desktop application that uses a graphical interface and editor
for transformations and jobs. Spoon provides a way for you to create complex ETL jobs without
having to read or write code. When you think of Pentaho Data Integration as a product, Spoon is what
comes to mind because, as a database developer, this is the application on which you will spend
most of your time. Any time you author, edit, run or debug a transformation or job, you will be using
Spoon.

Pan. A standalone command-line process that can be used to execute transformations you
created in Spoon. As the data transformation engine, Pan reads data from and writes data to various
data sources. Pan also allows you to manipulate data.

Kitchen. A standalone command-line process that can be used to execute jobs. Kitchen runs the
jobs designed in the Spoon graphical interface, whether they are stored as XML files or in a database
repository. Jobs are usually scheduled to run in batch mode at regular intervals.

Carte. Carte is a lightweight Web container that allows you to set up a dedicated, remote ETL server.
It provides remote execution capabilities similar to those of the Data Integration Server but does not
provide scheduling, security integration, or a content management system.
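
As an illustration of how Pan and Kitchen are typically driven from scripts, here is a minimal Python sketch that only builds the command line; it does not start anything. It assumes a Linux or macOS installation where the launchers are named `pan.sh` and `kitchen.sh`, and that transformations are saved as `.ktr` files and jobs as `.kjb` files. The helper name `build_pdi_command`, the file paths and the parameter names are hypothetical.

```python
from pathlib import Path

# Launcher script and expected file suffix per tool (Linux/macOS names assumed).
TOOLS = {
    "pan": ("pan.sh", ".ktr"),          # Pan runs transformations
    "kitchen": ("kitchen.sh", ".kjb"),  # Kitchen runs jobs
}


def build_pdi_command(tool, file_path, level="Basic", params=None):
    """Return the argument list for running a transformation or a job.

    This is a hypothetical helper: it only assembles arguments and
    never launches Pentaho itself.
    """
    if tool not in TOOLS:
        raise ValueError(f"unknown tool: {tool!r}")
    script, suffix = TOOLS[tool]
    if Path(file_path).suffix != suffix:
        raise ValueError(f"{tool} expects a {suffix} file, got {file_path!r}")

    command = [script, f"-file={file_path}", f"-level={level}"]
    # Named parameters are passed as -param:NAME=value, in sorted order
    # so that the resulting command line is deterministic.
    for name, value in sorted((params or {}).items()):
        command.append(f"-param:{name}={value}")
    return command


# Hypothetical paths and parameter names, for illustration only.
pan_command = build_pdi_command("pan", "/etl/load_customers.ktr")
kitchen_command = build_pdi_command(
    "kitchen", "/etl/nightly.kjb", level="Minimal", params={"RUN_DATE": "2024-01-31"}
)

print(" ".join(pan_command))
print(" ".join(kitchen_command))
```

To actually run the command, the list could be handed to `subprocess.run(command, check=True)` from the Data Integration installation directory. A scheduler such as cron can call a script like this to run Kitchen jobs in batch mode at regular intervals.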
Benefits of Data Integration.

The biggest benefit is that integrating data improves consistency and reduces conflicting
and erratic data in the database. Integrated data lets users fetch exactly what they are
looking for and make use of what they have collected.

Accurate data extraction, which further facilitates flexible reporting and monitoring of the
available volumes of data.

Helps meet deadlines for effective business management.

Tracks customers’ information and buying behavior to improve traffic and conversions in the
future, thus advancing your business performance.
Major types of Data Integration Jobs.

Transformation Jobs: Used for preparing data, and only when the data does not
change until the transformation job has finished.

Provisioning Jobs: Used for transferring large volumes of data, and only when no
change in data is allowed until the job has finished and the provisioning requirement is
large.

Hybrid Jobs: Execute both transformation and provisioning jobs. There are no limitations on data
changes; data can be updated regardless of success or failure. The transformation and
provisioning requirements are not large in this case.
Pentaho Metadata
Pentaho Metadata is a piece of the Pentaho BI Platform designed to make it easier for users to access
information in business terms.
With the help of Pentaho’s open source metadata capabilities, administrators can outline a layer of
abstraction that presents database information to business users in familiar business terms.
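
The idea of an abstraction layer can be sketched in a few lines of Python. This is not the Pentaho Metadata API; the table names, column names and the helpers `physical_columns` and `select_clause` are hypothetical, and the sketch only shows how business terms might be mapped onto physical database columns.

```python
# Hypothetical business model: business term -> (physical table, physical column).
BUSINESS_MODEL = {
    "Customer Name": ("cust_mst", "cust_nm"),
    "Order Total": ("ord_hdr", "tot_amt"),
    "Order Date": ("ord_hdr", "ord_dt"),
}


def physical_columns(business_terms, model=BUSINESS_MODEL):
    """Translate business terms into qualified physical column names."""
    columns = []
    for term in business_terms:
        if term not in model:
            raise KeyError(f"no mapping for business term: {term!r}")
        table, column = model[term]
        columns.append(f"{table}.{column}")
    return columns


def select_clause(business_terms, model=BUSINESS_MODEL):
    """Build a SELECT list that exposes physical columns under business names."""
    columns = physical_columns(business_terms, model)
    parts = [f'{column} AS "{term}"' for term, column in zip(business_terms, columns)]
    return "SELECT " + ", ".join(parts)


clause = select_clause(["Customer Name", "Order Total"])
print(clause)
```

A business user picks "Customer Name" and "Order Total"; the administrator-defined mapping supplies the cryptic physical names, so the user never has to know them.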