EVALUATION OF IBM INTELLIGENT MINER FOR DATA:
By Anand Kadur & Wangdong Zhang
Introduction:
IBM DB2 Intelligent Miner for Data embodies IBM’s latest data mining technology with
capabilities to support the full range of mining processes, from data analysis and preparation tasks
through mining and assimilation of results. Version 6.1, available September 24, 1999, includes a
powerful new Associations Visualizer, extended data exploration capabilities, support for DB2
UDB V6.1, and a number of productivity enhancements in the areas of interoperability and data
management. Performance enhancements include parallel processing for value prediction and the
introduction of parallel data mining for SMP and Cluster configurations on AIX, Sun Solaris, and
Windows NT.
Highlights of the Software:

Single framework for data mining - a suite of tools to support the iterative process, offering
data processing, statistical analysis, and results visualization to complement a variety of
mining methods.

Proven mining algorithms that can be used individually or in combination to address a wide
range of business problems and deliver measurable business results.

Scalable solution focused on the technical issues of large-scale mining, such as large volumes
of data, parallel data mining on AIX, Windows NT, Sun Solaris, and OS/390, directly mining
DB2* data, long-running mining operations, and optimization of mining algorithms.

Core technology for IBM data mining solutions, supported by industry-recognized mining
consultants deployed worldwide, with customer engagements in finance, telecommunication,
insurance, and health care.

Application programming interface, enabling development of customized, industry-specific
mining applications by customers, IBM, and IBM Business Partners.
Discovering information that leads to knowledge
Mining is the process of extracting valid, previously unknown, and ultimately comprehensible
information from large databases and using it to make crucial business decisions. It is quickly
being recognized as an essential business intelligence tool for discovering the information
needed to improve a company's market presence and differentiate its products and services in
today's global marketplace.
Intelligent Miner for Data helps knowledge workers identify and extract high value business
intelligence from their data assets. It provides the fundamental technology and tools to support the
mining process, as well as application services to support development of customized mining
applications.
Intelligent Miner for Data is applicable to a wide range of business problems. Its mining results
can facilitate decision making in business areas such as:

Campaign planning

Customer relationship management

Process reengineering

Product planning and fulfillment

Fraud or abuse detection
Version 2
In its initial release, Intelligent Miner was established as a scalable, integrated framework capable
of handling the mining of large quantities of data. Its functions cover the full range of mining
processes, from data analysis and preparation tasks through mining and assimilation of results.
Version 2 features statistical functions, optimized mining techniques, usability and productivity
enhancements, DB2* and DB2 Universal Database* exploitation, more parallel mining, and
additional deployment options.
Innovative technology
Based on IBM research and validated through joint customer studies, nine innovative data mining
algorithms have emerged as the critical suite to address a wide range of business problems. In
customer engagements worldwide, these algorithms, often used in combination, indeed proved
their versatility. They have been used by retailers to determine customer purchasing patterns;
bankers to perform risk assessment; healthcare providers to detect potential fraud; and telephone
companies to analyze customer attrition, to name a few.
Data mining algorithms
The algorithms are categorized as follows:
1. Association discovery
2. Sequential pattern discovery
3. Clustering
4. Classification
5. Value prediction
6. Similar time sequences
The alternatives afforded by this breadth of coverage are further enhanced by the fact that three of
these categories are supported by more than one mining algorithm.
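
To make the first category concrete, the following is a minimal sketch of association discovery over market-basket data. It is not the algorithm Intelligent Miner uses; the function name `association_rules`, the thresholds, and the sample baskets are all assumptions chosen for illustration. It counts how often item pairs occur together (support) and how often the right-hand item appears when the left-hand item is present (confidence).

```python
from collections import Counter
from itertools import combinations


def association_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) for item pairs.

    Illustrative only: considers rules between single items, not larger itemsets.
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))          # ignore duplicates within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                  # fraction of baskets containing both
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical purchase data.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk"],
]
rules = association_rules(baskets)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data, every basket that contains butter also contains bread, so the rule "butter -> bread" has confidence 1.0. A retailer analyzing purchasing patterns would look for rules of this kind at much larger scale.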
Data analysis and preparation
Automation of some of the most typical data preparation tasks is aimed at improving the analyst's
productivity by eliminating the need for programming specialized routines. Depending on the data
mining technique, analysts may select, sample, aggregate, filter, cleanse, and/or transform data in
preparation for mining.
Statistical functions facilitate the analysis and preparation of data, as well as provide forecasting
capabilities. Statistical functions include factor analysis, linear regression, principal component
analysis, univariate curve fitting, logistic regression, and univariate and bivariate statistics.
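
The preparation steps and one of the statistical functions named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the sample values, and the helper `fit_line` are hypothetical and are not part of Intelligent Miner. The sketch cleanses and filters records, aggregates them per customer, and then fits a simple linear regression to produce a forecast.

```python
from collections import defaultdict

# Hypothetical raw records: (customer_id, month, amount); None marks a missing value.
raw = [
    ("c1", 1, 100.0),
    ("c1", 2, 120.0),
    ("c2", 1, None),
    ("c2", 2, 80.0),
    ("c1", 3, 140.0),
    ("c2", 3, -5.0),
]

# Cleanse: drop records with a missing amount.
cleansed = [r for r in raw if r[2] is not None]
# Filter: keep only non-negative amounts.
filtered = [r for r in cleansed if r[2] >= 0]
# Aggregate: total amount per customer.
totals = defaultdict(float)
for customer, _month, amount in filtered:
    totals[customer] += amount


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Linear regression on one customer's monthly amounts, then a one-step forecast.
c1 = [(month, amount) for customer, month, amount in filtered if customer == "c1"]
slope, intercept = fit_line([m for m, _ in c1], [a for _, a in c1])
forecast_month_4 = slope * 4 + intercept
print(dict(totals), slope, intercept, forecast_month_4)
```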
Visualization
Visualization functions bring out unusual features that might otherwise be "drowned out." A range
of visualizers is provided, each specialized for a type of data mining or statistical analysis result.
Administrative user interface
An administrative user interface based on Java™ provides interactive access to mining tasks with
ease of use and consistency across all operations. Implementation features the use of
state-of-the-art GUI facilities, including online help, task guides, and a graphical
representation of the mining base and its objects.
Repeatable sequences allow an Intelligent Miner for Data user to construct a sequence of mining
operations which can be saved and subsequently modified and repeated. Analysts can develop an
end-to-end mining sequence on one system and deliver (or port) it to a client system for execution.
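
The idea of a repeatable sequence can be sketched as a list of named steps that is saved, restored elsewhere, modified, and run again. Everything in the sketch below is an assumption made for illustration (the `STEPS` registry, the step names, and the JSON storage format); it does not reflect how Intelligent Miner actually stores or executes sequences.

```python
import json

# Hypothetical step registry: each name maps to a function from rows to rows.
STEPS = {
    "drop_missing": lambda rows: [r for r in rows if None not in r.values()],
    "keep_adults": lambda rows: [r for r in rows if r["age"] >= 18],
    "sort_by_age": lambda rows: sorted(rows, key=lambda r: r["age"]),
}


def run_sequence(step_names, rows):
    """Apply the named steps in order and return the resulting rows."""
    for name in step_names:
        rows = STEPS[name](rows)
    return rows


sequence = ["drop_missing", "keep_adults", "sort_by_age"]

saved = json.dumps(sequence)        # save the sequence as text
restored = json.loads(saved)        # restore it, for example on a client system
restored.remove("sort_by_age")      # modify the restored copy

rows = [{"age": 40}, {"age": None}, {"age": 12}, {"age": 25}]
full = run_sequence(sequence, rows)
modified = run_sequence(restored, rows)
print(full)
print(modified)
```

Because the saved sequence holds only step names, it can be moved to another system as long as that system defines the same steps.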
Application program interface
As mining enters the mainstream of business processes, increasing numbers of customized
applications to present and deploy mining results are being developed by customers, IBM, and
IBM partners. IBM Discovery Series is one such application implemented using the Intelligent
Miner for Data. It provides a suite of customer relationship management applications tailored to
particular industries, such as telecommunications, utilities, and finance.
Industry-specific mining applications extend the benefits of mining by making mining results
accessible and usable to particular users in the enterprise. Results can thus be easily
understood in the context of a business task.
Tool interface
A registration facility is provided to facilitate export of mining results to familiar analysis tools.
Pulling it all together
IBM provides complete customer data mining solutions. In addition to mining tools and
applications, an IBM engagement can involve hardware and/or software offerings to support,
build, and manage the necessary infrastructure, including database, data warehouse, and other
business intelligence offerings.
IBM consulting practices are organized to bring industry and data mining experience as well as
overall warehouse and DB2 skills to a mining project, complementing customer resources. They
are available to support all phases of data mining solution development, from inception through
deployment. Mining engagements can encompass planning and installation of complete mining
solutions, as well as mining education and consulting.
Plus points noticed:
Single framework for data mining--a suite of tools to support the iterative process, offering data
processing, statistical analysis, and results visualization to complement a variety of mining
methods.
Proven mining algorithms that can be used individually or in combination to address a wide range
of business problems and deliver measurable business results.
Scalable solution focused on the technical issues of large-scale mining, such as large volumes of
data, AIX*/SP* parallel processing, mining directly against DB2 Universal Database data,
long-running mining operations, and optimization of mining algorithms.
Core technology for IBM Data Mining Solutions, supported by industry recognized mining
consultants deployed worldwide, with customer engagements in finance, telecommunication,
insurance, healthcare, and other industries.
Application Programming Interface enabling development of customized, industry-specific
mining applications by customers, IBM, and IBM business partners.
Version 2 highlights
New and Improved Analytics

Statistics

Neural Net Value Prediction

Optimization of algorithms

Model quality graphics
Usability Enhancements

Java™ User Interface

Repeatable Sequence

Task Guides, Expert Use Mode

Progress Indicator
DB2 Universal Database Exploitation

Parallel data mining

Performance
More Parallel Mining

Increased parallelism of algorithms

Full exploitation of DB2 Universal Database Enterprise-Extended Edition
Additional Deployment Options

More servers--AIX, AIX/SP, OS/390*, Solaris, OS/400*, Windows NT

More clients--AIX, Windows NT/Windows 95, OS/2*

More languages--Brazilian, French, Hungarian, Italian, Japanese, Korean, Portuguese,
Russian, Spanish, Traditional Chinese

More program access--Server API, Client API

More portable mining bases
Additional evaluation on different criteria:
We evaluate the software on the following criteria, extending the evaluation above:
Support for platforms.
Functionality and variety of techniques for mining.
Efficiency of visual interfaces for the results.
User friendliness.
Plus points and criticisms.
To start with, we would like software to be compatible with the variety of platforms that
are available to us today. IBM Intelligent Miner for Data supports most of them:
Windows NT, AIX, Solaris, OS/390, and OS/400 for the server, and AIX, OS/2, Windows NT,
and Windows 95 for the client.
It requires about 450 MB of hard disk space, including the DB2 database, which is a prerequisite for it to run.
The server can be configured to startup as the computer boots up avoiding manual startup each
time.
The disadvantage on the server side is that the user has to belong to the administrator group to
change the server settings.
When we discuss the functionality of the software, we discuss the working of the client side,
because this is where most of the development occurs. The server runs in the background,
supporting the client.
The latest release of the software, version 6.1, offers a wide range of mining algorithms,
including association discovery, sequential pattern discovery, clustering, classification, and
value prediction. These are the techniques needed in data mining, and we have a large variety to
choose from.
Options are provided to use either an existing mining base or a data source, which can be a flat
file or a relational database. We can import data from external sources too. Mining can be
performed on these data sources using one of the algorithms, and the results can be viewed using
one of the many interfaces that the software provides.
The software provides over a dozen interfaces to view the results, and this is one of its
strongest points.
We can interpret the results using whichever method suits the application.
Several screenshots will be shown in the presentation.
Lastly, when we discuss the user friendliness of the software, we evaluate it from a naïve
user's point of view.
Good demos are available that show how to use the software.
It has good GUI screens, but certain assumptions are made about a new user: the user has to be
familiar with DB2, and the user must be an administrator on NT machines.
Furthermore, it does not provide a client version for Solaris systems.
In conclusion, we found the software to be very good overall, a judgment supported by a
number of case studies that show how effectively the product has been used in
commercial applications.