Business Intelligence

Mendel University in Brno
Business Intelligence
Information Systems
Karel Burda, Bc. Sabina Baběrádová, Bc. Jan Huráb, Bc. Markéta Michálková, Bc. Petr Krejčík, Bc. Nikol Varčevová
Brno 2011
1. Introduction
Business intelligence (BI) is one of the most important and fastest-developing phenomena in information technology. To define business intelligence, we must point out that BI essentially means a set of tools whose main aim is to collect business information and analyze it. BI thus helps us make strategic business decisions, which is, after all, its main purpose.
With the growing amount of data and information in businesses, business intelligence is becoming more and more important. It is therefore expected that investment in BI will keep increasing (Rydziová, 2006). Because the main goal of BI is a very complex issue, BI systems themselves are complicated and challenging as well.
Organizations need more than just data and information. They need business intelligence (BI): collective information about customers, competitors, business partners, the competitive environment and internal operations that gives the company the ability to make effective, important and often strategic business decisions. BI enables an organization to extract the true meaning of information, so that it can take creative and powerful steps to secure a competitive advantage.
To create business intelligence, you need data and information. Therefore, you must first gather and organize all your data and information. Then you need the right IT tools to define and analyze the various relationships within the information. Technologies such as databases, database management systems, data warehouses and data-mining tools can help you build and use business intelligence.
As you begin working with these IT tools, you will be performing two types of information processing:

Online transaction processing (OLTP) is the gathering of input information, processing that information and updating existing information to reflect the gathered and processed information. OLTP describes a class of systems that facilitate and support ordinary operational activities, such as data entry. Among its benefits are reduced paper trails and faster, more accurate forecasts of revenues and expenses [6]. Databases that support OLTP are most often referred to as operational databases. These operational databases contain valuable information that forms the basis for business intelligence.

Online analytical processing (OLAP) is the manipulation of information to support decision making. Given some data, OLAP tools analyze it from different dimensions. The product, or result, of this process is an OLAP cube (sometimes referred to as a multidimensional cube).
For example, if you have a table of sales data, you can analyze it by product type (a dimension), by demographic group (another dimension), by geographic region (yet another dimension), and so on. The underlying data can stay exactly the same, but they are grouped and prioritized by whichever column you place first; such a column is what we call a dimension [5].
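Analysis by dimensions can be illustrated with a minimal Python sketch. It assumes a handful of made-up sales records and builds a small cube in memory, summing the sales amount for every combination of the three dimensions. The value "ALL" marks a dimension that has been rolled up. Real OLAP tools precompute and store such aggregates at a far larger scale, so this only shows the idea.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sales records; product, demographic and region are the dimensions.
sales = [
    {"product": "cement", "demographic": "business", "region": "north", "amount": 100},
    {"product": "cement", "demographic": "private", "region": "south", "amount": 40},
    {"product": "gravel", "demographic": "business", "region": "south", "amount": 60},
    {"product": "gravel", "demographic": "private", "region": "north", "amount": 25},
]
DIMENSIONS = ("product", "demographic", "region")

def build_cube(rows, dimensions, measure="amount"):
    """Sum the measure for every combination of dimensions.

    A dimension that is not part of a combination is rolled up and shown as "ALL".
    """
    cube = defaultdict(int)
    for size in range(len(dimensions) + 1):
        for group in combinations(dimensions, size):
            for row in rows:
                cell = tuple(row[d] if d in group else "ALL" for d in dimensions)
                cube[cell] += row[measure]
    return dict(cube)

cube = build_cube(sales, DIMENSIONS)
print(cube[("cement", "ALL", "ALL")])  # 140: cement sales across all groups and regions
print(cube[("ALL", "ALL", "south")])   # 100: all sales in the south region
print(cube[("ALL", "ALL", "ALL")])     # 225: grand total
```

Each lookup reads one cell of the cube. Fixing a different dimension gives a different view of the same underlying records.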
2. The Relational Database Model
Businesses today use databases to store and organize basic and transaction-oriented information (which is eventually used to create business intelligence). Here we focus on the most popular database model, the relational database model.
The relational database model was conceived in 1969 by E. F. Codd, then a researcher at IBM. The model is based on two branches of mathematics, set theory and predicate logic. The basic idea behind the relational model is that a database consists of a series of unordered tables (or relations) that can be manipulated using non-procedural operations that return tables. This model was in sharp contrast to the more traditional database theories of the time, which were much more complicated, less flexible and dependent on the physical storage methods of the data.
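Non-procedural operations that take tables and return tables can be made concrete with a minimal Python sketch of two relational-algebra operations, selection and projection. The sketch models a relation as a tuple of attribute names plus a set of rows; the Customer data are hypothetical. Because the rows form a set, they are unordered and duplicates disappear automatically, as the relational model requires.

```python
from typing import Callable, Dict, Set, Tuple

# A relation: (attribute names, set of rows). Sets are unordered and hold no duplicates.
Relation = Tuple[Tuple[str, ...], Set[tuple]]

def select(rel: Relation, predicate: Callable[[Dict[str, object]], bool]) -> Relation:
    """Selection: keep only the rows that satisfy the predicate."""
    attrs, rows = rel
    return attrs, {row for row in rows if predicate(dict(zip(attrs, row)))}

def project(rel: Relation, keep: Tuple[str, ...]) -> Relation:
    """Projection: keep only the named columns (duplicate rows collapse)."""
    attrs, rows = rel
    positions = [attrs.index(name) for name in keep]
    return keep, {tuple(row[i] for i in positions) for row in rows}

# Hypothetical Customer relation.
customer: Relation = (
    ("number", "name", "city"),
    {(1, "Acme", "Brno"), (2, "Beta", "Praha"), (3, "Gamma", "Brno")},
)

# Each operation returns a new relation, so operations can be nested freely.
brno_names = project(select(customer, lambda r: r["city"] == "Brno"), ("name",))
cities = project(customer, ("city",))  # "Brno" appears only once in the result
print(brno_names)
print(cities)
```

We only state what result we want (names of customers in Brno), not how to loop over storage, which is the sense in which such operations are non-procedural.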
Any database can be defined as a collection of information that we can organize and access according to the logical structure of that information. A relational database, then, uses a series of logically related two-dimensional tables or files to store information. It is actually composed of two distinct parts: 1) the information itself, stored in a series of two-dimensional tables, files, or relations, and 2) the logical structure of that information.
A relational database allows the definition of data structures, storage and retrieval
operations and integrity constraints. In such a database the data and relations between them
are organized in tables. A table is a collection of records and each record in a table contains
the same fields.
Properties of relational tables:

values are atomic,

each row is unique,

column values are of the same kind,

the sequence of columns is insignificant,

the sequence of rows is insignificant,

each column has a unique name.
For real examples of integrity and of the properties of relational tables, see Attachment 1.
2.1. Example of a relational database
First, we would like to introduce a simple example demonstrating the features of the relational database model.
Let's look at Solomon's database (a portion of Solomon Enterprises' database for Customer Relationship Management and Order Processing can be found in Attachment 1). It contains five files: Order, Customer, Concrete Type, Employee, and Truck. All of these files can be found in Attachments 3–7.
These files are related in numerous ways: customers place orders, employees drive trucks, an order has a concrete type, etc. We need all these files to manage our customer relationships and process orders. Within each file, you can see specific pieces of information (attributes). For example, the Order file contains Order Number, Order Date, Customer Number, Delivery Address, etc. In the Customer file, you can see specific information including Customer Number, Customer Name, Customer Phone, and Customer Contact. These are all important pieces of information that Solomon's database should contain. Solomon needs all this information (and probably much more) to process orders and manage customer relationships effectively.
2.2. Data Dictionary
Using the relational database model, you organize and access information according to its logical structure, not its physical position. In the relational database model, a data dictionary contains the logical structure of the information in a database. When you create a database, you first create its data dictionary. The data dictionary contains important information about your information (metadata). For example, the data dictionary entry for Customer Phone in the Customer file would require 10 digits, and the entry for Date of Hire in the Employee file would require a day, month, and year.
When using a database, you must clearly define the characteristics of each field by creating a data dictionary. This means you must carefully plan the design of your database before you can start adding information.
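A data dictionary can be thought of as metadata that is checked before a value is accepted. The sketch below is a minimal, hypothetical Python version covering the two fields mentioned above (Customer Phone and Date of Hire). The DD.MM.YYYY date format and the names `DATA_DICTIONARY` and `validate` are assumptions for illustration, not part of any particular DBMS.

```python
import re
from datetime import datetime

def is_ten_digits(value: str) -> bool:
    """Customer Phone: exactly 10 digits, nothing else."""
    return re.fullmatch(r"\d{10}", value) is not None

def is_full_date(value: str) -> bool:
    """Date of Hire: a real calendar date with day, month and year (assumed DD.MM.YYYY)."""
    try:
        datetime.strptime(value, "%d.%m.%Y")
        return True
    except ValueError:
        return False

# Hypothetical data dictionary: (file, field) -> description and validation rule.
DATA_DICTIONARY = {
    ("Customer", "Customer Phone"): {"description": "10-digit phone number", "check": is_ten_digits},
    ("Employee", "Date of Hire"): {"description": "day, month and year", "check": is_full_date},
}

def validate(file: str, field: str, value: str) -> bool:
    """Accept a value only if it satisfies the field's definition in the dictionary."""
    return DATA_DICTIONARY[(file, field)]["check"](value)

print(validate("Customer", "Customer Phone", "5551234567"))  # True
print(validate("Customer", "Customer Phone", "555-1234"))    # False
print(validate("Employee", "Date of Hire", "31.02.2010"))    # False: no 31 February
```

The rules live apart from the data, which is why the dictionary has to be designed before any information is added.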
2.3. Requirements of the relational database
In a relational database, it is necessary to create ties, or relationships, in the information that show how the files relate to each other. Before these relationships among files can be created, the primary key of each file has to be specified.
The essential job of a database table's primary key is to uniquely identify the rows in the table, nothing more and nothing less. Each primary key value must be unique within a table so that the database engine can tell the rows apart. The same primary key value may appear in another table, but it cannot be duplicated within a table. In addition, the primary key cannot be null, because the database engine requires a value to locate the record. The second major job of the primary key is to provide a "hook" for creating table relationships. In Solomon's database, Order Number is the primary key of the Order file and Customer Number is the primary key of the Customer file.
The logical relationship between these two files is created by a foreign key. A foreign key is a primary key of one file that appears in another file; for example, Customer Number appears in the Order file as a foreign key. Foreign keys are essential in the relational database model. Without them, there is no way of creating logical ties among the various files.
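Primary and foreign keys can be tried out with SQLite, which ships with Python as the `sqlite3` module. The sketch below is a simplified, hypothetical version of two of Solomon's files: the column names are assumptions, only a few attributes are included, and `Order` is quoted because ORDER is an SQL keyword. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on

# Primary keys uniquely identify rows; the foreign key ties each order to a customer.
conn.execute("""
    CREATE TABLE Customer (
        customer_number INTEGER PRIMARY KEY,
        customer_name   TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE "Order" (
        order_number    INTEGER PRIMARY KEY,
        order_date      TEXT NOT NULL,
        customer_number INTEGER NOT NULL REFERENCES Customer (customer_number)
    )""")

conn.execute("INSERT INTO Customer VALUES (1, 'Smith Construction')")
conn.execute("""INSERT INTO "Order" VALUES (100, '2011-05-08', 1)""")

def try_execute(sql):
    """Run a statement and report whether the DBMS accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected ({exc})"

# A duplicate primary key and an order for a non-existent customer are both refused.
duplicate_key = try_execute("INSERT INTO Customer VALUES (1, 'Jones Builders')")
orphan_order = try_execute("""INSERT INTO "Order" VALUES (101, '2011-05-09', 99)""")

# The foreign key is the "hook" that lets us join the two files.
joined = conn.execute("""
    SELECT o.order_number, c.customer_name
    FROM "Order" AS o JOIN Customer AS c ON o.customer_number = c.customer_number
""").fetchall()
print(duplicate_key, orphan_order, joined, sep="\n")
```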
2.4. Relational Integrity Constraints
Integrity constraints are sets of rules that help maintain the quality of the information stored in a database. They are mostly used to promote the accuracy and consistency of the data in a relational database. This is very important to companies, because information is an asset to organizations and must be protected. Relational integrity constraints are therefore rules that every instance of the relational database must satisfy in order to model the real world correctly.
The Ritz-Carlton hotel chain is a very good example of ensuring the quality of information. Ritz-Carlton has created a powerful guest preference database to provide customized, personal, and high-level service to guests at any of its hotels. By assigning you a unique customer ID that creates logical ties to your various preferences, Ritz-Carlton makes your information available at all of its other hotels. The next time you stay at a Ritz-Carlton, e.g. in Florida, your information is already there, and the hotel staff immediately know your preferences. For the management of Ritz-Carlton, achieving customer loyalty starts with knowing each customer individually. See Attachment no. 1 for examples.
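The requirement that every instance of the database satisfy its constraints can be expressed as a check over the data. The minimal Python sketch below tests two classic constraints on hypothetical Customer and Order records: entity integrity (primary keys present and unique) and referential integrity (every foreign key refers to an existing row). The function name `check_integrity` is illustrative; a real DBMS enforces such rules on every change rather than checking afterwards.

```python
def check_integrity(customers, orders):
    """Return a list of human-readable violations (empty if the instance is valid)."""
    violations = []
    keys = [c["customer_number"] for c in customers]

    # Entity integrity: every primary key value must be present and unique.
    if any(k is None for k in keys):
        violations.append("Customer: missing primary key")
    present = [k for k in keys if k is not None]
    if len(present) != len(set(present)):
        violations.append("Customer: duplicate primary key")

    # Referential integrity: every order must point to an existing customer.
    known = set(present)
    for order in orders:
        if order["customer_number"] not in known:
            violations.append(
                f"Order {order['order_number']}: unknown customer {order['customer_number']}"
            )
    return violations

customers = [{"customer_number": 1}, {"customer_number": 2}]
good_orders = [{"order_number": 100, "customer_number": 1}]
bad_orders = [{"order_number": 101, "customer_number": 7}]

print(check_integrity(customers, good_orders))  # []
print(check_integrity(customers, bad_orders))   # ['Order 101: unknown customer 7']
```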
3. Database management system
The best way to understand what a database really is, is to imagine it as a document or a workbook. They all have one thing in common: they contain information. Just as you need a word processor to work with a document, you need a database management system to work with a database. A database management system (DBMS) helps you find your way around the database and work with it.
A DBMS contains five important software components:
1. DBMS engine
2. Data definition subsystem
3. Data manipulation subsystem
4. Application generation subsystem
5. Data administration subsystem
3.1. DBMS engine
DBMS engine accepts logical requests from various other DBMS subsystems,
converts them into their physical equivalent, and actually accesses the database and data
dictionary as they exist on a storage device.
It is important here to explain the difference between the physical and the logical view. The physical view of information deals with how information is physically stored on a storage device (e.g. a hard drive). The logical view focuses on how you access the information you need. For one physical view there can be several logical views, according to the needs of each user of the database.
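Several logical views over one physical table can be sketched with SQL views in SQLite (via Python's built-in `sqlite3` module). The Employee table, its columns and the two view names are hypothetical; the point is that both users read the same stored rows but each sees only the part relevant to them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One physical table with hypothetical employee data ...
conn.execute(
    "CREATE TABLE Employee (employee_number INTEGER PRIMARY KEY, name TEXT, department TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [(1, "Novak", "Dispatch", 30000), (2, "Dvorak", "Sales", 35000), (3, "Svoboda", "Dispatch", 32000)],
)

# ... and two logical views of it. Views store no data of their own;
# queries on them are translated into reads of the Employee table.
conn.execute("CREATE VIEW phone_list AS SELECT name, department FROM Employee")
conn.execute(
    "CREATE VIEW dispatch_payroll AS SELECT name, salary FROM Employee WHERE department = 'Dispatch'"
)

receptionist_view = conn.execute("SELECT * FROM phone_list ORDER BY name").fetchall()
dispatch_manager_view = conn.execute("SELECT * FROM dispatch_payroll ORDER BY name").fetchall()
print(receptionist_view)      # [('Dvorak', 'Sales'), ('Novak', 'Dispatch'), ('Svoboda', 'Dispatch')]
print(dispatch_manager_view)  # [('Novak', 30000), ('Svoboda', 32000)]
```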
3.2. Data Definition Subsystem
The data definition subsystem helps you to create and maintain the data dictionary
and define the structure of the files in the database.
To understand how a database is created, we can compare it with creating a worksheet. If you want to make a worksheet, you can start adding information to it immediately.
With databases it is not so simple. Before you can add information to a database, you need to create its logical structure and set its rules using the data definition subsystem. The same process applies if you want to change or delete some data.
To make working with a database easier, there are many data manipulation tools, including views, report generators, query-by-example tools and structured query language.
3.3. Data Manipulation Subsystem
The data manipulation subsystem helps you add, change and delete information in a database and query it for valuable information. It acts as a middleman between the user and the database, translating the user's logical view into the physical view of the database and vice versa.
Views: allow you to see the contents of a database file, make whatever changes you want, perform simple sorting, and query to find the location of specific information.
Report generators: help you quickly define report formats and what information you want to see in a report. Once you define a report, you can view it on the screen or print it.
Query-by-example (QBE) tools: help you graphically design the answer to a question. By clicking and sorting, you can get the information you need.
Structured query language (SQL): serves a similar purpose to QBE but uses statements such as SELECT… FROM…, so it is meant for more experienced users and programmers.
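The SELECT… FROM… style can be shown with a short query run through Python's built-in `sqlite3` module. It answers the kind of question a QBE tool would let you build graphically. The table layout and the `quantity` column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Order" (order_number INTEGER PRIMARY KEY, customer_number INTEGER, quantity REAL)')
conn.executemany(
    'INSERT INTO "Order" VALUES (?, ?, ?)',
    [(1, 10, 8.0), (2, 11, 3.5), (3, 10, 4.0), (4, 12, 15.0)],
)

# Question: which customers ordered more than 10 units in total, largest first?
query = """
    SELECT customer_number, SUM(quantity) AS total
    FROM "Order"
    GROUP BY customer_number
    HAVING SUM(quantity) > 10
    ORDER BY total DESC
"""
results = conn.execute(query).fetchall()
print(results)  # [(12, 15.0), (10, 12.0)]
```

The statement only describes the desired result; the DBMS decides how to scan, group and sort the rows.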
3.4. Application Generation Subsystem
The application generation subsystem is used to develop transaction-intensive applications. These applications require you to perform a detailed series of tasks to process a transaction. As with SQL, this subsystem is used more by IT specialists than by ordinary users.
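A transaction-intensive application typically wraps such a series of tasks in one transaction, so that either all of them take effect or none does. The minimal sketch below uses Python's built-in `sqlite3` module. The `stock` and `orders` tables and the `place_order` function are hypothetical, and the CHECK constraint stands in for a business rule such as "stock may not go negative".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (product TEXT PRIMARY KEY, quantity INTEGER NOT NULL CHECK (quantity >= 0))")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product TEXT NOT NULL, quantity INTEGER NOT NULL)")
conn.execute("INSERT INTO stock VALUES ('cement', 10)")
conn.commit()

def place_order(conn, product, quantity):
    """Record an order and reduce stock as one transaction; return True if committed."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("INSERT INTO orders (product, quantity) VALUES (?, ?)", (product, quantity))
            conn.execute("UPDATE stock SET quantity = quantity - ? WHERE product = ?", (quantity, product))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the order row was rolled back as well.
        return False

first = place_order(conn, "cement", 4)   # True: stock goes from 10 to 6
second = place_order(conn, "cement", 7)  # False: stock would become -1
print(first, second)
print(conn.execute("SELECT quantity FROM stock").fetchone()[0])   # 6
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 1
```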
3.5. Data Administration Subsystem
The data administration subsystem is mostly used by a data administrator or database administrator, the person responsible for ensuring that the database meets the entire information needs of an organization. It provides the following facilities:

Backup and recovery

Security management facilities

Query optimization facilities

Reorganization facilities

Concurrency control facilities

Change management facilities
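Backup and recovery can be sketched with the `Connection.backup` method of Python's `sqlite3` module (available since Python 3.7), which copies an entire database into another connection. In the minimal example below both databases live in memory and the Truck table is hypothetical; in practice the backup would be written to a file on separate storage.

```python
import sqlite3

# Live database with a hypothetical Truck table.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE Truck (truck_number INTEGER PRIMARY KEY, date_of_purchase TEXT)")
live.execute("INSERT INTO Truck VALUES (1, '2009-04-01')")
live.commit()

# Backup: copy the whole database (in practice to a file on separate storage).
backup_copy = sqlite3.connect(":memory:")
live.backup(backup_copy)

# Simulated accident: the data disappear from the live database ...
live.execute("DELETE FROM Truck")
live.commit()

# ... recovery: copy the backup back over the live database.
backup_copy.backup(live)
restored = live.execute("SELECT * FROM Truck").fetchall()
print(restored)  # [(1, '2009-04-01')]
```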
4. Data Warehouse and Data Mining
Nowadays, every technological advance is seen as an advantage for companies. New technology is certainly often fast, logical and helpful. However, many companies adopt the newest technologies simply because they are “hot”. It is sometimes better to answer these questions first:
Do you need it?
Do all employees need all of the data?
How up-to-date must the information be?
What tools do you need?
A data warehouse can be defined as a data store (i.e. a database) used for analysis and reporting. The source data come from multiple transaction systems within the company (or organization); the data warehouse keeps a copy of these data and thus maintains data history.
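Keeping a copy of the source data and thereby maintaining history can be sketched in a few lines of Python. The two source systems, their fields and the `load_snapshot` function are hypothetical. Each load appends a dated copy of the rows, so when an operational system overwrites a value, the warehouse still keeps the earlier version.

```python
from datetime import date

def load_snapshot(warehouse, load_date, sources):
    """Append a dated copy of every source row; earlier loads are never overwritten."""
    for source_name, rows in sources.items():
        for row in rows:
            warehouse.append({"load_date": load_date, "source": source_name, **row})

# Hypothetical extracts from two transaction systems.
orders = [{"customer": "C1", "amount": 120}]
payments = [{"customer": "C1", "paid": 100}]

warehouse = []
load_snapshot(warehouse, date(2011, 5, 1), {"orders": orders, "payments": payments})

# The operational system overwrites its current value ...
orders[0]["amount"] = 150
load_snapshot(warehouse, date(2011, 6, 1), {"orders": orders, "payments": payments})

# ... but the warehouse still holds both versions, i.e. the data history.
history = [(row["load_date"], row["amount"]) for row in warehouse if row["source"] == "orders"]
print(history)
```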
Data warehouses may be multidimensional (the OLAP cube), normalized, or they may use another data structure.
Figure 1: OLAP cube
If we decide that a data warehouse is indispensable and create one, it is time to think about tools that can provide all the information we want. These are data-mining tools: software tools used to query the information in a data warehouse and to find information hidden in the data. These tools support the concept of OLAP and include query-and-reporting tools, intelligent agents, multidimensional analysis tools and statistical tools.

Query-and-reporting tools: similar to SQL, QBE tools and report generators.

Intelligent agents: use artificial intelligence techniques as a basis for discovering information and building business intelligence.

Multidimensional analysis tools: allow you to view information from different perspectives.

Statistical tools: allow you to apply mathematical models to the information.
One of the questions above was whether all employees need all of the data. For this reason, a company can create subsets of a data warehouse, called data marts. A data mart contains data for a specific group of employees who do not need the whole data warehouse.
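A data mart can be sketched as a subset copied out of the warehouse. Below, with hypothetical table and column names, a `CREATE TABLE ... AS SELECT` statement in SQLite (through Python's `sqlite3` module) builds a mart for a southern sales team that contains only their region's rows and leaves out the cost column they do not need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical warehouse table fed from several transaction systems.
conn.execute("CREATE TABLE warehouse_sales (region TEXT, product TEXT, amount REAL, unit_cost REAL)")
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)",
    [("north", "cement", 100, 60), ("south", "cement", 40, 25), ("south", "gravel", 60, 30)],
)

# Data mart for the southern sales team: only their region, and no cost data.
conn.execute("""
    CREATE TABLE mart_south_sales AS
    SELECT product, amount FROM warehouse_sales WHERE region = 'south'
""")

mart_rows = conn.execute("SELECT product, amount FROM mart_south_sales ORDER BY product").fetchall()
mart_columns = [col[1] for col in conn.execute("PRAGMA table_info(mart_south_sales)")]
print(mart_rows)     # [('cement', 40.0), ('gravel', 60.0)]
print(mart_columns)  # ['product', 'amount']
```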
Creating and maintaining a data warehouse is a huge job even for the largest companies. It can take a long time and cost a lot of money. That is why a company should first answer the questions above.
5. Information Ownership
One particularly interesting feature of many BI software packages is the digital dashboard. A digital dashboard displays key information gathered from several sources on a computer screen in a format tailored to the needs and wants of an individual knowledge worker. It can provide up-to-the-minute snapshots of any type of information and can often help you identify trends that may represent opportunities or problems. Refer to Attachment no. 2 for details.
Data administration is the function in an organization that plans for, oversees the
development of, and monitors the information resource. This function must be completely
in tune with the strategic direction of the organization to assure that all information
requirements can be and are being met.
Database administration is the function in an organization that is responsible for the more technical and operational aspects of managing the information contained in organizational information repositories (databases, data warehouses, and data marts). Database administration functions include defining and organizing database structures and contents, developing security procedures (in concert with the chief security officer, CSO), and approving and monitoring the development of databases and database applications.
In large organizations, both of these administrative functions are usually handled
by steering committees rather than by a single individual. These steering committees are
responsible for their respective functions and for reporting to the CIO.
5.1. Sharing information with responsibility
Information sharing in your organization means that anyone can access and use whatever information they need. But information sharing brings to light an important question: does anyone in your organization own the information? In other words, if everyone shares the information, who is ultimately responsible for providing it and assuring its quality? Information ownership is a key consideration in today's information-based business environment. Someone must accept full responsibility for providing specific pieces of information and for ensuring their quality. If you find that wrong information is stored in the organization's data warehouse, you must be able to determine the source of the problem and whose responsibility it is.
5.2. Information cleanliness
Information cleanliness is an important topic today and will remain so for many years. Have you ever received the same piece of advertising mail multiple times from the same company on the same day? Many people have, and it is an example of “unclean” information. The reason may be that your name appears twice in the database, once with your middle initial and once without it, or once with your last name spelled correctly and once misspelled. If your information appears twice in the database with two different spellings of your last name, a data-cleansing utility would probably determine that the two records actually belong to the same person, because the other associated information, such as your address and phone number, is identical. Always remember GIGO: garbage in, garbage out. If bad information, such as duplicate records for the same customer, goes into the decision-making process, you can rest assured that the decision outcome will not be optimal.
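The matching rule described above, treating records with an identical address and phone number as one customer, can be sketched in a few lines of Python. The records are hypothetical, and normalizing text by lowercasing it and dropping spaces and punctuation is an assumed, deliberately simple rule; real data-cleansing utilities use far more sophisticated matching.

```python
from collections import defaultdict

# Hypothetical customer records; 1 and 2 are the same person entered twice.
customers = [
    {"id": 1, "name": "Jan Novak", "address": "12 Main Street, Brno", "phone": "545 131 111"},
    {"id": 2, "name": "Jan M. Novack", "address": "12 Main street, Brno", "phone": "545131111"},
    {"id": 3, "name": "Eva Dvorakova", "address": "5 Park Lane, Brno", "phone": "545 000 222"},
]

def normalize(text: str) -> str:
    """Lowercase and keep only letters and digits, so formatting differences vanish."""
    return "".join(ch for ch in text.lower() if ch.isalnum())

def find_duplicates(records):
    """Group record ids whose normalized address and phone number are identical."""
    groups = defaultdict(list)
    for record in records:
        key = (normalize(record["address"]), normalize(record["phone"]))
        groups[key].append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

print(find_duplicates(customers))  # [[1, 2]]
```

The names are deliberately left out of the key, because a middle initial or a misspelled last name is exactly what makes the two records differ.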
6. Conclusion
Business intelligence is one of the hottest topics and markets today. The entire BI market is in the range of 50 billion dollars annually, and double-digit growth can be expected for the next several years. When 300 business technology managers were asked about their immediate project plans, 44% identified data warehouses and 43% identified data-mining tools. Nowadays, BI is associated with thousands of success stories.
The objective of BI is to improve the timeliness and quality of the input to decision making by helping knowledge workers to understand the:

capabilities available in the organization,

state of the art, trends and future directions in the market,

technological, demographic, economic, political, social and regulatory environments in which the organization competes,

actions of competitors and the implications of these actions.
Business intelligence covers both internal and external information. Companies with well-designed BI systems find that their managers make better decisions on a variety of business issues. Higher-quality managerial decision making lets companies gain an advantage over competitors who operate without the benefit of BI systems. BI systems provide managers with actionable information and knowledge:

at the right time,

in the right location,

in the right form.
7. Resources
[1]
HAAG, Stephen; CUMMINGS, Maeve. Management Information Systems for the Information Age. 8th ed. New York: Irwin, 2010. 555 p. ISBN 978-0-07-016-7094.
[2]
WATTERSON, Karen. Data miners Tool. Byte, 192 p.
[3]
DATABASE MODELS. Relational Model [online]. [accessed on 2011-05-08]. Available from WWW: <http://unixspace.com/context/databases.html#RELATIONAL>.
[4]
COMPUTER BUSINESS RESEARCH. Relational integrity constraint [online]. [accessed on 2011-05-08]. Available from WWW: <http://sites.google.com/site/b188sjsu/Home/database/relational-integrityconstraint>.
[5]
BUSINESS INTELLIGENCE SOFTWARE, DASHBOARDS, REPORTING. OLAP Definition and its use in Business Intelligence Applications [online]. 2010 [accessed on 2011-10-23]. Available from WWW: <http://www.logixml.com/biencyclopedia/olap/>.
[6]
WIKIPEDIA: THE FREE ENCYCLOPEDIA. Online Transaction Processing [online]. 2011-08-30 [accessed on 2011-10-25]. Available from WWW: <http://en.wikipedia.org/wiki/OLTP>.
8. Attachment
Attachment no.1
How is database integrity assured within the relational database environment?
Attachment no.2
Digital dashboard
Attachment no.3
Attachment no.4
Attachment no.5
Attachment no.6
Attachment no.7