Mendel University in Brno
Business Intelligence
Information Systems
Karel Burda, Bc. Sabina Baběrádová, Bc. Jan Huráb, Bc. Markéta Michálková, Bc. Petr Krejčík, Bc. Nikol Varčevová
Brno 2011

1. Introduction
Business intelligence (BI) is one of the most important and fastest-developing phenomena in information technology. Put simply, BI is a set of tools whose main aim is to collect business information and analyze it. BI therefore supports us when we make strategic business decisions, and this is ultimately its main purpose. As the amount of data and information in businesses grows, business intelligence will become more and more important, and investment in BI is expected to keep rising (Rydziová, 2006). Because the goal of BI is complex, BI systems themselves are complicated and demanding as well.

Organizations need more than just data and information. They need business intelligence: collective information about customers, competitors, business partners, the competitive environment and internal operations that gives the company the ability to make effective, important and often strategic business decisions. BI enables an organization to extract the true meaning of its information, so that it can take creative and powerful steps to secure a competitive advantage.

To create business intelligence, you need data and information. You must therefore first gather and organize all your data and information, and then you need the right IT tools to define and analyze the various relationships within that information. Technologies such as databases, database management systems, data warehouses and data-mining tools can help you build and use business intelligence.
As you begin working with these IT tools, you will be performing two types of information processing.

Online transaction processing (OLTP) is the gathering of input information, processing that information and updating existing information to reflect what was gathered and processed. OLTP defines a class of systems that facilitate and support ordinary operational activities; data entry is one example. Among the benefits are fewer paper trails and faster, more accurate forecasts of revenues and expenses [6]. Databases that support OLTP are most often referred to as operational databases. These operational databases contain valuable information that forms the basis of business intelligence.

Online analytical processing (OLAP) is the manipulation of information to support decision making. Given some data, OLAP tools analyze it along different dimensions, and the product of this process is an OLAP cube (sometimes called a multidimensional cube). For example, if you have a table of sales data, you can analyze it by product type (one dimension), by demographic group (another dimension), by geographic region (yet another dimension) and so on. The underlying data stay the same; what changes is which column, or dimension, you place first and therefore how the data are grouped [5].

2. The Relational Database Model
To store and organize the basic, transaction-oriented information that is eventually used to create business intelligence, businesses today use databases. Here we focus on the most popular database model, the relational database model. The relational model was conceived in 1969 by E. F. Codd, then a researcher at IBM. It is based on two branches of mathematics: set theory and predicate logic.
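Before going further into the relational model, the two kinds of processing from Section 1 can be contrasted on a single relational table. The sketch below is only an illustration, assuming a hypothetical Sales table whose column names and rows are invented for the example and using Python's built-in sqlite3 module. The OLTP part records and corrects individual transactions; the OLAP part leaves the rows untouched and only summarizes them along different dimensions (product type, region), which is the grouping idea behind an OLAP cube.

```python
import sqlite3

# In-memory database; the Sales table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Sales (
           sale_id      INTEGER PRIMARY KEY,
           product_type TEXT NOT NULL,
           region       TEXT NOT NULL,
           amount       REAL NOT NULL
       )"""
)

# --- OLTP: gather input, process it, update existing information ---
def record_sale(conn, product_type, region, amount):
    """Insert one transaction and commit it immediately."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO Sales (product_type, region, amount) VALUES (?, ?, ?)",
            (product_type, region, amount),
        )

def correct_sale(conn, sale_id, new_amount):
    """Update an existing transaction to reflect corrected input."""
    with conn:
        conn.execute("UPDATE Sales SET amount = ? WHERE sale_id = ?",
                     (new_amount, sale_id))

for row in [("cement", "North", 100.0), ("cement", "South", 250.0),
            ("gravel", "North", 80.0), ("gravel", "South", 40.0)]:
    record_sale(conn, *row)
correct_sale(conn, 4, 60.0)  # the fourth sale was entered with a wrong amount

# --- OLAP: read-only summaries of the same rows along a chosen dimension ---
def totals_by(conn, dimension):
    """Sum sales grouped by one dimension (one face of the 'cube')."""
    if dimension not in ("product_type", "region"):  # whitelist column names
        raise ValueError(dimension)
    query = (f"SELECT {dimension}, SUM(amount) FROM Sales "
             f"GROUP BY {dimension} ORDER BY {dimension}")
    return conn.execute(query).fetchall()

print(totals_by(conn, "product_type"))  # → [('cement', 350.0), ('gravel', 140.0)]
print(totals_by(conn, "region"))        # → [('North', 180.0), ('South', 310.0)]
```

The same stored rows answer both questions; only the grouping column changes, which mirrors the point that the data stay the same while the chosen dimension determines how they are presented.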
The basic idea behind the relational model is that a database consists of a series of unordered tables (or relations) that can be manipulated using non-procedural operations that return tables. This was in sharp contrast to the more traditional database theories of the time, which were much more complicated, less flexible and dependent on the physical storage methods of the data.

Any database can be defined as a collection of information that we organize and access according to the logical structure of that information. A relational database, then, uses a series of logically related two-dimensional tables or files to store information. It is composed of two distinct parts: 1) the information itself, stored in a series of two-dimensional tables, files or relations, and 2) the logical structure of that information.

A relational database allows the definition of data structures, storage and retrieval operations, and integrity constraints. In such a database, the data and the relations between them are organized in tables. A table is a collection of records, and each record in a table contains the same fields. Relational tables have the following properties: values are atomic, each row is unique, all values in a column are of the same kind, the sequence of columns is insignificant, the sequence of rows is insignificant, and each column has a unique name. For real examples of integrity and of the properties of relational tables, please see Attachment 1.

2.1. Example of relational database
First, we introduce a simple example demonstrating the features of the relational database model. Let us look at Solomon's database (a portion of Solomon Enterprises' database for customer relationship management and order processing can be found in Attachment 1). It contains five files: Order, Customer, Concrete Type, Employee and Truck. These files can be found in Attachments 3–7.
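The table properties listed above follow from the set-theoretic roots of the model: a relation is a set of rows, so rows are unique and have no inherent order, and each row maps uniquely named columns to atomic values. The sketch below illustrates this in Python, with a frozen dataclass standing in for a row and a set standing in for a relation. The slice of Solomon's Customer file it uses (names, phone numbers, contacts) is entirely hypothetical, and a real DBMS enforces these properties itself rather than through Python types.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen -> hashable, so rows can be stored in a set
class CustomerRow:
    # Each attribute has a unique name and holds one atomic value.
    customer_number: int
    customer_name: str
    customer_phone: str
    customer_contact: str

# A relation is a *set* of rows (hypothetical sample data).
customer = {
    CustomerRow(1, "Smith Builders", "5551234567", "J. Smith"),
    CustomerRow(2, "Oak Homes", "5559876543", "A. Oak"),
}

# Each row is unique: adding an identical row does not create a duplicate.
customer.add(CustomerRow(1, "Smith Builders", "5551234567", "J. Smith"))
print(len(customer))  # → 2

# The sequence of rows is insignificant: the same rows listed in another
# order form the same relation.
same_relation = {
    CustomerRow(2, "Oak Homes", "5559876543", "A. Oak"),
    CustomerRow(1, "Smith Builders", "5551234567", "J. Smith"),
}
print(customer == same_relation)  # → True

def select(relation, predicate):
    """A relational selection: returns a new set (i.e. another relation)."""
    return {row for row in relation if predicate(row)}

print(select(customer, lambda r: r.customer_number == 2))
```

The `select` helper is a hypothetical name chosen for the illustration; it shows the non-procedural idea that an operation on a relation yields another relation rather than modifying rows one by one.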
These files are related in numerous ways: customers place orders, employees drive trucks, each order has a concrete type, and so on. All of these files are needed to manage customer relationships and process orders. Within each file you can see specific pieces of information (attributes). For example, the Order file contains Order Number, Order Date, Customer Number, Delivery Address, etc. In the Customer file you can see Customer Number, Customer Name, Customer Phone and Customer Contact. These are all important pieces of information that Solomon's database should contain, and Solomon needs all of them (and probably much more) to process orders and manage customer relationships effectively.

2.2. Data Dictionary
In the relational database model, you organize and access information according to its logical structure, not its physical position. The data dictionary contains this logical structure for the information in a database. When you create a database, you first create its data dictionary, which holds important information about your information. For example, the data dictionary entry for Customer Phone in the Customer file would require 10 digits, and the entry for Date of Hire in the Employee file would require a day, month and year. Because you must clearly define the characteristics of each field in the data dictionary, you have to plan the design of your database carefully before you start adding information.

2.3. Requirements of the relational database
In a relational database, it is necessary to create ties, or relationships, in the information that show how the files relate to each other. Before these relationships can be created, the primary key of each file has to be specified. The essential job of a table's primary key is to uniquely identify the rows in the table, nothing more and nothing less.
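The data dictionary rules and the primary key described above can be approximated in SQL data definition. The following sketch assumes SQLite accessed through Python's built-in sqlite3 module and declares a hypothetical Customer table: the CHECK clause plays the role of the data dictionary entry requiring Customer Phone to be exactly 10 digits, and the PRIMARY KEY makes Customer Number a unique row identifier. Column names, the `try_insert` helper and the sample values are our own illustrations, not Solomon's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data-dictionary-like definition: each field's characteristics are declared
# before any information is added. The table layout is hypothetical.
conn.execute(
    """CREATE TABLE Customer (
           customer_number  INTEGER NOT NULL PRIMARY KEY,  -- unique row identifier
           customer_name    TEXT    NOT NULL,
           customer_phone   TEXT    NOT NULL
               CHECK (length(customer_phone) = 10
                      AND customer_phone NOT GLOB '*[^0-9]*'),  -- exactly 10 digits
           customer_contact TEXT
       )"""
)

def try_insert(row):
    """Insert one customer row; return None on success, else the error text."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO Customer VALUES (?, ?, ?, ?)", row)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)

print(try_insert((1, "Smith Builders", "5551234567", "J. Smith")))  # → None
# Rejected: primary key value 1 already exists in this table.
print(try_insert((1, "Oak Homes", "5559876543", "A. Oak")))
# Rejected: 10 characters long, but not all digits.
print(try_insert((2, "Oak Homes", "555-987654", "A. Oak")))
```

The exact wording of the error messages depends on the SQLite version, so the sketch only prints them rather than relying on their text.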
Each primary key value must be unique within a table so that the database engine can tell rows apart. The same primary key value may appear in another table, but it cannot be duplicated within one table. The primary key also cannot be null, because the database engine needs a value to locate the record. The second major job of the primary key is to provide a "hook" for creating relationships between tables. In Solomon's database, Order Number is the primary key of the Order file and Customer Number is the primary key of the Customer file. Customer Number also appears in the Order file, and this logical tie between the two files is an example of a foreign key. A foreign key is a primary key of one file that appears in another file. Foreign keys are essential in the relational database model: without them, there is no way of creating logical ties among the various files.

2.4. Relational Integrity Constraints
Integrity constraints are sets of rules that help maintain the quality of the information that is stored. They are mostly used to promote the accuracy and consistency of the data in a relational database. This matters to companies because information is an asset and must be protected. Relational integrity constraints are therefore rules that all instances of the relational database must satisfy in order to model the real world correctly.

The Ritz-Carlton hotel chain is a good example of ensuring information quality. Ritz-Carlton has created a powerful guest preference database to provide customized, personal, high-level service to guests at any of its hotels. By assigning each guest a unique customer ID that creates logical ties to his or her various preferences, Ritz-Carlton makes the guest's information available at all of its other hotels. The next time you stay at a Ritz-Carlton, e.g. in Florida, your information is already there, and the hotel staff immediately knows your preferences.
For the management at Ritz-Carlton, achieving customer loyalty starts with knowing each customer individually. See Attachment 1 for examples.

3. Database management system
The easiest way to understand a database is to think of it as similar to a document or a workbook: they all contain information. Just as you need a word processor to work with a document, you need a database management system to work with a database. A database management system (DBMS) helps you find your way around the database and work with it. A DBMS contains five important software components:
1. DBMS engine
2. Data definition subsystem
3. Data manipulation subsystem
4. Application generation subsystem
5. Data administration subsystem

3.1. DBMS engine
The DBMS engine accepts logical requests from the other DBMS subsystems, converts them into their physical equivalents, and actually accesses the database and data dictionary as they exist on a storage device. Here it is important to explain the difference between the physical and the logical view. The physical view of information deals with how the information is physically arranged and stored on a storage device (e.g. a hard drive). The logical view focuses on how you access the information you need. For one physical view there can be many logical views, according to the needs of each user of the database.

3.2. Data Definition Subsystem
The data definition subsystem helps you create and maintain the data dictionary and define the structure of the files in the database. Creating a database can be contrasted with creating a worksheet. When you create a worksheet, you can start adding information to it immediately. With a database it is not so simple: before you can add information, you need to define the logical structure of the database and set its rules using the data definition subsystem. You go through the same process when you want to change or delete parts of that structure.
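The data definition subsystem is where the logical structure and its rules are declared, and a view is one way to give different users different logical views of the same stored tables (Section 3.1). The sketch below, again assuming SQLite through Python's sqlite3 module, defines hypothetical Customer and Orders tables linked by Customer Number as a foreign key (Section 2.3), plus a view that shows only part of the data. The table is named Orders because ORDER is an SQL keyword; all names and rows are illustrative assumptions. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled

# Data definition: structure and integrity rules are declared first.
conn.executescript(
    """
    CREATE TABLE Customer (
        customer_number INTEGER PRIMARY KEY,
        customer_name   TEXT NOT NULL
    );
    CREATE TABLE Orders (
        order_number     INTEGER PRIMARY KEY,
        order_date       TEXT NOT NULL,             -- ISO date, e.g. '2011-05-08'
        customer_number  INTEGER NOT NULL
            REFERENCES Customer (customer_number),  -- foreign key
        delivery_address TEXT
    );
    -- A logical view: one way of looking at the same physical tables.
    CREATE VIEW OrderSummary AS
        SELECT o.order_number, o.order_date, c.customer_name
        FROM Orders AS o JOIN Customer AS c USING (customer_number);
    """
)

with conn:
    conn.execute("INSERT INTO Customer VALUES (1, 'Smith Builders')")
    conn.execute("INSERT INTO Orders VALUES (100, '2011-05-08', 1, 'Main St 1')")

# An order for a customer that does not exist violates referential integrity.
try:
    with conn:
        conn.execute("INSERT INTO Orders VALUES (101, '2011-05-09', 99, 'Elm St 2')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

print(conn.execute("SELECT * FROM OrderSummary").fetchall())
# → [(100, '2011-05-08', 'Smith Builders')]
```

Users who query OrderSummary see order numbers, dates and customer names without the delivery addresses, while the stored tables are unchanged. This is a small instance of several logical views over one physical store.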
Once the structure exists, several data manipulation tools make the database easier to use: views, report generators, query-by-example tools and structured query language.

3.3. Data Manipulation Subsystem
The data manipulation subsystem helps you add, delete or change the information in a database and query it for valuable information. It acts as a middleman between the user and the database, translating the user's logical view into the physical view of the database and vice versa.
Views: allow you to see the content of a database file, make whatever changes you want, perform simple sorting, and query to find the location of specific information.
Report generators: help you quickly define the format of a report and what information you want to see in it. Once you define a report, you can view it on the screen or print it.
Query-by-example (QBE) tools: help you graphically design the answers to questions. By pointing, clicking and sorting you can retrieve the information you need.
Structured query language (SQL): serves the same purpose as QBE tools but uses statements such as SELECT … FROM … WHERE, so it is intended for more experienced users and programmers.

3.4. Application Generation Subsystem
The application generation subsystem is used to develop transaction-intensive applications. These applications require you to perform a detailed series of tasks to process a transaction. As with SQL, this subsystem is used more by IT specialists than by ordinary users.

3.5. Data Administration Subsystem
The data administration subsystem is mostly used by a data administrator or database administrator, the person responsible for assuring that the database meets the entire information needs of an organization. It includes:
Backup and recovery facilities
Security management facilities
Query optimization facilities
Reorganization facilities
Concurrency control facilities
Change management facilities

4. Data Warehouse and Data Mining
Nowadays every technological advance is seen as an advantage for companies. New technology certainly can be fast, logical and helpful.
However, many companies adopt the newest technologies simply because they are "hot". It is often better to answer a few questions first: Do you need it? Do all employees need all of the data? How up to date must the information be? What tools do you need?

A data warehouse can be defined as a store of data (i.e. a database) used for analysis and reporting. The source data come from multiple transaction systems within the company (or organization); the data warehouse keeps a copy of these data and thus maintains their history. Data warehouses may be multidimensional (the OLAP cube), normalized, or use another data structure.

Figure 1: OLAP cube

Once a company has decided that it cannot do without a data warehouse and has built one, it is time to think about tools that can provide all the information it wants. These are data-mining tools: software tools used to find information hidden in the data. They support the OLAP concept and include query-and-reporting tools, intelligent agents, multidimensional analysis tools and statistical tools.
Query-and-reporting tools: similar to SQL, QBE tools and report generators.
Intelligent agents: use artificial intelligence techniques, which form a basis for building business intelligence.
Multidimensional analysis tools: allow you to see information from different perspectives.
Statistical tools: allow you to apply mathematical models.

One of the questions above was whether everybody in the company should be allowed to see all information. For this reason, a company can create subsets of a data warehouse called data marts. A data mart contains the data for a specific segment of employees who do not need all of the data. Creating and maintaining a data warehouse is a huge job even for the largest companies; it can take a long time and cost a lot of money. That is why a company should first answer the questions above.

5. Information Ownership
One particularly interesting feature of many BI software packages is the digital dashboard.
A digital dashboard displays key information gathered from several sources on a computer screen, in a format tailored to the needs and wants of an individual knowledge worker. It can provide up-to-the-minute snapshots of any type of information and can often help you identify trends that may represent opportunities or problems. Refer to Attachment 2 for details.

Data administration is the function in an organization that plans for, oversees the development of, and monitors the information resource. This function must be completely in tune with the strategic direction of the organization to ensure that all information requirements can be and are being met. Database administration is the function responsible for the more technical and operational aspects of managing the information contained in organizational information repositories (databases, data warehouses and data marts). Database administration functions include defining and organizing database structures and contents, developing security procedures (in concert with the chief security officer, CSO), and approving and monitoring the development of databases and database applications. In large organizations, both of these administrative functions are usually handled by steering committees rather than by a single individual. These steering committees are responsible for their respective functions and report to the CIO.

5.1. Sharing information with responsibility
Information sharing in an organization means that anyone can access and use whatever information they need. But information sharing raises an important question: does anyone in your organization own the information? In other words, if everyone shares the information, who is ultimately responsible for providing it and assuring its quality? Information ownership is a key consideration in today's information-based business environment.
Someone must accept full responsibility for providing specific pieces of information and ensuring their quality. If you find that wrong information is stored in the organization's data warehouse, you must be able to determine the source of the problem and whose responsibility it is.

5.2. Information cleanliness
Information cleanliness is an important topic today and will remain so for many years. Have you ever received the same piece of advertising mail multiple times from the same company on the same day? Many people have, and it is an example of "unclean" information. The reason may be that your name appears twice in the database, once with your middle initial and once without it, or with your last name spelled correctly in one record and misspelled in the other. In the case of two records with different spellings of your last name, a data-cleansing utility would probably determine that both records belong to the same person, because the other associated information, such as your address and phone number, is identical. Always remember GIGO: garbage in, garbage out. If bad information, such as duplicate records for the same customer, goes into the decision-making process, you can be sure that the outcome will not be optimal.

6. Conclusion
Business intelligence is one of the hottest topics and markets today. The entire BI market is in the range of 50 billion dollars annually, and double-digit growth can be expected for the next several years. When 300 business technology managers were asked about their immediate project plans, 44% identified data warehouses and 43% identified data-mining tools. Today, BI is associated with thousands of success stories.
The objective of BI is to improve the timeliness and quality of the input to decision making by helping knowledge workers understand: the capabilities available in the organization; the state of the art, trends and future directions in the market; the technological, demographic, economic, political, social and regulatory environments in which the organization competes; and the actions of competitors and the implications of these actions. Business intelligence covers both internal and external information. Companies with a well-designed BI system find that their managers make better decisions on a variety of business issues. Higher-quality managerial decision making lets companies gain an advantage over competitors that operate without the benefit of BI systems. BI systems provide managers with actionable information and knowledge at the right time, in the right location and in the right form.

7. Resources
[1] HAAG, Stephen; CUMMINGS, Maeve. Management Information Systems for the Information Age. 8th ed. New York: Irwin, 2010. 555 p. ISBN 978-0-07-016-7094.
[2] WATTERSON, Karen. Data miners Tool. Byte, 192 p.
[3] DATABASE MODELS. Relational Model [online]. [accessed on 2011-05-08]. Available from WWW: <http://unixspace.com/context/databases.html#RELATIONAL>.
[4] COMPUTER BUSINESS RESEARCH. Relational integrity constraint [online]. [accessed on 2011-05-08]. Available from WWW: <http://sites.google.com/site/b188sjsu/Home/database/relational-integrityconstraint>.
[5] BUSINESS INTELLIGENCE SOFTWARE, DASHBOARDS, REPORTING. OLAP Definition and its use in Business Intelligence Applications [online]. 2010 [accessed on 2011-10-23]. Available from WWW: <http://www.logixml.com/biencyclopedia/olap/>.
[6] WIKIPEDIA: THE FREE ENCYCLOPEDIA. Online Transaction Processing [online]. 2011-08-30 [accessed on 2011-10-25]. Available from WWW: <http://en.wikipedia.org/wiki/OLTP>.

8. Attachments
Attachment no. 1: How is database integrity assured within the relational database environment?
Attachment no. 2: Digital dashboard
Attachment no. 3
Attachment no. 4
Attachment no. 5
Attachment no. 6
Attachment no. 7