Chapter 5 Foundations of Business Intelligence: Databases and Information Management

An effective information system provides users with accurate, timely, and relevant information. Accurate information is free of errors. Information is timely when it is available to decision makers when it is needed.

File Organization Terms and Concepts

A computer system organizes data in a hierarchy that starts with bits and bytes and progresses to fields, records, files, and databases.

Problems with the Traditional File Environment

In most organizations, systems tended to grow independently without a company-wide plan. Accounting, finance, manufacturing, human resources, and sales and marketing all developed their own systems and data files.

Fig. Traditional File Processing

Data Redundancy and Inconsistency

Data redundancy is the presence of duplicate data in multiple data files, so that the same data are stored in more than one place or location. Data redundancy wastes storage resources and also leads to data inconsistency, where the same attribute may have different values.

Program-Data Dependence

Program-data dependence refers to the coupling of data stored in files and the specific programs required to update and maintain those files, such that changes in programs require changes to the data.

Lack of Flexibility

A traditional file system can deliver routine scheduled reports after extensive programming efforts, but it cannot deliver ad hoc reports or respond to unanticipated information requirements in a timely fashion.

Poor Security

Because there is little control or management of data, access to and dissemination of information may be out of control. Management may have no way of knowing who is accessing or even making changes to the organization’s data.

Lack of Data Sharing and Availability

Because pieces of information in different files and different parts of the organization cannot be related to one another, it is virtually impossible for information to be shared or accessed in a timely manner.

The Database Approach to Data Management

Database Management Systems

A database management system (DBMS) is software that permits an organization to centralize data, manage them efficiently, and provide access to the stored data by application programs.

How a DBMS Solves the Problems of the Traditional File Environment

A DBMS reduces data redundancy and inconsistency by minimizing isolated files in which the same data are repeated. The DBMS may not enable the organization to eliminate data redundancy entirely, but it can help control redundancy.

Relational DBMS

Contemporary DBMS use different database models to keep track of entities, attributes, and relationships. The most popular type of DBMS today for PCs as well as for larger computers and mainframes is the relational DBMS.

Fig. Relational Database Tables

Operations of a Relational DBMS

Relational database tables can be combined easily to deliver data required by users, provided that any two tables share a common data element.

Object-Oriented DBMS

An object-oriented DBMS stores the data and procedures that act on those data as objects that can be automatically retrieved and shared. Hybrid object-relational DBMS are now available to provide capabilities of both object-oriented and relational DBMS.

Databases in the Cloud

Cloud computing providers offer database management services, but these services typically have less functionality than their on-premises counterparts.

Capabilities of Database Management Systems

DBMS have a data definition capability to specify the structure of the content of the database. A data dictionary is an automated or manual file that stores definitions of data elements and their characteristics.

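The data definition capability and the data dictionary described above can be illustrated with a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for a DBMS; the supplier table, its column names, and the data_dictionary helper are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database used only for this illustration.
conn = sqlite3.connect(":memory:")

# Data definition: specify the structure of a (hypothetical) supplier table.
conn.execute(
    """
    CREATE TABLE supplier (
        supplier_number INTEGER PRIMARY KEY,
        supplier_name   TEXT NOT NULL,
        supplier_city   TEXT
    )
    """
)

def data_dictionary(connection, table):
    """Return (name, type, required, is_key) for each column of a table."""
    rows = connection.execute(f"PRAGMA table_info({table})").fetchall()
    # PRAGMA table_info yields: cid, name, type, notnull, dflt_value, pk
    return [(r[1], r[2], bool(r[3]), bool(r[5])) for r in rows]

dictionary = data_dictionary(conn, "supplier")
for entry in dictionary:
    print(entry)
```

Here the DBMS itself records the definition of each data element, so the listing plays the role of an automated data dictionary. A production data dictionary would typically also record ownership, usage, and business definitions, which this sketch does not attempt.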
Querying and Reporting

Most DBMS have a specialized language called a data manipulation language that is used to add, change, delete, and retrieve the data in the database.

Designing Databases

To create a database, you must understand the relationships among the data, the type of data that will be maintained in the database, how the data will be used, and how the organization will need to change to manage data from a company-wide perspective. The database requires both a conceptual design and a physical design.

Normalization and Entity-Relationship Diagrams

The process of creating small, stable, yet flexible and adaptive data structures from complex groups of data is called normalization. A diagram that illustrates the relationships among the entities SUPPLIER, PART, LINE_ITEM, and ORDER is called an entity-relationship diagram.

Fig. Normalized Tables Created From Order

Fig. An Entity-Relationship Diagram

Using Databases to Improve Business Performance and Decision Making

Businesses use their databases to keep track of basic transactions, such as paying suppliers, processing orders, keeping track of customers, and paying employees. But they also need databases to provide information that will help the company run the business more efficiently and help managers and employees make better decisions.

Data Warehouses

A data warehouse is a database that stores current and historical data of potential interest to decision makers throughout the company.

Fig. Components of a Data Warehouse

Data Marts

A data mart is a subset of a data warehouse in which a summarized or highly focused portion of the organization’s data is placed in a separate database for a specific population of users.

Tools for Business Intelligence: Multidimensional Data Analysis and Data Mining

Online Analytical Processing (OLAP)

Online Analytical Processing (OLAP) supports multidimensional data analysis, enabling users to view the same data in different ways using multiple dimensions.

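The idea of viewing the same data along different dimensions can be sketched in a few lines of Python. This is only an illustration of the concept, not how an OLAP product works internally; the sales records, the dimension names (product, region, month), and the summarize helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sales facts: three dimensions and one measure per record.
SALES = [
    {"product": "nuts",  "region": "East", "month": "Jan", "units": 50},
    {"product": "nuts",  "region": "West", "month": "Jan", "units": 60},
    {"product": "bolts", "region": "East", "month": "Jan", "units": 70},
    {"product": "bolts", "region": "East", "month": "Feb", "units": 80},
    {"product": "nuts",  "region": "East", "month": "Feb", "units": 40},
]

def summarize(facts, dimensions, measure="units"):
    """Total the measure for every combination of the chosen dimensions."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[d] for d in dimensions)
        totals[key] += fact[measure]
    return dict(totals)

# The same facts viewed three different ways.
by_product = summarize(SALES, ["product"])
by_region = summarize(SALES, ["region"])
by_product_region = summarize(SALES, ["product", "region"])

print(by_product)
print(by_region)
print(by_product_region)
```

Each call answers a different question from the same underlying records, which is the essence of multidimensional analysis. Real OLAP tools generally precompute or index such summaries so that answers stay fast on very large data sets.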
OLAP enables users to obtain online answers to ad hoc questions fairly rapidly, even when the data are stored in very large databases, such as sales figures for multiple years.

Data Mining

Data mining is more discovery-driven. Data mining provides insights into corporate data that cannot be obtained with OLAP by finding hidden patterns and relationships in large databases and inferring rules from them to predict future behavior.

Text Mining and Web Mining

Text mining tools are now available to help businesses analyze unstructured text data. These tools are able to extract key elements from large unstructured data sets, discover patterns and relationships, and summarize the information. Web mining is the discovery and analysis of useful patterns and information from the World Wide Web. Businesses might turn to Web mining to help them understand customer behavior, evaluate the effectiveness of a particular Web site, or quantify the success of a marketing campaign.

Managing Data Resources

Setting up a database is only a start. To make sure that the data for your business remain accurate, reliable, and readily available to those who need them, your business will need special policies and procedures for data management.

Establishing an Information Policy

An information policy specifies the organization’s rules for sharing, disseminating, acquiring, standardizing, classifying, and inventorying information. Data administration is responsible for the specific policies and procedures through which data can be managed as an organizational resource. Data governance is a term used to describe many of these activities. Promoted by IBM, data governance deals with the policies and processes for managing the availability, usability, integrity, and security of the data employed in an enterprise, with special emphasis on promoting privacy, security, data quality, and compliance with government regulations.

Ensuring Data Quality

Analysis of data quality often begins with a data quality audit, which is a structured survey of the accuracy and level of completeness of the data in an information system. Data cleansing, also known as data scrubbing, consists of activities for detecting and correcting data in a database that are incorrect, incomplete, improperly formatted, or redundant.

Networking and Communication Trends

Firms in the past used two fundamentally different types of networks: telephone networks, which handled voice communication, and computer networks, which handled data traffic. Both voice and data communication networks have also become more powerful (faster), more portable (smaller and mobile), and less expensive. In a few years, more than half the Internet users in the United States will use smartphones and mobile netbooks to access the Internet.

Computer Network

A computer network consists of two or more connected computers. Each computer on the network contains a network interface device called a network interface card (NIC). The network operating system (NOS) routes and manages communications on the network and coordinates network resources. Hubs are very simple devices that connect network components, sending a packet of data to all the other connected devices. A switch has more intelligence than a hub and can filter and forward data to a specified destination on the network. A router is a communications processor used to route packets of data through different networks, ensuring that the data sent get to the correct address.

Fig. Components of a Simple Computer Network

Fig. Today's Corporate Network Infrastructure

Key Digital Networking Technologies

Contemporary digital networks and the Internet are based on three key technologies: client/server computing, the use of packet switching, and the development of widely used communications standards (the most important of which is Transmission Control Protocol/Internet Protocol, or TCP/IP) for linking disparate networks and computers.

Communications Networks

Signals: Digital vs. Analog

An analog signal is represented by a continuous waveform that passes through a communications medium and has been used for voice communication. A digital signal is a discrete, binary waveform, rather than a continuous waveform.

Types of Networks

Type                              Area
Local area network (LAN)          Up to 500 meters (half a mile); an office or floor of a building
Campus area network (CAN)         Up to 1,000 meters (a mile); a college campus or corporate facility
Metropolitan area network (MAN)   A city or metropolitan area
Wide area network (WAN)           A transcontinental or global area

The Global Internet

The Internet has become the world's most extensive public communication system and now rivals the global telephone system in reach and range. An Internet service provider (ISP) is a commercial organization with a permanent connection to the Internet that sells temporary connections to retail subscribers.

The Domain Name System

Because it would be incredibly difficult for Internet users to remember strings of 12 numbers, the Domain Name System (DNS) converts domain names to IP addresses.

.com    Commercial organizations/businesses
.edu    Educational institutions
.gov    U.S. government agencies
.mil    U.S. military
.net    Network computers
.org    Nonprofit organizations and foundations
.biz    Business firms
.info   Information providers

Internet Services and Communication Tools

Internet Services

Capability                       Functions Supported
E-mail                           Person-to-person messaging; document sharing
Chatting and instant messaging   Interactive conversations
Newsgroups                       Discussion groups on electronic bulletin boards
Telnet                           Logging on to one computer system and doing work on another
File Transfer Protocol (FTP)     Transferring files from computer to computer
World Wide Web                   Retrieving, formatting, and displaying information (including text, audio, graphics, and video) using hypertext links

Fig. How Voice Over IP Works

The Web

A typical Web site is a collection of Web pages linked to a home page.

Hypertext

Web pages are based on a standard Hypertext Markup Language (HTML), which formats documents and incorporates dynamic links to other documents and pictures stored in the same or remote computers. Hypertext Transfer Protocol (HTTP) is the communications standard used to transfer pages on the Web.

Web Servers

A Web server is software for locating and managing stored Web pages.

Searching for Information on the Web

Search Engines

Search engines attempt to solve the problem of finding useful information on the Web nearly instantly, and, arguably, they are the "killer app" of the Internet era. Search engines have become major shopping tools by offering what is now called search engine marketing. Search engine optimization (SEO) is the process of improving the quality and volume of Web traffic to a Web site by helping the site achieve a higher ranking with the major search engines when certain keywords and phrases are put in the search field.

Web 2.0

The second-generation interactive Internet-based services are referred to as Web 2.0. Web 2.0 has four defining features: interactivity, real-time user control, social participation (sharing), and user-generated content.

A blog, the popular term for a Weblog, is a personal Web site that typically contains a series of chronological entries (newest to oldest) by its author, and links to related Web pages.

Web 3.0: The Future Web

The future of the Web involves developing techniques to make searching the 100 billion public Web pages more productive and meaningful for ordinary people. Web 1.0 solved the problem of obtaining access to information. Web 2.0 solved the problem of sharing that information with others and building new Web experiences. Web 3.0 is the promise of a future Web where all this digital information, all these contacts, can be woven together into a single meaningful experience. Sometimes this is referred to as the Semantic Web, where "semantic" refers to meaning.

The Wireless Revolution

Wireless communication helps businesses more easily stay in touch with customers, suppliers, and employees and provides more flexible arrangements for organizing work. In addition to voice transmission, cell phones feature capabilities for e-mail, messaging, wireless Internet access, digital photography, and personal information management. The features of the iPhone and BlackBerry illustrate the extent to which cell phones have evolved into small mobile computers.

Wireless Computer Networks and Internet Access

If you have a laptop computer, you might be able to use it to access the Internet as you move from room to room in your dorm, or from table to table in your university library.

Bluetooth

Bluetooth is the popular name for the 802.15 wireless networking standard, which is useful for creating small personal area networks (PANs). Although Bluetooth lends itself to personal networking, it has uses in large corporations.

Wi-Fi and Wireless Internet Access

The 802.11 set of standards for wireless LANs and wireless Internet access is also known as Wi-Fi.
The first of these standards to be widely adopted was 802.11b, which can transmit up to 11 Mbps in the unlicensed 2.4-GHz band and has an effective distance of 30 to 50 meters. Hotspots typically consist of one or more access points providing wireless Internet access in a public place.

Fig. A Bluetooth Network (PAN)

WiMax

The range of Wi-Fi systems is no more than 300 feet from the base station, making it difficult for rural groups that don't have cable or DSL service to find wireless access to the Internet. The IEEE developed a new family of standards known as WiMax to deal with these problems. WiMax, which stands for Worldwide Interoperability for Microwave Access, is the popular term for IEEE Standard 802.16.

Radio Frequency Identification (RFID)

Radio frequency identification (RFID) systems provide a powerful technology for tracking the movement of goods throughout the supply chain. RFID systems use tiny tags with embedded microchips containing data about an item and its location to transmit radio signals over a short distance to RFID readers.

Fig. How RFID Works

Wireless Sensor Networks

Wireless sensor networks (WSNs) are networks of interconnected wireless devices that are embedded into the physical environment to provide measurements of many points over large spaces. These devices have built-in processing, storage, and radio frequency sensors and antennas. Wireless sensor networks are valuable in areas such as monitoring environmental changes, monitoring traffic or military activity, protecting property, efficiently operating and managing machinery and vehicles, establishing security perimeters, monitoring supply chain management, or detecting chemical, biological, or radiological material.