CHAPTER 5 Data and Knowledge Management

CHAPTER OUTLINE
5.1 Managing Data
5.2 The Database Approach
5.3 Database Management Systems
5.4 Data Warehouses and Data Marts
5.5 Knowledge Management

DIFFICULTIES OF MANAGING DATA
• The amount of data is increasing exponentially.
• Data are scattered throughout organizations and collected by many individuals using various methods and devices.
• Data come from many sources.
• Data security, quality, and integrity are critical.

ANNUAL FLOOD OF DATA FROM...
• Credit card swipes
• E-mails
• Digital video
• Online TV
• RFID tags
• Blogs
• Digital video surveillance
• Radiology scans

ANNUAL FLOOD OF NEW DATA!
The annual flood of new data is in the zettabyte range. A zettabyte is 1,000 exabytes.

DATA GOVERNANCE
• Data governance (see video)
• Master data management (see video)
• Master data (see video)

MASTER DATA MANAGEMENT
John Stevens registers for Introduction to Management Information Systems (ISMN 3140) from 10 AM until 11 AM on Mondays and Wednesdays in Room 41 Smith Hall, taught by Professor Rainer.

Master data     Transaction data
Student         John Stevens
Course          Intro to Management Information Systems
Course No.      ISMN 3140
Time            10 AM until 11 AM
Weekday         Mondays and Wednesdays
Location        Room 41 Smith Hall
Instructor      Professor Rainer

BIG DATA
Big Data can be defined as diverse, high-volume, high-velocity information assets that require new forms of processing to enable enhanced decision making, insight discovery, and process optimization. Big Data:
• Exhibit variety;
• Include structured, unstructured, and semi-structured data;
• Are generated at high velocity with an uncertain pattern;
• Do not fit neatly into traditional, structured, relational databases (discussed later in this chapter); and
• Can be captured, processed, transformed, and analyzed in a reasonable amount of time only by sophisticated information systems.
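Returning to the master data management example (John Stevens' course registration), the split between master data and transaction data can be sketched in code. This is a minimal illustration under stated assumptions, not a prescribed design: the class name `Registration` and its field names are hypothetical, chosen to mirror the master data categories on the slide. The field names play the role of master data (stable categories shared by every registration), while one instance holds the transaction data for a single registration event.

```python
from dataclasses import asdict, dataclass, fields


# Hypothetical record type: its field names stand in for the master data
# categories from the slide (Student, Course, Course No., and so on).
@dataclass(frozen=True)
class Registration:
    student: str
    course: str
    course_no: str
    time: str
    weekday: str
    location: str
    instructor: str


# One instance holds the transaction data for a single registration event.
stevens = Registration(
    student="John Stevens",
    course="Intro to Management Information Systems",
    course_no="ISMN 3140",
    time="10 AM until 11 AM",
    weekday="Mondays and Wednesdays",
    location="Room 41 Smith Hall",
    instructor="Professor Rainer",
)

# Master data: the category names, identical for every registration.
master_data = [f.name for f in fields(Registration)]

# Transaction data: the values that change from one registration to the next.
transaction_data = asdict(stevens)

for category in master_data:
    print(f"{category:<11} {transaction_data[category]}")
```

Because the categories live in the type definition, another student's registration would reuse exactly the same master data and differ only in its transaction values.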
EXAMPLES OF BIG DATA
• When the Sloan Digital Sky Survey in New Mexico was launched in 2000, its telescope collected more data in its first few weeks than had been amassed in the entire history of astronomy. By 2013, the survey’s archive contained hundreds of terabytes of data. However, the Large Synoptic Survey Telescope in Chile, due to come online in 2016, will collect that quantity of data every five days.
• In 2013, Google was processing more than 24 petabytes of data every day.
• Facebook members upload more than 10 million new photos every hour. In addition, they click a “like” button or leave a comment nearly 3 billion times every day.
• The 800 million monthly users of Google’s YouTube service upload more than an hour of video every second.
• The number of messages on Twitter grows at 200 percent every year. By mid-2013, the volume exceeded 450 million tweets per day.

CHARACTERISTICS OF BIG DATA
Volume: We have noted the incredible volume of Big Data in this chapter. Although the sheer volume of Big Data presents data management problems, this volume also makes Big Data incredibly valuable. Irrespective of their source, structure, format, and frequency, data are always valuable. If certain types of data appear to have no value today, it is because we have not yet been able to analyze them effectively. For example, several years ago when Google began harnessing satellite imagery, capturing street views, and then sharing these geographical data for free, few people understood their value. Today, we recognize that such data are incredibly useful (e.g., consider the myriad uses of Google Maps).

Velocity: The rate at which data flow into an organization is rapidly increasing. Velocity is critical because it increases the speed of the feedback loop between a company and its customers.
For example, the Internet and mobile technology enable online retailers to compile histories not only of final sales but also of their customers’ every click and interaction. Companies that can quickly use that information (for example, by recommending additional purchases) gain a competitive advantage.

Variety: Traditional data formats tend to be structured and relatively well described, and they change slowly. Traditional data include financial market data, point-of-sale transactions, and much more. In contrast, Big Data formats change rapidly. They include satellite imagery, broadcast audio streams, digital music files, Web page content, scans of government documents, and comments posted on social networks.

MANAGING BIG DATA
The first step for many organizations toward managing Big Data was to integrate information silos into a database environment and then to develop data warehouses for decision making. After completing this step, many organizations turned their attention to the business of information management: making sense of their proliferating data. In recent years, Oracle, IBM, Microsoft, and SAP have spent billions of dollars purchasing software firms that specialize in data management and business intelligence.

In addition, many organizations are turning to NoSQL databases (think of them as “not only SQL” databases) to process Big Data. These databases provide an alternative for firms that have more and different kinds of data (Big Data) in addition to the traditional, structured data that fit neatly into the rows and columns of relational databases.

LEVERAGING BIG DATA
Organizations must do more than simply manage Big Data; they must also gain value from it. In general, there are six broadly applicable ways to leverage Big Data to gain value.

Creating Transparency. Simply making Big Data easier for relevant stakeholders to access in a timely manner can create tremendous business value.
In the public sector, for example, making relevant data more readily accessible across otherwise separate departments can sharply reduce search and processing times. In manufacturing, integrating data from R&D, engineering, and manufacturing units to enable concurrent engineering can significantly reduce time to market and improve quality.

Enabling Experimentation. Experimentation allows organizations to discover needs and improve performance. As organizations create and store more data in digital form, they can collect more accurate and detailed performance data (in real or near-real time) on everything from product inventories to personnel sick days. IT enables organizations to set up controlled experiments. For example, Amazon constantly experiments by offering slightly different “looks” on its Web site. These experiments are called A/B experiments, because each experiment compares two versions (A and B) of the same page. Here is how the experiment works: hundreds of thousands of people who visit Amazon.com see one version of the Web site, and hundreds of thousands of others see the other version. One experiment might change the location of the “Buy” button on the Web page. Another might change the size of a particular font. Amazon captures data on an assortment of variables from all of the clicks, including which pages users visited, the time they spent on each page, and whether the click led to a purchase. It then analyzes all of these data to “tweak” its Web site to provide the optimal user experience.

Segmenting Population to Customize Actions. Big Data allows organizations to create narrowly defined customer segments and to tailor products and services to precisely meet customer needs. For example, companies are able to perform micro-segmentation of customers in real time to precisely target promotions and advertising.
Suppose, for instance, that a company knows you are in one of its stores, considering a particular product. (It can obtain this information from your smartphone, from in-store cameras, and from facial recognition software.) It can then send a coupon directly to your phone offering 10 percent off if you buy the product within the next five minutes.

Replacing/Supporting Human Decision Making with Automated Algorithms. Sophisticated analytics can substantially improve decision making, minimize risks, and unearth valuable insights. For example, tax agencies use automated risk-analysis software tools to identify tax returns that warrant further examination, and retailers can use algorithms to fine-tune inventories and pricing in response to real-time in-store and online sales.

Innovating New Business Models, Products, and Services. Big Data enables companies to create new products and services, enhance existing ones, and invent entirely new business models. For example, manufacturers use data obtained from the use of actual products to improve the development of the next generation of products and to create innovative after-sales service offerings. The emergence of real-time location data has created an entirely new set of location-based services, ranging from navigation to pricing property and casualty insurance based on where, and how, people drive their cars.

Organizations Can Analyze Far More Data. In some cases, organizations can even process all the data relating to a particular phenomenon, meaning that they do not have to rely as much on sampling. Random sampling works well, but it is not as effective as analyzing an entire dataset. In addition, random sampling has some basic weaknesses. To begin with, its accuracy depends on ensuring randomness when collecting the sample data. However, achieving such randomness is tricky.
Systematic biases in the process of data collection can cause the results to be highly inaccurate. For example, consider political polling using landline phones. This sample tends to exclude people who use only cell phones. This bias can seriously skew the results, because cell phone users are typically younger and more liberal than people who rely primarily on landline phones.

5.2 THE DATABASE APPROACH
Database management systems (DBMSs) minimize the following problems:
• Data redundancy
• Data isolation
• Data inconsistency

DBMSs maximize the following:
• Data security
• Data integrity
• Data independence

DATABASE MANAGEMENT SYSTEMS
DATA HIERARCHY
• Bit (binary digit)
• Byte (eight bits)
• Field
• Record
• File (or table)
• Database

HIERARCHY OF DATA FOR A COMPUTER-BASED FILE

DATA HIERARCHY (CONTINUED)
Example of Field and Record

DESIGNING THE DATABASE
• Data model
• Entity
• Attribute
• Primary key
• Secondary keys

ENTITY-RELATIONSHIP MODELING
Database designers plan the database design in a process called entity-relationship (ER) modeling. ER diagrams consist of entities, attributes, and relationships.
• Entity classes
• Instance
• Identifiers

5.3 DATABASE MANAGEMENT SYSTEMS
• Database management system (DBMS)
• Relational database model
• Structured Query Language (SQL)
• Query by Example (QBE)

STUDENT DATABASE EXAMPLE

NORMALIZATION
Normalization aims for:
• Minimum redundancy
• Maximum data integrity
• Best processing performance
Data are normalized when the attributes in a table depend only on the primary key.
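To make the normalization rule concrete, here is a small sketch using Python’s built-in `sqlite3` module with an in-memory database. The table names, column names, and customer rows are hypothetical, chosen only for illustration. In the flat (non-normalized) rows, a customer’s name and city repeat on every order because they depend on `customer_id` rather than directly on the order’s primary key; splitting them into a separate `customers` table stores each fact once, and a SQL join reassembles the combined view on demand.

```python
import sqlite3

# In-memory database; all table names and rows below are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Non-normalized rows: (order_id, customer_id, name, city, product).
# Name and city repeat per order because they depend on customer_id,
# not directly on the primary key order_id.
flat_rows = [
    (1, 101, "Ana Silva", "Lisbon", "Laptop"),
    (2, 101, "Ana Silva", "Lisbon", "Mouse"),
    (3, 102, "Ben Okafor", "Lagos", "Monitor"),
]

# Normalized design: customer attributes depend only on customer_id,
# order attributes depend only on order_id.
cur.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
cur.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(customer_id), product TEXT)"
)

for order_id, customer_id, name, city, product in flat_rows:
    # INSERT OR IGNORE skips a customer whose primary key is already stored.
    cur.execute(
        "INSERT OR IGNORE INTO customers VALUES (?, ?, ?)", (customer_id, name, city)
    )
    cur.execute(
        "INSERT INTO orders VALUES (?, ?, ?)", (order_id, customer_id, product)
    )

# Each customer fact is stored once, so a change is made in one place only.
cur.execute("UPDATE customers SET city = ? WHERE customer_id = ?", ("Porto", 101))

# A join rebuilds the combined view that the flat rows used to hold.
joined = cur.execute(
    "SELECT o.order_id, c.name, c.city, o.product "
    "FROM orders AS o JOIN customers AS c ON o.customer_id = c.customer_id "
    "ORDER BY o.order_id"
).fetchall()
customer_count = cur.execute("SELECT COUNT(*) FROM customers").fetchone()[0]

print(customer_count)
for row in joined:
    print(row)
```

The single `UPDATE` changes Ana Silva’s city once, and both of her orders reflect it through the join; in the flat layout, the same change would have to be repeated on every one of her order rows, and missing one would leave the data inconsistent.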
NON-NORMALIZED RELATION

NORMALIZING THE DATABASE (PART A)

NORMALIZING THE DATABASE (PART B)

NORMALIZATION PRODUCES ORDER

5.4 DATA WAREHOUSING
Data warehouses and data marts are:
• Organized by business dimension or subject
• Multidimensional
• Historical
• Analyzed using online analytical processing (OLAP)

DATA WAREHOUSE FRAMEWORK & VIEWS

BENEFITS OF DATA WAREHOUSING
• End users can access data quickly and easily via Web browsers because the data are located in one place.
• End users can conduct extensive analysis with data in ways that may not have been possible before.
• End users have a consolidated view of organizational data.

5.5 KNOWLEDGE MANAGEMENT
• Knowledge management (KM)
• Knowledge
• Intellectual capital (or intellectual assets)
• Explicit knowledge (above the waterline)
• Tacit knowledge (below the waterline)
• Knowledge management systems (KMSs)
• Best practices

KNOWLEDGE MANAGEMENT SYSTEM CYCLE
• Create knowledge
• Capture knowledge
• Refine knowledge
• Store knowledge
• Manage knowledge
• Disseminate knowledge

HOMEWORK
Answer the questions of the closing case “Can Organizations Have Too Much Data?”