Chapter Review: 11 & 13
Jessica Izeh
University of Texas Rio Grande Valley

Chapter 11

1. What is SQL performance tuning?
SQL performance tuning describes a process, on the client side, of generating an SQL query that returns the correct answer in the least amount of time, using the minimum amount of resources at the server.

2. What is database performance tuning?
DBMS performance tuning describes a process, on the server side, of properly configuring the DBMS environment to respond to clients' requests as fast as possible while making optimum use of existing resources.

3. What is the focus of most performance-tuning activities, and why does that focus exist?
Most performance-tuning activities focus on minimizing the number of I/O operations, because I/O operations are much slower than reading data from the data cache.

4. What are database statistics, and why are they important?
Database statistics are dynamic metadata that help the query optimizer choose the fastest query execution plan. For example, if there are only a dozen rows in a table, there is no point in going to an index to do a lookup; a full table scan will always be faster. But if that same table grows to a million rows, the index will probably be the better choice. Likewise, a column with a lot of text is better suited to full-text search.

5. How are database statistics obtained?
Database statistics are gathered by the database server according to user configuration. The database can be configured to collect statistics and optimize automatically, or the DBA can gather (and refresh) them manually.

6. What database statistics measurements are typical of tables, indexes, and resources?
For tables, typical measurements include the number of rows, the number of disk blocks used, row length, the number of columns in each row, the number of distinct values in each column, the maximum and minimum values in each column, which columns have indexes, and so on. For indexes, typical measurements include the number and names of columns in the index key, the number of key values in the index, the number of distinct key values in the index key, a histogram of key values in the index, etc. For resources, typical measurements include the logical and physical disk block size, the location and size of data files, the number of extents per data file, and so on.

7. How is the processing of SQL DDL statements (such as CREATE TABLE) different from the processing required by DML statements?
A DDL statement updates the data dictionary tables or system catalog, while a DML statement (SELECT, INSERT, UPDATE, and DELETE) mostly manipulates end-user data.

8. In simple terms, the DBMS processes a query in three phases. What are the phases, and what is accomplished in each phase?
The three phases are as follows:
i. Parsing: the DBMS parses the SQL query and chooses the most efficient access/execution plan.
ii. Execution: the DBMS executes the SQL query using the chosen execution plan.
iii. Fetching: the DBMS fetches the data and sends the result set back to the client.

9. If indexes are so important, why not index every column in every table? (Include a brief discussion of the role played by data sparsity.)
Indexing every column in every table would tax the DBMS too much in terms of index-maintenance processing, especially if the table has many attributes, many rows, and/or requires many inserts, updates, and/or deletes. One measure of the need for an index is the data sparsity of the column you want to index.
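As a rough illustration of how sparsity guides the indexing decision, here is a sketch using Python's built-in sqlite3 module; the `customer` table, its columns, and its contents are invented for the example:

```python
import sqlite3

# Hypothetical table, invented for illustration: `email` is highly sparse
# (every value distinct), while `state` has only two distinct values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, state TEXT, email TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                 [(i, "TX" if i % 2 else "CA", f"user{i}@example.com")
                  for i in range(1, 1001)])

def sparsity(column):
    """Fraction of distinct values in a column; higher = better index candidate."""
    (ratio,) = conn.execute(
        f"SELECT COUNT(DISTINCT {column}) * 1.0 / COUNT(*) FROM customer").fetchone()
    return ratio

print(sparsity("email"))  # 1.0   -> an index will be highly selective
print(sparsity("state"))  # 0.002 -> an index buys almost nothing

# With an index in place and statistics gathered, the cost-based planner uses it:
conn.execute("CREATE INDEX ix_email ON customer(email)")
conn.execute("ANALYZE")  # collects the database statistics discussed above
plan = conn.execute("EXPLAIN QUERY PLAN "
                    "SELECT * FROM customer WHERE email = 'user42@example.com'").fetchall()
print(plan[0][-1])  # e.g. SEARCH customer USING INDEX ix_email (email=?)
```

The same query planned without the index (or against the two-valued `state` column) falls back to a full table scan, which is exactly the sparsity trade-off described above.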
Indexes also take up space in memory (RAM); with too many indexes, or indexes that are too large, the DBMS will have to keep swapping them to and from disk. Indexes also increase insert, update, and delete time, because each index must be updated for every row inserted, updated, or deleted. Data sparsity is the number of distinct values a column contains; it is a measure of the need for an index. Knowing the sparsity helps you decide whether an index is worthwhile.

10. What is the difference between a rule-based optimizer and a cost-based optimizer?
i. Rule-based optimizer: also known as the original optimizer of the Oracle database, it follows a fixed set of rules, mostly based on the indexes available and their types.
ii. Cost-based optimizer: the CBO uses statistics and arithmetic to make an educated estimate of the lowest-cost plan. It evaluates multiple candidate execution plans (called permutations) and picks the one with the lowest overall cost. It is designed to determine the most efficient way to carry out a SQL statement.

Chapter 13

1) What is business intelligence? Give some recent examples of BI usage, using the Internet for assistance. What BI benefits have companies found?
Business intelligence is a term used to describe a comprehensive, cohesive, and integrated set of applications used to capture, collect, integrate, store, and analyze data to generate and present information that supports business decision making.
Examples: Business intelligence is used on the customer side to enhance sales activities. Customers are the core of any business, and BI can help a company understand them better: by gathering data on purchasing history and customer preferences, a company can produce actionable results. BI can also be used to analyze web traffic; Google offers various tools, such as its webmaster tools, to examine a website. This enables a business to attract more customers and invite them to make a purchase.
In the hotel industry, for example, BI is useful for determining the revenue generated per room, gathering statistics from every department accordingly. BI also helps reduce manual labor, smooth efficiency, and make data readable and presentable, and it provides accurate analysis for better business decisions.
Web examples: The Dallas Teachers Credit Union (DTCU) used geographical data analysis to increase its customer base from 250,000 professional educators to 3.5 million potential customers virtually overnight. The increase gave the credit union the ability to compete with larger banks that had a strong presence in Dallas (http://www.computerworld.com/s/article/47371/Business_Intelligence?taxonomyId=120). For meal kit company HelloFresh, a centralized business intelligence solution saved the marketing analytics team 10-20 working hours per day by automating reporting processes. It also empowered the larger marketing team to craft regional, individualized digital marketing campaigns. Coca-Cola's business intelligence team handles reporting for all sales and delivery operations at the company. With their BI platform, the team automated manual reporting processes, saving over 260 hours a year, more than six 40-hour workweeks.

2) Describe the BI framework. Illustrate the evolution of BI.
BI is not a product by itself but a framework of concepts, practices, tools, and technologies that help a business better understand its core capabilities, provide snapshots of the company situation, and identify key opportunities to create competitive advantage.
In practice, BI provides a well-orchestrated framework for the management of data that works across all levels of the organization. BI involves the following general steps:
i. Collecting and storing operational data.
ii. Aggregating the operational data into decision support data.
iii. Analyzing decision support data to generate information.
iv. Presenting such information to the end user to support business decisions.
v. Making business decisions, which in turn generate more data that is collected, stored, etc. (restarting the process).
vi. Monitoring results to evaluate the outcomes of the business decisions (providing more data to be collected, stored, etc.).
To implement all of these steps, BI uses varied components and technologies.

3) What are decision support systems, and what role do they play in the business environment?
Decision support systems (DSS) are based on computerized tools that are used to enhance managerial decision making. Because complex data and the proper analysis of such data are crucial to strategic and tactical decision making, DSS are essential to the well-being, and even survival, of businesses that must compete in a global marketplace.

4) Explain how the main components of the BI architecture interact to form a system. Describe the evolution of BI information dissemination formats.
The textbook emphasizes that there is no single BI architecture; instead, it ranges from highly integrated applications from a single vendor to a loosely integrated, multi-vendor environment. However, there are some general types of functionality that all BI implementations share. Like any critical business IT infrastructure, the BI architecture is composed of data, people, processes, technology, and the management of those components.
i. The 1970s: centralized reports running on mainframes, minicomputers, or central server environments. Such reports were predefined and took considerable time to process.
ii. The 1980s: desktop computers that downloaded spreadsheet data from central locations.
iii. The 1990s: first-generation DSS and centralized reporting.
iv. The mid-1990s: data warehouses and online analytical processing (OLAP) systems.
v. The 2000s: web-based BI dashboards and mobile BI.

5) What are the most relevant differences between operational data and decision support data?
Operational data are mostly stored in a relational database and are optimized to support transactions that represent daily operations, e.g., each time an item is sold, customer data, etc. Decision support data differ from operational data in three main areas: time span, granularity, and dimensionality.
Time span: decision support data cover a longer time frame (e.g., sales for customer A over the last five years), whereas operational data cover short durations.
Granularity: decision support data are presented at different levels of aggregation, from highly summarized to near atomic. For example, managers who can analyze sales by region must also be able to access sales by city within the region, by store within the city, and so on.
Dimensionality: decision support data include many data dimensions and show how the data relate over those dimensions, whereas operational data focus on individual transactions rather than the effects of transactions over time.

6) What is a data warehouse, and what are its main characteristics? How does it differ from a data mart?
A data warehouse is an integrated, subject-oriented, time-variant, and non-volatile database that provides support for decision making. (See section 13.4 for an in-depth discussion of the main characteristics.) The data warehouse is usually a read-only database optimized for data analysis and query processing. Typically, data are extracted from various sources and are then transformed and integrated, in other words, passed through a data filter, before being loaded into the data warehouse.
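The extract-transform-load path just described can be sketched in a few lines with Python's built-in sqlite3 module; the `sale` table, the `dw_sales` summary table, and all figures are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Operational data: one row per transaction (fine granularity, short time frame).
conn.execute("CREATE TABLE sale (sale_id INTEGER PRIMARY KEY, region TEXT, city TEXT, amount REAL)")
conn.executemany("INSERT INTO sale(region, city, amount) VALUES (?, ?, ?)", [
    ("South", "Austin", 120.0), ("South", "Austin", 80.0),
    ("South", "Dallas", 200.0), ("West", "Denver", 50.0),
])

# Transform + load: aggregate into a decision-support summary table
# (coarser granularity: totals by region and city over the whole period).
conn.execute("""CREATE TABLE dw_sales AS
                SELECT region, city, SUM(amount) AS total, COUNT(*) AS n_sales
                FROM sale GROUP BY region, city""")

for row in conn.execute("SELECT * FROM dw_sales ORDER BY region, city"):
    print(row)
# ('South', 'Austin', 200.0, 2)
# ('South', 'Dallas', 200.0, 1)
# ('West', 'Denver', 50.0, 1)
```

A real warehouse load would also cleanse and reconcile formats from multiple source systems; this sketch only shows the aggregation step that changes the data's granularity.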
Users access the data warehouse via front-end tools and/or end-user application software to extract the data in usable form. Figure 13.4 in the text illustrates how a data warehouse is created from the data contained in an operational database. You might be tempted to think that the data warehouse is just a big summarized database, but a good data warehouse is much more than that. A complete data warehouse architecture includes support for a decision support data store, a data extraction and integration filter, and a specialized presentation interface. To be useful, the data warehouse must conform to uniform structures and formats to avoid data conflicts and to support decision making.

7) Give three examples of likely problems when operational data are integrated into the data warehouse.
Differences in database type and structure are the main problems when operational data are integrated into a data warehouse. Consider a university database with four departments, such as accounting, library, faculty, and students, holding data for a thousand students and professors. A student's status may be recorded as freshman, sophomore, senior, or junior in the accounting department, whereas in the library it may be recorded as Fr, So, Se, or Ju. The date of enrollment in the accounting table may be in DD-MM-YYYY format whereas the date of enrollment in the students table is in MM-DD-YYYY format, which also causes a problem. A problem may likewise arise when the GPA is declared as a float in one department's schema and as a decimal in another's.

8) Prepare a high-level summary of the main requirements to evaluate DBMS products for data warehousing.
The main requirements for evaluating a DBMS tailored to data warehousing are the sophistication of its data extraction and loading tools, the end-user analytical interface, and the database size requirements. Establish the requirements based on the size of the database, the data sources, the necessary data transformations, and the end-user query requirements. Determine what type of database is needed, i.e., a multidimensional database or a relational database using the star schema. Other valid evaluation criteria include the cost of acquisition and available upgrades (if any), training, technical and development support, performance, ease of use, and maintenance.

9) Your data warehousing project group is debating whether to create a prototype of a data warehouse before its implementation. The project group members are especially concerned about the need to acquire some data warehousing skills before implementing the enterprise-wide data warehouse. What would you recommend? Explain your recommendations.
Knowing that data warehousing requires time, money, and considerable managerial effort, many companies create data marts instead. Data marts use smaller, more manageable data sets that are targeted to fit the special needs of small groups within the organization. In other words, data marts are small, single-subject data warehouse subsets. Data mart development and use costs are lower, and the implementation time is shorter. Once the data marts have demonstrated their ability to serve the DSS, they can be expanded into data warehouses or migrated into larger existing data warehouses.

10) Suppose that you are selling the data warehouse idea to your users. How would you define multidimensional data analysis for them? How would you explain its advantages to them?
Multidimensional data analysis is the capability to compare values across tables, for example breaking down and evaluating numbers such as customers against sales.
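Summarizing data across dimensions into a grid can be sketched in plain Python; the product/region fact rows below are invented for illustration:

```python
from collections import defaultdict

# Hypothetical fact rows: (product, region, sales) -- all values invented.
facts = [
    ("Widget", "East", 100), ("Widget", "West", 150),
    ("Gadget", "East", 200), ("Gadget", "West", 50),
    ("Widget", "East", 25),
]

# Summarize across two dimensions into a crosstab-style grid.
grid = defaultdict(int)
for product, region, sales in facts:
    grid[(product, region)] += sales

regions = sorted({r for _, r, _ in facts})
for product in sorted({p for p, _, _ in facts}):
    print(product, [grid[(product, r)] for r in regions])
# Gadget [200, 50]
# Widget [125, 150]
```

Each cell of the grid is a total at the intersection of two dimensions; adding a third dimension (say, quarter) would simply extend the key, which is the essence of the OLAP cube described next.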
It can then take one measure, like sales, and compare it against another dimension to get different results per that breakdown. The tables are not in a flat tabular view; they are multidimensional, almost 3-D. "Multi-dimensional Data Analysis (MDDA) refers to the process of summarizing data across multiple levels (called dimensions) and then presenting the results in a multi-dimensional grid format. This process is also referred to as OLAP, Data Pivot, Decision Cube, and Crosstab" (Multi-Dimensional, n.d.). The advantage of multidimensional analysis is that it lets you compare and contrast against other businesses to see how well positioned your business currently is in the economy. It can also look at sales figures to determine which customers provide the most income and which product has the most benefit. This can be essential for knowing what the customer wants and how to evolve the business.

11) The data warehousing project group has invited you to provide an OLAP overview. The group's members are particularly concerned about the OLAP client/server architecture requirements and how OLAP will fit the existing environment. Your job is to explain the main OLAP client/server components and architectures.
OLAP systems are based on client/server technologies and comprise the following main modules:
1. OLAP GUI
2. OLAP analytical processing logic
3. OLAP data processing logic
The location of each of these modules is a function of the client/server architecture. How and where the modules are placed depends on hardware, software, and professional judgment; any placement decision has its own advantages and disadvantages. However, the following constraints must be met. The OLAP analytical processing logic (APL) module can be placed in the client (for speed) or in the server (for better administration and better throughput).
The APL performs the complex transformations required for business data analysis, such as multiple dimensions, aggregation, period comparison, and so on. The OLAP GUI is always placed on the end user's computer. The reason it is placed on the client side is simple: it is the main point of contact between the end user and the system. Specifically, it provides the interface through which the end user queries the data warehouse's contents. The OLAP data processing logic (DPL) maps the data analysis requests to the proper data objects in the data warehouse and is therefore generally placed at the server level.

12) One of your vendors recommends using an MDBMS. How would you explain this recommendation to your project leader?
Multidimensional online analytical processing (MOLAP) provides OLAP functionality using multidimensional database management systems (MDBMS) to store and analyze multidimensional data. An MDBMS uses special proprietary techniques to store data in matrix-like arrays of n dimensions.

13) The project group is ready to make a final decision, choosing between ROLAP and MOLAP. What should be the basis for this decision? Why?
The basis for the decision should be the system and end-user requirements. Both ROLAP and MOLAP provide advanced data analysis tools that enable organizations to generate the required information. The selection of one or the other depends on which set of tools will fit best within the company's existing expertise base, its technology and end-user requirements, and its ability to perform the job at a given cost.
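Since ROLAP keeps the multidimensional data in relational star schemas, a minimal sketch with Python's built-in sqlite3 module may make the comparison concrete; all table names, columns, and figures are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Minimal star schema: one fact table keyed to two dimension tables.
conn.executescript("""
CREATE TABLE dim_time    (time_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales  (time_id INTEGER REFERENCES dim_time,
                          product_id INTEGER REFERENCES dim_product,
                          units INTEGER, revenue REAL);
INSERT INTO dim_time VALUES (1, 2020, 'Q1'), (2, 2020, 'Q2');
INSERT INTO dim_product VALUES (10, 'Widget'), (11, 'Gadget');
INSERT INTO fact_sales VALUES (1, 10, 5, 50.0), (1, 11, 2, 40.0), (2, 10, 3, 30.0);
""")

# A typical ROLAP-style query: join facts to a dimension and aggregate by quarter.
result = conn.execute("""
    SELECT t.quarter, SUM(f.revenue)
    FROM fact_sales f JOIN dim_time t ON f.time_id = t.time_id
    GROUP BY t.quarter ORDER BY t.quarter
""").fetchall()
print(result)  # [('Q1', 90.0), ('Q2', 30.0)]
```

A MOLAP engine would instead precompute and store such aggregates in its proprietary n-dimensional arrays; the ROLAP approach computes them with ordinary relational joins like the one above.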
The proper ROLAP/MOLAP selection criteria must include: purchase and installation price; supported hardware and software; compatibility with existing hardware, software, and DBMS; available programming interfaces; performance; availability; extent and type of administrative tools; support for the database schema(s); ability to handle current and projected database size; database architecture; available resources; flexibility; scalability; and total cost of ownership.

14) The data warehouse project is in the design phase. Explain to your fellow designers how you would use a star schema in the design.
The star schema is a data modeling technique used to map multidimensional decision support data into a relational database. The reason for the star schema's development is that existing relational modeling techniques, E-R modeling and normalization, did not yield a database structure that served advanced data analysis requirements well. Star schemas yield an easily implemented model for multidimensional data analysis while still preserving the relational structures on which the operational database is built. The basic star schema has four components: facts, dimensions, attributes, and attribute hierarchies. Star schemas represent aggregated data for specific business activities. Using the schemas, we can create multiple aggregated data sources that represent different aspects of business operations. For example, the aggregation may involve total sales by selected time periods, by products, by stores, and so on. Aggregated totals can be total product units, total sales values by product, etc.

15) Briefly discuss the OLAP architectural styles with and without data marts.
DSS development can be traced along these lines:
Stage 1: DSS were based, at least in general terms, on the reporting systems of the 1980s.
These reporting systems required direct access to the operational data through a menu interface to yield predefined report structures.
Stage 2: DSS improved decision support by supplying lightly summarized data extracted from the operational database. These summarized data were usually stored in an RDBMS and were accessed through SQL statements via a query tool. At this stage, the DSS began to grow some ad hoc query capabilities.
Stage 3: DSS made use of increasingly sophisticated data extraction and analysis tools. The major technologies that helped spawn this development include more capable microprocessors, parallel processing, relational database technologies, and client/server systems.

References
5 real examples of business intelligence in action. (n.d.). Tableau Software. Retrieved July 31, 2020, from https://www.tableau.com/learn/articles/business-intelligence-examples
Coronel, C., & Morris, S. (2014). Database systems: Design, implementation, & management.