Database & BI Chapter Review

Chapter Review
Chapter Review: 11 & 13
Jessica Izeh
University of Texas Rio Grande Valley
Chapter 11
1.
What is SQL performance tuning?
SQL performance tuning describes a process – on the client-side – that will generate an SQL
query to return the correct answer in the least amount of time, using the minimum amount of
resources at the server.
2.
What is database performance tuning?
DBMS performance tuning describes a process – on the server-side – that will properly
configure the DBMS environment to respond to clients’ requests in the fastest way possible
while making optimum use of existing resources.
3.
What is the focus of most performance-tuning activities, and why does that focus
exist?
Most performance-tuning activities focus on minimizing the number of I/O operations
because the I/O operations are much slower than reading data from the data cache.
4.
What are database statistics, and why are they important?
Database statistics are dynamic metadata that help the query optimizer choose the fastest
query execution plan. For example, if a table holds only a dozen rows, there is little point
in using an index for a lookup; a full table scan will almost always be cheaper. But if that
same table grows to a million rows, an index lookup will probably be faster. A column with a
lot of text is suited for full-text search.
5.
How are database statistics obtained?
Database statistics are gathered by the database server according to how it is configured.
The DBA can configure the database to collect and refresh statistics automatically, or can
gather (or regather) them manually on request.
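As a minimal sketch of manual statistics collection, the snippet below uses SQLite (through Python's standard-library sqlite3 module) as a stand-in DBMS. The table, column, and index names are made up for illustration. SQLite's ANALYZE command gathers statistics and stores them in the sqlite_stat1 table; other DBMSs use different commands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cus_id INTEGER PRIMARY KEY, cus_state TEXT)")
conn.execute("CREATE INDEX idx_customer_state ON customer (cus_state)")

# 1,000 rows spread evenly over 10 hypothetical state codes.
rows = [(i, f"S{i % 10}") for i in range(1000)]
conn.executemany("INSERT INTO customer VALUES (?, ?)", rows)

# Manually gather statistics; SQLite stores them in sqlite_stat1.
conn.execute("ANALYZE")

stats = conn.execute(
    "SELECT tbl, idx, stat FROM sqlite_stat1 WHERE idx = 'idx_customer_state'"
).fetchall()
for tbl, idx, stat in stats:
    # stat is a text field that begins with the number of rows in the index.
    print(tbl, idx, stat)
```

Rerunning ANALYZE after large data changes refreshes the stored figures, which is the manual counterpart of automatic collection.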
6.
What database statistics measurements are typical of tables, indexes, and
resources?
For tables, typical measurements include the number of rows, the number of disk blocks
used, row length, the number of columns in each row, the number of distinct values in each
column, the maximum value in each column, the minimum value in each column, what
columns have indexes, and so on. For indexes, typical measurements include the number and
name of columns in the index key, the number of key values in the index, the number of
distinct key values in the index key, histogram of key values in an index, etc. For resources,
typical measurements include the logical and physical disk block size, the location and size
of data files, the number of extents per data file, and so on.
7.
How is the processing of SQL DDL statements (such as CREATE TABLE)
different from the processing required by DML statements?
A DDL statement updates the data dictionary tables or system catalog, while a DML
statement (SELECT, INSERT, UPDATE and DELETE) mostly manipulates end-user data.
8.
In simple terms, the DBMS processes a query in three phases. What are the
phases, and what is accomplished in each phase?
The three phases are as follows.
i. Parsing: The DBMS parses the SQL query and chooses the most efficient
access/execution plan.
ii. Execution: The DBMS executes the SQL query using the chosen execution
plan.
iii. Fetching: The DBMS fetches the data and sends the result set back to the
client.
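The three phases can be observed in a small experiment. This is a sketch only, using SQLite through Python's sqlite3 module as a stand-in DBMS; the invoice table and its index are hypothetical. EXPLAIN QUERY PLAN asks SQLite to report the plan it chose without running the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoice (inv_id INTEGER PRIMARY KEY, cus_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO invoice (cus_id, total) VALUES (?, ?)",
    [(i % 50, i * 1.5) for i in range(500)],
)

query = "SELECT inv_id, total FROM invoice WHERE cus_id = ?"

def plan_for(sql, params):
    """Return the optimizer's plan description without running the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail text is the last column of each plan row.
    return [row[-1] for row in rows]

# Parsing: with no index on cus_id, the chosen plan is a scan of the table.
plan_without_index = plan_for(query, (7,))

# After an index is created, the same query gets a different plan.
conn.execute("CREATE INDEX idx_invoice_cus ON invoice (cus_id)")
plan_with_index = plan_for(query, (7,))

# Execution and fetching: run the query and pull the result set to the client.
result = conn.execute(query, (7,)).fetchall()

print(plan_without_index)
print(plan_with_index)
print(len(result), "rows fetched")
```

The exact wording of the plan text varies between SQLite versions, so it is better read as a diagnostic than parsed by a program.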
9.
If indexes are so important, why not index every column in every table? (Include
a brief discussion of the role played by data sparsity.)
Indexing every column in every table would tax the DBMS too heavily in index-maintenance
processing, especially if the table has many attributes, many rows, and/or requires many
inserts, updates, and/or deletes, because every index must be updated each time a row is
inserted, updated, or deleted. Indexes also take up space on disk and in memory (RAM); with
too many or too-large indexes, the DBMS has to keep swapping them to and from the disk. One
measure of the need for an index is the data sparsity of the column you want to index. Data
sparsity is the number of different values the column could contain. A column with low
sparsity (for example, one that holds only two possible values) filters out few rows, so an
index on it is unlikely to help, while a column with high sparsity is a better candidate.
Knowing the sparsity helps you decide whether the use of an index is warranted.
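To make the idea concrete, here is a small sketch that counts the different values in a column. The sample columns are invented, and the distinct-to-total ratio is an added convenience for comparing columns, not a definition taken from the textbook.

```python
def sparsity_report(values):
    """Summarize how many different values a column holds."""
    total = len(values)
    distinct = len(set(values))
    # Assumption: distinct/total is used here only as a simple comparison aid.
    ratio = distinct / total if total else 0.0
    return {"rows": total, "distinct": distinct, "ratio": ratio}

# Hypothetical columns from a 1,000-row student table.
stu_gender = ["M" if i % 2 else "F" for i in range(1000)]  # two values only
stu_id = list(range(1000))                                 # every value unique

low = sparsity_report(stu_gender)
high = sparsity_report(stu_id)
print(low)
print(high)
```

A search on stu_gender would still return about half the table, so an index buys little; a search on stu_id returns a single row, which is where an index pays off.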
10.
What is the difference between a rule-based optimizer and a cost-based
optimizer?
i. Rule-based optimizer: Also known as the original optimizer of the Oracle database.
It follows a set of preset rules, mostly based on indexes and types of indexes.
ii. Cost-based optimizer (CBO): The CBO uses statistics and math to make an educated guess
at the lowest cost. It evaluates multiple candidate execution plans (called
permutations) and picks the one with the overall lowest cost. It is designed to
determine the most efficient way to carry out a SQL statement.
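The cost-based idea can be sketched with a toy model. The cost formulas below are deliberately simplistic assumptions made up for illustration (block reads only, a fixed index height of 3); no real optimizer uses exactly these numbers.

```python
import math

def full_scan_cost(table_blocks):
    # Assumed cost: read every block of the table once.
    return table_blocks

def index_lookup_cost(row_count, distinct_keys, index_height=3):
    # Assumed cost: walk the index, then one block read per matching row.
    matching_rows = math.ceil(row_count / distinct_keys)
    return index_height + matching_rows

def choose_plan(row_count, table_blocks, distinct_keys):
    """Estimate each plan's cost from statistics and pick the cheapest."""
    costs = {
        "full table scan": full_scan_cost(table_blocks),
        "index lookup": index_lookup_cost(row_count, distinct_keys),
    }
    return min(costs, key=costs.get), costs

tiny_table = choose_plan(row_count=12, table_blocks=1, distinct_keys=12)
big_unique = choose_plan(row_count=1_000_000, table_blocks=10_000, distinct_keys=1_000_000)
big_two_values = choose_plan(row_count=1_000_000, table_blocks=10_000, distinct_keys=2)

for label, (plan, costs) in [
    ("tiny table", tiny_table),
    ("large table, unique key", big_unique),
    ("large table, two key values", big_two_values),
]:
    print(label, "->", plan, costs)
```

A rule-based optimizer would pick the index whenever one exists; the cost-based version changes its answer as the statistics change.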
Chapter 13
1) What is business intelligence? Give some recent examples of BI usage, using the
Internet for assistance. What BI benefits have companies found?
Business intelligence is a term used to describe a comprehensive, cohesive and integrated set
of applications used to capture, collect, integrate, store and analyze data to generate and
present information that is used to support business decision making.
Examples.
Business intelligence is used in customer-facing functions to enhance sales activities.
Customers are often said to be the core of any business, and BI helps a company understand
them better: gathering data on purchasing history and analyzing customer preferences
produces actionable results. BI can also be used to analyze web traffic; Google, for
example, offers webmaster tools for examining a website. Such analysis helps a business
attract more customers and encourage them to make a purchase. In the hotel industry, BI is
useful for determining the revenue generated per room, drawing on statistics gathered from
every department. BI also helps reduce manual labor, improve efficiency, and make data
readable and presentable. It supports accurate analysis and better business decisions.
Web Examples:
The Dallas Teachers Credit Union (DTCU) used geographical data analysis to increase its
customer base from 250,000 professional educators to 3.5 million potential customers
virtually overnight. The increase gave the credit union the ability to compete with larger
banks that had a strong presence in Dallas.
[http://www.computerworld.com/s/article/47371/Business_Intelligence?taxonomyId=120
For meal kit company HelloFresh, a centralized business intelligence solution saved the
marketing analytics team 10-20 working hours per day by automating reporting processes. It
also empowered the larger marketing team to craft regional, individualized digital marketing
campaigns.
Coca-Cola's business intelligence team handles reporting for all sales and delivery operations
at the company. With their BI platform, the team automated manual reporting processes,
saving over 260 hours a year—more than six 40-hour workweeks.
2) Describe the BI framework. Illustrate the evolution of BI.
BI is not a product by itself, but a framework of concepts, practices, tools, and technologies that
help a business better understand its core capabilities, provide snapshots of the company
situation, and identify key opportunities to create competitive advantage. In practice, BI
provides a well-orchestrated framework for the management of data that works across all levels
of the organization. BI involves the following general steps:
i. Collecting and storing operational data.
ii. Aggregating the operational data into decision support data.
iii. Analyzing decision support data to generate information.
iv. Presenting such information to the end user to support business decisions.
v. Making business decisions, which in turn generate more data that is collected, stored,
etc. (restarting the process).
vi. Monitoring results to evaluate outcomes of the business decisions (providing more data
to be collected, stored, etc.).
To implement all these steps, BI uses varied components and technologies.
3) What are decision support systems, and what role do they play in the business
environment?
Decision Support Systems (DSS) are based on computerized tools that are used to enhance
managerial decision-making. Because complex data and the proper analysis of such data are
crucial to strategic and tactical decision making, DSS is essential to the well-being and even
survival of businesses that must compete in a global marketplace.
4) Explain how the main components of the BI architecture interact to form a system.
Describe the evolution of BI information dissemination formats.
The textbook emphasizes that there is no single BI architecture; instead, it ranges from highly
integrated applications from a single vendor to a loosely integrated, multi-vendor environment.
However, there are some general types of functionality that all BI implementations share. Like
any critical business IT infrastructure, the BI architecture is composed of data, people, processes,
technology, and the management of such components.
BI information dissemination formats have evolved as follows:
i. The 1970s: centralized reports running on mainframes, minicomputers, or even central
server environments. Such reports were predefined and took considerable time to
process.
ii. The 1980s: desktop computers, downloaded spreadsheet data from central locations.
iii. The 1990s: first-generation DSS, centralized reporting.
iv. The mid-1990s: data warehouses and online analytical processing (OLAP) systems.
v. The 2000s: BI web-based dashboards and mobile BI.
5) What are the most relevant differences between operational data and decision
support data?
Operational data: Mostly stored in a relational database and optimized to support transactions
that represent daily operations. Ex: a record written each time an item is sold, customer data, etc.
Decision support data: Differ from operational data in three main areas: time span, granularity,
and dimensionality.
Time span: Decision support data cover a longer time frame (Ex: sales for customer A over the
last five years), whereas operational data cover a short time frame.
Granularity: Decision support data are presented at different levels of aggregation, from highly
summarized to near atomic. Ex: managers who analyze sales by region must also be able to
access sales by city within the region, by store within the city, etc.
Dimensionality: Decision support data include many data dimensions and show how the data
relate over those dimensions, whereas operational data focus on individual transactions rather
than on the effects of transactions over time.
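The granularity point can be illustrated with a short sketch that summarizes the same atomic rows at three levels. The sales rows and location names are invented for the example.

```python
from collections import defaultdict

# Hypothetical atomic sales facts: (region, city, store, amount).
sales = [
    ("South", "McAllen", "Store 1", 120.0),
    ("South", "McAllen", "Store 2", 80.0),
    ("South", "Edinburg", "Store 3", 50.0),
    ("North", "Dallas", "Store 4", 200.0),
]

def roll_up(rows, level):
    """Sum amounts using the first `level` location columns as the key."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[:level]] += row[-1]
    return dict(totals)

by_region = roll_up(sales, 1)  # highly summarized
by_city = roll_up(sales, 2)    # drill down one level
by_store = roll_up(sales, 3)   # near atomic

print(by_region)
print(by_city)
print(by_store)
```

Moving from by_region to by_store is a drill-down; moving the other way is a roll-up.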
6) What is a data warehouse, and what are its main characteristics? How does it differ
from a data mart?
A data warehouse is an integrated, subject-oriented, time-variant, and non-volatile database
that provides support for decision-making. (See section 13.4 for an in-depth discussion about
the main characteristics.) The data warehouse is usually a read-only database optimized for
data analysis and query processing. Typically, data are extracted from various sources and
are then transformed and integrated—in other words, passed through a data filter—before
being loaded into the data warehouse. Users access the data warehouse via front-end tools
and/or end-user application software to extract the data in a usable form. Figure 13.4 in the
text illustrates how a data warehouse is created from the data contained in an operational
database. You might be tempted to think that the data warehouse is just a big summarized
database. But a good data warehouse is much more than that. A complete data warehouse
architecture includes support for a decision support data store, a data extraction and
integration filter, and a specialized presentation interface. To be useful, the data warehouse
must conform to uniform structures and formats to avoid data conflicts and to support
decision making.
7) Give three examples of likely problems when operational data are integrated into
the data warehouse.
Differences in data type and structure are the main problems when operational data are
integrated into a data warehouse. Consider a university database that serves four departments
(accounts, library, faculty, and students) and holds data for one thousand students and
professors. First, a student's class standing may be stored as freshman, sophomore, junior, or
senior in the accounting department, whereas the library may store it as Fr, So, Ju, Se. Second,
the date of enrollment in the accounting table may use the DD-MM-YYYY format, whereas the
student department uses MM-DD-YYYY, so the same date is written in two different ways. Third,
the data type for GPA may be declared as float in one department and decimal in another.
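A cleaning step for these three conflicts might look like the following sketch. The source-system names, the code mapping, and the two-decimal GPA rule are assumptions chosen for illustration.

```python
from datetime import datetime

# Map each department's class-standing codes onto one warehouse vocabulary.
STANDING = {
    "fr": "freshman", "so": "sophomore", "ju": "junior", "se": "senior",
    "freshman": "freshman", "sophomore": "sophomore",
    "junior": "junior", "senior": "senior",
}

# Assumed date formats, keyed by a hypothetical source-system name.
DATE_FORMATS = {"accounting": "%d-%m-%Y", "students": "%m-%d-%Y"}

def clean_standing(raw):
    """Translate a department-specific code into the common label."""
    return STANDING[raw.strip().lower()]

def clean_date(raw, source):
    """Parse a source-specific date string and return ISO 8601 text."""
    return datetime.strptime(raw, DATE_FORMATS[source]).date().isoformat()

def clean_gpa(raw):
    # Assumed rule: round to two decimals so float and decimal sources agree.
    return round(float(raw), 2)

print(clean_standing("Fr"), clean_standing("freshman"))
print(clean_date("05-08-2020", "accounting"), clean_date("08-05-2020", "students"))
print(clean_gpa("3.4567"))
```

Note that a date such as 05-08-2020 cannot be interpreted without knowing which system produced it, which is why the source name is a required argument.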
8) Prepare a high-level summary of the main requirements to evaluate DBMS
products for data warehousing.
There are four primary ways to evaluate a DBMS that is tailored to provide fast answers to
complex queries: the database schema supported by the DBMS, the availability and
sophistication of data extraction and loading tools, the end-user analytical interface, and
the database size requirements. Establish the requirements based on the size of the database,
the data sources, the necessary data transformations, and the end-user query requirements.
Determine what type of database is needed, i.e., a multidimensional or a relational database
using the star schema. Other valid evaluation criteria include the cost of acquisition and
available upgrades (if any), training, technical and development support, performance, ease
of use, and maintenance.
9) Your data warehousing project group is debating whether to create a prototype of a data
warehouse before its implementation. The project group members are especially
concerned about the need to acquire some data warehousing skills before implementing
the enterprise-wide data warehouse. What would you recommend? Explain your
recommendations.
I would recommend starting with a prototype in the form of a data mart. Because data
warehousing requires time, money, and considerable managerial effort, many companies create
data marts first. Data marts use smaller, more manageable data sets that are targeted to fit
the special needs of small groups within the organization. In other words, data marts are
small, single-subject data warehouse subsets. Data mart development and use costs are lower,
and the implementation time is shorter, so the group can acquire data warehousing skills at
limited risk. Once the data marts have demonstrated their ability to serve the DSS, they can
be expanded to become data warehouses, or they can be migrated into larger existing data
warehouses.
10) Suppose that you are selling the data warehouse idea to your users. How would you
define multidimensional data analysis for them? How would you explain its advantages
to them?
Multidimensional data analysis is the capability to compare values across several dimensions
of the data. It can break down and evaluate numbers along one dimension, e.g., sales by
customer. It can then take the same measure, such as sales, and compare it along another
dimension to get a different view of the results. The data are not presented in a flat tabular
view; they are closer to a multidimensional, almost 3-D, view. “Multi-dimensional
Data Analysis (MDDA) refers to the process of summarizing data across multiple levels (called
dimensions) and then presenting the results in a multi-dimensional grid format. This process is
also referred to as OLAP, Data Pivot., Decision Cube, and Crosstab” (Multi-Dimensional,
n.d.). One advantage of multidimensional analysis is that it lets you compare and contrast
your business with others to see how well positioned it currently is in the economy. It can
also examine sales figures to show which customers provide the most income and which products
bring the most benefit. This is essential for knowing what the customer wants and how to
better evolve the business.
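A crosstab, one of the names quoted above, can be sketched in a few lines. The products, regions, and amounts are invented sample data.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, region, amount).
facts = [
    ("Widget", "East", 100), ("Widget", "West", 150),
    ("Gadget", "East", 200), ("Gadget", "West", 50),
    ("Widget", "East", 25),
]

def crosstab(rows):
    """Summarize amounts into a product-by-region grid with totals."""
    grid = defaultdict(lambda: defaultdict(int))
    for product, region, amount in rows:
        grid[product][region] += amount
        grid[product]["Total"] += amount
        grid["Total"][region] += amount
        grid["Total"]["Total"] += amount
    return {product: dict(cols) for product, cols in grid.items()}

table = crosstab(facts)
for product, cols in table.items():
    print(product, cols)
```

Each cell answers a question along two dimensions at once (which product, in which region), and the Total row and column give the summarized views.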
11) The data warehousing project group has invited you to provide an OLAP overview.
The group’s members are particularly concerned about the OLAP client/server
architecture requirements and how OLAP will fit the existing environment. Your
job is to explain the main OLAP client/server components and architectures.
OLAP systems are based on client/server technologies and comprise the following main
modules:
1. OLAP GUI
2. OLAP analytical processing logic
3. OLAP data processing logic
The location of each of these modules is a function of different client/server architectures. How
and where the modules are placed depends on hardware, software, and professional judgment.
Any placement decision has its own advantages and disadvantages. However, the following
constraints must be met:
The OLAP GUI is always placed in the end user's computer. The reason it is placed at the
client side is simple: this is the main point of contact between the end user and the system.
Specifically, it provides the interface through which the end user queries the data
warehouse's contents.
The OLAP Analytical Processing Logic (APL) module can be placed in the client (for speed) or
in the server (for better administration and better throughput). The APL performs the complex
transformations required for business data analysis, such as multiple dimensions,
aggregation, period comparison, and so on.
The OLAP Data Processing Logic (DPL) maps the data analysis requests to the proper data
objects in the data warehouse and is, therefore, generally placed at the server level.
12) One of your vendors recommends using an MDBMS. How would you explain this
recommendation to your project leader?
Multidimensional On-Line Analytical Processing (MOLAP) provides OLAP functionality using
multidimensional databases (MDBMS) to store and analyze multidimensional data.
Multidimensional database systems (MDBMS) use special proprietary techniques to store data
in matrix-like arrays of n dimensions.
13) The project group is ready to make a final decision, choosing between ROLAP and
MOLAP. What should be the basis for this decision? Why?
The basis for the decision should be the system and end user requirements. Both ROLAP and
MOLAP will provide advanced data analysis tools to enable organizations to generate required
information. The selection of one or the other depends on which set of tools will fit best within
the company's existing expertise base, its technology and end user requirements, and its ability to
perform the job at a given cost.
The proper ROLAP/MOLAP selection criteria must include: purchase and installation price;
supported hardware and software; compatibility with existing hardware, software, and DBMS;
available programming interfaces; performance; the availability, extent, and type of
administrative tools; support for the database schema(s); ability to handle current and
projected database size; database architecture; available resources; flexibility;
scalability; and total cost of ownership.
14) The data warehouse project is in the design phase. Explain to your fellow designers
how you would use a star schema in the design.
The star schema is a data modeling technique that is used to map multidimensional decision
support data into a relational database. The reason for the star schema's development is that
existing relational modeling techniques, E-R and normalization, did not yield a database
structure that served the advanced data analysis requirements well. Star schemas yield an easily
implemented model for multidimensional data analysis while still preserving the relational
structures on which the operational database is built. The basic star schema has two four
components: facts, dimensions, attributes, and attribute hierarchies. The star schemas represent
aggregated data for specific business activities. Using the schemas, we will create multiple
aggregated data sources that will represent different aspects of business operations. For example,
the aggregation may involve total sales by selected time periods, by products, by stores, and so
on. Aggregated totals can be total product units, total sales values by products, etc.
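As a minimal sketch of such a design, the snippet below builds a tiny star schema in SQLite through Python's sqlite3 module. All table names, column names, and sample rows are hypothetical; a real warehouse would have many more attributes per dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time    (time_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE dim_product (prod_id INTEGER PRIMARY KEY, prod_name TEXT, category TEXT);
CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, city TEXT, region TEXT);
-- The fact table holds the measures plus a foreign key to each dimension.
CREATE TABLE fact_sales (
    time_id  INTEGER REFERENCES dim_time(time_id),
    prod_id  INTEGER REFERENCES dim_product(prod_id),
    store_id INTEGER REFERENCES dim_store(store_id),
    units    INTEGER,
    sales    REAL
);
INSERT INTO dim_time    VALUES (1, 2020, 1), (2, 2020, 2);
INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware');
INSERT INTO dim_store   VALUES (1, 'McAllen', 'South'), (2, 'Dallas', 'North');
INSERT INTO fact_sales  VALUES
    (1, 1, 1, 10, 100.0), (1, 2, 1, 5, 75.0),
    (2, 1, 2, 20, 200.0), (2, 2, 2, 1, 15.0);
""")

# Aggregate the facts along two dimensions: store region and quarter.
totals = conn.execute("""
    SELECT s.region, t.quarter, SUM(f.units), SUM(f.sales)
    FROM fact_sales f
    JOIN dim_store s ON s.store_id = f.store_id
    JOIN dim_time  t ON t.time_id  = f.time_id
    GROUP BY s.region, t.quarter
    ORDER BY s.region, t.quarter
""").fetchall()
print(totals)
```

Here units and sales are the facts, the three dim_ tables are the dimensions, their columns are the attributes, and region over city (or year over quarter) is an attribute hierarchy.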
15) Briefly discuss the OLAP architectural styles with and without data marts.
DSS development can be traced along these lines:
Stage 1: The DSS were based, at least in general terms, on the reporting systems of the 1980s.
These reporting systems required direct access to the operational data through a menu interface
to yield predefined report structures.
Stage 2: DSS improved decision support by supplying lightly summarized data extracted from the
operational database. These summarized data were usually stored in the RDBMS and were
accessed through SQL statements via a query tool. At this stage, the DSS began to grow some ad
hoc query capabilities.
Stage 3: DSS made use of increasingly sophisticated data extraction and analysis tools. The
major technologies that helped spawn this development include more capable microprocessors,
parallel processing, relational database technologies, and client/server systems.
References
5 real examples of business intelligence in action. (n.d.). Tableau Software. Retrieved July 31,
2020, from https://www.tableau.com/learn/articles/business-intelligence-examples
Coronel, C., & Morris, S. (2014). Database Systems: Design, Implementation, & Management.