Business Intelligence, 4e (Sharda/Delen/Turban)
Chapter 1 An Overview of Business Intelligence, Analytics, and Data Science
1) Computerized support is only used for organizational decisions that are responses to external
pressures, not for taking advantage of opportunities.
Answer: FALSE
Diff: 2
Page Ref: 3
2) During the early days of analytics, data was often obtained from the domain experts using
manual processes to build mathematical or knowledge-based models.
Answer: TRUE
Diff: 2
Page Ref: 13
3) Computer applications have moved from transaction processing and monitoring activities to
problem analysis and solution applications.
Answer: TRUE
Diff: 1
Page Ref: 11
4) Business intelligence (BI) is a specific term that describes architectures and tools only.
Answer: FALSE
Diff: 1
Page Ref: 16
5) The growth in hardware, software, and network capacities has had little impact on modern BI
innovations.
Answer: FALSE
Diff: 1
Page Ref: 11
6) Managing data warehouses requires special methods, including parallel computing and/or
Hadoop/Spark.
Answer: TRUE
Diff: 3
Page Ref: 11-12
7) Managing information on operations, customers, internal procedures and employee
interactions is the domain of cognitive science.
Answer: FALSE
Diff: 3
Page Ref: 12
8) Decision support system (DSS) and management information system (MIS) have precise
definitions agreed to by practitioners.
Answer: FALSE
Diff: 2
Page Ref: 13
9) In the 2000s, the DW-driven DSSs began to be called BI systems.
Answer: TRUE
Diff: 1
Page Ref: 14
10) Major commercial business intelligence (BI) products and services were well established in
the early 1970s.
Answer: FALSE
Diff: 2
Page Ref: 15
11) Information systems that support such transactions as ATM withdrawals, bank deposits, and
cash register scans at the grocery store represent transaction processing, a critical branch of BI.
Answer: FALSE
Diff: 2
Page Ref: 19
12) Many business users in the 1980s referred to their mainframes as "the black hole," because
all the information went into it, but little ever came back and ad hoc real-time querying was
virtually impossible.
Answer: TRUE
Diff: 2
Page Ref: 20
13) Successful BI is a tool for the information systems department, but is not exposed to the
larger organization.
Answer: FALSE
Diff: 2
Page Ref: 20
14) BI represents a bold new paradigm in which the company's business strategy must be aligned
to its business intelligence analysis initiatives.
Answer: FALSE
Diff: 2
Page Ref: 20-21
15) Traditional BI systems use a large volume of static data that has been extracted, cleansed,
and loaded into a data warehouse to produce reports and analyses.
Answer: TRUE
Diff: 2
Page Ref: 21
16) Demands for instant, on-demand access to dispersed information decrease as firms
successfully integrate BI into their operations.
Answer: FALSE
Diff: 3
Page Ref: 21
17) The use of dashboards and data visualizations is seldom effective in identifying issues in
organizations, as demonstrated by the Silvaris Corporation Case Study.
Answer: FALSE
Diff: 2
Page Ref: 24
18) The use of statistics in baseball by the Oakland Athletics, as described in the Moneyball case
study, is an example of the effectiveness of prescriptive analytics.
Answer: TRUE
Diff: 2
Page Ref: 5
19) Due to industry consolidation, the analytics ecosystem consists of only a handful of players
across several functional areas.
Answer: FALSE
Diff: 2
Page Ref: 38-39
20) Data generation is a precursor, and is not included in the analytics ecosystem.
Answer: FALSE
Diff: 1
Page Ref: 39
21) In the Opening Vignette on Sports Analytics, what was adjusted to drive one-time ticket
sales?
A) player selections
B) stadium location
C) fan tweets
D) ticket prices
Answer: D
Diff: 2
Page Ref: 6
22) In the Opening Vignette on Sports Analytics, what type of modeling was used to predict
offensive tactics?
A) heuristics
B) heat maps
C) cascaded decision trees
D) sentiment analysis
Answer: B
Diff: 3
Page Ref: 7
23) Business applications have moved from transaction processing and monitoring to other
activities. Which of the following is NOT one of those activities?
A) problem analysis
B) solution applications
C) data monitoring
D) mobile access
Answer: C
Diff: 2
Page Ref: 11
24) Which of the following developments is NOT contributing to facilitating growth of decision
support and analytics?
A) collaboration technologies
B) Big Data
C) knowledge management systems
D) locally concentrated workforces
Answer: D
Diff: 3
Page Ref: 11-12
25) In what decade did disjointed information systems begin to be integrated?
A) 1970s
B) 1980s
C) 1990s
D) 2000s
Answer: B
Diff: 2
Page Ref: 14
26) Relational databases began to be used in the
A) 1960s.
B) 1970s.
C) 1980s.
D) 1990s.
Answer: C
Diff: 3
Page Ref: 13
27) The need for more versatile reporting than what was available in 1980s era ERP systems led
to the development of what type of system?
A) management information systems
B) relational databases
C) executive information systems
D) data warehouses
Answer: C
Diff: 3
Page Ref: 14
28) Which of the following is an umbrella term that combines architectures, tools, databases,
analytical tools, applications, and methodologies?
A) MIS
B) DSS
C) ERP
D) BI
Answer: D
Diff: 1
Page Ref: 16
29) The competitive imperatives for BI include all of the following EXCEPT
A) right information
B) right user
C) right time
D) right place
Answer: B
Diff: 2
Page Ref: 16
30) Which of the following is NOT an example of transaction processing?
A) ATM withdrawal
B) bank deposit
C) sales report
D) cash register scans
Answer: C
Diff: 2
Page Ref: 19
31) Online transaction processing (OLTP) systems handle a company's routine ongoing business.
In contrast, a data warehouse is typically
A) the end result of BI processes and operations.
B) a repository of actionable intelligence obtained from a data mart.
C) a distinct system that provides storage for data that will be made use of in analysis.
D) an integral subsystem of an online analytical processing (OLAP) system.
Answer: C
Diff: 2
Page Ref: 19-20
32) The very design that makes an OLTP system efficient for transaction processing makes it
inefficient for
A) end-user ad hoc reports, queries, and analysis.
B) transaction processing systems that constantly update operational databases.
C) the collection of reputable sources of intelligence.
D) transactions such as ATM withdrawals, where we need to reduce a bank balance accordingly.
Answer: A
Diff: 2
Page Ref: 20
33) How are enterprise resources planning (ERP) systems related to supply chain management
(SCM) systems?
A) different terms for the same system
B) complementary systems
C) mutually exclusive systems
D) none of the above; these systems never interface
Answer: B
Diff: 2
Page Ref: 20
34) BI applications must be integrated with
A) databases.
B) legacy systems.
C) enterprise systems.
D) all of these
Answer: D
Diff: 2
Page Ref: 22
35) What has caused the growth of the demand for instant, on-demand access to dispersed
information?
A) the increasing divide between users who focus on the strategic level and those who are more
oriented to the tactical level
B) the need to create a database infrastructure that is always online and contains all the
information from the OLTP systems
C) the more pressing need to close the gap between the operational data and strategic objectives
D) the fact that BI cannot simply be a technical exercise for the information systems department
Answer: C
Diff: 3
Page Ref: 21
36) Today, many vendors offer diversified tools, some of which are completely preprogrammed
(called shells). How are these shells utilized?
A) They are used for customization of BI solutions.
B) All a user needs to do is insert the numbers.
C) The shell provides a secure environment for the organization's BI data.
D) They host an enterprise data warehouse that can assist in decision making.
Answer: B
Diff: 2
Page Ref: 21
37) What type of analytics seeks to recognize what is going on as well as the likely forecast and
make decisions to achieve the best performance possible?
A) descriptive
B) prescriptive
C) predictive
D) domain
Answer: B
Diff: 2
Page Ref: 24-27
38) What type of analytics seeks to determine what is likely to happen in the future?
A) descriptive
B) prescriptive
C) predictive
D) domain
Answer: C
Diff: 2
Page Ref: 24-27
39) Which of the following statements about Big Data is true?
A) Data chunks are stored in different locations on one computer.
B) Hadoop is a type of processor used to process Big Data applications.
C) MapReduce is a storage filing system.
D) Pure Big Data systems do not involve fault tolerance.
Answer: D
Diff: 3
Page Ref: 36
40) Big Data often involves a form of distributed storage and processing using Hadoop and
MapReduce. One reason for this is
A) centralized storage creates too many vulnerabilities.
B) the "Big" in Big Data necessitates over 10,000 processing nodes.
C) the processing power needed for the centralized model would overload a single computer.
D) Big Data systems have to match the geographical spread of social media.
Answer: C
Diff: 3
Page Ref: 36
41) Fundamental reasons for investing in BI must be ________ with the company's business
strategy.
Answer: aligned
Diff: 2
Page Ref: 20
42) Software monitors referred to as ________ can be placed on a separate server in the network
and use event- and process-based approaches to measure and monitor operational processes.
Answer: intelligent agents
Diff: 2
Page Ref: 21
43) For organizations using BI systems, the need to ________ the gap between the
operational data and strategic objectives has become more pressing.
Answer: close
Diff: 2
Page Ref: 21
44) ________ is an umbrella term that combines architectures, tools, databases, analytical tools,
applications, and methodologies.
Answer: Business intelligence (BI)
Diff: 2
Page Ref: 16
45) A(n) ________ is a major component of a Business Intelligence (BI) system that holds
source data.
Answer: data warehouse
Diff: 2
Page Ref: 11
46) A(n) ________ is a major component of a Business Intelligence (BI) system that is often
browser based and often presents a portal or dashboard.
Answer: user interface
Diff: 2
Page Ref: 17
47) ________ cycle times are now extremely compressed, faster, and more informed across
industries.
Answer: Business
Diff: 2
Page Ref: 16
48) Different types of players are identified and described in the analytics ________.
Answer: ecosystem
Diff: 2
Page Ref: 37
49) ________ providers focus on providing technology and services aimed toward integrating
data from multiple sources.
Answer: Data Warehouse
Diff: 2
Page Ref: 40
50) ________ providers focus on bringing all the data stores into an enterprise-wide platform.
Answer: Middleware
Diff: 2
Page Ref: 40
51) The user interface of a BI system is often referred to as a(n) ________.
Answer: dashboard
Diff: 2
Page Ref: 16
52) Data warehouses are intended to work with informational data used for online ________
processing systems.
Answer: analytical
Diff: 2
Page Ref: 20
53) With ________, all the data from every corner of the enterprise is collected and integrated
into a consistent schema so that every part of the organization has access to the single version of
the truth when and where needed.
Answer: Enterprise Resource Planning (ERP)
Diff: 2
Page Ref: 14
54) As the number of potential BI applications increases, the need to justify and prioritize them
arises. This is not an easy task due to the large number of ________ benefits.
Answer: intangible
Diff: 2
Page Ref: 22
55) ________ analytics help managers understand current events in the organization including
causes, trends, and patterns.
Answer: Descriptive
Diff: 2
Page Ref: 24
56) ________ analytics help managers understand probable future outcomes.
Answer: Predictive
Diff: 2
Page Ref: 25
57) ________ analytics help managers make decisions to achieve the best performance in the
future.
Answer: Prescriptive
Diff: 2
Page Ref: 26-27
58) The Google search engine is an example of Big Data in that it has to search and index
billions of ________ in fractions of a second for each search.
Answer: Web pages
Diff: 2
Page Ref: 36
59) The filing system developed by Google to handle Big Data storage challenges is known as
the ________ Distributed File System.
Answer: Hadoop
Diff: 2
Page Ref: 36
60) The programing algorithm developed by Google to handle Big Data computational
challenges is known as ________.
Answer: MapReduce
Diff: 2
Page Ref: 36
61) List four possible analytics applications in the retail value chain.
Answer:
• Inventory Optimization
• Price Elasticity
• Market Basket Analysis
• Shopper Insight
• Customer Churn Analysis
• Channel Analysis
• New Store Analysis
• Store Layout
• Video Analytics
Diff: 2
Page Ref: 34
62) What are the four major components of a Business Intelligence (BI) system?
Answer:
1. A data warehouse, with its source data
2. Business analytics, a collection of tools for manipulating, mining, and analyzing the data in
the data warehouse
3. Business performance management (BPM) for monitoring and analyzing performance
4. A user interface (e.g., a dashboard)
Diff: 3
Page Ref: 16
63) Why is data alone worthless?
Answer: Alone, data is worthless because it does not provide business value. To provide
business value, it has to be analyzed.
Diff: 2
Page Ref: 36
64) What is the intent of the analysis of data that is stored in a data warehouse?
Answer: The intent of the analysis is to give management the ability to analyze data for insights
into the business, and thus provide tactical or operational decision support whereby, for example,
line personnel can make quicker and/or more informed decisions.
Diff: 2
Page Ref: 19-20
65) Describe the three major subsets of the Analytics Focused Software Developers portion of
the Analytics Ecosystem.
Answer:
• Reporting/Descriptive Analytics — Includes tools enabled by and available from the
Middleware industry players, as well as unique capabilities offered by focused providers.
• Predictive Analytics — a rapidly growing area that includes a variety of statistical packages.
• Prescriptive Analytics — Software providers in this category offer modeling tools and
algorithms for optimization of operations usually called management science/operations research
software.
Diff: 3
Page Ref: 41-42
66) Business applications can be programmed to act on what real-time BI systems discover.
Describe two approaches to the implementation of real-time BI.
Answer:
• One approach to real-time BI uses the DW model of traditional BI systems. In this case,
products from innovative BI platform providers provide a service-oriented, near–real-time
solution that populates the DW much faster than the typical nightly extract/transfer/load (ETL)
batch update does.
• A second approach, commonly called business activity management (BAM), is adopted by
pure-play BAM and/or hybrid BAM-middleware providers (such as Savvion, Iteration Software,
Vitria, webMethods, Quantive, Tibco, or Vineyard Software). It bypasses the DW entirely and
uses Web services or other monitoring means to discover key business events. These software
monitors (or intelligent agents) can be placed on a separate server in the network or on the
transactional application databases themselves, and they can use event- and process-based
approaches to proactively and intelligently measure and monitor operational processes.
Diff: 3
Page Ref: 21
67) List and describe three levels or categories of analytics that are most often viewed as
sequential and independent, but also occasionally seen as overlapping.
Answer:
• Descriptive or reporting analytics refers to knowing what is happening in the organization
and understanding some underlying trends and causes of such occurrences.
• Predictive analytics aims to determine what is likely to happen in the future. This analysis is
based on statistical techniques as well as other more recently developed techniques that fall
under the general category of data mining.
• Prescriptive analytics recognizes what is going on as well as the likely forecast and makes
decisions to achieve the best performance possible.
Diff: 3
Page Ref: 24-27
68) How does Amazon.com use predictive analytics to respond to product searches by the
customer?
Answer: Amazon uses clustering algorithms to segment customers into different clusters to be
able to target specific promotions to them. The company also uses association mining techniques
to estimate relationships between different purchasing behaviors. That is, if a customer buys one
product, what else is the customer likely to purchase? That helps Amazon recommend or
promote related products. For example, any product search on Amazon.com results in the retailer
also suggesting other similar products that may interest a customer.
Diff: 3
Page Ref: 26
69) Describe and define Big Data. Why is a search engine a Big Data application?
Answer:
• Big Data is data that cannot be stored in a single storage unit. Big Data typically refers to
data that is arriving in many different forms, be they structured, unstructured, or in a stream.
Major sources of such data are clickstreams from Web sites, postings on social media sites such
as Facebook, or data from traffic, sensors, or weather.
• A Web search engine such as Google needs to search and index billions of Web pages in
order to give you relevant search results in a fraction of a second. Although this is not done in
real time, generating an index of all the Web pages on the Internet is not an easy task.
Diff: 3
Page Ref: 35-36
70) What storage system and processing algorithm were developed by Google for Big Data?
Answer:
• Google developed and released as an Apache project the Hadoop Distributed File System
(HDFS) for storing large amounts of data in a distributed way.
• Google developed and released as an Apache project the MapReduce algorithm for pushing
computation to the data, instead of pushing data to a computing node.
Diff: 3
Page Ref: 36
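The idea in the answer above — push computation to the data rather than data to a computing node — can be illustrated with a toy word count, the canonical MapReduce example. This is a minimal single-process sketch of the map and reduce phases, not Hadoop's actual API; the document chunks are made up for illustration.

```python
from collections import defaultdict

def map_phase(document):
    """Map step: emit a (word, 1) pair for each word in one data chunk."""
    return [(word.lower(), 1) for word in document.split()]

def reduce_phase(pairs):
    """Reduce step: sum the counts for each key."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# Each "node" maps its own chunk locally (computation pushed to the data);
# only the small intermediate pairs are shuffled to the reducers.
chunks = ["Big Data needs distributed processing",
          "MapReduce pushes computation to the data"]
intermediate = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = reduce_phase(intermediate)
print(word_counts["data"])  # 2 — "Data" and "data" are both lowercased
```

In a real Hadoop cluster the map tasks run on the nodes holding the HDFS blocks, and only the intermediate key-value pairs cross the network.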
Business Intelligence, 4e (Sharda/Delen/Turban)
Chapter 2 Descriptive Analytics I: Nature of Data, Statistical Modeling,
and Visualization
1) One of SiriusXM's challenges was tracking potential customers when cars were sold.
Answer: TRUE
Diff: 1
Page Ref: 54
2) To respond to its market challenges, SiriusXM decided to focus on manufacturing efficiency.
Answer: FALSE
Diff: 2
Page Ref: 55
3) Data is the contextualization of information, that is, information set in context.
Answer: FALSE
Diff: 1
Page Ref: 98
4) Data is the main ingredient for any BI, data science, and business analytics initiative.
Answer: TRUE
Diff: 2
Page Ref: 57
5) Predictive algorithms generally require a flat file with a target variable, so making data
analytics ready for prediction means that data sets must be transformed into a flat-file format and
made ready for ingestion into those predictive algorithms.
Answer: TRUE
Diff: 1
Page Ref: 58
6) The data storage component of a business reporting system builds the various reports and
hosts them for, or disseminates them to users. It also provides notification, annotation,
collaboration, and other services.
Answer: FALSE
Diff: 2
Page Ref: 98
7) In the FEMA case study, the BureauNet software was the primary reason behind the increased
speed and relevance of the reports FEMA employees received.
Answer: TRUE
Diff: 2
Page Ref: 100
8) Google Maps has set new standards for data visualization with its intuitive Web mapping
software.
Answer: TRUE
Diff: 2
Page Ref: 103
9) There are basic chart types and specialized chart types. A Gantt chart is a specialized chart
type.
Answer: TRUE
Diff: 2
Page Ref: 107
10) Visualization differs from traditional charts and graphs in complexity of data sets and use of
multiple dimensions and measures.
Answer: TRUE
Diff: 2
Page Ref: 110
11) When telling a story during a presentation, it is best to avoid describing hurdles that your
character must overcome, to avoid souring the mood.
Answer: FALSE
Diff: 2
Page Ref: 113
12) Visual analytics is aimed at answering, "What is happening?" and is usually associated
with business analytics.
Answer: FALSE
Diff: 3
Page Ref: 112
13) Dashboards provide visual displays of important information that is consolidated and
arranged across several screens to maintain data order.
Answer: FALSE
Diff: 2
Page Ref: 117
14) In the Dallas Cowboys case study, the focus was on using data analytics to decide which
players would play every week.
Answer: FALSE
Diff: 2
Page Ref: 118
15) Data source reliability means that data are correct and are a good match for the analytics
problem.
Answer: FALSE
Diff: 1
Page Ref: 59
16) Data accessibility means that the data are easily and readily obtainable.
Answer: TRUE
Diff: 3
Page Ref: 59
17) Structured data is what data mining algorithms use and can be classified as categorical or
numeric.
Answer: TRUE
Diff: 2
Page Ref: 61
18) Interval data are variables that can be measured on interval scales.
Answer: TRUE
Diff: 2
Page Ref: 62
19) Nominal data represent the labels of multiple classes used to divide a variable into specific
groups.
Answer: FALSE
Diff: 2
Page Ref: 61
20) Descriptive statistics is all about describing the sample data on hand.
Answer: TRUE
Diff: 2
Page Ref: 75
21) Which characteristic of data means that all the required data elements are included in the data
set?
A) data source reliability
B) data accessibility
C) data richness
D) data granularity
Answer: C
Diff: 2
Page Ref: 59-60
22) Key performance indicators (KPIs) are metrics typically used to measure
A) database responsiveness.
B) qualitative feedback.
C) external results.
D) internal results.
Answer: D
Diff: 2
Page Ref: 99
23) Kaplan and Norton developed a report that presents an integrated view of success in the
organization called
A) metric management reports.
B) balanced scorecard-type reports.
C) dashboard-type reports.
D) visual reports.
Answer: B
Diff: 2
Page Ref: 99
24) Which characteristic of data requires that the variables and data values be defined at the
lowest (or as low as required) level of detail for the intended use of the data?
A) data source reliability
B) data accessibility
C) data richness
D) data granularity
Answer: D
Diff: 2
Page Ref: 59-60
25) Which of the following is LEAST related to data/information visualization?
A) information graphics
B) scientific visualization
C) statistical graphics
D) graphic artwork
Answer: D
Diff: 2
Page Ref: 101
26) The Internet emerged as a new medium for visualization and brought all the following
EXCEPT
A) worldwide digital distribution of visualization.
B) immersive environments for consuming data.
C) new forms of computation of business logic.
D) new graphics displays through PC displays.
Answer: C
Diff: 2
Page Ref: 101-103
27) Which kind of chart is described as an enhanced version of a scatter plot?
A) heat map
B) bullet
C) pie chart
D) bubble chart
Answer: D
Diff: 3
Page Ref: 107
28) Which type of visualization tool can be very helpful when the intention is to show relative
proportions of dollars per department allocated by a university administration?
A) heat map
B) bullet
C) pie chart
D) bubble chart
Answer: C
Diff: 3
Page Ref: 106
29) Which type of visualization tool can be very helpful when a data set contains location data?
A) bar chart
B) geographic map
C) highlight table
D) tree map
Answer: B
Diff: 2
Page Ref: 107
30) Which type of question does visual analytics seeks to answer?
A) Why is it happening?
B) What happened yesterday?
C) What is happening today?
D) When did it happen?
Answer: A
Diff: 2
Page Ref: 112
31) When you tell a story in a presentation, all of the following are true EXCEPT
A) a story should make sense and order out of a lot of background noise.
B) a well-told story should have no need for subsequent discussion.
C) stories and their lessons should be easy to remember.
D) the outcome and reasons for it should be clear at the end of your story.
Answer: B
Diff: 2
Page Ref: 113
32) Benefits of the latest visual analytics tools, such as SAS Visual Analytics, include all of the
following EXCEPT
A) mobile platforms such as the iPhone are supported by these products.
B) it is easier to spot useful patterns and trends in the data.
C) they explore massive amounts of data in hours, not days.
D) there is less demand on IT departments for reports.
Answer: C
Diff: 2
Page Ref: 115
33) What is the management feature of a dashboard?
A) operational data that identify what actions to take to resolve a problem
B) summarized dimensional data to analyze the root cause of problems
C) summarized dimensional data to monitor key performance metrics
D) graphical, abstracted data to monitor key performance metrics
Answer: A
Diff: 3
Page Ref: 119
34) What is the fundamental challenge of dashboard design?
A) ensuring that users across the organization have access to it
B) ensuring that the organization has the appropriate hardware onsite to support it
C) ensuring that the organization has access to the latest Web browsers
D) ensuring that the required information is shown clearly on a single screen
Answer: D
Diff: 3
Page Ref: 119
35) Contextual metadata for a dashboard includes all the following EXCEPT
A) whether any high-value transactions that would skew the overall trends were rejected as a part
of the loading process.
B) which operating system is running the dashboard server software.
C) whether the dashboard is presenting "fresh" or "stale" information.
D) when the data warehouse was last refreshed.
Answer: B
Diff: 2
Page Ref: 121
36) Dashboards can be presented at all the following levels EXCEPT
A) the visual dashboard level.
B) the static report level.
C) the visual cube level.
D) the self-service cube level.
Answer: C
Diff: 2
Page Ref: 122
37) This measure of central tendency is the sum of all the values/observations divided by the
number of observations in the data set.
A) dispersion
B) mode
C) median
D) arithmetic mean
Answer: D
Diff: 3
Page Ref: 76
38) This measure of dispersion is calculated by simply taking the square root of the variations.
A) standard deviation
B) range
C) variance
D) arithmetic mean
Answer: A
Diff: 2
Page Ref: 78
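The two definitions tested above — the arithmetic mean as the sum of observations divided by their count, and the standard deviation as the square root of the variance — can be verified numerically. A minimal sketch with made-up data, cross-checked against Python's statistics module:

```python
import statistics

data = [4, 8, 6, 5, 3, 7, 9]

# Arithmetic mean: sum of all values divided by the number of observations.
mean = sum(data) / len(data)

# Sample variance: average squared deviation from the mean (n - 1 denominator).
variance = sum((x - mean) ** 2 for x in data) / (len(data) - 1)

# Standard deviation: the square root of the variance.
std_dev = variance ** 0.5

assert mean == statistics.mean(data)          # 6.0
assert abs(std_dev - statistics.stdev(data)) < 1e-9
```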
39) This plot is a graphical illustration of several descriptive statistics about a given data set.
A) pie chart
B) bar graph
C) box-and-whiskers plot
D) kurtosis
Answer: C
Diff: 3
Page Ref: 79
40) This technique makes no a priori assumption of whether one variable is dependent on the
other(s) and is not concerned with the relationship between variables; instead it gives an estimate
on the degree of association between the variables.
A) regression
B) correlation
C) means test
D) multiple regression
Answer: B
Diff: 2
Page Ref: 86
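The distinction in the question above — correlation estimates the degree of association without assuming either variable depends on the other — can be sketched with Pearson's r. The data values here are hypothetical.

```python
import math

def pearson_correlation(x, y):
    """Pearson's r: degree of linear association between two variables,
    with no a priori assumption that one depends on the other."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

hours_studied = [1, 2, 3, 4, 5]     # illustrative values
exam_score = [52, 58, 63, 71, 76]
r = pearson_correlation(hours_studied, exam_score)
print(round(r, 3))  # close to +1: strong positive association
```

Note that r is symmetric in its arguments, which is precisely why it measures association rather than dependence.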
41) A(n) ________ is a communication artifact, concerning business matters, prepared with the
specific intention of relaying information in a presentable form.
Answer: report
Diff: 2
Page Ref: 98
42) ________ statistics is about drawing conclusions about the characteristics of the population.
Answer: Inferential
Diff: 2
Page Ref: 75
43) Due to the ________ expansion of information technology coupled with the need for
improved competitiveness in business, there has been an increase in the use of computing power
to produce unified reports that join different views of the enterprise in one place.
Answer: rapid
Diff: 3
Page Ref: 98
44) ________ management reports are used to manage business performance through
outcome-oriented metrics in many organizations.
Answer: Metric
Diff: 2
Page Ref: 99
45) When validating the assumptions of a regression, ________ assumes that the relationship
between the response variable and the explanatory variables is linear.
Answer: linearity
Diff: 2
Page Ref: 89
46) ________ regression is a very popular, statistically sound, probability-based classification
algorithm that employs supervised learning.
Answer: Logistic
Diff: 2
Page Ref: 90
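Logistic regression, the answer above, is probability-based: it maps a weighted sum of inputs through the sigmoid function to a class probability. A minimal scoring sketch, assuming coefficients already fit by supervised learning (the weights and input values here are hypothetical, not from the text):

```python
import math

def sigmoid(z):
    """Squash any real number into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, weights, bias):
    """Logistic regression output: probability of the positive class."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)

# Hypothetical coefficients, as if already estimated from labeled data.
weights, bias = [0.8, -1.2], -0.5
p = predict_proba([2.0, 0.5], weights, bias)   # z = 0.5, p ≈ 0.62
label = 1 if p >= 0.5 else 0                    # classify at the 0.5 threshold
```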
47) ________ charts are useful in displaying nominal data or numerical data that splits nicely
into different categories so you can quickly see comparative results and trends.
Answer: Bar
Diff: 1
Page Ref: 106
48) ________ charts or network diagrams show precedence relationships among the project
activities/tasks.
Answer: PERT
Diff: 1
Page Ref: 107
49) ________ are typically used together with other charts and graphs, as opposed to by
themselves, and show postal codes, country names, etc.
Answer: Maps
Diff: 1
Page Ref: 107
50) Typical charts, graphs, and other visual elements used in visualization-based applications
usually involve ________ dimensions.
Answer: two
Diff: 2
Page Ref: 110
51) Visual analytics is widely regarded as the combination of visualization and ________
analytics.
Answer: predictive
Diff: 2
Page Ref: 112
52) Dashboards present visual displays of important information that are consolidated and
arranged on a single ________.
Answer: screen
Diff: 1
Page Ref: 117
53) With dashboards, the layer of information that uses graphical, abstracted data to keep tabs on
key performance metrics is the ________ layer.
Answer: monitoring
Diff: 2
Page Ref: 119
54) ________ series forecasting is the use of mathematical modeling to predict future values of
the variable of interest based on previously observed values.
Answer: Time
Diff: 1
Page Ref: 97
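Time-series forecasting as defined above — predicting future values of a variable from previously observed values — can be sketched with the simplest such model, a moving average. The monthly figures are illustrative.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations,
    one of the simplest time-series models built on previously observed values."""
    recent = series[-window:]
    return sum(recent) / len(recent)

monthly_sales = [120, 132, 128, 141, 150, 147]  # hypothetical data
next_month = moving_average_forecast(monthly_sales)
print(next_month)  # mean of the last three observations: 146.0
```

More sophisticated models (exponential smoothing, ARIMA) weight past observations differently, but all share this structure of extrapolating from observed history.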
55) Information dashboards enable ________ operations that allow the users to view underlying
data sources and obtain more detail.
Answer: drill-down/drill-through
Diff: 2
Page Ref: 121
56) With a dashboard, information on sources of the data being presented, the quality and
currency of underlying data provide contextual ________ for users.
Answer: metadata
Diff: 2
Page Ref: 121
57) When validating the assumptions of a regression, ________ assumes that the errors of the
response variable are normally distributed.
Answer: normality
Diff: 2
Page Ref: 89-90
58) ________ charts are effective when you have nominal data or numerical data that splits
nicely into different categories so you can quickly see comparative results and trends within your
data.
Answer: Bar
Diff: 1
Page Ref: 106
59) ________ plots are often used to explore the relationship between two or three variables (in
2-D or 3-D visuals).
Answer: Scatter
Diff: 2
Page Ref: 106
60) ________ charts are a special case of horizontal bar charts that are used to portray project
timelines, project tasks/activity durations, and overlap among the tasks/activities.
Answer: Gantt
Diff: 2
Page Ref: 107
61) List and describe the three major categories of business reports.
Answer:
• Metric management reports. Many organizations manage business performance through
outcome-oriented metrics. For external groups, these are service-level agreements (SLAs). For
internal management, they are key performance indicators (KPIs).
• Dashboard-type reports. This report presents a range of different performance indicators on
one page, like a dashboard in a car. Typically, there is a set of predefined reports with static
elements and fixed structure, but customization of the dashboard is allowed through widgets,
views, and set targets for various metrics.
• Balanced scorecard–type reports. This is a method developed by Kaplan and Norton that
attempts to present an integrated view of success in an organization. In addition to financial
performance, balanced scorecard–type reports also include customer, business process, and
learning and growth perspectives.
Diff: 2
Page Ref: 99
62) List five types of specialized charts and graphs.
Answer:
• Histograms
• Gantt charts
• PERT charts
• Geographic maps
• Bullets
• Heat maps
• Highlight tables
• Tree maps
Diff: 2
Page Ref: 107-108
63) According to Eckerson (2006), a well-known expert on BI dashboards, what are the three
layers of information of a dashboard?
Answer:
1. Monitoring. Graphical, abstracted data to monitor key performance metrics.
2. Analysis. Summarized dimensional data to analyze the root cause of problems.
3. Management. Detailed operational data that identify what actions to take to resolve a
problem.
Diff: 2
Page Ref: 119
64) List the five most common functions of business reports.
Answer:
• To ensure that all departments are functioning properly
• To provide information
• To provide the results of an analysis
• To persuade others to act
• To create an organizational memory (as part of a knowledge management system)
Diff: 2
Page Ref: 98
65) What are the most important assumptions in linear regression?
Answer:
1. Linearity. This assumption states that the relationship between the response variable and the
explanatory variables is linear. That is, the expected value of the response variable is a straight-line function of each explanatory variable, while holding all other explanatory variables fixed.
Also, the slope of the line does not depend on the values of the other variables. It also implies
that the effects of different explanatory variables on the expected value of the response variable
are additive in nature.
2. Independence (of errors). This assumption states that the errors of the response variable are
uncorrelated with each other. This independence of the errors is weaker than actual statistical
independence, which is a stronger condition and is often not needed for linear regression
analysis.
3. Normality (of errors). This assumption states that the errors of the response variable are
normally distributed. That is, they are supposed to be totally random and should not represent
any nonrandom patterns.
4. Constant variance (of errors). This assumption, also called homoscedasticity, states that the
response variables have the same variance in their error, regardless of the values of the
explanatory variables. In practice this assumption is invalid if the response variable varies over a
wide enough range/scale.
5. Multicollinearity. This assumption states that the explanatory variables are not correlated (i.e.,
they do not replicate the same information but each provides a different perspective on the
information needed for the model). Multicollinearity can be triggered by having two or more
perfectly correlated explanatory variables presented to the model (e.g., if the same explanatory
variable is mistakenly included in the model twice, once with a slight transformation). A
correlation-based data assessment usually catches this error.
Diff: 2
Page Ref: 89-90
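Several of these assumptions can be screened with simple residual diagnostics. A minimal sketch in plain Python (the residual values are hypothetical) checking the zero-mean symptom and a crude version of the constant-variance (homoscedasticity) assumption:

```python
import statistics

# Hypothetical residuals from a fitted regression model, in data order
residuals = [0.2, -0.1, 0.3, -0.2, 0.1, -0.3, 0.25, -0.15]

# Zero-mean symptom: well-behaved errors should average near zero
mean_res = statistics.mean(residuals)

# Crude constant-variance check: compare the spread of residuals in the
# first and second halves of the data; a ratio far from 1.0 hints at
# heteroscedasticity (variance changing with the explanatory variables)
half = len(residuals) // 2
spread_lo = statistics.pstdev(residuals[:half])
spread_hi = statistics.pstdev(residuals[half:])
ratio = spread_hi / spread_lo
```

In practice these visual/numeric screens are complemented by formal tests, but a residual plot plus checks like these catch the most common violations.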
66) Describe the difference between simple and multiple regression.
Answer: If the regression equation is built between one response variable and one explanatory
variable, then it is called simple regression. Multiple regression is the extension of simple
regression where the explanatory variables are more than one.
Diff: 2
Page Ref: 87
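The simple-regression case has a closed-form solution, which the following plain-Python sketch implements (the data are hypothetical); multiple regression generalizes the same least-squares idea to several explanatory variables, typically via matrix algebra:

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x (simple regression, closed form)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    b0 = my - b1 * mx
    return b0, b1

# Hypothetical data following roughly y = 2x
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_line(x, y)  # b1 should be close to 2
```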
67) Describe the difference between descriptive and inferential statistics.
Answer: The main difference between descriptive and inferential statistics lies in their goal and
in the data they speak about: descriptive statistics describes the sample data on hand, whereas
inferential statistics draws inferences or conclusions about the characteristics of the population
from which the sample was taken.
Diff: 2
Page Ref: 75
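The distinction can be made concrete with a few lines of plain Python (the sample values are hypothetical): summary measures describe the sample itself, while a confidence interval is an inferential statement about the unseen population:

```python
import math
import statistics

# Hypothetical sample of 20 customer order values
sample = [23, 25, 31, 28, 22, 27, 30, 26, 24, 29,
          25, 28, 27, 23, 26, 31, 29, 24, 27, 28]

# Descriptive statistics: describe the sample on hand
mean = statistics.mean(sample)
sd = statistics.stdev(sample)

# Inferential statistics: infer something about the population,
# e.g. an approximate 95% confidence interval for the population mean
# (normal approximation, for illustration only)
se = sd / math.sqrt(len(sample))
ci = (mean - 1.96 * se, mean + 1.96 * se)
```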
68) Describe categorical and nominal data.
Answer: Categorical data represent the labels of multiple classes used to divide a variable into
specific groups. Examples of categorical variables include race, sex, age group, and educational
level. Nominal data consist of simple codes assigned to objects as labels; the codes themselves
are not measurements. For example, the variable marital status can be generally categorized as
(1) single, (2) married, and (3) divorced.
Diff: 2
Page Ref: 61
Business Intelligence, 4e (Sharda/Delen/Turban)
Chapter 3 Descriptive Analytics II: Business Intelligence and Data Warehousing
1) The BPM development cycle is essentially a one-shot process where the requirement is to get
it right the first time.
Answer: FALSE
Diff: 2
Page Ref: 170
2) The "islands of data" problem in the 1980s describes the phenomenon of unconnected data
being stored in numerous locations within an organization.
Answer: TRUE
Diff: 2
Page Ref: 132
3) Subject oriented databases for data warehousing are organized by detailed subjects such as
disk drives, computers, and networks.
Answer: FALSE
Diff: 2
Page Ref: 133
4) Data warehouses are subsets of data marts.
Answer: FALSE
Diff: 1
Page Ref: 134
5) One way an operational data store differs from a data warehouse is the recency of their data.
Answer: TRUE
Diff: 2
Page Ref: 135
6) Organizations seldom devote a lot of effort to creating metadata because it is not important for
the effective use of data warehouses.
Answer: FALSE
Diff: 2
Page Ref: 135
7) Without middleware, different BI programs cannot easily connect to the data warehouse.
Answer: TRUE
Diff: 2
Page Ref: 139
8) Two-tier data warehouse/BI infrastructures offer organizations more flexibility but cost more
than three-tier ones.
Answer: FALSE
Diff: 2
Page Ref: 140
9) Moving the data into a data warehouse is usually the easiest part of its creation.
Answer: FALSE
Diff: 2
Page Ref: 141
10) The hub-and-spoke data warehouse model uses a centralized warehouse feeding dependent
data marts.
Answer: TRUE
Diff: 2
Page Ref: 142
11) Because of performance and data quality issues, most experts agree that the federated
architecture should supplement data warehouses, not replace them.
Answer: TRUE
Diff: 2
Page Ref: 144
12) Bill Inmon advocates the data mart bus architecture whereas Ralph Kimball promotes the
hub-and-spoke architecture, a data mart bus architecture with conformed dimensions.
Answer: FALSE
Diff: 2
Page Ref: 144
13) Properly integrating data from various databases and other disparate sources is a trivial
process.
Answer: FALSE
Diff: 3
Page Ref: 146
14) With key performance indicators, driver KPIs have a significant effect on outcome KPIs, but
the reverse is not necessarily true.
Answer: TRUE
Diff: 2
Page Ref: 176
15) With the balanced scorecard approach, the entire focus is on measuring and managing
specific financial goals based on the organization's strategy.
Answer: FALSE
Diff: 2
Page Ref: 177
16) OLTP systems are designed to handle ad hoc analysis and complex queries that deal with
many data items.
Answer: FALSE
Diff: 2
Page Ref: 158
17) The data warehousing maturity model consists of six stages: prenatal, infant, child, teenager,
adult, and sage.
Answer: TRUE
Diff: 2
Page Ref: 160-161
18) User-initiated navigation of data through disaggregation is referred to as "drill up."
Answer: FALSE
Diff: 3
Page Ref: 159
19) Data warehouse administrators (DWAs) do not need strong business insight since they only
handle the technical aspect of the infrastructure.
Answer: FALSE
Diff: 2
Page Ref: 164
20) Because the recession has raised interest in low-cost open source software, it is now set to
replace traditional enterprise software.
Answer: FALSE
Diff: 2
Page Ref: 165
21) Why is a performance management system superior to a performance measurement system?
A) because performance measurement systems are only in their infancy
B) because measurement automatically leads to problem solution
C) because performance management systems cost more
D) because measurement alone has little use without action
Answer: D
Diff: 3
Page Ref: 176-177
22) Operational or transaction databases are product oriented, handling transactions that update
the database. In contrast, data warehouses are
A) subject-oriented and nonvolatile.
B) product-oriented and nonvolatile.
C) product-oriented and volatile.
D) subject-oriented and volatile.
Answer: A
Diff: 3
Page Ref: 131
23) Which kind of data warehouse is created separately from the enterprise data warehouse by a
department and not reliant on it for updates?
A) sectional data mart
B) public data mart
C) independent data mart
D) volatile data mart
Answer: C
Diff: 2
Page Ref: 134
24) Oper marts are created when operational data needs to be analyzed
A) linearly.
B) in a dashboard.
C) unidimensionally.
D) multidimensionally.
Answer: D
Diff: 2
Page Ref: 135
25) A Web client that connects to a Web server, which is in turn connected to a BI application
server, is reflective of a
A) one-tier architecture.
B) two-tier architecture.
C) three-tier architecture.
D) four-tier architecture.
Answer: C
Diff: 2
Page Ref: 139
26) Which of the following BEST enables a data warehouse to handle complex queries and scale
up to handle many more requests?
A) use of the Web by users as a front-end
B) parallel processing
C) Microsoft Windows
D) a larger IT staff
Answer: B
Diff: 3
Page Ref: 141
27) Which data warehouse architecture uses metadata from existing data warehouses to create a
hybrid logical data warehouse comprised of data from the other warehouses?
A) independent data marts architecture
B) centralized data warehouse architecture
C) hub-and-spoke data warehouse architecture
D) federated architecture
Answer: D
Diff: 3
Page Ref: 142
28) Which data warehouse architecture uses a normalized relational warehouse that feeds
multiple data marts?
A) independent data marts architecture
B) centralized data warehouse architecture
C) hub-and-spoke data warehouse architecture
D) federated architecture
Answer: C
Diff: 3
Page Ref: 142
29) Which approach to data warehouse integration focuses more on sharing process functionality
than data across systems?
A) extraction, transformation, and load
B) enterprise application integration
C) enterprise information integration
D) enterprise function integration
Answer: B
Diff: 3
Page Ref: 147
30) ________ is an evolving tool space that promises real-time data integration from a variety of
sources, such as relational databases, Web services, and multidimensional databases.
A) Enterprise information integration (EII)
B) Enterprise application integration (EAI)
C) Extraction, transformation, and load (ETL)
D) None of these
Answer: A
Diff: 3
Page Ref: 148
31) In which stage of extraction, transformation, and load (ETL) into a data warehouse are
anomalies detected and corrected?
A) transformation
B) extraction
C) load
D) cleanse
Answer: D
Diff: 3
Page Ref: 149
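The cleansing stage can be illustrated with a small plain-Python sketch (the records and rules are hypothetical) that detects anomalies, corrects the correctable ones, and rejects the rest before loading:

```python
# Hypothetical raw records extracted from a source system
raw = [
    {"customer": "  Alice ", "state": "ny", "amount": "120.50"},
    {"customer": "Bob",      "state": "NY", "amount": "-15"},   # anomalous amount
    {"customer": "Carol",    "state": "ca", "amount": "88.00"},
]

def cleanse(record):
    """Return a cleansed record, or None if the anomaly cannot be corrected."""
    amount = float(record["amount"])
    if amount < 0:  # negative sale amount: reject the record
        return None
    return {
        "customer": record["customer"].strip(),  # trim stray whitespace
        "state": record["state"].upper(),        # standardize state codes
        "amount": round(amount, 2),
    }

clean = [r for r in (cleanse(rec) for rec in raw) if r is not None]
```

Real ETL tools apply the same pattern at scale, with rule libraries and rejected-record logging instead of hand-written functions.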
32) Data warehouses provide direct and indirect benefits to organizations. Which of the
following is an indirect benefit of data warehouses?
A) better and more timely information
B) extensive new analyses performed by users
C) simplified access to data
D) improved customer service
Answer: D
Diff: 3
Page Ref: 150
33) All of the following are benefits of hosted data warehouses EXCEPT
A) smaller upfront investment.
B) better quality hardware.
C) greater control of data.
D) frees up in-house systems.
Answer: C
Diff: 2
Page Ref: 157
34) When representing data in a data warehouse, using several dimension tables that are each
connected only to a fact table means you are using which warehouse structure?
A) star schema
B) snowflake schema
C) relational schema
D) dimensional schema
Answer: A
Diff: 3
Page Ref: 157
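A star schema can be sketched directly in SQL. The following minimal example (using Python's built-in sqlite3, with hypothetical tables and data) builds one fact table whose keys point at two dimension tables, then runs the typical join-and-aggregate query:

```python
import sqlite3

# Minimal star schema: one fact table, each dimension joined only to the fact
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY,
                              year INTEGER, month INTEGER);
    CREATE TABLE fact_sales  (product_key INTEGER, date_key INTEGER,
                              amount REAL);
""")
con.executemany("INSERT INTO dim_product VALUES (?, ?)",
                [(1, "Widget"), (2, "Gadget")])
con.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                [(10, 2017, 1), (11, 2017, 2)])
con.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                [(1, 10, 100.0), (1, 11, 150.0), (2, 10, 80.0)])

# Typical star-schema query: aggregate facts by dimension attributes
rows = con.execute("""
    SELECT p.name, d.year, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON f.product_key = p.product_key
    JOIN dim_date    d ON f.date_key    = d.date_key
    GROUP BY p.name, d.year
    ORDER BY p.name
""").fetchall()
```

Adding d.month to the SELECT and GROUP BY clauses turns the yearly summary into a monthly one, i.e., a drill down along the date dimension.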
35) When querying a dimensional database, a user went from summarized data to its underlying
details. The function that served this purpose is
A) dice.
B) slice.
C) roll-up.
D) drill down.
Answer: D
Diff: 3
Page Ref: 159
36) What is Six Sigma?
A) a letter in the Greek alphabet that statisticians use to measure process variability
B) a methodology aimed at reducing the number of defects in a business process
C) a methodology aimed at reducing the amount of variability in a business process
D) a methodology aimed at measuring the amount of variability in a business process
Answer: B
Diff: 2
Page Ref: 180
37) Real-time data warehousing can be used to support the highest level of decision making
sophistication and power. The major feature that enables this in relation to handling the data is
A) country of (data) origin.
B) nature of the data.
C) speed of data transfer.
D) source of the data.
Answer: C
Diff: 2
Page Ref: 168
38) A large storage location that can hold vast quantities of data (mostly unstructured) in its
native/raw format for future/potential analytics consumption is referred to as a(n)
A) extended ASP.
B) data cloud.
C) data lake.
D) relational database.
Answer: C
Diff: 3
Page Ref: 166
39) How does the use of cloud computing affect the scalability of a data warehouse?
A) Cloud computing vendors bring as much hardware as needed to users' offices.
B) Hardware resources are dynamically allocated as use increases.
C) Cloud vendors are mostly based overseas where the cost of labor is low.
D) Cloud computing has little effect on a data warehouse's scalability.
Answer: B
Diff: 3
Page Ref: 165-166
40) All of the following are true about in-database processing technology EXCEPT
A) it pushes the algorithms to where the data is.
B) it makes the response to queries much faster than conventional databases.
C) it is often used for apps like credit card fraud detection and investment risk management.
D) it is the same as in-memory storage technology.
Answer: D
Diff: 3
Page Ref: 169
41) A(n) ________ data store (ODS) provides a fairly recent form of customer information file.
Answer: operational
Diff: 2
Page Ref: 135
42) In ________ oriented data warehousing, operational databases are tuned to handle
transactions that update the database.
Answer: product
Diff: 2
Page Ref: 134
43) The three main types of data warehouses are data marts, operational ________, and
enterprise data warehouses.
Answer: data stores
Diff: 2
Page Ref: 134
44) ________ describe the structure and meaning of the data, contributing to their effective use.
Answer: Metadata
Diff: 1
Page Ref: 135
45) Most data warehouses are built using ________ database management systems to control and
manage the data.
Answer: relational
Diff: 2
Page Ref: 141
46) A(n) ________ architecture is used to build a scalable and maintainable infrastructure that
includes a centralized data warehouse and several dependent data marts.
Answer: hub-and-spoke
Diff: 2
Page Ref: 142
47) The ________ data warehouse architecture involves integrating disparate systems and
analytical resources from multiple sources to meet changing needs or business conditions.
Answer: federated
Diff: 2
Page Ref: 142
48) Data ________ comprises data access, data federation, and change capture.
Answer: integration
Diff: 3
Page Ref: 146
49) ________ is a mechanism that integrates application functionality and shares functionality
(rather than data) across systems, thereby enabling flexibility and reuse.
Answer: Enterprise application integration (EAI)
Diff: 3
Page Ref: 147
50) ________ is a mechanism for pulling data from source systems to satisfy a request for
information. It is an evolving tool space that promises real-time data integration from a variety of
sources, such as relational databases, Web services, and multidimensional databases.
Answer: Enterprise information integration (EII)
Diff: 3
Page Ref: 148
51) Performing extensive ________ to move data to the data warehouse may be a sign of poorly
managed data and a fundamental lack of a coherent data management strategy.
Answer: extraction, transformation, and load (ETL)
Diff: 3
Page Ref: 149
52) The ________ Model, also known as the EDW approach, emphasizes top-down
development, employing established database development methodologies and tools, such as
entity-relationship diagrams (ERD), and an adjustment of the spiral development approach.
Answer: Inmon
Diff: 2
Page Ref: 153-154
53) The ________ Model, also known as the data mart approach, is a "plan big, build small"
approach. A data mart is a subject-oriented or department-oriented data warehouse. It is a scaled-down version of a data warehouse that focuses on the requests of a specific department, such as
marketing or sales.
Answer: Kimball
Diff: 2
Page Ref: 154
54) ________ modeling is a retrieval-based system that supports high-volume query access.
Answer: Dimensional
Diff: 2
Page Ref: 156
55) A(n) ________ data mart is a subset that is created directly from the data warehouse.
Answer: dependent
Diff: 1
Page Ref: 134
56) Online ________ is a term used for a transaction system that is primarily responsible for
capturing and storing data related to day-to-day business functions such as ERP, CRM, SCM,
and point of sale.
Answer: transaction processing
Diff: 2
Page Ref: 158
57) Given that the size of data warehouses is expanding at an exponential rate, ________ is an
important issue.
Answer: scalability
Diff: 2
Page Ref: 163
58) The role responsible for successful administration and management of a data warehouse is
the ________, who should be familiar with high-performance software, hardware, and
networking technologies, and also possesses solid business insight.
Answer: data warehouse administrator (DWA)
Diff: 2
Page Ref: 164
59) ________, or "The Extended ASP Model," is a creative way of deploying information
system applications where the provider licenses its applications to customers for use as a service
on demand (usually over the Internet).
Answer: SaaS (software as a service)
Diff: 2
Page Ref: 165
60) ________ (also called in-database analytics) refers to the integration of the algorithmic
extent of data analytics into the data warehouse.
Answer: In-database processing
Diff: 2
Page Ref: 169
61) What is the definition of a data warehouse (DW) in simple terms?
Answer: In simple terms, a data warehouse (DW) is a pool of data produced to support decision
making; it is also a repository of current and historical data of potential interest to managers
throughout the organization.
Diff: 2
Page Ref: 131
62) A common way of introducing data warehousing is to refer to its fundamental characteristics.
Describe three characteristics of data warehousing.
Answer:
• Subject oriented. Data are organized by detailed subject, such as sales, products, or
customers, containing only information relevant for decision support.
• Integrated. Integration is closely related to subject orientation. Data warehouses must place
data from different sources into a consistent format. To do so, they must deal with naming
conflicts and discrepancies among units of measure. A data warehouse is presumed to be totally
integrated.
• Time variant (time series). A warehouse maintains historical data. The data do not
necessarily provide current status (except in real-time systems). They detect trends, deviations,
and long-term relationships for forecasting and comparisons, leading to decision making. Every
data warehouse has a temporal quality. Time is the one important dimension that all data
warehouses must support. Data for analysis from multiple sources contains multiple time points
(e.g., daily, weekly, monthly views).
• Nonvolatile. After data are entered into a data warehouse, users cannot change or update the
data. Obsolete data are discarded, and changes are recorded as new data.
• Web based. Data warehouses are typically designed to provide an efficient computing
environment for Web-based applications.
• Relational/multidimensional. A data warehouse uses either a relational structure or a
multidimensional structure. A recent survey on multidimensional structures can be found in
Romero and Abelló (2009).
• Client/server. A data warehouse uses the client/server architecture to provide easy access for
end users.
• Real time. Newer data warehouses provide real-time, or active, data-access and analysis
capabilities (see Basu, 2003; and Bonde and Kuckuk, 2004).
• Include metadata. A data warehouse contains metadata (data about data) about how the data
are organized and how to effectively use them.
Diff: 3
Page Ref: 133-134
63) What is the definition of a data mart?
Answer: A data mart is a subset of a data warehouse, typically consisting of a single subject area
(e.g., marketing, operations). Whereas a data warehouse combines databases across an entire
enterprise, a data mart is usually smaller and focuses on a particular subject or department.
Diff: 2
Page Ref: 134
64) Mehra (2005) indicated that few organizations really understand metadata, and fewer
understand how to design and implement a metadata strategy. How would you describe
metadata?
Answer: Metadata are data about data. Metadata describe the structure of and some meaning
about data, thereby contributing to their effective or ineffective use.
Diff: 2
Page Ref: 135
65) What are the four processes that define a closed-loop BPM cycle?
Answer:
1. Strategize: This is the process of identifying and stating the organization's mission, vision,
and objectives, and developing plans (at different levels of granularity—strategic, tactical and
operational) to achieve these objectives.
2. Plan: When operational managers know and understand the what (i.e., the organizational
objectives and goals), they will be able to come up with the how (i.e., detailed operational and
financial plans). Operational and financial plans answer two questions: What tactics and
initiatives will be pursued to meet the performance targets established by the strategic plan?
What are the expected financial results of executing the tactics?
3. Monitor/Analyze: When the operational and financial plans are underway, it is imperative
that the performance of the organization be monitored. A comprehensive framework for
monitoring performance should address two key issues: what to monitor and how to monitor.
4. Act and Adjust: What do we need to do differently? Whether a company is interested in
growing its business or simply improving its operations, virtually all strategies depend on new
projects—creating new products, entering new markets, acquiring new customers or businesses,
or streamlining some processes. The final part of this loop is taking action and adjusting current
actions based on analysis of problems and opportunities.
Diff: 2
Page Ref: 171-172
66) Six Sigma rests on a simple performance improvement model known as DMAIC. What are
the steps involved?
Answer:
1. Define. Define the goals, objectives, and boundaries of the improvement activity. At the top
level, the goals are the strategic objectives of the company. At lower levels—department or
project levels—the goals are focused on specific operational processes.
2. Measure. Measure the existing system. Establish quantitative measures that will yield
statistically valid data. The data can be used to monitor progress toward the goals defined in the
previous step.
3. Analyze. Analyze the system to identify ways to eliminate the gap between the current
performance of the system or process and the desired goal.
4. Improve. Initiate actions to eliminate the gap by finding ways to do things better, cheaper, or
faster. Use project management and other planning tools to implement the new approach.
5. Control. Institutionalize the improved system by modifying compensation and incentive
systems, policies, procedures, manufacturing resource planning, budgets, operation instructions,
or other management systems.
Diff: 2
Page Ref: 180
67) Briefly describe four major components of the data warehousing process.
Answer:
• Data sources. Data are sourced from multiple independent operational "legacy" systems and
possibly from external data providers (such as the U.S. Census). Data may also come from an
OLTP or ERP system.
• Data extraction and transformation. Data are extracted and properly transformed using
custom-written or commercial ETL software.
• Data loading. Data are loaded into a staging area, where they are transformed and cleansed.
The data are then ready to load into the data warehouse and/or data marts.
• Comprehensive database. Essentially, this is the EDW to support all decision analysis by
providing relevant summarized and detailed information originating from many different
sources.
• Metadata. Metadata include software programs about data and rules for organizing data
summaries that are easy to index and search, especially with Web tools.
• Middleware tools. Middleware tools enable access to the data warehouse. There are many
front-end applications that business users can use to interact with data stored in the data
repositories, including data mining, OLAP, reporting tools, and data visualization tools.
Diff: 2
Page Ref: 137-139
68) There are several basic information system architectures that can be used for data
warehousing. What are they?
Answer: Generally speaking, these architectures are commonly called client/server or n-tier
architectures, of which two-tier and three-tier architectures are the most common, but sometimes
there is simply one tier.
Diff: 2
Page Ref: 139-140
69) More data, coming in faster and requiring immediate conversion into decisions, means that
organizations are confronting the need for real-time data warehousing (RDW). How would you
define real-time data warehousing?
Answer: Real-time data warehousing, also known as active data warehousing (ADW), is the
process of loading and providing data via the data warehouse as they become available.
Diff: 2
Page Ref: 168
70) Mention briefly some of the recently popularized concepts and technologies that will play a
significant role in defining the future of data warehousing.
Answer:
• Sourcing (mechanisms for acquisition of data from diverse and dispersed sources):
o Web, social media, and Big Data
o Open source software
o SaaS (software as a service)
o Cloud computing
• Infrastructure (architectural—hardware and software—enhancements):
o Columnar (a new way to store and access data in the database)
o Real-time data warehousing
o Data warehouse appliances (all-in-one solutions to DW)
o Data management technologies and practices
o In-database processing technology (putting the algorithms where the data is)
o In-memory storage technology (moving the data in the memory for faster processing)
o New database management systems
o Advanced analytics
Diff: 3
Page Ref: 165-170
Business Intelligence, 4e (Sharda/Delen/Turban)
Chapter 4 Predictive Analytics I: Data Mining Process, Methods, and Algorithms
1) In the opening case, police detectives used data mining to identify possible new areas of
inquiry.
Answer: FALSE
Diff: 1
Page Ref: 190-191
2) The cost of data storage has plummeted recently, making data mining feasible for more firms.
Answer: TRUE
Diff: 2
Page Ref: 194
3) Data mining can be very useful in detecting patterns such as credit card fraud, but is of little
help in improving sales.
Answer: FALSE
Diff: 2
Page Ref: 193
4) If using a mining analogy, "knowledge mining" would be a more appropriate term than "data
mining."
Answer: TRUE
Diff: 2
Page Ref: 196
5) The entire focus of the predictive analytics system in the Infinity P&C case was on detecting
and handling fraudulent claims for the company's benefit.
Answer: FALSE
Diff: 3
Page Ref: 194-195
6) Data mining requires specialized data analysts to ask ad hoc questions and obtain answers
quickly from the system.
Answer: FALSE
Diff: 2
Page Ref: 197
7) Ratio data is a type of categorical data.
Answer: FALSE
Diff: 1
Page Ref: 202
8) Converting continuous valued numerical variables to ranges and categories is referred to as
discretization.
Answer: TRUE
Diff: 2
Page Ref: 202
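Discretization can be sketched in a few lines of plain Python (the bin boundaries and labels here are hypothetical), mapping each continuous value to the labeled range it falls into:

```python
def discretize(value, bins):
    """Map a continuous value to a labeled range (discretization).

    bins is a list of (exclusive_upper_bound, label) pairs in ascending order.
    """
    for upper, label in bins:
        if value < upper:
            return label
    return bins[-1][1]

# Hypothetical age bins
age_bins = [(18, "minor"), (65, "adult"), (float("inf"), "senior")]
labels = [discretize(a, age_bins) for a in [12, 35, 70]]
```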
9) In the Miami-Dade Police Department case study, predictive analytics helped to identify the
best schedule for officers in order to pay the least overtime.
Answer: FALSE
Diff: 1
Page Ref: 190-191
10) In data mining, classification models help in prediction.
Answer: TRUE
Diff: 2
Page Ref: 215
11) Statistics and data mining both look for data sets that are as large as possible.
Answer: FALSE
Diff: 2
Page Ref: 216
12) Using data mining on data about imports and exports can help to detect tax avoidance and
money laundering.
Answer: TRUE
Diff: 1
Page Ref: 206
13) In the cancer research case study, data mining algorithms that predict cancer survivability
with high predictive power are good replacements for medical professionals.
Answer: FALSE
Diff: 2
Page Ref: 209-210
14) During classification in data mining, a false positive is an occurrence classified as true by the
algorithm while being false in reality.
Answer: TRUE
Diff: 2
Page Ref: 216
15) K-fold cross-validation is also called sliding estimation.
Answer: FALSE
Diff: 2
Page Ref: 218
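The mechanics of k-fold cross-validation (as opposed to the leave-one-out or "rotation" variants) can be shown with a short plain-Python sketch that partitions record indices into k folds, each serving once as the test set:

```python
def k_fold_indices(n, k):
    """Partition indices 0..n-1 into k folds; each fold is the test set once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((sorted(train), test))
    return splits

# 5-fold cross-validation over 10 records: 5 train/test splits of 8/2 records
splits = k_fold_indices(10, 5)
```

A model would be trained and scored once per split, and the k accuracy estimates averaged into the reported figure.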
16) When a problem has many attributes that impact the classification of different patterns,
decision trees may be a useful approach.
Answer: TRUE
Diff: 2
Page Ref: 221
17) In the Dell case study, the largest issue was how to properly spend the online marketing
budget.
Answer: FALSE
Diff: 2
Page Ref: 198-199
18) Market basket analysis is a useful and entertaining way to explain data mining to a
technologically less savvy audience, but it has little business significance.
Answer: FALSE
Diff: 2
Page Ref: 227
19) Open-source data mining tools include applications such as IBM SPSS Modeler and Dell
Statistica.
Answer: FALSE
Diff: 1
Page Ref: 231
20) Data that is collected, stored, and analyzed in data mining is often private and personal.
There is no way to maintain individuals' privacy other than being very careful about physical
data security.
Answer: FALSE
Diff: 2
Page Ref: 237
21) In the Influence Health case study, what was the goal of the system?
A) locating clinic patients
B) understanding follow-up care
C) decreasing operational costs
D) increasing service use
Answer: D
Diff: 3
Page Ref: 224
22) Understanding customers better has helped Amazon and others become more successful. The
understanding comes primarily from
A) collecting data about customers and transactions.
B) developing a philosophy that is data analytics-centric.
C) analyzing the vast data amounts routinely collected.
D) asking the customers what they want.
Answer: C
Diff: 3
Page Ref: 193
23) All of the following statements about data mining are true EXCEPT
A) the process aspect means that data mining should be a one-step process to results.
B) the novel aspect means that previously unknown patterns are discovered.
C) the potentially useful aspect means that results should lead to some business benefit.
D) the valid aspect means that the discovered patterns should hold true on new data.
Answer: A
Diff: 3
Page Ref: 196
24) What is the main reason parallel processing is sometimes used for data mining?
A) because the hardware exists in most organizations, and it is available to use
B) because most of the algorithms used for data mining require it
C) because of the massive data amounts and search efforts involved
D) because any strategic application requires parallel processing
Answer: C
Diff: 3
Page Ref: 197
25) The data field "ethnic group" can be best described as
A) nominal data.
B) interval data.
C) ordinal data.
D) ratio data.
Answer: A
Diff: 2
Page Ref: 208
26) A data mining study is specific to addressing a well-defined business task, and different
business tasks require
A) general organizational data.
B) general industry data.
C) general economic data.
D) different sets of data.
Answer: D
Diff: 2
Page Ref: 208
27) Which broad area of data mining applications analyzes data, forming rules to distinguish
between defined classes?
A) associations
B) visualization
C) classification
D) clustering
Answer: C
Diff: 2
Page Ref: 200
28) Which broad area of data mining applications partitions a collection of objects into natural
groupings with similar features?
A) associations
B) visualization
C) classification
D) clustering
Answer: D
Diff: 2
Page Ref: 200
29) Clustering partitions a collection of things into segments whose members share
A) similar characteristics.
B) dissimilar characteristics.
C) similar collection methods.
D) dissimilar collection methods.
Answer: A
Diff: 2
Page Ref: 202
30) Identifying and preventing incorrect claim payments and fraudulent activities falls under
which type of data mining applications?
A) insurance
B) retailing and logistics
C) customer relationship management
D) computer hardware and software
Answer: A
Diff: 2
Page Ref: 204
31) All of the following statements about data mining are true EXCEPT:
A) The term is relatively new.
B) Its techniques have their roots in traditional statistical analysis and artificial intelligence.
C) The ideas behind it are relatively new.
D) Intense, global competition makes its application more important.
Answer: C
Diff: 2
Page Ref: 194
32) Which data mining process/methodology is thought to be the most comprehensive, according
to kdnuggets.com rankings?
A) SEMMA
B) proprietary organizational methodologies
C) KDD Process
D) CRISP-DM
Answer: D
Diff: 2
Page Ref: 214
33) Prediction problems where the variables have numeric values are most accurately defined as
A) classifications.
B) regressions.
C) associations.
D) computations.
Answer: B
Diff: 3
Page Ref: 215
34) What does the robustness of a data mining method refer to?
A) its ability to predict the outcome of a previously unknown data set accurately
B) its speed of computation and computational costs in using the model
C) its ability to construct a prediction model efficiently given a large amount of data
D) its ability to overcome noisy data to make somewhat accurate predictions
Answer: D
Diff: 3
Page Ref: 216
35) What does the scalability of a data mining method refer to?
A) its ability to predict the outcome of a previously unknown data set accurately
B) its speed of computation and computational costs in using the model
C) its ability to construct a prediction model efficiently given a large amount of data
D) its ability to overcome noisy data to make somewhat accurate predictions
Answer: C
Diff: 3
Page Ref: 216
36) In estimating the accuracy of data mining (or other) classification models, the true positive
rate is
A) the ratio of correctly classified positives divided by the total positive count.
B) the ratio of correctly classified negatives divided by the total negative count.
C) the ratio of correctly classified positives divided by the sum of correctly classified positives
and incorrectly classified positives.
D) the ratio of correctly classified positives divided by the sum of correctly classified positives
and incorrectly classified negatives.
Answer: A
Diff: 2
Page Ref: 216-217
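The ratios in question 36 can be checked with a short Python sketch. The confusion-matrix counts below are hypothetical, chosen only to illustrate the arithmetic behind each answer choice:

```python
# Hypothetical confusion-matrix counts for a binary classifier.
tp, fn = 80, 20   # actual positives: correctly vs. incorrectly classified
tn, fp = 90, 10   # actual negatives: correctly vs. incorrectly classified

true_positive_rate = tp / (tp + fn)   # choice A: correct positives / all actual positives
true_negative_rate = tn / (tn + fp)   # choice B describes this ratio instead
precision = tp / (tp + fp)            # choice C describes this ratio instead

print(true_positive_rate)  # 0.8
```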
37) In data mining, finding an affinity of two products to be commonly together in a shopping
cart is known as
A) association rule mining.
B) cluster analysis.
C) decision trees.
D) artificial neural networks.
Answer: A
Diff: 2
Page Ref: 227
38) Third-party providers of publicly available data sets protect the anonymity of the individuals
in the data set primarily by
A) asking data users to use the data ethically.
B) leaving in identifiers (e.g., name), but changing other variables.
C) removing identifiers such as names and social security numbers.
D) letting individuals in the data know their data is being accessed.
Answer: C
Diff: 3
Page Ref: 237
39) In the Target case study, why did Target send maternity ads to a teen?
A) Target's analytic model confused her with an older woman with a similar name.
B) Target was sending ads to all women in a particular neighborhood.
C) Target's analytic model suggested she was pregnant based on her buying habits.
D) Target was using a special promotion that targeted all teens in her geographical area.
Answer: C
Diff: 2
Page Ref: 238
40) Which of the following is a data mining myth?
A) Data mining is a multistep process that requires deliberate, proactive design and use.
B) Data mining requires a separate, dedicated database.
C) The current state-of-the-art is ready to go for almost any business.
D) Newer Web-based tools enable managers of all educational levels to do data mining.
Answer: B
Diff: 2
Page Ref: 239-240
41) In the Influence Health case, the company was able to evaluate over ________ million
records in only two days.
Answer: 195
Diff: 3
Page Ref: 225
42) There has been an increase in data mining to deal with global competition and customers'
more sophisticated ________ and wants.
Answer: needs
Diff: 2
Page Ref: 194
43) Knowledge extraction, pattern analysis, data archaeology, information harvesting, pattern
searching, and data dredging are all alternative names for ________.
Answer: data mining
Diff: 1
Page Ref: 196
44) Data are often buried deep within very large ________, which sometimes contain data from
several years.
Answer: databases
Diff: 1
Page Ref: 196
45) ________ was proposed in the mid-1990s by a European consortium of companies to serve
as a nonproprietary standard methodology for data mining.
Answer: CRISP-DM
Diff: 2
Page Ref: 207
46) In the Dell case study, engineers, working closely with marketing, used lean software
development strategies and numerous technologies to create a highly scalable, singular
________.
Answer: data mart
Diff: 2
Page Ref: 199
47) Patterns have been manually ________ from data by humans for centuries, but the increasing
volume of data in modern times has created a need for more automatic approaches.
Answer: extracted
Diff: 2
Page Ref: 200
48) While prediction is largely experience and opinion based, ________ is data and model based.
Answer: forecasting
Diff: 2
Page Ref: 200
49) Whereas ________ starts with a well-defined proposition and hypothesis, data mining starts
with a loosely defined discovery statement.
Answer: statistics
Diff: 2
Page Ref: 203
50) Customer ________ management extends traditional marketing by creating one-on-one
relationships with customers.
Answer: relationship
Diff: 2
Page Ref: 203
51) In the terrorist funding case study, an observed price ________ may be related to income tax
avoidance/evasion, money laundering, or terrorist financing.
Answer: deviation
Diff: 3
Page Ref: 206
52) Data preparation, the third step in the CRISP-DM data mining process, is more commonly
known as ________.
Answer: data preprocessing
Diff: 2
Page Ref: 208
53) The data mining in cancer research case study explains that data mining methods are capable
of extracting patterns and ________ hidden deep in large and complex medical databases.
Answer: relationships
Diff: 3
Page Ref: 209-210
54) Fayyad et al. (1996) defined ________ in databases as a process of using data mining
methods to find useful information and patterns in the data.
Answer: knowledge discovery
Diff: 2
Page Ref: 213
55) In ________, a classification method, the complete data set is randomly split into mutually
exclusive subsets of approximately equal size and tested multiple times on each left-out subset,
using the others as a training set.
Answer: k-fold cross-validation
Diff: 2
Page Ref: 218
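The k-fold procedure described in question 55 can be sketched in a few lines of Python. This is a minimal illustration of the splitting logic only, not a production implementation:

```python
def k_fold_splits(data, k):
    """Split `data` into k mutually exclusive folds of roughly equal size;
    each fold is held out once as the test set, the rest form the training set."""
    folds = [data[i::k] for i in range(k)]          # round-robin assignment
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

records = list(range(10))
for train, test in k_fold_splits(records, k=5):
    assert len(test) == 2 and len(train) == 8       # mutually exclusive, equal size
    assert sorted(train + test) == records          # together they cover all the data
```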
56) The basic idea behind a(n) ________ is that it recursively divides a training set until each
division consists entirely or primarily of examples from one class.
Answer: decision tree
Diff: 3
Page Ref: 221
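The recursive division in question 56 can be illustrated with a toy Python sketch. The split-selection rule here is deliberately naive (features are tried in order rather than ranked by a purity measure such as information gain), and the training data are hypothetical:

```python
def grow(examples, features, depth=0, max_depth=2):
    """Toy recursive partitioning: each call splits the training set on one
    binary feature until a division consists entirely of one class (a pure leaf)."""
    labels = [y for _, y in examples]
    if len(set(labels)) == 1 or not features or depth == max_depth:
        return max(set(labels), key=labels.count)        # majority-class leaf
    f = features[0]                                      # naive: take features in order
    left = [(x, y) for x, y in examples if x[f]]
    right = [(x, y) for x, y in examples if not x[f]]
    if not left or not right:
        return max(set(labels), key=labels.count)
    return {f: {True: grow(left, features[1:], depth + 1, max_depth),
                False: grow(right, features[1:], depth + 1, max_depth)}}

# Hypothetical training set: does a customer respond to a promotion?
data = [({"urban": True,  "young": True},  "yes"),
        ({"urban": True,  "young": False}, "yes"),
        ({"urban": False, "young": True},  "no"),
        ({"urban": False, "young": False}, "no")]
tree = grow(data, ["urban", "young"])   # one split on "urban" yields two pure leaves
```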
57) As described in the Influence Health case study, customers are more often ________ services
from a variety of healthcare service providers before selecting one.
Answer: comparing
Diff: 2
Page Ref: 224
58) Because of its successful application to retail business problems, association rule mining is
commonly called ________.
Answer: market-basket analysis
Diff: 2
Page Ref: 227
59) The ________ is the most commonly used algorithm to discover association rules. Given a
set of itemsets, the algorithm attempts to find subsets that are common to at least a minimum
number of the itemsets.
Answer: Apriori algorithm
Diff: 2
Page Ref: 229
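The level-wise search described in question 59 can be sketched in Python. This is a minimal Apriori-style illustration with a toy basket data set, not an optimized implementation:

```python
def frequent_itemsets(transactions, min_support):
    """Minimal Apriori-style sketch: keep itemsets contained in at least
    `min_support` transactions, growing candidates one item at a time."""
    frequent = {}
    size = 1
    candidates = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    while candidates:
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        survivors = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(survivors)
        size += 1   # next level: unions of survivors that are one item larger
        candidates = [c for c in {a | b for a in survivors for b in survivors}
                      if len(c) == size]
    return frequent

baskets = [frozenset(b) for b in (
    {"milk", "bread"}, {"milk", "bread", "beer"}, {"bread"}, {"milk", "beer"})]
freq = frequent_itemsets(baskets, min_support=2)
# {milk, bread} and {milk, beer} each appear in two baskets; {bread, beer} is pruned.
```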
60) One way to accomplish privacy and protection of individuals' rights when data mining is by
________ of the customer records prior to applying data mining applications, so that the records
cannot be traced to an individual.
Answer: de-identification
Diff: 2
Page Ref: 237
61) List five reasons for the growing popularity of data mining in the business world.
Answer:
• More intense competition at the global scale driven by customers' ever-changing needs and
wants in an increasingly saturated marketplace
• General recognition of the untapped value hidden in large data sources
• Consolidation and integration of database records, which enables a single view of customers,
vendors, transactions, etc.
• Consolidation of databases and other data repositories into a single location in the form of a
data warehouse
• The exponential increase in data processing and storage technologies
• Significant reduction in the cost of hardware and software for data storage and processing
• Movement toward the demassification (conversion of information resources into nonphysical
form) of business practices
Diff: 2
Page Ref: 194
62) List 3 common data mining myths and realities.
Answer:
1) Myth: Data mining provides instant, crystal-ball-like predictions.
Reality: Data mining is a multistep process that requires deliberate, proactive design and use.
2) Myth: Data mining is not yet viable for mainstream business applications.
Reality: The current state of the art is ready to go for almost any business type and/or size.
3) Myth: Data mining requires a separate, dedicated database.
Reality: Because of the advances in database technology, a dedicated database is not required.
4) Myth: Only those with advanced degrees can do data mining.
Reality: Newer Web-based tools enable managers of all educational levels to do data mining.
5) Myth: Data mining is only for large firms that have lots of customer data.
Reality: If the data accurately reflect the business or its customers, any company can use data
mining.
Diff: 2
Page Ref: 239
63) List and briefly describe the six steps of the CRISP-DM data mining process.
Answer:
Step 1: Business Understanding — The key element of any data mining study is to know what
the study is for. Answering such a question begins with a thorough understanding of the
managerial need for new knowledge and an explicit specification of the business objective
regarding the study to be conducted.
Step 2: Data Understanding — A data mining study is specific to addressing a well-defined
business task, and different business tasks require different sets of data. Following the business
understanding, the main activity of the data mining process is to identify the relevant data from
many available databases.
Step 3: Data Preparation — The purpose of data preparation (or more commonly called data
preprocessing) is to take the data identified in the previous step and prepare it for analysis by
data mining methods. Compared to the other steps in CRISP-DM, data preprocessing consumes
the most time and effort; most believe that this step accounts for roughly 80 percent of the total
time spent on a data mining project.
Step 4: Model Building — Here, various modeling techniques are selected and applied to an
already prepared data set in order to address the specific business need. The model-building step
also encompasses the assessment and comparative analysis of the various models built.
Step 5: Testing and Evaluation — In step 5, the developed models are assessed and evaluated
for their accuracy and generality. This step assesses the degree to which the selected model (or
models) meets the business objectives and, if so, to what extent (i.e., do more models need to be
developed and assessed).
Step 6: Deployment — Depending on the requirements, the deployment phase can be as simple
as generating a report or as complex as implementing a repeatable data mining process across the
enterprise. In many cases, it is the customer, not the data analyst, who carries out the deployment
steps.
Diff: 2
Page Ref: 207-212
64) Describe the role of the simple split in estimating the accuracy of classification models.
Answer: The simple split (or holdout or test sample estimation) partitions the data into two
mutually exclusive subsets called a training set and a test set (or holdout set). It is common to
designate two-thirds of the data as the training set and the remaining one-third as the test set. The
training set is used by the inducer (model builder), and the built classifier is then tested on the
test set. An exception to this rule occurs when the classifier is an artificial neural network. In this
case, the data is partitioned into three mutually exclusive subsets: training, validation, and
testing.
Diff: 2
Page Ref: 217
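The holdout procedure described in question 64 can be sketched as follows (an illustrative sketch; the seed and data are arbitrary):

```python
import random

def simple_split(data, train_fraction=2/3, seed=7):
    """Holdout (simple split) sketch: shuffle, then partition into mutually
    exclusive training and test sets -- commonly two-thirds vs. one-third."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

records = list(range(30))
train, test = simple_split(records)
assert len(train) == 20 and len(test) == 10   # 2/3 training, 1/3 test
assert sorted(train + test) == records        # mutually exclusive partition
```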
65) Briefly describe five techniques (or algorithms) that are used for classification modeling.
Answer:
• Decision tree analysis. Decision tree analysis (a machine-learning technique) is arguably the
most popular classification technique in the data mining arena.
• Statistical analysis. Statistical techniques were the primary classification algorithm for many
years until the emergence of machine-learning techniques. Statistical classification techniques
include logistic regression and discriminant analysis.
• Neural networks. These are among the most popular machine-learning techniques that can be
used for classification-type problems.
• Case-based reasoning. This approach uses historical cases to recognize commonalities in
order to assign a new case into the most probable category.
• Bayesian classifiers. This approach uses probability theory to build classification models
based on the past occurrences that are capable of placing a new instance into a most probable
class (or category).
• Genetic algorithms. This approach uses the analogy of natural evolution to build directed-search-based mechanisms to classify data samples.
• Rough sets. This method takes into account the partial membership of class labels to
predefined categories in building models (collection of rules) for classification problems.
Diff: 2
Page Ref: 219-220
66) Describe cluster analysis and some of its applications.
Answer: Cluster analysis is an exploratory data analysis tool for solving classification problems.
The objective is to sort cases (e.g., people, things, events) into groups, or clusters, so that the
degree of association is strong among members of the same cluster and weak among members of
different clusters. Cluster analysis is an essential data mining method for classifying items,
events, or concepts into common groupings called clusters. The method is commonly used in
biology, medicine, genetics, social network analysis, anthropology, archaeology, astronomy,
character recognition, and even in MIS development. As data mining has increased in popularity,
the underlying techniques have been applied to business, especially to marketing. Cluster
analysis has been used extensively for fraud detection (both credit card and e-commerce fraud)
and market segmentation of customers in contemporary CRM systems.
Diff: 2
Page Ref: 225-226
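The partitioning idea behind cluster analysis (question 66) can be sketched with a minimal one-dimensional k-means loop. The data points are a hypothetical toy example with two obvious natural groupings:

```python
def k_means_1d(points, k, iterations=20):
    """Minimal 1-D k-means sketch: assign each point to its nearest centroid,
    then recompute each centroid as the mean of its cluster; repeat."""
    centroids = sorted(points)[::max(1, len(points) // k)][:k]   # spread out seeds
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Toy data with two natural groupings, around 1 and around 10.
data = [1.0, 1.2, 0.8, 9.8, 10.1, 10.4]
centroids, clusters = k_means_1d(data, k=2)
```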
67) In the data mining in Hollywood case study, how successful were the models in predicting
the success or failure of a Hollywood movie?
Answer: The researchers claim that these prediction results are better than any reported in the
published literature for this problem domain. Fusion classification methods attained up to
56.07% accuracy in correctly classifying movies and 90.75% accuracy in classifying movies
within one category of their actual category. The SVM classification method attained up to
55.49% accuracy in correctly classifying movies and 85.55% accuracy in classifying movies
within one category of their actual category.
Diff: 3
Page Ref: 233
68) In lessons learned from the Target case, what legal warnings would you give another retailer
using data mining for marketing?
Answer: If you look at this practice from a legal perspective, you would conclude that Target
did not use any information that violates customer privacy; rather, they used transactional data
that most every other retail chain is collecting and storing (and perhaps analyzing) about their
customers. What was disturbing in this scenario was perhaps the targeted concept: pregnancy.
There are certain events or concepts that should be off limits or treated extremely cautiously,
such as terminal disease, divorce, and bankruptcy.
Diff: 2
Page Ref: 238
69) List four myths associated with data mining.
Answer:
• Data mining provides instant, crystal-ball-like predictions.
• Data mining is not yet viable for business applications.
• Data mining requires a separate, dedicated database.
• Only those with advanced degrees can do data mining.
• Data mining is only for large firms that have lots of customer data.
Diff: 2
Page Ref: 239
70) List six common data mining mistakes.
Answer:
• Selecting the wrong problem for data mining
• Ignoring what your sponsor thinks data mining is and what it really can and cannot do
• Leaving insufficient time for data preparation
• Looking only at aggregated results and not at individual records
• Being sloppy about keeping track of the data mining procedure and results
• Ignoring suspicious findings and quickly moving on
• Running mining algorithms repeatedly and blindly
• Believing everything you are told about the data
• Believing everything you are told about your own data mining analysis
• Measuring your results differently from the way your sponsor measures them
Diff: 2
Page Ref: 239-240
Business Intelligence, 4e (Sharda/Delen/Turban)
Chapter 5 Predictive Analytics II: Text, Web, and Social Media Analytics
1) Text analytics is the subset of text mining that handles information retrieval and extraction,
plus data mining.
Answer: FALSE
Diff: 2
Page Ref: 251
2) Categorization and clustering of documents during text mining differ only in the preselection
of categories.
Answer: TRUE
Diff: 2
Page Ref: 252
3) Articles and auxiliary verbs are assigned little value in text mining and are usually filtered out.
Answer: TRUE
Diff: 2
Page Ref: 253
4) In the car insurance case study, text mining was used to identify auto features that caused
injuries.
Answer: FALSE
Diff: 2
Page Ref: 254-255
5) Regional accents present challenges for natural language processing.
Answer: TRUE
Diff: 2
Page Ref: 256
6) In the Tito's Vodka case study, trends in cocktails were studied to create a quarterly recipe for
customers.
Answer: TRUE
Diff: 2
Page Ref: 306
7) In the Wimbledon case study, designers balanced the needs of mobile and desktop computer
users.
Answer: TRUE
Diff: 2
Page Ref: 278
8) In text mining, if an association between two concepts has 7% support, it means that both
concepts appear together in 7% of the documents.
Answer: TRUE
Diff: 2
Page Ref: 272
9) In sentiment analysis, sentiment suggests a transient, temporary opinion reflective of one's
feelings.
Answer: FALSE
Diff: 2
Page Ref: 276
10) Current use of sentiment analysis in voice of the customer applications allows companies to
change their products or services in real time in response to customer sentiment.
Answer: TRUE
Diff: 2
Page Ref: 276
11) In sentiment analysis, it is hard to classify some subjects such as news as good or bad, but
easier to classify others, e.g., movie reviews, in the same way.
Answer: TRUE
Diff: 2
Page Ref: 276
12) Search engines are only used in the context of the World Wide Web (WWW).
Answer: FALSE
Diff: 2
Page Ref: 291
13) Search engine optimization (SEO) techniques play a minor role in a Web site's search
ranking because only well-written content matters.
Answer: FALSE
Diff: 2
Page Ref: 294-295
14) Clickstream analysis does not need users to enter their perceptions of the Web site or other
feedback directly to be useful in determining their preferences.
Answer: TRUE
Diff: 2
Page Ref: 299
15) Since little can be done about visitor Web site abandonment rates, organizations have to
focus their efforts on increasing the number of new visitors.
Answer: FALSE
Diff: 2
Page Ref: 303
16) Web-based media has nearly identical cost and scale structures as traditional media.
Answer: FALSE
Diff: 2
Page Ref: 309
17) Consistent high quality, higher publishing frequency, and longer time lag are all attributes of
industrial publishing when compared to Web publishing.
Answer: FALSE
Diff: 2
Page Ref: 309-310
18) In the evolution of social media user engagement, the largest recent change is the growth of
creators.
Answer: FALSE
Diff: 2
Page Ref: 310-311
19) Descriptive analytics for social media feature such items as your followers as well as the
content in online conversations that help you to identify themes and sentiments.
Answer: FALSE
Diff: 2
Page Ref: 311
20) Companies understand that when their product goes "viral," the content of the online
conversations about their product does not matter, only the volume of conversations.
Answer: FALSE
Diff: 3
Page Ref: 312
21) In the opening vignette, the architectural system that supported Watson used all the
following elements EXCEPT
A) massive parallelism to enable simultaneous consideration of multiple hypotheses.
B) an underlying confidence subsystem that ranks and integrates answers.
C) a core engine that could operate seamlessly in another domain without changes.
D) integration of shallow and deep knowledge.
Answer: C
Diff: 3
Page Ref: 248-250
22) In text mining, tokenizing is the process of
A) categorizing a block of text in a sentence.
B) reducing multiple words to their base or root.
C) transforming the term-by-document matrix to a manageable size.
D) creating new branches or stems of recorded paragraphs.
Answer: A
Diff: 2
Page Ref: 253
23) All of the following are challenges associated with natural language processing EXCEPT
A) dividing up a text into individual words in English.
B) understanding the context in which something is said.
C) distinguishing between words that have more than one meaning.
D) recognizing typographical or grammatical errors in texts.
Answer: A
Diff: 3
Page Ref: 256
24) Natural language processing (NLP) is associated with which of the following areas?
A) text mining
B) artificial intelligence
C) computational linguistics
D) all of these
Answer: D
Diff: 2
Page Ref: 256
25) In the research literature case study, the researchers analyzing academic papers extracted
information from which source?
A) the paper abstract
B) the paper keywords
C) the main body of the paper
D) the paper references
Answer: A
Diff: 1
Page Ref: 273-274
26) In sentiment analysis, which of the following is an implicit opinion?
A) The hotel we stayed in was terrible.
B) The customer service I got for my TV was laughable.
C) The cruise we went on last summer was a disaster.
D) Our new mayor is great for the city.
Answer: B
Diff: 3
Page Ref: 277
27) In the Wimbledon case study, the tournament used data for each match in real time to
highlight
A) winners and losers.
B) player histories.
C) significant events.
D) advertiser content.
Answer: C
Diff: 2
Page Ref: 278-280
28) What do voice of the market (VOM) applications of sentiment analysis do?
A) They examine customer sentiment at the aggregate level.
B) They examine employee sentiment in the organization.
C) They examine the stock market for trends.
D) They examine the "market of ideas" in politics.
Answer: A
Diff: 3
Page Ref: 281
29) Sentiment analysis projects require a lexicon for use. If a project in English is undertaken,
you must generally make sure to
A) use only the single, approved English lexicon.
B) use any general English lexicon.
C) use an English lexicon appropriate to the project at your discretion.
D) create an English lexicon for the project.
Answer: C
Diff: 3
Page Ref: 284-285
30) In text analysis, what is a lexicon?
A) a catalog of words, their synonyms, and their meanings
B) a catalog of customers, their words, and phrases
C) a catalog of letters, words, phrases, and sentences
D) a catalog of customers, products, words, and phrases
Answer: A
Diff: 3
Page Ref: 284
31) What types of documents are BEST suited to semantic labeling and aggregation to determine
sentiment orientation?
A) medium- to large-sized documents
B) small- to medium-sized documents
C) large-sized documents
D) collections of documents
Answer: B
Diff: 3
Page Ref: 286
32) What does Web content mining involve?
A) analyzing the universal resource locator in Web pages
B) analyzing the unstructured content of Web pages
C) analyzing the pattern of visits to a Web site
D) analyzing the PageRank and other metadata of a Web page
Answer: B
Diff: 2
Page Ref: 289
33) Breaking up a Web page into its components to identify worthy words/terms and indexing
them using a set of rules is called
A) preprocessing the documents.
B) document analysis.
C) creating the term-by-document matrix.
D) parsing the documents.
Answer: D
Diff: 3
Page Ref: 293
34) Search engine optimization (SEO) is a means by which
A) Web site developers can negotiate better deals for paid ads.
B) Web site developers can increase Web site search rankings.
C) Web site developers index their Web sites for search engines.
D) Web site developers optimize the artistic features of their Web sites.
Answer: B
Diff: 2
Page Ref: 294-295
35) What are the two main types of Web analytics?
A) old-school and new-school Web analytics
B) Bing and Google Web analytics
C) off-site and on-site Web analytics
D) data-based and subjective Web analytics
Answer: C
Diff: 3
Page Ref: 299
36) Web site usability may be rated poor if
A) the average number of page views on your Web site is large.
B) the time spent on your Web site is long.
C) Web site visitors download few of your offered PDFs and videos.
D) users fail to click on all pages equally.
Answer: C
Diff: 2
Page Ref: 300
37) Understanding which keywords your users enter to reach your Web site through a search
engine can help you understand
A) the hardware your Web site is running on.
B) the type of Web browser being used by your Web site visitors.
C) most of your Web site visitors' wants and needs.
D) how well visitors understand your products.
Answer: D
Diff: 3
Page Ref: 301
38) Which of the following statements about Web site conversion statistics is FALSE?
A) Web site visitors can be classed as either new or returning.
B) Visitors who begin a purchase on most Web sites must complete it.
C) The conversion rate is the number of people who take action divided by the number of
visitors.
D) Analyzing exit rates can tell you why visitors left your Web site.
Answer: B
Diff: 3
Page Ref: 302
39) What is one major way in which Web-based social media differs from traditional publishing
media?
A) Most Web-based media are operated by the government and large firms.
B) They use different languages of publication.
C) They have different costs to own and operate.
D) Web-based media have a narrower range of quality.
Answer: C
Diff: 3
Page Ref: 310
40) What does advanced analytics for social media do?
A) It helps identify your followers.
B) It identifies links between groups.
C) It examines the content of online conversations.
D) It identifies the biggest sources of influence online.
Answer: C
Diff: 2
Page Ref: 311
41) IBM's Watson utilizes a massively parallel, text mining–focused, probabilistic evidence-based computational architecture called ________.
Answer: DeepQA
Diff: 2
Page Ref: 248
42) ________, also called homonyms, are syntactically identical words with different meanings.
Answer: Polysemes
Diff: 2
Page Ref: 253
43) When a word has more than one meaning, selecting the meaning that makes the most sense
can only be accomplished by taking into account the context within which the word is used. This
concept is known as ________.
Answer: word sense disambiguation
Diff: 3
Page Ref: 256
44) ________ is a technique used to detect favorable and unfavorable opinions toward specific
products and services using large numbers of textual data sources.
Answer: Sentiment analysis
Diff: 2
Page Ref: 257
45) In the Mining for Lies case study, a text-based deception-detection method used by Fuller
and others in 2008 was based on a process known as ________, which relies on elements of data
and text mining techniques.
Answer: message feature mining
Diff: 2
Page Ref: 262-263
46) At a very high level, the text mining process can be broken down into three consecutive
tasks, the first of which is to establish the ________.
Answer: corpus
Diff: 2
Page Ref: 269
47) Because the term document matrix is often very large and rather sparse, an important
optimization step is to reduce the ________ of the matrix.
Answer: dimensionality
Diff: 2
Page Ref: 270
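The term-by-document matrix in questions 46 and 47 can be sketched in Python. These are toy documents; a real corpus produces a far larger and sparser matrix, which is why its dimensionality is reduced (e.g., via singular value decomposition):

```python
from collections import Counter

docs = ["text mining extracts knowledge from text",
        "web mining analyzes web content"]
stop_terms = {"from"}   # articles, auxiliary verbs, etc. carry little value

# One term-frequency Counter per document (a sparse row of the matrix).
rows = [Counter(t for t in d.lower().split() if t not in stop_terms) for d in docs]

vocabulary = sorted(set().union(*rows))                       # matrix columns
matrix = [[row[term] for term in vocabulary] for row in rows]  # dense form
```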
48) ________ is mostly driven by sentiment analysis and is a key element of customer
experience management initiatives, where the goal is to create an intimate relationship with the
customer.
Answer: Voice of the customer (VOC)
Diff: 2
Page Ref: 280
49) When viewed as a binary feature, ________ classification is the binary classification task of
labeling an opinionated document as expressing either an overall positive or an overall negative
opinion.
Answer: polarity
Diff: 2
Page Ref: 282
50) Web pages contain both unstructured information and ________, which are connections to
other Web pages.
Answer: hyperlinks
Diff: 1
Page Ref: 290
51) Web ________ are used to automatically read through the contents of Web sites.
Answer: crawlers/spiders
Diff: 1
Page Ref: 289
52) A(n) ________ is one or more Web pages that provide a collection of links to authoritative
Web pages.
Answer: hub
Diff: 1
Page Ref: 290
53) A(n) ________ engine is a software program that searches for Web sites or files based on
keywords.
Answer: search
Diff: 1
Page Ref: 291
54) In the Lotte.com retail case, the company deployed SAS for Customer Experience Analytics
to better understand the quality of customer traffic on their Web site, classify order rates, and see
which ________ had the most visitors.
Answer: channels
Diff: 2
Page Ref: 297
55) ________ Web analytics refers to measurement and analysis of data relating to your
company that takes place outside your Web site.
Answer: Off-site
Diff: 1
Page Ref: 299
56) A(n) ________ Web site contains links that send traffic directly to your Web site.
Answer: referral
Diff: 2
Page Ref: 301
57) ________ statistics help you understand whether your specific marketing objective for a
Web page is being achieved.
Answer: Conversion
Diff: 1
Page Ref: 302
58) In the Tito's Vodka case, it was important that social media users all had a(n) ________
brand experience.
Answer: consistent
Diff: 2
Page Ref: 306
59) ________ is a connections metric for social networks that measures the ties that actors in a
network have with others that are geographically close.
Answer: Propinquity
Diff: 1
Page Ref: 308
60) ________ is a segmentation metric for social networks that measures the strength of the
bonds between actors in a social network.
Answer: Cohesion
Diff: 1
Page Ref: 309
61) How would you describe information extraction in text mining?
Answer: Information extraction is the identification of key phrases and relationships within text
by looking for predefined objects and sequences in text by way of pattern matching.
Diff: 2
Page Ref: 252
62) Natural language processing (NLP), a subfield of artificial intelligence and computational
linguistics, is an important component of text mining. What is the definition of NLP?
Answer: NLP is a discipline that studies the problem of "understanding" the natural human
language, with the view of converting depictions of human language into more formal
representations in the form of numeric and symbolic data that are easier for computer programs
to manipulate.
Diff: 2
Page Ref: 256
63) In the security domain, one of the largest and most prominent text mining applications is the
highly classified ECHELON surveillance system. What is ECHELON assumed to be capable of
doing?
Answer: Identifying the content of telephone calls, faxes, e-mails, and other types of data and
intercepting information sent via satellites, public switched telephone networks, and microwave
links.
Diff: 2
Page Ref: 261-262
64) Describe the query-specific clustering method as it relates to clustering.
Answer: This method employs a hierarchical clustering approach where the most relevant
documents to the posed query appear in small tight clusters that are nested in larger clusters
containing less similar documents, creating a spectrum of relevance levels among the documents.
Diff: 3
Page Ref: 272
65) Identify, with a brief description, each of the four steps in the sentiment analysis process.
Answer:
1. Sentiment Detection: Here the goal is to differentiate between a fact and an opinion, which
may be viewed as classification of text as objective or subjective.
2. N-P Polarity Classification: Given an opinionated piece of text, the goal is to classify the
opinion as falling under one of two opposing sentiment polarities, or locate its position on the
continuum between these two polarities.
3. Target Identification: The goal of this step is to accurately identify the target of the
expressed sentiment.
4. Collection and Aggregation: In this step all text data points in the document are aggregated
and converted to a single sentiment measure for the whole document.
Diff: 2
Page Ref: 282-284
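The four steps above can be sketched in miniature. The following Python fragment is a hedged illustration, not the book's method: it uses a tiny hypothetical sentiment lexicon to detect subjective sentences (step 1), assign N-P polarity (step 2), and aggregate sentence scores into one document measure (step 4). Target identification (step 3) is omitted for brevity.

```python
# Minimal sentiment-analysis sketch with a tiny hypothetical lexicon.
POSITIVE = {"great", "good", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}

def sentence_polarity(sentence):
    """Return +1, -1, or None (no sentiment detected, treated as objective)."""
    words = sentence.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score == 0:
        return None          # sentiment detection: classified as objective
    return 1 if score > 0 else -1

def document_sentiment(sentences):
    """Aggregate sentence polarities into a single measure in [-1, 1]."""
    polarities = [p for p in map(sentence_polarity, sentences) if p is not None]
    return sum(polarities) / len(polarities) if polarities else 0.0

doc = ["The battery life is great", "The screen is terrible", "It ships in a box"]
print(document_sentiment(doc))  # one positive, one negative -> 0.0
```

Real systems replace the lexicon lookup with trained classifiers, but the pipeline shape (detect, classify polarity, aggregate) is the same.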
66) In what ways does the Web pose great challenges for effective and efficient knowledge
discovery through data mining?
Answer:
• The Web is too big for effective data mining. The Web is so large and growing so rapidly
that it is difficult to even quantify its size. Because of the sheer size of the Web, it is not feasible
to set up a data warehouse to replicate, store, and integrate all of the data on the Web, making
data collection and integration a challenge.
• The Web is too complex. The complexity of a Web page is far greater than a page in a
traditional text document collection. Web pages lack a unified structure. They contain far more
authoring style and content variation than any set of books, articles, or other traditional text-based documents.
• The Web is too dynamic. The Web is a highly dynamic information source. Not only does
the Web grow rapidly, but its content is constantly being updated. Blogs, news stories, stock
market results, weather reports, sports scores, prices, company advertisements, and numerous
other types of information are updated regularly on the Web.
• The Web is not specific to a domain. The Web serves a broad diversity of communities and
connects billions of workstations. Web users have very different backgrounds, interests, and
usage purposes. Most users may not have good knowledge of the structure of the information
network and may not be aware of the heavy cost of a particular search that they perform.
• The Web has everything. Only a small portion of the information on the Web is truly
relevant or useful to someone (or some task). Finding the portion of the Web that is truly relevant
to a person and the task being performed is a prominent issue in Web-related research.
Diff: 2
Page Ref: 287-288
67) What is search engine optimization (SEO) and why is it important for organizations that own
Web sites?
Answer: Search engine optimization (SEO) is the intentional activity of affecting the visibility
of an e-commerce site or a Web site in a search engine's natural (unpaid or organic) search
results. In general, the higher a site ranks on the search results page, and the more frequently
it appears in the search results list, the more visitors it will receive from the search engine's users.
Being indexed by search engines like Google, Bing, and Yahoo! is not good enough for
businesses. Getting ranked on the most widely used search engines and getting ranked higher
than your competitors are what make the difference.
Diff: 3
Page Ref: 294-295
68) What is the difference between white hat and black hat SEO activities?
Answer: An SEO technique is considered white hat if it conforms to the search engines'
guidelines and involves no deception. Because search engine guidelines are not written as a
series of rules or commandments, this is an important distinction to note. White-hat SEO is not
just about following guidelines, but about ensuring that the content a search engine indexes and
subsequently ranks is the same content a user will see.
Black-hat SEO attempts to improve rankings in ways that are disapproved of by the search
engines, or that involve deception, trying to divert search engine algorithms from their intended
purpose.
Diff: 3
Page Ref: 295
69) Why are the users' page views and time spent on your Web site important metrics?
Answer: If people come to your Web site and don't view many pages, that is undesirable and
your Web site may have issues with its design or structure. Another explanation for low page
views is a disconnect between the marketing messages that brought visitors to the site and the
content that is actually available.
Generally, the longer a person spends on your Web site, the better it is. That could mean they're
carefully reviewing your content, utilizing interactive components you have available, and
building toward an informed decision to buy, respond, or take the next step you've provided. At
the same time, the time on site needs to be examined against the number of pages viewed to
make sure the visitor isn't spending his or her time trying to locate content that should be more
readily accessible.
Diff: 3
Page Ref: 300
70) What are the three categories of social media analytics technologies and what do they do?
Answer:
• Descriptive analytics: Uses simple statistics to identify activity characteristics and trends,
such as how many followers you have, how many reviews were generated on Facebook, and
which channels are being used most often.
• Social network analysis: Follows the links between friends, fans, and followers to identify
connections of influence as well as the biggest sources of influence.
• Advanced analytics: Includes predictive analytics and text analytics that examine the content
in online conversations to identify themes, sentiments, and connections that would not be
revealed by casual surveillance.
Diff: 2
Page Ref: 311
Business Intelligence, 4e (Sharda/Delen/Turban)
Chapter 6 Prescriptive Analytics: Optimization and Simulation
1) In the School District of Philadelphia case, Excel and an add-in were used to evaluate different
vendor options.
Answer: TRUE
Diff: 2
Page Ref: 321
2) Modeling is a key element for prescriptive analytics.
Answer: TRUE
Diff: 1
Page Ref: 322
3) Business analysis is the monitoring, scanning, and interpretation of collected environmental
information.
Answer: FALSE
Diff: 2
Page Ref: 324
4) Online commerce and communication have created an immense need for forecasting and an
abundance of available information for performing it.
Answer: TRUE
Diff: 2
Page Ref: 324
5) All quantitative models are typically made up of six basic components.
Answer: FALSE
Diff: 2
Page Ref: 328
6) Result variables are considered independent variables.
Answer: FALSE
Diff: 2
Page Ref: 328
7) In decision making under uncertainty, it is assumed that complete knowledge is available.
Answer: FALSE
Diff: 2
Page Ref: 330
8) A decision made under risk is also known as a probabilistic or stochastic decision-making
situation.
Answer: TRUE
Diff: 2
Page Ref: 331
9) Spreadsheets include all possible tools needed to deploy a custom DSS.
Answer: FALSE
Diff: 2
Page Ref: 332
10) Spreadsheets are clearly the most popular developer modeling tool.
Answer: FALSE
Diff: 2
Page Ref: 333
11) Every LP model has some internal intermediate variables that are not explicitly stated.
Answer: TRUE
Diff: 2
Page Ref: 340
12) A model builder makes predictions and assumptions regarding input data, many of which
deal with the assessment of certain futures.
Answer: FALSE
Diff: 2
Page Ref: 347
13) Many quantitative models of decision theory are based on comparing a single measure of
effectiveness, generally some form of utility to the decision maker.
Answer: TRUE
Diff: 2
Page Ref: 346
14) A decision table shows the relationships of the problem graphically and can handle complex
situations in a compact form.
Answer: FALSE
Diff: 2
Page Ref: 350-351
15) Decision situations that involve a finite and usually not too large number of alternatives are
modeled through an approach called decision analysis.
Answer: TRUE
Diff: 2
Page Ref: 349
16) The pessimistic approach assumes that the worst possible outcome for each alternative will
occur and selects the best of these.
Answer: TRUE
Diff: 2
Page Ref: 350-351
17) Simulation is the appearance of reality.
Answer: TRUE
Diff: 1
Page Ref: 352
18) Simulation is normally used only when a problem is too complex to be treated using
numerical optimization techniques.
Answer: TRUE
Diff: 2
Page Ref: 352
19) Simulations are an experimental, expensive, error-prone method for gaining insight into
complex decision-making situations.
Answer: FALSE
Diff: 2
Page Ref: 359
20) VIS uses animated computer graphic displays to present the impact of different managerial
decisions.
Answer: TRUE
Diff: 2
Page Ref: 360
21) A more general form of an influence diagram is called a(n)
A) forecast.
B) environmental scan.
C) cognitive map.
D) static model.
Answer: C
Diff: 2
Page Ref: 324
22) A(n) ________ is a graphical representation of a model.
A) multidimensional analysis
B) influence diagram
C) OLAP model
D) Whisker plot
Answer: B
Diff: 2
Page Ref: 327
23) Which of the following is NOT a component of a quantitative model?
A) result variables
B) decision variables
C) classes
D) parameters
Answer: C
Diff: 2
Page Ref: 328
24) Intermediate result variables reflect intermediate outcomes in
A) mathematical models.
B) flowcharts.
C) decision trees.
D) ROI calculations.
Answer: A
Diff: 2
Page Ref: 329
25) When the decision maker must consider several possible outcomes for each alternative, each
with a given probability of occurrence, this is decision making under
A) certainty.
B) uncertainty.
C) risk.
D) duress.
Answer: C
Diff: 2
Page Ref: 331
26) When the decision maker knows exactly what the outcome of each course of action will be,
this is decision making under
A) certainty.
B) uncertainty.
C) risk.
D) duress.
Answer: A
Diff: 2
Page Ref: 330
27) A(n) ________ spreadsheet model represents behavior over time.
A) static
B) dynamic
C) looped
D) add-in
Answer: B
Diff: 2
Page Ref: 336
28) Important spreadsheet features for modeling include all of the following EXCEPT
A) what-if analysis.
B) goal seeking.
C) macros.
D) pivot tables.
Answer: D
Diff: 2
Page Ref: 335
29) Which of the following is NOT a characteristic displayed by a LP allocation problem?
A) A limited quantity of economic resources is available for allocation.
B) The resources are used in the production of products or services.
C) There are two or more ways in which the resources can be used.
D) The problem is not bound by constraints.
Answer: D
Diff: 2
Page Ref: 338
30) Which of the following is NOT a characteristic displayed by a LP allocation problem?
A) Each activity in which the resources are used yields a return in terms of the stated goal.
B) The resources are used in the production of products or services.
C) There is a single way in which the resources can be used.
D) The allocation is usually restricted by several limitations and requirements.
Answer: C
Diff: 2
Page Ref: 338
31) Which of the following is NOT an assumption used by a LP allocation problem?
A) Returns from different allocations can be compared.
B) The return from any allocation is independent of other allocations.
C) The total return is the sum of the returns yielded by the different activities.
D) All data are unknown with decision making under uncertainty.
Answer: D
Diff: 2
Page Ref: 338
32) Which of the following is NOT an assumption used by a LP allocation problem?
A) The resources are to be used in the most economical manner.
B) The return from any allocation is independent of other allocations.
C) Total returns cannot be compared.
D) All data are known with certainty.
Answer: C
Diff: 2
Page Ref: 338
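The LP allocation characteristics in the questions above can be made concrete with a toy instance (all numbers hypothetical): a limited quantity of two resources, two products competing for them, and a return to maximize under the constraints. Because an LP optimum lies at a vertex of the feasible region, a tiny two-variable problem can be solved by evaluating candidate corner points directly:

```python
# Tiny LP allocation sketch: allocate machine hours (6x + 4y <= 24) and
# labor hours (x + 2y <= 6) to products x and y to maximize return 5x + 4y.
def feasible(x, y):
    return x >= 0 and y >= 0 and 6*x + 4*y <= 24 and x + 2*y <= 6

# Candidate vertices: axis intercepts and the intersection of the constraints.
corners = [(0, 0), (4, 0), (0, 3), (3, 1.5)]
best = max((p for p in corners if feasible(*p)), key=lambda p: 5*p[0] + 4*p[1])
print(best, 5*best[0] + 4*best[1])  # (3, 1.5) with return 21.0
```

Real LP solvers (e.g., the simplex method) search the vertices systematically rather than enumerating them, but the logic is the same.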
33) This method calculates the values of the inputs necessary to achieve a desired level of an
output.
A) goal seek
B) what-if
C) sensitivity
D) LP
Answer: A
Diff: 2
Page Ref: 348
34) This method calculates the values of the inputs necessary to generate a zero profit outcome.
A) goal seek
B) what-if
C) sensitivity
D) break-even
Answer: A
Diff: 2
Page Ref: 349
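Goal seeking (questions 33-34) can be illustrated with a minimal sketch: a search for the input value (the "changing cell") that drives an output (the "target cell") to a desired value, with break-even analysis as the special case of a zero-profit target. The cost and price figures below are hypothetical.

```python
# Goal seek via bisection: find the quantity at which profit hits a target.
def profit(quantity, price=8.0, unit_cost=5.0, fixed_cost=600.0):
    return price * quantity - (fixed_cost + unit_cost * quantity)

def goal_seek(f, target, lo, hi, tol=1e-6):
    """Bisection: assumes f is monotone on [lo, hi] and crosses the target."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Break-even analysis = goal seeking with a target profit of zero.
break_even = goal_seek(profit, target=0.0, lo=0.0, hi=10_000.0)
print(round(break_even))  # 200 units
```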
35) The most common method for solving a risk analysis problem is to select the alternative with
the
A) smallest expected value.
B) greatest expected value.
C) mean expected value.
D) median expected value.
Answer: B
Diff: 2
Page Ref: 351
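The expected-value rule in question 35 can be sketched with a hypothetical payoff table: weight each alternative's payoffs by the assumed known probabilities of the states of nature, then choose the alternative with the greatest expected value.

```python
# Risk analysis on a hypothetical payoff table (rows: alternatives,
# columns: states of nature with assumed known probabilities).
probabilities = [0.5, 0.3, 0.2]
payoffs = {
    "bonds":  [12, 6, 3],
    "stocks": [15, 3, -2],
    "CDs":    [6.5, 6.5, 6.5],
}

def expected_value(row):
    return sum(p * v for p, v in zip(probabilities, row))

best = max(payoffs, key=lambda a: expected_value(payoffs[a]))
print(best, expected_value(payoffs[best]))  # bonds, EV = 8.4
```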
36) A decision tree can be cumbersome if there are
A) uncertain results.
B) few alternatives.
C) many alternatives.
D) pre-existing decision tables.
Answer: C
Diff: 2
Page Ref: 351
37) Which of the following is NOT a disadvantage of a simulation?
A) An optimal solution cannot be guaranteed, but relatively good ones are generally found.
B) Simulation software sometimes requires special skills because of the complexity of the formal
solution method.
C) Simulation is often the only DSS modeling method that can readily handle relatively
unstructured problems.
D) Simulation model construction can be a slow and costly process, although newer modeling
systems are easier to use than ever.
Answer: C
Diff: 2
Page Ref: 355
38) Which of the following is the order of simulation methodology?
A) Define the problem, Construct the simulation model, Test and validate the model, Design the
experiment, Conduct the experiment, Implement the results, Evaluate the results.
B) Construct the simulation model, Test and validate the model, Define the problem, Design the
experiment, Conduct the experiment, Evaluate the results, Implement the results.
C) Define the problem, Construct the simulation model, Test and validate the model, Evaluate
the results, Implement the results, Design the experiment, Conduct the experiment.
D) Define the problem, Construct the simulation model, Test and validate the model, Design the
experiment, Conduct the experiment, Evaluate the results, Implement the results.
Answer: D
Diff: 2
Page Ref: 355-356
39) What type of VIM models display a visual image of the result of one decision alternative at a
time?
A) static
B) dynamic
C) DSS
D) VIS
Answer: A
Diff: 2
Page Ref: 360
40) If a simulation result does NOT match the intuition or judgment of the decision maker, what
can occur?
A) read/write error
B) visual distortion
C) project failure
D) confidence gap
Answer: D
Diff: 2
Page Ref: 359
41) A(n) ________ model can be constructed under assumed environments of certainty.
Answer: dynamic
Diff: 2
Page Ref: 324
42) Selecting the best ________ to work with is a laborious yet important task for companies and
government organizations.
Answer: vendors
Diff: 2
Page Ref: 320
43) Identification of a model's variables (e.g., decision, result, uncontrollable) is critical, as are
the relationships among the ________.
Answer: variables
Diff: 2
Page Ref: 324
44) ________, like data, must be managed to maintain their integrity, and thus their applicability.
Answer: Models
Diff: 2
Page Ref: 326
45) Factors that are not under the control of the decision maker but can be fixed are called
________.
Answer: parameters
Diff: 2
Page Ref: 328
46) The components of a quantitative model are linked by ________ expressions.
Answer: algebraic
Diff: 2
Page Ref: 329
47) A probabilistic decision-making situation is a decision made under ________.
Answer: risk
Diff: 2
Page Ref: 331
48) Risk ________ is a decision-making method that analyzes the risk (based on assumed known
probabilities) associated with different alternatives.
Answer: analysis
Diff: 2
Page Ref: 331
49) Spreadsheets use ________ to extend their functionality.
Answer: add-ins
Diff: 2
Page Ref: 333
50) ________ is performed by indicating a target cell, its desired value, and a changing cell.
Answer: Goal seeking
Diff: 2
Page Ref: 335
51) Of the available solutions, at least one is the best, in the sense that the degree of goal
attainment associated with it is the highest; this is called a(n) ________ solution.
Answer: optimal
Diff: 2
Page Ref: 338
52) Every LP model is composed of ________ variables whose values are unknown and are
searched for.
Answer: decision
Diff: 2
Page Ref: 338
53) ________ analysis attempts to assess the impact of a change in the input data or parameters
on the proposed solution.
Answer: Sensitivity
Diff: 2
Page Ref: 347
54) ________ analysis is structured as "What will happen to the solution if an input variable, an
assumption, or a parameter value is changed?"
Answer: What-if
Diff: 2
Page Ref: 348
55) The ________ approach assumes that the best possible outcome of each alternative will
occur and then selects the best of the best.
Answer: optimistic
Diff: 2
Page Ref: 350
56) Multiple goals is a decision situation in which alternatives are evaluated with several,
sometimes ________, goals.
Answer: conflicting
Diff: 2
Page Ref: 352
57) In ________ simulation, one or more of the independent variables (e.g., the demand in an
inventory problem) are subject to chance variation.
Answer: probabilistic
Diff: 3
Page Ref: 356
58) The most common simulation method for business decision problems is the ________
simulation.
Answer: Monte Carlo
Diff: 3
Page Ref: 357
59) The ________ approach can be used in conjunction with artificial intelligence.
Answer: VIM
Diff: 3
Page Ref: 360
60) Conventional ________ generally reports statistical results at the end of a set of experiments.
Answer: simulation
Diff: 3
Page Ref: 359
61) Why is there a trend to developing and using cloud-based tools for modeling?
Answer: This trend exists because it simplifies the process for users. These systems give them
access to powerful tools and pre-existing models that they can use to solve business problems.
Because these systems are cloud-based, users are spared the costs of operating and maintaining
them.
Diff: 2
Page Ref: 327
62) List and briefly discuss the major components of a quantitative model.
Answer: These components include:
1. Result (outcome) variables reflect the level of effectiveness of a system; that is, they indicate
how well the system performs or attains its goal(s).
2. Decision variables describe alternative courses of action. The decision maker controls the
decision variables.
3. Uncontrollable Variables - in any decision-making situation, there are factors that affect the
result variables but are not under the control of the decision maker
4. Intermediate result variables reflect intermediate outcomes in mathematical models.
Diff: 2
Page Ref: 328-329
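A one-function sketch can tie the four component types together (figures hypothetical): decision variables (price, units) and an uncontrollable variable (the tax rate, a parameter outside the decision maker's control) are linked by algebraic expressions through an intermediate result variable (gross profit) to the result variable (net profit).

```python
# Sketch of the components of a quantitative model, linked algebraically.
def net_profit(price, units, unit_cost=4.0, tax_rate=0.25):
    # price, units: decision variables; unit_cost, tax_rate: uncontrollable
    gross = (price - unit_cost) * units   # intermediate result variable
    return gross * (1 - tax_rate)         # result variable

print(net_profit(price=10.0, units=100))  # (10-4)*100*0.75 = 450.0
```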
63) Why do many believe that making decisions under uncertainty is more difficult than making
decisions under risk?
Answer: This opinion is commonly held because making decisions under uncertainty allows for
an unlimited number of possible outcomes, yet no understanding of the likelihood of those
outcomes. In contrast, decision making under risk involves a known set of possible outcomes,
each with a known probability of occurrence.
Diff: 2
Page Ref: 331
64) Why are spreadsheet applications so commonly used for decision modeling?
Answer: Spreadsheets are often used for this purpose because they are very approachable and
easy to use for end users. Spreadsheets have a shallow learning curve that allows basic functions
to be learned quickly. Additionally, spreadsheets have evolved over time to include a more
robust set of features and functions. These functions can also be augmented through the use of
add-ins, many of which are designed with decision support systems in mind.
Diff: 2
Page Ref: 332
65) How are linear programming models vulnerable when used in complex situations?
Answer: These models can be vulnerable when used in very complex situations for a number of
reasons. One reason focuses on the possibility that not all parameters can be known or
understood. Another concern is that the standard characteristics of a linear programming
calculation may not hold in more dynamic, real-world environments. Additionally, in more
complex environments all actors may not behave in a wholly rational and economic manner.
Diff: 2
Page Ref: 338
66) Provide some examples where a sensitivity analysis may be used.
Answer: Sensitivity analyses are used for:
• Revising models to eliminate too-large sensitivities
• Adding details about sensitive variables or scenarios
• Obtaining better estimates of sensitive external variables
• Altering a real-world system to reduce actual sensitivities
• Accepting and using the sensitive (and hence vulnerable) real world, leading to the
continuous and close monitoring of actual results
Diff: 3
Page Ref: 347
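A minimal sensitivity-analysis sketch, assuming a simple hypothetical profit model: re-evaluate the model while varying one input (here, unit cost by plus or minus 10%) and observe the impact on the outcome. Inputs whose small changes move the result sharply are the "sensitive" variables worth refining or monitoring.

```python
# Sensitivity analysis: vary one input, hold the rest, watch the output.
def profit(units, price=8.0, unit_cost=5.0, fixed_cost=600.0):
    return price * units - (fixed_cost + unit_cost * units)

base = profit(400)                         # proposed solution's outcome: 600.0
for cost in (4.5, 5.0, 5.5):               # +/- 10% around the base estimate
    delta = profit(400, unit_cost=cost) - base
    print(f"unit_cost={cost}: profit change {delta:+.0f}")
```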
67) List and describe the most common approaches for treating uncertainty.
Answer: There are two common approaches to dealing with uncertainty. The first is the
optimistic approach and the second is the pessimistic approach. The optimistic approach assumes
that the outcomes for all alternatives will be the best possible and then the best of each of those
may be selected. Under the pessimistic approach the worst possible outcome is assumed for each
alternative and then the best of the worst are selected.
Diff: 2
Page Ref: 350-351
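Both approaches can be sketched on a small hypothetical payoff table with no probabilities attached: the optimistic (maximax) approach takes the best of the best outcomes, while the pessimistic (maximin) approach takes the best of the worst.

```python
# Decision making under uncertainty on a hypothetical payoff table
# (columns are states of nature; no probabilities are assumed).
payoffs = {
    "bonds":  [12, 6, 3],
    "stocks": [15, 3, -2],
    "CDs":    [6.5, 6.5, 6.5],
}

optimistic  = max(payoffs, key=lambda a: max(payoffs[a]))  # best of the best
pessimistic = max(payoffs, key=lambda a: min(payoffs[a]))  # best of the worst
print(optimistic, pessimistic)  # stocks CDs
```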
68) Why is the Monte Carlo simulation popular for solving business problems?
Answer: The Monte Carlo simulation is a probabilistic simulation. It is designed around a model
of the decision problem in which the uncertainty in one or more of the variables is represented by
probability distributions. This allows a huge number of simulation runs to be performed with
random changes in each of the uncertain variables. In this way, the model may be solved
hundreds or thousands of times. These results can then be analyzed for the dependent or
performance variables using statistical distributions. This demonstrates a number of possible
solutions, as well as providing information about the manner in which variables will respond
under different levels of uncertainty.
Diff: 3
Page Ref: 357
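A minimal Monte Carlo sketch, with a hypothetical profit model in which demand is the uncertain variable, drawn from an assumed normal distribution on each run; the resulting profits are then summarized statistically.

```python
# Monte Carlo simulation: sample the uncertain input many times, then
# analyze the distribution of the performance variable.
import random

def profit(demand, price=8.0, unit_cost=5.0, fixed_cost=600.0):
    return price * demand - (fixed_cost + unit_cost * demand)

random.seed(42)                            # reproducible experiment
runs = [profit(random.gauss(mu=400, sigma=50)) for _ in range(10_000)]
mean = sum(runs) / len(runs)
p_loss = sum(r < 0 for r in runs) / len(runs)
print(f"mean profit ~ {mean:.0f}, P(loss) ~ {p_loss:.3f}")
```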
Business Intelligence, 4e (Sharda/Delen/Turban)
Chapter 7 Big Data Concepts and Tools
1) In the opening vignette, Access Telecom (AT) built a system to better visualize customers
who were unhappy before they canceled their service.
Answer: TRUE
Diff: 2
Page Ref: 372
2) The term "Big Data" is relative as it depends on the size of the using organization.
Answer: TRUE
Diff: 2
Page Ref: 373
3) Satellite data can be used to evaluate the activity at retail locations as a source of alternative
data.
Answer: TRUE
Diff: 2
Page Ref: 377
4) Big Data is being driven by the exponential growth, availability, and use of information.
Answer: TRUE
Diff: 2
Page Ref: 373
5) The quality and objectivity of information disseminated by influential users of Twitter is
higher than that disseminated by noninfluential users.
Answer: TRUE
Diff: 2
Page Ref: 392
6) Big Data uses commodity hardware, which is expensive, specialized hardware that is custom
built for a client or application.
Answer: FALSE
Diff: 2
Page Ref: 375
7) MapReduce can be easily understood by skilled programmers due to its procedural nature.
Answer: TRUE
Diff: 2
Page Ref: 385
8) Hadoop was designed to handle petabytes and exabytes of data distributed over multiple nodes
in parallel.
Answer: TRUE
Diff: 2
Page Ref: 385
9) Hadoop and MapReduce require each other to work.
Answer: FALSE
Diff: 2
Page Ref: 386
10) In most cases, Hadoop is used to replace data warehouses.
Answer: FALSE
Diff: 2
Page Ref: 389
11) Despite their potential, many current NoSQL tools lack mature management and monitoring
tools.
Answer: TRUE
Diff: 2
Page Ref: 389
12) There is a clear difference between the type of information support provided by influential
users versus the others on Twitter.
Answer: TRUE
Diff: 2
Page Ref: 392
13) Social media mentions can be used to chart and predict flu outbreaks.
Answer: TRUE
Diff: 2
Page Ref: 400
14) In Application Case 7.6, Analyzing Disease Patterns from an Electronic Medical Records
Data Warehouse, it was found that urban individuals have a higher number of diagnosed disease
conditions.
Answer: TRUE
Diff: 2
Page Ref: 403
15) For low latency, interactive reports, a data warehouse is preferable to Hadoop.
Answer: TRUE
Diff: 2
Page Ref: 396
16) If you have many flexible programming languages running in parallel, Hadoop is preferable
to a data warehouse.
Answer: TRUE
Diff: 2
Page Ref: 396
17) In the Salesforce case study, streaming data is used to identify services that customers use
most.
Answer: FALSE
Diff: 2
Page Ref: 410
18) It is important for Big Data and self-service business intelligence to go hand in hand to get
maximum value from analytics.
Answer: TRUE
Diff: 1
Page Ref: 395
19) Big Data simplifies data governance issues, especially for global firms.
Answer: FALSE
Diff: 2
Page Ref: 406
20) Current total storage capacity lags behind the digital information being generated in the
world.
Answer: TRUE
Diff: 2
Page Ref: 406
21) Using data to understand customers/clients and business operations to sustain and foster
growth and profitability is
A) easier with the advent of BI and Big Data.
B) essentially the same now as it has always been.
C) an increasingly challenging task for today's enterprises.
D) now completely automated with no human intervention required.
Answer: C
Diff: 2
Page Ref: 373
22) A newly popular unit of data in the Big Data era is the petabyte (PB), which is
A) 10^9 bytes.
B) 10^12 bytes.
C) 10^15 bytes.
D) 10^18 bytes.
Answer: C
Diff: 2
Page Ref: 375
23) Which of the following sources is likely to produce Big Data the fastest?
A) order entry clerks
B) cashiers
C) RFID tags
D) online customers
Answer: C
Diff: 2
Page Ref: 374
24) Data flows can be highly inconsistent, with periodic peaks, making data loads hard to
manage. What is this feature of Big Data called?
A) volatility
B) periodicity
C) inconsistency
D) variability
Answer: D
Diff: 2
Page Ref: 376
25) In the Twitter case study, how did influential users support their tweets?
A) opinion
B) objective data
C) multiple posts
D) references to other users
Answer: B
Diff: 2
Page Ref: 392
26) Allowing Big Data to be processed in memory and distributed across a dedicated set of nodes
can solve complex problems in near–real time with highly accurate insights. What is this process
called?
A) in-memory analytics
B) in-database analytics
C) grid computing
D) appliances
Answer: A
Diff: 2
Page Ref: 380
27) Which Big Data approach promotes efficiency, lower cost, and better performance by
processing jobs in a shared, centrally managed pool of IT resources?
A) in-memory analytics
B) in-database analytics
C) grid computing
D) appliances
Answer: C
Diff: 2
Page Ref: 380
28) How does Hadoop work?
A) It integrates Big Data into a whole so large data elements can be processed as a whole on one
computer.
B) It integrates Big Data into a whole so large data elements can be processed as a whole on
multiple computers.
C) It breaks up Big Data into multiple parts so each part can be processed and analyzed at the
same time on one computer.
D) It breaks up Big Data into multiple parts so each part can be processed and analyzed at the
same time on multiple computers.
Answer: D
Diff: 3
Page Ref: 386
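The split/process/merge idea in question 28 can be sketched with a word count in the MapReduce style. This is a conceptual illustration only: in Hadoop the map calls would run in parallel on separate nodes against HDFS blocks, and the framework would handle shuffling and fault tolerance.

```python
# Conceptual MapReduce sketch: split the input, map each part independently,
# then reduce the intermediate (key, value) pairs into the final answer.
from collections import Counter
from itertools import chain

def map_part(text_part):
    """Emit (word, 1) pairs for one input split."""
    return [(word, 1) for word in text_part.split()]

def reduce_pairs(pairs):
    """Sum the counts for each word across all splits."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

splits = ["big data big", "data tools"]    # stands in for HDFS blocks
word_counts = reduce_pairs(chain.from_iterable(map(map_part, splits)))
print(word_counts)  # {'big': 2, 'data': 2, 'tools': 1}
```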
29) What is the Hadoop Distributed File System (HDFS) designed to handle?
A) unstructured and semistructured relational data
B) unstructured and semistructured non-relational data
C) structured and semistructured relational data
D) structured and semistructured non-relational data
Answer: B
Diff: 2
Page Ref: 385
30) In a Hadoop "stack," what is a slave node?
A) a node where bits of programs are stored
B) a node where metadata is stored and used to organize data processing
C) a node where data is stored and processed
D) a node responsible for holding all the source programs
Answer: C
Diff: 2
Page Ref: 386
31) In a Hadoop "stack," what node periodically replicates and stores data from the Name Node
should it fail?
A) backup node
B) secondary node
C) substitute node
D) slave node
Answer: B
Diff: 2
Page Ref: 386
32) All of the following statements about MapReduce are true EXCEPT
A) MapReduce is a general-purpose execution engine.
B) MapReduce handles the complexities of network communication.
C) MapReduce handles parallel programming.
D) MapReduce runs without fault tolerance.
Answer: D
Diff: 2
Page Ref: 389
33) In a network analysis, what connects nodes?
A) edges
B) metrics
C) paths
D) visualizations
Answer: A
Diff: 2
Page Ref: 403
34) In the Analyzing Disease Patterns from an Electronic Medical Records Data Warehouse case
study, what was the analytic goal?
A) determine if diseases are accurately diagnosed
B) determine probabilities of diseases that are comorbid
C) determine differences in rates of disease in urban and rural populations
D) determine differences in rates of disease in males v. females
Answer: C
Diff: 2
Page Ref: 402
35) Traditional data warehouses have not been able to keep up with
A) the evolution of the SQL language.
B) the variety and complexity of data.
C) expert systems that run on them.
D) OLAP.
Answer: B
Diff: 2
Page Ref: 393
36) Under which of the following requirements would it be more appropriate to use Hadoop over
a data warehouse?
A) ANSI 2003 SQL compliance is required
B) online archives alternative to tape
C) unrestricted, ungoverned sandbox explorations
D) analysis of provisional data
Answer: C
Diff: 2
Page Ref: 396
37) What is Big Data's relationship to the cloud?
A) Hadoop cannot be deployed effectively in the cloud just yet.
B) Amazon and Google have working Hadoop cloud offerings.
C) IBM's homegrown Hadoop platform is the only option.
D) Only MapReduce works in the cloud; Hadoop does not.
Answer: B
Diff: 2
Page Ref: 403
38) Companies with the largest revenues from Big Data tend to be
A) the largest computer and IT services firms.
B) small computer and IT services firms.
C) pure open source Big Data firms.
D) non-U.S. Big Data firms.
Answer: A
Diff: 2
Page Ref: 405
39) In the financial services industry, Big Data can be used to improve
A) regulatory oversight.
B) decision making.
C) customer service.
D) both A & B.
Answer: D
Diff: 2
Page Ref: 411
40) In the Alternative Data for Market Analysis or Forecasts case study, satellite data was NOT
used for
A) evaluating retail traffic.
B) monitoring activity at factories.
C) tracking agricultural estimates.
D) monitoring individual customer patterns.
Answer: D
Diff: 2
Page Ref: 377
41) Big Data comes from ________.
Answer: everywhere
Diff: 2
Page Ref: 373
42) ________ refers to the conformity to facts: accuracy, quality, truthfulness, or trustworthiness
of the data.
Answer: Veracity
Diff: 2
Page Ref: 376
43) In-motion ________ is often overlooked today in the world of BI and Big Data.
Answer: analytics
Diff: 2
Page Ref: 376-377
44) The ________ of Big Data is its potential to contain more useful patterns and interesting
anomalies than "small" data.
Answer: value proposition
Diff: 2
Page Ref: 376
45) As the size and the complexity of analytical systems increase, the need for more ________
analytical systems is also increasing to obtain the best performance.
Answer: efficient
Diff: 2
Page Ref: 380
46) ________ speeds time to insights and enables better data governance by performing data
integration and analytic functions inside the database.
Answer: In-database analytics
Diff: 2
Page Ref: 380
47) ________ bring together hardware and software in a physical unit that is not only fast but
also scalable on an as-needed basis.
Answer: Appliances
Diff: 2
Page Ref: 380
48) Big Data employs ________ processing techniques and nonrelational data storage
capabilities in order to process unstructured and semistructured data.
Answer: parallel
Diff: 2
Page Ref: 383
49) In the world of Big Data, ________ aids organizations in processing and analyzing large
volumes of multistructured data. Examples include indexing and search, graph analysis, etc.
Answer: MapReduce
Diff: 2
Page Ref: 385
50) The ________ Node in a Hadoop cluster provides client information on where in the cluster
particular data is stored and if any nodes fail.
Answer: Name
Diff: 2
Page Ref: 385
51) A job ________ is a node in a Hadoop cluster that initiates and coordinates MapReduce jobs,
or the processing of the data.
Answer: tracker
Diff: 2
Page Ref: 386
52) HBase is a nonrelational ________ that allows for low-latency, quick lookups in Hadoop.
Answer: database
Diff: 2
Page Ref: 387
53) Hadoop is primarily a(n) ________ file system and lacks capabilities we'd associate with a
DBMS, such as indexing, random access to data, and support for SQL.
Answer: distributed
Diff: 2
Page Ref: 388
54) HBase, Cassandra, MongoDB, and Accumulo are examples of ________ databases.
Answer: NoSQL
Diff: 2
Page Ref: 389
55) The problem of forecasting economic activity or microclimates based on a variety of data
beyond the usual retail data is a very recent phenomenon and has led to another buzzword —
________.
Answer: alternative data
Diff: 2
Page Ref: 377
56) As volumes of Big Data arrive from multiple sources such as sensors, machines, social
media, and clickstream interactions, the first step is to ________ all the data reliably and cost
effectively.
Answer: capture
Diff: 2
Page Ref: 393
57) In open-source databases, the most important performance enhancement to date is the cost-based ________.
Answer: optimizer
Diff: 2
Page Ref: 395
58) ________ of data provides business value; pulling of data from multiple subject areas and
numerous applications into one repository is the raison d'être for data warehouses.
Answer: Integration
Diff: 2
Page Ref: 395
59) In the energy industry, ________ grids are one of the most impactful applications of stream
analytics.
Answer: smart
Diff: 2
Page Ref: 407
60) Organizations are working with data that meets the three V's characterization: variety,
volume, and ________.
Answer: velocity
Diff: 2
Page Ref: 374
61) In the opening vignette, why was the Telecom company so concerned about the loss of
customers, if customer churn is common in that industry?
Answer: The company was concerned about its loss of customers, because the loss was at such a
high rate. The company was losing customers faster than it was gaining them. Additionally, the
company had identified that the loss of these customers could be traced back to customer service
interactions. Because of this, the company felt that the loss of customers is something that could
be analyzed and hopefully controlled.
Diff: 2
Page Ref: 370-371
62) List and describe the three main "V"s that characterize Big Data.
Answer:
• Volume: This is obviously the most common trait of Big Data. Many factors contributed to
the exponential increase in data volume, such as transaction-based data stored through the years,
text data constantly streaming in from social media, increasing amounts of sensor data being
collected, automatically generated RFID and GPS data, and so forth.
• Variety: Data today comes in all types of formats—ranging from traditional databases to
hierarchical data stores created by the end users and OLAP systems, to text documents, e-mail,
XML, meter-collected, sensor-captured data, to video, audio, and stock ticker data. By some
estimates, 80 to 85 percent of all organizations' data is in some sort of unstructured or
semistructured format.
• Velocity: This refers to both how fast data is being produced and how fast the data must be
processed (i.e., captured, stored, and analyzed) to meet the need or demand. RFID tags,
automated sensors, GPS devices, and smart meters are driving an increasing need to deal with
torrents of data in near–real time.
Diff: 2
Page Ref: 374-375
63) List and describe four of the most critical success factors for Big Data analytics.
Answer:
• A clear business need (alignment with the vision and the strategy). Business investments
ought to be made for the good of the business, not for the sake of mere technology
advancements. Therefore, the main driver for Big Data analytics should be the needs of the
business at any level—strategic, tactical, and operations.
• Strong, committed sponsorship (executive champion). It is a well-known fact that if you
don't have strong, committed executive sponsorship, it is difficult (if not impossible) to succeed.
If the scope is a single or a few analytical applications, the sponsorship can be at the
departmental level. However, if the target is enterprise-wide organizational transformation,
which is often the case for Big Data initiatives, sponsorship needs to be at the highest levels and
organization-wide.
• Alignment between the business and IT strategy. It is essential to make sure that the
analytics work is always supporting the business strategy, and not the other way around. Analytics
should play the enabling role in successful execution of the business strategy.
• A fact-based decision making culture. In a fact-based decision-making culture, the numbers
rather than intuition, gut feeling, or supposition drive decision making. There is also a culture of
experimentation to see what works and what doesn't. To create a fact-based decision-making culture,
senior management needs to do the following: recognize that some people can't or won't adjust;
be a vocal supporter; stress that outdated methods must be discontinued; ask to see what
analytics went into decisions; link incentives and compensation to desired behaviors.
• A strong data infrastructure. Data warehouses have provided the data infrastructure for
analytics. This infrastructure is changing and being enhanced in the Big Data era with new
technologies. Success requires marrying the old with the new for a holistic infrastructure that
works synergistically.
Diff: 2
Page Ref: 379-380
64) When considering Big Data projects and architecture, list and describe five challenges
designers should be mindful of in order to make the journey to analytics competency less
stressful.
Answer:
• Data volume: The ability to capture, store, and process the huge volume of data at an
acceptable speed so that the latest information is available to decision makers when they need it.
• Data integration: The ability to combine data that is not similar in structure or source and to
do so quickly and at reasonable cost.
• Processing capabilities: The ability to process the data quickly, as it is captured. The
traditional way of collecting and then processing the data may not work. In many situations data
needs to be analyzed as soon as it is captured to leverage the most value.
• Data governance: The ability to keep up with the security, privacy, ownership, and quality
issues of Big Data. As the volume, variety (format and source), and velocity of data change, so
should the capabilities of governance practices.
• Skills availability: Big Data is being harnessed with new tools and is being looked at in
different ways. There is a shortage of data scientists with the skills to do the job.
• Solution cost: Since Big Data has opened up a world of possible business improvements,
there is a great deal of experimentation and discovery taking place to determine the patterns that
matter and the insights that turn to value. To ensure a positive ROI on a Big Data project,
therefore, it is crucial to reduce the cost of the solutions used to find that value.
Diff: 3
Page Ref: 381
65) Define MapReduce.
Answer: As described by Dean and Ghemawat (2004), "MapReduce is a programming model
and an associated implementation for processing and generating large data sets. Programs written
in this functional style are automatically parallelized and executed on a large cluster of
commodity machines. This allows programmers without any experience with parallel and
distributed systems to easily utilize the resources of a large distributed system."
Diff: 2
Page Ref: 384
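The programming model in this definition can be illustrated with a minimal, single-machine word-count sketch (plain Python, not an actual Hadoop job; the `map_fn`, `shuffle`, and `reduce_fn` names are illustrative, not part of any real MapReduce API):

```python
from collections import defaultdict

# Map phase: emit (key, value) pairs for each input record.
def map_fn(line):
    for word in line.split():
        yield (word.lower(), 1)

# Shuffle phase: group all emitted values by key.
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: combine the grouped values for each key.
def reduce_fn(key, values):
    return (key, sum(values))

def word_count(lines):
    pairs = (pair for line in lines for pair in map_fn(line))
    return dict(reduce_fn(k, v) for k, v in shuffle(pairs).items())

# On a real cluster, the map and reduce tasks run in parallel across
# many commodity machines; the framework handles the distribution.
print(word_count(["big data big insight"]))  # {'big': 2, 'data': 1, 'insight': 1}
```

The automatic parallelization Dean and Ghemawat describe comes from the framework distributing the map and reduce calls; the programmer supplies only the two functions.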
66) What is NoSQL as used for Big Data? Describe its major downsides.
Answer:
• NoSQL is a new style of database that has emerged to, like Hadoop, process large volumes of
multi-structured data. However, whereas Hadoop is adept at supporting large-scale, batch-style
historical analysis, NoSQL databases are aimed, for the most part (though there are some
important exceptions), at serving up discrete data stored among large volumes of multi-structured data to end-user and automated Big Data applications. This capability is sorely lacking
from relational database technology, which simply can't maintain needed application
performance levels at Big Data scale.
• The downside of most NoSQL databases today is that they trade ACID (atomicity,
consistency, isolation, durability) compliance for performance and scalability. Many also lack
mature management and monitoring tools.
Diff: 2
Page Ref: 389-390
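The access pattern contrasted in this answer — discrete, low-latency lookups by key rather than batch scans — can be sketched with a toy in-memory key-value store (plain Python; real NoSQL systems such as HBase or Cassandra distribute the rows across many nodes and, as noted, trade ACID compliance for scale):

```python
# Toy key-value store illustrating the NoSQL access pattern:
# reads and writes address one row by key instead of scanning a table.
class ToyKeyValueStore:
    def __init__(self):
        self._rows = {}  # key -> column dict; a real store shards this across nodes

    def put(self, key, columns):
        # Last-write-wins per key; no multi-row transactions (no full ACID).
        self._rows.setdefault(key, {}).update(columns)

    def get(self, key):
        # Discrete, low-latency lookup of a single row by key.
        return self._rows.get(key)

store = ToyKeyValueStore()
store.put("user:42", {"name": "Ada", "plan": "pro"})
store.put("user:42", {"plan": "enterprise"})  # later write updates one column
print(store.get("user:42"))  # {'name': 'Ada', 'plan': 'enterprise'}
```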
67) List and briefly discuss the three characteristics that define and make the case for data
warehousing.
Answer:
1) Data warehouse performance: More advanced forms of indexing such as materialized
views, aggregate join indexes, cube indexes, and sparse join indexes enable numerous
performance gains in data warehouses. The most important performance enhancement to date is
the cost-based optimizer, which examines incoming SQL and considers multiple plans for
executing each query as fast as possible.
2) Integrating data that provides business value: Integrated data is the unique foundation
required to answer essential business questions.
3) Interactive BI tools: These tools allow business users to have direct access to data
warehouse insights. Users are able to extract business value from the data and supply valuable
strategic information to the executive staff.
Diff: 2
Page Ref: 394-395
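The cost-based optimizer in point 1 can be sketched as follows (a toy illustration; real optimizers derive cost estimates from table statistics and enumerate far more plan shapes, and the cost formulas here are invented for the example):

```python
# Toy cost-based optimizer: enumerate candidate plans for a query
# and pick the one with the lowest estimated cost.
def estimate_cost(plan, stats):
    if plan == "full_scan":
        return stats["rows"]            # must read every row
    if plan == "index_lookup":
        return stats["matches"] * 3     # per-match index traversal (toy constant)
    raise ValueError(f"unknown plan: {plan}")

def choose_plan(candidate_plans, stats):
    return min(candidate_plans, key=lambda p: estimate_cost(p, stats))

# Highly selective predicate: the index wins.
print(choose_plan(["full_scan", "index_lookup"],
                  {"rows": 1_000_000, "matches": 50}))   # index_lookup
# Predicate matching most rows: the full scan wins.
print(choose_plan(["full_scan", "index_lookup"],
                  {"rows": 1_000, "matches": 900}))      # full_scan
```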
68) Why are some portions of tape backup workloads being redirected to Hadoop clusters today?
Answer:
• First, while it may appear inexpensive to store data on tape, the true cost comes with the
difficulty of retrieval. Not only is the data stored offline, requiring hours if not days to restore,
but tape cartridges themselves are also prone to degradation over time, making data loss a reality
and forcing companies to factor in those costs. To make matters worse, tape formats change
every couple of years, requiring organizations to either perform massive data migrations to the
newest tape format or risk the inability to restore data from obsolete tapes.
• Second, it has been shown that there is value in keeping historical data online and accessible.
As in the clickstream example, keeping raw data on a spinning disk for a longer duration makes
it easy for companies to revisit data when the context changes and new constraints need to be
applied. Searching thousands of disks with Hadoop is dramatically faster and easier than
spinning through hundreds of magnetic tapes. Additionally, as disk densities continue to double
every 18 months, it becomes economically feasible for organizations to hold many years' worth
of raw or refined data in HDFS.
Diff: 2
Page Ref: 394
69) What are the differences between stream analytics and perpetual analytics? When would you
use one or the other?
Answer:
• In many cases they are used synonymously. However, in the context of intelligent systems,
there is a difference. Streaming analytics involves applying transaction-level logic to real-time
observations. The rules applied to these observations take into account previous observations as
long as they occurred in the prescribed window; these windows have some arbitrary size (e.g.,
last 5 seconds, last 10,000 observations, etc.). Perpetual analytics, on the other hand, evaluates
every incoming observation against all prior observations, where there is no window size.
Recognizing how the new observation relates to all prior observations enables the discovery of
real-time insight.
• When transactional volumes are high and the time-to-decision is too short, favoring
nonpersistence and small window sizes, this translates into using streaming analytics. However,
when the mission is critical and transaction volumes can be managed in real time, then perpetual
analytics is a better answer.
Diff: 2
Page Ref: 407-408
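The window-versus-full-history distinction in this answer can be sketched with two running averages (plain Python; the class names are illustrative):

```python
from collections import deque

class StreamingAverage:
    """Streaming analytics: only the last `window` observations matter."""
    def __init__(self, window):
        self.buffer = deque(maxlen=window)  # older observations fall out automatically

    def observe(self, x):
        self.buffer.append(x)
        return sum(self.buffer) / len(self.buffer)

class PerpetualAverage:
    """Perpetual analytics: each observation is evaluated against all prior ones."""
    def __init__(self):
        self.count = 0
        self.total = 0.0

    def observe(self, x):
        self.count += 1
        self.total += x
        return self.total / self.count

s, p = StreamingAverage(window=2), PerpetualAverage()
for x in [10, 20, 90]:
    sa, pa = s.observe(x), p.observe(x)
print(sa, pa)  # 55.0 40.0 — the window sees only 20 and 90; perpetual sees all three
```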
70) Describe data stream mining and how it is used.
Answer: Data stream mining, as an enabling technology for stream analytics, is the process of
extracting novel patterns and knowledge structures from continuous, rapid data records. A data
stream is a continuous flow of ordered sequence of instances that in many applications of data
stream mining can be read/processed only once or a small number of times using limited
computing and storage capabilities. Examples of data streams include sensor data, computer
network traffic, phone conversations, ATM transactions, web searches, and financial data. Data
stream mining can be considered a subfield of data mining, machine learning, and knowledge
discovery. In many data stream mining applications, the goal is to predict the class or value of
new instances in the data stream given some knowledge about the class membership or values of
previous instances in the data stream.
Diff: 2
Page Ref: 408-409
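The single-pass prediction goal described in this answer can be sketched with a simple online learner (a toy illustration; production data stream mining uses incremental algorithms such as Hoeffding trees, and the fraud-detection stream here is invented):

```python
from collections import defaultdict

class OnlineMajorityClassifier:
    """Single-pass learner: each instance is read once, counts are updated
    incrementally, and memory stays bounded regardless of stream length."""
    def __init__(self):
        self.counts = defaultdict(int)

    def predict(self, _instance):
        # Predict the most frequent class seen so far (None before any data).
        return max(self.counts, key=self.counts.get) if self.counts else None

    def learn(self, _instance, label):
        self.counts[label] += 1

model = OnlineMajorityClassifier()
stream = [({"amt": 12}, "ok"), ({"amt": 9000}, "fraud"), ({"amt": 15}, "ok")]
for instance, label in stream:
    guess = model.predict(instance)   # predict first (prequential evaluation)...
    model.learn(instance, label)      # ...then update with the true label
print(model.predict({"amt": 20}))  # 'ok'
```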
Business Intelligence, 4e (Sharda/Delen/Turban)
Chapter 8 Future Trends, Privacy and Managerial Considerations in Analytics
1) Siemens utilizes data sensors to track failure rates in household appliances.
Answer: FALSE
Diff: 2
Page Ref: 418
2) In the classification of location-based analytic applications, examining geographic site
locations falls in the consumer-oriented category.
Answer: FALSE
Diff: 2
Page Ref: 445
3) In the Great Clips case study, the company uses geospatial data to analyze, among other
things, the types of haircuts most popular in different geographic locations.
Answer: FALSE
Diff: 2
Page Ref: 443
4) From massive amounts of high-dimensional location data, algorithms that reduce the
dimensionality of the data can be used to uncover trends, meaning, and relationships to
eventually produce human-understandable representations.
Answer: TRUE
Diff: 2
Page Ref: 445
5) In the Quiznos case, the company employed location-based behavioral targeting to narrow the
characteristics of users who were most likely to eat at a quick-service restaurant.
Answer: TRUE
Diff: 2
Page Ref: 446
6) Internet of Things (IoT) is the phenomenon of connecting the physical world to the Internet.
Answer: TRUE
Diff: 2
Page Ref: 419
7) For cloud computing to be successful, users must have knowledge and experience in the
control of the technology infrastructures.
Answer: FALSE
Diff: 2
Page Ref: 430
8) Social networking Web sites like Facebook, Twitter, and LinkedIn, are also examples of cloud
computing.
Answer: TRUE
Diff: 1
Page Ref: 430
9) Web-based e-mail such as Google's Gmail are not examples of cloud computing.
Answer: FALSE
Diff: 2
Page Ref: 430
10) Service-oriented DSS solutions generally offer individual or bundled services to the user as a
service.
Answer: TRUE
Diff: 2
Page Ref: 431
11) Users definitely own their biometric data.
Answer: FALSE
Diff: 2
Page Ref: 452
12) Data as a service began with the notion that data quality could happen in a centralized place,
cleansing and enriching data and offering it to different systems, applications, or users,
irrespective of where they were in the organization, computers, or on the network.
Answer: TRUE
Diff: 2
Page Ref: 431
13) IaaS helps provide faster information, but provides information only to managers in an
organization.
Answer: FALSE
Diff: 2
Page Ref: 432
14) Server virtualization is the pooling of physical storage from multiple network storage devices
into a single storage device.
Answer: FALSE
Diff: 2
Page Ref: 433
15) While cloud services are useful for small and midsize analytic applications, they are still
limited in their ability to handle Big Data applications.
Answer: FALSE
Diff: 2
Page Ref: 435
16) SaaS combines aspects of cloud computing with Big Data analytics and empowers data
scientists and analysts by allowing them to access centrally managed information data sets.
Answer: FALSE
Diff: 2
Page Ref: 435
17) One reason the IoT is growing exponentially is because hardware is smaller and more
affordable.
Answer: TRUE
Diff: 2
Page Ref: 420
18) Connectivity is not a part of the IoT infrastructure.
Answer: FALSE
Diff: 2
Page Ref: 422
19) RFID can be used in supply chains to manage product quality.
Answer: TRUE
Diff: 1
Page Ref: 425
20) The term cloud computing originates from a reference to the Internet as a "cloud" and
represents an evolution of all of the previously shared/centralized computing trends.
Answer: TRUE
Diff: 2
Page Ref: 430
21) What kind of location-based analytics is a real-time marketing promotion?
A) organization-oriented geospatial static approach
B) organization-oriented location-based dynamic approach
C) consumer-oriented geospatial static approach
D) consumer-oriented location-based dynamic approach
Answer: B
Diff: 2
Page Ref: 441
22) GPS Navigation is an example of which kind of location-based analytics?
A) organization-oriented geospatial static approach
B) organization-oriented location-based dynamic approach
C) consumer-oriented geospatial static approach
D) consumer-oriented location-based dynamic approach
Answer: C
Diff: 2
Page Ref: 441
23) What new geometric data type in Teradata's data warehouse captures geospatial features?
A) NAVTEQ
B) ST_GEOMETRY
C) GIS
D) SQL/MM
Answer: B
Diff: 2
Page Ref: 443
24) Which of these is NOT a part of the IoT technology infrastructure?
A) hardware
B) connectivity
C) electrical access
D) software
Answer: C
Diff: 2
Page Ref: 422
25) Today, most smartphones are equipped with various instruments to measure jerk, orientation,
and sense motion. One of these instruments is an accelerometer, and the other is a(n)
A) potentiometer.
B) gyroscope.
C) microscope.
D) oscilloscope.
Answer: B
Diff: 2
Page Ref: 464
26) Smartbin has developed trash containers that include sensors to detect
A) fill levels.
B) types of trash.
C) tip-over.
D) weather.
Answer: A
Diff: 2
Page Ref: 419-420
27) The portion of the IoT technology infrastructure that focuses on the sensors themselves is
A) hardware.
B) connectivity.
C) software backend.
D) applications.
Answer: A
Diff: 2
Page Ref: 422
28) The portion of the IoT technology infrastructure that focuses on how to manage incoming
data and analyze it is
A) hardware.
B) connectivity.
C) software backend.
D) applications.
Answer: C
Diff: 2
Page Ref: 422
29) The portion of the IoT technology infrastructure that focuses on controlling what and how
information is captured is
A) hardware.
B) connectivity.
C) software backend.
D) applications.
Answer: D
Diff: 2
Page Ref: 422
30) The portion of the IoT technology infrastructure that focuses on how to transmit data is
A) hardware.
B) connectivity.
C) software backend.
D) applications.
Answer: B
Diff: 2
Page Ref: 422
31) Using this model, companies can deploy their software and applications in the cloud so that
their customers can use them.
A) SaaS
B) PaaS
C) IaaS
D) DaaS
Answer: B
Diff: 2
Page Ref: 432
32) This model allows consumers to use applications and software that run on distant computers
in the cloud infrastructure.
A) SaaS
B) PaaS
C) IaaS
D) DaaS
Answer: A
Diff: 2
Page Ref: 432
33) Which of the following is true of data-as-a-Service (DaaS) platforms?
A) Knowing where the data resides is critical to the functioning of the platform.
B) There are standardized processes for accessing data wherever it is located.
C) Business processes can access local data only.
D) Data quality happens on each individual platform.
Answer: B
Diff: 2
Page Ref: 431-432
34) Which of the following allows companies to deploy their software and applications in the
cloud so that their customers can use them?
A) SaaS
B) IaaS
C) PaaS
D) AaaS
Answer: C
Diff: 2
Page Ref: 432
35) In this model, infrastructure resources like networks, storage, servers, and other computing
resources are provided to client companies.
A) SaaS
B) PaaS
C) IaaS
D) DaaS
Answer: C
Diff: 2
Page Ref: 432
36) This model began with the notion that data quality could happen in a centralized place,
cleansing and enriching data and offering it to different systems, applications, or users,
irrespective of where they were in the organization, computers, or on the network.
A) SaaS
B) PaaS
C) IaaS
D) DaaS
Answer: D
Diff: 2
Page Ref: 431
37) Why are companies like IBM shifting to provide more services and consulting?
A) Customers see that significant value can be created with the application of analytics, and need
help completing these tasks.
B) They can no longer compete in the software market.
C) New regulations forced them into this market.
D) None of these.
Answer: A
Diff: 3
Page Ref: 454
38) Services that let consumers permanently enter a profile of information along with a password
and use this information repeatedly to access services at multiple sites are called
A) consumer access applications.
B) information collection portals.
C) single-sign-on facilities.
D) consumer information sign on facilities.
Answer: C
Diff: 2
Page Ref: 450
39) Which of the following is true about the furtherance of homeland security?
A) There is a lessening of privacy issues.
B) There is a greater need for oversight.
C) The impetus was the need to harvest information related to financial fraud after 2001.
D) Most people regard analytic tools as mostly ineffective in increasing security.
Answer: B
Diff: 2
Page Ref: 450-451
40) Why is separating the impact of analytics from that of other computerized systems a difficult
task?
A) Businesses do not typically track the sources of successful projects.
B) The trend is toward integrating systems.
C) Software tools are not sophisticated enough.
D) It is not an organizational priority.
Answer: B
Diff: 2
Page Ref: 453
41) ________ is a generic technology that refers to the use of radio-frequency waves to identify
objects.
Answer: RFID
Diff: 2
Page Ref: 422
42) A critical emerging trend in analytics is the incorporation of location data. ________ data is
the static location data used by these location-based analytic applications.
Answer: Geospatial
Diff: 2
Page Ref: 441
43) With RFID tags, a(n) ________ tag has a battery on board to energize it.
Answer: active
Diff: 2
Page Ref: 423
44) With RFID tags, a(n) ________ tag receives energy from the electromagnetic field created
by the interrogator.
Answer: passive
Diff: 2
Page Ref: 423
45) Predictive analytics is beginning to enable development of software that is directly used by a
consumer. One key concern in employing these technologies is the loss of ________.
Answer: privacy
Diff: 2
Page Ref: 448
46) ________ is the splitting of available bandwidth into channels.
Answer: Network virtualization
Diff: 2
Page Ref: 433
47) ________ is the masking of physical servers from server users.
Answer: Server virtualization
Diff: 2
Page Ref: 433
48) ________ provides resources like networks, storage, servers, and other computing resources
to client companies.
Answer: IaaS
Diff: 3
Page Ref: 432
49) IaaS, AaaS and other ________-based offerings allow the rapid diffusion of advanced
analysis tools among users, without significant investment in technology acquisition.
Answer: cloud
Diff: 2
Page Ref: 440
50) A major structural change that can occur when analytics are introduced into an organization
is the creation of new organizational ________.
Answer: units
Diff: 2
Page Ref: 454
51) A(n) ________ is operated solely for a single organization having a mission critical
workload and security concerns.
Answer: private cloud
Diff: 2
Page Ref: 433
52) In a(n) ________ the subscriber uses the resources offered by service providers over the
Internet.
Answer: public cloud
Diff: 2
Page Ref: 434
53) Analytics can change the way in which many ________ are made by managers and can
consequently change their jobs.
Answer: decisions
Diff: 2
Page Ref: 455
54) AaaS in the cloud has economies of scale and scope by providing many ________ analytical
applications with better scalability and higher cost savings.
Answer: virtual
Diff: 2
Page Ref: 435
55) Location information from ________ phones can be used to create profiles of user behavior
and movement.
Answer: mobile
Diff: 2
Page Ref: 462
56) For individual decision makers, ________ values constitute a major factor in the issue of
ethical decision making.
Answer: personal
Diff: 2
Page Ref: 453
57) ________ is/are used to capture, store, analyze, and manage data linked to a location using
integrated sensor technologies, global positioning systems installed in smartphones, or through
RFID deployments in the retail and healthcare industries.
Answer: GIS
Diff: 2
Page Ref: 442
58) By using ________, businesses can collect and analyze data to discern large-scale patterns of
movement and identify distinct classes of behaviors in specific contexts.
Answer: location-enabled services
Diff: 3
Page Ref: 445
59) Pokémon GO is an example of a location-sensing ________ reality-based game.
Answer: augmented
Diff: 2
Page Ref: 446
60) In general, ________ is the right to be left alone and the right to be free from unreasonable
personal intrusion.
Answer: privacy
Diff: 2
Page Ref: 449
61) How does Siemens use sensor data to help monitor equipment on trains?
Answer: Siemens uses an IoT model and sensors attached to several key components of trains
and other railway equipment to help evaluate its current working condition, and predict the need
for future repair. By using a wide variety of different types of sensors, the company is able to
evaluate a multitude of conditions. This evaluation can be on the train itself, or within the
supporting infrastructure. By using analytics to monitor these sensors, the company is able to
predict the need for repair prior to component failure.
Diff: 2
Page Ref: 418
62) How do the traditional location-based analytic techniques using geocoding of organizational
locations and consumers hamper the organizations in understanding "true location-based"
impacts?
Answer: Locations based on postal codes offer an aggregate view of a large geographic area.
This poor granularity may not be able to pinpoint the growth opportunities within a region. The
location of the target customers can change rapidly. An organization's promotional campaigns
might not target the right customers.
Diff: 2
Page Ref: 441
63) In what ways can communications companies use geospatial analysis to harness their data
effectively?
Answer: Communication companies often generate massive amounts of data every day. The
ability to analyze the data quickly with a high level of location-specific granularity can better
identify the customer churn and help in formulating strategies specific to locations for increasing
operational efficiency, quality of service, and revenue.
Diff: 2
Page Ref: 444
64) What is Internet of Things (IoT) and how is it used?
Answer: IoT is the phenomenon of connecting the physical world to the Internet. In IoT,
physical devices are connected to sensors that collect data on the operation, location, and state of
a device. This data is processed using various analytics techniques for monitoring the device
remotely from a central office or for predicting any upcoming faults in the device.
Diff: 2
Page Ref: 419
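The monitoring pattern in this answer — sensor data streamed to a central office and analyzed to predict faults — can be sketched as follows (a toy illustration; the device names and the single-threshold rule are invented stand-ins for real predictive analytics):

```python
# Toy IoT monitoring sketch: sensor readings stream in from connected
# devices and a central service flags devices that may need repair.
def monitor(readings, threshold):
    """readings: iterable of (device_id, temperature) tuples."""
    alerts = []
    for device_id, temperature in readings:
        if temperature > threshold:  # simplistic rule standing in for real analytics
            alerts.append(device_id)
    return alerts

stream = [("train-axle-1", 61.0), ("train-axle-2", 88.5), ("track-sensor-9", 40.2)]
print(monitor(stream, threshold=80.0))  # ['train-axle-2']
```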
65) What is cloud computing? What is Amazon's general approach to the cloud computing
services it provides?
Answer:
• Wikipedia defines cloud computing as "a style of computing in which dynamically scalable
and often virtualized resources are provided over the Internet. Users need not have knowledge of,
experience in, or control over the technology infrastructures in the cloud that supports them."
• Amazon.com has developed an impressive technology infrastructure for e-commerce as well
as for business intelligence, customer relationship management, and supply chain management.
It has built major data centers to manage its own operations. However, through Amazon.com's
cloud services, many other companies can employ these very same facilities to gain advantages
of these technologies without having to make a similar investment. Like other cloud-computing
services, a user can subscribe to any of the facilities on a pay-as-you-go basis. This model of
letting someone else own the hardware and software but making use of the facilities on a pay-per-use basis is the cornerstone of cloud computing.
Diff: 2
Page Ref: 430
66) Data and text mining is a promising application of AaaS. What additional capabilities can
AaaS bring to the analytic world?
Answer: It can also be used for large-scale optimization, highly complex multi-criteria decision
problems, and distributed simulation models. These prescriptive analytics require highly capable,
service-based collaborative systems that can utilize large-scale computational resources.
Diff: 3
Page Ref: 435
67) Describe your understanding of the emerging term people analytics. Are there any privacy
issues associated with the application?
Answer:
• Applications such as using sensor-embedded badges that employees wear to track their
movement and predict behavior has resulted in the term people analytics. This application area
combines organizational IT impact, Big Data, sensors, and has privacy concerns. One company,
Sociometric Solutions, has reported several such applications of their sensor-embedded badges.
• People analytics creates major privacy issues. Should the companies be able to monitor their
employees this intrusively? Sociometric has reported that its analytics are only reported on an
aggregate basis to their clients. No individual user data is shared. They have noted that some
employers want to get individual employee data, but their contract explicitly prohibits this type
of sharing. In any case, sensors are leading to another level of surveillance and analytics, which
poses interesting privacy, legal, and ethical questions.
Diff: 2
Page Ref: 455
68) What is a data scientist and what does the job involve?
Answer: A data scientist is a role or a job frequently associated with Big Data or data science. In
a very short time it has become one of the most sought-after roles in the marketplace. Currently,
data scientists' most basic skill is the ability to write code (in the latest Big Data
languages and platforms). A more enduring skill will be the need for data scientists to
communicate in a language that all their stakeholders understand—and to demonstrate the
special skills involved in storytelling with data, whether verbally, visually, or—ideally—both.
Data scientists use a combination of their business and technical skills to investigate Big Data
looking for ways to improve current business analytics practices (from descriptive to predictive
and prescriptive) and hence to improve decisions for new business opportunities.
Diff: 2
Page Ref: 459