Business Intelligence, 4e (Sharda/Delen/Turban)
Chapter 1 An Overview of Business Intelligence, Analytics, and Data Science
1) Computerized support is only used for organizational decisions that are responses to external
pressures, not for taking advantage of opportunities.
Answer: FALSE
Diff: 2
Page Ref: 3
2) During the early days of analytics, data was often obtained from the domain experts using
manual processes to build mathematical or knowledge-based models.
Answer: TRUE
Diff: 2
Page Ref: 13
3) Computer applications have moved from transaction processing and monitoring activities to
problem analysis and solution applications.
Answer: TRUE
Diff: 1
Page Ref: 11
4) Business intelligence (BI) is a specific term that describes architectures and tools only.
Answer: FALSE
Diff: 1
Page Ref: 16
5) The growth in hardware, software, and network capacities has had little impact on modern BI
innovations.
Answer: FALSE
Diff: 1
Page Ref: 11
6) Managing data warehouses requires special methods, including parallel computing and/or
Hadoop/Spark.
Answer: TRUE
Diff: 3
Page Ref: 11-12
7) Managing information on operations, customers, internal procedures and employee
interactions is the domain of cognitive science.
Answer: FALSE
Diff: 3
Page Ref: 12
8) Decision support system (DSS) and management information system (MIS) have precise
definitions agreed to by practitioners.
Answer: FALSE
Diff: 2
Page Ref: 13
9) In the 2000s, the DW-driven DSSs began to be called BI systems.
Answer: TRUE
Diff: 1
Page Ref: 14
10) Major commercial business intelligence (BI) products and services were well established in
the early 1970s.
Answer: FALSE
Diff: 2
Page Ref: 15
11) Information systems that support such transactions as ATM withdrawals, bank deposits, and
cash register scans at the grocery store represent transaction processing, a critical branch of BI.
Answer: FALSE
Diff: 2
Page Ref: 19
12) Many business users in the 1980s referred to their mainframes as "the black hole," because
all the information went into it, but little ever came back and ad hoc real-time querying was
virtually impossible.
Answer: TRUE
Diff: 2
Page Ref: 20
13) Successful BI is a tool for the information systems department, but is not exposed to the
larger organization.
Answer: FALSE
Diff: 2
Page Ref: 20
14) BI represents a bold new paradigm in which the company's business strategy must be aligned
to its business intelligence analysis initiatives.
Answer: FALSE
Diff: 2
Page Ref: 20-21
15) Traditional BI systems use a large volume of static data that has been extracted, cleansed,
and loaded into a data warehouse to produce reports and analyses.
Answer: TRUE
Diff: 2
Page Ref: 21
16) Demands for instant, on-demand access to dispersed information decrease as firms
successfully integrate BI into their operations.
Answer: FALSE
Diff: 3
Page Ref: 21
17) The use of dashboards and data visualizations is seldom effective in identifying issues in
organizations, as demonstrated by the Silvaris Corporation Case Study.
Answer: FALSE
Diff: 2
Page Ref: 24
18) The use of statistics in baseball by the Oakland Athletics, as described in the Moneyball case
study, is an example of the effectiveness of prescriptive analytics.
Answer: TRUE
Diff: 2
Page Ref: 5
19) Due to industry consolidation, the analytics ecosystem consists of only a handful of players
across several functional areas.
Answer: FALSE
Diff: 2
Page Ref: 38-39
20) Data generation is a precursor, and is not included in the analytics ecosystem.
Answer: FALSE
Diff: 1
Page Ref: 39
21) In the Opening Vignette on Sports Analytics, what was adjusted to drive one-time ticket
sales?
A) player selections
B) stadium location
C) fan tweets
D) ticket prices
Answer: D
Diff: 2
Page Ref: 6
22) In the Opening Vignette on Sports Analytics, what type of modeling was used to predict
offensive tactics?
A) heuristics
B) heat maps
C) cascaded decision trees
D) sentiment analysis
Answer: B
Diff: 3
Page Ref: 7
23) Business applications have moved from transaction processing and monitoring to other
activities. Which of the following is NOT one of those activities?
A) problem analysis
B) solution applications
C) data monitoring
D) mobile access
Answer: C
Diff: 2
Page Ref: 11
24) Which of the following developments is NOT contributing to facilitating growth of decision
support and analytics?
A) collaboration technologies
B) Big Data
C) knowledge management systems
D) locally concentrated workforces
Answer: D
Diff: 3
Page Ref: 11-12
25) In what decade did disjointed information systems begin to be integrated?
A) 1970s
B) 1980s
C) 1990s
D) 2000s
Answer: B
Diff: 2
Page Ref: 14
26) Relational databases began to be used in the
A) 1960s.
B) 1970s.
C) 1980s.
D) 1990s.
Answer: C
Diff: 3
Page Ref: 13
27) The need for more versatile reporting than what was available in 1980s era ERP systems led
to the development of what type of system?
A) management information systems
B) relational databases
C) executive information systems
D) data warehouses
Answer: C
Diff: 3
Page Ref: 14
28) Which of the following is an umbrella term that combines architectures, tools, databases,
analytical tools, applications, and methodologies?
A) MIS
B) DSS
C) ERP
D) BI
Answer: D
Diff: 1
Page Ref: 16
29) The competitive imperatives for BI include all of the following EXCEPT
A) right information
B) right user
C) right time
D) right place
Answer: B
Diff: 2
Page Ref: 16
30) Which of the following is NOT an example of transaction processing?
A) ATM withdrawal
B) bank deposit
C) sales report
D) cash register scans
Answer: C
Diff: 2
Page Ref: 19
31) Online transaction processing (OLTP) systems handle a company's routine ongoing business.
In contrast, a data warehouse is typically
A) the end result of BI processes and operations.
B) a repository of actionable intelligence obtained from a data mart.
C) a distinct system that provides storage for data that will be made use of in analysis.
D) an integral subsystem of an online analytical processing (OLAP) system.
Answer: C
Diff: 2
Page Ref: 19-20
32) The very design that makes an OLTP system efficient for transaction processing makes it
inefficient for
A) end-user ad hoc reports, queries, and analysis.
B) transaction processing systems that constantly update operational databases.
C) the collection of reputable sources of intelligence.
D) transactions such as ATM withdrawals, where we need to reduce a bank balance accordingly.
Answer: A
Diff: 2
Page Ref: 20
33) How are enterprise resources planning (ERP) systems related to supply chain management
(SCM) systems?
A) different terms for the same system
B) complementary systems
C) mutually exclusive systems
D) none of the above; these systems never interface
Answer: B
Diff: 2
Page Ref: 20
34) BI applications must be integrated with
A) databases.
B) legacy systems.
C) enterprise systems.
D) all of these
Answer: D
Diff: 2
Page Ref: 22
35) What has caused the growth of the demand for instant, on-demand access to dispersed
information?
A) the increasing divide between users who focus on the strategic level and those who are more
oriented to the tactical level
B) the need to create a database infrastructure that is always online and contains all the
information from the OLTP systems
C) the more pressing need to close the gap between the operational data and strategic objectives
D) the fact that BI cannot simply be a technical exercise for the information systems department
Answer: C
Diff: 3
Page Ref: 21
36) Today, many vendors offer diversified tools, some of which are completely preprogrammed
(called shells). How are these shells utilized?
A) They are used for customization of BI solutions.
B) All a user needs to do is insert the numbers.
C) The shell provides a secure environment for the organization's BI data.
D) They host an enterprise data warehouse that can assist in decision making.
Answer: B
Diff: 2
Page Ref: 21
37) What type of analytics seeks to recognize what is going on as well as the likely forecast and
make decisions to achieve the best performance possible?
A) descriptive
B) prescriptive
C) predictive
D) domain
Answer: B
Diff: 2
Page Ref: 24-27
38) What type of analytics seeks to determine what is likely to happen in the future?
A) descriptive
B) prescriptive
C) predictive
D) domain
Answer: C
Diff: 2
Page Ref: 24-27
39) Which of the following statements about Big Data is true?
A) Data chunks are stored in different locations on one computer.
B) Hadoop is a type of processor used to process Big Data applications.
C) MapReduce is a storage filing system.
D) Pure Big Data systems do not involve fault tolerance.
Answer: D
Diff: 3
Page Ref: 36
40) Big Data often involves a form of distributed storage and processing using Hadoop and
MapReduce. One reason for this is
A) centralized storage creates too many vulnerabilities.
B) the "Big" in Big Data necessitates over 10,000 processing nodes.
C) the processing power needed for the centralized model would overload a single computer.
D) Big Data systems have to match the geographical spread of social media.
Answer: C
Diff: 3
Page Ref: 36
41) Fundamental reasons for investing in BI must be ________ with the company's business
strategy.
Answer: aligned
Diff: 2
Page Ref: 20
42) Software monitors referred to as ________ can be placed on a separate server in the network
and use event- and process-based approaches to measure and monitor operational processes.
Answer: intelligent agents
Diff: 2
Page Ref: 21
43) For organizations using BI systems, the need to ________ the gap between the
operational data and strategic objectives has become more pressing.
Answer: close
Diff: 2
Page Ref: 21
44) ________ is an umbrella term that combines architectures, tools, databases, analytical tools,
applications, and methodologies.
Answer: Business intelligence (BI)
Diff: 2
Page Ref: 16
45) A(n) ________ is a major component of a Business Intelligence (BI) system that holds
source data.
Answer: data warehouse
Diff: 2
Page Ref: 11
46) A(n) ________ is a major component of a Business Intelligence (BI) system that is often
browser based and often presents a portal or dashboard.
Answer: user interface
Diff: 2
Page Ref: 17
47) ________ cycle times are now extremely compressed, faster, and more informed across
industries.
Answer: Business
Diff: 2
Page Ref: 16
48) Different types of players are identified and described in the analytics ________.
Answer: ecosystem
Diff: 2
Page Ref: 37
49) ________ providers focus on providing technology and services aimed toward integrating
data from multiple sources.
Answer: Data Warehouse
Diff: 2
Page Ref: 40
50) ________ providers focus on bringing all the data stores into an enterprise-wide platform.
Answer: Middleware
Diff: 2
Page Ref: 40
51) The user interface of a BI system is often referred to as a(n) ________.
Answer: dashboard
Diff: 2
Page Ref: 16
52) Data warehouses are intended to work with informational data used for online ________
processing systems.
Answer: analytical
Diff: 2
Page Ref: 20
53) With ________, all the data from every corner of the enterprise is collected and integrated
into a consistent schema so that every part of the organization has access to the single version of
the truth when and where needed.
Answer: Enterprise Resource Planning (ERP)
Diff: 2
Page Ref: 14
54) As the number of potential BI applications increases, the need to justify and prioritize them
arises. This is not an easy task due to the large number of ________ benefits.
Answer: intangible
Diff: 2
Page Ref: 22
55) ________ analytics help managers understand current events in the organization including
causes, trends, and patterns.
Answer: Descriptive
Diff: 2
Page Ref: 24
56) ________ analytics help managers understand probable future outcomes.
Answer: Predictive
Diff: 2
Page Ref: 25
57) ________ analytics help managers make decisions to achieve the best performance in the
future.
Answer: Prescriptive
Diff: 2
Page Ref: 26-27
58) The Google search engine is an example of Big Data in that it has to search and index
billions of ________ in fractions of a second for each search.
Answer: Web pages
Diff: 2
Page Ref: 36
59) The filing system developed by Google to handle Big Data storage challenges is known as
the ________ Distributed File System.
Answer: Hadoop
Diff: 2
Page Ref: 36
60) The programing algorithm developed by Google to handle Big Data computational
challenges is known as ________.
Answer: MapReduce
Diff: 2
Page Ref: 36
61) List four possible analytics applications in the retail value chain.
Answer:
• Inventory Optimization
• Price Elasticity
• Market Basket Analysis
• Shopper Insight
• Customer Churn Analysis
• Channel Analysis
• New Store Analysis
• Store Layout
• Video Analytics
Diff: 2
Page Ref: 34
62) What are the four major components of a Business Intelligence (BI) system?
Answer:
1. A data warehouse, with its source data
2. Business analytics, a collection of tools for manipulating, mining, and analyzing the data in
the data warehouse
3. Business performance management (BPM) for monitoring and analyzing performance
4. A user interface (e.g., a dashboard)
Diff: 3
Page Ref: 16
63) Why is data alone worthless?
Answer: Alone, data is worthless because it does not provide business value. To provide
business value, it has to be analyzed.
Diff: 2
Page Ref: 36
64) What is the intent of the analysis of data that is stored in a data warehouse?
Answer: The intent of the analysis is to give management the ability to analyze data for insights
into the business, and thus provide tactical or operational decision support whereby, for example,
line personnel can make quicker and/or more informed decisions.
Diff: 2
Page Ref: 19-20
65) Describe the three major subsets of the Analytics Focused Software Developers portion of
the Analytics Ecosystem.
Answer:
• Reporting/Descriptive Analytics — Includes tools enabled by and available from the
Middleware industry players, as well as unique capabilities offered by focused providers.
• Predictive Analytics — a rapidly growing area that includes a variety of statistical packages.
• Prescriptive Analytics — Software providers in this category offer modeling tools and
algorithms for optimization of operations usually called management science/operations research
software.
Diff: 3
Page Ref: 41-42
66) Business applications can be programmed to act on what real-time BI systems discover.
Describe two approaches to the implementation of real-time BI.
Answer:
• One approach to real-time BI uses the DW model of traditional BI systems. In this case,
products from innovative BI platform providers provide a service-oriented, near–real-time
solution that populates the DW much faster than the typical nightly extract/transfer/load (ETL)
batch update does.
• A second approach, commonly called business activity management (BAM), is adopted by
pure-play BAM and/or hybrid BAM-middleware providers (such as Savvion, Iteration Software,
Vitria, webMethods, Quantive, Tibco, or Vineyard Software). It bypasses the DW entirely and
uses Web services or other monitoring means to discover key business events. These software
monitors (or intelligent agents) can be placed on a separate server in the network or on the
transactional application databases themselves, and they can use event- and process-based
approaches to proactively and intelligently measure and monitor operational processes.
Diff: 3
Page Ref: 21
67) List and describe three levels or categories of analytics that are most often viewed as
sequential and independent, but also occasionally seen as overlapping.
Answer:
• Descriptive or reporting analytics refers to knowing what is happening in the organization
and understanding some underlying trends and causes of such occurrences.
• Predictive analytics aims to determine what is likely to happen in the future. This analysis is
based on statistical techniques as well as other more recently developed techniques that fall
under the general category of data mining.
• Prescriptive analytics recognizes what is going on as well as the likely forecast and makes
decisions to achieve the best performance possible.
Diff: 3
Page Ref: 24-27
68) How does Amazon.com use predictive analytics to respond to product searches by the
customer?
Answer: Amazon uses clustering algorithms to segment customers into different clusters to be
able to target specific promotions to them. The company also uses association mining techniques
to estimate relationships between different purchasing behaviors. That is, if a customer buys one
product, what else is the customer likely to purchase? That helps Amazon recommend or
promote related products. For example, any product search on Amazon.com results in the retailer
also suggesting other similar products that may interest a customer.
Diff: 3
Page Ref: 26
69) Describe and define Big Data. Why is a search engine a Big Data application?
Answer:
• Big Data is data that cannot be stored in a single storage unit. Big Data typically refers to
data that is arriving in many different forms, be they structured, unstructured, or in a stream.
Major sources of such data are clickstreams from Web sites, postings on social media sites such
as Facebook, or data from traffic, sensors, or weather.
• A Web search engine such as Google needs to search and index billions of Web pages in
order to give you relevant search results in a fraction of a second. Although this is not done in
real time, generating an index of all the Web pages on the Internet is not an easy task.
Diff: 3
Page Ref: 35-36
70) What storage system and processing algorithm were developed by Google for Big Data?
Answer:
• Google developed and released as an Apache project the Hadoop Distributed File System
(HDFS) for storing large amounts of data in a distributed way.
• Google developed and released as an Apache project the MapReduce algorithm for pushing
computation to the data, instead of pushing data to a computing node.
Diff: 3
Page Ref: 36
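The idea in the answer above — push computation to the data rather than data to a computing node — can be illustrated with a toy word count, the canonical MapReduce example. This is a minimal single-process sketch of the map and reduce phases, not Hadoop's actual API; the document chunks are made up for illustration.

```python
from collections import defaultdict

def map_phase(document):
    """Map step: emit a (word, 1) pair for each word in one data chunk."""
    return [(word.lower(), 1) for word in document.split()]

def reduce_phase(pairs):
    """Reduce step: sum the counts for each key."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# Each "node" maps its own chunk locally (computation pushed to the data);
# only the small intermediate pairs are shuffled to the reducers.
chunks = ["Big Data needs distributed processing",
          "MapReduce pushes computation to the data"]
intermediate = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = reduce_phase(intermediate)
print(word_counts["data"])  # 2 — "Data" and "data" are both lowercased
```

In a real Hadoop cluster the map tasks run on the nodes holding the HDFS blocks, and only the intermediate key-value pairs cross the network.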
Business Intelligence, 4e (Sharda/Delen/Turban)
Chapter 2 Descriptive Analytics I: Nature of Data, Statistical Modeling,
and Visualization
1) One of SiriusXM's challenges was tracking potential customers when cars were sold.
Answer: TRUE
Diff: 1
Page Ref: 54
2) To respond to its market challenges, SiriusXM decided to focus on manufacturing efficiency.
Answer: FALSE
Diff: 2
Page Ref: 55
3) Data is the contextualization of information, that is, information set in context.
Answer: FALSE
Diff: 1
Page Ref: 98
4) Data is the main ingredient for any BI, data science, and business analytics initiative.
Answer: TRUE
Diff: 2
Page Ref: 57
5) Predictive algorithms generally require a flat file with a target variable, so making data
analytics ready for prediction means that data sets must be transformed into a flat-file format and
made ready for ingestion into those predictive algorithms.
Answer: TRUE
Diff: 1
Page Ref: 58
6) The data storage component of a business reporting system builds the various reports and
hosts them for, or disseminates them to users. It also provides notification, annotation,
collaboration, and other services.
Answer: FALSE
Diff: 2
Page Ref: 98
7) In the FEMA case study, the BureauNet software was the primary reason behind the increased
speed and relevance of the reports FEMA employees received.
Answer: TRUE
Diff: 2
Page Ref: 100
8) Google Maps has set new standards for data visualization with its intuitive Web mapping
software.
Answer: TRUE
Diff: 2
Page Ref: 103
9) There are basic chart types and specialized chart types. A Gantt chart is a specialized chart
type.
Answer: TRUE
Diff: 2
Page Ref: 107
10) Visualization differs from traditional charts and graphs in complexity of data sets and use of
multiple dimensions and measures.
Answer: TRUE
Diff: 2
Page Ref: 110
11) When telling a story during a presentation, it is best to avoid describing hurdles that your
character must overcome, to avoid souring the mood.
Answer: FALSE
Diff: 2
Page Ref: 113
12) Visual analytics is aimed at answering, "What is happening?" and is usually associated
with business analytics.
Answer: FALSE
Diff: 3
Page Ref: 112
13) Dashboards provide visual displays of important information that is consolidated and
arranged across several screens to maintain data order.
Answer: FALSE
Diff: 2
Page Ref: 117
14) In the Dallas Cowboys case study, the focus was on using data analytics to decide which
players would play every week.
Answer: FALSE
Diff: 2
Page Ref: 118
15) Data source reliability means that data are correct and are a good match for the analytics
problem.
Answer: FALSE
Diff: 1
Page Ref: 59
16) Data accessibility means that the data are easily and readily obtainable.
Answer: TRUE
Diff: 3
Page Ref: 59
17) Structured data is what data mining algorithms use and can be classified as categorical or
numeric.
Answer: TRUE
Diff: 2
Page Ref: 61
18) Interval data are variables that can be measured on interval scales.
Answer: TRUE
Diff: 2
Page Ref: 62
19) Nominal data represent the labels of multiple classes used to divide a variable into specific
groups.
Answer: FALSE
Diff: 2
Page Ref: 61
20) Descriptive statistics is all about describing the sample data on hand.
Answer: TRUE
Diff: 2
Page Ref: 75
21) Which characteristic of data means that all the required data elements are included in the data
set?
A) data source reliability
B) data accessibility
C) data richness
D) data granularity
Answer: C
Diff: 2
Page Ref: 59-60
22) Key performance indicators (KPIs) are metrics typically used to measure
A) database responsiveness.
B) qualitative feedback.
C) external results.
D) internal results.
Answer: D
Diff: 2
Page Ref: 99
23) Kaplan and Norton developed a report that presents an integrated view of success in the
organization called
A) metric management reports.
B) balanced scorecard-type reports.
C) dashboard-type reports.
D) visual reports.
Answer: B
Diff: 2
Page Ref: 99
24) Which characteristic of data requires that the variables and data values be defined at the
lowest (or as low as required) level of detail for the intended use of the data?
A) data source reliability
B) data accessibility
C) data richness
D) data granularity
Answer: D
Diff: 2
Page Ref: 59-60
25) Which of the following is LEAST related to data/information visualization?
A) information graphics
B) scientific visualization
C) statistical graphics
D) graphic artwork
Answer: D
Diff: 2
Page Ref: 101
26) The Internet emerged as a new medium for visualization and brought all the following
EXCEPT
A) worldwide digital distribution of visualization.
B) immersive environments for consuming data.
C) new forms of computation of business logic.
D) new graphics displays through PC displays.
Answer: C
Diff: 2
Page Ref: 101-103
27) Which kind of chart is described as an enhanced version of a scatter plot?
A) heat map
B) bullet
C) pie chart
D) bubble chart
Answer: D
Diff: 3
Page Ref: 107
28) Which type of visualization tool can be very helpful when the intention is to show relative
proportions of dollars per department allocated by a university administration?
A) heat map
B) bullet
C) pie chart
D) bubble chart
Answer: C
Diff: 3
Page Ref: 106
29) Which type of visualization tool can be very helpful when a data set contains location data?
A) bar chart
B) geographic map
C) highlight table
D) tree map
Answer: B
Diff: 2
Page Ref: 107
30) Which type of question does visual analytics seeks to answer?
A) Why is it happening?
B) What happened yesterday?
C) What is happening today?
D) When did it happen?
Answer: A
Diff: 2
Page Ref: 112
31) When you tell a story in a presentation, all of the following are true EXCEPT
A) a story should make sense and order out of a lot of background noise.
B) a well-told story should have no need for subsequent discussion.
C) stories and their lessons should be easy to remember.
D) the outcome and reasons for it should be clear at the end of your story.
Answer: B
Diff: 2
Page Ref: 113
32) Benefits of the latest visual analytics tools, such as SAS Visual Analytics, include all of the
following EXCEPT
A) mobile platforms such as the iPhone are supported by these products.
B) it is easier to spot useful patterns and trends in the data.
C) they explore massive amounts of data in hours, not days.
D) there is less demand on IT departments for reports.
Answer: C
Diff: 2
Page Ref: 115
33) What is the management feature of a dashboard?
A) operational data that identify what actions to take to resolve a problem
B) summarized dimensional data to analyze the root cause of problems
C) summarized dimensional data to monitor key performance metrics
D) graphical, abstracted data to monitor key performance metrics
Answer: A
Diff: 3
Page Ref: 119
34) What is the fundamental challenge of dashboard design?
A) ensuring that users across the organization have access to it
B) ensuring that the organization has the appropriate hardware onsite to support it
C) ensuring that the organization has access to the latest Web browsers
D) ensuring that the required information is shown clearly on a single screen
Answer: D
Diff: 3
Page Ref: 119
35) Contextual metadata for a dashboard includes all the following EXCEPT
A) whether any high-value transactions that would skew the overall trends were rejected as a part
of the loading process.
B) which operating system is running the dashboard server software.
C) whether the dashboard is presenting "fresh" or "stale" information.
D) when the data warehouse was last refreshed.
Answer: B
Diff: 2
Page Ref: 121
36) Dashboards can be presented at all the following levels EXCEPT
A) the visual dashboard level.
B) the static report level.
C) the visual cube level.
D) the self-service cube level.
Answer: C
Diff: 2
Page Ref: 122
37) This measure of central tendency is the sum of all the values/observations divided by the
number of observations in the data set.
A) dispersion
B) mode
C) median
D) arithmetic mean
Answer: D
Diff: 3
Page Ref: 76
38) This measure of dispersion is calculated by simply taking the square root of the variations.
A) standard deviation
B) range
C) variance
D) arithmetic mean
Answer: A
Diff: 2
Page Ref: 78
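The two definitions tested above — the arithmetic mean as the sum of observations divided by their count, and the standard deviation as the square root of the variance — can be verified numerically. A minimal sketch with made-up data, cross-checked against Python's statistics module:

```python
import statistics

data = [4, 8, 6, 5, 3, 7, 9]

# Arithmetic mean: sum of all values divided by the number of observations.
mean = sum(data) / len(data)

# Sample variance: average squared deviation from the mean (n - 1 denominator).
variance = sum((x - mean) ** 2 for x in data) / (len(data) - 1)

# Standard deviation: the square root of the variance.
std_dev = variance ** 0.5

assert mean == statistics.mean(data)          # 6.0
assert abs(std_dev - statistics.stdev(data)) < 1e-9
```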
39) This plot is a graphical illustration of several descriptive statistics about a given data set.
A) pie chart
B) bar graph
C) box-and-whiskers plot
D) kurtosis
Answer: C
Diff: 3
Page Ref: 79
40) This technique makes no a priori assumption of whether one variable is dependent on the
other(s) and is not concerned with the relationship between variables; instead it gives an estimate
on the degree of association between the variables.
A) regression
B) correlation
C) means test
D) multiple regression
Answer: B
Diff: 2
Page Ref: 86
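The distinction in the question above — correlation estimates the degree of association without assuming either variable depends on the other — can be sketched with Pearson's r. The data values here are hypothetical.

```python
import math

def pearson_correlation(x, y):
    """Pearson's r: degree of linear association between two variables,
    with no a priori assumption that one depends on the other."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

hours_studied = [1, 2, 3, 4, 5]     # illustrative values
exam_score = [52, 58, 63, 71, 76]
r = pearson_correlation(hours_studied, exam_score)
print(round(r, 3))  # close to +1: strong positive association
```

Note that r is symmetric in its arguments, which is precisely why it measures association rather than dependence.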
41) A(n) ________ is a communication artifact, concerning business matters, prepared with the
specific intention of relaying information in a presentable form.
Answer: report
Diff: 2
Page Ref: 98
42) ________ statistics is about drawing conclusions about the characteristics of the population.
Answer: Inferential
Diff: 2
Page Ref: 75
43) Due to the ________ expansion of information technology coupled with the need for
improved competitiveness in business, there has been an increase in the use of computing power
to produce unified reports that join different views of the enterprise in one place.
Answer: rapid
Diff: 3
Page Ref: 98
44) ________ management reports are used to manage business performance through
outcome-oriented metrics in many organizations.
Answer: Metric
Diff: 2
Page Ref: 99
45) When validating the assumptions of a regression, ________ assumes that the relationship
between the response variable and the explanatory variables is linear.
Answer: linearity
Diff: 2
Page Ref: 89
46) ________ regression is a very popular, statistically sound, probability-based classification
algorithm that employs supervised learning.
Answer: Logistic
Diff: 2
Page Ref: 90
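Logistic regression, the answer above, is probability-based: it maps a weighted sum of inputs through the sigmoid function to a class probability. A minimal scoring sketch, assuming coefficients already fit by supervised learning (the weights and input values here are hypothetical, not from the text):

```python
import math

def sigmoid(z):
    """Squash any real number into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, weights, bias):
    """Logistic regression output: probability of the positive class."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)

# Hypothetical coefficients, as if already estimated from labeled data.
weights, bias = [0.8, -1.2], -0.5
p = predict_proba([2.0, 0.5], weights, bias)   # z = 0.5, p ≈ 0.62
label = 1 if p >= 0.5 else 0                    # classify at the 0.5 threshold
```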
47) ________ charts are useful in displaying nominal data or numerical data that splits nicely
into different categories so you can quickly see comparative results and trends.
Answer: Bar
Diff: 1
Page Ref: 106
48) ________ charts or network diagrams show precedence relationships among the project
activities/tasks.
Answer: PERT
Diff: 1
Page Ref: 107
49) ________ are typically used together with other charts and graphs, as opposed to by
themselves, and show postal codes, country names, etc.
Answer: Maps
Diff: 1
Page Ref: 107
50) Typical charts, graphs, and other visual elements used in visualization-based applications
usually involve ________ dimensions.
Answer: two
Diff: 2
Page Ref: 110
51) Visual analytics is widely regarded as the combination of visualization and ________
analytics.
Answer: predictive
Diff: 2
Page Ref: 112
52) Dashboards present visual displays of important information that are consolidated and
arranged on a single ________.
Answer: screen
Diff: 1
Page Ref: 117
53) With dashboards, the layer of information that uses graphical, abstracted data to keep tabs on
key performance metrics is the ________ layer.
Answer: monitoring
Diff: 2
Page Ref: 119
54) ________ series forecasting is the use of mathematical modeling to predict future values of
the variable of interest based on previously observed values.
Answer: Time
Diff: 1
Page Ref: 97
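Time-series forecasting as defined above — predicting future values of a variable from previously observed values — can be sketched with the simplest such model, a moving average. The monthly figures are illustrative.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations,
    one of the simplest time-series models built on previously observed values."""
    recent = series[-window:]
    return sum(recent) / len(recent)

monthly_sales = [120, 132, 128, 141, 150, 147]  # hypothetical data
next_month = moving_average_forecast(monthly_sales)
print(next_month)  # mean of the last three observations: 146.0
```

More sophisticated models (exponential smoothing, ARIMA) weight past observations differently, but all share this structure of extrapolating from observed history.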
55) Information dashboards enable ________ operations that allow the users to view underlying
data sources and obtain more detail.
Answer: drill-down/drill-through
Diff: 2
Page Ref: 121
56) With a dashboard, information on sources of the data being presented, the quality and
currency of underlying data provide contextual ________ for users.
Answer: metadata
Diff: 2
Page Ref: 121
57) When validating the assumptions of a regression, ________ assumes that the errors of the
response variable are normally distributed.
Answer: normality
Diff: 2
Page Ref: 89-90
58) ________ charts are effective when you have nominal data or numerical data that splits
nicely into different categories so you can quickly see comparative results and trends within your
data.
Answer: Bar
Diff: 1
Page Ref: 106
59) ________ plots are often used to explore the relationship between two or three variables (in
2-D or 3-D visuals).
Answer: Scatter
Diff: 2
Page Ref: 106
60) ________ charts are a special case of horizontal bar charts that are used to portray project
timelines, project tasks/activity durations, and overlap among the tasks/activities.
Answer: Gantt
Diff: 2
Page Ref: 107
61) List and describe the three major categories of business reports.
Answer:
• Metric management reports. Many organizations manage business performance through
outcome-oriented metrics. For external groups, these are service-level agreements (SLAs). For
internal management, they are key performance indicators (KPIs).
• Dashboard-type reports. This report presents a range of different performance indicators on
one page, like a dashboard in a car. Typically, there is a set of predefined reports with static
elements and fixed structure, but customization of the dashboard is allowed through widgets,
views, and set targets for various metrics.
• Balanced scorecard–type reports. This is a method developed by Kaplan and Norton that
attempts to present an integrated view of success in an organization. In addition to financial
performance, balanced scorecard–type reports also include customer, business process, and
learning and growth perspectives.
Diff: 2
Page Ref: 99
62) List five types of specialized charts and graphs.
Answer:
• Histograms
• Gantt charts
• PERT charts
• Geographic maps
• Bullets
• Heat maps
• Highlight tables
• Tree maps
Diff: 2
Page Ref: 107-108
63) According to Eckerson (2006), a well-known expert on BI dashboards, what are the three
layers of information of a dashboard?
Answer:
1. Monitoring. Graphical, abstracted data to monitor key performance metrics.
2. Analysis. Summarized dimensional data to analyze the root cause of problems.
3. Management. Detailed operational data that identify what actions to take to resolve a
problem.
Diff: 2
Page Ref: 119
64) List the five most common functions of business reports.
Answer:
• To ensure that all departments are functioning properly
• To provide information
• To provide the results of an analysis
• To persuade others to act
• To create an organizational memory (as part of a knowledge management system)
Diff: 2
Page Ref: 98
65) What are the most important assumptions in linear regression?
Answer:
1. Linearity. This assumption states that the relationship between the response variable and the
explanatory variables is linear. That is, the expected value of the response variable is a straight-line function of each explanatory variable, while holding all other explanatory variables fixed.
Also, the slope of the line does not depend on the values of the other variables. It also implies
that the effects of different explanatory variables on the expected value of the response variable
are additive in nature.
2. Independence (of errors). This assumption states that the errors of the response variable are
uncorrelated with each other. This independence of the errors is weaker than actual statistical
independence, which is a stronger condition and is often not needed for linear regression
analysis.
3. Normality (of errors). This assumption states that the errors of the response variable are
normally distributed. That is, they are supposed to be totally random and should not represent
any nonrandom patterns.
4. Constant variance (of errors). This assumption, also called homoscedasticity, states that the
response variables have the same variance in their error, regardless of the values of the
explanatory variables. In practice this assumption is invalid if the response variable varies over a
wide enough range/scale.
5. Multicollinearity. This assumption states that the explanatory variables are not correlated (i.e.,
they do not replicate the same information but each provides a different perspective on the
information needed for the model). Multicollinearity can be triggered by having two or more
perfectly correlated explanatory variables presented to the model (e.g., if the same explanatory
variable is mistakenly included in the model twice, once with a slight transformation). A
correlation-based data assessment usually catches this error.
Diff: 2
Page Ref: 89-90
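Several of these assumptions can be screened with simple residual diagnostics. A minimal sketch in plain Python (the residual values are hypothetical) checking the zero-mean symptom and a crude version of the constant-variance (homoscedasticity) assumption:

```python
import statistics

# Hypothetical residuals from a fitted regression model, in data order
residuals = [0.2, -0.1, 0.3, -0.2, 0.1, -0.3, 0.25, -0.15]

# Zero-mean symptom: well-behaved errors should average near zero
mean_res = statistics.mean(residuals)

# Crude constant-variance check: compare the spread of residuals in the
# first and second halves of the data; a ratio far from 1.0 hints at
# heteroscedasticity (variance changing with the explanatory variables)
half = len(residuals) // 2
spread_lo = statistics.pstdev(residuals[:half])
spread_hi = statistics.pstdev(residuals[half:])
ratio = spread_hi / spread_lo
```

In practice these visual/numeric screens are complemented by formal tests, but a residual plot plus checks like these catch the most common violations.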
66) Describe the difference between simple and multiple regression.
Answer: If the regression equation is built between one response variable and one explanatory
variable, then it is called simple regression. Multiple regression is the extension of simple
regression where the explanatory variables are more than one.
Diff: 2
Page Ref: 87
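The simple-regression case has a closed-form solution, which the following plain-Python sketch implements (the data are hypothetical); multiple regression generalizes the same least-squares idea to several explanatory variables, typically via matrix algebra:

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x (simple regression, closed form)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    b0 = my - b1 * mx
    return b0, b1

# Hypothetical data following roughly y = 2x
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_line(x, y)  # b1 should be close to 2
```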
67) Describe the difference between descriptive and inferential statistics.
Answer: The main difference between descriptive and inferential statistics lies in their goal and
in the data they speak about: descriptive statistics describes the sample data on hand, whereas
inferential statistics draws inferences or conclusions about the characteristics of the population
from which the sample was taken.
Diff: 2
Page Ref: 75
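The distinction can be made concrete with a few lines of plain Python (the sample values are hypothetical): summary measures describe the sample itself, while a confidence interval is an inferential statement about the unseen population:

```python
import math
import statistics

# Hypothetical sample of 20 customer order values
sample = [23, 25, 31, 28, 22, 27, 30, 26, 24, 29,
          25, 28, 27, 23, 26, 31, 29, 24, 27, 28]

# Descriptive statistics: describe the sample on hand
mean = statistics.mean(sample)
sd = statistics.stdev(sample)

# Inferential statistics: infer something about the population,
# e.g. an approximate 95% confidence interval for the population mean
# (normal approximation, for illustration only)
se = sd / math.sqrt(len(sample))
ci = (mean - 1.96 * se, mean + 1.96 * se)
```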
68) Describe categorical and nominal data.
Answer: Categorical data represent the labels of multiple classes used to divide a variable into
specific groups. Examples of categorical variables include race, sex, age group, and educational
level. Nominal data consist of simple codes assigned to objects as labels; the codes themselves
are not measurements. For example, the variable marital status can be generally categorized as
(1) single, (2) married, and (3) divorced.
Diff: 2
Page Ref: 61
Business Intelligence, 4e (Sharda/Delen/Turban)
Chapter 3 Descriptive Analytics II: Business Intelligence and Data Warehousing
1) The BPM development cycle is essentially a one-shot process where the requirement is to get
it right the first time.
Answer: FALSE
Diff: 2
Page Ref: 170
2) The "islands of data" problem in the 1980s describes the phenomenon of unconnected data
being stored in numerous locations within an organization.
Answer: TRUE
Diff: 2
Page Ref: 132
3) Subject oriented databases for data warehousing are organized by detailed subjects such as
disk drives, computers, and networks.
Answer: FALSE
Diff: 2
Page Ref: 133
4) Data warehouses are subsets of data marts.
Answer: FALSE
Diff: 1
Page Ref: 134
5) One way an operational data store differs from a data warehouse is the recency of their data.
Answer: TRUE
Diff: 2
Page Ref: 135
6) Organizations seldom devote a lot of effort to creating metadata because it is not important for
the effective use of data warehouses.
Answer: FALSE
Diff: 2
Page Ref: 135
7) Without middleware, different BI programs cannot easily connect to the data warehouse.
Answer: TRUE
Diff: 2
Page Ref: 139
8) Two-tier data warehouse/BI infrastructures offer organizations more flexibility but cost more
than three-tier ones.
Answer: FALSE
Diff: 2
Page Ref: 140
9) Moving the data into a data warehouse is usually the easiest part of its creation.
Answer: FALSE
Diff: 2
Page Ref: 141
10) The hub-and-spoke data warehouse model uses a centralized warehouse feeding dependent
data marts.
Answer: TRUE
Diff: 2
Page Ref: 142
11) Because of performance and data quality issues, most experts agree that the federated
architecture should supplement data warehouses, not replace them.
Answer: TRUE
Diff: 2
Page Ref: 144
12) Bill Inmon advocates the data mart bus architecture whereas Ralph Kimball promotes the
hub-and-spoke architecture, a data mart bus architecture with conformed dimensions.
Answer: FALSE
Diff: 2
Page Ref: 144
13) Properly integrating data from various databases and other disparate sources is a trivial
process.
Answer: FALSE
Diff: 3
Page Ref: 146
14) With key performance indicators, driver KPIs have a significant effect on outcome KPIs, but
the reverse is not necessarily true.
Answer: TRUE
Diff: 2
Page Ref: 176
15) With the balanced scorecard approach, the entire focus is on measuring and managing
specific financial goals based on the organization's strategy.
Answer: FALSE
Diff: 2
Page Ref: 177
16) OLTP systems are designed to handle ad hoc analysis and complex queries that deal with
many data items.
Answer: FALSE
Diff: 2
Page Ref: 158
17) The data warehousing maturity model consists of six stages: prenatal, infant, child, teenager,
adult, and sage.
Answer: TRUE
Diff: 2
Page Ref: 160-161
18) User-initiated navigation of data through disaggregation is referred to as "drill up."
Answer: FALSE
Diff: 3
Page Ref: 159
19) Data warehouse administrators (DWAs) do not need strong business insight since they only
handle the technical aspect of the infrastructure.
Answer: FALSE
Diff: 2
Page Ref: 164
20) Because the recession has raised interest in low-cost open source software, it is now set to
replace traditional enterprise software.
Answer: FALSE
Diff: 2
Page Ref: 165
21) Why is a performance management system superior to a performance measurement system?
A) because performance measurement systems are only in their infancy
B) because measurement automatically leads to problem solution
C) because performance management systems cost more
D) because measurement alone has little use without action
Answer: D
Diff: 3
Page Ref: 176-177
22) Operational or transaction databases are product oriented, handling transactions that update
the database. In contrast, data warehouses are
A) subject-oriented and nonvolatile.
B) product-oriented and nonvolatile.
C) product-oriented and volatile.
D) subject-oriented and volatile.
Answer: A
Diff: 3
Page Ref: 131
23) Which kind of data warehouse is created separately from the enterprise data warehouse by a
department and not reliant on it for updates?
A) sectional data mart
B) public data mart
C) independent data mart
D) volatile data mart
Answer: C
Diff: 2
Page Ref: 134
24) Oper marts are created when operational data needs to be analyzed
A) linearly.
B) in a dashboard.
C) unidimensionally.
D) multidimensionally.
Answer: D
Diff: 2
Page Ref: 135
25) A Web client that connects to a Web server, which is in turn connected to a BI application
server, is reflective of a
A) one-tier architecture.
B) two-tier architecture.
C) three-tier architecture.
D) four-tier architecture.
Answer: C
Diff: 2
Page Ref: 139
26) Which of the following BEST enables a data warehouse to handle complex queries and scale
up to handle many more requests?
A) use of the Web by users as a front-end
B) parallel processing
C) Microsoft Windows
D) a larger IT staff
Answer: B
Diff: 3
Page Ref: 141
27) Which data warehouse architecture uses metadata from existing data warehouses to create a
hybrid logical data warehouse comprised of data from the other warehouses?
A) independent data marts architecture
B) centralized data warehouse architecture
C) hub-and-spoke data warehouse architecture
D) federated architecture
Answer: D
Diff: 3
Page Ref: 142
28) Which data warehouse architecture uses a normalized relational warehouse that feeds
multiple data marts?
A) independent data marts architecture
B) centralized data warehouse architecture
C) hub-and-spoke data warehouse architecture
D) federated architecture
Answer: C
Diff: 3
Page Ref: 142
29) Which approach to data warehouse integration focuses more on sharing process functionality
than data across systems?
A) extraction, transformation, and load
B) enterprise application integration
C) enterprise information integration
D) enterprise function integration
Answer: B
Diff: 3
Page Ref: 147
30) ________ is an evolving tool space that promises real-time data integration from a variety of
sources, such as relational databases, Web services, and multidimensional databases.
A) Enterprise information integration (EII)
B) Enterprise application integration (EAI)
C) Extraction, transformation, and load (ETL)
D) None of these
Answer: A
Diff: 3
Page Ref: 148
31) In which stage of extraction, transformation, and load (ETL) into a data warehouse are
anomalies detected and corrected?
A) transformation
B) extraction
C) load
D) cleanse
Answer: D
Diff: 3
Page Ref: 149
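The cleansing stage can be illustrated with a small plain-Python sketch (the records and rules are hypothetical) that detects anomalies, corrects the correctable ones, and rejects the rest before loading:

```python
# Hypothetical raw records extracted from a source system
raw = [
    {"customer": "  Alice ", "state": "ny", "amount": "120.50"},
    {"customer": "Bob",      "state": "NY", "amount": "-15"},   # anomalous amount
    {"customer": "Carol",    "state": "ca", "amount": "88.00"},
]

def cleanse(record):
    """Return a cleansed record, or None if the anomaly cannot be corrected."""
    amount = float(record["amount"])
    if amount < 0:  # negative sale amount: reject the record
        return None
    return {
        "customer": record["customer"].strip(),  # trim stray whitespace
        "state": record["state"].upper(),        # standardize state codes
        "amount": round(amount, 2),
    }

clean = [r for r in (cleanse(rec) for rec in raw) if r is not None]
```

Real ETL tools apply the same pattern at scale, with rule libraries and rejected-record logging instead of hand-written functions.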
32) Data warehouses provide direct and indirect benefits to organizations. Which of the
following is an indirect benefit of data warehouses?
A) better and more timely information
B) extensive new analyses performed by users
C) simplified access to data
D) improved customer service
Answer: D
Diff: 3
Page Ref: 150
33) All of the following are benefits of hosted data warehouses EXCEPT
A) smaller upfront investment.
B) better quality hardware.
C) greater control of data.
D) frees up in-house systems.
Answer: C
Diff: 2
Page Ref: 157
34) When representing data in a data warehouse, using several dimension tables that are each
connected only to a fact table means you are using which warehouse structure?
A) star schema
B) snowflake schema
C) relational schema
D) dimensional schema
Answer: A
Diff: 3
Page Ref: 157
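A star schema can be sketched directly in SQL. The following minimal example (using Python's built-in sqlite3, with hypothetical tables and data) builds one fact table whose keys point at two dimension tables, then runs the typical join-and-aggregate query:

```python
import sqlite3

# Minimal star schema: one fact table, each dimension joined only to the fact
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY,
                              year INTEGER, month INTEGER);
    CREATE TABLE fact_sales  (product_key INTEGER, date_key INTEGER,
                              amount REAL);
""")
con.executemany("INSERT INTO dim_product VALUES (?, ?)",
                [(1, "Widget"), (2, "Gadget")])
con.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                [(10, 2017, 1), (11, 2017, 2)])
con.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                [(1, 10, 100.0), (1, 11, 150.0), (2, 10, 80.0)])

# Typical star-schema query: aggregate facts by dimension attributes
rows = con.execute("""
    SELECT p.name, d.year, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON f.product_key = p.product_key
    JOIN dim_date    d ON f.date_key    = d.date_key
    GROUP BY p.name, d.year
    ORDER BY p.name
""").fetchall()
```

Adding d.month to the SELECT and GROUP BY clauses turns the yearly summary into a monthly one, i.e., a drill down along the date dimension.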
35) When querying a dimensional database, a user went from summarized data to its underlying
details. The function that served this purpose is
A) dice.
B) slice.
C) roll-up.
D) drill down.
Answer: D
Diff: 3
Page Ref: 159
36) What is Six Sigma?
A) a letter in the Greek alphabet that statisticians use to measure process variability
B) a methodology aimed at reducing the number of defects in a business process
C) a methodology aimed at reducing the amount of variability in a business process
D) a methodology aimed at measuring the amount of variability in a business process
Answer: B
Diff: 2
Page Ref: 180
37) Real-time data warehousing can be used to support the highest level of decision making
sophistication and power. The major feature that enables this in relation to handling the data is
A) country of (data) origin.
B) nature of the data.
C) speed of data transfer.
D) source of the data.
Answer: C
Diff: 2
Page Ref: 168
38) A large storage location that can hold vast quantities of data (mostly unstructured) in its
native/raw format for future/potential analytics consumption is referred to as a(n)
A) extended ASP.
B) data cloud.
C) data lake.
D) relational database.
Answer: C
Diff: 3
Page Ref: 166
39) How does the use of cloud computing affect the scalability of a data warehouse?
A) Cloud computing vendors bring as much hardware as needed to users' offices.
B) Hardware resources are dynamically allocated as use increases.
C) Cloud vendors are mostly based overseas where the cost of labor is low.
D) Cloud computing has little effect on a data warehouse's scalability.
Answer: B
Diff: 3
Page Ref: 165-166
40) All of the following are true about in-database processing technology EXCEPT
A) it pushes the algorithms to where the data is.
B) it makes the response to queries much faster than conventional databases.
C) it is often used for apps like credit card fraud detection and investment risk management.
D) it is the same as in-memory storage technology.
Answer: D
Diff: 3
Page Ref: 169
41) A(n) ________ data store (ODS) provides a fairly recent form of customer information file.
Answer: operational
Diff: 2
Page Ref: 135
42) In ________ oriented data warehousing, operational databases are tuned to handle
transactions that update the database.
Answer: product
Diff: 2
Page Ref: 134
43) The three main types of data warehouses are data marts, operational ________, and
enterprise data warehouses.
Answer: data stores
Diff: 2
Page Ref: 134
44) ________ describe the structure and meaning of the data, contributing to their effective use.
Answer: Metadata
Diff: 1
Page Ref: 135
45) Most data warehouses are built using ________ database management systems to control and
manage the data.
Answer: relational
Diff: 2
Page Ref: 141
46) A(n) ________ architecture is used to build a scalable and maintainable infrastructure that
includes a centralized data warehouse and several dependent data marts.
Answer: hub-and-spoke
Diff: 2
Page Ref: 142
47) The ________ data warehouse architecture involves integrating disparate systems and
analytical resources from multiple sources to meet changing needs or business conditions.
Answer: federated
Diff: 2
Page Ref: 142
48) Data ________ comprises data access, data federation, and change capture.
Answer: integration
Diff: 3
Page Ref: 146
49) ________ is a mechanism that integrates application functionality and shares functionality
(rather than data) across systems, thereby enabling flexibility and reuse.
Answer: Enterprise application integration (EAI)
Diff: 3
Page Ref: 147
50) ________ is a mechanism for pulling data from source systems to satisfy a request for
information. It is an evolving tool space that promises real-time data integration from a variety of
sources, such as relational databases, Web services, and multidimensional databases.
Answer: Enterprise information integration (EII)
Diff: 3
Page Ref: 148
51) Performing extensive ________ to move data to the data warehouse may be a sign of poorly
managed data and a fundamental lack of a coherent data management strategy.
Answer: extraction, transformation, and load (ETL)
Diff: 3
Page Ref: 149
52) The ________ Model, also known as the EDW approach, emphasizes top-down
development, employing established database development methodologies and tools, such as
entity-relationship diagrams (ERD), and an adjustment of the spiral development approach.
Answer: Inmon
Diff: 2
Page Ref: 153-154
53) The ________ Model, also known as the data mart approach, is a "plan big, build small"
approach. A data mart is a subject-oriented or department-oriented data warehouse. It is a scaled-down version of a data warehouse that focuses on the requests of a specific department, such as
marketing or sales.
Answer: Kimball
Diff: 2
Page Ref: 154
54) ________ modeling is a retrieval-based system that supports high-volume query access.
Answer: Dimensional
Diff: 2
Page Ref: 156
55) A(n) ________ data mart is a subset that is created directly from the data warehouse.
Answer: dependent
Diff: 1
Page Ref: 134
56) Online ________ is a term used for a transaction system that is primarily responsible for
capturing and storing data related to day-to-day business functions such as ERP, CRM, SCM,
and point of sale.
Answer: transaction processing
Diff: 2
Page Ref: 158
57) Given that the size of data warehouses is expanding at an exponential rate, ________ is an
important issue.
Answer: scalability
Diff: 2
Page Ref: 163
58) The role responsible for successful administration and management of a data warehouse is
the ________, who should be familiar with high-performance software, hardware, and
networking technologies, and also possesses solid business insight.
Answer: data warehouse administrator (DWA)
Diff: 2
Page Ref: 164
59) ________, or "The Extended ASP Model," is a creative way of deploying information
system applications where the provider licenses its applications to customers for use as a service
on demand (usually over the Internet).
Answer: SaaS (software as a service)
Diff: 2
Page Ref: 165
60) ________ (also called in-database analytics) refers to the integration of the algorithmic
extent of data analytics into the data warehouse.
Answer: In-database processing
Diff: 2
Page Ref: 169
61) What is the definition of a data warehouse (DW) in simple terms?
Answer: In simple terms, a data warehouse (DW) is a pool of data produced to support decision
making; it is also a repository of current and historical data of potential interest to managers
throughout the organization.
Diff: 2
Page Ref: 131
62) A common way of introducing data warehousing is to refer to its fundamental characteristics.
Describe three characteristics of data warehousing.
Answer:
• Subject oriented. Data are organized by detailed subject, such as sales, products, or
customers, containing only information relevant for decision support.
• Integrated. Integration is closely related to subject orientation. Data warehouses must place
data from different sources into a consistent format. To do so, they must deal with naming
conflicts and discrepancies among units of measure. A data warehouse is presumed to be totally
integrated.
• Time variant (time series). A warehouse maintains historical data. The data do not
necessarily provide current status (except in real-time systems). They detect trends, deviations,
and long-term relationships for forecasting and comparisons, leading to decision making. Every
data warehouse has a temporal quality. Time is the one important dimension that all data
warehouses must support. Data for analysis from multiple sources contains multiple time points
(e.g., daily, weekly, monthly views).
• Nonvolatile. After data are entered into a data warehouse, users cannot change or update the
data. Obsolete data are discarded, and changes are recorded as new data.
• Web based. Data warehouses are typically designed to provide an efficient computing
environment for Web-based applications.
• Relational/multidimensional. A data warehouse uses either a relational structure or a
multidimensional structure. A recent survey on multidimensional structures can be found in
Romero and Abelló (2009).
• Client/server. A data warehouse uses the client/server architecture to provide easy access for
end users.
• Real time. Newer data warehouses provide real-time, or active, data-access and analysis
capabilities (see Basu, 2003; and Bonde and Kuckuk, 2004).
• Include metadata. A data warehouse contains metadata (data about data) about how the data
are organized and how to effectively use them.
Diff: 3
Page Ref: 133-134
63) What is the definition of a data mart?
Answer: A data mart is a subset of a data warehouse, typically consisting of a single subject area
(e.g., marketing, operations). Whereas a data warehouse combines databases across an entire
enterprise, a data mart is usually smaller and focuses on a particular subject or department.
Diff: 2
Page Ref: 134
64) Mehra (2005) indicated that few organizations really understand metadata, and fewer
understand how to design and implement a metadata strategy. How would you describe
metadata?
Answer: Metadata are data about data. Metadata describe the structure of and some meaning
about data, thereby contributing to their effective or ineffective use.
Diff: 2
Page Ref: 135
65) What are the four processes that define a closed-loop BPM cycle?
Answer:
1. Strategize: This is the process of identifying and stating the organization's mission, vision,
and objectives, and developing plans (at different levels of granularity—strategic, tactical and
operational) to achieve these objectives.
2. Plan: When operational managers know and understand the what (i.e., the organizational
objectives and goals), they will be able to come up with the how (i.e., detailed operational and
financial plans). Operational and financial plans answer two questions: What tactics and
initiatives will be pursued to meet the performance targets established by the strategic plan?
What are the expected financial results of executing the tactics?
3. Monitor/Analyze: When the operational and financial plans are underway, it is imperative
that the performance of the organization be monitored. A comprehensive framework for
monitoring performance should address two key issues: what to monitor and how to monitor.
4. Act and Adjust: What do we need to do differently? Whether a company is interested in
growing its business or simply improving its operations, virtually all strategies depend on new
projects—creating new products, entering new markets, acquiring new customers or businesses,
or streamlining some processes. The final part of this loop is taking action and adjusting current
actions based on analysis of problems and opportunities.
Diff: 2
Page Ref: 171-172
66) Six Sigma rests on a simple performance improvement model known as DMAIC. What are
the steps involved?
Answer:
1. Define. Define the goals, objectives, and boundaries of the improvement activity. At the top
level, the goals are the strategic objectives of the company. At lower levels—department or
project levels—the goals are focused on specific operational processes.
2. Measure. Measure the existing system. Establish quantitative measures that will yield
statistically valid data. The data can be used to monitor progress toward the goals defined in the
previous step.
3. Analyze. Analyze the system to identify ways to eliminate the gap between the current
performance of the system or process and the desired goal.
4. Improve. Initiate actions to eliminate the gap by finding ways to do things better, cheaper, or
faster. Use project management and other planning tools to implement the new approach.
5. Control. Institutionalize the improved system by modifying compensation and incentive
systems, policies, procedures, manufacturing resource planning, budgets, operation instructions,
or other management systems.
Diff: 2
Page Ref: 180
67) Briefly describe four major components of the data warehousing process.
Answer:
• Data sources. Data are sourced from multiple independent operational "legacy" systems and
possibly from external data providers (such as the U.S. Census). Data may also come from an
OLTP or ERP system.
• Data extraction and transformation. Data are extracted and properly transformed using
custom-written or commercial ETL software.
• Data loading. Data are loaded into a staging area, where they are transformed and cleansed.
The data are then ready to load into the data warehouse and/or data marts.
• Comprehensive database. Essentially, this is the EDW to support all decision analysis by
providing relevant summarized and detailed information originating from many different
sources.
• Metadata. Metadata include software programs about data and rules for organizing data
summaries that are easy to index and search, especially with Web tools.
• Middleware tools. Middleware tools enable access to the data warehouse. There are many
front-end applications that business users can use to interact with data stored in the data
repositories, including data mining, OLAP, reporting tools, and data visualization tools.
Diff: 2
Page Ref: 137-139
68) There are several basic information system architectures that can be used for data
warehousing. What are they?
Answer: Generally speaking, these architectures are commonly called client/server or n-tier
architectures, of which two-tier and three-tier architectures are the most common, but sometimes
there is simply one tier.
Diff: 2
Page Ref: 139-140
69) More data, coming in faster and requiring immediate conversion into decisions, means that
organizations are confronting the need for real-time data warehousing (RDW). How would you
define real-time data warehousing?
Answer: Real-time data warehousing, also known as active data warehousing (ADW), is the
process of loading and providing data via the data warehouse as they become available.
Diff: 2
Page Ref: 168
70) Mention briefly some of the recently popularized concepts and technologies that will play a
significant role in defining the future of data warehousing.
Answer:
• Sourcing (mechanisms for acquisition of data from diverse and dispersed sources):
o Web, social media, and Big Data
o Open source software
o SaaS (software as a service)
o Cloud computing
• Infrastructure (architectural—hardware and software—enhancements):
o Columnar (a new way to store and access data in the database)
o Real-time data warehousing
o Data warehouse appliances (all-in-one solutions to DW)
o Data management technologies and practices
o In-database processing technology (putting the algorithms where the data is)
o In-memory storage technology (moving the data in the memory for faster processing)
o New database management systems
o Advanced analytics
Diff: 3
Page Ref: 165-170
Business Intelligence, 4e (Sharda/Delen/Turban)
Chapter 4 Predictive Analytics I: Data Mining Process, Methods, and Algorithms
1) In the opening case, police detectives used data mining to identify possible new areas of
inquiry.
Answer: FALSE
Diff: 1
Page Ref: 190-191
2) The cost of data storage has plummeted recently, making data mining feasible for more firms.
Answer: TRUE
Diff: 2
Page Ref: 194
3) Data mining can be very useful in detecting patterns such as credit card fraud, but is of little
help in improving sales.
Answer: FALSE
Diff: 2
Page Ref: 193
4) If using a mining analogy, "knowledge mining" would be a more appropriate term than "data
mining."
Answer: TRUE
Diff: 2
Page Ref: 196
5) The entire focus of the predictive analytics system in the Infinity P&C case was on detecting
and handling fraudulent claims for the company's benefit.
Answer: FALSE
Diff: 3
Page Ref: 194-195
6) Data mining requires specialized data analysts to ask ad hoc questions and obtain answers
quickly from the system.
Answer: FALSE
Diff: 2
Page Ref: 197
7) Ratio data is a type of categorical data.
Answer: FALSE
Diff: 1
Page Ref: 202
8) Converting continuous valued numerical variables to ranges and categories is referred to as
discretization.
Answer: TRUE
Diff: 2
Page Ref: 202
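Discretization can be sketched in a few lines of plain Python (the bin boundaries and labels here are hypothetical), mapping each continuous value to the labeled range it falls into:

```python
def discretize(value, bins):
    """Map a continuous value to a labeled range (discretization).

    bins is a list of (exclusive_upper_bound, label) pairs in ascending order.
    """
    for upper, label in bins:
        if value < upper:
            return label
    return bins[-1][1]

# Hypothetical age bins
age_bins = [(18, "minor"), (65, "adult"), (float("inf"), "senior")]
labels = [discretize(a, age_bins) for a in [12, 35, 70]]
```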
9) In the Miami-Dade Police Department case study, predictive analytics helped to identify the
best schedule for officers in order to pay the least overtime.
Answer: FALSE
Diff: 1
Page Ref: 190-191
10) In data mining, classification models help in prediction.
Answer: TRUE
Diff: 2
Page Ref: 215
11) Statistics and data mining both look for data sets that are as large as possible.
Answer: FALSE
Diff: 2
Page Ref: 216
12) Using data mining on data about imports and exports can help to detect tax avoidance and
money laundering.
Answer: TRUE
Diff: 1
Page Ref: 206
13) In the cancer research case study, data mining algorithms that predict cancer survivability
with high predictive power are good replacements for medical professionals.
Answer: FALSE
Diff: 2
Page Ref: 209-210
14) During classification in data mining, a false positive is an occurrence classified as true by the
algorithm while being false in reality.
Answer: TRUE
Diff: 2
Page Ref: 216
15) K-fold cross-validation is also called sliding estimation.
Answer: FALSE
Diff: 2
Page Ref: 218
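The mechanics of k-fold cross-validation (as opposed to the leave-one-out or "rotation" variants) can be shown with a short plain-Python sketch that partitions record indices into k folds, each serving once as the test set:

```python
def k_fold_indices(n, k):
    """Partition indices 0..n-1 into k folds; each fold is the test set once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((sorted(train), test))
    return splits

# 5-fold cross-validation over 10 records: 5 train/test splits of 8/2 records
splits = k_fold_indices(10, 5)
```

A model would be trained and scored once per split, and the k accuracy estimates averaged into the reported figure.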
16) When a problem has many attributes that impact the classification of different patterns,
decision trees may be a useful approach.
Answer: TRUE
Diff: 2
Page Ref: 221
17) In the Dell case study, the largest issue was how to properly spend the online marketing
budget.
Answer: FALSE
Diff: 2
Page Ref: 198-199
18) Market basket analysis is a useful and entertaining way to explain data mining to a
technologically less savvy audience, but it has little business significance.
Answer: FALSE
Diff: 2
Page Ref: 227
19) Open-source data mining tools include applications such as IBM SPSS Modeler and Dell
Statistica.
Answer: FALSE
Diff: 1
Page Ref: 231
20) Data that is collected, stored, and analyzed in data mining is often private and personal.
There is no way to maintain individuals' privacy other than being very careful about physical
data security.
Answer: FALSE
Diff: 2
Page Ref: 237
21) In the Influence Health case study, what was the goal of the system?
A) locating clinic patients
B) understanding follow-up care
C) decreasing operational costs
D) increasing service use
Answer: D
Diff: 3
Page Ref: 224
22) Understanding customers better has helped Amazon and others become more successful. The
understanding comes primarily from
A) collecting data about customers and transactions.
B) developing a philosophy that is data analytics-centric.
C) analyzing the vast data amounts routinely collected.
D) asking the customers what they want.
Answer: C
Diff: 3
Page Ref: 193
23) All of the following statements about data mining are true EXCEPT
A) the process aspect means that data mining should be a one-step process to results.
B) the novel aspect means that previously unknown patterns are discovered.
C) the potentially useful aspect means that results should lead to some business benefit.
D) the valid aspect means that the discovered patterns should hold true on new data.
Answer: A
Diff: 3
Page Ref: 196
24) What is the main reason parallel processing is sometimes used for data mining?
A) because the hardware exists in most organizations, and it is available to use
B) because most of the algorithms used for data mining require it
C) because of the massive data amounts and search efforts involved
D) because any strategic application requires parallel processing
Answer: C
Diff: 3
Page Ref: 197
25) The data field "ethnic group" can be best described as
A) nominal data.
B) interval data.
C) ordinal data.
D) ratio data.
Answer: A
Diff: 2
Page Ref: 208
26) A data mining study is specific to addressing a well-defined business task, and different
business tasks require
A) general organizational data.
B) general industry data.
C) general economic data.
D) different sets of data.
Answer: D
Diff: 2
Page Ref: 208
27) Which broad area of data mining applications analyzes data, forming rules to distinguish
between defined classes?
A) associations
B) visualization
C) classification
D) clustering
Answer: C
Diff: 2
Page Ref: 200
28) Which broad area of data mining applications partitions a collection of objects into natural
groupings with similar features?
A) associations
B) visualization
C) classification
D) clustering
Answer: D
Diff: 2
Page Ref: 200
29) Clustering partitions a collection of things into segments whose members share
A) similar characteristics.
B) dissimilar characteristics.
C) similar collection methods.
D) dissimilar collection methods.
Answer: A
Diff: 2
Page Ref: 202
30) Identifying and preventing incorrect claim payments and fraudulent activities falls under
which type of data mining applications?
A) insurance
B) retailing and logistics
C) customer relationship management
D) computer hardware and software
Answer: A
Diff: 2
Page Ref: 204
31) All of the following statements about data mining are true EXCEPT:
A) The term is relatively new.
B) Its techniques have their roots in traditional statistical analysis and artificial intelligence.
C) The ideas behind it are relatively new.
D) Intense, global competition makes its application more important.
Answer: C
Diff: 2
Page Ref: 194
32) Which data mining process/methodology is thought to be the most comprehensive, according
to kdnuggets.com rankings?
A) SEMMA
B) proprietary organizational methodologies
C) KDD Process
D) CRISP-DM
Answer: D
Diff: 2
Page Ref: 214
33) Prediction problems where the variables have numeric values are most accurately defined as
A) classifications.
B) regressions.
C) associations.
D) computations.
Answer: B
Diff: 3
Page Ref: 215
34) What does the robustness of a data mining method refer to?
A) its ability to predict the outcome of a previously unknown data set accurately
B) its speed of computation and computational costs in using the model
C) its ability to construct a prediction model efficiently given a large amount of data
D) its ability to overcome noisy data to make somewhat accurate predictions
Answer: D
Diff: 3
Page Ref: 216
35) What does the scalability of a data mining method refer to?
A) its ability to predict the outcome of a previously unknown data set accurately
B) its speed of computation and computational costs in using the model
C) its ability to construct a prediction model efficiently given a large amount of data
D) its ability to overcome noisy data to make somewhat accurate predictions
Answer: C
Diff: 3
Page Ref: 216
36) In estimating the accuracy of data mining (or other) classification models, the true positive
rate is
A) the ratio of correctly classified positives divided by the total positive count.
B) the ratio of correctly classified negatives divided by the total negative count.
C) the ratio of correctly classified positives divided by the sum of correctly classified positives
and incorrectly classified positives.
D) the ratio of correctly classified positives divided by the sum of correctly classified positives
and incorrectly classified negatives.
Answer: A
Diff: 2
Page Ref: 216-217
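The ratios in question 36 can be checked with a short Python sketch. The confusion-matrix counts below are hypothetical, chosen only to illustrate the arithmetic behind each answer choice:

```python
# Hypothetical confusion-matrix counts for a binary classifier.
tp, fn = 80, 20   # actual positives: correctly vs. incorrectly classified
tn, fp = 90, 10   # actual negatives: correctly vs. incorrectly classified

true_positive_rate = tp / (tp + fn)   # choice A: correct positives / all actual positives
true_negative_rate = tn / (tn + fp)   # choice B describes this ratio instead
precision = tp / (tp + fp)            # choice C describes this ratio instead

print(true_positive_rate)  # 0.8
```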
37) In data mining, finding an affinity of two products to be commonly together in a shopping
cart is known as
A) association rule mining.
B) cluster analysis.
C) decision trees.
D) artificial neural networks.
Answer: A
Diff: 2
Page Ref: 227
38) Third-party providers of publicly available data sets protect the anonymity of the individuals
in the data set primarily by
A) asking data users to use the data ethically.
B) leaving in identifiers (e.g., name), but changing other variables.
C) removing identifiers such as names and social security numbers.
D) letting individuals in the data know their data is being accessed.
Answer: C
Diff: 3
Page Ref: 237
39) In the Target case study, why did Target send maternity ads to a teen?
A) Target's analytic model confused her with an older woman with a similar name.
B) Target was sending ads to all women in a particular neighborhood.
C) Target's analytic model suggested she was pregnant based on her buying habits.
D) Target was using a special promotion that targeted all teens in her geographical area.
Answer: C
Diff: 2
Page Ref: 238
40) Which of the following is a data mining myth?
A) Data mining is a multistep process that requires deliberate, proactive design and use.
B) Data mining requires a separate, dedicated database.
C) The current state-of-the-art is ready to go for almost any business.
D) Newer Web-based tools enable managers of all educational levels to do data mining.
Answer: B
Diff: 2
Page Ref: 239-240
41) In the Influence Health case, the company was able to evaluate over ________ million
records in only two days.
Answer: 195
Diff: 3
Page Ref: 225
42) There has been an increase in data mining to deal with global competition and customers'
more sophisticated ________ and wants.
Answer: needs
Diff: 2
Page Ref: 194
43) Knowledge extraction, pattern analysis, data archaeology, information harvesting, pattern
searching, and data dredging are all alternative names for ________.
Answer: data mining
Diff: 1
Page Ref: 196
44) Data are often buried deep within very large ________, which sometimes contain data from
several years.
Answer: databases
Diff: 1
Page Ref: 196
45) ________ was proposed in the mid-1990s by a European consortium of companies to serve
as a nonproprietary standard methodology for data mining.
Answer: CRISP-DM
Diff: 2
Page Ref: 207
46) In the Dell case study, engineers, working closely with marketing, used lean software
development strategies and numerous technologies to create a highly scalable, singular
________.
Answer: data mart
Diff: 2
Page Ref: 199
47) Patterns have been manually ________ from data by humans for centuries, but the increasing
volume of data in modern times has created a need for more automatic approaches.
Answer: extracted
Diff: 2
Page Ref: 200
48) While prediction is largely experience and opinion based, ________ is data and model based.
Answer: forecasting
Diff: 2
Page Ref: 200
49) Whereas ________ starts with a well-defined proposition and hypothesis, data mining starts
with a loosely defined discovery statement.
Answer: statistics
Diff: 2
Page Ref: 203
50) Customer ________ management extends traditional marketing by creating one-on-one
relationships with customers.
Answer: relationship
Diff: 2
Page Ref: 203
51) In the terrorist funding case study, an observed price ________ may be related to income tax
avoidance/evasion, money laundering, or terrorist financing.
Answer: deviation
Diff: 3
Page Ref: 206
52) Data preparation, the third step in the CRISP-DM data mining process, is more commonly
known as ________.
Answer: data preprocessing
Diff: 2
Page Ref: 208
53) The data mining in cancer research case study explains that data mining methods are capable
of extracting patterns and ________ hidden deep in large and complex medical databases.
Answer: relationships
Diff: 3
Page Ref: 209-210
54) Fayyad et al. (1996) defined ________ in databases as a process of using data mining
methods to find useful information and patterns in the data.
Answer: knowledge discovery
Diff: 2
Page Ref: 213
55) In ________, a classification method, the complete data set is randomly split into mutually
exclusive subsets of approximately equal size and tested multiple times on each left-out subset,
using the others as a training set.
Answer: k-fold cross-validation
Diff: 2
Page Ref: 218
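The k-fold procedure described in question 55 can be sketched in a few lines of Python. This is a minimal illustration of the splitting logic only, not a production implementation:

```python
def k_fold_splits(data, k):
    """Split `data` into k mutually exclusive folds of roughly equal size;
    each fold is held out once as the test set, the rest form the training set."""
    folds = [data[i::k] for i in range(k)]          # round-robin assignment
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

records = list(range(10))
for train, test in k_fold_splits(records, k=5):
    assert len(test) == 2 and len(train) == 8       # mutually exclusive, equal size
    assert sorted(train + test) == records          # together they cover all the data
```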
56) The basic idea behind a(n) ________ is that it recursively divides a training set until each
division consists entirely or primarily of examples from one class.
Answer: decision tree
Diff: 3
Page Ref: 221
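The recursive division in question 56 can be illustrated with a toy Python sketch. The split-selection rule here is deliberately naive (features are tried in order rather than ranked by a purity measure such as information gain), and the training data are hypothetical:

```python
def grow(examples, features, depth=0, max_depth=2):
    """Toy recursive partitioning: each call splits the training set on one
    binary feature until a division consists entirely of one class (a pure leaf)."""
    labels = [y for _, y in examples]
    if len(set(labels)) == 1 or not features or depth == max_depth:
        return max(set(labels), key=labels.count)        # majority-class leaf
    f = features[0]                                      # naive: take features in order
    left = [(x, y) for x, y in examples if x[f]]
    right = [(x, y) for x, y in examples if not x[f]]
    if not left or not right:
        return max(set(labels), key=labels.count)
    return {f: {True: grow(left, features[1:], depth + 1, max_depth),
                False: grow(right, features[1:], depth + 1, max_depth)}}

# Hypothetical training set: does a customer respond to a promotion?
data = [({"urban": True,  "young": True},  "yes"),
        ({"urban": True,  "young": False}, "yes"),
        ({"urban": False, "young": True},  "no"),
        ({"urban": False, "young": False}, "no")]
tree = grow(data, ["urban", "young"])   # one split on "urban" yields two pure leaves
```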
57) As described in the Influence Health case study, customers are more often ________ services
from a variety of healthcare service providers before selecting one.
Answer: comparing
Diff: 2
Page Ref: 224
58) Because of its successful application to retail business problems, association rule mining is
commonly called ________.
Answer: market-basket analysis
Diff: 2
Page Ref: 227
59) The ________ is the most commonly used algorithm to discover association rules. Given a
set of itemsets, the algorithm attempts to find subsets that are common to at least a minimum
number of the itemsets.
Answer: Apriori algorithm
Diff: 2
Page Ref: 229
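The level-wise search described in question 59 can be sketched in Python. This is a minimal Apriori-style illustration with a toy basket data set, not an optimized implementation:

```python
def frequent_itemsets(transactions, min_support):
    """Minimal Apriori-style sketch: keep itemsets contained in at least
    `min_support` transactions, growing candidates one item at a time."""
    frequent = {}
    size = 1
    candidates = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    while candidates:
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        survivors = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(survivors)
        size += 1   # next level: unions of survivors that are one item larger
        candidates = [c for c in {a | b for a in survivors for b in survivors}
                      if len(c) == size]
    return frequent

baskets = [frozenset(b) for b in (
    {"milk", "bread"}, {"milk", "bread", "beer"}, {"bread"}, {"milk", "beer"})]
freq = frequent_itemsets(baskets, min_support=2)
# {milk, bread} and {milk, beer} each appear in two baskets; {bread, beer} is pruned.
```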
60) One way to accomplish privacy and protection of individuals' rights when data mining is by
________ of the customer records prior to applying data mining applications, so that the records
cannot be traced to an individual.
Answer: de-identification
Diff: 2
Page Ref: 237
61) List five reasons for the growing popularity of data mining in the business world.
Answer:
• More intense competition at the global scale driven by customers' ever-changing needs and
wants in an increasingly saturated marketplace
• General recognition of the untapped value hidden in large data sources
• Consolidation and integration of database records, which enables a single view of customers,
vendors, transactions, etc.
• Consolidation of databases and other data repositories into a single location in the form of a
data warehouse
• The exponential increase in data processing and storage technologies
• Significant reduction in the cost of hardware and software for data storage and processing
• Movement toward the demassification (conversion of information resources into nonphysical
form) of business practices
Diff: 2
Page Ref: 194
62) List 3 common data mining myths and realities.
Answer:
1) Myth: Data mining provides instant, crystal-ball-like predictions.
Reality: Data mining is a multistep process that requires deliberate, proactive design and use.
2) Myth: Data mining is not yet viable for mainstream business applications.
Reality: The current state of the art is ready to go for almost any business type and/or size.
3) Myth: Data mining requires a separate, dedicated database.
Reality: Because of the advances in database technology, a dedicated database is not required.
4) Myth: Only those with advanced degrees can do data mining.
Reality: Newer Web-based tools enable managers of all educational levels to do data mining.
5) Myth: Data mining is only for large firms that have lots of customer data.
Reality: If the data accurately reflect the business or its customers, any company can use data
mining.
Diff: 2
Page Ref: 239
63) List and briefly describe the six steps of the CRISP-DM data mining process.
Answer:
Step 1: Business Understanding — The key element of any data mining study is to know what
the study is for. Answering such a question begins with a thorough understanding of the
managerial need for new knowledge and an explicit specification of the business objective
regarding the study to be conducted.
Step 2: Data Understanding — A data mining study is specific to addressing a well-defined
business task, and different business tasks require different sets of data. Following the business
understanding, the main activity of the data mining process is to identify the relevant data from
many available databases.
Step 3: Data Preparation — The purpose of data preparation (or more commonly called data
preprocessing) is to take the data identified in the previous step and prepare it for analysis by
data mining methods. Compared to the other steps in CRISP-DM, data preprocessing consumes
the most time and effort; most believe that this step accounts for roughly 80 percent of the total
time spent on a data mining project.
Step 4: Model Building — Here, various modeling techniques are selected and applied to an
already prepared data set in order to address the specific business need. The model-building step
also encompasses the assessment and comparative analysis of the various models built.
Step 5: Testing and Evaluation — In step 5, the developed models are assessed and evaluated
for their accuracy and generality. This step assesses the degree to which the selected model (or
models) meets the business objectives and, if so, to what extent (i.e., do more models need to be
developed and assessed).
Step 6: Deployment — Depending on the requirements, the deployment phase can be as simple
as generating a report or as complex as implementing a repeatable data mining process across the
enterprise. In many cases, it is the customer, not the data analyst, who carries out the deployment
steps.
Diff: 2
Page Ref: 207-212
64) Describe the role of the simple split in estimating the accuracy of classification models.
Answer: The simple split (or holdout or test sample estimation) partitions the data into two
mutually exclusive subsets called a training set and a test set (or holdout set). It is common to
designate two-thirds of the data as the training set and the remaining one-third as the test set. The
training set is used by the inducer (model builder), and the built classifier is then tested on the
test set. An exception to this rule occurs when the classifier is an artificial neural network. In this
case, the data is partitioned into three mutually exclusive subsets: training, validation, and
testing.
Diff: 2
Page Ref: 217
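The holdout procedure described in question 64 can be sketched as follows (an illustrative sketch; the seed and data are arbitrary):

```python
import random

def simple_split(data, train_fraction=2/3, seed=7):
    """Holdout (simple split) sketch: shuffle, then partition into mutually
    exclusive training and test sets -- commonly two-thirds vs. one-third."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

records = list(range(30))
train, test = simple_split(records)
assert len(train) == 20 and len(test) == 10   # 2/3 training, 1/3 test
assert sorted(train + test) == records        # mutually exclusive partition
```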
65) Briefly describe five techniques (or algorithms) that are used for classification modeling.
Answer:
• Decision tree analysis. Decision tree analysis (a machine-learning technique) is arguably the
most popular classification technique in the data mining arena.
• Statistical analysis. Statistical techniques were the primary classification algorithm for many
years until the emergence of machine-learning techniques. Statistical classification techniques
include logistic regression and discriminant analysis.
• Neural networks. These are among the most popular machine-learning techniques that can be
used for classification-type problems.
• Case-based reasoning. This approach uses historical cases to recognize commonalities in
order to assign a new case into the most probable category.
• Bayesian classifiers. This approach uses probability theory to build classification models
based on the past occurrences that are capable of placing a new instance into a most probable
class (or category).
• Genetic algorithms. This approach uses the analogy of natural evolution to build directed-search-based mechanisms to classify data samples.
• Rough sets. This method takes into account the partial membership of class labels to
predefined categories in building models (collection of rules) for classification problems.
Diff: 2
Page Ref: 219-220
66) Describe cluster analysis and some of its applications.
Answer: Cluster analysis is an exploratory data analysis tool for solving classification problems.
The objective is to sort cases (e.g., people, things, events) into groups, or clusters, so that the
degree of association is strong among members of the same cluster and weak among members of
different clusters. Cluster analysis is an essential data mining method for classifying items,
events, or concepts into common groupings called clusters. The method is commonly used in
biology, medicine, genetics, social network analysis, anthropology, archaeology, astronomy,
character recognition, and even in MIS development. As data mining has increased in popularity,
the underlying techniques have been applied to business, especially to marketing. Cluster
analysis has been used extensively for fraud detection (both credit card and e-commerce fraud)
and market segmentation of customers in contemporary CRM systems.
Diff: 2
Page Ref: 225-226
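The partitioning idea behind cluster analysis (question 66) can be sketched with a minimal one-dimensional k-means loop. The data points are a hypothetical toy example with two obvious natural groupings:

```python
def k_means_1d(points, k, iterations=20):
    """Minimal 1-D k-means sketch: assign each point to its nearest centroid,
    then recompute each centroid as the mean of its cluster; repeat."""
    centroids = sorted(points)[::max(1, len(points) // k)][:k]   # spread out seeds
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Toy data with two natural groupings, around 1 and around 10.
data = [1.0, 1.2, 0.8, 9.8, 10.1, 10.4]
centroids, clusters = k_means_1d(data, k=2)
```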
67) In the data mining in Hollywood case study, how successful were the models in predicting
the success or failure of a Hollywood movie?
Answer: The researchers claim that these prediction results are better than any reported in the
published literature for this problem domain. Fusion classification methods attained up to
56.07% accuracy in correctly classifying movies and 90.75% accuracy in classifying movies
within one category of their actual category. The SVM classification method attained up to
55.49% accuracy in correctly classifying movies and 85.55% accuracy in classifying movies
within one category of their actual category.
Diff: 3
Page Ref: 233
68) In lessons learned from the Target case, what legal warnings would you give another retailer
using data mining for marketing?
Answer: If you look at this practice from a legal perspective, you would conclude that Target
did not use any information that violates customer privacy; rather, they used transactional data
that most every other retail chain is collecting and storing (and perhaps analyzing) about their
customers. What was disturbing in this scenario was perhaps the targeted concept: pregnancy.
There are certain events or concepts that should be off limits or treated extremely cautiously,
such as terminal disease, divorce, and bankruptcy.
Diff: 2
Page Ref: 238
69) List four myths associated with data mining.
Answer:
• Data mining provides instant, crystal-ball-like predictions.
• Data mining is not yet viable for business applications.
• Data mining requires a separate, dedicated database.
• Only those with advanced degrees can do data mining.
• Data mining is only for large firms that have lots of customer data.
Diff: 2
Page Ref: 239
70) List six common data mining mistakes.
Answer:
• Selecting the wrong problem for data mining
• Ignoring what your sponsor thinks data mining is and what it really can and cannot do
• Leaving insufficient time for data preparation
• Looking only at aggregated results and not at individual records
• Being sloppy about keeping track of the data mining procedure and results
• Ignoring suspicious findings and quickly moving on
• Running mining algorithms repeatedly and blindly
• Believing everything you are told about the data
• Believing everything you are told about your own data mining analysis
• Measuring your results differently from the way your sponsor measures them
Diff: 2
Page Ref: 239-240
Business Intelligence, 4e (Sharda/Delen/Turban)
Chapter 5 Predictive Analytics II: Text, Web, and Social Media Analytics
1) Text analytics is the subset of text mining that handles information retrieval and extraction,
plus data mining.
Answer: FALSE
Diff: 2
Page Ref: 251
2) Categorization and clustering of documents during text mining differ only in the preselection
of categories.
Answer: TRUE
Diff: 2
Page Ref: 252
3) Articles and auxiliary verbs are assigned little value in text mining and are usually filtered out.
Answer: TRUE
Diff: 2
Page Ref: 253
4) In the car insurance case study, text mining was used to identify auto features that caused
injuries.
Answer: FALSE
Diff: 2
Page Ref: 254-255
5) Regional accents present challenges for natural language processing.
Answer: TRUE
Diff: 2
Page Ref: 256
6) In the Tito's Vodka case study, trends in cocktails were studied to create a quarterly recipe for
customers.
Answer: TRUE
Diff: 2
Page Ref: 306
7) In the Wimbledon case study, designers balanced the needs of mobile and desktop computer
users.
Answer: TRUE
Diff: 2
Page Ref: 278
8) In text mining, if an association between two concepts has 7% support, it means that both
concepts appear together in 7% of the documents.
Answer: TRUE
Diff: 2
Page Ref: 272
9) In sentiment analysis, sentiment suggests a transient, temporary opinion reflective of one's
feelings.
Answer: FALSE
Diff: 2
Page Ref: 276
10) Current use of sentiment analysis in voice of the customer applications allows companies to
change their products or services in real time in response to customer sentiment.
Answer: TRUE
Diff: 2
Page Ref: 276
11) In sentiment analysis, it is hard to classify some subjects such as news as good or bad, but
easier to classify others, e.g., movie reviews, in the same way.
Answer: TRUE
Diff: 2
Page Ref: 276
12) Search engines are only used in the context of the World Wide Web (WWW).
Answer: FALSE
Diff: 2
Page Ref: 291
13) Search engine optimization (SEO) techniques play a minor role in a Web site's search
ranking because only well-written content matters.
Answer: FALSE
Diff: 2
Page Ref: 294-295
14) Clickstream analysis does not need users to enter their perceptions of the Web site or other
feedback directly to be useful in determining their preferences.
Answer: TRUE
Diff: 2
Page Ref: 299
15) Since little can be done about visitor Web site abandonment rates, organizations have to
focus their efforts on increasing the number of new visitors.
Answer: FALSE
Diff: 2
Page Ref: 303
16) Web-based media has nearly identical cost and scale structures as traditional media.
Answer: FALSE
Diff: 2
Page Ref: 309
17) Consistent high quality, higher publishing frequency, and longer time lag are all attributes of
industrial publishing when compared to Web publishing.
Answer: FALSE
Diff: 2
Page Ref: 309-310
18) In the evolution of social media user engagement, the largest recent change is the growth of
creators.
Answer: FALSE
Diff: 2
Page Ref: 310-311
19) Descriptive analytics for social media feature such items as your followers as well as the
content in online conversations that help you to identify themes and sentiments.
Answer: FALSE
Diff: 2
Page Ref: 311
20) Companies understand that when their product goes "viral," the content of the online
conversations about their product does not matter, only the volume of conversations.
Answer: FALSE
Diff: 3
Page Ref: 312
21) In the opening vignette, the architectural system that supported Watson used all the
following elements EXCEPT
A) massive parallelism to enable simultaneous consideration of multiple hypotheses.
B) an underlying confidence subsystem that ranks and integrates answers.
C) a core engine that could operate seamlessly in another domain without changes.
D) integration of shallow and deep knowledge.
Answer: C
Diff: 3
Page Ref: 248-250
22) In text mining, tokenizing is the process of
A) categorizing a block of text in a sentence.
B) reducing multiple words to their base or root.
C) transforming the term-by-document matrix to a manageable size.
D) creating new branches or stems of recorded paragraphs.
Answer: A
Diff: 2
Page Ref: 253
23) All of the following are challenges associated with natural language processing EXCEPT
A) dividing up a text into individual words in English.
B) understanding the context in which something is said.
C) distinguishing between words that have more than one meaning.
D) recognizing typographical or grammatical errors in texts.
Answer: A
Diff: 3
Page Ref: 256
24) Natural language processing (NLP) is associated with which of the following areas?
A) text mining
B) artificial intelligence
C) computational linguistics
D) all of these
Answer: D
Diff: 2
Page Ref: 256
25) In the research literature case study, the researchers analyzing academic papers extracted
information from which source?
A) the paper abstract
B) the paper keywords
C) the main body of the paper
D) the paper references
Answer: A
Diff: 1
Page Ref: 273-274
26) In sentiment analysis, which of the following is an implicit opinion?
A) The hotel we stayed in was terrible.
B) The customer service I got for my TV was laughable.
C) The cruise we went on last summer was a disaster.
D) Our new mayor is great for the city.
Answer: B
Diff: 3
Page Ref: 277
27) In the Wimbledon case study, the tournament used data for each match in real time to
highlight
A) winners and losers.
B) player histories.
C) significant events.
D) advertiser content.
Answer: C
Diff: 2
Page Ref: 278-280
28) What do voice of the market (VOM) applications of sentiment analysis do?
A) They examine customer sentiment at the aggregate level.
B) They examine employee sentiment in the organization.
C) They examine the stock market for trends.
D) They examine the "market of ideas" in politics.
Answer: A
Diff: 3
Page Ref: 281
29) Sentiment analysis projects require a lexicon for use. If a project in English is undertaken,
you must generally make sure to
A) use only the single, approved English lexicon.
B) use any general English lexicon.
C) use an English lexicon appropriate to the project at your discretion.
D) create an English lexicon for the project.
Answer: C
Diff: 3
Page Ref: 284-285
30) In text analysis, what is a lexicon?
A) a catalog of words, their synonyms, and their meanings
B) a catalog of customers, their words, and phrases
C) a catalog of letters, words, phrases, and sentences
D) a catalog of customers, products, words, and phrases
Answer: A
Diff: 3
Page Ref: 284
31) What types of documents are BEST suited to semantic labeling and aggregation to determine
sentiment orientation?
A) medium- to large-sized documents
B) small- to medium-sized documents
C) large-sized documents
D) collections of documents
Answer: B
Diff: 3
Page Ref: 286
32) What does Web content mining involve?
A) analyzing the universal resource locator in Web pages
B) analyzing the unstructured content of Web pages
C) analyzing the pattern of visits to a Web site
D) analyzing the PageRank and other metadata of a Web page
Answer: B
Diff: 2
Page Ref: 289
33) Breaking up a Web page into its components to identify worthy words/terms and indexing
them using a set of rules is called
A) preprocessing the documents.
B) document analysis.
C) creating the term-by-document matrix.
D) parsing the documents.
Answer: D
Diff: 3
Page Ref: 293
34) Search engine optimization (SEO) is a means by which
A) Web site developers can negotiate better deals for paid ads.
B) Web site developers can increase Web site search rankings.
C) Web site developers index their Web sites for search engines.
D) Web site developers optimize the artistic features of their Web sites.
Answer: B
Diff: 2
Page Ref: 294-295
35) What are the two main types of Web analytics?
A) old-school and new-school Web analytics
B) Bing and Google Web analytics
C) off-site and on-site Web analytics
D) data-based and subjective Web analytics
Answer: C
Diff: 3
Page Ref: 299
36) Web site usability may be rated poor if
A) the average number of page views on your Web site is large.
B) the time spent on your Web site is long.
C) Web site visitors download few of your offered PDFs and videos.
D) users fail to click on all pages equally.
Answer: C
Diff: 2
Page Ref: 300
37) Understanding which keywords your users enter to reach your Web site through a search
engine can help you understand
A) the hardware your Web site is running on.
B) the type of Web browser being used by your Web site visitors.
C) most of your Web site visitors' wants and needs.
D) how well visitors understand your products.
Answer: D
Diff: 3
Page Ref: 301
38) Which of the following statements about Web site conversion statistics is FALSE?
A) Web site visitors can be classed as either new or returning.
B) Visitors who begin a purchase on most Web sites must complete it.
C) The conversion rate is the number of people who take action divided by the number of
visitors.
D) Analyzing exit rates can tell you why visitors left your Web site.
Answer: B
Diff: 3
Page Ref: 302
39) What is one major way in which Web-based social media differs from traditional publishing
media?
A) Most Web-based media are operated by the government and large firms.
B) They use different languages of publication.
C) They have different costs to own and operate.
D) Web-based media have a narrower range of quality.
Answer: C
Diff: 3
Page Ref: 310
40) What does advanced analytics for social media do?
A) It helps identify your followers.
B) It identifies links between groups.
C) It examines the content of online conversations.
D) It identifies the biggest sources of influence online.
Answer: C
Diff: 2
Page Ref: 311
41) IBM's Watson utilizes a massively parallel, text mining–focused, probabilistic evidence-based computational architecture called ________.
Answer: DeepQA
Diff: 2
Page Ref: 248
42) ________, also called homonyms, are syntactically identical words with different meanings.
Answer: Polysemes
Diff: 2
Page Ref: 253
43) When a word has more than one meaning, selecting the meaning that makes the most sense
can only be accomplished by taking into account the context within which the word is used. This
concept is known as ________.
Answer: word sense disambiguation
Diff: 3
Page Ref: 256
44) ________ is a technique used to detect favorable and unfavorable opinions toward specific
products and services using large numbers of textual data sources.
Answer: Sentiment analysis
Diff: 2
Page Ref: 257
45) In the Mining for Lies case study, a text-based deception-detection method used by Fuller
and others in 2008 was based on a process known as ________, which relies on elements of data
and text mining techniques.
Answer: message feature mining
Diff: 2
Page Ref: 262-263
46) At a very high level, the text mining process can be broken down into three consecutive
tasks, the first of which is to establish the ________.
Answer: corpus
Diff: 2
Page Ref: 269
47) Because the term document matrix is often very large and rather sparse, an important
optimization step is to reduce the ________ of the matrix.
Answer: dimensionality
Diff: 2
Page Ref: 270
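The term-by-document matrix in questions 46 and 47 can be sketched in Python. These are toy documents; a real corpus produces a far larger and sparser matrix, which is why its dimensionality is reduced (e.g., via singular value decomposition):

```python
from collections import Counter

docs = ["text mining extracts knowledge from text",
        "web mining analyzes web content"]
stop_terms = {"from"}   # articles, auxiliary verbs, etc. carry little value

# One term-frequency Counter per document (a sparse row of the matrix).
rows = [Counter(t for t in d.lower().split() if t not in stop_terms) for d in docs]

vocabulary = sorted(set().union(*rows))                       # matrix columns
matrix = [[row[term] for term in vocabulary] for row in rows]  # dense form
```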
48) ________ is mostly driven by sentiment analysis and is a key element of customer
experience management initiatives, where the goal is to create an intimate relationship with the
customer.
Answer: Voice of the customer (VOC)
Diff: 2
Page Ref: 280
49) When viewed as a binary feature, ________ classification is the binary classification task of
labeling an opinionated document as expressing either an overall positive or an overall negative
opinion.
Answer: polarity
Diff: 2
Page Ref: 282
50) Web pages contain both unstructured information and ________, which are connections to
other Web pages.
Answer: hyperlinks
Diff: 1
Page Ref: 290
51) Web ________ are used to automatically read through the contents of Web sites.
Answer: crawlers/spiders
Diff: 1
Page Ref: 289
52) A(n) ________ is one or more Web pages that provide a collection of links to authoritative
Web pages.
Answer: hub
Diff: 1
Page Ref: 290
53) A(n) ________ engine is a software program that searches for Web sites or files based on
keywords.
Answer: search
Diff: 1
Page Ref: 291
54) In the Lotte.com retail case, the company deployed SAS for Customer Experience Analytics
to better understand the quality of customer traffic on their Web site, classify order rates, and see
which ________ had the most visitors.
Answer: channels
Diff: 2
Page Ref: 297
55) ________ Web analytics refers to measurement and analysis of data relating to your
company that takes place outside your Web site.
Answer: Off-site
Diff: 1
Page Ref: 299
56) A(n) ________ Web site contains links that send traffic directly to your Web site.
Answer: referral
Diff: 2
Page Ref: 301
57) ________ statistics help you understand whether your specific marketing objective for a
Web page is being achieved.
Answer: Conversion
Diff: 1
Page Ref: 302
58) In the Tito's Vodka case, it was important that social media users all had a(n) ________
brand experience.
Answer: consistent
Diff: 2
Page Ref: 306
59) ________ is a connections metric for social networks that measures the ties that actors in a
network have with others that are geographically close.
Answer: Propinquity
Diff: 1
Page Ref: 308
60) ________ is a segmentation metric for social networks that measures the strength of the
bonds between actors in a social network.
Answer: Cohesion
Diff: 1
Page Ref: 309
61) How would you describe information extraction in text mining?
Answer: Information extraction is the identification of key phrases and relationships within text
by looking for predefined objects and sequences in text by way of pattern matching.
Diff: 2
Page Ref: 252
62) Natural language processing (NLP), a subfield of artificial intelligence and computational
linguistics, is an important component of text mining. What is the definition of NLP?
Answer: NLP is a discipline that studies the problem of "understanding" the natural human
language, with the view of converting depictions of human language into more formal
representations in the form of numeric and symbolic data that are easier for computer programs
to manipulate.
Diff: 2
Page Ref: 256
63) In the security domain, one of the largest and most prominent text mining applications is the
highly classified ECHELON surveillance system. What is ECHELON assumed to be capable of
doing?
Answer: Identifying the content of telephone calls, faxes, e-mails, and other types of data and
intercepting information sent via satellites, public switched telephone networks, and microwave
links.
Diff: 2
Page Ref: 261-262
64) Describe the query-specific clustering method as it relates to clustering.
Answer: This method employs a hierarchical clustering approach where the most relevant
documents to the posed query appear in small tight clusters that are nested in larger clusters
containing less similar documents, creating a spectrum of relevance levels among the documents.
Diff: 3
Page Ref: 272
65) Identify, with a brief description, each of the four steps in the sentiment analysis process.
Answer:
1. Sentiment Detection: Here the goal is to differentiate between a fact and an opinion, which
may be viewed as classification of text as objective or subjective.
2. N-P Polarity Classification: Given an opinionated piece of text, the goal is to classify the
opinion as falling under one of two opposing sentiment polarities, or locate its position on the
continuum between these two polarities.
3. Target Identification: The goal of this step is to accurately identify the target of the
expressed sentiment.
4. Collection and Aggregation: In this step all text data points in the document are aggregated
and converted to a single sentiment measure for the whole document.
Diff: 2
Page Ref: 282-284
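The four steps above can be sketched in miniature. The following Python fragment is a hedged illustration, not the book's method: it uses a tiny hypothetical sentiment lexicon to detect subjective sentences (step 1), assign N-P polarity (step 2), and aggregate sentence scores into one document measure (step 4). Target identification (step 3) is omitted for brevity.

```python
# Minimal sentiment-analysis sketch with a tiny hypothetical lexicon.
POSITIVE = {"great", "good", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}

def sentence_polarity(sentence):
    """Return +1, -1, or None (no sentiment detected, treated as objective)."""
    words = sentence.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score == 0:
        return None          # sentiment detection: classified as objective
    return 1 if score > 0 else -1

def document_sentiment(sentences):
    """Aggregate sentence polarities into a single measure in [-1, 1]."""
    polarities = [p for p in map(sentence_polarity, sentences) if p is not None]
    return sum(polarities) / len(polarities) if polarities else 0.0

doc = ["The battery life is great", "The screen is terrible", "It ships in a box"]
print(document_sentiment(doc))  # one positive, one negative -> 0.0
```

Real systems replace the lexicon lookup with trained classifiers, but the pipeline shape (detect, classify polarity, aggregate) is the same.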
66) In what ways does the Web pose great challenges for effective and efficient knowledge
discovery through data mining?
Answer:
• The Web is too big for effective data mining. The Web is so large and growing so rapidly
that it is difficult to even quantify its size. Because of the sheer size of the Web, it is not feasible
to set up a data warehouse to replicate, store, and integrate all of the data on the Web, making
data collection and integration a challenge.
• The Web is too complex. The complexity of a Web page is far greater than a page in a
traditional text document collection. Web pages lack a unified structure. They contain far more
authoring style and content variation than any set of books, articles, or other traditional text-based documents.
• The Web is too dynamic. The Web is a highly dynamic information source. Not only does
the Web grow rapidly, but its content is constantly being updated. Blogs, news stories, stock
market results, weather reports, sports scores, prices, company advertisements, and numerous
other types of information are updated regularly on the Web.
• The Web is not specific to a domain. The Web serves a broad diversity of communities and
connects billions of workstations. Web users have very different backgrounds, interests, and
usage purposes. Most users may not have good knowledge of the structure of the information
network and may not be aware of the heavy cost of a particular search that they perform.
• The Web has everything. Only a small portion of the information on the Web is truly
relevant or useful to someone (or some task). Finding the portion of the Web that is truly relevant
to a person and the task being performed is a prominent issue in Web-related research.
Diff: 2
Page Ref: 287-288
67) What is search engine optimization (SEO) and why is it important for organizations that own
Web sites?
Answer: Search engine optimization (SEO) is the intentional activity of affecting the visibility
of an e-commerce site or a Web site in a search engine's natural (unpaid or organic) search
results. In general, the higher a site ranks on the search results page, and the more frequently
it appears in the search results list, the more visitors it will receive from the search engine's users.
Being indexed by search engines like Google, Bing, and Yahoo! is not good enough for
businesses. Getting ranked on the most widely used search engines and getting ranked higher
than your competitors are what make the difference.
Diff: 3
Page Ref: 294-295
68) What is the difference between white hat and black hat SEO activities?
Answer: An SEO technique is considered white hat if it conforms to the search engines'
guidelines and involves no deception. Because search engine guidelines are not written as a
series of rules or commandments, this is an important distinction to note. White-hat SEO is not
just about following guidelines, but about ensuring that the content a search engine indexes and
subsequently ranks is the same content a user will see.
Black-hat SEO attempts to improve rankings in ways that are disapproved of by the search
engines, or that involve deception, trying to divert search engine algorithms from their intended
purpose.
Diff: 3
Page Ref: 295
69) Why are the users' page views and time spent on your Web site important metrics?
Answer: If people come to your Web site and don't view many pages, that is undesirable and
your Web site may have issues with its design or structure. Another explanation for low page
views is a disconnect between the marketing messages that brought visitors to the site and the
content that is actually available.
Generally, the longer a person spends on your Web site, the better it is. That could mean they're
carefully reviewing your content, utilizing interactive components you have available, and
building toward an informed decision to buy, respond, or take the next step you've provided. At
the same time, the time on site needs to be examined against the number of pages viewed to
make sure the visitor isn't spending his or her time trying to locate content that should be more
readily accessible.
Diff: 3
Page Ref: 300
70) What are the three categories of social media analytics technologies and what do they do?
Answer:
• Descriptive analytics: Uses simple statistics to identify activity characteristics and trends,
such as how many followers you have, how many reviews were generated on Facebook, and
which channels are being used most often.
• Social network analysis: Follows the links between friends, fans, and followers to identify
connections of influence as well as the biggest sources of influence.
• Advanced analytics: Includes predictive analytics and text analytics that examine the content
in online conversations to identify themes, sentiments, and connections that would not be
revealed by casual surveillance.
Diff: 2
Page Ref: 311
Business Intelligence, 4e (Sharda/Delen/Turban)
Chapter 6 Prescriptive Analytics: Optimization and Simulation
1) In the School District of Philadelphia case, Excel and an add-in were used to evaluate different
vendor options.
Answer: TRUE
Diff: 2
Page Ref: 321
2) Modeling is a key element for prescriptive analytics.
Answer: TRUE
Diff: 1
Page Ref: 322
3) Business analysis is the monitoring, scanning, and interpretation of collected environmental
information.
Answer: FALSE
Diff: 2
Page Ref: 324
4) Online commerce and communication have created an immense need for forecasting and an
abundance of available information for performing it.
Answer: TRUE
Diff: 2
Page Ref: 324
5) All quantitative models are typically made up of six basic components.
Answer: FALSE
Diff: 2
Page Ref: 328
6) Result variables are considered independent variables.
Answer: FALSE
Diff: 2
Page Ref: 328
7) In decision making under uncertainty, it is assumed that complete knowledge is available.
Answer: FALSE
Diff: 2
Page Ref: 330
8) A decision made under risk is also known as a probabilistic or stochastic decision-making
situation.
Answer: TRUE
Diff: 2
Page Ref: 331
9) Spreadsheets include all possible tools needed to deploy a custom DSS.
Answer: FALSE
Diff: 2
Page Ref: 332
10) Spreadsheets are clearly the most popular developer modeling tool.
Answer: FALSE
Diff: 2
Page Ref: 333
11) Every LP model has some internal intermediate variables that are not explicitly stated.
Answer: TRUE
Diff: 2
Page Ref: 340
12) A model builder makes predictions and assumptions regarding input data, many of which
deal with the assessment of certain futures.
Answer: FALSE
Diff: 2
Page Ref: 347
13) Many quantitative models of decision theory are based on comparing a single measure of
effectiveness, generally some form of utility to the decision maker.
Answer: TRUE
Diff: 2
Page Ref: 346
14) A decision table shows the relationships of the problem graphically and can handle complex
situations in a compact form.
Answer: FALSE
Diff: 2
Page Ref: 350-351
15) Decision situations that involve a finite and usually not too large number of alternatives are
modeled through an approach called decision analysis.
Answer: TRUE
Diff: 2
Page Ref: 349
16) The pessimistic approach assumes that the worst possible outcome for each alternative will
occur and selects the best of these.
Answer: TRUE
Diff: 2
Page Ref: 350-351
17) Simulation is the appearance of reality.
Answer: TRUE
Diff: 1
Page Ref: 352
18) Simulation is normally used only when a problem is too complex to be treated using
numerical optimization techniques.
Answer: TRUE
Diff: 2
Page Ref: 352
19) Simulations are an experimental, expensive, error-prone method for gaining insight into
complex decision-making situations.
Answer: FALSE
Diff: 2
Page Ref: 359
20) VIS uses animated computer graphic displays to present the impact of different managerial
decisions.
Answer: TRUE
Diff: 2
Page Ref: 360
21) A more general form of an influence diagram is called a(n)
A) forecast.
B) environmental scan.
C) cognitive map.
D) static model.
Answer: C
Diff: 2
Page Ref: 324
22) A(n) ________ is a graphical representation of a model.
A) multidimensional analysis
B) influence diagram
C) OLAP model
D) Whisker plot
Answer: B
Diff: 2
Page Ref: 327
23) Which of the following is NOT a component of a quantitative model?
A) result variables
B) decision variables
C) classes
D) parameters
Answer: C
Diff: 2
Page Ref: 328
24) Intermediate result variables reflect intermediate outcomes in
A) mathematical models.
B) flowcharts.
C) decision trees.
D) ROI calculations.
Answer: A
Diff: 2
Page Ref: 329
25) When the decision maker must consider several possible outcomes for each alternative, each
with a given probability of occurrence, this is decision making under
A) certainty.
B) uncertainty.
C) risk.
D) duress.
Answer: C
Diff: 2
Page Ref: 331
26) When the decision maker knows exactly what the outcome of each course of action will be,
this is decision making under
A) certainty.
B) uncertainty.
C) risk.
D) duress.
Answer: A
Diff: 2
Page Ref: 330
27) A(n) ________ spreadsheet model represents behavior over time.
A) static
B) dynamic
C) looped
D) add-in
Answer: B
Diff: 2
Page Ref: 336
28) Important spreadsheet features for modeling include all of the following EXCEPT
A) what-if analysis.
B) goal seeking.
C) macros.
D) pivot tables.
Answer: D
Diff: 2
Page Ref: 335
29) Which of the following is NOT a characteristic displayed by a LP allocation problem?
A) A limited quantity of economic resources is available for allocation.
B) The resources are used in the production of products or services.
C) There are two or more ways in which the resources can be used.
D) The problem is not bound by constraints.
Answer: D
Diff: 2
Page Ref: 338
30) Which of the following is NOT a characteristic displayed by a LP allocation problem?
A) Each activity in which the resources are used yields a return in terms of the stated goal.
B) The resources are used in the production of products or services.
C) There is a single way in which the resources can be used.
D) The allocation is usually restricted by several limitations and requirements.
Answer: C
Diff: 2
Page Ref: 338
31) Which of the following is NOT an assumption used by a LP allocation problem?
A) Returns from different allocations can be compared.
B) The return from any allocation is independent of other allocations.
C) The total return is the sum of the returns yielded by the different activities.
D) All data are unknown with decision making under uncertainty.
Answer: D
Diff: 2
Page Ref: 338
32) Which of the following is NOT an assumption used by a LP allocation problem?
A) The resources are to be used in the most economical manner.
B) The return from any allocation is independent of other allocations.
C) Total returns cannot be compared.
D) All data are known with certainty.
Answer: C
Diff: 2
Page Ref: 338
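The LP allocation characteristics in the questions above can be made concrete with a toy instance (all numbers hypothetical): a limited quantity of two resources, two products competing for them, and a return to maximize under the constraints. Because an LP optimum lies at a vertex of the feasible region, a tiny two-variable problem can be solved by evaluating candidate corner points directly:

```python
# Tiny LP allocation sketch: allocate machine hours (6x + 4y <= 24) and
# labor hours (x + 2y <= 6) to products x and y to maximize return 5x + 4y.
def feasible(x, y):
    return x >= 0 and y >= 0 and 6*x + 4*y <= 24 and x + 2*y <= 6

# Candidate vertices: axis intercepts and the intersection of the constraints.
corners = [(0, 0), (4, 0), (0, 3), (3, 1.5)]
best = max((p for p in corners if feasible(*p)), key=lambda p: 5*p[0] + 4*p[1])
print(best, 5*best[0] + 4*best[1])  # (3, 1.5) with return 21.0
```

Real LP solvers (e.g., the simplex method) search the vertices systematically rather than enumerating them, but the logic is the same.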
33) This method calculates the values of the inputs necessary to achieve a desired level of an
output.
A) goal seek
B) what-if
C) sensitivity
D) LP
Answer: A
Diff: 2
Page Ref: 348
34) This method calculates the values of the inputs necessary to generate a zero profit outcome.
A) goal seek
B) what-if
C) sensitivity
D) break-even
Answer: A
Diff: 2
Page Ref: 349
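Goal seeking (questions 33-34) can be illustrated with a minimal sketch: a search for the input value (the "changing cell") that drives an output (the "target cell") to a desired value, with break-even analysis as the special case of a zero-profit target. The cost and price figures below are hypothetical.

```python
# Goal seek via bisection: find the quantity at which profit hits a target.
def profit(quantity, price=8.0, unit_cost=5.0, fixed_cost=600.0):
    return price * quantity - (fixed_cost + unit_cost * quantity)

def goal_seek(f, target, lo, hi, tol=1e-6):
    """Bisection: assumes f is monotone on [lo, hi] and crosses the target."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Break-even analysis = goal seeking with a target profit of zero.
break_even = goal_seek(profit, target=0.0, lo=0.0, hi=10_000.0)
print(round(break_even))  # 200 units
```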
35) The most common method for solving a risk analysis problem is to select the alternative with
the
A) smallest expected value.
B) greatest expected value.
C) mean expected value.
D) median expected value.
Answer: B
Diff: 2
Page Ref: 351
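The expected-value rule in question 35 can be sketched with a hypothetical payoff table: weight each alternative's payoffs by the assumed known probabilities of the states of nature, then choose the alternative with the greatest expected value.

```python
# Risk analysis on a hypothetical payoff table (rows: alternatives,
# columns: states of nature with assumed known probabilities).
probabilities = [0.5, 0.3, 0.2]
payoffs = {
    "bonds":  [12, 6, 3],
    "stocks": [15, 3, -2],
    "CDs":    [6.5, 6.5, 6.5],
}

def expected_value(row):
    return sum(p * v for p, v in zip(probabilities, row))

best = max(payoffs, key=lambda a: expected_value(payoffs[a]))
print(best, expected_value(payoffs[best]))  # bonds, EV = 8.4
```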
36) A decision tree can be cumbersome if there are
A) uncertain results.
B) few alternatives.
C) many alternatives.
D) pre-existing decision tables.
Answer: C
Diff: 2
Page Ref: 351
37) Which of the following is NOT a disadvantage of a simulation?
A) An optimal solution cannot be guaranteed, but relatively good ones are generally found.
B) Simulation software sometimes requires special skills because of the complexity of the formal
solution method.
C) Simulation is often the only DSS modeling method that can readily handle relatively
unstructured problems.
D) Simulation model construction can be a slow and costly process, although newer modeling
systems are easier to use than ever.
Answer: C
Diff: 2
Page Ref: 355
38) Which of the following is the order of simulation methodology?
A) Define the problem, Construct the simulation model, Test and validate the model, Design the
experiment, Conduct the experiment, Implement the results, Evaluate the results.
B) Construct the simulation model, Test and validate the model, Define the problem, Design the
experiment, Conduct the experiment, Evaluate the results, Implement the results.
C) Define the problem, Construct the simulation model, Test and validate the model, Evaluate
the results, Implement the results, Design the experiment, Conduct the experiment.
D) Define the problem, Construct the simulation model, Test and validate the model, Design the
experiment, Conduct the experiment, Evaluate the results, Implement the results.
Answer: D
Diff: 2
Page Ref: 355-356
39) What type of VIM models display a visual image of the result of one decision alternative at a
time?
A) static
B) dynamic
C) DSS
D) VIS
Answer: A
Diff: 2
Page Ref: 360
40) If a simulation result does NOT match the intuition or judgment of the decision maker, what
can occur?
A) read/write error
B) visual distortion
C) project failure
D) confidence gap
Answer: D
Diff: 2
Page Ref: 359
41) A(n) ________ model can be constructed under assumed environments of certainty.
Answer: dynamic
Diff: 2
Page Ref: 324
42) Selecting the best ________ to work with is a laborious yet important task for companies and
government organizations.
Answer: vendors
Diff: 2
Page Ref: 320
43) Identification of a model's variables (e.g., decision, result, uncontrollable) is critical, as are
the relationships among the ________.
Answer: variables
Diff: 2
Page Ref: 324
44) ________, like data, must be managed to maintain their integrity, and thus their applicability.
Answer: Models
Diff: 2
Page Ref: 326
45) Factors that are not under the control of the decision maker but can be fixed are called
________.
Answer: parameters
Diff: 2
Page Ref: 328
46) The components of a quantitative model are linked by ________ expressions.
Answer: algebraic
Diff: 2
Page Ref: 329
47) A probabilistic decision-making situation is a decision made under ________.
Answer: risk
Diff: 2
Page Ref: 331
48) Risk ________ is a decision-making method that analyzes the risk (based on assumed known
probabilities) associated with different alternatives.
Answer: analysis
Diff: 2
Page Ref: 331
49) Spreadsheets use ________ to extend their functionality.
Answer: add-ins
Diff: 2
Page Ref: 333
50) ________ is performed by indicating a target cell, its desired value, and a changing cell.
Answer: Goal seeking
Diff: 2
Page Ref: 335
51) Of the available solutions, at least one is the best, in the sense that the degree of goal
attainment associated with it is the highest; this is called a(n) ________ solution.
Answer: optimal
Diff: 2
Page Ref: 338
52) Every LP model is composed of ________ variables whose values are unknown and are
searched for.
Answer: decision
Diff: 2
Page Ref: 338
53) ________ analysis attempts to assess the impact of a change in the input data or parameters
on the proposed solution.
Answer: Sensitivity
Diff: 2
Page Ref: 347
54) ________ analysis is structured as "What will happen to the solution if an input variable, an
assumption, or a parameter value is changed?"
Answer: What-if
Diff: 2
Page Ref: 348
55) The ________ approach assumes that the best possible outcome of each alternative will
occur and then selects the best of the best.
Answer: optimistic
Diff: 2
Page Ref: 350
56) Multiple goals is a decision situation in which alternatives are evaluated with several,
sometimes ________, goals.
Answer: conflicting
Diff: 2
Page Ref: 352
57) In ________ simulation, one or more of the independent variables (e.g., the demand in an
inventory problem) are subject to chance variation.
Answer: probabilistic
Diff: 3
Page Ref: 356
58) The most common simulation method for business decision problems is the ________
simulation.
Answer: Monte Carlo
Diff: 3
Page Ref: 357
59) The ________ approach can be used in conjunction with artificial intelligence.
Answer: VIM
Diff: 3
Page Ref: 360
60) Conventional ________ generally reports statistical results at the end of a set of experiments.
Answer: simulation
Diff: 3
Page Ref: 359
61) Why is there a trend to developing and using cloud-based tools for modeling?
Answer: This trend exists because it simplifies the process for users. These systems give them
access to powerful tools and pre-existing models that they can use to solve business problems.
Because these systems are cloud-based, users are spared the costs of operating and maintaining
them.
Diff: 2
Page Ref: 327
62) List and briefly discuss the major components of a quantitative model.
Answer: These components include:
1. Result (outcome) variables reflect the level of effectiveness of a system; that is, they indicate
how well the system performs or attains its goal(s).
2. Decision variables describe alternative courses of action. The decision maker controls the
decision variables.
3. Uncontrollable Variables - in any decision-making situation, there are factors that affect the
result variables but are not under the control of the decision maker
4. Intermediate result variables reflect intermediate outcomes in mathematical models.
Diff: 2
Page Ref: 328-329
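A one-function sketch can tie the four component types together (figures hypothetical): decision variables (price, units) and an uncontrollable variable (the tax rate, a parameter outside the decision maker's control) are linked by algebraic expressions through an intermediate result variable (gross profit) to the result variable (net profit).

```python
# Sketch of the components of a quantitative model, linked algebraically.
def net_profit(price, units, unit_cost=4.0, tax_rate=0.25):
    # price, units: decision variables; unit_cost, tax_rate: uncontrollable
    gross = (price - unit_cost) * units   # intermediate result variable
    return gross * (1 - tax_rate)         # result variable

print(net_profit(price=10.0, units=100))  # (10-4)*100*0.75 = 450.0
```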
63) Why do many believe that making decisions under uncertainty is more difficult than making
decisions under risk?
Answer: This opinion is commonly held because making decisions under uncertainty allows for
an unlimited number of possible outcomes, yet no understanding of the likelihood of those
outcomes. In contrast, decision making under risk involves a known set of possible outcomes,
each with a known probability of occurrence.
Diff: 2
Page Ref: 331
64) Why are spreadsheet applications so commonly used for decision modeling?
Answer: Spreadsheets are often used for this purpose because they are very approachable and
easy to use for end users. Spreadsheets have a shallow learning curve that allows basic functions
to be learned quickly. Additionally, spreadsheets have evolved over time to include a more
robust set of features and functions. These functions can also be augmented through the use of
add-ins, many of which are designed with decision support systems in mind.
Diff: 2
Page Ref: 332
65) How are linear programming models vulnerable when used in complex situations?
Answer: These models can be vulnerable when used in very complex situations for a number of
reasons. One reason focuses on the possibility that not all parameters can be known or
understood. Another concern is that the standard characteristics of a linear programming
calculation may not hold in more dynamic, real-world environments. Additionally, in more
complex environments all actors may not behave in a wholly rational and economic manner.
Diff: 2
Page Ref: 338
66) Provide some examples where a sensitivity analysis may be used.
Answer: Sensitivity analyses are used for:
• Revising models to eliminate too-large sensitivities
• Adding details about sensitive variables or scenarios
• Obtaining better estimates of sensitive external variables
• Altering a real-world system to reduce actual sensitivities
• Accepting and using the sensitive (and hence vulnerable) real world, leading to the
continuous and close monitoring of actual results
Diff: 3
Page Ref: 347
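A minimal sensitivity-analysis sketch, assuming a simple hypothetical profit model: re-evaluate the model while varying one input (here, unit cost by plus or minus 10%) and observe the impact on the outcome. Inputs whose small changes move the result sharply are the "sensitive" variables worth refining or monitoring.

```python
# Sensitivity analysis: vary one input, hold the rest, watch the output.
def profit(units, price=8.0, unit_cost=5.0, fixed_cost=600.0):
    return price * units - (fixed_cost + unit_cost * units)

base = profit(400)                         # proposed solution's outcome: 600.0
for cost in (4.5, 5.0, 5.5):               # +/- 10% around the base estimate
    delta = profit(400, unit_cost=cost) - base
    print(f"unit_cost={cost}: profit change {delta:+.0f}")
```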
67) List and describe the most common approaches for treating uncertainty.
Answer: There are two common approaches to dealing with uncertainty. The first is the
optimistic approach and the second is the pessimistic approach. The optimistic approach assumes
that the outcomes for all alternatives will be the best possible and then the best of each of those
may be selected. Under the pessimistic approach the worst possible outcome is assumed for each
alternative and then the best of the worst are selected.
Diff: 2
Page Ref: 350-351
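Both approaches can be sketched on a small hypothetical payoff table with no probabilities attached: the optimistic (maximax) approach takes the best of the best outcomes, while the pessimistic (maximin) approach takes the best of the worst.

```python
# Decision making under uncertainty on a hypothetical payoff table
# (columns are states of nature; no probabilities are assumed).
payoffs = {
    "bonds":  [12, 6, 3],
    "stocks": [15, 3, -2],
    "CDs":    [6.5, 6.5, 6.5],
}

optimistic  = max(payoffs, key=lambda a: max(payoffs[a]))  # best of the best
pessimistic = max(payoffs, key=lambda a: min(payoffs[a]))  # best of the worst
print(optimistic, pessimistic)  # stocks CDs
```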
68) Why is the Monte Carlo simulation popular for solving business problems?
Answer: The Monte Carlo simulation is a probabilistic simulation. It is designed around a model
of the decision problem in which the uncertainty in one or more of the variables is represented by
probability distributions. This allows a huge number of simulation runs to be performed with
random changes in each of the uncertain variables. In this way, the model may be solved
hundreds or thousands of times. These results can then be analyzed for the dependent or
performance variables using statistical distributions. This demonstrates a number of possible
solutions, as well as providing information about the manner in which variables will respond
under different levels of uncertainty.
Diff: 3
Page Ref: 357
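A minimal Monte Carlo sketch, with a hypothetical profit model in which demand is the uncertain variable, drawn from an assumed normal distribution on each run; the resulting profits are then summarized statistically.

```python
# Monte Carlo simulation: sample the uncertain input many times, then
# analyze the distribution of the performance variable.
import random

def profit(demand, price=8.0, unit_cost=5.0, fixed_cost=600.0):
    return price * demand - (fixed_cost + unit_cost * demand)

random.seed(42)                            # reproducible experiment
runs = [profit(random.gauss(mu=400, sigma=50)) for _ in range(10_000)]
mean = sum(runs) / len(runs)
p_loss = sum(r < 0 for r in runs) / len(runs)
print(f"mean profit ~ {mean:.0f}, P(loss) ~ {p_loss:.3f}")
```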
Business Intelligence, 4e (Sharda/Delen/Turban)
Chapter 7 Big Data Concepts and Tools
1) In the opening vignette, Access Telecom (AT) built a system to better visualize customers
who were unhappy before they canceled their service.
Answer: TRUE
Diff: 2
Page Ref: 372
2) The term "Big Data" is relative as it depends on the size of the using organization.
Answer: TRUE
Diff: 2
Page Ref: 373
3) Satellite data can be used to evaluate the activity at retail locations as a source of alternative
data.
Answer: TRUE
Diff: 2
Page Ref: 377
4) Big Data is being driven by the exponential growth, availability, and use of information.
Answer: TRUE
Diff: 2
Page Ref: 373
5) The quality and objectivity of information disseminated by influential users of Twitter is
higher than that disseminated by noninfluential users.
Answer: TRUE
Diff: 2
Page Ref: 392
6) Big Data uses commodity hardware, which is expensive, specialized hardware that is custom
built for a client or application.
Answer: FALSE
Diff: 2
Page Ref: 375
7) MapReduce can be easily understood by skilled programmers due to its procedural nature.
Answer: TRUE
Diff: 2
Page Ref: 385
8) Hadoop was designed to handle petabytes and exabytes of data distributed over multiple nodes
in parallel.
Answer: TRUE
Diff: 2
Page Ref: 385
9) Hadoop and MapReduce require each other to work.
Answer: FALSE
Diff: 2
Page Ref: 386
10) In most cases, Hadoop is used to replace data warehouses.
Answer: FALSE
Diff: 2
Page Ref: 389
11) Despite their potential, many current NoSQL tools lack mature management and monitoring
tools.
Answer: TRUE
Diff: 2
Page Ref: 389
12) There is a clear difference between the type of information support provided by influential
users versus the others on Twitter.
Answer: TRUE
Diff: 2
Page Ref: 392
13) Social media mentions can be used to chart and predict flu outbreaks.
Answer: TRUE
Diff: 2
Page Ref: 400
14) In Application Case 7.6, Analyzing Disease Patterns from an Electronic Medical Records
Data Warehouse, it was found that urban individuals have a higher number of diagnosed disease
conditions.
Answer: TRUE
Diff: 2
Page Ref: 403
15) For low latency, interactive reports, a data warehouse is preferable to Hadoop.
Answer: TRUE
Diff: 2
Page Ref: 396
16) If you have many flexible programming languages running in parallel, Hadoop is preferable
to a data warehouse.
Answer: TRUE
Diff: 2
Page Ref: 396
17) In the Salesforce case study, streaming data is used to identify services that customers use
most.
Answer: FALSE
Diff: 2
Page Ref: 410
18) It is important for Big Data and self-service business intelligence to go hand in hand to get
maximum value from analytics.
Answer: TRUE
Diff: 1
Page Ref: 395
19) Big Data simplifies data governance issues, especially for global firms.
Answer: FALSE
Diff: 2
Page Ref: 406
20) Current total storage capacity lags behind the digital information being generated in the
world.
Answer: TRUE
Diff: 2
Page Ref: 406
21) Using data to understand customers/clients and business operations to sustain and foster
growth and profitability is
A) easier with the advent of BI and Big Data.
B) essentially the same now as it has always been.
C) an increasingly challenging task for today's enterprises.
D) now completely automated with no human intervention required.
Answer: C
Diff: 2
Page Ref: 373
22) A newly popular unit of data in the Big Data era is the petabyte (PB), which is
A) 10^9 bytes.
B) 10^12 bytes.
C) 10^15 bytes.
D) 10^18 bytes.
Answer: C
Diff: 2
Page Ref: 375
23) Which of the following sources is likely to produce Big Data the fastest?
A) order entry clerks
B) cashiers
C) RFID tags
D) online customers
Answer: C
Diff: 2
Page Ref: 374
24) Data flows can be highly inconsistent, with periodic peaks, making data loads hard to
manage. What is this feature of Big Data called?
A) volatility
B) periodicity
C) inconsistency
D) variability
Answer: D
Diff: 2
Page Ref: 376
25) In the Twitter case study, how did influential users support their tweets?
A) opinion
B) objective data
C) multiple posts
D) references to other users
Answer: B
Diff: 2
Page Ref: 392
26) Allowing Big Data to be processed in memory and distributed across a dedicated set of nodes
can solve complex problems in near–real time with highly accurate insights. What is this process
called?
A) in-memory analytics
B) in-database analytics
C) grid computing
D) appliances
Answer: A
Diff: 2
Page Ref: 380
27) Which Big Data approach promotes efficiency, lower cost, and better performance by
processing jobs in a shared, centrally managed pool of IT resources?
A) in-memory analytics
B) in-database analytics
C) grid computing
D) appliances
Answer: C
Diff: 2
Page Ref: 380
28) How does Hadoop work?
A) It integrates Big Data into a whole so large data elements can be processed as a whole on one
computer.
B) It integrates Big Data into a whole so large data elements can be processed as a whole on
multiple computers.
C) It breaks up Big Data into multiple parts so each part can be processed and analyzed at the
same time on one computer.
D) It breaks up Big Data into multiple parts so each part can be processed and analyzed at the
same time on multiple computers.
Answer: D
Diff: 3
Page Ref: 386
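The split/process/merge idea in question 28 can be sketched with a word count in the MapReduce style. This is a conceptual illustration only: in Hadoop the map calls would run in parallel on separate nodes against HDFS blocks, and the framework would handle shuffling and fault tolerance.

```python
# Conceptual MapReduce sketch: split the input, map each part independently,
# then reduce the intermediate (key, value) pairs into the final answer.
from collections import Counter
from itertools import chain

def map_part(text_part):
    """Emit (word, 1) pairs for one input split."""
    return [(word, 1) for word in text_part.split()]

def reduce_pairs(pairs):
    """Sum the counts for each word across all splits."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

splits = ["big data big", "data tools"]    # stands in for HDFS blocks
word_counts = reduce_pairs(chain.from_iterable(map(map_part, splits)))
print(word_counts)  # {'big': 2, 'data': 2, 'tools': 1}
```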
29) What is the Hadoop Distributed File System (HDFS) designed to handle?
A) unstructured and semistructured relational data
B) unstructured and semistructured non-relational data
C) structured and semistructured relational data
D) structured and semistructured non-relational data
Answer: B
Diff: 2
Page Ref: 385
30) In a Hadoop "stack," what is a slave node?
A) a node where bits of programs are stored
B) a node where metadata is stored and used to organize data processing
C) a node where data is stored and processed
D) a node responsible for holding all the source programs
Answer: C
Diff: 2
Page Ref: 386
31) In a Hadoop "stack," what node periodically replicates and stores data from the Name Node
should it fail?
A) backup node
B) secondary node
C) substitute node
D) slave node
Answer: B
Diff: 2
Page Ref: 386
32) All of the following statements about MapReduce are true EXCEPT
A) MapReduce is a general-purpose execution engine.
B) MapReduce handles the complexities of network communication.
C) MapReduce handles parallel programming.
D) MapReduce runs without fault tolerance.
Answer: D
Diff: 2
Page Ref: 389
33) In a network analysis, what connects nodes?
A) edges
B) metrics
C) paths
D) visualizations
Answer: A
Diff: 2
Page Ref: 403
34) In the Analyzing Disease Patterns from an Electronic Medical Records Data Warehouse case
study, what was the analytic goal?
A) determine if diseases are accurately diagnosed
B) determine probabilities of diseases that are comorbid
C) determine differences in rates of disease in urban and rural populations
D) determine differences in rates of disease in males v. females
Answer: C
Diff: 2
Page Ref: 402
35) Traditional data warehouses have not been able to keep up with
A) the evolution of the SQL language.
B) the variety and complexity of data.
C) expert systems that run on them.
D) OLAP.
Answer: B
Diff: 2
Page Ref: 393
36) Under which of the following requirements would it be more appropriate to use Hadoop over
a data warehouse?
A) ANSI 2003 SQL compliance is required
B) online archives alternative to tape
C) unrestricted, ungoverned sandbox explorations
D) analysis of provisional data
Answer: C
Diff: 2
Page Ref: 396
37) What is Big Data's relationship to the cloud?
A) Hadoop cannot be deployed effectively in the cloud just yet.
B) Amazon and Google have working Hadoop cloud offerings.
C) IBM's homegrown Hadoop platform is the only option.
D) Only MapReduce works in the cloud; Hadoop does not.
Answer: B
Diff: 2
Page Ref: 403
38) Companies with the largest revenues from Big Data tend to be
A) the largest computer and IT services firms.
B) small computer and IT services firms.
C) pure open source Big Data firms.
D) non-U.S. Big Data firms.
Answer: A
Diff: 2
Page Ref: 405
39) In the financial services industry, Big Data can be used to improve
A) regulatory oversight.
B) decision making.
C) customer service.
D) both A & B.
Answer: D
Diff: 2
Page Ref: 411
40) In the Alternative Data for Market Analysis or Forecasts case study, satellite data was NOT
used for
A) evaluating retail traffic.
B) monitoring activity at factories.
C) tracking agricultural estimates.
D) monitoring individual customer patterns.
Answer: D
Diff: 2
Page Ref: 377
41) Big Data comes from ________.
Answer: everywhere
Diff: 2
Page Ref: 373
42) ________ refers to the conformity to facts: accuracy, quality, truthfulness, or trustworthiness
of the data.
Answer: Veracity
Diff: 2
Page Ref: 376
43) In-motion ________ is often overlooked today in the world of BI and Big Data.
Answer: analytics
Diff: 2
Page Ref: 376-377
44) The ________ of Big Data is its potential to contain more useful patterns and interesting
anomalies than "small" data.
Answer: value proposition
Diff: 2
Page Ref: 376
45) As the size and the complexity of analytical systems increase, the need for more ________
analytical systems is also increasing to obtain the best performance.
Answer: efficient
Diff: 2
Page Ref: 380
46) ________ speeds time to insights and enables better data governance by performing data
integration and analytic functions inside the database.
Answer: In-database analytics
Diff: 2
Page Ref: 380
47) ________ bring together hardware and software in a physical unit that is not only fast but
also scalable on an as-needed basis.
Answer: Appliances
Diff: 2
Page Ref: 380
48) Big Data employs ________ processing techniques and nonrelational data storage
capabilities in order to process unstructured and semistructured data.
Answer: parallel
Diff: 2
Page Ref: 383
49) In the world of Big Data, ________ aids organizations in processing and analyzing large
volumes of multistructured data. Examples include indexing and search, graph analysis, etc.
Answer: MapReduce
Diff: 2
Page Ref: 385
50) The ________ Node in a Hadoop cluster provides client information on where in the cluster
particular data is stored and if any nodes fail.
Answer: Name
Diff: 2
Page Ref: 385
51) A job ________ is a node in a Hadoop cluster that initiates and coordinates MapReduce jobs,
or the processing of the data.
Answer: tracker
Diff: 2
Page Ref: 386
52) HBase is a nonrelational ________ that allows for low-latency, quick lookups in Hadoop.
Answer: database
Diff: 2
Page Ref: 387
53) Hadoop is primarily a(n) ________ file system and lacks capabilities we'd associate with a
DBMS, such as indexing, random access to data, and support for SQL.
Answer: distributed
Diff: 2
Page Ref: 388
54) HBase, Cassandra, MongoDB, and Accumulo are examples of ________ databases.
Answer: NoSQL
Diff: 2
Page Ref: 389
55) The problem of forecasting economic activity or microclimates based on a variety of data
beyond the usual retail data is a very recent phenomenon and has led to another buzzword —
________.
Answer: alternative data
Diff: 2
Page Ref: 377
56) As volumes of Big Data arrive from multiple sources such as sensors, machines, social
media, and clickstream interactions, the first step is to ________ all the data reliably and cost
effectively.
Answer: capture
Diff: 2
Page Ref: 393
57) In open-source databases, the most important performance enhancement to date is the cost-based ________.
Answer: optimizer
Diff: 2
Page Ref: 395
58) ________ of data provides business value; pulling of data from multiple subject areas and
numerous applications into one repository is the raison d'être for data warehouses.
Answer: Integration
Diff: 2
Page Ref: 395
59) In the energy industry, ________ grids are one of the most impactful applications of stream
analytics.
Answer: smart
Diff: 2
Page Ref: 407
60) Organizations are working with data that meets the three V's characterization: variety,
volume, and ________.
Answer: velocity
Diff: 2
Page Ref: 374
61) In the opening vignette, why was the Telecom company so concerned about the loss of
customers, if customer churn is common in that industry?
Answer: The company was concerned about its loss of customers, because the loss was at such a
high rate. The company was losing customers faster than it was gaining them. Additionally, the
company had identified that the loss of these customers could be traced back to customer service
interactions. Because of this, the company felt that the loss of customers is something that could
be analyzed and hopefully controlled.
Diff: 2
Page Ref: 370-371
62) List and describe the three main "V"s that characterize Big Data.
Answer:
• Volume: This is obviously the most common trait of Big Data. Many factors contributed to
the exponential increase in data volume, such as transaction-based data stored through the years,
text data constantly streaming in from social media, increasing amounts of sensor data being
collected, automatically generated RFID and GPS data, and so forth.
• Variety: Data today comes in all types of formats—ranging from traditional databases to
hierarchical data stores created by the end users and OLAP systems, to text documents, e-mail,
XML, meter-collected, sensor-captured data, to video, audio, and stock ticker data. By some
estimates, 80 to 85 percent of all organizations' data is in some sort of unstructured or
semistructured format.
• Velocity: This refers to both how fast data is being produced and how fast the data must be
processed (i.e., captured, stored, and analyzed) to meet the need or demand. RFID tags,
automated sensors, GPS devices, and smart meters are driving an increasing need to deal with
torrents of data in near–real time.
Diff: 2
Page Ref: 374-375
63) List and describe four of the most critical success factors for Big Data analytics.
Answer:
• A clear business need (alignment with the vision and the strategy). Business investments
ought to be made for the good of the business, not for the sake of mere technology
advancements. Therefore, the main driver for Big Data analytics should be the needs of the
business at any level—strategic, tactical, and operations.
• Strong, committed sponsorship (executive champion). It is a well-known fact that if you
don't have strong, committed executive sponsorship, it is difficult (if not impossible) to succeed.
If the scope is a single or a few analytical applications, the sponsorship can be at the
departmental level. However, if the target is enterprise-wide organizational transformation,
which is often the case for Big Data initiatives, sponsorship needs to be at the highest levels and
organization-wide.
• Alignment between the business and IT strategy. It is essential to make sure that the
analytics work is always supporting the business strategy, and not the other way around. Analytics
should play the enabling role in successful execution of the business strategy.
• A fact-based decision making culture. In a fact-based decision-making culture, the numbers
rather than intuition, gut feeling, or supposition drive decision making. There is also a culture of
experimentation to see what works and what doesn't. To create a fact-based decision-making culture,
senior management needs to do the following: recognize that some people can't or won't adjust;
be a vocal supporter; stress that outdated methods must be discontinued; ask to see what
analytics went into decisions; link incentives and compensation to desired behaviors.
• A strong data infrastructure. Data warehouses have provided the data infrastructure for
analytics. This infrastructure is changing and being enhanced in the Big Data era with new
technologies. Success requires marrying the old with the new for a holistic infrastructure that
works synergistically.
Diff: 2
Page Ref: 379-380
64) When considering Big Data projects and architecture, list and describe five challenges
designers should be mindful of in order to make the journey to analytics competency less
stressful.
Answer:
• Data volume: The ability to capture, store, and process the huge volume of data at an
acceptable speed so that the latest information is available to decision makers when they need it.
• Data integration: The ability to combine data that is not similar in structure or source and to
do so quickly and at reasonable cost.
• Processing capabilities: The ability to process the data quickly, as it is captured. The
traditional way of collecting and then processing the data may not work. In many situations data
needs to be analyzed as soon as it is captured to leverage the most value.
• Data governance: The ability to keep up with the security, privacy, ownership, and quality
issues of Big Data. As the volume, variety (format and source), and velocity of data change, so
should the capabilities of governance practices.
• Skills availability: Big Data is being harnessed with new tools and is being looked at in
different ways. There is a shortage of data scientists with the skills to do the job.
• Solution cost: Since Big Data has opened up a world of possible business improvements,
there is a great deal of experimentation and discovery taking place to determine the patterns that
matter and the insights that turn to value. To ensure a positive ROI on a Big Data project,
therefore, it is crucial to reduce the cost of the solutions used to find that value.
Diff: 3
Page Ref: 381
65) Define MapReduce.
Answer: As described by Dean and Ghemawat (2004), "MapReduce is a programming model
and an associated implementation for processing and generating large data sets. Programs written
in this functional style are automatically parallelized and executed on a large cluster of
commodity machines. This allows programmers without any experience with parallel and
distributed systems to easily utilize the resources of a large distributed system."
Diff: 2
Page Ref: 384
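The programming model in this definition can be illustrated with a minimal, single-machine word-count sketch (plain Python, not an actual Hadoop job; the `map_fn`, `shuffle`, and `reduce_fn` names are illustrative, not part of any real MapReduce API):

```python
from collections import defaultdict

# Map phase: emit (key, value) pairs for each input record.
def map_fn(line):
    for word in line.split():
        yield (word.lower(), 1)

# Shuffle phase: group all emitted values by key.
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: combine the grouped values for each key.
def reduce_fn(key, values):
    return (key, sum(values))

def word_count(lines):
    pairs = (pair for line in lines for pair in map_fn(line))
    return dict(reduce_fn(k, v) for k, v in shuffle(pairs).items())

# On a real cluster, the map and reduce tasks run in parallel across
# many commodity machines; the framework handles the distribution.
print(word_count(["big data big insight"]))  # {'big': 2, 'data': 1, 'insight': 1}
```

The automatic parallelization Dean and Ghemawat describe comes from the framework distributing the map and reduce calls; the programmer supplies only the two functions.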
66) What is NoSQL as used for Big Data? Describe its major downsides.
Answer:
• NoSQL is a new style of database that has emerged to, like Hadoop, process large volumes of
multi-structured data. However, whereas Hadoop is adept at supporting large-scale, batch-style
historical analysis, NoSQL databases are aimed, for the most part (though there are some
important exceptions), at serving up discrete data stored among large volumes of multi-structured data to end-user and automated Big Data applications. This capability is sorely lacking
from relational database technology, which simply can't maintain needed application
performance levels at Big Data scale.
• The downside of most NoSQL databases today is that they trade ACID (atomicity,
consistency, isolation, durability) compliance for performance and scalability. Many also lack
mature management and monitoring tools.
Diff: 2
Page Ref: 389-390
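The access pattern contrasted in this answer — discrete, low-latency lookups by key rather than batch scans — can be sketched with a toy in-memory key-value store (plain Python; real NoSQL systems such as HBase or Cassandra distribute the rows across many nodes and, as noted, trade ACID compliance for scale):

```python
# Toy key-value store illustrating the NoSQL access pattern:
# reads and writes address one row by key instead of scanning a table.
class ToyKeyValueStore:
    def __init__(self):
        self._rows = {}  # key -> column dict; a real store shards this across nodes

    def put(self, key, columns):
        # Last-write-wins per key; no multi-row transactions (no full ACID).
        self._rows.setdefault(key, {}).update(columns)

    def get(self, key):
        # Discrete, low-latency lookup of a single row by key.
        return self._rows.get(key)

store = ToyKeyValueStore()
store.put("user:42", {"name": "Ada", "plan": "pro"})
store.put("user:42", {"plan": "enterprise"})  # later write updates one column
print(store.get("user:42"))  # {'name': 'Ada', 'plan': 'enterprise'}
```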
67) List and briefly discuss the three characteristics that define and make the case for data
warehousing.
Answer:
1) Data warehouse performance: More advanced forms of indexing such as materialized
views, aggregate join indexes, cube indexes, and sparse join indexes enable numerous
performance gains in data warehouses. The most important performance enhancement to date is
the cost-based optimizer, which examines incoming SQL and considers multiple plans for
executing each query as fast as possible.
2) Integrating data that provides business value: Integrated data is the unique foundation
required to answer essential business questions.
3) Interactive BI tools: These tools allow business users to have direct access to data
warehouse insights. Users are able to extract business value from the data and supply valuable
strategic information to the executive staff.
Diff: 2
Page Ref: 394-395
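The cost-based optimizer in point 1 can be sketched as follows (a toy illustration; real optimizers derive cost estimates from table statistics and enumerate far more plan shapes, and the cost formulas here are invented for the example):

```python
# Toy cost-based optimizer: enumerate candidate plans for a query
# and pick the one with the lowest estimated cost.
def estimate_cost(plan, stats):
    if plan == "full_scan":
        return stats["rows"]            # must read every row
    if plan == "index_lookup":
        return stats["matches"] * 3     # per-match index traversal (toy constant)
    raise ValueError(f"unknown plan: {plan}")

def choose_plan(candidate_plans, stats):
    return min(candidate_plans, key=lambda p: estimate_cost(p, stats))

# Highly selective predicate: the index wins.
print(choose_plan(["full_scan", "index_lookup"],
                  {"rows": 1_000_000, "matches": 50}))   # index_lookup
# Predicate matching most rows: the full scan wins.
print(choose_plan(["full_scan", "index_lookup"],
                  {"rows": 1_000, "matches": 900}))      # full_scan
```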
68) Why are some portions of tape backup workloads being redirected to Hadoop clusters today?
Answer:
• First, while it may appear inexpensive to store data on tape, the true cost comes with the
difficulty of retrieval. Not only is the data stored offline, requiring hours if not days to restore,
but tape cartridges themselves are also prone to degradation over time, making data loss a reality
and forcing companies to factor in those costs. To make matters worse, tape formats change
every couple of years, requiring organizations to either perform massive data migrations to the
newest tape format or risk the inability to restore data from obsolete tapes.
• Second, it has been shown that there is value in keeping historical data online and accessible.
As in the clickstream example, keeping raw data on a spinning disk for a longer duration makes
it easy for companies to revisit data when the context changes and new constraints need to be
applied. Searching thousands of disks with Hadoop is dramatically faster and easier than
spinning through hundreds of magnetic tapes. Additionally, as disk densities continue to double
every 18 months, it becomes economically feasible for organizations to hold many years' worth
of raw or refined data in HDFS.
Diff: 2
Page Ref: 394
69) What are the differences between stream analytics and perpetual analytics? When would you
use one or the other?
Answer:
• In many cases they are used synonymously. However, in the context of intelligent systems,
there is a difference. Streaming analytics involves applying transaction-level logic to real-time
observations. The rules applied to these observations take into account previous observations as
long as they occurred in the prescribed window; these windows have some arbitrary size (e.g.,
last 5 seconds, last 10,000 observations, etc.). Perpetual analytics, on the other hand, evaluates
every incoming observation against all prior observations, where there is no window size.
Recognizing how the new observation relates to all prior observations enables the discovery of
real-time insight.
• When transactional volumes are high and the time-to-decision is too short, favoring
nonpersistence and small window sizes, this translates into using streaming analytics. However,
when the mission is critical and transaction volumes can be managed in real time, then perpetual
analytics is a better answer.
Diff: 2
Page Ref: 407-408
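The window-versus-full-history distinction in this answer can be sketched with two running averages (plain Python; the class names are illustrative):

```python
from collections import deque

class StreamingAverage:
    """Streaming analytics: only the last `window` observations matter."""
    def __init__(self, window):
        self.buffer = deque(maxlen=window)  # older observations fall out automatically

    def observe(self, x):
        self.buffer.append(x)
        return sum(self.buffer) / len(self.buffer)

class PerpetualAverage:
    """Perpetual analytics: each observation is evaluated against all prior ones."""
    def __init__(self):
        self.count = 0
        self.total = 0.0

    def observe(self, x):
        self.count += 1
        self.total += x
        return self.total / self.count

s, p = StreamingAverage(window=2), PerpetualAverage()
for x in [10, 20, 90]:
    sa, pa = s.observe(x), p.observe(x)
print(sa, pa)  # 55.0 40.0 — the window sees only 20 and 90; perpetual sees all three
```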
70) Describe data stream mining and how it is used.
Answer: Data stream mining, as an enabling technology for stream analytics, is the process of
extracting novel patterns and knowledge structures from continuous, rapid data records. A data
stream is a continuous flow of ordered sequence of instances that in many applications of data
stream mining can be read/processed only once or a small number of times using limited
computing and storage capabilities. Examples of data streams include sensor data, computer
network traffic, phone conversations, ATM transactions, web searches, and financial data. Data
stream mining can be considered a subfield of data mining, machine learning, and knowledge
discovery. In many data stream mining applications, the goal is to predict the class or value of
new instances in the data stream given some knowledge about the class membership or values of
previous instances in the data stream.
Diff: 2
Page Ref: 408-409
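The single-pass prediction goal described in this answer can be sketched with a simple online learner (a toy illustration; production data stream mining uses incremental algorithms such as Hoeffding trees, and the fraud-detection stream here is invented):

```python
from collections import defaultdict

class OnlineMajorityClassifier:
    """Single-pass learner: each instance is read once, counts are updated
    incrementally, and memory stays bounded regardless of stream length."""
    def __init__(self):
        self.counts = defaultdict(int)

    def predict(self, _instance):
        # Predict the most frequent class seen so far (None before any data).
        return max(self.counts, key=self.counts.get) if self.counts else None

    def learn(self, _instance, label):
        self.counts[label] += 1

model = OnlineMajorityClassifier()
stream = [({"amt": 12}, "ok"), ({"amt": 9000}, "fraud"), ({"amt": 15}, "ok")]
for instance, label in stream:
    guess = model.predict(instance)   # predict first (prequential evaluation)...
    model.learn(instance, label)      # ...then update with the true label
print(model.predict({"amt": 20}))  # 'ok'
```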
Business Intelligence, 4e (Sharda/Delen/Turban)
Chapter 8 Future Trends, Privacy and Managerial Considerations in Analytics
1) Siemens utilizes data sensors to track failure rates in household appliances.
Answer: FALSE
Diff: 2
Page Ref: 418
2) In the classification of location-based analytic applications, examining geographic site
locations falls in the consumer-oriented category.
Answer: FALSE
Diff: 2
Page Ref: 445
3) In the Great Clips case study, the company uses geospatial data to analyze, among other
things, the types of haircuts most popular in different geographic locations.
Answer: FALSE
Diff: 2
Page Ref: 443
4) From massive amounts of high-dimensional location data, algorithms that reduce the
dimensionality of the data can be used to uncover trends, meaning, and relationships to
eventually produce human-understandable representations.
Answer: TRUE
Diff: 2
Page Ref: 445
5) In the Quiznos case, the company employed location-based behavioral targeting to narrow the
characteristics of users who were most likely to eat at a quick-service restaurant.
Answer: TRUE
Diff: 2
Page Ref: 446
6) Internet of Things (IoT) is the phenomenon of connecting the physical world to the Internet.
Answer: TRUE
Diff: 2
Page Ref: 419
7) For cloud computing to be successful, users must have knowledge and experience in the
control of the technology infrastructures.
Answer: FALSE
Diff: 2
Page Ref: 430
8) Social networking Web sites like Facebook, Twitter, and LinkedIn, are also examples of cloud
computing.
Answer: TRUE
Diff: 1
Page Ref: 430
9) Web-based e-mail such as Google's Gmail are not examples of cloud computing.
Answer: FALSE
Diff: 2
Page Ref: 430
10) Service-oriented DSS solutions generally offer individual or bundled services to the user as a
service.
Answer: TRUE
Diff: 2
Page Ref: 431
11) Users definitely own their biometric data.
Answer: FALSE
Diff: 2
Page Ref: 452
12) Data as a service began with the notion that data quality could happen in a centralized place,
cleansing and enriching data and offering it to different systems, applications, or users,
irrespective of where they were in the organization, computers, or on the network.
Answer: TRUE
Diff: 2
Page Ref: 431
13) IaaS helps provide faster information, but provides information only to managers in an
organization.
Answer: FALSE
Diff: 2
Page Ref: 432
14) Server virtualization is the pooling of physical storage from multiple network storage devices
into a single storage device.
Answer: FALSE
Diff: 2
Page Ref: 433
15) While cloud services are useful for small and midsize analytic applications, they are still
limited in their ability to handle Big Data applications.
Answer: FALSE
Diff: 2
Page Ref: 435
16) SaaS combines aspects of cloud computing with Big Data analytics and empowers data
scientists and analysts by allowing them to access centrally managed information data sets.
Answer: FALSE
Diff: 2
Page Ref: 435
17) One reason the IoT is growing exponentially is because hardware is smaller and more
affordable.
Answer: TRUE
Diff: 2
Page Ref: 420
18) Connectivity is not a part of the IoT infrastructure.
Answer: FALSE
Diff: 2
Page Ref: 422
19) RFID can be used in supply chains to manage product quality.
Answer: TRUE
Diff: 1
Page Ref: 425
20) The term cloud computing originates from a reference to the Internet as a "cloud" and
represents an evolution of all of the previously shared/centralized computing trends.
Answer: TRUE
Diff: 2
Page Ref: 430
21) What kind of location-based analytics is a real-time marketing promotion?
A) organization-oriented geospatial static approach
B) organization-oriented location-based dynamic approach
C) consumer-oriented geospatial static approach
D) consumer-oriented location-based dynamic approach
Answer: B
Diff: 2
Page Ref: 441
22) GPS Navigation is an example of which kind of location-based analytics?
A) organization-oriented geospatial static approach
B) organization-oriented location-based dynamic approach
C) consumer-oriented geospatial static approach
D) consumer-oriented location-based dynamic approach
Answer: C
Diff: 2
Page Ref: 441
23) What new geometric data type in Teradata's data warehouse captures geospatial features?
A) NAVTEQ
B) ST_GEOMETRY
C) GIS
D) SQL/MM
Answer: B
Diff: 2
Page Ref: 443
24) Which of these is NOT a part of the IoT technology infrastructure?
A) hardware
B) connectivity
C) electrical access
D) software
Answer: C
Diff: 2
Page Ref: 422
25) Today, most smartphones are equipped with various instruments to measure jerk, orientation,
and sense motion. One of these instruments is an accelerometer, and the other is a(n)
A) potentiometer.
B) gyroscope.
C) microscope.
D) oscilloscope.
Answer: B
Diff: 2
Page Ref: 464
26) Smartbin has developed trash containers that include sensors to detect
A) fill levels.
B) types of trash.
C) tip-over.
D) weather.
Answer: A
Diff: 2
Page Ref: 419-420
27) The portion of the IoT technology infrastructure that focuses on the sensors themselves is
A) hardware.
B) connectivity.
C) software backend.
D) applications.
Answer: A
Diff: 2
Page Ref: 422
28) The portion of the IoT technology infrastructure that focuses on how to manage incoming
data and analyze it is
A) hardware.
B) connectivity.
C) software backend.
D) applications.
Answer: C
Diff: 2
Page Ref: 422
29) The portion of the IoT technology infrastructure that focuses on controlling what and how
information is captured is
A) hardware.
B) connectivity.
C) software backend.
D) applications.
Answer: D
Diff: 2
Page Ref: 422
30) The portion of the IoT technology infrastructure that focuses on how to transmit data is
A) hardware.
B) connectivity.
C) software backend.
D) applications.
Answer: B
Diff: 2
Page Ref: 422
31) Using this model, companies can deploy their software and applications in the cloud so that
their customers can use them.
A) SaaS
B) PaaS
C) IaaS
D) DaaS
Answer: B
Diff: 2
Page Ref: 432
32) This model allows consumers to use applications and software that run on distant computers
in the cloud infrastructure.
A) SaaS
B) PaaS
C) IaaS
D) DaaS
Answer: A
Diff: 2
Page Ref: 432
33) Which of the following is true of data-as-a-Service (DaaS) platforms?
A) Knowing where the data resides is critical to the functioning of the platform.
B) There are standardized processes for accessing data wherever it is located.
C) Business processes can access local data only.
D) Data quality happens on each individual platform.
Answer: B
Diff: 2
Page Ref: 431-432
34) Which of the following allows companies to deploy their software and applications in the
cloud so that their customers can use them?
A) SaaS
B) IaaS
C) PaaS
D) AaaS
Answer: C
Diff: 2
Page Ref: 432
35) In this model, infrastructure resources like networks, storage, servers, and other computing
resources are provided to client companies.
A) SaaS
B) PaaS
C) IaaS
D) DaaS
Answer: C
Diff: 2
Page Ref: 432
36) This model began with the notion that data quality could happen in a centralized place,
cleansing and enriching data and offering it to different systems, applications, or users,
irrespective of where they were in the organization, computers, or on the network.
A) SaaS
B) PaaS
C) IaaS
D) DaaS
Answer: D
Diff: 2
Page Ref: 431
37) Why are companies like IBM shifting to provide more services and consulting?
A) Customers see that significant value can be created with the application of analytics, and need
help completing these tasks.
B) They can no longer compete in the software market.
C) New regulations forced them into this market.
D) None of these.
Answer: A
Diff: 3
Page Ref: 454
38) Services that let consumers permanently enter a profile of information along with a password
and use this information repeatedly to access services at multiple sites are called
A) consumer access applications.
B) information collection portals.
C) single-sign-on facilities.
D) consumer information sign on facilities.
Answer: C
Diff: 2
Page Ref: 450
39) Which of the following is true about the furtherance of homeland security?
A) There is a lessening of privacy issues.
B) There is a greater need for oversight.
C) The impetus was the need to harvest information related to financial fraud after 2001.
D) Most people regard analytic tools as mostly ineffective in increasing security.
Answer: B
Diff: 2
Page Ref: 450-451
40) Why is separating the impact of analytics from that of other computerized systems a difficult
task?
A) Businesses do not typically track the sources of successful projects.
B) The trend is toward integrating systems.
C) Software tools are not sophisticated enough.
D) It is not an organizational priority.
Answer: B
Diff: 2
Page Ref: 453
41) ________ is a generic technology that refers to the use of radio-frequency waves to identify
objects.
Answer: RFID
Diff: 2
Page Ref: 422
42) A critical emerging trend in analytics is the incorporation of location data. ________ data is
the static location data used by these location-based analytic applications.
Answer: Geospatial
Diff: 2
Page Ref: 441
43) With RFID tags, a(n) ________ tag has a battery on board to energize it.
Answer: active
Diff: 2
Page Ref: 423
44) With RFID tags, a(n) ________ tag receives energy from the electromagnetic field created
by the interrogator.
Answer: passive
Diff: 2
Page Ref: 423
45) Predictive analytics is beginning to enable development of software that is directly used by a
consumer. One key concern in employing these technologies is the loss of ________.
Answer: privacy
Diff: 2
Page Ref: 448
46) ________ is the splitting of available bandwidth into channels.
Answer: Network virtualization
Diff: 2
Page Ref: 433
47) ________ is the masking of physical servers from server users.
Answer: Server virtualization
Diff: 2
Page Ref: 433
48) ________ provides resources like networks, storage, servers, and other computing resources
to client companies.
Answer: IaaS
Diff: 3
Page Ref: 432
49) IaaS, AaaS and other ________-based offerings allow the rapid diffusion of advanced
analysis tools among users, without significant investment in technology acquisition.
Answer: cloud
Diff: 2
Page Ref: 440
50) A major structural change that can occur when analytics are introduced into an organization
is the creation of new organizational ________.
Answer: units
Diff: 2
Page Ref: 454
51) A(n) ________ is operated solely for a single organization having a mission critical
workload and security concerns.
Answer: private cloud
Diff: 2
Page Ref: 433
52) In a(n) ________ the subscriber uses the resources offered by service providers over the
Internet.
Answer: public cloud
Diff: 2
Page Ref: 434
53) Analytics can change the way in which many ________ are made by managers and can
consequently change their jobs.
Answer: decisions
Diff: 2
Page Ref: 455
54) AaaS in the cloud has economies of scale and scope by providing many ________ analytical
applications with better scalability and higher cost savings.
Answer: virtual
Diff: 2
Page Ref: 435
55) Location information from ________ phones can be used to create profiles of user behavior
and movement.
Answer: mobile
Diff: 2
Page Ref: 462
56) For individual decision makers, ________ values constitute a major factor in the issue of
ethical decision making.
Answer: personal
Diff: 2
Page Ref: 453
57) ________ is/are used to capture, store, analyze, and manage data linked to a location using
integrated sensor technologies, global positioning systems installed in smartphones, or through
RFID deployments in the retail and healthcare industries.
Answer: GIS
Diff: 2
Page Ref: 442
58) By using ________, businesses can collect and analyze data to discern large-scale patterns of
movement and identify distinct classes of behaviors in specific contexts.
Answer: location-enabled services
Diff: 3
Page Ref: 445
59) Pokémon GO is an example of a location-sensing ________ reality-based game.
Answer: augmented
Diff: 2
Page Ref: 446
60) In general, ________ is the right to be left alone and the right to be free from unreasonable
personal intrusion.
Answer: privacy
Diff: 2
Page Ref: 449
61) How does Siemens use sensor data to help monitor equipment on trains?
Answer: Siemens uses an IoT model and sensors attached to several key components of trains
and other railway equipment to help evaluate its current working condition, and predict the need
for future repair. By using a wide variety of different types of sensors, the company is able to
evaluate a multitude of conditions. This evaluation can be on the train itself, or within the
supporting infrastructure. By using analytics to monitor these sensors, the company is able to
predict the need for repair prior to component failure.
Diff: 2
Page Ref: 418
62) How do the traditional location-based analytic techniques using geocoding of organizational
locations and consumers hamper the organizations in understanding "true location-based"
impacts?
Answer: Locations based on postal codes offer an aggregate view of a large geographic area.
This poor granularity may not be able to pinpoint the growth opportunities within a region. The
location of the target customers can change rapidly. An organization's promotional campaigns
might not target the right customers.
Diff: 2
Page Ref: 441
63) In what ways can communications companies use geospatial analysis to harness their data
effectively?
Answer: Communication companies often generate massive amounts of data every day. The
ability to analyze the data quickly with a high level of location-specific granularity can better
identify the customer churn and help in formulating strategies specific to locations for increasing
operational efficiency, quality of service, and revenue.
Diff: 2
Page Ref: 444
64) What is Internet of Things (IoT) and how is it used?
Answer: IoT is the phenomenon of connecting the physical world to the Internet. In IoT,
physical devices are connected to sensors that collect data on the operation, location, and state of
a device. This data is processed using various analytics techniques for monitoring the device
remotely from a central office or for predicting any upcoming faults in the device.
Diff: 2
Page Ref: 419
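The monitoring pattern in this answer — sensor data streamed to a central office and analyzed to predict faults — can be sketched as follows (a toy illustration; the device names and the single-threshold rule are invented stand-ins for real predictive analytics):

```python
# Toy IoT monitoring sketch: sensor readings stream in from connected
# devices and a central service flags devices that may need repair.
def monitor(readings, threshold):
    """readings: iterable of (device_id, temperature) tuples."""
    alerts = []
    for device_id, temperature in readings:
        if temperature > threshold:  # simplistic rule standing in for real analytics
            alerts.append(device_id)
    return alerts

stream = [("train-axle-1", 61.0), ("train-axle-2", 88.5), ("track-sensor-9", 40.2)]
print(monitor(stream, threshold=80.0))  # ['train-axle-2']
```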
65) What is cloud computing? What is Amazon's general approach to the cloud computing
services it provides?
Answer:
• Wikipedia defines cloud computing as "a style of computing in which dynamically scalable
and often virtualized resources are provided over the Internet. Users need not have knowledge of,
experience in, or control over the technology infrastructures in the cloud that supports them."
• Amazon.com has developed an impressive technology infrastructure for e-commerce as well
as for business intelligence, customer relationship management, and supply chain management.
It has built major data centers to manage its own operations. However, through Amazon.com's
cloud services, many other companies can employ these very same facilities to gain advantages
of these technologies without having to make a similar investment. Like other cloud-computing
services, a user can subscribe to any of the facilities on a pay-as-you-go basis. This model of
letting someone else own the hardware and software but making use of the facilities on a pay-per-use basis is the cornerstone of cloud computing.
Diff: 2
Page Ref: 430
66) Data and text mining is a promising application of AaaS. What additional capabilities can
AaaS bring to the analytic world?
Answer: It can also be used for large-scale optimization, highly complex multi-criteria decision
problems, and distributed simulation models. These prescriptive analytics require highly capable,
service-based collaborative systems that can utilize large-scale computational resources.
Diff: 3
Page Ref: 435
67) Describe your understanding of the emerging term people analytics. Are there any privacy
issues associated with the application?
Answer:
• Applications such as using sensor-embedded badges that employees wear to track their
movement and predict behavior has resulted in the term people analytics. This application area
combines organizational IT impact, Big Data, sensors, and has privacy concerns. One company,
Sociometric Solutions, has reported several such applications of their sensor-embedded badges.
• People analytics creates major privacy issues. Should the companies be able to monitor their
employees this intrusively? Sociometric has reported that its analytics are only reported on an
aggregate basis to their clients. No individual user data is shared. They have noted that some
employers want to get individual employee data, but their contract explicitly prohibits this type
of sharing. In any case, sensors are leading to another level of surveillance and analytics, which
poses interesting privacy, legal, and ethical questions.
Diff: 2
Page Ref: 455
68) What is a data scientist and what does the job involve?
Answer: A data scientist is a role or a job frequently associated with Big Data or data science. In
a very short time it has become one of the most sought-after roles in the marketplace. Currently,
data scientists' most basic skill is the ability to write code (in the latest Big Data
languages and platforms). A more enduring skill will be the need for data scientists to
communicate in a language that all their stakeholders understand—and to demonstrate the
special skills involved in storytelling with data, whether verbally, visually, or—ideally—both.
Data scientists use a combination of their business and technical skills to investigate Big Data
looking for ways to improve current business analytics practices (from descriptive to predictive
and prescriptive) and hence to improve decisions for new business opportunities.
Diff: 2
Page Ref: 459