INFORMATION TECHNOLOGY IN BUSINESS AND SOCIETY
SESSION 25 – REVIEW AND WRAP UP
SEAN J. TAYLOR

ADMINISTRATIVIA
• G1: Submit group feedback forms.
• G2:
  • You do not need to dress up.
  • Bring slides on a USB drive or bring a laptop.
  • 15 minutes total. Optionally end 2-3 minutes early to take questions.
  • Not everyone needs to present, but make it clear to me who did what!

OUTLINE
• Relational databases
• SQL
• Data mining
• Visualizing data
• Software engineering
• Location-based services
• Pricing
• Network effects
• Lock-in and switching costs
From the first half:
• Porter’s Five Forces
• Innovator’s Dilemma
• IT-enabled strategies

THE FIVE COMPETITIVE FORCES
firm = company = organization = business = competitor
Focal industry: the industry that you are analyzing.
• Threat of New Entrants (Barriers to Entry): the threat of entry by potential entrants (new firms) into the focal industry
• Bargaining Power of Suppliers: the bargaining power of the firms that sell inputs to the firms in the focal industry
• Bargaining Power of Buyers: the bargaining power of the customers that buy the finished products of the firms in the focal industry
• Threat of Substitute Products or Services: the threat of products/services that could substitute for (be used instead of) the finished products made by the firms in the focal industry
• Rivalry Among Existing Competitors: the extent of rivalry between the existing firms in the focal industry

SUSTAINING VS. DISRUPTIVE INNOVATIONS

FOUR KEY INTERNET-ENABLED STRATEGIES
Four key Internet-enabled strategies for competitive advantage:
1. Disintermediation
2. Mass Customization
3. Personalization
4. Global Reach

RELATIONAL DATABASES: OBJECTIVES
• Know the names for all the key terms.
• Argue why using an RDBMS is important.
• Identify anomalies that could arise from storing data the wrong way.
• Draw an E-R diagram for an application.
• Normalize a database I give you.
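The anomaly and normalization objectives can be made concrete with a small sketch. It uses Python's standard-library `sqlite3` module and an in-memory database. The table and column names (`OrdersFlat`, `Customer`, `CustOrder`) and the data are hypothetical and were chosen only for illustration. The flat design repeats a customer's city on every order row. An update that misses one of those rows therefore leaves two conflicting cities, which is an update anomaly. The normalized design stores the city exactly once, and every order reaches it through the `CustomerId` key.

```python
import sqlite3


def flat_design_update_anomaly():
    """Unnormalized design: the customer's city is repeated on every order row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE OrdersFlat ("
        "OrderId INTEGER PRIMARY KEY, CustomerName TEXT, CustomerCity TEXT, Amount REAL)"
    )
    conn.executemany(
        "INSERT INTO OrdersFlat VALUES (?, ?, ?, ?)",
        [(1, "Kim", "Boston", 25.0), (2, "Kim", "Boston", 40.0)],
    )
    # Kim moves to Denver, but the update only touches one of her order rows.
    conn.execute("UPDATE OrdersFlat SET CustomerCity = 'Denver' WHERE OrderId = 1")
    rows = conn.execute(
        "SELECT DISTINCT CustomerCity FROM OrdersFlat "
        "WHERE CustomerName = 'Kim' ORDER BY CustomerCity"
    ).fetchall()
    conn.close()
    return [city for (city,) in rows]  # two conflicting answers for one customer


def normalized_design_update():
    """Normalized design: the city lives in exactly one Customer row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Customer (
            CustomerId INTEGER PRIMARY KEY,
            Name TEXT NOT NULL,
            City TEXT
        );
        CREATE TABLE CustOrder (
            OrderId INTEGER PRIMARY KEY,
            CustomerId INTEGER NOT NULL REFERENCES Customer(CustomerId),
            Amount REAL CHECK (Amount > 0)
        );
        INSERT INTO Customer VALUES (1, 'Kim', 'Boston');
        INSERT INTO CustOrder VALUES (1, 1, 25.0), (2, 1, 40.0);
        """
    )
    # One UPDATE changes the city seen by every order, because orders store only the key.
    conn.execute("UPDATE Customer SET City = 'Denver' WHERE CustomerId = 1")
    rows = conn.execute(
        "SELECT DISTINCT c.City FROM CustOrder AS o "
        "JOIN Customer AS c ON c.CustomerId = o.CustomerId"
    ).fetchall()
    conn.close()
    return [city for (city,) in rows]


if __name__ == "__main__":
    print(flat_design_update_anomaly())  # ['Boston', 'Denver']
    print(normalized_design_update())    # ['Denver']
```

The `CHECK (Amount > 0)` and `NOT NULL` clauses are one way to express the kinds of constraints a database can enforce and a spreadsheet typically cannot.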
RELATIONAL DATABASES
A relational database is made up of tables. Tables are made up of records, records are made up of fields, and fields hold field values, which are stored as bytes and bits.
Student Table (each row is a record, each column is a field, and each cell holds a field value):
Last Name   SS#         DOB        Major
Smith       100201122   06/11/84   IS
Kim         200202222   1/1/85     FIN
Davis       300201232   12/31/81   MKT
Pat         999132212   3/3/88     ACC

ADVANTAGES
1. Consistency
• We can restrict the values of certain fields (e.g., dates, integers).
• We can impose other kinds of constraints (all costs must be positive, last names must be included, orders must have addresses).
• Data look the same to all users at the same time.
2. Centralization
• Many different users can edit and view the data simultaneously, which makes sharing information efficient.
3. Efficient Querying
• SQL and other query languages can be used to create complex reports quickly.

PROBLEMS WITH EXCEL?
When should you use a database instead of Excel? When spreadsheets expose you to data quality problems:
– Insertion anomalies
– Deletion anomalies
– Update anomalies
Should we just create multiple workbooks in Excel?
– The real power of a database: querying.
– How would you answer the following question in Excel?
– Find customers who spend on average $50 per book order, who live on the West Coast or the East Coast (but not in the Midwest), and whose annual income is at least $150K.

SIMPLE HOSPITAL SYSTEM ERD
Entities: WARD, NURSE, DOCTOR, PATIENT
• WARD has assigned NURSE
• WARD accommodates PATIENT
• NURSE cares for PATIENT
• DOCTOR treats PATIENT

NORMALIZING AMAZON’S DATA
• Normalization is the process of ensuring that a database can be implemented effectively as a set of two-dimensional tables.
• Unlike separate Excel sheets, the tables are connected (through keys).
• Normalization prevents insertion, deletion, and update anomalies.

SQL: OBJECTIVES
• Given a query, describe in words what result it would give you.
• Given a business question and a set of tables/fields, write a query that answers that question.
• Correct a query that has a syntax error or otherwise will not run.
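To connect the Excel question with the SQL objectives, here is one possible query, run through Python's built-in `sqlite3` module. Several things in this sketch are assumptions made for illustration: the schema (`Customer` with `Region` and `AnnualIncome`, `BookOrder` with `Amount`), the sample rows, and the reading of "spend on average $50" as "an average order of at least $50". Conditions on individual rows go in `WHERE`. The condition on each customer's average goes in `HAVING`, because that average exists only after `GROUP BY` has formed the groups.

```python
import sqlite3

# Hypothetical schema and data, invented for this illustration.
SETUP = """
CREATE TABLE Customer (
    CustomerId   INTEGER PRIMARY KEY,
    Name         TEXT NOT NULL,
    Region       TEXT NOT NULL,      -- e.g. 'West Coast', 'East Coast', 'Midwest'
    AnnualIncome INTEGER NOT NULL
);
CREATE TABLE BookOrder (
    OrderId    INTEGER PRIMARY KEY,
    CustomerId INTEGER NOT NULL REFERENCES Customer(CustomerId),
    Amount     REAL NOT NULL
);
INSERT INTO Customer VALUES
    (1, 'Ana', 'West Coast', 200000),
    (2, 'Ben', 'East Coast', 180000),
    (3, 'Cal', 'Midwest',    300000),
    (4, 'Dee', 'East Coast', 120000),
    (5, 'Eve', 'East Coast', 160000);
INSERT INTO BookOrder (CustomerId, Amount) VALUES
    (1, 60), (1, 40), (2, 30), (2, 40), (3, 90), (4, 100), (5, 75), (5, 85);
"""

QUERY = """
SELECT c.CustomerId, c.Name, AVG(o.Amount) AS AvgOrder
FROM Customer AS c
JOIN BookOrder AS o ON o.CustomerId = c.CustomerId
WHERE c.Region IN ('West Coast', 'East Coast')   -- excludes the Midwest
  AND c.AnnualIncome >= 150000
GROUP BY c.CustomerId, c.Name
HAVING AVG(o.Amount) >= 50                        -- filter on the per-customer average
ORDER BY AvgOrder DESC;
"""


def target_customers():
    """Build the toy database in memory and return the query's result rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in target_customers():
        print(row)
```

With this toy data, only Eve (average $80) and Ana (average $50) qualify. Ben's average is too low, Cal lives in the Midwest, and Dee's income is below $150K.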
COMPLETE QUERY EXAMPLE
    SELECT ISBN, BookName, Price, Publisher
    FROM Book
    WHERE BookName LIKE '*Information Systems*'
      AND PubDate > #1/1/2002#
      AND Price < 100
    ORDER BY Price
(This is Microsoft Access syntax: * is the wildcard and dates are enclosed in # signs. Most other database systems use % as the wildcard and a quoted date such as '2002-01-01'.)

TABLE JOIN EXAMPLE
    SELECT Book.BookName, [Order].[Date]
    FROM Book, [Order]
    WHERE Book.ISBN = [Order].ISBN
    ORDER BY Book.ISBN, [Order].[Date]
• In Order, ISBN is a foreign key that references ISBN, the primary key of Book.
• Use the “table.column” format to avoid ambiguity.
• Joining/matching criteria: very important, don’t forget!
• ORDER and DATE are reserved words, so Access requires square brackets when they are used as names.

MULTIPLE JOINS WITH WHERE AND GROUP BY
    SELECT FavoriteMovie, COUNT(*)
    FROM Profiles, FavoriteBooks, FavoriteMovies
    WHERE FavoriteMovies.ProfileID = Profiles.ProfileID
      AND FavoriteBooks.ProfileID = Profiles.ProfileID
      AND FavoriteBook = 'The Great Gatsby'
    GROUP BY FavoriteMovie
    ORDER BY COUNT(*) DESC;
In words: among profiles whose favorite books include “The Great Gatsby”, how often each favorite movie appears, most common first.

DATA MINING: OBJECTIVES
• Explain how competitive advantages can result from data-driven decision making.
• What obstacles are there to implementing a data-driven strategy?
• Given an application, is it appropriate to use classification? Regression? Clustering?
• Why do we need to reduce the dimensionality of our features?
• Why do we keep hold-out data, and how does cross-validation work?

THE DATA-DRIVEN FIRM
Why do we see these changes now?
• Collect: it is easier to collect and store information about consumers, technologies, and markets.
• Respond: fast internal communication means that firms are agile enough to respond to external information.
• Process: firms can process large volumes of data to make intelligent decisions.

DATA-DRIVEN CHALLENGES
1. Measurement: What should be measured, and how?
2. Incentives: How can we design incentives around these measures without creating adverse consequences?
3. Infrastructure: Do we have the right infrastructure (servers, software, etc.) in place to measure and analyze the data we have?
4. Skills: Do we have the skills we need to accomplish these tasks?

WHAT IS DATA MINING?
• Data mining: the process of discovering new (non-obvious) patterns in large data sets (databases).
• It draws on artificial intelligence, machine learning, and statistics.
• Data mining is an automatic or semi-automatic process.
• Types of data mining tasks:
  – Classification: assign new data to an existing structure (categories or “classes”), e.g., email as “spam” or “not spam”
  – Regression: model the shape of the data with the least error
  – Cluster analysis: discover groups or structures in the data
  – Anomaly detection: find unusual or outlier data records
  – Association rule mining: discover relationships between data variables

DATA MINING AS A PROCESS
• Cross Industry Standard Process for Data Mining (CRISP-DM). In practice, the data mining process involves:
  1) Business understanding
  2) Data understanding
  3) Data preparation
  4) Modeling
  5) Evaluation
  6) Deployment
• Pre-processing: compile a large enough data set and get it “ready” for analysis, e.g., “data warehousing”
  – Can involve integrating various databases within a business
  – Sanitization/cleaning of data
• Data mining task
• Validation of results
  – Is the output “predictive”? That is, how well does the process work when applied to new data that was not in the training set?

HOLDOUT & CROSS-VALIDATION
• The data consist of pairs $\{x; y\}$, where $x$ are the covariates (input) and $y$ are the outcomes (output).
• Hold out part of the data, $\{x_{h1}; y_{h1}\}$, and give the rest to the modeling algorithm, which produces model M1.
• Feed the held-out inputs $x_{h1}$ to M1 to get $\hat{y}_{h1,m1}$, the $y$ of the 1st holdout data as predicted by model 1.
• How does $y_{h1}$ compare to $\hat{y}_{h1,m1}$?
• How much does this depend on exactly which data we held out? (Cross-validation answers this by repeating the procedure with different holdout sets and averaging the results.)

VISUALIZATION: OBJECTIVES
• What is Anscombe’s quartet, and why does it matter?
• What are the benefits of exploratory data analysis (EDA)?
• Interpret a histogram, scatter plot, density plot, or box plot. Tell me what you see and why it’s important.
• Given a data set and a general question, which visualization technique would you use, and why?

ANSCOMBE’S QUARTET
Four small data sets with nearly identical summary statistics (means, variances, correlation, and regression line) that look completely different when plotted.
Your brain can efficiently process properly visualized data.
EDA: EXPLORATORY DATA ANALYSIS
• An approach to analyzing data sets that summarizes their main characteristics in an easy-to-understand form.
• Often uses visual graphs, without relying on a statistical model or a previously formulated hypothesis.
• Helps formulate hypotheses that can be tested on new data sets.

SOFTWARE ENGINEERING: OBJECTIVES
• Why is software engineering difficult?
• Discuss the essential difficulties of software engineering.
• Explain each stage of the waterfall model.
• How does the agile methodology change the way software is built compared to the waterfall model?

ESSENTIAL DIFFICULTIES
1. Complexity
• Hard to manage large teams
• Hard to understand the system and its side effects
2. Conformity
• Software is expected to meet all users’ needs
3. Changeability
• Pressure/ability to change
4. Invisibility
• No way to see it all at once, visually

BUILD OR BUY?
Why buy?
• Time to use
• External support
• No risk of project failure
• Upgrades
• Network effects
Why build?
• Customized, all requirements met

WATERFALL MODEL

“AGILE” METHODOLOGY

LOCATION-BASED SERVICES: OBJECTIVES
• Identify the components and participants in LBS applications.
• Describe the interaction of the components.
• Types of context awareness
• Adaptation in LBS apps
• Push vs. pull
• Latency vs. bandwidth

LBS: AN INTERSECTION OF TECHNOLOGIES
LBS sit at the intersection of three technologies:
• GIS / spatial databases
• The Internet
• Mobile devices
Pairwise overlaps: Web GIS (GIS + Internet), Mobile GIS (GIS + mobile devices), Mobile Internet (Internet + mobile devices).

CONTEXTUAL ADAPTATION
1. Information level: the content of the information presented is adapted (e.g., filtering based on proximity).
2. User interface level: the interface is adapted to suit small screens and on-the-go users.
3. Presentation level: the visualization of the information is adapted.

PRICING: OBJECTIVES
• Unique properties of digital goods
• Cost leadership vs. product differentiation
• Consumer surplus, producer surplus, deadweight loss
• Personalized pricing (1st-degree price discrimination)
• Group pricing (3rd-degree price discrimination)
• Versioning (2nd-degree price discrimination), versioning dimensions
• Bundling
• For a given set of consumers, I expect you to be able to tell me what revenue a firm will earn under different pricing regimes.

PERSONALIZED PRICING
[Figure: demand curves (price p vs. quantity q) comparing revenue from a single price with revenue from personalized prices]
• The idealized economic scenario (ideal for the seller)
• Find out what each customer is willing to pay, and charge them as close as possible to that amount
• Can work in conjunction with increasing product fit
• Made more feasible by a web-based sales channel

GROUP PRICING
Same product, different prices for different groups.
• Identify groups willing to pay less, and offer them lower prices
• Need to be able to identify group membership easily
• What types of groups are systematically willing to pay less?
[Figure: demand curves (p vs. q) comparing revenue from a single price with revenue when segment 1 pays price 1 and segment 2 pays price 2]

VERSIONING
[Figure: demand curves (p vs. q) comparing revenue from a single version with revenue from a high-end version at price 1 and a low-end version at price 2]
• Different versions, different prices
• Segmentation through self-selection, based on willingness to pay for different versions

BUNDLING: A SIMPLE EXAMPLE
Willingness to pay:
         Product 1 (Word Processor)   Product 2 (Spreadsheet)
Alice    $60                          $40
Bob      $40                          $60
Optimal prices:
• Selling separately: $40 for product 1 and $40 for product 2. Both consumers buy both products, for revenue of $160 (versus $120 at $60 each).
• Selling only the bundle of product 1 and product 2: $100. Each consumer values the bundle at $100, for revenue of $200.

NETWORK EFFECTS: LEARNING OBJECTIVES
• Understand the idea of positive feedback and describe the role it has played in some prior technology industries (railroad, electricity, telephony).
• Define network effects (demand-side economies of scale) and understand how they lead to positive feedback.
• Describe the difference between supply-side and demand-side economies of scale.
• Understand the typical sources of network effects in information technology industries.
• Be able to recognize these sources for specific technology products or in specific business contexts.
• Understand the trade-offs between performance and compatibility, and between openness and proprietary control of a technology.

POSITIVE FEEDBACK: OVERVIEW
What is positive feedback?
• When a firm becomes successful, its past and current success make it more likely to succeed in the future.
• “…success feeds on itself, the strong get stronger…”
When does this happen?
• More customers → lower unit cost (supply-side economies of scale)
• More customers → larger “network” → more valuable product (demand-side economies of scale caused by network effects)
Possible consequences of positive feedback:
• Dominance of a single firm or technology
• Dominance of an inferior technology that got an early lead
• Critical mass: below the critical mass, few are willing to buy (inertia); beyond the critical mass, the market takes off.
• Introducing a new product is difficult because of collective switching costs.
Historical examples:
• Railroad gauges, AC versus DC power, telephone networks

SOURCES OF POSITIVE FEEDBACK
Supply-side economies of scale (traditional markets)
• More customers → more units produced → lower average cost per unit
• Marginal cost is less than average cost
• Spreading fixed costs across more units
• Manufacturing efficiencies, learning by doing
Demand-side economies of scale (digital markets)
• More units consumed → higher value per unit
• The value of the good comes (at least in part) from the network of consumers who use it
• Most commonly caused by network effects (Microsoft, PlayStation, Facebook)
• Positive relationship between popularity and value
Consumer expectations are key!

SWITCHING COSTS: LEARNING OBJECTIVES
1. Explain switching costs and technological lock-in, and how switching costs lead to technological lock-in.
2. Understand how switching costs affect industry competition.
3. Understand the sources of lock-in and switching costs:
• durable purchases
• specialized suppliers
• brand-specific training
• search costs
• information and databases
• loyalty programs
• contracts
4. Be able to identify and analyze sources of lock-in for specific products or in specific business contexts.
5. Explain strategies for firms to create lock-in and for consumers to mitigate the consequences of lock-in.

NEXT CLASS: GROUP PRESENTATIONS
• Group 7
• Group 1
• Group 5
• Group 2