Cloud Computing Overview: Big Data and Business Analytics
Hsinchun Chen
University of Arizona
© 2005

Interesting Questions
Cloud Computing Applications
Big Data Analytics
Business Models (CIA)

Cloud Computing Applications: Overview and Examples

IQ: How does Amazon make its money?

Cloud Computing Overview
• Cloud computing: applications, system software, and hardware delivered as services over the Internet.
• Service-oriented architecture + virtualization + utility computing
• Software as a Service (SaaS), Infrastructure as a Service (IaaS), Platform as a Service (PaaS)
• From web services to cloud computing applications
• Moving towards cloud applications and cloud business models, e.g., Salesforce.com, Apple iTunes, Amazon

Major Cloud Computing Platforms
• Amazon Elastic Compute Cloud (EC2): LAMP (Linux, Apache, MySQL, and PHP) stack
• Google App Engine: Java and Python runtime, Java Persistence API (JPA), Google Bigtable, file systems; Hadoop, MapReduce
• Windows Azure: .NET, MS SQL, SharePoint

Emerging Applications
• E-Commerce: B2C, life style & entertainment, global supply-chain, banking, telecommunications, IT hosting, business intelligence and analytics
• E-Government: government data sources, services
• E-Education: online education content delivery
• E-Security: cybersecurity, intelligence
• E-Health: healthcare big data, healthcare 2.0; genomics + EHR

Selected Health Cloud Initiatives
• National Electronic Health Record Data Bank, Singapore: MOH + Accenture, August 2010; healthcare management, quality and performance management, EHR information aggregation, patient self management, decision support
• E-Health, E-Health Cloud, England: Chelsea Westminster Hospital + Flexiant, July 2011; patient EHR access
• CareStream Cloud, US: Carestream Health (Onex + Kodak), 2009; health imaging sharing, 1B medical images, health cloud SaaS vendor
• Taiwan Smart Health Cloud, NTU & NCKU
(Sources: NTU Health Cloud proposal)

IQ: What’s the difference between 2005 and 2012 for web computing?

Web Computing and Mining
• Emerging web applications; business models
• Web services, APIs, mashups; cloud & mobile computing
• Business analytics; data, text and web mining

Web Services and Computing (No Cloud), 2005 (Web 2.0)-2011

50 Projects, 2005-2012 (“Business Web Mining Using Amazon, Google, eBay, and Google”)
• E-commerce and e-Services: iRelocate RealTomatoes SmallBH HobbyCentral NewPlaceSeek College Advisor Friendly Gifter Clipper GottaCouch SkiStop vTrack Barter Bay Link-US Smart Gift Card Timely Bid Tucson Gamer Café TV and More Deliverables Cellphone Intelligent Auctioning Tucson Book Exchange SciBubble Wish Sky GiftChannel PriceSmart WetYourWhistle
• Life Style and Entertainment: BetSmart XTREME F1 MLB 100Yards CricWeb iBollywood Sa Ri Ga Ma WOW Bollywood Funzic HinduShrines Indiapaaru NachBaliye Movie Location Quest Remakes SugarSuite MusicBox Artist Connection Concerto Star Search
• Government and Education: RepCheck SmallNGreenCars Change of Base iDog Tasty Park iSupport

SmallNGreenCars
• By Kumar Vakeel, Kunal Jain, Neeraj Munshi; MS MIS, Spring 2010
• One-stop portal for green car information and resources
• Unique concept
• Global customers
• YouTube vehicle videos
• Flickr vehicle photos
• Google Maps and Local Search
• Google visualization
• RSS feeds of global vehicle news
• Facebook recommendation from friends
• Yahoo Finance for currency exchange
• Google Translate for web pages
• Recommendation system
• Fuel Efficiency Challenge

Sa Ri Ga Ma
• By Mahalakshmi Sundararajan, Pavithra Ravi, Sahana Nagaraja; Spring 2010
• Carnatic music: one of the two main genres of Indian classical music; mostly performed vocally
• Sarigama.com: one-stop information portal for Carnatic music
• Sarigama.com latest news and RSS feeds
• Artist information
• Transliteration
• Music play and video
• Shopping
• Lessons and library
• Concert locator
• Forums
• Interactive features
• Tag clouds
• Lyrics recommender system

Web Services, Cloud Computing, and Mobile Web, 2012 (Web 3.0)

25 Projects, 2012 Cloud and Mobile Computing
• E-commerce and e-Services: GamerzLykMe MobileAppPortal Gemstones PersonalInvestment iScream iRace SeeMeSocial AZRegionTrend HelpMeAZ
• Health & Life Style: EatRight OrganiCook RoadTrip Xtravel WreckDivers VoiceOfNature HealthMiners HelpAsthma DiabeatUS HikeAday YogaWorld BikersParadise

OrganiCook
• By Zilong Chang, Mengwen Cheng, Yajie Wang, and Haiqing Wu, Spring 2012
• One-stop portal for healthy foods
• Organic food supplier location
• Recipe catalogs for different health concerns
• Integrate healthy content with social media
• Text mining for cookware recommendation
• Mark allergens among ingredients
• Provide health news
• Advertisement
• Unique recommendation system
• Amazon EC2 cloud server
• Integrate Mahout with Hadoop

OrganiCook (APIs used)
• FatSecret: Get recipes and nutrition facts
• Yahoo Local: Get location of organic food suppliers
• Google Maps: Map the location
• Google Places: Get detailed info about the food suppliers
• Facebook Social Plugin: Like button, comments
• Twitter Buttons: Share a link, follow
• Twitter Search: Return tweets based on user’s search keyword and recipe name
• Google+: Share the page
• [service not named]: Return relevant videos
• Flickr: Return pictures of the recipe

OrganiCook (architecture)
[Figure: architecture diagram. Components shown: User, Browser, Internet Connection, Cloud Application Server, Apache Tomcat, J2EE, REST API, JavaScript API, API Servers, Mahout Taste, Data Mining, Amazon EC2, MySQL 5.5, Database server.]

EatRight
• By Jim Marquardson, Justin William, Dave Wilson, and Mark Grimes, Spring 2012
• Health & nutrition mobile site
• True SoLoMo (Web 3.0)
• Nutrition-based meal shopping
• Capturing user preferences: “Eat This” button
• Directed search advertising rates
• Targeted ads based on nutrition preferences and location
• EatRight API
• Twitter sentiment
• PCI-compliant credit card processing
• Amazon EC2 Cloud
• Android mobile app (iOS too!)

Big Data & Business Analytics

IQ: Size (storage) of LOC book collection?

IQ: What is a Yottabyte & who owns it?

The Data Deluge (Big Data)
• The Economist, March 2010
– LOC total book collection: 15 TB
– Google processes 10 PB per day
– Internet traffic: 667 exabytes by 2013 (Cisco)
– Total amount of world information in 2010: 1.2 zettabytes
• KB-MB-GB-TB-PB-EB-ZB-Yottabyte
• E-Commerce, Government, Health, Security applications: many with TB/PB of valuable content from customers, citizens, patients, etc.

BI & Analytics: The Market
• $3B BI revenue in 2009 (Gartner, 2006); $9.4B BI software M&A spending in 2010 and $14.1B by 2014 (Forrester)
• IBM spent $14B in BI in five years; $9B BI revenue in 2010 (USA Today, November 2010); 24 acquisitions, 10,000 BI software developers, 8,000 BI consultants, 200 BI mathematicians
– Acquired i2/COPLINK in 2011

BI & Analytics: Definition and Components
• BI and Analytics refers to: (1) the technologies, systems, practices and applications that (2) analyze critical business data to (3) help an enterprise better understand its business and market.
• Core technologies: data warehousing; Extraction, Transformation, and Load (ETL); Business Performance Management (BPM), visual dashboards; data and text mining, social network analysis
• BI 2.0 & 3.0 research: web analytics, web 2.0; in-memory and real-time BI; web 3.0, cloud computing, Hadoop, MapReduce; mobile computing, stream data mining

Big Data Analytics Research at UA/AI Lab
• Applications/problems: digital libraries, search engines, biomedical informatics, healthcare data mining, security informatics, business intelligence
• Approaches: web collection/spidering, databases, data warehousing, data mining, text mining, web mining, statistical NLP, ontologies, social media analytics, interface design, information visualization, economic modeling, assessment
• Structure: federal funding, director, affiliated faculty, postdocs, Ph.D./MS/BS students, commercialization
• Major phases: DLI, COPLINK, Dark Web, DiabeticLink

Business Models

IQ: What is “CIA” and what are their differences?

CIA in the Global IT Landscape
• Central Intelligence Agency; Culinary Institute of America
• Chinese: math/science, team player, IT/hardware/web, China market (China)
• Indians: math/science, entrepreneurial spirit, English
• Americans: English, entrepreneurial spirit, IT/software, business development, market (US), VC access ($)

My COPLINK Experience
• Taiwan/US training: NCTU (math), SUNY Buffalo (MBA), NYU (AI), U of Arizona (top 3)
• AI Lab: Digital Library, COPLINK, Dark Web, DiabeticLink
• COPLINK federal funding ($4M), NSF/NIJ, 1997-2002
• COPLINK commercialization ($4.6M), angels/VCs (Taiwan, CA, AZ), 2000 & 2003
• Customer sales ($30M), 4,500 agencies, 120 FTEs, 2000-2011
• M&A exit, Silver Lake/i2/IBM acquisition, 2009 (i2), 2011 (IBM); $500M valuation

COPLINK Identity Resolution and Criminal Network Analysis (DHS)
Cross-jurisdictional Information Sharing/Collaboration
[Figure: system diagram. Data sources: law-enforcement data (AZ, CA, TX) and border crossing data (AZ, CA, TX), covering people and vehicles. Components: Arizona IDMatcher, CAN Visualizer. Panels: Identity Resolution (name, DOB, address, and ID match), Criminal Network Analysis, High-risk Vehicle Identification, Criminal Link Prediction, Suspect Traffic Burst Detection. Charts: narcotics network; crossings of Vehicle A and Vehicle B by date and time of day ("Frequent Crossers at Night"); first name, middle name, last name, DOB, address, and ID similarity.]
• Detect false and deceptive identities across jurisdictions using a probabilistic, naïve Bayes based resolution system.
• Identify high-risk vehicles from border crossing and law enforcement data using association techniques such as mutual information.
• Predict interaction between individuals and vehicles using link prediction techniques to identify high-risk border crossers.
• Detect real-time anomalies and threats in border traffic using Markov switching and other models.
* Only the grayed datasets are available to the AI Lab
• Funding: NSF, DOJ, DHS ($4M), VCs ($4.6M); Digital Government
• Publications: ACM TOIS, CACM, IEEE TKDE, IEEE IS, JASIST, DSS
• Impact: 3500 agencies, 25 NATO countries, 1M users, public safety

COPLINK in the News
• The New York Times, November 2, 2002: COPLINK assisted in DC sniper investigation
• ABC News, April 15, 2003: Google for Cops: Coplink software helps police search for cyber clues to bust criminals
• Newsweek Magazine, March 3, 2003: A computerized way for police to coordinate crime databases
• Washington Post, March 6, 2008: COPLINK in use in 3,500 police agencies in US!
• COPLINK acquired by i2 (Silver Lake) in 2009; i2/COPLINK acquired by IBM in 2011 for $500M

IT Business Models: Some Thoughts
• Startup Phase: business ideas (product and market), team (founders & mentors), share structure (shares, directors, options; legal/CPA), business plan (short plan, good introduction), funding (government, angels, VCs, family)
– Year 0, 1-3 founders, $250K funding (IT/cloud)
• Early Phase: first product, product positioning, team building, initial sales
– Years 1-3, $500K sales
• Growth Phase: products plan, strong sales team, sustainable revenues, unique IPs (SW, content), loyal customers
– Years 3-8, $10M sales
• Exit Phase: IPO or M&A (partners), when ($20M+), next venture
Taking risks!

Pain, Sorrow, and Regret
• Loss of family time/life (but never money)
• Managing university obligations and COI
• University bureaucracy, Office of Technology Transfer (OPTT)
• Lawyers, accountants are expensive
• Chasing angels/VCs (40 frogs, 1 prince)
• Office, employees, products
• Selling products (becoming a vendor)
• Burning cash
• Bubble burst
• Raising second round funding when you are down ($2M)
• Board room yelling matches
• University accusations
• Losing control and shares
• Anti-dilution clause (losing $60M for the $2M you never used)

hchen@eller.Arizona.edu
http://ai.Arizona.edu
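
Worked example for The Data Deluge slide: the unit ladder (KB-MB-GB-TB-PB-EB-ZB-Yottabyte) and the quoted figures can be compared with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) units, where each step is a factor of 1,000; the slides do not say whether decimal or binary units are meant. The names `UNITS`, `to_bytes`, and `times_larger` are hypothetical helpers introduced only for illustration.

```python
# Unit ladder in the order listed on the slide (assumption: decimal SI units).
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def to_bytes(value: float, unit: str) -> float:
    """Convert a value in the given unit to bytes, assuming 1 KB = 1,000 bytes."""
    exponent = UNITS.index(unit) + 1  # KB -> 1000**1, MB -> 1000**2, ...
    return value * 1000 ** exponent


def times_larger(big: tuple, small: tuple) -> float:
    """Return how many times larger the first (value, unit) pair is than the second."""
    return to_bytes(*big) / to_bytes(*small)


if __name__ == "__main__":
    # Figures quoted on the slide (The Economist, March 2010).
    loc_books = (15, "TB")          # LOC total book collection
    google_per_day = (10, "PB")     # data Google processes per day
    world_info_2010 = (1.2, "ZB")   # total world information in 2010

    # How many LOC book collections does Google process per day?
    print(times_larger(google_per_day, loc_books))
    # How many times the 2010 world total fits in one yottabyte?
    print(times_larger((1, "YB"), world_info_2010))
```

Under the decimal-unit assumption, Google's daily volume works out to several hundred LOC book collections, and one yottabyte is several hundred times the 2010 world total. Switching to binary units (factors of 1,024) would change the absolute byte counts, but ratios between quantities one or more whole steps apart shift only modestly.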