Cloud Computing Overview

Cloud Computing Overview:
Big Data and Business Analytics
Hsinchun Chen
University of Arizona
© 2005
Interesting Questions
Cloud Computing Applications
Big Data Analytics
Business Models (CIA)
Cloud Computing Applications:
Overview and Examples
IQ: How does Amazon make its money?
Cloud Computing Overview
• Cloud computing: applications, system software, and
hardware delivered as services over the Internet.
• Service oriented architecture + virtualization + utility
computing
• Software as a Service (SaaS), Infrastructure as a
Service (IaaS), Platform as a Service (PaaS)
• From web services to cloud computing applications
• Moving towards cloud applications and cloud business
models, e.g., Salesforce.com, Apple iTunes, Amazon
Major Cloud Computing Platforms
• Amazon Elastic Compute Cloud (EC2): LAMP (Linux,
Apache, MySQL, and PHP) stack
• Google App Engine: Java and Python runtime, Java
Persistence API (JPA), Google Bigtable, File systems;
Hadoop, MapReduce
• Windows Azure: .NET, MS SQL, SharePoint
Emerging Applications
• E-Commerce: B2C, life style & entertainment, global
supply-chain, banking, telecommunications, IT hosting,
business intelligence and analytics
• E-Government: government data sources, services
• E-Education: online education content delivery
• E-Security: cybersecurity, intelligence
• E-Health: healthcare big data, healthcare 2.0; genomics
+ EHR
Selected Health Cloud Initiatives
• National Electronic Health Record Data Bank, Singapore: MOH +
Accenture, August 2010; healthcare management, quality and
performance management, EHR information aggregation, patient self
management, decision support
• E-Health, E-Health Cloud, England: Chelsea Westminster Hospital
+ Flexiant, July 2011, patient EHR access
• CareStream Cloud, US: Carestream Health (Onex + Kodak), 2009;
health imaging sharing, 1B medical images, health cloud SaaS
vendor
• Taiwan Smart Health Cloud, NTU & NCKU
(Sources: NTU Health Cloud proposal)
IQ: What’s the difference between 2005
and 2012 for web computing?
Web Computing and Mining
• Emerging web applications → business models
• Web services, APIs, mashups → cloud & mobile
computing
• Business analytics → data, text, and web mining
Web Services and Computing
(No Cloud), 2005 (Web 2.0)-2011
50 Projects, 2005-2012
(“Business Web Mining Using Amazon, Google, eBay, and
Google”)
• E-commerce and e-Services:
iRelocate RealTomatoes SmallBH HobbyCentral NewPlaceSeek
College Advisor Friendly Gifter Clipper GottaCouch SkiStop vTrack
Barter Bay Link-US Smart Gift Card Timely Bid Tucson Gamer Café TV and More
Deliverables Cellphone Intelligent Auctioning Tucson Book Exchange SciBubble Wish
Sky GiftChannel PriceSmart WetYourWhistle
• Life Style and Entertainment:
BetSmart XTREME F1 MLB 100Yards CricWeb
iBollywood Sa Ri Ga Ma WOW Bollywood Funzic HinduShrines
Indiapaaru NachBaliye Movie Location Quest Remakes SugarSuite
MusicBox Artist Connection Concerto Star Search
• Government and Education:
RepCheck SmallNGreenCars Change of Base iDog Tasty Park iSupport
SmallNGreenCars
• Unique Concept
• Global customers
• YouTube vehicle videos
• Flickr vehicle photos
• Google Maps and Local Search
• Google visualization
• RSS feeds of global vehicle news
• Facebook recommendation from friends
• Yahoo Finance for currency exchange
• Google Translate for web pages
• Recommendation System
• Fuel Efficiency Challenge
• By Kumar Vakeel, Kunal Jain, Neeraj Munshi; MS MIS, Spring 2010
• One-stop portal for green car information and resources
Sa Ri Ga Ma
• Sarigama.com latest news and RSS feeds
• Artist information
• Transliteration
• Music play and video
• Shopping
• Lessons and Library
• Concert locator
• Forums
• Interactive Features
• Tag Clouds
• Lyrics Recommender system
• By Mahalakshmi Sundararajan, Pavithra Ravi, Sahana Nagaraja; Spring 2010
• Carnatic Music: one of the two main genres of Indian classical music; mostly performed vocally
• Sarigama.com: one-stop information portal for Carnatic music
Web Services, Cloud Computing, and
Mobile Web, 2012 (Web 3.0)
25 Projects, 2012
Cloud and Mobile Computing
• E-commerce and e-Services:
GamerzLykMe MobileAppPortal Gemstones PersonalInvestment
iScream iRace SeeMeSocial AZRegionTrend HelpMeAZ
• Health & Life Style:
EatRight OrganiCook RoadTrip Xtravel WreckDivers VoiceOfNature
HealthMiners HelpAsthma DiabeatUS HikeAday YogaWorld
BikersParadise
OrganiCook
• Organic food supplier location
• Recipe catalogs for different health concerns
• Integrate healthy content with social media
• Text mining for cookware recommendation
• Mark allergens among ingredients
• Provide health news
• Advertisement
• Unique recommendation system
• Amazon EC2 Cloud server
• Integrate Mahout with Hadoop
• By Zilong Chang, Mengwen Cheng, Yajie Wang, and Haiqing Wu, Spring 2012
• One-stop portal for healthy foods
OrganiCook
APIs used:
• FatSecret: get recipes and nutrition facts
• Yahoo Local: get the location of organic food suppliers
• Google Maps: map the location
• Google Places: get detailed information about the food suppliers
• Facebook Social Plugin: Like button, Comments
• Twitter Buttons: share a link, Follow
• Twitter Search: return tweets based on the user's search keyword and recipe name
• Google+: share the page
• (video source not named): return relevant videos
• Flickr: return pictures of the recipe
OrganiCook
[Architecture diagram. User: browser, Internet connection. Cloud (Amazon EC2): application server (Apache Tomcat, J2EE), data mining (Mahout Taste), database server (MySQL 5.5). API servers reached through REST and JavaScript APIs.]
EatRight
• True SoLoMo (Web 3.0)
• Nutrition based meal shopping
• Capturing user preferences: “Eat This”
button
• Directed search advertising rates
• Targeted ads based on nutrition
preferences and location
• EatRight API
• Twitter Sentiment
• PCI Compliant Credit Card Processing
• Amazon EC2 Cloud
• Android Mobile App (iOS too!)
• By Jim Marquardson, Justin William, Dave Wilson, and Mark Grimes, Spring 2012
• Health & nutrition mobile site
Big Data & Business Analytics
IQ: Size (storage) of LOC book
collection?
IQ: What is a Yottabyte & who owns it?
The Data Deluge (Big Data)
• The Economist, March 2010
– LOC total book collection: 15 TB
– Google processes 10 PB per day
– Internet traffic: 667 exabytes by 2013 (Cisco)
– Total amount of world information in 2010: 1.2
zettabytes
• KB-MB-GB-TB-PB-EB-ZB-Yottabyte
• E-Commerce, Government, Health, Security
applications: many with TB/PB of valuable content from
customers, citizens, patients, etc.
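To make the unit ladder concrete, the sketch below converts the figures quoted on this slide into bytes and compares them. It assumes decimal (SI) prefixes, where each step is a factor of 1,000; the slide does not say whether decimal or binary units were intended, and the helper name `to_bytes` is hypothetical.

```python
# Decimal (SI) prefixes: each step up the ladder is a factor of 1,000.
# Assumption: the slide's figures use decimal units, not binary (1,024) units.
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]
FACTOR = {unit: 1000 ** (i + 1) for i, unit in enumerate(UNITS)}


def to_bytes(value, unit):
    """Convert a quantity such as (15, "TB") into bytes (hypothetical helper)."""
    return value * FACTOR[unit]


loc_books = to_bytes(15, "TB")      # LOC total book collection
google_daily = to_bytes(10, "PB")   # Google processing per day
world_2010 = to_bytes(1.2, "ZB")    # world information in 2010

# How many LOC book collections does Google process in one day?
loc_per_google_day = google_daily / loc_books

# How many times does the 2010 world total fit into one yottabyte?
worlds_per_yottabyte = to_bytes(1, "YB") / world_2010

print(f"LOC collections per Google day: {loc_per_google_day:.0f}")
print(f"2010 world totals per yottabyte: {worlds_per_yottabyte:.0f}")
```

Under these assumptions, one day of Google processing is several hundred times the LOC book collection, and a yottabyte is several hundred times the 2010 world total.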
BI & Analytics: The Market
• $3B BI revenue in 2009 (Gartner, 2006); $9.4B BI
software M&A spending in 2010 and $14.1B by 2014
(Forrester)
• IBM spent $14B in BI in five years; $9B BI revenue in
2010 (USA Today, November 2010); 24 acquisitions,
10,000 BI software developers, 8,000 BI consultants,
200 BI mathematicians → acquired i2/COPLINK in
2011
BI & Analytics: Definition and
Components
• BI and Analytics refers to: (1) the technologies, systems,
practices and applications that (2) analyze critical business data
to (3) help an enterprise better understand its business and
market.
• Core technologies: data warehousing, Extraction, Transformation,
and Load (ETL); Business Performance Management (BPM),
visual dashboards; data and text mining, social network analysis
• BI 2.0 & 3.0 research: web analytics, web 2.0; in-memory and
real-time BI; web 3.0, cloud computing, Hadoop, MapReduce;
mobile computing, stream data mining
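MapReduce, listed above among the BI 3.0 technologies, splits a job into a map step that emits key-value pairs and a reduce step that combines the values for each key. The following is a minimal single-process sketch of that pattern in plain Python, not Hadoop code; the function names `map_phase`, `shuffle`, and `reduce_phase` and the sample documents are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values for each key (here, sum the counts)."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    pairs = []
    for doc in documents:  # on a cluster, mappers would run in parallel
        pairs.extend(map_phase(doc))
    return reduce_phase(shuffle(pairs))


# Made-up sample input for illustration only.
docs = ["big data analytics", "big data mining", "web mining"]
counts = word_count(docs)
print(counts)
```

On a real cluster the map and reduce functions stay this small; the framework takes over the distribution of input splits and the shuffle between machines.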
Big Data Analytics Research at UA/AI Lab
• Applications/problems: digital libraries, search engines,
biomedical informatics, healthcare data mining, security
informatics, business intelligence
• Approaches: web collection/spidering, databases, data
warehousing, data mining, text mining, web mining,
statistical NLP, ontologies, social media analytics, interface
design, information visualization, economic modeling,
assessment
• Structure: federal funding, director, affiliated faculty, postdocs, Ph.D./MS/BS students → commercialization
• Major phases: DLI → COPLINK → Dark Web →
DiabeticLink
Business Models
IQ: What does “CIA” stand for, and how do its
members differ?
CIA in the Global IT Landscape
• Central Intelligence Agency; Culinary Institute of
America
• Chinese: math/science, team player, IT/hardware/web,
China market (China)
• Indians: math/science, entrepreneurial spirit, English
• Americans: English, entrepreneurial spirit, IT/software,
business development, market (US), VC access ($)
My COPLINK Experience
• Taiwan/US Training: NCTU (math) → SUNY Buffalo (MBA) → NYU
(AI) → U of Arizona (top 3)
• AI Lab: Digital Library → COPLINK → Dark Web → DiabeticLink
• COPLINK federal funding ($4M), NSF/NIJ, 1997-2002
• COPLINK commercialization ($4.6M), angels/VCs (Taiwan, CA, AZ),
2000 & 2003
• Customer sales ($30M), 4,500 agencies, 120 FTEs, 2000-2011
• M&A Exit, Silverlake/i2/IBM acquisition, 2009 (i2), 2011 (IBM);
$500M valuation
COPLINK Identity Resolution and Criminal Network Analysis (DHS)
Cross-jurisdictional Information Sharing/Collaboration
[Figure: COPLINK system overview. Panels: Arizona IDMatcher (identity resolution: name, DOB, address, and ID match/similarity); CAN Visualizer (criminal network analysis of people and vehicles); High-risk Vehicle Identification (mutual information between Vehicle A and Vehicle B); Criminal Link Prediction (narcotics network); Suspect Traffic Burst Detection (frequent crossers at night; axes: dates vs. time of day). Data sources: law-enforcement data (AZ, CA, TX) and border crossing data (AZ, CA, TX).]
• Detect false and deceptive identities across jurisdictions using a probabilistic naïve-Bayes-based resolution system.
• Identify high-risk vehicles using association techniques such as mutual information on border crossing and law enforcement data.
• Predict interaction between individuals and vehicles using link prediction techniques to identify high-risk border crossers.
• Detect real-time anomalies and threats in border traffic using Markov switching and other models.
* Only the grayed datasets are available to the AI Lab
• Funding: NSF, DOJ, DHS ($4M), VCs ($4.6M); Digital Government
• Publications: ACM TOIS, CACM, IEEE TKDE, IEEE IS, JASIST, DSS
• Impact: 3,500 agencies, 25 NATO countries, 1M users → public safety
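One of the association techniques named on this slide is mutual information. As a rough sketch, assuming each vehicle's border crossings are reduced to the set of days on which it crossed, the pointwise mutual information (the per-pair form of the measure) compares how often two vehicles cross on the same day with how often that would happen by chance. The function name, the day-level granularity, and the sample data are all hypothetical; the slides do not describe the AI Lab's actual formulation.

```python
import math


def pointwise_mutual_information(days_a, days_b, total_days):
    """PMI between two vehicles' crossing events, in bits.

    days_a, days_b: sets of day indices on which each vehicle crossed.
    total_days: number of days in the observation window.
    Returns None when the vehicles never cross on the same day.
    """
    p_a = len(days_a) / total_days
    p_b = len(days_b) / total_days
    p_ab = len(days_a & days_b) / total_days
    if p_ab == 0:
        return None
    # log2(P(a, b) / (P(a) * P(b))): positive means the vehicles cross
    # together more often than independent behavior would predict.
    return math.log2(p_ab / (p_a * p_b))


# Made-up example: a 100-day observation window.
vehicle_a = {3, 10, 17, 24, 31}
vehicle_b = {3, 10, 17, 24, 60}
vehicle_c = {5, 31, 44, 58, 90}

pmi_ab = pointwise_mutual_information(vehicle_a, vehicle_b, 100)
pmi_ac = pointwise_mutual_information(vehicle_a, vehicle_c, 100)
print(f"A-B: {pmi_ab:.2f} bits, A-C: {pmi_ac:.2f} bits")
```

In this toy data, vehicles A and B share four crossing days and score higher than A and C, which share only one; a pair with a high score would be a candidate for closer review.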
The New York Times, November 2, 2002
COPLINK assisted in DC sniper investigation
ABC News April 15, 2003
Google for Cops: Coplink software helps police search for
cyber clues to bust criminals
Newsweek Magazine, March 3, 2003
A computerized way for police to coordinate crime
databases
Washington Post, March 6, 2008, COPLINK in
use in 3,500 police agencies in US!
COPLINK acquired by i2 (Silver Lake) in
2009; i2/COPLINK acquired by IBM in 2011
for $500M
IT Business Models: Some Thoughts
• Startup Phase: business ideas (product and market), team
(founders & mentors), share structure (shares, directors, options;
legal/CPA), business plan (short plan, good introduction), funding
(government, angels, VCs, family) → Year 0, 1-3 founders,
$250K funding (IT/cloud)
• Early Phase: first product, product positioning, team building,
initial sales → Years 1-3, $500K sales
• Growth Phase: products plan, strong sales team, sustainable
revenues, unique IPs (SW, content), loyal customers → Years 3-8, $10M sales
• Exit Phase: IPO or M&A (partners), when ($20M+), next venture
Taking risks!
Pain, Sorrow, and Regret
• Loss of family time/life (but never money)
• Managing university obligations and COI
• University bureaucracy, Office of Technology Transfer (OPTT)
• Lawyers, accountants are expensive
• Chasing angels/VCs (40 frogs → 1 prince)
• Office, employees, products
• Selling products (becoming a vendor)
• Burning cash
• Bubble burst
• Raising second round funding when you are down ($2M)
• Board room yelling matches
• University accusations
• Losing control and shares
• Anti-dilution clause (losing $60M for the $2M you never used)
hchen@eller.Arizona.edu
http://ai.Arizona.edu