BIG DATA: Challenges & Opportunities
Lei Chen

Outline
• Background: "Big data" is a term acknowledging the exponential growth, availability and use of …
• Challenges: "Big data" poses grand challenges for data capture, storage, analysis …
• Opportunities: Many applications can benefit from "Big data" …

Background: We are capturing more data
• Super-exponential growth in data volume
• Satellite imagery, mobile stations, distributed sensor networks, geographical plotting …
Copyright belongs to "Data Analysis Challenges", JSR-08-142, Dec

Background: We are using more data
• Intelligent transportation
• Digital health care

Background: We need quick processing of the data
• Volcano monitoring
• Hurricane path prediction

Background: We are exploring the unknowns with different means of data measurement
• Exploring the universe
• Ocean science

Background: We are discovering new rules from data
The well-formed.eigenfactor project visualizes information flow in science. This diagram shows the citation links of the journal Nature.
Copyright belongs to http://wellformed.eigenfactor.org

Background: Defining Big Data
• Wiki: Big data are datasets that grow so large that they become awkward to work with using on-hand database management tools. Difficulties include capture, storage, search, sharing, analytics and visualization.
• Gartner (2011): Big data is a popular term used to acknowledge the exponential growth, availability and use of information in the data-rich landscape of tomorrow.

Background: Features of Big Data
• 3V: Variety, Velocity and Volume

Challenges: layered view, from applications down to data extraction
• Applications
• Data Processing (processing language, optimization, visualization)
• Data Model (interpretation, representation): <key,vals>, Object, E-R, Hierarchical
• Storage (reliability, scalability, availability) and network topology
• Data Extraction (acquisition, integration, representation)

Challenges: Data model challenges
• Volume: scale up, scale out, and scale in
• Velocity: "interactive" properties to facilitate processing
• Variety: simple but unified, to adapt to heterogeneity
Existing data models (<key,vals>, Object, E-R, Hierarchical) are not satisfactory. The trade-off is functionality vs. simplicity.

Challenges: Storage challenges
Storage concerns:
• Reliability: data is safe and trustworthy
• Availability: data is accessible
• Scalability: the performance of data operations does not degrade as data size grows
However, the CAP theorem is the bottleneck.
No one-for-all solution exists.

Challenges: Storage challenges, CAP theorem
• Consistency
• Availability
• Partition tolerance
A distributed data store cannot guarantee all three at once: when a network partition occurs, it must give up either consistency or availability.

Challenges: Storage challenges, ACID vs. BASE
• RDBMS (ACID): Atomic, Consistent, Isolated, Durable
• NoSQL (BASE): Basically Available, Soft-state, Eventually consistent
Systems positioned on the CAP triangle:
• C + A: RDBMS
• C + P: BigTable, HyperTable, HBase, MongoDB, Redis, Scalaris, etc.
• A + P: Dynamo, CouchDB, Cassandra, SimpleDB, Tokyo Cabinet, Riak, Voldemort, etc.

Challenges: Management challenges
"Solving 'Big Data' Challenge Involves More Than Just Managing Volumes of Data" (Gartner, 2011)
Big data management requires:
• Functionality: indexing & partitioning
• Flexibility: adaptation to new requirements and new components

Challenges: Management challenges, e.g., indexing over big data
• Volume: a large volume of data is captured every time unit. This requires a distributed adaptive index, which leads to significant cost in metadata exchange.
• Variety: data is captured from different sources. This requires a distributed adaptive index, which leads to ambiguity in indexing the same object.
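The BASE properties on the ACID vs. BASE slide can be made concrete with a small sketch. It assumes a last-writer-wins rule with per-write timestamps and a pairwise anti-entropy sync. The `Replica` class and its methods are hypothetical illustrations and not the API of any store listed above. Real systems use a variety of conflict-resolution mechanisms.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One replica of a key-value store (hypothetical, for illustration only)."""
    name: str
    # key -> (timestamp, replica_name, value); the first two fields form the version
    data: dict = field(default_factory=dict)

    def put(self, key, value, timestamp):
        # Basically Available: a write is accepted locally, without contacting other replicas.
        self._apply(key, (timestamp, self.name, value))

    def get(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[2]

    def _apply(self, key, entry):
        # Last-writer-wins: keep the entry with the larger (timestamp, replica_name) pair.
        current = self.data.get(key)
        if current is None or entry[:2] > current[:2]:
            self.data[key] = entry

    def sync_from(self, other):
        # Anti-entropy: pull every entry from another replica and merge it locally.
        for key, entry in other.data.items():
            self._apply(key, entry)


a, b = Replica("a"), Replica("b")
a.put("x", "red", timestamp=1)   # each replica accepts a write on its own
b.put("x", "blue", timestamp=2)
print(a.get("x"), b.get("x"))    # red blue  (soft state: replicas disagree)
a.sync_from(b)
b.sync_from(a)
print(a.get("x"), b.get("x"))    # blue blue (eventually consistent)
```

Between syncs, the replicas stay available for writes but may return different values. That gap is the price BASE pays for availability under partitions.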
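The indexing slide calls for a distributed adaptive index but does not name a partitioning scheme. One common way to spread index entries over nodes, so that adding capacity moves only part of the data, is consistent hashing. The sketch below swaps in that technique as an assumption. The `HashRing` class, the MD5-based position hash and the `vnodes` parameter are illustrative choices.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable hash; Python's built-in hash() of str is randomized per process.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent hashing ring (hypothetical sketch); assumes at least one node."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes   # virtual points per node, to smooth the load
        self._points = []      # sorted hash positions on the ring
        self._owner = {}       # hash position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str):
        for i in range(self.vnodes):
            h = _hash(f"{node}#{i}")
            bisect.insort(self._points, h)
            self._owner[h] = node

    def node_for(self, key: str) -> str:
        # First ring position clockwise from the key's hash, wrapping around.
        i = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[i]]


ring = HashRing(["n1", "n2", "n3"])
keys = [f"object-{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("n4")  # scale out by one node
moved = [k for k in keys if ring.node_for(k) != before[k]]
print(len(moved), "of", len(keys), "keys moved")
# Every moved key now lives on the new node; no key shuffles between old nodes.
print(all(ring.node_for(k) == "n4" for k in moved))  # True
```

Only keys whose hash falls on the new node's arcs change owner, so the metadata that must be exchanged grows with the amount of data actually moved. It does not grow with the whole index.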
Challenges: Challenges on processing
• New query language (algebra)
Desired property → sacrifice & overhead:
• Flexibility → complexity in data modeling
• "Relational" support → poor scalability
• "Uncertain" support → poor scalability and significant computing overhead
• Scalability → less functionality
• Efficiency & effectiveness → poor scalability

Challenges: Challenges on processing
• New computing paradigm for processing
Distributed computing paradigms and their limitations:
• Message passing: poor scalability and fault tolerance
• Unified access: efficiency not validated over large numbers of computing nodes
• MapReduce: poor functionality

Challenges: Challenges on processing
• New optimization methodology, considering load balance, data locality, high parallelism, merging cost, less network I/O, and replicated computing

Opportunities
• We are empowered to learn knowledge and process information more accurately, effectively and efficiently.
Why "Big Data"? It serves:
• Natural science study
• Fundamental scientific research
• Social civilization
• Daily life

Opportunities: Big Data for natural science study
• E.g., natural disaster forecasting and management: flood forecasting, earthquakes, extreme weather management
• Data involved: meteorological data; geographic data; population, transportation and urban design data; economic data
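The processing slides list MapReduce among the distributed computing paradigms and mark its limitation as poor functionality. The single-process sketch below shows the shape of the programming model: every computation must be phrased as a map step, a shuffle by key, and a reduce step. The function names are hypothetical. No distribution, fault tolerance or data locality is modeled.

```python
from collections import defaultdict
from itertools import chain


def map_phase(_doc_id, text):
    # Map: emit (word, 1) for every word in one input split.
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    # Shuffle: group all intermediate values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine all values that share one key.
    return key, sum(values)


def mapreduce(documents):
    intermediate = chain.from_iterable(map_phase(d, t) for d, t in documents.items())
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())


docs = {"d1": "big data big challenges", "d2": "big opportunities"}
print(mapreduce(docs))  # {'big': 3, 'data': 1, 'challenges': 1, 'opportunities': 1}
```

Word count fits this model naturally. Joins, iterative algorithms and "relational" or "uncertain" queries have to be forced into the same two functions, which is where the functionality limitation shows.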
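The optimization slide lists load balance and data locality among the concerns, and the two can pull in opposite directions. The following minimal greedy sketch assumes known per-task costs and a hypothetical `slack` parameter. It assigns tasks largest-first and prefers a worker that already holds the task's input, unless that worker is too far ahead of the least-loaded one. It illustrates the trade-off and is not a scheduler taken from the slides.

```python
def schedule(tasks, workers, locality, slack=1.0):
    """Greedy assignment balancing load against data locality (hypothetical sketch).

    tasks:    dict task -> estimated cost
    workers:  list of worker names
    locality: dict task -> set of workers already holding the task's input data
    slack:    extra load a data-local worker may carry vs. the least-loaded worker
    """
    load = {w: 0.0 for w in workers}
    assignment = {}
    # Largest tasks first (longest-processing-time rule) tends to balance better.
    for task, cost in sorted(tasks.items(), key=lambda kv: -kv[1]):
        least = min(workers, key=lambda w: load[w])
        local = [w for w in locality.get(task, ()) if w in load]
        best_local = min(local, key=lambda w: load[w]) if local else None
        if best_local is not None and load[best_local] <= load[least] + slack:
            chosen = best_local  # data-local: no network transfer of the input
        else:
            chosen = least       # remote: pay network I/O to keep the load balanced
        load[chosen] += cost
        assignment[task] = chosen
    return assignment, load


tasks = {"t1": 4, "t2": 3, "t3": 3, "t4": 2}
workers = ["w1", "w2"]
locality = {"t1": {"w1"}, "t2": {"w1"}, "t3": {"w1"}, "t4": {"w2"}}
assignment, load = schedule(tasks, workers, locality)
print(assignment)  # {'t1': 'w1', 't2': 'w2', 't3': 'w1', 't4': 'w2'}
print(load)        # {'w1': 7.0, 'w2': 5.0}
```

A larger `slack` favors locality (less network I/O) at the cost of imbalance. With `slack=0.0` the same input ends up perfectly balanced at 6.0 per worker, but more tasks run remotely.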
Opportunities: Big Data for fundamental scientific research
• E.g., bioinformatics and medicine: the mutually reinforcing relation between gene technology and clinical medicine

Opportunities: Big Data for social civilization
• Light-speed information spreading & enormous knowledge
• Quick event detection
• Easy collaboration
Wondering where to get a really good cup of coffee? Just tweet your question!

Opportunities: Big Data for daily life
• Our life can be much easier with more data… E.g., trip planning:
o Request: travel to Beijing, 3-day stay, budget < $1000
o Predefined: Forbidden City; 10am meeting every day
o Adaptive agenda, updated for real-world incidents: traffic jam, luggage delay, bad weather

Opportunities: Opportunity highlights
• Volume
o Capturing, storing and analyzing data helps us better understand the world
• Velocity
o Guaranteed effective & efficient data processing
• Variety
o Handling heterogeneous sources of data
Considering all the challenges and constraints, perhaps there is no one-for-all solution. However, application-dependent "Big Data" solutions are promising.
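The social civilization slide mentions quick event detection, and the highlights tie velocity to effective and efficient processing. A minimal sliding-window sketch of that idea counts how many recent posts mention each term and flags terms that reach a threshold. The `BurstDetector` class, the window length and the threshold are hypothetical. The sketch also assumes posts arrive in timestamp order.

```python
from collections import Counter, deque


class BurstDetector:
    """Flags a term as a possible event when the number of posts mentioning it
    inside a sliding time window reaches a threshold (illustrative values)."""

    def __init__(self, window_seconds=60, threshold=3):
        self.window = window_seconds
        self.threshold = threshold
        self.events = deque()    # (timestamp, term) in arrival order
        self.counts = Counter()  # term -> posts mentioning it inside the window

    def observe(self, timestamp, text):
        """Add one post; return the set of terms currently at or over the threshold."""
        for term in set(text.lower().split()):  # count each term once per post
            self.events.append((timestamp, term))
            self.counts[term] += 1
        # Evict entries that have fallen out of the window.
        while self.events and self.events[0][0] <= timestamp - self.window:
            _, old = self.events.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]
        return {t for t, c in self.counts.items() if c >= self.threshold}


d = BurstDetector(window_seconds=60, threshold=3)
print(d.observe(0, "earthquake downtown"))  # set()
print(d.observe(10, "felt an earthquake"))  # set()
print(d.observe(20, "earthquake again"))    # {'earthquake'}
print(d.observe(100, "coffee time"))        # set()  (older posts left the window)
```

Each post is processed once and old entries are evicted incrementally. The cost per post stays bounded by the window contents rather than the full history.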
Opportunities: Applications
Heterogeneous data management
• Search doctors
• Search universities (ongoing)
Data integration pipeline for searching doctors:
• Sources: web pages on the Internet, hospital databases, search results from general-purpose search engines, news / rumors
• Data extraction → integrated database → OLAP query processing
• Scale: ~500,000 doctors & ~30,000 hospitals from 50+ GB of source data
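The search-doctors pipeline goes from heterogeneous sources, through data extraction, into an integrated database that serves OLAP queries. The sketch below illustrates the integration and roll-up steps on two tiny sources, using naive name normalization as the matching rule. All records, field names and normalization rules are hypothetical, invented for illustration. Real entity resolution needs far more robust matching.

```python
from collections import defaultdict

# Hypothetical records extracted from two heterogeneous sources.
hospital_db = [
    {"name": "Dr. Wang Li", "hospital": "City General", "specialty": "Cardiology"},
    {"name": "Dr. Chan Mei", "hospital": "Harbour Clinic", "specialty": "Oncology"},
]
web_pages = [
    {"doctor": "wang li", "affiliation": "City General Hospital", "rating": 4.5},
    {"doctor": "Ho Ka", "affiliation": "City General Hospital", "rating": 4.0},
]


def norm_person(name):
    # Naive rule: drop the "dr." title, lowercase, collapse whitespace.
    return " ".join(name.lower().replace("dr.", "").split())


def norm_hospital(name):
    # Naive rule: lowercase and drop the generic word "hospital".
    return " ".join(w for w in name.lower().split() if w != "hospital")


def integrate(db_rows, web_rows):
    """Merge both sources into one record per (doctor, hospital) key."""
    merged = {}
    for row in db_rows:
        key = (norm_person(row["name"]), norm_hospital(row["hospital"]))
        merged.setdefault(key, {}).update(specialty=row["specialty"])
    for row in web_rows:
        key = (norm_person(row["doctor"]), norm_hospital(row["affiliation"]))
        merged.setdefault(key, {}).update(rating=row["rating"])
    return merged


def rollup_by_hospital(merged):
    # OLAP-style roll-up: number of distinct doctors per hospital.
    counts = defaultdict(int)
    for _doctor, hospital in merged:
        counts[hospital] += 1
    return dict(counts)


merged = integrate(hospital_db, web_pages)
print(len(merged))                 # 3
print(rollup_by_hospital(merged))  # {'city general': 2, 'harbour clinic': 1}
```

The same doctor appears in both sources under different spellings. Once the keys are normalized, the two records collapse into one entry that carries both the specialty and the rating.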