BIG DATA
The next frontier for emerging market
USC CSSE Annual Research Review, March 14, 2013
Rachchabhorn Wongsaroj
Bank of Thailand, Visiting Scholar @ USC

Outline
- Current situation
- What is big data?
- Why is big data important?
- Big data cases
- Research challenges
- Big data in Thailand
- Future research

Current Situation
- Global data: lots of data is being created and collected
- Problems: data quality, data quantity, data timeliness, data variety

What is big data?
- Big Data = Volume, Variety and Velocity
- Volume
- Variety: People to People, People to Machine, Machine to Machine
- Velocity
- 8 billion messages/day, 845M active users
- 20 hours of video uploaded every minute
- 340 million Tweets/day, 140M active users
- Source: Gartner & IBM

Why is big data important?
- Emerging Technologies Hype Cycle 2011 (Gartner)

Why is big data important?
- Emerging Technologies Hype Cycle 2012 (Gartner)

Why is big data important?
- Source: McKinsey Global Institute Analysis

Why is big data important?
Big data can generate significant financial value across sectors
- US health care: $300 billion value/year; ~0.7% annual productivity growth
- Europe public sector administration: €250 billion value/year; ~0.5% annual productivity growth
- Global personal location data: $100 billion+ revenue for service providers; up to $700 billion value to end users
- US retail: 60+% increase in net margin possible; 0.5-1.0% annual productivity growth
- Manufacturing: up to 50% decrease in product development costs; up to 7% reduction in working capital
- Source: McKinsey Global Institute Analysis

Why is big data important?
Health care sector has potential to invest $300B
- Accounts (14%, $47B): advanced fraud detection, performance-based drug pricing
- Clinical (49%, $165B): transparency in clinical data and clinical decision support
- R&D (32%, $108B): personalized medicine, clinical trial design
- Business model (2%, $5B): aggregation of patient records, online platforms and communities
- Public health (3%, $9B): surveillance and response systems
- Source: US Department of Labor

Big data cases
Data sources / Techniques -> Output
- Google patient search data, predictive model, etc. -> Hospitalization pattern, customized insurance
- Advanced analytic solutions -> Process time reduction
- Customer transactions -> Customer defection prediction
- Trading transactions & IP address -> Possible frauds, financial bubble, money laundering
- Real-time people & location data -> Crime and terrorist prevention
- Product search pattern, social media -> Website outage/peak time support, travel trend and pattern

Research Challenges
Function: Big data retail lever
- Marketing: cross-selling; location-based marketing; in-store behavior analysis; customer micro-segmentation; sentiment analysis; enhancing the multichannel consumer experience
- Merchandising: assortment optimization; pricing optimization; placement and design optimization
- Operations: performance transparency; labor inputs optimization
- Supply chain: inventory management; distribution and logistics optimization; informing supplier negotiations
- New business model: price comparison services; web-based markets
- Source: McKinsey Global Institute Analysis

Big data in Thailand
Challenges
- Language
- Cost of implementation
- Magnitude of data
- Demographic data generator
- Data type

Big data in Thailand
Language (natural language processing)
- No spaces between words
- Combination of Thai and foreign languages
- Lack of Thai text analytic
  components
- Example

Big data in Thailand
Cost of implementation
- 13 big data vendors in 2013
- Hadoop requires: ~$1 million (between 125 and 250 nodes)
- Hadoop distribution, annual costs: ~$4,000 per node
- A small fraction of the cost of an enterprise data warehouse ($10s to $100s of millions)

Big data in Thailand
Magnitude of data (as of September 2012)
- 60% use local bandwidth
- 44%, 31%, 14%, 9%
- Local bandwidth (.th, or.th, etc.): 1,006,140 Mbps
- Overseas bandwidth: 405,860 Mbps
- 25% use smartphones
- 8% use tablets

Big data in Thailand
Demographic data generator
- Most data come from the young generation
- Population: 65M
- Internet users: 25M (39% of the population use the Internet)
- 85.9% of data is created by Internet users aged 6-24

Big data in Thailand
Types of data: limited application of big data techniques
- Only 2.12% focus on education
- Source: http://www.prd.go.th/ewt_news.php?nid=23168

Bank of Thailand (BOT) Website – As is
- Diagram labels: Financial institution; BOT data (Internet/Extranet); DB 1; DB 2; DB 3; Manual checking; Template input; Manual submit; BTWS working; Auto submit; BOT website
- Problems: too many steps; once due, act first and fix later; too many stakeholders; bureaucratic management style
- Source: Bank of Thailand

BOT data website – As is
- Volume, Variety, Velocity
- Revision policy; Timeliness; Accuracy & reliability
- Diagram labels: Manual checking; Input data; Complex validation; Cross validation; Manual check; Query data (BO); Input template; Manual submit; Website; Approve
- Source: Bank of Thailand

Future research
- Data quality management
- Tools
- Template
- Checklist
- Process

Reference
- Big Data: The next frontier for innovation, competition, and productivity, McKinsey Global Institute Analysis
- Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data, IBM
- Gartner Report
- Thailand National Statistical Office
- Thailand Digital Statistic Source
- Bank of Thailand (www.bot.or.th)

Thank you
Q&A
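
Note on the language challenge: the slides list "no spaces between words" as an obstacle for Thai text analytics but do not name a method for dealing with it. The sketch below illustrates the problem with greedy longest-match segmentation, a common dictionary-based baseline chosen here as an assumption. The function name `segment` and the five-word `TOY_DICTIONARY` are hypothetical and for illustration only; a real system would need a full lexicon and handling for unknown or foreign words.

```python
def segment(text, dictionary):
    """Greedy longest-match word segmentation (illustrative baseline only)."""
    max_len = max(len(word) for word in dictionary)
    tokens = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first so a compound beats its prefix.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                match = candidate
                break
        if match is None:
            # Unknown character: emit it on its own and keep going.
            match = text[i]
        tokens.append(match)
        i += len(match)
    return tokens


# Hypothetical toy dictionary; not a real Thai lexicon.
TOY_DICTIONARY = {"ธนาคาร", "แห่ง", "ประเทศ", "ไทย", "ประเทศไทย"}

# "Bank of Thailand" written in Thai, with no spaces between words.
tokens = segment("ธนาคารแห่งประเทศไทย", TOY_DICTIONARY)
print(tokens)
```

With this toy dictionary the phrase splits into three tokens ("bank", "of", "Thailand"). The result depends entirely on dictionary coverage, which is one way to read the "lack of Thai text analytic components" point: without a good lexicon or trained segmenter, downstream text analytics has no reliable word boundaries to work from.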