IS6125 Database Analysis and Design
Lecture 2: The changing nature and role of data
Rob Gleasure
R.Gleasure@ucc.ie
www.robgleasure.com

Today’s session
Change of time/place for next week
Data a few years ago
Data now
The cloud
Big data
Business Intelligence
The case of Spotify

Data a few years ago
(Image from http://www.hotcleaner.com/web_storage.html)

The Cloud
The Web is overtaking (or has overtaken) the desktop
Mobile is replacing local
Utility-based computing is replacing once-off purchase
This makes resources seem endless
It lowers risk in terms of usage (pay as you go)
[Figure: resources vs. time, comparing capacity and demand in a static data center and in a data center in the cloud. Slide credits: Berkeley RAD Lab]

The Cloud
The ‘Internet of Things’ was born in about 2009
More devices are now connected to the Web than there are people…
(Image from http://computinged.com/edge/become-part-of-the-cloud-computing-revolution/)

The Cloud
This has meaningful implications for data in terms of:
Capacity
Measurement
Integration
Security
Privacy

Big data
The idea is that vast amounts of interaction data allow for systems that are nuanced and responsive in ways that were not previously possible
There is also a realisation that, if it can be analysed, this data is a huge commodity, meaning new business models are possible
So when does data become ‘big data’?

3 Vs of big data: Volume
Facebook generates 10 TB of new data daily, Twitter 7 TB
A Boeing 737 generates 240 terabytes of flight data during a flight from one side of the US to the other
We can use all of this data to tell us something, if we know the right questions to ask

3 Vs of big data
Traditional approach: analyze small subsets of data (only part of all available information is analyzed)
Big data approach: analyze all data (all available information is analyzed)
(From http://www.slideshare.net/ibmcanada/big-dataturning-data-into-insights)

3 Vs of big data: Velocity
Clickstreams and asynchronous data transfer can capture what millions of users are doing right now
Make a change, then watch the response
No guesswork is required up front about what to gather; we can induce what is interesting as we see it

3 Vs of big data
Traditional approach (hypothesis, question, data, answer): start with a hypothesis and test it against selected data
Big data approach (data, exploration, correlation, insight): explore all data and identify correlations
(From http://www.slideshare.net/ibmcanada/big-dataturning-data-into-insights)

3 Vs of big data: Variety
A move from structured data to unstructured data such as images and text, analysed through image recognition, text mining, etc.
Data gathered from users, applications, systems and sensors
An increasingly comprehensive data view of our ecosystem

The Internet of Things
(From http://www.pcworld.com/article/2039413/new-intel-ceo-creates-mysterious-new-devices-division.html)

The Internet of Things
RFID tags, sensors, Bluetooth, microprocessors and Wi-Fi are all becoming easier to embed in ‘dumb’ devices
The move to mobile also means more data streaming from us at all times, e.g. location, call activity, internet use

The Internet of Things
Smart homes/smart cities: temperature, lighting, food stocks, energy, security
Smart cars: diagnostics, traffic suggestions, sensors, self-driving
Smart healthcare: worn and intravenous computing detects issues early and monitors care outcomes remotely
Smart factories and farms: machines coordinated efficiently, linked dynamically to consumption models

Big data success stories
Books
Barnes and Noble: discovered that readers often quit nonfiction books less than halfway through, and introduced a highly successful new series of short books on topical themes
Amazon: originally used a panel of expert reviewers for books; a surplus of data allowed it to create increasingly predictive recommendations.
The panel has since been disbanded, and 1/3 of sales are now driven by the recommender system

Big data and the Internet of Things: success stories (continued)
Transport
Flyontime.us: used historical weather and flight delay information to predict the likelihood of flights being delayed
Farecast: analysed historical ticket prices for specific flights, then advised users to buy or wait according to the predicted fare trajectory
UPS: uses a range of traffic data and a complex algorithm to calculate the most time- and fuel-efficient routes

Big data and the Internet of Things: famous success stories (continued)
Healthcare
Modernizing Medicine EMA dermatology system (https://www.youtube.com/watch?v=jMGaGtK9nzU)

Big data and the Internet of Things: famous success stories (continued)
Social media
Google (data for information relevance)
Twitter (cf. #RescuePH)
Facebook (social data)

Issues with big data
Google Flu Trends
Life imitating data, imitating life?
No one is really average height
Your Xbox knows you like that Katy Perry song
Also, Target called to say your teenage daughter is pregnant
Ice cream sales and shark attacks…

Ice cream sales and shark attacks, continued (correlation, not causation)
(From http://xkcd.com/552/)

Target’s family monitoring, continued

Assignment 1
In groups, you are tasked with identifying and researching a business that uses data in an interesting and creative way. The report should be approximately 2000 words and should describe the key value the business offers its consumers, how this differentiates it from competitors, and how its use of data at different points in the creation, delivery, and support of its products/services enables this differentiation. You don’t need to go into deep technical detail about how data is handled, or about the technologies used.
However, you should discuss data-related processes at a high level, insofar as you understand them from the information you have gathered.
The report is due on 23 October, when a soft-bound copy should be handed in to Ann O’Riordan in room 3.75.

Assignment 1: groups
Group 1: Hennessy, John James; Gao, Yun; Kenny, Mark Paul
Group 2: O'Driscoll, Nicole; Flood, Lee; Yang, Siyu
Group 3: Duggan, Claire Bernadette; Nolan, Robert Cunningham; Power, Declan
Group 4: Huang, Junqi; Kenneally, Alan Kieran; Murray, Jack Joseph
Group 5: Lawton, Fiona Margaret; Hennessy, Darragh Ross; Chen, Qi
Group 6: Xu, Chenjun; Kilcoyne, Shane Anthony; O'Donovan, MaryKate
Group 7: O'Donovan, Paul Andrew; Guerin, Steven John; MolerRodriguez, Marta
Group 8: O'Riordan, Christina Eilish; Anso, Gabriel; Mc Carthy, Patricia
Group 9: O'Donovan, Eileen; Wang, Mengjian; Lowham, Joshua George
Group 10: Kerrisk, Edward; Meaney, Brendan; Qin, Xiaolu

Want to read more?
On Modernizing Medicine: https://www.modmed.com/
On Spotify: http://www.bigdata-startups.com/BigData-startup/big-dataenabled-spotify-change-music-industry/
On the cloud and big data: The Little Book of Cloud Computing, 2013 edition, Lars Nielsen