Big Data
• What is Big Data?
• Recently much good science, whether physical, biological, or social, has been forced to confront - and has often benefited from - the Big Data phenomenon.
• Big Data refers to the explosion in the quantity (and sometimes, quality) of available and potentially relevant data, largely the result of recent and unprecedented advancements in data recording and storage technology. (p. 115)
Diebold, F.X. (2003), "Big Data Dynamic Factor Models for Macroeconomic Measurement and Forecasting: A Discussion of the Papers by Reichlin and Watson," in M. Dewatripont, L.P. Hansen and S. Turnovsky (eds.), Advances in Economics and Econometrics: Theory and Applications, Eighth World Congress of the Econometric Society, Cambridge University Press, 115-122.

Big data spans four dimensions: Volume, Velocity, Variety, and Veracity
• The first 3Vs definition is widely used by Gartner and much of the industry
• The new V, "Veracity", has been introduced by some organizations
• Volume: Enterprises are awash with ever-growing data of all types, easily amassing terabytes, even petabytes, of information.
  – Turn 12 terabytes of Tweets created each day into improved product sentiment analysis
  – Convert 350 billion annual meter readings to better predict power consumption
• Velocity: Sometimes 2 minutes is too late. For time-sensitive processes such as catching fraud, big data must be used as it streams into your enterprise in order to maximize its value.
  – Scrutinize 5 million trade events created each day to identify potential fraud
  – Analyze 500 million daily call detail records in real time to predict customer churn faster
• Variety: Big data is any type of data: structured and unstructured data such as text, sensor data, audio, video, click streams, log files and more. New insights are found when analyzing these data types together.
  – Monitor 100s of live video feeds from surveillance cameras to target points of interest
  – Exploit the 80% data growth in images, video and documents to improve customer satisfaction
• Veracity: 1 in 3 business leaders don't trust the information they use to make decisions.
  – How can you act upon information if you don't trust it?
  – Establishing trust in big data presents a huge challenge as the variety and number of sources grow.

Where Does Big Data Come From?
• Our Data-driven World
  – Science
    • Databases from astronomy, genomics, environmental data, transportation data, …
  – Humanities and Social Sciences
    • Scanned books, historical documents, social interactions data, new technology like GPS, …
  – Business & Commerce
    • Corporate sales, stock market transactions, census, airline traffic, …
  – Entertainment
    • Internet images, Hollywood movies, MP3 files, …
  – Medicine
    • MRI & CT scans, patient records, …

Usage Example in Big Data
US 2012 Election
- predictive modeling
- mybarackobama.com
- drive traffic to other campaign sites
  Facebook page (33 million "likes")
  YouTube channel (240,000 subscribers and 246 million page views)
- a contest to dine with Sarah Jessica Parker
- Every single night, the team ran 66,000 computer simulations
- Amazon Web Services
- Orca big-data app (however, ORCA suffered many failures)
- YouTube channel (23,700 subscribers and 26 million page views)

Usage Example in Big Data (cont.)
Data analysis predictions for the US 2012 Election
• Drew Linzer, June 2012
  – 332 for Obama, 206 for Romney
• Nate Silver's FiveThirtyEight blog
  – Predicted Obama had an 86% chance of winning
  – Predicted all 50 states correctly
• Sam Wang, the Princeton Election Consortium
  – Put the probability of Obama's re-election at more than 98%
• Meanwhile, the media continued reporting the race as very tight

Big Challenge in Big Data
• How to convert big data into usable information by identifying patterns and deviations from those patterns?
• The big data challenge requires talent
  – People highly skilled in programming and data analysis who can extract meaningful information and insights

Big Data Techniques and Technologies
• Common Skill Sets
  – Data analysis is the cornerstone
  – Education and experience in data analysis, business analytics, mathematics, statistics, quantitative skills
• A/B testing
• Association rule learning
• Classification
• Cluster analysis
• Crowdsourcing
• Data fusion and data integration
• Data mining
• Ensemble learning
• Genetic algorithms
• Machine learning
• Natural language processing
• Neural networks
• Network analysis
• Optimization
• Pattern recognition
• Predictive modeling
• Regression
• Sentiment analysis
• Signal processing
• Spatial analysis
• Statistics
• Supervised learning
• Simulation
• Time series analysis
• Unsupervised learning
• Visualization
• …

Big Questions about Big Data
• What happens in a world of radical transparency, with data widely available?
• If you could test all your decisions, how would that change the way you compete?
• How would your business change if you used big data for widespread, real-time customization?
• How can big data augment or even replace management?
• Could you create a new business model based on data?
• …

Related Careers in Big Data
• Data scientist
  – Often at the top of the big data hierarchical chart
  – Typically proven professionals who possess deep analytical talent
• Data architect
  – Computer programmers who are skilled in working with undefined data and disparate types of data
• Data visualizer
  – Professionals who are able to translate data into information that people can effectively use
• Data change agent
  – Uses data analytics to recommend and drive changes within an organization
• Data engineer and operator
  – Designers, builders and managers of big data systems

Job Opportunities in Big Data
Demand for Deep Analytical Talent in US
Source: McKinsey
• There will be a shortage of the talent necessary for organizations to take advantage of big data.
• By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills, as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.
• The Big Data industry is worth more than $100 billion and is growing at almost 10% a year (roughly twice as fast as the software business).

Relevant IS Courses
• IS 410: Introduction to Database Design
  – Discusses the process of database development, including data modeling, database design, and database implementation
• IS 420: Database Application Development
  – Offers hands-on experience in developing client/server database applications using a major database management system
• IS 427: Introduction to Artificial Intelligence: Concepts and Applications
  – Provides an introduction to, and hands-on experience with, several Artificial Intelligence (AI) techniques
• IS 428: Data Mining Techniques and Applications
  – Covers both how data mining techniques work and how to apply data mining to various business and organizational contexts