DATA SCIENCE MIS0855 | Spring 2016
Storing and Retrieving Data
SungYong Um
sungyong.um@temple.edu

The Database
• A collection of files, organized as tables
• Tables are made up of records (rows)
• Rows are made up of fields (columns)
• Fields are made up of characters

Database Types
• Flat file database
• Relational database

Flat file database

EmpNo  Ename    DeptNo  DeptName
101    Abigail  10      Marketing
102    Bob      20      Purchasing
103    Carolyn  10      Marketing
104    Doug     20      Purchasing
105    Evelyn   10      Marketing

Relational database

DeptNo  DeptName
10      Marketing
20      Purchasing

EmpNo  Ename    DeptNo
101    Abigail  10
102    Bob      20
103    Carolyn  10
104    Doug     20
105    Evelyn   10

Back to Big Data
• Velocity
• Variety
• Volume

“Big Data” is a set of technologies
• It is not data analytics
• …Or information, or knowledge
• It’s a way of processing large amounts of data, not extracting insight from it

Why does Facebook (or Twitter, or Instagram) care how you feel?
Do you think this is a problem?

Reminder: Looking for words…
Yelp reviews for Poi Dog Snack Shop

“Positive” Word Library
• Great
• Fantastic
• Best
• Wonderful
• Outstanding
• Amazing

How we’ll do sentiment analysis
• Retrieve Tweets using a spreadsheet add-in in Google Drive
• Copy the Tweets from Google Drive to a special Excel workbook
• Excel classifies the Tweets as positive or negative

“Big Data” 101: It’s just a set of technologies…
This isn’t the only option, but it is among the most popular…

What is their purpose?

Scenario:
• There is an extremely large database with constantly changing data.
• My regular database and computer can’t keep up with this amount of data.
• Things get slow, or break down completely.

What do they do…
• Stores big databases in smaller pieces across a network of connected computers
• Breaks up the task and gives each connected computer a small piece to work on
What those jobs are can be anything!

An example…
• Stores real-time cable box activity for 5,000,000 customers, by region
• Analyzes which programs people are most likely to pause and then skip commercials
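
The flat file and relational versions of the employee data shown earlier in these slides hold the same information. As a rough sketch, the relational version can be built with Python's built-in sqlite3 module and joined back into the flat layout. The table names Employee and Department and the new department name 'Sales' are assumptions made for illustration; the slides only give the column names and rows.

```python
import sqlite3

# In-memory database. Column names and rows follow the slide; the table
# names Employee and Department are assumed for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Department (DeptNo INTEGER PRIMARY KEY, DeptName TEXT)")
conn.execute(
    "CREATE TABLE Employee (EmpNo INTEGER PRIMARY KEY, Ename TEXT, "
    "DeptNo INTEGER REFERENCES Department(DeptNo))"
)
conn.executemany(
    "INSERT INTO Department VALUES (?, ?)",
    [(10, "Marketing"), (20, "Purchasing")],
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(101, "Abigail", 10), (102, "Bob", 20), (103, "Carolyn", 10),
     (104, "Doug", 20), (105, "Evelyn", 10)],
)


def flat_view(connection):
    """Join the two tables to rebuild the rows of the flat file."""
    query = """
        SELECT e.EmpNo, e.Ename, e.DeptNo, d.DeptName
        FROM Employee AS e
        JOIN Department AS d ON d.DeptNo = e.DeptNo
        ORDER BY e.EmpNo
    """
    return connection.execute(query).fetchall()


flat_rows = flat_view(conn)

# Renaming a department changes one row here; in the flat file the same
# change would have to be repeated on every matching employee row.
# 'Sales' is a hypothetical new name.
conn.execute("UPDATE Department SET DeptName = 'Sales' WHERE DeptNo = 10")
renamed_rows = flat_view(conn)
```

Because DeptName is stored once per department, the relational layout avoids the repeated "Marketing" and "Purchasing" values of the flat file, and DeptNo is the field that links the two tables.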
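
The word-library idea from the sentiment analysis slides can be sketched in a few lines of Python. This is not the Excel workbook used in class, only an illustration of the same approach under stated assumptions: the positive words come from the slide, while the negative word list, the "neutral" label for ties, and the sample texts are all invented here.

```python
import re

# Positive words are taken from the slide. The negative list is a
# hypothetical placeholder, since the slides do not show one.
POSITIVE_WORDS = {"great", "fantastic", "best", "wonderful", "outstanding", "amazing"}
NEGATIVE_WORDS = {"bad", "terrible", "worst", "awful"}


def classify(text):
    """Label a text by counting the words found in each word library."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE_WORDS for word in words)
    negatives = sum(word in NEGATIVE_WORDS for word in words)
    if positives > negatives:
        return "positive"
    if negatives > positives:
        return "negative"
    return "neutral"  # tie or no matches; an assumption of this sketch


# Invented sample texts, for illustration only.
samples = [
    "Best lunch in town, the staff were amazing",
    "Terrible service and the worst coffee",
    "We went there on Tuesday",
]
labels = [classify(sample) for sample in samples]
```

Matching single words is simple but blunt: a phrase such as "not great" would still count as positive, which is worth keeping in mind when reading the results.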
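
The two jobs described in the Big Data slides, storing the data in smaller pieces and giving each connected computer a small piece of the task, can be imitated on a single machine. The sketch below is only a toy model: threads stand in for the connected computers, the event records are invented, and counting commercial skips per program is a simplification of the "pause and then skip" analysis in the cable box example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Invented records in the form (region, program, action). Real cable box
# data would look different.
events = [
    ("east", "News at 6", "pause"),
    ("east", "News at 6", "skip_commercial"),
    ("east", "Cooking Hour", "play"),
    ("west", "News at 6", "skip_commercial"),
    ("west", "Cooking Hour", "skip_commercial"),
    ("west", "Cooking Hour", "pause"),
]


def split_by_region(records):
    """Storage step: keep each region's records in its own smaller piece."""
    pieces = {}
    for record in records:
        pieces.setdefault(record[0], []).append(record)
    return pieces


def count_skips(piece):
    """Work step: one worker counts commercial skips per program in its piece."""
    return Counter(
        program for _, program, action in piece if action == "skip_commercial"
    )


pieces = split_by_region(events)

# Threads stand in for the connected computers, one per piece.
with ThreadPoolExecutor(max_workers=len(pieces)) as pool:
    partial_counts = list(pool.map(count_skips, pieces.values()))

# Combine step: add the partial results from every worker together.
total_skips = sum(partial_counts, Counter())
```

Each worker only ever sees its own piece, so adding more regions means adding more workers rather than making any single one do more.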