DATA SCIENCE MIS0855 | Spring 2016
A Brief Introduction to Data
SungYong Um

Today’s Goal
1. Learn the basic concept of metadata
2. Example

[Figure: Data, Information, and Knowledge, linked by statistical analysis. Information: observation, relationship (limiting boundary), prediction. Knowledge: hypothesis.]

Data, Information, Knowledge
• Data: raw, unorganized facts
• Information: data that is processed to be useful
• Knowledge: application of data and information

How does the data collected by the NSA differ from the data collected by Ashley Madison’s site?

Each piece of data can be described by…
• Metadata (“data about data”): a title, a description, a data type
• The data itself: a value

So what is metadata?
Metadata
• Data dictionary
• “Data about data”
• Data that explains what the data is, what it is for, and how it is structured (e.g., title, description, data type)

How would you describe this data?
Metadata for Philadelphia Crime Logs

Data type: a classification that determines the possible values that data can have

Type            Description                         Examples                                  Excel name  Tableau name
Integer         Whole numbers                       35, 102, -40, 0                           Number      Number (whole)
Floating point  Fractional values                   3.56, -1.0, 10.123                        Number      Number (decimal)
Boolean         Two possible values                 True/False, Male/Female                   N/A         Boolean
String          Numeric and nonnumeric characters   "Bob", "I like cheese", "hello123"        Text        String
Date/Time       Calendar date and time              8/31/2014, 10:05 AM, 8/31/2014 10:05 AM   Date        Date or Date & time

Why do we care about data types?

Why do we need metadata?
• Because computers are dumb!
• They are getting there, but computers still cannot understand what the data means or what it is for.
• So we have to describe it for them.

Why do we care about metadata?
Metadata documents what the data means. It:
• Enables navigation
• Facilitates understanding
• Eliminates guesswork

Metadata is everywhere

Metadata in the news
What is “telephony metadata”? IMEI/MEID, IMSI, trunk identifiers?
What can it tell you?
How does the NSA turn this (meta)data into information?
What is the “non-metadata” data in a telephone call?

The Ashley Madison Data Breach
How is this data different from other “famous” data breaches?
In what ways is it more damaging than telephony metadata?
Does this change our notion of privacy?
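The sketch below is a minimal Python illustration of two ideas from these slides, assuming a small set of hypothetical phone-call records. First, each field is described by metadata (a title, a description, a data type), the way a data dictionary would describe it, and every record is checked against those declared types. Second, the metadata alone (who called whom, when, and for how long, with no call content) is aggregated into information about relationships. The field names (`caller`, `callee`, `start`, `duration_s`), the 555-01xx numbers, the records, and the 22:00 "late night" cutoff are all invented for illustration. They are not the NSA's format or any carrier's schema, and this is not a description of how any agency actually processes call data. The Python types `str`, `int`, and `datetime` stand in for the String, Integer, and Date/Time types from the data-type table.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FieldMetadata:
    """One data-dictionary entry: metadata that describes a field, not its value."""
    title: str
    description: str
    data_type: type


# Hypothetical data dictionary for call records (illustrative field names only).
CALL_DICTIONARY = {
    "caller": FieldMetadata("Caller", "Phone number that placed the call", str),
    "callee": FieldMetadata("Callee", "Phone number that received the call", str),
    "start": FieldMetadata("Start time", "Calendar date and time the call began", datetime),
    "duration_s": FieldMetadata("Duration", "Length of the call in whole seconds", int),
}

# Invented sample records: metadata only, no call content.
CALLS = [
    {"caller": "555-0101", "callee": "555-0199", "start": datetime(2016, 1, 4, 9, 15), "duration_s": 120},
    {"caller": "555-0101", "callee": "555-0199", "start": datetime(2016, 1, 5, 23, 40), "duration_s": 900},
    {"caller": "555-0142", "callee": "555-0101", "start": datetime(2016, 1, 5, 10, 2), "duration_s": 45},
    {"caller": "555-0101", "callee": "555-0199", "start": datetime(2016, 1, 6, 23, 55), "duration_s": 1500},
]


def validate(record: dict, dictionary: dict) -> list[str]:
    """List every place where a record disagrees with the data dictionary."""
    problems = []
    for name, meta in dictionary.items():
        if name not in record:
            problems.append(f"missing field: {meta.title}")
        elif not isinstance(record[name], meta.data_type):
            problems.append(f"{meta.title}: expected {meta.data_type.__name__}")
    return problems


def contact_counts(calls: list[dict]) -> Counter:
    """Turn metadata into information: how often each pair of numbers is in contact."""
    # Sorting the pair makes A-calls-B and B-calls-A count as the same relationship.
    return Counter(tuple(sorted((c["caller"], c["callee"]))) for c in calls)


def late_night_calls(calls: list[dict], start_hour: int = 22) -> int:
    """Count calls that began at or after a hypothetical 'late night' hour."""
    return sum(1 for c in calls if c["start"].hour >= start_hour)


if __name__ == "__main__":
    for record in CALLS:
        assert validate(record, CALL_DICTIONARY) == []
    print(contact_counts(CALLS).most_common(1))
    print(late_night_calls(CALLS))
```

Without hearing a single conversation, the counts show that one pair of numbers is in repeated contact, mostly late at night, which is the kind of inference the questions about telephony metadata point toward. The same `validate` function would flag a record whose `duration_s` is the text "120" rather than an integer, the sort of guesswork that documenting data types is meant to eliminate.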