Uploaded by Andrew Cooper

Introduction to Data Science: Concepts and Applications

What is Data Science
Pairing people who develop technology with people who have data and problems to solve.
The pipeline of activities for extracting information from data
o Data Collection
o Data coding
o Integration/processing and analysis
o Visualization
Combination of:
o Computation/Computer Science
o Statistics
o Domain expertise
▪ Knowing which questions are the right ones to ask
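The pipeline stages listed above (collection, coding, integration/processing, analysis, visualization) can be sketched end to end in a few lines. This is only an illustrative sketch: the survey records, the city names, and every function name below are made up for the example and do not come from the notes.

```python
from collections import defaultdict

# Hypothetical survey records, invented purely for illustration.
raw_records = [
    {"city": "Austin", "answer": "yes"},
    {"city": "Austin", "answer": "no"},
    {"city": "Boston", "answer": "yes"},
    {"city": "Boston", "answer": "yes"},
    {"city": "Austin", "answer": "yes"},
]

def collect():
    """Data collection: here we simply return the in-memory records."""
    return list(raw_records)

def code_records(records):
    """Data coding: map text answers to numeric codes."""
    codes = {"yes": 1, "no": 0}
    return [{"city": r["city"], "answer": codes[r["answer"]]} for r in records]

def integrate(records):
    """Integration/processing: group the coded answers by city."""
    grouped = defaultdict(list)
    for r in records:
        grouped[r["city"]].append(r["answer"])
    return dict(grouped)

def analyze(grouped):
    """Analysis: share of 'yes' answers per city."""
    return {city: sum(vals) / len(vals) for city, vals in grouped.items()}

def visualize(shares, width=20):
    """Visualization: one text bar per city, scaled to `width` characters."""
    lines = []
    for city, share in sorted(shares.items()):
        bar = "#" * round(share * width)
        lines.append(f"{city:<8}{bar} {share:.0%}")
    return lines

shares = analyze(integrate(code_records(collect())))
for line in visualize(shares):
    print(line)
```

In practice each stage would be far larger (collection might read files or call a web service, for example), but the flow from raw records to a picture is the same.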
What Questions Can Data Science Answer
Testing theories in comparative literature
Disaster preparation
Find patterns of consumption to form predictions
o Purchasing behavior
Finding patterns in high-dimensional data
Understanding complexities
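As one possible illustration of "finding patterns of consumption to form predictions", the sketch below fits a straight line to purchasing data with ordinary least squares and uses it to predict an unseen case. The numbers are invented for the example, and a straight-line fit is just one simple technique chosen here; the notes do not name a specific method.

```python
# Hypothetical data: store visits per week vs. weekly spending (made up).
visits = [1, 2, 3, 4, 5]
spend = [12.0, 19.0, 31.0, 42.0, 48.0]

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(visits, spend)

# Prediction for a customer who visits 6 times a week (outside the data).
predicted = slope * 6 + intercept
print(f"spend ~ {slope:.2f} * visits + {intercept:.2f}")
print(f"predicted spend at 6 visits: {predicted:.2f}")
```

The pattern (spending rises with visits) is what gets found; the fitted line is what turns that pattern into a prediction.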
Why is there an Explosion of Data
Increased streaming of data
o Text
o YouTube/Video
o Podcasts
No barrier to entry
More storage and devices collecting information
o Data storage capacity and speed have increased drastically over time
Pervasiveness of high-powered computation
o Everything is becoming digital
Easy storage of data
New sensors
o IoT devices
o Fitness trackers
o Thermometers
Digital Exhaust
o Every time you click on something online, you create ‘digital exhaust’
More data available on the web
Data Visualization
Explore data
Develop intuitions
Visualize clusters
Data interpretation relies on visualization
Showing the ‘value and meaning’ of data in an understandable way
o Translating data from a data scientist’s perspective for a general audience
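To make "explore data" and "visualize clusters" concrete, here is a minimal sketch that bins a list of numbers and draws a text histogram. The measurements are invented, and the bin width of 2 is an arbitrary choice for the example; a plotting library would normally be used instead of text output.

```python
# Hypothetical measurements, invented for illustration.
values = [1.1, 1.4, 1.8, 2.0, 2.3, 7.6, 8.0, 8.1, 8.5, 8.9]

def histogram(data, bin_width):
    """Count how many values fall into each bin [start, start + bin_width)."""
    counts = {}
    for v in data:
        start = int(v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return counts

def render(counts, bin_width, low, high):
    """Draw one row of stars per bin, including empty bins."""
    lines = []
    start = low
    while start < high:
        label = f"{start:>2}-{start + bin_width:<2}"
        lines.append(f"{label} " + "*" * counts.get(start, 0))
        start += bin_width
    return lines

counts = histogram(values, 2)
for line in render(counts, 2, 0, 10):
    print(line)
```

Read as raw numbers, the list gives little away; drawn as rows of stars, the empty bin in the middle makes it easy to see that the values fall into a low group and a high group.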