Uploaded by Andrew Cooper

Intro to Data Science

What Is Data Science?
Pairing people who develop technology with people who have data and problems to solve
The pipeline of activities for extracting information from data:
o Data Collection
o Data coding
o Integration/processing and analysis
o Visualization
Combination of:
o Computation/Computer Science
o Statistics
o Domain expertise
▪ Knowing the right questions to ask
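The pipeline stages listed above (collection, coding, processing and analysis, visualization) can be sketched as a chain of small functions. This is a minimal illustration under stated assumptions, not a prescribed method. The survey responses, the `CODEBOOK` mapping, and every function name are hypothetical, and a plain-text bar chart stands in for a real visualization tool.

```python
from collections import Counter

# Hypothetical raw survey answers standing in for the data collection step.
RAW_RESPONSES = ["yes", "No", "yes ", "maybe", "YES", "no", "", "Maybe"]

# Hypothetical codebook used in the data coding step: free text -> numeric code.
CODEBOOK = {"yes": 1, "no": 0, "maybe": 2}


def collect() -> list[str]:
    """Data collection: here we simply return an in-memory sample."""
    return list(RAW_RESPONSES)


def code(responses: list[str]) -> list[int]:
    """Data coding: normalize text and translate it to numeric codes.

    Answers that do not match the codebook (e.g. blanks) are dropped.
    """
    coded = []
    for response in responses:
        key = response.strip().lower()
        if key in CODEBOOK:
            coded.append(CODEBOOK[key])
    return coded


def analyze(codes: list[int]) -> dict[int, float]:
    """Processing and analysis: share of all coded answers per code."""
    counts = Counter(codes)
    total = sum(counts.values())
    return {c: counts[c] / total for c in sorted(counts)}


def visualize(shares: dict[int, float]) -> str:
    """Visualization: text bar chart, one '#' per 10 percentage points."""
    labels = {v: k for k, v in CODEBOOK.items()}
    rows = [f"{labels[c]:>5} | {'#' * round(s * 10)} {s:.0%}" for c, s in shares.items()]
    return "\n".join(rows)


if __name__ == "__main__":
    print(visualize(analyze(code(collect()))))
```

Keeping each stage as a separate function mirrors the pipeline idea: any stage (for example, swapping the text chart for a plotting library) can be replaced without touching the others.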
What Questions Can Data Science Answer?
Testing theories in comparative literature
Disaster preparation
Finding patterns of consumption to make predictions
o Purchasing behavior
Finding patterns in high-dimensional data
Understanding complexities
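One simple way to find patterns in purchasing behavior is to count which items are bought together and use those counts for a naive prediction. The sketch below assumes hypothetical basket data and helper names (`pair_counts`, `suggest`). It is a simplified form of market basket analysis, not a full association-rule method such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; each inner list is one customer's purchase.
BASKETS = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "chips"],
    ["bread", "butter"],
    ["milk", "cereal"],
    ["beer", "chips", "salsa"],
]


def pair_counts(baskets):
    """Count how often each unordered pair of items is bought together."""
    counts = Counter()
    for basket in baskets:
        # set() ignores repeats inside one basket; sorted() makes pairs order-independent.
        for a, b in combinations(sorted(set(basket)), 2):
            counts[(a, b)] += 1
    return counts


def suggest(item, counts):
    """Naive prediction: the item most often co-purchased with `item`, or None."""
    companions = Counter()
    for (a, b), n in counts.items():
        if a == item:
            companions[b] += n
        elif b == item:
            companions[a] += n
    if not companions:
        return None
    # Highest count wins; ties are broken alphabetically for reproducibility.
    return min(companions, key=lambda other: (-companions[other], other))


if __name__ == "__main__":
    counts = pair_counts(BASKETS)
    for item in ("bread", "beer", "milk"):
        print(f"customers who buy {item} often also buy {suggest(item, counts)}")
```

Raw co-occurrence counts favor items that are popular overall. Association-rule mining normally adjusts for this with measures such as support and confidence.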
Why Is There an Explosion of Data?
Increased streaming of data
o Text
o YouTube/video
o Podcasts
No barrier to entry
More storage and devices collecting information
o Data storage capacity and speed have increased drastically over time
Pervasiveness of high-powered computation
o Everything is becoming digital
Easy storage of data
New sensors
o IoT devices
o Fitness trackers
o Thermometers
Digital exhaust
o Every time you click on something online, you create ‘digital exhaust’
More data available on the web
Data Visualization
Explore data
Develop intuitions
Visualize clusters
Data interpretation relies on visualization
Showing the ‘value and meaning’ of data in an understandable way
o Translating data from a data scientist’s perspective for a general audience
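As a concrete example of visualizing clusters, the sketch below groups points with k-means (Lloyd's algorithm) and draws them as a text scatter plot labeled by cluster number. The points, the function names, and the first-k-points initialization are assumptions chosen for a small, reproducible illustration. In practice a plotting library such as matplotlib would replace the text grid, and libraries usually seed k-means randomly or with k-means++.

```python
import math

# Hypothetical 2-D points forming two loose groups.
POINTS = [(1, 1), (1.5, 2), (2, 1), (8, 8), (9, 8.5), (8.5, 9)]


def kmeans(points, k, iterations=10):
    """Lloyd's k-means; returns (centroids, labels).

    Assumption: the first k points serve as initial centroids for reproducibility.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new.append(centroids[c])  # an empty cluster keeps its old centroid
        if new == centroids:  # converged: centroids stopped moving
            break
        centroids = new
    return centroids, labels


def ascii_plot(points, labels, width=10, height=10):
    """Text scatter plot; each point is drawn as its cluster number.

    Assumes non-negative coordinates; values past the grid edge are clamped.
    """
    grid = [["." for _ in range(width)] for _ in range(height)]
    for (x, y), lab in zip(points, labels):
        col = min(width - 1, int(x))
        row = min(height - 1, int(y))
        grid[height - 1 - row][col] = str(lab)  # flip rows so y grows upward
    return "\n".join("".join(r) for r in grid)


if __name__ == "__main__":
    centroids, labels = kmeans(POINTS, 2)
    print(ascii_plot(POINTS, labels))
    print("centroids:", centroids)
```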