Uploaded by İzzet Yıldız

data science introduction notes

Why histograms and scatter plots?
(visualization)
Be ready to plot simple graphs
Why do we need visualization in business life? What are its important aspects?
Why are we using maps, for example?
MT: mostly visualization: visualization mistakes, why a plot is correct or wrong, bin size of the histogram,
why it is important to plot the same data with different bin sizes.
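A minimal sketch of the bin-size point, assuming a small made-up sample with two clusters. The `histogram` helper and the data are illustrative, not from the course material. With 2 bins the two clusters look like two identical bars; with 9 bins the empty gap between them and the shape inside each cluster become visible.

```python
def histogram(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value goes into the last bin instead of overflowing.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical sample: one cluster around 1-3, another around 8-10.
data = [1, 1, 2, 2, 2, 3, 8, 9, 9, 9, 10, 10]

coarse = histogram(data, 2)  # two wide bins
fine = histogram(data, 9)    # nine narrow bins

# Text rendering of both histograms, one row of '#' per bin.
for label, counts in (("2 bins", coarse), ("9 bins", fine)):
    print(label)
    for c in counts:
        print("  |" + "#" * c)
```

Plotting the same data at several bin counts is a cheap way to check that a pattern is in the data and not an effect of the binning.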
READ THE PROJECT DOCUMENT
What kind of visualization will you use? Choose at the beginning.
Why 3D visualization is not precise: because we don’t have volumetric perception. Stevens' power
law (for the senses).
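For the cheat sheet, the general form of Stevens' power law can be written as below. The symbols are the usual textbook ones and are not taken from the lecture: \psi is the perceived magnitude, I the physical stimulus magnitude, k a constant that depends on the units, and a an exponent that depends on the kind of stimulus.

```latex
\psi(I) = k \, I^{a}
```

When a < 1, large values are perceived as smaller than they really are relative to small ones. Exponents below 1 are commonly reported for area and volume, which is the usual argument against encoding numbers as areas or 3D volumes.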
For data exploration check the web page.
Understanding anomalies and trends, and how to deal with them. Checklist: skewness (left, right), dynamics,
missing data.
Imputation is for missing data.
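A minimal sketch of imputation, assuming missing entries are marked with `None` and are filled with a summary statistic of the observed values. The `impute` helper and the `ages` list are hypothetical; median and mean filling are only two of many possible strategies.

```python
from statistics import mean, median


def impute(values, strategy=median):
    """Replace None entries with a summary statistic of the observed values."""
    observed = [v for v in values if v is not None]
    fill = strategy(observed)
    return [fill if v is None else v for v in values]


# Hypothetical column with two missing entries and one extreme value (90).
ages = [22, None, 25, 31, None, 90]

by_median = impute(ages)                # fills with 28.0
by_mean = impute(ages, strategy=mean)   # fills with 42, pulled up by the 90
print(by_median)
print(by_mean)
```

The median fill is less affected by the extreme value than the mean fill, which is one reason to look at the distribution before choosing a strategy.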
Feature engineering, mining and transformation // not included
For outliers you can use transformations.
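A minimal sketch of one such transformation, assuming a log transform on strictly positive values. The `incomes` list is made up for illustration. The log keeps the order of the values but compresses large ones much more than small ones.

```python
import math

# Hypothetical positive values with one extreme point at the end.
incomes = [20, 25, 30, 35, 40, 1000]

# log10 is only defined for positive numbers; zeros or negatives would need
# a different transform or a shift first.
logged = [math.log10(x) for x in incomes]

# How far the largest value sits from the second largest, as a ratio.
raw_ratio = incomes[-1] / incomes[-2]   # 1000 / 40
log_ratio = logged[-1] / logged[-2]     # about 3.0 / 1.6

print(raw_ratio, round(log_ratio, 2))
```

The extreme point is still the largest after the transform, but it no longer dominates the scale of a plot or a mean.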
Feature engineering: end of the course, case studies. Not in the MT.
XML, web standards, HTML, JSON: how do we organize data? Look into them.
XML: flexible, human- and computer-readable, infinite possibilities.
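A minimal sketch of reading a small XML document into plain records, using Python's standard `xml.etree.ElementTree`. The `<students>` document, its tag names and the values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: tags and attributes are chosen by whoever designs it,
# which is where the flexibility of XML comes from.
doc = """
<students>
  <student id="1"><name>Ada</name><grade>90</grade></student>
  <student id="2"><name>Alan</name><grade>85</grade></student>
</students>
"""

root = ET.fromstring(doc.strip())

# Turn each <student> element into a dictionary.
records = [
    {
        "id": s.get("id"),                    # attribute values are strings
        "name": s.findtext("name"),
        "grade": int(s.findtext("grade")),    # element text must be converted
    }
    for s in root.findall("student")
]
print(records)
```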
How the data is stored using the schema. Not the language itself, but how do we operate relational
databases? Why does pandas make our lives easier?
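A minimal sketch of the relational idea in pandas, assuming two small made-up tables that share a `student_id` key. `DataFrame.merge` plays the role of a SQL join and `groupby` the role of `GROUP BY`; the table and column names are hypothetical.

```python
import pandas as pd

# Two hypothetical tables linked by the key column student_id.
students = pd.DataFrame(
    {"student_id": [1, 2, 3], "name": ["Ada", "Alan", "Grace"]}
)
grades = pd.DataFrame(
    {
        "student_id": [1, 1, 2, 3],
        "course": ["stats", "viz", "stats", "viz"],
        "grade": [90, 80, 70, 100],
    }
)

# Inner join on the shared key, similar to SQL's JOIN ... ON.
joined = students.merge(grades, on="student_id", how="inner")

# Average grade per student, similar to SQL's GROUP BY with AVG.
avg = joined.groupby("name")["grade"].mean()
print(avg)
```

The join and the aggregation each take one line, which is a large part of why pandas is convenient for this kind of work.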
How do we push data to a data server or to distributed data sets (ETL)? Enabling technologies are very
important because the services are complex.
Prepare one page cheat sheet.
Single var histogram
Relationship between two variables scatter plot
JSON is also human-readable and is also used for machine-to-machine interaction.
Smaller memory footprint if we store it as binary (not human-readable).
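A minimal sketch comparing the two encodings, assuming four made-up sensor readings stored once as JSON text and once as packed 32-bit floats with Python's standard `struct` module. The field name `readings` and the values are illustrative.

```python
import json
import struct

# Hypothetical readings; all are exactly representable as 32-bit floats.
readings = [20.5, 21.0, 19.75, 22.25]

# Text encoding: readable in any editor, self-describing.
as_json = json.dumps({"readings": readings}).encode("utf-8")

# Binary encoding: 4 bytes per value, little-endian, no field names.
fmt = f"<{len(readings)}f"
as_binary = struct.pack(fmt, *readings)

# The reader must already know the format to decode the bytes.
restored = list(struct.unpack(fmt, as_binary))

print(len(as_json), len(as_binary))
```

The binary form is smaller, but the layout has to be agreed on separately, which is the trade-off against human readability.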
Kurtosis? Skewness?
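A minimal sketch of both quantities, assuming the population (moment-based) definitions: skewness as the mean cubed z-score and excess kurtosis as the mean fourth-power z-score minus 3. Libraries may apply sample corrections, so their numbers can differ slightly. The function names and data are illustrative.

```python
from statistics import fmean, pstdev


def skewness(values):
    """Mean cubed z-score: > 0 means a longer right tail, < 0 a longer left tail."""
    m, s = fmean(values), pstdev(values)
    return fmean([((v - m) / s) ** 3 for v in values])


def excess_kurtosis(values):
    """Mean fourth-power z-score minus 3 (0 for a normal distribution)."""
    m, s = fmean(values), pstdev(values)
    return fmean([((v - m) / s) ** 4 for v in values]) - 3


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]   # long tail to the right
left_skewed = [-10, -2, -1, -1, -1]  # mirror image, long tail to the left

print(skewness(symmetric), skewness(right_skewed), skewness(left_skewed))
print(excess_kurtosis(symmetric))
```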
Correlation, covariance and variance are important. Z-test important, t-test important, chi-square (not that important).
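A minimal sketch of these quantities from their definitions, assuming sample (n - 1) formulas and one-sample tests. The z statistic assumes the population standard deviation `sigma` is known; the t statistic estimates it from the sample. Function names and data are made up, and only the test statistics are computed, not p-values.

```python
import math
from statistics import fmean, stdev


def covariance(xs, ys):
    """Sample covariance of two equally long sequences."""
    mx, my = fmean(xs), fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance scaled to the range [-1, 1]."""
    return covariance(xs, ys) / (stdev(xs) * stdev(ys))


def one_sample_z(values, mu0, sigma):
    """z statistic for H0: mean == mu0, with known population sigma."""
    return (fmean(values) - mu0) / (sigma / math.sqrt(len(values)))


def one_sample_t(values, mu0):
    """t statistic for H0: mean == mu0, with sigma estimated from the sample."""
    return (fmean(values) - mu0) / (stdev(values) / math.sqrt(len(values)))


# Hypothetical study hours and exam scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 61, 63, 70]

print(covariance(hours, scores), correlation(hours, scores))
print(one_sample_z(scores, mu0=55, sigma=10), one_sample_t(scores, mu0=55))
```

The covariance of a variable with itself is its variance, which ties the three quantities in the note together.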
1. Scatter 2. Bar 3. Line 4. Pie 5. Color
(from most important to least)
For very few elements a pie chart is okay, but not for many elements.
We are not very good at judging area.
2nd MT: more of a personal view on how to explore data and handle missing information (about your
project).
Start thinking about your project.