Uploaded by sjuejohn

data-science-structure-2022-02

Data Science course
at Turing College
Detailed information
Module 1: Data Wrangling
with Python
Sprint 1: Python Mastery
• Practice using strings in Python by doing the Pig
Latin exercise
• Practice Python skills on Codesignal
Part 3: Cluster Analysis With Python
Part 1: Building Foundational Python Skills For Data Analytics
• Practice using k-means clustering
• Learn the basics of handling passwords
• Refreshing Python skills
• Practice object-oriented programming
• Learning how to do simulations and resampling
• Practice using static type checking
• Learning about Python Random module
• Learn what PEP 8 is and how to use it in your Python code
• Learning how comparisons work in Python
• Learning about string formatting in Python
• Learn how to use Python's logging module
• Practicing Python skills by completing the Number guessing game exercise
• Learn how to package and publish your Python code
• Practicing Python skills on Codesignal
• Practice using dictionaries and sets by doing the
Restaurant exercise
• Practice Python skills on Codesignal
Part 2: Improving Code Reliability
• Learn about static type checking with MyPy
Part 4: Containers & REST APIs
• Learn the k-means clustering unsupervised learning algorithm
• Learn how to build REST APIs using the Bottle library
• Learn how to do data extraction, transformation, and
analysis using defaultdict
• Learn how to read CSV files using iterators
• Understand how Python dictionaries work
• Learn how to apply Clean Code principles in Python
• Learn how to use built-in documentation and
docstrings to document your code
• Learn the basics of web applications: routing,
requests, responses, and templating
• Learn how to test Python code with py.test, itertools,
and hypothesis
• Learn containerization and Docker basics
• Practice working with files by doing the
/etc/passwd to dict exercise
• Practice Python skills on Codesignal
Data Wrangling with Python
Part 5: Project
• Practice performing basic EDA
• Practice writing clean OOP based Python code and
testing it
• Practice reading data, performing queries, and
filtering data using Pandas
• Practice creating your own Python package
• Understand and apply the required software license for
the package
Sprint 3: Data Visualization with Python
• Practice dealing with Python environments
• Practice creating and working in a Docker container
Part 1: Basic Charting
• Learn what the data-ink ratio means and why it is
important
Sprint 2: Data Processing with NumPy and Pandas
• Learn how to create bar charts, line charts,
heatmaps, and scatter plots
Part 1: Numerical Data with NumPy I
• Learn the fundamentals of NumPy
• Practice using matplotlib and seaborn
• Practice vector and matrix algebra with NumPy
• Practice Python by solving exercises
• Learn the basics of linear algebra
• Practice Python by solving exercises
Part 2: Numerical Data with NumPy II
• Strengthen your understanding of NumPy
• Practice using NumPy to solve real problems
• Learn how to think about vectorization and code
performance
Part 2: Data Cleaning & Intermediate Charting I
• Reinforcing knowledge about bar charts, line charts,
heatmaps, and scatter plots
• Learn how to use subplots
• Learn how to use animation and interactivity
• Learn how to visualize basic and continuous
errorbars
• Learn how to customize your charts
• Practice Python by solving exercises
Part 3: Exploratory Data Analysis with Pandas I
• Learn how to create subplots
• Practice using matplotlib and seaborn
• Practice Python by solving exercises
• Learn Pandas fundamentals
• Practice using Pandas for basic tasks
• Handling missing data
Part 3: Data Cleaning & Intermediate Charting II
• Reinforce your knowledge about chart customization
and annotations
• Practice Python by solving exercises
• Learn how to use three-dimensional plotting
Part 4: Exploratory Data Analysis with Pandas II
• Practice using matplotlib and seaborn
• Learn intermediate Pandas functionality
• Practice your data visualization skills by exploring
FiveThirtyEight Comic Characters Dataset
• Practice using Pandas for more advanced tasks
• Practice Python by solving exercises
• Practice Python by solving exercises
Part 5: Project
• Practice working with data from Kaggle
Part 4: Exploring Data With Charting
Part 3: SQL For Data Analysis III
• Learn the basics of Statistics
• Learn how to work with multivariate data
• Learn the basics of Probability
• Continue learning about grouping in SQL
• Learn how to use Streamlit to develop and deploy
data applications
• Learn how to generate and manipulate strings in SQL
• Recap best practices for data visualization
• Learn how to perform arithmetic operations and work
with number precision in SQL
• Practice Python by solving exercises
• Learn how to work with temporal data in SQL
Part 5: Project
• Practice working with data from Kaggle
• Practice performing basic EDA
• Learn how to handle Nulls and exceptions in SQL
• Learn the basics of set theory
• Practice SQL by solving exercises
• Practice visualizing data with Matplotlib & Seaborn
Part 4: Practical Statistics For Data Science
• Practice reading data, performing queries and
filtering data using Pandas
• Strengthen EDA skills
Sprint 4: Capstone Project
Module 2: Data Analysis
• Learn SQL subqueries
• Learn SQL common table expressions (WITH
statement)
• Learn SQL temporal tables
• Learn to use SQLite and Pandas to work with SQL
data.
Sprint 1: Introduction to SQL &
Statistics Fundamentals
• Practice SQL GROUP BY clause by solving
Codesignal exercises
Part 1: SQL For Data Analysis I
Part 5: Project
• Refreshing Python and Jupyter Notebooks skills
• Practice working with SQLite datasets
• Learn what MySQL is
• Practice performing EDA
• Learn how queries are executed in MySQL
• Practice visualizing data with Matplotlib & Seaborn
• Learn the SELECT, FROM, and WHERE query
clauses
• Practice reading data, performing queries and
filtering data using SQL and Pandas
• Learn the basics of SQL
Sprint 2: Inferential Statistical
Analysis
• Practice SQL by solving exercises
Part 2: SQL For Data Analysis II
Part 1: Inferential Procedures
• Review summary statistics: shape, center (location),
spread, outliers
• Learn Type 1 and Type 2 errors
• Learn five-number summary
• Learn SQL joins
• Learn the basics of Bayesian statistics
• Review p-Values and p-Hacking
• Learn about Primary Keys and Foreign Keys
• Learn about data-driven decision-making's relation
with business performance
• Learn about SQL Aliases
• Learn how to clean data with SQL
• Practice SQL by solving exercises
• Learn how to use conditional logic in SQL
• Practice SQL by solving exercises
Data Analysis
Part 2: Confidence Intervals
• Practice working with SQLite datasets
• Learn about confidence intervals, confidence levels,
simple random sampling, and margin of error
• Practice performing EDA
• Learn about bootstrapping
• Learn Student's t-distribution, binomial distribution,
chi-square distribution, F-distribution, and Poisson
distribution
• Practice applying statistical inference procedures
• Practice visualizing data with Matplotlib & Seaborn
• Practice reading data, performing queries and
filtering data using SQL and Pandas
• Learn about SQL transactions, indexes, and
constraints
• Learn how to use window functions in SQL
• Practice SQL by solving exercises
Part 3: Hypothesis Testing I
Sprint 3: Statistical Modeling
Part 1: Modeling Fundamentals
• Learn the difference between dependent and
independent variables
• Learn the difference between dependent and
independent data
• Deep dive into p-values, p-hacking, and how to
perform statistical tests
• Learn how to use Q-Q plots to visually test for data
normality
• Learn about t-tests
• Learn about ANOVA
• Learn about control variables
• Learn about multi-arm bandits
• Learn about the confounding problem
• Learn about SQL views
• Learn about entropy and information gain
• Learn some performance tricks in SQL
• Learn how to scrape web pages using BeautifulSoup
• Practice SQL by solving exercises
• Strengthen your Python engineering skills
Part 4: Hypothesis Testing II
• Strengthen your understanding of statistical
hypothesis testing
• Learn about Lean AI Playbook and AI Uncertainty
Principle
• Learn about sensitivity analysis
• Learn about the most common mistake that data
scientists make
• Deepen your understanding of the data windows in
SQL
• Practice SQL by solving exercises
Part 2: Linear & Logistic Regression
• Learn and practice using linear regression
• Learn and practice using logistic regression
• Learn how to do linear model inference
• Learn k-nearest neighbors algorithm
• Learn Naive Bayes algorithm
• Learn the fundamentals of Spark
• Practice SQL by solving exercises
• Deepen your understanding of ranking in SQL
• Learn lag and lead SQL functions
Part 3: Multilevel and Marginal Models
• Learn about Apache Drill
• Learn and practice modeling dependent data with
multilevel and marginal models
• Practice SQL by solving exercises
Part 5: Project
• Review the most important and popular Python
libraries for data science
• Strengthen your understanding of linear and logistic
regressions
• Learn Spark fundamentals
• Understand ethical issues around using data
• Practice SQL by solving exercises
• Learn the Machine Learning Process Lifecycle
Part 4: Introduction To Bayesian Statistics
• Learn model evaluation
• Learn when to use survey weights in models
Part 3: KNNs, Decision Trees, and Random Forests
• Strengthen your understanding of Bayes' theorem
• Learn decision tree models
• Learn the fundamentals of Bayesian statistics
• Learn random forest models
• Learn and practice the Naive Bayes algorithm
• Strengthen your k-NN understanding
• Learn how to evaluate classification models
• Practice SQL by solving exercises
Part 5: Project
• Practice working with CSV files
• Practice performing EDA
• Practice applying statistical inference procedures
• Practice using linear machine learning models
• Practice visualizing data with Matplotlib & Seaborn
• Practice reading data, performing queries and
filtering data using SQL and Pandas
Part 4: Support Vector Machines
• Learn support vector machines
• Learn the Hinge Loss function
Part 5: Travel Insurance Prediction
• Practice working with CSV files
• Practice performing EDA
• Practice applying statistical inference procedures
• Practice using linear machine learning models
• Practice visualizing data with Matplotlib & Seaborn
Sprint 4: Capstone Project
• Practice reading data, performing queries and
filtering data using SQL and Pandas
Module 3: Machine Learning
Sprint 2: Gradient Boosted Trees &
Feature Engineering
Sprint 1: Supervised Machine
Learning Fundamentals
Part 1: Gradient Boosted Trees, XGBoost, CatBoost,
and LightGBM
Part 1: Introduction to Machine Learning
• Understand the machine learning landscape
• Learn different types of machine learning
• Understand how a machine learning project looks end-to-end
• Learn how business problems and data interact
• Learn the XGBoost algorithm
• Learn the CatBoost algorithm
• Learn the LightGBM algorithm
Part 2: Feature Engineering
Part 2: Machine Learning Projects
• Learn feature preprocessing
• Strengthen your understanding of the classification
algorithms & metrics used in machine learning
• Learn feature engineering
• Understand how to acquire data
• Review linear algebra
• Learn autofeat library
Machine Learning
Part 3: Interpretable & Responsible Machine Learning
Part 2: Clustering
• Learn The Five Stages of ML Adoption
• Learn K-Means
• Learn the AI Hierarchy of Needs for an ML Solution
• Learn DBSCAN
• Learn Cumulative Gains Plot, Lift Curve, and
Discrimination Threshold
• Learn HDBSCAN
• Learn the learning and validation curves
• Learn Gaussian Mixture Model
• Learn anomaly detection
Part 4: Maintaining Machine Learning Models
• Learn Taylor series
• Learn model deployment strategies
Part 3: Working with Imbalanced Data
• Learn model deployment patterns
• Learn the properties of effective model runtime
• Learn how to deploy models with FastAPI
• Learn eigenvalues and eigenvectors
• Learn SMOTE
• Learn ADASYN
• Learn class weighting
• Learn how to use imblearn library
Part 5: Stroke Prediction Dataset
• Learn how to use scikit-lego library
• Practice working with CSV files
Part 4: Hyperparameter Tuning & Model Selection
• Practice performing EDA
• Practice applying statistical inference procedures
• Learn HyperBand
• Learn Bayesian Hyperparameter Tuning
• Practice using various types of machine learning
models
• Learn Population-based Training
• Practice building ensembles of machine learning
models
• Learn how to use the Auto-sklearn library
• Practice deploying machine learning models
• Practice visualizing data with Matplotlib & Seaborn
• Practice reading data, performing queries and
filtering data
Sprint 3: Unsupervised Learning &
Hyperparameter Tuning
• Learn how to use the Ray Tune library
Part 5: LendingClub
• Practice downloading datasets from external sources
• Practice performing EDA
• Practice applying statistical inference procedures
• Practice using various types of machine learning
models
Part 1: Dimensionality Reduction
• Practice building ensembles of machine learning
models
• Learn PCA and its variants
• Practice using hyperparameter tuning
• Learn t-SNE
• Practice using AutoML tools
• Learn UMAP
• Practice deploying machine learning models
• Learn PHATE
• Practice visualizing data with Matplotlib & Seaborn
• Learn the curse of dimensionality
• Practice reading data, performing queries, and
filtering data.
• Learn variance thresholding
• Learn recursive feature elimination
• Learn functions, derivatives, differentiation,
Jacobian, and Hessian
Sprint 4: Capstone Project
Deep Learning
Module 4: Deep Learning
Sprint 1: Computer Vision
Part 2: Transformers Part 2
• Understanding attention mechanism.
• Self attention.
Part 1: Deep Learning Fundamentals
• BERT.
• Supervised learning
• RoBERTa.
• Neuron in artificial neural networks
• Optimization
• Loss/Cost
• Learning rate
Part 2: Introduction to PyTorch
• Introduction to tensors.
• Building a dataset.
• Building a module.
• Training.
• Using PyTorch Lightning.
Part 3: Convolutional Neural Networks
Part 3: Recurrent neural networks
• LSTM
• GRU
• Text generation
Part 4: Generative models
• Character-based generative models.
• GPT-2/3
• GPT vs BERT
Part 5: Project
• Intro to convolutions.
Sprint 3: Practical Deep Learning
• Max pooling.
Part 1: Advanced NLP
• Building CNN in PyTorch.
• Longformers.
• Data augmentations.
• Question-Answering with transformers
Part 4: Transfer Learning
Part 2: Advanced Deep Learning
• How to do transfer learning
• Image segmentation.
• What is fine-tuning and how to do it
• Object detection.
• FastAI framework.
• GANs.
Part 5: Project
• Auto-encoders.
Part 3: Delivering ML Projects
Sprint 2: Natural Language
Processing
Part 1: Transformers Part 1
• Transformers
• Attention mechanism
• Setting baseline
• Making a good test-set
• Iterating over models
• Understanding why a model underperforms
• Deploying a model to production
• HuggingFace library
• BERT/DistilBERT transformer for text classification
Part 4: Practical AI Ethics
• Bias in decisions
• Model interpretability
• Machine ethics
• AI security
Part 5: Project
Sprint 4: Capstone Project