Data Science course at Turing College Detailed information Module 1: Data Wrangling with Python Sprint 1: Python Mastery • Practice using strings in Python by doing the Pig Latin exercise • Practice Python skills on Codesignal Part 3: Cluster Analysis With Python Part 1: Building Foundational Python Skills For Data Analytics • Practice using k-means clustering • Learn the basics of handling passwords • Refreshing Python skills • Practice object-oriented programming • Learning how to do simulations and resampling • Practice using static type checking • Learning about the Python random module • Learn what PEP 8 is and how to use it in your Python code • Learning how comparisons work in Python • Learning about string formatting in Python • Learn how to use Python's logging module • Practicing Python skills by completing the Number guessing game exercise • Learn how to package and publish your Python code • Practicing Python skills on Codesignal • Practice using dictionaries and sets by doing the Restaurant exercise • Practice Python skills on Codesignal Part 2: Improving Code Reliability • Learn about static type checking with MyPy Part 4: Containers & REST APIs • Learn the k-means clustering unsupervised learning algorithm • Learn how to build REST APIs using the Bottle library • Learn how to do data extraction, transformation, and analysis using defaultdict • Learn how to read CSV files using iterators • Understand how Python dictionaries work • Learn how to apply Clean Code principles in Python • Learn how to use built-in documentation and docstrings to document your code • Learn the basics of web applications: routing, requests, responses, and templating
• Learn how to test Python code with pytest, itertools, and hypothesis • Learn containerization and Docker basics • Practice working with files by doing the /etc/passwd to dict exercise • Practice Python skills on Codesignal Part 5: Project • Practice performing basic EDA • Practice writing clean OOP-based Python code and testing it • Practice reading data, performing queries, and filtering data using Pandas • Practice creating your own Python package • Understand and apply the required software license for the package Sprint 3: Data Visualization with Python • Practice dealing with Python environments • Practice creating and working in a Docker container Part 1: Basic Charting • Learn what the data-ink ratio means and why it is important Sprint 2: Data Processing with NumPy and Pandas • Learn how to create bar charts, line charts, heatmaps, and scatter plots Part 1: Numerical Data with NumPy I • Learn the fundamentals of NumPy • Practice using matplotlib and seaborn • Practice vector and matrix algebra with NumPy • Practice Python by solving exercises • Learn the basics of linear algebra • Practice Python by solving exercises Part 2: Numerical Data with NumPy II • Strengthen your understanding of NumPy • Practice using NumPy to solve real problems • Learn how to think about vectorization and code performance Part 2: Data Cleaning & Intermediate Charting I • Reinforce your knowledge about bar charts, line charts, heatmaps, and scatter plots • Learn how to use subplots • Learn how to use animation and interactivity • Learn how to visualize basic and continuous errorbars • Learn how to customize your charts • Practice Python by solving exercises Part 3: Exploratory Data Analysis with Pandas I • Learn how to create subplots • Practice using matplotlib and seaborn • Practice Python by solving exercises • Learn Pandas fundamentals • Practice using Pandas for basic tasks • Handling missing data Part 3: Data Cleaning & Intermediate Charting II • Reinforce your knowledge about chart customization and annotations
• Practice Python by solving exercises • Learn how to use three-dimensional plotting Part 4: Exploratory Data Analysis with Pandas II • Practice using matplotlib and seaborn • Learn intermediate Pandas functionality • Practice your data visualization skills by exploring the FiveThirtyEight Comic Characters Dataset • Practice using Pandas for more advanced tasks • Practice Python by solving exercises • Practice Python by solving exercises Part 5: Project • Practice working with data from Kaggle Part 4: Exploring Data With Charting Part 3: SQL For Data Analysis III • Learn the basics of statistics • Learn how to work with multivariate data • Learn the basics of probability • Continue learning about grouping in SQL • Learn how to use Streamlit to develop and deploy data applications • Learn how to generate and manipulate strings in SQL • Recap best practices for data visualization • Learn how to perform arithmetic operations and work with number precision in SQL • Practice Python by solving exercises • Learn how to work with temporal data in SQL Part 5: Project • Practice working with data from Kaggle • Practice performing basic EDA • Learn how to handle NULLs and exceptions in SQL • Learn the basics of set theory • Practice SQL by solving exercises • Practice visualizing data with Matplotlib & Seaborn Part 4: Practical Statistics For Data Science • Practice reading data, performing queries, and filtering data using Pandas • Strengthen EDA skills Sprint 4: Capstone Project Module 2: Data Analysis • Learn SQL subqueries • Learn SQL common table expressions (WITH statement) • Learn SQL temporal tables • Learn to use SQLite and Pandas to work with SQL data
Sprint 1: Introduction to SQL & Statistics Fundamentals • Practice the SQL GROUP BY clause by solving Codesignal exercises Part 1: SQL For Data Analysis I Part 5: Project • Refreshing Python and Jupyter Notebooks skills • Practice working with SQLite datasets • Learn what MySQL is • Practice performing EDA • Learn how queries are executed in MySQL • Practice visualizing data with Matplotlib & Seaborn • Learn the SELECT, FROM, and WHERE query clauses • Practice reading data, performing queries, and filtering data using SQL and Pandas • Learn the basics of SQL Sprint 2: Inferential Statistical Analysis • Practice SQL by solving exercises Part 2: SQL For Data Analysis II Part 1: Inferential Procedures • Review summary statistics: shape, center (location), spread, outliers • Learn Type 1 and Type 2 errors • Learn the five-number summary • Learn SQL joins • Learn the basics of Bayesian statistics • Review p-values and p-hacking • Learn about primary keys and foreign keys • Learn about data-driven decision-making's relation with business performance • Learn about SQL aliases • Learn how to clean data with SQL • Practice SQL by solving exercises • Learn how to use conditional logic in SQL • Practice SQL by solving exercises Part 2: Confidence Intervals • Practice working with SQLite datasets • Learn about confidence intervals, confidence levels, simple random sampling, and margin of error • Practice performing EDA • Learn about bootstrapping • Learn Student's t-distribution, binomial distribution, chi-square distribution, F-distribution, and Poisson distribution • Practice applying statistical inference procedures • Practice visualizing data with Matplotlib & Seaborn • Practice reading data, performing queries, and filtering data using SQL and Pandas • Learn about SQL transactions, indexes, and constraints • Learn how to use window functions in SQL • Practice SQL by solving exercises Part 3: Hypothesis Testing I Sprint 3: Statistical Modeling Part 1: Modeling Fundamentals
• Learn the difference between dependent and independent variables • Learn the difference between dependent and independent data • Deep dive into p-values, p-hacking, and how to perform statistical tests • Learn how to use Q-Q plots to visually test for data normality • Learn about t-tests • Learn about ANOVA • Learn about control variables • Learn about multi-armed bandits • Learn about the confounding problem • Learn about SQL views • Learn about entropy and information gain • Learn some performance tricks in SQL • Learn how to scrape web pages using BeautifulSoup • Practice SQL by solving exercises • Strengthen your Python engineering skills Part 4: Hypothesis Testing II • Strengthen your statistical hypothesis testing knowledge • Learn about the Lean AI Playbook and the AI Uncertainty Principle • Learn about sensitivity analysis • Learn about the most common mistake that data scientists make • Deepen your understanding of data windows in SQL • Practice SQL by solving exercises Part 2: Linear & Logistic Regression • Learn and practice using linear regression • Learn and practice using logistic regression • Learn how to do linear model inference • Learn the k-nearest neighbors algorithm • Learn the Naive Bayes algorithm • Learn the fundamentals of Spark • Practice SQL by solving exercises • Deepen your understanding of ranking in SQL • Learn the LAG and LEAD SQL functions Part 3: Multilevel and Marginal Models • Learn about Apache Drill • Learn and practice modeling dependent data with multilevel and marginal models • Practice SQL by solving exercises Part 5: Project • Review the most important and popular Python libraries for data science • Strengthen your understanding of linear and logistic regressions • Learn Spark fundamentals • Understand ethical issues around using data • Practice SQL by solving exercises • Learn the Machine Learning Process Lifecycle Part 4: Introduction To Bayesian Statistics • Learn model evaluation • Learn when to use survey weights in models
Part 3: KNNs, Decision Trees, and Random Forests • Strengthen your understanding of Bayes' theorem • Learn decision tree models • Learn the fundamentals of Bayesian statistics • Learn random forest models • Learn and practice the Naive Bayes algorithm • Strengthen your k-NN understanding • Learn how to evaluate classification models • Practice SQL by solving exercises Part 5: Project • Practice working with CSV files • Practice performing EDA • Practice applying statistical inference procedures • Practice using linear machine learning models • Practice visualizing data with Matplotlib & Seaborn • Practice reading data, performing queries, and filtering data using SQL and Pandas Part 4: Support Vector Machines • Learn support vector machines • Learn the hinge loss function Part 5: Travel Insurance Prediction • Practice working with CSV files • Practice performing EDA • Practice applying statistical inference procedures • Practice using linear machine learning models • Practice visualizing data with Matplotlib & Seaborn Sprint 4: Capstone Project • Practice reading data, performing queries, and filtering data using SQL and Pandas Module 3: Machine Learning Sprint 2: Gradient Boosted Trees & Feature Engineering Sprint 1: Supervised Machine Learning Fundamentals Part 1: Gradient Boosted Trees, XGBoost, CatBoost, and LightGBM Part 1: Introduction to Machine Learning • Understand the machine learning landscape • Learn the different types of machine learning • Understand how a machine learning project looks end-to-end • Learn how business problems and data interact • Learn the XGBoost algorithm • Learn the CatBoost algorithm • Learn the LightGBM algorithm Part 2: Feature Engineering Part 2: Machine Learning Projects • Learn feature preprocessing • Strengthen your understanding of the classification algorithms & metrics used in machine learning • Learn feature engineering • Understand how to acquire data • Review linear algebra
• Learn the autofeat library Part 3: Interpretable & Responsible Machine Learning Part 2: Clustering • Learn The Five Stages of ML Adoption • Learn K-Means • Learn the AI Hierarchy of Needs for an ML Solution • Learn DBSCAN • Learn Cumulative Gains Plot, Lift Curve, and Discrimination Threshold • Learn HDBSCAN • Learn the learning and validation curves • Learn Gaussian Mixture Model • Learn anomaly detection Part 4: Maintaining Machine Learning Models • Learn Taylor series • Learn model deployment strategies Part 3: Working with Imbalanced Data • Learn model deployment patterns • Learn the properties of effective model runtime • Learn how to deploy models with FastAPI • Learn eigenvalues and eigenvectors • Learn SMOTE • Learn ADASYN • Learn class weighting • Learn how to use the imblearn library Part 5: Stroke Prediction Dataset • Learn how to use the scikit-lego library • Practice working with CSV files Part 4: Hyperparameter Tuning & Model Selection • Practice performing EDA • Practice applying statistical inference procedures • Learn HyperBand • Learn Bayesian Hyperparameter Tuning • Practice using various types of machine learning models • Learn Population-based Training • Practice building ensembles of machine learning models • Learn how to use the Auto-sklearn library • Practice deploying machine learning models • Practice visualizing data with Matplotlib & Seaborn • Practice reading data, performing queries, and filtering data Sprint 3: Unsupervised Learning & Hyperparameter Tuning • Learn how to use the Ray Tune library Part 5: LendingClub • Practice downloading datasets from external sources • Practice performing EDA • Practice applying statistical inference procedures • Practice using various types of machine learning models Part 1: Dimensionality Reduction • Practice building ensembles of machine learning models • Learn PCA and its variants • Practice using hyperparameter tuning • Learn t-SNE • Practice using AutoML tools • Learn UMAP • Practice deploying machine learning models
• Learn PHATE • Practice visualizing data with Matplotlib & Seaborn • Learn the curse of dimensionality • Practice reading data, performing queries, and filtering data • Learn variance thresholding • Learn recursive feature elimination • Learn functions, derivatives, differentiation, Jacobian, and Hessian Sprint 4: Capstone Project Module 4: Deep Learning Sprint 1: Computer Vision Part 2: Transformers Part 2 • Understanding the attention mechanism • Self-attention Part 1: Deep Learning Fundamentals • BERT • Supervised learning • RoBERTa • Neuron in artificial neural networks • Optimization • Loss/Cost • Learning rate Part 2: Introduction to PyTorch • Introduction to tensors • Building a dataset • Building a module • Training • Using PyTorch Lightning Part 3: Convolutional Neural Networks Part 3: Recurrent Neural Networks • LSTM • GRU • Text generation Part 4: Generative Models • Character-based generative models • GPT-2/3 • GPT vs BERT Part 5: Project • Intro to convolutions Sprint 3: Practical Deep Learning • Max pooling Part 1: Advanced NLP • Building a CNN in PyTorch • Longformers • Data augmentations • Question answering with transformers Part 4: Transfer Learning Part 2: Advanced Deep Learning • How to do transfer learning • Image segmentation • What is fine-tuning and how to do it • Object detection • FastAI framework • GANs Part 5: Project • Auto-encoders Part 3: Delivering ML Projects Sprint 2: Natural Language Processing Part 1: Transformers Part 1 • Transformers • Attention mechanism • Setting a baseline • Making a good test set • Iterating over models • Understanding why a model underperforms • Deploying a model to production • HuggingFace library • BERT/DistilBERT transformer for text classification Part 4: Practical AI Ethics • Bias in decisions • Model interpretability • Machine ethics • AI security Part 5: Project Sprint 4: Capstone Project