100-DAY SELF-STUDY MASTER PLAN TO BECOME AN AZURE DATA ENGINEER (ARUN KUMAR)

Columns: Day, SQL, Python, Azure

Days: Day1 x5, Day2 x4, Day3 x2, Day4 x2, Day5 x2, Day6, Day7, Day8 x3, Day9

SQL
  Download and install SSMS, SQL Server
  SQL commands: DDL, DML, DCL, TCL (x3)
  Select, Insert, Update, Delete, Merge, Create, Alter, Truncate, Rename, Grant, Revoke, Commit, Rollback, Savepoint, Drop
  Query execution sequence/order
  Clauses, operators and predicates in SQL
  LIKE, IN, LIMIT, TOP, CASE, INSTR, SUBSTRING, AVG, COUNT, MAX, MIN, SUM, UPPER, LOWER, EXISTS, EXTRACT, wildcard operators, etc.
  Subqueries: nested subqueries, correlated subqueries, inline views
  Joins: self join, inner join, left outer join, right outer join, cross join

Azure
  What is Cloud?
  What are the major cloud platforms in the market?
  What are IaaS, PaaS and SaaS?
  Pricing model in Azure
  Databricks CLI overview
  Service principal
  Mount point
  How to create a mount point using Azure Key Vault
  Connect ADLS Gen2 from Databricks using OAuth 2.0 with an Azure service principal
  Create an account in Azure Portal (you will need a credit card; Rs 2 will be deducted for authentication and then refunded)
  Azure Portal overview
  Overview of the Azure services relevant to data engineering
  Understanding what Big Data is
  Provision ADLS Gen2
  Creating containers and directories and uploading files in ADLS Gen2
  Provision Azure Databricks and its overview
  Creating clusters in Databricks
  Widgets Utility (x2)
  File System Utility
  Understanding cluster configuration in Databricks
Revision

Days: Day10 x2, Day11 x2, Day12 x2, Day13, Day14, Day15 x2

SQL
  Database, Data Warehouse and Data Lake
  Types of databases
  RDBMS
  Logical schema design
  Data modeling
  Star schema
  Fact and dimension tables
  Normalization
  Normal forms
  Sequence diagram
  E-R diagram
  Flowcharts
  Physical database design
  Business constraints: primary key, foreign key, candidate key, composite key, surrogate key, unique key, check, default, not null
  Index: clustered and non-clustered

Azure
  Connect ADLS Gen2 from Databricks using shared access signatures (SAS)
  Connect ADLS Gen2 from Databricks using account keys
  Read and write data in Azure Databricks
  Data processing in Azure Databricks
  Working with DataFrames in Azure Databricks
  Working with different file formats such as CSV, Parquet, Avro, ORC
Revision

SQL
  Set operators: Union, Union All, Minus, Intersect
  CTE (Common Table Expression)
  Window functions: Rank, Dense Rank, Row Number, Lead, Lag, First Value, Last Value, Percent Rank
  Pivot and Unpivot
  Rollup, Cube, Grouping Sets
  Views, materialized views

Days: Day15, Day16, Day17, Day18, Day19 x2, Day20 x2
Day21: Revision
Days: Day22 x3, Day23 x2, Day24

Python
  Basics of Python programming, working with the Python interpreter
  Identifiers, keywords, constants, variables
  Types of operators, precedence of operators
  Data types, mutable and immutable data types
  Statements, expressions, evaluation and comments
  Input and output statements
  Data type conversion

Days: Day25 x2, Day26 x2, Day27 x4, Day28, Day29 x2, Day30 to Day44 (one row each)

Python
  Selection (decision) and repetition (iteration)
  Selection: if, if-else, and nested if statements, indentation
  Repetition: for, while, and nested loops, break, continue
  Functions: introduction

Day45, Day46-Day47; Modules; Day48; Day49; Packages; Revision; Day50-Day51-Day52; Regular Expressions; Day53-Day54-Day55
Days: Day56, Day57-Day58, Day59

Python
  Multithreading
  Revision + start thinking about resume preparation
  Connecting databases with Python

Azure
  More dbutils commands
  Databricks Runtime
  Concept of Delta tables
  Deep dive into Delta tables
  Managed and unmanaged tables in Databricks
  Table batch reads and writes
  Table delete, update, and merge
  Different types of clusters in Databricks
  Scheduling jobs in Databricks Notebook

Azure Key Vault
  Provisioning of Azure Key Vault
  How to securely store keys and passwords in Azure Key Vault
  How to access keys and passwords from Azure Key Vault
  How to connect to AKV from ADB (Azure Databricks) and read the secrets of AKV
  Secret scope and the Databricks secrets utility
  How to connect to AKV from ADF (Azure Data Factory) and read the secrets of AKV

Azure Data Lake Gen2 (ADLS Gen2)
  Blob Storage
  Azure Data Lake Gen1
  Azure Data Lake Gen2
  Installation of Azure Storage Explorer and connection with ADLS Gen2
  Difference between Blob Storage, ADLS Gen1 and ADLS Gen2
  How do we store data in ADLS
  How to read/write data in ADLS Gen2 from Databricks

Python
  Introduction to functions
  Revision
  Need for functions
  User-defined functions: passing arguments to a function, returning values from functions, scope of variables
  Standard library: using built-in functions; importing modules (math, random, statistics); creating and importing a user-defined module
  Strings: initializing and accessing strings, string operations, built-in functions for string manipulation, string traversal
  Lists: list operations (creating, initializing, traversing and manipulating lists), list methods and built-in functions, nested lists
  Revision
  List as argument to a function
  Tuples: creating, initializing, accessing elements, tuple assignment, operations on tuples, tuple methods and built-in functions, nested tuples
  Dictionary: concept of key-value pair, mutability, creating, initializing, traversing, updating and deleting elements; dictionary methods and built-in functions
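
The dictionary operations named in the last item above (creating, traversing, updating, deleting, plus a few methods and built-in functions) can be practised with a sketch like the one below. The topic names and day counts are made-up sample data, not figures from this plan.

```python
# Hypothetical sample data: topic names mapped to a number of study days.
study_days = {"SQL": 14, "Python": 30}

# Updating: add a new key and overwrite an existing value (dicts are mutable).
study_days["Azure"] = 40
study_days["SQL"] = 15

# Traversing: items() yields (key, value) pairs in insertion order.
for topic, days in study_days.items():
    print(f"{topic}: {days} days")

# Deleting: pop() removes a key and returns the value it held.
removed = study_days.pop("Python")

# Methods and built-in functions: len(), sorted() over keys, get() with a default.
total_topics = len(study_days)
sorted_topics = sorted(study_days)
missing = study_days.get("Spark", 0)
```

Using get() with a default avoids the KeyError that study_days["Spark"] would raise for a key that is not present.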
  Exception handling: need for exception handling, user-defined exceptions, raising exceptions, try-except-else clause, try-finally clause
  Revision
  File handling: text files, file types, opening and closing files, reading and writing text files

ETL CONCEPT
  Data warehouse concept
  Difference between Database, Data Warehouse and Data Lake
  OLAP vs OLTP systems
  Concept of fact table and dimension table
  Data modelling: star schema vs snowflake schema concept

SPARK
  Architecture of Spark
  Jobs, stages and tasks in Spark
  Spark SQL concepts, PySpark concepts
  Transformations and actions
  Lazy evaluation
  RDD vs DataFrame vs Dataset
  Immutability concept
  Operations on RDDs
  Operations on DataFrames
  Different file formats such as Parquet, Avro, ORC and their differences
  Understanding shuffling in Spark
  Narrow transformations
  Wide transformations
  Accumulator
  Broadcast variable
  Partition By
  Partition pruning, repartition and coalesce
  Cache vs persist

Azure SQL Server
  Provisioning of Azure SQL Server
  Installation of SSMS and connection with Azure SQL Server
  Installation of Azure Data Studio and connection with Azure SQL Server
  Connect to Azure SQL Server from Databricks
  Connect to Azure SQL Server from Azure Data Factory

Azure Data Factory (ADF)
  Pipelines
  Activities
  Datasets
  Linked Services

Days: Day60 x2, Day61 x3, Day62 x4
Day63: Revision + prepare first round of resume
Days: Day64 x4, Day65 x2, Day66 x3, Day67 x2, Day68, Day69
Day70: Revision + make changes in the resume if required
Days: Day71-Day72-Day73, Day74, Day75, Day76
Day77: Revision + make changes in the resume if required
Days: Day78 x2, Day79 x2, Day80 x2, Day81, Day82, Day83
Day84: Revision + make changes in the resume if required
Days: Day85, Day86 x2, Day87 x3, Day88-Day89, Day90
Day91: Revision + make changes in the resume if required
Day92-Day93: Medallion Architecture
Day94-Day95-Day96-Day97: Creating a data pipeline using all major components
Day98: Revision + make changes in the resume if required
Day99: Final resume preparation / mock interview
Day100: Final resume preparation / mock interview

Azure Data Factory (ADF)
  Integration Runtimes
  AutoResolve Integration Runtime
  Self-Hosted Integration Runtime (SHIR)
  SSIS Integration Runtime
  Pipeline run
  Scheduling and triggering of pipelines
  Parameters
  Variables
  Orchestration of pipelines in ADF
  Triggers
  Schedule trigger and its system variables
  Event-based trigger and its system variables
  Tumbling window trigger and its system variables
  Data Flow pipeline
  Pipeline-level parameters
  Global parameters
  Creating dynamic pipelines
  How to monitor ADF data pipelines?
  ARM template
  Copy one table from My SQL (on-premises) to Azure SQL Server
  Copy filtered data of one table from My SQL (on-premises) to Azure SQL Server
  Copy all tables from My SQL Server (on-premises) to Azure SQL Server
  Copy selected tables from My SQL Server (on-premises) to Azure SQL Server
  Pipeline for incremental load
  Copy all tables from Azure SQL Database to ADLS Gen2
  Copy selected tables from Azure SQL Database to ADLS Gen2
  Error logging of ADF pipelines in Azure SQL Server

Azure Logic App
  How to send mail alerts for pipeline failure and success using Logic App

Deployment
  Introduction to deployment
  DEV, UAT and PROD environments
  Azure DevOps
  Using CI/CD pipelines for deployment of Azure Databricks and Azure Data Factory

GET SET READY TO GIVE INTERVIEW!! ALL THE BEST!!

Tips:
1) Resume preparation is a continuous process. It will take multiple iterations of resume building to produce a robust resume.
2) Treat each failed interview as a mock interview, and prepare the questions you were not able to answer before sitting the next one.
3) Keep practicing different sets of Python and SQL questions from any portal throughout the learning process.
4) Continuous practice and revision are the key to learning technology and cracking interviews.
5) The day when you know everything will never come, so stop waiting for it. Build your basics and start giving interviews.