

100 DAY SELF STUDY MASTER PLAN TO BECOME AZURE DATA ENGINEER (ARUN KUMAR)
Day
Day1
Day1
Day1
Day1
Day1
Day2
Day2
Day2
Day2
Day3
Day3
Day4
Day4
Day5
Day5
Day6
Day7
Day8
Day8
Day8
Day9
Python
Azure
What is Cloud?
What are the different major Cloud Platforms in the market?
What are IaaS, PaaS and SaaS?
Pricing Model in Azure
Download and install SQL Server and SSMS
SQL commands: DDL, DML, DCL, TCL
Databricks CLI overview
Service Principal
Mount Point
How to create Mount Point using Azure Key Vault
SELECT, INSERT, UPDATE, DELETE, MERGE, CREATE, ALTER, TRUNCATE,
RENAME, GRANT, REVOKE, COMMIT, ROLLBACK, SAVEPOINT, DROP
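The command groups above can be tried without installing SQL Server. The sketch below uses Python's built-in `sqlite3` module as a stand-in, so some syntax differs from T-SQL: SQLite has no GRANT/REVOKE (DCL) or TRUNCATE, so those are not shown. The table and column names are made up for illustration.

```python
import sqlite3

# isolation_level=None: sqlite3 opens no implicit transactions, so BEGIN/COMMIT are explicit
conn = sqlite3.connect(":memory:", isolation_level=None)
cur = conn.cursor()

# DDL: CREATE, ALTER, DROP
cur.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL, salary INTEGER)")
cur.execute("ALTER TABLE employee ADD COLUMN dept TEXT")
cur.execute("CREATE TABLE staging (id INTEGER)")
cur.execute("DROP TABLE staging")

# DML inside a transaction, made permanent with COMMIT (TCL)
cur.execute("BEGIN")
cur.executemany(
    "INSERT INTO employee (id, name, salary, dept) VALUES (?, ?, ?, ?)",
    [(1, "Asha", 50000, "IT"), (2, "Ravi", 42000, "HR"), (3, "Meena", 61000, "IT")],
)
cur.execute("UPDATE employee SET salary = salary + 5000 WHERE dept = 'IT'")
cur.execute("COMMIT")

# SAVEPOINT / ROLLBACK TO undo only part of a transaction
cur.execute("BEGIN")
cur.execute("DELETE FROM employee WHERE id = 2")          # this delete is kept
cur.execute("SAVEPOINT before_bulk_delete")
cur.execute("DELETE FROM employee")                       # mistake: removes every row
cur.execute("ROLLBACK TO SAVEPOINT before_bulk_delete")   # undo back to the savepoint
cur.execute("COMMIT")

rows = cur.execute("SELECT id, name, salary FROM employee ORDER BY id").fetchall()
tables = [r[0] for r in cur.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
print(rows)   # [(1, 'Asha', 55000), (3, 'Meena', 66000)]
```

The row for Ravi stays deleted because that DELETE ran before the savepoint, while the accidental bulk delete is undone.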
Query Execution sequence/Order
Clauses, operators and predicates in SQL
LIKE, IN, LIMIT, TOP, CASE, INSTR, SUBSTRING, AVG,
COUNT, MAX, MIN, SUM
UPPER, LOWER, EXISTS, EXTRACT, wildcard operators, etc.
SUBQUERIES: Nested Subqueries, Correlated Subqueries, Inline Views
JOINS – Self Join, Inner Join, Left Outer Join, Right Outer Join, Cross Join
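A minimal sketch of the join types and a correlated subquery, again using `sqlite3` with hypothetical `emp` and `dept` tables. RIGHT OUTER JOIN is left out because SQLite only supports it from version 3.39; swapping the table order of a LEFT JOIN gives the same result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER,
                  manager_id INTEGER, salary INTEGER);
INSERT INTO dept VALUES (10, 'IT'), (20, 'HR'), (30, 'Finance');
INSERT INTO emp VALUES
  (1, 'Asha', 10, NULL, 90000),
  (2, 'Ravi', 10, 1, 60000),
  (3, 'Meena', 20, 1, 55000),
  (4, 'John', NULL, 2, 40000);
""")

# INNER JOIN: only employees with a matching department (John has none)
inner = conn.execute("""
SELECT e.name, d.dept_name FROM emp e
JOIN dept d ON e.dept_id = d.dept_id ORDER BY e.emp_id""").fetchall()

# LEFT OUTER JOIN: every department is kept, even one with no employees
left = conn.execute("""
SELECT d.dept_name, COUNT(e.emp_id) FROM dept d
LEFT JOIN emp e ON e.dept_id = d.dept_id
GROUP BY d.dept_id, d.dept_name ORDER BY d.dept_id""").fetchall()

# SELF JOIN: the emp table joined to itself to pair employees with managers
self_join = conn.execute("""
SELECT e.name, m.name FROM emp e
JOIN emp m ON e.manager_id = m.emp_id ORDER BY e.emp_id""").fetchall()

# CORRELATED SUBQUERY: the inner query refers to the outer row (e.dept_id)
above_dept_avg = conn.execute("""
SELECT e.name FROM emp e
WHERE e.salary > (SELECT AVG(x.salary) FROM emp x WHERE x.dept_id = e.dept_id)
ORDER BY e.emp_id""").fetchall()
```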
Connect ADLS Gen2 from Databricks using OAuth 2.0 with an Azure service principal
Create an account in the Azure portal (you will need a credit card; Rs 2 is deducted for verification and refunded afterwards)
Azure Portal Overview
Overview of Different Services relevant to Data Engineering in Azure
Understanding what Big Data is
Provision ADLS Gen2
Creating containers and directories, and uploading files in ADLS Gen2
Provision Azure Databricks and its overview
Creating clusters in Databricks
Widgets Utility (dbutils.widgets)
File System Utility (dbutils.fs), understanding cluster configuration in Databricks
Revision
Day10
Day10
Day11
Day11
Day12
Day12
Day13
Day14
Day15
Day15
SQL
Database, Datawarehouse and Data Lake
Types of databases
RDBMS
Logical Schema Designing
Data modeling
Star Schema
Fact & Dimension tables
Normalization
Normal Forms
Sequence Diagram
E-R Diagram
Flowcharts
Physical Database Design
Business Constraints – Primary Key, Foreign Key, Candidate Key, Composite Key, Surrogate Key, Unique Key, Check, Default, Not Null
Indexes: Clustered and Non-Clustered
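The design topics above (star schema, fact and dimension tables, keys, constraints, indexes) come together in even a tiny schema. This is an illustrative sketch in SQLite with made-up table names. SQLite does not expose SQL Server's CLUSTERED/NONCLUSTERED keywords, so a plain `CREATE INDEX` here only plays the role of a non-clustered index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
conn.executescript("""
-- Dimension tables: descriptive attributes, one row per member, surrogate keys
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT NOT NULL,
                          category TEXT NOT NULL);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT UNIQUE, month TEXT);
-- Fact table: numeric measures plus a foreign key to each dimension
CREATE TABLE fact_sales (
    date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    quantity    INTEGER NOT NULL CHECK (quantity > 0),
    amount      REAL NOT NULL DEFAULT 0
);
-- Secondary index on a join column of the fact table
CREATE INDEX ix_fact_sales_product ON fact_sales(product_key);
INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
INSERT INTO dim_date VALUES (20240101, '2024-01-01', '2024-01'), (20240201, '2024-02-01', '2024-02');
INSERT INTO fact_sales VALUES (20240101, 1, 2, 1500.0), (20240201, 1, 1, 750.0),
                              (20240201, 2, 3, 600.0);
""")

# Typical star-schema query: join the fact table to its dimensions, aggregate measures
by_category = conn.execute("""
SELECT p.category, d.month, SUM(f.amount)
FROM fact_sales f
JOIN dim_product p ON f.product_key = p.product_key
JOIN dim_date d ON f.date_key = d.date_key
GROUP BY p.category, d.month
ORDER BY p.category, d.month""").fetchall()

# The foreign key constraint rejects a fact row pointing at a missing product
try:
    conn.execute("INSERT INTO fact_sales VALUES (20240101, 99, 1, 10.0)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```

A snowflake schema would split a dimension further, for example moving `category` into its own table referenced from `dim_product`.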
Connect ADLS Gen2 from Databricks using shared access signatures (SAS)
Connect ADLS Gen2 from Databricks using account keys
Read and write data in Azure Databricks
Data processing in Azure Databricks
Working with DataFrames in Azure Databricks
Working with different file formats such as CSV, Parquet, Avro and ORC
Revision
SET Operators – UNION, UNION ALL, MINUS (EXCEPT in SQL Server), INTERSECT
CTE (Common Table Expression)
Window Functions – RANK, DENSE_RANK, ROW_NUMBER, LEAD, LAG,
FIRST_VALUE, LAST_VALUE, PERCENT_RANK
Pivot and Unpivot
Rollup, Cube, Grouping Sets
Views, Materialized View
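CTEs and window functions are easiest to understand by looking at their output side by side. The sketch below uses `sqlite3` (window functions need SQLite 3.25 or newer) and a hypothetical `sales` table. The `OVER (PARTITION BY ... ORDER BY ...)` syntax shown is also accepted by SQL Server and Spark SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, month INTEGER, revenue INTEGER);
INSERT INTO sales VALUES
  ('North', 1, 100), ('North', 2, 150), ('North', 3, 150),
  ('South', 1, 200), ('South', 2, 120), ('South', 3, 180);
""")

query = """
WITH ranked AS (   -- CTE: a named subquery that the outer SELECT reads from
  SELECT region, month, revenue,
         ROW_NUMBER() OVER (PARTITION BY region ORDER BY revenue DESC, month) AS rn,
         RANK()       OVER (PARTITION BY region ORDER BY revenue DESC) AS rnk,
         DENSE_RANK() OVER (PARTITION BY region ORDER BY revenue DESC) AS drnk,
         LAG(revenue) OVER (PARTITION BY region ORDER BY month)       AS prev_revenue
  FROM sales
)
SELECT region, month, revenue, rn, rnk, drnk, prev_revenue
FROM ranked
ORDER BY region, month
"""
rows = conn.execute(query).fetchall()

# Best month per region: the row numbered 1 within each partition
top_month = {region: month for region, month, _, rn, *_ in rows if rn == 1}
```

In North the two 150 values tie: RANK gives both 1 and skips to 3, DENSE_RANK gives both 1 and continues with 2, and ROW_NUMBER separates them using the `month` tie-breaker.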
Day15
Day16
Day17
Day18
Day19
Day19
Day20
Day20
Day21
Revision
Day22
Day22
Day22
Day23
Day23
Day24
Basics of Python programming: working with the Python interpreter, identifiers,
keywords, constants, variables,
types of operators, operator precedence, data types,
mutable and immutable data types, statements,
expressions,
evaluation and comments, input and output statements, data type conversion
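A short sketch of a few of the basics listed above: mutable vs immutable types, operator precedence and explicit type conversion. Variable names are arbitrary.

```python
# Immutable types (str, tuple, int): operations return new objects
name = "azure"
upper_name = name.upper()          # new string; `name` still holds "azure"
point = (1, 2)
try:
    point[0] = 5                   # item assignment is not allowed on a tuple
    tuple_is_immutable = False
except TypeError:
    tuple_is_immutable = True

# Mutable types (list, dict, set): changed in place, visible through every alias
a = [1, 2, 3]
b = a                              # b is another name for the same list object
b.append(4)

# Operator precedence: ** binds tighter than *, which binds tighter than +
precedence = 2 + 3 * 4 ** 2        # 2 + 3 * 16 = 50

# Explicit data type conversion
count = int("42")                  # str -> int
ratio = float(count) / 8           # int -> float: 5.25
label = str(ratio)                 # float -> str: "5.25"

print(name, upper_name, a, tuple_is_immutable, precedence, count, ratio, label)
```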
Day25
Day25
Day26
Day26
Day27
Day27
Day27
Day27
Day28
Day29
Day29
Day30
Day31
Day32
Day34
Day33
Day35
Day36
Day37
Day38
Day39
Day40
Day41
Day42
Day43
Day44
Selection (decision) and repetition (iteration)
Selection: if, if-else and nested if statements, indentation
Repetition: for, while and nested loops, break, continue
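The selection and repetition constructs above fit into one small runnable sketch (the `classify` function is a made-up example):

```python
def classify(n: int) -> str:
    # Selection: if / elif / else, with a nested if inside the else branch
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    else:
        if n % 2 == 0:
            return "positive even"
        return "positive odd"

labels = [classify(n) for n in (-3, 0, 4, 7)]

# for loop with continue (skip this iteration) and break (leave the loop)
odds_before_10 = []
for n in range(20):
    if n % 2 == 0:
        continue
    if n >= 10:
        break
    odds_before_10.append(n)

# while loop: repeat until the condition becomes false
steps, value = 0, 100
while value > 1:
    value //= 2
    steps += 1

# Nested loops: the inner loop runs completely for each outer iteration
pairs = []
for i in range(1, 3):
    for j in range(1, 3):
        pairs.append((i, j))
```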
Chapter 3: Functions
Introduction to
Day45,Day46-Day47
Modules
Day48
Day49
Packages
Revision
Day50-Day51-Day52
Regular Expressions
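A sketch of regular expressions applied to a typical data-engineering task: picking dated files out of a folder listing. The file names are hypothetical. `re.fullmatch`, named groups and `re.sub` are standard-library features.

```python
import re

# Hypothetical landing-zone file names (illustrative only)
files = ["sales_2024-01-15.csv", "sales_2024-02-01.csv", "readme.txt", "sales_latest.csv"]

# Named groups capture the date parts; fullmatch requires the whole name to match
pattern = re.compile(r"sales_(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})\.csv")

dated = []
for name in files:
    m = pattern.fullmatch(name)
    if m:
        dated.append((m["year"], m["month"], m["day"]))

# re.sub: collapse runs of whitespace into single spaces
clean = re.sub(r"\s+", " ", "  Azure   Data\tEngineer \n").strip()
```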
Day53-Day54-Day55
Day56
Day57-Day58
Day59
Multithreading
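In CPython, threads help mainly with I/O-bound work (network calls, file transfers), because the Global Interpreter Lock lets only one thread run Python bytecode at a time. The sketch below simulates I/O with `time.sleep`, which is an assumption standing in for real requests, and shows a lock protecting shared state.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(source: str) -> str:
    # Stand-in for an I/O-bound call; sleeping releases the GIL
    time.sleep(0.1)
    return source.upper()

sources = ["orders", "customers", "products", "payments"]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fetch, sources))   # map returns results in input order
elapsed = time.perf_counter() - start          # about 0.1 s instead of 0.4 s sequentially

# Shared state updated from several threads must be protected with a lock
counter = 0
lock = threading.Lock()

def increment(times: int) -> None:
    global counter
    for _ in range(times):
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

For CPU-bound work, `multiprocessing` or `ProcessPoolExecutor` is the usual alternative.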
Revision + start thinking about resume preparation
Connecting to databases with Python
More dbutils commands
Databricks Runtime
Concept of Delta Table
Deep Dive into Delta Tables
Managed and unmanaged tables in Databricks
Table batch reads and writes
Table delete, update, and merge
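Delta tables support `DELETE`, `UPDATE` and `MERGE INTO` for upserts. Running real Delta code needs a Spark session, so as a locally runnable stand-in this sketch reproduces MERGE's update-if-matched / insert-if-not-matched behaviour with SQLite's `INSERT ... ON CONFLICT DO UPDATE` (SQLite 3.24+). This is not Delta syntax. The Delta SQL equivalent appears only as a comment, and the table and column names are hypothetical.

```python
import sqlite3

# Roughly equivalent Delta Lake SQL (runs in Databricks, not here):
#   MERGE INTO customers AS t USING updates AS s ON t.id = s.id
#   WHEN MATCHED THEN UPDATE SET *
#   WHEN NOT MATCHED THEN INSERT *

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
INSERT INTO customers VALUES (1, 'Asha', 'Pune'), (2, 'Ravi', 'Delhi');
""")

# Incoming batch: id 2 has a new city (matched -> update), id 3 is new (not matched -> insert)
updates = [(2, "Ravi", "Mumbai"), (3, "Meena", "Chennai")]
conn.executemany("""
INSERT INTO customers (id, name, city) VALUES (?, ?, ?)
ON CONFLICT(id) DO UPDATE SET name = excluded.name, city = excluded.city
""", updates)
after_merge = conn.execute("SELECT * FROM customers ORDER BY id").fetchall()

# Conditional UPDATE and DELETE, the other two table operations
conn.execute("UPDATE customers SET name = 'Meena K' WHERE id = 3")
conn.execute("DELETE FROM customers WHERE id = 1")
conn.commit()
final_rows = conn.execute("SELECT * FROM customers ORDER BY id").fetchall()
```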
Different types of clusters in Databricks
Scheduling Databricks notebooks as jobs
Azure Key Vault
Provisioning of Azure Key Vault
How to securely store keys and passwords in Azure Key Vault
How to access keys and passwords stored in Azure Key Vault
How to connect to AKV from ADB (Azure Databricks) and read AKV secrets
Secret scopes and the Databricks secrets utility (dbutils.secrets)
How to connect to AKV from ADF (Azure Data Factory) and read AKV secrets
Azure Data Lake Gen2 (ADLS Gen2)
Blob Storage
Azure Data Lake Gen1
Azure Data Lake Gen2
Installation of Azure Storage Explorer and connection with ADLS Gen2
Difference between Blob Storage, ADLS Gen1 and ADLS Gen2
How data is stored in ADLS
How to read/write data in ADLS Gen2 from Databricks
Introduction to functions
Revision
Need for functions; user-defined functions: passing arguments to a function, returning values from functions,
scope of variables
Standard library: using built-in functions, importing modules (math, random, statistics), creating and importing user-defined modules
Strings: initializing and accessing strings, string operations,
built-in functions for string manipulation, string traversal
Lists: list operations (creating, initializing, traversing and manipulating lists), list methods and built-in functions,
nested lists
Revision
Lists as arguments to a function
Tuples: Creating, initializing, accessing elements, tuple assignment, operations on tuples, tuple methods and built-in functions, nested tuples.
Dictionary: concept of key-value pair, mutability, creating, initializing, traversing, updating and deleting elements; dictionary methods and built-in functions.
Exception Handling
Need for exception handling, user-defined exceptions, raising exceptions
try-except-else clause, try-finally clause
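A minimal sketch that touches each item above: a user-defined exception, raising one with `raise ... from`, and the `else` and `finally` clauses. `InvalidRecordError` and `parse_amount` are hypothetical names.

```python
class InvalidRecordError(Exception):
    """User-defined exception for a record that fails validation."""

def parse_amount(raw: str) -> float:
    try:
        value = float(raw)
    except ValueError as exc:
        # Raise our own exception type, keeping the original error as the cause
        raise InvalidRecordError(f"not a number: {raw!r}") from exc
    else:
        # Runs only when the try block raised nothing
        if value < 0:
            raise InvalidRecordError(f"negative amount: {value}")
        return value

processed, good, bad = [], [], []
for raw in ["10.5", "abc", "-3"]:
    try:
        good.append(parse_amount(raw))
    except InvalidRecordError as err:
        bad.append(str(err))
    finally:
        processed.append(raw)   # runs whether or not an exception occurred
```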
Revision
File handling: file types, text files, opening and closing files,
reading and writing text files
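A sketch of the basic text-file operations. It writes into a temporary directory (an assumption, so the example leaves no files behind) and relies on `with` to close each file automatically.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cities.txt")

    # "w" creates the file or overwrites it
    with open(path, "w", encoding="utf-8") as f:
        f.write("Pune\nDelhi\n")

    # "a" appends to the end of the existing file
    with open(path, "a", encoding="utf-8") as f:
        f.write("Chennai\n")

    # Iterate line by line, stripping the trailing newline
    with open(path, "r", encoding="utf-8") as f:
        cities = [line.rstrip("\n") for line in f]

    # Or read the whole file into one string
    with open(path, encoding="utf-8") as f:
        whole = f.read()

file_was_closed = f.closed               # True: the with-block closed it
file_still_exists = os.path.exists(path)  # False: the temp directory was removed
```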
ETL CONCEPT
Data Warehouse Concept
Difference between Database, Data Warehouse and Data Lake
OLAP vs OLTP systems
Fact Table and Dimension Table
Data modelling: star schema vs snowflake schema
SPARK
Architecture of Spark
Jobs, stages and tasks in Spark
Spark SQL concepts, PySpark concepts
Transformation and Action
Lazy Evaluation
RDD vs DataFrame vs Dataset
Immutability Concept
Operations on RDDs
Operations on DataFrames
Different file formats (Parquet, Avro, ORC) and their differences
Understanding Shuffling in Spark
Narrow Transformation
Wide Transformation
Accumulator
Broadcast variable
Partition By
Partition pruning, repartition and coalesce
cache vs persist
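Transformations vs actions and lazy evaluation can be illustrated without a cluster. The sketch below is an analogy using Python generators, not Spark code. Building the generator pipeline does no work, much as Spark transformations only extend the execution plan, and consuming it with `list()` plays the role of an action such as `collect()`. One difference: a generator is exhausted after one pass, whereas Spark re-runs the plan on every action unless the data is kept with `cache()` or `persist()`.

```python
calls = []

def read_source():
    # Plays the role of a data source; records when each row is actually produced
    for row in range(1, 7):
        calls.append(row)
        yield row

# "Transformations": chain generators together; nothing is read yet
source = read_source()
evens = (x for x in source if x % 2 == 0)   # like filter()
squared = (x * x for x in evens)            # like a map/select step

rows_read_before_action = len(calls)        # 0: no work has been done so far

# "Action": consuming the pipeline finally triggers execution
result = list(squared)
```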
Azure SQL Server
Provisioning of Azure SQL Server
Installation of SSMS and connection with Azure SQL Server
Installation of Azure Data Studio and connection with Azure SQL Server
Connect Azure SQL Server from Databricks
Connect Azure SQL Server from Azure Data Factory
Azure Data Factory(ADF)
Pipelines
Activities
Datasets
Linked Services
Day60
Day60
Day61
Day61
Day61
Day62
Day62
Day62
Day62
Day63
Revision + prepare the first round of your resume
Day64
Day64
Day64
Day64
Day65
Day65
Day66
Day66
Day66
Day67
Day67
Day68
Day69
Day70
Revision+Make changes in the resume if required
Day71-Day72-Day73
Day74
Day75
Day76
Day77
Revision+Make changes in the resume if required
Day78
Day78
Day79
Day79
Day80
Day80
Day81
Day82
Day83
Day84
Revision+Make changes in the resume if required
Day85
Day86
Day86
Day87
Day87
Day87
Day88-Day89
Day90
Day91
Revision+Make changes in the resume if required
Day92-Day93
Medallion Architecture
Day94-Day95-Day96-Day97 Creating a data pipeline using all major components
Day98
Revision+Make changes in the resume if required
Day99
Final resume preparation / mock interview
Day100
Final resume preparation / mock interview
Integration Runtimes
AutoResolve Integration Runtime (Azure IR)
Self-Hosted Integration Runtime (SHIR)
Azure-SSIS Integration Runtime
Pipeline Run
Scheduling and triggering pipelines
Parameters
Variables
Orchestration of Pipeline in ADF
Triggers
Schedule Trigger and its System Variables
Event Based Trigger and its System Variables
Tumbling Window Trigger and its System Variables
Data Flow Pipeline
Pipeline Level Parameters
Global Parameters
Creating dynamic pipelines
How to monitor ADF data pipelines?
ARM Template
Copy one table from MySQL (on-premises) to Azure SQL Server
Copy filtered data of one table from MySQL (on-premises) to Azure SQL Server
Copy all tables from My SQL Server (on-premises) to Azure SQL Server
Copy selected tables from My SQL Server (on-premises) to Azure SQL Server
Pipeline for Incremental Load
Copy all tables from Azure SQL Database to ADLS Gen2
Copy selected tables from Azure SQL Database to ADLS Gen2
Error Logging of ADF pipelines in Azure SQL Server
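The incremental-load item above usually follows a watermark pattern: look up the last loaded value, copy only newer rows, then advance the watermark. In ADF this is typically built from Lookup, Copy and Stored Procedure activities. The sketch below reproduces only the logic in plain Python, with two in-memory SQLite databases standing in for source and target. The table and column names (`orders`, `last_modified`, `watermark`) are hypothetical.

```python
import sqlite3

# Two in-memory databases stand in for the source system and the target
src = sqlite3.connect(":memory:")
tgt = sqlite3.connect(":memory:")
src.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount INTEGER, last_modified TEXT);
INSERT INTO orders VALUES (1, 100, '2024-01-01 10:00'), (2, 250, '2024-01-02 09:30');
""")
tgt.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount INTEGER, last_modified TEXT);
CREATE TABLE watermark (table_name TEXT PRIMARY KEY, last_value TEXT);
INSERT INTO watermark VALUES ('orders', '1900-01-01 00:00');
""")

def incremental_load() -> int:
    # 1) Lookup: read the old watermark from the control table
    (old_wm,) = tgt.execute(
        "SELECT last_value FROM watermark WHERE table_name = 'orders'").fetchone()
    # 2) Copy: pull only rows modified after the old watermark, upserting into the target
    changed = src.execute(
        "SELECT order_id, amount, last_modified FROM orders WHERE last_modified > ?",
        (old_wm,)).fetchall()
    tgt.executemany(
        "INSERT INTO orders VALUES (?, ?, ?) ON CONFLICT(order_id) DO UPDATE SET "
        "amount = excluded.amount, last_modified = excluded.last_modified", changed)
    # 3) Advance the watermark to the newest value just copied
    if changed:
        new_wm = max(row[2] for row in changed)
        tgt.execute("UPDATE watermark SET last_value = ? WHERE table_name = 'orders'",
                    (new_wm,))
    tgt.commit()
    return len(changed)

first_run = incremental_load()    # initial load: both existing rows
src.execute("INSERT INTO orders VALUES (3, 75, '2024-01-03 08:00')")
src.execute("UPDATE orders SET amount = 300, last_modified = '2024-01-03 09:00' "
            "WHERE order_id = 2")
src.commit()
second_run = incremental_load()   # only the new row and the changed row
third_run = incremental_load()    # nothing new since the last watermark

final_rows = tgt.execute("SELECT * FROM orders ORDER BY order_id").fetchall()
final_wm = tgt.execute("SELECT last_value FROM watermark").fetchone()[0]
```

ISO-formatted timestamp strings compare correctly as text, which is why plain string comparison works for the watermark here.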
Azure Logic App
How to send email alerts for pipeline success and failure using a Logic App
Deployment
Introduction to Deployment
DEV, UAT and PROD environments
Azure DevOps
Using CI/CD pipelines for deployment of Azure Databricks and Azure Data Factory
GET SET, READY TO GIVE INTERVIEWS!! ALL THE BEST!!
Tips: 1) Resume preparation is a continuous process. It takes multiple iterations of resume building to build a robust resume.
2) Treat each failed interview as a mock interview, and prepare the questions you could not answer before sitting for the next interview.
3) Keep practicing different sets of Python and SQL questions from any practice portal throughout the learning process.
4) Continuous practice and revision are the key to learning technology and cracking interviews.
5) The day when you know everything will never come, so stop waiting for it. Build your basics and start giving interviews.