BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI
WORK INTEGRATED LEARNING PROGRAMMES
COURSE HANDOUT
Part A: Content Design
Course Title: Foundations of Data Science
Course No(s): MBA ZG536/PDBA ZG536
Credit Units: 4
Course Author: ARINDAM ROY, PRAVIN MHASKE
Version No: 1.0
Date: 1 June 2020
Course Description
Introduction, Role of a Data Scientist, Statistics vs. Data Science, Fundamentals of Data Science,
Data Science process and life cycle, Exploratory Data Analysis, Data Engineering and shaping,
Overview of Data Science Techniques and Models, Introduction to Regression, Classification,
Shrinkage, Dimension Reduction, Tree-based models, Support Vector Machines, Unsupervised
learning, Choosing and evaluating models, Featurization, Overview of Neural Networks, Data
mining and pattern recognition techniques, Documentation, Deployment, and Presentation of the
insights
Course Objectives
CO1: Get introduced to the field of Data Science and the roles, process and challenges involved
therein
CO2: Explore and experience the steps involved in data preparation and exploratory data
analysis
CO3: Learn to select and apply the proper analytics technique for various scenarios, assess
model performance, and interpret the results of predictive models
CO4: Gain familiarity with the general deployment considerations for predictive models
CO5: Appreciate the importance of techniques such as data visualization and storytelling with
data for effectively presenting outcomes to stakeholders
Text Book(s) (Author(s), Title, Edition, Publishing House)
T1: Foster Provost & Tom Fawcett, Data Science for Business, O'Reilly
T2: Dean Abbott, Applied Predictive Analytics, Wiley
Reference Book(s) & other resources (Author(s), Title, Edition, Publishing House)
R1: Pang-Ning Tan, Michael Steinbach & Vipin Kumar, Introduction to Data Mining, Pearson
R2: Manaranjan Pradhan & U Dinesh Kumar, Machine Learning using Python, Wiley
Content Structure
M1: Data Science Foundations
o Applications of Data Science
o Role and responsibilities of Data Scientists
o Comparing Data Science with other domains
o Challenges in the field of Data Science
o Data Science Process
o Data Scientist's Toolbox
M2: Data Prep and Exploratory Data Analysis
o Types of data and data sets
o Data Quality
o Data Preprocessing
o Feature Creation
o Dimension Reduction
o Feature Selection
o Measures of Similarity and Dissimilarity
o Descriptive Analysis
o Data Visualizations
M3: Descriptive Modeling
o Clustering
o Association Rules
o Principal Component Analysis
o Interpreting Descriptive models
M4: Predictive Modeling
o Linear Regression
o Logistic Regression
o K-nearest neighbor
o Decision Tree
o Naïve Bayes
o Support Vector Machines
o Neural Networks
o Model Ensembles
o Assessing Predictive models
M5: Post-processing
o General deployment considerations
o The Narrative: report / presentation structure
o Building narrative with Data
o Effective storytelling
Learning Outcomes
LO1: Applications of Data Science and the process of the Data Science project life cycle
LO2: Techniques and tools effective in addressing the data preprocessing and exploratory data
analysis stages
LO3: Applications of descriptive and predictive data analytics techniques
LO4: Hands-on experience in model building, evaluation, and interpretation of results
LO5: Knowledge of the post-processing involved in a Data Science project, including deployment
considerations and the importance of effective storytelling
Part B: Contact Session Plan
Academic Term: First Semester 2024-2025
Course Title: Foundations of Data Science
Course No: MBA ZG536 / PDBA ZG536
Lead Instructor: ARINDAM ROY, PRAVIN MHASKE
Course Contents
Each entry gives the contact session (#), contact hours (#), topic titles (from the content
structure in the Course Handout), and the text/reference book or external resource.
Module 1: Data Science Foundations
Session 1 (Hours 1-2)
• Applications of Data Science
• Role and responsibilities of Data Scientists
• Comparing Data Science with other domains
• Challenges in the field of Data Science
Text/Ref: T2: Ch 1; T1: Ch 1, 2; R4: Ch 1; Additional Reading (AR); Classroom discussion
Session 2 (Hours 3-4)
• Data Science Process
• Data Scientist's Toolbox
Text/Ref: T1: Ch 1; T2: Ch 2; Classroom discussion
Module 2: Data Prep and Exploratory Data Analysis
Session 3 (Hours 5-6)
• Types of data and data sets
• Data Quality
Text/Ref: R1: Ch 2
Session 4 (Hours 7-8)
• Data Preprocessing
• Feature Creation
• Dimension Reduction
• Feature Selection
Text/Ref: T2: Ch 4; R1: Appendix; T1: Ch 2; AR
Session 5 (Hours 9-10)
• Measures of Similarity and Dissimilarity
• Descriptive Analytics
• Data Visualizations
Text/Ref: R1: Ch 2; T2: Ch 3; R2: Ch 2; R1: Ch 3
Module 3: Descriptive Modeling
Session 6 (Hours 11-12)
• Clustering
  o Applications
  o Data prep for clustering
  o K-means algorithm
  o Hierarchical clustering algorithm
  o Standard cluster model interpretation
Text/Ref: T2: Ch 6, 7; T1: Ch 6
Session 7 (Hours 13-14)
• Association Rules
  o Terminology
  o Parameter settings
  o Item set and candidate rule generation
  o Apriori algorithm
  o Measures of interesting rules
  o Problems with association rules
  o Collaborative filtering
Text/Ref: T2: Ch 5; R1: Ch 6
Session 8 (Hours 15-16)
• Principal Component Analysis
• Interpreting Descriptive models
• Mid-semester course review
Text/Ref: R4: Ch 9; T2: Ch 6; T2: Ch 7
Module 4: Predictive Modeling
Session 9 (Hours 17-18)
• Linear Regression
  o Simple linear regression
  o Model diagnostics
• Multiple Linear Regression
  o Categorical encoding
  o Multicollinearity and VIF
  o Residual analysis
Text/Ref: T2: Ch 8; R4: Ch 4
Session 10 (Hours 19-20)
• Logistic Regression
  o Classification overview
  o Binary classification
  o Gain chart and lift chart
  o Interpreting logistic regression models
  o Practical considerations
Text/Ref: T1: Ch 4; R4: Ch 5; T2: Ch 8
Session 11 (Hours 21-22)
• K-Nearest Neighbor
  o k-NN learning algorithm
  o Distance metrics for k-NN
  o Practical considerations
• Naïve Bayes
  o Bayes' theorem
  o The Naïve Bayes classifier
  o Interpreting the Naïve Bayes classifier
  o Practical considerations
Text/Ref: T2: Ch 8; R4: Ch 6; R1: Ch 5
Session 12 (Hours 23-24)
• Decision Tree
  o Decision tree landscape
  o Building decision trees
  o Decision tree splitting metrics
  o Decision tree knobs and options
  o Practical considerations
Text/Ref: T2: Ch 8; R1: Ch 4
Session 13 (Hours 25-26)
• Support Vector Machines
  o Maximum margin hyperplanes
  o Linear SVM
• Neural Networks
  o Building blocks
  o Network training
  o Neural network settings, pruning
  o Interpreting decision boundaries
  o Practical considerations
Text/Ref: T1: Ch 4; R1: Ch 5; T2: Ch 8
Session 14 (Hours 27-28)
• Model Ensembles
  o Motivation for ensembles
  o Bagging
  o Boosting
  o Random forests
  o Interpreting model ensembles
• Assessing Predictive Models
  o Generalization
  o Model overfitting
  o Batch approach to model assessment
  o Methods for comparing classifiers
Text/Ref: T2: Ch 10; R1: Ch 4; T2: Ch 9; T1: Ch 4, 5
Module 5: Post-processing
Sessions 15-16 (Hours 29-32)
• General deployment considerations
  o Deployment steps
• The Narrative
  o Report structure
  o Presentation structure
• Building narrative with Data
• Effective storytelling with Data
• Course recap
Text/Ref: T2: Ch 12; Classroom discussion; AR
# The above contact hours and topics can be adapted for non-specific and specific WILP
programs depending on the requirements and class interests.
Lab Details
Title:
Access URL:
Lab Setup Instructions:
Lab Capsules:
Additional References:
Select Topics and Case Studies from Business for Experiential Learning
Select Topics in Syllabus for Experiential Learning (access URL or resource in parentheses):
1. Descriptive Analytics – Exploring structured data (R4: Ch 2)
2. Clustering Techniques – Grouping data based on similarity (R4: Ch 7)
3. Recommendation Techniques – Providing suggestions (R4: Ch 9)
4. Linear Regression Techniques – Predicting numeric values (R4: Ch 4)
5. Classification Problems – Assigning class labels (R4: Ch 5)
6. Data Science with Cloud-based Services (AWS docs)
Evaluation Scheme
Legend: EC = Evaluation Component
EC1: Experiential Learning Assignment 1 and Experiential Learning Assignment 2. Type: Take
Home-Online; Weight: 30%; Day, Date, Session, Time: To be announced
EC2: Mid-Semester Exam. Type: Closed Book; Duration: 2 hours; Weight: 30%; Day, Date,
Session, Time: Sunday, 22/09/2024 (FN)
EC3: Comprehensive Exam. Type: Open Book; Duration: 2½ hours; Weight: 40%; Day, Date,
Session, Time: Sunday, 01/12/2024 (FN)
Important Information
Syllabus for Mid-Semester Test (Closed Book): Topics in Weeks 1-8
Syllabus for Comprehensive Exam (Open Book): All topics given in plan of study
Evaluation Guidelines:
1. EC-1 consists of two Assignments. Announcements regarding the same will be made in a
timely manner.
2. For Closed Book tests: No books or reference material of any kind will be permitted.
Laptops/Mobiles of any kind are not allowed. Exchange of any material is not allowed.
3. For Open Book exams: Use of prescribed and reference textbooks, in original (not
photocopies), is permitted. Class notes/slides as reference material in filed or bound form are
permitted. However, loose sheets of paper will not be allowed. Use of calculators is
permitted in all exams. Laptops/Mobiles of any kind are not allowed. Exchange of any
material is not allowed.
4. If a student is unable to appear for the Regular Test/Exam due to genuine exigencies, the
student should follow the procedure to apply for the Make-Up Test/Exam. The
genuineness of the reason for absence in the Regular Exam shall be assessed prior to
giving permission to appear for the Make-up Exam. Make-Up Test/Exam will be
conducted only at selected exam centers on the dates to be announced later.
It shall be the responsibility of the individual student to be regular in maintaining the self-study
schedule as given in the course handout, attend the lectures, and take all the prescribed evaluation
components such as Assignment/Quiz, Mid-Semester Test and Comprehensive Exam according to
the evaluation scheme provided in the handout.