BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI
WORK INTEGRATED LEARNING PROGRAMMES
COURSE HANDOUT

Part A: Content Design

Course Title: Foundations of Data Science
Course No(s): MBA ZG536 / PDBA ZG536
Credit Units: 4
Course Author: ARINDAM ROY, PRAVIN MHASKE
Version No: 1.0
Date: 1 June 2020

Course Description
Introduction, Role of a Data Scientist, Statistics vs. Data Science, Fundamentals of Data Science, Data Science process and life cycle, Exploratory Data Analysis, Data Engineering and shaping, Overview of Data Science Techniques and Models, Introduction to Regression, Classification, Shrinkage, Dimension Reduction, Tree-based models, Support Vector Machines, Unsupervised learning, Choosing and evaluating models, Featurization, Overview of Neural Networks, Data mining and pattern recognition techniques, Documentation, Deployment, and Presentation of insights.

Course Objectives
CO1  Get introduced to the field of Data Science and the roles, process and challenges involved in it
CO2  Explore and experience the steps involved in data preparation and exploratory data analysis
CO3  Learn to select and apply the proper analytics technique for various scenarios, assess model performance, and interpret the results of predictive models
CO4  Gain familiarity with the general deployment considerations for predictive models
CO5  Appreciate the importance of techniques such as data visualization and storytelling with data for effectively presenting outcomes to stakeholders

Text Book(s)
T1  Data Science for Business, Foster Provost & Tom Fawcett, O'REILLY
T2  Applied Predictive Analytics, Dean Abbott, WILEY

Reference Book(s) & other resources
R1  Introduction to Data Mining, Tan, Steinbach and Vipin Kumar, PEARSON
R2  Machine Learning using Python, Manaranjan Pradhan & U Dinesh Kumar, WILEY

Content Structure
M1  Data Science Foundations
  o Applications of Data Science
  o Role and responsibilities of Data Scientists
  o Comparing Data Science with other domains
  o Challenges in the field of Data Science
  o Data Science Process
  o Data Scientists Toolbox
M2  Data Prep and Exploratory Data Analysis
  o Types of data and data sets
  o Data Quality
  o Data Preprocessing
  o Feature Creation
  o Dimension Reduction
  o Feature Selection
  o Measures of Similarity and Dissimilarity
  o Descriptive Analysis
  o Data Visualizations
M3  Descriptive Modeling
  o Clustering
  o Association Rules
  o Principal Component Analysis
  o Interpreting Descriptive models
M4  Predictive Modeling
  o Linear Regression
  o Logistic Regression
  o K-nearest neighbor
  o Decision Tree
  o Naïve Bayes
  o Support Vector Machines
  o Neural Networks
  o Model Ensembles
  o Assessing Predictive models
M5  Post-processing
  o General deployment considerations
  o The Narrative: report / presentation structure
  o Building narrative with Data
  o Effective storytelling

Learning Outcomes
LO1  Applications of Data Science and the process of the Data Science project life cycle
LO2  Techniques and tools effective in addressing the data preprocessing and exploratory data analysis stages
LO3  Applications of descriptive and predictive data analytics techniques
LO4  Hands-on experience in model building, evaluation and interpretation of results
LO5  Knowledge of the post-processing involved in a Data Science project, including deployment considerations and the importance of effective storytelling

Part B: Contact Session Plan

Academic Term: First Semester 2024-2025
Course Title: Foundations of Data Science
Course No: MBA ZG536 / PDBA ZG536
Lead Instructor: ARINDAM ROY, PRAVIN MHASKE

Course Contents
Contact Session (#), Contact Hours (#), Topic Titles (from the content structure in the Course Handout), Text/Ref Book/external resource

Module 1: Data Science Foundations
Session 1 (Hours 1-2)
  Applications of Data Science
  Role and responsibilities of Data Scientists
  Comparing Data Science with other domains
  Challenges in the field of Data Science
  Text/Ref: T2: Ch 1; T1: Ch 1, 2; R4: Ch 1; Additional Reading (AR); Classroom discussion
Session 2 (Hours 3-4)
  Data Science Process
  Data Scientists Toolbox
  Text/Ref: T1: Ch 1; T2: Ch 2; Classroom discussion

Module 2: Data Prep and Exploratory Data Analysis
Session 3 (Hours 5-6)
  Types of data and data sets
  Data Quality
  Text/Ref: R1: Ch 2
Session 4 (Hours 7-8)
  Data Preprocessing
  Feature Creation
  Dimension Reduction
  Feature Selection
  Text/Ref: T2: Ch 4; R1: Appendix; T1: Ch 2; AR
Session 5 (Hours 9-10)
  Measures of Similarity and Dissimilarity
  Descriptive Analytics
  Data Visualizations
  Text/Ref: R1: Ch 2; T2: Ch 3; R2: Ch 2; R1: Ch 3

Module 3: Descriptive Modeling
Session 6 (Hours 11-12)
  Clustering
    o Applications
    o Data prep for clustering
    o K-means algorithm
    o Hierarchical clustering algorithm
    o Standard cluster model interpretation
  Text/Ref: T2: Ch 6, 7; T1: Ch 6
Session 7 (Hours 13-14)
  Association Rules
    o Terminology
    o Parameter settings
    o Item set and candidate rules generation
    o Apriori algorithm
    o Measures of interesting rules
    o Problems with association rules
    o Collaborative filtering
  Text/Ref: T2: Ch 5; R1: Ch 6; R4: Ch 9
Session 8 (Hours 15-16)
  Principal Component Analysis
  Interpreting Descriptive models
  Mid-semester course review
  Text/Ref: T2: Ch 6; T2: Ch 7

Module 4: Predictive Modeling
Session 9 (Hours 17-18)
  Linear Regression
    o Simple linear regression
    o Model diagnostics
  Multiple Linear Regression
    o Categorical encoding
    o Multi-collinearity and VIF
    o Residual analysis
  Text/Ref: T2: Ch 8; R4: Ch 4
Session 10 (Hours 19-20)
  Logistic Regression
    o Classification overview
    o Binary classification
    o Gain chart and lift chart
    o Interpreting logistic regression models
    o Practical considerations
  Text/Ref: T1: Ch 4; R4: Ch 5; T2: Ch 8
Session 11 (Hours 21-22)
  K-Nearest Neighbor
    o k-NN learning algorithm
    o Distance metrics for k-NN
    o Practical considerations
  Naïve Bayes
    o Bayes theorem
    o The Naïve Bayes classifier
    o Interpreting the Naïve Bayes classifier
    o Practical considerations
  Text/Ref: T2: Ch 8; R4: Ch 6; R1: Ch 5
Session 12 (Hours 23-24)
  Decision Tree
    o Decision tree landscape
    o Building decision trees
    o Decision tree splitting metrics
    o Decision tree knobs and options
    o Practical considerations
  Text/Ref: T2: Ch 8; R1: Ch 4
Session 13 (Hours 25-26)
  Support Vector Machines
    o Maximum margin hyperplanes
    o Linear SVM
  Neural Networks
    o Building blocks
    o Network training
    o Neural network settings, pruning
    o Interpreting decision boundaries
    o Practical considerations
  Text/Ref: T1: Ch 4; R1: Ch 5; T2: Ch 8
Session 14 (Hours 27-28)
  Model Ensembles
    o Motivation for ensembles
    o Bagging
    o Boosting
    o Random forests
    o Interpreting model ensembles
  Assessing Predictive Models
    o Generalization
    o Model overfitting
    o Batch approach to model assessment
    o Methods for comparing classifiers
  Text/Ref: T2: Ch 10; R1: Ch 4; T2: Ch 9; T1: Ch 4, 5

Module 5: Post-processing
Session 15 (Hours 29-30)
  General deployment considerations
    o Deployment steps
  The Narrative
    o Report structure
    o Presentation structure
  Text/Ref: T2: Ch 12
Session 16 (Hours 31-32)
  Building narrative with Data
  Effective storytelling with Data
  Course recap
  Text/Ref: Classroom discussion; AR

# The above contact hours and topics can be adapted for non-specific and specific WILP programs depending on the requirements and class interests.

Lab Details (Title: Access URL)
Lab Setup Instructions:
Lab Capsules:
Additional References:

Select Topics and Case Studies from business for experiential learning
No.  Select Topics in Syllabus for experiential learning          Access URL
1.   Descriptive Analytics – Exploring the structured data        R4: Ch 2
2.   Clustering Techniques – Grouping the data based on similarity R4: Ch 7
3.   Recommendation Techniques – Providing the suggestions        R4: Ch 9
4.   Linear Regression Techniques – Predicting the numeric value  R4: Ch 4
5.   Classification Problems – Providing the class labels         R4: Ch 5
6.   Data Science with Cloud based services                       AWS docs

Evaluation Scheme
Legend: EC = Evaluation Component
No.  Name                                       Type               Duration    Weight  Day, Date, Session, Time
EC1  Experiential Learning Assignments 1 and 2  Take Home-Online   -           30%     To be announced
EC2  Mid-Semester Exam                          Closed Book        2 hours     30%     Sunday, 22/09/2024 (FN)
EC3  Comprehensive Exam                         Open Book          2 ½ hours   40%     Sunday, 01/12/2024 (FN)

Important Information
Syllabus for Mid-Semester Test (Closed Book): Topics in Weeks 1-8
Syllabus for Comprehensive Exam (Open Book): All topics given in the plan of study

Evaluation Guidelines:
1. EC-1 consists of two assignments. Announcements regarding them will be made in a timely manner.
2. For Closed Book tests: No books or reference material of any kind will be permitted. Laptops/mobiles of any kind are not allowed. Exchange of any material is not allowed.
3. For Open Book exams: Use of prescribed and reference textbooks in original (not photocopies) is permitted. Class notes/slides as reference material in filed or bound form are permitted. However, loose sheets of paper will not be allowed. Use of calculators is permitted in all exams. Laptops/mobiles of any kind are not allowed. Exchange of any material is not allowed.
4. If a student is unable to appear for the Regular Test/Exam due to genuine exigencies, the student should follow the procedure to apply for the Make-Up Test/Exam. The genuineness of the reason for absence in the Regular Exam shall be assessed before permission is given to appear for the Make-Up Exam. The Make-Up Test/Exam will be conducted only at selected exam centers on dates to be announced later.

It shall be the responsibility of each student to be regular in maintaining the self-study schedule given in the course handout, attend the lectures, and take all the prescribed evaluation components, such as Assignment/Quiz, Mid-Semester Test and Comprehensive Exam, according to the evaluation scheme provided in the handout.