NATIONAL UNIVERSITY OF SINGAPORE NUS Business School Department of Analytics & Operations DAO2702/DSC2008 Programming for Business Analytics Session: Semester 2, 2021/2022 Instructor: Peng Xiong bizxio@nus.edu.sg Tutors: Peng Zheng peng@nus.edu.sg; Bin Huang huang.bin@nus.edu.sg Description: This module is an introductory course to business analytics and data science. It covers basic Python programming and preliminary statistics, with a great emphasis on addressing practical business problems and real datasets. Data science is an interdisciplinary field that requires business insights and expertise, proficiency in programming, as well as a strong background in mathematics and statistics. Therefore, lectures and tutorials in this semester would focus on trainings in the following perspectives: • Python programming and Pythonic coding styles • Analytical and visualization packages • Math and statistics • Practical business insights and problem solving skills Syllabus: 1. Basics of Python programming 1. Data structures and flow control 2. Functions and packages 2. Data analysis with Python 1. Analytical tools: NumPy, SciPy, Pandas 2. Data visualization: Matplotlib 3. Data collection and cleaning 3. Statistical inference 1. Sampling and inference 2. Confidence intervals 3. Hypothesis testing 4. Linear regression 1. Model assumptions and interpretations 2. Categorical variables and modelling nonlinearity 3. Package Statsmodels for regression analysis Software: Anaconda: installation E-learning Mode: • Lectures will be given as offline recorded videos. • The lecture time is used for discussing exercise questions and online consultation. Each session will be recorded and uploaded to LumiNUS multimedia channels. • Tutorial sessions will be face-to-face and broadcasted via Zoom. • Limited face-to-face meetings can be arranged. Class Materials: • Offline lecture video uploaded in LumiNUS multimedia channels • Jupyter Notebook files as lecture notes • • • • Jupyter Notebook files as tutorial case studies Jupyter Notebook files as exercises Slides as supplementary The folder "Advanced topics" provides supplementary reading materials. They are not tested but may be helpful for your project. Materials for each week can be located via LumiNUS->Module Overview. Reference Books: Python programming: • Python data science handbook, by Jake VanderPlas Statistics and data analysis: • Introductory econometrics, by Jeffrey M. Wooldridge • Storytelling with data, by Cole Nussbaumer Knaflic Online Resources: • Python tutor: https://pythontutor.com/ • Programming for Business Analytics: https://share.streamlit.io/xiongpengnus/learn_dao/main/web.py or https://nus-biz-dao.herokuapp.com/ • Storytelling with data: https://www.storytellingwithdata.com/blog Workload: • Students are required to review lectures covered in each week • Students are supposed to work on exercises for each week. The solutions will be uploaded in the next week, and there is no need to submit the assignment. • All tutorial case studies are related to topics covered in previous lectures. Week π΅ − π Upload notes, tutorial, and exercise for Lecture π΅ Week π΅ Discuss exercise solution for Lecture π΅ Week π΅ + π Discuss tutorial solution for Lecture π΅ + π Assessments: Continuous Assessment: Class Participation 10% • Answering questions posted by the instructor on the forum. • Posting questions and answering questions post by other students will also be considered • The class participation will be given according to 1) the quality of the posts; 2) the time of the posts. • A question may be closed if it is answered by many students. Group Project 20% for report and 15% for presentation • Team work. Each team has five to six members. • You may choose your own teammates (in the same tutorial session) by filling a survey. All team members need to fill the survey. If you fail to fill the survey before the deadline, we will randomly assign you to a team. The random team may have fewer members. • If you ask for changing to another team after the deadline without valid reasons, then there will be a 20% penalty on your report. • An eight-page (4 pieces of double-sided paper) report and a formal 10 to 15-minute presentation. • In the video, please provide the name of the current presenter, otherwise there will be a 20% penalty on your presentation. • All files should be zipped into a zip file with the name: Project_<tutorial session name>_<team name>. Please make sure that you follow the name convention, otherwise there will be a 20% penalty. • Your grade of the project would also be affect by the peer evaluation of your teammates. Zero mark for zero contribution! • Zero mark for plagiarism. • More details can be found from Appendix B. Final Examination: 55% • Paper-based close-book examination. • A number of multiple-choice questions and a written question. • You are allowed to take a double-sided A4 cheat-sheet and a calculator with you. Other notes or electronic devices are prohibited. • All topics covered in lectures/exercises/tutorials could be tested. Schedule: Week 1. Course Overview Introduction to Programming and Business Analytics Week 2. Python Basics I Week 3. Python Basics II Week 4. Built-in compound data types Week 5. Functions, modules, and packages Week 6. Data arrays and data visualization Week 7. Basics of Pandas Week 8. Facts from data Week 9. Confidence intervals and hypothesis testing Week 10. Introduction to regression analysis Week 11. Regression analysis for explanatory modeling Week 12. Nonlinearity and categorical variables Week 13. Review FAQ: Lecture and tutorial registrations What if I want to change lecture sessions or have difficulties in securing a tutorial slot? Please contact the BBA office or ModReg as soon as possible. The instructor and tutors do not have the access to the module registration system, so it does not help to email them. Class Participation How are class participation marks calculated? The majority of class participation marks come from posting answers to the "Questions given by instructors" on the forum. A minor part will come from the "DAO2702 StackOverflow" forum, where students are able to post their own questions and answers there. The questions given by instructors are typically open-ended questions, so every student should have a chance to contribute to the forum discussion. The marks will be given according to the ranking of your contributions on the forum, and we mainly consider 1) the quality of your posts; and 2) the time of your posts (e.g. a post made after two months is not considered as valuable as the one made within the first week.) Will the attendance of tutorials be accounted for class participation? What if I am unable to attend one of my tutorial sessionsοΌ You are encouraged to attend the F2F tutorial sessions, but the attendance is not mandatory and will not affect your class participation grades. The tutor may ask you to mark your attendance, but it is just for Covid-19 safety measures. All tutorial sessions provide Zoom links for students to attend from distance, so if you have difficulties in attending one of your F2F sessions, you could always attend any other sessions via Zoom at your convenience. There is no need to inform your tutor. Project Are the cover page, table of contents, reference, and appendices counted in the 8-page limit of the report? The cover page, table of contents, reference, and appendices are not counted in the 8-page limit. Do not place your key findings and important diagrams in the appendices. Is it allowed to place data visualizations in appendices? Please make sure that the graders can read your report without referring to the appendices. Are we supposed to include code in the project report or presentation? Code is submitted in a separate file. Please avoid showing code in your project report or presentation. Are there any requirements for the font, line spacing and margin of the report? There is no specific requirement on the font, spacing, margin of the report. You could use any page layouts as long as the report is clear and neat for reading (as A4-size). What if my presentation recording file is too large to be uploaded to LumiNUS? You could upload the recordings to a cloud storage drive and enclose the link of the drive in the submitted zip file. Is the peer evaluation survey anonymous? Is it compulsory? What shall I do if one of my teammates is not contributing to the project? The peer evaluation survey is totally anonymous. Only the instructor and tutors can see your responses. The peer evaluation survey is entirely optional, and there is no need for the whole team to fill the survey. You can totally ignore it if you have nothing to report. If your teammates are not contributing, it is better to communicate with them first and solve the issue internally. If the issue cannot be settled internally and they still refuse to contribute, please keep all evidence, and file a report in the peer evaluation survey. Your report will be taken into consideration in grading the project. Final Exam What are the topics to be tested in the final exam? All topics covered in lectures/tutorials/exercises could be tested. These topics include 1) basics of Python programming (Lecture 2 to Lecture 6); 2) data arrays, frames, and visualization (Lecture 7 to Lecture 9); and 3) statistics and regression analysis (Lecture 10 to Lecture 13). Only the advanced topics will not be tested. Where can I find the time and venue for the final exam? Please visit the EduRec: https://myedurec.nus.edu.sg/psp/cs90prd/?cmd=login What are needed in the paper-based exam? 1) The double-sided A4 size cheat sheet 2) A scientific calculator (a graphic /programmable calculator is allowed, but it will not give you special advantages in the exam) 3) A pencil for filling the bubble card. All MCQs will be graded based on the submitted bubble cards. Except for the calculator, no electronic device, including the laptop, is allowed in the paper-based exam. Appendix A: Important dates Dates Consequences of missing the dates Deadline of the general survey 14/01/2022 at 11:59 PM None Deadline of the project group survey 18/02/2022 at 11:59 PM You will be randomly assigned to a team. Deadline of submitting the project 15/04/2022 at 11:59 PM 20% penalty on late submission. The forum of instructors' questions expire 15/04/2022 at 11:59 PM No posts can be made on the forum after this date. The forum of students' questions expire 22/04/2022 at 11:59 PM No posts can be made on the forum after this date. Deadline of the peer evaluation survey 03/05/2022 at 11:59 PM Peer evaluation cannot be submitted after this date Appendix B: Project Guidelines Submission Each team must submit a zip file containing: • A formal eight-page (4 pieces of double-sided paper) report saved as a PDF file (no code should appear in the formal report). • Your source code saved as HTML files. • A video presentation (about 10~15 minutes) where each team member participates in the presentation. It can be recorded by ordinary smart phones, as long as it is able to show the presenters and the slides clearly. The zip file name must follow the format: Project_<tutorial session name>_<team name>. Please make sure that you follow the name convention, otherwise there will be a 20% penalty. Any files submitted after the deadline are considered late submission. There will be a 20% penalty for late submission within 24 hours after the deadline. Submission later for more than 24 hours will be considered no submission and receives zero mark. Scope of the Project The team project requires students to solve ONE practical business problem by exploiting real datasets. Possible sources of real datasets: 1) Data.world; 2) Kaggle; 3) Data.gov.sg. You can also download datasets from other websites, or even conduct your own survey to collect data. A few examples are given below: 1) Make property investment decisions based on property price data or Airbnb data; 2) Use a dataset of movies/video games made in recent years to decide what kind of movies/games to make in the future; 3) Use a dataset of employees in a company to investigate if there is discrimination in the company. 4) Based on the demand data of a perishable product, use Monte Carlo simulation to find the optimal production quantity. You are also allowed to work on other directions rather than just analysing real datasets. For example, you could create a dashboard for data visualization, or to develop your own Python packages. For such project topics, please contact the instructor first for his permission. Detailed Guidelines and Grading Criteria for Project Report The main focus of the project should be on data visualization, but you are allowed to use whatever programming/analytics techniques you know (not restricted to this module). Please do not write your project report as an assignment with separated tasks. You need to have a good flow for each part of your report, and they are all used in solving ONE problem. The project report of the whole team would be evaluated following the guidelines below: Components Problem statements (25%) Suggestions • A Clear and concise description of only one business problem • Emphasize how the dataset would help solve the problem. • Justify any assumptions/approximation/missing data. • Novelty and creativity in the selected problem/dataset is highly valued. • Data visualization is the key component of this project Data visualization (40%) • Use proper graphs to present the information of the dataset • Clear presentation and accurate description of your graphs • Do not create irrelevant graphs; good data visualization should well support your analysis and be efficient in delivering data information. Analysis and Discussion of the results (25%) • Derive relevant information form data visualization and use such information to solve the business problem. • You may use statistical models (KNN, linear regression, or any other models you know) covered in the second half of the semester, but this is not compulsory. • Finalize your report with necessary discussion/conclusion/recommendations. • Possible discussion on the influence of missing data and model limitations; give suggestions on solutions or future research. • Well-structured flow and clear logic. Writing (10%) • Better to keep a list of reference for all literatures/external sources you refer to. • Good English and free of error/typos Detailed Guidelines for Project Presentation For the presentation, • Every team member needs to participate. • You may simply use your cell phone to record the video. • Due to the current situation, you are allowed to make separate video clip at home and combine them as a presentation. • The audience should be able to see your slides and your face. The preferred formats are shown as the following pictures. • No need to dress professionally • Make sure that the presenter's name is displayed while he/she is talking, otherwise there will be a 20% penalty on the presenter. • Use proper data visualization for your slides • Try to focus more on the business problem itself and make it as friendly as possible for audience with a limited programming/math/statistics background. • Presentation is graded individually, according to each student's presentation skills. • DO NOT read a script! Students will be penalized if they are reading the scripts on their screen. Reference This book may help to work on your project report/presentation: Storytelling with Data Some videos from the same author: https://www.youtube.com/watch?v=8EMW7io4rSI https://www.youtube.com/channel/UCjhGlILWNloXJdR2NTCBMlA/videos?view=0&sort=da&flow=grid Appendix C: ACADEMIC HONESTY & PLAGIARISM Academic integrity and honesty is essential for the pursuit and acquisition of knowledge. The University and School expect every student to uphold academic integrity & honesty at all times. Academic dishonesty is any misrepresentation with the intent to deceive, or failure to acknowledge the source, or falsification of information, or inaccuracy of statements, or cheating at examinations/tests, or inappropriate use of resources. Plagiarism is ‘the practice of taking someone else's work or ideas and passing them off as one's own' (The New Oxford Dictionary of English). The University and School will not condone plagiarism. Students should adopt this rule - You have the obligation to make clear to the assessor which is your own work, and which is the work of others. Otherwise, your assessor is entitled to assume that everything being presented for assessment is being presented as entirely your own work. This is a minimum standard. In case of any doubts, you should consult your instructor. Additional guidance is available at: http://www.nus.edu.sg/registrar/adminpolicy/acceptance.html#NUSCodeofStudentConduct Online Module on Plagiarism: http://emodule.nus.edu.sg/ac/