DSO 559 Introduction to Python for Business Analytics

Richard W. Selby
Adjunct Professor
University of Southern California
rselby@marshall.usc.edu

© Copyright 2008-2023. Richard W. Selby. All rights reserved.

Introduction to Python for Business Analytics: Course Introduction

“What is this course about?”

Formal course definition (from Course Catalog)
- Python programming for descriptive data analytics and technical tools for business applications.
- Solving business problems and formulating actionable business recommendations, including their limitations.

Perspective
- … as viewed by someone who has spent his career in both the “business side” and “technology side” of management and development of technology-centric products in Fortune 100, midsize, and entrepreneurial enterprises ...

Introduction to Python for Business Analytics: Topics

15 class sessions
- Sessions #1 and #2: Introduction to Python
- Sessions #3 and #4: Foundational Python features
- Sessions #5 and #6: Functions, parameters, packages, reuse
- Sessions #7 and #8: Data visualization
- Sessions #9 and #10: Predictive modeling
- Sessions #11 and #12: Neural networks
- Sessions #13 and #14: Python integration with other programming languages, data pipelines, automation
- Session #15: Future directions

Introduction to Python for Business Analytics: Grading

Grading
- The goal is to enable everyone to be successful in this course
- Marshall policy defines a target average grade of 3.5 in graduate elective courses
- Therefore, if some grades are above 3.5, then some grades must be below 3.5
- Example distribution that has a 3.5 average: 25% A, 25% A-, 25% B+, 25% B
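
As a quick check of the example distribution above, the average can be computed as a weighted sum. This is a minimal sketch: the grade-point values (A = 4.0, A- = 3.7, B+ = 3.3, B = 3.0) are an assumption based on a common 4.0 scale and are not stated in the slides, and the function name `average_grade` is hypothetical.

```python
import math

# Assumed grade-point values on a 4.0 scale; the slides do not list them.
GRADE_POINTS = {"A": 4.0, "A-": 3.7, "B+": 3.3, "B": 3.0}

# Share of the class receiving each grade, taken from the example distribution.
distribution = {"A": 0.25, "A-": 0.25, "B+": 0.25, "B": 0.25}


def average_grade(distribution, grade_points=GRADE_POINTS):
    """Return the weighted average of grade points; shares must sum to 1."""
    if not math.isclose(sum(distribution.values()), 1.0):
        raise ValueError("shares must sum to 1")
    return sum(share * grade_points[grade] for grade, share in distribution.items())


print(round(average_grade(distribution), 2))  # prints 3.5
```

Under these assumed values, shifting any share toward A or A- has to be offset by a shift toward B+ or B to keep the average at 3.5.
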
Basis for grading:
- Projects: 30% (5 projects each worth 6%), due in Weeks 3, 6, 9, 12, 15
- Mid-term exam: 20%, due in Week 8
- Final exam: 40% (cumulative), due during Finals Week
- Class participation: 10%
- Total: 100%

Python

Python
- Python is an interpreted, high-level, general-purpose programming language
- Python is both very powerful and easy to use, and it is a very effective programming language for solving problems
- Python is open source software (free software)
- Python runs on Windows, Mac OS X, Linux/UNIX, etc.
- Python can be run locally on a laptop or in the cloud using a web browser interface

Jupyter notebook
- Jupyter notebook is an interactive shell interpreter that enables you to create and execute Python programs and related commands incrementally
- Jupyter notebook is open source software (free software)
- Jupyter notebook is a web application that runs in a browser

Python

Python resources
- https://www.python.org/
- https://en.wikipedia.org/wiki/Python_(programming_language)
- https://www.w3schools.com/python/default.asp
- http://nbviewer.jupyter.org/github/phelps-sg/pythonbigdata/blob/master/src/main/ipynb/intro-python.ipynb
- Google “python tutorial”
- Note: We will be using Python 3 (not Python 2)

Jupyter notebook resources
- http://jupyter.readthedocs.io/en/latest/install.html
- https://jupyter.org/try
- Google “jupyter notebook tutorial”

Business Analytics

Business analytics definition
- Business analytics is defined as the study, integration, and application of knowledge, skills, and methods for using data, statistical analysis, quantitative approaches, and predictive modeling to enable data-driven decision making and innovation in organizations

Business Analytics Methodology Framework

Overall business analytics methodology framework for successfully implementing analytics-driven management and rapidly creating value

Four major methodology elements:
- Goal definition
- Data collection
- Data analysis and modeling
- Interpretation, action, and feedback

DSO 559 Introduction to Python for Business Analytics: Project Checklist

Business Analytics Focuses on Success in Data-Driven Leadership: Project Checklist (1 of 9)

Goal Definition
- Phases 1,2,3,4,5: Describe business goal
- Phases 1,2,3,4,5: Describe dependent variables
- Phases 1,2,3,4,5: Describe independent variables

Emphasis: Phase 1 techniques focus on goal definition and variable definition

Business Analytics Focuses on Success in Data-Driven Leadership: Project Checklist (2 of 9)

Data Collection
- Phases 1,2,3,4,5: Define data sources including actual links to websites. Include both numeric and categorical dependent variables and both numeric and categorical independent variables
- Phases 2,3,4,5: Organize data files into columns and rows. Columns contain both the dependent variables and the independent variables, and rows contain the observations (also called “data points”)
  Tools: Python, Excel
- Phases 2,3,4,5: Define the number of observations (also called “rows” or “data points”)
  Tools: Python, Excel
- Phases 2,3,4,5: Define the number of dependent variables, calculated dependent variables, independent variables, and calculated independent variables
  Tools: Python, Excel

Emphasis: Phase 2 techniques focus on data organization and visualization
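
The data organization items above (columns as variables, rows as observations, counts of each) can be sketched with Python's standard library alone. The dataset, the column names, and the assignment of variables to dependent and independent roles are all hypothetical and chosen only for illustration; libraries such as pandas are commonly used for the same task.

```python
import csv
import io

# Hypothetical dataset: one row per observation (a store), one column per variable.
RAW = """store_id,region,ad_spend,weekly_sales,hit_target
1,West,1200,15300,yes
2,East,800,9900,no
3,West,1500,17100,yes
4,South,600,8700,no
"""

# Hypothetical variable roles, chosen only for this illustration.
DEPENDENT = ["weekly_sales", "hit_target"]  # one numeric, one categorical
INDEPENDENT = ["region", "ad_spend"]        # one categorical, one numeric


def is_numeric(values):
    """Return True if every value in the column parses as a number."""
    try:
        for value in values:
            float(value)
        return True
    except ValueError:
        return False


def summarize(text):
    """Count observations and classify each column as numeric or categorical."""
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = list(rows[0].keys())
    kinds = {
        column: "numeric" if is_numeric([row[column] for row in rows]) else "categorical"
        for column in columns
    }
    return {"observations": len(rows), "columns": columns, "kinds": kinds}


summary = summarize(RAW)
print("Observations:", summary["observations"])
for name in DEPENDENT + INDEPENDENT:
    role = "dependent" if name in DEPENDENT else "independent"
    print(f"{name}: {role}, {summary['kinds'][name]}")
```
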

Business Analytics Focuses on Success in Data-Driven Leadership: Project Checklist (3 of 9)

Data Collection (continued)
- Phases 2,3,4,5: Manipulate data files using SQL commands
  Tools: Python, SQL
- Phases 2,3,4,5: Define the number of files that contain the data, describe the joins between the data files, and describe the type of joins (inner join, left outer join, right outer join, or full outer join)
  Tools: Python, SQL
- Phases 2,3,4,5: In the joined datasets (also called “wide datasets” or “wide tables”) that combine your data files, define the number of dependent variables, independent variables, and observations (also called “rows” or “data points”)
  Tools: Python, SQL
- Phases 2,3,4,5: Describe how any missing data are addressed and how any sampling or filtering of observations to select subsets of data is performed
  Tools: Python, Excel
- Phase 5: Integrate automated data collection capabilities that track usage into a website dashboard
  Tools: Google Analytics

Business Analytics Focuses on Success in Data-Driven Leadership: Project Checklist (4 of 9)

Data Analysis and Modeling
- Phases 2,3,4,5: Display visualizations for the dependent variables and their relationships with the independent variables, such as bar charts, histograms, pie charts, scatter plots, box plots, etc.
  Tools: Python
- Phases 2,3,4,5: Display descriptive statistics for the dependent variables, such as means, standard deviations, medians, minimums, maximums, etc.
  Tools: Python
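
The join, missing-data, and descriptive-statistics items above can be sketched together using Python's built-in `sqlite3` and `statistics` modules. The two tables, their columns, and their values are hypothetical; dropping incomplete observations is only one possible way to address missing data and is an assumption of this sketch.

```python
import sqlite3
import statistics

# Two hypothetical "data files" loaded as tables in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stores (store_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE sales  (store_id INTEGER, weekly_sales REAL);
    INSERT INTO stores VALUES (1, 'West'), (2, 'East'), (3, 'West'), (4, 'South');
    INSERT INTO sales  VALUES (1, 15300), (2, 9900), (3, 17100);
""")

# Left outer join: keep every store, even one with no matching sales row.
wide = conn.execute("""
    SELECT s.store_id, s.region, x.weekly_sales
    FROM stores AS s
    LEFT OUTER JOIN sales AS x ON x.store_id = s.store_id
    ORDER BY s.store_id
""").fetchall()
conn.close()

# Missing data: store 4 has no sales row, so weekly_sales is None (SQL NULL).
# Here the incomplete observation is dropped; imputation is another option.
complete = [row for row in wide if row[2] is not None]
sales = [row[2] for row in complete]

print("Observations after join:", len(wide))
print("Observations after dropping missing:", len(complete))
print("Mean:", statistics.mean(sales))
print("Median:", statistics.median(sales))
print("Standard deviation:", round(statistics.stdev(sales), 1))
print("Minimum / maximum:", min(sales), max(sales))
```

Replacing `LEFT OUTER JOIN` with `INNER JOIN` would return only the three stores that have sales, which changes the observation count reported for the wide table.
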

Business Analytics Focuses on Success in Data-Driven Leadership: Project Checklist (5 of 9)

Data Analysis and Modeling (continued)
- Phases 3,4,5: Display correlations of the dependent variables and the independent variables
  Tools: Python
- Phases 3,4,5: Display results from multivariate linear regression models of the dependent variables including overall statistical significance, statistically significant independent variables, regression equation, and percent variance explained (R²)
  Tools: Python
- Phases 3,4,5: Display results from analysis of variance (ANOVA) models of the dependent variables including overall statistical significance, statistically significant factors, statistically significant differences in levels within each factor, statistically significant factor interactions, and percent variance explained (R²)
  Tools: Python

Emphasis: Phase 3 techniques focus on analyzing numeric dependent variables

Business Analytics Focuses on Success in Data-Driven Leadership: Project Checklist (6 of 9)

Data Analysis and Modeling (continued)
- Phases 4,5: Display results from logistic regression models of the dependent variables including the receiver operating characteristic (ROC) curves and area underneath the ROC curves
  Tools: Python

Emphasis: Phase 4 techniques focus on analyzing categorical dependent variables

Business Analytics Focuses on Success in Data-Driven Leadership: Project Checklist (6a of 9)

Data Analysis and Modeling (continued)
- Phases 4,5: Display results from neural network models of the dependent variables
  Tools: Python

Emphasis: Phase 4 techniques focus on analyzing categorical dependent variables
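
To make the ROC item above concrete, the sketch below builds ROC points and the area under the curve from predicted scores, using plain Python. It assumes a classifier (for example, a logistic regression or neural network) has already produced a score per observation; the labels and scores are hypothetical, and the function names `roc_points` and `auc` are introduced only for this illustration. In practice a library routine would typically be used instead.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points, one per distinct score."""
    positives = sum(labels)
    negatives = len(labels) - positives
    # Rank observations from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            true_pos += 1
        else:
            false_pos += 1
        # Emit a point only after the last observation sharing this score (handles ties).
        if i == len(ranked) - 1 or ranked[i + 1][0] != score:
            points.append((false_pos / negatives, true_pos / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical outcomes (1 = positive class) and predicted scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

curve = roc_points(labels, scores)
print("ROC points:", curve)
print("Area under ROC curve:", round(auc(curve), 3))
```

An area of 1.0 means every positive observation is scored above every negative one, while 0.5 corresponds to scores that rank positives and negatives no better than chance.
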

Business Analytics Focuses on Success in Data-Driven Leadership: Project Checklist (7 of 9)

Data Analysis and Modeling (continued)
- Phase 5: Describe to what extent programs are developed or customized for the analyses, including the names of the major functions used in the programs. Display at least one program that is customized for the analyses.
  Tools: Python
- Phase 5: Create a website dashboard that displays the data visualizations, analyses, and models
  Tools: Python, website development tools, Google Analytics

Emphasis: Phase 5 techniques focus on automation, adoption, interpretation, and action

Business Analytics Focuses on Success in Data-Driven Leadership: Project Checklist (8 of 9)

Interpretation, Action, and Feedback
- Phases 1,2,3,4,5: Describe interpretation of the data analysis and modeling in the context of the business goal
- Phases 1,2,3,4,5: Outline the possible options and/or decisions available to the business based on the data analysis and modeling
- Phase 5: Of these possible options and/or decisions, identify which path forward is recommended and why
- Phase 5: Define the actions that are needed for the recommended path forward
- Phase 5: Recommend follow-on goals and data analysis and modeling that would build on the results
- Phase 5: Discuss what changes would be recommended if the project were to be done over again

Business Analytics Focuses on Success in Data-Driven Leadership: Project Checklist (9 of 9)

Overall Presentation Format
- Phases 2,3,4,5: In your presentation, focus on the new information and results, and do not use too much time repeating information presented previously
- Phases 2,3,4,5: In your presentation, create a section called “Backup Slides” at the end of your presentation where you move your previous slides that are still relevant but are not presented during the current project phase
- Phases 2,3,4,5: Your slide package should be a cumulative version of all your slides