Uploaded by Albert Hokan

DSO 559 - Summary - Week 1

DSO 559
Introduction to Python for
Business Analytics
Richard W. Selby
Adjunct Professor
University of Southern California
rselby@marshall.usc.edu
© Copyright 2008-2023. Richard W. Selby. All rights reserved.
Introduction to Python for Business Analytics:
Course Introduction
“What is this course about?”
 Formal course definition (from Course Catalog)
 Python programming for descriptive data analytics and
technical tools for business applications. Solving business
problems and formulating actionable business
recommendations including their limitations.
 Perspective
 … as viewed by someone who has spent his career in both the
“business side” and “technology side” of management and
development of technology-centric products in Fortune 100, midsize, and entrepreneurial enterprises ...
Introduction to Python for Business Analytics:
Topics
15 class sessions
 Sessions #1 and #2: Introduction to Python
 Sessions #3 and #4: Foundational Python features
 Sessions #5 and #6: Functions, parameters, packages, reuse
 Sessions #7 and #8: Data visualization
 Sessions #9 and #10: Predictive modeling
 Sessions #11 and #12: Neural networks
 Sessions #13 and #14: Python integration with other
programming languages, data pipelines, automation
 Session #15: Future directions
Introduction to Python for Business Analytics:
Grading
 Goal is to enable everyone to be successful in this course
 Marshall policy defines a target average grade of 3.5 in graduate
elective courses
 Therefore, if some grades are above 3.5, other grades must be
below 3.5. Example distribution that has a 3.5 average:
25% A, 25% A-, 25% B+, 25% B.
 Basis for grading:
 Projects: 30% (5 projects each worth 6%)
 Due in Weeks 3, 6, 9, 12, 15
 Mid-term exam: 20%
 Due in Week 8
 Final exam: 40% (cumulative)
 Due during Finals Week
 Class participation: 10%
 Total: 100%
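The arithmetic behind the grading policy can be checked with a short sketch. The weights come from the grading basis above. The grade-point values (A = 4.0, A- = 3.7, B+ = 3.3, B = 3.0) and the sample student scores are assumptions made for illustration only.

```python
# Grade points per letter grade (assumed values on a 4.0 scale).
GRADE_POINTS = {"A": 4.0, "A-": 3.7, "B+": 3.3, "B": 3.0}

# Example distribution from the slide: 25% of the class at each grade.
distribution = {"A": 0.25, "A-": 0.25, "B+": 0.25, "B": 0.25}

# Class average is the share-weighted sum of grade points.
class_average = sum(GRADE_POINTS[g] * share for g, share in distribution.items())
print(f"Class average: {class_average:.2f}")

# Component weights from the grading basis.
WEIGHTS = {"projects": 0.30, "midterm": 0.20, "final": 0.40, "participation": 0.10}


def course_score(scores):
    """Return the weighted course score for component scores on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student scores, for illustration only.
example = {"projects": 90, "midterm": 80, "final": 85, "participation": 100}
print(f"Weighted score: {course_score(example):.1f}")
```

The same pattern (a dictionary of weights and a weighted sum) applies whenever several components are combined into one score.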
Python
 Python is an interpreted, high-level, general-purpose
programming language
 Python is both very powerful and easy to use, and it is a
very effective programming language for solving problems
 Python is open source software (free software)
 Python runs on Windows, macOS, Linux/UNIX, etc.
 Python can be run locally on a laptop or in the cloud using a
web browser interface
Jupyter notebook
 Jupyter notebook is an interactive computing environment that
enables you to create and execute Python programs and
related commands incrementally, one cell at a time
 Jupyter notebook is open source software (free software)
 Jupyter notebook is a web application that runs in a browser
Python resources
 https://www.python.org/
 https://en.wikipedia.org/wiki/Python_(programming_language)
 https://www.w3schools.com/python/default.asp
 http://nbviewer.jupyter.org/github/phelps-sg/pythonbigdata/blob/master/src/main/ipynb/intro-python.ipynb
 Google “python tutorial”
 Note: We will be using Python 3 (not Python 2)
Jupyter notebook resources
 http://jupyter.readthedocs.io/en/latest/install.html
 https://jupyter.org/try
 Google “jupyter notebook tutorial”
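The note about Python 3 can be confirmed from inside a notebook cell. The following is a minimal sketch, not a required setup step. It shows how to check the interpreter version and two behaviors that differ from Python 2.

```python
import sys

# Major version of the running interpreter; the course expects 3.
major_version = sys.version_info.major
print("Python major version:", major_version)

# In Python 3, print is a function (Python 2 used a print statement).
print("Hello, business analytics")

# In Python 3, / is true division and // is floor division.
half = 1 / 2     # 0.5 (Python 2 would give 0 for two integers)
whole = 1 // 2   # 0
print(half, whole)
```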
Business Analytics
 Business analytics definition
 Business analytics is defined as the study, integration, and
application of knowledge, skills, and methods for using data,
statistical analysis, quantitative approaches, and predictive
modeling to enable data-driven decision making and innovation
in organizations
Business Analytics Methodology Framework
 Overall business analytics methodology
framework for successfully implementing
analytics-driven management and rapidly creating
value
 Four major methodology elements:
 Goal definition
 Data collection
 Data analysis and modeling
 Interpretation, action, and feedback
DSO 559
Introduction to Python for
Business Analytics
Project Checklist
Business Analytics Focuses on Success in Data-Driven Leadership: Project Checklist (1 of 9)
Phase Goal Definition
1,2,3,4,5  Describe business goal
1,2,3,4,5  Describe dependent variables
1,2,3,4,5  Describe independent variables
Emphasis: Phase 1 techniques focus on goal definition and variable
definition
Phase Data Collection
1,2,3,4,5  Define data sources including actual links to websites. Include
both numeric and categorical dependent variables and both
numeric and categorical independent variables
2,3,4,5  Organize data files into columns and rows. Columns contain
both the dependent variables and the independent variables,
and rows contain the observations (also called “data points”).
 Tools: Python, Excel
2,3,4,5  Define the number of observations (also called “rows” or “data
points”)
 Tools: Python, Excel
2,3,4,5  Define the number of dependent variables, calculated
dependent variables, independent variables, and calculated
independent variables
 Tools: Python, Excel
Emphasis: Phase 2 techniques focus on data organization and
visualization
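The row-and-column organization described in this part of the checklist can be sketched with Python's standard `csv` module. The dataset, the column names (`region`, `ad_spend`, `revenue`, `churned`), and the assignment of columns to dependent and independent roles are all hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical data file: each row is one observation, each column one variable.
raw = """region,ad_spend,revenue,churned
West,1200,15000,no
East,800,9000,yes
West,1500,21000,no
South,600,7000,yes
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Assumed variable roles (a modeling choice, not something stored in the file).
dependent = ["revenue", "churned"]       # one numeric, one categorical
independent = ["region", "ad_spend"]     # one categorical, one numeric

# A calculated dependent variable derived from existing columns.
for row in rows:
    row["revenue_per_ad_dollar"] = float(row["revenue"]) / float(row["ad_spend"])
calculated_dependent = ["revenue_per_ad_dollar"]

print("Observations (rows):", len(rows))
print("Dependent variables:", len(dependent))
print("Calculated dependent variables:", len(calculated_dependent))
print("Independent variables:", len(independent))
```

In practice the data would be read from a file rather than an inline string; the counting logic stays the same.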
Phase Data Collection (continued)
2,3,4,5  Manipulate data files using SQL commands
 Tools: Python, SQL
2,3,4,5  Define the number of files that contain the data, describe the
joins between the data files, and describe the type of joins
(inner join, left outer join, right outer join, or full outer join)
 Tools: Python, SQL
2,3,4,5  In the joined datasets (also called “wide datasets” or “wide
tables”) that combine your data files, define the number of
dependent variables, independent variables, and observations
(also called “rows” or “data points”)
 Tools: Python, SQL
2,3,4,5  Describe how any missing data are addressed and how any
sampling or filtering of observations to select subsets of data
are performed
 Tools: Python, Excel
5  Integrate automated data collection capabilities into a website
dashboard that tracks usage
 Tools: Google Analytics
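The join and missing-data items in this part of the checklist can be sketched by running SQL from Python with the standard `sqlite3` module. The tables (`customers`, `orders`), their columns, and their contents are hypothetical. Only an inner join and a left outer join are shown, because right and full outer joins are available only in newer SQLite versions.

```python
import sqlite3

# In-memory database holding two hypothetical data files as tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'West'), (2, 'East'), (3, 'South');
INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0);
""")

# Inner join: keeps only customers that have at least one order.
inner_rows = conn.execute("""
    SELECT c.customer_id, c.region, o.amount
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()

# Left outer join: keeps every customer; missing orders appear as NULL (None).
left_rows = conn.execute("""
    SELECT c.customer_id, c.region, o.amount
    FROM customers AS c
    LEFT OUTER JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id, o.order_id
""").fetchall()
conn.close()

# One simple way to address missing data: filter out incomplete observations.
missing = [row for row in left_rows if row[2] is None]
complete = [row for row in left_rows if row[2] is not None]

print("Inner join observations:", len(inner_rows))
print("Left outer join observations:", len(left_rows))
print("Observations with missing amount:", len(missing))
```

Dropping incomplete rows is only one option; whichever approach is used should be described in the project, as the checklist requires.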
Phase Data Analysis and Modeling
2,3,4,5  Display visualizations for the dependent variables and their
relationships with the independent variables, such as bar
charts, histograms, pie charts, scatter plots, box plots, etc.
 Tools: Python
2,3,4,5  Display descriptive statistics for the dependent variables, such
as means, standard deviations, medians, minimums,
maximums, etc.
 Tools: Python
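The descriptive statistics named above can be computed with Python's standard `statistics` module. This is a minimal sketch; the `revenue` values are hypothetical, and plotting is not shown because it requires a separate visualization package.

```python
import statistics

# Hypothetical observations of a numeric dependent variable.
revenue = [15000, 9000, 21000, 7000, 12000]

summary = {
    "mean": statistics.mean(revenue),
    "standard deviation": statistics.stdev(revenue),  # sample standard deviation
    "median": statistics.median(revenue),
    "minimum": min(revenue),
    "maximum": max(revenue),
}

for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```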
Phase Data Analysis and Modeling (continued)
3,4,5  Display correlations of the dependent variables and the
independent variables
 Tools: Python
3,4,5  Display results from multivariate linear regression models of
the dependent variables including overall statistical
significance, statistically significant independent variables,
regression equation, and percent variance explained (R²)
 Tools: Python
3,4,5  Display results from analysis of variance (ANOVA) models of
the dependent variables including overall statistical
significance, statistically significant factors, statistically
significant differences in levels within each factor, statistically
significant factor interactions, and percent variance explained
(R²)
 Tools: Python
Emphasis: Phase 3 techniques focus on analyzing numeric dependent
variables
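The correlation, regression equation, and R² items above can be illustrated with a simplified sketch. It uses a single independent variable rather than the multivariate models the checklist asks for, omits significance tests and ANOVA, and relies only on the standard library. The `ad_spend` and `revenue` values are hypothetical.

```python
import math

# Hypothetical observations: one independent and one dependent variable.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
revenue = [2.1, 3.9, 6.2, 7.8, 10.0]


def mean(values):
    return sum(values) / len(values)


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(x, y, slope, intercept):
    """Share of variance in y explained by the fitted line."""
    my = mean(y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot


r = pearson_r(ad_spend, revenue)
slope, intercept = fit_line(ad_spend, revenue)
r2 = r_squared(ad_spend, revenue, slope, intercept)

print(f"Correlation: {r:.3f}")
print(f"Regression equation: revenue = {intercept:.2f} + {slope:.2f} * ad_spend")
print(f"R squared: {r2:.3f}")
```

With a single independent variable, R² equals the square of the correlation coefficient, which gives a quick consistency check on the results.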
Phase Data Analysis and Modeling (continued)
4,5  Display results from logistic regression models of the
dependent variables including the receiver operating
characteristic (ROC) curves and area underneath the ROC
curves
 Tools: Python
Emphasis: Phase 4 techniques focus on analyzing categorical dependent
variables
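The ROC curve and the area underneath it can be computed directly from predicted scores and actual outcomes. The sketch below assumes the scores are predicted probabilities of the kind a logistic regression model might produce; fitting the model itself is not shown, and the `labels` and `scores` values are hypothetical.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs for the ROC curve."""
    positives = sum(labels)
    negatives = len(labels) - positives
    # Rank observations from highest to lowest score and lower the threshold step by step.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Observations tied at the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical actual outcomes (1 = event occurred) and predicted probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(labels, scores)
area = auc(points)
print("ROC points:", points)
print(f"Area under the ROC curve: {area:.3f}")
```

An area of 0.5 corresponds to random guessing and 1.0 to perfect separation of the two outcome categories.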
Phase Data Analysis and Modeling (continued)
4,5  Display results from neural network models of the dependent
variables
 Tools: Python
Emphasis: Phase 4 techniques focus on analyzing categorical dependent
variables
Phase Data Analysis and Modeling (continued)
5  Describe to what extent programs are developed or customized
for the analyses including names of the major functions used in
the programs. Display at least one program that is customized
for the analyses.
 Tools: Python
5  Create a website dashboard that displays the data
visualizations, analyses, and models
 Tools: Python, website development tools, Google Analytics
Emphasis: Phase 5 techniques focus on automation, adoption,
interpretation, and action
Phase Interpretation, Action, and Feedback
1,2,3,4,5  Describe interpretation of the data analysis and modeling in the
context of the business goal
1,2,3,4,5  Outline the possible options and/or decisions available to the
business based on the data analysis and modeling
5  Of these possible options and/or decisions, identify which path
forward is recommended and why
5  Define the actions that are needed for the recommended path
forward
5  Recommend follow-on goals and data analysis and modeling
that would build on the results
5  Discuss what changes would be recommended if the project
were to be done over again
Phase Overall Presentation Format
2,3,4,5  In your presentation, focus on the new information and
results, and do not use too much time repeating information
presented previously
2,3,4,5  In your presentation, create a section called “Backup
Slides” at the end of your presentation where you move your
previous slides that are still relevant but are not presented
during the current project phase
2,3,4,5  Your slide package should be a cumulative version of all
your slides