Uploaded by Angel Choi

Class 1 - Intro to Business Analytics - BeforeClass Monday1B

IIMT 2641 Introduction to Business Analytics
Class 1: Introduction to Business Analytics
2023 Fall
IIMT 2641 Teaching Team
§ Instructor: Feng TIAN
– Email: fengtian@hku.hk
– Office: KKL 1312
– Office Hours: TBD or By appointment.
§ TA: Ian Chan
– Email: ikwchan@hku.hk
– Office: KKL 625
– Office Hours: By appointment
§ TA: Yuwen Brian Wang
– Email: byuwen@hku.hk
– Office Hours: By appointment
– Office: KKL 625
Who am I?
§ Partial Economist
– BA in Econ (Nankai)
– MA in Econ (Duke)
§ Failed Mathematician
– Love mathematics (dreamed of being a mathematician)
– Applied Mathematics, Applying mathematics
§ Operations Researcher
– PhD in Technology & Operations (University of Michigan)
Data is powerful and everywhere
§ Data is transforming business, social interactions, and the future of our society.
§ The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching 64.2 zettabytes in 2020. (Statista)
64.2 zettabytes = 64,200,000,000,000,000,000,000 bytes (1,000 bytes = 0.9766 kilobytes, taking 1 kilobyte as 1,024 bytes)
– This is equal to the storage required for more than 4 trillion HD movies
– It would take a person approximately 300 million years to download them all from the internet
– This number is predicted to reach 175 zettabytes by 2025.
§ Internet users generate about 2.5 quintillion (2.5 × 10^18) bytes of data each day. 90% of all data has been created in the last two years.
§ Ability to process data also increases
– Decoding the human genome originally took 10 years to process; now it can be achieved in one week
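The data-volume figures above can be sanity-checked with a few lines of arithmetic. The Python sketch below converts 64.2 zettabytes to bytes and works out what the "4 trillion HD movies" and "300 million years" comparisons imply. It assumes a decimal zettabyte (10^21 bytes) and a 365.25-day year; neither assumption is stated on the slide.

```python
# Back-of-envelope check of the data-volume figures quoted on the slide.
ZETTABYTE = 10**21                      # assumption: decimal (SI) zettabyte

total_bytes = 642 * ZETTABYTE // 10     # 64.2 ZB, kept as an exact integer
hd_movies = 4 * 10**12                  # "more than 4 trillion HD movies"
download_years = 300 * 10**6            # "approximately 300 million years"

# Implied size of one HD movie, in gigabytes (10^9 bytes).
gb_per_movie = total_bytes / hd_movies / 10**9

# Implied sustained download speed, in megabytes per second.
seconds = download_years * 365.25 * 24 * 3600   # assumption: 365.25-day year
mb_per_second = total_bytes / seconds / 10**6

print(f"{total_bytes:,} bytes")
print(f"about {gb_per_movie:.1f} GB per movie")
print(f"about {mb_per_second:.1f} MB/s download speed")
```

Under these assumptions the comparisons correspond to roughly 16 GB per movie and a connection of a little under 7 MB/s, so the slide's figures are mutually consistent.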
Data and analytics are useful
§ Analytics is increasingly important in the world today.
§ The global big data analytics market size was valued at US$271.83 billion in 2022. The market is projected to grow from US$307.52 billion in 2023 to US$745.15 billion by 2030.
§ 95% of businesses cite the need to manage unstructured data as a problem for their business.
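The market projection above (US$307.52 billion in 2023 to US$745.15 billion by 2030) implies an annual growth rate that the slide does not state. A minimal Python sketch of that calculation follows, assuming constant compound growth over the seven years; this is a simplification, not a claim about how the forecast was built.

```python
# Compound annual growth rate (CAGR) implied by the two figures on the slide.
start_value = 307.52   # US$ billion, 2023 (from the slide)
end_value = 745.15     # US$ billion, 2030 (from the slide)
years = 2030 - 2023

# Assumption: growth compounds at one constant rate each year.
cagr = (end_value / start_value) ** (1 / years) - 1
print(f"Implied growth rate: {cagr:.1%} per year")

# Year-by-year path under the same constant-growth assumption.
value = start_value
for year in range(2023, 2031):
    print(year, round(value, 2))
    value *= 1 + cagr
```

The implied rate comes out at roughly 13 to 14 percent per year.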
Data and analytics are useful
§ 97.2% of organizations are investing in big data and AI.
– IBM has changed its business focus over the last 100 years from typewriters to mainframes to personal computers to consulting, and now to analytics.
  • IBM has invested over $20 billion since 2005 to grow its analytics business
– Netflix saves $1 billion per year on customer retention using big data
§ Critical in almost every business and industry including
What is Analytics?
§ The science of using data to build models that lead to
better decisions that add value to individuals, to
companies, to institutions
This Course
§ Key Messages:
– Analytics provide a competitive edge to individuals and companies
– Analytics are often critical to the success of a company
§ Methodology:
– Teach analytics techniques through real world examples and real data
– Probability and statistical theories
§ Goal:
– Convince you of the Analytics Edge
  • The power and importance of data
  • How analytics methods work
  • How to interpret and understand the results of analytical models
– Inspire you to use analytics in your career
  • Excel spreadsheet
  • Software package R
  • Not just hearing about analytics, but creating your own models
This lecture
§ Summary of some of the cases we will cover
– Netflix (Movie Recommendation)
– Quality of Wine
– Twitter (Text analytics)
§ Other cases we will cover in this course
– Summer Job Search
– New Product Development
– Healthcare Quality Prediction
– Court Ruling Prediction
– Criminal Justice - Enron
– MRI brain image Segmentation
– Housing Price Prediction
Netflix
§ Subscription services
§ Key aspect is being able to offer customers accurate movie recommendations based on a customer’s own preferences and viewing history
The Netflix Prize
§ From 2006 to 2009, Netflix ran a contest asking the public to submit algorithms to predict user ratings for movies
§ Offered a grand prize of US$1,000,000 to the team that could beat the accuracy of Netflix’s own algorithm by more than 10%
§ A training data set of ~100,000,000 ratings and a test data set of ~3,000,000 ratings were provided
Predicting the User Ratings
§ What data could be used to predict user ratings?
Using other users’ rankings: Collaborative Filtering
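One way to picture collaborative filtering is as a similarity-weighted average: to guess how a user would rate a movie, look at how other users rated it and give more weight to users whose past ratings resemble hers. The Python sketch below illustrates that idea only. The ratings are hypothetical, the cosine similarity measure is one choice among several, and nothing here is taken from the Netflix Prize data or the winning algorithm.

```python
from math import sqrt

# Hypothetical 1-5 star ratings, invented purely for illustration.
ratings = {
    "Amy":  {"Men in Black": 5, "Apollo 13": 4, "Top Gun": 1},
    "Bob":  {"Men in Black": 5, "Apollo 13": 4, "Top Gun": 2, "Terminator": 5},
    "Carl": {"Men in Black": 1, "Apollo 13": 2, "Top Gun": 5, "Terminator": 1},
}

def cosine_similarity(a, b):
    """Cosine similarity computed over the movies both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = sqrt(sum(a[m] ** 2 for m in shared))
    norm_b = sqrt(sum(b[m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)

def predict(user, movie):
    """Similarity-weighted average of other users' ratings for `movie`."""
    weighted_sum = total_weight = 0.0
    for other, their_ratings in ratings.items():
        if other == user or movie not in their_ratings:
            continue
        weight = cosine_similarity(ratings[user], their_ratings)
        weighted_sum += weight * their_ratings[movie]
        total_weight += weight
    return weighted_sum / total_weight if total_weight else None

prediction = predict("Amy", "Terminator")
print(f"Predicted rating for Amy: {prediction:.2f}")
```

Because Amy's ratings resemble Bob's more than Carl's, Bob's opinion of the unseen movie counts for more in her prediction.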
Using movie information: Content Filtering
§ We saw that Amy liked "Men In Black"
– It was directed by Barry Sonnenfeld
– Classified in the genres of action, adventure, sci-fi and comedy
– It stars actor Will Smith
§ Consider recommending to Amy:
– Barry Sonnenfeld’s movie "Get Shorty"
– "Jurassic Park", which is in the genres of action, adventure, and sci-fi
– Will Smith’s movie "Hitch"
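The content filtering idea above can be sketched by describing each movie as a set of attributes and ranking candidates by how many attributes they share with a movie the user liked. In the Python sketch below, the attributes for "Men In Black", "Get Shorty", "Jurassic Park", and "Hitch" are only those named on the slide; the "Hypothetical Drama" entry and the simple overlap count are assumptions added for contrast.

```python
# Attributes of the movie Amy liked, as listed on the slide.
liked = {
    "director:Barry Sonnenfeld",
    "genre:action", "genre:adventure", "genre:sci-fi", "genre:comedy",
    "actor:Will Smith",
}

# Candidate movies with only the attributes the slide mentions.
candidates = {
    "Get Shorty": {"director:Barry Sonnenfeld"},
    "Jurassic Park": {"genre:action", "genre:adventure", "genre:sci-fi"},
    "Hitch": {"actor:Will Smith"},
    "Hypothetical Drama": {"genre:drama"},   # invented, shares nothing with `liked`
}

def overlap(a, b):
    """Number of attributes two movies have in common."""
    return len(a & b)

# Rank candidates by shared attributes and keep those with at least one match.
ranked = sorted(candidates, key=lambda title: overlap(liked, candidates[title]), reverse=True)
recommended = [title for title in ranked if overlap(liked, candidates[title]) > 0]
print(recommended)
```

A real system would weight attributes differently (a shared director may matter more than a shared genre); the plain count is only the simplest possible scoring rule.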
Winners are declared!
§ On September 18, 2009, a winning team was announced
§ BellKor’s Pragmatic Chaos won the competition and the $1,000,000 grand prize
What is the edge?
§ In today’s digital age, businesses often have hundreds of thousands of items to offer their customers
§ Excellent recommendation systems can make or break these businesses
§ Clustering algorithms, which are tailored to find similar customers or similar items, form the backbone of many of these recommendation systems
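To make "finding similar customers" concrete, here is a minimal Python sketch of k-means, one common clustering algorithm (the slide does not name a specific method). Each customer is a hypothetical pair of average ratings (action movies, romance movies); the data, the choice of two clusters, and the fixed starting centroids are all assumptions made for illustration.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Squared Euclidean distance from this point to each centroid.
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # New centroid = mean of the cluster; keep the old one if a cluster is empty.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical customers: (average rating for action, average rating for romance).
customers = [(5, 1), (4, 2), (5, 2), (1, 5), (2, 4), (1, 4)]

# Fixed starting centroids so the result is reproducible.
centroids, clusters = kmeans(customers, [customers[0], customers[3]])
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

Customers who land in the same cluster have similar tastes, so a movie liked by some members of a cluster is a natural candidate to recommend to the others.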
Predicting the Quality of Wine
§ Bordeaux is a region in France well known for producing wine.
§ Large differences in price and quality between years, although wine is produced in a similar way.
§ Wines taste better when they are older, so young wines are stored
– so hard to tell if wine will be good when it is on the market
§ Expert tasters predict which ones will be good
§ Can analytics be used to come up with a different system for judging wine?
Predicting the Quality of Wine
§ March 1990 - Orley Ashenfelter, a Princeton economics professor, claims he can predict wine quality without tasting the wine (assessing the aroma, looking at the legs)
§ Ashenfelter used a method called linear regression
– Predicts an outcome variable, or dependent variable
– Predicts using a set of independent variables
§ Dependent variable: typical price in 1990-1991 wine auctions (approximates quality)
§ Independent variables:
The Expert’s Reaction
§ Robert Parker, the world's most influential wine expert:
“Ashenfelter is an absolute total sham”
“rather like a movie critic who never goes to see the movie but tells you how good it is based on the actors and the director”
The Results
§ Parker:
– 1986 is “very good to sometimes exceptional”
§ Ashenfelter:
– 1986 is mediocre
– 1989 will be “the wine of the century” and 1990 will be even better!
§ In wine auctions,
– 1989 sold for more than twice the price of 1986
– 1990 sold for even higher prices!
§ Later, Ashenfelter predicted 2000 and 2003 would be great
§ Parker has stated that “2000 is the greatest vintage Bordeaux has ever produced”
What is the edge?
§ A linear regression model with only a few variables can predict wine prices well
§ In many cases, outperforms wine experts’ opinions
§ A quantitative approach to a traditionally qualitative problem
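The mechanics of linear regression can be shown with a single independent variable. The Python sketch below fits a straight line by ordinary least squares and reports how much of the variation in price the line explains. The numbers are hypothetical, and using growing-season temperature as the one predictor is an illustrative assumption; this is not Ashenfelter's data or his full model.

```python
# Hypothetical data: growing-season temperature (deg C) and an auction price index.
temperature = [15.0, 16.0, 16.5, 17.0, 17.5]
price_index = [6.2, 6.9, 7.1, 7.6, 7.8]

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def r_squared(x, y, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

intercept, slope = fit_line(temperature, price_index)
r2 = r_squared(temperature, price_index, intercept, slope)

# Predict the price index for a new, hypothetical vintage.
predicted = intercept + slope * 16.8
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r2:.3f}")
print(f"predicted price index at 16.8 deg C: {predicted:.2f}")
```

With several independent variables the same least-squares principle applies, but the coefficients are found by solving a system of equations rather than with the two closed-form expressions used here.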
Twitter/X
§ A social networking and communication website founded in 2006
§ What can you do with Twitter/X?
Impact of Twitter/X
§ Who is using Twitter, and what are they using it for?
Understanding People
§ Why do companies keep official accounts on Twitter? What can companies do on Twitter?
Using Text as Data
§ Most of the data we deal with is
– Structured
– Numerical
– Categorical
§ Tweets are
– Loosely structured
– Textual
– Sometimes written with poor spelling and non-traditional grammar
– Possibly multilingual
Text Analytics
§ Why do people care about textual data?
§ How do we handle it?
§ Humans can’t keep up with Internet-scale volumes of data
– 350,000 tweets sent per minute
– 500 million tweets sent each day
§ Computers can help
– Need to “understand” text
– Natural Language Processing
– Understand and derive meaning from human language
Sentiment Analysis
§ Use NLP and text analytics to identify, extract, and study affective states and subjective information.
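A very simple form of sentiment analysis counts positive and negative words in a piece of text. The Python sketch below shows that lexicon-based approach. The word lists and example tweets are hypothetical, and practical systems use much larger lexicons or trained models, so treat this as an illustration of the idea rather than a usable classifier.

```python
import re

# Hypothetical mini-lexicons, invented for illustration.
POSITIVE = {"love", "great", "good", "happy", "amazing"}
NEGATIVE = {"hate", "bad", "terrible", "awful", "broken"}

def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())

def sentiment_score(text):
    """Count of positive words minus count of negative words."""
    tokens = tokenize(text)
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

def label(text):
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Hypothetical tweets.
tweets = [
    "I LOVE my new phone, the camera is amazing!!!",
    "Battery is terrible and the app is broken",
    "Just landed in Hong Kong",
]
for tweet in tweets:
    print(label(tweet), "|", tweet)
```

Lowercasing and stripping punctuation in `tokenize` is a small example of the cleaning that loosely structured text needs before it can be treated as data.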
What is the edge?
§ Twitter and other social media generate large volumes of textual data
– Text analytics (sentiment analysis) deals with massive amounts of unstructured data
§ We’ll see how we can build analytics models using text as our data
§ In general, text analytics (including sentiment analysis) is applied to marketing, customer service, and healthcare…
This lecture
§ Summary of some of the examples we will cover
– Netflix (Movie Recommendation)
– Quality of Wine
– Twitter (Text analytics)
§ Other examples we will cover in this course
– Summer Job Search
– New Product Development
– Healthcare Quality Prediction
– Court Ruling Prediction
– Criminal Justice – Enron
– MRI brain image Segmentation
– Housing Price Prediction
What is Analytics?
§ The science of using data to build models that lead to better decisions that add value to individuals, to companies, to institutions
§ Descriptive analytics: identify patterns in the data
– Summary statistics
– Visualizations
– Clustering
– Text analytics
§ Predictive analytics: predict different outcomes
– Linear Regression
– Logistic Regression
– Classification Trees
§ Prescriptive/Operations analytics: give advice on actions to take
– Decision Analysis
– Linear/Integer Optimization
Course Schedule
COURSE CONTENT AND TENTATIVE TEACHING SCHEDULE
Week | Date | Topic | Cases
1 | 4 Sep | Overview: Business Analytics, Probability (Part 1) |
2 | 11 Sep | Probability (Part 2), Decision Analysis (Part 1) | Summer Job Search
3 | 18 Sep | Decision Analysis (Part 2), Statistical Inference (Part 1) | New Product Development
4 | 25 Sep | Statistical Inference (Part 2), Linear Regression (Part 1) | Wine Quality Prediction
5 | 2 Oct | General Holiday |
6 | 9 Oct | Linear Regression (Part 2), Logistic Regression (Part 1) | Healthcare Quality Assessment
7 | 16 Oct | NO CLASS - Reading Week |
8 | 23 Oct | General Holiday |
9 | 30 Oct | Logistic Regression (Part 2), Clustering (Part 1) | Movie Recommendation
10 | 6 Nov | Clustering (Part 2), Classification Tree (Part 1) | Court Ruling Prediction
11 | 13 Nov | Classification Tree (Part 2), Text Analytics (Part 1) |
12 | 20 Nov | Text Analytics (Part 2) | Sentiment Analysis on Twitter; Crime Investigation - email
13 | 27 Nov | More cases and Course Review |
Course Introduction: Technology
§ Microsoft Excel
§ R
– Will have a brief introduction soon.
Goals
§ Understand the complexity of data and how to deal with data
§ Create your own analytical models
§ Understand, use, and think critically about the results of the models
§ Know what to do next
§ Reach these goals through learning R
§ But ultimately, this is a course about analytics
§ Don’t get lost in R, think about the models and results
Administrative Arrangements
Course Materials
§ Required Materials
– Lecture notes, assignments, practice problems (on Moodle)
§ Recommended Textbook
– The Analytics Edge. Dimitris Bertsimas, Allison K. O'Hair, and William R. Pulleyblank. Dynamic Ideas LLC, 2016.
Manage Lecture Slides
§ For each topic
– One set of before-class slides
§ For each class and topic
– One set of after-class slides
Assessment
Participation + Attendance | 5%
Individual assignments | 30%
Group Projects | 25%
Final Exam | 40%
Total | 100%
Assessment
Participation + Attendance (5%)
§ In-Class Participation
– Attending and actively contributing to class discussions.
– In-class practice questions and pop-up surveys (these are not quizzes; only participation is recorded).
– If you participate during class, please fill in the paper at the front during the break or after class (you may also grab a snack). If you forget, please email the TA (cc me), including what you asked or answered, after that class (no later than 10 pm the same day).
§ Attendance
– Attendance app
§ No disruptions in class
– Laptops are needed throughout
– NO CROSSTALKING
– NO CELL PHONES (including phone calls)
– Tablets or laptops for taking notes are allowed
Assessment
Individual assignments (30%)
§ Individual assignment policy
– Strict due date/time (posted online) enforced, no excuses.
§ Homework questions are meant to be extensions of what we do in class.
§ You are highly encouraged to do your homework with your classmates. Do not copy!
§ At most 5 graded assignments.
Assessment
Group Project (25%)
§ 4–5 people
§ Apply analytics tools to a wide variety of real-world settings
– Project proposal (middle of the course)
– Discussion with the instructor
– Project report (end of the course)
§ More details to be announced during the next class.
Assessment
Final Exam (40%)
§ During the assessment period.
§ Homework assignments, practice problems
§ TBD
ChatGPT Policy
§ Learning:
– Could be helpful.
§ Assignments:
– Directly copying answers from ChatGPT is prohibited. If caught, the grade for that assignment is zero.
– Relying on ChatGPT is not wise, since you cannot use it in the exams.
§ Group project:
– If you use GenAI, you need to declare how you used it.
– Helpful for polishing your writing and learning fancy tools faster.
§ More discussion in later classes
– It is also new to me.
Academic Conduct
§ Academic dishonesty is ABSOLUTELY NOT TOLERATED.
– No second chance.
§ You are highly encouraged to do your homework with your classmates. Do not copy!
Academic Support
1. Tutorial Sessions (TA)
– Highly recommended, not mandatory.
2. Each other
– Leverage your classmates and your friends to ask questions and figure things out
3. Office Hours or By Appointment
What is R?
§ A software environment for data analysis, statistical computing, and graphics
§ Natural to use, complete data analyses in just a few lines
§ Can create almost any analytics model imaginable
§ Significantly more powerful than Excel
§ A programming language
– There is a lot more that can be done in R
§ Don’t worry
– We won't be doing much programming in class, and almost everything we ask you to do can be completed in a few lines.
History of R
§ Originated from S
– A statistical programming language developed by John Chambers at Bell Labs in the 1970s
§ The first version of R was developed by Robert Gentleman and Ross Ihaka at the University of Auckland in the mid-1990s
– Wanted better statistical software for their Macintosh teaching laboratory
– An open-source alternative: encourage others to download and help develop the software
– R packages
Why use R?
§ There are many choices for data analysis software
– SAS, Stata, SPSS, Excel (with add-ons), MATLAB, Minitab, ...
– So why are we using R?
§ Free (open-source project)
§ Widely used
– More than 2 million users around the world
– New features are being developed all the time
– A lot of community resources
§ Easy to re-run previous work and make adjustments
§ Nice graphics and visualizations
R resources
§ Official page: http://www.r-project.org
§ Download page: http://www.cran.r-project.org
§ Some helpful websites:
• http://www.statmethods.net
• http://www.rseek.org
§ Looking for a command or function? Google it
§ Best way to learn R is through trial and error
RStudio
§ Official page: https://www.rstudio.com
§ Interactive integrated development environment (IDE)
§ Includes a code editor with many R-specific features, a console to execute your code, and other useful panes, including one to show figures
Course objectives
§ This course should make you comfortable using analytics in your career and your life
§ You will know how to work with real data, and will learn many different methodologies
§ We want to convince you of the edge of business analytics
Take a Quick Survey
§ https://forms.gle/h1si1U2nwLkX3kzs8
§ Deadline of the survey: Sep 7, Thursday 6 pm.