Uploaded by Jonathan Chen

COMM1110 Lecture Slides Week 2 T3 2022

If you are considering coming to campus, please ensure that you:
• Do not come to campus if you have any symptoms or are unwell or you are considered a close contact of a COVID-19 positive case
• We strongly encourage you to wear a good quality mask e.g., N95 if possible. If you forget to bring a face mask to campus, they can be purchased from the IGA, Pharmacy or the Post Office.
• Maintain physical distancing and density at not more than 1 person per 2sqm indoors
• Wash or sanitise your hands regularly
• Open windows to maximise air flow where possible
• Limit unnecessary movements around campus
More information can be found at:
https://www.covid-19.unsw.edu.au
COMM1110 Evidence-Based Problem Solving
Week 2: Problem articulation and disaggregation
Lecturer: Jonathan Lim
General housekeeping:
• Please switch your microphone to mute to avoid disruption to the class
• Use the chat channel to ask questions or make a comment, or raise your 'virtual' hand
• If you have poor internet, turn off your video
• Wait for your lecturer to start
This week:
This week will focus on:
Bullet Proof Problem Solving Framework: problem-solving steps
1. Scoping (define the problem, disaggregate)
2. Analyse (prioritize, workplan, analyse)
3. Decision (synthesize, communicate)
Problem Solving Tools
• Logic trees (branch framing, MECE, prioritization)
• Descriptive statistics
• Graphs: Frequency distributions, bar charts, pie charts & histograms
Case studies and examples
• Furniture case study
• UNSW Travel
• Birth data
This week’s to-do list:
• Work through the online learning materials for statistical tools
• Worked example videos of covariance, location, variability and
variance standardisation
• Descriptive statistics - Online videos for numerical measures
• Tutorial
• Tutorial preparation: Complete as many as possible of the statistical examples
• Assessments
• Excel training program – start now
• Case – Start ‘information toolbox’ by identifying references. The library module
provides guidance on search strategies.
Problem articulation and disaggregation
Defining the problem
• Starting point for great problem solving (important, and covered in tutorials)
• We will build on defining the problem by tapping into problem articulation, especially how to use evidence to understand what happened (e.g. summarising and transforming data into useful information), which helps us define the problem
Problem disaggregation
• Taking the problem apart helps us see potential ways to solve it
• Any problem of real consequence is too complicated to solve without breaking it down into logical parts that help us understand the drivers or causes of the situation
Case study - Furniture store
Will we be able to increase furniture sales?
Problem disaggregation:
Information Toolbox - Logic Trees
• Provides clear visual representation of the problem so we
can understand component parts
• Done correctly, they are holistic (all relevant evidence
captured in the tree) so right questions can be asked of the
problem
• Leads to clear hypotheses (explanations/ideas) that can be
tested with evidence
Factor logic trees: Furniture store
First breakdown
• Will we be able to increase sales?
o Instore
o Online
Second breakdown
• Will we be able to increase sales?
o Instore: No. of store locations; No. of product lines instore; Level of customer satisfaction
o Online: Website useability; No. of delivery options; No. of product lines on website
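The second breakdown can also be written down as a small data structure. The sketch below is only an illustration: the nested-dict representation, the helper name `leaves` and the duplicate check are assumptions made here, not part of the lecture material.

```python
# Logic tree from the slide as a nested dict: each key is a branch,
# each list holds the factors under that branch.
logic_tree = {
    "Will we be able to increase sales?": {
        "Instore": [
            "No. of store locations",
            "No. of product lines instore",
            "Level of customer satisfaction",
        ],
        "Online": [
            "Website useability",
            "No. of delivery options",
            "No. of product lines on website",
        ],
    }
}


def leaves(node):
    """Return every factor below a node, in order (hypothetical helper)."""
    if isinstance(node, list):
        return list(node)
    result = []
    for child in node.values():
        result.extend(leaves(child))
    return result


factors = leaves(logic_tree)
# Partial check of "mutually exclusive": no factor sits under two branches.
no_overlap = len(factors) == len(set(factors))
print(factors)
print("No factor repeated across branches:", no_overlap)
```

Whether the tree is collectively exhaustive cannot be checked by code; that remains a judgement about the problem itself.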
Logic trees – Extend branch framing
Frame: Key elements
• Customer / shareholder / employee: Competing perspectives
• Price / volume: Are there different products or commodities? Where is market share? What products are being adapted by customers?
• Regulate / incentives: Will legal regulation, taxation, subsidies or nudging policies change the outcome?
• Equity / liberty: Equality among citizens vs allowing more individual freedom?
• Near / long term: Trade-offs in immediate future vs decades into the future?
• Financial / non-financial: Benefits of financial vs non-financial?
Logic trees – Prioritization to the core problem
[Figure: prioritization matrix; axes: potential scale of impact (low to high) and ability to influence (low to high)]
Statistical toolbox – What evidence do we have to articulate the problem?
• Sample vs population
• Types of statistical data
• Cross-section vs time series
Using evidence to understand the problem
Perhaps sales might increase if
customers are satisfied with our
sales assistants?
Also, it may be useful to know
satisfaction
• By customer profile (age, gender,
individual/business customer etc.)
• And/or has satisfaction changed over
time?
Using evidence to understand the problem
Furniture store case study – customer satisfaction
• 60,000 customers last year → Population (what we are interested in)
• 3,000 customers completed customer satisfaction surveys for sales assistants →
Sample (subset of the population that we have data for)
• Problem: How can we use sample information to gain some insights about the
population?
• Solution: Can describe the data (this week) but also may want to use the data
to say something about the population (use inferential statistics covered in
Weeks 7/8)
Using evidence to understand the problem
Furniture store case study – customer satisfaction
• Have 3,000 rows of customer satisfaction data in an Excel spreadsheet
• How do we use these data to solve our problem?
• First need to recognize different types of data as that has implications for how
the data are summarized
Survey Completion Date and Time | Sales Assistant | Customer Satisfaction Service | Customer Gender | Number of Items Purchased | Sales Value
12/12/2020 | K Jones | 5. Excellent | Male | 2 | $99.95
12/12/2020 | K Jones | 5. Excellent | Male | 1 | $1,500.00
12/12/2020 | H Smith | 5. Excellent | Female | 3 | $12,000.00
12/12/2020 | H Smith | 3. Good | Female | 1 | $500.00
16/12/2020 | H Smith | 3. Good | Male | 5 | $1,335.00
16/12/2020 | B Clark | 4. Great | Male | 6 | $2,449.95
16/12/2020 | B Clark | 2. Fair | Female | 1 | $129.95
19/12/2020 | B Clark | 2. Fair | Female | 3 | $359.00
20/12/2020 | B Clark | 1. Poor | Male | 4 | $450.00
… | … | … | … | … | …
Types of statistical data
Types of data
[Table: the customer satisfaction data set shown earlier, annotated with the labels 'Variable type' and 'Observations']
A variable (each column represents a variable here) is a characteristic of a population or of a sample from a population
In order to apply statistical analyses directly to qualitative data, we must convert it somehow to quantitative data (e.g. convert customer satisfaction Excellent → 5, Great → 4, Good → 3, Fair → 2, Poor → 1)
A data set contains observations on variables (e.g. the table above shows the customer satisfaction data set).
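The conversion from rating labels to numbers can be sketched as follows. This is a minimal illustration using only the nine ratings visible in the table extract; the names `RATING_SCORE` and `to_score` are made up for this example.

```python
# Mapping stated on the slide: Excellent -> 5, Great -> 4, Good -> 3, Fair -> 2, Poor -> 1
RATING_SCORE = {"Excellent": 5, "Great": 4, "Good": 3, "Fair": 2, "Poor": 1}

# The nine ratings visible in the table extract (the full data set has 3,000 rows).
ratings = [
    "5. Excellent", "5. Excellent", "5. Excellent", "3. Good", "3. Good",
    "4. Great", "2. Fair", "2. Fair", "1. Poor",
]


def to_score(label):
    """Drop the leading '5. ' style prefix and look up the numeric score."""
    name = label.split(". ", 1)[1]
    return RATING_SCORE[name]


scores = [to_score(r) for r in ratings]
print(scores)
```

The scores preserve the ordering of the categories, but treating the gaps between them as equal (as an average does) is a modelling choice rather than a property of the data.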
Types of data
[Figure: time series plot of the total number of customers served by K Jones]
Cross sectional data consist of measurements of one or more concepts at a single point in time
• In July how many customers did each assistant serve?
Time series data consist of measurements of the same concept at different points in time
• The time series plot is a convenient summary but note you have a choice of what level of aggregation to use
• Using monthly data for customers served makes sense as it highlights the end-of-year peak in sales
The type of data influences what sort of analysis and presentation works best
Using evidence to understand the problem
Furniture store case study – customer satisfaction
Are customers satisfied with our sales assistants?
(e.g. H Smith received Excellent, Good ratings from customers; B Clark received
Great, Fair, and Poor ratings from customers, but there are 3,000 observations!)
[Table: the customer satisfaction data set shown earlier]
Solution: Summarise the data, possibly with visualizations!
Using evidence to understand the problem
Furniture store case study – customer satisfaction
Visualising data
[Figure: bar chart of average customer satisfaction (scale 0.00 to 5.00) for K Jones, H Smith and B Clark]
K Jones appears to have the highest average customer satisfaction ratings over time. Visualising data helps us to generate this insight. Now that we’ve summarized the data, do we better understand the problem?
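The kind of summary behind such a chart can be sketched by averaging the numeric ratings for each sales assistant. The example below uses only the nine rows visible in the table extract, so its averages are illustrative and are not the values plotted on the slide.

```python
from collections import defaultdict

# (sales assistant, satisfaction score) for the nine visible rows,
# with Excellent = 5, Great = 4, Good = 3, Fair = 2, Poor = 1.
rows = [
    ("K Jones", 5), ("K Jones", 5),
    ("H Smith", 5), ("H Smith", 3), ("H Smith", 3),
    ("B Clark", 4), ("B Clark", 2), ("B Clark", 2), ("B Clark", 1),
]

# Accumulate [sum of scores, number of surveys] per assistant.
totals = defaultdict(lambda: [0, 0])
for assistant, score in rows:
    totals[assistant][0] += score
    totals[assistant][1] += 1

averages = {name: total / count for name, (total, count) in totals.items()}
for name, avg in averages.items():
    print(f"{name}: {avg:.2f}")
```

Even on this tiny extract the ordering matches the slide's reading: K Jones has the highest average rating.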
Using evidence to understand the problem
Furniture store case study – customer satisfaction
Further notes
• You need to be able to produce graphs as in the previous slide
o See this week’s individual study material & associated tutorials
• Summarising data helps to highlight key features of the data but there are many choices in how this is done
o Some of these are covered next
o COMM1190 builds upon this foundation
• Evidence other than the survey data would also be relevant
o Online reviews (e.g. Google review)
o Interviews and performance reports from managers
Statistical toolbox – Disaggregate the problem with descriptive statistics
• One variable
• Frequency distributions, bar charts, pie charts & histograms
• Shapes of distributions
• Measures of central tendency or location
• Measures of dispersion or spread
• Two variables (mostly done in week 4):
• Scatter plots and cross-tabulations to describe bivariate relations
• Measures of association i.e. correlation and covariance
• Introduction to linear regression
Using evidence to understand the problem
UNSW travel case study
• UNSW routinely surveys staff & students to monitor travel
patterns & trends
• Such data provides evidence to inform operational problem solving &
forward planning
• See 2019 survey results here
• Similar analysis will be provided using the 2011 data
• Frequency distributions, bar charts & pie charts will be used
Frequency distributions, bar charts and pie charts
• Bar chart provides graphical representation of frequency distribution of mode of transport
• 2011 survey: a sample of 5,881 responses
o 47 (0.8%) Resident
o 628 (10.7%) Walk
o 210 (3.6%) Cycle
o 1,032 (17.5%) Car
o 1,188 (20.2%) Bus
o 2,669 (45.4%) Bus and Train
o 107 (1.8%) Other
[Figure: Bar chart of mode of transport to UNSW Campus; x-axis: Resident, Walk, Cycle, Car, Bus, Bus & Train, Other; y-axis: frequency (0 to 3000)]
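The percentages in the frequency distribution are relative frequencies: each count divided by the total number of responses. A short sketch using the 2011 counts from the slide (the variable names are illustrative):

```python
# Counts from the 2011 survey as listed on the slide.
counts = {
    "Resident": 47,
    "Walk": 628,
    "Cycle": 210,
    "Car": 1032,
    "Bus": 1188,
    "Bus and Train": 2669,
    "Other": 107,
}

total = sum(counts.values())
# Relative frequency = count / total number of responses.
relative = {mode: n / total for mode, n in counts.items()}

for mode, n in counts.items():
    print(f"{mode:<14}{n:>6}{relative[mode]:>8.1%}")
print(f"{'Total':<14}{total:>6}")
```

The bar chart plots the counts, while the pie chart plots the relative frequencies; both come from this same table.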
Frequency distributions, bar charts and pie charts
• Pie charts show relative frequencies more explicitly
[Figure: Pie chart of mode of transport to UNSW Campus; Resident 0.8%, Walk 10.7%, Cycle 3.6%, Car 17.5%, Bus 20.2%, Bus & Train 45.4%, Other 1.8%]
Frequency distributions, bar charts and pie charts
Mode of transport by commuter type (frequency)
Mode | Staff | Student | Total
Resident | 0 | 47 | 47
Walk | 97 | 531 | 628
Cycle | 52 | 158 | 210
Car | 472 | 560 | 1032
Bus | 186 | 1002 | 1188
Bus & Train | 230 | 2439 | 2669
Other | 25 | 82 | 107
Total | 1062 | 4819 | 5881
[Figure: bar chart of mode of transport by commuter type; series: Staff, Students, Total; y-axis: frequency (0 to 3000)]
Frequency distributions, bar charts and pie charts
• What does the previous bar graph highlight?
• Is there a better representation?
[Figure: Mode of transport by commuter type - Example 2; bar chart of relative frequency by type (0.00 to 0.60) for Staff and Students; x-axis: Resident, Walk, Bike, Car, Bus, Bus & train, Other]
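One way to read "relative frequency by type" is that each count is divided by the total for its own commuter type, so staff and students can be compared despite very different sample sizes. The sketch below makes that assumption and uses the counts from the cross-tabulation on the earlier slide.

```python
# Counts by commuter type from the cross-tabulation slide.
staff = {"Resident": 0, "Walk": 97, "Cycle": 52, "Car": 472,
         "Bus": 186, "Bus & Train": 230, "Other": 25}
students = {"Resident": 47, "Walk": 531, "Cycle": 158, "Car": 560,
            "Bus": 1002, "Bus & Train": 2439, "Other": 82}


def shares(counts):
    """Divide each count by the group total to get within-group relative frequencies."""
    group_total = sum(counts.values())
    return {mode: n / group_total for mode, n in counts.items()}


staff_share = shares(staff)
student_share = shares(students)

print(f"{'Mode':<12}{'Staff':>8}{'Students':>10}")
for mode in staff:
    print(f"{mode:<12}{staff_share[mode]:>8.3f}{student_share[mode]:>10.3f}")
```

On raw counts students dominate every category simply because there are more of them; on within-group shares, car travel stands out for staff and bus & train for students.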
Using evidence to understand the problem
UNSW travel case study
• Such surveys provided the evidence base supporting the need for light rail to service travel to UNSW
• Will eventually provide evidence about the impact of light rail (a
before & after comparison)
• Need to recognize that there are choices in how the same
data can be summarized
• These choices need to be guided by the problem being solved
• Also need to recognize that data will always have limitations
• Covered in weeks 7 and 8
Statistical toolbox
Furniture store example: ‘Are customers satisfied with our sales
assistants?’
• Different evidence - sales performance - to look at the same question
Disaggregate the problem with descriptive statistics
• Histograms to determine symmetry, skewness, modal classes &
outliers
• Comparing measures of central tendency and spread
Using evidence for a refined problem
Furniture store case study – sales performance
Are sales assistants different in terms of their sales performance?
• Started with general problem of monitoring staff performance
• Initially looked at customer satisfaction but equally important to monitor sales as
a performance measure
• Data are available on the sales amount to individual customers and the number of items sold, so there are choices in what to use
o Could also use these two variables to construct the average purchase amount per customer
• Will develop an evidence base comparing the different sales assistants in terms
of these variables
Using evidence for an extended problem
Furniture store case study – sales performance
Are sales different depending on where customers heard about the store?
• Started with general problem of monitoring staff performance
• Initially looked at customer satisfaction but equally important to monitor sales as
a performance measure & how that relates to marketing
• Data are available from a different survey of customers who purchased furniture
o Focus on actual sales (spend) & amount willing to spend (budget) when entering store
o Could also use these two variables to construct the amount spent as a share of the budget
o Also know where they said they heard about the store
• Develop an evidence base comparing sales in terms of where the customer
heard about the store concentrating on web presence
Histograms
• Suppose data are ordinal (whether discrete or continuous)
o Obvious categories for the data values may not exist
o Can create categories or classes by defining lower & upper class
limits
o Categories need to be mutually exclusive and exhaustive
• How many categories? (Excel calls them bins)
o Too many ➔ doesn’t summarize
o Too few ➔ no information
o No set rules on number of bins, although having more observations
means one generally wants more bins
o Bins need not be of equal width & may be open-ended at the top or
bottom
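The binning step can be sketched in a few lines. The example assumes bins defined by upper class limits with an open-ended "More" class at the top, and a value counted in the first bin whose upper limit it does not exceed. The data are the nine sales values visible in the earlier customer table, used purely for illustration, and the bin width of 1,250 is an assumption for this sketch.

```python
from bisect import bisect_left

# Nine sales values from the customer table extract (illustration only).
sales = [99.95, 1500.00, 12000.00, 500.00, 1335.00, 2449.95, 129.95, 359.00, 450.00]

# Upper class limits: 1250, 2500, ..., 15000, plus an open-ended "More" class.
upper_limits = [1250 * k for k in range(1, 13)]
labels = [str(limit) for limit in upper_limits] + ["More"]

# Each value falls in exactly one class, so the classes are
# mutually exclusive and exhaustive.
frequency = [0] * len(labels)
for value in sales:
    # Index of the first upper limit >= value; len(upper_limits) means "More".
    frequency[bisect_left(upper_limits, value)] += 1

for label, count in zip(labels, frequency):
    print(f"{label:>6} {'#' * count}")
```

With only nine observations most of the twelve bins are empty, which illustrates the "too many bins" problem: fewer, wider bins would summarise this small sample better.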
Histograms
[Figure: Histogram for amount spent by customer; x-axis: spend, bins from 1250 to 15000 in steps of 1250, plus "More"; y-axis: frequency]
[Figure: Histogram for budget of customer; x-axis: budget, bins from 15000 to 195000 in steps of 15000, plus "More"; y-axis: frequency]
Histograms
• Budget histogram is not informative for bulk of data because of several customers with relatively large budgets (outliers)
• Consider trimmed sample excluding 4 largest observations
[Figure: Histogram for budget with trimmed sample; x-axis: budget; y-axis: frequency]
Describing histograms
• Symmetry (or lack thereof)
o Left half of a symmetric histogram is a mirror image of right half
o Famous ‘bell-shaped curve’ (normal distribution) is symmetric
• Skewness
o A feature of an asymmetric histogram
o Long tail to the right: positively skewed
o Long tail to the left: negatively skewed
o May be associated with outliers
• Number of modal classes/bins
o The modal class is the class with highest frequency
o Histograms may be unimodal or multimodal
Describing histograms
• Distribution of budget skewed by outliers
• Distribution of spend/budget has no obvious outliers but is positively skewed
• Notice some customers spend more than their initial budget
[Figure: Histogram for spending as a ratio of budget; x-axis: spendratio, bins from 0.1 to 1.2 in steps of 0.1, plus "More"; y-axis: frequency]
Using evidence for an extended problem
Furniture store case study – sales performance
Further analysis
• Comparing distributions of spend & budget is informative but
further summarization is helpful
• Providing numerical summaries is useful
• How do spend & budget compare “on average”?
• Different summary statistics can answer this type of question
• Most common measures of “location” being the mean & median
Using evidence for an extended problem
Furniture store case study – sales performance
Further analysis
• The means for our sample indicate
• spend: $7529; budget: $22,119; & spend/budget: 0.395
• On average customers who make purchases spend about 40% of
their budget in the store
• From the histograms, budget is very skewed due to large outliers, so does it matter much if we report medians?
• spend: $7000; budget: $21,000; & spend/budget: 0.366
• In each case median<mean indicating some skewness but overall
message unchanged – customers spend a lot less than their budget
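Why a large outlier pulls the mean above the median can be seen on a small example. The sketch below uses the nine sales values visible in the earlier customer table, not the spend and budget survey data, so the numbers differ from those reported above.

```python
import statistics

# Nine sales values from the customer table extract (illustration only).
sales = [99.95, 1500.00, 12000.00, 500.00, 1335.00, 2449.95, 129.95, 359.00, 450.00]

mean_sale = statistics.mean(sales)      # sum of values / number of values
median_sale = statistics.median(sales)  # middle value of the sorted data

print(f"mean   = {mean_sale:.2f}")
print(f"median = {median_sale:.2f}")
# A mean well above the median suggests a long right tail (positive skew).
print("median < mean:", median_sale < mean_sale)
```

Here the single $12,000 sale drags the mean far above the median, which is the same pattern, in exaggerated form, as median<mean in the spend and budget data.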
Using evidence for an extended problem
Furniture store case study – sales performance
Further analysis
• Another characteristic of the distributions is the spread – how
much variation is there in the average sale?
• Most common measure of dispersion or spread is the variance (or the
standard deviation)
• Standard deviations
• spend: $3,939; budget: $15,885; & spend/budget: 0.250
• budget is relatively more dispersed, again because of outliers
• Standard deviation for trimmed sample is only $5,252
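One way to make "relatively more dispersed" concrete is the coefficient of variation (standard deviation divided by the mean). That choice of measure is an assumption made here, not something the slides specify. The first part of the sketch uses the means and standard deviations reported above; the second part shows the effect of trimming on the nine sales values from the earlier customer table, since the survey data themselves are not reproduced in these slides.

```python
import statistics

# (mean, standard deviation) as reported on the slides.
reported = {
    "spend": (7529, 3939),
    "budget": (22119, 15885),
    "spend/budget": (0.395, 0.250),
}

# Coefficient of variation: spread relative to the size of the mean.
cv = {name: sd / mean for name, (mean, sd) in reported.items()}
for name, value in cv.items():
    print(f"{name:<13} CV = {value:.3f}")

# Effect of trimming, illustrated on the nine visible sales values.
sales = [99.95, 1500.00, 12000.00, 500.00, 1335.00, 2449.95, 129.95, 359.00, 450.00]
trimmed = sorted(sales)[:-1]  # drop the single largest observation

sd_full = statistics.stdev(sales)       # sample standard deviation (n - 1 divisor)
sd_trimmed = statistics.stdev(trimmed)
print(f"sd full = {sd_full:.2f}, sd trimmed = {sd_trimmed:.2f}")
```

On the reported figures budget has the largest coefficient of variation, consistent with the slide's remark, and dropping one extreme value sharply reduces the standard deviation in the small example.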
Using evidence for an extended problem
Furniture store case study – sales performance
Are sales different depending on where customers heard
about the store?
• As most interested in marketing via the web define 𝑤𝑒𝑏 = 1 if
customer was aware of store from web search or store
website & zero otherwise
o If 𝑤𝑒𝑏 = 1 then means are spend: $7,481; budget: $23,716
o If 𝑤𝑒𝑏 = 0 then means are spend: $7,547; budget: $21,509
• Customers attracted via the web tend to have larger budgets
but then tend to spend less on average
o Does this present a problem?
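The indicator variable and the comparison of group means can be sketched as below. The source labels "web search" and "store website" are hypothetical codings of the survey answers, and the helper name `web_indicator` is made up for this example. The means are those reported on the slide. Note that the last ratio divides one group mean by another, which is not the same as the mean of the individual spend/budget ratios.

```python
# Hypothetical labels for how a customer heard about the store.
WEB_SOURCES = {"web search", "store website"}


def web_indicator(source):
    """Return 1 if the customer heard about the store via the web, else 0."""
    return 1 if source in WEB_SOURCES else 0


# Sample means reported on the slide, keyed by the value of web.
means = {
    1: {"spend": 7481, "budget": 23716},
    0: {"spend": 7547, "budget": 21509},
}

budget_gap = means[1]["budget"] - means[0]["budget"]  # web minus non-web
spend_gap = means[1]["spend"] - means[0]["spend"]

print(f"budget gap (web - other): {budget_gap}")
print(f"spend gap  (web - other): {spend_gap}")
for web, m in means.items():
    # Ratio of group means, not the mean of individual spend/budget ratios.
    print(f"web = {web}: mean spend / mean budget = {m['spend'] / m['budget']:.3f}")
```

The spend gap is small relative to the spread reported earlier (a standard deviation of $3,939), which is one reason the sample comparison alone cannot settle the question.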
Using evidence for an extended problem
Furniture store case study – sales performance
Are sales different depending on where customers heard
about the store?
• Have stressed that the survey is a sample from the population of sales data
• Can we confidently say that customers attracted via the web tend to have larger budgets but then tend to spend less on average?
• Such a conclusion relates to a comparison of population means
whereas what was provided was a comparison of sample means
• Are differences observed in the sample data “real” or simply a matter
of random variation not related to how the customer became aware of
the store?
Using evidence for an extended problem
Furniture store case study – sales performance
Are sales different depending on where customers heard
about the store?
• Making comparisons of population means is covered in
statistical inference that will be introduced later in the course
• In the language of inference, hypotheses will be developed & tested
• For the moment, our evidence base is descriptive which is useful but
only part of the answer
Re-cap
Defining the problem
• Have stressed problem articulation, especially how to use quantitative evidence to understand what happened
o Think in terms of setting the scene by using data to obtain stylized facts
o Basic descriptive statistics is important here
Problem disaggregation
• Stressed importance of breaking down problems into constituent parts
o These parts become amenable to analysis with some statistical tools that were illustrated
o Yes, you may need to synthesize all the parts but that comes later in the course
If you have any questions about the
course, please email:
comm1110@unsw.edu.au
The lecture recording will be made
available in your Moodle course site.
Thank you