If you are considering coming to campus, please ensure that you:
• Do not come to campus if you have any symptoms, are unwell, or are considered a close contact of a COVID-19 positive case
• We strongly encourage you to wear a good-quality mask (e.g. N95, if possible). If you forget to bring a face mask to campus, masks can be purchased from the IGA, the Pharmacy or the Post Office.
• Maintain physical distancing and a density of no more than 1 person per 2 sqm indoors
• Wash or sanitise your hands regularly
• Open windows to maximise airflow where possible
• Limit unnecessary movements around campus
More information can be found at: https://www.covid-19.unsw.edu.au

COMM1110 Evidence-Based Problem Solving
Week 2: Problem articulation and disaggregation
Lecturer: Jonathan Lim

General housekeeping:
• Please mute your microphone to avoid disrupting the class
• Use the chat channel to ask questions or make a comment, or raise your 'virtual' hand
• If you have a poor internet connection, turn off your video
• Wait for your lecturer to start

This week will focus on:
Bulletproof Problem Solving framework: problem-solving steps
1. Scoping (define the problem, disaggregate)
2. Analyse (prioritize, workplan, analyse)
3. Decision (synthesize, communicate)
Problem-solving tools
• Logic trees (branch framing, MECE, prioritization)
• Descriptive statistics
• Graphs: frequency distributions, bar charts, pie charts and histograms
Case studies and examples
• Furniture store case study
• UNSW travel
• Birth data

This week’s to-do list:
• Work through the online learning materials for statistical tools
  o Worked example videos on covariance, location, variability and variance standardisation
  o Descriptive statistics: online videos on numerical measures
• Tutorial
  o Tutorial preparation: complete as many of the statistical examples as possible
• Assessments
  o Excel training program: start now
  o Case: start the ‘information toolbox’ by identifying references.
  o The library module provides guidance on search strategies.

Problem articulation and disaggregation
Defining the problem
• The starting point for great problem solving (an important step, covered in tutorials)
• We will build on defining the problem through problem articulation, especially how to use evidence to understand what happened (e.g. summarising and transforming data into useful information), which helps us define the problem
Problem disaggregation
• Taking the problem apart helps us see potential ways to solve it
• Any problem of real consequence is too complicated to solve without breaking it down into logical parts that help us understand the drivers or causes of the situation

Case study – Furniture store
Will we be able to increase furniture sales?

Problem disaggregation: Information toolbox – Logic trees
• Provide a clear visual representation of the problem so we can understand its component parts
• Done correctly, they are holistic (all relevant evidence is captured in the tree), so the right questions can be asked of the problem
• Lead to clear hypotheses (explanations/ideas) that can be tested with evidence

Factor logic trees – Furniture store
Will we be able to increase sales?
• First breakdown: In-store; Online
• Second breakdown:
  o In-store: No. of store locations; No. of product lines in store; Level of customer satisfaction
  o Online: Website usability; No. of delivery options; No. of product lines on website

Logic trees – Extend branch framing
Frame: key elements
• Customer / shareholder / employee: competing perspectives
• Price / volume: Are there different products or commodities? Where is market share? What products are being adopted by customers?
• Regulation / incentives: Will legal regulation, taxation, subsidies or nudging policies change the outcome?
• Equity / liberty: Equality among citizens vs allowing more individual freedom?
• Near / long term: Trade-offs in the immediate future vs decades into the future?
• Financial / non-financial: Benefits of financial vs non-financial outcomes?

Logic trees – Prioritization to the core problem
[Figure: 2 × 2 prioritization matrix with axes "Potential scale of impact" (low to high) and "Ability to influence" (low to high)]
• Focus first on the branches with a high potential scale of impact that we also have a high ability to influence

Statistical toolbox – What evidence do we have to articulate the problem?
• Sample vs population
• Types of statistical data
• Cross-section vs time series

Using evidence to understand the problem
Perhaps sales might increase if customers are satisfied with our sales assistants? It may also be useful to know satisfaction:
• By customer profile (age, gender, individual/business customer, etc.)
• And/or over time: has satisfaction changed?

Furniture store case study – customer satisfaction
• 60,000 customers last year → population (what we are interested in)
• 3,000 customers completed customer satisfaction surveys for sales assistants → sample (the subset of the population we have data for)
• Problem: How can we use sample information to gain insights about the population?
• Solution: We can describe the data (this week), but we may also want to use the data to say something about the population (using inferential statistics, covered in Weeks 7/8)
• We have 3,000 rows of customer satisfaction data in an Excel spreadsheet
• How do we use these data to solve our problem?
• First we need to recognize the different types of data, as this has implications for how the data are summarized
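Returning briefly to problem disaggregation: the furniture store factor logic tree (first breakdown in-store vs online, second breakdown three drivers each) can be held as a nested structure, which makes it easy to list every branch as a separate, testable hypothesis. The Python below is a minimal sketch, assuming the in-store/online grouping of drivers read from the slide; the names `logic_tree`, `leaf_paths` and `duplicate_leaves` are hypothetical. The duplicate-label check is only a rough proxy for the "mutually exclusive" half of MECE; it cannot show that the tree is collectively exhaustive.

```python
# Minimal sketch: the furniture store factor logic tree as nested dicts.
# Each key is a node; an empty dict marks a leaf (a driver we can gather evidence on).
# ASSUMPTION: the grouping of drivers under "In-store" / "Online" follows our reading of the slide.
logic_tree = {
    "Will we be able to increase sales?": {
        "In-store": {
            "No. of store locations": {},
            "No. of product lines in store": {},
            "Level of customer satisfaction": {},
        },
        "Online": {
            "Website usability": {},
            "No. of delivery options": {},
            "No. of product lines on website": {},
        },
    }
}


def leaf_paths(tree, path=()):
    """Yield every root-to-leaf path; each path is one candidate hypothesis to test."""
    for node, children in tree.items():
        current = path + (node,)
        if children:
            yield from leaf_paths(children, current)
        else:
            yield current


def duplicate_leaves(tree):
    """Return leaf labels that appear more than once (a rough overlap check only)."""
    leaves = [p[-1] for p in leaf_paths(tree)]
    return sorted({leaf for leaf in leaves if leaves.count(leaf) > 1})


for p in leaf_paths(logic_tree):
    print(" > ".join(p))
print("Duplicate leaves:", duplicate_leaves(logic_tree) or "none")
```

Each printed path, such as "Will we be able to increase sales? > Online > Website usability", is a branch whose evidence can be gathered and whose priority can then be judged on impact and ability to influence.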
Types of statistical data
Survey completion date and time | Sales assistant | Customer satisfaction (service) | Customer gender | Number of items purchased | Sales value
12/12/2020 | K Jones | 5. Excellent | Male | 2 | $99.95
12/12/2020 | K Jones | 5. Excellent | Male | 1 | $1,500.00
12/12/2020 | H Smith | 5. Excellent | Female | 3 | $12,000.00
12/12/2020 | H Smith | 3. Good | Female | 1 | $500.00
16/12/2020 | H Smith | 3. Good | Male | 5 | $1,335.00
16/12/2020 | B Clark | 4. Great | Male | 6 | $2,449.95
16/12/2020 | B Clark | 2. Fair | Female | 1 | $129.95
19/12/2020 | B Clark | 2. Fair | Female | 3 | $359.00
20/12/2020 | B Clark | 1. Poor | Male | 4 | $450.00
… | … | … | … | … | …
• Each column is a variable: a characteristic of a population or of a sample from a population
• Each row is an observation
• A data set contains observations on variables (e.g. the table above is the customer satisfaction data set)
• To apply statistical analyses directly to qualitative data, we must first convert it to quantitative data (e.g. code customer satisfaction as Excellent → 5, Great → 4, Good → 3, Fair → 2, Poor → 1)

Types of data
[Figure: time series plot of the total number of customers served by K Jones]
• Cross-sectional data consist of measurements of one or more concepts at a single point in time
  o e.g. In July, how many customers did each assistant serve?
• Time series data consist of measurements of the same concept at different points in time
  o The time series plot is a convenient summary, but note that you have a choice of what level of aggregation to use
  o Using monthly data for customers served makes sense, as it highlights the end-of-year peak in sales
The type of data influences what sort of analysis and presentation works best.

Using evidence to understand the problem
Furniture store case study – customer satisfaction
Are customers satisfied with our sales assistants?
• e.g. H Smith received Excellent and Good ratings from customers, while B Clark received Great, Fair and Poor ratings, but there are 3,000 observations!
• Solution: Summarise the data, possibly with visualisations!

Visualising data
[Figure: bar chart of average customer satisfaction (scale 0 to 5) for K Jones, H Smith and B Clark]
• K Jones appears to have the highest average customer satisfaction rating over time. Visualising the data helps us to generate this insight.
• Now that we have summarised the data, do we better understand the problem?
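The average-satisfaction chart rests on two steps from this section: coding the qualitative ratings as numbers (Excellent → 5 down to Poor → 1) and averaging the codes for each sales assistant. The Python sketch below applies both steps to the nine rows shown in the survey table only, so it is an illustration rather than the spreadsheet analysis behind the chart; the names `SCORE`, `rows` and `average_satisfaction` are hypothetical. Averaging ordinal codes also assumes the gaps between adjacent ratings are equal, which is a modelling choice.

```python
from collections import defaultdict

# Coding scheme from the lecture: qualitative rating -> quantitative score
SCORE = {"Excellent": 5, "Great": 4, "Good": 3, "Fair": 2, "Poor": 1}

# (sales assistant, rating) for the nine rows shown in the survey table only
rows = [
    ("K Jones", "Excellent"), ("K Jones", "Excellent"),
    ("H Smith", "Excellent"), ("H Smith", "Good"), ("H Smith", "Good"),
    ("B Clark", "Great"), ("B Clark", "Fair"), ("B Clark", "Fair"),
    ("B Clark", "Poor"),
]


def average_satisfaction(rows):
    """Mean coded satisfaction score per sales assistant."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for assistant, rating in rows:
        totals[assistant] += SCORE[rating]  # convert the label to its numeric code
        counts[assistant] += 1
    return {assistant: totals[assistant] / counts[assistant] for assistant in totals}


averages = average_satisfaction(rows)
# Highest average first, as in a sorted bar chart
for assistant, avg in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{assistant}: {avg:.2f}")
```

On these nine rows it prints K Jones: 5.00, H Smith: 3.67 and B Clark: 2.25. Given all 3,000 rows, the same function would simply receive more tuples.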
Further notes
• You need to be able to produce graphs such as the one above
  o See this week’s individual study material and associated tutorials
• Summarising data helps to highlight key features of the data, but there are many choices in how this is done
  o Some of these are covered next
  o COMM1190 builds upon this foundation
• Evidence other than the survey data would also be relevant
  o Online reviews (e.g. Google reviews)
  o Interviews and performance reports from managers

Statistical toolbox – Disaggregate the problem with descriptive statistics
• One variable
  o Frequency distributions, bar charts, pie charts and histograms
  o Shapes of distributions
  o Measures of central tendency or location
  o Measures of dispersion or spread
• Two variables (mostly covered in Week 4)
  o Scatter plots and cross-tabulations to describe bivariate relationships
  o Measures of association, i.e. correlation and covariance
  o Introduction to linear regression

Using evidence to understand the problem
UNSW travel case study
• UNSW routinely surveys staff and students to monitor travel patterns and trends
• Such data provide evidence to inform operational problem solving and forward planning
• See 2019 survey results here
• A similar analysis is presented using the 2011 data, with frequency distributions, bar charts and pie charts

Frequency distributions, bar charts and pie charts
• A bar chart provides a graphical representation of the frequency distribution of mode of transport
• The 2011 survey is a sample of 5,881 responses:
  o Resident: 47 (0.8%)
  o Walk: 628 (10.7%)
  o Cycle: 210 (3.6%)
  o Car: 1,032 (17.5%)
  o Bus: 1,188 (20.2%)
  o Bus & train: 2,669 (45.4%)
  o Other: 107 (1.8%)
[Figure: bar chart of mode of transport to UNSW campus (frequency by mode)]
• Pie charts show relative frequencies more
explicitly.
[Figure: pie chart of mode of transport to UNSW campus: Bus & train 45.4%, Bus 20.2%, Car 17.5%, Walk 10.7%, Cycle 3.6%, Other 1.8%, Resident 0.8%]

Mode of transport by commuter type (frequency)
Mode | Staff | Student | Total
Resident | 0 | 47 | 47
Walk | 97 | 531 | 628
Cycle | 52 | 158 | 210
Car | 472 | 560 | 1,032
Bus | 186 | 1,002 | 1,188
Bus & train | 230 | 2,439 | 2,669
Other | 25 | 82 | 107
Total | 1,062 | 4,819 | 5,881
[Figure: bar chart of mode of transport by commuter type (frequency for staff, students and total)]
• What does the previous bar graph highlight? Because there are far more student responses (4,819) than staff responses (1,062), the raw frequencies largely reflect group size.
• Is there a better representation? Relative frequencies within each commuter type make staff and students directly comparable.
[Figure: mode of transport by commuter type, example 2: relative frequency within each type (staff, students) by mode]

Using evidence to understand the problem
UNSW travel case study
• Such surveys provided an evidence base supporting the need for light rail to service travel to UNSW
• They will eventually provide evidence about the impact of light rail (a before-and-after comparison)
• We need to recognize that there are choices in how the same data can be summarized
  o These choices need to be guided by the problem being solved
• We also need to recognize that data will always have limitations
  o Covered in Weeks 7 and 8

Statistical toolbox
Furniture store example: ‘Are customers satisfied with our sales assistants?’
• Different evidence (sales performance) can be used to look at the same question
Disaggregate the problem with descriptive statistics
• Histograms to determine symmetry, skewness, modal classes and outliers
• Comparing measures of central tendency and spread

Using evidence for a refined problem
Furniture store case study – sales performance
Are sales assistants different in terms of their sales performance?
• We started with the general problem of monitoring staff performance
• We initially looked at customer satisfaction, but it is equally important to monitor sales as a performance measure
• Data are available on the sales amount to individual customers and the number of items sold, so there are choices in what to use
  o We could also use these two variables to construct the average purchase amount per customer
• We will develop an evidence base comparing the different sales assistants in terms of these variables

Using evidence for an extended problem
Furniture store case study – sales performance
Are sales different depending on where customers heard about the store?
• We started with the general problem of monitoring staff performance
• We initially looked at customer satisfaction, but it is equally important to monitor sales as a performance measure, and how that relates to marketing
• Data are available from a different survey of customers who purchased furniture
  o Focus on actual sales (spend) and the amount customers were willing to spend (budget) when entering the store
  o We could also use these two variables to construct the amount spent as a share of the budget
  o We also know where customers said they heard about the store
• We will develop an evidence base comparing sales by where the customer heard about the store, concentrating on web presence

Histograms
• Suppose the data are numerical (whether discrete or continuous)
  o Obvious categories for the data values may not exist
  o We can create categories or classes by defining lower and upper class limits
  o Categories need to be mutually exclusive and exhaustive
• How many categories?
(Excel calls them bins)
  o Too many ➔ doesn’t summarize
  o Too few ➔ not enough information
  o There are no set rules on the number of bins, although more observations generally warrant more bins
  o Bins need not be of equal width and may be open-ended at the top or bottom

Histograms
[Figure: histogram of amount spent by customer (spend, frequency)]
[Figure: histogram of budget of customer (budget, frequency)]
• The budget histogram is not informative for the bulk of the data because several customers have relatively large budgets (outliers)
• Consider a trimmed sample excluding the 4 largest observations
[Figure: histogram of budget for the trimmed sample]

Describing histograms
• Symmetry (or lack thereof)
  o The left half of a symmetric histogram is a mirror image of the right half
  o The famous ‘bell-shaped curve’ (normal distribution) is symmetric
• Skewness
  o A feature of an asymmetric histogram
  o Long tail to the right: positively skewed
  o Long tail to the left: negatively skewed
  o May be associated with outliers
• Number of modal classes/bins
  o The modal class is the class with the highest frequency
  o Histograms may be unimodal or multimodal

[Figure: histogram of spending as a ratio of budget (spendratio), bins from 0.1 to 1.2 plus a "More" bin]
• Notice that some customers spend more than their initial budget
• The distribution of budget is skewed by outliers
• The distribution of spend/budget has no obvious outliers but is positively skewed

Using evidence for an extended problem
Furniture store case study – sales performance
Further analysis
• Comparing the distributions of spend and budget is informative, but further summarization is helpful
• Providing numerical summaries is useful
  o How do spend and budget compare “on average”?
  o Different summary statistics can answer this type of question
  o The most common measures of “location” are the mean and the median
• The means for our sample are spend: $7,529; budget: $22,119; spend/budget: 0.395
  o On average, customers who make purchases spend about 40% of their budget in the store
• The histograms show that budget is very skewed by large outliers, so does it matter much if we report medians instead?
  o Medians: spend: $7,000; budget: $21,000; spend/budget: 0.366
  o In each case median < mean, indicating some skewness, but the overall message is unchanged: customers spend a lot less than their budget
• Another characteristic of the distributions is their spread: how much variation is there around the average?
  o The most common measure of dispersion or spread is the variance (or its square root, the standard deviation)
  o Standard deviations: spend: $3,939; budget: $15,885; spend/budget: 0.250
  o Budget is relatively more dispersed, again because of outliers; the standard deviation of budget for the trimmed sample is only $5,252

Are sales different depending on where customers heard about the store?
• As we are most interested in marketing via the web, define 𝑤𝑒𝑏 = 1 if the customer became aware of the store from a web search or the store website, and 𝑤𝑒𝑏 = 0 otherwise
  o If 𝑤𝑒𝑏 = 1, the means are spend: $7,481; budget: $23,716
  o If 𝑤𝑒𝑏 = 0, the means are spend: $7,547; budget: $21,509
• Customers attracted via the web tend to have larger budgets but then tend to spend less on average
  o Does this present a problem?
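Each summary in this section (bin frequencies, mean vs median, standard deviation, and means split by the 𝑤𝑒𝑏 indicator) is a short computation. Because the survey data are not reproduced here, the Python sketch below runs these computations on five invented customers; the records, the bin limits and the function names `bin_counts` and `group_means` are all hypothetical. Binning counts each value in the first bin whose upper limit it does not exceed, with an open-ended "More" bin for anything above the last limit, mirroring the "More" bins in the histograms above. It uses the standard-library `statistics` module; `statistics.stdev` is the sample (n − 1) standard deviation, and the lecture does not state which version its figures use.

```python
import statistics

# HYPOTHETICAL records (spend $, budget $, web flag). These five customers are
# invented for illustration; they are NOT the lecture's survey data.
customers = [
    (4000, 10000, 1),
    (6000, 15000, 0),
    (7000, 20000, 1),
    (9000, 18000, 0),
    (12000, 60000, 1),  # a large budget acting like an outlier
]

spend = [s for s, _, _ in customers]
budget = [b for _, b, _ in customers]
ratio = [s / b for s, b, _ in customers]  # amount spent as a share of budget


def bin_counts(values, upper_limits):
    """Frequency per bin: each value goes in the first bin whose upper limit is >= value;
    values above the last limit go in an open-ended 'More' bin."""
    counts = {limit: 0 for limit in upper_limits}
    counts["More"] = 0
    for v in values:
        for limit in upper_limits:  # assumes upper_limits is sorted ascending
            if v <= limit:
                counts[limit] += 1
                break
        else:
            counts["More"] += 1
    return counts


def group_means(records, web_flag):
    """Mean spend and mean budget for customers with the given web flag (1 or 0)."""
    group = [(s, b) for s, b, w in records if w == web_flag]
    return (statistics.mean([s for s, _ in group]),
            statistics.mean([b for _, b in group]))


ratio_bins = bin_counts(ratio, [0.25, 0.5, 0.75, 1.0])
print("spend/budget bins:", ratio_bins)

# Location: mean vs median (median < mean hints at positive skew)
print(f"budget mean {statistics.mean(budget):,.0f}, median {statistics.median(budget):,.0f}")
print(f"spend/budget mean {statistics.mean(ratio):.3f}, median {statistics.median(ratio):.3f}")

# Spread: sample standard deviation (n - 1 denominator)
print(f"spend sd {statistics.stdev(spend):,.0f}")

# Group means split by the web indicator
for flag in (1, 0):
    s_mean, b_mean = group_means(customers, flag)
    print(f"web={flag}: mean spend {s_mean:,.0f}, mean budget {b_mean:,.0f}")
```

With these made-up numbers the single large budget pulls the mean budget (24,600) above the median (18,000), the same median < mean pattern described for the real data; the figures themselves carry no evidential weight, and whether a difference in group means is "real" is the inference question raised next.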
• We have stressed that the survey is a sample from the population of sales data
• Can we confidently say that customers attracted via the web tend to have larger budgets but then tend to spend less on average?
  o Such a conclusion relates to a comparison of population means, whereas what we provided was a comparison of sample means
  o Are the differences observed in the sample data “real”, or simply a matter of random variation unrelated to how the customer became aware of the store?
• Comparing population means is covered in statistical inference, which will be introduced later in the course
• In the language of inference, hypotheses will be developed and tested
• For the moment, our evidence base is descriptive, which is useful but only part of the answer

Re-cap
Defining the problem
• We have stressed problem articulation, especially how to use quantitative evidence to understand what happened
  o Think in terms of setting the scene by using data to obtain stylized facts
  o Basic descriptive statistics are important here
Problem disaggregation
• We have stressed the importance of breaking problems down into their constituent parts
  o These parts become amenable to analysis with the statistical tools illustrated
  o Yes, you may need to synthesize all the parts, but that comes later in the course

If you have any questions about the course, please email: comm1110@unsw.edu.au
The lecture recording will be made available on your Moodle course site.
Thank you