Introduction to Data Science and Analytics
Stephan Sorger
www.StephanSorger.com
Unit 1. Introduction

Disclaimer:
• All images such as logos, photos, etc. used in this presentation are the property of their respective copyright owners and are used here for educational purposes only.
• Some material adapted from: Sorger, Stephan. "Marketing Analytics: Strategic Models and Metrics." Admiral Press, 2013.

© Stephan Sorger 2016; www.StephanSorger.com; Data Science: Introduction

Outline/ Learning Objectives
• Definition: Applying data science to gain valuable insight
• Topics: Specific areas covered in the course
• Trends: Timely trends driving adoption of data science
• Decision Models: Definition; Terminology; Forms; Types
• Predictive Analytics: Applications; Methods

Data Science: Introduction
• Definition: Application of technologies, techniques, and tools to data to provide actionable insight
• Coverage:
  - Excel 1: Essentials: Formulas, Charts, Tips and Tricks
  - Excel 2: Tools: Solver, Statistics, etc.
  - Excel 3: Regression: R-squared, F-tests, t-tests, p-values
  - Excel 4: Forecasting: Time series; Multivariate
  - SQL (and Excel): Dipping into back-end databases
  - R Basics: Basic commands; Regression
  - R Applications: Segmentation

Trends Driving Data Science Adoption
• Accountability: Improve productivity; Reduce costs; "What gets measured gets done"
• Online Data Availability: Cloud-based data storage; Online = speed; Online = convenience
• Data-Driven Presentations: Data to back up proposals; Predict success of plans
• Massive Data: Initiatives to capture customer information; What to do with all that data?
• Reduced Resources:
  Do more with less; Scrutinized budgets; Scientists must show outcomes

Data Scientists, Based on "Big Bang Theory" Characters
• Dr. Sheldon Cooper (Theoretical Physicist): Theoretical Data Scientist; Machine learning; AI
• Dr. Leonard Hofstadter (Experimental Physicist): General Data Scientist; Data mining
• Howard Wolowitz (Engineer, Applied Physics): Data Scientist/ Engineer; Software development; Browser technology

Decision Models: Definition
• Model: Simplified representation of reality used to solve problems; evaluates the effect of changes in input variables; provides guidance on business decisions
• Example: Model showing changes in sales as we increase the number of product features
[Figure: Sales Revenue vs. Features. Adding features to the product raises sales revenue, but too many features lead to feature bloat, and sales can actually start to decrease.]

Decision Models: Styles
• Verbal: Expressed in words, e.g., "Sales is influenced by product features"
• Pictorial: Expressed in pictures, e.g., a chart or graph of the phenomenon
• Mathematical: Expressed in an equation, e.g., Sales = a + b * Features
[Figure: the same relationship, Sales = f(Features), shown in verbal, pictorial, and mathematical form]

Decision Models: Forms
• Descriptive: Characterize (describe) a phenomenon; identify causal relationships and relevant variables
  - Example: Descriptive equation: Sales = a*Features + b*Advertising + c*…
• Predictive: Determine likely outcomes given certain inputs; the classic "What If?" spreadsheet exercise
  - Example: Spreadsheet to test different scenarios; what if we increase the budget?
• Normative: Decide the best course of action to maximize an objective, given fixed constraints; "Given X, what should I do?"
  - Example: Linear programming model
[Figure: descriptive model (Features, Ads), predictive model, and normative model ("This Way")]

Terminology: Linear Equation
Y = a + b*X
• Y = Dependent variable; response/output (vertical axis)
• X = Independent variable; input (horizontal axis)
• a = Parameter: Y-intercept, the value of Y when X = 0
• b = Parameter: Slope = rise/run = b/1, so Y rises by b for each 1-unit increase in X

Decision Models: Variables
• Variable: Quantity that can be changed, or varied; examples: advertising budget, sales
• Independent Variable: Variable whose value affects the dependent variable
  - Controllable: Product features; number of emails sent
  - Non-controllable: Customer age; interest rates
• Dependent Variable: Variable representing the response (Y, or output); responds to changes in the independent variable
  - What we want to produce; our objective: sales, customer adoption, …
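As a rough illustration of the linear equation Y = a + b*X and the predictive ("What If?") form, the sketch below fits the intercept a and slope b by ordinary least squares and then predicts sales for a new number of features. The data, the helper names fit_line and predict, and the choice of least-squares fitting are assumptions made for illustration; the course itself covers regression in Excel and R.

```python
# Minimal sketch: fit the linear model Y = a + b*X by ordinary least squares,
# then use it as a predictive ("What If?") model.
# The data below are hypothetical, for illustration only.

def fit_line(xs, ys):
    """Return (a, b) for Y = a + b*X: a is the Y-intercept, b is the slope."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope b = covariance(X, Y) / variance(X): the "rise over run" of the best-fit line
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # The least-squares line passes through (mean_x, mean_y), so a = mean_y - b*mean_x
    a = mean_y - b * mean_x
    return a, b


def predict(a, b, x):
    """Dependent variable (response) for a given independent variable (input)."""
    return a + b * x


# Hypothetical data: X = number of product features, Y = sales revenue ($K)
features = [2, 4, 6, 8, 10]
sales = [110, 130, 150, 170, 190]

a, b = fit_line(features, sales)
print(f"Sales = {a:.1f} + {b:.1f} * Features")
print("What if we ship 12 features?", predict(a, b, 12))
```

With this toy data the fit gives a = 90 and b = 10, so the "what if" prediction for 12 features is 210. Because the model is a straight line, it keeps predicting higher sales as features increase; it cannot capture the feature-bloat downturn described in the Decision Models example, which would need a curved (for example, quadratic) model.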
Data Science: Predictive Analytics
Trends Driving Predictive Analytics
• Technology: Cloud computing; Cheap storage
• Growth Demands: Looking for growth opportunities
• Data Availability: Terabytes of customer data
• Competitive Advantage: Powerful tool to target niches

Data Science: Predictive Analytics
Predictive Analytics Applications
• Airlines: Predict maintenance before failure
• Banking: FICO scores
• Customer Profitability: Identify profitable customers
• Collections: Predict which customers will pay
• Fraud Detection: Predict fraudulent claims
• Healthcare: Predict at-risk patients
• Cross-Selling: "Customers who bought X bought Y"
• Insurance: Assign prices to policies

Data Science: Data Mining
• Selection: Select the portion of data to target
• Pre-Processing: Data cleansing; removing duplicate records
• Transformation: Sorting; pivoting; aggregation; merging
• Data Mining: Find patterns in the data
• Interpretation: Form judgments based on the patterns
Flow: Data → Selection → Target Data → Pre-Processing → Preprocessed Data → Transformation → Transformed Data → Data Mining → Patterns → Interpretation → Actionable Information

Data Science: Data Mining Approaches
• Association Rule Learning: Search for associations in data, e.g., products purchased together
• Classification: Sort data into different categories, using prior knowledge of the patterns, e.g., spam filtering
• Clustering: Identify patterns in data with no prior knowledge of the patterns
• Regression: Find relationships between variables
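The five data mining steps (selection, pre-processing, transformation, data mining, interpretation) can be sketched end to end on a toy purchase log. This is a minimal sketch under stated assumptions: the records, the region filter, the helper names (select, preprocess, transform, mine, interpret), and the 50% minimum-support threshold are all hypothetical, and the mining step is a simplified form of association rule learning that only counts how often product pairs appear in the same order (their support), not a full algorithm such as Apriori.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical raw records: (order_id, region, product). Note the duplicate row.
RAW = [
    (1, "West", "laptop"), (1, "West", "mouse"),
    (1, "West", "mouse"),  # duplicate record
    (2, "West", "laptop"), (2, "West", "mouse"), (2, "West", "bag"),
    (3, "West", "phone"), (3, "West", "case"),
    (4, "East", "laptop"), (4, "East", "mouse"),
    (5, "West", "laptop"), (5, "West", "bag"),
]


def select(records, region):
    """Selection: keep only the portion of data we want to target."""
    return [r for r in records if r[1] == region]


def preprocess(records):
    """Pre-processing: remove duplicate records, keeping first-occurrence order."""
    return list(dict.fromkeys(records))


def transform(records):
    """Transformation: aggregate line items into one basket (set of products) per order."""
    baskets = defaultdict(set)
    for order_id, _region, product in records:
        baskets[order_id].add(product)
    return baskets


def mine(baskets):
    """Data mining: count how often each pair of products appears in the same basket."""
    pair_counts = Counter()
    for items in baskets.values():
        for pair in combinations(sorted(items), 2):
            pair_counts[pair] += 1
    return pair_counts


def interpret(pair_counts, n_baskets, min_support=0.5):
    """Interpretation: keep pairs bought together in at least min_support of baskets."""
    return {pair: count / n_baskets for pair, count in pair_counts.items()
            if count / n_baskets >= min_support}


baskets = transform(preprocess(select(RAW, "West")))
rules = interpret(mine(baskets), len(baskets))
print(rules)
```

For the West region, pre-processing drops the duplicate row, the remaining nine rows become four order baskets, and the pairs (laptop, mouse) and (bag, laptop) each appear in two of the four baskets, so both pass the 50% threshold. In a real project the selection and transformation steps would typically run in SQL or Excel, as covered in the course, before any pattern search.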