Marketing Analytics with R Disclaimer: • All logos, photos, etc. used in this presentation are the property of their respective copyright owners and are used here for educational purposes only © Stephan Sorger 2013. www.StephanSorger.com; Marketing Analytics: Analytics with R; R.1 Statistical Analysis Software: Introduction Topic Definition Definition Software designed for in-depth analysis Unlike MS Excel (general purpose spreadsheet) Origins SAS conceived in 1966 by Anthony J. Barr Placed statistical procedures in formatted file framewk Uses Advanced statistical techniques Nonlinear functions; Multiple regression; Conjoint Advantages Powerful; Accurate; Specific tools Disadvantages Command line interface; steep learning curve Very expensive © Stephan Sorger 2013. www.StephanSorger.com; Marketing Analytics: Analytics with R; R.2 Statistical Analysis Software: Supplier Companies Topic Definition Statistical Software SAS: Market leader, especially in Fortune 500 SPSS: Strong in education market (IBM) R: Open source Others: StatPac, StatSoft STATISTICA, etc. Business Intelligence Overall Size: 2013: $13.8B; 2016: $17.1B IBM Cognos (2011: 12.1% of market) Microsoft BI (2011: 8.7% of market) Oracle Hyperion 2011 (2011: 15.6% of market) SAP Business Objects (2011: 23.6% of market) SAS Business Intelligence (2011: 12.6% of market) SPSS Modeler (2011: 0.4% of market) Others: GoodData, Panorama, Tableau, etc. Gartner Press Release, “Gartner Says WorldWide Business Intelligence Software Revenue to Grow 7% in 2013.” February 19, 2013. http://www.gartner.com/newsroom/id/2340216 SAS Press Release, “SAS in Leaders Quadrant for Business Intelligence Platforms.” February 3, 2010. http://www.sas.com/news/preleases/biplatformsgartnerleader.html © Stephan Sorger 2013. www.StephanSorger.com; Marketing Analytics: Analytics with R; R.3 Statistical Analysis Software: Supplier Companies Gartner Magic Quadrant Business Intelligence April 2011 (Excerpts) Challengers Leaders Tableau Ability to Execute Microsoft Oracle IBM SAS SAP Actuate Jaspersoft Panorama GoodData Niche Visionaries Completeness of Vision Kalakota, Ravi. PracticalAnalytics.Wordpress.Com.” Gartner Says - BI and Analytics a $12.2B Market.” April 24, 2011. http://practicalanalytics.wordpress.com/2011/04/24/gartner-says-bi-and-analytics-a-10-5-bln-market/ © Stephan Sorger 2013. www.StephanSorger.com; Marketing Analytics: Analytics with R; R.4 Statistical Analysis Software: Supplier Companies High Business Intelligence Technologies TDWI Model Prediction What might happen? Predictive analytics Monitoring What’s happening now? Dashboards, Scorecards Analysis Why did it happen? OLAP, Visualization tools Reporting What Happened? Query, reporting, and search tools Complexity Low Business Value High Kalakota, Ravi. PracticalAnalytics.Wordpress.Com.” Gartner Says - BI and Analytics a $12.2B Market.” April 24, 2011. http://practicalanalytics.wordpress.com/2011/04/24/gartner-says-bi-and-analytics-a-10-5-bln-market/ © Stephan Sorger 2013. www.StephanSorger.com; Marketing Analytics: Analytics with R; R.5 Statistical Analysis Software: Major Suppliers Criteria SAS SPSS R Market Focus User Origins Learning Cost UI Database Graphics Analogy Fortune 500 Power Power user Industry Difficult $86,600/yr+ Command Line 32,768 var. SAS/Graph Microsoft Universities Ease of use Student Education Moderate $16,000/yr+ Point & Click 1 file at a time High quality Apple Universities Price Price-sensitive Open Source Moderate Free Command Line Different packages Linux UCLA, Statistical Software Packages Comparison, ats.ucla.edu: http://www.ats.ucla.edu/stat/mult_pkg/compare_packages.htm MineQuest Business Analytics, “Cost of Licensing WPS 3.0 vs. SAS 9.3.” February 2013. http://www.minequest.com/downloads/Pricing_Comparisons_Between_WPS_and_SAS.pdf IBM SPSS Statistics website, “Buy IBM SPSS Statistics Now” http://www-01.ibm.com/software/analytics/spss/products/statistics/buy-now.html © Stephan Sorger 2013. www.StephanSorger.com; Marketing Analytics: Analytics with R; R.6 R: Introduction Topic Description Description Free statistical computing and graphics software package Widely used among statisticians and data miners Increased popularity in 2010 - on History Started in 1993 Implementation of the S programming language (1976) S offered interactive alternative to Fortran programs S developed by John M. Chambers of Stanford University R developed by Ross Ihaka and Robert Gentleman “R” from Ross & Robert, as well as play on “S” Commercial Revolution Analytics offers enterprise version ($) References: 1. Venables, W.N., Smith, D.M., “An Introduction to R.” Version 3.0.1. May 16, 2013. http://www.cran.r-project.org/doc/manuals/R-intro.pdf © Stephan Sorger 2013. www.StephanSorger.com; Marketing Analytics: Analytics with R; R.7 R: Introduction Topic Description Features Variety of statistical and graphical techniques Distributed through GNU GPL (General Public License) GNU: Gnu’s Not Unix; Recursive acronym Advantages Free Powerful Extensible through functions and extensions R community noted for its active contributions Different graphical user interfaces (GUIs) available Disadvantages Can be slow and memory-hungry Uses command line interpreter; No native GUI References: 1. Venables, W.N., Smith, D.M., “An Introduction to R.” Version 3.0.1. May 16, 2013. http://www.cran.r-project.org/doc/manuals/R-intro.pdf © Stephan Sorger 2013. www.StephanSorger.com; Marketing Analytics: Analytics with R; R.8 R: Basics Topic Description Commands Based on UNIX; case sensitive Commands separated by “;” or by newline Compound expression in braces: “{ and }” Comments designated by hashtag: #Comment Data Structure Vector Assignment: > x <- c (1, 2, 3, 4, 5.8) > : Prompt at beginning of line <- : Assignment operator c() : Function c Class “Numeric”; “Logical”; “Character”; “List” Reading Data “read.table()” function > HousePrice <- read.table(“houses.data”) References: 1. Venables, W.N., Smith, D.M., “An Introduction to R.” Version 3.0.1. May 16, 2013. http://www.cran.r-project.org/doc/manuals/R-intro.pdf © Stephan Sorger 2013. www.StephanSorger.com; Marketing Analytics: Analytics with R; R.9 R: Basics Topic Description Class “Numeric”; “Logical”; “Character”; “List” Reading Data “read.table()” function > HousePrice <- read.table(“houses.data”) Function R features a rich set of functions Statistics functions: mean(x); median(x); range(x); etc. Arithmetic functions: 4^2; log (10); sqrt (16) Plots > hist(x) # generates a default histogram > plot(x,y) # generates a quick x-y plot > quartz(height=4, width=10) # make a wide window References: 1. Venables, W.N., Smith, D.M., “An Introduction to R.” Version 3.0.1. May 16, 2013. http://www.cran.r-project.org/doc/manuals/R-intro.pdf © Stephan Sorger 2013. www.StephanSorger.com; Marketing Analytics: Analytics with R; R.10 R: Getting Started Topic Description Download R Windows: http://cran.r-project.org/bin/windows/base/ Mac: http://cran.r-project.org/bin/macosx/ Launch R Double-click to launch Will see prompt in “R Console” > New Script Select File > New Script Editor will open Arrange Editor window on left; Console on right Untitled—R Editor R Console >| © Stephan Sorger 2013. www.StephanSorger.com; Marketing Analytics: Analytics with R; R.11 R: Getting Started Topic Description Enter Vector <- = “Equal to”; [<- looks like arrow] Example: vector<-c(2, 4, 6, 8) Run Line Execute (run) line Highlight line on R editor; Click on “Run Line” icon;3rd from left Will see “vector” entered in console Open Script Save Script Run Line Return focus to Console Print RGui Icons Untitled—R Editor R Console vector<-c(2,4,6,8) > vector<-c(2,4,6,8) © Stephan Sorger 2013. www.StephanSorger.com; Marketing Analytics: Analytics with R; R.12 R: Getting Started Topic Description Statistics Find statistics mean(vector) <RUN LINE> (mean) var(vector) <RUN LINE> (variance) sd(vector) <RUN LINE> (standard deviation) Untitled—R Editor R Console vector<-c(2,4,6,8) mean(vector) var(vector) sd(vector) > vector<-c(2,4,6,8) > mean(vector) [1] 5 > var(vector) [1] 6.6667 > sd(vector) [1] 2.5819 © Stephan Sorger 2013. www.StephanSorger.com; Marketing Analytics: Analytics with R; R.13 R: Getting Started Topic Description Directory Load data file to R; typically enter as CSV CSV File Comma-Separated Values; “Save As” CSV in Excel Example Datafile.csv A, B, C, D (identifiers) 1, 2, 3, 4 (data for observation #1) 2, 4, 6, 8 (data for observation #2)… Load Data Drag csv file and drop into R Console R will show filepath: “C:\\My Documents\\R Files\\...” Type filename and read command into R Editor Example<-read.csv(“C:\\My Documents…”, header=T); Run Untitled—R Editor Datafile<-read.csv(“C:\\My ..”, header=T) R Console > load(“C:\\My Documents\\...” © Stephan Sorger 2013. www.StephanSorger.com; Marketing Analytics: Analytics with R; R.14 R: Getting Started Topic Description Directory Alternative approach: Set up working directory for dataset Working directory allows shorter filepaths Windows: See “Windows Explorer help” for more info Mac: See “Finder help” for more info Data Structured dataset commands: str; summary; fix str() Structure Shows structure of Datafile; “data.frame: 4 obs. of 4 variables” Summary Shows summary: Min; Max; Mean; Median Fix Shows data structure in matrix form to change (fix) entries summary() fix() Untitled—R Editor R Console str(Datafile) summary(Datafile) fix(Datafile) > (shows structure of datafile) > (shows summary of datafile) > (allows fixing of datafile) © Stephan Sorger 2013. www.StephanSorger.com; Marketing Analytics: Analytics with R; R.15 R: Getting Started Topic Description Help Get help with “read.csv” command ?(read.csv) help(read.csv) Help Results help(read.csv); shows defaults: read.csv(file, header=TRUE, sep=,”, quote=“\”, dec=“.”, fill=TRUE, comment.char=“”, …) Followed by explanations of commands and parameters Untitled—R Editor R Console help(read.csv) <Opens new window with help> © Stephan Sorger 2013. www.StephanSorger.com; Marketing Analytics: Analytics with R; R.16 R: Getting Started Topic Description Packages Load packages when functions are missing Load Select “Packages” Load Package from RGui top menu Select CRAN mirror: USA (CA 1), UK (London), Vietnam, etc. Install Select “Packages” Install Package Select Package from scrolling list: lm() [regression analysis], .. File Edit Packages Untitled—R Editor Windows Help R Console > chooseCRANmirror() > utils::menuInstallPkgs() © Stephan Sorger 2013. www.StephanSorger.com; Marketing Analytics: Analytics with R; R.17 R: Regression Topic Description Data Created dataset “RealData” of real estate values Data captures Price, House Size, and Lot Size for 20 houses Convert data to CSV format; Excel: “Save As” csv Load Drag and drop data into R Console R Console: Copy filepath name R Editor: Paste filepath name; add read.csv command R Editor: Run Line Structure Check structure of dataset str(RealData) ‘data.frame’: 20 observations of 3 variables: Price: num 6 5.8 5.6 …; House: num 6.9 8 …; Lot: num 42.7… Untitled—R Editor R Console RealData<-read.csv(“C:\\My ..”, header=T) > load(“C:\\My Documents\\...” © Stephan Sorger 2013. www.StephanSorger.com; Marketing Analytics: Analytics with R; R.18 R: Regression Topic Description Dependent Set Price equal to Dependent variable Explanatory Price is a function of Explanatory variables House and Lot Equation Price = c1 + c2*(House Size) + c3*(Lot Size) lm Regression analysis in R; stands for Linear Model Syntax lm(Dependent~Independent+Independent, Dataset) Equation lm(Price~House+Lot,RealData) Type into R Editor; Run Line See results in R Console Untitled—R Editor R Console lm(Price~House+Lot,RealData) > lm(Price+House+Lot,RealData) (Intercept) House Lot -0.55415 0.64680 0.02763 © Stephan Sorger 2013. www.StephanSorger.com; Marketing Analytics: Analytics with R; R.19 R: Regression Topic Description Results Compare results from R with those from Excel Method Coefficient House Size Excel -0.554 +0.646 R -0.55415 +0.64680 Lot Size +0.027 +0.02763 Interpretations R results same as those from Excel House size important factor when assessing price Lot size not as important Statistics Option of calculating regression statistics RealReg<-lm(Price~House+Lot,RealData) summary(RealReg) Gives significance codes, R-squared, F-statistics Untitled—R Editor R Console lm(Price~House+Lot,RealData) > lm(Price+House+Lot,RealData) (Intercept) House Lot -0.55415 0.64680 0.02763 © Stephan Sorger 2013. www.StephanSorger.com; Marketing Analytics: Analytics with R; R.20