Marketing Analytics with R

Disclaimer:
• All logos, photos, etc. used in this presentation are the property of their respective
copyright owners and are used here for educational purposes only
© Stephan Sorger 2013. www.StephanSorger.com; Marketing Analytics: Analytics with R; R.1
Statistical Analysis Software: Introduction
Topic
Definition
Definition
Software designed for in-depth analysis
Unlike MS Excel (general purpose spreadsheet)
Origins
SAS conceived in 1966 by Anthony J. Barr
Placed statistical procedures in formatted file framework
Uses
Advanced statistical techniques
Nonlinear functions; Multiple regression; Conjoint
Advantages
Powerful; Accurate; Specific tools
Disadvantages
Command line interface; steep learning curve
Very expensive
Statistical Analysis Software: Supplier Companies
Topic
Definition
Statistical Software
SAS: Market leader, especially in Fortune 500
SPSS: Strong in education market (IBM)
R: Open source
Others: StatPac, StatSoft STATISTICA, etc.
Business Intelligence
Overall Size: 2013: $13.8B; 2016: $17.1B
IBM Cognos (2011: 12.1% of market)
Microsoft BI (2011: 8.7% of market)
Oracle Hyperion (2011: 15.6% of market)
SAP Business Objects (2011: 23.6% of market)
SAS Business Intelligence (2011: 12.6% of market)
SPSS Modeler (2011: 0.4% of market)
Others: GoodData, Panorama, Tableau, etc.
Gartner Press Release, “Gartner Says Worldwide Business Intelligence Software Revenue to Grow 7% in 2013.” February 19, 2013.
http://www.gartner.com/newsroom/id/2340216
SAS Press Release, “SAS in Leaders Quadrant for Business Intelligence Platforms.” February 3, 2010.
http://www.sas.com/news/preleases/biplatformsgartnerleader.html
Statistical Analysis Software: Supplier Companies
[Figure: Gartner Magic Quadrant, Business Intelligence, April 2011 (excerpts). Horizontal axis: Completeness of Vision; vertical axis: Ability to Execute. Quadrants: Challengers, Leaders, Niche, Visionaries. Vendors shown: Tableau, Microsoft, Oracle, IBM, SAS, SAP, Actuate, Jaspersoft, Panorama, GoodData.]
Kalakota, Ravi. PracticalAnalytics.Wordpress.com. “Gartner Says - BI and Analytics a $12.2B Market.” April 24, 2011.
http://practicalanalytics.wordpress.com/2011/04/24/gartner-says-bi-and-analytics-a-10-5-bln-market/
Statistical Analysis Software: Supplier Companies
[Figure: Business Intelligence Technologies, TDWI Model. Vertical axis: Complexity (low to high); horizontal axis: Business Value (low to high). Stages, in order of increasing complexity and business value:]
Reporting: What happened? Query, reporting, and search tools
Analysis: Why did it happen? OLAP, visualization tools
Monitoring: What’s happening now? Dashboards, scorecards
Prediction: What might happen? Predictive analytics
Kalakota, Ravi. PracticalAnalytics.Wordpress.com. “Gartner Says - BI and Analytics a $12.2B Market.” April 24, 2011.
http://practicalanalytics.wordpress.com/2011/04/24/gartner-says-bi-and-analytics-a-10-5-bln-market/
Statistical Analysis Software: Major Suppliers
Criteria    SAS             SPSS                R
Market      Fortune 500     Universities        Universities
Focus       Power           Ease of use         Price
User        Power user      Student             Price-sensitive
Origins     Industry        Education           Open Source
Learning    Difficult       Moderate            Moderate
Cost        $86,600/yr+     $16,000/yr+         Free
UI          Command Line    Point & Click       Command Line
Database    32,768 var.     1 file at a time
Graphics    SAS/Graph       High quality        Different packages
Analogy     Microsoft       Apple               Linux
UCLA, Statistical Software Packages Comparison, ats.ucla.edu:
http://www.ats.ucla.edu/stat/mult_pkg/compare_packages.htm
MineQuest Business Analytics, “Cost of Licensing WPS 3.0 vs. SAS 9.3.” February 2013.
http://www.minequest.com/downloads/Pricing_Comparisons_Between_WPS_and_SAS.pdf
IBM SPSS Statistics website, “Buy IBM SPSS Statistics Now”
http://www-01.ibm.com/software/analytics/spss/products/statistics/buy-now.html
R: Introduction
Topic
Description
Description
Free statistical computing and graphics software package
Widely used among statisticians and data miners
Popularity has increased since 2010
History
Started in 1993
Implementation of the S programming language (1976)
S offered interactive alternative to Fortran programs
S developed by John M. Chambers and colleagues at Bell Laboratories
R developed by Ross Ihaka and Robert Gentleman
“R” from Ross & Robert, as well as play on “S”
Commercial
Revolution Analytics offers enterprise version ($)
References:
1. Venables, W.N., Smith, D.M., “An Introduction to R.” Version 3.0.1. May 16, 2013.
http://www.cran.r-project.org/doc/manuals/R-intro.pdf
R: Introduction
Topic
Description
Features
Variety of statistical and graphical techniques
Distributed through GNU GPL (General Public License)
GNU: “GNU’s Not Unix”; recursive acronym
Advantages
Free
Powerful
Extensible through functions and packages
R community noted for its active contributions
Different graphical user interfaces (GUIs) available
Disadvantages
Can be slow and memory-hungry
Uses command line interpreter; No native GUI
R: Basics
Topic
Description
Commands
Case sensitive, like most UNIX-based packages
Commands separated by “;” or by newline
Compound expressions grouped in braces: “{” and “}”
Comments begin with a hash mark: #Comment
Data Structure
Vector Assignment:
> x <- c(1, 2, 3, 4, 5.8)
> : Prompt at beginning of line
<- : Assignment operator
c() : Combine function; concatenates its arguments into a vector
Class
“numeric”; “logical”; “character”; “list”
Reading Data
“read.table()” function
> HousePrice <- read.table("houses.data")
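The commands, classes, and read.table() call above can be tried in a short script. This is a minimal sketch: the file contents and column names written to the temporary file are made up for illustration and are not the houses.data file from the R manual.

```r
# Vector assignment with the c() (combine) function
x <- c(1, 2, 3, 4, 5.8)

# Inspect the class of different kinds of objects
class(x)               # numeric vector
class(x > 2)           # logical vector
class(c("a", "b"))     # character vector
class(list(1, "a"))    # list

# Two commands on one line, separated by ";"
y <- x * 2; length(y)

# Hypothetical data file, written to a temporary location for this sketch
tmp <- tempfile(fileext = ".data")
writeLines(c("Price Floor Area",
             "52.00 111.0 830",
             "54.75 128.0 710"), tmp)

# read.table() returns a data frame; header = TRUE uses the first line as names
HousePrice <- read.table(tmp, header = TRUE)
str(HousePrice)
```

Passing header = TRUE is a choice made for this sketch because the made-up file starts with a line of column names.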
R: Basics
Topic
Description
Function
R features a rich set of functions
Statistics functions: mean(x); median(x); range(x); etc.
Arithmetic operators and functions: 4^2; log(10); sqrt(16)
Plots
> hist(x)
# generates a default histogram
> plot(x,y)
# generates a quick x-y plot
> quartz(height=4, width=10)
# make a wide window (macOS only; on Windows use windows(height=4, width=10))
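A short sketch of the statistics, arithmetic, and plotting functions listed above. The values of x and y are hypothetical, and the call to pdf(NULL) is an addition for this sketch so that the plots can be drawn without a display; in an interactive RGui session it can be left out.

```r
# Hypothetical data for illustration
x <- c(2, 4, 6, 8, 10)
y <- c(1, 3, 2, 5, 4)

# Statistics functions
mean(x); median(x); range(x)

# Arithmetic: ^ is an operator; log() is the natural logarithm
4^2; log(10); sqrt(16)

# Open a null graphics device so the plots need no screen (assumption of this sketch)
pdf(NULL)
hist(x)       # default histogram
plot(x, y)    # quick x-y scatter plot
dev.off()     # close the device
```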
R: Getting Started
Topic
Description
Download R
Windows:
http://cran.r-project.org/bin/windows/base/
Mac:
http://cran.r-project.org/bin/macosx/
Launch R
Double-click to launch
Will see prompt in “R Console”
>
New Script
Select File > New Script
Editor will open
Arrange Editor window on left; Console on right
[Screenshot: R Editor window (“Untitled”) on the left; R Console window showing the > prompt on the right]
R: Getting Started
Topic
Description
Enter Vector
<- : assignment operator, read as “gets”; looks like an arrow
Example: vector<-c(2, 4, 6, 8)
Run Line
Execute (run) line
Highlight line in R Editor; click the “Run Line” icon (3rd from left)
The command appears in the console
RGui icons (left to right): Open Script; Save Script; Run Line; Return focus to Console; Print
R Editor (Untitled):
vector<-c(2,4,6,8)
R Console:
> vector<-c(2,4,6,8)
R: Getting Started
Topic
Description
Statistics
Find statistics
mean(vector) <RUN LINE> (mean)
var(vector) <RUN LINE> (variance)
sd(vector) <RUN LINE> (standard deviation)
R Editor (Untitled):
vector<-c(2,4,6,8)
mean(vector)
var(vector)
sd(vector)
R Console:
> vector<-c(2,4,6,8)
> mean(vector)
[1] 5
> var(vector)
[1] 6.666667
> sd(vector)
[1] 2.581989
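To see where these numbers come from, the same statistics can be computed by hand. This is a sketch for illustration; the helper names n, m, v, and s are not part of the original slides. Note that var() and sd() use the sample formulas, dividing by n - 1 rather than n.

```r
vector <- c(2, 4, 6, 8)

n <- length(vector)
m <- sum(vector) / n                   # mean: 20 / 4
v <- sum((vector - m)^2) / (n - 1)     # sample variance: (9 + 1 + 1 + 9) / 3
s <- sqrt(v)                           # sample standard deviation

# Compare the hand calculations with the built-in functions
all.equal(m, mean(vector))
all.equal(v, var(vector))
all.equal(s, sd(vector))
```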
R: Getting Started
Topic
Description
Directory
Load data file to R; typically enter as CSV
CSV File
Comma-Separated Values; “Save As” CSV in Excel
Example
Datafile.csv
A, B, C, D (identifiers)
1, 2, 3, 4 (data for observation #1)
2, 4, 6, 8 (data for observation #2)…
Load Data
Drag csv file and drop into R Console
R will show filepath: “C:\\My Documents\\R Files\\...”
Type filename and read command into R Editor
Datafile<-read.csv("C:\\My Documents…", header=TRUE); Run Line
R Editor (Untitled):
Datafile<-read.csv("C:\\My ..", header=TRUE)
R Console:
> load("C:\\My Documents\\...")
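The CSV loading step can be sketched end to end without a real file on disk. In this example the temporary file path stands in for the elided Windows path above, and the file contents follow the Datafile.csv example. On Windows, a path typed by hand needs doubled backslashes or forward slashes.

```r
# Hypothetical stand-in for the real file path
csv_path <- tempfile(fileext = ".csv")

# Write the example file: one header line, then one line per observation
writeLines(c("A,B,C,D",
             "1,2,3,4",
             "2,4,6,8"), csv_path)

# header = TRUE treats the first line as column names
Datafile <- read.csv(csv_path, header = TRUE)

Datafile      # print the whole data frame
Datafile$B    # access one column by name
```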
R: Getting Started
Topic
Description
Directory
Alternative approach: Set up working directory for dataset
Working directory allows shorter filepaths
Windows: See “Windows Explorer help” for more info
Mac: See “Finder help” for more info
Data
Structured dataset commands: str; summary; fix
str()
Structure: shows structure of Datafile; “'data.frame': 4 obs. of 4 variables”
summary()
Summary: shows Min; Max; Mean; Median
fix()
Fix: shows data in spreadsheet form so entries can be changed (fixed)
R Editor (Untitled):
str(Datafile)
summary(Datafile)
fix(Datafile)
R Console:
> (shows structure of Datafile)
> (shows summary of Datafile)
> (allows fixing of Datafile)
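A sketch of the working-directory approach together with str() and summary(). The directory used here is R's temporary directory, chosen only so the example runs anywhere, and the third and fourth data rows are made up to reach the four observations mentioned above.

```r
old_dir <- getwd()        # remember the current working directory
work_dir <- tempdir()     # hypothetical working directory for this sketch
setwd(work_dir)

# With a working directory set, a bare file name is enough
writeLines(c("A,B,C,D",
             "1,2,3,4",
             "2,4,6,8",
             "3,6,9,12",      # made-up row
             "4,8,12,16"),    # made-up row
           "Datafile.csv")
Datafile <- read.csv("Datafile.csv", header = TRUE)

str(Datafile)        # 'data.frame' with 4 obs. of 4 variables
summary(Datafile)    # minimum, quartiles, median, mean, maximum per column

# fix(Datafile) opens an interactive editor, so it is skipped in this script

setwd(old_dir)       # restore the original working directory
```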
R: Getting Started
Topic
Description
Help
Get help with “read.csv” command
?read.csv
help(read.csv)
Help Results
help(read.csv); shows defaults:
read.csv(file, header=TRUE, sep=",", quote="\"",
dec=".", fill=TRUE, comment.char="", ...)
Followed by explanations of commands and parameters
R Editor (Untitled):
help(read.csv)
R Console:
<Opens new window with help>
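The defaults listed on the help page can also be inspected from the console without opening the help window. A brief sketch using the base functions args() and formals():

```r
# Print the argument list of read.csv, including default values
args(read.csv)

# Look up individual defaults
formals(read.csv)$header    # default for header
formals(read.csv)$sep       # default field separator
```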
R: Getting Started
Topic
Description
Packages
Install and load packages when the functions you need are missing
Install
Select “Packages” > “Install package(s)” from RGui top menu
Select CRAN mirror: USA (CA 1), UK (London), Vietnam, etc.
Select package from scrolling list
Load
Select “Packages” > “Load package” from RGui top menu
Note: lm() [regression analysis] is part of the built-in stats package and needs no installation
RGui menu bar: File; Edit; Packages; Windows; Help
R Console:
> chooseCRANmirror()
> utils:::menuInstallPkgs()
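The same install-and-load steps can be run from a script instead of the menus. This is a sketch under assumptions: the package name MASS and the repository URL are chosen here only as examples and do not come from the slides.

```r
# Example package name (an assumption for this sketch)
pkg <- "MASS"

# Install only if the package is not already available
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg, repos = "https://cloud.r-project.org")
}

# Load the package for the current session
library(pkg, character.only = TRUE)

# lm() comes from the stats package, which ships with R
environmentName(environment(lm))
```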
R: Regression
Topic
Description
Data
Created dataset “RealData” of real estate values
Data captures Price, House Size, and Lot Size for 20 houses
Convert data to CSV format; Excel: “Save As” csv
Load
Drag and drop data into R Console
R Console: Copy filepath name
R Editor: Paste filepath name; add read.csv command
R Editor: Run Line
Structure
Check structure of dataset
str(RealData)
'data.frame': 20 obs. of 3 variables:
Price: num 6 5.8 5.6 …; House: num 6.9 8 …; Lot: num 42.7…
R Editor (Untitled):
RealData<-read.csv("C:\\My ..", header=TRUE)
R Console:
> load("C:\\My Documents\\...")
R: Regression
Topic
Description
Dependent
Price is the dependent variable
Explanatory
Price is a function of Explanatory variables House and Lot
Equation
Price = c1 + c2*(House Size) + c3*(Lot Size)
lm
Regression analysis in R; stands for Linear Model
Syntax
lm(Dependent~Independent+Independent, Dataset)
Equation
lm(Price~House+Lot,RealData)
Type into R Editor; Run Line
See results in R Console
R Editor (Untitled):
lm(Price~House+Lot,RealData)
R Console:
> lm(Price~House+Lot,RealData)
Coefficients:
(Intercept)        House          Lot
   -0.55415      0.64680      0.02763
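The lm() syntax above can be tried on a small made-up dataset. The values below are hypothetical and are not the 20-house RealData file, so the coefficients will differ from those on the slide. Price is generated from known coefficients plus a small fixed noise term so the fit is easy to sanity-check.

```r
# Hypothetical stand-in for the RealData dataset
RealData <- data.frame(
  House = c(6.9, 8.0, 7.2, 5.5, 9.1, 6.0, 7.8, 8.4),
  Lot   = c(42.7, 50.1, 39.8, 35.0, 60.2, 41.5, 48.3, 55.0)
)
noise <- c(0.1, -0.2, 0.05, 0.0, -0.1, 0.15, -0.05, 0.05)
RealData$Price <- 1 + 0.5 * RealData$House + 0.03 * RealData$Lot + noise

# Dependent variable on the left of ~, explanatory variables on the right
RealReg <- lm(Price ~ House + Lot, data = RealData)

# Estimated c1 (intercept), c2 (House), and c3 (Lot)
coef(RealReg)
```

Storing the fitted model in an object such as RealReg, rather than only printing it, makes the coefficients available for later steps.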
R: Regression
Topic
Description
Results
Compare results from R with those from Excel
Coefficient        Excel     R
Intercept (c1)     -0.554    -0.55415
House Size (c2)    +0.646    +0.64680
Lot Size (c3)      +0.027    +0.02763
Interpretations
R results same as those from Excel
House size important factor when assessing price
Lot size not as important
Statistics
Option of calculating regression statistics
RealReg<-lm(Price~House+Lot,RealData)
summary(RealReg)
Gives significance codes, R-squared, F-statistics
R Editor (Untitled):
lm(Price~House+Lot,RealData)
R Console:
> lm(Price~House+Lot,RealData)
Coefficients:
(Intercept)        House          Lot
   -0.55415      0.64680      0.02763
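The regression statistics mentioned above can be pulled out of the summary object piece by piece. As before, this is a sketch on hypothetical data rather than the original RealData file; the new_house values are also made up.

```r
# Hypothetical stand-in for the RealData dataset
RealData <- data.frame(
  House = c(6.9, 8.0, 7.2, 5.5, 9.1, 6.0, 7.8, 8.4),
  Lot   = c(42.7, 50.1, 39.8, 35.0, 60.2, 41.5, 48.3, 55.0)
)
noise <- c(0.1, -0.2, 0.05, 0.0, -0.1, 0.15, -0.05, 0.05)
RealData$Price <- 1 + 0.5 * RealData$House + 0.03 * RealData$Lot + noise

RealReg <- lm(Price ~ House + Lot, data = RealData)
fit <- summary(RealReg)

fit$coefficients    # estimates, standard errors, t values, p-values
fit$r.squared       # R-squared
fit$fstatistic      # F-statistic with its degrees of freedom

# Predicted price for a hypothetical new house
new_house <- data.frame(House = 7.0, Lot = 45.0)
predict(RealReg, newdata = new_house)
```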