R / TERR - LondonR

R / TERR
Ana Costa e Silva, PhD
Senior Data Scientist
TIBCO
© Copyright 2000-2013 TIBCO Software Inc.
Tower of Big and Fast Data
• Data volumes: Hundreds of Records; Millions of Records; Billions of Records (Big Data); Trillions of Records (Fast Data)
• Analytics: Key performance indicators; Visual Data Discovery; Data Mining; Real Time Analytics
Tower of Big and Fast Data
• Data volumes: Hundreds of Records; Millions of Records; Billions of Records (Big Data); Trillions of Records (Fast Data)
• Analytics: Key performance indicators; Visual Data Discovery; Data Mining; Real Time Analytics
• TIBCO products placed on the tower: Spotfire Analyst, Spotfire Business Author, Spotfire Consumer, Spotfire Mobile Metrics, Spotfire Event Analytics, TIBCO Enterprise Runtime for R
TERR
• TIBCO Enterprise Runtime for R (TERR)
• Latest in the family of statistics scripting engines: S, S-PLUS®, R, TERR
• Commercial releases: v1.0 Nov 2012, v2.0 Nov 2013, v2.1 Feb 2014, …
• Developer Edition: www.TIBCOmmunity.com/community/products/analytics/terr
• Engine internals rebuilt from scratch
– Redesigned data object representation
– Redesigned memory management facilities
– Addresses long-standing problems with the S language
– Fast and scalable engine
TERR Performance
Model Fitting (5 Million Rows): TERR 7X faster
Model Scoring (20 Million Rows): TERR 84X faster
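The slide reports the presenter's own results without naming the model, the baseline engine or the hardware. A minimal sketch of a harness for trying a similar comparison yourself, assuming a simulated two-predictor linear model (not the benchmark behind these figures): run the same script under open-source R and under TERR and compare the elapsed times. `time_fit_and_score` is a hypothetical helper; its defaults are kept small, so pass `n_fit = 5e6, n_score = 2e7` to match the row counts above.

```r
# Hypothetical timing harness (assumption: a simulated linear model stands in
# for the unspecified benchmark). Fits on n_fit rows, scores n_score rows,
# and returns the elapsed seconds for each step.
time_fit_and_score <- function(n_fit = 1e5, n_score = 4e5, seed = 1) {
  set.seed(seed)
  # Simulated training data: y depends linearly on x1 and x2 plus noise
  train <- data.frame(x1 = rnorm(n_fit), x2 = rnorm(n_fit))
  train$y <- 2 * train$x1 - train$x2 + rnorm(n_fit)
  # Separate simulated rows to score with the fitted model
  score <- data.frame(x1 = rnorm(n_score), x2 = rnorm(n_score))

  fit_time <- system.time(fit <- lm(y ~ x1 + x2, data = train))[["elapsed"]]
  score_time <- system.time(pred <- predict(fit, newdata = score))[["elapsed"]]

  c(fit = fit_time, score = score_time, n_pred = length(pred))
}
```

Running `time_fit_and_score()` once per engine on the same machine keeps the data and model identical, so only the engine differs between runs.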
TERR: The Fastest Road to Big Data
• TERR: TIBCO Enterprise Runtime for R
– Most stable and performant access to analytics
– Zero learning curve for R programmers
– Supports in-database (Teradata, Oracle, …) and in-Hadoop (Apache, Hortonworks, Cloudera, MapR, …) functionality
• Deployment
– TERR server execution: TIBCO Spotfire Statistics Services
– CEP integration: TIBCO BusinessEvents, StreamBase
– Grid integration: TIBCO GridServer
– Infrastructure integration: TIBCO BusinessWorks, …
TERR integration with RStudio IDE
• RStudio integration
– TERR now compatible with the most popular IDE in the R community
– Professional-quality development environment to use with TERR
• Features
– Syntax highlighting, code completion, and smart indentation
– Execute R code directly from the source editor
– Manage multiple working directories using projects
– Quickly navigate code
Demo 1
Hadoop / TERR: Write Your Mapper
Use Standard R Syntax; Run using TERR
If you can understand this, you can write mapreduce:
cat input | mapper | sort | reducer
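library(HadoopStreaming)  # provides hsWriteTable()

mapper <- function(d) {
  # Split on runs of punctuation and whitespace
  words <- strsplit(paste(d, collapse = ' '),
                    '[[:punct:][:space:]]+')[[1]]
  # Drop empty words caused by whitespace at the beginning of lines
  words <- words[words != '']
  if (length(words) == 0) return(invisible(NULL))
  # Emit one tab-separated (word, 1) record per occurrence
  df <- data.frame(word = words, cnt = 1)
  hsWriteTable(df, sep = "\t")
}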
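On the slide, `mapper` receives its input `d` from a driver that is not shown. A minimal sketch of the same logic as a self-contained streaming step, assuming plain base R with no HadoopStreaming package: it reads raw text lines from a connection (stdin when Hadoop Streaming runs the script) and writes one `word<TAB>1` record per word. `wordcount_map` is a hypothetical name.

```r
# Hypothetical standalone mapper: reads text lines from `con` and writes
# one "word<TAB>1" record per word to `out` (stdout by default).
wordcount_map <- function(con = file("stdin"), out = stdout()) {
  lines <- readLines(con, warn = FALSE)
  # Split on runs of punctuation and whitespace, as the slide's mapper does
  words <- unlist(strsplit(lines, "[[:punct:][:space:]]+"))
  # Drop empty strings produced by leading punctuation or whitespace
  words <- words[words != ""]
  if (length(words) > 0) {
    writeLines(paste(words, 1L, sep = "\t"), out)
  }
  invisible(length(words))
}
```

For example, `wordcount_map(textConnection("Hello, world! Hello."))` writes three records, `Hello\t1`, `world\t1` and `Hello\t1`, because each run of punctuation and spaces counts as a single separator.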
Write Your Reducer
Use Standard R Syntax; Run using TERR
If you can understand this, you can write mapreduce:
cat input | mapper | sort | reducer
reducer <- function(d) {
  # d holds every record for one key, so d$word has a single distinct value
  cat(paste(d$word[1], sum(d$cnt), sep = "\t"), "\n", sep = "")
}
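The `cat input | mapper | sort | reducer` analogy can be checked without a cluster. A minimal sketch in base R, assuming mapper output in `word<TAB>count` form: it sorts the records by key and groups them, standing in for Hadoop's shuffle-and-sort phase, then applies the same per-key logic as `reducer` to each group. `wordcount_reduce` is a hypothetical helper, not part of TERR or Hadoop.

```r
# Hypothetical local reducer: takes "word<TAB>count" records, sorts and
# groups them by key (the "sort" step), and returns one
# "word<TAB>total" line per distinct word.
wordcount_reduce <- function(records) {
  if (length(records) == 0) return(character(0))
  fields <- strsplit(records, "\t", fixed = TRUE)
  d <- data.frame(
    word = vapply(fields, `[`, character(1), 1),
    cnt  = as.integer(vapply(fields, `[`, character(1), 2)),
    stringsAsFactors = FALSE
  )
  d <- d[order(d$word), ]          # the "sort" step of the pipeline
  groups <- split(d, d$word)       # one data frame per key
  unname(vapply(groups, function(g) {
    paste(g$word[1], sum(g$cnt), sep = "\t")   # same logic as reducer()
  }, character(1)))
}
```

For example, `wordcount_reduce(c("Hello\t1", "world\t1", "Hello\t1"))` returns `c("Hello\t2", "world\t1")`. Feeding it the output of a local mapper run reproduces the whole pipeline on a small sample before submitting the job to Hadoop.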
TERR Map Reduce
From the command line:
$ hadoop-streaming -map mapper.R -reduce reducer.R \
    -input 'inputfile' -output 'outputfile'
From TERR:
(optionally called remotely via TIBCO Spotfire Statistics Services)
return.code <- system(paste("hadoop-streaming -map mapper.R -reduce reducer.R",
                            "-input 'inputfile' -output 'outputfile'"))
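To parameterize the call from TERR rather than hard-coding file names, a minimal sketch like the following builds the same command and runs it with base R's `system2()`, stopping if the job fails. `run_streaming` and its `dry_run` argument are hypothetical. The `-map`/`-reduce` flags are copied from the slide's `hadoop-streaming` command; the stock Hadoop Streaming jar (invoked via `hadoop jar`) names them `-mapper` and `-reducer`, so adjust `command` and the flags to your installation.

```r
# Hypothetical wrapper around the slide's command line. Arguments are
# shell-quoted because system2() passes them to the shell unquoted.
run_streaming <- function(mapper, reducer, input, output,
                          command = "hadoop-streaming", dry_run = FALSE) {
  q <- function(x) shQuote(x, type = "sh")
  args <- c("-map", q(mapper), "-reduce", q(reducer),
            "-input", q(input), "-output", q(output))
  # With dry_run = TRUE, return the command line instead of running it
  if (dry_run) return(paste(command, paste(args, collapse = " ")))
  status <- system2(command, args)
  if (status != 0) stop(command, " exited with status ", status)
  invisible(status)
}
```

Calling `run_streaming("mapper.R", "reducer.R", "inputfile", "outputfile", dry_run = TRUE)` returns the command string, which is a cheap way to check the parameters before launching a job.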
Hadoop Big Data Tools
Complex
Technical
Confusing
TIBCO Approach
Authors and Consumers – Hide Complexity, Empower Users
Visual Query – data on demand
Fit interface to User skills
TERR Map Reduce
[Diagram: Spotfire, via Statistics Services, passes Mapper.R and Reducer.R (each run via TERRscript) to Hadoop Streaming:
$ hadoop-streaming -map mapper.R -reduce reducer.R -input 'inputfile' -output 'outputfile'
Each HDFS Data Node processes its own data using TERR.]
Demo 2
TERR MapReduce from Spotfire
Parameterize MapReduce, Generate and Edit MapReduce code, Test Locally, I/O from Spotfire
Deploy through Hadoop Streaming MapReduce Interface from/to Spotfire
Receive analysis results directly back into Spotfire for visualisation and further analysis
Contact
Thank you!
Ana Costa e Silva, PhD
Senior Data Scientist
ansilva@tibco.com
TERR Developer Edition:
www.TIBCOmmunity.com/community/products/analytics/terr