R / TERR
Ana Costa e Silva, PhD
Senior Data Scientist, TIBCO
© Copyright 2000-2013 TIBCO Software Inc.

Tower of Big and Fast Data
(Figure: analytics capabilities stacked against data volume.)
• Data volumes: hundreds of records; millions of records; billions of records (Big Data); trillions of records (Fast Data)
• Capabilities: visual data discovery; key performance indicators; data mining; real-time analytics

Tower of Big and Fast Data: TIBCO Products
• Visual data discovery: Spotfire Analyst, Spotfire Business Author, Spotfire Consumer
• Key performance indicators: Spotfire Mobile Metrics
• Real-time analytics: Spotfire Event Analytics
• TIBCO Enterprise Runtime for R

TERR
• TIBCO Enterprise Runtime for R (TERR)
• Latest in the family of statistics scripting engines: S, S-PLUS®, R, TERR
• Commercial releases: v1.0 Nov 2012, v2.0 Nov 2013, v2.1 Feb 2014, …
• Developer Edition: www.TIBCOmmunity.com/community/products/analytics/terr
• Engine internals rebuilt from scratch
  – Redesigned data object representation
  – Redesigned memory management facilities
  – Addresses long-standing problems with the S language
  – Fast and scalable engine
TERR Performance
(Chart: TERR speed-up on two benchmarks.)
• Model fitting, 5 million rows: TERR 7X faster
• Model scoring, 20 million rows: TERR 84X faster

TERR: The Fastest Road to Big Data
• TERR: TIBCO Enterprise Runtime for R
  – Most stable and performant access to analytics
  – Zero learning curve for R programmers
  – Supports in-database and in-Hadoop functionality
  – Teradata, Oracle, …; Apache, Hortonworks, Cloudera, MapR, …
• Deployment
  – TERR server execution: TIBCO Spotfire Statistics Services
  – CEP integration: TIBCO BusinessEvents, StreamBase
  – Grid integration: TIBCO GridServer
  – Infrastructure integration: TIBCO BusinessWorks, …

TERR Integration with the RStudio IDE
• RStudio integration
  – TERR is now compatible with the most popular IDE in the R community
  – Professional-quality development environment to use with TERR
• Features
  – Syntax highlighting, code completion, and smart indentation
  – Execute R code directly from the source editor
  – Manage multiple working directories using projects
  – Quickly navigate code

Demo 1

Hadoop / TERR: Write Your Mapper
Use standard R syntax; run using TERR. If you can understand this, you can write MapReduce:
cat input | mapper | sort | reducer

    library(HadoopStreaming)  # assumed source of hsWriteTable()

    mapper <- function(d) {
      # split on runs of punctuation and whitespace
      words <- strsplit(paste(d, collapse = ' '), '[[:punct:][:space:]]+')[[1]]
      # get rid of empty words caused by whitespace at the beginning of lines
      words <- words[words != '']
      # one (word, cnt = 1) row per word; rep() also works when there are no words
      df <- data.frame(word = words, cnt = rep(1, length(words)))
      # emit tab-separated (word, cnt) pairs for the sort phase
      hsWriteTable(df, sep = "\t")
    }

Write Your Reducer
Use standard R syntax; run using TERR. If you can understand this, you can write MapReduce:
cat input | mapper | sort | reducer

    reducer <- function(d) {
      # d holds all rows for a single key, so d$word has one distinct value
      # print "word<TAB>total" on its own line
      cat(d$word[1], "\t", sum(d$cnt), "\n", sep = "")
    }

TERR Map Reduce
From the command line:
    $ hadoop-streaming -map mapper.R -reduce reducer.R -input 'inputfile' -output 'outputfile'
From TERR (optionally called remotely via TIBCO Spotfire Statistics Services):
    Return.code <- system("hadoop-streaming -map mapper.R -reduce reducer.R -input 'inputfile' -output 'outputfile'")

Hadoop Big Data Tools
• Complex
• Technical
• Confusing

TIBCO Approach
• Authors and consumers: hide complexity, empower users
• Visual query: data on demand
• Fit the interface to user skills

TERR Map Reduce
(Diagram: Spotfire calls TERR via Statistics Services; Mapper.R and Reducer.R run via TERRscript through Hadoop Streaming against data in HDFS.)
    $ hadoop-streaming -map mapper.R -reduce reducer.R -input 'inputfile' -output 'outputfile'
Each data node processes its own data using TERR.

Demo 2

TERR MapReduce from Spotfire
• Parameterize MapReduce, generate and edit MapReduce code, test locally, I/O from Spotfire
• Deploy through Hadoop Streaming
• MapReduce interface from/to Spotfire: receive analysis results directly back into Spotfire for visualisation and further analysis

Contact
Thank you!
Ana Costa e Silva, PhD
Senior Data Scientist
ansilva@tibco.com
TERR Developer Edition: www.TIBCOmmunity.com/community/products/analytics/terr
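The word-count mapper and reducer slides rely on the shell analogy cat input | mapper | sort | reducer. That analogy can be checked locally without a cluster. The following is a minimal base-R sketch, assuming the same splitting rule as the mapper slide. The helper names map_words, reduce_counts and run_local_pipeline are hypothetical, and in-memory data frames stand in for the tab-separated streams that hsWriteTable and Hadoop Streaming would normally carry.

```r
# Hypothetical helper: mapper step. Emits one (word, cnt = 1) row per word,
# splitting on runs of punctuation and whitespace as on the mapper slide.
map_words <- function(lines) {
  words <- strsplit(paste(lines, collapse = " "), "[[:punct:][:space:]]+")[[1]]
  words <- words[words != ""]  # drop empty tokens caused by leading whitespace
  data.frame(word = words, cnt = rep(1L, length(words)),
             stringsAsFactors = FALSE)
}

# Hypothetical helper: reducer step. Receives all rows for a single key.
reduce_counts <- function(d) {
  data.frame(word = d$word[1], total = sum(d$cnt), stringsAsFactors = FALSE)
}

# Local stand-in for `cat input | mapper | sort | reducer`
run_local_pipeline <- function(lines) {
  mapped <- map_words(lines)
  mapped <- mapped[order(mapped$word), , drop = FALSE]  # the "sort" phase
  groups <- split(mapped, mapped$word)  # one group per key, as a reducer sees it
  out <- do.call(rbind, lapply(groups, reduce_counts))
  rownames(out) <- NULL
  out
}

input <- c("the quick brown fox", "  the lazy dog, the end.")
result <- run_local_pipeline(input)
print(result)
```

Sorting before grouping is what lets a streaming reducer assume that all rows for one word arrive together. This is the assumption behind the reducer slide's use of d$word[1].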
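The system() call on the TERR Map Reduce slide hard-codes the file names inside one quoted string. Below is a minimal sketch of building the same command from variables. It assumes the hadoop-streaming command form exactly as it appears on the slides; the slides do not say whether that is a wrapper script or an alias. build_streaming_cmd is a hypothetical helper, and base R's shQuote() quotes each argument.

```r
# Hypothetical helper (not part of TERR or Hadoop): assembles the
# hadoop-streaming command in the form shown on the slides, quoting each
# argument with shQuote() so that paths containing spaces stay intact.
build_streaming_cmd <- function(mapper, reducer, input, output) {
  paste("hadoop-streaming",
        "-map", shQuote(mapper),
        "-reduce", shQuote(reducer),
        "-input", shQuote(input),
        "-output", shQuote(output))
}

cmd <- build_streaming_cmd("mapper.R", "reducer.R", "inputfile", "outputfile")
cat(cmd, "\n")

# On a machine where the command exists, system() returns its exit status
# (0 usually means success):
# Return.code <- system(cmd)
```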