GIS AND BIG DATA: THEORY AND BEST PRACTICE CASE STUDIES
Dr. Dave Schrader
Director – Strategy and Marketing, Teradata
October 2012 – University of Redlands

WHO IS TERADATA? WHAT IS TERADATA’S STRATEGY? HOW DO BIG DATA AND GEOSPATIAL FIT?

TERADATA
• Founded 1979, first shipment 1984
• $2.4B a year in revenues, growing 22%
• Leading vendor of Enterprise-sized Data Warehouses (HW, SW, PS)
• Engineering HQ is in Rancho Bernardo
• We sell to the Global 3000, blue chip customer base
• Well-known to all database experts
• Moving from “back office” to “frontline” (Active), increasing # of data types

© Teradata 2012

The Teradata Story – History of Big Data
• 1983: Teradata ships 1st system to Wells Fargo
• Jan 1992: Walmart passes 1TB
• Jan 2006: WMT loads 1B rows/day, 1 hr latency
• June 2012: eBay loads 1TB/minute
More than 25 customers with >25,000 Terabytes at their fingertips

What Data is Driving Growth? … The W’s
• More detailed data comes from
> Detailed Customer Behavioral Data
– “Where” in all industries: mobile and geospatial
– “What and When” granularity – e.g., browsing on web, including non-clicks and non-transactions
– Telco: all the detail behind each phone call (BSS, OSS): location
– Social networking data – tweets, blogs
> Detailed Operations Data
– “How” – Process data
– Network congestion, goal planning
– Transportation optimizations in real-time
– Manufacturing: sensor and test data

Purpose-Built Teradata Platform Family

Platform | Model | Purpose | Scalability
Data Mart Appliance | 560 | Test & Development or Data Marts | SMP, up to 12TB
Extreme Data Appliance | 1650 | Strategic Analytics on Extreme Data Volumes | MPP, up to 186PB
Data Warehouse Appliance | 2690 | Data Warehouse or Departmental Data Marts | MPP, up to 315TB
Extreme Performance Appliance | 4600 | Extreme Performance for Operational Analytics | MPP, up to 18TB
Active Intelligent Data Warehouse | 66XX | Enterprise Scale Strategic & Operational Intelligence | MPP, up to 92PB

[Figure: platforms positioned along axes labeled Active Users, Scalability, and Flexibility]

TOP RATING BY GARTNER - DBMS
Why the TOP Rating for Data Warehousing?
• Happy Customers!
• Superior Technology!
• Innovative Users!

The Next Generation of Analytics: Trends
• Transaction: Value to the business
• Interaction: EXPERIENCE with the business
• Consumer is CEO of the household
• Consumers making intelligent decisions based upon analytics & perfect economic information
• Format: Structured & MULTI-STRUCTURED Data
• Type: Web, social, location, device, channel
• VOLUME and VELOCITY

Teradata and its Acquisitions
Data Warehousing
• Teradata Integrated Data Warehouse
• Operational BI/Intelligence
• Platform Family
• Interoperability & Consulting
Business Applications
• Aprimo Applications
• Strategic Partnerships
Big Data Analytics
• Aster Data
• Extreme Data Appliance
• Partnerships

TERADATA + GEOSPATIAL

Temptation: Build Analytic Silos, Geospatial Silos
[Figure: silos labeled OLAP Cubes, Data Mining, Big Data, Geospatial, Application Development, and Agile Analytics around the Data Warehouse, with a sample spreadsheet of accounts and balances by region, segment, and period (M01 to M07)]

Analytics for Everyone
[Figure: the same silos (OLAP Cubes, Data Mining, Big Data, Geospatial, Application Development, Agile Analytics) around the Data Warehouse, with the same sample spreadsheet]
• 20-40%+ wasted moving data

Teradata Integrated Analytics
• Data Exploration: Visual data exploration to quickly understand and analyze data within the database
• OLAP Optimization: Built-in multidimensional analytics optimization
• Geospatial: Native database geospatial data types and analytics
• Temporal: Native temporal support to manage and update time dimension
• Advanced Analytics: Optimized in-database data mining technology from leading vendors, open source and Teradata
• Agile Analytics: In-database data labs to accelerate exploration of new data and ideas
• Big Data Integration: Analytic platforms and partner tools to analyze unstructured and structured data
• Application Development: Tools and techniques to accelerate development of analytics
Underlying layers:
• Teradata Database
• Teradata Open Parallel Framework: Custom Services, Embedded Services, Virtual Machines
• Teradata Purpose-Built Platform Family

Native Geospatial Data Types
Spatial Data Integrated with Non-Spatial Data
• Geospatial is a feature that allows us to store, process, consume geospatial data
• Teradata Geospatial based on the ST_Geometry data type
> SQL/MM Standard
> Like numeric or string types native to Teradata
> Location is type ST_Geometry
– Point (x y)
– Line or curve (xy, xy, xy)
– Polygon (xy, xy, xy, xy..)
[Figure: example point, line, and polygon geometries]

Geocoded Customer Table Example:

Customer ID (Integer) | Customer Name (Char) | Customer Address (Char) | Customer Type (Char) | Location (ST_Geometry)
38327 | John Smith | 2110 Oak St. San Francisco, CA 94112 | C | Point (37.40113, -122.2091)
39234 | William White | 100 Broadway, Dearborn, MI 21002 | A | Point (42.153, -83.1078)

Teradata Geospatial Spatial Methods – sample
High Speed Big Data Analytics
• Attribute: ST_AsBinary, ST_AsText, ST_CoordDim, ST_Dimension, ST_GeometryType, ST_IsEmpty, ST_IsSimple, ST_IsClosed, ST_NumPoints, ST_SRID, …
• Spatial Operator: ST_Buffer, ST_Intersection, ST_Boundary, ST_Difference, ST_Envelope, ST_ExteriorRing, ST_GeometryN, ST_InteriorRingN, ST_Transform
• Spatial Relationships: ST_Intersects, ST_Overlaps, ST_Relate, ST_Touches, ST_Within, ST_Contains, ST_Disjoint, ST_Crosses, ST_Equals
• Measurements: ST_Area, ST_Distance, ST_SphericalDistance, ST_SpheroidalDistance, ST_Perimeter, ST_Length

Geospatial Queries
Answering ‘Where’
• ST_Geometry functions…
> Measurements – Distance, surface, perimeter…
> Relationship between two objects – Intersect, contains, within, adjacent…
> Simplified Example - find the top 100 customers by value within the store area boundaries and their distance from the store:

    SELECT TOP 100
           C.name, C.address, C.value,
           C.location.ST_Distance(S.location) AS Distance
    FROM customers C, stores S, store_area SA
    WHERE S.id = 1
      AND S.id = SA.id
      AND C.location.ST_Within(SA.area) = 1  -- relationship methods return 1 when true
    ORDER BY C.value DESC;

[Figure: map showing a store area with its retail outlet, customers, a competitor outlet, and mail campaign targets]

Telco – Retail Accelerates Analytics with Teradata
Find the 3 closest stores within 50 miles of each customer location.
> Over 30 million customers
> Over 2,200 stores
> Target customers changing frequently

Manual Geospatial Analytics
• Calculate distance between each store and customer
> Calculations based on complex trigonometric functions
> Over 65 billion calculations
> Filter results <= 50 miles
> Retained 1 billion results

In-database Geospatial Analytics
• Teradata Geospatial functions
> Set a 50 mile buffer (filter) for stores
> Identify customers within the buffer
> Calculate spherical distance for those customers

Result: 25 times faster

Teradata Geospatial Analytics
• Integrated spatial and non-spatial data
• High speed processing of big data
• Innovation simplified via Data Labs
• Proven by industry leaders

Big Data - provides enormous insight…
• …keyword use…
• …personal profiles…
• Customer behavior, calling/browsing habits, their social network…
• …sensor data and metrics…
• …location, travel destinations…
…Opportunity to move beyond traditional analytics!

A major Telco uses real-time analytics to find remedies for dropped mobile phone calls
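
The filter-then-measure idea from the store-distance case study (set a 50 mile buffer around each store, keep only the customers inside it, then compute spherical distance for those) can be illustrated with a small Python sketch. This is not Teradata's implementation: a latitude/longitude bounding box stands in for ST_Buffer, the haversine formula stands in for ST_SphericalDistance, and the function names, the Earth radius constant, and the in-memory data layout are all assumptions made for illustration.

```python
import heapq
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius


def spherical_distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in miles between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_buffer_box(lat, lon, center_lat, center_lon, radius_miles):
    """Cheap pre-filter: is (lat, lon) inside a box enclosing the buffer circle?

    Assumption: the buffer does not cross the +/-180 degree meridian.
    """
    r = radius_miles / EARTH_RADIUS_MILES  # angular radius in radians
    dlat = math.degrees(r)
    cos_lat = math.cos(math.radians(center_lat))
    if cos_lat <= math.sin(r):
        dlon = 180.0  # buffer reaches a pole, so every longitude qualifies
    else:
        dlon = math.degrees(math.asin(math.sin(r) / cos_lat))
    return abs(lat - center_lat) <= dlat and abs(lon - center_lon) <= dlon


def closest_stores(customer, stores, k=3, radius_miles=50.0):
    """Return up to k (distance_miles, store_id) pairs, nearest first.

    customer is a (lat, lon) tuple; stores maps store_id -> (lat, lon).
    """
    clat, clon = customer
    candidates = []
    for store_id, (slat, slon) in stores.items():
        # Step 1: buffer filter around the store, no distance math yet
        if not within_buffer_box(clat, clon, slat, slon, radius_miles):
            continue
        # Step 2: exact spherical distance only for the survivors
        d = spherical_distance_miles(clat, clon, slat, slon)
        if d <= radius_miles:
            candidates.append((d, store_id))
    return heapq.nsmallest(k, candidates)
```

The design point is the ordering of the two steps: the box test uses only comparisons per store and customer pair, so the trigonometric distance is evaluated for a small fraction of the pairs instead of all of them.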