GIS and Big Data-Theory and Best Practice Case Studies

GIS AND BIG DATA: THEORY AND BEST PRACTICE CASE STUDIES
Dr. Dave Schrader
Director – Strategy and Marketing, Teradata
October 2012 – University of Redlands
WHO IS TERADATA?
WHAT IS TERADATA’S STRATEGY?
HOW DO BIG DATA AND GEOSPATIAL FIT?
TERADATA
• Founded 1979, first shipment 1984
• $2.4B a year in revenues, growing 22%
• Leading vendor of enterprise-sized data warehouses (HW, SW, PS)
• Engineering HQ is in Rancho Bernardo
• We sell to the Global 3000, a blue-chip customer base
• Well-known to all database experts
• Moving from “back office” to “frontline” (Active), increasing number of data types
© Teradata 2012
The Teradata Story – History of Big Data
1983: Teradata ships 1st system to Wells Fargo
Jan 1992: Walmart passes 1TB
Jan 2006: WMT loads 1B rows/day, 1 hr latency
June 2012: eBay loads 1TB/minute
More than 25 customers with >25,000 Terabytes at their fingertips
What Data is Driving Growth? … The W’s
• More detailed data comes from:
> Detailed Customer Behavioral Data
– “Where” in all industries: mobile and geospatial
– “What and When” granularity – e.g., browsing on web, including non-clicks and non-transactions
– Telco: all the detail behind each phone call (BSS, OSS): location
– Social networking data – tweets, blogs
> Detailed Operations Data
– “How” – Process data
– Network congestion, goal planning
– Transportation optimizations in real-time
– Manufacturing: sensor and test data
Purpose-Built Teradata Platform Family
• Data Mart Appliance (560)
> Purpose: Test & Development or Data Marts
> Scalability: SMP, up to 12TB
• Extreme Data Appliance (1650)
> Purpose: Strategic Analytics on Extreme Data Volumes
> Scalability: MPP, up to 186PB
• Data Warehouse Appliance (2690)
> Purpose: Data Warehouse or Departmental Data Marts
> Scalability: MPP, up to 315TB
• Extreme Performance Appliance (4600)
> Purpose: Extreme Performance for Operational Analytics
> Scalability: MPP, up to 18TB
• Active Intelligent Data Warehouse (66XX)
> Purpose: Enterprise Scale Strategic & Operational Intelligence
> Scalability: MPP, up to 92PB
Active Users, Scalability, Flexibility
TOP RATING BY GARTNER - DBMS
Why the TOP Rating for Data Warehousing?
Happy Customers!
Superior Technology!
Innovative Users!
The Next Generation of Analytics: Trends
• Transaction: Value to the business
• Interaction: EXPERIENCE with the business
• Consumer is CEO of the household
• Consumers making intelligent decisions based upon analytics & perfect economic information
• Format: Structured & MULTI-STRUCTURED Data
• Type: Web, social, location, device, channel
• VOLUME and VELOCITY
Teradata and its Acquisitions
Data Warehousing
• Teradata Integrated Data Warehouse
• Operational BI/Intelligence
• Platform Family
• Interoperability & Consulting
Business Applications
• Aprimo Applications
• Strategic Partnerships
Big Data Analytics
• Aster Data
• Extreme Data Appliance
• Partnerships
TERADATA + GEOSPATIAL
Temptation: Build Analytic Silos, Geospatial Silos
[Figure: diagram of analytic silos with the labels OLAP Cubes, Data Mining, Big Data, Geospatial, Application Development, Agile Analytics, and Data Warehouse. The diagram includes a sample pivot table of account counts and balances by region and segment (rows 1A-H through 4A-H, with totals) for periods M01 to M07.]
Analytics for Everyone
20-40%+ wasted moving data
[Figure: diagram with the labels OLAP Cubes, Data Mining, Big Data, Geospatial, Application Development, Agile Analytics, and Data Warehouse, including a sample pivot table of account counts and balances by region, segment, and period (M01 to M07).]
Teradata Integrated Analytics
• Data Exploration: Visual data exploration to quickly understand and analyze data within the database
• OLAP Optimization: Built-in multidimensional analytics optimization
• Geospatial: Native database geospatial data types and analytics
• Temporal: Native temporal support to manage and update time dimension
• Advanced Analytics: Optimized in-database data mining technology from leading vendors, open source and Teradata
• Agile Analytics: In-database data labs to accelerate exploration of new data and ideas
• Big Data Integration: Analytic platforms and partner tools to analyze unstructured and structured data
• Application Development: Tools and techniques to accelerate development of analytics
Teradata Database
Teradata Open Parallel Framework
Custom Services, Embedded Services, Virtual Machines
Teradata Purpose-Built Platform Family
Native Geospatial Data Types
Spatial Data Integrated with Non-Spatial Data
• Geospatial is a feature that allows us to store, process, and consume geospatial data
• Teradata Geospatial is based on the ST_Geometry data type
> SQL/MM Standard
> Like numeric or string types native to Teradata
> Location is type ST_Geometry
– Point (x y)
– Line or curve (x y, x y, x y)
– Polygon (x y, x y, x y, x y, ...)
[Figure: examples of a point, a line, and a polygon]
Geocoded Customer Table Example:
| Customer ID (Integer) | Customer Name (Char) | Customer Address (Char) | Customer Type (Char) | Location (ST_Geometry) |
|---|---|---|---|---|
| 38327 | John Smith | 2110 Oak St. San Francisco, CA 94112 | C | Point (37.40113, -122.2091) |
| 39234 | William White | 100 Broadway, Dearborn, MI 21002 | A | Point (42.153, -83.1078) |
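To make the three geometry shapes more concrete, the following is a minimal Python sketch that writes a point, a line, and a polygon as well-known text (WKT), the textual form commonly used to load geometry values. The helper names (`point_wkt`, `linestring_wkt`, `polygon_wkt`) are hypothetical and not part of any Teradata API, and the sketch assumes the usual WKT axis order of x = longitude, y = latitude, which differs from the (latitude, longitude) order shown in the table.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[float, float]  # (x, y); assumed here to mean (longitude, latitude)


def _pair(c: Coord) -> str:
    """Format one coordinate pair as 'x y' (space separated, as WKT expects)."""
    return f"{c[0]!r} {c[1]!r}"


def point_wkt(c: Coord) -> str:
    """A single location, such as a geocoded customer address."""
    return f"POINT ({_pair(c)})"


def linestring_wkt(coords: Sequence[Coord]) -> str:
    """An ordered run of two or more points, such as a road segment."""
    if len(coords) < 2:
        raise ValueError("a line needs at least two points")
    return "LINESTRING (" + ", ".join(_pair(c) for c in coords) + ")"


def polygon_wkt(ring: Sequence[Coord]) -> str:
    """A closed ring, such as a store trade area. The ring is closed
    automatically by repeating the first point if the caller did not."""
    closed: List[Coord] = list(ring)
    if closed and closed[0] != closed[-1]:
        closed.append(closed[0])
    if len(closed) < 4:
        raise ValueError("a polygon ring needs at least three distinct points")
    return "POLYGON ((" + ", ".join(_pair(c) for c in closed) + "))"


# The second customer from the table, with the axes swapped to (x, y).
print(point_wkt((-83.1078, 42.153)))
print(linestring_wkt([(0.0, 0.0), (1.0, 1.0)]))
print(polygon_wkt([(0.0, 0.0), (4.0, 0.0), (4.0, 3.0)]))
```

The polygon helper repeats the first vertex at the end because WKT requires a ring to be explicitly closed.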
Teradata Geospatial Spatial Methods – sample
High Speed Big Data Analytics
Attribute
ST_AsBinary, ST_AsText, ST_CoordDim, ST_Dimension, ST_GeometryType, ST_IsEmpty, ST_IsSimple, ST_IsClosed, ST_NumPoints, ST_SRID, …
Spatial Operator
ST_Buffer, ST_Intersection, ST_Boundary, ST_Difference, ST_Envelope, ST_ExteriorRing, ST_GeometryN, ST_InteriorRingN, ST_Transform
Spatial Relationships
ST_Intersects, ST_Overlaps, ST_Relate, ST_Touches, ST_Within, ST_Contains, ST_Disjoint, ST_Crosses, ST_Equals
Measurements
ST_Area, ST_Distance, ST_SphericalDistance, ST_SpheroidalDistance, ST_Perimeter, ST_Length
Geospatial Queries
Answering ‘Where’
• ST_Geometry functions…
> Measurements
– Distance, surface, perimeter…
> Relationship between two objects
– Intersect, contains, within, adjacent…
> Simplified Example - find top 100 customers by value within the store area boundaries and their distance from the store:
SELECT TOP 100 C.name, C.address, C.value,
       C.location.ST_Distance(S.location) AS Distance
FROM customers C, stores S, store_area SA
WHERE S.id = 1                              -- the store of interest
  AND S.id = SA.id                          -- that store's area polygon
  AND C.location.ST_Within(SA.area) = 1     -- customer lies inside the area
ORDER BY C.value DESC;
[Figure: map showing the store area, the retail outlet, customers, a competitor outlet, and mail campaign targets]
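The logic of the query above can be sketched outside the database. The Python below is a rough, planar illustration only: it swaps in a ray-casting point-in-polygon test where the query uses ST_Within and plain Euclidean distance where it uses ST_Distance, and it does not claim to reproduce how Teradata evaluates either method. The `Customer` type, the function names, and all sample data are hypothetical.

```python
import math
from typing import List, NamedTuple, Sequence, Tuple

Point = Tuple[float, float]  # planar (x, y) coordinates, arbitrary units


class Customer(NamedTuple):
    name: str
    value: float
    location: Point


def within(p: Point, polygon: Sequence[Point]) -> bool:
    """Ray-casting test: count how many polygon edges a horizontal ray
    from p crosses; an odd count means p is inside. Points exactly on
    the boundary are not treated specially in this sketch."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y level
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def top_customers(customers: Sequence[Customer], store: Point,
                  area: Sequence[Point], limit: int = 100
                  ) -> List[Tuple[Customer, float]]:
    """Keep customers inside the store area, attach their distance to the
    store, and return the highest-value ones first."""
    hits = [(c, math.dist(c.location, store))
            for c in customers if within(c.location, area)]
    hits.sort(key=lambda pair: pair[0].value, reverse=True)
    return hits[:limit]


# Hypothetical sample data: a square store area with the store at its center.
store = (5.0, 5.0)
area = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
customers = [
    Customer("Ann", 900.0, (2.0, 1.0)),
    Customer("Bob", 1200.0, (8.0, 9.0)),
    Customer("Cy", 5000.0, (12.0, 5.0)),  # highest value, but outside the area
    Customer("Dee", 300.0, (5.0, 5.0)),
]

for customer, distance in top_customers(customers, store, area, limit=2):
    print(customer.name, customer.value, round(distance, 2))
```

The spatial filter runs before the sort, so a high-value customer outside the store area never competes for a place in the result.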
Telco – Retail Accelerates Analytics with Teradata
Find the 3 closest stores within 50 miles of each customer location.
> Over 30 million customers
> Over 2,200 stores
> Target customers changing frequently
Manual Geospatial Analytics
• Calculate distance between each store and customer
> Calculations based on complex trigonometric functions
> Over 65 billion calculations
> Filter results <= 50 miles
> Retained 1 billion results
In-database Geospatial Analytics
• Teradata Geospatial functions
> Set a 50 mile buffer (filter) for stores
> Identify customers within the buffer
> Calculate spherical distance for those customers
25 times faster
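The gain comes from filtering before measuring. Pairing 30 million customers with 2,200 stores gives about 66 billion customer-store combinations, in line with the "over 65 billion calculations" of the manual approach. The Python sketch below illustrates the filter-first idea for a single customer, under stated assumptions: a cheap latitude-band check stands in for the 50 mile buffer, and the haversine formula stands in for the spherical distance calculation. Neither is presented as Teradata's implementation, and the function names, the Earth radius constant, and the store coordinates are illustrative.

```python
import heapq
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; spherical approximation

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_miles(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two points on a sphere."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(min(1.0, h)))


def closest_stores(customer: LatLon, stores: Dict[str, LatLon],
                   radius_miles: float = 50.0, k: int = 3
                   ) -> List[Tuple[float, str]]:
    """Return up to k (distance, store_id) pairs within radius_miles,
    nearest first."""
    # Cheap pre-filter: a store whose latitude differs by more than this
    # cannot be within the radius, so the trigonometry is skipped for it.
    max_dlat = math.degrees(radius_miles / EARTH_RADIUS_MILES)
    candidates = []
    for store_id, loc in stores.items():
        if abs(loc[0] - customer[0]) > max_dlat:
            continue
        d = haversine_miles(customer, loc)
        if d <= radius_miles:
            candidates.append((d, store_id))
    return heapq.nsmallest(k, candidates)


# Hypothetical coordinates for one customer and six stores.
customer = (34.0556, -117.1825)
stores = {
    "A": (34.0556, -117.1825),
    "B": (34.1083, -117.2898),
    "C": (33.9533, -117.3962),
    "D": (34.0522, -118.2437),  # passes the latitude band, fails the radius
    "E": (37.7749, -122.4194),  # rejected by the latitude band alone
    "F": (33.8303, -116.5453),
}

for distance, store_id in closest_stores(customer, stores):
    print(store_id, round(distance, 1))
```

The latitude band never discards a valid store, because the great-circle distance between two points is at least the distance implied by their latitude difference. A production filter would also bound longitude or use a spatial index.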
Teradata Geospatial Analytics
• Integrated spatial and non-spatial data
• High speed processing of big data
• Innovation simplified via Data Labs
• Proven by industry leaders
Big Data - provides enormous insight…
…keyword use…
…personal profiles…
Customer behavior, calling/browsing habits, their social network…
…sensor data and metrics…
…location, travel destinations…
…Opportunity to move beyond traditional analytics!
A major Telco uses real-time analytics to find remedies for dropped mobile phone calls