HiVertica
Capstone Project
Stephen Walkauskas,
Architect, Data Management, Vertica
University of Pittsburgh January 11, 2013
© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Contact info
Stephen Walkauskas
swalkauskas@vertica.com
Vertica culture
What Is Vertica
• SQL Database for Real-time Analytics
• Runs on x86 hardware
• MPP Columnar Architecture – scales to PBs!
• Reduced footprint via Advanced Compression
• Extensible analytics capabilities
• Easy to set up and use
• Elastic - grow/shrink as needed
• Extensive Ecosystem of analytic tools
Speed
Scale
Simplicity
Map/Reduce
-- HQL
SELECT a.val1, a.val2, b.val, c.val
FROM a JOIN b ON (a.key = b.key)
LEFT OUTER JOIN c ON (a.key = c.key)
HiVertica
HiVertica
a) Write code to read Hive / HCatalog metadata and generate DDL to create
corresponding external tables (ETs) in a Vertica DB.
b) Configure the ETs with the files referenced by the corresponding Hive tables. Vertica ships a
connector to source files from HDFS. Using this connector, the aforementioned ETs can
be used to query data in Hive (assuming the data is in a format Vertica can parse).
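
A minimal sketch of steps a) and b), under stated assumptions. The table metadata is passed in as a plain Python object rather than read from Hive / HCatalog, so HiveTable, HIVE_TO_VERTICA and external_table_ddl are hypothetical names, and the type mapping is illustrative rather than complete. The COPY SOURCE Hdfs(...) clause follows the general form used by Vertica's HDFS connector; check the exact syntax against the connector version in use. The default delimiter is Ctrl-A, which Hive uses for text tables unless told otherwise.

```python
from dataclasses import dataclass

# Assumed Hive -> Vertica type mapping; illustrative, not exhaustive.
HIVE_TO_VERTICA = {
    "tinyint": "INTEGER",
    "smallint": "INTEGER",
    "int": "INTEGER",
    "bigint": "INTEGER",
    "float": "FLOAT",
    "double": "FLOAT",
    "boolean": "BOOLEAN",
    "timestamp": "TIMESTAMP",
    "string": "VARCHAR(65000)",
}


@dataclass
class HiveTable:
    """Hypothetical container for the metadata a real reader would fetch."""
    name: str
    columns: list            # list of (column_name, hive_type) pairs
    location: str            # WebHDFS URL (may end in a glob) for the files
    field_delim: str = "\x01"  # Hive's default text field delimiter (Ctrl-A)


def vertica_type(hive_type):
    """Map a Hive type name to a Vertica type, failing loudly if unknown."""
    try:
        return HIVE_TO_VERTICA[hive_type.lower()]
    except KeyError:
        raise ValueError(f"no Vertica mapping for Hive type {hive_type!r}")


def external_table_ddl(table, hdfs_user):
    """Build a CREATE EXTERNAL TABLE statement for one Hive table."""
    cols = ",\n".join(
        f"    {name} {vertica_type(hive_type)}"
        for name, hive_type in table.columns
    )
    # Emit the delimiter as an octal escape, e.g. E'\001' for Ctrl-A.
    delim = f"E'\\{ord(table.field_delim):03o}'"
    return (
        f"CREATE EXTERNAL TABLE {table.name} (\n{cols}\n)\n"
        f"AS COPY SOURCE Hdfs(url='{table.location}', username='{hdfs_user}')\n"
        f"DELIMITER {delim};"
    )


if __name__ == "__main__":
    # Host, port and path below are made up for illustration.
    a = HiveTable(
        name="a",
        columns=[("key", "int"), ("val1", "string"), ("val2", "string")],
        location="http://namenode:50070/webhdfs/v1/user/hive/warehouse/a/*",
    )
    print(external_table_ddl(a, hdfs_user="hive"))
```

Raising on an unmapped type, instead of guessing, keeps the generator from silently creating ETs that Vertica cannot parse.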
HiVertica
c) Vertica supports User Defined Parsers (you can write your own CSV parser if you’re so
inclined). RCFile is commonly used to store data in Hive. It would be useful to be able
to parse that format in a Vertica UDParser.
d) Find the place in Hive where it compiles HQL into M/R jobs and instead rewrite the
HQL as SQL and, leveraging the above features, send the query to Vertica. The
two systems are not 100% compatible; we can tweak them to shrink the feature gap.
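
For simple queries the gap in step d) can be small. As a rough illustration, the HQL join from the earlier slide runs unchanged on an ordinary SQL engine. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for Vertica; the table contents and the run_as_sql helper are made up for the example, and it says nothing about HQL features that have no SQL equivalent.

```python
import sqlite3

# The HQL query from the earlier slide, passed through without any rewriting.
HQL = """
SELECT a.val1, a.val2, b.val, c.val
FROM a JOIN b ON (a.key = b.key)
LEFT OUTER JOIN c ON (a.key = c.key)
"""


def run_as_sql(query):
    """Run a query against a tiny in-memory database (stand-in for Vertica)."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical sample data; only the table shapes match the query.
        conn.executescript(
            """
            CREATE TABLE a (key INTEGER, val1 TEXT, val2 TEXT);
            CREATE TABLE b (key INTEGER, val TEXT);
            CREATE TABLE c (key INTEGER, val TEXT);
            INSERT INTO a VALUES (1, 'a1', 'a2');
            INSERT INTO a VALUES (2, 'x1', 'x2');
            INSERT INTO b VALUES (1, 'b1');
            INSERT INTO b VALUES (2, 'b2');
            INSERT INTO c VALUES (1, 'c1');
            """
        )
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_as_sql(HQL):
        print(row)
```

The row for key 2 has no match in c, so the LEFT OUTER JOIN fills its last column with NULL (None in Python).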
Thanks!