www.appnovation.com
ATLANTA • LONDON • MONTREAL • NEW YORK • SAN FRANCISCO • VANCOUVER

Introduction to Big Data
Jim MacInnes – Director of Technology
June 23, 2015
D+H

AGENDA
1. About Appnovation
2. What is Big Data?
3. Big Data Access
4. Big Data Analysis
5. Appnovation Use Case

1. Intro to Appnovation

WHO IS APPNOVATION?
LEGAL NAME: Appnovation Technologies
FOUNDED: 2007
HEADQUARTERS: #300-152 W. Hastings St. Vancouver BC
LEADERSHIP: Arnold Leung, Chief Executive Officer
PASSION: We have growth in our DNA and a machine that supports it. We are experts at high-volume recruitment, team building, retention, office expansion, etc.
CORE VALUES: Growth – Openness – Teamwork – Innovation – Customer Satisfaction
SERVICES: Open Source Architecture & Consulting; Enterprise Integration; Drupal Development; Creative Design & Responsive Development; Cross-platform Development; HTML5 Development; Mobile App Development
CLIENTS: Samsung, Intel, Fox, Cisco, Pfizer, Google, Department of Defense, Universal Music Group, NBC, GE, The Recording Academy, and Time Inc. etc.
PARTNERSHIPS: MuleSoft, Google, Acquia, Alfresco, HortonWorks

SERVING GLOBAL ORGANIZATIONS AROUND THE WORLD
NEW offices opening SOON!

SAN FRANCISCO OFFICE
315 Montgomery St.
#800 & 900 San Francisco, CA, 94104

VANCOUVER OFFICE
300-152 West Hastings St
Vancouver BC, V6B 1G8

ATLANTA OFFICE
3414 Peachtree Road, #1600
Atlanta Georgia, 30326-1164

NEW YORK OFFICE
845 Third Avenue, 6th Floor
New York, NY 10022

MONTREAL OFFICE
5455 de Gaspe Avenue, #370
Montreal Quebec, H2T 2A3

LONDON OFFICE
Davidson House, Forbury Square
Reading, RG1 3EU

OUR KEY PARTNERS
MuleSoft: We are a preferred integration partner for SOA, SaaS integration & API management solutions
Google: Appnovation is proud to be a Google Partner and to also have them as a customer
Acquia: Appnovation is the only company in the world that is both an Acquia Enterprise Select Partner and an Alfresco Platinum Partner
Alfresco: Alfresco is an Enterprise Content Management (ECM) system used to manage large amounts of documents, records, and other forms of content

OUR SERVICES
• High-capacity open source architectures, consulting and system design
• Creative Design & Responsive Development to ensure a world-class user experience regardless of device
• Leading-edge front-end and back-end development leveraging best-in-class open source technologies and open standards
• Complex integrations: Drupal, Alfresco, MuleSoft
• Enterprise Integration
• Big Data
• HTML5 Development
• Mobile App Development
• Cross Platform Development

JUST A FEW OF OUR HAPPY CLIENTS

AWARD WINNING SOLUTIONS
• 2014 Acquia Partner Award for Best High Technology Site: Revolution Analytics
• 2014 Acquia Partner Award for Best Brand Experience Site: Samsung Knox
• 2014 Blue Drop Award for Drupal Website of the Year: Samsung Knox
• 2014 Blue Drop Award for Best B2B site: Appnovation Technologies Corporate Site
• 2014 Blue Drop Award for Best Government Site: Bay Area Rapid Transit
• 2014 Blue Drop Award for Best Non-Profit Site: Teach For All
• 2014 Blue Drop Award for Best Retail Site: Rockport
• 2013 Acquia Partner Award for Mobile site - Presented by Acquia at DrupalCon 2013
• 2013 Blue
Drop Award for Best Marketplace - Presented by BlueDrop at DrupalCon 2013
• 2013 Top Web Development Company - Presented by BC in Vancouver 2013
• 2012 Alfresco Solution of the Year - Presented by Alfresco at Partner Conference 2012

2. What is Big Data?

WHAT IS BIG DATA?
Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate - Wikipedia

Two Main Areas of Big Data
• Data Access - Large-scale real-time data storage and retrieval
• Data Analysis - Large-scale “offline” data processing and analysis

3. Big Data Access

Real-Time Data Access
• As data sets grow exponentially, the traditional database method of using shared file systems to manage the storage and retrieval of data falls short
• A new data access model is needed, one that lets database resources scale more linearly

Traditional Relational Database

The NoSQL Model
• Relational databases use shared storage to ensure certain data quality guarantees
– Transactional integrity
– Row-level locking
– ACID: Atomicity, Consistency, Isolation, Durability
• Not all data requires this level of integrity

NoSQL Model

NoSQL Data Model
• No central shared data store; data is “sharded” across multiple file systems
• Does away with full ACID compliance in favor of near-linear scaling
• Not meant for transactions that require full ACID, such as banking transactions
• Meant only for data access and storage, not analytics or complex querying
• No Structured Query Language (SQL)
• Can store unstructured data

NoSQL Software
• MongoDB
• Cassandra
• AWS DynamoDB
• Google Cloud Datastore

4. Big Data Analysis

Load Balancing vs Parallel Processing
• Traditional databases use load balancing to increase the processing power of the database cluster
• With load balancing, each query still runs on a single computer, so the processing power available to that query is limited to one machine

Traditional Load Balanced Database

Map/Reduce
• The concept
of Map/Reduce is a method to run queries in a massively parallel manner.
• A Map/Reduce system can crunch huge amounts of data in a short amount of time

Map/Reduce Database

Map/Reduce
• Meant for “offline”, non-real-time data processing
• Can scale massively across 100s or 1000s of computers

Map/Reduce Software
• Hadoop: the dominant open source implementation, based on Google's published MapReduce design and developed largely at Yahoo
• AWS Elastic MapReduce: really just Hadoop
• Google Cloud MapReduce: really just Hadoop

5. Appnovation Use Case

Pharmaceutical Marketing

Questions?
Please visit us at www.appnovation.com
Thank you!

CANADIAN HEADQUARTERS
152 West Hastings Street
Vancouver BC, V6B 1G8

UNITED STATES OFFICE
3414 Peachtree Road, #1600
Atlanta Georgia, 30326-1164

UNITED KINGDOM OFFICE
3000 Hillswood Drive
Hillswood Business Park
Chertsey KT16 0RS, UK

www.appnovation.com
info@appnovation.com
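
Map/Reduce sketch

As a rough illustration of the Map/Reduce idea from the Big Data Analysis section, the sketch below splits the input into chunks, counts words in each chunk in parallel (the map step), and then merges the partial counts (the reduce step). It is a toy under stated assumptions, not Hadoop's API: the function names `map_chunk`, `merge_counts`, and `word_count` are hypothetical, and a local thread pool stands in for the many machines a real cluster would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(lines):
    """Map step: count the words in one chunk of the input."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial results into one."""
    return left + right


def word_count(lines, workers=2):
    """Split the input into chunks, map each chunk, then reduce the results."""
    chunk_size = max(1, len(lines) // workers)
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    # Assumption: threads stand in for the separate computers of a real cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge_counts, partials, Counter())


if __name__ == "__main__":
    sample = ["big data is big", "data access and data analysis"]
    print(word_count(sample).most_common(2))
```

Because each chunk is processed independently and the merge step only needs partial results, the same structure can in principle be spread over many more workers, which is what lets a Map/Reduce system scale out.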