Low-code Data Engineering on Databricks for Dummies
An essential guide to boosting business data team productivity on the lakehouse

Our speakers
Mitesh Shah, VP, Market Strategy
Nathan Tong, Sales Engineer
Roberto Salcido, Senior Solutions Architect

A non-intimidating primer to low-code data engineering on Databricks

Goals
✓ An approachable guide to the lakehouse and low-code data engineering
✓ Reinforce the content of Low-code Data Engineering on Databricks for Dummies
✓ Bring the concepts in the book to life through demo, Q&A, and more

Assumptions
✓ You are early in your low-code data engineering journey
✓ You might be early in your lakehouse journey
✓ You might be new to the concepts, but you're no dummy :-)

Agenda for today
✓ The need for lakehouse and low-code data engineering
✓ Demo
✓ The role of AI
✓ Tales from the field: How Prophecy & Databricks drive customer impact

Tech leaders are to the right of the Data Maturity Curve
From hindsight to foresight: competitive advantage grows with Data + AI maturity, moving from Reports (What happened?) and Clean Data through Ad Hoc Queries and Data Exploration to Predictive Modeling (What will happen?), Prescriptive Analytics (How should we respond?), and Automated Decision Making (automatically make the best decision).
©2023 Databricks Inc. — All rights reserved

Realizing this requires two disparate, incompatible data platforms
Along the same Data Maturity Curve, BI workloads (Reports, Clean Data, Ad Hoc Queries) have lived in a Data Warehouse, while AI workloads (Data Exploration, Predictive Modeling, Prescriptive Analytics, Automated Decision Making) have lived in a Data Lake.
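The maturity stages named on the curve (descriptive, predictive, prescriptive) can be made concrete with a toy, stdlib-only Python sketch; the metric, numbers, and decision rule below are invented purely for illustration:

```python
# Toy illustration of the Data Maturity Curve stages, using plain Python.
# The metric name, values, and rule are hypothetical.
monthly_sales = [100, 110, 125, 138]  # invented metric history

# "What happened?" (reports / descriptive analytics)
last_month = monthly_sales[-1]

# "What will happen?" (predictive modeling, here a naive linear trend)
avg_growth = sum(b - a for a, b in zip(monthly_sales, monthly_sales[1:])) / (
    len(monthly_sales) - 1
)
forecast = last_month + avg_growth

# "How should we respond?" (prescriptive analytics: a simple decision rule)
action = "increase inventory" if forecast > last_month else "hold inventory"

print(last_month, round(forecast, 1), action)
```

Real platforms replace each stage with far richer tooling (dashboards, ML models, optimization), but the progression from describing the past to prescribing an action is the same one the curve depicts.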
There is no need to have two disparate platforms
Data Warehouse: highly reliable and efficient; structured tables; supports Business Intelligence and SQL Analytics; governance and security via Table ACLs; incomplete support for Data Science & ML and Data Streaming use cases.
Data Lake: all of the data and very adaptable; structured and unstructured files; supports Data Science & ML and Data Streaming; governance and security over files and blobs.
Running both means incompatible security and governance models, copies of subsets of data, and disjointed, duplicative data silos.

This is the lakehouse paradigm
One platform for AI/ML, SQL, BI, and Streaming use cases
One security and governance approach for all data assets on all clouds
A reliable data platform to efficiently handle all data types
Unity Catalog: fine-grained governance for data and AI
Delta Lake: data reliability and performance
Data Lakehouse: structured tables and unstructured files

Databricks Lakehouse Platform
Data Warehousing, Data Engineering, Data Streaming, Data Science and ML
Simple: unify your data warehousing and AI use cases on a single platform
Unity Catalog: fine-grained governance for data, analytics and AI
Delta Lake: data reliability and performance
Cloud Data Lake: all structured and unstructured data
Multicloud: one consistent data platform across clouds
Open: built on open source and open standards

Raw data is rarely suitable for immediate consumption
Data transformations are the key to building AI- and analytics-ready data products
Real-time and CDC sources are ingested into Bronze tables (raw ingestion and history), refined into Silver (filtered, cleaned, augmented), and aggregated into Gold (business aggregates and data models). Curated data feeds Databricks Machine Learning for data science and the Databricks SQL warehouse for enterprise reporting and BI, with data governance powered by Databricks Unity Catalog.

Existing transformation options have significant shortcomings
Code / Scripts:
X SQL is limited to DWH/relational workloads
X Dependency on skilled data engineers
X Missing capabilities: orchestration, lineage, deployment, …
Legacy low-code solutions:
X Lock-in
X Non-native performance
X No support for DataOps

Prophecy is the low-code platform with native-to-cloud execution
ACCESS LAYER: low-code interfaces for business data teams, uniquely with code and extensibility for the data platform team
INTELLIGENCE LAYER: generative AI with knowledge graphs on metadata
SYSTEMS LAYER: execution, orchestration, scheduling, observability, search, quality, lineage, …

The complete low-code data transformation platform for enterprises in the cloud
✓ Complete: Prophecy is a complete data engineering platform that spans data pipeline development, deployment, management, and orchestration
✓ Low code: a visual UI provides a drag-and-drop interface for building pipelines; Prophecy Data Copilot generates pipelines based on natural language prompts
✓ Open: Visual=Code generates 100% open, git-committable code that is native to the underlying cloud data platform, enabling DataOps practices and preventing lock-in
✓ Standardization & Reuse:
Framework Builder enables users to add a library of visual components, building standards for data and enabling reuse across data stakeholders
✓ AI: data engineers can quickly build generative AI applications on unstructured enterprise data, powering use cases that immediately improve enterprise efficiency

Live demo
Discover how Prophecy democratizes data engineering

"The hottest new programming language is English." (Andrej Karpathy)

Prophecy Data Copilot
An AI assistant for trusted data pipelines: type your query and Copilot will create a data pipeline.
● Enable a broader set of users through natural language
● Accelerate pipeline creation with auto-generated pipelines
● Improve pipeline quality with intelligent recommendations

Databricks Assistant
A context-aware AI assistant, available in Databricks Notebooks, the SQL editor, and the file editor. It leverages Unity Catalog to provide personalized responses.
● Generates SQL queries and code: takes requests in natural language and creates code snippets, applying details from code cells, libraries, runtime, and more to improve accuracy
● Explains, diagnoses, and fixes issues from within a cell

Databricks Assistant + Prophecy Data Copilot
Using natural language to accelerate and democratize work across the data engineering stack
● Generate SQL or Python code
● Generate pipelines
● Autocomplete code or queries
● Suggest transformations
● Fix issues
● Auto-documentation

Tales from the field
Prophecy + Databricks drive business impact

Driving home faster data pipelines & league-winning analytics
Legacy data architecture limits data insights
● Rigid architectures and siloed data complicate analytics
● Limited data engineering resources reduce time to insights
Unlocking data access and transformation for all
● All data users can build high-quality data pipelines and products quickly, collaboratively, and easily
● Flexibility and extensibility have enabled self-serve analytics at
scale

Building their competitive edge with data
● 7x faster pipeline development
● 3x more analysts and developers building pipelines
● 10x faster meeting stakeholder KPIs

24x faster data inspires confident investments
Complex ETL blocking investment growth
● Manual processes were impacting time-to-value
● Legacy tooling caused missed investment opportunities
Boosting business data user productivity
● A low-code approach empowered the data team without reliance on engineering
● Access to the underlying code ensures pipelines are reliable and performant
Data that moves at the speed of the market
● 42x higher data team productivity
● 24x faster time-to-insights

Q&A with Roberto & Nathan
Prophecy + Databricks, when to migrate, and common questions

FREE TRIAL
A low-code approach to data transformation
Start for free: https://app.prophecy.io/metadata/auth/signup

Q&A
● How are you different from Alteryx? Matillion?
● Can you define what Prophecy is all about in a single sentence?
● What types of data does Prophecy work with? Unstructured data?
● Is all raw data absorbed into the system deposited into Bronze tables, or can preprocessing steps occur within input elements (e.g., multiple high-volume Kubernetes streams)?

Key takeaways
● Low-code tooling enables all data team members and boosts their productivity
● We are able to maintain a high-quality codebase and optimize as needed
● Integration with the Databricks Lakehouse unifies a low-code approach with data, analytics, and AI use cases

Thank you
prophecy.io

Data transformation for everyone
A primer on the power of low-code data engineering and how it can enable both technical and business data users with visual data transformation to convert raw data into analytics and machine learning-ready data.
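The Bronze/Silver/Gold ("medallion") flow described earlier can be sketched in a few lines of plain Python. On Databricks this would be PySpark or SQL over Delta tables; the stdlib-only code below is only an illustration, and the records and field names are hypothetical:

```python
# Minimal sketch of the medallion architecture: Bronze (raw) -> Silver
# (cleaned) -> Gold (business aggregates). All data here is made up.
from collections import defaultdict

# Bronze: raw ingestion and history, kept as-is, including bad rows
bronze = [
    {"order_id": "1", "region": "west", "amount": "120.50"},
    {"order_id": "2", "region": "west", "amount": "bad-value"},
    {"order_id": "3", "region": "east", "amount": "75.00"},
]

# Silver: filtered, cleaned, augmented
silver = []
for row in bronze:
    try:
        amount = float(row["amount"])
    except ValueError:
        continue  # drop rows that fail a basic quality check
    silver.append({**row, "amount": amount})

# Gold: business aggregates ready for BI and ML
gold = defaultdict(float)
for row in silver:
    gold[row["region"]] += row["amount"]

print(dict(gold))  # -> {'west': 120.5, 'east': 75.0}
```

The point of the pattern is that each layer has a clear contract: Bronze preserves everything for replay, Silver enforces quality, and Gold exposes only consumption-ready shapes, which is what makes the curated data safe for downstream reporting and ML.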
Read this ebook to learn:
● How the lakehouse architecture has transformed the modern data stack
● Why Spark and SQL expertise does not have to be a blocker to data engineering success
● How Prophecy's visual, low-code data solution democratizes data engineering
● Real-world use cases where Prophecy has unlocked the potential of the lakehouse
● And much, much more
Get the book

Prophecy overview
Simplify data engineering for technical and business data users