SAP PRESS is a joint initiative of SAP and Rheinwerk Publishing. The know-how offered by SAP specialists combined with the expertise of Rheinwerk Publishing offers the reader expert books in the field. SAP PRESS features first-hand information and expert advice, and provides useful skills for professional decision-making. SAP PRESS offers a variety of books on technical and business-related topics for the SAP user. For further information, please visit our website: http://www.sap-press.com. Silvia, Frye, Berg SAP HANA: An Introduction (4th Edition) 2017, 549 pages, hardcover and e-book www.sap-press.com/4160 Jonathan Haun SAP HANA Security Guide 2017, 541 pages, hardcover and e-book www.sap-press.com/4227 Anil Bavaraju Data Modeling for SAP HANA 2.0 2019, approx. 475 pp., hardcover and e-book www.sap-press.com/4722 Alborghetti, Kohlbrenner, Pattanayak, Schrank, Sboarina SAP HANA XSA: Native Development for SAP HANA 2018, 607 pages, hardcover and e-book www.sap-press.com/4500 Rudi de Louw SAP HANA® 2.0 Certification Guide: Application Associate Exam Dear Reader, It has been my privilege to work with Rudi for two years now on two different editions of this book. He’s everything an editor looks for in an author: knowledgeable, dedicated, and communicative (I always know exactly what he’s working on)! SAP HANA is a difficult topic to write a certification guide for, as it’s an ever-moving target, and ensuring the most up-to-date coverage requires mastery of many areas. Rudi has always been happy to do the hard work of getting the best information, and we’ve been glad to have him. Sadly, this will be Rudi’s final edition as author of this book, as his career moves on to new (and hopefully bigger and better!) things. Rudi, we wish you nothing but luck in all your future endeavors! (And if you happen to find yourself working on a different SAP topic where we don’t yet have a book? Well, I think I can safely say we’d be delighted to bring you back into the fold.) What did you think about SAP HANA 2.0 Certification Guide: Application Associate Exam? Your comments and suggestions are the most useful tools to help us make our books the best they can be. Please feel free to contact me and share any praise or criticism you may have. Thank you for purchasing a book from SAP PRESS! Meagan White Editor, SAP PRESS meaganw@rheinwerk-publishing.com www.sap-press.com Rheinwerk Publishing • Boston, MA Notes on Usage This e-book is protected by copyright. By purchasing this e-book, you have agreed to accept and adhere to the copyrights. You are entitled to use this e-book for personal purposes. You may print and copy it, too, but also only for personal use. Sharing an electronic or printed copy with others, however, is not permitted, neither as a whole nor in parts. Of course, making them available on the Internet or in a company network is illegal as well. For detailed and legally binding usage conditions, please refer to the section Legal Notes. This e-book copy contains a digital watermark, a signature that indicates which person may use this copy: Imprint This e-book is a publication many contributed to, specifically: Editor Meagan White Acquisitions Editor Hareem Shafi Copyeditor Julie McNamee Cover Design Graham Geary Photo Credit iStockphoto.com/680541274/© panimoni Production E-Book Kelly O’Callaghan Typesetting E-Book SatzPro, Krefeld (Germany) We hope that you liked this e-book. Please share your feedback with us and read the Service Pages to find out how to contact us. The Library of Congress has cataloged the printed edition as follows: Names: De Louw, Rudi, author. Title: SAP HANA 2.0 certification guide : application associate exam / Rudi de Louw. Other titles: SAP HANA certification guide Description: 3rd edition. | Bonn ; Boston : Rheinwerk Publishing, 2019. | Revised edition of: SAP HANA certification guide / Rudi de Louw. 2016. | Includes index. Identifiers: LCCN 2019017642 (print) | LCCN 2019018195 (ebook) | ISBN 9781493218301 | ISBN 9781493218295 (alk. paper) Subjects: LCSH: Relational databases--Examinations--Study guides. | Business enterprises--Computer networks--Examinations--Study guides. | Computer programmers--Certification. | SAP HANA (Electronic resource)--Examinations--Study guides. Classification: LCC QA76.9.D32 (ebook) | LCC QA76.9.D32 D45 2019 (print) | DDC 005.75/6076--dc23 LC record available at https://lccn.loc.gov/2019017642 ISBN 978-1-4932-1829-5 (print) ISBN 978-1-4932-1830-1 (e-book) ISBN 978-1-4932-1831-8 (print and e-book) © 2019 by Rheinwerk Publishing, Inc., Boston (MA) 3rd edition 2019 7 Contents Acknowledgments .............................................................................................................................. 15 Preface .................................................................................................................................................... 17 1 SAP HANA Certification Track—Overview 25 Who This Book Is For .......................................................................................................... 26 SAP HANA Certifications .................................................................................................. 27 Associate-Level Certification ......................................................................................... Specialist-Level Certification ......................................................................................... 28 30 SAP HANA Application Associate Certification Exam ......................................... 30 Exam Objective .................................................................................................................. Exam Structure .................................................................................................................. Exam Process ...................................................................................................................... 32 34 36 Summary ................................................................................................................................. 37 SAP HANA Training 39 SAP Education Training Courses ................................................................................... 40 Training Courses for SAP HANA Certifications ........................................................ Additional SAP HANA Training Courses .................................................................... 41 42 Other Sources of Information ........................................................................................ 43 SAP Help ............................................................................................................................... SAP HANA Home Page and SAP Community ........................................................... SAP HANA Academy ......................................................................................................... openSAP ............................................................................................................................... 43 46 46 48 Hands-On with SAP HANA ............................................................................................... 49 Where to Get an SAP HANA System ........................................................................... Project Examples ............................................................................................................... Where to Get Data ............................................................................................................ 50 61 63 2 8 Contents 3 Exam Questions ................................................................................................................... 64 Types of Questions ........................................................................................................... Elimination Technique .................................................................................................... Bookmark Questions ........................................................................................................ 64 67 67 General Examination Strategies ................................................................................... 68 Summary ................................................................................................................................. 69 Technology, Architecture, and Deployment Scenarios 71 Objectives of This Portion of the Test ........................................................................ 72 Key Concepts Refresher .................................................................................................... 73 In-Memory Technology ................................................................................................... 73 Architecture and Approach ............................................................................................ 79 Deployment Scenarios .................................................................................................... 93 SAP HANA in the Cloud ................................................................................................... 104 SAP Analytics Cloud and Business Intelligence ....................................................... 108 Important Terminology .................................................................................................... 114 Practice Questions .............................................................................................................. 116 Practice Question Answers and Explanations ........................................................ 122 Takeaway ................................................................................................................................ 126 Summary ................................................................................................................................. 127 4 Modeling Tools, Management, and Administration 129 Objectives of This Portion of the Test ........................................................................ 131 Key Concepts Refresher .................................................................................................... 132 New Modeling Process .................................................................................................... 132 New Architecture for Modeling and Development ............................................... 138 Contents Creating a Project .............................................................................................................. Creating SAP HANA Design-Time Objects ................................................................ Building and Managing Information Models .......................................................... Refactoring .......................................................................................................................... 145 155 162 170 Important Terminology .................................................................................................... 174 Practice Questions ............................................................................................................... 175 Practice Question Answers and Explanations ........................................................ 178 Takeaway ................................................................................................................................ 180 Summary ................................................................................................................................. 180 5 Information Modeling Concepts 181 Objectives of This Portion of the Test ........................................................................ 182 Key Concepts Refresher .................................................................................................... 183 Tables .................................................................................................................................... Views ..................................................................................................................................... Cardinality ........................................................................................................................... Joins ..................................................................................................................................... Core Data Services Views ............................................................................................... Cube ..................................................................................................................................... Calculation Views ............................................................................................................. Other Modeling Artifacts ............................................................................................... Semantics ............................................................................................................................ Hierarchies .......................................................................................................................... 183 183 186 186 197 200 204 208 213 213 Important Terminology .................................................................................................... 216 Practice Questions ............................................................................................................... 217 Practice Question Answers and Explanations ........................................................ 225 Takeaway ................................................................................................................................ 229 Summary ................................................................................................................................. 229 9 10 Contents 6 Calculation Views 231 Objectives of This Portion of the Test ........................................................................ 232 Key Concepts Refresher .................................................................................................... 233 Data Sources for Calculation Views ............................................................................ Calculation Views: Dimension, Cube, and Cube with Star Join ........................ Working with Nodes ........................................................................................................ Semantics Node ................................................................................................................. 234 235 248 263 Important Terminology .................................................................................................... 269 Practice Questions .............................................................................................................. 271 Practice Question Answers and Explanations ........................................................ 275 Takeaway ................................................................................................................................ 278 Summary ................................................................................................................................. 278 7 Modeling Functions 279 Objectives of This Portion of the Test ........................................................................ 280 Key Concepts Refresher .................................................................................................... 281 Calculated Columns ......................................................................................................... Restricted Columns .......................................................................................................... Filters ..................................................................................................................................... Variables .............................................................................................................................. Input Parameters .............................................................................................................. Session Variables ............................................................................................................... Currency ............................................................................................................................... Hierarchies .......................................................................................................................... Hierarchy Functions ......................................................................................................... 281 286 291 293 297 302 303 308 316 Important Terminology .................................................................................................... 326 Practice Questions .............................................................................................................. 328 Practice Question Answers and Explanations ........................................................ 333 Takeaway ................................................................................................................................ 336 Summary ................................................................................................................................. 336 Contents 8 SQL and SQLScript 337 Objectives of This Portion of the Test ........................................................................ 339 Key Concepts Refresher .................................................................................................... 340 SQL ..................................................................................................................................... SQLScript .............................................................................................................................. Views, Functions, and Procedures ............................................................................... Catching Up with SAP HANA 2.0 .................................................................................. 340 347 362 365 Important Terminology .................................................................................................... 365 Practice Questions ............................................................................................................... 366 Practice Question Answers and Explanations ........................................................ 371 Takeaway ................................................................................................................................ 374 Summary ................................................................................................................................. 374 9 Data Provisioning 375 Objectives of This Portion of the Test ........................................................................ 376 Key Concepts Refresher .................................................................................................... 377 Concepts .............................................................................................................................. SAP Data Services .............................................................................................................. SAP Landscape Transformation Replication Server ............................................... SAP Replication Server ..................................................................................................... SAP HANA Smart Data Access ....................................................................................... SAP HANA Smart Data Integration and SAP HANA Smart Data Quality ........ SAP HANA Streaming Analytics .................................................................................... SQL Anywhere and Remote Data Sync ....................................................................... Flat Files ................................................................................................................................ Web Services (OData and REST) ................................................................................... SAP Vora ............................................................................................................................... SAP HANA Data Management Suite ........................................................................... 378 383 385 387 388 392 395 397 399 400 401 402 Important Terminology .................................................................................................... 403 Practice Questions ............................................................................................................... 405 11 12 Contents Practice Question Answers and Explanations ........................................................ 408 Takeaway ................................................................................................................................ 410 Summary ................................................................................................................................. 410 10 Text Processing and Predictive Modeling 411 Objectives of This Portion of the Test ........................................................................ 413 Key Concepts Refresher .................................................................................................... 414 Text Processing .................................................................................................................. 414 Predictive Modeling ......................................................................................................... 432 Important Terminology .................................................................................................... 452 Practice Questions .............................................................................................................. 453 Practice Question Answers and Explanations ........................................................ 456 Takeaway ................................................................................................................................ 457 Summary ................................................................................................................................. 458 11 Spatial, Graph, and Series Data Modeling 459 Objectives of This Portion of the Test ........................................................................ 461 Key Concepts Refresher .................................................................................................... 461 Spatial Modeling ............................................................................................................... 462 Graph Modeling ................................................................................................................. 480 Series Data Modeling ....................................................................................................... 498 Important Terminology .................................................................................................... 508 Practice Questions .............................................................................................................. 509 Practice Question Answers and Explanations ........................................................ 513 Takeaway ................................................................................................................................ 517 Summary ................................................................................................................................. 517 Contents 12 Optimization of Information Models 519 Objectives of This Portion of the Test ........................................................................ 520 Key Concepts Refresher .................................................................................................... 521 Architecture and Performance ..................................................................................... Redesigned and Optimized Applications .................................................................. Effects of Good Performance ........................................................................................ Information Modeling Techniques ............................................................................. Optimization Tools ........................................................................................................... Best Practices for Optimization .................................................................................... 521 522 523 524 525 536 Important Terminology .................................................................................................... 542 Practice Questions ............................................................................................................... 543 Practice Question Answers and Explanations ........................................................ 546 Takeaway ................................................................................................................................ 548 Summary ................................................................................................................................. 548 13 Security 549 Objectives of This Portion of the Test ........................................................................ 550 Key Concepts Refresher .................................................................................................... 551 Usage and Concepts ......................................................................................................... SAP HANA XSA and SAP HANA Deployment Infrastructure ............................... Users ..................................................................................................................................... Roles ..................................................................................................................................... Privileges .............................................................................................................................. Assigning Roles to Users ................................................................................................. Data Security ...................................................................................................................... 551 554 559 565 569 577 578 Important Terminology .................................................................................................... 581 Practice Questions ............................................................................................................... 582 Practice Question Answers and Explanations ........................................................ 585 Takeaway ................................................................................................................................ 587 Summary ................................................................................................................................. 587 13 14 Contents 14 Virtual Data Models 589 Objectives of This Portion of the Test ........................................................................ 591 Key Concepts Refresher .................................................................................................... 591 Reporting in the Transactional Systems ................................................................... 592 SAP HANA Live .................................................................................................................... 595 SAP S/4HANA Embedded Analytics ............................................................................ 608 Important Terminology .................................................................................................... 612 Practice Questions .............................................................................................................. 613 Practice Question Answers and Explanations ........................................................ 617 Takeaway ................................................................................................................................ 619 Summary ................................................................................................................................. 620 The Author ............................................................................................................................................. 621 Index ........................................................................................................................................................ 623 Service Pages ..................................................................................................................................... Legal Notes ......................................................................................................................................... I II 15 Acknowledgments “Write the vision, and make it plain upon tables, that he may run that readeth it.” – Habakkuk 2:2 (King James Bible) SAP PRESS had the vision for a book on the SAP HANA certification exam, and it was my privilege to write it and make it plain. It’s my wish that every reader of this book runs with the knowledge and understanding gained from it. All the writing has now even produced what I jokingly refer to as a “trilogy.” This is the third and final edition from my side. In my more than eight-year journey with SAP HANA, I’ve met hundreds of people, all passionate about SAP HANA. Thank you to all of you! Thank you to the entire team at SAP PRESS for making this book happen, and especially to the SAP PRESS editor, Meagan White, and the other editors for your input. Thank you to all my SAP Co-Innovation Lab colleagues for your support and the discussions we had around SAP HANA. And an extra thank you to Skhu, Phillip, and Tshepo for all your input and encouragement. I’m very thankful to all of you: my family, my friends, my colleagues, and everyone I’ve worked with to build great things with SAP HANA. To my readers: I trust that your journey will be even more amazing! 17 Preface The SAP PRESS Certification Series is designed to provide anyone who is preparing to take an SAP certified exam with all the review, insight, and practice they need to pass the exam. The series is written in practical, easy-to-follow language that provides targeted content focused on what you need to know to successfully take your exam. This book is specifically written for those interested in using the SAP HANA 2.0 version and preparing to take the SAP Certified Application Associate – SAP HANA (C_HANAIMP_15) exam. This book will also be helpful to those of you taking some of the older exams, such as the SPS 14 exam (C_HANAIMP_14). The SAP Certified Application Associate – SAP HANA exam tests the taker in the areas of data modeling. It focuses on core SAP HANA skills, as opposed to skills that are specific to a particular implementation type (e.g., SAP Business Warehouse on SAP HANA or SAP Business Suite on SAP HANA). The SAP Certified Application Associate – SAP HANA test can be taken by anyone who signs up. Using this book, you’ll walk away with a thorough understanding of the exam’s structure and what to expect when taking it. You’ll receive a refresher on the key concepts covered on the exam, and you’ll be able to test your skills via sample practice questions and answers. The book is closely aligned with the course syllabus and the exam structure, so all the information provided is relevant and applicable to what you need to know to prepare. We explain the SAP products and features using practical examples and straightforward language, so you can prepare for the exam and improve your skills in your day-to-day work as an SAP HANA data modeler. Each book in the series has been structured and designed to highlight what you really need to know. 18 Preface Structure of This Book Each chapter begins with a clear list of the learning objectives, as shown here: Techniques You’ll Master 쐍 Prepare for the exam 쐍 Understand the general exam structure 쐍 Access practice questions for preparation From there, you’ll dive into the chapter and get right into the test objective coverage. Throughout the book, we’ve also provided several elements that will help you access useful information: 쐍 Tips call out useful information about related ideas and provide practical suggestions for how to use a particular function. For example: Tip This book contains screenshots and diagrams to guide your understanding of the many information-modeling concepts. 쐍 Notes provide other resources to explore or special tools or services from SAP that will help you with the topic under discussion. For example: Note This certification guide covers all topics you need to successfully pass the exam. It provides sample questions similar to those found on the actual exam. 쐍 SPS 04 notes will show you what is new in SAP HANA SPS 04. This will help you when studying exams after C_HANAIMP_15. For example: SPS 04 When studying for the C_HANAIMP_15 exam, you can ignore these notes when doing your final preparation. But they will be helpful to keep you up to date and can be used to prepare for later exams when they get published. Each chapter that covers an exam topic is organized in a similar fashion so you can become familiar with the structure and easily find the information you need. Here’s an example of a typical chapter structure: Preface 쐍 Introductory bullets The beginning of each chapter discusses the techniques you must master to be considered proficient in the topic for the certification examination. 쐍 Topic introduction This section provides you with a general idea of the topic at hand to frame future sections. It also includes objectives for the exam topic covered. 쐍 Real-world scenario This section shows a real-world scenario where these skills would be beneficial to you or your company. 쐍 Objectives This section provides you with the necessary information to successfully pass this portion of the test. 쐍 Key concept refresher This section outlines the major concepts of the chapter. It identifies the tasks you’ll need to understand or perform properly to answer the questions on the certification examination. 쐍 Important terminology Just prior to the practice examination questions, we provide a section to review important terminology. This may be followed by definitions of various terms from the chapter. 쐍 Practice questions The chapter then provides a series of practice questions related to the topic of the chapter. The questions are structured in a similar way to the actual questions on the certification examination. 쐍 Practice question answers and explanations Following the practice exercise are the solutions to the practice exercise questions. As part of the answer, we discuss why an answer is considered correct or incorrect. 쐍 Takeaway This section provides a takeaway or reviews the areas you should now understand. This refresher section identifies the key concepts in the chapter and provides some tips related to the chapter. 쐍 Summary Finally, we conclude with a summary of the chapter. Now that you have an idea of how the book is structured, the following list will dive into the individual topics of the exam covered in each chapter: 19 20 Preface 쐍 Chapter 1, SAP HANA Certification Track—Overview, provides a look at the different certifications for SAP HANA: associate, professional, and specialist. It then looks at the general exam objectives, structure, and scoring. 쐍 Chapter 2, SAP HANA Training, discusses the available training and training material for test takers, so you know where to look for answers beyond the book. We identify the SAP Education training courses available for classroom, virtual, and e-learning, in addition to SAP Learning Hub offerings. We then look into additional resources, including the official SAP documentation and the SAP HANA Academy video tutorials. We also talk about the SAP HANA, express edition, which you can use to get practical experience. 쐍 Chapter 3, Technology, Architecture, and Deployment Scenarios, walks through the evolution of SAP HANA’s in-memory technology, including how it addresses the problems of the past, such as slow disks. From there, you’re given an overview of the persistence layer, before looking at the different deployment options, including cloud deployments, and delving into the new SAP HANA extended application services, advanced model (SAP HANA XSA) features. 쐍 Chapter 4, Modeling Tools, Management, and Administration, introduces you to what’s new in SAP HANA 2.0, the new architecture and modeling process, and the SAP HANA modeling tools, including SAP Web IDE for SAP HANA. In addition, you’ll learn how to create new information models, tables, and data. 쐍 Chapter 5, Information-Modeling Concepts, discusses general modeling concepts such as cubes, fact tables, and the difference between attributes and measures, in addition to covering general modeling best practices. 쐍 Chapter 6, Calculation Views, looks in depth at the three primary information views in SAP HANA and how to build information models using calculation views. 쐍 Chapter 7, Modeling Functions, takes what you’ve learned one step further by showing you how to enhance your calculation views with calculated columns, filter expressions, and more. 쐍 Chapter 8, SQL and SQLScript, reviews how to enhance SAP HANA models using SQL and SQLScript to create tables, read and filter data, create calculated columns, implement procedures, use user-defined functions, and more. 쐍 Chapter 9, Data Provisioning, introduces the concepts, tools, and methods for data provisioning in SAP HANA. Coverage runs from SAP Data Services to SAP HANA streaming analytics. Preface 쐍 Chapter 10, Text Processing and Predictive Modeling, looks at advanced topics, such as creating a text search index, implementing fuzzy search, performing text analysis, and creating predictive analysis models. 쐍 Chapter 11, Spatial, Graph, and Series Data Modeling, looks at more of the advanced topics, such as using the key components of spatial processing, working with relationships between data objects using graphs, and understanding series data. 쐍 Chapter 12, Optimization of Information Models, reviews how to monitor, investigate, and optimize data models in SAP HANA. We’ll look at how optimization affects architecture, performance, and information-modeling techniques. We’ll then dive into the different tools used for optimization including the SQL Analyzer, SAP HANA cockpit, debug query mode, and performance analysis mode. 쐍 Chapter 13, Security, discusses how users, roles, and privileges work together. We’ll look at the different types of users, template roles, and privileges that are available in the SAP HANA system. 쐍 Chapter 14, Virtual Data Models, looks at how to adapt predelivered content to your own business solutions using SAP HANA Live. You’ll learn about SAP HANA Live’s different views, and its two primary tools: SAP HANA Live Browser and SAP HANA Live extension assistant. We also look at SAP S/4HANA embedded analytics, which uses similar concepts. What’s New in the Third Edition Readers of the first two books will be interested to find out what is new in this edition. We’re working with SAP HANA 2.0, using the same user interface as is used by SAP Cloud Platform. While the modeling concepts are still the same as SAP HANA 1.0, the way you work with SAP HANA 2.0 has changed significantly. SAP HANA now ensures that the development you do for on-premise systems will also work in the cloud. This was accomplished by moving SAP HANA into the container and microservices world using SAP HANA XSA. This change has had a huge ripple effect on architecture, modeling tools, managing models, and security, to give a few examples. A bigger distinction now exists between design-time and runtime objects, which leads to updated processes for creating information models (design-time), building these information models to create runtime objects, and, finally, storing, managing, and securing these models. 21 22 Preface We now use SAP Web IDE for SAP HANA instead of SAP HANA Studio. All the screens shown in the book, with their descriptions, were updated to reflect this change. Chapter 4 now discusses this new modeling environment, instead of the old SAP HANA Studio and SAP HANA Web-Based Development Workbench. The bigger distinction between the design-time and runtime objects brought in new tools as well, such as the Git repository, the building process (for creating runtime objects), and new rules for managing your models. In the sections on graphical calculation views, the following additional node types are discussed: non-equi joins, minus, intersects, hierarchy functions, and anonymization. Hierarchy functionality has expanded as well. The largest changes are in the areas of advanced modeling. The SAP training courses now include a three-day training course on the topics of text processing, predictive modeling, spatial processing, graph modeling, and series data. The exam now has more emphasis on these topics, with the marks for this area moving up from 10% to 25%. Tip Even though the exam is based on SAP HANA 2.0 SPS 03, this book also addresses SAP HANA 2.0 SPS 04. Practice Questions We want to give you some background on the test questions before you encounter the first few in the chapters. Just like the exam, each question has a basic structure: 쐍 Actual question Read the question carefully, and be sure to consider all the words used in the question because they can impact the answer. 쐍 Question hint This isn’t a formal term, but we call it a hint because it tells you how many answers are correct. If only one is correct, normally it tells you to choose the correct answer. If more than one is correct, like the actual certification examination, it indicates the correct number of answers. 쐍 Answers The answers to select from depend on the question type. The following question types are possible: Preface – Multiple response: More than one correct answer is possible. – Multiple choice: Only a single answer is correct. – True/false: Only a single answer is correct. These types of questions aren’t used in the exam, but they are used in the book to test your understanding. Summary With this certification guide, you’ll learn how to approach the content and key concepts highlighted for each exam topic. In addition, you’ll have the opportunity to practice with sample test questions in each chapter. After answering the practice questions, you’ll be able to review the explanation of the answer, which dissects the question by explaining why the answers are correct or incorrect. The practice questions give you insight into the types of questions you can expect, what the questions look like, and how the answers relate to the question. Understanding the composition of the questions and seeing how the questions and answers work together is just as important as understanding the content. This book gives you the tools and understanding you need to be successful. Armed with these skills, you’ll be well on your way to becoming an SAP Certified Application Associate in SAP HANA. 23 Chapter 1 SAP HANA Certification Track—Overview Techniques You’ll Master 쐍 Understand the different levels of SAP HANA certifications 쐍 Find the correct SAP HANA certification exam for you 쐍 Learn the scoring structure of the certification exams 쐍 Discover how to book your SAP HANA certification exam 26 Chapter 1 SAP HANA Certification Track—Overview Welcome to the exciting world of SAP HANA! SAP as a company is moving at full speed to update all of its solutions to use SAP HANA. Some solutions have even changed completely to better leverage the capabilities of SAP HANA. In many senses, the future of SAP is tied to the SAP HANA platform. SAP HANA itself is developing at a fast pace to provide all the features required for innovation. Many of the new areas that we see SAP working in, such as cloud and mobile, are enabled because of SAP HANA’s many features. The best-known SAP HANA feature is its fast response times. Both cloud and mobile can suffer from latency issues, and they certainly improve with faster response times. With the large role that SAP HANA now plays, the demand for experienced professionals with SAP HANA knowledge has grown. By purchasing this book, you’ve taken the first step toward meeting these demands and advancing your career via knowledge of SAP HANA. Note Five different certifications are available for SAP HANA, and each focuses on a different topic. The focus of this book is on the SAP Certified Application Associate – SAP HANA certification, which is the most in-demand SAP HANA certification. In this chapter, we’ll discuss the target demographic of this book before diving into a discussion of the various SAP HANA certification exams available. We’ll then look at the specific exam this book is based on before examining the structure of that exam. Who This Book Is For This book covers SAP HANA from a modeling and application perspective for the C_HANAIMP_15 exam. As such, this book is meant for quite a large group of people from various backgrounds and interests. In a broad sense, this book talks about how to work with data and information—and more people work with information now than ever before. This book doesn’t go into the more technical topics (e.g., installations, upgrades, monitoring, updates, backups, and transports) addressed by some of the other SAP HANA certifications. SAP HANA Certifications Chapter 1 Let’s discuss who can benefit from this book: 쐍 Information modelers are a core audience group for this book because the main topic is information modeling. This includes people who come from an enterprise data warehouse (EDW) background. 쐍 Developers need to create and use information models—whether for SAP applications using ABAP, cloud applications using Java or Node, web services, mobile apps, or new SAP HANA applications using SAPUI5. 쐍 Database administrators (DBAs) must understand how to use the new inmemory and columnar database technologies of SAP HANA. 쐍 Architects want to understand SAP HANA concepts, get an overview of what SAP HANA entails, and plan new landscapes and solutions using SAP HANA. 쐍 Data integration and data provisioning specialists need to know how to process data in an optimal manner using SAP HANA. 쐍 Report-writing professionals constantly create, find, and consume information models. Often, the need for these information models arises from the reporting needs of business users. An understanding of how to create these information models in SAP HANA will be of great benefit. 쐍 Data scientists, or anyone working with big data and data mining, are always looking for faster, smarter, and more efficient ways to get results that can help businesses innovate. SAP HANA has proven that it can certainly contribute to these areas. 쐍 Technical performance tuning experts have to understand the intricacies of the new in-memory and columnar database paradigm to fully utilize and leverage the performance that SAP HANA can provide. If you fit into one or more of these categories, excellent! You’re in the right place. Regardless, if you’ve bought this book in preparation for the exam, our primary goal is to help you succeed; all exam takers are welcome. SAP HANA Certifications There are currently five certification examinations available for SAP HANA, as listed in Table 1.1. Note that the certifications have different prefixes. The C prefix indicates full, core exams for SAP HANA data modeling; and the E prefix indicates certification exams for specialists. 27 28 Chapter 1 SAP HANA Certification Track—Overview Table 1.1 also lists the main SAP training courses available for each certification, how many questions will be asked in each certification exam, and how much time is available when taking the examination. Certification Type of Certification Keyword SAP Training Courses Number of Questions Time in Minutes C_HANAIMP Associate Modeling HA100 80 180 80 180 HA300 HA301 C_HANATEC Associate Technical HA200 HA201 HA215 HA240 HA250 C_HANADEV Associate Development HA450 and openSAP 80 180 E_HANAAW Specialist ABAP 40 90 40 90 HA100 HA300 HA400 E_HANABW Specialist SAP BW BW362 Table 1.1 SAP HANA Certifications Although not shown here, the exams for all certifications have a unique number at the end of their names (e.g., C_HANATEC_15 is the technical certification exam for SAP HANA 2.0 SPS 03). We’ll discuss these different numbers later in this chapter. We can divide these different certifications into two categories: associate and specialist (as shown in the Type of Certification column in Table 1.1). Let’s walk through these different categories. Associate-Level Certification The associate-level certification proves that you understand SAP HANA core concepts and can apply them in projects. Based on theory, the associate-level certifications are meant for people who are new to SAP HANA. When setting up these certifications, we ask project members the following question: “What would you want candidates to be able to do and understand when they SAP HANA Certifications Chapter 1 join your project immediately after passing the certification exam?” Understanding the different concepts in SAP HANA and when to use them is important. Three of the SAP HANA certifications fall under the associate level: C_HANAIMP (modeling), C_HANATEC (technical), and C_HANADEV (development). C_HANAIMP Modeling Certification As previously discussed, the C prefix indicates the full, core exams for SAP HANA data modeling. IMP refers to the implementation of SAP HANA applications. This examination consists of 80 questions, and you’re given three hours to complete the exam. The certification doesn’t have any prerequisites. The SAP training courses that help you prepare for this certification exam are HA100, HA300, and HA301. This book and this certification exam are intended for information modelers, DBAs, architects, SAP HANA application developers, ABAP and Java developers, people who perform data integration and reporting, those who are interested in technical performance tuning, SAP Business Warehouse (SAP BW) information modelers, data scientists, and anyone who wants to understand basic SAP HANA modeling concepts. This certification exam tests your knowledge in SAP HANA subjects such as architecture, deployment, information modeling, security, optimization, administration, SAP HANA Live, and some advanced modeling topics, such as predictive, text, graph, geospatial, and series data modeling. C_HANATEC Technical Certification The C_HANATEC certification exam is meant for technical people such as system administrators, DBAs, SAP Basis people, technical support personnel, and hardware vendors. This certification doesn’t have any prerequisites. The training courses for this exam are HA200, HA201, HA215, HA240, and HA250. This certification exam tests your knowledge of SAP HANA installations, updates, upgrades, migrations, system monitoring, administration, configuration, performance tuning, high availability, troubleshooting, backups, and security. 29 30 Chapter 1 SAP HANA Certification Track—Overview C_HANADEV Development Certification The C_HANADEV certification exam is meant for SAP HANA architects and developers of both backend and frontend systems. This certification doesn’t have any prerequisites. The main training course for this exam is HA450. The openSAP courses by Thomas Jung and Rich Heilman will definitely help you when preparing for this exam. This certification exam tests your knowledge of SAP HANA development topics, such as SAP HANA modeling; SAP HANA extended application services, advanced model (SAP HANA XSA); SAPUI5; server-side JavaScript; SQLScript; OData services; and deployment. Specialist-Level Certification The specialist exams prove that you can apply SAP HANA concepts in your current specialist area. These are for people who are already certified in other areas outside of SAP HANA, and want to gain additional knowledge of SAP HANA and how to apply it to their specialties. Such people have backgrounds in SAP BW or ABAP, for example. Each specialist exam includes 40 questions, and you have an hour and a half to complete the exam. They all have prerequisites (i.e., you have to be certified in a specialist area before you’re allowed to take these SAP HANA certification exams). There are two specialist exams: 쐍 E_HANAAW ABAP Certification This specialist exam for ABAP developers requires that you’re certified with ABAP. The main additional training course for this exam is HA400, in which you learn how ABAP development is different in SAP HANA systems than in older SAP NetWeaver systems. 쐍 E_HANABW SAP BW on SAP HANA Certification This exam for people specializing in SAP BW requires that you’re certified with SAP BW. The main training course for this exam is BW362. SAP HANA Application Associate Certification Exam Now, let’s focus on the objective and structure of the C_HANAIMP certification that this book targets and look at the latest C_HANAIMP certification exam. SAP HANA Application Associate Certification Exam Chapter 1 SAP is evolving fast and is constantly being updated. Originally, a new major Support Package Stack (SPS) update of SAP HANA was released twice a year, requiring SAP Education to update all training materials and certification exams every six months. Fortunately, SAP changed this to one release per year, so all SAP certifications remain valid for the past two versions. In other words, your SAP HANA certification, depending on when you received it, will be valid for one and a half to two years. This is regrettably a slightly shorter time frame than for most other SAP certifications, some of which can remain valid for many years. In a sense, this is the cost of working in such a fast-moving environment. Table 1.2 lists the exam names for the SAP Certified Application Associate – SAP HANA certification (C_HANAIMP) through the years. SAP HANA Application Certification Exam SAP HANA SPS Year C_HANAIMP_1 First few releases 2012 C_HANAIMP131 SAP HANA 1.0 SPS 05 2013 C_HANAIMP141 SAP HANA 1.0 SPS 07 2014 C_HANAIMP142 SAP HANA 1.0 SPS 08 2014 C_HANAIMP151 SAP HANA 1.0 SPS 09 2015 D_HANAIMP_10 SAP HANA 1.0 SPS 10 2015 C_HANAIMP_11 SAP HANA 1.0 SPS 11 2016 C_HANAIMP_12 SAP HANA 1.0 SPS 12 2016 C_HANAIMP_13 SAP HANA 2.0 SPS 01 2017 C_HANAIMP_14 SAP HANA 2.0 SPS 02 2018 C_HANAIMP_15 SAP HANA 2.0 SPS 03 2018 Table 1.2 SAP Certified Application Associate – SAP HANA Exam Names Note This book deals specifically with the latest C_HANAIMP_15 certification exam. However, the concepts taught here are applicable to any of the older versions. As the basic concepts stay the same and with coverage of SPS 04 included, we expect this book to be helpful for future exams, although it won’t be tailored specifically to them. You’ll notice in Table 1.2 that the C_HANAIMP_1 certification exam included _1 at the end of the name, showing that it was the first certification exam. Over time, 31 32 Chapter 1 SAP HANA Certification Track—Overview SAP started including the year in the certification exam name—for example, C_HANAIMP151. The first two numbers of 151 refer to the year, 2015, and the last number, 1, refers to the first half of the year. In this case, this combination corresponds to SPS 09. In 2015, the naming convention was updated to refer directly to the SAP HANA SPS number. That is, the SPS 12 exam was called C_HANAIMP_12 instead of C_HANAIMP162. The _12 suffix indicates SPS 12. With the new SAP HANA 2.0 releases, the numbering continues on from 12. As this exam is the third one for SAP HANA 2.0, the exam was called C_HANAIMP_15. SAP Education delivered delta exams for SAP HANA, which will make it easier to extend the life of your SAP HANA certification. The D prefix in Table 1.2 indicates a delta certification exam. You needed to have passed the C_HANAIMP151 certification exam before you could take the D_HANAIMP_10 delta exam. Tip The SPS version for the on-premise version of SAP HANA, which we’re referring to in this book, differs from that of the cloud version. We’re using SAP HANA 2.0 SPS 03. The corresponding version on SAP Cloud Platform is SAP HANA 2.0 SPS 04. Now that we’ve introduced the SAP Certified Application Associate – SAP HANA certification, the next section will look at the exam as a whole: its objective, structure, and general process. Exam Objective In Figure 1.1, the official name of the certification examination is shown in all uppercase letters as C_HANAIMP_15. On the left, you can see that the Level is Associate. On the right side, you can see that the exam has 80 questions. The Cut Score in this example page is 64%, which means that you need to get 64% in this certification examination to pass. The Duration of the exam is 180 mins. The exam is available in English and Korean. SAP HANA Application Associate Certification Exam Figure 1.1 The C_HANAIMP_15 Certification Exam Information Page Chapter 1 33 34 Chapter 1 SAP HANA Certification Track—Overview Note that there used to be a PDF link for sample questions. This has now been replaced by a sample exam that uses the actual exam system you’ll use when writing the exam. Just click on View more next to Sample Questions to open the sample exam. In this case, there are 10 sample questions you can look at to get an idea of what the actual examination questions look like. We normally recommend that you keep the sample questions in reserve for later; don’t dive in right away. You can use the sample questions to judge whether or not you’re ready for the certification examination. Tip As a general rule, if you say “Huh?” when you look at the sample questions, then you’re not yet ready for the exam. The examination itself is focused on certain tasks. Exam Structure When you scroll down through the web page shown in Figure 1.1, you’ll see how the exam is structured and scored. Each of the topic areas are mentioned there, as well as the percentage that each area contributes toward your final score in the certification exam. The first area is Text Processing and Predictive Modeling. When you expand this area, you can see that the HA300 training course is available for studying the content required to answer questions on this topic. More than 12% (actually about 15%) of the questions fall into the Text Processing and Predictive Modeling topic, which means that there are roughly 12 questions in the certification exam on this topic. The second area is Building calculation views. When you expand this area, you can see that the HA300 training course is available. More than 12% of the questions fall into the Building calculation views, which means that there are roughly 12 questions in the certification exam on this topic. The fourth area, Technology, architecture and deployment scenarios of SAP HANA, is roughly 10% (between 8% and 12%) of the exam, meaning that it covers about eight questions. The HA100 training course is available for more information on this topic. Other portions of the exam follow this same percentage determination and provide short descriptions. SAP HANA Application Associate Certification Exam Chapter 1 As you can see, SAP recommends three training courses: HA100, HA300, and HA301. Table 1.3 lists the different topic areas you need to study for the C_HANAIMP_15 certification exam, the weighted percentage for each section, the estimated number of questions, and the chapters that correspond to these topics from the book. Topic Percent of Exam Estimated Number Corresponding of Questions Chapters Technology, architecture, and deployment scenarios of SAP HANA 8%–12% 8 3, 9 Management and administration 8%–12% of models 8 4 Building calculation views > 12% 12 5, 6 Detailed modeling functions > 12% 12 7 SQL and SQLScript in models < 8% 4 8 Text processing and predictive modeling (advanced data processing) > 12% 12 10 Spatial, graph and series data modeling (advanced data processing) 8%–12% 8 11 Optimization of models 8%–12% 8 12 Security in SAP HANA modeling < 8% 4 13 SAP-supplied virtual data models < 8% 4 14 Table 1.3 C_HANAIMP_13 Exam Scoring Structure The exam structure percentages in this book will differ slightly from those shown in Table 1.3 due to the sequence in which the materials are explained. For example, you’ll find the answers to some optimization questions in the information views chapter where the topic is first discussed. This helps to avoid needless repetition. To avoid you having to page through the book, we might occasionally remind you of some important concepts. 35 36 Chapter 1 SAP HANA Certification Track—Overview Exam Process You can book the certification exam you want to take at http://s-prs.co/v487606. You can see a list of available certification exams at http://s-prs.co/v487607. On the SAP Training website (https://training.sap.com/), choose the country you live in or the country that you want to take the exam in. You can also choose to write the exam in the cloud, using the Certification Hub subscription. The price of the certification exam and available dates will appear in the right column on the web page. Currently, the SAP HANA certification exams are also still administered in SAP examination centers even though the exam is an online exam. Note New certification exams, such as C_HANAIMP_15, are administered in the cloud. Cloud certifications use remote proctoring, in which an exam proctor watches you (the test taker) remotely via your computer’s webcam. You have to show the proctor the entire room before the exam begins by moving your webcam or your computer around. These test proctors have been trained to identify possible cheating. When you arrive for the exam, you’ll need a form of identification. During your certification exam, you won’t have access to online materials, books, your mobile phone, or the Internet. As you mark answer options, the answers are immediately stored on a server to ensure that you don’t lose any work. When you finish all your certification exam questions and submit the entire exam, you receive the results immediately. If you don’t pass the certification exam, you can take it up to two more times. (This retake option hopefully won’t be important to any of the readers of this book!) Normally, you’ll receive the printed certificate about a week later at your examination center, or you can receive it by mail. You can also see your certificate in the Acclaim portal. You can find more information on how to register yourself at their website at https://www.youracclaim.com/. The Acclaim platform is used by several large companies to manage people’s certifications. Your certification status will be shown here, and it allows potential customers or employers to verify your certifications. The Acclaim portal also allows you to share your SAP Global Certification digital badges on major social networks like Facebook and LinkedIn. You can share and promote all your certifications from SAP and other companies to give the ecosystem a comprehensive overview of all your skills. Summary Chapter 1 Summary You should now understand the various SAP HANA certification examinations and be able to identify which exam is right for you. You know about the scoring structure of C_HANAIMP_15, so you can focus your study time and energy accordingly. As you work through this book, we’ll guide you through the questions and content that can be expected. Best wishes on your exam! 37 Chapter 2 SAP HANA Training Techniques You’ll Master 쐍 Identify SAP Education training courses for SAP HANA 쐍 Find related SAP courses 쐍 Discover other sources of information and courses on the Internet 쐍 Set up your own SAP HANA system to get practical experience 쐍 Develop strategies for taking your SAP certification exam 40 Chapter 2 SAP HANA Training In this chapter, we’ll provide an overview of options for available resources and training courses to prepare for your certification exam. We’ll look into SAP Education, which makes SAP HANA training courses available for each certification and provides related courses that can enhance your skills and understanding. We’ll also discuss various sources on the Internet that provide SAP HANA documentation, video tutorials, ways to get hands-on experience, and free online courses. Finally, we’ll review some techniques for taking the certification exam. SAP Education Training Courses You can attend SAP official training courses in a couple of different ways. These training course options provide flexibility for learning and access to relevant materials, so you can customize your studies to your lifestyle. These different types of training courses include the following: 쐍 Classroom training The first and most obvious option is classroom training, in which you attend SAP HANA courses in a classroom with a trainer for a few days. Classroom training courses provide a printed manual and a system through which you can practice and perform exercises. At the end of the course, you’ll walk away with a better understanding of what is described in that training material. Classroom training is a popular option that allows individuals to focus on learning in an environment in which they can ask questions, perform exercises, discuss information with other students, and get away from their offices and emails. 쐍 Virtual classrooms You can also attend SAP courses via virtual classrooms. The virtual approach is similar to training in real-life classrooms, but you don’t sit in a physical classroom with a trainer. Instead, your trainer teaches you via the Internet in a virtual classroom. You still have the ability to ask questions, chat online with other students, and perform exercises. 쐍 E-learning You can participate in the same training courses via e-learning as well. In this case, you’re provided with a training manual and an audio recording of course presentations. However, you don’t have an instructor to ask questions of, and there is no interaction with others who are learning the same topic. Given that this type of training normally happens after hours, it requires some discipline. SAP Education Training Courses Chapter 2 쐍 SAP Learning Hub The last training course option is to use the SAP Learning Hub, a service you subscribe to yearly. Your subscription grants you access to the entire SAP portfolio of e-learning courses across every topic in the cloud, training materials, some vouchers for taking certification exams, learning rooms, hands-on training systems, and forums for asking questions. You can find further details about the SAP Learning Hub at their website at https://training.sap.com/learninghub. Tip After you’ve joined the SAP Learning Hub, we recommend that you join the SAP HANA Modeling Learning room. This is by far the largest and most active room, and it’s very helpful while preparing for this exam. The SAP people managing this room have regular webinars on exam topics, provide sample questions, and quickly answer everyone’s questions. You can also look at the Learning Journey for SAP HANA modeling at http://sprs.co/v487600. This shows the preceding points to you in a semi-graphical chart. In the next two sections, we’ll discuss SAP HANA training courses specific to the certification exams and additional courses related to SAP HANA that may prove useful in your learning. Training Courses for SAP HANA Certifications Table 2.1 lists the SAP HANA training courses for the latest C_HANAIMP certification exams, the length of each course, and each course’s prerequisites, if any. Certification SAP Training Course Length Prerequisites C_HANAIMP_14 HA100, collection 14 2 days N/A HA300, collection 14 5 days HA100 HA301, collection 14 3 days HA300 HA100, collection 15 2 days N/A HA300, collection 15 5 days HA100 HA301, collection 15 3 days HA300 C_HANAIMP_15 Table 2.1 SAP Training Courses for C_HANAIMP Certification Exams 41 42 Chapter 2 SAP HANA Training The HA100 training course is the two-day introductory course that everyone must take, regardless of which direction you want to go with SAP HANA. HA100 provides a quick introduction to SAP HANA architecture, the different concepts of inmemory computing, and basic SAP HANA modeling concepts, data provisioning (how to get data into SAP HANA), and how to use SAP HANA information models in reports. The HA300 training course goes into more detail on the modeling concepts and the security aspects of SAP HANA. As of SAP HANA SPS 02 (collection 14), the HA301 training course was added. The SAP HANA advanced modeling topics, previously found in the HA300 course, were moved to this new three-day training course, expanded with more content, and these topics now contribute 25% to the C_HAMAIMP_15 certification exam. You’ll need to take the HA100, HA300, and HA301 courses to prepare for the certification exam. Tip All the answers to the associate-level SAP HANA certification exams are guaranteed to be somewhere in the official SAP training material. Additional SAP HANA Training Courses The following SAP training courses related to SAP HANA modeling can complement your skills and knowledge of SAP HANA: 쐍 HOHAM1 This two-day course teaches modelers that used SAP HANA 1.0 how to migrate their old (legacy) information models from SAP HANA extended application services, classic model (SAP HANA XS) to SAP HANA extended application services, advanced model (SAP HANA XSA). It’s a Virtual Live Classroom (VLC) training course, where the trainer teaches the course via an online platform. The course is practical, hands-on migration training—hence, the HO (for “hands-on”) in the name. 쐍 HA450 This three-day course merges SAP HANA modeling with the native application development that you can perform with SAP HANA’s application server. It teaches how you can take an SAP HANA information model, expose it as an OData or REST web service, and consume it in the JavaScript framework called Other Sources of Information Chapter 2 SAPUI5. You can also write the C_HANADEV certification exam after attending this training course. 쐍 UX402 This course complements the HA450 training course. It focuses on developing user interfaces (UIs) with SAPUI5. 쐍 HA215 If you want to learn about SAP HANA performance tuning, the two-day HA215 course complements the HA100 and HA300 courses. 쐍 BW362 If you come from an SAP Business Warehouse (SAP BW) background, BW362 is a related course that might help you. This course shows how you can build information models with SAP BW on SAP HANA. 쐍 HA400 If you come from an ABAP background, you should think about attending the HA400 training course. It takes the knowledge that you gained in HA100, HA300, and HA301 and shows you how to apply that knowledge in your ABAP development environment. You’ll learn that the ways in which you access the SAP HANA information models and interface with the SAP HANA database are completely different from the ways to perform similar tasks in all the other databases you’ve used through the years. SAP Education offers a wide variety of courses to enhance your skills and further your career. However, it’s important to know what resources are available outside the classroom as well. In the next section, we’ll look at additional resources for continued learning. Other Sources of Information You’ll find that there is no shortage of information about SAP HANA. In fact, so much information is available that it’s almost impossible to get through it all. To help focus your search, let’s look at some of the most popular sources. SAP Help SAP Help (http://help.sap.com) is a valuable resource for your SAP and SAP HANA education. A lot of great documentation is provided on this website. At http:// 43 44 Chapter 2 SAP HANA Training s-prs.co/v487608, you’ll find all available documentation provided in PDF format (see Figure 2.1) in the Documentation Download area. Figure 2.1 SAP HANA Documentation on SAP Help Useful PDF files include the following: 쐍 SAP HANA Modeling Guide for SAP HANA XS Advanced Model We highly recommend this modeling guide, which provides the foundation for working with and building SAP HANA information models. There are two other modeling guides as well, namely for SAP HANA Studio and the Web Workbench. We recommend that you focus on the SAP HANA XSA version. Other Sources of Information Chapter 2 쐍 SAP HANA Interactive Education (SHINE) for SAP HANA XS Advanced This guide is found when you click on the View All button in the Development area of the screen. SHINE is a demo package that you can install into your own SAP HANA system or your company’s SAP HANA sandbox system. It provides a lot of data and models, with examples of how to create good information models. The example screens in this book make use of the SHINE package. We’ll look at SHINE later in this chapter. 쐍 SAP HANA Security Guide This guide tells you everything you need to know about security in more detail. We’ll discuss what you need to know about security for the exam in Chapter 13. 쐍 SAP HANA Troubleshooting and Performance Analysis Guide Learn how to properly troubleshoot your SAP HANA database and enhance overall performance with this guide. 쐍 What’s New in the SAP HANA Platform 2.0 This document lists the new features in the Support Package Stacks (SPSs) of SAP HANA 2.0 since SPS 00. 쐍 SAP HANA reference guides There is an entire section of reference guides that provide more specific and focused discussions of particular SAP HANA topics: – SAP HANA SQLScript Reference – SAP HANA XS JavaScript Reference – SAP HANA XS JavaScript API Reference – SAP HANA XS DB Utilities JavaScript API Reference – SAP HANA Business Function Library (BFL) – SAP HANA Predictive Analysis Library (PAL) Reference 쐍 SAP HANA Developer Quick Start Guide This specific developer’s guide is presented as a set of tutorials. 쐍 SAP HANA Developer Guide for SAP HANA XS Advanced Model We recommend reviewing this guide, especially because development and information modeling are closely linked in SAP HANA. This guide teaches you how to build applications in SAP HANA, write procedures, and more. This includes information on building UIs using SAPUI5. For some of these documents, it might be helpful to have an actual system to play with because some guides provide step-by-step instructions for certain actions. We discuss how to get your own SAP HANA system in a later section. 45 46 Chapter 2 SAP HANA Training SAP HANA Home Page and SAP Community The main website for SAP HANA is found at www.sap.com/products/hana.html, which offers the latest news about SAP HANA and its different use cases. Along with the SAP HANA home page, the SAP Community provides a central location for members of different SAP communities and solution users. At http://s-prs.co/ v487609, you can ask questions about your own SAP HANA system, and SAP employees will answer them. In addition, take a look at http://s-prs.co/v487610 where many other community support options are described. SAP HANA Academy From http://s-prs.co/v487611 or using a Google search, you can find a link to the SAP HANA Academy. The SAP HANA Academy area of YouTube provides hundreds of free video tutorials on all topic areas of SAP HANA, which you can access directly via www.youtube.com/user/saphanaacademy/playlists. Figure 2.2 shows the playlists screen for the SAP HANA Academy. These videos are created by SAP employees who perform actual tasks on an SAP HANA system. If you’ve never made a backup of an SAP HANA system, for example, you can search for a video on how to do just that, with step-by-step instructions performed on an actual system. Because there are so many video clips available, we recommend that you select the Playlists option from the YouTube menu bar, as shown in Figure 2.2. Alternatively, you can go to the URL provided. With each new SAP HANA release, new videos are added to the SAP HANA Academy YouTube channel, as shown in the SAP HANA SPS - What’s New row in Figure 2.2. You’ll also see other playlists here that are worth investigating, for example, Application Development and Delivery (Application Developer) and Predictive Analysis (Data Scientist), as shown in Figure 2.3. Many of the advanced modeling topics discussed in the HA301 training course, such as graph and spatial modeling, are examined in individual playlists. Other Sources of Information Figure 2.2 SAP HANA Academy at YouTube Figure 2.3 Playlists for Application Development and Predictive Analysis Chapter 2 47 48 Chapter 2 SAP HANA Training You can also find videos on the SAP HANA Academy for building solutions. On the Playlists page, look for a playlist called Building Solutions: Live5 under the Building Solutions playlist collection (see Figure 2.4). This will guide you to build a complete social media analysis solution for analyzing Twitter feeds in real time. Figure 2.4 Playlists for Building Solutions in SAP HANA Academy As noted earlier, the http://help.sap.com website provides all the PDFs for you to read, which is great if you like reading and you want the PDFs on your tablet or phone, for example. On the other hand, the SAP HANA Academy YouTube channel is good if you prefer visual learning, and it may be more practical. openSAP Next up to consider is the openSAP website (https://open.sap.com/), which provides many free training courses on a large variety of SAP topics (see Figure 2.5). This website frequently adds new online training courses, some of which focus on SAP HANA (http://s-prs.co/v487612). When new training courses become available, you can enroll in them. Every week, you’ll receive a few video clips. These courses are normally four to six weeks long, and every week you’ll take a test. All the tests together are worth 50% of your total score. In the last week, you take an exam that is worth the other 50% of your score. At the end of the course, you receive a certificate of attendance, and if you performed well, the certificate will show your score. If appropriate, the certificate will also show that you were in the top 10% or top 20% of your class. Note The openSAP certificates don’t hold the same weight as the official SAP Education certifications. Hands-On with SAP HANA Chapter 2 Figure 2.5 openSAP Website After you complete one of these free online courses, you can retake the tests or exams for a fee, but you can still access all of the other course materials for free. You can always download the video clips, the PowerPoint slides, and the transcripts for your records. Hands-On with SAP HANA Everyone learns differently. Some people like visuals, others learn by listening, and still others learn by doing. In this section, we look at how to gain some handson experience with your own SAP HANA system, projects, and data. We’ll begin with the different ways you can personally access an SAP HANA system. 49 50 Chapter 2 SAP HANA Training Where to Get an SAP HANA System On the SAP Developer Center site (https://developers.sap.com/), you can download a free SAP HANA system or access your own SAP HANA server in the cloud via the following link: http://s-prs.co/v487613, as shown in Figure 2.6. To do so, select Install your free SAP HANA instance on this page. Figure 2.6 SAP Developer Center Website for SAP HANA Selecting that option takes you to the page at http://s-prs.co/v487614, where you’ll find links to download the free SAP HANA, express edition; sign up for the free SAP Cloud Platform; or use Microsoft Azure, Amazon Web Services (AWS), or Google Cloud Platform (GCP) for your own hosted SAP HANA, express edition (see Figure 2.7). The interactive graphic in Figure 2.7 will help you decide which of the many options to choose. When preparing for the certification exam, you’ll need the Server + XSA Applications option on the right side. You can install your own system using the On-Premise options or get a cloud-hosted solution via the Cloud Providers options. Hands-On with SAP HANA Chapter 2 Figure 2.7 SAP HANA System Options In the next sections, we’ll look at the following options in greater detail: using a free SAP Cloud Platform account, paying a cloud provider for the computing resources, and two options for running SAP HANA on your own machine. Free SAP Cloud Platform Account Let’s start with the cheapest and easiest way to get your own SAP HANA system: the free SAP Cloud Platform account. Figure 2.8 shows https://cloudplatform.sap.com, where you can get a free account on SAP Cloud Platform. After a quick registration process, you’ll gain access to SAP Cloud Platform. To learn how to set up your own account on SAP Cloud Platform, you can read the Get Started with SAP Cloud Platform information at http://s-prs.co/v487615. The free account on SAP Cloud Platform has restrictions. You don’t get a lot of memory for loading large data sets, and the cloud platform is shared with many other users. SAP Cloud Platform is an excellent choice to get going quickly, and it doesn’t cost you anything. However, for the purposes of preparing for the certification exam, one of the other options will probably be better. 51 52 Chapter 2 SAP HANA Training Figure 2.8 SAP Cloud Platform Registration SAP HANA on Third-Party Platforms The next option is to get your own SAP HANA system on AWS, Microsoft Azure, or GCP. While SAP HANA, express edition is free from SAP, you’ll need to pay these cloud providers for hosting your solution. All these cloud offerings are ideal for learning more about SAP HANA in a practical manner. They’re quite inexpensive, as long as you remember to switch things off when you’re finished—that is, essentially pressing a pause button on your SAP HANA system. Otherwise, the cloud providers will continue charging you for the CPU, memory, network, and disk space being used. You can also put a limit on your finances to ensure you don’t go over your budget. You can create an account on any of these cloud offerings via your own account with the respective companies. An easy way to create your own SAP HANA system on any of them is by using the SAP Cloud Appliance Library at https://cal.sap.com. Registration is free on this website. After you have logged on, when scrolling down the page, you’ll see a description for SAP HANA, express edition, with a Create Instance button. You can select with which Cloud Provider you want to create your SAP HANA system, as shown in Figure 2.9. You can select the size of the SAP HANA server you Hands-On with SAP HANA Chapter 2 require as well. You can play around with the various options, and based on the options you select, the cost per hour and the total monthly costs will be displayed. Figure 2.9 SAP Cloud Appliance Library Note The SAP HANA editions on the SAP Cloud Appliance Library aren’t always the latest versions. The SAP Cloud Appliance Library will also allow you to automatically switch your SAP HANA system on and off at predefined times. The savings are also automatically reflected in the monthly cost calculations. When you’re satisfied with your choices, you can ask the SAP Cloud Appliance Library to create your SAP HANA system automatically for you with your chosen cloud provider. If you have a Microsoft Visual Studio subscription, you get a certain amount of credit each month for Microsoft Azure Cloud, which you can use to pay for your SAP HANA, express edition system. Figure 2.10 shows you how to spin up an SAP HANA, express edition system. Simply click on Create a resource in the top left of the screen; type “SAP HANA” in the search box; select the SAP HANA, express edition (Server + Applications) option from the dropdown; and click the Create button on the bottom right of the screen shown in Figure 2.10. 53 54 Chapter 2 SAP HANA Training Figure 2.10 Creating Your Own SAP HANA, Express Edition System on Microsoft Azure Running SAP HANA, Express Edition on Your Laptop or PC The most-asked question from students through the years was for a copy of SAP HANA that they can install on their laptops. You can now do exactly that! SAP HANA, express edition is compiled so that it can run on your laptop with all the functionality of the full version of SAP HANA while using less memory. Best of all, it’s free for both development and productive use up to 32 GB of RAM. SAP HANA, express edition is available as both a binary installer and a virtual machine image to get you started quickly, but the easiest way to get started is to use the virtual machine image. In the following steps, we’ll discuss how to get you up and running quickly. Tip The screens and examples in this book were created with SAP HANA, express edition. 1. From http://s-prs.co/v487616, click on the Free Installation button, shown earlier in Figure 2.6, or the buttons in the On-Premise block, shown earlier in Figure 2.7. Hands-On with SAP HANA Chapter 2 The alternative is to go directly to http://s-prs.co/v487617, register, and get the download manager for SAP HANA, express edition. This program requires Java. 2. When you start the download manager for SAP HANA, express edition, as shown in Figure 2.11, you can choose between a Virtual Machine image and a Binary Installer. The default is the Virtual Machine image. Figure 2.11 Download Manager for SAP HANA, Express Edition 2.0 3. Select the functionality you want to use with your copy of SAP HANA, express edition. Note You need a minimum of 8 GB memory to run SAP HANA, express edition. You might need 16 GB if you want to use SAP HANA XSA. We recommend that you install the Server + applications virtual machine, as shown in Figure 2.11. This option includes SAP HANA XSA and is what we use in this book. 55 56 Chapter 2 SAP HANA Training Options you might select include the following: – The first selection on the list is the Getting Started with SAP HANA express edition (Virtual Machine Method) PDF file. This document will help you install your copy of SAP HANA, express edition. You can also follow the instructions at http://s-prs.co/v487618. – The Virtual Machine option already includes the SUSE Linux Enterprise Server (SLES) operating system. If you use the Binary Installer option, you’ll need to download a certified operating system. See the next section for links. 4. Ensure you have the VMware Workstation Player (for Windows and Linux) or VMware Fusion (for Mac). Tip The VMware Workstation 15 Player is free for personal use and can be downloaded at their website at www.vmware.com/go/downloadplayer/. After you’ve installed SAP HANA, express edition, you need to set up the environment so that you can start using it for information modeling. You can get guidance by following the tutorials for SAP HANA, express edition at http://s-prs.co/ v487619 or by following the steps described in the PDF you downloaded previously per Figure 2.11. However, the following steps will help you quickly set up your SAP system so that you can get to the screens we’ll show in this book: 1. Make sure your hosts file contains a reference to hxehost as described in the Getting Started guide. 2. Check if the XS engine is up and running by opening http://hxehost:8090 in your browser. Tip We recommend that you use the Google Chrome browser. 3. Check that SAP HANA XSA and services are running by opening https://hxehost:39030. This will show you a list of apps that are running and their web addresses. You can open any of these apps by clicking on their web addresses. Check that the xsa-cockpit and webide apps are running as shown in Figure 2.12. If you see the web app running, it means you’ve also installed the SHINE demo application. Many of the demo screens we show in this book use the SHINE demo application. Hands-On with SAP HANA Chapter 2 Figure 2.12 SAP HANA XSA Up and Running with Available Apps 4. Create your own user by opening the SAP HANA XS Advanced Cockpit app screen and logging in with the XSA_ADMIN user. You’ll see a web application, as shown in Figure 2.13. Click User Management, and create a new user by clicking on the New User button. Figure 2.13 SAP HANA XS Advanced Cockpit App 57 58 Chapter 2 SAP HANA Training 5. Add your new user to the development space. 6. Open the Organizations screen, and select HANAExpress. You’ll see two spaces named development and SAP, as shown in Figure 2.14. Figure 2.14 SAP HANA XSA Administration App: Organization and Space Management 7. Select the development space. Select the Members option on the menu. You should see that the XSA_ADMIN and XSA_DEV users are already in this space. 8. Click on the Add Members button, and enter your user name. 9. Select the Space Developer checkbox next to your user name, and click the OK button. Your user is now added to the development space. 10. Open the webide app. We’ll spend a lot of time in SAP Web IDE for SAP HANA. In Chapter 4, we’ll describe it in more detail. 11. Log in with your new user name. SAP Web IDE for SAP HANA won’t have any content for your new user. To get up and running quickly, you can clone or import the source code of the SHINE SAP HANA XSA demo application. A few tips are as follows: – The easiest way to get the source code for SHINE is to clone it from GitHub. Open the menu, and select File • Git • Clone Repository. When asked for a Git Repository, enter the public GitHub address where SAP HANA SHINE resides, that is, “https://github.com/SAP/hana-shine-xsa”. SAP Web IDE for SAP HANA will download all the source code, information models, and designtime objects for you. Hands-On with SAP HANA Chapter 2 – If you’ve already installed SHINE during the SAP HANA install, open the Web app, and click on the icon in the top-right corner. This will download a ZIP file to your local machine with the SHINE source code. In the webide app, open the menu, and select File • Import • File or Project. Then select the ZIP file from SHINE on your local machine. 12. Next, set the space where you want to work in the Project Settings for your project. As shown in Figure 2.15, select the second-level node (which is your project), and right-click on the name. In the popup context menu, select Project • Project Settings. Figure 2.15 SAP Web IDE for SAP HANA: Configuring Project Settings for a Project 13. On the next screen, select Space, and then choose the development space from the dropdown list. This is the same space you added your user to earlier. Click Save and then Close. 14. The last step is to build your SHINE application. At the moment, you only have the source code, but you want it to create a schema, users, tables with data, information models, and much more for you. This is accomplished by building your project. As shown in Figure 2.16, select the second-level node (your project), and right-click on the name. In the popup context menu, select Build • Build. 15. After the core-db module has been built, you can open it. Open the src folder, then the models folder, and open the PRODUCTS.hdbcalculationview model. You should see something that looks like Figure 2.17. 59 60 Chapter 2 SAP HANA Training Figure 2.16 SAP Web IDE on SAP HANA: Building the SHINE Demo Application Figure 2.17 SAP Web IDE on SAP HANA: Showing an SAP HANA Graphical Information Model Congratulations! You now have a fully working SAP HANA system. Hands-On with SAP HANA Chapter 2 Building an SAP HANA System You can also install SAP HANA on a server. You’ll need some certified hardware, an operating system, and the SAP HANA software. Note that you’ll need a registered S-user name and password before you can download the SAP HANA software. If you’re an existing SAP customer or SAP partner, you can ask your system administrator for a logon user. The requirements for the hardware and how to install the SAP HANA software can be found at http://s-prs.co/v487620. Note It’s recommended that you use a machine with a minimum of 24 GB or 32 GB of memory. If you have less memory, you should use SAP HANA, express edition. The certified operating systems for SAP HANA are SLES for SAP Applications and Red Hat Enterprise Linux for SAP HANA. You can find the SLES for SAP Applications at www.suse.com/products/sles-for-sap/, or you can get a copy of Red Hat Enterprise Linux for SAP HANA from https://developers.redhat.com/products/sap/ overview/. You’ll also need to download a copy of the SAP HANA software from http://sprs.co/v487621. In the alphabetical list of products, select H, choose SAP HANA Platform Edition, and then follow the instructions in installation guides on http://sprs.co/v487620. Project Examples After you have your own SAP HANA system, the next step is to start using it. In this section, we’ll discuss how to start working on SAP HANA projects and how you can use the SHINE demo package or create your own project. Using the SHINE Demo Package One option to get some practice in SAP HANA is to explore the content available in the SHINE demo package. We’ll use the SHINE demo throughout this book. It’s fully documented and is used by SAP as an example for how to develop SAP HANA applications. You can see what the SHINE demo application looks like in Figure 2.18. 61 62 Chapter 2 SAP HANA Training Figure 2.18 SHINE (SAP HANA Interactive Education) for SAP HANA XS Advanced Model Screen The SHINE documentation can be found at http://s-prs.co/v487620 in the Development area by clicking the View All button. Creating Your Own Project You can also think up your own projects. You can start very simply by just learning to work with individual topics discussed in the book, such as fuzzy text search, currency conversion, input parameters, hierarchies, and spatial joins. You can also add some security on top of this, for example, to limit the data for a single user to just one year. The next step is to think about creating a report that can analyze the performance of a particular data set. This is a good exercise because you’ll have to take end-user requirements and learn how to translate them into the required SAP HANA information models. Then, you’ll design and create these models. Finally, you can expand these information models into an application. You can also look at incorporating something such as SAP Lumira to easily create attractive storyboards. You can download a free copy of SAP Lumira from https:// saplumira.com/. Hands-On with SAP HANA Chapter 2 Where to Get Data SAP HANA can process large amounts of data. To get hands-on practice in the SAP HANA system, you’ll want some sort of data to play with. There are a couple of places where you can acquire such data: 쐍 SHINE demo package The SHINE demo package is a great place to get large data sets. To do this in the SHINE package, use the Data Generator option (the first option in Figure 2.18, shown previously). 쐍 Datahub Datahub is a website and free open data management system that can be used to get, use, and share data. At https://datahub.io/ (see Figure 2.19), you can find more than 10,000 data sets. (If you can’t find what you’re searching for, look at the old site at https://old.datahub.io/dataset.) Figure 2.19 Datahub 쐍 US Department of Transportation database If you want a larger data set to play with, you can download the US Department of Transportation database on flights in the United States at http://s-prs.co/ v487622. It has more than 50 million records, with more than 20 years of data, 63 64 Chapter 2 SAP HANA Training including departure airport, destination airport, airline, flight number, airborne time, flight delays and cancellations, and so on. You can use this data to experiment with the predictive capabilities of SAP HANA described in Chapter 10, for example, to predict whether your US flight will be delayed and by how much. Exam Questions Now, we’ll change gears a bit and look at the SAP certification exams and how to approach them. Let’s look at the examination question types and some tips and hints about how to complete the certification exam. Types of Questions An international team sets up the SAP HANA certification exams. All the questions are written in English, and all the communication is in English. Because the exam teams are international, a lot of attention is focused on making sure that everybody will understand what is meant by each question, avoiding possible double meanings or ambiguities. Some of the exams get translated into other languages, so simple and straightforward questions work best. Figure 2.20 shows three types of questions that you’ll find in the certification exam. In your calculation view, you CANNOT enable a hierarchy for time dependency. What could be the reason for this? Which of the following approaches can be used to implement union pruning? Note: There are 2 correct answers to this question. Which of the following are components of SAP HANA Data Warehousing Foundation? Note: There are 3 correct answers to this question. The hierarchy is a parent-child type. Define a constant value for each data source in the Union Node. Native DataStore Object The hierarchy is a level type. Define the cardinality between the data sources. SAP Data Hub You did NOT reference a DIMENSION of the type TIME. Define a restricted column and include it in both data sources of a union. CompositeProvider You did NOT include a history column table. Define union pruning conditions in a pruning configuration table. Data LifeCycle Manager Data Distribution Optimizer Figure 2.20 Three Types of Certification Exam Questions Exam Questions Chapter 2 The first question is multiple choice. The radio buttons indicate that you can only choose one of the four available answer options. There will be three or four different options, with only one correct answer. (Most of the time there will be four answer options. The newer exams have a few questions that only have three answer options for this type of question. The older exams always had four options.) The other two questions are called multiple-response questions, of which there are two types: 쐍 With the first type of multiple-response question, there are four possible answers, and there will always be two correct answers and two incorrect answers. In the exam, this will be indicated by the words, “Note: There are two correct answers to this question.” 쐍 The other type of multiple-response question has five possible answers, of which three will always be correct, and two will be incorrect; you must select three correct answers. Above the answers, you’ll see the words, “Note: There are three correct answers to this question.” Both types of multiple-response questions use checkboxes, and you must select however many answers are correct. All three types of questions have exactly the same weight. You either get a question right or you get it wrong. It’s a binary system: 0 or 1. For a multiple-response question with two correct and two incorrect answers, you must select two answers. If you select three answers or one answer, you don’t receive any points. If you select two answers and they are the correct two answers, then you earn a point. Tip Select as many answers as are required! If you’re certain that an option is correct, select it. If you’re doubtful about an answer, at least select the correct number of answers. If you don’t select anything, you won’t earn any points. Even if you plan to come back to the question later, we still recommend that you select the exact number of answer options required. Multiple-choice questions normally make up the majority of the questions for the associate-level core exams. More than half of the questions will have only one correct answer. 65 66 Chapter 2 SAP HANA Training Note In the certification exam, there are no fill-in-the-blank questions. There are also no true or false questions in the certification exam, but we’ll use true or false questions in this book for training purposes. The questions and the answers themselves appear randomly. In other words, two people could be taking the same exam, sitting next to each other, and the order in which the questions appear for each person would be different. In addition, for similar questions, the order of A, B, C, and D (the answer options) is different for different people. Because answer options are randomized, you’ll never find any questions that list “All of the above” or “None of the above” as a possible answer. The questions in the SAP certification exams tend to be very short and to the point. All extra words and descriptions have been cut away, ensuring that you’ll have enough time to complete the exams. We’ve never heard anyone complain about the time limits in the SAP HANA certification exams. Because the questions are to the point, every word counts. Every word is there for a reason and has a purpose, so don’t ignore any word—especially words such as always, only, and must. Always indicates on every occasion, only means that no other options are allowed, and must indicates something that is mandatory. In the answer options, you could see some optional actions that would be valid actions in normal circumstances, but if they aren’t mandatory, they are incorrect answers for questions asking what you must do. Therefore, for questions using the word must, pay attention to optional steps versus mandatory steps. The certification exams use a minimum of negative words, which are only used for troubleshooting questions. All negative words in a question are written in capital letters—for example, “You did something and it is NOT working. What is the reason?” You can see an example of this in the first question in Figure 2.20 shown earlier in this section. Tip Don’t just learn the facts as if everything will always work properly. Systems break in real life, and you need to know why they didn’t work and how to fix or troubleshoot the problem. There will be questions in the exam to test this knowledge. Now that we’ve looked at the general structure of SAP certification exam questions, let’s look at the strategy of elimination in the exam. Exam Questions Chapter 2 Elimination Technique Experience has shown that the elimination technique can be useful for answering exam questions. In this technique, you start by finding the wrong answers instead of finding the right answers. When creating exams, it’s fairly easy to set up questions, but it’s hard to think up wrong answers that still sound credible. Because it’s so difficult to write credible wrong answers, it’s normally easier to eliminate the wrong answers first. As Sherlock Holmes would say, whatever remains, however improbable, must be the truth. To show how efficient this technique can be, look at the multiple-response question with five different options (refer to Figure 2.20, right). There are three correct and two incorrect answers, so it saves you time to find the two incorrect ones, rather than trying to find the three correct answers. Tip Make sure you know which keywords apply to which area. The wrong answers could come from an unrelated area, but the words sound like they might belong to the area in which the question is asked. Don’t confuse the words from different topic areas. Bookmark Questions In the certification exam, you can always bookmark questions and return to them later. Figure 2.21 shows the sample exam you can write for C_HANAIMP_15. You can see the Assessment Navigator block in the bottom-right corner of the screen. You’ll notice that the top-left corner of question 1 is marked. By clicking on the bookmarked question in the Assessment Navigator, it will take you directly to that question. If you’re not confident about a question, mark it. We still recommend that you complete the right number of answer options: If two answers are required, fill in two options. If three are required, select three options. Then, mark the question so that you can review it later. Tip Watch out when you do the review; your initial choice tends to be the correct one. Remember, don’t overanalyze the question. 67 68 Chapter 2 SAP HANA Training Figure 2.21 Sample Practice Exam: Assessment Navigator Tip Watch out that you don’t click the Submit button in the bottom right of the screen until you’ve finished answering all the questions. This button doesn’t submit the answers to each question, but ends the entire exam! Now, let’s turn our attention to some general exam strategies. Implementing testtaking strategies will help you succeed on the exam. General Examination Strategies The following are some good tips, tricks, and strategies to help you during an exam: 쐍 If you have the SAP training manuals, read them twice. Pay attention to pages with lots of bullet points because that means there are normally lots of options, which are likely good sources for questions. In addition, look carefully at the Caution boxes. Summary Chapter 2 쐍 For this associate-level certification exam, when using this book, read the entire chapters and answer all the questions. If you don’t get the answers right or don’t understand the question, reread that section. This is the equivalent of reading the SAP training materials twice. 쐍 Make sure that you’ve answered all the questions and that you’ve selected the correct number of answer options. If the question says that three answers are correct, then make sure that you’ve marked three—not four or two. 쐍 Occasionally, you’ll find answer options that are opposites—for example, X is true, and X is NOT true. In that case, make sure you select one of the pair. 쐍 In our experience with SAP HANA certification exams, the answers from one question don’t provide answers to another question. SAP Education is careful to make sure that this doesn’t happen in its exams. 쐍 Watch out for certain trigger words. Alarm bells should go off when you see only, must, and always. 쐍 Look out for impossible combinations. For example, if a question is about measures, then the answer can’t be something to do with dimension calculation views because you can’t find measures for such views. 쐍 Some words can be different in different countries. For example, some people talk about a right-click menu, but in other places in the world, this same feature is known as a context menu. This is the menu that pops up in a specific context, and you access this menu by right-clicking. 쐍 Make sure you get a good night’s rest before the exam. Summary You should now know which SAP Education training courses you can attend for your certification examination, which related SAP courses will complement your knowledge and skills, and where to get hands-on experience. In addition, we introduced many free sources of information on the Internet, training videos on YouTube, and online courses at openSAP. You can now form a winning strategy for taking your SAP certification exam. In the next chapter, we’ll begin looking at exam questions and concepts, focusing first on the technology, architecture, and deployment scenarios of SAP HANA. 69 Chapter 3 Technology, Architecture, and Deployment Scenarios Techniques You’ll Master 쐍 Understand in-memory technologies and how SAP HANA uses them 쐍 Get to know the internal architecture of SAP HANA 쐍 Describe the SAP HANA persistence layer 쐍 Learn the different ways that SAP HANA can be deployed 쐍 Know when to use different deployment methods, including the various cloud options 72 Chapter 3 Technology, Architecture, and Deployment Scenarios In this chapter, you’ll become familiar with the various deployment scenarios for SAP HANA and evaluate appropriate system configurations, including cloud deployments. We’ll introduce the in-memory paradigm and the thinking behind how SAP HANA uses in-memory technologies, and you’ll get a glimpse of how to develop information models with SAP HANA. By the end, you’ll better understand how SAP HANA functions in all types of solutions and use cases you might require. Real-World Scenario You start a new project. The business users describe what they want and expect you to suggest an appropriate architecture for how to deploy SAP HANA. You have to decide between building something new in SAP HANA or migrating the current system to use SAP HANA. You also have to decide if you should use SAP HANA in the cloud and, if so, which type of service you’ll need. When you understand the internal architecture of SAP HANA, it can lead to unexpected benefits for your company. You can use the fact that SAP HANA uses compression in memory combined with an understanding of how SAP HANA uses disks and backups to provide a customer with an excellent business case for moving its business solution to SAP HANA. This can lead to dramatically lower operational costs and reduced risk because the compression lets SAP HANA use less memory, which leads to smaller backups and immediately reduces the storage space required by the business (operational costs). These backups take less time, don’t require any downtime, and can be made at any time of the day, leading to reduced risk. Objectives of This Portion of the Test The objective of this portion of the SAP HANA certification test is to evaluate your basic understanding of the various SAP HANA deployment scenarios and the corresponding system configurations. The certification exam expects SAP HANA modelers to have a good understanding of the following topics: 쐍 How SAP HANA uses in-memory technology, including elements such as columns, tables, compression, delta buffer, and partitioning 쐍 SAP HANA architecture concepts 쐍 The persistence layer in SAP HANA 쐍 The various ways that SAP HANA can be deployed Key Concepts Refresher Chapter 3 쐍 How SAP HANA can be deployed in the cloud 쐍 The direction SAP is taking by allowing your modeling and development work to run on both standalone and cloud deployments of SAP HANA 쐍 An overview of using SAP HANA with analytics and business intelligence (BI) Note This portion contributes about 10% of the total certification exam score. Key Concepts Refresher We’ll start with something that most people know very well: Moore’s Law. Even if you’ve never heard of it, you’ve seen its effects on the technology around you, for example, with your mobile phone. Put simply, this means that the overall processing power for computers doubles about every two years. Therefore, you can buy a computer for roughly the same price as about two years ago, and it will be about twice as fast and powerful as the older one. This held true for mobile phones as well. (Lately Moore’s Law hasn’t kept up, and devices such as high-end mobile phones have risen in price.) Moore’s Law is useful for understanding system performance and response times. If you have a report that is running slow, let’s say an hour long, you can fix this by buying faster hardware. You can easily decrease the processing time of your slow report from 1 hour to 30 minutes to 15 minutes by just waiting a while and then buying faster computers. However, a few years ago that simple recipe stopped working. Our easy upgrade path was no longer available, and we had to approach the performance issues in new and innovative ways. Data volumes also keep growing, adding to the problem. SAP realized these problems early and worked on the solution we now know as SAP HANA. In this chapter, we’ll discuss SAP HANA’s in-memory technology and how it addresses the aforementioned issues. We’ll then look at SAP HANA’s architecture and approaches. Finally, we’ll walk through the various deployment options available, including cloud options and using SAP HANA for analytics. In-Memory Technology SAP HANA uses in-memory technology. But, just what is in-memory technology? In-memory technology is an approach to querying data residing in RAM, as 73 74 Chapter 3 Technology, Architecture, and Deployment Scenarios opposed to physical disks. The results of doing so lead to shortened response times, which enable enhanced analytics for BI/analytic applications. In this section, we’ll walk through the process that leads us to use in-memory technology, which will illustrate the importance of this new technology. Multicore CPU After years of CPUs getting faster, we seemed to hit a wall when CPU speeds reached about 3 GHz. It became difficult to make a single CPU faster while keeping it cool enough. Therefore, chip manufacturers moved from a single CPU per chip to multiple CPU cores per chip. Instead of increasingly fast machines, machines now stay at roughly the same speed but have more cores. If a specific report takes an hour and only runs on a single core, that report won’t run any faster thanks to more CPU cores. We can run multiple reports simultaneously, but each of them will run for an hour. To speed up reports, we need to rewrite software to take full advantage of new multicore CPU technology—that is, through parallel processing—which is a problem that has been faced by most software companies. SAP addressed this problem by making sure that all operations in SAP HANA are making full use of all available CPU cores. Slow Disk When talking about poor system performance, you can place a lot of blame on slow hard disks. Table 3.1 shows how slow a disk is when compared to memory. Operation Latency (in Nanoseconds) Relative to Memory Read from level 1 (L1) cache 0.5 ns 200 times faster Read from level 2 (L2) cache 7 ns 14 times faster Read from level 3 (L3) cache 15 ns 6.7 times faster Read from memory 100 ns – Send 1 KB over 1 Gbps network 10,000 ns 100 times slower Read from solid state disk (SSD) 150,000 ns 1,500 time slower Table 3.1 Latency of Hardware Components and Performance of Components Shown Relative to RAM Key Concepts Refresher Chapter 3 Operation Latency (in Nanoseconds) Relative to Memory Read from disk 10,000,000 ns 100,000 times slower Send network packet from the US West Coast to Europe and back 150,000,000 ns 1,500,000 times slower Table 3.1 Latency of Hardware Components and Performance of Components Shown Relative to RAM (Cont.) To give you an idea of how slow a disk really is, think about this: if reading data from RAM takes one minute, it will take more than two months to read the same data from disk! In reality, there is less of a difference when we start reading and streaming data sequentially from disks, but this example prompts the idea that there must be better ways of writing systems. The classic problem is that alternatives such as RAM have been prohibitively expensive, and there hasn’t been enough RAM available in a machine to replace disks. Therefore, the way we’ve designed and built software systems for the past 30 or more years has been heavily influenced by the fact that disks are slow. This affects the way everyone from technical architects to developers to auditors thinks about system and application design. Consider these three examples of the wide effect of slow disks: 쐍 SAP NetWeaver SAP NetWeaver architecture uses one database server and multiple application servers (Figure 3.1). Calculations Application Server Business Rules Application Server Application Server Database Server DB Independent Figure 3.1 SAP NetWeaver Architecture: One Database Server, Many Application Servers 75 76 Chapter 3 Technology, Architecture, and Deployment Scenarios Because the database stores data on disks that are slow, the database becomes the bottleneck in the system. In any system, you always try to minimize the impact of the bottlenecks and the load on the system. Therefore, we move the data into multiple application servers to work with it there, using the database server as little as possible. In SAP systems, we used it so little that we were database independent. SAP systems even used their own database-independent ABAP Data Dictionaries. In the application servers, we run all the code, perform all the calculations, and execute the business rules. We use memory buffers to store a working set of data. In many SAP systems, 99.5% of all database reads come from these memory buffers, thus minimizing the effects of the slow disk. The SAP system architecture was heavily influenced by the fact that the disk is slow. 쐍 Financial systems data storage Inside financial systems, we store data such as year-to-date data, month-to-date data, balances, and aggregates. For reporting, we have cubes, dimensions, and more aggregates. These values are all updated and stored in financial systems to improve performance. Take the case of the year-to-date figure. We don’t really need to store this information because we can calculate it from the data. The problem is that to calculate the year-to-date figure, we need to read millions of records from the database, which the database reads from the slow disk. Therefore, we could calculate the current year-to-date figure, but doing so would be too slow. Because disk storage is cheap and plentiful, we just add this value into the database in another table. With every new transaction, we simply add the new value to the current year-to-date value and update the database. Now reports can run a lot faster on disk. If the disk was fast, we would not need to store these values. Can you imagine a financial system without these values? Ask accountants to imagine a financial system without balances! Why store balances when you can calculate them? When you see the reaction on their faces, you’ll understand how even their thinking has been influenced by the effects of slow disks. We’ll return to this example in the next section. 쐍 Data warehouses One of the many reasons we have data warehouses is that we need fast and flexible reporting. We build large cubes with precalculated, stored aggregates that we can use for reporting and to slice and dice the data. Building these cubes takes time and processing power. Reading the masses of data from disk for the Key Concepts Refresher Chapter 3 reports can be slow. If we keep these cubes inside our financial system, for example, then a single user running a large report from these cubes can pull the system’s performance down. That one reporting user affects thousands of other users. Therefore, SAP created separate data warehouses to reduce the impact of reporting on operational and transactional systems. Those reporting users can now run their heavy, slow reports away from the operational and transactional systems. The database world now is split into two parts: the online transactional processing (OLTP) databases for the financial, transactional, and operations systems, and the online analytical processing (OLAP) data warehouses (see Figure 3.2). Pre-calculated stored values SAP ERP, SAP CRM, SAP SRM, … Data Warehouse Database (OLTP) Database (OLAP) Figure 3.2 OLTP vs. OLAP: Keeping Operational Reporting Separate from Operational Systems for Performance Reasons If disks were fast, and we could create aggregates fast enough, we would be able to perform most of our operational reporting directly in the operational system, where the users expect and need it. We would not have separate OLTP and OLAP systems. Again, slow disks influenced architectural design principles. These three examples prove that slow disks influenced system design, architecture, and thinking for many years. If we could eliminate the slow disk bottleneck from our systems, we could then change our minds about how to design and implement systems. Hence, SAP HANA represents a paradigm shift. 77 78 Chapter 3 Technology, Architecture, and Deployment Scenarios Loading Data into Memory In the first of the three examples in the previous section, you saw that we’ve used memory where possible to speed things up with memory caching. Nevertheless, until a few years ago, memory was still very expensive. Then, memory prices started coming down fast, and we asked, “Can we load all data into memory instead of just storing it on disk?” Customer databases for business systems can easily be 10 TB (10,000 GB) in size. The largest available hardware when SAP HANA was developed had only 512 GB to 1 TB of memory, presenting a problem. SAP obviously needed to compress data to fit it into the available memory. We’re not talking here about ZIP or RAR compression, which needs to be decompressed again before it’s usable, but dictionary compression. Dictionary compression involves mapping distinct values to consecutive numbers. Using this type of compression enables the data to be used in its compressed format. Column-Oriented Database For years, it was common to store data in rows. Row-based databases work well with spinning disks to read single records faster. However, when we start storing data in memory, there are fewer valid reasons to use row-based storage. At that point, we can decide to use row-based or column-based storage. Why would we use columns instead of rows? When we start looking at compressing data, then row-oriented data doesn’t work particularly well because we mix different types of data—such as cities with salaries and employee numbers—in every record. The average compression we get is about two times (i.e., data is stored in about 50% of the original space). When we store the same data in columns, we now group similar data together. By using dictionary compression, we can achieve excellent compression, and thus column-based storage starts making more sense for in-memory databases. Another way in which we can save space is via indices. If we look inside an SAP system, we find that about half the data volume is taken up by indices, which are actually sorted columns. When we store the data in columns in memory, do we really need indices? We can save that space. With such column-oriented database advantages, we end up with 5 to 20 times compression. Let’s get back to our original problem of 10 TB databases, but machines with only 1 TB of memory. By using column-based storage, compression, and no indices, we Key Concepts Refresher Chapter 3 can fit a 10 TB database into the 1 TB of available memory. Even though we focused on the preferred column-oriented concept here, note that SAP HANA can handle both row- and column-based tables. Removing Bottlenecks When we manage to load the database entirely into memory, exciting things start happening. We break free from the slow disk bottleneck that had a stranglehold on system performance for many years and can start exploring new architectures. As always, even if our systems become thousands of times faster, at some stage, we’ll reach new constraints. These constraints will determine what the new architecture will look like. The way we provide new business solutions will depend on the characteristics of this new architecture. Architecture and Approach The new in-memory computing paradigm has changed the way we approach solutions. Different architectures and deployment options are now possible with SAP HANA. Changing from a Database to a Platform The biggest change in approach is how and where we work with the data. Disks were always slow, causing the database to become a bottleneck. Thus, we moved the processing of data onto application servers to minimize the pressure on this disk bottleneck. Now, data storage has moved from disk to memory, which is much faster. Therefore, we needed to reexamine our assumptions about the database being the bottleneck. In this new in-memory architecture, it makes no sense to move data from the memory of the database server to the memory of the application servers via a relatively slow network. In that case, the network will become the bottleneck, and we’ll duplicate functionality. New bottlenecks require new architectures. Our new bottleneck arises because we have many CPU cores that need to be fed as quickly as possible from memory. Therefore, we must put the CPUs close to the memory (i.e., the data). In doing so, we realized that SAP HANA can be offered as an appliance. As shown in Figure 3.3, it doesn’t make sense to copy data from the memory of the database server to the 79 80 Chapter 3 Technology, Architecture, and Deployment Scenarios memory of the application server as we’ve always done (left arrow). We instead now move the code to execute within the SAP HANA server (right arrow). Calculations Application Server Move data to the application (code) Move code to the database (data) The Old Approach The New Approach Database Server Calculations Figure 3.3 Moving Data In-Memory Our solution approach also needed to change. Instead of moving data away from the database to the application servers—as we’ve done for many years—we now want to keep it in the database. Many database administrators (DBAs) have done this for years via stored procedures. Instead of stored procedures, in SAP HANA, we use graphical information models, as we’ll discuss in Chapter 4 through Chapter 7. You can also use stored procedures and table functions if necessary. We’ll look at this in Chapter 8. Note Instead of moving data to the code, we now move the code to the data. We can easily implement this process by way of SAP HANA graphical modeling techniques. When you start running code inside the database, it’s only a matter of time before it changes from a database into a platform. When people tell you that SAP HANA is “just another database,” ask them which other database incorporates many programming languages, a web server, an application server, an HTML5 framework, a development environment, replication and cleaning of data, integration with most other databases, predictive analytics, machine learning, and multitenancy— the reactions can be interesting. Key Concepts Refresher Chapter 3 Parallelism in SAP HANA In the Slow Disk section, we provided three examples of the wide-ranging effects of slow disks on performance. In the second example, we looked at calculating the year-to-date value. We don’t really need to store a year-to-date value because we can calculate it from the data. With slow disks, the problem is that to calculate this value, we need to read millions of records from the database, which the database reads from the slow disk. Because disk storage is cheap and plentiful, we just added this aggregate into the database in another table. When we have millions of records in memory and many CPU cores available, we can calculate the year-to-date value faster than an old system can read it from the new table from a slow disk. As shown in Figure 3.4, even though we require a single year-to-date value, we use memory and all the available CPU cores in parallel to calculate it. Value 1 Value 2 … CPU core 1 CPU core 2 CPU core 3 SUM of 1..n calculated in parallel … … CPU core m Value n SUM of 1..n Figure 3.4 Single Year-to-Date Value Calculated in Parallel Using All Available CPU Cores 81 82 Chapter 3 Technology, Architecture, and Deployment Scenarios We just divide the memory containing the millions of records into several ranges and ask each available CPU core to calculate the value for its memory range. Finally, we add all the answers from each of the CPU cores to get the final value. Even a single sum is run in parallel inside SAP HANA. Although it’s relatively easy to understand this example, don’t overlook the important consequences of this process: if we can calculate the value quickly, we don’t need to store it in another table. By applying this new approach, we start to get rid of many tables in the system, which then leads to simpler systems. We also have to write less code because we don’t need to maintain a particular table, to insert or update the data in that table, and so on. Our backups become smaller and faster because we don’t store unnecessary data. That is exactly what has happened with SAP S/4HANA: functional areas that had 37 tables, for example, now have 2 or 3 tables remaining. To get an idea of the compound effect that SAP HANA has on the system, Figure 3.5 shows how a system shrinks when you start applying this new approach. 750 GB 150 GB 50 GB 10 GB Original data volume Application migrated to SAP HANA Application optimized for SAP HANA Data aging implemented Figure 3.5 How a Database Shrinks with SAP HANA In-Memory Techniques Say that we start with an original database of 750 GB. Because of column-based storage, compression, and the elimination of indices, the database size easily comes down to 150 GB. When we then apply the techniques we just discussed for Key Concepts Refresher Chapter 3 calculating aggregates instead of storing them in separate tables, the database goes down to about 50 GB. When we finally apply data aging, the database can shrink to an impressive 10 GB. In addition, we’re left with a simpler system, less code, faster reporting response times, and smaller backups. Delta Buffer and Delta Merge Before we start sounding like overeager sales people, let’s look at something that column-based databases aren’t good at. Row-based databases are good at reading, inserting, and updating single records. Row-based databases aren’t efficient when we have to add new columns (fields). When we look at column-based databases, we find these two sentences are reversed because data stored in columns instead of in rows is effectively “rotated” by 90 degrees. Adding new columns in a column-based database is easy and straightforward, but adding new records can be a problem. In short, column-based databases tend to be very good at reading data, whereas row-based databases are better at writing data. Fortunately, SAP took a pragmatic rather than a purist approach and implemented both column-based and row-based storage in SAP HANA. This allows users to have the best of both worlds: column storage for compressed data and fast reading, and row storage for fast writing of data. Because reading data happens more often than writing, the column-based store is obviously preferred. Figure 3.6 shows how this is implemented in SAP HANA table data storage. The main table area is using the column store as we just discussed, and the data is optimized for fast reading. Next to the main table area, we also have an area called the delta buffer, which is sometimes called the delta store. When SAP HANA needs to insert new records, it inserts them into this area. This is based on the row store and is optimized for fast data writing. Records are appended at the end of the delta buffer. The data in this delta buffer is therefore normally unsorted and uncompressed. Some people think the read queries are slowed down when the system has to read both areas (for old compressed and newly inserted data) to answer a query. Remember that everything in SAP HANA runs in parallel and uses all available CPU cores. For the query shown in Figure 3.6, 100 CPU cores can read the main table area, and another 20 CPU cores can read the buffer area. The data processed by all 120 cores is then combine to produce the result set. 83 84 Chapter 3 Technology, Architecture, and Deployment Scenarios SAP HANA Reading data Main Writing data Memory • Column store • Read-optimized Delta buffer • Row store • Write-optimized Delta merge Figure 3.6 Column and Row Storage Used to Achieve Fast Reading and Writing of Data We’ve looked at cases in which new data is inserted, but what about cases in which data needs to be updated? Column-based databases can also be relatively slow during updates. Therefore, in such a case, SAP HANA marks the old record in the main storage as “updated at date and time” and inserts the new values of the records in the delta buffer. The SAP HANA system effectively changes an update into a flag and insert. This is called the insert-only technique. We don’t want the delta buffer to become too large because a large unsorted and uncompressed area could slow down query processing. When the delta buffer becomes too full or when you trigger it, the delta merge process takes the values of the delta buffer and merges them into the main table area. SAP HANA then starts with a clean delta buffer. Column-based databases also don’t handle SELECT * SQL statements well, meaning reading all the columns. We’ll look at this in more detail in Chapter 12. For now, we encourage you to avoid such statements in column-based databases by asking only for the fields (columns) you need. In the next section, we’ll look at how the SAP HANA in-memory database persists its data, even when we switch off the machine. Key Concepts Refresher Chapter 3 Persistence Layer The most frequently asked question about SAP HANA and its in-memory concept is as follows: “Because memory is volatile and loses its contents when you switch off its power source, what happens to SAP HANA’s data when you lose power or switch off the machine?” The short answer is that SAP HANA still uses disks. However, whereas other databases are disk-based and optionally load data into memory, SAP HANA is in-memory and merely uses disks to back up the data for when the power is switched off. The focus is totally different. The disk storage and the data storage happens in the persistence layer, and storing data on disk is called persistence. If hardware fails, having the data also stored in the persistence layer allows for full recovery. Newer nonvolatile memory, which doesn’t lose its contents when the power gets switched off, influences the architecture of SAP HANA’s persistence layer going forward. SPS 04 As of SAP HANA 2.0 SPS 04, you can use Intel Optane DC persistent (nonvolatile) memory instead of DRAM in 50% to 80% of your SAP HANA server’s memory slots. SAP HANA is the first database to use this new type of memory in nonvolatile App Direct Mode. Depending on the size of the database, it cuts the startup time down to 8% of the usual startup time on the same hardware. This type of memory is a bit slower than normal DRAM, but due to the way that SAP HANA uses data, observed real-life results show performance very close to that of DRAM. An added benefit is that these memory modules are larger than the DRAM configurations, so you can, for example, put more memory into a four-socket server than you normally would. SAP HANA SPS 04 also has a fast restart option that you can use when patching and restarting SAP HANA. This won’t work when upgrading or rebooting a server. However, persistent memory (PMEM) does work for such scenarios. Let’s look at the various ways SAP HANA uses disks and the persistence layer in the following sections. SAP HANA as a Distributed Database Just like many newer databases, SAP HANA can be deployed as a distributed database. One of the advantages that SAP HANA has due to its in-memory architecture 85 86 Chapter 3 Technology, Architecture, and Deployment Scenarios is that it’s an atomicity, consistency, isolation, and durability (ACID)-compliant database. ACID ensures that database transactions are reliably processed. Most other distributed databases are disk-based, with the associated performance limitations, and most of them—most notably NoSQL databases—aren’t ACID-compliant. NoSQL databases mostly use eventual consistency, meaning that they can’t guarantee consistent answers to database queries. They might come back with different answers to the same query on the same data. SAP HANA will always give consistent answers to database queries, which is vital for financial and business systems. Figure 3.7 shows an instance in which SAP HANA is running on three servers. A fourth server can be added to facilitate high availability. Should any one of the three active servers fail, the fourth standby server will instantly take over the functions of the failed server. As a distributed database, SAP HANA’s shared disk storage is used to facilitate high availability in what is called the scale-out solution. Server 1 (active) Table partition 1 Server 2 (active) Server 3 (active) Table partition 2 Table partition 3 Server 4 (standby) Failover SAP HANA (Distributed DB, ACID compliant) High availability Common storage Figure 3.7 SAP HANA as a Distributed Database Partitioning SAP HANA can partition tables across active servers, meaning that we can take a table and split it to run on all three servers in Figure 3.7. This way, we not only use multiple CPU cores of one server with SAP HANA to execute queries in parallel Key Concepts Refresher Chapter 3 but also use multiple servers. We can partition data in various ways; for example, people whose surnames start with the letters A–M can be in one partition and N– Z names can be in another. SAP HANA supports three types of table partitioning, namely round-robin, hash, and range partitioning. We can use single-level or multilevel partitioning. Data Volume and Data Backups Figure 3.8 shows the various ways that SAP HANA uses disks. At the top left, you can see that the SAP HANA database is in memory. If it’s not in memory, it’s not a database. The primary storage for transactions is the memory. The disk is used for secondary storage—to help when starting up after a power down or hardware failure, with backups, and with disaster recovery. SAP HANA Data blocks (pages) SAP HANA dynamic data tiering Hot data (in memory) Row store Memory Warm data (on disk) Column store Page manager Async Savepoint Data volume Data backup System tables Transaction manager Sync At commit/buffer Log volume Log backup Figure 3.8 Persistence of Data with SAP HANA SAP HANA disaster recovery system System or storage replication Sync or async replication Dedicated or shared servers 87 88 Chapter 3 Technology, Architecture, and Deployment Scenarios Data is stored as blocks (also sometimes called pages) in memory. Every 10 minutes, a process called a savepoint runs that synchronizes these memory blocks to disks in the data volume. Only the blocks that contain data that have changed in the past 10 minutes are written to the data volume. This process is controlled by the page manager. The savepoint process runs asynchronously in the background. Data backups are made from the disks in the data volume. When you request a backup, SAP HANA first runs a savepoint and ensures that the data in the data volume is consistent, and then starts the backup process. Note The backup reads the data from the disks (secondary storage) in the data volume. This doesn’t affect the database operations because the SAP HANA database continues working from memory (primary storage). Therefore, you don’t have to wait for an after-hours time window to start a backup. SAP HANA can unload (drop) data blocks from memory when it needs more memory for certain operations. This unloading of data from memory isn’t a problem because SAP HANA knows that all the data is also safely stored in the data volume. It will unload the least-used portions of the data. It doesn’t unload the entire leastused table, but only the columns or even portions of a column that weren’t used recently. In a scale-out solution, it can even unload parts of a table partition. The most it might have to do is trigger a savepoint before unloading the least-used data. The persistence layer is therefore a very useful feature of SAP HANA. Log Volume and Log Backups In the middle section of Figure 3.8, shown previously, you can see the transaction manager writing to the log volume. The page manager and transaction manager are subprocesses of the SAP HANA index server process, which is the main SAP HANA process. If this process stops, then SAP HANA stops. When you edit data in SAP HANA, the transaction manager manages these changes. All changes are stored in a log buffer in memory. When you COMMIT the transaction, the transaction manager doesn’t commit the transaction to memory until it’s also committed to the disks in the log volume, so this is a synchronous operation. Because the system has to wait for a reply from a disk in this case, fast SSDs are used in the log volume. This synchronous writing operation is also triggered if the log buffer becomes full. Key Concepts Refresher Chapter 3 The log files in the log volume are backed up regularly using scheduled log backups. Note SAP HANA doesn’t lose transactions, even if a power failure occurs, because it doesn’t commit a transaction to the in-memory database unless it’s also written to the disks in the log volume. Startup The data in the data volume isn’t a “database.” Instead, think of it as an organized and controlled memory dump. If it was a database on disk, it would have to convert the disk-based database format to an in-memory format each time it started up. Now, it can just reload the “memory dump” from disk back into memory as quickly as possible. When you start up an SAP HANA system, it performs what is called a lazy load. It first loads the system tables into memory and then the row store. Finally, it loads any column tables that you’ve specified. Because the data volumes are only updated by the savepoint every 10 minutes, some transactions might not be in the data volume; SAP HANA loads those transactions from the log volume. Any transaction that was committed stays committed, and any transaction that wasn’t committed is rolled back. This ensures a consistent database. SAP HANA then reports that it’s ready for work. This ensures that SAP HANA systems can start up quickly. Any other data that is required for queries gets loaded in the background. Again, SAP HANA doesn’t have to load an entire table into memory; it can load only certain columns or part of a column into memory. In a scaleout system, it will load only the required partitions. Disks are therefore useful for getting the data back into memory after system maintenance or potential hardware failures. Disaster Recovery At the bottom right of Figure 3.8, shown earlier, you can see that the data volume and logs also are used to facilitate disaster recovery. In disaster recovery, you make a copy of the SAP HANA system in another data center. If your data center is flooded or damaged in an earthquake, your business can continue working from the disaster recovery copy. 89 90 Chapter 3 Technology, Architecture, and Deployment Scenarios The initial large copy of the system can be performed by replicating the data volume. After the initial copy is made, only the changes are replicated as they are written to the log volume. This is known as log replication. This replication can be performed by the SAP HANA system itself or by the storage vendor of the disks in the data and log volumes. The replication can happen either synchronously or asynchronously. You can also choose whether to use dedicated hardware for the disaster recovery copy or shared hardware by using the disaster recovery machines for development and quality assurance purposes. Note Disaster recovery is covered in more detail in the HA200 course, and in the SAP HANA administration guide at http://s-prs.co/v487620. Active/Active (Read-Enabled) Mode SAP HANA 2.0 has a powerful new feature called active/active (read-enabled) mode that allows you to actively use the disaster recovery hardware to help carry the load on a production system and enhance performance. This feature only works when you’ve dedicated hardware for the disaster recovery system. After the disaster recovery system (also known as the secondary system) is synchronized with the primary SAP HANA system, only the updates need to be sent across via log replication, as shown in Figure 3.9. The changes are then replayed asynchronously on the secondary system. Application Read and write SAP HANA (primary) Log replication Log replayed asynchronously Read-only SAP HANA (secondary) Disaster recovery system Figure 3.9 SAP HANA Active/Active (Read-Enabled) Mode Because both systems are in sync and contain the same data, we can now use the secondary system for reporting and other read-only requirements. Updates must Key Concepts Refresher Chapter 3 happen on the primary system because the direction of the log replication is only from the primary to the secondary system. Any actions that insert, edit, update, or delete data therefore need to use the primary system. All other actions can use either the primary or the secondary system. The load is thus spread across both systems. This feature means you can buy smaller machines because you can use the dedicated disaster recovery hardware to help with your productive load, leading to cost savings. Previously, we mentioned how reporting users could affect the performance of operational systems of old disk-based systems. The solution was to separate OLTP and OLAP into different applications. With SAP HANA, we run both together. Using the active/active (read-enabled) mode, we can still separate read-intense operations, such as reporting or data analysis, using the secondary system. But this is now still running on the same application based on SAP HANA instead of a separate data warehouse. SPS 04 With SAP HANA SPS 04, you can now create multiple secondary sites using multi-target replication. This can be used, for example, to have multiple secondary sites for regions to decrease latency for end users. Data Aging The final piece of the persistence layer is found at the top right of Figure 3.8, shown earlier. This is where disks are used for data aging. Data aging is normally described by the speed at which the data can be accessed. Data in memory can be accessed very quickly, so this is referred to as “hot” data. Data on disk is a bit slower, so this is called “warm” data. Data in other storage, such as an SAP IQ database or a Hadoop data lake, is called “cold” data. The temperature of the data refers to how fast data can be accessed, with hot having the fastest access. As the speed of data access decreases, the price of the storage also normally decreases. The idea behind data aging is that we want the latest data, that is, the data used most often, in the hot storage area. As the data becomes older, we don’t need it as often, and it can be moved to the warm storage area. After a while, we’ll probably never use the data again, but we might need to keep it due to regulatory requirements. In that case, we can move it to the cold storage area. You can specify data aging rules inside SAP HANA to take, for example, all data older than five years and 91 92 Chapter 3 Technology, Architecture, and Deployment Scenarios write it to the next storage area. All queries to the data still work perfectly but are just slower because they are read from disk instead of memory. The warm storage area is managed as a unified part of the SAP HANA database. This is in contrast to the cold storage area, which is managed separately, although the data is still accessible at any time. With SAP HANA, there are three options for warm data, as follows: 쐍 Native storage extension (NSE) This is the primary warm storage option for SAP HANA. It’s an intelligent builtin extension that supports all SAP HANA data types and models. NSE allows for a simple system landscape and is scalable with good performance. 쐍 Extension node This option allows for better performance than NSE. 쐍 Dynamic tiering This option allows for a larger data volume than NSE. You shouldn’t mix multiple warm store options in one landscape for complexity reasons. In SAP HANA 2.0, we can use the extension node warm store using multistore tables that are partitioned across both the in-memory data store and the extended storage on disk, as shown in Figure 3.10. The partitions with the most recent data can be in memory, and the remaining partitions can be in the extended storage. SPS 04 SAP HANA extension node benefits from new partitioning and scale-out features in SAP HANA SPS 04. Some of the new features are partition grouping and a range-hash partition scheme. Table partition 1 Table partition 2 Multistore table SAP HANA Memory storage Extended storage Figure 3.10 Data Aging Using Multistore Tables You’ll need extended storage installed before you can use multistore tables. Key Concepts Refresher Chapter 3 Tip Multistore tables are only available in SAP HANA 2.0. Deployment Scenarios People sometimes get confused about SAP HANA when it’s only defined with a single concept, such as SAP HANA is a database. After a while, they hear seemingly conflicting information that doesn’t fit into that simple description. In this section, we’ll look at the various ways that SAP HANA can be used and deployed: as a database, as an accelerator, as a platform, as a data warehouse, or in the cloud. When looking at SAP HANA’s in-memory technology, and especially the fact that we should ideally move code away from the application servers into SAP HANA, it became clear that SAP systems needed to be rewritten completely. Because SAP HANA doesn’t need to store values such as year-to-date, aggregates, or balances, we should ideally remove them from the database. By doing so, we can get rid of a significant percentage of the tables in, for example, a financial system. However, that implies that we should also trim the code that calculated those aggregates and inserted them into the tables we removed. In this way, we can also remove a significant percentage of our application code. Next, the slow parts of the remaining code that manipulate data should be moved into graphical calculation views in SAP HANA, removing even more code from our application. If we follow this logic, we obviously have to completely rewrite our systems. This wasn’t easy for SAP, which has millions of lines of code in every system and an established client base. This approach can be very disruptive. Disruptive innovation is great when you’re disrupting other companies’ markets, but not so much when you disrupt your own markets. Ask an existing SAP customer if he wants this new technology, and he probably will say, “Yes, but not if it means a reimplementation that will require lots of time, money, people, and effort.” Completely rewriting the system would lead to customers replacing their systems with the new solution; that also means that customers with the older systems would not benefit at all from this innovation. If you look at the new SAP S/4HANA systems, you’ll see that they follow the more disruptive approach we just described. SAP has also stated that its business applications will be migrated to run on SAP HANA. In most cases, they will just run on SAP HANA, as these systems will be optimized to make full use of SAP HANA’s capabilities. 93 94 Chapter 3 Technology, Architecture, and Deployment Scenarios However, there are other ways in which we can implement this disruptive new technology in a nondisruptive way—and this is how SAP approached the process when implementing SAP HANA a few years ago. The following sections walk through the deployment scenarios for implementation. SAP HANA as a Side-by-Side Accelerator Solution The first SAP HANA deployment scenario is a side-by-side accelerator solution, as illustrated in Figure 3.11. SAP ERP Database SAP CRM Database Analytics and Reporting SLT Replication Server SAP HANA Innovation without disruption Non-SAP Database Figure 3.11 SAP HANA Deployed as a Nondisruptive Side-by-Side Accelerator Solution In this deployment scenario, the source systems stay as they are. No changes are required. We add SAP HANA on the side to accelerate only those transactions or reports that have performance issues in the source systems, and then we duplicate the relevant data from the source system to SAP HANA. This duplication is normally performed by the SAP Landscape Transformation Replication Server. (We’ll discuss this topic in Chapter 9.) The copied data in SAP HANA is then used for analysis or reporting. Key Concepts Refresher Chapter 3 The advantages of this deployment option include the following: 쐍 Because we just copy data, we don’t change the source system at all. Therefore, everyone can use this innovation without the disruption (even those who have very old SAP systems). 쐍 We can use this option for all types of systems, not just the latest SAP systems. In fact, a large percentage of the systems that use SAP HANA are non-SAP systems. 쐍 We don’t have to copy the data for an entire system. If a report already runs in three seconds, that’s good enough; it doesn’t have to run in three milliseconds. If a report runs for two hours, but runs only once a year and is scheduled to run overnight, that’s fine too. We only take the data into SAP HANA that we need to accelerate. 쐍 We can replicate data from a source system in real time. The reports running on SAP HANA are always fully up to date. Even though we make a copy of data, remember that SAP HANA compresses this data and doesn’t copy all the data. Therefore, the SAP HANA system is much smaller than the source systems in this deployment scenario. However, the SAP HANA system in the scenario we described starts as a blank system. You’ll have to load the data into SAP HANA, build your own information models, link them to your reports and programs, and look after the security. Much of what we’ll discuss in this book is relevant for this type of work. Using SAP HANA in the side-by-side deployment involves some work because it starts as a blank system. SAP quickly realized that special use cases occurred at many of their customers where some common processes were slow; one of these processes is profitability analysis. Profitability is something that any company needs to focus on, but profitably analysis reports are slow running due to the large amounts of data they have to process. By loading profitability data into SAP HANA, the response times of these reports go from hours to mere seconds (see Figure 3.12). For these accelerator cases, SAP provides prebuilt models in SAP HANA. SAP also slightly modified the ABAP code for Profitability Analysis (CO-PA) to read the data from SAP HANA instead of the original database. SAP Landscape Transformation Replication Server replicates just the required profitability data from the SAP ERP system to the SAP HANA system. 95 96 Chapter 3 Technology, Architecture, and Deployment Scenarios CO-PA ABAP SAP HANA SAP ERP Database SLT Replication Server Figure 3.12 SAP HANA Used as an Accelerator for a Current SAP Solution The installation and configuration process is performed at a customer site over a few days, instead of weeks and months. The only thing that changes for the end users is that they have to get used to the increased processing speed, and no one has a problem with that! The SAP HANA Live topic that we’ll discuss in Chapter 14 can also be used in a sideby-side configuration. SAP HANA as a Database When SAP released the side-by-side solution, many people thought that it would replace their data warehouses. However, this worry was alleviated when SAP announced that SAP HANA would be used as the database under SAP Business Warehouse (SAP BW). Figure 3.13 illustrates a typical (older) SAP landscape. SAP HANA can be used to replace the databases under the older SAP BW, SAP Customer Relationship Management (SAP CRM), and SAP ERP (which are part of SAP Business Suite) systems. There are well-established database migration procedures to replace a current database with SAP HANA. In each case, SAP HANA replaces the database only. The SAP systems still use the SAP application servers as they did previously, and end users still access the SAP systems as they always did, via the SAP application servers. All reports, security, and transports are performed as before. Only the system administrators and DBAs have to learn SAP HANA and log in to the SAP HANA systems. Key Concepts Refresher Chapter 3 SAP ERP Database SAP CRM Database Corporate reporting and analytics Enterprise data warehouse (SAP BW) Database Non-SAP Database Figure 3.13 Typical Current SAP Landscape Example When you use SAP HANA as a method to just accelerate reporting and analytics, you get a similar situation. The users log in to the reporting and analytics tools, and they basically use SAP HANA as a database. Figure 3.14 looks very similar to Figure 3.13, except that the various databases were replaced by SAP HANA. By only replacing the database, SAP again managed to provide most of the benefit of this new in-memory technology to businesses but without the cost, time, and effort of disruptive system replacements. The applications’ code wasn’t forked to produce different code for SAP BW on SAP HANA versus SAP BW on older databases. Over time, after customers and partners became familiar with SAP HANA, SAP moved these solutions completely over to SAP HANA. We now have SAP S/4HANA and SAP BW/4HANA that replace the SAP ERP and SAP BW systems, respectively, that just used SAP HANA as yet another database. 97 98 Chapter 3 Technology, Architecture, and Deployment Scenarios SAP ERP SAP HANA SAP CRM SAP HANA Corporate reporting and analytics Enterprise data warehouse (SAP BW) SAP HANA Non-SAP Database Figure 3.14 SAP HANA as Database Can Change the Current Landscape Let’s look at the example of SAP BW in a bit more detail. Because SAP HANA can address both OLAP and OLTP in a single system, a data warehousing solution such as SAP BW has changed a lot. SAP started with the “classic” SAP BW that was running on other databases. When SAP HANA was released, SAP BW was ported to also run on SAP HANA, and it used SAP HANA as just another database. The application stayed roughly the same. Remember that the “classic” SAP BW has its own information modeling objects— for example, InfoCubes and DataStore Objects (DSOs). These objects in SAP BW are built in the application server layer. Even if SAP BW is powered by SAP HANA, and these SAP BW objects are optimized for SAP HANA, they are still completely different from the normal SAP HANA information models, such as calculation views. As SAP HANA became more popular, and people started realizing the advantages of using SAP HANA, new features started appearing in SAP BW. With SAP BW 7.4, Key Concepts Refresher Chapter 3 we had SAP BW powered by SAP HANA. We could use the classical SAP BW modeling objects alongside some new SAP HANA-specific ones such as Advanced DataStore Objects (Advanced DSOs). It could also run both the old 3.x and the new 7.x data flows. SAP HANA enhanced the performance of SAP BW significantly. SAP then rewrote its SAP Business Suite systems to make full use of SAP HANA. In the process, the systems became a lot simpler and smaller. This is the SAP S/4HANA system. Looking at SAP BW, SAP decided to do the same for SAP BW by creating a new version of SAP BW, called SAP BW/4HANA, that relies solely on SAP HANA. It doesn’t run on any other database, like the “classic” SAP BW did. In the process, SAP also simplified and reduced the size of SAP BW but with even more capabilities. Table 3.2 shows how the number of modeling objects was reduced from 10 to just 4. For example, the InfoCubes and DSOs we mentioned earlier were both replaced with Advanced DSOs. In SAP BW/4HANA, these old objects aren’t available. Old functionality such as 3.x data flows are also no longer available. Classic SAP BW SAP BW/4HANA 쐍 MultiProvider 쐍 CompositeProvider 쐍 InfoSet 쐍 CompositeProvider 쐍 Virtual Provider 쐍 Virtual Provider 쐍 Open ODS view 쐍 Transient Provider 쐍 InfoCube 쐍 Advanced DSO 쐍 DSO 쐍 Hybrid Provider 쐍 Persistent Staging Area (PSA) Table 쐍 InfoObject 쐍 InfoObject Table 3.2 Reduced Number of Modeling Object Types in SAP BW/4HANA SAP HANA as a Platform The next way we can deploy SAP HANA is by using it as a full development platform. SAP HANA is more than just an in-memory database. We’ll quickly discuss how SAP HANA 1.0 evolved into a development platform. 99 100 Chapter 3 Technology, Architecture, and Deployment Scenarios Which other database has several different programming languages included and has version control, code repositories, and a full application server built in? SAP HANA 1.0 already had all of these. This makes SAP HANA 1.0 not just a database, but a platform. You can program in SQL, SQLScript (an SAP HANA-only enhancement that we’ll look at in Chapter 8), R, and JavaScript. However, the main feature we use inside SAP HANA 1.0 when deploying it as a platform is the application server, called the SAP HANA extended application services (SAP HANA XS) engine, which is used via web services (see Figure 3.15). A new version of the SAP HANA XS engine appeared in the last support packs of SAP HANA 1.0, called SAP HANA extended application services, advanced model (SAP HANA XSA). The older version we’re discussing here was renamed to SAP HANA extended application services, classic model (SAP HANA XS). The SAP HANA XS engine is a full web and application server built into SAP HANA 1.0 that is capable of serving and consuming web pages and web services. Extended Application Services (XS engine) SAP HANA • SAP HANA views SAPHANA HANAviews views SAP • SQL, SQLScript HTML5 browser View created using SAPUI5 • OData • REST Mobile (tablet or phone) • HTML5 • CSS3 • Client-side JavaScript • Server-side JavaScript Model Controller View displayed Model-View-Controller (MVC) Figure 3.15 SAP HANA 1.0 as a Development Platform When we deploy SAP HANA 1.0 as a platform, we only need an HTML5 browser and SAP HANA XS. In this case, we still develop data models as we discussed previously with the side-by-side accelerator deployment, but we also develop a full application in SAP HANA using JavaScript and an HTML framework called SAPUI5. All this functionality is built in to SAP HANA 1.0 and SAP HANA XS. Key Concepts Refresher Chapter 3 The web application development paradigm that many frameworks use is called model-view-controller (MVC). Model in this case refers to the information models that we’ll build in SAP HANA and is the focus of this book. We expose these information models in the controller as OData or REST web services. This functionality is available in SAP HANA and is easy to use. You can enhance these web services using server-side JavaScript, meaning that SAP HANA runs JavaScript as a programming language inside the server. The last piece is the view, which is created using the SAPUI5 framework. The screens are the same as the SAP Fiori screens in the new SAP S/4HANA systems, which are HTML5 screens. What is popularly referred to as HTML5 actually consists of HTML5, Cascading Style Sheets (CSS3), and client-side JavaScript. In this case, the JavaScript runs in the end user’s browser. By using JavaScript for both the view and the controller, SAP HANA XS developers only have to learn one programming language to create powerful web applications and solutions that leverage the power and performance of SAP HANA. We can even create native mobile applications for Apple iOS, Android, and other mobile platforms. Some free applications, such as Cordova, take the SAP HANA web applications and compile them into native mobile applications, which can then be distributed via mobile application stores. As a result, you can use SAP HANA’s power directly from your mobile phone. You can also use the SAPUI5 application with the React framework and React Native to create native mobile applications. In the next chapter, we’ll see how these concepts expanded with SAP HANA XSA and SAP HANA 2.0. SAP HANA as an SQL Data Warehouse A relatively new use case of SAP HANA is as an enterprise data warehouse (EDW). SAP has offered the SAP BW system for customer’s data warehousing needs for many years. SAP BW has evolved to use SAP HANA and has been optimized for SAP HANA with SAP BW/4HANA. It’s still the recommended route for customers who want strong built-in features for data governance and integrity, processes, controls, and integration to other SAP systems for an enterprise-wide solution. However, more and more companies want to build their own data warehouse using only SAP HANA. Whereas SAP BW is a data warehousing application using SAP HANA, the new SAP SQL Data Warehousing approach is a developer-oriented and SQL-driven approach 101 102 Chapter 3 Technology, Architecture, and Deployment Scenarios to data warehousing using SAP HANA XS. In both cases, you can provision data using the SAP tools (discussed in Chapter 9) and using the SAP BusinessObjects tools or SAP Analytics Cloud (discussed later in this chapter). SAP HANA has provided strong support for the OLAP warehousing functionality for many years now. You’ll see this in the next few chapters. SAP has now added an optional component called the SAP HANA data warehousing foundation for optimized data distribution, lifecycle management, monitoring, and scheduling. This additional functionality helps companies build native data warehouses using just SAP HANA. Note SAP HANA data warehousing foundation starts with a blank system and offers a do-ityourself approach to build your own data warehouse using loosely coupled tools that may have separate repositories. Figure 3.16 shows the high-level architecture of the SAP SQL Data Warehousing with SAP HANA data warehousing foundation version 2.0. At the bottom are the various data sources of the data provisioned to SAP HANA, which we’ll cover in Chapter 9. The middle of the figure shows SAP SQL Data Warehousing, and we’ll focus on most of the modeling and development functionality shown here in this book. The left side of the figure shows the various tools we’ll use in this book for building what is required in the data warehouse. The right side of the figure shows the various data lake options that we can exchange data with. Finally, the reporting and analysis layer appears at the top, as discussed later in this chapter. Tip The versions of SAP HANA data warehousing foundation are independent from the versions of SAP HANA. It’s important to note that SAP HANA data warehousing foundation 1.0 used SAP HANA XS, whereas the current SAP HANA data warehousing foundation 2.0 uses SAP HANA XSA. Although Figure 3.16 shows SAP HANA data warehousing foundation 2.0, we’ll also quickly discuss SAP HANA data warehousing foundation 1.0. Key Concepts Refresher Chapter 3 SAP Lumira Consume BI SAP Web IDE for SAP HANA Third-Party SAP BusinessObjects … Data Lake SAP Vora Calculation Views SQL Procedures TaskChain CDS – NDSO DW Monitor Flowgraphs Virtual Tables SAP Enterprise Architecture Designer GitHub Hadoop Altiscale Spark … In-Memory ETL Provisioning Data Sources SAPUI5 SAP HANA SQL Data Warehousing Tools … Predictive 103 SAP S/4HANA Twitter Streaming Sensors Replication Other DBs Virtual Access Teradata SAP Fieldglass Figure 3.16 SAP SQL Data Warehousing with SAP HANA Data Warehousing Foundation 2.0 SAP HANA data warehousing foundation 1.0 delivers several SAP HANA XS applications: 쐍 Data Distribution Optimizer (DDO) This application is used with SAP HANA scale-out systems to analyze, plan, and adjust the reorganizations of data in the landscape. Tables with data that are used together often can be moved to the same node to enhance query performance. 쐍 Data Lifecycle Manager (DLM) This application enables data distribution and aging using, for example, dynamic data tiering. 쐍 Data Warehouse Monitor (DWM) This application shows the activities in the data warehouse. 쐍 Data Warehouse Scheduler (DWS) This application provides, for example, a framework to define task chains as a sequence of single tasks. … 104 Chapter 3 Technology, Architecture, and Deployment Scenarios The last two options were shown as planned functionality in SAP HANA 1.0, but they are available in SAP HANA 2.0. You might ask when you would use SAP SQL Data Warehousing. If you want to build your own data warehousing as part of an application you’re building, you prefer a web user experience such as a 100% open SQL approach, you want the freedom to create custom data models and processes, or you want to use techniques such as DevOps and continuous integration, testing, and deployment, this is definitely something you should look at. SAP HANA in the Cloud SAP HANA has many features that make it ideal for use in a cloud platform environment. SAP, like many other enterprises, has moved aggressively into the cloud with many of its solutions. Many features that are required for smooth cloud operations were built into SAP HANA over the past few years. SAP HANA works great as a cloud enabler because it accelerates applications, providing users with fast response times, which we all expect from such environments. Multitenant Database Containers The multitenant database containers (MDC) deployment option became available as of SAP HANA 1.0 SPS 09 and helps us move from the world where everything is deployed in the company’s data center, called on premise, to the cloud. We can now have multiple isolated databases inside a single SAP HANA system, as shown in Figure 3.17. Each of these isolated databases is referred to as a tenant. Think of tenants occupying an apartment building in which each tenant has a front door key and owns furniture inside his or her apartment. Other tenants can’t get into another tenant’s apartment without permission. However, the maintenance of the entire apartment building is performed by a single team. If the building needs repairs to the water pipes, all tenants will be without water for a while. In the same way, tenants in MDC have their own unique business systems, data, and business users. In Figure 3.17, you can see that each of the three tenants has its own unique applications installed and connected to its own tenant database inside SAP HANA. Key Concepts Refresher Chapter 3 SAP HANA Tenant 1 (own data, own users) Application 1 Tenant 2 (own data, own users) Application 2 Common Administration Server maintenance Monitoring Backups Upgrades Tenant 3 (own data, etc.) XS Browser Figure 3.17 Multitenancy with SAP HANA MDC Each tenant can make its own backups if the user wishes. Tenants can also collaborate, so one tenant can use the SAP HANA models of another tenant if the tenant has been given the right security access. However, all the tenants share the same technical server infrastructure and maintenance. The system administration team can make a backup for all the tenants. (Any tenant can be restored individually, though.) SAP HANA upgrades and maintenance schedules happen at the same time for all the tenants. Your own private SAP HANA system can be deployed using MDC. Even the free SAP HANA, express edition, which we discussed in Chapter 2, uses the MDC deployment option. But MDC makes the most sense in a cloud environment. Multitenancy is a vital part of any cloud strategy. When you sign up for a cloud account, you don’t want to wait several weeks for hardware to be procured. You expect the system to be available in a few minutes. Creating new tenants allows this to happen as expected, while maintaining full isolation from other tenants’ information. Note The tenants are all hosted inside a single SAP HANA system, which means that all tenants use the same version of SAP HANA. When the SAP HANA system gets upgraded or patched, this affects all the tenants. 105 106 Chapter 3 Technology, Architecture, and Deployment Scenarios Cloud Deployments We’ve now discussed various ways to deploy SAP HANA. Most of the time, we think of these deployments as running on premise on a local server, but all of these scenarios can also be deployed in the cloud. In cloud deployments, virtualizations manage applications such as SAP HANA. VMware vSphere virtual machine deployments of SAP HANA are fully supported, both for development and production instances. With cloud deployments, you’ll be introduced to various new terms: 쐍 Infrastructure as a service (IaaS) An infrastructure includes things such as network, memory, CPU, and storage. IaaS provides infrastructure in the cloud to support your deployments. 쐍 Platform as a service (PaaS) PaaS uses solutions such as SAP Cloud Platform. In this scenario, you use not only the same infrastructure as IaaS but also a platform, such as a Java application service or the SAP HANA platform services. 쐍 Software as a service (SaaS) This cloud offering provides certain software, such as SAP SuccessFactors or SAP C/4HANA. 쐍 Managed cloud as a service (McaaS) In McaaS, SAP manages the entire environment for you, including backups, disaster recovery, high availability, patches, and upgrades. An example of this is the SAP HANA Enterprise Cloud. Public, Private, and Hybrid Cloud Your cloud scenario could be public, private, or hybrid. Let’s look at what each of those terms means and when each scenario applies. The easiest and cheapest way for you to get an instance of SAP HANA for use with this book is via Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure. We’ve looked at this in Chapter 2. By using your own account, you can have an SAP HANA system up and running in minutes. This is available to everyone—the public—so therefore this is a public cloud. In this example, the cloud provider provides IaaS. The reason this is called a public cloud is that the infrastructure is shared by members of the public. You don’t get your own dedicated machine at Amazon, Microsoft, Google, or any other public cloud provider. Key Concepts Refresher Chapter 3 If you host SAP HANA in your own data center or in a hosted environment specifically for you only, then it’s a private or hybrid cloud. A private cloud offers the same features that you would find in a public cloud, but it’s dictated by different forms of access. With a private cloud, the servers in the cloud provider’s data center are dedicated for your company only. No other companies’ programs or data are allowed on these servers. A hybrid cloud involves either combining a public and private cloud offering, or running a private and public cloud together. With a hybrid cloud, you connect your private cloud’s server back to your own data center via a secure network connection. In that case, the servers in the cloud act as if they are part of your data center, even though they are actually in the cloud provider’s data center. An example of a hybrid cloud is using a Microsoft Azure cloud and another private cloud together. SAP HANA Enterprise Cloud SAP HANA Enterprise Cloud offers SAP applications as cloud offerings. You can choose to run SAP HANA, SAP BW/4HANA, SAP S/4HANA, SAP CRM, SAP Business Suite, and many others as a full service offering in the cloud. All the systems that you would normally run on premise are available in SAP HANA Enterprise Cloud. SAP provides and manages the hardware, infrastructure, SAP HANA, and other SAP systems for the customer. The customer only has to provide the license keys for these systems. This is referred to as bring your own license. SAP HANA Enterprise Cloud is seen as an MCaaS solution on a private cloud, but it can also be deployed in a hybrid cloud offering. SAP Cloud Platform SAP Cloud Platform is a PaaS that allows you to build, extend, and run applications in the cloud. Services include SAPUI5, HTML5, SAP HANA platform services, Java applications, and more. SAP Cloud Platform was previously known as SAP HANA Cloud Platform. Behind the scenes, more than just a name change happened. SAP Cloud Platform not only uses SAP HANA but also many open-source technologies such as OpenStack and Cloud Foundry. SAP is a Founding Platinum Level Member of the Cloud Foundry Foundation, which means that you can use Cloud Foundry buildpacks and Docker 107 108 Chapter 3 Technology, Architecture, and Deployment Scenarios containers, with microservices written in any language of your choice, in SAP Cloud Platform. SAP Cloud Platform also offers a wide range of services to get your cloud applications up and running quickly. These can be, for example, security, HTML framework, database, Internet of Things (IoT), machine learning, predictive, provisioning, analytics, mobile, development, build, and deployment services. You can even sell your SAP Cloud Platform applications in the SAP App Center. SAP Cloud Platform forms the core of the new SAP Leonardo offering, where SAP focuses on digital innovation and transformation. SAP Cloud Platform will also run on the cloud offerings of Amazon, Google, and Microsoft. This new option is termed multi-cloud. To better understand SAP Cloud Platform, Cloud Foundry, and the new development process, we’ll return to SAP HANA in the next chapter and look deeper into SAP HANA XSA. SAP Analytics Cloud and Business Intelligence One topic that has been heavily influenced by SAP HANA is that of analytics. Running fast and interactive analytics in the cloud was unforeseen just a few years ago. Everyone was using on-premise reporting tools, and these reports often took hours to generate. SAP HANA changed that very quickly by generating the same or better reports in mere seconds. Today the focus is on SAP Analytics Cloud, but SAP has a lot of other noncloud BI tools as well, which you’ll see in the next sections. You can connect all these tools to SAP HANA; however, all BI tools must be used correctly with SAP HANA to ensure optimal benefit. SAP Analytics Cloud In the space of two or three years, the emphasis moved from running analytics and reports on your own computer to using SAP Analytics Cloud. One of the main advantages is that SAP Analytics Cloud gets updated every few weeks, whereas the on-premise tools often had update cycles of years. Another advantage is that SAP Analytics Cloud displays everything in your browser. It doesn’t matter whether you’re using a tablet, a Windows laptop, or an Apple Mac, you no longer need to install drivers, special programs, and the like. Key Concepts Refresher Chapter 3 Figure 3.18 shows an example of the SAP Analytics Cloud functionality in a web browser. You can easily share your analytics and reports with other users. And when you compare the results, features, and tools available for creating BI content in SAP Analytics Cloud, it clearly wins over older reporting tools. Figure 3.18 An SAP Analytics Cloud Screen in a Web Browser You can even use it in hybrid scenarios, where you use both on-premise and cloud data sources. The on-premise data sources can range from Microsoft Excel and comma-separated values (CSV) data files to SAP BW and universes. SAP Analytics Cloud gives you the ability to combine BI, available industry content, planning, analysis, and predictive to create interactive views that also allow users to drill down to the underlying data. Figure 3.18 shows SAP Digital Boardroom, built on SAP Analytics Cloud, that allows executives to view all their company metrics in a comprehensive dashboard, while still allowing them to explore the data underneath each of the graphs or tables. Having data at your fingertips allows you to make decisions immediately, instead of waiting for additional BI reports that used to take days or weeks. SAP Business Intelligence Tools Along with SAP Analytics Cloud, SAP also has a large number of other BI tools that it has collected and developed over the years. Some were added when SAP 109 110 Chapter 3 Technology, Architecture, and Deployment Scenarios acquired Business Objects. Others were BI tools developed for use with SAP BW. Others are new tools developed to make use of SAP HANA or to add new BI features, such as visualization and predictive analysis. SAP also has the powerful SAP BusinessObjects BI platform, where reports, with their user administration, security, storage, and scheduling, can be centrally managed. SAP realized that the list of BI tools was too long and decided to converge these into just five: 쐍 SAP Analysis for Microsoft Office This tool runs as a Microsoft Office plug-in. It appears in Excel as an additional ribbon bar menu and provides powerful reporting from trusted OLAP data sources, such as SAP HANA. 쐍 SAP Lumira, discovery edition This comparatively new tool for self-service and visualization of analytics was designed with SAP HANA in mind, so it leverages SAP HANA features to deliver fast results. SAP Lumira, discovery edition integrates with SAP Analytics Cloud. It’s also easy to consume SAP Lumira content on mobile devices via the SAP BusinessObjects Mobile app. SAP Lumira is well suited for ad hoc types of analytics; you can view and visualize any type of data in the way you like with minimal effort. 쐍 SAP Lumira, designer edition This tool is used to create dashboards, OLAP web applications, and guided analytics. It features native (direct) connections to SAP BW and SAP HANA, and it falls into the professionally authored category, in which analytic content is centrally governed. Because it’s more of a developer’s tool, some knowledge of JavaScript is required to get the most out of the tool. SAP Lumira, designer edition, has a “what you see is what you get” (WYSIWYG) design environment. 쐍 SAP BusinessObjects Web Intelligence Popularly called WebI, this is one of the most popular SAP BI tools because it’s easy to work with, yet powerful and fast enough to get the job done. It was one of the first self-service style BI tools from SAP. End users can consume SAP intelligence reports using the mobile app or via a browser. 쐍 SAP Crystal Reports This tool is used to create reports that often have the exact specifications and look of the existing paper forms in a “pixel-perfect” format. SAP Crystal Reports Key Concepts Refresher Chapter 3 focuses on professionally authored reports, which can be viewed on desktops, browsers, and offline. This tool has been around since the 1990s and was probably one of the first reporting tools that most modelers and developers used. Let’s now see how you can use these tools with SAP HANA. Using Business Intelligence Tools with SAP HANA One of the most important lessons to learn from the way that SAP Analytics Cloud works is how it utilizes interactive drilldown analytics. When we use BI tools this way, people become more productive because SAP HANA is actually being used in the optimal way. Let’s use Excel as an example. It’s obviously not an SAP tool but has been the most widely used and best-known BI tool available for more than 30 years. Excel is probably the ultimate self-service BI tool. However, despite all its flexibility, it’s also easy to use it “incorrectly.” If you import 700,000 records into Excel and then build a pivot table on that data, you’re not using Excel in the best possible way. If you use it that way, you use Excel as a database—and it’s not meant to be a database. You’re also straining the network while transferring that much data. You get users complaining about how slow their reports are, only to discover that they run them in Excel on old laptops with very little memory on a slow network. A better way to use Excel is to connect it to SAP HANA information views, as shown in Figure 3.19 1. You can then build the pivot table directly on the SAP HANA view 2. In this case, SAP HANA performs all the calculations, and Excel just displays the results. The 700,000 records stay in SAP HANA, and the minimum amount of data for displaying the results is transferred via the network. SAP HANA usually also uses less memory and thus is faster. The best way to test if you’re using BI tools correctly is to ask whether it will work on a mobile device, such as a tablet. If you load 700,000 records into Excel, it won’t work on your tablet. The tablet simply doesn’t have enough memory, and your mobile data costs will be high. With the “correct” way, on the other hand, you could probably get this setup working on the mobile app version of Excel if you had the correct driver installed. An added benefit of letting SAP HANA do the heavy lifting is that your reports and analytics become interactive; for example, drilldowns are easy to implement. 111 112 Chapter 3 Technology, Architecture, and Deployment Scenarios Figure 3.19 Using SAP HANA Information Models with Hierarchies in an Excel Pivot Table Connecting Business Intelligence Tools to SAP HANA SAP HANA provides drivers known as SAP HANA clients that are commonly used to connect SAP HANA to other types of databases and BI tools or vice versa. The query language sent over these connections is normally SQL, which we’ll discuss in Chapter 8. It can also sometimes be multidimensional expressions (MDX). MDX is a query language for OLAP databases, similar to how SQL is a query language for relational OLTP databases. The MDX standard was adopted by a wide range of OLAP vendors, including SAP. Let’s look at some of the common database connections you might encounter: 쐍 Open Database Connectivity (ODBC) ODBC acts as a translation layer between an application and a database via an ODBC driver. You write your database queries using a standard application Key Concepts Refresher Chapter 3 programming interface (API) for accessing database information. The ODBC driver translates these into database-specific queries, making your database queries database and operating system independent. By changing the ODBC driver to that of another database and changing your connection information, your application will work with another database. 쐍 Java Database Connectivity (JDBC) JDBC is similar to ODBC, but it’s specifically aimed at the Java programming language. 쐍 Object Linking and Embedding Database for Online Analytical Processing (ODBO) ODBO provides an API for exchanging metadata and data between an application and an OLAP server (e.g., a data warehouse using cubes). You can use ODBO to connect Excel to SAP HANA. 쐍 Business Intelligence Consumer Services (BICS) BICS is an SAP-proprietary database connection. This direct client connection performs better and faster than MDX. Hierarchies are supported, negating the need for MDX in SAP environments. Because this is an SAP-only connection type, you can only use it between two SAP systems, for example, from an SAPprovided BI tool to SAP HANA. The fastest way to connect to a database is via dedicated database libraries. ODBC and JDBC insert another layer in the middle that can impact your database query performance. Some reporting tools offer a choice of connectors, for example, allowing you to use either a JDBC or an ODBC connector. Excel uses the ODBO driver to connect to SAP HANA. It’s the only BI tool that uses this driver. The ODBO driver implies that it uses MDX, which means it can read cubes, and Excel is able to consume SAP HANA calculation cube views. You can actually see this in Figure 3.19 in the dialog box on the left. SAP HANA Providing the Semantic Layer If you used older reporting or BI tools, you’ll know that you had to add a semantic layer between the data sources and your reports. Pure data, for example, doesn’t have hierarchies defined, but you need hierarchies when users want to drill down deeper into the data. SAP BusinessObjects Universe Designer was used for creating “universes,” and this evolved into the Information Design Tool while the “universes” evolved into semantic layers. These tools combined multiple data sources into harmonized models that could be used for reporting. 113 114 Chapter 3 Technology, Architecture, and Deployment Scenarios You’ll see in the following chapters how SAP HANA provides these features while you’re creating your information models. SAP also uses this in its various products, such as SAP S/4HANA and SAP C/4HANA. The raw data is combined and filtered, additional fields are calculated, aggregates are created, currencies are converted, and semantics are added in the models. When you expose the models, they are ready for consumption by any reporting tool—even a 30-year-old tool such as Excel! You’ve now seen how SAP HANA architecture works, from the hardware with memory and CPU cores, all the way up to the cloud platforms and analytics. Important Terminology The following important terminology was covered in this chapter: 쐍 In-memory technology In-memory technology is an approach to querying data residing in RAM, as opposed to physical disks. The results of doing so lead to shortened response times. This enables enhanced analytics for BI/analytic applications. 쐍 SAP HANA architecture No longer constrained by disks, other solutions were added, including columnbased storage, compression, no indices, delta merges, scale-out, partitioning, and parallelism. With the paradigm shift to in-memory technologies, various programming languages and application development processes were added to SAP HANA. SAP HANA still uses disks but purely as secondary storage (backup) because memory is volatile. With the data and log volumes, SAP HANA ensures that no data ever gets lost. The data and log volumes are also used for backups, when starting up the SAP HANA system, and for disaster recovery. 쐍 Persistence layer The disk storage and data storage happens in the persistence layer, and storing data on disk is called persistence. 쐍 SAP HANA as a side-by-side accelerator Source systems stay as they are, and SAP HANA is added on the side to accelerate any transactions or reports that have performance issues in the source systems. The relevant data is duplicated from the source system to SAP HANA. This duplication is normally performed by the SAP Landscape Transformation Replication Server. The copied data in SAP HANA is then used for analysis or Important Terminology Chapter 3 reporting. SAP provides prebuilt models in SAP HANA to accelerate processes such as profitability data. SAP Landscape Transformation Replication Server replicates just the required profitability data from the SAP ERP system to the SAP HANA system. 쐍 SAP HANA as a database SAP HANA is used to replace the databases under SAP BW, SAP CRM, and SAP ERP (all part of the SAP Business Suite). There are well-established database migration procedures to replace the current database with SAP HANA. In each case, SAP HANA replaces the database only. 쐍 SAP HANA as a platform SAP HANA is a development platform. By combining the SAP HANA information models with development in the SAP HANA XS application server, we can build powerful custom web applications. 쐍 Public cloud A public cloud is available to anyone, offering any type of cloud service. The infrastructure is shared by members of the public. You don’t get your own dedicated machine at Amazon or any other public cloud provider. 쐍 Private cloud A private cloud offers the same features that you would find in a public cloud, but it’s dictated by different forms of access. With a private cloud, the servers in the cloud provider’s data center are dedicated for your company only. No other companies’ programs or data are allowed on these servers. 쐍 Hybrid cloud A hybrid cloud involves either combining a public and private cloud offering or running a private and public cloud together. With a hybrid cloud, you connect your private cloud’s server back to your own data center via a secure network connection. In that case, the servers in the cloud act as if they are part of your data center, even though they are actually in the cloud provider’s data center. 쐍 SAP HANA Enterprise Cloud SAP HANA Enterprise Cloud offers all the SAP applications as cloud offerings. You can choose to run SAP HANA, SAP BW, SAP ERP, SAP CRM, SAP Business Suite, and many others as a full-service offering in the cloud. All the systems that you would normally run on premise are available in SAP HANA Enterprise Cloud. 쐍 SAP Cloud Platform This PaaS allows you to build, extend, and run applications in the cloud. This includes SAPUI5, HTML5, SAP HANA XS, Java applications, and more. 115 116 Chapter 3 Technology, Architecture, and Deployment Scenarios 쐍 SAP HANA extended application services, classic model (SAP HANA XS) SAP HANA XS embeds an application server into SAP HANA and allows developers and modelers to use SAP HANA as a platform. It uses the MVC paradigm for creating web services. 쐍 SAP HANA extended application services, advanced model (SAP HANA XSA) SAP HANA XSA is the new expanded version of SAP HANA XS that is used extensively in SAP HANA 2.0. Practice Questions These practice questions will help you evaluate your understanding of the topics covered in this chapter. The questions shown are similar in nature to those found on the certification examination. Although none of these questions will be found on the exam itself, they allow you to review your knowledge of the subject. Select the correct answers, and then check the completeness of your answers in the “Practice Question Answers and Explanations” section. Remember, on the exam, you must select all correct answers and only correct answers to receive credit for the question. 1. Which technologies does SAP HANA use to load more data into memory? (There are 3 correct answers.) 첸 A. Eliminate indices. 첸 B. Use ZIP compression. 첸 C. Use dictionary compression. 첸 D. Store data in column tables. 첸 E. Use multicore CPU parallelism. 2. True or False: With SAP HANA, you can run OLAP and OLTP together in one system with good performance. 첸 A. True 첸 B. False Practice Questions Chapter 3 3. True or False: SAP HANA contains both row and column tables. 첸 A. True 첸 B. False 4. How do you move code to SAP HANA? (There are 2 correct answers.) 첸 A. Delete the application server. 첸 B. Use stored procedures in SAP HANA. 첸 C. Use graphical data models in SAP HANA. 첸 D. Use row tables in SAP HANA. 5. In which SAP HANA solution do you eliminate stored data, such as year-todate figures? 첸 A. SAP BW on SAP HANA 첸 B. SAP CRM on SAP HANA 첸 C. SAP S/4HANA 첸 D. SAP HANA Enterprise Cloud 6. Which deployment scenario can you use to accelerate reporting from an old SAP R/3 system? 첸 A. SAP HANA as a database 첸 B. SAP HANA as a platform 첸 C. SAP HANA as a side-by-side accelerator 첸 D. SAP HANA multitenant database containers (MDC) 7. Which deployment scenarios feature security staying in the application server and end users not logging in to the SAP HANA system? (There are 2 correct answers.) 첸 A. SAP HANA as a platform with an SAP HANA XS application 첸 B. SAP HANA as a database 첸 C. SAP HANA as a side-by-side accelerator 첸 D. SAP HANA as a reporting server 117 118 Chapter 3 Technology, Architecture, and Deployment Scenarios 8. For which deployment scenario does SAP deliver prebuilt data models? 첸 A. SAP HANA as a platform 첸 B. SAP HANA as a side-by-side accelerator 첸 C. SAP HANA as an SQL Data Warehouse 첸 D. SAP HANA on the cloud 9. True or False: SAP HANA is just a database. 첸 A. True 첸 B. False 10. What is normally used to build SAP HANA XS applications inside SAP HANA? (There are 2 correct answers.) 첸 A. Java 첸 B. JavaScript 첸 C. SAPUI5 첸 D. ABAP 11. Which type of cloud solution is SAP HANA Enterprise Cloud an example of? 첸 A. Platform as a service (PaaS) 첸 B. Infrastructure as a service (IaaS) 첸 C. Software as a service (SaaS) 첸 D. Managed cloud as a service (MCaaS) 12. Which type of cloud solution is Amazon Web Services (AWS) an example of? 첸 A. Hybrid cloud 첸 B. Public cloud 첸 C. Private cloud 첸 D. Community cloud Practice Questions Chapter 3 13. True or False: VMware vSphere virtual machines are fully supported for productive SAP HANA instances. 첸 A. True 첸 B. False 14. You use parallelism to calculate a year-to-date value quickly, which eliminates a table from the database. What are some of the implications of this action? (There are 3 correct answers.) 첸 A. Smaller database backups 첸 B. Less code 첸 C. Better user interfaces 첸 D. Faster response times 첸 E. Faster inserts into the database 15. What enables the delta buffer technique to speed up both reads and inserts in SAP HANA? 첸 A. Using column-based tables only 첸 B. Using row-based tables only 첸 C. Using both row-based and column-based tables 첸 D. Using only the insert-only technique 16. What technique is used to convert record updates into insert statements in the delta buffer? 첸 A. Unsorted inserts 첸 B. Insert-only 첸 C. Sorted inserts 첸 D. Parallelism 17. What phrase describes the main table area in the delta merge scenario? 첸 A. Read-optimized 첸 B. Write-optimized 첸 C. Unsorted inserts 첸 D. Row store 119 120 Chapter 3 Technology, Architecture, and Deployment Scenarios 18. What does the C in ACID-compliant stand for? 첸 A. Complexity 첸 B. Consistency 첸 C. Complete 첸 D. Constant 19. Which of the following are results of deploying SAP HANA as a distributed (scale-out) solution? (There are 2 correct answers.) 첸 A. Disaster recovery 첸 B. High availability 첸 C. Failover 첸 D. Backups 20. Which words are associated with the log volume? (There are 2 correct answers.) 첸 A. Synchronous 첸 B. Transaction manager 첸 C. Page manager 첸 D. Savepoint 21. Of which SAP HANA process are the transaction manager and page manager a part? 첸 A. Application server 첸 B. Statistics server 첸 C. Index server 첸 D. Name server 22. What type of load does SAP HANA perform when starting up? 첸 A. Lazy load 첸 B. Complete load 첸 C. Fast load 첸 D. Log load Practice Questions Chapter 3 23. What does SAP HANA load when starting up and before indicating that it’s ready? (There are 3 correct answers.) 첸 A. All row tables 첸 B. All system tables 첸 C. All partitions 첸 D. Some column tables 첸 E. Some row tables 24. Which type of tables required extended storage? 첸 A. Row tables 첸 B. Multistore tables 첸 C. Column tables 첸 D. In-memory tables 25. What do you need for active/active (read-enabled) mode? (There are 2 correct answers.) 첸 A. Log replication 첸 B. Extended storage 첸 C. Disaster recovery 첸 D. Multitenant database containers (MDC) 26. What SAP HANA feature do you use to host multiple applications for different companies in a single SAP HANA landscape while ensuring isolation? 첸 A. SAP HANA as a distributed database 첸 B. Extended storage 첸 C. Active/active (read-enabled) mode 첸 D. Multitenant database containers (MDC) 27. Which component of the SAP SQL Data Warehousing do you use to move tables with data that are used together often to the same node to enhance query performance? 첸 A. Data Distribution Optimizer (DDO) 첸 B. Data Lifecycle Manager (DLM) 121 122 Chapter 3 Technology, Architecture, and Deployment Scenarios 첸 C. Data Warehouse Monitor (DWM) 첸 D. Data Warehouse Scheduler (DWS) 28. Which BI tool queries SAP HANA using MDX? 첸 A. SAP Analysis for Microsoft Office 첸 B. SAP Lumira, discovery edition 첸 C. SAP BusinessObjects Web Intelligence 첸 D. Microsoft Excel 29. You want to publish your BI content directly to SAP Analytics Cloud. Which tool do you use? 첸 A. SAP Lumira, discovery edition 첸 B. SAP Crystal Reports 첸 C. SAP Analysis for Microsoft Office 30. Which of the following ways of publishing SAP HANA information model content could require some JavaScript knowledge? 첸 A. Creating BI content with SAP Lumira, designer edition 첸 B. Creating a CompositeProvider in SAP BW 첸 C. Creating a report in SAP Crystal Reports Practice Question Answers and Explanations 1. Correct answers: A, C, D SAP HANA uses column tables, dictionary compression, and eliminating indices to load more data into memory. However, it doesn’t use ZIP compression because ZIP-compressed files would have to be uncompressed before the data could be used. Multicore CPU parallelism helps with the performance but doesn’t directly address the data volume issue. It contributes indirectly because, by using it, we can eliminate things such as year-to-date tables as in SAP S/4HANA. Practice Question Answers and Explanations Chapter 3 2. Correct answer: A Yes, with SAP HANA, we can run OLAP and OLTP together in one system with good performance. Normally, this isn’t possible in the traditional database system. Note There are no True-False questions in the actual certification exam, but we use them here to help you check if you understood this chapter. 3. Correct answer: A Yes, SAP HANA contains both row and column tables. 4. Correct answers: B, C Use stored procedures and graphical data models to execute the code in SAP HANA. Deleting the application server won’t help with running the code in SAP HANA, unless we develop all that same functionality in the SAP HANA XS server. Using row tables in SAP HANA has nothing to do with executing code. 5. Correct answer: C In SAP S/4HANA, many aggregate tables are removed, and the solution is simplified. 6. Correct answer: C SAP HANA as a side-by-side accelerator doesn’t change the source systems. 7. Correct answers: B, D With SAP HANA as a side-by-side accelerator and platform, users have to log in to SAP HANA. With SAP HANA as a database and reporting server, users log in via the application server or reporting tools. 8. Correct answer: B SAP delivers prebuilt data models for the special cases of the side-by-side accelerators, for example, for profitability analysis. 9. Correct answer: B False. SAP HANA isn’t just a database. SAP HANA can also be used as a platform. Databases don’t normally have application servers, version control, code repositories, and various programming languages built in. 123 124 Chapter 3 Technology, Architecture, and Deployment Scenarios 10. Correct answers: B, C SAP HANA XS applications in SAP HANA are built using JavaScript and SAPUI5. Java and ABAP aren’t part of SAP HANA XS. With SAP HANA XSA, it’s possible to run other programming languages, such as Java and C++. 11. Correct answer: D SAP HANA Enterprise Cloud is a managed cloud as a service (MCaaS). 12. Correct answer: B Amazon Web Services (AWS) is a public cloud. 13. Correct answer: A True. Productive SAP HANA is supported on VMware vSphere. 14. Correct answers: A, B, D The implications of eliminating such tables from the database are smaller database backups, less code, and faster response times. Doing so won’t necessarily improve user interfaces. It actually eliminates extra inserts into the database but doesn’t make the current data inserts any faster. Techniques such as removing indices will achieve that. 15. Correct answer: C The delta buffer technique speeds up both reads and inserts in SAP HANA because it uses both row-based and column-based tables. Using column-based tables only speeds up reads, not inserts. Using row-based tables only speeds up inserts, not reads. Using only the insert-only technique is useless without having both row-based and column-based tables. Tip Watch out for the word “only” in a question or in an answer option. 16. Correct answer: B The insert-only technique is used to convert record updates into insert statements in the delta buffer. Unsorted inserts are used in the delta buffer, but they don’t convert updates to insert statements. Sorted inserts aren’t used in the delta buffer. Practice Question Answers and Explanations Chapter 3 Parallelism isn’t relevant in this case. 17. Correct answer: A The main table area in the delta merge scenario is read-optimized. The delta store is write-optimized, uses unsorted inserts, is uncompressed, and is based on the row store. 18. Correct answer: B ACID stands for atomicity, consistency, isolation, and durability. 19. Correct answers: B, C Deploying SAP HANA as a distributed (scale-out) solution results in high availability and failover capability. Disaster recovery and backups work whether the SAP HANA system is distributed or on a single server. 20. Correct answers: A, B Synchronous and transaction manager are associated with the log volume. Page manager and savepoint are associated with the data volume. 21. Correct answer: C The transaction manager and page manager are part of the index server. The statistics server and name server services are also part of SAP HANA, but they aren’t discussed in this book. 22. Correct answer: A SAP HANA performs a lazy load when starting up. 23. Correct answers: A, B, D During the lazy load, SAP HANA loads all row tables, all system tables, and some column tables. SAP HANA doesn’t have to load all partitions of a table. 24. Correct answer: B Multistore tables require extended storage. The other tables will work whether we use extended storage or not. 25. Correct answers: A, C You need log replication and disaster recovery for active/active (read-enabled) mode. Extended storage separates data between a memory (hot) store and a disk (warm) store. Multitenant database containers (MDC) separate tenants. 125 126 Chapter 3 Technology, Architecture, and Deployment Scenarios 26. Correct answer: D You use multitenant database containers (MDC) to host multiple applications for different companies in a single SAP HANA landscape while ensuring isolation. SAP HANA as a distributed database is a single SAP HANA system spread across multiple machines, but it’s normally a single application and doesn’t ensure isolation for multiple companies from each other. Extended storage spreads the data of a single SAP HANA system across a hot and a warm tier. It’s normally a single application and doesn’t ensure isolation for multiple companies from each other. Active/active (read-enabled) mode allows users to read data for tasks such as reporting from a replica copy of the SAP HANA system. It’s normally a single application and doesn’t ensure isolation for multiple companies from each other. 27. Correct answer: A You use the Data Distribution Optimizer (DDO) component of SAP SQL Data Warehousing to move tables with data that are used together often to the same node to enhance query performance. 28. Correct answer: D Microsoft Excel queries SAP HANA using MDX via the ODBO driver. It’s the only BI tool discussed that uses the ODBO driver. 29. Correct answer: A SAP Lumira, discovery edition, can publish your BI content directly to SAP Analytics Cloud. 30. Correct answer: A JavaScript knowledge may be required for SAP Lumira, designer edition. Takeaway You now understand in-memory technologies such as column-based storage, compression, no indices, delta merges, scale-out, partitioning, and parallelism. You also know how SAP HANA uses disks for its persistence storage. You can now describe the various SAP HANA deployment scenarios; how SAP HANA uses in-memory technologies such as columns, tables, and compression; and the various ways that SAP HANA can be deployed. We looked at the different Summary Chapter 3 advantages of each deployment method to help you decide which one fits your requirements best. Finally, we looked at how SAP HANA can also be deployed in the cloud and for multiple tenants. Summary You’ve learned the various ways you can deploy SAP HANA. With this knowledge, you can make recommendations to business users for an appropriate system architecture and landscape. In the next chapter, we’ll discuss changes that SAP HANA XSA brought to the SAP HANA architecture and modeling processes. We’ll also look at the tools with which we create information models and how to manage and administrate these models. 127 Chapter 4 Modeling Tools, Management, and Administration Techniques You’ll Master 쐍 Understand the new modeling and development process in SAP HANA 2.0 쐍 Learn how the new SAP HANA XSA architecture complements SAP Cloud Platform 쐍 Explain the difference between design-time and runtime objects 쐍 Identify the different working areas in SAP Web IDE for SAP HANA 쐍 Get to know the SAP HANA modeling tools 쐍 Learn how to complete the different steps in a project’s modeling and development process 쐍 Learn how to work with tables and data 쐍 Know how to create new information models 쐍 Understand how to solve build errors by learning about the SAP HANA XSA build processes 쐍 Understand the deployment process and learn how it fits into SAP’s transport processes 쐍 Describe the administration and management steps required in a project 쐍 Perform refactoring techniques on information models 130 Chapter 4 Modeling Tools, Management, and Administration Tip If you haven’t worked with SAP HANA 2.0 before, this will possibly be the most important chapter for you. You now understand where SAP HANA came from and how it works from an architectural point of view. SAP HANA has been growing fast, but the world hasn’t stood still. There has been progress in many different areas, such as cloud platforms, containers, development best practices, new programming paradigms, microservices, NoSQL databases, machine learning, blockchain, Internet of Things (IoT), DevOps, continuous integration, and, perhaps most of all, customer expectations. SAP HANA 2.0 is, in some sense, a response to this progress. In this chapter, we’ll look at what is new in SAP HANA 2.0 from many different angles. First, we’ll look at the new modeling and development process. This information is necessary to put many of the new changes into context. These changes include the new architecture, the new web-based user interface (UI), and the new tools. We’ll also cover how to start a modeling project, how to create and run information models, and how to get them into the production system within the new system. You’ll understand why the system behaves the way it does; when you get errors, you’ll know why and how to solve them. We’ll start with some process and architecture diagrams before moving on to the actual system itself. We’ll describe the steps you need to get going with SAP HANA modeling on your own system and your own projects. Real-World Scenario You’re working on a project that requires you to create information models. To do so, you use SAP Web IDE for SAP HANA to create and edit these views graphically. You can use this tool from anywhere, even when using cloud systems. Sometimes you reach for the other tools in your toolkit: SAP HANA database explorer, SAP HANA cockpit, SAP HANA XSA cockpit, and SAP Enterprise Architecture Designer (SAP EA Designer). When working on a project with SAP HANA, you’ll use modeling tools all the time. You have to be familiar with the SAP HANA modeling tools, know when to use which one, and be able to find your way around the various screens. Objectives of This Portion of the Test Chapter 4 SAP HANA has grown over the past few years, and the modeling tools now help you accomplish a wide variety of tasks. The features have also grown to provide rich functionality. It therefore takes some time to learn how to find your way around the various menus, options, and screens. All the information models you created for your project are in your development system. You feel confident about the work you’ve created, so now it’s time to take it to the production system. You need to transport your information models through the SAP HANA landscape. Before end users can use your information models in the production system, your models need to be deployed to the production system. As the project progresses, new customer requirements arise. Your information models need to adapt to changing business conditions. You ensure they do so by using the refactoring tools built in to the SAP HANA modeling tools. Objectives of This Portion of the Test The objective of this portion of the SAP HANA certification is to test your understanding of how to create, manage, build, deploy, and refactor your information models with the various SAP HANA modeling and development tools. Your insight will be tested by describing various scenarios that result in error messages, and you have to know why the errors were triggered. The certification exam expects you to have a good understanding of the following topics: 쐍 The modeling and development process in SAP HANA 2.0 쐍 The SAP HANA extended application services, advanced model (SAP HANA XSA) and SAP HANA Deployment Infrastructure (HDI) architecture 쐍 Differences between design-time and runtime objects 쐍 SAP Web IDE for SAP HANA and the SAP HANA modeling tools 쐍 The different steps in the complete modeling and development process of a project, from creating a project, to working with tables and information models 쐍 Solving build errors by understanding the SAP HANA XSA build processes 쐍 The deployment process and how this fits in with SAP’s transport processes 쐍 The administration and management steps required in a project 쐍 The various refactoring tools and techniques available 131 132 Chapter 4 Modeling Tools, Management, and Administration Note This portion contributes up to 10% of the total certification exam score. Key Concepts Refresher In the previous chapter, we briefly introduced you to SAP HANA extended application services, classic model (SAP HANA XS). We’ll now begin by describing SAP HANA XS in a bit more detail, and then move on to SAP HANA extended application services, advanced model (SAP HANA XSA) with HDI. Note In the What’s New Guide for SAP HANA 2.0 SPS 03, in the section for SAP HANA XS and the SAP HANA repository, it says, “SAP HANA Extended Application Services classic model (XS classic) and SAP HANA Repository are deprecated as of SAP HANA 2.0 SPS 02.” You can also see SAP Note 2465027 for more details. New Modeling Process In SAP HANA 1.0, we had two main working areas, as shown in Figure 4.1. We had the Content area, where we created our information models. These models were stored in a repository inside SAP HANA. In addition to our models, we could also store other files there, such as HTML and JavaScript code. The files in the SAP HANA repository are known as design-time files because we edit them to design our models and code. When we activated these models, a copy got stored in the _SYS_BIC schema in the Catalog area, and the _SYS_REPO user became the owner of these files. These copies are available to users, which they then use to run analysis, reports, and applications on the system. These copies were known as the runtime files because the users used them to run their queries. These files were stored together with the database tables and the data. Tables were grouped together in schemas. By storing and working with the design-time and runtime files in two different areas, there is a separation between design-time and runtime files. We used transports to copy the design-time files to a production environment, where we could Key Concepts Refresher Chapter 4 easily create a new set of runtime files. We could transport single information models or bundles of models together in packages. SAP HANA Studio (modeler, developer) BI tools (end-user) • Modeler perspective • Developer perspective • Administration Browser (end-user) SAP HANA Content Repository for design-time objects • CalcViews • SQLScript • CDS • Roles • HTML • xsjs • Etc. SAP HANA XSC run-time View (HTML) Controller SAPUI5 xsjs Static content OData (JavaScript) Activate User Auth Catalog (SAP HANA Database) _SYS_BIC schema (run-time objects) • Col. Views Classic database schema Classic database schema Figure 4.1 SAP HANA XS SAP HANA used SAP HANA XS as an application server for serving web services to applications. SAP HANA XS was also known as a runtime system, as it was running a web server and the web-based application code. It was easy to use because it was directly built into the SAP HANA system. Originally, SAP HANA Studio was the only modeling and development environment, but later SAP HANA Web-Based Development Workbench also appeared. As time went on, a number of issues cropped up with SAP HANA 1.0, as follows: 쐍 Because SAP HANA XS is built into SAP HANA, it runs on the same expensive hardware that SAP HANA uses. 133 134 Chapter 4 Modeling Tools, Management, and Administration 쐍 There was only one copy of the SAP HANA XS application server for an SAP HANA system, leading to scalability limitations. 쐍 The tight coupling of SAP HANA XS in SAP HANA servers led to some problematic networking designs. 쐍 XS JavaScript and the associated development process were quite different from the latest advances in the JavaScript development community. When companies need skilled people, it helps if the development processes they use fit in with what the rest of the industry does. New people can get up to speed quicker, and support improves in case of problems. 쐍 The built-in repository lacked certain vital features like version control, forking, and merging branches. 쐍 Table definitions weren’t transportable. Core data services (CDS) was introduced in the later versions of SAP HANA 1.0 to address this. SAP also needed to address some industry requirements: 쐍 SAP started using SAP HANA in more of its cloud business systems. Due to feature differences between the cloud and the on-premise offerings, better feature parity was needed between these offerings. Tip Many of the changes in SAP HANA 2.0 and SAP HANA XSA can be summarized by three words: cloud, containers, and microservices. 쐍 The development and modeling processes in the cloud environments were different from that of the on-premise solutions. Web-based UIs are vital for cloud systems. Best practices such as the twelve-factor app, found at https://12factor.net/, for cloud-native development emerged and got adopted. People wanted a consistent way to work with both cloud and on-premise development and modeling. 쐍 The speed of updates in cloud offerings is much faster than that of their onpremise cousins, requiring new ways of providing application services. System administrators and developers now use new automated build and testing processes, DevOps, continuous integration, continuous delivery, blue-green deployment, and so on. 쐍 Cloud environments were enabled by the arrival of virtual machines. Over time, people started adding more and more virtual machines to servers, and certain limits showed up. This led to the concept of containers, which are much smaller, can start very quickly, and can scale easily when used with cloud platforms. The combination of containers and web services led to the emergence of microser- Key Concepts Refresher Chapter 4 vices architectures. There was a need for a way to make use of these new concepts. 쐍 A microservices architecture needs a runtime to manage the various containers and the microservices running in each container. SAP decided to use Cloud Foundry for this. Cloud Foundry is an open-source, multi-cloud application platform as a service (PaaS) offering. The new SAP Cloud Platform uses Cloud Foundry. 쐍 Cloud hyperscalers, such as Amazon, Microsoft, and Google, are used by many SAP customers. The Cloud Foundry PaaS, containers, and microservices developed with SAP run on the offerings from these companies. SAP customers can this leverage their other development and solutions they already have running on these cloud infrastructures. These gaps in coverage led to the development of SAP HANA XSA, which brought a new modeling and development process with it, as shown in Figure 4.2. In the figure, the process moves from the left to the right. One of the first things to notice is that there is now a total separation of the design-time environment (left side) from the runtime environment (right side). Where these were previously together in an SAP HANA system, they are now completely separate. Note You might have noticed already that we keep mentioning modeling and development in the same breath. You’ll use development objects in your information models, and your information models will be used by developers in their web applications. SAP HANA modeling has become an integral part of the development process. The process starts in the top-left corner of Figure 4.2. You use SAP Web IDE for SAP HANA to start a project, create and edit information models, and then build them. Tip If you worked a lot with SAP HANA 1.0, you might be tempted to continue using SAP HANA Studio. You can’t use SAP HANA Studio to develop SAP HANA XSA applications because it doesn’t work with the new SAP HANA HDI infrastructure. You need to use SAP Web IDE for SAP HANA. Your models get stored in a Git repository instead of an SAP HANA repository. Git is the most-used repository by developers worldwide. This allows for version control, forking, and merging. Git integration is built into the SAP HANA XSA modeling and development tools. 135 136 Chapter 4 Modeling Tools, Management, and Administration Modeler, Developer Create, Edit Administrator/Operations End-users Run, Test, Debug SAP HANA XSA SAP Web IDE for SAP HANA, code editor Release Job Build Git (local) Jenkins (automation tool) Development Space (local runtime) Landscape/Staging SAP HANA XSA Production Spaces (Blue/Green) SAP HANA XSA Deploy Job QA/Test Space Gerrit SAP HANA XSA Git (Enterprise-wide) Trigger the Build Build Job CI Space Code review, Approvals Modeler, Dev, Team Lead Design-time Run-time Figure 4.2 The New SAP HANA Development and Operations Process The build process in SAP HANA 2.0 is similar to the activation process in SAP HANA 1.0. It creates the runtime versions of the objects in the development system. In this case, the development system is referred to as a space. We’ll explain this in more detail in later sections. You run your models in the development space to test them. Note When you install SAP HANA, express edition, you’ll get all this on your laptop or cloud instance. When you’re happy with your models and code, you can synchronize them with the enterprise-wide Git server. This is one of the big changes in SAP HANA 2.0. Instead of having a repository in each SAP HANA system with multiple copies of the same design-time file in each system in the landscape, all the design-time files Key Concepts Refresher Chapter 4 are now in one master repository. Git allows for many different version and branches. You have a stable production branch, but you can also have different development branches where people are trying out different new ideas. When the idea gets accepted, the code is merged into the integration and testing branches. You can fork a branch, which creates a copy, and test out something new in a sandbox. Git allows large teams of developers to seamlessly work together on large or complex software projects. You can also use Gerrit, which is Git with some additional features added, such as enabling code reviews and approvals. The team lead can sit with modelers and developers to review their work and approve the changes. Tip In this book, we’ll focus on the process up to this point. In this section, we give a description of the rest of the process to give you an understanding of how your work gets into the production system where end users can start using it and the business can start getting value out of it. In many cases, the update in the enterprise-wide Git or the approval in Gerrit, triggers an automated build of the project, and a build job updates the changes into the continuous integration (CI) space (system) where you can see how your models and code work together with everybody else’s. The build job is set up, configured, and managed by the administrators as part of the operations area. Many times, they need to understand what your code is trying to do and what it will change. On the other hand, you need to know how, when, and where your code gets used. This collaboration has led to DevOps; development and operations working together. The administrators (operations) can use something such as Jenkins, which is an open-source automation server that is used to automate tasks related to the building, testing, and deploying of software. Quality assurance (QA) testers can trigger a deploy job. Testing users can then test everyone’s updates in an environment that is very similar to the production environment. This is also sometimes called the staging server. After all testing has been approved, a release job is triggered that updates the production system. To ensure continuous uptime on the production servers, a bluegreen deployment strategy can be implemented. You can read more about this at http://s-prs.co/v487624 and http://s-prs.co/v487625. 137 138 Chapter 4 Modeling Tools, Management, and Administration To summarize, Table 4.1 lists the differences between SAP HANA 1.0/SAP HANA XS and SAP HANA 2.0/SAP HANA XSA discussed in this section. SAP HANA 1.0 and SAP HANA XS SAP HANA 2.0 and SAP HANA XSA Embedded SAP HANA repository for each SAP HANA system Enterprise-wide Git server, external to SAP HANA SAP HANA Studio and SAP HANA Web-Based Development Workbench SAP Web IDE for SAP HANA Activate process Build process Development server Development space Transports Deploys SAP-specific development and transport processes Industry-accepted processes such as DevOps, CI, and blue-green deployments Proprietary SAP tools Open-source tools and platforms such as Git, Gerrit, Jenkins, and Cloud Foundry Design-time and runtime objects together in an SAP HANA system Design-time objects stored in Git Only runtime objects in an SAP HANA system Table 4.1 Changes in the Modeling Process from SAP HANA 1.0 to SAP HANA 2.0 We’ll now look at the SAP HANA XSA architecture to see how all the remaining concerns mentioned in the previous section were addressed. After that, we’ll get into the SAP HANA system and learn how to practically implement the process we discussed here. New Architecture for Modeling and Development We mentioned earlier that many of the changes in SAP HANA XSA can be summarized by cloud, containers, and microservices. This comes out quite clearly in Figure 4.3, which shows the similarities and compatibility between SAP HANA XSA and SAP Cloud Platform architectures. Key Concepts Refresher Chapter 4 SAP HANA XSA Application Containers Containers and services Platform layer Infrastructure layer HDI Containers Backing Services UAA, Job scheduler, Connectivity, UX, Data Quality, APIs, Integration, Mobile, Machine Learning, IoT, … SAP HANA database platform SAP Cloud Platform with Cloud Foundry Application Containers HDI Containers UAA, Job scheduler, Connectivity, UX, Data Quality, APIs, Integration, Mobile, Machine Learning, IoT, … Redis, PostgreSQL, RabbitMQ, MongoDB Additional Platform Services SAP Cloud Platform Cloud Foundry SAP HANA XSA runtime Servers Backing Services IaaS OpenStack, Private Cloud, Multi-Cloud (Amazon WS, MS Azure, Google CP) Figure 4.3 Similarities and Compatibility between SAP HANA XSA and SAP Cloud Platform Architectures Both architectures have three layers. The bottom layer is different between SAP HANA XSA and SAP Cloud Platform. SAP HANA XSA is running on servers, which can be on premise or on the cloud. SAP Cloud Platform is running on OpenStack, an open-source infrastructure as a service (IaaS) platform, in the SAP data centers or on the cloud platforms of Amazon, Google, and Microsoft. The next layer of both, the platforms, are getting a lot closer. Both SAP HANA XSA and SAP Cloud Platform are based on Cloud Foundry. Both add additional services to enhance the standard Cloud Foundry offering, such as routing, security, scheduling, integration, UI framework (SAPUI5), database services (e.g., the SAP HANA database), and so on. These are shown as the Backing Services in the diagram. The top layer is really where it starts making sense for modelers and developers. The application and HDI containers are identical on both SAP HANA XSA and SAP Cloud Platform. Containers and virtual machines are both used to isolate applications and their dependencies into a single “unit” that can run anywhere. Both remove the need for physical hardware. This enables efficient use of servers in terms of energy 139 140 Chapter 4 Modeling Tools, Management, and Administration costs. However, where virtual machines require an operating system for each virtual machine, containers use new features in modern operating systems that allow them to create thousands of small isolated units to run on a single server and operating system. Containers fully isolate the application, the users, the memory and CPU usage, and the network. Their popularity in recent years stems from the fact that they scale fast and easily. Say you have a small web service running in a container. When there aren’t many requests for this service, only one container is sufficient. As the load increases, the platform layer—also known as the container runtime—automatically creates more copies of the container to meet the increased demand. When the load decreases, the number of copies is also reduced, which is a huge benefit for companies that can’t predict how much hardware is required for a service or application. The container isolation helps teams work at full speed on their own code. One team can work, for example, on the financial calculations service, while another team focuses on the UI. Both teams can deploy updates to the production system independently from each other. Tip The independence of containers, referred to as loose coupling, is important for continuous integration (CI) and continuous deployment (CD). Another advantage is that you can now use the programming language that best suits your needs for that particular service. You could use JavaScript for the UI and Golang for the financial calculations service. This approach is sometimes called “bring your own language” (BYOL). Even though you can use multiple programming languages, the container runtime makes this appear seamless to end users via the routing service. The UI service could use a URL such as https://server/ui, whereas the financial calculations service could use https://server/fincalcs. The routing service directs (routes) the user requests to the correct services in the different containers. If you use a standard UI framework, such as SAPUI5, all your applications will have a consistent look and feel, despite the fact that the underlying services are using different programming languages. Each container also contains the environment required to run the services contained in it. A Java service might require Tomcat, whereas PHP (Hypertext Preprocessor) Key Concepts Refresher Chapter 4 might require the Apache web server running inside its container. This architecture is also known as microservices. Note The microservices architecture allows us to develop services and applications on SAP HANA XSA (including the express edition) in an on-premise environment and then deploy it to SAP Cloud Platform! What we’ve just described is all part of the new SAP HANA architecture. You can see the various aspects, such as routing, UI services, containers, BYOL, and the XSA container runtime, in Figure 4.4. Browser (modeler, developer) • Web IDE for SAP HANA • Database Explorer • Etc. BI tools (end-user) Browser (end-user) SAP HANA Application Router Identity Provider (IdP) User Account and Auth(UAA) Multi Target Application (MTA) HTML5 App, static content Repository (Git) for design-time objects • Calc Views • SQLScript • CDS • Synonyms • Roles • HTML • Node.js • Java • Etc. Backing services Tomcat/TomEE Node.js Application Container Java App Node App OData (Java) OData (JavaScript) Bring Your Own Language (BYOL)/ Runtime SAP HANA extended application services, advanced model (XSA) runtime/Cloud Foundry SAP HANA Database SAP HANA Deployment Infrastructure (HDI) HDI Container HDI Container Figure 4.4 SAP HANA XSA Architecture The three blocks on the left are external to SAP HANA. We’ve already seen that Git is used as an enterprise-wide repository. Security for end users moves from within SAP HANA to using an external Identity Provider (IdP) system. Business users and technical users are separated. We discuss this in more detail in Chapter 13. 141 142 Chapter 4 Modeling Tools, Management, and Administration The SAP HANA system has two major areas: (1) the SAP HANA database at the bottom of the diagram, and (2) SAP HANA XSA. You can run SAP HANA XSA inside the SAP HANA system on the same server as the SAP HANA database (e.g., SAP HANA XS), but you can also separate it to run on one or more other cheaper servers. This addresses the first two requests in the list, as these cheaper servers can scale cheaply and easily, and the network configurations can be hardened. This is indicated by the dashed line between the two areas. In SAP HANA XSA, you can combine services in different programming languages into a one application by creating a multi-target application (MTA) project. You can write one service in Java, another in JavaScript, and yet another in a language of your choice. During deployment, SAP HANA XSA will automatically create different application containers for each of these services. This multi-language configuration is specified by the mta.yaml file in your project. You can use either the SAP HANA XSA runtime or SAP Cloud Platform to “host” these application containers. Tip SAP HANA XSA and SAP Cloud Platform use the industry-accepted Node.js when working with JavaScript. In the SAP HANA database area, you’ll notice something interesting. Each schema in an SAP HANA XSA database is now a container. In our project, you add an SAP HANA database (HDB) module. In this module (folder), you add all your design-time objects, such as tables, calculation views, and SQL procedures. The HDB module definition is also added to the mta.yaml file. The HDB module is for your design-time objects. When you build your HDB module, it creates an SAP HANA database container. This replaces the classical database schemas in the older version of SAP HANA. These containers that contain just SAP HANA database runtime objects are called HDI containers. They contain only runtime objects, as all design-time objects are stored in Git, even the design-time database objects. The HDI container functionality is part of the SAP HANA runtime and is part of SAP Cloud Platform runtime should you choose to deploy your HDI containers on the cloud. One of the first problems you might encounter is due to the fact that containers provide full isolation. This radically impacts all SAP HANA security and can be frustrating when you’re used to the old way of working. Key Concepts Refresher Chapter 4 Tip In this book, and in the exam, the focus is exclusively on the HDB modules in the designtime environment that become HDI containers in the runtime environment. The exam is, after all, about SAP HANA modeling, which gets designed in HDB modules and executed in HDI containers. SAP HANA XSA development with containers and microservices is addressed with the SAP HANA development exam. These two areas complement each other, and there are overlaps. The SAP HANA XSA courses on openSAP focus more on this area, with the result that many developers neglect SAP HANA modeling because they aren’t aware that it could, in many cases, make their lives a lot easier. Let’s look at an example, as shown in Figure 4.5. The top half shows an example of a calculation view (CV2) that uses tables from two other schemas. In SAP HANA 1.0, we would just drag and drop table T1 from the one schema and table T2 from the other schema, and join them in a view in a third schema. All this happened quickly and effortlessly. Schema 1 SAP HANA 1.0 Database CV1 (Calculation View) CV2 (Calculation View) T3 (Table) Classical DB Schema Schema 2 T1 (Table) T2 (Table) SAP HANA 2.0 Database HDI Container C1 (HDI Container) CV1 (Calculation View) Role C2::access Classical DB Schema T1 (Table) CV2 (Calculation View) S1 (Synonym) with user-provided service T3 (Table) S2 (Synonym) with HDI service Figure 4.5 Example Showing How Containers Improve Isolation and Security C2 (HDI Container) T2 (Table) 143 144 Chapter 4 Modeling Tools, Management, and Administration If we try this in HDI, we get authorization errors. Because the three schemas are all fully isolated in three different containers, you’ll have to look at the security when designing a scenario like this to also enforce loose coupling between different containers in microservices architectures. The bottom half shows that to get this working, you have to create a synonym for each external table you want to use. Each HDI container also has a default access authorization role that gets created when you first create the container. You need to assign this role to the owner of your container (schema). Lastly, if you’re connecting to a classical database schema, you also need to provide a service. If you connect to another HDI container, the SAP HANA or SAP Cloud Platform runtimes already have such a service for you. We discuss all this in more detail in Chapter 13. To summarize, Table 4.2 lists the differences between SAP HANA 1.0/SAP HANA XS and SAP HANA 2.0/SAP HANA XSA discussed in this section. SAP HANA 1.0 and SAP HANA XS SAP HANA 2.0 and SAP HANA XSA SAP HANA XS is embedded in SAP HANA. SAP HANA can be run on separate cheaper servers. The database is integrated with the application. The data layer is decoupled. End users of native SAP HANA applications are managed inside SAP HANA. End users are managed from an external IdP system. Business users and technical users are together. Business users and technical users are separated. SAP HANA XS is different from SAP Cloud Platform. SAP HANA XSA and SAP Cloud Platform share the same PaaS layer using Cloud Foundry. Both use a microservices architecture with containers. You can deploy the same models and code to either platform. All code and information models are in one SAP HANA system. Each application’s services are in its own container, and SAP HANA information models are in HDI containers. All database objects are together in one SAP HANA database. Each project and HDB module’s database objects are isolated in its own HDI container. Explicit references are used to tables in other schemas. Synonyms, authorizations (role assigned), and possibly a user-provided service are needed. XS JavaScript is used. Node.js is used. Table 4.2 Changes in the Architecture from SAP HANA 1.0 to SAP HANA 2.0 Key Concepts Refresher Chapter 4 Now that you understand the new modeling and development process, and you know how all the new features fit into the architecture, we can start using SAP HANA 2.0. Creating a Project Your first step in the modeling and development process will be to create a development space. You’ll then need to perform other steps, from creating a development user to selecting a space and converting a module. Creating a Development Space In Figure 4.2, you saw four different spaces: one for development, one for CI, one for QA (staging), and another for production. In the traditional SAP world, these would be called “systems” or “instances.” A space contains the SAP HANA runtime with its containers and runtime objects. The concept of “spaces” come from Cloud Foundry, on which SAP HANA XSA and SAP Cloud Platform are based. Figure 4.6 shows how organizations and spaces are structured in Cloud Foundry environments. The “n” indicates, for example, that each Region can have many Global accounts. In SAP HANA XSA, we look at the Organization and Space levels. Each organization can have many spaces. The default spaces in a standard Cloud Foundry installation are development, test, and production. You can create organizations and spaces using the SAP HANA XSA cockpit tool, also known as the xsa-cockpit tool. Note The name of this tool was changed in SAP HANA 2.0 SPS 03. It used to be called the SAP HANA XSA Administration Tool (xsa-admin). Don’t confuse this new name with the SAP HANA cockpit, which we’ll discuss in the next section. The SAP HANA cockpit is mostly used by system and database administrators, whereas the SAP HANA XSA cockpit is mostly used by modelers and developers. To find the xsa-cockpit tool, first open the SAP HANA XSA services menu in your browser at https://<server>:3<instance-number>30. The host (server) name for the SAP HANA express edition is hxehost, and the instance number is 90. You’ll therefore use https://hxehost:39030. The SAP HANA XSA services menu is shown in Figure 4.7. The xsa-cockpit tool is the eighth option on the right-side menu. 145 146 Chapter 4 Modeling Tools, Management, and Administration SAP Cloud Platform User account Home Region Global Account Cloud Foundry Environment Subaccount Organization Space Cloud Foundry applications Cloud Foundry services Figure 4.6 Cloud Foundry Organizations and Spaces Figure 4.7 SAP HANA XSA Services Menu Key Concepts Refresher Chapter 4 Tip You can also find SAP Web IDE for SAP HANA (webide) and the SAP HANA Interactive Education (SHINE) demo (web) on the SAP HANA XSA services menu. After you’ve opened the xsa-cockpit tool, you’ll see the screen as shown in Figure 4.8. You can click on the New Organization button to create a new organization. This example shows the HANAExpress organization for SAP HANA, express edition. Figure 4.8 SAP HANA XS Advanced Cockpit to Add a New Organization Note The UI of the SAP HANA XSA cockpit was changed from the SAP Fiori launchpad look-andfeel to that of the new SAP Cloud Platform. Figure 4.9 shows the details of the HANAExpress organization screen. The right side shows the Spaces. You can add a new development space for your SAP HANA modeling by clicking on the New Space button. Tip It’s recommended that you create a development space, instead of using the default SAP space. In SAP HANA, express edition, the development space is already created for you. 147 148 Chapter 4 Modeling Tools, Management, and Administration Figure 4.9 Adding a New Space to SAP HANA XSA Creating a Development User After you have a development space, you then need a development user in that space. You can create a new user by selecting the User Management menu option shown on the left side of the screen in Figure 4.8, and then clicking on the New User button. After you’ve created your user, select the Organizations menu option, click on the HANAExpress space, and select the Development space. Then select the Members option on the menu. You should see that the XSA_ADMIN and XSA_DEV users are already in this space. Click on the Add Members button, select your user, select the Space Developer checkbox, and click the OK button. Your user is now added to the Development space. You can also create users in the SAP HANA cockpit. The SAP HANA cockpit is used by system administrators to manage the SAP HANA system. This replaces the Administration perspective previously found in SAP HANA Studio. The SAP HANA cockpit also appears on the SAP HANA XSA services menu as hana-cockpit, as shown previously in Figure 4.7. If you want to manually find the browser address for the SAP HANA cockpit (or other SAP HANA services), you need to log in to the operating system with the Key Concepts Refresher Chapter 4 <System ID>adm user. The system ID for SAP HANA, express edition is hxe, and you need to log in with the hxeadm user. On the command line, type “xs apps”. You’ll see a list of SAP HANA XSA services, as shown in Figure 4.10. Figure 4.10 Running “xs apps” on the Command Line to Find Application URLs To find the SAP HANA cockpit, look for a service called cockpit-web-app. In this case, the browser address (top line) is https://hxehost:51046. This will probably be different on your system. If the service shows as STOPPED in the second column, you can start it by going to the xsa-cockpit tool, and selecting the Applications menu option, as shown in Figure 4.11. You’ll then see the list of services. Here you can start any services that were stopped. You can create users and assign roles to the users using the SAP HANA cockpit. We’ll discuss this in Chapter 13. Figure 4.12 shows what the SAP HANA cockpit looks like. Most of the functionality in the SAP HANA cockpit is for system administrators, so it won’t be discussed here. 149 150 Chapter 4 Modeling Tools, Management, and Administration Figure 4.11 Starting SAP HANA XSA Applications in the SAP HANA Applications Monitor Figure 4.12 SAP HANA Cockpit Used for Administration Key Concepts Refresher Chapter 4 Project Creation Options You can now open the SAP Web IDE for SAP HANA (webide) tool, which is the main tool we’ll use for modeling and development. Even though the names are similar, SAP Web IDE for SAP HANA isn’t the same as the SAP Web IDE that is used for developing SAP Fiori apps. When you open SAP Web IDE for SAP HANA, you’ll have an empty Workspace where you’ll create SAP HANA projects. Each project will be for an SAP HANA application, for which you create development and modeling objects. SPS 04 From SAP HANA SPS 04, you can have multiple workspaces in SAP Web IDE for SAP HANA. Tip The following three words sound similar but refer to completely different topics, so it’s important not to confuse them: 쐍 Space Space refers to a development or production system. 쐍 Workspace Workspace refers to your working area in SAP Web IDE for SAP HANA, where you create projects with development and modeling objects. 쐍 Namespace Namespace is used for naming development and modeling objects, and it helps determine the name of the runtime objects in the containers. There are three ways to create a new SAP HANA project. The following are shown in Figure 4.13: 1. The first is to right-click on Workspace, and select New from the menu. This creates a blank project. You then create an SAP HANA MTA Project. 2. The second is to Import an existing project. You need to have previously exported a project. We’ll look at this a bit more. 3. Lastly, you can Clone a project from the Git repository. 151 152 Chapter 4 Modeling Tools, Management, and Administration Figure 4.13 Creating a New Project, Importing a Project, or Cloning a Project A project can have several modules. These will appear as folders in SAP Web IDE for SAP HANA. Each module can be a different programming language (for application containers) or SAP HANA database objects (for HDI containers). Some of the modules you can add are as follows: 쐍 HDB module These modules contain SAP HANA database objects, data models, data processing (e.g., functions and procedures), virtual tables, synonyms, analytic privileges, CDS views, table data, and tables. As this is the area we’ll focus on, we’ll discuss these in the rest of the book. 쐍 Java or Node.js modules These modules are for business logic, written in Java or JavaScript. 쐍 Basic HTML5 module These modules are for web-based UIs. 쐍 SAPUI5 HTML5 module These modules are for UIs based on the SAPUI5 standard. Key Concepts Refresher Chapter 4 You create your development and modeling objects in the modules (folders) in SAP Web IDE for SAP HANA. The structure of a project can be described as a collection of modules. For exam purposes, the most important of these are the HDB modules. When you build a module, it will create a container after the first successful build. Any subsequent build will create or update the runtime objects in the container. A module is a collection of design-time objects (files). During a build, the runtime objects in the container get created from the design-time objects in the module (folder). The mta.yaml file, shown in Figure 4.14, contains the configuration for your project. Each module is described in this file. The Code Editor view is shown here. There is also an MTA Editor view that shows the configuration graphically. Figure 4.14 The mta.yaml File Shows the Project with Its Modules 153 154 Chapter 4 Modeling Tools, Management, and Administration Selecting a Space for Running the Project After creating, importing, or cloning a project, you have to tell SAP Web IDE for SAP HANA which space is your development space before you can build the project (application). Right-click on the project, and choose Project • Project Setting from the context menu. You’ll get a screen like the one shown in Figure 4.15. Select your Space from the dropdown list, and click on the Save and the Close buttons at the bottom. Note You’ll only see spaces in the dropdown list where your user is registered as a developer. We described how to do this at the start of this section. Figure 4.15 Select a Space for Running the Project Converting a Module after an Import If you’ve imported or cloned a project, you have to do an additional step before you can run a build. After the import or clone, the project doesn’t yet know the module types for the various modules because you could have modified the files in a code editor before importing or cloning. You therefore have to identify the various module types. You do this by right-clicking on a module, choosing Convert from the context menu, and then selecting the module type, for example, HDB Module (see Figure 4.16). Key Concepts Refresher Chapter 4 Figure 4.16 Converting a Module to an HDB Module after Import or Clone To summarize, Table 4.3 lists the differences between SAP HANA 1.0/SAP HANA XS and SAP HANA 2.0/SAP HANA XSA discussed in this section. SAP HANA 1.0 and SAP HANA XS SAP HANA 2.0 and SAP HANA XSA SAP HANA Studio: Content area and developer perspective SAP Web IDE for SAP HANA SAP HANA Studio: Administration perspective SAP HANA cockpit Development system Development space Table 4.3 Changes in the Modeling Tools from SAP HANA 1.0 to SAP HANA 2.0 Creating SAP HANA Design-Time Objects After you’ve created the project and the modules, you can now create your SAP HANA information and data objects in the HDB module. (Remember that we won’t discuss the application modules; instead, we’ll focus on the HDB module.) 155 156 Chapter 4 Modeling Tools, Management, and Administration In SAP Web IDE for SAP HANA, the modules appear as folders. The SAP HANA information and data objects appear as files in these folders. SAP Enterprise Architecture Designer First, we’ll quickly introduce SAP Enterprise Architecture Designer (SAP EA Designer). As the name indicates, it’s a tool that you use for designing enterprise architecture diagrams and models during the planning and design phase of an SAP HANA project. You can visually create data models and views in this tool. This is one of the ways to create SAP HANA design-time objects. You can also use it to reverse-engineer existing SAP HANA information views. In Figure 4.17, we used the Reverse HDI Files option to create an enterprise architecture diagram of an existing SHINE calculation view. You can use this feature to help you in migration projects from other databases to SAP HANA or SAP Cloud Platform. As it can create the required CDS artifacts used by the application programming model for SAP Cloud Platform, this functionality can save you a lot of time. (We’ll discuss CDS in the Creating a Table Using Core Data Services section.) You can use SAP EA Designer to get a complete overview of new or existing projects. It allows you to use the collaborative web-based environment to design a project’s models, data flows, processes, requirements, and so on. Figure 4.17 SAP Enterprise Architecture Designer Key Concepts Refresher Chapter 4 SAP EA Designer is updated more often than SAP HANA, and new features appear quite quickly. With SAP HANA SPS 03, we got a new type of table called system-versioned tables. (We’ll take a quick look at this in Chapter 8.) While there was still limited documentation about the CDS format for this new table type, we could already create such tables using the SAP EA Designer tool, and it would create the required and compliant CDS files for us. SAP EA Designer provides General Data Protection Regulation (GDPR) support in data models and can generate the SAP HANA data security features of anonymization. SAP Web IDE for the SAP HANA User Interface Before we create new SAP HANA design-time objects (files) in SAP Web IDE for SAP HANA, let’s take a good look around to understand the tool better. Figure 4.18 shows an annotated overview of a typical screen in SAP Web IDE for SAP HANA, which is composed of the following: 쐍 A standard menu bar appears across the top of the screen. 쐍 Starting at the left, we see a vertical toolbar with three icons. The first accesses the Development tool 1, which is the equivalent of the Developer Perspective in SAP HANA Studio. We use this to create and edit design-time objects. The second accesses the SAP HANA database explorer 2, which is the equivalent of the Catalog area in SAP HANA Studio. We use this to work with the runtime objects. We’ll look at the SAP HANA database explorer tool more in the next section. 쐍 The next area to the right is the Workspace area, which has a folder-like tree structure. The top folder is the Workspace. Below this we have the project 7 (in this example, hana-shine-xsa), which is equivalent to an application. The project folder contains several modules 8, and the modules subfolders contain objects 9 (i.e., files). 쐍 The middle area is the Scenario area k in which we can create and update our SAP HANA information models. When you open an information model, the name of the model appears in a tab at the top of this area 4 (in this example, it’s the PRODUCTS.hdbcalculationview tab). On the left of this area is the palette toolbar 3, which shows what nodes you can use to create the SAP HANA information model. 157 158 Chapter 4 Modeling Tools, Management, and Administration Figure 4.18 Overview of SAP Web IDE for SAP HANA Tip If your space is limited on the screen, you can double-click on the name tab. The window will expand over the workspace tree structure. If you double-click on it again, it restores to the smaller size. 쐍 The next area to the right of the scenario area is the details area 5. When you select a node j in the scenario area, the details of that node will be shown in the details area. The details area has several tabs showing at the top of the area. In Figure 4.18, the Join Definition tab is selected. 쐍 Below the details area is the PROPERTIES section l. When you select something in the details area, the properties of that item are shown here. 쐍 The toolbar on the top-right side 6 has tools for Searching objects, Debugging, and so on. When you select a tool, an area will expand to the left of the toolbar to show that tool. Clicking on the icon again will collapse the area. 쐍 The toolbar on the bottom-right side has the Console icon m. This is where you’ll see messages, for example, build errors. Once again, when you select the Console icon, an area will expand to the left of the toolbar to show the Console. Key Concepts Refresher Chapter 4 When you right-click on the folders in the Workspace tree structure, the context menu shows a New option. Select the New option, and a list of possible designtime objects appear (see Figure 4.19). Figure 4.19 The New Context Menu in SAP Web IDE for SAP HANA Note We’ll discuss many of these different file types in the next chapters. Let’s take a quick look at the HDB CDS Artifacts object from this list. Creating a Table Using Core Data Services If you’ve used SAP HANA Studio with SAP HANA 1.0, you’ll remember that you could create a new table using a table designer tool. This would create the table directly in the SAP HANA database as a runtime object. The problem was that such tables could not be deployed (transported) to the production system. SAP HANA introduced the concept of CDS in the last few versions of SAP HANA 1.0. We’ll discuss CDS in detail in the next chapter. Basically, CDS files are design-time files that can be deployed on multiple systems. You’ll use an .hdbcds files to define your SAP HANA database table. The syntax of 159 160 Chapter 4 Modeling Tools, Management, and Administration the CDS files is very close to standard SQL syntax. To create the actual table in the database—the runtime object—you first have to build the CDS file. The SHINE project is a good example of CDS data sources in action. You don’t have to create an HDI container, create tables, or import data into these tables. When SHINE is imported into your SAP HANA system, and you start a build, it automatically creates all that content for you using CDS files. Figure 4.20 shows how to create an .hdbcds file. In this example, the SAP HANA CDS file is used to create two tables called MEMBERS and RELATIONSHIPS. You can Save the file as shown 1 or by pressing (Ctrl)+(S) (if you use Microsoft Windows). Figure 4.20 Creating and Saving an .hdbcds File to Design a Table Key Concepts Refresher Chapter 4 Documentation As you near the end of a modeling project, documentation becomes important. You need to hand your modeling work over to someone else. There are always good reasons for the many decisions you make, such as why the various nodes of a calculation view are in a specific order. Documentation tools and features in SAP HANA, such as notes, allow you to communicate that information to anyone who may work on your project in the future. In the context menu of a calculation view (see Figure 4.21), you’ll find a very useful documentation feature. The Generate Document option documents graphical calculation views in SAP HANA. Figure 4.21 shows an example of the HTML output generated. You can select an entire module (folder), and it will generate the HTML documents for all of your calculation views in one go. Figure 4.21 Generating Documentation of SAP HANA XSA Calculation Views To summarize, Table 4.4 lists the differences between SAP HANA 1.0/SAP HANA XS and SAP HANA 2.0/SAP HANA XSA discussed in this section. 161 162 Chapter 4 Modeling Tools, Management, and Administration SAP HANA 1.0 and SAP HANA XS SAP HANA 2.0 and SAP HANA XSA Create design-time objects in SAP HANA Studio in the developer perspective Create design-time objects in application modules and HDB modules SAP HANA Studio: Catalog area SAP HANA database explorer Create tables in SAP HANA Studio using a table designer tool Create tables using .hdbcds files and building the files Use of .hdbtable for creating tables using CDS Use of .hdbcds for creating tables using CDS Table 4.4 Changes in Creating Objects from SAP HANA 1.0 to SAP HANA 2.0 Building and Managing Information Models When you build an entire project the first time, and the build is successful, it will create the various runtime containers. However, you won’t find an HDI container. It turns out that SAP HANA requires a separate dedicated build of each HDB module to create the corresponding HDI containers. When an HDI container is created, it has a corresponding physical schema for the container with a long technical name. When you’ve successfully built an HDB module design-time file, you’ll have a runtime object in the HDI container. You can work with these runtime objects using the SAP HANA database explorer tool. You find this tool at the second icon on the left toolbar in SAP Web IDE for SAP HANA. The top-left area of Figure 4.22 shows the various types of runtime objects in the container. Column Views are the runtime versions of SAP HANA views that were created with the build process. Another group that you’ll look at very often is Tables. Remember You create SAP HANA database design-time objects in the HDB module (folder). When you successfully build the HDB module the first time, it creates an HDI container. SAP HANA uses the design-time objects (files) in the HDB module to create runtime objects in the HDI container. Key Concepts Refresher Chapter 4 Table Definitions and Data When you select the Tables group, it shows a list of all the tables for the HDB module in the bottom left area of the screen (see Figure 4.22). To open the definition of a table on the right side, either double-click a table’s name or right-click and select Open from the context menu. Figure 4.22 SAP HANA Database Explorer Showing a Table Definition Data Preview You can view and maintain table data when you select Open Data from the table’s context menu in the SAP HANA database explorer. You can even insert data and update data while you preview the table contents, as illustrated in Figure 4.23. This obviously can be dangerous in a production system, so the action can be prevented through security settings. SPS 04 One of the features that you could do with SAP HANA Studio in SAP HANA 1.0 was to import data into a table from CSV files or Microsoft Excel spreadsheets. One way you could do it in SAP HANA 2.0 was by defining a CDS .hdbtabledata file and specifying the CSV file. With SPS 04, you can now import data from medium-sized CSV files or Excel spreadsheets using the SAP HANA database explorer. 163 164 Chapter 4 Modeling Tools, Management, and Administration In Figure 4.23, you see two different tabs when previewing the data. The Raw Data tab shows the raw data that can be filtered, sorted, and exported. In the Analysis tab, you can perform a full analysis of table outputs or information model outputs. This works the same as an Excel pivot table. You can show the data using many different types of graphs—for example, the heat map graph shown in Figure 4.24. Figure 4.23 Displaying Table Data with the Option to Insert Data Figure 4.24 Data Preview Analysis of an SAP HANA Information Model Key Concepts Refresher Chapter 4 SQL Console In the Raw Data tab, you also have the Edit SQL Statement in SQL Console option, which opens a SQL statement for viewing the table or viewing data in the SQL Console tool. You can now modify the SQL statement and see the results below it, as shown in Figure 4.25. Figure 4.25 Data Preview of an SAP HANA Information Model Note Never use the SQL Console to create database objects. You can use it to observe changes in reading data and for opening the Analyze SQL tool. We discuss this very useful tool in Chapter 12. You have to create SAP HANA database objects using CDS. If you create an SAP HANA database object using the SQL Console, only the runtime version of the object will exist in the current HDI container. No design-time object will exist. 165 166 Chapter 4 Modeling Tools, Management, and Administration When you build and deploy your HDB module/HDI container to another space, the build and deployment will be successful. The object you created using the SQL Console will, however, be missing from that space because only the design-time objects get built and deployed. If you wanted to create objects using the SQL Console, you would have to manually do it the same way in every space (system). Another problem with doing this is that someone needs similar security access to create it in the production environment, which is a huge security risk. Naming Runtime Objects When you build a design-time object, the build process creates the runtime version of that object. The question we ask now is, “What is the name of that runtime object in the container?” This is important to know, as this is the name that analytics or reporting users need to use. The intuitive assumption is that the file name of the design-time object will be the name of the runtime object. For most objects, that assumption will work, but it won’t work for all objects. If you look at Figure 4.20 again, you’ll see that you create two tables with one .hdbcds design-time definition file. We can’t give both tables the same name. Therefore, runtime objects are named as <namespace>::<objectname>. We mentioned namespaces before when we said there was a difference between a space, a workspace, and a namespace. A namespace is used when naming development and modeling objects, and it helps determine the name of the runtime objects in the containers. By default, the namespace is blank. If you have a design-time object called View1.hdbcalculationview, the default name for the runtime object will be ::View1. Note The runtime object name is the same as the file name of the design-time object, without the file extension, for example, .hdbcalculationview. Because the runtime object names must be unique, you shouldn’t have View1.hdbcalculationview and View1.hdbfunction in the same folder as both will generate runtime objects with the name ::View1. In a small project, there is probably not much to be gained by specifying a namespace. In larger projects, or when developing libraries and application programming interfaces (APIs), you get to the situation where some objects can have the Key Concepts Refresher Chapter 4 same name. In that case, you start adding a namespace to differentiate which specific object you’re talking about. The namespace also gives a sense of where it fits in. For example, you can have two objects with the name Show. By adding different namespaces, you could get Report::Show and Users::Show. The namespace tells other modelers and developers that the context is either Reports or Users. Note The folders don’t have to be named Report and Users. The folder names aren’t related to the namespace at all. The combination of namespace and object name should be unique. You can have two objects called Show, provided they are in different namespaces as shown previously. Tip Think of the namespace like a surname. Many people are called Mark. By adding a surname, it becomes easier to identify which person called Mark we refer to, for example, Mark Green. When you create a new HDB module, an .hdinamespace file gets created inside the source (src) subfolder of the module. This file contains the namespace for the files in that folder. The default namespace is blank, as shown by the name entity in Listing 4.1. { "name": "" "subfolder": "ignore" } Listing 4.1 The Default.hdinamespace File for an HDB Module The subfolder entity indicates whether this namespace is applicable to the subfolders as well or whether the subfolders should ignore this namespace. You can specify cascade instead of ignore to make the namespace applicable to the subfolders as well. Because there is one .hdinamespace file in a folder, it means that all the objects in that folder will have the same namespace. If you want to give some objects another namespace, you’ll have to put them in a subfolder that contains a different .hdinamespace file with another namespace specified. 167 168 Chapter 4 Modeling Tools, Management, and Administration You can also specify the namespace in a design-time file, but then it must correspond to the namespace specified in the .hdinamespace file. Conflicts will cause build errors. Remember the namespace is specified by the .hdinamespace file in the same folder or by an .hdinamespace file in the parent folder with cascading enabled. Tip If no .hdinamespace file is found, or if the .hdinamespace file is empty, the namespace is assumed to be blank. We’ve mentioned that by default the file name of the design-time object determines the name of the runtime object. The exception to this is where we have design-time files in which many objects are specified in a single file. We already saw an example in Figure 4.20, where we created two tables—MEMBERS and RELATIONSHIPS—with the graphf.hdbcds design-time definition file. If our namespace is specified as tables.graph, the runtime objects will be called tables.graph::MEMBERS and tables.graph::RELATIONSHIPS. In this case, the file name graphf is ignored. Synonyms are another example. You can create many synonyms in a single .hdbsynonym file. Tip When you need to access an external data source, you must first define a local synonym in an .hdbsynonym file. You might also need to create or update the .hdbgrants (“permissions”) file to ensure that the external data source can actually be accessed. Build Errors You know by this time that you need to build the design-time file to create the runtime files. End users use the runtime files (refer to the top-right side of Figure 4.2). When you’re developing or modeling SAP HANA information views, you only have the design-time files. Until you do the build, no one can run these views. Tip When you want to view the results of a calculation view, you need to first build the view. If you make some changes, save the view, and then ask for a preview of the data without building it first, you’ll get the results from the previous build—without your latest changes. Key Concepts Refresher Chapter 4 In a book, it’s easy to show you the build menu and assume that all will go well. But in real life, you’ll occasionally get build errors. You need to know why you get the errors because that will help you figure out how to solve them. We’ll examine some examples of build errors in the following. Figure 4.26 shows a build error in the console window at the bottom. The highlighted error message reads no measure or attribute present. In this case, we haven’t yet mapped the input fields to the output fields, and the view had no output fields in the top layer. We’ll look more at how to create this type of view in Chapter 6. Figure 4.26 A Build Error Sometimes the build messages aren’t that clear. Then you’ll have to understand which other objects your view may depend on. Let’s look at an example where you use a table from another schema in your view. When you build only your view—not the entire HDB module—you might get a build error. When you’re using a table from another schema, you already know that you need a synonym and an authorization assigned. You might also need a user-defined cross-schema service if the table is from a classic database schema. If, 169 170 Chapter 4 Modeling Tools, Management, and Administration for example, the synonym isn’t yet built, you’ll get a build error. You could also get build errors because you don’t have the authorization, the service hasn’t been deployed yet, or even that the other schema and tables haven’t yet been created in the space (system) where you’re doing the build. Build errors can really get tricky when you start renaming or deleting objects. Sometimes you can introduce build errors when you refactor your code or models. Refactoring Refactoring is the process of restructuring your information models without changing their external behavior (i.e., their result sets). You do this to improve, simplify, and clarify the design of your models, for example, when you move information models inside the system from one module to another or when you rename an information model. Renaming or moving an information model isn’t always a trivial task. If the information model is used in many other models, you’ll have to update those models as well. You can easily find yourself having to update tens of information models when simply trying to rename a single information model that these models all use. Concepts such as refactoring are used in larger projects to ensure your modeling work stays fresh and useful. The idea is similar to gardening: you need to mow the lawn, pull weeds, and prune the roses to maintain a beautiful garden. This idea of upkeep can also be applied to the administration of your data models. We’ll start by looking at some of the refactoring features in SAP HANA. After that, we’ll look at some examples of what to watch out for so that you don’t break your models. Where-Used In the refactoring process, you may have to figure out where a specific information model is used. Figure 4.27 shows the context menu of an information model in the SAP HANA database explorer. By selecting the Where-Used Browser menu option, SAP HANA finds all the places where that information model is used. This is very useful to see the impact a planned refactoring will have. Key Concepts Refresher Chapter 4 Figure 4.27 Where-Used Browser Option for Finding Where an SAP HANA Information Model Is Used Replacing Data Sources and Nodes in an SAP HANA View Figure 4.28 shows two context menus of nodes in a graphical calculation view. In this case, you can replace the data sources used in the node or even replace or delete the entire node. Figure 4.28 Replacing Data Sources and Nodes in an SAP HANA View 171 172 Chapter 4 Modeling Tools, Management, and Administration You normally won’t do this for an information view that is in use because the result set of the calculation view will change. But replacing the data source in a node is useful when you have to create different calculation views that are based on similar data tables. You can make a copy of an existing view and then change the data source to a similar table. SAP HANA will ask you to verify the mapping of the old data source to the new data source. It will then keep as much of the current information intact as possible, which can save you many hours of building similar calculation views. Show Lineage in an SAP HANA view Sometimes you have renamed a column name in an SAP HANA view. When you want to find out from which table the column came, you can’t just search for the column names in the tables as the column was renamed inside the view. In that case, you can use the Show Lineage option. Select an output column, for example, Supplier_Country from the Name column shown in Figure 4.29, and click on the Show Lineage icon. Figure 4.29 Show Lineage Option Showing the Data Source of a Column Key Concepts Refresher Chapter 4 Go down the layers of nodes, and you can see that the Supplier_Country column was introduced by the Addresses table in the second (from the bottom) join node. This works, even if the column got renamed along the way. Avoiding and Fixing Errors as a Result of Refactoring We end our journey into the new world of SAP HANA 2.0 by looking at some examples where the build process gives errors when you might not expect them. By looking at why this happens, you’ll also understand the build process better. For our first example, we look at a case with two information models called A and B. We start by creating Model A, and then we create Model B based on Model A. We know that we have to build Model A first because Model B depends on it. When we’ve successfully built both models, we rename a field inside Model A. This field is used in Model B. When we build the models we expect Model A to build successfully as the field was just renamed, and this won’t break the model. We expect the build of Model B to give an error that it can’t find the field it expects from Model A. In this case, our expectation is wrong. The build of Model A will give an error. SAP HANA does an end-to-end check when you do the build, even if you just ask it to only build Model A. It sees that Model B depends on Model A and that one of the fields that Model B requires from Model A isn’t there anymore. Because Model A is now breaking the “contract” it has with Model B, the build of Model A will give an error. Remember SAP HANA does an end-to-end check when doing a build, even if you just ask it to build a single model. Instead of renaming a field in Model A, what happens if we rename Model A to Model E? In this case, the build of Model E (the new name) will be successful, but the build of Model B, or the build of the entire HDB module (Model B and Model E), will fail. Even though the build process does end-to-end dependency checks, you broke the link between Model A and Model B by renaming Model A to Model E. Model E is now independent. Model B is based on Model A, which now doesn’t exist, so the build gives an error. 173 174 Chapter 4 Modeling Tools, Management, and Administration Tip Use the Where-Used option to find which models will “break” before you rename an SAP HANA information model. Deleting objects can cause weird effects. Coming back to our example of Model A and Model B, where Model B is based on Model A, let’s assume we’ve successfully built both models. The runtime objects for both models exist in the HDI container. What happens when we delete Model A? When we make a change to Model B, and rebuild it, we expect the build to fail. But the build succeeds. This is because SAP HANA does an incremental build. It looks at what needs to be rebuilt. It sees Model A’s runtime object in the HDI container and sees that it doesn’t need a rebuild, so it just rebuilds Model B. If we now do a build of the entire HDB module, it finally realizes that Model A is deleted and then gives a build error. Hint After deleting files from the HDB module, do a build of the entire module. If you ever want to delete an entire folder or subfolder, don’t just delete the module (folder). First delete the files in the folder, and do a build of the module (folder). Then you can delete the folder. If you first delete the files, the build process realizes that the design-time files have been deleted, and it then deletes the runtime objects from the HDI container. If you delete the entire module (folder), you can’t trigger a build any more, and the runtime objects won’t get deleted from the HDI container. Important Terminology In this chapter, we covered the following important terminology: 쐍 Design-time objects These objects are definitions (metadata) for what SAP HANA should create when running information models. They are seen as the “inactive” versions, and end users can’t run them. 쐍 Runtime version When you build an information model, SAP HANA checks the model and creates an “active copy” of the information model in the container (schema). This active copy is called a runtime version. Practice Questions Chapter 4 쐍 Container An isolated miniature virtual machine that contains runtime objects. 쐍 Build The process of creating a runtime object from the design-time files. 쐍 Deploy The process of distributing your development and modeling work to other systems, for example, production. 쐍 Git A powerful open-source repository of design-time objects that allows for version control, forking, and merging of branches. 쐍 SAP Web IDE for SAP HANA The web-based environment used for creating SAP HANA design-time objects. 쐍 SAP HANA database explorer The web-based tool used to work with SAP HANA runtime objects. 쐍 HDB module A collection of SAP HANA database design-time objects 쐍 HDI container The container and schema created when you build an HDB module. 쐍 Refactoring The process of restructuring your information models without changing their external behavior to improve, simplify, and clarify the design of your models. Practice Questions These practice questions will help you evaluate your understanding of the topics covered in this chapter. The questions shown are similar in nature to those found on the certification examination. Although none of these questions will be found on the exam itself, they will allow you to review your knowledge of the subject. Select the correct answers, and then check the completeness of your answers in the “Practice Question Answers and Explanations” section. Remember that on the exam, you must select all correct answers, and only correct answers, to receive credit for the question. 175 176 Chapter 4 Modeling Tools, Management, and Administration 1. What development tool is available for creating SAP HANA XSA and HDI information models? 첸 A. SAP HANA Studio 첸 B. SAP HANA Web-Based Development Workbench 첸 C. SAP HANA database explorer 첸 D. SAP Web IDE for SAP HANA 2. True or False: CDS only works with runtime objects. 첸 A. True 첸 B. False 3. Refactoring is used with which of the following actions? 첸 A. Improving your modeling design 첸 B. Translating 첸 C. Deployment 첸 D. Documenting 4. For what tasks would you use the Where-Used option in SAP HANA? 첸 A. To find all places where an information model has been deployed 첸 B. To find all places where a developer updated information models 첸 C. To find all places where an information model is used 5. You have Model B that is based on Model A. You have NOT built any of the models yet. You delete Model A. What happens during the build? (There are 2 correct answers.) 첸 A. You get a build error when you build Model B. 첸 B. The build of Model B is successful. 첸 C. You get an error when you build the entire HDB module. 첸 D. The build of the entire HDB module is successful. Practice Questions Chapter 4 6. You have Model B that is based on Model A. You have successfully built both models. You delete Model B. What happens during the build? (There are 3 correct answers.) 첸 A. You get a build error when you build Model A. 첸 B. The build of Model A is successful. 첸 C. You get an error when you build the entire HDB module. 첸 D. The build of the entire HDB module is successful. 첸 E. Model B gets deleted from the HDI container. 7. What should you use to create tables in an HDI container? (There are 2 correct answers.) 첸 A. SQL console 첸 B. Core data services (CDS) 첸 C. SAP HANA database explorer 첸 D. SAP Enterprise Architecture Designer 8. What do you need to read a table from another HDI container in your information model? (There are 3 correct answers.) 첸 A. A user-defined service 첸 B. Authorization 첸 C. Synonym 첸 D. An HDI service 첸 E. Authentication 9. What do you need to do after creating a new project with an HDB module? 첸 A. Create an .hdinamespace file 첸 B. Convert the new module to an HDB module. 첸 C. Select a space for running the project. 10. True or False: You can use the Show Lineage tool to find out which model another model is based on. 첸 A. True 첸 B. False 177 178 Chapter 4 Modeling Tools, Management, and Administration Practice Question Answers and Explanations 1. Correct answer: D You can use SAP Web IDE for SAP HANA to create SAP HANA XSA and HDI information models. SAP HANA Studio and SAP HANA Web-Based Development Workbench were used with SAP HANA 1.0 and SAP HANA XS. SAP HANA XS has been deprecated from SAP HANA 2.0 SPS 02. These older tools don’t have functionality to work with HDI containers. The SAP HANA database explorer can show HDI container objects but doesn’t create them. 2. Correct answer: B False—CDS objects are design-time objects. They create runtime objects when they are activated. The word “only” makes this statement false. Tip Remember that simple true or false questions don’t occur in the certification exam. The other questions with at three or four answers are more representative of the questions you’ll find in the certification exam. 3. Correct answer: A Refactoring is used to improve your modeling design. You can use it, for example, to rename SAP HANA models. Translating is used to translate your models to other languages (e.g., Chinese). Deployment is taking your modeling work from the development system to the production system. Documenting is writing documents and explaining your modeling work. 4. Correct answer: C You use the Where-Used option in SAP HANA to find all the places an information model is used. 5. Correct answers: A, C You have Model B that is based on Model A. You have NOT built any of the models yet. You delete Model A: You get a build errors when you build Model B and the entire HDB module because Model B depends on Model A. You get an error on Model B because you haven’t yet built either one of the models. Practice Question Answers and Explanations Chapter 4 6. Correct answers: B, D You have Model B that is based on Model A. You have successfully built both models. You delete Model B: The build of both Model A and the entire HDB module is successful. Model B’s runtime object also gets deleted from the HDI container. You only deleted the design-time object from the HDB module. The build deletes the runtime object. 7. Correct answers: B, D You should use CDS and SAP Enterprise Architecture Designer to create tables in an HDI container. You should never use SQL Console to create tables as they can’t be deployed (transported). You can’t use SAP HANA database explorer to create tables. 8. Correct answers: B, C, D You need authorization, a synonym, and an HDI service to read a table from another HDI container in your information model. You need a user-defined service when reading a table from a classic database schema, not from another HDI container. Authentication is your login credentials, such as a user name and password. Do not confuse this with authorization, which signifies what you’re allowed to do. 9. Correct answer: C You need to select a space for running the project after creating a new project with an HDB module. An .hdinamespace file is created automatically when you create an HDB module. You only have to convert the new module to an HDB module after an import. 10. Correct answer: B False. You do NOT use the Show Lineage tool to find out which model another model is based on. You can use it to find out where a specific column in an information view originates from. 179 180 Chapter 4 Modeling Tools, Management, and Administration Takeaway You should now understand the SAP HANA 2.0 development and modeling process and architecture in more detail. After reading this chapter, you should be able to create a project, create and build new SAP HANA information models, refactor your modeling work, and solve build error situations. In addition, you now should have a working knowledge of what happens behind the scenes when you create and build a model. Summary In this chapter, we discussed the SAP HANA modeling tools: SAP Web IDE for SAP HANA, SAP HANA database explorer, SAP HANA cockpit, SAP EA Designer, and the SAP HANA XSA cockpit. You know how they work and when to use them. You can now use this knowledge for the modeling chapters ahead. The next chapter will teach you the fundamentals of modeling. Chapter 5 Information Modeling Concepts Techniques You’ll Master 쐍 Understand general data modeling concepts such as views, cubes, fact tables, and hierarchies 쐍 Know when to use the different types of joins in SAP HANA 쐍 Examine the differences between attributes and measures 쐍 Learn about the different types of information views that SAP HANA uses 쐍 Get to know projections, aggregations, and unions used in SAP HANA information views 쐍 Describe core data services (CDS) in SAP HANA 쐍 Get a glimpse of best practices and general guidelines for data modeling in SAP HANA 182 Chapter 5 Information Modeling Concepts Before moving onto the more advanced modeling concepts, it’s important to understand the basics. In this chapter, we’ll discuss the general concepts of information modeling in SAP HANA. We’ll start off by reviewing views, join types, cubes, fact tables, hierarchies, the differences between attributes and measures, the different types of information views that SAP HANA offers, and CDS views. From there, we’ll take a deeper dive into SAP HANA’s information views, as well as concepts such as calculated columns, how to perform currency conversions, input parameters, and so on. Real-World Scenario You start a new project. Many of the project team members have some knowledge of traditional data modeling concepts and describe what they need in those terms. You need to know how to take those terms and ideas, translate them into SAP HANA modeling concepts, and implement them in SAP HANA in an optimal way. Some of the new concepts in SAP HANA are quite different from what you’re used to, for example, not storing the values of a cube, but instead calculating it when required. If you understand these modeling concepts, you can quickly understand and create real-time applications and reports in SAP HANA. Objectives of This Portion of the Test The objective of this portion of the SAP HANA certification is to test your understanding of basic SAP HANA modeling artifacts. For the certification exam, SAP HANA modelers must have a good understanding of the following topics: 쐍 The different types of joins available in SAP HANA and when to use each one 쐍 How the SAP HANA information views correspond to traditional modeling artifacts 쐍 The different types of SAP HANA views and when to use them 쐍 The purpose of CDS in SAP HANA Key Concepts Refresher Chapter 5 Note This portion contributes up to 5% of the total certification exam score. Key Concepts Refresher This section looks at some of the core data modeling concepts used when working with SAP HANA and that will be covered on the exam. Let’s start from where the data is stored in SAP HANA by looking at tables. Tables Tables allow you to store information in rows and columns. Inside SAP HANA, there are different ways of storing data: in row-oriented tables or column-oriented tables. In a normal disk-based database, we use row-oriented storage because it works faster with transactions in which we’re reading and updating single records. However, because SAP HANA works in memory, we prefer the column-oriented method of storing data in the tables, which allows for the use of compression in memory, removes the storage overhead of indexes, saves a lot of space, and optimizes performance by loading compressed data into computer processors. Next, we can begin combining the tables in SAP HANA information models. Views The first step in building information models is building views. A view is a database entity that isn’t persistent and is defined as the projection of other entities, such as tables. For a view, we take two or more tables, link these different tables together on matching columns, and select certain output fields. Figure 5.1 illustrates this process. 183 184 Chapter 5 Information Modeling Concepts Selected field 1 Join Foreign key Primary key Selected field 3 Selected field 2 Left Table Right Table View Selected field 1 Selected field 2 Selected field 3 Result of the view looks like a table Figure 5.1 Database View There are four steps when creating database views: 1. Select the tables. Select two or more tables that have related information, for example, two tables listing the groceries you bought. One table contains the shop where you bought each item, the date, the total amount, and how you paid for your groceries. The other table contains the various grocery items you bought. 2. Link the tables. Next, link the selected tables on matching columns. In our example, perhaps our two tables both have shopping spree numbers, which should ideally be a unique number for every shopping trip you took. We call this a key field. You can link the tables together with joins or unions. 3. Select the output fields. You want to show a certain number of columns that are of interest, for example, to use in grocery analytics. Normally, you don’t want all the available columns as part of the output. Key Concepts Refresher Chapter 5 4. Save the view. Finally, save your view, and it’s created in the database. These views are sometimes called SQL views. You can even add some filters on your data in the view, for example, to show only the shopping trips in 2019 or all the shopping trips in which you bought apples. When you call this view, the database gathers the data. The database view pulls the tables out, links them together, chooses the selected output fields, reads the data in the combined fields, and returns the result set. After this, the view “disappears” until you call it again. Then, the database deletes all the output from the view. It doesn’t store this data from the view inside the database storage. Tip It’s important to realize that database views don’t store the result data. Each time you call a view, it performs the full process again. You might have also heard about materialized views, in which people store the output created by a view in another table. However, in our definition of views, we stated that a view is a database entity that isn’t persistent; that is the way we use the term view in SAP HANA. The data stays in the database tables. When we call the view, the database gathers the subset of data from the tables, showing us only the data that we asked for. If we ask a database view for that same data a few minutes later, the database will regather and recalculate the data. This concept is quite important going forward because you make extensive use of views in SAP HANA. For example, SAP HANA creates a cube-like view, sends the results back to you, and “disappears” again. SAP HANA performs this process fast, avoiding consuming a lot of extra memory by not storing static results. What really makes this concept important is the way it enables you to perform real-time reporting. In the few minutes between running the same view twice, our data in the table might have changed. If we use the same output from the view every time, we won’t get the benefit of seeing the effect of the updated data. Extensive caching of result sets doesn’t help when we want real-time reporting. Note Real-time reporting requires recalculating results because the data can keep changing. Views are designed to handle this requirement elegantly. 185 186 Chapter 5 Information Modeling Concepts We’ve discussed the basic type of database views here, but SAP HANA takes the concept of views to an entirely new level! Before we get there, we’ll look at a few more concepts that we’ll need for our information-modeling journey. Cardinality When joining tables together, we need to define how much data we expect to get from the various joined tables. This part of the relationship is called the cardinality. You’ll typically work with four basic cardinalities. The cardinality is expressed in terms of how many records on the left side join to how many records on the right side. The cardinalities are as follows: 쐍 One-to-one Each record on the left side matches with one record on the right side. For example, say you have a lookup table of country codes and country names. In the left table, you have a country code called ZA. In the right table (the lookup table), this matches with exactly one record, showing that the country name of South Africa is associated with ZA. 쐍 One-to-many In this case, each record on the left side matches with many records on the right side. For example, say you have a list of publishers in the left table. For one publisher—such as SAP PRESS—you find many published books. 쐍 Many-to-one This is the inverse of one-to-many in that many records on the left side match one record on the right side. For example, this case may apply when many people in a small village are all buying items at the local corner shop. 쐍 Many-to-many In this case, many records on the left side match many records on the right side, such as when many people read many web pages. Joins Before we look at the different types of joins, let’s quickly discuss the idea of “left” and “right” tables. Sometimes, it doesn’t make much difference which table is on the left of the join and which table is on the right because some join types are symmetrical. However, with a few join types, it does make a difference. Key Concepts Refresher Chapter 5 We recommend putting the most important table, seen as the main table, on the left of the join. An easy way to remember which table should be the table on the right side of the join is to determine which table can be used for a lookup or for a dropdown list in an application (see Figure 5.2). Customers lookup Field 1 Field n Customers Customers Invoice 1 Customer 1 Customer 1 Invoice 2 Customer 1 Customer 2 Invoice 3 Customer 2 Left Table—Invoices Customers Field 1 … Field n Customer 1 Customer 2 Right Table—Customers Figure 5.2 Table Used for Dropdown List on Right-Hand Side of a Join This doesn’t always mean that this table has to be physically positioned on the left of the screen in our graphical modeling tools. When creating a join in SAP HANA, the table we start the join from (by dragging and dropping) becomes the left table of the join. Databases are highly optimized to work with joins; they are much better at it than you can hope to be in your application server or reporting tool. SAP HANA takes this to the next level with its new in-memory and parallelization techniques. In SAP HANA, there are a number of different types of joins. Some of these are the same as in traditional databases, but others are unique to SAP HANA. Let’s start by looking at the basic join types. 187 188 Chapter 5 Information Modeling Concepts Basic Join Types The basic join types that you’ll find in most databases are inner joins, left outer joins, right outer joins, and full outer joins. You can specify the join types in databases using SQL statements. We’ll discuss SQL in more detail in Chapter 8. The easiest way to visualize these join types is to use circles, as shown in Figure 5.3. This is a simplified illustration of what these join types do. (The assumption is that the tables illustrated here contain unique rows; that is, we join the tables via primary and foreign keys, as illustrated earlier in Figure 5.1. If the tables contain duplicate records, this visualization doesn’t hold.) Left Table A Right Table B B Inner Join B B C Left Table Left Outer Join A Right Table A NULL B B B B C Left Table A Right Table B B B B C NULL C Right Outer Join Full Outer Join Left Table A Right Table A B B B NULL B C NULL C Figure 5.3 Basic Join Types and Their Result Sets The left circle represents the table on the left side of the join. In Figure 5.3, the left table contains the two values A and B. The right circle represents the table on the right side of the join and contains the values B and C. Key Concepts Refresher Chapter 5 Inner Join The inner join is the most widely used join type in most databases. When in doubt and working with a non-SAP HANA database, try an inner join first. The inner join returns the results from the area where the two circles overlap. In effect, this means that a value is shown only if it’s found to be present in both the left and the right tables. In our case, in Figure 5.3, you can see that the value B is found in both tables. Because we’re only working with the “overlap” area, it doesn’t matter for this join type which table is on the left or on the right side of the join. Left Outer Join When using an inner join, you may find that you’re not getting all the data back that you require. To retrieve the data that was left out, you use a left outer join. You’ll only use this join type when this need arises. With a left outer join, everything in the table on the left side is shown (first table). If there are matching values in the right table, these are shown. If there are any “missing” values in the right-hand table, these are shown with a NULL value. For this join type, it’s important to know which tables are on the left and the right side of the join. Right Outer Join The inverse of the left outer join is the right outer join. This shows everything from the right table (second table). If there are matching values in the left table, these are shown. If there are any missing values in the left-hand table, these are shown with a NULL value. Full Outer Join As of SAP HANA 1.0 SPS 11, we have the full outer join. Many other databases also have this join type, so it’s still regarded as one of the four basic join types. This join type combines the result sets of both the left outer join and right outer join into a single result set. Self-Joins There isn’t really a join type called a self-join; the term refers to special cases in which a table is joined to itself. The same table is both the left table and the right 189 190 Chapter 5 Information Modeling Concepts table in the join relationship. The actual join type would still be something like an inner join. This configuration is used mostly in recursive tables—that is, tables that refer back to themselves, such as in HR organizational structures. All employee team members have a manager, but managers are also employees of their companies and have someone else as their manager. In this case, team members and managers are all employees of the same company and are stored in the same table. Other examples include cost and profit center hierarchies or bills of materials (BOMs). We’ll return to this concept when we discuss hierarchies later in this chapter. SAP HANA Join Types The join types on the right side of Figure 5.4 are unique to SAP HANA: referential join, text join, and temporal join. Inner Join Referential Join, Temporal Join Left Outer Join Text Join Right Outer Join Full Outer Join Figure 5.4 Basic Join Types and Some SAP HANA-Specific Join Types Key Concepts Refresher Chapter 5 Referential Join The referential join type normally returns exactly the same results as an inner join. So, what’s the difference? There are many areas in which SAP HANA tries to optimize the performance of queries and result sets. In this case, even though a referential join gives the same results as an inner join, it provides a performance optimization under certain circumstances. The referential join uses referential integrity between the tables in the join. Referential integrity is used in a business system to ensure that data in the left and right tables match. For example, we can’t have an invoice that isn’t linked to a customer, and the customer must be inserted into the database before we’re allowed to create an invoice for that customer. Note that you don’t need a value on both sides of the join: you may have a customer without an invoice, but you may not have an invoice without a customer. If we can prove that the data in our tables has referential integrity, which most data in financial business systems has, then we can use a referential join. In this case, if we’re not reading any data from the right table, then SAP HANA can quite happily ignore the entire right table and the join itself, and it doesn’t have to do any checking, which speeds up the whole process. To understand referential joins in a bit more detail, let’s look at an example. Let’s say you have a table on the left of the join containing invoices. The table on the right contains the products you sold. These tables are joined with a referential join. Several reports query these tables. If the report asks for data from both tables, the referential join behaves like an inner join. If fields from the right table are requested, an inner join is executed. The interesting aspects of a referential join kick in when a report only requests fields from the left table containing the invoice data. When no data from the right table is requested, the join isn’t executed. This is how SAP HANA optimizes the performance of the queries. This case is very common in cubes, which are used in most analytics, reporting, and business intelligence scenarios. (We’ll explain cubes in the next sections.) The only exception to this is the rare instance where the cardinality is 1:n. In this case, the referential join has to execute the query, even if no fields from the right table are requested. In that case, it executes an inner join as usual. Inner joins are slower than referential joins as they will always execute the join, even if no fields from the right table are requested. Left outer joins perform similarly to referential joins, but they give different results. 191 192 Chapter 5 Information Modeling Concepts Referential joins depend on referential integrity. Even the name of the join is derived from this. If your application doesn’t ensure the referential integrity of your data, you shouldn’t use this type of join. In such a case, you might get different results, depending on whether the database query requests fields from the right table. Tip A referential join is the optimal join type in SAP HANA. Text Join Another SAP HANA-specific joint type is a text join. The name gives a clue as to when you’ll use this type of join. The right-hand table, as shown in Figure 5.5, is something like a text lookup table—for example, a list of country names. On one side, a Country code appears, such as US or DE. In the text lookup table, the same Country code appears that will link with the key and also contain the actual name of the country (e.g., United States or Germany). Field 1 Field n Country code DE US ZA Left Table Text Join Field 1 Language code Country name DE EN Germany DE DE Deutschland DE ES Alemania US EN United States ZA EN South Africa Country code Field n Right Table Figure 5.5 Text Join Key Concepts Refresher Chapter 5 Note SAP sells software in many countries, and SAP business systems are available in multiple languages. In Figure 5.5, the names of the countries are translated into different languages. The fourth column in the text lookup table, called the Language code, indicates into which language the country name has been translated. For example, for the country called DE and a language code of EN, the name of the country is in English and thus is Germany. If the language code was DE (for German), the name of the country would be Deutschland, and if the language code was ES (for Español, indicating Spanish), then the name of the country would be Alemania. Therefore, one country can have totally different names depending on what language you speak. In such a case, SAP HANA does something clever: By looking at the browser, the application server, or the machine that you’re working on, it determines the language in which you’ve logged in. If you’re logged in using English, it knows to use the name Germany. If you’re logged in using German, it provides the German country name Deutschland for you. The text join behaves like a left outer join but always has a cardinality of one-toone (1:1). In other words, it only returns a single country name based on the language you’re logged in with. Even if you’re logged in via mobile phone, your mobile phone has a certain language setting. Temporal Join The temporal join is used when we have FROM and TO date and time fields, or integers. For example, say you’re storing the fuel price for gasoline in a specific table. From the beginning of this month to the end of this month, gasoline sells for a certain price. Next month, the price differs, and, maybe two weeks later, it’s adjusted again. Later, you’ll have a list of all the gasoline prices for different time periods. As a car owner, you now want to perform calculations for how much you’ve paid for your gasoline each time you filled up your tank. However, you just have the number of gallons and the dates of when you filled up. You can say, “OK, on this date, I filled up the tank.” You then have to look up which date range your filling-up date falls within and find the price for that specific date. You can then calculate what you paid to fill your tank for each of your dates. Figure 5.6 illustrates this example. 193 194 Chapter 5 Information Modeling Concepts This date range lookup can be a little more complicated in a database. Sometimes, programmers read the data into an application server and then loop through the different records. In this case, a temporal join makes it easy because SAP HANA will perform the date lookups in the FROM and the TO fields automatically for you, compare it to the date you’ve supplied, and get you the right fuel price at that specific date. It will perform all the calculations automatically for you—a great time and effort saver. Field 1 Field n Date 27 February 14 April 13 May Left Table Temporal Join Field 1 From To Gas Price 1 January 28 February 12.22 1 March 1 April 11.45 2 April 25 April 11.50 26 April 10 May 12.77 11 May 25 May 11.33 Right Table—Pricelist Figure 5.6 Temporal Join for Gasoline Prices A temporal join uses an inner join to do the work. A temporal join requires valid FROM and TO dates (in the gasoline price table) and a valid date-time column (in your car logbook table). Spatial Join Spatial joins became available in SAP HANA 1.0 SPS 09. SAP HANA provides spatial data types, which we can use, for example, with maps. These all have the prefix ST_. A location on a map, with longitude and latitude, is an ST_POINT data type. (We’ll discuss spatial data and analytics further in Chapter 11.) Key Concepts Refresher Chapter 5 We can use special spatial joins between tables in SAP HANA (see Figure 5.7). Say you have a map location stored in a ST_POINT data type in one table. In the other table, you have a list of suburbs described in a ST_POLYGON data type. You can now join these two tables with a spatial join and define the spatial join to use the ST_ CONTAINS method. For each of the locations (ST_POINT), this will calculate in which suburb (ST_POLYGON) they are located (ST_CONTAINS). Field 1 Field n ST_POINT (x1, y1) (x2, y2) (x2, y3) Left Table Field 1 Spatial Join using method ST_CONTAINS ST_POLYGON Field n Right Table Figure 5.7 Spatial Join About a dozen methods like ST_CONTAINS are available for spatial joins. For example, you can test if a road crosses a river, if one area overlaps another (e.g., mobile phone tower reception areas), or how close certain objects are to each other. Dynamic Joins A dynamic join isn’t really a join type; it’s a join property. It’s also a performance enhancement available for SAP HANA. After you’ve defined a join, you can mark it as a dynamic join. Note For a dynamic join to work, you have to join the left and right tables on multiple columns. 195 196 Chapter 5 Information Modeling Concepts Let’s assume you define a static (normal) join on columns A, B, and C between two tables, as shown in Figure 5.8. In this case, the join criteria on all three columns will be evaluated every time the view is called in a query. Field 1 A B C Field n Left Table Dynamic Join Field 1 A B C Field n Right Table Figure 5.8 Dynamic Join on Multiple Columns If you now mark the join as dynamic, the join criteria will only be evaluated on the columns involved in the query to the view. If you only ask for column A in your query of the view, then only column A’s join criteria will be evaluated. The join criteria on columns B and C won’t be evaluated, and your system performance improves. Tip If your query doesn’t request any of the join columns in the dynamic join, it will produce a runtime error. Lateral Join We’ve discussed the join types that are used most often. You might also need to use some specialized join types occasionally. One of these is a lateral join. The lateral Key Concepts Refresher Chapter 5 join helps you do row-by-row processing of data using the database engine, instead of having to copy the data to the application server and processing it using loops. Joins like these improve the performance of complex queries by ten or a hundred times. SPS 04 In SAP HANA SPS 04, we get a lateral join. Say you want to join a left table to a subquery. Normally you would write the following: SELECT A, B FROM TABLE_LEFT INNER JOIN (SELECT C, D FROM RIGHT_ TABLE WHERE E < 100) ON … However, you have cases where you want to let the subquery depend on values in the left table. Instead of having E < 100 in the preceding statement, you want to use E < TABLE_ LEFT.VALUE. You basically want a FOR EACH loop through the left table, computing which rows should be joined from the right table. You can use a lateral join for this. The lateral join works together with the INNER and LEFT OUTER joins. We change the earlier line to the following: SELECT A, B FROM TABLE_LEFT INNER JOIN LATERAL (SELECT C, D FROM RIGHT_ TABLE WHERE E < TABLE_LEFT.VALUE) ON … We’ve now discussed most of the join types. In the next section, we discuss a concept that has become very important with SAP HANA modeling: CDS views. Core Data Services Views CDS is a modeling concept that has become important with SAP HANA 2.0 using SAP HANA extended application services, advanced model (SAP HANA XSA), as well as the new application programming model of SAP Cloud Platform. To understand why we need CDS, it’s important to remember the wider context of how SAP HANA systems are deployed. While learning SAP HANA, you work with a single system, but productive environments normally include development, quality assurance, and production systems. You perform development in the development environment, in which you have special modeling and development privileges. Your models and code then enter the quality assurance system for testing. When everyone is satisfied with the results, the models and code move to the production system. You don’t have any modeling and development privileges in the quality assurance and production systems; you therefore need another way to create views and other database objects in the production system, ideally without giving people special privileges. 197 198 Chapter 5 Information Modeling Concepts An SAP HANA system can be deployed as the database for an SAP business system. In such a case, ABAP developers in the SAP business system don’t have any modeling and development privileges, even in the SAP HANA database development system. We need a way to allow ABAP developers to create views and other database objects in the database system without these special privileges. With that in mind, let’s look at why CDS is a good idea with the following example: You need to create a new schema in your SAP HANA database. It’s easy to use a short SQL statement to do this, but remember the wider context. Someone will need the same create privileges in the production system! We want to avoid this because giving anyone too much authorization in a production system can weaken security and encourage hackers. CDS allows for a better way to create such a schema: specify the schema creation statement once by creating a CDS (text) file with the schema_name="DEMO"; instruction. CDS files have a .hdbcds file extension. When we deploy this CDS file to the production system, the SAP HANA system automatically creates the schema for us. In the background, SAP HANA automatically converts the CDS file into the equivalent schema creation SQL statement and executes this statement. The system administrators don’t get privileges to create any schemas in the production system. They have rights to import CDS files, but they have no control over the contents of the CDS files, for example, the name of the schema; that’s decided by the business. CDS thus gives us a nice way to separate what the business needs from what database administration can do. There’s much more to CDS. You can specify table structures (metadata) and associations (relationships) between tables with CDS. When deploying this CDS file to production, it creates the new tables with their relationships. Even more impressive, if you want to modify the table structures later, you just update the CDS file. When you redeploy the CDS file, SAP HANA doesn’t delete the existing tables and re-create them; it automatically generates and executes the correct SQL statements to modify only the minimal changes to the existing table structures, for example, adding only a new column. CDS doesn’t replace SQL statements because it ultimately translates back into SQL statements. CDS adds the awareness to use a table creation SQL statement the first time it’s run, but a table modification SQL statement the second time it’s run. The biggest advantage of this smart behavior is that the data already stored in a table is kept. If we ran a table creation statement the second time, it would delete the contents of the table. Not something we want in a productive system! Key Concepts Refresher Chapter 5 Note We don’t define the schema name when we create CDS artifacts such as tables. The schema name is added automatically and is the name of the SAP HANA Deployment Infrastructure (HDI) container into which the artifacts are deployed. We can also define new semantic types for when we create tables. Instead of using a DATE data type for a field, we can define a DELIVERY_DATE. On a lower level, it still uses a DATE type, but it gives us some business context (e.g., deliveries) when defining the field. If we later change the semantic type, for example, we add more decimal figures to a PAYMENT type, all tables automatically get updated to reflect the new behavior. We can also create views using CDS. These CDS views can be used as data sources in our SAP HANA information views. Note that these CDS views don’t replace the more powerful calculation views we can create in SAP HANA. (We discuss these in Chapter 6 and Chapter 7.) Calculation views are very good for analysis and reporting. Developers who create OData services and SAP Fiori screens sometimes prefer to stick to CDS views, using a text environment rather than the graphical tools. ABAP developers can also use CDS inside ABAP. When a CDS view is sent to the SAP HANA system, it creates the necessary SAP HANA database objects without the ABAP developer having to log in to the SAP HANA database. Because ABAP is database independent, the same CDS file sent to another database will create the database objects relevant to that database. Because of this database independence, the versions of CDS for ABAP and CDS for SAP HANA have slight differences. In SAP HANA 1.0, we used the SAP HANA tools to create tables. In SAP HANA 2.0 with HDI, we use CDS to create tables. The tables in a database are often referred to as the persistence layer. Tip If you want to learn CDS better, or are interested in learning more of SAP HANA XSA development, we recommend that you look at the series of 64 videos that Thomas Jung created on his private YouTube channel at http://s-prs.co/487601. The example screens in this book make use of the SAP HANA Interactive Education (SHINE) demo package described in Chapter 2. SHINE is a good example of CDS in action. You don’t have to create a schema, create tables, or import data into these tables. When SHINE is imported and deployed into your SAP HANA system, it automatically creates all that content for you by using CDS statements. This shows 199 200 Chapter 5 Information Modeling Concepts how using CDS in your SAP HANA projects keeps everything together—from the creation of the persistence layer, to views, to security. When you deploy a project to other systems, these artifacts are automatically created for you during the build process. Tip The SAP Cloud Application Programming Model relies on CDS and is important for modelers and developers going forward. The application programming model is a framework comprised of CDS, service Software Development Kits (SDKs) for working with services in Java and Node.js, cloud-native platform, and tools in SAP Web IDE working together with local tools on the same projects. You can read more about the application programming model at http://s-prs.co/v487602. On the page that appears, click on the Next> link to get a list of tutorials on this topic. SPS 04 SAP Cloud Application Programming Model is supported with SAP HANA 2.0 SPS 04. Now, let’s take our knowledge of these concepts to the next “dimension.” Cube One of the most important modeling concepts we’ll cover is the cube. We’ll expand on this concept when we discuss the information views available in SAP HANA. In this section, we’ll use a very simplistic approach to describing cubes to give you an understanding of the important concepts. When you look at a cube the first time, it looks like multiple tables that are joined together, which might be true at a database level. What makes a cube a cube, however, are the rules for joining these tables together. The tables you join together contain two types of data: transactional data and master data. As shown in Figure 5.9, the main table in the middle is called a fact table, which is where our transactional data is stored. The transactional data stored in the fact table is referred to as either key figures or measures. In SAP HANA, we just use the term measures. The side tables are called dimension tables, and this is where our master data is stored, which is referred to as characteristics, attributes, or sometimes even facets. Key Concepts Refresher Chapter 5 In SAP HANA, we just use the term attributes. In the following sections, we’ll walk through an example and then provide you with some definitions of key terms. Dimension Table Contains Measures Contains Attributes Fact Table Dimension Table Dimension Table Contains Attributes Contains Attributes Figure 5.9 Cube Features Example Our cube example will be simplified to communicate the most basic principles. In our example, we’ll work with the sentence “Laura buys 10 apples.” You’ll agree that this sentence describes a financial transaction: Laura went to the shop to buy some apples, she paid for them, and the shop owner made some profit. Where will we store this transaction’s data in the cube? There are a couple of rules that we have to follow. The simplest rule is that you store the data on which you can perform calculations in the fact table, and you store all other data on which you can’t perform calculations in the dimension tables. In our transaction of “Laura buys 10 apples,” we first look at Laura. Can we perform a calculation on Laura? No, we can’t. Therefore, we should put her name into one of the dimension tables. We put her name in the top dimension table in Figure 201 202 Chapter 5 Information Modeling Concepts 5.10. Laura isn’t the only customer of that shop; her name is stored with those of many other people in this dimension table. We’ll call this table the customer dimension table. Customer table Customer 2— Laura Financial transaction: Laura buys 10 apples Product table Customer 2—Laura, Product 7—Apples, Quantity—10 Product 7—Apples Figure 5.10 Financial Transaction Data Stored in a Cube Next, let’s look at the apples. Can we perform calculations on apples? No, we can’t, so again this data goes into a dimension table. The shop doesn’t sell only apples; it also sells a lot of other products. The second dimension table is therefore called the product dimension table. In an enterprise data warehouse (EDW), we would normally limit the number of dimension tables that we link to the fact table for performance reasons. Typically, in an SAP Business Warehouse (SAP BW) system, we limit the number of dimension tables to a maximum of 16 or 18. Finally, let’s look at the number of apples, 10. Can we perform a calculation on that? Yes we can, because it’s a number. We therefore store that number in the fact table. To complete the picture, if Laura buys more apples, then she is a top customer; let’s say she’s customer number 2. Apples are something the store keeps in stock, so apples are product number 7. If we store a link (join) to customer 2 and another link (join) to product 7 and the number 10 (number of apples), we would say that it constitutes a complete transaction. We can abbreviate the transaction “Laura buys 10 apples” to 2, 10, and 7 in Key Concepts Refresher Chapter 5 our specific cube (2 indicates Laura, 10 indicates the number of apples, and 7 the apples). Laura and apples are part of our master data. The transaction itself is stored in the fact table. We’ve now put a real-life transaction into the cube. Tip You can have master data without associated transactions. As the shopkeeper, you have apples (master data) for sale, but no one has bought them yet (no transaction). Similarly you can have customers such as Laura (master data) but no invoice yet. The reverse doesn’t apply, however. You can’t have an invoice without a customer or products. Hopefully, this gives you a better idea of what a cube is and how to use it. Let’s complete this short tour of cubes by clarifying a few terms ahead. Key Terms As previously shown in Figure 5.9, attributes are stored in dimension tables, and measures are stored in fact tables. Let’s define attributes and measures: 쐍 Attributes Attributes describe the parts of a transaction, such as the customer Laura or the product apples. This is also referred to as master data. We can’t perform calculations on attributes. 쐍 Measures Measures are something we can measure and perform calculations on. Measurable (transactional) information goes into the fact table. Other key terms to be aware of are as follows: 쐍 Fact tables in the data foundation Figure 5.10 showed only a single fact table, but sometimes we have more complex cubes that include multiple fact tables. In SAP HANA, we call a group of fact tables (or the single fact table) the data foundation, although some people refer to the entire group of tables as the fact table. Tip When we join the data foundation (fact tables) to the dimension tables, we always put the data foundation on the left side of the join. 203 204 Chapter 5 Information Modeling Concepts 쐍 Star joins The join between the data foundation and the dimension tables is normally a referential join in SAP HANA. We refer to this as a star join, which indicates that this isn’t just a normal referential join between low-level tables but also a join between a data foundation and dimension tables. This is seen as a join at a higher level. Remember You use star joins to join dimension tables to a fact table. With all the building blocks in place, let’s start examining the graphical SAP HANA information views. Calculation Views Let’s start our introduction to SAP HANA information views by looking back at what we learned when we discussed cubes. When we graphically build complex views in SAP, we call these calculation views. These are different from SQL views and CDS views. Calculations views can be far more powerful and complex than these other views. We’ll now take what we’ve learned, and see how SAP HANA implements this with different types of calculation views. We can then combine these calculation views to build complex information models. Types of Calculation Views SAP HANA has three types of calculation views: 쐍 Dimension views The first thing we need to build cubes is the master data. Master data is stored in dimension tables. Sometimes, dimension tables are created for the cubes. In SAP HANA, we don’t create dimension tables as separate tables; instead, we build them as views by using the master data tables. We join all the low-level database tables that contain the specific master data we’re interested in into a view. For example, we might join two tables in a view to build a new Products dimension in SAP HANA, and maybe another three tables to build a new Customers dimension. Key Concepts Refresher Chapter 5 We use calculation views of type dimension, also called dimension calculation views. In this book, we’ll refer to these type of views simply as dimension views. Just as the same master data is used in many cubes, we’ll reuse dimension views in many of the other SAP HANA information views. We can even build our own library of dimension views. 쐍 Star join views In the same way, instead of storing data in cubes, we can create another type of view in SAP HANA that produces the same results as a traditional cube. These views produce the same results as an old-fashioned, traditional cube called calculation views of type cube with star join. Terminology In this book, we’ll simply refer to these types of views as star join views, not as cube views, which would lead to confusion. This is because the third type of SAP HANA view is a calculation view of type cube. The results created by the star join views in SAP HANA are the same as that of a cube in an EDW, except that in a data warehouse, the data is stored in a cube. With SAP HANA, the data isn’t stored in a cube but is calculated when required. To build a star join view, follow these steps: – Create the dimension views you need as separate calculation views of type dimension. – In the calculation view of type cube with star join (star join view), join the transaction tables together to form the data foundation. – Join the data foundation (or single fact table) to the dimension calculation views using star joins. 쐍 Cube views This view type provides even more powerful capabilities. For example, you can combine multiple star join views (“cubes”) to produce a combined result set. In SAP BW, we call this combined result set a MultiProvider. In SAP HANA, we call these calculation views of type cube. Use cases for this type of view include the following: – In your holding company, you want to combine the financial data from multiple subsidiary companies. – You want to compare your actual expenses for 2019 with your planned budget for 2019. 205 206 Chapter 5 Information Modeling Concepts – You have archived data in an archive database that you want to combine with current data in SAP HANA’s memory to compare sales periods over a number of years. Using and Building the Information Model Now that you know how to build individual views, let’s examine how to use the different types of SAP HANA calculation views together in information modeling. Figure 5.11 provides an overview of everything you’ve learned in this chapter and how each topic fits together. SAP source system Non-SAP source system Database Database HTML5 browser Report or query Application server Data provisioning tool, e.g. SDI Table 1 Table 8 Table 2 Table 9 Table 3 Table 10 Table 4 Table 11 Table 5 Table 12 Customer dimension view Products dimension view Laura Apples Star Join (Logical Join) Quantity – 10 Star-join view (“Cube”) Union Star-join view (“Cube”) Calculation view of type cube Table 6 Data Foundation Table 7 Tables Star-join view (“Cube”) SAP HANA Figure 5.11 Different SAP HANA Calculation Views Working Together On the top left-hand side, you can see our source systems: an SAP source system and a non-SAP source system. SAP HANA doesn’t care from which system the data comes. In many projects, we have more non-SAP systems that we get data from Key Concepts Refresher Chapter 5 than SAP systems. We extract the data from these systems using various data provisioning tools, which we’ll look at in Chapter 9. The data is stored in row tables in these source systems. When we use data provisioning methods and tools, the data is put into column tables in the SAP HANA database, as illustrated in the bottom-left corner of Figure 5.11. Now, let’s start building our information models on top of these tables. If we want to build something like a cube for our analytics or reporting, we use SAP HANA calculation views. First, we need the equivalent of a dimension table, such as our product or customer dimension table. To do this, we’ll build a dimension view (a calculation view of type dimension). This is indicated in Figure 5.11 by Laura and Apples, which represent the customer and product dimension views. We create these dimension views the same way we create any other view: take two tables, join them together, select certain output fields, and save it. After the dimension views are created, we build a data foundation for the cube. The data foundation is for all the transactional data. The data foundation stores measures, such as the number 10 for the 10 apples that Laura bought. We build a data foundation in a similar fashion to building a dimension view via either multiple tables or a single table. If using multiple tables, we join them together, select the output fields, and build our data foundation. To complete the cube, we have to link our data foundation (fact tables) to the dimensions with a star join. Finally, to complete our star join view, we also select the output fields for our cube. The created star join view will produce the same result as a cube, but, in this case, no data is stored, and it performs much better. In the next step, we can create a calculation view of type cube, in which we combine two star join views (cubes). We can combine the star join views using a join or a union. We recommend a union in this case because it leads to better performance than a join. Finally, we can build reports, analytics, or applications on our calculation view, as shown in the top right of Figure 5.11. Let’s now think about the process from the other side and look at what happens when we call our report or application that is built on the calculation view on the right side of our diagram. This is the top-down view of Figure 5.11. The calculation view doesn’t store the data; instead, it has to gather and calculate all the data required by the report. SAP HANA looks at the calculation view’s description and recognizes that it has to build two star join views and union them together by first determining how to build the star join views. 207 208 Chapter 5 Information Modeling Concepts SAP HANA then goes down one level and determines that to build the first star join view, it needs a data foundation and two dimension views, and it then joins them together with a star join. Then, it goes down another level and determines that to build the dimension view, it must read two different tables, join them together, and select the output fields. SAP HANA then builds the two dimension views. Going back up one level, SAP HANA needs to combine these two dimension tables with a data foundation. It creates the data foundation and joins it with the two dimension tables using a star join. It also builds the second star join view in the same way. Going up to the level of the calculation view, it combines the two star join views with a union. Finally, SAP HANA sends the result set out to the report and then cleans out the memory again. (It doesn’t store this data!) If someone else asks for the exact same data five minutes later, SAP HANA happily performs the same calculations all over again. At first, this can seem very wasteful. Often we’re asked “Why don’t you use some kind of caching?” However, caching means that we store data in memory. In SAP HANA, the entire database is already in memory. We don’t need caching because everything is in memory already. The reason people ask this question is that in their traditional data warehouses, data normally doesn’t change until the next evening when the data is refreshed. Inside SAP HANA, the data can be updated continuously, and we always want to use the latest data in our applications, reports, and analytics. That means we have to recalculate the values each time; caching static result sets won’t work. In summary, to use real-time data, you have to recalculate as data gets updated. Other Modeling Artifacts We’ll now discuss some of the other modeling concepts we’ll use when building our information models. The power of calculation views of type cube is that you can build them up in many layers. In the bottom layer, you can perhaps have a union of two star join views (cubes), as shown in Figure 5.12. In the next (higher) layer, we sum up (aggregate) all the values from the union. In the level above that, we can rank the summed results to show the top 10 most profitable customers. Some of the artifacts we can use are discussed in the following subsections. Key Concepts Refresher Chapter 5 Ranking (e.g., Top 10) Aggregation (e.g., Sum) Union Star join view Customers Total Sales Customer 2 $500 Customer 1 $400 Customers Total Sales Customer 1 $400 Customer 2 $500 Customers Total Sales Total Sales Customer 1 $100 2018 Customer 1 $300 2019 Customer 2 $150 2018 Customer 2 $250 2019 Customers Sales for 2015 Customers Sales for 2016 Customer 1 $100 Customer 1 $300 Customer 2 $150 Customer 2 $350 209 Star join view Figure 5.12 Calculation View of Type Cube Built in Various Layers Projection A projection is used to change an output result set. For example, we can get a result set from a star join view (cube) and find that it has too many output fields for what we need in the next layers. Therefore, we use a projection as an additional inbetween layer to create a subset of the results. We can also use the projection to calculate additional fields—for example, sales tax—or we can rename fields to give them more meaningful names for our analytics and reporting users. Ranking As illustrated in the top node of Figure 5.12, we can use ranking to create useful topN or bottom-N reports and analytics. This normally returns a filtered list (e.g., only the top 10 companies) and is sorted in the order we specify. 210 Chapter 5 Information Modeling Concepts Aggregation The word aggregation means a collection of things. In SQL, we use aggregations to calculate values such as the sum of a list of values. Aggregations in databases provide the following functions: 쐍 Sum Adds up all the values in the list. 쐍 Count Counts how many values (records) there are in the list. 쐍 Minimum Finds the smallest value in the list. 쐍 Maximum Finds the largest value in the list. 쐍 Average Calculates the arithmetical mean of all the values in the list. 쐍 Standard deviation Calculates the standard deviation of all the values in the list. 쐍 Variance Calculates the variance of all the values in the list. Normally, we would only calculate aggregations of measures. (Remember that measures are values we can perform calculations on, which usually means our transactional data.) For example, we could calculate the total number of apples a shop sold in 2019. However, in SAP HANA, we can also create aggregations of attributes, essentially creating a list of unique values. Union A union combines two result sets. Figure 5.12, shown previously, illustrates this by combining the results of two star join views. In SAP HANA graphical calculation views, we’re not restricted to combining two result sets; we can combine multiple result sets. SAP HANA merges these multiple result sets into a single result set. Let’s say we have four result sets—A, B, C, and D—that we union together. Although we union multiple result sets, that doesn’t mean the performance will be bad. If SAP HANA detects that you’re not asking for data from A and D, it won’t even generate the result sets for A and D; it will only work with B and C. Key Concepts Refresher Chapter 5 211 Note The union expects the two result sets to have the same number of columns in the same order, and the matching columns from each result set must have similar data types. You can have variations in unions, such as the following: 쐍 Union all Merges the result sets and returns all records. 쐍 Union Merges the results and only returns the unique results. 쐍 Union with constant values Lets you “pivot” your merged results to help get the data ready for reporting output. Figure 5.13 compares the output generated by a union to that of a union with constant values. Customers Total Sales Total Sales Customer 1 $100 2018 Customer 1 $300 2019 Customers Sales for 2015 Sales for 2016 Customer 2 $150 2018 Customer 1 $100 $300 Customer 2 $250 2019 Customer 2 $150 $350 Standard union Union with constant values Customers Sales for 2015 Customers Sales for 2016 Customer 1 $100 Customer 1 $300 Customer 2 $150 Customer 2 $350 Figure 5.13 Standard Union Compared to Union with Constant Values Often, we want a report that looks like the top right of the figure. If we’re given the output from the standard union (top left), it takes more work to create a report with sales for 2018 and 2019 in separate columns. With the output from the union with constant values, writing the report is easy. 212 Chapter 5 Information Modeling Concepts Earlier in Figure 5.11, we showed that you should use a union rather than a join for performance reasons. But what happens if the two star join views that you want to union together don’t have the same output fields? In that case, the union will return an error. How do we solve this problem? Figure 5.14 illustrates an example. On top, we have the ideal case in which both star join views have matching columns. In the middle, we have a case in which only column B matches. Normally, we would use a join, as shown. We solve this problem by creating a projection for each of the two star join views. We add the missing columns (field C in the left table, and field A in the right table) and assign a NULL value to those new additional columns. We can then perform a union on the two projections because they now have matching columns. Inner Join Left Table Right Table A A A (left side) B B A (right side) C C B (left side) Union B (right side) C (left side) C (right side) Left Table A Right Table B B Inner Join B (left side) B (right side) C Union Left Table Right Table A (left side) A A = NULL B (left side) B B B (right side) C = NULL C C (right side) Join converted to union Figure 5.14 Standard Union, Join, and Join Converted to Union In SAP HANA, we can perform two more operations on the result sets of the left and right tables. Instead of just combining the two result sets in a union, we can Key Concepts Refresher Chapter 5 find out which records occur in both the result sets or, alternatively, which records occur only in one of the two result sets. Imagine you have a marketing campaign where you want to sell gaming headphones to customers. In this case, you might want to target only those customers who bought both a computer and a large wide-screen monitor. You’ll have one result set with customers who bought computers, and another result set with customers who bought large wide-screen monitors. Instead of using a union, you’ll now use an intersect operation for use in your marketing campaign. However, if your marketing campaign is to sell large wide-screen monitors to customers who bought computers but did not buy any screens, you would use a minus operation. We’ll look at these again in the next chapter. Semantics An important step that some modelers skip is to add more meaning to the output of their information models. Semantics, as a general term, is the study of meaning. For example, we can use semantics in SAP HANA to rename fields in a system so their purpose becomes clearer. Some database fields have really horrible names. Using semantics, you can change the name of these fields to be more meaningful for your end users. This is especially useful when users create reports. If you have a reporting background, you’ll know that a reporting universe can fulfill a similar function. We can also show that certain fields are related to each other, for example, a customer number and a customer name field. Another example that is used often is that of building a hierarchy. By using a hierarchy, you can see how fields are related to each other with regards to drilling down for reports. Hierarchies Hierarchies are used in reporting to enable intelligent drilldown. Figure 5.15 illustrates two examples: users can start with a list of sales for all the different countries; they can then drill down to the sales of the provinces (or states) of a specific country and finally down to the sales of a city. End users can see the sales figures for just the country, region (state or province), or city (town). Even though countries and cities are two different fields, they’re 213 214 Chapter 5 Information Modeling Concepts related to each other by use of a hierarchy. This extra relationship enhances the data and gives it more meaning (semantic value). Location level hierarchy Date level hierarchy Country Year Region (Province, State) Month City, Town Day Location level hierarchy Date level hierarchy 2019 South Africa Gauteng January Johannesburg 1 Pretoria 2 KwaZulu-Natal February 28 Durban Western Province Cape Town May 31 Figure 5.15 Examples of Level Hierarchies We’ll take a look at two different types of hierarchies: level hierarchies and parentchild hierarchies. The hierarchy we just described is the level hierarchy; level 1 is the country, level 2 is the state or province, and level 3 is the city or town. Another example of a level hierarchy is a time-based hierarchy that goes from a year to a month to a day. You can even keep going deeper, down to seconds. This is illustrated on the right side of Figure 5.15. HR employee organizational charts or cost center structures are examples of parent-child hierarchies, as illustrated in Figure 5.16. In this case, we join a table to itself, as we did when describing self-joins. Key Concepts Refresher Chapter 5 Parent-child hierarchy Employee ID CEO Employee name Employee surname … Personal Assistant … Manager ID … Manager 1 Manager 2 Employee 8 … Data for Employee 2 Manager-employee hierarchy Employee 2 CEO (is also employee 1) Manager 1 … Manager 1 (is also employee 2) Employee 3 … Employee 4 … Manager 2 (employee 5) Employee 1 (the CEO) Employee 6 … … Employee 7 Employee 8 Figure 5.16 Parent-Child Hierarchy Example Table 5.1 lists the hierarchies we discussed and when you should use which one. Level Hierarchies Parent-Child Hierarchies Fixed (rigid) number of levels are used in the hierarchy. Variable hierarchy depth is used. Different data fields are used on every level of the hierarchy (e.g., country, state, and city). The same (two) fields are used on every level of the hierarchy. The fields on each level of the hierarchy can be different The parent and child fields have data types (e.g., country can be text, while ZIP/postal the same data type. code can be numeric). Table 5.1 Comparing Level Hierarchies and Parent-Child Hierarchies Best Practices and Modeling Guidelines 215 216 Chapter 5 Information Modeling Concepts Let’s end our tour of modeling concepts with a summary of best practices for modeling views in SAP HANA. Many of the basic modeling guidelines that we find in traditional data modeling and reporting environments are still applicable in SAP HANA. We can summarize the basic principles as follows: 쐍 Limit (filter) the data as early as possible. It’s wasteful to send millions of data records across a slow network to a reporting tool when the report only uses a few of these data records. It makes a lot more sense to send only the data records that the report requires, so we want to filter the data before it hits the network. In the same way, we don’t want to send data to a star join view when we can filter it in the earlier dimension view—hence, the rule to filter as early as possible. 쐍 Perform calculations as late as possible. Assume you have column A and column B in a table. The table contains billions of records. We can add the values of column A and column B in a formula for every data record, which means the server performs billions of calculations and finally adds up the sum total of all these calculated fields. It would be much faster to calculate the sum total for column A and the sum total for column B and then calculate the formula. In that case, we perform the calculation once instead of billions of times. We delay the calculation as long as possible for performance reasons. Important Terminology In this chapter, we focused on various modeling concepts and how they’re used in SAP HANA. We introduced the following important terms: 쐍 Views We started by looking at views and realized that we can leverage the fact that SAP HANA already has all the data in memory, that the servers have lots of CPU cores available, and that SAP HANA runs everything in parallel. We therefore don’t need to store the information in cubes and dimension tables but can use views to generate the information when we need it. This also gives us the advantage of real-time results. 쐍 Joins We have both normal database join types, such as inner and left outer joins, and SAP HANA-specific types, such as referential, text, temporal, and spatial joins. In the process, we also looked at the special case of self-joins and at dynamic joins as an optimization technique. Practice Questions Chapter 5 쐍 Core data services (CDS) CDS provides a way to separate business semantics and intent from database operations. 쐍 Cubes A cube consists of a data foundation made up of fact tables that contain the transactional data and that are linked to dimension tables that contain master data. The data foundation is linked to the dimension tables with star joins. 쐍 Attributes and measures Attributes describe elements involved in a transaction. Transactional data that we can perform calculations on is called a measure and is stored in the fact tables. 쐍 Calculation views In SAP HANA, we take the concept of views to higher levels. We can create dimension tables and cubes as dimension views and star join views. Dimension views are calculation views of type dimension. Star join views are calculation views of type cube with a star join. The final type of information view is the calculation view of type cube, which we can use to perform even more powerful processes. 쐍 Modeling artifacts Inside our views, we can use unions, projections, aggregations, and ranking. Unions are quite flexible and can be used for a direct merge of result sets, or we can use a union with constant values to create report-ready result sets. Practice Questions These practice questions will help you evaluate your understanding of the topics covered in this chapter. The questions shown are similar in nature to those found on the certification examination. Although none of these questions will be found on the exam itself, they will allow you to review your knowledge of the subject. Select the correct answers, and then check the completeness of your answers in the “Practice Question Answers and Explanations” section. Remember, on the exam, you must select all correct answers and only correct answers to receive credit for the question. In this section, we have a few questions with graphics attached. In the certification exam, you might also see a few questions that include graphics. 217 218 Chapter 5 Information Modeling Concepts 1. In which SAP HANA views will you find measures? (There are 2 correct answers.) 첸 A. Calculation view of type dimension 첸 B. Calculation view of type cube with star join 첸 C. Calculation view of type cube 첸 D. Database views 2. True or False: A database view stores data in the database. 첸 A. True 첸 B. False 3. A traditional cube is represented by which SAP HANA view type? 첸 A. CDS view 첸 B. SQL view 첸 C. Calculation view of type cube with star join 첸 D. Calculation view of type dimension 4. A referential join gives the same results as which other join type? 첸 A. Inner join 첸 B. Left outer join 첸 C. Spatial join 첸 D. Star join 5. Which join type makes use of date ranges? 첸 A. Spatial join 첸 B. Text join 첸 C. Temporal join 첸 D. Inner join 6. If we change the transaction “Laura buys 10 apples” to “Laura buys 10 green apples,” how would we store the color green? 첸 A. Store green as an attribute. 첸 B. Store green as a measure. Practice Questions Chapter 5 첸 C. Store green as a spatial data type. 첸 D. Store green as a CDS view. 7. True or False: A view always shows all the available columns of the underlying tables. 첸 A. True 첸 B. False 8. You have a view with two tables joined by a left outer join. If you redesign the view and accidently swap the two tables around, what should you do to the join? 첸 A. Keep the left outer join. 첸 B. Change the join to a text join. 첸 C. Change the join to a right outer join. 첸 D. Change the join to a referential join. 9. What do you call the data displayed in the data foundation of an SAP HANA information view? 첸 A. Facets 첸 B. Measures 첸 C. Characteristics 첸 D. Key figures 10. You’re writing a mobile application for the World Series. You have details about all the baseball players and the baseball scores for all the previous games. How do you use the data of the player information and the scores? 첸 A. Both the player information and the scores are used as master data. 첸 B. Both the player information and the scores are used as transactional data. 첸 C. The player information is used as master data, and the scores are used as transactional data. 첸 D. The player information is used as transactional data, and the scores are used as master data. 219 220 Chapter 5 Information Modeling Concepts 11. Look at Figure 5.17. You’re selling books that have been translated into various languages. What join type should you use? 첸 A. Left outer join 첸 B. Text join 첸 C. Temporal join 첸 D. Referential join Field 1 Field n Book ID Left Table What join should you use here? Field 1 Book ID Field n SPRAS Title EN CN AR FR RU Right Table Figure 5.17 What Type of Join Should You Use? 12. Which of the following are reasons you should use core data services (CDS) to create a schema? (There are 2 correct answers.) 첸 A. The database administrator might type the name wrong. 첸 B. The database administrator should not get those privileges in the production system. 첸 C. The focus is on the business requirements. 첸 D. The focus is on the database administration requirements. Practice Questions Chapter 5 13. Look at Figure 5.18. A company sends out a lot of quotes. Some customers accept the quotes, and they’re invoiced. The company asks you to find a list of the customers that did NOT accept the quotes. How do you find the customers that received quotes but did NOT receive invoices? 첸 A. Use a right outer join. Filter to show only the NULL values on the right table. 첸 B. Use a left outer join. Filter to show only customers on the right table. 첸 C. Use a left outer join. Filter to show only the NULL values on the right table. 첸 D. Use a right outer join. Filter to show only customers on the left table. Quote ID … Customer ID Left Table—Quotes What join should you use here? Invoice ID Customer ID … Right Table—Invoices Figure 5.18 Find the Customers That Received Quotes but Not Invoices … 221 222 Chapter 5 Information Modeling Concepts 14. True or False: When you use a parent-child hierarchy, the depth in your hierarchy levels can vary. 첸 A. True 첸 B. False 15. Look at Figure 5.19. What type of join should you use? (Hint: Watch out for distractors.) 첸 A. Temporal join 첸 B. Text join 첸 C. Referential join 첸 D. Inner join Field 1 Field n Person Sarah Katrin Amanda Left Table—Persons What join should you use here? Field 1 From airport To airport Language code Amanda Johannesburg Frankfurt EN Katrin Frankfurt JFK DE Sarah JFK Madrid ES Sarah Madrid London EN Amanda London Cape Town EN Right Table—Flights Figure 5.19 What Type of Join Should You Use? 16. You have a list of map locations for clinics. The government wants to build a new clinic where there is the greatest need. You need to find the largest Practice Questions Chapter 5 223 distance between any two clinics. With your current knowledge, how do you do this? 첸 A. Use a temporal join to find the distance between clinics. Use a union with constant values to “pivot” the values. 첸 B. Use a spatial join to find the distance between clinics. Use a level hierarchy for drilldown. 첸 C. Use a dynamic join to find the longest distance between clinics. Use a ranking to find the top 10 values. 첸 D. Use a spatial join to find the distance between clinics. Use an aggregation with the maximum option. 17. Look at Figure 5.20. What are the values for X, Y, and Z? (Note: You won’t see a question like this on the certification exam; it’s only added here to test your understanding.) Customers Ranking for Bottom 10 Average Sales X Y Customers Aggregation (Average) Average Sales Customer 1 Customer 2 Sales Customers Union Star join view Z Total Sales Customer 1 2018 Customer 1 2019 Customer 2 2018 Customer 2 2019 Customers Sales for 2015 Customers Sales for 2016 Customer 1 $100 Customer 1 $300 Customer 2 $200 Customer 2 $600 Figure 5.20 Find Values for X, Y, and Z Star join view 224 Chapter 5 Information Modeling Concepts 첸 A. X = Customer 2, Y = 400, Z = 800 첸 B. X = Customer 1, Y = 400, Z = 400 첸 C. X = Customer 1, Y = 800, Z = 800 첸 D. X = Customer 2, Y = 200, Z = 400 18. You have to build a parent-child hierarchy. What type of join do you expect to use? 첸 A. Relational join 첸 B. Temporal join 첸 C. Dynamic join 첸 D. Self-join 19. You have two result sets: one of customers who booked the venue for an event, and another of customers who want flowers for the event. What do you use to find the customers who booked both the venue and flowers? 첸 A. Union 첸 B. Minus 첸 C. Intersect 첸 D. Inner join 20. Look at Figure 5.21. What join type should you use to see all the suppliers? 첸 A. Left outer join 첸 B. Right outer join 첸 C. Inner join 첸 D. Referential join Practice Question Answers and Explanations Chapter 5 Field 1 … Supplier ID 1 5 3 Left Table—Purchases What join should you use to see all the suppliers? Supplier ID Supplier name 1 Supplier A 2 Supplier B 4 Supplier D 5 Supplier E 7 Supplier G … Column n Right Table—Suppliers Figure 5.21 What Join Should You Use to See All Suppliers? Practice Question Answers and Explanations 1. Correct answers: B, C You find measures (and perhaps attributes) in calculation views of type cube with star join and calculation views of type cube. Calculation views of type dimension don’t contain measures—only attributes. 2. Correct answer: B False. A database view doesn’t store data in the database. 3. Correct answer: C A traditional cube is represented by calculation view of type cube with star join. It’s different from a cube in that they don’t store any data like a cube does. A calculation view of type dimension is similar to a dimension table. CDS and SQL views aren’t like cubes either. 225 226 Chapter 5 Information Modeling Concepts 4. Correct answer: A A referential join gives the same results as an inner join but speeds up the calculations in some cases by assuming referential integrity of the data. It’s the optimal join type in SAP HANA. 5. Correct answer: C A temporal join requires FROM and TO fields in the table on the right side of the join and a date-time column in the table on the left side of the join. 6. Correct answer: A We’ll store the color green as an attribute. We can’t “measure” or perform calculations on the color green. Green isn’t a spatial data type. CDS views aren’t relevant in this context. 7. Correct answer: B False. A view doesn’t always show all the available columns of the underlying tables. You have to select the output fields of a view. The word always in the statement is what makes it false. 8. Correct answer: C A right outer join is the inverse of the left outer join. Therefore, if you reverse the order of the tables, you can “reverse” the join type. It’s important to note that this doesn’t work for a text join. A text join is only equivalent to a left outer join. For a text join, you have to always be careful about the order of the tables. 9. Correct answer: B The data displayed in the data foundation of an SAP HANA information view is called measures. In SAP HANA, we talk about attributes and measures. The other names might be used in other data modeling environments, but not in SAP HANA. 10. Correct answer: C The player information is used as master data, and the scores are used as transactional data. You can’t perform calculations on the players. Therefore, this data is used in the dimensions and is seen as master data. You definitely perform calculations on the scores, so these are used in the fact tables, which means they are transactional data. 11. Correct answer: B Figure 5.17 shows the SPRAS column and a list of languages. The need for a translation is also hinted at in the question, meaning this should be a text join. There isn’t enough information to select any of the other answer options. Practice Question Answers and Explanations Chapter 5 12. Correct answers: B, C You should use CDS to create a schema because the database administrator should not get those privileges in the production system, and the focus should be on the business requirements. 13. Correct answer: C Look carefully at Figure 5.18. The question asks you to find all the customers who received quotes but did not receive invoices. The answer only gives us left outer or right outer as options. Because the question states they want all customers from the quotes (left) table, you might try a left outer join first. Now look at Figure 5.3 to see the difference between the left and right outer joins. Look especially at where the NULL values are shown. With a left outer join, the NULL values are found in the right table. With a right outer join, the NULL values are found in the left table. We want to find the customers who did not get invoices. With a left outer join, the NULL values are found on the right side: that group is the one we’re looking for. Therefore, filter for the NULL values on the right table. Note Any time a negative phrase or word is used in the certification exam, it’s shown in capital letters to bring attention to the fact that it’s a negative word. 14. Correct answer: A True. Yes, the depth of your hierarchy levels can vary when you use a parentchild hierarchy. 15. Correct answer: C You should use a referential join. There are two distractors to try and throw you off track: A temporal join uses FROM and TO fields but only for date-time fields. This example is for airports. The second distractor was to try to get you to select the text join. The language code has nothing to do with translations but has to do with the airline’s primary language. Choose a referential join because you can see the same people traveling in both tables, so it seems that these two tables have referential integrity. 227 228 Chapter 5 Information Modeling Concepts If the tables didn’t have referential integrity, you would have chosen an inner join. 16. Correct answer: D Only a spatial join can give the distance between different points on a map. After you have the distances, there are two ways you can find the largest distance between any two clinics: Use a ranking to find the top 10 values. Use an aggregation with the maximum option. 17. Correct answer: B See the full solution in Figure 5.22. X = Customer 1, Y = 400, Z = 400. Note that we’re using average in the aggregation node, not sum. In the ranking node, we’re asking for the bottom 10, not the top 10. Ranking for Bottom 10 Aggregation (Average) Union Star join view Customers Total Sales Customer 1 200 Customer 2 400 Customers Total Sales Customer 1 200 Customer 2 400 Customers Total Sales Year Customer 1 100 2018 Customer 1 300 2019 Customer 2 200 2018 Customer 2 600 2019 Customers Sales for 2015 Customers Sales for 2016 Customer 1 $100 Customer 1 $300 Customer 2 $200 Customer 2 $600 Figure 5.22 Full Solution: X = Customer 1, Y = 400, Z = 400 Star join view Summary Chapter 5 18. Correct answer: D You expect to use a self-join when you build a parent-child hierarchy. Refer to Figure 5.16 for an illustration. 19. Correct answer: C You use an intersect operation to find the customers that booked both the venue and flowers. 20. Correct answer: B To see everything on the right table, use a right outer join. Note that these tables don’t have referential integrity. If we had asked the question a little differently, this could have been important. Takeaway You should now have a general understanding of information modeling concepts: views, joins, cubes, star joins, data foundations, dimension tables, and so on. In this chapter, you learned about the differences between attributes and measures and the value of CDS in the development cycle. You’ve seen the various types of views that SAP HANA uses, how to use them together, and what options you have available for your information modeling. Finally, we examined a few best practices and guidelines for modeling in SAP HANA in general. Summary You’ve learned about many modeling concepts and how they are applied and implemented in SAP HANA. You’re now ready to go into the practical details of learning how to use the various SAP HANA calculation views and how to enhance these views with the various modeling artifacts that we have available to us in each of these different views. 229 Chapter 6 Calculation Views Techniques You’ll Master 쐍 Learn to create SAP HANA calculation views 쐍 Identify which type of SAP HANA calculation view to use 쐍 Explore the various types of calculation views 쐍 Know how to select the correct joins and nodes 쐍 Get to know the value of semantics in information modeling 232 Chapter 6 Calculation Views In the past two chapters, we discussed the modeling concepts and tools required for building SAP HANA information views. In this chapter and the next, we’ll build on that foundation. We’ll focus on how to graphically create information views in the SAP HANA system and how to enhance these views with additional functionality. In this chapter, you’ll learn to create the different types of calculation views, select the correct nodes for what you want to achieve, and use the Semantics node to add additional value for business users. Real-World Scenario You start a new project in which a business wants to expand the range of services it offers its customers. You’re responsible for building the information models in SAP HANA. You have to build about 50 new SAP HANA information views. As you plan your information views, you notice several master data tables that you’ll use. By creating a few well-designed dimension views, you cater not only to the needs of this project but also to future requirements. You see this as a “library” of reusable dimension views. While you’re busy designing these dimension views, you also gather some analytics and reporting requirements. Most of the analytics requirements are addressed by building star join views and calculation views of type cube. You address some of the reporting requirements by delivering semantically enhanced information models that are easier for your colleagues and end users to work with. The company wants to create a mobile app. Working together with some of the developers, you quickly create a few calculation views to provide them with information that would have taken them hours to create via code. The performance of the calculation views you created are also very good, which delights the mobile app users. Objectives of This Portion of the Test The purpose of this portion of the SAP HANA certification exam is to test your knowledge of modeling SAP HANA information views. Key Concepts Refresher Chapter 6 The certification exam expects you to have a good understanding of the following topics: 쐍 Creating SAP HANA information views 쐍 Calculation views of type dimension 쐍 Calculation views of type cube with star join 쐍 Calculation views of type cube 쐍 Selecting the correct joins and nodes to produce accurate results 쐍 Choosing the correct type of SAP HANA information view 쐍 Understanding the value that semantics adds to the information-modeling process Note This portion contributes more than 10% of the total certification exam score. Key Concepts Refresher In Chapter 4, we examined SAP HANA modeling tools, and, in Chapter 5, we looked at modeling concepts. Now, it’s time to put them together, implementing the modeling concepts with the tools to create SAP HANA calculation views. There are three types of SAP HANA calculation views, which we briefly covered in Chapter 5. First are dimension views, for the master data in the system. All data in these views are treated as attributes, even numeric fields. Second, star join views let you build cube-like information models. Third, calculation views of type cube let you build enhanced multidimensional data structures and information models. Tip We refer to calculation views of type dimension as dimension views and to calculation views of type cube with star join as star join views. The last category is called calculation views of type cube. The last category can cause some confusion because some people and older web pages still refer to the last category simply as calculation views. In this book, we use the latest official name—calculation views of type cube—because that’s what they’re referred to in the SAP HANA certification exam. We use the name calculation views to refer to all three types of calculation views. 233 234 Chapter 6 Calculation Views In this chapter, we start by briefly reviewing the different types of data sources that can be used by calculation views. We then dive into the creation process for the different calculation views before discussing the various types of nodes you’ll encounter, including the Semantics node. Data Sources for Calculation Views SAP HANA calculation views can use data from a wide variety of sources. We’ll discuss some of the data sources in later chapters, so don’t worry if you don’t yet know what all of them are. The main data sources are as follows: 쐍 Tables in the SAP HANA system—both row-based and column-based The different types of column tables, for example, history column tables and temporary tables, are all supported. 쐍 SAP HANA calculation views in the SAP HANA system We can build new calculation views on top of other existing views. For example, you can use dimension views as data sources in a star join view. 쐍 SQL views Most databases have the ability to create simple views using SQL code. We discussed SQL views in Chapter 5. 쐍 Table functions Table functions are written with SQLScript code and return calculated data. The format of that data looks similar to the format of data read from a table. 쐍 Virtual tables In these tables, data from other databases isn’t stored in SAP HANA but is fetched when and if required. 쐍 Core data services (CDS) You can use these sources, for example, CDS views, to deploy table definitions together with a calculation view to the production system. 쐍 Data sources from another tenant in the SAP HANA system These data sources are available if SAP HANA is set up as a multitenant database container (MDC) system. These can include tables, SQL views, and calculation views from another tenant, provided that the proper authorizations are in place. Key Concepts Refresher Chapter 6 Calculation Views: Dimension, Cube, and Cube with Star Join We create calculation views using graphical modeling. The aim is to do as much modeling work as possible with graphical calculation views. As SAP HANA has matured, there has been less emphasis on writing code when creating information models. In this section, we’ll walk through the steps for creating these calculation views. We’ll use the SAP HANA Interactive Education (SHINE) demo application and, where possible, give you the name of the modeling objects we used. You can use this to follow along practically in your own SAP HANA system. Creating Calculation Views You create calculation views in your Workspace in SAP Web IDE for SAP HANA. Select and right-click on the SAP HANA database module and folder in which you want to create the calculation view. You’ll see a context menu like the one shown in Figure 6.1. Figure 6.1 Context Menu for Creating SAP HANA Calculation Views 235 236 Chapter 6 Calculation Views Click New • Calculation View. The options for creating different types of calculation views appear in the popup dialog box, as shown in Figure 6.2. Figure 6.2 Creating Calculation Views in SAP Web IDE for SAP HANA At the top of the dialog box, give the calculation view a Name and a Label (description). Use the Data Category dropdown list and the With Star Join checkbox to create the different types of calculation views. The following instructions explain how to create each type of calculation view (except for calculation of type time dimension 4, which we will not go into detail on here): 1 Calculation view of type cube Select the CUBE option in the Data Category dropdown list, but don’t select the With Star Join checkbox. The default view node will be an Aggregation node. These views can also be used for multidimensional reporting and data analysis purposes. Key Concepts Refresher Chapter 6 2 Calculation view of type cube with star join Select the CUBE option in the Data Category dropdown list, and select the With Star Join checkbox. The default view node will be a Star Join node. These views can be used for multidimensional reporting and data analysis purposes. 3 Calculation view of type dimension Select the DIMENSION option in the Data Category dropdown list. The default view node will be a Projection node. These views don’t support multidimensional reporting and data analysis. You can use dimension calculation views as data sources for other calculation views. Click the Create button to create the calculation view. If you select the DEFAULT option in the Data Category dropdown list, you’ll create a type of calculation view that can only be used internally in SAP HANA or consumed by SQL. An external client tool—for example, a reporting tool—won’t be able to use this calculation view because the DEFAULT calculation view doesn’t create the hierarchy views used by the reporting tools when you build the calculation view. If you’re not using hierarchies, or the calculation view will only be used by reporting tools that can’t use hierarchies, you can use the DEFAULT calculation view type. SPS 04 To be clearer on what the limitation is of the DEFAULT calculation view type, it’s now named SQL Access Only in SAP HANA 2.0 SPS 04. Creating Time Dimension Calculation Views You can also create a special type of dimension view, called a time dimension view. When creating a dimension calculation view, you can specify the Type as Time, as shown in the bottom-right corner of Figure 6.2, shown previously. Additional fields then become available, where you can specify the Calendar (Gregorian or Fiscal) and the Granularity (e.g., yearly). In Figure 6.3, we see an example of a time dimension view. Note This example is found in the SHINE demo application; the calculation view is called TIME_DIM. (The example here was modified to a time dimension view.) 237 238 Chapter 6 Calculation Views Figure 6.3 Creating Time-Based Calculation Views in SAP Web IDE for SAP HANA The time-based calculation view adds a time dimension to information views. Remember that dimensions contain attributes that describe a transaction. If we add a time and date when the transaction happens, we’ve added another attribute (time). These time attributes are stored in a time dimension. We can use this to enhance the information model with drilldown capabilities on a time hierarchy. We’ll discuss hierarchies in more detail in Chapter 7. Working with a Calculation View A calculation view includes different working areas. A calculation view of type dimension is shown in Figure 6.4 in SAP Web IDE for SAP HANA. Tip In SHINE, look for PO_ITEM. The join type should actually be a text join. Key Concepts Refresher Chapter 6 239 Output fields Select nodes Tab menu Navigation Details of the selected node, for example tables and joins Properties Nodes with semantics node on top Figure 6.4 Working with Calculation Views Work from left to right in this layout. The left toolbar, called the tool palette, shows the different type of nodes that are available to you. In the left work area, you’ll see different view nodes. The top view node is always the Semantics node. In the Semantics node, you can rename fields to give them more meaning (i.e., semantics). You can also create hierarchies in the Semantics node. (In Chapter 7, we’ll discuss adding hierarchies to SAP HANA views.) The node below the Semantics node is called the default view node. The default node is the topmost node used for modeling in the view. Figure 6.5 illustrates the default nodes for the different types of calculation views: 1 Type dimension 2 Type cube with start join 3 Type cube You can add other types of nodes below the default node; for example, Figure 6.4 shows two Join nodes below the default Projection node of the dimension view. 240 Chapter 6 Calculation Views Figure 6.5 Default View Nodes for Different Types of Calculation Views In the left screenshot for the dimension view in Figure 6.5, you can see the names of the available nodes in the tool palette toolbar. This toolbar is normally collapsed as shown in the other two examples. SPS 04 The right side shows what the toolbar looks like in SAP HANA 2.0 SPS 04. The redesigned icons will keep their high-resolution quality, even when zoomed in. Note The Non-Equi Join, the Hierarchy Function, and the Anonymization icons in the toolbar are new in SAP HANA 2.0 SPS 03. If you don’t see these icons in your system, edit the hidden file named .hdiconfig. The top entry should be "plugin_version" : "2.0.30.0". Tip Always work bottom-up when building SAP HANA calculation views: start at the nodes at the bottom and work your way up to the default node. The output of each node becomes the input of the node above it. The final output of the view, which is the output of the default node, is shown in the Semantics node. When you select a View node, the information for the selected node is shown in the area on the right. In Figure 6.4, shown previously, we selected the second Join Key Concepts Refresher Chapter 6 node. In the middle area on the right side, you can see the tables in this Join node and how they are joined together. When you select the Columns tab in Figure 6.4, you can see all the selected output fields of that Join node. In the PROPERTIES section below the output, you can also see the properties of whatever object, such as a field or join, you’ve currently selected in the working area. Adding Nodes You can add new nodes to the SAP HANA information view by selecting a node type from the tool palette on the left side and dropping it onto the work area. Drag the node around on the work area to align it with other nodes. When the node lines up with another node, an orange line indicates that the node is aligned (see Figure 6.6). Link the nodes by selecting the node, grabbing the round arrow button, and dragging it to the circle at the bottom of the node above it. Figure 6.6 Adding and Aligning Join Nodes to the Dimension View Adding Data Sources After you’ve added a node to the work area, you can add data sources to it. Select the Join node. A popup overlay menu with a green plus sign and a tooltip (Add Data Source) will appear, as shown in Figure 6.7. Click on the plus sign. A dialog box will appear asking you which data sources you want to add to the node. Enter a few characters of the table’s name to search for any data sources in the SAP HANA database module that match those characters. Figure 6.7 shows the list of tables 241 242 Chapter 6 Calculation Views matching the search for employee data sources. Then, select the data sources, and click on Finish to add them to the node. If you click on the Create Synonym button, you can specify the details of the synonyms required. This is part of security (Chapter 13). If you just click on Finish after selecting the data source, the synonyms are created without asking you for details. Figure 6.7 Adding Data Sources to a View Node Creating Joins After you’ve added the data sources to the data foundation, you’ll want to join the data sources together. Create a join by taking the field from one table and dropping it onto a related field of another table. Figure 6.8 shows what this looks like in SAP Web IDE for SAP HANA. Key Concepts Refresher Chapter 6 Go to the PROPERTIES area to select the Join Type from the dropdown 1, and set the Cardinality 2. Make sure you have the join (the line between the tables) selected as shown; otherwise, the properties won’t show the correct information. Figure 6.8 Creating a Join and Setting Properties in SAP Web IDE for SAP HANA Tip When you create a join, get into the habit of always dropping the field from the main table onto the related field of the smaller lookup table. For some join types, it makes no difference in which direction you do the drag and drop. With a text join, however, it matters. The join types that are available in calculation views are as follows: 쐍 Referential join 쐍 Inner join 쐍 Left outer join 243 244 Chapter 6 Calculation Views 쐍 Right outer join 쐍 Full outer join 쐍 Text join 쐍 Dynamic join 쐍 Spatial join Selecting Output Fields The next step is to select the output fields of your View node. Select the Mapping tab, as shown in Figure 6.9. Drag and drop the fields from the Data Sources on the left to the Output Columns on the right. The data sources are stacked vertically, so if you don’t see the field name you’re looking for, just scroll down until you find it. Figure 6.9 Selecting Fields as Output Columns for the View Node’s Output Figure 6.10 shows what a list of output fields looks like in SAP Web IDE for SAP HANA. Key Concepts Refresher Chapter 6 Figure 6.10 List of Output Fields for the View Node When you select one of the fields in the output columns, you can see the properties of that field in the PROPERTIES section underneath the Columns tab area (refer to the overview screen in Figure 6.4). You can see the Name and Data Type of the column in Figure 6.11. You can change column names to make them more meaningful for end users and reporting tools. You can see this in Figure 6.10, where the EMPLOYEEID field’s Name is selected. The Mapping property ensures that SAP HANA still knows what the original field is in the data source. Figure 6.11 Properties of Output Fields You can also change the data type, specify whether the attribute is a key attribute, or determine whether the field should be hidden or not. 245 246 Chapter 6 Calculation Views Often, you’ll want to select all the fields coming from a lower node as the output fields of the node. In this case, right-click the name of the data source, and select Add To Output, as shown in Figure 6.12. Figure 6.12 Selecting All Fields from the View Node as Output Fields An alternative way of selecting output fields is to drag the name of the data source to the Output Columns. All the fields in the data source will be added to the Output Columns on the right. The Keep Flag When we take another look at Figure 6.10, we see that the right-side column is named Keep Flag. This flag is available for each output column in any of the View nodes, but it’s particularly important when we aggregate data. In older SAP HANA systems, this was called the Calculate Before Aggregation flag. We use this flag when the level of granularity has an impact on the outcome of a measure, that is, when the order in which you do things matter. Let’s look at an example by considering the sales data in Table 6.1. Key Concepts Refresher Chapter 6 Month Cost per Item Number of Items January $50 10 January $5 100 Table 6.1 Cost per Item Data Needing the Keep Flag If we first add the columns, we get $55 for the cost per item column and 110 for the number of items column. When we then multiply these values, we get an aggregate value of $6,050, which isn’t correct. We should first do the multiplication because a cost per item value is only relevant for that specific item. When we multiply the values first, we get a cost of $500 for both rows. When we now create the aggregate (sum), we get the correct value of $1,000. This is applicable to formulas where the sequence in which you do them makes a difference, for example, with multiplication and division. In this case, you should do the calculations before creating the aggregate. Practically, it means that you’ll put the following formula in a View node below the Aggregation node. A Projection node is a good choice for this. Cost per Item × Number of Items Finalizing the Calculation View In Chapter 5, we discussed how to build SAP HANA information views in layers, with each node building on the previous ones. After you’ve finished building the calculation view, there are three final steps to follow: 1. Click the Save button above the Workspace area (see Figure 6.13 1). You can also select the Save option from the File menu 1, or press (Ctrl)+(S) on your keyboard. 2. You then need to Build the calculation view. (We discussed the entire build process in Chapter 4.) You can select Build Selected Files from the File menu, the Build menu, or from the context (right-click) menu 2. 3. Preview your data by selecting the Data Preview option from the menu. (We discussed the preview functionality in Chapter 4.) 247 248 Chapter 6 Calculation Views Figure 6.13 Save and Build the Calculation View Congratulations! You now know how to create calculation views in SAP HANA. Working with Nodes In Chapter 5, we looked at the modeling concepts of joins, star joins, projection, ranking, aggregation, and union. All of these options are available in calculation views as View nodes. You can add these View nodes from the tool palette toolbar on the left of the main work area, as shown earlier in Figure 6.12. These are the basic building blocks to use in the different layers of your calculation views. Key Concepts Refresher Chapter 6 Let’s take a quick look at the different type of nodes, working in the order they appear in the tool palette toolbar. Join Node The earlier examples shown in Figure 6.4 and Figure 6.12 used a Join node. Join nodes are used at the bottom layers to get data sources into the view. Join nodes have only two inputs, so if you want to add five tables in the calculation view, you’ll need four Join nodes. Because of this, joins tend to be highly stacked. (You can see an example later in this chapter in Figure 6.17.) To alleviate this pain somewhat, multi-join capabilities were added to Join nodes in SAP HANA 2.0 SPS 03. You can now add three tables into a Join node, as seen in Figure 6.14. In the Join node properties, you can specify which one of the three tables is the Central Table, and if the Multi Join Order is Outside In or Inside Out. Figure 6.14 A Multi-Join When there were just two tables per Join node, you always knew which tables got evaluated first, as the processing is always bottom-up. The tables in the lower node 249 250 Chapter 6 Calculation Views get evaluated first. When using the new multi-join functionality, you have to specify the priority sequence. When you specify Outside In, the join furthest from the central table is executed first, whereas with Inside Out, the closest join is executed first. There are cases where the result set is the same regardless of the order in which joins get executed, for example, when all the joins used are inner joins. However, if you mix inner joins and left outer joins, the result set can vary depending on the join execution sequence. Non-Equi Join Node Another new feature that was introduced with SAP HANA 2.0 SPS 03 is the NonEqui Join node. In this case, the join condition isn’t represented by an equal operator, but by comparison operators such as greater than, less than, less and equal to, and so on. So when would you use a non-equi join? In Figure 6.15, we’re looking at orders. Figure 6.15 A Non-Equi Join Node Key Concepts Refresher Chapter 6 The left table contains the order information, and the right table contains the order items. These tables are linked by the PurchaseOrderID field. For auditing purposes, we might want to know which orders were updated after the delivery date. We create a new join link between the tables with a non-equi join where the History.ChangeDat field is later than (greater than) the DeliveryDate field. You’ll notice in Figure 6.15 that the Join Type in the PROPERTIES area is an Inner join. You can use inner, left outer, right outer, and full outer join types. The Operator property for the join should be Greater Than in this case. You can set this property for each non-equi join pair. Tip For non-equi joins, you can’t set additional flags, for example, to specify dynamic joins. Union Node In Chapter 5, we explained the difference between a standard union and a union with constant values. Figure 6.16 shows a union with four data sources in the Mapping tab. These sources are combined into a list of output fields on the right side. Tip Look in SHINE for the Customer_Discount_by_Ranking_and_Region calculation view. From the context menu of a field in the list of Output Columns on the right, you can select the Manage Mappings option to open the Manage Mapping popup, where you can edit the current mappings. You can also set up a Constant Value for one of the data sources. In this case, it will change the standard union to a union with constant values. In Figure 6.16, you can see this in action as we have four different SALES columns that were created as a result—one for each region. Tip Union nodes in SAP HANA can have more than two data sources. In contrast, Join nodes can only have two data sources (see Figure 6.17), or three data sources in the case of a multi-join. 251 252 Chapter 6 Calculation Views Figure 6.16 Union Node The Constant Value column in the Manage Mapping popup can be used, for example, to identify each data source of a Union node. This is especially useful in the case of a standard union to identify from which data source each record originates. A constant value of Y for table Y and Z for table Z will show either of these constant values in a Constant field. We can use the Constant Value column for pruning (trimming) of data. Let’s take the example where you combine data from two data sources using a union. One data source has the historical data, whereas the other data source has the current data. By specifying constant values of Historical and Current, respectively, you can create queries where you can specify where the data should come from. By specifying the data must be from the table where the Constant Value is Current, the Union node will ignore the historical data. This improves the performance of the calculation view. Key Concepts Refresher Chapter 6 Figure 6.17 Two Inputs Allowed for a Join Node, and Multiple Inputs Allowed for a Union Node SAP HANA includes even more performance enhancements for the Union node. You can make unions run faster by using a pruning configuration table. In this table, you can specify pruning conditions (e.g., filters) to be applied to some columns. For example, a pruning condition can specify that the year must be 2019 or later. This filters the data and complies with our first modeling guideline: limit (filter) the data as early as possible. You can specify the Pruning Config Table in the Advanced section of the Semantics node. When you select the data source of a Union node in the Mapping tab, you can see the Properties of that data source in the area below, as shown in Figure 6.18. You’ll notice that we also have an Empty Union Behavior selection. 253 254 Chapter 6 Calculation Views Figure 6.18 Empty Union Behavior By default, we don’t expect any data returned from a data source if the query doesn’t need data from it. Sometimes, however, it might be useful to know that a certain data source didn’t return any data. To activate this, you have to set a Constant Value for each data source to uniquely identify it. You then choose Row with Constants for the Empty Union Behavior. When no data is expected from a data source, it will still return one record where all fields have null values except the unique Constant Value that identifies the data source. We can use this to identify data sources that didn’t return any data to our query. Key Concepts Refresher Chapter 6 SPS 04 While we’re looking at the Mapping tab of the Union node, we can quickly note a new feature of SAP HANA 2.0 SPS 04. In Figure 6.18, you see the PROPERTIES of the second data source. When you click on the Union node itself, the PROPERTIES section will show the options you can set for the entire node. In SPS 04, you have a new option called Partition Local Execution, as shown in Figure 6.19. This enables you to control the level of parallelization based on processed table content, for example, by perform currency conversion or aggregation in parallel for every country or time period in a query. Figure 6.19 Enabling Parallelization in a Node in SAP HANA 2.0 SPS 04 You start the parallelization process by selecting the column that controls the level of parallelization at the lowermost Union node. The level of the parallelization is based on data, as a parallel thread is started for each distinct value of the specified column. If you don’t select a column, the parallelization uses table partitioning. In a later (higher) Union node that is fed by only one source, you switch it off again. You have to switch the parallelization on and off in the same view; it won’t work across views. You can only have one parallelization block per view. Intersect and Minus Nodes In SAP HANA, we have two more operations we can perform on the result sets of the left and right tables. Instead of just combining the two result sets in a union, we can use an Intersect node to find out which records occur in both result sets. Alternatively, we use a Minus node to find out which records occur only in one of the two result sets, eliminating records that occur in both nodes or only in the other node. If you remember the marketing campaign in the previous chapter where you wanted to sell gaming headphones to customers. In this case, you might want to target only those customers who bought both a computer and a large wide-screen monitor. You’ll have one result set with customers who bought computers, and another result set with customers who bought large wide-screen monitors. 255 256 Chapter 6 Calculation Views Instead of using a union, you’ll now use an Intersect node for use in your marketing campaign. Figure 6.20 shows this example, where the Category from Projection_1 is computers, and the Category from Projection_2 is large wide-screen monitors. Note that both these projections get their data from the same Join node. Each projection then uses a unique filter to get the required information. Figure 6.20 An Intersect Node However, if your marketing campaign is to sell such monitors to customers who bought computers but have not yet bought screens, you would use a Minus node. Another example could be where you want to find all orders that haven’t been invoiced yet. Graph Node The Graph node is an interesting addition to the tool palette. It delivers functionality that you would not normally expect in information modeling. (We’ll discuss graph modeling in Chapter 11.) Graph modeling is used when you want to analyze relationships between entities, for example, analyzing who are currently “friends” on Facebook, and using that to calculate what the chances are of two unrelated people becoming Facebook friends in the future. Key Concepts Refresher Chapter 6 For this, you’ll need a table with people on Facebook, and another table with the existing “friends” relationships between these people. You then need to create a graph workspace linking these two tables. The Graph node must always be the bottom (leaf) node in a calculation view, as shown in Figure 6.21. It uses the graph workspace as its data source. In addition, you have to ensure that the referential integrity remains on your data when using graph modeling. In the SAP HANA graph reference guide on SAP Help, they use the example of relationships between various gods and deities in Greek mythology. Figure 6.21 Graph Node Projection Node Projection nodes can be used to remove some fields from a calculation view. If you don’t send all the input fields to the output area, the fields that aren’t in the output area won’t appear to client tools. You can also use Projection nodes to rename fields, filter data, and add new calculated columns to a calculation view. (We’ll discuss filter expressions and calculated columns in Chapter 7.) 257 258 Chapter 6 Calculation Views Aggregation Node The Aggregation node in calculation views can calculate single summary values from a large list of values; for example, you can create a sum (single summary value) of all the values in a table column. These summary values are useful for high-level reporting, dashboards, and analytics. (The functionality is similar to the GROUP BY statement in SQL.) The Aggregation node in calculation views also supports all the usual functions: sum, count, minimum, maximum, average, standard deviation, and variance. For performance reasons, be cautious when using the last three functions (average, standard deviation, and variance) on very large stacked calculation views. You normally calculate aggregates of measures, but in SAP HANA, you also can create aggregates of attributes, which basically creates a list of unique values. Tip Watch out when creating average, standard deviation, and variance aggregates in stacked scenarios as this could affect the performance of your calculation views. Rank Node We can use Rank nodes to create top N or bottom N lists of values to sort results. In the Definition tab of a Rank node, you can access extra properties to specify the Rank node behavior. Figure 6.22 shows the Rank node area. The Sort Direction field allows you to specify whether you want a top N or bottom N list. The Order By field allows you to specify what you want to sort the list by, such as the Product_Price. In this case, the top 10 most expensive items will be shown. The Threshold field specifies how many entries you want in the list. If you specify a Fixed value of “10”, you’ll produce a top 10 or bottom 10 list. You can also choose to set this value from an Input Parameter, which means that the calculation view will ask for a value each time a query is run on this view. (We’ll discuss input parameters in Chapter 7.) You can also partition the data by a specific column. In the example in Figure 6.22, we selected the Country column. In this case, we’ll get the top 10 most expensive items for each country. Key Concepts Refresher Chapter 6 Figure 6.22 Rank Node By default, these partition columns are always used. When you select the Dynamic Partition Elements checkbox, SAP HANA will ignore any of the specified partition columns that aren’t included in the query. In other words, if your query doesn’t include the Country column (in Figure 6.22) in the query, SAP HANA will ignore the Country partition column. Finally, and most importantly, you can specify that a new output column must be generated. You can then use the column created by Generate Rank Column in analysis or reporting tools to sort the results. SPS 04 The Rank node is the one node that had the biggest update with SAP HANA 2.0 SPS 04. Just do a quick compare of Figure 6.22 (SPS 03) with Figure 6.23 (SPS 04). 쐍 The Generate Rank Column and the Logical Partition area are still the same. 쐍 The Order By column was expanded to a Sort Column area to allow for multiple sort columns. 쐍 The Sort Direction is renamed to Result Set Direction. 쐍 The Threshold value is now Target Value. 259 260 Chapter 6 Calculation Views Figure 6.23 Rank Node in SAP HANA 2.0 SPS 04 There are three new values as well: Aggregation Function, Result Set Type, and Offset. The Aggregation Function is where most of the new functionality lies. This gives you additional ranking functionalities that allow you to select records based on relevance, for example, select the last value for each day. There are four options, as shown by the dropdown menu: 쐍 Row This aggregation function automatically numbers the rows within a partition of a result set in sequence, without taking any additional factors into account. The first row of each partition is assigned a rank of 1. If two numbers are the same, there is no guarantee of which one will be consistently ranked before another. 쐍 Rank This option is the same as that available in SPS 03. 쐍 Dense Rank This function is closely related to the default Rank function. The difference comes in when two records in the list have the same ranking values. The default Rank function uses “Olympic” ranking. If two competitors have the same scores, they share a place, but a gap is left. For example, if two people have the same score on the second position, Key Concepts Refresher Chapter 6 they both have a rank of 2. There is then a gap (there is no rank of 3), and the next position has a rank of 4. With the Dense Rank function, the next position won’t have a rank of 4, but of 3. There are therefore no gaps in the ranking. 쐍 Sum The Sum function will keep showing records until the sum of the records is below the target value. You specify a range by setting the Target Value and Offset values. The range will start where the Rank Column value is larger than the Offset value, and will continue while the Rank Column value is less or equal to the Offset value + the Target Value. This can be used to report data on quantiles, such as quartiles. The Sum function will only use the first Sort Column. The Result Set Type is based on either the rank number (Absolute) or on the percentage (a value between 0 and 1). If you want to ignore the first five values in a ranking, you simply specify an Offset of “5”. In this case, we use a Result Set Type based on the rank number (Absolute). Table Function Node A Table Function node is code written in SQL or SQLScript which returns a result set that looks and behaves as if it came from a table with data. We’ll discuss table functions in more detail in Chapter 8. Hierarchy Function Node The Hierarchy Function node is a new feature in SAP HANA 2.0 SPS 03. Hierarchy functions enable you to build information models with hierarchical data stored in tables. Such tables typically have data records arranged in a tree or directed graph. We’ll discuss hierarchies and hierarchy functions in the next chapter. Anonymization Node The Anonymization node is also a new feature in SAP HANA 2.0 SPS 03. A lot of business data contains personal or sensitive information, and many large corporates seem to show very little respect for your data. They seem reluctant to get rid of this data as they want to use it for statistical analysis. The anonymization methods in SAP HANA 2.0 SPS 03 still allow you to gain statistically valid insights from your data but also protect the privacy of individuals. It works by modifying individual records without changing the aggregated statistics. For example, you 261 262 Chapter 6 Calculation Views can “scramble” salary data so that no one can find out a particular individual’s salary, but the aggregates such as sum, average, and so on still stay the same. A method such as k-anonymity is an intuitive and widely used method for modifying data for privacy protection. It hides the individual record in a group of similar records, thus significantly reducing the possibility that the individual can be identified. The topic of data privacy is often treated as a security topic and will therefore be discussed in Chapter 13. Star Join Node The Star Join node, which is the default View node in star join views, is used only to join a data foundation with fact tables to dimension views. Tip The join types available in the Star Join node are the same as the normal joins in a Join node, for example, referential joins. It’s what they’re joining that differs. Figure 6.24 shows a Star Join node. Note Look in SHINE for the Test calculation view. The middle block in the Join Definition tab is the data foundation, which comes from a Join node with two fact tables. The other blocks are calculation views of type dimension. Note Only calculation views of type dimension are supported as the dimension data sources in the Star Join node of a calculation view. The Star Join node is the default node of calculation views of type cube with star join. Use Join nodes, shown as Join_1 in Figure 6.24, to join all the transactional tables together. The uppermost Join node contains all the measures. The output from this last Join node becomes the data foundation for the Star Join node. Figure 6.24 only has one Join node called Join_1 for the data foundation. Then add the required dimension views, which have to be calculation views of type dimension, into the Star Join node and join them to the data foundation. Key Concepts Refresher Chapter 6 Figure 6.24 Star Join Node The star joins between the fact tables and dimensions can use the standard join types, for example, inner joins, referential joins, left outer joins, and even full outer joins. Tip A full outer join can be defined only on one dimension view used in a Star Join node, and this dimension view must appear last, just before the Star Join node. Tip You can only create temporal joins in a Star Join node and by specifying the join as an inner join. In that case, a new section appears below the join properties where you can set the Temporal Properties. Semantics Node Semantics is the study of meaning. In the Semantics node, you can give more meaning to the data itself. One basic example is renaming fields. Some database 263 264 Chapter 6 Calculation Views fields have really horrible names—but in the Semantics node, you can change the names of these fields to make them more meaningful for end users. This is especially useful when users have to create reports. If you have a reporting background, you’ll know that a reporting universe can fulfill a similar function. The Semantics node includes four different tabs, as shown in Figure 6.25. We’ll only discuss the first two tabs in this chapter; the other two tabs will be discussed in Chapter 7. Figure 6.25 Tabs in the Semantics Node View Properties Tab The first tab in the Semantics node is the View Properties tab. Here, as shown in Figure 6.26, you can change the properties of the entire calculation view. The View Properties tab has two subtabs, called General and Advanced, as shown in Figure 6.26. In the Data Category property, you can see the type of calculation view. In this case, it’s a DIMENSION view. In addition, the Default Client property is very specific to data from SAP Business Suite systems. In such SAP NetWeaver systems, we work with the concept of different clients. This use of client doesn’t refer to a customer or as part of client/server. In this case, we’re talking about an SAP system database being subdivided into different clients. When you log in to an SAP NetWeaver system, it asks not only for your user name and password but also for a client number. You then are logged into the corresponding client. You can only see the data in that client—not everything inside the SAP database. You can think of a client in SAP systems as a “filter” that prevents you from seeing all the data. Because SAP HANA is used as the database for many different SAP systems, SAP HANA must also allow for this concept of limiting data to a specific client. By setting the session client, you can say what the default client number is for this calculation view. Key Concepts Refresher Chapter 6 Figure 6.26 View Properties Tab in the Semantics Node 265 266 Chapter 6 Calculation Views The Session Client option will filter the data to display only data from the specific SAP client you’re using for your session, for example, from a reporting tool. Instead of using one of these two options, you can also explicitly specify the SAP client number that you want to display data for, for example, “800”, by selecting the Fixed Client option. Tip Settings in the Semantics node of a calculation view takes precedence over client settings made elsewhere, for example, the default client property for an SAP Web IDE user. For the Apply Privileges setting, you can choose between no privileges or SQL Analytic Privileges. (We’ll discuss these options in more detail in Chapter 13 when discussing security.) We recommend always using SQL Analytic Privileges for production systems. Setting the Deprecate checkbox doesn’t mean that the calculation view won’t work any longer; it will merely warn people that this calculation view has now been succeeded by a newer version somewhere in the system and that they should use the newer version because this one will disappear someday in the future. When you add this calculation view in another data model, you’ll see a warning not to use this calculation view. The calculation view will still work as usual, however. In the Advanced subtab, you can set the Execute In to SQL Engine. You’ll only set this preference when developing SAP HANA Live calculation views. More information on SAP HANA Live calculation views can be found in Chapter 14. Normally you shouldn’t set any of the Advanced properties. Columns Tab The next tab in the Semantics node is the Columns tab (see Figure 6.27). In this tab, the Type column shows whether the field is a measure or an attribute. This is very helpful to analytics and reporting tools to ensure that the data is shown and manipulated in the appropriate formats; for example, numeric fields (measures) can be aggregated. Renaming fields is easy. In the Columns tab, click the Name of the field, and type the new name. The same rule applies for the Label. The label is used, for example, when creating SAP HANA extended application services, advanced model (SAP HANA XSA) and HTML5 applications, or in some reporting tools. Key Concepts Refresher Chapter 6 Figure 6.27 Columns Tab in the Semantics Node Renaming fields is more common than you might expect. You tend to use field names like Price in an isolated context when you create tables for order, sales, or discounts. When you later start using these tables together, for example, in a Star Join node, you’ll have multiple fields called Price. SAP HANA will helpfully offer to give these different fields an alias, which is a different name for each of the price variations. You can create an alias such as OrderPrice, or the SAP HANA system will create an alias such as Price_1 for you. You might want to hide fields if you don’t want to show the original field to the end user. Even if a field is Hidden, it can be still used in a formula or a calculation; the field will still be available internally in SAP HANA for the calculation view but won’t be shown to the end users. In the Semantics column, you can further specify the type of field. An Assign Semantics dialog box will appear where you can select the type of field. These types 267 268 Chapter 6 Calculation Views are mostly specific to SAP HANA, and they can range from currency to dates to spatial. We’ll look at these semantic types in more detail in Chapter 7. Below the Columns tab name, you can see a toolbar as shown in Figure 6.28. The first three icons are used to mark the fields as attributes or measures. The fourth icon, as illustrated, is used to extract semantics. Let’s assume you’ve created a few dimension views with some hierarchies. You now create a new star join view, where you use the dimension views. You can use the Extract Semantics feature to replicate the hierarchies from the underlying dimension views. With this feature, you don’t have to create similar hierarchies in the star join view, which will save you a lot of time. You can also use it to copy labels, label columns, aggregation types, and semantic types from the referenced (underlying) calculation view, as shown in Figure 6.28. Figure 6.28 Extracting Semantics Important Terminology Chapter 6 Lastly, let’s take a quick look at the Show Lineage option accessed via the fifth icon on the toolbar (Figure 6.29). To use this option, select an output column, and click on the Show Lineage icon. The left side shows in each node where the column occurred. Go down the layers of nodes, and you can see that the Supplier_Country column was introduced by the Addresses table in the second Join node. This works even if the column got renamed along the way. Figure 6.29 Using the Show Lineage Option to Find the Node an Output Column Originated From Important Terminology In this chapter, the following terminology was used: 쐍 Data sources SAP HANA information views can use data from a wide variety of sources. The main data sources are as follows: – Row-based and column-based tables 269 270 Chapter 6 Calculation Views – SAP HANA information views in the SAP HANA system – Decision tables – SQL views – Table functions – Virtual tables with data from other databases – CDS – Data sources from another tenant in the SAP HANA system, if SAP HANA is set up as an MDC system 쐍 Aggregation node Used to calculate single summary values from a large list of values; supports sum, count, minimum, maximum, average, standard deviation, and variance. 쐍 Graph node Used for modeling relationships between entities and querying calculations, such as the shortest path between two entities. 쐍 Join node Used to get data sources into information views. 쐍 Projection node Used to remove some fields from a calculation view, rename fields, filter data, and add new calculated columns to a calculation view. 쐍 Rank node Used to create top N or bottom N lists of values to sort results. 쐍 Star join node Used to join a data foundation with fact tables to dimension views. The join types available in the star join node are the same as the normal joins in a join node. It’s what they’re joining that differs. 쐍 Union node Used to combine multiple data sources. You can make unions run even faster by using a pruning configuration table. 쐍 Intersect node Used to find the records that occur in all the data sources. 쐍 Minus node Used to find the records that are found only in one of the data sources. 쐍 Semantics node Used to give meaning to the data in information views. Practice Questions Chapter 6 쐍 Calculation view of type dimension Information views used for master data. 쐍 Calculation view of type cube with star join Views used for multidimensional reporting and data analysis purposes. 쐍 Calculation view of type cube Views used for more powerful multidimensional reporting and data analysis purposes. Practice Questions These practice questions will help you evaluate your understanding of the topics covered in this chapter. The questions shown are similar in nature to those found on the certification examination. Although none of these questions will be found on the exam itself, they will allow you to review your knowledge of the subject. Select the correct answers, and then check the completeness of your answers in the “Practice Question Answers and Explanations” section. Remember, on the exam, you must select all correct answers and only correct answers to receive credit for the question. 1. For what type of data are SAP HANA calculation views of type dimension used? 첸 A. Transactional data 첸 B. Master data 첸 C. Materialized cube data 첸 D. Calculated data 2. Which of the following node types can you use to build a dimension view? (There are 3 correct answers.) 첸 A. Star Join node 첸 B. Aggregation node 첸 C. Data Foundation node 첸 D. Semantics node 첸 E. Union node 271 272 Chapter 6 Calculation Views 3. Which types of calculation views can you create? (There are 2 correct answers.) 첸 A. Standard 첸 B. Derived 첸 C. Materialized 첸 D. Time-based 4. In which direction do you build calculation views when including the source tables in Join nodes? 첸 A. From the Join nodes up to the Semantics node 첸 B. From the Semantics node up to the default node 첸 C. From the Join node down to the default node 첸 D. From the Semantics node down to the Join nodes 5. You rename a field in the Semantics node to add clarity for end users. What does SAP HANA use to keep track of what the original field name is? 첸 A. The label column property 첸 B. The mapping property 첸 C. The label property 첸 D. The name property 6. You want to use a field in a formula but don’t want to expose the values of the original field to the end users. What do you do? 첸 A. Make the field a key column. 첸 B. Change the data type of the field. 첸 C. Rename the field. 첸 D. Mark the field as hidden. 7. Where can you set the default value for the SAP client for the entire calculation view? 첸 A. In the Join node of the calculation view 첸 B. In the semantic layer of the calculation view 첸 C. In the default node of the calculation view 첸 D. In the Preferences option of the Tools menu Practice Questions Chapter 6 8. True or False: When you mark a calculation view as deprecated, that view no longer works. 첸 A. True 첸 B. False 9. What node can you use for sorting results, for example, to show the top 100? 첸 A. Intersect node 첸 B. Projection node 첸 C. Rank node 첸 D. Union node 10. For what SQL keyword is the Aggregation node the equivalent? 첸 A. ORDER BY 첸 B. GROUP BY 첸 C. DISTINCT 첸 D. UNIQUE 11. What is the default node for a calculation view of type cube? 첸 A. Aggregation node 첸 B. Projection node 첸 C. Star join node 첸 D. Join node 12. What node is used to link dimension views to a data foundation (fact tables)? 첸 A. Join node 첸 B. Projection node 첸 C. Union node 첸 D. Star Join node 13. What data source is available for Join nodes in calculation views? 첸 A. All procedures in the current SAP HANA system 첸 B. Scalar functions from all SAP HANA systems 273 274 Chapter 6 Calculation Views 첸 C. Calculation views from other tenants in an MDC system 첸 D. Web services from the Internet 14. Which join types are available in a Join node of a calculation view? (There are 2 correct answers.) 첸 A. Referential joins 첸 B. Spatial joins 첸 C. Cross joins 첸 D. Temporal joins 15. From where does the extract semantics feature get the definition of hierarchies? 첸 A. From the current calculation view 첸 B. From the table data sources 첸 C. From the underlying dimension views 첸 D. From the SQL view data sources 16. You want to see in a Union node which data sources return no results for a query. How do you achieve this? 첸 A. Set the is Null property in the Manage Mapping dialog. 첸 B. Create a union pruning configuration table. 첸 C. Create a standard union. 첸 D. Set the Empty Union Behavior property. 17. You want to ignore columns defined in the Rank node that aren’t included in the query. What setting do you set? 첸 A. Default Client 첸 B. Keep Flag 첸 C. Dynamic Partition Elements 첸 D. Hidden Practice Question Answers and Explanations Chapter 6 18. True or False: A table function can be used as a data source for a calculation view. 첸 A. True 첸 B. False 19. True or False: You can use dynamic joins in a Non-Equi Join node. 첸 A. True 첸 B. False 20. Where in a calculation view can you add a Graph node? 첸 A. Only as the top node 첸 B. Always as the bottom (leaf) node 첸 C. Anywhere from the bottom (leaf) node to the default node 21. What node uses a special configuration table to optimize performance? 첸 A. Union node 첸 B. Aggregation node 첸 C. Non-Equi Join node 첸 D. Intersect node Practice Question Answers and Explanations 1. Correct answer: B You use SAP HANA calculation views of type dimension for master data. Transactional data needs a calculation view marked as Cube. SAP HANA doesn’t store or use materialized cube data. You can’t calculate data when you only work with attributes; you need measures to do that. 2. Correct answers: B, D, E You use an Aggregation node, a Union node, and a Semantics node to build a dimension view. Dimension views can be used in a star join, but it doesn’t form part of the actual dimension view. Data Foundation nodes don’t exist in calculation views. They used to exist in attribute and analytic views, which have since been deprecated. 275 276 Chapter 6 Calculation Views 3. Correct answers: A, D You can create standard and time-based calculation views. Derived views have been deprecated and were used with attribute views. SAP HANA doesn’t use materialized views. 4. Correct answer: A You build calculation views (when including the source tables in Join nodes) from the Join nodes up to the Semantics node. The Semantics node is above the default node, which is above the Join nodes. You always start from the bottom and work up. 5. Correct answer: B SAP HANA uses the mapping property to keep track of what the original field name is when you rename a field in the Semantics node. 6. Correct answer: D Mark a field as Hidden when you want to use a field in a formula but don’t want to expose the values of the original field to end users. 7. Correct answer: B You set the default value for the SAP client for the entire calculation view in the semantic layer of the calculation view. You can’t set properties for the entire view in the join or default nodes of the calculation view. The Preferences option in the Tools menu doesn’t contain such a setting. 8. Correct answer: B False. When you mark a calculation view as deprecated, that view will still work. 9. Correct answer: C You use a Rank node to sort results. Projection nodes have calculated columns (which we’ll discuss in Chapter 7), but aren’t used to sort results, for example, to show the top 100. Intersect and Union nodes combine records from multiple data sources. 10. Correct answer: B Aggregation nodes are equivalent to the GROUP BY keyword. 11. Correct answer: A The default node for a calculation view of type cube is an Aggregation node. A Projection node is the default node of a calculation view of type dimension. A Star Join node is the default node of a calculation view of type cube with star join. 12. Correct answer: D A Star Join node is used to link dimension views to a data foundation (fact tables). Practice Question Answers and Explanations Chapter 6 13. Correct answer: C Calculation views from other tenants in an MDC system are available as data sources for Join nodes in calculation views. Scalar functions don’t return table result sets, and procedures don’t have to generate any result sets. (For more information, see Chapter 8.) 14. Correct answers: A, B Referential joins and spatial joins are available in a Join node of a calculation view. Temporal joins are only available in Aggregation nodes and Star Join nodes. 15. Correct answer: C The Extract Semantics option can get the definition of hierarchies from the underlying dimension views. Table data sources and SQL view data sources don’t have hierarchies. 16. Correct answer: D You set the Empty Union Behavior property if want to see in a Union node which data sources return no results for a query. You create a union pruning configuration table for performance reasons. For the Empty Union Behavior to work, you need to set constant values. A standard union doesn’t use constant values. 17. Correct answer: C You set the Dynamic Partition Elements checkbox if you want to ignore columns defined in Rank node that aren’t included in the query. The Default Client setting is used in the Semantics node for the entire calculation view. The Keep Flag and the Hidden flag are set for each field in the Output Columns tab on all nodes. 18. Correct answer: A True. A table function can be used as a data source for a calculation view. 19. Correct answer: B False. You can’t use dynamic joins in a Non-Equi Join node. Tip Some questions in the exam will only have three answer options, like the following question. 277 278 Chapter 6 Calculation Views 20. Correct answer: B A Graph node must always be the bottom (leaf) node. 21. Correct answer: A Union nodes can use a special configuration table to optimize performance for union pruning. Takeaway You should now know how to create SAP HANA information views and when to create each type of calculation view based on your business requirements. You’ve learned how to add data sources, create joins, add nodes, and select output fields. In Table 6.2, we define the different information views and provide a list of their individual features. Calculation Views of Type Dimension Calculation View of Type Cube with Star Join Calculation View of Type Cube Purpose: Create a dimension Purpose: Create a cube for for our analytic modeling our analytic requirements needs Purpose: Create more advanced information models 쐍 Treats all data as attributes Focuses on extra functionality, such as aggregations, joins, projections, ranking, and unions Combines fact tables with dimensions 쐍 Uses mostly joins Table 6.2 Information View Features Summary You learned how to create calculation views and add data sources, joins, nodes, and output columns. You added meaning to the output columns in the Semantics node and set the properties for the view. In the next chapter, we’ll discuss how to further enhance our information views with additional functionality, such as filter expressions, calculated columns, currency conversions, and hierarchies. Chapter 7 Modeling Functions Techniques You’ll Master 쐍 Learn how to define calculated and restricted columns 쐍 Examine filter operations and filter expressions 쐍 Identify when to create variables and input parameters 쐍 Understand currency and currency conversions 쐍 Know how to create hierarchies for use in reporting tools 쐍 Use Hierarchy Function nodes to create hierarchies for use inside the calculation views 280 Chapter 7 Modeling Functions In previous chapters, you learned how to build SAP HANA calculation views. In this chapter, you’ll see how to enhance calculation views with additional functionality, including adding output fields, filtering data, converting currencies, asking for inputs at the time they are run, and implementing hierarchies. Real-World Scenario You’re working on an SAP HANA modeling project and are asked to help with publishing the company’s financial results for the year. You copy some of the calculation views that you’ve created before and start adding additional features to them. First, you filter the data to the last financial year and to the required company code. You then implement currency conversions on all the financial transactions, so that the results can be published in a single currency. The board members aren’t sure if they want the results in US dollars or in euros, so they ask you to give them a choice when they run their queries on the SAP HANA calculation views. They also ask you to create a dashboard where they can drill down into the results to see any missing data or issues. Finally, you create an overview report showing the past few years of financial results side by side. Objectives of This Portion of the Test The purpose of this portion of the SAP HANA certification exam is to test your knowledge of SAP HANA modeling and calculation views, as well as your understanding of how to enhance and refine these views with additional functionality. The certification exam expects you to have a good understanding of the following topics: 쐍 Calculated columns 쐍 Restricted columns 쐍 Filter operations 쐍 Variables 쐍 Input parameters Key Concepts Refresher Chapter 7 쐍 Currency and currency conversions 쐍 Hierarchies 쐍 Hierarchy functions Note This portion contributes up to 15% of the total certification exam score. This is one of the largest portions of the certification exam. Key Concepts Refresher In the previous chapter, we saw how to create output columns for our calculation view. In this chapter, we’ll learn about creating additional output fields. We’ll see how to add calculated columns, restricted columns, counters, and currency conversion columns to the list of output fields in our calculation views. In our discussions on these topics, see if you can identity which feature solves each problem described in the real-world scenario described previously. Let’s start with the first of our nine topics on how to enhance our SAP HANA calculation views with modeling functions: calculated columns. Calculated Columns Up to this point, we’ve created calculation views using available data sources. However, quite often, we’ll want SAP HANA to do some calculations for us on that data (e.g., calculate the sales tax on a transaction). In the past, this was usually done using reporting tools. However, SAP HANA has all the data available and can do the calculations without having to send all the data to the reporting tool. SAP HANA does the calculations faster than reporting tools and sends only the results to reporting tools, saving calculation time, bandwidth, and network time. When we create calculated columns in SAP HANA, it adds additional output columns to the output of our views. The calculated results are displayed in these calculated columns. Figure 7.1 illustrates how calculated columns work. 281 282 Chapter 7 Modeling Functions Calculated columns Table Quantity Item Price Total 5% discount, but 10% if more than 100 items 15% Sales Tax (VAT) Discount Sales Tax (VAT) 2 $200 $400 $380 $57 Plug 500 $5 $2500 $2250 $338 Cable Device 200 $10 $2000 $1800 $270 Connector 7 $20 $140 $133 $20 Adapter 3 $50 $150 $143 $21 Figure 7.1 Calculated Columns Added as Additional Output On the left side of Figure 7.1 is our source data from a table. We added two calculated columns on the right. The first calculated column calculates discounts, and the second one calculates the sales tax. Note The calculated columns are part of the output of the calculation view. The results aren’t stored anywhere, but they are calculated and displayed each time the view is called. We’ll start by discussing the calculated columns in SAP HANA that behave like those we just described. After that, we’ll also look at counters, which create calculated columns for the number of items in specified groupings of data. Creating Calculated Columns In SAP Web IDE for SAP HANA, we create calculated columns in the work area of a node. In the Calculated Columns tab, we select the + option, as shown in Figure 7.2. When you click on the name of the calculated column, you’ll get the work area. In the work area, you can enter the various settings for the calculated column. For example, you have to specify the Name, Label, and Data Type of the new column you want to create. You can also specify what Semantic Type you want to assign to your new calculated columns, for example, making it a Date or a Currency column. Key Concepts Refresher Chapter 7 Figure 7.2 Creating a New Calculated Column in a Node In the Expression area, you can add your formula, called the expression, for doing the calculation. You can just type in the formula you need, or use the Expression Editor screen for more complex expressions. Click on the Expression Editor button to open the next screen. In the Expression Editor screen, shown in Figure 7.3, we first specify what programming Language we want to write the expression in. SQL is the default. You can still use the Column Engine for expressions you might have used in older calculation views that you’re migrating or upgrading. The focus is the formula (called the Expression) on the right column. The three tabs on the left of the screen help us create the expression: 쐍 Elements The Elements tab contains all available columns, previously created calculated columns, and input parameters. We can build our new calculated column on previous calculations to build up more complex calculations. 283 284 Chapter 7 Modeling Functions 쐍 Functions The Functions tab contains various useful functions, such as converting data types, working with strings, doing the basic mathematical calculations, and working with dates. We also have some extra functions, for example, the if() function for if-then-else calculations, but these are only available when the Language is set to Column Engine. You can also get some online help when clicking the i (information) icon. 쐍 Operators The Operators tab contains operators, for example, for adding, multiplying, comparing, or combining columns. Figure 7.3 Using the Expression Editor to Create the Formula for Calculated Columns All the examples here use measures. However, we can also create calculated columns for attributes as well. The Functions also allow us to work with strings and dates, so we can create expressions using attributes. Key Concepts Refresher Chapter 7 Tip We recommend that you click on the various components instead of entering the expression. Although you can enter the expression directly in the text area, we’ve found that many people either mistype a column name or forget that SAP HANA column names can be case-sensitive. By clicking on the components instead, SAP Web IDE for SAP HANA assists you with the typing. The Expression Editor also helps us with syntax checking when we click the Validate Syntax button on the top right. In the top example in Figure 7.4, the Expression Editor describes the error. In this case, we used the if() function in a SQL expression. As it isn’t available here, we get an error. When we change the Language to Column Engine, we get a Valid Expression. Figure 7.4 Syntax Checking in the Expression Editor Tip In Chapter 6, we discussed the Keep Flag column, where we mentioned that sometimes you have to calculate values before you do any aggregation. This is very applicable in the case of calculated columns. In some cases, you might have to create your calculated columns in the node below the Aggregation node. Counters In Aggregation and Star Join nodes, we sometimes want to know how many times in the result set a certain item occurs for an attribute. For example, we might want to know how many unique items were sold in each country to find the countries with the largest variety. We do this using counters. 285 286 Chapter 7 Modeling Functions The option to create a new counter is found in the calculated column’s create (+) menu in Aggregation and Star Join nodes (see Figure 7.5). In the Calculated Column tab, click the + icon, and select the Counter option to create a new counter. In our example, we’ll put the counter on items sold. We can also use multiple columns in the Counter section to expand our example to add item categories, for example, to show which country has the largest variety of items for electronics, the largest variety of items for industrial pumps, and so on. Figure 7.5 Creating a New Counter Restricted Columns The easiest way to explain restricted columns is to think of a reporting requirement that you frequently encounter. For example, let’s say we have a website where we Key Concepts Refresher Chapter 7 work with the data of who visited the website and what device they used. We use a view like the one shown in Figure 7.6. Our data from the source table is in the two columns on the left. We can see how many users connected with each type of device, organized per row. We would like our report to be like the bottom right of the screen in Figure 7.6, where we have a column for each device with the total number of users for each device. You might know this as a pivot table. We already saw in Chapter 5 that we could use a union with constant values to achieve a similar result. We can also achieve this by creating a restricted column for each device. This adds a new additional output column to our view. When we create a restricted column for the mobile phones, we restrict our column to only show mobile phone data. Everything else in that column is NULL. When we now aggregate the restricted column, we get the total number of users visiting the website on mobile phones. We create a restricted field for each device to create our report. Table Device Restricted columns Number of users Mobile phone Tablet Laptop Desktop 20 Mobile phone 20 Tablet 10 Desktop 10 Mobile phone 10 10 Mobile phone 40 40 Laptop 20 Tablet 20 Laptop 30 Mobile phone 50 50 Mobile phone 40 40 Laptop 20 Desktop 20 10 10 20 20 30 20 20 Aggregated results Mobile phone Tablet Laptop 160 30 70 Figure 7.6 Creating Four Restricted Columns and Aggregating the Result Sets Desktop 30 287 288 Chapter 7 Modeling Functions Tip Even though a union with constant values can achieve a similar output result, you don’t always have tables with similar field structures available to do a union. Using restricted columns to achieve this output result will always work and is easy to set up. Because we need aggregation for this to work properly, restricted columns are only available in the Aggregation and Star Join nodes. Note We create restricted columns on attributes. For example, it doesn’t make sense to create a Reporting column with a heading of “25” (measure). However, a Mobile phone column (an attribute) provides more useful information. Creating a restricted column is similar to creating a calculated column. In the Restricted Columns tab, as shown in Figure 7.7, we click the + icon. Tip In SAP HANA Interactive Education (SHINE), look for the PURCHASE_COMMON_CURRENCY calculation view. In the right side of the work area, we create our restricted column. We specify the name of the restricted column and what data we want to display in the column. In this case, we chose the GROSSAMOUNT column. In the Restrictions area, we can add one or more restrictions, such as the year has to be 2019, the device has to be a mobile phone, or only show the Flat screens sold (Figure 7.7). If we add multiple restrictions, SAP HANA automatically creates the relevant AND and OR operators. For example, if we have two restrictions for the year 2019 and 2020, the Expression Editor will use year = 2019 OR year = 2020. Key Concepts Refresher Chapter 7 Figure 7.7 Creating a Restricted Column for the Net Amount of Invoices When we enter a value, such as “2019” for the year, we can ask the Expression Editor screen to give us a list of existing values currently stored in the system. In Figure 7.8, we asked for a list of item categories. In the list, we see, for example, Flat screens and Graphic cards. This popup list is called the value help list. You’ll find these value help lists in all the modeling screens. In the restricted column creation dialog, in the Restrictions area, we can choose to use an Expression rather than specify individual Column values. We can then use the Expression Editor to generate the relevant portion of the SQL WHERE statement directly. 289 290 Chapter 7 Modeling Functions Figure 7.8 Choosing a Restriction Value from a Predefined List of Values If you first fill in the values in the Column view and then change to the Expression view, the Expression Editor automatically converts this for you, as shown in Figure 7.9. You can also create a restricted column using a calculated column. Just use the expression if(“Category”=’Flat screens’, “GROSSAMOUNT”, NULL). The if() function in the Expression Editor works similarly as the IF function in Microsoft Excel: if the condition (first part with the equal sign) is true, then show the second part; otherwise, show the last part. Note that the if() function isn’t available in the default Language, which is SQL. In SAP HANA 2.0, it’s recommended to choose SQL. Just like with a calculated column, a restricted column doesn’t store data. Each time a calculation view is used, the restricted column is created and aggregated. Tip A restricted column fundamentally differs from a filter or a SQL WHERE condition. If you filter on the year 2019, all the data in the entire view will be trimmed to only show the data for 2019. When you create a restricted column for the year 2019, only the data in that (additional) column is trimmed for 2019. Let’s look at filters next. Key Concepts Refresher Chapter 7 Figure 7.9 Using the Expression Editor to Create Restrictions on Restricted Columns Filters Sometimes we don’t want to show all the data from a specific table. For example, we might only want to show data for the year 2019. In this case, we can apply a filter. Filters restrict the data for the entire data source. They are therefore similar to the WHERE condition in an SQL statement. When we apply filters, SAP HANA can work a lot faster because it doesn’t have to work with and sift through all the data. Something to keep in mind is that you can leverage partitions when designing filter expressions. When a field has been partitioned, SAP HANA has already separated the data by values. If you create a filter on a partitioned column, you can expect good performance on your filter expressions. There are three different ways we can use filtering. We can use filter expressions in the calculation views, and this is what we’ll use most of the time. The other two filtering methods are specific to SAP systems written on SAP NetWeaver and are there for compatibility reasons. We’ll start with using filter expressions in our calculation views. 291 292 Chapter 7 Modeling Functions Filter Expressions To create a filter expression, select the Filter Expression tab in the work area of your node. This puts you into the Expression Editor, as shown in Figure 7.10, where you can create an expression. Figure 7.10 Creating a Filter Expression in a Projection or Join Node This works similar to the Expression Editor for calculated and restricted columns. In this case, however, the filter expression will be similar to the WHERE condition of an SQL statement. A filter icon will appear next to the name of the Filter Expression tab after you’ve created a filter expression. In the left work area of your calculation view, where you see the view nodes, you’ll also see a filter icon in the node, as shown in Figure 7.11. Figure 7.11 A Filter Expression Icon in the View Node Key Concepts Refresher Chapter 7 Filtering for SAP Clients Filtering of data can be implemented more widely than the examples we’ve just seen. In Chapter 6, we saw how to set an option for filtering to an SAP client in the Semantics node. Later, we’ll see another example of limiting data to the session client for data from SAP financial systems. Chapter 8 and Chapter 13 also provide further details on filtering for an SAP client using SQL/SQLScript and security, respectively. Domain Fixed Values SAP ERP and SAP S/4HANA use the ABAP programming language, which has some powerful Data Dictionary features. In ABAP, we can define reusable data elements (e.g., data types) and use them in our table definitions. Sometimes we want these data elements to only allow a range of values. We define a domain with the range of values and use this domain with the data element. In all the tables where that specific data element is used, the users can now only enter values that are in the specified range. Instead of a range, we can also specify fixed values for the domain, which are known as the domain fixed values. These are values that won’t change, and the users can only choose from the available list when they enter data into the fields. Let’s look at some examples: Say the users need to enter a gender. The domain values are fixed to male and female. This is very similar to the value help shown earlier in Figure 7.8. Another example could be the possible statuses of a sales order, which could be Quoted, Sold, Delivered, Invoiced, and Paid. The lists of fixed values are stored in tables DD07L and DD07T in the SAP ERP system. Because these tables are so valuable for business systems, we often replicate (or import) these tables to SAP HANA. We can then use these values in SAP HANA for the Value Help screens, similar to that shown earlier in Figure 7.8. These domain fixed values can now also be used to filter business data; for example, we can filter the sales data to the domain fixed value of paid to see only the sales that are already paid. Variables In the previous section, we looked at filtering. These filters were all static, in the sense that we couldn’t change them after we specified them. Each time you query the view, the same filters are used. 293 294 Chapter 7 Modeling Functions Frequently, however, we’ll want the system to ask us at runtime what we would like to filter on. Instead of always filtering the view on the year 2019, we can specify which year’s data we want to see. If we query the view from a reporting tool, we even expect the reporting tool to pop up a dialog box and ask us what we would like to filter on. In other words, you only specify the filter when the query is run on the calculation view. This dynamic type of filtering is done with variables. We define variables in the Semantics node of a calculation view. In Chapter 6, we discussed the first two tabs available in the Semantics node. Now we’ll look at one of the remaining two tabs: Parameters. In Figure 7.12, we see the Semantics node. In the Parameters tab, we select the + icon to reveal the dropdown menu, and select the Variable option. Figure 7.12 Creating a Variable in the Semantics Node In the General work area (Figure 7.13), we start by specifying the Name of the variable. In the Selection Type field, we can choose what type of variable we want to create. There are three types of variables: 쐍 Single Value The variable only asks for one (single) value; for example, the year is 2019. 쐍 Interval The variable asks for from and to values, for example, the years from 2015 to 2018. 쐍 Range The variable asks for a single value but combines it with another operator; for example, the year is greater and equal to 2015. Key Concepts Refresher Chapter 7 Figure 7.13 Creating a Variable When the Multiple Entries checkbox is enabled, we can, for example, select the years 2013, 2016, and 2020 from the Value Help dialog. By default, the variable uses the view or table that it’s based on to display a list of available values in the Value Help dialog. Changing the View/Table Value Help option allows us to specify a different view or table for the list of available values in the Value Help dialog. Ideally, the data source we specify should be in sync with the view we’re using the variable on; otherwise, the users might choose a value from the Value Help dialog that isn’t available in the current view and, therefore, won’t be shown any information. The reason we do this is to ensure that we have consistent value help lists 295 296 Chapter 7 Modeling Functions for all our different views by specifying the same value help data source for all the views. Sometimes this can also help to speed up the display of the value help. With the Hierarchy option, we can link a hierarchy to our variable. Users can then drill down using this hierarchy when they want to select a value from the Value Help dialog. You’ll learn how to set up hierarchies later in this chapter. We can specify default values for our variable in the Default Value(s) section. If the user selects Cancel or presses (Esc) in the Value Help dialog, as shown later in this chapter in Figure 7.18 and Figure 7.24, the variable will then use these default values. We can specify the Attribute(s) on which we want to create our variable in the Apply Filter section. This is the field on which we want to implement our “dynamic filter.” After we’ve created the variable, it will show up in the Columns tab of the Semantics node, next to the name of the attribute we selected. You can see an example in Figure 7.14. Figure 7.14 Variable Shown in the Columns Tab of the Semantics Node Key Concepts Refresher Chapter 7 When we preview our calculation view, a Variable/Input Parameters screen will appear in SAP Web IDE for SAP HANA, as shown in Figure 7.15. We have to supply these values before SAP will run our view and return the results. If we supplied default values, these values will be prepopulated in the fields, and we just continue running the query on our calculation view. When we select the icon on the right side of the FROM field (see Figure 7.15), a Value Help dialog will appear with a list of available values that we can choose from. Figure 7.15 Variable Dialog Input Parameters Input parameters are very similar to variables except where variables are specifically created for dynamic filtering, input parameters are more generic. For example, we can use them in formulas or expressions. Let’s look at an example: An outstanding payments report asks you for how many months outstanding you want to see the results. This time period can then be used in a calculated column’s expression to calculate the results we asked for. The report can be used by the debt-collecting department for payments outstanding longer than two months, while the legal department will use the same report to ask for payments outstanding longer than six months. We’ll start by learning how to create input parameters, and then take a quick look at how to map them to different information-modeling objects. Creating Input Parameters Input parameters are created in the Semantics node (see Figure 7.16). We can also create input parameters in all the node types. The input parameter work area and the various dropdown menu options are shown in Figure 7.16. 297 298 Chapter 7 Modeling Functions Figure 7.16 Creating an Input Parameter We start with the Name of the input parameter. The Parameter Type specifies what type of data we expect to use this input parameter for. There are five types of input parameters we can create: 쐍 Direct End users directly type in the value. We can specify the Data Type of the expected value as Currency, Unit of measure, or Date. 쐍 Column End users can choose a value from all the possible values in a specified column in the current view. 쐍 Static List End users can choose a value from a predefined (static) list of values. 쐍 Derived From Table End users can choose a value from data in another table. Key Concepts Refresher Chapter 7 쐍 Derived From Procedure/Scalar Function End users can choose a value from a list of values generated by a scalar function or a procedure. Note that it must be a scalar function or a procedure that returns a single list of values—basically a single column. A table function or a procedure that returns table result sets (with multiple fields) isn’t allowed here. Input parameters ask for input at runtime, which means that any generated values can be overwritten by the end user when the query is executed. The Default Values can be entered as a Constant value or an Expression. Similar to variables, if the user selects Cancel or presses (Esc) in the Value Help dialog, the input parameter will use this default value. In Figure 7.16, we created an input parameter called DEMO_INPUT_PARAM. We can now use this input parameter, for example, in a calculated column. When we use the input parameter in an expression, we add a $$ at the start and end of the name of the input parameter. This makes it very clear that we’re referring to an input parameter. The name of our input parameter will become $$DEMO_INPUT_PARAM$$. In Figure 7.17, we use the input parameter in the expression of a calculated column. The following expression calculates the number of days between the date that the end user supplies and the delivery date: daysbetween(“DELIVERYDATE”, $$DEMO_INPUT_PARAM$$) In this case, the input parameter has a Parameter Type of Direct and a Data Type of Date. (We changed the Currency shown in Figure 7.16 to Date.) Figure 7.17 Using an Input Parameter in an Expression for a Calculated Column 299 300 Chapter 7 Modeling Functions Tip We have $$CLIENT$$ and $$LANGUAGE$$ built into SAP HANA. These don’t ask for values at runtime, but instead give the SAP client number filter currently in use and the current logon language of the end user. You can see $$CLIENT$$ used later in this chapter in Figure 7.23. When we preview the calculation view in SAP Web IDE for SAP HANA, and the input parameter has a Data Type of Date, a calendar is displayed where you can choose the date you want (see Figure 7.18). Figure 7.18 Date Input Parameter Mapping Input Parameters Using input parameters is a very powerful modeling concept. We can link input parameters to many different information-modeling objects. In Figure 7.19, we select the Parameters tab, and then click on the Manage input parameter mappings icon. We can change the Type of mapping to the following: 쐍 Data Sources We can map the input parameters in this calculation view, for example, to the input parameters in an underlying calculation view, such as a dimension view. Key Concepts Refresher Chapter 7 쐍 Procedures/Scalar function for input parameters We can map the input parameters in this calculation view to the input parameters of a scalar function or procedure. Similarly to when we discussed the type of input parameter when creating one, it must be a scalar function or a procedure that returns a single list of values—basically a single “column.” 쐍 Views for value help for variables/input parameters We can map the input parameters in this calculation view to the input parameters of an external view for value help. 쐍 Views for value help for attributes When you’re filtering the attributes in the underlying data sources that use other calculation views as value help, you can map the input parameters in these value help views to the input parameters of this calculation view. Figure 7.19 Mapping Input Parameters In Figure 7.19, we use a scalar function as the source. But the Source Input Parameters list on the bottom left is empty because we have the Type of mapping specified as Data Sources and not as Procedures/Scalar function for input parameters. If you change to the correct mapping Type, the Source Input Parameters will automatically show up. 301 302 Chapter 7 Modeling Functions Note Unfortunately, the phrase input parameters has two meanings in SAP HANA. The first meaning is for the input parameters we discussed earlier in this chapter, which ask for “input” from a user at runtime. The second meaning is that SQLScript procedures and functions can have input parameters and output parameters. These functions need “input” and give “output.” The preceding mapping gives a way to link these different types of input parameters. Session Variables Input parameters have a lot of uses as we’ve just seen. There are a few cases where predefined input parameters can be very helpful. In some applications, you might want to know extra information such as the name of the user that is logged in, the name of the application that is calling the calculation view, and even the name of the SAP HANA extended application services, advanced model (SAP HANA XSA) user when a technical user is used to talk to the SAP HANA information models. In many modern applications, such as the microservices running in SAP HANA XSA containers, developers are using command-line parameters. These are values that are passed to an application at execution time. These developers have been asking for the same feature for SAP HANA applications. Users currently have to set the value of an input parameter each time they call a calculation view that asks them for a value. What we need is something that is set once and stays active for the duration of each user’s session. SPS 04 SAP HANA 2.0 SPS 04 has two more options in the dropdown menu when you add a new parameter. The updated menu is shown in Figure 7.20. Figure 7.20 Creating a Session Variable in SAP HANA 2.0 SPS 04 Key Concepts Refresher Chapter 7 You can create a Fuzzy Search Parameter for helping users when searching text fields. We’ll look at this option in Chapter 10. The other option is to create a Session Variable. As the name suggests, this is like an input parameter but with a value that is set for your current session. Think of working session on the system as the time from when you log in until you log out. You can use some predefined session variables, such as the name of the user that is logged in. You can also create your own session variables, which allows you to control the processing flow based on the value in your session variable. For example, when working with financial data in a cube, you have both budget data and actual expenses. (We discussed this in Chapter 5 when looking at how to build an information model.) You can now set a session variable to only work with the budget data or only work with the actual expenses. In a Union node, you can use the session variable in the Mapping tab for constant pruning, or you can use a filter on the session variable in another type of node. Working with either hot or warm data in your session is another example. (We discussed hot and warm data in Chapter 3 in the Data Aging section.) Currency SAP is well known for its business systems. An important part of financial and business information models is their ability to work with currencies. In Figure 7.21, we can see the Columns tab of a calculation view. We have a few output fields that are currency fields, for example, four Sales amounts for different regions. When we look at the Properties of these fields, they are treated as numeric fields (measures). We can improve on that by changing the Semantic Type property to Amount with Currency Code. When we change the Semantic Type of a field to currency, an Assign Semantics screen appears as shown in Figure 7.22. SPS 04 SAP HANA 2.0 SPS 04 also has a Hyperlink semantic type. Tools, such as web browsers, that know how to consume hyperlinks can use this information for added functionality. 303 304 Chapter 7 Modeling Functions Figure 7.21 Assigning an Output Field to the Semantic Type “Amount with Currency Code” Figure 7.22 Currency Conversion Most of this screen is focused on currency conversion. Via the Internet, even small companies can do business internationally, and companies can buy parts from Key Concepts Refresher Chapter 7 different suppliers worldwide with international currencies. Their taxes and financial reporting, however, happen in their home country in the local currency. We therefore need to convert currencies as part of doing business. When converting from one currency to another, there is a source currency and a target currency. The date is equally important because exchange rates fluctuate all the time. But what date do we use? Do we use the date of the quote, the date of the invoice, the date of the delivery, the date the customer paid, the end of that month, or the financial year-end? All of the mentioned dates are valid, so the answer depends on the business requirements. By default, most of the options in the Assign Semantics screen shown in Figure 7.22 and Figure 7.23 are unavailable. Enabling the Conversion flag opens up both the Conversion Tables and Definition sections. In SAP systems, the exchange rate data gets stored in some tables prefixed with the TCUR name. There are many programs from banks and other financial institutions that help update the TCUR tables in SAP systems. These TCUR tables are used by the SAP business systems for currency conversions. SAP HANA also knows how to use the TCUR tables. You can easily get a copy of these tables by installing the SHINE package. It will create these TCUR tables for you to demonstrate currency conversions in its information models. These tables are specified in the Conversion Tables section. Let’s go through some of the options in Figure 7.23 to learn more about currency fields and currency conversions in SAP HANA. The top section has the following options: 쐍 Display Currency You can specify the current field’s currency by selecting a specific currency (Fixed) or selecting a Column that stores the currency for each record in the transactions table. 쐍 Decimal Shifts Some currencies require more than two decimal places, which is indicated with this flag. 쐍 Shift Back ABAP programs automatically do a decimal shift. When these ABAP programs work with SAP HANA information views, they might receive currency data from SAP HANA that has already been shifted, and a double decimal shift occurs. In those cases, we need to set this flag to “undo” the extra Decimal Shift. 305 306 Chapter 7 Modeling Functions 쐍 Reverse Look Up If you want to convert EUR to USD, but the system only has an entry for USD to EUR, then the reverse lookup method will find a rate (USD to EUR), invert it, and use this to convert EUR to USD. Figure 7.23 Currency Conversion and the Definition Tab Some of the Definition section options in the screen include the following: 쐍 Source Currency and Target Currency The specific currency codes, for example, EUR for euros and USD for US dollars, can be specified directly in these fields. For the Source Currency we also have the option of Column, and for Target Currency we have the following two additional options: – Column: If we’re storing multiple currencies in the same column, we’ll have another column that specifies what currency each record is in. We can tell SAP HANA to read the currency codes from that column and automatically convert each of the individual records. Key Concepts Refresher Chapter 7 – Input Parameter: We can choose the currency code to use for the currency conversion at runtime using an input parameter. We can get a financial report to ask end users if they want the report in euros or dollars, or allow them to pick any currency from the list of available currency codes in the Value Help dialog. We have to specify the input parameter as a VARCHAR data type with a length of 5 (refer to Figure 7.16 for an example). 쐍 Conversion Date The date of the conversion is used to read the exchange rate information for that date from the TCUR tables. 쐍 Generate The converted currency value is shown in a new additional output column with the specified name. Some of the remaining options specify what type of currency conversion you want to retrieve from the TCUR tables, for example, whether you’re using the buying rate or the selling rate. Finally, you can specify what SAP HANA should do in the case of errors. The Upon Failure field allows three values: 쐍 Fail In this case, the currency conversion process will fail. 쐍 Ignore The currency conversion doesn’t do a conversion and puts the source amount in the target column. This can be quite dangerous as a few million in one currency could be worth just a few dollars. The large (or small) amounts are copied across as-is. 쐍 Set to NULL The target column’s value is set to NULL. Tip Note in the Client field how we can use the built-in $$client$$ input parameter. When you run currency conversion and you’ve specified an input parameter for the Source Currency or the Target Currency, the process will show you a Value Help dialog as shown in Figure 7.24. 307 308 Chapter 7 Modeling Functions Figure 7.24 Value Help for the Source or Target Currency Field Hierarchies Hierarchies are important modeling artifacts that are often used in reporting to enable intelligent drilldown. In Figure 7.25, we can see an example in Excel. Users can start with a list of sales for different countries. They can then drill down to the sales of the provinces (regions or states) of a specific country and, finally, down to the sales of a city. We’ve discussed the concepts of hierarchies in Chapter 5 already. We discussed two types of hierarchies—level hierarchies and parent-child hierarchies—when we should use each type of hierarchy, and what the differences are between these two types of hierarchies. Note Two new types of hierarchies, spantree and temporal, were added in SAP HANA 2.0 SPS 02. We’ll quickly mention these new hierarchy types in the next section on hierarchy functions with SQL. Key Concepts Refresher Chapter 7 Figure 7.25 Hierarchy of Sales Data from SAP HANA in Excel You can use multidimensional expression (MDX) queries to use these hierarchies. Just like SQL has become a standard for querying relational databases, MDX is a standard for querying online analytical processing (OLAP) data. SAP uses Business Intelligence Consumer Services (BICS) instead of MDX when communicating between SAP systems. BICS is a proprietary connection, so SAP can only use it for SAP systems. We use it because it’s a lot faster than MDX. As hierarchies are important for analytics and reporting tools, we’ll look at the various aspects of hierarchies, including creating them, working with calendar-based and time-based hierarchies, using them, and calling them from SQL. Creating a Hierarchy We create a hierarchy in the third tab of the Semantics node in a calculation view, as shown in Figure 7.26. Select the + icon in the top-right menu on the Hierarchies tab. You can choose to create a Level Hierarchy or a Parent Child Hierarchy from the dropdown menu. Figure 7.26 Creating a Hierarchy 309 310 Chapter 7 Modeling Functions You can specify the hierarchy options by clicking on the name of the hierarchy, and completing the fields in the work area in SAP Web IDE for SAP HANA. You start by specifying the Name of the hierarchy. In Figure 7.27, we create a level hierarchy in SAP HANA where we add the different levels. On each level, you select a specific field name in the dropdown list. Each level uses a different column in a level hierarchy. You can also specify the sorting order and on which column it has to sort. Figure 7.27 Level Hierarchy Figure 7.27 shows an example of a level hierarchy using geographic attributes. Level 1 is the region (e.g., Africa or Asia), level 2 is the country, and level 3 is the city. In Figure 7.28, you can see an example of a parent-child hierarchy of an HR organizational structure with managers and employees. In this case, the manager has the parent role, and the employee has the child role. Key Concepts Refresher Chapter 7 We can also use parent-child hierarchies for cost and profit centers or for a bill of materials (BOM). Figure 7.28 Parent-Child Hierarchy We see that the parent-child hierarchy in Figure 7.28 has five sections at the bottom that are collapsed at the moment. Clicking on these section headers opens them up to provide more options. In Figure 7.29, we see the first section for Properties of the hierarchy. The mostused option here is defining what we should do with Orphan Nodes. This applies to both level hierarchies (the last dropdown field in Figure 7.27) and parent-child hierarchies (the second dropdown field in Figure 7.29). Orphan nodes are employees that haven’t yet been assigned a manager, for example, interns in some companies. You can specify in the dropdown menu that they should become Root Nodes (e.g., board members in a company), generate an Error, be Ignored, or be given a Step Parent. 311 312 Chapter 7 Modeling Functions In the case of Step Parent, you can specify a default node that these “orphans” should report to, for example, to the HR manager. The Step Parent node must exist in the hierarchy at the root level. The root level is the highest level in the company, so it would be like saying the HR manager must be a part of the board. The field next to this dropdown will become available, and you can enter the node ID, for example, for the HR manager there. The Aggregate All Nodes property is interpreted by the SAP HANA MDX engine to specify that the values of intermediate nodes of the hierarchy should be aggregated to the total value of the hierarchy’s root node. For better performance, this should not be selected if there is no data posted on aggregate nodes. A limitation of MDX is that it doesn’t support multiple parents and compound parent-child definitions. Figure 7.29 Properties Section of a Parent-Child Hierarchy In Figure 7.30, you see the Additional Attributes section of a parent-child hierarchy. Here you can specify additional attributes that are applicable to your hierarchy. In this example, we added the first and last names of the people in the company. Figure 7.30 Additional Attributes Section of a Parent-Child Hierarchy Key Concepts Refresher Chapter 7 The third section, Order By, allows you to specify how you want to group the result set by one or more columns. In Figure 7.31, we split the result set up per company. Figure 7.31 Order By Section of a Parent-Child Hierarchy The fourth section is for Time Dependency, as discussed next. Time-Dependent Hierarchies Time-dependent hierarchies change over time, for example, before and after a restructuring of the HR organizational structure, and are dependent on the time period of your query. The fields in this section are grayed out until we enable the Enable Time Dependency option (Figure 7.32). After the rest of the fields become available, we supply the required Valid From Column and Valid To Column fields. You can choose to specify the Interval of the supplied dates using the From Date Parameter and To Date Parameter, or you can choose a Key Date parameter, which you derive from an input parameter. Figure 7.32 Time-Dependency for Parent-Child Hierarchies 313 314 Chapter 7 Modeling Functions Note Time-dependent hierarchies are only available for parent-child hierarchies in calculation views. Hierarchies in Value Help As shown earlier in Figure 7.13, we can use hierarchies in variables to drill down into the list of available values in a Value Help dialog. In this case of level hierarchies, the variable’s reference column is pointing to the deepest (leaf) level of the hierarchy. For example, in the earlier Figure 7.27, the variable has to be created at the city level. It won’t work if you point it at the region or country levels. For parent-child hierarchies, you have to ensure that the variable’s reference column is pointing to the parent attribute of the hierarchy. Value help hierarchies can also be time-dependent hierarchies. Enabling Hierarchies for SQL Access We can use hierarchies in business intelligence (BI) tools. When we use hierarchies in these analytics and reporting tools, we can drill down into and sort the data according to the hierarchy structures. Something that was still missing in our discussion was how to enable hierarchies for use in SQL, where developers can also use them. We can enable hierarchies for SQL access in the case of shared hierarchies in a star join view. You first create dimension views with hierarchies. Then you consume these dimension views in a star join view. When you look at the Hierarchies tab in the star join view in Figure 7.33, you’ll notice that there are two areas. The first is for Local hierarchies, which are the hierarchies created in the star join view itself. The second is Shared hierarchies, which shows the hierarchies inherited from the dimension views. In Figure 7.33, we have two shared hierarchies. When you select one of the shared hierarchies, you see the hierarchy’s properties on the next screen. At the bottom, you’ll find a new SQL Access area. When you expand this area, you can select the Enable SQL Access checkbox. Two new fields become available with automatically generated names. The Enable SQL Access checkbox is specifically for enabling this shared hierarchy for SQL access. Key Concepts Refresher Chapter 7 Figure 7.33 Enabling SQL Access for Shared Hierarchies You can also select the Enable Hierarchies for SQL access option in the View Properties tab of the Semantics node. In that case, it enables all the shared hierarchies for SQL access. Let’s look at the Node Column field. With this field, SAP HANA generates a new output field. (We’ve seen other generated output fields in this chapter already, such as calculated columns and restricted columns.) You can now use this new output field in your SQL statements, for example, in a GROUP BY clause or for filtering your result set based on values in your hierarchy. An example is shown in Figure 7.34. (We discuss SQL and SQLScript in the next chapter.) We also show how the node column appears as an output column in the Data Preview of the star join view in SAP Web IDE for SAP HANA. 315 316 Chapter 7 Modeling Functions Figure 7.34 Using the Generated Node Column of the Hierarchy for SQL Access Note The Node Column field is only available for SQL queries, so it won’t show up if the view is consumed by other graphical information tools. Hierarchy Functions In the previous section, we created hierarchies for use by reporting tools. In Figure 7.25, we started with a hierarchy in Excel. The hierarchies were all created in the topmost semantic node. We get the benefit of these hierarchies when we use the Key Concepts Refresher Chapter 7 entire calculation view in BI tools with features such as drilldown and aggregations. In star join cubes, it can help us with dimensional navigation on fact tables, including a time dimension. And we can use these hierarchies with Value Help popups when choosing a value for a variable. But what if we want to work with hierarchal data inside our calculation view? What if we need to use the hierarchical metadata in analytical operations? For example, some companies have a policy that people in a managerial position should have at least seven people reporting to them. If they have less than seven people in their team, they should also have some team tasks in addition to their management responsibilities. How do we identify such individuals? We want a query that says, “Show all managers where the number of direct team members is fewer than seven.” In technical terms, we would ask, “Show all individuals with fewer than seven hierarchy siblings at a hierarchy distance of one.” This isn’t possible with the level and parent-child hierarchies we’ve created in the semantic node over the past few years because these are used for navigation or aggregation. Note The hierarchies we create in the Semantic node are mostly for use by analytics and reporting tools outside of SAP HANA. The hierarchies we create in the Hierarchy Function nodes are for use inside the calculation view. As we saw in Chapter 6, SAP HANA has some new types of graphical calculation view nodes that were added since SAP HANA 2.0 SPS 03. The one we’ll use here is the Hierarchy Function node. These nodes automatically generate hierarchy metadata from the data that is supplied by the data sources. This generated metadata can be used inside the other nodes in the graphical calculation view or in the output of the calculation view. We can use the Hierarchy Function nodes in two ways in the graphical calculation views: 쐍 Generating the basic hierarchy metadata We add a data source to the Hierarchy Function node. This node will then automatically add additional output fields with values that help describe the basic hierarchy, for example, how many people are reporting to a specific manager. 쐍 Navigating the hierarchy We feed the results (output) from the previous (generating) Hierarchy Function Node into another Hierarchy Function Node. In this case, we select the navigate 317 318 Chapter 7 Modeling Functions features in the node, and it will automatically add additional fields with metadata for things such as the hierarchy depth. We can feed the output of the “generate” Hierarchy Function node directly into the “navigate” Hierarchy Function node in the same calculation view, or we can first create a “generate” Hierarchy Function node in a separate calculation view, and then feed this calculation view into the “navigate” Hierarchy Function node. The latter approach is better, as this allows us to create a library of hierarchy generator calculation views that we can reuse in our information modeling. We can also use the hierarchy functions in SQL and SQLScript. We’ll now discuss the various ways that we just mentioned to use hierarchy functions in more detail. Data Used by Hierarchy Functions Let’s take a practical look at how to work with hierarchy functions. Note Hierarchy function nodes will only work with data that has a parent-child hierarchy structure, for example, an organizational structure of a company. In Listing 7.1, we’ve created a simple table. We have an employee name, the ID of the employee, and the ID of the employee’s manager. We save this as an .hdbcds file and then build the file to get SAP HANA to create the table. namespace hier.funcs; context orgstruct { entity HR_ORG { key EMPLOYEE_ID: hana.VARCHAR(12) not null; EMPLOYEE_NAME: hana.VARCHAR(100); MANAGER_ID: hana.VARCHAR(12); }; }; Listing 7.1 Creating a Table for Sample Data Using Core Data Services In this example, we’ll use a small subset of an organizational structure, as shown in the Figure 7.35. When we go to the SAP HANA Database Explorer screen in SAP Web IDE for SAP HANA, right-click on the table, and select Data Preview, we see the screen as shown in Figure 7.36. By selecting the + icon on the toolbar, it allows us to add new records. We added the organizational structure shown into the table, showing the parent-child relationships in this data. Key Concepts Refresher Chapter 7 319 Rudi H Pascal HansHeinrich Andrea HansJoachim Martin Rudi Vasanth Phillip Skhu Tshepo Kevin Vidya Figure 7.35 Subset of an Organizational Structure Illustrated Graphically Figure 7.36 Subset of an Organizational Structure as Table Data We’re now ready to use this table as a data source in a Hierarchy Function node in a calculation view. Generate Hierarchy Functions On the left side of Figure 7.37, you see a dimension calculation view with a Hierarchy Function node. You can add one data source to this node. On the right side of Figure 7.37, you can see the definition of the node. The default type of Hierarchy Function node is for generating the hierarchy metadata. The top two menus in Figure 7.37 show the options for Generation and Navigation. You have to specify which column in your source data (table) is the parent and which column is the child. In this case, the MANAGER_ID is the parent, and the Roman 320 Chapter 7 Modeling Functions EMPLOYEE_ID is the child. In the Start field, you can specify which department you want to analyze. All other records that aren’t part of that department will be ignored. In this case, the records numbered 15–17 in Figure 7.36 won’t be included in the result set. You can additionally specify the Depth of the hierarchy you want. If you leave this blank, it will go down to the lowest levels. You can also specify how orphan nodes should be handled (Orphan Handling). Finally, you can specify if you want the result set sorted; for example, you can sort the hierarchy by the names of the people by selecting the EMPLOYEE_NAME field in the Order By area. Figure 7.37 Creating a Dimension Calculation View with a Generation Hierarchy Function Node In Figure 7.38, all three columns of the table are mapped to the output of the Hierarchy Function node. Notice that the mapping automatically renamed the Manager_ID field to PARENT_ID and the Employee_ID field to NODE_ID. If you look at the number of output columns, however, you’ll notice that there are nine columns. The generation Hierarchy Function node automatically created an additional six columns. Key Concepts Refresher Chapter 7 Figure 7.38 Mapping the Columns in a Hierarchy Function Node and Showing the Additional AutoGenerated Columns The six auto-generated columns are as follows: 쐍 Hierarchy Rank This shows the position of the record in the generated hierarchy. When you specify an Order By field, this could influence these values. 쐍 Hierarchy Tree Size The number of records (nodes) below this node in the hierarchy. It includes this node in the number. 쐍 Hierarchy Parent Rank The position of its parent node. 쐍 Hierarchy Level The level (depth) in the hierarchy of this node. 쐍 Hierarchy is Cycle Whether this node is part of a cycle. This happens when a node is both a parent and a child in the same path. The value can be 0 or 1. 321 322 Chapter 7 Modeling Functions 쐍 Hierarchy is Orphan Whether this node is an orphan. The value can be 0 or 1. In Figure 7.39, you can see the output from the generation Hierarchy Function node. For each record or node in the hierarchy, you can see the various values in the auto-generated fields just described. Figure 7.39 Previewing the Data of a Hierarchy Function Node You can now use this dimensional calculation view with the generation Hierarchy Function node in other calculation views. Navigate Hierarchy Functions Figure 7.40 shows a new dimensional calculation view. It’s very similar to the one shown in Figure 7.37, except that the Hierarchy Function node doesn’t have a table as a data source; instead it uses the previous calculation view with the generation Hierarchy Function node. In this case, you specify the Hierarchy Function node as a navigation type. In Figure 7.37, you can see how to specify the navigation as moving down the hierarchy (Hierarchy Descendants), moving up the hierarchy (Hierarchy Ancestors), or staying at the same level (Hierarchy Siblings). When looking at the Columns tab, you’ll see that a single auto-generated field called Hierarchy Distance was created. This gives the distance between this node and another node, expressed by the difference in levels. You can specify the other node by filling in the Start Rank field. Key Concepts Refresher Chapter 7 Figure 7.40 Creating a Dimension Calculation View with a Navigational Hierarchy Function Node Figure 7.41 illustrates one way of achieving this. Here we moved the data source to a project node below the navigation Hierarchy Function node, and then we inserted another project node with a filter between them. The filter is a simple expression that allows you to filter the result set to a single record. In this case, we specified a unique NODE_ID. In the navigation Hierarchy Function node, you can select any of the fields from the filtered Projection node. In this example, it’s HIERARCHY_RANK. When you now look at the Mapping and Columns tabs, you’ll notice an additional auto-generated column called START_RANK, as shown in Figure 7.42. We started this discussion by giving an example of identifying all managers where the number of direct team members is fewer than seven. In technical terms, we would ask, “Show all individuals with fewer than seven hierarchy siblings at a hierarchy distance of one.” You’re now ready to build such queries in your graphical calculation views. 323 324 Chapter 7 Modeling Functions Figure 7.41 Adding a Filter to a Navigation Hierarchy Function Node Figure 7.42 Showing the Additional Auto-Generated Column in a Navigation Hierarchy Function Node Key Concepts Refresher Chapter 7 Hierarchy Functions with SQL These hierarchy functions have been available in SQLScript for a while already. In fact, more hierarchy functions are available in SQLScript than the ones we discussed here. These are described in the SAP HANA Hierarchy Developer Guide at http://s-prs.co/v487620. You can call them, for example, using table functions. This is described in the next chapter. Following are examples of some available hierarchy functions: 쐍 Hierarchy span tree This function generates a minimal tree to prove connectivity for each node when you just need to know if a node is connected, how many times, and the possible paths. You can use this for checking authorizations that use something like an organizational structure where you just need to establish that there is at least one connection to the root. This then indicates that permission is granted. Basically, a span tree includes only the first shortest path to any node. If there are other longer paths, they are removed. A good use case for this is to find the flights with the lowest number of connections. You don’t want to fly via two stopovers when you can get a direct flight. 쐍 Hierarchy temporal This function generates a hierarchy that presents time-based relationships between the nodes. This can, for example, be used when one person stands in for someone else during a vacation period. The relations are only valid for a limited time. 쐍 Hierarchy leveled This function works with data that represents a level hierarchy. 쐍 Hierarchy composite ID This function combines columns in the source data to create nodes, for example, combining first and last names for each employee. 쐍 Hierarchy descendants aggregation and hierarchy ancestors aggregation This function is similar to the standard descendants and ancestors functions, but it also provides optimized aggregation. SPS 04 Hierarchy span tree and hierarchy temporal functions are now also available in the Hierarchy Function node for graphical calculation views. 325 326 Chapter 7 Modeling Functions Important Terminology In this chapter, we touched on different ways to enhance our SAP HANA calculation views and enrich our information models. The following list describes the important terms covered in this chapter: 쐍 Calculated columns Adds additional output columns to our calculation views and allows us to calculate new values. 쐍 Counters Shows how many times something has occurred within a result set. 쐍 Restricted columns Restricts the data shown in a particular column or field and outputs the restricted values in an additional output field. 쐍 Filter expressions Used if we don’t want to show all the data from a specific table or to only show specific data. This can use both static and dynamic filtering. 쐍 Domain fixed values Values from the ABAP Data Dictionary, and the users can only choose from the available list when they enter data into the fields. 쐍 Variables The three types of variables: – Single value: The variable only asks for one (single) value; for example, the year is 2019. – Interval: The variable asks for from and to values, for example, the years from 2016 to 2019. – Range: The variable asks a single value but combines it with another operator; for example, the year is greater and equal to 2018. 쐍 Input parameters The five types of input parameters we can create: – Direct: Directly enter the value. We can specify the Data Type of the expected value. There are three data types we can use for Direct: Currency, Unit of measure, and Date. – Column: Choose a value from all the possible values in a specified column in the current view. – Derived From Table: Choose a value from data in another table. Important Terminology Chapter 7 – Static List: Choose a value from a predefined (static) list of values. – Derived From Procedures/Scalar Function: Choose a value from a list of values generated by a scalar function or procedure. 쐍 Currency conversion Used when converting from source currency to target currency. A date for the conversion also has to be supplied. 쐍 Hierarchies Used in reporting to enable intelligent drilldown. 쐍 Time-dependent hierarchies Hierarchies that change over time, for example, before and after a restructuring of the HR organizational structure, and are dependent on the time period of your query. 쐍 Hierarchy function nodes Nodes where hierarchies are generated from a data source. These hierarchies can be used for analytical purposes inside the calculation view. Table 7.1 shows in which view node some of the features are available. For example, filter expressions aren’t available in Graph and Hierarchy Function nodes. Node Type Calculated Columns Counters Restricted Columns Input Parameters Filter Expressions Join Yes – – Yes Yes Star Join Yes Yes Yes Yes Yes Projection Yes – – Yes Yes Aggregation Yes Yes Yes Yes Yes Rank – – – Yes Yes Union Yes – – Yes Yes Graph – – – Yes – Intersect, Minus Yes – – Yes Yes Hierarchy Function – – – Yes – Table 7.1 Features per Node 327 328 Chapter 7 Modeling Functions Practice Questions These practice questions will help you evaluate your understanding of the topics covered in this chapter. The questions shown are similar in nature to those found on the certification examination. Although none of these questions will be found on the exam itself, they will allow you to review your knowledge of the subject. Select the correct answers, and then check the completeness of your answers in the “Practice Question Answers and Explanations” section. Remember, on the exam, you must select all correct answers and only correct answers to receive credit for the question. 1. In which type of node can you create calculated columns? (There are 2 correct answers.) 첸 A. Rank node 첸 B. Graph node 첸 C. Union node 첸 D. Projection node 2. You want to build a hierarchy on a time-based data structure, such as year, month, and day. What type of hierarchy do you use? 첸 A. Parent-child hierarchy 첸 B. Level hierarchy 첸 C. Ragged hierarchy 첸 D. Unbalanced hierarchy 3. Which are features of a calculated column? (There are 2 correct answers.) 첸 A. It’s a column in the result set. 첸 B. Its calculations are persisted. 첸 C. It’s created in the output area. 첸 D. It’s created in the Semantics node. 4. You want to show a male and female option to end users. What feature do you use? (There are 3 correct answers.) 첸 A. Domain fixed values 첸 B. Value help Practice Questions Chapter 7 첸 C. Variables 첸 D. Expressions 첸 E. Input parameters 5. You’re creating a currency conversion. What options do you have for the Target Currency field? (There are 3 correct answers.) 첸 A. Select a fixed currency. 첸 B. Select a column. 첸 C. Use the Derived from Table option. 첸 D. Use an input parameter. 첸 E. Select a table type. 6. Where do you create a variable? 첸 A. In the context menu of a workspace 첸 B. In the Semantics node 첸 C. In the output area of a Join node 첸 D. In the work area of an Aggregate node 7. You have logs from a website. How do you show the total number of website visitors that used a mobile phone? (There are 2 correct answers.) 첸 A. Create a restricted column. 첸 B. Create a hierarchy expression parameter. 첸 C. Create a filter expression. 첸 D. Create a rank column. 8. What can you do with a level hierarchy? 첸 A. Make it time-dependent 첸 B. Use it for a bill of materials (BOM) structure 첸 C. Use it for drilldown in a value help list 첸 D. Show variable deepness in the data 329 330 Chapter 7 Modeling Functions 9. What can a variable be used for? 첸 A. Currency conversion 첸 B. Calculated column expression 첸 C. Call a procedure in SQLScript 첸 D. Filtering 10. You’re working with a currency measure. What is decimal shift used for? 첸 A. To increase the accuracy of the currency conversion 첸 B. To undo the decimal adjustment for ABAP 첸 C. For currencies with more than two decimal places 첸 D. To avoid rounding errors during the conversion 11. What variable type do you use if you want to allow the user to choose the years from 2016 to 2019? 첸 A. Single value 첸 B. Interval 첸 C. Range 첸 D. Multiple entries 12. What do you create to find the number of unique items sold per shop? 첸 A. A calculation view with a restricted column in an Aggregation node 첸 B. A calculation view with a counter in a Projection node 첸 C. A calculation view with a counter in an Aggregation node 첸 D. An analytic view with a calculated column in a Star Join node 13. What can you use to limit the number of records produced by a calculation view? (There are 3 correct answers.) 첸 A. SAP client restrictions 첸 B. Domain fixed values 첸 C. Filter expressions 첸 D. Restricted columns 첸 E. Generated columns Practice Questions Chapter 7 14. You created a calculated column in a star join view for a table containing the number of units ordered, the price per unit, and the number of units in stock. You get the wrong results. What could be the problem? 첸 A. You mistyped the calculated column expression. 첸 B. You chose the SQL expression syntax. 첸 C. You did not enable the Keep Flag. 첸 D. You did not enable the currency conversion. 15. Where can you create a filter expression? 첸 A. In an Aggregation node 첸 B. In a Data Foundation node 첸 C. In a Projection node 첸 D. In the Semantics node 16. Which tables do you need for currency conversion? 첸 A. DD07* 첸 B. TCUR* 첸 C. BICS* 첸 D. SHINE* 17. What do you use to build a financial report for the company’s board members that allows them to choose if they want the report to be in euros or dollars? 첸 A. An input parameter from an Aggregation node 첸 B. A variable from the Semantics node 첸 C. A variable from a Projection node 첸 D. An input parameter from the Semantics node 18. What type of input parameters do you use to choose the data from a field in the current table? 첸 A. Direct 첸 B. Derived from table 첸 C. Value help 첸 D. Column 331 332 Chapter 7 Modeling Functions 19. What are properties of domain fixed values? (There are 2 correct answers.) 첸 A. They are based on the ABAP Data Dictionary. 첸 B. They ask end users for input at runtime. 첸 C. They are static values. 첸 D. They are based on SQLScript functions. 20. Which generated output column uses hierarchies? 첸 A. Rank Column 첸 B. Node Column 첸 C. Restricted Column 첸 D. Calculated Column 21. Which hierarchies can you enable for SQL access? 첸 A. Level hierarchies in a calculation view of type cube 첸 B. Local hierarchies in a star join view 첸 C. Time-dependent hierarchies in a dimension view 첸 D. Shared hierarchies in a star join view 22. What do you require when creating a hierarchy for analysis inside a graphical calculation view? (There are 2 correct answers.) 첸 A. Data for a level hierarchy 첸 B. Data for a parent-child hierarchy 첸 C. A Hierarchy Function node 첸 D. A table function 23. Which column gets automatically generated in a navigational Hierarchy Function node? 첸 A. Hierarchy Rank 첸 B. Hierarchy Parent Rank 첸 C. Hierarchy Distance 첸 D. Hierarchy Level Practice Question Answers and Explanations Chapter 7 Practice Question Answers and Explanations 1. Correct answers: C, D You can create calculated columns in a Union node and a Projection node. If you got this wrong, it might be that you remember it from SAP HANA 1.0. Calculated columns are now available in more view nodes than before. 2. Correct answer: B You use a level hierarchy to build a hierarchy on a time-based data structure. Time-based means you have year on level 1, months on level 2, and days on level 3. Only a parent-child hierarchy can be a time-dependent hierarchy. (The words are confusing, so watch out. Here we mentioned the data structure.) It just illustrates that you could achieve fairly similar functionality with both types of hierarchies, although the approaches are quite different. Ragged and unbalanced hierarchies aren’t mentioned in SAP HANA. 3. Correct answers: A, C Features of a calculated column: it’s a column in the result set and is created in the output area. Its calculations aren’t persisted! It’s NOT created in the Semantics nodes. 4. Correct answers: B, C, E You can show a male and female option to users with either variables or input parameters, and by opening the value help. Domain fixed values can be used for filtering. Male and female got mentioned in the “Domain Fixed Values” section, but the context there was completely different. 5. Correct answers: A, B, D The options you have for the target currency field in a currency conversion are as follows: Select a fixed currency. Select a column. Use an input parameter. 6. Correct answer: B You create a variable in the Semantics node. 333 334 Chapter 7 Modeling Functions 7. Correct answers: A, C You show the total number of website visitors that used a mobile phone by creating either a restricted column or a filter expression. A hierarchy expression parameter is created when you enable a hierarchy for SQL access. A rank column will rank them by numbers but doesn’t show the total number. (You could perhaps rank all the data, rank the data, and then search for the total number in the rankings. So, technically, it’s possible but requires that you still manually search the correct output.) 8. Correct answer: C A level hierarchy can be used for drilldown in a value help list. Only parent-child hierarchies have time-dependency, variable deepness in the data, and can be used for a BOM structure. 9. Correct answer: D A variable can be used for filtering. You’ll use input parameters for currency conversion and in calculated column expression. You can’t call a procedure via a SQLScript variable in SQLScript (which we’ll look at in the next chapter). You use the CALL syntax and use output parameters. 10. Correct answer: C Decimal shift is used for currencies with more than two decimal places. Decimal shift back is used to undo the decimal adjustment for ABAP. 11. Correct answer: B You use an interval variable type if you want to allow the user to choose dates from 2016 to 2019. Range would allow for years > 2016. Don’t confuse the range type with the interval type. 12. Correct answer: C You create a calculation view with a counter in an Aggregation node to find the number of unique items sold per shop. You need a counter for this, so two answers are eliminated. A calculation view with a counter in a Projection node isn’t valid because a counter can’t be created in a Projection node. 13. Correct answers: A, B, C You can use SAP client restrictions, domain fixed values, and filter expressions to limit the number of records produced by a calculation view. Practice Question Answers and Explanations Chapter 7 Restricted columns don’t restrict the number of records. The word columns indicates that it’s an additional output column. An additional output column can’t reduce the number of records. 14. Correct answer: C If you get errors in a calculated column in a star join view, you must enable the Keep Flag and ensure that you don’t create the calculated column in the Aggregation node. 15. Correct answer: C You can only create a filter expression in a Projection node. 16. Correct answer: B You need the TCUR* tables for currency conversion. You need the DD07* tables for domain fix values. 17. Correct answer: D You use an input parameter from the Semantics node (where it gets created) to build a financial report for the company’s board members that allows them to choose if they want the report to be in euros or dollars. You just reference the already created input parameter in the currency conversion. 18. Correct answer: D You use a Column type of input parameter to choose the data from a field in the current table. The Direct type is used to directly type in values. The Derived From Table type is used to get data from another table. 19. Correct answer: A, C Domain fixed values are based on the ABAP Data Dictionary and are static values. 20. Correct answer: B A node column is generated when you enable hierarchies for SQL access. 21. Correct answer: D Shared hierarchies in a star join view can be enabled for SQL access. 22. Correct answer: B, C You require both the data for a parent-child hierarchy and a Hierarchy Function node when creating a hierarchy for analysis inside a graphical calculation view. You can only use the data for a level hierarchy when calling additional hierarchy functions from SQLScript using a table function. (Note that actually all four answers could be correct if you create a table function where you use the data for a level hierarchy and call an additional 335 336 Chapter 7 Modeling Functions hierarchy function from SQLScript, and then use this table function inside a graphical calculation view. You’ll understand this better after having read the next chapter.) 23. Correct answer: C The Hierarchy Distance column gets automatically created in a navigational Hierarchy Function node. The other three columns get automatically created in a generation Hierarchy Function node. Takeaway You should now know how to enhance SAP HANA calculation views with the nine different modeling features we’ve discussed in this chapter. You know how to create calculated and restricted columns to add additional output fields to your calculation views. You’ve looked at filtering and filter expressions, and you’ve learned how to limit the data early in your modeling process. You can now identify when to create variables and input parameters to allow business users to specify what outputs they want to see in the result sets of your calculation views at runtime. Working with currency, you now understand how to add currency conversions to your financial reports. Finally, you now know how to create hierarchies to provide powerful drilldown capabilities for your end users and for analyzing hierarchical data in your organization. Summary We learned how to work with calculated columns, restricted columns, and filters in our calculation views. We added variables and input parameters to add a new level of dynamic interaction to our modeling by allowing the end users to specify what they are looking for in each query of the SAP HANA calculation views. We also saw how to enrich our calculation views with currency conversions and hierarchies. In the next chapter, we’ll discuss how to use SQL and SQLScript to provide the advanced features that you might not be able to do with the graphical calculation views we discussed in previous chapters. Chapter 8 SQL and SQLScript Techniques You’ll Master 쐍 Get a basic understanding of SQL and how to use it with your SAP HANA information models 쐍 Learn SQLScript in SAP HANA and find out how it extends and complements SQL 쐍 Understand how the SAP HANA modeling environment with SQL and SQLScript has changed over the past few years 쐍 Create and use table and scalar functions with your graphical SAP HANA information views 쐍 Develop procedures in SAP HANA using SQL and SQLScript 338 Chapter 8 SQL and SQLScript We’ve made a few references about SQL and SQLScript up to this point. Even in Chapter 7, we enabled hierarchies for SQL access, and you saw how to use the generated node column in a GROUP BY clause. In this chapter, we look at how to use SQL and SQLScript to enhance your graphical modeling capabilities. You’ve already seen how you can use SQLScript in your calculation views, for example, in the expression editors of the calculated and restricted columns, in filter expressions, and in creating default values for input parameters (in the Creating Input Parameters section of Chapter 7). We’ll also see some use of it in Chapter 13 when we look at security. In many large SAP HANA projects, you’ll reach a point at which you need to do something more than what is offered by the graphical information views. Information modeling using SQL and SQLScript offer more powerful options. You lose the productivity and simplicity of working graphically with information views, but you gain flexibility and control. As SAP HANA has matured, the graphical modeling tools have grown such that the requirements for SQL and SQLScript have been reduced. In the past, after you started working with SQL and SQLScript, you had to continue in that environment (or choose to create any subsequent layers in your information models there as well). Since SAP HANA 1.0 SPS 11, you can work graphically with your SAP HANA information views, address the occasional constraint with SQL and SQLScript, and bring this development back into the graphical SAP HANA modeling environment. You can then continue building the remaining layers of your information models using the graphical information views. The focus has therefore shifted toward the graphical modeling environment for SAP HANA modelers. Note There is definitely less focus now on SQL and SQLScript in the average modeling workflow than there was in the past. However, a good grasp of development processes and basic programming knowledge will always be to your benefit. In this chapter, we’ll get you up to speed quickly on the SAP HANA information views and how to integrate your SQL/SQLScript development back into the graphical modeling environment. Objectives of This Portion of the Test Chapter 8 Real-World Scenario You’ve been working on a large SAP HANA project for a few months when a group of business users asks if you can help to make security access more powerful and sync it to automatically use their hierarchal organizational structures. You base the analytic privileges on a procedure that easily exploits these structures. The company is quickly moving into the area of mobile business services and plans to deliver some new, real-time business dashboards to executives and managers. SAP HANA is an obvious choice, and your modeling and development skills are required to build the link to the mobile screens. Using your SQLScript knowledge, you quickly figure out ways to create powerful dashboards that respond fast on the mobile screens. You’ve noticed lately how the expression editors in the graphical modeling environment are now using SQL syntax. By leveraging your SQL knowledge, you improve many of the features end users are using daily in their analytics. Objectives of This Portion of the Test The purpose of this portion of the SAP HANA certification exam is to test your knowledge of modeling using SQL and SQLScript. The certification exam expects you to have a good understanding of the following topics: 쐍 Standard SQL and how it relates to SAP HANA 쐍 SQLScript in SAP HANA 쐍 How to create and use user-defined functions (UDFs), especially table functions 쐍 Procedures in SAP HANA using SQL and SQLScript Note This portion contributes about 5% of the total certification exam score. 339 340 Chapter 8 SQL and SQLScript Key Concepts Refresher Structured Query Language (SQL) is a widely used database programming language standardized by the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO). The standard is often referred to as ANSI SQL. Despite the existence of these standards, most SQL vendors extend ANSI SQL to provide enhanced capabilities for their individual database products. Microsoft uses an extended version called Transact-SQL (T-SQL), and Oracle uses PL/SQL. SAP HANA provides SQLScript, which is an extension of ANSI SQL, to provide extra capabilities for the unique in-memory components that SAP delivers. We’ve already seen many of these features, for example, column tables, parameterized information views, delta buffers, multiple result sets in parallel, and built-in currency conversions. The ANSI SQL standard doesn’t provide these features. However, SAP HANA supports ANSI SQL statements. In this chapter, we provide only a short introduction to SQL and SQLScript, and we do assume you have some knowledge of SQL. You can find the full SAP HANA SQL and SQLScript reference guides at http://s-prs.co/v487620. SQL SQL uses a set-oriented approach for working with data. Tip Rather than working with individual records, you should focus on working with the entire result set that consists of many data records. You specify what should happen to all the data records in the result set at once. This allows databases to be efficient when processing data because they can parallelize the processing of these records. Databases are faster at processing data than application servers because SQL is a set-oriented declarative language, whereas application servers typically use procedural or object-oriented languages that process records one at a time. SQL Language SQL is used to define, manipulate, and control database objects. These actions are specified by various subsets of the language: Key Concepts Refresher Chapter 8 쐍 Data Definition Language (DDL) This is used, for example, to CREATE, ALTER (modify), RENAME, or DROP (delete) table definitions. 쐍 Data Manipulation Language (DML) This is used to manipulate data, for example, insert new data into a table or update the data in a table. Examples in SQL include INSERT, UPDATE, and DELETE. 쐍 Data Query Language (DQL) This is used to SELECT (access) data. It doesn’t modify any data. 쐍 Data Control Language (DCL) This is used for data security, for example, to GRANT privileges to a database user. 쐍 Transaction Control Language (TCL) This is used to control database transactions, for example, to COMMIT or ROLLBACK database transactions. SQL is also subdivided into several language elements, as listed in Table 8.1. Element Short Description Identifiers Names of database objects, for example, a table called BOOKS. Data types Specifies the characteristics of the data stored in the database, for example, a string stored as a NVARCHAR data type. Operators Used to assign, compare, or calculate values, for example, =, >=, + (add), * (multiply), AND, and UNION ALL. Expressions Clauses that produce result sets consisting of either scalar values or tables consisting of columns and rows of data. Examples include aggregations, such as SUM, or IF…THEN…ELSE logic using CASE. Predicates Specifies conditions that can be evaluated to TRUE, FALSE, or UNKNOWN. These can be used to filter statement and query results or to change program flow. They are typically used in the WHERE conditions of the SELECT statement, for example, A >= 100. Queries Retrieves data from the database based on specific criteria. Queries use the SELECT statement, for example, SELECT * from BOOKS. Statements Commands to create persistent effects on schemas or data and control transactions, program flow, connections, and sessions. Functions Provides methods to evaluate expressions. They can be used anywhere an expression is allowed. Various functions are built into SQL, for example, LENGTH(BOOKNAME). Table 8.1 Elements of SQL 341 342 Chapter 8 SQL and SQLScript There are a few important points to remember about SQL when working with SAP HANA: 쐍 Identifiers can be delimited or undelimited, meaning the names of database objects are or are not enclosed in double quotes. If you don’t enclose identifiers with quotes, they are treated as uppercase. Therefore, BOOKS, Books, and books all reference the same table name. Delimited identifiers are treated as case-sensitive and can include characters in the name that would be forbidden by undelimited identifiers. SAP HANA uses Unicode, so the range of characters in a name can be wide. In this case, table "BOOKS" and table "books" are treated as two separate tables. 쐍 NULL is a special indicator used by SQL to show that there is no value. 쐍 The LIKE predicate can be used for SQL searches. SELECT author FROM books WHERE title LIKE ‘%HANA%’; will search for all books with the word HANA in the title. We’ll look at this in more detail in Chapter 10. 쐍 Statements in SQL include the semicolon (;) statement terminator. This is a standard part of SQL grammar. Insignificant whitespace is generally ignored in SQL statements, and queries and can be used to make such statements and queries more readable. We’ll now look at some SQL statements that correspond to SAP HANA modeling constructs you’re already familiar with. Creating Tables The short version of the SQL syntax to create a table is as follows: CREATE [<table_type>] TABLE <table_name>; <table_type> ::= COLUMN | ROW | GLOBAL TEMPORARY | GLOBAL TEMPORARY COLUMN | LOCAL TEMPORARY | LOCAL TEMPORARY COLUMN | VIRTUAL This means that you have to type “CREATE TABLE BOOKS;” to create an empty table called BOOKS. In this example, <tablename> is BOOKS. You can choose what to type in place of the <tablename> placeholder. Anything in square brackets, such as [<table_type>], is optional. If you want your table to be a column table, type “CREATE COLUMN TABLE BOOKS;”, for example. If you don’t specify the type of table, the system will create a row table because that’s what ANSI SQL expects. Key Concepts Refresher Chapter 8 Note The CREATE COLUMN TABLE is already the first bit of SQLScript that you meet. Standard ANSI SQL just caters for CREATE TABLE. The extra COLUMN keyword is specific to SAP HANA, and this “extension” is therefore part of SQLScript. In addition, note that the syntax of SQLScript fits in exactly with that standard SQL syntax. The same syntax is used to create a standard row table (CREATE TABLE) and an SAP HANA column table (CREATE COLUMN TABLE). With SAP HANA SPS 03, we got a new table type called SAP HANA versioned tables. Also known as system-versioned tables, they have actually been part of ANSI SQL since 2011. Instead of a single table, you use two tables. Both these tables must be in the same SAP HANA Deployment Infrastructure (HDI) folder. The first table is a normal column table. The second is a history table that is linked to the first table with a WITH SYSTEM VERSIONING HISTORY TABLE clause. This table has the same fields as the first table, with additional VALIDFROM and VALIDTO time stamp fields. The first table contains your data, just like any other table. But when you update a record, instead of forgetting the previous values, the system-versioned table automatically copies the values in the updated record to the second table and updates the VALIDFROM and VALIDTO fields, before updating the record in the first table. When you query the first table, it behaves like a normal column table. However, you can execute special SELECT statements with a “time-travel clause” on these types of tables to get the data as it was at a certain time. We can therefore read a table as it was at any point in time. This can be useful for tables that refer to financial transactions and can completely change the way that systems perform period closings. Rather than you having to code all the logic to handle historical data, it’s built in to the database engine. You also don’t have to put logic for reading historical data into every database query. Tip SAP HANA versioned tables replace the old HISTORY COLUMN tables. You should not use HISTORY COLUMN tables. SAP Enterprise Architecture Designer (SAP EA Designer) (discussed in Chapter 4) has a feature in which it creates SAP HANA versioned tables for you with just a few clicks. You create a normal column table and then activate the Versioning option in the properties. 343 344 Chapter 8 SQL and SQLScript SPS 04 SAP HANA SPS 04 adds a variation of SAP HANA versioned tables called application time period tables. These are basically the same as SAP HANA versioned tables, but they use valid time periods specified by the business rather than the system time of when a record was updated. You can create these by creating a .hdbapplicationtime file. Just remember to ensure that you also add the correct plugin in the (hidden) .hdiconfig file. Reading Data The query, the most common operation in SQL, uses the SELECT statement—for example: SELECT [TOP <unsigned_integer>] [ ALL | DISTINCT ] <select_list> <from_clause> [(<subquery>)] [<where_clause>] [<group_by_clause>] [<having_clause>] [<order_by_clause>] [<limit_clause>] [<for_update_clause>] [<time_travel_clause>]; One of the shortest statements that can be used to read data from table BOOKS is SELECT * FROM BOOKS;. This returns all the fields and data from table BOOKS, in effect putting the entire table in the result set. Tip We recommend never using SELECT * statements as it can severely impact the query performance of column tables. Because SAP HANA primarily uses column tables, this type of SELECT statement should be avoided! A better way is to specify the <select_list>, for example, SELECT author, title, price FROM BOOKS;. In this case, you only read the author, title, and price columns for all the books. Note the [<time_travel_clause>] for use with SAP HANA versioned tables. SPS 04 What is tremendously helpful for developers in SAP HANA 2.0 SPS 04 is the FOR JSON clause. This returns the result set in JavaScript Object Notation (JSON) format. Key Concepts Refresher Chapter 8 Filtering Data In SAP HANA, we sometimes use a filter, by using the WHERE clause as part of a SELECT statement, for example: SELECT author, title, price FROM BOOKS WHERE title LIKE ‘%HANA%’; Creating Projections Inside a Projection node in SAP HANA views, you can rename fields easily via the AS keyword in SQL, for example: SELECT author, title AS bookname, price FROM BOOKS; The title field is now renamed to bookname in the result set. Creating Calculated Columns Creating calculated columns looks almost the same as renaming fields, but there is an added formula, as shown here: SELECT author, title, price * discount AS discount_price FROM BOOKS; In this case, we added a new calculated column called discount_price. Created Nested Reads and Aggregates In SAP HANA views, we often have many layers; in SQL, we use nested SELECT statements. These nested queries are also known as subqueries. For example: SELECT author, title, price FROM BOOKS WHERE price > (SELECT AVG(price) FROM BOOKS); The SELECT AVG(price) FROM BOOKS statement (subquery) returns the average price of all the books in table BOOKS. The entire query above returns the books that are more expensive than the average price. You can create aggregates, such as with an aggregation node in a calculation view, using standard aggregation functions in SQL. We discussed aggregation in Chapter 5. 345 346 Chapter 8 SQL and SQLScript Creating Joins SAP HANA views normally have many joins. Here is an example of how to create an inner join between two tables: SELECT BOOKS.author, BOOKS.title, BOOKS.price, BOOK_SALES.total_sold FROM BOOKS INNER JOIN BOOK_SALES ON BOOKS.book_number = BOOK_SALES.booknumber; The field on which both tables are joined is the book_number field. Creating Unions Union nodes in calculation views can be performed by using the UNION operator, for example: SELECT author, title, price FROM BOOKS UNION SELECT author, title, price FROM BOOKS_OUT_OF_PRINT; You can also use the UNION ALL operator instead of UNION. The UNION operator returns only distinct values, whereas UNION ALL will return all records, including duplicate records. Conditional Statements You can handle if-then logic in SQL via the CASE statement, for example: SELECT author, title, price, CASE WHEN title LIKE '%HANA'%' THEN 'Yes' ELSE 'No' END AS interested_in_book FROM BOOKS Here, we’ve added a calculated column called interested_in_book to our query. If the book’s title contains the word HANA, we indicate that we’re interested in it. With that short introduction, you’ve seen how to do many of the things we’ve done before using graphical modeling in calculation views, but now using SQL. Key Concepts Refresher Chapter 8 SQLScript SQLScript exposes many of the memory features of SAP HANA to SQL developers. Column tables, parameterized information views, delta buffers, multiple result sets in parallel, built-in currency conversions at the database level, fuzzy text searching, spatial data types, and Predictive Analysis Libraries (PAL) make SAP HANA unique. The only way to make this functionality available via SQL queries, even from other applications, is to provide it via SQL extensions. We’ve come across many of these features in earlier chapters. We’ll look in more detail at the fuzzy text search and PAL in Chapter 10, as well as learn more about the Application Function Library (AFL) that includes predictive and business functions that can be used with SQLScript. In Chapter 11, we’ll learn more about spatial functions, graph processing, and series data in SAP HANA In Chapter 5, we saw the spatial join type used between a ST_POINT data type (e.g., a location) and an ST_POLYGON (e.g., a suburb). The spatial functions can be called from SQLScript to access and manipulate spatial data. SAP HANA also has to cater to SAP-specific requirements, for example, limiting data to a specific “client” (e.g., 100) in the SAP business systems. In SAP HANA, you can use a SESSION_CONTEXT(CLIENT) function in the WHERE clause of a query to limit data in a session to a specific client. In this section, we’ll look at the various features of SQLScript. Separate Statements Using Variables One of the unique approaches of SQLScript is the use of variables to break a large, complex SQL statement into smaller, simpler statements. This makes the code much easier to read and to understand, and it also helps with SAP HANA’s performance because many of these smaller statements can be run in parallel. Let’s look at an example: books_per_publisher = SELECT publisher, COUNT (*) AS num_books FROM BOOKS GROUP BY publisher; publisher_with_most_books = SELECT * FROM :books_per_publisher WHERE num_books >= (SELECT MAX (num_books) FROM :books_per_publisher); 347 348 Chapter 8 SQL and SQLScript We would normally write this as a single SQL statement using a temporary table or by repeating a subquery multiple times. In our example, we’ve broken this into two smaller SQL statements by using table variables. The first statement calculates the number of books each publisher has and stores the entire result set into the table variable called books_per_publisher. This variable containing the entire result set is used twice in the second statement. Tip Notice that the table variable is prepended in SQLScript with a colon (:) to indicate that this is used as an input variable. All output variables use only the name, and all input variables have a colon prepended. The second statement uses :books_per_publisher as input and uses a nested SELECT statement. The SQLScript compiler and optimizer will determine how to best execute these statements, whether by using a common subquery expression with database hints or by combining them into a single complex query. The code becomes simpler to write, easier to understand, and more maintainable. SPS 04 In SAP HANA 2.0 SPS 04, these variables can even be treated like tables. You can modify data in SQLScript table variables with statements such as INSERT, UPDATE, and DELETE. You must just remember to prepend a colon to the variable name, for example, INSERT INTO :variable . . .. By breaking the SQL statement into smaller statements, we also mirror the way we’ve learned to build graphical information views in SAP HANA. Look at the two calculation views in Figure 8.1; if you had to write all this in SQL, it would be quite difficult to create optimal code. Just like we start building graphical information views from the bottom up, we do the same with our SQLScript code. We won’t write all the SQLScript code here, but just some pseudocode to illustrate the point. We start with a Union node on the right side of the model in Figure 8.1: union_output = SELECT * FROM AMER UNION SELECT * FROM EMEA; Key Concepts Refresher Chapter 8 The first nodes in the left-side model are Join nodes. You can use the output from a previous Join node as the source for one side of the join: join_output = SELECT * FROM :previous_join_output INNER JOIN SELECT * FROM get_employees_by_name_filter('Jones'); Each time, the output from the lower node becomes the input for the next higher node. As you can see, this layered approach to information modeling is easy to write in SQLScript. Figure 8.1 SAP HANA Calculation Views in Layers 349 350 Chapter 8 SQL and SQLScript Imperative Logic Most of the SQL and SQLScript code we’ve worked with to this point has used declarative logic. In other words, we’ve used the set-oriented paradigm that contributes to SQL’s performance. We’ve asked the database to hand us back all the results immediately when the query has completed. Occasionally, you might prefer to receive the answers to a query one record at a time. This is most often the case when running SQL queries from application servers. In these cases, you’ll use imperative logic, meaning you’ll “loop” through the results one at a time. You can also use conditionals, such as in if-then logic. As such, the imperative login controls the flow of data to the application server. Imperative logic is (mostly) not available in ANSI SQL; therefore, SAP HANA provides it as part of SQLScript. Tip Remember that imperative logic doesn’t perform as well as declarative logic. SAP HANA can’t optimize the imperative logic to the same performance levels as that of declarative logic. If you require the best performance, try to avoid imperative logic. There are different ways to implement imperative logic in SQLScript: 쐍 WHILE loop The WHILE loop will execute some SQLScript statements as long as the condition of the loop evaluates to TRUE: WHILE <condition> DO some SQLScript statements… END WHILE 쐍 FOR loop The FOR loop will iterate a number of times, beginning with a loop counter starting value and incrementing the counter by one each time, until the counter is greater than the end value: FOR <loop-var> IN [REVERSE] <start_value> .. <end_value> DO some SQLScript statements… END FOR SPS 04 In addition to declarative and imperative logic, we also get recursive SQLScript in SAP HANA 2.0 SPS 04. This is where a function or procedure can call itself. We can illustrate this by using the standard example of recursively calculating factorials, as shown in Listing 8.1. Key Concepts Refresher Chapter 8 PROCEDURE factorial(in depth int, out exitvalue int) BEGIN if :depth < 1 then exitvalue = 1; else call factorial(:depth-1, exitvalue); exitvalue = :depth * :exitvalue; end if END Listing 8.1 Recursive SQLScript in SAP HANA 2.0 SPS 04 The recursion depth for the procedure can be set from 8 to 128, with a default of 32. Functions don’t have an explicit depth limit, but you’ll get a runtime error if you run out of memory. Library procedures and functions don’t support recursion. Dynamic SQL and Security Another powerful feature in SQLScript is the ability to create and execute dynamic SQL. Let’s illustrate this via an example. Figure 8.2 shows a web form asking for the user’s mobile telephone number at the top of the figure. We insert this mobile number into a partially prepared SQL statement and then execute this SQL statement using the EXEC statement. Mobile number: +86 139-1234-5678 “SELECT col1, col2, col2 FROM employees WHERE mobile_num =‘“ & mobile_number & “’;” EXEC “SELECT col1, col2, col2 FROM employees WHERE mobile_num =‘+86 139-1234-5678’;” Figure 8.2 Dynamic SQL with Values from a Web Form 351 352 Chapter 8 SQL and SQLScript Each user gets a unique SQL statement created especially for him or her. The performance isn’t as good as static SQL statements, but this is more powerful and flexible. There is one problem: if you’re not careful, hackers can use this setup to break into your system. This has been one of the top security vulnerabilities in web applications for many years now. Let’s look at the web form again. This time, in Figure 8.3, hackers enter more information than just the mobile telephone number. They add a closing quote and a semicolon. When this is inserted into our prepared SQL statement, it closes the current statement. (Remember that SQL statements are terminated by a semicolon.) Mobile number: +86 139-1234-5678’;DROP DATABASE;‘ “SELECT col1, col2, col2 FROM employees WHERE mobile_num =‘“ & mobile_number & “’;” EXEC “SELECT col1, col2, col2 FROM employees WHERE mobile_num =‘+86 139-1234-5678’;DROP DATABASE;’’” Figure 8.3 Dynamic SQL with SQL Injection from a Web Form Anything after this is executed as another SQL statement—so the hackers can enter something like DROP DATABASE that might delete the entire database. Or they can try various SQL statements to read or modify data values in the database. SAP HANA includes some new features related to this problem, such as the new IS_SQL_INJECTION_SAFE function, which helps prevent such attacks. If you remember to prevent security vulnerabilities such as SQL injection, dynamic SQL can enable you to write some powerful applications. You can change Key Concepts Refresher Chapter 8 data sources at any time, link them to applications, and build adaptive user interfaces. Again, keep in mind that this power also comes at the price of some performance. Because SAP HANA has to evaluate SQL statements at runtime, the performance of dynamic SQL statements can’t be as good as that of static SQL statements. The other downside of using dynamic SQL statements is that the result set can’t be passed to a SQLScript variable because the EXEC statement doesn’t return an entire result set. (You can return a single record into a scalar variable.) User-Defined (Table and Scalar) Functions User-defined functions (UDFs) are custom, read-only functions in SAP HANA. You write your own functions that can be called in a similar fashion as SQL functions. There are two types of UDFs: scalar functions and table functions. The main difference is whether you want a function to return values as simple data types (e.g., an integer) or in complete “table-like” result sets. A scalar function can return one or more values, but these values have to be single data types. A table function returns a set of data. Tip Both types of UDFs are read-only. You may not execute any DDL or DML statements inside these functions. We create functions as shown in Listing 8.2. CREATE FUNCTION <func_name> [(<parameter_clause>)] RETURNS <return_type> [LANGUAGE SQLSCRIPT] [SQL SECURITY <DEFINER | INVOKER>] [DEFAULT SCHEMA <default_schema_name>] AS { BEGIN <function_body> END } Listing 8.2 Creating User-Defined Functions Here are a few important points to note about UDFs: 쐍 RETURNS Depending on the type of data we return, RETURNS will be a scalar or a table function. 353 354 Chapter 8 SQL and SQLScript 쐍 LANGUAGE UDFs can only be written in SQLScript. 쐍 SQL SECURITY This indicates against which user’s security privileges this function will be executed: – If we choose the DEFINER option, this function will be executed by the definer of the function, meaning the technical user of the container. – If we choose the INVOKER option, this function will be executed by the end user calling the function, and all security privileges will be limited to that user’s assigned authorizations. You can create UDFs in the SAP HANA system. We discussed in Chapter 4 how to create new files. We create .hdbfunction objects for scalar and table functions. Figure 8.4 shows a newly created function in SAP Web IDE for SAP HANA. Figure 8.4 Newly Created (Blank) Table Function in SAP Web IDE for SAP HANA In the following two sections, we’ll take a look at examples of both scalar functions and table functions. Scalar Functions Scalar UDFs support primitive SQL types as input. Let’s walk through an example to better understand their usage. Start by creating a scalar function as in Listing 8.3 that takes the price of a book and a discount percentage as the two inputs and creates two output values. The first output value is the price of the book after the discount is applied, and the second is the discount amount. Key Concepts Refresher Chapter 8 FUNCTION apply_discount (book_price decimal(15,2), book_discount decimal(15,2)) RETURNS discounted_price decimal(15,2), discount decimal(15,2), LANGUAGE SQLSCRIPT SQL SECURITY INVOKER AS BEGIN discounted_price := :book_price – ( :book_price * :book_discount); discount = :book_price * :book_discount; END; Listing 8.3 Scalar Function That Returns Two Values You can call the scalar function with the following code: SELECT apply_discount (59.50, 0.1).discounted_price as your_book_price, apply_discount (59.50, 0.1).discount as discount_amount from DUMMY; In this case, you get two values back: 53.55 and 5.95. Note the DUMMY table; it doesn’t really get called, so, in this case, it allows you to call the scalar function. Figure 8.5 shows a similar example of a scalar function from the SAP HANA Interactive Education (SHINE) demo content, as shown in SAP Web IDE for SAP HANA. Figure 8.5 Example of Scalar Function Found in SHINE Content 355 356 Chapter 8 SQL and SQLScript A scalar function can be called in an SQL statement as you would call a field. You can also call it as part of the WHERE clause, but you can’t use it as the data source (in the FROM clause.) Table Functions Table UDFs support table types as an input and returns a table as output. Let’s walk through an example to better under their usage. Listing 8.4 shows a modified version of a table function found in the SHINE content. This code performs an inner join between an employee table and an address table to find the employee’s email address. The neat trick here is in the WHERE clause: SAP HANA performs a fuzzy text search on the employee’s last name. You’ll still get an email address, even if you mistyped the person’s last name. This is discussed in more detail in Chapter 10. FUNCTION get_employees_by_name_filter" (lastNameFilter NVARCHAR(40)) RETURNS table (EMPLOYEEID NVARCHAR(10), "NAME.FIRST" NVARCHAR(40), "NAME.LAST" NVARCHAR(40), EMAILADDR NVARCHAR(255)) LANGUAGE SQLSCRIPT SQL SECURITY INVOKER AS BEGIN RETURN SELECT a.EMPLOYEEID, a."NAME.FIRST", a."NAME.LAST", b.EMAILADDR, FROM Employees AS a INNER JOIN Addresses AS b ON a.EMPLOYEEID = b. EMPLOYEEID WHERE contains("NAME.LAST", :lastNameFilter, FUZZY(0.9)); END; Listing 8.4 Table Function That Performs a Fuzzy Text Search on Last Names to Find Email Addresses Figure 8.6 shows the table function from the SHINE demo content in SAP Web IDE for SAP HANA. We can use the table function from Figure 8.6 as a data source in a graphical calculation view. In an SQL statement, you can place a table function in the FROM clause. An example is shown in Figure 8.7, in which this table function is used in a join with the result set from a Union node. (This isn’t really a good example—we should really join on the key field—but is used here purely to illustrate the point.) Key Concepts Refresher Chapter 8 Figure 8.6 Example of Table Function Found in SHINE Content Figure 8.7 Using a Table Function in a Graphical Calculation View 357 358 Chapter 8 SQL and SQLScript Procedures A procedure is a subroutine consisting of SQL statements. Procedures can be called or invoked by applications that access the database. Procedures are sometimes also known as stored procedures. Procedures are more powerful than UDFs. Whereas UDFs are read-only and can’t change any data in the system, procedures can insert, change, and delete data. You can execute any DDL or DML statements inside a procedure. Whereas a table function can only return one result set, procedures can return many result sets as output. These multiple result sets can be scalar and tabular. Listing 8.5 illustrates how to create procedures. CREATE PROCEDURE <proc_name> [(<parameter_clause>)] [LANGUAGE <SQLSCRIPT | R>] [SQL SECURITY <DEFINER | INVOKER>] [DEFAULT SCHEMA <default_schema_name>] [READS SQL DATA [WITH RESULTS VIEW <view_name>]] AS { BEGIN [SEQUENTIAL EXECUTION] <procedure_body> END } Listing 8.5 Creating Procedures The following syntax items are used in Listing 8.5: 쐍 LANGUAGE Procedures can be written in SQLScript or in R. R is a popular open-source programming language and data analysis software environment for statistical computing. 쐍 SQL SECURITY Indicates against which user’s security privileges this function will be executed: – If we choose the DEFINER option, this function will be executed by the definer of the function, meaning the technical user of the container. This is the default for procedures. – If we choose the INVOKER option, this function will be executed by the end user calling the function, and all security privileges will be limited to that user’s assigned authorizations. 쐍 READS SQL DATA Marks the procedure as being read-only. The procedure then can’t change the data or database object, and it can’t contain DDL or DML statements. If you try Key Concepts Refresher Chapter 8 to activate a read-only procedure that contains DDL or DML statements, you’ll get activation errors. 쐍 WITH RESULTS VIEW Specifies that a view (that we have to supply) can be used as the output of a readonly procedure. When a result view is defined for a procedure, it can be called by an SQL statement in the same way as a table or view. 쐍 SEQUENTIAL EXECUTION Forces sequential execution of the procedure logic; no parallelism is allowed. 쐍 PARALLEL EXECUTION Allows parallel execution of the procedure logic. Some important points to note with regards to procedures: 쐍 A procedure doesn’t require any input and output parameters. If we have input and output parameters, they must be explicitly typed. The input and output parameters of a procedure can have any of the primitive SQL types or a table type. In SQLScript, we can also define table types. Table types are similar to database tables but don’t have an instance. They are like a template for future use. They are used to define parameters for a procedure that needs to use tabular results. 쐍 If we require table outputs from a procedure, we have to use table types. 쐍 Any read-only procedure may only call other read-only procedures. The advantage of using a read-only procedure is that certain optimizations are available for read-only procedures, such as not having to check for database locks; this is one of the performance enhancements in SAP HANA. 쐍 Note that WITH RESULTS VIEW is a subclause of the READS SQL clause, so the same restrictions apply to it as those for read-only procedures. Let’s switch gears. Similar to UDFs, we can create procedures. In this case, we create an .hdbprocedure object. Figure 8.8 shows a procedure in SAP Web IDE for SAP HANA. Procedures are called with the CALL statement. You can call the procedure shown in Figure 8.8 with the following statement: CALL "get_product_sales_price" ('HT-1000', ?); The results of this CALL statement, as executed in the SQL Console in SAP Web IDE for SAP HANA, are shown in Figure 8.9. 359 360 Chapter 8 SQL and SQLScript Figure 8.8 Example Procedure Found in SHINE Content Figure 8.9 Results of Calling a Procedure Directly from the SQL Console Key Concepts Refresher Chapter 8 You can call procedures with multiple result sets, which read and write transactional data from an SAP HANA extended application services, advanced model (SAP HANA XSA) application using server-side JavaScript such as Node.js. The RESTstyle web services created will use an HTTP GET for reading data and an HTTP PUT for writing data. You’ll notice that how we use procedures is quite different from how we use UDFs. We can, however, use procedures in a similar way as functions if we specify the read-only and result view clauses. In Listing 8.6, we create a procedure that uses a pre-created result view called View4BookProc and a table type called BOOKS_TABLE_TYPE. CREATE PROCEDURE ProcWithResultView(IN book_num VARCHAR(13), OUT output1 BOOKS_TABLE_TYPE) LANGUAGE SQLSCRIPT READS SQL DATA WITH RESULT VIEW View4BookProc AS BEGIN output1 = SELECT author, title, price FROM BOOKS WHERE isbn_num = :book_num; END; Listing 8.6 Read-Only Procedure Using a Result View We can now call this procedure from a SQL statement, just like a table function: SELECT * FROM View4BookProc (PLACEHOLDER."$$book_num$$"=>'978-1-4932-1230-9'); Another way of calling a procedure is from a calculation view. In Chapter 7, we used input parameter mapping. We can also map the input parameter in a calculation view to the input parameter of a procedure. SQLScript Libraries SQLScript built-in libraries were introduced with SAP HANA 2.0 SPS 02. The number of built-in libraries is growing continuously. With SPS 03, user-defined libraries (UDLs) were introduced, where you can create your own library functions and procedures. These library members have to be wrapped with an SQLScript artifact like a procedure, function, or anonymous block. They can only be used for variable assignment, and they can’t be used in an SQL query. 361 362 Chapter 8 SQL and SQLScript SPS 04 The preceding restrictions have been removed with SPS 04. You now don’t need to wrap library members in an SQLScript artifact, and you can use library functions directly in an SQL query. SPS 04 also adds a library for logging, as well as an end user unit test framework. Multilevel Aggregation Sometimes, you may need to return multiple result sets with a single SQL statement. In normal SQL, this might not always be possible, but we can use multilevel aggregation with grouping sets to achieve this in SQLScript. Let’s start by looking at an example (Listing 8.7) in SQLScript of how we would create two result sets. -- Create result set 1 books_pub_title_year = SELECT publisher, name, year, SUM(price) FROM BOOKS WHERE publisher = pubisher_id GROUP BY publisher, name, year; -- Create result set 2 books_year = SELECT year, SUM(price) FROM BOOKS GROUP BY year; Listing 8.7 Creating Two Result Sets in SQLScript You can create both of these result sets from a single SQL statement using GROUPING SETS, as shown in Listing 8.8. SELECT publisher, name, year, SUM(price) FROM BOOKS WHERE publisher = pubisher_id GROUP BY GROUPING SETS ((publisher, name, year), (year)) Listing 8.8 Creating Multiple Result Sets from a Single SQL Statement with Grouping Sets You now have a good understanding of SQLScript and the powerful capabilities it brings to your modeling toolbox. Let’s now look at when you should use this functionality and how to ensure that you use the latest variations of these features. Views, Functions, and Procedures Taking a step back, let’s look at how to best address our modeling requirements. In SAP HANA we have three main approaches: views, UDFs, and procedures. Let’s first discuss each of these options, and then we’ll discuss when you should use each one. Key Concepts Refresher Chapter 8 Information Views We’ve discussed SAP HANA information views in detail in preceding chapters. You’ve built these views using the graphical environment, and you’ve now also learned how to also create these view using SQLScript. Some of the characteristics of views include the following: 쐍 Views are treated like tables. You can use SELECT, WHERE, GROUP BY, and ORDER BY statements on views. 쐍 Views are static. They always return the same columns, and you look at a set of data in the same way. 쐍 Views always return one result set. 쐍 Views are mostly used for reading data. User-Defined Functions After learning about UDFs, you should be familiar with their characteristics, such as the following: 쐍 Functions always return results. A table function returns a set, and a scalar function returns one or more values. 쐍 Functions require input parameters. 쐍 The code in UDFs is read-only. Functions can’t change, update, or delete data or database objects. You can’t insert data with functions, so you can’t use transactions or COMMIT and ROLLBACK in functions. 쐍 Functions can’t call EXEC statements; otherwise, you could build a dynamic statement to change data and execute it. 쐍 You can call the results of functions in SELECT statements and in WHERE and HAVING clauses. 쐍 Functions can be used in a join. 쐍 Table functions can’t call procedures because procedures can change data. You have to stick to the read-only paradigm of functions. 쐍 Functions use SQLScript code. This can include imperative logic in which you loop through results or apply an if-then logic flow to the results. Procedures Procedures are the most powerful of the three approaches. Some of the characteristics of procedures include the following: 363 364 Chapter 8 SQL and SQLScript 쐍 Procedures can use transactions with COMMIT and ROLLBACK. They can modify tables and data, unless they are marked as read-only. Read-only procedures may only call other read-only procedures. 쐍 Procedures can call UDFs, but UDFs can’t call procedures. 쐍 Procedures don’t have to return any values but may return multiple result sets. Procedures also don’t need to specify any input parameters. 쐍 Procedures provide full control because all code functionality is allowed in procedures. 쐍 Procedures can use try-catch blocks for errors. 쐍 Procedures can also use dynamic SQL—but be aware of security vulnerabilities such as SQL injection when using dynamic SQL. SPS 04 Up to SAP HANA 2.0 SPS 04, dynamic SQL was always considered as read-write. With SPS 04, you can now specify dynamic SQL in a procedure to be read-only by using the READS SQL DATA clause. This especially helps performance by parallelizing multiple read-only dynamic SQL statements. If you use dynamic SQL in a read-only procedure, SAP HANA automatically assumes that the dynamic SQL in such a procedure is also read-only. Why else would you include it into a read-only procedure? When to Use What Approach After all that detail, the recommendation of when to use each of the three approaches is quite easy to make: 쐍 In order of functionality, procedures come in first, followed by UDFs, and then views. 쐍 In order of performance, we have the reverse: views give the best performance, then UDFs (because they are permanently read-only), and then procedures. Tip Use the minimal approach to get the results you require. If you can do it with views, then use views. If you can’t do it with views, then try using UDFs. If that still doesn’t meet the requirements, create a procedure. This way, you do the least work and get the best performance. Important Terminology Chapter 8 Catching Up with SAP HANA 2.0 Figure 8.7 illustrated how to use a table function in a graphical calculation view. This option changes how we approach modeling in SAP HANA projects and shows how strong the graphical modeling environment has become. We can work graphically with our SAP HANA views, create UDFs or procedures for the occasional constraint, and bring these back into the graphical SAP HANA modeling environment. We can then continue building the remainder of our SAP HANA information models using the graphical information views. We should, however, stick to the latest modeling best practices: 쐍 The main guideline to keep in mind is that you should create design-time objects that can be deployed to multiple systems, such as development, testing, and production. Don’t create runtime objects as these can’t easily be copied to other systems. 쐍 Many modelers and developers used the SQL Console to create procedures. This creates runtime objects that can’t be transported to the production system. The recommended way is to create design-time objects—for example, .hdbprocedure objects—instead of using the CREATE PROCEDURE statement. 쐍 Don’t create database objects (e.g., tables) using SQL statements such as CREATE TABLE. Rather use core data services (CDS) objects such as .hdbcds. Even the .hdbtable is deprecated. You can use .hdbtabledata for loading data into the tables created by the CDS objects from data files (e.g., comma-separated values [CSV] files). Important Terminology The following important terminology was used in this chapter: 쐍 SQL Structured Query Language (SQL) is a database programming language used to enhance your modeling process when you have requirements that graphical views can’t fulfill. 쐍 SQLScript SQLScript exposes many of the memory features of SAP HANA to SQL developers. This includes working with column tables, parameterized information views, delta buffers, multiple result sets in parallel, built-in currency conversions at the database level, fuzzy text searching, spatial data types, and PAL. The 365 366 Chapter 8 SQL and SQLScript only way to make this functionality available via SQL queries, even from other applications, is to provide it via SQL extensions, that is, SQLScript. 쐍 User-defined functions (UDF) UDFs are custom, read-only functions in SAP HANA. 쐍 Procedures (stored procedures) A procedure is a subroutine consisting of SQL statements. Procedures can be called or invoked by applications that access the database. 쐍 Imperative logic Occasionally, you might prefer to receive the answers to a query one record at a time. This is most often the case when running SQL queries from application servers. In these cases, you’ll use imperative logic, meaning you’ll “loop” through the results one at a time 쐍 Dynamic SQL Dynamic SQL is when you build up SQL strings on the fly using code and then execute them. Practice Questions These practice questions will help you evaluate your understanding of the topics covered in this chapter. The questions shown are similar in nature to those found on the certification examination. Although none of these questions will be found on the exam itself, they will allow you to review your knowledge of the subject. Select the correct answers, and then check the completeness of your answers in the “Practice Question Answers and Explanations” section. Remember, on the exam, you must select all correct answers and only correct answers to receive credit for the question. 1. What paradigm describes the SQL language? 첸 A. Set-oriented 첸 B. Object-oriented 첸 C. Procedural 첸 D. Application-focused Practice Questions Chapter 8 2. To what subset of SQL does the SELECT statement belong? 첸 A. DML 첸 B. DDL 첸 C. DCL 첸 D. DQL 3. True or False: Delimited identifiers are treated as case-sensitive. 첸 A. True 첸 B. False 4. What is used in SQL to indicate that there is no value? 첸 A. Zero 첸 B. Empty quotes 첸 C. NULL 첸 D. Spaces 5. What character is used to terminate SQL statements? 첸 A. ? (question mark) 첸 B. : (colon) 첸 C. ; (semicolon) 첸 D. . (period) 6. What SELECT statement isn’t recommended for use with column tables? 첸 A. SELECT * 첸 B. SELECT with a time-travel clause 첸 C. SELECT ALL 첸 D. SELECT in a subquery 7. True or False: A union returns all values in a result set, even duplicate records. 첸 A. True 첸 B. False 367 368 Chapter 8 SQL and SQLScript 8. What character is prepended to a SQLScript variable when it’s used as an input variable? 첸 A. ! (exclamation mark) 첸 B. : (colon) 첸 C. _ (underscore) 첸 D. & (ampersand) 9. What should you keep in mind when using imperative logic in SQLScript? 첸 A. It delivers the best possible performance. 첸 B. You can only use if-then logic. 첸 C. You can loop through records. 첸 D. It matches SQL’s set-oriented paradigm. 10. What should you keep in mind when using dynamic SQL? (There are 2 correct answers.) 첸 A. It’s always bad for security. 첸 B. It could be used for SQL injection. 첸 C. You can dynamically change your data sources. 첸 D. It delivers the best possible performance. 11. You created a user-defined function with a return type of BIGINT. What type of function have you created? 첸 A. SQL function 첸 B. Scalar function 첸 C. Table function 첸 D. Window function 12. In which programming languages can you create procedures in SAP HANA? (There are 2 correct answers.) 첸 A. SQLScript 첸 B. L 첸 C. JavaScript 첸 D. R Practice Questions Chapter 8 13. You created a user-defined function with the INVOKER security option. Which user’s authorizations are checked when the function is executed? 첸 A. The owner of the function 첸 B. The container’s technical user 첸 C. The SYSTEM user 첸 D. The user calling the function 14. What is the preferred way to create a procedure? 첸 A. Use the CREATE PROCEDURE statement in the SQL console. 첸 B. Create a .procedure file. 첸 C. Create a .hdbprocedure file. 첸 D. Use the Import menu in SAP Web IDE for SAP HANA. 15. With which statement can you call a table function? (There are 2 correct answers.) 첸 A. CALL 첸 B. INSERT INTO 첸 C. SELECT 첸 D. EXEC 16. What statement is allowed in a user-defined function? 첸 A. COMMIT 첸 B. EXEC 첸 C. INSERT 첸 D. JOIN 17. You want to call a procedure in a SELECT statement (e.g., a table function). Which clauses do you have to specify? (There are 2 correct answers.) 첸 A. LANGUAGE SQLSCRIPT 첸 B. WITH RESULTS VIEW 첸 C. SEQUENTIAL EXECUTION 첸 D. READS SQL DATA 369 370 Chapter 8 SQL and SQLScript 18. What SQL construct do you use to return multiple result sets from a single SQL statement? 첸 A. GROUPING SETS 첸 B. SELECT subquery 첸 C. JOIN 첸 D. UNION 19. What can be used to return multiple result sets? 첸 A. Scalar function 첸 B. Table function 첸 C. Procedure 첸 D. View 20. What should you keep in mind about user-defined functions? 첸 A. They can use imperative logic. 첸 B. They can commit transactions. 첸 C. They can use dynamic SQL. 첸 D. They can call procedures. 21. What should you keep in mind about procedures? (There are 3 correct answers.) 첸 A. They can call views. 첸 B. They are NOT open to SQL injection. 첸 C. They must return values. 첸 D. They may have input parameters. 첸 E. They can call table functions. 22. Where should you use SQLScript outside of the development environment discussed in this chapter? 첸 A. Calculation views 첸 B. Security 첸 C. Text analytics Practice Question Answers and Explanations Chapter 8 첸 D. Creating tables 첸 E. Reporting tools 23. True or False: You should create tables with .hdbtable CDS objects. 첸 A. True 첸 B. False 24. What does SQLScript provide in SAP HANA? 첸 A. Variables to break up complex statements 첸 B. Powerful client-side processing 첸 C. Flow control logic 첸 D. Creation of column tables 첸 E. Aggregation functions, such as AVG() Practice Question Answers and Explanations 1. Correct answer: A SQL uses a set-oriented paradigm. Object-oriented and procedural paradigms are used for application-level programming languages. SQL is database-focused rather than application-focused. (Imperative logic is more application-focused.) 2. Correct answer: D The SELECT statement belongs to the Data Query Language (DQL) subset of SQL. Data Manipulation Language (DML) refers to SQL statements such as INSERT and UPDATE. Data Definition Language (DDL) refers to creating tables and so on. Data Control Language (DCL) refers to security. 3. Correct answer: A True. Yes, delimited identifiers are treated as case-sensitive. Undelimited (without quotes) identifiers are treated as if they were written in uppercase. 4. Correct answer: C NULL is used in SQL to indicate that there is no value. 5. Correct answer: C A semicolon (;) is used to terminate SQL statements. 371 372 Chapter 8 SQL and SQLScript 6. Correct answer: A A SELECT * statement isn’t recommended for use with column tables. Column tables are stored per column, and SELECT * forces the database to read every column. It becomes much faster when it can ignore entire columns. 7. Correct answer: B False. A UNION ALL statement returns all values in a result set, even duplicate records. UNION only returns the unique records. 8. Correct answer: B A colon (:) is prepended to a SQLScript variable when it’s used as an input variable. 9. Correct answer: C You can loop through records using imperative logic in SQLScript. It does not deliver the best possible performance. Declarative logic (the set-oriented paradigm) is much faster. The word only makes the “You can only use if-then logic” statement false because we can also use loop logic with imperative logic. 10. Correct answers: B, C Dynamic SQL can be used for SQL injection, and you can dynamically change your data sources. The word always makes the “It’s always bad for security” statement false. Dynamic SQL doesn’t deliver the best possible performance. 11. Correct answer: B You create a scalar function when it has a return type of BIGINT. This returns a single value, not a table-like result set. SQL functions and window functions aren’t UDFs. 12. Correct answers: A, D You can create procedures in SQLScript and R in SAP HANA. 13. Correct answer: D A user-defined function with the INVOKER security option checks the authorizations of the user calling the function when this function is executed. The technical user of the container is used if the DEFINER security option is used. The owner of the function is no longer applicable after the function has been deployed. The SYSTEM user is never used in this context. Practice Question Answers and Explanations Chapter 8 14. Correct answer: C The preferred way to create a procedure is to create an .hdbprocedure file. Methods A and B are deprecated! (This shows how much things have changed over the past few years.) 15. Correct answers: C, D You can call a table function with SELECT and EXEC statements. EXEC can be used to build a dynamic SQL statement that contains a SELECT statement, for example. You CAN’T use a function to INSERT INTO because it’s read-only. CALL is how procedures are called. 16. Correct answer: D JOIN is allowed in a UDF. COMMIT, EXEC, and INSERT aren’t read-only statements. A UDF won’t even build when you have these in the function. 17. Correct answers: B, D You need the WITH RESULTS VIEW and READS SQL DATA clauses to call a procedure in a SELECT statement. 18. Correct answer: A GROUPING SETS can return multiple result sets from a single SQL statement. All the other statements use multiple sources but return a single result set. 19. Correct answer: C Procedures can return multiple result sets. Table functions and views can only return single result sets. Scalar functions return (multiple) single values, not sets. 20. Correct answer: A UDFs can use imperative logic. They can’t commit transactions or use dynamic SQL (because they are readonly), and they can’t call procedures. (However, procedures can call UDFs.) 21. Correct answers: A, D, E Procedures can call views and table functions, and they may have input parameters. They may, but not must, return values. Because they can use dynamic SQL, they could be open to SQL injection. 373 374 Chapter 8 SQL and SQLScript 22. Correct answers: A, B, C You use SQLScript in calculation views (e.g., in filter expressions, and in calculated and restricted columns), in security, and in text analytics. You shouldn’t use SQLScript to create tables. That way, they’ll be runtime objects and can’t easily be deployed to multiple systems. Create them using CDS as design-time objects. Reporting tools don’t use SQLScript. 23. Correct answer: B No, you should NOT create tables with old .hdbtable CDS objects. You should create them with .hdbcds CDS objects. 24. Correct answers: A, C, D SQLScript provides variables to break up complex statements, flow control logic, and creation of column tables Takeaway You now have new tools and approaches in your modeling toolbox. We started by providing a basic understanding of SQL and how to use it with your SAP HANA information models. You learned about SQLScript in SAP HANA and discovered how it extends and complements SQL, especially by creating and using UDFs and procedures to use with your graphical SAP HANA information views. Summary You should now be up to date with the SQL and SQLScript working environments and know how to use them in your SAP HANA information modeling. We’ve now covered the modeling topics that taught you to build very powerful information models. The next chapter focuses on how to load data into SAP HANA. Chapter 9 Data Provisioning Techniques You’ll Master 쐍 Learn what data provisioning is 쐍 Understand the basic data provisioning concepts 쐍 Get to know some of the various SAP data provisioning tools available for SAP HANA 쐍 Know when to use the different SAP data provisioning tools in the context of SAP HANA 쐍 Identify the benefits of each data provisioning method 쐍 Gain a working knowledge of some features in a few of the SAP provisioning tools 376 Chapter 9 Data Provisioning In this chapter, we’ll cover how to provide data to SAP HANA information models and the basic principles and tools used for data provisioning in SAP HANA. For our information models to work properly, we need to ensure that we get the right data into SAP HANA at the right time. There are people who specialize in each of the data provisioning tools discussed, and there are special training courses for every one of these tools. Our aim, therefore, isn’t to dive deep into detail about the various concepts and tools, but to give you a high-level working knowledge of what each tool does to help you understand the concepts involved when provisioning data into SAP HANA, to enable you to have a meaningful conversation on the topic, and to help you choose the right tool for your business requirements. Real-World Scenario You start a new project that involves SAP HANA, and you’re asked to provide data for the SAP HANA system. You start by asking important questions about the expected data provisioning: Where is the data stored currently? Is the data clean? Will the data be updated in the source systems, or is this a once-off data provisioning process? Does the business need the data in real time, or is a delay of some hours acceptable? Does the data need to be stored in the SAP HANA system? Does the business already have a data provisioning tool that you can reuse? Will the data provisioning process be limited to SAP HANA only? After evaluating all these factors, you recommend the right data provisioning tool to the business, one that is applicable to its budget, circumstances, and requirements. Objectives of This Portion of the Test The objective of this portion of the SAP HANA certification is to test your highlevel understanding of the various SAP data provisioning tools available for getting data into an SAP HANA system. Key Concepts Refresher Chapter 9 The certification exam expects you to have a good understanding of the following topics: 쐍 General data provisioning concepts 쐍 Different SAP data provisioning tools 쐍 Similarities and differences between the various provisioning tools 쐍 When to use a particular SAP provisioning tool 쐍 Features of some of the SAP data provisioning tools Note This portion contributes about 4% of the total certification exam score. Key Concepts Refresher In Chapter 5, we started by looking at the basic information modeling concepts. We presented an overview diagram of the different types of information models in SAP HANA and how they all work together to help solve business requirements. This overview diagram is shown again in Figure 9.1. So far, we’ve focused our attention on the bottom half of the diagram: on information modeling inside SAP HANA. In this chapter, our attention shifts to the upperleft corner of the diagram. You need to know how to supply your SAP HANA information models with the data they require. SAP HANA doesn’t really differentiate whether the data comes from SAP or from non-SAP source systems. Data is data, regardless of where it comes from. Combined with the fact that SAP HANA is a full platform—and not just a dumb database—this makes it an excellent choice for all companies. SAP HANA is meant not only for traditional SAP customers but also for a wide range of other companies, from small startups to large corporations. In fact, SAP works with thousands of startups and smaller partners, helping them use SAP HANA to deliver innovative solutions that are “nontraditional” in the SAP sense. In this chapter, we’ll look at some of the various data provisioning concepts and tools. 377 378 Chapter 9 Data Provisioning SAP source system Non-SAP source system Database Database HTML5 browser Report or query Application server Data provisioning tool, e.g. SDI Table 1 Table 8 Table 2 Table 9 Table 3 Table 10 Table 4 Table 11 Table 5 Table 12 Customer dimension view Products dimension view Laura Apples Star Join (Logical Join) Quantity – 10 Star-join view (“Cube”) Union Star-join view (“Cube”) Calculation view of type cube Table 6 Data Foundation Table 7 Tables Star-join view (“Cube”) SAP HANA Figure 9.1 Overview Diagram and Where Data Provisioning Fits Tip We’ll look at the on-premise data provisioning tools. Several of the data provision tools are also used for SAP Cloud Platform, such as SAP Cloud Platform Smart Data Integration and SAP Cloud Platform Integration for data services. For more information on these tools, look at the SAP Cloud Platform Integration information at http://s-prs.co/v487636. Concepts Data provisioning is used to get data from a source system and provide that data to a target system. In our case, we get this data from various source systems, whether they’re SAP or non-SAP systems, and our target is always the SAP HANA system. We’re not too concerned in this chapter about the different types of data— whether financial transactional data, spatial data, real-time sensor data, social Key Concepts Refresher Chapter 9 media data, or aggregated information. We’ll focus instead on the process of providing the data to SAP HANA. Data provisioning goes beyond merely loading data into the target system. With SAP HANA and some of its new approaches to information modeling, we also have more options available to us when working with data from other systems. We don’t always have to store the data we use in SAP HANA as with traditional data loading. We now also have the ability to consume and analyze data once, never to be looked at again. Extract, Transform, and Load Extract, transform, load (ETL) is viewed by many people as the traditional process of retrieving data from a source system and loading it into a target system. Figure 9.2 illustrates the traditional ETL process of data provisioning. Non-SAP source system Database SAP HANA (target) Extract ETL-based data provisioning tool Load SAP source system Table 1 Table 5 Table 2 Table 6 Table 3 Table 7 Table 4 Tables Extract Transform Database Figure 9.2 Extract, Transform, Load Process The ETL process consists of the following phases: 쐍 Extraction During this phase, data is read from a source system. Traditional ETL tools can read many data sources, and most can also read data from SAP source systems. 쐍 Transform During this phase, the data is transformed. Normally this means cleaning the data, but it can refer to any data manipulation. It can combine fields to calculate something (e.g., sales tax), filter data, or limit data (e.g., to the year 2019). You set up transformation rules and data flows in the ETL tool. 379 380 Chapter 9 Data Provisioning 쐍 Load In the final phase, the data is written into the target system. The ETL process is normally a batch process. Due to the time required for the complex transformations that are sometimes necessary, tools don’t deliver the data in real time. With SAP HANA, the process sometimes changes from ETL to ELT—extract, load, transform—because SAP HANA has data provisioning components built in, as shown in Figure 9.3. In this case, we extract the data, load it into SAP HANA, and only then transform the data inside the SAP HANA system. Non-SAP source system Database Extract SAP HANA Data provisioning component inside SAP HANA Load SAP source system Extract Database Table Transform Table 1 Table 5 Table 2 Table 6 Table 3 Table 7 Table 4 Tables Table Figure 9.3 Extract, Load, Transform Process Initial Load and Delta Loads It’s important to understand that loading data into a target system happens in two distinct phases, as illustrated in Figure 9.4. We always start by making a full copy of the data that we need from the source system. If we require an entire table, we make a copy of the table in the target system. This is called the initial load of the data. We can limit the initial load with a filter (e.g., to load only the latest 2019 data). Sometimes this step is enough, and we don’t need to do anything more. If we set up a training or a testing system, we don’t need the latest data in these systems. Many times, however, we need to ensure that our copy of the data in the target system stays up to date. Business carries on, and items are bought and sold. Each new transaction can change the data in the source system. We need a mechanism to Key Concepts Refresher Chapter 9 381 update the data in the target system. We could just delete the data in the target system and perform the initial load again. For small sets of data, this approach can work, but it quickly becomes inefficient and wasteful for large data sets. Initial data loads Source system Data provisioning tool Table SAP HANA target system Table Delta data loads Source system Data provisioning tool Table SAP HANA target system Table Figure 9.4 Initial Load of Tables and Subsequent Delta Updates A more efficient method is to look at the data that has changed and apply the data changes to our copy of the data in the target system. These data changes are referred to as the delta, and applying these changes to the target system is called a delta load. The various data provisioning tools can differ substantially in the way they implement the delta load mechanism. One method of achieving this is replication. Replication Replication emphasizes the time aspect of the data loading process. ETL focuses on ensuring that the data is clean and in the correct format in the target system. Replication does less of that, but it gets the data into the target system as quickly as possible. As such, the replication process appears quite simple, as shown in Figure 9.5. 382 Chapter 9 Data Provisioning Non-SAP source system Database SAP HANA (target) Read Write Replication Server (real-time) SAP source system Read Write Table 1 Table 5 Table 2 Table 6 Table 3 Table 7 Table 4 Tables Database Figure 9.5 Replication of Tables The extraction process of ETL tools can be demanding on a source system. Because they have the potential to dramatically slow down the source system due to the large volumes of data read operations, the extraction processes are often only run after hours. Replication tools aim to get the data into the target system as fast as possible. This implies that data must be read from the source system at all times of the day and with minimal impact to the performance of the source system. The exact manner in which different replication tools achieve this can range from using database triggers to reading database log files. With SAP HANA as the target system, we achieve real-time replication speeds. (We’ve worked on projects where the average time between when a record is updated in the source system to when the record is updated in SAP HANA was only about 50 milliseconds!) Tip We often use real-time replication with SAP HANA to leverage the speed of SAP HANA for data analysis. SAP-Provided Extractors Normally, we think of data provisioning as reading the data from a single table in the source system and then writing the same data to a similar table in the target system. However, it’s possible to perform this process differently, such as reading Key Concepts Refresher Chapter 9 383 the data from a group of tables and delivering all this data as a single integrated data unit to the target system. This is the idea behind the extractors provided by SAP. The standard approach for taking data from an SAP business system to an SAP Business Warehouse (SAP BW) system (Figure 9.6) uses extractors. SAP predelivers thousands of extractors for SAP business systems. Extractors are only found in SAP systems. In the scenario shown in Figure 9.6, the extractor gathers data from multiple tables in the SAP source system. It neatly packages this data for consumption by the SAP BW system in a flat data structure (datasource). Extractors provide both initial load and delta load functionality. All this happens in the SAP source system. SAP source system SAP BW (target system) Delta queue Infopackage Datasource (flat) Transformation Extractor Activation View Table 1 Table 2 Table 4 Table 5 Table 3 Data Store Object or Infoprovider Figure 9.6 Extractors in SAP Business Systems Loading Data into SAP BW Now that you understand the concepts, we can visit the various SAP tools available for provisioning data. SAP Data Services SAP Data Services is a powerful ETL tool that can read most known data sources, transform and clean data, and load data into target systems. SAP Data Services runs on its own server, as shown in Figure 9.7, and requires a database to store job definitions and metadata. This can be an SAP HANA database. 384 Chapter 9 Data Provisioning Non-SAP source system Extract SAP Data Services (Data Integrator) SAP HANA (target) Designer Database Job Server Extract SAP source system Transform ABAP Data Flows Database Load Table 1 Table 5 Table 2 Table 6 Table 3 Table 7 Table 4 Tables SAP Data Quality Management Figure 9.7 SAP Data Services The extraction capabilities of SAP Data Services are well known in the industry. It can read many data sources—even obscure ones, such as old COBOL copybooks. However, SAP Data Services can also read all the latest social media feeds available on the Internet, and it can communicate with most databases and large business applications. You can also use it with stores of unstructured data, such as Hadoop. SAP Data Services can get data from SAP business systems in various ways. We can use it to directly read a system’s databases. We could also use some of the built-in extractors we discussed previously. Another way SAP Data Services can get data from SAP business systems is by using an ABAP data flow. ABAP is the programming language that SAP uses to write the SAP Business Suite systems. SAP Data Services can write some ABAP code that you can load into the SAP source system. This code becomes part of an ABAP data flow and is a faster and more reliable way to extract the data from an SAP source system. SAP Data Services can also provide complex transformations. SAP Data Services is actually several products combined. When people refer to SAP Data Services, they normally refer to the data provisioning product called Data Integrator. Another product called SAP Data Quality Management is also combined under the SAP Data Services product name. All SAP Data Services products ensure that you have clean and reliable information to work with. You may have heard the old saying, “garbage in, garbage out.” If you have rubbish data going into your reporting system, you get rubbish data in your reports. SAP HANA’s great performance doesn’t make this go away. It just changes that old saying to, “garbage in, garbage out fast!” Key Concepts Refresher Chapter 9 You can use SAP Data Services for data provisioning in the following scenarios: 쐍 A strong ETL tool is required. 쐍 Certain data sources need to be read (e.g., legacy systems). 쐍 Transformation of data is required, for example, when data isn’t clean. 쐍 Real-time data provisioning isn’t a business requirement. 쐍 A customer already has an instance of SAP Data Services running and requires you to use the current infrastructure and skills available. 쐍 The business requires a tool that delivers data to multiple destinations, of which SAP HANA is one. SAP Landscape Transformation Replication Server SAP Landscape Transformation Replication Server is a data provisioning tool that uses replication to load data into SAP HANA. Remember that the focus of a replication server is to load data into the target system as fast as possible. SAP Landscape Transformation Replication Server started as a tool that SAP developed and used internally to help companies migrating data between SAP systems, for example, when companies wanted to consolidate systems after a merger or acquisition. SAP Landscape Transformation Replication Server is written in ABAP and runs on the SAP NetWeaver platform, just like SAP ERP and other SAP business systems. The tool was successfully used for years for data migrations, but it became more popular and well known with the arrival of SAP HANA. SAP Landscape Transformation Replication Server works with both SAP and nonSAP source systems. It uses the same database libraries as the SAP Business Suite and therefore supports all the same databases as the SAP business systems, including SAP HANA, SAP Adaptive Server Enterprise (SAP ASE), SAP MaxDB, Microsoft SQL Server, IBM DB2, Oracle, and Informix. We recommend that SAP Landscape Transformation Replication Server be installed on its own server, as shown in Figure 9.8, even though you can install it inside a source system based on SAP NetWeaver. SAP Landscape Transformation Replication Server uses a trigger-based method for real-time replication. All databases have a trigger whereby they can spur an action to occur when data changes in a table. For example, if data is updated or deleted, or if new data is inserted in a specific table, you can set up a database trigger to also write these changes to a special replication log table. SAP Landscape Transformation Replication Server replication is also known as trigger-based replication. The 385 386 Chapter 9 Data Provisioning impact of the SAP Landscape Transformation Replication Server replication on the source systems is minimal because database triggers are quite efficient. SAP LT Replication Server Source system SAP HANA (target) Table 1 Table 5 Table 2 Table 6 Table 3 Table 7 Table 4 Tables Read module Controller module Write module Database Table 1 … Table n Trigger Logging Table 1 Logging Table 2 Figure 9.8 SAP Landscape Transformation Replication Server SAP Landscape Transformation Replication Server performs the initial load of the database tables without the use of triggers. The delta load mechanism is trigger based. The tool creates triggers in the source systems on the tables it replicates. Any changes made to the source tables after the initial load are collected in logging tables in a separate database schema. SAP Landscape Transformation Replication Server then quickly applies all these changes to the SAP HANA target tables and empties the logging tables in the source system. The logging tables are therefore normally kept very small. You can perform filtering and simple transformations as you load the data into SAP HANA by calling ABAP code inside SAP Landscape Transformation Replication Server. Because it uses ABAP code, you can create more complex transformations, but this would interfere with the real-time advantages of SAP Landscape Transformation Replication Server. If you require complex transformations or data cleaning, consider using SAP Data Services. Because SAP Landscape Transformation Replication Server can read data from older SAP systems, it can also read non-Unicode data from these systems. It converts this data to Unicode while it’s replicated to SAP HANA. This is required because SAP HANA uses Unicode. Key Concepts Refresher Chapter 9 You can use SAP Landscape Transformation Replication Server for your data provisioning needs in the following scenarios: 쐍 Data must be replicated from SAP NetWeaver systems (written in ABAP). 쐍 Replication of data is preferred to transformation. 쐍 There is a business requirement or preference to get the data into SAP HANA in real time. 쐍 The use of triggers is acceptable. 쐍 Replication of data from older SAP systems is required. SAP Replication Server SAP Replication Server also uses replication to load data into SAP HANA. The focus of a replication server is to load the data into the target system as fast as possible with minimal impact on the source system. SAP Replication Server differs from SAP Landscape Transformation Replication Server in that it uses the database log files for real-time replication, as shown in Figure 9.9. Its delta load mechanism is log-based. As data changes in tables, all databases write these changes into a database log file. SAP Replication Server reads these database log files to enable table replication. SAP Replication Server replication is also known as log-based replication. This log file replication mechanism has even less of an impact on the source system than SAP Landscape Transformation Replication Server. SAP HANA (target) Source system SAP Replication Server Database Table 1 Table 2 Table 3 Table 1 … Table n Logs Log files Figure 9.9 SAP Replication Server Tables 387 388 Chapter 9 Data Provisioning SAP Replication Server works with both SAP and non-SAP source systems and supports many databases. Replication servers don’t focus on the filtering or transformation of data before loading the data into SAP HANA. The database log files contain all the changes in a database. SAP Replication Server can use this information to replicate entire databases. For example, SAP ASE databases use SAP Replication Server to replicate the database log files for disaster recovery purposes. Tip The features of SAP Replication Server, such as log-based replication, have been incorporated into SAP HANA smart data integration. SAP HANA smart data integration is built in to SAP HANA and is therefore preferred for our purposes. You can use SAP Replication Server for your data provisioning needs in the following scenarios: 쐍 Replication of data is preferred to transformation. 쐍 There is a business requirement or preference to get the data into SAP HANA in real-time. 쐍 The use of triggers isn’t acceptable. 쐍 No pool or cluster tables need to be replicated. 쐍 The business requires minimal impact on the source system. 쐍 Database replication is needed for disaster recovery purposes. 쐍 The business requires a tool that delivers data to multiple destinations, of which SAP HANA is one. SAP HANA Smart Data Access SAP HANA smart data access provides SAP HANA with direct access to data in a remote database. In this case, we don’t store the data in the SAP HANA database; we can access it at any time. We merely consume the remote data when required. SAP HANA smart data access allows you to create virtual tables, that is, empty tables that point to tables or views inside another database somewhere (see Figure 9.10). Some databases have a similar feature called proxy tables (also sometimes called data federation). Key Concepts Refresher Chapter 9 Report SAP HANA Remote Database View SAP HANA Table Virtual Table S D A Read and write Table View via network Figure 9.10 SAP HANA Smart Data Access and Virtual Tables SAP HANA smart data access has various database adapters it uses to connect these virtual tables to a large number of databases. These adapters include the following: 쐍 SAP HANA, SAP ASE, SAP IQ, SAP Event Stream Processor, and SAP MaxDB 쐍 Hadoop (Hive), Spark, and SAP Vora 쐍 IBM DB2 and IBM Netezza 쐍 Microsoft SQL Server 쐍 Oracle 쐍 Teradata SAP HANA smart data access can also use an Open Database Connectivity (ODBC) framework to connect to other databases via ODBC. When you ask for data from a virtual table, SAP HANA connects to the remote database across the network, logs in, and requests the relevant data from the remote table or view. You can combine the data from the remote database table with data stored in SAP HANA. You can use a virtual table in SAP HANA information models, just like native SAP HANA tables. SAP HANA smart data access will automatically convert the data types from those used in the remote database to SAP HANA data types. 389 390 Chapter 9 Data Provisioning Virtual Tables vs. Native Tables You can’t expect the same performance from SAP HANA smart data access virtual tables as from SAP HANA native tables. SAP HANA tables are loaded in the memory of SAP HANA. The data from the virtual table is fetched from a relatively slow disk-based database across the network. Keep in mind that SAP HANA smart data access reads data from remote tables directly, as it’s stored. SAP HANA smart data access can’t do any data cleansing because it reads data from remote tables. SAP HANA also doesn’t have any scheduling tool, so SAP HANA smart data access can’t be used for ETL processes or replication. It just reads the data from the remote tables as the need arises; for example, a reporting query requires data to be stored in a remote table. SAP HANA provides the following optimizations for SAP HANA smart data access virtual tables: 쐍 SAP HANA caches data that it reads from remote databases. If the data is found in the cache, the data is read from the cache, and no request is sent to the remote database. 쐍 SAP HANA can use join relocation. If you join two virtual tables from different remote databases, it can copy the smaller remote table to a temporary table in the remote system with the larger table. It then requests the second remote system to join the larger table to the temporary table. 쐍 SAP HANA can push down queries to remote databases if it will execute faster in the remote database, rather than copy lots of data via the network. For example, a remote database can calculate an aggregate faster on half a million records than it will take to copy these records across the network for SAP HANA to aggregate. SAP HANA smart data access collects statistics for all its virtual sources to help with such optimizations. 쐍 SAP HANA also has optimization features for aggregating data. It uses functional compensation to compensate for certain (missing) features in other databases. We often hear about people wanting to put all their data into a single physical data warehouse to analyze it there. Some variations are called data lakes. This scenario is referred to as big data (see Figure 9.11, left). Key Concepts Refresher Chapter 9 Report 1 Report 1 Report 2 391 Report 2 Physical Data Warehouse Logical Data Warehouse (SAP HANA using SDA) Table 1 Table 4 System 1 Database 1 Database 2 System 1 Virtual Table 3 Virtual Table 2 Virtual Table 4 Table 5 System 2 System 3 Virtual Table 1 Table 3 Table 2 Virtual Table 5 System 2 System 3 Database 1 Database 2 Figure 9.11 Logical Data Warehouse Use Case for SAP HANA Smart Data Access There are a few problems with the physical data warehouse approach: 쐍 You have to duplicate all your data, so your data is double the size, and you now need more storage. You essentially have doubled your cost and effort. 쐍 You have to copy all that data via the network, which slows your network. 쐍 The data in the new data warehouse is never up to date, and, as such, the data is never consistent. 쐍 The single data warehouse system itself can’t handle all the different types of data requirements, such as unstructured data, graph engine data, key–value pairs, spatial data, huge varieties of data, and so on. These are some of the issues we deal with in the physical data warehouse scenario; therefore, it might be better to consider a logical data warehouse. You keep the data in your source systems, don’t duplicate it via the network, and don’t store 392 Chapter 9 Data Provisioning copies of it. This is where SAP HANA smart data access fits in. You create virtual tables for the various data sources. None of the data is stored inside SAP HANA, but yet it behaves just as if it was in a single data warehouse. You get the data from the various source databases only when required. You can access bits of specific data, combine them with other data, and report on the data (see Figure 9.11, right). You can use SAP HANA smart data access for your data provisioning needs in the following scenarios. 쐍 All the data doesn’t need to be stored inside SAP HANA. 쐍 A logical data warehouse is used, or you’re presented with a big data scenario. 쐍 Integration with Hadoop is required. 쐍 A smaller SAP HANA system is in use that can’t store all the data. 쐍 Real-time data provisioning isn’t a business requirement. 쐍 SAP HANA performance isn’t required. 쐍 You have no need to transform or clean data. SAP HANA Smart Data Integration and SAP HANA Smart Data Quality SAP HANA smart data integration and SAP HANA smart data quality are optional SAP HANA components that you can purchase. They are, however, included with the SAP HANA, express edition, which is completely free for both development and production use for up to 32 GB of memory. SAP HANA Non-SAP source system SDI and SDQ Database Extract, replicate, or SDA SAP source system Load Table Table Table 1 Table 5 Table 2 Table 6 Table 3 Table 7 Table 4 Tables Transform (.hdbflowgraph) Database Figure 9.12 SAP HANA Smart Data Integration and SAP HANA Smart Data Quality As shown in Figure 9.12, SAP HANA smart data integration and SAP HANA smart data quality can work similarly to ELT (refer to Figure 9.3). However, SAP HANA smart data integration can also be used for real-time replication and a few other Key Concepts Refresher Chapter 9 scenarios. Over time, SAP has incorporated many of the data provisioning features from other tools into SAP HANA smart data integration. The other fact that counts in favor of SAP HANA smart data integration is that many of SAP’s business systems have migrated to run on SAP HANA. SAP HANA smart data integration has evolved so much over the past few years, that you can apply the simple rule, “When in doubt, use SAP HANA smart data integration.” It’s even the preferred integration tool on SAP Cloud Platform. Using various adapters, including SAP HANA smart data access adapters, we can replicate or consume data in SAP HANA. Then, using SAP HANA smart data quality capabilities, we can transform and clean that data. Note You can find more information about SAP HANA smart data integration at http://sprs.co/v487637. SAP HANA Smart Data Integration SAP HANA smart data integration can use various adapters to replicate, load, or consume data in SAP HANA. Via SAP HANA smart data access adapters, SAP HANA smart data integration can consume data from a large list of databases using virtual tables. SAP HANA smart data integration has some real-time adapters, for example, reading the database log files of source systems such as Oracle, Microsoft SQL Server, and IBM DB2 databases. These same adapters can be used to read data from SAP systems such as SAP ERP. SAP HANA smart data integration also has a real-time adapter for Twitter, as well as batch adapters for reading web services via OData and for reading data from Hadoop Hive. There is even an adapter for Microsoft Excel. The difference between the log-based adapters for SAP HANA smart data integration and SAP Replication Server is that SAP HANA smart data integration has SAP HANA as the target system for your data provisioning, whereas a standalone SAP Replication Server system can have many target systems. Tip SAP HANA smart data integration has become the preferred SAP data provisioning tool for SAP. This is due to its wide range of adapters that provide both batch and real-time replication capabilities, and the fact that many SAP systems are now running on SAP HANA. 393 394 Chapter 9 Data Provisioning SAP HANA Smart Data Quality SAP HANA smart data quality works together with SAP HANA smart data integration and focuses on the data cleaning aspects. It also provides complex transformations of your data. The data flows in SAP HANA smart data quality use the .hdbflowgraph features inside SAP HANA, as shown in Figure 9.13. (We’ll also use this for predictive and geospatial models in Chapters 10 and 11, respectively.) These data flows work differently from data flows in SAP Data Services. Therefore, you can’t simply copy and paste your SAP Data Services data flows into SAP HANA smart data quality. On the left side of the flowgraph in Figure 9.13, you create your Data Sources. This is where you set up your SAP HANA smart data integration sources. You then design a flow for your data, until it ends up in the Data Target on the right side. Part of this flow can use SAP HANA smart data quality to clean and transform the data. Figure 9.13 Flowgraph Using SAP HANA Smart Data Integration and SAP HANA Smart Data Quality The Geocode node can be used to transform addresses to map locations or transform map locations to addresses. This forms part of the data transformation in the ELT process. In fact, you can use all the different nodes, shown in Figure 9.14 in a flowgraph for data transformation, including all the predictive algorithms available in SAP HANA. In the top right of Figure 9.14, you see nodes for Data Quality. There are also some in the Advanced section. In the Basic and Flow Control sections, the nodes provide Key Concepts Refresher Chapter 9 similar functionality as some of the nodes in a calculation view, such as Projection, Join, Union, and Aggregation. Figure 9.14 Available Flowgraph Nodes You can use SAP smart data integration and SAP HANA smart data quality for your data provisioning needs in the following scenarios: 쐍 An ELT tool is required. 쐍 Transformation of the data is required, such as when the data isn’t clean. 쐍 Real-time data provisioning might be a business requirement. 쐍 Replication of data is required, with later transformation of the data. 쐍 The data provision target is SAP HANA. 쐍 Data must be read from many data sources. SAP HANA Streaming Analytics SAP HANA streaming analytics provides SAP HANA with the ability to observe, process, and analyze fast-moving data. It extends the capabilities of the SAP HANA platform with the addition of real-time event stream processing. An example of such fast-moving data is sensor data or financial ticker tape data. It’s a continuous stream of data points, for example, the temperature of a machine. Each data point in the stream is linked to a point in time. This combination of the temperature data at a specific point in time is called an “event.” 395 396 Chapter 9 Data Provisioning Just like SAP HANA smart data integration and SAP HANA smart data quality, SAP HANA streaming analytics is an optional SAP HANA component that you can purchase. However, SAP HANA streaming analytics is included with SAP HANA, express edition, which is completely free for both development and production use for up to 32 GB of memory. Let’s look at an example of why you might use data stream processing. Say that you want to monitor some manufacturing equipment. Sensor data tells you the equipment temperature, but that data alone isn’t very interesting. If the equipment temperature is 90°C, and the pressure is high, that might be normal. However, if the temperature is 180°C when the pressure is high, that could indicate a dangerous situation in which equipment could be blown apart, causing accidents and manufacturing losses. You want to receive an alert before a situation becomes critical so you can analyze the temperature and pressure trends over time and the relationship between these factors. It’s the combination of certain events in a specified time frame that is important here. We call this the “event window.” The temperature data itself isn’t important, and you don’t want to store this data. The data, once exposed and processed, is no longer useful. You just want to find the trend, see the relationships between the data, and react when certain conditions are met within a certain time period. Data streaming and the processing of such information is different from a single event, such as a database trigger. Figure 9.15 illustrates the data streaming process. SAP HANA SAP HANA streaming analytics Production plant system Streaming data Event window History data Financial ticker tape system Figure 9.15 SAP HANA Streaming Analytics Another example to illustrate the use of SAP HANA streaming analytics is in stock trading. You can have many streams of financial data coming in at very high speeds. How do you react to these continual data streams in a short span of time? Key Concepts Refresher Chapter 9 What do you react to? A single stock going down may not be a problem, but if many stock prices start going down quickly within a certain time window—say, five minutes—that could indicate a stock market crash. The single price values are of little importance. The important information is found in combining many trends and finding the trend in a certain time window. The Internet of Things (IoT) isn’t in the future—it’s already here. Millions of devices are now connected and delivering information. We even have lightbulbs with built-in Wi-Fi that can be controlled from an app on your smartphone. How do you take advantage of all the data that’s available? How do you collect it and analyze it? With SAP HANA streaming analytics, not only can you analyze data streams but you also can store this data and perform deeper, wider analytics of it at a later stage. In this way, SAP HANA streaming analytics and SAP HANA complement each other. SAP HANA streaming analytics is also a vital part of the SAP Leonardo technology stack, where SAP looks at innovative solutions touching areas such as the IoT, machine learning, and blockchain. You can use SAP HANA streaming analytics for your data provisioning needs in the following scenarios: 쐍 Real-time streaming data for events in a certain time window must be analyzed and reacted to properly. 쐍 Fast-moving data is observed, processed, and analyzed, but not stored. 쐍 Data is processed as-is; there is no data cleaning, but you have to find a signal in spite of the noisy data. 쐍 Only data where something interesting is happening needs to be stored. The rest of the data gets discarded, resulting in huge storage savings. SQL Anywhere and Remote Data Sync Up to this point, we’ve made the assumption that we always have good, stable, and constant connectivity between the data sources and SAP HANA. Unfortunately, that isn’t always the case. Let’s consider some scenarios where we can’t assume that we have such connectivity: 쐍 Remote workplaces, such as ships, trains, oil rigs, and mines, that lose connectivity for periods of time (also known as satellite servers) 397 398 Chapter 9 Data Provisioning 쐍 Mobile applications that get used in factories, underground, or any place where the mobile device loses connectivity due to interference or layers of concrete and metals 쐍 IoT sensors that have to conserve power to keep running as long as possible, as well as where constant connectivity would be huge drain on energy levels For these cases, you can install SAP SQL Anywhere on these remote servers or devices. This product contains a small database that can even be embedded in hardware devices such as sensors. On the SAP HANA side, we install SAP HANA remote data sync. You then configure the SAP HANA remote data sync server in SAP HANA to communicate with the SAP SQL Anywhere at the remote sites, as shown in Figure 9.16. SAP HANA remote data sync is particularly useful when you use two-way data flows. The remote sites can send some data to the SAP HANA server, where it can get processed. The results can then be sent down to the remote site and are stored as master data. When the connectivity disappears, the remote site can happily continue functioning until connectivity is restored. All transactions in that time frame are then synchronized. Data Center Remote sites Satellite Server Mobile SAP HANA platform Internet of Things Applications RDSync Server SAP SQL Anywhere Reports Data Synchronization SAP HANA database Figure 9.16 Using SAP HANA Remote Data Sync and SAP SQL Anywhere for Remote Sites with Occasional Connectivity We can use this setup to synchronize data between SAP HANA and thousands of remote SAP SQL Anywhere databases, and still ensure that we can acquire data for analysis in SAP HANA over intermittent and slow networks. The remote sites benefit by always having data available for their business applications. Key Concepts Refresher Chapter 9 Flat Files Flat files (also called comma-separated values [CSV] files) are the simplest way to provision data to SAP HANA. Especially in proof-of-concept (POC) or sandbox testing projects, you can waste time connecting SAP HANA to the various source systems. You don’t need the latest data; you just want to find out if an idea you have for information modeling will work. As such, it makes sense to ask the business to provide you with the data in flat files, and then you can load the data into SAP HANA for the POC or sandbox development (see Figure 9.17). SAP HANA Table 1 Table 5 Table 2 Table 6 Table 3 Table 7 Table 4 Tables CSV data file Import data files Figure 9.17 Loading Data into SAP HANA from Text Files Using Core Data Services (CDS) In SAP Web IDE for SAP HANA, you create an .hdbtabledata file. You can see an example in Figure 9.18 from the SAP HANA Interactive Education (SHINE) demo. In line 7, it specifies that the file is a CSV file, and line 8 specifies the name of the text file. The entire SHINE demo application is built this way, and it loads the demo data using this method. SPS 04 One of the features that you could do with SAP HANA Studio in SAP HANA 1.0 was to import data into a table from CSV files or Excel spreadsheets. With SPS 04, you can now import data from medium-sized CSV files or Excel spreadsheets using the SAP HANA database explorer. 399 400 Chapter 9 Data Provisioning Figure 9.18 Loading Flat Files into SAP HANA Using an .hdbtabledata Definition Consider using data files for your data provisioning needs in the following scenarios: 쐍 Data only needs to be loaded once, for example, for sandbox or training systems. 쐍 There is no connection available to the source systems. 쐍 Data needs to be loaded simply and quickly. Web Services (OData and REST) We won’t discuss web services in detail in this text, but of course you can consume data from a source system that provides it via web services. You can consume these web services using the SAP HANA extended application services, advanced model (SAP HANA XSA) development environment. Key Concepts Refresher Chapter 9 SAP HANA XSA not only delivers OData and REST web services but also consumes them, as shown in Figure 9.19. The data is normally transferred in JavaScript Object Notation (JSON) format rather than XML format. SAP HANA Source system Web Services Consume Web Services (OData, REST) Database XS engine Table 1 Table 5 Table 2 Table 6 Table 3 Table 7 Table 4 Tables Figure 9.19 Data Provisioning via Web Services Consider using web services for your data provisioning needs when the source systems only expose their data as web services. This is typical of web and cloud applications. SAP Vora Another assumption that we sometimes unconsciously make is that all the data we work with is structured and neatly arranged in database records. Real life tends to be a bit messier. Hadoop has grown quite popular in past years for addressing the problems of storing and working with large volumes of “messy” data. Think of Google having to work with all the “messy” data on the Internet. Businesses have started using Hadoop for its low-cost distributed storage and ability to scale. Hadoop is basically a distributed data infrastructure that distributes large data collections across multiple nodes within a cluster of low-cost servers. However, Hadoop isn’t very good for online analytical processing (OLAP), which businesses use to analyze and derive meaning from their data. It usually entails long-running batch computations, and real-time decision support capabilities just aren’t possible. Business analysts sometimes also need to wait for the Hadoop data to be transformed by a few highly skilled but scare resources before they can use it. 401 402 Chapter 9 Data Provisioning SAP Vora is the in-memory solution that SAP has built to solve these problems. SAP Vora uses the in-memory technologies that SAP developed with SAP HANA to speed up these long-running analytics in Hadoop. SAP Vora is installed on each Hadoop/Spark node, as shown in Figure 9.20. Spark is a component running on top of Hadoop to speed up the processing of data. SAP Vora takes that to the next level. Users Business Intelligence Visual Apps Predictive Data Science Hadoop/ Spark node Hadoop/ Spark node Hadoop/ Spark node Hadoop/ Spark node SAP Vora SAP Vora SAP Vora SAP Vora Spark Spark Spark Spark Files Files Files Files Kubernetes cluster Hadoop/Spark cluster Figure 9.20 SAP Vora as Part of a Hadoop/Spark Cluster Inside SAP Vora is a set of in-memory distributed computing engines for running high-performance, sophisticated analytics against relational, time series, graph, and JSON data. It has an OLAP engine for creating the analytics that businesses require. When the data in SAP Vora exceeds the memory, it can swap the data to disk when required. These data overflows are processed efficiently, while ensuring that the resulting analysis still reflect the complete data set. SAP HANA Data Management Suite In this chapter you might get the impression that you only need to get the data into the SAP HANA system, but you’ll also use many of these tools for maintaining Important Terminology Chapter 9 the data. This maintenance might be to keep data up to date, especially for master data in the case of master data management. Or it might be due to regulations such as General Data Protection Regulation (GDPR) that force you to check, update, and delete user data. Data is often called the “oil” of this century. Even engines need some fresh oil every now and then. SAP released SAP HANA Data Management Suite in 2018 to help with this kind of data maintenance and to help with managing data from multiple sources, such as legacy databases, cloud systems, desktop programs, sensors, and data lakes. You can then enrich that data with spatial, text, graph, predictive, and series processing, which we discuss in the next two chapters. Many of the tools in SAP HANA Data Management Suite are available individually. In the suite they are used together to manage the data for the entire enterprise. Tools included are as follows: 쐍 SAP HANA for in-memory processing of the data, ETL, modeling, development, data anonymization, security, and much more. 쐍 SAP Enterprise Architecture Designer (SAP EA Designer) for defining an enterprise-wide metadata model to manage the data models and data rules, and create a harmonized view of the data. (We discussed SAP EA Designer in Chapter 4.) 쐍 SAP Cloud Platform Big Data Services to manage the big data services in the cloud. This includes SAP Vora that we discussed in this chapter. 쐍 SAP Data Hub for providing a central system for data governance, orchestration, pipelining, integration, and federation of data in the company. Important Terminology In this chapter, we covered the following important terms: 쐍 Data provisioning Get data from a source system and provide that data to a target system. 쐍 Extract, transform, load (ETL) The ETL process consists of the following phases: – Extraction: Data is read from a source system. Traditional ETL tools can read many data sources, and most can also read data from SAP source systems. – Transform: Data is transformed. Normally this means cleaning the data, but it can refer to any data manipulation. – Load: Data is written into the target system. 403 404 Chapter 9 Data Provisioning 쐍 Initial load The process of making a full copy of data from the source system. If we require an entire table, we make a copy of the table in the target system. 쐍 Delta load Look at the data that has changed, and apply the data changes to the copy of the data in the target system. These data changes are referred to as the delta, and applying these changes to the target system is called a delta load. 쐍 SAP extractors Read data from a group of tables and deliver the data as a single integrated data unit to the target system. 쐍 SAP Data Services Solution used to integrate, improve the quality of, profile, process, and deliver data. 쐍 SAP Landscape Transformation Replication Server Uses replication to load data into SAP HANA. Remember that the focus of a replication server is to load data into the target system as fast as possible. 쐍 SAP Replication Server Uses replication to load data into SAP HANA. SAP Replication Server differs from SAP Landscape Transformation Replication Server in that it uses the database log files for real-time replication. SAP Replication Server reads these database log files to enable table replication. SAP Replication Server replication is also known as log-based replication. 쐍 SAP HANA smart data access Provides SAP HANA with direct access to data in a remote database. 쐍 SAP HANA smart data integration Uses various adapters to replicate, load, or consume data in SAP HANA. 쐍 SAP HANA smart data quality Used to cleanse and enrich data persisted in the SAP HANA database. 쐍 SAP HANA streaming analytics Allows you to observe, process, and analyze fast-moving data. You can create continuous projects in Computation Continuous Language (CCL) to model streaming information. You can then issue continuous queries to analyze the data. 쐍 SAP HANA remote data sync Allows you to synchronize data between SAP HANA and thousands of remote SAP SQL Anywhere databases, while still ensuring that you can acquire data for analysis in SAP HANA over intermittent and slow networks. Practice Questions Chapter 9 쐍 SAP Vora Uses the in-memory technologies that SAP developed with SAP HANA to speed up long-running analytics in Hadoop and Spark clusters. Practice Questions These practice questions will help you evaluate your understanding of the topic. The questions shown are similar in nature to those found on the certification examination. Although none of these questions will be found on the exam itself, they will allow you to review your knowledge of the subject. Select the correct answers, and then check the completeness of your answers in the “Practice Question Answers and Explanations” section. Remember, on the exam, you must select all correct answers and only correct answers to receive credit for the question. 1. Which of the following SAP data provisioning tools can provide real-time data provisioning? (There are 3 correct answers.) 첸 A. SAP HANA smart data integration 첸 B. SAP Data Services 첸 C. SAP Replication Server 첸 D. SAP HANA smart data quality 첸 E. SAP Landscape Transformation Replication Server 2. What is the correct sequence for loading data with SAP Data Services? 첸 A. Extract • Load • Transform 첸 B. Load • Transform • Extract 첸 C. Load • Extract • Transform 첸 D. Extract • Transform • Load 3. Which of the following data provisioning tools can provide a delta load of a table from the source system? 첸 A. SAP HANA streaming analytics 첸 B. SAP HANA smart data access 첸 C. SAP Landscape Transformation Replication Server 첸 D. Flat files 405 406 Chapter 9 Data Provisioning 4. Which of the following data operations can you perform with SAP HANA smart data access? (There are 2 correct answers.) 첸 A. Data federation 첸 B. Logical data warehouse 첸 C. Physical data warehouse 첸 D. Real-time replication 5. You want to retrieve data from a web application. Which of the following could you use? (There are 3 correct answers.) 첸 A. JSON 첸 B. REST 첸 C. JDBC 첸 D. BICS 첸 E. OData 6. Which data provisioning tools can use SAP HANA adapters? (There are 2 correct answers.) 첸 A. SAP HANA smart data access 첸 B. SAP HANA streaming analytics 첸 C. SAP HANA smart data integration 첸 D. SAP Replication Server 7. Which of the following data provisioning tools can you use to read data directly from Twitter? (There are 3 correct answers.) 첸 A. SAP Replication Server 첸 B. SAP HANA XS engine 첸 C. SAP Data Services 첸 D. SAP HANA smart data integration 첸 E. SAP HANA smart data access Practice Questions Chapter 9 8. Which SAP tools can work with data from Hadoop? (There are 3 correct answers.) 첸 A. SAP HANA smart data access 첸 B. SAP Landscape Transformation Replication Server 첸 C. SAP HANA smart data integration 첸 D. SAP HANA remote data sync 첸 E. SAP Vora 9. The company asks you to get data from IoT sensors. Which SAP tools could you use? (There are 3 correct answers.) 첸 A. SAP SQL Anywhere 첸 B. SAP HANA remote data sync 첸 C. SAP Landscape Transformation Replication Server 첸 D. SAP HANA streaming analytics 첸 E. SAP HANA smart data access 10. Which SAP tool can you use to include geospatial and predictive processing as part of the data transformation process? 첸 A. SAP HANA smart data access 첸 B. SAP HANA smart data quality 첸 C. SAP HANA smart data integration 첸 D. SAP Data Services 11. What type of file would you create in SAP Web IDE for SAP HANA to do data cleaning? 첸 A. .hdbtabledata 첸 B. .hdbgraphworkspace 첸 C. .hdbcds 첸 D. .hdbflowgraph 407 408 Chapter 9 Data Provisioning Practice Question Answers and Explanations 1. Correct answers: A, C, E SAP HANA smart data integration, SAP Replication Server, and SAP Landscape Transformation Replication Server. See Table 9.1 at the end of this chapter for a quick reference. SAP HANA smart data quality works together with SAP HANA smart data integration but focuses on cleaning the data, not replicating the data. 2. Correct answer: D Extract • Transform • Load. SAP Data Services is an ETL tool. 3. Correct answer: C SAP Landscape Transformation Replication Server provides delta loads. SAP HANA streaming analytics performs neither initial nor delta loads because it streams data continually. SAP HANA smart data access and flat files can only be used to perform an initial load. 4. Correct answers: A, B, D SAP HANA smart data access can be used for data federation and as a logical data warehouse. SAP HANA smart data access isn’t used to load data for a physical data warehouse because there is no delta load mechanism. SAP HANA smart data access can’t perform real-time replication because there is no delta load mechanism. 5. Correct answers: A, B, E JSON, REST, and OData are all associated with web services. Web applications (in the cloud or on the Internet) don’t normally allow Java Database Connectivity (JDBC) database connections. Web applications aren’t necessarily SAP systems that understand BICS. Again, a web application won’t allow database connections via the Internet for security reasons. 6. Correct answers: A, C SAP HANA smart data access and SAP HANA smart data integration can use the SAP HANA adapters. To use the built-in adapters, the data provisioning tools also have to be built in. SAP HANA streaming analytics reads streaming data, but not by using the adapters. Practice Question Answers and Explanations Chapter 9 SAP Replication Server reads database log files. 7. Correct answers: B, C, D Twitter does have web services. This isn’t specified in the book, so in the actual exam, this option would not appear because it makes too many assumptions. For the purposes of this book, we assumed that everyone knows what Twitter is and that it has web services. SAP Data Services can read social media feeds, as discussed in the Extracting Data section. SAP HANA smart data integration has a Twitter adapter. See the SAP HANA Smart Data Integration section for more information. 8. Correct answers: A, C, E SAP HANA smart data access, SAP HANA smart data integration, and SAP Vora can work with data from Hadoop. SAP Landscape Transformation Replication Server can only use SAP ABAP systems as the data source. SAP HANA remote data sync connects to remote SAP SQL Anywhere databases. 9. Correct answers: A, B, D You can use SAP SQL Anywhere to store the IoT sensor data, and you then use SAP HANA remote data sync to get it to SAP HANA. SAP HANA streaming analytics is often used to stream sensor data to SAP HANA. SAP Landscape Transformation Replication Server uses SAP systems as data sources. SAP HANA smart data access doesn’t connect to sensors (unless some fancy new device embeds a database!). 10. Correct answer: B You can you use SAP HANA smart data quality to include geospatial and predictive processing as part of the data transformation process. SAP HANA smart data access doesn’t have transformation capabilities. SAP HANA smart data integration is used to get the data to SAP HANA smart data quality. SAP Data Services doesn’t have predictive processing as part of the data transformation process. 11. Correct answer: D You create an .hdbflowgraph file in SAP Web IDE for SAP HANA to do data cleaning. 409 410 Chapter 9 Data Provisioning .hdbtabledata is used to load CSV files into SAP HANA. .hdbgraphworkspace is used for graph processing. .hdbcds is used, for example, to create tables using CDS. Takeaway You should now have a high-level understanding of the various SAP data provisioning tools available for getting data into an SAP HANA system, and you should know what each one does. When asked to recommend an SAP data provisioning tool during a project, you can use Table 9.1 as a quick reference. Business Requirement Data Provisioning Tool Real-time replication SAP Landscape Transformation Replication Server, SAP Replication Server, or SAP HANA smart data integration ETL, data cleaning SAP Data Services, or SAP HANA smart data quality Extractors SAP Data Services Streaming data SAP HANA streaming analytics Federation, big data SAP HANA smart data access POC, development, testing Flat files Remote sites, or occasionally connected devices SAP HANA remote data sync Table 9.1 Choosing a Data Provisioning Tool Summary You’ve learned the various data provisioning concepts and looked at the different SAP data provisioning tools available in the context of SAP HANA. They all have their own strengths, and you now know when to use the various tools. In the next two chapters, we’ll start using exciting features in SAP HANA for searching and analyzing text, leveraging predictive library functions using spatial features, creating graph models for analyzing relationships, and processing IoT data using series data. Chapter 10 Text Processing and Predictive Modeling Techniques You’ll Master 쐍 Know how to create a text search index 쐍 Discover fuzzy text search, text analysis, and text mining 쐍 Get to know the types of prediction algorithms 쐍 Learn how to create a predictive analysis model 412 Chapter 10 Text Processing and Predictive Modeling Now that you understand the basics of SAP HANA information modeling, it’s time to take your knowledge to the next level. In this chapter, we’ll cover two different topics: text processing and predictive modeling. In the next chapter we’ll discuss spatial, graph, and series data modeling. These five topics are independent from each other but can be combined in projects; for an example, see the Real-World Scenario box. In this chapter and the next, we’ll do a deep dive into some advanced modeling with SAP HANA. Each topic that we’ll discuss is a specialist area on its own, on which people build their careers. We obviously won’t go that deep into these topics, but you’ll have a good base for more detailed learning on any of these topics at the end of these two chapters. These two chapters probably include the most changes from the previous edition. In the second edition, these topics were in one chapter, as they were only 10% of the certification exam. SAP Education has expanded its material on advanced modeling with SAP HANA to a separate three-day training course named HA301, and it’s now 25% of the certification exam. Most of these advanced topic areas are components of SAP HANA that are only available in SAP HANA, enterprise edition. The enterprise edition contains everything from the standard edition and components such as text processing, predictive modeling, and spatial, graph, and streaming analytics. SAP HANA, express edition, has all of these components and is a free edition for both development and production with usage up to 32 GB. You can easily expand up to 128 GB by purchasing more memory usage. To move beyond 128 GB, you’ll need to upgrade to a full-use license edition. We’ll begin our discussion with text modeling—specifically, how to search and find text even if misspelled, how to analyze text to find sentiment, and how to mine text in documents. Then, we’ll look at how to build predictive analysis models for business users using a powerful built-in library and graphical web-based tools. Real-World Scenario You’re working on an SAP HANA project at a pharmaceutical company. The company has created a mobile web application with SAP HANA for its sales representatives. However, users are struggling to find products on their mobile phones because the keyboard is small, and the names of the Objectives of This Portion of the Test Chapter 10 pharmaceutical products are quite difficult to type correctly. Therefore, you implement a fuzzy text search that allows users to enter roughly what they want, and the SAP HANA system finds the correct products. The company later tells you that this also helped users avoid adding duplicate entries into the system. You improve this mobile application by suggesting products to the sales reps. You create a predictive analysis model in SAP HANA to help with sales forecasting by looking at the seasonal patterns to determine when people use certain products more (e.g. , in the winter). You also notice that there were many comments about the company’s products on Facebook and Twitter. You process these comments in SAP HANA and can quickly tell the marketing department which products people are talking about, as well as alert the company if people are making negative comments about any product. You provide special alerts to the sales reps to inform the customers about these products. And, of course, you win the “Best Employee” award for the year! Objectives of This Portion of the Test The purpose of this portion of the SAP HANA certification exam is to test your knowledge of advanced modeling with SAP HANA. The certification exam expects you to have a good understanding of the following topics: 쐍 How to create a text search index 쐍 Fuzzy text search and text search queries 쐍 Text analysis and the text analytics configurations 쐍 Text mining 쐍 The types of prediction algorithms 쐍 How to create a predictive analysis model Note This portion contributes up to 15% of the total certification exam score. 413 414 Chapter 10 Text Processing and Predictive Modeling Key Concepts Refresher In recent years, the field of text and language processing has expanded dramatically. Topics such as talking computers that were previously only found in science fiction stories now fill the media. The daily technology news articles talk glowingly of chatbots, artificial intelligence (AI), natural language processing (NLP), text-tospeech, and speech-to-text. Many people have voice-based “assistants” in their homes and on their phones where they regularly talk to Alexa, Siri, and the Google Assistant. Automatic language translation is improving due to new deep learning systems and powerful custom-made processors. You often have to help set the right expectations when working in a project. With all the media attention, it’s easy to get caught up in believing you can do everything with such language technologies. While some of it works well for shorter sentences, this won’t replace translators or editors. Even if it’s 99% correct, this paragraph would have at least one major mistake in it, let alone the subtler nuances of language, such as intonation, double meanings, irregular verbs, puns, oxymorons, irony, satire, and sarcasm. For example, “I named my dog five miles so I can tell people that I walk five miles every day.” Or, “Help stamp out, eliminate, and abolish redundancy!” In the next section, we’ll focus on some of the text processing features built into SAP HANA. SAP also has chatbots, robotic process automation (RPA), and machine learning, but we won’t discuss these here. Text Processing As you know, in Google, you type in a few words, and the Google search engine returns a list of web pages that best meet your search query, ranked to your preferences. It used to indicate when you typed a word incorrectly and suggested another spelling with the question, Did you mean…?. Now, however, the Google search engine has become so confident that it no longer suggests the correction with a qualifying question, but instead says Showing results for…. If you’re convinced your original entry is correct, it will allow you to Search instead for [whatever you entered]. Even more revolutionary is the search engine’s ability to predict what you want as you type. In the following, we’ll first provide you with a brief overview of text processing. We’ll then look at working with the text features in SAP HANA—namely, text searches, text analysis, and text mining. However, before we can discuss each of Key Concepts Refresher Chapter 10 these topics in more detail, we first need to create a full-text index. As you’ll see in Figure 10.1, the full-text index is the backbone of all these processes. Text Processing Overview Google search is a good example of what you can do with SAP HANA in your information models. We can perform fuzzy text searches, such as the “Did you mean” example, and we can perform text mining to find the best document out of a large collection for a user. We can also use text analysis to find out if people wrote some text with a positive or negative sentiment. Figure 10.1 provides an overview of the various text processing processes in SAP HANA. We can use SAP HANA text processing on any table column that has some text data included. In this case, we want to process the data in the Description field. This column could have data in a NVARCHAR field, or documents such as PDFs stored in a BLOB or TEXT field. We have to prepare the data in this field before we can run text processing queries on it. There are three main scenarios for querying the text. We’ll discuss each of these in more detail in upcoming subsections: 쐍 Text search This feature is used when we want to find out which records contain some search terms. To get this working, we first need to create a full-text index for the Description field. The full-text index is created in a hidden column in the table and can only be accessed indirectly via the SAP HANA text processing features. When SAP HANA creates the full-text index, it first processes the files contained in the Description field if we store documents in it. It can extract the text from these documents, can detect the language the text is written in, and break up the text in words or terms—a process called tokenization. Lastly, SAP HANA can convert these tokens into their root words using a process called stemming. 쐍 Text analysis This feature extends the functionality of the full-text index. When creating the full-text index, we need to specify a text analysis configuration file and activate text analysis, which is used when we want to find out the sentiment of the text or to extract more semantic details from the text. With text analysis, SAP HANA can find parts-of-speech, group nouns, and extract entities and facts. (Noun groups are words that are often used together, for example, global economy or fish and chips.) All these results are stored in a new table with a $TA_ prefix. 쐍 Text mining This feature extends the functionality of the full-text index and text analysis results. Where text searching and text analysis work within a document, text 415 416 Chapter 10 Text Processing and Predictive Modeling mining works across and with multiple documents. When we create the fulltext index, we need to specify a text mining configuration file and activate text analysis and text mining. Text mining is used when we want to work with, classify, and cluster stored documents in our table columns. We can run queries to find documents that contain certain search terms, and SAP HANA can also find any similar or related documents for us. These results are stored in new tables with a $TM_ prefix. The only table you should use is prefixed with $TM_MATRIX_. This table is also known as the text mining index and contains a term document matrix. Text mining is optimized through the results of the full-text index and the text analysis, and it uses tokenization, stemming, part-of-speech tagging, and entity extraction. Text Analysis Configuration Text Mining Configuration Table with text or documents Full text index ID Title Description (text or document) (hidden column) Field 1 Field 2 … • File proc. • Language detection • Tokens • Stems $TA_ table with text analysis results $TM_ tables with text mining results • • • • $TM_MATRIX_ table for text document matrix (also known as “text mining index”) Optimized through results of text analysis (tokens, stems, part-of-speech tags, entities) Part-of-speech tagging Noun groups Entity extraction Fact extraction Figure 10.1 Overview of Text Processing in SAP HANA Text Index To enable the text features in SAP HANA, you need a full-text index. Depending on what we specify when we create the full-text index, it gives access to one or more text features. The full-text index will use additional memory to provide the requested text features. Key Concepts Refresher Chapter 10 Listing 10.1 provides an example of a SQL statement to create a full-text index named DEMO_TEXT_INDEX on a column called SOME_TEXT in a column table. This fulltext index enables the SOME_TEXT column for text search and analysis. CREATE FULLTEXT INDEX DEMO_TEXT_INDEX ON "DEMO_TEXT" ("SOME_TEXT") LANGUAGE DETECTION ('EN') SEARCH ONLY ON FUZZY SEARCH INDEX ON TEXT ANALYSIS ON CONFIGURATION 'EXTRACTION_CORE_VOICEOFCUSTOMER' TEXT MINING OFF; Listing 10.1 SQL Statement to Create a Full-Text Index Let’s examine the different text elements of the SQL statement in Listing 10.1: 쐍 CREATE FULLTEXT INDEX The first line creates a full-text index from the SOME_TEXT column in the DEMO_ TEXT table. 쐍 LANGUAGE DETECTION SAP HANA will attempt to detect the language of your text. You can specify on what languages it should focus with this clause. In this case, we indicated that our text is all in English. 쐍 SEARCH ONLY ON This clause saves some memory by not copying the source text into the full-text index. 쐍 FUZZY SEARCH INDEX ON This enables or disables the fault-tolerant search feature. This feature is also called fuzzy text search. You can speed up the fuzzy search on string types by creating an additional data structure called a fuzzy search index. However, you should be aware that this additional index increases the memory requirements of the table. In addition, don’t confuse the fuzzy search index with the full-text index; it’s just an additional data structure to the full-text index. 쐍 TEXT ANALYSIS ON This enables or disables the text analysis feature. 쐍 CONFIGURATION This is the name of the configuration. The text analysis feature needs a configuration file. Some standard configurations ship with SAP HANA, and you can also create your own. 쐍 TEXT MINING OFF This enables or disables the text-mining feature. 417 418 Chapter 10 Text Processing and Predictive Modeling Tip The full-text index isn’t just used for text searches on a column but can also enable language detection, search, text analysis, and text mining on that column. You open the definition of the column table by double-clicking the table’s name in the Data Explorer perspective of SAP Web IDE for SAP HANA. In the Indexes tab, you’ll see the full-text index named DEMO_TEXT_INDEX that you just created. When you select this index, you’ll see the details of the full-text index as shown in Figure 10.2. Figure 10.2 Full-Text Index Details You can see from the checkboxes that we enabled Fuzzy Search Index and Text Analysis and disabled Text Mining in this index. When Fuzzy Search Index is enabled for text-type columns, the full-text index increases the memory size of the columns by an additional 10% approximately. Key Concepts Refresher Chapter 10 When text analysis is enabled, the increase in the memory size due to the full-text index depends on many factors, but it can be as large as that of the column itself. Tip Remember that a full-text index is created for a specific column, not for the entire table. Figure 10.2 shows the Column Name as SOME_TEXT. You’ll have to create a full-text index for every column that you want to search or analyze. SAP HANA will automatically create full-text indexes for columns with data types TEXT, BINTEXT, and SHORTTEXT. These are called implicit full-text indexes. In this case, the content of the table column is stored in the form of the full-text index to save memory consumption. The original column content is recreated from the full-text index when you use a SQL SELECT statement. Because the focus for TEXT-type fields is on search, SQL operations such as JOIN, CONCAT, LIKE, HAVING, WHERE, ORDER BY, and GROUP BY, and aggregates such as COUNT and SUM aren’t supported. For other column types, you must manually create the full-text index, as we described in Listing 10.1. You can only manually create a full-text index for columns with the VARCHAR, NVARCHAR, ALPHANUM, CLOB, NCLOB, and BLOB data types. When you manually create a full-text index, it’s called an explicit full-text index. The content of the table column is stored in its original format. When you manually create an index, SAP HANA attaches a hidden column to the column you indexed. This hidden column contains extracted textual data from the source column; the original source column remains unchanged. Search queries are performed on the hidden column, but the data returned is from the source column. The full-text index doesn’t store the results of your search or analysis, but it does enables the search/analysis. Note You can put the SQL statement in a procedure (using an .hdbprocedure file). This way, you can deploy it later on the production system. The recommended way to create full-text indexes is using core data services (CDS). You can find the CDS syntax in the SAP HANA Search Developer Guide at http://s-prs.co/v487620, in the Creating Full-Text Indexes Using CDS Annotations section. You can create the same full-text index as you did with SQL code in Listing 10.1 using CDS syntax in an .hdbfulltextindex file. When you compare this with Listing 10.2, you’ll see that they are identical, except that the CDS file doesn’t have the CREATE statement. 419 420 Chapter 10 Text Processing and Predictive Modeling One big difference, however, is that the SQL code uses the built-in configuration file called EXTRACTION_CORE_VOICEOFCUSTOMER. If you’re using CDS, you’ll have to make a copy of this configuration and store it in SAP Web IDE for SAP HANA as an XML file. FULLTEXT INDEX DEMO_TEXT_INDEX ON "DEMO_TEXT.Text_Data" ("SOME_TEXT") LANGUAGE DETECTION ('EN') SEARCH ONLY ON FUZZY SEARCH INDEX ON TEXT ANALYSIS ON CONFIGURATION 'EXTRACTION_CORE_VOICEOFCUSTOMER' TEXT MINING OFF Listing 10.2 Creating a Separate Full-text Index Using an .hdbfulltextindex File As you should create all your files using CDS, you can actually create the full-text index in the same .hdbcds file used for creating the table. The creation of the fulltext index is included as a Technical Configuration, as shown in Listing 10.3. context DEMO_TEXT3 { Entity Text_Data2 { key TEXT_ID: Integer not null; SOME_TEXT: hana.VARCHAR(100); } Technical Configuration { FULLTEXT INDEX DEMO_TEXT_INDEX ON (SOME_TEXT) LANGUAGE DETECTION ('EN') SEARCH ONLY ON FUZZY SEARCH INDEX ON TEXT ANALYSIS ON CONFIGURATION 'EXTRACTION_CORE_VOICEOFCUSTOMER' TEXT MINING OFF; }; }; Listing 10.3 Creating a Full-Text Index in an .hdbcds File In Figure 10.1 earlier, you saw that the creation of the full-text index involved the following four steps: 쐍 File processing If the column that you’re creating the full-text index on contains documents, the file processing step will extract text or HTML from any binary document format to text/HTML. Some of the supported Multipurpose Internet Mail Extensions (MIME) types are PDF, Microsoft Office documents, LibreOffice Key Concepts Refresher Chapter 10 documents, plain text, HTML, and XML. These documents are typically stored in a BLOB-type field in the database table. 쐍 Language detection SAP HANA has the ability to identify the language the text is written in. It then uses this to apply the appropriate tokenization and stemming in the next steps. You can specify the language as in the example code of Listing 10.3. You can also specify a column name that specifies the language for that record. There are 32 languages supported in SAP HANA 2.0 SPS 03. Every new SPS release of SAP HANA keeps enhancing the list of languages and the available features. Not all features are available in all languages. Some features are only available for English, other are available for a few languages, and some are available for all the available languages. 쐍 Tokenization This is where the system breaks documents, paragraphs, and sentences into individual words; for example, “I need some coffee” is decomposed to the words “I”, “need”, “some”, and “coffee”. 쐍 Stemming Here the system normalizes tokens to its linguistic base form, for example, the words “ran”, running”, and “runs” are all converted to the stem word “run”. The results of these steps are stored in the full-text index and used when you query the column for text search results. Text Search Text searching works on all string columns, even without a full-text index. However, text search will always use the full-text index if it exists for the preprocessed text that has gone through file filtering and tokenization. Full-text indexes are especially beneficial for multiple words and long text, such as documents. After you’ve created a full-text index, you can do various types of text searches. The first type of search is the exact text search, which is comparable to a WHERE clause in SQL, but can be more powerful. This is the default type of text search that will be used if you don’t specify the text search type explicitly. When you do a linguistic text search, the search will find matches using stemming and return all words with the same stem as the searched word. When you’ve created a full-text index with the fuzzy search option enabled, you can do fault-tolerant searches on your text data. (These are also called fuzzy searches.) This means you can misspell words, and SAP HANA will still find the text 421 422 Chapter 10 Text Processing and Predictive Modeling for you. The real-world scenario at the beginning of the chapter illustrates this point in a somewhat exaggerated manner. SAP HANA will find alternative (US and UK) spellings for words such as color/colour, artifact/artefact, utilize/utilise, and check/cheque. Incorrectly entered words, such as typing coat, will still return the correct word, which was probably cost. Note SAP HANA includes an embedded search via the built-in sys.esh_search() procedure. With this procedure, you can use an existing ABAP SQL connection instead of an SAP HANA extended application services, classic model (SAP HANA XS) OData service for fuzzy text searches in SAP HANA from SAP business applications. For more information on this new application programming interface (API), see the SAP HANA Search Developer Guide at http://s-prs.co/v487620. In Chapter 8, we discussed the LIKE operator in SQL. You can search for a random pharmaceutical product name (e.g., Guaiphenesin) via the following SQL statement: SELECT productname, price from Products where productname LIKE ‘%UAIPH%’; This will find the name if you type the first few characters correctly. If you search for “%AUIPH%” (which is misspelled) instead, you won’t find anything. There is a better way that will return the text you’re searching for, in spite of being misspelled. In SAP HANA, you can use the CONTAINS predicate to use the fuzzy text search feature. In this case, you use the following SQL statement: SELECT SCORE(), productname, price from Products where CONTAINS(productname, ‘AUIPH’, FUZZY(0.8)) ORDER BY SCORE() DESC; The CONTAINS predicate only works on column tables. It doesn’t work across multiple tables or views and doesn’t work with calculated columns. The SCORE() function helps sort the results from the most relevant to the least relevant. Tip Remember that your users won’t type SQL statements; you’ll create these statements in a user-defined function (UDF) and call them from your graphical calculation view or your SAP HANA extended application services, advanced model (SAP HANA XSA) application. Key Concepts Refresher Chapter 10 The CONTAINS predicate has three parameters: the column name you want to search on, the search string itself (even if misspelled), and how “fuzzy” you want the search results to be in this case. The fuzziness factor can range from zero (everything matches) to one (requires an exact match). This factor is called the fuzzy threshold value. Note There are two terms used for the fuzziness factor. When you specify the amount of fuzziness in the query, it’s called the fuzzy score. When you get the results, the fuzziness factor is called the fuzzy threshold. Using the FUZZY() function as the third parameter of the CONTAINS predicate shows that you want to do a fuzzy text search. If you want to do an exact text search, instead use the EXACT keyword or leave out the third parameter. You can use the LINGUISTIC keyword as the third parameter to do a linguistic text search. Tip The linguistic text search uses the language detection and stemming results from the full-text index. It’s quite a lot of work for the system to create the stemming results, so SAP HANA allows you to switch it off when you create the full-text index by specifying FAST PREPROCESS OFF. If you want to search on all columns with a full-text search index, you can specify the CONTAINS predicate as follows: CONTAINS (*, ‘AUIPH’, FUZZY(0.8)) In this case, the asterisk (*) in the first parameter to the CONTAINS predicate ensures that the search includes all columns. When you search on multiple columns at once, it’s called a freestyle search. Instead of a freestyle search, you can specify which columns (multiple) you want to include or exclude in the list. You can use some special characters in the search string (the second parameter) as well: 쐍 Double quotes (") Everything between the double quotes is searched for exactly as it appears. 쐍 Asterisk (*) Replaces zero or more characters in the search string; for example, cat* matches cats, catalogues, and so on. 423 424 Chapter 10 Text Processing and Predictive Modeling 쐍 Question mark (?) Replaces a single character in the search string; for example, cat? matches cats but not catalogue. 쐍 Minus (-) When placed in front of a search word, it acts as a NOT statement. The results will exclude any result where this word occurs; for example, dog* -cat* matches dogs but not cats and dogs. You can use AND and OR logic with the search terms, but brackets aren’t supported as search operators. SPS 04 In SAP HANA 1.0, you could do a fuzzy text search in an attribute view. Attribute views were deprecated many years ago, and the ability to do a fuzzy text search in a graphical calculation view wasn’t implemented until SAP HANA 2.0 SPS 04. It’s quite powerful, as it can be implemented with all calculation views. Figure 10.3 shows the Semantic node of a calculation view. The Parameters dropdown menu has two new options added. We discussed the new Session Variable in Chapter 7. The Fuzzy Search Parameter allows you to create a fuzzy text search for your graphical calculation view. You can specify the Fuzzy Score, as well as the Default Search String. As with other Parameters, such as Input parameter, you can then assign it to an output field. Figure 10.3 Using Fuzzy Text Search in a Calculation View with SAP HANA 2.0 SPS 04 Key Concepts Refresher Chapter 10 You can combine text search queries with additional functions to enhance your text search queries and results. With the SCORE() function, you’ve just seen an example of an additional function that helps with sorting the results of the text search by relevance. Some other functions that you can use are as follows: 쐍 NEAR() This function allows you to do a proximity search for words that are near to each other. Let’s say you have a sentence like “I passed the SAP HANA certification exam.” The query WHERE CONTAINS(*, ‘NEAR(exam, hana), 2, false’) will find the places where the word exam is within 2 tokens distance of the word HANA. The false parameter indicates that you don’t care about the sequence of the words. 쐍 HIGHLIGHTED() You can use this function to highlight the search term in the results in bold. Your query can be something like SELECT HIGHLIGHTED(TITLE), DESCRIPTION FROM TABLE1 WHERE CONTAINS(“TITLE”), ‘HANA’);. The search term “HANA” will be shown in bold; for example, I passed the SAP HANA certification exam. 쐍 SNIPPETS() This function is similar to the HIGHLIGHTED() function, except that it will return a short snippet instead of the entire sentence or paragraph. For example, a query like SELECT SNIPPET(TITLE), DESCRIPTION FROM TABLE1 WHERE CONTAINS(“TITLE”), ‘HANA’); will return . . . passed the SAP HANA certification . . . with an ellipsis on both sides of the snippet. 쐍 WEIGHT() With this function, you can give a higher relevance (more weight) to search terms that are in the Title column than those in the Description column—using the table in Figure 10.1 as an example. You can create a query like WHERE CONTAINS((“TITLE”, “DECRIPTION”), ‘exam’, WEIGHT(1, 0.5)). The weight is a value between 0 and 1. In this case, the query will give a relevance of 1 for finding the word exam in the Title and 0.5 for finding it in the Description. Text Analysis After you’ve created a full-text index with the TEXT ANALYSIS option enabled, you can also analyze text with the SAP HANA text analysis feature, both via entity analysis and sentiment analysis. You create text analysis via the full-text index. The abbreviated statement is as follows: 425 426 Chapter 10 Text Processing and Predictive Modeling CREATE FULLTEXT INDEX INDEX_NAME ON TABLE_NAME ("COLUMN_NAME") TEXT ANALYSIS ON CONFIGURATION 'CONFIG_NAME'; You need the TEXT ANALYSIS ON clause, and a CONFIGURATION NAME is optional. There are a few built-in configurations, but you can also create your own industry-specific configuration files. Each of the configurations has its own unique extraction rules and dictionaries of entity values and types. In the earlier example shown in Listing 10.1, we created the full-text index (called DEMO_TEXT_INDEX) on table DEMO_TEXT. When you look at the table in the SAP HANA database explorer perspective, you’ll notice that SAP HANA has created an additional table: $TA_DEMO_TEXT_INDEX (see Figure 10.4, bottom left). This is the same name of the full-text index we created, prepended with $TA_ to denote text analysis. You get the text analysis request results back in table $TA. Tip Table $TA isn’t the full-text index; it’s a table that contains the results of the text analysis. Figure 10.4 Source Table (DEMO_TEXT) and $TA_ Text Analysis Results Table The original table included the values in Table 10.1 in the SOME_TEXT column. Key Concepts Refresher Chapter 10 Laura likes apples. Laura dislikes chocolate. Colin loves running. Danie works for SAP. Table 10.1 SOME_TEXT Column Figure 10.5 shows the text analysis request results, using the EXTRACTION_CORE_VOICEOFCUSTOMER configuration file that is used for sentiment analysis. The text analysis determined the following points: 쐍 Laura, Colin, and Danie are people. 쐍 SAP is classified as an organization. 쐍 Apples, chocolate, and running are topics. 쐍 Laura liking apples is a weak positive sentiment. 쐍 Laura disliking chocolate is a weak negative sentiment. 쐍 Colin loving running is a strong positive sentiment. With a different text analysis configuration file, we’ll receive different results, as shown in Figure 10.6. Figure 10.5 $TA_ Text Analysis Results Table Content 427 428 Chapter 10 Text Processing and Predictive Modeling Using the standard configuration files, you’ll be able to identify many of the attributes (master data) in your text data. Identifying measures isn’t always that easy. For example, is 2020 a year, an amount, or a quantity? Figure 10.6 shows the $TA results table when no configuration was specified and with the FAST PREPROCESS OFF; allowing the text analysis to also do stemming. The TA_TOKEN column shows the various words (tokens), and the TA_TYPE column lists the linguistic or semantic type of that token. The TA_STEM column contains the stem of that token; for example, the stem of running is run, and the stem of apples is apple. Figure 10.6 The $TA_ Text Analysis Results Table for Stemming and Entity Extraction Tip The text analysis results are stored in a normal SAP HANA table. As such, you can use the results in many scenarios such as calculation views, SAP HANA XSA applications, and data mining tools. The optional Grammatical Role Analysis (GRA) analyzer for English is also available. It identifies groups of three in the form of subject–verb–object expressions. For an example, let’s look at a well-known phrase: The [SUBJECT]quick brown fox[/SUBJECT] [VERB]jumps[/VERB] over the [DIRECTOBJECT]lazy dog[/DIRECTOBJECT] The GRA analyzer identifies the fox and the dog as objects, and it identifies jumps as the verb. Key Concepts Refresher Chapter 10 Tip Text analysis results are stored in a flat structure with the table $TA. As such, it’s hard to represent the complex relationships between the extracted entities. This is one area where graph processing can certainly help. By combining text processing with graph analysis, you can start using graph queries from Chapter 11 to examine the text analysis with various algorithms. For example, you can discover closely connected relationships between entities that you won’t see when just looking at table $TA. SAP HANA has seven built-in text analysis configurations. You’ve already seen the EXTRACTION_CORE_VOICEOFCUSTOMER configuration when we used it for sentiment analysis. There are three configurations for linguistic analysis. 쐍 LINGANALYSIS_BASIC This configuration is used for tokenization only. 쐍 LINGANALYSIS_FULL This configuration is used for tokenization and stemming. If you don’t specify a configuration file, this configuration is used. An example of the results was shown earlier in Figure 10.6. 쐍 LINGANALYSIS_STEMS This configuration is used for tokenization, stemming, and parts of speech. Parts-of-speech tags word categories, for example, showing that quick is an adjective. The four configurations for extraction analysis extend the LINGANALYSIS_FULL configuration: 쐍 EXTRACTION_CORE This configuration is used for extracting standard entities, such as persons and phone numbers. 쐍 EXTRACTION_CORE_ENTERPRISE This configuration is used for extracting business relations. Examples are membership information, management changes, product releases, mergers and acquisitions, and organizational information. 쐍 EXTRACTION_CORE_PUBLIC_SECTOR This configuration is used for extracting government entities. Examples are action and travel events, military units, spatial references, and details of persons, such as aliases, appearances, attributes, and relationships. 쐍 EXTRACTION_CORE_VOICEOFCUSTOMER This configuration is used for extracting sentiment analysis. We saw an example of this when we produced the results in Figure 10.5. 429 430 Chapter 10 Text Processing and Predictive Modeling All text analysis configurations consist of three elements. 쐍 Dictionary The dictionary is a repository of entity categories and their possible values, for example, top soccer (football) countries such as Spain, Germany, Netherlands, Italy, and Brazil. It’s created with an .hdbtextdict file. 쐍 Rule set The rule set contains the rules that define how an entity can be identified; for example, a national identity number can start with the birth date of the person and should be 14 numbers in total. It’s created with an .hdbruleset file. 쐍 Text analysis configurations file The configuration file defines aspects of the text analysis process, for example, which dictionaries and rule sets to use, whether to use linguistic or entity extraction, and which phases of these extractions (like tokenization) should be used. It’s created with an .hdbtextconfig file. When you build your own custom text analysis configuration, you’ll build the files in the sequence above. It might be easier, however, to copy the existing files in SAP HANA and then modify them. All three of these files are XML files that can be edited with standard editors, including the code editor in SAP Web IDE for SAP HANA. Note For more information about text analysis configurations, refer to the SAP HANA Text Analysis Developer Guide at http://s-prs.co/v487620. Text Mining Fuzzy searches and text analysis work with a word, sentence, or paragraph—but with text mining, you can work with entire documents. These documents can be web pages, Microsoft Word documents, or PDF files that are loaded into SAP HANA. Text mining uses the full-text indexing and text analysis processes, as shown earlier in Figure 10.1. Text mining uses these to produce a matrix of documents and terms. When using text mining in SAP HANA, you can perform the following tasks: 쐍 Categorize documents. 쐍 Find top-ranked documents related to a particular term. Key Concepts Refresher Chapter 10 쐍 Analyze a document to find key phrases that describe the document. 쐍 View a list of documents related to the one you asked about. 쐍 Access documents that best match your search terms. When you activate text mining when creating the full-text index, it also creates a few $TM tables. The term-document matrix (known as the text mining index) is an optional data structure that uses the results of text analysis. Figure 10.7 shows the text mining results for the data in Table 10.1 that we previously used for the text analysis. It’s called a term-document matrix, as the terms occurring in each document are listed with their frequency. The TEXT_ID column has the number of the document, which, in this case, was just one or two sentences. This is a very small example. For a large corpus of documents, this can become a large data set. You can apply powerful mathematical, statistical, and data mining processes to this data set. Figure 10.7 The $TM Text Mining Results Tables and the Contents of the Text Mining Index SAP HANA also provides several text mining functions to help process the results table. There are functions to find related terms, suggested terms, relevant terms, relevant documents, related documents, and document categories. The document categories are created using the on k-nearest neighbor (“kNN” or “k-NN”) algorithm. You’ll see it mentioned again in the next section on predictive modeling. You can call the text mining functions from SQL functions or from web applications. 431 432 Chapter 10 Text Processing and Predictive Modeling Tip Remember that users won’t work at the lower level that we’ve described in this chapter: They will want to use text features from a browser-based application interface. This is why SAP has focused on supplying various JavaScript APIs for SAP HANA modelers and developers. Note You can get more information in the “Full-Text Search with SAP HANA Platform” openSAP course at http://s-prs.co/v487632. At http://s-prs.co/v487620, you can find three developer guides for text processing: the SAP HANA Search Developer Guide, the SAP HANA Text Analysis Developer Guide, and the SAP HANA Text Mining Developer Guide. You will have to click on the View All button in the Development section to find some of these guides. Or download all the files from the Documentation Download section. You can also view the SAP HANA Academy on YouTube. The playlist for text processing is at http://s-prs.co/v487633. Now that you know how to work with text data in SAP HANA and can use fault-tolerant text searches, text analysis, and text mining in SAP HANA modeling work, let’s move on to predictive modeling. Predictive Modeling In this section, we’ll look at the SAP HANA Predictive Analysis Library (PAL) and how to use it to create a predictive analysis model. We’ll look at predicting future probabilities and trends and how you can use this functionality in business. In the real-world scenario at the beginning of the chapter, we described a use case in which you might create a predictive model in SAP HANA to help sales reps with their sales forecasting by looking at seasonal patterns; for example, people use more pharmaceuticals for allergies in the spring. There are many use cases for PAL, which is built in to SAP HANA. Here are just a couple examples: 쐍 For financial institutions such as banks, you can detect fraud by looking at abnormal transactions. 쐍 For fleet management and car rental companies, you can predict which parts will break soon in a vehicle and replace them before they actually break. Key Concepts Refresher Chapter 10 We’ll see some more use cases when we discuss the types of algorithms and what you can use them for. Before discussing PAL in more detail, let’s take a quick look at the wider picture. PAL is just one of several tools available for addressing machine learning and predictive analysis requirements in your projects. 쐍 Predictive Analysis Library (PAL) PAL provides more than 90 classic and trending algorithms for native in-database processing of data that you can use in your information modeling. 쐍 Automated Predictive Library (APL) The embedded SAP Predictive Analytics automated analytics engine processes your data automatically with multiple scenarios and determines which predictive algorithm is the best for your scenario using structural risk minimization. APL makes it quick and easy for nonexpert data scientists to consume applications built on SAP HANA. 쐍 SAP domain-specific libraries There are several SAP solutions that have libraries for native in-database processing of application data, for example, the Unified Demand Forecast library for customer activities and the Optimization Function Library (OFL) for supply chain solutions. 쐍 Application Function Library (AFL) Software Development Kit (SDK) This SDK allows SAP partners and customers to create their own functions in libraries similar to PAL, OFL, and Unified Demand Forecast. The code is written in C++, and the custom-created libraries can be used internally in the company or can be sold to SAP customers. 쐍 TensorFlow integration TensorFlow is an open-source deep learning library from Google that is used for many scenarios, for example, image classification where the system “recognizes” objects in photographs. You can call TensorFlow models from SAP HANA External Machine Learning AFL. 쐍 R integration You’ve seen in Chapter 8 that you can write procedures using SQLScript. You can also write procedures in R, AFL, and GraphScript. R is a programming language used by many statisticians and data miners. You can do everything in PAL and more using R. The advantage is that you can reuse lots of code from other projects. PAL is much faster for business data, however, because it runs in database, whereas the R code and the data are sent to an R server to run externally. 433 434 Chapter 10 Text Processing and Predictive Modeling SPS 04 Integration to Python was added with SAP HANA 2.0 SPS 04 via a Python API for SAP HANA machine learning. Many data scientists are using Python for fast prototyping of their machine learning models. You can now leverage the SAP HANA machine learning libraries to process the data in SAP HANA from calls in Python. The R integration has also been expanded with an R API for SAP HANA machine learning to leverage SAP HANA directly from your R code. Predictive modeling uses statistical algorithms on a set of input data to “guess” the probability of a specific outcome. We can use this for data mining where you use these algorithms to find patterns in the data. A variation of this that is discussed a lot in the media is machine learning. In this case, we use statistical algorithms to let the computer look at, analyze, and “learn” from the data by automatically building a predictive model. We can then run new data through the generated model to let the system predict and provide the probability of any outcomes we’re interested in. We’ll now discuss PAL in more detail. Predictive Analysis Library Predictive analytics refers to statistics applied to prediction, machine learning, or data mining processes. Those statistics need a lot of complex algorithms. You can program these algorithms in a programming language such as Java or use algorithms already created in a language such as R. The problem is that you then have to transfer the data to external servers instead of processing it in SAP HANA. Some languages, such as Python, are quite slow in general. You should preferably write such complex algorithms in a compiled language, such as C++, for fast execution. If you want to keep the data in the database, SQL or SQLScript isn’t an option as these are set-oriented data languages. PAL is a library of such algorithms, written in C++, and running inside SAP HANA. You don’t have to transfer any data between SAP HANA and another application server. The algorithms in PAL were carefully selected based on the most commonly used algorithms in the market, algorithms generally available in other database products, and algorithms required by SAP applications. Over the years, the library has grown to fulfill almost all requirements from SAP HANA projects. Recently added algorithms tend to be the popular trending algorithms from areas such as machine learning. Key Concepts Refresher Chapter 10 In SAP HANA, the predictive analysis functions are grouped together in the PAL. PAL defines functions that you can call from SQLScript and use to perform various predictive analysis algorithms. You can use these PAL functions to prepare your data for further analysis, for example, building a predictive model. You can then use this trained model on new data to make predictions for this data. For example, when traveling, you want to know if your flight will be delayed. You can train a predictive model by feeding it all the historical flight data. After the model is prepared, you can give it your flight details. The model will then predict whether your flight will be delayed or not. You’ll learn how to build this in SAP HANA in the “Machine Learning Models” section later in this chapter. In PAL, predictive analysis algorithms are grouped in 10 categories; we’ll take a quick look at some of the categories and discuss a few popular algorithms to give you an idea of what is possible. The categories, minus the miscellaneous one that serves as a catch-all for anything left over, are as follows: 쐍 Association For an example of an association use case, think of Amazon’s suggestions: Customers who bought this item also bought. The Apriori algorithm can be used to analyze shopping baskets. It can then identify products that are commonly purchased together and group them together in the shop, such as flashlights and batteries. You could even offer a discount on the flashlights because you know that people will need to buy batteries for them, and you can make enough profit on the batteries. (Notice, however, that the inverse isn’t necessarily true; a discount on the batteries won’t encourage people to buy flashlights.) 쐍 Clustering With clustering, you divide data into groups (clusters) of items that have similar properties. For example, you can group servers with similar usage characteristics, such as email servers that all have a lot of network traffic on port 25 during the day. You might use the popular k-means algorithm for this purpose. Another example is computer vision, in which you cluster pictures of human faces using the affinity propagation algorithm. Clustering is sometimes also called segmentation. The focus of clustering is the algorithm figuring out by itself what is the best way to group existing data together. 쐍 Regression Regression plots the best possible line or curve through the historical data. You can then use that line or curve to predict future values. This could be used by the previously mentioned car rental company. The company has had 5,000 of these 435 436 Chapter 10 Text Processing and Predictive Modeling vehicles before, and so it has good information available to predict when certain parts in a similar vehicle will break. 쐍 Classification The process of classification takes historical data and trains predictive models using, for example, decision trees. You can then send new data to the predictive model you’ve trained, and the model will attempt to predict which class the new data belongs to. This process is often referred to as supervised learning. The focus here is on getting accurate predictions for unknown data. 쐍 Preprocessing and data preparation This category of algorithms is used to prepare data before further processing. These algorithms may improve the accuracy of the predictive models by reducing the noise. 쐍 Time series For an example of a time series, think of the real-life scenario at the beginning of the chapter, in which we built a predictive model to forecast sales by looking at seasonal patterns. You might also see, for example, a monthly pattern in shops of higher sales after paydays. 쐍 Social network analysis Algorithms in this category are used to predict the likelihood of a future link between two people who have no link yet. This is used by companies such as Facebook and LinkedIn. 쐍 Statistics and outlier detection This category includes some statistical algorithms to find out how closely variables are related to or are independent from each other. One example is the Grubbs’ test algorithm, which is used to detect outliers. This could be used in the fraud detection use case we mentioned earlier, in which certain transactions are different from the others. (Often, we first use a clustering algorithm to group transactions together and then detect outliers for each group/cluster after.) 쐍 Recommender systems These algorithms, as the name suggests, make recommendations by analyzing product associations, the purchase history of the buyer, and external factors, such as the general economy. Music playlists are good examples. Figure 10.8 shows the PAL algorithms, per category, that are available in SAP HANA 2.0 SPS 03. Key Concepts Refresher Chapter 10 Association Classification Time Series 437 Statistics Apriori, AprioriLite Area Under Curve (AUC) ARIMA/Auto-ARIMA ANOVA FP-Growth Back-Propagation (Neural Network) C4.5 Decision Tree Auto/Brown/Single/Double/ Triple Exponential Smoothing Chi-Squared Goodness-of-Fit Test K-Optimal Rule Discovery (KORD) Sequential Pattern Mining CART Decision Tree CHAID Decision Tree Clustering ABC Classification Confusion Matrix Correlation Function Croston's Method Fast Fourier Transform (FFT) Forecast Accuracy Measures Accelerated K-Means/K-Means Gradient Boosting Decision Tree (GBDT) Affinity Propagation KNN (K Nearest Neighbour) Hierarchical Forecast Agglomerate Hierarchical Clustering Linear Discriminant Analysis (LDA) Cluster Assignment Logistic Regression (with Elastic Net Regularization) Linear Regression with Damped Trend and Seasonal Adjust DBSCAN Gaussian Mixture Model (GMM) K-Medians K-Medoids Latent Dirichlet Allocation (LDA) (Kohonen) Self-Organizing Maps Multi-Class Logistic Regression Multilayer Perceptron Naïve Bayes Random Decision Trees/Forest Support Vector Machine Miscellaneous Regression Bi-Variate Geometric, Bi-Variate Natural Logarithmic Regression Cox Proportional Hazard Model Exponential Regression ABC Analysis Weighted Score Table Seasonality Test, Trend Test, White Noise Test Social Network Analysis Multiple Linear Regression Polynomial Regression PageRank Condition Index Covariance Matrix Cumulative Distribution Function Distribution Fitting/Weibull analysis Distribution Quantile, Quantile Function F-test (Equal Variance Test) Factor Analysis Grubbs’ (Outlier) Test Kaplan-Meier Survival Analysis Preprocessing Anomaly Detection Binning, Binning Assignment Inter-Quartile Range (Tukey’s Test) Multidimensional Scaling (MDS) Mean, Median, Variance, Standard Deviation, Kurtosis, Skewness One-Sample Median Test Pearson Correlations Matrix T-Test Partition Univariate Analysis/Multivariate Analysis Principal Component Analysis (PCA) Wilcox Signed Rank Test Random Distribution Sampling Link Prediction – Common Neighbors, Jaccard’s Coefficient, Adamic/Adar, Katzβ Generalized Linear Models (GLM) Forecast Smoothing Chi-Squared Test of Independence Sampling Scale, Scale with Model Substitute Missing Values Variance Test Recommender Systems Alternating Least Squares Factorized Polynomial Regression Models Field-Aware Factorization Machines (FFM) Figure 10.8 Predictive Functions Available in SAP HANA 2.0 SPS 03, Grouped by Category SPS 04 New algorithms, such as the trending and popular Hybrid Gradient Boosting Tree algorithm for classification and regression, were added in SAP HANA 2.0 SPS 04. Another algorithm included is Change Point Detection, which is used for detecting abrupt changes in time-series data such as stock prices. Other examples are Geometry DBSCAN (which we’ll meet again in the geospatial processing in the next chapter), Missing Value Handling, Entropy, and Discretize. Performance of all the PAL function categories have been improved. A useful new function is the Regression Model Comparison function, which allows you to run your data across many different regression algorithms in PAL using a single function call, and then compare the results. Error messages and logging are also now more detailed, which helps with debugging. Our discussions about the PAL algorithms focuses on how to use them with SAP HANA. We can’t discuss when to use which algorithm, as this is a specialist field on 438 Chapter 10 Text Processing and Predictive Modeling its own. To properly use PAL in SAP HANA, you’ll either need to know what and when to use it yourself or work with someone who does. Note If you want to learn about the different PAL algorithms in more detail, there are two great sources. The first is the SAP HANA Academy on YouTube, which has a dedicated video for each algorithm and a few overview video tutorials. You can find the PAL playlist at http:// s-prs.co/v487634. The other source is the SAP HANA Predictive Analysis Library (PAL) Reference Guide from the Reference section at http://s-prs.co/v487620. You can start with the table of contents to look at the categories and see which algorithms are available in the category you need. You can then look at the information for each of the algorithms, for example: 쐍 The description of the algorithm, its perquisites, and capabilities 쐍 The inputs and outputs that are required, as well as the data types for each column 쐍 Sample code for using the algorithm, and what results can be expected If you already know the specific algorithm that you want to use or are working with someone who does, you can use the search feature to find that specific function. Installing and Configuring Predictive Analysis Library PAL is an optional installation with SAP HANA. As it’s part of the SAP HANA, enterprise edition, you might have to ask the SAP HANA system administrator to install it for your project. If you’re using the SAP HANA, express edition, as we are in this book, it’s already preinstalled for you when you use the Server + Applications version. You’ll have used this version if you followed along when we discussed the basic SAP HANA modeling in SAP Web IDE for SAP HANA. PAL is part of the AFL that ships with all SAP HANA systems. AFL is an optional component; if you want to use the predictive analysis algorithms in PAL, you have to install AFL. AFL also includes a Business Function Library (BFL). You can check that PAL is installed by querying the system views SYS.AFL_AREAS and SYS.AFL_ FUNCTIONS. PAL uses a special engine in SAP HANA called the scriptserver. Because partners and customers can write their own function libraries using the AFL SDK, and because some algorithms can run for a longer time, SAP HANA has to protect its kernel against any instability. PAL and any other function libraries written with the AFL SDK run inside the scriptserver. By default, the scriptserver isn’t running. Key Concepts Refresher Chapter 10 You’ll need to add the scriptserver in your tenant database. In the SAP HANA, express edition, this is named HXE. The scriptserver doesn’t run in the SYSTEMDB. You can run the following command while you’re connected to the SYSTEMDB database (replace HXE with the name of your tenant database): ALTER DATABASE HXE ADD 'scriptserver'; After the installation, you have to ensure that the scriptserver is started in the tenant database and make sure the correct roles and authorizations are assigned to your SAP HANA user. (Roles and how to assign them are discussed in Chapter 13.) You can check that the scriptserver is running, by running the following query in the tenant database: SELECT * FROM SYS.M_SERVICES; You also have to assign the necessary roles and privileges. Your development user will need the AFL__SYS_AFL_AFLPAL_EXECUTE role to execute the PAL procedures. Both the SAP HANA Deployment Infrastructure (HDI) service technical user and the HDI container owners need to have been granted SELECT privileges on the PAL procedures. The HDI service technical user can be granted privileges through its specific default role called XSA_DEV_USER_ROLE. You’ll need to assign this role to your development user as well. The HDI container owners can be granted privileges through the default role called _SYS_DI_OO_DEFAULTS. The easiest way to set this up properly is to use the code snippet provided by the SAP HANA Academy. You can view the video called “PAL 146 Getting Started with HANA 20 SPS02” in the YouTube playlist we mentioned in the previous subsection. Or you can just run the code snippet on GitHub that Phillip describes in the video (the code snippet for setting this up is available at http://s-prs.co/v487635). You’ll have to do this for your SAP HANA, express edition, or you won’t be able to create the flowgraphs we’ll discuss next. After PAL is installed and configured, you can create predictive models graphically in SAP Web IDE for SAP HANA. There are three ways to call the PAL algorithms. 쐍 The easiest way to is to create a flowgraph. Even people who don’t know SAP HANA modeling that well, such as a data scientist in your team, can use the flowgraphs. 439 440 Chapter 10 Text Processing and Predictive Modeling 쐍 You can also create an SAP HANA AFLLANG procedure. We won’t go into much detail on this. 쐍 You can call the catalog SQLScript procedures provided. We’ll start using PAL by creating flowgraphs. Creating a Predictive Model Using Flowgraphs We mentioned flowgraphs in Chapter 9 already, when discussing SAP HANA smart data integration and SAP HANA smart data quality. We’ll configure the predictive analytics flowgraphs to generate the procedures we need to execute the predictive model. We can, however, also run the model directly inside the flowgraph editor. A flowgraph is a file with an .hdbflowgraph extension. It allows you to design and run data transformation procedures and tasks without typing a single line of code. You create an .hdbflowgraph file from the New context menu in your project. The flowgraph editor opens, as shown in Figure 10.9. Before creating your flowgraph, you’ll need to set the runtime behavior type of your flowgraph. You can do this in the Properties dialog box. You can open this dialog box by clicking on the top-right icon that looks like a small gear. In the Settings tab, you can select the Runtime Behavior Type in the dropdown menu. There are four options: 쐍 Batch Task This behavior type processes data as a batch or when loaded with an initial load. It can’t be run in real time. A stored procedure and a task are created after building the flowgraph. You can schedule this to run at regular intervals. 쐍 Realtime Task This behavior type allows you to process data in real time. A stored procedure and two tasks are created when you build the flowgraph. The first task is a batch or initial load of the input data. The second task is run in real time for any updates that occur to that input data. 쐍 Transactional Task This also allows you to process data in real time. A single task is created after building the flowgraph. This task is run in real time when any updates occur to the input data. It doesn’t include an initial load as it assumes that all data collected comes from a transactional system. 쐍 Procedure We’ll only discuss this type of runtime behavior in this chapter. It allows you to process data using stored procedures. It can’t be run in real time. When you Key Concepts Refresher Chapter 10 build the flowgraph, it will automatically create the stored procedures for you. You can then run the flowgraph when you want to. This works best when analyzing data. Figure 10.9 Setting the Properties of the Flowgraph In Figure 10.10, we build a graphical flowgraph model. Start by clicking on the large + sign. A palette menu will pop up for you to select nodes to use in your flowgraph model. In this case, we select the Predictive Analysis node. In the case of a Predictive Analysis node, another dialog box will pop up where you chose your algorithm. The PAL functions are grouped in the categories that we discussed earlier. You can expand each category to see the algorithms available in that category. Note One thing to keep in mind is that the graphical flowgraph editor is typically one release behind the PAL. If you really need one of the latest PAL algorithms, you’ll have to call it with SQLScript. 441 442 Chapter 10 Text Processing and Predictive Modeling Figure 10.10 Creating a Flowgraph Model in SAP Web IDE for SAP HANA Your flowgraph must consist of at least three nodes, as shown in Figure 10.11. You need a Data Source node from which the data is fed into the Predictive Analysis node, where it’s processing depends on the algorithm you chose. This is also sometimes called the Transformation node. The output from the predictive analysis is then sent to the Data Target node. Figure 10.11 Flowgraph Sequence from Data Source to Data Target We’ll start with the Predictive Analysis node as the data inputs and outputs are determined by the PAL algorithm that you choose. In Figure 10.12, we selected the Accelerated K-Means clustering algorithm. You can see from the title of the algorithm that it expects two inputs and provides two outputs. Key Concepts Refresher Chapter 10 You can open the bottom part of Figure 10.12 by clicking on the gear icon of the Predictive Analysis node. The text description of the algorithm in this case shows that it expects an ID field and one other column with the data type of integer or double, and that NULL values aren’t allowed. This is necessary information that we need when creating or selecting the Data Source node. Figure 10.12 Configuring the Transformation Node The Regex column shows [ISF8_19_0] and [IDS] for the different Inputs. [ISF8_19_0] means that this entry must be a unique identifier. These can be either integers or strings. In this case, the text description specified that it should be an integer. [ID] means that it should either be an integer or a double. [IDS] means that several values can be sent through. In that case, the Multi Column box is selected. 443 444 Chapter 10 Text Processing and Predictive Modeling The mapping of input fields from the data source is normally done automatically. The columns are handled in the same order in which they are defined in the data source. You now know from the Predictive Analysis node that the data source must supply at least two columns, of which the first must be an ID field. These field data types can either be integer or double. Figure 10.13 shows the configuration of the Data Source node. In Figure 10.11, shown earlier, the top-right corner of the Data Source node has a red dot. This indicates that the node isn’t yet configured. To configure it, you click on the gear icon at the bottom of the node. The main selection here is the type of data source that you want to use. The HANA Object button allows you to select SAP HANA tables, views, synonyms, CDS entities, table functions, and so on. If you don’t use procedure-type flowgraphs, you can also select calculation views. In this case, we’re using a procedure-type flowgraph, so we can’t use a calculation view as a data source. Sometimes you might want to have a more dynamic setup. You can choose to use a Table Type as a data source, as shown in Figure 10.13. You’ll have to specify the columns and the data type of each column. The input table will be passed into the procedure call as an input parameter. Tip Choosing a Table Type as a data source is only available for procedure-type flowgraphs. We provide the expected input fields with the required data types to the Predictive Analysis node. In this example it could be something like product sales data, where we want to cluster according to quantities. We pass through the item IDs, the quantities, and the item names. The configuration of the Data Target node is very similar to that of the Data Source node. The data target defines where you want to put the calculated results. This could be an existing SAP HANA table; in which case, you select HANA Object. If you’re creating a procedure-type flowgraph, you can select Table Type. If you want to create a new table for the results, you can select Template Table, as shown in Figure 10.14. If the Predictive Analysis node creates several result sets, you don’t have to retrieve all of them. If result sets aren’t mapped to a Data Target node, they won’t be stored. Key Concepts Refresher Chapter 10 Figure 10.13 Configuring the Data Source Node Figure 10.14 Configuring the Data Target Node 445 446 Chapter 10 Text Processing and Predictive Modeling You now know how to create simple predictive models. Now let’s discuss how to create machine learning models. Machine Learning Models With machine learning models, we use the predictive algorithms to let the computer look at, analyze, and “learn” from the data by automatically building a predictive model. We can then run new data through the generated model to let the system predict the results based on what it has learned from the historical data. Sometimes, you might want to train your model, and then use it later to make predictions. Figure 10.15 shows such a model: It starts with some training data that you feed into a C4.5 decision tree. The data is used to train the classification model. You then feed the trained classification model into a prediction function and give the prediction function some new data. The prediction function applies the trained model to the new data and provides the predicted results in an output table. This is an example of how you would build a model for predicting delayed flights. You can train a predictive model by feeding it all the historical flight data. After the model is prepared, you can give it your flight details. The model will then predict whether your flight will be delayed or not and by how many minutes. You can build this exact model using the flights data from the US Department of Transportation that we discussed in Chapter 2. Figure 10.15 Predictive Model Created in SAP Web IDE for SAP HANA There is a nice way to check how well your trained model will perform. You use 90% or 95% of the historical data to train your machine learning model. You then Key Concepts Refresher Chapter 10 feed the remaining 10% or 5% of the data into your model to get the predicted values. Because this is actually historical data, you have the result of what really happened. You can now compare the historical result with the predictive values to evaluate the accuracy of your predictive model. The main difference between this model and the simpler one in the previous section is that these models have several steps. The first step trains the model, then you can validate and test it. Finally, you use the trained model for predictions. You can even add another step before the model training step, by first using partitioning-type algorithms to divide the data into several subsets. You can use 90% of your data to train the model, and the remaining 10% to validate it with. Or you can use preprocessing (data preparation) algorithms on the raw data to first shape it into a usable format. SPS 04 To improve the performance of machine learning models in SAP HANA 2.0 SPS 04, you can now keep your trained PAL predictive models in memory. You parse the model once, keep the model state in memory, and can then execute repeatedly on the stored model that is kept in memory for repeated scoring. When you build a flowgraph model, you start on the left side and work toward the right side, as shown in Figure 10.15. First, add a data input node, and select a table with data. Put this on the left side of the work area. Next, select a PAL function from the popup palette menu, and add it to the work area. Then, connect the Data Source node to the PAL function node. You carry on in this fashion until your data model is complete. When you want to run your predictive model, you can click the Execute button in the top left of the flowgraph editor. The biggest advantage of flowgraphs is that they are very easy and flexible. Data science can be quite complex, and you don’t always know which algorithms will produce the best results. Using graphical flowgraphs, you can test the performance of several algorithms to find the best solution. SQLScript Procedures Created by the Flowgraph When you’re satisfied with your predictive model, you can build it in SAP Web IDE for SAP HANA. With a procedure-type flowgraph, it will create a set of two procedures automatically during the build process. 447 448 Chapter 10 Text Processing and Predictive Modeling You can find these two procedures by going to the Procedures folder in the SAP HANA database explorer, as shown in Figure 10.16. The SAP HANA database explorer is accessed via the second icon on the top-left toolbar of SAP Web IDE for SAP HANA. In this case, we used the simple predictive model created earlier in Figure 10.11, using the accelerated k-means algorithm. The flowgraph was called DEMO_INTRO.hdbflowgraph. The two procedures are seen on the list of procedures on the left side: DEMO_ INTRO and DEMO_INTRO.PredictiveAnalysis. The last part of the name is due to the naming of the Transformation node in Figure 10.11. When you open the procedures, you can see the code in the CREATE Statement tab. Figure 10.16 shows the code for the first procedure, called DEMO_INTRO. Figure 10.16 SQLScript Procedure Created by the Flowgraph The first procedure is a SQLScript procedure and is shown in Listing 10.4. After the BEGIN statement, it first calls a parameters table that was also automatically created during the build process. In the next step, it calls the second procedure, called DEMO_INTRO.PredictiveAnalysis. Finally, it inserts the result set into the data target. This procedure is normally the one that your applications will call. Key Concepts Refresher Chapter 10 CREATE PROCEDURE "DEMO_INTRO"(IN DEMO_INTRO_DATASOURCE_TAB "DEMO_INTRO.DEMO_INTRO_DataSource__TT") LANGUAGE SQLSCRIPT SQL SECURITY INVOKER AS BEGIN PREDICTIVEANALYSIS_PARAMETERS_TAB = SELECT * FROM "DEMO_INTRO.PredictiveAnalysis_PARAMETERS"; CALL "DEMO_INTRO.PredictiveAnalysis"(:DEMO_INTRO_DATASOURCE_TAB, :PREDICTIVEANALYSIS_PARAMETERS_TAB, PREDICTIVEANALYSIS_RESULT_TAB, PREDICTIVEANALYSIS_CENTER_POINTS_TAB ); INSERT INTO "DataTarget" ("ID", "ASSIGNED_CLUSTER", "DISTANCE") SELECT "ID" AS "ID", "ASSIGNED_CLUSTER" AS "ASSIGNED_CLUSTER", "DISTANCE" AS "DISTANCE" FROM :PREDICTIVEANALYSIS_RESULT_TAB; END Listing 10.4 SQLScript Procedure Listing 10.5 shows some of the code of the second procedure. The interesting thing here is that this isn’t written in SQLScript, but in AFLLANG. The AREA parameter shows as AFLPAL, showing that it’s calling a PAL function. The FUNCTION is shown as ACCELERATEDKMEANS. In the PARAMETERS section, you can see the input fields that we specified in Figure 10.13. The PARAMETERS section is basically in JavaScript Object Notation (JSON) format. CREATE PROCEDURE "SHINE_CORE_1"."DEMO_INTRO.PredictiveAnalysis" (IN p1 "SHINE_CORE_1"."DEMO_INTRO.PredictiveAnalysis_DATA__TT", IN p2 "SHINE_CORE_1"."DEMO_INTRO.PredictiveAnalysis_PARAMETERS__TT", OUT p3 "SHINE_CORE_1"."DEMO_INTRO.PredictiveAnalysis_RESULT__TT", OUT p4 "SHINE_CORE_1"."DEMO_INTRO.PredictiveAnalysis_CENTER_POINTS__TT") LANGUAGE AFLLANG SQL SECURITY INVOKER READS SQL DATA AS BEGIN AFLLANGVERSION='2'; SCHEMA='_SYS_AFL'; AREA='AFLPAL'; FUNCTION='ACCELERATEDKMEANS'; PARAMETERS='; "TABLE",in,TrexTable,,,,; "ID",,,INT,,INTEGER,; "QUANTITY",,,DOUBLE,,DOUBLE,; "ITEM_NAME",,,STRING,,NVARCHAR,50; … 449 450 Chapter 10 Text Processing and Predictive Modeling '; END Listing 10.5 AFLLANG Procedure We can use these two procedures to make our lives easier when we want to use our flowgraph model in a calculation view. Using Predictive Models in Calculation Views We can’t call a SQLScript procedure in a calculation view because calculation views are read-only artifacts and may therefore only use other read-only artifacts. Procedures can be read-write artifacts. Calculation views may, however, use table functions, as these are read-only. There are three different ways of calling the PAL algorithm from a table function: 쐍 Copy the code from the created SQLScript procedure into a new table function, and modify it slightly. We’ll look at this soon. 쐍 Create an AFLLANG procedure, and call this procedure from your table function. You can create an AFLLANG procedure by creating an .hdbafllangprocedure file. You’ve already seen the JSON-formatted section where you define the name of the PAL function and its input and output parameters. If you want to create such a file, you first need to learn AFLLANG, and then create several tables, table types, CDS entities, or data structures. Each parameter of the PAL function needs to be mapped to an existing object. 쐍 Call the documented PAL SQLScript procedures. This requires the assignment of additional authorizations. Modifying the SQLScript procedure from Listing 10.4 for the function is actually quite easy. You can copy the code of the entire procedure. You then follow a few steps: 1. Change the PROCEDURE to a FUNCTION. To avoid confusion, we changed the name by adding a number 1. You can also add _FUNC to the name. Most of the code stays the same, except that a function returns the results set, whereas the procedure inserted it into the data target. 2. Add a RETURNS table with the definition of the output table type in the FUNCTION header. 3. Delete the INSERTS INTO “DataTarget” command to RETURNS. The result is shown in Listing 10.6. Key Concepts Refresher Chapter 10 CREATE FUNCTION "DEMO_INTRO1"(IN DEMO_INTRO_DATASOURCE_TAB "DEMO_INTRO.DEMO_INTRO_DataSource__TT") RETURNS table (ID NVARCHAR(50), ASSIGNED_CLUSTER INTEGER, DISTANCE DOUBLE) LANGUAGE SQLSCRIPT SQL SECURITY INVOKER AS BEGIN PREDICTIVEANALYSIS_PARAMETERS_TAB = SELECT * FROM "DEMO_INTRO.PredictiveAnalysis_PARAMETERS"; CALL "DEMO_INTRO.PredictiveAnalysis"(:DEMO_INTRO_DATASOURCE_TAB, :PREDICTIVEANALYSIS_PARAMETERS_TAB, PREDICTIVEANALYSIS_RESULT_TAB, PREDICTIVEANALYSIS_CENTER_POINTS_TAB); RETURN SELECT "ID" AS "ID", "ASSIGNED_CLUSTER" AS "ASSIGNED_CLUSTER", "DISTANCE" AS "DISTANCE" FROM :PREDICTIVEANALYSIS_RESULT_TAB; END; Listing 10.6 Modifying the Code from the SQLScript Procedure to Create a Table Function You can now add your table function into a Projection node in the calculation view in Figure 10.17. Add a new Projection node, and click Add Data Source. When you search for “DEMO_INTRO”, you’ll see your table function. Click on Finish to add it as the data source for your calculation view. You’re now using your predictive model that you created using a flowgraph directly in your calculation view. Figure 10.17 Using the Table Function in a Calculation View 451 452 Chapter 10 Text Processing and Predictive Modeling Creating a Predictive Model Using SQLScript You can also create a predictive model purely by using SQLScript instead of graphical modeling tools. This is the way that predictive models were originally created in SAP HANA before the graphical tools existed. Since SAP HANA 2.0 SPS 02, SAP provides SQLScript procedures that you can use to directly use the PAL algorithms. These procedures are found in the _SYS_AFL schema. This makes the SQLScript code a lot easier and shorter. Before SPS 02, you had to write your own wrapper code, which could be quite time-consuming. This is now easy enough that you could even use these procedures interactively in the SQL Console. You’ve now mastered predictive modeling and machine learning using the PAL in SAP HANA. Important Terminology In this chapter, we looked at the following important terms: 쐍 Full-text index Depending on what you specify when you create the full-text index, it gives access to one or more text features. The full-text index will use additional memory to provide the requested text features. 쐍 Fuzzy text search This search type enables or disables the fault-tolerant search feature. 쐍 Text analysis Text analysis provides a vast number of possible entity types and analysis rules for many industries in many languages. 쐍 Text mining When using text mining in SAP HANA, you can perform the following tasks: – Categorize documents. – Find top-ranked terms related to a particular term. – Analyze a document to find key phrases that describe the document. – See a list of documents related to the one you asked about. – Access documents that best match your search terms. 쐍 Predictive Analysis Library (PAL) Defines functions that you can use to perform various predictive analysis algorithms. Practice Questions Chapter 10 Practice Questions These practice questions will help you evaluate your understanding of the topics covered in this chapter. The questions shown are similar in nature to those found on the certification examination. Although none of these questions will be found on the exam itself, they will allow you to review your knowledge of the subject. Select the correct answers, and then check the completeness of your answers in the “Practice Question Answers and Explanations” section. Remember, on the exam, you must select all correct answers and only correct answers to receive credit for the question. 1. Which text feature do you use to find sentiment in textual data? 첸 A. Full-text index 첸 B. Fuzzy text search 첸 C. Text analysis 첸 D. Text mining 2. You create a full-text index. For what text feature clause do you need to specify a configuration? 첸 A. SEARCH ONLY 첸 B. FUZZY SEARCH INDEX 첸 C. TEXT MINING 첸 D. TEXT ANALYSIS 3. What is the recommended way to create full-text indexes? 첸 A. User-defined functions 첸 B. Procedures 첸 C. Core data services 첸 D. SQL statements in the SQL Console 4. For which data types does SAP HANA automatically create full-text indexes? (There are 3 correct answers.) 첸 A. NVARCHAR 첸 B. SHORTTEXT 453 454 Chapter 10 Text Processing and Predictive Modeling 첸 C. BINTEXT 첸 D. CLOB 첸 E. TEXT 5. What predicate do you use for fuzzy text search in the WHERE clause of a SQL statement? 첸 A. LIKE 첸 B. ANY 첸 C. IN 첸 D. CONTAINS 6. What do you use in a SQL query to sort fuzzy text search results by relevance? 첸 A. RANK() 첸 B. SCORE() 첸 C. SORT BY 첸 D. TOP 7. True or False: The full-text index is stored in table $TA. 첸 A. True 첸 B. False 8. Which two elements are part of a text analysis configuration? (There are 2 correct answers.) 첸 A. Entity extraction 첸 B. Dictionary 첸 C. Noun groups 첸 D. Rule set 9. What PAL algorithm category can you use for supervised learning? 첸 A. Classification 첸 B. Clustering 첸 C. Association 첸 D. Social network analysis Practice Questions Chapter 10 10. What PAL algorithm would you use to analyze shopping baskets? 첸 A. K-means 첸 B. Decision tree 첸 C. Apriori 첸 D. Link prediction 11. What must be installed for PAL to work? 첸 A. SAP HANA BFL 첸 B. SAP HANA AFL 첸 C. SAP HANA APL 12. True or False: The scriptserver must be added to the SYSTEMDB database for PAL to work properly. 첸 A. True 첸 B. False 13. Which files can you use to call PAL functions? (There are 3 correct answers.) 첸 A. .hdbcds 첸 B. .hdbprocedure 첸 C. .hdbworkspace 첸 D. .afllangprocedure 첸 E. .hdbflowgraph 14. Which runtime behavior type for a flowgraph can process data in real time? 첸 A. Transactional task 첸 B. Procedure 첸 C. Batch task 15. Which data sources can you use in the Data Source node of a procedure-type flowgraph? (There are 2 correct answers.) 첸 A. A calculation view 첸 B. A table type 455 456 Chapter 10 Text Processing and Predictive Modeling 첸 C. A CDS view 첸 D. A template table Practice Question Answers and Explanations 1. Correct answer: C You use text analysis with the EXTRACTION_CORE_VOICEOFCUSTOMER configuration to find sentiment in textual data. 2. Correct answer: D You need to specify a configuration with the TEXT ANALYSIS clause when you create a full-text index. 3. Correct answer: C The recommended way going forward to create full-text indexes is using CDS. We never recommend using SQL statements in the SQL Console because you’ll have to retype everything again in the production system. Procedures are acceptable and usable, but CDS is the recommended way. 4. Correct answers: B, C, E SAP HANA automatically creates full-text indexes for TEXT, BINTEXT, and SHORTTEXT. For NVARCHAR and CLOB, you have to create full-text indexes manually. 5. Correct answer: D You use the CONTAINS predicate for fuzzy text search in the WHERE clause of a SQL statement. The LIKE predicate only uses exact matching, not fuzzy text search. 6. Correct answer: B You use the SCORE() function in a SQL query to sort fuzzy text search results by relevance. RANK, SORT BY, and TOP can’t be used to sort SAP HANA fuzzy text search results. 7. Correct answer: B False. The full-text index is NOT stored in table $TA. 8. Correct answers: B, D A text analysis configuration consists of a text analysis configuration file, a dictionary, and a rule set. Entity extraction and noun groups are some of the results of a text analysis. 9. Correct answer: A You can use the algorithms in the PAL classification category for supervised learning. Takeaway Chapter 10 Clustering is sometimes also called unsupervised learning. 10. Correct answer: C Use the Apriori PAL algorithm to analyze shopping baskets. 11. Correct answer: B You must install AFL for PAL to work. BFL is also part of AFL, but a completely separate library from PAL. APL is the automated predictive library and is a separate product. 12. Correct answer: B False. The scriptserver must actually be added to the tenant database for PAL to work properly. It can’t be added to the SYSTEMDB database. 13. Correct answers: B, D, E PAL functions can be called from flowgraphs (.hdbflowgraph), AFLLANG procedures (.afllangprocedure), and SQLScript procedures (.hdbprocedure). .hdbcds and .hdbworkspace files can’t call code. 14. Correct answer: A A runtime behavior of the transactional task type for a flowgraph can process data in real time. Procedure and batch task types of runtime behavior for a flowgraph can’t process data in real time. 15. Correct answers: B, C You can use table types and CDS views as data sources in the Data Source node of a procedure-type flowgraph. You can’t use a calculation view as a data source in the Data Source node of a procedure-type flowgraph. A template table can only be used in a Target Data node; not in a Data Source node. Note Some questions in the certification exam will only have three answer options. Takeaway You now know how to create a text search index for use with the fuzzy text search, text analysis, and text mining features in SAP HANA. You can use these features to 457 458 Chapter 10 Text Processing and Predictive Modeling provide fault-tolerant searches, sentiment analysis, and document references for the users of your information models. We also discussed the categories and some of the algorithms in PAL and learned how to create advanced predictive analysis models for machine learning. Summary We discussed a few aspects of text analysis: how to search and find text even if the user misspells it, how to analyze text to find sentiment, and how to mine text in documents. We explained how to build a predictive analysis model for business users using the flowgraph editor, and we saw how to get the system to automatically create procedure code to be used in calculation views. In the next chapter, we’ll look at information modeling with spatial, graph, and series data. Chapter 11 Spatial, Graph, and Series Data Modeling Techniques You’ll Master 쐍 Understand the key components of spatial processing, such as spatial reference systems (SRSs), data types, joins, predicates, functions, expressions, and clustering 쐍 Learn how to use the key spatial components in both graphical and SQLScript modeling 쐍 Work with relationships between data objects using graphs 쐍 Get an overview of SAP HANA series data 쐍 Understand the advantages of using SAP HANA for series data 쐍 Get to know some of the SAP HANA series data functionalities 460 Chapter 11 Spatial, Graph, and Series Data Modeling After understanding the basics of SAP HANA information modeling, we started by taking your knowledge to the next level in the previous chapter with text processing and predictive modeling. In this chapter, we’ll cover three more topics: spatial, graph, and series data modeling. All five topics can be used independently from each other, but they can also be combined in projects. We’ll look at an example of how to combine spatial and graph modeling in this chapter. We’ll begin our discussion with spatial processing and how to combine business data with maps and geospatial capabilities to extract more value. We already met the Graph node in calculation views. Here we’ll take another quick look at the topic. After that, series data is the fifth and last topic that will complement your knowledge of advanced SAP HANA modeling techniques. In this chapter, we’ll just provide an overview on the topic. You can combine series data with aggregation, analysis, and predictive modeling to create valuable insights into real-world operations. Real-World Scenario You’re working on an SAP HANA project at a pharmaceutical company. The company has created a mobile web application with SAP HANA for its sales representatives. To provide the sales reps with a list of new potential customers and to save them time by not contacting people working in nonrelated fields, you analyze the relationships between people in your SAP Customer Relationship Management (SAP CRM) system. You add some what-if map-based analytics to the mobile app so that sales reps can change the areas where they are working and see the impact on their projected sales figures. The company also has a fleet of vehicles for deliveries of their products. All the vehicles are fitted with tracking devices, as well as a variety of sensors to monitor the vehicles. The company wants a dashboard that shows where the vehicles are (time series data), what they are doing (e.g., loading goods or driving), and when they will be at their destinations. This dashboard shows a combination of real-time data, inferred status information, and arrival time predictions. Key Concepts Refresher Chapter 11 You also notice that by analyzing the historical data, you can add planned dates for servicing the vehicles based on their mileage, the way the drivers handled their vehicles, and warnings from sensors in the vehicles You can then calculate projected running costs of the fleet and even show if the company needs fewer or more vehicles in its fleet. Objectives of This Portion of the Test The purpose of this portion of the SAP HANA certification exam is to test your knowledge of advanced modeling with SAP HANA. The certification exam expects you to have a good understanding of the following topics: 쐍 The key components of spatial processing, such as SRSs, data types, joins, predicates, functions, expressions, and clustering 쐍 The use of spatial components in both graphical and SQLScript modeling 쐍 The key concepts of graph processing 쐍 Basic series data modeling techniques 쐍 How to create a series table Note This portion contributes up to 12% of the total certification exam score. Key Concepts Refresher In this chapter, we’ll look at spatial and geospatial processing, graph modeling, and series data. From the real-world scenario above, you can get some idea of what is possible with the knowledge you’ll gain from this chapter. The real value starts appearing when you learn to combine several of these modeling capabilities. One example that you’ll most likely have used is GPS routing. The destination is a geospatial point. The roads linking your current location to your destination form a graph. Each road has certain characteristics, for example, the speed limit and whether it’s a tarred or tolled road. In this case, you combine spatial with graph modeling. 461 462 Chapter 11 Spatial, Graph, and Series Data Modeling Another example is when you order items from an online store. Business information from your order, such as the items and whether you paid for same-day delivery, is linked to the geospatial data of your address. The vehicle is most likely monitored for real-time feedback, the data is stored using series data, and this historical data is processed with predictive algorithms to determine the next maintenance service. The online store can use machine learning with weather and traffic patterns to determine when best to deliver your order, and it can use text processing from social media to determine if people are satisfied with their purchases. The first topic we’ll discuss in this chapter is spatial processing in SAP HANA. Spatial Modeling SAP HANA has very strong native spatial data processing capabilities, and they just keep getting better with each new version. Companies want to be able to use all the data they have in useful ways. They might have a variety of tools, their data is often captured in different formats, and they use different services. Compatibility and integration is therefore a must-have requirement when working with geospatial data. You might get data from satellites or weather services. How do you combine this data with the data in your own databases, even when some departments in the company are using different open-source and commercial tools? SAP HANA is certified Open Geospatial Consortium (OGC) compliant. The OGC is a consensus standards organization that creates open standards for geospatial content and services. As an industry standard, it’s important that SAP HANA conforms to these standards. One of these standards is ISO/IEC 13249-3 SQL/MM Spatial (the “MM” is for multimedia). These specify SQL functions that you can use to manipulate spatial data in SQL code. We’ll look at these functions in this section. The Environmental Systems Research Institute (ESRI) is the largest international software supplier of geographic information system (GIS) software, and it has about 43% of the worldwide GIS software market share. ESRI has an entire suite of GIS software, called ArcGIS. ESRI developed the ESRI shapefile vector format, which has become a de facto standard to transport spatial data. SAP HANA is ESRI GIS Enterprise Geodatabase certified. SAP HANA also works with several map providers, such as HERE and OpenStreetMap. HERE was formerly known as Navteq, and then Nokia Here maps. OpenStreetMap is a collaborative project, like Wikipedia, that creates a free and editable map of the world. Key Concepts Refresher Chapter 11 We’ll discuss some of the SAP HANA spatial features and capabilities in this section. We’ll start by looking at how to store two- and three-dimensional spatial data in SAP HANA, and how to use the data in calculation views using spatial joins, expressions, functions, and predicates. We’ll demonstrate that you can enrich the master data of your customer data by geotagging and geocoding to provide longitude and latitude information to customer addresses. You can then visualize and analyze your enterprise data using standard SAP analytics clients, such as SAP Analytics Cloud. We’ll also take a quick look at spatial clustering and the SAP HANA spatial services. Tip Geospatial processing is a subset of spatial processing related to geography. Spatial processing can include CAD models, augmented reality (AR), virtual reality (VR), or even models of the human body, and it uses a three-dimensional coordinate system that differs from the geospatial coordinates of longitude, latitude, and height above sea level. In this chapter, we’ll focus on the geospatial subset. As with any new topic, half the work is learning the vocabulary. We start by clearing up some misconceptions that you might still have from your school days. Spatial Reference Systems Most schools have a big map of the world somewhere on a wall. Most often this map is the Mercator map of the world, as illustrated on the left side of Figure 11.1. There are many advantages to using this map. It’s nice rectangular format fits nicely on a classroom wall. The biggest advantage is that you can plan routes from one place to another using straight lines. Just think about that for a moment. The earth is a three-dimensional sphere. To travel long distances, you’ll actually travel over the horizon, that is, in a three-dimensional curve. This map simplifies it to a flat straight line using a Cartesian coordinate system, which makes calculation easy and fast. The disadvantage is that to accomplish these amazing feats, it has to distort reality. On the Mercator map shown in Figure 11.1, it seems that Greenland and Africa are about the same size, yet Africa is 14 times larger than Greenland. You can see the actual distortion by using the map tool on http://s-prs.co/v487626 or looking at the map graphic at http://s-prs.co/v487627. 463 464 Chapter 11 Spatial, Graph, and Series Data Modeling Our GPS maps and devices on the other side use the more accurate three-dimensional model of the earth. Just to clarify; the earth isn’t a perfect sphere with a constant radius of 6371 km. In reality, it’s slightly flatter at the poles, or bulging at the equator, depending on your viewpoint. Look at the right side of Figure 11.1. The radius to the poles (shown as p) is about 6 357 km, while the radius to the equator (shown as e) is about 6 378 km. The earth is a spheroid. And even this is an extreme simplification, as the earth has “bumps” in places. Working with calculations in this model is most complex and therefore slower. This model uses the spatial reference system (SRS) called WGS84, where WGS stands for World Geodetic System. Each SRS has its own specifications; for example, WGS84 uses distances in meters, whereas another SRS could use feet. Traditionally, the latitude is given before the longitude, but in a Cartesian coordinates system, you usually mention the x-coordinate before the y-coordinate; therefore, some round-earth SRSs use longitude before latitude. Most commonly, people refer to the system shown in Figure 11.1 as “flat-earth” and “round-earth” models. Each of these models has its advantages and disadvantages. ST_Transform ([SRID]) p e (Web) Mercator Projection Flat-earth SRS with Cartesian Coordinate System WGS84 (World Geodetic System) Round-earth SRS with Geographic Coordinate System Figure 11.1 Two Popular Spatial Reference Systems Each of the SRSs has a spatial reference system identifier (SRID). The SRID for WGS84 used by GPS is 4326. There are hundreds of SRIDs. For example, SRID 3857 is a spherical Mercator projection coordinate system used by web services such as Google Maps and OpenStreetMap. And there is WGS84 (planar), with SRID 1000004326, that is similar to WGS84 except that it’s a flat-earth type of SRS that uses an equirectangular projection of the accompanying distortion to length and area. Key Concepts Refresher Chapter 11 Tip As a best practice, always specify the SRID when working with spatial data. Your assumptions don’t always work, and it’s easy to make mistakes when the SRID isn’t specified. If your data is stored in the 4326 format, but you want to use it with data available in these systems, you’ll need to transform your data to the 3857 format (or transform the other data to 4326). You can do this using the ST_Transform function. We’ll discuss spatial functions later in this section. Sometimes you also need to transform your data to a different SRS if the spatial functions you want to use don’t work in the SRS of your data. For example, some of the spatial functions in SAP HANA don’t support round-earth SRSs. You’ll have to transform your data to a planar SRS before calling those functions. SPS 04 The spatial functions in SAP HANA that don’t support round-earth SRSs are decreasing with every new release of SAP HANA. SAP HANA also has a default SRS of 0, which is a two-dimensional (planar) system that is strictly Cartesian without any relationship to the earth’s shape. This SRS can be used, for instance, to reference a space of limited dimensions, such as an office building, a factory, or a cornfield. SPS 04 The in-memory representation of geometries in planar SRSs has been optimized and now uses up to 50% less memory. This memory footprint reduction is important to lower the total cost of ownership (TCO) for SAP HANA. You need to specify the SRID when creating tables with spatial data in SAP HANA. Creating Tables with Spatial Data When you work with graphics programs on your computer, it often shows you the position of your mouse or pen cursor with coordinates (x, y). Spatial data is often also stored this way. Or your GPS shows you a point of interest (POI) such as a restaurant as latitude and longitude, depending on what SRS it uses. This is similar to (x, y). 465 466 Chapter 11 Spatial, Graph, and Series Data Modeling Sometimes you also need the height above or below sea level, for example, when you’re flying a glider, climbing a mountain, or working in a mine. In this case, you store this value in the z-coordinate, and your POI is shown as (x, y, z). Furthermore, you also might need to store a measure with the three-dimensional POI. This measure could be the air pressure for weather prediction, wind speed while flying a glider, or temperature underground in the mine. You can store this in the m-coordinate, and your POI is then shown as (x, y, z, m). Figure 11.2 shows a list of the spatial data types in SAP HANA with graphical examples. The point position is the most basic spatial data type. The OGC standard states that a point value can include a z- and/or m-coordinate value. In SAP HANA, all spatial data and code is prefixed with ST_. The point spatial data type in SAP HANA is ST_Point. Tip The ST_POINT spatial data type has the limitation that it only stores two-dimensional geometries. No z- or m-coordinate can be stored in a ST_POINT spatial data type. If you need z- or m-coordinates, you should use the ST_GEOMETRY spatial data type. Each of the spatial data types illustrated in Figure 11.2 is available in SAP HANA and is prefixed by with ST_. Point MultiPoint Polygon Polygon (ring) Figure 11.2 Spatial Data Types LineString MultiLineString MultiPolygon CircularString GeometryCollection Key Concepts Refresher Chapter 11 Figure 11.3 shows the hierarchy of the spatial data types. The top data type is ST_ Geometry, and it’s a superset of all the other spatial data types. The following list describes the spatial data types: 쐍 ST_Geometry This is the supertype for all the other spatial data types; you can store objects such as linestrings, polygons, and circles in this data type. Then you can use the TREAT expression to specify which data type you want to work with. For example, TREAT( geometry1 AS ST_Polygon ).ST_Area() will take a geometry, treat it like a polygon, and ask the size of the polygon’s area. 쐍 ST_Point A point is a single location in space with x- and y-coordinates at a minimum. Points are often used to represent addresses, as in the shop example in Figure 11.4 in the next section. 쐍 ST_LineString A linestring is a line, which can be used to represent rivers, railroads, and roads. In the shop example, it could represent a delivery route. 쐍 ST_CircularString A circularstring is a connected sequence of circular arc segments, much like a linestring with circular arcs between points. 쐍 ST_Polygon A polygon defines an area. In the shop example, this is used to represent the suburbs in the second table (see Figure 11.4 in the next section; most suburbs don’t have regular square shapes). 쐍 ST_MultiPoint A multipoint is a collection of points. For example, it could represent all the different locations (addresses) where a company has shops. 쐍 ST_MultiLineString A multilinestring is a collection of linestrings. In the shop example, this collection could include all the deliveries that a shop has to make on a certain day for which the driver goes to different customers or suppliers. 쐍 ST_MultiPolygon A multipolygon is a collection of polygons; for example, the suburbs could be combined to indicate the area of the city or town. 쐍 ST_GeometryCollection A geometry collection is a collection of any of the spatial data types mentioned in this list. 467 468 Chapter 11 Spatial, Graph, and Series Data Modeling ST_Geometry ST_Point ST_LineString ST_CircularString ST_Polygon ST_MultiPoint ST_GeometryCollection ST_MultiLineString TS_MultiPolygon Figure 11.3 Hierarchy of the Spatial Data Types Now that you know the spatial data types, it’s easy to created tables using core data services (CDS). In Listing 11.1 you can see the CDS code used for creating two example tables. The first table stores the location of the shops or branches of a company. In this case, ST_Point is OK because we only need the longitude and latitude of the shops. The second table stores the shapes of all the suburbs in the city. namespace spatial.spatialdb; context spatialtables{ entity GEO_SHOP { key SHOP_ID: Integer not null; SHOP_NAME: hana.VARCHAR(100); SHOP_LOCATION: hana.ST_POINT; }; entity GEO_SUBURB { key SUBURB_ID: Integer not null; SUBURB_NAME: hana.VARCHAR(100); SUBURB_SHAPE: hana.ST_GEOMETRY; }; }; Listing 11.1 CDS Syntax for Creating Two Tables with Spatial Data We can now use these tables in our calculation views. SPS 04 As from SAP HANA 2.0 SP 04, we can now also store spatial data in warm and cold storage using native storage extension (NSE). We discussed data aging in Chapter 3. This helps to lower the TCO for SAP HANA for customers with lots of spatial data. Key Concepts Refresher Chapter 11 We first mentioned spatial processing in Chapter 5 when we discussed spatial joins. Now, let’s take a closer look at spatial joins in an SAP HANA system. Spatial Joins and Predicates Figure 11.4 shows two tables that contain spatial data. The left table, called GEO_SHOP, contains shop locations, and the right table contains city suburb shapes. We use these two tables in a calculation view. In this case, we put them in a Join node. Figure 11.4 Two Tables Containing Spatial Data with a Spatial Join You can join different data spatial types together in a spatial join. Figure 11.5 shows the details of a spatial join. Notice that the SPATIAL JOIN section is available for this join type. The Predicate dropdown list in this section allows you to specify how to join the fields. 469 470 Chapter 11 Spatial, Graph, and Series Data Modeling In the PROPERTIES section, we see the join showing up as an inner join, but this is grayed out because our join is between fields containing spatial data. When you click on the create join button on the top-right toolbar of Figure 11.4, you get the screen shown in Figure 11.5. In the shop example, we’ll use the Contains option because we want to see which suburb (right table) contains which shop addresses (locations). In other words, we want to know in which suburb each shop is located. Tip Don’t confuse the Contains option in the spatial joins with the CONTAINS predicate we used when executing a fuzzy text search in the previous chapter. Figure 11.5 Spatial Join Details So, when do you use which predicate? Figure 11.6 illustrates many of the spatial predicates. Most of these are quite obvious; for example, the first predicate shows that the Point is WITHIN the triangle. Key Concepts Refresher Chapter 11 .ST_Within ( ) .ST_Disjoint ( ) .ST_Equals ( ) .ST_Crosses ( ) .ST_Intersects ( .ST_Touches ( ) ) .ST_Contains ( ) .ST_Overlaps ( ) .ST_Intersects ( ) .ST_Covers ( ) .ST_Contains ( ) = false Figure 11.6 Spatial Predicates There are a few things to watch out for though: 쐍 We think of shapes as, for example, a triangle or an ellipse with a surface area. There is the area of the shape and the exterior, which is everything outside the shape. But there are actually three “areas” to watch out for. There is the inside area, the exterior outside, and the boundary. Think of the boundary as the skin around your body, separating your insides from the outside. A shape can have more than one boundary; for example, a doughnut or ring shape has a round boundary on the outside, as well as a round boundary on the inside for the hole. Look at Figure 11.9 in the middle left to see an example of such a shape. 쐍 The ST_Touches predicate is where another shape overlaps with one or more points of the boundary but doesn’t overlap with any points inside. Hence, returning to the analogy of thinking of the boundary as your skin, this is what happens when someone touches you. 471 472 Chapter 11 Spatial, Graph, and Series Data Modeling 쐍 ST_Crosses and ST_Intersects is when a line or polygon goes over the boundary into the shape. Typically, this is used in geospatial applications to see where roads or railway lines cross rivers or enter another country. 쐍 There is a subtle difference between ST_Contains and ST_Covers in that ST_Contains means a smaller shape is fully enclosed inside the area of the main shape. But what happens when a point is on the boundary of the main shape? In that case, the ST_Contains predicate returns false as there is a difference between the boundary and the area. ST_Covers on the other hand means there are no points of the smaller shape on the exterior of the main shape. When a point is on the boundary of the main shape, the ST_Covers predicate returns true as the point isn’t outside the main shape. (ST_Covers isn’t part of the OGC standard, but it’s available in SAP HANA.) ST_Covers extends the ST_Contains predicate to be more “tolerant" for the boundaries. ST_Covers and the symmetrical ST_CoveredBy methods support round-earth SRSs, which isn’t the case for ST_Contains and ST_Within, which work only on flat SRSs. For a different business case, you could use the GEO_SHOP table on both sides of the join in Figure 11.5 and use a Within Distance predicate in the join. You can then find all the shops within a certain distance of a selected shop. Banks typically use something similar to show you the nearest ATM or bank branches from your current location. Figure 11.7 shows an SAP demo application that looks at gas lines in a city (dark lines along some of the streets) and shows when they get within a certain distance of buildings where they could become a problem. With 10 million pipeline assets across the country, SAP HANA accelerated the application from 3.5 hours to 2.5 seconds. With response times of 2.5 seconds, such an application becomes real time. This screen is zoomed into a specific area, which is why it shows “only” 642 pipes and 1,455 buildings. The response time was less than half a second. This is then combined with business processes by linking the results back to asset management and work management systems. Spatial Expressions Apart from spatial joins, we can also use spatial data in calculated columns and filter expressions. Figure 11.8 shows how you can create a spatial expression in a calculated column. In the Functions tab, you select the Spatial Predicates you want to work with. In this case, we used the Contains predicate that we used in the spatial join. Key Concepts Refresher Chapter 11 Figure 11.7 SAP Demo Application on OpenStreetMap Using a Spatial Distance Join Figure 11.8 Spatial Expressions in Calculation Views 473 474 Chapter 11 Spatial, Graph, and Series Data Modeling More than 80 Spatial Functions are available. SPS 04 More than 100 spatial functions are available in SAP HANA 2.0 SPS 04. Figure 11.9 shows some the spatial functions and their results. We start with a triangle and ellipse on the top left side, and apply four different functions to these shapes on the top line. Original Shapes ST_Union() ST_Intersection() ST_Difference() ST_SymDifference() Boundary and Area Envelope ST_EnvelopeAggr() Convex Hull ST_ConvexHullAggr() ST_IntersectionAggr() ST_UnionAggr() of “dog parts” Result is empty Result is “dog” Result is rectangle that contains all points Result is polygon that contains all points Figure 11.9 Spatial Functions and their Results In the second line, we see an example of the boundary and area terms we’ve already discussed. The second and third items on the second row show two functions that are used quite often. When we have a bunch of points, we often want to find a shape that encloses all these points. The easiest is by creating a rectangle, called an Envelope. This is easy because we just have to look for the minimum and maximum values of all the x and y values in the points. The other one is where we want a smaller polygon to enclose all the points. For this we create a Convex Hull. When you meet a function with Aggr (for aggregation) in the name, it can be applied to a collection of shapes. The ST_UnionAggr of all “dog shapes” will result in a “dog.” Key Concepts Refresher Chapter 11 Geocoding in an SAP HANA Flowgraph Geocoding is the process of converting address information into geographic locations, such as latitude and longitude. Because business data is associated with address information without location information, geocoding is important to enable geographic analysis of such address data. In many businesses, huge value is unlocked by using this additional type of analysis. Reverse geocoding does exactly the reverse, as the name implies. It takes geospatial data such as latitude and longitude, tells you what company is located there, and returns the address data. This functionality needs a geocoding index. This is very similar to the full-text index we already discussed in the previous chapter, except that we work with address and geospatial data instead of text data. SAP HANA smart data quality, that we discussed in Chapter 9, provides native geocoding and reverse geocoding capabilities in SAP Web IDE for SAP HANA. This can be accessed with a Geocode node in a graphical flowgraph model. In Figure 11.10, we combined the Geocode node in an SAP HANA smart data quality workflow that also includes data cleansing. Figure 11.10 Flowgraph Model with a Geocode Node 475 476 Chapter 11 Spatial, Graph, and Series Data Modeling Spatial Clustering Let’s say you need to determine where to build a new clinic or hospital in an area. You can look at a map to see where other clinics and hospitals are and make sure you’re far enough away from the existing ones. But you’ll most likely end up in an area where there are very few people, and building the clinic or hospital will just be a waste of money. You also need to look at the population densities, and the general demographics of the areas. The same goes if you want to open a new branch for your business. You can use spatial clustering to group sets of points together in clusters that meet your criteria. This is similar to the Predictive Analysis Library (PAL) algorithms, such as k-means, that we discussed in the previous chapter, except that you now work with data that also has geospatial properties. In SAP HANA SPS 03, you can only use spatial clustering for two-dimensional points in a flat-earth SRS. Figure 11.11 shows the spatial clustering algorithms available in SAP HANA, as described here: 쐍 The first clustering algorithm (top left) is very easy. You merely throw a grid across the points and count how many points are inside each cell. In the last column, we show the counts for each cell as an example. Figure 11.12 shows what this looks like on a real map. Here we have a database of POIs in Europe. The advantage of the grid clustering algorithm is that it’s easy and fast, quickly providing us an idea of the distribution of the points. SPS 04 The second clustering algorithm (top right) is very similar to the grid clustering algorithm. It just uses hexagons instead of rectangles. In certain cases, it can give more visually appealing results. 쐍 The k-means clustering algorithm works well when the points tend to be in spherical clusters. It’s centroid-based and gives better insights than simple grid or hexagonal clustering. However, this algorithm is more complex and takes longer to process. Figure 11.13 shows the same map and data set as was used in Figure 11.12 but using the k-means clustering algorithm. Key Concepts Refresher Chapter 11 쐍 The DBSCAN clustering algorithm works best with nonspherical clusters. It’s density-based, meaning it requires a certain number of data points in the neighborhood of a point. You can set the radius of the neighborhood. 5 2 2 4 3 1 0 0 Rectangular Grid Hexagonal Grid (from SPS 04) K-means DBSCAN Figure 11.11 Spatial Clustering Algorithms Figure 11.12 shows the grid clustering algorithm used on a map of Europe with a database of POIs, such as restaurants, museums, hospitals, schools, ATMs, and public transport. You can choose which type of POI you want, zoom into the area you plan to visit, and choose the spatial clustering algorithm to analyze the data for that area and POI category. Figure 11.13 shows the same map but using the k-means clustering algorithm. 477 478 Chapter 11 Spatial, Graph, and Series Data Modeling Figure 11.12 SAP Demo Application Using the Grid Spatial Clustering Algorithm Figure 11.13 Results in an SAP Demo Application with the K-means Spatial Clustering Algorithm Spatial Functions Using SQLScript You saw an example of a spatial function used with SQLScript earlier in the “Spatial Joins and Predicates” section, where we asked for the area of a polygon to be calculated, for example, geometry1.ST_Area(). You can ask if a shop’s location is in a suburb as follows: Key Concepts Refresher Chapter 11 SUBURB_SHAPE.ST_CONTAINS(SHOP_LOCATION1) You can also see how far two shops are from each other as follows: SHOP_LOCATION1.ST_WITHINDISTANCE(SHOP_LOCATION2) You can use a spatial join with the ST_Within predicate. When ST_Within returns 1, it means shape b is inside shape a, as follows: SELECT b.id as b_within, a.name, a.shape, b.point FROM A, B WHERE b.point.ST_ Within( a.shape ) = 1; If you want to find all the shapes that aren’t within shape a, you just use a spatial join where ST_Within returns 0, as follows: SELECT b.id as b_not_within, a.name, a.shape, b.point FROM A, B WHERE b.point.ST_Within( a.shape ) = 0; You can see that the spatial functions are used like methods in object orientation, using a dot-notation. Note You can find all the spatial functions and their descriptions in the SAP HANA Spatial Reference guide at http://s-prs.co/v487620. A software development kit (SDK) is also available that you can use to create your own custom geospatial algorithms. Importing and Exporting Spatial Data In SAP HANA, you can import and export spatial data in the Well-Known Text (WKT) and the Well-Known Binary (WKB) formats. These formats are part of the OGC standards. You can also import and export the extended versions of these formats, which are used by PostGIS: Extended Well-Known Text (EWKT) and Extended Well-Known Binary (EWKB). In addition, SAP HANA can import all ESRI shapefiles, except the MultiPatch shape type. SAP HANA can export spatial data in the GeoJSON and Scalable Vector Graphic (SVG) file formats. The SVG file format is maintained by World Wide Web Consortium (W3C) and is supported by all modern browsers. 479 480 Chapter 11 Spatial, Graph, and Series Data Modeling Sometimes you get public spatial data in comma-separated values (CSV) files. As we discussed before, you can import these using CDS .hdbtabledata files or using SAP Web IDE for SAP HANA in SAP HANA 2.0 SPS 04. SAP HANA Spatial Services In 2018, SAP released a set of spatial services on the cloud. These services include weather information, earth observation, and satellite imagery, for example, from the European Space Agency (ESA). You can subscribe to these SAP HANA spatial services and call them using application programming interfaces (APIs). You can find more information at http://s-prs.co/v487603. You can use this set of services, for example, in agricultural scenarios to look at satellite imagery of biomass. This gives you an excellent overview of how fast crops are growing or where there are problem areas caused by insects or plant diseases. Crop insurers can use this information to assess their risk, farmers can use it to calculate crop forecasts, and governments can use it to plan food security. SPS 04 SAP HANA spatial services is now available inside GeoTools. GeoTools is a free opensource GIS toolkit that is used by many open-source and commercial software projects. GeoServer is one of the prominent server-side geoprocessing and data access software tools that uses GeoTools. These tools can now all access and serve geodata stored in SAP HANA. Note To learn more about spatial processing in SAP HANA, look at the examples in the SAP HANA Interactive Education (SHINE) demo content. Here, you can find many examples of spatial processes; for example, screens in which you can select shops on a map on the left of the screen and see their sales details in the area on the right. You can find more information about spatial data in the SAP HANA Spatial Reference Guide. There is also an openSAP course called “Spatial Analysis with SAP HANA Platform” at http://s-prs.co/v487628 from mid-2017. There is also an SAP HANA Academy playlist for spatial analysis on YouTube at http://s-prs.co/v487604. Graph Modeling Next, we’ll discuss graph processing. In this case, we’re not talking about reporting graphs, which can be thought of as charts. Instead, think about social media Key Concepts Refresher Chapter 11 networks, such as LinkedIn and Facebook, where people are linked to many other people. If we draw the relationships between hundreds of people, it would result in a complex spider web-type of diagram. This is referred to as a graph. Graphs are therefore used to model the relationships and attributes of highly networked entities. You can use graph processing together with all the other types of SAP HANA processing, such as textual, spatial, and predictive. This way, you can easily build very complex models that would be very difficult with most other development platforms. Figure 11.14 shows an SAP demo application for planning music festivals. The demo was created to show what’s possible with SAP HANA. In this example, a large number of music fans are spread out over an area. Figure 11.14 SAP Demo Application Using Spatial and Graph Processing to Plan Music Festivals The fan base is shown in a heat map in the middle-top analysis of the area. If you only plan one music festival, not all the fans can come to it due to travel distance, time, and costs. You can see the roads on the map on the right side. If you have a hundred music festivals, it will take too much of your time, cost you too much, and audiences wont’ be large. So, what is the optimal number of music festivals that you should organize, and where should you have these festivals? The main screen on the left shows that SAP HANA does the calculations in a few seconds and 481 482 Chapter 11 Spatial, Graph, and Series Data Modeling suggests 10 festivals at the places shown. It also generates a heat map (middle-bottom map) of the traffic patterns that your music festivals will create. In this demo, we have spatial data with the locations of where the music fans stay and where we’ll have the festivals. We also have graph data in the form of the roads, linking various spatial locations together. We also have to combine this with business data, such as how far are fans prepared to travel, how much should the tickets be, what does a music festival cost the organizers, and how much time is involved with organizing and running such events. To understand the process of how to create graph models, we’ll start with a small example of a group of people. From a sketch, we’ll see how to create this in SAP HANA. After we’ve created the graph, we can view and analyze it using various graph algorithms. With this background, it will be easy to implement it into a calculation view. We’ll close off the topic by looking at two graph query languages available in SAP HANA and how graphs are different from hierarchies. Creating Graphs in SAP HANA Graph processing allows you to discover patterns in real time on large volumes of data, which is something that isn’t possible for humans. We can query the graph to find answers to questions such as the following: 쐍 What are the chances of two specific people going on a date? 쐍 Who is the best person to connect with on LinkedIn to get contact with a buyer? 쐍 In this region, what are the best products to sell to people according to their likes and dislikes on Facebook? You could develop such queries using standard SQL, but it would be very complex, and the performance would probably be really bad. A lot of specialized graph databases have appeared over the past five years. The disadvantage of such specialized databases is that it becomes difficult to combine it with other data and business information to solve wider problems and create more powerful solutions. SAP HANA has, in addition to all its other features, a special graph-processing engine to handle graph queries elegantly and fast. In SAP HANA 2.0, you can query the graph data using graphical calculation views, the Cypher Query Language (Cypher), or GraphScript. LinkedIn makes suggestions of who you can connect with next based on the people you’re currently connected with and who they are connected with. They need some form of graph processing to do this. It’s quite complex because people work Key Concepts Refresher Chapter 11 for companies, some people are looking for new jobs, other people want to show their thought leadership, some want to advertise their products and services, and yet others want to recruit someone for a position at another company. People have different roles and needs, and this shapes the different types of relationships. For us to understand graph processing, we’ll use a much smaller example. Figure 11.15 shows an example of a few colleagues and their various relationships. The colleagues are obviously also working for SAP. There is an organizational structure in the company with reporting lines. Most of the colleagues are married. Even in this small group, we already see at least three different types of relationships, and we have two types of entities: a company and people. With this small group, we can draw the relationships in a diagram. However, when we look at hundreds of millions of relationships—such as those on LinkedIn— drawing such a diagram obviously becomes impractical. Michelle SAP Hans-Joachim Lerato Tshepo Sikhumbuzo Phillip Kele Works for Married to Reports to Person Company Figure 11.15 Graph Diagram Showing Relationships Let’s assume we wanted to analyze this graph. We first need to put the data into the SAP HANA system. For this, you’ll need a table with the people and the company. These are the nodes in the graph, as illustrated in Figure 11.15. These nodes are called vertices and are stored in a normal SAP HANA column table. Each vertex can have multiple attributes, for example, whether this node is a person or a company, whether the person is male or female, the age of the person, and so on. These attributes are simply additional fields in the column table. 483 484 Chapter 11 Spatial, Graph, and Series Data Modeling In another table, we store the relationships, such as the existing “friends” relationships between these people. These relationships are called edges and are also stored in column tables. You can have different types of relationships between people. In Figure 11.15, we have “married to,” which is a bidirectional relationship, and “works for,” which only has a single direction. The direction of the “works for” relationship is “outgoing” for the employee of the company and “incoming” for the company. The nodes are the people and the company, and the lines are the relationships. The lines can have flow direction, meaning we can run traces, such as the reporting lines in a company. From this, we can deduce the organizational structure. In this case, it would be similar to a hierarchy. But these lines can also flow in “circles,” or in both directions, which would not be allowed in hierarchies. The direction of a relationship is determined by the “source” and “target” attributes. For example, the “works for” relationship has the employee’s name in the “source” field, and the company’s name in the “target” field. While this sounds very specific, just remember that many tables in a normal business system fit this pattern. In commercial systems, when you ship a product to someone, you have pickup and delivery addresses. When you fly, you have departure and arrival airports. Any table with “from” and “to” fields is usable. Even normal SAP HANA views with financial data according to time periods—with start and end dates— can be used. Even if a table doesn’t exactly have this format, you can create a view that does and use the view for the graph processing. Tip It’s important to realize that you can use normal tables and information views for graph processing in SAP HANA with your current business data. You don’t need a separate specialized database or tables for graph processing when using SAP HANA. You can use the attributes in vertices and edges when querying the graph. Remember that these attributes are the additional fields in the database tables or views. The important piece that you need to enable the graph processing in SAP HANA is a graph workspace. A graph workspace is a simple file that links the vertex table to the edge table. Note The graph workspace simply contains the definition of the graph model and how the vertices and edge tables relate to each other. Key Concepts Refresher Chapter 11 Building the Graph Tables and Workspace Now that you have an idea of how SAP HANA enables graph processing, let’s create the graph for Figure 11.15 and then use SAP HANA to analyze it. This simple example will make it easy for you to understand the entire process. We already know that we need two tables: a vertex table for the nodes and an edge table for the relationships. In Figure 11.15 we have two types of nodes: people and the company. All the nodes have a name. Our vertex table therefore just needs two fields: the NAME and the TYPE. This table is defined in the CDS table definition in Listing 11.2 and is called MEMBERS. In our case, the names are unique, so we can use the NAME field as the key field. Otherwise, we would have had to add a key field with unique values for each record. The RELATIONSHIPS table needs SOURCE and TARGET fields. Both fields simply use the NAME (key field) of the person or company from the MEMBERS vertex table. We also need to specify the TYPE of relationship, for example, “works for.” And we add a KEY field. namespace graph.graphteam; context team{ entity MEMBERS { key NAME: hana.VARCHAR(100) not null; TYPE: hana.VARCHAR(100); }; entity RELATIONSHIPS { key KEY: Integer not null; SOURCE: hana.VARCHAR(100) not null; TARGET: hana.VARCHAR(100) not null; RELATIONSHIP: hana.VARCHAR(100); }; }; Listing 11.2 CDS Code to Create Tables for Vertices (Members) and Edges (Relationships) After you’ve created your .hdbcds file, build it to create the two tables. The next file we need is an .hdbworkspace file to create the graph workspace. This is a database artifact file. Listing 11.3 shows how to define the workspace file. 485 486 Chapter 11 Spatial, Graph, and Series Data Modeling You specify which table or view is your EDGE TABLE and then specify which fields in it are the SOURCE, TARGET, and KEY fields. Then you specify which table is your VERTEX TABLE and what field in is its KEY field. GRAPH WORKSPACE "graph.graphteam::teamgraph" EDGE TABLE "graph.graphteam::team.RELATIONSHIPS" SOURCE COLUMN "SOURCE" TARGET COLUMN "TARGET" KEY COLUMN "KEY" VERTEX TABLE "graph.graphteam::team.MEMBERS" KEY COLUMN "NAME" Listing 11.3 team.hdbworkspace File to Create the Workspace When you’ve created this simple file, build the file to create your graph workspace. Open the two tables in SAP HANA database explorer, and add the data from Figure 11.15 in the two tables. Figure 11.16 shows what the result looks like. Figure 11.16 Inserting the Data from Figure 11.15 into the Vertex and Edge Tables Key Concepts Refresher Chapter 11 You’ve now built your first graph in SAP HANA. Next, we’ll view and analyze your graph. Viewing and Analyzing Graphs You used the SAP HANA database explorer to insert data into the vertex and edge tables. On the top left area, look for the Graph Workspaces, as shown in Figure 11.17. This shows the list of graph workspaces in your SAP HANA Deployment Infrastructure (HDI) container. Right-click on your workspace, and select the View Graph option. Figure 11.17 Using the SAP HANA Database Explorer to View Graphs Your graph will now show in the area on the right in SAP Web IDE for SAP HANA. You can rearrange the nodes to appear where you want them. When you compare the graph shown in Figure 11.18, you’ll see that this is exactly what we had in Figure 11.15! To get the names on the edges (relationships), use the Options icon in the top-right corner, and set the Edge label. Tip The SAP HANA Graph Reference Guide and some blogs refer to an SAP HANA Graph Viewer that you install via SAP HANA Studio. You don’t need that because this graph viewer is built in to the SAP HANA database explorer in SAP Web IDE for SAP HANA. 487 488 Chapter 11 Spatial, Graph, and Series Data Modeling Figure 11.18 Graph Created in SAP HANA (Compare to Figure 11.15) You can now start analyzing the graph. Click on the < > icon in the top-right corner to open the dialog box shown in Figure 11.19. You can specify what type of algorithm you want to use when analyzing graphs. The available algorithms are Neighborhood, Shortest Path, and Strongly Connected Components. In SAP HANA 2.0, we can also specify a “pattern” of vertices and edges, together with some filters and conditions. We can then use pattern matching to find the parts of the graphs that match your pattern. You do this by writing some Cypher code. The first algorithm is Neighborhood. You specify from which Vertex you want to start and in which Direction the edges (relationships) must point. You can choose Outgoing, Incoming, or Any. Finally, you choose the Depth. A minimum depth of zero includes the starting vertex. A minimum depth of one will exclude the starting vertex and start one relationship level away from this vertex. The maximum depth specifies how deep you want to search through the graph. Key Concepts Refresher Chapter 11 In the first (top) example in Figure 11.19, you see the Outgoing direction for Michelle. There aren’t many outgoing edges, so the result from the analysis is small. When you change the Direction to Incoming, the result is much larger. You can play around by also changing the Direction to Any and changing the value of the Depth. This simple graph only has a Max Depth of 3. Figure 11.19 Using the Neighborhood Graph Algorithm 489 490 Chapter 11 Spatial, Graph, and Series Data Modeling Tip This graph algorithm only works in the graph viewer when the data type of the Vertex key column is a string. The Shortest Path algorithm finds the shortest path between two vertices. Remember that some edges are like one-way streets. The Direction therefore has a significant influence on your results. For example, if we change the direction in Figure 11.20 to Outgoing, we don’t get any result. You can use this algorithm, for example, to plan your flights. Most people prefer a direct flight without stopovers. However, if you specify a Weight, like looking at the cost of the trip, a flight with a stopover might be cheaper. You do this here by selecting the cost field in your table for the Weight Column. Figure 11.20 Using the Shortest Path Graph Algorithm The Strongly Connected Components algorithm finds the vertices that have the most edges between them and then groups the vertices together on how tightly they are linked. In Figure 11.21, SAP HANA created five groups. Three of these are quite obvious from such a small graph, as these are the married couples. The company is a “group” on its own. It shows each group in a different color. The last option in the graph viewer is Cypher. This programming language is the industry standard for graph processing. SAP HANA supports a subset of Cypher for querying graph data. SAP HANA also has another language from graph processing called GraphScript. We’ll compare Cypher to GraphScript later in this section. Key Concepts Refresher Chapter 11 Figure 11.21 Using the Strongly Connected Components Graph Algorithm We won’t teach Cypher in this book, but we’ll show you an example so you can get a feel for it. Listing 11.4 shows the Cypher code for showing the incoming edges to the SAP vertex. In this case, we could have achieved the same by using the Neighborhood graph algorithm, specifying “SAP” as the Start Vertex, and choosing Incoming as the Direction. But, of course, you’ll use Cypher for cases where the other algorithms don’t provide what you’re looking for. MATCH (a)-[e]->(b) WHERE b.NAME = 'SAP' RETURN a.NAME AS name, e.RELATIONSHIP as relation Listing 11.4 Cypher Code to Get All Relationships Going to “SAP” Figure 11.22 shows the result of our Cypher code. Now that you’ve seen and played with the graph viewer, working with the Graph node in calculation views will be easy. 491 492 Chapter 11 Spatial, Graph, and Series Data Modeling Figure 11.22 Using Pattern Matching Cypher Code on a Graph Graph Nodes in Calculation Views We discussed the Graph node in Chapter 6. You already saw that the Graph node must always be the bottom (leaf) node in a calculation view, and you have to ensure that you keep referential integrity on your data when using graph modeling. The Graph node, as shown in Figure 11.23, uses the graph workspace as its data source. After you’ve added your graph workspace in the Graph node, you’ll see the working area as shown on the right side of Figure 11.23. The dropdown menu on the top-right side shows the same algorithms that you’ve used in the graph viewer. The form fields are exactly the same as those in the graph viewer. Figure 11.24 shows the available fields for the Neighborhood and Shortest Path graph algorithms. The Graph node has one more option with the Shortest Path graph algorithm. You can also choose to use Shortest Path graph algorithms from a specified Start Vertex to many other vertices using the One To Many option. This isn’t possible in the graph viewer as it would have to show multiple graphs graphically on the screen. The Graph node on the other side returns results in a table, and it’s therefore possible to include this option here. You limit the number of vertices on the One To Many option by specifying the Depth fields. Key Concepts Refresher Chapter 11 Figure 11.23 Graph Node in a Calculation View Figure 11.24 Using the Various Graph Algorithms 493 494 Chapter 11 Spatial, Graph, and Series Data Modeling The Strongly Connected Components graph algorithm doesn’t have any properties or form fields. After you’ve specified some properties for your Graph node, you need to Save and Build your calculation view. You can then do a Data Preview of the results. Figure 11.25 shows the Properties area for the Strongly Connected Components graph algorithm, as well as the result after a build. You’ll notice that the results are the same as what we got in the graph viewer earlier in Figure 11.21. Figure 11.25 Strongly Connected Components Graph Algorithm and Results Tip It makes sense to use the graph viewer when exploring the possibilities. After you know what you’re looking for, you can create the Graph node. The Graph node isn’t as convenient for exploring possibilities because you have to do a Save, Build, and Data Preview for each round. After you know what you need, it’s much better at using the results together with other business data in a calculation view. Lastly, we can use the same Cypher code from Listing 11.4 in the Cypher editor of the Graph node, as shown in Figure 11.26. The results, in table format instead of graphical, are the same as those shown earlier in Figure 11.22. Key Concepts Refresher Chapter 11 Figure 11.26 Pattern Matching Cypher Code and Results We’ve mentioned that SAP HANA supports both Cypher and GraphScript for querying graph data. Let’s now look at when we use each one. Cypher and GraphScript In SAP HANA 1.0, we used Weakly Structured Information Processing and Exploration (WIPE). WIPE is a declarative language, similar to SQL, that allows easy graphbased queries to be created. As we focus on SAP HANA 2.0, we’ll ignore WIPE. In SAP HANA 2.0, you can query graph data using graphical calculation views, Cypher, or GraphScript. We’ve just finished looking at the calculation views. When do you use Cypher, and when do you use GraphScript? Cypher is used for pattern matching and graph traversals. It focuses on the what. For example, you’ll use this to zoom in to one person on LinkedIn. You’ll see what they studied, where they currently work, their previous jobs, how many languages they can speak, what courses they attended, and who they are connected to. The interest is on one vertex and its surrounding vertices. GraphScript, on the other hand, focuses on the how. It’s used for graph analysis, where you do topological analysis of the entire graph. This is used, for example, for 495 496 Chapter 11 Spatial, Graph, and Series Data Modeling planning routes using the shortest path, finding connected components, and determining page rank. Table 11.1 lists some differences that will help you decide when to use which language. Cypher GraphScript Focuses on the what Focuses on the how Pattern matching, traversal near a specific start vertex Graph analysis (of entire graph) Online transactional processing (OLTP)style: short, read and write, high selectivity Online analytical processing (OLAP)-style: long and expensive, mostly read, low selectivity Declarative, query language Imperative, scripting language Table 11.1 Comparing Cypher and GraphScript You’ve already seen the places in the graph viewer and in Graph nodes where you can use Cypher. You create a GraphScript procedure the same way that you create a SQLScript procedure—by creating an .hdbprocedure file. (see Chapter 8 for details.) Instead of specifying the language as SQLSCRIPT, you set it to GRAPH. Listing 11.5 shows what a blank GraphScript procedure looks like. Notice that you specify the READS SQL DATA AS clause like you did with read-only SQLScript procedures. This is obvious from Table 11.1 when you see that GraphScript is more OLAP style. PROCEDURE "procedurename"(OUT variable INT) LANGUAGE GRAPH READS SQL DATA AS BEGIN --Write your GraphScript code here END Listing 11.5 Creating a GraphScript Procedure in an .hdbprocedure File As with Cypher coding, we won’t teach you GraphScript in this book. (For more details, read the SAP HANA Graph Reference Guide.) We do, however, give you some sample code in Listing 11.6 to give you a feel for the GraphScript language. The BFS mentioned in the following refers to breadth-first search (BFS). Graph g = Graph("calcStock","GRAPH"); ALTER g ADD TEMPORARY VERTEX ATTRIBUTE (Double myStock = 0.0); Multiset<Vertex> myShare = Multiset<Vertex>(:g); Vertex startV = Vertex(:g, :startID); Key Concepts Refresher Chapter 11 TRAVERSE BFS :g FROM :startV ON VISIT EDGE (Edge e, Bigint levels) { Vertex s = SOURCE(:e); Vertex v = TARGET(:e); if (:levels == 0L) { v.myStock = :e.myStock; } else { v.myStock = :v.myStock + :s.myStock; } if (:v.myStock >= 25.0) { myShare = :myShare UNION {:v}; } }; Listing 11.6 Sample GraphScript Code SPS 04 In SAP HANA 2.0 SPS 04, the SAP HANA graph engine can now be used directly in SQL subqueries using the OPENCYPHER_TABLE function. The information is returned in JSON format. GraphScript has been expanded in SPS 04 and includes new algorithms. You can now also use virtual tables with GraphScript, which extends the reach of GraphScript to many other databases and federated data sources. (See the SAP HANA Smart Data Access section in Chapter 9.) You can now also use GraphScript for processing ad hoc graphs on data structures without having to first create a graph workspace. Graphs versus Hierarchies In Chapter 7, we spent a lot of time explaining the hierarchy functionality in SAP HANA calculation views. In many cases, graphs and hierarchies can look very similar. In Table 11.2 is a short summary to help you choose between hierarchies and graphs. Hierarchy Graph Uses standard SQL, ad hoc hierarchies, and SQL views Uses domain-specific languages such as GraphScript and Cypher Data often in a single table, including a single edge type (e.g., “parent”) Needs two tables for vertices and edges with a clear distinction between them Uses a graph workspace to link them together Few predefined traversal algorithms in SQL Powerful pattern matching queries Optimized for tree topology—strictly top-down Generic graph topology—can have “cyclical” occurrences Table 11.2 Differences between Hierarchies and Graphs 497 498 Chapter 11 Spatial, Graph, and Series Data Modeling Note You can learn more about SAP HANA graph processing in the SAP HANA Graph Reference Guide. There is an openSAP course from mid-2018 on graph processing called “Analyzing Connected Data with SAP HANA Graph” at http://s-prs.co/v487629. And, of course, there are videos on the topic on YouTube in the SAP HANA Academy channel. Series Data Modeling When searching for information on series data, you’ll notice that it’s normally referred to as “time series” data. SAP HANA uses “series data” because it can also include any data in a series form, including events and transactions. Time series data is often associated with the Internet of Things (IoT), where sensors capture and send out monitoring or event data. However, series data can also include data such as database log files. In the database world, interest in the topic of time series databases has seen the most growth of all types of databases over the past two years. At http://s-prs.co/ v487630, you can scroll down the page to see the Trend of the last 24 months. Time series databases have clearly seen the fastest growth in interest. Many new databases were released into the market that focus exclusively on time series data. Many of them basically store the data using a label (or “tag”) with key–value pairs. This is typical for many NoSQL databases. We’ll just provide an overview on the topic. You can combine series data with aggregation, analysis, and predictive modeling to create valuable insights into real-world operations. Series Data in Everyday Life Let’s say you want to keep track of your weight. Every morning after waking up, you joyfully jump on the bathroom scale and record your weight. After a while, the data will look something like that shown in Table 11.3. You’ve created a time series. The series data is a sequence of regular measurements taken over a period of time. Key Concepts Refresher Chapter 11 Person Date1 Weight You 1 January 176 lbs You 2 January 175 lbs You 3 January 174 lbs You 4 January 173.3 lbs You … … You 14 February 159 lbs You 15 February 160 lbs Table 11.3 Tracking Your Weight There are only three columns, but you’ll have hundreds of records in just a few months, resulting in a very long, thin weight table over time. SAP HANA is great for such tables because it’s column-based by default and also uses lossless data compression. For time series, SAP HANA can use additional compression methods to reduce the storage requirements even further. Series data uses the following various columns: 쐍 Profile column This is the sensor or object that you’re collecting data from. In this case, it’s the name of the person. You can also add other members of the family to the table. 쐍 Series column This column is the time interval for the data collection. Here you’re recording your weight daily. 쐍 Value columns The value in this table is your weight on that specific day. In SAP HANA, you can have multiple value columns; for example, you can also record how many hours you slept that night or your waist circumference. Let’s look at how SAP HANA can be used with series data. If you think about it, most data is series data. In a database, transactions are a series of data points. Many times, these transactions are recorded with a time stamp (date and time) field. In a database, all changes to the data are recorded in database log files, where each change is recorded with a time stamp. Log files are therefore series data. Log files record the changes to our data and are everywhere. 499 500 Chapter 11 Spatial, Graph, and Series Data Modeling Tip Because we’re working with time stamps, it’s recommended to store them using a single standard, for example, UTC or UTC with a constant offset. Changes to daylight savings or differences in time zones between countries can introduce complexities when combining series data with regular business data. As series data looks similar to normal data, you could use normal tables for series data. But sometimes the workload for series data isn’t exactly the same as a normal transactional workload. The volumes of data are normally also much larger. Hence, we need some special-purpose optimized features for managing series data. SAP HANA provides such features. Examples of Series Data Before diving into the details, let’s look at some use cases for series data and why companies are so interested in this data. In the previous Real-World Scenario box, you already saw one such use case. With the rise of sensors and IoT, the amount of generated data is increasing dramatically. For example, many cities around the world have installed smart meters at each home. In cities like Johannesburg, South Africa, you could have 1 million smart meters for electricity or water. These smart meters generate a reading every 15 minutes. Each meter therefore creates 2,920 readings per year. For 1 million meters, there are 2.9 billion readings per year! With such volumes of data, efficient storage is vital. The reason that cities are interested in such data is that they can analyze it to see usage patterns and make predictions for planning infrastructure and providing services for the population. With readings every 15 minutes, they can deduce what devices were used in each 15-minute interval. Assume a stove uses 500 W in 15 minutes, a hot water heater uses 250 W in 15 minutes, a TV uses 25 W, and an LED lightbulb uses 2 W. If they get a reading of 29 W in the past 15 minutes, they can deduce that this home probably had a TV and two lights on. If they get a reading of 512 W, the people were probably cooking and had a few lights on. The city then aggregates this data to see when people wake up, shower, and have breakfast. They use this to predict traffic patterns and electricity demand, see the effects of rain and cold weather, and so on. Another use case is seen in vehicles. All modern cars have a slot for an On-Board Diagnostics (OBD2) device. You can buy a fairly cheap device and plug it into your Key Concepts Refresher Chapter 11 car to see the data from your car, for example, speed and fuel consumption, as well as diagnostic and warning messages. As an exercise, you can build a tool in SAP HANA to analyze your driving, and use this information to decrease your fuel consumption. With the move to cloud and microservices architectures, the field of DevOps is growing daily. DevOps is a software development methodology that combines software development (Dev) with operations (Ops). DevOps aims to automate many of the processes in the software development cycle, reducing the time to market for new services. To ensure service uptime and optimal performance, you need to constantly monitor these microservices. Analyzing log files has become so important that new time series databases, such as Prometheus, have sprung up to address this need. Other use cases are daily currency conversion rates, share prices, and rainfall per month. As you can see, series data allows you to collect large volumes of data, analyze the way this data behaves over time, and use the information for patterns, trends, and forecasts. The fields of engineering, technology, electronics, development, operations, statistics, signal processing, and machine learning are coming together when we work with series data. Note It’s important to realize that data in the real world isn’t always perfect or complete. You’ll always get some missing readings or some abnormal readings due to network errors, faulty sensors, buffer overflows, bad code, or myriad other reasons. Internet of Things, Edge Processing, and Equidistant Series Data We started looking at IoT, smart meters, and OBD2 devices in the previous section. To better understand IoT, we can also look at a common IoT scenario—exercise and smart watches. As you move, run, or exercise, the watch saves your GPS location, the elevation, and your heart rate every second. So how does the watch know what your heart rate is at any moment of time? The electronics of the watch actually detect the individual heartbeats and measure the time between the heartbeats. The top graph in Figure 11.27 illustrates resting heartbeats. The time between heartbeats, shown as t1, is relatively long—just over 1 second. The bottom graph in Figure 11.27 illustrates heartbeats during exercise, and the time (t2) is much shorter. 501 502 Chapter 11 t1 t2 Spatial, Graph, and Series Data Modeling Heart rate at 50 bpm (beats per minute) Heart rate at 150 bpm Figure 11.27 Heart Rates Measured by a Smart Watch The watch now counts the number of heartbeats in a time period to calculate your heart rate. If it measures 20 heartbeats in 15 seconds, it calculates that this would result in 80 heartbeats per minute, and thus your heart rate is 80 bpm (beats per minute). The important thing to realize is that almost all IoT devices do some calculations on the data. This is called IoT edge processing. The watch processes your heartbeat data, averages it over the past few seconds, calculates your heart rate, and displays this average to you every second. The watch doesn’t store the “raw” data, which is, in this case, the time at which each heartbeat occurs. Rather, the watch stores the calculated value of your heart rate each second. The series data that the watch provides every second is called equidistant. Equidistant is when the time increments between successive records in the series data are a constant—in this case, 1 second. When you look at the heartbeat data that the watch actually measured (refer to Figure 11.27), you’ll see that t1 and t2 aren’t the same. Using edge processing, the watch changed the nonequidistant heartbeat series data into equidistant heart rate series data. The heartbeat data is showing the actual events, that is, the exact time when each heartbeat occurred. The heart rate data, on the other side, shows aggregate summaries at regular intervals of 1 second. From this, you see that recording actual event data provides nonequidistant series data. Aggregated data normally provides equidistant series data. For example, actual financial transactions (events) are nonequidistant, whereas daily or monthly balances (aggregates) are equidistant. Key Concepts Refresher Chapter 11 It’s not just edge processing that performs aggregation of series data. You can also aggregate series data in SAP HANA, as discussed later in this appendix. SAP HANA can also convert nonequidistant series data into equidistant data. Using series data functions, you can specify which rounding rules you want to apply. For example, you can round 8:30 down to 8:00 or up to 9:00. You might be concerned that you’re losing some data when IoT devices are using aggregations. Most of the time, this isn’t a concern because you normally don’t want that much data. When exercising, you don’t want to see each heartbeat. The aggregated heart rate information is exactly what you need at that point. If, however, you’re a medical practitioner, you might be interested in the actual heartbeats, for example, to detect irregular heartbeats or atrial fibrillation. In manufacturing plants, “historian” databases are used to store the IoT series data. These databases can then be used later to do predictive maintenance using predictive algorithms like those we discussed in Chapter 10. Note IoT and series data are very often combined with streaming data, predictive modeling, and machine learning. Everything you learned in Chapter 10 and this chapter is used together in powerful ways to create business insights. SAP HANA provides everything you need for addressing these requirements. SAP HANA for Series Data Due to the increased popularity of time series databases, quite a few of these new databases have appeared in the market. As the market is maturing, users of these time series databases run into some limitations; for example, some of them only allow a single value field (like Table 11.3). This is due to the key–value pair structure they received from their NoSQL heritage. You have to create another new table to add a new value column. In SAP HANA, you can have multiple value columns—for example, you can also record how many hours you slept that night or your waist circumference—all in the same table. One of the biggest limitations of dedicated time series databases is that they can only store time series data. SAP HANA databases can contain both series data and regular business data. From a modeling perspective, there is no difference between these different types of database tables. You can integrate series data with any other type of data to produce powerful applications and analytics by using 503 504 Chapter 11 Spatial, Graph, and Series Data Modeling calculation views or standard SQL statements to query any table, including series data tables. Many of the dedicated time series databases use their own limited query languages with SQL-like syntax that just introduces mental friction. Existing SQLbased tools don’t work with these databases. SAP HANA series data can be accessed using standard SQL, in which simple queries stay simple, complex queries are powerful, and joins are easy. Data modelers and business users can get started immediately using familiar business intelligence (BI) tools, using legacy systems to query this data, and inheriting a vast ecosystem of functionality (e.g., streaming data; extract, transform, load (ETL); enterprise resource planning (ERP); BI; visualization; etc.). Although there is no difference in using a series table in SAP HANA from a modeling perspective, technically, there is obviously a difference between series data tables and regular column tables. You’ve seen that series data can have large volumes and that the tables tend to be long and thin. SAP HANA can use additional compression methods to reduce the storage requirements even further. In Table 11.3 previously, you saw a series data table. While you can store this data in a normal column table in SAP HANA, which will already provide quite good compression, you can do even better. Instead of actually storing the series (Date1) column, you can calculate these dates. The table is an equidistant series data table, meaning that the time period between dates of successive records is a constant. SAP HANA only has to store the start date and the equidistant time period— instead of the entire column! To calculate the date for the fourth record, SAP HANA takes the start date (1 January) and adds three days to give you 4 January. The formula for calculating is therefore as follows: Calculated time stamp = Starting time stamp + (Record number – 1) × Equidistant time period SAP HANA does this compression automatically in the background when you use an equidistant series table, so you won’t see the compression or the calculations. You can use calculation views and normal SQL statements, as if the data is actually stored in the table instead of being calculated. You can find out how to calculate how much space you save when using an equidistant series table instead of a column table by watching the SAP HANA Academy videos about series data on YouTube (http://s-prs.co/v487605). In the specific case given in these videos, they only used about 12% of the space. The source code is at http://s-prs.co/v487631. Key Concepts Refresher Chapter 11 Note Calculation views or SQL queries that use compressed equidistant (calculated) series columns will be slower than if the data were stored in nonequidistant tables. You should use nonequidistant tables if your series column (time) data has no pattern or when trading better performance for more memory is preferred and acceptable. Sometimes, when you keep adding a lot of data to a series table over time, the performance might decrease. You can then run an ALTER TABLE SERIES REORGANIZE statement to improve the compression. Creating Series Tables Let’s say you want to create the series table shown earlier in Table 11.3. You can do so using Listing 11.7. CREATE COLUMN TABLE SCHEMA_NAME.TABLE_NAME ( Person VARCHAR(25) NOT NULL, Date1 DATE NOT NULL, Weight DECIMAL(5,2), PRIMARY KEY (Person, Date1) ) SERIES ( SERIES KEY(Person) PERIOD FOR SERIES(Date1) EQUIDISTANT INCREMENT BY INTERVAL 1 DAY MINVALUE '2019-01-01' MAXVALUE '2019-12-31’ ); Listing 11.7 Creating the Series Table for Table 11.3 You’ll notice that this code starts with creating a normal column table, but then adds the SERIES clause to specify that this is a series table. Inside this SERIES clause, note the following: 쐍 The SERIES KEY is used to define the Person profile column. In this case, it’s the name of the person whose weight is measured. The SERIES KEY can be multiple columns. 쐍 PERIOD FOR SERIES is used to specify the Date1 series column, where the time values are stored. 쐍 EQUIDISTANT INCREMENT BY defines that this series is equidistant, and the distance between data points is one day. 쐍 MINVALUE and MAXVALUE specifies the range of acceptable dates for the Date1 series column. MINVALUE specifies the start point, and MAXVALUE specifies the end point. 505 506 Chapter 11 Spatial, Graph, and Series Data Modeling SAP HANA will also create a trigger for the table, which will return errors if you try to insert dates that aren’t in the specified range or if the date doesn’t align with the daily interval (in this case). You can use the SERIES_GENERATE function to prepopulate the entire series column (Date1). The rows are then ready for just adding values (Weight) at a later point. You can alter a series table using the ALTER TABLE statement. However, the SERIES clause of an existing series table can’t be altered. Horizontal and Vertical Aggregation A challenge you’ll often face when working with series tables is how to JOIN two series tables with different time intervals. To solve this, you must use horizontal aggregation. Horizontal aggregation is when you take granular data and aggregate it to a less granular level. For example, you can aggregate hourly data to a daily level. Or you can disaggregate the data to a more granular level, for example, breaking up hourly data to 15-minute intervals. Note Horizontal aggregation doesn’t change the data in the series table. It’s merely a “view” that displays the data at a higher or lower level of granularity and allows you to JOIN series tables with different time intervals. You can use normal aggregation functions, such as SUM and AVG, for aggregating the value columns. When taking the Date1 series column, for example, to a less granular level, you use the SERIES_ROUND function. In Listing 11.8, horizontal aggregation is applied to the data from Table 11.3. In this case, you use the ROUND_DOWN option, meaning that 14 February gets rounded down to the month of February, rather than rounded up to the month of March. In this case, this is quite obvious, but, sometimes, you’ll want 8:59 to be rounded up to 9:00, rather than rounded down to 8:00. SELECT Person, SERIES_ROUND(Date1, 'INTERVAL 1 MONTH', ROUND_DOWN) AS Monthly, AVG(Weight) AS MonthlyWeight FROM SCHEMA_NAME.TABLE_NAME GROUP BY Person, SERIES_ROUND(Date1, 'INTERVAL 1 MONTH', ROUND_DOWN); Listing 11.8 Horizontal Aggregation Key Concepts Refresher Chapter 11 Listing 11.8 is the classical SQL format. You might not like the fact that the SERIES_ ROUND function is called twice. Listing 11.9 shows an alternative way of accomplishing the same horizontal aggregation by using a WITH … SELECT statement, which only calls the SERIES_ROUND function once. WITH TMP AS ( SELECT Person, SERIES_ROUND(Date1, 'INTERVAL 1 MONTH', ROUND_DOWN) AS Monthly, Weight FROM SCHEMA_NAME.TABLE_NAME ) SELECT Person, TIMER_HOUR, AVG(Weight) AS MonthlyWeight FROM TMP GROUP BY Person, Monthly; Listing 11.9 Horizontal Aggregation Using a WITH Clause You use the SERIES_DISAGGREGATE function instead of the SERIES_ROUND function when you move from coarse units (e.g., daily) to finer units (e.g., hourly). Vertical aggregation is when you aggregate measures over a number of attributes. You can JOIN a series table with other tables in SAP HANA and aggregate the results to create interesting reports, for example, per factory, per region, per machine type, or per year. All logical join types are supported when joining two series tables. However, both series tables must be at similar levels of granularity. Both can be nonequidistant or equidistant with the same time interval. Functions for Series Data Special SQL functions are available specifically for series data. You’ve seen quite a few of these already, for example, SERIES_ROUND, SERIES_DISAGGREGATE, and SERIES_ GENERATE. These functions are documented in the SAP HANA Series Data Developer Guide at http://s-prs.co/v487620 in the Development section. More advanced functions, such as CUBIC_SPLINE_APPROX, allow you to fit a cubic spline over your series data. You also have additional analytic functions for analyzing series data, such as MEDIAN to compute a median value or CORR to calculate the correlation coefficient. 507 508 Chapter 11 Spatial, Graph, and Series Data Modeling Important Terminology In this chapter, we looked at the following important terms: 쐍 Spatial processing Processing that uses data to describe, manipulate, and calculate the position, shape, and orientation of objects in a defined space. 쐍 Spatial reference systems (SRSs) Specifications for various flat-earth and round-earth models. 쐍 Spatial joins and predicates The way that spatial shapes and points can be related to each other. 쐍 Spatial functions Special OGC-compliant functions to process spatial data in the system. 쐍 Geocoding The process of converting address information into geographic locations, such as latitude and longitude. 쐍 Graph processing Used to process relationship and attributes of highly networked entities. 쐍 Vertices Nodes in a graph that are related to each other. 쐍 Edge Relationships between vertices in a graph. 쐍 Graph workspace File linking the vertex data used in a graph to the edge data. 쐍 Time series data A sequence of regular measurements taken over a period of time. 쐍 Equidistant Refers to when the time increments between successive records in the series data are constant, for example, one hour. 쐍 Horizontal aggregation Taking granular data and aggregating it to a less granular level. For example, you can aggregate hourly data to a daily level or disaggregate the data to a more granular level, such as breaking up hourly data to 15-minute intervals. Practice Questions Chapter 11 Practice Questions These practice questions will help you evaluate your understanding of the topics covered in this chapter. The questions shown are similar in nature to those found on the certification examination. Although none of these questions will be found on the exam itself, they will allow you to review your knowledge of the subject. Select the correct answers, and then check the completeness of your answers in the “Practice Question Answers and Explanations” section. Remember, on the exam, you must select all correct answers and only correct answers to receive credit for the question. 1. What spatial data type would you use to store multiple shop locations? 첸 A. ST_Point 첸 B. ST_MultiPoint 첸 C. ST_Polygon 첸 D. ST_MultiLineString 2. Which spatial data formats can you import into SAP HANA? (There are 2 correct answers.) 첸 A. Scalable Vector Graphic (SVG) 첸 B. Well-Known Text (WKT) 첸 C. ESRI shapefiles 첸 D. GeoJSON 3. Which modeling features can you use in flowgraphs? (There are 2 correct answers.) 첸 A. Text 첸 B. Predictive 첸 C. Spatial 첸 D. Graph 509 510 Chapter 11 Spatial, Graph, and Series Data Modeling 4. SAP HANA spatial functions are compliant to the standards set by which company? 첸 A. OGC 첸 B. ESRI 첸 C. W3C 첸 D. HERE 5. What characteristics does the WGS84 (planar) spatial reference system (SRS) have? (There are 2 correct answers.) 첸 A. It’s a flat-earth type SRS, 첸 B. It uses a geographic coordinate system. 첸 C. It’s a round-earth type SRS. 첸 D. It uses a Cartesian coordinate system. 6. What is the default spatial reference system identity (SRID) for SAP HANA? 첸 A. 0 (2D planar) 첸 B. 4326 (WGS84) 첸 C. 3857 (Web Mercator) 7. What is the m-coordinate in a spatial point used for? 첸 A. The height above or below sea level 첸 B. Measures, such as wind speed 첸 C. The latitude 첸 D. The longitude 8. A spatial point is on the boundary of a shape. Which predicate do you use to accept it as enclosed by the shape? 첸 A. ST_Contains 첸 B. ST_Within 첸 C. ST_Covers Practice Questions Chapter 11 9. What function do you use to create a polygon that encloses a cluster of spatial points? 첸 A. ST_UnionAggr 첸 B. ST_ConvexHullAggr 첸 C. ST_EnvelopeAggr 10. You need to cluster a large number of spatial data points to get a quick idea of the data, and you don’t care about visually appealing results. What SAP HANA spatial clustering algorithm do you use? 첸 A. Hexagonal 첸 B. K-means 첸 C. Grid 첸 D. DBSCAN 11. What does the graph workspace file specify? (There are 2 correct answers.) 첸 A. The definition of the graph model 첸 B. The attributes available in the graph model 첸 C. How the vertex and edge table are related 첸 D. The graph algorithm used by the Graph node 12. True or False: The graph workspace file is an .hdbcds file. 첸 A. True 첸 B. False 13. Which of the following can you use in the graph viewer in the SAP HANA database explorer? (There are 2 correct answers.) 첸 A. Cypher 첸 B. GraphScript 첸 C. The Shortest Path One to Many algorithm 첸 D. The Strongly Connected Components algorithm 511 512 Chapter 11 Spatial, Graph, and Series Data Modeling 14. Which graph algorithm do you use to find the cheapest route to a specified destination? 첸 A. Strongly Connected Components 첸 B. Shortest Path One to Many 첸 C. Neighborhood 첸 D. Shortest Path One to One 15. Which property do you specify for the graph algorithm to find the cheapest route to a specified destination? 첸 A. Direction 첸 B. Weight 첸 C. Minimum Depth 첸 D. Maximum Depth 16. True or False: The Graph node must be the bottom node in a cube calculation view. 첸 A. True 첸 B. False 17. You need to use a Graph node with a shortest path algorithm in a calculation view, and require the end users to only specify the start vertex at runtime. What is the easiest way to achieve this? 첸 A. Use a GraphScript procedure. 첸 B. Write an SAP HANA XSA application. 첸 C. Use input parameters. 첸 D. Use a REST API call. 18. Bonus Question: Which language do you use when you need to use a tree-like topology with a declarative query language? 첸 A. Cypher 첸 B. SQLScript 첸 C. GraphScript Practice Question Answers and Explanations Chapter 11 19. What are the vertices in a graph model? 첸 A. The nodes themselves 첸 B. The relationships between the nodes 첸 C. The attributes of the nodes 20. What will you use to convert daily series data to an hourly level? Note: There are 2 correct answers to this question. 첸 A. Vertical aggregation 첸 B. Horizontal aggregation 첸 C. SERIES_DISAGGREGATE 첸 D. SERIES_ROUND 21. What do you use to create an equidistant series table? 첸 A. CREATE TABLE 첸 B. CREATE SERIES TABLE 첸 C. CREATE COLUMN TABLE 첸 D. CREATE HISTORY COLUMN TABLE 22. You create an equidistant series table. In the SERIES clause, what do use to identify the profile column? 첸 A. SERIES KEY 첸 B. PERIOD FOR SERIES 첸 C. EQUIDISTANT INCREMENT BY 첸 D. PRIMARY KEY Practice Question Answers and Explanations 1. Correct answer: B Use a ST_MultiPoint data type to store multiple shop locations. ST_Point can only store a single shop location, and ST_Polygon and ST_MultiLineString can’t be used for shop locations. 513 514 Chapter 11 Spatial, Graph, and Series Data Modeling 2. Correct answers: B, C You can import Well-Known Text (WKT) and ESRI shapefiles into SAP HANA. You can only export Scalable Vector Graphic (SVG) and GeoJSON file formats. 3. Correct answers: B, C Predictive and spatial modeling can be done in flowgraphs. 4. Correct answer: A The SAP HANA spatial functions are compliant to the standards set by OGC. ESRI is the largest spatial software vendor and sets the de facto standard for shape files. The W3C looks after the standard for SVG (and other web standards) HERE provides maps. 5. Correct answers: A, D The WGS84 (planar) spatial reference system (SRS) is a flat-earth type SRS and uses a Cartesian coordinate system. If you got this wrong, it’s because it was a trick question. The WGS84 (planar) SRS is NOT the same as the WGS84 SRS. Check the information below Figure 11.1. 6. Correct answer: A The default spatial reference system identity (SRID) for SAP HANA is 0 (2D planar). Many times, you should rather use 4326 (WGS84). 7. Correct answer: B The m-coordinate is a spatial point used for measures such as wind speed. (M is for measure.) 8. Correct answer: C You use ST_Covers to accept that a spatial point is on the boundary of a shape as enclosed by the shape. ST_Contains and ST_Within don’t show a point on the boundary of a shape as being enclosed by the shape. 9. Correct answer: B You use the ST_ConvexHullAggr function to create a polygon that encloses a cluster of spatial points. The ST_EnvelopeAggr function returns a rectangle, which technically is also a polygon. This is therefore a bad question, as someone could argue about the answer, and would thus not occur in the real exam. Practice Question Answers and Explanations Chapter 11 10. Correct answer: C You use the SAP HANA grid spatial clustering algorithm to cluster a large number of spatial data points to get a quick idea of the data or when you don’t care about visually appealing results. The hexagonal spatial clustering algorithm (in SPS 04) might create more visually appealing results. The k-means and DBSCAN spatial clustering algorithms aren’t as quick as the grid spatial clustering algorithm. 11. Correct answers: A, C The graph workspace file specifies the definition of the graph model and how the vertex and edge table are related. The attributes available in the graph model are just the additional fields available in the tables or views. 12. Correct answer: B False: The graph workspace file is NOT an .hdbcds file but an .hdbworkspace file. 13. Correct answers: A, D You use Cypher and the Strongly Connected Components algorithm in the graph viewer in the SAP HANA database explorer. GraphScript uses an .hdbprocedure file. The Shortest Path One to Many algorithm is available in the Graph node but NOT in the graph viewer. 14. Correct answer: D You use the Shortest Path One to One graph algorithm to find the cheapest route to a specified destination. The Shortest Path One to Many graph algorithm would also be correct if the question did not say “specified destination,” as you could generate all routes and pick the cheapest one for your destination. 15. Correct answer: B You specify the Weight Column property for the graph algorithm to find the cheapest route to a specified destination. 16. Correct answer: A True: The graph node must be the bottom node in a cube calculation view and, actually, for all calculation views. 515 516 Chapter 11 Spatial, Graph, and Series Data Modeling 17. Correct answer: C When you need to use a Graph node with a Shortest Path algorithm in a calculation view and require the end users to only specify the start vertex at runtime, you simply use input parameters (refer to Figure 11.23). Writing an SAP HANA XSA application could work, but not in all cases; for example, it won’t work when you want the users to specify the start vertex while they are using SAP Analytics Cloud. 18. Correct answer: B You use SQLScript when you need to use a tree-like topology with a declarative query language. The tree-like topology refers to hierarchies. Cypher is a declarative query language, but it’s used for generic graph topology. This question is too detailed and hard, which is why it was marked as a bonus question. If you could answer it correctly, you know the topic very well! 19. Correct answer: A The vertices in a graph model are the nodes. The relationships between the nodes are called edges. 20. Correct answer: B, C You use horizontal aggregation and SERIES_DISAGGREGATE to convert daily series data to an hourly level. In this case, you go to hourly, which means there are more values. You need to disaggregate to achieve this. SERIES_ROUND is used with aggregation. Aggregation (e.g., SUM) takes many values and reduces them to one value. 21. Correct answer: C You use CREATE COLUMN TABLE with a SERIES clause to create a series table. There is no CREATE SERIES TABLE statement. CREATE TABLE creates a row table. A history column table is for audit log type tables, where you keep a full history of all changes made to the table. 22. Correct answer: A You identify the profile column, which is the name of the “sensor” in IoT, with SERIES KEY. The PERIOD FOR SERIES (used on the series column) is used for the time period. Note Some questions in the certification exam will only have three answer options. Summary Chapter 11 Takeaway You now understand the key components of spatial processing, which spatial data types are available, and how to perform spatial joins in SAP HANA information views. You also know how to work with nodes (vertices) and relationships (edges) in a graph, as well as how to combine them using a graph workspace. You’ve used the graph viewer in the SAP HANA database explorer and used Graph nodes in graphical calculation views. And you know the differences between Cypher and GraphScript, and hierarchies and graphs. Finally, you learned how to create a series table, understood what an equidistant series is, and gained a basic understanding of series data modeling techniques. Summary We discussed how to use spatial processing, how to combine business data with maps, and how to easily do complex queries on networked entities. In the same way, we can combine graph data with business data to create powerful queries that can’t be solved using traditional hierarchies and reporting methods. We discussed important aspects of SAP HANA series data, including the advantages of using SAP HANA. Throughout this overview of SAP HANA series data, you’ve learned more about the SAP HANA series data functionality, as well as the basic concepts of series data in SAP HANA, such as series tables, equidistant series data, and horizontal aggregation. In the next chapter, we’ll look at how to improve SAP HANA information models to achieve optimal performance. 517 Chapter 12 Optimization of Information Models Techniques You’ll Master 쐍 Understand the performance enhancements in SAP HANA 쐍 Get to know techniques for optimizing performance 쐍 Learn how to use the SAP HANA optimization tools 쐍 Implement good modeling practices for optimal performance 쐍 Create optimal SQL and SQLScript 520 Chapter 12 Optimization of Information Models During our exploration of SAP HANA, we’ve frequently come across the topic of performance. SAP HANA is well known for the improvements it brings to business process runtimes and its real-time computing abilities. In this chapter, we’ll discuss some of these performance enhancements that are built in to SAP HANA and review the techniques you’ve learned previously. We’ll then look into new ways of pinpointing performance issues and discuss how to use the optimization tools built in to SAP HANA. Finally, we’ll examine some guidelines and best practices when modeling and writing SQL and SQLScript code. Real-World Scenario The CEO of your company reads a 2009 study by Akamai stating that 40% of online shoppers will abandon a website that takes longer than three seconds to load. Therefore, you’re tasked with helping to achieve a response time of less than three seconds for most reports and analytics that sales representatives use. Many such users work on tablets, especially when they are visiting customers. You decide to use a mobile-first approach, creating interactive analytics and reports to reduce network requirements. Some of the reports currently take more than two minutes to run on SAP HANA. Using your knowledge of optimization, you manage to bring these times down to just a few seconds. Objectives of This Portion of the Test The purpose of this portion of the SAP HANA certification exam is to test your knowledge of the SAP HANA modeling tools. The certification exam expects you to have a good understanding of the following topics: 쐍 Performance enhancements in SAP HANA 쐍 Techniques for optimizing performance 쐍 Using the performance analysis mode, debug mode, and other optimization tools 쐍 Good modeling practices for optimal performance 쐍 Creating optimal SQL and SQLScript Key Concepts Refresher Chapter 12 Note This portion contributes about 10% of the total certification exam score. Key Concepts Refresher In the preceding chapters, we’ve discussed many performance enhancements that are built-in to the design of SAP HANA; in this chapter, we’ll walk through various optimization topics in SAP HANA. Architecture and Performance The largest and most fundamental change that comes with SAP HANA technology is the move from disks to memory. The in-memory technology that SAP HANA is built on has been one of the biggest contributors of performance improvements. The decision to break with the legacy of disk-based systems and use in-memory systems as the central focus point (and everything that comes with this) has led to significant changes in performance. The next step was to combine memory and multicore CPUs. If we store everything in memory and have plenty of processing power, we don’t have to store calculated results. We just recalculate them when we need them, which opens the door to real-time computing. Real-time computing requires that results be recalculated each time they’re needed because the data might have changed since the last query. This combination of memory and CPU processing power led to new innovations. Now, you can combine online transactional processing (OLTP) and online analytical processing (OLAP) together in the same system: the OLAP components don’t need the storage of precalculated cubes and can be recalculated without impacting the performance of the OLTP system. Another innovation prompted by combining memory and CPU is data compression. This both reduces memory requirements and helps to better utilize CPUs by loading more data in the level 1 CPU cache. Loading more data into CPUs once again improves performance. To improve compression, SAP HANA prefers columnar storage. Along with many advantages, columnar storage also has a few disadvantages: inserting new records and updating existing records aren’t handled as efficiently in column tables as in 521 522 Chapter 12 Optimization of Information Models row tables. To solve this, SAP HANA implemented the insert-only principle with delta buffers. We use row storage for delta buffers to improve insert speeds and then use delta merges to bring the data back to columnar storage, which leads to better read performance. Tip The performance improvements in SAP HANA don’t mean that we can be negligent in modeling and development. For example, you shouldn’t use SELECT * statements or choose any unnecessary fields in queries because doing so decreases the efficiency of the column storage. Using multicore processors, everything in SAP HANA was designed from the ground up to run in parallel. Even a single aggregation value is calculated using parallelism. This led to SAP HANA growing from a single hardware appliance to a distributed scale-out system that can work across multiple server nodes and finally to multitenant database containers (MDCs). Tables can now also be partitioned across multiple servers. With more servers, the scale and performance of SAP HANA systems has vastly advanced. Redesigned and Optimized Applications While combining memory and processors, we also saw that moving data from memory to application servers isn’t the optimal way to work with an in-memory database. It makes more sense to push down some of the application logic to the SAP HANA system for fast access to the data in memory and plenty of computing resources for calculations. SAP started by making SAP HANA available as a side-by-side accelerator solution to improve issues with long-running processes in current systems. Over time, SAP business applications were migrated to SAP HANA and reengineered to make use of the advantages it offers. With SAP S/4HANA, we now have a system that uses views instead of materialized aggregates in lookup tables, significantly reducing the size of the system, simplifying the code, and enriching the user experience. Key Concepts Refresher Chapter 12 Effects of Good Performance We sometimes hear the following: “Why do we have to optimize our SAP HANA models that much? Surely five minutes for an analysis, a report, or some screen is good enough.” Yes, sometimes a response time of five minutes is good enough. But this means people focus only on the task at hand. They don’t look at what is possible and what they’re currently not doing. What other things, which were previously impossible, can we do now if this current process runs in less than a second? What other data can we combine with the current process that unlocks more business value? How can we change our business because of this? Example Most people see SAP HANA as just some technology. Some people see SAP HANA as an enabler. Few people view SAP HANA as a “change agent.” The history of SAP over the past few years is proof of how SAP HANA can change a business. When SAP HANA was released, there was a lot of focus on how it could be used to accelerate reports and analytics. At some stage, people realized that they could not only process the same amount of data faster, but they could now process more data. SAP started using SAP HANA for big data scenarios. One of the major sources of big data is sensors or what is called the Internet of Things (IoT). Many of these “things” (sensors) are sending a constant stream of data out, which results in large data sets that keep growing. (We look at streaming data in Chapter 9.) SAP evolved this focus on the IoT into its new SAP Leonardo portfolio. When you work with big data, you start with historical analysis. But if the performance is great, you can start looking at using this historical analysis to predict events. (We discussed the predictive capabilities of SAP HANA in Chapter 10.) SAP now has the SAP Predictive Maintenance and Service system that uses historical IoT data from machines to predict the optimal time to service, maintain, and upgrade these machines, as well as exactly which parts need to be replaced at what point in time. This results in huge savings. The next step from predictive is machine learning. Some people now call some things machine learning that are still really predictive. Machine learning can build on predictive, but it also has elements such as neural networks. SAP HANA has a 523 524 Chapter 12 Optimization of Information Models machine learning component, which is even available for SAP HANA, express edition. Machine learning with SAP HANA is also part of the new SAP Leonardo portfolio. Look at the user’s experience. If a system takes five minutes to respond, you can’t really use it on a mobile device. We started the chapter with a study that showed people expected answers in less than three seconds on mobile. More than half the requests on the Internet are now coming from mobile devices. A response time of five minutes isn’t good enough in this case. Lastly, when system responses are fast, you can start moving these systems to the cloud. People can now access these systems from any device and from anywhere. SAP now has a huge focus on cloud systems. Many of these would never have worked if the responses to users were slow. SAP Cloud Platform is now a central component in the SAP Leonardo portfolio and the new “Intelligent Enterprise.” Without good performance and a focus on optimizing their modeling and development, products such as SAP Leonardo and SAP Cloud Platform might never have been conceived. Performance can open doors to new business models. Let’s see what we’ve learned already to help us with optimizing our information models. Information Modeling Techniques In previous chapters, we discussed some techniques that can help optimize information models, such as the following: 쐍 Limit data as quickly as possible by doing the following: – Define filters on the source data. – Apply variables in your SAP HANA views to push down filters at runtime. – Select only the fields that you require in your query results. Don’t use SELECT *. – Let SAP HANA perform the calculations and send only the calculated results to a business application, instead of sending lots of data to the application and performing the calculations there. – Don’t output your data as granular (at record level) in a calculation view when your analysis displays it as aggregates. 쐍 Speed up filters by partitioning the tables that you filter on. 쐍 Perform calculations as late as possible. For example, perform aggregation before calculation where possible. Key Concepts Refresher Chapter 12 쐍 Use referential joins when your data has referential integrity. When working with referential integrity, the SAP HANA system doesn’t have to evaluate the join for all queries. 쐍 Similarly, use dynamic joins to optimize the join performance for certain queries. 쐍 Join tables on key fields or indexed columns. 쐍 Use a union instead of a join for combining large sets of data. 쐍 Make unions even more efficient by using union pruning. We discussed this in Chapter 6 already, when discussing unions. 쐍 Where possible, use a single fact table in a data foundation of a star join view. 쐍 Build up your information views in layers. This helps SAP HANA to process the requests in parallel, which drastically improves performance. If you use these techniques, you’ll avoid many optimization hazards. These techniques should be part of your everyday workflow. The best optimization is not having to optimize. By avoiding the pitfalls in this chapter, you’ll most likely not need to optimize your modeling later. In spite of your best efforts, however, sometimes you need to optimize an information model. Optimization Tools When optimizing an SAP HANA information model, a number of tools are at your disposal. You need to know when to pick which optimization tool and how to use it. Note If you’ve worked with SAP HANA 1.0 before, you’ll notice some of the tools that you once knew have changed. We won’t discuss SAP HANA database engines, the Explain Plan tool, the Visualize Plan tool, or the Administration Perspective. These were available in SAP HANA 1.0 and SAP HANA Studio. Instead, we’ll focus on the new tools found in SAP Web IDE for SAP HANA. 525 526 Chapter 12 Optimization of Information Models Performance Analysis Mode Performance analysis mode is available from the top toolbar when you build SAP HANA calculation views. You can switch this mode on or off when designing information views. The tool will analyze your view and suggest possible improvements. When you select a node—for example, the Join node—in your view, and then open the Performance Analysis tab (top right), you might see some improvement recommendations from the SAP HANA system. You’ll notice that a new tab, Performance Analysis, has been added to the right side of the editor in which you design the information view. Figure 12.1 shows the Product calculation view from the SAP HANA Interactive Education (SHINE) content. We select the first (bottom) node in the view, and see the Performance Analysis of this Join node. Figure 12.1 Performance Analysis Mode Optimization Suggestions Key Concepts Refresher Chapter 12 The first section includes the Join Type and Cardinality of the join. The next section includes the Partition Type and Number of Rows of each of the tables in the join. If we have many rows in a table, SAP HANA might suggest that we partition the table to improve performance. In this case, the BusinessPartner table is already partitioned using a Hash partition. There are two possible improvement suggestions in Figure 12.1: 쐍 Proposed cardinality In this case, we didn’t specify any cardinality on the join. SAP HANA analyzed the tables and the Join node and suggested that a n..1 cardinality should be set. Setting the cardinality will allow SAP HANA to execute the view faster because it won’t have to calculate what the cardinality is first. The suggestion of n..1 cardinality makes sense because we’ll order multiple products (left table) from each supplier (right table). A message stating Selected cardinality does not match the proposed cardinality is shown below the Cardinality field. 쐍 Join type SAP HANA looked at the data to see if referential integrity was ensured. Because Referential Integrity is Maintained in this case, we can use a referential join instead of the inner join type in this Join node. In the SAP Web IDE for SAP HANA preferences for the Modeler section, you can set the option to always open calculation views in performance analysis mode. You can also set a threshold for the number of records in a table where performance analysis mode will display a warning symbol next to this table. This will remind you to pay special attention to very large tables. You can use partitioning on such tables to improve performance, and you can apply filters that process only the required data. In addition to the cardinality, join type, and number of records in a table, performance analysis mode will also provide you with warnings when the following is true: 쐍 Joins are based on calculated columns. 쐍 Restricted columns are based on calculated columns. Performance analysis mode works on the runtime artifact. So, even if you did not build your calculation view yet, this mode will still give you valuable information. 527 528 Chapter 12 Optimization of Information Models Debug Mode Right next to the Performance Analysis button on the toolbar is the Debug This View button, as shown in Figure 12.2. Figure 12.2 Activating the Debug Query Mode in a Calculation View In Figure 12.3, we selected the Join node in the view, and opened the Debug Query tab of this node that appeared on the top-right side of the editor. You can see the SQL generated by the calculation view up to this point, as well as the result set. This is very useful to debug your views, for example, if a field is missing or giving different results than what you expected. You can now trace it node by node to see where you went wrong. In Figure 12.4, we selected the next Join node. Comparing the SQL and the result set with that of Figure 12.3, you can see that the new table that was joined in this node added the PRODUCT_DESCRIPTION field. Figure 12.3 Analyzing the SQL for a Join Node, along with the Result Set from That Node Key Concepts Refresher Chapter 12 Figure 12.4 Analyzing the SQL for the Next Node Join The debug query mode works in the runtime view of the calculation view but uses the active version of the calculation view. If you’ve made changes to your calculation view, you must Save and Build before you run debug query mode, or you won’t get the correct results. SQL Analyzer The previous two tools are used while you’re designing your graphical calculation views. The SQL Analyzer tool is used to measure the performance of a calculation view when you run it. Measuring performance of a running view clearly states that we’ll work with runtime objects. For this, we have to open the SAP HANA database explorer in SAP Web IDE for SAP HANA. Select the SAP HANA Deployment Infrastructure (HDI) container and the Column Views folder. A list of views that were successfully built is shown on the bottomleft area. In Figure 12.5, we again selected the same Product view from the SHINE content that we looked at earlier. Right-click on Column Views, and select Generate SELECT Statement from the context menu. 529 530 Chapter 12 Optimization of Information Models Figure 12.5 Generate SELECT Statement of a Column View in the SAP HANA Database Explorer The SAP HANA database explorer tools will open the SQL Console and create a SELECT statement. We now have a few tools at our disposal. Tip The SQL Console is a central tool for developers, and with new releases of SAP HANA, the tools here will continue to be enhanced. These tools can be used to develop queries, troubleshoot problems, and carry out ad hoc tasks. SQL Console tabs persist across sessions, so you can log out and still keep the work you’re busy with in these tabs for when you log in again. Click on the tiny triangle next to the large green arrow to display the Run menu, as shown in Figure 12.6. Here you can select the Prepare Statement tool. When you select the Analyze dropdown menu next to the Run menu, you can select the Analyze SQL option. Key Concepts Refresher Chapter 12 Figure 12.6 Opening the Run Menu to Select the SQL Analyzer Tool The SQL Analyzer tool opens in a separate tab. It’s divided into two halves, as shown in Figure 12.7. The top part has the following three tabs: 쐍 Overview These tiles show the time taken to run the query, how many tables were used, how many records were returned in the result set, and how much memory was used. 쐍 Plan Graph This shows a graphical tree of how long each step in the query took and how many records were returned. 쐍 SQL This just shows the same SQL query we saw in the SQL Console. The bottom part has the following four tabs: 쐍 Operators This provides more detail on every operator used when executing the query. 쐍 Timeline This presents timelines for the various operations and steps of the query. 쐍 Tables in Use and Table Accesses This gives details on the tables used in the query. 531 532 Chapter 12 Optimization of Information Models In the Overview tab, shown in Figure 12.7, you can see how long it took to execute the query, what operators were mainly used, how many tables are used, how many records were returned, and how much memory was consumed. Figure 12.7 SQL Analyzer Tool Displaying the Overview Tab The Plan Graph visually shows how the SQL query will be prepared and executed. Figure 12.8 illustrates the prepared execution plan. The blocks show the time taken by this step in the query, and the lines going to the next block up show how many records will be sent up to that step in the query. This information is helpful when you want to reduce the data transferred between layers as quickly as possible or filter as early as possible, as recommended. You can view the plan and find out how many records will be generated in the result set. Then, you can apply a filter, for example, and rerun the Plan Graph to see the difference. Figure 12.9 shows the Operators tab. Here you can filter and sort each of the query steps, see the operators used, and note which steps took the longest, accessed the most database objects, and created the most output rows. Key Concepts Refresher Chapter 12 Figure 12.8 Plan Graph Tab Displayed in the Top Half of the SQL Analyzer Tool Figure 12.9 Operators Tab Displayed in the Bottom Half of the SQL Analyzer Tool 533 534 Chapter 12 Optimization of Information Models Figure 12.10 shows a Timeline for each of the steps in the SQL query. Often a graphical display of the time used in the query helps us to understand where to focus our attention. Figure 12.10 Timeline Tab Displayed in the Bottom Half of the SQL Analyzer Tool Lastly, you can use the Tables in Use and Table Accesses tabs to see which tables are partitioned, where each table is located, how many entries were processed in each table, how many times each table was accessed, and how much time was spent processing each table (Figure 12.11). Figure 12.11 Table Accesses Tab Displayed in the Bottom Half of the SQL Analyzer Tool Key Concepts Refresher Chapter 12 SPS 04 The SQL Analyzer tool has more tabs in the bottom half in SAP HANA 2.0 SPS 04. There is a new Recommendations tab, which does what is says. There is also a Compilation Summary tab that provides information on the compilation time of queries. You can also export and import SQL plans, and both analysis flow and scale-out analysis have improved. Tip In Figure 12.6 you can also see a Report Code Coverage tool under the SQL Analyzer option in the menu. When you do a call to a procedure with specific parameter values, this tool will show you what code in your procedure was and wasn’t used. For example, in an IF – THEN – ELSE – END IF statement, the code in the ELSE might never be used, in which case, you could remove the entire IF statement. SAP HANA Cockpit The last tool we’ll briefly mention in this chapter is the SAP HANA cockpit. This tool is mainly used by SAP HANA system administrators, but it contains some tools that can occasionally be used to help with performance optimization. Figure 12.12 shows the relevant areas of the SAP HANA cockpit, namely the Monitoring area and the Alerting and Diagnostics area. There are three links that are especially relevant to optimization: 쐍 View trace and diagnostic files This link shows a list of existing trace and log files. We can filter this list, for example, to the “xs” search string, meaning that it only shows files that have “xs” in the file name. These trace and log files can provide detailed process information, and SAP HANA can trace very specific events; for example, you can limit a trace file to a specific user or application. 쐍 Plan trace This link is used to set up and create specific traces and various trace configurations. You’ll need to work with your SAP HANA administrator if you want to run any traces, as this could severely affect the performance of the system. 쐍 Troubleshoot unresponsive systems This link allows you to identify long-running queries in the SAP HANA system. 535 536 Chapter 12 Optimization of Information Models Figure 12.12 SAP HANA Cockpit: Monitoring Area and Alerting and Diagnostics Area SPS 04 A Recommendations Application is available from SAP HANA 2.0 SPS 04. This application will help you to enhance SAP HANA system performance according to existing SAP HANA operating conditions, instead of having to manually determine optimizations. It provides general recommendations based on various aspects of the SAP HANA system, such as data statistics, index usage, and rules for SQL optimization. It will also suggest load units for tables, partitions, and columns according to how frequently they are accessed. Best Practices for Optimization Performance analysis mode suggested some improvements we could make to our information view related to specifying the cardinality, using a referential join, and potentially partitioning our tables. In this last section, we’ll discuss some additional best practices for modeling in SAP HANA. Remember that we mentioned a Key Concepts Refresher Chapter 12 few best practices earlier in the chapter when we summarized concepts you’ve learned previously in the book. We can divide these best practices into two areas: those related to the graphical view design and those related to SQL and SQLScript. Best Practices for Calculation Views The following are some additional best practices to optimize calculation views: 쐍 Reduce data transfer between calculation views. 쐍 Reduce the intermediate results between nodes inside a calculation view. 쐍 Try to create calculations (e.g., in calculated columns) as late as possible in the model. 쐍 Implement filters at a table level. Don’t build filters on top of a calculated column, as this will do a lot of calculation before you filter the data. By filtering first, you can eliminate a lot of calculations. 쐍 The SAP HANA query optimizer tries to push filters down to the lower layers. Sometimes you have a node that gets inputs from multiple nodes, for example, the Intersect node in Chapter 6. The data flow splits before this node and has parallel nodes. If you have a filter in the nodes above, the optimizer has to push the filter down through these parallel nodes. Normally, it won’t do it as the semantic correctness might be affected. If you know that the semantic meaning won’t be affected, you can set the Ignore Multiple Outputs for Filter flag to force the filter pushdown. 쐍 Don’t send all information to the analytics and reporting tools; instead, make your reports more interactive, using actions such as “click to drill down,” and make multiple smaller requests from SAP HANA. The SAP HANA system will use less memory, be more efficient, and respond faster. With less data moving across the network, the user experience improves. In addition, by sending less information to the analytics and reporting tools, you make these tools more mobile-friendly. 쐍 Select only the fields you require; columnar databases don’t like working with all the columns. 쐍 Specify the cardinality in a join. This can improve the runtimes for large models as the optimizer can prune an entire data source from a join if certain prerequisites are met, like only reading data from the left-side table in a referential join. Just make sure that you also update the cardinality setting if you update the model at a later stage. 537 538 Chapter 12 Optimization of Information Models 쐍 For joins: – Use left outer joins rather than right outer joins. Try to use n:1 or 1:1 cardinality with the left outer joins. – Try to reduce the number of join fields. – Avoid joins on calculated columns. – Avoid joins where the data types of the join fields are different, as this will need type conversions. 쐍 Investigate whether partitioning your tables is a valid option. 쐍 Let SAP HANA do the hard work such as aggregations and calculations. 쐍 Identify recurring patterns, and put them into a reuse “library.” For example, you might always join the order header and the order details tables. Put this into a separate calculation view, and reuse it as a data source for multiple other calculation views. This also helps with breaking monolithic calculation views into smaller parts. Added advantages are that the smaller parts are easier to understand, and the reduced duplication leads to less maintenance. 쐍 Avoid mixing core data services (CDS) views with calculation views. Even though you can use CDS views as data sources in calculation views, the performance isn’t as good as when you use the tables directly in the calculation view and create the view functionality yourself. 쐍 Enable caching for calculation views where applicable. If the query results are extremely fast to generate, it might not make sense to enable caching. Caching can be activated and controlled by the system administrator at the database level. You can control caching at the level of an individual calculation view in the Advanced tab of the Semantic node. You can set the Cache Invalidation Period so that it expires after a transaction changes data in one of the data sources, after an hour or after a day. This choice depends on the data requirements of the individual calculation view. 쐍 Use Star Join nodes for OLAP queries. You can get similar results with Join nodes, but it won’t perform as well. 쐍 Try to use dedicated tables or views for value help lookups. It will be faster, and users will get consistent values across all calculation views that use this Value Help dialog. You can filter the list of values for users through analytic privileges. This will show them only the values they are allowed to see and use. We’ll discuss analytic privileges in the next chapter. Key Concepts Refresher Chapter 12 Tip Remember that SAP HANA uses authorizations as a filter. If a user is only allowed to see one cost center, that is essentially a “filter,” and SAP HANA will use it to speed up the query. In this case, security isn’t an overhead as in many other systems but an optimization technique. Partitioning We’ve seen partitioning mentioned a few times as a solution to some performance issues. Partitioning was already introduced in Chapter 3, but let’s discuss it a bit more. Although partitioning is normally managed by the system administrator or database administrator, it’s still good to understand how to use it. Many times, modelers are called on to help with partitioning as they understand the tables and their use. Partitioning is when we subdivide a column store table into multiple separate “blocks.” For example, we can split the table into blocks where partition 1 has all the records where people’s names start with the letter A, partition 2 for B, and so on. These partitions can then be spread across multiple servers. We can also keep the partitions on a single server. This can still improve performance as it’s now easy for the database to select only the records for people with names that start with the letter M, for example. Tip Partitioning can only be used with column tables—not with row tables. Partitioning is different from table distribution where several (whole) tables are distributed across multiple servers. In that case, the tables stay as a whole. Table A gets loaded on server 1, table B on server 2, and so on. In that case, tables aren’t split up, but distributed, whereas with partitioning, a single table gets split up. Following are some of the reasons for partitioning tables: 쐍 The load is balanced across multiple servers. Query performance improves as multiple servers are all working together in parallel to help process the data of a single table. 539 540 Chapter 12 Optimization of Information Models 쐍 If a query, for example, only asks for people’s names that start with the letter in the example above, the request is only executed on the server where the partition resides. 쐍 If a table is partitioned over multiple servers, the delta merge management is more efficient. SAP HANA will split tables with more than 2 billion records to improve performance. Partitioning is transparent to the SQL used for the queries. SAP HANA will automatically handle the partitioning logic in the background. You don’t need to specify partitions in your SQL queries. Partitions are defined when a table is created, or they can be created later with an ALTER TABLE statement. You can specify this using either SQL or CDS. You have full control over partitions; for example, you can split them, merge them, move them to other servers, and so on. SAP HANA supports three types of table partitioning: 쐍 Round robin This is the easiest type of table partitioning. A new record is automatically assigned to the next partition in a circular fashion. You don’t have to understand the data or specify which columns to use for the partitioning. 쐍 Hash For hash partitioning, you specify the columns to use for the partitioning. SAP HANA computes hash values and distributes the data evenly. You don’t really need to understand the actual content in the table. 쐍 Range Range partitioning requires that you know the data in the table really well. You have to specify not only the columns to use for the partitioning but also in what ranges to split the data, for example, by years 2010–2015, 2016–2018, and 2019– 2020. We can even create multilevel partitions. This is when you partition the partitions. For example, you can use hash partitioning to enable load balancing across multiple servers and then use range partitioning at the next level to split the data per year, perhaps for performance reasons. Key Concepts Refresher Chapter 12 SQL and SQLScript Guidelines We mentioned several best practices for optimizing SQL and SQLScript in Chapter 8, including the following: 쐍 Don’t use SELECT * statements. 쐍 Try to avoid imperative logic with a loop or branch logic. SQL uses a set-oriented approach for best performance. When you use imperative statements, you dictate the logic flow, making it very hard for the SQL optimizer to produce a better plan and parallelize the logic for improved performance. 쐍 Avoid using cursors. Cursors are a powerful feature in most databases that allow you to apply some action on one record at a time in a result set. The resultant reduction in performance is similar to the imperative logic in the previous point. 쐍 Break large, complex SQL statements into smaller independent steps using SQLScript variables. This can improve the parallelization of the execution of the query. The calculated result set can be referred to multiple times, eliminating repetitions of the same SQL statements. It will also reduce the complexity, decreasing your development time. 쐍 Use procedures or grouping sets to return multiple results sets. 쐍 Try to write code that is free from side effects. When code changes the state of the database, for example, by creating or altering tables, inserting or updating data, or deleting records, you slow things down. Read-only code, such as table functions, doesn’t alter the state of the database. 쐍 Use SQL syntax instead of the older calculation engine functions. SAP HANA has to convert these to SQL—this process is called unfolding—and the SQL optimizer might not be able to fully optimize this converted code. Even though we might not use calculation engine functions in our SQLScript code any longer, there are other places these can still appear. Figure 12.13 shows the Expression Editor screen for a calculated column in a graphical calculation view. In this case, we needed the IF() function, which isn’t available in an SQL expression. It’s recommended that you use SQL in the Expression Editor screen whenever possible. Tip For more SAP HANA optimization and performance information, you can attend the HA215 training course. You can find more information about this course at http://sprs.co/v487623. 541 542 Chapter 12 Optimization of Information Models Figure 12.13 Column Engine Expression Used in a Calculated Column Important Terminology In this chapter, we covered the following important topics: 쐍 Architecture and performance We started this chapter by reviewing the architecture design principles of SAP HANA and how everything contributes to optimal system performance. We also summarized some of the best practices you’ve already learned in your information-modeling journey. 쐍 Performance analysis mode Performance analysis mode provides suggestions while we’re designing a view to potentially improve performance. This tool looks, for example, at join types, referential integrity, cardinality, and partitioning. Practice Questions Chapter 12 쐍 Debug query mode Debug query mode shows the SQL generated in a calculation view up to the selected node. This can be used to see where fields get added and how that affects the result set. 쐍 SQL Analyzer The SQL Analyzer tool shows graphical flows for the SQL plans. We can see the number of records returned per operator in the plan, the number of records, and the time each step of the query took in the executed plan. We can also combine the graphical flow with a timeline and an operator list. 쐍 SAP HANA cockpit The SAP HANA cockpit provides access to the SQL cache of all SQL statements previously run in the SAP HANA system. We can also create and read trace files for identifying exactly what happens in the SAP HANA system when a query is executed. 쐍 Best practices Finally, we looked at some best practices for ensuring that SAP HANA information models perform optimally. Practice Questions These practice questions will help you evaluate your understanding of the topics covered in this chapter. The questions shown are similar in nature to those found on the certification examination. Although none of these questions will be found on the exam itself, they will allow you to review your knowledge of the subject. Select the correct answers, and then check the completeness of your answers in the “Practice Question Answers and Explanations” section. Remember, on the exam, you must select all correct answers and only correct answers to receive credit for the question. 1. What combination enabled real-time computing? 첸 A. In-memory technology with solid-state disks 첸 B. In-memory technology and multicore CPUs 첸 C. Multicore CPUs and solid-state disks 첸 D. Multicore CPUs with large level 1 caches 543 544 Chapter 12 Optimization of Information Models 2. What are the advantages of columnar storage? (There are 2 correct answers.) 첸 A. Improved compression 첸 B. Improved data reading performance 첸 C. Improved data writing performance 첸 D. Improved performance when all fields are selected Tip The following five items list almost exactly the same question, but the answers are different each time. 3. What performance technique should you implement to improve join performance? 첸 A. Use joins instead of unions for combining large data sets. 첸 B. Join on key fields between tables in a dimension view. 첸 C. Do NOT use dynamic joins if you require optimal performance. 첸 D. Always use referential joins in star join views. 4. What performance techniques should you implement to improve join performance? (There are 3 correct answers.) 첸 A. Use unions instead of joins for combining large data sets. 첸 B. Always specify the cardinality of a join. 첸 C. Join as many tables as possible in a star join view’s data foundation. 첸 D. Use left outer joins instead of right outer joins. 첸 E. Always mark joins as dynamic to improve performance. 5. What performance technique should you implement to improve the performance of SAP HANA information views? 첸 A. Build single large views. 첸 B. Implement union pruning conditions. 첸 C. Output as many fields as possible. 첸 D. Supply reporting tools with all the data in one transfer. Practice Questions Chapter 12 6. What performance technique should you implement to improve the performance of SAP HANA information views that include SQLScript? 첸 A. Use imperative logic. 첸 B. Use SELECT *. 첸 C. Break large statements into smaller steps. 첸 D. Use dynamic SQL. 7. What performance techniques should you implement to improve the performance of SAP HANA information views? (There are 3 correct answers.) 첸 A. Perform calculations before aggregation in your analytic views. 첸 B. Minimize the transfer of data between the execution engines. 첸 C. Investigate partitioning of large tables. 첸 D. Apply filters as late as possible. 첸 E. Push down aggregations to SAP HANA. 8. What information can you find in the plan graph? (There are 2 correct answers.) 첸 A. Execution engines used 첸 B. Number of records returned in each step 첸 C. Time taken for each step 첸 D. Plan trace 9. What information can you find in the SAP HANA cockpit? (There are 2 correct answers.) 첸 A. SQL generated in each step 첸 B. Trace and log files 첸 C. List of operators 첸 D. Plan trace 545 546 Chapter 12 Optimization of Information Models 10. What information can you find in performance analysis mode when you view the analysis for a Join node? (There are 2 correct answers.) 첸 A. Suggested filter for the tables in the Join node 첸 B. Total number of records returned by the node 첸 C. Time taken by the join 첸 D. Tables used by the join in the node Practice Question Answers and Explanations 1. Correct answer: B The combination of in-memory technology and multicore CPUs enabled realtime computing. Solid-state disks are still disks and relatively slow compared to memory. Multicore CPUs with large level 1 caches help, but also need in-memory technology because level 1 caches are quite small, for example, 256 KB. 2. Correct answers: A, B The advantages of columnar storage include improved compression and datareading performance. (Because data is compressed, you can read more data at the same I/O speed.) Improved data-writing performance is for row storage. Neither column nor row has improved performance when all fields are selected via SELECT *. 3. Correct answer: B Joining on key fields between tables in a dimension view will improve join performance. The answer “Use joins instead of unions for combining large data sets” has joins and unions swapped around: you should use unions rather than joins. Dynamic joins are created to improve performance. The word always makes the “Always use referential joins” answer wrong. 4. Correct answers: A, B, D The following are the correct performance techniques to improve join performance: use unions instead of joins for combining large data sets, always specify the cardinality of a join, and use left outer joins instead of right outer joins. You should try to use only one table in a star join view’s data foundation. Practice Question Answers and Explanations Chapter 12 The word always makes the “Always mark joins as dynamic” answer wrong. 5. Correct answer: B Implementing union pruning conditions will improve the performance of SAP HANA information views. You shouldn’t build a single large view; instead, build views up in layers. Use SAP HANA Live as an example. “Output as many fields as possible” is another way of saying “Use SELECT *”; don’t do it. Instead, supply reporting tools with only the data they need right now, make them interactive, and rather use multiple small requests. 6. Correct answer: C Breaking large statements into smaller steps will improve the performance of SAP HANA information views that include SQLScript. Do NOT use imperative logic, SELECT *, or dynamic SQL if you need optimal performance. 7. Correct answers: B, C, E The following are the performance techniques you can implement to improve the performance of SAP HANA information views: minimize the transfer of data between execution engines, investigate partitioning of large tables, and push down aggregations to SAP HANA. “Calculation before aggregation” is reversed; to be correct, it should say “aggregation before calculation.” Filters should be applied as early as possible. 8. Correct answers: B, C The plan graph shows the number of records returned and the time taken in each step. We don’t focus on the execution engines used. The SAP HANA cockpit shows the Plan Trace. 9. Correct answers: B, D The SAP HANA cockpit shows the Plan Trace feature and the trace and log files. The debug query tool shows the SQL statements generated by the nodes in a calculation view. The SQL Analyzer tool show the operators used. 10. Correct answers: B, D Performance analysis mode shows the total number of records returned and the tables used. 547 548 Chapter 12 Optimization of Information Models Takeaway You now have a checklist of modeling practices to ensure optimal performance for SAP HANA information models, including guidelines for creating optimal SQL and SQLScript. You can use the SAP HANA cockpit to identify long-running and expensive statements and to perform traces of specific queries and user activities. Performance analysis mode helps improve your SAP HANA information models by providing optimization suggestions. Summary You’ve seen that SAP HANA has been built from the ground up with many performance enhancements to enable high-performing processes. However, you still have to do your part if you expect the best results. You’ve now created SAP HANA information models with graphical views and with SQL and SQLScript. You’ve also learned how to optimize your information models. In the next chapter, we’ll look at how to apply security to your finished SAP HANA information models. Chapter 13 Security Techniques You’ll Master 쐍 Know when to use the different SAP HANA security features 쐍 Understand how users, roles, and privileges work together 쐍 Know how to create new users and roles 쐍 Learn the various types of privileges 쐍 Explain analytic privileges 쐍 Assign roles to users 쐍 Be familiar with the SAP HANA Deployment Infrastructure (HDI) and SAP HANA extended application services, advanced model (SAP HANA XSA) security concepts 550 Chapter 13 Security In this chapter, we’ll discuss how to implement security on the information models you’ve built. Security is frequently a specialist area, with people dedicated to looking after system security and user administration. We won’t go into all the details of SAP HANA security in this chapter, but we’ll give you a good overview to help you with most of the scenarios you’ll encounter in your day-to-day information-modeling work and those that are covered on the exam. Real-World Scenario You get an error message in SAP HANA indicating that you don’t have sufficient privileges to do a certain task. In cases like these, you’ll need to know how to resolve this kind of error message. For this situation, the answer is simply granting a right to a special user. In some cases, you may need to build special roles and privileges. In other cases, you may be asked by the project manager to assist with setting up the security on the information models. Because you built these information models in SAP HANA, you’ll have a good idea of which users should have certain privileges. You might also have to comply with certain regulations to ensure data security and privacy by using data masking and data anonymization techniques. Objectives of This Portion of the Test The purpose of this portion of the SAP HANA certification exam is to test your knowledge of SAP HANA security. The certification exam expects you to have a good understanding of the following topics: 쐍 When to use SAP HANA security, and when to use application security 쐍 How users, roles, and privileges work together 쐍 Understanding system users and how to create new users 쐍 Working with analytic privileges 쐍 Assigning the various types of privileges to a role 쐍 Building a new role, and assigning it to a user Key Concepts Refresher Chapter 13 쐍 Understanding HDI and SAP HANA XSA security concepts 쐍 Using masking and anonymization for data security and privacy Note This portion contributes about 5% of the total certification exam score. Key Concepts Refresher Security determines what people are allowed to do and see in business applications. Although we want to stop any unauthorized people from getting access to company information, we also want to allow business users to do their work without hindrance. There are various ways to achieve this balance in SAP HANA. In this section, we’ll discuss the basic use cases for security in SAP HANA and important security concepts. We’ll then look at how to create users, how these users can access the system, and what roles and privileges these users should get to achieve their goals. We’ll also discuss data security in general and data masking in particular. Usage and Concepts Let’s begin by looking at when to use security measures in SAP HANA before tackling important concepts. Use Cases The use cases for implementing security in SAP HANA depend on the deployment scenario. Let’s look at the different deployment scenarios and their security management: 쐍 SAP HANA as a database In a deployment scenario in which SAP HANA is used as the database for business applications such as SAP ERP, SAP Customer Relationship Management (SAP CRM), and SAP Business Warehouse (SAP BW), you don’t need to create SAP HANA users and roles for the business users. These users log in to the various SAP systems via the SAP application servers, and all end-user security is managed by these applications. This is shown on the left side of Figure 13.1. You’ll still need to create a few SAP HANA users and roles for the system administrator users, but you don’t need to create anything for the business users. 551 552 Chapter 13 Security SAP ERP Browser and mobile SAP HANA SAP CRM Enterprise data warehouse (SAP BW) SAP HANA SAP HANA SAP HANA XSA SAP HANA Non-SAP Database SAP HANA as a Database SAP HANA as a Platform Figure 13.1 Security for SAP HANA as Database vs. Platform 쐍 SAP HANA as a platform In deployment scenarios in which SAP HANA is used as a platform, end users need to log in and work directly in the SAP HANA system. In such cases, you might need to create users and assign roles for these users. In this chapter, we’ll focus primarily on these scenarios. 쐍 SAP HANA Live For SAP HANA Live, which we’ll discuss in Chapter 14, you also need end users to log in and work in the SAP HANA system. However, sometimes they also work in SAP HANA Live models via SAP applications. In this case, you need to manage the users in the SAP applications and use the Authorization Assistant tool to automatically create the correct SAP HANA Live users and roles inside the SAP HANA system. This chapter won’t focus on this use case. We’ve mentioned users and their roles in this section. In the next section, we’ll look at how these concepts work together in SAP HANA. Key Concepts Refresher Chapter 13 Security Concepts Figure 13.2 illustrates how privileges, roles, and users fit together in SAP HANA: 쐍 Privileges Privileges are the individual rights you can grant to a role. These will finally allow a user to execute a specific action in the system. 쐍 Roles A role in SAP HANA is a collection of granted privileges. You can combine several roles into another role, which is called a composite role. 쐍 Users Users are identities in the SAP HANA system that you assign to business people, modelers, developers, or administrators using your system. You don’t assign privileges directly to users. The best practice is to always assign privileges to a role and then assign that role to users. Composite role User Role 1 System Role 2 Analytic Application Privileges Figure 13.2 Assign Privileges to Roles and Roles to Users Object 553 554 Chapter 13 Security We start the security process shown in Figure 13.2 from the bottom up. We start with privileges, assign certain privileges to a role, and then assign that role to the users. The privileges and roles all add up to give a complete portfolio of what users are allowed to do. If something isn’t specified in a granted role or privilege, SAP HANA denies access to that action. Note In SAP HANA, the default security behavior is to deny access. You can’t do anything in the system until you’ve been granted a privilege. In security, it’s also important to know the difference between authentication and authorization: 쐍 Authentication Authentication answers the “Who are you?” question. It’s concerned with identifying users when they log in and making sure they really are who they claim to be. In SAP HANA, this is set up when we create a user. 쐍 Authorization Authorization answers the “What are you allowed to do?” question. This identifies which tasks a user can perform in the system. In SAP HANA, this is set up when we create a role and assign that role to a user. Now that we have the basic concept of roles and users, let’s take an in-depth look at how security has changed with SAP HANA XSA and HDI. SAP HANA XSA and SAP HANA Deployment Infrastructure If you’ve worked with earlier versions of SAP HANA, you might get some unexpected surprises when you try do things the way you’ve always done them. In the following sections, we’ll provide you with an understanding of the SAP HANA XSA and HDI security concepts. In many places, we’ll focus on how these security concepts differ from previous versions. Application and Database We already discovered in Chapter 4 that SAP HANA XSA can either run on the same hardware as the SAP HANA database or can be deployed on separate hardware. One of the consequences of this is that the SAP HANA XSA application has to be Key Concepts Refresher Chapter 13 separate from the database, which then leads to a separation of business (application) users and technical (database) users, such as modelers, developers, and administrators. Business users aren’t stored in the SAP HANA system but are managed from the Identity Provider (IdP) system. Users are created and managed, roles are assigned, passwords updated, and users deactivated from the IdP system. The User Account and Authorization (UAA) component in SAP HANA XSA communicates with the IdP system, as illustrated in Figure 13.3. Business users Identity Provider (IdP ) (use SAML) Design-time objects (“schema-free”) Application Container User Account and Auth (UAA) Deployed into schemas Application privileges (scopes) SAP HANA XSA Multi Target Application (MTA) SAP HANA Database Technical users User C1#00 (object owner) SAP HANA Deployment Infrastructure (HDI) – Contains run-time objects C1 (HDI Container) C1#DI CV1 (Calculation View) CV2 (Calculation View) T3 (Table) Classical DB schema T1 (Table) Role C2 ::access (Granted to C1#00) C2 (HDI Container) S1 (Synonym) with user-provided service S2 (Synonym) T2 (Table) with HDI service Figure 13.3 An Overview of the SAP HANA XSA Security Concepts Modelers, developers, administrators, and database object owners are, however, still managed from the SAP HANA database. Just like there is a separation of application users and database users, we also have a separation of application users and database privileges. 555 556 Chapter 13 Security Design-Time and Runtime Objects The next big change is that design-time objects are now separate from runtime objects. We used to store the design-time objects in the SAP HANA system where we finally used it. If you used the same object in a development system, a test system, and a production system, you would store three separate copies of the same object. These local copies of the objects were then “activated” to become runtime objects in the same SAP HANA system. These activated (runtime) objects belonged to the _SYS_REPO user. We now store a single copy of the design-time object in a Git repository and “deploy” it to multiple systems. Modelers and developers create such design-time objects, such as calculation views and procedures (refer to the top-right area of Figure 13.3). System administrators and database administrators (DBAs), on the other side, manage the deployment process and the runtime objects in the containers of the SAP HANA database. The biggest difference for modelers and developers is that we now work in a “schema-free” way. Because the same design-time object can be deployed to multiple systems, such as testing and production, we can’t explicitly reference database schemas. Schema names can be different for different HDI containers. The actual schema names are assigned as part of the deployment process. By default, SAP HANA Studio always created roles as runtime objects in the local SAP HANA system. These runtime roles can’t be transported from development to testing and production systems. In SAP Web IDE for SAP HANA and the SAP HANA cockpit, we now create designtime roles that can be deployed to multiple target systems. Containers Isolate Schemas In SAP HANA 1.0, in general, we had several schemas in the Catalog area. Each schema was basically just a collection of SAP HANA runtime objects. There was no isolation of schemas, so it was easy to create a calculation view that joined tables from two different schemas. We merely used explicit references to the tables. If we now try to do the same in an HDI container, we get a You are not authorized message. In the bottom half of Figure 13.3, you see an illustration of how we should do this in HDI containers. Key Concepts Refresher Chapter 13 HDI containers contain similar runtime objects as in the classical schemas in the Catalog area. These could be tables, indexes, calculation views, table functions, and procedures. Each HDI container is equivalent to a database schema, but with some important differences and extras. The word “containers” implies that they are isolated from each other and from the “outside world.” This is the main difference to keep in mind as we discuss HDI containers. The “extras” we mentioned are as follows: 쐍 Each HDI container has its own technical user, that is, the owner of all the runtime objects in the container (schema). For a container named C1, this user will be called C1#00. This is different from before, where the _SYS_REPO user was the owner of all runtime objects in the entire SAP HANA system. 쐍 Each HDI container has another associated container that is used for application programming interfaces (APIs) and storage. For a container named C1, this associated container is called C1#DI. 쐍 Each HDI container has a generic default role that is automatically created when you create the container. For a container named C1, this role is called C1::access. This role (shown much later in this chapter in Figure 13.10), contains a role for running services that allows external users data access to this HDI container (schema), as well as object privileges to read, insert, update, and delete data. Let’s return to our example of creating a calculation view that joins tables from two different schemas. Referring to Figure 13.3, we do our modeling in an HDI container called C1. We create our calculation view, called CV2, that joins table T1 from a classical schema with table T2 in another HDI container C2. Because container C2 is isolated, we need permission to read the table. We need to assign the C2::access role to the C1#00 user and create a synonym S2 to point to table T2. Synonym S2 uses the built-in HDI service to read table T2. When we read table T1 in the classical schema, we also have to create synonym S1 to read the table. But because this table is in a classical schema, we can’t use the HDI service, and we have to use a user-defined service. We also need the C2::access role assigned to the C1#00 user. If we access a table in the same HDI container, such as table T3, this works as usual. No extra steps are required because this table is located in the same container. 557 558 Chapter 13 Security Note HDI containers only work with the “newer” types of SAP HANA modeling and development objects. You can’t use attribute views, analytic views, scripted calculation views, or XML analytic privileges. Classical Schemas versus SAP HANA Deployment Infrastructure Containers Table 13.1 gives a summary of the differences between the SAP HANA 1.0 classical schemas and SAP HANA 2.0 HDI containers. We do generalize a bit with these category descriptions because HDI containers were already available in SAP HANA 1.0, and you can still use classical schemas in SAP HANA 2.0. But the focus in SAP HANA 1.0 was on classical schemas, and in SAP HANA 2.0, the focus is on HDI containers. SAP HANA 1.0 Classical Schemas SAP HANA 2.0 HDI Containers Schemas and explicit references Containers and synonyms Schema-specific design Schema-free design Attribute views, analytic views, scripted calculation views Calculation views, table functions XML analytic privileges and SQL analytic privileges SQL analytic privileges Application and database privileges together in the SAP HANA system Application privileges separate from database privileges Business users and technical users together in the SAP HANA system Business users via IdP, technical users in the SAP HANA system One technical user: _SYS_REPO Technical user per HDI container: <CONTAINER-NAME>#00 Design-time objects stored in the SAP HANA repository Design-time objects in the Git repository Activation Deployment Runtime roles (SAP HANA Studio) Design-time roles Table 13.1 Comparing Security on SAP HANA 1.0 Classical Schemas with Security on SAP HANA 2.0 HDI Containers With that background, we can now start looking at how to create users and roles in SAP HANA. Key Concepts Refresher Chapter 13 Users In this section, we’ll start by looking at how to quickly create new developer users on a new SAP HANA development or sandbox system. We’ll then show how to create and manage other users. Creating and Assigning SAP HANA XSA Development Users Modelers, developers, and administrators are normally the first to get working on a new system. You can quickly create new modeler and developer users for SAP HANA XSA using the SAP HANA XSA Administration tool. You see the tool’s launchpad menu in Figure 13.4. (We discussed this tool in Chapter 4.) Simply click on the User Management menu option. You’ll see a list of existing SAP HANA XSA users. Then click the New User button on the left side of the work area to create a new SAP HANA XSA user. Figure 13.4 SAP HANA XSA User Management After you’ve created the modeling or development user, you can add a Role Collection for this user, as shown in Figure 13.5. This will assign the user the relevant privileges to do the functions described below each role name. 559 560 Chapter 13 Security Figure 13.5 Assigning the Role Collection to an SAP HANA XSA User The last step is to assign your newly created modeling user to a development space. (We discussed spaces in Chapter 4.) In the SAP HANA XSA Administration tool, select the Organization menu option, as shown in Figure 13.6. For SAP HANA, express edition, the organization is HANAExpress, and the space you need to choose in this case is development. Click the Add Members button, and type the name of the new user. Select whether the new user is a Developer, Manager, or Auditor of the space. For modeling, you specify the Developer option. Then click the OK button. It’s a best practice to create your own user instead of using the initial administration users, called SYSTEM and XSA_Admin. The SYSTEM user is the main SAP HANA user. The XSA_Admin user is the main SAP HANA XSA user. When the system is installed, the administration users are required to get things going. However, don’t be surprised when you very quickly get authorization messages saying that the SYSTEM user doesn’t have privileges to perform certain tasks. Remember that, by default, Key Concepts Refresher Chapter 13 SAP HANA denies access to everything. Any new SAP HANA containers or models that you’ve created since the system installation are new objects. You have to explicitly grant access to these new models to the SYSTEM user before the user is allowed to use them. Figure 13.6 Assigning the SAP HANA XSA User to a Space Note Auditors normally require that the SYSTEM user is locked and disabled in an SAP HANA system to improve security. SPS 04 SAP HANA has never had the ability to manage a group of users. When you assign roles, it was always to individual users. In SAP HANA 2.0 SPS 04, we now have the ability to manage user groups. This new feature is available in the SAP HANA cockpit. 561 562 Chapter 13 Security You can create a user group and add users to the group. You can also configure the user group, for example, by setting a password policy for all users in the group. Every user group can have its own dedicated administrators, and only those designated administrators can manage the users in this group. Creating and Managing Users To begin using security in SAP HANA, open the SAP HANA cockpit, as discussed in Chapter 4. Go to the User & Role Management area in the SAP HANA cockpit, as shown in Figure 13.7. Here we can manage users for the HDI. Figure 13.7 Security Areas in the SAP HANA Cockpit In this area, you can start creating users and roles by selecting the Manage Users link. In Figure 13.8, we see an existing modeling user. This is the user we created in Figure 13.4 using the SAP HANA XSA Administration tool. We could also have created the same user in the SAP HANA cockpit. This user was created as a restricted user, meaning it will access the system via HTTP. This is exactly what we do when we use SAP Web IDE for SAP HANA. To Key Concepts Refresher Chapter 13 create a new user, click the + button on the bottom left of the screen. You can choose if the new user should be a normal user or a restricted user. Figure 13.8 Existing Modeling User in SAP HANA The New User or New Restricted User dialog will appear, as shown in Figure 13.9. You can choose a User Name for the user and a validity period. In the AUTHENTICATION area, you can specify if you can log in using a user name and password, or whether you should use one of the many single sign-on (SSO) options available in SAP HANA. The most popular SSO option is Kerberos, which most companies use together with their Microsoft Active Directory services. When a user logs in to the company’s network infrastructure, the Kerberos server issues a security certificate. When that user gets to the SAP HANA system and the Kerberos option is enabled for that SAP HANA user, the system will automatically ask for and validate this certificate and will then allow the user access to the SAP HANA system. With SSO, users don’t have to enter a user name and password each time they use the SAP HANA system. 563 564 Chapter 13 Security Figure 13.9 Screen for Creating a New User Key Concepts Refresher Chapter 13 Tip Users in an SAP HANA system aren’t transportable. You can’t create users in a development system and transport them to the production system with all their roles and privileges. This is by design, as you don’t want development and modeling users in the production system, and you don’t want end users in the development system. Roles In the previous section, we looked at creating users and assigning them roles. In this section, we’ll take a deeper dive into SAP HANA’s roles and discuss the various template roles available and the steps for creating a role in the system. Template Roles SAP provides a few template roles that you can use as examples for building your own, or you can use them as a starting point and extend them. We don’t recommend modifying any of the template roles directly; instead, copy them and modify the copy. The following two template roles are important: 쐍 MONITORING This is a role that provides read-only access to the SAP HANA system. You provide this, for example, to a system administrator or monitoring tool that needs to check your system. Users that have this role assigned can’t modify anything in the system. 쐍 MODELING This role is given to modelers so they can create information models in the SAP HANA system. We don’t recommend giving this role to any user in the production system. Users with this role assigned can create and modify objects in the system and can read all data, even sensitive or confidential data. These types of roles are also known as global roles. They are schema-independent. Creating Roles in SAP HANA Cockpit You can see all the roles in SAP HANA by clicking the Manage Roles link in the SAP HANA cockpit, as shown earlier in Figure 13.7. A list of all the roles in SAP HANA, as shown in Figure 13.10, will show in the left column. When you select a role, the details appear on the right side. 565 566 Chapter 13 Security Figure 13.10 List of Roles in SAP HANA, with the ::access Role Selected To create a new role, click the + button on the bottom left of the screen. In Figure 13.10, we selected the ::access role for our SAP HANA Interactive Education (SHINE) container. We discussed this role earlier. Here you can see the object privileges to read (SELECT), INSERT, UPDATE, and DELETE data. At the bottom, you find the role assigned, and the Grantor is the container#00 user that owns all the runtime objects in this container. SPS 04 Finding out which authorizations to include in a role can take time, especially when you’re trying to find out why a role doesn’t work and what authorization is missing. SAP HANA 2.0 SPS 04 has new simplified authorization troubleshooting to help with finding out what privileges are required when a user is denied access to a view. When an Insufficient Privilege error is generated, the user will receive a GUID—something like 7DO69BEEB327067947CA291F4CA3DFFF. You can use this GUID to retrieve information about the missing privilege by using a system procedure, as follows: call SYS.GET_INSUFFICIENT_PRIVILEGE_ERROR_DETAILS ('7DO69BEEB327067947CA291F4CA3DFFF', ?) Key Concepts Refresher Chapter 13 SPS 04 also includes a new Authorization Dependency Viewer in the SAP HANA cockpit, which also lets you visually inspect the privilege structure on a calculation view. This is a much easier way to troubleshoot missing privileges. Creating Roles in SAP Web IDE for SAP HANA The last method of creating roles is by creating .hdbrole files in SAP Web IDE for SAP HANA. These are design-time files and can thus be deployed to multiple target systems. SAP Web IDE for SAP HANA is still more developer-focused than modeler-friendly for creating roles. It’s easier to create roles in the SAP HANA cockpit graphical editor than SAP Web IDE for SAP HANA code editor (shown in Figure 13.11) used for creating and editing .hdbrole files. Figure 13.11 Role Created in SAP Web IDE for SAP HANA 567 568 Chapter 13 Security The example in Figure 13.11 is the admin.hdbrole role in the SHINE demo. Tip The .hdbrole files are design-time objects that can be transported to the production system. This improves security because you don’t need someone with rights to create roles in the production system. Earlier, we mentioned that you should always create design-time files in a schemafree manner. You can see that that the illustrated role doesn’t contain any schema names. But what should you do if you really need to create a schema-specific authorization where you have to specify the schema name? In this case, you can create a single .hdbroleconfig file that contains the name of the schema, as shown in Listing 13.1. { "DemoRole":{ "SchemaReference1":{ "schema":"DemoSchema1" } } } Listing 13.1 A Sample .hdbcroleconfig File You can now refer to this .hdbroleconfig file in many .hdbrole files, as shown in Listing 13.2. "schema_roles":[ { "schema_reference":" SchemaReference1", "names": ["Role1", "Role2"] } ] Listing 13.2 .hdbrole File Referencing the .hdbroleconfig File Instead of having to modify the schema name in many .hdbrole files, you only have to update the single .hdbroleconfig file. Tip You should never create roles in the SQL Console using SQL statements. These roles are immediately created as runtime objects and can’t be transported (deployed) to other systems. Key Concepts Refresher Chapter 13 Tip If you want to give users with this role the right to assign the role to others users, you need to include WITH GRANT OPTION privileges in the role. In this case, the name of the role must end with a hash (#), for example, role#. Privileges In this section, we’ll discuss the different types of privileges available in SAP HANA. We’ve already seen schema privileges in Figure 13.11. In Figure 13.10, shown earlier, we can see the following tabs for five other types of privileges: 쐍 System Privileges 쐍 Object Privileges 쐍 Analytic Privileges 쐍 Application Privileges 쐍 Package Privileges When you build or extend a role, you can assign these different types of privileges. Each of these privileges has its own tab in the Manage Roles editing screen. You can add new privileges in each tab by clicking on the Edit button. A dialog box will present a list of relevant privileges to choose from. All of these dialog boxes have a text field in which you can enter some search text. The list of available privileges will be limited according to the search text you type. Tip You can only add privileges. There is no concept in SAP HANA of taking away privileges because the default behavior is to deny access. System Privileges System privileges can be associated with system administration. These privileges grant users the rights to work with the SAP HANA system. Most of the system privileges are applicable to DBAs and system administrators. Figure 13.12 shows an example of system privileges. 569 570 Chapter 13 Security Figure 13.12 Assigning System Privileges to Roles Object Privileges We’ve seen object privileges used in the ::access role, as shown earlier in Figure 13.10. Object privileges are used to assign rights for users to data source objects, including schemas, tables, virtual tables, procedures, and table functions. Granting object privileges requires a very basic knowledge of SQL syntax. This makes sense because you’re working with data sources. You’ll have to know basic SELECT, INSERT, UPDATE, and other SQL statements. Figure 13.13 shows an example of assigning object privileges. Figure 13.13 Assigning Object Privileges to Roles Key Concepts Refresher Chapter 13 When you grant, for example, a SELECT object privilege, the user will be able to read all the data in the table or view on which you granted that object privilege. Analytic Privileges With object privileges, you can grant rights for someone to read all the data from a data source—but how do you limit access to sensitive and confidential data, or limit access to the data that people can see or modify to certain areas only? For example, you may need to ensure that managers can only edit data of employees reporting directly to them, that someone responsible for only one cost center accesses only the correct information, or that only board members can see financial results before they are announced. Analytic privileges can be used to limit data at the row level. This means that you can limit data, for example, to show only rows that contain data for 2019, the employees in your team, or your cost center. Tip We’re sometimes asked how to limit data at the column level. For example, a company wants to allow employees to see all their colleagues’ details, except their salaries. How do we hide the Salaries column? This isn’t possible with analytic privileges, but it’s possible with views. Simply create a view on the employees table, make sure the Salaries column isn’t part of the output field list, and then grant everyone access to the view. Figure 13.14 shows how to assign analytic privileges to a role. The figure shows only a small portion of the work involved. Here, we assign the previously created analytic privileges to a role. We’ll look at how to create these analytic privileges later in this section. Figure 13.14 Assigning Analytic Privileges to Roles 571 572 Chapter 13 Security Tip In Figure 13.14, the role was assigned an analytic privilege called _SYS_BI_CP_ALL. Be careful when assigning this analytic privilege; it grants access to the entire data set of the information views involved. It effectively bypasses the analytic privilege restrictions defined on a view. This analytic privilege shouldn’t be granted to end users in a production system. Note that this privilege is part of the MODELING role, as shown here. The MODELING role shouldn’t be granted to any end users in a production system as this will allow them to see all the data in the production system! Think about how analytic privileges behave when they are used with layered information views. Assume you need access to information about your cost center. Let’s view this from the reporting side. You run a report that it calls view A. View A then calls another view, called B. (View A is built on top of view B.) You can only see your cost center information if you have an analytic privilege that allows you to see the information being assigned to both views. If you have the analytic privilege assigned to only one of the two views, you won’t be able to see your cost center information. Tip It will help you to think of analytic privileges across multiple stacked views as an inner join. Only the analytic privileges specified in both views are applied. Another aspect of analytic privileges to think about is what result you want to achieve when using multiple restrictions. Let’s assume you have two restrictions: the first is year = 2018, and the second is cost center = 123. 쐍 If you need year = 2018 AND cost center = 123, you must specify both restrictions in a single analytic privilege. In this case, you’ve limited what the user sees even more than when you use only one of the restrictions. The AND clause restricts the results more. 쐍 If you need year = 2018 OR cost center = 123, you must create two separate analytic privileges and assign both to the same view. In this case, you’ve increased the amount of data in the result compared to just using one of the restrictions. The OR clause lets the user see more. Tip Analytic privileges are the most-used privileges for information-modeling work. Key Concepts Refresher Chapter 13 You can create analytic privileges in the same context menu where you created all your SAP HANA information views. In the context menu, select New • Analytic Privilege. This creates an .hdbanalyticprivilege file. In the next step, choose the information views that you want to restrict access to. After you’ve selected the information views, you can create the row-level access restrictions. On the left side of the screen in Figure 13.15, you can add more information views. On the right side of the screen, create your restrictions. Figure 13.15 Assigning Analytic Privilege Restrictions There are three ways to specify the restrictions: 쐍 Attributes With this option, you first select the attributes that you want to create the restrictions on. In this example, we chose the COMPANYNAME field. You then have to specify the restrictions. In this example, we restrict the supplier to be SAP only. In Figure 13.15, we show Attributes restrictions, which is the default. You can also restrict the validity period of the analytic privilege, as shown in Figure 13.16. 573 574 Chapter 13 Security Figure 13.16 Specifying the Validity Period of an Analytic Privilege 쐍 SQL Expression If you used the running example with the Attribute option and then changed the selection to SQL Expression, SAP HANA Studio automatically converts this to the equivalent of an SQL WHERE clause—in this example: (COMPANYNAME= 'SAP') We can change the restriction type from the dropdown menu on the top right, as shown in Figure 13.17. We merely changed the Attribute restrictions from Figure 13.16 to an SQL Expression type restriction. The system automatically converted it into the SQL WHERE clause shown. This included the validity period. Figure 13.17 Changing the Restriction Type to an SQL Expression Key Concepts Refresher Chapter 13 쐍 Dynamic The last option is the most powerful. We can also use a procedure to create the WHERE clause dynamically. This must be a read-only procedure that only has one output parameter. The output it generates must be the same as the WHERE clause shown earlier, for example, (COMPANYNAME= ‘SAP’). In Figure 13.18, we select a procedure. You can also use a synonym that points to a procedure in another HDI container or schema. Figure 13.18 Selecting a Procedure for Generating Dynamic Analytic Privileges As a modeler, you’ll sometimes get error messages. One of the most common error messages when working with analytic privileges occurs when you build the analytic privilege. You’ll get a Cannot use structured privilege for view not registered for STRUCTURED PRIVILEGE CHECK error message. The Build will fail as shown in Figure 13.19. In your analytic privilege, you specified the views that this analytic privilege applies to. One of these views isn’t configured to use SQL analytic privileges. 575 576 Chapter 13 Security To solve the error, open the view. In the Semantics node of the information view, set the Apply Privileges property to SQL Analytic Privileges, as shown in Figure 13.19. The analytic privilege should now build successfully. Figure 13.19 Solving the Build Error for an Analytic Privilege Application Privileges Application privileges are used to grant rights to SAP HANA extended application services, classic model (SAP HANA XS) applications. These application privileges are created by the developers of these web applications. In this step, you merely assign these privileges to a role. As discussed at the start of this chapter, with SAP HANA XSA, most of this moves from the SAP HANA database to the SAP HANA XSA applications. Key Concepts Refresher Chapter 13 Package Privileges Package privileges were used to package design-time objects for deployment to other servers, such as a production server. In SAP HANA 2.0 with SAP HANA XSA and HDI, we use multi-target application (MTA) archives for deployment. Package privileges are therefore not applicable in this case. Assigning Roles to Users Finally, we can assign our role with all its privileges to a user. To do so, go to the User and Role Management area in the SAP HANA cockpit, as shown earlier in Figure 13.7. Click the Assign role to users link. You can look for a specific user as shown in Figure 13.20. To assign additional roles to the selected user, click on the Edit button on the top left, and then click on the Assign Roles button. After you’ve added the roles, click the Save button. Figure 13.20 Assigning Roles to a User 577 578 Chapter 13 Security Data Security The topic of data security has always been important with SAP HANA because it’s used as a database and a development platform. How and by whom the data is consumed are inherent and expected features. One of the basic principles of system security is segregation of duties. This means that you’re not allowed to have only one person responsible for the administration of the security. Different people have different responsibilities, and the responsibilities are divided in such a way that one single person can’t create and assign rights to themselves or others that give them special privileges that can be used for corruption or fraud. As a modeler, you might be creating analytic privileges. Security administrators then add these as part of a role, and the system administrators can assign this role to people in the organization. The security auditors will check the system regularly to ensure that the segregation of duties principle isn’t violated. The system administrators set up the integration to the Lightweight Directory Access Protocol (LDAP) server, such as Microsoft Active Directory, to enable SSO in the SAP HANA system. They might also configure the Security Assertion Markup Language (SAML) servers for enabling SSO in web browsers. This is required for SAP HANA XSA to work securely in organizations. Roles are only set up in a development server; they are assigned to users in the production server. This segregates privileges across the landscape. Similarly, developers and modelers can only write code and create information models in the development server, but these are run by end users with the correct roles in the production server. SAP HANA has had data security features such as encryption of data files (data volumes, log volumes, and backups) and audit logs for many years now. These are configured by the system administrators and checked by the security auditors. You can also encrypt individual table columns with SAP HANA client-controlled keys. Data masking is used when you want to reduce the exposure of sensitive information. An example of data masking that you already know is when typing your password. Your password isn’t shown as text, but “masked” by displaying dots or blanks. Another typical use case is when working with credit card numbers. You have to hide this information from DBAs and users with broad access. Even users that Key Concepts Refresher Chapter 13 work directly with the user data, such as call center agents, shouldn’t see the credit card details. Data masking can completely hide such information, or partially hide parts of it. A call center agent might see just the last three digits of the credit card number to verify with the customer that the correct card is used for a transaction. Masking data in a column affects the display of the data. The data in the database stays unchanged and can be still used in other information models and programs. You can find the data masking feature in the Semantics node of a calculation view. In the Columns tab, you’ll see a Data Masking column, as shown on the right side of Figure 13.21. Here we created data masks for both the PHONENUMBER and ACCOUNTNUMBER fields Figure 13.21 Data Masking in the Semantics Node with a Data Masking Expression Clicking in the Data Masking column field, a dialog screen opens where you can add an Expression . You can create customizable mask expressions, and even use the built-in functions of the expression editor. In Figure 13.21 the account 579 580 Chapter 13 Security number—that could be a credit card number—is masked out, but it still shows the last three digits. Note Data masking only works for columns of type VARCHAR and NVARCHAR. Figure 13.22 shows what the masked data will look like to a user. Sensitive data in both the PHONENUMBER and ACCOUNTNUMBER fields is masked. Figure 13.22 Data Preview with Data Masking Let’s say that a few special end users, such as the call center managers, need to see the actual cleartext data? You could create another calculation view for these users, but that would require double maintenance each time the view needs to be updated. In this case, you grant the UNMASKED object privilege to the role assigned to these users, as shown in Figure 13.23. If you need to use SQLScript instead of graphical calculation views, you can use the WITH MASK clause in your SQLScript statements, for example, when creating tables or SQL views. This completes our tour of the security aspects you need for working with information models in SAP HANA XSA and HDI containers. Important Terminology Chapter 13 Figure 13.23 Granting the Unmasked Object Privilege Important Terminology In this chapter, we covered the following terminology: 쐍 Users Identities in the SAP HANA system that you assign to business people using your system. 쐍 Roles A collection of granted privileges in SAP HANA. You can combine several roles into another role, which can be called a composite role. 쐍 Template roles Templates that you can use to build your own roles. 쐍 Authentication Identifies users when they log in and ensures they are who they claim to be. 쐍 Authorization Identifies which tasks a user can perform in the system. 581 582 Chapter 13 Security 쐍 Privileges The individual rights you can grant to a role or a user. In SAP HANA, there are different types of privileges: – System privileges: These privileges grant users the rights to work with the SAP HANA system. Most of the system privileges are applicable to database and system administrators. – Object privileges: These privileges are used to assign rights for users to data source objects, including schemas, tables, virtual tables, procedures, and table functions. Requires very basic knowledge of SQL syntax. – Analytic privileges: These privileges control access to SAP HANA data models. There are two types of analytic privileges: classic XML-based analytic privileges and the new SQL-based analytic privileges. There is a migration tool to convert the old XML-based analytic privileges to the new SQL-based analytic privileges. 쐍 Restrictions in analytic privileges Row-level access restrictions that can be implemented using analytic privileges. Practice Questions These practice questions will help you evaluate your understanding of the topic. The questions shown are similar in nature to those found on the certification examination. Although none of these questions will be found on the exam itself, they will allow you to review your knowledge of the subject. Select the correct answers, and then check the completeness of your answers in the “Practice Question Answers and Explanation” section. Remember, on the exam, you must select all correct answers and only correct answers to receive credit for the question. 1. Which are the recommended assignments in SAP HANA? (There are 2 correct answers.) 첸 A. Privileges to users 첸 B. Roles to users 첸 C. Roles to roles 첸 D. Roles to privileges Practice Questions Chapter 13 2. In which SAP HANA deployment scenarios will end users be created as users in the SAP HANA system? (There are 2 correct answers.) 첸 A. SAP BW powered by SAP HANA 첸 B. SAP S/4HANA 첸 C. SAP HANA as a platform 첸 D. SAP HANA Live 3. True or False: The default behavior of SAP HANA is to allow access. 첸 A. True 첸 B. False 4. Your newly created analytic privilege does NOT want to build. What must you do? 첸 A. Grant rights to the _SYS_REPO user. 첸 B. Assign the ::access role to the HDI container user. 첸 C. Change the Apply Privileges property of the view. 첸 D. Change the analytic privilege to use an SQL Expression. 5. What do you have to assign to users to allow single sign-on (SSO)? 첸 A. SQL analytic privileges 첸 B. Kerberos authentication 첸 C. TRUST ADMIN system privilege 첸 D. Application privileges 6. True or False: Object privileges on a table with SELECT rights will allow you to access all the data. 첸 A. True 첸 B. False 583 584 Chapter 13 Security 7. What can you use to limit users to see only 2018 data? (There are 2 correct answers.) 첸 A. An object privilege 첸 B. A SQL analytic privilege 첸 C. A view with a filter 첸 D. A view with a restricted column 8. You have two calculation views for viewing manufactured products. The WORLD calculation view shows worldwide information. The INDIA calculation view is built on top of the WORLD calculation view and shows information for Indian products. What privilege must you assign to see your department’s specific data in a report built on the INDIA calculation view? 첸 A. An analytic privilege on the WORLD calculation view 첸 B. An analytic privilege on the INDIA calculation view 첸 C. An analytic privilege on both calculation views 첸 D. An object privilege on the WORLD calculation view 9. You want to read a table from an SAP HANA classical schema. What do you need to do to get it working? (There are 3 correct answers.) 첸 A. Use an explicit reference to the schema. 첸 B. Make use of a user-provided service. 첸 C. Make use of the built-in HDI service. 첸 D. Create a synonym. 첸 E. Assign the provided ::access role to your container’s technical user. 10. True or false: SAP HANA XSA application users are managed from an external Identity Provider (IdP) system. 첸 A. True 첸 B. False Practice Question Answers and Explanations Chapter 13 11. Where do you assign a user as a developer in a development space? 첸 A. SAP HANA XSA Administration tool 첸 B. SAP HANA cockpit 첸 C. SAP Web IDE for SAP HANA 12. What do you use to ensure that each account holder of a bank account can only see his or her own identity document details in a calculation view? 첸 A. Audit logs 첸 B. Analytic privileges 첸 C. Data masking Practice Question Answers and Explanations 1. Correct answers: B, C We can assign roles to other roles and roles to users in SAP HANA. We could assign privileges to users in SAP HANA 1.0, but this wasn’t recommended. In SAP HANA 2.0, this option was removed. We can’t assign roles to privileges. We can only assign privileges to roles. 2. Correct answers: C, D End users might be created as users in the SAP HANA system in the SAP HANA as a platform deployment scenario. In the case of SAP HANA XSA, where the application and database are separated, we also don’t need to create end users in the SAP HANA database. With SAP HANA Live, users are created in the SAP application server, and they are automatically created by the Authorization Assistant in SAP HANA. With SAP BW on SAP HANA and SAP S/4HANA, users are created in the SAP application servers, not in SAP HANA. 3. Correct answer: B False. The default behavior of SAP HANA is to deny access. Note True or False questions are included to help your understanding, but the actual certification exam doesn’t have such questions. 585 586 Chapter 13 Security 4. Correct answer: C If your newly created analytic privilege does NOT want to build, you must change the Apply Privileges property of the view. Granting rights to the _SYS_REPO user isn’t applicable to HDI containers because each container has its own technical user. Assigning the ::access role to the HDI container user is used for accessing data from another container and is unrelated to this issue. Changing the analytic privilege to use an SQL expression won’t solve the problem because all the types of restrictions in an analytic privilege are equivalent. An SQL expression isn’t a super power! 5. Correct answer: B You have to assign Kerberos authentication to users to allow SSO. 6. Correct answer: A True. Yes, an object privilege on a table with SELECT rights will allow you to access all the data. 7. Correct answers: B, C You can use a view with a filter or a SQL analytic privilege to limit users to see only 2018 data. A view with a restricted column will show all the data in all the fields, except in the restricted column. 8. Correct answer: C You need an analytic privilege on both calculation views to see the department’s specific data in a report built on the INDIA calculation view. If you only have access to one of the two views, SAP HANA will deny access to the other view, and your reporting data will not show. 9. Correct answers: B, D, E If you want to read a table from an SAP HANA classical schema, you need to make use of a user-provided service, create a synonym, and assign the provided ::access role to your container’s technical user. We work in a schema-free way with HDI containers, so you don’t use an explicit reference to the schema. The built-in HDI service is used when accessing a table from another HDI container, not from a classical schema. 10. Correct answer: A True. SAP HANA XSA application users are managed from an external IdP system. Summary Chapter 13 11. Correct answer: A You can assign a user as a developer in a development space using the SAP HANA XSA Administration tool via the Organization and Space Management tile. The SAP HANA cockpit is where you can create users and roles, as well as assign roles to users. You use SAP Web IDE for SAP HANA to create .hdbrole and .hdbanalyticprivilege files. 12. Correct answer: B You use analytic privileges to ensure that each account holder of a bank account can only see his or her own identity document details in a calculation view. Data masking hides data from anyone who isn’t allowed to see all the data Audit logs don’t control what data people can or can’t see. They just log what data people have seen. Takeaway You should now have a good overview of security in SAP HANA, especially as it relates to information modeling. You should understand when to use SAP HANA security and how users, roles, and privileges work together. You know how to create new users and roles and how to assign the various types of privileges, and you can explain analytic privileges and how to create them. In your journey, you also learned best practices and picked up practical tips to work with in your SAP HANA projects. Summary We started by looking at when to use SAP HANA security and when to use an application’s security. We then went into the details of creating users and roles and assigning privileges. Along the way, we discussed analytic privileges, which are particularly applicable to our modeling knowledge. We also looked at data security and privacy, using data masking and data anonymization. 587 588 Chapter 13 Security In our last chapter, we’ll explore how SAP HANA calculation views are used in SAP ERP systems running on SAP HANA, and how this has evolved on the journey to SAP S/4HANA. Chapter 14 Virtual Data Models Techniques You’ll Master 쐍 Understand the concepts behind SAP HANA Live 쐍 Get to know the two architecture options for SAP HANA Live 쐍 Learn about SAP HANA Live’s virtual data model (VDM) and understand what you’re allowed to use and modify within it 쐍 Choose the correct SAP HANA Live views for your reports 쐍 Know how to use the SAP HANA Live Browser tool 쐍 Know how to use the SAP HANA Live extension assistant 쐍 Design your own SAP HANA Live views 쐍 Extend and adapt standard SAP HANA Live views for your reporting requirements 쐍 Find out how the SAP HANA Live VDMs have evolved into the SAP S/4HANA embedded analytics 590 Chapter 14 Virtual Data Models Thus far, we’ve primarily discussed using SAP HANA as a modeling and development platform. In this chapter, we’ll turn our attention to the architecture deployment scenario, in which SAP HANA—via SAP HANA Live—is used as the database for SAP business systems such as SAP ERP, SAP Customer Relationship Management (SAP CRM), and SAP Supply Chain Management (SAP SCM). We’ll also look at how SAP HANA Live has evolved over the past few years into SAP S/4HANA embedded analytics. Real-World Scenario Imagine that you’re asked to help design and develop some new reports for a company—but these aren’t the usual management reports. The company wants to streamline its product deliveries. The company has been struggling to continuously plan the shipping of its products during the day. The company receives a shipping schedule from the data warehouse system every morning, but traffic jams and new customer orders quickly render this planned schedule useless. This causes backlogs at the loading zones and costs the company more in shipping because it can’t combine various customer orders in a single shipment. Someone has tried using a Microsoft Excel spreadsheet, but the data isn’t kept up to date, and it can’t consolidate the data from all the data sources. After hearing about SAP HANA’s ability to analyze data in real time, the company asks you to create some real-time delivery planning reports. You know that there are prebuilt SAP HANA Live information models available for the SAP ERP system, and you can use a few of these SAP HANA Live information models to meet this business need quickly. A few months later, the company migrated its SAP ERP system to SAP S/4HANA. You know what is different in the new SAP S/4HANA embedded analytics from the older SAP HANA Live, so you can guide the company to get the solution up and running quickly using the new virtual data model (VDM). Key Concepts Refresher Chapter 14 Objectives of This Portion of the Test The objective of this portion of the SAP HANA certification is to test your knowledge of SAP HANA Live and SAP S/4HANA embedded analytics concepts. The certification exam expects you to have a good understanding of the following topics: 쐍 SAP HANA Live concepts and architecture 쐍 A high-level overview of SAP HANA Live implementation 쐍 The architecture of SAP HANA Live with the VDM 쐍 How to find and consume the right SAP HANA Live information models using the SAP HANA Live Browser 쐍 How to extend and modify the available SAP HANA Live models to address business requirements 쐍 A high-level view of SAP S/4HANA embedded analytics, and how this differs from SAP HANA Live Note This portion contributes up to 5% of the total certification exam score. The exam questions for this portion include the topic of core data services (CDS), which we discussed in Chapter 5. Key Concepts Refresher Simply put, SAP HANA Live uses predelivered content in the form of calculation views for real-time operational reporting. SAP S/4HANA uses predelivered content in the form of ABAP CDS views. In this chapter, we’ll look first at the real-world scenario again. Then we’ll walk through the history of business information systems to understand the problem posed in this scenario better and determine how using SAP HANA Live and SAP S/4HANA embedded analytics addresses this problem. Then, we’ll look at what necessitated SAP HANA Live, similar to Chapter 3, in which we discussed the impact SAP HANA’s fast in-memory architecture had on systems that traditionally used slower disk memory. From there, we’ll dive into the architecture of SAP HANA Live, its VDM, its many views, and steps for installation and administration. After we’ve covered these foundational topics, we’ll shift our attention to the primary tools used within SAP HANA Live: SAP HANA Live Browser and SAP HANA Live extension assistant. Then we’ll discuss the steps for modifying SAP HANA Live information models. 591 592 Chapter 14 Virtual Data Models Finally, we’ll look at how the concepts of SAP HANA Live have evolved to SAP S/4HANA embedded analytics using ABAP CDS views. Reporting in the Transactional Systems Many years ago, SAP only had a single product, called SAP R/3. This offering blossomed into the SAP ERP system that is still used today. SAP ERP did everything for companies, from finances to manufacturing, procurement, and sales to delivery, payroll, and store management. The SAP ERP system was a transactional system, what we now call an online transactional processing (OLTP) system (also called an operational system). All reports were run from within this system, although they tended to be simple “listing” reports. Over time, the need arose for more complex reports. In response, more advanced analytics and dashboard tools were provided. Through these tools, people learned how to use cubes to analyze data. When developers tried to build these new analytics into the transactional system, they discovered that these new reporting features had a huge impact on the performance of the SAP ERP system. A single reporting user running a large cube could consume more data from the database than hundreds of transactional users. It was unacceptable that a single user could slow the entire system down, causing hundreds of other users to suffer the consequences of slow responses. To address such performance issues, all reporting data from the transactional system was copied to a newly invented reporting system. The era of data warehouses had arrived. All reporting data is updated from the transactional system to the data warehouse during the night. This way, the performance of the transactional system remained acceptable during the workday. The new data warehouses were called online analytical processing (OLAP) systems, and OLTP and OLAP went their separate ways (see Figure 14.1). This departure helped the performance of both systems but eventually led to countless other headaches. Because data was only copied once every 24 hours, reports were always outdated because yesterday’s data was used for the reports. We couldn’t see what was happening in the OLTP system in real time. People then started using Microsoft Excel spreadsheets to address some of the issues, which in turn led to more problems and risks (e.g., reports that didn’t agree, wrong formulas, and an inability to consolidate information). Key Concepts Refresher Chapter 14 SAP ERP, SAP CRM, SAP SRM, SAP GRC, … Data warehouse Data updated every night Database (OLTP) Database (OLAP) Figure 14.1 Separate OLTP and OLAP Systems Note Our real-world scenario at the start of the chapter provides a good example of how yesterday’s data and planning can’t address today’s traffic jams. SAP HANA can solve such problems because it can work with both OLAP and OLTP data in a single system, without causing slow performance when reporting users analyze large data sets. By introducing SAP HANA Live, which moves the operational reporting functionality from the data warehouse back into the operational system, SAP solved many of the problems we just mentioned. If you’re wondering if SAP HANA makes data warehouses such as SAP Business Warehouse (SAP BW) irrelevant, the short answer is no because data warehouses are multipurpose. In the real-world scenario we used for this chapter, we only looked at the operational reporting aspects of data warehouses, which aren’t particularly strong points for such systems. If you had a single operational system with a single warehouse for reporting, you could do everything with SAP HANA— but reality tends to be more complex. Most companies use their data warehouses to consolidate data from many systems, as shown in Figure 14.2. They use data warehouse features such as the following: 쐍 Data governance 쐍 Preconfigured content in the form of cube definitions and reports 쐍 Data lifecycle management, including data aging and a complete history of business transactions 593 594 Chapter 14 Virtual Data Models 쐍 Cross-system consistency and integration 쐍 Planning, consolidation, and consumption Legacy SAP CRM Data Warehouse SAP ERP Microsoft Database (OLAP) Database (OLTP) SAP Ariba SAP SRM Figure 14.2 Data from Multiple Business Systems Consolidated into the Data Warehouse for Reporting, Transactional History, and Aging Tip It makes no sense to try to get rid of data warehouses completely due to some operational reporting issues. A better solution is to focus on addressing the few problematic areas in which data warehouses aren’t optimal. Because SAP BW is primarily used for data warehousing, operational reporting needed to be shifted back to SAP ERP. To enable this functionality, SAP ERP now uses SAP HANA as a database. This allows companies to run their operational reports, even with complex analytics, on real-time SAP ERP data directly from the underlying SAP HANA database via calculation views. Key Concepts Refresher Chapter 14 Note The SAP HANA Live calculation views were built using the SAP HANA Studio based on tables in a classical database schema. They aren’t .hdbcalculationview files, and don’t use data sources in SAP HANA Deployment Infrastructure (HDI) containers. Up to now, we’ve focused on the SAP HANA extended application services, advanced model (SAP HANA XSA) concepts, whereas SAP HANA Live was built using the older SAP HANA 1.0 approach. SAP HANA Live SAP HANA Live is the tool used to install, select, maintain, and modify the SAP HANA calculation views used for operational reporting in the SAP ERP system. SAP HANA Live isn’t limited to SAP ERP but also is available for many other SAP business systems, such as SAP CRM, SAP SCM, SAP Governance, Risk, and Compliance (SAP GRC), and SAP Product Lifecycle Management (SAP PLM). Some of the benefits of SAP HANA Live include the following: 쐍 Real-time reporting and analytics are run on up-to-date information. 쐍 SAP HANA doesn’t need to store precalculated aggregates in cubes. 쐍 OLTP and OLAP run in the same system, which can help reduce the total cost of ownership (TCO). 쐍 Access is provided for any reporting tool or application. 쐍 Customers and partners can extend SAP HANA Live to suit their needs. Now that you have a better understanding of how SAP HANA Live came to be, the following subsections will dive further into SAP HANA Live’s architecture, VDMs, views, administration procedures, and modifications. SAP HANA Live Architecture SAP HANA Live needs an SAP HANA database. Connecting to an SAP HANA database can be achieved in two ways (in this case, we just concentrate on the SAP ERP system, but this also applies to all the other SAP business systems such as SAP CRM, SAP SCM, SAP GRC, and SAP PLM; see Figure 14.3): 쐍 SAP HANA as a database The first option is to migrate an SAP ERP system’s database to SAP HANA. No additional systems are required. This is also known as the integrated scenario. 595 596 Chapter 14 Virtual Data Models 쐍 SAP HANA as a side-by-side accelerator The second option is to keep your current SAP ERP database, for example, Microsoft SQL Server, and install an SAP HANA system as a side-by-side accelerator (as discussed in Chapter 3). Then, you replicate the required database tables to the SAP HANA system using SAP Landscape Transformation Replication Server. You can find more information about SAP Landscape Transformation Replication Server in Chapter 9. SAP Landscape Transformation Replication Server enables real-time replication so that SAP HANA always has the latest data available. For this scenario, you need to install the SAP Landscape Transformation Replication Server and SAP HANA systems. Reports & Analytics SAP ERP SAP ERP SAP HANA with SAP HANA Live loaded SAP HANA with SAP HANA Live loaded Other Database SLT Replication Option 1: SAP HANA as a database Option 2: SAP HANA as a sidecar Figure 14.3 Architecture Options for SAP HANA Live After you’ve connected to an SAP HANA database, you can build reports, analytics, and dashboards directly on the SAP HANA Live content. Other applications can also call this content directly. Note In both architectural scenarios, SAP HANA Live is installed in the SAP HANA database. Key Concepts Refresher Chapter 14 SAP HANA Live Virtual Data Model SAP HANA Live content consists primarily of calculation views that are designed, created, and delivered by SAP. These calculation views are built for operational reporting instead of analytical reporting purposes and, therefore, use mostly join nodes. The information models are relational rather than dimensional (cube-like). A system administrator loads the SAP HANA Live content into the SAP HANA database. These calculation views are layered in a structure called the SAP HANA Live virtual data model (VDM), as mentioned earlier. SAP HANA Live is structured internally as shown in Figure 14.4. Reports, Analytics, and Applications Customer Query Views Customer Extensions Query Views Reuse Views Value Help Views Private Views Standard SAP HANA Live content SAP HANA Live – Virtual Data Model (VDM) Database Tables Figure 14.4 SAP HANA Live VDM The VDM consists of the following types of views: 쐍 Private views These are built directly on the database tables and on other private views. These denormalize the database source data for use in the reuse views. 597 598 Chapter 14 Virtual Data Models 쐍 Reuse views These structure the data in a way that business processes can use. These views can be reused by other views. 쐍 Query views These can be consumed directly by analytical tools and applications. They aren’t reused by other views. These views are used when designing reports. 쐍 Customer query views These are created either by copying SAP-provided query views or by creating your own query views. You can add your own filters, variables, or input parameters to these views. You can also expose additional data fields or hide fields that aren’t required. 쐍 Value help views These are used as lookups (i.e., dropdown lists or filter lists) for selection lists in analytical tools and business applications. They make it easier for users to filter data by providing a list of possible values to choose from. Value help views aren’t used in other VDM views; they are consumed directly by analytical tools and business applications. Tip You should never call private and reuse views directly. You can build your reports, analytics, or business applications directly on query views. These can be either SAP-provided query views or custom query views. When you run your report, these query views are called. They in turn call the relevant reuse views, which call the relevant private views, which in turn read the relevant database tables. The value help views are called whenever dropdown lists are used to display possible values that a user can choose from. Note Modifying any SAP-provided VDM views isn’t recommended because all SAP-provided VDM views are overwritten when a newer version of the SAP HANA Live content is installed. There is no mechanism like the one in ABAP, in which the system asks if you want to keep any customer changes. Therefore, you should always make a copy of a VDM query view to modify, rather than modifying the original. Key Concepts Refresher Chapter 14 SAP HANA Live Views Most of the SAP-provided SAP HANA Live views are calculation views. All the SAP HANA modeling techniques that you’ve learned are relevant to these views. Figure 14.5 shows an example of such a calculation view in the SAP HANA Studio. Figure 14.5 SAP-Provided Calculation View (OverallBillingBlockStatus) in SAP HANA Live for SAP ERP Tip In SAP HANA 2.0, we don’t use the SAP HANA Studio. You see references to the SAP HANA Studio here because, as we mentioned already, SAP HANA Live still uses the older SAP HANA 1.0 modeling approach. You’ll notice several characteristics of these calculation views in Figure 14.5: 쐍 Several nodes in layers This is to denormalize the source data. Normalizing data is normal in an OLTP system to improve performance but isn’t required for analytics. 쐍 Multiple joins These are used to combine reuse views to create the VDM query view. 쐍 Variables, input parameters, and calculated attributes The use of input parameters has been limited due to the fact that some reporting tools have difficulty using them. Default values are always supplied for the input parameters. 599 600 Chapter 14 Virtual Data Models SAP HANA Live Administration SAP HANA Live is installed in the SAP HANA database. A system administrator will normally install the SAP HANA Live content package. After SAP HANA Live is installed on an SAP HANA system, you’ll see the content in the Content folder, under sap/hba, as shown in Figure 14.6. You can explore the calculation views in the SAP HANA Studio. We don’t recommend that you change the views here. As mentioned earlier, all the SAP-provided VDM views are overwritten when a newer version of the SAP HANA Live content is installed. There is no mechanism in place within SAP HANA Live to ask if you want to keep any customer changes, so you could lose all your hard work during an SAP HANA Live update. Figure 14.6 SAP HANA Live Content Package in the SAP HANA Studio Note SAP HANA Live has its own release cycles and timelines. SAP HANA Live content updates don’t have to correspond to the SAP HANA system update cycles and timelines. In the sap/hba package, you’ll also find various web-based tools for working with SAP HANA Live content: 쐍 SAP HANA Live Browser This tool is used to find the right query views for your reporting needs. 쐍 SAP HANA Live extension assistant This tool helps you copy and modify views based on your business requirements. Let’s look at the full breadth of SAP HANA Live Brower’s functionality. SAP HANA Live Browser The SAP HANA Live Browser tool is a web-based (HTML5) application that enables you to find the correct views for your reporting or analytics requirements. Sometimes, you might need to ask an SAP functional person that knows the source Key Concepts Refresher Chapter 14 system well to help you find the required views. For example, to find sales information in an SAP ERP system, you might need the advice of an SAP ERP Sales and Distribution (SAP ERP SD) consultant. You can find the SAP HANA Live Browser tool (Figure 14.7) by accessing http://<SAP HANA server hostname>:8000/sap/hana/hba/explorer. (This assumes that the instance number is 00.) In this section, we’ll walk through the different areas found in the SAP HANA Live Browser, starting with the top menu’s ALL VIEWS section (see Figure 14.7). Figure 14.7 SAP HANA Live Browser Here you can search for views by typing a part of the name of the view in the Filter field. For each view, you can see the following information: 쐍 View name 쐍 Description of the view 쐍 View category (private, reuse, or query view) 쐍 View type (mostly calculation views) 601 602 Chapter 14 Virtual Data Models 쐍 To which area of the application the view is applicable 쐍 In which SAP HANA package the view is located Figure 14.8 shows a partial list of views that we can use for procurement reports. Figure 14.8 Filter String Entry to Find Views You can access many features in SAP HANA Live Browser from the provided toolbar (see Figure 14.9). Figure 14.9 SAP HANA Live Browser Toolbar The following functions are available from the toolbar: 쐍 Open Definition The Open Definition tab (Figure 14.10) shows the metadata of the SAP HANA Live view. These fields are available for your reports and analytics. If you need other fields, you’ll have to use the SAP HANA Live extension assistant. 쐍 Open Content The Open Content tab (Figure 14.11) shows a preview of the data an SAP HANA Live view will produce. 쐍 Open Cross Reference The Open Cross Reference tab (Figure 14.12) shows an interactive and graphical depiction of all the related tables and views used by this specific SAP HANA Live view. You can also see an outline version instead of the graphical version. Key Concepts Refresher Chapter 14 Figure 14.10 Open Definition Tab Showing SAP HANA Live View Metadata Figure 14.11 Open Content Tab Displaying Data Produced by an SAP HANA Live View 603 604 Chapter 14 Virtual Data Models Figure 14.12 Open Cross Reference Tab Showing Related Tables and Views Used by This SAP HANA Live View (Graphical Version) 쐍 Tag The Tag tab (Figure 14.13) allows you to add your own keywords (tags) to describe SAP HANA Live views. You can use these tags to group some SAP HANA Live views that you would like to use in your reporting, analytics, or dashboards. Figure 14.13 Adding Tags to an SAP HANA Live View 쐍 Generate SLT File The Generate SLT File icon allows you to generate configurations for SAP Landscape Transformation Replication Server to replicate only the tables that you require for your reporting or analytics. This function is only required if you’re Key Concepts Refresher Chapter 14 using SAP HANA Live in a sidecar configuration. You won’t replicate the entire source system to the SAP HANA database—only the tables used by the business end users. 쐍 Open View in SAP Lumira and Open View in Analysis Office The last two tabs on the SAP HANA Live Browser toolbar allow you to open the specified SAP HANA Live view in either the SAP Lumira (Figure 14.14) or the SAP Analysis for Microsoft Office reporting tools. This allows you to create reports easily and quickly. Figure 14.14 Output of an SAP HANA Live View for Sales in SAP Lumira The SAP HANA Live Browser doesn’t show you the actual models or where they are used in your reports. You can use the SAP HANA Studio to view SAP HANA Live models. You also can see a list of INVALID VIEWS (Figure 14.15). Views can appear on this list if they are missing underlying tables or dependent views. Such views won’t activate. (Activation of older calculation views is similar to the Build option of newer calculation views.) 605 606 Chapter 14 Virtual Data Models Figure 14.15 List of Invalid SAP HANA Live Views Until now, we’ve discussed the developer edition of the SAP HANA Live Browser. There is also a business user version of SAP HANA Live Browser available at http:// <SAP HANA server host>:8000/sap/hana/hba/explorer/buser.html. This version only allows you to find a model, preview the content, and open the view in SAP Lumira or SAP Analysis for Microsoft Office. It’s meant for nontechnical users, such as functional consultants or specialists. Advanced features, such as generating SAP Landscape Transformation Replication Server files, viewing broken models, tagging a model, or opening graphical cross references, aren’t available in the business user edition of the SAP HANA Live Browser. SAP HANA Live Extension Assistant: Modify Views Earlier in the chapter, Figure 14.5 showed an example of an SAP HANA Live view, which is similar to any other view in SAP HANA. There are two ways to modify SAP HANA Live views: 쐍 SAP HANA Studio We can use the SAP HANA Studio to modify, preview, test, debug, and trace SAP Key Concepts Refresher Chapter 14 HANA Live information models. This is similar to the way we modify SAP HANA views we created in previous chapters by using SAP Web IDE for SAP HANA. 쐍 SAP HANA Live extension assistant This tool can also be used to modify SAP HANA Live views, as shown in Figure 14.16. Figure 14.16 SAP HANA Live Extension Assistant in the SAP HANA Studio SAP HANA Live views don’t expose all their attributes and measures; SAP has carefully chosen a subset of these elements for the most common business processes. However, there are many hidden fields. You can use the SAP HANA Live extension assistant to change which fields are hidden or shown in a particular view. Note Don’t delete the fields you’re not using; just mark them as hidden. SAP HANA’s optimization is very good, and hiding a field doesn’t increase the runtime of queries. 607 608 Chapter 14 Virtual Data Models Note You can only extend reuse and query views, and you can extend a particular view multiple times. The SAP HANA Live extension assistant can’t extend the following: 쐍 A query view that contains a Union node. 쐍 A query view that contains an Aggregation node as one of the middle nodes. An Aggregation node as the top node is normal and allowed. 쐍 Your own custom views. It only recognizes SAP-delivered views. Tip You shouldn’t extend SAP-delivered views. Make a copy, and extend that instead. You should now have a good understanding of what SAP HANA Live offers to customers and when to use it. Now we turn our attention to a newer VDM, namely SAP S/4HANA embedded analytics. SAP S/4HANA Embedded Analytics We’ve already mentioned that SAP HANA Live has evolved into SAP S/4HANA embedded analytics. As the advantages of SAP HANA became more apparent, SAP created a completely new version of SAP ERP that uses only SAP HANA: SAP S/4HANA. With this new architecture and the lessons learned, we see that SAP HANA Live evolved into SAP S/4HANA embedded analytics. Tip SAP S/4HANA only runs on SAP HANA, so there is no side-by-side option for SAP S/4HANA embedded analytics as we have with SAP HANA Live. Logically, as more customers migrate over to SAP S/4HANA, the SAP S/4HANA embedded analytics is taking over from SAP HANA Live. Similar Features Many things in SAP HANA Live work very well and were kept in SAP S/4HANA embedded analytics. The appearance and mechanisms might have been updated, but the essence and intent of the following features stayed the same: Key Concepts Refresher Chapter 14 쐍 Operational analysis reporting occurs from the data in the SAP transactional system. 쐍 VDMs are used. 쐍 SAP builds, provides, and maintains a large library of views for use by power users and end users in reports and analytics. 쐍 The SAP-provided views can be extended by customers. 쐍 The views are built up in layers to enable reuse of lower level views. 쐍 The lower level views aren’t open for changes by customers. 쐍 Modelers, developers, analysts, and power user can browse and search these models. In Table 14.1, we see a list of features available in SAP S/4HANA embedded analytics for the different types of users. You’ll see that many of them reflect the points we just mentioned. Type of user Offerings End users: Analyze the data and act accordingly 쐍 Multidimensional reports and visualizations 쐍 SAP Smart Business key performance indicators (KPIs) and analysis path 쐍 Query Browser 쐍 Analytical SAP Fiori apps Power users: Enable the end users 쐍 Tooling, such as Query Designer 쐍 View and Query Browser 쐍 SAP Fiori KPI Modeler IT users: Provide data in a semantic layer 쐍 CDS view maintenance: ABAP for Eclipse Table 14.1 SAP S/4HANA Embedded Analytics Offerings for Different Types of Users You might also have noticed a few differences, for example, ABAP for Eclipse for the IT users. New and Updated Features Some aspects of SAP S/4HANA embedded analytics are completely different from SAP HANA Live. To understand these differences, it’s good to take a look at the personas developing and creating reports and enhancements inside SAP S/4HANA. 609 610 Chapter 14 Virtual Data Models While modelers and analysts focused more on SAP BW, the people creating operational reporting and extending the functionality of the SAP ERP system have mostly been the ABAP developers. There is more focus on the ABAP developer with SAP S/4HANA embedded analytics. Another point to keep in mind is where these users do their work. Both ABAP developers and SAP BW modelers work in SAP applications. They work at the application layer, and not in the lower database layer. Herein lies a “problem” with SAP HANA Live as this is completely developed in the database layer. These users need SAP HANA login details and work outside of their normal working environments with SAP HANA Live. The next point to look at is the fact that SAP HANA calculation views are read-only views. Many transactional systems require actions to be taken at certain times and when certain conditions occur. Reading and writing would work far better here. With this in mind, SAP decided to build SAP S/4HANA embedded analytics with ABAP CDS views. Note ABAP CDS and SAP HANA CDS are somewhat similar, but they aren’t the same. SAP HANA CDS views are specific to SAP HANA. ABAP CDS can be use with other databases as well and therefore isn’t as feature-rich as SAP HANA CDS. Tip The concept of CDS, which we already discussed in Chapter 5, is also part of the “SAP-supplied virtual data models” exam topic area. We can now look at some of the features in SAP S/4HANA embedded analytics that differ from those of SAP HANA Live: 쐍 The views are created with ABAP CDS. 쐍 The views are built and stored in the ABAP Data Dictionary. 쐍 All authorizations are done with standard security roles in the application. (SAP HANA Live uses SAP HANA analytic privileges for security.) 쐍 The ABAP CDS views are created in a text editor (not a graphical modeler) in the ABAP for Eclipse environment. This is similar to the SAP HANA Studio, which is also based on Eclipse. 쐍 The focus is on developers rather than on modelers. Key Concepts Refresher Chapter 14 쐍 Calculation views focus solely on BI scenarios, whereas ABAP CDS views can also be used for transactional and analytical scenarios. 쐍 ABAP CDS is very integrated with the standard transport process to get development artifacts from a development system to production systems. 쐍 SAP S/4HANA embedded analytics provides predelivered content in the form of ABAP CDS views. SAP HANA Live provides SAP HANA calculation views. The SAP S/4HANA embedded analytics VDM reflects these differences. Virtual Data Model In Figure 14.17, we see the VDM for SAP S/4HANA embedded analytics. Compare this to Figure 14.4, which shows the VDM for SAP HANA Live. At first glance, you’ll notice a lot of similarities. Reports, Analytics, and Applications Customer Extension Customer Extension Consumption views Composite views Basic views Interface Views SAP S/4HANA embedded analytics Database Tables Figure 14.17 VDM for SAP S/4HANA Embedded Analytics 611 612 Chapter 14 Virtual Data Models The main difference seems to be the naming of the views in the different layers. Instead of private and reuse views, we now have basic and composite views. Just as you aren’t allowed to modify private and reuse views in SAP HANA Live, the same restrictions apply to the basic and composite views in SAP S/4HANA embedded analytics. Where we had query views and customers extensions in SAP HANA Live, we have consumption views and customer extensions in SAP S/4HANA embedded analytics. Note The certification exam focuses on the modeling aspects of SAP HANA. As such, the interest in this chapter was more on the calculation views as used in SAP HANA Live. We spent less time on ABAP CDS views because it’s outside the scope of SAP HANA modeling. Important Terminology In this chapter, we covered the following important terminology: 쐍 Operational reporting Operational reporting is reporting that is done on transactional data (e.g., sales and deliveries) to enable people to do their work today (e.g., delivering a sold item). SAP BW reports are normally the next day only, and they don’t help with, for example, today’s deliveries. SAP HANA Live is meant only for operational reporting requirements in SAP operational systems such as SAP ERP. 쐍 SAP HANA Live SAP HANA Live consists of SAP-created calculation views (for the most part) and several tools for working with these SAP HANA views. 쐍 SAP HANA Live views The SAP HANA Live views are organized in an architectural structure called the VDM. The different SAP HANA Live views are as follows: – Private views: These are built directly on the database tables and on other private views. These denormalize the database source data for use in the reuse views. – Reuse views: These structure the data in a way that business processes can use. These views can be reused by other views. Practice Questions Chapter 14 – Query views: These can be consumed directly by analytical tools and applications. They aren’t reused by other views. These views are used when designing reports. – Customer query views: These are created either by copying SAP-provided query views or by creating your own query views. You can add your own filters, variables, or input parameters to these views. You can also expose additional data fields or hide fields that aren’t required. – Value help views: These are used as lookups (i.e., dropdown lists or filter lists) for selection lists in analytical tools and business applications. They make it easier for users to filter data by providing a list of possible values to choose from. Value help views aren’t used in other VDM views; they are consumed directly by analytical tools and business applications. 쐍 SAP HANA Live Browser The SAP HANA Live Browser tool is a web-based (HTML5) application that enables you to find the correct views for your reporting or analytics requirements. 쐍 SAP HANA Live extension assistant The SAP HANA Live extension assistant tool is used to modify SAP HANA Live views. 쐍 SAP S/4HANA embedded analytics SAP S/4HANA embedded analytics consists of SAP-created ABAP CDS views and several tools for working with these views. Practice Questions These practice questions will help you evaluate your understanding of the topics covered in this chapter. The questions shown are similar in nature to those found on the certification examination. Although none of these questions will be found on the exam itself, they will allow you to review your knowledge of the subject. Select the correct answers, and then check the completeness of your answers in the “Practice Question Answers and Explanations” section. Remember, on the exam, you must select all correct answers and only correct answers to receive credit for the question. 613 614 Chapter 14 Virtual Data Models 1. Where is SAP HANA Live installed? 첸 A. On SAP Landscape Transformation Replication Server 첸 B. On the SAP ERP application server 첸 C. In the reporting tool 첸 D. In the SAP HANA database 2. Which of the following views can you extend with the SAP HANA Live extension assistant? 첸 A. Query views with a union 첸 B. Reuse views 첸 C. All SAP-delivered views 첸 D. Your own custom views 3. Which sentence best describes SAP HANA Live? 첸 A. Operational reporting moving back to the OLTP system 첸 B. Analytical reporting moving back to the OLAP system 첸 C. Operational reporting moving back to the OLAP system 첸 D. Analytical reporting moving back to the OLTP system 4. Which of the following are features of a data warehouse? (There are 3 correct answers.) 첸 A. Consolidation 첸 B. Data governance 첸 C. Both OLTP and OLAP in one system 첸 D. Real-time reporting 첸 E. A full transactional history 5. What tool generates files to help configure SAP Landscape Transformation Replication Server table replication? 첸 A. SAP HANA Live extension assistant 첸 B. Authorization Assistant 첸 C. SAP HANA Studio 첸 D. SAP HANA Live Browser Practice Questions Chapter 14 6. True or False: You can create custom query views by directly modifying SAPdelivered query views. 첸 A. True 첸 B. False 7. True or False: When you upgrade SAP HANA Live content, you can choose to keep your modifications. 첸 A. True 첸 B. False 8. Which of following are features associated with SAP HANA Live? (There are 2 correct answers.) 첸 A. A new data warehouse 첸 B. Hundreds of predeveloped views 첸 C. Web-based tools 첸 D. Precalculated aggregates stored in cubes 9. Which of the following terms are associated with an operational system? (There are 2 correct answers.) 첸 A. OLTP 첸 B. OLAP 첸 C. Transactional 첸 D. Data warehouse 10. True or False: SAP HANA Live replaces SAP BW. 첸 A. True 첸 B. False 11. True or False: You can view metadata of SAP HANA Live views with the business user version of the SAP HANA Live Browser. 첸 A. True 첸 B. False 615 616 Chapter 14 Virtual Data Models 12. In which package in SAP HANA will you find the SAP HANA Live views? 첸 A. /sap/hana/democontent 첸 B. /public 첸 C. /sap/hba 첸 D. /sap/bc 13. As what are most of the views in SAP HANA Live created? 첸 A. Table functions 첸 B. Calculation views 첸 C. CDS data sources 첸 D. Cubes 14. Which of the following VDM views can be called by applications and reports? (There are 2 correct answers.) 첸 A. Reuse views 첸 B. Value help views 첸 C. Private views 첸 D. Query views 15. Which views in SAP S/4HANA embedded analytics may you extend? 첸 A. Composite views 첸 B. Query views 첸 C. Interface views 첸 D. Consumption views 16. How does SAP S/4HANA embedded analytics differ from SAP HANA Live? (There are 2 correct answers.) 첸 A. It uses ABAP CDS views. 첸 B. It uses SAP HANA calculation views. 첸 C. Authorizations are at the database level. 첸 D. Authorizations are at the application level. Practice Question Answers and Explanations Chapter 14 17. True or False: SAP S/4HANA embedded analytics also has two architecture deployment options like SAP HANA Live, namely embedded and side-by-side. 첸 A. True 첸 B. False Practice Question Answers and Explanations 1. Correct answer: D SAP HANA Live is installed in the SAP HANA database (refer to Figure 14.3). The SAP Landscape Transformation Replication Server is only used for the sidecar scenario. You still need to install SAP HANA Live, even if you don’t have a sidecar scenario, so this is obviously wrong. SAP HANA calculation views need to be inside SAP HANA, as the name indicates. They won’t run in the SAP ERP application server. Reporting tools can use SAP HANA views, but these views can’t be run in the reporting layer. 2. Correct answer: B You can extend reuse and query views with the SAP HANA Live extension assistant. You can’t extend query views with a Union node or your own custom views. 3. Correct answer: A SAP HANA Live can be described as “Operational reporting moving back to the OLTP (operational) system.” 4. Correct answers: A, B, E Data warehouse features include consolidation, data governance, and a full transactional history. Please note that the question did not specify that this is an SAP BW on SAP HANA system. It asked about a generic data warehouse. Having both OLTP and OLAP in one system is only available for SAP HANA real-time reporting. 5. Correct answer: D SAP HANA Live Browser can help generate files for SAP Landscape Transformation Replication Server replication. SAP HANA Live extension assistant is used for modifying SAP HANA Live views. 617 618 Chapter 14 Virtual Data Models The Authorization Assistant helps with creating users and roles for SAP HANA Live. The SAP HANA Studio doesn’t have this functionality. 6. Correct answer: A This is actually a “bad” question. It’s clear from this chapter that you should not create custom query views by directly modifying SAP-delivered query views. However, the question asked if you can do it. Technically, it’s possible—but you’ll lose all your hard work during the next update of the SAP HANA Live content. 7. Correct answer: B When you upgrade the SAP HANA Live content, there is no option for you to keep your modifications. All changes to SAP-delivered content will be lost. This is different from the process in ABAP systems, which does ask if you want to save your modifications. This question was very similar to the previous question. In the certification exam, you won’t find one question providing the answer to another. The SAP team ensures that this doesn’t happen. 8. Correct answers: B, C SAP HANA Live consists of hundreds of predeveloped views, as well as some web-based tools. SAP HANA Live isn’t a new data warehouse, as we clearly specified that it doesn’t replace SAP BW. With SAP HANA, we don’t store precalculated aggregates in cubes. This is only done in old non-SAP HANA data warehouses. 9. Correct answers: A, C An operational system is the transactional (OLTP) system. 10. Correct answer: B SAP HANA Live does NOT replace SAP BW. Only the operational reporting portion moves back from the data warehouse to the transactional system. 11. Correct answer: B You can’t view the metadata of SAP HANA Live views within the business user version of SAP HANA Live Browser. Metadata, like table definitions, is only used by modelers and developers. This version is meant for nontechnical users. Takeaway Chapter 14 12. Correct answer: C The SAP HANA Live views are in the /sap/hba subpackage. Refer to Figure 14.6 to see the SAP HANA Live content in the SAP HANA Studio. 13. Correct answer: B Most of the views in SAP HANA Live are calculation views. 14. Correct answers: B, D Query views and value help views can be called by applications and reports (refer to Figure 14.4). Reuse views and private views may not be called directly. 15. Correct answer: D You may extend consumption views in SAP S/4HANA embedded analytics. Query views are in SAP HANA Live. Interface views and composite views may not be extended. 16. Correct answers: A, D SAP S/4HANA embedded analytics differs from SAP HANA Live in that it uses ABAP CDS views, and authorizations are at the application level. SAP HANA Live uses SAP HANA calculation views, and the authorizations are at the database level. 17. Correct answer: B False: SAP S/4HANA embedded analytics does not have two architecture deployment options like SAP HANA Live. It does not have a side-by-side option because it can only be deployed on SAP HANA. Takeaway You should now know what SAP HANA Live is, the concepts behind it, and the architecture options. You can use your understanding of the VDM to choose which views can be used, modified, or extended, and you can use the SAP HANA Live Browser tool to help design your reports. You also got an overview of how SAP HANA Live has evolved into SAP S/4HANA embedded analytics and how this differs from SAP HANA Live. 619 620 Chapter 14 Virtual Data Models Summary In this chapter, you used your knowledge of SAP HANA modeling in a scenario in which SAP HANA is used as a database for SAP business systems such as SAP ERP. The difference here is that SAP has developed hundreds of information models already, and you can tap into this vast resource to build powerful analytical reports quickly. This chapter completes your learning experience with SAP HANA modeling. We wish you the best of luck with your SAP HANA certification exam! 621 The Author Rudi de Louw is head of the SAP Co-Innovation Lab at SAP in South Africa where he supports SAP partners in building or improving their business ideas and systems with SAP platforms such as SAP HANA, SAP Cloud Platform, and SAP Leonardo. In this role, he also helps partners extend the integration of their current business products with available and emerging SAP technologies. Rudi has been working with SAP HANA since 2010, and he has been involved from the outset in preparing the SAP HANA certification exams. He has provided software and technology solutions to businesses for more than 25 years and has a passion for sharing knowledge, mentoring, understanding new technologies, and finding innovative ways to leverage these skills to help individuals in their development. 623 Index A ABAP ......................................................................... 384 ABAP Data Dictionary ....................................... 610 Accelerated K-Means ......................................... 442 Accelerator deployment ...................................... 96 profitability analysis ........................................ 95 Acclaim platform .................................................... 36 ACID-compliant database ................................... 86 Active/active (read-enabled) mode ................ 90 Advanced DSO ......................................................... 99 Affinity propagation .......................................... 435 AFLLANG ................................................................. 449 procedure ........................................................... 450 Aggregation ........................................ 210, 217, 522 function .............................................................. 260 Aggregation node .......... 258, 270, 271, 273, 276 restricted column ........................................... 288 Algorithms ............................................................. 435 Alias ........................................................................... 267 Amazon Web Services (AWS) ......... 50, 106, 118 American National Standards Institute (ANSI) ................................................ 340 Analytic privileges ........................... 571, 582, 586 assign to role .................................................... 571 Anonymization node ........................................ 261 Application Function Library (AFL) ............ 347, 433, 438 Application privileges ....................................... 576 Application time period tables ...................... 344 Apply privileges property ................................ 583 Apriori algorithm ................................................ 435 ArcGIS ....................................................................... 462 Architecture .......................... 71, 79, 114, 521, 542 Associated container ......................................... 557 Association algorithms ..................................... 435 Asynchronous replication .................................. 90 Attribute .................................... 181, 200, 203, 217 calculated column .......................................... 284 restricted column ........................................... 288 restrictions ........................................................ 573 Authentication ..................................................... 554 Authorization .............................................. 539, 554 Assistant ............................................................. 552 Automated Predictive Library (APL) ........... 433 B Backups ............................................................... 87, 88 Best practices ........................................................ 215 Big data ........................................................... 390, 523 limitations ......................................................... 391 BIGINT ...................................................................... 372 Blue-green deployment .................................... 134 Bookmark questions ............................................. 67 Bottleneck ................................................................. 79 Branch logic ........................................................... 541 Bring your own language (BYOL) .................. 140 Bring-your-own-license .................................... 107 Build error .............................................................. 169 Build process ...................................... 129, 131, 175 Business Function Library (BFL) .................... 438 Business Intelligence Consumer Services (BICS) ......................................... 113, 309 Business intelligence tools .............................. 314 BW362 ......................................................................... 43 C C_HANADEV ............................................................ 30 C_HANAIMP ............................................................. 29 C_HANAIMP_15 .............................................. 17, 26 scoring ................................................................... 35 C_HANATEC ............................................................. 29 C4.5 decision tree ................................................ 446 Caching .................................................................... 208 Calculated column ....... 182, 279, 281, 326, 328, 345, 346 analytic view .................................................... 331 counter ................................................................ 285 create ................................................................... 282 input parameter .............................................. 299 624 Index Calculation view .............. 98, 204, 207, 235, 557 Calculation view of type cube .... 205, 217, 218, 225, 233, 271, 273 aggregation node ........................................... 276 create ................................................................... 236 with star join ................................. 217, 218, 225 Calculation view of type cube with star join ..................................................... 233, 271 create ................................................................... 237 star join node .......................................... 262, 276 Calculation view of type dimension .......... 205, 217, 225, 233, 238, 271 create ................................................................... 237 projection node ............................................... 276 Calculation views ........... 93, 168, 207, 231, 301, 330, 450, 505 adding data sources ...................................... 241 adding nodes .................................................... 241 create ................................................................... 235 creating joins ................................................... 242 data source ....................................................... 234 default node ..................................................... 240 graph nodes ...................................................... 492 node order ......................................................... 240 output area ....................................................... 303 save ...................................................................... 247 selecting output fields .................................. 244 variable .............................................................. 297 Calculations .................................................. 216, 524 CALL statement .................................................... 359 Cardinality ....................... 186, 243, 527, 537, 544 many-to-many ................................................ 186 many-to-one .................................................... 186 one-to-many .................................................... 186 one-to-one ......................................................... 186 type ...................................................................... 186 CDS views ................................... 197, 217, 220, 610 ABAP .................................................................... 199 benefits ............................................................... 198 data sources ..................................................... 199 Certification prefix ...................................................................... 27 track ........................................................................ 25 Characteristic ........................................................ 200 Circularstring ........................................................ 467 Classification algorithms ................................. 436 Classroom training ................................................ 40 Clients ...................................................................... 264 Client-side JavaScript ........................................ 101 Cloud Foundry ..................................................... 145 Clustering algorithm ................................ 435, 436 Column table ................................................. 89, 340 Columnar storage ...................................... 521, 544 Column-based database ............................................................... 83 input parameter .................................... 298, 326 storage ................................................................... 83 table ........................................................................ 79 Column-oriented tables ................................... 183 Columns ................................................................. 183 CompositeProvider ............................................... 99 Compression ............................................................ 72 Conditional statement ..................................... 346 Container ................................... 134, 139, 175, 556 isolation ............................................................. 140 Contains predicate ............................................. 422 Continuous delivery .......................................... 134 Continuous integration ................ 130, 134, 137 Convex Hull .......................................................... 474 Core data services (CDS) ...... 134, 159, 181, 197, 227, 234, 270, 468 full-text index .................................................. 419 Counters .............................................. 285, 286, 326 CPU .............................................................................. 79 in parallel ............................................................. 81 speed ....................................................................... 74 CSS ............................................................................. 101 CSV file .................................................................... 399 Cubes ........................ 181, 200, 201, 217, 218, 225 Currency ................................................................. 303 conversion ........ 182, 279, 304, 305, 327, 329, 347, 365 decimal shift ..................................................... 305 decimal shift back .......................................... 305 definition options .......................................... 306 display ................................................................ 305 reverse look up ................................................ 306 source and target ........................................... 305 Customer query view ............................... 598, 613 Cypher .................................................. 490, 494, 495 D Data aging ............................................................... 83, 91 Index Data (Cont.) backup .................................................................... 87 category .................................................... 236, 237 compression ........................................... 521, 544 federation .......................................................... 388 filter ...................................................................... 345 foundation .................. 203, 207, 217, 226, 525 lake ....................................................................... 390 masking .............................................................. 578 mining ................................................................. 434 noise ..................................................................... 436 persistence ............................................................ 87 preview ............................................................... 164 read ............................................................ 344, 544 records ................................................................ 340 security ............................................................... 578 sets ........................................................................... 63 source ........................................................ 234, 269 storage ................................................................... 76 temperatures ....................................................... 91 type ....................................................................... 341 volume ................................................. 87–90, 125 Data Control Language (DCL) ............... 341, 371 Data Definition Language (DDL) .................. 341, 353, 371 Data Distribution Optimizer .......................... 103 Data Lifecycle Manager ..................................... 103 Data Manipulation Language (DML) ......................................................... 341, 371 Data modeling artifact ...................................................... 208, 217 limit and filter .................................................. 216 Data preparation algorithms .......................... 436 Data provisioning ..................................... 375, 403 concepts ............................................................. 378 tools ..................................................................... 377 Data Query Language (DQL) ........ 341, 367, 371 Data Warehouse Monitor ................................ 103 Data Warehouse Scheduler ............................. 103 Data warehouses .................................................... 76 features ............................................................... 593 Database administrator ...................................................... 27 column-oriented ................................................ 78 deployment .......................................................... 96 migration .................................................... 96, 115 to platform ........................................................... 79 view ...................................................................... 184 Datahub ...................................................................... 63 DBSCAN clustering ............................................. 477 Debug mode ................................................. 520, 528 Debug query mode .................................... 543, 547 Decimal shift ......................................................... 330 Decision table ..................................................................... 270 tree ........................................................................ 436 Declarative logic .................................................. 350 Dedicated disaster recovery hardware .......... 90 Default view node ............................................... 239 Definer ..................................................................... 354 Delimited identifiers ................................ 342, 371 Delta .......................................................................... 404 buffer ............................... 83, 119, 347, 365, 522 load ................................................... 381, 387, 404 merge ..................................................... 83, 84, 522 store ........................................................................ 83 Deployment ................................ 71, 129, 131, 175 accelerator ........................................................... 96 cloud .................................................................... 106 database ................................................................ 96 development platform .......................... 99, 100 scenarios .............................................. 71, 93, 118 side-by-side solution ........................................ 94 virtual machine ............................................... 106 Deprecate ....................................................... 266, 273 Derived from procedures/scalar function input parameter .................. 299, 327 Derived from table input parameter ................................................. 298, 326 Design-time files ................................................. 568 Design-time objects .............. 129, 131, 174, 556 Development platform deployment ............. 99 programming languages ............................ 100 SAP HANA XS ................................................... 100 Development process ............................... 129, 131 Development user .............................................. 148 DevOps ....................................... 104, 134, 137, 501 Dictionary compression ...................................... 78 Dimension calculation view .............................................. 205 table ............................... 200, 202, 207, 208, 217 view ... 205, 207, 217, 233, 262, 271, 273, 544 Direct input parameter ............................ 298, 326 Disaster recovery .................................................... 89 Distributed database ............................................ 85 Documentation ................................................... 161 625 626 Index Domain fixed values ...................... 293, 326, 328 Dynamic join .................. 195, 196, 216, 244, 525 Dynamic restrictions ......................................... 575 Dynamic SQL ............................................... 351, 364 E E_HANAAW ABAP certification ....................... 30 E_HANABW SAP BW on SAP HANA certification ......................................................... 30 Edge processing ................................................... 502 E-learning .................................................................. 40 Elimination technique ......................................... 67 Enterprise data warehouse (EDW) ................ 101 Entity analysis ...................................................... 425 Envelope ................................................................. 474 Equidistant ................................................... 502, 508 ESRI ........................................................................... 462 Exact text search ................................................. 421 Exam objectives .............................................................. 32 process ................................................................... 36 questions .............................................................. 64 structure ................................................................ 34 EXEC statement ................................................... 351 Explicit full-text index ...................................... 419 Expression ............................................................. 341 Expression Editor ............................. 285, 292, 541 calculated column ......................................... 283 elements ............................................................. 283 functions area .................................................. 284 operators area ................................................. 284 restricted column ........................................... 288 Extended Well-Known Binary (WKB) .......... 479 Extended Well-Known Text (WKT) .............. 479 Extract semantics ............................................... 268 Extract, transform, load (ETL) ............... 379, 403 Extraction ............................................................... 379 F Facet ......................................................................... 200 Fact table ................. 181, 200, 202, 203, 207, 217 Fault-tolerant text search ................................ 421 Fields hide ...................................................................... 272 original ............................................................... 272 Fields (Cont.) output ................................................................. 244 rename ............................................................... 272 Filter ......................... 264, 291, 524, 532, 537, 584 expression ............................. 279, 292, 326, 331 operations ......................................................... 279 variable .............................................................. 294 Flat file ..................................................................... 399 use case .............................................................. 400 Flowgraph .............................................................. 448 editor ................................................................... 447 For loop ................................................................... 350 Formula ................................................................... 247 Free system .............................................................. 50 Freestyle search ................................................... 423 Full outer join .................................... 188, 189, 244 Full-text index ...... 415, 416, 418, 426, 452, 453 column ................................................................ 419 create ................................................................... 417 hidden column ................................................ 419 Function ................................................................. 341 Fuzzy text search ......... 347, 356, 365, 411, 415, 417, 421, 452 alternative names .......................................... 422 fuzzy threshold value ................................... 423 G Geocoding ........................................... 394, 475, 508 GeoJSON ................................................................. 479 Geometry collection .......................................... 467 Geospatial processing ....................................... 463 GeoTools ................................................................. 480 Gerrit ........................................................................ 137 Git ........................................................... 135, 136, 175 GitHub ........................................................................ 58 Google ...................................................................... 414 Google Cloud Platform (GCP) .......... 50, 52, 106 Grammatical Role Analysis (GRA) analyzer .............................................................. 428 Graph algorithm .......................................................... 490 modeling ................................................... 459, 480 node ......................................... 256, 270, 275, 492 view and analyze ............................................ 487 vs hierarchies ................................................... 497 workspace ......................................................... 485 Index Graph processing ...................................... 482, 508 build ..................................................................... 485 edge ...................................................................... 484 pattern matching ........................................... 488 vertices ................................................................ 483 Graphical data model ........................................ 123 Graphical flowgraph model ............................ 441 Graphical information model ........................... 80 GraphScript ......................................... 482, 495, 496 Grid clustering ...................................................... 477 Grouping set .......................................................... 541 Grubbs’ test algorithm ...................................... 436 H HA100 ......................................................................... 42 HA215 ......................................................................... 43 HA300 ......................................................................... 42 HA301 ......................................................................... 42 HA400 .................................................................. 30, 43 HA450 ......................................................................... 42 Hard disk .................................................................... 74 Hardware ................................................................... 61 Hash partitioning ................................................ 540 HDB module ............................. 142, 152, 162, 175 HDI ............................................... 139, 142, 144, 549 HDI container .......................... 175, 487, 557, 558 security ............................................................... 557 hdinamespace file ............................................... 167 Hidden fields ......................................................... 245 Hierarchy .......................... 181, 213, 214, 279, 308 composite ID .................................................... 325 create ................................................................... 309 leveled ................................................................. 325 orphan nodes ................................................... 311 parent-child ...................................................... 312 semantics node ................................................ 309 shared .................................................................. 314 span tree ............................................................. 325 SQL access .......................................................... 314 temporal ............................................................. 325 value help .......................................................... 314 Hierarchy function ................................... 261, 316 columns .............................................................. 321 data ...................................................................... 318 generate ............................................................. 319 navigate ............................................................. 322 Hierarchy function (Cont.) node ............................................................ 279, 317 SQL ........................................................................ 325 High availability ........................................... 86, 120 shared disk storage ........................................... 86 HOHAM1 ................................................................... 42 Horizontal aggregation ........................... 506, 508 HTML5 ............................................................ 100, 101 Hybrid cloud ................................................ 107, 115 I Identifier ........................................................ 341, 342 Identity provider (IdP) ...................................... 141 system ................................................................. 555 if() ............................................................................... 284 Imperative logic ................................ 350, 368, 541 Implicit full-text index ..................................... 419 Index server .................................................... 88, 120 InfoCube .................................................................... 98 InfoObject .................................................................. 99 Information model building .............................................................. 162 concepts ............................................................. 181 refactoring ..................................... 129, 170, 175 techniques ......................................................... 524 use and build .................................................... 206 Information modeler ........................................... 27 Information views ..................................... 233, 573 characteristics ................................................. 363 data source ....................................................... 273 layered ................................................................ 572 parameterized ................................................. 340 performance ..................................................... 544 Infrastructure as a service (IaaS) .......... 106, 139 Initial load ........................................... 380, 386, 404 In-memory ......................................................... 78, 82 data movement .................................................. 80 technology ................................. 71, 73, 114, 543 Inner join ....... 188, 189, 191, 216, 218, 226, 243 Input parameter .................... 182, 279, 297, 300, 302, 326 create ................................................................... 298 date ...................................................................... 300 expression ......................................................... 299 mapping ............................................................. 300 type .............................................................. 298, 326 627 628 Index Insert-only ................................................................ 84 principle ............................................................. 522 International Organization for Standardization (ISO) ................................... 340 Internet of Things (IoT) ......... 130, 397, 498, 523 Intersect .................................................................. 213 Intersect node ............................................. 256, 270 Interval ........................................................... 294, 326 Invoker .................................................................... 354 security ............................................................... 372 J Java Database Connectivity (JDBC) .............. 113 JavaScript ....................................................... 100, 134 server-side ......................................................... 101 Jenkins ..................................................................... 137 Join .................. 182, 184, 186, 190, 216, 231, 242, 363, 527 basic ..................................................................... 188 dynamic join .................................................... 195 node ......................................... 241, 249, 262, 270 performance ..................................................... 544 referential join ................................................. 191 relocation .......................................................... 390 self-join ............................................................... 189 spatial join ........................................................ 194 star join .............................................................. 204 temporal join ................................................... 193 text join .............................................................. 192 type ...................................................................... 187 K Keep flag ................................................................. 246 Kerberos ............................................... 563, 583, 586 Key field .................................................................. 184 Key figure ............................................................... 200 K-means .................................................................. 435 k-means clustering ............................................. 476 L Language ....................................................... 354, 358 detection ................................................... 417, 421 Lateral join ............................................................. 197 Lazy load .......................................................... 89, 125 Left outer join ................ 188, 189, 193, 216, 226, 243, 544 Level hierarchy .... 214, 215, 308, 310, 328, 329 geographic ........................................................ 310 Linestring ............................................................... 467 Linguistic text search ........................................ 421 Load .......................................................................... 380 Log backup ............................................................ 88, 89 buffer ...................................................................... 88 volume ......................................................... 88, 125 Log replication ........................................................ 90 Logical data warehouse .................................... 391 Loop ....................................................... 368, 372, 541 M Machine learning ....................................... 434, 523 Machine learning model .................................. 446 Managed cloud as a service (McaaS) ........... 106 Many-to-many cardinality .............................. 186 Many-to-one cardinality .................................. 186 Mapping property ........................... 245, 272, 276 Master data ................................ 204, 217, 226, 271 Materialized view ................................................ 185 Measures ................ 181, 200, 203, 217, 219, 226 calculated column ......................................... 284 restricted column ........................................... 288 Memory block ......................................................... 88 Mercator ................................................................. 463 Microservices ..................................... 130, 135, 141 Microsoft Azure ..................................... 50, 52, 106 Microsoft Excel .................................................... 122 Minus node .................................................. 256, 270 Minus operation .................................................. 213 Modeling best practices ................................................... 365 process ....................................................... 129, 131 role .............................................................. 565, 572 tools ..................................................................... 129 Model-view-controller (MVC) ........................ 101 Monitoring role ................................................... 565 Moore’s Law ............................................................. 73 mta.yaml ................................................................ 153 Multi cloud ............................................................ 108 Multicore CPUs ........................................... 521, 543 Multidimensional expression (MDX) ......... 309 Index Multilevel aggregation ...................................... 362 Multilinestring ..................................................... 467 Multiple-choice question .................................... 65 Multiple-response questions ............................ 65 Multipoint .............................................................. 467 Multipolygon ........................................................ 467 Multistore table ............................................ 92, 121 Multi-target application (MTA) ..................... 142 Multi-target replication ....................................... 91 Multitenancy ........................................................ 105 Multitenant database container (MDC) .... 104, 234, 270 N Namespace ................................................... 151, 166 Nodes .............................................................. 231, 248 Non-equi join ........................................................ 250 NoSQL databases .................................................... 86 NULL ...................................................... 342, 367, 371 O Object Linking and Embedding Database for Online Analytical Processing ............. 113 Object privileges ........................................ 570, 582 OData ....................................................... 42, 101, 400 service .................................................................. 422 OLAP ............................................... 77, 116, 521, 592 OLTP ................................................ 77, 116, 521, 592 One-to-many cardinality ................................. 186 One-to-one cardinality ...................................... 186 Open Database Connectivity (ODBC) .......... 112 Open Geospatial Consortium (OGC) ......................................................... 462, 479 Open ODS view ........................................................ 99 openSAP ..................................................................... 48 certification ......................................................... 48 Operating system ................................................... 61 Operational reporting ............................. 612, 614 Operator ........................................................ 288, 341 Optimization ......................................................... 519 best practices .................................................... 536 tools ..................................................................... 525 Optimize calculation views ............................. 537 Outlier ...................................................................... 436 Outlier detection algorithms ......................... 436 Output field .................................................. 184, 244 P Package privileges ............................................... 577 Page manager ................................................ 88, 125 Parallel execution ............................................... 359 Parallelism ...................................................... 81, 522 Parameter ............................................................... 359 Parent-child hierarchy .................. 214, 215, 224, 308, 310 Partitioned table ..................................................... 86 Partitioning .................................................... 86, 539 Performance .......................................................... 521 enhancements ................................................. 519 Performance analysis mode ....... 520, 526, 542, 546, 547 Persistence layer .......................... 71, 85, 114, 199 PL/SQL ..................................................................... 340 Plan graph .............................................................. 532 Plan trace ................................................................ 547 Platform as a service (PaaS) .................... 106, 135 Point ......................................................................... 467 Point of interest (POI) ........................................ 465 Polygon ................................................................... 467 PostGIS .................................................................... 479 Predicates ...................................................... 341, 469 list ......................................................................... 469 tips ........................................................................ 470 Predictive ................................................................ 432 Predictive Analysis Library (PAL) ....... 432–435, 447, 452, 454 algorithms ......................................................... 436 Predictive modeling ........................ 411, 432, 440 data source ....................................................... 442 data target ........................................................ 442 predictive analysis node .............................. 442 tasks ..................................................................... 440 Preprocessing algorithms ................................ 436 Primary storage ...................................................... 87 Private cloud ................................................ 107, 115 Private view .................................................. 597, 612 Privileges ............................................. 549, 553, 582 analytic privilege ............................................ 571 application privilege ..................................... 576 object privilege ................................................ 570 629 630 Index Privileges (Cont.) system privilege .............................................. 569 type ...................................................................... 569 Procedure ................................... 358, 362, 366, 368 characteristics ................................................. 363 create ................................................................... 358 parameter .......................................................... 359 read-only ............................................................ 361 Project create your own ................................................. 62 example ................................................................. 61 Projection ............................................ 209, 217, 345 Projection node ................................ 257, 270, 328 Proof-of-concept (POC) ..................................... 399 Proximity search ................................................. 425 Proxy table ............................................................. 388 Pruning configuration table ........................... 253 Public cloud ........................................ 106, 115, 124 Q Query ....................................................................... 341 Query unfolding .................................................. 541 Query view .......................................... 598, 599, 613 R R integration ......................................................... 433 R language ..................................................... 100, 372 RAM ............................................................................. 75 Range ............................................................... 294, 326 partitioning ...................................................... 540 Rank node ........................................... 258, 270, 273 dynamic partition element ........................ 274 sorting ................................................................. 258 Ranking .......................................................... 209, 217 Reads SQL data ..................................................... 358 Real-time computing ................................................ 521, 543 data ...................................................................... 208 reporting ............................................................ 185 Recommender systems .................................... 436 Recursive table ..................................................... 190 Red Hat Enterprise Linux ................................... 61 Refactoring ................................ 129, 170, 175, 176 errors ................................................................... 173 Referential integrity ................................. 191, 226 Referential join ............. 190–192, 216, 218, 226, 227, 243, 277, 525 star join .............................................................. 204 Regression algorithms ...................................... 435 Relationship direction ...................................... 484 Replication ...................................................... 95, 381 Report writers .......................................................... 27 Reporting system ................................................ 592 REST .......................................................... 42, 101, 401 Restricted column .................. 279, 286, 287, 326 operator ............................................................. 288 using calculated columns ........................... 290 Restrictions ........................................................... 573 Returns .................................................................... 353 Reuse view .......................................... 598, 599, 612 Reverse geocoding .............................................. 475 Right outer join .............. 188, 189, 219, 226, 244 Roles ......................... 549, 553, 557, 565, 581, 585 assign to user ................................................... 577 composite role ........................................ 553, 581 create ................................................................... 565 template role .................................................... 565 Round robin partitioning ................................ 540 Row-based database ............................................................... 83 storage ................................................................... 83 table ........................................................................ 79 Row-oriented tables ........................................... 183 Rows ......................................................................... 183 storage ................................................................ 522 Runtime objects ............................... 129, 131, 556 naming ............................................................... 166 Runtime version ................................................. 174 S Sales forecasting .................................................. 432 Sandbox .................................................................. 399 SAP Analysis for Microsoft Office ....... 110, 605 SAP Analytics Cloud ........................ 108, 122, 126 SAP Basis .................................................................... 29 SAP Business Suite ...................................... 99, 385 SAP Business Warehouse (SAP BW) ..... 96, 202, 383, 551 SAP BusinessObjects Web Intelligence ...... 110 SAP BW on SAP HANA .................. 43, 97, 99, 585 SAP BW/4HANA ...................................................... 99 Index SAP C/4HANA ....................................................... 106 SAP Cloud Appliance Library ...................... 52, 53 SAP Cloud Application Programming Model .................................................................. 200 SAP Cloud Platform ................... 50, 51, 106, 107, 129, 524 SAP Community ..................................................... 46 SAP CRM .................................................................. 551 SAP Crystal Reports .................................. 110, 122 SAP Data Quality Management ..................... 384 SAP Data Services ............................. 383, 384, 405 SAP Developer Center .......................................... 50 SAP Digital Boardroom ..................................... 109 SAP Education .......................................................... 39 SAP Enterprise Architecture Designer ........ 156 SAP ERP .......................................................... 551, 594 SAP extractor .............................................. 382, 404 SAP Fiori .................................................................. 101 SAP HANA as a database ..................................... 96, 98, 595 as a development platform ........................... 99 as a platform .................................................... 115 as a side-by-side accelerator solution ................................................... 94, 596 as a virtual machine ...................................... 106 as an accelerator ............................................... 96 BI tools ................................................................ 111 client .................................................................... 112 cloud .......................................................... 104, 106 create calculation view ................................ 236 data provisioning ........................................... 377 flowgraph .......................................................... 475 for series data ................................................... 503 information models ...................................... 157 project ................................................................. 151 reference guides ................................................. 45 Security Guide ..................................................... 45 SQLScript Reference .......................................... 45 training courses .......................................... 39, 41 Troubleshooting and Performance Analysis Guide ............................................... 45 SAP HANA 1.0 as a development platform ............................................................. 100 SAP HANA Academy ............................................. 46 SAP HANA Business Function Library (BFL) Reference .............................................................. 45 SAP HANA certifications ..................................... 28 all exams ............................................................... 31 associate level ..................................................... 28 specialist level ..................................................... 30 SAP HANA cockpit ....... 145, 148, 149, 535, 543, 547, 556, 567 security ............................................................... 562 SAP HANA Data Management Suite ............ 403 SAP HANA database explorer .... 162, 163, 175, 318, 486, 530 SAP HANA deployment infrastructure ...... 139 SAP HANA Developer Guide .............................. 45 SAP HANA Developer Quick Start Guide ...... 45 SAP HANA Enterprise Cloud ........ 106, 107, 115 private cloud .................................................... 107 SAP HANA Interactive Education (SHINE) .............. 45, 56, 61, 199, 262, 356, 480 application ........................................................... 59 data sets ................................................................ 63 SAP HANA Live .............. 547, 583, 585, 591, 593, 597, 610, 614 administration ................................................ 600 architecture ............................................. 595, 596 benefits ............................................................... 595 definition ........................................................... 591 reporting in transactional systems ......... 592 security ............................................................... 552 tags ....................................................................... 604 views ....................................... 589, 599, 606, 612 SAP HANA Live Browser ................ 589, 600, 613 all views .............................................................. 601 invalid views ..................................................... 605 toolbar ................................................................ 602 SAP HANA Live extension assistant ........... 589, 600, 606, 613, 614 restrictions ........................................................ 608 SAP HANA Modeling Guide ............................... 44 SAP HANA Predictive Analysis Library (PAL) Reference .............................................................. 45 SAP HANA remote data sync .......................... 398 SAP HANA smart data access ......................... 388 adapter ............................................................... 389 virtual tables .................................................... 390 SAP HANA smart data integration ...... 392, 393 SAP HANA smart data quality ..... 392, 394, 475 SAP HANA spatial services .............................. 480 631 632 Index SAP HANA SPS 04 .... 85, 91, 151, 163, 197, 200, 237, 240, 255, 259, 302, 303, 325, 344, 348, 350, 362, 364, 399, 424, 434, 437, 447, 465, 468, 474, 476, 480, 497, 535, 536, 561, 566 SAP HANA streaming analytics ............ 395, 396 SAP HANA Studio ................................................ 556 SAP HANA views .................................................. 171 SAP HANA XS .................. 100, 101, 138, 144, 162 application privileges ................................... 576 DB Utilities JavaScript API Reference ........ 45 JavaScript API Reference ................................ 45 JavaScript Reference ........................................ 45 SAP HANA XSA ......... 55, 56, 100, 138, 143, 144, 162, 361, 422, 549 administration tool ....................................... 562 architecture ............................................. 129, 131 Cockpit ................................................................... 57 development users ......................................... 559 runtime ............................................................... 142 security concepts ............................................ 554 SAP HANA, express edition .............. 50, 56, 136 local system ......................................................... 54 SAP Help .................................................................... 43 SAP Landscape Transformation Replication Server ................................. 385, 596 trigger-based replication ............................ 385 SAP Learning Hub .................................................. 41 SAP Leonardo ............................................... 397, 523 SAP Lumira ..................................................... 62, 605 SAP Lumira, designer edition ............... 110, 122 SAP Lumira, discovery edition ............. 110, 122 SAP NetWeaver ....................................................... 75 SAP Predictive Maintenance and Service ................................................................. 523 SAP Replication Server ............................. 387, 388 SAP resources .......................................................... 43 SAP S/4HANA ...................... 82, 93, 101, 123, 585 SAP S/4HANA embedded analytics .... 608, 613 new features ..................................................... 609 SAP HANA Live ................................................ 608 SAP SQL Anywhere ............................................. 398 SAP SQL Data Warehousing ................... 102, 104 SAP SuccessFactors ............................................. 106 SAP Vora ........................................................ 401, 402 SAP Web IDE for SAP HANA ........... 58, 129, 131, 151, 175, 300, 448, 475, 487, 527, 556 output fields ..................................................... 244 SAP Web IDE for SAP HANA (Cont.) roles ..................................................................... 567 UI .......................................................................... 158 SAPUI5 .............................................. 27, 43, 100, 101 Savepoint ................................................. 88, 89, 125 Scalable Vector Graphic (SVG) ....................... 479 Scalar function .................................. 337, 353, 354 Scale-out architecture .......................................... 86 Schema ........................................................... 556, 558 SCORE() function ................................................. 422 Scriptserver ........................................................... 438 Search ...................................................................... 417 Seasonal pattern .................................................. 432 Secondary storage ................................................. 87 Security ................................................ 351, 539, 549 concepts .................................................... 551, 553 HDI ....................................................................... 554 SAP HANA as a database ............................ 551 SAP HANA as a platform .................... 552, 583 single sign-on ................................................... 563 user and role management ........................ 562 Segmentation ....................................................... 435 Segregation of duties ........................................ 578 SELECT * ................................................ 344, 367, 372 Self-join .......................................................... 189, 216 Semantics ............................................................... 213 Semantics node ................................ 263, 270–272 columns tab ...................................................... 266 hide fields .......................................................... 267 input parameter ............................................. 297 renaming fields ............................................... 266 session client .................................................... 266 top node ............................................................. 239 variable .............................................................. 294 view properties tab ........................................ 264 Sentiment analysis .................................... 425, 453 Sequential execution ........................................ 359 Series data ..................................................... 498, 499 examples ............................................................ 500 modeling ................................................... 459, 498 Series tables ........................................................... 505 Servers ........................................................................ 86 Session variable .......................................... 302, 303 Set-oriented .................................................. 340, 366 Shortest path algorithm .......................... 490, 492 Show lineage ................................................ 172, 269 Index Side-by-side deployment .................................... 94 advantages ........................................................... 95 blank system ........................................................ 95 Single sign-on ....................................................... 583 Single value .................................................. 294, 326 Slow disk .................................................................... 77 Smart meter ........................................................... 500 Social network analysis algorithms ............. 436 Software ..................................................................... 61 Software as a service (SaaS) ............................. 106 Space ............................................................... 151, 560 development ..................................................... 145 Spatial clustering ........................................................... 476 data type ......................................... 347, 365, 466 data type, supertype ...................................... 467 expressions ........................................................ 472 functions ............................................................ 478 import data ....................................................... 479 join ........................ 194, 216, 244, 274, 277, 469 join type .............................................................. 347 modeling .................................................. 459, 462 point ..................................................................... 466 processing ....................................... 459, 463, 508 Spatial reference system (SRS) ....................... 464 Spatial reference system identifier (SRID) ................................................................... 464 Spatial TREAT expression ................................ 467 SQL ....................................... 100, 337, 340, 365, 519 analytic privileges ................................ 266, 584 conditional statement .................................. 346 creating calculated column ....................... 345 creating projections ...................................... 345 creating tables ................................................. 342 creating union ................................................. 346 Data Definition Language .......................... 341 Data Manipulation Language .................. 341 dynamic ........................................... 351, 368, 372 Expression Editor ............................................ 283 filter data ........................................................... 345 guidelines ........................................................... 541 language ............................................................ 340 query .................................................................... 532 reading data ..................................................... 344 security ..................................................... 354, 358 SQL (Cont.) set-oriented ....................................................... 340 view ................................................... 185, 234, 270 SQL Analyzer ............................ 529, 531, 543, 547 SQL Console ..................... 165, 359, 530, 531, 568 SQL Editor ............................................................... 574 SQL Engine ............................................................. 266 SQL/MM .................................................................. 462 SQLScript ...... 100, 302, 337, 340, 347, 362, 372, 447, 448, 519, 541, 545, 580 compiler ............................................................. 348 declarative logic .............................................. 350 dynamic SQL ..................................................... 351 for loop ............................................................... 350 libraries ............................................................... 361 multilevel aggregation ................................ 362 optimizer ............................................................ 348 predictive model ............................................. 452 procedure ........................................................... 358 security ...................................................... 351, 352 separate statement ........................................ 347 variable ............................................................... 347 while loop .......................................................... 350 Standard deviation ............................................. 210 Standard union .................................................... 251 Star join ................................................................... 204 referential join ................................................. 204 view ............................................................. 207, 217 views .................................................................... 205 Star join node ..................................... 270, 273, 276 data foundation .............................................. 262 restricted column ........................................... 288 Statement ............................................................... 341 Static list input parameter ...................... 298, 327 Statistics algorithms .......................................... 436 Stemming ...................................................... 415, 421 Stored procedure ................................. 80, 123, 358 Strongly connected components algorithm .................................................. 490, 494 Sum ........................................................................... 210 Supervised learning .................................. 436, 454 SUSE Linux Enterprise Server (SLES) ....... 56, 61 Synchronous ......................................................... 125 replication ............................................................ 90 System privileges ................................................ 569 System-versioned tables .................................. 343 633 634 Index T Table ................................................................ 183, 198 create ................................................................... 342 data source .............................................. 234, 269 definitions ......................................................... 163 function ........................................... 337, 353, 356 function node .................................................. 261 join ....................................................................... 525 left and right .................................................... 186 link ........................................................................ 184 partitioned ................................................. 86, 534 recursive ............................................................. 190 select .................................................................... 184 split ...................................................................... 540 type ...................................................................... 444 Technical performance tuning experts ........ 27 Technical user ....................................................... 557 Template roles ...................................................... 565 modeling role ................................................... 565 monitoring role ............................................... 565 Temporal join ................ 190, 193, 194, 216, 218, 226, 277 Tenants ........................................................... 104, 274 TensorFlow ............................................................ 433 Term document matrix ................................... 416 Text analysis ............................. 411, 415, 425, 452 configuration file ........................................... 415 elements ............................................................. 430 result storage ................................................... 429 results .................................................................. 427 SAP HANA ......................................................... 429 Text index .............................................................. 416 Text join .................. 190, 192, 216, 220, 226, 244 left outer join ................................................... 193 Text mining ..................... 411, 415, 417, 430, 452 capabilities ............................................... 430, 452 configuration file ........................................... 416 index .................................................................... 416 Text processing ........................................... 411, 414 Text search ............................................................. 415 Time dimension .................................................. 238 calculation view .............................................. 237 Time series ............................................................. 436 data ...................................................................... 508 database ............................................................ 498 Time-based calculation view ................. 238, 272 Time-based hierarchy ....................................... 214 Time-dependent hierarchy ............................. 313 value help .......................................................... 314 Tokenization ................................................ 415, 421 Trace and diagnostic files ................................ 535 Training courses ....................................... 28, 41, 42 BW362 ............................................................. 30, 43 HA100 ...................................................... 29, 35, 42 HA200 ................................................................... 29 HA201 ..................................................................... 29 HA215 ............................................................... 29, 43 HA240 .................................................................... 29 HA250 .................................................................... 29 HA300 ...................................................... 29, 35, 42 HA301 .............................................................. 29, 42 HA400 ............................................................ 30, 43 HA450 ............................................................. 30, 42 HOHAM1 ............................................................... 42 UX402 .................................................................... 43 Transaction Control Language (TCL) ........... 341 Transaction manager ........................ 88, 120, 125 Transactional data .............................................. 217 Transact-SQL (T-SQL) ......................................... 340 Transform .............................................................. 379 Trigger ..................................................................... 385 Twelve-factor app ............................................... 134 U UDF ............................................... 353, 358, 366, 368 characteristics ................................................. 363 Undelimited identifier ..................................... 342 Unicode .......................................................... 342, 386 Union .............. 184, 207, 210, 211, 217, 525, 544 all .......................................................................... 211 creating .............................................................. 346 empty behavior ............................................... 254 pruning ............................................................... 544 with constant values ........................... 211, 251 Union node ................................ 251, 270, 271, 328 constant value ................................................. 252 mapping ............................................................. 252 pruning configuration table ...................... 253 User Account and Authorization (UAA) .... 555 Users ...................................................... 549, 553, 559 SYSTEM user ..................................................... 560 Index V Value help ..................................................... 289, 301 hierarchy ............................................................ 314 view ............................................................ 598, 613 Variable ................... 279, 293, 326, 329, 330, 348 create ......................................................... 294, 295 modeling ............................................................ 294 type ............................................................. 294, 326 Variance .................................................................. 210 Versioned table .................................................... 343 Vertical aggregation ........................................... 507 Views ........................ 181–183, 216, 218, 362, 571 disappear ........................................................... 185 save ...................................................................... 185 Virtual classrooms ................................................. 40 Virtual data model (VDM) .............. 21, 589, 591, 597, 611 Virtual machines ................................................. 119 Virtual table ........................................ 234, 270, 389 vs native table .................................................. 390 VMware Player ........................................................ 56 VMware vSphere .............................. 106, 119, 124 W Web services ....................................... 100, 400, 401 Well-Known Binary (WKB) ............................... 479 Well-Known Text (WKT) ................................... 479 WGS84 ...................................................................... 464 What’s New documentation .............................. 45 Where-used ......................................... 170, 174, 176 While loop .............................................................. 350 With results view ................................................. 359 Workspace ..................................................... 151, 157 World Geodetic System .................................... 464 X xsa-cockpit tool .................................................... 147 635 Service Pages The following sections contain notes on how you can contact us. Praise and Criticism We hope that you enjoyed reading this book. If it met your expectations, please do recommend it. If you think there is room for improvement, please get in touch with the editor of the book: meaganw@rheinwerk-publishing.com. We welcome every suggestion for improvement but, of course, also any praise! You can also share your reading experience via Twitter, Facebook, or email. Supplements If there are supplements available (sample code, exercise materials, lists, and so on), they will be provided in your online library and on the web catalog page for this book. You can directly navigate to this page using the following link: http://www.sap-press.com/4876. Should we learn about typos that alter the meaning or content errors, we will provide a list with corrections there, too. Technical Issues If you experience technical issues with your e-book or e-book account at SAP PRESS, please feel free to contact our reader service: support@rheinwerk-publishing.com. About Us and Our Program The website http://www.sap-press.com provides detailed and first-hand information on our current publishing program. Here, you can also easily order all of our books and e-books. Information on Rheinwerk Publishing Inc. and additional contact options can also be found at http://www.sap-press.com. i Legal Notes This section contains the detailed and legally binding usage conditions for this e-book. Copyright Note This publication is protected by copyright in its entirety. All usage and exploitation rights are reserved by the author and Rheinwerk Publishing; in particular the right of reproduction and the right of distribution, be it in printed or electronic form. © 2019 by Rheinwerk Publishing, Inc., Boston (MA) Your Rights as a User You are entitled to use this e-book for personal purposes only. In particular, you may print the e-book for personal use or copy it as long as you store this copy on a device that is solely and personally used by yourself. You are not entitled to any other usage or exploitation. In particular, it is not permitted to forward electronic or printed copies to third parties. Furthermore, it is not permitted to distribute the e-book on the Internet, in intranets, or in any other way or make it available to third parties. Any public exhibition, other publication, or any reproduction of the e-book beyond personal use are expressly prohibited. The aforementioned does not only apply to the e-book in its entirety but also to parts thereof (e.g., charts, pictures, tables, sections of text). Copyright notes, brands, and other legal reservations as well as the digital watermark may not be removed from the e-book. Digital Watermark This e-book copy contains a digital watermark, a signature that indicates which person may use this copy. If you, dear reader, are not this person, you are violating the copyright. So please refrain from using this e-book and inform us about this violation. A brief email to info@rheinwerk-publishing.com is sufficient. Thank you! ii Trademarks The common names, trade names, descriptions of goods, and so on used in this publication may be trademarks without special identification and subject to legal regulations as such. All of the screenshots and graphics reproduced in this book are subject to copyright © SAP SE, Dietmar-Hopp-Allee 16, 69190 Walldorf, Germany. SAP, the SAP logo, ABAP, Ariba, ASAP, Concur, Concur ExpenseIt, Concur TripIt, Duet, SAP Adaptive Server Enterprise, SAP Advantage Database Server, SAP Afaria, SAP ArchiveLink, SAP Ariba, SAP Business ByDesign, SAP Business Explorer, SAP BusinessObjects, SAP BusinessObjects Explorer, SAP BusinessObjects Lumira, SAP BusinessObjects Roambi, SAP BusinessObjects Web Intelligence, SAP Business One, SAP Business Workflow, SAP Crystal Reports, SAP EarlyWatch, SAP Exchange Media (SAP XM), SAP Fieldglass, SAP Fiori, SAP Global Trade Services (SAP GTS), SAP GoingLive, SAP HANA, SAP HANA Vora, SAP Hybris, SAP Jam, SAP MaxAttention, SAP MaxDB, SAP NetWeaver, SAP PartnerEdge, SAPPHIRE NOW, SAP PowerBuilder, SAP PowerDesigner, SAP R/2, SAP R/3, SAP Replication Server, SAP S/4HANA, SAP SQL Anywhere, SAP Strategic Enterprise Management (SAP SEM), SAP SuccessFactors, The Best-Run Businesses Run SAP, TwoGo are registered or unregistered trademarks of SAP SE, Walldorf, Germany. Limitation of Liability Regardless of the care that has been taken in creating texts, figures, and programs, neither the publisher nor the author, editor, or translator assume any legal responsibility or any liability for possible errors and their consequences. iii