Best Practices for SAP HANA Modeling and SAP Data Services Data Loading
Dr. Berg, Comerit
© Copyright 2014 Wellesley Information Services, Inc. All rights reserved.

In This Session
• We will explore SAP BusinessObjects Data Services and how to load information into SAP HANA
• You will learn how to create transformations, merges, and joins
• We will look at the best practices of modeling in SAP HANA
• We will see step by step how to create calculation, attribute, and analytic views
• At the end of this session you will know how to load data and create views to analyze the data

What We’ll Cover
• BusinessObjects Data Services
  Data Services Overview
  Creating Batch Jobs
  Loading From Flat Files
  Building Transforms and Using Functions
  Creating Table Joins
  Utilizing Data Merging
• SAP HANA
• Wrap-up

Data Services Overview
• SAP Data Services is a leading technology for enterprise information management, providing solutions for:
  Data integration
  Data quality
  Data profiling
  Text data processing
SAP Data Services transforms, refines, and delivers trusted data for the enterprise data warehouse (EDW).

What Are Data Services Batch Jobs?
Batch jobs are used to:
• Extract data from one or many sources
• Transform data to meet the organization’s business requirements
• Load the processed data to a location for use

Step-by-Step: Creating Batch Jobs
1. Create a new project and give it a relevant project name
2. Right-click on the project to create a new batch job
Giving relevant names to your projects and batch jobs helps keep them organized.

Step-by-Step: Loading from Flat Files
1. Select the related batch job to enter its workspace
2. From the ‘Format’ category in the ‘Local Object Library’ panel, right-click on ‘Flat Files’ and select ‘New’

Examples of Other Available Data Sources
Many other data sources can be used in Data Services.
• Use the local object library to find existing data sources under the ‘Datastore’ category
• More files can be added under the ‘Format’ category

Formatting the Flat File
3. In the pop-up ‘File Format Editor’, fill in the appropriate fields
• The date format must match the format of the dates in the data
• ‘Tab’ was chosen as the delimiter because the data fields are separated by tabs

Defining Table Fields
4. Enter the field properties
Notice the updated view below.

Preview Data
5. In the repository under ‘Format’, right-click and select ‘View Data’ to preview the newly added data source
This lets you check that the source data loaded without error before you use it.

Transforms Overview
• Transforms are built-in objects that process source data to produce the desired output
• The most commonly used transform is the Query transform
• The Query transform enables you to:
  Filter and select data from a source
  Join data from multiple sources
  Map columns from input to output schemas
  Perform data nesting and unnesting
  Add new columns to the output schema
  Assign primary keys to the output schema

Adding a Data Flow Object to the Workspace
The tool palette contains icons for creating new objects in the workspace.
1. Drag a data flow icon from the tool palette to the workspace
2. Double-click on the data flow to enter its workspace
When you create a reusable object, such as a data flow, it automatically appears in the local object library.

Adding a Data Source to the Workspace
1. Drag a data source (e.g., a flat file) from the local object library onto the workspace
2. Create a connection between the data source and the query transform

Query Editor Overview
3. Double-click on the query transform to open the ‘Query Editor’
• The query editor is a graphical interface for carrying out query operations
• It contains three areas:
  Schema In area
  Schema Out area
  Parameters area

Setting Up the Output Table
4. Drag the desired output fields from ‘Schema In’ to ‘Schema Out’
You only need to drag the fields that you want to appear in Schema Out; it is not necessary to drag all of them.

Creating a New Output Column
1. Right-click on an output field and select ‘New Output Column’
2. Select where to insert the new column
New columns can be created to display the results of calculations.

Defining Column Properties
3. In the ‘Column Properties’ pop-up, name the new column and define its properties
Give the column a descriptive name that identifies what it is used for.

Using Functions
1. Double-click in the cell under ‘Mapping’
2. Click on ‘Functions’
3. Select the appropriate category and then the specific function
For this demo, we want to calculate the number of days a case was open.

Setting Up the Function
4. Define the input parameters for the function
Use the drop-down list to select the input parameters; this avoids typos.
Notice the updated code in this panel for the NO_DAYS_CASE_OPEN column after the input parameters are defined. This formula returns the number of days from ODATE to CDATE, giving us a measure of how long it takes to close a case.

Adding an Output Table to the Workspace
1. Drag and drop a template table into the workspace to be the output table
2. Link the query to the template table
A template table is an object that can be used as a target that is populated when a job executes successfully. It can also be saved in the object library for later use as a data source. A template table lets us view the specific information we want without the risk of altering the source data. The data populated into the template table is determined by the output schema defined in the query transform.

Executing a Job
1. Right-click on the job and select ‘Execute’
To analyze any issues that may occur during data loading, select ‘Enable auditing’ and make sure that ‘Use collected statistics’ is checked.

Job Log Overview
• The log file displays a list of actions in the job execution. If any errors occur, the error icon will appear. Otherwise, ‘Job is completed successfully’ will be displayed
• The job log has five columns:
  Pid         Process identification number of the executing thread
  Tid         Thread identification number
  Number      Error number prefix followed by a number
  Time Stamp  Date and time the thread generated a message
  Message     Error description of the thread

Job Log Overview, Continued
• A successful job execution shows no error icon
• A job with errors shows the error icon. Double-click on the error icon to view the list of errors

How to Preview the Output Table
1. Click on the data flow to open its workspace
2. Click on the magnifying glass on the output table to view its data
Notice that the column created earlier is formatted correctly as a number and that the data is the result of the function defined.

Creating Table Joins
A join can be used to combine data from multiple sources into one target. Use the FROM clause of the Query transform to join the two sources: ‘Query’ and ‘Join’.
In this example, Source 1 has the car description for the case, while Source 2 has the solution to the case. The query transform combines the data from the two sources in the Schema Out section to produce a result that displays the overall case solution.

Result from a Table Join
1. Once the tables have been joined in the query transform, execute the job as described earlier
2. Enter the data flow workspace and click on the magnifying glass to view the results in the output table
Notice in the output table how the Solution column from the ‘Join’ source is now combined with the fields from the ‘Query’ transform.
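
The effect of the table join described above can be illustrated outside Data Services with a few lines of Python. This is only a sketch of the idea, not what the Query transform does internally. The sample rows, the column names CASE_ID, CAR_DESCRIPTION, and SOLUTION, and the choice of an inner join are all assumptions made for illustration; the slides do not state the join key or the join type.

```python
# Hypothetical sample rows. The column names CASE_ID, CAR_DESCRIPTION and
# SOLUTION are assumptions made for this sketch, not taken from the demo data.
source_1 = [
    {"CASE_ID": 101, "CAR_DESCRIPTION": "Sedan"},
    {"CASE_ID": 102, "CAR_DESCRIPTION": "Pickup truck"},
    {"CASE_ID": 103, "CAR_DESCRIPTION": "Hatchback"},
]
source_2 = [
    {"CASE_ID": 101, "SOLUTION": "Replaced brake pads"},
    {"CASE_ID": 102, "SOLUTION": "Updated engine software"},
]


def inner_join(left_rows, right_rows, key):
    """Return one combined row for every pair of rows that share `key`."""
    # Index the right-hand source by key so each left row is matched quickly.
    right_by_key = {}
    for right in right_rows:
        right_by_key.setdefault(right[key], []).append(right)

    joined = []
    for left in left_rows:
        for right in right_by_key.get(left[key], []):
            combined = dict(left)   # copy so the source rows stay untouched
            combined.update(right)  # add the columns from the second source
            joined.append(combined)
    return joined


result = inner_join(source_1, source_2, "CASE_ID")
for row in result:
    print(row)
```

In this sketch, case 103 has no matching row in the second source, so an inner join leaves it out of the result. An outer join would keep it with an empty solution.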
Merges Overview
• You can merge rows from two or more sources into a single data set
• All sources must have the same schema for the Merge transform to execute:
  Same number of columns
  Same column names
  Same data type for each column

How to Create a Merge
1. To merge two sources, add a query transform to each source so that the data has the same format in both sources
2. Connect the queries to a ‘Merge Transform’
3. When you open the ‘Merge Transform’, notice that the fields and data types match across all input and output fields

How to Avoid Creating Duplicated Data in Merges
4. To avoid duplicate rows, add a query transform that outputs distinct rows only
5. Execute the job to complete the merged table

Demo of Data Loading with Data Services

What We’ll Cover
• BusinessObjects Data Services
• SAP HANA
  SAP HANA Overview
  Creating Attribute Views
  Building Analytic Views
  Making Calculation Views
• Wrap-up

SAP HANA – In Memory Options
• SAP HANA is sold as an in-memory appliance. This means that both software and hardware are included from the vendors
• Currently you can buy SAP HANA solutions from Cisco, Dell, Fujitsu, IBM, HP, NEC, Huawei, and others
• SAP HANA indexes and compresses data from a variety of sources, including ERP, and stores the data in memory
SAP HANA can radically change the way databases operate and make systems dramatically faster.
Source: SAP AG, 2014

HANA Editions and Components
While HANA is sold as an appliance, it has many internal components, and the edition you buy may include licenses for a different set of these components.

Area: Platform Edition
Component ID            Component Name
BC-DB-HDB               SAP HANA database
BC-DB-HDB-ENG           SAP HANA database engine
BC-DB-HDB-PER           SAP HANA database persistence
BC-DB-HDB-SYS           SAP HANA database interface
BC-DB-HDB-DBA           SAP HANA database/DBA cockpit
BC-DB-HDB-POR           SAP HANA DB Porting
BC-DB-HDB-BAC           SAP HANA Backup and Recovery
BC-CCM-HAG              SAP Host agent
BC-DB-HDB-CCM           SAP HANA CCMS
BC-DB-HDB-CLI           SAP HANA Clients (JDBC/ODBC)
BC-DB-HDB-R             SAP HANA Integration with R
BC-DB-HDB-SCR           SAP HANA SQL scripts
BC-DB-HDB-MDX           MDX engine: Microsoft Excel client
BC-HAN-MOD              SAP HANA Studio - Information Modeler
BC-HAN-3DM              Information Composer
BC-HAN-SRC              SAP HANA UI toolkit
BC-DB-HDB-TXT           SAP HANA Text and Search features
BC-DB-HDB-DXC           SAP HANA Direct extraction connector
BC-DB-HDB-SEC           SAP HANA Security and User Mgmt
BC-DB-HDB-XS            SAP HANA Application Services
BC-DB-HDB-AFL           SAP HANA Advanced functions library
BC-DB-HDB-AFL-PAL       SAP HANA Predictive analysis library
BC-DB-HDB-AFL-SOP       SAP HANA Sales & Operations Planning
BC-DB-HDB-PLE           SAP HANA Planning Engine

Area: Lifecycle Management
Component ID            Component Name
BC-HAN-SL-STP           SAP HANA unified installer
BC-HAN-UPD              Software Update Manager
BC-DB-HDB-INS           SAP HANA database installation
BC-DB-HDB-UPG           SAP HANA database upgrade

Area: Enterprise Edition (also has the platform edition components)
Component ID            Component Name
BC-HAN-DXC              SAP HANA Direct Extractor Connection
EIM-DS                  SAP Data Services: ETL-based
BC-HAN-LOA              SAP HANA Load Controller: log-based
BC-HAN-LTR              SAP Landscape Transformation (SLT): trigger-based
BC-HAN-REP              Sybase Replication Server: log-based

Area: End User Clients
Component ID            Component Name
BI-BIP-CMC, BI-BIP      BI Platform
BI-RA-WBI               Web Intelligence
BI-RA-XL                Dashboard Designer
BI-RA-CR, BI-BIP-CRS    SAP Crystal Reports
BI-RA-EXP               SAP BusinessObjects Explorer
BI-BIP-IDT              Information Design Tool (for universes)
BI-RA-AO-XLA            Microsoft Excel add-in

Some of the Hardware Options
Dell R920
Example: IBM 3850 X6

Hardware Options 2014 Onward
These systems are based on Intel’s E7 Ivy Bridge processors with 15 cores per processor (the previous generation had only 10).
UPDATE: Hitachi servers and the Dell R920 are now also available.

Attribute Views - Overview
• Master data reporting can be modeled using attribute views
• Can be regarded as master data tables
• Can be linked to fact tables in analytic views
• A measure, e.g. weight, can be defined as an attribute
• Table joins and properties:
  Left outer, right outer, full outer, or text table
  Cardinality 1:1, N:1, 1:N
  Language column
• Some views and functions are shipped with HANA

Creating a New Attribute View
1. Open HANA Studio and expand the ‘Content’ folder
2. Right-click on the appropriate package in your system
3. Navigate to New > Attribute View…

Naming the New Attribute View
1. Give the view a name
2. Add a description
3. Finish and start adding and joining tables to the view
The name and description should accurately describe the attribute view you want to create.

Adding Tables to the Data Foundation
1. Open the ‘Catalog’ folder
2. Expand the system
3. Expand the ‘Tables’ folder
4. Drag the necessary table to the ‘Data Foundation’

Adding More Tables to the Data Foundation
• Add more tables by dragging another table to the data foundation area
• The join type is set in the Properties panel
• The first table that was added will be on the left in the ‘Details’ panel

Applying Filters to the View
Filters can be used to limit the data being displayed. Right-click on the attribute you want to filter on and select ‘Apply Filter’ from the context menu.
This example shows the creation of a filter on the ‘VALID_TO’ date field. Setting the value to ‘9999-12-31’ limits the result set to records that are valid indefinitely.

Making Attributes Visible to End Users
1 & 2. To make an attribute visible to users, click the circle beside each attribute
3. An attribute can be set as a key or changed to a certain type of label
Save and validate once complete.

Analytic View - Overview
• Logically very close to InfoCubes in BW
• Join master data to one central fact table that contains the measures for reporting
• Can include calculated measures and variables
• Analytic views do not store data
• Data is read from the column store tables or views according to the analytic view’s structure
An example of an analytic view might be sales by product, customer, and organizational entity.

Starting an Analytic View
• Analytic views are the most common views for reporting purposes
• They are the basic view type used as source data in the SAP BusinessObjects BI tools (or other front-end tools)
• We will join sales data with product information
• This view will be quite simple, but analytic views can be as complex as you like
Analytic views do not have to use attribute views. They can simply be a join of master data tables and a fact table.

Adding a New Analytic View
1. Find the appropriate package
2. Right-click and choose ‘New’ > ‘Analytic View’
3. Provide a technical name and a description in the pop-up that follows
Ensure that the ‘View Type’ drop-down is set to ‘Analytic View’.

Adding Fields to the Output
Add tables to the data foundation by clicking and dragging them into it. Select which attributes will be shown in the output by selecting the gray circles next to each item.

Setting Attributes and Measures
In the semantic layer, you can assign the attribute or measure type to the items that were selected for the output. This is necessary for attributes and measures to be displayed and aggregated properly in the reporting layer.
Select which attributes will be shown in the output by selecting the gray circles next to each field.

Joining Tables
In the ‘Logical Join’, two or more tables must be joined on fields that are identical or contain the same values.
1. Select the ‘Logical Join’ node
2. Drag another view or table into the node
3. Drag from one view to the other on the common field (e.g., Product to Product)
By default, this creates a referential join of the table to the ‘Data Foundation’.

Creating a New Calculated Column
Now we will add a new calculated field called ‘Net Sales’. On the ‘Advanced’ tab you can set the type of value, such as currency or percentage.

Demo - Building Attribute and Analytic Views

Calculation View - Overview
• Bring together database tables, attribute views, analytic views, and other calculation views
• Provide one source of data for reporting tools
• You can also write SQL statements to make sure a set of fields matches the requirements of other output structures
Calculation views are used to satisfy complex business requirements. An example of a calculation view might be a comparison of actual sales with forecast sales.

Creating a New Calculation View
A calculation view will now be created to join other tables and views and to use calculations and aggregations to analyze the data.
1. Right-click on the appropriate package
2. In the context menu, click ‘New’ > ‘Calculation View’

Naming the New Calculation View
Give the calculation view a proper name and label. The ‘Copy From’ option can be used to copy and extend an existing calculation view without editing the original view or having to build a new one from scratch each time.

Propagate to Semantics
In the projection layer, right-click on the attributes you want to display in the semantic layer and choose ‘Propagate to Semantics’. If you choose ‘Add to Output’ instead, the field will have to be activated manually in every node.

Creating a New Calculation in the View
Calculated columns derive meaningful information, in the form of new columns, from existing columns.
1. Give the column a proper name
2. Set the ‘Data Type’
3. Choose a function
4. Select the text within the parentheses
5. Choose an element (or attribute in your table)
6. Validate the syntax
You can add your own calculations to the calculation view just as in the analytic view.

Aggregation - Overview
• Aggregation node: columns are rolled up, or aggregated, when placed in this layer

Source data:
Customer  Product  Amount
1         1        20
1         1        20
2         2        30
3         3        25
4         4        20

With an aggregation on the customer and amount columns, you would get a data set that looks like the following:
Customer  Amount
1         40
2         30
3         25
4         20
Customer 1’s amounts were added up, so there is one less row to display.

Adding a Calculated Column to the Aggregation Layer
In the aggregation node, calculated columns can be added as aggregated columns. Calculations should be added in a projection layer and then passed to an aggregation node; otherwise, the totals will not work properly in reporting.

Assigning Column Types to the View
In the semantics layer, each item needs to be assigned the type attribute or measure.
1. Click on the ‘Semantics’ node
2. Click the ‘Auto Assign’ button to automatically assign the ‘Type’
3. If any of the types are incorrect, you can manually adjust them
Once all assignments are complete, save and validate the view.
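
The roll-up shown in the Aggregation overview earlier in this section can be reproduced with a short Python sketch. It uses the five sample rows from that slide and sums Amount per Customer. The function name `aggregate` is a hypothetical helper for illustration; this is not how the HANA aggregation node is implemented, only a way to check the arithmetic.

```python
# The five sample rows from the Aggregation overview slide.
rows = [
    {"Customer": 1, "Product": 1, "Amount": 20},
    {"Customer": 1, "Product": 1, "Amount": 20},
    {"Customer": 2, "Product": 2, "Amount": 30},
    {"Customer": 3, "Product": 3, "Amount": 25},
    {"Customer": 4, "Product": 4, "Amount": 20},
]


def aggregate(rows, group_by, measure):
    """Sum `measure` for each distinct value of `group_by` (hypothetical helper)."""
    totals = {}
    for row in rows:
        group = row[group_by]
        totals[group] = totals.get(group, 0) + row[measure]
    # One output row per group, in order of first appearance.
    return [{group_by: group, measure: total} for group, total in totals.items()]


aggregated = aggregate(rows, "Customer", "Amount")
for row in aggregated:
    print(row["Customer"], row["Amount"])
```

The two rows for customer 1 collapse into a single row with an amount of 40, so five input rows become four output rows.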
You can set each of these types manually, but the automatic assignments are usually correct.

7 Key Points to Take Home
• SAP Data Services transforms, refines, and delivers trusted data for the enterprise data warehouse
• Multiple data sources can be used for Data Services, including flat files, DTDs, XML schemas, Excel workbooks, and more
• Use the built-in transforms, which are objects that process source data to produce the desired output
• SAP HANA indexes data from a variety of sources and stores the results on a dedicated server
• Attributes add details and can be modeled using attribute views
• Analytic views are built around one central fact table and can include calculated measures and variables for reporting
• Calculation views bring together database tables, attribute views, analytic views, and other calculation views

Where to Find More Information
• www.sap-press.com/products/SAP-HANA%3A-An-Introduction(2nd-Edition).html
  Bjarne Berg and Penny Silvia, SAP HANA: An Introduction, SAP Press; 3rd edition (May 1, 2014)
• http://www.saphana.com/welcome
  SAP’s main page for all SAP HANA related information
• http://www.saphana.com/community/try
  Powered by HANA demos
• http://scn.sap.com/community/hana-in-memory
  SAP HANA and In-Memory Computing by SAP HANA Community

Your Turn!
How to contact me:
Dr. Berg
bberg@comerit.com
Please remember to complete your session evaluation

Disclaimer
SAP, R/3, mySAP, mySAP.com, SAP NetWeaver®, Duet™®, PartnerEdge, and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and in several other countries all over the world. All other product and service names mentioned are the trademarks of their respective companies.
Wellesley Information Services is neither owned nor controlled by SAP.