SIZING AND PERFORMANCE
ASAP FOR BW ACCELERATOR
SAP BUSINESS INFORMATION WAREHOUSE

Performance
Performance issues and tuning of a BW system
Version 2.0, June 2000

Table of Contents

1   Introduction
1.1   Software Version Supported
2   Performance
2.1   Influencing factors
2.1.1   Golden rules
2.1.2   InfoCube Design
2.1.3   Loading of Data
2.1.4   Querying
2.1.5   Size of Queries
2.1.6   Batch printing
2.2   System settings
2.2.1   Database Storage Parameters
2.2.2   DB Optimizer statistics
2.3   Monitoring
2.3.1   BW Statistics
3   Measurements
4   Detailed Information

1 Introduction

This document describes performance issues in a BW system and presents monitoring tools that help you improve performance.

1.1 Software Version Supported

This document was written specifically for BW version 2.0B, but it should apply to all versions of BW.

2 Performance

Database (DB) functionality, Business Information Warehouse (BW) coding, and the system's implementation all influence the performance of a BW system. Improvements in database platforms, tools, and Basis technology are constantly incorporated into the BW coding to achieve better performance, and the code is also optimized based on experience gained from customer installations. This paper focuses on the issues that need to be dealt with during your BW implementation, or later in production, to achieve better performance.
2.1 Influencing factors

2.1.1 Golden rules

The most crucial factors that influence the performance of data loading and querying are listed below. Paying attention to these golden rules will help you avoid unnecessary performance problems. There are, of course, further factors that influence performance; they are described in the other chapters.

Data loading
- Aggregates when loading deltas (2.1.3.4.2)
- Buffering of number ranges (2.1.3.4.1, 2.1.3.5.1)
- InfoCube design (2.1.2)
- Load master data before transaction data (2.1.3.3)
- Parallel upload (2.1.3.5.1)
- Package size (2.1.3.1)
- Secondary indexes for fact table dropped? (2.1.3.5.1)
- Use of Persistent Staging Area (PSA) (2.1.3.3)

Querying
- Aggregates (see methodology paper "Aggregates")
- Avoid huge query results (2.1.5, 2.1.4)
- DB statistics (2.2.2)
- Hierarchies (2.1.2.7)
- InfoCube design (2.1.2)
- Navigational attributes (2.1.2.6)
- Secondary indexes existing and analyzed? (2.1.3.5.2)

2.1.2 InfoCube Design

Before starting to create InfoCubes in SAP BW it is crucial to seriously consider the data model. Data modeling is often a controversial topic and many approaches exist. In this document you will only find a short discussion of this issue, since the design of an InfoCube has a significant influence on performance. Data modeling is treated in detail in a separate methodology paper entitled "Data Modeling with BW".

When designing InfoCubes you should consider
- business processes and data
- users' reporting requirements
- decision processes
- the level of detail required

In this section we summarize the most important data modeling issues with respect to performance. Although we are primarily addressing query performance, some issues related to data uploads are discussed as well.

2.1.2.1 Fact table

The fact table consists of dimension table keys and key figures. Characteristics indirectly define the fact table key; they are values that describe the key figures more exactly. Examples of characteristics are customer, order, material, or year. In BW, characteristics are grouped into dimensions. The fact table's key consists of the pointers to the dimensions surrounding the fact table. For each combination of characteristics uploaded into BW, the corresponding key figures are found in the fact table. Key figures are usually additive numeric values (for example, amounts and quantities), but they can also be values such as averages, unit prices, auxiliary date fields, non-cumulative values, and non-additive calculations (for example, price).

2.1.2.2 Fact table granularity

Volume is always a concern for fact tables. The level of detail has a large impact on querying efficiency and overall storage requirements. The grain of the fact table is directly impacted by the dimension table design, because the most atomic characteristic in each dimension determines the grain of the fact table.

Let us say, for example, that the performance of outlets and articles needs to be analyzed. Descriptive attributes are outlet, receipts, articles, customers, and time. Limit the analysis to articles and time, and further assume that 1,000 articles are grouped into 10 article groups and that article group performance is tracked on a weekly basis.

- Granularity: article group, week, 300 sales days a year (45 weeks)
  10 x 45 = 450 records in the fact table per year due to these two attributes alone, if all articles are sold within a week.
- Granularity: article, week, 300 sales days a year (45 weeks)
  1,000 x 45 = 45,000 records in the fact table per year due to these two attributes alone, if all articles are sold within a week.
- Granularity: article, day, 300 sales days a year
  1,000 x 300 = 300,000 records in the fact table per year due to these two attributes alone, if all articles are sold within a day.
- Granularity: article, hour, 300 sales days a year, 12 sales hours a day
  500 x 300 x 12 = 1,800,000 records in the fact table per year due to these two attributes alone, if on average 500 articles are sold within an hour.
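The pattern behind these figures is simply the number of distinct attribute values occurring per period multiplied by the number of periods per year. A minimal sketch of the arithmetic (plain Python, purely illustrative; the function name is not a BW object):

    # Fact table rows per year = attribute values occurring per period x periods per year.
    # Purely illustrative; reproduces the figures of the example above.
    def rows_per_year(values_per_period, periods_per_year):
        return values_per_period * periods_per_year

    print(rows_per_year(10, 45))         # article group, week          ->       450
    print(rows_per_year(1000, 45))       # article, week                ->    45,000
    print(rows_per_year(1000, 300))      # article, day                 ->   300,000
    print(rows_per_year(500, 300 * 12))  # article, hour (500 per hour) -> 1,800,000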
2.1.2.3 Fact table considerations

Large fact tables impact reporting and analysis. Therefore you should always consider whether aggregates on the fact table are a feasible way to improve performance and reduce hardware needs.

In addition, consider partitioning of the fact table. Many database platforms support table partitioning. Partitioning can only be set up on the E table of an InfoCube (which stores the compressed requests, see 2.1.3.7) before any data has been loaded into the InfoCube. Currently, partitioning by calendar month or fiscal period is possible.

Another concept of partitioning on a logical level is available in BW: MultiCubes. Setting up a cube as a MultiCube enables you to read from and load into smaller cubes in parallel, thus improving performance. More information on MultiCubes is available in the methodology paper "Multi-Dimensional Modeling with BW".

Furthermore, keep the number of key figures to a minimum and avoid storing values that can be calculated. For example, instead of storing the average price, store quantity and revenue; the average price can then be calculated in the query (revenue/quantity).

2.1.2.4 Dimension tables

Each InfoCube may have up to 16 dimensions. There are 3 default dimensions: time, unit, and package. This leaves a maximum of 13 user-defined dimensions. Possible dimensions could be customer, order, date, and material. A maximum of 248 characteristics can be defined in each dimension.

A dimension should be defined in such a way that each row in a dimension table has several corresponding rows in the fact table. The fact table and dimension tables are arranged according to the star schema. This means that for each query the dimensions are browsed first, and then, with the gathered key values, all records in the fact table are selected that have these values in their fact table key. In general, dimensions should be modeled in such a way that the number of instances is kept small, i.e. the dimension tables should be small. This is important for the star join mechanism to work properly. Some rules of thumb are:

(a) The ratio of the size of a dimension table to the size of the fact table should be less than 15%. This limit is heuristic and has no deep scientific background.

(b) No combination of characteristics that are put into the same dimension should have an n:m relationship. For example, it is usually not a good idea to put customers and products into the same dimension: customers buy many different products and each product is bought by many customers. Thus there is an n:m relationship between these two entities and, as a consequence, many combinations of customers and products appear as entries in the corresponding dimension table. Therefore n:m relationships are likely to violate rule (a).
Example: 10,000 customers and 10,000 products
- in two separate dimensions: 10,000 + 10,000 = 20,000 records
- in one common dimension: up to 10,000 x 10,000 = 100,000,000 records

Obviously, if either n or m is small (i.e. 2, 3 or 4), this should not necessarily be considered a violation of rule (b).

(c) It is better to have many dimensions with few characteristics each, rather than a few dimensions, each with many characteristics.

(d) If you have a characteristic which has a different instance for almost every fact table record (a line item characteristic), you can set up a dimension as a line item dimension and include only this characteristic in the dimension (before any data is loaded into the InfoCube). For this dimension no separate dimension table is created; instead, the characteristic is included in the fact table itself, thus improving performance for both loading and querying.

2.1.2.5 Master data tables

Master data is a common description for values that are InfoCube-independent, i.e. they can be used with several InfoCubes. Master data may have a descriptive text and can be used with hierarchies. Usually a master data table exists for each characteristic in a dimension table. Besides the key, the master data tables can contain additional navigational attributes, which behave like characteristics. For example, customer may contain customer number, customer group, customer region, customer name, and customer address.

2.1.2.6 Characteristics vs. Navigational Attributes

Using navigational attributes always incurs a performance penalty in comparison to using the same InfoObject as a characteristic. Therefore, you should carefully consider whether an InfoObject is used as a navigational attribute or as a characteristic.

2.1.2.7 Hierarchies

In BW, there are essentially three possibilities for modeling hierarchies:
- as a hierarchy of characteristics within a dimension
- as a hierarchy of attributes attached to a characteristic
- as an external hierarchy

Let us take a quick look at the pros and cons of these different modeling techniques.

2.1.2.7.1 Hierarchies within a Dimension

A typical example of a hierarchy fitting into this context is a time hierarchy with levels such as millennium - century - decade - year - month - day - hour. Another typical example is a geographic hierarchy with levels such as continent - country - state - region - city. Hierarchies that can be modeled within a dimension have certain properties:

- The number of levels is fixed; each level is represented by an InfoObject. Example: a geographic dimension with the InfoObjects 0COUNTRY (country), 0REGION (region) and 0CITY (city).
- Either the hierarchy does not change, or its changes do not apply to the past (for example, to facts that are already loaded into an InfoCube). For example, the geographic hierarchy above changed during German unification: a city like "Dresden" suddenly belonged to another country. However, this change should not usually affect data/facts that refer to the time before German unification, as at that time the previous geographical dependencies applied.

The performance aspects of this technique are: queries on InfoCubes that use this kind of hierarchy are generally faster than the same queries on InfoCubes that model the same scenario with one of the two other hierarchy modeling techniques. However, BW does not explicitly know about hierarchical dependencies.
Therefore, aggregates that summarize data over regions are not used for queries that summarize over countries unless the country is included in that aggregate as well. You should therefore always (manually) include in such an aggregate the hierarchical levels that lie above the level over which data is summarized.

Example 1: If an aggregate summarizes data over 0REGION, then include 0COUNTRY in that aggregate, too.
Example 2: If an aggregate summarizes data over months, then include years and decades, too.

2.1.2.7.2 Hierarchies as Navigational Attributes of a Characteristic

This case is very similar to the one discussed in the previous section. The difference is the increased flexibility (for example, realignment facilities) that comes with navigational attributes. The hierarchy should still have a fixed number of levels. However, changes to that hierarchy (i.e. changes to attribute values) can easily be applied to facts that are already loaded into a cube. This is the essential difference from section 2.1.2.7.1. A typical example is the hierarchy sales office - sales group - sales person: it has a fixed number of levels but is frequently reorganized.

From a performance perspective, the same arguments hold as in section 2.1.2.6. In general, this is the least attractive hierarchy modeling technique, as it performs worse than 2.1.2.7.1 and frequently no better than the technique described in the following section. It is more flexible than 2.1.2.7.1 but less flexible than 2.1.2.7.3.

2.1.2.7.3 External Hierarchies

An ideal external hierarchy changes frequently and/or has no fixed number of levels (sometimes referred to as an unbalanced hierarchy). A typical example is a cost center hierarchy in which several (sub-)cost centers belong to one cost center, which itself belongs to another cost center, and so on. Such a hierarchy has no fixed number of levels, because cost centers usually correspond to departments or groups within a company which might be reorganized into new subgroups. Thus, new levels might be introduced, old ones might disappear, and the hierarchy might be deeper at one end (due to a deeper hierarchical organization) and shallower at the other (see the short sketch at the end of this section).

Another major advantage of external hierarchies over the alternatives is that an InfoObject can have several such hierarchies, and all of them can be used within the same InfoCube. The same effect could only be achieved through unpleasant work-arounds when using the alternative approaches.

The performance issues connected with this type of hierarchy are the following:
- External hierarchies usually perform worse than hierarchies modeled within dimensions.
- They usually perform at least as well as hierarchies based on navigational attributes.
- Problems can arise for big external hierarchies containing many thousands of nodes and leaves. In that case it might be better to consider one of the other two alternatives.
- You can explicitly define aggregates on levels of such hierarchies. Queries that summarize data on higher levels can take advantage of such an aggregate.
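To make the notion of an unbalanced hierarchy more concrete, the following minimal sketch (plain Python, purely illustrative; the cost center names and values are invented) stores such a hierarchy as parent-child relations and rolls leaf postings up recursively; this is exactly the kind of summarization that external hierarchies, and aggregates defined on their levels, are meant to support:

    # A cost center hierarchy with no fixed number of levels, stored as parent -> children.
    # Names and values are invented for illustration only.
    children = {
        "CC-ROOT": ["CC-1000", "CC-2000"],
        "CC-1000": ["CC-1100", "CC-1200"],
        "CC-1100": ["CC-1110"],          # this branch is one level deeper than the others
    }
    costs = {"CC-1110": 50, "CC-1200": 30, "CC-2000": 120}   # postings on leaf nodes

    def rollup(node):
        """Sum the costs of a node and all of its descendants."""
        return costs.get(node, 0) + sum(rollup(c) for c in children.get(node, []))

    print(rollup("CC-1000"))   # 80
    print(rollup("CC-ROOT"))   # 200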
2.1.2.8 Aggregates

The objective of aggregates is to reduce the volume of data read per query. An aggregate is in fact a separate, transparent InfoCube which holds aggregated data. If aggregates have been defined and created, users get the improved performance without any intervention or even knowledge of them.

For example, consider the following fact data:

Country  Customer  Sales
US       1         10
US       2         15
IT       1         20
IT       3         50

Using a country aggregate:

Country  Sales
US       25
IT       70

Using a customer aggregate:

Customer  Sales
1         30
2         15
3         50

Aggregates can be created on navigational attributes, hierarchy levels, and dimension characteristics. Aggregates cannot be created on hierarchies whose structure is time-dependent or on time-dependent navigational attributes. The best results are gained with aggregates on external hierarchies and on navigational attributes. Detailed information on aggregates can be found in the methodology paper "Aggregates".

2.1.3 Loading of Data

Two different techniques are available for loading data into BW: loading data using the IDoc technology or using tRFC.

2.1.3.1 Data Packages

The data package size for BW file or BAPI uploads is defined by the parameter IDOC_PACKET_SIZE in the table RSADMINC and is set as the number of records that fill a data package. For loading data from the R/3 OLTP system, the maximum size of one data IDoc can be set in the source system table ROIDOCPRMS (size in kB).

The following issue is relevant when determining an appropriate value for IDOC_PACKET_SIZE: when the update rules in BW find records with equal keys, the rows within an IDoc are pre-aggregated if possible. This means that rows with the same key values are aggregated, which reduces the number of INSERT operations at the database level. Pre-aggregation is restricted to the rows within a data package. If the data that is to be loaded into the data warehouse is likely to benefit from this pre-aggregation, meaning it is likely to hold many rows with matching key values, then data packages should comprise many rows in order to maximize the benefit from pre-aggregation (see the short sketch at the end of this section).

Performance improvements can be obtained if the following recommendations are adhered to:

1. Keep the number of data packages small. We recommend a data package size of 20000 to 50000 kB (limited by RAM and depending on the database; see SAPNet note 130253). You can maintain this value for an R/3 System in its table ROIDOCPRMS and for a file in the BW system in table RSADMIN.
2. When loading large quantities from a file, we recommend that you split the file into several parts. These can then be loaded into BW in parallel with several requests. A precondition is, however, that several processors and a fast RAID are available in the BW system.
3. Use a predefined record length (ASCII file) when loading from a file. For a CSV file, the conversion to a predefined record length is carried out in the system during the loading process.
4. If possible, load the data from a file on the application server and not from the client workstation.
5. First load the master data for all characteristics which are in the InfoCube or InfoSource in question. As a result, you avoid transaction data containing many characteristic values for which a new SID must first be determined.
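The pre-aggregation mentioned at the beginning of this section can be pictured as a simple group-and-sum over the rows of one data package. A minimal sketch (plain Python, purely illustrative and not BW coding; the field names are invented) shows why a large package, which is more likely to contain rows with matching keys, results in fewer INSERT operations than many small packages:

    # Collapse rows with identical characteristic keys within one data package,
    # summing the key figures, before they are inserted into the fact table.
    # Purely illustrative; field names are invented.
    def preaggregate(package):
        totals = {}
        for row in package:
            key = (row["customer"], row["material"], row["calday"])
            totals[key] = totals.get(key, 0) + row["quantity"]
        return totals

    package = [
        {"customer": "C1", "material": "M1", "calday": "20000601", "quantity": 5},
        {"customer": "C1", "material": "M1", "calday": "20000601", "quantity": 3},
        {"customer": "C2", "material": "M1", "calday": "20000601", "quantity": 7},
    ]
    print(preaggregate(package))   # 3 input rows collapse to 2 inserts

With a package size of only one record, no such pre-aggregation can take place at all.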
2.1.3.2 Persistent Staging Area (PSA)

The PSA is a storage area in which transaction data from different source systems can be stored temporarily. Data storage takes place in relational database tables of the BW system. The data format remains unchanged, meaning that no summarizations or transformations whatsoever take place, as they do in the case of InfoCubes. When using the PSA, an individual transparent PSA table is created for each transfer structure, with the same structure as the respective transfer structure.

If you change the transfer structure by maintaining the transfer rules, as a rule a new PSA table is generated. The previous table remains unchanged and is given a version. The data gets into the PSA per combination of source system and InfoSource. If you set up data extraction with the PSA, you achieve improved performance because tRFC is used, and you can trace the data flow of individual data records thanks to the temporary storage in the PSA.

2.1.3.3 Benefits of PSA vs. IDoc

The PSA
- allows a record length of more than 1000 bytes
- performs faster data loading
- provides different load methods:

Updating data in the PSA and InfoCubes/InfoObjects at the same time
This is a way of carrying out a high-performance update of data in the PSA and in one or more InfoCubes. BW receives data from the source system, immediately starts the update in the InfoCubes, and at the same time saves the data in the PSA.

Updating data in the PSA and InfoCubes/InfoObjects at different times
This is the standard method. Data is first updated in the PSA and can subsequently be updated from there via the context menu of the corresponding request.

The preferred method for loading data into BW is to use the PSA whenever this is possible for the specific type of InfoSource (the PSA is currently not usable for hierarchy data).

Before loading transaction data into BW, we recommend that the master data load is finished. The master data load creates the SIDs for the key values; this is much more efficient than creating them during the transaction data upload. More important is the fact that the transaction data can be checked against the existence of master data if the master data is loaded first. To get clean and valid data, we recommend that you load transaction data with the check option. If there is any invalid data, the complete InfoPackage which contains the invalid value is marked as not posted in the monitor, and you can identify in the monitor log which value of which InfoObject caused the problem.

2.1.3.4 Master data

2.1.3.4.1 Initial load

Normally the master data load into BW is not as performance-critical as the load of transactional data, since the amount of data is in general much smaller. To improve performance, switch on number range buffering for all attributes and key values which must be converted to a SID during the load and for which you expect a high number of different values. Number range buffering is switched off by default because it consumes memory and normally leads to a loss of numbers if new entries occur only rarely. Therefore, after the initial load of the master data, please reset the buffering to a smaller interval. Please proceed as follows:

1. Call transaction SE37, function module RSD_IOBJ_GET, single test.
   Input: I_IOBJNM = name of the InfoObject.
   Result: the structure E_S_VIOBJ contains a field NUMBRANR with a 7-digit number; the name of the number range object is built by adding BIM to this number.
   The table E_T_ATR_NAV contains all navigational attributes; to get all involved number ranges, repeat this for all entries of E_T_ATR_NAV.
2. Call transaction SNRO (maintenance of number range objects).
   Enter the name of the number range object and press the change button.
   Choose EDIT -> SET-UP BUFFERING -> MAIN MEMORY.
   Now you can enter how many numbers should be buffered in the number range buffer. Normally a value between 50 and 1000 is appropriate.

After the successful load of the master data records, change the settings back to no buffering.
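Conceptually, buffering a number range means that a work process no longer performs one database access per new SID but reserves a whole block of numbers at a time and assigns the following numbers from memory; numbers remaining in a discarded block are lost, which is why buffering is not the default. A minimal model of the idea (plain Python, purely illustrative, not the actual SAP number range implementation):

    # Illustrative model of number range buffering: fetch a block of numbers at once
    # instead of one database round trip per new SID. Not the actual SAP implementation.
    class BufferedNumberRange:
        def __init__(self, buffer_size):
            self.buffer_size = buffer_size
            self.next_number = 1      # persisted "current number" of the range
            self.block_end = 1        # end of the block reserved in memory
            self.db_roundtrips = 0

        def next_sid(self):
            if self.next_number >= self.block_end:          # buffer exhausted:
                self.db_roundtrips += 1                     # one DB access reserves a new block
                self.block_end = self.next_number + self.buffer_size
            sid = self.next_number
            self.next_number += 1
            return sid

    unbuffered = BufferedNumberRange(buffer_size=1)
    buffered = BufferedNumberRange(buffer_size=500)
    for _ in range(10000):
        unbuffered.next_sid()
        buffered.next_sid()
    print(unbuffered.db_roundtrips, buffered.db_roundtrips)   # 10000 vs 20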
2.1.3.4.2 Delta load

Changes to master data can cause real performance problems, because all dependent aggregates must be rebuilt. In a first step, the changes are stored in the master data table as "modified", and all queries still run against the old active values. The activation process then recomputes all dependent aggregates, deletes the old "active" version, and changes the new "modified" version to "active". The same happens when changing external hierarchies.

2.1.3.5 Transaction data

2.1.3.5.1 Initial load

The initial load of transaction data normally involves a very large amount of data, and there is no special functionality for it. So what is the best way to load a large amount of transaction data into BW?

Extraction
The first step of the loading process is to build the extract from the OLTP system. To support parallel loading you can build several extracts containing disjoint data. Often the best criterion for splitting the extracts is time; for example, if you load 2 years of transaction data you get 24 extracts if you build one extract per month. The extracts can be written into the PSA in parallel. Where possible, it is beneficial to provide the data sorted by characteristics in the order of the fact table key. The recommendation is to sort the data if the DataSource is a data file.

Update rules
If possible, provide the extraction data in such a way that no complex update rules (e.g. data modification and verification) must be processed. If you need to read master data records in an update routine, use the standard functionality instead of your own coding. In-depth information about performance-efficient programming can be obtained with transaction SE38, report RSHOWTIM.

Upload into the InfoCube
We recommend that you choose the data load with checking for the existence of master data records. This means the master data for the characteristics should be loaded first; loading data with "automatic SID creation" causes a large overhead.

Before a mass data load is triggered, the secondary indexes on the fact table should be deleted. After the mass data load has finished, these indexes must be recreated. The benefit of this procedure is to avoid the overhead caused by maintaining such an index during the inserts (see the sketch at the end of this section).

Number range buffering for dimension tables should be switched on for the initial load. This will speed up the loading process dramatically, especially if very large dimension tables must be built during loading. For the delta load, the buffering should be reset to a smaller interval. For the standard dimensions TIME, UNIT and PACKAGE you normally do not need to activate number range buffering.

How does it work? Buffering the number range for a dimension:

1. Transaction SE37, function module RSD_CUBE_GET.
   Enter the name of the InfoCube, object version: A, bypass buffer: X, and press 'Execute'.
   In the table E_T_DIME the name of the number range object can be found under NOBJECT (e.g. BID0000053).
2. For each number range object: transaction SNRO, enter the object, press 'Change', and choose menu 'Edit' -> 'Set up buffering' -> 'Main memory'. Values between 50 and 1000 are reasonable.

The effect of the above recommendations on performance cannot be quantified in absolute values or factors; it depends heavily on the amount of data to be loaded. In general, the effect increases with the data volume, but there is no linear relationship.
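The index handling mentioned above (drop the secondary indexes of the fact table, load the data, recreate the indexes) can be sketched as follows. This is only an illustration using generic Python DB-API calls; the connection handling, table name and index names are invented and are not the names BW generates:

    # Drop the secondary indexes on the fact table before a mass load and recreate
    # them afterwards, so the database does not maintain them during every insert.
    # Connection, index and table names are invented for illustration only.
    def mass_load(conn, load_packages):
        cur = conn.cursor()
        secondary_indexes = {
            "FACT_IDX_CUSTOMER": "CREATE INDEX FACT_IDX_CUSTOMER ON FACT_SALES (DIM_CUSTOMER)",
            "FACT_IDX_TIME":     "CREATE INDEX FACT_IDX_TIME ON FACT_SALES (DIM_TIME)",
        }
        for name in secondary_indexes:                 # 1. drop the indexes before the load
            cur.execute("DROP INDEX " + name)
        for package in load_packages:                  # 2. mass insert without index maintenance
            cur.executemany(
                "INSERT INTO FACT_SALES (DIM_CUSTOMER, DIM_TIME, QUANTITY) VALUES (?, ?, ?)",
                package)
        for ddl in secondary_indexes.values():         # 3. recreate the indexes once, afterwards
            cur.execute(ddl)
        conn.commit()

The placeholder style of the INSERT statement depends on the database driver; the point is only the order of the three steps.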
Normally, you should run the upload in the background. If you want to load online, please set the SAP R/3 parameter rdisp/max_wprun_time to 0 (zero) in order to allow unlimited CPU time for dialog work processes during mass data loads.

2.1.3.5.2 Delta load

Typically a delta load is much smaller than an initial load, so performance is not as critical. Otherwise, the same recommendations hold as for initial loads.

2.1.3.6 General remarks

No formal rule exists for update frequencies from source systems to BW. BW allows loading data during normal reporting, but you should consider the competition between read and load processes. You have to define your company-specific data load frequencies, such as a daily update for SD and a monthly update for FI, and evaluate the impact on query execution and performance. The number of aggregates you can maintain depends on the frequency of the transactional data load: on an InfoCube with a low frequency of delta uploads (monthly) you can maintain more aggregates than on an InfoCube with a high delta frequency. The posting of master data changes is a different situation: if many navigational attributes are involved, the activation of the master data causes the rebuild of all affected aggregates.

The database administrator should create separate tablespaces for very large fact tables (and very large ODS or PSA tables). The storage parameters must be adapted to the large size of these tables. Most of the problems during initial loads are tablespace overflows of fact, ODS or PSA tables; in the case of IDocs, the table EDI40 causes most of the problems. With enough disk space and large extents (e.g. 100 - 500 MB) it is easy to avoid abnormal terminations of the loading processes. Please bear in mind that the indexes of large tables are also large. For the maintenance of the rollback segments in BW, we recommend that you have as many rollback segments as the maximum number of parallel upload processes. Enough disk space for the temporary tablespace and the rollback tablespace is necessary for large package sizes and for the index creation of large tables. This temporary space is also needed by the OLAP engine, especially for the creation of aggregates.

2.1.3.7 Compression of the InfoCube

When uploading a request into BW, the data of each request is saved with its own request ID. As a result, records with the same key may be contained in an InfoCube several times, but with different request IDs. This differentiation is necessary to be able to delete single requests from an InfoCube after they have been uploaded. As a consequence, it is necessary to aggregate over the request ID each time a query is executed. Apart from the additional disk space needed for the different requests, this separation decreases performance.

The compress function is designed to eliminate this disadvantage and thus speed up reporting: all the different requests are aggregated and stored in one separate request, and the original requests are deleted after the compression is done. Afterwards it is no longer possible to delete the records of a particular request from the InfoCube. For performance and disk space reasons, a request should be compressed as soon as possible, provided you are sure that the request will never need to be deleted from the InfoCube.
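Conceptually, compression is a group-and-sum over all dimension keys except the package dimension, with the result stored under request ID 0; zero elimination (described below) additionally drops result rows whose key figures are all zero. A minimal sketch of this logic (plain Python, purely illustrative, not the actual BW implementation):

    # Compress fact rows: aggregate over the request ID (the package dimension) so that
    # all requests collapse into request ID 0. Purely illustrative, not BW code.
    def compress(fact_rows, zero_elimination=False):
        compressed = {}
        for request_id, dim_keys, key_figures in fact_rows:
            acc = compressed.setdefault(dim_keys, [0] * len(key_figures))
            for i, value in enumerate(key_figures):
                acc[i] += value
        return [(0, dims, kyf) for dims, kyf in compressed.items()
                if not (zero_elimination and all(v == 0 for v in kyf))]

    fact_rows = [
        # (request ID, (DimU, DimT, Dim1, Dim2), (Kyf1, Kyf2, Kyf3))
        (0, (1, 1, 1, 1), (100, 100, 50)),
        (6, (1, 1, 1, 1), (0, -100, 50)),
        (7, (1, 1, 6, 1), (100, 100, 50)),
    ]
    print(compress(fact_rows))
    # [(0, (1, 1, 1, 1), [100, 0, 100]), (0, (1, 1, 6, 1), [100, 100, 50])]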
The compress functionality has an additional influence on query performance if the InfoCube contains non-cumulative values: together with the compression described above, the reference point for the non-cumulative values is updated. As a result, in general less data has to be read for a query with non-cumulative values, and the response time is therefore better.

Example: Assume an InfoCube consisting of the following characteristics (with their dimensions):
- char1 (dimension 1)
- char2 (dimension 1)
- char3 (dimension 2)
- 0calday (dimension T)
- 0loc_currcy (dimension U)
- 0unit (dimension U)
- 0unit_of_wt (dimension U)
- 0requid (dimension P)

and the following key figures (with their units):
- kyf1 (0unit)
- kyf2 (0unit)
- kyf3 (0unit_of_wt)

Furthermore, we assume that the InfoCube already contains data which has been aggregated into request ID 0 by a previous condense run. The example simply shows the entries in the fact table, which means that the key figure values are only differentiated through the respective SIDs of the dimensions. The column "Nr." is not part of the InfoCube; it is just used to facilitate referencing individual rows.

Nr.  DimP  DimU  DimT  Dim1  Dim2  Kyf1  Kyf2  Kyf3
1    0     1     1     1     1      100   100    50
2    0     1     1     2     1      100  -100     0
3    0     1     1     6     1      100     0    50

Afterwards two further requests are loaded (assume request SIDs 6 and 7):

Nr.  DimP  DimU  DimT  Dim1  Dim2  Kyf1  Kyf2  Kyf3
1    6     1     1     1     1        0  -100    50
2    6     1     1     2     1     -100   100     0
3    6     1     1     3     1      100     0  -100
4    6     1     1     4     1        0   -50    50
5    6     1     1     5     1      100     0  -100
6    6     1     1     6     1     -100   100   -50

Nr.  DimP  DimU  DimT  Dim1  Dim2  Kyf1  Kyf2  Kyf3
1    7     1     1     4     1        0    50   -50
2    7     1     1     5     1      100   100   100
3    7     1     1     6     1      100   100    50

If a condense run is executed for these two requests, you get the following result:

Nr.  DimP  DimU  DimT  Dim1  Dim2  Kyf1  Kyf2  Kyf3
1    0     1     1     1     1      100     0   100
2    0     1     1     2     1        0     0     0
3    0     1     1     3     1      100     0  -100
4    0     1     1     4     1        0     0     0
5    0     1     1     5     1      200   100     0
6    0     1     1     6     1      100   200    50

This result table comes about through the following actions:
- No. 1 (resp. No. 2): update of record 1 (resp. 2) of request 0 with record 1 (resp. 2) of request 6.
- No. 3: insert of record 3 of request 6 into request 0.
- No. 4 (resp. No. 5): aggregation of record 4 of request 6 and record 1 of request 7 (resp. record 5 of request 6 and record 2 of request 7) and subsequent insert into request 0.
- No. 6: aggregation of record 6 of request 6 and record 3 of request 7 and subsequent update of record 6 of request 0.

The result table above contains several entries whose key figures are all zero. If these values are not desired (they can occur, for example, through reverse postings), a zero elimination can be carried out during the condense run. All entries in the fact table whose key figures are all zero are then eliminated.

Example: With zero elimination, the result table above would look as follows:

Nr.  DimP  DimU  DimT  Dim1  Dim2  Kyf1  Kyf2  Kyf3
1    0     1     1     1     1      100     0   100
3    0     1     1     3     1      100     0  -100
5    0     1     1     5     1      200   100     0
6    0     1     1     6     1      100   200    50

The two entries (2 and 4 of the result table without zero elimination) whose key figures are all zero are no longer contained in the resulting fact table. For entry 2, an already existing record in the fact table is deleted by record 2 of request 6; for entry 4, the insert of a new zero-value record is prevented, because record 4 of request 6 and record 1 of request 7 cancel each other out.

2.1.3.8 Administration of Extraction

The administration of a data warehouse is one main area where you can heavily influence system performance.
For example:

- The extraction of data from the source system should not compete with the online activities in either the source system or the data warehouse system. Loading should therefore take place during off-hours.
- Avoid large loads across a network.
- Opt for the proper loading source:
  - Avoid reading load files from tape; copy them to disk first.
  - Avoid placing input load files on the same disk drives or controllers as the tables being loaded.
  - During testing it is fine to load data directly from the workstation, but for mass data you should dedicate an application server in your source system landscape to extraction. When loading from the workstation, the whole file is stored in the RAM of the application server, which can lead to a RAM overflow. When loading data from the application server, the file is read record by record, so no limitation occurs with respect to the size of the RAM.
- Use truncate, not delete, if replacing the entire table contents.
- Investigate loading with archiving turned off. The log file can become the performance bottleneck, not to mention the disk space required for the log. (Do not forget to create a backup first and to turn archiving back on after loading.)

2.1.4 Querying

It might be beneficial to add further indexes on individual SID fields in the dimension or master data tables, in particular if such an SID refers to a characteristic that is frequently restricted by the queries running on your system. If possible, such indexes should be of type bitmap, but B-tree indexes are fine, too.

The design of the queries for an InfoCube has a direct impact on the reporting performance of the BW system. Naturally, a query is defined to return the result set that a user expects, and one does not want to concede anything with respect to that. However, users usually look at the query result in portions: they first look at, say, the profits per year, and only later drill down in order to see the profits per year and region. It is much faster to define this query with year in the rows and region in the free characteristics section. In comparison to a query that holds both region and year in the rows section of the query definition screen, this approach has the following advantage: the result set of the initial step is smaller. This implies faster query processing on the database server (due to smaller intermediate results) and faster data transport from the database to the application server and from there to the PC. This approach does not reduce the result that the user can possibly expect from the query; it merely separates the drill-down steps by using free characteristics. This not only accelerates query processing but also presents manageable portions of data to the user.

2.1.5 Size of Queries

There is no limitation on the size of an InfoCube or the size of a query result set, but there are limitations imposed by the operating system, by the application (such as Microsoft Excel's limit of 65,536 lines), or by the database-to-application-server communication, which may be a maximum of 1.5 million records. When defining queries, the duration of the query execution should be considered as well. For online browsing and interactive navigation in the result sets of a query, we recommend that you keep the result sets small (max. 10,000 records). For larger result sets, batch printing should be considered as an alternative.

2.1.6 Batch printing

You do not have to evaluate queries online only; you can also print them.
You can change the evaluation of a query in the online display interactively by navigating, whereas the evaluation of the query data in the printed version is fixed. You can print queries according to their online display, but special print settings can also be made (e.g. print title, maximum number of pages, result position, page layout for top/end page and header/footer part). Different evaluations (or navigational steps) are possible with these print settings (e.g. choose print characteristics from all available characteristics, new cover page, new page or new block for each characteristic); they are stored as different variants of the print settings of a query. This means that you can evaluate data both online and in a printout using just one query definition, and you can define the online display of a query and the print evaluation differently. Batch printing allows you to schedule print jobs, so you can use off-hours for this.

2.2 System settings

OLTP and OLAP systems serve different purposes. The typical main job of an OLTP system is fast order processing, whereas an OLAP system needs to be optimized for querying the database. Therefore, the SAP system parameters for the Business Information Warehouse differ from those of an OLTP system. This includes different defaults/parameters for:
- table buffers
- program buffers
- memory management

The difference between OLTP and OLAP systems mentioned above also has an impact on the database profile.

2.2.1 Database Storage Parameters

It is possible to put the fact table and the dimension tables of an InfoCube into tablespaces that are different from the default tablespace(s). This can be done by assigning a different "data type" to those tables; the data dictionary links those data types directly to particular tablespaces. In the case of AS/400 it is not necessary to set storage parameters, due to the single-level storage feature.

2.2.2 DB Optimizer statistics

The query optimizer of a database management system decides on the most appropriate query execution plan depending on the statistical information available for the tables (and possibly their indexes) that participate in the query. If statistical information is missing for one of the tables, the optimizer uses default statistics, and real-world experience shows that this usually ends in bad query execution plans. Therefore, it is necessary to make sure (a) that statistical information exists and (b) that it is kept up to date. That is why you should have the DB statistics recalculated after each data load into an InfoCube (if applicable for your database system).

2.3 Monitoring

2.3.1 BW Statistics

There are extensive BW Statistics which help you analyze system performance and define and optimize aggregates. The BW Statistics are explained in detail in the methodology paper "BW Statistics". The BW Statistics data allow you to answer questions such as:
- Which InfoCubes, InfoObjects, InfoSources, source systems, queries, and aggregates are used in the BW system?
- How large is the data volume in the system?
- How is the BW system being used, and by whom?
- Which queries are taking too much time for online reporting?
- Which resources have been used by which user / user group?
- Which system resources have been used by the database / OLAP processor / frontend?
In general, the BW Statistics show how the BW system is being used and can be used to identify system resource bottlenecks. You can switch the update of the BW Statistics on or off for each InfoCube; there are separate flags for OLAP and WHM (Warehouse Management) statistics.

3 Measurements

Certified results of the SAP BW Application Benchmark can be found at http://sapnet.sa.com/benchmark.

4 Detailed Information

Since any specific information on, for example, system parameters or database parameters is outdated as soon as it is written down, no such values are given here. Network requirements are currently being scrutinized, but there are no results yet. Please check SAPNet regularly for up-to-date information about BW performance. In the collective note 184905, "Collective Note Performance 2.0", you can find links to all relevant notes. There is also a three-day course, TABW90 "BW - Technical Administration", a large part of which deals with performance issues.