SIZING AND PERFORMANCE
Performance
ASAP FOR BW ACCELERATOR
SAP BUSINESS INFORMATION WAREHOUSE
Performance issues and tuning of a BW system
Version 2.0
June 2000
Table of Contents

ASAP for BW Accelerator
1 INTRODUCTION
1.1 Software Version Supported
2 PERFORMANCE
2.1 Influencing factors
2.1.1 Golden rules
2.1.2 InfoCube Design
2.1.3 Loading of Data
2.1.4 Querying
2.1.5 Size of Queries
2.1.6 Batch printing
2.2 System settings
2.2.1 Database Storage Parameters
2.2.2 DB Optimizer statistics
2.3 Monitoring
2.3.1 BW Statistics
3 MEASUREMENTS
4 DETAILED INFORMATION
1 Introduction
This document describes performance issues in a BW system. Monitoring tools will be presented that
help you improve performance.
1.1 Software Version Supported
This document was written specifically for BW version 2.0B, but it should apply to all versions of BW.
2 Performance
Database (DB) functionality, Business Information Warehouse (BW) coding, and the system’s
implementation influence the performance of a BW System.
Improvements in database platforms, tools and basis technology will constantly be incorporated into BW coding to achieve better performance. Code will also be optimized based on experience gained from customer installations.
This paper focuses on the issues that need to be dealt with during your BW implementation, or later in production, to achieve better performance.
2.1 Influencing factors
2.1.1 Golden rules
The most crucial factors influencing the performance of data loading and querying are listed below. Paying attention to these golden rules will help you to avoid unnecessary performance problems. Of course there are further factors that influence performance; they are described in the other chapters.


Data loading:

- Aggregates when loading deltas (2.1.3.4.2)
- Buffering of number ranges (2.1.3.4.1, 2.1.3.5.1)
- InfoCube design (2.1.2)
- Load master data before transaction data (2.1.3.3)
- Parallel upload (2.1.3.5.1)
- Package size (2.1.3.1)
- Secondary indexes for fact table dropped? (2.1.3.5.1)
- Use of the Persistent Staging Area (PSA) (2.1.3.3)

Querying:

- Aggregates (see the methodology paper "Aggregates")
- Avoid huge query results (2.1.5, 2.1.4)
- DB statistics (2.2.2)
- Hierarchies (2.1.2.7)
- InfoCube design (2.1.2)
- Navigational attributes (2.1.2.6)
- Secondary indexes existing and analyzed? (2.1.3.5.2)
2.1.2 InfoCube Design
Before starting to create InfoCubes in SAP BW it is crucial to seriously consider the data model. Data
modeling is often a controversial topic and many approaches exist. In this document you’ll only find a
short discussion of this issue since the design of an InfoCube has a significant influence on the
performance. Data Modeling is treated in detail in a separate Methodology paper entitled “Data
Modeling with BW”.
When designing InfoCubes you should consider:

- business processes & data
- users' reporting requirements
- decision processes
- level of detail required
In this section we summarize the most important data modeling issues with respect to performance.
Although we are primarily addressing query performance issues, some issues related to data uploads
will be discussed as well.
2.1.2.1 Fact table
The fact table consists of dimension table keys and key figures.
Characteristics indirectly define the fact table key; they are values which describe the key figures more exactly. Examples of characteristics are customer, order, material, or year. In BW, characteristics are grouped into dimensions. The fact table's key consists of all the pointers to the associated dimensions around the fact table. For each combination of characteristics uploaded into BW, the corresponding key figures are found in the fact table.
Key figures are usually additive numeric values (for example, amount and quantity), but they can also be values such as averages, unit prices, auxiliary date fields, non-cumulative values, and non-additive calculations (for example, price).
2.1.2.2 Fact table granularity
Volume is always a concern for fact tables. The level of detail has a large impact on querying
efficiencies and overall storage requirements.
The grain of the fact table is directly impacted by the dimension table design because the most atomic
characteristic in each dimension determines the grain of the fact table.
Let's say, for example, that the performance of outlets and articles needs to be analyzed. Descriptive attributes are: outlet, receipts, articles, customers, time. Limit the analysis to articles and time, and further assume 1,000 articles are grouped into 10 article groups. The article group performance is tracked on a weekly basis.

- Granularity: article group, week, and 300 sales days a year (45 weeks)
  10 x 45 = 450 records in the fact table per year due to only these two attributes if all articles are sold within a week.
- Granularity: article, week, 300 sales days a year (45 weeks)
  1,000 x 45 = 45,000 records in the fact table per year due to only these two attributes if all articles are sold within a week.
- Granularity: article, day, 300 sales days a year
  1,000 x 300 = 300,000 records in the fact table per year due to only these two attributes if all articles are sold within a day.
- Granularity: article, hour, 300 sales days a year, 12 sales hours a day
  500 x 300 x 12 = 1,800,000 records in the fact table per year due to only these two attributes if on average 500 articles are sold within an hour.
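The record counts above are simple products of the distinct values per characteristic. A minimal Python sketch of that arithmetic (illustrative only, using the figures assumed in the example above; not a sizing tool):

# Rough upper bound on fact table rows per year, using the assumptions
# from the granularity example above (not a sizing tool).

def rows_per_year(article_values, periods_per_year):
    # Worst case: every article (or article group) occurs in every period.
    return article_values * periods_per_year

print(rows_per_year(10, 45))         # article group / week ->       450
print(rows_per_year(1000, 45))       # article / week       ->    45,000
print(rows_per_year(1000, 300))      # article / day        ->   300,000
print(rows_per_year(500, 300 * 12))  # article / hour, 500 articles sold per hour on average -> 1,800,000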
2.1.2.3 Fact table considerations
Large fact tables impact reporting and analysis. Therefore you should always consider whether the use of aggregates of the fact table is a feasible method for improving performance and reducing hardware needs.
In addition, consider partitioning of the fact table. Many database platforms support table partitioning. Partitioning can only be set up on the E table (storing the compressed requests, see 2.1.3.7) of an InfoCube in the BW system before any data has been loaded into the InfoCube. Currently, partitioning by calendar month or fiscal period is possible. Another concept of partitioning on a logical level is available in BW: MultiCubes. Setting up a cube as a MultiCube enables you to read from and load into smaller cubes in parallel, thus improving performance. More information on MultiCubes is available in the methodology paper "Multi-Dimensional Modeling with BW".
Furthermore, keep the number of key figures to a minimum. Avoid storing values that can be calculated. For example, instead of storing the average price, store quantity and revenue. The average price can be calculated in the query (revenue / quantity).
2.1.2.4 Dimension tables
Each InfoCube may have up to 16 dimensions. There are 3 default dimensions: time, unit, and package. This leaves a maximum of 13 user-defined dimensions. Possible dimensions could be: customer, order, date, and material. A maximum of 248 characteristics can be defined in each dimension.
A dimension should be defined in such a way that each row in a dimension table has several corresponding rows in the fact table.
The fact table and dimension tables are arranged according to the star schema. This means that for each query the dimensions are browsed first; then, with the gathered key values, all records in the fact table that have these values in the fact table key are selected.
In general, dimensions should be modeled in such a way that the number of instances is kept small. This means that the dimension tables should be small. This is important for the star join mechanism to work properly. Some rules of thumb are:
(a) The ratio (size of dimension table) / (size of fact table) should be less than 15%. This limit is heuristic and has no deep scientific background.
(b) No combination of characteristics that are put into the same dimension should have an n:m relationship. For example, it is usually not a good idea to put customers and products into the same dimension: customers buy many different products and each product is bought by many customers. Thus there is an n:m relationship between these two entities and, as a consequence, many combinations of customers and products appear as entries in the corresponding dimension table. Therefore n:m relationships are likely to violate rule (a).

Example: 10,000 customers and 10,000 products
- for two dimensions: 10,000 + 10,000 = 20,000 records
- for one dimension: 10,000 x 10,000 = 100,000,000 records
Obviously, if either n or m is small (i.e. 2, 3 or 4) then this should not necessarily be considered as a violation of rule (b).
(c) It is better to have many dimensions with few characteristics rather than a few dimensions, each
with many characteristics.
(d) If you have a characteristic which has a different instance for almost every fact table record (a line
item characteristic) you can set up a dimension as a line item dimension and include only this
characteristic in the dimension (before any data is loaded into the InfoCube). For this dimension
no separate dimension table is created but the characteristic is included in the fact table itself,
thus improving performance for both loading and querying.
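Rule (b) can be sanity-checked with a quick cardinality estimate. The following Python sketch is illustrative only; the customer and product cardinalities are taken from the example under rule (b), and the fact table size is an assumed figure:

# Illustrative cardinality check for rule (b): dimension table sizes for
# 10,000 customers and 10,000 products, against an assumed fact table size.

customers = 10_000
products = 10_000
fact_rows = 50_000_000          # assumed fact table size, for illustration only

two_dimensions = customers + products   # separate dimensions: 20,000 rows in total
one_dimension = customers * products    # combined dimension, worst case: 100,000,000 rows

print(two_dimensions / fact_rows)   # 0.0004 -> well below the 15% rule of thumb (a)
print(one_dimension / fact_rows)    # 2.0    -> 200%, clearly violates rule (a)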
2.1.2.5 Master data tables
Master data is a common description for values that are InfoCube-independent, i.e. they can be used
with several InfoCubes.
Master data may have a descriptive text and can be used with hierarchies.
Usually a master data table exists for each characteristic in a dimension table. Besides the key the
master data tables can contain additional navigational attributes which behave like characteristics. For
example, customer may contain customer number, customer group, customer region, customer
name, and customer address.
2.1.2.6 Characteristics vs. Navigational Attributes
Using navigational attributes always incurs a performance penalty in comparison to the situation in
which the same InfoObject is used as a characteristic. Therefore, you should carefully consider
whether an InfoObject is used as a navigational attribute or as a characteristic.
2.1.2.7 Hierarchies
In BW, there are essentially three possibilities for modeling hierarchies:

- as a hierarchy of characteristics within a dimension
- as a hierarchy of attributes attached to a characteristic
- as an external hierarchy

Let us take a quick look at the pros and cons of those different modeling techniques.
2.1.2.7.1 Hierarchies within a Dimension
A typical example for a hierarchy fitting into this context is a time hierarchy with levels such as
millennium – century – decade – year – month – day – hour. Another typical example is a geographic
hierarchy with levels such as continent – country – state – region – city.
Hierarchies that can be modeled within a dimension have certain properties:

- The number of levels is fixed; each level is represented by an InfoObject.
  Example: A geographic dimension with InfoObjects 0COUNTRY (country), 0REGION (region) and 0CITY (city).
- Either the hierarchy does not change or its changes do not apply to the past (for example, facts that are already loaded into an InfoCube).
  For example, the geographic hierarchy above changed during German unification. A city like "Dresden" suddenly belonged to another country. However, this change should not usually affect data/facts that refer to the time before German unification, as at that time the previous geographical dependencies applied.
The performance aspects of this technique are:

- Queries to InfoCubes that use these kinds of hierarchies are generally faster than the same queries to InfoCubes that model the same scenario with one of the two other hierarchy modeling techniques.
- However, BW does not explicitly know about hierarchical dependencies. Therefore aggregates that summarize data over regions are not used for queries that summarize over countries if the country is not included in that aggregate as well. You should therefore always (manually) include in such an aggregate the hierarchical levels that lie above the level over which data is summarized.
  Example 1: If an aggregate summarizes data over 0REGION then include 0COUNTRY in that aggregate, too.
  Example 2: If an aggregate summarizes data over months then include years and decades, too.
2.1.2.7.2 Hierarchies as Navigational Attributes of a Characteristic
This case is very similar to the one discussed in the previous section. The difference is the increased flexibility (for example, realignment facilities) that comes with navigational attributes. The hierarchy should still have a fixed number of levels. However, changes to that hierarchy (i.e. changes to attribute values) can easily be applied to facts that are already loaded into a cube. This is the essential difference from section 2.1.2.7.1.
A typical example is the hierarchy of sales office – sales group – sales person. This hierarchy has a
fixed number of levels but is frequently reorganized.
From a performance perspective the same arguments hold as in section 2.1.2.6. In general, this is the
least attractive hierarchy modeling technique as it performs worse than 2.1.2.7.1 and frequently not
better than the one in the following section. It is more flexible than 2.1.2.7.1 but less flexible than
2.1.2.7.3.
2.1.2.7.3 External Hierarchies
An ideal external hierarchy

- frequently changes and/or
- has no fixed number of levels (sometimes referred to as an unbalanced hierarchy).

A typical example is a cost center hierarchy in which several (sub-)cost centers belong to one cost center, which itself belongs to another cost center, and so on. Such a hierarchy has no fixed number of levels, as cost centers usually correspond to departments or groups within a company, which might be reorganized into new subgroups. Thus, new levels might be introduced, old ones might disappear, and the hierarchy might be deeper at one end (due to a deeper hierarchical organization) and shallower at the other end.
Another major advantage of external hierarchies vs. their alternatives is that an InfoObject can have
several such hierarchies and all these can be used within the same InfoCube. The same effect could
only be achieved through unpleasant work-arounds when using the alternative approaches.
The performance issues connected to this type of hierarchy are the following:

- External hierarchies usually perform worse than those modeled within dimensions.
- They usually perform at least as well as the hierarchies based on navigational attributes.
- Problems can arise for big external hierarchies containing many thousands of nodes and leaves. In that case it might be better to consider one of the other two alternatives.
- You can explicitly define aggregates on levels of such hierarchies. Queries that summarize data on higher levels can take advantage of such an aggregate.
2.1.2.8 Aggregates
The objective of aggregates is to reduce the volume of data that has to be read per query. Technically, aggregates are separate, transparent InfoCubes which hold aggregated data. If aggregates are defined and created, the user benefits from the improved performance without any intervention or knowledge of them.
E.g.:

Country  Customer  Sales
US       1         10
US       2         15
IT       1         20
IT       3         50

Using a Country Aggregate:

Country  Sales
US       25
IT       70

Using a Customer Aggregate:

Customer  Sales
1         30
2         15
3         50
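The two aggregates above are nothing more than pre-computed group-by results over the fact data. A minimal Python sketch of the idea (illustrative only; in BW the aggregates are separate InfoCubes that the OLAP processor selects automatically):

# Illustrative only: an aggregate is a pre-computed group-by of the fact data.
from collections import defaultdict

facts = [  # (country, customer, sales), as in the example above
    ("US", 1, 10),
    ("US", 2, 15),
    ("IT", 1, 20),
    ("IT", 3, 50),
]

def aggregate(rows, key_index):
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[2]   # sum the sales key figure
    return dict(totals)

print(aggregate(facts, 0))   # country aggregate:  {'US': 25, 'IT': 70}
print(aggregate(facts, 1))   # customer aggregate: {1: 30, 2: 15, 3: 50}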
Aggregates can be created on navigational attributes, hierarchy levels and dimension characteristics. Aggregate tables cannot be created on hierarchies whose structure is time-dependent, or on time-dependent navigational attributes. The best results are gained with aggregates on external hierarchies and navigational attributes.
Detailed information on aggregates can be found in the methodology paper “Aggregates”.
2.1.3 Loading of Data
Two different techniques are available for loading data into BW: loading data using IDoc technology or using tRFC.
2.1.3.1 Data Packages
The data package size is defined by the parameter IDOC_PACKET_SIZE in the table RSADMINC for BW file or BAPI uploads; it is set as the number of records that fill a data package.
For loading data from the R/3 OLTP System, the maximum size of one data IDoc can be set in the source system table ROIDOCPRMS (size in kB). The following issues are relevant when determining an appropriate value for IDOC_PACKET_SIZE:

- When the BW update rules find records with equal keys, the rows within an IDoc are pre-aggregated if possible. This means that rows with the same key values are aggregated, which reduces the number of INSERT operations at the database level. Pre-aggregation is restricted to the rows within a data package.
- If the data that is to be loaded into the data warehouse is likely to benefit from this pre-aggregation, meaning it is likely to hold many rows with matching key values, then data packages should comprise many rows in order to maximize the benefit from pre-aggregation (see the sketch below).
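A minimal Python sketch of this pre-aggregation effect (illustrative only, with assumed record keys): the larger the package, the more rows with equal keys collapse into one row before the INSERT, so the number of database operations drops.

# Illustrative only: rows with equal keys are collapsed within one data package,
# so larger packages can mean fewer INSERT operations at the database level.
from collections import defaultdict

# Assumed extract: 500 records with only two distinct key values.
records = [("C1", 10), ("C2", 5), ("C1", 7), ("C2", 3), ("C1", 1)] * 100

def inserts_needed(records, package_size):
    inserts = 0
    for start in range(0, len(records), package_size):
        package = records[start:start + package_size]
        collapsed = defaultdict(int)
        for key, value in package:         # pre-aggregation within the package
            collapsed[key] += value
        inserts += len(collapsed)
    return inserts

print(inserts_needed(records, 10))    # many small packages -> 100 INSERTs
print(inserts_needed(records, 500))   # one large package   ->   2 INSERTs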
Performance improvements can be obtained if the following recommendations are adhered to.
1. Keep the number of data packages small. We recommend a data package size of 20000 to 50000 kB (limited by the available RAM and depending on the database; see SAPNet note 130253). You can maintain this value for an R/3 System in its table ROIDOCPRMS and for a file in the BW system in table RSADMIN.
2. When loading large quantities of data from a file, we recommend that you split the file into several parts. These can then be loaded into BW in parallel with several requests (a sketch of such a split follows after this list). However, a precondition is that several processors and a quick RAID are available in the BW System.
3. Use a predefined record length (ASCII file) when loading from a file. For a CSV file, the conversion to a predefined record length is carried out in the system during the loading process.
4. If possible, load the data from a file from the application server and not from the client workstation.
5. First load the master data for all characteristics that are in the InfoCube or InfoSource in question. As a result, you avoid your transaction data containing many characteristic values for which a new SID must be determined.
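For recommendation 2, the split of a large flat file can be scripted outside of BW before the upload. A minimal Python sketch (hypothetical file name and column name; it assumes a CSV extract containing a calendar-month column):

# Minimal sketch with hypothetical names: split a flat-file extract by calendar
# month so that the parts can be loaded into BW in parallel requests.
import csv

def split_by_month(source_file, month_column):
    writers = {}
    with open(source_file, newline="") as src:
        reader = csv.DictReader(src)
        for row in reader:
            month = row[month_column]              # e.g. "200006"
            if month not in writers:
                out = open(f"extract_{month}.csv", "w", newline="")
                writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
                writer.writeheader()
                writers[month] = (out, writer)
            writers[month][1].writerow(row)
    for out, _ in writers.values():
        out.close()

# Hypothetical input file and month column:
split_by_month("extract_full.csv", "CALMONTH")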
2.1.3.2 Persistent Staging Area (PSA)
The PSA is a storage area in which transaction data from different source systems can be stored
temporarily. Data storage takes place in relational database tables of the BW. The data format
remains unchanged by this, meaning that no summarizations or transformations whatsoever take
place via definitions as they do in the case of InfoCubes.
When using the PSA, an individual transparent PSA table is created for each transfer structure. Each one has the same structure as the respective transfer structure. If you change the transfer structure by maintaining the transfer rules, as a rule a new PSA table is generated. The previous table remains unchanged and is kept as a separate version.
Data gets into the PSA for each combination of source system and InfoSource.
If you set up a data extraction with the PSA, you achieve improved performance because tRFC is used, and you can trace the data flow of individual data records thanks to the temporary storage in the PSA.
2.1.3.3 Benefits of PSA vs. IDoc

- allows more than 1000 bytes record length
- performs faster data loading
- provides different load methods:

- Updating Data in the PSA and InfoCubes/InfoObjects at the Same Time
  This is a way of carrying out a high-performance update of data in the PSA and in one or more InfoCubes. BW receives data from the source system, immediately starts the update in the InfoCubes, and at the same time saves the data in the PSA.
- Updating Data in the PSA and InfoCubes/InfoObjects at Different Times
  This is the standard method. Data is first updated in the PSA and can subsequently be updated from there via the context menu of the corresponding request.
The preferred method for loading data into BW is to use PSA if this is possible for the specific type of
InfoSource. (PSA is not currently usable for hierarchy data).
Before loading transaction data into BW, we recommend that the master data load is finished first. The master data load creates the SIDs for the key values; this is much more efficient than creating them during the transaction data upload.
More important is the fact that the transaction data can be checked against the existence of master data if the master data is loaded first. To get clean and valid data we recommend that you load transaction data with the check option. If there is any invalid data, the complete InfoPackage which contains the invalid value will be marked as not posted in the monitor. You can identify which value of which InfoObject causes the problem in the monitor log.
2.1.3.4 Master data

2.1.3.4.1 Initial load
Normally the master data load into the BW is not as performance critical as the load of transactional
data. The amount of data in general is much less than for transactional data.
In order to improve performance, switch on number range buffering for all attributes and key values which must be converted to a SID during the load and for which you expect a high number of different values. Number range buffering is switched off by default because it normally leads to a loss of numbers if new entries occur rarely, and it consumes memory. Therefore, after the initial load of the master data, please reset the buffering to a smaller interval.
Please proceed as follows:

- Transaction SE37, function module RSD_IOBJ_GET, single test.
  Input: I_IOBJNM = name of the InfoObject.
  Result: the structure E_S_VIOBJ contains a field NUMBRANR with a 7-digit number; the name of the number range object is built by prefixing this number with BIM. The table E_T_ATR_NAV contains all navigational attributes.
- To get all involved number range objects, repeat this for all entries of E_T_ATR_NAV.
- Transaction SNRO (maintenance of number range objects).
  Enter the name of the number range object and press the Change button.
  Choose Edit -> Set-up buffering -> Main memory.
  Now you can enter how many numbers should be buffered in the number range buffer. Normally a value between 50 and 1000 is appropriate.
- After the successful load of the master data records, change the settings back to no buffering.
2.1.3.4.2 Delta load
Changes to master data can cause real performance problems because all dependent aggregates must be rebuilt. In the first step, the changes are stored in the master data table as "modified". All queries run against the old active values. The activation process then recomputes all dependent aggregates, deletes the old "active" version and changes the new "modified" version to "active". The same happens when changing external hierarchies.
2.1.3.5 Transaction data

2.1.3.5.1 Initial load
The initial load of transaction data normally involves a very large amount of data. There is no special functionality for the initial load of transaction data. What is the best way to load a large amount of transaction data into BW?

- Extraction
  The first step of the loading process is to build the extract out of the OLTP system. To support parallel loading you can build several extracts containing disjoint data. Often the best criterion for splitting the extracts is time; e.g. if you load 2 years of transaction data you get 24 extracts if you build one extract for each month. The extracts can be written in parallel into the PSA. Where possible it is beneficial to provide the data sorted by characteristics in the order of the fact table key. The recommendation is to sort the data if the DataSource is a data file.
- Update rules
  If possible, provide extraction data such that no complex update rules (e.g. data modification and verification) must be processed. If you need to read master data records in an update routine, use the standard functionality instead of your own programming.
  In-depth information about performance-efficient programming can be obtained via transaction SE38, report RSHOWTIM.
- Upload into the InfoCube
  We recommend that you choose the data load with checking for the existence of master data records. This means the master data for the characteristics should be loaded first. Loading data with "automatic SID creation" causes a large overhead.
  Before a mass data load is triggered, the secondary indexes on the fact table should be deleted. After the mass data load has finished, these indexes must be recreated. The benefit of this procedure is to avoid the overhead that is caused by maintaining such an index during the inserts.

- Number range buffering for dimension tables should be switched on for the initial load. This will speed up the loading process dramatically, especially if very large dimension tables must be built during loading. For the delta load, the buffering should be reset to a smaller interval. For the standard dimensions TIME, UNIT and PACKAGE you normally don't need to activate the number range buffering.
- How does it work? Buffering the number range for a dimension:
  Transaction: SE37
  Function module: RSD_CUBE_GET
  Enter the name of the InfoCube, Object version: A, Bypass Buffer: X, and press 'Execute'.
  In the result table E_T_DIME the name of the number range object can be found under NOBJECT (e.g. BID0000053).
  For each number range object:
  Transaction: SNRO
  Enter the object and press 'Change'.
  Menu: 'Edit' -> 'Set up buffering' -> 'Main memory'
  Values between 50 and 1000 are reasonable.
The performance effect of the recommendations described above cannot be quantified in fixed values or factors; it heavily depends on the amount of data to be loaded. In general, the effect increases with the data volume, but there is no linear relationship.
Normally, you should run the upload in the background. If you want to load online, please set the SAP R/3 parameter rdisp/max_wprun_time to 0 (zero) in order to allow unlimited CPU time for dialog work processes during mass data loads.
2.1.3.5.2 Delta load
Typically a delta load is much smaller than an initial load, so performance is not as critical. Otherwise the same recommendations hold as for initial loads.
2.1.3.6 General remarks
No formal rule exists for update frequencies from source systems to the BW. BW allows loading data during normal reporting, but you should consider the competition between read and load processes. You have to define your company-specific data load frequencies, such as a daily update for SD and a monthly update for FI, and evaluate the impact on query execution and performance. The possible number of aggregates you can maintain depends on the frequency of the transactional data load. On an InfoCube with a low frequency of delta uploads (monthly) you can maintain more aggregates than on an InfoCube with a high delta frequency. The posting of master data changes is a different situation: if a lot of navigational attributes are involved, the activation of the master data causes the rebuild of all affected aggregates.
The database administrator should create separate tablespaces for very large fact tables (and very large ODS or PSA tables). The storage parameters must be adapted to the large size of the tables. Most of the problems during initial loads are tablespace overflows of fact, ODS or PSA tables. In the case of IDocs, the table EDI40 causes most of the problems. With enough disk space and large extents (e.g. 100 to 500 MB) it is easy to avoid abnormal terminations of the loading processes. Please bear in mind that the indexes of large tables are also large. For the maintenance of the rollback segments in BW, we recommend that you have as many rollback segments as the maximum number of parallel upload processes. Enough disk space for the temporary tablespace and the rollback tablespace is necessary for large package sizes and the index creation of large tables. This temporary space is also needed by the OLAP engine, especially for the creation of aggregates.
2.1.3.7 Compression of the InfoCube
When uploading a request into BW the data of each request will be saved with its own request ID. As
a result, records which had been uploaded with the same key may be contained in an InfoCube
several times but with a different request ID. This differentiation is necessary to be able to delete
single requests out of an InfoCube after they have been uploaded.
As a result, it is necessary to aggregate over the request ID each time a query is executed. Apart from the additional disk space needed for the different requests, this separation leads to decreased performance. The compress function is designed to eliminate this disadvantage and thus speed up reporting. To this end, all the different requests are aggregated and stored in a single request. The original requests are deleted after the compression is done. Afterwards it is no longer possible to delete the records of a particular request out of the InfoCube.
For performance and disk space reasons a request should be aggregated as soon as possible if you
are sure that the request will never be deleted out of the InfoCube.
The compress functionality has an additional influence on the query performance if the InfoCube
contains non-cumulative values. Together with the compression described above the reference point
for non-cumulative values will be updated. As a result, in general less data has to be read for a query
with non-cumulative values and thus the response time is better.
Example: Assume we have an InfoCube which consists of the following characteristics (with their dimensions):

- char1 (dimension 1)
- char2 (dimension 1)
- char3 (dimension 2)
- 0calday (dimension t)
- 0loc_currcy (dimension u)
- 0unit (dimension u)
- 0unit_of_wt (dimension u)
- 0requid (dimension p)

and key figures (with their units):

- kyf1 (0unit)
- kyf2 (0unit)
- kyf3 (0unit_of_wt)

Furthermore we assume that the InfoCube already contains data which has been aggregated into request ID 0 by a previous condense run. The example simply shows the entries in the fact table, which means that the key figure values are only differentiated through the respective SIDs of the dimensions. The column "Nr." is not part of the InfoCube but is just used to facilitate the referencing of individual rows.
Nr.  DimP  DimU  DimT  Dim1  Dim2  Kyf1  Kyf2  Kyf3
1    0     1     1     1     1      100   100    50
2    0     1     1     2     1      100  -100     0
3    0     1     1     6     1      100     0    50
Afterwards two further requests will be read (assumed with request SIDs 6 and 7):

Nr.  DimP  DimU  DimT  Dim1  Dim2  Kyf1  Kyf2  Kyf3
1    6     1     1     1     1        0  -100    50
2    6     1     1     2     1     -100   100     0
3    6     1     1     3     1      100     0  -100
4    6     1     1     4     1        0   -50    50
5    6     1     1     5     1      100     0  -100
6    6     1     1     6     1     -100   100   -50

Nr.  DimP  DimU  DimT  Dim1  Dim2  Kyf1  Kyf2  Kyf3
1    7     1     1     4     1        0    50   -50
2    7     1     1     5     1      100   100   100
3    7     1     1     6     1      100   100    50
If a condense run for these two requests is executed you will get the following result:

Nr.  DimP  DimU  DimT  Dim1  Dim2  Kyf1  Kyf2  Kyf3
1    0     1     1     1     1      100     0   100
2    0     1     1     2     1        0     0     0
3    0     1     1     3     1      100     0  -100
4    0     1     1     4     1        0     0     0
5    0     1     1     5     1      200   100     0
6    0     1     1     6     1      100   200    50
The table results from the following actions:

- No. 1 (resp. No. 2): update of record 1 (resp. 2) of request 0 by adding record 1 (resp. 2) of request 6.
- No. 3: insert of record 3 of request 6 into request 0.
- No. 4 (resp. No. 5): aggregation of record 4 of request 6 and record 1 of request 7 (resp. record 5 of request 6 and record 2 of request 7) and subsequent insert into request 0.
- No. 6: aggregation of record 6 of request 6 and record 3 of request 7 and subsequent update of record 6 of request 0.

The result table above contains several entries with zero values for all key figures. If these values are not desired (they can, for example, occur through reverse postings) a zero elimination can be carried out during the condense run. Then all entries in the fact table which have zero values for all key figures will be eliminated.
Example: The result table above would look like the following when using the zero elimination functionality:

Nr.  DimP  DimU  DimT  Dim1  Dim2  Kyf1  Kyf2  Kyf3
1    0     1     1     1     1      100     0   100
3    0     1     1     3     1      100     0  -100
5    0     1     1     5     1      200   100     0
6    0     1     1     6     1      100   200    50
This shows that the two entries (2 and 4 of the result table without zero elimination) which have zeros for all key figures are no longer contained in the result fact table. For entry 2, record 2 of request 6 causes the deletion of an already existing record in the fact table; for entry 4, the insert of a new zero-value record is prevented because record 4 of request 6 and record 1 of request 7 cancel each other out.
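The condense run in the example is essentially a group-by over all dimension keys except the package (request) dimension, with an optional elimination of rows whose key figures are all zero. A minimal Python sketch of that logic, fed with the request rows from the example above (illustrative only; the actual compression runs inside the database):

# Illustrative only: compression aggregates all requests into request 0 and can
# optionally eliminate rows whose key figures are all zero.
from collections import defaultdict

def compress(fact_rows, zero_elimination=False):
    # fact_rows: (dim_p, dim_u, dim_t, dim1, dim2, kyf1, kyf2, kyf3)
    totals = defaultdict(lambda: [0, 0, 0])
    for dim_p, dim_u, dim_t, dim1, dim2, k1, k2, k3 in fact_rows:
        key = (dim_u, dim_t, dim1, dim2)          # the request ID (dim_p) is dropped
        totals[key][0] += k1
        totals[key][1] += k2
        totals[key][2] += k3
    result = [(0, *key, *kyfs) for key, kyfs in sorted(totals.items())]
    if zero_elimination:
        result = [row for row in result if any(row[5:])]   # drop all-zero rows
    return result

# Fact rows of requests 0, 6 and 7 from the example above:
rows = [
    (0, 1, 1, 1, 1,  100,  100,   50), (0, 1, 1, 2, 1,  100, -100,    0),
    (0, 1, 1, 6, 1,  100,    0,   50),
    (6, 1, 1, 1, 1,    0, -100,   50), (6, 1, 1, 2, 1, -100,  100,    0),
    (6, 1, 1, 3, 1,  100,    0, -100), (6, 1, 1, 4, 1,    0,  -50,   50),
    (6, 1, 1, 5, 1,  100,    0, -100), (6, 1, 1, 6, 1, -100,  100,  -50),
    (7, 1, 1, 4, 1,    0,   50,  -50), (7, 1, 1, 5, 1,  100,  100,  100),
    (7, 1, 1, 6, 1,  100,  100,   50),
]

for row in compress(rows, zero_elimination=True):
    print(row)   # reproduces the zero-eliminated result table above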
2.1.3.8 Administration of Extraction
The administration of a Data Warehouse is one of the main areas where you can heavily influence a system's performance. E.g.:

- The extraction of data from the source system should not compete with the online activities in either the source or the Data Warehouse system. Therefore loading should take place during off-hours.
- Avoid large loads across a network.
- Opt for the proper loading source:
  - Avoid reading load files from tape; copy them to disk first.
  - Avoid placing input load files on the same disk drives or controllers as the tables being loaded.
  - During testing it is fine to load data directly from the workstation, but for mass data you should dedicate an application server in your source system landscape to extraction. When loading from the workstation, the whole file is stored in the RAM of the application server, which could lead to an overflow of the RAM. When loading data from the application server the file is read record by record, and therefore no limitation occurs with respect to the size of the RAM.
- Use truncate, not delete, if replacing the entire table contents.
- Investigate loading with archiving turned off. The log file can become the performance bottleneck, not to mention the required disk space for the log. (Don't forget to create a backup first and to turn archiving back on after loading.)
2.1.4 Querying
It might be beneficial to add further indexes on individual SID fields in the dimension or master data
tables, in particular if this SID refers to a characteristic that is frequently restricted by the queries
running on your system. If possible, such indexes should be of type bitmap, but B-tree indexes are
fine, too.
The design of the queries for an InfoCube has a direct impact on the reporting performance of the BW system. Naturally, a query is defined to return the result set that a user expects, and one does not want to concede anything with respect to that. However, users usually look at the query result in portions, i.e. they first look at the profits per year and only later drill down to see the profits per year and region. It is much faster to define this query with year in the rows and region in the free characteristics section. In comparison to a query that holds both region and year in the rows section of the query definition screen, this approach has the following advantage: the result set of the initial step is smaller. This implies faster query processing on the database server (due to smaller intermediate results) and faster data transport from the database to the application server and from there to the PC.
This approach does not reduce the result that the user can possibly expect from that query. However, it separates the drill-down steps by using free characteristics. This not only accelerates query processing but also presents manageable portions of data to the user.
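The effect on the initial result set is easy to quantify. A small Python illustration with assumed cardinalities (3 years, 50 regions): with only the year in the rows, the first view returns a handful of records; with year and region both in the rows, every combination is transported immediately.

# Assumed cardinalities, for illustration only.
years = 3
regions = 50

rows_year_only = years                    # year in rows, region as a free characteristic
rows_year_and_region = years * regions    # year and region both in rows

print(rows_year_only)          # 3 records transported for the first view
print(rows_year_and_region)    # 150 records transported immediately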
2.1.5 Size of Queries
There is no limitation regarding the size of an InfoCube or the size of a query result set, but there are
limitations given by the operating system, the application, such as Microsoft Excel’s limit of 65,536
lines, or the database-application server communication, which may be a maximum of 1.5 million
records. When defining queries, the duration of the query execution should be considered as well.
For online browsing and interactive navigation in the result sets of a query, we recommend that you keep the result sets small (max. 10,000 records). For larger result sets, batch printing should be considered as an alternative.
2.1.6 Batch printing
You do not have to evaluate queries only online; you can also print them out. You can change the evaluation of a query in the online display interactively by navigating, whereas the evaluation of the query data in the printed version is fixed.
You can print queries according to their online display, but special print settings can also be made
(e.g., print title, maximum number of pages, result position, page layout for top/end page and
header/footer part). Different evaluations (or navigational steps) are possible with these print settings
(e.g., choose print characteristics from all available characteristics, new cover page, new page or new
block for each characteristic). They are produced as different variants of the print settings of a query.
This means that you have the possibility of evaluating data online and in a printout using just one
query definition. You can define the online display of a query and the print evaluation differently.
Batch printing allows you to schedule print jobs. So you can use off-hours for this.
2.2 System settings
OLTP and OLAP systems serve different purposes. The typical main job of an OLTP system is fast
order processing, whereas an OLAP system needs to be optimized for querying the database.
Therefore, the SAP System parameters for the Business Warehouse are different from an OLTP
system. This includes different defaults/parameters for:

- table buffers
- program buffers
- memory management
The difference between OLTP and OLAP systems mentioned above also has an impact on the
database profile.
2.2.1 Database Storage Parameters
It is possible to put a fact table and the dimension tables of an InfoCube into tablespaces that are
different from the default tablespace(s). This can be done by assigning a different "data type" for those
tables; the data dictionary links those data types directly to particular tablespaces.
In case of AS/400 it is not necessary to set storage parameters due to the single level storage feature.
2.2.2 DB Optimizer statistics
The query optimizers of the database management systems decide on the most appropriate query
execution plan depending on the statistical information that is provided for the tables that participate in
the query. If statistical information is missing for one of the tables (and possibly its indexes), it uses
default statistics. However, real-world experience shows that such a situation usually ends up with bad
query execution plans. Therefore, it is necessary to make sure (a) that statistical information exists
and (b) that it is kept up to date. That’s why you should have the DB statistics recalculated after each
data load for an InfoCube (if applicable for your database system).
2.3 Monitoring

2.3.1 BW Statistics
There are extensive BW Statistics which help to analyze system performance and define and optimize
aggregates. The BW Statistics are explained in detail in the methodology paper BW Statistics.
The BW Statistics data allow you to answer questions such as:

- Which InfoCubes, InfoObjects, InfoSources, source systems, queries, and aggregates are used in the BW System?
- How large is the data volume in the system?
- How is the BW system being used, and by whom?
- Which queries are taking too much time for online reporting?
- Which resources have been used by which user / user group?
- Which system resources have been used by the database / OLAP processor / frontend?

In general, the BW Statistics show how the BW system is being used and can be used to identify system resource bottlenecks. You can switch the update of BW Statistics on or off for each InfoCube. There are separate flags for OLAP and WHM (Warehouse Management) statistics.
3 Measurements
Certified results of the SAP BW Application Benchmark can be found at
http://sapnet.sa.com/benchmark.
4 Detailed Information
Since any specific information on e.g. system parameters or database parameters is outdated as soon as it is written down, no values are given here. Network requirements are currently being scrutinized, but there are no results yet. Please check SAPNet regularly for up-to-date information about BW performance. In the collective note 184905, "Collective Note Performance 2.0", you can find links to all relevant notes.
There is a three-day course, TABW90 "BW - Technical Administration". One large part of this course deals with performance issues.