MY COLLECTING DOCUMENT
Data Extraction is the process of loading data from OLTP to OLAP (BW/BI). Here is an
illustration...
I have a company in which thousands of transactions happen daily, all over the world. To analyze my
business on a yearly or monthly basis, I am moving to SAP BW/BI, so that I can generate reports and take
business decisions.
Tomorrow I am going to load all the data captured until yesterday from SAP R/3 to BW/BI. I
do a full load for this. After completing this task, I need to load the transactions that happen from
tomorrow onwards to BW. This can be done daily, weekly or monthly, based on the volume of
transactions.
If there are thousands of transactions per day, I can use a daily load; tens of thousands, weekly; if in lakhs (hundreds of thousands), monthly.
So, to be precise, data is extracted in two modes:
1. Full load - the entire data available at the source is loaded to BW/BI.
2. Delta load - only the new/changed/deleted data is loaded.
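To picture the difference, here is a minimal sketch in plain Python (not SAP code; the table and field names are invented for illustration): a full load selects everything at the source, while a delta load selects only what changed since the last extraction.

```python
from datetime import datetime

# Invented in-memory stand-in for an R/3 application table.
transactions = [
    {"doc": "4500000001", "changed_at": datetime(2024, 1, 1, 10, 0)},
    {"doc": "4500000002", "changed_at": datetime(2024, 1, 2, 11, 30)},
    {"doc": "4500000003", "changed_at": datetime(2024, 1, 3, 9, 15)},
]

def full_load(table):
    """Full load: everything available at the source is extracted."""
    return list(table)

def delta_load(table, last_extraction):
    """Delta load: only records new/changed since the last extraction."""
    return [row for row in table if row["changed_at"] > last_extraction]

print(len(full_load(transactions)))                         # 3 records
print(len(delta_load(transactions, datetime(2024, 1, 2))))  # 2 records
```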
Full Load Data Flow:
Let us see how the data is loaded to BW in full mode.
Here we need to understand a few basic things that happen on the R/3 side.
Document posting means creating a transaction, i.e. writing into the application/transaction tables.
So whenever a sales order is created (a document is posted), the transaction is written into the database
tables/application tables/transaction tables (e.g. EKPO, EKKO, VBAK, VBAP).
Whenever you are doing a full load, setup tables are used.
Setup tables:
Direct access to the application tables is not permitted for extraction, hence setup tables are there to collect the required
data from the application tables.
When a load fails, you can re-run the load to pull the data from the setup tables; the data will still be there.
Setup tables are used to initialize delta loads and for full loads. They are part of the LO extraction
scenario.
With this option, you avoid pulling from R/3 directly, as we need to bring field values from multiple
tables. You can see the data in the setup tables. The setup table name will be the extract structure name
followed by SETUP: the names start with 'MC', followed by the application component ('01'/'02' etc.)
and the last digits of the datasource name, and then end with SETUP.
We can also say it is the communication structure name (on the R/3 side; you can check it in LBWE as well)
followed by 'SETUP'.
example: MC13VD0HDRSETUP
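The naming rule can be stated as a one-liner; a tiny sketch (plain Python; it encodes only the convention described above, which can vary between releases):

```python
def setup_table_name(extract_structure: str) -> str:
    """Setup table = extract structure name + 'SETUP',
    e.g. extract structure MC13VD0HDR -> MC13VD0HDRSETUP."""
    return extract_structure + "SETUP"

print(setup_table_name("MC13VD0HDR"))  # MC13VD0HDRSETUP
```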



If you want to check the data in the setup tables, you should look at transaction NPRT; there you can
see the table names from which the data is picked.
Setup tables are cluster tables and are used to extract the data from the R/3 tables (LO
extractors).
Basically, an entire application like SD Billing has got its own setup tables... so while
filling the setup tables, we usually fill them for the entire application.
Ex: OLI7BW is for filling the setup tables for the SD (sales order) application;
the OLI9BW T-code is for the billing application.

When you fill the setup tables, the data from the different tables (VBAK, VBAP, VBRK, VBRP, etc.)
comes through the communication structures and is saved in the setup tables.

The main advantage of having setup tables is that we can read the data at different levels: header
level as well as item level.
When we run an init load or a full load in BW, the data is read from the setup tables for the first
time (the entire data is read)... the delta records are updated to the delta queue once
the V3 job runs, and we can extract the delta records from the delta queue.
Once we have successfully run the init, we can delete the setup tables.
Filling up the setup tables depends on the datasource.



There are different T-codes for the respective extract structures:
OLI1BW  INVCO Stat. Setup: Material Movemts
OLI2BW  INVCO Stat. Setup: Stor. Loc. Stocks
OLI3BW  Reorg. PURCHIS BW Extract Structures
OLI4BW  Reorg. PPIS Extract Structures
OLI7BW  Reorg. of VIS Extr. Struct.: Order
OLI8BW  Reorg. VIS Extr. Str.: Delivery
OLI9BW  Reorg. VIS Extr. Str.: Invoices
OLIABW  Setup: BW agency business
OLIFBW  Reorg. Rep. Manuf. Extr. Structs
OLIIBW  Reorg. of PM Info System for BW
OLIQBW  QM Infosystem Reorganization for BW
OLISBW  Reorg. of CS Info System for BW
OLIZBW  INVCO Setup: Invoice Verification
OLI7BW is the T-code for sales orders.
Delta Load:
As a prerequisite, we need to discuss the various update methods for delta loads:
1. Serialized V3
2. Queued Delta
3. Direct Delta
4. Unserialized V3
Before that we need to understand V1, V2 and V3 updates. These are different work processes on the
application server that take the update LUW from the running program and execute it. They are
separated to optimize the transaction processing capabilities.
For example:
If you create/change a purchase order (ME21N/ME22N), when you press 'SAVE' and see a success
message ('PO ... changed'), the update to the underlying tables EKKO/EKPO has already happened (before you saw
the message). This update was executed in the V1 work process.
There are some statistics-collecting tables in the system which can capture data for reporting. For
example, LIS table S012 stores purchasing data (the same data as EKKO/EKPO, stored redundantly
but in a different structure to optimize reporting). These tables are updated with the transaction
you just posted in a V2 process. Depending on the system load, this may happen a few seconds later (after
you saw the success message). You can see the pending update requests in SM13 (SM12 shows the related lock entries).
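A toy model of this separation (plain Python, not how the SAP kernel actually works; the table names are stand-ins): the V1 part runs before the success message, while the V2 part is only queued and executed later in a separate task.

```python
from collections import deque

application_tables = []  # stand-in for EKKO/EKPO
statistics_tables = []   # stand-in for LIS table S012
v2_queue = deque()       # asynchronous work, executed after the V1 commit

def post_purchase_order(po):
    # V1 (synchronous): the document is written before the user
    # sees the success message.
    application_tables.append(po)
    # V2 (asynchronous): the statistics update is only queued here.
    v2_queue.append(po)
    return f"PO {po} changed"  # success message shown to the user

def run_v2_worker():
    # Runs in a separate task, possibly seconds later.
    while v2_queue:
        statistics_tables.append(v2_queue.popleft())

msg = post_purchase_order("4500000042")
print(msg, "| stats rows:", len(statistics_tables))    # stats not updated yet
run_v2_worker()
print("stats rows after V2:", len(statistics_tables))  # now updated
```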
Statistical tables are for reporting on R/3 while update tables are for BW extraction, and the data is stored
redundantly in these two (three if you include the application tables) sets of tables.
The difference is that the update tables are temporary; V3 jobs continually empty and refresh these tables (as I
understand it). This is different from the statistics tables, which keep accumulating all the data. Update tables
can be thought of as a staging place on R/3 from where data is consolidated into packages and sent to
the delta queue (by the V3 job).
Update tables can be bypassed (if you use 'direct' or 'queued' delta instead of V3) to send the updates
(data) directly to the BW queue (delta queue). V3 is, however, better for performance, and so it is one
option among others, and it uses the update tables.
The statistical tables have existed since the pre-BW era (for analytical reporting) and continue to be used
when customers want their reporting on R/3.
The structure of a statistical table might be different from the update table/BW queue, so even though
they are based on the same data, these might be different subsets of the same superset.
V3 collective update means that the updates are only processed when the V3 job runs.
At the time of the OLTP transaction, the update entry is made in the update table. Once you have posted
the transaction, it is available in the update table, waiting for the V3 job to run. When the V3 job runs, it
picks up these entries from the update table and pushes them into the delta queue, from where the BW extraction job
extracts them.
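Extending the toy model (plain Python; all names invented), the V3 collective run is just a scheduled job that drains the update table into the delta queue, from which the BW extraction later reads:

```python
update_table = []  # entries written at posting time
delta_queue = []   # RSA7-style queue that BW extracts from

def post_transaction(doc):
    # At OLTP posting time the entry only reaches the update table.
    update_table.append(doc)

def run_v3_collective_job():
    # Scheduled job (e.g. via LBWE job control): moves all waiting
    # entries from the update table into the delta queue.
    while update_table:
        delta_queue.append(update_table.pop(0))

def bw_delta_extraction():
    # The BW extraction job empties the delta queue.
    extracted = list(delta_queue)
    delta_queue.clear()
    return extracted

post_transaction("doc1")
post_transaction("doc2")
print(bw_delta_extraction())  # [] - nothing there until the V3 job runs
run_v3_collective_job()
print(bw_delta_extraction())  # ['doc1', 'doc2']
```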

- Synchronous update (V1 update): the statistics update is carried out at the same time
(synchronously) as the document update (in the application tables).
- Asynchronous update (V2 update): the document update and the statistics update take place in
different tasks.
So, V1 and V2 updates don't require any scheduling activity.
- Collective update (V3 update): as with V2, the document update is managed in a separate moment
from the statistics update, but, unlike the V2 update, the V3 collective update must be scheduled
as a job (via LBWE).
Remember that the V3 update only processes the update data that was successfully processed with the
V2 update.
------------------------------------------------------------------------------------
Serialized V3:
Take an example of the same PO item changing many times in quick succession.
V1 (with the enqueue mechanism) ensures that the OLTP tables are updated consistently. The update table
gets these update records, which may or may not end up in the correct sequence (as there is no locking)
by the time they reach BW. 'Serialized V3' was meant to ensure the correct sequence of update records going from the
update tables to the delta queue (and then to BW).
Since the update table records have a timestamp, when the V3 job runs it can sequence these records
correctly and thus achieve 'serialization'.
The problems with the serialized V3 update are:
- Several changes in one second: for technical reasons, collective-run updates that are
generated in the same second cannot be serialized. That is, the serialized V3 update can only
guarantee the correct sequence of extraction data for a document if the document did not
change twice in one second.
- Different instances and time synchronization: it is easy to see how likely it is that, in a
landscape with several application servers for the same environment, different times are
displayed. The time used for the sort order in our BW extractions is taken from the R/3 kernel,
which uses the operating system clock as a timestamp. But, as experience teaches, the clocks on
different machines generally differ and are not exactly synchronized. The conclusion is that the
serialized V3 update can only ensure the correct sequence in the extraction of a document if the
times have been synchronized exactly on all system instances, so that the time of the update
record (determined from the local time of the application server) is consistent when sorting the
update data.
- The V2 update dependence: the serialized V3 update also has the weakness of depending on the
successful conclusion of V2 processing. The method can only ensure that the extraction data of a
document is in the correct sequence (serialized) if no error occurs beforehand in the V2 update,
since the V3 update only processes update data for which the V2 update was successfully
processed. Independently of serialization, it is clear that update errors which occur in the V2
update of a transaction and cannot be reposted mean that the still-open V3 updates for that
transaction can never be processed. This could thus lead to serious inconsistencies in the data
in the BW system.
Example:
Take a case where the first update (based on the earliest timestamp) to be processed is in language EN (for
the same PO item). The V3 job is then going to process all the update records of language EN in chronological
sequence before going to the next language's records. If an update in another language (for the same PO item)
happened between two EN-language updates, it is going to be processed later, after all EN updates
are processed, and thus becomes out of sequence.
(In the original illustration, all the documents in red, EN language, are processed first and the blue
ones, IT language, later, which is an inconsistency in the sequence.)
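The effect is easy to reproduce (plain Python; the timestamps and languages are made up): processing all records of one logon language before the next one breaks the chronological order.

```python
# Update records for the SAME PO item: (timestamp, logon language).
updates = [(1, "EN"), (2, "IT"), (3, "EN")]

# Serialized V3 processes one language at a time, chronologically
# only WITHIN each language.
processed = []
for lang in ("EN", "IT"):
    processed += sorted(u for u in updates if u[1] == lang)

print(processed)        # [(1, 'EN'), (3, 'EN'), (2, 'IT')]
print(sorted(updates))  # true order: [(1, 'EN'), (2, 'IT'), (3, 'EN')]
# The IT change at t=2 reaches BW after the EN change at t=3.
```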
Direct Delta (the 2nd delta update method in our list)
With this update mode:
- Each document posting is directly transferred into the BW delta queue.
- Each document posting with delta extraction leads to exactly one LUW in the respective BW
delta queues.
Just to remember: 'LUW' stands for Logical Unit of Work, and it can be considered an inseparable
sequence of database operations that ends with a database commit (or a rollback if an error occurs).
Benefits:
- There is no need to schedule a job at regular intervals (through LBWE "Job control") in order to
transfer the data to the BW delta queues; thus, additional monitoring of update data or an
extraction queue is not required.
- Logically, the restrictions and problems described in relation to the "serialized V3 update" and its
collective run do not apply to this method: by writing to the delta queue within the V1 update
process, the serialization of documents is ensured by using the enqueue concept for
applications and, above all, extraction is independent of the V2 update result.
Limits:
- The number of LUWs per datasource in the BW delta queues increases significantly because
different document changes are not summarized into one LUW in the BW delta queues (as was
previously the case for the V3 update). Therefore this update method is recommended only for customers
with a low occurrence of documents (a maximum of 10000 document changes - creating,
changing or deleting - between two delta extractions) for the relevant application. Otherwise,
the larger number of LUWs can cause dumps during the extraction process.
- No documents can be posted during the delta initialization procedure, from the start of the
recompilation run in R/3 (the setup-table-filling job) until all records have been successfully
updated in BW: every document posted in the meantime is irrecoverably lost.
- The V1 update would be too heavily burdened by this process.
(Remember that stopping the posting of documents always applies to the entire client.)
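In the same toy model (plain Python; heavily simplified), direct delta means the posting itself writes to the delta queue inside V1, producing exactly one LUW per document change:

```python
delta_queue = []  # each element = one LUW in the BW delta queue

def post_document_direct_delta(doc_change):
    # V1 update: the document tables are written here (omitted) and
    # the delta record goes straight into the delta queue - no update
    # table, no collective run, exactly one LUW per posting.
    delta_queue.append([doc_change])

for i in range(3):
    post_document_direct_delta(f"change-{i}")

print(len(delta_queue))  # 3 postings -> 3 LUWs; nothing is summarized,
                         # which is why high document volumes are a problem
```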
-----------------------------------------------------------------------------------------
Queued Delta (the third update method)
With the queued delta update mode, the extraction data (for the relevant application) is written to an
extraction queue (instead of to the update data, as with V3) and can be transferred to the BW delta
queues by an update collective run, as previously executed during the V3 update.
After activating this method, up to 10000 document deltas/changes are cumulated into one LUW per
datasource in the BW delta queues.
If you use this method, it is necessary to schedule a job to regularly transfer the data to the BW
delta queues.
As always, the simplest way to perform the scheduling is via the "Job control" function in LBWE.
SAP recommends scheduling this job hourly during normal operation after a successful delta
initialization, but there is no fixed rule: it depends on the peculiarities of every specific situation (business
volume, reporting needs and so on).
Benefits:
- When you need to perform a delta initialization in the OLTP, thanks to the logic of this
method the document postings (relevant for the involved application) can be opened again as
soon as the execution of the recompilation run (or runs, if several are running in parallel) ends,
that is, when the setup tables are filled and a delta-init request is posted in BW, because the
system is able to collect new document data during the delta-init upload too (with a deeply
felt recommendation: remember to avoid the update collective run before all delta-init requests
have been successfully updated in your BW!).
- By writing to the extraction queue within the V1 update process (which is more burdened than
with V3), serialization is ensured by using the enqueue concept, but the collective run clearly
performs better than the serialized V3, and in particular the slow-down due to documents posted in
multiple languages does not apply to this method.
- Contrary to direct delta, this process is especially recommended for customers with a
high occurrence of documents (more than 10,000 document changes - creation, change or
deletion - performed each day for the application in question).
- In contrast to the V3 collective run (see OSS Note 409239, 'Automatically trigger BW loads upon
end of V3 updates', in which this scenario is described), event handling is possible here,
because a definite end of the collective run is identifiable: in fact, when the collective run for
an application ends, an event (&MCEX_nn, where nn is the number of the application) is
automatically triggered and can thus be used to start a subsequent job.
- Extraction is independent of the V2 update.
Limits:
- V1 is more heavily burdened compared to V3.
- Administrative overhead of the extraction queue.
Notes:
1. If you want to take a look at the data of all the extract structure queues in the Logistics Cockpit, use
transaction LBWQ or the "Log queue overview" function in LBWE (but there you can only see the
queues currently containing extraction data).
2. In the posting-free phase before a new init run in OLTP, you should always execute the update
collective run once (as with the old V3) to make sure the extraction queue is emptied of any
old delta records (especially if you are already using the extractor), which could otherwise cause
serious inconsistencies in your data.
3. If you then want to make changes (through LBWE or RSA6) to the extract structures of an
application (for which you selected this update method), you have to be absolutely sure that
no data is left in the extraction queue before executing these changes in the affected systems (and
especially before importing these changes into the production environment!).
To perform this check when the V3 update is already in use, you can run the check report
RMCSBWCC in the target system.
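Contrast this with direct delta using the same kind of sketch (plain Python; the 10000 figure comes from the text above): postings land in an extraction queue, and the collective run cumulates up to 10000 changes into one LUW.

```python
extraction_queue = []  # written within the V1 update at posting time
delta_queue = []       # each element = one LUW in the BW delta queue

MAX_CHANGES_PER_LUW = 10000

def post_document_queued_delta(doc_change):
    extraction_queue.append(doc_change)

def run_collective_update():
    # Scheduled job (LBWE "Job control", e.g. hourly): cumulates up
    # to 10000 document changes per LUW in the delta queue.
    while extraction_queue:
        delta_queue.append(extraction_queue[:MAX_CHANGES_PER_LUW])
        del extraction_queue[:MAX_CHANGES_PER_LUW]

for i in range(25000):
    post_document_queued_delta(i)
run_collective_update()
print(len(delta_queue))  # 3 LUWs instead of 25000 with direct delta
```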
-------------------------------------------------------------------------------------
Unserialized V3: (the last one)
With this update mode, which we can consider the serializer's brother, the extraction data continues
to be written to the update tables using a V3 update module and is then read and processed by a
collective update run (through LBWE).
But, as the name of this method suggests, the unserialized V3 delta disowns the main characteristic of
its brother: the data is read in the update collective run without taking the sequence into account and
is then transferred to the BW delta queues.
Issues:
- Only suitable for data-target designs for which the correct sequence of changes is not important,
e.g. material movements.
- The V2 update has to be successful.
When can this method be used?
Only if it is irrelevant whether or not the extraction data is transferred to BW in exactly the same
sequence (serialization) in which it was generated in R/3 (thanks to a specific design of the data
targets in BW and/or because the functional data flow does not require a correct temporal sequence).
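In the sketch (plain Python), the unserialized variant simply ignores the timestamps when the collective run drains the update table:

```python
# Update table entries as (timestamp, change); note they are NOT
# in chronological order of the timestamps.
update_table = [(3, "change-C"), (1, "change-A"), (2, "change-B")]
delta_queue = []

def run_unserialized_v3():
    # Reads the update table WITHOUT sorting by timestamp; acceptable
    # only if the data target does not care about the sequence of
    # changes (e.g. additive key figures such as material movements).
    while update_table:
        delta_queue.append(update_table.pop(0))

run_unserialized_v3()
print(delta_queue)  # arrival order, not chronological order
```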
This concludes the update methods.
************************************************************
Some important points:
- The setup tables are the base tables for the datasource used for a full upload.
- A full update is possible with LO extractors. In a full update, whatever data is present in the setup
tables (from the last init) is sent to BW.
- But the setup tables do not receive the delta data posted after the init. So if your full
update should get ALL the data from the source system, you will need to delete and re-fill the setup
tables.
**************************************************************
Some Questions:
Question:
The serialized V3 update can guarantee the correct sequence for the extraction data of a document
only if there were no errors in the V2 update. This is because the V3 update only processes update data
for which a V2 update has been carried out successfully.
Why is V3 dependent on V2, and what are the V1 and V2 updates?
Answer:
V1 - Synchronous update
V2 - Asynchronous update
V3 - Batch asynchronous update
These are different work processes on the application server that take the update LUW (which may
contain various DB-manipulation SQL statements) from the running program and execute it. They are separated to
optimize transaction processing capabilities.
Taking an example: if you create/change a purchase order (ME21N/ME22N), when you press 'SAVE' and see a success
message ('PO ... changed'), the update to the underlying tables EKKO/EKPO has already happened (before you saw
the message). This update was executed in the V1 work process.
There are some statistics-collecting tables in the system which can capture data for reporting. For
example, LIS table S012 stores purchasing data (the same data as EKKO/EKPO, stored redundantly
but in a different structure to optimize reporting). These tables are updated with the transaction you just
posted in a V2 process. Depending on the system load, this may happen a few seconds later (after you saw
the success message). You can see the pending update requests in SM13 (SM12 shows the related lock entries).
V3 is specifically for BW extraction. The update LUWs for these are sent to V3 but are not executed
immediately. You have to schedule a job (e.g. in the LBWE definitions) to process them. This is again done to
optimize performance.
V2 and V3 are separated from V1 because they are not as real-time critical (they update statistical data). If all
these updates were put together in one LUW, system performance (concurrency, locking, etc.) would be
impacted.
The serialized V3 update is called after V2 has happened (this is how the code running these updates is
written), so if you have both V2 and V3 updates from a transaction and V2 fails or is waiting, V3 will not happen
yet.
By the way, 'serialized' V3 is discontinued now; in later releases of the PI you will have only unserialized V3.
-------------------------------------------------------------------
Question:
There are the following tables:
1. Application tables (R/3 tables)
2. Statistical tables (for reporting purposes)
3. Update tables
4. BW queue
For application tables it's the V1 update, for statistical tables it's the V2 update; is the same information
then redundantly stored again in the update tables?
How are statistical tables different from update tables? I understood what statistical tables are;
my question is: "Is the same information again redundantly stored in the update tables for the collective V3
update to pull the records into the BW queue?"
And is the V3 collective update the same as a synchronous V3 update? How do the records get saved in the
update tables?
Answer:
Statistical tables are for reporting on R/3 while update tables are for BW extraction. Is data stored
redundantly in these two (three if you include the application tables) sets of tables? Yes, it is.
The difference is that the update tables are temporary; V3 jobs continually empty and refresh these tables (as I
understand it). This is different from the statistics tables, which keep accumulating all the data. Update tables
can be thought of as a staging place on R/3 from where data is consolidated into packages and sent to
the delta queue (by the V3 job).
Update tables can be bypassed (if you use 'direct' or 'queued' delta instead of V3) to send the updates
(data) directly to the BW queue (delta queue). V3 is, however, better for performance, and so it is one
option along with the others, and it uses the update tables.
The statistical tables have existed since the pre-BW era (for analytical reporting) and continue to be used
when customers want their reporting on R/3.
The structure of a statistical table might be different from the update table/BW queue, so even though
they are based on the same data, these might be different subsets of the same superset.
V3 collective update means that the updates are only processed when the V3 job runs. I
am not sure about 'synchronous V3'; do you mean serialized V3?
At the time of the OLTP transaction, the update entry is made in the update table. Once you have posted
the transaction, it is available in the update table, waiting for the V3 job to run. When the V3 job runs, it
picks up these entries from the update table and pushes them into the delta queue, from where the BW extraction job
extracts them.
----------------------------------------------------------------------------------
Question:
What do you mean by serialization? Is it the serialization between the sequence of records in the update tables
and the sequence in the BW queue?
And can you explain a little more about the collective run performance with different languages?
Answer:
The requirement for 'delta' capturing on the R/3 side is to be able to capture the delta 'exactly once,
in order'.
Take an example of the same PO item changing many times in quick succession.
V1 (with the enqueue mechanism) ensures that the OLTP tables are updated consistently. The update table
gets these update records, which may or may not end up in the correct sequence (as there is no locking)
by the time they reach BW. 'Serialized V3' was meant to ensure the correct sequence of update records going from the
update tables to the delta queue (and then to BW).
Since the update table records have a timestamp, when the V3 job runs it can sequence these records
correctly and thus achieve 'serialization'. However, there is a technical problem with this. The
timestamp recorded in the update record is set by the application server (where the user executed the transaction),
and if there are multiple app servers there might be some inconsistency in their system times, which can
cause incorrect serialization.
Another problem is in the fundamental design of the V3 process. The V3 job sequences the updates by
timestamp and then processes the update records from the update table (to send them to the delta queue), but it
does so for one language at a time (the update record also stores the user's logon language). Why this is
done is not clear to me, but it is a basic design feature and cannot be circumvented.
This causes a potential issue if multiple logon languages are used by users: serialization may not
happen correctly in such a case. Take a case where the first update (based on the earliest timestamp) to
be processed is in language EN (for the same PO item). The V3 job is then going to process all the update
records of language EN in chronological sequence before going to the next language's records. If an update in another
language (for the same PO item) happened between two EN-language updates, it is going to be
processed later, after all EN updates are processed, and thus becomes out of sequence. The weblog
mentions this scenario.
These two constraints remained for 'serialized V3', where 'serialization' couldn't be truly achieved. Hence
newer PIs have discarded 'serialized V3' altogether, and you no longer have this option (if you are
using a newer PI).
If you use 'serialized V3', you have to be clear that the 'serialization' may not always work in the above
scenarios (a multi-language environment, multiple app servers, or updates to the same records in the
same second, as the timestamp has a granularity of only one second).
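The same-second limitation is easy to picture (plain Python; the values are invented): with one-second granularity, two changes in the same second get identical sort keys, so their relative order cannot be recovered from the data.

```python
from datetime import datetime

# Two changes to the same PO item within the same second: the
# timestamps (granularity = 1 second) are identical.
rec_old = ("4500000042", datetime(2024, 1, 1, 12, 0, 0), "old value")
rec_new = ("4500000042", datetime(2024, 1, 1, 12, 0, 0), "new value")

# Sorting by timestamp cannot tell which change came first: both
# keys compare equal, so serialization is not guaranteed.
print(rec_old[1] == rec_new[1])  # True -> order is ambiguous
```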
***************************************************************************
Now we will discuss the functions of the LO Cockpit:
- Maintain Extract Structures: here you can add additional fields from the communication
structures available to the extract structure.
- Maintain DataSources: in the DataSource maintenance screen, you can customize the DataSource
by using the following fields: field name, short text, selection, hide field, inversion or
cancellation field (reverse posting), and field only known in customer exit.
- Activating update: by setting the update to active, data is written into the extract structures, both online
and during the filling of the setup tables (restructure tables / LO initialization tables).
Depending on the update mode, a job has to be scheduled with which the updated data is
transferred in the background into the central delta management (delta queue).
- Controlling update: this concerns the delta update mode you are using and how you
control the data load based on the volume of data. The LO Cockpit supports 4 types of update
modes (the delta modes, which we have already discussed): serialized V3 update, direct
delta, queued delta, unserialized V3 update.
QUE:
I have several doubts about LO extraction. Can anyone please help me?
1] In LBWE we first deactivate the extract structure, then do the maintenance, and then make it active again.
Is this correct?
2] During "delete setup tables" and "fill setup tables", the extract structure is in "active" mode (please refer to the first
point).
3] Is there any step performed in between deleting the setup tables and filling the setup tables?
4] How can we check that the "filling up the setup tables" step has been completed successfully?
ANS:
You can do the maintenance when it is ACTIVE; there is also no problem when you do the maintenance while it is inactive
and then activate it again.
Yes, you can do the deleting and filling of the setup tables only when the extract structure is ACTIVE.
There are no steps involved in between deleting and filling the setup tables.
LBWG is for deleting and OLI*BW is for filling the setup tables of the LO datasources.
The RSA3 T-code will help you find out how many data records got filled into the setup tables.
Check the below given steps. I believe they should clear all your doubts.
1. Go to transaction code RSA3 and see if any data is available for your
DataSource. If data is there in RSA3, then go to transaction code LBWG (Delete Setup
Data) and delete the data by entering the application name.
2. Go to transaction SBIW --> Settings for Application-Specific DataSources --> Logistics -->
Managing Extract Structures --> Initialization --> Filling the Setup Table --> Application-Specific
Setup of Statistical Data --> Perform Setup (relevant application).
3. In OLI*** (for example OLI7BW for the statistical setup of old documents: orders), give the
name of the run and execute. Now all the available records from R/3 will be loaded into the
setup tables.
4. Go to transaction RSA3 and check the data.
5. Go to transaction LBWE and make sure the update mode for the corresponding
DataSource is serialized V3 update.
6. Go to the BW system, create an InfoPackage and, under the Update tab, select the initialize
delta process. Then schedule the package. Now all the data available in the setup tables
is loaded into the data target.
7. Now, for the delta records, go to LBWE in R/3 and change the update mode for the
corresponding DataSource to direct/queued delta. By doing this, records will bypass
SM13 and go directly to RSA7. Go to transaction code RSA7; there you can see a green
light. Once new records are added, you can immediately see them in RSA7.
8. Go to the BW system and create a new InfoPackage for delta loads. Double-click on the new
InfoPackage; under the Update tab you can see the delta update radio button.
9. Now you can go to your data target and see the delta records.
Hi,
I just noted the LO extraction steps, but I don't understand number 3. Kindly explain how I can give
the Transport Request number. Please note I am doing this for the first time.
Regards,
Please follow the below steps,
Here is LO Cockpit Step By Step
LO EXTRACTION
- Go to Transaction LBWE (LO Customizing Cockpit)
1). Select Logistics Application
SD Sales BW
Extract Structures
2). Select the desired Extract Structure and deactivate it first.
3). Give the Transport Request number and continue
4). Click on 'Maintenance' to maintain the Extract Structure
Select the fields of your choice and continue
Maintain DataSource if needed
5). Activate the extract structure
6). Give the Transport Request number and continue
- Next step is to Delete the setup tables
7). Go to T-Code SBIW
8). Select Business Information Warehouse
i. Setting for Application-Specific Datasources
ii. Logistics
iii. Managing Extract Structures
iv. Initialization
v. Delete the content of Setup tables (T-Code LBWG)
vi. Select the application (01 – Sales & Distribution) and Execute
- Now, Fill the Setup tables
9). Select Business Information Warehouse
i. Setting for Application-Specific Datasources
ii. Logistics
iii. Managing Extract Structures
iv. Initialization
v. Filling the Setup tables
vi. Application-Specific Setup of statistical data
vii. SD Sales Orders – Perform Setup (T-Code OLI7BW)
Specify a Run Name, Time and Date (put a future date)
Execute
- Check the data in Setup tables at RSA3
- Replicate the DataSource
Use of setup tables:
You should fill the setup tables in the R/3 system and extract the data to BW; the setup
tables are filled via SBIW. After that, you can do delta extractions by initializing the extractor.
Full loads are always taken from the setup tables.
Regards,
Ravi