DDS, A Seismic Processing Architecture

advertisement
DDS, A Seismic Processing
Architecture
Reproducible research workshop
UBC, Vancouver, 2006
D
W
Randall L. Selzler
Jerry Ehlers
Joseph A. Dellinger*
RSelzler @ Data-Warp.com
Jerry.Ehlers @ BP.com
Joseph.Dellinger @ BP.com
DDS ORIGINS: Amoco TRC, early 90’s
DDS began at the Amoco Tulsa Research Center at
a time of great organizational strain.
The job of the TRC was to do research and crunch
data, not to write software.
Creating software is expensive!
Amoco’s solution was an edict that
“everyone will use DISCO, or else”.
2
Else!
But DISCO just wasn’t good enough!
And so chaos ensued...
We were “mired in seismic processing diversity”.
DDS grew up surrounded by:
•
•
•
•
USP (Amoco internal trace-header based)
SEPlib (ASCII header pointing to data cubes)
SU (SEGY trace-header based)
DISCO (proprietary monitor-based system)
.... and needed to be compatible with all of these!
3
Although formally cast as a research group, in
fact the TRC also functioned as an “internal
contractor” processing shop.
1) So to catch on, not only would any software have to be usable
for quick-turnaround research, but
2) the ability to process large datasets efficiently and in parallel
was also of vital importance.
[Terabytes of data, Connection Machines, MPI, OpenMP]
3) The group had accumulated a considerable number and
variety of computers. [All “Unix”, but
CM5, Cray, Sun, SGI, Linux, Linux clusters, 32 and 64 bit...]
4) Finally, there was an urgent need for software that could
accomodate all the various mutant SEGY formats coming into
the shop, as well as DISCO, SEPlib, SU, and USP!
and out of the chaos came...
John Etgen was using SEPlib for migration
algorithm research on the CM200, a machine that
required massively parallel data I/O.
He showed SEPlib to Randy Selzler:
“I want something that looks like THIS, but can
handle the large industrial-strength jobs I need
to do!”
And thus DDS was born...
5
How SEPlib did it
“header” file
data file
... processing history ...
esize=4 (bytes)
data_format=xdr_float
in=data_location
n1=trace_length
n2=number_traces_per_record
n3=number_records
regularly sampled cube of
IEEE 4-byte floats of
dimension n1 x n2 x n3
d1=sample_interval
o1=starting sample etc...
SEPlib was the system favored by the folks writing programs that
worked on large data volumes instead of individual traces.
6
DDS can look a lot like SEPlib
SEPlib header file
DDS “dictionary” file
... processing history ...
... processing history ...
esize=4 (bytes)
data_format=xdr_float
type=float4
format=fcube
in=data_location
data= data location
n1=trace_length
n2=number_traces_per_record
n3=number_records
axis= t offset cdp
size.t = trace length
size.offset=number traces per record
size.cdp= number records
d1=sample_interval
o1=starting sample
label1=seconds etc...
delta.t= sample_interval
origin.t= starting sample
units.t= seconds etc...
7
DDS can look a lot like SEPlib
“dictionary” file
data file
type=float4
format=fcube
data= data location
axis= t offset cdp
size.t = trace length
size.offset=number traces per record
size.cdp= number records
delta.t= sample_interval
origin.t= starting sample
units.t= seconds etc...
regularly sampled cube of
IEEE 4-byte floats of dimension
size.t x size.offset x size.cdp
(command-line arguments
look a LOT like SEPlib too)
8
DDS’s Generalizations
Dictionary
…
axis= t y cmp
…
size.t= 1000
size.y= 96
size.cmp= 24
…
delta.t= 0.008
units.t= s
…
origin.y= 5000
units.y= m
…
format= segy
data= oak39_@
• N-Dimensional Array of I/O Records
• Densely populated for random access
• Sequential access if sparse
• Meaningful Axis Names
• t, x, y, z, w, kx, ky, kz, cmp, shot, offset, …
• Extensible Axis Attributes
• Regular grid (size, origin, delta, units, …)
• Variable grid (grid.z= 1 3 5 7 11, …)
• Non-numeric (label.attr= Vp Vs rho)
Binary Data
Card Header
Line Header
Great for research! Exotic algorithms
and unforeseen domains can be
accurately represented and processed
as easily as traditional ones.
Traces…
9
How USP did it
USP-format data file
historical line header
(processing history
and 3 data dimensions)
element count
trace header
trace samples
element count
trace header
trace samples
element count
trace header
trace samples
traces
Unix Seismic Processing
USP was Amoco’s
internally home-grown
trace-based processing
system, beloved of Amoco’s
signal processors.
USP is similar to SU in
concept.
USP uses longer trace
headers than SU, but
they still turned out to not
be long enough!
USP is still used as much as
ever today.
10
SU and USP use fixed-format trace
headers defined by include files
/*
* hdr.h – SU include file for segy offset array
*/
static struct {
char *key;
char *type;
int offs;
} hdr[] = {
{
"tracl",
"i",
0},
{
"tracr",
"i",
4},
{
"fldr",
"i",
8},
{
"tracf",
"i",
12},
{
"ep",
"i",
16},
{
"cdp",
"i",
20},
{
"cdpt",
"i",
24},
{
"trid",
"h",
28},
{
"nvs",
"h",
30},
{
"nhs",
"h",
32},
{
"duse",
"h",
34},
{ "offset",
"i",
36},
{
"gelev",
"i",
40},
{
"selev",
"i",
44},
{ "sdepth",
"i",
48},
{
"gdel",
"i",
52},
{
...
11
DDS also plays well with USP
DDS dictionary file
USP-format data file
type=float4
format=usp
data= data location
axis= t offset cdp comp
size.t = trace length
size.offset=number traces per record
size.cdp= number records
size.comp= number components
delta.t= sample_interval
origin.t= starting sample
units.t= seconds etc...
DDS knows what USP headers look like!
line header
(three dimensions)
element count
trace header
trace samples
traces
element count
trace header
trace samples
element count
trace header
trace samples
12
and SEGY...
DDS dictionary file
SEGY-format data file
type=float4ibm
EBCDIC cards
binary header
format=segy
data= data location
axis= t offset cdp comp
size.t = trace length
size.offset=number traces per record
size.cdp= number records
size.comp= number components
trace header
IBM-format samples
traces
trace header
IBM-format samples
trace header
IBM-format samples
delta.t= sample_interval
origin.t= starting sample
units.t= seconds etc...
Note DDS only bothers to convert back to
SEGY’s archaic IBM floats when writing to disk!
13
DDS can speak SU
note input format auto-detected
editd in=minute2.usp
3s=16 3e=16 2s=2 2e=32 2i=2
out_format= su
out_data= stdout: |
supswigp clip=.2 > wiggle.ps
\
\
\
\
DDS dictionaries can point at dictionaries!
dict.comp1
type=float4ibm
type=float4ibm
format=segy slice.comp
data= dict.comp1
dict.comp2
dict.comp3
axis= t offset cdp comp
size.t = trace length
size.offset=number traces per record
size.cdp= number records
size.comp= number components
format=segy
data= data.c1.segy
SEGY
binary
data file
data.c1.segy
axis= t offset cdp
size.t = trace length
size.offset=number traces per record
size.cdp= number records
...
dict.comp2
type=float4ibm
format=segy
...
data= dict.c2.segy
axis= t offset cdp
SEGY
binary
data file
15
data.c2.segy
DDS plays well with mutant SEGY
bridge
in= Atlantis_EQ.segy
\
in_format=segy
\
out_format=usp
\
comment="Component Type"
\
straight map
map:segy:usp.RcComp= "TotalStatic"
\
\
comment="Src and rec locations"
\
map:segy:usp.SrPtXC= "SrcX / 10"
\
map:segy:usp.SrPtYC= "SrcY / 10"
\
map:segy:usp.SrPtEl= "15" \
fixed number
map:segy:usp.ShtDep= "SrcDepth / 10"
\
\
map:segy:usp.RcPtXC= "GrpX / 10"
\
map:segy:usp.RcPtYC= "GrpY / 10"
\
map:segy:usp.GrpElv= "Spare.I4[10] / 10"
\
map:segy:usp.CabDep= "Spare.I4[10]"
\
map:segy:usp.DstSgn= "DstSgn / 10"
\
arithmetic
\
calculation
comment="Rec point and line numbers"
\
map:segy:usp.DpPtLn= "Spare.I4[8]"
\
map:segy:usp.DpPtLt= "Spare.I4[9]"
\
\
comment="Dead or Live" \
map:segy:usp.StaCor= '( TrcIdCode - 1 ) * 30000'
\
|\
editd in= stdin: 3e=106 out_data= raw.usp
16
Data formats and mappings
• This is how DDS differs from SEPlib...
The properties of the binary data, and all the elements
within the binary data, are looked up in the “dictionary”.
• Even the array of trace samples is just another trace field
as far as DDS is concerned.
• DDS knows a few default formats, but can use any
format that you can define.
• It can also map to and from any format that you can
define the necessary mappings for.
• This has the important side effect of documenting the
data format, making future reproducibility possible
17
DDS supports generic formats
In fact, besides having a few built-in default
formats such as USP, SU, and SEGY that are
convenient for geophysicists,
there is nothing in the core of DDS that limits it to
being a seismic processing system!
18
Internal data formats
• Programs can define their own internal data formats as
well, simply by writing definitions into their own internal
dictionary:
fdds_printf (‘MOD_FIELD’, ‘ *+ float MyHeader1, MyHeader2;\n\0’)
• DDS will then convert from the format of the data, as documented by
its dictionary, to the internal format specified by the program.
• On output, the internal format will be converted back into whatever
output format has been requested on the command line, or by
default, the output format will be the same as the input format.
19
Leverage Diversity? Interoperate!
Data handling is fundamental…
Non-DDS
Application
Format and API Emulation
With Random Access I/O
USP Re-link
1998 Proof
of Concept
Disk File
Pipe/Socket
Tape
DISCO Support
1997-2003
Generic Read
DDS
Application
DDS
Application
Generic I/O
Non-DDS
Application
API Emulation
API Emulation
Foreign Library
Generic I/O
Foreign
Format
Any DDS
Supported
Format
Generic Write
Disk File
Pipe/Socket
Tape
Non-DDS
Application
20
Are you scared yet?
• You can probably imagine that all this translating
between formats can get very complicated...
...
fmt:SAMPLE_TYPE= typedef float4 SAMPLE_TYPE;
fmt:USP_ADJUST= typedef enum4 {USP_LINE_PAD \= 0, USP_TRACE_PAD
\= 0, USP_HLH_SIZE \= 2236} USP_ADJUST;
fmt:SEQUENCE= typedef USP_TRACE SEQUENCE;
alias:fmt:USP_TRACE_PAD= fmt:USP_ADJUST
alias:fmt:USP_HLH_SIZE= fmt:USP_ADJUST
alias:fmt:USP_LINE_PAD= fmt:USP_ADJUST
usp_NumRec= 2056
...
But still better than having to change your code or relink your code
for every different mutant data format! It also makes it possible to
21
interoperate with historical data formats without too much pain.
DDS scripting as a Rosetta stone
/apps/global/bin/bridge \
in= /hpc/dat13/zdsr01/Node/EQ/all.segy \
in_format=segy out_format=usp \
comment="Component Type" \
map:segy:usp.RcComp= "TotalStatic" \
comment="Src and rec locations" \
map:segy:usp.SrPtXC= "SrcX / 10" \
map:segy:usp.SrPtYC= "SrcY / 10" \
map:segy:usp.SrPtEl= "15" \
map:segy:usp.ShtDep= "SrcDepth / 10" \
comment="Azimuth, Roll Tilt" \
map:segy:usp.TVPT01= "100 * Spare.F4[11]" \
map:segy:usp.TVPT02= "100 * Spare.F4[12]" \
map:segy:usp.TVPT03= "100 * Spare.F4[13]" \
comment="Dead or Live" \
map:segy:usp.StaCor= '( TrcIdCode - 1 ) * 30000' \
comment="Shot Time" \
map:segy:usp.TVPT15=Date.DateYear \
map:segy:usp.TVPT16=Date.DateDay \
map:segy:usp.TVPT17=Date.DateHour \
map:segy:usp.TVPT18=Date.DateMin \
map:segy:usp.TVPT19=Date.DateSec \
....
22
In Conclusion: caveats
• Things aren’t so complicated if you use DDS as
if it were SEPlib, but then what’s the point?
• Because so much functionality already exists in
USP, there has been little motivation to flesh out
DDS.
• The external distribution is a subset of the same
stuff we use internally. There has been little
effort put into improving the “packaging”.
• While there is some documentation, it is
somewhat lacking!
23
In Conclusion: upsides
• The software infrastructure inside BP today is based almost entirely on
DDS and USP. It is BP’s infrastructure both for research and for
processing. BP’s advanced imaging team in Houston is “BP’s largest
contractor”.
• The DDS I/O library was released publicly in 2003 on “freeusp.org”.
The core of the USP system was released a year or so earlier on the
same web site, along with some ARCO-heritage processing systems
as well.
• By releasing USP and DDS, BP hoped to make it easier to share
algorithms with academia and contractors.
• Randy Selzler now wants to create a successor to
DDS, but that’s his talk, as the “prophet”, to give...
24
Download