
SPE-153272-PP
Recovering Linkage Between Seismic Images and Velocity Models
Jing Zhao, Charalampos Chelmis, Vikram Sorathia, Viktor Prasanna, Abhay Goel, University of Southern
California
Copyright 2012, Society of Petroleum Engineers
This paper was prepared for presentation at the SPE Western North American Regional Meeting held in Bakersfield, California, USA, 19–23 March 2012.
This paper was selected for presentation by an SPE program committee following review of information contained in an abstract submitted by the author(s). Contents of the paper have not been
reviewed by the Society of Petroleum Engineers and are subject to correction by the author(s). The material does not necessarily reflect any position of the Society of Petroleum Engineers, its
officers, or members. Electronic reproduction, distribution, or storage of any part of this paper without the written consent of the Society of Petroleum Engineers is prohibited. Permission to
reproduce in print is restricted to an abstract of not more than 300 words; illustrations may not be copied. The abstract must contain conspicuous acknowledgment of SPE copyright.
Abstract
Seismic processing and interpretation involve resource-intensive computation in the petroleum exploration domain. By
employing various types of models, seismic interpretations are often derived in an iterative refinement process, which may
result in multiple versions of seismic images. Keeping track of the derivation history (a.k.a. provenance) for such images thus
becomes an important issue for data management. Specifically, the information about what velocity model was used to
generate a seismic image is useful evidence for measuring the quality of the image. The information can also be used for
audit trail and image reproduction. However, in practice, existing seismic processing and interpretation systems do not
always automatically capture and maintain this type of provenance information.
In this paper, by employing state-of-the-art techniques in text analytics, semantic processing and machine learning, we
propose an approach that recovers the linkage between seismic images and their ancestral velocity models when no
provenance information is recorded. Our approach first retrieves information from file/directory names of the images and
models, such as project names, processing vendors, and algorithms involved in the seismic processing and interpretation.
Along with the creation timestamps, the retrieved information is associated with corresponding images and models as
metadata. The metadata of a seismic image and its ancestral models usually satisfy certain relationships. In our approach, we
detect and represent such relationships as rules, and a matching process utilizes the rules and retrieved metadata to find the
best-matching images and models.
In practice, images’ and models’ file names often do not adhere to naming standards, and the files are stored without
following well-established record-keeping practices. Users may also use different terms to express the same information in file/directory
names. We employ Semantic Web technologies to address this challenge. We develop domain ontologies with OWL/RDFs,
based on which we provide an interactive way for users to semantically annotate terms contained in file/directory names. All
metadata used by the image-model matching process is represented as ontology instances. Matching can be performed using
the standard semantic query language. The evaluation results show that our approach achieves satisfactory accuracy.
Introduction
The petroleum exploration and production domain employs various scientific methods that involve complex workflows requiring
advanced computational, storage, sensing, networking, and visualization capability [17]. Effective data management becomes
critical with increased regulatory requirements for reporting and standard compliance [19]. Large data volumes in SCADA
systems, data historian systems, hydrocarbon accounting systems, systems of records and other production and operation
systems can be managed with relative ease due to their structured nature and well-documented schema. However, other
specialized systems used for imaging, analysis, optimization, forecasting, and scheduling involve complex scientific
workflows handling data models that are recognized by specific vendor products only. Engineers and geoscientists handle
various kinds of data by subjecting them to complex domain-specific models and algorithms, resulting in a large number of
derived datasets, unstructured or semi-structured, which are stored with little or no metadata describing their derivation
history. Once the analysis is complete and the resulting datasets are transferred to a storage repository, retrieval at later stages
becomes difficult, time consuming, and labor intensive.
Seismic imaging is a scientific domain that is being increasingly employed not only in exploration, but also in other stages of
E&P lifecycle [20]. A typical seismic imaging workflow involves various steps including data collection, data processing,
model building, data interpretation, analysis, rendering and visualization. Seismic image processing and interpretation
involves highly interactive and iterative processes that require loading, storing, referencing, and rendering of large volumes
of datasets [17]. This requires large amounts of computation and storage capability in addition to domain specific software
products and tools capable of handling, manipulating, and visualizing such datasets. Geoscientists skilled in modeling,
characterization, and interpretation employ various techniques and generate large amounts of intermediate datasets during
this process. Across 2D, 2.5D, and 3D surveys, pre-stack and post-stack approaches, and various types of migration algorithms,
many techniques have been proposed for different types of geological structures [18]. In a typical workflow, a velocity model or
earth model is first generated for a specific geological structure, which is then used to interpret the results in seismic volumes.
Typically, this workflow is repeated with some variations in interpretation parameters until best representations are found,
thereby resulting in large amounts of volumes for a given velocity model.
Data generated during this process is generally retained with no or incomplete metadata. Over time, data repositories
receive contributions from large teams of geoscientists working on multiple projects. In the absence of proper metadata and record-keeping
practices, seismic datasets lose the context in which they were created. In order to be useful in decision-making, all
derived volumes must retain the link to the original velocity models [17]. Interpreters therefore have to spend a considerable
amount of effort rediscovering the associated source models. Without formal metadata records, the file names of models may
provide some hints. However, individual interpreters may not have followed consistent file naming standards, and may not
have used consistent terms to express the same semantic meaning. This significantly increases the time and effort required to
find the right velocity model from a repository.
We argue that with careful application of advanced machine learning, semantic web and text analytics techniques, we can
address this problem and achieve significant reduction of the search space. Our approach employs text analytics to extract
key words used by individual interpreters in file names and identify the variations in expressing the same term. By
introducing Semantic Web technologies, we generate an ontology of the file naming convention that contains concepts related
to the seismic interpretation process and their possible expressions. Finally, by introducing machine-learning techniques, we
implement a matching system that enables linkage discovery between images and models. Recovering
linkages in this manner is particularly useful not only for generating metadata, but also for facilitating advanced search
capabilities based on various interpretation techniques and parameters. Establishing the derivation history is also useful in
determining the quality and characteristics of seismic volumes.
Motivating Scenario.
Figure 1 depicts the result of a seismic image interpretation process carried out for BP 2004 Salt Structure Data [18]. Here, a
velocity model is utilized with various interpretation techniques and different parameters. Derived seismic volumes are stored
by interpreters on local disks or shared network folders. While storing these derived volumes using the interpretation system,
interpreters select file names that capture key processing parameters by which the given volume was derived. In this
particular case, the interpreter has generated three volumes using three different interpretation parameters. One-way and two-way
migration techniques were performed on part of the dataset to generate interpreted volume files. Based on the outcome, a
third interpretation was performed using the two-way migration technique on the full dataset. These variations are well captured
in the file names. In addition, the geological structure type and the dataset name, the volume creation time, and the project
name were also captured in the derived volume file names.
This example provides a good understanding of the file naming convention that has been followed by the interpreters. Even
though proper metadata is not generated for all derived seismic volumes, the selection of keywords in file names provides
hints about how a particular volume was derived. Knowing the Dataset, Project Name, and Geological Structure Type, it
becomes easier to establish links among volumes and the model that was used to derive them. In Figure 1, all derived volume
file names indicate “BP_2004”, “projectbp” and “fslt” that can also be found in the model file name, with the exception of
“fslt”, which is expressed as “fullsalt” instead. Clearly, file names not only include information about key parameters, but
interpreters mostly select the same terms. However, the derived volumes include additional parameters capturing more
information about the preprocessing and post processing steps, segmentation, and other image loading parameters that were
used. In the given example, the terms “full”, “part”, “oneway”, “twoway”, and “mig” are very specific to the interpretation process and
therefore are found in the volume file names only.
We argue that it is possible to recover the linkages between volumes and the associated velocity model by harnessing the hints
provided by interpreters in the filenames. With careful observation of the seismic processing and interpretation workflow and the
file naming convention followed in a specific organization, it is possible to establish rules that help detect the linkage
between files. For instance, dataset name, project name, geological structure name, and file creation dates all play key roles in
discovering the match. However, pre-processing or post-processing parameters, file loading parameters etc. can be ignored as
they are specific to volume names only. Designing a system for linkage discovery based on this matching approach
introduces various challenges when implemented for a large number of users. As indicated in the example, different users use
different keywords (e.g. “fslt” and “fullsalt”) for the same term (“Full Salt”). The proposed system must therefore effectively
handle such variations, and must be able to address semantic and syntactic heterogeneity issues in order to accurately
establish lost linkages between velocity models and their derived seismic image volumes.
Figure 1. Example of Seismic Interpretation Process Indicating Generation of Multiple Volumes Using a Velocity Model
System Overview
Figure 2 illustrates the overview of our approach. In general, our approach consists of three steps:
1) Metadata extraction. Given a set of seismic images and velocity models, our approach first identifies and extracts
the information that can be used for recovering the linkage between images and models. We employ a data loading
process to retrieve the file names of images and models and their creation time. The name of a seismic image file or
a velocity model file usually consists of multiple terms separated by “_”, where each term captures information
about project name, processing vendors, and algorithms involved in the seismic processing and interpretation. We
split file names of images and models into individual terms, and clean the terms by utilizing text analysis techniques.
2) Semantic annotation. The information encoded in the file names is the main hint for us to identify the linkage
between images and models. However, as we have discussed, users may use different terms to express the same
information, making it difficult to directly match seismic images and velocity models based on their names alone.
To address this challenge, we design an ontology as a global vocabulary to represent the information that may be
encoded in file names. A user-interactive semantic annotation process, which is the second step of our approach,
utilizes the ontology to annotate the terms extracted from file names. Each term is represented as an ontology
instance that is stored in an ontology repository, and a file name can then be represented by a group of ontology
instances. The group of ontology instances, along with the creation time, is associated with the corresponding
seismic image or velocity model as its attributes.
3) Matching. In the last step, for each seismic image, we identify the velocity models that are likely to have been
used for its creation. We use a set of rules to express the relationships that the attributes of an image and its ancestral
model may have. For example, the creation time of the ancestral velocity model of a given image should be within a
certain time window. According to the rules, we then execute semantic queries and rules on image and model
attributes, to identify the best-matching images and models.
We describe each step in detail in the following sections.
[Figure 2, omitted here, depicts the pipeline: a data-load step retrieves the file/directory names of images/models and their creation time; pre-processing (splitting, cleansing) produces a bag of terms; semantic annotation against the ontology database yields ontology instances; and the matching step combines the ontology instances and creation time to produce the linkage between seismic images and velocity models.]
Figure 2. Overview of the Semantic Based Matching Approach
Metadata Extraction
In our use case, all seismic image files and velocity model files are stored on servers running the Linux operating system. Thus,
we simply use the “ls -l” command to get the standard Linux output containing two fields, i.e., file names and creation time,
separated by spaces. We use the creation time and file names to extract the terms used in the following steps of our
approach.
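As a rough illustration of this data-loading step (the authors' actual scripts are not shown; the function name, directory argument, and the use of the file modification timestamp as a stand-in for creation time are our assumptions), a Python sketch might look like:

import os
from datetime import datetime

def load_file_metadata(root_dir):
    """Collect (file name, timestamp) pairs, mirroring the "ls -l" listing."""
    records = []
    for entry in os.scandir(root_dir):
        if entry.is_file():
            # st_mtime stands in for the creation time reported by "ls -l";
            # standard Linux file systems do not expose a true creation time.
            ts = datetime.fromtimestamp(entry.stat().st_mtime)
            records.append((entry.name, ts))
    return records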
The name of a typical seismic image file or a velocity model file usually consists of multiple terms linked with underscore
“_”. For example, one of the image files is named “fslt_psdm_bp_projectbp_agc_il.bri”. In the file name, each term has its
own semantic meaning, which represents metadata of the corresponding seismic image. In this example, “fslt” means the
“full-salt” model, and “psdm” refers to the “pre-stack depth migration” imaging algorithm. The image file name contains
these two terms to indicate that the seismic processing system for generating the image has employed the corresponding
model and algorithm. Table 1 lists the meaning of all terms contained in the example file name.
Term        Semantic Meaning
fslt        The full-salt model
psdm        The pre-stack depth migration imaging algorithm
bp          The processing vendor name
projectbp   The project name
agc         The image pre-processing step
il          The inline sort method
Table 1. Terms extracted from the example image file name, and the semantic meaning of the terms
For each file name returned by the “ls” command, we split the file name into individual terms. Each image/model file is then
associated with a group of terms. In general, a group of terms usually contains the following information:
1) Project names, for example, “projectbp” in the above example.
2) Processing vendors; for example, a file name may contain terms like “BP”, “Chevron”, and “WesternGeco”.
3) Imaging algorithms, such as the pre-stack depth migration algorithm in the example.
4) Involved models, such as “sediment flood”, “water flood”, and “full-salt”.
5) Version information, for example, “v1” means the file is the first version of the seismic image. Other terms about
the version information may include “v2”, “new”, and “update1”.
6) Post-processing steps, such as “agc”.
SPE SPE-153272-PP
5
7) Sort order, such as “inline” and “crossline”.
8) Image loading parameters, such as “subvolume”, “partial” and “full”.
However, not all the user-supplied terms are useful for linkage recovery. Before we proceed further with the extracted
terms, we need to remove redundant and useless terms. For example, if the term “agc” cannot be used as a hint for predicting
the ancestral velocity models of the seismic image, we do not need to take it into account in the following steps. For the
example file name “fslt_psdm_bp_projectbp_agc_il.bri”, the metadata extraction step will generate the term group
{“fslt”, “psdm”, “bp”, “projectbp”}.
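A minimal sketch of this splitting-and-cleansing step is shown below; the stop list of uninformative terms (“agc”, “il”) is a hypothetical example, not the paper's actual list:

STOP_TERMS = {"agc", "il"}  # assumed list of uninformative terms

def extract_terms(file_name):
    base = file_name.rsplit(".", 1)[0]  # drop the extension (e.g., ".bri")
    return [t for t in base.split("_") if t.lower() not in STOP_TERMS]

print(extract_terms("fslt_psdm_bp_projectbp_agc_il.bri"))
# -> ['fslt', 'psdm', 'bp', 'projectbp']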
Matching based on Terms.
We can match images to models directly, based on terms extracted from file names. Basically, a velocity model may be used
to create a seismic image when we detect the following relationships between them:
1) The gap between their creation times is within a certain period (e.g., around a month).
2) They share the same or similar model/algorithm name, project name, and/or processing vendors.
Based on this detection, we can compare the creation time and the extracted file name terms to identify the best-matching
seismic images and velocity models. For example, we can match the seismic image “sedflood_psdm_bp_v5.il.bri”, which
was created on 03/21/2011, to the velocity model “sedflood_psdm_bp.bln”, which was generated on 02/23/2011, since the
seismic image file was created nearly one month after the velocity model, and they share the same imaging algorithm, model,
and processing vendor.
However, in practice, users do not always strictly follow naming standards when naming images and models. We use the
Levenshtein distance [14] to calculate the lexical similarity between terms, so as to identify terms that represent the same
semantic meaning. For example, the Levenshtein distance between “full-salt” and “fullsalt” is 1; such a small distance lets us
infer that the two terms may refer to the same model. However, in some cases terms expressing the same
meaning may have a relatively large distance. For example, the distance between “fs” and “fullsalt” is 6. Thus lexical similarity alone
cannot accurately capture “semantic distance”.
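For concreteness, the standard dynamic-programming implementation of the Levenshtein distance [14] reproduces the distances quoted above:

def levenshtein(a, b):
    """Minimum number of single-character edits transforming a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

print(levenshtein("full-salt", "fullsalt"))  # 1: likely the same model
print(levenshtein("fs", "fullsalt"))         # 6: lexically far, same meaning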
Semantic Annotation
Annotation is about attaching names, attributes, comments, descriptions, etc. to a document or to a selected part of text. It
provides additional information (metadata) about an existing piece of data. We use an ontology to annotate terms extracted from
file/directory names, so as to solve the problem of heterogeneous file naming conventions and information representation.
The ontology acts as a global vocabulary that represents the semantic meaning of terms, assisting in term disambiguation and
also helping in associating domain concepts and local facts.
Figure 3. Snapshot of the Ontology Used for Annotation
Domain Ontology.
Figure 3 illustrates a snapshot of the ontology that we use for annotation. For illustration purposes, this figure has been
compressed to show only part of the classes and instances contained in the ontology. We use ontology classes such as
“ProjectName”, “VendorName”, “Version”, “SeismicModelingAlgorithmName”, etc. to describe possible information
contained in file names. As indicated by its name, each class captures a particular concept in file/directory names.
Subclasses linked to each class define groups under particular concepts. For example, “Sub_Salt”, “Base_Salt”, “Full_Salt”,
“Multi_Level_Salt”, and “Top_Salt” are all subclasses coming under the “Salt” class, which is in turn a subclass of
“Geobody_Structure”. We can also find terms like “fullsalt”, “fslt”, “flslt”, “fsalt” and “flst” that are linked to the
“Full_Salt” class. In our ontology, the “Unknown” class captures terms that are unknown or do not fit in any of the above
classes. Later on, with the help of domain experts, such terms can either form new classes or become new instances of
existing classes.
Annotation.
During annotation [16], we annotate terms based on the domain ontology, and represent the terms as ontology instances
belonging to corresponding ontology classes. For example, since the term “fslt” is used to represent the model name
“Full_Salt”, the ontology class “Full_Salt” should be used for annotation. As shown in Figure 3, we create an instance “fslt”
that belongs to the class “Full_Salt”. We define the instance as an Attribute of the file name. The whole file name can then be
represented as a Semantic Entity containing a set of such attributes. For example, the file name
“fslt_psdm_bp_projectbp_agc_il.bri” can be represented by a group of attributes {fslt, psdm, bp, projectbp} that belong to
the ontology classes Full_Salt, PreStackDepthMigration, BP, and ProjectName, respectively.
As shown in Figure 4, generated instances are stored in the ontology repository, which communicates with our annotation
application through a SPARQL [15] endpoint. When annotating a group of terms extracted from an image/model file name,
the Automated Annotation process first probes the ontology to determine whether any existing ontology instances match the
terms to be annotated. If such instances are found, they, as well as their corresponding ontology classes, will be
included as the attributes of the image/model.
Figure 4. Annotation approach
Figure 5. Screenshot of the User Assisted Annotation Tool [16]
If terms have not been annotated before, our annotation system marks them as “Unknown”. The user assisted annotation tool
then annotates unknown terms as ontology instances. A domain expert who does not have prior knowledge of Semantic Web
technology can easily update the main ontology using the interface provided by the tool. By utilizing this tool, a domain
expert can either define new ontology classes for the unknown terms or associate them to previously defined classes.
Figure 5 shows a screenshot of the annotation tool. As shown in the figure, “gulfofmaxico”, which is a term extracted from
the file name “Fl_prstk_krchf_vol1_saltbody_insert_gulfofmaxico_vol1_2008.bri”, cannot be initially matched to any
instance in the ontology because of a typo: “maxico”. Thus the annotation tool first annotates the term as “Unknown”. To
define this “unknown” term, the annotation tool allows the user to navigate all relevant ontology classes and select
appropriate classes for annotation. In this example, the user selects the ontology class “Gulf_of_Mexico”, which is a subclass
of “PlaceName”. The term “gulfofmaxico” is then described as an attribute of the file name, and a new instance
“gulfofmaxico” belonging to the class “Gulf_of_Mexico” is added to the ontology with user corroboration.
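The following sketch, using the rdflib Python library, illustrates how such a user-confirmed annotation could be recorded as ontology instances. The namespace URI and the iam:surfaceForm property are our assumptions; iam:hasAttribute follows the paper's own query example (Figure 6):

from rdflib import Graph, Literal, Namespace, RDF

IAM = Namespace("http://example.org/iam#")  # assumed namespace URI

g = Graph()
g.bind("iam", IAM)

# The term "gulfofmaxico" (a user typo for Gulf of Mexico) becomes a new
# instance of the class iam:Gulf_of_Mexico after expert confirmation.
term = IAM["gulfofmaxico"]
g.add((term, RDF.type, IAM["Gulf_of_Mexico"]))
g.add((term, IAM["surfaceForm"], Literal("gulfofmaxico")))  # assumed property

# The file name's semantic entity then carries the instance as an attribute,
# via the iam:hasAttribute property used in Figure 6.
entity = IAM["Fl_prstk_krchf_vol1_saltbody_insert_gulfofmaxico_vol1_2008"]
g.add((entity, IAM["hasAttribute"], term))

print(g.serialize(format="turtle"))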
Semantic Based Matching
Every image/model file name can now be represented as a semantic entity containing a set of semantic attributes, each
expressing the semantic meaning of a term contained in the file name. Recall that we also capture the creation time of the file
in the metadata extraction step. Time information is also represented as a semantic attribute.
In the matching step, we speculate that a velocity model was used to create a seismic image if we find that their annotated
semantic entities match each other according to certain rules. In particular, we have developed two types of semantic-based
matching approaches. Both approaches utilize semantic technologies such as SPARQL [15] to express the matching rules and
apply the rules to semantic entities. We now introduce the two approaches in the following sections.
Approach 1: Exact Pattern Matching.
In our first approach, we first allow users to define a set of rules, each of which specifies a condition that the file name of the
matched velocity models must satisfy. We then compose a SPARQL query based on matching rules, and execute the query to
search among all semantic entities of velocity models. The results of the query indicate potential velocity models which
might have been used to generate the given seismic image.
Specifically, a matching rule can be used to specify
1) What terms MUST be included in the file name of the model;
2) What terms MAY be included in the file name of the model;
3) What terms MUST NOT be included in the file name of the model;
4) The range of creation time.
Since all file names have been represented as semantic entities with semantic attributes, when defining a matching rule, users
do not need to consider all possible term variations. Instead, they can directly use our domain ontology to form their
matching rules.
Based on the annotation semantic entity of a given seismic image, a user can directly specify what ontology classes should be
included in the semantic entity of the velocity model. For example, when the user finds that the semantic entity of the image
contains an attribute “cvx”, and she thinks that the name of the matching velocity model should have the same processing
vendor name, she can directly define a rule restricting the results to velocity models containing attributes belonging to the
ontology class “Chevron”.
Users can further define a set of rules with “if-else” structure in advance. “if-else” rules express the exclusion and/or
inclusion over terms in image and model file names. For example, a user can define a rule of the form “if the semantic entity
of the image contains an attribute belonging to ProcessingVendor P, the semantic entity of the model should also contain an
attribute belonging to P”, where P can be seen as a parameter of the rule. Later, when the system does the matching for an
image file whose semantic entity has an attribute “cvx”, our matching system captures “Chevron” as the argument for the
“if-else” rule, and automatically generates a matching rule that the semantic entity of the velocity model must contain an
attribute belonging to the same ontology class “Chevron”.
Based on the creation time of the seismic image, users can define rules to specify the creation time range for the velocity
model, e.g., 3~5 weeks before the creation time of the image. In our system, we provide a simple GUI for users to define
different types of matching rules, where users can select the corresponding ontology classes that are involved in the matching
rules based on our domain ontology, and also specify operators such as “=”, “>”, “<”, and “≠”.
SELECT ?velocityModel WHERE {
?velocityModel iam:hasAttribute ?vendorAttribute .
?vendorAttribute rdf:type iam:Chevron .
?velocityModel iam:hasAttribute ?geoAttribute .
?geoAttribute rdf:type iam:Full_Salt .
?velocityModel iam:hasAttribute ?creationTime .
FILTER ( ?creationTime > "2010-01-01T00:00:00Z"^^xsd:dateTime )
}
Figure 6. SPARQL Query Example
All matching rules are then integrated together to compose a SPARQL query. Figure 6 illustrates a SPARQL query example
that contains three matching rules:
1) The model name should have an attribute that is an instance of the ontology class “Chevron”.
2) The model name should have an attribute that is an instance of the class “Full_Salt”.
3) The creation time of the model should be after 00:00:00 of 01/01/2010.
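As a hedged sketch of how such rules might be assembled into a query of the form shown in Figure 6 (the paper does not describe its query-construction code; this string-templating approach is our illustration):

def build_query(required_classes, min_creation_time=None):
    """Compose a SPARQL query from required ontology classes and an
    optional lower bound on the model's creation time."""
    lines = ["SELECT ?velocityModel WHERE {"]
    for i, cls in enumerate(required_classes):
        lines.append(f"  ?velocityModel iam:hasAttribute ?a{i} .")
        lines.append(f"  ?a{i} rdf:type iam:{cls} .")
    if min_creation_time is not None:
        lines.append("  ?velocityModel iam:hasAttribute ?creationTime .")
        lines.append(f'  FILTER ( ?creationTime > "{min_creation_time}"^^xsd:dateTime )')
    lines.append("}")
    return "\n".join(lines)

# Reproduces the three rules of Figure 6.
print(build_query(["Chevron", "Full_Salt"], "2010-01-01T00:00:00Z"))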
Approach 2: Matching Score.
The above approach searches for velocity models whose file names satisfy all user-specified matching rules. However, in
practice, images and models are often not named in strict accordance with the naming standards, so the completeness of the
information cannot be guaranteed. Some important information referenced by matching rules is often missing from model names. For
example, a model name may omit the term indicating its processing vendor. As a result, the correct matching
model(s) cannot be identified by the SPARQL query.
We develop a new approach to overcome this shortcoming of the exact pattern matching approach. As before, users define a set of
matching rules. Instead of composing a SPARQL query containing all matching rules, we assign a score to each matching
rule. Each velocity model is associated with a matching score, which is initialized to 0. We go through all
rules and add the corresponding score to a model’s matching score whenever its file name satisfies a matching rule. We then sort
velocity models by their matching scores and return the top n models as our matching result, where n is a number
specified by the user.
For example, suppose users define the three rules contained in the SPARQL query in Figure 6. The scores for Rule 1), 2) and
3) are 0.3, 0.5, and 0.2, respectively. Then if a velocity model, noted as m1, satisfies Rule 1) and 2), m1 gets a matching score
0.3+0.5=0.8. Similarly a velocity model m2 only satisfying Rule 3) gets a matching score 0.2, and a velocity model m3
satisfying Rule 2) and 3) gets a matching score 0.7. Thus m1 and m3 will be returned as matching results if a user wants the
top 2 matching models for further selection (i.e., n is equal to 2).
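The sketch below reproduces this scoring scheme on the example above; encoding rules as (predicate, score) pairs over a model's attribute set is our illustration, and the "Recent" attribute stands in for the creation-time rule:

def rank_models(models, rules, n=2):
    """models: {name: set of ontology classes}; rules: [(test_fn, score)].
    Returns the top-n (score, name) pairs."""
    scored = []
    for name, attrs in models.items():
        total = sum(score for test, score in rules if test(attrs))
        scored.append((total, name))
    scored.sort(reverse=True)
    return scored[:n]

models = {"m1": {"Chevron", "Full_Salt"},
          "m2": {"Recent"},
          "m3": {"Full_Salt", "Recent"}}
rules = [(lambda a: "Chevron" in a, 0.3),
         (lambda a: "Full_Salt" in a, 0.5),
         (lambda a: "Recent" in a, 0.2)]  # stand-in for the time rule

print(rank_models(models, rules))  # [(0.8, 'm1'), (0.7, 'm3')]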
The score of each matching rule can be considered a “weight”. Intuitively, a matching rule with more importance should
have a higher score. Matching scores can be assigned by users according to domain knowledge, or can be “learned” using
machine learning algorithms.
Evaluation
We evaluate our approach in this section. In our experiment, we first collect a set of seismic images and their corresponding
velocity models, and record the correct matching between them as “ground truth”. We then use our approach to find the
matching model for each seismic image among the set of velocity models, and compare our results with the “ground truth” so
as to measure the precision. We utilize the matching score approach in the matching step.
Experiment Setup and Dataset Generation.
One challenge in our experiment is that although we can collect a relatively large set of images and models, it is still difficult
to identify the “ground truth”, since most of the correct linkages between images and models have not been recorded. To
overcome this challenge, we designed an algorithm to generate synthetic “ground truth” based on the set of velocity models
we have collected. Starting from around 500 velocity models collected from real applications, our algorithm mimics the
naming procedure followed by domain users who utilize velocity models to generate seismic images. Our matching
approach is then applied to find a matching model for each generated image. The links between velocity models and seismic
images that are generated by our algorithm are used as the synthetic “ground truth”.
As shown in Algorithm 1, for each seismic image, we randomly generate its creation time. Based on our observations of real
data, the creation time of an image should be within 3~5 weeks after the model’s creation. But we also allow exceptions: the
creation time of an image may fall outside the time range with a small probability PT. After generating the creation time of the
image, we represent the file name of the velocity model as a semantic entity. Our algorithm acts as a domain engineer, and
determines what information should be included in the image file name based on a set of rules. For each piece of information,
we randomly choose one of its possible representation terms according to our ontology. We compose the image file name by
connecting all such terms (a term may be missing with a probability PM). We also generate some redundant information in the
file name with probability PR.
Input: 1) the file name of a velocity model; 2) a set of image generation rules; 3) probabilities PT, PM, and PR
Output: the synthetic file name of a seismic image generated by the input velocity model
1. Identify the creation time TV of the velocity model;
2. Randomly generate a datetime TI within the range [TV+21 days, TV+35 days]; with probability PT, TI is generated without
this range constraint. Use TI as the creation time of the seismic image.
3. Extract useful terms from the velocity model file name.
4. Annotate extracted terms by using our domain ontology, and generate the annotation semantic entity.
5. Based on the image generation rules, identify what information should be included in the seismic image file name.
Use a set of ontology classes (noted as {C}) to represent the necessary information.
6. For each ontology class in {C}, randomly pick one of its instances as its term representation in the image file name.
7. Compose the file name for the seismic image by connecting all the generated terms with “_”. Each term has a
probability PM to be missing, and a probability PR to have a redundant copy in the file name (maybe with a different
instance belonging to the same ontology class).
Algorithm 1. Algorithm for Generating the Synthetic Image File Names
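A condensed Python sketch of the randomized steps of Algorithm 1 follows; the surface-form dictionary stands in for the ontology lookup, and all names and default probabilities are illustrative:

import random
from datetime import datetime, timedelta

SURFACE_FORMS = {  # assumed per-class term representations from the ontology
    "Full_Salt": ["fslt", "fullsalt", "fsalt"],
    "PreStackDepthMigration": ["psdm"],
    "BP": ["bp"],
}

def synth_image_name(model_classes, model_time, p_t=0.05, p_m=0.1, p_r=0.1):
    # Step 2: with probability p_t the image time is unconstrained;
    # otherwise it falls 21-35 days after the model's creation time.
    if random.random() < p_t:
        image_time = model_time + timedelta(days=random.uniform(0, 365))
    else:
        image_time = model_time + timedelta(days=random.uniform(21, 35))
    # Steps 6-7: pick one surface form per ontology class; each term may be
    # dropped with probability p_m or duplicated with probability p_r.
    terms = []
    for cls in model_classes:
        if random.random() < p_m:
            continue  # term missing
        terms.append(random.choice(SURFACE_FORMS[cls]))
        if random.random() < p_r:
            terms.append(random.choice(SURFACE_FORMS[cls]))  # redundant copy
    return "_".join(terms) + ".bri", image_time

print(synth_image_name(["Full_Salt", "PreStackDepthMigration", "BP"],
                       datetime(2011, 2, 23)))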
We run our experiment on a desktop with a 3.06 GHz Intel Core i3 CPU and 4 GB of memory.
Evaluation Results.
We measure the precision of our approach when it generates only one matching model for each image file. This
precision, denoted P1, measures the probability that our best matching result is the correct matching model:

$$P_1 = \frac{\text{Number of correct matchings}}{\text{Total number of seismic images}}$$
We also measure the probability that the correct matching model is covered when our approach generates multiple
matching results. In general, if we use $n \geq 1$ to denote the number of matching results returned for each seismic image, we have:

$$P_n = \frac{\text{Number of times the correct matching model is covered}}{\text{Total number of seismic images}}$$
Figure 7. Matching precision as a function of number of returned matching results
Figure 7 shows the precision Pn for n = 1, 2, …, 8. We see that we achieve a precision of less than 60% when n = 1.
However, the precision improves greatly when multiple matching results are provided. When n=8, we can achieve 100%
precision. This means that in order to recover lost links between images and models based solely on their file names, about 8
candidate models suffice for the correct matching model to be covered (i.e. the correct matching model to be in the retrieved
results). Hence, we conjecture that our approach can effectively identify a small set of matching candidate models, thus
drastically reducing the search space of candidate models for users to manually examine in order to select the right one.
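For reference, the Pn metric can be computed from ranked matching results with a small helper such as the following (an illustrative sketch, not the authors' evaluation harness):

def precision_at_n(results, ground_truth, n):
    """results: {image: ranked list of candidate models};
    ground_truth: {image: correct model}. Returns P_n."""
    hits = sum(1 for img, candidates in results.items()
               if ground_truth[img] in candidates[:n])
    return hits / len(results)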
Related Work
In this work, we use Semantic Web technologies to address the challenge posed by the heterogeneous expression of
information contained in image/model file names. Our matching scheme is then based on the semantic representations of file
names. Semantic Web technologies have been used in the oil and gas industry to address problems such as information
integration and knowledge management [1][2][3][4]. For example, in [1], the POSC Caesar Association proposed the Oil and
Gas Ontology (OGO), an industry-wide ontology intended to provide standard means for data integration within and
across business domains. In [3], Norheim and Fjellheim introduced the AKSIO system, a knowledge management
system for the petroleum industry. In addition, different use cases of Semantic Web
technologies in the oil and gas industry are discussed in [2]. In our earlier work, we used Semantic Web technologies to address
problems in the reservoir management and real-time oil field operations setting [4]. Issues about how to develop ontologies,
how to build the knowledge database, and examples of applications for information integration and knowledge management
are discussed in [4].
The linkage between the seismic images and the velocity models that were used for their creation can be seen as a type of
provenance information. Provenance information explains the derivation history of a data object and thus can be used for
audit trails; it is also critical for data quality control. Applications and technologies for provenance collection, management, and
access have been widely discussed in domains such as e-Science and health care [5][6][7][8][9][10][11][12][13].
Conclusion
The linkage between seismic images and velocity models that were used for their creation is an important type of provenance
information required for further analysis and data quality control. In this paper, we proposed an approach to recover the
missing linkage between seismic images and velocity models. Our approach extracts information contained in image/model
file names, and utilizes Semantic Web technologies to annotate and represent extracted information. Based on user-specified
rules, we designed algorithms to identify the matching between images and models. For future work, we will explore other
possible information types that may be utilized to improve our prediction precision. We will also address the scalability of
our approach so that we can effectively handle large datasets.
Acknowledgement
This work is supported by Chevron Corp. under the joint project, Center for Interactive Smart Oilfield Technologies (CiSoft),
at the University of Southern California.
References
[1] POSC Caesar Association, http://www.posccaesar.com/
[2] F. Chum, “Use Case: Ontology-Driven Information Integration and Delivery: A Survey of Semantic Web Technology in the Oil and Gas Industry,” April 2007. http://www.w3.org/2001/sw/sweo/public/UseCases/Chevron/
[3] D. Norheim and R. Fjellheim, “AKSIO - Active Knowledge management in the petroleum industry,” in 3rd European Semantic Web
Conference (Industry Forum), June 2006
[4] R. Soma, A. Bakshi, V. Prasanna, W. Da Sie, and B. Bourgeois, “Semantic-web technologies for Oil-field Management,” SPE Intelligent Energy Conference and Exhibition, April 2008.
[5] C. Pancerella et al., “Metadata in the collaboratory for multi-scale chemical science,” Dublin Core Conference, 2003.
[6] I. Foster, J. S. Vockler, M. Wilde, and Y. Zhao, “Chimera: A virtual data system for representing, querying, and automating data
derivation,” Scientific and Statistical Database Management Conference (SSDBM), 2002.
[7] J. Frew and R. Bose, “Earth system science workbench: A data management infrastructure for earth science products,” Scientific and
Statistical Database Management Conference (SSDBM), 2001.
[8] J. Zhao, C. Goble, M. Greenwood, C. Wroe, and R. Stevens, “Annotating, linking and browsing provenance logs for e-science,”
International Semantic Web Conference (ISWC) Workshop on Retrieval of Scientific Data, 2003.
[9] J. Zhao, C. Wroe, C. Goble, R. Stevens, S. Bechhofer, D. Quan, and M. Greenwood, “Using semantic web technologies for
representing eScience provenance,” International Semantic Web Conference (ISWC), 2004.
[10] S. Sahoo, A. Sheth, and C. Henson, “Semantic provenance for e-Science: Managing the deluge of scientific data,” Internet Computing,
IEEE, vol. 12, 2008.
[11] Y. L. Simmhan, B. Plale, and D. Gannon, “A survey of data provenance in e-science,” SIGMOD Record, vol. 34, no. 3, pp. 31–36, September 2005.
[12] S. Alvarez, J. Vazquez-Salceda, T. Kifor, L. Z. Varga, and S. Willmott. “Applying Provenance in Distributed Organ Transplant
Management,” International Provenance and Annotation Workshop, Chicago, USA, May 2006.
[13] J. Zhao, Y. Simmhan, K. Gomadam, and V. Prasanna, “Querying Provenance Information in Distributed Environments,” International Journal of Computers and Their Applications, vol. 18, no. 3, September 2011.
[14] V. I. Levenshtein, “Binary codes capable of correcting deletions, insertions, and reversals,” Soviet Physics Doklady, vol. 10, no. 8, pp. 707–710, 1966.
[15] SPARQL Query Language for RDF, http://www.w3.org/TR/rdf-sparql-query/
[16] C. Chelmis, J. Zhao, V. Sorathia, V. Prasanna, and S. Agarwal, "Semiautomatic, Semantic Assistance to Manual Curation of Data in
Smart Oil Fields," SPE Western North American Regional Meeting, 2012.
[17] P. Neri, “Data Management Challenges in the Pre-Stack Era,” First Break, 2011, 29, 97–100.
[18] J. B. Bednar, “Modeling, Migration and Velocity Analysis in Simple and Complex Structure,” Panorama Technologies, Inc., 2009.
[19] SURA, “Coastal Ocean Observing and Prediction (SCOOP) - Filename Conventions,” Southeastern Universities Research Association, 2006.
[20] T. Alsos, A. Eide, D. Astratti, S. Pickering, M. Benabentos, N. Dutta, S. Mallick, G. Schultz, L. den Boer, M. Livingstone, M. Nickel,
L. Sonneland, J. Schlaf, P. Schoepfer, M. Sigismondi, J. C. Soldo, and L. K. Stronen, “Seismic Applications Throughout the Life of the
Reservoir,” in Oilfield Review, 2002, 14, 48-65.