
SPE-153272-PP
Recovering Linkage Between Seismic Images and Velocity Models
Jing Zhao, Charalampos Chelmis, Vikram Sorathia, Viktor Prasanna, Abhay Goel, University of Southern
California
Copyright 2012, Society of Petroleum Engineers
This paper was prepared for presentation at the SPE Western North American Regional Meeting held in Bakersfield, California, USA, 19–23 March 2012.
This paper was selected for presentation by an SPE program committee following review of information contained in an abstract submitted by the author(s). Contents of the paper have not been
reviewed by the Society of Petroleum Engineers and are subject to correction by the author(s). The material does not necessarily reflect any position of the Society of Petroleum Engineers, its
officers, or members. Electronic reproduction, distribution, or storage of any part of this paper without the written consent of the Society of Petroleum Engineers is prohibited. Permission to
reproduce in print is restricted to an abstract of not more than 300 words; illustrations may not be copied. The abstract must contain conspicuous acknowledgment of SPE copyright.
Abstract
Seismic processing and interpretation involve resource-intensive computation in the petroleum exploration domain. By
employing various types of models, seismic interpretations are often derived in an iterative refinement process, which may
result in multiple versions of seismic images. Keeping track of the derivation history (a.k.a. provenance) for such images thus
becomes an important issue for data management. Specifically, the information about what velocity model was used to
generate a seismic image is useful evidence for measuring the quality of the image. The information can also be used for
audit trail and image reproduction. However, in practice, existing seismic processing and interpretation systems do not
always automatically capture and maintain this type of provenance information.
In this paper, by employing state-of-the-art techniques in text analytics, semantic processing and machine learning, we
propose an approach that recovers the linkage between seismic images and their ancestral velocity models when no
provenance information is recorded. Our approach first retrieves information from file/directory names of the images and
models, such as project names, processing vendors, and algorithms involved in the seismic processing and interpretation.
Along with the creation timestamps, the retrieved information is associated with corresponding images and models as
metadata. The metadata of a seismic image and its ancestral models usually satisfy certain relationships. In our approach, we
detect and represent such relationships as rules, and a matching process utilizes the rules and retrieved metadata to find the
best-matching images and models.
In practice, images’ and models’ file names often do not adhere to naming standards, and the files are stored without
following well-established record-keeping practices. Users may also use different terms to express the same information in file/directory
names. We employ Semantic Web technologies to address this challenge. We develop domain ontologies with OWL/RDFs,
based on which we provide an interactive way for users to semantically annotate terms contained in file/directory names. All
metadata used by the image-model matching process is represented as ontology instances. Matching can be performed using
the standard semantic query language. The evaluation results show that our approach achieves satisfactory accuracy.
Introduction
The petroleum exploration and production domain employs various scientific methods that involve complex workflows requiring
advanced computational, storage, sensing, networking, and visualization capability [17]. Effective data management becomes
critical with increased regulatory requirements for reporting and standard compliance [19]. Large data volumes in SCADA
systems, data historian systems, hydrocarbon accounting systems, systems of records and other production and operation
systems can be managed with relative ease due to their structured nature and well-documented schema. However, other
specialized systems used for imaging, analysis, optimization, forecasting, and scheduling involve complex scientific
workflows handling data models that are recognized by specific vendor products only. Engineers and geoscientists handle
various kinds of data by subjecting them to complex domain-specific models and algorithms, resulting in a large number of
derived datasets, unstructured or semi-structured, which are stored with little or no metadata describing their derivation
history. Once the analysis is complete and the resulting datasets are transferred to a storage repository, retrieval at later stages
becomes difficult, time consuming, and labor intensive.
Seismic imaging is a scientific domain that is being increasingly employed not only in exploration, but also in other stages of
E&P lifecycle [20]. A typical seismic imaging workflow involves various steps including data collection, data processing,
model building, data interpretation, analysis, rendering and visualization. Seismic image processing and interpretation
involves highly interactive and iterative processes that require loading, storing, referencing, and rendering of large volumes
of datasets [17]. This requires large amounts of computation and storage capability in addition to domain specific software
products and tools capable of handling, manipulating, and visualizing such datasets. Geoscientists skilled in modeling,
characterization, and interpretation employ various techniques and generate large amounts of intermediate datasets during
this process. Across 2D, 2.5D, and 3D surveys, pre-stack and post-stack approaches, and various types of migration algorithms,
many techniques have been proposed for different types of geological structures [18]. In a typical workflow, a velocity model or
earth model is first generated for a specific geological structure, which is then used to interpret the results in seismic volumes.
Typically, this workflow is repeated with some variations in interpretation parameters until best representations are found,
thereby resulting in large amounts of volumes for a given velocity model.
Data generated during this process is generally retained with no or incomplete metadata. Over time, data repositories
receive contributions from large teams of geoscientists working on multiple projects. In the absence of proper metadata and record-keeping
practices, seismic datasets lose the context in which they were created. In order to be useful in decision-making, all
derived volumes must retain the link to the original velocity models [17]. Interpreters therefore have to spend a considerable
amount of effort rediscovering the associated source models. Without formal metadata records, the file names of models may
provide some hints. However, individual interpreters may not have followed consistent file naming standards, and may not
have used consistent terms to express the same semantic meaning. This significantly increases the time and effort required to
find the right velocity model from a repository.
We argue that with careful application of advanced machine learning, semantic web and text analytics techniques, we can
address this problem and achieve significant reduction of the search space. Our approach employs text analytics to extract
key words used by individual interpreters in file names and identify the variations in expressing the same term. By
introducing Semantic Web technologies, we generate an ontology of the file naming convention that contains concepts related
to the seismic interpretation process and their possible expressions. Finally, by introducing machine-learning techniques, we
implement a matching system that enables linkage discovery between images and models. Recovering
linkages in this manner is particularly useful not only for generating metadata, but also for facilitating advanced search
capabilities based on various interpretation techniques and parameters. Establishing the derivation history is also useful in
determining the quality and characteristics of seismic volumes.
Motivating Scenario.
Figure 1 depicts the result of a seismic image interpretation process carried out for BP 2004 Salt Structure Data [18]. Here, a
velocity model is utilized with various interpretation techniques and different parameters. Derived seismic volumes are stored
by interpreters on local disks or shared network folders. While storing these derived volumes using the interpretation system,
interpreters select file names that capture key processing parameters by which the given volume was derived. In this
particular case, the interpreter has generated three volumes using three different interpretation parameters. One-way and two-way
migration techniques were performed on part of the dataset to generate interpreted volume files. Based on the outcome, a
third interpretation was performed using the two-way migration technique on the full dataset. These variations are well captured
in the file names. In addition, the geological structure type and the dataset name, the volume creation time, and the project
name were also captured in the derived volume file names.
This example provides a good understanding of the file naming convention that has been followed by the interpreters. Even
though proper metadata is not generated for all derived seismic volumes, the selection of keywords in file names provides
hints about how a particular volume was derived. Knowing the Dataset, Project Name, and Geological Structure Type, it
becomes easier to establish links among volumes and the model that was used to derive them. In Figure 1, all derived volume
file names indicate “BP_2004”, “projectbp” and “fslt” that can also be found in the model file name, with the exception of
“fslt”, which is expressed as “fullsalt” instead. Clearly, file names not only include information about key parameters, but
interpreters mostly select the same terms. However, the derived volumes include additional parameters capturing more
information about the preprocessing and post processing steps, segmentation, and other image loading parameters that were
used. In the given example, the terms “full”, “part”, “oneway”, “twoway”, and “mig” are very specific to the interpretation process and
therefore are found in the volume file names only.
We argue that it is possible to recover the linkages between volumes and the associated velocity model by harnessing the hints
provided by interpreters in the filenames. With careful observation of the seismic processing and interpretation workflow and the
file naming convention followed in a specific organization, it is possible to establish rules that help detect the linkage
between files. For instance, dataset name, project name, geological structure name, and file creation dates all play key roles in
discovering the match. However, pre-processing or post-processing parameters, file loading parameters etc. can be ignored as
they are specific to volume names only. Designing a system for linkage discovery based on this matching approach
introduces various challenges when implemented for a large number of users. As indicated in the example, different users use
different keywords (e.g. “fslt” and “fullsalt”) for the same term (“Full Salt”). The proposed system must therefore effectively
handle such variations, and must be able to address semantic and syntactic heterogeneity issues in order to accurately
establish lost linkages between velocity models and their derived seismic image volumes.
Figure 1. Example of Seismic Interpretation Process Indicating Generation of Multiple Volumes Using a Velocity Model
System Overview
Figure 2 illustrates the overview of our approach. In general, our approach consists of three steps:
1) Metadata extraction. Given a set of seismic images and velocity models, our approach first identifies and extracts
the information that can be used for recovering the linkage between images and models. We employ a data loading
process to retrieve the file names of images and models and their creation time. The name of a seismic image file or
a velocity model file usually consists of multiple terms separated by “_”, where each term captures information
about project name, processing vendors, and algorithms involved in the seismic processing and interpretation. We
split file names of images and models into individual terms, and clean the terms by utilizing text analysis techniques.
2) Semantic annotation. The information encoded in the file names is the main hint for us to identify the linkage
between images and models. However, as we have discussed, users may use different terms to express the same
information, making it difficult to directly match seismic images and velocity models based on their names alone.
To address this challenge, we design an ontology as a global vocabulary to represent the information that may be
encoded in file names. A user-interactive semantic annotation process, which is the second step of our approach,
utilizes the ontology to annotate the terms extracted from file names. Each term is represented as an ontology
instance that is stored in an ontology repository, and a file name can then be represented by a group of ontology
instances. The group of ontology instances, along with the creation time, is associated with the corresponding
seismic image or velocity model as its attributes.
3) Matching. In the last step, for each seismic image, we identify the velocity models that are likely to have been
used for its creation. We use a set of rules to express the relationships that the attributes of an image and its ancestral
model may have. For example, the creation time of the ancestral velocity model of a given image should be within a
certain time window. According to the rules, we then execute semantic queries and rules on image and model
attributes, to identify the best-matching images and models.
We describe each step in detail in the following sections.
[Figure 2, omitted here, depicts the pipeline: a data-load step retrieves the file/directory names of images/models and their creation time; pre-processing (splitting, cleansing) produces a bag of terms; semantic annotation against the ontology database yields ontology instances; and the matching step combines the ontology instances and creation time to produce the linkage between seismic images and velocity models.]
Figure 2. Overview of the Semantic Based Matching Approach
Metadata Extraction
In our use case, all seismic image files and velocity model files are stored on servers running the Linux operating system. Thus,
we simply use the “ls -l” command to get the standard Linux output containing two fields, i.e., file names and creation time,
separated by spaces. We use the creation time and file names to extract the terms used in the following steps of our
approach.
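As a rough illustration of this data-loading step (the authors' actual scripts are not shown; the function name, directory argument, and the use of the file modification timestamp as a stand-in for creation time are our assumptions), a Python sketch might look like:

import os
from datetime import datetime

def load_file_metadata(root_dir):
    """Collect (file name, timestamp) pairs, mirroring the "ls -l" listing."""
    records = []
    for entry in os.scandir(root_dir):
        if entry.is_file():
            # st_mtime stands in for the creation time reported by "ls -l";
            # standard Linux file systems do not expose a true creation time.
            ts = datetime.fromtimestamp(entry.stat().st_mtime)
            records.append((entry.name, ts))
    return records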
The name of a typical seismic image file or a velocity model file usually consists of multiple terms linked with underscore
“_”. For example, one of the image files is named “fslt_psdm_bp_projectbp_agc_il.bri”. In the file name, each term has its
own semantic meaning, which represents metadata of the corresponding seismic image. In this example, “fslt” means the
“full-salt” model, and “psdm” refers to the “pre-stack depth migration” imaging algorithm. The image file name contains
these two terms to indicate that the seismic processing system for generating the image has employed the corresponding
model and algorithm. Table 1 lists the meaning of all terms contained in the example file name.
Term        Semantic Meaning
fslt        The full-salt model
psdm        The pre-stack depth migration imaging algorithm
bp          The processing vendor name
projectbp   The project name
agc         The image pre-processing step
il          The inline sort method
Table 1. Terms extracted from the example image file name, and the semantic meaning of the terms
For each file name returned by the “ls” command, we split the file name into individual terms. Each image/model file is then
associated with a group of terms. In general, a group of terms usually contains the following information:
1) Project names, for example, “projectbp” in the above example.
2) Processing vendors; for example, a file name may contain terms like “BP”, “Chevron”, and “WesternGeco”.
3) Imaging algorithms, such as the pre-stack depth migration algorithm in the example.
4) Involved models, such as “sediment flood”, “water flood”, and “full-salt”.
5) Version information, for example, “v1” means the file is the first version of the seismic image. Other terms about
the version information may include “v2”, “new”, and “update1”.
6) Post-processing steps, such as “agc”.
SPE SPE-153272-PP
5
7) Sort order, such as “inline” and “crossline”.
8) Image loading parameters, such as “subvolume”, “partial” and “full”.
However, not all the user-supplied terms are useful for linkage recovery. Before we proceed further with the extracted
terms, we need to remove redundant and useless terms. For example, if the term “agc” cannot be used as a hint for predicting
the ancestral velocity models of the seismic image, we do not need to take it into account in the following steps. For the
example file name “fslt_psdm_bp_projectbp_agc_il.bri”, the metadata extraction step will generate the term group
{“fslt”, “psdm”, “bp”, “projectbp”}.
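A minimal sketch of this splitting-and-cleansing step is shown below; the stop list of uninformative terms (“agc”, “il”) is a hypothetical example, not the paper's actual list:

STOP_TERMS = {"agc", "il"}  # assumed list of uninformative terms

def extract_terms(file_name):
    base = file_name.rsplit(".", 1)[0]  # drop the extension (e.g., ".bri")
    return [t for t in base.split("_") if t.lower() not in STOP_TERMS]

print(extract_terms("fslt_psdm_bp_projectbp_agc_il.bri"))
# -> ['fslt', 'psdm', 'bp', 'projectbp']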
Matching based on Terms.
We can match images to models directly, based on terms extracted from file names. Basically, a velocity model may be used
to create a seismic image when we detect the following relationships between them:
1) The gap between their creation times is within a certain period (e.g., around a month).
2) They share the same or similar model/algorithm name, project name, and/or processing vendors.
Based on this detection, we can compare the creation time and the extracted file name terms to identify the best-matching
seismic images and velocity models. For example, we can match the seismic image “sedflood_psdm_bp_v5.il.bri”, which
was created on 03/21/2011, to the velocity model “sedflood_psdm_bp.bln”, which was generated on 02/23/2011, since the
seismic image file was created nearly one month after the velocity model, and they share the same imaging algorithm, model,
and processing vendor.
However, in practice, users do not always strictly follow naming standards when naming images and models. We use the
Levenshtein distance [14] to calculate the lexical similarity between terms, so as to identify terms that represent the same
semantic meaning. For example, the Levenshtein distance between “full-salt” and “fullsalt” is 1; such a small distance lets us
infer that the two terms may refer to the same model. However, in some cases terms expressing the same
meaning may have a relatively large distance. For example, the distance between “fs” and “fullsalt” is 6. Thus lexical similarity alone
cannot accurately capture “semantic distance”.
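For concreteness, the standard dynamic-programming implementation of the Levenshtein distance [14] reproduces the distances quoted above:

def levenshtein(a, b):
    """Minimum number of single-character edits transforming a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

print(levenshtein("full-salt", "fullsalt"))  # 1: likely the same model
print(levenshtein("fs", "fullsalt"))         # 6: lexically far, same meaning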
Semantic Annotation
Annotation is about attaching names, attributes, comments, descriptions, etc. to a document or to a selected part of text. It
provides additional information (metadata) about an existing piece of data. We use an ontology to annotate terms extracted from
file/directory names, so as to solve the problem of heterogeneous file naming conventions and information representation.
The ontology acts as a global vocabulary that represents the semantic meaning of terms, assisting in term disambiguation and
also helping in associating domain concepts and local facts.
Figure 3. Snapshot of the Ontology Used for Annotation
Domain Ontology.
Figure 3 illustrates a snapshot of the ontology that we use for annotation. For illustration purposes, this figure has been
compressed to show only part of the classes and instances contained in the ontology. We use ontology classes such as
“ProjectName”, “VendorName”, “Version”, “SeismicModelingAlgorithmName”, etc. to describe possible information
contained in file names. As indicated by its name, each class captures a particular concept in file/directory names.
Subclasses linked to each class define groups under particular concepts. For example, “Sub_Salt”, “Base_Salt”, “Full_Salt”,
“Multi_Level_Salt”, and “Top_Salt” are all subclasses coming under the “Salt” class, which is in turn a subclass of
“Geobody_Structure”. We can also find terms like “fullsalt”, “fslt”, “flslt”, “fsalt” and “flst” that are linked to the
“Full_Salt” class. In our ontology, the “Unknown” class captures terms that are unknown or do not fit in any of the above
classes. Later on, with the help of domain experts, such terms can either form new classes or become new instances of
existing classes.
Annotation.
During annotation [16], we annotate terms based on the domain ontology, and represent the terms as ontology instances
belonging to corresponding ontology classes. For example, since the term “fslt” is used to represent the model name
“Full_Salt”, the ontology class “Full_Salt” should be used for annotation. As shown in Figure 3, we create an instance “fslt”
that belongs to the class “Full_Salt”. We define the instance as an Attribute of the file name. The whole file name can then be
represented as a Semantic Entity containing a set of such attributes. For example, the file name
“fslt_psdm_bp_projectbp_agc_il.bri” can be represented by a group of attributes {fslt, psdm, bp, projectbp} that belong to
the ontology classes Full_Salt, PreStackDepthMigration, BP, and ProjectName, respectively.
As shown in Figure 4, generated instances are stored in the ontology repository, which communicates with our annotation
application through a SPARQL [15] endpoint. When annotating a group of terms extracted from an image/model file name,
the Automated Annotation process first probes the ontology to determine whether any existing ontology instances match the
terms to be annotated. If such instances are found, they, as well as their corresponding ontology classes, will be
included as the attributes of the image/model.
Figure 4. Annotation approach
Figure 5. Screenshot of the User Assisted Annotation Tool [16]
If terms have not been annotated before, our annotation system marks them as “Unknown”. The user assisted annotation tool
then annotates unknown terms as ontology instances. A domain expert who does not have prior knowledge of Semantic Web
technology can easily update the main ontology using the interface provided by the tool. By utilizing this tool, a domain
expert can either define new ontology classes for the unknown terms or associate them to previously defined classes.
Figure 5 shows a screenshot of the annotation tool. As shown in the figure, “gulfofmaxico”, which is a term extracted from
the file name “Fl_prstk_krchf_vol1_saltbody_insert_gulfofmaxico_vol1_2008.bri”, cannot be initially matched to any
instance in the ontology because of a typo: “maxico”. Thus the annotation tool first annotates the term as “Unknown”. To
define this “unknown” term, the annotation tool allows the user to navigate all relevant ontology classes and select
appropriate classes for annotation. In this example, the user selects the ontology class “Gulf_of_Mexico”, which is a subclass
of “PlaceName”. The term “gulfofmaxico” is then described as an attribute of the file name, and a new instance
“gulfofmaxico” belonging to the class “Gulf_of_Mexico” is added to the ontology with user corroboration.
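The following sketch, using the rdflib Python library, illustrates how such a user-confirmed annotation could be recorded as ontology instances. The namespace URI and the iam:surfaceForm property are our assumptions; iam:hasAttribute follows the paper's own query example (Figure 6):

from rdflib import Graph, Literal, Namespace, RDF

IAM = Namespace("http://example.org/iam#")  # assumed namespace URI

g = Graph()
g.bind("iam", IAM)

# The term "gulfofmaxico" (a user typo for Gulf of Mexico) becomes a new
# instance of the class iam:Gulf_of_Mexico after expert confirmation.
term = IAM["gulfofmaxico"]
g.add((term, RDF.type, IAM["Gulf_of_Mexico"]))
g.add((term, IAM["surfaceForm"], Literal("gulfofmaxico")))  # assumed property

# The file name's semantic entity then carries the instance as an attribute,
# via the iam:hasAttribute property used in Figure 6.
entity = IAM["Fl_prstk_krchf_vol1_saltbody_insert_gulfofmaxico_vol1_2008"]
g.add((entity, IAM["hasAttribute"], term))

print(g.serialize(format="turtle"))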
Semantic Based Matching
Every image/model file name can now be represented as a semantic entity containing a set of semantic attributes, each
expressing the semantic meaning of a term contained in the file name. Recall that we also capture the creation time of the file
in the metadata extraction step. Time information is also represented as a semantic attribute.
In the matching step, we speculate that a velocity model was used to create a seismic image if we find that their annotated
semantic entities match each other according to certain rules. In particular, we have developed two types of semantic-based
matching approaches. Both approaches utilize semantic technologies such as SPARQL [15] to express the matching rules and
apply the rules to semantic entities. We now introduce the two approaches in the following sections.
Approach 1: Exact Pattern Matching.
In our first approach, we first allow users to define a set of rules, each of which specifies a condition that the file name of the
matched velocity models must satisfy. We then compose a SPARQL query based on matching rules, and execute the query to
search among all semantic entities of velocity models. The results of the query indicate potential velocity models which
might have been used to generate the given seismic image.
Specifically, a matching rule can be used to specify
1) What terms MUST be included in the file name of the model;
2) What terms MAY be included in the file name of the model;
3) What terms MUST NOT be included in the file name of the model;
4) The range of creation time.
Since all file names have been represented as semantic entities with semantic attributes, when defining a matching rule, users
do not need to consider all possible term variations. Instead, they can directly use our domain ontology to form their
matching rules.
Based on the annotation semantic entity of a given seismic image, a user can directly specify what ontology classes should be
included in the semantic entity of the velocity model. For example, when the user finds that the semantic entity of the image
contains an attribute “cvx”, and she thinks that the name of the matching velocity model should have the same processing
vendor name, she can directly define a rule restricting the results to velocity models containing attributes belonging to the
ontology class “Chevron”.
Users can further define a set of rules with “if-else” structure in advance. “if-else” rules express the exclusion and/or
inclusion over terms in image and model file names. For example, a user can define a rule of the form “if the semantic entity
of the image contains an attribute belonging to ProcessingVendor P, the semantic entity of the model should also contain an
attribute belonging to P”, where P can be seen as a parameter of the rule. Later, when the system does the matching for an
image file whose semantic entity has an attribute “cvx”, our matching system captures “Chevron” as the argument for the
“if-else” rule, and automatically generates a matching rule that the semantic entity of the velocity model must contain an
attribute belonging to the same ontology class “Chevron”.
Based on the creation time of the seismic image, users can define rules to specify the creation time range for the velocity
model, e.g., 3~5 weeks before the creation time of the image. In our system, we provide a simple GUI for users to define
different types of matching rules, where users can select the corresponding ontology classes that are involved in the matching
rules based on our domain ontology, and also specify operators such as “=”, “>”, “<”, and “≠”.
SELECT ?velocityModel WHERE {
?velocityModel iam:hasAttribute ?vendorAttribute .
?vendorAttribute rdf:type iam:Chevron .
?velocityModel iam:hasAttribute ?geoAttribute .
?geoAttribute rdf:type iam:Full_Salt .
?velocityModel iam:hasAttribute ?creationTime .
FILTER ( ?creationTime > "2010-01-01T00:00:00Z"^^xsd:dateTime )
}
Figure 6. SPARQL Query Example
All matching rules are then integrated together to compose a SPARQL query. Figure 6 illustrates a SPARQL query example
that contains three matching rules:
1) The model name should have an attribute that is an instance of the ontology class “Chevron”.
2) The model name should have an attribute that is an instance of the class “Full_Salt”.
3) The creation time of the model should be after 00:00:00 of 01/01/2010.
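As a hedged sketch of how such rules might be assembled into a query of the form shown in Figure 6 (the paper does not describe its query-construction code; this string-templating approach is our illustration):

def build_query(required_classes, min_creation_time=None):
    """Compose a SPARQL query from required ontology classes and an
    optional lower bound on the model's creation time."""
    lines = ["SELECT ?velocityModel WHERE {"]
    for i, cls in enumerate(required_classes):
        lines.append(f"  ?velocityModel iam:hasAttribute ?a{i} .")
        lines.append(f"  ?a{i} rdf:type iam:{cls} .")
    if min_creation_time is not None:
        lines.append("  ?velocityModel iam:hasAttribute ?creationTime .")
        lines.append(f'  FILTER ( ?creationTime > "{min_creation_time}"^^xsd:dateTime )')
    lines.append("}")
    return "\n".join(lines)

# Reproduces the three rules of Figure 6.
print(build_query(["Chevron", "Full_Salt"], "2010-01-01T00:00:00Z"))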
Approach 2: Matching Score.
The above approach searches for velocity models whose file names satisfy all user-specified matching rules. However, in
practice, images and models are often not named in strict accordance with the naming standards, so the completeness of the
information cannot be guaranteed. Some important information referenced by matching rules is often missing from model names. For
example, a model name may omit the term indicating its processing vendor. As a result, the correct matching
model(s) cannot be identified by the SPARQL query.
We develop a new approach to overcome this shortcoming of the exact pattern matching approach. As before, users define a set of
matching rules. Instead of composing a SPARQL query containing all matching rules, we assign a score to each matching
rule. Each velocity model is associated with a matching score, which is initialized to 0. We go through all
rules and add the corresponding score to a model’s matching score whenever its file name satisfies a matching rule. We then sort
velocity models by their matching scores and return the top n models as our matching result, where n is a number
specified by the user.
For example, suppose users define the three rules contained in the SPARQL query in Figure 6. The scores for Rule 1), 2) and
3) are 0.3, 0.5, and 0.2, respectively. Then if a velocity model, noted as m1, satisfies Rule 1) and 2), m1 gets a matching score
0.3+0.5=0.8. Similarly a velocity model m2 only satisfying Rule 3) gets a matching score 0.2, and a velocity model m3
satisfying Rule 2) and 3) gets a matching score 0.7. Thus m1 and m3 will be returned as matching results if a user wants the
top 2 matching models for further selection (i.e., n is equal to 2).
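The sketch below reproduces this scoring scheme on the example above; encoding rules as (predicate, score) pairs over a model's attribute set is our illustration, and the "Recent" attribute stands in for the creation-time rule:

def rank_models(models, rules, n=2):
    """models: {name: set of ontology classes}; rules: [(test_fn, score)].
    Returns the top-n (score, name) pairs."""
    scored = []
    for name, attrs in models.items():
        total = sum(score for test, score in rules if test(attrs))
        scored.append((total, name))
    scored.sort(reverse=True)
    return scored[:n]

models = {"m1": {"Chevron", "Full_Salt"},
          "m2": {"Recent"},
          "m3": {"Full_Salt", "Recent"}}
rules = [(lambda a: "Chevron" in a, 0.3),
         (lambda a: "Full_Salt" in a, 0.5),
         (lambda a: "Recent" in a, 0.2)]  # stand-in for the time rule

print(rank_models(models, rules))  # [(0.8, 'm1'), (0.7, 'm3')]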
The score of each matching rule can be considered a “weight”. Intuitively, a matching rule with more importance should
have a higher score. Matching scores can be assigned by users according to domain knowledge, or can be “learned” using
machine learning algorithms.
Evaluation
We evaluate our approach in this section. In our experiment, we first collect a set of seismic images and their corresponding
velocity models, and record the correct matching between them as “ground truth”. We then use our approach to find the
matching model for each seismic image among the set of velocity models, and compare our results with the “ground truth” so
as to measure the precision. We utilize the matching score approach in the matching step.
Experiment Setup and Dataset Generation.
One challenge in our experiment is that although we can collect a relatively large set of images and models, it is still difficult
to identify the “ground truth”, since most of the correct linkages between images and models have not been recorded. To
overcome this challenge, we designed an algorithm to generate synthetic “ground truth” based on the set of velocity models
we have collected. Starting from around 500 velocity models collected from real applications, our algorithm mimics the
naming procedure followed by domain users who utilize velocity models to generate seismic images. Our matching
approach is then applied to find a matching model for each generated image. The links between velocity models and seismic
images that are generated by our algorithm are used as the synthetic “ground truth”.
As shown in Algorithm 1, for each seismic image, we randomly generate its creation time. Based on our observations of real
data, the creation time of an image should be within 3~5 weeks after the model’s creation. But we also allow exceptions: the
creation time of an image may fall outside the time range with a small probability PT. After generating the creation time of the
image, we represent the file name of the velocity model as a semantic entity. Our algorithm acts as a domain engineer, and
determines what information should be included in the image file name based on a set of rules. For each piece of information,
we randomly choose one of its possible representation terms according to our ontology. We compose the image file name by
connecting all such terms (a term may be missing with a probability PM). We also generate some redundant information in the
file name with probability PR.
Input: 1) the file name of a velocity model; 2) a set of image generation rules; 3) probabilities PT, PM, and PR
Output: the synthetic file name of a seismic image generated by the input velocity model
1. Identify the creation time TV of the velocity model;
2. Randomly generate a datetime TI within the range [TV+21 days, TV+35 days]; with probability PT, TI is generated without
this range constraint. Use TI as the creation time of the seismic image.
3. Extract useful terms from the velocity model file name.
4. Annotate extracted terms by using our domain ontology, and generate the annotation semantic entity.
5. Based on the image generation rules, identify what information should be included in the seismic image file name.
Use a set of ontology classes (noted as {C}) to represent the necessary information.
6. For each ontology class in {C}, randomly pick one of its instances as its term representation in the image file name.
7. Compose the file name for the seismic image by connecting all the generated terms with “_”. Each term has a
probability PM to be missing, and a probability PR to have a redundant copy in the file name (maybe with a different
instance belonging to the same ontology class).
Algorithm 1. Algorithm for Generating the Synthetic Image File Names
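A condensed Python sketch of the randomized steps of Algorithm 1 follows; the surface-form dictionary stands in for the ontology lookup, and all names and default probabilities are illustrative:

import random
from datetime import datetime, timedelta

SURFACE_FORMS = {  # assumed per-class term representations from the ontology
    "Full_Salt": ["fslt", "fullsalt", "fsalt"],
    "PreStackDepthMigration": ["psdm"],
    "BP": ["bp"],
}

def synth_image_name(model_classes, model_time, p_t=0.05, p_m=0.1, p_r=0.1):
    # Step 2: with probability p_t the image time is unconstrained;
    # otherwise it falls 21-35 days after the model's creation time.
    if random.random() < p_t:
        image_time = model_time + timedelta(days=random.uniform(0, 365))
    else:
        image_time = model_time + timedelta(days=random.uniform(21, 35))
    # Steps 6-7: pick one surface form per ontology class; each term may be
    # dropped with probability p_m or duplicated with probability p_r.
    terms = []
    for cls in model_classes:
        if random.random() < p_m:
            continue  # term missing
        terms.append(random.choice(SURFACE_FORMS[cls]))
        if random.random() < p_r:
            terms.append(random.choice(SURFACE_FORMS[cls]))  # redundant copy
    return "_".join(terms) + ".bri", image_time

print(synth_image_name(["Full_Salt", "PreStackDepthMigration", "BP"],
                       datetime(2011, 2, 23)))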
We run our experiment on a desktop with a 3.06 GHz Intel Core i3 CPU and 4 GB of memory.
Evaluation Results.
We measure the precision of our approach when it generates only one matching model for each image file. This
precision, denoted P1, measures the probability that our best matching result is the correct matching model:

$$P_1 = \frac{\text{Number of correct matchings}}{\text{Total number of seismic images}}$$
We also measure the probability that the correct matching model is covered when our approach generates multiple
matching results. In general, if we use $n \geq 1$ to denote the number of matching results returned for each seismic image, we have:

$$P_n = \frac{\text{Number of times the correct matching model is covered}}{\text{Total number of seismic images}}$$
Figure 7. Matching precision as a function of number of returned matching results
Figure 7 shows the precision Pn for n = 1, 2, …, 8. We see that we achieve a precision of less than 60% when n = 1.
However, the precision improves greatly when multiple matching results are provided. When n=8, we can achieve 100%
precision. This means that in order to recover lost links between images and models based solely on their file names, about 8
candidate models suffice for the correct matching model to be covered (i.e. the correct matching model to be in the retrieved
results). Hence, we conjecture that our approach can effectively identify a small set of matching candidate models, thus
drastically reducing the search space of candidate models for users to manually examine in order to select the right one.
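For reference, the Pn metric can be computed from ranked matching results with a small helper such as the following (an illustrative sketch, not the authors' evaluation harness):

def precision_at_n(results, ground_truth, n):
    """results: {image: ranked list of candidate models};
    ground_truth: {image: correct model}. Returns P_n."""
    hits = sum(1 for img, candidates in results.items()
               if ground_truth[img] in candidates[:n])
    return hits / len(results)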
Related Work
In this work, we use Semantic Web technologies to address the challenge posed by the heterogeneous expression of
information contained in image/model file names. Our matching scheme is then based on the semantic representations of file
names. Semantic Web technologies have been used in the oil and gas industry to address problems such as information
integration and knowledge management [1][2][3][4]. For example, in [1], the POSC Caesar Association proposed the Oil and
Gas Ontology (OGO), an industry-wide ontology intended to provide standard means for data integration within and
across business domains. In [3], Norheim and Fjellheim introduced the AKSIO system, a knowledge management
system for the petroleum industry. In addition, different use cases of Semantic Web
technologies in the oil and gas industry are discussed in [2]. In our earlier work, we used Semantic Web technologies to address
problems in the reservoir management and real-time oil field operations setting [4]. Issues about how to develop ontologies,
how to build the knowledge database, and examples of applications for information integration and knowledge management
are discussed in [4].
The linkage between the seismic images and the velocity models that were used for their creation can be seen as a type of
provenance information. Provenance information explains the derivation history of a data object and thus can be used for
audit trails; it is also critical for data quality control. Applications and technologies for provenance collection, management, and
access have been widely discussed in domains such as e-Science and health care [5][6][7][8][9][10][11][12][13].
Conclusion
The linkage between seismic images and velocity models that were used for their creation is an important type of provenance
information required for further analysis and data quality control. In this paper, we proposed an approach to recover the
missing linkage between seismic images and velocity models. Our approach extracts information contained in image/model
file names, and utilizes Semantic Web technologies to annotate and represent extracted information. Based on user-specified
rules, we designed algorithms to identify the matching between images and models. For future work, we will explore other
possible information types that may be utilized to improve our prediction precision. We will also address the scalability of
our approach so that we can effectively handle large datasets.
Acknowledgement
This work is supported by Chevron Corp. under the joint project, Center for Interactive Smart Oilfield Technologies (CiSoft),
at the University of Southern California.
References
[1] POSC Caesar Association, http://www.posccaesar.com/
[2] F. Chum, “Use Case: Ontology-Driven Information Integration and Delivery: A Survey of Semantic Web Technology in the Oil and Gas Industry,” April 2007. http://www.w3.org/2001/sw/sweo/public/UseCases/Chevron/
[3] D. Norheim and R. Fjellheim, “AKSIO - Active Knowledge management in the petroleum industry,” in 3rd European Semantic Web
Conference (Industry Forum), June 2006
[4] R. Soma, A. Bakshi, V. Prasanna, W. Da Sie, and B. Bourgeois, “Semantic-web technologies for Oil-field Management,” SPE Intelligent Energy Conference and Exhibition, April 2008.
[5] C. Pancerella et al., “Metadata in the collaboratory for multi-scale chemical science,” Dublin Core Conference, 2003.
[6] I. Foster, J. S. Vockler, M. Wilde, and Y. Zhao, “Chimera: A virtual data system for representing, querying, and automating data
derivation,” Scientific and Statistical Database Management Conference (SSDBM), 2002.
[7] J. Frew and R. Bose, “Earth system science workbench: A data management infrastructure for earth science products,” Scientific and
Statistical Database Management Conference (SSDBM), 2001.
[8] J. Zhao, C. Goble, M. Greenwood, C. Wroe, and R. Stevens, “Annotating, linking and browsing provenance logs for e-science,”
International Semantic Web Conference (ISWC) Workshop on Retrieval of Scientific Data, 2003.
[9] J. Zhao, C. Wroe, C. Goble, R. Stevens, S. Bechhofer, D. Quan, and M. Greenwood, “Using semantic web technologies for
representing eScience provenance,” International Semantic Web Conference (ISWC), 2004.
[10] S. Sahoo, A. Sheth, and C. Henson, “Semantic provenance for e-Science: Managing the deluge of scientific data,” Internet Computing,
IEEE, vol. 12, 2008.
[11] Y. L. Simmhan, B. Plale, and D. Gannon, “A survey of data provenance in e-science,” SIGMOD Record, vol. 34, no. 3, pp. 31–36, September 2005.
[12] S. Alvarez, J. Vazquez-Salceda, T. Kifor, L. Z. Varga, and S. Willmott. “Applying Provenance in Distributed Organ Transplant
Management,” International Provenance and Annotation Workshop, Chicago, USA, May 2006.
[13] J. Zhao, Y. Simmhan, K. Gomadam, and V. Prasanna, “Querying Provenance Information in Distributed Environments,” International Journal of Computers and Their Applications, vol. 18, no. 3, September 2011.
[14] V. I. Levenshtein, “Binary codes capable of correcting deletions, insertions, and reversals,” Soviet Physics Doklady, vol. 10, no. 8, pp. 707–710, 1966.
[15] SPARQL Query Language for RDF, http://www.w3.org/TR/rdf-sparql-query/
[16] C. Chelmis, J. Zhao, V. Sorathia, V. Prasanna, and S. Agarwal, "Semiautomatic, Semantic Assistance to Manual Curation of Data in
Smart Oil Fields," SPE Western North American Regional Meeting, 2012.
[17] P. Neri, “Data Management Challenges in the Pre-Stack Era,” First Break, 2011, 29, 97–100.
[18] J. B. Bednar, “Modeling, Migration and Velocity Analysis in Simple and Complex Structure,” Panorama Technologies, Inc., 2009.
[19] SURA, “Coastal Ocean Observing and Prediction (SCOOP) - Filename Conventions,” Southeastern Universities Research Association, 2006.
[20] T. Alsos, A. Eide, D. Astratti, S. Pickering, M. Benabentos, N. Dutta, S. Mallick, G. Schultz, L. den Boer, M. Livingstone, M. Nickel,
L. Sonneland, J. Schlaf, P. Schoepfer, M. Sigismondi, J. C. Soldo, and L. K. Stronen, “Seismic Applications Throughout the Life of the
Reservoir,” in Oilfield Review, 2002, 14, 48-65.