Data Intensive Approach to
Semantics
a. Project Summary
- Semantic Technology, whose objective is to enable computer systems to provide novel intelligent services
or to enhance existing services by using the meaning of data, is regarded as an important research topic;
however, many obstacles remain to using it in real-world applications. The goal of this proposal is to
present a general, theoretical methodology for extracting useful implicit semantics from the relational
model and realizing semantically enabled intelligent applications.
b. Background
- Semantics is about shared meanings, associations, and rules for the use of things; utilizing semantics
means making computer systems able to work with the meaning of data. If computer systems can process the
meaning of data, they can provide intelligent and enhanced services more efficiently and effectively.
- So far, many semantic technology projects have focused on high-level reasoning, artificial intelligence,
natural language processing, and mathematical propositions. These approaches lack the performance and
scalability needed for real-world enterprise applications, which require massive, intensive data
processing. On the other hand, some cases have successfully enabled intelligent services by extracting
hidden implicit semantics through data-intensive analysis and empirical knowledge.
- The PageRank algorithm, one of the reasons for the great success of Google, calculates the importance of
web documents from the back-links that point to them. In other words, it extends the implicit meaning of
links among web documents into the importance of the documents. Collaborative Filtering, which is used at
Amazon.com, is another successful case of using the implicit semantics of data. It is a technology that
predicts a user’s preference based on the previous transaction logs of that user and of other similar
users. With Collaborative Filtering, the system does not have to ‘understand’ the user’s preference, yet
it can still make effective recommendations to users.
- Another good example is the Korean Government Public Procurement Ontology System. The previous system
was a typical product search system built on the relational model, which does not consider the semantics
of data. We analyzed not only the schema of the data but also the instance-level data, and we implemented
a practical ontology search system that operates on large real-world data, balancing inferential power,
performance, and scalability.
- The successful cases described above share a common factor: they extract and utilize semantics through
data-centric analysis. We need a generalized theoretical methodology for extracting, managing, and
querying the implicit and explicit semantics embedded in the relational model; it will become a key to
practical semantically enabled applications in the real world.
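To make the PageRank case above concrete, the following is a minimal sketch of how importance can be derived purely from link structure. It is not Google's implementation: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the tiny example graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate page importance from link structure alone.

    `links` maps each page to the list of pages it links to.
    The damping factor and iteration count are illustrative defaults.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution of importance.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its importance on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: "a" and "b" both link to "c", and "c" links to "a".
example_links = {"a": ["c"], "b": ["c"], "c": ["a"]}
example_rank = pagerank(example_links)
```

In this toy graph, page "c" has two back-links and ends up ranked highest, while "b" has none and ends up lowest. The system never inspects the content of a page; the ranking comes entirely from the implicit meaning of the links.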
c. Research Objectives
- Most enterprise data in the real world are stored in the “tables” of the relational model without
consideration for the utilization of their semantics. Uncovering the hidden semantics in these data and
enriching the data with them can enable more intelligent and meaningful services. The goal of this
proposal is to define a general methodology to extract, manage, and query the semantics within relational
data so that they can be put to practical use. The top considerations for this research are as follows:
(1) The analysis should focus on correct data utilization based on actual data analysis and refinement
methods, discarding unusable elements such as complex data representation techniques. (2) The vocabulary
used in the system should be formal and controlled. (3) The concept hierarchies of fundamental entities
should not be too deep and should cover the essentials, including concrete entity realization. (4) Complex
rules should be decomposed into simpler features using reliable methods such as normalization or
statistical analysis. (5) It is important to guarantee performance and scalability through continuous
database tuning, indexing, and simple inference procedures.
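As a rough illustration of the objectives above, the sketch below extracts semantics that are implicit in foreign keys of a relational schema and answers a query with a simple inference step. It is not the proposed methodology itself: the tables `category` and `product`, their rows, the predicate labels `subClassOf` and `instanceOf`, and the helper names are all hypothetical, and an in-memory SQLite database stands in for an enterprise database.

```python
import sqlite3


def build_demo_db():
    """Create a small in-memory database with hypothetical procurement data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE category (
            id INTEGER PRIMARY KEY,
            name TEXT,
            parent_id INTEGER REFERENCES category(id)
        );
        CREATE TABLE product (
            id INTEGER PRIMARY KEY,
            name TEXT,
            category_id INTEGER REFERENCES category(id)
        );
        INSERT INTO category VALUES
            (1, 'OfficeSupply', NULL), (2, 'Paper', 1), (3, 'CopyPaper', 2);
        INSERT INTO product VALUES
            (10, 'A4 copy paper 80g', 3), (11, 'Kraft paper roll', 2);
    """)
    return conn


def extract_triples(conn):
    """Turn foreign-key relationships into (subject, predicate, object) triples."""
    triples = set()
    # The self-referencing foreign key implies a shallow concept hierarchy.
    for child, parent in conn.execute(
        "SELECT c.name, p.name FROM category c "
        "JOIN category p ON c.parent_id = p.id"
    ):
        triples.add((child, "subClassOf", parent))
    # The product-to-category foreign key implies class membership.
    for product, category in conn.execute(
        "SELECT pr.name, c.name FROM product pr "
        "JOIN category c ON pr.category_id = c.id"
    ):
        triples.add((product, "instanceOf", category))
    return triples


def instances_of(triples, concept):
    """Return all instances of a concept, including those of its sub-concepts."""
    # Simple inference: collect the concept and everything below it.
    concepts = {concept}
    changed = True
    while changed:
        changed = False
        for subject, predicate, obj in triples:
            if predicate == "subClassOf" and obj in concepts and subject not in concepts:
                concepts.add(subject)
                changed = True
    return sorted(
        subject
        for subject, predicate, obj in triples
        if predicate == "instanceOf" and obj in concepts
    )


demo_triples = extract_triples(build_demo_db())
paper_products = instances_of(demo_triples, "Paper")
```

A search for "Paper" here also returns the product filed under "CopyPaper", which a plain equality match on `category_id` would miss. The vocabulary is limited to two controlled predicates and the inference is a single transitive walk, in keeping with considerations (2), (3), and (5).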
d. Expected Impact of Research
- We expect the proposed research to have far-reaching implications. (1) A company that has a large-scale
database in a practical enterprise environment can enrich its database semantically, which lets it offer
a variety of intelligent services in a flexible and fast way. (2) By defining a methodology for extracting
implicit semantic information in the area of traditional database design, this research will act as a
bridge between the paradigm of traditional database design and the new paradigm of ontology design, such
as RDF and OWL. In other words, it enables a smooth transition from the traditional database world to a
more intelligent semantic world.