Graph Databases: Efficient Storage and Rapid Retrieval
Robert Levinson

Machine Intelligence Laboratory
University of California
Santa Cruz
THE CG MARS LANDER
High-level architecture
[Figure: architecture diagram. Components: English Discourse; English Queries; CG Creator/Translator with Type Hierarchy; CG Parser & Processor; English-CG-English Translation; Query Processor & Matcher; ADB Processor; ADB; English Translator, Source Reference, & GUI. Answer: more specific CGs in DB.]
Santa Cruz:
The CG Mars Lander
[Figure labels: English documents, queries, CGs, replies, TH]
SUBGRAPH ISOMORPHISM: NP-COMPLETE
• 2 main methods:
  A. Backtracking search
  B. Refinement, O(n^2) on average
  (Both exploit candidate binding lists, modulo the type hierarchy.)
• Key idea: amortize cost over
  » millions of operations
  » mega-graph storage
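As a rough illustration of method A, the sketch below runs a backtracking search over candidate binding lists, where a pattern node may bind to any target node whose type is the same as, or a subtype of, its own. The graph encoding (node-to-type dictionaries plus relation triples), the single-parent `parent_of` hierarchy, and all function names are assumptions made for this example, and relation labels are matched exactly. It is not the system's actual matcher.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, str]  # (source node, relation label, destination node)


def is_subtype(t: str, ancestor: str, parent_of: Dict[str, str]) -> bool:
    """True if type t equals ancestor or reaches it via parent links (assumed acyclic)."""
    while t != ancestor:
        if t not in parent_of:
            return False
        t = parent_of[t]
    return True


def find_embedding(p_types: Dict[str, str], p_edges: List[Edge],
                   t_types: Dict[str, str], t_edges: List[Edge],
                   parent_of: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Return an injective pattern-node -> target-node binding, or None."""
    t_edge_set = set(t_edges)
    # Candidate binding lists: target nodes whose type specializes the pattern type.
    candidates = {p: [t for t, tt in t_types.items() if is_subtype(tt, pt, parent_of)]
                  for p, pt in p_types.items()}
    # Bind the most constrained pattern nodes first.
    order = sorted(p_types, key=lambda p: len(candidates[p]))
    binding: Dict[str, str] = {}

    def consistent() -> bool:
        # Every pattern edge with both ends bound must exist in the target.
        return all((binding[s], r, binding[d]) in t_edge_set
                   for s, r, d in p_edges if s in binding and d in binding)

    def extend(i: int) -> bool:
        if i == len(order):
            return True
        p = order[i]
        for t in candidates[p]:
            if t in binding.values():  # keep the mapping injective
                continue
            binding[p] = t
            if consistent() and extend(i + 1):
                return True
            del binding[p]  # backtrack
        return False

    return dict(binding) if extend(0) else None


# Hypothetical data: "cat" and "dog" are subtypes of "animal".
parent_of = {"cat": "animal", "dog": "animal"}
pattern_types = {"a": "animal", "b": "dog"}
target_types = {"n1": "cat", "n2": "dog", "n3": "mouse"}
target_edges = [("n1", "chase", "n2")]

match = find_embedding(pattern_types, [("a", "chase", "b")],
                       target_types, target_edges, parent_of)
no_match = find_embedding(pattern_types, [("b", "chase", "a")],
                          target_types, target_edges, parent_of)
```

Here `match` binds `a` to `n1` and `b` to `n2`, while `no_match` is `None` because the target has no edge in the reversed direction.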
Exploit Symmetry!
“Invariant with respect to transformation.”
“Shared information between objects or systems or their representations.”
AB + AC = A(B + C)
Symmetry Synonyms
• similarity
• commonality
• structure
• mutual information
• relationship
• redundancy
Total Information = Diversity + Symmetry
• Diversity corresponds to Comp Sci “Complexity” = resources required.
• Diversity can often only be resolved with combinatorial search.
Conceptual Graph Processing
• Concept Types
  “a cat is an animal”
• Relation Types or Graph Type
  “mother-of” is “parent-of”
• Transitivity of Projection (subgraph isomorphism)
• Redundant Substructures
• Redundant Literals
• Redundant Pointers
6 Retrieval Methods:
• Method I: Flat Ordering
• Method II: 2 Levels: Indexes, Graphs
• Method III: Full Partial Order Hierarchy
• Method IV: Multi-Level Hierarchical Retrieval
• Method V: Remember Node Bindings
• Method VI: UDS: The Universal Data Structure
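To give a feel for why a partial-order hierarchy (Method III) can beat a flat scan (Method I), the sketch below stores items in a "more general than" hierarchy and tests a stored item against the query only when all of its immediate generalizations have already matched. Everything here is a simplifying assumption: sets of labels stand in for graphs, the subset test stands in for the expensive subgraph-isomorphism test, and the function names are invented for the example. It only finds the stored generalizations of a query and is not claimed to be the slides' exact algorithm.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Graph = FrozenSet[str]  # stand-in for a graph: a set of labels


def is_generalization(g: Graph, h: Graph) -> bool:
    """Stand-in for the subgraph test: g is at least as general as h."""
    return g <= h


def build_hierarchy(graphs: List[Graph]) -> Dict[int, Set[int]]:
    """Return parents[i]: indices of the immediate generalizations of graphs[i].

    Assumes all stored graphs are distinct.
    """
    n = len(graphs)
    above = {i: {j for j in range(n) if j != i and graphs[j] < graphs[i]}
             for i in range(n)}
    # j is an immediate parent of i if no other ancestor k of i lies below j.
    return {i: {j for j in above[i] if not any(j in above[k] for k in above[i])}
            for i in range(n)}


def generalizations_of(query: Graph, graphs: List[Graph],
                       parents: Dict[int, Set[int]]) -> Tuple[Set[int], int]:
    """Top-down search; returns (matching indices, number of tests performed)."""
    # Strict generalizations are smaller, so sorting by size visits parents first.
    order = sorted(range(len(graphs)), key=lambda i: len(graphs[i]))
    matched: Set[int] = set()
    tests = 0
    for i in order:
        # If some parent failed, graphs[i] cannot match: skip the test entirely.
        if parents[i] <= matched:
            tests += 1
            if is_generalization(graphs[i], query):
                matched.add(i)
    return matched, tests


graphs = [frozenset({"a"}), frozenset({"b"}), frozenset({"a", "b"}),
          frozenset({"a", "c"}), frozenset({"a", "b", "c"}), frozenset({"b", "d"})]
parents = build_hierarchy(graphs)

hits_1, tests_1 = generalizations_of(frozenset({"a", "b", "e"}), graphs, parents)
hits_2, tests_2 = generalizations_of(frozenset({"c", "d"}), graphs, parents)
```

With six stored items, the first query needs 5 tests and the second only 2, where a flat scan would always need 6. The saving grows when failures near the top of the hierarchy cut off large families of more specific items.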
Exploit Tuple-Based Linear CGs!
(A conceptual graph syntax that supports rapid retrieval and question answering.)
@CG000: {
    AGNT (government, BE)
}.
@CG001: {
    AGNT (Hungarian_American_Enterprise_Fund, invest),
    OBJ (invest, Dollars | 1000000),
    IN (Dollars | 1000000, first_business)
}.
@CG002: {
    AGNT (@CG000, manage),
    OBJ (manage, @CG001)
}.
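One reason a tuple-based linear form is convenient is that it can be read into plain relation triples with very little machinery. The sketch below parses the three example graphs above into `(relation, argument 1, argument 2)` tuples. The grammar is inferred only from these examples (a graph name starting with `@`, a brace-delimited body closed by `}.`, and binary relations), so the regular expressions and the function name `parse_linear_cgs` are assumptions rather than the system's actual parser.

```python
import re
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (relation, first argument, second argument)

# "@NAME : { ... }." with the body captured lazily up to the closing "}."
GRAPH_RE = re.compile(r"(@\w+)\s*:\s*\{(.*?)\}\s*\.", re.DOTALL)
# "REL ( arg1 , arg2 )" where arguments contain no commas or parentheses
TUPLE_RE = re.compile(r"(\w+)\s*\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)")


def parse_linear_cgs(text: str) -> Dict[str, List[Triple]]:
    """Map each graph name to its list of relation triples."""
    graphs: Dict[str, List[Triple]] = {}
    for name, body in GRAPH_RE.findall(text):
        graphs[name] = [
            # Collapse internal whitespace so line breaks do not matter.
            (rel, " ".join(a.split()), " ".join(b.split()))
            for rel, a, b in TUPLE_RE.findall(body)
        ]
    return graphs


SAMPLE = """
@CG000: { AGNT (government, BE) }.
@CG001: {
    AGNT (Hungarian_American_Enterprise_Fund, invest),
    OBJ (invest, Dollars | 1000000),
    IN (Dollars | 1000000, first_business)
}.
@CG002: { AGNT (@CG000, manage), OBJ (manage, @CG001) }.
"""

graphs = parse_linear_cgs(SAMPLE)
```

Note that `@CG002` refers to the other two graphs by name, so nested graphs stay as cheap references instead of being copied.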

A query:
/* Q2: Does anybody own the rag newspaper New York Post? */
Query::@bob_202 : {
    ISA ( New_York_Post , newspaper [ n34861 ] ) ,
    CHRC ( newspaper [ n34861 ] , rag [ n9 ] ) ,
    AGNT ( own [ v9125 ] , ????? ) ,
}.
Answer:
/* A2: Rupert Murdoch once owned the troubled tabloid newspaper New York Post. */
@CG1684_3 : {
    ISA ( New_York_Post , newspaper [ n34861 ] ) ,
    CHRC ( newspaper [ n34861 ] , tabloid [ n27111 ] ) ,
    CHRC ( newspaper [ n34861 ] , trouble [ n25320 ] ) ,
    AGNT ( own [ v9125 ] , Rupert Murdoch) ,
    CHRC ( own [ v9125 ] , once )
}.
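The query and answer above can be read as a small matching problem: every query tuple must be covered by a tuple in the stored graph that is equal or more specific, and the `?????` slot is filled by whatever the stored graph holds there. The sketch below works through that reading. It is a simplification that checks tuples one at a time and does not enforce consistent bindings across tuples. The hierarchy entry that makes `tabloid` a specialization of `rag` is an assumption introduced to explain the slide's answer, not a fact taken from the actual ontology, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
WILDCARD = "?????"


def specializes(specific: str, general: str, parent_of: Dict[str, str]) -> bool:
    """True if `specific` equals `general` or reaches it via parent links."""
    while specific != general:
        if specific not in parent_of:
            return False
        specific = parent_of[specific]
    return True


def arg_matches(q: str, a: str, parent_of: Dict[str, str]) -> bool:
    """A query argument matches if it is the wildcard or is specialized by `a`."""
    return q == WILDCARD or specializes(a, q, parent_of)


def answer_query(query: List[Triple], graph: List[Triple],
                 parent_of: Dict[str, str]) -> Optional[Dict[Triple, Triple]]:
    """Map each query tuple to a covering graph tuple, or return None."""
    result: Dict[Triple, Triple] = {}
    for rel, q1, q2 in query:
        hit = next((t for t in graph
                    if t[0] == rel
                    and arg_matches(q1, t[1], parent_of)
                    and arg_matches(q2, t[2], parent_of)), None)
        if hit is None:
            return None
        result[(rel, q1, q2)] = hit
    return result


# ASSUMPTION: "tabloid" is treated as a kind of "rag" for this example.
parent_of = {"tabloid[n27111]": "rag[n9]"}

query = [("ISA", "New_York_Post", "newspaper[n34861]"),
         ("CHRC", "newspaper[n34861]", "rag[n9]"),
         ("AGNT", "own[v9125]", WILDCARD)]

stored = [("ISA", "New_York_Post", "newspaper[n34861]"),
          ("CHRC", "newspaper[n34861]", "tabloid[n27111]"),
          ("CHRC", "newspaper[n34861]", "trouble[n25320]"),
          ("AGNT", "own[v9125]", "Rupert Murdoch"),
          ("CHRC", "own[v9125]", "once")]

answer = answer_query(query, stored, parent_of)
owner = answer[("AGNT", "own[v9125]", WILDCARD)][2] if answer else None
unanswered = answer_query(query, stored, {})  # no hierarchy: "rag" is not covered
```

Under the stated assumption, `owner` is `"Rupert Murdoch"`; without the hierarchy entry the query has no answer in this graph.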
Capabilities & timings:
• Inputs:
  – CGs (tens of thousands)
  – pre-processed parts of speech
  – Type hierarchy (150,000 WordNet-augmented English words)
  – natural language queries
• Outputs:
  – CG (save & restore) DB
  – replies to queries
  – specializations and maximal specializations
• Benchmark machine: Sun Ultra Enterprise 4000 (four 167 MHz UltraSPARC CPUs with 512 KB external cache, 256 MB of main memory)
• Reading, processing, and storing an 18,000-CG input file takes 1 hour and 46 minutes.
• Reloading the above DB takes on the order of seconds.
• A 150,000-word ontology is processed in 16 seconds.
• Each query is handled in at most 5.5 seconds.
• For smaller databases (hundreds of CGs only), the time to handle a single query can be as low as 0.2 seconds.
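Cost/benefit analysis:
Assume N CGs and Q queries.

Method I cost: $N \cdot Q$

Method III cost: $N \left(\log_{10} \frac{N}{2}\right)^{2} + Q \left(\log_{10} N\right)^{2}$
• N insertions: $N \left(\log_{10} \frac{N}{2}\right)^{2}$
• Q queries: $Q \left(\log_{10} N\right)^{2}$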
Cost/benefit table

N         Q         Method I Cost    Method III Cost
10        1         10               5.0
10        10        100              14.9
10        100       1,000            104.8
100       1         100              296.6
100       10        1,000            328.6
100       100       10,000           688.6
1,000     1         1,000            7,293.4
1,000     10        10,000           7,374.4
1,000     100       100,000          8,184.4
1,000     1,000     1,000,000        16,284.4
10,000    1,000     10,000,000       152,823.8
10,000    10,000    100,000,000      296,823.8
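The two cost models can be compared numerically with a few lines of code. The sketch below assumes Method I costs N·Q (every query is compared against every stored CG) and Method III costs N·(log10(N/2))² for the insertions plus Q·(log10 N)² for the queries. That reading of the slide's formulas is an assumption, although it agrees with most entries of the table to within rounding, for example 7,293.4 for N = 1,000 and Q = 1. The function names are invented for the example.

```python
import math


def method_i_cost(n: int, q: int) -> float:
    """Flat ordering: each of q queries is compared with all n stored CGs."""
    return float(n * q)


def method_iii_cost(n: int, q: int) -> float:
    """Partial-order hierarchy: n insertions plus q queries (assumed model)."""
    insertion = n * math.log10(n / 2) ** 2
    querying = q * math.log10(n) ** 2
    return insertion + querying


# Print a few (N, Q) combinations side by side.
for n, q in [(1_000, 1), (1_000, 1_000), (10_000, 10_000)]:
    print(f"N={n:>6} Q={q:>6}  I={method_i_cost(n, q):>13,.0f}  "
          f"III={method_iii_cost(n, q):>11,.1f}")
```

Under this model the hierarchy costs more than a flat scan when there are very few queries, since the insertion work must be paid up front, and it wins by a growing margin as Q increases.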
6 UDS DESIGN PRINCIPLES:
1. Every primitive data object, label or
symbol should be stored only once
with pointers used to denote the
actual uses of the object.
2. Every compound object should be
stored with the minimum information
required to represent the
combination of its parts.
3. Given no loss of accuracy, objects
should be processed at the highest
level of abstraction possible.
4. If one were to implement a conceptual
graph based on the diagrammatic
representation, the costs associated
with storage and matching would be
much higher than they need to be.
5. The same abstraction mechanism
that goes from labels to graphs can be
taken one step further to facilitate the
storage and retrieval of nested
context graphs.
6. A graph is itself the best descriptor of
its nodes.
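Principles 1 and 2 can be illustrated with a small interning store: each label is kept once and addressed by an integer id, and each relation tuple is kept as a triple of those ids, so repeated labels and repeated tuples cost nothing extra. The `Store` class and its methods are hypothetical names for this sketch, which shows the general idea only and is not the UDS itself.

```python
from typing import Dict, Hashable, List


class Store:
    """Keeps every distinct object once and hands out integer ids (pointers)."""

    def __init__(self) -> None:
        self._ids: Dict[Hashable, int] = {}
        self._objects: List[Hashable] = []

    def intern(self, obj: Hashable) -> int:
        """Return the id of obj, storing it only the first time it is seen."""
        if obj not in self._ids:
            self._ids[obj] = len(self._objects)
            self._objects.append(obj)
        return self._ids[obj]

    def add_tuple(self, rel: str, a: str, b: str) -> int:
        """Store a compound object as the ids of its parts (principle 2)."""
        return self.intern((self.intern(rel), self.intern(a), self.intern(b)))

    def lookup(self, ident: int) -> Hashable:
        return self._objects[ident]

    def __len__(self) -> int:
        return len(self._objects)


store = Store()
t1 = store.add_tuple("AGNT", "government", "BE")
t2 = store.add_tuple("AGNT", "@CG000", "manage")  # "AGNT" is reused, not re-stored
size_before = len(store)
t1_again = store.add_tuple("AGNT", "government", "BE")  # nothing new is stored
size_after = len(store)
```

After the two insertions the store holds seven objects (five labels and two tuples), and re-adding the first tuple returns the same id without growing the store.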
CONCLUDING THOUGHTS
• The key to efficient implementation of CGs is the exploitation of symmetry or structure.
• CG operations can be executed efficiently in real-time applications.
• At the implementation or machine level, knowledge representation formalisms are often nearly the same.