Promoting Semantic Interoperability
of Metadata for Directories of the Future
Art Vandenberg, Georgia State University
Avandenberg@gsu.edu
Vijay K. Vaishnavi, Georgia State University
Vvaishna@gsu.edu
Chris Shaw, Georgia Institute of Technology
Cdshaw@cc.gatech.edu
October 16, 2003
Art Vandenberg
Internet2 Fall Member Meeting
1
Abstract
A challenge in LDAP schema design and interoperability is
better understanding of schema inter-relationships across
organizations. Georgia State has received NSF funding to
research an approach based on the proposition that
monitoring, clustering, and visualization of cross-organizational
metadata can help identify patterns of
practice and lead to dynamic evolution of standards. A
semantic facilitator tool is demonstrated that uses Self-Organizing
Maps for clustering and viewing metadata, and
implements an instance of the Stereoscopic Field Analyzer
(SFA) to visualize directory objects in 3-dimensional,
interactive space.
New Approach to Metadata
• Domain – directory metadata standards
• Team & Funding
• Research & experimentation
• Semantic Facilitator TM SM Prototype
– Schema repository
– Select schema & universal input vector, cluster, view
– Repeat with tailored input vector (reference set)
• LSA/LSI with localDomainPerson
• SFA (Stereoscopic Field Analyzer)
Problem Domain
• Inter-organizational directory metadata
– Standard objectClasses beneficial
– Working group approach (often lengthy) to defining standards
– No sooner adopted than “adapted and changed”
– No sooner finished than new requirement
• How to enhance/improve this time-consuming practice?
• Relevant NMI Integration Testbed Components
– eduPerson, eduOrg, commObject (ITU H.350), (courseID…)
– LDAP Recipe
– Metadirectory Practices for Enterprise Directories in Higher Ed
– LDAP Analyzer
Proposed Approach
• Hypothesis:
monitoring, clustering, and appropriate visualization of
cross-organizational metadata can help identify patterns
of practice and lead to automatic evolution of standards
• Research literature, prototype, experimental validation
• Key insight: self-organizing of complex systems
Team & Funding
• Directory Services Team
– http://www.gsu.edu/~wwwacs/DSR/index.htm
– CIS faculty / IT middleware / 2 PhD, 5 Masters, 2 undergrad
– College of Computing faculty, Georgia Tech / (2 recent Masters)
• Initial discussions Fall 2000, formal meetings June 2001…
• Sun Microsystems, Academic Equipment Grant, Fall 2001
• Internet2 Middleware – working groups et al.
• NMI Integration Testbed Program participant
• NSF-ITR Award 0312636, Sep 2003-Aug 2006
– Promoting Semantic Interoperability of Metadata for Directories of the Future
Research & Experimentation
• Research on metadata approaches, clustering approaches
• Kohonen Self-Organizing Maps (SOM), neural-networks
• Latent Semantic Analysis/Latent Semantic Indexing
(LSA/LSI)
• Genetic Algorithm SOM implementation (using Condor-NT)
Research & Experimentation
• Hypotheses:
– SOM parameters from other domains not best for LDAP metadata
– Can find SOM parameters giving results comparable to experts
– SOM parameters so good that new data from domain clusters well
• Experiment design
– LDAP experts cluster iPlanet objectClasses
– Run SOM algorithm with varied parameter values
– Compare SOM results to experts
• Conclusion: can cluster LDAP metadata as well as experts
• Genetic Algorithm can find SOM parameter solution
– evaluate on order of 10,000 SOM values
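Comparing SOM results to expert groupings needs some agreement score. The slides do not say which measure was used; as a minimal sketch, the Rand index below is one common choice (an assumption here, not the team's stated method). The labels are hypothetical.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree.

    A pair counts as agreement when both clusterings put the two items
    together, or both keep them apart.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("clusterings must cover the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0
    agree = 0
    for i, j in pairs:
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a == same_b:
            agree += 1
    return agree / len(pairs)

# Hypothetical data: expert group per objectClass, and the SOM node
# (x, y) on which each of the same six objectClasses landed.
expert = ["person", "person", "person", "org", "org", "device"]
som_nodes = [(0, 0), (0, 0), (0, 1), (3, 2), (3, 2), (5, 5)]
score = rand_index(expert, som_nodes)
```

A score of 1.0 means the two groupings are identical; a search over SOM parameters (for example by a genetic algorithm) could use such a score as its fitness value.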
Self-Organizing Maps
Semantic Facilitator
TM SM
• Initial Prototype WITS02 Conference, December 2002
• Current version
– Runs on IBM Websphere (Apache/Tomcat), Java
– Oracle database repository for schemas
– User selects schema, sets input vector (reference set)
– User selects SOM parameter values
• map dimensions, neighborhood size, iterations
– ObjectClasses are mapped
• Prototype Demonstration
– select schema(s), cluster, map
– select schema(s), define reference set, cluster, map
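To make the three SOM parameters concrete (map dimensions, neighborhood size, iterations), here is a minimal pure-Python Self-Organizing Map. It is a sketch, not the prototype's Java code: the linear decay schedule, the 0.5 starting learning rate, and the toy attribute vectors are all assumptions.

```python
import math
import random

def best_node(weights, v):
    """Return the (x, y) map node whose weight vector is closest to v."""
    return min(weights,
               key=lambda n: sum((a - b) ** 2 for a, b in zip(weights[n], v)))

def train_som(vectors, x_dim, y_dim, neighborhood, iterations, seed=0):
    """Train an x_dim by y_dim map on the input vectors."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(x, y): [rng.random() for _ in range(dim)]
               for x in range(x_dim) for y in range(y_dim)}
    for t in range(iterations):
        frac = 1.0 - t / iterations      # assumed linear decay
        rate = 0.5 * frac                # learning rate shrinks over time
        radius = neighborhood * frac     # so does the neighborhood
        v = vectors[rng.randrange(len(vectors))]
        bx, by = best_node(weights, v)
        for (x, y), w in weights.items():
            if math.hypot(x - bx, y - by) <= radius:
                for i in range(dim):
                    w[i] += rate * (v[i] - w[i])
    return weights

# Hypothetical binary vectors over the attributes cn, sn, mail, o, ou.
vectors = {
    "person":             [1, 1, 0, 0, 0],
    "inetOrgPerson":      [1, 1, 1, 0, 0],
    "organization":       [0, 0, 0, 1, 0],
    "organizationalUnit": [0, 0, 0, 0, 1],
}
som = train_som(list(vectors.values()), x_dim=3, y_dim=3,
                neighborhood=2, iterations=500)
placed = {name: best_node(som, v) for name, v in vectors.items()}
```

`placed` gives the map node for each objectClass; objectClasses that share a node or sit on adjacent nodes are the ones the map treats as similar.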
BREAK PAGE
[Live demo of prototype tool]
NOTE: Internet2 Presentation was live demo.
Next slides are screen captures of a “walk through”
demonstrating how prototype is used by user to:
• Select LDAP from repository;
• Accept default feature & cluster objectclasses;
• Submit;
• Accept default SOM parameter values;
• Choose rectangular display;
• Display;
• Show text;
• Uncover nearby person objects.
Semantic Facilitator TM SM / prototype
SF / Choose LDAP (schema repository)
SF / Setting reference set (SOM input feature vector)
SF / review input features & objectClasses to cluster
SF / select SOM parameters (recommended is default…)
SF / select interface option (rectangular implemented…)
SF / resulting map (red tags added to highlight person objects)
SF / Hide(Show) Node Text
SF / nearness of eduPerson, gsuPerson
BREAK PAGE
[Live demo of prototype tool]
NOTE: Internet2 Presentation was live demo.
Next slides continue “walk through” demonstrating
how prototype is used by user:
• By clearing feature objectclasses,
• using only inetOrgPerson, eduPerson, gsuPerson as reference
• and submitting with default SOM parameter values,
• person objects are drawn out from whole schema set.
SF / Uncheck all – select reference objectClasses
SF / use inetOrgPerson, eduPerson, gsuPerson as reference
SF / submit clustering with reference set
SF / use default SOM parameter values
SF / improvement in clustering…(person at extreme right off screen)
BREAK PAGE
[Live demo of prototype tool]
NOTE: Internet2 Presentation was live demo.
Next slides continue “walk through” demonstrating
how prototype is used by user:
• Continuing to refine reference set by
• adding person, organizationalPerson,
• further improving discovery of person objects.
SF / add person, organizationalPerson to reference set…
SF / revised reference set…
SF / incremented reference set improves clustering…
Summary of preceding
• It is possible to cluster objectClasses from a directory
schema in a way comparable to experts (based on
experimental validation of computer vs. expert results).
• By specifying a “reference set” of objectClasses, it is
possible to draw out particular objectClasses (in this case
person related objects) from all the other objectClasses.
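One way to read the “reference set” idea: the reference objectClasses decide which attributes become input features, and every objectClass is then encoded against those features. The sketch below assumes exactly that encoding (the slides do not spell it out); the attribute lists, including `gsuCampusId`, are hypothetical.

```python
def feature_vectors(schema, reference_set):
    """Encode every objectClass as a binary vector over the attributes
    that appear in the reference objectClasses."""
    features = sorted({attr for name in reference_set
                       for attr in schema[name]})
    vectors = {name: [1 if f in attrs else 0 for f in features]
               for name, attrs in schema.items()}
    return features, vectors

# Hypothetical, heavily shortened attribute lists.
schema = {
    "person":        {"cn", "sn", "telephoneNumber"},
    "inetOrgPerson": {"cn", "sn", "mail", "uid"},
    "gsuPerson":     {"cn", "sn", "mail", "gsuCampusId"},
    "organization":  {"o", "description"},
}
features, vectors = feature_vectors(schema, ["inetOrgPerson"])
```

With this encoding, objectClasses unrelated to the reference set collapse to all-zero vectors, which is consistent with person objects being drawn out from the rest.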
BREAK PAGE
[Live demo of prototype tool]
NOTE: Internet2 Presentation was live demo.
Next slides show “walk through” where user:
• Selects UAB schema;
• Directly specifies a “reference set” of person objects;
• Displays result;
• Finds clustering of additional uabPerson objects.
SF / scenario: Find UAB person objects
SF / What if we used “person” reference set?
(person, organizationalPerson, inetOrgPerson, residentialPerson, newPilotPerson, eduperson )
SF / Notice that person objects are now clustered more closely… and
SF / “unstacking the objects” finds “uab-” objects: uabPerson, uabAlum,
uabEmployee, uabStudent as well as pabPerson, uabEntity...
Summary of preceding
• Using a “reference set” of common person objectClasses
(person, organizationalPerson, inetOrgPerson,
residentialPerson, newPilotPerson, eduPerson), it is
possible to draw out new, unknown person objectClasses
(uabPerson, uabAlum, uabEmployee, uabStudent as well
as pabPerson, uabEntity...).
BREAK PAGE
[Live demo of prototype tool]
NOTE: Internet2 Presentation was live demo.
Next slides show “walk through” where user:
• Selects IBM vendor delivered schema.
• Default options reveal no obvious person objects.
• User picks ePerson as start of reference set.
• By iteratively adding newly revealed person objectclasses,
• User finds successive person related objectclasses.
SF / find “IBM person” objects in Secureway (IBM) vendor delivered
schema
SF / use defaults and resulting map doesn’t immediately find “persons”
SF / Show Node Text for 301 objects is complex
SF / select “ePerson” as a start for input features vector (reference set)…
SF / now several person objects are found…
SF / unstack & Show Node Text to reveal person object names…
SF / using additional person objects to expand reference set…
SF / finds more person objects…
SF / Show Node Text reveals others…
SF / unstack objects, find Secureway person objects, including
eContactPerson, iGNPerson…
SF / in fact, inspecting nearby nodes finds eGSOuser, eUser
Summary of preceding
• Rather than starting with a known reference set, one can
build up a reference set incrementally, starting with a
single objectClass of likely relevance and adding newly
discovered objectClasses to refine the results.
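The incremental build-up can be sketched as a loop. In the prototype the user inspects the SOM map to find new candidates; the code below swaps in a simple attribute-overlap test as a stand-in for that visual step (an assumption), and the attribute lists and the 0.5 threshold are hypothetical.

```python
def expand_reference_set(schema, seed, threshold=0.5):
    """Grow a reference set from one seed objectClass.

    Each round, add any objectClass for which at least `threshold` of its
    attributes already occur in the reference set, then repeat until a
    round adds nothing new.
    """
    reference = {seed}
    while True:
        ref_attrs = set().union(*(schema[n] for n in reference))
        found = {name for name, attrs in schema.items()
                 if name not in reference and attrs
                 and len(attrs & ref_attrs) / len(attrs) >= threshold}
        if not found:
            return reference
        reference |= found

# objectClass names are from the walk-through; attribute lists are invented.
schema = {
    "ePerson":        {"cn", "sn", "mail", "uid"},
    "eContactPerson": {"cn", "sn", "mail", "pager"},
    "eUser":          {"uid", "pager", "homeDirectory"},
    "eDevice":        {"serialNumber", "owner"},
}
result = expand_reference_set(schema, "ePerson")
```

Here `eUser` is only picked up in the second round, after `eContactPerson` has widened the reference attributes, which mirrors the step-by-step discovery in the walk-through.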
BREAK PAGE
[Live demo of prototype tool]
NOTE: Internet2 Presentation was live demo.
Next slides show multiple schema clustering
• First:
• Cluster CMU and UMich schemas
• show clustering of cmuPerson, umichPerson, eduPerson.
• Then:
• Cluster Novell, OpenLDAP, IBM, and iPlanet schemas
• show clustering of related person objectClasses:
3 eduPerson, gsu/ufl/um/admin/liPerson.
SF / Cluster multiple schemas: CMU (62 objects) and UMich (66 objects)
SF / unstack & Show Node Text… cmuPerson, umichPerson, eduPerson
SF / Cluster GSU, UFL, UMD, UCD – 587 total objects
(Novell, OpenLDAP, Secureway, iPlanet)
SF / four schemas clustered – let’s check eduPerson
SF / unstack & Show Node Text – objects exploded out from middle right
of screen (3 eduPerson, gsu/ufl/um/admin/liPerson)
Summary of preceding
• Multiple schemas, even from different vendor LDAPs, can
be clustered.
Following slides...
• Simulate [1] the time steps in Self-Organizing Map solution
• University of Michigan OpenLDAP schema objects
• Time steps of 1000 iterations for SOM parameters:
– X_dimension = 7 and Y_dimension = 8
– Neighborhood_size = 2
– Iterations = 10,000
• Illustrates clustering state progression (with person objects
tagged)
– Our experiment indicated that 10,000 iterations was best
– This sequence simulates iterations up to 20,000
– Shows “good fit” for 10,000 based on clustering of person objects
[1] NB: this state function is not yet implemented by the prototype
SF / consider time steps in SOM
UMich OpenLDAP
SF / SOM parameters xsize=7, ysize=9, neighborhood=2
[Screen captures: one map per time step, every 1000 iterations from 1000 to 20000 iterations]
Summary of preceding
• Providing a “state” function, that displays intermediate
states of clustering, may be helpful in determining SOM
parameter values selection. User may have better sense of
“good” clustering result by visually following convergence
rate.
LSA/LSI analysis
• “Latent Semantic Analysis (LSA) is a theory and method for
extracting and representing the contextual-usage meaning of words by
statistical computations applied to a large corpus of text.” ref:
Landauer, T. K., Foltz, P. W., & Laham, D. (1998). Introduction to
Latent Semantic Analysis. Discourse Processes, 25, 259-284.
(http://lsa.colorado.edu/)
• Latent Semantic Analysis/Indexing is another technique for analyzing
information content.
• Typically used for document searching where one wants to rank order
relevance of documents based on their inclusion of a set of terms
• “Latency” in the sense that, while not having all terms being queried, a
document may still be ranked high because other terms usually do
occur in conjunction with the missing term(s).
LSA/LSI analysis
localDomainPerson
• localDomainPerson – analyzing the variations
• 21 schemas used in LSA/LSI test set
– 13 localDomainPerson
– 2 eduPerson (structural, auxiliary)
– liPerson, iGNPerson (Secureway)
– Top, person, organizationalPerson, inetOrgPerson
• Challenges on vendor/institution schema:
– Explicit statement of inherited attributes vs. implicit
– Multiple inclusion of attributes in one objectClass!
– No, or Non-standard, OIDs (cf. eduPerson-oid, uwPerson-oid)
– Variations on objectClasses specification format
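The first two challenges (explicit vs. implicit inherited attributes, and attributes listed more than once) suggest a normalization pass before analysis. A minimal sketch, assuming each objectClass record carries its superior class (`sup`) and its own attribute list; the `localPerson` class and the record layout are hypothetical.

```python
def effective_attributes(name, schema):
    """Collect an objectClass's attributes plus everything inherited
    through its chain of superior classes, without duplicates."""
    seen_classes = set()
    attrs = []
    while name is not None and name not in seen_classes:
        seen_classes.add(name)
        entry = schema[name]
        for attr in entry["attrs"]:
            key = attr.lower()  # attribute names compare case-insensitively
            if key not in attrs:
                attrs.append(key)
        name = entry.get("sup")
    return attrs

# Hypothetical records: localPerson restates the inherited cn and lists
# mail twice with different capitalization.
schema = {
    "top":         {"sup": None,     "attrs": ["objectClass"]},
    "person":      {"sup": "top",    "attrs": ["sn", "cn", "telephoneNumber"]},
    "localPerson": {"sup": "person", "attrs": ["mail", "cn", "Mail"]},
}
flat = effective_attributes("localPerson", schema)
```

After this pass every objectClass has one flat, duplicate-free attribute list, so schemas that state inheritance explicitly and those that leave it implicit become comparable.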
LSA/LSI Analysis
• Latent Semantic Analysis/Indexing
– Jorge Civera Saiz, Georgia Tech
– Taruna Hariani, Georgia State
• Basic idea
– Document X Term matrix created (cf. objectClass X attribute)
– singular value decomposition (SVD)
• X = T * S * D’
• dimensions: (t x d) = (t x k) * (k x k) * (k x d)
• k corresponds to “noise factor” - goal is to optimize
– Construct query on SVD
• In other words:
– Find relevant documents containing terms
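Written out, the rank-k decomposition and the query step take the following standard LSI form. The symbols beyond X, T, S, D and k (the query vector q, its projection, and the cosine score) are notation added here for illustration; comparing the projected query with rows of D_k is one common convention, not necessarily the one used in this work.

```latex
X \approx X_k = T_k \, S_k \, D_k^{\top},
\qquad
T_k \in \mathbb{R}^{t \times k},\quad
S_k \in \mathbb{R}^{k \times k},\quad
D_k \in \mathbb{R}^{d \times k}

\hat{q} = q^{\top} \, T_k \, S_k^{-1}

\operatorname{sim}(\hat{q}, d_j) =
\frac{\hat{q} \cdot d_j}{\lVert \hat{q} \rVert \, \lVert d_j \rVert},
\qquad d_j = \text{row } j \text{ of } D_k
```

Here t is the number of terms (attributes), d the number of documents (objectClasses), and q a length-t vector marking the query's attributes.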
Following slides…
• Results of SVD of objectClass by attributes matrix
of 21 person schemas
• The query was based on structural eduPerson
• Results of K=1 to K=21 are graphed
[Figures: a sequence of bar charts, one per value of K, with panel titles “K=2, eduPerson (structural) query” through “K=21, eduPerson (structural) query”. Each chart plots one series (Series1) on a vertical axis from -0.6 to 0.6, with the 21 person objectClasses of the test set along the horizontal axis.]
LSA/LSI – finding “k”
• K can reduce dimensionality... noise reduction
• What’s best “k”?
– Usually look to mid-range
– Too high, includes noise
– Too low, trivial
• Query vector composed of terms (attributes)
– Returns ranking of documents (objectClasses)
– Ranking based on containment of terms (attributes)
– Document may contain many other terms…
– Issue of latency & similarity
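For comparison, ranking by plain containment of terms, with no dimensionality reduction at all, can be sketched with cosine similarity over binary attribute vectors. This is the unreduced baseline rather than LSA/LSI itself; the schema contents are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_object_classes(schema, query_attrs):
    """Rank objectClasses (documents) by similarity to a query made of
    attributes (terms), best match first."""
    terms = sorted(set(query_attrs).union(*schema.values()))
    q = [1 if t in query_attrs else 0 for t in terms]
    scores = {name: cosine(q, [1 if t in attrs else 0 for t in terms])
              for name, attrs in schema.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical, shortened attribute sets.
schema = {
    "person":        {"cn", "sn"},
    "inetOrgPerson": {"cn", "sn", "mail", "uid"},
    "organization":  {"o"},
}
ranking = rank_object_classes(schema, {"cn", "sn", "mail"})
```

An objectClass that shares no literal attribute with the query always scores 0 here; the reduced-rank step is what lets LSA/LSI rank such a document higher when its attributes usually co-occur with the query's.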
• DRAFT results
• Values of k=10
• eduPerson (structural) query vector
• Attribute similarity is an issue (oids, names…)

objectClass            rank        abs val dif rank   total attributes
ucdperson              -0.354474   0.349              9
gsuperson              -0.090412   0.085              8
organizationalperson   -0.065578   0.060              28
edupersonaux           -0.010739   0.005              10
isuperson              -0.007394   0.002              70
ustperson              -0.006752   0.001              64
eduperson              -0.005851   0.000              61
guperson               -0.004702   0.001              79
inetorgperson           0.002533   0.008              53
ugaperson               0.005138   0.011              54
uwperson                0.005221   0.011              61
uabperson               0.007272   0.013              71
utsieduperson           0.009338   0.015              61
utmeduperson            0.009338   0.015              61
tneduperson             0.015221   0.021              73
person                  0.156449   0.162              11
top                     0.191709   0.198              4
liperson                0.397216   0.403              38
ignperson               0.441857   0.448              17
umichperson             0.481966   0.488              91
ubperson                0.637099   0.643              63

matching attributes (20 values, in the order extracted): 0, 0, 25, 7, 52, 52, 52, 47, 47, 52, 58, 58, 58, 60, 8, 2, 26, 6, 47, 34
Summary of preceding
• LSA/LSI may provide another mode of analyzing
relationship of objectClasses based on their attributes
SFA
Stereoscopic Field Analyzer
• SFA: visualize high-dimensional spaces
– Chris Shaw, College of Computing, Georgia Tech
– SFA Windows 2000 version
– Analyzing complex data in greater than 3D space
– Using color, size, glyphs, vectors for additional dimensions
• General approach:
– Tokenize schema data (use SOM prep, or LSA results) for set file
– Set file “length” is number of vectors – objectclasses
– Set file “Dimension” is vector length – attributes
– Convert to binary
– In SFA space x,y,z axes, color, glyph, etc. correspond to attributes
– Plotted objects are the objectClasses
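The “length” and “Dimension” bookkeeping can be illustrated with a small packing routine. The byte layout below (two little-endian 32-bit counts followed by 32-bit floats) is purely hypothetical; the actual SFA set-file format is not described in these slides.

```python
import struct

def pack_set(vectors):
    """Pack objectClass vectors into bytes.

    HYPOTHETICAL layout, not the real SFA format:
    header = length (number of vectors), dimension (vector length),
    then every vector as `dimension` float32 values, row by row.
    """
    length = len(vectors)
    dimension = len(vectors[0]) if vectors else 0
    if any(len(v) != dimension for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    payload = struct.pack("<II", length, dimension)
    for v in vectors:
        payload += struct.pack(f"<{dimension}f", *v)
    return payload

# Three objectClasses described by four attributes each (toy values).
data = pack_set([[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 0, 1]])
length, dimension = struct.unpack_from("<II", data)
```

In this toy case `length` is the number of objectClasses to plot and `dimension` the number of attributes available for mapping to axes, color, or glyph.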
Stereoscopic Field Analyzer: weather data
BREAK PAGE
[Live demo of prototype tool]
NOTE: Internet2 Presentation was live demo.
Next slides show “walk through” of SFA operation
• Initial interface
• Open a data file (schema)
• Select glyph type
• Scale glyph size
• Inspect mappings (attributes matched to dimensions)
• Rotate, move 3D display volume
Stereoscopic Field Analyzer (SFA)
SFA – select data file
SFA – select glyph type
SFA – scale glyphs
SFA – data, glyph, glyph-size selected
SFA – Edit Mappings
SFA – interactively…
SFA – interactively rotate…
SFA – interactively rotate space…
Summary of preceding
• SFA provides a 3D volume in which objectClasses can be
mapped
• Additional dimensions provided by color, glyphs, x-size…
• Manipulation of attribute mappings to various dimensions
can highlight objectClasses containing attributes
BREAK PAGE
[Live demo of prototype tool]
NOTE: Internet2 Presentation was live demo.
Next slides demonstrate multidimensionality of SFA
• Given a set of 3 attributes (cn, fullname, emailAddress)
mapped to x, y, z dimensions,
• Using additional “dimensions” (color, opacity, xsize)
can provide additional (reinforcing) information
91attr 86obj EDIR with sim / cn, fullname, emailAddress (x, y, z)
91attr 86obj EDIR with sim / cn, fullname, emailAddress, + color
emailAddress
91attr 86obj EDIR with sim / cn, fullname, emailAddress, + color
emailAddress + opacity cn
91attr 86obj EDIR with sim / cn, fullname, emailAddress, + color
emailAddress + opacity cn + xsize fullname
Summary of preceding
• Using “extra” dimensions (color, opacity, x-size…) can
help visualize information and relationship of objects
BREAK PAGE
[Live demo of prototype tool]
NOTE: Internet2 Presentation was live demo.
Next slides show more complex visualization
• 3 initial attribute dimensions (cn, fullname, emailAddress) set;
• Adding 4th dimension (groupMembership) refines object set.
• Opening a second schema file
• Provides further opportunity to refine & compare objects.
91attr 86obj EDIR with sim / cn, fullname, emailAddress
91attr 86obj EDIR with sim / cn, fullname, emailAddress + groupMembership
91attr 86obj EDIR with sim / cn, fullname, emailAddress + groupMembership
open 2nd data set (497 attr, 86obj EDIR) …
91attr 86obj EDIR with sim / cn, fullname, emailAddress + groupMembership
open 2nd data set… select different glyph type
91attr 86obj EDIR with sim / cn, fullname, emailAddress + groupMembership
open 2nd data set, select different glyph type… display together
91attr 86obj EDIR with sim / cn, fullname, emailAddress + groupMembership
open 2nd data set… edit mappings 2nd data set
91attr 86obj EDIR with sim / cn, fullname, emailAddress + groupMembership
open 2nd data set… select sn, fullname, displayName, givenName, groupid
91attr 86obj EDIR with sim / cn, fullname, emailAddress + groupMembership
open 2nd data set… compare… (iterate)
Summary of preceding
• Additional dimensions can be represented by mapping
attributes beyond the x, y, z axes...
• Such as using color as 4th dimension for data set 1.
• Opening of additional data set 2 with 5 dimensions (using
color and opacity).
• Comparing data between data sets may provide insight.
BREAK PAGE
[Live demo of prototype tool]
NOTE: Internet2 Presentation was live demo.
Next slides show various additional functions of SFA.
SFA – way cool options… still investigating what’s there & what’s needed
1,000 words…
…
…
Overall Summary
• Challenges of cross-organizational LDAP schema
• New approach to metadata:
– monitoring, clustering, and visualization
– identify patterns of practice
– dynamic evolution of standards
• Semantic Facilitator TM SM tool
– Schema repository
– Self-Organizing Map technology
• Latent Semantic Analysis/Latent Semantic Indexing
• Stereoscopic Field Analyzer (SFA) 3D visualization
Concepts & Challenges
• Validating clustering (without recourse to “humans”…)
• Interface design and usability
• Reference sets (automated; library of; cf. my_refs…)
• Monitoring
• SOM - additional interfaces and parameters
• Genetic Algorithm: extend J. Liang Thesis work
• DirNet a la WordNet® (an online lexical reference system)
• “DNA” (Directory Node Analysis) signatures
• Generalize as knowledge engine for virtual organizations
Near Future Work
• Deploy prototype as component based architecture
[Diagram: component-based architecture; box labels: Semantic Facilitator; SF DB; Tables, ERD; Web Services; Http; JSP; Servlet; Shibboleth; Client; AuthN/Z; Browser (users)]
• Extend schema repository
• Build, validate reference sets
• LSA/LSI and SFA as “drill down” analysis components
Q&A
Contact:
Art Vandenberg
avandenberg@gsu.edu
Vijay Vaishnavi
vvaishnavi@gsu.edu
Chris Shaw
cdshaw@cc.gatech.edu
Directory Services Team
http://www.gsu.edu/~wwwacs/DSR/index.htm