Promoting Semantic Interoperability of Metadata for Directories of the Future

Art Vandenberg, Georgia State University, avandenberg@gsu.edu
Vijay K. Vaishnavi, Georgia State University, vvaishnavi@gsu.edu
Chris Shaw, Georgia Institute of Technology, cdshaw@cc.gatech.edu

October 16, 2003 Art Vandenberg Internet2 Fall Member Meeting

Abstract
A challenge in LDAP schema design and interoperability is gaining a better understanding of schema inter-relationships across organizations. Georgia State has received NSF funding to research an approach based on the proposition that monitoring, clustering, and visualization of cross-organizational metadata can help identify patterns of practice and lead to dynamic evolution of standards. A semantic facilitator tool is demonstrated that uses Self-Organizing Maps for clustering and viewing metadata, and implements an instance of the Stereoscopic Field Analyzer (SFA) to visualize directory objects in 3-dimensional, interactive space.

New Approach to Metadata
• Domain – directory metadata standards
• Team & Funding
• Research & experimentation
• Semantic Facilitator TM SM Prototype
  – Schema repository
  – Select schema & universal input vector, cluster, view
  – Repeat with tailored input vector (reference set)
• LSA/LSI with localDomainPerson
• SFA (Stereoscopic Field Analyzer)

Problem Domain
• Inter-organizational directory metadata
  – Standard objectClasses beneficial
  – Working group approach (often lengthy) to defining standards
  – No sooner adopted than “adapted and changed”
  – No sooner finished than new requirement
• How to enhance/improve this time-consuming practice?
• Relevant NMI Integration Testbed Components
  – eduPerson, eduOrg, commObject (ITU H.350), (courseID…)
  – LDAP Recipe
  – Metadirectory Practices for Enterprise Directories in Higher Ed
  – LDAP Analyzer

Proposed Approach
• Hypothesis: monitoring, clustering, and appropriate visualization of cross-organizational metadata can help identify patterns of practice and lead to automatic evolution of standards
• Research literature, prototype, experimental validation
• Key insight: self-organizing of complex systems

Team & Funding
• Directory Services Team
  – http://www.gsu.edu/~wwwacs/DSR/index.htm
  – CIS faculty / IT middleware / 2 PhD, 5 Masters, 2 undergrad
  – College of Computing faculty, Georgia Tech / (2 recent Masters)
• Initial discussions Fall 2000, formal meetings June 2001…
• Sun Microsystems, Academic Equipment Grant, Fall 2001
• Internet2 Middleware – working groups et al.
• NMI Integration Testbed Program participant
• NSF-ITR Award 0312636, Sep 2003-Aug 2006
  – Promoting Semantic Interoperability of Metadata for Directories of the Future

Research & Experimentation
• Research on metadata approaches, clustering approaches
• Kohonen Self-Organizing Maps (SOM), neural networks
• Latent Semantic Analysis/Latent Semantic Indexing (LSA/LSI)
• Genetic Algorithm SOM implementation (using Condor-NT)

Research & Experimentation
• Hypotheses:
  – SOM parameters from other domains are not the best for LDAP metadata
  – SOM parameters can be found that give results comparable to experts
  – SOM parameters can be good enough that new data from the domain clusters well
• Experiment design
  – LDAP experts cluster iPlanet objectClasses
  – Run SOM algorithm with varied parameter values
  – Compare SOM results to experts
• Conclusion: can cluster LDAP metadata as well as experts
• Genetic Algorithm can find SOM parameter solution
  – evaluate on order of 10,000 SOM values

Self-Organizing Maps

Semantic Facilitator TM SM
• Initial Prototype WITS02 Conference, December 2002
• Current version
  – Runs on IBM Websphere (Apache/Tomcat), Java
  – Oracle database repository for schemas
  – User selects schema, sets input vector (reference set)
  – User selects SOM parameter values
     • map dimensions, neighborhood size, iterations
  – ObjectClasses are mapped
• Prototype Demonstration
  – select schema(s), cluster, map
  – select schema(s), define reference set, cluster, map

BREAK PAGE [Live demo of prototype tool]
NOTE: Internet2 Presentation was live demo.
Next slides are screen captures of a “walk through” demonstrating how the prototype is used to:
• Select LDAP from repository;
• Accept default feature & cluster objectClasses;
• Submit;
• Accept default SOM parameter values;
• Choose rectangular display;
• Display;
• Show text;
• Uncover nearby person objects.

Semantic Facilitator TM SM / prototype
SF / Choose LDAP (schema repository)
SF / Setting reference set (SOM input feature vector)
SF / review input features & objectClasses to cluster
SF / select SOM parameters (recommended is default…)
SF / select interface option (rectangular implemented…)
SF / resulting map (red tags added to highlight person objects)
SF / Hide (Show) Node Text
SF / nearness of eduPerson, gsuPerson

BREAK PAGE [Live demo of prototype tool]
NOTE: Internet2 Presentation was live demo.
Next slides continue the “walk through” demonstrating how the prototype is used:
• By clearing feature objectClasses,
• using only inetOrgPerson, eduPerson, gsuPerson as reference
• and submitting with default SOM parameter values,
• person objects are drawn out from the whole schema set.

SF / Uncheck all – select reference objectClasses
SF / use inetOrgPerson, eduPerson, gsuPerson as reference
SF / submit clustering with reference set
SF / use default SOM parameter values
SF / improvement in clustering… (person at extreme right off screen)

BREAK PAGE [Live demo of prototype tool]
NOTE: Internet2 Presentation was live demo.
Next slides continue the “walk through” demonstrating how the prototype is used:
• Continuing to refine reference set by
• adding person, organizationalPerson,
• further improving discovery of person objects.
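
The reference-set idea in this walk-through can be sketched in a few lines of Python. The slides do not say how the prototype turns a reference set into SOM input vectors, so this sketch makes an assumption: each objectClass is treated as a set of attribute names, and its input vector has one component per reference objectClass, equal to the share of that reference class's attributes it also contains. The function name `feature_vector` and the attribute lists are hypothetical toy data, not real schema definitions.

```python
# Hypothetical encoding (an assumption, not the prototype's documented method):
# one vector component per reference objectClass, equal to the share of that
# reference class's attributes also present in the objectClass being encoded.

def feature_vector(attrs, reference_set):
    """Return overlap scores of `attrs` against each reference objectClass."""
    vector = []
    for ref_name in sorted(reference_set):      # fixed component order
        ref_attrs = reference_set[ref_name]
        shared = len(attrs & ref_attrs)
        vector.append(shared / len(ref_attrs) if ref_attrs else 0.0)
    return vector

# Toy attribute sets for illustration only.
reference_set = {
    "inetOrgPerson": {"cn", "sn", "mail", "uid"},
    "eduPerson": {"cn", "sn", "eduPersonAffiliation", "eduPersonPrincipalName"},
}
schema = {
    "gsuPerson": {"cn", "sn", "mail", "eduPersonAffiliation"},
    "printerService": {"printerName", "printerLocation"},
}

vectors = {name: feature_vector(attrs, reference_set)
           for name, attrs in schema.items()}
```

Under this assumed encoding, a person-like objectClass gets non-zero components against a person reference set while an unrelated objectClass gets all zeros, which is one plausible reason a tailored reference set would pull person objects apart from the rest of the map.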
SF / add person, organizationalPerson to reference set…
SF / revised reference set…
SF / incremented reference set improves clustering…

Summary of preceding
• It is possible to cluster objectClasses from a directory schema in a way comparable to experts (based on experimental validation of computer vs. expert results).
• By specifying a “reference set” of objectClasses, it is possible to draw out particular objectClasses (in this case person-related objects) from all the other objectClasses.

BREAK PAGE [Live demo of prototype tool]
NOTE: Internet2 Presentation was live demo.
Next slides show a “walk through” where the user:
• Selects UAB schema;
• Directly specifies a “reference set” of person objects;
• Displays result;
• Finds clustering of additional uabPerson objects.

SF / scenario: Find UAB person objects
SF / What if we used “person” reference set? (person, organizationalPerson, inetOrgPerson, residentialPerson, newPilotPerson, eduPerson)
SF / Notice that person objects are now clustered more closely… and
SF / “unstacking the objects” finds “uab-” objects: uabPerson, uabAlum, uabEmployee, uabStudent as well as pabPerson, uabEntity...

Summary of preceding
• Using a “reference set” of common person objectClasses (person, organizationalPerson, inetOrgPerson, residentialPerson, newPilotPerson, eduPerson), it is possible to draw out new, unknown person objectClasses (uabPerson, uabAlum, uabEmployee, uabStudent as well as pabPerson, uabEntity...).

BREAK PAGE [Live demo of prototype tool]
NOTE: Internet2 Presentation was live demo.
Next slides show a “walk through” where the user:
• Selects IBM vendor-delivered schema.
• Default options reveal no obvious person objects.
• User picks ePerson as start of reference set.
• By iteratively adding newly revealed person objectClasses,
• User finds successive person-related objectClasses.

SF / find “IBM person” objects in Secureway (IBM) vendor-delivered schema
SF / use defaults and resulting map doesn’t immediately find “persons”
SF / Show Node Text for 301 objects is complex
SF / select “ePerson” as a start for input features vector (reference set)…
SF / now several person objects are found…
SF / unstack & Show Node Text to reveal person object names…
SF / using additional person objects to expand reference set…
SF / finds more person objects…
SF / Show Node Text reveals others…
SF / unstack objects, find Secureway person objects, including eContactPerson, iGNPerson…
SF / in fact, inspecting nearby nodes finds eGSOuser, eUser

Summary of preceding
• Rather than starting with a known reference set, one can build up a reference set incrementally, starting with a single objectClass of likely relevance and adding newly discovered objectClasses to refine the results.

BREAK PAGE [Live demo of prototype tool]
NOTE: Internet2 Presentation was live demo.
Next slides show multiple schema clustering
• First:
  – Cluster CMU and UMich schemas
  – show clustering of cmuPerson, umichPerson, eduPerson.
• Then:
  – Cluster Novell, OpenLDAP, IBM, and iPlanet schemas
  – show clustering of related person objectClasses: 3 eduPerson, gsu/ufl/um/admin/liPerson.
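
The clustering behind all of these walk-throughs is a Self-Organizing Map controlled by the three parameters the user selects: map dimensions, neighborhood size, and iterations. The following is a minimal sketch of that algorithm in plain Python, not the prototype's Java implementation. The square ("bubble") neighborhood, the linear decay of learning rate and radius, the initial rate of 0.5, and the toy input vectors are all assumptions chosen for brevity.

```python
import random

def best_node(grid, vector):
    """Return the (x, y) node whose weight vector is closest to `vector`."""
    return min(grid, key=lambda node: sum((w - v) ** 2
                                          for w, v in zip(grid[node], vector)))

def train_som(vectors, x_dim, y_dim, neighborhood, iterations, seed=0):
    """Train a rectangular SOM and return {(x, y): weight vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # One randomly initialised weight vector per map node.
    grid = {(x, y): [rng.random() for _ in range(dim)]
            for x in range(x_dim) for y in range(y_dim)}
    for step in range(iterations):
        remaining = 1.0 - step / iterations           # decays from 1 towards 0
        rate = 0.5 * remaining                        # assumed learning-rate schedule
        radius = int(neighborhood * remaining + 0.5)  # shrinking neighborhood
        sample = rng.choice(vectors)
        bx, by = best_node(grid, sample)
        # Pull the best-matching node and its grid neighbours towards the sample.
        for (x, y), weights in grid.items():
            if abs(x - bx) <= radius and abs(y - by) <= radius:
                for i in range(dim):
                    weights[i] += rate * (sample[i] - weights[i])
    return grid

# Toy input vectors (hypothetical), e.g. overlap scores against a reference set.
vectors = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
grid = train_som(vectors, x_dim=4, y_dim=3, neighborhood=2, iterations=500)
placement = [best_node(grid, v) for v in vectors]   # map cell for each input
```

Objects with identical vectors always land on the same node, which corresponds to the "stacked" objects that the demo unstacks; larger maps or more iterations trade run time for finer separation.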
SF / Cluster multiple schemas: CMU (62 objects) and UMich (66 objects)
SF / unstack & Show Node Text… cmuPerson, umichPerson, eduPerson
SF / Cluster GSU, UFL, UMD, UCD – 587 total objects (Novell, OpenLDAP, Secureway, iPlanet)
SF / four schemas clustered – let’s check eduPerson
SF / unstack & Show Node Text – objects exploded out from middle right of screen (3 eduPerson, gsu/ufl/um/admin/liPerson)

Summary of preceding
• Multiple schemas, even from different vendor LDAPs, can be clustered.

Following slides...
• Simulate¹ the time steps in Self-Organizing Map solution
• University of Michigan OpenLDAP schema objects
• Time steps of 1000 iterations for SOM parameters:
  – X_dimension = 7 and Y_dimension = 8
  – Neighborhood_size = 2
  – Iterations = 10,000
• Illustrates clustering state progression (with person objects tagged)
  – Our experiment indicated that 10,000 iterations was best
  – This sequence simulates iterations up to 20,000
  – Shows “good fit” for 10,000 based on clustering of person objects
• ¹ NB: this state function not yet implemented by prototype

SF / consider time steps in SOM UMich OpenLDAP
SF / SOM parameters xsize=7, ysize=9, neighborhood=2; one slide each at 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000 and 10000 iterations
SF / SOM parameters xsize=7, ysize=9, neighborhood=2; one slide each at 11000, 12000, 13000, 14000, 15000, 16000, 17000, 18000, 19000 and 20000 iterations

Summary of preceding
• Providing a “state” function that displays intermediate states of clustering may help in selecting SOM parameter values. The user may get a better sense of a “good” clustering result by visually following the convergence rate.

LSA/LSI analysis
• “Latent Semantic Analysis (LSA) is a theory and method for extracting and representing the contextual-usage meaning of words by statistical computations applied to a large corpus of text.” ref: Landauer, T. K., Foltz, P. W., & Laham, D. (1998). Introduction to Latent Semantic Analysis. Discourse Processes, 25, 259-284. (http://lsa.colorado.edu/)
• Latent Semantic Analysis/Indexing is another technique for analyzing information content.
• Typically used for document searching where one wants to rank documents by relevance, based on their inclusion of a set of terms
• “Latency” in the sense that a document that does not contain all the queried terms may still be ranked high, because the terms it does contain usually occur in conjunction with the missing term(s).
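
The ranking described above is commonly computed with a truncated singular value decomposition. The block below is a sketch in the textbook LSA formulation; the symbols q, \hat{q}, d_j and the cosine score are standard notation assumed here for illustration and are not taken from the slides.

```latex
\begin{align*}
X &= T\,S\,D^{\top}
  && \text{SVD of the } t \times d \text{ term-by-document matrix} \\
X_k &= T_k\,S_k\,D_k^{\top}
  && \text{keep the } k \text{ largest singular values: }
     T_k \in \mathbb{R}^{t \times k},\ S_k \in \mathbb{R}^{k \times k},\ D_k \in \mathbb{R}^{d \times k} \\
\hat{q} &= q^{\top}\,T_k\,S_k^{-1}
  && \text{fold a query vector } q \in \mathbb{R}^{t} \text{ into the } k\text{-dimensional space} \\
\operatorname{score}(j) &= \frac{\hat{q} \cdot d_j}{\lVert \hat{q} \rVert \, \lVert d_j \rVert}
  && \text{cosine similarity with } d_j, \text{ the } j\text{-th row of } D_k
\end{align*}
```

Because \hat{q} and d_j are compared in the reduced space, where terms that tend to co-occur share dimensions, a document can score well without containing every query term, which is the “latency” effect noted in the last bullet.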
LSA/LSI analysis localDomainPerson
• localDomainPerson – analyzing the variations
• 21 schemas used in LSA/LSI test set
  – 13 localDomainPerson
  – 2 eduPerson (structural, auxiliary)
  – liPerson, iGNPerson (Secureway)
  – Top, person, organizationalPerson, inetOrgPerson
• Challenges on vendor/institution schema:
  – Explicit statement of inherited attributes vs. implicit
  – Multiple inclusion of attributes in one objectClass!
  – No, or non-standard, OIDs (cf. eduPerson-oid, uwPerson-oid)
  – Variations on objectClasses specification format

LSA/LSI Analysis
• Latent Semantic Analysis/Indexing
  – Jorge Civera Saiz, Georgia Tech
  – Taruna Hariani, Georgia State
• Basic idea
  – Document X Term matrix created (cf. objectClass X attribute)
  – singular value decomposition (SVD)
     • X = T * S * D’
     • (t x d) = (t x k) * (k x k) * (k x d)
     • k corresponds to “noise factor” - goal is to optimize
  – Construct query on SVD
• In other words:
  – Find relevant documents containing terms

Following slides…
• Results of SVD of objectClass by attributes matrix of 21 person schemas
• The query was based on structural eduPerson
• Results of K=1 to K=21 are graphed

[Figures: bar charts, one per value of K, each titled “K=n, eduPerson (structural) query”; x-axis: the 21 objectClasses; y-axis: ranking value from -0.6 to 0.6.]

LSA/LSI – finding “k”
• K can reduce dimensionality... noise reduction
• What’s best “k”?
  – Usually look to mid-range
  – Too high, includes noise
  – Too low, trivial
• Query vector composed of terms (attributes)
  – Returns ranking of documents (objectClasses)
  – Ranking based on containment of terms (attributes)
  – Document may contain many other terms…
  – Issue of latency & similarity

• DRAFT results
• Values of k=10
• eduPerson (structural) query vector
• Attribute similarity is an issue (oids, names…)

objectClass            rank        abs val dif rank   total attributes
ucdperson              -0.354474   0.349              9
gsuperson              -0.090412   0.085              8
organizationalperson   -0.065578   0.060              28
edupersonaux           -0.010739   0.005              10
isuperson              -0.007394   0.002              70
ustperson              -0.006752   0.001              64
eduperson              -0.005851   0.000              61
guperson               -0.004702   0.001              79
inetorgperson           0.002533   0.008              53
ugaperson               0.005138   0.011              54
uwperson                0.005221   0.011              61
uabperson               0.007272   0.013              71
utsieduperson           0.009338   0.015              61
utmeduperson            0.009338   0.015              61
tneduperson             0.015221   0.021              73
person                  0.156449   0.162              11
top                     0.191709   0.198              4
liperson                0.397216   0.403              38
ignperson               0.441857   0.448              17
umichperson             0.481966   0.488              91
ubperson                0.637099   0.643              63

matching attributes (20 values for 21 rows, in slide order): 0, 0, 25, 7, 52, 52, 52, 47, 47, 52, 58, 58, 58, 60, 8, 2, 26, 6, 47, 34

Summary of preceding
• LSA/LSI may provide another mode of analyzing the relationship of objectClasses based on their attributes

SFA Stereoscopic Field Analyzer
• SFA: visualize high-dimensional spaces
  – Chris Shaw, College of Computing, Georgia Tech
  – SFA Windows 2000 version
  – Analyzing complex data in greater than 3D space
  – Using color, size, glyphs, vectors for additional dimensions
• General approach:
  – Tokenize schema data (use SOM prep, or LSA results) for set file
  – Set file “length” is number of vectors – objectClasses
  – Set file “Dimension” is vector length – attributes
  – Convert to binary
  – In SFA space x, y, z axes, color, glyph, etc. correspond to attributes
  – Plotted objects are the objectClasses

Stereoscopic Field Analyzer: weather data

BREAK PAGE [Live demo of prototype tool]
NOTE: Internet2 Presentation was live demo.
Next slides show a “walk through” of SFA operation
• Initial interface
• Open a data file (schema)
• Select glyph type
• Scale glyph size
• Inspect mappings (attributes matched to dimensions)
• Rotate, move 3D display volume

Stereoscopic Field Analyzer (SFA)
SFA – select data file
SFA – select glyph type
SFA – scale glyphs
SFA – data, glyph, glyph-size selected
SFA – Edit Mappings
SFA – interactively…
SFA – interactively rotate…
SFA – interactively rotate space…

Summary of preceding
• SFA provides a 3D volume in which objectClasses can be mapped
• Additional dimensions provided by color, glyphs, x-size…
• Manipulation of attribute mappings to various dimensions can highlight objectClasses containing attributes

BREAK PAGE [Live demo of prototype tool]
NOTE: Internet2 Presentation was live demo.
Next slides demonstrate multidimensionality of SFA
• Given a set of 3 attributes (cn, fullname, emailAddress) mapped to x, y, z dimensions,
• Using additional “dimensions” (color, opacity, xsize) can provide additional (reinforcing) information

91attr 86obj EDIR with sim / cn, fullname, emailAddress (x, y, z)
91attr 86obj EDIR with sim / cn, fullname, emailAddress, + color emailAddress
91attr 86obj EDIR with sim / cn, fullname, emailAddress, + color emailAddress + opacity cn
91attr 86obj EDIR with sim / cn, fullname, emailAddress, + color emailAddress + opacity cn + xsize fullname

Summary of preceding
• Using “extra” dimensions (color, opacity, x-size…) can help visualize information and relationships of objects

BREAK PAGE [Live demo of prototype tool]
NOTE: Internet2 Presentation was live demo.
Next slides show more complex visualization
• 3 initial attribute dimensions (cn, fullname, emailAddress) set;
• Adding 4th dimension (groupMembership) refines object set.
• Opening a second schema file
• Provides further opportunity to refine & compare objects.
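
The way attributes are assigned to display dimensions can be illustrated with a small sketch. It assumes, as the General approach bullets suggest, that each objectClass is a binary vector over attributes (1 if the objectClass contains the attribute). The channel names, the `to_glyph` helper, the objectClass names and the data are hypothetical; SFA's actual set-file format and interface are not described in these slides.

```python
# Hypothetical illustration only: binary objectClass-by-attribute vectors and
# a mapping from chosen attributes to display channels. Not SFA's real format.

ATTRIBUTES = ["cn", "fullname", "emailAddress", "groupMembership"]

# Toy "set file": its length is the number of vectors (objectClasses) and its
# dimension is the vector length (attributes).
SET_FILE = {
    "classA": [1, 0, 0, 0],
    "classB": [1, 1, 1, 1],
    "classC": [0, 0, 1, 0],
}

# Channel assignment mirroring the slide captions: x, y, z plus color,
# opacity and xsize reusing the same three attributes.
MAPPING = {"x": "cn", "y": "fullname", "z": "emailAddress",
           "color": "emailAddress", "opacity": "cn", "xsize": "fullname"}

def to_glyph(vector, mapping, attributes=ATTRIBUTES):
    """Return the value each display channel takes for one objectClass."""
    index = {name: i for i, name in enumerate(attributes)}
    return {channel: vector[index[attr]] for channel, attr in mapping.items()}

glyphs = {name: to_glyph(vec, MAPPING) for name, vec in SET_FILE.items()}
```

With binary vectors, mapping the same attribute to both an axis and a channel such as color makes objects that contain it stand out twice, which matches the "reinforcing" use of extra dimensions on the slides.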
91attr 86obj EDIR with sim / cn, fullname, emailAddress
91attr 86obj EDIR with sim / cn, fullname, emailAddress + groupMembership
91attr 86obj EDIR with sim / cn, fullname, emailAddress + groupMembership, open 2nd data set (497 attr, 86obj EDIR) …
91attr 86obj EDIR with sim / cn, fullname, emailAddress + groupMembership, open 2nd data set… select different glyph type
91attr 86obj EDIR with sim / cn, fullname, emailAddress + groupMembership, open 2nd data set, select different glyph type… display together
91attr 86obj EDIR with sim / cn, fullname, emailAddress + groupMembership, open 2nd data set… edit mappings 2nd data set
91attr 86obj EDIR with sim / cn, fullname, emailAddress + groupMembership, open 2nd data set… select sn, fullname, displayName, givenName, groupid
91attr 86obj EDIR with sim / cn, fullname, emailAddress + groupMembership, open 2nd data set… compare… (iterate)

Summary of preceding
• Additional dimensions can be represented by mapping attributes beyond the x, y, z axes...
• Such as using color as 4th dimension for data set 1.
• Opening of additional data set 2 with 5 dimensions (using color and opacity).
• Comparing data between data sets may provide insight.

BREAK PAGE [Live demo of prototype tool]
NOTE: Internet2 Presentation was live demo.
Next slides show various additional functions of SFA.
SFA – way cool options… still investigating what’s there & what’s needed
1,000 words…

Overall Summary
• Challenges of cross-organizational LDAP schema
• New approach to metadata:
  – monitoring, clustering, and visualization
  – identify patterns of practice
  – dynamic evolution of standards
• Semantic Facilitator TM SM tool
  – Schema repository
  – Self-Organizing Map technology
• Latent Semantic Analysis/Latent Semantic Indexing
• Stereoscopic Field Analyzer (SFA) 3D visualization

Concepts & Challenges
• Validating clustering (without recourse to “humans”…)
• Interface design and usability
• Reference sets (automated; library of; cf. my_refs…)
• Monitoring
• SOM - additional interfaces and parameters
• Genetic Algorithm: extend J. Liang Thesis work
• DirNet a la WordNet® (an online lexical reference system)
• “DNA” (Directory Node Analysis) signatures
• Generalize as knowledge engine for virtual organizations

Near Future Work
• Deploy prototype as component-based architecture
  [Architecture diagram labels: Semantic Facilitator (SF); DB Tables, ERD; Web Services; HTTP; JSP; Servlet; Shibboleth Client AuthN/Z; Browser (users)]
• Extend schema repository
• Build, validate reference sets
• LSA/LSI and SFA as “drill down” analysis components

Q&A
Contact:
Art Vandenberg, avandenberg@gsu.edu
Vijay Vaishnavi, vvaishnavi@gsu.edu
Chris Shaw, cdshaw@cc.gatech.edu
Directory Services Team, http://www.gsu.edu/~wwwacs/DSR/index.htm