Profiles Research Networking Software Users Group Meeting
http://profiles.catalyst.harvard.edu
July 20, 2012

Agenda
• Welcome to New Members
• Upcoming Events
• EAGER-Profiles
• Profiles RNS 1.0.1

Profiles Users Group Members
UCSF; Fred Hutchinson CRC; Oregon Health Sci U; UC Davis (CBST); Touro University; U Southern California; UC San Diego; Charles Drew; U Hawaii; Arizona State; Montana State; U Colorado Denver; U Nebraska-Lincoln; UW Madison; U Illinois; U Chicago; Baylor College Med; UT Southwestern; UT Houston; Jackson State (RTRN); Ohio State; Cincinnati Children’s; Case Western; U Kentucky; Vanderbilt Stem Cell; U Arkansas Little Rock; U Alabama Birmingham

Symplectic Limited (UK); McGill University (Canada); University of Cambridge (UK); Makerere University (Uganda); University of Leuven (Belgium); South-Valley University (Egypt); Elysium, Geneva (Switzerland); Beijing Normal University (China); University of the South Pacific (Fiji); Velammal Engineering College (India); Natl Sci Lib, Chinese Acad of Sci (China); Clinical & Biomedical Computing Ltd (UK); Ministério da Ciência e Tecnologia e Inovação (Brazil); Jonkoping University Engineering School (Sweden); Universidad Nacional Autonoma de Mexico (Mexico); Centre Interdisciplinaire de Nanoscience de Marseille (France)

Harvard; Univ Minnesota; Dartmouth; Univ Mass; Boston Univ; Tufts Univ; Boston VA; Rensselaer; Univ Connecticut; Univ Rochester; NYU Med Ctr; Mount Sinai Sch of Med; MedMeme; Thomas Jefferson; UPenn; Johns Hopkins; USUHS-CNRM; NIH; George Wash U; Penn State; Childrens Nat Med Ctr; Wake Forest; Leadership in Med; HSSC; Georgia Tech; Piedmont Healthcare; Emory University

University Spotlights
Harvard University           http://connects.catalyst.harvard.edu/profiles
UCSF                         http://profiles.ucsf.edu
University of Minnesota      http://profiles.ahc.umn.edu
South Carolina               http://profiles.healthsciencessc.org
UConn Health Center          http://profiles.uconn.edu
Penn State                   http://profiles.psu.edu
Wake Forest Medicine         http://profiles.tsi.wakehealth.edu
RTRN (18 RCMI Institutions)  http://rtrnprofiles.rtrn.net/profilesweb
Boston University            http://profiles.bumc.bu.edu

Upcoming Events
• 3rd Annual VIVO Conference, Miami, FL, Aug 22-24, 2012
  • “OpenSocial Workshop.” Workshop. Eric Meeks, Leslie Yuan, Anirvan Chatterjee
  • “Building better teams: innovative approaches to the design and deployment of researcher recommendation systems.” Panel. Christopher Kelleher, Griffin Weber, Melissa Haendel, Jeff Horon and Noshir Contractor
  • “Linking Disciplines: Expanding Harvard Catalyst Profiles to Discover Connections across an Entire University.” Podium Presentation. Griffin Weber and Amy Brand
• NSF Science of Science and Innovation Policy (SciSIP) PI Meeting, Washington DC, Sep 20-21, 2012
  • “EAGER-Profiles: Using researcher profiles to demonstrate the impact of investments in science.” Poster & Demonstration. Griffin Weber
• AMIA Annual Symposium, Chicago, IL, Nov 3-7, 2012
  • “Harvard Catalyst Profiles: Finding collaborators outside biomedicine.” Poster. Griffin Weber

EAGER-Profiles
• NSF #1238469. Science of Science and Innovation Policy (SciSIP)
• “EAGER-Profiles: Using researcher profiles to demonstrate the impact of investments in science”
• Prototype of a national research networking website (SciENCV)
• Profiles of computer scientists at Harvard (Profiles RNS), UCSF (Profiles RNS), U Chicago (Profiles RNS), U Florida (VIVO), U Cambridge UK (Symplectic)
• Illustrate connections between research inputs (e.g., grants & contracts) and research outputs (e.g., publications & patents)
• Computer scientists are funded by many different agencies, their research outputs take different forms (pubs, software, data, etc.), and they collaborate across many disciplines

Profiles RNS 1.0.1
• The names of many web code files and database components were changed to make them more consistent throughout the software.
• The documentation, particularly the Architecture Guide, was significantly expanded. ReadMeFirst and ReleaseNotes documents were created.
• Database performance was improved, so RDF data is returned faster, especially for profiles containing large numbers of triples.
• Default editing modules for DataType and ObjectType properties were added.
• A custom editing module was created for email addresses.
• The Search API and SPARQL API were converted to SVC files, and XSD files were created for each API.

Profiles, Networks, Connections

Website Framework

Website Framework Applications
Name      Description
Profile   Returns the RDF document for a URI.
Display   Renders a URI as HTML.
Search    Identifies all RDF nodes that have a property whose value matches a search string, then lists those nodes with links to their URIs. Faceting lets users narrow the results by type (class group) or subtype (class). Any property can be used to sort the results. Search incorporates stemming (to match different word forms), removal of stop words (e.g., “the”, “of”), and term expansion through a thesaurus (e.g., “cancer” -> “neoplasm”).
About     Displays general information about the Profiles RNS website.
SPARQL    An interface for testing the Profiles RNS SemWeb SPARQL engine. Users can enter an arbitrary SPARQL query and view the results. By default, this front-end tool is available only to administrators, though the ability to pass SPARQL queries to the SemWeb web services can remain open to the public.
Edit      Allows users to manage the content on their profiles.
Direct    Direct2Experts is a federated search tool that locates experts across multiple institutions using Profiles RNS and other research networking products.
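The three search features named in the Search row (stemming, stop-word removal, thesaurus expansion) can be illustrated with a toy matcher. This is only a sketch of the general technique, not the Profiles RNS search code: the stop-word list, the suffix rules, the single thesaurus entry, and the function names are all assumptions made for the example.

```python
# Illustrative only: a toy query matcher, not the Profiles RNS implementation.
STOP_WORDS = {"the", "of", "a", "an", "and", "in"}  # assumed sample list
THESAURUS = {"cancer": {"neoplasm"}}                # assumed sample entry
SUFFIXES = ("ing", "ed", "es", "s")                 # naive stemming rules
PUNCTUATION = '.,;:"\'()'


def stem(word: str) -> str:
    """Strip one common suffix; a stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def expand_query(query: str) -> list[set[str]]:
    """Return one set of acceptable stems per non-stop-word query term."""
    groups = []
    for raw in query.lower().split():
        term = raw.strip(PUNCTUATION)
        if not term or term in STOP_WORDS:
            continue  # stop words never constrain the match
        variants = {term} | THESAURUS.get(term, set())  # term expansion
        groups.append({stem(v) for v in variants})
    return groups


def matches(text: str, query: str) -> bool:
    """True if every query term group shares at least one stem with the text."""
    text_stems = {stem(w.strip(PUNCTUATION)) for w in text.lower().split()}
    return all(group & text_stems for group in expand_query(query))
```

With these sample rules, the query "cancer" matches a value such as "Neoplasms treated with radiation" because the thesaurus adds "neoplasm" and the stemmer reduces "neoplasms" to the same stem.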
Core Objects
Ontology, Linked Data, Nodes, Triples, Co-Authors, Extended Objects

Data Flow

Database Schematic
Social Network Analysis (Derived Data)
Faculty Publications (Disambiguated Data)
Medline, ISI Web of Knowledge, DSpace, Administrative Databases (External Data)
Schema Complexity

Database Schemas & Tables
[Profile.Cache].[SNA.Coauthor.Distance]
Schema: [Profile.Cache]
Table: [SNA.Coauthor.Distance]

Core Schemas
Schema                   Description
[Framework.]             Handles global functions, such as resolving RESTful URLs and managing scheduled jobs.
[Ontology.]              Contains the semantic web ontology used by the website.
[Ontology.Import]        Contains tools to import and process OWL files.
[Ontology.Presentation]  Contains the "presentation" ontology, which describes how content should be displayed on the website.
[RDF.]                   Contains the RDF nodes and triples specific to an instance of Profiles.
[RDF.Security]           Contains information about who can access secure/private nodes and triples.
[RDF.SemWeb]             Used to format [RDF.] data so that it can be used by the SemWeb SPARQL engine.
[RDF.Stage]              Used by the bulk data loading process to store temporary data before it is loaded into the [RDF.] tables.
[User.Account]           Contains information about authorized users of the website.
[User.Session]           Contains information about website sessions. A public user of the website will have a session even if she has not logged in and linked the session to a specific user account.
[Utility.Application]    Contains functions and procedures that are used in a variety of contexts.
[Utility.Math]           Contains mathematical lookup tables and functions.
[Utility.NLP]            Contains lookup tables and functions that support natural language processing for search and other features.

Extended Schemas
Schema               Description
[Direct.*]           Supports Direct2Experts functionality: federated search across multiple institutions using Profiles and other research networking products.
[Edit.*]             Allows users to edit profile content.
[Login.*]            Allows users to log in to the website.
[Profile.Cache]      Contains the results of bibliometric and social network analyses.
[Profile.Data]       Stores copies of certain types of RDF data in relational tables to help with data loads or to improve performance of particular kinds of queries.
[Profile.Framework]  Used by the Profile application to interact with the Framework.
[Profile.Import]     Holds person and other types of data during an initial load of Profiles RNS and in subsequent updates.
[Search.]            Provides basic search functionality for Profiles RNS.
[Search.Cache]       Improves the performance of the Profiles RNS search tool by pre-processing the RDF data through scheduled jobs.
[Search.Framework]   Used by the Search application to interact with the Framework.

Security Groups
SecurityGroupID  Label       Description
-50              Admins      Limited to a restricted set of site administrators with special access permissions to configure the website.
-40              Curators    Limited to a small number of users whose job is to manage content on the website.
-30              Harvesters  Limited to authorized automated processes that sync data between this website and other systems.
-20              Users       Limited to people who have logged in to the website.
-10              No Search   Open to the general public, but blocked to certain (but not all) search engines such as Google.
-1               Public      Open to the general public and may be indexed by search engines.
0                Undefined   Cannot be accessed by any users.

Node and Triple Tables

[RDF.].[Node]
Field              Type
NodeID             BIGINT
ValueHash          BINARY(20)
Language           NVARCHAR(255)
DataType           NVARCHAR(255)
Value              NVARCHAR(MAX)
InternalNodeMapID  INT
ObjectType         BIT
ViewSecurityGroup  BIGINT
EditSecurityGroup  BIGINT

[RDF.].[Triple]
Field              Type
TripleID           BIGINT
Subject            BIGINT
Predicate          BIGINT
Object             BIGINT
TripleHash         BINARY(20)
Weight             FLOAT
Reitification      BIGINT
ObjectType         BIT
SortOrder          INT
ViewSecurityGroup  BIGINT
Graph              BIGINT

Ontology Tables
Table                                Description
[Ontology.].[ClassGroup]             Lists top-level Class Groups for search and browse.
[Ontology.].[ClassGroupClass]        Maps Class Groups to individual RDF Classes.
[Ontology.].[ClassProperty]          Defines which RDF properties should be returned and expanded when data is requested.
[Ontology.].[ClassTreeDepth]         Contains the class hierarchy. Used by Search.
[Ontology.].[DataMap]                Maps extended schema data to the ontology.
[Ontology.].[Namespace]              Lists namespaces and their prefixes.
[Ontology.].[PropertyGroup]          Lists the broad groups of related properties.
[Ontology.].[PropertyGroupProperty]  Lists the properties within each group.

Data Flow
• Loading person data from an external (e.g., HR) source
  [Profile.Import] -> [Profile.Data] -> [Profile.Cache] -> [RDF.]
• Loading user account data from an external source
  [Profile.Import] -> [User.Account] -> [RDF.]
• Creating RDF data from an extended data table
  [Profile.Data] -> [RDF.]
• Loading data as triples
  [RDF.Stage] -> [RDF.]
• Adding new classes or properties to the ontology
  [Ontology.Import] -> [Ontology.] -> [RDF.Stage] -> [RDF.]
• Presenting the RDF data in a format that can be used by SemWeb (SPARQL)
  [RDF.] -> [RDF.SemWeb]
• Populating the search cache based on the RDF data
  [RDF.] -> [Search.Cache]

Extending Profiles RNS
1) Extend the ontology
   a) Define a namespace
   b) Define the new class in that namespace
   c) Define the new properties in that namespace
2) Import the data feed to an extended schema table
   a) Create a new extended schema table (i.e., [Profile.Data].*)
   b) Load the feed into the new table
3) Create a mapping from the new table to the ontology
4) Run ProcessDataMap to generate RDF
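
The four extension steps above can be sketched in a few lines to show how a column-to-property mapping turns table rows into triples. This is a conceptual illustration only: the example.org namespace, the Award class and its properties, the sample rows, the DataMapEntry type, and the process_data_map function are all hypothetical, and the function is a stand-in for the idea behind ProcessDataMap rather than its actual implementation.

```python
from dataclasses import dataclass

# Step 1: extend the ontology (hypothetical namespace, class, and properties).
EX = "http://example.org/ontology#"   # a) namespace (assumed)
AWARD_CLASS = EX + "Award"            # b) new class
AWARD_NAME = EX + "awardName"         # c) new properties
AWARD_YEAR = EX + "awardYear"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Step 2: rows standing in for a new extended schema table loaded from a feed.
award_rows = [
    {"AwardID": 1, "Name": "Teaching Prize", "Year": 2011},
    {"AwardID": 2, "Name": "Research Medal", "Year": 2012},
]


# Step 3: a mapping from table columns to ontology properties.
@dataclass(frozen=True)
class DataMapEntry:
    column: str     # source column in the extended table
    predicate: str  # ontology property the column populates


award_map = [DataMapEntry("Name", AWARD_NAME), DataMapEntry("Year", AWARD_YEAR)]


# Step 4: generate (subject, predicate, object) triples from the mapped rows.
def process_data_map(rows, id_column, rdf_class, data_map, base_uri):
    triples = []
    for row in rows:
        subject = f"{base_uri}{row[id_column]}"
        triples.append((subject, RDF_TYPE, rdf_class))  # type the new node
        for entry in data_map:
            triples.append((subject, entry.predicate, str(row[entry.column])))
    return triples


triples = process_data_map(
    award_rows, "AwardID", AWARD_CLASS, award_map, "http://example.org/award/"
)
```

Each row yields one type triple plus one triple per mapped column, so the two sample rows produce six triples in total.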