Research Workflow Process on the GRIDs Surface
Keith G Jeffery
President euroCRIS
Director, IT

Agenda
• Introduction
• The R&D Process: Recording
• The Key: Metadata and Data Exchange Standards
• Workflow on the GRIDs surface
• Conclusion

Research Process Workflow GRIDs © Keith G Jeffery Director, IT

Nirvana
Commonly used to indicate an optimal state of a person (professional) or system (suitable)
• Buddhism: “The ineffable ultimate in which one has attained disinterested wisdom and compassion”
• Hinduism: “Emancipation from ignorance and the extinction of all attachment”
In a euroCRIS context: the best possible CRIS system(s) for end-users, backed by the best advice

Nirvana - Retrieval
• An environment where an end-user can:
  – Request information and, through an intelligent dialogue, generate a ‘job’ which provides it
• Example (Medical R&D planning)
  – How many researchers expert in GlycoProtein gp120 and the CD4 molecule are likely to be available in 2015?
  – Classify researchers by country and institution
  – Order the list of researchers by number of refereed publications to date

Nirvana – input / update
• An environment where an end-user can:
  – Input / update information and, through an intelligent dialogue, obtain assistance where needed and validation of the input
• Example:
  – If a value is input for ‘person’, then possible valid values for ‘organisational unit’ are suggested

The Solution is Required:
• To overcome the ‘effort threshold’ to:
  – obtain the required answers from the CRIS
  – input and update the information in the CRIS
  – maintain data quality in the CRIS
• Across
  – local stand-alone CRIS
  – heterogeneous distributed CRISs
• Thus achieving ‘nirvana’

Agenda
• Introduction
• The R&D Process: Recording
• The
Key: Metadata and Data Exchange Standards
• Workflow on the GRIDs surface
• Conclusion

The R&D Process: Recording
[Diagram labels: Workprogramme, Proposal, Project, Results, Exploitation, WealthCreation; CRIS DATABASE]

The R&D Process: Feedbacks
[Diagram labels: Workprogramme, Proposal, Project, Results, Exploitation, WealthCreation; CRIS DATABASE]

The R&D Process: Review
[Diagram labels: Workprogramme, Proposal, Project, Results, Exploitation, WealthCreation; CRIS DATABASE; "review" marked four times]

The WorkProgramme Process
[Diagram labels: Economic factors; Societal factors; CRIS DATABASE; Technology Foresight; Workprogramme. Listed items: World / Country State, World / Country Models, Technology Prediction, Solicited Advice]

The Proposal Process
[Diagram labels: Idea; Review Previous Work; Objectives; Method; Resources and dependencies; Proposal; CRIS DATABASE (shown twice). Listed items: Previous Results, Previous Projects, Human Resources, Finance]

The Project Process
[Diagram labels: Project; Project Management System; CRIS DATABASE (shown twice). Listed items: Previous Results, Previous Projects, Human Resources, Finance]

The Results Process
[Diagram labels: Initial Results; Internal Review; Peer Review; Publication or Registration; Results; CRIS DATABASE (shown twice); Previous Results]

The Exploitation Process
[Diagram labels: Results; Business Plan; Finance; Marketing; Production; Selling; Exploitation; CRIS DATABASE; Marketing Information; Economic Information]

The Wealth Creation Process
[Diagram labels: Exploitation; marketing; employment; production; WealthCreation; CRIS DATABASE; Marketing Information; Economic Information]

The R&D Process: Recording
[Diagram labels: Workprogramme, Proposal, Project, Results, Exploitation, WealthCreation; CRIS DATABASE]

The R&D Process Recording: WorkProgramme
Workprogramme (recorded in the CRIS DATABASE): ProgrammeName; Funding; OrgUnit; Person responsible; Workprogramme document

The R&D Process Recording: Proposal
Proposal (recorded in the CRIS DATABASE): Title; Abstract; Person(s); OrgUnit(s); Proposal Document

The R&D Process Recording: Project
Project (recorded in the CRIS DATABASE): Title; Abstract; Person(s); OrgUnit(s); Funding; Project Plan

The R&D Process Recording: Results-Product
Results (recorded in the CRIS DATABASE): Person(s); OrgUnit(s); Project(s); Product(s); Product Description

The R&D Process Recording: Results-Patent
Results (recorded in the CRIS DATABASE): Person(s); OrgUnit(s); Project(s); Patent(s); Patent File

The R&D Process Recording: Results-Publication
Results (recorded in the CRIS DATABASE): Person(s); OrgUnit(s); Project(s); Bibliographic Information; Article

The R&D Process Recording: Exploitation
Exploitation (recorded in the CRIS DATABASE): Person(s); OrgUnit(s); Business plan; Finance Data; Marketing Data; Production Data; Sales Data

The R&D Process Recording: Wealth Creation
WealthCreation (recorded in the CRIS DATABASE): Person(s); OrgUnit(s); Annual Reports/Accounts; Employment Records; Dividends Records

The R&D Process
Note: some CRIS developers limit recording of outputs from the process to the areas indicated.
[Diagram labels: Workprogramme, Proposal, Project, Results, Exploitation, WealthCreation; Nirvana]
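
Illustrative sketch (not part of the original slides): the record types on the recording slides can be read as a small data model, and the retrieval example from the Nirvana - Retrieval slide as a query over it. The Python below is a minimal sketch under stated assumptions. Only the record kinds Person, OrgUnit and Publication come from the slides; the field names, helper functions and sample values (Lab A, Ann, P1 and so on) are hypothetical. The forecast of who will be available in 2015 is left out, since it needs a prediction model the slides do not describe.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class OrgUnit:
    name: str
    country: str


@dataclass
class Person:
    name: str
    org_unit: OrgUnit
    expertise: set[str] = field(default_factory=set)  # hypothetical field


@dataclass
class Publication:
    title: str
    authors: list[Person]
    refereed: bool


def experts(people, topics):
    """Return the people whose expertise covers every requested topic."""
    wanted = set(topics)
    return [p for p in people if wanted <= p.expertise]


def classify(people):
    """Count people per (country, institution) pair."""
    return Counter((p.org_unit.country, p.org_unit.name) for p in people)


def rank_by_refereed(people, publications):
    """Order people by number of refereed publications, highest first."""
    counts = Counter(
        author.name
        for pub in publications
        if pub.refereed
        for author in pub.authors
    )
    return sorted(people, key=lambda p: counts[p.name], reverse=True)


# Hypothetical sample data, for illustration only.
lab_a = OrgUnit("Lab A", "UK")
lab_b = OrgUnit("Lab B", "FR")
ann = Person("Ann", lab_a, {"gp120", "CD4"})
bob = Person("Bob", lab_b, {"gp120", "CD4"})
cy = Person("Cy", lab_a, {"gp120"})
pubs = [
    Publication("P1", [ann, bob], refereed=True),
    Publication("P2", [bob], refereed=True),
    Publication("P3", [ann], refereed=False),
]

found = experts([ann, bob, cy], ["gp120", "CD4"])
by_place = classify(found)
ranked = rank_by_refereed(found, pubs)
```

In this sketch the people are counted by name because dataclass instances are not hashable by default; a real CRIS would use a unique identifier instead.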

Complete Process ICT Support
• Nirvana is
  – a complete,
  – integrated,
  – end-to-end ICT support
  – for the research process
  – across heterogeneous distributed CRISs

Agenda
• Introduction
• The R&D Process: Recording
• The Key: Metadata and Data Exchange Standards
• Workflow on the GRIDs surface
• Conclusion

Metadata and Data Exchange Standards
• Metadata
  – a succinct representation of the object of interest
  – Schema, navigational, associative [descriptive, restrictive, supportive]
  – Used for rapid retrieval of navigational data to objects of interest
  – Can also be used for statistical purposes (‘how many…’, ‘average number of…’)
[Diagram labels: SCHEMA, NAVIGATIONAL, ASSOCIATIVE; view to users; constrain it; data (document)]

Metadata
• Many kinds and standards exist
• Examples include:
  – Publications: MARC, DC (Dublin Core)
  – Geospatial: CSDGM (Content Standard for Digital Geospatial Metadata)
  – Engineering: STEP
  – Education: LOM (Learning Object Metadata); EDNA (Education Network Australia metadata)

Metadata and CRISs
• Commonly a CRIS stores the metadata rather than the object itself
  – e.g. result_publicationId, which can be used to access the publication itself (person{author}, title, abstract etc. are usually stored in the CRIS)
  – e.g.
projectId, which can be used to access the detailed project documentation (title, abstract etc. are usually stored in the CRIS)

Metadata: DCf: Publications Domain of CERIF
[Diagram labels: Project, Person, OrgUnit; Descriptive, Restrictive, Navigational; UniqueId, Title, Security, Subject, Privacy, Keywords, Quality Assessment, AccessLevel, Description, Charge, Resource Type, Annotation, Coverage Temporal, Classification, Coverage Spatial, ResourceIdentifier]

Metadata in CRISs
• Used for
  – Quality: validation on input / update
  – Summarising: overview results
  – Retrieval speed (find the list of objects of potential interest)
  – Controlling access
  – Rights management
  – And…

Metadata in Interoperating CRISs
• Metadata is essential to allow interoperation of CRISs, especially heterogeneous distributed CRISs
• It provides the information needed to set up retrieval (or update) automatically over heterogeneous CRISs
  – Catalog technique
  – Universal schema technique(s)
  – Knowledge-based reconciliation technique(s)

Metadata and Data Exchange Standards
• Data Exchange Standards
  – Needed not just for data (file) exchange
  – Also for returning the results of a retrieval from one CRIS to another in a form (syntax, semantics) that is processable
• Metadata plus dataset
  – Note: data exchange standards are used extensively in e-business, banking, insurance, medical, engineering and research areas

The Key: Metadata and Data Exchange Standards
• Nirvana is
  – Formal metadata (machine understandable)
  – Query: metadata describing CRIS resources to improve queries
  – Answer: metadata attached to query result files (data exchange) so the receiving CRIS or user can understand the output
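
Illustrative sketch (not part of the original slides): the "metadata plus dataset" idea can be pictured as a query result that travels with a description of its own fields, so the receiving CRIS can check and interpret it. The Python below is a minimal sketch under stated assumptions. The JSON envelope layout, the function names and every field except result_publicationId are hypothetical, and this is not a rendering of CERIF or of any actual exchange standard.

```python
import json

# Hypothetical field-level metadata: syntax (type) and semantics (meaning).
FIELD_METADATA = {
    "result_publicationId": {
        "type": "string",
        "meaning": "identifier used to access the publication itself",
    },
    "title": {"type": "string", "meaning": "publication title"},
    "refereed": {"type": "boolean", "meaning": "true if peer reviewed"},
}

# Mapping from the declared types to Python types (an assumption of this sketch).
PYTHON_TYPES = {"string": str, "boolean": bool}


def package_result(rows, field_metadata):
    """Sending CRIS: attach the metadata to the dataset before exchange."""
    return json.dumps({"metadata": field_metadata, "dataset": rows})


def read_result(payload):
    """Receiving CRIS: use the attached metadata to check every row."""
    message = json.loads(payload)
    metadata = message["metadata"]
    for row in message["dataset"]:
        for name, value in row.items():
            if name not in metadata:
                raise ValueError(f"undescribed field: {name}")
            expected = PYTHON_TYPES[metadata[name]["type"]]
            if not isinstance(value, expected):
                raise ValueError(f"{name} should be {metadata[name]['type']}")
    return message["dataset"]


# Hypothetical query result, for illustration only.
rows = [
    {"result_publicationId": "pub-1", "title": "Example A", "refereed": True},
    {"result_publicationId": "pub-2", "title": "Example B", "refereed": False},
]
payload = package_result(rows, FIELD_METADATA)
received = read_result(payload)
```

The same field descriptions could also drive validation on input or update, which is the "Quality" use of metadata listed above.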

Agenda
• Introduction
• The R&D Process: Recording
• The Key: Metadata and Data Exchange Standards
• Workflow on the GRIDs surface
• Conclusion

Workflow on the GRIDs surface
• GRIDs ‘surface’ provides
  – Computational capabilities of GRID
  – Information presentation capabilities of WWW
  – Information management capabilities
• But not yet an environment for workflow

Data to Knowledge: The GRIDs Architecture
[Diagram labels: Knowledge Layer; Information Layer; Computation / Data Layer]

Data to Knowledge: The GRIDs Architecture
[Diagram labels: E-Business Application; Environmental Application]

A POSSIBLE ARCHITECTURE
The GRIDs Environment
[Diagram labels: U: USER; Um: User Metadata; Ua: User Agent; S: SOURCE; Sm: Source Metadata; Sa: Source Agent; R: RESOURCE; Rm: Resource Metadata; Ra: Resource Agent; brokers]

A Brief History of GRIDs
• 1G: custom-made architecture machines to user
  – Pioneering metacomputing
• 2G: proprietary standards and interfaces
  – I-WAY, GLOBUS, UNICORE, CONDOR, LEGION, AVAKI
• 2.5G: added in FTP, SRB, LDAP, AccessGRID
• 3G: adopted W3C concepts for open interfaces
  – OGSA / OGSI: note especially OGSA/DAI
  – But built on 2.G foundations
[Side labels: e-Science Apps; e-Science R&D]

But…
• This comes nowhere near the requirements as originally defined for GRIDs
• Too low-level (programmer level, not end-user level)
  – Insufficient representativity
  – Insufficient expressivity
  – Insufficient resilience
  – Insufficient dynamic flexibility

So…
• The US GRID is metacomputing plus extensions
  – In 2002 improved with OGSA using W3C Web Services ideas
• The European position is that GRID architecture (GLOBUS or even UNICORE) is the wrong starting point for the European vision

And…
• EC persuaded of importance of GRIDs
  – Started in IST/Environment (early 2000) with IT architectural framework for FP6 projects
  – Set up GRID Unit under Wolfgang Boch (late 2002)
• January 2003: large workshop (GRID Unit)
  – (~ 240 participants)
  – Keynotes:
    • Thierry Priol (INRIA, FR)
    • Domenico Laforenza (CNR, IT)
    • Keith Jeffery (CCLRC, UK)

NGG Requirements
• Transparent and reliable
• Open to wide user and provider communities
• Pervasive and ubiquitous
• Secure and provide trust across multiple administrative domains
• Easy to use and to program
• Persistent
• Based on standards for software and protocols
• Person-centric
• Scalable
• Easy to configure and manage
[Callout: 2.5G or even 3G GRID basically meet none of these]

NGG
• NGG1: 200301-200306
  – Brought together visionary experts
  – Defined properties required and research agenda to achieve them
• NGG2: 200401-200407
  – Updated NGG1 vision in the light of funded projects and evolving requirements and technology
• NGG3: 200509
• http://www.cordis.lu/ist/grids/pub-report.htm

GRIDs Vision and Requirements (1)
• a user interacts with the GRIDs environment intelligently
• such that the GRIDs environment proposes a 'deal' to the end-user to satisfy her request
• which the user can then decide to execute, involving multiple resources of computation, information, detectors (for new data collection), interactions with other users through various communication devices etc.

GRIDs Vision and Requirements (2)
• interoperation as a seemingly homogeneous 'surface' over a range of devices from smart dust through detectors to embedded systems (including controllers), handhelds, laptops, desktops, departmental servers, corporate servers and supercomputers.
• the 'surface' depends on self-* (self-managing, self-repairing, self-tuning...) capability across arbitrary and dynamic collections of (large numbers of) nodes to give scalability, performance, reliability, access, security, privacy and other features.

NGG1
• NGG1 Properties Required:
  – Transparent and reliable
  – Open to wide user and provider communities
  – Pervasive and ubiquitous
  – Secure and provide trust across multiple administrative domains
  – Easy to use and to program
  – Persistent
  – Based on standards for software and protocols
  – Person-centric
  – Scalable
  – Easy to configure and manage

Call2 (NGG1) Projects Funded
• GRIDCOORD: Building the ERA in Grid research
• inteliGRID: Semantic Grid based virtual organisations
• SIMDAT: Grid-based generic enabling application technologies to facilitate solution of industrial problems
• K-WF Grid: Knowledge based workflow & collaboration
• UniGridS: Extended OGSA implementation based on UNICORE
• HPC4U: Fault tolerance, dependability for Grid
• NEXTGRID: EU-driven Grid services architecture for business and industry
• OntoGrid: Knowledge services for the semantic Grid
• AKOGRIMO: Mobile Grid architecture and services for dynamic virtual organisations
• DataminingGrid: Datamining tools & services
• COREGRID: European-wide virtual laboratory for longer term Grid research, creating the foundation for the next generation Grids
• Provenance: Provenance for Grids
Figure 1: The Call 2 Projects as a ‘house’

NGG2 SWOT(1)
•
Ontologies and semantic web technologies will be crucial to provide scalable support for complex, heterogeneous Grids middleware and applications.
• The strengths of the European telecommunications industry and the diversity of its market for electronic control systems have given Europe a leading position in the areas of mobile and embedded technology. This is of particular relevance for the realization of the vision of a Grid as a pervasive, user-centered utility.
• The weakness in hardware and primary software products (e.g. commodity processors, server and desktop operating systems, programming languages, etc.) may hamper the development of a European leadership in Grids technologies.

NGG2 SWOT(2)
• The convergence between Grids and Web Services provides a significant opportunity to move to a model of software development and service provision where the market dominance of particular OS vendors is no longer a major economic issue.
• The distinctive European vision of a Grids environment that operates from the level of devices to supercomputers, to serve communities ranging from individuals to whole industries, including data, information and knowledge and emphasizing resilience and scalability, could have a significant economic and social impact far beyond the scope of existing compute and data Grids. This should be contrasted with the North American Grid vision of programmer-level metacomputing.
• It is vital that any European vision for the evolution of Grids is accompanied by a clear representation of that vision to the key standards bodies and technology providers worldwide.

NGG2 Recommendations
• (a) development of a design for a new operating system that provides a fault-tolerant, scalable, self-healing, self-managing environment upon which Grids service middleware may ‘sit’;
• (b) development of Grids foundations middleware suitable both for enhancing existing operating systems and for inclusion within (a);
• (c) development of Grids service middleware in a modular fashion allowing applications to utilise those services they require;
• (d) research and development in computer science and information technology required to accomplish (c), (b) and (a), notably new models and software for transactions and messaging; for scheduling, resource management and optimisation; for trust, security and privacy; for data, information and knowledge management; for software development and deployment including mobile code; and for intelligent and appropriate user interfaces and device interfaces;
• (e) development of novel applications that are wealth-creating or improve the quality of life, particularly in the e-business domain, but also in e-health, e-environment, e-culture, e-science, e-government;

NGG2
[Diagram labels: Application A, Application B, Application C; Grids Middleware Services Needed for A, for B, for C; Grids Foundations for Operating System X; Grids Foundations for Operating System Y; Operating System X; Operating System Y; Grids Operating System (including Foundations); Modular and dynamically loadable]

Workflow on the GRIDs Surface
• Nirvana is
  – GRIDs ‘surface’
    • Providing computation, information presentation and information management
  – Plus Self* resilience
  – Plus capabilities to support workflow

Agenda
• Introduction
• The R&D
Process: Recording
• The Key: Metadata and Data Exchange Standards
• Workflow on the GRIDs surface
• Conclusion

Overall: The Way Forward
[Diagram, built up over successive slides. Labels: SCIENTIFIC DATASETS; PUBLICATIONS; CRIS; Management of Research; CDR (CERIF); Data, Information, Knowledge; Portal with knowledge-assisted user interface; Digital Curation Facility; metadata; publish; validate; Ambient, Pervasive Access; GRIDs]

Three Steps to Nirvana
The Perfect CRIS
• Workflow on the GRIDs Surface
• Metadata and Data Exchange Standards
• Complete Process ICT Support

Prof. Keith G Jeffery
Director, Information Technology
Head, Business & Information Technology Department
CCLRC Rutherford Appleton Laboratory
k.g.jeffery@rl.ac.uk
http://www.bitd.clrc.ac.uk/