AGENDA NOTES
Synergies - Technical Committee Meeting
Rm. 2200, SFU Harbour Centre
515 West Hastings Street, Vancouver, B.C.
10:00 am - 4:00 pm

Attendance: Tim Au Yeung, Rea Devakos, Martin Boucher, Mike Nason (replacement for Lisa C), Jason Nugent, Erik Moore, James Kerr, Andrea Kosivic, Matt Crider, Brent Jubinville, Kevin Stranack, James MacGregor, Mark Jordan, Mary Westell, Juan Pablo Alperin, MJ Suhonos, Alec Smecher, Brian Owen

1. New content (journals) to harvest for the 1st Release (to be launched beginning of October)
- alpha version of the Synergies platform has been released.
- lots of positive feedback has already been received from the recent CanFed Conference and other sources during the past few months.
- UMontreal is currently in the process of acquiring the central hardware for the national platform.
- Beta version was also released on July 2nd.
- another major (full) harvest of journal content is scheduled for the end of September, prior to the release of Synergies 1.0.

2. Questions, problems, etc. for the preparation of data in NLM (journals)
- two general categories: DTD compliance and related “technical” issues; and content and “editorial” issues. The latter will perhaps be best addressed by preparing guidelines to be shared with journal editors.
- Question: will it be possible to undertake some test harvesting in advance for specific journals? Yes. Related discussion on the staging of testing and the distribution of OJS patches to address DTD and other technical issues. SFU will want to test the OJS fixes before distributing them to other regional nodes that require them. MB indicated that the ALPHA environment could serve as a test platform.
- Question: what are the plans for automating the harvesting process? There are some limitations imposed by the current installation and use of Fedora, but there are plans to move in this direction, especially after the 1.0 release.
- Question: what about incremental versus full harvesting; are there plans to test this? UNB might be a good source for testing some of this with their journals. What about corrections to issues that have already been harvested? Will need to confirm that the OJS harvester plug-in handles this, in addition to the full end-to-end process. Another good thing to test as part of incremental harvesting.

3. Long term preservation
- Tim Au Yeung reviewed the Digital Preservation Infrastructure Analysis 1.1 document (Feb. 20, 2009).
- where possible we should leverage existing tools and standards. Best practices and standards are still emerging.
- moved on to the Synergies Preservation Implementation Plan 1.0 document (April 23, 2009). Key tasks: LOCKSS, PREMIS, persistent ID system.
- Synergies Preservation Development Plan More (July 7, 2009) just distributed by email. At a high level, OJS and Erudit will be able to export a platform-neutral, archival preservation dataset (NLM + PDFs) along with associated metadata (PREMIS). Noted that LOCKSS may not scale well for large datasets. A preservation portal will be developed to manage and monitor activities. The final piece is a preservation migration system.
- Tim will prepare a Gantt chart to provide a summary overview of timelines and relationships amongst major tasks.
- Question: what about post-Synergies implications?
- Question: what about workflow-related content from systems like OJS? It is difficult to find the balance between the final version of the item and everything associated with it from inception to publication.
- Question: would it be helpful to agree upon a definition of preservation data for Synergies purposes? Related to this is the question of scoping development activities: what is achievable? The Technical Committee agreed that concentrating on the final item (or facsimile), from the end-user accessibility/representation perspective, is feasible for preservation purposes, i.e. focus on LOCKSS and PREMIS implementations for OJS and Erudit. A full preservation platform that addresses the full gamut of preservation requirements should be left to a post-Synergies phase.
- Question: are we just scoping to journals for the initial implementation? Yes, although it is necessary to address other formats, perhaps as watching briefs on emerging best practices. There may also be some existing options, e.g. DSpace, suitable for other formats such as theses.
- LOCKSS alliance issues, esp. governance and financial implications. The Prairie node will investigate this and develop a proposal, including minimum infrastructure requirements. Mark Jordan will also work with the Prairie node on this. To be completed by the end of August.

4. Statistics – Discussion on the implementation of COUNTER/SUSHI
- Andrea K. and James M. summarized the results of the statistics survey. The most popular measures (visitors by geographic region, unique visitors, number of downloads) are currently covered by AWStats and COUNTER support.
- some issues: what is the best way to consolidate statistics between the regional and national nodes? Alec suggested that Google Stats [Urchin] might be a solution for individual journals. Is it necessary to single out Synergies-only statistics, and who would be interested in them: journal editors, internal Synergies, CFI?
- Question: should we scope by focusing on COUNTER/SUSHI-based statistical capabilities, even though they are far from ideal or complete with respect to high-priority requirements? Are SSHRC statistical reporting requirements the most suitable way to define Synergies requirements?
- Question: do the OJS and Erudit platforms have sufficient flexibility or plug-in capabilities to support multiple statistical requirements/formats?
- next steps: finalize survey analysis; basic support for COUNTER/SUSHI in OJS and Erudit platforms; determine ways to consolidate statistics from both regional and national nodes; clarify CFI and SSHRC reporting requirements.
  The Statistics WG will prepare a summary of these and distribute it.
- Rea indicated that UT was willing to undertake development work in this area.

5. Persistent Identifier
- Tim Au Yeung reviewed the Meta-Review and Recommendations 0.6.0 document (Nov. 18, 2008).
- two recommended PID schemes: HANDLE, ARK
- second document: Persistent Identifier System requirements (one pager, no date).
- ARK proposed for reasons of technical expediency. It also appears to be more suitable for article linking requirements. HANDLE would be more difficult to implement.
- Question: how widely has ARK been implemented? Not widely, although both the California Digital Library and LAC are working on an implementation.
- Question: wasn’t there supposed to be a survey of the extent of current support for persistent identifiers at regional nodes? It hasn’t been done.
- Question: weren’t the evaluation criteria supposed to be re-applied to determine whether the scoring changed?
- discussion ensued around the completeness of the current report and whether it reflected the current proposal, i.e. support two schemes, but first move forward with an ARK prototype. It was also suggested that prototyping of HANDLE be included in the timeline.
- Question: is it up to each regional node to select either ARK or HANDLE as the PID they will use?
- Question: what are the primary uses of a PID in Synergies? Preservation, and shuffling content back and forth between regional and national nodes.
- Question: what do we want to evaluate during the prototyping phase, i.e. revised criteria that more accurately reflect Synergies requirements?
- will proceed to a prototyping phase with a focus on ARK and HANDLE; evaluation criteria will be revisited and re-applied during this phase, with a final recommendation and associated “papertrail” based on the outcome. Also incorporate the article linking WG activities into this group as a key requirement/criterion.

6. Basic footer for regional sites (proposition for discussion)
- only applies to the regional node sites, not individual sites.
- discussion on the intent and actual use of the regional node sites. What will they be used for? Who will use them? Do we focus on access to content, or on information about Synergies and how to participate?
- there will be search options at the Synergies head node to expand or refine on the basis of regions, i.e. collections. Could this be incorporated as a Synergies federated search box that could also be implemented at the regional or individual level?
- ask for direction from the Steering Committee: should the regional node sites focus primarily on information about Synergies? This would simplify the branding requirements significantly, especially for Erudit. Or should they focus on federated search and similar functions to provide access, via regional and individual journal sites, to the full range of Synergies content?

7. Data model of exchange format for theses, books, proceedings and preprints (content of repositories)
- Presentation of the Synergies Document Unit schema (Martin)
- Martin B. reviewed a PowerPoint presentation on a proposed Synergies schema. NLM does not support all of the services to be implemented by Synergies.
- unit = document.
- will be used internally by the Synergies Head platform. Could also interoperate with other exchange formats. All tags are multilingual.
- OJS has focused on external standards, e.g. ETD for theses.
- the schema has implications for long-term preservation.
- the schema will be considered by the Data Models WG as they move on to DTD (or XSD) requirements for other formats, e.g. monographs. What should be the next focus for content types? Potential candidates include: theses, conference proceedings, monographs, data sets. This should be discussed at the full Technical WG to get it started.

8. User testing for beginning of 2010 (online and focus group)
- key areas of investigation, what do we want to measure, who will work on this activity, etc.
- Martin proposed a working group to look into this and asked for volunteers. The group will commence in early Fall.
- each regional node will check whether it has appropriate resources it could pull into this endeavour and report back at the next Synergies TC conference call.

9. Preliminary discussion on development milestones for next 12 – 18 months
- Martin and Brian will prepare a summary from meeting notes.

10. Other
- UMontreal has licenses for various software tools, including a Confluence wiki. A document repository wiki will be implemented by UMontreal. They will also look after migrating documents from the existing SharePoint wiki at UCalgary.

:synergies.tc.agenda.090707