synergies.tc.agenda.notes.090707.do... - CEN-R

AGENDA NOTES
Synergies - Technical Committee Meeting
Rm. 2200, SFU Harbour Centre
515 West Hastings Street, Vancouver B.C.
10:00 am - 4:00 pm
Attendance: Tim Au Yeung, Rea Devakos, Martin Boucher, Mike Nason (replacement for Lisa C), Jason
Nugent, Erik Moore, James Kerr, Andrea Kosivic, Matt Crider, Brent Jubinville, Kevin Stranack, James
MacGregor, Mark Jordan, Mary Westell, Juan Pablo Alperin, MJ Suhonos, Alec Smecher, Brian Owen
1. New content (journals) to harvest for the 1st Release (to be launched beginning of October)
- alpha version of Synergies platform has been released.
- lots of positive feedback has already been received from the recent CanFed Conference and other
sources during the past few months.
- UMontreal currently in the process of acquiring the central hardware for the national platform.
- Beta version was also released on July 2nd.
- another major (full) harvesting of journal content is being scheduled for the end of September, prior to
the release of Synergies 1.0.
2. Questions, problems, etc. for the preparation of data in NLM (journals)
- two general categories: DTD compliance and related “technical” issues; also content and “editorial”
issues. The latter will perhaps be best addressed by the preparation of guidelines to be shared with
journal editors.
- Question: will it be possible to undertake some test harvesting in advance for specific journals? Yes.
Related discussion on staging of testing and distribution of OJS patches to address DTD and other
technical issues. SFU will want to test the OJS fixes before distributing to other regional nodes that
require it. MB indicated that the ALPHA environment could serve as a test platform.
- Question: what are the plans for automating the harvesting process? Some limitations are imposed by the
current installation and use of Fedora, but there are plans to move in this direction, especially after the
1.0 release.
- Question: what about incremental versus full harvesting, are there plans to test this? UNB might be a
good source to test some of this with their journals. What about corrections to issues that have already
been harvested? Will need to confirm that the OJS harvester plug-in handles this in addition to the full
end-to-end process. Another good thing to test as part of incremental harvesting.
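The difference between full and incremental harvesting, and why corrections to already-harvested issues need testing, can be illustrated with a minimal sketch. It is not the OJS harvester plug-in's logic. The `Record` class, the `select_for_harvest` function and the sample dates are hypothetical, and the sketch assumes each record carries a last-modified timestamp that the source system updates when an article is corrected.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass
class Record:
    """Hypothetical stand-in for one harvested article record."""
    identifier: str
    last_modified: datetime


def select_for_harvest(records: Iterable[Record],
                       last_harvest: Optional[datetime]) -> List[Record]:
    """Return the records a harvest run should fetch.

    last_harvest=None means a full harvest: every record is returned.
    Otherwise only records modified after last_harvest are returned.
    A corrected record is picked up only if the source system updated
    its last_modified value (an assumption of this sketch).
    """
    if last_harvest is None:
        return list(records)
    return [r for r in records if r.last_modified > last_harvest]


records = [
    Record("article-1", datetime(2009, 6, 1)),  # unchanged since last harvest
    Record("article-2", datetime(2009, 7, 5)),  # corrected after last harvest
    Record("article-3", datetime(2009, 7, 6)),  # newly published
]

full = select_for_harvest(records, None)
incremental = select_for_harvest(records, datetime(2009, 7, 2))
print([r.identifier for r in incremental])
```

If a correction does not change the timestamp, an incremental run of this kind would miss it, which is one reason to test corrections alongside incremental harvesting.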
3. Long term preservation
- Tim Au Yeung reviewed the Digital Preservation Infrastructure Analysis 1.1 document (Feb. 20, 2009).
- where possible we should leverage existing tools and standards. Best practices and standards are still
emerging.
- moved on to Synergies Preservation Implementation Plan 1.0 document (April 23, 2009). Key tasks:
LOCKSS, PREMIS, persistent ID system.
- Synergies Preservation Development Plan More (July 7, 2009) just distributed by email. At high level,
OJS and Erudit will be able to export a platform-neutral, archival preservation dataset (NLM + PDFs)
along with associated metadata (PREMIS). Noted that LOCKSS may not scale well for large datasets.
A preservation portal will be developed to manage and monitor activities. The final piece is a preservation
migration system.
- Tim will prepare a GANTT chart to provide a summary overview of timelines and relationships amongst
major tasks.
- Question: what about post-Synergies implications?
- Question: what about workflow-related content from systems like OJS? A difficult area to find the
balance between the final version of the item and everything associated with it from inception to
publication.
- Question: would it be helpful to agree upon a definition of preservation data for Synergies purposes?
Related to this is a question of scoping development activities and what is achievable? The Technical
Committee agreed that concentrating on the final item (or facsimile) from the end-user
accessibility/representation perspective is feasible for preservation purposes, i.e. focus on
LOCKSS and PREMIS implementations for OJS and Erudit. A full preservation platform that
addresses the full gamut of preservation requirements should be left as a post-Synergies phase.
- Question: are we just scoping to journals for the initial implementation? Yes. Although it is necessary to
address other formats, perhaps as watching briefs on emerging best practices. There may also be some
existing options, e.g. DSpace, suitable for other formats such as theses.
- LOCKSS alliance issues, esp. governance and financial implications. The Prairie node will investigate
this and develop a proposal including minimum infrastructure requirements. Mark Jordan will also work
with the Prairies node on this. By end of August.
4. Statistics – Discussion on the implementation of COUNTER/SUSHI
- Andrea K. and James M. summarized the results of the statistical survey. The most popular statistics
(visitors by geographic region, unique visitors, number of downloads) are currently covered by AWStats
and COUNTER support.
- some issues: what is the best way to consolidate statistics between the regional and national nodes?
Alec suggested that Google Stats [urchin] might be a solution for individual journals. Is it necessary to
single out Synergies-only statistics and who would be interested in them – journal editors, internal
Synergies, CFI?
- Question: should we scope by focusing on COUNTER/SUSHI based statistical capabilities, even though
it is far from ideal or complete with respect to high priority requirements? Are SSHRC statistical reporting
requirements the most suitable way to define Synergies requirements?
- Question: do the OJS and Erudit platforms have sufficient flexibility or plug-in capabilities to support
multiple statistical requirements/formats?
- next steps: finalize survey analysis; basic support for COUNTER/SUSHI in OJS and Erudit platforms;
determine ways to consolidate statistics from both regional and national nodes; clarify CFI and SSHRC
reporting requirements. Statistics WG will prepare a summary of these and distribute.
- Rea indicated that UT was willing to undertake development work in this area.
5. Persistent Identifier
- Tim Au Yeung reviewed Meta-Review and Recommendations 0.6.0 document (Nov.18, 2008).
- two recommended PID schemes: HANDLE, ARK
- second document Persistent Identifier System requirements (one pager, no date).
- ARK proposed for reasons of technical expediency. Also appears to be more suitable for article linking
requirements. HANDLE would be a more difficult one to implement.
- Question: how widely has ARK been implemented? Not widely, although both California Digital Library
and LAC are working on an implementation.
- Question: wasn’t there supposed to be a survey of extent of current support for persistent identifiers at
regional nodes? It hasn’t been done.
- Question: weren’t the evaluation criteria supposed to be re-applied to determine whether this changed
the scoring?
- discussion ensued around the completeness of the current report and whether it reflected the current
proposal, i.e. support two schemes, but first move forward with an ARK prototype. Also suggested that
prototyping of HANDLE be included in the timeline.
- Question: is it up to each regional node to select either ARK or HANDLE as the PID they will use?
- Question: what are the primary uses of a PID in Synergies? Preservation, shuffling content back/forth
between regional and national nodes.
- Question: what do we want to evaluate during the prototyping phase, i.e. revised criteria that more
accurately reflect Synergies requirements?
- will proceed to a prototyping phase with a focus on ARK and HANDLE; evaluation criteria will be
revisited and re-applied during this phase, with a final recommendation and associated “paper trail” based
on the outcome. Also incorporate the article linking WG activities as a key requirement/criterion into this
group.
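As background for the ARK prototype, an ARK generally takes the form ark:/NAAN/Name, optionally followed by a qualifier, where the NAAN is the number identifying the assigning organization. The sketch below shows how such an identifier could be picked out of a URL regardless of which host serves it. The pattern is deliberately simplified, and `parse_ark`, the NAAN `12345` and the host `example.org` are illustrative assumptions, not values from any Synergies document.

```python
import re
from typing import NamedTuple, Optional

# Simplified pattern: "ark:" (optionally followed by "/"), a numeric NAAN,
# a name, and an optional qualifier path. Real ARK syntax has more rules.
ARK_PATTERN = re.compile(
    r"ark:/?(?P<naan>\d{5,})/(?P<name>[^/?#]+)(?P<qualifier>/[^?#]*)?"
)


class Ark(NamedTuple):
    naan: str       # number identifying the assigning organization
    name: str       # identifier assigned by that organization
    qualifier: str  # optional sub-part, empty string if absent


def parse_ark(text: str) -> Optional[Ark]:
    """Extract the ARK components from a string, or return None."""
    match = ARK_PATTERN.search(text)
    if match is None:
        return None
    return Ark(match.group("naan"),
               match.group("name"),
               match.group("qualifier") or "")


# Hypothetical example values, for illustration only.
ark = parse_ark("http://example.org/ark:/12345/abc678/page2")
print(ark)
```

Because the identifier is recognized independently of the hostname, the same ARK could in principle be resolved at either a regional or the national node, which matches the stated use of moving content back and forth between them.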
6. Basic footer for regional sites (proposition for discussion)
- only applies to the regional node sites, not individual sites.
- discussion on the intent and actual use of the regional node sites. What will they be used for? Who will
use them? Do we focus on access to content, or on information about Synergies and how to participate?
- there will be search options at the Synergies head node to expand or refine on the basis of regions, i.e.
collections. Could this be incorporated as a Synergies federated search box that could also be
implemented at the regional or individual level?
- ask for direction from Steering Committee: should the regional node sites focus primarily on information
about Synergies? This will simplify the branding requirements significantly, especially for Erudit. Should
we focus on federated search and similar functions to provide access via regional and individual journal
sites to the full range of Synergies content?
7. Data Model of exchange format for thesis, books, proceedings and preprints (content of
repositories) - Presentation of the Synergies Document Unit schema (Martin)
- Martin B. reviewed a PowerPoint presentation on a proposed Synergies schema. NLM does not support
all of the services to be implemented by Synergies.
- unit = document.
- will be used internally by the Synergies Head platform. Could also interoperate with other exchange
formats. All tags are multilingual.
- OJS has focused on external standards, e.g. ETD for theses.
- the schema has implications for long-term preservation.
- the schema will be considered by the Data Models WG as they move on to DTD (or XSD) requirements
for other formats, e.g. monographs. What should be the next focus for content types? Potential
candidates include: theses, conference proceedings, monographs, data sets. This should be discussed
at the full Technical WG to get it started.
8. User testing for beginning of 2010 (online and focus group)
- key areas of investigation: what do we want to measure?
- who will work on this activity, etc.
- Martin proposed a working group to look into this and asked for volunteers. The group will commence in
early Fall.
- each regional node will check if they have some appropriate resources they could pull into this
endeavour and report back at the next Synergies TC conference call.
9. Preliminary discussion on development milestones for next 12 – 18 months.
Martin and Brian will prepare a summary from meeting notes.
10. Other
- UMontreal holds licenses for various software tools, including a Confluence wiki. A document repository wiki
will be implemented by UMontreal. They will also look after migrating documents from the existing
SharePoint wiki at UCalgary.