ePrints.FRI – a case study
Open Access Repositories with EPrints
EIFL-FOSS and EIFL-OA free online workshop
23 May 2011
Miha.Peternel@fri.uni-lj.si

Overview
• Context
• Why EPrints?
• Installation & configuration
• Case Study: Organization & service implementation @ FRI
• Case Study: Submission policy
• Measuring success
• Conclusion

ePrints.FRI
• ePrints.FRI is the publications database of the Faculty of Computer and Information Science, University of Ljubljana
• It is based on open source ePrints with modifications
• It was started in 2002 using ePrints 2; it now runs ePrints 3
• It is integrated into the Open Archives Initiative (OAI) indexer network
• It was the first OAI archive in the wider region
• It is integrated into the existing faculty web infrastructure
• In 2008 it became the official digital repository for all student theses and dissertations
• It currently hosts 995 student works, 155 scientific works, plus some books and other teaching materials

Context
• ePrints.FRI is an effort of the Faculty of Computer and Information Science
• There is a separate University Library
• Most of the material is also catalogued in COBISS (metadata only)
[Diagram: UL (University of Ljubljana), FRI (Faculty of Computer and Information Science), COBISS (national catalog), DIKUL (University Library), ePrints.FRI]

Goal
• Initial goal (volunteer effort, 2002)
  • Provide a simple self-archiving tool for the laboratory, supporting the Open Archives Initiative (prof. Franc Solina)
  • Try to deploy it Faculty-wide
• Revised goal (institutional effort, 2008)
  • Fulfill the national directive on thesis publishing
  • Revise policies for digital publishing
  • Try to provide a single point of metadata entry

Why did we choose ePrints 2 in 2002?
• Open Archives Initiative (OAI)
• Web publishing of primarily scientific output and teaching materials
• Web-based interface
• Open source on standard Linux servers (RedHat)
• Highly customizable metadata
• Multilingual metadata
• The other OAI alternative in 2002 was not as customizable

Why did we choose ePrints 3 again in 2008?
• Mandatory publishing of student output (theses, dissertations)
• Improved metadata customization
• Improved workflow customization
• Modernized web user interfaces
  • CSS, help, auto-complete, preview…
• Fulfilled all our customization requirements
• Alternatives did not offer any advantages for our needs
• No security or performance issues with ePrints 2
• Easy import of metadata (XML) and documents (PDF, DOC)
• Support from the ePrints wiki and mailing list

Installing ePrints
Test installation:
• Pick your favourite OS (we prefer Debian Linux)
• Install Apache web server, MySQL, Perl
• Download & install the EPrints package for your OS
• Add a few more Perl packages that EPrints requires
• Set up a test archive by running the configuration scripts
• Optionally install some more tools that EPrints can use
Pre-production installation:
• Configure and finalize metadata before you start adding documents
• Configure your server
• Virtual server has benefits

Customizing ePrints
• All the source code is available
• A collage of Perl, XML, XHTML and structured text files
• The code base is modular
• The Perl code stubs that are expected to be modified are exposed in archive configuration directories
• Most other configuration is done by modifying XML/XHTML
• Some hacking of base Perl code or additions to code may be required for minor fixes (custom import/export, OAI language preferences…) or other special needs
• Each new version is more customisable out of the box
• Wiki documentation is extensive but not always up to date

Customizing ePrints 3 in practice
• Code and libraries – Perl
• Metadata definition – Perl
• Subject hierarchy and departments – text or XML
• Apache web server – conf files
• Workflow – XML
• Interface language – XML and some Perl
• OAI export – Perl, text
• Automation scripts – Linux crontab, some PHP
• Autocomplete – text, PHP
• Custom views – Perl, XML, XHTML
• Custom references – XHTML, some Perl

OAI configuration
• Enter policy information
• (Re)configure metadata mapping
• Enable & test (via web interface)
• Language prioritization is an issue with multilingual metadata
  • We rewrote some code

Developmental phases
• ePrints 2 – 1 person effort
  • Basic customization – less than 1 month
  • Internal testing – 1 laboratory
  • Dedicated server & multilingual debugging – 1 month
• ePrints 3 – institutional effort
  • Organized process

Developmental phases – ePrints 3
• Institutional planning
• Metadata definition
• Customization
• Translation
• Staff education
• Testing
• Migration of existing publications from ePrints 2
• Initial deployment
• Workflow facilitation
• Statistics

Workgroup staff
• Workgroup manager: prof. Mira Trebar
• Software engineers (2)
• IT department representative
• Student office representative
• Library representative
• Linguist
• Plus occasional institutional representatives
• More student office & library personnel involved in final testing

Chart: Institutional departments involved
• Workgroup staff dispersed over several departments
[Chart: University of Ljubljana; FRI, Faculty of Computer and Information Science; departments shown: Student office, IT department, Library, Labs; roles shown: Representative, System engineer, Representative, Staff (4), Script engineer, Support engineer, Manager, ePrints engineer, Staff (2)]

Workgroup organization chart
[Chart: Faculty senate & commissions; Workgroup manager; ePrints engineer; System engineer; Library representative; Student office representative; Linguist; Automation script engineer]

Developmental milestones
• Institutional commitment: December 2007
• First workgroup meeting: January 2008
• Test installation: April 2008
• Metadata testing: May 2008
• Institutional presentation: June 2008
• Metadata migration, Testing: August 2008
• Institutional deployment, Testing: September 2008
• Public deployment: October 2008

ePrints.FRI – 2008 revision
Bilingual interfaces
Multilingual metadata and documents

Hosting
• IT department, Faculty of Computer and Information Science
• Platform:
  • IBM server
  • VMWare hosting multiple virtual servers
  • Virtual Debian Linux server
• Backup
  • Backup virtual server images
  • Provided by IT department

Service sustainability
• Printed instructions
• 4 student office staff trained
• 2 library office staff trained
• One ePrints administrator plus one support engineer
• One system administrator plus one support engineer
• Virtual server with full-system backup
  • System, metadata and publications all backed up
• Technical support

Technical support
• 1st level: IT department
  • System engineer
  • Support engineer
• 2nd level: involved technical staff
  • ePrints software engineer
  • Automation script engineer
  • Network administrator

Policy formulation and licensing
• Policy formulated to respect national laws and university workflow
• Two-track licensing
  • Strict requirements for student theses
  • Scientific papers self-published and checked on a best-effort basis
• Legal paperwork prepared for students
  • Students must sign papers granting rights for electronic publishing

Student submission
• Thesis work in printed form
• Thesis work in electronic form (PDF/DOC on CD)
• Metadata in electronic form (TXT/DOC on CD)
• Signed legal paperwork
All of the above is submitted to the student office before the oral defense, so that announcement and publication proceed automatically

[Figure: Thesis submission workflow]

Workflow facilitation (1)
• Auto-complete names and titles from a database
• Avoids tedious ID lookups and related errors

Workflow facilitation (2)
• Fill in standard fields
• Prepare links and fix them later

Scientific submission
• Self-archived by author
  • Electronic document in PDF form preferred
  • Metadata in web forms
• Self-published by employed author (policy since 2010)
• Metadata and document validity periodically checked by administrator on a best-effort basis
  • Returned to author in case of invalid metadata
  • Removed from public archive in case of a serious problem
• No legal policy at the moment

[Figure: Publication HTML page]

System integration
• Web integration
  • Thesis defense announcements
  • Thesis details
  • Personal publication lists
  • Laboratory publication lists
  • Links to content hosted in ePrints
• Information system integration
  • Morning mails include thesis defense announcements
  • Mentoring and committee participation statistics automated

Web integration – defense announcements
Generated by automation script from ePrints 3 XML

Web integration – publication lists
Generated by PHP script from ePrints 3 export

Measuring and demonstrating success
• The first open archive in the wider region, quickly picked up by OAI indexers and big search engines
• Relatively quick deployment with NO serious glitches
• Attracted interest from other open-access projects (DRIVER) and faculties
• Access statistics:
  • AWStats and Webalizer – general web access
  • IRStats – repository specific
• Increased ability to monitor INTEREST and ORIGIN OF INTEREST for publications and subjects of publications

[Figure: Google ranking]
[Figure: Visitor statistics (IRStats)]
[Figure: Document downloads (IRStats)]

Visitor statistics summary
• Publications disseminated PER MONTH: about triple the number of enrolled students
• Greatly increased promotion and dissemination of student theses – on average 5 downloads per thesis per month
• Elevated practical status of the thesis as a reference
• Most visitors arrive via search engines, looking for general keywords

Key challenges faced
• Translation and multi-language specific issues
  • Terminology
  • Missing ePrints language flexibility
  • Missing OAI multilingual support
• Overcoming resistance to change
  • Single-point data entry for the student department
  • Facilitators for data entry (auto-complete, workflow)
• System integration with existing web software: Ažur, Moodle
• Metadata set changed from ePrints 2 to ePrints 3
• Minor problems with indexing and automatic data transfer

Important unresolved issues
• Legal policy for scientific publications
• Centralized archive provides little incentive for self-archiving
  • If Google can find it, who cares about the repository?
• Automated integration with current and future national archives
  • Goal: Enter metadata once, publish many times

Conclusion
• EPrints is relatively easy to install and customize
• Most things can be customized within the provided Perl stubs, XML and XHTML
• Anything can be changed using basic Perl skills (and time)
• Google ranks ePrints archives highly
• Setting up OAI will boost your rankings (OAI indexers will link back to your archive)
• We did not experience any serious security or performance issues in ePrints code
• Based on this I can recommend ePrints for your archive

Thank you
• Any questions?
Miha.Peternel@fri.uni-lj.si
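
Supplementary sketch: OAI language prioritization
The OAI configuration slide notes that language prioritization is an issue with multilingual metadata. As a rough illustration of that problem from the harvester's side (this is not the Perl code that was rewritten for ePrints.FRI), the Python sketch below builds a standard OAI-PMH ListRecords request URL and picks one title from a Dublin Core record according to a language preference list. The base URL, the sample record and the function names are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by the Dublin Core and XML standards.
DC = "{http://purl.org/dc/elements/1.1/}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


# Hypothetical record with a Slovenian and an English title (illustration only).
SAMPLE_RECORD = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title xml:lang="sl">Primer naslova</dc:title>
  <dc:title xml:lang="en">Example title</dc:title>
</oai_dc:dc>"""


def pick_title(record_xml, preferred=("en", "sl")):
    """Return the title in the first preferred language that is present.

    Falls back to the first title in document order, or None if there is none.
    """
    root = ET.fromstring(record_xml)
    titles = [(el.get(XML_LANG), (el.text or "").strip())
              for el in root.iter(DC + "title")]
    for lang in preferred:
        for tag, text in titles:
            if tag == lang:
                return text
    return titles[0][1] if titles else None


if __name__ == "__main__":
    # The endpoint below is an assumed placeholder, not the real ePrints.FRI URL.
    print(list_records_url("http://eprints.example.org/cgi/oai2"))
    print(pick_title(SAMPLE_RECORD))
```

Without an explicit preference list, a consumer would simply take whichever title comes first in the record, which is why the order of languages in the export matters for a bilingual archive.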