Implementing The HKUST Institutional Repository

Diana Chan
Head of Reference
HKUST Library
Nov, 2005

2005 Library Conference: Balancing the External and Traditional Libraries at the Tamkang University, Taiwan
Library and Online Resources Technologies 2005 Conference at Xiamen University, PRC

Contents
1. Open Access and Institutional Repositories
2. HKUST IR
3. Software Selection
4. Planning and Policies
5. Strategies in Acquiring Content
6. Challenges

HKUST
Opened in 1991
4 schools (SSCI, SENG, SBM, HSS)
450 faculty, 5,500 UGs, 2,800 PGs
Ranks 42 among the top 200 universities (2004 The Times Higher Education Supplement)
Library: 22 librarians, 75 support staff

1. Open Access and Institutional Repositories
Technological and social trends that lead to the Open Access Movement
Fruits of Open Access
What is an Institutional Repository?
Why create one?

Technological Trends
Increasing ease of sharing documents via FTP and Web (HTTP)
Enables researchers to “publish” their research results (working papers, pre-prints, etc) in subject-specific, web-based open archives for faster and wider dissemination
  Individual scholars or institutions post abstracts and full-text
  Social Science Research Network (SSRN)
  IDEAS – Working papers in Economics
The success of such collections led to the Open Archives Initiative (OAI) which promotes author self-archiving & interoperable standards for file sharing
Major outcome: Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)

Social Trends
“Serials Crisis”
Journal titles increasing + Prices rising + Library budgets cut = Market dysfunction (since the 1980’s)
Source: ARL Statistics: Monographs and Serials Costs in ARL Libraries, 1986-2003

Open Access Movement Example
Scholarly Publishing and Academic Resources Coalition (SPARC)
Sponsored by Association of Research Libraries
Endorsed by many different groups: Assoc. of American Universities, Assoc. of Universities and Colleges of Canada, Australian Vice-Chancellors Committee, etc.
Founded in 1997 to correct market dysfunction in scholarly publishing
“Expand competition & support Open Access to address high & rising journal costs”

A Fruit of OA Movement: Open Access Journals
Refereed or peer reviewed
  Emerging Infectious Diseases
  Journal of Machine Learning Research
More in Directory of Open Access Journals (DOAJ)

A Fruit of OA Movement: OAIster
One searchable interface for open archives from 536 academic institutions
5.9 million documents: articles from Open Access journals; working papers, discussion papers, & conference papers; dissertations & theses
+ All of the above & more from Institutional Repositories

A Fruit of OA Movement: Institutional Repositories
Development of IRs gained momentum with the release of two open source systems:
  EPrints (U of Southampton)
  DSpace (MIT)
Examples of Individual IRs
  Australian National University Eprint Repository
  eScholarship Repository (U of California)
  CalTech CODA
Institutional Archives Registry (468 as of Oct 5, 2005)

What is an Institutional Repository (IR)?
A “digital collection capturing and preserving the intellectual output of a single or multi-university community”.
- Adopted from “The case for institutional repositories: a SPARC position paper” prepared by Raym Crow. <http://www.arl.org/sparc/IR/ir.html>

Why Create the IR?
Budapest Open Access Initiative http://www.soros.org/openaccess/index.shtml
Recommends 2 Strategies:
1. Self-archiving in Open Electronic Archives
2. Open Access Journals

Dual Open-Access Strategy
BOAI-2 ("gold"): Publish your article in a suitable open-access journal whenever one exists.
BOAI-1 ("green"): Otherwise, publish your article in a suitable toll-access journal and also self-archive it.

Must Satisfy Two Conditions
The author…grants to all users a free …right of access to, and a license to copy, use, distribute, transmit and display the work publicly …
A complete version of the work is deposited in…at least one online repository
- From the Berlin Declaration

Why We Created an IR at HKUST
To create a permanent record of the scholarly output of HKUST
To make available and disseminate the scholarly output of HKUST in a free and interoperable digital format
To help the international Open Access effort.
Because the mission of disseminating knowledge is only half complete if it is not widely and readily available to society.
- Adapted from the Berlin Declaration

2. HKUST Institutional Repository
Collects, disseminates, and preserves in digital format the scholarly output of the HKUST community
Uses DSpace software, OAI-PMH compliant, supports Chinese
Easily discovered by Internet search engines and indexing tools
http://library.ust.hk/repository/

Total Number of Documents (as of Oct 5, 2005)
Collection                                                        Size     %
Conference Papers                                                  579    26
Working Papers, Technical Reports, Research Reports, Pre-prints    534    25
Journal Articles                                                   493    23
Doctoral Theses                                                    394    18
Patents                                                             58     3
Presentations                                                       56     2
Book Chapters                                                       37     2
Miscellaneous                                                        8     1
Total (incl. duplicates)                                         2,159   100

Contributors by Department (as of Oct 5, 2005)
COMP 21%
SBM 13%
ELEC 13%
OTHER 12%
OTHER SCI 9%
OTHER ENG 8%
MECH 7%
HSS&SOSC 6%
MATH 6%
PHY 5%

Home Page of the HKUST Institutional Repository

Browsing by Communities and Collections

Communities in HKUST IR
Accounting
Advanced Engineering Materials Facility
Applied Technology Center
Atmospheric, Marine and Coastal Environment Program
Biochemistry
Biology
Center for Enhanced Learning and Teaching
Centre for Display Research
Chemical Engineering
Chemistry
Civil Engineering
Computer Science
Economics
Electrical and Electronic Engineering
Finance
Humanities
Industrial Engineering and Engineering Management
Information and System Management
Institute of Nano Science and Technology
Language Center
Library
Management of Organizations
Marketing
Mathematics
Mechanical Engineering
Physics
Social Science

To Find Papers by Authors
kwok y

The View of an IR Record
Click to see full text

Full Text in pdf Format

To Search in IR
Fill in keywords and click Search

To Submit A Paper
Put in your UST account name and password
Fill in the form, click the “Submit” button at the bottom of the page
You will receive a confirmation email

Access Data

3. Software Selection
The July/August 2004 issue of Library Technology Reports on IR systems and functional requirements
We followed CalTech’s model and based our IR on open source software with an OAI-PMH interface.
We evaluated 2 IR systems: EPrints and DSpace

DSpace
Jointly developed by MIT Libraries and Hewlett-Packard Company
Open source software
Released on Sourceforge during our system evaluation period in late December 2002
Written in Java, with PostgreSQL database, Lucene search engine, and a Tomcat web servlet container

DSpace
We chose DSpace in 2003 because:
DSpace began the development with the experience gained from EPrints - the first and most popular open source IR software at that time
EPrints did not have full support for Unicode and is not Java- and servlet-based
Both EPrints and DSpace are open source software, fulfill our functional requirements, and follow state-of-the-art library standards

Current Configuration of HKUST IR
As of Oct 5, 2005
Home URL: http://repository.ust.hk/
IR Software: DSpace Version 1.2.1
System Software: Fedora Core 2 Linux; Tomcat 5.0.28; JDK1.4.2_05
Server: Intel Pentium 4 2.4GHz, 2GB RAM
Content: 2,059 documents from 40 communities
Usage: Documents were accessed 5,792 times in September 2005

Major Features
Data structure
Document submission form
Add item form
CJK support
OAI data provider
SRW/U interface

Data Structure
Document Types: journal articles, theses, etc.
Document Formats: Mainly PDF files; also contains PowerPoint files
DSpace data model
  Communities (and sub-communities)
  Collections
  Items
  Metadata
  Bundles of bitstreams
HKUST implementation: Items are grouped by Departments (i.e. communities) then by Document Types (i.e. collections).
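
The grouping described in the Data Structure slide can be pictured with a small sketch. This is not DSpace code and not the HKUST implementation; it is a minimal Python illustration that assumes only the hierarchy named on the slide (communities, collections, items, metadata, bundles of bitstreams). All class names, field names, the `add_item` helper, the bundle name, and the sample values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Bitstream:
    """A single stored file, e.g. a PDF or PowerPoint file."""
    filename: str
    mime_type: str


@dataclass
class Bundle:
    """A named group of bitstreams belonging to one item."""
    name: str
    bitstreams: List[Bitstream] = field(default_factory=list)


@dataclass
class Item:
    """One archived document: descriptive metadata plus its bundles."""
    metadata: Dict[str, str]
    bundles: List[Bundle] = field(default_factory=list)


@dataclass
class Collection:
    """A document type within a department, e.g. 'Conference Papers'."""
    name: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Community:
    """A department; holds collections and, optionally, sub-communities."""
    name: str
    collections: Dict[str, Collection] = field(default_factory=dict)
    sub_communities: List["Community"] = field(default_factory=list)


def add_item(repo: Dict[str, Community], department: str,
             doc_type: str, item: Item) -> None:
    """File an item under its department first, then its document type."""
    community = repo.setdefault(department, Community(department))
    collection = community.collections.setdefault(doc_type, Collection(doc_type))
    collection.items.append(item)


# Hypothetical usage: one conference paper filed under Computer Science.
repo: Dict[str, Community] = {}
paper = Item(
    metadata={"title": "An example conference paper", "creator": "Example, Author"},
    bundles=[Bundle("ORIGINAL", [Bitstream("paper.pdf", "application/pdf")])],
)
add_item(repo, "Computer Science", "Conference Papers", paper)
```

Keying the top level by department and the second level by document type mirrors the browse path a reader follows in the repository (community first, then collection); a real DSpace installation stores this hierarchy in its database rather than in nested dictionaries.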

Document Submission Form
Faculty are not willing to do self-submission
DSpace’s submission and workflow functions are too lengthy
In need of a simple and effortless submission form - as a quick medium for submitting documents
Written in Perl
Submitted data stored in DSpace “Simple Archive Format”

Add Item Form
A locally developed JSP application used by library staff to add items to DSpace
Allows staff to:
  Create new item from scratch
  Enhance the metadata from faculty submission and then add the item to DSpace

CJK Support
CJK (Chinese, Japanese, Korean) Support
DSpace supports Unicode
Problem - Lucene search engine is unable to search by CJK characters
  Solved by replacing DSpace’s Tokenizer with a CJKTokenizer - but has an interesting side effect
Problem - URL of query containing CJK characters is not properly encoded
  Solved by setting Tomcat URIEncoding="UTF8"

OAI Data Provider
DSpace is OAI-compliant
This means that OAI harvesters can easily collect the metadata (in Dublin Core format) from various IRs (including HKUST’s) for their added-value indexing/searching services. For example: OAIster
OAI Path to IR at HKUST: http://repository.ust.hk/dspace-oai/request?

http://repository.ust.hk/dspace-oai/request?verb=GetRecord& ... 1783.1/1805

SRW/U Interface
Search and Retrieval for the Web (or by URL)
Retains core functionality of Z39.50 but in the form of web services
This means search service providers can broadcast a search to various IRs and deliver the search results in their own GUI interface
SRW/U Interface for the IR at HKUST
  Based on OCLC’s SRW/U software
  URL: http://repository.ust.hk/SRW/

The result of a SRW/U search, with XSLT transformation

Enhancements to DSpace
Document submission form
CJK searching problem
Subscript and superscript problem
Number of items displayed
Access data
Top 20
Recommend an item link
Faculty & staff link

4. Planning and Policies
Task Force – software, scope, policies, database structure, problems, action plans
Information Services Committee – guidelines on publications, publishers’ policies, data formats, faculty concerns.
Library Administrative Committee – problems, issues, final decision, strategies.

Work Team – Subject Librarians
Workflow diagram. Box labels: Dr. Samson Soong & Subject librarians; Liaise With Faculty; Check Pub List; Harvest Document; Verify Document Version; Ascertain Pubs’ Policies; Correct Version; Incorrect Version; To Data Entry Staff; Index Document

Work Team – Data Entry Staff
Workflow diagram. Box labels: Verify and Convert PDF Documents; Final Review; Input Metadata Using Submission Form; Add Items to Repository; Set PDF Document Security & Properties. Add Watermark for Pre-published Version; Proof-Read

Guidelines on Different Publications
Type Copyright Action
Book chapter Book Conf paper Conf proceed. US Patent Publisher Need permission Publisher, 50 years Need permission Author Can archive Publisher Need permission Public Domain Author Can archive US Patents Working Paper, Technical Report Author Can archive Presentation Standard Author Can archive Issuing Organization No

SHERPA Summary of Publishers' Policies

Guidelines on Journal Articles
Columns: Publisher’s Policy. Rows: Version Available On Hand.
Version Available On Hand   No Arch.   Pub’s   Pre-Ref’ed   Post-Ref’ed   Both          All   Not Specified
Pre-Refereed Version        No         Yes     Yes          Yes           Yes           Yes   Ask Pub
Post-Refereed Version       No         Yes     No           Yes           Yes           Yes   Ask Pub
Publisher’s Version         No         Yes     No           Ask Faculty   Ask Faculty   Yes   Ask Pub

Guidelines on Publishers’ Policies
Studied publishers’ copyright & self-archiving policies (SHERPA/RoMEO, Stevan Harnad’s and publishers’ websites)
Constructed our own table for reference
Printed out and date-stamped publishers’ copyright statements
Noted their acknowledgement or credit requirement

Credit to Publisher
In the Rights field of a record:
APS copyright statement: "[Journal title] © copyright (year) American Physical Society.
The Journal's web site is located at http://....."

Other Policies
Withdrawal
Replacing Versions
Cooperation with User Groups
Authority Control
Indexing
Rights and Acknowledgement

5. Strategies in Acquiring Content
Our logic
How to Acquire by Type of Document?
How to Use Different Channels?
Sustainable Growth

Logic Behind Our Strategies
The research output is the University’s intellectual property
Create a critical mass of papers
Copyright and self-archiving rights are our concerns
  Ascertain publishers’ policies
  Ask permission from authors and publishers
Deal with publications which are easier to obtain and sources which are more accessible
  Those posted on the web
  Those from publishers allowing published versions

How to Acquire by Type of Document?
1. Working Papers, Technical Reports, Research Reports
2. Conference Papers
3. Conference Presentations
4. Theses
5. Book Chapters
6. Peer-reviewed Journal Articles
7. Open Access Journal Articles

Sources of Scholarly Content
Diagram. Scholarly Content drawn from: Library Collection; Researchers; Web; Publishers; Journals

Copyright VS. Self-Archiving Rights
Diagram relating copyright status (Copyrighted, Non-copyrighted), ownership (University Owned, Author Owned, Publisher Owned) and archivability (Archivable, Non-archivable) to the permission required.
Document types shown: journal articles, book chapters, conference proceedings, theses, working papers, technical reports, presentations
Permissions shown: Author’s Permission; Publisher’s & Author’s Permission; Department’s Permission; Selected items to ask for author’s & publisher’s permission

Journal Articles
Flowchart. Labels in order: Journal Article; Check Author’s Archiving Rights; No or Unclear; Ask Publisher; Yes; Publisher’s Version; Harvest from The Web; Yes; Pre-refereed Or Post-refereed Version; Ask Author; Deposit Into IR

How to Use Different Channels?
1. Self Submission
2. Harvest from Websites (departmental, faculty, research centers)
3. Library Collection
  Conference proceedings
  Theses and dissertations
  University Archives
4. Harvest from the Source (databases, E-journals, Open Access publications)
5. Publishers
6. Liaisons with Faculty, departments, research centers
7. Public Relations

Electronic Thesis Approval Form
Student Agreement: I hereby grant to the Hong Kong University of Science and Technology Library the non-exclusive right to archive my thesis in digital format, and make it freely accessible, such as over the Internet.
Signed:
Date:

Publisher’s policy: Emerald
Emerald’s Principles on Copyright
Emerald seeks to retain copyright of the articles it publishes, without the authors giving up their rights to use their own material. Authors are not required to seek permission to re-use their own work. As an author you can use your paper in part or in full,…in another article written for us or another publisher, on your website, or any other use, without asking us first.
http://ninetta.emeraldinsight.com/pdfs/jarform.pdf

Collection Growth Milestones
Chart: No. of Documents (0 to 1,800) by month, May 2003 to Sep 2004.
Milestones labelled on the chart, listed from the top of the chart down:
  83 Research Centers
  79 Univ. Archives
  50 IOP papers
  142 conference papers
  35 papers with publishers' permission
  96 CS papers
  110 theses + 211 working papers
  53 patents
  116 papers from faculty websites
  105 CS technical reports

Towards Sustainability for the HKUST Institutional Repository
How to make the submission to IR part of the publication process?
Seeking permission from faculty to archive papers supported by RGC grants
Making use of the OCGA Research Output report process: a checkbox is added to the report form to denote agreement to archiving in IR – 100+ papers were received in summer 2005.

6. Challenges - Faculty
Low awareness of Open Access
Concern over copyright issues
Apathy in self submission
Lack of willingness to negotiate on non-exclusive rights or self-archiving rights
Lack of willingness to provide the right versions of documents (pre- or post-refereed)
Only a small % of their scholarly work can be archived

Example of a Faculty Member Retaining Self-Archiving Rights

Challenges - Institution
Needs to make a commitment to deposit all research output with the Institutional Repository
Needs to give financial support to faculty who submit papers to open access journals
Needs to give financial support to the Library for archiving work

Challenges - Publishers
In SHERPA project, 73 out of 107 publishers (68%) allow some sort of archiving, as of Nov’04
Many have no policy (Camford, Genetics Society of America)
Many have an unclear policy
Need to include self-archiving into license agreements with publishers

Challenges – Library
Provide support for university research self-archiving
Promote the IR
Educate users and faculty about the IR
Showcase the IR
Find champions and partners
Seek institutional commitment and support
Harvest documents
Make self submission a part of faculty’s publication reporting system

Challenges - Librarians
System Evaluation
Formulating and interpreting policies
Internal and publishers’ policies
Content Recruitment
Advocacy
Education
Advisory
Perceived benefits
Public relations
Use Assistance

References and Additional Resources

Chan, Diana L.H. (2004) “Managing the challenges : acquiring content for the HKUST Institutional Repository” International conference on developing digital institutional repositories : experiences and challenges, Hong Kong, December 9-10, 2004, California Institute of Technology Libraries and the Hong Kong University of Science and Technology Library, available at http://hdl.handle.net/1783.1/1973 (accessed September 24, 2005)

Chan, Diana L.H.
(2004) “Strategies for acquiring content : experiences at HKUST” International conference on developing digital institutional repositories : experiences and challenges, Hong Kong, December 9-10, 2004, California Institute of Technology Libraries and the Hong Kong University of Science and Technology Library, available at http://hdl.handle.net/1783.1/1974 (accessed September 24, 2005)

Chan, Diana L. H., Kwok, Catherine S. Y., Yip, Stephen K. F. (2005) “Changing roles of reference librarians : the case of HKUST Institutional Repository.” Reference Services Review, Vol. 33, No. 3, pp. 268-282, available at http://hdl.handle.net/1783.1/2039 (accessed September 24, 2005)

Crow, Raym. (2002) “SPARC Institutional repository checklist and resource guide” The Scholarly Publishing & Academic Resources Coalition, November.

Crow, Raym. (2002) “The case for institutional repositories: a SPARC position paper”, available at http://www.arl.org/sparc/IR/ir.html (accessed September 24, 2005)

Gibbons, Susan. (2004) “Establishing an institutional repository” Library Technology Reports, July/August, Vol. 40, No. 4, pp. 5-67.

Lam, Ki-Tat. (2004) “DSpace in action: implementing the HKUST Institutional Repository system” International Conference on Developing Digital Institutional Repositories : Experiences and Challenges, Hong Kong, December 9-10, 2004, California Institute of Technology Libraries and the Hong Kong University of Science and Technology Library, available at http://hdl.handle.net/1783.1/2023 (accessed September 24, 2005)

Special issue on reference librarians and institutional repositories (2005). Reference Services Review, Vol. 33, No. 3, pp. 259-346.