Digital Library: The HKU Libraries’ experiences

Kam-ming Ku
HKUL
kmku@hku.hk

The presentation is about:
How can we deliver the right information to the right person at the right time, anywhere?
1. HKUL resources/projects
2. Going to do…
3. Challenges
4. Overcome the challenges
5. Discussion

1. HKUL resources/projects
1.1 Staffing
1.2 Networking
1.3 Hardware
1.4 Software
1.5 DL initiatives

1.1 Systems Staff
• Systems Librarian
• 2 Computer Officers
• Assistant Librarian
• Assistant Computer Officer
• Senior Library Assistant
• 5.5 Technicians

1.2 Networking
• From 10 → 100 → 1000 → wireless → Bluetooth??
• Gigabit Ethernet backbone and Fast Ethernet running to users. About 1000 network points.
• ACENet connection (Access Everywhere Network; plug-in network for roaming users); ~450 fixed points; 18 wireless access points.

1.2 Networking (cont.)
• Libraries within the campus are connected to the campus backbone by a Gigabit Ethernet or Fast Ethernet link.
• 2 remote sites, the Dental and Medical Libraries, are each connected to the main campus by a 10 Mbps link.
• Gigabit firewall (Cisco PIX Firewall)
• Packeteer network shaper

1.3 Hardware
• Compaq AlphaServer GS60E (for library catalogue)
• SUN Enterprise 4000, 10000
• 3 Linux, 5 Windows and 3 Novell servers

1.3 Hardware (cont.)
• 10 CDROM Towers
– 4 Towers for Staff
– 2 Towers in Medical Library
– 4 Towers for Network
• 3 WinFrame Servers & 1 Thin Client server
– 1 Network CD-ROM MetaFrame Server
– 1 Standalone CD-ROM MetaFrame Server
– 1 Network CD-ROM WinFrame Server
– 1 Dell Server for 6 Thin Clients

1.3 Hardware (cont.)
           Office/Staff   Counter   Student
PC         289            35        342

           Office/Staff   Student
Printer    107            27
MAC        6              7
Scanner    17             12

1.4 Software
• SUN Solaris 8, DEC UNIX, Windows 2000/NT, Novell Netware, Linux
• III Innopac library management system
• Oracle 9i database, 9iAS (Web) and Context (full-text indexing/searching)
• ERL server for SilverPlatter databases
• WinFrame server for legacy and network CDROM databases
• Apache Web servers

1.4 Software (cont.)
• TRS 4.0 server
• CJN server for hosting 6000+ China full-text journals
• Proxy server, Samba server
• Pcounter server
• Tamino XML server
• VOD server (IBM Videocharger)
• EZproxy server

1.4 Software (cont.)
• ILLiad server (Inter-library Loan)
• Taiwan Newspaper database
• Chinese Database Server: Sibucongkan (四部叢刊); Sikuquanshu (四庫全書); ekangxi dictionary (康熙字典)

1.5 HKUL DL initiatives
• 1.5.1 Digitalization projects
e.g. ExamBase
– First in-house developed database
– Imaging database for past exam. papers
– Released in 1996
– Used DMS, client-server model
– Soon shifted to a web-based interface
– TIFF only (converted on the fly to GIF/JPG), no PDF!!!

1. Hardware
– High-speed flatbed scanner (36 ppm)
2. Software
– Kofax Capture 3.0
– Sophisticated software covering scanning, OCR and verification
3. Logistics
a. Scanning
b. Automatic indexing
c. Verification and manual inputting
d. Data publishing
– Publish data to Oracle database

a. Scanning
– Papers are scanned in batch mode (~200 pages per batch)
– Separation sheets are used to separate the different documents
– Each separation sheet is printed with barcoded indexes (e.g. department, course code) and text in a fixed-size font
– The separation sheets can be re-used

b. Automatic indexing
– Recognizes the barcoded indexes and the text printed on the separation sheet

c. Verification and manual inputting
– No need to verify the barcoded indexes, as the accuracy is > 99.999%
– In-doubt OCRed text is marked in red, so it is easy to verify
– Other indexes are input manually (e.g. exam. date)

1.5 HKUL DL initiatives (cont.)
• e.g. Newspaper clippings
– Full-text imaging database
– Outsourced: scanning/indexing/OCR
– Oracle Context cartridge as full-text search engine (no Chinese support!)
– Decision: keep using it, or buy third-party full-text software??

1.5 HKUL DL initiatives (cont.)
• 1.5.2 Value-added bibliographic databases
– Subset of library catalogue
– e.g. TOC, Thesis Online, AV materials…
– Debate: single point source or a number of subsets??

1.5 HKUL DL initiatives (cont.)
e.g. Table of Contents
• To automate the inputting of TOC into bibliographic records

1. Hardware
– Overhead book scanner (~4 sec per image)
2. Software
– Kofax Capture 3.0
– Sophisticated software covering scanning, OCR and verification
3. Techniques
a. Scanning
b. Chinese OCR
c. Proofreading
d. Data publishing
– Publish data to Catalogue

a. Scanning
– Use the book scanner to scan the book’s TOC
– Benefits: no need to flip the book for scanning; two sides can be scanned at one time; scanning is faster

b. Chinese OCR
– A plug-in module was written to interface Kofax Capture with Chinese OCR (TH-OCR 7.5)

c. Proofreading
– MS Word (Chinese) is used for proofreading
– A macro program was written to ease the step of assigning MARC sub-fields

d. Publish data to Catalogue
– Done at night in batch mode
– A Tcl/Tk Expect script automates the upload process

1.5 HKUL DL initiatives (cont.)
• 1.5.3 Subject-based e-resources
– Redesign tag 996
– A range of useful information on e-resources
– Grouping of materials by subject: fulfills users’ needs
– Eases extension to our further DL projects (e.g. portal)
– See HKUL HP (databases, EJ, Ebooks & ENews)
• 1.5.4 Internet resources
• 1.5.5 Electronic Delivery (ILLiad)

1.5 HKUL DL initiatives (cont.)
• 1.5.6 Virtual services
– E-forms (e.g. BRO)
– Online reference
• 1.5.7 Automation
– Increase efficiency
– e.g. amend thousands of records in batch
– Electronic submission
– Staff intranet
– Innoface

1.5 HKUL DL initiatives (cont.)
• 1.5.8 Collaboration
– Union catalogue w/ Jinan University
• 1.5.9 Authentication: Proxy, EZproxy, IP control
• 1.5.10 Others…: for accessing legacy CDROM databases

2. Going to do…
1. Storage Area Network (SAN)
2. Abundance of servers
3. One-stop search
4. Alert service
5. Wireless applications

2.1 SAN
Problem a: Storage
– Large data size of our hosted databases
– High monthly data increase rate
– Databases are hosted on different hosts/OS

2.1 SAN (cont.)
Problem b: Backup
– A backup drive for every machine
– A backup software license for every machine
– Need to handle a lot of backup tapes

2.1 SAN (cont.)
Solution – (SAN)
– Put all data storage into a single large, expandable storage device
– The storage device is connected to the hosts by high-speed Fibre Channel links
– A Fibre Channel loop connects to each host to ensure high availability
– Backup can be done on a single device

2.2 Abundance of servers
Problem:
– Hard to monitor the status and activities of each server
– Time is wasted tuning the performance of each server

2.2 Abundance of servers (cont.)
Solution – Server consolidation
– Buy several powerful servers instead of many cheap mid-range servers
– Keep as few servers as possible
– Save space and UPS power ratings, i.e. $$ saving
– Save manpower to administer/maintain server performance, i.e. cost saving

2.3 One-stop search
• Before searching, one needs to know which database suits one’s need
• Aim: to search multiple databases simultaneously
– e.g. OAI (http://www.openarchives.org/)
– e.g. CDL SearchLight (http://www.cdlib.org/cgi-bin/searchlight)

2.4 Alert service
• To alert users to new information
• SDI (Selective Dissemination of Information)

2.5 Wireless Application
• A study on mobile and PDA applications in the Library

3. Challenges
• Changes
• New technologies
• Competitors
• What are the (future) standards?
• Contents
• Digital vs. printed
• Information overflow
• Lifelong education

3.1 The causes of changes
• Development of I.T.
– Network, telecommunications, digitalization, storage format, access model, …
• Economy
– Online, e-commerce, smart card, …
• Learning environment
– Life-long learning
• Mode of communication
– Email, ICQ

3.2 New technologies
• Changing … so fast
• Acronyms
– Help: http://www.webopedia.com
• Who knows what the future will be?
– Reluctant to change
• Don’t be afraid to dig in
– See: Editor’s notes, Computers in Libraries, vol.22, no.8, p.6

3.3 Competitors
• Who?
– See: OCLC White Paper on the Information Habits of College Students (http://www2.oclc.org/oclc/pdf/printondemand/informationhabits.pdf)
• 79% use a search engine for every or most searches!!

Technology Adoption Life Cycle
[Figure: Innovators → Early Adopters → Early Majority → Late Majority → Laggards]
Source: Crossing the Chasm, Geoffrey Moore

Crystal Ball??
• Number of visits
• Usage of physical materials
• Training to users & real-time support
• Demand for subject knowledge
• Competitors
• Fast services & high productivity
• Information provider and producer
• Cost-effectiveness
• Library workflow goes to e-business model
• Partnership
• Provide services that lead to income

4. Overcome the challenges
• What business are we in?
• What are our major strengths & weaknesses?
• Who are our competitors?
• Who are our customers? Their needs?
• What factors are affecting the Library?
• Do we have the skills?

4. Overcome the challenges – how?
• Training - to keep abreast of new technologies
• Human resources - partners
• Value-added services
• User-oriented mindset
• Automation
• Improve the social image of librarians
• Co-operation
• Talk with other people in order to understand the technology in different areas
• Research

4. Overcome the challenges (cont.)
• Skills?
– Librarianship & IT knowledge
– Teamwork, commitment
– Thinking methodology: creativity, use of knowledge
– Outlook of the world
– Interpersonal skills
– Health!!

Principles for building DL
• Expect change
• Know your content
• Involve the right people
• Design usable systems
• Ensure open access
• Be aware of data rights
• Automate whenever possible
• Adopt and adhere to standards
• Ensure quality
• Be concerned about persistence
McCray, A. & Gallagher, M. (2001). Principles for Digital Library Development. Communications of the ACM, 44(5), pp. 49-54.

THE END
THANK YOU!
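
Supplementary sketch for section 1.5.1 (a. Scanning, b. Automatic indexing): the separation-sheet idea can be illustrated in a few lines of Python. This is only a sketch under stated assumptions, not how Kofax Capture works internally and not its API. The class names, the file names and the barcode payload format ("DEPT=...;COURSE=...") are all hypothetical; the slides say only that each separation sheet carries barcoded indexes such as department and course code.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class Page:
    """One scanned sheet. `barcode` is set only on separation sheets."""
    image: str                      # hypothetical file name of the scanned image
    barcode: Optional[str] = None   # hypothetical payload, e.g. "DEPT=CS;COURSE=CS101"


@dataclass
class Document:
    """One exam paper: index fields from its separation sheet plus its pages."""
    index: dict
    pages: list = field(default_factory=list)


def parse_barcode(payload: str) -> dict:
    """Turn 'DEPT=CS;COURSE=CS101' into {'DEPT': 'CS', 'COURSE': 'CS101'}."""
    return dict(item.split("=", 1) for item in payload.split(";") if item)


def split_batch(batch: Iterable[Page]) -> list:
    """Group a scanned batch into documents, one per separation sheet."""
    documents = []
    current = None
    for page in batch:
        if page.barcode is not None:
            # A separation sheet starts a new document; the sheet itself is not stored.
            current = Document(index=parse_barcode(page.barcode))
            documents.append(current)
        elif current is None:
            raise ValueError(f"{page.image}: page found before any separation sheet")
        else:
            current.pages.append(page.image)
    return documents


# A tiny batch: two separation sheets followed by their exam pages.
batch = [
    Page("sep1.tif", barcode="DEPT=CS;COURSE=CS101"),
    Page("p001.tif"),
    Page("p002.tif"),
    Page("sep2.tif", barcode="DEPT=MATH;COURSE=MA201"),
    Page("p003.tif"),
]
docs = split_batch(batch)
for doc in docs:
    print(doc.index, doc.pages)
```

Because the index fields travel on the separation sheet rather than on the exam paper, the same sheet can be re-used in later batches, and only indexes that are not barcoded (e.g. exam. date) need manual input.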