The Chinese University of Hong Kong Library 香港中文大學圖書館
Louisa LAM, Jeff LIU
Islandora Conference, Aug 4, 2015

Old Collections in a New Bottle: How The Chinese University of Hong Kong Library Uncovers Hidden Treasures of the University

Outline
• About CUHK Library Digital Initiatives
• Islandora@CUHK
• New features implemented to handle the idiosyncratic nature of Chinese texts and make the CUHK Digital Collections more discoverable:
  • Book Flipping / Page Progression Direction
  • Display of Transcribed Chinese Text
  • Cross search of different forms of Chinese characters

About CUHK Library
• 7 branches on 2 campuses:
  • Upper campus: New Asia College Library, United College Library
  • Central campus: University Library, Law Library
  • Lower campus: Chung Chi College Library, Architecture Library
  • Medical Library at Prince of Wales Hospital
• Collection size:
  • Print: 2.4 million
  • Databases: 670+
  • E-journals: 130,000+
  • E-books: 4,500,000+
• http://www.lib.cuhk.edu.hk/

About CUHK Library Digital Initiatives
• Since 1995, nearly 25 digitization projects have been developed, with a total of over 5.5 million images.
• One of the most popular databases receives several million hits per year.
• These initiatives suffer from a number of limitations that require migration to a new platform:
  • Separate websites for browsing and searching make branding, maintenance and future technological development difficult
  • Non-standard descriptive metadata schema
  • No single platform for managing digital objects and their front-end display
  • No cross search among the digital collections
  • Lack of advanced features popular with users, such as faceted search, social networking and federated search; the collections are also not discoverable
It was time to move from individual databases to a new digital repository: Islandora.

Why Islandora?
• Flexible front-end thematic design and back-end management of digital objects, with much potential for adapting to new technologies
• Many modular functions for rapid development (e.g. Drupal's i18n to support a trilingual interface, Google Analytics, Apache Solr, OAI-PMH and more, as we all know ^-^)
• Open source, with a large user community, documentation and forums
• Supports digital humanities research: text mining, TEI, GIS, etc.
• Supports multiple metadata schemas
• Digital preservation and curation
• Meets our current and future needs!
• CUHK Library is the first Asian library to implement Islandora (source: http://islandora.ca/islandora-installations)

Is Islandora Sufficient for CUHK Library Use Cases?
• The majority of CUHK Library digital collections are rare books in Traditional Chinese.
• Some idiosyncratic features of Chinese texts are beyond the scope of a Unicode-based, Solr-supported repository system.

PROBLEM / FEATURE 1: BOOK FLIPPING / PAGE PROGRESSION DIRECTION
• The default Internet Archive Reader in Islandora works well for some modern Chinese books and almost all English books, but ...
• It does not work as expected for our Chinese Rare Book Collection, whose books must be flipped from right to left.
• The Internet Archive Reader's default page direction flips the book from left to right, causing an awkward user experience and incorrect reading of the text.
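The following minimal sketch makes the flipping problem concrete. It shows how a two-up (two-page) view could place pages under each progression direction. It assumes that page 1 opens on its own, like a cover, and that odd pages sit on the right in a left-to-right book. The function `two_up_spreads` and its `"lr"`/`"rl"` argument are hypothetical; `rl` mirrors the value passed to the `--page_progression` Drush parameter developed for batch ingestion. This is not the Internet Archive Reader's actual code.

```python
from __future__ import annotations

from typing import List, Optional, Tuple

# (left page, right page); None marks an empty half of a spread.
Spread = Tuple[Optional[int], Optional[int]]


def two_up_spreads(num_pages: int, progression: str) -> List[Spread]:
    """Hypothetical helper: lay out pages 1..num_pages as two-page spreads.

    Assumes page 1 opens alone (like a cover). With "lr" (left to right)
    odd pages sit on the right; with "rl" (right to left) the layout is
    mirrored, so odd pages sit on the left and each spread is read from
    its right half first.
    """
    if progression not in ("lr", "rl"):
        raise ValueError("progression must be 'lr' or 'rl'")
    if num_pages < 1:
        raise ValueError("num_pages must be at least 1")
    spreads = [[None, None] for _ in range(num_pages // 2 + 1)]
    for page in range(1, num_pages + 1):
        is_odd = page % 2 == 1
        # Index 0 = left half, index 1 = right half.
        side = 1 if is_odd == (progression == "lr") else 0
        # Page 1 is spread 0; pages 2-3 are spread 1; pages 4-5 are spread 2; ...
        spreads[page // 2][side] = page
    return [(left, right) for left, right in spreads]


if __name__ == "__main__":
    for direction in ("lr", "rl"):
        print(direction, two_up_spreads(5, direction))
```

In the right-to-left layout, page 2 lands on the right half of its spread and page 3 on the left. The reader starts on the right and flips toward the left, which is the order a Chinese rare book expects.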
Incorrect Flipping Direction for Chinese Rare Books
• Sample link: http://repository.lib.cuhk.edu.hk/en/islandora/object/islandora%3A145#page/3/mode/2up

Implementation of a New Book Flipping / Page Progression Direction
• We partnered with discoverygarden to develop a new Page Progression option in CUHK Islandora.
• A Drush parameter, --page_progression (where rl means right to left), was also developed for batch ingestion:

    sudo drush --root=/var/www/drupal7 --uri=http://repository.lib.cuhk.edu.hk --user=admin \
      islandora_book_batch_preprocess --namespace=islandora --parent=islandora:daoist-text \
      --content_models=islandora:bookCModel --type=zip --page_progression=rl \
      --do_not_generate_ocr --target=/mnt/daoist/007294481.zip

Corrected Book Flipping / Page Progression Direction for Chinese Rare Books
• Sample link: http://repository.lib.cuhk.edu.hk/en/islandora/object/islandora%3A9860#page/1/mode/2up

PROBLEM / FEATURE 2: DISPLAY OF TRANSCRIBED CHINESE TEXT

Writing Direction of Chinese Text (1)
• Like other East Asian scripts, Chinese can be written both vertically and horizontally.
• Example: 《中國學生周報》第1期 中華民國41(1952)年7月25日, the first issue of "The Chinese Student Weekly", published on 25 July 1952: http://hklit.lib.cuhk.edu.hk/pdf/journal/78/1952/160001p.pdf
• Traditionally, vertical writing was the standard system and was widely used in publications. Example: 文昌帝君陰騭文廣義節錄 : [三卷] / 周夢顏述.
  [Figure annotations: read 1. Vertical, 2. Right to Left]
  Sample link: http://repository.lib.cuhk.edu.hk/en/islandora/object/islandora%3A9860#page/19/mode/2up

Writing Direction of Chinese Text (2)
• Horizontal writing was mainly used for signs. Example: the header logo of "The Chinese Student Weekly" (1952), written horizontally from right to left.
• Over the past 40 years, publications have gradually shifted to horizontal writing, possibly owing to the influence of English and the inability of some software and browsers to fully support vertical display. Example: the header logo of "Hong Kong Literature" (2013), written horizontally from left to right: http://hklit.lib.cuhk.edu.hk/pdf/journal/97/2013/1000065.pdf

Sheng Xuanhuai Archive
• The Library collaborated with the CUHK Art Museum to develop a Sheng Xuanhuai manuscript archive.
• The Sheng Xuanhuai Archive contains letters and correspondence of Sheng Xuanhuai, a very influential entrepreneur of the late Qing Dynasty.
• The texts of the manuscripts were transcribed by a Shanghai expert.
• The transcribed Chinese text needs to be displayed alongside the digitized images in Islandora.

Display Problem in Islandora
• A side-by-side OpenSeadragon viewer and Transcription viewer is used to display the images and the transcribed text.
• However, the image and the annotation are harder to read because the two viewers use different reading directions.
  [Figure annotations: manuscript image read 1. Vertical, 2. Right to Left; transcription read 1. Left to Right, 2. Vertical]

Display of Chinese Text in Vertical Direction
• We partnered with discoverygarden to develop a new feature option in Islandora that enables vertical display of transcribed text.
• The implemented solution is based on the CSS3 writing-mode property.
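Browsers implement this layout natively through the CSS `writing-mode` property. The standard value for top-to-bottom columns that advance from right to left is `vertical-rl`, but the exact stylesheet in the CUHK module is not shown in this presentation. As a rough, browser-independent illustration of what that layout does, the sketch below rearranges a string into plain-text columns. `vertical_rl` is a hypothetical helper. Padding short columns with the ideographic space U+3000 is an assumption made so that full-width characters stay aligned in a monospaced view.

```python
def vertical_rl(text: str, column_height: int) -> str:
    """Hypothetical helper approximating CSS `writing-mode: vertical-rl` in plain text.

    Characters run top to bottom within a column, and successive columns
    are placed from right to left.
    """
    if column_height < 1:
        raise ValueError("column_height must be positive")
    # Cut the text into columns in reading order (first column = rightmost).
    columns = [text[i:i + column_height] for i in range(0, len(text), column_height)]
    # Pad the last column with ideographic spaces (U+3000) to keep alignment.
    columns = [col.ljust(column_height, "\u3000") for col in columns]
    # Reverse so the first column in reading order is printed at the far right.
    columns.reverse()
    rows = ["".join(col[r] for col in columns) for r in range(column_height)]
    return "\n".join(rows)


if __name__ == "__main__":
    print(vertical_rl("中文大學圖書館", 3))
```

Reading the printed result from the rightmost column downward, then moving left, recovers 中文大學圖書館. This mirrors the reading order of the manuscript images.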
Corrected Display Direction for Chinese Text
[Figure annotations: image and transcription both read 1. Vertical, 2. Right to Left]
• Sample link: http://repository.lib.cuhk.edu.hk/en/islandora/object/namespace%3A2

Transcription Display for Chinese Text
Known issues / limitations:
• The enhancement works in recent versions of all popular browsers except Firefox, which does not yet support CSS3 writing mode.
• Vertical text display is not supported in Islandora's admin edit mode.

PROBLEM / FEATURE 3: CJK TSVCC SEARCH

Chinese Search in Default Islandora / Solr
• Search for two characters, 中文: no result
• Search for a phrase, 中文大學圖書館 (title): no result
• Search for one character, 中: no result
• This is not a problem or bug of Islandora / Solr.
• Like many other systems, Solr's default configuration is designed around Western languages.
• At CUHK, our Integrated Library System, Innopac / Millennium, has similar problems.
• Customization is required to enable searching of Chinese characters.

Structure of Chinese Characters
Chinese is written without spaces between words, so a single string such as 香港中文大學 contains both single-character words and multi-character phrases:
• Words: 香 (incense), 港 (port), 中 (center), 文 (language), 大 (large), 學 (learn)
• Phrases: 香港 (Hong Kong), 中文 (Chinese), 文大 (meaningless), 大學 (university), 中文大學 (Chinese University), 香港中文大學 (Chinese University of Hong Kong)

Different Unicode for Different Forms of the Same Chinese Character
• Traditional Chinese 繁體中文 (Proper Chinese): used in Hong Kong and Taiwan
• Simplified Chinese 简体中文: used in Mainland China after 1949 and in Singapore
• Traditional Chinese: 中文大學 (Chinese University), where 學 is U+5B78
• Simplified Chinese: 中文大学 (Chinese University), where 学 is U+5B66
• Traditional Chinese: 中國 (China), where 國 is U+570B
• Simplified Chinese: 中国 (China), where 国 is U+56FD
• In Hong Kong, a Special Administrative Region of China, we need to support indexing and searching of both Traditional and Simplified Chinese in our publications, websites, and ...
  Islandora.

Preferred Search and Display Mode of CUHK Library Users
• CUHK Library users include Mainland students and faculty, who use Simplified Chinese, and local students and faculty, who use Traditional Chinese.
• Most prefer to cross-search and retrieve materials in both Traditional and Simplified Chinese by entering a single form of the characters.

Different Unicode for Variant Forms of the Same Chinese Character
• Because of China's long history, there are variant forms of the same character carrying the same meaning, e.g. 台灣 (Taiwan, 台 = U+53F0) vs. 臺灣 (Taiwan, 臺 = U+81FA).
• This is similar to American vs. British English spelling: "Center" vs. "Centre", "Digitization" vs. "Digitisation".

Unique Searching Problems of Chinese Characters
• How can we cross-search (1) different forms of the same character and (2) variant forms of the same character, when each form has a different Unicode code point?
• Hong Kong libraries developed a unique way of handling this special feature of Chinese characters.

TSVCC Table
• Maps Traditional Chinese, Simplified Chinese and variant forms of Chinese characters.
• Developed since 2003 to support CJK and Unicode in web-based Online Public Access Catalogs.

Mapping in TSVCC Table
• Traditional Chinese 學 (U+5B78) maps to Simplified Chinese 学 (U+5B66)
• Variant forms:
  • One: U+4E00 一 | U+5F0C 弌 | U+58F9 壹 | U+58F1 壱
  • Two: U+4E8C 二 | U+5F0D 弍 | U+8CB3 貳 | U+8D30 贰 | U+5F10 弐 | U+8CAE 貮
  • Table: U+53F0 台 | U+81FA 臺 | U+98B1 颱 | U+6AAF 檯 | U+67B1 枱
• Total: 4,515 entries

We partnered with discoverygarden to implement the TSVCC table in Islandora.
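The folding idea behind the TSVCC table can be sketched in a few lines. This is a minimal illustration, not the CUHK configuration, and it makes three simplifying assumptions:

* The canonical form of each group is simply its first character.
* Only the handful of entries listed above is used, rather than the full 4,515-entry table.
* A substring test stands in for Solr's tokenized matching.

The hypothetical `solr_mapping_lines` helper writes rules in the `"source" => "target"` syntax read by Solr's `MappingCharFilterFactory`. Whether mapping-tsvcc.txt is actually wired in through that filter is an assumption, since the schema.xml extract is not reproduced here.

```python
from __future__ import annotations

from typing import Dict, List

# Illustrative excerpt of TSVCC groups, taken from the examples above
# (the real table has 4,515 entries). Hypothetical convention: the first
# character of each group is treated as the canonical form.
TSVCC_EXCERPT: List[str] = [
    "學学",          # Traditional / Simplified
    "一弌壹壱",      # variants of "one"
    "二弍貳贰弐貮",  # variants of "two"
    "台臺颱檯枱",    # variants listed under "table"
]


def build_fold_map(groups: List[str]) -> Dict[str, str]:
    """Map every character in a group to that group's canonical (first) character."""
    fold: Dict[str, str] = {}
    for group in groups:
        for ch in group:
            fold[ch] = group[0]
    return fold


def fold_text(text: str, fold: Dict[str, str]) -> str:
    """Replace each character by its canonical form; unmapped characters pass through."""
    return "".join(fold.get(ch, ch) for ch in text)


def code_points(text: str) -> List[str]:
    """Show why the forms differ to a computer: each has its own code point."""
    return [f"U+{ord(ch):04X}" for ch in text]


def matches(query: str, title: str, fold: Dict[str, str]) -> bool:
    """Fold both query and title (as an index- and query-time filter would), then compare.

    A plain substring test stands in for Solr's tokenized matching.
    """
    return fold_text(query, fold) in fold_text(title, fold)


def solr_mapping_lines(groups: List[str]) -> List[str]:
    """Emit non-canonical -> canonical rules in MappingCharFilterFactory syntax."""
    return [f'"{ch}" => "{group[0]}"' for group in groups for ch in group[1:]]


FOLD = build_fold_map(TSVCC_EXCERPT)

if __name__ == "__main__":
    print(code_points("學学"))  # two different code points for one character
    print(matches("中文大学", "香港中文大學圖書館", FOLD))  # Simplified query, Traditional title
    print(matches("臺灣", "台灣", FOLD))  # variant forms of "Taiwan"
    print("\n".join(solr_mapping_lines(TSVCC_EXCERPT)))
```

Both sides are folded before comparison. As a result, a query typed in either form meets the indexed text in the same canonical form. This is the behaviour the search screenshots describe: Traditional and Simplified input retrieve the same result.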
Implementation of TSVCC in Islandora@CUHK
• The mapping table mapping-tsvcc.txt was loaded into /usr/local/fedora/solr/collection1/conf
  [Figure: extract from schema.xml]
• Input in either Traditional Chinese or Simplified Chinese retrieves exactly the same result.
  [Screenshots: search in Traditional Chinese; search in Simplified Chinese]

Conclusion: Islandora@CUHK
• Moving to Islandora was the right step for the continued development of CUHK digital initiatives.
• The problems found in displaying and searching Chinese characters turned out to be the main development features of Islandora@CUHK.
• The enhanced features built around the idiosyncrasies of Chinese characters further strengthen the platform for making CUHK Library treasures discoverable.
• A new team, Research Support & Digital Initiatives, has just been established in the Library to lead the development of e-Research; digital collections on Islandora are one of the core components of this strategy.

References
Chinese characters searching:
• https://en.wikipedia.org/wiki/Ambiguities_in_Chinese_character_simplification
• http://hkiug.ln.edu.hk/unicode
• http://hkiug.ln.edu.hk/unicode/hkiug_tsvcc_table-UnicodeVersion-1.0.html
Transcription display for Chinese text:
• https://en.wikipedia.org/wiki/Horizontal_and_vertical_writing_in_East_Asian_scripts