Old Collections in a New Bottle: How The Chinese University of Hong Kong Library Uncovers Hidden Treasures of the University
The Chinese University of Hong Kong Library
香港中文大學圖書館
Louisa LAM
Jeff LIU
Islandora Conference, Aug 4, 2015
Outline
• About CUHK Library Digital Initiatives
• Islandora@CUHK
• New features implemented to uncover the idiosyncratic nature of Chinese texts and make the CUHK Digital Collections more discoverable:
  • Book Flipping / Page Progression Direction
  • Display of Transcribed Chinese Text
  • Cross search of different forms of Chinese characters
About CUHK Library
• 7 branches on 2 campuses
  Upper campus: New Asia College Library, United College Library
  Central campus: University Library, Law Library
  Lower campus: Chung Chi College Library, Architecture Library
  Medical Library at Prince of Wales Hospital
• Collection size:
  Print: 2.4 million
  Databases: 670+
  E-journals: 130,000+
  E-books: 4,500,000+
http://www.lib.cuhk.edu.hk/
About CUHK Library Digital Initiatives
Digitization projects started in 1995; nearly 25 projects have since been developed, with a total of over 5.5 million images. One of the most popular databases receives several million hits per year.
Time to Move to a New Digital Repository
These initiatives suffer from a number of limitations that require migration to a new platform:
• Individual websites for browsing and searching make branding, maintenance and future technological development difficult
• Non-standard descriptive metadata schemas
• No single platform for managing digital objects and front-end display
• No cross search among the digital collections
• Lack of advanced features popular with users (faceted search, social networking, federated search, etc.), and the collections are not discoverable
Time to move from individual databases to Islandora
Why Islandora?
• Flexible front-end thematic design and back-end management of digital objects, with much potential for adapting to new technologies
• Many modular functions for rapid development (e.g. Drupal's i18n for a trilingual interface, Google Analytics, Apache Solr, OAI-PMH, and more, as we all know ^-^)
• Open source, with a large user community, documentation and forums
• Supports digital humanities research: text mining, TEI, GIS, etc.
• Supports multiple metadata schemas
• Digital preservation and curation
Meets our current and future needs!
CUHK Library is the first Asian library to implement Islandora
Source: http://islandora.ca/islandora-installations
Is Islandora Sufficient for CUHK Library Use Cases?
• The majority of CUHK Library digital collections are rare books in Traditional Chinese
• Some idiosyncrasies of Chinese texts are beyond the scope of a Unicode-based, Solr-supported repository system
PROBLEM / FEATURE 1: BOOK FLIPPING / PAGE PROGRESSION DIRECTION
Book Flipping / Page Progression Direction
• The default Internet Archive Reader in Islandora is perfect for some modern Chinese and almost all English books, but…
• It does not work as expected for our Chinese Rare Book Collection, whose books must be flipped from right to left
• By default the Internet Archive Reader flips books from left to right, causing a confusing user experience and incorrect reading of the text
Incorrect Flipping Direction for Chinese Rare Books
Sample link: http://repository.lib.cuhk.edu.hk/en/islandora/object/islandora%3A145#page/3/mode/2up
Implementation of a New Book Flipping / Page Progression Direction
• We partnered with discoverygarden to develop a new Page Progression option in CUHK Islandora
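In a two-page (2up) view, the page progression decides which side of each spread a page lands on. The sketch below is a hypothetical illustration, not discoverygarden's implementation: it pairs pages into spreads and, for right-to-left books, puts the earlier page on the right. The `"lr"`/`"rl"` values mirror the `rl` value passed to the batch-ingest Drush command; the function name is ours, and the sketch ignores the cover page, which book viewers usually show on its own.

```python
from typing import List, Optional, Tuple

# (left page, right page); None marks an empty half of the spread.
Spread = Tuple[Optional[int], Optional[int]]


def two_up_spreads(page_count: int, progression: str = "lr") -> List[Spread]:
    """Group pages 1..page_count into two-page spreads.

    Hypothetical helper for illustration. "lr" places the earlier page on
    the left (English books); "rl" places it on the right, so the reader
    turns pages from right to left (Chinese rare books).
    """
    if progression not in ("lr", "rl"):
        raise ValueError("progression must be 'lr' or 'rl'")
    spreads: List[Spread] = []
    for first in range(1, page_count + 1, 2):
        second = first + 1 if first + 1 <= page_count else None
        if progression == "lr":
            spreads.append((first, second))
        else:
            spreads.append((second, first))
    return spreads


if __name__ == "__main__":
    print(two_up_spreads(5, "lr"))  # [(1, 2), (3, 4), (5, None)]
    print(two_up_spreads(5, "rl"))  # [(2, 1), (4, 3), (None, 5)]
```

Keeping page numbering unchanged and only mirroring the spread layout means the same page images serve both reading directions.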
Implementation of a New Book Flipping / Page Progression Direction
• A Drush option (--page_progression) was also developed for batch ingestion:
    sudo drush --root=/var/www/drupal7 --uri=http://repository.lib.cuhk.edu.hk --user=admin \
      islandora_book_batch_preprocess --namespace=islandora \
      --parent=islandora:daoist-text --content_models=islandora:bookCModel \
      --type=zip --page_progression=rl --do_not_generate_ocr \
      --target=/mnt/daoist/007294481.zip
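When many right-to-left books need ingesting, the same command can be generated once per ZIP file. A minimal sketch, assuming one ZIP per book; the helper name is hypothetical and every flag is copied from the command above. It only builds argument lists (suitable for `subprocess.run`) and does not execute Drush.

```python
import shlex
from typing import Iterable, List


def book_batch_commands(zip_paths: Iterable[str], parent: str,
                        progression: str = "rl") -> List[List[str]]:
    """Build one islandora_book_batch_preprocess call per book ZIP.

    Hypothetical helper. The flags are copied from the slide's command;
    the Drupal root, site URI and user are assumed to match that example.
    """
    commands = []
    for zip_path in zip_paths:
        commands.append([
            "drush",
            "--root=/var/www/drupal7",
            "--uri=http://repository.lib.cuhk.edu.hk",
            "--user=admin",
            "islandora_book_batch_preprocess",
            "--namespace=islandora",
            f"--parent={parent}",
            "--content_models=islandora:bookCModel",
            "--type=zip",
            f"--page_progression={progression}",  # "rl" = right-to-left books
            "--do_not_generate_ocr",
            f"--target={zip_path}",
        ])
    return commands


if __name__ == "__main__":
    # Example list; in practice it might come from glob.glob("/mnt/daoist/*.zip").
    for cmd in book_batch_commands(["/mnt/daoist/007294481.zip"], "islandora:daoist-text"):
        print(shlex.join(cmd))  # prints only; pass cmd to subprocess.run to execute
```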
Corrected Book Flipping / Page Progression Direction for Chinese Rare Books
• Sample link: http://repository.lib.cuhk.edu.hk/en/islandora/object/islandora%3A9860#page/1/mode/2up
PROBLEM / FEATURE 2: DISPLAY OF TRANSCRIBED CHINESE TEXT
Writing Direction of Chinese Text (1)
• Like other East Asian texts, Chinese texts can be written vertically or horizontally.
  《中國學生周報》第1期 中華民國41(1952)年7月25日
  First issue of "The Chinese Student Weekly", published on 25 July 1952
  http://hklit.lib.cuhk.edu.hk/pdf/journal/78/1952/160001p.pdf
Writing Direction of Chinese Text (2)
• Traditionally, vertical writing was the standard system and was widely used in publications.
  Example: 文昌帝君陰騭文廣義節錄 : [三卷] / 周夢顏述.
  Reading order: 1. Vertical, 2. Right to Left
  http://repository.lib.cuhk.edu.hk/en/islandora/object/islandora%3A9860#page/19/mode/2up
Writing Direction of Chinese Text (3)
• Horizontal writing was mainly used for signs.
  Header logo of "The Chinese Student Weekly" (1952): horizontal, from right to left
• Over the last 40 years, publications have gradually shifted to horizontal writing, possibly owing to the influence of English and the inability of some software and browsers to fully support vertical display.
  Header logo of "Hong Kong Literature" (2013): horizontal, from left to right
  http://hklit.lib.cuhk.edu.hk/pdf/journal/97/2013/1000065.pdf
Sheng Xuanhuai Archive
• The Library collaborated with the CUHK Art Museum to develop a Sheng Xuanhuai manuscript archive.
• The Sheng Xuanhuai Archive contains the letters and correspondence of Sheng Xuanhuai, a very influential entrepreneur of the late Qing Dynasty.
• The texts of the manuscripts were transcribed by an expert in Shanghai.
• The transcribed Chinese text needs to be displayed together with the digitized images in Islandora.
Display Problem in Islandora
• Side-by-side OpenSeadragon and transcription viewers are used to display the images and the transcribed text.
  Image: 1. Vertical, 2. Right to Left
  Transcription: 1. Left to Right, 2. Vertical
• However, the image and the transcription are harder to read together because the two viewers use different reading directions.
Display of Chinese Text in Vertical Direction
• We partnered with discoverygarden to develop a new feature option in Islandora that enables vertical display of transcribed text
• The implemented solution is based on the CSS3 writing-mode property
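In CSS3, `writing-mode: vertical-rl` sets characters top to bottom within a column and orders the columns right to left. The plain-text sketch below only imitates that reading order to make it concrete; it is not the CSS-based code used in Islandora, and the function name, column height and full-width padding are arbitrary choices for illustration.

```python
from typing import List


def vertical_rl(text: str, column_height: int, pad: str = "\u3000") -> List[str]:
    """Lay out text in columns read top to bottom, columns ordered right to left.

    Returns the rows of the layout. A short final column is padded with an
    ideographic space (U+3000) so every row has the same width.
    """
    if column_height < 1:
        raise ValueError("column_height must be positive")
    # Split the text into consecutive columns of column_height characters.
    columns = [text[i:i + column_height] for i in range(0, len(text), column_height)]
    columns = [col.ljust(column_height, pad) for col in columns]
    # The first column is drawn rightmost, so reverse before reading row by row.
    columns.reverse()
    return ["".join(col[row] for col in columns) for row in range(column_height)]


if __name__ == "__main__":
    for line in vertical_rl("香港中文大學圖書館", 3):
        print(line)
    # 圖文香
    # 書大港
    # 館學中
```

Reading the output from the rightmost column downwards recovers 香港中文大學圖書館, the same order a vertical-rl browser rendering produces.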
Corrected Display Direction for Chinese Text
  Image: 1. Vertical, 2. Right to Left
  Transcription: 1. Vertical, 2. Right to Left
• http://repository.lib.cuhk.edu.hk/en/islandora/object/namespace%3A2
Transcription Display for Chinese Text
Known issues / limitations:
• The enhancement works in recent versions of all popular browsers except Firefox, which does not currently support the CSS3 writing mode
• Vertical text display is not supported in Islandora's admin edit mode
PROBLEM / FEATURE 3: CJK TSVCC SEARCH
Chinese Search in Default Islandora / Solr
Search for one character 中: no result
Search for two characters 中文: no result
Search for one phrase 中文大學圖書館 (title): no result
• This is not a problem or bug of Islandora / Solr
• Like other systems, Solr's default configuration is designed around Western languages
• At CUHK, our Integrated Library System (Innopac / Millennium) has similar problems
• Customization is required to enable searching of Chinese characters
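One way to see why nothing matches: if a field is tokenized on whitespace, or indexed as a single untokenized string, an unspaced Chinese title becomes one token, so only the whole title can match. A minimal sketch, assuming such a whitespace analyzer; it is a toy inverted index, not Islandora's actual Solr configuration.

```python
from typing import Dict, List, Set


def whitespace_tokens(text: str) -> List[str]:
    """Split on whitespace, as a Western-oriented analyzer might."""
    return text.split()


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each token to the ids of documents containing it."""
    index: Dict[str, Set[str]] = {}
    for doc_id, text in docs.items():
        for token in whitespace_tokens(text):
            index.setdefault(token, set()).add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return documents containing every query token (AND semantics)."""
    tokens = whitespace_tokens(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


if __name__ == "__main__":
    idx = build_index({"doc1": "香港中文大學圖書館", "doc2": "Chinese University Library"})
    print(search(idx, "中文"))        # set(): the whole title is a single token
    print(search(idx, "University"))  # {'doc2'}
```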
Structure of Chinese Characters
Word (single character):
• 香 (Incense)
• 港 (Port)
• 中 (Center)
• 文 (Language)
• 大 (Large)
• 學 (Learn)
Phrase (multiple characters):
• 香港 (Hong Kong)
• 中文 (Chinese)
• 文大 (meaningless)
• 大學 (University)
• 中文大學 (Chinese University)
• 香港中文大學 (The Chinese University of Hong Kong)
Form:
• Traditional Chinese 繁體中文 (also called "Proper Chinese"): used in Hong Kong and Taiwan
• Simplified Chinese 简体中文: used in Mainland China after 1949 and in Singapore
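Because written Chinese has no spaces between words, a common workaround is to index overlapping character bigrams, similar in spirit to Lucene/Solr's `CJKBigramFilter`. The sketch below is illustrative only, not the CUHK configuration. It shows both the benefit (the word 中文 becomes an indexed unit) and the cost visible in the list above: meaningless pairs such as 文大 are indexed too, because no dictionary decides where words begin and end. A real setup might also keep single characters so that one-character queries like 中 still match (Solr's bigram filter has an `outputUnigrams` option for this).

```python
from typing import List


def cjk_bigrams(text: str) -> List[str]:
    """Return overlapping two-character tokens; a lone character is kept as-is."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return chars
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


if __name__ == "__main__":
    print(cjk_bigrams("香港中文大學"))
    # ['香港', '港中', '中文', '文大', '大學']
```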
Different Unicode Code Points for Different Forms of the Same Chinese Character
• Traditional Chinese: 中文大學 (Chinese University), 學 = U+5B78
• Simplified Chinese: 中文大学 (Chinese University), 学 = U+5B66
• Traditional Chinese: 中國 (China), 國 = U+570B
• Simplified Chinese: 中国 (China), 国 = U+56FD
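The mismatch is easy to confirm from the code points: unless something maps one form to the other, a search engine treats them as unrelated characters. A short check using Python's built-in `ord`; the helper name `code_point` is ours.

```python
def code_point(ch: str) -> str:
    """Format a single character's Unicode code point as U+XXXX."""
    return f"U+{ord(ch):04X}"


# Traditional form -> Simplified form of the same character.
PAIRS = {"學": "学", "國": "国"}

if __name__ == "__main__":
    for traditional, simplified in PAIRS.items():
        # Different code points, so a plain string comparison sees different text.
        print(traditional, code_point(traditional), simplified, code_point(simplified))
    # 學 U+5B78 学 U+5B66
    # 國 U+570B 国 U+56FD
```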
Preferred Search and Display Mode of CUHK Library Users
• As Hong Kong is a Special Administrative Region of China, we need to support indexing and searching of both Traditional and Simplified Chinese in our publications, websites and… Islandora
• CUHK Library users include Mainland students and faculty, who use Simplified Chinese, and local students and faculty, who use Traditional Chinese
• Most prefer to cross-search and retrieve materials in both Traditional and Simplified Chinese by entering a single form of the characters
Different Unicode Code Points for Variant Forms of the Same Chinese Character
• Because of China's long history, there are variant forms of the same character carrying the same meaning:
  台灣 (Taiwan), 台 = U+53F0 vs 臺灣 (Taiwan), 臺 = U+81FA
• This is similar to American vs British English spellings:
  "Center" vs "Centre" and "Digitization" vs "Digitisation"
Unique Searching Problems of Chinese Characters
• How can we cross-search 1) different forms of the same character and 2) variant forms of the same character, when each has a different Unicode code point?
• Hong Kong libraries developed a unique way of handling this special nature of Chinese characters
TSVCC Table
• Maps Traditional Chinese, Simplified Chinese and Variant forms of Chinese Characters
• Has been developed since 2003 to support CJK and Unicode in the web-based Online Public Access Catalog
Mapping in TSVCC Table
• Traditional Chinese 學 (U+5B78) maps to Simplified Chinese 学 (U+5B66)
• Variant forms:
  • U+4E00 一 | U+5F0C 弌 | U+58F9 壹 | U+58F1 壱 (One)
  • U+4E8C 二 | U+5F0D 弍 | U+8CB3 貳 | U+8D30 贰 | U+5F10 弐 | U+8CAE 貮 (Two)
  • U+53F0 台 | U+81FA 臺 | U+98B1 颱 | U+6AAF 檯 | U+67B1 枱 (Table)
Total: 4,515 entries
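Conceptually, each row of the table is a group of interchangeable characters. A minimal folding sketch, assuming each group is normalized to its first member (the real table's choice of canonical form is not shown on the slide): applying the same fold to indexed text and to queries makes 臺灣 and 台灣, or 中文大學 and 中文大学, compare equal. The groups are copied from the slide; the names `GROUPS`, `build_fold_map` and `normalize` are hypothetical.

```python
from typing import Dict, Iterable, List

# Groups copied from the slide; using the first member as the canonical form
# is an assumption made for this sketch.
GROUPS: List[str] = [
    "學学",
    "一弌壹壱",
    "二弍貳贰弐貮",
    "台臺颱檯枱",
]


def build_fold_map(groups: Iterable[str]) -> Dict[str, str]:
    """Map every non-canonical member of each group to the group's first character."""
    fold: Dict[str, str] = {}
    for group in groups:
        canonical = group[0]
        for ch in group[1:]:
            fold[ch] = canonical
    return fold


FOLD = build_fold_map(GROUPS)


def normalize(text: str) -> str:
    """Replace each mapped character with its canonical form; others pass through."""
    return "".join(FOLD.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(normalize("臺灣") == normalize("台灣"))  # True
    print(normalize("中文大学"))                   # 中文大學
```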
Implementation of TSVCC in Islandora@CUHK
• We partnered with discoverygarden to implement the TSVCC table in Islandora.
• The mapping table mapping-tsvcc.txt was loaded into /usr/local/fedora/solr/collection1/conf
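The slides do not show the contents of mapping-tsvcc.txt. Assuming it follows the format of Solr's `MappingCharFilterFactory` mapping files (one `"source" => "target"` rule per line, `#` for comments), such a file could be generated from character groups as sketched below and referenced from a field type's analyzer, e.g. `<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-tsvcc.txt"/>`. The group list, canonical-form choice and function name are hypothetical.

```python
from typing import Iterable, List


def mapping_lines(groups: Iterable[str]) -> List[str]:
    """Emit MappingCharFilter-style rules mapping each variant to its group's first member."""
    lines = ["# Traditional/Simplified/Variant folding (illustrative subset)"]
    for group in groups:
        canonical = group[0]
        for ch in group[1:]:
            lines.append(f'"{ch}" => "{canonical}"')
    return lines


if __name__ == "__main__":
    # Printed rather than written; redirect to mapping-tsvcc.txt if desired.
    print("\n".join(mapping_lines(["學学", "台臺颱檯枱"])))
```

A character filter rewrites text before tokenization, so the same file folds both indexed documents and incoming queries when used in both the index and query analyzers.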
Implementation of TSVCC in Islandora@CUHK
Figure: extract from schema.xml
Implementation of TSVCC in Islandora@CUHK
• Input in either Traditional or Simplified Chinese retrieves exactly the same results
  Screenshots: search in Traditional Chinese; search in Simplified Chinese
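Putting the pieces together: if the same fold is applied to documents and queries before tokenization, a Traditional and a Simplified query produce identical tokens and therefore identical result sets. A self-contained toy index under those assumptions (a three-entry fold table, unigram plus bigram tokens, AND matching), not the production Solr setup:

```python
from typing import Dict, List, Set

# Tiny illustrative fold table (variant -> canonical form); the real TSVCC
# table has thousands of entries and its canonical choices may differ.
FOLD = str.maketrans({"学": "學", "国": "國", "臺": "台"})


def tokens(text: str) -> Set[str]:
    """Fold variants, drop whitespace, then emit unigrams plus overlapping bigrams."""
    chars = [c for c in text.translate(FOLD) if not c.isspace()]
    grams = set(chars)  # unigrams let one-character queries such as 中 match
    grams.update(chars[i] + chars[i + 1] for i in range(len(chars) - 1))
    return grams


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Inverted index: token -> ids of documents containing it."""
    index: Dict[str, Set[str]] = {}
    for doc_id, text in docs.items():
        for tok in tokens(text):
            index.setdefault(tok, set()).add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return sorted ids of documents containing every query token."""
    wanted = tokens(query)
    if not wanted:
        return []
    return sorted(set.intersection(*(index.get(t, set()) for t in wanted)))


DOCS = {"a": "香港中文大學圖書館", "b": "中國學生周報", "c": "臺灣"}
INDEX = build_index(DOCS)

if __name__ == "__main__":
    print(search(INDEX, "大學"), search(INDEX, "大学"))  # ['a'] ['a']
    print(search(INDEX, "中國"), search(INDEX, "中国"))  # ['b'] ['b']
    print(search(INDEX, "台灣"))                          # ['c']
```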
Conclusion: Islandora@CUHK
• Islandora is the right move for the continued development of CUHK digital initiatives
• The problems found in displaying and searching Chinese characters became the main development features of Islandora@CUHK
• The enhanced features built around the idiosyncrasies of Chinese characters further strengthen the platform's ability to make CUHK Library treasures discoverable
• A new team, Research Support & Digital Initiatives, has just been established in the Library to lead the development of e-Research; digital collections in Islandora are a core component of this strategy
References
Chinese Characters Searching
• https://en.wikipedia.org/wiki/Ambiguities_in_Chinese_character_simplification
• http://hkiug.ln.edu.hk/unicode
• http://hkiug.ln.edu.hk/unicode/hkiug_tsvcc_table-UnicodeVersion-1.0.html
Transcription Display for Chinese Text
• https://en.wikipedia.org/wiki/Horizontal_and_vertical_writing_in_East_Asian_scripts