Requirements of End User Defined Characters & Some Frequently Used Solutions
Chinese Foundation for Digitization Technology (CMEX)
Phobos Chang

Agenda
– The Need for End User Defined Characters
– When to Use EUDC
– Embedded Resources
– Multi-Typeface Support
– Mapping Information for Further Use
– Frequently Used Solutions

The Need for End User Defined Characters
The first widely accepted Traditional Chinese encoding (Big5) defines only 13,053 hanzi.
– Some characters used in addresses are not included.
– Characters used in people's names are not fully included.
  • Even the Prime Minister's name (only three characters) lacks one.
  • Tax collection
  • Fortune tellers' issues
– Characters used in ancient books and in the names of historical figures are missing.
– EUDC support has been needed from day one, starting in the MS-DOS era.

When to Use EUDC
When an EPUB device cannot display a character with the fonts bundled with it at purchase.

GNU Unifont 5.1
– A project started in 1998 by Roman Czyborra
– Covers the Basic Multilingual Plane (BMP) of the Unicode 5.1 standard
– Began as a bitmap font (8x16 or 16x16 glyphs), later converted to TrueType
– Priority: first a glyph for every BMP code point, beauty second
– Sample

Hardware Resource Limitation
Why does CMEX suggest the BMP as a minimum?
– The BMP includes 27,484 normalized hanzi.
– Supplementary characters are too numerous for low-end devices.
  • CJK Unified Ideographs Extension B alone contains 42,711 characters.
  • Surrogate-pair support is not yet publicly available.
– Not every book uses code points beyond the BMP; the books that need EUDC support are few.

Requirement 1
Define EUDC as:
– End User Defined Characters are characters whose interpretation is not specified by the current Unicode standard, plus characters whose interpretation is specified by the Unicode standard but whose assigned code points lie outside the BMP.
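As a rough illustration of this definition, the Python sketch below flags a character as EUDC when Unicode specifies no interpretation for it (private use or unassigned) or when it is assigned outside the BMP. It is only an approximation under stated assumptions: it relies on the Unicode database bundled with the Python interpreter, which is newer than Unicode 5.1, and the function names is_eudc and find_eudc are hypothetical.

```python
import unicodedata

BMP_MAX = 0xFFFF  # last code point of the Basic Multilingual Plane


def is_eudc(ch: str) -> bool:
    """Approximate the EUDC definition using Python's bundled Unicode data."""
    category = unicodedata.category(ch)
    # 'Co' = private use, 'Cn' = unassigned: Unicode gives no interpretation.
    if category in ("Co", "Cn"):
        return True
    # Assigned by Unicode, but outside the BMP (e.g. CJK Extension B).
    return ord(ch) > BMP_MAX


def find_eudc(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for every EUDC character in text."""
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(text) if is_eudc(ch)]


if __name__ == "__main__":
    # U+4E2D is in the BMP; U+20000 is Extension B; U+E000 is private use.
    print(find_eudc("中\U00020000\ue000"))
```

A reading system could run a check like this over a book's text to decide whether EUDC resources are needed at all.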
Requirement One
– Any character that is not defined in the current Unicode standard, or that is defined there but has a code point outside the Unicode BMP, can be used in any EPUB document via EUDC support.

Embedded Resources & Requirement 2
Not every EPUB device supports a wireless connection. Devices that do may still be carried to places without a connection, such as a basement. We want EUDC support to work in such circumstances.
Requirement Two
– For any EPUB document that contains EUDC, all resource files needed to display the EUDC can be embedded inside the EPUB zip file.

Multi-Typeface Support
Some EPUB devices let users choose the typeface used for display.
For example, Song (細明體) and Kai (楷體) are the two most commonly used typefaces in Traditional Chinese.
– To display EUDC in either typeface, two separate resources are needed, one for each typeface.
Requirement Three
– There should preferably be a mechanism that assigns a corresponding EUDC display resource to each font used in an EPUB document.

Mapping Information for Further Use
What if an EPUB device does not support EUDC? Provide useful information for later processing.
Requirement Four
– Mapping information for all EUDC used in an EPUB document should preferably be embedded in that document.
– When mapping information is embedded, each EUDC that is defined by the Unicode standard but lies beyond the BMP should be mapped to its code point, such as U+20000;
– each EUDC that is not defined by the Unicode standard should be mapped to a useful reference coding scheme, such as TF-2121 in Taiwan's CNS 11643 standard.

Use of Private Use Area
Most EUDC solutions in Taiwan are PUA-centric, covering:
– Input within the input method environment
– Display in every application
– Printing
Pros
– Easy to use when authoring
– Much more straightforward
Cons
– Code point ranges must be checked when rendering
– Unicode normalization

Frequently Used Solutions
– In-line images
– Java applet for EUDC display and input
– EUDC display using Ajax
– Embedded OpenType (EOT)
– sIFR
– Web Open Font Format (WOFF)

Embedded OpenType (EOT)
– Designed by Microsoft
– Submitted to the W3C in 2007 as part of CSS3 and rejected
– Resubmitted to the W3C in 2008 as a standalone submission
– IE only
– Not widely accepted, even in Taiwan

sIFR
– Scalable Inman Flash Replacement
– Open source
– JavaScript and Adobe Flash

Web Open Font Format (WOFF)
– Under development in 2009
– A strong favorite for standardization by the W3C Web Fonts Working Group
– Vendor support
  • Firefox since 3.6
  • Microsoft IE 9
  • WebKit
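As a sketch of how Requirements Two and Three might be combined with a PUA-centric approach and web fonts, the Python below emits CSS that pairs each typeface with its own EUDC font packed inside the EPUB. Each EUDC font is limited by the CSS unicode-range descriptor to the BMP Private Use Area (U+E000-F8FF), and is listed first in a font-family stack so all other characters fall through to the base typeface. The class names, family names, and font paths are hypothetical, the url() paths are resolved relative to the CSS file, and support for WOFF and unicode-range varies across EPUB reading systems.

```python
from dataclasses import dataclass

# BMP Private Use Area in CSS unicode-range syntax.
PUA_RANGE = "U+E000-F8FF"


@dataclass
class TypefaceEudc:
    css_class: str    # hypothetical class that selects this typeface in the book's CSS
    base_family: str  # the typeface's own font family (assumed available on the device)
    eudc_family: str  # hypothetical family name for the matching EUDC font
    eudc_path: str    # hypothetical path of the EUDC WOFF file inside the EPUB zip


def eudc_css(typefaces: list[TypefaceEudc]) -> str:
    """Emit CSS pairing each typeface with its own embedded EUDC font."""
    blocks = []
    for t in typefaces:
        # EUDC font restricted to the PUA, so it never overrides standard glyphs.
        blocks.append(
            "@font-face {\n"
            f'  font-family: "{t.eudc_family}";\n'
            f'  src: url("{t.eudc_path}") format("woff");\n'
            f"  unicode-range: {PUA_RANGE};\n"
            "}"
        )
        # EUDC family listed first; non-PUA characters fall through to the base typeface.
        blocks.append(
            f".{t.css_class} {{\n"
            f'  font-family: "{t.eudc_family}", "{t.base_family}", serif;\n'
            "}"
        )
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(eudc_css([
        TypefaceEudc("song", "Song", "EUDC-Song", "fonts/eudc-song.woff"),
        TypefaceEudc("kai", "Kai", "EUDC-Kai", "fonts/eudc-kai.woff"),
    ]))
```

Keeping one EUDC font per typeface means the user's choice of Song or Kai also switches the EUDC glyphs, which is the behavior Requirement Three asks for.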