Internationalizing OpenClinica

Localizing OpenClinica
Hiroaki Honshuku: SQA
What is Character Encoding?
• Morse Code (1840) → Latin alphabet
• ASCII (1963)
    • The American Standard Code for Information Interchange
    • Characters, numerals, symbols, control characters
    • 7-bit: 0-127
    • 0x41 = letter ‘A’, 0x61 = letter ‘a’
• ISO-8859-n
    • 8-bit: 0-255
    • iso-8859-1: Latin-1, covers most Western European languages
    • iso-8859-5: Cyrillic alphabet
    • No CJK (Chinese, Japanese, Korean) support
What is Character Encoding (cont.)
• iso-8859-1 versus iso-8859-5

    iso-8859-1                    iso-8859-5
    A (Latin)  0x41 (65)          А (Cyrillic)  0xB0 (176)
    B (Latin)  0x42 (66)          В (Cyrillic)  0xB2 (178)

• CJK encoding mess
    • Chinese: Big5 (Traditional), GB18030 (Simplified)
    • Japanese: iso-2022-JP, EUC-JP, Shift-JIS
    • Korean: EUC-KR, KS C 5861
• Windows proprietary encodings
    • CP1252, CP932, etc.
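The comparison above can be checked with a short Python sketch. It decodes the same byte values under several 8-bit encodings using only the standard library codecs; the helper name `decode_byte` is made up for this illustration.

```python
import unicodedata


def decode_byte(value, encoding):
    """Decode one byte value (0-255) under the named encoding."""
    return bytes([value]).decode(encoding)


def describe(ch):
    """Return the Unicode name of a character, or a placeholder for controls."""
    return unicodedata.name(ch, "<control character>")


# 0x41 is ASCII, so every ISO-8859-n encoding agrees on it.
# 0xB0 and 0xB2 are above 0x7F, so their meaning depends on the encoding.
for value in (0x41, 0xB0, 0xB2):
    latin1 = describe(decode_byte(value, "iso-8859-1"))
    cyrillic = describe(decode_byte(value, "iso-8859-5"))
    print(f"0x{value:02X}  iso-8859-1: {latin1:<24} iso-8859-5: {cyrillic}")

# CP1252 puts printable characters in 0x80-0x9F, where ISO-8859-1 has controls.
print("0x93  cp1252:", describe(decode_byte(0x93, "cp1252")))
print("0x93  iso-8859-1:", describe(decode_byte(0x93, "iso-8859-1")))
```

The output uses Unicode character names rather than the characters themselves, so it prints cleanly even on a console that is not UTF-8 capable.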
Unicode
• 1987: Apple + Xerox
• 1991: Unicode Consortium
• UTF-8: 1,112,064 code points
    • Standard
    • ASCII compatible
    • Unix, Linux, Mac OS
    • Byte-oriented: no endianness issue, no BOM required
• UTF-16 (successor to UCS-2): 1,112,064 code points
    • Mainly Windows
    • Little endian on Windows: a BOM (Byte Order Mark) identifies the byte order
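The differences between UTF-8 and UTF-16 can be seen by encoding the same text both ways. The following is a minimal Python sketch using only standard codecs; the helper name `encodings_of` is hypothetical.

```python
import codecs

# Unicode scalar values: U+0000..U+10FFFF minus the 2,048 surrogate code points.
CODE_POINTS = 0x110000 - (0xDFFF - 0xD800 + 1)


def encodings_of(text):
    """Return the hex bytes of text under UTF-8 and both UTF-16 byte orders."""
    return {
        "utf-8": text.encode("utf-8").hex(),
        "utf-16-be": text.encode("utf-16-be").hex(),
        "utf-16-le": text.encode("utf-16-le").hex(),
    }


print(CODE_POINTS)
print(encodings_of("A"))        # ASCII: one byte in UTF-8, two in UTF-16
print(encodings_of("\u8a66"))   # a CJK character: three bytes in UTF-8

# A UTF-16 stream has two possible byte orders; the BOM tells a reader which.
with_bom = codecs.BOM_UTF16_LE + "A".encode("utf-16-le")
print(with_bom.hex(), ascii(with_bom.decode("utf-16")))
```

UTF-8 produces the same byte sequence on every platform, which is why it needs no byte-order marker.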
OpenClinica and i18n
• i18n support since 3.1.3
• OpenClinica i18n work in progress
    • Data Mart
        • Response Option Text
        • CRF Name
    • Discrepancy Note data passing
    • Escaping control characters and Microsoft proprietary characters
        • Should detect at CRF upload
    • Hard-coded strings
    • Missing encoding declaration in some export formats
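The kind of check suggested for CRF upload could look like the Python sketch below. It is not OpenClinica code. The function name `find_suspect_chars` is hypothetical, and treating the characters that CP1252 places in 0x80-0x9F (curly quotes, dashes, ellipsis and so on) as the "Microsoft proprietary characters" is an assumption.

```python
import unicodedata

# Characters CP1252 maps into 0x80-0x9F, where ISO-8859-1 has control codes.
# The five byte values skipped here are undefined in CP1252.
CP1252_EXTRAS = {
    bytes([b]).decode("cp1252")
    for b in range(0x80, 0xA0)
    if b not in (0x81, 0x8D, 0x8F, 0x90, 0x9D)
}

# Whitespace controls that are normal in text and should not be flagged.
ALLOWED_CONTROLS = {"\t", "\n", "\r"}


def find_suspect_chars(text):
    """Return (index, code point, reason) for each character worth reviewing."""
    findings = []
    for index, ch in enumerate(text):
        if ch in CP1252_EXTRAS:
            findings.append((index, f"U+{ord(ch):04X}", "cp1252"))
        elif unicodedata.category(ch) == "Cc" and ch not in ALLOWED_CONTROLS:
            findings.append((index, f"U+{ord(ch):04X}", "control"))
    return findings


# Curly quotes pasted from an office document, plus a stray vertical tab.
print(find_suspect_chars("Pain \u201cscore\u201d\x0b"))
```

Flagged characters are only candidates for review. Some of them, such as the euro sign, are legitimate in study text.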
Microsoft-specific issues
• Display issues on Windows
    • Pre-Windows 7, the GUI was not fully UTF-8 compatible
    • Characters displayed corrupted after saving data
• Viewing extracted data
    • Use a UTF-8 compatible text editor
    • Never copy/paste from MS Office
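The corruption described above typically appears when UTF-8 data is read with a Windows code page. The Python sketch below reproduces the effect with an in-memory string. The sample text is illustrative only, not taken from an OpenClinica extract.

```python
original = "caf\u00e9"                   # 'cafe' with an acute accent on the e

utf8_bytes = original.encode("utf-8")    # how a UTF-8 extract stores the text

# A viewer that assumes CP1252 turns the two-byte sequence into two characters.
garbled = utf8_bytes.decode("cp1252")

# A UTF-8 aware editor decodes the same bytes correctly.
restored = utf8_bytes.decode("utf-8")

print(utf8_bytes.hex())
print(ascii(garbled))
print(ascii(restored))
```

The bytes on disk are the same in both cases. Only the encoding the viewer assumes differs, which is why choosing a UTF-8 compatible text editor matters.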
Demonstration
• Search Subjects and Tables
• CRF and Data Entry
• Discrepancy Notes
• Rules
• Data Import
• Data Extract
How to Localize
• Documentation
    • https://docs.openclinica.com/3.1/technicaldocuments/openclinica-and-internationalization
• UTF-8 converter
    • i18n strings need to be written as hexadecimal Unicode escapes (\uXXXX)
    • http://www.branah.com/unicode-converter
    • Calendar widget can take UTF-8 strings
• Pseudo translation
    • Insert one distinctive non-ASCII character
    • Duplicate the English properties files first
    • Search for “ = ” and replace with “ = \u8a66”
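Both the hex conversion and the pseudo translation can be scripted instead of done by hand. The Python sketch below is one possible approach. The function names are hypothetical, and it assumes the properties files use the `key = value` layout, with spaces around the equals sign, that the search-and-replace tip implies.

```python
def to_unicode_escapes(text):
    """Replace every non-ASCII character with Java-style \\uXXXX escapes."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
            continue
        # Java escapes are UTF-16 code units, so characters outside the
        # Basic Multilingual Plane become a surrogate pair of two escapes.
        units = ch.encode("utf-16-be")
        for i in range(0, len(units), 2):
            out.append("\\u%04x" % int.from_bytes(units[i:i + 2], "big"))
    return "".join(out)


def pseudo_translate(properties_text, marker="\\u8a66"):
    """Prefix each value with a marker so untranslated strings stand out."""
    lines = []
    for line in properties_text.splitlines(keepends=True):
        if line.lstrip().startswith(("#", "!")):
            lines.append(line)  # leave comment lines untouched
        else:
            # Only the first " = " on a line separates the key from the value.
            lines.append(line.replace(" = ", " = " + marker, 1))
    return "".join(lines)


print(to_unicode_escapes("\u8a66"))
print(pseudo_translate("# buttons\nsubmit = Submit\n"), end="")
```

After a pseudo translation, any GUI text that appears without the marker character is likely a hard-coded string rather than one loaded from the properties files.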
How to Localize (cont.)
1. Duplicate English properties files
    • Exclude licensing.properties
2. Rename duplicated files to your locale
3. Date format
    • Edit the format.properties file
4. Translate per GUI page
    • Avoids possible legacy strings
    • Use a text editor that supports global search
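Steps 1 and 2 can be sketched as a small Python script. It is not an OpenClinica tool, and it rests on two assumptions: that the English files carry no locale suffix (for example `words.properties`), and that localized copies follow the Java resource bundle pattern `name_<locale>.properties`. The function name, the sample file names and the `ja` locale are illustrative.

```python
import shutil
import tempfile
from pathlib import Path


def duplicate_for_locale(source_dir, target_dir, locale,
                         exclude=("licensing.properties",)):
    """Copy each properties file to name_<locale>.properties, skipping excludes."""
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    created = []
    # sorted() lists the sources up front, before any copy is written.
    for src in sorted(source_dir.glob("*.properties")):
        if src.name in exclude:
            continue
        dst = target_dir / f"{src.stem}_{locale}.properties"
        shutil.copyfile(src, dst)
        created.append(dst.name)
    return created


# Demonstration on throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    english = Path(tmp) / "english"
    english.mkdir()
    for name in ("words.properties", "format.properties", "licensing.properties"):
        (english / name).write_text("key = value\n", encoding="utf-8")
    created = duplicate_for_locale(english, Path(tmp) / "localized", "ja")
    print(created)
```

Writing the copies to a separate directory keeps a second run from picking up the files created by the first.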
Thank You!