Bookmark Doc ID: Note:158577.1 Subject:NLS_LANG Explained (How does Client-Server Character Conversion Work?) Type: BULLETIN Status: PUBLISHED Fixed font Content Type: Creation Date: Last Revision Date: Go to End TEXT/PLAIN 24-SEP-2001 03-JUL-2003 Index of This Note: ------------------1.0 What is Oracle Globalization Support ? 1.1 What's the Purpose of This Document? 2.1 What is a Characterset or Code Page? 2.2 So Why Are There Different Charactersets? 2.3 What's the Difference Between 7 bit, 8 bit and Unicode Charactersets? 3.1 Why Should I Convert? 3.2 A detailed example of a *wrong* nls setup to understand what's going on. 3.3 How to see what's really stored in the database? 4.1 So What Should I Do? 4.1.1 Identify the characterset/codepage used by your clients. 4.1.2 Set the NLS_LANG on the client to the corresponding Oracle characterset. 4.1.3 Create your database with a characterset that supports ALL symbols used by your various clients. 4.1.4 Set NLS_LANG on the server ALSO to the characterset used by the OS (terminal type) of the server. 4.2 How can I Check the Client's NLS_LANG Setting? 4.2.1 On Unix: 4.2.2 On Windows: 4.3 4.3 Where is the Character Conversion Done? 4.4 NLS_LANG default value and priority of NLS parameters: 5.1 My windows sqlplus is not showing all my extended characters. 5.2 I get an (inverted) question mark (¿ or ?) when selecting back the just inserted character. 5.3 What about sql*loader, import, export, my tool? 5.4 What about database links? 5.5 What about webclients (browsers) and webservers connecting to Oracle? 5.6 What about Multiple Homes on Windows? 5.7 Is there an Oracle Unicode Client on Windows? 5.8 UTL_FILE is writing / reading incorrect characters. 5.9 Loading (XML) files as XMLtypes stores incorrect characters. 5.10 ODBC and NLS_LANG. 5.11 everything works except cut-and-paste from txt or word file to sqlplus. 6.1 Things Not Covered in This Note: 6.2 Some Other Interesting Notes: 1.0 What is Oracle Globalization Support ? ------------------------------------------ Globalization support enables Oracle software to support different languages and different national conventions in date and monetary formatting. It's also used to convert the charactersets of different clients to the characterset of the database. The name 'Globalization support' is the new name from Oracle9i onwards for 'National Language Support (NLS)'. 1.1 What's the Purpose of This Document? ---------------------------------------- To provide a basic understanding of what is going on if you set NLS_LANG, how the conversion is done and how to set up a correct configuration. If you think or notice that you have problems with character conversion the please *do* go first of all through this note, so that you have a good understanding what is a correct setup. If needed create a new db and test. If you understand this and then want to correct an existing enviroment go to: [NOTE:225912.1] Changing the Database Character Set - an Overview But *please* don't start changing you characterset without knowing what you are doing. And ALWAYS take a cold backup first. If you have ANY doubt, test it on a backup of your enviroment. 2.1 What is a Characterset or Code Page? ---------------------------------------- A characterset is just an agreement on what numeric value a symbol has. A computer does not know ' A ' or ' ? ', it only knows the (binary) numeric value for that symbol, defined in the characterset used by its Operating System (OS) or in hardware (firmware) for terminals. A computer can only manipulate numbers which is why there is a need for charactersets. An example is 'ASCII', an old 7 bit characterset, 'ROMAN8' a 8 bit characterset on unix or 'UTF8' a multibyte characterset. A code page is the name for the Windows/DOS encoding schemes, for Oracle NLS you can consider it the same as a characterset. You also have to distinguish between a FONT and a characterset/codepage. A font is used by the OS to convert a numeric value into a graphical 'print' on screen. The Windings Font on Windows is the best example of a font where an ' A ' is NOT shown as an ' A ' on screen, but for the OS the numeric value represents an ' A '. So you don't SEE it as an ' A ', but for Windows it's an ' A ' and will be saved (or used) as an ' A '. To better understand the above, just open MS Word, choose the Windings Font, type your name (you will see symbols) and save this as html, if you open the html file with Notepad you will see that in the <style> section the fonts are declared and lower in the <body> section you will find your name in plain text but with style='font-family:Wingdings' attribute. If you open it in Internet Explorer or Netscape, you will again see the Windings symbols. It's the presentation that changes, not the data itself. It's also possible that you don't see with a particular font ALL the symbols defined in the codepage you are using, just because the creator of the FONT did not include a graphical representation for all the symbols in that font. That's why you get sometimes black squares on the screen if you change fonts. On Windows you can use the 'Character Map' tool to see all the symbols defined in a font (!font, not characterset!). [NOTE:137127.1] for a more in-depth overview of this, highly recommended! Microsoft typography webpage: http://www.microsoft.com/typography/default.asp 2.2 So Why Are There Different Charactersets? --------------------------------------------- Two main reasons: * Historically vendors have defined different 'sets' for their hardware and software, mainly because there were no official standards. * New character sets have been defined to support new languages. With an 8 bit characterset, you are limited in the number of symbols you can support so there are different sets for different written languages. 2.3 What's the Difference Between 7 bit, 8 bit and Unicode Charactersets? ------------------------------------------------------------------------ A 7 bit characterset only knows 128 symbols (2^7) An 8 bit characterset knows 256 symbols (2^8) Unicode (UTF8) is a multibyte characterset. The latest version of the Unicode standard (3.1) defines 94,140 encoded characters. Unicode has the capability to define over a million characters . Oracle has several revisions of the Unicode standard implemented during the years: AL24UTFFSS (unicode version 1.1) Introduced in V7 and now obselete. UTF8 (unicode version 2.0) introduced in V8. AL32UTF8 (unicode version 3.1) introduced in V9. 3.1 Why Should I Convert? ------------------------- You can store data with the wrong setup, but that's up to you to take the risk. A common INCORRECT setup is storing 8 bit characters in a 7 bit database. This may render some tools or applications pretty useless at some point, not mentioning possible problems when upgrading Oracle..... So a CORRECT NLS setup is really NEEDED. 3.2 A detailed example of a *wrong* nls setup to understand what's going on: -------------------------------------------------------------------------- You have created a database on your unix box with the US7ASCII characterset. Your Windows clients work with the MSWIN1252 characterset (regional settings -> western europe) and you, as dba, use the unix shell (ROMAN8) to work on the database. You set NLS_LANG to american_america.US7ASCII on the clients and the server. (note: this is an INCORRECT setup to explain characterset conversion, don't use it in your enviroment!) a very important point: When the client NLS_LANG characterset is set to the same value as the database characterset, Oracle assumes that the data being sent or received are of the same encoding, so no conversions are performed. The data is just stored "as is", bit by bit.... Let's do something now: You insert an ' é '(LATIN SMALL LETTER E WITH ACUTE ) into a table ETEST containing one column 'TEST' of the type 'char'. As long as you insert into and select from the column on Windows NT clients with the MSWIN1252 characterset everything runs smoothly. No conversion is done and 8 bits are inserted and read back, even if the characterset of the database is defined as 7 bits. This happens because a byte is 8 bit and Oracle is ALWAYS using 8 bits even with a 7 bit characterset. In a correct setup the Most Significant Bit is just not used and only 7 bits are taken into account. For one reason or another you need to insert from the unix server. When you select from tables where data is inserted by the Windows clients you get a ' Õ ' (LATIN CAPITAL LETTER O WITH TILDE) for the ' é ' instead of the ' é '. If you insert ' é ' on the unix server and you select the row inserted on the unix at the Windows client you get an ' Å ' (LATIN CAPITAL LETTER A WITH RING ABOVE) back. The thing is that you have INCORRECT data in the database. You store the numeric value for ' é ' of the WIN1252 characterset in the database but you tell Oracle this is US7ASCII data, so Oracle is NOT converting anything and just stores the numeric value (again: Oracle thinks that the client is giving US7ASCII codes because the NLS_LANG is set to US7ASCII, and the database characterset is also US7ASCII -> no conversion done). When you select the same column back on the unix server, oracle is again thinking that the value is correct (Oracle is thinking that the terminal understands US7ASCII) and passes the value to the unix terminal without any conversion. Now the problem is that in the WIN1252 characterset the ' é' has the hexadecimal value 'E9' and in the Roman8 characterset the hexadecimal value for ' é ' is 'C5'. Oracle just passes the value stored in the database ('E9') to the unix terminal, and the unix terminal thinks this is the letter ' Õ ' because in its (Roman8) characterset the hexadecimal value 'E9'is representing the letter ' Õ '. So instead of the ' é ' you get ' Õ ' on the unix terminal screen. The inverse (the insert on the unix and the select on the Windows client) is just the same story, but you get other results. The solution is creating database with a characterset that contains ' é ' (WE8MSWIN1252,WE8ISO89859P1, UTF-8, etc..) and setting the NLS_LANG on client to WE8MSWIN1252 and on the server to WE8ROMAN8. If you then insert an ' é ' on both sides, you will get an ' é' back regardless of where you select them. Oracle knows then that a hexadecimal value of 'C5' inserted by the unix and a 'E9' from a MSWIN1252 client are both ' é ' inserts ' é ' into the database (the code in the database depends on the characterset you have chosen). The same problem appears if you add some Windows clients who are using another characterset and have an incorrect NLS_LANG set. You don't have to switch between unix, mainframe and Windows clients to run into this kind of problem. [NOTE:225938.1] Database Character Set Healthcheck gives more info on how to check if you can change you database characterset without losing data, even if you have stored mswin1252 in an us7ascii database or so. 2 important remarks: * The characterset defined with the NLS_LANG parameter does NOT CHANGE your client's characterset, it is used to let Oracle know what characterset you are USING on the client side, so Oracle can do the proper conversion. You cannot just set NLS_LANG to the characterset you WANT. * Another myth is that if you don't set the NLS_LANG on the client it uses the NLS_LANG of the server. This is also NOT true! see the section "4.4 NLS_LANG default value" for this. 3.3 How to see what's really stored in the database? ---------------------------------------------------To find the real numeric value for a character stored in the database is to use the dump command : [NOTE:13854.1] Dump SQL Command for NLS Debugging For a UTF8 database see also: [NOTE:69518.1] Determining the codepoint for UTF8 characters 4.1 So What Should I Do? ------------------------ To have a proper NLS environment you have to observe these steps: 4.1.1 Identify the characterset/codepage used by your clients. --------------------------------------------------------------> contact your OS vendor (Microsoft, HP, Sun...) if you have problems with your OS configuration. You may start with these sites and notes for your OS enviroment: For Microsoft Windows platforms: [NOTE:179133.1] The correct NLS_LANG in a Windows Environment [NOTE:199926.1] How to change the ANSI Code Page (ACP) on Windows http://www.microsoft.com/globaldev/reference/ (under the REFERENCE tab on the left of the page) OEM = the command line codepage, ANSI = the gui codepage For unix platforms, search your OS documenation for "LANG" and "locale" as this is differently implemented by each vendor. For unix the characterset is also defined by the terminal (emulation) used. Generic FAQ for Unicode/UTF-8 on POSIX systems (Linux, Unix) by Markus Kuhn http://www.cl.cam.ac.uk/~mgk25/unicode.html For Linux: (Li18NUX - Linux Internationalization Initiative) http://www.li18nux.org (Bruno Haible's Linux Unicode HOWTO) ftp://ftp.ilog.fr/pub/Users/haible/utf8/Unicode-HOWTO.html For Sun solaris: (Solaris 9 OE Globalization FAQ ) http://www.sun.com/developers/gadc/faq/sol9g11n.html For Tru64: (section 4.2 of the System Administration Guide for Tru64 UNIX Version 4.0F or higher) http://www.tru64unix.compaq.com/docs/base_doc/DOCUMENTATION/V40F_HTML /APS2RFTE/TITLE.HTM For HP-UX: (hp-ux 9.x - 11i internationalization features white paper) http://docs.hp.com/hpux/pdf/5971-2270.pdf For IBM AIX:(National Language Support Guide and Reference) http://publib16.boulder.ibm.com/pseries/en_US/aixprggd/nlsgdrf/nlsgdr f02.htm For fujitsu-siemens (SINIX , Reliant UNIX and solaris on Siemens) see: http://manuals.fujitsu-siemens.com For AS/400: (IBM publication National Language Support SC41-3101) http://publib.boulder.ibm.com/cgibin/bookmgr/bookmgr.cmd/books/qbkawc00/CONTENTS?SHELF=&DT=19940609071 756 For the classic MacOS(v7-9) see: http://developer.apple.com/techpubs/mac/Text/Text-30.html http://developer.apple.com/techpubs/mac/Text/Text-516.html Note that there are currently no supported versions for MacOS any more. For MacOS X: http://developer.apple.com/techpubs/macosx/macosx.html Note: 9i2 for MacOS X is no production release yet, just developer release and so not yet officially supported. You can download it here: http://technet.oracle.com/software/products/oracle9i/content.html This might also be usefull: http://www.macdevcenter.com/pub/a/mac/2003/01/03/oracle_part3.html?pa ge=1 4.1.2 Set the NLS_LANG on the client to the corresponding Oracle characterset. ----------------------------------------------------------------------------- Use this note: [NOTE:226492.1] Listing of Character Sets for 9.2, 9.0.1 and 8.1.7 including Language references Or if you use windows clients: [NOTE:179133.1] The correct NLS_LANG in a Windows Environment ********************************************************************* **** It cannot be stressed enough that you need to set the NLS_LANG to the characterset that your client is actually *using*. Not to the characterset you *want* to use, if you need on your client another characterset (to display cyrillic or so) the you need to see how you can change the characterset of the client on OS level. ********************************************************************* **** 4.1.3 Create your database with a characterset that supports ALL symbols used by your various clients. ------------------------------------------------------------------------------- Use locale builder (from 9i onwards) to view what characters are defined: [NOTE:223706.1] Using Locale Builder to view the definition of character Sets or use this note: [NOTE:226492.1] Listing of Character Sets for 9.2, 9.0.1 and 8.1.7 including Language references If you are thinking about using UTF8: [NOTE:119119.1] UTF8 Database Character Set Implications This select gives all the known charactersets for that release of Oracle on that platform. select unique VALUE from V$NLS_VALID_VALUES where PARAMETER ='CHARACTERSET'; You can create on Unix an database with a "Windows" characterset like WE8MSWIN1252. Oracle is not depending on the OS for the DATABASE (national) characterset. The only restriction is that you cannot use EBCDIC charactersets (like used on AS400 ea) on ASCII based platforms (like used on Unix and Windows) (or inverse) for the database characterset. select * from nls_database_parameters where parameter like '%CHARACTERSET%'; gives the current database (national) characterset. All charactersets that Oracle implements are based on industry standards, see these web sites for more info about the contents of an characterset: http://www.unicode.org/ http://www.iso.org/ http://www.microsoft.com/globaldev/reference/ (under the REFERENCE tab on the left of the page) 4.1.4 Set NLS_LANG on the server ALSO to the characterset used by the OS (terminal type) of the server. ----------------------------------------------------------------------- in fact the same as sections 4.1.1 and 4.1.2 ... 4.2 How can I Check the Client's NLS_LANG Setting? -------------------------------------------------- To be 100% sure about the value used by the client, you can use these methods to get back the value of NLS_LANG: 4.2.1 On Unix: -------------- sql>HOST ECHO $NLS_LANG This returns the value of the parameter. 4.2.2 On Windows: ----------------- On Windows NT you have two possible options, normally the NLS_LANG is set in the registry, but it can also be set in the environment, however this is not often done. The value in the environment takes precedence over the value in the registry and is used for ALL Oracle_Homes on the server(!). Also note that any USER enviroment variable is taking precedence over any SYSTEM enviroment variable (this is windows NT behaviour, nothing to do with Oracle) if set. Please use the command line mode of sqlplus -> sqlplus.exe in a command prompt. To check if it's set in the environment: sql>HOST ECHO %NLS_LANG% If this reports just %NLS_LANG% back, the variable is not set in the environment. If it's set it reports something like ENGLISH_UNITED KINGDOM.WE8ISO8859P1 If NLS_LANG is not set in the enviroment, check the value in the registry: sql>@.[%NLS_LANG%]. If you get something like: unable to open file ".[ENGLISH_UNITED KINGDOM.WE8ISO8859P1]." the "file name" between the '[]' is the value of the registry parameter. If you get this as result: unable to open file ".[%NLS_LANG%]." then the parameter NLS_LANG is also not set in the registry. Note: the @.[%NLS_LANG%]. "trick" reports the NLS_LANG known by the sqlplus executable, it will not read the registry itself. But then you are not sure if the variable is set in the enviroment or in the registry. That's the reason of checking with the host command first.. All other NLS parameters can be retrieved by a SELECT * FROM NLS_SESSION_PARAMETERS; See also [NOTE:241047.1] The Priority of NLS Parameters Explained. note: SELECT USERENV ('language') FROM DUAL; gives the session's <Language>_<territory> but the DATABASE character set, so the characterset returned is not the client's complete NLS_LANG setting! If you log a tar regarding an NLS issue please provide this information: [NOTE:226692.1] Finding out your NLS Setup. 4.3 Where is the Character Conversion Done? ------------------------------------------- Normally the conversion is done at client side for performance reasons. This is true from Version 8.0.4 onwards. If the database is using a characterset not known by the client then the conversion is done at server side. This is true from Version 8.1.6 onwards, if you are using pre-816 clients please see: [NOTE:70150.1] ALERT: Certain Character Sets Cause Client Code to Fail Note that there is a known problem with connecting with an V7 client to an utf8 9.2 database. This will not fail with an error but any non-us7ascii character will be incorrectly stored. There is no fix for this as this is an non-supported configuration. 4.4 NLS_LANG default value: --------------------------- see [NOTE:241047.1] The Priority of NLS Parameters Explained. 5.1 My windows sqlplus is not showing all my extended characters. ----------------------------------------------------------------- You see black squares instead of the characters. That's because sqlplusw.exe uses per default the Fixedsys font on windows and this font is not containing all characters, so it may have problems displaying some characters (the character is not defined in the font, so windows cannot display it. note that the character is correctly *stored* in the database) To get around this see (from 8.1.7 onwards): [NOTE:132453.1] How to Change the Displayed Font in SQL*Plus (GUI) on WinNT. Using 'Lucida Sans Unicode' should work for most cases. Note that you don't need a "unicode" font, just a font that knows how to render all the characters you want to use, but with a unicode font all chars should be covered. The same problem occurs with the dos version (sqlplus.exe), here change the properties of the dos box / cmd.exe / command prompt used. Under the Font tab choose there Lucida Console. As stated before you can see with the windows "Character Map" system tool what characters are known by a font. 5.2 I get an (inverted) question mark (? or ¿) when selecting back the just inserted character. -------------------------------------------------------------------------This is most likly intended behavior, when your client send characters to the database and the character you are inserting is not *known* by the database characterset then Oracle stores a "replacement" character. An example: you have a arabian windows client (AR8MSWIN1256 codepage), so you set your NLS_LANG to ARABIC_BAHRAIN.AR8MSWIN1256 and you are connecting to an database with an EE8MSWIN1250 database characterset. Now, as long as you use ascii type characters or extended characters like T or é there is no problem, but if you insert an arabian character like (ARABIC LETTER TEH , unicode code point 0x62b ) then you get an " ? " because Oracle cannot find a matching letter in the 1250 characterset. " ? " is used in the microsoft/windows charactersets and " ¿ " in the ISO charactersets as replacement character. It's also possible, but this less visible and not so common, that Oracle is doing a remapping to *another* character that resembles somwhat to the one inserted: a plain " e " for an locale builder " é " or so , but this you can check with in the "Replacement Characters" tab for the characterset you are using on that database. Of course, again, using an UTF8 database characterset solves all this... There is even a proposal to add Tolkiens Tengwar script to the unicode standard, not yet supported by Oracle ;-). 5.3 What about sql*loader, import, export, my tool? --------------------------------------------------- Basicly it's always the same for any tool used to input data: you have INPUT to a client: * for sqlloader this is the txt/xls/html file you read. * for import the dump file your read. * for sqlplus is this you command line or GUI enviroment "driving" your keyboard. * for a Thick Jdbc (oci) driver the java program that calls it. * for forms (windows runtime): [NOTE:105809.1] Character Set Support for Developer Tools * Web Forms 6i/9i is unicode enabled . * etc You have to set the character set part of the NLS_LANG to the characterset of the INPUT telling Oracle what charterset you are using so the Oracle client can do the conversion. For output, the same thing, if the tool you are using (like exp) is capable of outputting the characterset wanted (!) you just tell Oracle to convert it to that charset by setting the NLS_LANG on the client. 5.4 What about database links? ------------------------------ The NLS_LANG on the server (or client) has no influence on characterset conversion trough a database link, Oracle will do the conversion from the (national) characterset of the source database to the (national) characterset of the target database (or inverse). 5.5 What about webclients (browsers) and webservers connecting to Oracle? ------------------------------------------------------------------------ see [NOTE:229786.1] NLS_LANG and webservers explained. 5.6 What about Multiple Homes on Windows? ----------------------------------------- There is nothing special with NLS_LANG and the multiple homes on Windows. The parameter taken into account is the one specified in the ORACLE_HOME registry key used by the executable. Again, if set in the environment, it takes precedence over the value in the registry and is used for ALL Oracle_Homes on the server/client(!). The NLS_LANG can be found in these registry keys: HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE or HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\HOMEx See [NOTE:73963.1] Using multiple ORACLE HOMES on Windows platform for more information 5.7 Is there an Oracle Unicode Client on Windows? ------------------------------------------------- No, the gui of windows is not unicode, windows provide an API to the unicode layer of the OS (this is used by MS office 2000 / XP for example) but Oracle uses the "old" GUI API (like 99% of the windows programs). IF you need to display unicode then you might want to use iSQL*PLUS the browser based version of sqlplus.. See: [NOTE:231231.1] Quick setup of iSQL*Plus as unicode (UTF8) client on windows. 5.8 UTL_FILE is writing / reading incorrect characters. ------------------------------------------------------- See [NOTE:227531.1] Character set conversion when using UTL_FILE 5.9 Loading (XML) files as XMLtypes stores incorrect characters. ---------------------------------------------------------------- See [NOTE:229291.1] XDB (xmltype) functionallity and NLS related issues for 9.2 5.10 ODBC and NLS_LANG. ---------------------- See [NOTE:231953.1] ODBC and NLS Related Things to Know 5.11 everything works except cut-and-paste from txt or word file to sqlplus: --------------------------------------------------------------------------- See [NOTE:226558.1] An example inserting cyrillic data into a database on west european windows. for a live example en explenation what's happening when inserting data from txt or word files containing data in another code page than you are normally using. (that note uses cyrillic as example but you can easily use it for other languages.) 6.1 Things Not Covered in This Note: ------------------------------------ NLS_SORT (see [NOTE:13978.1] and [NOTE:13882.1] for this) NLS_TERRITORY, NLS_CURRENCY, and NLS_NUMERIC_CHARACTERS (see the Globalization Manual) NLS_DATE_FORMAT (see [NOTE:30557.1] for this) The Euro (see [NOTE:68790.1] for this) Export/import (see [NOTE:15095.1] for this) 6.2 Some Other Interesting Notes: --------------------------------- [NOTE:60134.1] NLS Frequently Asked Questions [NOTE:152935.1] Alert: Warning of Change to Support of the EURO for the 12 EMU Countries [NOTE:226692.1] Finding out your NLS Setup. [NOTE:241047.1] The Priority of NLS Parameters Explained. [NOTE:132453.1] How to Change the Displayed Font in SQL*PLUS (GUI) on WinNT [NOTE:179133.1] The correct NLS_LANG in a Windows Environment [NOTE:199926.1] How to change the ANSI Code Page (ACP) on Windows [NOTE:231231.1] Quick setup of iSQL*Plus as unicode (UTF8) client on windows. [NOTE:69518.1] Determining the codepoint for UTF8 characters [NOTE:223706.1] Using Locale Builder to view the definition of character sets [NOTE:119119.1] UTF8 Database Character Set Implications [NOTE:62107.1] The National Character Set in Oracle8 [NOTE:225912.1] Changing the Database Character Set - an Overview [NOTE:123670.1] Use Scanner Utility before Altering the Database Character Set [NOTE:225938.1] Database Character Set Healthcheck [NOTE:66320.1] Changing the Database Character Set or the Database National Character Set [NOTE:137127.1] Character Sets, Code Pages, Fonts and the NLS_LANG Value [NOTE:13854.1] Dump SQL Command for NLS Debugging [NOTE:77442.1] ORA_NLS (ORA_NLS32, ORA_NLS33) Environment Variables. [NOTE:132090.1] How to get messages in your own language on MS Windows platform? [NOTE:115001.1] NLS_LANG Client Settings and JDBC Driver [NOTE:131207.1] How to Set Unix Environment Variable [NOTE:227531.1] Character set conversion when using UTL_FILE [NOTE:229291.1] XDB ( xmltype ) functionallity and NLS related issues for 9.2 [NOTE:231953.1] ODBC and NLS Related Things to Know [NOTE:105809.1] Character Set Support for Developer Tools additional information from Globalization Support: http://otn.oracle.com/tech/globalization/content.html and http://otn.oracle.com/products/oracle8i/htdocs/faq_combined.htm a Case study: [NOTE:187739.1] NLS Setup in a Multilingual Database Environment For further NLS / Globalization information you may start here: [NOTE:150091.1] Globalization Technology (NLS) Library index . Bookmark Doc ID: Note:231953.1 Fixed font Content Type: Go to End TEXT/PLAIN Subject:ODBC and NLS Related Things to Know Type: BULLETIN Status: PUBLISHED Creation Date: Last Revision Date: 10-MAR2003 19-MAR2003 PURPOSE ------- Give a little overview of common NLS related cavevats with ODBC. ODBC and NLS Related Things to Know: ------------------------------------ * From 8i drivers onwards it's NOT possible any more to "disable" Characterset conversion by specifying for the NLS_LANG the same characterset as the database characterset. There is now ALWAYS a check to see if a codepoint is valid for that characterset. Typically you will encounter problems if you upgrade an environment that has NO NLS_LANG set on the client (or US7ASCII) and the database was also US7ASCII. This *incorrect* setup allowed you to store characters like éèç in an US7ASCII database, with the new 8i drivers this is not possible any more. The solution is to correct your NLS setup as described in the notes: [NOTE:158577.1] NLS_LANG Explained (How does Client-Server Character Conversion... [NOTE:225912.1] Changing the Database Character Set - an Overview and related... * The NLS_LANG is for ODBC 8.1.7.2.0 and up now taken from the Oracle_Home where the ODBC driver is installed, before ODBC 8.1.7.2.0 it was taken from the default_home, even if this was NOT the home of that version (so if you have a 734 and an 806 home on the same pc, then for the 806 ODBC the NLS_LANG of the 734 home is used, you can only have one ODBC driver of this version installed...) The ODBC installation now supports multiple Oracle homes. Each installation of the ODBC driver will be uniquely identified by the name of the Oracle home which it is installed under. For example, if the name of the Oracle home is "OraHome81" the ODBC driver will installed as "Oracle in OraHome81". The Oracle ODBC driver use to always be installed as "Oracle ODBC Driver". This list of installed ODBC drivers can be viewed from the ODBC Administrator utility under the "Drivers" tab. * A datasource configuration option to force SQLDescribeCol to return a data type of SQL_WCHAR for SQL_CHAR columns, SQL_WVARCHAR for SQL_VARCHAR columns, and SQL_WLONGVARCHAR for SQL_LONGVARCHAR columns has been added in ODBC 8.1.7.1.0 and up. Enabling this option allows ADO applications to use Unicode. ADO relies on the return value of SQLDescribeCol to determine how to bind the result column. Currently the Oracle ODBC driver would never return a data type of 'SQL_W' because the database does not support defining columns as type Unicode. By default Force WCHAR Support is disabled. Of course, you have to have an unicode database. * The Oracle8 ODBC driver (ODBC 8.1.5.5.0 and up) now supports Unicode. Unicode support is dependent on the Unicode features available through the Oracle Call Interface (OCI). OCI 8.1.5 supports inputting Unicode data into a database through SQLBindParameter and retrieving Unicode data from a database through SQLBindCol or SQLGetData. http://otn.oracle.com/docs/tech/windows/odbc/htdocs/817help/sqoraUnicode_Support.htm * The installed CLIENT software is responsible for the characterset conversion, so if you need support for the EURO symbol for example then you need to be shure that the client version is supporting this. Using a 806 odbc driver with an 805 client will NOT support euro symbol for example as the 805 client NLS libraries have no support for EURO, use a 806 client here... * See "How to write an ODBC Application to Support Unicode" on http://technet.oracle.com/tech/globalization/content.html for some more info. RELATED DOCUMENTS ----------------- [NOTE:158577.1] NLS_LANG Explained (How does Client-Server Character Conversion Work?) [NOTE:179133.1] The correct NLS_LANG in a Windows Environment [NOTE:225912.1] Changing the Database Character Set - an Overview For further NLS / Globalization information you may start here: [NOTE:150091.1] Globalization Technology (NLS) Library index . Bookmark Doc ID: Note:179133.1 Subject:The correct NLS_LANG in a Windows Environment Type: BULLETIN Status: PUBLISHED Fixed font Content Type: Creation Date: Last Revision Date: Go to End TEXT/PLAIN 07-MAR2002 13-JUN-2003 Content: -------- 1. Key concepts/terminology. 2. How to set up my NLS_LANG 3. The correct NLS_LANG for my Windows ANSI Code Page 4. The correct NLS_LANG for my DOS / Command Prompt OEM Code Page 5. How to check the NLS_LANG 6. List of common NLS_LANG to be set in Windows registry 7. List of common character sets to be used in a command prompt 8. How Windows uses Fonts to display the different charactersets 1. Key Concepts/terminology: ---------------------------- The intention of this note is to provide windows specific information in addition of [NOTE:158577.1] NLS_LANG Explained (How does ClientServer Character Conversion Work?). Please read that note first to have an idea how NLS_LANG works. 1.1 Windows and Dos Code Pages: ------------------------------- On Windows systems, the encoding scheme (=Characterset) is specified by a Code Page. Code Pages are defined to support specific languages or groups of languages which share common writing systems. Usually, in non Chinese-Japanese-Korean environments, the Windows GUI and DOS command prompt do not use the same code page (!). From Oracle point of view the terms Code Page and Characterset mean the same. 1.2 Fonts: ---------- A font is a collection of glyphs (from "hieroglyphs") that share common appearances (typeface, character size). A font is used by the operating system to convert a numeric value into a graphical representation on screen. A font does not necessarly contain a graphical representation for all numeric values defined in the code page you are using. That's why you get sometimes black squares on the screen if you change fonts and the new that font has no representation for a certain symbol. The Windows "Character Set Map" utility can be used to see which glyphs are part of a certain font. On Windows 2000: Start -> Programs -> Accessories -> System Tools -> Character Map or Start -> Run... Type "charmap", and click "ok" A font also implements a particular code page or set of code pages. For example, the Arial font implements the code pages 1252, 1250, 1251, 1253, 1254, 1257. For more in-depth info see point 8 in this note. 2. How to setup my NLS_LANG: ---------------------------- To specify the locale behaviour of your client Oracle software, you have to set your NLS_LANG parameter. It sets the language, territory and also the character set of your client. For a short overview, it uses the following format: NLS_LANG = LANGUAGE_TERRITORY.CHARACTERSET where: LANGUAGE specifies: - language used for Oracle messages, - day names and month names TERRITORY specifies: - monetary and numeric formats, - territory and conventions for calculating week and day numbers CHARACTERSET: - controls the character set used by the client application * or it matches your Windows code page * or it set to UTF8 for an unicode application The list of supported character sets,languages and territory can also be found in the Oracle9i Globalization Support Guide, Appendix A, Locale Data Available online at the following URL: http://otn.oracle.com/pls/db92/db92.docindex?remark=homepage The NLS_LANG parameter is never inherited from the server. Please also see: [NOTE:241047.1] The Priority of NLS Parameters Explained. 2.1 In the Registry: -------------------- On Windows systems, you should make sure that you have set an NLS_LANG registry subkey for each of your Oracle Homes: You can easily modify this subkey with the Windows Registry Editor: Start -> Run... Type "regedit", and click "ok" Edit the following registry entry: HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\HOMExx\ where "xx" is the unique number identifying the Oracle home. There you have an entry with as name NLS_LANG When starting an Oracle tools, like sqlplusw, it will read the content of the oracle.key file located in the same directory to determine which registry tree will be used, therefore which NLS_LANG subkey will be used. 2.2 As a System or User Environment Variable, in System properties: ------------------------------------------------------------------- Although the Registry is the primary repository for settings on Windows, it is not the only place where parameters can be set. Even if not at all recommended, you can set the NLS_LANG as a System or User Environment Variable in the System properties. This setting will be used for ALL Oracle homes. To check and modify them: Right-click the 'My Computer' icon -> 'Properties' Select the 'Advanced' Tab -> Click on 'Environment Variables' The 'User Variables' list contains the settings for the specific OS user currently logged on and the 'System variables' system-wide variables for all users. Since these environment variables take precedence of the parameters already set in your Registry, you should not set Oracle parameters at this location unless you have a very good reason. Particularly note the "ORACLE_HOME" parameter that is set on unix but NOT on windows. 2.3 As an Environment variable defined in the command prompt: ------------------------------------------------------------- If you set the NLS_LANG as an environment variable in a Command prompt, be aware that it will overrite the current NLS_LANG setting in the Registry and also the System Properties. In an MS-DOS command prompt, use the set command, for example: C:\> set NLS_LANG=american_america.WE8PC850 3. The correct NLS_LANG for my Windows ANSI Code Page: ------------------------------------------------------ 3.1 Determine your Windows ANSI code page: ------------------------------------------ The ACP (Ansi Code Page) is defined by the "default locale" setting of windows, so if you have a UK Windows 2000 client and you want to input cyrillic (russian) you need to change the ACP (by changing the "default locale") in order to be able to input russian. see [NOTE:199926.1] How to change the ANSI Code Page (ACP) on Windows. You'll find its value in the registry: Start -> Run... Type "regedit", and click "ok" Browse the following registry entry: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage\ There you have (all the way below) an entry with as name ACP The value of ACP is your current GUI Codepage, see the table in point 3.2 for the mapping to the oracle name. Since there are many registry entries with very similar names, please make sure that you are looking at the right place in the registry. Again, if you need to change the "ACP" please see: [NOTE:199926.1] How to change the ANSI Code Page (ACP) on Windows Do NOT simply change it in the registry. Additionally, the following URL provides a list of the default code pages for all Windows versions: http://www.microsoft.com/globaldev/reference/ (under the REFERENCE tab on the left of the page) OEM = the command line codepage, ANSI = the gui codepage 3.2 Find the correspondent Oracle client character set: ------------------------------------------------------- Find the Oracle client character set in the table below based on the ACP you found in point 3.1. Note that there is only ONE CORRECT value for a given ACP ANSI CodePage (ACP) NLS_LANG) Oracle Client character set (3rd part of 1250 EE8MSWIN1250 1251 CL8MSWIN1251 1252 WE8MSWIN1252 1253 EL8MSWIN1253 1254 TR8MSWIN1254 1255 IW8MSWIN1255 1256 AR8MSWIN1256 1257 BLT8MSWIN1257 1258 VN8MSWIN1258 874 TH8TISASCII 932 JA16SJIS 936 ZHS16GBK 949 KO16MSWIN949 950 ZHT16MSWIN950 others UTF8 You can use UTF8 as Oracle client character set on Windows NT, 2000 and XP but you will be limited to use only client programs that explicitly support this configuration. This is because the user interface of Win32 is not UTF8, therefore the client programs have to perform explicit conversions between UTF8 (used on Oracle side) and UTF16 (used on Win32 side). An example of such a program is Oracle Forms in version 5 and later on NT 4.0. [NOTE:105809.1] Character Set Support for Developer Tools or iSQLplus (from 817 onwards). see [NOTE:231231.1] Quick setup of iSQL*Plus as unicode (UTF8) client on windows. From the other side, programs relying on ANSI Win32 API, like SQL*Plus, older Oracle Forms , etc. cannot work with an NLS_LANG set to UTF8. For Export / Import please see: [NOTE:227332.1] NLS considerations in Import/Export 3.3 Set it in your Registry: ---------------------------- Use the Windows Registry Editor to set up the NLS_LANG in your Oracle Home with the value you have just find above. Section 2.1 gives you more details on how to use the Registry Editor for that purpose. 4. The correct NLS_LANG for my DOS / Command Prompt OEM Code Page: ------------------------------------------------------------------ MS-DOS mode uses, with a few exceptions like CJK, a different code page (called OEM code page) than Windows GUI (ANSI code page). Meaning that before using an Oracle command line tool such as SQL*Plus (sqlplus.exe/ plus80.exe / plus33.exe ) en svrmgrl in a command prompt then you need to MANUALLY SET the NLS_LANG parameter as an environment variable with the set DOS command BEFORE using the tool. For Japanese, Korean, Simplified Chinese, and Traditional Chinese, the MS-DOS OEM code page (CJK) is identical to the ANSI code page meaning that, in this particular case, there is no need to set the NLS_LANG parameter in MS-DOS mode. In all other cases, you need to set it in order to overwrite the NLS_LANG registry key already matching the ANSI code page. The new "MS-DOS dedicated" NLS_LANG needs to match the MS-DOS OEM code page that could be retrieved by typing chcp in a Command Prompt: C:\> chcp Active code page: 437 C:\> set NLS_LANG=american_america.US8PC437 If the NLS_LANG parameter for the MS-DOS mode session is not set appropriately, error messages and data can be corrupted due to incorrect character set conversion. Use the following list to find the Oracle character set that fits to your MS-DOS code page in use on your locale system: MS-DOS code page NLS_LANG) Oracle Client character set (3rd part of 437 US8PC437 737 EL8PC737 850 WE8PC850 852 EE8PC852 857 TR8PC857 858 WE8PC858 861 IS8PC861 865 N8PC865 866 RU8PC866 For tools like sqlloader you need to set the NLS_LANG to the characterset of the FILE you loading. For Export / Import please see: [NOTE:227332.1] NLS considerations in Import/Export 5. How to check the NLS_LANG: ----------------------------- To check the NLS_LANG, you need to open a command prompt and to run sqlplus in command line mode. First, check if it's set in the environment: SQL> host echo %NLS_LANG% If this reports just %NLS_LANG% back, the variable is not set in the environment. If it's set it reports something like ENGLISH_UNITED KINGDOM.WE8PC850 If NLS_LANG is not set in the enviroment, you should check the value in the registry: SQL> @.[%NLS_LANG%]. If you get something like: unable to open file ".[ENGLISH_UNITED KINGDOM.WE8ISO8859P1]." the "file name" between the '[]' is the value of the registry parameter. (This is NOT an error but just a "trick" to get the NLS_LANG value) If you get this as result: unable to open file ".[%NLS_LANG%]." then the parameter NLS_LANG is also not set in the registry. Note: the @.[%NLS_LANG%]. "trick" reports the NLS_LANG known by the sqlplus executable, it will not read the registry itself. But then you are not sure if the variable is set in the enviroment or in the registry. That's the reason of checking with the host commando first. 6. List of common NLS_LANG's used in the Windows Registry: ---------------------------------------------------------- Operating System Locale Arabic (U.A.E.) EMIRATES.AR8MSWIN1256 NLS_LANG Value ARABIC_UNITED ARAB Bulgarian BULGARIAN_BULGARIA.CL8MSWIN1251 Catalan CATALAN_CATALONIA.WE8MSWIN1252 Chinese (PRC) SIMPLIFIED CHINESE_CHINA.ZHS16GBK Chinese (Taiwan) CHINESE_TAIWAN.ZHT16MSWIN950 TRADITIONAL Croatian CROATIAN_CROATIA.EE8MSWIN1250 Czech CZECH_CZECH REPUBLIC.EE8MSWIN1250 Danish DANISH_DENMARK.WE8MSWIN1252 Dutch (Netherlands) DUTCH_THE NETHERLANDS.WE8MSWIN1252 Dutch (belgium) DUTCH_BELGIUM.WE8MSWIN1252 English (United Kingdom) ENGLISH_UNITED KINGDOM.WE8MSWIN1252 English (United States) AMERICAN_AMERICA.WE8MSWIN1252 Estonian ESTONIAN_ESTONIA.BLT8MSWIN1257 Finnish FINNISH_FINLAND.WE8MSWIN1252 French (Canada) CANADIAN FRENCH_CANADA.WE8MSWIN1252 French (France) FRENCH_FRANCE.WE8MSWIN1252 German (Germany) GERMAN_GERMANY.WE8MSWIN1252 Greek GREEK_GREECE.EL8MSWIN1253 Hebrew HEBREW_ISRAEL.IW8MSWIN1255 Hungarian HUNGARIAN_HUNGARY.EE8MSWIN1250 Icelandic ICELANDIC_ICELAND.WE8MSWIN1252 Indonesian INDONESIAN_INDONESIA.WE8MSWIN1252 Italian (Italy) ITALIAN_ITALY.WE8MSWIN1252 Japanese JAPANESE_JAPAN.JA16SJIS Korean KOREAN_KOREA.KO16MSWIN949 Latvian LATVIAN_LATVIA.BLT8MSWIN1257 Lithuanian LITHUANIAN_LITHUANIA.BLT8MSWIN1257 Norwegian NORWEGIAN_NORWAY.WE8MSWIN1252 Polish POLISH_POLAND.EE8MSWIN1250 Portuguese (Brazil) BRAZILIAN PORTUGUESE_BRAZIL.WE8MSWIN1252 Portuguese (Portugal) PORTUGUESE_PORTUGAL.WE8MSWIN1252 Romanian ROMANIAN_ROMANIA.EE8MSWIN1250 Russian RUSSIAN_CIS.CL8MSWIN1251 Slovak SLOVAK_SLOVAKIA.EE8MSWIN1250 Spanish (Spain) SPANISH_SPAIN.WE8MSWIN1252 Swedish SWEDISH_SWEDEN.WE8MSWIN1252 Thai THAI_THAILAND.TH8TISASCII Spanish (Mexico) MEXICAN SPANISH_MEXICO.WE8MSWIN1252 Spanish (Venezuela) LATIN AMERICAN SPANISH_VENEZUELA.WE8MSWIN1252 Turkish TURKISH_TURKEY.TR8MSWIN1254 Ukrainian UKRAINIAN_UKRAINE.CL8MSWIN1251 Vietnamese VIETNAMESE_VIETNAM.VN8MSWIN1258 7. List of common NLS_LANG's used in the Command Prompt (DOS box): ------------------------------------------------------------------ Operating System Locale NLS_LANG) Oracle Client character set (3rd part of Arabic AR8ASMO8X Catalan WE8PC850 Chinese (PRC) ZHS16GBK Chinese (Taiwan) ZHT16MSWIN950 Czech EE8PC852 Danish WE8PC850 Dutch WE8PC850 English (United Kingdom) WE8PC850 English (United States) US8PC437 Finnish WE8PC850 French WE8PC850 German WE8PC850 Greek EL8PC737 Hungarian EE8PC852 Italian WE8PC850 Japanese JA16SJIS Korean KO16MSWIN949 Norwegian WE8PC850 Polish EE8PC852 Portuguese WE8PC850 Romanian EE8PC852 Russian RU8PC866 Slovak EE8PC852 Slovenian EE8PC852 Spanish WE8PC850 Swedish WE8PC850 Turkish TR8PC857 8. How Windows uses Fonts to display the different charactersets: ----------------------------------------------------------------- We assume you have an UTF8 database with correctly stored UTF8 codepoints. On Windows there are two kinds of tools / applications: 1)A fully Unicode enabled applications which accepts Unicode codepoints and which can render them. It's the application that needs to deal with the Unicode, Windows provides the unicode API but the GUI system itself is NOT Unicode "by nature". A fully Unicode application can only show one glyph for a given Unicode code point. So there is NO confusion possible here, this application will need to use a full unicode font. If you have a full unicode application, then you need to set the NLS_LANG to UTF8. Note that there are currently NOT many applications like this and if it's not explicitly mentioned by the vendor it's most likely an ANSI application (see below). So DON'T set the NLS_LANG to UTF8 if you are not sure! The only Unicode capable client that is included in the database is iSQLPLus. See [NOTE:231231.1] Quick setup of iSQL*Plus as unicode client on windows. for a guide on how to setup this. 2) An standard ANSI application (like sqlplusw.exe) cannot use Unicode code points. So the Unicode code point stored in the database needs to be CONVERTED to a ANSI code point. This is done by setting NLS_LANG (as described in further on in this note and in [NOTE:158577.1]. This allows oracle to map the unicode point to the characterset of the client, (and here comes the tricky part)but this is NOT the same as a font. If you want to display Arabic for example then you need to set the Windows characterset to Arabic. That way Windows knows what are valid codepoints and can use the FONT engine to DISPLAY the codepoints (this results in glyphs). Windows passes the codepoint and the "page" to the rendering engine. This "page" defines the glyphs for the codepoints for a certain characterset/codepage. Because there are only 256 possible positions for a ANSI application, and one font contains normally glyphs for different languages this "page" is used to select from a FONT that has (for example) all the glyphs for Cyrillic, Arabic and West-European the "page" for arabian. So lets say you have a Arabic setup that works, you change manually the "Page" of a FONT and ask to display the glyph for ANSI codepoint XX. Now 1 of 2 things can happen: 1) There is a character defined on that position for the CHARACTERSET of that "Page", so the creator of the font has forseen a glyph and this is displayed (but this is NOT the character expected or wanted as its stored as a different character in the database!). 2) There is NO character defined on that position for the CHARACTERSET of that "Page" so the creator of the font has NOT forseen a glyph and you get "garbage" or black squares (normally you should see a black square but a ? or ? are also possible, this depends on the error handling defined in the FONT). The above is also possible if you have an non-Unicode characterset for the database. For more information see also: [NOTE:137127.1] Character Sets, Code Pages, Fonts and the NLS_LANG Value Related Documents: ================== [NOTE:158577.1] NLS_LANG Explained (How does Client-Server Character Conversion Work?) [NOTE:137127.1] Character Sets, Code Pages, Fonts and the NLS_LANG Value <Note.199926.1> How to change the ANSI Code Page (ACP) on Windows NT 4.0 and Windows 2000 [NOTE:226558.1] An example inserting cyrillic data into a database on west european windows. [NOTE:223706.1] Using Locale Builder to view the definition of character sets [NOTE:132453.1] How to Change the Displayed Font in SQL*PLUS (GUI) on WinNT [NOTE:231231.1] Quick setup of iSQL*Plus as unicode (UTF8) client on windows. [NOTE:165259.1] How to set NLS Variables for different Applications using the same ORACLE_HOME - Oracle8i Installation Guide for Windows NT, Part Number A85302-01 Appendix D - National Language Support http://otn.oracle.com/documentation/oracle8i.html - Oracle9i Database Installation Guide for Windows, Part Number A90162-01 Appendix E - Globalization Support http://otn.oracle.com/documentation/oracle9i.html - Microsoft web site: http://www.microsoft.com/globaldev/reference/oslocversion.mspx provides a list of the default code pages for all Windows versions. For further NLS / Globalization information you may start here: [NOTE:150091.1] Globalization Technology (NLS) Library index . Bookmark Doc ID: Note:225912.1 Subject:Changing the Database Character Set - an Overview Type: BULLETIN Status: PUBLISHED Fixed font Content Type: Creation Date: Last Revision Date: Go to End TEXT/PLAIN 14-JAN-2003 06-JUN-2003 Introduction ============ This article aims to give an overview of possibilities to change the database character set and points to notes that are specific to the methods described here. If you use this note as a guide you're sure to find the correct information for your particular circumstances. Please make sure that you have a good understanding of how oracle deals with character set conversion before going ahead with this, start with going trough the following note before changing the character set of you database: [NOTE:158577.1] NLS_LANG Explained (How does Client-Server Character Conversion Work?) Impact of changing the database character set ============================================= Changing the database character set can have a lot of consequences that should be understood before starting with it. The character set of a database defines how characters are stored in the database and therefore you are limited to storing just the characters defined in that character set. If you change character sets there is a possibility that characters that you currently use are not defined in the new character set and therefore you could 'corrupt' your data. The following note gives a good way of viewing the characters that are defined in a character set: [NOTE:223706.1] Using Locale Builder to view the definition of character sets. You can check what what code points are physically stored in your database at this moment by using the dump command , see these notes for more details: [NOTE:13854.1] Dump SQL Command for NLS Debugging [NOTE:69518.1] Determining the code point for UTF8 characters Using "Oracle Applications" =========================== If you use Oracle Applications the basic task of running the character set scanner explained in this note are the same, so please continue to read this note to get a full understanding of the migration options. However there are some additional considerations so please see the following note for a complete overview of changing the character set in an Applications Database: [NOTE:124721.1] Migrating an Applications Installation to a New Character Set Preparation =========== The first step in the process of changing the database character set is running the character set scanner (csscan) to check how the data will cope with the character set change. Using this you will find the data that can not be stored in the new character set. This can be either due to the fact that the new character set does not describe the characters you currently store, or it can be due to size restrictions. The following note explains how to use the character set scanner and gives some examples: [NOTE:123670.1] Use Scanner Utility before Altering the Database Character Set The character set scanner reports the status of each table as one of these 3: CHANGELESS - the data would be stored in exactly the same way in the new character set. CONVERTIBLE - the data can be stored using the new character set without problems but different code points are used so it needs to be converted. EXCEPTIONAL - These are the tables/rows with problems that need to be tackled before going ahead with the conversion When you've changed the data in the original database and csscan only reports tables to be CHANGELESS or CONVERTIBLE it's time to move on to changing the database character set. Important warning: When using the Character set scanner you could find out that your data is stored completely incorrectly. For example you might have stored Arabic data in a WE8ISO8859P1 database (due to incorrect NLS_LANG settings on the client). This is typicaly showed as EXCEPTIONAL data found. In that case please see the following note, which explains how to recover from these problems: [NOTE:225938.1] Database Character Set Healthcheck Once the data is stored correctly you then have to start over and go through the steps of this note again to change the character set. For example you might have a database that currently has the US7ASCII character set and you might want to change it to AL32UTF8. You might come to the conclusion that you actually have WE8MSWIN1252 codes (the character set of Western Windows) stored in your current US7ASCII database. You then need to first follow the mentioned [NOTE:225938.1] to change the database to the correct character set. After that you can restart with this note to move from there to the desired character set. Changing the database character set =================================== After having asserted the data is not going to give us problems by using csscan we now come to the real work of changing the character set. Since Oracle 8.0 there are 2 ways to do this: The simplest way is to use the "ALTER DATABASE CHARACTER SET" command but that is not always possible. Something that will always work is to export the current database, then create a new database with the new character set and import the data into that database. Sometimes a combination of the 2 methods can be used. We will describe the methods further: 1. using the ALTER DATABASE CHARACTER SET command ------------------------------------------------This command was introduced in Oracle8. The statement does not actually convert any characters in the database. It simply updates all the places in the database where the character set information is stored. That means that you can only use this statement if the new character set is a superset of the old character set. We make the distinction between a binary -or TRUEsuperset and a 'normal' superset. For example lets take character set A and B. Character set A is a binary superset of character set B if A describes exactly the same characters and used the same codepoint for those characters as B does. On top of that it probably also has some more characters characters at codepoints that are left free by character set B - otherwise they would be 100% the same set. If this is the case then csscan would have flagged up a CHANGELESS conversion in the csscan output and this method will work fine. It is also possible for a character set to contain the same (and more) characters as another character set but store those characters at different code points. Although this is still a logical superset it is not a binary superset. Csscan would flag those characters up as CONVERTIBLE and we need another method of changing the character set. An example of this is (AL32)UTF8, which is a Unicode character set. It easily contains all the characters of, for example WE8ISO8859P15. However, it defines a lot of those characters at different code points. Therefore UTF8 is not classed as a binary superset of WE8ISO8859P15. The following note gives an overview of all the superset/subset combinations that exists: [NOTE:119164.1] Changing Database Character Set - Valid Superset Definitions So, if you indeed are changing to a superset and csscan showed all your data as CHANGELESS you can use the ALTER DATABASE CHARACTER SET command. See the following note for a detailed description of how to do this: [NOTE:66320.1] Changing the Database Character Set or the Database National Character Set 2. Using export/import ---------------------A method that will always work is using export/import. You use this method if the csscan has marked your data CONVERTIBLE. Be aware that using this method you will not receive an error when you move to an incompatible character set (using method 1 you will get errors in that case). If characters show up as "EXCEPTIONAL" in the csscan output you can still press ahead with export/import. Characters that do not exist in the new character set are then replaced by a so-called replacement character. It is not advisable to rely on this, it is a better to make the choices yourself by changing data that shows up as EXCEPTIONAL before the export. Also rows are flagged in csscan because the data in the new database will be expanded and will not fit in the allocated storage space. If you press ahead with the exp/imp they will simply receive a error om import (ORA1401) and will not be imported. When you change character sets through export/import you could potentially go through 3 different conversions. This is explained (among other things) in the following note: [NOTE:15095.1] Export/Import and NLS Considerations We obviously want to keep the number of conversions down to just 1 instead of 3 because there is less chance of problems that way. The easiest way to achieve that is to keep the export character set (the setting in NLS_LANG when the export is done) and the import session character set (the setting in NLS_LANG when the import is done) the same as the character set in the original database. That means that no conversion takes place on export and no conversion takes place inside the import session. The only conversion that takes place is from the import session (which by that point still has the original data) to the new database. So, if you only have CONVERTIBLE and CHANGELESS data then: 1) Export the source database with the NLS_LANG set to the character set of the source (OLD) database. ( select * from nls_database_parameters where parameter = 'NLS_CHARACTERSET'; ) 2) Create a NEW database with the NEW database characterset you want to use. 3) Import the export file with the NLS_LANG set to the character set of the source (OLD) database. 4) The character set conversion will be done at the moment that the import executable inserts the data into the new database. 5) Backup the new database. 3. Using a combination of ALTER DATABASE CHARACTER SET and export/import ----------------------------------------------------------------------In some cases you might find that csscan tells you that most of your data is CHANGELESS but only one (or a small number) of your tables is CONVERTABLE. In that case you would still have to use export/import. If the size of the database is particularly large and using export/import will cause too much down time you could follow this method: a) Export the data from the tables that come out as CONVERTABLE. b) Truncate or drop those tables. c) Now you have a database with only CHANGELESS data. Because the character set is not classed as a subset of the new character set you cannot use the normal form of the ALTER DATABASE CHARACTER SET command. The command usually simply checks if the new character set is a superset of the old character set, it does not check the actual data. In order to force the change please contact Oracle Support. Provide them with the output of csscan showing all data is classed as CHANGELESS and they will then assist you in forcing the character set change. d) Now that the character set has changed we can simply import the data exported in step (a). The import will convert that data so that it gets stored in the correct way for this character set. After changing the database character set ========================================= Remember that changing the database character set only changes the way characters are stored on the database. This should NOT have any effect on the clients and you should not have to change the NLS_LANG for example. The NLS_LANG on the clients should simply reflect the setup of that machine. For more general information on setting up NLS_LANG and how clientserver conversion works please see the following notes: [NOTE:179133.1] The correct NLS_LANG in a Windows Environment [NOTE:158577.1] NLS_LANG Explained (How does Client-Server Character Conversion Work?) Further reading =============== For further NLS / Globalization information you may start here: [NOTE:150091.1] Globalization Technology (NLS) Library index [NOTE:60134.1] Globalization (NLS) - Frequently Asked Questions . Abstract The following sample code uses Visual Basic and DAO to link in the EMP and DEPT tables from the Ora Microsoft Access. Product Name, Product Version Oracle ODBC Driver, versions 8.1.7, 9.0, or 9.2 Platform Windows 95, 98, NT, 2000, and XP Professional Date Created 20-AUG-2001 Instructions Execution Environment: MS Access 2000 Access Privileges: Requires access to the EMP and DEPT tables in the SCOTT sample schema. Usage: Run the completed sample from within Access. Instructions: 1. Open a blank database in Access 2000. 2. Click on the Forms tab and choose "Create a form in Design view". 3. Add a command button to the new form. 4. Right-click the button and choose "Properties". 5. Choose the "Event" tab and find the "On Click" event. 6. Click in the empty box next to the event and select the drop-down button and choose "[Event Procedure]". 7. Click on the button to the right of the drop-down button that looks like three dots "..." This will bring up Visual Basic. 8. Paste the code below into the "Command0_Click" procedure. 9. In the following connect string: strConnect = "ODBC;DSN=v817;UID=SCOTT;PWD=TIGER" a. Change "v817" to match your Data Source Name that you have setup for your ODBC connection. b. Change "SCOTT" to match userid for your database. c. 10. Change "TIGER" to match password for your database. Chools Tools | References... and check the following box: a. Microsoft DAO 3.51 Object Library 11. Save the form by going to File->Save in Visual Basic. save the code for the form. This will 12. Close out Visual Basic. 13. In Access, go to Forms, and you will see your form there. Double click on it and hit the Command0 button. This will run the code. 14. You are now ready to run the sample code. The code will link the EMP and DEPT tables from the Oracle database into MS Access using the DSN you have specified. PROOFREAD THIS SCRIPT BEFORE USING IT! Due to differences in the way text editors, e-mail packages, and operating systems handle text formatting (spaces, tabs, and carriage returns), this script may not be in an executable state when you first receive it. Check over the script to ensure that errors of this type are corrected. Description Prerequisites: * Oracle client software version 8.1.7, 9.0, or 9.2 * Oracle ODBC Driver version 8.1.7, 9.0, or 9.2 * Microsoft Data Access Components (MDAC) version 2.6 or above * Windows 95, 98, NT 4.0, 2000, or XP Professional * MS Access 2000 * Visual Basic 6.0 Sample Output: Under the "Tables" tab in MS Access, you will find the Oracle tables EMP and DEPT linked in as the MS Access tables EMP_TABLE and DEPT_TABLE. References This sample was taken from Sample Code Repository (SCR) Entry 1476. Sample Code ' ******************************************************************************** ' This procedure attaches an external Oracle Database table to MS Access in code ' ******************************************************************************** On Error GoTo ErrorHandler Dim Dim Dim Dim dbs As Database tdfMyTable1 As TableDef tdfMyTable2 As TableDef strConnect As String Set dbs = CurrentDb ' Define your connection string. ' Include the name of the DSN, User ID, and Password. strConnect = "ODBC;DSN=MyDSN;UID=SCOTT;PWD=TIGER" ' ATTACH THE 'EMP' TABLE FROM THE ORACLE DATABASE - PART 1 ' Delete the linked table object if it already exits DoCmd.DeleteObject acTable, "EMP_TABLE" ' Create a new table object from an Oracle table Set tdfMyTable1 = dbs.CreateTableDef("EMP_TABLE") tdfMyTable1.Connect = strConnect tdfMyTable1.SourceTableName = "EMP" ' Link the table and have it appear in the 'Tables' tab dbs.TableDefs.Append tdfMyTable1 ' Deallocate memory for the table object Set tdfMyTable1 = Nothing MsgBox "The EMP table from the SCOTT/TIGER schema in the Oracle database has been linked into MS Access through code. Check your 'Tables' tab.", , "Linked Table" ' ATTACH THE 'DEPT' TABLE FROM THE ORACLE DATABASE - PART 2 ' Delete the linked table object if it already exits DoCmd.DeleteObject acTable, "DEPT_TABLE" ' Create a new table object from an Oracle table Set tdfMyTable2 = dbs.CreateTableDef("DEPT_TABLE") tdfMyTable2.Connect = strConnect tdfMyTable2.SourceTableName = "DEPT" ' Link the table and have it appear in the 'Tables' tab dbs.TableDefs.Append tdfMyTable2 ' Deallocate memory for the table object Set tdfMyTable2 = Nothing MsgBox "The DEPT table from the SCOTT/TIGER schema in the Oracle database has been linked into MS Access through code. Check your 'Tables' tab.", , "Linked Ta ' Deallocate memory for the database object Set dbs = Nothing Exit Sub ErrorHandler: Select Case Err Case 3011, 7874 ' The table does not exit, continue execution of the program Resume Next Case Else ' Display any other error messages generated MsgBox Err & " - " & Error$, , "LINKING TABLES" End Select Exit Sub Disclaimer EXCEPT WHERE EXPRESSLY PROVIDED OTHERWISE, THE INFORMATION, SOFTWARE, PROVIDED ON AN "AS IS" AND "AS AVAILABLE" BASIS. ORACLE EXPRESSLY DISCLAIMS ALL WARRANTIES OF ANY KIND, WHETHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. ORACLE MAKES NO WARRANTY THAT: (A) THE RESULTS THAT MAY BE OBTAINED FROM THE USE OF THE SOFTWARE WILL BE ACCURATE OR RELIABLE; OR (B) THE INFORMATION, OR OTHER MATERIAL OBTAINED WILL MEET YOUR EXPECTATIONS. ANY CONTENT, MATERIALS, INFORMATION OR SOFTWARE DOWNLOADED OR OTHERWISE OBTAINED IS DONE AT YOUR OWN DISCRETION AND RISK. ORACLE SHALL HAVE NO RESPONSIBILITY FOR ANY DAMAGE DONE TO YOUR COMPUTER SYSTEM OR LOSS OF DATA THAT RESULTS FROM THE DOWNLOAD OF ANY CONTENT, MATERIALS, INFORMATION OR SOFTWARE. ORACLE RESERVES THE RIGHT TO MAKE CHANGES OR UPDATES TO THE SOFTWARE AT ANY TIME WITHOUT NOTICE. Limitation of Liability IN NO EVENT SHALL ORACLE BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL OR CONSEQUENTIAL DAMAGES, OR DAMAGES FOR LOSS OF PROFITS, REVENUE, DATA OR USE, INCURRED BY YOU OR ANY THIRD PARTY, WHETHER IN AN ACTION IN CONTRACT OR TORT, ARISING FROM YOUR ACCESS TO, OR USE OF, THE SOFTWARE. SOME JURISDICTIONS DO NOT ALLOW THE LIMITATION OR EXCLUSION OF LIABILITY. ACCORDINGLY, SOME OF THE ABOVE LIMITATIONS MAY NOT APPLY TO YOU.