
Differences between Unicode and Non-Unicode Programs


ABAP Keyword Documentation



The ABAP keyword documentation describes the ABAP statements for both Unicode and non-Unicode systems. Only Unicode programs can be compiled and executed in Unicode systems. In non-Unicode systems, non-Unicode programs can also be compiled and executed. However, Unicode programs should also be used in non-Unicode systems, for the following reasons:

Static type checks are executed in Unicode programs.

Byte string processing and character string processing are separated in Unicode programs.

Structures are always handled as structures in Unicode programs.

Uncontrolled access to segments of the working memory is not possible in Unicode programs.

This makes Unicode programs easier to understand, more robust, and easier to maintain than non-Unicode programs.

The following sections list the language constructs and statements for which there are differences between Unicode and non-Unicode programs:

Comments and Literals in Non-Unicode Programs

Names in Unicode Programs

Program Structure of Unicode Programs

Operand Types in Unicode Programs

Alignment in Unicode Systems

Offset and Length Specifications in Unicode Programs

Access to Memory Sequences in Unicode Programs

Conversion of Structures in Unicode Programs

Structure Typing in Unicode Programs

Structure Enhancements and Unicode Programs

Character String and Byte String Processing in Unicode Programs

Function Module Calls in Unicode Programs

Open SQL in Unicode Programs

File Interface in Unicode Programs

Lists in Unicode Systems

Comments and Literals in Non-Unicode Programs

In non-Unicode systems, comments and literals should not contain any characters that are not available in all code pages supported by SAP. In the worst case, a program can no longer be executed if a code page other than the one in which it was created is used. We recommend using only 7-bit ASCII characters.

Note

In a Unicode system, all source code is stored in Unicode, which is why this problem does not occur there. However, even in Unicode programs, comments and literals should not contain characters that cannot be represented in non-Unicode systems, so that programs can be transported from a Unicode system to a non-Unicode system without losses during conversion.

Names in Unicode Programs

Only the following characters are allowed in names in Unicode programs:

1. The letters "A" through "Z"

2. The digits "0" through "9"

3. Underscores ("_")

For compatibility reasons, you can also use the characters "%", "$", "?", "-", "#", and "*", but these should be used only in exceptional cases (for example, for existing program generations) and with good justification.

You can also use forward slashes ("/") for namespace prefixes.

Note

Apart from ABAP Objects, non-Unicode programs can also use characters other than the ones listed above. This can cause the following problems in these programs:

If characters are used that are not available in all code pages supported by SAP, it might not be possible to run certain programs when a code page different from the one in which they were created is used.

No string templates can be used in a non-Unicode program.

Program Structure of Unicode Programs

Non-accessible statements (statements that are not assigned to any processing block) cause a syntax error in Unicode programs. In non-Unicode programs, only a syntax warning is issued at present.
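A minimal sketch of the rule, assuming a simple executable program (the program name demo_program_structure, the subroutine fill_text, and the variable lv_text are hypothetical): every executable statement sits in an event block or in a subroutine. The commented-out line at the end marks where a non-accessible statement would be; placed there, after ENDFORM and outside any event block, it would not belong to any processing block.

```abap
REPORT demo_program_structure.

DATA lv_text TYPE string.

START-OF-SELECTION.
  " Statements in the event block START-OF-SELECTION
  PERFORM fill_text.
  WRITE / lv_text.

FORM fill_text.
  " Statements in the subroutine fill_text
  lv_text = `Inside a processing block`.
ENDFORM.

" A statement at this point, after ENDFORM and outside any event
" block, would not be assigned to a processing block and would be
" rejected by the syntax check of a Unicode program:
" WRITE / 'Not accessible'.
```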

Operand Types in Unicode Programs

One of the most important differences between Unicode and non-Unicode programs is the clear distinction between character-type data objects and byte-type data objects, and the restriction of the data types whose objects can be viewed as character-type. This affects all statements in which character-type operands are expected, in particular character string and byte string processing.

Character-type data objects

In Unicode programs, only the following elementary data objects are now character-type:

Data type   Meaning
c           Text field
d           Date field
n           Numeric text
t           Time field
string      Text string

In addition, structures are character-type if they contain only flat character-type components (only components from the above table with the exception of text strings).

In Unicode programs, a structure can essentially be used at an operand position that expects a single field only if the structure is character-type. It is then handled in the same way as a data object of type c.

In non-Unicode programs, all flat structures and byte-type data objects are also still handled as character-type data objects (implicit casting).

Note

The incorrect use of structures at operand positions is greatly restricted in Unicode programs. For example, a structure that contains a numeric component can no longer be used at a numeric operand position.
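As an illustration (all names are hypothetical), the following sketch assigns a purely character-type structure to a text field. Because every component is flat and of type c, d, or n, the structure is handled like a single field of type c whose length is the sum of the component lengths.

```abap
REPORT demo_char_like_structure.

" All components are flat and character-type (c, d, n),
" so the structure as a whole is character-type.
DATA: BEGIN OF char_struc,
        id  TYPE c LENGTH 4,
        day TYPE d,
        seq TYPE n LENGTH 3,
      END OF char_struc.

DATA flat_text TYPE c LENGTH 15.

char_struc-id  = 'AB12'.
char_struc-day = '20240131'.
char_struc-seq = '007'.

" Permitted in a Unicode program: the structure is handled like
" a single field of type c with length 4 + 8 + 3 = 15.
flat_text = char_struc.

" If the first component were of type i instead, the structure
" would not begin with a character-type fragment, and this
" assignment would be rejected in a Unicode program.
```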

Byte-type data objects

In Unicode programs, elementary data objects of types x and xstring are byte-type. In non-Unicode programs, data objects of these types are generally handled as character-type. Conversely, in non-Unicode programs, character-type data objects are still expected at positions where byte processing takes place (SET BIT, GET BIT, and the bit pattern operators O, Z, and M), while in Unicode programs only byte-type data objects are permitted there.
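The following sketch, with hypothetical names, shows bit processing on byte-type operands as required in Unicode programs. Bit positions are counted from the left, starting at 1.

```abap
REPORT demo_bit_operations.

" Byte-type operands (type x) are required for bit processing
" in Unicode programs.
DATA hex   TYPE x LENGTH 1 VALUE '00'.
DATA mask  TYPE x LENGTH 1 VALUE '80'.
DATA bit   TYPE i.
DATA is_on TYPE c LENGTH 1.

SET BIT 1 OF hex.            " set the first (leftmost) bit: hex = '80'
GET BIT 1 OF hex INTO bit.   " read it back: bit = 1

" Bit pattern comparison with O: true if every bit that is set
" in mask is also set in hex.
IF hex O mask.
  is_on = 'X'.
ENDIF.
```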

Note

In Unicode programs, storing byte strings in character-type containers causes problems, because the byte order of character-type data objects in Unicode systems is platform-dependent. In non-Unicode systems, this only applies to data objects of numeric data types. The content of such a container is interpreted incorrectly if the container is stored persistently and then imported on an application server with a different byte order.

Alignment in Unicode Systems

In Unicode systems, alignment requirements apply not only to numeric data objects of types i, decfloat16, decfloat34, f, and s and to deep data objects, but also to all character-like data objects.

The alignment is determined by the memory required for one character.

As a consequence, the alignment gaps in structures with components of different data types can differ between Unicode and non-Unicode systems. For the handling of structures, the Unicode fragment view concept has been introduced, which divides a structure into fragments according to its alignment gaps.

Note

Alignment gaps can also occur at the end of structures, as the overall length of the structure is determined by the component with the largest alignment requirement.

Example

In the following structure, alignment gaps (A) occur in Unicode systems that are not present in non-Unicode systems. The first alignment gap results from the alignment of the substructure struc2, the second from the alignment of the component c of type c, and the third from the component d of type i.

DATA:
  BEGIN OF struc1,
    a TYPE x LENGTH 1,
    BEGIN OF struc2,
      b TYPE x LENGTH 1,
      c TYPE c LENGTH 6,
    END OF struc2,
    d TYPE i,
  END OF struc1.

Non-Unicode system [ a | b | cccccc | dddd ]

Unicode system [ a | A | b | A | cccccccccccc | AA | dddd ]
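The first two gaps of this example can be observed with DESCRIBE DISTANCE. This is a sketch assuming a Unicode system with 2-byte characters (where cl_abap_char_utilities=>charsize is 2); the program name and the result variables are hypothetical.

```abap
REPORT demo_alignment.

DATA:
  BEGIN OF struc1,
    a TYPE x LENGTH 1,
    BEGIN OF struc2,
      b TYPE x LENGTH 1,
      c TYPE c LENGTH 6,
    END OF struc2,
    d TYPE i,
  END OF struc1.

DATA dist_ab TYPE i.
DATA dist_ac TYPE i.

" Byte distance from the start of a to the start of b and of c
DESCRIBE DISTANCE BETWEEN struc1-a AND struc1-struc2-b
  INTO dist_ab IN BYTE MODE.
DESCRIBE DISTANCE BETWEEN struc1-a AND struc1-struc2-c
  INTO dist_ac IN BYTE MODE.

" Expected in a Unicode system with 2-byte characters:
" dist_ab = 2 (gap after a, because struc2 is aligned for c)
" dist_ac = 4 (gap after b, because c is character-like)
```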

Offset and Length Specifications in Unicode Programs

Offset/length specifications are made by appending [+off][(len)] to the name of a data object at an operand position, and they are used to access subareas of the data object. This type of programming is no longer fully possible in Unicode systems because, for example, when structures with components of different data types are accessed, it cannot be determined whether offset and length are specified in characters or in bytes. Furthermore, restrictions have been introduced that forbid access to memory areas outside of flat data objects.

Offset/Length Specifications for Elementary Data Objects

Offset/length specifications are permitted for character-like and byte-like data objects. For character-like data objects, offset and length are interpreted as a number of characters; for byte-like data objects, as a number of bytes.

The rules that determine which data objects in Unicode programs count as character-like or byte-like objects do not allow for offset/length specifications for data objects of numeric data types.
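A short sketch with hypothetical names: the same offset/length notation counts characters for a text field and bytes for a byte field.

```abap
REPORT demo_offset_length.

DATA letters TYPE c LENGTH 10 VALUE 'ABCDEFGHIJ'.
DATA hex     TYPE x LENGTH 4  VALUE '0A0B0C0D'.
DATA part_c  TYPE c LENGTH 3.
DATA part_x  TYPE x LENGTH 2.

part_c = letters+2(3).   " character-like: counted in characters, 'CDE'
part_x = hex+1(2).       " byte-like: counted in bytes, '0B0C'

" For a numeric field such as DATA num TYPE i, an access like
" num+1(2) is a syntax error in a Unicode program.
```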

Note

Using data objects of type c as containers for structures of different types, which are often not known until runtime, and accessing their components with offset/length specifications is no longer possible in Unicode programs. Instead of such containers, the statement CREATE DATA can be used to create data objects of any structure. To access existing containers, they can be assigned to a field symbol using the CASTING addition of the statement ASSIGN. The COMPONENT addition can then be used to access individual components.
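A minimal sketch of the CASTING approach, assuming a hypothetical record type ty_record and a container whose content was filled elsewhere. To keep the example simple, the record contains only character-like components.

```abap
REPORT demo_casting_container.

" Hypothetical record layout stored in a character container
TYPES: BEGIN OF ty_record,
         id   TYPE c LENGTH 4,
         name TYPE c LENGTH 10,
       END OF ty_record.

DATA container TYPE c LENGTH 14 VALUE 'K001Smith'.
DATA name      TYPE c LENGTH 10.

FIELD-SYMBOLS: <record> TYPE ty_record,
               <comp>   TYPE any.

" View the container through the structure type instead of
" addressing components by offset/length.
ASSIGN container TO <record> CASTING.

" Access a component by its name
ASSIGN COMPONENT 'NAME' OF STRUCTURE <record> TO <comp>.
IF sy-subrc = 0.
  name = <comp>.
ENDIF.
```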

Offset/Length Specifications for Structures

An offset/length specification for a structure is only permitted in Unicode systems if the structure is either

character-like (meaning it only contains flat character-like components), or

flat, has a character-like initial fragment according to the Unicode fragment view, and the offset/length specification only accesses this initial fragment.

In both cases, the specification of offset and length is interpreted as a number of characters.

Example

The following structure has both character-like and non-character-like components:

DATA:
  BEGIN OF struc,
    a TYPE c LENGTH 3,     "Length 3 characters
    b TYPE n LENGTH 4,     "Length 4 characters
    c TYPE d,              "Length 8 characters
    d TYPE t,              "Length 6 characters
    e TYPE decfloat16,     "Length 8 bytes
    f TYPE c LENGTH 28,    "Length 28 characters
    g TYPE x LENGTH 2,     "Length 2 bytes
  END OF struc.

The Unicode fragment view splits the structure into five areas, F1 - F5.

[ aaa | bbbb | cccccccc | ddd | AAA | eeee | fffffffffffff | gg ]

[ F1 | F2 | F3 | F4 | F5 ]

Offset/length access is only possible for the character-like initial fragment F1. Specifications such as struc(21) or struc+7(14) are accepted and handled as a single field of type c. An access such as struc+57(2), for example, is not permitted in Unicode systems.
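To see the permitted access in action, the sketch below fills the components of the initial fragment F1 with hypothetical sample values and reads struc+7(14), which covers exactly the components c (8 characters) and d (6 characters).

```abap
REPORT demo_structure_offset.

DATA:
  BEGIN OF struc,
    a TYPE c LENGTH 3,
    b TYPE n LENGTH 4,
    c TYPE d,
    d TYPE t,
    e TYPE decfloat16,
    f TYPE c LENGTH 28,
    g TYPE x LENGTH 2,
  END OF struc.

DATA part TYPE c LENGTH 14.

struc-a = 'ABC'.
struc-b = '1234'.
struc-c = '20240131'.
struc-d = '235959'.

" Offset 7, length 14 lies completely inside F1
" (a, b, c, d = 21 characters), so the access is permitted.
part = struc+7(14).

" struc+57(2) would be rejected because it lies outside F1.
```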

Offset/Length Specifications for Actual Parameters

For actual parameters specified in PERFORM, offset/length specifications in Unicode programs cannot address a memory area outside of the actual parameter. In particular, an offset can no longer be specified without a length, since this would implicitly use the length of the actual parameter as the length.
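A sketch with hypothetical names: the first call stays within the actual parameter; the call with an offset but no length is shown only as a comment, because it is not permitted in a Unicode program.

```abap
REPORT demo_perform_offset.

DATA letters TYPE c LENGTH 10 VALUE 'ABCDEFGHIJ'.

START-OF-SELECTION.
  " Offset and length stay inside the actual parameter letters
  PERFORM show_part USING letters+2(3).
  " Offset without length: not permitted in a Unicode program
  " PERFORM show_part USING letters+2.

FORM show_part USING p_part TYPE c.
  WRITE / p_part.   " displays CDE
ENDFORM.
```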

Offset/Length Specification for Field Symbols

When a memory area is assigned to a field symbol with the ASSIGN statement, offset/length specifications in Unicode programs can now only be used to access memory within the data object. This data object can be defined explicitly using the addition RANGE.

Field symbols themselves are also assigned a permitted memory area. This area takes effect when a field symbol is used as the source in an ASSIGN statement.

In non-Unicode programs, the assignable area is defined by the data area of the current program, which can lead to references being overwritten.

If a data object is specified as the source in ASSIGN, an offset cannot be specified without a length unless the RANGE addition is specified explicitly. Otherwise, the length of the data object would be used implicitly. If the name of a field symbol is specified, its data type in Unicode programs must be flat and elementary if an offset is specified without a length.

Note

Previously, cross-field offset/length accesses in the ASSIGN statement were a useful way of processing repeating groups in structures. To enable this in Unicode systems, the ASSIGN statement has been enhanced with the additions RANGE and INCREMENT.
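As a sketch of a repeating group processed with INCREMENT and RANGE (the structure and variable names are hypothetical): each pass shifts the assigned area by a multiple of the length of week-day1, and ASSIGN sets sy-subrc to 4 as soon as the next area would lie outside week, which ends the loop.

```abap
REPORT demo_assign_increment.

" A repeating group: three components of the same type
DATA:
  BEGIN OF week,
    day1 TYPE c LENGTH 3 VALUE 'Mon',
    day2 TYPE c LENGTH 3 VALUE 'Tue',
    day3 TYPE c LENGTH 3 VALUE 'Wed',
  END OF week.

DATA days TYPE string.
DATA step TYPE i.

FIELD-SYMBOLS <day> TYPE c.

DO.
  step = sy-index - 1.
  " Shift step times the length of day1, but never beyond week
  ASSIGN week-day1 INCREMENT step TO <day> RANGE week.
  IF sy-subrc <> 0.
    EXIT.   " the next element would lie outside week
  ENDIF.
  days = days && <day>.
ENDDO.
```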

Access to Memory Sequences in Unicode Programs

The following (obsolete) statements access data objects that are stored in the memory as an equally spaced sequence:

DO ... VARYING

WHILE ... VARY

ADD ... THEN ... UNTIL

ADD ... FROM ... TO

In DO and WHILE loops in Unicode programs, all data objects of the sequence must be compatible and must either be components of the same structure or subareas of the same data object specified using offset/length specifications. In Unicode programs, the addition RANGE must also be specified if it cannot be recognized statically that the data objects involved are components of the same structure. Otherwise, the permitted memory area is determined from the smallest possible substructure.

When memory sequences are added using ADD, all data objects of the sequence in Unicode programs must be components of a structure. If this cannot be recognized statically in the syntax check, a structure must be specified using the addition RANGE.
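For comparison, a sketch of the obsolete DO ... VARYING form with an explicit RANGE, using a hypothetical structure week. The sequence day1, day2, day3 is compatible and belongs to week, and RANGE week bounds the memory that the loop may access.

```abap
REPORT demo_do_varying.

DATA:
  BEGIN OF week,
    day1 TYPE c LENGTH 3 VALUE 'Mon',
    day2 TYPE c LENGTH 3 VALUE 'Tue',
    day3 TYPE c LENGTH 3 VALUE 'Wed',
  END OF week.

DATA day  TYPE c LENGTH 3.
DATA days TYPE string.

" Obsolete statement: the step width is the distance between
" day1 and day2; RANGE week limits the accessible memory.
DO 3 TIMES VARYING day FROM week-day1 NEXT week-day2 RANGE week.
  days = days && day.
ENDDO.
```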

Conversion of Structures in Unicode Programs

The most important differences in behavior between Unicode programs and non-Unicode programs concern the conversion rules for structures in assignments and comparisons.

Note

Two structures in Unicode programs are only compatible if all alignment gaps are identical on all platforms. This applies in particular to alignment gaps that are created by included structures (INCLUDE).

Assignments Between Flat Structures

In non-Unicode programs, incompatible flat structures are treated as data objects of type c, whereas in Unicode programs, conversion rules apply in which the Unicode fragment view of the structures plays the most important role.
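A minimal sketch with hypothetical names: two incompatible but purely character-like flat structures. Each consists of a single character-like fragment of 6 characters, so the fragment views match and the content is copied character by character across the component boundaries.

```abap
REPORT demo_structure_assignment.

DATA:
  BEGIN OF src,
    part1 TYPE c LENGTH 2 VALUE 'AB',
    part2 TYPE c LENGTH 4 VALUE 'CDEF',
  END OF src,
  BEGIN OF dst,
    head TYPE c LENGTH 3,
    tail TYPE c LENGTH 3,
  END OF dst.

" Both structures: one character-like fragment of length 6
dst = src.
```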

Assignments Between Flat Structures and Single Fields

Non-Unicode programs always handle flat structures as data objects of the type c when assigning from and to elementary data objects. In Unicode programs, however, a conversion rule applies, stating that the structure must be character-like (at the very least in its initial fragment).

Comparisons Between Incompatible Flat Structures

As is the case with assignments, the structures are not handled as c fields, but in accordance with their Unicode fragment view (see Comparison Rules Between Operands).

Comparisons Between Flat Structures and Single Fields

As is the case with assignments, the system checks whether the structure is character-like, at least in its initial fragment (see Comparison Operators for All Data Types).

Structure Typing in Unicode Programs

For reasons of downward compatibility, field symbols and the parameters of function modules and subroutines can still be cast to a structure using the obsolete addition STRUCTURE.

When a data object is assigned to such a field symbol, or an actual parameter is passed to such a formal parameter, in non-Unicode programs the system only checks whether the data object or actual parameter is at least as long as the structure and whether the alignment is identical at runtime.

Unicode programs distinguish between structured and elementary data objects or actual parameters. For a structured data object or actual parameter, the Unicode fragment view must match that of the cast structure, including all alignment gaps (also those at the end). An elementary data object or actual parameter must be character-like and flat.

When a formal parameter of a function module is typed with a flat structure using LIKE instead of TYPE, LIKE has the same effect as STRUCTURE. However, in non-Unicode programs, the system checks the exact length when the parameters are passed.

Note

Checking the Unicode fragment view avoids problems that occur in non-Unicode systems due to alignment gaps at the end of structures. These problems include actual parameters being filled with the content of an alignment gap in a way that does not match their type.

Structure Enhancements and Unicode Programs

ABAP Dictionary structures and database tables that are delivered by SAP can be enhanced using customizing includes or append structures. These types of changes cause problems in Unicode programs if the enhancements change the Unicode fragment view.

For this reason, structures and database tables can now be classified, which makes it possible to recognize and handle problems related to structure enhancements. The program check uses this classification to issue a warning at every point where the program works with structures and where later structure enhancements could cause syntax errors or changes in program behavior. When you define a structure or a database table in ABAP Dictionary, you can assign one of the enhancement categories listed in the following table.

Level   Category                                        Meaning
1       Unclassified                                    The structure does not have an enhancement category.
2       Cannot be enhanced                              The structure must not be enhanced.
3       Can be enhanced and character-like              All structure components and their enhancements must be character-like and flat.
4       Can be enhanced and character-like or numeric   All structure components and their enhancements must be flat.
5       Can be enhanced in any way                      All structure components and their enhancements can have any data type.

The warnings displayed after the program check are classified into the three levels of the following table, depending on the consequences of the permitted structure enhancements.

Level   Type of check    Meaning
A       Syntax check     An enhancement that fully utilizes the enhancement category of the structure in question leads to a syntax error.
B       Extended check   Permitted enhancements can lead to syntax errors, but not always.
C       Extended check   Permitted enhancements cannot lead to syntax errors, but they can change the program behavior and cause semantic problems.

Example

If the structure ddic_struc in ABAP Dictionary is defined only with flat components but is classified as Can be enhanced in any way, the following program section leads to a warning in the syntax check. If the structure were enhanced by a deep component after the program was delivered, the program would become syntactically incorrect and could no longer be executed. In this case, you therefore either have to classify the structure ddic_struc in ABAP Dictionary as Can be enhanced and character-like, or you must not use an offset/length specification in the program.

DATA: my_struc TYPE ddic_struc,
      str      TYPE string,
      off      TYPE i,
      len      TYPE i.

...

str = my_struc+off(len).

Character String and Byte String Processing in Unicode Programs

In Unicode programs, character string processing and byte string processing are strictly separated. The operands of character string processing must be character-like data objects, and the operands of byte string processing must be byte-like data objects. In non-Unicode programs, byte strings are normally handled in the same way as character strings.

Syntactic Separation

Statements for Character String and Byte String Processing

In Unicode programs, the statements that are intended for both character string and byte string processing distinguish between the two by means of the optional addition IN CHARACTER|BYTE MODE. IN CHARACTER MODE is the default.

Note

The addition IN CHARACTER|BYTE MODE is also used in the statements for determining length and offset:

DESCRIBE FIELD ... LENGTH

DESCRIBE DISTANCE

In these statements, the addition is mandatory.
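A sketch with hypothetical names: for a text field, both modes can be used, and the byte length depends on the character size of the system, which the constant cl_abap_char_utilities=>charsize provides (2 in a Unicode system with 2-byte characters).

```abap
REPORT demo_describe_length.

DATA letters  TYPE c LENGTH 10.
DATA chars    TYPE i.
DATA bytes    TYPE i.
DATA expected TYPE i.

" The mode must be specified explicitly
DESCRIBE FIELD letters LENGTH chars IN CHARACTER MODE.
DESCRIBE FIELD letters LENGTH bytes IN BYTE MODE.

" Byte length = number of characters * bytes per character
expected = chars * cl_abap_char_utilities=>charsize.
```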

Relational Operators for Character Strings and Byte Strings

Relational operators exist both for character strings and for byte strings. In Unicode programs, the operators for character strings can no longer be used for byte strings.
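A short sketch with hypothetical names, contrasting the character string operator CS on character-like operands with the byte string operator BYTE-CS on byte-like operands.

```abap
REPORT demo_relational_operators.

DATA letters  TYPE string VALUE `Unicode`.
DATA bytes    TYPE x LENGTH 3 VALUE '556E69'.
DATA needle   TYPE x LENGTH 1 VALUE '6E'.
DATA char_hit TYPE c LENGTH 1.
DATA byte_hit TYPE c LENGTH 1.

" Character string operator: character-like operands
IF letters CS 'code'.
  char_hit = 'X'.
ENDIF.

" Byte string operator: byte-like operands
IF bytes BYTE-CS needle.
  byte_hit = 'X'.
ENDIF.
```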

Functions for Character Strings and Byte Strings

The description functions are divided into description functions for character strings and description functions for byte strings. In particular, in Unicode programs, strlen can now only be used for character-like arguments, while xstrlen is available for byte-like arguments.
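A sketch with hypothetical names of the separated functions and of the mode addition: strlen and FIND without a mode work on the character string, while xstrlen and FIND ... IN BYTE MODE work on the byte string.

```abap
REPORT demo_length_functions.

DATA letters TYPE string VALUE `Hello`.
DATA bytes   TYPE xstring.
DATA pattern TYPE x LENGTH 1 VALUE '6C'.
DATA clen    TYPE i.
DATA blen    TYPE i.
DATA ccount  TYPE i.
DATA bcount  TYPE i.

bytes = '48656C6C6F'.   " the ASCII codes of Hello, as bytes

clen = strlen( letters ).   " character-like argument: 5
blen = xstrlen( bytes ).    " byte-like argument: 5

" IN CHARACTER MODE is the default and is omitted here
FIND ALL OCCURRENCES OF 'l' IN letters MATCH COUNT ccount.
" Byte processing needs IN BYTE MODE and byte-like operands
FIND ALL OCCURRENCES OF pattern IN bytes IN BYTE MODE MATCH COUNT bcount.
```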

Function Module Calls in Unicode Programs

In Unicode programs, a handleable exception is raised in a general function module call if an incorrect formal parameter is specified and the name of the function module is specified as a constant or a literal. If the name of the function module is specified by a variable, or in non-Unicode programs, the specification of an incorrect formal parameter is ignored.

Open SQL in Unicode Programs

When work areas are used in Open SQL statements, in non-Unicode programs, their structure is not taken into account. Only the length and the alignment are checked.

In Unicode programs, for structured work areas the Unicode fragment view must be correct, and elementary work areas must be character-type.

The File Interface in Unicode Programs

Since the content of files frequently reflects the structure of data in the working memory, the file interface in a Unicode system must fulfill the following requirements:

It must be possible to exchange data between Unicode and non-Unicode systems.

It must be possible to exchange data between different Unicode systems.

It must be possible to exchange data between different non-Unicode systems that use different code pages.

For this reason, in Unicode programs, you must always define the code page used to encode the character-type data that is written in text files or that is read from text files.

You must also bear in mind that a Unicode program must be executable in a non-Unicode system as well as in a Unicode system. Some of the syntax rules for the file interface have therefore been modified so that programming file access in Unicode programs is less error-prone than in non-Unicode programs.

Before every read or write access, a file must be opened explicitly using OPEN DATASET.

Furthermore, a file that is already open cannot be opened again. In non-Unicode programs, a file is opened implicitly with the standard settings the first time it is accessed, and the statement for opening a file can also be applied to a file that is already open, although a file can only be opened once within a program.

When the file is opened, the access type and the type of file storage must be specified explicitly using the following additions:

INPUT|OUTPUT|APPENDING|UPDATE

[LEGACY] BINARY|TEXT MODE

When a file is opened in TEXT MODE, the ENCODING addition must be used to specify the character representation. When a file is opened in LEGACY MODE, the byte order (endian) and a non-Unicode code page must be specified.

In non-Unicode programs, if nothing is entered, a file is opened with implicit standard settings.

If a file is opened for reading, the content can only be read. In non-Unicode programs, it is also possible to gain write access to these files.

If a file is opened as a text file, only the content of character-type data objects can be read or written. In non-Unicode programs, byte-type and numeric data objects are also allowed.
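A minimal sketch of explicit opening with access type, storage type, and encoding, followed by reading the line back. The file path /tmp/demo_unicode.txt is a hypothetical application server path, and the sketch assumes write permission there.

```abap
REPORT demo_file_interface.

" Hypothetical file name on the application server
DATA file_name TYPE string VALUE `/tmp/demo_unicode.txt`.
DATA out_line  TYPE string VALUE `Hello`.
DATA in_line   TYPE string.

" Explicit open: access type, storage type, and encoding
OPEN DATASET file_name FOR OUTPUT IN TEXT MODE ENCODING UTF-8.
IF sy-subrc = 0.
  TRANSFER out_line TO file_name.   " character-type data only
  CLOSE DATASET file_name.
ENDIF.

OPEN DATASET file_name FOR INPUT IN TEXT MODE ENCODING UTF-8.
IF sy-subrc = 0.
  READ DATASET file_name INTO in_line.
  CLOSE DATASET file_name.
ENDIF.
```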

Note

In Unicode programs, file names can also contain blank characters.

Lists in Unicode Systems

Introduction

A WRITE statement writes the content of data objects to a list. When data is written with a WRITE statement, the output is stored in the list buffer and accessed from there for display when the list is called.

Each time a data object is output with WRITE, an output length is defined either implicitly or explicitly; the implicit output length depends on the data type. The output length defines the following two attributes:

Number of positions or memory spaces available for characters in the list buffer

Number of columns or cells available in the actual list

If the output length is shorter than the length of the data object, the content is shortened according to certain rules when it is written to the list buffer. If values in numeric fields are cut off, this is indicated by an asterisk (*).

When displaying or printing a list, the content stored in the list buffer is transferred to the list as follows:

In non-Unicode systems, each character occupies the same amount of space in the list buffer as it requires columns in the list. In single-byte systems, a character occupies one byte in the list buffer and one column in the list, while a character that occupies several bytes in the list buffer in multibyte systems also occupies the same number of columns in the list. For this reason, all the characters stored in the list buffer are displayed in the list in non-Unicode systems.

In Unicode systems, every character usually occupies one position in the list buffer. However, a character can also occupy more than one column in the list, as is the case for East Asian characters.

However, since the list only contains the same number of columns as there are positions in the list buffer, the number of characters that can be displayed in the list is smaller than the number of characters stored in the list buffer in this case. List output is shortened accordingly, formatted according to the specified alignment, and marked with the characters > or <. The entire content of the list can then only be displayed by choosing the menu path System → List → Unicode Display.

For this reason, the horizontal position of the list cursor only corresponds to the output column of a displayed or printed list in non-Unicode systems. In Unicode systems, this correspondence is only guaranteed at the upper and lower output limits.

Rules for WRITE Statements

To avoid unintentional truncation of values as far as possible, the rules for WRITE statements in Unicode programs have been modified and extended.

Operands in the WRITE Statement

If the data object specified in WRITE is a flat structure, this structure must be purely character-like in Unicode programs.

Note

This also applies for the statement WRITE TO , in which the target field must also be character-like.

WRITE Statements with Implicit Output Length

In Unicode programs, WRITE statements without an explicitly specified output length behave in the same way as in non-Unicode programs for all data objects except text field literals and data objects of type string. This means fewer characters may be displayed in the list than are stored in the list buffer.

In the case of text field literals and data objects of type string, the system assumes that all characters are to be displayed. For this reason, the implicit output length is calculated from the characters contained in the data object so that it corresponds to the number of columns needed in the list. If this output length is greater than the length of the data object, surplus positions are filled with blanks when the data is written to the list buffer. When the data is displayed in the list, these blanks are removed, since the character representation fills the output length exactly.

WRITE Statements with Explicit Output Length

If a numeric data object is specified as an explicit output length after the AT addition of a WRITE statement, its value is used as the output length in both Unicode and non-Unicode systems. In Unicode systems, the number of characters displayed in the list can differ from the number of characters stored in the list buffer. Instead of a numeric data object, the output length can also be specified as follows:

1. WRITE AT (*) ...

   1. For data objects of types c and string, the output length is set to the number of columns required to display the entire content in the list; trailing blanks are ignored for type c. For data objects of type string, this has the same meaning as the implicit output length.

   2. For data objects of types d and t, the output length is set to 10 and 8, respectively.

   3. For data objects of the numeric types i, f, and p, the output length is set to the value required to display the current value including thousands separators. This rule is applied to the value after any CURRENCY, DECIMALS, NO-SIGN, ROUND, or UNIT additions have been applied.

   4. The implicit output length is used for data objects of types n, x, and xstring.

2. WRITE AT (**) ...

   1. For data objects of type c, the output length is set to twice the length of the data object, and for data objects of type string, to twice the number of characters contained in the object.

   2. For data objects of types d and t, the output length is set to 10 and 8, respectively.

   3. For data objects of the numeric types i, f, and p, the output length is set to the value required to display the maximum possible values of these types, including plus and minus signs and thousands separators. This rule is applied to the value after any CURRENCY, DECIMALS, NO-SIGN, ROUND, or UNIT additions have been applied.

   4. The implicit output length is used for data objects of types n, x, and xstring.

The behavior of the output lengths (*) and (**) when using the addition USING EDIT MASK and the templates for date fields is described in Formatting Options.
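A sketch with hypothetical names contrasting the two special output lengths. The widths actually used depend on the content and the system, so no expected output is shown.

```abap
REPORT demo_write_output_length.

DATA name   TYPE c LENGTH 20 VALUE 'ABAP'.
DATA amount TYPE p LENGTH 8 DECIMALS 2 VALUE '1234567.89'.

START-OF-SELECTION.
  " (*): columns needed for the content; trailing blanks ignored
  WRITE /(*) name.
  " (**): twice the length of the text field, 40 here
  WRITE /(**) name.
  " (*): enough for the current value including separators
  WRITE /(*) amount.
```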

Additions for GET/SET CURSOR FIELD/LINE

The additions DISPLAY OFFSET and MEMORY OFFSET take account of the fact that data objects can occupy different lengths when displayed in a list and when stored temporarily in the list buffer.

Accordingly, in the statement SET CURSOR { FIELD f | LINE l }, the addition DISPLAY OFFSET off positions the cursor in the column of the output area specified in off. The addition MEMORY OFFSET off positions the cursor on the character in the output area that corresponds to the position in the list buffer (of the data object in f) specified in off.

In the same way, a GET CURSOR { FIELD f | LINE l } statement with the addition DISPLAY OFFSET off places the cursor position in the output area in the data object off. When the addition MEMORY OFFSET off is used, the position in the list buffer of the character at which the cursor is displayed is placed in off. The addition DISPLAY is the default and can be omitted.
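A sketch with hypothetical names: after a list line is selected, both offsets of the cursor position are read. For characters that occupy more than one column, such as East Asian characters, the two values can differ.

```abap
REPORT demo_cursor_offset.

DATA name     TYPE c LENGTH 20 VALUE 'ABAP'.
DATA fld_name TYPE c LENGTH 30.
DATA disp_off TYPE i.
DATA mem_off  TYPE i.

START-OF-SELECTION.
  WRITE /(*) name.

AT LINE-SELECTION.
  " Column of the cursor within the displayed output area
  GET CURSOR FIELD fld_name DISPLAY OFFSET disp_off.
  " Corresponding position in the list buffer
  GET CURSOR FIELD fld_name MEMORY OFFSET mem_off.
  WRITE: / 'Display offset:', disp_off,
         / 'Memory offset:',  mem_off.
```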

Class for Formatting Lists

Class CL_ABAP_LIST_UTILITIES has been introduced to calculate output lengths, convert values from the list buffer, and determine field limits. The return values of the methods of this class can be used to program correct column alignment in ABAP lists, even for the output of East Asian characters.

List Settings

The objects in a list can be displayed in different output lengths by specifying the desired length in the menu under System → List → Unicode Display. This is particularly advantageous for screen lists in Unicode systems where the output is cut off, as indicated by the characters > or <.

Recommendations

We recommend that you adhere to the following rules when programming lists, to ensure that they have the same appearance and functions both in Unicode and non-Unicode systems:

Specify an adequate output length

Do not overwrite parts of a field

Do not use the additions RIGHT-JUSTIFIED or CENTERED for WRITE TO if this statement is followed by list output with WRITE.

In customer-programmed horizontal scrolling with the SCROLL statement, only specify the upper or lower limits of the displayed data objects, since in Unicode systems the positions in the list buffer and in the displayed list are only guaranteed to match at these field limits.
