
Differences between Unicode and Non-Unicode Programs

ABAP Keyword Documentation
The ABAP keyword documentation describes the ABAP statements for both Unicode and non-Unicode systems. Only Unicode programs can be compiled and executed in Unicode systems. In non-Unicode systems, non-Unicode programs can also be compiled and executed. However, Unicode programs should be used in non-Unicode systems as well, for the following reasons:

• Static type checks are performed in Unicode programs.
• Byte string processing and character string processing are separated in Unicode programs.
• Structures are always handled as structures in Unicode programs.
• Uncontrolled access to segments of the working memory is not possible in Unicode programs.

This makes Unicode programs easier to understand, more robust, and easier to maintain than non-Unicode programs.
The following sections describe the language constructs and statements for which there are differences between Unicode and non-Unicode programs:
• Comments and Literals in Non-Unicode Programs
• Names in Unicode Programs
• Program Structure of Unicode Programs
• Operand Types in Unicode Programs
• Alignment in Unicode Systems
• Offset and Length Specifications in Unicode Programs
• Access to Memory Sequences in Unicode Programs
• Conversion of Structures in Unicode Programs
• Structure Typing in Unicode Programs
• Structure Enhancements and Unicode Programs
• Character String and Byte String Processing in Unicode Programs
• Function Module Calls in Unicode Programs
• Open SQL in Unicode Programs
• The File Interface in Unicode Programs
• Lists in Unicode Systems
Comments and Literals in Non-Unicode Programs
In non-Unicode systems, comments should only contain characters that are available in all code pages supported by SAP. In the worst case, a program can no longer be executed when it is used with a code page other than the one in which it was created. We recommend using only 7-bit ASCII characters.
Note
In a Unicode system, all source code is stored in Unicode, so this problem does not occur there. However, even in Unicode programs, do not use characters in comments and literals that cannot be displayed in non-Unicode programs. This ensures that programs can be transported from a Unicode system to a non-Unicode system without losses during conversion.
Names in Unicode Programs
Only the following characters are allowed in names in Unicode programs:
1. The letters "A" through "Z"
2. The digits "0" through "9"
3. Underscores ("_")
For compatibility reasons, you can also use the characters "%", "$", "?", "-", "#", and "*", but these should be used only in exceptional cases (for example, for existing program generations) and with good justification.
You can also use forward slashes ("/") for namespace prefixes.
Note
Apart from ABAP Objects, non-Unicode programs can also use characters other than the ones listed
above. This can cause the following problems in these programs:

• If characters are used that are not available in all code pages supported by SAP, it might not be possible to run certain programs when a code page different from the one in which they were created is used.
• String templates cannot be used in a non-Unicode program.
Program Structure of Unicode Programs
Statements that cannot be reached, that is, statements that are not assigned to any processing block, cause a syntax error in Unicode programs. In non-Unicode programs, only a syntax warning is issued at present.
Operand Types in Unicode Programs
One of the most important differences between Unicode and non-Unicode programs is the clear distinction
between character-type data objects and byte-type data objects, and the restriction of data types whose
objects can be viewed as character-type. This has an influence on all statements in which character-type
operands are expected, and in particular on character string and byte string processing.
Character-type data objects
In Unicode programs, only the following elementary data objects are now character-type:
Data type   Meaning
c           Text field
d           Date field
n           Numeric text
t           Time field
string      Text string
In addition, structures are character-type if they contain only flat character-type components (only
components from the above table with the exception of text strings).
In Unicode programs, a structure can now essentially only be used at an operand position that expects a
single field if the structure is character-type. It is then handled in the same way as a data object of type c.
In non-Unicode programs, all flat structures and byte-type data objects are also still handled as character-type data objects (implicit casting).
Note
The incorrect use of structures at operand positions is greatly restricted in Unicode programs. For example,
a structure that contains a numeric component can no longer be used at a numeric operand position.
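The rule that a purely character-like flat structure is handled like a field of type c can be seen in a minimal sketch. It assumes a Unicode program; the program name Z_DEMO_CHAR_STRUC and the type ty_char_struc are hypothetical, chosen only for illustration.

```abap
REPORT z_demo_char_struc. " hypothetical program name

TYPES:
  BEGIN OF ty_char_struc,   " hypothetical, purely character-like structure
    id   TYPE c LENGTH 4,
    num  TYPE n LENGTH 3,
    date TYPE d,
  END OF ty_char_struc.

DATA char_struc TYPE ty_char_struc.
DATA text       TYPE c LENGTH 15.   " 4 + 3 + 8 characters

char_struc-id   = 'ABCD'.
char_struc-num  = '007'.
char_struc-date = '20240131'.

" Permitted in Unicode programs: the structure contains only flat
" character-like components, so it is handled like a field of type c.
text = char_struc.

" If ty_char_struc had a component of type i, this assignment would
" no longer be allowed at this operand position in a Unicode program.
```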
Byte-type data objects
In Unicode programs, elementary data objects of types x and xstring are byte-type. In non-Unicode programs, data objects of these types are generally handled as character-type. Conversely, in non-Unicode programs, character-type data objects are still expected at positions where byte processing takes place (SET BIT, GET BIT, and the bit pattern operators O, Z, and M), while in Unicode programs only byte-type data objects are permitted there.
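The following minimal sketch shows bit processing with byte-type operands only, as Unicode programs require. The program name and the variable names are hypothetical; note that the mask is declared as a constant of type x rather than written as a text literal, because the bit pattern operators need byte-like operands.

```abap
REPORT z_demo_bits. " hypothetical program name

CONSTANTS mask TYPE x LENGTH 1 VALUE '80'.
DATA hex       TYPE x LENGTH 1 VALUE '00'.
DATA bit_value TYPE i.
DATA is_set    TYPE c LENGTH 1.

" SET BIT and GET BIT work on a byte-like operand in Unicode programs.
SET BIT 1 OF hex.                  " bits are numbered from the left: hex = '80'
GET BIT 1 OF hex INTO bit_value.   " reads the same bit back

" The bit pattern operator O also needs byte-like operands.
IF hex O mask.
  is_set = 'X'.
ENDIF.
```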
Note
In Unicode programs, the storage of byte strings in character-type containers causes problems, as the byte
order of character-type data objects in Unicode systems is platform dependent. In non-Unicode systems,
this only applies to data objects of numeric data types. The content of the data objects is interpreted incorrectly if a container of this type is stored persistently and is then imported to an application server with a different byte order.
Alignment in Unicode Systems
In Unicode systems, alignment requirements apply not only to numeric data objects of types i, decfloat16, decfloat34, f, and s, and to deep data objects, but also to all character-like data types. The alignment is determined by the memory required for one character.
As a consequence, in structures with components of different data types, the alignment gaps in Unicode systems may differ from those in non-Unicode systems. For enhancements between structures, the Unicode fragment view concept has been introduced, which divides a structure into fragments according to its alignment gaps.
Note
Alignment gaps can also occur at the end of structures, as the overall length of the structure is determined
by the component with the largest alignment requirement.
Example
In the following structure, alignment gaps (A) occur in Unicode systems that are not present in non-Unicode systems. The first alignment gap is formed as a result of the alignment of the substructure struc2, the second due to the alignment of the component c of type c, and the third due to the component d of type i.
DATA:
  BEGIN OF struc1,
    a TYPE x LENGTH 1,
    BEGIN OF struc2,
      b TYPE x LENGTH 1,
      c TYPE c LENGTH 6,
    END OF struc2,
    d TYPE i,
  END OF struc1.
Non-Unicode system   [ a | b | cccccc | dddd ]
Unicode system       [ a | A | b | A | cccccccccccc | AA | dddd ]
Offset and Length Specifications in Unicode Programs
Offset/length specifications are made by appending [+off][(len)] to the name of a data object in
operand position, and the specifications are used to access subareas of a data object. This type of
programming is no longer completely possible in Unicode systems because, for example when accessing
structures with components of different data types, it is not possible to define whether offset and length
should be specified in characters or bytes. Furthermore, restrictions have been introduced that forbid
access to memory areas outside of flat data objects.
Offset/Length Specifications for Elementary Data Objects
Offset/length specifications are permitted for character-like data objects and byte-like data objects. The
specification of offset and length is interpreted either as a number of characters or as a number of bytes.
The rules that determine which data objects in Unicode programs count as character-like or byte-like
objects do not allow for offset/length specifications for data objects of numeric data types.
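As a minimal sketch of the two interpretations, the same offset/length syntax counts characters for a field of type c and bytes for a field of type x. The program name and the variables are hypothetical.

```abap
REPORT z_demo_offset_length. " hypothetical program name

DATA text   TYPE c LENGTH 10 VALUE 'ABCDEFGHIJ'.
DATA hex    TYPE x LENGTH 4  VALUE '0A1B2C3D'.
DATA part_c TYPE c LENGTH 3.
DATA part_x TYPE x LENGTH 2.

part_c = text+2(3).   " character-like: offset and length in characters ('CDE')
part_x = hex+1(2).    " byte-like: offset and length in bytes ('1B2C')

" An offset/length specification on a field of type i, for example,
" is not allowed in Unicode programs.
```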
Note
Using data objects of type c as containers for structures of different types, which are often not known until runtime, and accessing their components with offset/length specifications is no longer possible in Unicode programs. Instead of such containers, the statement CREATE DATA can be used to generate data objects of any structure. To access existing containers, they can be assigned to a field symbol using the CASTING addition of the statement ASSIGN. The COMPONENT addition of ASSIGN can then be used to access individual components.
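A minimal sketch of the replacement technique for an existing container: the container is cast to a structure type with ASSIGN ... CASTING, and a component is then addressed by name with ASSIGN COMPONENT rather than by offset/length. The record layout ty_record, the program name, and the sample content are hypothetical; the layout is kept purely character-like so the cast is valid for a container of type c.

```abap
REPORT z_demo_casting. " hypothetical program name

TYPES:
  BEGIN OF ty_record,          " hypothetical record layout
    id   TYPE c LENGTH 4,
    name TYPE c LENGTH 10,
  END OF ty_record.

DATA container  TYPE c LENGTH 100.
DATA record_ref TYPE REF TO data.
FIELD-SYMBOLS: <record> TYPE ty_record,
               <comp>   TYPE any.

container = '0001Smith'.

" View the existing container through the structure type ...
ASSIGN container TO <record> CASTING.
" ... and access a component by name instead of by offset/length.
ASSIGN COMPONENT 'NAME' OF STRUCTURE <record> TO <comp>.

" For new code, a correctly typed data object can be created instead.
CREATE DATA record_ref TYPE ty_record.
```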
Offset/Length Specifications for Structures
An offset/length specification for a structure is only permitted in Unicode systems if the structure is either
• character-like (meaning it only contains flat character-like components), or
• flat, has a character-like initial fragment according to the Unicode fragment view, and the offset/length specification accesses this initial fragment.
In both cases, the specification of offset and length is interpreted as a number of characters.
Example
The following structure has both character-like and non-character-like components:
DATA:
  BEGIN OF struc,
    a TYPE c LENGTH 3,     "Length 3 characters
    b TYPE n LENGTH 4,     "Length 4 characters
    c TYPE d,              "Length 8 characters
    d TYPE t,              "Length 6 characters
    e TYPE decfloat16,     "Length 8 bytes
    f TYPE c LENGTH 28,    "Length 28 characters
    g TYPE x LENGTH 2,     "Length 2 bytes
  END OF struc.
The Unicode fragment view splits the structure into five areas, F1 - F5.
[ aaa | bbbb | cccccccc | dddddd | AAA | eeee | ffffffffffffffffffffffffffff | gg ]
[ F1                             | F2  | F3   | F4                           | F5 ]
Offset/length access is only possible for the character-like initial fragment F1. Specifications such as
struc(21) or struc+7(14) are accepted and are handled as a single field of type c. An access such as
struc+57(2), for example, is not permitted in Unicode systems.
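The accepted accesses from the example can be tried in a minimal sketch that repeats the structure and fills its character-like initial fragment. The program name, the target fields, and the sample values are hypothetical.

```abap
REPORT z_demo_struc_offset. " hypothetical program name

DATA:
  BEGIN OF struc,
    a TYPE c LENGTH 3,
    b TYPE n LENGTH 4,
    c TYPE d,
    d TYPE t,
    e TYPE decfloat16,
    f TYPE c LENGTH 28,
    g TYPE x LENGTH 2,
  END OF struc.
DATA whole_f1  TYPE c LENGTH 21.
DATA date_time TYPE c LENGTH 14.

struc-a = 'ABC'.
struc-b = '1234'.
struc-c = '20240131'.
struc-d = '235959'.

whole_f1  = struc(21).     " all of fragment F1 (components a to d)
date_time = struc+7(14).   " components c and d, counted in characters
" struc+57(2) would be rejected, because it lies outside fragment F1.
```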
Offset/Length Specifications for Actual Parameters
For actual parameters specified in PERFORM, in Unicode programs, it is not possible to specify a memory
area outside of the actual parameter using offset/length specifications. In particular, it is no longer possible
to specify an offset without a length, as this would implicitly set the length of the actual parameter.
Offset/Length Specification for Field Symbols
When a memory area is assigned to a field symbol using the ASSIGN statement, in Unicode programs offset/length specifications can only be used to access memory within the data object. The addition RANGE can be used to define this data object explicitly.
Field symbols themselves are also allocated an assignable memory area. This is effective if a field symbol
is used as a source in the ASSIGN statement.
In non-Unicode programs, the assignable area is defined by the data area of the current program, which
can lead to references being overwritten.
If a data object is specified as the source in ASSIGN, an offset cannot be specified without a length unless the RANGE addition is specified explicitly. Otherwise, the length of the data object would be set implicitly. If the name of a field symbol is specified and an offset is given without a length, the data type of the field symbol must be flat and elementary in Unicode programs.
Note
Previously, cross-field offset/length accesses could be usefully implemented in the ASSIGN statement for
processing repeating groups in structures. In order to enable this in Unicode systems, the ASSIGN
statement has been enhanced with the additions RANGE and INCREMENT.
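A minimal sketch of processing a repeating group with the additions INCREMENT and RANGE. It assumes that an increment leading outside the RANGE area leaves the assignment undone and sets sy-subrc to a nonzero value, which ends the loop. The structure week and the program name are hypothetical.

```abap
REPORT z_demo_assign_increment. " hypothetical program name

DATA:
  BEGIN OF week,                       " hypothetical repeating group
    day1 TYPE c LENGTH 3 VALUE 'Mon',
    day2 TYPE c LENGTH 3 VALUE 'Tue',
    day3 TYPE c LENGTH 3 VALUE 'Wed',
  END OF week.
DATA inc  TYPE i.
DATA days TYPE string.
FIELD-SYMBOLS <day> TYPE c.

DO.
  inc = sy-index - 1.
  " Step through equally sized elements; RANGE week limits the accessible memory.
  ASSIGN week-day1 INCREMENT inc TO <day> RANGE week.
  IF sy-subrc <> 0.
    EXIT.   " the next element would lie outside week
  ENDIF.
  days = days && <day>.
ENDDO.
```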
Access to Memory Sequences in Unicode Programs
The following (obsolete) statements access data objects that are stored in the memory as an equally
spaced sequence:

• DO ... VARYING
• WHILE ... VARY
• ADD ... THEN ... UNTIL
• ADD ... FROM ... TO
In the DO and WHILE loops in Unicode programs, all data objects of the sequence must be compatible and
either be structure components that belong to the same structure, or subareas of the same data object
specified using offset/length specifications. In Unicode programs, a RANGE must also be entered if it cannot
be statically recognized that the data objects involved are components of the same structure. Otherwise,
the permitted memory area is determined from the smallest possible substructure.
When memory sequences are added using ADD, in Unicode programs, all data objects of the sequence
must be components of a structure. If this cannot be statically recognized in the syntax check, a structure
must be specified using the addition RANGE.
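A minimal sketch of an obsolete DO ... VARYING loop written to satisfy the Unicode rules: all elements of the sequence are compatible components of one structure, and RANGE names that structure explicitly. The structure temps and the program name are hypothetical, and the code assumes an executable program rather than a class, since these statements are obsolete.

```abap
REPORT z_demo_do_varying. " hypothetical program name

DATA:
  BEGIN OF temps,          " hypothetical sequence of compatible components
    t1 TYPE i VALUE 10,
    t2 TYPE i VALUE 20,
    t3 TYPE i VALUE 30,
  END OF temps.
DATA temp  TYPE i.
DATA total TYPE i.

" temp takes the values of t1, t2, and t3 in turn; RANGE temps
" states the permitted memory area explicitly.
DO 3 TIMES VARYING temp FROM temps-t1 NEXT temps-t2 RANGE temps.
  total = total + temp.
ENDDO.
```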
Conversion of Structures in Unicode Programs
The most important differences between the behaviors of Unicode programs and non-Unicode programs
are the changed conversion rules for structures for assignments and for comparisons.
Note
Two structures in Unicode programs are only compatible when all alignment gaps are identical on all platforms. This applies in particular to alignment gaps that are created by included structures (INCLUDE).
Assignments Between Flat Structures
In non-Unicode programs, incompatible flat structures are treated as data objects of the type c, whereas in
Unicode programs, conversion rules apply which assign the most important role to the Unicode fragment
view of the structures.
Assignments Between Flat Structures and Single Fields
Non-Unicode programs always handle flat structures as data objects of the type c when assigning from and
to elementary data objects. In Unicode programs, however, a conversion rule applies, stating that the
structure must be character-like (at the very least in its initial fragment).
Comparisons Between Incompatible Flat Structures
As is the case with assignments, the structures are not handled as c fields, but in accordance with their
Unicode fragment view (see Comparison Rules Between Operands).
Comparisons Between Flat Structures and Single Fields
As is the case with assignments, the system checks whether the structure is character-like, at the very
least in its initial fragment (see Comparison Operators for All Data Types).
Structure Typing in Unicode Programs
For downward compatibility reasons, a structure can still be cast for field symbols and parameters of
function modules and subroutines using the obsolete addition STRUCTURE.
When a data object is assigned to such a field symbol or an actual parameter is passed to such a formal parameter, non-Unicode programs only check whether the length of the data object or actual parameter is at least the length of the structure and whether the alignment is identical at runtime.
Unicode programs distinguish between structured and elementary data objects or actual parameters. For a structured data object or actual parameter, its Unicode fragment view must match that of the cast structure, including all alignment gaps (including those at the end). An elementary data object or actual parameter must be character-like and flat.
When a formal parameter of a function module is typed with a flat structure using LIKE instead of TYPE,
LIKE has the same effect as STRUCTURE. However, the system checks the exact length when passing the
parameters in non-Unicode programs.
Note
The check of the Unicode fragment view can avoid problems that occur in non-Unicode systems due to
closing alignment gaps. This can include the non-type-compliant filling of actual parameters with the
content of an alignment gap.
Structure Enhancements and Unicode Programs
ABAP Dictionary structures and database tables that are delivered by SAP can be enhanced using
customizing includes or append structures. These types of changes cause problems in Unicode programs if
the enhancements change the Unicode fragment view.
For this reason, an option to classify structures and database tables was introduced, which makes it possible to recognize and handle problems related to structure enhancements. During the program check, this classification is used to create a warning at all points where the program works with structures and where later structure enhancements can cause syntax errors or changes in program behavior. When you define a structure or a database table in ABAP Dictionary, you can specify one of the enhancement categories listed in the following table as its classification.
Level  Category                                       Meaning
1      Unclassified                                   The structure does not have an enhancement category.
2      Cannot be enhanced                             The structure must not be enhanced.
3      Can be enhanced and character-like             All structure components and their enhancements must be character-like and flat.
4      Can be enhanced and character-like or numeric  All structure components and their enhancements must be flat.
5      Can be enhanced in any way                     All structure components and their enhancements can have any data type.
The warnings displayed after the program check are classified into the three levels listed in the following table, depending on the consequences of the permitted structure enhancements.
Level  Type of check   Meaning
A      Syntax check    An enhancement that fully utilizes the enhancement category of the structure in question leads to a syntax error.
B      Extended check  Permitted enhancements can lead to syntax errors, but not always.
C      Extended check  Permitted enhancements cannot lead to syntax errors, but they can change program behavior and thus cause semantic problems.
Example
If the structure ddic_struc in ABAP Dictionary is defined only with flat components but is classified as
Can be enhanced in any way, then the following program section leads to a warning in the syntax check. If
the structure were to be enhanced by a deep component after the program was delivered, the program
would be syntactically incorrect and no longer executable. In this case, you must therefore either classify the structure ddic_struc in ABAP Dictionary as Can be enhanced and character-like or avoid the offset/length specification in the program.
DATA: my_struc TYPE ddic_struc,
      str      TYPE string,
      off      TYPE i,
      len      TYPE i.
...
str = my_struc+off(len).
Character String and Byte String Processing in Unicode Programs
In Unicode programs, character string and byte string processing are strictly separated. The operands of
character string processing must be character-like data objects, and operands in byte string processing
must be byte-like data objects. In non-Unicode programs, byte strings are normally handled in the same
way as character strings.
Syntactic Separation
Statements for Character String and Byte String Processing
In Unicode programs, statements that can be used for both character string and byte string processing distinguish between the two by means of the optional addition IN CHARACTER|BYTE MODE. IN CHARACTER MODE is the default.
Note
The addition IN CHARACTER|BYTE MODE is also used in the statements for determining length and offset:
• DESCRIBE FIELD ... LENGTH
• DESCRIBE DISTANCE
In these statements, the addition is mandatory.
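A minimal sketch of the mandatory mode addition, measuring a character-like field in characters, a byte-like field in bytes, and the distance between two structure components in characters. The program name and all data objects are hypothetical.

```abap
REPORT z_demo_describe_mode. " hypothetical program name

DATA text     TYPE c LENGTH 20.
DATA hex      TYPE x LENGTH 20.
DATA char_len TYPE i.
DATA byte_len TYPE i.
DATA dist     TYPE i.
DATA:
  BEGIN OF pair,
    first  TYPE c LENGTH 5,
    second TYPE c LENGTH 5,
  END OF pair.

DESCRIBE FIELD text LENGTH char_len IN CHARACTER MODE.   " length in characters
DESCRIBE FIELD hex  LENGTH byte_len IN BYTE MODE.        " length in bytes
DESCRIBE DISTANCE BETWEEN pair-first AND pair-second
         INTO dist IN CHARACTER MODE.                    " distance in characters
```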
Relational Operators for Character Strings and Byte Strings
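Relational operators exist both for character strings (CO, CN, CA, NA, CS, NS, CP, NP) and for byte strings (BYTE-CO, BYTE-CN, BYTE-CA, BYTE-NA, BYTE-CS, BYTE-NS). In Unicode programs, the operators for character strings can no longer be used for byte strings.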
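A minimal sketch that uses each family of operators with its own kind of operand: CA on a text string and BYTE-CA on a byte string. The program name and the variables are hypothetical.

```abap
REPORT z_demo_byte_operators. " hypothetical program name

CONSTANTS wanted TYPE x LENGTH 1 VALUE '0C'.
DATA text     TYPE string VALUE `ABAP`.
DATA bytes    TYPE xstring.
DATA text_hit TYPE c LENGTH 1.
DATA byte_hit TYPE c LENGTH 1.

bytes = '0A0B0C'.

IF text CA 'XP'.           " character string operator on a character string
  text_hit = 'X'.
ENDIF.
IF bytes BYTE-CA wanted.   " byte string operator on a byte string
  byte_hit = 'X'.
ENDIF.
```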
Functions for Character Strings and Byte Strings
The description functions are divided into description functions for character strings and description functions for byte strings. In particular, in Unicode programs, strlen can now only be used for character-like arguments, while xstrlen is available for byte-like arguments.
Function Module Calls in Unicode Programs
In Unicode programs, a handleable exception is raised in a general function module call if an incorrect formal parameter is specified and the name of the function module is specified as a constant or literal. If the name of the function module is specified by a variable, or in non-Unicode programs, the specification of an incorrect formal parameter is ignored.
Open SQL in Unicode Programs
When work areas are used in Open SQL statements, in non-Unicode programs, their structure is not taken
into account. Only the length and the alignment are checked.
In Unicode programs, for structured work areas the Unicode fragment view must be correct, and
elementary work areas must be character-type.
The File Interface in Unicode Programs
Since the content of files frequently reflects the structure of data in the working memory, the file interface in
a Unicode system must fulfill the following requirements:

• It must be possible to exchange data between Unicode and non-Unicode systems.
• It must be possible to exchange data between different Unicode systems.
• It must be possible to exchange data between different non-Unicode systems that use different code pages.
For this reason, in Unicode programs, you must always define the code page used to encode the
character-type data that is written in text files or that is read from text files.
You must also consider that a Unicode program must be executable in a non-Unicode system as well as a
Unicode system. Some of the syntax rules for the file interface have therefore been modified so that
programming data access in Unicode programs is less prone to errors than in non-Unicode programs.

• Before every read or write access, a file must be opened explicitly using OPEN DATASET. Furthermore, a file that is already open cannot be opened again. In non-Unicode programs, a file is opened implicitly with the standard settings the first time it is accessed. In non-Unicode programs, the statement for opening a file can also be applied to a file that is already open, although a file can only be opened once within a program.
• When opening the file, the access type and the type of file storage must be specified explicitly using the following additions:
  o FOR INPUT|OUTPUT|APPENDING|UPDATE
  o IN [LEGACY] BINARY|TEXT MODE
  When opening a file in TEXT MODE, the ENCODING addition must be used to specify the character representation. When opening a file in LEGACY MODE, the byte order (endian) and a non-Unicode code page must be specified. In non-Unicode programs, if nothing is specified, a file is opened with implicit standard settings.
• If a file is opened for reading, its content can only be read. In non-Unicode programs, write access to such files is also possible.
• If a file is opened as a text file, only the content of character-type data objects can be read or written. In non-Unicode programs, byte-type and numeric data objects are also allowed.
Note
In Unicode programs, file names can also contain blank characters.
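A minimal sketch of a write and read round trip under the Unicode rules: each file is opened explicitly, with the access type, the storage type, and the encoding all specified. The program name and the file path are hypothetical; the path assumes an application server with a writable /tmp directory.

```abap
REPORT z_demo_dataset. " hypothetical program name

DATA file_name TYPE string VALUE `/tmp/z_unicode_demo.txt`.   " hypothetical path
DATA out_text  TYPE string VALUE `Hello, Unicode`.
DATA in_text   TYPE string.

" Access type, storage type, and (for text files) the encoding are explicit.
OPEN DATASET file_name FOR OUTPUT IN TEXT MODE ENCODING UTF-8.
IF sy-subrc = 0.
  TRANSFER out_text TO file_name.
  CLOSE DATASET file_name.
ENDIF.

OPEN DATASET file_name FOR INPUT IN TEXT MODE ENCODING UTF-8.
IF sy-subrc = 0.
  READ DATASET file_name INTO in_text.   " text mode: character-like target
  CLOSE DATASET file_name.
ENDIF.
```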
Lists in Unicode Systems
Introduction
A WRITE statement writes the content of data objects to a list. When data is written with a WRITE
statement, the output is stored in the list buffer and accessed from there for display when the list is called.
Each time a data object is output by WRITE, the system defines an output length either implicitly or explicitly; the implicit output length depends on the data type. The output length defines the following two attributes:
• Number of positions or memory spaces available for characters in the list buffer
• Number of columns or cells available in the actual list
If the output length is shorter than the length of the data object, the system shortens its content according to certain rules when writing the data to the list buffer. Any values lost in numeric fields are indicated by an asterisk (*).
When displaying or printing a list, the content stored in the list buffer is transferred to the list as follows:
• In non-Unicode systems, each character occupies as much space in the list buffer as the number of columns it requires in the list. In single-byte systems, a character occupies one byte in the list buffer and one column in the list, while in multibyte systems a character that occupies several bytes in the list buffer also occupies the same number of columns in the list. For this reason, all the characters stored in the list buffer are displayed in the list in non-Unicode systems.
• In Unicode systems, every character usually occupies one position in the list buffer. However, a character can also occupy more than one column, as is the case for East Asian characters. Since the list only contains as many columns as there are positions in the list buffer, the number of characters that can be displayed in the list is then smaller than the number of characters stored in the list buffer. List output is shortened accordingly, with the page formatted according to the specified alignment and marked with the characters > or <. The entire content of the list can then only be displayed by choosing the menu path System → List → Unicode Display.
For this reason, the horizontal position of the list cursor has the same meaning as the output column only in lists displayed or printed in non-Unicode systems. In Unicode systems, this is only guaranteed for the lower and upper output limits of a displayed data object.
Rules for WRITE Statements
To avoid cutting off values unintentionally as far as possible, the rules for WRITE statements in Unicode
programs have been modified and extended.
Operands in the WRITE Statement
If the data object specified in WRITE is a flat structure, this must be purely character-like in Unicode
programs.
Note
This also applies for the statement WRITE TO, in which the target field must also be character-like.
WRITE Statements with Implicit Output Length
In Unicode programs, WRITE statements without an explicitly specified output length for all data objects
except text field literals and data objects of the type string behave in the same way as in non-Unicode
programs. This means fewer characters may be displayed in the list than are stored in the list buffer.
In the case of text field literals and data objects of the type string, the system assumes that all characters
are to be displayed. For this reason the implicit output length is calculated using the characters contained in
the data object so that it corresponds to the number of columns needed in the list. If this output length is
greater than the length of the data object, surplus positions are filled with blanks when the data is written to
the list buffer. When displaying the data in the list, the system removes these blanks, since the character
representation fills the output length exactly.
WRITE Statements with Explicit Output Length
If a numeric data object is specified as an explicit output length after the AT addition for a WRITE statement,
the value of this number is used as the output length, both in Unicode and non-Unicode systems. In
Unicode systems, the number of characters displayed in the list can differ from the number of characters
stored in the list buffer. Instead of a numeric data object, the output length can also be specified as follows:
• WRITE AT (*) ...
  1. For data objects of the types c and string, the output length is set to the number of columns required to display the entire content in the list; trailing blanks are ignored for type c. For data objects of the type string, this has the same meaning as the implicit length.
  2. For data objects of the types d and t, the output length is set to 10 and 8, respectively.
  3. For data objects of the numeric types i, f, and p, the output length is set to the value required to display the current value including thousands separators. This rule is applied to the value after any CURRENCY, DECIMALS, NO-SIGN, ROUND, or UNIT additions have been applied.
  4. The implicit output length is used for data objects of the types n, x, and xstring.
• WRITE AT (**) ...
  1. For data objects of the type c, the output length is set to twice the length of the data object, and for data objects of the type string, to twice the number of characters contained in the object.
  2. For data objects of the types d and t, the output length is set to 10 and 8, respectively.
  3. For data objects of the numeric types i, f, and p, the output length is set to the value required to display the maximum possible values for these types, including plus and minus signs and thousands separators. This rule is applied to the value after any CURRENCY, DECIMALS, NO-SIGN, ROUND, or UNIT additions have been applied.
  4. The implicit output length is used for data objects of the types n, x, and xstring.
The behavior of the output lengths (*) and (**) when using the addition USING EDIT MASK and the
templates for date fields is described in Formatting Options.
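A minimal sketch comparing implicit and symbolic output lengths in a list. The program name and the data objects are hypothetical; the optional AT is omitted, and the actual column counts in the list depend on the characters output and on the system.

```abap
REPORT z_demo_write_length. " hypothetical program name

DATA text   TYPE c LENGTH 20 VALUE 'Unicode'.
DATA amount TYPE p LENGTH 8 DECIMALS 2 VALUE '1234567.89'.

WRITE / text.          " implicit output length: the field length (20)
WRITE /(*) text.       " only the columns needed for the content 'Unicode'
WRITE /(**) text.      " twice the field length (40)
WRITE /(*) amount.     " enough for the current value with thousands separators
WRITE /(**) amount.    " enough for the largest possible value of this type
```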
Additions for GET/SET CURSOR FIELD/LINE
The additions DISPLAY OFFSET and MEMORY OFFSET take account of the fact that data objects can
occupy different lengths when displayed in a list and when stored temporarily in the list buffer.
Accordingly, in the statement SET CURSOR { FIELD f | LINE l }, the addition DISPLAY OFFSET off positions the cursor in the column of the output area specified in off. The addition MEMORY OFFSET off positions the cursor on the character in the output area that is stored at the position specified in off (of the data object in f) in the list buffer.
In the same way, the statement GET CURSOR { FIELD f | LINE l } with the addition DISPLAY OFFSET off returns the cursor position within the output area in the data object off. With the addition MEMORY OFFSET off, the position in the list buffer of the character displayed at the cursor position is returned in off. DISPLAY is the default and can be omitted.
Class for Formatting Lists
The class CL_ABAP_LIST_UTILITIES has been introduced to calculate output lengths, convert values from the list buffer, and define field limits. The values returned by the methods of this class can be used to program correct column alignment in ABAP lists, even for output of East Asian characters.
List Settings
The objects in a list can be displayed in different output lengths by specifying the desired length in the
menu under System → List → Unicode Display. This is particularly advantageous for screen lists in
Unicode systems where the output is cut off as indicated by the characters > or <.
Recommendations
We recommend that you adhere to the following rules when programming lists, to ensure that they have the same appearance and functions in both Unicode and non-Unicode systems:
• Specify an adequate output length.
• Do not overwrite parts of a field.
• Do not use the additions RIGHT-JUSTIFIED or CENTERED for WRITE TO if this statement is followed by list output with WRITE.
• In customer-programmed horizontal scrolling with a SCROLL statement, specify only the upper or lower limit of the data objects displayed, since in Unicode systems the positions in the list buffer and in the displayed list are only certain to match at these field limits.