Example 4.1. Let us consider a concrete example of how the UTF-8 code of a code point is determined. The ASCII characters are not so interesting, since for these characters the UTF-8 code agrees with the code point. The Norwegian character ’Å’ is more challenging. If we check the Unicode charts (see footnote 4), we find that this character has the code point $\mathrm{c5}_{16} = 197$. This lies in the range 128–2047, which is covered by rule 2 in fact 4.6. To determine the UTF-8 encoding we must find the binary representation of the code point. This is easy to deduce from the hexadecimal representation, because each hexadecimal digit corresponds to exactly four bits. The least significant digit (5 in our case) determines the four least significant bits, and the most significant digit (c) determines the four most significant bits. Since $5 = 0101_2$ and $\mathrm{c}_{16} = 1100_2$, the code point in binary is
$$(000\,\overbrace{1100}^{\mathrm{c}}\,\overbrace{0101}^{5})_2,$$
where we have added three 0s on the left to get the eleven bits referred to by rule 2. We then distribute the eleven bits as in (4.4): the first five bits, 00011, follow the prefix 110 in the first byte, and the remaining six bits, 000101, follow the prefix 10 in the second byte. This gives the two bytes 11000011 and 10000101. In hexadecimal these correspond to the two values c3 and 85, so the UTF-8 encoding of ’Å’ is the two-byte number $\mathrm{c385}_{16}$.

4 The Latin-1 Supplement chart can be found at www.unicode.org/charts/PDF/U0080.pdf/.
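The bit manipulation in Example 4.1 can be checked mechanically. The following is a minimal Python sketch of the two-byte case only. It assumes that rule 2 in fact 4.6 and the layout (4.4) are the standard two-byte UTF-8 pattern 110xxxxx 10xxxxxx, which is what the bytes 11000011 and 10000101 in the example follow. The function name `utf8_two_bytes` is hypothetical and used only for illustration. The result is cross-checked against Python's built-in `str.encode("utf-8")`.

```python
def utf8_two_bytes(code_point: int) -> bytes:
    """Encode a code point in the range 128-2047 as two UTF-8 bytes.

    Hypothetical helper: a sketch of the two-byte case only, assuming the
    standard layout 110xxxxx 10xxxxxx.
    """
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("two-byte UTF-8 covers code points 128-2047 only")
    # The code point fits in eleven bits: the top five go in the first byte,
    # the bottom six in the second byte.
    high5 = code_point >> 6        # bits 10..6
    low6 = code_point & 0b111111   # bits 5..0
    # Prefix the first byte with 110 and the second byte with 10.
    first = 0b11000000 | high5
    second = 0b10000000 | low6
    return bytes([first, second])


cp = 0xC5                                        # code point of 'Å' (197)
print(f"{cp:011b}")                              # prints 00011000101
encoded = utf8_two_bytes(cp)
print(" ".join(f"{b:08b}" for b in encoded))     # prints 11000011 10000101
print(encoded.hex())                             # prints c385
# Cross-check against Python's built-in UTF-8 encoder.
assert encoded == chr(cp).encode("utf-8")
```

The code point is written as `0xC5` rather than as a literal 'Å' so that the example is not affected by how the source file stores that character.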