CS 2110:
Data Types and Representations 2
Aaron Hillegass
Georgia Tech
Bitwise NOT
Denoted ~ and NOT and ¬ (and sometimes !, although in C ! is logical NOT, not bitwise NOT).
Truth table:
A   NOT A
0   1
1   0
Example on 8-bit word:
~(0b00110101) = 0b11001010
Bitwise AND
Denoted & and ∧ and AND.
Truth table:
A   B   A AND B
0   0   0
1   0   0
0   1   0
1   1   1
𝑛 inputs? All must be 1.
Example on 8-bit words:
0b00110101
& 0b11100011
= 0b00100001
Bitwise OR
Denoted | and ∨ and OR.
Truth table:
A   B   A OR B
0   0   0
1   0   1
0   1   1
1   1   1
𝑛 inputs? At least one must be 1.
Example on 8-bit words:
0b00110101
| 0b11100011
= 0b11110111
Bitwise XOR
Denoted ^ and ⊕ and XOR.
Truth table:
A   B   A XOR B
0   0   0
1   0   1
0   1   1
1   1   0
𝑛 inputs? iff odd number of 1s.
Example on 8-bit words:
0b00110101
^ 0b11100011
= 0b11010110
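The four 8-bit examples above can be checked directly in C. This is a minimal sketch: the helper names (`not8`, `and8`, `or8`, `xor8`) are invented for illustration, and the operands are written in hex (0x35 is 0b00110101, 0xE3 is 0b11100011) because binary literals are not part of every C standard.

```c
#include <stdint.h>

/* Bitwise NOT on an 8-bit word. The cast matters: C promotes uint8_t
   to int before applying ~, so the extra high bits must be truncated. */
uint8_t not8(uint8_t a) { return (uint8_t)~a; }

/* The two-operand operators work bit by bit on matching positions. */
uint8_t and8(uint8_t a, uint8_t b) { return (uint8_t)(a & b); }
uint8_t or8(uint8_t a, uint8_t b)  { return (uint8_t)(a | b); }
uint8_t xor8(uint8_t a, uint8_t b) { return (uint8_t)(a ^ b); }
```

With these helpers, `not8(0x35)` is 0xCA, `and8(0x35, 0xE3)` is 0x21, `or8(0x35, 0xE3)` is 0xF7, and `xor8(0x35, 0xE3)` is 0xD6, matching the slides.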
NAND and NOR
A   B   A NAND B        A   B   A NOR B
0   0   1               0   0   1
1   0   1               1   0   0
0   1   1               0   1   0
1   1   0               1   1   0
NAND with 𝑛 inputs? At least one must be 0.
NOR with 𝑛 inputs? All must be 0.
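C has no NAND or NOR operator, but both can be built by negating AND and OR. A small sketch on 8-bit words follows; the function names `nand8` and `nor8` are hypothetical.

```c
#include <stdint.h>

/* NAND: NOT applied to the AND of the inputs. */
uint8_t nand8(uint8_t a, uint8_t b) { return (uint8_t)~(a & b); }

/* NOR: NOT applied to the OR of the inputs. */
uint8_t nor8(uint8_t a, uint8_t b) { return (uint8_t)~(a | b); }
```

Calling either function with single-bit inputs (0 or 1) and masking the result with 1 reproduces the truth tables above.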
Left Shift
All bits move to the left 𝑛 positions
Leftmost bits are lost; vacated rightmost bits are filled with zeros.
uint16_t v = 7;
uint16_t u = v << 2;
// v = 0000000000000111
// u = 0000000000011100
Shifting left by one is equivalent to multiplying by 2 (as long as no 1 bit is shifted out).
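To see both the multiply-by-2 behavior and what happens when a 1 bit falls off the left edge, here is a short sketch. The helper name `shl16` is hypothetical, and the function assumes the shift count is less than 16.

```c
#include <stdint.h>

/* Left shift on a 16-bit word. The value is widened to 32 bits first,
   then truncated back to 16 bits, so bits shifted past bit 15 are lost.
   Assumes n < 16. */
uint16_t shl16(uint16_t v, unsigned n) {
    return (uint16_t)((uint32_t)v << n);
}
```

`shl16(7, 2)` gives 28 (7 × 4), while `shl16(0x8001, 1)` gives 0x0002 because the top bit is discarded.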
Logical Right Shift
All bits move to the right 𝑛 positions
Rightmost bits are lost; vacated leftmost bits are filled with zeros.
uint16_t v = 32768; // v = 1000000000000000
uint16_t u = v >> 2; // u = 0010000000000000
Shifting right one is equivalent to dividing by 2 and discarding fractional
part.
Arithmetic Right Shift
All bits move to the right 𝑛 positions
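Rightmost bits are lost; vacated leftmost bits are filled with copies of the sign bit.
int16_t v = -1536; // v = 1111101000000000
int16_t u = v >> 2; // u = 1111111010000000 (implementation-defined in C; most compilers shift arithmetically)
$1111111010000000_2 = -384_{10} = -\frac{1536}{4}$.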
Shifting right one is equivalent to dividing by 2 and rounding towards
negative infinity.
In this class, if we say “right shift” we mean logical right shift.
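Because C leaves right shifts of negative signed values implementation-defined, the two kinds of right shift can be sketched portably on unsigned 16-bit patterns. The names `lsr16` and `asr16` are hypothetical, and both assume a shift count below 16.

```c
#include <stdint.h>

/* Logical right shift: vacated high bits are filled with zeros. */
uint16_t lsr16(uint16_t v, unsigned n) {
    return (uint16_t)(v >> n);
}

/* Arithmetic right shift built from unsigned operations: vacated high
   bits are filled with copies of the sign bit (bit 15). */
uint16_t asr16(uint16_t v, unsigned n) {
    uint16_t shifted = (uint16_t)(v >> n);
    if (v & 0x8000u) {
        /* Sign bit was 1: force the top n bits to 1. */
        shifted = (uint16_t)(shifted | (uint16_t)~(0xFFFFu >> n));
    }
    return shifted;
}
```

`lsr16(0x8000, 2)` is 0x2000, and `asr16(0xFA00, 2)` is 0xFE80, which are the bit patterns shown in the two examples above (0xFA00 is the 16-bit pattern for -1536).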
Creating Bitmasks using bitwise OR
int fd = open("myfile.txt", O_RDWR | O_CREAT | O_TRUNC, 0644);
Each bit represents a flag.
int O_RDWR = 1 << 1; // = 0x2
int O_CREAT = 1 << 6; // = 0x40
int O_TRUNC = 1 << 8; // = 0x100
Reading Bitmasks using bitwise AND
int open(const char *pathname, int flags, ...) {
    if (flags & O_RDWR) {
        // Handle read-write mode
    }
    if (flags & O_CREAT) {
        // Handle create mode
    }
    if (flags & O_TRUNC) {
        // Handle truncate mode
    }
    if (flags & O_APPEND) {
        // Handle append mode
    }
}
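The same OR and AND patterns generalize to setting, clearing, toggling, and testing any flag. The sketch below uses made-up flag names (`FLAG_READ`, `FLAG_WRITE`, `FLAG_APPEND`) rather than the real `<fcntl.h>` constants, so it can be compiled on its own.

```c
#include <stdbool.h>

/* Hypothetical flags for illustration; each one occupies its own bit. */
enum {
    FLAG_READ   = 1 << 0,
    FLAG_WRITE  = 1 << 1,
    FLAG_APPEND = 1 << 2
};

/* OR turns a bit on, AND with the complement turns it off,
   XOR flips it, and AND alone tests it. */
int  set_flag(int flags, int flag)    { return flags | flag; }
int  clear_flag(int flags, int flag)  { return flags & ~flag; }
int  toggle_flag(int flags, int flag) { return flags ^ flag; }
bool has_flag(int flags, int flag)    { return (flags & flag) != 0; }
```

For example, `set_flag(FLAG_READ, FLAG_APPEND)` yields 5, and `has_flag(5, FLAG_WRITE)` is false.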
Octal Notation
A lot of bitmasks are declared using octal notation.
It is just base-8, and you can recognize it because we always lead with a
zero:
$0127 = 1 \times 8^2 + 2 \times 8^1 + 7 \times 8^0 = 87_{10}$
Here’s some C code:
uint16_t x = 0x1234;
x = x & ~0777;
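Each octal digit covers exactly three bits, so 0777 is nine 1 bits and `x & ~0777` clears the low nine bits of `x`. The sketch below wraps that in a function and also pulls one octal digit out of a Unix-style permission mode such as 0644; both helper names are hypothetical.

```c
#include <stdint.h>

/* Clear the low 9 bits (octal 0777 is binary 111111111). */
uint16_t clear_low9(uint16_t x) {
    return (uint16_t)(x & ~0777);
}

/* Extract one octal digit from a mode such as 0644.
   which = 0 is the lowest digit, 1 the middle, 2 the highest. */
unsigned perm_digit(unsigned mode, unsigned which) {
    return (mode >> (3 * which)) & 07;
}
```

`clear_low9(0x1234)` gives 0x1200, and `perm_digit(0644, 2)` gives 6.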
Scientific Notation
$-2.45 \times 10^7$
• Sign
• Mantissa: One non-zero digit to the left of the decimal point.
• Exponent: A signed integer.
How would you convert this idea to binary?
Reminder: Radix Point for Binary Numbers
$1010.11001_2 = 2^3 + 2^1 + 2^{-1} + 2^{-2} + 2^{-5} = 10.78125_{10}$
$.a_1 a_2 a_3 \ldots a_n = \sum_{i=1}^{n} a_i \times 2^{-i}$
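The sum for the fractional part translates into a short loop: start with a weight of 1/2 and halve it for each digit. This is an illustrative sketch; `binary_fraction` is a hypothetical helper that takes the digits after the radix point as a string of '0' and '1' characters.

```c
#include <stddef.h>

/* Value of the binary fraction .a1 a2 ... an, where bits holds the
   digits after the radix point as '0'/'1' characters. */
double binary_fraction(const char *bits) {
    double value = 0.0;
    double weight = 0.5;               /* 2^-1 for the first digit */
    for (size_t i = 0; bits[i] != '\0'; i++) {
        if (bits[i] == '1') {
            value += weight;           /* a_i contributes 2^-i */
        }
        weight /= 2.0;
    }
    return value;
}
```

`binary_fraction("11001")` returns 0.78125, the fractional part of the example above.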
32-bit IEEE-754 Floating Point Numbers
Bits:   31      30 - 23     22 - 0
Value:  1       10000001    01100000000000000000000
Width:  1 bit   8 bits      23 bits
Field:  𝑆       𝐸           𝑀
$(-1)^S \times 1.M \times 2^{E-127}$
Example: $(-1)^1 \times 1.011_2 \times 2^{129-127} = -1.375_{10} \times 2^2 = -5.5$
About 7 decimal digits of precision; max is about $3.4 \times 10^{38}$.
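The three fields can be pulled out of a real `float` with shifts and masks. The sketch below assumes `float` is a 32-bit IEEE-754 value (true on mainstream platforms, but not guaranteed by the C standard); the struct and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type holding the three raw fields. */
typedef struct {
    uint32_t sign;      /* bit 31 */
    uint32_t exponent;  /* bits 30-23, still biased by 127 */
    uint32_t mantissa;  /* bits 22-0, without the implicit leading 1 */
} float_fields;

float_fields decode_float(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* reinterpret the bytes safely */
    float_fields out;
    out.sign     = bits >> 31;
    out.exponent = (bits >> 23) & 0xFFu;
    out.mantissa = bits & 0x7FFFFFu;
    return out;
}
```

For -5.5f this yields sign 1, exponent 129, and mantissa 0x300000 (binary 011 followed by twenty zeros), matching the example.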
64-bit IEEE-754 Floating Point Numbers
Bits:   63      62 - 52     51 - 0
Width:  1 bit   11 bits     52 bits
Field:  𝑆       𝐸           𝑀
$(-1)^S \times 1.M \times 2^{E-1023}$
About 16 decimal digits of precision; max is about $1.8 \times 10^{308}$.
But...how do we represent zero?
Bits:   31      30 - 23     22 - 0
Width:  1 bit   8 bits      23 bits
Field:  𝑆       𝐸           𝑀
And NaN and ∞ and −∞?
And the IEEE spoketh
Bits:   31      30 - 23     22 - 0
Width:  1 bit   8 bits      23 bits
Field:  𝑆       𝐸           𝑀
Special numbers:
• Zero: 𝑀 = 0, 𝐸 = 0
• ∞: 𝑀 = 0, 𝐸 = 255
• −∞: 𝑀 = 0, 𝐸 = 255, 𝑆 = 1
• NaN: 𝑀 ≠ 0, 𝐸 = 255
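These rules amount to two checks on the exponent and mantissa fields. A sketch that classifies a raw 32-bit pattern follows; the enum and function names are hypothetical, and every pattern that is not one of the special values listed above is reported as `KIND_OTHER`.

```c
#include <stdint.h>

/* Hypothetical labels for the special values listed above. */
typedef enum { KIND_ZERO, KIND_INFINITY, KIND_NAN, KIND_OTHER } float_kind;

/* Classify a 32-bit IEEE-754 bit pattern by its E and M fields.
   The sign bit only selects between +0/-0 and +inf/-inf. */
float_kind classify_bits(uint32_t bits) {
    uint32_t e = (bits >> 23) & 0xFFu;
    uint32_t m = bits & 0x7FFFFFu;
    if (e == 0 && m == 0) return KIND_ZERO;
    if (e == 255)         return (m == 0) ? KIND_INFINITY : KIND_NAN;
    return KIND_OTHER;
}
```

For instance, 0x7F800000 and 0xFF800000 classify as infinity, 0x7FC00000 as NaN, and 0xC0B00000 (the pattern for -5.5) as an ordinary number.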
Subnormals
To get really small numbers, we have subnormals.
For all the normals, the mantissa 𝑀 represents 1.𝑀.
For subnormals, the mantissa 𝑀 represents 0.𝑀 and the exponent is fixed at −126.
If 𝐸 is zero and 𝑀 is not, it is a subnormal.
Smallest positive number is 0 00000000 00000000000000000000001
It represents $2^{-126} \times 2^{-23} = 2^{-149}$.
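The claim about the smallest positive number can be checked by building a `float` from the bit pattern 0x00000001 and comparing it with 1.0 halved 149 times. This sketch assumes 32-bit IEEE-754 `float` with subnormals enabled (no flush-to-zero mode); both helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float. */
float float_from_bits(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Compute 2^-n by halving 1.0 n times. Each halving is exact as long
   as the result is still representable, which holds down to 2^-149. */
float pow2_neg(unsigned n) {
    float x = 1.0f;
    for (unsigned i = 0; i < n; i++) {
        x *= 0.5f;
    }
    return x;
}
```

Under those assumptions, `float_from_bits(0x00000001u)` compares equal to `pow2_neg(149)` and is greater than zero.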
bfloat16
Deep learning prefers more numbers with less precision.
Google developed bfloat16:
Bits:   15      14 - 7      6 - 0
Width:  1 bit   8 bits      7 bits
Field:  𝑆       𝐸           𝑀
$(-1)^S \times 1.M \times 2^{E-127}$
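Since bfloat16 keeps the same sign and 8-bit exponent as the 32-bit format, its bit pattern is simply the top 16 bits of a 32-bit float. The sketch below converts by truncation, which is the simplest approach; production converters typically round instead. The function names are hypothetical, and 32-bit IEEE-754 `float` is assumed.

```c
#include <stdint.h>
#include <string.h>

/* Truncating conversion: keep sign, exponent, and the top 7 mantissa
   bits. The low 16 mantissa bits are simply dropped (no rounding). */
uint16_t float_to_bf16_trunc(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (uint16_t)(bits >> 16);
}

/* Widening back is exact: the dropped mantissa bits become zeros. */
float bf16_to_float(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

-5.5f converts to 0xC0B0 and back to -5.5f without loss, because its mantissa fits in 7 bits.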
ASCII text encoding
00 nul  08 bs   10 dle  18 can  20 sp   28 (    30 0    38 8    40 @    48 H    50 P    58 X    60 `    68 h    70 p    78 x
01 soh  09 ht   11 dc1  19 em   21 !    29 )    31 1    39 9    41 A    49 I    51 Q    59 Y    61 a    69 i    71 q    79 y
02 stx  0a nl   12 dc2  1a sub  22 "    2a *    32 2    3a :    42 B    4a J    52 R    5a Z    62 b    6a j    72 r    7a z
03 etx  0b vt   13 dc3  1b esc  23 #    2b +    33 3    3b ;    43 C    4b K    53 S    5b [    63 c    6b k    73 s    7b {
04 eot  0c np   14 dc4  1c fs   24 $    2c ,    34 4    3c <    44 D    4c L    54 T    5c \    64 d    6c l    74 t    7c |
05 enq  0d cr   15 nak  1d gs   25 %    2d -    35 5    3d =    45 E    4d M    55 U    5d ]    65 e    6d m    75 u    7d }
06 ack  0e so   16 syn  1e rs   26 &    2e .    36 6    3e >    46 F    4e N    56 V    5e ^    66 f    6e n    76 v    7e ~
07 bel  0f si   17 etb  1f us   27 '    2f /    37 7    3f ?    47 G    4f O    57 W    5f _    67 g    6f o    77 w    7f del
ASCII strings in memory
char *greeting = "Hello!";
'H'  'e'  'l'  'l'  'o'  '!'  '\0'
48   65   6c   6c   6f   21   00
Notice:
• A is $65_{10}$
• Z is $90_{10}$
• a is $97_{10}$
• z is $122_{10}$
'K' + 32 = 'k'
Is there a cheaper way?
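One possible answer: 32 is 0x20, a single bit, and in ASCII that bit is 0 for every uppercase letter and 1 for every lowercase letter. Case conversion can therefore be done with one bitwise operation instead of an addition. The sketch below assumes its argument is already an ASCII letter; the function names are hypothetical.

```c
/* Bit 5 (0x20) is the only difference between 'A'..'Z' and 'a'..'z'.
   All three helpers assume c is an ASCII letter. */
char to_lower_ascii(char c)    { return (char)(c | 0x20); }   /* set bit 5 */
char to_upper_ascii(char c)    { return (char)(c & ~0x20); }  /* clear bit 5 */
char toggle_case_ascii(char c) { return (char)(c ^ 0x20); }   /* flip bit 5 */
```

Unlike `c + 32`, the OR version leaves a letter that is already lowercase unchanged.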
Newline Madness
In 1972, Unix used '\n' (ASCII $10_{10}$).
In 1982, Microsoft DOS used '\r\n' (ASCII $13_{10}$, ASCII $10_{10}$).
We have never recovered. Most systems have dos2unix and unix2dos.
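The core of a DOS-to-Unix conversion is small enough to sketch. This is not how the real `dos2unix` tool is implemented; it is an illustrative in-place routine with a hypothetical name that drops each '\r' that is immediately followed by '\n'.

```c
#include <stddef.h>

/* Rewrite s in place, turning every "\r\n" into "\n".
   A '\r' not followed by '\n' is kept. Returns the new length. */
size_t crlf_to_lf(char *s) {
    size_t r = 0;   /* read position */
    size_t w = 0;   /* write position, never ahead of r */
    while (s[r] != '\0') {
        if (s[r] == '\r' && s[r + 1] == '\n') {
            r++;    /* skip the '\r'; the '\n' is copied below */
        }
        s[w++] = s[r++];
    }
    s[w] = '\0';
    return w;
}
```

Applied to the 6-byte string "a\r\nb\r\n", it leaves "a\nb\n" and returns 4.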
UTF-8 is a superset of ASCII
• UTF-8 is the most common encoding for the web.
• ASCII uses only 7 bits. In UTF-8, a byte with the 8th (high) bit set is part of a multi-byte sequence.
• UTF-8 represents any Unicode “code point” using 1 to 4 bytes.
• Sometimes a single character requires multiple code points.
First     Last       byte 1     byte 2     byte 3     byte 4
U+0000    U+007F     0yyyzzzz
U+0080    U+07FF     110xxxyy   10yyzzzz
U+0800    U+FFFF     1110wwww   10xxxxyy   10yyzzzz
U+10000   U+10FFFF   11110uvv   10vvwwww   10xxxxyy   10yyzzzz
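The table maps directly onto shifts and masks: pick the row by range, then spread the code point's bits across the bytes, six per continuation byte. A sketch of an encoder follows; `utf8_encode` is a hypothetical name, and the rejection of surrogates (U+D800 to U+DFFF) is an addition not shown in the table.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode code point cp into out (room for 4 bytes).
   Returns the number of bytes written, or 0 if cp is not encodable. */
size_t utf8_encode(uint32_t cp, uint8_t out[4]) {
    if (cp <= 0x7F) {                       /* 0yyyzzzz */
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                      /* 110xxxyy 10yyzzzz */
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp <= 0xFFFF) {                     /* 1110wwww 10xxxxyy 10yyzzzz */
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return 0;                       /* surrogates are not valid here */
        }
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* 11110uvv 10vvwwww 10xxxxyy 10yyzzzz */
        out[0] = (uint8_t)(0xF0 | (cp >> 18));
        out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (uint8_t)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                               /* beyond U+10FFFF */
}
```

For example, U+0041 encodes as the single byte 41, U+20AC as E2 82 AC, and U+1F600 as F0 9F 98 80.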
Questions?
Reading Patt: 2.1 - 2.6
Slides by Aaron Hillegass