Week 1 - Maseeh College of Engineering & Computer Science

OMSE 510: Computing Foundations
Intro Lecture
Chris Gilmore <grimjack@cs.pdx.edu>
Portland State University/OMSE
Website/mailing list
Website
http://web.cecs.pdx.edu/~grimjack/OMSE510CF/ComputerFoundations.html
Mailing List:
omse510@cecs.pdx.edu
Personal Email:
grimjack@cs.pdx.edu
About OMSE510
Course Rationale:
This course has been designed for graduate level
software engineering students who are lacking key
foundation computer science knowledge in the
areas of computer architecture and operating
systems. This course may also be taken by
students needing or wanting to upgrade their
knowledge in these areas. With the approval of an
OMSE advisor, OMSE students may register in this
course and count it for credit as an OMSE elective.
Course Structure
Divided into two halves:
Computer Architecture: how the hardware works (4 sessions + midterm)
Operating Systems: how the software interacts with the hardware (5 sessions + final)
Grading: four assignments (40%), midterm (30%), final (30%)
Things not covered
Transistors, logic gates, and lower-level functionality
In-depth floating point/integer arithmetic
Networking
Hardware description languages (Verilog, ISP’)
Security
The History of anything
Theoretical Architectures
First Note
1) I like feedback.
2) It’s good to ask questions in class. Email is less good.
3) If you don’t understand, ask NOW. Other people probably don’t understand either, and we always build on existing material.
4) One or two breaks in a 3-hour class.
The Basics
Today’s lecture covers the
very basics – should
probably be review!
If you’re bored, that’s good!
The interesting stuff comes
later
Today’s Lecture
Amdahl’s Law
Data Representation
Conventions (binary/hex/oct)
Unsigned/signed integers
Floating point
Brief on Compilers
Amdahl’s Law
A fundamental principle of computer architecture design:
make things FAST.
Amdahl’s law is a guideline for making things faster.
Speedup
Suppose some task takes $t_{old}$ minutes to perform.
E.g., flying from PDX to YVR takes 80 mins on a Boeing 727 (~900 km/h).
Speedup
But time is important to us! Let’s
take the Concorde instead!
Flying from PDX to YVR
- Boeing 727, ~900 km/h, 80 mins
- Concorde, ~2200 km/h, 40 mins
40 minutes saved!
Speedup
Flying from PDX to YVR
$t_{old}$ = 80 mins (Boeing 727)
$t_{new}$ = 40 mins (Concorde)
$\text{Speedup} = \frac{t_{old}}{t_{new}} = \frac{80 \text{ min}}{40 \text{ min}} = 2$
2x speed improvement! That’s great!
.. But is it really?
Speedup
Time actually spent traveling from
PDX to YVR:
30 mins MAX to airport
20 mins getting your ticket
45 mins getting through security
30 mins boarding/taxiing
80 mins flying
40 mins landing + customs
= 245 minutes
Speedup
Time actually spent traveling from
PDX to YVR:
245 minutes (Boeing 727)
205 minutes (Concorde)
Where’d that 2x speedup go?
Speedup
Flying (80 mins) is only 33% of the total time (245 mins)!
Amdahl’s Law
The variables:
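$t_{old}$ = 245 mins (original travel time)
α = 80/245 ≈ 33% (fraction of the time actually spent flying)
k = 2 (speedup factor)
$t_{new} = (1-\alpha)\,t_{old} + \alpha\,t_{old}/k$
$= \frac{165}{245} \times 245 \text{ mins} + \frac{80}{245} \times 245 \text{ mins} / 2$
$= 165 \text{ mins} + 40 \text{ mins} = 205 \text{ mins}$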
Amdahl’s Law
Speedup, S
$S = t_{old}/t_{new} = \frac{1}{(1-\alpha) + \alpha/k}$
$= 245/205 \approx 1.2$
Much less than 2x!
Moral of the story: To improve the
system, you have to work harder
than you want
Amdahl’s Law
Special case: set k = ∞
$S_\infty = \frac{1}{1-\alpha}$
The most speedup you can get out of tuning one component (for the flight, $1/(1-0.33) \approx 1.5$).
i.e., are you wasting your time?
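Amdahl’s law is easy to check numerically. Below is a minimal C sketch of the formulas above; the function names are hypothetical, chosen only for illustration, and α and k are passed in exactly as defined on the previous slides.

```c
/* Hypothetical helpers illustrating Amdahl's law.
 * alpha: fraction of the original time that gets sped up (0..1)
 * k:     speedup factor applied to that fraction */

/* t_new = (1 - alpha) * t_old + alpha * t_old / k */
double amdahl_new_time(double t_old, double alpha, double k)
{
    return (1.0 - alpha) * t_old + alpha * t_old / k;
}

/* S = t_old / t_new = 1 / ((1 - alpha) + alpha / k) */
double amdahl_speedup(double alpha, double k)
{
    return 1.0 / ((1.0 - alpha) + alpha / k);
}

/* Limit as k -> infinity: S_inf = 1 / (1 - alpha) */
double amdahl_limit(double alpha)
{
    return 1.0 / (1.0 - alpha);
}
```

For the trip, amdahl_new_time(245, 80.0 / 245, 2) gives 205 minutes, amdahl_speedup(80.0 / 245, 2) is about 1.2, and amdahl_limit(80.0 / 245) is about 1.48: even an infinitely fast plane can only remove the 80 flying minutes.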
Amdahl’s Law
Most important in computer architecture/operating system design:
Speed!
This is not necessarily true of regular programming: here speed is more important than correctness (almost).
Data Representation
Foundation Idea #2:
Computers represent everything
with numbers
Data Representation
Everything in a computer is
represented as a number.
Letters -> Numbers
Pictures -> Numbers
Programs -> Numbers
Data = Numbers
Numbers in different bases
(This should be old hat for you)
Non-negative Integers:
Decimal (Human) Numbers:
0,1,2,…..256, …. 1024… 2048….
Binary
Data in computers only exist in 2
states, on and off. (1 or 0)
This means it’s hard for them to
count in decimal…
Decimal / Binary
Decimal  Binary
0        0
1        1
2        10
3        11
4        100
5        101
Decimal
Decimal (Base 10)
12345 = abcde
Number = $a \cdot 10^4 + b \cdot 10^3 + c \cdot 10^2 + d \cdot 10^1 + e \cdot 10^0$
= 1*10000 + 2*1000 + 3*100 + 4*10 + 5*1
= 10000 + 2000 + 300 + 40 + 5
= 12345
Binary
Binary (Base 2)
10101 = abcde
Number = $a \cdot 2^4 + b \cdot 2^3 + c \cdot 2^2 + d \cdot 2^1 + e \cdot 2^0$
= 1*16 + 0*8 + 1*4 + 0*2 + 1*1
= 16 + 0 + 4 + 0 + 1
= 21
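The decimal and binary expansions above are the same rule with a different base. A minimal C sketch, assuming a digit string with no sign or prefix; `positional_value` is a hypothetical name. The loop uses Horner’s rule (multiply by the base, add the next digit), which is algebraically the same as summing digit × base^position.

```c
#include <ctype.h>

/* Hypothetical helper: value of a digit string in any base from 2 to 16.
 * Horner's rule: ((a*base + b)*base + c)... equals a*base^4 + b*base^3 + ...
 * Returns -1 for a digit that is invalid in the given base.
 * No sign, no "0x"/"0" prefix, and no overflow checking. */
long positional_value(const char *digits, int base)
{
    long value = 0;
    for (const char *p = digits; *p != '\0'; p++) {
        int c = tolower((unsigned char)*p);
        int d;
        if (c >= '0' && c <= '9')
            d = c - '0';
        else if (c >= 'a' && c <= 'f')
            d = c - 'a' + 10;      /* assumes consecutive letter codes (ASCII) */
        else
            return -1;
        if (d >= base)
            return -1;
        value = value * base + d;  /* shift left one digit, add the new one */
    }
    return value;
}
```

positional_value("12345", 10) is 12345 and positional_value("10101", 2) is 21, matching the two worked expansions.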
Octal/Hex
Okay, computers like binary…
But binary is too hard to read for
humans.
… But we want to express powers of
two conveniently
Octal
00, 01, 02,…, 07, 010, … 017, 020…..
Octal
Octal (Base 8)
012345 = 0abcde
Number = $a \cdot 8^4 + b \cdot 8^3 + c \cdot 8^2 + d \cdot 8^1 + e \cdot 8^0$
= 1*4096 + 2*512 + 3*64 + 4*8 + 5*1
= 4096 + 1024 + 192 + 32 + 5
= 5349
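C already speaks octal: an integer literal with a leading 0 is octal (so 012345 == 5349), `strtol` with base 8 parses octal text, and printf’s `%o` prints it. A small sketch; the wrapper names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Parse octal text such as "12345" (strtol with base 8). */
long parse_octal(const char *text)
{
    return strtol(text, NULL, 8);
}

/* Format n in octal with C's leading-zero convention, e.g. 5349 -> "012345".
 * The '#' flag asks printf for the alternate form, which prefixes a 0. */
void format_octal(unsigned n, char *buf, size_t size)
{
    snprintf(buf, size, "%#o", n);
}
```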
Decimal / Binary /Octal
Decimal  Binary  Octal
0        0       00
1        1       01
2        10      02
3        11      03
4        100     04
5        101     05
8        1000    010
12       1100    014
47       101111  057
Hexadecimal
But octal is still cumbersome, because computers usually group binary digits in sets of 4 (octal groups bits in sets of 3).
Hex format (the preferred choice)
0x0, 0x1, 0x2, …, 0xf, 0x10, 0x11, …, 0x1a, 0x20
Hexadecimal
Hex (Base 16)
0x12345 = 0xabcde
Number = $a \cdot 16^4 + b \cdot 16^3 + c \cdot 16^2 + d \cdot 16^1 + e \cdot 16^0$
= 1*65536 + 2*4096 + 3*256 + 4*16 + 5*1
= 65536 + 8192 + 768 + 64 + 5
= 74565
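Because each hex digit is exactly one group of 4 bits (a nibble), converting binary to hex only needs shifts and a mask. A sketch assuming a 32-bit input; `to_hex` is a hypothetical name.

```c
#include <stdint.h>

/* Each hex digit is exactly one 4-bit group (nibble), so converting
 * binary to hex just means reading the bits four at a time.
 * out must have room for at least 9 chars (8 digits + '\0'). */
void to_hex(uint32_t n, char *out)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[8];
    int len = 0;
    do {
        tmp[len++] = digits[n & 0xF];  /* low nibble -> one hex digit */
        n >>= 4;                       /* shift the next nibble down */
    } while (n != 0);
    for (int i = 0; i < len; i++)      /* digits were produced low-first */
        out[i] = tmp[len - 1 - i];
    out[len] = '\0';
}
```

to_hex(74565, buf) produces "12345", the inverse of the expansion above.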
Hexadecimal
Hex digits: we need more than the 10 digits 0-9, so we also use a, b, c, d, e, f (for 10 through 15).
Decimal: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17
Hexadecimal: 0x0, 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8, 0x9, 0xA, 0xB, 0xC, 0xD, 0xE, 0xF, 0x10, 0x11
Decimal / Binary /Octal / Hex
Decimal  Binary  Octal  Hex
0        0       00     0x0
1        1       01     0x1
2        10      02     0x2
3        11      03     0x3
4        100     04     0x4
5        101     05     0x5
8        1000    010    0x8
12       1100    014    0xC
47       101111  057    0x2F
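*To convert from decimal, divide repeatedly by the target base and read the remainders in reverse; to convert among binary, octal, and hex, regroup the bits in sets of 3 (octal) or 4 (hex).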
ASCII
Oct  Dec  Hex  Char
-------------------
101  65   41   A
102  66   42   B
103  67   43   C
104  68   44   D
105  69   45   E
106  70   46   F
107  71   47   G
110  72   48   H
111  73   49   I
112  74   4A   J
113  75   4B   K
114  76   4C   L
115  77   4D   M
116  78   4E   N
117  79   4F   O
120  80   50   P
121  81   51   Q
122  82   52   R
123  83   53   S
124  84   54   T
125  85   55   U
126  86   56   V
127  87   57   W
130  88   58   X
131  89   59   Y
132  90   5A   Z
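Since characters are just numbers, C lets you do arithmetic on them. This sketch assumes an ASCII execution character set (true on essentially all current systems, though the C standard does not require it); the helper names are hypothetical.

```c
/* Assumes ASCII: the capitals A-Z are the consecutive codes 65-90
 * (octal 0101-0132, hex 0x41-0x5A), as in the table above. */

/* Hypothetical helper: position of a capital letter, 'A' -> 0 ... 'Z' -> 25. */
int letter_index(char c)
{
    return (c >= 'A' && c <= 'Z') ? c - 'A' : -1;
}

/* Inverse: the i-th capital letter (0-based). */
char letter_at(int i)
{
    return (char)('A' + i);
}
```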
Text in ASCII
Rolex Newbie FAQ
Is it okay to peel off the hologram sticker from the back of my new rolex?
Yes. It will not devalue your watch, nor void your warranty. Hologram stickers
are not a good way of differentiating real and fake Rolexes. Even fake ones
often come with a hologram sticker.
[Figure: hex dump of the FAQ text above. Each row shows an offset (00000000 through 000000C0), the next 16 bytes of the file in hex, and the same bytes as ASCII characters, with non-printable bytes such as carriage return (0D) and line feed (0A) shown as '.'. The first row reads:]
00000000  52 6F 6C 65 78 20 4E 65 77 62 69 65 20 46 41 51  Rolex Newbie FAQ
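A dump like this is easy to produce. The sketch below formats one row in the layout described above (offset, 16 hex bytes, ASCII with '.' for non-printables); `hexdump_line` is a hypothetical helper, and the exact spacing is an assumption rather than that of any particular dump tool.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format up to 16 bytes as one hex-dump row:
 * offset in hex, each byte as two hex digits, then the bytes as text
 * with '.' standing in for non-printable bytes such as CR (0D) and LF (0A).
 * out must have room for at least 80 characters. */
void hexdump_line(const unsigned char *data, size_t len, size_t offset, char *out)
{
    if (len > 16)
        len = 16;
    int pos = sprintf(out, "%08zX ", offset);
    for (size_t i = 0; i < len; i++)
        pos += sprintf(out + pos, " %02X", (unsigned)data[i]);
    pos += sprintf(out + pos, "  ");
    for (size_t i = 0; i < len; i++)
        out[pos++] = isprint(data[i]) ? (char)data[i] : '.';
    out[pos] = '\0';
}
```

Called on the first 16 bytes of the FAQ at offset 0, it reproduces the first row shown above.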
Pictures in Binary
Each Pixel is a 3-tuple, (Red, Green, Blue)
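As a sketch of "pixels are numbers", assuming 8 bits per channel (the common 24-bit colour layout; the slide does not fix a channel width), a pixel can be stored as a struct or packed into a single integer. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: 8 bits per channel, the common "24-bit colour" layout. */
struct pixel {
    uint8_t red, green, blue;   /* each intensity is a number 0..255 */
};

/* Pack the 3-tuple into one number laid out as 0xRRGGBB
 * (the same order as HTML colour codes like #FFA500). */
uint32_t pack_rgb(struct pixel p)
{
    return ((uint32_t)p.red << 16) | ((uint32_t)p.green << 8) | p.blue;
}

/* Split a 0xRRGGBB number back into its three channels. */
struct pixel unpack_rgb(uint32_t v)
{
    struct pixel p = { (uint8_t)(v >> 16), (uint8_t)(v >> 8), (uint8_t)v };
    return p;
}
```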
Pictures in Binary
$ dump lena.jpg
[Figure: hex dump of the image file lena.jpg, offsets 00000000 through 000000f0. JPEG files are compressed, so these bytes are not the raw pixel tuples; the first row holds the JPEG start-of-image marker (ffd8) and the JFIF header:]
00000000 ffd8 ffe0 0010 4a46 4946 0001 0101 0048  ......JFIF.....H
Unsigned Numbers
All the numbers we’ve discussed are unsigned.
(i.e., non-negative integers)
Assume 8 bits of information:
Eg.
0000 0000 = 0
0000 0001 = 1
1000 0000 = 128
1111 1111 = 255
Range is [0,255]
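The range generalizes: w bits give [0, 2^w - 1]. A short C sketch (hypothetical names) that also shows what C does one step past the top of the range: unsigned arithmetic wraps around modulo 2^w.

```c
#include <stdint.h>

/* Largest unsigned value in w bits: 2^w - 1 (valid for 1 <= w <= 63). */
uint64_t unsigned_max(unsigned w)
{
    return (UINT64_C(1) << w) - 1;
}

/* C's unsigned arithmetic wraps modulo 2^w: one past the top is 0. */
uint8_t increment_u8(uint8_t x)
{
    return (uint8_t)(x + 1);
}
```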
Signed Numbers
What if we want to represent negative numbers?
Naïve Solution: Sign/Magnitude Notation
Use first bit to represent +/- (sign bit)
Eg.
0000 0000 = 0
0000 0001 = 1
1000 0001 = -1
0111 1111 = 127
1111 1111 = -127
Range is [-127,127]. But this is wasteful! There are two
ways of representing 0! (+0, -0)
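Decoding sign/magnitude takes one mask for the magnitude and one test of the sign bit. A sketch for 8 bits; the function name is hypothetical.

```c
#include <stdint.h>

/* Sign/magnitude, 8 bits: bit 7 is the sign, bits 6..0 the magnitude. */
int decode_sign_magnitude8(uint8_t bits)
{
    int magnitude = bits & 0x7F;                    /* low 7 bits */
    return (bits & 0x80) ? -magnitude : magnitude;  /* apply the sign */
}
```

Note that 1000 0000 (the "-0" pattern) decodes to 0, the same value as 0000 0000: this is the wasted encoding.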
Signed Numbers
Another approach: Bias Notation
Take the unsigned number, subtract b (eg. b = 127)
Eg.
0000 0000 = 0 – 127 = -127
0000 0001 = 1 – 127 = -126
0111 1111 = 127 – 127 = 0
1000 0000 = 128 – 127 = 1
1111 1111 = 255 – 127 = 128
Range is [-127,128]. This works, and has its purposes,
but usually we prefer….
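Bias notation is just unsigned decoding followed by a subtraction. A sketch with the bias b as a parameter; the names are hypothetical.

```c
#include <stdint.h>

/* Bias notation: read the bits as an unsigned number, then subtract b. */
int decode_bias8(uint8_t bits, int b)
{
    return (int)bits - b;
}

/* Inverse: add the bias back; valid only for -b <= value <= 255 - b. */
uint8_t encode_bias8(int value, int b)
{
    return (uint8_t)(value + b);
}
```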
Signed Numbers
Usual approach: Two’s Complement
MSB is considered to have negative weight.
Eg.
0000 0000 = 0
0000 0001 = 1
1111 1111 = -1
1000 0000 = – 128
0111 1111 = 127
Range is [-128,127].
It seems goofy, but there are a lot of good reasons for it
Two’s Complement
Advantages:
Easy to negate: take the bitwise complement, add one
Efficient: the same adder works for signed and unsigned numbers, and negation needs only an adder plus which logical operator?
Overflow is handled “gracefully”
Easy to tell if a number is negative: check whether the MSB is set
More details in your required reading :)
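Both the "MSB has negative weight" definition and the "complement, add one" negation rule fit in a few lines of C. A sketch for 8 bits; the names are hypothetical.

```c
#include <stdint.h>

/* Two's complement, 8 bits: the MSB has weight -128,
 * the other 7 bits have their usual positive weights. */
int decode_twos8(uint8_t bits)
{
    return -128 * ((bits >> 7) & 1) + (bits & 0x7F);
}

/* Negation: bitwise complement, then add one (result kept to 8 bits). */
uint8_t negate_twos8(uint8_t bits)
{
    return (uint8_t)(~bits + 1);
}
```

Negating 0000 0101 gives 1111 1011, which decodes to -5. Negating 1000 0000 (-128) gives 1000 0000 again, because +128 is outside the range [-128,127].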
Ones’ Complement
Ones’ complement: mostly theoretical today (almost no modern hardware uses it)
The MSB is considered to have weight $-(2^{w-1}-1)$ instead of $-2^{w-1}$ (e.g., MSB = -127 instead of -128).
Eg.
0000 0000 = 0
0000 0001 = 1
1111 1110 = -1
1000 0000 = – 127
0111 1111 = 127
1111 1111 = 0
Range is [-127,127].
Note again there are two ways of representing 0
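For comparison with two’s complement, the same kind of sketch for ones’ complement: only the MSB weight changes, and negation is the complement alone, with no "+1". Names are hypothetical.

```c
#include <stdint.h>

/* Ones' complement, 8 bits: the MSB has weight -(2^7 - 1) = -127. */
int decode_ones8(uint8_t bits)
{
    return -127 * ((bits >> 7) & 1) + (bits & 0x7F);
}

/* Negation is just the bitwise complement. */
uint8_t negate_ones8(uint8_t bits)
{
    return (uint8_t)~bits;
}
```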
What about fractions?
Okay great, we know how to represent all kinds of
integers:
Non-negative Integers: Unsigned format
Integers:
Sign-Magnitude
Bias Notation
Two’s Complement
Ones’ Complement
But how do we represent fractional numbers? Eg. ½
What about fractions?
Idea: How do we represent it in decimals?
½ = 0.5
We can introduce a decimal point to binary:
Decimal -> Binary
0.5 -> .1
1.5 -> 1.1
2.5 -> 10.1
0.25 -> 0.01
0.75 -> 0.11
Binary
This follows from our original definition
1010.1010 = abcd.efgh
Number = $a \cdot 2^3 + b \cdot 2^2 + c \cdot 2^1 + d \cdot 2^0 + e \cdot 2^{-1} + f \cdot 2^{-2} + g \cdot 2^{-3} + h \cdot 2^{-4}$
= 1*8 + 0*4 + 1*2 + 0*1
+ 1*1/2 + 0*1/4 + 1*1/8 + 0*1/16
= 8 + 2 + .5 + .125
= 10.625
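The same positional rule extended past the point can be evaluated directly: digits to the left double as before, and each digit to the right is worth half the previous one. A sketch assuming the input contains only '0', '1', and at most one '.'; the name is hypothetical.

```c
/* Value of a binary string that may contain a binary point, e.g. "1010.1010".
 * Left of the point: weights ..., 4, 2, 1. Right of it: 1/2, 1/4, 1/8, ...
 * Assumes only '0', '1', and at most one '.'. */
double binary_fraction_value(const char *s)
{
    double value = 0.0;
    double weight = 0.5;        /* weight of the first digit after the point */
    int after_point = 0;
    for (; *s != '\0'; s++) {
        if (*s == '.') {
            after_point = 1;
            continue;
        }
        int bit = *s - '0';
        if (!after_point) {
            value = value * 2 + bit;    /* integer part: shift and add */
        } else {
            value += bit * weight;      /* fractional part */
            weight /= 2;
        }
    }
    return value;
}
```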
Fixed Point
So if we have 8 bits of information, and we say that the binary point falls between the two groups of 4 bits, we have a convention for representing fractions:
0000 0000 = 0
0001 0000 = 1
0000 1000 = 0.5
0001 1000 = 1.5
1010 1010 = 10.625
So-called fixed-point representation
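In code, this "4.4" fixed-point format is just an 8-bit integer that stores the value times 2^4 = 16. A sketch; the names are hypothetical, and the conversion truncates rather than rounds.

```c
#include <stdint.h>

/* 4.4 fixed point: the stored integer is the value times 2^4 = 16. */
double fixed44_to_double(uint8_t bits)
{
    return bits / 16.0;
}

/* Convert to 4.4, truncating; valid for 0 <= x < 16. */
uint8_t double_to_fixed44(double x)
{
    return (uint8_t)(x * 16.0);
}

/* Adding two fixed-point numbers is plain integer addition
 * (it wraps if the sum reaches 16). */
uint8_t fixed44_add(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}
```

This is the appeal of fixed point: arithmetic reuses the integer hardware unchanged, for example 0001 1000 + 0000 1000 = 0010 0000 (1.5 + 0.5 = 2).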
Fixed Point
But with w bits, our range is still very small:
$[0, 2^{w/2})$
We want to be able to express a very large range (and negative numbers) very compactly.
Let’s think about scientific notation:
$1.2\text{e}10 = 1.2 \times 10^{10}$
Binary equivalent!
Floating Point
Binary equivalent of scientific notation is called
“floating point”
$value \times 2^{exponent}$
So since our decimal point is “floating”, we
have a much larger expressible range
IEEE Floating Point
Standardized representation of floating point
$(-1)^{sign} \times mantissa \times 2^{exponent}$
The mantissa is unsigned.
The exponent is expressed in bias notation.
*Bryant & O’Hallaron call it the “significand” instead of the mantissa
IEEE Floating Point
An in-depth example:
$(-1)^{sign} \times mantissa \times 2^{exponent}$
Suppose we have 9 bits to play with:
sign (1 bit) mantissa (4 bits) exponent (4 bits)
sign s: 0 or 1
mantissa, M: Fixed point number in the range [1,2)
exponent, E: Bias notation in the range [-6,7]*
*Why not [-7,8]? Those values used for something special
IEEE Floating Point
sign (1 bit) | mantissa (4 bits) | exponent (4 bits)
s | abcd | efgh
Mantissa: fixed-point notation with an implied binary point:
a.bcd
eg.
1.0 -> 1.000
1.125 -> 1.001
1.25 -> 1.010
1.5 -> 1.100
1.75 -> 1.110
IEEE Floating Point
The mantissa encodes a value in the range [1,2)
Realization: The most significant digit is always
1! Don’t need to encode it!
sign (1 bit) | mantissa (4 bits) | exponent (4 bits)
s | 1.abcd | efgh
So the mantissa has a precision of $2^{-4}$ = 1/16
IEEE Floating Point
sign (1 bit) | mantissa (4 bits) | exponent (4 bits)
s | 1.abcd | efgh
The exponent E has k bits (here k = 4), in bias notation
Bias is $2^{k-1} - 1 = 7$
So the range is [-7,8]
IEEE Floating Point
[Figure: encoding table for the 9-bit format, not reproduced]
IEEE Floating Point
sign (1 bit) | mantissa (4 bits) | exponent (4 bits)
s | 1.abcd | efgh
Special values for the exponent, E:
If the exponent field is all 0’s, the number is considered denormalized: the mantissa does not have an implied leading 1.
If the exponent field is all 1’s, there’s a special interpretation to encode values such as infinity and NaN.
So the range becomes [-6,7]
IEEE Floating Point
[Figure: encoding table for the 9-bit format, not reproduced]
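To tie the pieces together, here is a C sketch that decodes the 9-bit toy format from its three fields. It assumes IEEE-style conventions for the special cases (all-0 exponent: denormalized, with exponent 1 - 7 = -6; all-1 exponent: infinity when the mantissa bits are 0, NaN otherwise). `toy_float_value` and `pow2` are hypothetical names, and the fields are passed separately so the bit order within the 9-bit word does not matter.

```c
#include <math.h>   /* INFINITY, NAN */

/* 2^e for small integer e, by repeated doubling/halving (exact here). */
static double pow2(int e)
{
    double r = 1.0;
    for (; e > 0; e--) r *= 2.0;
    for (; e < 0; e++) r /= 2.0;
    return r;
}

/* Value of the 9-bit toy format, given its fields:
 *   s         - sign bit (0 or 1)
 *   exp_bits  - 4-bit exponent field efgh, bias 7
 *   frac_bits - 4-bit stored mantissa bits abcd */
double toy_float_value(unsigned s, unsigned exp_bits, unsigned frac_bits)
{
    const int bias = 7;
    double sign = s ? -1.0 : 1.0;

    if (exp_bits == 15)                         /* all 1's: infinity or NaN */
        return frac_bits == 0 ? sign * INFINITY : NAN;
    if (exp_bits == 0)                          /* all 0's: denormalized, 0.abcd */
        return sign * (frac_bits / 16.0) * pow2(1 - bias);
    /* normalized: implied leading 1, so the mantissa is 1.abcd */
    return sign * (1.0 + frac_bits / 16.0) * pow2((int)exp_bits - bias);
}
```

Under these rules the largest finite value is toy_float_value(0, 14, 15) = 1.9375 × 2^7 = 248, and the smallest positive value is the denormal toy_float_value(0, 0, 1) = 1/1024.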
IEEE Floating Point
Closing notes:
Some numbers, such as 0.2 cannot be represented
exactly using any of the formats we’ve described
IEEE 32-bit Single-precision float: (c float usually)
1 sign bit, 23-bit mantissa, 8-bit exponent
Approximately 7 decimal digits of precision
IEEE 64-bit Double-precision float: (c double usually)
1 sign bit, 52-bit mantissa, 11-bit exponent
Rounding imprecision is a BIG problem with floating
point numbers.
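#include <stdbool.h>
bool equal( float x, float y ) { // Never do this:
    return x == y;               // exact equality breaks once rounding creeps in
}
printf rounds floats to make them more human-readable, so two values that print the same may still compare unequal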
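One common alternative to `==` is to compare with a relative tolerance. This is a heuristic, not a universal rule: the right tolerance depends on how much rounding error the computation can accumulate. The sketch uses `double` so that the classic case applies (0.1, 0.2, and 0.3 are all rounded in binary, so 0.1 + 0.2 != 0.3); `nearly_equal` is a hypothetical name.

```c
#include <float.h>     /* DBL_EPSILON, a typical building block for rel_tol */
#include <stdbool.h>

/* Hypothetical helper: x and y count as equal when they differ by at most
 * rel_tol times the larger magnitude. */
bool nearly_equal(double x, double y, double rel_tol)
{
    double diff = x > y ? x - y : y - x;
    double ax = x < 0 ? -x : x;
    double ay = y < 0 ? -y : y;
    double scale = ax > ay ? ax : ay;
    return diff <= rel_tol * scale;
}
```

For example, nearly_equal(0.1 + 0.2, 0.3, 4 * DBL_EPSILON) is true even though 0.1 + 0.2 == 0.3 is false.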
Units
Some terminology:
- Byte: smallest addressable unit on an architecture. Usually an octet (8 bits)
- Nibble: half a byte (4 bits)
- Word: natural unit of data on the architecture
    8086: 16 bits
    IA32, PPC: 32 bits
    (Often the size of an address)
- Dword (double word), quad-word
- Cache lines are often 64 bytes (x86)
- Memory pages (4096 bytes on x86)
- Disk sectors (512 bytes is common)
Units (for engineers)
b = bits
B = bytes
KB = Kilobyte = $2^{10}$ = 1024
MB = Megabyte = $2^{20}$ = 1024*1024 = 1048576
GB = Gigabyte = $2^{30}$ = 1073741824
TB = Terabyte = $2^{40}$ = 1099511627776
*Note: MB = Megabyte, Mb = Megabit
**k and K are used interchangeably
Units (for marketing)
b = bits
B = bytes
KB = Kilobyte = $10^3$ = 1000
MB = Megabyte = $10^6$ = 1,000,000
GB = Gigabyte = $10^9$ = 1,000,000,000
TB = Terabyte = $10^{12}$ = 1,000,000,000,000
Reason: Makes numbers seem bigger and
cooler
*Note: MB, mb, Mb all used interchangeably
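The gap between the two conventions grows with each prefix. A small C sketch; the macro and function names are hypothetical, chosen to mirror the "engineer" and "marketing" slides.

```c
#include <stdint.h>

/* Engineer units: powers of two. */
#define KB_BIN (UINT64_C(1) << 10)   /* 1024 */
#define MB_BIN (UINT64_C(1) << 20)   /* 1,048,576 */
#define GB_BIN (UINT64_C(1) << 30)   /* 1,073,741,824 */
#define TB_BIN (UINT64_C(1) << 40)   /* 1,099,511,627,776 */

/* Marketing units: powers of ten. */
#define GB_DEC UINT64_C(1000000000)

/* Hypothetical helper: how many engineer GB (2^30 bytes) fit in a drive
 * sold as marketing_gb marketing GB (10^9 bytes each). */
double marketing_to_engineer_gb(double marketing_gb)
{
    return marketing_gb * (double)GB_DEC / (double)GB_BIN;
}
```

A drive sold as 500 GB holds about 465.7 engineer GB.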
Computer System (Idealized)
[Figure: CPU, memory, and a disk controller with its disk, all connected by a system bus]
Making Programs
$ cat hello.c
#include <stdio.h>
int main() {
printf( "Hello, world\n" );
return 0;
}
$ ./hello
Hello, world
Making Programs
But the computer doesn’t understand C code! C
is for humans.
Machine code looks like this:
[Figure: hex dump of the compiled hello executable (a Windows PE file, offsets 00000000 through 000000b0). The first row reads:]
00000000 4d5a 9000 0300 0000 0400 0000 ffff 0000  MZ..............
[The dump also contains the text "This program cannot be run in DOS mode."]
Enter the compiler
The compiler translates C to machine code…
hello.c (text file) -> compiler magic -> hello (binary object)
$ gcc hello.c -o hello
$ ./hello
Hello, world
Compilation System
Demystifying (slightly)
hello.c (C code)
-> preprocessor -> hello.i (preprocessed, simplified C)
-> compiler -> hello.s (assembly)
-> assembler -> hello.o (object code)
-> linker -> hello (binary executable)
Compilation is divided into stages to simplify it.
Let’s follow the hello world example through each stage.
Step 0: Source
Start with source code:
#include <stdio.h>
int main() {
printf( "Hello, world\n" );
return 0;
}
Step 1: Preprocess stage
Translates C to “simplified” C: expands macros, resolves file references (#include), and evaluates preprocessor conditionals
#include <file.h>
#if, #ifdef, #else, #endif
#define
$ gcc -E hello.c >hello.i
Step 2: Compilation Stage
Translates preprocessed C into a simple language called assembly. Still human-readable, but barely. Very close to machine language
pushl   %ebp
movl    %esp, %ebp
subl    $8, %esp
andl    $-16, %esp
movl    $0, %eax
addl    $15, %eax
addl    $15, %eax
shrl    $4, %eax
sall    $4, %eax
movl    %eax, -4(%ebp)
movl    -4(%ebp), %eax
call    __alloca
call    ___main
movl    $LC0, (%esp)
call    _printf
movl    $0, %eax
leave
ret
$ gcc -S hello.i
Step 3: Assembling stage
Translates assembly to machine code.
This stage is very simple – 1:1 mapping
between assembly and machine code
[Figure: hex dump of hello.o, offsets 00000070 through 000000f0: mostly opaque machine-code bytes, with the string "Hello, world" visible near the end]
$ gcc -c hello.s -o hello.o