OMSE 510: Computing Foundations
Intro Lecture
Chris Gilmore <grimjack@cs.pdx.edu>
Portland State University/OMSE

Website/mailing list
Website: http://web.cecs.pdx.edu/~grimjack/OMSE510CF/ComputerFoundations.html
Mailing List: omse510@cecs.pdx.edu
Personal Email: grimjack@cs.pdx.edu

About OMSE510
Course Rationale: This course has been designed for graduate-level software engineering students who lack key foundational computer science knowledge in the areas of computer architecture and operating systems. It may also be taken by students needing or wanting to upgrade their knowledge in these areas. With the approval of an OMSE advisor, OMSE students may register in this course and count it for credit as an OMSE elective.

Course Structure
Divided into two halves:
- Computer Architecture: how the hardware works. 4 Sessions + Midterm
- Operating Systems: how the software interacts with the hardware. 5 Sessions + Final
Four Assignments (40%), Midterm (30%), Final (30%)

Things not covered
- Transistors, logic gates & lower-level functionality
- In-depth Floating Point/Integer Arithmetic
- Networking
- Hardware Description Languages (Verilog, ISP’)
- Security
- The History of anything
- Theoretical Architectures

First Note
1) I like feedback.
2) It’s good to ask questions in class. Email is less good.
3) If you don’t understand, ask NOW. Probably other people don’t understand either, and we always build on existing material.
4) One or two breaks in a 3 hour class.

The Basics
Today’s lecture covers the very basics – should probably be review!
If you’re bored, that’s good! The interesting stuff comes later.

Today’s Lecture
- Amdahl’s Law
- Data Representation
  - Conventions (binary/hex/oct)
  - Unsigned/signed integers
  - Floating point
- Brief on Compilers

Amdahl’s Law
Fundamental design principle in computer architecture design: make things FAST.
Amdahl’s law is a guideline for making things faster.

Speedup
Suppose some task takes time t_old minutes to perform. E.g.
Flying from PDX to YVR: 80 mins
- Boeing 727, ~900 km/h

But time is important to us! Let’s take the Concorde instead!
Flying from PDX to YVR:
- Boeing 727, ~900 km/h, 80 mins
- Concorde, ~2200 km/h, 40 mins
40 minutes saved!

  t_old = 80 mins (Boeing 727)
  t_new = 40 mins (Concorde)
  Speedup = t_old / t_new = 80 min / 40 min = 2
2x speed improvement! That’s great! ... But is it really?

Time actually spent traveling from PDX to YVR:
  30 mins  MAX to airport
  20 mins  getting your ticket
  45 mins  getting through security
  30 mins  boarding/taxiing
  80 mins  flying
  40 mins  landing + customs
= 245 minutes

Total travel time:
  245 minutes (Boeing 727)
  205 minutes (Concorde)
Where’d that 2x speedup go? Flying is only 33% of the total time (80 of 245 minutes).

Amdahl’s Law
The variables:
  t_old = 245 mins (original travel time)
  α = 33% (fraction of time actually spent flying)
  k = 2 (speedup factor)
  t_new = (1 - α) * t_old + α * t_old / k
        = 67% * 245 mins + 33% * 245 mins / 2
        = 205 mins (rounded)

Speedup, S
  S = t_old / t_new = 1 / [ (1 - α) + α / k ] = 1.2
Much less than 2x!
Moral of the story: To improve the system, you have to work harder than you want.

Special case: set k = ∞
  S_∞ = 1 / (1 - α)
This is the most speedup you can get out of tuning one component. With α = 33%, that is at most about 1.5x. I.e., are you wasting your time?

Most important to Computer Architecture/Operating System design: Speed!
Not necessarily like regular programming. More important than correctness (almost).

Data Representation
Foundation Idea #2: Computers represent everything with numbers.
Everything in a computer is represented as a number.
Letters -> Numbers
Pictures -> Numbers
Programs -> Numbers
Data = Numbers

Numbers in different bases
(This should be old hat for you)
Non-negative integers:
Decimal (human) numbers: 0, 1, 2, ..., 256, ..., 1024, ..., 2048, ...

Binary
Data in computers exists in only 2 states, on and off (1 or 0).
This means it’s hard for them to count in decimal...

Decimal / Binary
  Decimal  Binary
  0        0
  1        1
  2        10
  3        11
  4        100
  5        101

Decimal
  12345 = abcde
  Number = a*10^4 + b*10^3 + c*10^2 + d*10^1 + e*10^0
         = 1*10000 + 2*1000 + 3*100 + 4*10 + 5*1
         = 10000 + 2000 + 300 + 40 + 5
         = 12345

Binary (Base 2)
  10101 = abcde
  Number = a*2^4 + b*2^3 + c*2^2 + d*2^1 + e*2^0
         = 1*16 + 0*8 + 1*4 + 0*2 + 1*1
         = 16 + 0 + 4 + 0 + 1
         = 21

Octal/Hex
Okay, computers like binary... but binary is too hard for humans to read.
... But we want to express powers of two conveniently.
Octal: 00, 01, 02, ..., 07, 010, ..., 017, 020, ...

Octal (Base 8)
  012345 = 0abcde
  Number = a*8^4 + b*8^3 + c*8^2 + d*8^1 + e*8^0
         = 1*4096 + 2*512 + 3*64 + 4*8 + 5*1
         = 4096 + 1024 + 192 + 32 + 5
         = 5349

Decimal / Binary / Octal
  Decimal  Binary  Octal
  0        0       00
  1        1       01
  2        10      02
  3        11      03
  4        100     04
  5        101     05
  8        1000    010
  12       1100    014
  47       101111  057

Hexadecimal
But octal is still cumbersome, because computers often prefer grouping in sets of 4 binary digits. (Octal groups bits in sets of 3.)
Hex Format (the preferred choice):
0x0, 0x1, 0x2, ..., 0xf, 0x10, 0x11, ...
0x1a, 0x20

Hexadecimal
Hex (Base 16)
  0x12345 = 0xabcde
  Number = a*16^4 + b*16^3 + c*16^2 + d*16^1 + e*16^0
         = 1*65536 + 2*4096 + 3*256 + 4*16 + 5*1
         = 65536 + 8192 + 768 + 64 + 5
         = 74565

Hex digits: we need more than 10 digits (0-9), so we also use a b c d e f.
  Decimal:     0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17
  Hexadecimal: 0x0, 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8, 0x9, 0xA, 0xB, 0xC, 0xD, 0xE, 0xF, 0x10, 0x11

Decimal / Binary / Octal / Hex
  Decimal  Binary  Octal  Hex
  0        0       00     0x0
  1        1       01     0x1
  2        10      02     0x2
  3        11      03     0x3
  4        100     04     0x4
  5        101     05     0x5
  8        1000    010    0x8
  12       1100    014    0xC
  47       101111  057    0x2F
*Chinese Remainder Theorem to convert

ASCII
  Oct  Dec  Hex  Char      Oct  Dec  Hex  Char
  -------------------      -------------------
  101  65   41   A         116  78   4E   N
  102  66   42   B         117  79   4F   O
  103  67   43   C         120  80   50   P
  104  68   44   D         121  81   51   Q
  105  69   45   E         122  82   52   R
  106  70   46   F         123  83   53   S
  107  71   47   G         124  84   54   T
  110  72   48   H         125  85   55   U
  111  73   49   I         126  86   56   V
  112  74   4A   J         127  87   57   W
  113  75   4B   K         130  88   58   X
  114  76   4C   L         131  89   59   Y
  115  77   4D   M         132  90   5A   Z

Text in ASCII
  Rolex Newbie FAQ

  Is it okay to peel off the hologram sticker from the back of my new rolex?
  Yes. It will not devalue your watch, nor void your warranty. Hologram stickers are not a good way of differentiating real and fake Rolexes. Even fake ones often come with a hologram sticker.
  00000000  52 6F 6C 65 78 20 4E 65 77 62 69 65 20 46 41 51  Rolex Newbie FAQ
  00000010  0D 0A 0D 0A 49 73 20 69 74 20 6F 6B 61 79 20 74  ....Is it okay t
  00000020  6F 20 70 65 65 6C 20 6F 66 66 20 74 68 65 20 68  o peel off the h
  00000030  6F 6C 6F 67 72 61 6D 20 73 74 69 63 6B 65 72 20  ologram sticker
  00000040  66 72 6F 6D 20 74 68 65 20 62 61 63 6B 20 6F 66  from the back of
  00000050  20 6D 79 20 6E 65 77 20 72 6F 6C 65 78 3F 0D 0A   my new rolex?..
  00000060  20 59 65 73 2E 20 49 74 20 77 69 6C 6C 20 6E 6F   Yes. It will no
  00000070  74 20 64 65 76 61 6C 75 65 20 79 6F 75 72 20 77  t devalue your w
  00000080  61 74 63 68 2C 20 6E 6F 72 20 76 6F 69 64 20 79  atch, nor void y
  00000090  6F 75 72 20 77 61 72 72 61 6E 74 79 2E 20 48 6F  our warranty. Ho
  000000A0  6C 6F 67 72 61 6D 20 73 74 69 63 6B 65 72 73 0D  logram stickers.
  000000B0  0A 20 61 72 65 20 6E 6F 74 20 61 20 67 6F 6F 64  . are not a good
  000000C0  20 77 61 79 20 6F 66 20 64 69 66 66 65 72 65 6E   way of differen

Pictures in Binary
Each pixel is a 3-tuple: (Red, Green, Blue)

$ dump lena.jpg
  00000000  ffd8 ffe0 0010 4a46 4946 0001 0101 0048
  00000010  0048 0000 ffdb 0043 0006 0404 0405 0406
  00000020  0505 0609 0605 0609 0b08 0606 080b 0c0a
  00000030  0a0b 0a0a 0c10 0c0c 0c0c 0c0c 100c 0e0f
  00000040  100f 0e0c 1313 1414 1313 1c1b 1b1b 1c20
  00000050  2020 2020 2020 2020 20ff db00 4301 0707
  00000060  070d 0c0d 1810 1018 1a15 1115 1a20 2020
  00000070  2020 2020 2020 2020 2020 2020 2020 2020
  00000080  2020 2020 2020 2020 2020 2020 2020 2020
  00000090  2020 2020 2020 2020 2020 2020 2020 ffc0
  000000a0  0011 0802 5803 2003 0111 0002 1101 0311
  000000b0  01ff c400 1c00 0001 0501 0101 0000 0000
  000000c0  0000 0000 0000 0200 0103 0405 0607 08ff
  000000d0  c400 5310 0001 0203 0406 0607 0408 0307
  000000e0  0303 0403 0102 0300 0411 0512 2131 0613
  000000f0  2241 5161 1432 7181 91a1 0723 4252 b1c1
(The bytes 4a46 4946 at offset 6 spell "JFIF" in ASCII.)
Unsigned Numbers
All the numbers we’ve discussed are unsigned (i.e. non-negative integers).
Assume 8 bits of information. E.g.
  0000 0000 = 0
  0000 0001 = 1
  1000 0000 = 128
  1111 1111 = 255
Range is [0,255]

Signed Numbers
What if we want to represent negative numbers?
Naïve solution: Sign/Magnitude Notation
Use the first bit to represent +/- (sign bit). E.g.
  0000 0000 = 0
  0000 0001 = 1
  1000 0001 = -1
  0111 1111 = 127
  1111 1111 = -127
Range is [-127,127]. But this is wasteful! There are two ways of representing 0 (+0, -0).

Another approach: Bias Notation
Take the unsigned number and subtract a bias b (e.g. b = 127). E.g.
  0000 0000 = 0 - 127 = -127
  0000 0001 = 1 - 127 = -126
  0111 1111 = 127 - 127 = 0
  1000 0000 = 128 - 127 = 1
  1111 1111 = 255 - 127 = 128
Range is [-127,128]. This works, and has its uses, but usually we prefer...

Usual approach: Two’s Complement
The MSB is considered to have negative weight. E.g.
  0000 0000 = 0
  0000 0001 = 1
  1111 1111 = -1
  1000 0000 = -128
  0111 1111 = 127
Range is [-128,127]. It seems goofy, but there are a lot of good reasons for it.

Two’s Complement Advantages:
- Easy to negate: take the bitwise complement, add one
- Efficient: adding and what logical operator?
- Overflow is handled “gracefully”
- Easy to tell if a number is negative: the MSB is set
More details in your req’d reading :)

Ones’ Complement
Mostly theoretical (no one uses it).
The MSB is considered to have weight -(2^(w-1) - 1) instead of -2^(w-1) (e.g. MSB = -127 instead of -128). E.g.
  0000 0000 = 0
  0000 0001 = 1
  1111 1110 = -1
  1000 0000 = -127
  0111 1111 = 127
  1111 1111 = 0
Range is [-127,127]. Note again there are two ways of representing 0.

What about fractions?
Okay great, we know how to represent all kinds of integers:
- Non-negative integers: Unsigned format
- Integers: Sign-Magnitude, Bias Notation, Two’s Complement, Ones’ Complement
But how do we represent fractional numbers? E.g. ½
Idea: How do we represent it in decimal? ½ = 0.5
We can introduce a decimal point to binary:
  Decimal -> Binary
  0.5     -> 0.1
  1.5     -> 1.1
  2.5     -> 10.1
  0.25    -> 0.01
  0.75    -> 0.11

Binary
This follows from our original definition:
  1010.1010 = abcd.efgh
  Number = a*2^3 + b*2^2 + c*2^1 + d*2^0 + e*2^-1 + f*2^-2 + g*2^-3 + h*2^-4
         = 1*8 + 0*4 + 1*2 + 0*1 + 1*1/2 + 0*1/4 + 1*1/8 + 0*1/16
         = 8 + 2 + .5 + .125
         = 10.625

Fixed Point
So if we have 8 bits of information, and we say that the decimal point occurs between the two sets of 4 bits, we have a convention for representing fractions:
  0000 0000 = 0
  0001 0000 = 1
  0000 1000 = 0.5
  0001 1000 = 1.5
  1010 1010 = 10.625
This is the so-called Fixed Point representation.

But with w bits, our range is still very small: [0, 2^(w/2)).
We want to be able to express a very large range (and negative numbers) very compactly.
Let’s think about scientific notation: 1.2e10 = 1.2 * 10^10
Binary equivalent!

Floating Point
The binary equivalent of scientific notation is called “floating point”:
  value * 2^exponent
Since our decimal point is “floating”, we have a much larger expressible range.

IEEE Floating Point
Standardized representation of floating point:
  (-1)^sign * mantissa * 2^exponent
The mantissa is unsigned. The exponent is expressed in bias notation.
*Bryant & O’Hallaron call it the “significand” instead of the mantissa.

An in-depth example:
  (-1)^sign * mantissa * 2^exponent
Suppose we have 9 bits to play with:
  sign (1 bit) | mantissa (4 bits) | exponent (4 bits)
- sign, s: 0 or 1
- mantissa, M: Fixed point number in the range [1,2)
- exponent, E: Bias notation in the range [-6,7]*
*Why not [-7,8]? Those values are used for something special.

  sign (1 bit) | mantissa (4 bits) | exponent (4 bits)
  s            | abcd              | efgh
Mantissa: Fixed point notation with an implied decimal point, a.bcd, e.g.
  1.0   -> 1.000
  1.125 -> 1.001
  1.25  -> 1.010
  1.5   -> 1.100
  1.75  -> 1.110

The mantissa encodes a value in the range [1,2).
Realization: The most significant digit is always 1! We don’t need to encode it!
  sign (1 bit) | mantissa (4 bits) | exponent (4 bits)
  s            | 1.abcd            | efgh
So the mantissa has a precision of 2^-4 = 1/16.

Exponent, E, has k bits, in bias notation.
Bias is 2^(k-1) - 1 = 7
So the range is [-7,8]

[Figure: IEEE Floating Point Encoding Table]

Special values for the exponent, E:
- If the exponent field is all 0’s, the number is considered denormalized: the mantissa does not have an implied leading 1.
- If the exponent field is all 1’s, there’s a special interpretation to encode values such as infinity and NaN.
So the range becomes [-6,7]

[Figure: IEEE Floating Point Encoding Table]

Closing notes:
- Some numbers, such as 0.2, cannot be represented exactly using any of the formats we’ve described.
- IEEE 32-bit single-precision float (usually C float): 1 sign bit, 23-bit mantissa, 8-bit exponent. Approximately 7 decimal digits of precision.
- IEEE 64-bit double-precision float (usually C double): 1 sign bit, 52-bit mantissa, 11-bit exponent.
- Rounding imprecision is a BIG problem with floating point numbers.

    #include <stdbool.h>

    bool equal( float x, float y )
    {
        // Never do this: exact comparison ignores rounding error
        if ( x == y )
            return true;
        else
            return false;
    }

- printf rounds floats to be more human readable.

Units
Some terminology:
- Byte: Smallest addressable unit on an architecture.
  Usually an octet (8 bits)
- Nibble: Half a byte (4 bits)
- Word: Natural unit of data on the architecture (often the size of an address)
  - 8086: 16 bits
  - IA32, PPC: 32 bits
- Dword (double word), Quad-word
- Caches often like 64 bytes (x86)
- Memory pages (x86: 4096 bytes)
- Disk sectors (512 bytes common)

Units (for engineers)
  b = bits
  B = bytes
  KB = Kilobyte = 2^10 = 1024
  MB = Megabyte = 2^20 = 1024*1024 = 1048576
  GB = Gigabyte = 2^30 = 1073741824
  TB = Terabyte = 2^40 = 1099511627776
*Note: MB = Megabyte, Mb = Megabit
**k and K are used interchangeably

Units (for marketing)
  b = bits
  B = bytes
  KB = Kilobyte = 10^3 = 1,000
  MB = Megabyte = 10^6 = 1,000,000
  GB = Gigabyte = 10^9 = 1,000,000,000
  TB = Terabyte = 10^12 = 1,000,000,000,000
Reason: Makes numbers seem bigger and cooler
*Note: MB, mb, Mb all used interchangeably

Computer System (Idealized)
[Diagram components: CPU, Memory, Disk Controller, Disk, System Bus]

Making Programs
  $ cat hello.c
  #include <stdio.h>

  int main()
  {
      printf( "Hello, world\n" );
      return 0;
  }
  $ ./hello
  Hello, world

But the computer doesn’t understand C code! C is for humans. Machine code looks like this:
  00000000  4d5a 9000 0300 0000 0400 0000 ffff 0000  MZ..............
  00000010  b800 0000 0000 0000 4000 0000 0000 0000  8.......@.......
  00000020  0000 0000 0000 0000 0000 0000 0000 0000  ................
  00000030  0000 0000 0000 0000 0000 0000 8000 0000  ................
  00000040  0e1f ba0e 00b4 09cd 21b8 014c cd21 5468  ..:..4.M!8.LM!Th
  00000050  6973 2070 726f 6772 616d 2063 616e 6e6f  is program canno
  00000060  7420 6265 2072 756e 2069 6e20 444f 5320  t be run in DOS
  00000070  6d6f 6465 2e0d 0d0a 2400 0000 0000 0000  mode....$.......
  00000080  5045 0000 4c01 0400 0951 ee42 000c 0000  PE..L....QnB....
  00000090  f800 0000 e000 0703 0b01 0238 0004 0000  x...`......8....
  000000a0  0004 0000 0000 0000 0010 0000 0010 0000  ................
  000000b0  0000 0000 0000 4000 0010 0000 0002 0000  ......@.........
Enter the compiler
The compiler translates C to machine code...
  Hello.c (text file) -> Compiler Magic -> Hello (binary object)
  $ gcc hello.c -o hello
  $ ./hello
  Hello, world

Compilation System
Demystifying (slightly):
  Hello.c (C code) -> Preprocessor -> Hello.i (preprocessed, simplified C) -> Compiler -> Hello.s (assembly) -> Assembler -> Hello.o (binary object)
Compilation is divided into stages to simplify it. Let’s follow the hello world example through.

Step 0: Source
Start with source code:
  #include <stdio.h>

  int main()
  {
      printf( "Hello, world\n" );
      return 0;
  }

Step 1: Preprocess stage
Translates C to “simplified” C. Expands macros, resolves file references, and evaluates preprocessor conditionals:
  #include <file.h>
  #if, #ifdef, #else, #endif
  #define
  $ gcc -E hello.c >hello.i

Step 2: Compilation stage
Translates preprocessed C into a simple language called assembly. Still human-readable, but barely. Very close to machine language.
  pushl  %ebp
  movl   %esp, %ebp
  subl   $8, %esp
  andl   $-16, %esp
  movl   $0, %eax
  addl   $15, %eax
  addl   $15, %eax
  shrl   $4, %eax
  sall   $4, %eax
  movl   %eax, -4(%ebp)
  movl   -4(%ebp), %eax
  call   __alloca
  call   ___main
  movl   $LC0, (%esp)
  call   _printf
  movl   $0, %eax
  leave
  ret
  $ gcc -S hello.i

Step 3: Assembling stage
Translates assembly to machine code. This stage is very simple: a 1:1 mapping between assembly and machine code.
  00000070  0000 0000 0000 0000 0000 0000 0000 0000  ................
  00000080  0000 0000 0000 0000 8000 00c0 2e72 6461  ...........@.rda
  00000090  7461 0000 0000 0000 0000 0000 1000 0000  ta..............
  000000a0  f400 0000 0000 0000 0000 0000 0000 0000  t...............
  000000b0  4000 0040 5589 e583 ec08 83e4 f0b8 0000  @..@U.e.l..dp8..
  000000c0  0000 83c0 0f83 c00f c1e8 04c1 e004 8945  ...@..@.Ah.A`..E
  000000d0  fc8b 45fc e800 0000 00e8 0000 0000 c704  |.E|h....h....G.
  000000e0  2400 0000 00e8 0000 0000 b800 0000 00c9  $....h....8....I
  000000f0  c390 9090 4865 6c6c 6f2c 2077 6f72 6c64  C...Hello, world
  $ gcc -c hello.s -o hello.o