Computer Science Basics CS 216 Fall 2012 Operating Systems • interface to the hardware for the user and programs • The two operating systems that you are most likely to use at UK are Unix (Linux) and Windows • Unix is more of a programmer’s operating system (used on Multilab), with a number of tools and features for developing programs Operating Systems • Linux is similar to Unix and is the operating system that you will use on Multilab • It is becoming very popular for its reliability, and because it is “freeware” (available for free over the Internet) Programming • A program is an instance of an algorithm in a specific programming language • An algorithm is a step by step procedure for solving a problem Process • The programming process involves: – Problem description – Develop algorithm to solve problem – Implement the algorithm in the language that it will be executed in (program) – Test the program Programming • In programming classes, the problem is assigned to you. • In the business world, the problem is usually to solve a business problem for the organization that employs you. • In either case, there is an algorithm that will solve the problem executing on a computer. This is not the case for all possible problems. Complexity 1. There are problems that have no solution 2. There are problems where there is a solution, but the algorithm is so slow that it will not complete in thousands of years on the fastest computers 3. There are the problems that are solvable and executable on computers Complexity • No solution: write an algorithm to determine if any program will loop forever. A practical example is a C++ program checker to determine if a C++ program has infinite loops in it. • Intractable: traveling salesman problem (NPComplete) • In this class you will only be assigned problems from case 3 Programming • Solving a problem involves creating an algorithm, that you can encode in an algorithm • µ-Processors execute only a simple set of instructions. At first all programming was in the machine language (difficult task) • Big leap: high level programming language (first was Fortran, appeared in mid-50s) Languages • Nowadays there are hundreds of programming languages (Python, C/C++, Java, Perl, Visual Basic, TCL, JavaScript). They fall into two classes: 1. Compiled languages 2. Interpreted languages Compiled • Compilers were developed to convert highlevel program statements into the machine instructions that the computer can execute. • A program in a compiled language (Fortran, C/C++, Cobol) is converted into a machine language program by a compiler. • Once converted into a machine language program, the machine language program can be run over and over again. Interpreted • An interpreted language (perl, JavaScript, tcl) has no compiler. • Every time you run the program, it is passed to an interpreter. • The interpreter “interprets” the high level statements (converts them to machine instructions) on the fly. Compare/contrast • In general, compiled languages run faster (no interpretation each time they execute), and have more facilities in the language for catching programming errors. • Interpreted languages run slower (interpreted each time they execute), have fewer language facilities for catching programming errors, but are simpler to use and understand. • The process for developing , debugging, and updating programs is usually faster for an interpreted language. Compiler process • Language source code ==>compiler ==> to turn source into machine instructions • Machine instructions (from multiple source code files)==> linker ==>to link multiple compiles and library routines into an executable program • executable program==> run on computer Interpreter process • High level language==> interpreter to convert statements into machine instructions==> run on computer. • In this course you will write programs in C++ (a compiled language) and Perl (an interpreted language). Bits Bytes and Hexadecimal Numbers • All information processed by computers is in the binary number system, not the decimal number system that humans use – E.g.: 10010100100001010010100010 • All information in computers (including numbers and characters) is encoded in bits • Decimal system has 10 digits • Binary has 2 (1 and 0) Bits/bytes • Eight bits are a byte. • The byte is the smallest unit of information that is sent between components of the computer. • Storage is packaged in bytes. Addressing • The computer microprocessor addresses and accesses storage by byte address (starting with hexadecimal address 0). • It sends/receives data to I/O devices in multiples of bytes. The data bus inside the computer has a width of some multiple of bytes. • The word size of a computer is the width of its bus. The word size of most current personal computers is 8 bytes (64 bits). Hexadecmial • Hexadecimal numbers are used as a convenient method for expressing addresses or data that are in bytes. • Operating systems, compilers, and other tools may represent the binary data in hexadecimal format. For example, when a segmentation fault occurs, the operating system dumps error information in hexadecimal. • A segmentation fault is caused when a program tries to access a memory location outside of the area in memory assigned to it by the operating system. A bus error is a similar error. Hexadecimal • Hexadecimal (base 16) all numbers expressed with 16 digits • 0-9 • A decimal 10 • B 11 • C 12 • D 13 • E 14 • F 15 Binary • Binary (base 2) all numbers expressed with two digits (0,1) • Position is a power of 2 value: • 0001 = 1 0010 =2 0011 = 3 0111 = 7 8421 8421 8421 8421 • 1111 = 15 (F hex) 1010 = 10 dec (A hex) 8421 8421 Conversion from binary to hex • Put bits in groups of 4 • Each 4-tuple of bits is a hex digit • Remember that decimal 10-15 = hexadecimal A-F • • • • • 1001 = 9 1010 = A (10 decimal) 1111 = F (15 decimal) binary: 0101 0110 1100 0010 1110 hexadecimal: 5 6 C 2 E Hex to binary • • • • • • • • • • • • • • Each hexadecimal digit is a decimal number 0-15, representing four binary bits: hexadecimal binary 1 0001 2 0010 3 0011 ... 8 1000 9 1001 A 1010 (10 decimal) B 1011 (11 decimal) C 1100 (12 decimal) D 1101 (13 decimal) E 1110 (14 decimal) F 1111 (15 decimal) • • hexadecimal: D 7 B 5 F 3 C 8 binary: 1101 0111 1011 0101 1111 0011 1100 1000 Common numbers • Because of the binary nature of the hardware, there are some numbers that appear frequently: • Hexadecimal Decimal • FF 255 largest decimal number in a byte (8 bits) • FFFF 65535 largest decimal number in 2 bytes (16 bits) • If one bit is used for the sign (as C++ does for a short integer), the largest decimal number that can be stored in a C++ short integer is +32767 • FFFFFFFF +2147483647 largest decimal number that can be stored in a variable • declared as an int (one bit reserved for the sign, 31 bits for the number) • (there is more than one way to encode signed integers) Computer Storage • Computer storage space (internal memory, or external device capacity) is usually referred to using decimal units (kilobytes (KB), megabytes (MB), gigabytes (GB)). However internally, the space is allocated in binary as a power of two. The power-of-two size is greater than the decimal measurement: Measurement Power of 10 Power of 2 Size difference Kilobyte 3 10 2.4% Megabyte 6 20 4.86% Gigabyte 9 30 7.37% So if you purchase a computer with 1 gigabyte of memory, you are really getting 1,073,741,824 bytes of storage (2 raised to the 30th power). Representation of characters • All characters are represented by bytes. The standard representation is called ASCII. • For example, the character A is represented in ASCII by 0100 0001 (41 hexadecimal). • The ASCII character set was originally only 7 bits long, which allows for 128 distinct characters. • That's plenty for English alphabetic characters and punctuation and numbers, but it won't hold larger international alphabets, and special characters. Representation of characters • To represent characters in all the world's alphabets, you might need to handle about 50 alphabets (including Chinese, Korean, Thai, Hebrew, Cyrillic), each with different (sometimes a great number of) characters (including, for English, ç and ö). The Unicode character representation has been developed for this purpose. • The Unicode representation assigns a one to four byte encoding for the scripts of the world's principal languages, in addition to many historic and archaic scripts. Some programming languages use Unicode (Java), some let you use either ASCII or Unicode (C++). The Unicode encoding named UTF-8 (the most widely used encoding), supports ASCII characters as one byte values, but allows other languages and special characters to be used as two to four byte values. Program Development • Program development in CS 215 used an Integrated Development Environment (IDE). The IDE is the user interface to a collection of tools used to develop and test programs. • The tools that IDEs contain: – – – – – A project manager A text editor A compiler A linker A debugger IDE • The project manager controls the files that compose the program. Files that you include in your project will automatically be compiled and linked to produce an executable program. The make utility (described later in the course) is a project manager for Unix and Linux. IDE • The text editor is used to develop the program source code. The IDE text editors typically are designed to help you write source code. For example, they can automatically indent blocks of code, and display comments, reserved words in different colors. On Multilab (where your programs must execute) there are a number of editors, for example emacs, vi, and pico are editors available. IDE • The compiler produces the machine instructions for your hardware and operating system from the source code. The file of machine instructions produced from the source code is called an object file. On Multilab the compiler that is used is g++. IDE • The linker combines multiple object files of machine instructions into an executable program. On Multilab the linker used is built into g++. IDE • The debugger assists the debugging of programs. You can set breakpoints (points in the program where execution will halt), then look at or change program variables. On Multilab the debugger is gdb. Note • For CS216, you can use an IDE on Windows (such as Visual Studio), or a Mac, but your programs must compile and execute on the Multilab platform that has Linux as its operating system. This means that you have to send your source code and include files to Multilab, then build the executable program there. You cannot send the executable built on Windows to Multilab and execute it there.