Power Point - Computer Science Department

advertisement
Computer Science Basics
CS 216 Fall 2012
Operating Systems
• interface to the hardware for the user and
programs
• The two operating systems that you are most
likely to use at UK are Unix (Linux) and
Windows
• Unix is more of a programmer’s operating
system (used on Multilab), with a number of
tools and features for developing programs
Operating Systems
• Linux is similar to Unix and is the operating
system that you will use on Multilab
• It is becoming very popular for its reliability,
and because it is “freeware” (available for
free over the Internet)
Programming
• A program is an instance of an algorithm in a
specific programming language
• An algorithm is a step by step procedure for
solving a problem
Process
• The programming process involves:
– Problem description
– Develop algorithm to solve problem
– Implement the algorithm in the language that it
will be executed in (program)
– Test the program
Programming
• In programming classes, the problem is
assigned to you.
• In the business world, the problem is usually
to solve a business problem for the
organization that employs you.
• In either case, there is an algorithm that will
solve the problem executing on a computer.
This is not the case for all possible problems.
Complexity
1. There are problems that have no solution
2. There are problems where there is a
solution, but the algorithm is so slow that it
will not complete in thousands of years on
the fastest computers
3. There are the problems that are solvable
and executable on computers
Complexity
• No solution: write an algorithm to determine
if any program will loop forever. A practical
example is a C++ program checker to
determine if a C++ program has infinite loops
in it.
• Intractable: traveling salesman problem (NPComplete)
• In this class you will only be assigned
problems from case 3
Programming
• Solving a problem involves creating an
algorithm, that you can encode in an
algorithm
• µ-Processors execute only a simple set of
instructions. At first all programming was in
the machine language (difficult task)
• Big leap: high level programming language
(first was Fortran, appeared in mid-50s)
Languages
• Nowadays there are hundreds of
programming languages (Python, C/C++, Java,
Perl, Visual Basic, TCL, JavaScript). They fall
into two classes:
1. Compiled languages
2. Interpreted languages
Compiled
• Compilers were developed to convert highlevel program statements into the machine
instructions that the computer can execute.
• A program in a compiled language (Fortran,
C/C++, Cobol) is converted into a machine
language program by a compiler.
• Once converted into a machine language
program, the machine language program can
be run over and over again.
Interpreted
• An interpreted language (perl, JavaScript, tcl)
has no compiler.
• Every time you run the program, it is passed
to an interpreter.
• The interpreter “interprets” the high level
statements (converts them to machine
instructions) on the fly.
Compare/contrast
• In general, compiled languages run faster (no
interpretation each time they execute), and have
more facilities in the language for catching
programming errors.
• Interpreted languages run slower (interpreted
each time they execute), have fewer language
facilities for catching programming errors, but are
simpler to use and understand.
• The process for developing , debugging, and
updating programs is usually faster for an
interpreted language.
Compiler process
• Language source code ==>compiler ==> to
turn source into machine instructions
• Machine instructions (from multiple source
code files)==> linker ==>to link multiple
compiles and library routines into an
executable program
• executable program==> run on computer
Interpreter process
• High level language==> interpreter to convert
statements into machine instructions==> run
on computer.
• In this course you will write programs in C++
(a compiled language) and Perl (an
interpreted language).
Bits Bytes and Hexadecimal Numbers
• All information processed by computers is in
the binary number system, not the decimal
number system that humans use
– E.g.: 10010100100001010010100010
• All information in computers (including
numbers and characters) is encoded in bits
• Decimal system has 10 digits
• Binary has 2 (1 and 0)
Bits/bytes
• Eight bits are a byte.
• The byte is the smallest unit of information
that is sent between components of the
computer.
• Storage is packaged in bytes.
Addressing
• The computer microprocessor addresses and
accesses storage by byte address (starting with
hexadecimal address 0).
• It sends/receives data to I/O devices in multiples
of bytes. The data bus inside the computer has a
width of some multiple of bytes.
• The word size of a computer is the width of its
bus. The word size of most current personal
computers is 8 bytes (64 bits).
Hexadecmial
• Hexadecimal numbers are used as a convenient
method for expressing addresses or data that are in
bytes.
• Operating systems, compilers, and other tools may
represent the binary data in hexadecimal format. For
example, when a segmentation fault occurs, the
operating system dumps error information in
hexadecimal.
• A segmentation fault is caused when a program tries
to access a memory location outside of the area in
memory assigned to it by the operating system. A bus
error is a similar error.
Hexadecimal
• Hexadecimal (base 16) all numbers expressed
with 16 digits
• 0-9
• A
decimal 10
• B
11
• C
12
• D
13
• E
14
• F
15
Binary
• Binary (base 2) all numbers expressed with two digits (0,1)
• Position is a power of 2 value:
• 0001 = 1 0010 =2 0011 = 3 0111 = 7
8421
8421
8421
8421
• 1111 = 15 (F hex) 1010 = 10 dec (A hex)
8421
8421
Conversion from binary to hex
• Put bits in groups of 4
• Each 4-tuple of bits is a hex digit
• Remember that decimal 10-15 = hexadecimal A-F
•
•
•
•
•
1001 = 9
1010 = A (10 decimal)
1111 = F (15 decimal)
binary:
0101 0110 1100 0010 1110
hexadecimal: 5 6
C
2
E
Hex to binary
•
•
•
•
•
•
•
•
•
•
•
•
•
•
Each hexadecimal digit is a decimal number 0-15, representing four binary bits:
hexadecimal
binary
1
0001
2
0010
3
0011
...
8
1000
9
1001
A
1010 (10 decimal)
B
1011 (11 decimal)
C
1100 (12 decimal)
D
1101 (13 decimal)
E
1110 (14 decimal)
F
1111 (15 decimal)
•
•
hexadecimal: D
7
B
5
F
3
C
8
binary:
1101 0111 1011 0101 1111 0011 1100 1000
Common numbers
• Because of the binary nature of the hardware, there are some numbers
that appear frequently:
• Hexadecimal
Decimal
• FF
255 largest decimal number in a byte (8 bits)
• FFFF
65535 largest decimal number in 2 bytes (16 bits)
• If one bit is used for the sign (as C++ does for a short integer), the largest
decimal number that can be stored in a C++ short integer is +32767
• FFFFFFFF +2147483647 largest decimal number that can be stored in a
variable
• declared as an int (one bit reserved for the sign, 31 bits for the number)
• (there is more than one way to encode signed integers)
Computer Storage
• Computer storage space (internal memory, or external
device capacity) is usually referred to using decimal units
(kilobytes (KB), megabytes (MB), gigabytes (GB)).
However internally, the space is allocated in binary as a
power of two. The power-of-two size is greater than the
decimal measurement:
Measurement
Power of 10
Power of 2
Size difference
Kilobyte
3
10
2.4%
Megabyte
6
20
4.86%
Gigabyte
9
30
7.37%
So if you purchase a computer with 1 gigabyte of memory, you are really getting
1,073,741,824 bytes of storage (2 raised to the 30th power).
Representation of characters
• All characters are represented by bytes. The
standard representation is called ASCII.
• For example, the character A is represented in
ASCII by 0100 0001 (41 hexadecimal).
• The ASCII character set was originally only 7 bits
long, which allows for 128 distinct characters.
• That's plenty for English alphabetic characters
and punctuation and numbers, but it won't hold
larger international alphabets, and special
characters.
Representation of characters
• To represent characters in all the world's alphabets, you might need
to handle about 50 alphabets (including Chinese, Korean, Thai,
Hebrew, Cyrillic), each with different (sometimes a great number
of) characters (including, for English, ç and ö). The Unicode
character representation has been developed for this purpose.
• The Unicode representation assigns a one to four byte encoding for
the scripts of the world's principal languages, in addition to many
historic and archaic scripts. Some programming languages use
Unicode (Java), some let you use either ASCII or Unicode (C++). The
Unicode encoding named UTF-8 (the most widely used encoding),
supports ASCII characters as one byte values, but allows other
languages and special characters to be used as two to four byte
values.
Program Development
• Program development in CS 215 used an
Integrated Development Environment (IDE). The
IDE is the user interface to a collection of tools
used to develop and test programs.
• The tools that IDEs contain:
–
–
–
–
–
A project manager
A text editor
A compiler
A linker
A debugger
IDE
• The project manager controls the files that
compose the program. Files that you include
in your project will automatically be
compiled and linked to produce an
executable program. The make utility
(described later in the course) is a project
manager for Unix and Linux.
IDE
• The text editor is used to develop the
program source code. The IDE text editors
typically are designed to help you write
source code. For example, they can
automatically indent blocks of code, and
display comments, reserved words in
different colors. On Multilab (where your
programs must execute) there are a number
of editors, for example emacs, vi, and pico
are editors available.
IDE
• The compiler produces the machine
instructions for your hardware and operating
system from the source code. The file of
machine instructions produced from the
source code is called an object file. On
Multilab the compiler that is used is g++.
IDE
• The linker combines multiple object files of
machine instructions into an executable
program. On Multilab the linker used is built
into g++.
IDE
• The debugger assists the debugging of
programs. You can set breakpoints (points in
the program where execution will halt), then
look at or change program variables. On
Multilab the debugger is gdb.
Note
• For CS216, you can use an IDE on Windows
(such as Visual Studio), or a Mac, but your
programs must compile and execute on the
Multilab platform that has Linux as its
operating system. This means that you have
to send your source code and include files to
Multilab, then build the executable program
there. You cannot send the executable built
on Windows to Multilab and execute it there.
Download