CHAPTER 2 - COMPUTER SYSTEMS ORGANIZATION
2.1 PROCESSORS
CPU - the brain of the computer
CPU = ALU + control unit + small, high-speed memory (registers)
Most important registers: PC, IR
Data path: registers, ALU, buses connecting the pieces.
Instructions: 1. Register-register
2. Register-memory
Data path cycle: The process of running two operands through the ALU
and storing the result back in one of the registers.
INSTRUCTION EXECUTION (Fetch-decode-execute cycle)
1. Fetch the next instruction from memory into the IR.
2. Change the PC to point to the following instruction.
3. Determine the type of instruction just fetched.
4. If the instruction uses a word in memory, determine where it is.
5. Fetch the word, if needed, into a CPU register.
6. Execute the instruction.
7. Go to step 1.
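The seven steps above can be mimicked in software. The following is a minimal sketch of an interpreter for a hypothetical accumulator machine; the opcodes LOAD, ADD, STORE and HALT and the tuple encoding of instructions are assumptions made for illustration, not a real instruction set. Instructions and data share one memory list, and the comments mark which step of the cycle each line performs.

```python
# Hypothetical opcodes for a toy accumulator machine (not a real ISA).
LOAD, ADD, STORE, HALT = "LOAD", "ADD", "STORE", "HALT"

def run(memory, pc=0):
    """Interpret the program in `memory` using the fetch-decode-execute cycle."""
    acc = 0                                   # accumulator register
    while True:
        ir = memory[pc]                       # 1. fetch the instruction into the IR
        pc += 1                               # 2. advance the PC
        opcode, address = ir                  # 3. decode the instruction type
        # 4-5. locate and fetch the memory operand, if this instruction needs one
        operand = memory[address] if opcode in (LOAD, ADD) else None
        # 6. execute
        if opcode == LOAD:
            acc = operand
        elif opcode == ADD:
            acc += operand
        elif opcode == STORE:
            memory[address] = acc
        elif opcode == HALT:
            return memory
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
        # 7. loop back to step 1

program = [
    (LOAD, 5),    # acc = memory[5]
    (ADD, 6),     # acc += memory[6]
    (STORE, 7),   # memory[7] = acc
    (HALT, 0),
    None,         # unused cell
    2, 3, 0,      # data at addresses 5, 6, 7
]
print(run(program)[7])   # 5
```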
Should a program be executed by a "hardware" CPU?
Not necessarily: a hardware processor and a software interpreter are functionally equivalent (although some hardware machine is still needed to run the interpreter).
An interpreter breaks the instructions of its target machine into small steps. As a
consequence, the machine on which the interpreter runs can be much simpler
and less expensive than a hardware processor for the target machine would be.
The saving comes from the fact that hardware is being replaced by software.
Simple instructions or complex instructions?
Complex instructions (e.g., direct support for accessing array elements, floating-point arithmetic) were considered better:
- More powerful, more complex instructions led to faster program execution, even though individual instructions might take longer to execute.
- Execution of individual operations could be overlapped or executed in
parallel with different hardware.
Tradeoff: cost
Expensive, high-performance computers came to have many more instructions than lower-cost ones. However, the rising cost of software development and instruction compatibility requirements created the need to implement complex instructions even on low-cost computers, where cost was more important than speed.
How to build a low cost computer that could execute all the complicated
instructions of high performance, expensive machines? INTERPRETATION
In 1964, IBM introduced a family of computers (IBM System/360) with one architecture but many different implementations that could execute the same programs, differing only in price and speed. Direct (non-interpreted) hardware implementation was used only on the most expensive models.
An important benefit of interpretation: The opportunity to add new instructions
at minimal cost.
Another factor in favor of interpretation: the existence of fast read-only memories (control stores) to hold the interpreters.
Example: a typical 68000 instruction took the interpreter 10 microinstructions at 100 nsec each, plus 2 references to main memory at 500 nsec each. Total execution time was then 2000 nsec, only a factor of two worse than the best that direct execution could achieve. Without the fast control store (i.e., with the microinstructions fetched from main memory), it would have taken 6000 nsec.
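The timing figures in the 68000 example can be checked with a short calculation. The sketch assumes, as the example implies, that without a fast control store every microinstruction must itself be fetched from main memory at 500 nsec; the function name is illustrative.

```python
def interpreted_time_ns(micro_steps, micro_ns, mem_refs, mem_ns):
    """Time for one interpreted instruction: microinstruction steps plus memory references."""
    return micro_steps * micro_ns + mem_refs * mem_ns

# Figures from the 68000 example: 10 microinstructions, 2 main-memory references.
with_control_store = interpreted_time_ns(10, 100, 2, 500)
# Assumption: without a control store, each microinstruction costs a 500 nsec memory access.
without_control_store = interpreted_time_ns(10, 500, 2, 500)
print(with_control_store, without_control_store)   # 2000 6000
```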
RISC vs CISC
RISC: Reduced Instruction Set Computers
CISC: Complex Instruction Set Computers
In 1980, a Berkeley group led by Patterson and Sequin designed VLSI CPU chips that did not use interpretation -> RISC I -> RISC II -> SPARC.
In 1981, Hennessy at Stanford designed MIPS.
- different from commercial processors: Since these new CPUs did not have to
be backward compatible with existing products, their designers were free to
choose new instruction sets that would maximize the performance. Designing
instructions that could be started quickly was the key to good performance.
How long an instruction actually took mattered less than how many could be
started per second.
RISC principle: a computer should have a small number of simple instructions, each executing in one cycle of the data path: fetching two registers, combining them somehow (e.g., adding or ANDing them) and storing the result back in a register.
- If a RISC machine takes 4 or 5 instructions to do what a CISC machine does in one instruction, but the RISC instructions are 10 times as fast (because they are not interpreted), RISC wins.
- By this time, the speed of main memories had caught up to the speed of read-only control stores, so the interpretation penalty had greatly increased, strongly favoring RISC machines.
RISC -> DEC Alpha
CISC -> Intel
CISC still dominates because of backward compatibility: billions of dollars have been invested in software for the Intel line.
Intel employed RISC ideas even in its CISC architecture. Starting with the 486, Intel CPUs contain a RISC core that executes the simplest (and usually most common) instructions in a single data path cycle, while interpreting the more complicated ones in the usual CISC way. Common instructions are fast; less common ones are slow.
DESIGN PRINCIPLES OF MODERN COMPUTERS - RISC design principles
1. All instructions are directly executed by hardware: not interpreted by microinstructions, giving high speed.
2. Maximize the rate at which instructions are issued: by parallelism.
3. Instructions should be easy to decode: make instructions regular, fixed
length, with a small number of fields. The fewer different formats, the better.
4. Only loads and stores should reference memory:
- all other operands must come from and return to registers.
- Separate instructions move operands between memory and registers. Because memory access delays are unpredictable, these instructions can be overlapped with other instructions.
5. Provide plenty of registers.
PARALLELISM
1. Instruction level
2. Processor level
INSTRUCTION LEVEL PARALLELISM
1. Pipelining
2. Superscalar Architecture:
Two pipelines are better than one:
Intel 486 - one pipeline
Pentium - dual pipelines
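Pentium: the main (u) pipeline could execute an arbitrary Pentium instruction; the second (v) pipeline could execute only simple integer instructions.
Complex rules determined whether a pair of instructions was compatible, i.e., whether both could execute in parallel.
Pentium-specific compilers that produced compatible pairs could produce faster-running programs.
4 pipelines: conceivable, but this duplicates too much hardware.
Solution: superscalar architecture (a single pipeline with multiple functional units).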
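Pipelining is only named above, so here is a rough sketch of why it pays off, assuming an ideal k-stage pipeline with no stalls or branches (a simplification): the first instruction needs k cycles, after which one instruction completes every cycle. The workload size and pipeline depth are hypothetical.

```python
def cycles_unpipelined(n_instructions, k_stages):
    # Each instruction passes through all k stages before the next one starts.
    return n_instructions * k_stages

def cycles_pipelined(n_instructions, k_stages):
    # k cycles to fill the pipeline, then one instruction completes per cycle.
    return k_stages + n_instructions - 1

n, k = 1000, 5   # hypothetical workload and pipeline depth
print(cycles_unpipelined(n, k), cycles_pipelined(n, k))            # 5000 1004
print(round(cycles_unpipelined(n, k) / cycles_pipelined(n, k), 2))  # 4.98
```

The speedup approaches k as the number of instructions grows, so even an ideal pipeline cannot deliver gains much larger than its depth.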
PROCESSOR LEVEL PARALLELISM
Pipelining and superscalar operation rarely win a factor of more than 5 or 10, and clock speeds are limited by heat and speed-of-light problems. To get gains of 50, 100 or more, the only way is to design computers with multiple CPUs.
1. Data Parallel Computers
Programs structured around loops over arrays are an easy target for parallel computing.
1. SIMD processors: parallel computer
2. vector processors: extension of a single processor
SIMD processors have 1 control unit and many processing units, which all execute the same instruction on different data. Modern graphics processing units (GPUs) rely heavily on SIMD processing.
Ex: Nvidia Fermi GPU - 16 SIMD stream multiprocessors (SMs), each containing 32 SIMD processors. Each cycle, the scheduler selects 2 threads to execute on the SIMD processors. The next instruction from each thread then executes on up to 16 SIMD processors. A fully loaded Fermi GPU with 16 SMs therefore performs 16 SMs x 2 threads x 16 operations = 512 operations per cycle.
Unlike a SIMD processor, a vector processor performs operations in a single, heavily pipelined functional unit. Both SIMD processors and vector processors work on arrays of data. A SIMD processor does so by having many functional units working in parallel, one per data element. The vector processor, on the other hand, has the concept of a vector register, which consists of a set of conventional registers that can be loaded from memory in a single instruction. For example, a vector addition instruction performs the pairwise addition of the elements of two such vectors by feeding them to a pipelined adder from the two vector registers. The SSE (Streaming SIMD Extensions) instructions on the Intel Core architecture use this execution model to speed up multimedia and scientific calculations.
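The vector-register model can be sketched as a simulation: element pairs from two vector registers enter a pipelined adder one per cycle, and once the pipeline is full a sum emerges from the last stage every cycle. The adder depth (`stages`) is a hypothetical parameter, and the Python lists merely stand in for vector registers.

```python
def vector_add_pipelined(va, vb, stages=4):
    """Simulate a vector ADD fed, one element pair per cycle, into a pipelined adder.

    `stages` is a hypothetical adder depth. Returns (results, cycles_used).
    """
    assert len(va) == len(vb), "vector registers must have equal length"
    pipeline = [None] * stages          # pipeline[i] holds the pair in stage i
    results, cycle, next_elem = [], 0, 0
    while len(results) < len(va):
        cycle += 1
        # A new pair (if any remain) enters stage 0; every other pair moves one stage on.
        incoming = (va[next_elem], vb[next_elem]) if next_elem < len(va) else None
        next_elem += 1
        pipeline = [incoming] + pipeline[:-1]
        # The pair in the last stage finishes its addition this cycle.
        done = pipeline[-1]
        if done is not None:
            results.append(done[0] + done[1])
    return results, cycle

res, cycles = vector_add_pipelined([1, 2, 3, 4, 5, 6, 7, 8],
                                   [10, 20, 30, 40, 50, 60, 70, 80])
print(res)      # [11, 22, 33, 44, 55, 66, 77, 88]
print(cycles)   # 11, i.e. 4 stages + 8 elements - 1
```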
2. Multiprocessors
- multiple CPUs sharing a common memory
3. Multicomputers
- large number of interconnected computers with private memory
- message communication
- 2D, 3D, tree, ring interconnections
4. Hybrid systems:
Multiprocessor: easier to program
Multicomputer: easier to build
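The two programming models can be contrasted with a small sketch. Threads inside one Python process stand in for CPUs; this is an analogy only, since real multicomputer nodes have physically private memories and exchange messages over an interconnection network, which the `queue.Queue` here merely imitates. The function names are hypothetical.

```python
import queue
import threading

def shared_memory_sum(chunks):
    """Multiprocessor style: all workers add into one variable in common memory."""
    total = [0]                       # the shared memory cell
    lock = threading.Lock()           # shared data needs synchronization

    def worker(chunk):
        partial = sum(chunk)
        with lock:
            total[0] += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]

def message_passing_sum(chunks):
    """Multicomputer style: each worker uses private data and sends its result as a message."""
    mailbox = queue.Queue()           # stands in for the interconnection network

    def worker(chunk):
        mailbox.put(sum(chunk))       # no shared variable is touched

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(mailbox.get() for _ in chunks)

data = list(range(1, 101))
chunks = [data[i:i + 25] for i in range(0, 100, 25)]
print(shared_memory_sum(chunks), message_passing_sum(chunks))   # 5050 5050
```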
2.2 PRIMARY MEMORY
Bits:
BCD: 4 decimal digits in a 16-bit word (4 bits per digit)
16-bit pure binary number: 65536 combinations
BCD: only 10000 combinations (0000-9999)
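A short sketch of BCD packing, assuming 4 bits per decimal digit as stated above, shows why 16 bits of BCD hold only 10000 values: each 4-bit group uses just 10 of its 16 possible patterns. The function names are illustrative.

```python
def to_bcd(n):
    """Pack a non-negative decimal integer into BCD, 4 bits per decimal digit."""
    if n < 0:
        raise ValueError("this BCD sketch handles non-negative numbers only")
    result, shift = 0, 0
    while True:
        n, digit = divmod(n, 10)
        result |= digit << shift     # place the digit in the next 4-bit group
        shift += 4
        if n == 0:
            return result

def from_bcd(b):
    """Unpack a BCD value back into an ordinary integer."""
    n, scale = 0, 1
    while b:
        n += (b & 0xF) * scale       # lowest 4 bits hold the next decimal digit
        b >>= 4
        scale *= 10
    return n

print(hex(to_bcd(1234)))   # 0x1234
print(from_bcd(0x9999))    # 9999
print(2**16, 10**4)        # 65536 10000
```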
Memory addresses:
If an address has m bits, there are at most 2^m addressable cells.
32 bit machine: 32 bit registers
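Byte ordering: big endian, little endian
No problem with character strings
Problem with integers when data are transferred between machines with different byte orders
Solutions: - software reverses the bytes (this fixes the integers but garbles the strings)
- include a header in front of each data item telling which kind of data follows and its length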
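The byte-ordering problem can be demonstrated with Python's standard `struct` module, whose `>` and `<` format prefixes select big-endian and little-endian layouts. The 4-byte string "JIM " is an illustrative value showing why blindly reversing bytes is not a general fix.

```python
import struct

value = 0x12345678
big = struct.pack(">I", value)      # big endian: most significant byte first
little = struct.pack("<I", value)   # little endian: least significant byte first
print(big.hex(), little.hex())      # 12345678 78563412

# Reading little-endian bytes as if they were big endian gives the wrong integer...
wrong = struct.unpack(">I", little)[0]
print(hex(wrong))                   # 0x78563412
# ...whereas reversing the bytes of a 4-byte string garbles the text.
print(b"JIM "[::-1])                # b' MIJ'
```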
Error Correcting code: Hamming code
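As an illustration of how a Hamming code locates an error, here is a sketch of the small Hamming(7,4) variant, chosen for brevity: 4 data bits are protected by 3 even-parity check bits at positions 1, 2 and 4, and after a single-bit error the XOR of the positions of all 1 bits (the syndrome) names the flipped position. The function names are hypothetical.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit Hamming codeword (even parity)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4          # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4          # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]   # bit positions 1..7

def hamming74_correct(c):
    """Fix a single-bit error; return (corrected codeword, error position or 0)."""
    c = list(c)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos - 1]:
            syndrome ^= pos    # XOR of the positions of all 1 bits
    if syndrome:
        c[syndrome - 1] ^= 1   # the syndrome names the flipped bit
    return c, syndrome

word = hamming74_encode([1, 0, 1, 1])
damaged = list(word)
damaged[4] ^= 1                 # flip bit position 5
fixed, pos = hamming74_correct(damaged)
print(word, pos, fixed == word)   # [0, 1, 1, 0, 0, 1, 1] 5 True
```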
Cache memory: the most heavily used memory words (instructions and data) are kept in a small, fast cache memory.
Locality principle forms the basis of all caching systems.
Main memories and caches are divided into fixed-size blocks: cache lines.
When a cache miss occurs, the entire cache line is loaded from main memory.
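The effect of cache lines and locality can be sketched with a tiny direct-mapped cache model. The line size (8 words), the number of lines (4) and the direct-mapped organization are assumptions chosen for illustration; as described above, a whole line is loaded on each miss.

```python
def simulate_cache(addresses, line_words=8, num_lines=4):
    """Count (hits, misses) for a direct-mapped cache with hypothetical sizes.

    On a miss the whole line containing the address is loaded.
    """
    tags = [None] * num_lines          # which memory line each cache slot holds
    hits = misses = 0
    for addr in addresses:
        line = addr // line_words      # memory line number
        slot = line % num_lines        # direct mapping: exactly one possible slot
        if tags[slot] == line:
            hits += 1
        else:
            misses += 1
            tags[slot] = line          # load the entire line from main memory
    return hits, misses

sequential = list(range(64))               # good spatial locality
scattered = [i * 32 for i in range(64)]    # every access touches a new line
print(simulate_cache(sequential))   # (56, 8)
print(simulate_cache(scattered))    # (0, 64)
```

Sequential access misses only once per line, because the rest of each line was brought in by that miss; the scattered pattern never reuses a line.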
Cache design issues:
- cache size
- size of a cache line
- organization of the cache: how does the cache keep track of which memory
words are currently being held?
- Whether instructions and data are kept in the same cache: unified vs. split caches
Split caches: instruction fetches and operand accesses can proceed in parallel
- Number of caches
Memory Packaging and Types:
SIMM - Single Inline Memory Module: transfer 32 bits at once
DIMM - Dual Inline Memory Module: transfer 64 bits at once
SO-DIMM (Small Outline DIMM): notebook computers
2.3 SECONDARY MEMORY
Memory Hierarchies
As we move down the hierarchy, three key parameters increase.
1. The access time gets bigger.
2. Storage capacity increases.
3. Number of bits per dollar increases.
Magnetic Disks
Track, sector, preamble, intersector gap, cylinder, rotational latency, seek
Outer tracks have more linear distance around them than the inner ones do. In
older drives, manufacturers used the maximum possible linear density on the
innermost track, and successively lower bit densities on tracks further out. If a
disk had 18 sectors per track, each one occupied 20 degrees of arc, no matter
which cylinder it was in.
Nowadays, a different strategy is used. Cylinders are divided into zones and the
number of sectors per track is increased in each zone, moving outward from the
innermost track. This makes keeping track of the information harder but
increases the drive capacity.
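To see why zoning increases capacity, here is a sketch with an entirely hypothetical geometry: 1000 tracks in four zones, with 18 sectors per track in the innermost zone. The old layout gives every track the innermost track's sector count; the zoned layout lets the outer zones hold more.

```python
# Hypothetical geometry: 1000 tracks in 4 zones, innermost zone holds 18 sectors/track.
TRACKS = 1000
zones = [(250, 18), (250, 21), (250, 24), (250, 27)]   # (tracks, sectors per track)

old_style = TRACKS * zones[0][1]    # every track limited to the innermost count
zoned = sum(tracks * spt for tracks, spt in zones)
print(old_style, zoned)             # 18000 22500
print(360 / zones[0][1])            # 20.0 degrees of arc per sector on the old layout
```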
Disk controller: a chip that controls the drive.
- accepting commands from software (READ, WRITE, FORMAT, ...)
- controlling the arm motion
- detecting and correcting errors
- converting 8-bit bytes read from memory into a serial bit stream (and vice versa)
- Some controllers also handle buffering of multiple sectors, caching sectors,
remapping bad sectors.
IDE Disks (Integrated Device Electronics)
1. First disk (IBM PC XT): a 10-MB Seagate disk. The controller was on a separate board. BIOS calling conventions were used.
2. IDE: the controller was closely integrated with the drive.
Problem: sectors start at address 1, while heads and cylinders start at 0.
Ability to control two drives.
3. EIDE: Extended IDE
LBA (Logical Block Addressing): sectors are numbered from 0 up to 2^28 - 1.
Ability to control four drives instead of two.
Data transfer rate increased from 4 MBps to 16.67 MBps.
4. ATA-3 (AT Attachment)
5. ATAPI-4: ATA Packet Interface – 33 MBps
ATAPI-5: 66 MBps
ATAPI-6: 100 MBps - 48-bit LBA addresses instead of 28-bit
ATAPI-7: radical break with the past
Serial ATA: 1 bit at a time over a 7-pin connector
- 150 MBps initially, rising to 1.5 GBps
- 0.5 volts for signaling which reduces power consumption
- Better airflow with round cables
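Two pieces of arithmetic behind the IDE/EIDE changes can be sketched: the conventional CHS-to-LBA mapping (note the `sector - 1`, because CHS sectors start at 1) and the capacity reachable with 28-bit and 48-bit LBA. The 512-byte sector size and the 16-head, 63-sector geometry are assumptions for illustration.

```python
SECTOR_BYTES = 512   # assumption: classic 512-byte sectors

def chs_to_lba(cylinder, head, sector, heads_per_cyl, sectors_per_track):
    """Conventional CHS -> LBA mapping. Sectors count from 1; cylinders and heads from 0."""
    if sector < 1:
        raise ValueError("CHS sector numbers start at 1")
    return (cylinder * heads_per_cyl + head) * sectors_per_track + (sector - 1)

def max_capacity_bytes(lba_bits):
    """Largest disk addressable with lba_bits-bit block addresses."""
    return 2 ** lba_bits * SECTOR_BYTES

print(chs_to_lba(0, 0, 1, 16, 63))       # 0: first sector on the disk
print(chs_to_lba(1, 0, 1, 16, 63))       # 1008: first sector of cylinder 1
print(max_capacity_bytes(28) // 2**30)   # 128 (GiB) with 28-bit LBA
print(max_capacity_bytes(48) // 2**50)   # 128 (PiB) with 48-bit LBA
```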
SCSI Disks (Small Computer System Interface)
Not different from IDE disks in terms of how their cylinders, tracks and sectors
are organized, but they have a different interface and much higher transfer rates.
Because of their high transfer rates, SCSI disks are used in SUN, HP and SGI workstations, Macintosh computers, and high-end Intel PCs used as network servers.
More than just a hard disk interface: it is a bus to which a SCSI controller and
up to seven devices can be attached.
Controller issues commands to disks and other peripherals. Commands and
responses occur in phases, using various control signals to delineate the phases
and arbitrate bus accesses when multiple devices are trying to use the bus at the
same time. SCSI allows all the devices to run at once; IDE and EIDE do not allow this.
RAID (Redundant Array of Inexpensive Disks)
- SCSI disks are commonly used because of their good performance.
- Multiple disks appear as a single disk to the software.
- Striping: a technique for distributing data over multiple drives.
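A minimal sketch of striping, assuming a stripe unit of one block and simple round-robin placement (real RAID levels differ in stripe size and redundancy): consecutive logical blocks land on different drives, so a large request can be served by all drives at once.

```python
def stripe_location(logical_block, num_disks):
    """Map a logical block to (disk, block-on-disk) with round-robin striping.

    Assumes a stripe unit of one block (a hypothetical simplification).
    """
    return logical_block % num_disks, logical_block // num_disks

layout = [stripe_location(b, 4) for b in range(8)]
print(layout)
# [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1), (2, 1), (3, 1)]
```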
Solid-State Disks: nonvolatile flash memory
Advantage: faster than magnetic disks because there is no seek time (no moving parts)
Disadvantages: cost, higher failure rate
Wear leveling: every time a new block is written, the destination block is
reassigned to a new SSD block that has not been recently written.
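Wear leveling can be sketched as a remapping table. The greedy policy below, which always writes to the free physical block with the fewest erasures, is an illustrative assumption rather than any particular SSD's algorithm; the class and method names are hypothetical.

```python
class WearLevelingMap:
    """Toy flash translation layer: each logical write goes to the least-worn free block.

    The greedy policy is an illustrative assumption, not a real SSD's algorithm.
    """

    def __init__(self, physical_blocks):
        self.erase_counts = [0] * physical_blocks
        self.free = set(range(physical_blocks))
        self.mapping = {}                      # logical block -> physical block

    def write(self, logical):
        # Pick the free block with the fewest erasures (ties broken by block number).
        target = min(self.free, key=lambda p: (self.erase_counts[p], p))
        self.free.remove(target)
        old = self.mapping.get(logical)
        if old is not None:
            self.erase_counts[old] += 1        # the old copy is erased and freed
            self.free.add(old)
        self.mapping[logical] = target
        return target

ftl = WearLevelingMap(physical_blocks=4)
for _ in range(6):
    ftl.write(0)            # rewrite the same logical block repeatedly
print(ftl.erase_counts)     # [2, 1, 1, 1]
```

Rewriting one logical block six times spreads the five resulting erasures over all four physical blocks instead of wearing out a single one.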
CD-ROMs
CD-Recordables
CD-ROM XA: incremental writes
CD-Rewritables
DVD (Digital Versatile Disk) - 4.7 GB/17 GB, 1.4 MBps
Blu-ray - 25/50 GB, 4.5 MBps
2.4 INPUT/OUTPUT
1. BUSES
Motherboard:
- CPU chip
- some slots into which DIMM modules can be clicked
- a bus etched along its length
- sockets into which the edge connectors of I/O boards can be inserted
Each I/O device has two parts: 1. Controller
2. I/O device
The controller is usually contained on a board plugged into a free slot, except
for those controllers that are not optional (keyboard), which are sometimes
located on the motherboard. The controller connects to its device by a cable
attached to a connector on the back of the box.
What does a controller do?
1. Controls its I/O device and handles bus access for it.
2. Breaks the bit stream up into units and writes each unit into memory.
3. A controller that reads or writes data to or from memory without CPU intervention is called a DMA (Direct Memory Access) controller.
4. Interrupts the CPU when the transfer is complete.
What happens if the CPU and an I/O controller want to use the bus at the same
time?
A bus arbiter decides who goes next. In general, I/O devices are given
preference. Why?
!! Disks and other moving devices cannot be stopped, so forcing them to wait would result in lost data.
Cycle stealing: When no I/O is in progress, the CPU can have all the bus cycles for itself to reference memory. When some I/O device is also running, that device requests and is granted the bus whenever it needs it, "stealing" cycles from the CPU and slowing it down.
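Bus arbitration with I/O priority and cycle stealing can be sketched as follows. The rule that a requesting device always wins, the tie-break by list order, and the schedule of disk requests are assumptions for illustration.

```python
def arbitrate(io_requests):
    """Grant one bus cycle: any requesting I/O device beats the CPU."""
    # Hypothetical fixed priority: the first device in the list wins a tie.
    return io_requests[0] if io_requests else "CPU"

def run_bus(cycles, io_schedule):
    """io_schedule maps a cycle number to the devices requesting the bus then.

    The CPU is assumed to want the bus in every cycle.
    """
    return [arbitrate(io_schedule.get(c, [])) for c in range(cycles)]

grants = run_bus(8, {2: ["disk"], 5: ["disk"]})
print(grants)
# ['CPU', 'CPU', 'disk', 'CPU', 'CPU', 'disk', 'CPU', 'CPU']
print("cycles stolen from the CPU:", sum(g != "CPU" for g in grants))   # 2
```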
As CPUs, memories and I/O devices got faster, a problem arose: the bus could no longer handle the load presented. Yet the bus could not simply be replaced:
- People often upgraded their CPU but wanted to move their printer, scanner and modem to the new system.
- A huge industry had grown up around providing a vast range of I/O devices for the IBM PC bus.
IBM produced a new, faster bus for the successor to the IBM PC, the PS/2 range. The old bus: ISA (Industry Standard Architecture).
PCI and PCIe Buses
Despite the market pressure not to change anything, the old bus was really too
slow. Other companies developed machines with multiple buses, one of which
was the old ISA bus, or its backward compatible successor, the EISA
(Extended ISA) bus. The winner was PCI (Peripheral Component
Interconnect) designed by Intel. Intel put all patents in the public domain.
PCI was replaced by PCIe (PCI Express) which is a radical change from the PCI
bus.
- 1-bit-wide serial connections (lanes), which run at much higher speeds than 8-, 16-, 32- or 64-bit parallel transfers
PCI: 66 MHz clock rate, 528 MBps transfer rate with 64-bit transfers
PCIe 3.0: 8 Gbps raw signaling rate per lane, about 1 GBps per lane in each direction
- Point-to-point communication allows parallel transfers. On PCI, all devices listen on the same shared bus.
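The transfer rates above follow from simple arithmetic. The sketch assumes one transfer per clock cycle for the parallel bus and, for PCIe 3.0, 8 gigatransfers per second per lane with 128b/130b line encoding; the function names are illustrative.

```python
def parallel_bus_mbps(clock_mhz, width_bits):
    """Peak rate of a parallel bus: one transfer of width_bits per clock cycle (MB/s)."""
    return clock_mhz * width_bits / 8

def pcie3_lane_mbps(gigatransfers_per_s=8.0):
    """PCIe 3.0 lane: 1 bit per transfer, 128b/130b encoding overhead (MB/s, one direction)."""
    return gigatransfers_per_s * 1000 * (128 / 130) / 8

print(parallel_bus_mbps(66, 64))   # 528.0: 64-bit PCI at 66 MHz
print(parallel_bus_mbps(33, 32))   # 132.0: original 32-bit PCI at 33 MHz
print(round(pcie3_lane_mbps()))    # 985: roughly 1 GB/s per lane
```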
2. TERMINALS
Computer terminal consists of two parts: a keyboard and a monitor.
Mainframe world: these parts are often integrated into a single device and
attached to the main computer by a serial line or over a telephone line.
PC world: These devices are independent.
Keyboards: an interrupt is generated both when a key is depressed and when it is released.
Touch Screens: infrared, resistive, multitouch
Flat Panel Displays:
TFT display: a tiny switching element at each pixel position
Color displays: optical filters separate white light into red, green and blue components.
OLED display (Organic Light Emitting Diode): consists of layers of electrically charged organic molecules sandwiched between two electrodes.
Video RAM: on the display’s controller card. It holds bitmaps for screen
images.
Indexed (palette) color vs. 3-byte RGB model.
Requires a lot of bandwidth: a 1920x1080 display at 25 fps with 3 bytes per pixel needs 155 MB/sec, more than the original PCI bus (132 MB/sec) can handle, but PCIe handles it with ease.
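The 155 MB/sec figure can be reproduced as follows, assuming 3 bytes per pixel (the RGB model) and no compression.

```python
width, height = 1920, 1080
bytes_per_pixel = 3      # 3-byte RGB
fps = 25

frame_bytes = width * height * bytes_per_pixel
rate = frame_bytes * fps
print(frame_bytes)       # 6220800 bytes (about 6.2 MB) per frame
print(rate / 1e6)        # 155.52 MB/s
```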
3. MICE
Mechanical mice
Optical mice
Optomechanical mice
4. GAME CONTROLLERS
Based on accelerometers and cameras
5. PRINTERS
5.1 Monochrome Printers: matrix printers, ink jet printers, laser printers
5.2 Color Printers: based on the CMYK (cyan, magenta, yellow, black) principle.
Color ink jet printers, solid ink printers, color laser printers, wax printers, dye sublimation printers.
6. TELECOMMUNICATION EQUIPMENT
Modems:
Amplitude modulation, Frequency modulation, Phase modulation
Baud rate, full duplex, simplex, half duplex
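The difference between baud rate and bit rate can be sketched in one line: each symbol carries log2(number of distinct symbols) bits. The 2400-baud figure and the symbol counts below are illustrative values.

```python
import math

def bit_rate(baud, symbols):
    """Bits per second = symbols per second x bits carried by each symbol."""
    return baud * math.log2(symbols)

print(bit_rate(2400, 2))    # 2400.0: one bit per symbol
print(bit_rate(2400, 16))   # 9600.0: 16 distinct symbols carry 4 bits each
```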
Digital Subscriber Line
Due to the high-bandwidth Internet service offered by cable TV and satellite companies, telcos needed a more competitive product than dialup lines.
On dialup lines, a filter at the telco's end office cuts off frequencies outside the voice band, which limits the data rate. With ADSL, this filter is removed.
4-8 Mbps is possible.
An ADSL modem acts as 250 modems operating in parallel at different frequencies.
Internet over Cable
Each telephone user has a private wire to the telco office. With cable, however, hundreds of users share the same cable to the headend.
A cable modem is needed for Internet Access.
When a cable modem is turned on, it scans for a packet from the headend. Upon
finding such a packet, the new modem announces its presence on one of the
upstream channels. The headend assigns downstream and upstream channels.
Upstream channels:
Ranging to determine distance from the headend for proper timing.
Upstream channels are divided into minislots. Each upstream packet must
fit in 1 or more consecutive minislots. The headend announces the start of
a new round of minislots periodically. The starting gun is not heard at all
modems simultaneously due to the propagation time down the cable.
Modems contend for the minislots; if two modems pick the same minislot, a collision occurs.
Packets go to the headend, which relays them over a dedicated channel to the cable company's main office and then to the ISP.
Downstream channels: single sender, no contention
Statistical time-division multiplexing
Protection with Reed Solomon codes
Encryption for security.
7. DIGITAL CAMERAS
The film in original cameras is replaced by CCDs (Charge-Coupled Devices).
When light strikes a CCD, it acquires an electrical charge. The charge can be read off by an ADC (analog-to-digital converter).
One pixel is made up of four CCDs (one red, one blue and two green).
Autofocus: analyze the high-frequency information in the image and move the lens until it is maximized, giving the most detail.
Exposure: measure the light falling on CCDs, make adjustments to have the
light intensity fall in the middle of the CCDs’ range.
White balance: for color correction.
The CCD values are read out and stored (as RGB data) in the camera's RAM.
The camera's software applies white balance color correction, applies an algorithm for noise reduction, and sharpens the image.
Compression
Transfer to a PC over a USB or FireWire cable.
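The autofocus idea described above can be sketched as a search over lens positions for the image with the most high-frequency content. The sharpness metric (sum of squared differences between neighboring pixels) is one common choice assumed here, not necessarily what any camera uses, and `fake_capture` is a hypothetical stand-in for the sensor.

```python
def sharpness(image):
    """Focus metric: sum of squared differences between horizontal neighbors.

    In-focus images have more high-frequency detail, so this value is larger.
    """
    return sum((row[i + 1] - row[i]) ** 2 for row in image for i in range(len(row) - 1))

def autofocus(capture, lens_positions):
    """Try each lens position and keep the one whose image has the most detail."""
    return max(lens_positions, key=lambda pos: sharpness(capture(pos)))

def fake_capture(pos):
    """Hypothetical sensor: an edge that blurs the farther the lens is from position 3."""
    blur = abs(pos - 3) + 1            # edge spreads over more pixels when out of focus
    step = 240 // blur
    row = [min(240, max(0, (x - 4) * step)) for x in range(12)]
    return [row] * 4

print(autofocus(fake_capture, range(7)))   # 3
```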
8. CHARACTER CODES
ASCII - American Standard Code for Information Interchange
IS 646: adds 128 characters to ASCII
IS 8859: introduced the concept of a code page
Unicode: 16-bit code points (in its original design)
UTF-8: variable-length codes (1-4 bytes)
- dominant character set on the WWW
- ASCII characters: 1 byte long
- self-synchronizing
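The UTF-8 properties listed above can be seen in a small encoder sketch: ASCII stays 1 byte, longer code points use a lead byte announcing the length followed by continuation bytes that always start with the bits 10, which is what makes the code self-synchronizing. The function name is hypothetical; the result is checked against Python's built-in `str.encode("utf-8")`.

```python
def utf8_encode(code_point):
    """Encode one Unicode code point as UTF-8 bytes (1 to 4 bytes).

    Surrogate code points are not rejected in this sketch.
    """
    if code_point < 0x80:        # 7 bits: plain ASCII, 1 byte
        return bytes([code_point])
    if code_point < 0x800:       # 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:     # 16 bits: 1110xxxx plus 2 continuation bytes
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x110000:    # 21 bits: 11110xxx plus 3 continuation bytes
        return bytes([0xF0 | (code_point >> 18),
                      0x80 | ((code_point >> 12) & 0x3F),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    raise ValueError("code point out of Unicode range")

sample = "A\u00e9\u20ac\U0001F600"
print([len(utf8_encode(ord(c))) for c in sample])          # [1, 2, 3, 4]
# Continuation bytes all start with 10, so a decoder can find character boundaries.
print([b >> 6 == 0b10 for b in utf8_encode(0x20AC)])       # [False, True, True]
```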