198:211 Computer Architecture

Topics:
System I/O
Buses
Disk-based storage in computers
Memory/storage hierarchy
Combining many technologies to balance costs/benefits
Recall the memory hierarchy and virtual memory lectures
Memory/storage hierarchies
Performance
Capacity
Balancing performance with cost
Small memories are fast but expensive
Large memories are slow but cheap
Exploit locality to get the best of both worlds
locality = re-use/nearness of accesses
allows most accesses to use small, fast memory
An Example Memory Hierarchy
Smaller, faster, and costlier (per byte) storage devices at the top; larger, slower, and cheaper (per byte) at the bottom:
L0: registers. CPU registers hold words retrieved from L1 cache.
L1: on-chip L1 cache (SRAM). L1 cache holds cache lines retrieved from the L2 cache.
L2: off-chip L2 cache (SRAM). L2 cache holds cache lines retrieved from main memory.
L3: main memory (DRAM). Main memory holds disk blocks retrieved from local disks.
L4: local secondary storage (local disks). Local disks hold files retrieved from disks on remote network servers.
L5: remote secondary storage (tapes, distributed file systems, Web servers).
Disk-based storage in computers
Memory/storage hierarchy
Combining many technologies to balance costs/benefits
Recall the memory hierarchy and virtual memory lectures
Persistence
Storing data for lengthy periods of time
DRAM/SRAM is “volatile”: contents lost if power lost
Disks are “non-volatile”: contents survive power outages
Disks are block-access devices (reads and writes happen in whole blocks)
Conventional magnetic disks
Newer: Solid state disks
What’s Inside A Disk Drive?
[Figure: arm, spindle, platters, actuator, electronics, and SCSI connector. Image courtesy of Seagate Technology]
Disk Electronics
Just like a small computer: a processor, memory, and a network interface
[Figure: board with control processor, cache memory, control ASIC, and connections to the disk and motor]
Disk “Geometry”
Disks contain platters, each with two surfaces
Each surface organized in concentric rings called tracks
Each track consists of sectors separated by gaps
[Figure: a disk surface with spindle; concentric tracks (e.g., track k), each divided into sectors separated by gaps]
Disk Geometry (Multiple-Platter View)
Aligned tracks form a cylinder
[Figure: cylinder k formed by aligned tracks on surfaces 0-5 of platters 0-2, all rotating on one spindle]
Disk Structure
[Figure: labeled diagram showing read/write head, arm, upper and lower surfaces of a platter, cylinder, track, sector, and actuator]
Disk Operation (Single-Platter View)
The disk surface spins at a fixed rotational rate
The read/write head is attached to the end of the arm and flies over the disk surface on a thin cushion of air
By moving radially, the arm can position the read/write head over any track
Disk Operation (Multi-Platter View)
Read/write heads move in unison from cylinder to cylinder
Disk Structure - top view of single platter
Surface organized into tracks
Tracks divided into sectors
Disk Access
Head in position above a track
Disk Access
Rotation is counter-clockwise
Disk Access – Read
About to read blue sector
Disk Access – Read
After reading blue sector
Disk Access – Read
Red request scheduled next
Disk Access – Seek
Seek to red’s track
Disk Access – Rotational Latency
Wait for red sector to rotate around
Disk Access – Read
Complete read of red sector
Disk Access – Service Time Components
Service time = Seek + Rotational Latency + Data Transfer
Disk Access Time
Average time to access a specific sector approximated by:
Taccess = Tavg seek + Tavg rotation + Tavg transfer
Seek time (Tavg seek)
Time to position heads over the cylinder containing the target sector
Typical Tavg seek = 3-5 ms
Rotational latency (Tavg rotation)
Time waiting for first bit of target sector to pass under r/w head
Tavg rotation = 1/2 x 1/RPM x 60 secs/1 min
e.g., 3 ms for a 10,000 RPM disk
Transfer time (Tavg transfer)
Time to read the bits in the target sector
Tavg transfer = 1/RPM x 1/(avg # sectors/track) x 60 secs/1 min
e.g., 0.006 ms for a 10,000 RPM disk with 1,000 sectors/track
given 512-byte sectors, ~85 MB/s data transfer rate
Disk Access Time Example
Given:
Rotational rate = 7,200 RPM
Average seek time = 5 ms
Avg # sectors/track = 1000
Derived average time to access random sector:
Tavg rotation = 1/2 x (60 secs/7200 rotations) x 1000 ms/sec ≈ 4 ms
Tavg transfer = (60 secs/7200 rotations) x (1 rotation/1000 sectors) x 1000 ms/sec ≈ 0.008 ms
Taccess = 5 ms + 4 ms + 0.008 ms = 9.008 ms
Time to read sector: 0.008 ms
Important points:
Access time dominated by seek time and rotational latency
First bit in a sector is the most expensive, the rest are free
SRAM access time is about 4 ns/doubleword, DRAM about 60 ns
~100,000 times longer to access a word on disk than in DRAM
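The numbers above can be checked with a short Python script; this is a verification sketch, and note the exact average rotational latency is 4.17 ms, which the 4 ms figure above rounds down:

```python
# Disk access time for the example: 7200 RPM, 5 ms avg seek, 1000 sectors/track.
RPM = 7200
SEEK_MS = 5.0
SECTORS_PER_TRACK = 1000

rotation_ms = 60.0 / RPM * 1000              # one full rotation: ~8.33 ms
t_rotation = 0.5 * rotation_ms               # avg rotational latency: ~4.17 ms
t_transfer = rotation_ms / SECTORS_PER_TRACK # one sector: ~0.008 ms
t_access = SEEK_MS + t_rotation + t_transfer

print(f"Tavg rotation = {t_rotation:.2f} ms")  # ~4.17 ms (4 ms when rounded)
print(f"Tavg transfer = {t_transfer:.3f} ms")  # ~0.008 ms
print(f"Taccess       = {t_access:.2f} ms")    # ~9.18 ms (9.008 ms with the rounded 4 ms)
```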
Disk Scheduling
The operating system is responsible for using hardware efficiently; for the disk drives, this means fast access time and high disk bandwidth.
Access time has two major components
Seek time is the time for the disk arm to move the heads to the cylinder containing the desired sector.
Rotational latency is the additional time waiting for the disk to rotate the desired sector to the disk head.
Minimize seek time
Seek time ≈ seek distance
Disk bandwidth is the total number of bytes transferred, divided by the total time between the first request for service and the completion of the last transfer.
Disk Scheduling
Several algorithms exist to schedule the servicing of disk I/O requests.
We illustrate them with a request queue (0-199).
98, 183, 37, 122, 14, 124, 65, 67
Head pointer 53
FCFS
Illustration shows total head movement of 640 cylinders.
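The 640-cylinder total can be reproduced with a short Python sketch of FCFS, using the queue and head position given above:

```python
# FCFS: service requests strictly in arrival order, no reordering at all.
queue = [98, 183, 37, 122, 14, 124, 65, 67]
head = 53

total = 0
pos = head
for cyl in queue:
    total += abs(cyl - pos)  # head travel for this request
    pos = cyl

print(total)  # 640
```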
SSTF
Selects the request with the minimum seek time from the current head position.
SSTF scheduling is a form of SJF scheduling; may cause starvation of some requests.
Illustration shows total head movement of 236 cylinders.
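A minimal SSTF simulation in Python (greedy nearest-cylinder choice) reproduces the 236-cylinder total:

```python
# SSTF: always service the pending request closest to the current head position.
queue = [98, 183, 37, 122, 14, 124, 65, 67]
head = 53

pending = list(queue)
total, pos, order = 0, head, []
while pending:
    nxt = min(pending, key=lambda c: abs(c - pos))  # minimum seek distance
    total += abs(nxt - pos)
    pos = nxt
    order.append(nxt)
    pending.remove(nxt)

print(order)  # [65, 67, 37, 14, 98, 122, 124, 183]
print(total)  # 236
```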
SSTF (Cont.)
SCAN
The disk arm starts at one end of the disk and moves toward the other end, servicing requests until it gets to the other end of the disk, where the head movement is reversed and servicing continues.
Sometimes called the elevator algorithm.
Illustration shows total head movement of 208 cylinders.
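A quick Python check of the arithmetic: a strict SCAN that sweeps all the way to cylinder 0 before reversing moves 236 cylinders, while reversing at the innermost request (the LOOK variant) gives the 208 quoted above:

```python
# SCAN (elevator): head at 53 sweeps toward cylinder 0 servicing requests,
# then reverses and sweeps up toward the highest pending request.
queue = [98, 183, 37, 122, 14, 124, 65, 67]
head = 53

down = sorted(c for c in queue if c <= head)  # [14, 37]
up = sorted(c for c in queue if c > head)     # [65, 67, 98, 122, 124, 183]

# Strict SCAN: travel to cylinder 0, then up to the highest request.
scan_total = (head - 0) + (max(up) - 0)
# LOOK variant: reverse at the innermost pending request instead of cylinder 0.
look_total = (head - min(down)) + (max(up) - min(down))

print(scan_total)  # 236
print(look_total)  # 208
```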
SCAN (Cont.)
C-SCAN
Provides a more uniform wait time than SCAN.
The head moves from one end of the disk to the other, servicing requests as it goes. When it reaches the other end, however, it immediately returns to the beginning of the disk, without servicing any requests on the return trip.
Treats the cylinders as a circular list that wraps around from the last cylinder to the first one.
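The C-SCAN service order for this queue (head at 53, sweeping toward the high end) can be sketched as:

```python
# C-SCAN: sweep from the current position toward the high end, then jump back
# to the low end and continue; requests are only serviced on the upward sweep.
queue = [98, 183, 37, 122, 14, 124, 65, 67]
head = 53

up = sorted(c for c in queue if c >= head)       # serviced on the first sweep
wrapped = sorted(c for c in queue if c < head)   # serviced after the wrap-around

order = up + wrapped
print(order)  # [65, 67, 98, 122, 124, 183, 14, 37]
```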
C-SCAN (Cont.)
C-LOOK
Version of C-SCAN
Arm only goes as far as the last request in each direction, then reverses direction immediately, without first going all the way to the end of the disk.
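A short Python sketch of C-LOOK for the same queue; the head-travel total here counts the long wrap-around seek:

```python
# C-LOOK: like C-SCAN, but the arm reverses at the last request in each
# direction instead of traveling to the physical ends of the disk.
queue = [98, 183, 37, 122, 14, 124, 65, 67]
head = 53

up = sorted(c for c in queue if c >= head)       # [65, 67, 98, 122, 124, 183]
wrapped = sorted(c for c in queue if c < head)   # [14, 37]

# Travel: up to the highest request, wrap-around seek down to the lowest
# pending request, then up again to finish the wrapped requests.
total = (max(up) - head) + (max(up) - min(wrapped)) + (max(wrapped) - min(wrapped))
print(up + wrapped)  # [65, 67, 98, 122, 124, 183, 14, 37]
print(total)         # 322 (counting the wrap-around seek)
```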
C-LOOK (Cont.)
Selecting a Disk-Scheduling Algorithm
SSTF is common and has a natural appeal
SCAN and C-SCAN perform better for systems that place a heavy load on the disk.
Performance depends on the number and types of requests.
Requests for disk service can be influenced by the file-allocation method.
The disk-scheduling algorithm should be written as a separate module of the operating system, allowing it to be replaced with a different algorithm if necessary.
Either SSTF or LOOK is a reasonable choice for the default algorithm.
Solid state disks
Solid state disks
An array of flash memory devices
Emulates conventional hard disk drive or HDD
No moving parts
Consumes less power than HDD
Small reads (< 4K) are 20x faster
Average reads comparable to HDD reads
Writes are still slow (roughly half the speed of an HDD)
Capacity/cost (today)
$0.15/GB for HDD, $2-3/GB for SSD
Solid State Drive
The interface to the system looks like an HDD
Read, write and erase
Memory consists of blocks
Each block contains several pages
Each page is 2K or 4K in size
Unit of read/write is a page
Need to erase before write!
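The erase-before-write rule can be illustrated with a toy model of one flash block; the 64-page block size is an illustrative assumption, not a property of any specific device:

```python
# Toy model of one flash block: a page can be written only when erased,
# and erasure works on whole blocks, never on individual pages.
class FlashBlock:
    def __init__(self, pages=64):
        self.pages = [None] * pages  # None = erased (writable)

    def write(self, page, data):
        if self.pages[page] is not None:
            raise ValueError("page not erased: must erase the whole block first")
        self.pages[page] = data

    def erase(self):
        self.pages = [None] * len(self.pages)  # whole-block erase

blk = FlashBlock()
blk.write(0, b"hello")
try:
    blk.write(0, b"world")  # in-place update is rejected
except ValueError as e:
    print("rejected:", e)
blk.erase()                 # erase the whole block...
blk.write(0, b"world")      # ...then the page is writable again
```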
I/O or input and output
In addition to memory, data transfers need to occur between the CPU and input/output devices
When reading from memory, a byte or several bytes can be transferred from memory to a register using mov address, %eax or mov %eax, address
I/O devices are also sources or destinations for bytes of data
I/O devices can be viewed just as memory
I/O devices can be viewed as separate from memory
I/O programming
There are two ways of addressing I/O devices
Memory mapped I/O
The address space is divided between memory and I/O devices
Higher order addresses can refer to devices
Lower order addresses can refer to memory
mov address, %eax will fetch data from I/O or memory based on the address
E.g., memory range from 0000 to BFFF
I/O range from C000 to CFFF
Device or memory selection based on address range
Different devices can have different addresses in the I/O range
Memory mapped I/O
[Figure: CPU, main memory, disk, keyboard, display, and network all attached to one memory-I/O bus]
Sending or receiving data to/from an I/O device is a memory transfer instruction (mov) with the right address
Main memory not selected when address is in I/O range
Advantage
Uniformity of programming, the same mov works for I/O and memory
Disadvantage
Memory address space is reduced
I/O mapped I/O
Memory and I/O devices use distinct address spaces
Isolated I/O
Two separate instructions to address I/O devices
A separate code or control signal based on the opcode will select memory or I/O
IN for input
OUT for output
mov for memory access
[Figure: two separate address maps: memory from 0000 to FFFF, I/O from 0000 to 00FF]
Less flexible for programming
Interfacing with I/O
Many devices, with varying speeds, complexity
CPU/bus shared among all peripherals and memory
CPU should be able to select a device and transfer data to the device
Interpretation of data left to each device
Unlike memory, a device needs to be ready before initiating a transfer
All of this is handled by the I/O module
I/O module
[Figure: I/O module with data, address, and control lines on the CPU side, and data, status, and I/O logic paths to the device]
CPU selects device by means of address
Data corresponds to instructions for device
Each device has its own set of commands
Status of device can be checked by reading status registers
Data transfer schemes
There are two schemes
Programmed data transfers
CPU transfers data from I/O devices onto registers
Useful for small data transfers
Direct memory access or DMA
Device or I/O module directly transfers data to memory
Useful for large block transfers
Programmed I/O
Programmed I/O can be further classified as
Synchronous transfer
Asynchronous transfer
Interrupt driven transfer
All of the above can be used to interface with different I/O devices
Require special hardware features in the CPU
Synchronous transfer
Simplest among three
CPU and I/O speed match
Transfer a byte, word, or double word
Memory mapped
mov %eax, 2
Address of device port is 2
I/O mapped
mov $2, %dx
out %eax, %dx
Similarly for an input device,
Memory mapped: mov 3, %eax or
I/O mapped
mov $3, %dx
in %dx, %eax
Asynchronous transfer
I/O devices slower
Instruct device to be ready
Wait until device ready
Device has status flag/register
Busy waiting
Waste of CPU resources
[Flowchart: request device to get ready, then poll the READY flag; if not ready, keep polling; if ready, issue the data transfer command]
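The busy-wait loop above might be modeled as follows; the Device class and its ready-on-third-poll behavior are illustrative stand-ins for a real status register:

```python
# Busy-wait (polling) transfer: the CPU loops on the device status flag,
# doing no useful work until the device reports ready.
class Device:
    def __init__(self):
        self._polls = 0
    def ready(self):             # stand-in for reading a status register
        self._polls += 1
        return self._polls >= 3  # pretend it becomes ready on the 3rd poll
    def transfer(self, data):
        return f"sent {data!r}"

def programmed_io_write(dev, data):
    while not dev.ready():       # busy waiting wastes CPU cycles here
        pass
    return dev.transfer(data)    # device ready: issue the transfer command

print(programmed_io_write(Device(), b"byte"))
```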
Interrupt driven I/O
Processor need not wait for slow device
Processor continues with other instructions
Device interrupts processor when ready
Interrupt Service Routine
CPU transfers word from device to register
CPU writes word from register to memory
[Flowchart: request device, then keep fetching and executing instructions; when INT goes high, call the interrupt service routine, else continue]
DMA or direct memory access
Bulk data transfers
Direct device-to-memory transfer
Memory bus is contended for by the CPU and the DMA unit
During DMA
Either the CPU is in a hold state
Or
Cycle stealing: CPU and DMA accesses are interleaved
[Flowchart: CPU sends R/W command, starting address, and #bytes to the DMA device, then keeps fetching and executing instructions until the DMA interrupt signals completion]
System bus
[Figure: CPU, main memory, disk, keyboard, display, and network on a shared memory-I/O bus]
A bus is a shared communication link
Contains an address bus and a data bus
Each bus is a set of wires
A bus can transfer several bits between devices connected by the bus
Bus width determines the number of bits transferred in a cycle
Characteristics of bus
Several devices can be connected
Single bus for all devices – cost sharing
Added/removed without affecting others
I/O devices can be connected to other devices following the same bus standard
Disadvantages:
Bus contention
Speed of I/O devices determined by bus speed
Bus speed determined by number of devices
Slower devices impact others
Bus architecture
[Figure: bus master and bus slave connected by the bus; master issues commands, data can go either way]
Any interaction consists of two steps
1. Issue command 2. Transfer data
Master initiates
Issues command, starting address, #bytes
Slave responds
Sends or receives data as per command from master
Computer buses
Modern computers have several I/O devices
Varying speeds
A simple linear bus will not suffice
Modern computers have hierarchical buses
Bus is split into different segments
CPU-Memory one bus
CPU-I/O devices another bus
CPU-cache – another bus
Backplane bus
System bus- Memory and I/O
Single bus for memory and I/O
Cheap
Slow and bus becomes bottleneck
Two-bus systems
Processor-Memory bus
Bus Bridge
I/O bus
Processor-memory traffic on one bus
I/O devices connected by a bridge
Bridge can connect to different kinds of buses
Traffic is isolated
I/O buses can provide expansion slots for devices
hierarchical-bus systems
[Figure: CPU with a backside cache bus to the L2 cache, a processor-memory bus to main memory, and bus bridges fanning out to multiple I/O buses]
A single bus bridge connects to the processor-memory bus
Other I/O buses connected to this bus bridge (tree)
CPU-memory sees little contention
Costly
Examples of buses
ISA bus – Industry Standard bus
Old technology
8 MHz, < 1 byte transfer/cycle, bus B/W 5.3 MB/sec (1 MB = 1048576 B)
EISA bus – Extended ISA
Old technology
8 MHz, 4 byte transfer, bus B/W 32 MB/sec
PCI bus – Peripheral Component Interconnect
Speeds up to 132 MB/s
Bus speed of 33 MHz, 4 bytes/transfer
PCI popularized Plug and Play
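The quoted peak bandwidths follow directly from clock rate times bytes per transfer; a quick check (ISA's 5.3 MB/s is below its theoretical peak because a transfer takes multiple cycles):

```python
# Peak bus bandwidth = clock rate (transfers/sec) x bytes per transfer.
def peak_bw_mb(mhz, bytes_per_transfer):
    return mhz * bytes_per_transfer  # MHz x bytes = MB/s (10^6 bytes/sec)

print(peak_bw_mb(8, 4))    # EISA: 32 MB/s
print(peak_bw_mb(33, 4))   # PCI: 132 MB/s
print(peak_bw_mb(133, 8))  # PCI-X: 1064 MB/s, about 1 GB/s
```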
Examples of buses
PCI-X extended PCI
133 MHz, 8 bytes/transfer, 1064 MB/sec or ~1 GB/sec
Used to connect gigabit ethernet, high speed disks
SCSI (Small Computer System Interface)
Capable of handling internal/external peripherals
Speed anywhere from 80 to 640 MB/s
Many types of SCSI
Fast SCSI
Ultra SCSI
Ultra wide SCSI
Parallel vs serial (point-to-point) bus
[Figure: left, a shared parallel I/O bus behind a bus bridge on the processor-memory bus; right, serial point-to-point links from the CPU to each I/O device]
Parallel bus
Bus shared among devices
Bus arbitration is slow
Example: PCI, SCSI
Serial I/O
Point-to-point links connected directly to CPU
Requires lots of additional high speed hardware
Examples: SATA, USB, firewire
USB
1.0
plug-and-play
Full speed USB devices signal at 12 Mb/s
Low speed devices use a 1.5 Mb/s subchannel
Up to 127 devices chained together
2.0
Data rate of 480 Mb/s
Firewire (apple)
High speed serial port
400 Mb/s transfer rate
30 times faster than USB 1.0
plug-and-play
Intel Bus
North bridge and South bridge bus
http://www.testbench.in/pcie_sys_2.PNG
http://www.yourdictionary.com/images/computer/CHIPSET.GIF