L11_PC_2010

Computer Architecture
PC Structure and Peripherals
Dr. Lihu Rappoport
Computer Architecture 2010 – PC Structure and Peripherals
Memory
Technology Trends
            Capacity          Speed
Logic       2× in 3 years     2× in 3 years
DRAM        4× in 3 years     1.4× in 10 years
Disk        2× in 3 years     1.4× in 10 years
[Figure: CPU-DRAM Memory Gap (latency). Performance on a log scale (1 to 1000) versus time (1980 to 2000), with one curve for CPU and one for DRAM.]
SRAM vs. DRAM

Random Access: access time is the same for all locations
                 DRAM – Dynamic RAM             SRAM – Static RAM
Refresh          Regular refresh (~1% time)     No refresh needed
Address          Address muxed: row + column    Address not multiplexed
Access           Not true "Random Access"       True "Random Access"
Density          High (1 transistor/bit)        Low (6 transistors/bit)
Power            Low                            High
Speed            Slow                           Fast
Price/bit        Low                            High
Typical usage    Main memory                    Cache
Basic DRAM chip
[Figure: basic DRAM chip. The memory address bus (Addr) feeds a row latch (strobed by RAS#) and a column latch (strobed by CAS#); the row address decoder and the column address decoder select cells in the memory array, which connects to Data.]
Addressing sequence
- Row address and then RAS# asserted
- RAS# to CAS# delay
- Column address and then CAS# asserted
- DATA transfer
Addressing sequence
[Figure: timing diagram with signals RAS#, CAS#, A[0:7] (Row i, Col n, Row j) and Data (Data n), annotated with tRAC (access time), RAS/CAS delay, CL (CAS latency) and precharge delay.]
Access sequence
- Put row address on address bus and assert RAS#
- Wait for RAS# to CAS# delay (tRCD)
- Put column address on address bus and assert CAS#
- DATA transfer
- Precharge
DRAM Timing

CAS Latency: #clock cycles to access a specific column of data
- #clock cycles from the moment the memory controller issues a column address in the current row until the data is read out from memory
RAS to CAS Delay: #clock cycles between row and column access
Row Pre-charge time: #clock cycles to close an open row, and open the next row
Active to Precharge Delay: #clock cycles a row must stay open, between the row activate command and the pre-charge command
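The timing parameters above combine differently depending on the state of the row being accessed. The sketch below is a simplified model, not a controller implementation: the function name and the three row states are illustrative labels, and it assumes the Active to Precharge delay has already been satisfied.

```python
def read_latency_cycles(row_state, cl, t_rcd, t_rp):
    """Clock cycles from the first command to the first data word.

    row_state: "hit"      - the wanted row is already open
               "closed"   - no row is open in the bank
               "conflict" - another row is open and must be closed first
    """
    if row_state == "hit":
        return cl                      # column access only
    if row_state == "closed":
        return t_rcd + cl              # open the row, then column access
    if row_state == "conflict":
        return t_rp + t_rcd + cl       # pre-charge, open the row, column access
    raise ValueError(f"unknown row state: {row_state}")

# Hypothetical 3-3-3 part (CL=3, tRCD=3, tRP=3), chosen only for illustration.
for state in ("hit", "closed", "conflict"):
    print(state, read_latency_cycles(state, cl=3, t_rcd=3, t_rp=3))
```

This is why keeping rows open pays off: an access to an open row costs only the CAS latency.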
Basic SDRAM controller
[Figure: basic DRAM controller. A[20:23] drive the chip-select address decoder (Select); A[10:19] and A[0:9] pass through the address mux onto the memory address bus; a time delay generator produces RAS# and CAS#; D[0:7] and R/W# connect to the DRAM.]
DRAM data must be periodically refreshed
- Needed to keep data correct
- DRAM controller performs DRAM refresh, using refresh counter
Improved DRAM Schemes
Paged Mode DRAM
- Multiple accesses to different columns from same row
- Saves RAS and RAS to CAS delay
[Figure: paged mode timing. RAS# stays asserted while CAS# is pulsed for Col n, Col n+1 and Col n+2 of one row on A[0:7]; Data n, D n+1 and D n+2 follow.]
Extended Data Output RAM (EDO RAM)
- A data output latch lets the next column address overlap with the current column's data
[Figure: EDO timing with the same signals (RAS#, CAS#, A[0:7], Data): Row, then Col n, Col n+1, Col n+2, with Data n, Data n+1, Data n+2.]
Improved DRAM Schemes (cont)
Burst DRAM
- Generates consecutive column addresses by itself
[Figure: burst timing. After Row and Col n on A[0:7], Data n, Data n+1 and Data n+2 are output without further column addresses.]
Synchronous DRAM – SDRAM

All signals are referenced to an external clock (100MHz-200MHz)
- Makes timing more precise with other system devices
Multiple Banks
- Multiple pages open simultaneously (one per bank)
Command driven functionality instead of signal driven
- ACTIVE: selects both the bank and the row to be activated
  • ACTIVE to a new bank can be issued while accessing current bank
- READ/WRITE: select column
Read and write accesses to the SDRAM are burst oriented
- Successive column locations accessed in the given row
- Burst length is programmable: 1, 2, 4, 8, and full-page
  • May end full-page burst by BURST TERMINATE to get arbitrary burst length
A user programmable Mode Register
- CAS latency, burst length, burst type
Auto pre-charge: may close row at last read/write in burst
Auto refresh: internal counters generate refresh address
SDRAM Timing
[Figure: SDRAM timing diagram with BL = 1 and CL = 2. Rows: clock, cmd (ACT, NOP, RD, RD+PC), Bank (Bank 0, Bank 1), Addr (Row i, Col j, Col k, Row m, Col n, Row l, Col q) and Data (Data j, Data k, Data n, Data q). Annotations: tRCD > 20ns, tRRD > 20ns, tRC > 70ns.]

tRCD: ACTIVE to READ/WRITE gap, in clock cycles = tRCD(MIN) / clock period, rounded up
tRC: gap between successive ACTIVE commands to different rows in the same bank
tRRD: gap between successive ACTIVE commands to different banks
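Since commands are issued only on clock edges, a minimum time in nanoseconds has to be rounded up to a whole number of cycles. A small sketch of that conversion follows; the helper name is illustrative, and the 20ns and 70ns values are the bounds from the timing diagram, treated here as minimum times.

```python
def ns_to_cycles(t_ns, clock_mhz):
    """Smallest whole number of clock cycles that covers t_ns (integer inputs)."""
    # period_ns = 1000 / clock_mhz, so cycles = ceil(t_ns * clock_mhz / 1000)
    return (t_ns * clock_mhz + 999) // 1000

for clock_mhz in (100, 133):
    print(clock_mhz, "MHz:",
          "tRCD =", ns_to_cycles(20, clock_mhz),
          "tRRD =", ns_to_cycles(20, clock_mhz),
          "tRC =", ns_to_cycles(70, clock_mhz))
```

The same part therefore needs more cycles per parameter as the clock gets faster, even though the time in nanoseconds is unchanged.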
DDR-SDRAM

2n-prefetch architecture
- The DRAM cells are clocked at the same speed as SDR SDRAM
- Internal data bus is twice the width of the external data bus
- Data capture occurs twice per clock cycle
  • Lower half of the bus sampled at clock rise
  • Upper half of the bus sampled at clock fall
[Figure: SDRAM array with a 2n-bit internal bus (0:2n-1) split into halves 0:n-1 and n:2n-1, which share the n-bit external bus (0:n-1) at a 200MHz clock.]
Uses 2.5V (vs. 3.3V in SDRAM)
- Reduced power consumption
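The 2n-prefetch idea can be illustrated by splitting one internal 2n-bit word into the two n-bit transfers that leave the chip in a single clock cycle. This is a toy model with an illustrative function name, not a description of the actual circuit.

```python
def ddr_transfers(word, n):
    """Split one 2n-bit internal word into the two n-bit external transfers."""
    mask = (1 << n) - 1
    rising = word & mask            # bits 0..n-1, sent at the clock's rising edge
    falling = (word >> n) & mask    # bits n..2n-1, sent at the clock's falling edge
    return rising, falling

# One 16-bit internal word on an 8-bit external bus (n = 8).
rising, falling = ddr_transfers(0xABCD, 8)
print(hex(rising), hex(falling))
```

The array is read once per clock, yet two external transfers occur, which is how the data rate doubles without clocking the cells faster.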
DDR SDRAM Timing
[Figure: DDR SDRAM timing diagram at a 133MHz clock with CL = 2. Rows: cmd (ACT, NOP, RD), Bank (Bank 0, Bank 1), Addr (Row i, Col j, Row m, Col n, Row l) and Data (bursts j, +1, +2, +3 and n, +1, +2, +3). Annotations: tRCD > 20ns, tRRD > 20ns, tRC > 70ns.]
DIMMs

DIMM: Dual In-line Memory Module
- A small circuit board that holds memory chips
64-bit wide data path (72 bit with parity)
- Single sided: 9 chips, each with 8 bit data bus
  • 512 Mbit / chip × 8 chips = 512 Mbyte per DIMM
- Dual sided: 18 chips, each with 4 bit data bus
  • 256 Mbit / chip × 16 chips = 512 Mbyte per DIMM
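The DIMM arithmetic above can be checked with a few lines. The function names are illustrative; the sketch assumes, as the slide does, that one chip in every nine holds parity and does not add to the usable capacity.

```python
def dimm_width_bits(chips, bits_per_chip):
    """Total width of the data path, including parity chips."""
    return chips * bits_per_chip

def dimm_capacity_mbyte(chip_mbit, data_chips):
    """Usable capacity in MByte: data chips only, 8 bits per byte."""
    return chip_mbit * data_chips // 8

print(dimm_width_bits(9, 8), dimm_capacity_mbyte(512, 8))     # single sided
print(dimm_width_bits(18, 4), dimm_capacity_mbyte(256, 16))   # dual sided
```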
DRAM Standards


SDR SDRAM: PC66, PC100 and PC133
DDR SDRAM

                               DDR200   DDR266   DDR333   DDR400   DDR533
Bus freq (MHz)                 100      133      167      200      266
Bit/pin (Mbps)                 200      266      333      400      533
Total bandwidth (MByte/sec)    1600     2133     2666     3200     4264

Total BW for DDR400
- 3200 MByte/sec = 64 bit × 2 × 200MHz / 8 (bit/byte)
Dual channel DDR SDRAM
- Uses two 64-bit DIMM modules in parallel to get a 128-bit data bus
- Total BW for DDR400 dual channel: 6400 MByte/sec = 128 bit × 2 × 200MHz / 8
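The bandwidth formula used above generalizes to any bus clock and channel count. A minimal sketch, with an illustrative function name and parameter defaults that assume a 64-bit DDR module:

```python
def peak_bandwidth_mbyte_s(bus_clock_mhz, bus_bits=64,
                           transfers_per_clock=2, channels=1):
    """Peak bandwidth = clock x transfers per clock x bus width in bytes x channels."""
    return bus_clock_mhz * transfers_per_clock * bus_bits * channels // 8

print("DDR400 single channel:", peak_bandwidth_mbyte_s(200))
print("DDR400 dual channel:  ", peak_bandwidth_mbyte_s(200, channels=2))
print("DDR200 single channel:", peak_bandwidth_mbyte_s(100))
```

Setting `transfers_per_clock=1` gives the SDR SDRAM case. These are peak figures; sustained bandwidth is lower because of row activation, pre-charge and refresh.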
DDR2

DDR2 achieves high-speed using 4-bit prefetch architecture
- SDRAM cells read/write 4× the amount of data as the external bus
- DDR2-533 cell works at the same frequency as a DDR266 SDRAM or a PC133 SDRAM cell
This method comes at a price of increased latency
- DDR2-based systems may perform worse than DDR1-based systems
DDR2 – Other Features

Shortened page size for reduced activation power
- When ACTIVATE command is given, read all bits in the page
  • A major contributor to the active power
A device with shorter page size has significantly lower power
- 512Mb DDR2 page size is 1KByte vs. 2KB for 512Mb DDR1
Eight banks in 1Gb densities and above
- Increases flexibility in DRAM accesses (also increases the power)

                     DDR1                    DDR2
DRAM Frequency       100/133/166/200 MHz     100/133/166/200 MHz
Bus Frequency        100/133/166/200 MHz     200/266/333/400 MHz
Data Rate            200/266/333/400 Mbps    400/533/667/800 Mbps
Operation Voltage    2.5V                    1.8V
CAS Latency          2, 2.5, 3               3, 4, 5
Data Bandwidth       3.2 GB/s                6.4 GB/s
Power Consumption    399mW                   217mW
DDR2 Latency


DRAM timing, measured in I/O bus cycles, specifies 3 numbers
- CAS Latency - RAS to CAS Delay - RAS Precharge Time
DRAM latency is the time for accessing data in an open page
- E.g., latency for DDR400 2-3-2: 1/200MHz × 2 = 10ns
- DDR2-533 4-4-4 latency is 1.5× that of DDR400 2-3-2
  • 30% bandwidth growth does not compensate for access time worsening
- DDR2-533 3-3-3 latency is only 12% worse than DDR400 2-3-2

Memory      Mem Clk   Bus Clk   Timings    Latency   Dual-channel BW
DDR400      200MHz    200MHz    2.5-3-3    12.5 ns   6.4 GB/sec
DDR400      200MHz    200MHz    2-3-2      10 ns     6.4 GB/sec
DDR533      266MHz    266MHz    3-4-4      11.2 ns   8.5 GB/sec
DDR533      266MHz    266MHz    2.5-3-3    9.4 ns    8.5 GB/sec
DDR2-533    133MHz    266MHz    4-4-4      15 ns     8.5 GB/sec
DDR2-533    133MHz    266MHz    3-3-3      11.2 ns   8.5 GB/sec
DDR2-600    150MHz    300MHz    5-5-5      16.6 ns   9.6 GB/sec
DDR2-600    150MHz    300MHz    4-4-4      13.3 ns   9.6 GB/sec
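The latency column follows from the first timing number (CAS latency) and the bus clock. The sketch below reproduces a few rows; the function name is illustrative and 266.67 MHz is used as the exact value behind the nominal "266MHz", so results can differ from the table in the last digit.

```python
def cas_latency_ns(cas_cycles, bus_clock_mhz):
    """Time to reach data in an already open page, in nanoseconds."""
    return cas_cycles * 1000.0 / bus_clock_mhz

ddr400_232 = cas_latency_ns(2, 200)        # DDR400 2-3-2
ddr2_444 = cas_latency_ns(4, 266.67)       # DDR2-533 4-4-4
ddr2_333 = cas_latency_ns(3, 266.67)       # DDR2-533 3-3-3

print(f"DDR400 2-3-2:   {ddr400_232:.1f} ns")
print(f"DDR2-533 4-4-4: {ddr2_444:.1f} ns ({ddr2_444 / ddr400_232:.2f}x)")
print(f"DDR2-533 3-3-3: {ddr2_333:.1f} ns ({ddr2_333 / ddr400_232:.2f}x)")
```

The ratios match the slide: about 1.5× for the 4-4-4 part and about 12% worse for the 3-3-3 part.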
DDR2 Latency (cont.)


Performance tests
- DDR2-533 with 4-4-4 timings worse than DDR400 2-3-2
- DDR2-533 with 3-3-3 timings better than DDR400 2-3-2
DDR2-533 modules with 3-3-3 timings
- Supported by 925/915
- Best choice for enthusiast users
Over-clocked motherboards clock DDR2-533 at 600MHz
- Significant improvement
- Realized through undocumented memory frequency ratios available in i925/i915
The performance of DDR2-based systems is more sensitive to a lower latency than to a higher frequency
- We get practically nothing from using DDR2-600 SDRAM with i925/i915
DDR2 Standards
Standard     Memory    Cycle     I/O Bus   Data transfers   Module     Peak transfer   Timings
name         clock     time      clock     per second       name       rate
DDR2-400     100 MHz   10 ns     200 MHz   400 Million      PC2-3200   3200 MB/s       3-3-3, 4-4-4
DDR2-533     133 MHz   7.5 ns    266 MHz   533 Million      PC2-4200   4266 MB/s       3-3-3, 4-4-4
DDR2-667     166 MHz   6 ns      333 MHz   667 Million      PC2-5300   5333 MB/s       4-4-4, 5-5-5
DDR2-800     200 MHz   5 ns      400 MHz   800 Million      PC2-6400   6400 MB/s       4-4-4, 5-5-5, 6-6-6
DDR2-1066    266 MHz   3.75 ns   533 MHz   1066 Million     PC2-8500   8533 MB/s       6-6-6, 7-7-7
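Every column of a DDR2 standard can be derived from the memory clock alone. A short sketch of those relations, assuming a 64-bit module (8 bytes per transfer); the function name is illustrative, and nominal clocks such as 133 MHz give slightly lower numbers than the standard's rounded figures.

```python
def ddr2_row(memory_clock_mhz):
    """Derive (I/O clock MHz, million transfers/s, peak MB/s, cycle time ns)."""
    io_clock_mhz = 2 * memory_clock_mhz    # I/O bus runs at twice the cell clock
    transfers_m = 2 * io_clock_mhz         # double data rate: 2 transfers per clock
    peak_mbyte_s = transfers_m * 8         # 64-bit module moves 8 bytes per transfer
    cycle_ns = 1000 / memory_clock_mhz     # memory clock cycle time
    return io_clock_mhz, transfers_m, peak_mbyte_s, cycle_ns

print("DDR2-400:", ddr2_row(100))
print("DDR2-800:", ddr2_row(200))
```

The 4× ratio between transfers per second and memory clock is the 4-bit prefetch at work.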
DDR3




30% reduction in power consumption compared to DDR2
- 1.5 V supply voltage, compared to DDR2's 1.8 V or DDR's 2.5 V
- 90 nanometer fabrication technology
Higher bandwidth
- 8 bit deep prefetch buffer (vs. 4 bit in DDR2 and 2 bit in DDR)
Transfer data rate
- Effective clock rate of 800–1600 MHz using both rising and falling edges of a 400–800 MHz I/O clock
- DDR2: 400–800 MHz using a 200–400 MHz I/O clock
- DDR: 200–400 MHz based on a 100–200 MHz I/O clock
DDR3 DIMMs
- 240 pins, the same number as DDR2, and are the same size
- Electrically incompatible, and have a different key notch location
DDR2 vs. DDR3 Performance
The high latency of DDR3 SDRAM has a negative effect on streaming operations
Source: xbitlabs
SRAM – Static RAM






True random access
High speed, low density, high power
No refresh
Address not multiplexed
DDR SRAM
- 2 READs or 2 WRITEs per clock
- Common or Separate I/O
- DDRII: 200MHz to 333MHz operation; density: 18/36/72Mb+
QDR SRAM
- Two separate DDR ports: one read and one write
- One DDR address bus: alternating between the read address and the write address
- QDRII: 250MHz to 333MHz operation; density: 18/36/72Mb+
Read Only Memory (ROM)



Random Access
Non volatile
ROM Types
- PROM – Programmable ROM
  • Burnt once using special equipment
- EPROM – Erasable PROM
  • Can be erased by exposure to UV, and then reprogrammed
- E²PROM – Electrically Erasable PROM
  • Can be erased and reprogrammed on board
  • Write time (programming) much longer than RAM
  • Limited number of writes (thousands)
Flash Memory

Non-volatile, rewritable memory
- Limited lifespan of around 100,000 write cycles
Flash drives compared to HD drives:
- Smaller size, faster, lighter, noiseless, consume less energy
- Withstand shocks up to 2000 Gs
  • Equivalent to a 10 foot drop onto concrete, without losing data
- Lower capacity (8GB), but going up
- Much more expensive (cost/byte): currently ~$20 per GB
NOR Flash
- Supports per-byte addressing
- Suitable for storing code (e.g. BIOS, cell phone SW)
NAND Flash
- Supports page-mode addressing (e.g., 1KB blocks)
- Suitable for storing large data (e.g. pictures, songs)
The Motherboard
Computer System Structure
[Figure: computer system structure. Blocks shown: CPU with cache; CPU bus; North Bridge with memory controller, on-board graphics and PCI express ×16 to an external graphics card; memory bus with DDRII channel 1 and channel 2; South Bridge with USB, IDE and SATA controllers (old DVD drive, hard disk), PCI and PCI express ×1 (sound card with speakers, LAN adapter to LAN), and an IO controller (serial port, parallel port, floppy drive, keyboard, mouse).]
Computer System Structure – New
[Figure: computer system structure, new version. Blocks shown: CPU, cache and memory controller; memory bus with DDRIII channel 1 and channel 2; CPU bus; North Bridge with on-board graphics and PCI express ×16 to an external graphics card; South Bridge with USB, IDE and SATA controllers (DVD drive, hard disk), PCI and PCI express ×1 (sound card with speakers, LAN adapter to LAN), and an IO controller (serial port, parallel port, floppy drive, keyboard, mouse).]
The Motherboard
[Figure: annotated motherboard photo. Labels: IEEE1394a header, audio header, PCI express x1 connector, PCI express x16 connector, PCI add-in card connectors, back panel connectors, processor core power connector, rear chassis fan header, High Def. Audio header, LGA775 processor socket, Parallel ATA IDE connector, GMCH (North Bridge + integrated graphics), processor fan header, speaker, front panel USB header, 4 × SATA connectors, DIMM Channel A sockets, DIMM Channel B sockets, serial port header, diskette drive connector, ICH (South Bridge + integrated audio), battery, main power connector.]
How to get the most out of Memory?

Single Channel DDR
[Figure: CPU with L2 cache, connected over the FSB (Front Side Bus) to the North Bridge, whose DRAM controller drives one memory bus with DDR DIMMs.]
Dual channel DDR
- Each DIMM pair must be the same
[Figure: CPU with L2 cache, connected over the FSB (Front Side Bus) to the North Bridge, whose DRAM controller drives channels CH A and CH B, each with DDR DIMMs.]
Balance FSB and memory bandwidth
- 800MHz FSB provides 800MHz × 64bit / 8 = 6.4 GByte/sec
- Dual Channel DDR400 SDRAM also provides 6.4 GByte/sec
How to get the most out of Memory?


Each DIMM supports 4 open pages simultaneously
- The more open pages, the more random accesses can hit an already open page
- It is better to have more DIMMs
  • n DIMMs: 4n open pages
DIMMs can be single sided or dual sided
- Dual sided DIMMs may have a separate CS for each side
  • In this case the number of open pages is doubled (goes up to 8)
  • This is not a must: dual sided DIMMs may also have a common CS for both sides, in which case there are only 4 open pages, as with single sided
Hard Disks
Hard Disk Structure




Direct access
Nonvolatile, large, inexpensive, and slow
- Lowest level in the memory hierarchy
Technology
- Rotating platters coated with a magnetic surface
- Use a moveable read/write head to access the disk
- Each platter is divided into tracks: concentric circles
- Each track is divided into sectors
  • Smallest unit that can be read or written
- Disk outer parts have more space for sectors than the inner parts
  • Constant bit density: record more sectors on the outer tracks
  • Speed varies with track location
Buffer Cache
- A temporary data storage area used to enhance drive performance
[Figure: disk drawing with labels Platters, Track and Sector.]
The IBM Ultrastar 36ZX


Top view of a 36 GB, 10,000 RPM, IBM SCSI server hard disk
10 stacked platters
Disk Access
Reading/writing data is a three-stage process
Seek time: position the arm over the proper track
- Average: sum of the times of all possible seeks / total # of possible seeks
- Due to locality of disk reference, actual average seek is shorter: 4 to 12 ms
Rotational latency: wait for desired sector to rotate under head
- The faster the drive spins, the shorter the rotational latency time
- Most disks rotate at 5,400 to 15,000 RPM
  • At 7200 RPM: about 8 ms per revolution
- On average, the desired information is halfway around the disk
  • At 7200 RPM: about 4 ms
Transfer block: read/write the data
- Transfer time is a function of:
  • Sector size
  • Rotation speed
  • Recording density: bits per inch on a track
- Typical values: 100 MB / sec
Disk Access Time = Seek time + Rotational Latency + Transfer time + Controller Time + Queuing Delay
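The access time formula above is easy to evaluate for a concrete request. In this sketch the function names, the 8 ms seek time and the 4 KB request size are illustrative assumptions; the 7200 RPM and 100 MB/sec values come from the slide.

```python
def rotational_latency_ms(rpm):
    """Average rotational latency: half a revolution, in milliseconds."""
    return 0.5 * 60_000 / rpm

def disk_access_time_ms(seek_ms, rpm, transfer_bytes, rate_mbyte_s,
                        controller_ms=0.0, queue_ms=0.0):
    """Seek + rotational latency + transfer + controller + queuing, in ms."""
    transfer_ms = transfer_bytes / (rate_mbyte_s * 1e6) * 1e3
    return (seek_ms + rotational_latency_ms(rpm) + transfer_ms
            + controller_ms + queue_ms)

print(f"rotational latency: {rotational_latency_ms(7200):.2f} ms")
print(f"4 KB read:          {disk_access_time_ms(8, 7200, 4096, 100):.2f} ms")
```

For small requests the mechanical terms (seek and rotation) dominate, and the transfer itself is almost negligible.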
The Disk Interface – EIDE


EIDE, ATA, UltraATA, ATA 100, ATAPI: all the same interface
- Used for connecting hard disk drives and CD-ROM drives
- 80-wire cable, 40-pin dual header connector
- 100 MB/s (ATA66 is only 66MB/s)
- EIDE controller integrated with the motherboard (in the ICH)
EIDE controller has two channels: primary and secondary
- Work independently
- Two devices per channel: master and slave, but equal
  • The 2 devices have to take turns controlling the bus
  • A total of four devices per controller
If there are two devices on the system (e.g., a hard disk and a CD-ROM)
- It is better to put them on different channels
Avoid mixing slower (CD) and faster devices (HDD) on the same channel
If doing a lot of copying from a CD-ROM drive to a CD-RW drive
- Better performance by separating the devices onto separate channels
The Disk Interface – Serial ATA (SATA)








Point-to-point connection
- Ensures dedicated 150 MB/s per device (no sharing)
Dual controllers allow independent operation of each device
Thinner (7 wires), flexible, longer cables
- Easier routing and improved airflow
- 4 wires for signaling + 3 ground wires to minimize impedance and crosstalk
New 7-pin connector design
- For easier installation and better device reliability
- Takes 1/6 the area on the system board
CRC error checking on all data and control information
Increased BW supports data intensive applications such as
- Digital video production, digital audio storage and recording, high-speed file sharing
No configuration needed when adding a 2nd SATA drive
- One cable for each drive eliminates the need for jumpers
- No more figuring out which device is the master or slave
Today's hard drives are clearly below 100 MB/s
- Do not benefit from UltraATA / SATA
Solid State Drive – SSD
Performance numbers used by most manufacturers represent "burst rate"
- Not its steady state or average read rate
Any write operation requires an erase followed by the write
- When SSD is new, NAND flash memory is pre-erased
Consumer-grade multi-level cell (MLC) – allows ≥2 bit per flash memory cell
- Sustains 2,000 to 10,000 write cycles
- Notably less expensive than SLC drives
Enterprise-class single-level cell (SLC) – allows 1 bit per flash memory cell
- Lasts 10× write cycles of an MLC
The more write/erase cycles there are, the shorter the drive's lifespan
- Wear-leveling algorithms evenly distribute data across flash memory, and move data around, so that no one portion wears out faster than another
  • SSD's controller keeps a record of where data is set down on the drive as it is relocated from one portion to another
- Add DRAM cache to buffer data writes to reduce the number of write/erase cycles
- Have extra memory cells; when blocks of flash memory wear out, use spare blocks
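The wear-leveling idea can be sketched with a toy model. The class name and the "least-worn free block" policy are assumptions made for illustration, not the algorithm of any real SSD controller; the model also assumes there are always spare physical blocks available.

```python
class WearLeveler:
    """Toy wear leveling: each write goes to the least-worn free physical block."""

    def __init__(self, num_physical_blocks):
        self.erase_counts = [0] * num_physical_blocks
        self.free = set(range(num_physical_blocks))
        self.mapping = {}   # logical block -> physical block (the controller's record)

    def write(self, logical_block):
        # Pick the free block with the fewest erases (ties: lowest index).
        target = min(self.free, key=lambda b: (self.erase_counts[b], b))
        self.free.remove(target)
        self.erase_counts[target] += 1        # erase before programming
        old = self.mapping.get(logical_block)
        if old is not None:
            self.free.add(old)                # the stale copy can be reused later
        self.mapping[logical_block] = target
        return target

wl = WearLeveler(4)
for _ in range(8):
    wl.write(0)            # rewrite the same logical block 8 times
print(wl.erase_counts)     # wear is spread over all 4 physical blocks
```

Without the remapping, all eight erases would hit a single physical block.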
SSD (cont.)

Data in NAND flash memory is organized in fixed-size blocks
- When any portion of the data on the drive is changed
  • Mark block for deletion in preparation for accommodating the new data
  • Read current data on the block
  • Redistribute the old data
  • Lay down the new data in the old block
- Old data is rewritten back
- Typical write amplification is 15 to 20
  • For every 1MB of data written to the drive, 15MB to 20MB of space is actually needed
  • Using write combining reduces write amplification to ~10%
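Write amplification is the ratio between what the flash actually writes and what the host asked to write. A minimal sketch follows; the function names and the 512 KB block / 32 KB update sizes are hypothetical values chosen for illustration.

```python
def write_amplification(flash_bytes_written, host_bytes_written):
    """Ratio of bytes written to flash to bytes written by the host."""
    return flash_bytes_written / host_bytes_written

def block_rewrite_amplification(block_bytes, changed_bytes):
    """Read-modify-write model: any change rewrites the whole block."""
    return write_amplification(block_bytes, changed_bytes)

MB = 1024 * 1024
print(write_amplification(15 * MB, 1 * MB))                  # the slide's lower bound
print(block_rewrite_amplification(512 * 1024, 32 * 1024))    # hypothetical sizes
```

Combining several small host writes into one block rewrite raises `changed_bytes` per rewrite, which is why write combining lowers the ratio.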
The BIOS
System Start-up
Upon computer turn-on several events occur:
1. The CPU "wakes up" and sends a message to activate the BIOS
2. BIOS runs the Power On Self Test (POST): make sure system devices are working ok
- Initialize system hardware and chipset registers
- Initialize power management
- Test RAM
- Enable the keyboard
- Test serial and parallel ports
- Initialize floppy disk drives and hard disk drive controllers
- Display system summary information
System Start-up (cont.)
3. During POST, the BIOS compares the system configuration data obtained from POST with the system information stored on a memory chip located on the MB
- A CMOS chip, which is updated whenever new system components are added
- Contains the latest information about system components
4. After the POST tasks are completed
- The BIOS looks for the boot program responsible for loading the operating system
- Usually, the BIOS looks on the floppy disk drive A: followed by drive C:
5. After the boot program is loaded into memory
- It loads the system configuration information contained in the registry in a Windows® environment, and device drivers
6. Finally, the operating system is loaded