Computer Architecture
PC Structure and Peripherals
Dr. Lihu Rappoport
Computer Architecture 2009 – PC Structure and Peripherals
Memory
SRAM vs. DRAM

Random Access: access time is the same for all locations

                DRAM – Dynamic RAM            SRAM – Static RAM
Refresh         Regular refresh (~1% time)    No refresh needed
Address         Address muxed: row + column   Address not multiplexed
Access          Not true “Random Access”      True “Random Access”
Density         High (1 transistor/bit)       Low (6 transistors/bit)
Power           Low                           High
Speed           Slow                          Fast
Price/bit       Low                           High
Typical usage   Main memory                   Cache
Technology Trends

        Capacity        Speed
Logic   2× in 3 years   2× in 3 years
DRAM    4× in 3 years   1.4× in 10 years
Disk    2× in 3 years   1.4× in 10 years

[Figure: CPU-DRAM Memory Gap (latency). Performance (log scale, 1 to 1000) vs. Time (1980 to 2000), with one curve for CPU and one for DRAM.]
Basic DRAM chip
[Figure: DRAM chip block diagram. The memory address bus (Addr) feeds a row latch (RAS#) and a column latch (CAS#); a row address decoder and a column address decoder select cells in the memory array, which connects to Data.]

Addressing sequence
  - Row address and then RAS# asserted
  - RAS# to CAS# delay
  - Column address and then CAS# asserted
  - DATA transfer
Addressing sequence

[Figure: timing diagram with signals RAS#, CAS#, A[0:7] (Row i, Col n, then Row j) and Data (Data n). Marked intervals: tRAC (access time), RAS/CAS delay, CL (CAS latency), precharge delay.]

Access sequence
  - Put row address on the address bus and assert RAS#
  - Wait for RAS# to CAS# delay (tRCD)
  - Put column address on the address bus and assert CAS#
  - DATA transfer
  - Precharge
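The access sequence above can be turned into a rough latency estimate. The sketch below assumes, as a simplification, that the access time is the RAS-to-CAS delay plus the CAS latency, and that the precharge delay must elapse before another row can be opened. The function name and the 20 ns example values are illustrative and not taken from any datasheet.

```python
def dram_access_times(t_rcd_ns, t_cl_ns, t_precharge_ns):
    """Return (access_time_ns, cycle_time_ns) for one basic DRAM access."""
    # RAS# asserted -> wait tRCD -> CAS# asserted -> wait CL -> data valid
    access_time_ns = t_rcd_ns + t_cl_ns
    # The row must be precharged before a different row can be accessed
    cycle_time_ns = access_time_ns + t_precharge_ns
    return access_time_ns, cycle_time_ns


# Illustrative values only
access, cycle = dram_access_times(t_rcd_ns=20, t_cl_ns=20, t_precharge_ns=20)
print(access, cycle)  # 40 60
```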
Basic SDRAM controller

[Figure: DRAM controller block diagram. A[20:23] drives the chip select address decoder (Select); A[10:19] and A[0:9] pass through an address mux onto the memory address bus; a time delay generator produces RAS# and CAS#; the DRAM also connects to D[0:7] and R/W#.]

DRAM data must be periodically refreshed
  - Needed to keep data correct
  - DRAM controller performs DRAM refresh, using refresh counter
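The address fields in the controller diagram can be illustrated with a small bit-slicing sketch. It assumes that A[0:9] is the column address and A[10:19] is the row address; the slide names only the bit ranges, so that assignment is an assumption, as is the function name.

```python
def split_address(addr):
    """Split a 24-bit address into (chip_select, row, column) fields."""
    column = addr & 0x3FF              # A[0:9], assumed to be the column
    row = (addr >> 10) & 0x3FF         # A[10:19], assumed to be the row
    chip_select = (addr >> 20) & 0xF   # A[20:23], decoded to pick a chip
    return chip_select, row, column


# The mux would drive the row first (with RAS#), then the column (with CAS#)
chip, row, col = split_address((3 << 20) | (5 << 10) | 7)
print(chip, row, col)  # 3 5 7
```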
Improved DRAM Schemes

Paged Mode DRAM
  - Multiple accesses to different columns from the same row
  - Saves the RAS and RAS-to-CAS delays

[Figure: Paged Mode timing diagram with RAS#, CAS#, A[0:7] (Row, Col n, Col n+1, Col n+2) and Data (Data n, D n+1, D n+2).]

Extended Data Output RAM (EDO RAM)
  - A data output latch allows the next column address to overlap with the current column data

[Figure: EDO timing diagram with RAS#, CAS#, A[0:7] (Row, Col n, Col n+1, Col n+2) and Data (Data n, Data n+1, Data n+2).]
Improved DRAM Schemes (cont)

Burst DRAM
  - Generates consecutive column addresses by itself

[Figure: Burst timing diagram with RAS#, CAS#, A[0:7] (Row, Col n) and Data (Data n, Data n+1, Data n+2).]
Synchronous DRAM – SDRAM

All signals are referenced to an external clock (100MHz-200MHz)
  - Makes timing more precise with other system devices
Multiple Banks
  - Multiple pages open simultaneously (one per bank)
Command driven functionality instead of signal driven
  - ACTIVE: selects both the bank and the row to be activated
      • ACTIVE to a new bank can be issued while accessing current bank
  - READ/WRITE: select column
Read and write accesses to the SDRAM are burst oriented
  - Successive column locations accessed in the given row
  - Burst length is programmable: 1, 2, 4, 8, and full-page
      • May end full-page burst by BURST TERMINATE to get arbitrary burst length
A user programmable Mode Register
  - CAS latency, burst length, burst type
Auto pre-charge: may close row at last read/write in burst
Auto refresh: internal counters generate refresh address
SDRAM Timing

[Figure: SDRAM timing diagram with BL = 1 and CL = 2. Rows: clock, cmd (ACT, NOP, RD, RD+PC), Bank (Bank 0, Bank 1), Addr (Row i, Col j, Col k, Row m, Col n, Row l, Col q) and Data (Data j, Data k, Data n, Data q). Marked constraints: tRCD > 20ns, tRRD > 20ns, tRC > 70ns.]

tRCD: ACTIVE to READ/WRITE gap; in clock cycles, tRCD(MIN) / clock period, rounded up
tRC: minimum gap between successive ACTIVE commands to different rows in the same bank
tRRD: minimum gap between successive ACTIVE commands to different banks
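A minimum timing given in nanoseconds becomes a whole number of clock cycles by dividing by the clock period and rounding up. The sketch below shows that conversion for the 20 ns and 70 ns constraints from the timing diagram; the function name is illustrative, and integer arithmetic is used to avoid floating-point rounding surprises.

```python
def min_cycles(t_min_ns, clock_mhz):
    """Smallest whole number of clock cycles that covers t_min_ns."""
    # cycles = ceil(t_min_ns / period_ns), with period_ns = 1000 / clock_mhz
    return (t_min_ns * clock_mhz + 999) // 1000


print(min_cycles(20, 100))  # tRCD or tRRD at 100 MHz -> 2
print(min_cycles(70, 100))  # tRC at 100 MHz -> 7
print(min_cycles(20, 133))  # tRCD or tRRD at 133 MHz -> 3
print(min_cycles(70, 133))  # tRC at 133 MHz -> 10
```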
DDR-SDRAM

2n-prefetch architecture
  - The DRAM cells are clocked at the same speed as SDR SDRAM
  - Internal data bus is twice the width of the external data bus
  - Data capture occurs twice per clock cycle
      • Lower half of the bus sampled at clock rise
      • Upper half of the bus sampled at clock fall

[Figure: SDRAM array with a 2n-bit internal bus (0:2n-1), split into halves 0:n-1 and n:2n-1 and driven onto the n-bit external bus (0:n-1) with a 200MHz clock.]

Uses 2.5V (vs. 3.3V in SDRAM)
  - Reduced power consumption
DDR SDRAM Timing

[Figure: DDR SDRAM timing diagram at a 133MHz clock with CL = 2. Rows: cmd (ACT, NOP, RD), Bank (Bank 0, Bank 1), Addr (Row i, Col j, Row m, Col n, Row l) and Data (bursts j, +1, +2, +3 and n, +1, +2, +3). Marked constraints: tRCD > 20ns, tRRD > 20ns, tRC > 70ns.]
DIMMs

DIMM: Dual In-line Memory Module
  - A small circuit board that holds memory chips
64-bit wide data path (72 bit with parity)
  - Single sided: 9 chips, each with 8 bit data bus
      • 512 Mbit / chip × 8 chips = 512 Mbyte per DIMM
  - Dual sided: 18 chips, each with 4 bit data bus
      • 256 Mbit / chip × 16 chips = 512 Mbyte per DIMM
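The DIMM arithmetic above can be checked with a short sketch. It counts only the data chips toward capacity (8 of 9, or 16 of 18), treating the remaining chips as parity; the function names are illustrative.

```python
def dimm_capacity_mbyte(chip_mbit, data_chips):
    """Usable capacity in MByte: chip density times data chips, 8 bits per byte."""
    return chip_mbit * data_chips // 8


def data_path_bits(total_chips, bits_per_chip):
    """Width of the DIMM data path, parity bits included."""
    return total_chips * bits_per_chip


print(dimm_capacity_mbyte(512, 8))   # single sided -> 512
print(dimm_capacity_mbyte(256, 16))  # dual sided -> 512
print(data_path_bits(9, 8), data_path_bits(18, 4))  # 72 72
```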
DRAM Standards

SDR SDRAM: PC66, PC100 and PC133
DDR SDRAM

                              DDR200   DDR266   DDR333   DDR400   DDR533
Bus freq (MHz)                100      133      167      200      266
Bit/pin (Mbps)                200      266      333      400      533
Total bandwidth (MByte/sec)   1600     2133     2666     3200     4264

Total BW for DDR400
  - 3200 MByte/sec = 64 bit × 2 × 200MHz / 8 (bit/byte)
Dual channel DDR SDRAM
  - Uses 2 64-bit DIMM modules in parallel to get a 128-bit data bus
  - Total BW for DDR400 dual channel: 6400 MByte/sec = 128 bit × 2 × 200MHz / 8
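The bandwidth formula above generalizes to the other DDR speed grades. This is a minimal sketch with an illustrative function name; note that the table's 2133 MByte/sec for DDR266 corresponds to a 133.33 MHz bus, so passing the rounded 133 gives a slightly lower figure.

```python
def ddr_bandwidth_mbyte_s(bus_mhz, bus_bits=64, channels=1):
    """Peak DDR bandwidth: two transfers per bus clock, 8 bits per byte."""
    return bus_bits * channels * 2 * bus_mhz // 8


print(ddr_bandwidth_mbyte_s(200))              # DDR400 -> 3200
print(ddr_bandwidth_mbyte_s(200, channels=2))  # DDR400 dual channel -> 6400
print(ddr_bandwidth_mbyte_s(100))              # DDR200 -> 1600
```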
DRAM Standards

Label    Name          Effective Clock Rate   Data Bus     Bandwidth
PC66     SDRAM         66 MHz                 64 Bit       0.5 GB/s
PC100    SDRAM         100 MHz                64 Bit       0.8 GB/s
PC133    SDRAM         133 MHz                64 Bit       1.06 GB/s
PC1600   DDR200        100 MHz                64 Bit       1.6 GB/s
PC1600   DDR200 Dual   100 MHz                2 x 64 Bit   3.2 GB/s
PC2100   DDR266        133 MHz                64 Bit       2.1 GB/s
PC2100   DDR266 Dual   133 MHz                2 x 64 Bit   4.2 GB/s
PC2700   DDR333        166 MHz                64 Bit       2.7 GB/s
PC2700   DDR333 Dual   166 MHz                2 x 64 Bit   5.4 GB/s
PC3200   DDR400        200 MHz                64 Bit       3.2 GB/s
PC3200   DDR400 Dual   200 MHz                2 x 64 Bit   6.4 GB/s
PC4200   DDR533        266 MHz                64 Bit       4.2 GB/s
PC4200   DDR533 Dual   266 MHz                2 x 64 Bit   8.4 GB/s
DDR Memory Performance
Source: http://www.tomshardware.com/
DDR2

DDR2 achieves high speed using a 4-bit prefetch architecture
  - SDRAM cells read/write 4× the amount of data as the external bus
  - DDR2-533 cell works at the same frequency as a DDR266 SDRAM or a PC133 SDRAM cell
This method comes at a price of increased latency
  - DDR2-based systems may perform worse than DDR1-based systems
DDR2 – Other Features

Shortened page size for reduced activation power
  - Each time an ACTIVATE command is given, all bits in the page are read
      • A major contributor to the active power
  - A device with a shorter page size has significantly lower power
  - 512Mb DDR2 page size is 1KByte vs. 2KB for 512Mb DDR1
Eight banks in 1Gb densities and above
  - Increases flexibility in DRAM accesses
  - Also increases the power
DDR2 vs DDR1 SDRAM

                    DDR1                   DDR2
Data Bus            64 bit                 64 bit
Data Rate           200/266/333/400 Mbps   400/533/667/800 Mbps
Bus Frequency       100/133/166/200 MHz    200/266/333/400 MHz
DRAM Frequency      100/133/166/200 MHz    100/133/166/200 MHz
Operation Voltage   2.5V                   1.8V
Package             TSOP                   FBGA
Densities           128Mb~1Gb              256Mb~2Gb
Prefetch size       2 bits                 4 bits
Burst length        2/4/8                  4/8
CAS Latency         2, 2.5, 3              3, 4, 5
Data Bandwidth      3.2 GB/s               6.4 GB/s
Power Consumption   399mW                  217mW
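The Data Rate, Bus Frequency and Prefetch size rows of the comparison are tied together: the per-pin data rate is the DRAM core frequency times the prefetch size, and the double-data-rate bus runs at half the data rate. A minimal sketch of that relation, with illustrative function names:

```python
def data_rate_mbps(dram_core_mhz, prefetch_bits):
    """Per-pin data rate: each core cycle supplies prefetch_bits per pin."""
    return dram_core_mhz * prefetch_bits


def bus_frequency_mhz(rate_mbps):
    """Double data rate: two transfers per bus clock."""
    return rate_mbps // 2


# Same 100 MHz and 200 MHz cores, different prefetch depth
print(data_rate_mbps(100, 2), data_rate_mbps(200, 2))  # DDR1: 200 400
print(data_rate_mbps(100, 4), data_rate_mbps(200, 4))  # DDR2: 400 800
print(bus_frequency_mhz(data_rate_mbps(200, 4)))       # DDR2 bus: 400
```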
DDR2 Latency

Many DDR2-533 modules have 4-4-4 timings
  - (CAS Latency - RAS to CAS Delay - RAS Precharge Time)
  - 1.5× latency compared to DDR400 2-3-2
      • 30% growth of bandwidth does not compensate for the worse access time
DDR2-533 latency improves considerably at 3-3-3 timings
  - only 12% worse than the latency of 2-3-2 DDR400

Memory     Timings   Latency   Dual-channel BW
DDR400     2.5-3-3   12.5 ns   6.4 GB/sec
DDR400     2-3-2     10 ns     6.4 GB/sec
DDR533     3-4-4     11.2 ns   8.5 GB/sec
DDR533     2.5-3-3   9.4 ns    8.5 GB/sec
DDR2-533   5-5-5     18.8 ns   8.5 GB/sec
DDR2-533   4-4-4     15 ns     8.5 GB/sec
DDR2-533   3-3-3     11.2 ns   8.5 GB/sec
DDR2-600   5-5-5     16.6 ns   9.6 GB/sec
DDR2-600   4-4-4     13.3 ns   9.6 GB/sec
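The Latency column is consistent with dividing the CAS latency (the first timing number) by the bus clock, which is half the data rate. The sketch below reproduces a few rows under that reading; the function name is illustrative, and small differences from the table come from rounding the nominal data rates (533 stands for 533.33).

```python
def cas_latency_ns(cas_cycles, data_rate_mtps):
    """CAS latency in ns: cycles divided by the bus clock (half the data rate)."""
    bus_clock_mhz = data_rate_mtps / 2
    return cas_cycles * 1000.0 / bus_clock_mhz


ddr400 = cas_latency_ns(2, 400)       # DDR400 2-3-2
ddr2_444 = cas_latency_ns(4, 533)     # DDR2-533 4-4-4
ddr2_333 = cas_latency_ns(3, 533)     # DDR2-533 3-3-3
print(round(ddr400, 1), round(ddr2_444, 1), round(ddr2_333, 1))
print(round(ddr2_444 / ddr400, 2))    # roughly the 1.5x quoted above
print(round(ddr2_333 / ddr400, 2))    # roughly 12% worse
```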
DDR2 Latency (cont.)

Performance tests
  - DDR2-533 with 4-4-4 timings worse than DDR400 2-3-2
  - DDR2-533 with 3-3-3 timings better than DDR400 2-3-2
DDR2-533 modules with 3-3-3 timings
  - Supported by 925/915
  - Best choice for enthusiast users
Over-clocked motherboards clock DDR2-533 at 600MHz
  - Significant improvement
  - Realized through undocumented memory frequency ratios available in i925/i915
The performance of DDR2-based systems is more sensitive to a lower latency than to a higher frequency
  - We get practically nothing from using DDR2-600 SDRAM with i925/i915
DDR3

30% power consumption reduction compared to DDR2
  - 1.5 V supply voltage, compared to DDR2's 1.8 V or DDR's 2.5 V
  - 90 nanometer fabrication technology
Higher bandwidth
  - 8 bit deep prefetch buffer (vs. 4 bit in DDR2 and 2 bit in DDR)
Transfer data rate
  - Effective clock rate of 800-1600 MHz using both rising and falling edges of a 400-800 MHz I/O clock
  - DDR2: 400-800 MHz using a 200-400 MHz I/O clock
  - DDR: 200-400 MHz based on a 100-200 MHz I/O clock
DDR3 DIMMs
  - 240 pins, the same number as DDR2, and are the same size
  - Electrically incompatible, and have a different key notch location
SRAM – Static RAM

True random access
High speed, low density, high power
No refresh
Address not multiplexed
DDR SRAM
  - 2 READs or 2 WRITEs per clock
  - Common or Separate I/O
  - DDRII: 200MHz to 333MHz Operation; Density: 18/36/72Mb+
QDR SRAM
  - Two separate DDR ports: one read and one write
  - One DDR address bus: alternating between the read address and the write address
  - QDRII: 250MHz to 333MHz Operation; Density: 18/36/72Mb+
Read Only Memory (ROM)

Random Access
Non volatile
ROM Types
  - PROM – Programmable ROM
      • Burnt once using special equipment
  - EPROM – Erasable PROM
      • Can be erased by exposure to UV, and then reprogrammed
  - E2PROM – Electrically Erasable PROM
      • Can be erased and reprogrammed on board
      • Write time (programming) much longer than RAM
      • Limited number of writes (thousands)
Flash Memory

Non-volatile, rewritable memory
  - Limited lifespan of around 100,000 write cycles
Flash drives compared to HD drives:
  - Smaller size, faster, lighter, noiseless, consume less energy
  - Withstanding shocks up to 2000 Gs
      • Equivalent to a 10 foot drop onto concrete - without losing data
  - Lower capacity (8GB), but going up
  - Much more expensive (cost/byte): currently ~20$/1GB
NOR Flash
  - Supports per-byte addressing
  - Suitable for storing code (e.g. BIOS, cell phone SW)
NAND Flash
  - Supports page-mode addressing (e.g., 1KB blocks)
  - Suitable for storing large data (e.g. pictures, songs)
The Motherboard
Motherboard with PCI Express

[Figure: PC block diagram. The CPU with its L2 cache connects over the 800MHz FSB to the North Bridge, which contains the DRAM controller (memory bus to memory) and connects over PCI Express ×16 to the graphics adaptor (video buffer, monitor). A hub interface links the North Bridge to the South Bridge, which contains the SATA, IDE, USB and PCI express controllers. Attached devices: CD/DVD-ROM drive and hard disk drives, USB keyboard and mouse, and on the PCI bus (133MB/s = 32bit × 33MHz) the sound card (speakers), modem (phone line) and network card. An LPC I/O controller serves the floppy disk drive, PS2 keyboard and mouse, serial port and parallel port.]
The Motherboard

[Figure: annotated motherboard photo. Labeled parts: back panel connectors, processor core power connector, rear chassis fan header, LGA775 processor socket, GMCH (North Bridge + integrated graphics), processor fan header, DIMM Channel A sockets, DIMM Channel B sockets, main power connector, diskette drive connector, Parallel ATA IDE connector, ICH (South Bridge + integrated audio), battery, 4 × SATA connectors, front panel USB header, serial port header, speaker, PCI add-in card connectors, PCI express x1 connector, PCI express x16 connector, High Def. Audio header, audio header, IEEE1394a header.]
How to get the most out of Memory?

Single Channel DDR

[Figure: CPU with L2 cache connected over the FSB (Front Side Bus) to the North Bridge DRAM controller, with a single memory bus to the DDR DIMMs.]

Dual channel DDR
  - Each DIMM pair must be the same

[Figure: CPU with L2 cache connected over the FSB (Front Side Bus) to the North Bridge DRAM controller, with two channels (CH A and CH B), each with its own DDR DIMMs.]

Balance FSB and memory bandwidth
  - 800MHz FSB provides 800MHz × 64bit / 8 = 6.4 GByte/sec
  - Dual Channel DDR400 SDRAM also provides 6.4 GByte/sec
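The balance between FSB and memory bandwidth can be checked numerically. The sketch below compares the two peak rates and reports which side limits throughput; the function names and the three-way result strings are illustrative choices, and both buses are assumed to be 64 bits wide per the slides.

```python
def fsb_bandwidth_mbyte_s(fsb_mts, bus_bits=64):
    """Peak FSB bandwidth from its effective transfer rate (MT/s)."""
    return fsb_mts * bus_bits // 8


def memory_bandwidth_mbyte_s(data_rate_mtps, channels=1, bus_bits=64):
    """Peak DDR bandwidth across all channels."""
    return data_rate_mtps * bus_bits * channels // 8


def bottleneck(fsb_mts, data_rate_mtps, channels):
    """Return which side limits peak throughput, or 'balanced'."""
    fsb = fsb_bandwidth_mbyte_s(fsb_mts)
    mem = memory_bandwidth_mbyte_s(data_rate_mtps, channels)
    if fsb == mem:
        return "balanced"
    return "memory" if mem < fsb else "FSB"


print(bottleneck(800, 400, channels=2))  # balanced
print(bottleneck(800, 400, channels=1))  # memory
```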
How to get the most out of Memory?

Each DDR DIMM supports 4 open pages simultaneously
  - The more open pages, the more random access
  - It is better to have more DIMMs
      • n DIMMs: 4n open pages
DIMMs can be single sided or dual sided
  - Dual sided DIMMs may have a separate CS for each side
      • In this case the number of open pages is doubled (goes up to 8)
      • This is not a must: dual sided DIMMs may also have a common CS for both sides, in which case there are only 4 open pages, as with single sided
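The open-page counting rule above fits in a few lines. This is a sketch with illustrative names; it assumes every DIMM in the system is of the same kind.

```python
def open_pages(dimms, separate_cs_per_side=False):
    """Total pages that can be open at once across all DIMMs."""
    # 4 pages per DIMM; doubled when each side has its own chip select
    pages_per_dimm = 8 if separate_cs_per_side else 4
    return dimms * pages_per_dimm


print(open_pages(2))                             # 8
print(open_pages(2, separate_cs_per_side=True))  # 16
```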
Hard Disks
Hard Disk Structure

Direct access
Nonvolatile, large, inexpensive, and slow
  - Lowest level in the memory hierarchy
Technology
  - Rotating platters coated with a magnetic surface
  - Use a moveable read/write head to access the disk
  - Each platter is divided into tracks: concentric circles
  - Each track is divided into sectors
      • Smallest unit that can be read or written
  - Disk outer parts have more space for sectors than the inner parts
      • Constant bit density: record more sectors on the outer tracks
      • Speed varies with track location
Buffer Cache
  - A temporary data storage area used to enhance drive performance

[Figure: disk platters with a track and a sector marked.]
The IBM Ultrastar 36ZX

Top view of a 36 GB, 10,000 RPM, IBM SCSI server hard disk
10 stacked platters
Disk Access

Reading or writing data is a three-stage process
Seek time: position the arm over the proper track
  - Average: sum of the time for all possible seeks / total # of possible seeks
  - Due to locality of disk reference, actual average seek is shorter: 4 to 12 ms
Rotational latency: wait for desired sector to rotate under head
  - The faster the drive spins, the shorter the rotational latency time
  - Most disks rotate at 5,400 to 15,000 RPM
      • At 7200 RPM: about 8.3 ms per revolution
  - The average latency to the desired information is halfway around the disk
      • At 7200 RPM: about 4.2 ms
Transfer block: read/write the data
  - Transfer time is a function of:
      • Sector size
      • Rotation speed
      • Recording density: bits per inch on a track
  - Typical values: 100 MB / sec
Disk Access Time = Seek time + Rotational Latency + Transfer time + Controller Time + Queuing Delay
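The access-time formula above can be evaluated directly. The sketch below derives the rotational terms from the RPM and adds the remaining components; the function names and the example request (8 ms seek, 4 KB transfer at 100 MB/sec, no controller or queuing delay) are illustrative assumptions.

```python
def revolution_ms(rpm):
    """Time for one full revolution, in milliseconds."""
    return 60_000 / rpm


def avg_rotational_latency_ms(rpm):
    """On average the desired sector is half a revolution away."""
    return revolution_ms(rpm) / 2


def disk_access_ms(seek_ms, rpm, nbytes, mb_per_s,
                   controller_ms=0.0, queue_ms=0.0):
    """Seek + rotational latency + transfer + controller + queuing, in ms."""
    transfer_ms = nbytes / (mb_per_s * 1e6) * 1000
    return (seek_ms + avg_rotational_latency_ms(rpm) + transfer_ms
            + controller_ms + queue_ms)


print(round(revolution_ms(7200), 2))              # 8.33
print(round(avg_rotational_latency_ms(7200), 2))  # 4.17
print(round(disk_access_ms(8, 7200, 4096, 100), 2))
```

For small transfers the seek and rotational terms dominate, which is why the transfer rate alone says little about random-access performance.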
The Disk Interface – EIDE

EIDE, ATA, UltraATA, ATA 100, ATAPI: all the same interface
  - Used for connecting hard disk drives and CD-ROM drives
  - 80-wire cable, 40-pin dual header connector
  - 100 MB/s (ATA66 is only 66MB/s)
  - EIDE controller integrated with the motherboard (in the ICH)
EIDE controller has two channels: a primary and a secondary
  - Work independently
  - Two devices per channel: master and slave, but equal
      • The 2 devices have to take turns controlling the bus
      • A total of four devices per controller
If there are two devices on the system (e.g., a hard disk and a CD-ROM)
  - It is better to put them on different channels
Avoid mixing slower (CD) and faster devices (HDD) on the same channel
If doing a lot of copying from a CD-ROM drive to a CD-RW drive
  - Better performance by putting the devices on separate channels
The Disk Interface – Serial ATA (SATA)

Point-to-point connection
  - Ensures dedicated 150 MB/s per device (no sharing)
Dual controllers allow independent operation of each device
Thinner (7 wires), flexible, longer cables
  - Easier routing and improved airflow
  - 4 wires for signaling + 3 ground wires to minimize impedance and crosstalk
New 7-pin connector design
  - For easier installation and better device reliability
  - Takes 1/6 the area on the system board
CRC error checking on all data and control information
Increased BW supports data intensive applications such as
  - Digital video production, digital audio storage and recording, high-speed file sharing
No configuration needed when adding a 2nd SATA drive
  - One cable for each drive eliminates the need for jumpers
  - No more figuring out which device is the master or slave
Today's hard drives are clearly below 100 MB/s
  - Do not benefit from UltraATA / SATA
The BIOS
System Start-up

Upon computer turn-on several events occur:
1. The CPU "wakes up" and sends a message to activate the BIOS
2. BIOS runs the Power On Self Test (POST): make sure system devices are working ok
     - Initialize system hardware and chipset registers
     - Initialize power management
     - Test RAM
     - Enable the keyboard
     - Test serial and parallel ports
     - Initialize floppy disk drives and hard disk drive controllers
     - Display system summary information
System Start-up (cont.)

3. During POST, the BIOS compares the system configuration data obtained from POST with the system information stored on a memory chip located on the MB
     - A CMOS chip, which is updated whenever new system components are added
     - Contains the latest information about system components
4. After the POST tasks are completed
     - The BIOS looks for the boot program responsible for loading the operating system
     - Usually, the BIOS looks on the floppy disk drive A: followed by drive C:
5. After the boot program is loaded into memory
     - It loads the system configuration information contained in the registry in a Windows® environment, and device drivers
6. Finally, the operating system is loaded