Computer Architecture
PC Structure and Peripherals
Dr. Lihu Rappoport

Computer Architecture 2009 – PC Structure and Peripherals

Memory

SRAM vs. DRAM

Random access: access time is the same for all locations

                 DRAM (Dynamic RAM)               SRAM (Static RAM)
Refresh          Regular refresh (~1% of time)    No refresh needed
Address          Multiplexed: row + column        Not multiplexed
Access           Not true "random access"         True "random access"
Density          High (1 transistor/bit)          Low (6 transistors/bit)
Power            Low                              High
Speed            Slow                             Fast
Price/bit        Low                              High
Typical usage    Main memory                      Cache

Technology Trends

         Capacity          Speed
Logic    2× in 3 years     2× in 3 years
DRAM     4× in 3 years     1.4× in 10 years
Disk     2× in 3 years     1.4× in 10 years

[Figure: CPU-DRAM memory gap (latency). Performance (log scale, 1 to 1000) vs. time (1980 to 2000), with one curve for CPU and one for DRAM.]

Basic DRAM chip

[Figure: DRAM chip block diagram. Labels: memory address bus (Addr), row latch (RAS#), column latch (CAS#), row address decoder, column address decoder, memory array, Data.]

Addressing sequence:
• Row address, then RAS# asserted
• RAS# to CAS# delay
• Column address, then CAS# asserted
• Data transfer

Addressing sequence

[Figure: timing diagram of RAS#, CAS#, A[0:7] (Row i, Col n, Row j) and Data (Data n), marking tRAC (access time), the RAS/CAS delay, CL (CAS latency) and the precharge delay.]

Access sequence:
• Put the row address on the address bus and assert RAS#
• Wait for the RAS# to CAS# delay (tRCD)
• Put the column address on the address bus and assert CAS#
• Data transfer
• Precharge

Basic SDRAM controller

[Figure: DRAM controller block diagram, showing the chip-select address decoder on A[20:23] and the time delay generator.]
[Figure labels, continued: RAS#, CAS#, Select; address mux choosing between A[10:19] and A[0:9] onto the memory address bus; D[0:7]; R/W#; DRAM.]

• DRAM data must be periodically refreshed
  - Needed to keep the data correct
  - The DRAM controller performs the DRAM refresh, using a refresh counter

Improved DRAM Schemes

• Paged Mode DRAM
  - Multiple accesses to different columns of the same row
  - Saves the RAS and RAS-to-CAS delays
  [Figure: timing of RAS#, CAS#, A[0:7] (Row, Col n, Col n+1, Col n+2) and Data (Data n, D n+1, D n+2).]
• Extended Data Output RAM (EDO RAM)
  - A data output latch lets the next column address overlap with the current column's data
  [Figure: timing of RAS#, CAS#, A[0:7] (Row, Col n, Col n+1, Col n+2) and Data (Data n, Data n+1, Data n+2).]

Improved DRAM Schemes (cont.)

• Burst DRAM
  - Generates consecutive column addresses by itself
  [Figure: timing of RAS#, CAS#, A[0:7] (Row, Col n) and Data (Data n, Data n+1, Data n+2).]

Synchronous DRAM – SDRAM

• All signals are referenced to an external clock (100MHz-200MHz)
  - Makes timing more precise with other system devices
• Multiple banks
  - Multiple pages open simultaneously (one per bank)
• Command-driven functionality instead of signal-driven
  - ACTIVE: selects both the bank and the row to be activated
    (ACTIVE to a new bank can be issued while accessing the current bank)
  - READ/WRITE: select the column
• Read and write accesses to the SDRAM are burst oriented
  - Successive column locations are accessed in the given row
  - Burst length is programmable: 1, 2, 4, 8, and full-page
    (a full-page burst may be ended by BURST TERMINATE to get an arbitrary burst length)
• A user-programmable Mode Register
  - CAS latency, burst length, burst type
• Auto pre-charge: may close the row at the last read/write in a burst
• Auto refresh: internal counters generate the refresh address

SDRAM Timing

[Figure: SDRAM timing diagram with rows for clock, cmd, Bank, Addr and Data. Command sequence: ACT, NOP, RD, RD+PC, ACT, NOP, RD, ACT, NOP, RD, NOP, NOP, NOP. Marked constraints: tRCD > 20ns, tRRD > 20ns, tRC > 70ns; BL = 1.]
[Figure, continued: the Addr row shows Row i, Col j, Col k, Row m, Col n, Row l, Col q; CL = 2; the Data row shows Data j, Data k, Data n, Data q.]

• tRCD: ACTIVE to READ/WRITE gap; in clock cycles = tRCD(MIN) / clock period, rounded up
• tRC: gap between successive ACTIVE commands to different rows in the same bank
• tRRD: gap between successive ACTIVE commands to different banks

DDR-SDRAM

• 2n-prefetch architecture
  - The DRAM cells are clocked at the same speed as SDR SDRAM
  - The internal data bus is twice the width of the external data bus
  - Data capture occurs twice per clock cycle
    (lower half of the bus sampled at clock rise, upper half at clock fall)
• Uses 2.5V (vs. 3.3V in SDRAM)
  - Reduced power consumption

[Figure: SDRAM array with a 2n-bit internal bus (0:2n-1), split into halves 0:n-1 and n:2n-1 that share the n-bit external bus (0:n-1); 200MHz clock.]

DDR SDRAM Timing

[Figure: DDR SDRAM timing diagram, 133MHz clock. Command sequence: ACT, NOP, NOP, RD, NOP, ACT, NOP, NOP, RD, NOP, ACT, NOP, NOP. Banks: Bank 0, Bank 0, Bank 1, Bank 1, Bank 0. Addresses: Row i, Col j, Row m, Col n, Row l. Marked constraints: tRCD > 20ns, tRRD > 20ns, tRC > 70ns; CL = 2. Data: j, +1, +2, +3, n, +1, +2, +3.]

DIMMs

• DIMM: Dual In-line Memory Module
  - A small circuit board that holds memory chips
• 64-bit wide data path (72 bits with parity)
  - Single sided: 9 chips, each with an 8-bit data bus
    (512 Mbit/chip × 8 chips = 512 MByte per DIMM)
  - Dual sided: 18 chips, each with a 4-bit data bus
    (256 Mbit/chip × 16 chips = 512 MByte per DIMM)

DRAM Standards

• SDR SDRAM: PC66, PC100 and PC133
• DDR SDRAM:

                                 DDR200   DDR266   DDR333   DDR400   DDR533
  Bus freq (MHz)                 100      133      167      200      266
  Bit/pin (Mbps)                 200      266      333      400      533
  Total bandwidth (MByte/sec)    1600     2133     2666     3200     4264

• Total BW for DDR400: 3200 MByte/sec = 64 bit × 2 × 200MHz / 8 (bit/byte)
• Dual channel DDR SDRAM
  - Uses two 64-bit DIMM modules in parallel to get a 128-bit data bus
  - Total BW for DDR400 dual channel: 6400 MByte/sec = 128 bit × 2 × 200MHz / 8
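The DDR400 bandwidth figures above all come from the same arithmetic: bus width × transfers per clock × bus clock / 8. Below is a minimal Python sketch of that calculation. The function name `peak_bandwidth_mbyte_s` and its parameter names are illustrative assumptions, not part of any memory standard, and the result is a theoretical peak rather than a measured rate.

```python
def peak_bandwidth_mbyte_s(bus_clock_mhz, bus_width_bits=64,
                           transfers_per_clock=2, channels=1):
    """Theoretical peak bandwidth in MByte/sec (decimal megabytes).

    transfers_per_clock is 1 for SDR SDRAM and 2 for DDR SDRAM,
    which moves data on both clock edges.
    """
    bits_per_second_in_millions = (
        bus_clock_mhz * transfers_per_clock * bus_width_bits * channels
    )
    return bits_per_second_in_millions / 8  # 8 bits per byte


if __name__ == "__main__":
    # DDR400: 200MHz bus clock, one 64-bit channel
    print("DDR400 single channel:", peak_bandwidth_mbyte_s(200), "MByte/sec")
    # Dual channel: two 64-bit DIMMs in parallel act as a 128-bit bus
    print("DDR400 dual channel:  ",
          peak_bandwidth_mbyte_s(200, channels=2), "MByte/sec")
    # SDR SDRAM (PC100) transfers once per clock
    print("PC100 SDR SDRAM:      ",
          peak_bandwidth_mbyte_s(100, transfers_per_clock=1), "MByte/sec")
```

Passing a rounded clock such as 133 gives 2128 rather than the 2133 in the table, because the nominal bus clock is 133.33MHz.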
DRAM Standards

Label     Name           Bus Clock Rate   Data Bus      Bandwidth
PC66      SDRAM          66 MHz           64 bit        0.5 GB/s
PC100     SDRAM          100 MHz          64 bit        0.8 GB/s
PC133     SDRAM          133 MHz          64 bit        1.06 GB/s
PC1600    DDR200         100 MHz          64 bit        1.6 GB/s
PC1600    DDR200 Dual    100 MHz          2 x 64 bit    3.2 GB/s
PC2100    DDR266         133 MHz          64 bit        2.1 GB/s
PC2100    DDR266 Dual    133 MHz          2 x 64 bit    4.2 GB/s
PC2700    DDR333         166 MHz          64 bit        2.7 GB/s
PC2700    DDR333 Dual    166 MHz          2 x 64 bit    5.4 GB/s
PC3200    DDR400         200 MHz          64 bit        3.2 GB/s
PC3200    DDR400 Dual    200 MHz          2 x 64 bit    6.4 GB/s
PC4200    DDR533         266 MHz          64 bit        4.2 GB/s
PC4200    DDR533 Dual    266 MHz          2 x 64 bit    8.4 GB/s

DDR Memory Performance

[Figure: DDR memory performance chart. Source: http://www.tomshardware.com/]

DDR2

• DDR2 achieves high speed using a 4-bit prefetch architecture
  - The SDRAM cells read/write 4× the amount of data carried by the external bus
  - A DDR2-533 cell works at the same frequency as a DDR266 SDRAM or a PC133 SDRAM cell
• This method comes at the price of increased latency
  - DDR2-based systems may perform worse than DDR1-based systems

DDR2 – Other Features

• Eight banks in 1Gb densities and above
• Shortened page size for reduced activation power
  - Each time an ACTIVATE command is given, all bits in the page are read
    (a major contributor to the active power)
  - A device with a shorter page size has significantly lower power
  - 512Mb DDR2 page size is 1KByte vs.
    2KB for 512Mb DDR1
• Eight banks increase flexibility in DRAM accesses, but also increase the power

DDR2 vs DDR1 SDRAM

                     DDR1                    DDR2
Data Bus             64 bit                  64 bit
Data Rate            200/266/333/400 Mbps    400/533/667/800 Mbps
Bus Frequency        100/133/166/200 MHz     200/266/333/400 MHz
DRAM Frequency       100/133/166/200 MHz     100/133/166/200 MHz
Operation Voltage    2.5V                    1.8V
Package              TSOP                    FBGA
Densities            128Mb~1Gb               256Mb~2Gb
Prefetch size        2 bits                  4 bits
Burst length         2/4/8                   4/8
CAS Latency          2, 2.5, 3               3, 4, 5
Data Bandwidth       3.2 GB/s                6.4 GB/s
Power Consumption    399mW                   217mW

DDR2 Latency

• Many DDR2-533 modules have 4-4-4 timings (CAS Latency - RAS to CAS Delay - RAS Precharge Time)
  - 1.5× the latency of DDR400 2-3-2
  - The 30% growth in bandwidth does not compensate for the worse access time
• DDR2-533 latency improves considerably at 3-3-3 timings
  - Only 12% worse than the latency of 2-3-2 DDR400

Memory      Timings    Latency    Dual-channel BW
DDR400      2.5-3-3    12.5 ns    6.4 GB/sec
DDR400      2-3-2      10 ns      6.4 GB/sec
DDR533      3-4-4      11.2 ns    8.5 GB/sec
DDR533      2.5-3-3    9.4 ns     8.5 GB/sec
DDR2-533    5-5-5      18.8 ns    8.5 GB/sec
DDR2-533    4-4-4      15 ns      8.5 GB/sec
DDR2-533    3-3-3      11.2 ns    8.5 GB/sec
DDR2-600    5-5-5      16.6 ns    9.6 GB/sec
DDR2-600    4-4-4      13.3 ns    9.6 GB/sec
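The latency column in the table above appears to be the CAS latency converted from bus clock cycles to nanoseconds. Below is a minimal Python sketch under that assumption. The function name `cas_latency_ns` is illustrative, and the bus clock is taken as the data rate divided by 2 (two transfers per clock), so results can differ from the table in the last digit where the slide rounds.

```python
def cas_latency_ns(data_rate_mbps, cas_cycles, transfers_per_clock=2):
    """CAS latency in nanoseconds.

    data_rate_mbps : per-pin data rate, e.g. 400 for DDR400
    cas_cycles     : first number of the timing triple, e.g. 2 in 2-3-2
    """
    bus_clock_mhz = data_rate_mbps / transfers_per_clock
    # One cycle lasts 1000 / bus_clock_mhz nanoseconds.
    return cas_cycles * 1000 / bus_clock_mhz


if __name__ == "__main__":
    ddr400 = cas_latency_ns(400, 2)       # DDR400 2-3-2
    ddr2_444 = cas_latency_ns(533, 4)     # DDR2-533 4-4-4
    ddr2_333 = cas_latency_ns(533, 3)     # DDR2-533 3-3-3
    print(f"DDR400 2-3-2:   {ddr400:.1f} ns")
    print(f"DDR2-533 4-4-4: {ddr2_444:.1f} ns ({ddr2_444 / ddr400:.2f}x DDR400)")
    print(f"DDR2-533 3-3-3: {ddr2_333:.1f} ns ({ddr2_333 / ddr400:.2f}x DDR400)")
```

The two ratios correspond to the "1.5×" and "12% worse" comparisons on the slide.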
• Performance tests
  - DDR2-533 with 4-4-4 timings: worse than DDR400 2-3-2
  - DDR2-533 with 3-3-3 timings: better than DDR400 2-3-2
• DDR2-533 modules with 3-3-3 timings
  - Supported by 925/915
  - Best choice for enthusiastic users
• Over-clocked motherboards clock DDR2-533 at 600MHz
  - Significant improvement realized through undocumented memory frequency ratios available in i925/i915
• The performance of DDR2-based systems is more sensitive to lower latency than to higher frequency
  - We get practically nothing from using DDR2-600 SDRAM with i925/i915

DDR3

• 30% reduction in power consumption compared to DDR2
  - 1.5 V supply voltage, compared to DDR2's 1.8 V or DDR's 2.5 V
  - 90 nanometer fabrication technology
• Higher bandwidth
  - 8-bit deep prefetch buffer (vs. 4-bit in DDR2 and 2-bit in DDR)
• Transfer data rate
  - DDR3: effective clock rate of 800–1600 MHz, using both rising and falling edges of a 400–800 MHz I/O clock
  - DDR2: 400–800 MHz using a 200–400 MHz I/O clock
  - DDR: 200–400 MHz based on a 100–200 MHz I/O clock
• DDR3 DIMMs
  - 240 pins, the same number as DDR2, and the same size
  - Electrically incompatible, with a different key notch location

SRAM – Static RAM

• True random access
• High speed, low density, high power
• No refresh
• Address not multiplexed
• DDR SRAM
  - 2 READs or 2 WRITEs per clock
  - Common or separate I/O
  - DDRII: 200MHz to 333MHz operation; density: 18/36/72Mb+
• QDR SRAM
  - Two separate DDR ports: one read and one write
  - One DDR address bus, alternating between the read address and the write address
  - QDRII: 250MHz to 333MHz operation; density: 18/36/72Mb+

Read Only Memory (ROM)

• Random access
• Non-volatile
• ROM types
  - PROM: Programmable ROM
    (burnt once using special equipment)
  - EPROM: Erasable PROM
    (can be erased by exposure to UV, and then reprogrammed)
  - E2PROM: Electrically Erasable PROM
    - Can be erased and reprogrammed on
      board
    - Write time (programming) is much longer than for RAM
    - Limited number of writes (thousands)

Flash Memory

• Non-volatile, rewritable memory
  - Limited lifespan of around 100,000 write cycles
• Flash drives compared to hard disk drives
  - Smaller size, faster, lighter, noiseless, consume less energy
  - Withstand shocks of up to 2000 Gs
    (equivalent to a 10 foot drop onto concrete, without losing data)
  - Lower capacity (8GB), but going up
  - Much more expensive (cost/byte): currently ~$20 per GB
• NOR Flash
  - Supports per-byte addressing
  - Suitable for storing code (e.g. BIOS, cell phone SW)
• NAND Flash
  - Supports page-mode addressing (e.g., 1KB blocks)
  - Suitable for storing large data (e.g. pictures, songs)

The Motherboard

Motherboard with PCI Express

[Figure: block diagram of a PCI Express motherboard. Components shown: CPU with L2 cache; 800MHz FSB; North Bridge with DRAM controller, memory bus and memory; PCI Express ×16 graphics adaptor with video buffer and monitor; hub interface; South Bridge with SATA controller, IDE controller, USB controller and PCI Express; hard disk drives and CD/DVD-ROM drive; USB keyboard and mouse; PCI bus (133MB/s = 32bit × 33MHz / 8) with sound card (speakers), network card and modem (phone line); I/O controller on LPC with floppy disk drive, PS/2 mouse, serial port and parallel port.]

The Motherboard

[Figure: annotated motherboard photo. Labels: IEEE1394a header, audio header, PCI Express x1 connector, PCI add-in card connector, PCI Express x16 connector, back panel connectors, processor core power connector, rear chassis fan header.]
[Figure labels, continued: High Def. Audio header, PCI add-in card connector, LGA775 processor socket, Parallel ATA IDE connector, GMCH (North Bridge + integrated GFX), processor fan header, speaker, front panel USB header, 4 × SATA connectors, DIMM Channel A sockets, DIMM Channel B sockets, serial port header, diskette drive connector, ICH (South Bridge + integrated audio), battery, main power connector.]

How to get the most of Memory?

• Single channel DDR
  [Figure: CPU with L2 cache, FSB (Front Side Bus), North Bridge with DRAM controller, one memory bus to the DDR DIMMs.]
• Dual channel DDR
  - Each DIMM pair must be the same
  [Figure: CPU with L2 cache, FSB (Front Side Bus), North Bridge with DRAM controller driving two channels, CH A and CH B, each with DDR DIMMs.]
• Balance FSB and memory bandwidth
  - An 800MHz FSB provides 800MHz × 64bit / 8 = 6.4 GByte/sec
  - Dual channel DDR400 SDRAM also provides 6.4 GByte/sec

How to get the most of Memory? (cont.)

• Each DDR DIMM supports 4 open pages simultaneously
  - The more open pages, the more random access
  - It is better to have more DIMMs (n DIMMs: 4n open pages)
• DIMMs can be single sided or dual sided
  - Dual sided DIMMs may have a separate CS for each side
    (in this case the number of open pages is doubled, going up to 8)
  - This is not a must: dual sided DIMMs may also have a common CS for both sides, in which case there are only 4 open pages, as with single sided

Hard Disks

Hard Disk Structure

• Direct access
• Nonvolatile, large, inexpensive, and slow
  - Lowest level in the memory hierarchy
• Technology
  - Rotating platters coated with a magnetic surface
  - A moveable read/write head accesses the disk
  - Each platter is divided into tracks: concentric circles
  - Each track is divided into sectors
    (the smallest unit that can be read or written)
• Disk outer parts have more space for sectors than the inner parts
  - Constant bit density: record more sectors on the outer tracks
  - Speed varies with track location
• Buffer cache
  - A temporary data storage area used to enhance drive performance

[Figure: disk platters, showing a track and a sector.]

The IBM Ultrastar 36ZX

[Figure: top view of a 36 GB, 10,000 RPM IBM SCSI server hard disk with 10 stacked platters.]

Disk Access

Reading/writing data is a three-stage process:
• Seek time: position the arm over the proper track
  - Average: sum of the times of all possible seeks / total number of possible seeks
  - Due to locality of disk references, the actual average seek is shorter: 4 to 12 ms
• Rotational latency: wait for the desired sector to rotate under the head
  - The faster the drive spins, the shorter the rotational latency
  - Most disks rotate at 5,400 to 15,000 RPM (at 7200 RPM: about 8.3 ms per revolution)
  - The average latency to the desired information is half a revolution (at 7200 RPM: about 4.2 ms)
• Transfer block: read/write the data
  - Transfer time is a function of sector size, rotation speed, and recording density (bits per inch on a track)
  - Typical values: 100 MB/sec

Disk Access Time = Seek time + Rotational Latency + Transfer time + Controller Time + Queuing Delay

The Disk Interface – EIDE

• EIDE, ATA, UltraATA, ATA 100, ATAPI: all the same interface
  - Used for connecting hard disk drives and CD-ROM drives
  - 80-conductor cable, 40-pin dual header connector
  - 100 MB/s (ATA66 is only 66MB/s)
• The EIDE controller is integrated with the motherboard (in the ICH)
• The EIDE controller has two channels: primary and secondary
  - The channels work independently
  - Two devices per channel: master and slave, but equal
    (the 2 devices have to take turns controlling the bus)
  - A total of four devices per controller
• If there are two devices in the system (e.g., a hard disk and a CD-ROM)
  - It is better to put them on different channels
• Avoid mixing slower (CD) and faster (HDD) devices on the same channel
• If doing a lot of copying from a CD-ROM drive to a CD-RW
  - Better performance by separating the devices onto separate channels

The Disk Interface – Serial ATA (SATA)

• Point-to-point connection
  - Ensures a dedicated 150 MB/s per device (no sharing)
  - Dual controllers allow independent operation of each device
• Thinner (7 wires), flexible, longer cables
  - Easier routing and improved airflow
  - 4 wires for signaling + 3 ground wires to minimize impedance and crosstalk
• New 7-pin connector design
  - Easier installation and better device reliability
  - Takes 1/6 of the area on the system board
• CRC error checking on all data and control information
• Increased BW supports data-intensive applications
  - Such as digital video production, digital audio storage and recording, high-speed file sharing
• No configuration needed when adding a 2nd SATA drive
  - One cable for each drive eliminates the need for jumpers
  - No more figuring out which device is the master or the slave
• Today's hard drives are clearly below 100 MB/s
  - They do not benefit from UltraATA / SATA

The BIOS

System Start-up

Upon computer turn-on, several events occur:
1. The CPU "wakes up" and sends a message to activate the BIOS
2. The BIOS runs the Power On Self Test (POST) to make sure system devices are working OK:
   - Initialize system hardware and chipset registers
   - Initialize power management
   - Test RAM
   - Enable the keyboard
   - Test serial and parallel ports
   - Initialize floppy disk drives and hard disk drive controllers
   - Display system summary information

System Start-up (cont.)

3. During POST, the BIOS compares the system configuration data obtained from POST with the system information stored on a memory chip located on the MB
   - A CMOS chip, which is updated whenever new system components are added
   - Contains the latest information about system components
4.
   After the POST tasks are completed, the BIOS looks for the boot program responsible for loading the operating system
   - Usually, the BIOS looks on the floppy disk drive A: followed by drive C:
5. After the boot program is loaded into memory
   - It loads the system configuration information (contained in the registry in a Windows® environment) and device drivers
6. Finally, the operating system is loaded
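The boot-device search in step 4 is an ordered scan that stops at the first drive holding a boot program. Below is a minimal Python sketch of that idea only. The function name `find_boot_device` and the `bootable_drives` set are hypothetical stand-ins for hardware probing that a real BIOS performs in firmware.

```python
def find_boot_device(boot_order, bootable_drives):
    """Return the first drive in boot_order that holds a boot program.

    boot_order      : drives to try, in priority order, e.g. ["A:", "C:"]
    bootable_drives : set of drives that currently contain a boot program
                      (a stand-in for actually reading each drive)
    Returns None when no drive in the list is bootable.
    """
    for drive in boot_order:
        if drive in bootable_drives:
            return drive
    return None


if __name__ == "__main__":
    order = ["A:", "C:"]  # floppy first, then hard disk, as in step 4
    print(find_boot_device(order, {"C:"}))         # no floppy inserted
    print(find_boot_device(order, {"A:", "C:"}))   # bootable floppy present
    print(find_boot_device(order, set()))          # nothing bootable
```

The order matters because a bootable floppy left in drive A: is found before the hard disk C:.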