Cache Organization of Pentium

Instruction & Data Cache of Pentium
• Both 8KB caches are organized as 2-way set-associative caches with 128 sets (256 lines in total).
• There are 32 bytes in a line (8K/256 = 32).
• An LRU algorithm is used to select victims in
each cache.
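The geometry above fixes how an address is split: 32-byte lines need 5 offset bits, 128 sets need 7 index bits, and (assuming a 32-bit physical address) the remaining upper 20 bits form the tag. The following is a minimal tag-only sketch of one such 2-way cache with LRU victim selection; the names `split_address` and `TwoWayLRUCache` are hypothetical and data bytes are not modeled.

```python
# Geometry of one 8 KB cache as described above (32-bit addresses assumed).
LINE_SIZE = 32   # bytes per line (8K / 256 lines)
NUM_SETS = 128   # 256 lines / 2 ways
WAYS = 2

OFFSET_BITS = LINE_SIZE.bit_length() - 1  # 5 bits select a byte within the line
INDEX_BITS = NUM_SETS.bit_length() - 1    # 7 bits select one of the 128 sets


def split_address(addr: int) -> tuple[int, int, int]:
    """Split a 32-bit address into (tag, set index, byte offset)."""
    offset = addr & (LINE_SIZE - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)  # remaining upper 20 bits
    return tag, index, offset


class TwoWayLRUCache:
    """Tag-only model: each set keeps up to two tags, most recently used first."""

    def __init__(self) -> None:
        self.sets: list[list[int]] = [[] for _ in range(NUM_SETS)]

    def access(self, addr: int) -> bool:
        """Return True on a hit; on a miss, fill the line and return False."""
        tag, index, _ = split_address(addr)
        ways = self.sets[index]
        if tag in ways:
            ways.remove(tag)
            ways.insert(0, tag)  # hit: move to the most-recently-used position
            return True
        if len(ways) == WAYS:
            ways.pop()           # set is full: evict the least recently used tag
        ways.insert(0, tag)
        return False
```

Addresses 0x0000, 0x1000 and 0x2000 all map to set 0 (with tags 0, 1 and 2), so touching all three forces an LRU eviction in that set.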
Structure of 8KB instruction and data caches
• Each entry in a set
has its own tag.
• Tags in the data cache are triple-ported, with one port each for:
– U pipeline
– V pipeline
– Bus snooping
Data Cache of Pentium
• Bus Snooping: It is used to maintain
consistent data in a multiprocessor system
where each processor has a separate cache
• Each entry in the data cache can be configured for write-through or write-back.
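The practical difference between the two policies is how many bus write cycles reach main memory. This is a hypothetical sketch (the class name `CachedLine` is illustrative, and how the Pentium selects the policy per entry is not modeled): write-through sends every write to memory, while write-back marks the line dirty and writes it out once when it is evicted.

```python
class CachedLine:
    """Hypothetical model of one data-cache line with a per-line write policy.
    Only bus write cycles to main memory are counted; data values are not modeled."""

    def __init__(self, write_back: bool) -> None:
        self.write_back = write_back
        self.dirty = False
        self.memory_writes = 0

    def write(self) -> None:
        if self.write_back:
            self.dirty = True          # write-back: update only the cache for now
        else:
            self.memory_writes += 1    # write-through: every write also goes to memory

    def evict(self) -> None:
        if self.dirty:
            self.memory_writes += 1    # write-back: copy the modified line out once
            self.dirty = False
```

Ten writes followed by an eviction cost ten memory writes under write-through but only one under write-back.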
Instruction Cache of Pentium
• The instruction cache is write-protected to prevent self-modifying code.
• Tags in the instruction cache are also triple-ported:
– Two ports for split-line accesses
– Third port for bus snooping
Split-line Access
• Because the Pentium is a CISC processor, its instructions are of variable length (1 to 15 bytes).
• A multibyte instruction may straddle two sequential lines stored in the code cache.
• Fetching it would then take two sequential cache accesses, which degrades performance.
• Solution: split-line access
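Whether an instruction straddles two lines follows from its start address and length. A minimal sketch (the helper name `lines_touched` is hypothetical) with 32-byte code-cache lines; since the longest instruction is 15 bytes, an instruction can touch at most two lines.

```python
LINE_SIZE = 32  # bytes per code-cache line


def lines_touched(addr: int, length: int) -> int:
    """Number of cache lines occupied by an instruction of `length` bytes at `addr`."""
    if not 1 <= length <= 15:
        raise ValueError("instructions are 1 to 15 bytes long")
    first_line = addr // LINE_SIZE
    last_line = (addr + length - 1) // LINE_SIZE
    return last_line - first_line + 1
```

A 5-byte instruction starting at byte 30 occupies bytes 30 to 34, so it straddles lines 0 and 1 and would otherwise need two sequential accesses.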
Split-line Access
• It permits the upper half of one line and the lower half of the next line to be fetched from the code cache in one clock cycle.
• When a split line is read, the bytes are not correctly aligned.
• The bytes need to be rotated so that the prefetch queue receives the instruction bytes in the proper order.
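The rotation step can be sketched as follows. The slides do not say in which order the two halves come out, so this assumes (hypothetically) that they are delivered in bank order, lower-half bank first: the raw 32-byte read then holds the lower half of line n+1 before the upper half of line n, and a 16-byte left rotation restores program order. The names `raw_split_read`, `rotate_left` and `split_line_fetch` are illustrative.

```python
LINE = 32         # bytes per code-cache line
HALF = LINE // 2  # 16 bytes per half line


def raw_split_read(memory: bytes, line_addr: int) -> bytes:
    """Assumed one-cycle read: upper half of the line at `line_addr` plus the lower
    half of the next line, delivered in bank order (lower-half bank first)."""
    if line_addr % LINE:
        raise ValueError("line_addr must be line-aligned")
    upper_of_this = memory[line_addr + HALF:line_addr + LINE]
    lower_of_next = memory[line_addr + LINE:line_addr + LINE + HALF]
    return lower_of_next + upper_of_this  # misaligned as delivered


def rotate_left(data: bytes, n: int) -> bytes:
    """Rotate a byte string left by n positions."""
    n %= len(data)
    return data[n:] + data[:n]


def split_line_fetch(memory: bytes, line_addr: int) -> bytes:
    """Return the 32 bytes that straddle the line boundary, in program order."""
    return rotate_left(raw_split_read(memory, line_addr), HALF)
```

With `memory = bytes(range(64))`, `split_line_fetch(memory, 0)` yields bytes 16 through 47 in order, so an instruction starting at byte 30 is delivered contiguously.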
Instruction & Data Cache of Pentium
• Parity bits are used to maintain data integrity.
• Each tag and every byte in the data cache has its own parity bit.
• There is one parity bit for every 8 bytes of data in the instruction cache.
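The two parity granularities can be compared on a single 32-byte line: per-byte parity needs 32 bits per line in the data cache, while one bit per 8 bytes needs only 4 bits per line in the instruction cache. This sketch assumes even parity (the parity bit makes the total number of 1s even); the slides do not state the polarity, and the function names are hypothetical.

```python
def parity_bit(data: bytes) -> int:
    """Even-parity bit: 1 if `data` contains an odd number of 1 bits."""
    ones = sum(bin(b).count("1") for b in data)
    return ones & 1


def data_cache_parity(line: bytes) -> list[int]:
    """One parity bit per byte (data cache)."""
    return [parity_bit(line[i:i + 1]) for i in range(len(line))]


def code_cache_parity(line: bytes) -> list[int]:
    """One parity bit per 8-byte group (instruction cache)."""
    return [parity_bit(line[i:i + 8]) for i in range(0, len(line), 8)]
```

Flipping any single bit changes the parity of the byte (or 8-byte group) that contains it, which is how a single-bit error is detected; per-byte parity simply localizes it more precisely.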
Translation Lookaside Buffers
• They translate virtual addresses to physical addresses.
• Data cache:
– The data cache contains two TLBs.
• First:
– 4-way set-associative with 64 entries
– Translates addresses for 4KB pages of main memory
Translation Lookaside Buffers
• First:
– The lower 12 bits (the page offset) are the same in the virtual and physical addresses.
– The upper 20 bits of the virtual address are checked against four tags and, on a hit, translated into the upper 20 bits of the physical address.
– Since translation needs to be quick, the TLB is kept
• Second:
– 4-way set-associative with 8 entries
– Used to handle 4MB pages
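The translation just described can be sketched as a set-associative lookup: the virtual page number is used as the tag, the page offset passes through unchanged, and a hit supplies the physical frame number. This is a hypothetical model (`SetAssociativeTLB` is an illustrative name). It assumes the set index is taken from the low bits of the page number, and it uses FIFO replacement as a stand-in for the real replacement policy. With 12-bit offsets it models the 64-entry TLB for 4KB pages; with 22-bit offsets it models the 8-entry TLB for 4MB pages.

```python
from typing import Optional


class SetAssociativeTLB:
    """Hypothetical TLB: `entries` entries, `ways`-way set associative,
    for pages of 2**page_shift bytes."""

    def __init__(self, entries: int, ways: int, page_shift: int) -> None:
        self.ways = ways
        self.num_sets = entries // ways
        self.page_shift = page_shift
        # Each set maps a virtual page number (the tag) to a physical frame number.
        self.sets: list[dict[int, int]] = [{} for _ in range(self.num_sets)]

    def _set_for(self, vpn: int) -> dict[int, int]:
        # Assumed index function: low bits of the virtual page number.
        return self.sets[vpn % self.num_sets]

    def insert(self, vpn: int, pfn: int) -> None:
        entries = self._set_for(vpn)
        if vpn not in entries and len(entries) == self.ways:
            entries.pop(next(iter(entries)))  # evict the oldest entry (FIFO stand-in)
        entries[vpn] = pfn

    def translate(self, vaddr: int) -> Optional[int]:
        vpn = vaddr >> self.page_shift                 # upper 20 bits for 4 KB pages
        offset = vaddr & ((1 << self.page_shift) - 1)  # passes through unchanged
        pfn = self._set_for(vpn).get(vpn)              # hardware compares all tags in the set at once
        if pfn is None:
            return None                                # TLB miss
        return (pfn << self.page_shift) | offset


dtlb_4k = SetAssociativeTLB(entries=64, ways=4, page_shift=12)  # first data TLB
dtlb_4m = SetAssociativeTLB(entries=8, ways=4, page_shift=22)   # second data TLB
```

For example, after `dtlb_4k.insert(0x12345, 0xABC)`, the virtual address 0x12345678 translates to 0xABC678: the 20-bit page number is replaced and the low 12 bits are kept.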
Translation Lookaside Buffers
• Both TLBs are parity-protected and dual-ported.
• Instruction Cache:
– Uses a single 4-way set-associative TLB with 32 entries
– Both 4KB and 4MB pages are supported (4MB pages are cached in 4KB increments)
• Parity bits are used on tags and data to
maintain data integrity
• Entries are placed in all 3 TLBs through the
use of a 3-bit LRU counter stored in each set.
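Three bits per set are enough to approximate LRU over four ways. The slides only say "3-bit LRU counter", so this sketch assumes the common tree pseudo-LRU scheme: one bit chooses between the two pairs of ways and one bit per pair chooses within it, each pointing at the side to replace next. The class name `TreePLRU4` is hypothetical.

```python
class TreePLRU4:
    """Tree pseudo-LRU state for one 4-way set (assumed interpretation of the 3 bits).
    b0 picks the pair (0: ways 0-1, 1: ways 2-3); b1 picks within ways 0-1,
    b2 within ways 2-3. Each bit points at the side to replace next."""

    def __init__(self) -> None:
        self.b0 = self.b1 = self.b2 = 0

    def touch(self, way: int) -> None:
        """Record a hit or fill in `way` by pointing the bits away from it."""
        if way in (0, 1):
            self.b0 = 1                     # next victim comes from ways 2-3
            self.b1 = 1 if way == 0 else 0  # within ways 0-1, point at the other way
        elif way in (2, 3):
            self.b0 = 0                     # next victim comes from ways 0-1
            self.b2 = 1 if way == 2 else 0  # within ways 2-3, point at the other way
        else:
            raise ValueError("way must be 0..3")

    def victim(self) -> int:
        """Way to replace next."""
        if self.b0 == 0:
            return 0 if self.b1 == 0 else 1
        return 2 if self.b2 == 0 else 3

    @property
    def bits(self) -> int:
        """The three state bits packed as b0 b1 b2."""
        return (self.b0 << 2) | (self.b1 << 1) | self.b2
```

Touching ways 0, 2, 1 and 3 in that order leaves way 0 as the victim, which matches true LRU here; in general tree pseudo-LRU only approximates LRU, in exchange for needing 3 bits instead of the 5 needed to record a full ordering of four ways.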
Cache Coherency in Multiprocessor
• When multiple processors are used in a single
system, there needs to be a mechanism
whereby all processors agree on the contents
of shared cache information.
• For example, two or more processors may use data from the same memory location, X.
• If each processor can change the value of X, which value of X should be considered valid?
Cache Coherency in Multiprocessor
• If each processor changes the value of the data item, each cache ends up holding a different (incoherent) value of X.
• Solution : Cache Coherency Mechanism
A multiprocessor system with incoherent cache data
Cache Coherency
• Pentium’s mechanism is called MESI (Modified, Exclusive, Shared, Invalid).
• This protocol uses two bits stored with each line of data to keep track of the state of the cache line.
Cache Coherency
• The four states are defined as follows:
• Modified:
– The current line has been modified and is only
available in a single cache.
• Exclusive:
– The current line has not been modified and is only
available in a single cache
– Writing to this line changes its state to modified
Cache Coherency
• Shared:
– Copies of the current line may exist in more than
one cache.
– A write to this line causes a write-through to main memory and may invalidate the copies in the other caches.
• Invalid:
– The current line is empty
– A read from this line will generate a miss
– A write will cause a write-through to main memory.
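The four states and the transitions listed above can be written as a small state machine. This is a simplified sketch, not the Pentium's full logic: the 2-bit encoding is arbitrary, a read miss is assumed to fill the line as Exclusive only when no other cache holds it, a write to a Shared line is kept in Shared after its write-through, and cacheability controls are not modeled. The function names are hypothetical.

```python
from enum import IntEnum
from typing import Optional


class MESI(IntEnum):
    # Two bits per line encode the four states (this particular encoding is arbitrary).
    INVALID = 0b00
    SHARED = 0b01
    EXCLUSIVE = 0b10
    MODIFIED = 0b11


def local_read(state: MESI, other_copies: bool) -> tuple[MESI, Optional[str]]:
    """Read by this processor: returns (new state, bus action)."""
    if state == MESI.INVALID:
        # Read miss: fetch the line; Exclusive only if no other cache holds it.
        return (MESI.SHARED if other_copies else MESI.EXCLUSIVE), "line fill"
    return state, None  # M, E, S: read hit, no bus cycle


def local_write(state: MESI) -> tuple[MESI, Optional[str]]:
    """Write by this processor: returns (new state, bus action)."""
    if state in (MESI.MODIFIED, MESI.EXCLUSIVE):
        return MESI.MODIFIED, None           # only copy: update the cache silently
    if state == MESI.SHARED:
        return MESI.SHARED, "write-through"  # other caches may invalidate their copies
    return MESI.INVALID, "write-through"     # write miss: no line is allocated


def snoop(state: MESI, other_writes: bool) -> tuple[MESI, Optional[str]]:
    """Another bus master reads (other_writes=False) or writes the snooped line."""
    action = "write back" if state == MESI.MODIFIED else None  # supply the only valid copy
    if other_writes or state == MESI.INVALID:
        return MESI.INVALID, action
    return MESI.SHARED, action
```

In the code cache only Shared and Invalid would ever be reached, since instructions are never written through it.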
Cache Coherency
• Only the Shared and Invalid states are used in the code cache.
• The MESI protocol requires the Pentium to monitor all accesses to main memory in a multiprocessor system. This is called bus snooping.
Cache Coherency
• Consider the above example.
• If processor 3 writes its local copy of X (30) back to memory, the memory write cycle will be detected by the other three processors.
• Each processor will then run an internal inquire cycle to determine whether its data cache contains the address of X.
• Processors 1 and 2 then update their caches according to their individual MESI states.
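The inquire sequence in this example can be simulated with four processors on a shared bus. This is a hypothetical sketch (`Processor`, `Bus` and the address 0x1000 for X are illustrative): only snooped write cycles are modeled, and a snooped write simply invalidates any copy found by the inquire cycle. Here processors 1 to 3 start with Shared copies of X and processor 4 does not cache X.

```python
class Processor:
    def __init__(self, pid: int) -> None:
        self.pid = pid
        self.cache: dict[int, str] = {}  # address -> state letter ("M", "E", "S"); absent means "I"

    def inquire(self, addr: int) -> bool:
        """Internal inquire cycle for a snooped write: invalidate the line if cached."""
        if addr not in self.cache:
            return False          # miss: nothing to update
        del self.cache[addr]      # another master wrote the line: local copy becomes Invalid
        return True


class Bus:
    def __init__(self, processors: list[Processor]) -> None:
        self.processors = processors
        self.memory: dict[int, int] = {}

    def write(self, origin: Processor, addr: int, value: int) -> list[int]:
        """Memory write cycle; returns the ids of processors whose inquire cycle hit."""
        self.memory[addr] = value
        # Every other processor detects the write cycle and runs an inquire cycle.
        return [p.pid for p in self.processors
                if p is not origin and p.inquire(addr)]


X = 0x1000
procs = [Processor(pid) for pid in (1, 2, 3, 4)]
for p in procs[:3]:
    p.cache[X] = "S"
bus = Bus(procs)
hits = bus.write(procs[2], X, 30)  # processor 3 writes X = 30 back to memory
```

After the write, `hits` is `[1, 2]`: all three other processors run an inquire cycle, but only processors 1 and 2 hold X and update (here, invalidate) their copies.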
Cache Coherency
• Inquire cycles examine the code cache as well (since the code cache also supports bus snooping).
• The Pentium’s address lines are used as inputs during an inquire cycle to accomplish bus snooping.