
Computer Organization

MODULE – 5
Input / Output Organization
Accessing I/O Devices
Interrupts
Interrupt Hardware
Enabling and Disabling Interrupts
Handling Multiple Devices
Controlling Device Requests
Direct Memory Access
Bus Arbitration
Memory System
Basic Concepts
Semiconductor RAM Memories
Internal Organization of Memory Chips
Static Memories
Asynchronous DRAMs
Synchronous DRAMs
Structure of Larger Memories
Memory System Considerations
Rambus Memory
Cache Memories
Mapping Functions
Replacement Algorithms
MODULE – 5
Computer Organization
INPUT / OUTPUT ORGANIZATION
Introduction
This chapter examines the various ways in which input and output are organized.
Accessing I/O Devices
All I/O devices interact with the rest of the system through the processor. A simple way to connect I/O devices
is shown below:
The bus enables all I/O devices to communicate with the processor. The bus is not a single
line; it consists of three groups of lines, namely:
● Address lines
● Data lines
● Control lines
Each I/O device is assigned a unique address. When the processor places an address on the
address lines, the device associated with that address becomes active. It reads the control lines to
determine the operation it needs to perform. Depending on whether it is a Read or Write operation, data is
then placed on, or received from, the data lines.
An I/O device could be mapped to a section of memory. Such a method of accessing an I/O device
via memory is called “Memory Mapped I/O”.
So, if some data is written to the memory it is as good as writing onto the I/O device and reading
from that memory is as good as reading from the I/O device.
Ex: The instruction Move DATAIN,R0 moves the content of memory location “DATAIN” to register R0. Now,
if this memory location “DATAIN” is mapped to the keyboard, the above instruction is effectively reading
from the keyboard.
Ex: The instruction Move R0,DATAOUT writes the content of R0 register to DATAOUT memory
which could be the buffer for display unit or a printer.
When an I/O device is connected to the system, it requires interface hardware as shown
in the following figure:
● Address decoder: This circuit decodes the address placed on the address lines and
determines whether the address belongs to this device.
● Data register: It holds the data being transferred to or from the processor.
● Status register: It holds information about the status of the operation being executed.
● Control circuit: Coordinates the I/O transfer.
Since the processor runs at a very high speed while the I/O device may operate at a
much slower rate (a user types slowly, for instance), some coordination mechanism is required. In earlier
sections, we saw the SIN and SOUT bits used for this coordination.
Ex: Let’s look at a computer with keyboard as input device and display as output device. When
data is entered via keyboard, the content is made available from “DATAIN” register. When data
need to be displayed, it is stored in the “DATAOUT” register. The status register contains two flags
“SIN” and “SOUT” used to synchronize the input and output devices with the CPU. Two more flags
within the status register, “DIRQ” and “KIRQ”, work in conjunction with the “DEN” and
“KEN” bits in a special CONTROL register for handling interrupts.
The aim here is to read one line of content and display it on screen.
Note: In the program below, one character is read and written to the screen at a time. This process
stops when the user presses the Enter key {ASCII value 13 (decimal) or 0Dh (hex)}.
        Move      #LINE,R0        Initialize memory pointer
WAITK   TestBit   #0,STATUS       Test SIN
        Branch=0  WAITK           Wait for a character to be entered
        Move      DATAIN,R1       Read a character
WAITD   TestBit   #1,STATUS       Test SOUT
        Branch=0  WAITD           Wait for display to be ready
        Move      R1,DATAOUT      Send character to display
        Move      R1,(R0)+        Store the character and advance the pointer
        Compare   #$0D,R1         Check if Carriage Return (CR)
        Branch≠0  WAITK           If not, get another character
        Move      #$0A,DATAOUT    Otherwise, send Line Feed (LF)
        Call      PROCESS         Call subroutine to process the input line
Now, let us try to understand the above program step by step:
● Move #LINE,R0
The complete string read from the keyboard will be stored at “LINE”. It is accessed via
register R0.
WAITK   TestBit   #0,STATUS
        Branch=0  WAITK
        Move      DATAIN,R1
Keep checking bit 0 of the status register; if it is 0, the user has not yet typed a
key, so loop back to “WAITK”. The code keeps looping as long as bit 0 of the status register (i.e.
SIN) is 0. Once it becomes 1, the DATAIN register has been filled with the key typed by the
user, so the contents of DATAIN are read into register R1. At this point, R1 holds one
character read from the keyboard.
WAITD   TestBit   #1,STATUS
        Branch=0  WAITD
        Move      R1,DATAOUT
        Move      R1,(R0)+
Now it is the turn of the display-related handling. Check bit 1 of the status register (SOUT). If it is 0,
the display is still busy transferring the previous character, so wait. The loop
continues as long as bit 1 of the status register is 0. As soon as it becomes 1,
the following two steps take place:
o The content of register R1 (i.e. the character read from the keyboard) is copied to DATAOUT, i.e.
the display.
o The same content of R1 is also stored into the buffer LINE, at the location pointed to by R0.
(R0)+ is a post-increment operation, so R0 moves ahead, ready for the next
character in the LINE buffer.
        Compare   #$0D,R1
        Branch≠0  WAITK
Check whether the user has pressed the carriage return (“\r”) key. If not, the user may enter more
characters, so control jumps back to WAITK; otherwise execution falls through to the next instruction.
        Move      #$0A,DATAOUT
        Call      PROCESS
If the user presses the Enter key, “\n” is displayed on the screen and the PROCESS
subroutine is called. The PROCESS subroutine could then work on the “LINE” buffer, which
stores the complete input up to the Enter key.
Such a mechanism, in which the processor repeatedly checks a status flag to achieve synchronization,
is called Program-Controlled I/O.
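The busy-wait structure of the WAITK loop can be sketched in a few lines. This is a minimal simulation (the register layout and the sample keystrokes are assumed for illustration, not taken from real hardware): each loop iteration is one poll of STATUS, and the line ends at carriage return, just as in the program above.

```python
# Program-controlled I/O sketch: poll SIN until a key arrives.
SIN_BIT = 0x01   # bit 0 of STATUS: a key has been typed (assumed layout)
SOUT_BIT = 0x02  # bit 1 of STATUS: display ready for the next character

# Each tuple simulates one poll of the device: (STATUS, DATAIN)
events = [(0x00, None), (0x03, ord('H')), (0x00, None),
          (0x03, ord('i')), (0x03, 0x0D)]

line = []     # plays the role of the LINE buffer
display = []  # characters sent to DATAOUT

for status, datain in events:
    if not status & SIN_BIT:      # WAITK: no key yet, poll again
        continue
    ch = datain                   # Move DATAIN,R1
    if status & SOUT_BIT:         # WAITD: display is ready
        display.append(ch)        # Move R1,DATAOUT
    line.append(ch)               # Move R1,(R0)+
    if ch == 0x0D:                # Compare #$0D,R1: carriage return?
        break

print(bytes(line))                # → b'Hi\r'
```

Note how the processor does nothing useful on the iterations where SIN is 0; that wasted time is exactly what interrupts remove.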
There are two other methods of synchronization which are generally used:
● Interrupts
● Direct memory access
Interrupts
The code in the previous section was processor intensive. The processor keeps
checking the “SIN” and “SOUT” bits and does no useful work other than
running the loop. To avoid this situation, the I/O device can use interrupts. When
the I/O device is ready, it signals the processor on a separate line called the interrupt-request
line. On receiving the interrupt, the processor reads the I/O device. This removes the
busy-wait loop.
Example: Let’s take a task that involves two activities:
● Perform some computations
● Print the results
The above two steps are repeated many times.
Let the program contain two routines “COMPUTE” and “PRINT”. Let’s assume that COMPUTE
produces a set of “n” lines of output that needs to be printed by “PRINT”.
Method 1: The COMPUTE routine passes the “n” lines to the PRINT Routine. PRINT routine then
prints one line at a time on the printer. All the while the COMPUTE Routine is waiting.
Method 2: The COMPUTE routine passes the “n” lines to the PRINT routine. PRINT routine then
sends one line to the printer and instead of waiting for the line to get printed on the printer, the
PRINT routine suspends itself and gives control to COMPUTE routine. Compute routine resumes
its activity. Once the line has been printed the printer sends an interrupt to the processor of
computer, at this point COMPUTE is suspended and PRINT routine is activated. PRINT routine
sends the second line to the printer so that printer can keep printing. The PRINT again suspends
itself till the printer is done with printing, so COMPUTE takes over and resumes executing only to
be interrupted later when printing of 2nd line is complete. This process keeps repeating.
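Method 2 can be sketched as a toy simulation (all names are illustrative): the PRINT routine runs only when the printer "interrupts", and COMPUTE makes progress in between.

```python
# Interrupt-driven printing sketch: COMPUTE works while the printer
# is busy; each printer interrupt runs PRINT for one more line.
pending = ["result 1", "result 2", "result 3"]  # output of COMPUTE
printed = []        # lines the printer has finished
compute_units = 0   # useful work COMPUTE managed to do meanwhile

def print_isr():
    """PRINT routine: send one line to the printer, then return."""
    printed.append(pending.pop(0))

print_isr()                 # send the first line, then resume COMPUTE
while pending:
    compute_units += 1      # COMPUTE runs while the printer works...
    print_isr()             # ...until the printer interrupts again

print(printed, compute_units)
```

With Method 1, `compute_units` would stay at 0 until every line had been printed; here COMPUTE overlaps with the printing.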
Interrupt Service Routine (ISR): The routine that gets executed when an interrupt request is
made is called interrupt service routine (ISR). In the above example, PRINT subroutine is an
ISR.
Working of Interrupts
Step 1: The processor is executing instruction “i” when the interrupt occurs; at that point the program
counter (PC) points to instruction i+1.
Step 2: When the interrupt occurs, the program counter (PC) value (i.e. address of instruction i+1)
is stored on processor stack (or a special register)
Step 3: PC is now loaded with the address of interrupt service routine (i.e. the address of first line
of PRINT routine)
Step 4: Once the interrupt service routine (i.e. PRINT routine) is completed the address on
processor stack is popped and placed back in program counter (i.e. address of i+1th
instruction of “COMPUTE” routine.)
Step 5: Execution resumes from i+1th line of COMPUTE routine
NOTE: While the interrupt service routine is being executed, a signal is sent to the device to inform it
that its interrupt request has been recognized. This is done by a special signal called the
“Interrupt-Acknowledge” signal.
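Steps 1-5 above can be modelled with a Python list standing in for the processor stack (the addresses here are symbolic labels, purely for illustration):

```python
# Interrupt entry/exit sketch: save PC, run the ISR, restore PC.
stack = []
pc = "COMPUTE: i+1"        # Step 1: PC points at instruction i+1

stack.append(pc)           # Step 2: PC saved on the processor stack
pc = "PRINT"               # Step 3: PC loaded with the ISR address

isr_ran = (pc == "PRINT")  # Step 4 would execute the PRINT routine here

pc = stack.pop()           # Step 4: saved address popped back into PC
print(pc)                  # Step 5: execution resumes at i+1
```

The stack is what makes the mechanism re-entrant: a second interrupt during the ISR would simply push another return address on top.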
Difference between Interrupt Service Routine call and a Subroutine call
Calling an interrupt service routine and calling any other subroutine seem similar, (the PC
value is pushed on stack and ISR address value is placed in PC, on return from ISR, the stored
PC value is popped back into PC).
However, the major difference is that when a subroutine is called, all register values it uses are saved
before the subroutine runs. When an interrupt service routine is called, only two
register values are saved:
● PC value (for example, the address of the i+1th instruction in the previous example)
● Status register
The processor does not store the contents of all registers because doing so takes
time and makes the interrupt service routine wait (this delay is called interrupt
latency). One solution is for the programmer to write code at the start of the interrupt
service routine to save all register values, and to restore them just before exiting
the routine.
Different methods of handling interrupt latency in various Processor Architectures:
There are three different ways in which modern processors handle the above problem of
interrupt latency (i.e. storing and retrieving register values around an interrupt service routine call):
Method 1: In some processor architectures the number of registers in the system is very small. In
this case the values are stored by the processor itself before calling interrupt service
routine.
Method 2: Some architectures provide two special instructions: one that saves the register values
and another that does not. The I/O interface designer chooses the one
that suits its needs, depending upon the response-time requirements.
Method 3: The processor supports two sets of registers. One for normal processing and another
duplicate set for interrupt handling. So, when an interrupt occurs, different set of
registers are at work and there is no need to save the registers and then restore them
back.
Interrupt Hardware
Most computers have the facility to connect multiple I/O devices (for example, laptops have
3-4 USB ports, an RS232 port for connecting a projector, etc.). All these I/O devices are connected
via switches as shown in figure below:
So, there is a common interrupt line which serves “n” I/O devices. The interrupt handling works in
the following manner.
● When no I/O device is issuing an interrupt, then all switches are open. This means the
entire Vdd voltage passes on the line and reaches the processor, which means a “1” is
flowing on the line.
● As soon as an I/O device raises an interrupt, (to indicate that a key has been typed on
keyboard or that the display is ready to receive next data / character), the switch associated
with I/O device gets closed. So, the entire current is passed / dissipated via that switch
which means the hardware line reaching the processor gets 0 voltage. This is an indication
to the processor that an interrupt has occurred, and processor needs to act on this to
identify which I/O device triggered the interrupt.
Note: (i) The value of INTR is the logical OR of the requests from individual devices,
i.e. INTR = INTR1 + ……. + INTRn
(ii) The interrupt request is represented in complemented (active-low) form, written INTR with an
overbar. This is because when there is no interrupt the processor receives a 1 (the Vdd voltage),
while when an interrupt actually occurs the processor receives 0 volts.
(iii) The switches in electronic implementation are special gates known as open-collector
(for bipolar circuits) and open-drain (MOS circuits).
(iv) The resistor “R” is called a pull-up resistor because it pulls the line voltage to the high-voltage
state when all switches are open (i.e. the no-interrupt scenario).
Summary on Hardware Interrupts
● When there are no interrupts, the processor receives a 1 on the interrupt line going to the
processor.
● When an interrupt occurs on one of the I/O devices, the voltage drops to 0 on the interrupt line.
This is an indication to the processor that some I/O device has issued an interrupt.
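The open-collector behaviour amounts to a complemented OR of the requests, and can be sketched in one function (a logical model only, not an electrical one):

```python
# Wired interrupt line model: with every switch open, the pull-up
# resistor holds the line at 1; any closed switch pulls it to 0.
def intr_line(requests):
    """requests[i] is True when device i asserts its interrupt."""
    return 0 if any(requests) else 1

print(intr_line([False, False, False]))  # no interrupt → 1
print(intr_line([False, True, False]))   # one device interrupts → 0
```

This is why the line is described as carrying the complement of INTR1 + ... + INTRn: the processor sees 0 exactly when the OR of the requests is 1.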
Enabling and Disabling Interrupts
From the previous discussions on interrupt, we can infer that:
● An interrupt temporarily stops the currently executing program and branches to an
interrupt service routine.
● An interrupt can occur at any time during the execution of a program.
Because of above reasons, it may sometimes be necessary to disable interrupts and
enable them later. Some processors provide special machine instructions such as
Interrupt-Enable and Interrupt Disable that perform these tasks.
Recursive Interrupt Problem
Let’s assume that an interrupt has occurred and the interrupt service routine is about to
execute. The interrupt request line is still asserted, and it may not be deactivated until the interrupt
routine returns. This may take some time, during which the processor might conclude that more interrupts
have arrived and schedule the interrupt service routine again (this happens because the line has not
been deactivated), which leads to an endless loop of interrupts.
There are 3 methods to handle this situation:
Method 1: As soon as an interrupt service routine is entered, all interrupts are disabled. Interrupts
are enabled only while exiting from the interrupt service routine. This ensures that
processor does not notice false alarms. Also, note that the program written for the
interrupt service routine has to enable and disable interrupts.
Method 2: In the case of a simpler processor, the processor itself can disable interrupts when the
interrupt service routine is entered and enable them when the interrupt
service routine is about to exit. This differs from the first method: in method 1 the
programmer has to explicitly write code to enable/disable interrupts, whereas in
method 2 the processor has a built-in mechanism to do it.
This is done by having a special interrupt-enable bit in the processor status (PS) register.
After the PC and PS are saved on the stack, this bit is cleared. While exiting from the ISR,
the saved PS is restored and the bit is automatically re-enabled.
Method 3: Interrupts are handled only when interrupt line is going high (i.e. on the leading edge of
signal). So, even if the signal is high for a long time it does not matter, as the trigger is
at the leading edge (called Edge-triggered).
Summary:
● The device raises an interrupt request.
● The processor interrupts the current program being executed.
● Interrupts are disabled by changing a control bit in the status register (PS – processor status).
● The I/O device is informed that the interrupt is being handled, so that the interrupt request signal can
be deactivated.
● The interrupt service routine performs required action (for example, to read the keyboard
register or display the next character on display unit).
● Interrupts are enabled back and the execution of the interrupted program is resumed.
Handling Multiple Devices
There could be scenarios where multiple I/O devices capable of issuing interrupts are connected
to the CPU. Since these are independent devices raising interrupts at random times,
several issues arise, such as:
● How will the processor identify the device raising an interrupt?
● If each I/O device has its own interrupt service routine, how will the processor know
the address of the interrupt service routine specific to each I/O device?
● How would the processor handle two simultaneous interrupts?
● Should a device be allowed to interrupt when another interrupt service routine is already
executing?
Solution for identifying the device raising the interrupt:
The status register can be used to identify the device raising the interrupt. When a device
raises an interrupt, it sets a specific bit to 1 (in the previous example, the KIRQ bit was set to 1
when a key was pressed on the keyboard, and the DIRQ bit was set to 1 when the display was done
displaying the current content). This bit is called the IRQ bit.
Disadvantage: Time spent in checking the IRQ bits of all devices is more, considering that most of
the devices are not generating interrupts simultaneously.
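The polling cost is easy to see in a sketch (the bit positions here are assumed, echoing the KIRQ/DIRQ example): the processor must scan every device's IRQ bit even when only one device interrupted.

```python
# Device identification by polling IRQ bits in a shared STATUS word.
def identify_device(status, n_devices):
    """Return the first device whose IRQ bit is set, or None.
    Scans bit by bit, which costs time proportional to n_devices."""
    for dev in range(n_devices):
        if status & (1 << dev):
            return dev
    return None

print(identify_device(0b10, 2))  # DIRQ (bit 1) set → device 1
```

With many devices and rare simultaneous interrupts, most of this scan is wasted work, which motivates vectored interrupts below.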
Vectored Interrupts
Another solution is to use the concept of vectored interrupts. When an I/O device
raises an interrupt, it places a special code on the bus destined for the processor. This helps the
processor identify the device. This special code could be about 4-8 bits and might represent
the starting address of the interrupt service routine; the processor reads this value from the
bus, then goes to that location and fetches an instruction.
This instruction branches to the memory location where the code for interrupt service routine is
located. So, when the processor reads the branching address, it puts it in the Program Counter
(PC), leading to a jump/branching to the interrupt service routine. This address in the branch
instruction is called interrupt vector.
In some processors, the I/O device directly puts this interrupt vector on the data bus, so that
processor can directly jump to the address and start executing the ISR. To avoid multiple devices
from writing simultaneously on the data bus, an Interrupt Acknowledge Line (INTA) is used. When
a processor is ready to receive an interrupt it activates the interrupt acknowledge line. An I/O
device waiting to inform processor of an interrupt will pull “down” the INTR line and place the
interrupt vector (address of ISR) on the data bus.
So, the processor knows an interrupt has occurred (since the INTR line has been pulled down)
and knows the location where the interrupt service routine code is present (the I/O device places
the vector on the data bus).
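The dispatch step can be sketched with a table lookup (the vector values and routine names below are illustrative, not any real processor's layout):

```python
# Vectored-interrupt sketch: the device puts its interrupt vector on
# the data bus; the processor uses it to find the ISR directly.
vector_table = {
    0x40: "keyboard_isr",
    0x44: "display_isr",
}

def dispatch(vector_on_bus):
    """Load PC from the table entry selected by the device's vector."""
    return vector_table[vector_on_bus]

print(dispatch(0x40))  # keyboard placed its vector → keyboard_isr
```

No scanning of IRQ bits is needed: the vector identifies both the device and its service routine in one step.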
Interrupt Nesting
● While an interrupt is being handled by an interrupt service routine, another interrupt may arrive that
is more important than the current one and needs to be handled. In that case, the first
interrupt service routine needs to be paused and the second, more important interrupt
handled.
● To handle such scenarios, the interrupts from multiple devices are prioritized (i.e. certain
interrupts are given high priority and certain interrupts are given low priority).
● When the processor is executing a low priority interrupt service routine and the processor
receives an interrupt from a high priority device, then the low priority interrupt is paused and
high priority interrupt is handled.
● This is a concept called Priority of a processor. The priority of a processor is nothing but the
priority of the interrupt request being handled. So, processor priority is low if it is executing a
low priority interrupt.
Implementation
The concept of multi-priority can be executed by having separate interrupt request and
interrupt acknowledge lines as shown below:
Each interrupt request line (INTR1 ... INTRn) is assigned a different priority level. When an interrupt
request is received on one of these lines, it is given to the “priority arbitration circuit”.
This circuit decides if the received interrupt has higher priority level than the interrupt assigned to
the processor. If yes, then existing ISR is paused and new interrupt is handled.
Simultaneous Requests
● If multiple devices raise interrupts simultaneously, deciding which interrupt to handle is
straightforward: as in the previous scenario, the interrupt with the higher priority is chosen.
● The above logic works fine when each device has its own interrupt request line and interrupt
acknowledge line as per the above diagram. But if all devices are connected to a common
interrupt request line, the process becomes more complex.
● A widely used mechanism is as shown below:
The daisy-chain interrupt handling mechanism works as described in below steps:
Step 1: Multiple devices raise interrupts by trying to pull down the INTR line.
Step 2: The processor realizes that devices are raising interrupts. So, it makes the INTA line to go
high (i.e. 1).
Step 3: The INTA line is connected to one device (for example, device 1 in the figure). If this
device 1 had raised the interrupt, then it goes ahead and puts the “identifying code” on
the data line. If it had not raised the interrupt, then it passes the INTA signal to device 2
and so on.
So, priority is defined by proximity: the device nearest to the processor has the highest priority. This
method ensures that multiple requests are handled even when all devices share a common interrupt
request line. Compared with the interrupt priority mechanism, which used individual
request/acknowledge lines for each device, this method has the added advantage that the number
of lines is reduced to just one.
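Steps 1-3 of the daisy chain reduce to a simple ripple, which can be sketched as follows (device numbering is an assumption: index 0 is the device nearest the processor):

```python
# Daisy-chain sketch: INTA enters at the device nearest the processor
# and ripples outward; the first device with a pending request keeps
# INTA and identifies itself, so proximity defines priority.
def daisy_chain(pending):
    """pending[i] is True if device i raised an interrupt.
    Returns the device that receives INTA, or None."""
    for dev, wants in enumerate(pending):
        if wants:
            return dev      # keeps INTA, puts its code on the data bus
        # otherwise this device forwards INTA to its neighbour
    return None

print(daisy_chain([False, True, True]))  # device 1 wins over device 2
```

The fixed priority is the scheme's weakness: a distant device can be starved if nearer devices interrupt often, which is what the priority-group arrangement below mitigates.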
Priority Groups:
A small enhancement can be done to the above mechanism as shown in the following
figure. This is a combination of both fig 4.7 and 4.8 (a)
In the above implementation, the devices are grouped (all devices of same priority may form a
group) and when multiple devices raise interrupt, the priority arbitration circuit ensures high priority
group is handled first.
Within this high priority group, the device nearest to the processor gets first chance to send its
interrupt. Hence, the above method uses priority lines as well as daisy chain method to handle
simultaneous interrupts.
Controlling Device Requests
There are two registers that are involved in interrupt handling:
i. CONTROL register
ii. STATUS register
The way these two registers work is as described below:
CONTROL register
The control register works at the device end. It contains the interrupt-enable bits described below.
DEN - Display Interrupt Enable
If the DEN flag is set (to 1), an interrupt is generated by the device as soon as the DIRQ bit in the
STATUS register gets set. If the DEN flag is 0, an interrupt is NOT generated even if the
DIRQ bit is set in the STATUS register (the INTR line will never be pulled down).
KEN - Keyboard Interrupt Enable
Here, the behavior is similar to DEN. If KEN is set to 1, a keyboard interrupt is generated as
soon as KIRQ becomes 1, and an interrupt is never generated if KEN=0 irrespective of KIRQ
being 1 or not (INTR line will never be pulled down).
(Note: The SIN and SOUT bits indicate whether data in the DATAIN and DATAOUT registers is
available or not.)
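The DEN/KEN behaviour is a simple AND gate between the request flag and the enable bit, which can be sketched directly:

```python
# Interrupt gating sketch: a request reaches the INTR line only when
# the device's IRQ flag AND its enable bit (KEN or DEN) are both set.
def intr_asserted(irq, enable):
    """True only if the request flag and the enable bit are both 1."""
    return bool(irq and enable)

print(intr_asserted(irq=1, enable=0))  # KEN=0: INTR never pulled down
print(intr_asserted(irq=1, enable=1))  # KEN=1: interrupt generated
```

The IRQ flag still records the event even when the enable bit is 0, so software can poll it, or see the interrupt later once the enable bit is set.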
Ex: Let’s write a program to understand the working of interrupts:
● The processor uses the vectored interrupt scheme, where the starting address of the interrupt service
routine is stored at a memory location INTVEC.
● Interrupts are enabled by setting to 1 an interrupt-enable bit, IE, in the processor status word.
Let it be bit 9.
● A keyboard and a display unit are connected to the processor; they have the status, control,
and data registers as shown below:
The aim of the program is to read the characters from keyboard and store the characters in
successive byte locations in the memory, starting at location LINE. We shall try to achieve
this using interrupts in 2 stages as given below:
● Do all initialization in the main program.
● Read one character in the interrupt service routine and return back from routine (so for
every character entered by user the ISR is called)
Main Program:

        Move      #LINE,PNTR     Initialize buffer pointer
        Clear     EOL            Clear end-of-line indicator
        BitSet    #2,CONTROL     Enable keyboard interrupts
        BitSet    #9,PS          Set interrupt-enable bit in the PS
Interrupt-Service Routine:

INTVEC  MoveMultiple  R0-R1,-(SP)   Save registers R0 and R1 on stack
        Move          PNTR,R0       Load address pointer
        MoveByte      DATAIN,R1     Get input character
        MoveByte      R1,(R0)+      and store it in memory
        Move          R0,PNTR       Update pointer
        CompareByte   #$0D,R1       Check if Carriage Return
        Branch≠0      RTRN
        Move          #1,EOL        Indicate End of Line
        BitClear      #2,CONTROL    Disable keyboard interrupts
RTRN    MoveMultiple  (SP)+,R0-R1   Restore registers R0 and R1
        Return-from-interrupt
The above program is divided into two parts, the main program that does the initialization and the
interrupt service routine.
In the main program, the following are achieved:
i. Load the starting address of the interrupt service routine in location INTVEC.
ii. Load the address LINE in a memory location PNTR. The interrupt service routine will use
this location as a pointer to store the input characters in the memory.
iii. Enable keyboard interrupts by setting bit 2 in the CONTROL register to 1.
iv. Enable interrupts in the processor by setting to 1 the IE bit in the processor status register, PS.
After the initialization is complete, when a user types a character, an interrupt is generated
(because bit 2 in CONTROL register is set to 1). The program currently being executed gets
interrupted and the interrupt service routine gets called.
The ISR does the following in its code:
i. Read the input character from the keyboard input data register. Once this is done, the
interface circuit removes the interrupt request.
ii. Stores the character in the memory location pointed to by PNTR, and increments PNTR.
iii. When the end of the line is reached, disable keyboard interrupts and inform the program, and
iv. Return from interrupt.
Exceptions
Recovery from errors
Interrupts can be generated when an exception occurs during execution of a program. For
example, during an arithmetic calculation a divide by zero error could be encountered or the
opcode in an instruction may not be a valid one. In such scenarios, an exception occurs and an
interrupt is generated. The processing of exception is similar to an interrupt from I/O device.
Step 1: Current executing program is suspended
Step 2: Save the PC and status registers
Step 3: Start the interrupt service routine to handle the exception.
Step 4: The interrupt service routine could take corrective action for the exception, or the user could be
informed of the exception (e.g. by displaying an error message).
Debugging:
The system software usually includes a program called a debugger that is used to debug
issues in a program code. The debugging software can run in two modes:
● Trace mode: In this mode, an exception occurs after every instruction of the user program. This lets
the programmer examine the contents of memory and registers at every line of the code.
● Breakpoint mode: It is similar to trace mode except that the program pauses/breaks only at specific
breakpoints set by the programmer. If a user sets a breakpoint on instruction “i” and runs the
debugger, the code executes and stops at line number “i”, it saves the address of “i+1”
instruction and executes the interrupt service routine. The user can check the status of
memory/register and then exit the interrupt service routine. The address of “i+1” instruction is
popped from processor stack and execution resumes. The execution stops at the next
breakpoint set by user.
Privilege Exception:
This is the 3rd type of exception being discussed. There are certain instructions that can be
executed only when the processor is in supervisor mode. These are called privileged instructions.
Ex: Changing the priority of processor (i.e. priority of task running on processor).
Ex: Accessing memory area of another user. When a user tries to execute such instructions, a
privilege exception is generated that switches the processor to supervisor mode and an
appropriate routine within OS gets executed.
Direct Memory Access
An instruction like the one given below transfers data from an I/O device to a register R0:
Move DATAIN,R0
This involves multiple checks and the execution of many instructions. For example, the processor
polls to check whether the status flag is set (i.e. whether the I/O device is ready), or else waits for
the I/O device to raise an interrupt request. After each word is read/written, the memory address needs
to be incremented and the word count (i.e. number of words read/written) must be maintained.
If interrupts are used, the program counter needs to be saved and then restored when the data has
been read/written.
A separate mechanism is used if the intention is to read/write a chunk of data from/to I/O
device and memory. In this mechanism, the processor does NOT intervene continuously. This
method is called as Direct Memory Access (DMA).
A control circuit called “DMA Controller” which is part of I/O interface performs this data transfer.
Hence, the DMA controller performs the task that was supposed to be done by the processor and
this frees up the processor for other tasks.
The working of DMA Controller is given in below steps:
Step 1: The user program issues an instruction to read/write data from I/O device to/from memory.
Step 2: The OS realizes that this is a DMA operation and blocks the user program to start another
program.
Step 3: To initiate the DMA transfer, the processor sends the starting address of memory from/to
where data needs to be read/written. The processor also specifies the word count (i.e.
number of words to be read/written) and the direction (i.e. read or write).
Step 4: The DMA controller receives this information and initiates the operation (read/write). Now the
processor is free to perform other tasks, as the DMA controller has taken over the read/write
operation. Once the DMA controller is done with the read/write operation, it informs the
processor by raising an interrupt.
Step 5: On receiving the interrupt, the OS realizes that the DMA operation is done, so the program that
was blocked during the DMA transfer is made runnable again, so that the scheduler can schedule it.
The registers used by the DMA interface are shown below.
Starting address register → stores the address where data is read/written.
Word count register → stores the number of words to be read/written.
The third register contains the control and status flags. Some of the flags are:
R/W → identifies whether it is a read operation or a write.
    1 → read operation: read data from memory and write it to the I/O device.
    0 → write operation: read data from the I/O device and write it to memory.
Done → set to 1 when the DMA controller is done with the read/write.
IE → if this flag is set to 1, the DMA controller raises an interrupt after completing the data transfer.
IRQ → set to 1 by the DMA controller after it raises an interrupt.
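The register set above can be sketched as a small class (the field names follow the text, but the behaviour is a simplified model in which the whole block transfer completes in one step, not real hardware timing):

```python
# DMA-channel register sketch: starting address, word count, and the
# R/W, Done, IE, IRQ flags described above.
class DMAChannel:
    def __init__(self, start_addr, word_count, rw, ie):
        self.start_addr = start_addr  # where the transfer begins
        self.word_count = word_count  # words left to transfer
        self.rw = rw                  # 1 = read from memory, 0 = write
        self.ie = ie                  # raise an interrupt when done?
        self.done = 0
        self.irq = 0

    def run_transfer(self):
        """Model the whole block transfer completing at once."""
        self.word_count = 0
        self.done = 1                 # Done flag: transfer complete
        if self.ie:
            self.irq = 1              # controller raises an interrupt

ch = DMAChannel(start_addr=0x1000, word_count=64, rw=1, ie=1)
ch.run_transfer()
print(ch.done, ch.irq, ch.word_count)  # → 1 1 0
```

The processor's only involvement is the initial register writes and the final interrupt, which is exactly the point of DMA.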
The following figure shows the DMA controller along with other components in the computer
system.
As can be seen from the figure, there are multiple DMA controllers. One DMA controller
connects the network interface to the computer bus. Another controls two disks; this is
actually a disk controller that takes on the additional responsibility of a DMA controller. It
provides two DMA channels, one for each disk, which means the “starting address” and
“word count” registers are duplicated for each channel.
When data needs to move from Main memory to one of the disks, then the following steps are
performed:
Step 1: A program writes the starting address and word count (number of words to be read from main
memory) into the registers of one of the disk channels.
Step 2: The DMA controller works independently and performs the read or write operation.
Step 3: Once the DMA operation is completed, the following sequence of steps takes place:
● The Done flag in the status and control register is set to 1 to indicate that the DMA transfer is
completed.
● If the IE bit is set to 1, the DMA controller on that channel raises an interrupt.
● The IRQ bit is set to indicate that this DMA controller has raised the interrupt.
● If errors were encountered, other flags may also be set.
Note:
(i) The DMA controller gets higher priority than the processor when accessing the system bus.
(ii) Generally, the processor accesses main memory most of the time. When the DMA controller
needs to access main memory, it is given bus cycles that the processor would otherwise have
used; this is called cycle stealing. In some cases, main memory may even be blocked completely
until the DMA controller finishes its operation; this is called block or burst mode.
(iii) If both the processor and the DMA controller try to access the bus to read from main memory,
a conflict arises. This conflict is resolved by an arbitration procedure, implemented on the
bus, that coordinates the activities of all devices requesting memory transfers.
Bus Arbitration
Bus Master: The device that is allowed to initiate data transfers on the bus at any given time is
called bus master.
As soon as the bus master releases control of the bus, another device can acquire this status. The bus
arbitration process is used to select the next bus master; the selection depends on a priority scheme.
There are two ways/approaches for bus arbitration:
● Centralized Arbitration
● Distributed Arbitration
Centralized Arbitration:
● A bus arbiter is assigned to decide who shall be the next bus master. Normally, the processor itself
acts as the arbiter; it contains the arbitration circuitry.
● The processor starts out as the bus master. It relinquishes this role to a DMA controller that needs
to access the bus, and that controller then becomes the bus master.
● The steps of the bus-mastership transfer mechanism are as follows:
Step 1: A DMA controller that wants bus mastership activates a line called the Bus Request
line (BR). This line is common to all DMA controllers, so the processor does not
know which DMA controller has requested bus mastership.
Step 2: The processor activates the Bus Grant signal (BG) which goes in daisy chain fashion.
i.e., the DMA controller directly connected to processor gets it first and if that device had
requested for mastership it blocks the BG signal, else it forwards it to next DMA
controller.
Step 3: The current user of the bus would have activated the line Bus Busy (𝐵𝐵𝑆𝑌) to inform
all the controllers that it is using the bus. As soon as it is done, it deactivates the line
and the next user who has acquired the Bus Grant (BG) will activate the 𝐵𝐵𝑆𝑌 to inform
all others that it is using the line.
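The daisy-chain grant of Step 2 can be sketched as a small Python function. This is a simplified model of the scheme described above, not the implementation of any particular bus.

```python
# Minimal sketch of daisy-chain bus granting: the BG signal travels
# from the processor through the controllers in chain order, and the
# first controller that has requested the bus blocks it and keeps it.

def grant(requests):
    """requests: booleans, one per DMA controller in chain order
    (index 0 is closest to the processor). Returns the index of the
    controller that receives BG, or None if nobody requested."""
    for i, wants_bus in enumerate(requests):
        if wants_bus:
            return i          # this controller blocks BG here
    return None               # BG propagates past the last controller

# Controllers 1 and 2 both request; controller 1 is closer to the
# processor, so it wins even though controller 2 also asked.
assert grant([False, True, True]) == 1
```

This also shows the known drawback of daisy chaining: priority is fixed by physical position in the chain.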
The figure below shows the arrangement of Daisy Chain Arbitration:
The timing diagram of the operation for one sample scenario where DMA controller 2 acquires the
bus mastership is shown in the following figure:
Note: In the above figure, all DMA controllers are connected to a common Bus Request line.
Alternatively, each DMA device could have its own Bus Request and Bus Grant lines; when
multiple devices request the bus, a priority mechanism can be used to schedule the different
devices.
Distributed Arbitration
In a distributed arbitration scenario, all the devices have the responsibility of arbitrating and
choosing the winner. There is no central arbiter. The figure below shows the circuit at one of the
device A.
Each device on the bus is assigned a unique ID. When a device wishes to acquire the
bus, it asserts the start-arbitration signal and places its ID on the lines ARB0 to ARB3. A
winner is chosen based on the interaction of the signals transmitted on these lines.
Let device A transmit its ID 0101 on ARB0-ARB3 while another device B with ID 0110
transmits its ID at the same time. The open-collector (OC) drivers effectively OR the values driven
onto the lines, so the result 0111 appears on ARB0-ARB3. Device A compares the value on the
lines with its own ID, starting from the MSB, and finds that ARB1 does not match. Device A
therefore stops driving ARB1 and the lower-order line ARB0 by placing a 0 at the input of its
driver, which leaves 0110 (the OR of 0100 and 0110) on the ARB0-ARB3 lines. The value on the
lines now matches device B's ID, so device B wins the bus mastership and can gain access to the
system bus.
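The arbitration round just described can be simulated in Python. The wired-OR of the open-collector lines is modeled by OR-ing the IDs of all devices still driving; a device withdraws as soon as a line disagrees with its own ID at that bit, scanning from the MSB. This is a sketch of the scheme in the text, with a 4-bit ID width assumed.

```python
# Distributed arbitration over open-collector ARB lines: the lines
# carry the bit-wise OR of all competing IDs; a device that sees a 1
# on a line where its own bit is 0 withdraws from that bit and all
# lower-order bits. The device whose ID survives wins.

def arbitrate(ids, width=4):
    contenders = {i: True for i in ids}   # True = still driving
    for bit in range(width - 1, -1, -1):  # scan from the MSB
        line = 0
        for dev, active in contenders.items():
            if active:
                line |= (dev >> bit) & 1  # wired-OR of driven bits
        for dev in contenders:            # mismatching devices drop out
            if contenders[dev] and ((dev >> bit) & 1) != line:
                contenders[dev] = False
    return [dev for dev, active in contenders.items() if active][0]

# Device A = 0101, device B = 0110: B wins, as in the walkthrough.
assert arbitrate([0b0101, 0b0110]) == 0b0110
```

Note that with this scheme the numerically largest competing ID always wins, which is why the IDs themselves encode priority.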
MEMORY SYSTEM
Basic Concepts
Memory is made up of a contiguous set of locations. If we assign only 2 address bits, we can
access only 4 memory locations (i.e. 2^2).
Address   Content
00        10101110
01        01010110
10        11111111
11        00000000
Similarly, if we use 16 bits to address a memory, we can access 2^16 = 64K locations, and a 40-bit
address can access 2^40 = 1T (tera) locations. We generally store and retrieve a word of data at a time.
For example, if we read the content of memory location 0xbf23129f in a 32-bit system, then the entire
32 bits of content at that location (i.e. 4 bytes) is read from memory.
The figure below shows the read/write operation performed by processor or memory.
✔ MAR and MDR registers are used for read/write operations on memory.
● If MAR is k bits long, then 2^k memory locations can be accessed.
● If MDR is n bits long, then n bits of data can be read from or written to memory in one operation.
✔ Read Operation
The address of memory location is placed in MAR register. 𝑅/𝑊 line is set to 1. This triggers the
memory to read data from location whose address is in MAR. The read data is placed on the data
line and MFC signal is made high. On receipt of MFC signal the processor places the data currently
on data line into MDR register.
✔ Write Operation
● The data to be written is placed in MDR.
● The address of the memory location is placed in MAR.
● The R/W signal is set to 0, and the data gets written to the memory location.
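The MAR/MDR protocol above can be modeled as a toy Python class. This is only a behavioural sketch: MFC is represented simply by the access function returning, and the class name and interface are invented for illustration.

```python
# Toy model of the MAR/MDR read/write protocol: the processor places
# an address in MAR, drives R/W (1 = read, 0 = write), and the memory
# fills or stores MDR. MFC is implicit in the function returning.

class Memory:
    def __init__(self, k_bits):
        self.cells = [0] * (2 ** k_bits)   # a k-bit MAR addresses 2^k locations

    def access(self, mar, rw, mdr=0):
        if rw == 1:                        # R/W = 1: read
            return self.cells[mar]         # value latched into MDR
        self.cells[mar] = mdr              # R/W = 0: write MDR contents
        return mdr

mem = Memory(k_bits=4)                     # 16 locations
mem.access(mar=3, rw=0, mdr=0xAB)          # write 0xAB at address 3
assert mem.access(mar=3, rw=1) == 0xAB     # read it back
```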
Memory Access Time: the time elapsed between the initiation of an operation and its completion,
e.g. the time between the Read signal and the MFC signal.
Memory Cycle Time: the minimum time required between the initiations of two successive memory
operations, e.g. the time between two read operations.
RAM: A memory unit is called random access memory if a memory location can be accessed in fixed
amount of time for either read or write operation.
Cache memory: A small memory added between the processor and main memory. It is faster than RAM. It
holds currently active segments of code and data.
Virtual memory: the address generated by the processor need not be the actual address of the operand/data
in memory; the two can differ. The address generated by the processor is called the virtual or logical
address, and it is mapped to the actual memory location by the memory management unit.
The virtual address space can be much larger than the RAM. Hence, part of the virtual memory is present in
RAM and the rest resides on the hard disk. If the virtual address accessed corresponds to data present in
RAM, it is accessed directly. If not, a page of words is fetched from the disk, replacing an existing page in
RAM. Hence, with virtual memory, the pages kept in RAM should be selected so that there are more hits
than misses; this reduces hard-disk accesses.
Semiconductor RAM Memories
Internal Organization of Memory Chip
● Memory cells can be organized in the form of an array, with each cell capable of storing one bit of
information.
● Each row of cells constitutes a word. All cells of a word are connected to a common line called the word
line, which is driven by the address decoder on the chip.
● The cells in a column are connected to a sense/write circuit by two bit lines. These sense/write circuits
are connected to the data input/output lines of the chip.
✔ Read operation: The address lines identify a particular word in the memory. The sense lines read
data from these cells of the word. The read data is then sent to the output data lines.
✔ Write Operation: The address lines identify a particular word in the memory. The sense/write circuit
receives the input information. They store the data in the cells of the selected word.
✔ The following figure shows the organization of 16x8 memory
o There are 16 words each of 8 bits.
o 4 address lines are used to uniquely identify one of the 16 8-bit words.
o The data input and data output of each sense/write circuit are connected to bidirectional data lines.
These bidirectional lines are in turn connected to the data bus of the computer.
o There are two control lines, R/W and CS.
o R/W is used to specify whether the operation is a read or a write.
CS (chip select) is used to select one of the memory chips. Generally, multiple such 16 x 8 chips
are connected in parallel, and CS selects one of them.
o In all, 128 bits (16 x 8 = 128 bits) can be stored in this memory. There are 14 external
connections from this memory, namely 8 data lines, 4 address lines, R/W and CS.
Two additional lines one for power and one for ground are also supplied.
We can also have 1K (1024) memory cells.
✔ One way to organize it is to have 128 words each of 8 bits. In this configuration we need 7 lines
for address lines, 8 lines for data lines, 1 line 𝑅/𝑊, 1 Line CS and 2 lines for power and ground.
✔ Another way to organize the same 1K memory is in 1K X 1 format. So we write or read only 1 bit
at a time. The circuit is organized as shown in Figure 5.3:
The circuit works in 2 stages as shown below:
Stage 1: Using the 5-bit row address first fetch a row of 32-bits, from the 32 x 32 memory.
Stage 2: Now, we have 32-bits, but we need to fetch only 1 bit. These 32-bits are passed to
column selector (lower circuit). Using the column address we select one of the 32-bits and
this is sent to user in read operation or this cell is written into, for write operation.
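The two-stage address split above can be written out in Python. The 10-bit address of the 1K x 1 memory divides into a 5-bit row number (one of 32 rows of 32 cells) and a 5-bit column number (one bit within the fetched row):

```python
# Address split for the 1K x 1 organization: 10 address bits =
# 5-bit row (32 rows) + 5-bit column (32 cells per row).

def split_1k(addr):
    row = (addr >> 5) & 0b11111    # upper 5 bits: which row to fetch
    col = addr & 0b11111           # lower 5 bits: which bit in the row
    return row, col

assert split_1k(0) == (0, 0)
assert split_1k(1023) == (31, 31)  # last cell: row 31, column 31
```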
This 1K memory was an example of a small memory. Much larger memories are possible, for example
4M organized in 512K x 8 format, which has 19 address lines and 8 data lines. Chips that provide
hundreds of megabytes of capacity are now available.
Static Memories
Static Memory Definition: memories that consist of circuits capable of retaining their state as
long as power is applied are known as static memories.
Figure below shows a static RAM (SRAM) cell.
(a) Two inverters are cross connected to form a latch.
(b) The inverters are connected to bit lines by transistors T1 and T2.
(c) These transistors act as switches that can be opened or closed under control of word line.
When word line is at ground level, the transistors are turned off and the latch retains its state.
Ex: Let’s assume the cell state is 1 if the logic value at point X is 1 and at point Y is 0. This state is
maintained as long as the signal on the word line is at ground level.
Read Operation
● To read the state of the SRAM cell, the word line is activated to close switches T1 and T2.
● If the cell state is 1, the signal on bit line b is high and on bit line b' is low. The opposite is true if
the state is 0.
● The sense/write circuits connected at the ends of the bit lines monitor the state of b and b' and set
the output accordingly.
Write Operation:
● The state of the cell is set by placing the appropriate value on bit ‘b’ and it’s complement on b’.
● The word line is then activated. This forces the cell in the corresponding state. The required signal
on these bit lines are actually generated by sense/write circuits.
CMOS Cell:
● The figure below shows a CMOS memory cell that stores a single bit. This is a realization of a
single cell using CMOS (Complementary Metal-Oxide Semiconductor) technology.
● Transistor pairs (T3, T5) and (T4, T6) form the inverters in the latch.
● If the state of the cell is 1, the voltage at point X is kept high by having T3 and T6 ON
while T4 and T5 are OFF. So, when T1 and T2 are turned ON (closed), bit lines b and b' have
high and low signals, respectively.
● The supply voltage Vsupply was 5 V in older CMOS circuits; newer circuits need only 3.3 V.
● Note that continuous power is required to retain the cell value. If the power supply is lost (for
example, when a laptop is powered down), the values in the SRAM/CMOS cells are lost. Hence,
SRAM is called a volatile memory.
● CMOS SRAMs have the advantage of consuming very little power. Current flows only when a
read or write operation is performed; otherwise T1 and T2 are turned OFF and one transistor in
each inverter is off.
● Static RAM can be accessed very quickly. Access times of a few nanoseconds are available in
commercially available chips.
● SRAMs are used in applications where speed is of critical concern.
Asynchronous DRAMs
● SRAMs are expensive. Hence, a cheaper version called dynamic RAM (DRAM) is used.
● The storage element in a DRAM is a capacitor. A capacitor's charge can be maintained only for a few
milliseconds, so the cell contents must be refreshed periodically to recharge the capacitor fully.
● The figure below shows a single DRAM cell. The cell consists of a transistor (T) and a capacitor (C).
Storing information: To store information in a single cell, the Transistor T is turned ON. Then an
appropriate voltage is applied to bit line. This results in known amount of charge getting stored in the
capacitor “C”. Once charge is stored it is as good as “1” is stored in that DRAM cell.
Reading Information: as soon as the transistor T is turned OFF, the capacitor starts leaking its
charge (a property of capacitors). A threshold is therefore specified for the capacitor: only if the
charge in the capacitor is above this threshold is the cell considered to be storing a 1.
o During a read operation, the transistor is turned ON and the sense amplifier checks whether the
capacitor's charge is above the threshold. If so, the cell is read as a 1, and the capacitor is
simultaneously recharged to full capacity to compensate for the slow leakage. If the sense
amplifier detects a charge below the threshold, it completely drains the capacitor and concludes
that nothing was stored in the cell (i.e. the value 0 was stored).
o So, reading a cell also refreshes its content (to full charge or to zero charge, depending on
whether the charge was above or below the threshold).
● Let's take the example of a 16-megabit DRAM configured to read 8 bits at a time. So, 2M groups
of 8 bits can be read/written from the memory; this is represented as 2M x 8.
● These 16 megabits can be organized as a 4K x 4K array: there are 4096 rows and each row
has 4096 cells.
● Our aim is to read 8-bit chunks, and there are 512 such chunks in one row.
● With 4096 rows, 12 bits are required to select a row (4096 = 2^12); within each row we need to
pick one of the 512 chunks, so 9 bits are required for that (2^9 = 512). The circuit below shows the
memory access process.
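The 12 + 9 bit split just derived can be checked with a few lines of Python. This follows the organization in the text (4096 rows of 512 byte groups); the function name is ours:

```python
# Multiplexed addressing for the 2M x 8 DRAM: the 21-bit byte-group
# address splits into a 12-bit row number (4096 rows, applied under RAS)
# and a 9-bit column number (512 groups per row, applied under CAS).
ROW_BITS, COL_BITS = 12, 9

def split_dram(addr):
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)
    col = addr & ((1 << COL_BITS) - 1)
    return row, col

assert split_dram(0) == (0, 0)
assert split_dram(2**21 - 1) == (4095, 511)  # last row, last 8-bit group
```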
● Ideally, a 21-bit address bus is required. However, the row and column addresses are multiplexed
onto a single 12-bit address bus.
● In a read or write operation, the row address is applied first to pick one of the 4096 rows. The RAS
(row address strobe) signal latches the address.
● The read operation is initiated: the entire row of 4096 bits (grouped into 512 groups) is read and
refreshed, so that the capacitors of the entire row are recharged.
● Then the column address is placed on the address bus and latched by the CAS (column address
strobe) signal.
● Based on the column address, the appropriate group of 8 sense/write circuits is selected.
● Finally, the R/W pin is examined. For a read operation, the data is read and transferred to lines
D7-0; for a write operation, data from D7-0 is sent to the circuit to be written to the selected cells.
Note: When a row address is applied the entire row is accessed and refreshed. These happen for both
read and write operation.
Note: To ensure that the contents of a DRAM are maintained, each row of cells must be accessed
periodically. A refresh circuit usually performs this function automatically. Many DRAM chips
incorporate a refresh facility within the chip themselves.
Because of high density and low cost, DRAMs are widely used. They are available from 1 M to 256 M and
higher capacity is being designed. To give flexibility to design memory of higher sizes, these memory
chips come in various combinations.
Example: A 64 M memory comes in following flavors: 16 M x 4, 8 M x 8 or 4M X 16.
Fast Page mode:
If you notice, when we read 8 bits from a row the entire row is sensed but only 8 bits are placed on data
lines. However, this entire row might be part of an operation and we may need to read this full row.
Going by the above logic, the following is observed:
1. Give the row address. The row address gets activated.
2. Give the 9-bit column address. The column gets activated and 8 bits are read.
3. Again give row address to read same row.
4. Give the 9-bit column address to read next 8 bits.
5. Again give same row address to read the row.
6. Give the 9-bit column address to read next 8 bits.
You will notice that steps 1, 3 and 5 read the same row every time. To avoid these repeated row
reads, a latch can be added to each column. Once a row is read and latched, only steps 2, 4 and 6
are needed, which makes it much faster to read all the contents of the row. This is called fast page
mode. Fast page mode is used in graphics terminals, and also in current computers to transfer
chunks of data between main memory and the cache.
Synchronous DRAMs
● DRAMs have gone through several improvements, and all operations are now synchronized by a
clock signal. Such memories are called synchronous DRAMs (SDRAMs). The structure of the
SDRAM is shown in Figure 5.8.
● It is similar to the asynchronous DRAM. The addition is that the output of each sense amplifier is
connected to a latch.
● When a read instruction is issued, the content of the entire row is loaded into these latches. Only
the data for the selected columns (say, an 8-bit chunk) is moved to the data output register to
complete the read operation.
● Mode register: the SDRAM can operate in different modes, specified in the mode register. For
example, burst operation is one such mode, in which more than 8 bits may be read. This is the
fast page mode discussed in the previous section; in this mode the column selector is not needed,
as all the bits of a row are read.
In Figure 5.9, we see the timing operation in SDRAM. As we said before the main highlight of SDRAM is
that all operations are performed at regular clock cycles. Our aim is to read burst of length 4 (i.e. say 4
bytes).
(a) In the first clock cycle, the row address is placed on address line. Simultaneously, 𝑅𝐴𝑆 line is held
low to indicate that row address has been placed. Memory takes about 2-3 clock cycles to place the
contents in the latches. So, we wait for 2 cycles.
(b) Column address is placed in the address lines and 𝐶𝐴𝑆 line is held low to indicate that address line
contains column address.
(c) After one clock cycle, the first set of data (D0) is placed on the data line.
(d) No further row or column addresses need to be placed. The 2nd (D1), 3rd (D2) and 4th (D3)
sets of data bits are placed on the data line in the next 3 clock cycles. At this point all 4 sets of
bits have been read.
Note:
(i) The SDRAM has a built-in refresh circuit with a refresh counter that provides the number of
the row to be refreshed. Remember that all rows must be refreshed at regular intervals, since,
as in the DRAM, the capacitors keep discharging and need to be recharged. Each row has to
be refreshed every 64 ms.
(ii) Commercial SDRAMs can work with clock speeds above 100 MHz (i.e. 100 x 10^6 clock
cycles per second). In the above example, we read 4 bytes in 9 clock cycles; at 100 MHz we
can therefore do at most about 10 x 10^6 such operations per second.
(iii) Memory clock cycles are generally in sync with bus speeds. Intel defined PC100 and PC133,
which say that a bus supports 100 or 133 MHz; hence memory also operates at 100 or
133 MHz, and memory designers produce 100 or 133 MHz chips.
Latency and Bandwidth
Two important parameters that are used to evaluate a memory (performance parameters) are: latency and
bandwidth.
Latency:
● Latency is the amount of time it takes to transfer a word of data to or from the memory.
Latency for one word: for one word read/write latency gives accurate value.
Latency for one word: for a single-word read/write, latency gives an accurate measure.
Latency for one block: a block consists of multiple words; to measure the latency of a block transfer, we
measure the latency of the first word only.
● In the previous example (timing Figure 5.9), the access cycle begins with the assertion of the RAS
signal. The first word is available 5 cycles later, so the latency is 5 cycles. If this SDRAM has a clock
rate of 100 MHz (i.e. each cycle lasts 10 ns), the total time taken for the read is 5 x 10 ns = 50 ns.
The remaining 3 words are transferred in the immediately following cycles.
Bandwidth:
● Bandwidth represents the number of bits or bytes that can be transferred in one second. The bandwidth
of a memory depends on:
(a) Speed of access to stored data
(b) Number of bits that can be accessed in parallel.
(c) It also depends on the speed of bus that connects the memory and processor.
So, bandwidth is a product of rate at which data are transferred and the width of the data bus.
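That product can be written out directly. The sketch below assumes a 100 MHz, 32-bit bus moving one word per cycle; the function name and parameters are ours:

```python
# Bandwidth = transfer rate x data-bus width. One transfer per cycle
# for plain SDRAM; two per cycle when both clock edges are used (DDR).

def bandwidth_bytes_per_sec(clock_hz, bus_width_bits, transfers_per_cycle=1):
    return clock_hz * transfers_per_cycle * bus_width_bits // 8

# 100 MHz x 32 bits = 400 MB/s; using both edges doubles it.
assert bandwidth_bytes_per_sec(100_000_000, 32) == 400_000_000
assert bandwidth_bytes_per_sec(100_000_000, 32, transfers_per_cycle=2) == 800_000_000
```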
Double-Data-rate SDRAM:
● SDRAM performs all actions on the rising edge of the clock signal.
● A new memory called double-data-rate SDRAMs (DDR SDRAMs) accesses the cell array in the same
way, but transfer data on both edges of the clock. Since they transfer on both edges of the clock their
bandwidth is essentially doubled. To aid this, the cell array is divided into two banks that can be
accessed separately. Successive words are stored in separate banks. This helps in reading the words
on two edges.
● Such memories are used most efficiently in the following cases:
(a) when a chunk of data is transferred, e.g. between the cache and main memory;
(b) during high-quality video display, which again involves bulk transfers.
Differences between SRAM and DRAM
SRAM: stands for Static Random Access Memory.
DRAM: stands for Dynamic Random Access Memory.
SRAM: no need to refresh, as the transistors inside continue to hold the data as long as the power
supply is not cut off.
DRAM: requires the data to be refreshed periodically in order to retain it.
SRAM: faster; access time is a few ns.
DRAM: slower, because of refreshing.
SRAM: high cost, because it requires 6 transistors per cell.
DRAM: low cost, because it requires 1 transistor and 1 capacitor per cell.
SRAM: used in cache memory, where speed is essential.
DRAM: used in main memory.
Structure of Larger Memories
In this section, the design and structure of large memories are discussed.
Static memory Systems
Consider a memory of 2M (2,097,152) words of 32 bits each. Suppose we need to design this
memory using 512K x 8 static memory chips; then the following are needed:
● 4 chips of 512K x 8 to form one 512K x 32 group
● 4 such groups to make up 4 x (512K x 32)
So, in all, 16 chips of 512K x 8 are required. They are connected as shown in Figure 5.10.
● The first column of chips gives the 1st byte of each 32-bit word.
● The second column gives the 2nd byte.
● The third column gives the 3rd byte.
● The fourth column gives the 4th byte.
● Each chip has a control input called chip select. If chip select is 1, data from that chip is read or
written.
● 21 address bits are needed to select a 32-bit word in this memory. The higher-order 2 bits select
the appropriate row (chip group); the remaining 19 bits select the specific byte within each chip of
the selected row.
● The R/W inputs of all chips are tied together to provide a common Read/Write control.
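The 2 + 19 bit decoding above can be sketched in a few lines of Python (the function name is ours):

```python
# Address decoding for the 2M x 32 memory built from sixteen 512K x 8
# chips: the high-order 2 bits of the 21-bit address select one of the
# 4 chip groups; the low-order 19 bits address a word within each chip.
CHIP_WORDS = 512 * 1024            # 512K locations per chip

def decode(addr21):
    group = (addr21 >> 19) & 0b11         # which row of chips (0..3)
    offset = addr21 & (CHIP_WORDS - 1)    # address within each chip
    return group, offset

assert decode(0) == (0, 0)
assert decode(2**21 - 1) == (3, CHIP_WORDS - 1)  # last word, last group
```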
Dynamic memory Systems
● The memory is generally not part of the PCB (Printed Circuit Board) on the mother board. If the memory
is soldered to the mother board then it cannot be expanded in future.
● Hence the memory (RAM) is built in the form of small plug in boards that plug vertically into a single
socket on the mother board. These large memory units are also called SIMMs (Single In line Memory
Modules) and DIMMs (Dual In – line Memory Modules).
● These SIMMs and DIMMs come in different sizes, e.g., 4M x 32, 16M x 32 and 32M x 32 DIMMs
use a 100-pin socket, while 8M x 64, 16M x 64 and 32M x 64 modules use a 168-pin socket.
● By using the pluggable memory boards, we can easily expand or replace memory.
Memory System Considerations
● Static RAMs are generally used in cache memory; they are used only when very fast operation is
the primary requirement, since cost and size are limiting factors.
● Dynamic RAM is generally used for main memory. Its high density makes it economically
feasible.
Memory Controllers
● To reduce the number of address lines, the address is split into a row address and a column
address. This is done by the memory controller.
● The processor sends a single address to the memory controller. The memory controller splits it into
a row address and a column address. It also generates the RAS and CAS signals and the timing
information for the row and column parts, and sends the R/W and CS signals to the memory.
● Another important activity of the memory controller is periodic memory refresh. The capacitors
keep discharging and need to be recharged; the memory controller has a counter that keeps track
of the row address that needs refreshing at regular intervals.
Refresh overhead
● Old DRAMs had a refresh period of 16 ms, where-as typical SDRAMs have a period of 64 ms.
● Let’s take an SDRAM arranged in 8 K rows (8192). Let it take 4 clock cycles to access each row.
So, to refresh all 8K rows it takes 8192 X 4 = 32,768 Cycles.
● If the clock rate is 133 MHz (133 x 10^6 cycles per second), then the total time spent refreshing
= 32,768 / (133 x 10^6) ≈ 246 x 10^-6 seconds.
● So, in each 64 ms interval, about 0.246 ms is spent refreshing. Hence, refresh overhead =
0.246 / 64 ≈ 0.0038, i.e. 0.38%, which is less than 0.4% of the total time available for accessing
memory.
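The same overhead arithmetic, parameterized so other row counts and clock rates can be tried (the function name is ours):

```python
# Refresh overhead: fraction of memory time lost to refreshing all
# rows once per refresh period.

def refresh_overhead(rows, cycles_per_row, clock_hz, refresh_period_s):
    refresh_time = rows * cycles_per_row / clock_hz   # seconds per full refresh
    return refresh_time / refresh_period_s

# 8192 rows x 4 cycles at 133 MHz, repeated every 64 ms: about 0.38 %.
overhead = refresh_overhead(rows=8192, cycles_per_row=4,
                            clock_hz=133e6, refresh_period_s=64e-3)
assert 0.0038 < overhead < 0.0039
```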
RAMBUS MEMORY
● The bandwidth (bits transmitted by the memory per second) depends not only on the memory chip
but also on the bus speed. A bus clocked at 133 MHz allows at most one transfer every ~7.5 ns,
or 2 transfers per cycle if both edges of the clock are used.
● The only way to increase the amount of data transferred on a bus whose speed is limited is to
provide more data lines, thus widening the bus.
● But this is expensive and requires more space on the motherboard.
● Rambus Inc. came up with a different solution that works on a narrow bus itself: a fast
signaling mechanism to transfer information between chips.
● Instead of using just the two voltage levels 0 and Vsupply for signaling, Rambus uses small
voltage swings around a reference voltage Vref.
● The reference voltage is Vref = 2 V, and the two logic values are represented by swings of 0.3 V
above and below it.
● Because the swing is small, the transmission time is short, which allows a high transmission
speed. This is known as differential signaling.
● Differential signaling requires:
(a) Special techniques for the design of wire connections.
(b) Special circuit interfaces to handle differential signals.
Rambus provides specifications for both. Rambus allows a clock frequency of 400 MHz; if data is
transmitted on both clock edges, the transfer rate goes up to 800 million transfers per second.
● Rambus also needs specially designed memory chips. They use cell arrays based on standard
DRAM technology, and multiple such banks of cell arrays are used for parallel data transfer. Such
chips are called Rambus DRAMs (RDRAMs).
● The original Rambus specification had 9 data lines: 8 for data and 1 for error checking, plus a
number of control and power-supply lines.
A two-channel Rambus memory, known as Direct RDRAM, has 18 data lines to transfer 2 bytes at
the same time.
● Communication between the processor (master) and the RDRAM (slave) is by means of packets
transmitted over the data lines. Three types of packets are supported:
(a) Request packet: used by the master to specify the operation it wishes to perform. It contains
the address of the memory location, an 8-bit count that specifies the number of bytes involved
in the transfer, the operation type (read/write) and control information.
(b) Acknowledge packet: when the slave receives a request, it sends a positive
acknowledgement if it can satisfy the request immediately; otherwise it sends a negative
acknowledgement to indicate that it is busy.
(c) Data packet: transfers the data of, say, a read operation.
● Multiple RDRAM chips can be assembled together to form larger modules. RIMM is one such
module that can hold 16 RDRAMs.
● Rambus technology is a competitor to DDR SDRAM. DDR SDRAM is an open technology, whereas
RDRAM is a proprietary design of Rambus Inc. for which chip manufacturers have to pay a royalty.
This is a major consideration for designers who want to minimize cost.
Cache Memories
● Main memory is slow compared to the speed of the processor. Hence, a scheme to reduce this delay is
to use a fast cache memory that makes the main memory appear faster than it really is.
● Cache memory effectiveness is based on the "locality of reference", which has two aspects: temporal
and spatial.
● Temporal: a recently executed instruction is likely to be executed again very soon.
● Spatial: instructions in close proximity to a recently executed instruction are also likely to be executed
soon.
● Cache operation is fairly simple: instructions with temporal and spatial locality are placed in the cache,
i.e., an instruction that was recently executed, and the instructions around it, are all placed in cache
memory. If these instructions are executed again, they are fetched directly from the cache.
Figure below shows the cache arrangement.
✔ When a processor issues a read request, a block of memory containing the location is transferred to
cache one word at a time.
✔ Subsequently, when processor refers to the same word, it is directly fetched from cache.
✔ Cache can store reasonable amount of blocks at any given time.
✔ Replacement Algorithm: When the cache is full and a word needs to be read which is not in cache
then one of the blocks need to be replaced. Replacement algorithm decides which block needs to
be replaced.
● A processor does not need to know about the existence of the cache; it simply issues a read or write
request. If the content is present in the cache, the access is served from the cache. This is called a
read or write hit.
Write Through and Write Back protocol:
In case of read, the data is read from cache. However, in case of write, the content needs to be
written back to main memory. There are two ways to write:
⮚ Write Through: In case of write through, the content is updated both in the cache as well as main
memory.
⮚ Write Back or Copy Back: In case of write back, only the cache is updated. To mark the block as
updated, a dirty/modified control bit is set. The Main memory is updated when the block is being
removed to make way for another block.
Write through is simpler, but it may update main memory many times while the block resides in the cache. Write back can also be wasteful, because the entire block is written back when it is removed, even if only one word in the block is dirty.
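The two write policies above can be sketched in a few lines of Python. This is an illustrative sketch, not from the notes: `cache`, `memory`, and `dirty` are simplified stand-ins (plain dicts and a set) for the real hardware structures.

```python
# Sketch of write-through vs. write-back (assumed simplification:
# dicts keyed by address stand in for cache and main memory).

cache, memory, dirty = {}, {}, set()

def write_through(addr, value):
    # Update both the cache and main memory on every write.
    cache[addr] = value
    memory[addr] = value

def write_back(addr, value):
    # Update only the cache; mark the block dirty so it is
    # copied to main memory later, on eviction.
    cache[addr] = value
    dirty.add(addr)

def evict(addr):
    # When a block is removed, a dirty block must be written back.
    if addr in dirty:
        memory[addr] = cache[addr]
        dirty.discard(addr)
    cache.pop(addr, None)
```

Note how `write_back` touches main memory only once, at eviction, no matter how many writes hit the block while it is resident.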
Load Through: When a cache miss occurs, the block is read from memory and the requested word is forwarded to the processor as soon as it arrives, without waiting for the whole block to be stored in the cache. This is called load through. It reduces the latency that would result from first storing the block in the cache and only then forwarding the word.
Mapping Functions
We need a mapping function to map a block in main memory to a block in the cache. Assume the following:
Main Memory
64K words of 16 bits each, grouped into blocks of 16 words each. So, total number of blocks = 64K/16 = 4096 blocks.
Cache Memory
2K words of 16 bits each, grouped into blocks of 16 words each. So, total number of blocks = 2K/16 = 128 blocks. Thus, 4096 blocks of main memory must be mapped onto 128 blocks of cache.
A. Direct Mapping
The simplest form of mapping is direct mapping: block j of main memory maps to block (j modulo 128) of the cache. Hence,
Blocks 0, 128, 256, 384, … map to Block 0 in cache
Blocks 1, 129, 257, 385, … map to Block 1 in cache
Blocks 2, 130, 258, 386, … map to Block 2 in cache
…
Blocks 127, 255, 383, 511, …, 4095 map to Block 127 in cache.
● Contention arises when, say, block 1 of main memory is already in the cache and block 129 must also be placed in the cache. Since both map to the same cache block (block 1), block 1 of main memory is replaced in the cache by block 129.
Placement
The entire main memory can be thought of as
✔ 32 Super blocks
✔ Each super block has 128 blocks
✔ Each block has 16 words
So,
✔ 5 bits to identify a superblock (tag field)
✔ 7 bits to identify a block within superblock (block field)
✔ 4 bits to identify word within block (word field)
✔ When a block enters the cache from main memory, the 7 block-field bits identify where the block will be placed in the cache.
✔ The higher-order 5 bits are stored in the tag field to identify the super block from which the block came in main memory.
✔ When the processor issues a read or write instruction, the high-order 5 bits of the address are compared with the 5 tag bits. If they match, the correct word is present in the cache. If the tag does not match, the block must be read from main memory.
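A minimal sketch of the address split and tag check described above, assuming the 16-bit address layout from the notes (5-bit tag, 7-bit block field, 4-bit word field). The names `split_address`, `lookup`, and the `tags` array are ours, not part of any real hardware interface.

```python
# Direct-mapped lookup sketch: 16-bit address = 5-bit tag
# + 7-bit block field + 4-bit word field (layout from the notes).

def split_address(addr):
    word  = addr & 0xF           # low 4 bits: word within block
    block = (addr >> 4) & 0x7F   # next 7 bits: cache block number
    tag   = addr >> 11           # high 5 bits: tag (super block)
    return tag, block, word

# tags[i] holds the tag of the memory block resident in cache block i.
tags = [None] * 128

def lookup(addr):
    tag, block, word = split_address(addr)
    if tags[block] == tag:
        return "hit"             # the correct word is in the cache
    tags[block] = tag            # miss: block is fetched, tag updated
    return "miss"
```

A second access to a different word of the same block hits, since only the tag and block fields take part in the comparison.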
B. Associative Mapping
● In associative mapping, a block from main memory can be placed in any of the 128 blocks of the cache.
● 12 tag bits are required to identify a memory block when it is resident in the cache.
● The tag bits of the address received from the processor are compared with the tag bits of every resident cache block to check whether the block is present in the cache. This is called the associative mapping technique.
● An existing block is replaced to make way for a new block only if the cache is full; otherwise an empty cache block is used to place the incoming block.
● The cost of associative mapping is higher because the tags of all 128 blocks may have to be searched to confirm whether a block is present. Such a search is called an associative search.
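The associative search can be sketched as follows (our illustration, with the assumed parameters from the notes: 16-word blocks, so the 12-bit tag is simply the memory block number, and a 128-block cache). Real hardware compares all tags in parallel; the sequential `in` test here only models the outcome.

```python
# Associative-mapping sketch: a block may sit in any cache block,
# so the incoming tag is checked against every resident tag.

cache_tags = []           # up to 128 resident block numbers, any order

def associative_lookup(addr):
    tag = addr >> 4       # 12-bit tag = memory block number (16-word blocks)
    if tag in cache_tags:            # associative search over all tags
        return "hit"
    if len(cache_tags) >= 128:       # cache full: a block must be evicted
        cache_tags.pop(0)            # (the replacement algorithm decides which)
    cache_tags.append(tag)
    return "miss"
```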
C. Set-Associative Mapping
● A combination of Direct Mapping and Associative Mapping can be used to form set associative
mapping.
● The blocks are grouped into sets (example: set of 2). A block read from memory can be placed in
any of the blocks within a set.
● In the above example, blocks 0,64,128 … 4032 of main memory map to cache set 0. So, they can
be placed in either Block 0 or Block 1 of set 0.
● If there are 128 blocks per set, then it is same as Associative mapping and if there is 1 Block per set
then it resembles direct mapping.
● Cache that has k blocks per set is referred to as k-way set associative cache.
● One more control bit, called the valid bit, must be provided for each block.
● The valid bits are all set to 0 when power is initially applied to the system or when the main
memory is loaded with new programs and data from the disk.
● Transfers from the disk to the main memory are carried out by a DMA mechanism. This
process bypasses the cache for performance reasons.
● The valid bit of a particular cache block is set to 1 the first time this block is loaded from main memory.
● If a DMA transfer alters this block in main memory, then the valid bit is cleared to 0.
The following Figure shows a 2 blocks per set example:
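The 2-way example can be sketched like this (our illustration, under the notes' parameters: 128 cache blocks grouped into 64 sets of 2, so memory block j maps to set j mod 64 and the remaining high-order 6 bits act as the tag; valid bits are modeled implicitly by set membership).

```python
# 2-way set-associative sketch: block j of main memory may occupy
# either of the two blocks in cache set (j mod 64).

sets = [[] for _ in range(64)]   # each set holds up to 2 resident tags

def set_assoc_lookup(addr):
    block_no = addr >> 4         # 16-word blocks
    s = block_no % 64            # 6-bit set index
    tag = block_no // 64         # 6-bit tag
    if tag in sets[s]:           # only the 2 blocks of one set are searched
        return "hit"
    if len(sets[s]) >= 2:        # both blocks of the set are occupied:
        sets[s].pop(0)           # evict one (replacement algorithm decides)
    sets[s].append(tag)
    return "miss"
```

Memory blocks 0 and 64 both land in set 0 yet can coexist, which direct mapping would not allow.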
Replacement Algorithms
● In associative and set-associative mapping, we may have to remove a block from the cache to accommodate a new block. An algorithm is needed to decide which block to remove.
● Least Recently Used (LRU) algorithm
Blocks that have been referenced recently have a high probability of being referenced again; hence, when a block needs to be replaced, choose the block that has not been referenced for the longest time. This is called the LRU replacement algorithm.
In a set-associative cache, each block in a set is associated with a counter. When a block is accessed, its counter is set to 0 and the counters of the other blocks in the set are incremented. So, when a new block needs to be loaded from main memory into the cache, the block in the set with the highest counter value is replaced.
LRU works poorly for array accesses when the array is too large to fit in the cache.
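The counter scheme for one set can be sketched in a few lines (our illustration; a 4-block set is assumed, and `touch`/`victim` are names we chose).

```python
# Counter-based LRU for one 4-way set: the referenced block's counter
# resets to 0, the others increment; the highest counter marks the
# least recently used block.

counters = [0, 0, 0, 0]          # one counter per block in the set

def touch(block):
    # Called when 'block' is referenced.
    for i in range(len(counters)):
        counters[i] = 0 if i == block else counters[i] + 1

def victim():
    # Block to replace: not referenced for the longest time.
    return counters.index(max(counters))
```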
● Oldest block first algorithm
In this method, the oldest block that was fetched is removed to make way for new block.
● Random replacement algorithm
In this method, a randomly chosen cache block is replaced by the new block read from main memory.
It has been found that this random removal technique is quite effective in practice.