Distributed Shared Memory (DSM)
1
DSM systems hide explicit data movement from the programmer and provide a simpler abstraction for sharing data.
It is easier to design and write parallel algorithms using DSM than through explicit message passing.
DSM systems allow complex structures to be passed by reference and simplify development of algorithms for distributed applications.
Passing complex data structures and pointers is difficult and expensive using message passing.
Instead of moving only the specific piece of data referenced, DSM moves the entire block/page containing it to the site of reference. This exploits the locality of reference exhibited by programs and cuts down on the overhead of communicating over the network.
2
Direct information sharing is the primary motivation behind DSM
Simulates a logical shared memory address space over a set of physically distributed local memory systems to achieve direct information sharing among coordinating systems
Motivation for DSM came from Non-uniform Memory Access (NUMA) systems
Used in large-scale multiprocessor systems
Memory access times are non-uniform: latency depends on the distance between the processor and the memory holding the data
Processors share one or more system buses or a scalable interconnection network (possibly circuit-switched, e.g. a crossbar switch)
Desired to distribute the shared memory so that frequently accessed locations are closer to some processors
3
Local caches on each processor are used to:
Reduce memory access latencies
Reduce access traffic (bus contention) to the global shared memory
Therefore, primary concern is improving memory access performance
Issues with scalability
– multiple buses or scalable interconnection networks
4
There is no physical global memory
Virtual global memory is created by mapping all or parts of the local memory into a global address space shared by all nodes
Processors are connected through a loosely coupled network (usually packet-switched)
More substantial delay time in accessing shared information
There are no multi-point connections, so memory accesses cannot easily be broadcast or monitored (snooped)
Primary motivation for DSM is the desire to achieve transparency and not memory access performance
5
Accesses to shared variables are competing accesses (read or write access)
Other accesses are non-competing accesses
Competing accesses must be synchronized to ensure consistency
System memory can be consistent only at certain synchronization points
Synchronization access operation synch(S) is used
synch(S) accesses a synchronization variable S
Synchronization point may involve more than just a simple memory access
Synchronization access operations, acquire(S) and release(S) can indicate begin and end of a synchronization period
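The acquire(S)/release(S) bracketing above can be sketched with an ordinary lock standing in for the synchronization variable S. This is a minimal sketch; the worker function and counter are illustrative, not part of any real DSM API:

```python
import threading

S = threading.Lock()      # the synchronization variable S
shared_counter = 0        # shared data touched only inside the bracket

def worker():
    global shared_counter
    S.acquire()           # acquire(S): begin synchronization period
    shared_counter += 1   # competing access, now safely serialized
    S.release()           # release(S): end of period, updates made visible

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert shared_counter == 4
```

The DSM only needs to make memory consistent at the acquire/release points, which is what lets it relax consistency between them.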
6
Atomic Consistency
Requires that the distributed shared memory system have no replication of data (like centralized shared memory)
Each read operation receives the latest value and all write operations are completed before subsequent read operations
Sequential Consistency
Lamport: “the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program”
Causal Consistency
Write operations from different processors that are not causally related (concurrent writes) may be observed in different orders by different processors
7
Only writes that might be causally related must be observed in the same order by all processors
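The causal-consistency constraint can be made concrete with a small check. All names here are illustrative; writes are tuples and the causally related pairs are given explicitly:

```python
def respects_causality(observed_order, causal_pairs):
    """True if every causally related pair of writes appears in order."""
    pos = {w: i for i, w in enumerate(observed_order)}
    return all(pos[a] < pos[b] for a, b in causal_pairs)

w1 = ("P1", "x", 1)   # P1 writes x=1
w2 = ("P2", "x", 2)   # P2 first read x=1, so w1 -> w2 are causally related
w3 = ("P3", "y", 3)   # concurrent with both

causal = [(w1, w2)]

# Both observation orders are causally consistent: the concurrent write w3
# may appear anywhere, but w1 must precede w2 everywhere.
assert respects_causality([w1, w3, w2], causal)
assert respects_causality([w3, w1, w2], causal)

# An order that reverses a causally related pair violates causal consistency.
assert not respects_causality([w2, w1, w3], causal)
```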
8
Central-Server Algorithm
Server maintains all data.
Clients request to read and the server returns the data item requested by the client.
On write requests, it updates the data.
Has the drawbacks of a centralized algorithm but is simple to implement.
A better implementation partitions the data by address across multiple servers and uses a lookup table to find the server responsible for each address.
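The partitioned variant can be sketched as follows. The class and method names are illustrative, and the lookup table is modeled here by a simple modulo mapping from address to server:

```python
class DataServer:
    """One of several servers, each owning a partition of the addresses."""
    def __init__(self):
        self.store = {}

    def read(self, addr):
        return self.store.get(addr)

    def write(self, addr, value):
        self.store[addr] = value


class PartitionedDSM:
    """Client-side stub: routes each address to its owning server."""
    def __init__(self, servers):
        self.servers = servers

    def _lookup(self, addr):
        # Stand-in for the lookup table: address modulo server count.
        return self.servers[addr % len(self.servers)]

    def read(self, addr):
        return self._lookup(addr).read(addr)

    def write(self, addr, value):
        self._lookup(addr).write(addr, value)


dsm = PartitionedDSM([DataServer() for _ in range(4)])
dsm.write(10, "a")
assert dsm.read(10) == "a"   # routed to the same server on read and write
```

Partitioning spreads the request load, at the cost of one extra lookup per access.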
9
Migration Algorithm
Data is shipped to the location of the data access request.
Subsequent accesses to the data are performed locally.
Allows only one node to access a shared data page/block at a time.
Can see thrashing when pages frequently migrate between nodes.
Methods exist to reduce thrashing.
A server must keep track of the location of the migrating pages.
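A minimal sketch of the migration algorithm, with the server's tracking role played by a directory that records each page's current owner (all names and the page size are illustrative):

```python
class MigrationDSM:
    def __init__(self, num_nodes):
        self.owner = {}                                   # page -> owning node
        self.memory = [dict() for _ in range(num_nodes)]  # per-node pages

    def access(self, node, page):
        """Return the page contents, migrating the page to `node` first."""
        old = self.owner.get(page)
        if old is None:
            self.memory[node][page] = bytearray(4096)     # first touch
        elif old != node:
            # Ship the whole page to the requesting node; subsequent
            # accesses there are local until another node takes it.
            self.memory[node][page] = self.memory[old].pop(page)
        self.owner[page] = node
        return self.memory[node][page]


dsm = MigrationDSM(2)
dsm.access(0, 7)               # node 0 faults the page in
dsm.access(1, 7)               # page migrates to node 1
assert dsm.owner[7] == 1
assert 7 not in dsm.memory[0]  # only one node holds the page at a time
```

If nodes 0 and 1 alternated accesses to page 7, the page would ping-pong between them, which is the thrashing behavior mentioned above.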
10
Read-replication algorithm
Extends the migration algorithm by replicating data blocks, allowing multiple nodes to have read access or one node to have read-write access.
When performing write operation, all copies of the page must be invalidated or updated with the current page.
Full-replication algorithm
Extends the read-replication algorithm by allowing multiple nodes to have both read and write access to shared data blocks
Multiple nodes writing to shared data concurrently, requires shared data access control to maintain consistency
Gap-free sequencer
Two-phase commit protocol
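A gap-free sequencer can be sketched as below (illustrative names): every write goes to a sequencer that assigns consecutive sequence numbers, and each replica applies writes strictly in sequence order, so a gap reveals a lost or reordered message:

```python
class Sequencer:
    """Assigns consecutive, gap-free sequence numbers to writes."""
    def __init__(self):
        self.next_seq = 0

    def order(self, write):
        seq = self.next_seq
        self.next_seq += 1
        return seq, write


class Replica:
    def __init__(self):
        self.store = {}
        self.expected = 0      # next sequence number to apply
        self.pending = {}      # out-of-order writes held back

    def deliver(self, seq, write):
        self.pending[seq] = write
        # Apply writes only in gap-free sequence order; a missing number
        # would stall here and trigger a retransmission request.
        while self.expected in self.pending:
            addr, value = self.pending.pop(self.expected)
            self.store[addr] = value
            self.expected += 1


seq = Sequencer()
r = Replica()
m1 = seq.order(("x", 1))
m2 = seq.order(("x", 2))
r.deliver(*m2)                 # arrives out of order: held back
assert r.store == {}
r.deliver(*m1)                 # gap filled: both writes applied in order
assert r.store == {"x": 2}
```

Because every replica applies the same gap-free sequence, all copies converge to the same state.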
11
Read-remote-write-remote
Centralized server algorithm
Read-migrate-write-migrate
Migration algorithm
Single reader, single writer
Achieves better performance by exploiting program localities
Suffers from false-sharing
Read-replicate-write-migrate
Read-replication algorithm
Multiple readers, single writer
Read-replication algorithm together with write-invalidate protocol
12
Performs well when read access dominates write access
Linked list or table is used to represent current state of sharing
Read-replicate-write-replicate
Full-replication algorithm
13
Granularity of the coherence unit is an object
An object is defined as an encapsulated data structure with prescribed methods or operations
Only the declared shared objects need be managed by the DSM
Synchronization accesses are applied at the object level
14
DSM systems make use of data replication so that multiple nodes can access the same data.
Write-invalidate protocol
A write to shared data causes the invalidation of all copies except one before the write can proceed.
A major disadvantage of this scheme is that invalidations are sent to all nodes holding copies, whether or not they will use the data again.
This protocol is better suited for applications where several updates occur between reads.
This protocol is inefficient for systems where many nodes frequently access an object.
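A minimal sketch of write-invalidate (illustrative names; the copy set stands in for the directory that tracks which nodes hold the block):

```python
class InvalidateDSM:
    def __init__(self):
        self.copies = {}    # block -> set of nodes holding a valid copy
        self.value = {}     # block -> current value

    def read(self, node, block):
        self.copies.setdefault(block, set()).add(node)
        return self.value.get(block)

    def write(self, node, block, value):
        # Invalidate every other copy before the write proceeds.
        invalidated = self.copies.get(block, set()) - {node}
        self.copies[block] = {node}    # only the writer's copy survives
        self.value[block] = value
        return invalidated             # nodes that received an invalidation


dsm = InvalidateDSM()
dsm.write(0, "b", 1)
dsm.read(1, "b")
dsm.read(2, "b")
inv = dsm.write(1, "b", 2)        # invalidates copies at nodes 0 and 2
assert inv == {0, 2}
assert dsm.copies["b"] == {1}
assert dsm.read(2, "b") == 2      # node 2 must re-fetch after invalidation
```

Note how node 2 pays a re-fetch on its next read even though it never used the stale value, which is the inefficiency described above.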
15
Write-update protocol.
A write to shared data causes all copies of that data to be updated.
Generates considerable network traffic.
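For contrast, a write-update sketch (illustrative names), with a counter showing where the network traffic comes from: one update message per remote copy on every write:

```python
class UpdateDSM:
    def __init__(self):
        self.copies = {}    # block -> {node: value}
        self.messages = 0   # update messages sent over the network

    def read(self, node, block):
        holders = self.copies.setdefault(block, {})
        if node not in holders:
            # First access: fetch a copy from any current holder.
            holders[node] = next(iter(holders.values()), None)
        return holders[node]

    def write(self, node, block, value):
        holders = self.copies.setdefault(block, {})
        holders[node] = value
        for other in holders:
            if other != node:
                holders[other] = value   # push the update to every remote copy
                self.messages += 1


dsm = UpdateDSM()
dsm.write(0, "b", 1)
dsm.read(1, "b")
dsm.read(2, "b")
dsm.write(0, "b", 2)
assert dsm.read(2, "b") == 2    # no invalidation: the copy is already current
assert dsm.messages == 2        # one update message per remote copy
```

Reads never miss, but every write costs a message per replica, which is the traffic problem noted above.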
16
Cache coherence in PLUS
Uses write-update protocol
A memory coherence manager running at each node is responsible for maintaining consistency using a linked list that keeps track of all copies.
A replica is marked as the master copy.
Each node maintains a pointer to the master copy and a linked-list pointer to the next copy.
To maintain consistency, writes are always performed first on the master copy and are then propagated to the copies along the linked list.
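The PLUS-style propagation can be sketched as a walk down the replica list starting at the master (the `Copy` class and function here are illustrative, not the actual PLUS data structures):

```python
class Copy:
    """One replica of a shared block; replicas form a singly linked list."""
    def __init__(self):
        self.value = None
        self.next = None      # next replica in the distribution list


def write_through_master(master, value):
    """Apply the write at the master first, then follow the linked list,
    updating every replica in turn."""
    node = master
    while node is not None:
        node.value = value
        node = node.next


master = Copy()
replica1 = Copy()
replica2 = Copy()
master.next = replica1        # master -> replica1 -> replica2
replica1.next = replica2

write_through_master(master, 42)
assert master.value == 42
assert replica2.value == 42   # update reached the end of the list
```

Even a writer that holds a local replica sends its write to the master first, so all replicas see updates in the same (master-defined) order.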
17
Size of shared memory unit (page size).
A large page size will allow local access to subsequent memory locations and reduce paging activity and communication protocol overhead.
However, the larger page size increases chance for contention and also reduces concurrency.
False sharing.
Interesting case is the PLUS system
Large unit of replication.
Very small unit of memory access and coherence maintenance.
18
Traditional LRU methods are not the best in DSM systems.
Page access modes must be taken into consideration.
Must minimize replacement of shared pages as they are moved over the network.
Read-only pages may simply be deleted; they need not be moved over the network since they are not the master copies.
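A DSM-aware victim selection can be sketched as a cost ranking over page access modes (the mode names and costs are illustrative assumptions, not from any specific system):

```python
def choose_victim(pages):
    """pages: list of (page_id, mode) with mode in {'read-only', 'shared',
    'private-dirty'}. Returns the id of the cheapest page to evict."""
    cost = {
        "read-only": 0,        # just discard it: another copy exists elsewhere
        "shared": 1,           # must notify other holders / the directory
        "private-dirty": 2,    # sole copy: must be transferred over the network
    }
    return min(pages, key=lambda p: cost[p[1]])[0]


resident = [(1, "private-dirty"), (2, "shared"), (3, "read-only")]
assert choose_victim(resident) == 3   # the read-only page is discarded first
```

Unlike plain LRU, this policy prefers victims whose eviction generates no network traffic, per the considerations above.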
19