Distributed Shared Memory (DSM)
1
DSM systems hide explicit data movement from the programmer and provide a simpler abstraction for sharing data.
It is easier to design and write parallel algorithms using DSM than through explicit message passing.
DSM systems allow complex structures to be passed by reference and simplify development of algorithms for distributed applications.
Passing complex data structures and pointers is difficult and expensive using message passing.
Instead of moving only the specific piece of data referenced, DSM moves the entire block/page containing it to the site of reference. This exploits the locality of reference exhibited by programs and cuts down on the overhead of communicating over the network.
2
Direct information sharing is the primary motivation behind DSM
Simulates a logical shared memory address space over a set of physically distributed local memory systems to achieve direct information sharing among coordinating systems
Motivation for DSM came from Non-uniform Memory Access (NUMA) systems
Used in large-scale multiprocessor systems
Memory access times are non-uniform: latency depends on the distance between the processor and the memory holding the data
Processors share one or more system buses or a scalable interconnection network (possibly circuit-switched, e.g. a crossbar switch)
Desired to distribute the shared memory so that frequently accessed locations are closer to some processors
3
Local caches on each processor are used to:
Reduce memory access latencies
Reduce access traffic (bus contention) to the global shared memory
Therefore, primary concern is improving memory access performance
Issues with scalability
– multiple buses or scalable interconnection networks
4
There is no physical global memory
Virtual global memory is created by mapping all or parts of the local memory into a global address space shared by all nodes
Processors are connected through a loosely coupled network (usually packet-switched)
More substantial delay time in accessing shared information
There are no multi-point connections, so memory accesses cannot easily be broadcast or monitored (snooped)
Primary motivation for DSM is the desire to achieve transparency and not memory access performance
5
Accesses to shared variables are competing accesses (read or write access)
Other accesses are non-competing accesses
Competing accesses must be synchronized to ensure consistency
System memory can be consistent only at certain synchronization points
Synchronization access operation synch(S) is used
synch(S) accesses a synchronization variable S
Synchronization point may involve more than just a simple memory access
Synchronization access operations, acquire(S) and release(S) can indicate begin and end of a synchronization period
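The acquire(S)/release(S) bracketing above can be sketched with an ordinary lock standing in for the synchronization variable S. This is a minimal sketch; the worker function and counter are illustrative, not part of any real DSM API:

```python
import threading

S = threading.Lock()      # the synchronization variable S
shared_counter = 0        # shared data touched only inside the bracket

def worker():
    global shared_counter
    S.acquire()           # acquire(S): begin synchronization period
    shared_counter += 1   # competing access, now safely serialized
    S.release()           # release(S): end of period, updates made visible

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert shared_counter == 4
```

The DSM only needs to make memory consistent at the acquire/release points, which is what lets it relax consistency between them.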
6
Atomic Consistency
Requires that the distributed shared memory system have no replication of data (like centralized shared memory)
Each read operation receives the latest value and all write operations are completed before subsequent read operations
Sequential Consistency
Lamport: “the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program”
Causal Consistency
Write operations from different processors that are not causally related (concurrent writes) may be observed in different orders by different processors
7
Only writes that might be causally related must be observed in the same order by all processors
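The causal-consistency constraint can be made concrete with a small check. All names here are illustrative; writes are tuples and the causally related pairs are given explicitly:

```python
def respects_causality(observed_order, causal_pairs):
    """True if every causally related pair of writes appears in order."""
    pos = {w: i for i, w in enumerate(observed_order)}
    return all(pos[a] < pos[b] for a, b in causal_pairs)

w1 = ("P1", "x", 1)   # P1 writes x=1
w2 = ("P2", "x", 2)   # P2 first read x=1, so w1 -> w2 are causally related
w3 = ("P3", "y", 3)   # concurrent with both

causal = [(w1, w2)]

# Both observation orders are causally consistent: the concurrent write w3
# may appear anywhere, but w1 must precede w2 everywhere.
assert respects_causality([w1, w3, w2], causal)
assert respects_causality([w3, w1, w2], causal)

# An order that reverses a causally related pair violates causal consistency.
assert not respects_causality([w2, w1, w3], causal)
```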
8
Central-Server Algorithm
Server maintains all data.
Clients request to read and the server returns the data item requested by the client.
On write requests, it updates the data.
Has the drawbacks of a centralized algorithm but is simple to implement.
A better implementation partitions the data by address across multiple servers and uses a lookup table to find the server responsible for each address.
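The partitioned variant can be sketched as follows. The class and method names are illustrative, and the lookup table is modeled here by a simple modulo mapping from address to server:

```python
class DataServer:
    """One of several servers, each owning a partition of the addresses."""
    def __init__(self):
        self.store = {}

    def read(self, addr):
        return self.store.get(addr)

    def write(self, addr, value):
        self.store[addr] = value


class PartitionedDSM:
    """Client-side stub: routes each address to its owning server."""
    def __init__(self, servers):
        self.servers = servers

    def _lookup(self, addr):
        # Stand-in for the lookup table: address modulo server count.
        return self.servers[addr % len(self.servers)]

    def read(self, addr):
        return self._lookup(addr).read(addr)

    def write(self, addr, value):
        self._lookup(addr).write(addr, value)


dsm = PartitionedDSM([DataServer() for _ in range(4)])
dsm.write(10, "a")
assert dsm.read(10) == "a"   # routed to the same server on read and write
```

Partitioning spreads the request load, at the cost of one extra lookup per access.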
9
Migration Algorithm
Data is shipped to the location of the data access request.
Subsequent accesses to the data are performed locally.
Allows only one node to access a shared data page/block at a time.
Can see thrashing when pages frequently migrate between nodes.
Methods exist to reduce thrashing.
A server must keep track of the location of the migrating pages.
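A minimal sketch of the migration algorithm, with the server's tracking role played by a directory that records each page's current owner (all names and the page size are illustrative):

```python
class MigrationDSM:
    def __init__(self, num_nodes):
        self.owner = {}                                   # page -> owning node
        self.memory = [dict() for _ in range(num_nodes)]  # per-node pages

    def access(self, node, page):
        """Return the page contents, migrating the page to `node` first."""
        old = self.owner.get(page)
        if old is None:
            self.memory[node][page] = bytearray(4096)     # first touch
        elif old != node:
            # Ship the whole page to the requesting node; subsequent
            # accesses there are local until another node takes it.
            self.memory[node][page] = self.memory[old].pop(page)
        self.owner[page] = node
        return self.memory[node][page]


dsm = MigrationDSM(2)
dsm.access(0, 7)               # node 0 faults the page in
dsm.access(1, 7)               # page migrates to node 1
assert dsm.owner[7] == 1
assert 7 not in dsm.memory[0]  # only one node holds the page at a time
```

If nodes 0 and 1 alternated accesses to page 7, the page would ping-pong between them, which is the thrashing behavior mentioned above.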
10
Read-replication algorithm
Extends the migration algorithm by replicating data blocks, allowing multiple nodes to have read access or one node to have read-write access.
When performing write operation, all copies of the page must be invalidated or updated with the current page.
Full-replication algorithm
Extends the read-replication algorithm by allowing multiple nodes to have both read and write access to shared data blocks
Multiple nodes writing to shared data concurrently, requires shared data access control to maintain consistency
Gap-free sequencer
Two-phase commit protocol
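A gap-free sequencer can be sketched as below (illustrative names): every write goes to a sequencer that assigns consecutive sequence numbers, and each replica applies writes strictly in sequence order, so a gap reveals a lost or reordered message:

```python
class Sequencer:
    """Assigns consecutive, gap-free sequence numbers to writes."""
    def __init__(self):
        self.next_seq = 0

    def order(self, write):
        seq = self.next_seq
        self.next_seq += 1
        return seq, write


class Replica:
    def __init__(self):
        self.store = {}
        self.expected = 0      # next sequence number to apply
        self.pending = {}      # out-of-order writes held back

    def deliver(self, seq, write):
        self.pending[seq] = write
        # Apply writes only in gap-free sequence order; a missing number
        # would stall here and trigger a retransmission request.
        while self.expected in self.pending:
            addr, value = self.pending.pop(self.expected)
            self.store[addr] = value
            self.expected += 1


seq = Sequencer()
r = Replica()
m1 = seq.order(("x", 1))
m2 = seq.order(("x", 2))
r.deliver(*m2)                 # arrives out of order: held back
assert r.store == {}
r.deliver(*m1)                 # gap filled: both writes applied in order
assert r.store == {"x": 2}
```

Because every replica applies the same gap-free sequence, all copies converge to the same state.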
11
Read-remote-write-remote
Centralized server algorithm
Read-migrate-write-migrate
Migration algorithm
Single reader, single writer
Achieves better performance by exploiting program localities
Suffers from false-sharing
Read-replicate-write-migrate
Read-replication algorithm
Multiple readers, single writer
Read-replication algorithm together with write-invalidate protocol
12
Performs well when read access dominates write access
Linked list or table is used to represent current state of sharing
Read-replicate-write-replicate
Full-replication algorithm
13
Granularity of the coherence unit is an object
An object is defined as an encapsulated data structure with prescribed methods or operations
Only the declared shared objects need be managed by the DSM
Synchronization accesses are applied at the object level
14
DSM systems make use of data replication so that multiple nodes can access the same data.
Write-invalidate protocol
A write to shared data causes the invalidation of all copies except one before the write can proceed.
A major disadvantage of this scheme is that invalidations are sent to all nodes holding copies, whether or not they will use the data again.
This protocol is better suited for applications where several updates occur between reads.
This protocol is inefficient for systems where many nodes frequently access an object.
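A minimal sketch of write-invalidate (illustrative names; the copy set stands in for the directory that tracks which nodes hold the block):

```python
class InvalidateDSM:
    def __init__(self):
        self.copies = {}    # block -> set of nodes holding a valid copy
        self.value = {}     # block -> current value

    def read(self, node, block):
        self.copies.setdefault(block, set()).add(node)
        return self.value.get(block)

    def write(self, node, block, value):
        # Invalidate every other copy before the write proceeds.
        invalidated = self.copies.get(block, set()) - {node}
        self.copies[block] = {node}    # only the writer's copy survives
        self.value[block] = value
        return invalidated             # nodes that received an invalidation


dsm = InvalidateDSM()
dsm.write(0, "b", 1)
dsm.read(1, "b")
dsm.read(2, "b")
inv = dsm.write(1, "b", 2)        # invalidates copies at nodes 0 and 2
assert inv == {0, 2}
assert dsm.copies["b"] == {1}
assert dsm.read(2, "b") == 2      # node 2 must re-fetch after invalidation
```

Note how node 2 pays a re-fetch on its next read even though it never used the stale value, which is the inefficiency described above.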
15
Write-update protocol.
A write to shared data causes all copies of that data to be updated.
Generates considerable network traffic.
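For contrast, a write-update sketch (illustrative names), with a counter showing where the network traffic comes from: one update message per remote copy on every write:

```python
class UpdateDSM:
    def __init__(self):
        self.copies = {}    # block -> {node: value}
        self.messages = 0   # update messages sent over the network

    def read(self, node, block):
        holders = self.copies.setdefault(block, {})
        if node not in holders:
            # First access: fetch a copy from any current holder.
            holders[node] = next(iter(holders.values()), None)
        return holders[node]

    def write(self, node, block, value):
        holders = self.copies.setdefault(block, {})
        holders[node] = value
        for other in holders:
            if other != node:
                holders[other] = value   # push the update to every remote copy
                self.messages += 1


dsm = UpdateDSM()
dsm.write(0, "b", 1)
dsm.read(1, "b")
dsm.read(2, "b")
dsm.write(0, "b", 2)
assert dsm.read(2, "b") == 2    # no invalidation: the copy is already current
assert dsm.messages == 2        # one update message per remote copy
```

Reads never miss, but every write costs a message per replica, which is the traffic problem noted above.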
16
Cache coherence in PLUS
Uses write-update protocol
A memory coherence manager running at each node is responsible for maintaining consistency using a linked list that keeps track of all copies.
A replica is marked as the master copy.
Each node maintains a pointer to the master copy and a linked-list pointer to the next copy.
To maintain consistency, writes are always performed first on the master copy and are then propagated to the copies along the linked list.
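The PLUS-style propagation can be sketched as a walk down the replica list starting at the master (the `Copy` class and function here are illustrative, not the actual PLUS data structures):

```python
class Copy:
    """One replica of a shared block; replicas form a singly linked list."""
    def __init__(self):
        self.value = None
        self.next = None      # next replica in the distribution list


def write_through_master(master, value):
    """Apply the write at the master first, then follow the linked list,
    updating every replica in turn."""
    node = master
    while node is not None:
        node.value = value
        node = node.next


master = Copy()
replica1 = Copy()
replica2 = Copy()
master.next = replica1        # master -> replica1 -> replica2
replica1.next = replica2

write_through_master(master, 42)
assert master.value == 42
assert replica2.value == 42   # update reached the end of the list
```

Even a writer that holds a local replica sends its write to the master first, so all replicas see updates in the same (master-defined) order.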
17
Size of shared memory unit (page size).
A large page size will allow local access to subsequent memory locations and reduce paging activity and communication protocol overhead.
However, the larger page size increases chance for contention and also reduces concurrency.
False sharing.
Interesting case is the PLUS system
Large unit of replication.
Very small unit of memory access and coherence maintenance.
18
Traditional LRU methods are not the best in DSM systems.
Page access modes must be taken into consideration.
Must minimize replacement of shared pages as they are moved over the network.
Read-only pages may simply be deleted; they need not be moved over the network since they are not the master copies.
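A DSM-aware victim selection can be sketched as a cost ranking over page access modes (the mode names and costs are illustrative assumptions, not from any specific system):

```python
def choose_victim(pages):
    """pages: list of (page_id, mode) with mode in {'read-only', 'shared',
    'private-dirty'}. Returns the id of the cheapest page to evict."""
    cost = {
        "read-only": 0,        # just discard it: another copy exists elsewhere
        "shared": 1,           # must notify other holders / the directory
        "private-dirty": 2,    # sole copy: must be transferred over the network
    }
    return min(pages, key=lambda p: cost[p[1]])[0]


resident = [(1, "private-dirty"), (2, "shared"), (3, "read-only")]
assert choose_victim(resident) == 3   # the read-only page is discarded first
```

Unlike plain LRU, this policy prefers victims whose eviction generates no network traffic, per the considerations above.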
19