Message Passing
Inter Process Communication
• Original sharing (shared-data approach): processes P1 and P2 communicate through a common shared-memory region.
• Copy sharing (message passing approach): P1 and P2 exchange copies of the data as messages.
• Message passing is the basic IPC mechanism in distributed systems.
Desirable Features of a Good MPS
• Simple
  – Clean & simple semantics, so that users need not worry about system or network aspects
• Uniform Semantics
  – The same semantics for local and remote communication
• Efficiency
  – Aim to reduce the number of messages exchanged
• Reliability
  – Cope with node & link failures, guarantee delivery of messages, and handle duplicate messages
• Correctness (for group communication)
  – Atomicity
  – Ordered delivery
  – Survivability
• Flexibility
  – Users have the flexibility to choose & specify the type & level of reliability & correctness they require
• Security
  – Secure end-to-end communication
• Portability
  – The message passing system & the applications using it should be portable
Message Structure


• A message is a block of information formatted by a sending process such that it is meaningful to the receiving process.
• Various issues have to be dealt with: who is the sender/receiver, what if a node crashes, what if the receiver is not ready, etc.
• A typical message consists of a fixed-length header followed by a variable-size data part:
  – Addresses: sending process address, receiving process address
  – Sequence number or message ID
  – Structural information: type (actual data or pointer to the data) and number of bytes/elements
  – Variable-size part: the actual data, or a pointer to the data
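To make the header layout concrete, here is a minimal C sketch of such a fixed-length header; the field names and sizes are assumptions for illustration, not part of any particular system:

    #include <stdint.h>

    #define MAX_DATA 4096          /* assumed maximum in-line payload size */

    enum payload_type { INLINE_DATA, POINTER_TO_DATA };   /* "type" field */

    /* Fixed-length header: addresses, sequence number, structural info */
    struct msg_header {
        uint32_t sender_addr;      /* sending process address            */
        uint32_t receiver_addr;    /* receiving process address          */
        uint32_t seq_no;           /* sequence number / message ID       */
        uint8_t  type;             /* INLINE_DATA or POINTER_TO_DATA     */
        uint32_t num_bytes;        /* number of bytes/elements of data   */
    };

    /* Complete message: fixed-length header + variable-size data part */
    struct message {
        struct msg_header hdr;
        uint8_t data[MAX_DATA];    /* actual data or an encoded pointer  */
    };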
Synchronization



• Synchronization is achieved through the semantics of the communication primitives:
  – Blocking
  – Nonblocking
• The two types of semantics can be used for both the send & the receive primitives.
• Complexities in synchronization (see the sketch after this list):
  – How does the receiver know that a message has arrived in the message buffer when the receive is nonblocking?
    • Polling
    • Interrupt
  – A blocking send/receive could stay blocked forever if the receiver/sender crashes or the message is lost.
    • Timeout
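As an illustration of these two receive styles, here is a minimal C sketch using BSD sockets (covered later in these notes); the timeout value and socket usage are assumptions:

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/time.h>
    #include <errno.h>

    /* Blocking receive with a timeout: avoids waiting forever if the
       sender crashes or the message is lost.                          */
    ssize_t recv_with_timeout(int sock, void *buf, size_t len, int seconds)
    {
        struct timeval tv = { .tv_sec = seconds, .tv_usec = 0 };
        setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
        return recv(sock, buf, len, 0);   /* -1 with EWOULDBLOCK on timeout */
    }

    /* Nonblocking receive (polling): returns immediately; the caller
       keeps polling until a message is available.                     */
    ssize_t recv_poll(int sock, void *buf, size_t len)
    {
        ssize_t n = recv(sock, buf, len, MSG_DONTWAIT);
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return 0;                     /* no message yet; poll again later */
        return n;
    }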
Synchronous Communication

• Communication is synchronous when both the send and receive primitives use blocking semantics.
• Figure: the sender executes Send(msg) and its execution is suspended; the receiver executes Receive(msg) and is suspended until the message arrives. On receiving the message the receiver resumes execution and sends an acknowledgement with Send(ack); the sender resumes execution when the acknowledgement arrives.
Synchronous vs. Asynchronous Communication
• Synchronous Communication
  – Advantages
    • Simple & easy to implement
    • Reliable
  – Disadvantages
    • Limits concurrency
    • Can lead to communication deadlock
    • Less flexible as compared to asynchronous
    • Hardware is more expensive
• Asynchronous Communication
  – Advantages
    • Doesn't require synchronization of both communicating sides
    • Cheap: timing is not as critical as for synchronous transmission, so hardware can be made cheaper
    • Set-up is very fast; well suited for applications where messages are generated at irregular intervals
    • Allows more parallelism
  – Disadvantages
    • Large relative overhead: a high proportion of the transmitted bits serve only control purposes and thus carry no useful information
    • Not very reliable
Buffering
• Null Buffer (No Buffering)
• Single Message Buffer
• Unbounded Capacity Buffer
• Finite Bound (Multiple Message) Buffer
Null Buffer (No Buffering)
• Message transfer involves a single copy operation, directly from the sending process to the receiving process.
• Can be implemented in the following ways:
  – The sender sends only when it receives an acknowledgement from the receiver, i.e. when the receiver has executed 'receive'. Otherwise the sender remains blocked.
  – After executing 'send', the sender waits for an acknowledgement. If it is not received within a timeout period, the sender assumes the message was discarded & resends it.
• Not suitable for asynchronous transmission; the receiver is blocked until the entire message is transferred over the network.
• Figure: the message is copied directly from the sending process to the receiving process, with no intermediate buffer.
Single Message Buffer
• Used in synchronous communication.
• A buffer with capacity for a single message is kept on the receiver's node.
• The message buffer may be in the kernel's or the receiver's address space.
• Message transfer involves two copy operations.
• Figure: the message crosses the node boundary from the sending process into the single-message buffer, from which the receiving process retrieves it.
Unbounded Capacity Buffer
• Used in asynchronous communication.
• Since the sender does not wait for the receiver to be ready, all unreceived messages must be stored for later delivery.
• An unbounded buffer is practically impossible to provide.
Finite Bound Buffer
• Used in asynchronous communication.
• Figure: messages Msg 1 … Msg n from the sending process are held in a multiple-message buffer (mailbox / port) until the receiving process takes them.
• Buffer overflow is possible. It can be dealt with in two ways:
  – Unsuccessful communication
    • Message transfer simply fails when there is no more buffer space. Less reliable.
  – Flow-controlled communication
    • The sender is blocked until the receiver accepts some messages, creating space in the buffer. This requires some synchronization, so it is not truly asynchronous.
• The message buffer may be in the kernel's or the receiver's address space.
• Extra overhead is incurred for buffer management.
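A minimal C sketch of a flow-controlled finite-bound buffer using POSIX threads; the capacity, message size, and field names are assumptions for illustration. The sender blocks when the mailbox is full and the receiver blocks when it is empty:

    #include <pthread.h>
    #include <string.h>

    #define CAPACITY 8                    /* assumed finite bound of the mailbox */
    #define MSG_SIZE 256

    struct mailbox {                      /* mutex/conds must be initialized,   */
        char  slots[CAPACITY][MSG_SIZE];  /* e.g. with PTHREAD_*_INITIALIZER     */
        int   head, tail, count;
        pthread_mutex_t lock;
        pthread_cond_t  not_full, not_empty;
    };

    /* Flow-controlled send: blocks while the buffer is full. */
    void mb_send(struct mailbox *mb, const char *msg)
    {
        pthread_mutex_lock(&mb->lock);
        while (mb->count == CAPACITY)          /* buffer full: block sender */
            pthread_cond_wait(&mb->not_full, &mb->lock);
        strncpy(mb->slots[mb->tail], msg, MSG_SIZE - 1);
        mb->slots[mb->tail][MSG_SIZE - 1] = '\0';
        mb->tail = (mb->tail + 1) % CAPACITY;
        mb->count++;
        pthread_cond_signal(&mb->not_empty);
        pthread_mutex_unlock(&mb->lock);
    }

    /* Receive: blocks while the buffer is empty. */
    void mb_receive(struct mailbox *mb, char *out)
    {
        pthread_mutex_lock(&mb->lock);
        while (mb->count == 0)                 /* buffer empty: block receiver */
            pthread_cond_wait(&mb->not_empty, &mb->lock);
        strncpy(out, mb->slots[mb->head], MSG_SIZE);
        mb->head = (mb->head + 1) % CAPACITY;
        mb->count--;
        pthread_cond_signal(&mb->not_full);
        pthread_mutex_unlock(&mb->lock);
    }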
Multidatagram Messages
• Maximum transfer unit (MTU) – the maximum amount of data that can be transmitted over the network at a time.
• Packet (datagram) – message data + control information.
• Single-datagram message – a message smaller than the MTU of the network, sent in a single packet (datagram).
• Multidatagram message – a message larger than the MTU, which has to be fragmented and sent in multiple packets.
• Disassembling a message into packets, and reassembling the packets in sequence on the receiver side, is the responsibility of the message passing system.
Encoding and Decoding of Message Data
• The structure of program objects should be preserved when they are transmitted from the sender's address space to the receiver's address space. This is difficult because:
  – An absolute pointer value loses its meaning when transferred from one address space to another (e.g. a tree built with pointers), so it is necessary to send object-type information as well.
  – There must be some way for the receiver to identify which program object is stored where in the message buffer & how much space each program object occupies.
• Encoding – the program objects are converted into a flat stream of message data by the sender.
• Decoding – the program objects are reconstructed from the message data by the receiver.
• Representations used for encoding & decoding:
  – Tagged representation
    • The type of each program object is encoded in the message along with its value.
    • More data is transferred.
    • More time is taken to encode/decode the data.
  – Untagged representation
    • The message data contains only the program objects. The receiving process must have prior knowledge of how to decode the data, as it is not self-describing.
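A minimal C sketch of tagged encoding; the tag values and the [tag][length][value] layout are assumptions for illustration (a real encoder would also handle byte order and buffer bounds):

    #include <stdint.h>
    #include <string.h>

    enum tag { TAG_INT32 = 1, TAG_STRING = 2 };   /* assumed tag values */

    /* Tagged encoding: write [tag][length][value] so the receiver can
       decode the buffer without prior knowledge of its layout.        */
    size_t encode_int32(uint8_t *buf, int32_t value)
    {
        uint32_t len = sizeof value;
        buf[0] = TAG_INT32;
        memcpy(buf + 1, &len, sizeof len);
        memcpy(buf + 1 + sizeof len, &value, sizeof value);
        return 1 + sizeof len + sizeof value;
    }

    size_t encode_string(uint8_t *buf, const char *s)
    {
        uint32_t len = (uint32_t)strlen(s);
        buf[0] = TAG_STRING;
        memcpy(buf + 1, &len, sizeof len);
        memcpy(buf + 1 + sizeof len, s, len);
        return 1 + sizeof len + len;
    }

    /* An untagged representation would drop the tag (and possibly the
       length), making the message smaller but not self-describing.    */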
Process Addressing
• Explicit addressing
  – Send (process_id, msg)
  – Receive (process_id, msg)
• Implicit addressing
  – Send_any (service_id, msg)      // functional addressing
  – Receive_any (process_id, msg)
Methods for Process Addressing
• machine_id@local_id
  – machine address @ receiving process identifier
  – Local ids need to be unique on only one machine
  – Does not support process migration
• machine_id@local_id@machine_id
  – machine on which the process was created @ its local process identifier @ last known location of the process
  – Link-based addressing – link information is left on the previous node
  – A mapping table is maintained by the kernel for all processes created on another node but currently running on this node.
  – The current location of the receiving process is sent to the sender, which caches it.
  – Drawbacks
    • The overhead of locating a process is large if the process has migrated many times.
    • It is not possible to locate the process if an intermediate node is down.
• Both methods are location non-transparent.
Location Transparent Process Addressing
• Centralized process identifier allocator – a counter
  – Not reliable & not scalable
• Two-level naming scheme
  – A high-level machine-independent name and a low-level machine-dependent name
  – A name server maintains the mapping table
  – The kernel of the sending machine obtains the low-level name of the receiving process from the name server and also caches it
  – When a process migrates, only its low-level name changes
  – Used in functional addressing
  – Not scalable & not reliable, since the name server is a bottleneck and a single point of failure.
Failure Handling
• Loss of the request message
  – Figure: the sender sends a request; the request message is lost on the way to the receiver.
• Loss of the response message
  – Figure: the sender sends a request; the receiver executes the request successfully and sends a response; the response message is lost on the way back.
• Unsuccessful execution of the request
  – Figure: the sender sends a request; the receiver crashes while executing the request and is later restarted, so the request execution is unsuccessful.
Four-message reliable IPC protocol
• The client sends a request and blocks; the server acknowledges the request, executes it, and sends the reply; the client acknowledges the reply.
Three-message reliable IPC protocol
• The client sends a request and blocks; the server executes it and the reply itself serves as the acknowledgement of the request; the client acknowledges the reply.
Two-message reliable IPC protocol
• The client sends a request and blocks; the server executes it and sends the reply, which acknowledges the request; no separate acknowledgement messages are used.
Fault Tolerant Communication
• Figure (at-least-once semantics): the client sends a request and starts a timeout.
  – The request message is lost: the timeout expires and the client retransmits the request.
  – The server crashes before completing the request: the execution is unsuccessful, the timeout expires and the client retransmits the request.
  – The server executes the request successfully but the response message is lost: the timeout expires and the client retransmits; the server executes the request again and the response finally reaches the client.
• With timeout-based retransmission the request may thus be executed one or more times: at-least-once semantics.
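A minimal C sketch of the client side of such timeout-based retransmission, assuming UDP sockets and a 2-second timeout:

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/time.h>
    #include <errno.h>

    /* Send the request, wait for a reply with a timeout, and retransmit
       on timeout.  The server may execute the request more than once,
       which is exactly the at-least-once behaviour described above.     */
    ssize_t request_at_least_once(int sock,
                                  const struct sockaddr *srv, socklen_t srvlen,
                                  const void *req, size_t reqlen,
                                  void *reply, size_t replylen, int max_retries)
    {
        struct timeval tv = { .tv_sec = 2, .tv_usec = 0 };    /* assumed timeout */
        setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);

        for (int attempt = 0; attempt <= max_retries; attempt++) {
            sendto(sock, req, reqlen, 0, srv, srvlen);        /* (re)transmit request */
            ssize_t n = recvfrom(sock, reply, replylen, 0, NULL, NULL);
            if (n >= 0)
                return n;                                     /* got a response */
            if (errno != EAGAIN && errno != EWOULDBLOCK)
                return -1;                                    /* real error */
            /* timeout: fall through and retransmit */
        }
        return -1;                                            /* gave up */
    }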
Idempotency
• Idempotency means repeatability.
• An idempotent operation produces the same result, without any side effects, no matter how many times it is performed with the same arguments.
• The debit routine below is not idempotent: each repeated execution with the same argument changes the balance again.
debit(amount)
    if (balance >= amount) {
        balance = balance - amount;
        return ("Success", balance);
    }
    else return ("Failure", balance);
end;
• Figure: the client sends a debit(100) request to the server (balance = 1000). The server processes the debit routine, balance = 1000 - 100 = 900, and returns (success, 900), but the response is lost. On timeout the client retransmits the request; the server processes the debit routine again, balance = 900 - 100 = 800, and returns (success, 800). A single debit has been applied twice.
Handling Duplicate Request
• With timeout-based retransmission of requests, the server may execute the same request message more than once.
• If the operation is non-idempotent, its repeated execution will destroy the consistency of information.
• Exactly-once semantics is therefore used, which ensures that the server's operation is performed only once.
• To achieve this, a unique identifier is attached to every request the client makes, and a reply cache is set up in the kernel's address space on the server machine to cache replies (see the sketch after the figure below).
• Figure: the client sends request-1 (debit(100)) to the server (balance = 1000). The server checks the reply cache for request-1; no match is found, so it processes the request, saves the reply (req-1, (success, 900)) in the reply cache, and returns (success, 900), but the response is lost. On timeout the client retransmits request-1. The server again checks the reply cache, finds a match, extracts the cached reply and resends (success, 900) without re-executing the debit. The client receives balance = 900.
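A minimal C sketch of a server-side reply cache; the cache size, reply format, and hashing by request id are assumptions for illustration:

    #include <stdint.h>
    #include <string.h>

    #define CACHE_SIZE 64                  /* assumed size of the reply cache */

    struct cached_reply {
        uint32_t req_id;                   /* unique request identifier       */
        int      valid;
        char     reply[256];               /* cached reply for that request   */
    };

    static struct cached_reply cache[CACHE_SIZE];

    /* Server-side dispatch with a reply cache: execute the operation only the
       first time a request id is seen; duplicates get the cached reply back. */
    const char *handle_request(uint32_t req_id, const char *request,
                               const char *(*execute)(const char *))
    {
        struct cached_reply *slot = &cache[req_id % CACHE_SIZE];

        if (slot->valid && slot->req_id == req_id)
            return slot->reply;            /* duplicate: do not re-execute     */

        const char *result = execute(request);  /* first time: run the (non-  */
        slot->req_id = req_id;                  /* idempotent) operation once  */
        slot->valid  = 1;
        strncpy(slot->reply, result, sizeof slot->reply - 1);
        slot->reply[sizeof slot->reply - 1] = '\0';
        return slot->reply;
    }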

Ques. Which of the following operations are idempotent?
  i.   Read_next_record(filename)
  ii.  Read_record(filename, record_no)
  iii. Append_record(filename, record)
  iv.  Write_record(filename, after_record_n, record)
  v.   Seek(filename, position)
  vi.  Add(integer1, integer2)
  vii. Increment(variable_name)
Handling Lost and Out-of-Sequence Packets in Multidatagram Messages
• Stop-and-wait protocol
  – Acknowledge each packet separately
  – High communication overhead
• Blast protocol (see the sketch after this list)
  – A single acknowledgement for all packets. But what if:
    • packets are lost in communication?
    • packets are received out of sequence?
  – A bitmap is used to identify the packets of the message.
  – The header carries two extra fields: the total number of packets and the position of this packet in the complete message.
  – Selective repeat is used to resend only the unreceived packets.
  – Example: if the receiver replies (5, 01001), i.e. 5 packets in total with the bitmap marking the 1st & 4th packets as not received, the sender retransmits the 1st & 4th packets.
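A minimal C sketch of the selective-repeat step, assuming the bitmap marks missing packets with set bits and that bit 0 corresponds to packet 1 (a convention consistent with the (5, 01001) example above), for messages of up to 32 packets:

    #include <stdint.h>

    /* Given the receiver's acknowledgement (total_packets, missing_bitmap),
       retransmit only the packets whose bit is set.                        */
    void selective_repeat(int total_packets, uint32_t missing_bitmap,
                          void (*resend_packet)(int packet_no))
    {
        for (int i = 0; i < total_packets; i++) {
            if (missing_bitmap & (1u << i))
                resend_packet(i + 1);      /* packet numbers start at 1 */
        }
    }

    /* With total_packets = 5 and missing_bitmap = 0x09 (binary 01001),
       this resends packets 1 and 4.                                    */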
Group Communication
• One to many
• Many to one
• Many to many
One to Many
• Multicast communication
• Broadcast communication
• Open group
  – Any process can send a message to the group as a whole, e.g. a group of replicated servers.
• Closed group
  – Only members of the group can send messages to the group, e.g. a collection of processes working together on parallel processing.
Group Management
• Centralized approach: a group server
  – Creates & deletes groups dynamically & allows processes to join or leave a group
  – Poor reliability & scalability
• Distributed approach
  – For an open group, an outsider can send a message to all group members announcing its presence
  – A closed group also has to be open with respect to joining
Group Addressing
• Two-level naming scheme
  – High-level group name
    • An ASCII name, independent of the location of the processes in the group
    • Used by user applications
  – Low-level group name
    • A multicast address / broadcast address, if the network supports one
    • Otherwise one-to-one communication (unicast) is used to implement group communication:
      – the low-level name is then a list of machine identifiers of all machines belonging to the group
      – the number of packets sent equals the number of machines in the group
• The mapping from high-level to low-level group names is maintained by the centralized group server.
Multicast
• Multicast is asynchronous communication
  – The sending process can't wait for responses from all receivers
  – The sending process may not even be aware of all receivers
• Unbuffered multicast / buffered multicast
• Send-to-all semantics
  – The message is sent to each process of the multicast group
• Bulletin-board semantics
  – The message is addressed to a channel that acts like a bulletin board
  – A receiving process copies the message from the channel
  – The relevance of a message to a receiver depends on the receiver's state
  – Messages not accepted within a certain time after transmission may no longer be useful
Flexible Reliability in Multicast
• 0-reliable
• 1-reliable
• m-out-of-n reliable
• All-reliable
• Atomic multicast
  – All-or-nothing property
  – Required for all-reliable semantics
  – Involves repeated retransmissions by the sender
  – What if the sender or a receiver crashes or goes down?
  – Include a message identifier & a field that marks the message as an atomic multicast
  – Each receiver also performs an atomic multicast of the message to the group
Group Communication Primitives
• send
• send_group
  – Simplifies the design & implementation of group communication
  – Indicates whether to use the name server or the group server
  – Can include an extra parameter to specify the degree of reliability or atomicity
Many to One Communication
• Multiple senders, one receiver.
• Selective receiver
  – Accepts messages from one specific sender
• Nonselective receiver
  – Accepts messages from any sender of a specified group
Many-to-Many Communication
• Ordered message delivery
  – All messages are delivered to all receivers in an order acceptable to the application
  – Requires message sequencing
• Figure (no ordering constraint on delivery): sender S1 sends m1 and S2 sends m2; receiver R1 delivers m1 before m2 while R2 delivers m2 before m1.
Absolute Ordering
• Messages are delivered to all receivers in the exact order in which they were sent.
• Implemented using global timestamps as message identifiers, together with a sliding-window protocol.
• Figure: S1 sends m1 at time t1 and S2 sends m2 at time t2, with t1 < t2; both R1 and R2 deliver m1 before m2.
Consistent Ordering
• All messages are delivered to all receiver processes in the same order.
• This order may be different from the order in which the messages were sent.
• Figure: S1 sends m1 at t1 and S2 sends m2 at t2, with t1 < t2; both R1 and R2 deliver m2 before m1, i.e. the same order at every receiver, though not the sending order.


• Centralized algorithm
  – The kernels of the sending machines send their messages to a single receiver (the sequencer), which assigns a sequence number to each message and then multicasts it.
• Distributed algorithm (a sketch of the sequence-number computation follows this list)
  – The sender assigns a temporary sequence number larger than all previous sequence numbers & sends the message to the group.
  – Each member returns a proposed sequence number; member i computes it as
        max(Fmax, Pmax) + 1 + i/N
    where Fmax is the largest final sequence number of any message this member has received so far, Pmax is the largest sequence number this member has proposed so far, and N is the number of members.
  – The sender selects the largest proposed sequence number & sends it to all members in a commit message as the final sequence number.
  – Committed messages are delivered to the application programs in order of their final sequence numbers.
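A minimal C sketch of the sequence-number computation described above; using doubles for the fractional proposals is an implementation assumption:

    /* Proposed sequence number computed by member i (1-based) out of N members.
       Fmax: largest final sequence number of any message received so far.
       Pmax: largest sequence number this member has proposed so far.
       The fractional part i/N makes every member's proposal unique.            */
    double propose_seq_no(double Fmax, double Pmax, int i, int N)
    {
        double base = (Fmax > Pmax) ? Fmax : Pmax;
        return base + 1.0 + (double)i / (double)N;
    }

    /* The sender collects all proposals, picks the largest one, and multicasts
       it as the final (committed) sequence number of the message.              */
    double commit_seq_no(const double proposals[], int count)
    {
        double final_no = proposals[0];
        for (int k = 1; k < count; k++)
            if (proposals[k] > final_no)
                final_no = proposals[k];
        return final_no;
    }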
Causal Ordering
• If two message-sending events are causally related (there is any possibility that the second message was influenced by the first one), the messages are delivered in that order to all receivers.
• Two message-sending events are said to be causally related if they are correlated by the happened-before relation.
• Figure: senders S1 and S2 multicast messages m1, m2 and m3 to receivers R1, R2 and R3; causally related messages (e.g. m1 and m2) must be delivered in their causal order at every receiver.

• The happened-before relation a → b holds if any of the following conditions is satisfied:
  – a & b are events in the same process & a occurs before b
  – a is the event of sending a message by one process & b is the event of receipt of the same message by another process
  – a → b & b → c implies a → c (transitivity)
CBCAST Protocol
• Each process maintains a vector with one component per group member, counting the messages delivered from that member.
• Example: the vectors of processes A, B, C and D are (3,2,5,1), (3,2,5,1), (2,2,5,1) and (3,2,4,1). Process A sends a new message carrying the vector (4,2,5,1).
  – B delivers the message immediately.
  – C delays it: the condition on the sender's own component, S[A] = R[A] + 1, is not satisfied, because C has not yet received A's previous message.
  – D delays it: the condition S[j] <= R[j] is not satisfied for j = 3, because D has not yet received a message on which the new message may causally depend.
• Delivery rule: a message from sender i carrying vector S is delivered at a receiver whose vector is R only when
      S[i] = R[i] + 1  and  S[j] <= R[j] for all j ≠ i.
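A minimal C sketch of this delivery test; the group size of four is an assumption matching the A, B, C, D example:

    #include <stdbool.h>

    #define N_MEMBERS 4      /* assumed group size, matching the example */

    /* CBCAST delivery test: a message carrying vector S from `sender` may
       be delivered at a receiver whose local vector is R only if
         S[sender] == R[sender] + 1  (it is the next message from that sender)
         S[j]      <= R[j]           for all j != sender (everything it may
                                      causally depend on is already delivered). */
    bool cbcast_deliverable(const int S[N_MEMBERS], const int R[N_MEMBERS],
                            int sender)
    {
        if (S[sender] != R[sender] + 1)
            return false;
        for (int j = 0; j < N_MEMBERS; j++) {
            if (j != sender && S[j] > R[j])
                return false;
        }
        return true;         /* otherwise the message is delayed (buffered) */
    }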
4.3BSD Unix IPC Mechanism
• Network independent.
• Uses sockets as communication end points.
• Two-level naming scheme for naming communication end points: a socket has a high-level string name and a low-level, communication-domain-dependent name.
• Flexible: provides sockets with different communication semantics.
• Supports a broadcast facility if the underlying network supports it.
IPC Primitives
• socket() creates a new socket of a certain socket type, identified by an integer descriptor, and allocates system resources to it.
• bind() is typically used on the server side; it associates a socket with a socket address structure, i.e. a specified local port number and IP address.
• connect() is used in connection-based communication by a client process to request establishment of a connection between its socket & the socket of the server process.
• listen() is used on the server side in connection-based communication to make the socket listen for client connection requests.
• accept() is used on the server side; it accepts a received incoming attempt to create a new TCP connection from a remote client.
Read/Write Primitives
• read() / write() – connection-based communication
• recvfrom() / sendto() – connectionless communication
TCP/IP Socket Calls for Connection
• Server: socket() creates the socket; bind() binds the local IP address and port to the socket; listen() places the socket in passive mode, ready to accept requests; accept() takes the next connection request from the queue (or blocks waiting for one) and creates a new socket for the client connection; the server then processes requests with recv()/send() and finally calls close().
• Client: socket() creates the socket; connect() issues a connection request to the server (the server's accept() blocks until this arrives); the client then exchanges message strings with send()/recv() (or write()/read()) and calls close() to close the socket.
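A minimal C sketch of the server side of this call sequence (error handling omitted); the port number 5000 is an assumption:

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Minimal connection-oriented (TCP) echo server following the figure:
       socket -> bind -> listen -> accept -> recv/send -> close.            */
    int main(void)
    {
        int srv = socket(AF_INET, SOCK_STREAM, 0);        /* create socket       */

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof addr);
        addr.sin_family      = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port        = htons(5000);

        bind(srv, (struct sockaddr *)&addr, sizeof addr);  /* bind local address  */
        listen(srv, 5);                                    /* passive mode        */

        for (;;) {
            int conn = accept(srv, NULL, NULL);            /* wait for a client   */
            char buf[256];
            ssize_t n = recv(conn, buf, sizeof buf, 0);    /* read request        */
            if (n > 0)
                send(conn, buf, (size_t)n, 0);             /* send the reply      */
            close(conn);                                   /* close client socket */
        }
    }

The client side is symmetric: socket(), connect() to the server's address, then send()/recv(), and close().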
UDP/IP Socket Calls for Connectionless Communication
• Server: socket() creates the socket; bind() binds the local IP address and port to the socket; recvfrom() blocks until a datagram is received from a client and also yields the sender's address; the server processes the request and replies with sendto(); finally close() closes the socket.
• Client: socket() creates the socket; sendto() specifies the server's address and sends the request datagram; recvfrom() receives the reply; close() closes the socket.
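A minimal C sketch of the client side of this call sequence (error handling omitted); the server address 127.0.0.1 and port 5001 are assumptions:

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Minimal connectionless (UDP) client following the figure:
       socket -> sendto -> recvfrom -> close.                     */
    int main(void)
    {
        int sock = socket(AF_INET, SOCK_DGRAM, 0);        /* create socket */

        struct sockaddr_in srv;
        memset(&srv, 0, sizeof srv);
        srv.sin_family = AF_INET;
        srv.sin_port   = htons(5001);
        inet_pton(AF_INET, "127.0.0.1", &srv.sin_addr);

        const char request[] = "hello";
        sendto(sock, request, sizeof request, 0,          /* send request  */
               (struct sockaddr *)&srv, sizeof srv);

        char reply[256];
        ssize_t n = recvfrom(sock, reply, sizeof reply, 0, NULL, NULL);
        if (n > 0)                                        /* wait for reply */
            printf("received %zd bytes\n", n);

        close(sock);                                      /* close socket  */
        return 0;
    }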