Chapter 9 Interprocess Communication

By Sape J. Mullender
Prepared by Rachel Elkin
4/5/99
Note: These notes do not attempt to be original
in any way, shape or form. If you can't figure
out from whence they come, you're in the wrong room.
9.1 Introduction
Distributed systems have multiple processing elements that may fail independently: a vital function of
communication mechanisms is the prevention of crashes in one process bringing down another.
There is an advantage in using the same mechanisms for interaction between processes on one
processor, a shared-memory multiprocessor, and a distributed system: it makes life easier.
Four Functions of IPC:
1) Allow communication between separate processes over a network
2) Provide firewalls against failure, plus the means to cross protection boundaries
3) Enforce clean and simple interfaces, as an aid to modular structuring of large hairy distributed
applications
4) Hide the distinction between local and remote communication, allowing static or dynamic
reconfiguration
You gotta use a Remote Procedure Call (RPC) once in your life. You gotta love it.
RPCs are kind of like a woman in a man's world: they have to work twice as hard to get half the credit.
Seriously, they must deliver information with minimum latency, maximum throughput and in the case of
continuous media, minimum jitter (irregularities in latency). IPC should be authenticated (receiver knows
who sends it) and secure (sender knows who receives it). Not only that, but they should hide as many
failures as possible, and recover from errors as gracefully as possible.
All this, and RPCs incur all kinds of inefficiencies from OS interfaces, hardware mechanisms that have to
be invoked, protocol machinery for host, network and process failures, and the inefficiencies of making the
interfaces uniform.
9.2 Computer Networks
We'll assume nodes can fail independently.
Network characteristics: topology, transmission speeds, data packaging.
Data is bundled and transmitted as packets. Each packet has redundancy attached to it that allows error
detection and error correction (error detection requires less redundancy).
A checksum is computed using an algorithm that catches most "typical" errors and can be computed in
real time.
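To make error detection concrete, here's a minimal sketch using CRC-32 from Python's zlib module. The 4-byte trailer framing and the function names are assumptions for illustration only; the chapter doesn't prescribe a particular checksum.

```python
import struct
import zlib

def add_checksum(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 of the payload (hypothetical framing)."""
    return payload + struct.pack("!I", zlib.crc32(payload))

def check_packet(packet: bytes) -> bool:
    """Return True if the trailing CRC-32 matches the payload."""
    payload, trailer = packet[:-4], packet[-4:]
    (expected,) = struct.unpack("!I", trailer)
    return zlib.crc32(payload) == expected

packet = add_checksum(b"hello, network")
# Flip one bit in the first byte to simulate a transmission error.
corrupted = bytes([packet[0] ^ 0x01]) + packet[1:]
```

This only detects the damage; correcting it would take more redundancy than four bytes.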
Examples: deep space probes probably don't want to have just error detection (asking for a retransmission
takes far too long, so they want correction); continuous media probably doesn't really need either error
detection or correction.
ATM stands for Asynchronous Transfer Mode, and don't you forget it. ATM networks work in virtual circuits,
with cells that group together to make packets. Several virtual circuits can lead into one host, so the host has to
be able to demultiplex, i.e., figure out which packets go with which circuit. The layer that's responsible for
this, and also for packet-level error detection (individual cells don't carry checksums with them), is the ATM
Adaptation Layer (AAL).
Demultiplexing is usually done by copying packet data from the buffer in which they were received to the
appropriate process' address space. It would be preferable for the data to initially arrive in the appropriate
process' address space, but demultiplexing is generally done after the packets arrive, so that's pretty tough.
The latency is pretty terrible. ATM networks will force host interfaces to receive data from each virtual
circuit into its own set of buffers.
9.3 Protocol Organization
Data Link Layer: Media errors are corrected in the data-link-layer. The data-link-layer protocol uses
timeouts, acknowledgements and retransmissions to detect and correct packet loss. So, packets won't
necessarily arrive in the order they were sent.
Network Layer: Network-layer protocols route packets. When routing decisions are made on a
packet-by-packet basis, a connectionless or datagram service is being used. When decisions are made about a
string of packets, a virtual-circuit or connection-oriented service is being used.
Connectionless service will not guarantee that packets are received in the order they were sent.
Connection-oriented services usually guarantee packets are delivered in the order sent, and that no packets
are lost (except in the case of ATM, where intermediate-node failures are usually not masked).
Transport Layer: The protocol responsible for delivering data end-to-end between processes in the
network. If the underlying protocols don't provide the desired reliability, the transport layer must.
9.4 Fundamental Properties of Protocols
As long as there is some communication, failures can be corrected using acknowledgements and timeouts.
However, if processes can crash, communication protocols cannot be made completely reliable. Sorry.
The problem is: Say a client sends out a request over a network that can lose packets, and no response
comes back. The client has no way of finding out whether the server has received the request and has
started carrying out the requested work.
Even if the network is reliable, the client has the same problem. The server could crash before it does its
work or just after, before sending a reply. If the network is reliable, at least the client knows that the server
crashed.
When the server comes back, it can resume communication if the client retransmits the last (unanswered)
request. However, the server may have carried out that request just before it crashed. All the client can do,
if the server has total amnesia, is warn the server that it might have already carried out the request.
We're going to stick to amnesia failures. We can't guarantee exactly-once delivery, but we can at least aim
for at-least-once or at-most-once message delivery.
At-least-once protocols are fine when the operations are idempotent: carrying them out once is the same as
carrying them out several times (reading, writing, opening files…)
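A quick sketch of why idempotence matters under at-least-once delivery. The FileServer class and its two operations are hypothetical stand-ins; retransmission is simulated by just running the operation several times.

```python
class FileServer:
    """Toy server state; all names here are hypothetical."""

    def __init__(self):
        self.blocks = {}
        self.counter = 0

    def write_block(self, index: int, data: bytes) -> None:
        # Idempotent: repeating the same write leaves the same state.
        self.blocks[index] = data

    def increment(self) -> None:
        # Not idempotent: every repeat changes the state again.
        self.counter += 1

def deliver_at_least_once(operation, copies: int) -> None:
    """Simulate retransmissions: the operation runs `copies` times."""
    for _ in range(copies):
        operation()

server = FileServer()
deliver_at_least_once(lambda: server.write_block(7, b"abc"), copies=3)
deliver_at_least_once(server.increment, copies=3)
```

After three deliveries the block write looks exactly as if it happened once, while the counter has visibly been bumped three times.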
At-most-once protocols detect a failure and report it. They work in sessions: associations between two
processes during which both maintain protocol state. When state is lost, the session is terminated. Parties
must agree on session names.
Naming schemes: use timestamps. Use random numbers.
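Here's a rough sketch of at-most-once behavior built on sessions. It assumes session names made from a timestamp plus random bits, sequence numbers starting at 1, and a server that only remembers the last request per session. The class and method names are made up for illustration.

```python
import os
import time

def new_session_name() -> tuple:
    """Session name from a timestamp plus random bits (illustrative scheme)."""
    return (time.time_ns(), os.urandom(8))

class SessionLost(Exception):
    """Raised when a request names a session the server does not know."""

class AtMostOnceServer:
    def __init__(self):
        self.sessions = {}   # session name -> (last sequence number, cached reply)
        self.executions = 0  # how many times real work was done

    def open_session(self, name) -> None:
        self.sessions[name] = (0, None)

    def handle(self, name, seq: int, request: str) -> str:
        if name not in self.sessions:
            raise SessionLost(name)   # state is gone: report it, don't guess
        last_seq, last_reply = self.sessions[name]
        if seq == last_seq:
            return last_reply         # retransmission: resend reply, no re-execution
        self.executions += 1
        reply = request.upper()       # stand-in for the real work
        self.sessions[name] = (seq, reply)
        return reply

    def crash(self) -> None:
        # Amnesia failure: all protocol state is lost, so every session ends.
        self.sessions.clear()

server = AtMostOnceServer()
name = new_session_name()
server.open_session(name)
first = server.handle(name, 1, "ping")
again = server.handle(name, 1, "ping")   # duplicate of the same request
server.crash()
try:
    server.handle(name, 1, "ping")
    failure_reported = False
except SessionLost:
    failure_reported = True
```

The duplicate gets the cached reply, and after the crash the client gets an error instead of a silent second execution.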
Figure 9.2, p.224: An example of what happens when two servers have the same address: the at-most-once
principle is violated.
Example: Amoeba. Uses functional addressing. The OS runs the communication protocol, always finding
a unique name for the process. Naming is then a concatenation of client and server process' unique names.
Figure 9.3, p.226: A response sent to the client after execution of a request is indication that it was carried
out without failure. However, when the client doesn't get a response, it doesn't know what's up. An
indication of failure of the network or the server is not enough for certain knowledge that a request has not
been completed: only if the client somehow finds out that the request could not be delivered, can it know
for certain that the request was not carried out.
The end-to-end argument: "Look, we need an application-level end-to-end protocol to make things reliable
anyway, so why not integrate this end-to-end machinery with the rest of the protocol machinery for reliable
message delivery?"
9.5 Types of Data Transport
Remote Operations: One process sends a message to another, asking it to do some work. RPCs are remote
operations with an extra layer of software that packages procedure arguments in request and response
messages. Other names: remote invocation, client/server communication, request/response
communication.
Bulk Data Transfer: Efficient transfer of large bodies of data.
One-to-Many Communication: Broadcast protocols. (Broadcast = to all; Multicast = to a group). Watch
out for unintentionally synchronized messages. What if a broadcast goes out, and every recipient
immediately replies to the sender? Oops. Traffic jam.
Continuous Media: Gotta combine continuous transport and low-latency delivery.
9.6 Transport Protocols for Remote Operations
Desirable properties of transport protocols:
- at-most-once behavior,
- positive feedback when no failures occur
- error report when there have been failures
- low end-to-end latency
- support for very large request and reply messages
T Protocol: (At-most-once behavior requires that the client and server maintain state that allows the client to
generate requests such that the server can always tell which requests it has processed before.) T is the
maximum lifetime of a packet on the network.
T includes:
- transmission delay through the network,
- max # of retransmissions, and
- time interval between transmissions.
When there's been no communication for > T time, the session terminates and another starts. Pays the
price in waiting around for T to pass all the time, but you can guarantee at-most-once this way.
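One way to picture the T rule, as a sketch only. The formula for T below is an assumption (last retransmission leaves after max_retransmissions intervals and then needs one network delay to arrive); the notes list the ingredients but not how to combine them. The clock is passed in explicitly to keep the example deterministic.

```python
def packet_lifetime(delay: float, max_retransmissions: int, interval: float) -> float:
    """Assumed bound on T: the last copy is sent after
    max_retransmissions * interval and may take `delay` to arrive."""
    return max_retransmissions * interval + delay

class Session:
    """Hypothetical session record that expires after T of silence."""

    def __init__(self, t: float, now: float):
        self.t = t
        self.last_activity = now

    def touch(self, now: float) -> None:
        # Any communication resets the silence clock.
        self.last_activity = now

    def expired(self, now: float) -> bool:
        # No communication for more than T: terminate and start a new session.
        return now - self.last_activity > self.t

T = packet_lifetime(delay=0.5, max_retransmissions=4, interval=1.0)
session = Session(T, now=0.0)
```

Once more than T has gone by, no old packet from the session can still be floating around, which is what makes it safe to throw the state away.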
Telling new transmissions from retransmissions:
- Packets numbered separately for each direction, independently of the actual remote operations
- Messages numbered, subnumbering scheme for packets in message
- Remote operations numbered and message types and packet subnumberings used to distinguish
packets w/in a remote operation
(Numbering remote operations, rather than messages or packets, makes it easy to couple replies to requests
and requests to previous replies.)
Figure 9.4, p.231: A full transport protocol for remote operations; each message is acknowledged and
keep-alives are used for detecting server crashes.
Figure 9.5, p.232: For quick transmissions: replies acknowledge requests, and requests acknowledge
previous replies.
What do we do about figuring out whether a server went down or is just taking its time processing a
request? Well, we could keep a piggyback timer on the server and a retransmission timer on the client OR
we could do something like what's shown in
Figure 9.6, p.234: The client keeps a timer, retransmits when the timer expires. Server has no timer, but
will always acknowledge any request sent for the second time. So, what if the client crashes?
How will the server know what's up? One way to take care of this is have the client, when it gracefully
terminates, send acknowledgements for all replies not followed up by another request.
Detecting server crashes: client state extended with an ack-received flag. When client receives an ack, it
sets the flag to true and increases the retransmission timeout period. When the timer expires, client checks
flag. If false, retransmits request. If true, retransmits just header. Server checks header, sees if it's received
the request. If it has, just ignores the data, and sends an ack.
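The ack-received flag logic can be sketched like this. Everything here is a simplification: the timeout multiplier is an arbitrary assumption, a packet is modeled as an (operation id, data) pair with empty data meaning "header only", and the "unknown" answer for a probe the server can't match is an illustrative choice for a server that has lost its state.

```python
class Client:
    def __init__(self, op_id: int, data: bytes, timeout: float):
        self.op_id = op_id
        self.data = data
        self.timeout = timeout
        self.ack_received = False

    def on_ack(self) -> None:
        self.ack_received = True
        self.timeout *= 4   # assumed factor; the notes only say "increases"

    def on_timer_expired(self) -> tuple:
        """Return the (op_id, data) packet to retransmit."""
        if self.ack_received:
            return (self.op_id, b"")      # server has the request: header only
        return (self.op_id, self.data)    # no ack yet: resend the whole request

class Server:
    def __init__(self):
        self.received = set()

    def on_packet(self, op_id: int, data: bytes):
        if op_id in self.received:
            return "ack"       # second or later copy: ignore data, just ack
        if not data:
            return "unknown"   # header-only probe for a request we never saw
        self.received.add(op_id)
        return None            # first copy: start working, the reply comes later

client = Client(op_id=1, data=b"do work", timeout=1.0)
server = Server()
r1 = server.on_packet(1, b"do work")
retransmission = client.on_timer_expired()
r2 = server.on_packet(*retransmission)
client.on_ack()
probe = client.on_timer_expired()
r3 = server.on_packet(*probe)
r4 = Server().on_packet(*probe)   # a freshly restarted server with amnesia
```

The header-only probe is cheap, and a server that can't match it to a request is the client's hint that a crash happened.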
Figure 9.7, p.235: Large Messages using packet-blast or netblit protocols.
For a request n blasts in size (B1, B2, …, Bn; n >= 0): for i = 1 to n, the client:
1. Sends Bi,
2. Starts a retransmission timer, and
3. Waits for expiration of the timer or reception of an ack.
4. If the retransmission timer expires, the client goes back to step 1.
5. If an ack is received, the client cancels the timer and iterates.
Total message size should be sent as part of the header, because Bn (the last blast) might be a partial blast.
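The blast loop above can be sketched as follows. The timer is not real: send_blast is a hypothetical callback that returns True when an ack arrived in time and False when the timer would have expired. The Receiver class simulates loss by dropping the first copy of chosen blasts.

```python
def split_into_blasts(message: bytes, blast_size: int) -> list:
    return [message[i:i + blast_size] for i in range(0, len(message), blast_size)]

def send_message(message: bytes, blast_size: int, send_blast) -> int:
    """Send every blast until acked; return the number of transmissions."""
    transmissions = 0
    for i, blast in enumerate(split_into_blasts(message, blast_size), start=1):
        # Total size travels in the header because the last blast may be partial.
        header = {"total_size": len(message), "blast": i}
        while True:
            transmissions += 1
            if send_blast(header, blast):
                break   # ack received: cancel the timer, go on to the next blast
            # timer expired: loop around and send the same blast again
    return transmissions

class Receiver:
    def __init__(self, drop_first_try_of=()):
        self.to_drop = set(drop_first_try_of)
        self.parts = {}
        self.total_size = None

    def receive(self, header: dict, blast: bytes) -> bool:
        if header["blast"] in self.to_drop:
            self.to_drop.discard(header["blast"])
            return False   # simulated loss: no ack goes back
        self.parts[header["blast"]] = blast
        self.total_size = header["total_size"]
        return True

    def message(self) -> bytes:
        data = b"".join(self.parts[i] for i in sorted(self.parts))
        return data[:self.total_size]

original = bytes(range(10))
receiver = Receiver(drop_first_try_of={2})
sent = send_message(original, blast_size=4, send_blast=receiver.receive)
```

Ten bytes in blasts of four gives three blasts (the last one partial), and the one simulated loss costs one extra transmission.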
9.7 Remote Procedure Call
RPC is remote operations in the guise of a procedural interface.
Most transport protocols execute in the operating system, but carry user-defined data structures, so they
cannot be used to do data conversion. It is the data portion of a message that is the problem.
Figure 9.9, p.238, shows it all. The client calls a client stub, which marshals the parameters and calls on the
transport protocol to ship the request message to the server. The server stub gets it, takes the parameters,
puts them on the stack, and calls the actual remote subroutine. Repeat in reverse for reply.
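A toy version of the stub path, to make the flow concrete. JSON stands in for the marshalling format and a plain function call stands in for the transport protocol; both are assumptions for the sketch, as are all the names.

```python
import json

def add(a: int, b: int) -> int:
    """The actual remote subroutine, living on the server."""
    return a + b

PROCEDURES = {"add": add}

def server_stub(request_message: bytes) -> bytes:
    # Unmarshal the parameters, call the real procedure, marshal the result.
    request = json.loads(request_message)
    result = PROCEDURES[request["proc"]](*request["args"])
    return json.dumps({"result": result}).encode()

def transport(request_message: bytes) -> bytes:
    """Stand-in for the transport protocol: hands the bytes to the server."""
    return server_stub(request_message)

def add_stub(a: int, b: int) -> int:
    """Client stub: looks like add() to the caller, but ships a message."""
    request_message = json.dumps({"proc": "add", "args": [a, b]}).encode()
    reply_message = transport(request_message)
    return json.loads(reply_message)["result"]

answer = add_stub(2, 3)
```

The caller just writes add_stub(2, 3); only bytes ever cross the boundary between the two stubs.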
We can go over Figure 9.10-12 (p.238-40) if we have time.
Differences between an ordinary procedure call (PC) and RPC:
- client and server can fail independently (server failure detection isn't a problem in PC)
- client and server don't execute in the same address space, so global variables and pointers cannot
be used across the interface.
- in RPC, function-passing is close to impossible.
Marshalling: the technical term for transferring data structures used in RPC from one address space to
another. Goals: 1) linearize the data for transport. 2) translate the calling process' data structure to the
called process' data structure.
Call-back handle: When the called procedure runs into a pointer, it sends an RPC asking for the data it
points to. (Note: you don't have to use self-describing data structures all the time, but if you use a
variable-length array, you've got to specify the length somehow.)
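Marshalling with a variable-length array might look something like this, using Python's struct module. The record layout (a signed id, an element count, then that many doubles) is invented for the example; the "!" prefix picks network byte order so both sides agree regardless of host.

```python
import struct

def marshal(record_id: int, scores: list) -> bytes:
    # Linearize: fixed header (id, element count), then the array itself.
    header = struct.pack("!iI", record_id, len(scores))
    body = struct.pack(f"!{len(scores)}d", *scores)
    return header + body

def unmarshal(data: bytes) -> tuple:
    # Read the count first; it tells us how many doubles follow.
    record_id, count = struct.unpack_from("!iI", data, 0)
    scores = list(struct.unpack_from(f"!{count}d", data, 8))
    return record_id, scores

wire = marshal(-5, [1.5, 2.25])
round_trip = unmarshal(wire)
```

The explicit count is the "specify the length somehow" part: without it the receiver couldn't tell where the array ends.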
Stub generation: Client and server stubs can sometimes be generated by a stub compiler, using an IDL.
IDLs should also provide info about which way parameters go (in, out, or both).
9.8 System Issues
Let's hear it for Ethernet. We demand authenticated, secure communication. We also want timely
communication.
ATM Networks: For efficient transmission of small packets (which are generally used in continuous
media), we must minimize the size of the header and the processing time per packet. ATM networks are
"made for this," with small fixed-size cells carrying 48 bytes of payload (53 bytes counting the 5-byte
header). Each cell belongs to a virtual circuit and has a VCI (virtual circuit identifier).
Demultiplexing can often be done with a VCI table in hardware.
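In software, a VCI table lookup might be sketched like this. Real interfaces do this in hardware; the Demultiplexer class, the process names, and the choice to drop cells on unknown circuits are all assumptions for illustration.

```python
from collections import defaultdict

class Demultiplexer:
    def __init__(self):
        self.vci_table = {}                    # VCI -> owning process
        self.buffers = defaultdict(bytearray)  # one receive buffer per process

    def open_circuit(self, vci: int, process: str) -> None:
        self.vci_table[vci] = process

    def receive_cell(self, vci: int, payload: bytes) -> bool:
        process = self.vci_table.get(vci)
        if process is None:
            return False   # unknown circuit: drop the cell
        # The cell goes straight into its own circuit's buffer, no later copy.
        self.buffers[process] += payload
        return True

demux = Demultiplexer()
demux.open_circuit(5, "editor")
demux.open_circuit(9, "player")
results = [
    demux.receive_cell(5, b"he"),
    demux.receive_cell(9, b"AB"),
    demux.receive_cell(5, b"llo"),
    demux.receive_cell(7, b"??"),
]
```

Interleaved cells from different circuits end up in separate buffers, keyed only by the VCI in each cell.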
Encryption: 1) parties must initially authenticate each other and agree on an encryption key. 2)
authentication process might use an authentication server, or cached info. 3) authentication and data
transmission can use different encryption algorithms 4) session keys should be used for one session only.
Butler Lampson's Great Idea: The key that decrypts a packet is sent with every packet. That key is itself
encrypted with a master key that all network controllers get upon bootstrap.
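The shape of that idea, as a sketch only. The cipher here is a TOY (XOR against a SHA-256 counter keystream) chosen so the example needs nothing beyond the standard library; it is not secure and is not what any real controller would use. The 16-byte key size and the packet layout are also assumptions.

```python
import hashlib
import os

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """TOY cipher, illustration only: XOR data with a SHA-256 counter keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def seal_packet(master_key: bytes, payload: bytes) -> bytes:
    packet_key = os.urandom(16)                          # fresh key for this packet
    wrapped_key = keystream_xor(master_key, packet_key)  # key encrypted under master key
    return wrapped_key + keystream_xor(packet_key, payload)

def open_packet(master_key: bytes, packet: bytes) -> bytes:
    packet_key = keystream_xor(master_key, packet[:16])  # unwrap the per-packet key
    return keystream_xor(packet_key, packet[16:])

master_key = os.urandom(16)
sealed = seal_packet(master_key, b"secret payload")
opened = open_packet(master_key, sealed)
```

Any controller holding the master key can unwrap the per-packet key and then the payload, without keeping per-session state.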
Demultiplexing: Multiplexing outgoing messages is fairly easy. Demultiplexing is much harder though,
because of reasons mentioned above.