Uploaded by Mina Hanna

327 Exam 1 - Cheat Sheet (1)

Distributed Systems
-Rise of Distributed Systems: increased power, network connectivity
increasing, easy to connect hardware together
-Distributed System: Collection of independent computers that appears
to its users as a single coherent system.
Distributed System = Distributed hardware + distributed control +
distributed data
-Why? Big data growing, apps are data-intensive, indv. computers
have limited resources, chip multiprocessors now available,cpu speeds
grew not mem speeds.
-Requirements: programming models &
concurrency,communication,synchronization,consistency &
replication,fault tolerance,security,virtualization
-Organized as middleware(b/t apps and OS). Extends over multiple
machines and users can interact in a consistent way
-Goals: transparency, scaling(hiding communication latency->important
for interactive apps, asynch comm & distribution->spread
info/processing to 1+ location & replication->copy info to ++ availability
and decrease centralized load)
-Architectures:Client invoke indv servers,peer-to-peer systems,a
service by multi servers,web proxy server,web applets,thin clients and
compute servers
-DS EX: cloud computing(on demand, dynamic allocation of resources,
abstraction of resource,self-managed,billed for what you use,standard
interfaces) and Xaas
-IaaS:user gets access to virtualised hardware, manage OS,
middleware, runtime, data EX: amazon EC2
-PaaS:integrated development environment(app
design,testing,deployment,hosting), develop apps on top, responsible
for managing data EX Google App Engine
-SaaS:top layer consumed by end user, app software provided EX
gmail, google docs, facebook
-HuaaS:extraction of info from crowds of ppl, arbitrary EX youtube vids
Networking
-many diff agreements(protocols)needed at various levels for two OS to
communicate.
-Internet is network of networks connecting millions of devices(hosts/end systems,links-fiber to satellite,routers and switches).Collection of
protocols providing communication services to distributed applications.
-Can be defined recursively as: two or more nodes connected by a link
OR two or more networks connected by a node.
Internet Protocol Stack
-application: protocols designed to meet
communication requirements of specific
applications,defines interface to a
service(FTP,HTTP)
-transport: process-to-process data
transfer (TCP,UDP)
-network: routing of datagrams from
source to destination(IP,OSPF,BGP)
-link: data transfer between neighboring
elements (PPP,Ethernet)
-physical: transmission of bits on a link (electronic signals on cable,
light signals on fibre)
**ISO/OSI model same as above, but presentation and session are
in between app & transport**
-presentation: allow applications to interpret meaning of data
(encrypt,compression)
-session: synchronization, check pointing, recovery of data exchange
-Why layering?allows for identification,relationship of complex
system’s pieces. Each layer… gets service from one below->performs
specific task->provides service to one above. Modularization eases
maintenance and updating.
-IP address:32 bit unique identifier for host, router interface. Interface
connection between host/router and physical link (routers usually have
mult interfaces,host has 1, IP associated w/ each interface)
-IP Header: version(IPv4), header length(usually 5, counted in 32-bit words), size(bytes,
header+data),flags(3 bits),time to live(# of hops/links packet is routed,
decremented by router), Protocol(type of transport;
1=ICMP,6=TCP,17=UDP),
Header Checksum(updates when packet header is modified by a
node),source addr/destination addr
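The header fields listed above can be pulled out of raw bytes with a fixed-layout unpack. This is a minimal sketch, assuming the standard 20-byte IPv4 header with no options; the sample packet is made up for illustration and its checksum is left as 0 rather than computed.

```python
import socket
import struct

# Protocol numbers listed in the notes above.
PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP"}

def parse_ipv4_header(raw: bytes) -> dict:
    """Unpack the fixed 20-byte part of an IPv4 header (options ignored)."""
    (ver_ihl, _tos, total_len, _ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,             # high 4 bits of first byte
        "header_len_words": ver_ihl & 0x0F,  # low 4 bits, in 32-bit words
        "total_length": total_len,           # bytes, header + data
        "flags": flags_frag >> 13,           # top 3 bits of this 16-bit field
        "ttl": ttl,                          # decremented by each router
        "protocol": PROTOCOLS.get(proto, proto),
        "checksum": checksum,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }

# Made-up sample: IPv4, 5-word header, 40 bytes total, DF flag, TTL 64, TCP.
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 1, 0x4000, 64, 6, 0,
                     socket.inet_aton("10.0.0.1"), socket.inet_aton("10.0.0.2"))
header = parse_ipv4_header(sample)
```
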
-Datagram:every datagram contains destination’s address. If
connected to destination network, forward to the host in LAN(if network
# of dest == my net #). if not directly connected,forward to hosts default
router. Each router maintains a forwarding table(maps network number
rather than host addr into next hop or interface number -if directly
connected-)
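The forwarding decision above can be sketched as follows. The function name, router names and network numbers are hypothetical, and picking the longest matching prefix when several table entries match is an assumption that goes beyond these notes.

```python
import ipaddress

def forward(dst, my_network, forwarding_table, default_router):
    """Return where a datagram for `dst` should be sent next."""
    dst_ip = ipaddress.ip_address(dst)
    # Network number of destination == my network number: deliver on the LAN.
    if dst_ip in ipaddress.ip_network(my_network):
        return "direct"
    # Otherwise look up the destination's network (not the host) in the table.
    matches = []
    for net, hop in forwarding_table.items():
        network = ipaddress.ip_network(net)
        if dst_ip in network:
            matches.append((network.prefixlen, hop))
    if matches:
        # Assumption: prefer the most specific (longest) matching prefix.
        return max(matches, key=lambda m: m[0])[1]
    return default_router

# Hypothetical table: network number -> next hop.
table = {"10.0.0.0/8": "router-A", "10.1.0.0/16": "router-B"}
```

For example, `forward("10.1.2.3", "192.168.1.0/24", table, "192.168.1.1")` picks `router-B`, while an address matching no entry falls through to the default router.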
-Address:unique byte-string that identifies a node
-Routing:process of forwarding messages to the destination node
based on its address
Types… Unicast(node-specific);Broadcast(all nodes on
network);Multicast(some subset of nodes)
-Address Translation in LAN: maps IP addresses into physical
addresses of the destination host or the next hop router
-Address Resolution Protocol(ARP):host caches table of IP to
physical address bindings(table entries discarded if not refreshed)->broadcast request if IP not in table->target machine sends physical
address to sender & adds an entry for the source to its own table.
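The ARP cache behaviour above (cached bindings, expiry, broadcast on a miss) can be modelled in a few lines. This is a simulation sketch only, not real ARP traffic: the class name, the 60-second lifetime and the `broadcast_request` callback are assumptions made for illustration.

```python
import time

class ArpCache:
    """IP -> physical address bindings that expire if not refreshed."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # ip -> (mac, time the binding was stored)

    def learn(self, ip, mac):
        # Storing again refreshes the entry's timestamp.
        self.entries[ip] = (mac, self.clock())

    def lookup(self, ip):
        entry = self.entries.get(ip)
        if entry is None:
            return None
        mac, stored = entry
        if self.clock() - stored > self.ttl:
            del self.entries[ip]  # stale entry is discarded
            return None
        return mac

def resolve(cache, ip, broadcast_request):
    """Use the cache first; on a miss, broadcast and cache the reply."""
    mac = cache.lookup(ip)
    if mac is None:
        mac = broadcast_request(ip)  # hypothetical: asks every host on the LAN
        if mac is not None:
            cache.learn(ip, mac)
    return mac
```
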
Transport Layer
-End to End Protocols: underlying best-effort network(drop&reorder
messages,deliver duplicate copies,limit message size,deliver messages
after an arbitrarily long delay). Common end-to-end services(guarantee delivery,deliver in
same order and at most one copy, supports large messages, supports
synchronization, allows receiver to flow control,supports mult
application processes)
-Transport Layer: provides logical communication b/t app processes
on different hosts. Runs in end systems: sender side -> breaks app
messages into segments, passes to network layer. Receiver side ->
reassembles segments into messages, passes to app layer. *more than
1 protocol available to apps, Internet: TCP & UDP*
-Transmission Control Protocol(TCP): connection oriented, reliable
transport, flow control. Does not provide timing, minimum throughput
guarantees, or security.
Segment Format: 4-tuple(SrcPort, SrcIPAddr, DstPort,DstIPAddr),
sliding window + flow control(acknowledgement, SequenceNum,
AdvertisedWindow), flags(SYN, FIN, RESET,PUSH, URG,ACK),
checksum. Client -> connection initiator Server -> contacted by client
Three way Handshake: 1. Client host sends TCP SYN segment to
server, 2. Server receives SYN, replies with SYNACK segment, 3.
Client receives SYNACK, replies with ACK segment,may contain data.
Closing a connection:clientSocket.close(); 1. client end system sends
TCP FIN control segment to server 2. server receives FIN, replies with
ACK. Closes connection, sends FIN. 3. client receives FIN, replies with
ACK. 4. server, receives ACK. Connection closed.
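The three-way handshake can be traced with sequence and acknowledgement numbers. The notes do not give the numbering; this sketch uses the standard TCP rule that a SYN consumes one sequence number, so each side acknowledges the other's initial sequence number plus one. The function name and the sample numbers are illustrative.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits and wrap around

def three_way_handshake(client_isn, server_isn):
    """Return the three segments exchanged when a connection is opened."""
    return [
        ("client->server", "SYN", {"seq": client_isn}),
        ("server->client", "SYNACK",
         {"seq": server_isn, "ack": (client_isn + 1) % SEQ_SPACE}),
        ("client->server", "ACK",
         {"seq": (client_isn + 1) % SEQ_SPACE,
          "ack": (server_isn + 1) % SEQ_SPACE}),
    ]

# Made-up initial sequence numbers.
trace = three_way_handshake(client_isn=1000, server_isn=5000)
```
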
-User Datagram Protocol(UDP): unreliable data transfer, does not
provide connection setup, reliability, flow control, congestion control,
timing, throughput guarantee or security.
Simple Demultiplexor(UDP)unreliable and unordered datagram service.
No flow control or error control, endpoints identified by ports, header
format, optional checksum.(pseudoheader + UDP header + data)
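The checksum over pseudoheader + UDP header + data can be computed as a 16-bit one's complement sum. This is a sketch for UDP over IPv4, assuming the usual pseudoheader layout (source IP, destination IP, a zero byte, protocol 17, UDP length); the function names, addresses and ports are made up.

```python
import socket
import struct

def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum, padding odd-length data with a zero byte."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total

def _checksummed_bytes(src_ip, dst_ip, src_port, dst_port, payload, checksum):
    length = 8 + len(payload)  # UDP header is 8 bytes
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, 17, length))
    header = struct.pack("!HHHH", src_port, dst_port, length, checksum)
    return pseudo + header + payload

def udp_checksum(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> int:
    data = _checksummed_bytes(src_ip, dst_ip, src_port, dst_port, payload, 0)
    checksum = ~ones_complement_sum(data) & 0xFFFF
    # A transmitted 0 means "no checksum" (it is optional), so 0 is sent as 0xFFFF.
    return checksum or 0xFFFF

def udp_checksum_ok(src_ip, dst_ip, src_port, dst_port, payload, checksum):
    """Receiver side: summing everything, checksum included, gives all ones."""
    data = _checksummed_bytes(src_ip, dst_ip, src_port, dst_port, payload, checksum)
    return ones_complement_sum(data) == 0xFFFF
```
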
Interprocess Communication
-Characteristics:messages sent to internet addr & local port pairs, port
has one receiver but many senders, processes may use multiple ports.
VALIDITY,INTEGRITY,ORDERING
-Sockets:in between app and transport layers. a door between
application process and end-to-end-transport protocol (UDP or
TCP).Application and middleware layers use services provided by the
network and transport layers thru the socket API( interface, gate, door
between a process and transport layer). A socket must be bound to a
local port
Programming: Client must contact server(server process must first be
running, server must have created socket (door) that welcomes client’s
contact), Client contacts server by(creating client-local TCP socket,
specifying IP address, port number of server process, When client
creates socket: client TCP establishes connection to server TCP),
When contacted by client, server TCP creates new socket for server
process to communicate with client(allows server to talk with multiple
clients & source port numbers used to distinguish clients)
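The client/server socket steps above can be sketched with Python's socket module on the loopback interface. The function names are illustrative, the server handles a single client and a single short message, and upper-casing the reply is just a way to show that the data made the round trip.

```python
import socket
import threading

def serve_one_client(welcoming):
    """Accept one client on the welcoming socket and answer one message."""
    conn, _addr = welcoming.accept()  # new socket dedicated to this client
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())

def echo_once(message: bytes) -> bytes:
    # Server side: the welcoming socket must exist before the client connects.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as welcoming:
        welcoming.bind(("127.0.0.1", 0))  # bound to a local port; 0 = OS picks one
        welcoming.listen()
        host, port = welcoming.getsockname()
        server = threading.Thread(target=serve_one_client, args=(welcoming,))
        server.start()
        # Client side: creating the connection triggers the TCP handshake.
        with socket.create_connection((host, port), timeout=5) as client:
            client.sendall(message)
            reply = client.recv(1024)
        server.join()
    return reply
```

Calling `echo_once(b"hello")` should return the same bytes upper-cased. A server that talks to several clients would loop on `accept()` and hand each returned socket to its own thread.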
-Process: program running within a host. within same host, two
processes communicate using inter-process communication. processes
in different hosts communicate by exchanging messages using
transport layer. Client process: process that initiates communication.
Server process: process that waits to be contacted
-Addressing Processes: to receive messages, process must have
identifier. host device has unique 32- bit IP address. identifier includes
both IP address and port number associated with the process
-Broadcast: sends a single message from one process to all
processes or hosts(Used for ARP in a LAN, Hard and expensive in
WAN)
-Multicast: sends a single message from one process to members of a
group of processes (hosts). USES: Fault tolerance based on replicated
services, Discovery in spontaneous networking, Performance from
replicated data, Propagation of event notifications in a distributed
environment.
IP Multicast: multicast address ->identify a group
Internet Group Management Protocol.(Processes register a group with
local router using IGMP). Router updates its multicast routing table->
Processes send message to a group-> Router forward multicast
messages.
Multicast Routing Problem: Goal: find a tree (or trees) connecting
routers having local multicast group members. tree: not all paths
between routers used; source-based: different tree from each sender to
receivers; shared-tree: same tree used by all group members
Indirect Communication
-Indirect communication: communication through intermediary with
no direct coupling between the sender and the receiver.(Space
uncoupling: the sender does not know or need to know the identity of
the receiver; Time uncoupling: the sender and receiver can have
independent lifetimes).It's often used in distributed systems. Main
disadvantage is performance overhead introduced by added level of
indirection & more difficult to manage
-Group communication: offers a service where a message is sent to a
group and then this message is delivered to all members of the group,
Sender is not aware of the identities of the receivers. Represents
abstraction over multicast communication adding significant extra value
in terms of managing group membership, detecting failures and
providing reliability and ordering guarantees.
Application: Reliable dissemination of information, Support for
collaborative applications, a range of fault-tolerance strategies and
system monitoring and management.
-The programming model:a group with associated group membership
where processes may join or leave the group. Processes can send a
message to this group and have it propagated to all members of the
group with guarantees in terms of reliability and ordering. The essential
feature is that a process issues only one multicast operation to send a
message to each of a group of processes.
Issues: Reliability and ordering in multicast(Integrity, validity, and
agreement). Group communication services offer ordered
multicast(FIFO, causal ordering, total ordering)
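FIFO ordering means messages from the same sender are delivered in the order that sender sent them. One common way to get this, sketched here as an assumption rather than something these notes specify, is a per-sender sequence number plus a hold-back queue at each receiver. Class and field names are hypothetical.

```python
class FifoReceiver:
    """Delivers each sender's messages in send order, holding back early ones."""

    def __init__(self):
        self.next_seq = {}   # sender -> next sequence number expected
        self.held = {}       # (sender, seq) -> message waiting for earlier ones
        self.delivered = []  # (sender, message) in delivery order

    def receive(self, sender, seq, message):
        expected = self.next_seq.get(sender, 0)
        if seq < expected:
            return  # duplicate of something already delivered: ignore
        self.held[(sender, seq)] = message
        # Deliver as many consecutive messages from this sender as possible.
        while (sender, expected) in self.held:
            self.delivered.append((sender, self.held.pop((sender, expected))))
            expected += 1
        self.next_seq[sender] = expected
```

This only orders messages per sender. Causal and total ordering need more machinery (for example vector timestamps or a sequencer), which this sketch does not attempt.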
-Group Membership Management: Provides an interface for group
membership changes, failure detection, notifying members of changes,
& performing group address expansion. Issues: most effective in small-scale and static systems and does not operate as well in larger-scale
environments or environments with a high degree of volatility
-Publish-subscribe systems: distributed event-based systems.
Publisher publishes structured events to an event service->Subscribers
express interest through subscriptions which can be arbitrarily
patterned over the structured events. Ensures that events are delivered
efficiently to all subscribers that have filters defined that match the
event. Subscription filter model(channel, topic, content and type
based)EX: financial info systems
Issues: Centralized versus distributed implementations.
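A centralized event service with subscription filters can be sketched as below. Filters here are plain predicate functions, which covers both topic-based and content-based matching; the class name, event fields and stock symbols are made up for illustration.

```python
class EventService:
    """Centralized broker: matches published events against subscriber filters."""

    def __init__(self):
        self.subscriptions = []  # list of (filter function, callback)

    def subscribe(self, event_filter, callback):
        self.subscriptions.append((event_filter, callback))

    def publish(self, event: dict):
        # The publisher never learns who receives the event (space uncoupling).
        for event_filter, callback in self.subscriptions:
            if event_filter(event):
                callback(event)

service = EventService()
all_trades, big_moves = [], []

# Topic-based subscription: everything published under the "trades" topic.
service.subscribe(lambda e: e["topic"] == "trades", all_trades.append)
# Content-based subscription: only events whose price exceeds 100.
service.subscribe(lambda e: e.get("price", 0) > 100, big_moves.append)

service.publish({"topic": "trades", "symbol": "ABC", "price": 150})
service.publish({"topic": "trades", "symbol": "XYZ", "price": 20})
service.publish({"topic": "news", "headline": "markets open"})
```
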
-Message Queues: point-to-point service using the concept of a
message queue as an indirection. EX: Enterprise App Integration.
extensively used as the basis for commercial transaction processing
systems.
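A message queue as point-to-point indirection can be sketched with the standard library's `queue.Queue`. The wrapper class and the message strings are hypothetical. The point shown is that each message is consumed by exactly one receiver, and that the sender can finish before any receiver asks for messages.

```python
import queue

class MessageQueue:
    """Point-to-point queue: each message goes to exactly one receiver."""

    def __init__(self):
        self._q = queue.Queue()  # FIFO, thread-safe

    def send(self, message):
        self._q.put(message)

    def receive(self):
        """Non-blocking receive: returns None when the queue is empty."""
        try:
            return self._q.get_nowait()
        except queue.Empty:
            return None

orders = MessageQueue()
# The sender runs first and is done before any receiver polls (time uncoupling).
for order in ["order-1", "order-2", "order-3"]:
    orders.send(order)

# Two receivers take turns; no message is delivered twice.
worker_a = [orders.receive(), orders.receive()]
worker_b = [orders.receive(), orders.receive()]
```
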
-Shared memory approaches: Distributed shared memory (DSM) is an
abstraction used for sharing data between computers that do not share
physical memory. Processes access DSM by reads and updates to
ordinary memory within their address space. It is as though the
processes access a single shared memory, but in fact the physical
memory is distributed. primarily a tool for parallel applications or for any
distributed application or group of applications in which individual
shared data items can be accessed directly. generally less appropriate in
client-server systems.
-Tuple Space Communication: processes communicate indirectly by
placing tuples in a tuple space, from which other processes can read or
remove them. Tuples consist of a sequence of one or more typed data
fields such as <"fred", 1958>, <"sid", 1964> and <4, 9.8, "Yes">.
OPERATIONS: write, read, take. Tuples are immutable.
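The three operations can be sketched with an in-memory tuple space. Two assumptions are made for illustration: `None` in a pattern acts as a wildcard, and `read`/`take` return `None` instead of blocking when nothing matches (real tuple spaces usually block).

```python
class TupleSpace:
    def __init__(self):
        self._tuples = []

    def write(self, tup):
        self._tuples.append(tuple(tup))

    def _find(self, pattern):
        for tup in self._tuples:
            if len(tup) == len(pattern) and all(
                    p is None or p == field for p, field in zip(pattern, tup)):
                return tup
        return None

    def read(self, pattern):
        """Return a matching tuple and leave it in the space."""
        return self._find(pattern)

    def take(self, pattern):
        """Return a matching tuple and remove it from the space."""
        tup = self._find(pattern)
        if tup is not None:
            self._tuples.remove(tup)
        return tup

space = TupleSpace()
space.write(("fred", 1958))
space.write(("sid", 1964))

# Tuples are immutable: to "update" one, take it and write a replacement.
old = space.take(("sid", None))
space.write(("sid", old[1] + 1))
```
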