HIGH PERFORMANCE COMPUTING: MODELS, METHODS, & MEANS
COMMUNICATING SEQUENTIAL PROCESSES

Prof. Thomas Sterling
Center for Computation and Technology & Department of Computer Science
Louisiana State University
February 3, 2011
CSC 7600 Lecture 6 : CSP, Spring 2011

Topics
• Introduction
• Towards a Scalable Execution Model
• Communicating Sequential Processes
• CSP – Heat Distribution Example
• Performance Issues
• Distributed Programming with Unix
• Summary – Material for the Test

Opening Remarks
• This week is about scalable application execution
  – Shared memory systems are not scalable
  – Job stream parallelism does not accelerate a single application
• A path to harnessing distributed memory computers
  – The dominant form of HPC systems
  – Commodity clusters & distributed memory MPPs
• Discusses the second paradigm for parallel programming: cooperative computing
  – Throughput computing (Segment 1, capacity computing)
  – Multithreaded shared memory (Segment 3, capability computing)
• Dominant strategy
  – The arena of technical computing
• Embodiment of cooperative computing
  – Single application
  – Weak scaling

Driving Forces
• Technology
  – VLSI
    • "Killer micros"
    • High density DRAM
  – Emerging networks
• Architecture
  – Distributed memory MPPs
  – Beowulf systems (commodity clusters)
• Weak scaling
  – Need for larger problems
  – Data parallelism

Scalability
• Strong scaling limits sustained performance
  – A fixed-size problem is run on increasing computing resources to achieve reduced execution time
  – Amdahl's law: the sequential component limits speedup (a worked form follows below)
  – Overhead imposes limits on granularity, and therefore on parallelism and speedup
• Weak scaling allows computation size to grow with data set size
  – Larger data sets create more concurrent processes
  – Concurrent processes remain at approximately the same granularity
  – Performance increases with problem set size
• Big systems are big memories for big applications
  – Aggregate the memories of many processing nodes
  – Allow problems far larger than a single processor could manage

[Figure: strong scaling holds the total problem size constant as machine scale (# of nodes) grows, so granularity (size/node) shrinks; weak scaling holds granularity constant, so total problem size grows with machine scale.]
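For reference, a compact statement of Amdahl's law (a standard form added here for illustration; f denotes the parallelizable fraction of the work and N the number of processors — symbols chosen here, not taken from the slides):

\[
  S(N) \;=\; \frac{1}{(1 - f) + \dfrac{f}{N}},
  \qquad
  \lim_{N \to \infty} S(N) \;=\; \frac{1}{1 - f}
\]

For example, with f = 0.9 (90% of the work parallelizable), speedup is capped at 10x no matter how many nodes are added — exactly the strong-scaling limit described above.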
Strong Scaling, Weak Scaling
• Capacity
  – Primary scaling: increase in throughput proportional to the increase in resources applied
  – Decoupled concurrent tasks, increasing in number of instances – scaling proportional to the machine
• Cooperative
  – Single job, with different nodes working on different partitions of the same job
  – Job size scales proportionally to the machine
  – Granularity per node is fixed
• Capability
  – Primary scaling: decrease in response time proportional to the increase in resources applied
  – Single job of constant size – scaling proportional to machine size
• In summary: capacity computing scales throughput across decoupled jobs; cooperative computing runs a single job with weak scaling; capability computing runs a single job with strong scaling.

[Figure: work per task vs. machine scale (1, 2, 4, 8 nodes) – constant under weak scaling, decreasing under strong scaling.]

Impact of VLSI
• Mass-produced microprocessors enabled low-cost computing
  – PCs and workstations
• Economy of scale
  – Ensembles of multiple processors
  – The microprocessor becomes the building block of parallel computers
• Favors sequential process-oriented computing
  – Natural hardware-supported execution model
  – Requires locality management of data and control
  – I/O channels (south bridge) provide the external interface
    • Coarse-grained communication packets
• Suggests concurrent execution at the process boundary level
  – Processes statically assigned to processors (one to one), operating on local data
  – Coordination by large value-oriented I/O messages
    • Inter-process/processor synchronization and remote data exchange

"Cooperative" Computing
• Between capacity and capability computing
  – Not a widely used term, but an important distinction with respect to the others
  – Synonymous with "coordinated" computing
• Single application
  – Partitioning of data into quasi-independent blocks
  – Semi-independent processes operate on separate data blocks
  – Limited communication of messages
    • Coordinate through remote synchronization
    • Cooperate through the exchange of some data
• Scaling
  – Primarily weak scaling; limited strong scaling
• Programming
  – Favors the SPMD (Single Program stream, Multiple Data stream) style
  – Static scheduling, mostly by hand
  – Load balancing by hand
  – Coarse grain in process, data, and communication

Data Decomposition
• Partitioning the global data into major contiguous blocks (a sketch of a 1-D block partition follows below)
• Exploits spatial locality: the use of a data element heightens the likelihood that nearby data will be used as well (reducing latencies associated with cache misses followed by accesses to main memory)
• Exploits temporal locality: the use of a data element heightens the likelihood that the same data will be reused in the near future
• Varies in form
  – Dimensionality
  – Granularity (size)
  – Shape of partitions
• Static mapping of partitions onto processor nodes

Distributed Concurrent Processes
• Each data block can be processed at the same time
  – Parallelism is determined by the number of processes
  – More blocks with smaller partitions permit more processes – but at a cost (see the performance discussion later)
• Processes run on separate processors on local data
  – Usually one application process per processor
  – Usually SPMD, i.e., processes are equivalent but separate (same code, different environments)
• The inner data elements of each partition block are processed independently by each process
  – Provides coarse-grain parallelism
  – An outer loop iterates over successive application steps over the same local data
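As a concrete illustration of static block decomposition, a minimal sketch in C (names and sizes are made up for illustration; in a real SPMD run, rank and nprocs would come from the runtime) of how each process might compute the index range of its own contiguous block of a global array of n elements:

#include <stdio.h>

/* Hypothetical helper: given this process's rank, the total number of
 * processes, and the global problem size, compute the half-open local
 * index range [lo, hi). Remainder elements go to the low-rank processes,
 * so block sizes differ by at most one. */
static void block_range(int rank, int nprocs, int n, int *lo, int *hi)
{
    int base = n / nprocs;      /* minimum elements per process */
    int rem  = n % nprocs;      /* leftover elements to spread  */
    *lo = rank * base + (rank < rem ? rank : rem);
    *hi = *lo + base + (rank < rem ? 1 : 0);
}

int main(void)
{
    int n = 1000;               /* global problem size (example) */
    int nprocs = 4;             /* machine scale (example)       */
    for (int rank = 0; rank < nprocs; rank++) {
        int lo, hi;
        block_range(rank, nprocs, n, &lo, &hi);
        printf("process %d owns [%d, %d)\n", rank, lo, hi);
    }
    return 0;
}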
Data Exchange
• In shared memory there is no problem: all the data is there
• For distributed memory systems, data needs to be exchanged between separate nodes and processes
• Ghost cells hold local copies of the edges of remote partition data owned by remote processors (a sketch of such an exchange appears at the end of this section)
• Communication packets are medium to coarse grain and point-to-point for most data transfers
  – e.g., all edge cells of one data partition may be sent to the corresponding ghost cells of the neighboring processor in a single message
• Multicast or broadcast may be required for some application algorithms and data partitions
  – e.g., matrix-vector multiply

Synchronization
• Global barriers
  – Coarse-grain (in time) control of outer-loop steps
  – Usually used to coordinate the transition from a computation phase to a communication phase
• Send/receive
  – Medium-grain (in time) control of inner-loop data exchanges
  – Blocking on a send and receive:
    • Computation at the sender proceeds when the data has been received
    • Computation at the receiver proceeds when the incoming data is available
  – Non-blocking versions of each exist but can lead to race conditions

Making Parallelism Fit
• Different kinds of parallelism work best on certain kinds of architectures
• Need to satisfy two contending requirements:
  – Spread work out among as many parallel elements as possible
  – Minimize inefficiencies due to overhead and latency

Communicating Sequential Processes
• A model of parallel computing
  – Developed in the 1970s
  – Often attributed to Tony Hoare
  – Satisfies the criteria for cooperative computing
    • Many would claim it as a means of capability computing
• Process oriented
• Emphasizes data locality
• Message-passing semantics
• Synchronization using barriers, among others
• Distributed reduction operators added for purposes of optimization

Communicating Sequential Processes Model
• Another form of parallelism: coarse grained
  – Large pieces of sequential code that run at the same time
• Good for clusters and distributed memory MPPs
• Shares data by message passing
  – Often referred to as the "message-passing model"
• Synchronizes by global barriers
• The most widely used method for programming these systems; MPI is the dominant API
• Supports the SPMD (Single Program, Multiple Data) strategy

CSP Processes
• The process is the body of state and work, and the module of work distribution
• Processes are static
  – In space: assigned to a single processor
  – In time: exist for the lifetime of the job
• All data is either local to the process or acquired through incident messages
• It is possible to extend the process beyond sequential to encompass multithreaded processes
  – The hybrid model integrates the two models together, in a clumsy programming methodology

Locality of State
• Processes operate on memory within the processor node
• The granularity of a process iteration depends on the amount of process data stored on the processor node
• New data from beyond the local processor node is acquired through message passing, primarily by send/receive semantic constructs

Other Key Functionalities
• Synchronization: barriers, messaging
• Reduction: a mix of local and global
• Load balancing: static, user defined
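To make the ghost-cell idea concrete, a minimal sketch in C, simulated in one address space (in a real CSP program each memcpy below would be a send/receive message between two nodes; names and sizes are illustrative):

#include <string.h>
#include <stdio.h>

#define NLOC 4   /* interior rows owned per process (example) */
#define NCOL 6   /* row width (example)                       */

/* Each process stores NLOC interior rows plus two ghost rows:
 * row 0 mirrors the lower neighbor's top edge, and row NLOC+1
 * mirrors the upper neighbor's bottom edge. */
typedef struct { double u[NLOC + 2][NCOL]; } Partition;

/* Stand-in for the send/receive pair of one communication phase. */
static void exchange_edges(Partition *lower, Partition *upper)
{
    /* lower's top interior row -> upper's bottom ghost row */
    memcpy(upper->u[0],        lower->u[NLOC], sizeof(double) * NCOL);
    /* upper's bottom interior row -> lower's top ghost row */
    memcpy(lower->u[NLOC + 1], upper->u[1],    sizeof(double) * NCOL);
}

int main(void)
{
    static Partition a, b;            /* two neighboring partitions     */
    a.u[NLOC][2] = 42.0;              /* a value on a's top edge        */
    exchange_edges(&a, &b);           /* one communication phase        */
    printf("b's ghost row sees %.1f\n", b.u[0][2]);   /* prints 42.0    */
    return 0;
}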
Message Passing Model (BSP)
[Figure: a bulk-synchronous step – initialize, barrier, local sequential processes run, barrier, exchange data, repeat.]

Global Barrier Synchronization
• Different nodes finish a process at different times
• Data cannot be exchanged until all processes have completed
• Barriers synchronize all concurrent processes running on separate nodes
• How it works:
  – Every process "tells" the barrier when it is done
  – When all processes are done, the barrier "tells" the processes that they can continue
  – The "telling" is done by message passing over the network

[Figure: processes arriving at and released by a barrier.]

Message: Send & Receive
• Nodes communicate with each other by packets through the system area network
  – Reminder: the network comprises, in hardware,
    • NICs (Network Interface Controllers)
    • Links (metal wires or fiber optics)
    • Switches (N x N)
  – and, in software, operating systems and network drivers
• Processes communicate with each other by application-level messages
  – send
  – receive
• Message content
  – Process port
  – Data

[Figure: Process A on Node 1 and Process B on Node 2 exchanging matched send/receive pairs across the network.]

An Example Problem
• Partial Differential Equation (PDE)
  – The heat equation in 2 dimensions, on a discrete-point mesh approximating a unit square
    • The temperature field is approximated by a finite set of discrete points distributed over the computational domain, and the temperature values are calculated at these points
  – Static boundary conditions (the temperature on the boundaries is a predefined function of time)
• Stages of code development
  – Data decomposition
  – Concurrent sequential processes
  – Coordination through synchronization
  – Data exchange

Heat equation: $\frac{\partial u}{\partial t} = k\,\nabla^2 u$; in 2-D: $u_t = k\,(u_{xx} + u_{yy})$

Implementation:
• Jacobi method on a unit square
• Dirichlet boundary condition
• Equal number of intervals along the x and y axes
[Figure: uniprocessor grid with boundary cells and the 5-point stencil, versus a CSP decomposition across 4 processes with ghost cells and boundary updates.]

Stencil Calculation
Each interior point is updated from its four neighbors (north, east, south, west):
$x_C^{t+1} = \dfrac{x_N^{t} + x_E^{t} + x_S^{t} + x_W^{t}}{4.0}$
A sketch of this update loop in C follows below.
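A minimal sketch of the Jacobi stencil update on one process's local block, assuming the ghost rows/columns have already been filled by the exchange phase (array sizes, names, and the tolerance are illustrative):

#include <math.h>
#include <stdio.h>
#include <string.h>

#define N 8   /* interior points per side of this block (example) */

/* One Jacobi sweep: write the 4-neighbor average of `uold` into `unew`,
 * returning the maximum absolute change for the convergence test.
 * Rows/columns 0 and N+1 hold boundary or ghost values. */
static double jacobi_sweep(double uold[N + 2][N + 2], double unew[N + 2][N + 2])
{
    double maxdiff = 0.0;
    for (int i = 1; i <= N; i++) {
        for (int j = 1; j <= N; j++) {
            unew[i][j] = (uold[i - 1][j] + uold[i + 1][j] +
                          uold[i][j - 1] + uold[i][j + 1]) / 4.0;
            double d = fabs(unew[i][j] - uold[i][j]);
            if (d > maxdiff) maxdiff = d;
        }
    }
    return maxdiff;
}

int main(void)
{
    static double a[N + 2][N + 2], b[N + 2][N + 2];
    for (int j = 0; j <= N + 1; j++)
        a[0][j] = b[0][j] = 100.0;    /* hot top edge (Dirichlet)       */
    for (int it = 0; it < 100; it++) {
        double diff = jacobi_sweep(a, b);
        /* in the CSP version: exchange ghost cells + barrier here */
        memcpy(a, b, sizeof(a));
        if (diff < 1e-3) {
            printf("converged after %d sweeps\n", it + 1);
            break;
        }
    }
    return 0;
}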
Heat Distribution: Interactive Session
• We are going to act out a heat distribution problem.
• The take-home messages we want to convey are:
  – How cooperative computing works
  – The notion of mesh/grid partitioning
  – The use of messaging
  – Explicit synchronization

[Figure: a 2-D grid with fixed boundary temperatures (100, 80, 60, 40, 20 along the edges) and interior cells initialized to 0, partitioned into chunks among the processing elements.]

The procedure acted out by each processing element:
• Step 1: Calculate the value of each cell by averaging its 4 neighboring cells.
• Step 2: Calculate the difference between the previous cell values and the new cell values.
• Step 3: After computing the difference for each cell, determine the maximum temperature change across your problem chunk and send this value to the coordinator.
• Step 4: The coordinator waits for all processing elements to send their values and determines the maximum of all the values it receives, e.g., max(50, 10, 10, 25) = 50. If this maximum change is below the convergence threshold (7.0 in this example), all processes STOP; otherwise another iteration begins. A sketch of this convergence check follows below.

[Figure: worked-example grids for the first step – new cell values, per-cell differences, and per-chunk maxima – followed by snapshots of the temperature field at iterations 1–20, 50, and 100 as it relaxes toward steady state.]
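A minimal sketch of the coordinator's max-reduction and convergence test, here simulated in one address space (in a real run, each local_max[i] would arrive as a message from process i; the values and the 7.0 threshold follow the walk-through above):

#include <stdio.h>

#define NPROCS 4
#define THRESHOLD 7.0   /* convergence threshold from the example */

int main(void)
{
    /* Values the coordinator would receive from the workers; here the
     * figures from the walk-through: max(50, 10, 10, 25) = 50. */
    double local_max[NPROCS] = {50.0, 10.0, 10.0, 25.0};

    double global_max = local_max[0];
    for (int i = 1; i < NPROCS; i++)            /* the max-reduction */
        if (local_max[i] > global_max)
            global_max = local_max[i];

    if (global_max < THRESHOLD)
        printf("max change %.1f < %.1f: STOP\n", global_max, THRESHOLD);
    else
        printf("max change %.1f >= %.1f: iterate again\n", global_max, THRESHOLD);
    return 0;
}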
Performance Issues for CSP
• Parallelism speeds things up; data exchange slows things down
• Finer-grain partitioning provides more parallelism, so more processors can be used
  – But it requires more fine-grain messages
    • Overhead becomes more significant per datum (fewer operations per message)
    • Overhead of communication becomes more significant per operation
• Synchronization is another source of overhead
• Computation and communication are not overlapped
• The main sources of overhead:
  – Communication (gather/scatter, data exchange)
    • Latency: network distance, message size
    • Contention: network bandwidth
    • Overhead: network interfaces & protocols used
  – Synchronization (blocking reads/writes, barriers)
  – Load balancing: non-uniform work tasks, starvation, overhead

Parallelism: Operating System Level
• Program (review): a program is a set of instructions, usually stored in memory. During execution, a computer fetches the instruction stored at the memory address indicated by the program counter and executes it.
• Process (review): can be defined as a combination of a program, the memory address space associated with the program, and a program counter.
• A program associated with one process cannot access the memory address space of another program.
• A multithreaded process is one where a single memory address space is associated with multiple program counters.
• In this lecture we limit the discussion to single-threaded processes for the sake of simplicity.
[Figure: program instructions in memory being fetched and executed by the CPU.]
(Adapted from Ch. 7, Beowulf Cluster Computing; Gropp, Lusk, Sterling)

Unix Processes: Overview
• New processes can be created using the fork()/exec() combination (a minimal sketch follows below).
• fork()
  – A medium-weight mechanism that copies the address space and creates a process running the same program. The process that invoked the fork() call is known as the parent process, and the newly created process is called the child process.
  – For the child, the fork() call returns 0, whereas for the parent, fork() returns the child's process ID (PID).
• The child process then invokes the exec() system call.
• exec()
  – Changes the program associated with the process
  – Sets the program counter to the beginning of the new program
  – Reinitializes the address space
[Figure: fork()/exec() process creation; image from http://sc.tamu.edu/]
(Adapted from Ch. 7, Beowulf Cluster Computing; Gropp, Lusk, Sterling)
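A minimal sketch of the fork()/exec() pattern just described (the command run, /bin/date, follows the remote-shell example later in the lecture; error handling is abbreviated):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    pid_t pid = fork();           /* duplicate this process           */
    if (pid < 0) {
        perror("fork");
        exit(1);
    }
    if (pid == 0) {
        /* child: fork() returned 0 -- replace our program image */
        execl("/bin/date", "date", (char *)NULL);
        perror("execl");          /* reached only if exec fails       */
        exit(1);
    }
    /* parent: fork() returned the child's PID -- wait for it */
    int status;
    waitpid(pid, &status, 0);
    printf("child %d finished with status %d\n", (int)pid, WEXITSTATUS(status));
    return 0;
}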
Parallelism Using Unix Utilities
• Usually a shell process waits for a child process to finish execution before prompting you for another command.
• By appending the "&" character to a command, the shell starts the new process but then immediately prompts for another command; this is called running a process in the "background".
• This is the simplest form of the master-worker model, executed using basic Unix utilities:

#!/bin/bash
# Search every directory whose name starts with "20" in parallel:
# each grep runs in a background subshell, wait collects them all,
# and the per-directory results are concatenated at the end.
search_string=$1
echo "searching for $search_string"
for i in 20* ; do
    ( cd "$i"; grep "$search_string" * > "$search_string.out" ) &
done
wait
cat 20*/"$search_string.out" > "$1.all"

(Adapted from Ch. 7, Beowulf Cluster Computing; Gropp, Lusk, Sterling)

Remote Processes
• To create a new process on another machine, the initiator must contact an existing process there and cause it to fork a new process. The contact is usually made over a TCP socket.
• rsh (remote shell):
  – The rsh command contacts the rshd process running on the remote machine and prompts it to execute a script or program.
  – The standard I/O of the remote process is routed through rsh to the local machine's standard I/O.
  – Due to severe security problems (plain-text password handling), utilities like rsh and telnet are strongly discouraged and deprecated on many systems.
  – e.g.: rsh celeritas.cct.lsu.edu /bin/date
• ssh (secure shell):
  – Behaves much like rsh, but the authentication mechanism is based on public-key encryption, and all traffic between the local and remote machines is encrypted.
  – Since rsh does not have the encryption stage, rsh is substantially faster than ssh.
  – e.g.: ssh celeritas.cct.lsu.edu /bin/date

Sockets: Overview
• Socket: a bidirectional communication channel between two processes that is accessed by the processes using the same read and write functions that are used for file I/O.
• Connection process:
  – The initial connection procedure that two remote processes perform to establish a bidirectional communication channel is asymmetric:
    • One process listens for a connection and accepts it;
    • the other process initiates the request for a connection with the remote machine;
    • a bidirectional channel between the two machines is then established.
  – Once the channel is established, communication between the two processes is symmetric.
  – In the client-server model, the process that waits for a connection is known as the server, and the process that connects to it is known as the client.

TCP/IP: Overview
• Common terms:
  – IP: Internet Protocol, for communication between computers; responsible for routing IP packets to their destination.
  – TCP: Transmission Control Protocol, for communication between applications.
  – UDP: User Datagram Protocol, for communication between applications.
  – ICMP: Internet Control Message Protocol, for detecting errors and gathering network statistics.
• TCP:
  – An application that wants to communicate with another application sends a request for a connection.
  – The request is sent to a fully qualified address (more on this soon) and port.
  – After the three-way "handshake" (SYN, SYN-ACK, ACK) between the two, a bidirectional communication channel is established.
  – The communication channel remains alive until it is terminated by one of the applications involved.
• TCP/IP:
  – TCP breaks the data to be communicated between applications into packets, and reassembles the data from the packets when they reach the destination.
  – IP ensures routing of the data packets to their intended receiver.
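The overview above distinguishes TCP (connection-oriented stream) from UDP (datagram). At the sockets API level, the difference shows up in the socket type; a minimal hedged sketch:

#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    /* SOCK_STREAM selects TCP: connected, reliable, in-order bytes. */
    int tcp_fd = socket(AF_INET, SOCK_STREAM, 0);

    /* SOCK_DGRAM selects UDP: connectionless datagrams -- no handshake,
     * and no delivery or ordering guarantees (see the problems slides
     * at the end of this lecture). */
    int udp_fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (tcp_fd < 0 || udp_fd < 0) {
        perror("socket");
        return 1;
    }
    printf("tcp_fd=%d udp_fd=%d\n", tcp_fd, udp_fd);
    close(tcp_fd);
    close(udp_fd);
    return 0;
}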
TCP/IP: Addresses and Names
• Each computer on a network is associated with an IP address consisting of 4 numbers, each holding a value between 0 and 255, e.g., 130.184.6.128.
• Using Domain Name System (DNS) servers, the numeric IP address is mapped to a domain name that is easier to remember; e.g., the domain name corresponding to 130.184.6.128 is prospero.uark.edu.
• Analogy: making a phone call
  – Caller – client; receiver – server; phone number – IP address; extension – port number
  – Client: picking up the receiver – socket(); locating the call recipient (from a phone book / memory) – bind(); dialing the phone number – connect(); talking – read()/write(); hanging up – close()
  – Server: connecting the phone to the phone line – socket(); selecting an incoming line – bind(); ringer ON – listen(); receiving the call – accept(); talking – read()/write(); hanging up – close()

The slides that follow walk through this call sequence:
  Server: create a TCP socket; bind the socket to a port; listen for connections; then loop: accept a connection, communicate, close the connection.
  Client: create a TCP socket; connect to the server; communicate; close the connection.

Server: Create Socket

/* Create data structures to store connection-specific info */
struct sockaddr_in sin, from;
/* The main call that creates a socket */
listen_socket = socket(AF_INET, SOCK_STREAM, 0);

Server: Bind Socket to Port

/* Initialize the address structure */
sin.sin_family = AF_INET;
sin.sin_addr.s_addr = INADDR_ANY;
/* Passing 0 lets the system select the port to bind. This is
   user-configurable; we use a specific port (PORT_NUM). */
sin.sin_port = htons(PORT_NUM);
bind(listen_socket, (struct sockaddr *) &sin, sizeof(sin));

Server: Listen for Connections

listen(listen_socket, 5);
/* 5 is the number of pending connection requests that the kernel
   should queue for the application */
getsockname(listen_socket, (struct sockaddr *) &sin, &len);
printf("listening on port = %d\n", ntohs(sin.sin_port));

Server: accept()

talk_socket = accept(listen_socket, (struct sockaddr *) &from, &len);
/* accept() is a blocking system call that waits for a connection from a
   client and then returns a new socket (talk_socket) through which the
   server is connected to that client, so that it can continue listening
   for more connections on the original socket (listen_socket) */
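The server-side fragments above can be assembled into a complete echo server; a minimal sketch, with PORT_NUM made up for illustration and error handling abbreviated:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define PORT_NUM 5050              /* illustrative port number */

int main(void)
{
    struct sockaddr_in sin, from;
    socklen_t len;
    char buf[1024];

    int listen_socket = socket(AF_INET, SOCK_STREAM, 0);

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = INADDR_ANY;
    sin.sin_port = htons(PORT_NUM);
    bind(listen_socket, (struct sockaddr *)&sin, sizeof(sin));

    listen(listen_socket, 5);
    printf("listening on port %d\n", PORT_NUM);

    for (;;) {                      /* server loop: accept, echo, close */
        len = sizeof(from);
        int talk_socket = accept(listen_socket, (struct sockaddr *)&from, &len);
        ssize_t n = read(talk_socket, buf, sizeof(buf));
        if (n > 0)
            write(talk_socket, buf, n);   /* simple echo */
        close(talk_socket);
    }
}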
Client: Create Socket

/* Create data structures to store connection-specific info */
struct sockaddr_in sin;
struct hostent *hp;
/* The main call that creates a socket */
talk_socket = socket(AF_INET, SOCK_STREAM, 0);

Client: Bind and Connect to Server

/* Initialize the address structure from the server's host name */
hp = gethostbyname(HOST_NAME);
bzero((void *) &sin, sizeof(sin));
bcopy((void *) hp->h_addr, (void *) &sin.sin_addr, hp->h_length);
sin.sin_family = hp->h_addrtype;
sin.sin_port = htons(atoi(PORT_NUM));
/* Connect to the server */
connect(talk_socket, (struct sockaddr *) &sin, sizeof(sin));

Client: Send a Message with write()

/* The client initiates communication with the server using a write() call */
n = write(talk_socket, buf, strlen(buf) + 1);
if (n < 0)
    error("ERROR writing to socket");
bzero(buf, 256);

Server: Receive/Send with read()/write()

n = read(talk_socket, buf, 1024);
if (n < 0)
    error("ERROR reading from socket");
else
    write(talk_socket, buf, n);   /* simple echo; content stored in buf */

Client: Receive with read()

/* Receive the message sent back by the server, stored in buf */
n = read(talk_socket, buf, 1024);
if (n < 0)
    error("ERROR reading from socket");
else
    printf("received from server: %s\n", buf);
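The client fragments likewise assemble into a complete program; a minimal sketch matching the echo server above (HOST_NAME and PORT_NUM are illustrative; gethostbyname() is kept from the slides even though modern code would use getaddrinfo()):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>

#define HOST_NAME "localhost"      /* illustrative server host */
#define PORT_NUM  5050             /* must match the server    */

int main(void)
{
    struct sockaddr_in sin;
    struct hostent *hp;
    char buf[256] = "hello, server";

    int talk_socket = socket(AF_INET, SOCK_STREAM, 0);

    hp = gethostbyname(HOST_NAME);
    if (hp == NULL) {
        fprintf(stderr, "unknown host\n");
        return 1;
    }
    memset(&sin, 0, sizeof(sin));
    memcpy(&sin.sin_addr, hp->h_addr_list[0], hp->h_length);
    sin.sin_family = hp->h_addrtype;
    sin.sin_port = htons(PORT_NUM);

    if (connect(talk_socket, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
        perror("connect");
        return 1;
    }
    write(talk_socket, buf, strlen(buf) + 1);        /* send             */
    ssize_t n = read(talk_socket, buf, sizeof(buf)); /* receive the echo */
    if (n > 0)
        printf("received from server: %s\n", buf);
    close(talk_socket);
    return 0;
}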
Close Socket

close(talk_socket);
/* Server side: ends the socket connection corresponding to one particular
   client. Control goes back to the loop, and the server continues to wait
   for more client connections on listen_socket. */

close(talk_socket);
/* Client side: ends the client's socket connection. */

Demo: Socket Example

Socket Programming: Problems
• Limited portability (not all interconnect interfaces support sockets)
• Limited scalability (number of ports available on a node)
• Tedious and error-prone hand-coding (unless somebody did it before)
• Tricky startup process (assumed port availability is not guaranteed)
• Only point-to-point communication is supported explicitly; no implementation of collective communication patterns
• Frequently used communication topologies are not available (e.g., Cartesian mesh) and have to be coded from scratch
• Direct support only for data organized in contiguous buffers, forcing one to write one's own buffer packing/unpacking routines
• Sockets suffer from the overhead of the protocol stack (TCP), or require designing algorithms to manage reliable, in-order, duplicate-free arrival of complete messages (e.g., datagram-oriented)
• Basic data transfer calls (read/write) do not guarantee returning or sending the full requested number of bytes, requiring the use of wrappers (and possibly resulting in multiple kernel calls per message); a sketch of such a wrapper follows at the end
• Complicated writing of applications with changing/unpredictable communications (it is only easy when reads are matched to writes and you know when both of them occur)
• On some operating systems, sockets may linger long after an application exits, preventing new startups using the same configuration
• If used, asynchronous management of socket calls adds another layer of complexity (either through select() or multiple threads)

Summary: Material for the Test
• Scalability (strong and weak scaling)
• Cooperative computing
• Communicating Sequential Processes
• Message passing
• Performance issues of CSP
• Sockets and TCP/IP
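As noted under the socket-programming problems, write() may transfer fewer bytes than requested; a minimal sketch of the conventional wrapper (a hypothetical helper, not part of any standard API):

#include <unistd.h>
#include <errno.h>

/* Keep calling write() until all `len` bytes of `buf` have been sent
   (or an error occurs). Returns 0 on success, -1 on error. A read-side
   read_all() wrapper is symmetric. */
int write_all(int fd, const char *buf, size_t len)
{
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = write(fd, buf + sent, len - sent);
        if (n < 0) {
            if (errno == EINTR)     /* interrupted: retry    */
                continue;
            return -1;              /* real error            */
        }
        sent += (size_t)n;          /* partial write: advance and loop */
    }
    return 0;
}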