HIGH PERFORMANCE COMPUTING: MODELS, METHODS, & MEANS
COMMUNICATING SEQUENTIAL PROCESSES

Prof. Thomas Sterling
Center for Computation and Technology & Department of Computer Science
Louisiana State University
February 3, 2011
CSC 7600 Lecture 6 : CSP, Spring 2011

Topics
• Introduction
• Towards a Scalable Execution Model
• Communicating Sequential Processes
• CSP – Heat Distribution Example
• Performance Issues
• Distributed Programming with Unix
• Summary – Material for the Test

Opening Remarks
• This week is about scalable application execution
  – Shared memory systems are not scalable
  – Job stream parallelism does not accelerate a single application
• A path to harnessing distributed memory computers
  – The dominant form of HPC systems
  – Commodity clusters & distributed memory MPPs
• Discusses the second paradigm for parallel programming: cooperative computing
  – Throughput computing (Segment 1, capacity computing)
  – Multithreaded shared memory (Segment 3, capability computing)
• Dominant strategy
  – The arena of technical computing
• Embodiment of cooperative computing
  – Single application
  – Weak scaling

Driving Forces
• Technology
  – VLSI
    • "Killer micros"
    • High density DRAM
  – Emerging networks
• Architecture
  – Distributed memory MPPs
  – Beowulf systems (commodity clusters)
• Weak scaling
  – Need for larger problems
  – Data parallelism

Scalability
• Strong scaling limits sustained performance
  – A fixed-size problem is run on increasing computing resources to achieve reduced execution time
  – Amdahl's law: the sequential component limits speedup (a worked form follows below)
  – Overhead imposes limits on granularity, and therefore on parallelism and speedup
• Weak scaling allows computation size to grow with data set size
  – Larger data sets create more concurrent processes
  – Concurrent processes remain at approximately the same granularity
  – Performance increases with problem set size
• Big systems are big memories for big applications
  – Aggregate the memories of many processing nodes
  – Allow problems far larger than a single processor could manage

[Figure: strong scaling holds the total problem size constant as machine scale (# of nodes) grows, so granularity (size/node) shrinks; weak scaling holds granularity constant, so total problem size grows with machine scale.]
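For reference, a compact statement of Amdahl's law (a standard form added here for illustration; f denotes the parallelizable fraction of the work and N the number of processors — symbols chosen here, not taken from the slides):

\[
  S(N) \;=\; \frac{1}{(1 - f) + \dfrac{f}{N}},
  \qquad
  \lim_{N \to \infty} S(N) \;=\; \frac{1}{1 - f}
\]

For example, with f = 0.9 (90% of the work parallelizable), speedup is capped at 10x no matter how many nodes are added — exactly the strong-scaling limit described above.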
Strong Scaling, Weak Scaling
• Capacity
  – Primary scaling: increase in throughput proportional to the increase in resources applied
  – Decoupled concurrent tasks, increasing in number of instances – scaling proportional to the machine
• Cooperative
  – Single job, with different nodes working on different partitions of the same job
  – Job size scales proportionally to the machine
  – Granularity per node is fixed
• Capability
  – Primary scaling: decrease in response time proportional to the increase in resources applied
  – Single job of constant size – scaling proportional to machine size
• In summary: capacity computing scales throughput across decoupled jobs; cooperative computing runs a single job with weak scaling; capability computing runs a single job with strong scaling.

[Figure: work per task vs. machine scale (1, 2, 4, 8 nodes) – constant under weak scaling, decreasing under strong scaling.]

Impact of VLSI
• Mass-produced microprocessors enabled low-cost computing
  – PCs and workstations
• Economy of scale
  – Ensembles of multiple processors
  – The microprocessor becomes the building block of parallel computers
• Favors sequential process-oriented computing
  – Natural hardware-supported execution model
  – Requires locality management of data and control
  – I/O channels (south bridge) provide the external interface
    • Coarse-grained communication packets
• Suggests concurrent execution at the process boundary level
  – Processes statically assigned to processors (one to one), operating on local data
  – Coordination by large value-oriented I/O messages
    • Inter-process/processor synchronization and remote data exchange

"Cooperative" Computing
• Between capacity and capability computing
  – Not a widely used term, but an important distinction with respect to the others
  – Synonymous with "coordinated" computing
• Single application
  – Partitioning of data into quasi-independent blocks
  – Semi-independent processes operate on separate data blocks
  – Limited communication of messages
    • Coordinate through remote synchronization
    • Cooperate through the exchange of some data
• Scaling
  – Primarily weak scaling; limited strong scaling
• Programming
  – Favors the SPMD (Single Program stream, Multiple Data stream) style
  – Static scheduling, mostly by hand
  – Load balancing by hand
  – Coarse grain in process, data, and communication

Data Decomposition
• Partitioning the global data into major contiguous blocks (a sketch of a 1-D block partition follows below)
• Exploits spatial locality: the use of a data element heightens the likelihood that nearby data will be used as well (reducing latencies associated with cache misses followed by accesses to main memory)
• Exploits temporal locality: the use of a data element heightens the likelihood that the same data will be reused in the near future
• Varies in form
  – Dimensionality
  – Granularity (size)
  – Shape of partitions
• Static mapping of partitions onto processor nodes

Distributed Concurrent Processes
• Each data block can be processed at the same time
  – Parallelism is determined by the number of processes
  – More blocks with smaller partitions permit more processes – but at a cost (see the performance discussion later)
• Processes run on separate processors on local data
  – Usually one application process per processor
  – Usually SPMD, i.e., processes are equivalent but separate (same code, different environments)
• The inner data elements of each partition block are processed independently by each process
  – Provides coarse-grain parallelism
  – An outer loop iterates over successive application steps over the same local data
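As a concrete illustration of static block decomposition, a minimal sketch in C (names and sizes are made up for illustration; in a real SPMD run, rank and nprocs would come from the runtime) of how each process might compute the index range of its own contiguous block of a global array of n elements:

#include <stdio.h>

/* Hypothetical helper: given this process's rank, the total number of
 * processes, and the global problem size, compute the half-open local
 * index range [lo, hi). Remainder elements go to the low-rank processes,
 * so block sizes differ by at most one. */
static void block_range(int rank, int nprocs, int n, int *lo, int *hi)
{
    int base = n / nprocs;      /* minimum elements per process */
    int rem  = n % nprocs;      /* leftover elements to spread  */
    *lo = rank * base + (rank < rem ? rank : rem);
    *hi = *lo + base + (rank < rem ? 1 : 0);
}

int main(void)
{
    int n = 1000;               /* global problem size (example) */
    int nprocs = 4;             /* machine scale (example)       */
    for (int rank = 0; rank < nprocs; rank++) {
        int lo, hi;
        block_range(rank, nprocs, n, &lo, &hi);
        printf("process %d owns [%d, %d)\n", rank, lo, hi);
    }
    return 0;
}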
Data Exchange
• In shared memory there is no problem: all the data is there
• For distributed memory systems, data needs to be exchanged between separate nodes and processes
• Ghost cells hold local copies of the edges of remote partition data owned by remote processors (a sketch of such an exchange appears at the end of this section)
• Communication packets are medium to coarse grain and point-to-point for most data transfers
  – e.g., all edge cells of one data partition may be sent to the corresponding ghost cells of the neighboring processor in a single message
• Multicast or broadcast may be required for some application algorithms and data partitions
  – e.g., matrix-vector multiply

Synchronization
• Global barriers
  – Coarse-grain (in time) control of outer-loop steps
  – Usually used to coordinate the transition from a computation phase to a communication phase
• Send/receive
  – Medium-grain (in time) control of inner-loop data exchanges
  – Blocking on a send and receive:
    • Computation at the sender proceeds when the data has been received
    • Computation at the receiver proceeds when the incoming data is available
  – Non-blocking versions of each exist but can lead to race conditions

Making Parallelism Fit
• Different kinds of parallelism work best on certain kinds of architectures
• Need to satisfy two contending requirements:
  – Spread work out among as many parallel elements as possible
  – Minimize inefficiencies due to overhead and latency

Communicating Sequential Processes
• A model of parallel computing
  – Developed in the 1970s
  – Often attributed to Tony Hoare
  – Satisfies the criteria for cooperative computing
    • Many would claim it as a means of capability computing
• Process oriented
• Emphasizes data locality
• Message-passing semantics
• Synchronization using barriers, among others
• Distributed reduction operators added for purposes of optimization

Communicating Sequential Processes Model
• Another form of parallelism: coarse grained
  – Large pieces of sequential code that run at the same time
• Good for clusters and distributed memory MPPs
• Shares data by message passing
  – Often referred to as the "message-passing model"
• Synchronizes by global barriers
• The most widely used method for programming these systems; MPI is the dominant API
• Supports the SPMD (Single Program, Multiple Data) strategy

CSP Processes
• The process is the body of state and work, and the module of work distribution
• Processes are static
  – In space: assigned to a single processor
  – In time: exist for the lifetime of the job
• All data is either local to the process or acquired through incident messages
• It is possible to extend the process beyond sequential to encompass multithreaded processes
  – The hybrid model integrates the two models together, in a clumsy programming methodology

Locality of State
• Processes operate on memory within the processor node
• The granularity of a process iteration depends on the amount of process data stored on the processor node
• New data from beyond the local processor node is acquired through message passing, primarily by send/receive semantic constructs

Other Key Functionalities
• Synchronization: barriers, messaging
• Reduction: a mix of local and global
• Load balancing: static, user defined
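To make the ghost-cell idea concrete, a minimal sketch in C, simulated in one address space (in a real CSP program each memcpy below would be a send/receive message between two nodes; names and sizes are illustrative):

#include <string.h>
#include <stdio.h>

#define NLOC 4   /* interior rows owned per process (example) */
#define NCOL 6   /* row width (example)                       */

/* Each process stores NLOC interior rows plus two ghost rows:
 * row 0 mirrors the lower neighbor's top edge, and row NLOC+1
 * mirrors the upper neighbor's bottom edge. */
typedef struct { double u[NLOC + 2][NCOL]; } Partition;

/* Stand-in for the send/receive pair of one communication phase. */
static void exchange_edges(Partition *lower, Partition *upper)
{
    /* lower's top interior row -> upper's bottom ghost row */
    memcpy(upper->u[0],        lower->u[NLOC], sizeof(double) * NCOL);
    /* upper's bottom interior row -> lower's top ghost row */
    memcpy(lower->u[NLOC + 1], upper->u[1],    sizeof(double) * NCOL);
}

int main(void)
{
    static Partition a, b;            /* two neighboring partitions     */
    a.u[NLOC][2] = 42.0;              /* a value on a's top edge        */
    exchange_edges(&a, &b);           /* one communication phase        */
    printf("b's ghost row sees %.1f\n", b.u[0][2]);   /* prints 42.0    */
    return 0;
}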
Message Passing Model (BSP)
[Figure: a bulk-synchronous step – initialize, barrier, local sequential processes run, barrier, exchange data, repeat.]

Global Barrier Synchronization
• Different nodes finish a process at different times
• Data cannot be exchanged until all processes have completed
• Barriers synchronize all concurrent processes running on separate nodes
• How it works:
  – Every process "tells" the barrier when it is done
  – When all processes are done, the barrier "tells" the processes that they can continue
  – The "telling" is done by message passing over the network

[Figure: processes arriving at and released by a barrier.]

Message: Send & Receive
• Nodes communicate with each other by packets through the system area network
  – Reminder: the network comprises, in hardware,
    • NICs (Network Interface Controllers)
    • Links (metal wires or fiber optics)
    • Switches (N x N)
  – and, in software, operating systems and network drivers
• Processes communicate with each other by application-level messages
  – send
  – receive
• Message content
  – Process port
  – Data

[Figure: Process A on Node 1 and Process B on Node 2 exchanging matched send/receive pairs across the network.]

An Example Problem
• Partial Differential Equation (PDE)
  – The heat equation in 2 dimensions, on a discrete-point mesh approximating a unit square
    • The temperature field is approximated by a finite set of discrete points distributed over the computational domain, and the temperature values are calculated at these points
  – Static boundary conditions (the temperature on the boundaries is a predefined function of time)
• Stages of code development
  – Data decomposition
  – Concurrent sequential processes
  – Coordination through synchronization
  – Data exchange

Heat equation: $\frac{\partial u}{\partial t} = k\,\nabla^2 u$; in 2-D: $u_t = k\,(u_{xx} + u_{yy})$

Implementation:
• Jacobi method on a unit square
• Dirichlet boundary condition
• Equal number of intervals along the x and y axes
[Figure: uniprocessor grid with boundary cells and the 5-point stencil, versus a CSP decomposition across 4 processes with ghost cells and boundary updates.]

Stencil Calculation
Each interior point is updated from its four neighbors (north, east, south, west):
$x_C^{t+1} = \dfrac{x_N^{t} + x_E^{t} + x_S^{t} + x_W^{t}}{4.0}$
A sketch of this update loop in C follows below.
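A minimal sketch of the Jacobi stencil update on one process's local block, assuming the ghost rows/columns have already been filled by the exchange phase (array sizes, names, and the tolerance are illustrative):

#include <math.h>
#include <stdio.h>
#include <string.h>

#define N 8   /* interior points per side of this block (example) */

/* One Jacobi sweep: write the 4-neighbor average of `uold` into `unew`,
 * returning the maximum absolute change for the convergence test.
 * Rows/columns 0 and N+1 hold boundary or ghost values. */
static double jacobi_sweep(double uold[N + 2][N + 2], double unew[N + 2][N + 2])
{
    double maxdiff = 0.0;
    for (int i = 1; i <= N; i++) {
        for (int j = 1; j <= N; j++) {
            unew[i][j] = (uold[i - 1][j] + uold[i + 1][j] +
                          uold[i][j - 1] + uold[i][j + 1]) / 4.0;
            double d = fabs(unew[i][j] - uold[i][j]);
            if (d > maxdiff) maxdiff = d;
        }
    }
    return maxdiff;
}

int main(void)
{
    static double a[N + 2][N + 2], b[N + 2][N + 2];
    for (int j = 0; j <= N + 1; j++)
        a[0][j] = b[0][j] = 100.0;    /* hot top edge (Dirichlet)       */
    for (int it = 0; it < 100; it++) {
        double diff = jacobi_sweep(a, b);
        /* in the CSP version: exchange ghost cells + barrier here */
        memcpy(a, b, sizeof(a));
        if (diff < 1e-3) {
            printf("converged after %d sweeps\n", it + 1);
            break;
        }
    }
    return 0;
}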
Heat Distribution: Interactive Session
• We are going to act out a heat distribution problem.
• The take-home messages we want to convey are:
  – How cooperative computing works
  – The notion of mesh/grid partitioning
  – The use of messaging
  – Explicit synchronization

[Figure: a 2-D grid with fixed boundary temperatures (100, 80, 60, 40, 20 along the edges) and interior cells initialized to 0, partitioned into chunks among the processing elements.]

The procedure acted out by each processing element:
• Step 1: Calculate the value of each cell by averaging its 4 neighboring cells.
• Step 2: Calculate the difference between the previous cell values and the new cell values.
• Step 3: After computing the difference for each cell, determine the maximum temperature change across your problem chunk and send this value to the coordinator.
• Step 4: The coordinator waits for all processing elements to send their values and determines the maximum of all the values it receives, e.g., max(50, 10, 10, 25) = 50. If this maximum change is below the convergence threshold (7.0 in this example), all processes STOP; otherwise another iteration begins. A sketch of this convergence check follows below.

[Figure: worked-example grids for the first step – new cell values, per-cell differences, and per-chunk maxima – followed by snapshots of the temperature field at iterations 1–20, 50, and 100 as it relaxes toward steady state.]
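A minimal sketch of the coordinator's max-reduction and convergence test, here simulated in one address space (in a real run, each local_max[i] would arrive as a message from process i; the values and the 7.0 threshold follow the walk-through above):

#include <stdio.h>

#define NPROCS 4
#define THRESHOLD 7.0   /* convergence threshold from the example */

int main(void)
{
    /* Values the coordinator would receive from the workers; here the
     * figures from the walk-through: max(50, 10, 10, 25) = 50. */
    double local_max[NPROCS] = {50.0, 10.0, 10.0, 25.0};

    double global_max = local_max[0];
    for (int i = 1; i < NPROCS; i++)            /* the max-reduction */
        if (local_max[i] > global_max)
            global_max = local_max[i];

    if (global_max < THRESHOLD)
        printf("max change %.1f < %.1f: STOP\n", global_max, THRESHOLD);
    else
        printf("max change %.1f >= %.1f: iterate again\n", global_max, THRESHOLD);
    return 0;
}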
Performance Issues for CSP
• Parallelism speeds things up; data exchange slows things down
• Finer-grain partitioning provides more parallelism, so more processors can be used
  – But it requires more fine-grain messages
    • Overhead becomes more significant per datum (fewer operations per message)
    • Overhead of communication becomes more significant per operation
• Synchronization is another source of overhead
• Computation and communication are not overlapped
• The main sources of overhead:
  – Communication (gather/scatter, data exchange)
    • Latency: network distance, message size
    • Contention: network bandwidth
    • Overhead: network interfaces & protocols used
  – Synchronization (blocking reads/writes, barriers)
  – Load balancing: non-uniform work tasks, starvation, overhead

Parallelism: Operating System Level
• Program (review): a program is a set of instructions, usually stored in memory. During execution, a computer fetches the instruction stored at the memory address indicated by the program counter and executes it.
• Process (review): can be defined as a combination of a program, the memory address space associated with the program, and a program counter.
• A program associated with one process cannot access the memory address space of another program.
• A multithreaded process is one where a single memory address space is associated with multiple program counters.
• In this lecture we limit the discussion to single-threaded processes for the sake of simplicity.
[Figure: program instructions in memory being fetched and executed by the CPU.]
(Adapted from Ch. 7, Beowulf Cluster Computing; Gropp, Lusk, Sterling)

Unix Processes: Overview
• New processes can be created using the fork()/exec() combination (a minimal sketch follows below).
• fork()
  – A medium-weight mechanism that copies the address space and creates a process running the same program. The process that invoked the fork() call is known as the parent process, and the newly created process is called the child process.
  – For the child, the fork() call returns 0, whereas for the parent, fork() returns the child's process ID (PID).
• The child process then invokes the exec() system call.
• exec()
  – Changes the program associated with the process
  – Sets the program counter to the beginning of the new program
  – Reinitializes the address space
[Figure: fork()/exec() process creation; image from http://sc.tamu.edu/]
(Adapted from Ch. 7, Beowulf Cluster Computing; Gropp, Lusk, Sterling)
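A minimal sketch of the fork()/exec() pattern just described (the command run, /bin/date, follows the remote-shell example later in the lecture; error handling is abbreviated):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    pid_t pid = fork();           /* duplicate this process           */
    if (pid < 0) {
        perror("fork");
        exit(1);
    }
    if (pid == 0) {
        /* child: fork() returned 0 -- replace our program image */
        execl("/bin/date", "date", (char *)NULL);
        perror("execl");          /* reached only if exec fails       */
        exit(1);
    }
    /* parent: fork() returned the child's PID -- wait for it */
    int status;
    waitpid(pid, &status, 0);
    printf("child %d finished with status %d\n", (int)pid, WEXITSTATUS(status));
    return 0;
}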
Parallelism Using Unix Utilities
• Usually a shell process waits for a child process to finish execution before prompting you for another command.
• By appending the "&" character to a command, the shell starts the new process but then immediately prompts for another command; this is called running a process in the "background".
• This is the simplest form of the master-worker model, executed using basic Unix utilities:

#!/bin/bash
# Search every directory whose name starts with "20" in parallel:
# each grep runs in a background subshell, wait collects them all,
# and the per-directory results are concatenated at the end.
search_string=$1
echo "searching for $search_string"
for i in 20* ; do
    ( cd "$i"; grep "$search_string" * > "$search_string.out" ) &
done
wait
cat 20*/"$search_string.out" > "$1.all"

(Adapted from Ch. 7, Beowulf Cluster Computing; Gropp, Lusk, Sterling)

Remote Processes
• To create a new process on another machine, the initiator must contact an existing process there and cause it to fork a new process. The contact is usually made over a TCP socket.
• rsh (remote shell):
  – The rsh command contacts the rshd process running on the remote machine and prompts it to execute a script or program.
  – The standard I/O of the remote process is routed through rsh to the local machine's standard I/O.
  – Due to severe security problems (plain-text password handling), utilities like rsh and telnet are strongly discouraged and deprecated on many systems.
  – e.g.: rsh celeritas.cct.lsu.edu /bin/date
• ssh (secure shell):
  – Behaves much like rsh, but the authentication mechanism is based on public-key encryption, and all traffic between the local and remote machines is encrypted.
  – Since rsh does not have the encryption stage, rsh is substantially faster than ssh.
  – e.g.: ssh celeritas.cct.lsu.edu /bin/date

Sockets: Overview
• Socket: a bidirectional communication channel between two processes that is accessed by the processes using the same read and write functions that are used for file I/O.
• Connection process:
  – The initial connection procedure that two remote processes perform to establish a bidirectional communication channel is asymmetric:
    • One process listens for a connection and accepts it;
    • the other process initiates the request for a connection with the remote machine;
    • a bidirectional channel between the two machines is then established.
  – Once the channel is established, communication between the two processes is symmetric.
  – In the client-server model, the process that waits for a connection is known as the server, and the process that connects to it is known as the client.

TCP/IP: Overview
• Common terms:
  – IP: Internet Protocol, for communication between computers; responsible for routing IP packets to their destination.
  – TCP: Transmission Control Protocol, for communication between applications.
  – UDP: User Datagram Protocol, for communication between applications.
  – ICMP: Internet Control Message Protocol, for detecting errors and gathering network statistics.
• TCP:
  – An application that wants to communicate with another application sends a request for a connection.
  – The request is sent to a fully qualified address (more on this soon) and port.
  – After the three-way "handshake" (SYN, SYN-ACK, ACK) between the two, a bidirectional communication channel is established.
  – The communication channel remains alive until it is terminated by one of the applications involved.
• TCP/IP:
  – TCP breaks the data to be communicated between applications into packets, and reassembles the data from the packets when they reach the destination.
  – IP ensures routing of the data packets to their intended receiver.
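The overview above distinguishes TCP (connection-oriented stream) from UDP (datagram). At the sockets API level, the difference shows up in the socket type; a minimal hedged sketch:

#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    /* SOCK_STREAM selects TCP: connected, reliable, in-order bytes. */
    int tcp_fd = socket(AF_INET, SOCK_STREAM, 0);

    /* SOCK_DGRAM selects UDP: connectionless datagrams -- no handshake,
     * and no delivery or ordering guarantees (see the problems slides
     * at the end of this lecture). */
    int udp_fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (tcp_fd < 0 || udp_fd < 0) {
        perror("socket");
        return 1;
    }
    printf("tcp_fd=%d udp_fd=%d\n", tcp_fd, udp_fd);
    close(tcp_fd);
    close(udp_fd);
    return 0;
}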
TCP/IP: Addresses and Names
• Each computer on a network is associated with an IP address consisting of 4 numbers, each holding a value between 0 and 255, e.g., 130.184.6.128.
• Using Domain Name System (DNS) servers, the numeric IP address is mapped to a domain name that is easier to remember; e.g., the domain name corresponding to 130.184.6.128 is prospero.uark.edu.
• Analogy: making a phone call
  – Caller – client; receiver – server; phone number – IP address; extension – port number
  – Client: picking up the receiver – socket(); locating the call recipient (from a phone book / memory) – bind(); dialing the phone number – connect(); talking – read()/write(); hanging up – close()
  – Server: connecting the phone to the phone line – socket(); selecting an incoming line – bind(); ringer ON – listen(); receiving the call – accept(); talking – read()/write(); hanging up – close()

The slides that follow walk through this call sequence:
  Server: create a TCP socket; bind the socket to a port; listen for connections; then loop: accept a connection, communicate, close the connection.
  Client: create a TCP socket; connect to the server; communicate; close the connection.

Server: Create Socket

/* Create data structures to store connection-specific info */
struct sockaddr_in sin, from;
/* The main call that creates a socket */
listen_socket = socket(AF_INET, SOCK_STREAM, 0);

Server: Bind Socket to Port

/* Initialize the address structure */
sin.sin_family = AF_INET;
sin.sin_addr.s_addr = INADDR_ANY;
/* Passing 0 lets the system select the port to bind. This is
   user-configurable; we use a specific port (PORT_NUM). */
sin.sin_port = htons(PORT_NUM);
bind(listen_socket, (struct sockaddr *) &sin, sizeof(sin));

Server: Listen for Connections

listen(listen_socket, 5);
/* 5 is the number of pending connection requests that the kernel
   should queue for the application */
getsockname(listen_socket, (struct sockaddr *) &sin, &len);
printf("listening on port = %d\n", ntohs(sin.sin_port));

Server: accept()

talk_socket = accept(listen_socket, (struct sockaddr *) &from, &len);
/* accept() is a blocking system call that waits for a connection from a
   client and then returns a new socket (talk_socket) through which the
   server is connected to that client, so that it can continue listening
   for more connections on the original socket (listen_socket) */
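The server-side fragments above can be assembled into a complete echo server; a minimal sketch, with PORT_NUM made up for illustration and error handling abbreviated:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define PORT_NUM 5050              /* illustrative port number */

int main(void)
{
    struct sockaddr_in sin, from;
    socklen_t len;
    char buf[1024];

    int listen_socket = socket(AF_INET, SOCK_STREAM, 0);

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = INADDR_ANY;
    sin.sin_port = htons(PORT_NUM);
    bind(listen_socket, (struct sockaddr *)&sin, sizeof(sin));

    listen(listen_socket, 5);
    printf("listening on port %d\n", PORT_NUM);

    for (;;) {                      /* server loop: accept, echo, close */
        len = sizeof(from);
        int talk_socket = accept(listen_socket, (struct sockaddr *)&from, &len);
        ssize_t n = read(talk_socket, buf, sizeof(buf));
        if (n > 0)
            write(talk_socket, buf, n);   /* simple echo */
        close(talk_socket);
    }
}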
Client: Create Socket

/* Create data structures to store connection-specific info */
struct sockaddr_in sin;
struct hostent *hp;
/* The main call that creates a socket */
talk_socket = socket(AF_INET, SOCK_STREAM, 0);

Client: Bind and Connect to Server

/* Initialize the address structure from the server's host name */
hp = gethostbyname(HOST_NAME);
bzero((void *) &sin, sizeof(sin));
bcopy((void *) hp->h_addr, (void *) &sin.sin_addr, hp->h_length);
sin.sin_family = hp->h_addrtype;
sin.sin_port = htons(atoi(PORT_NUM));
/* Connect to the server */
connect(talk_socket, (struct sockaddr *) &sin, sizeof(sin));

Client: Send a Message with write()

/* The client initiates communication with the server using a write() call */
n = write(talk_socket, buf, strlen(buf) + 1);
if (n < 0)
    error("ERROR writing to socket");
bzero(buf, 256);

Server: Receive/Send with read()/write()

n = read(talk_socket, buf, 1024);
if (n < 0)
    error("ERROR reading from socket");
else
    write(talk_socket, buf, n);   /* simple echo; content stored in buf */

Client: Receive with read()

/* Receive the message sent back by the server, stored in buf */
n = read(talk_socket, buf, 1024);
if (n < 0)
    error("ERROR reading from socket");
else
    printf("received from server: %s\n", buf);
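The client fragments likewise assemble into a complete program; a minimal sketch matching the echo server above (HOST_NAME and PORT_NUM are illustrative; gethostbyname() is kept from the slides even though modern code would use getaddrinfo()):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>

#define HOST_NAME "localhost"      /* illustrative server host */
#define PORT_NUM  5050             /* must match the server    */

int main(void)
{
    struct sockaddr_in sin;
    struct hostent *hp;
    char buf[256] = "hello, server";

    int talk_socket = socket(AF_INET, SOCK_STREAM, 0);

    hp = gethostbyname(HOST_NAME);
    if (hp == NULL) {
        fprintf(stderr, "unknown host\n");
        return 1;
    }
    memset(&sin, 0, sizeof(sin));
    memcpy(&sin.sin_addr, hp->h_addr_list[0], hp->h_length);
    sin.sin_family = hp->h_addrtype;
    sin.sin_port = htons(PORT_NUM);

    if (connect(talk_socket, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
        perror("connect");
        return 1;
    }
    write(talk_socket, buf, strlen(buf) + 1);        /* send             */
    ssize_t n = read(talk_socket, buf, sizeof(buf)); /* receive the echo */
    if (n > 0)
        printf("received from server: %s\n", buf);
    close(talk_socket);
    return 0;
}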
Close Socket

close(talk_socket);
/* Server side: ends the socket connection corresponding to one particular
   client. Control goes back to the loop, and the server continues to wait
   for more client connections on listen_socket. */

close(talk_socket);
/* Client side: ends the client's socket connection. */

Demo: Socket Example

Socket Programming: Problems
• Limited portability (not all interconnect interfaces support sockets)
• Limited scalability (number of ports available on a node)
• Tedious and error-prone hand-coding (unless somebody did it before)
• Tricky startup process (assumed port availability is not guaranteed)
• Only point-to-point communication is supported explicitly; no implementation of collective communication patterns
• Frequently used communication topologies are not available (e.g., Cartesian mesh) and have to be coded from scratch
• Direct support only for data organized in contiguous buffers, forcing one to write one's own buffer packing/unpacking routines
• Sockets suffer from the overhead of the protocol stack (TCP), or require designing algorithms to manage reliable, in-order, duplicate-free arrival of complete messages (e.g., datagram-oriented)
• Basic data transfer calls (read/write) do not guarantee returning or sending the full requested number of bytes, requiring the use of wrappers (and possibly resulting in multiple kernel calls per message); a sketch of such a wrapper follows at the end
• Complicated writing of applications with changing/unpredictable communications (it is only easy when reads are matched to writes and you know when both of them occur)
• On some operating systems, sockets may linger long after an application exits, preventing new startups using the same configuration
• If used, asynchronous management of socket calls adds another layer of complexity (either through select() or multiple threads)

Summary: Material for the Test
• Scalability (strong and weak scaling)
• Cooperative computing
• Communicating Sequential Processes
• Message passing
• Performance issues of CSP
• Sockets and TCP/IP
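As noted under the socket-programming problems, write() may transfer fewer bytes than requested; a minimal sketch of the conventional wrapper (a hypothetical helper, not part of any standard API):

#include <unistd.h>
#include <errno.h>

/* Keep calling write() until all `len` bytes of `buf` have been sent
   (or an error occurs). Returns 0 on success, -1 on error. A read-side
   read_all() wrapper is symmetric. */
int write_all(int fd, const char *buf, size_t len)
{
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = write(fd, buf + sent, len - sent);
        if (n < 0) {
            if (errno == EINTR)     /* interrupted: retry    */
                continue;
            return -1;              /* real error            */
        }
        sent += (size_t)n;          /* partial write: advance and loop */
    }
    return 0;
}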