Networking with Java 1: The Client Side
DBI 2008 HUJI-CS

Introduction to Networking

Protocols
[Figure: a human protocol (“Hi” / “Hi”, “Got the time?” / “2:00”) shown beside a network protocol between a client and a server (TCP connection request / TCP connection reply, GET http://www.cs.huji.ac.il/~dbi / <file>), both drawn along a time axis.]

Networking as Layers
    Application (HTTP, FTP):  DATA
    Transport (TCP, UDP):     HEADER DATA
    Network (IP):             HEADER HEADER DATA
    Link:                     HEADER HEADER HEADER DATA

TCP (Transmission-Control Protocol)
• Enables symmetric byte-stream transmission between two endpoints (applications)
• Reliable communication channel
• TCP performs these tasks:
  – connection establishment by handshake (relatively slow)
  – division into numbered packets (transferred by IP)
  – error detection in packets (checksum)
  – acknowledgement and retransmission of packets
  – connection termination by handshake

UDP (User Datagram Protocol)
• Enables direct datagram (packet) transmission from one endpoint to another
• No reliability (except for checksum-based error detection)
  – sender does not wait for acknowledgements
  – arrival order is not guaranteed
  – arrival is not guaranteed
• Used when speed is essential, even at the cost of reliability
  – e.g., streaming media, games, Internet telephony, etc.

Ports
• A computer may have several applications that communicate with applications on remote computers through the same physical connection to the network
• When receiving a packet, how can the computer tell which application is the destination?
• Solution: each channel endpoint is assigned a unique port that is known to both the computer and the other endpoint

Ports (cont)
• Thus, an endpoint application on the Internet is identified by
  – A host name, resolved to a 32-bit IP address
  – A 16-bit port
• Why don’t we specify the port in a Web browser?
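One plausible answer to the closing question can be sketched with java.net.URL: when a URL names no port, the client falls back to the well-known port of the protocol. This is only an illustration under stated assumptions; the class name EndpointPort, the helper effectivePort, and the port 8080 in the second call are hypothetical and chosen for this sketch.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class EndpointPort {
    // Returns the port a client would connect to for the given URL:
    // the explicit port if one is written, otherwise the protocol default.
    public static int effectivePort(String spec) throws MalformedURLException {
        URL url = new URL(spec);
        int port = url.getPort();              // -1 when the URL names no port
        return (port != -1) ? port : url.getDefaultPort();
    }

    public static void main(String[] args) throws MalformedURLException {
        // No port written: the HTTP default is used
        System.out.println(effectivePort("http://www.cs.huji.ac.il/~dbi"));
        // Explicit (hypothetical) port written in the URL
        System.out.println(effectivePort("http://www.cs.huji.ac.il:8080/~dbi"));
    }
}
```

The 16-bit port is therefore always part of the endpoint, even when the user never types it.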
Known Ports
• Some known ports are
  – 20, 21: FTP
  – 22: SSH
  – 23: TELNET
  – 25: SMTP
  – 110: POP3
  – 80: HTTP
  – 119: NNTP
[Figure: client applications (a mail client, a web browser) connecting to ports 21, 23, 25, 110, 80 and 119.]

Sockets
• A socket is a construct that represents one endpoint of a two-way communication channel between two programs running on the network
• Using sockets, the OS gives processes file-like access to the channel
  – i.e., sockets are allocated a file descriptor, and processes can access (read/write) the socket by specifying that descriptor
• A specific socket is identified by the machine's IP and a port within that machine

Sockets (cont)
• A socket stores the IP and port number of the other endpoint computer of the channel
• When writing to a socket, the written bytes are sent to the other computer and port (e.g., over TCP/IP)
  – That is, the remote IP and port are attached to the packets
• When the OS receives packets from the network, it uses their destination port to decide which socket should get the received bytes

Java Sockets: Low-Level Networking

Java Sockets
• Java wraps OS sockets (over TCP) in objects of class java.net.Socket
• new Socket(String remoteHost, int remotePort) creates a TCP socket and connects it to the remote host on the remote port (handshake)
• Write and read using streams:
  – InputStream getInputStream()
  – OutputStream getOutputStream()

A Socket Example
    import java.io.*;
    import java.net.*;

    public class SimpleSocket {
        public static void main(String[] args) throws IOException {
            // Connect to the web server (the TCP handshake happens here)
            Socket socket = new Socket("www.cs.huji.ac.il", 80);
            InputStream istream = socket.getInputStream();
            OutputStream ostream = socket.getOutputStream();

            // Host is mandatory in HTTP/1.1; needed for forwarding, for example
            String request = "GET /~dbi/admin.html HTTP/1.1\r\n"
                    + "Host: www.cs.huji.ac.il\r\n"
                    + "Connection: close\r\n\r\n";
            ostream.write(request.getBytes("US-ASCII"));
            ostream.flush();

            // Copy the raw response (headers and body) to standard output
            byte[] response = new byte[4096];
            int bytesRead;
            while ((bytesRead = istream.read(response)) >= 0) {
                System.out.write(response, 0, bytesRead);
            }
            System.out.flush();
            socket.close();
        }
    }

Timeout
• You can set a timeout for the blocking read() on the socket's input stream
• Use the method socket.setSoTimeout(milliseconds)
• If the timeout is reached before the method returns, java.net.SocketTimeoutException is thrown
Read more about the Socket class

Java Sockets and HTTP

HTTP Message Structure
• An HTTP message has the following structure:
    Request/Status-Line \r\n
    Header1: value1 \r\n
    Header2: value2 \r\n
    ...
    HeaderN: valueN \r\n
    \r\n
    Message-Body

Reading HTTP Messages
• Several ways to interpret the bytes of the body
  – Binary: images, compressed files, class files, ...
  – Text: ASCII, Latin-1, UTF-8, ...
• Commonly, applications parse the headers of the message, and process the body according to the information supplied by the headers
  – E.g., Content-Type, Content-Encoding, Transfer-Encoding

Parsing the Headers
• So how are the headers themselves represented?
• Headers of an HTTP message must be in US-ASCII format (1 byte per character)

Example: Extracting the Headers
    import java.io.*;
    import java.net.*;

    // Class name chosen for this listing
    public class HeaderExtractor {
        public static void main(String[] argv) throws IOException {
            Socket socket = new Socket(argv[0], 80);
            InputStream istream = socket.getInputStream();
            OutputStream ostream = socket.getOutputStream();

            String request = "GET / HTTP/1.1\r\n"
                    + "Host: " + argv[0] + "\r\n"
                    + "Connection: close\r\n\r\n";
            ostream.write(request.getBytes("US-ASCII"));

            // Read one byte at a time until the blank line that ends the headers
            StringBuffer headers = new StringBuffer();
            int byteRead = 0;
            while (!endOfHeaders(headers) && (byteRead = istream.read()) >= 0) {
                headers.append((char) byteRead);
            }
            System.out.print(headers);
            socket.close();
        }

        // True once the buffer ends with the empty line "\r\n\r\n"
        public static boolean endOfHeaders(StringBuffer headers) {
            int lastIndex = headers.length() - 1;
            if (lastIndex < 3 || headers.charAt(lastIndex) != '\n')
                return false;
            return headers.substring(lastIndex - 3, lastIndex + 1).equals("\r\n\r\n");
        }
    }
• Why did we (inefficiently) read one byte at a time?
• Is there any way to avoid this inefficiency?

Persistent Connections
• According to HTTP/1.1, a server does not have to close the connection after fulfilling your request
• One connection (socket) can be used for several requests and responses
  – the client can send more requests even while earlier responses are being transferred (pipelining)
  – saves “slow start” time
• How can the client know when one response ends and a new one begins?
• To avoid a persistent connection, request it explicitly with the header Connection: close

Parsing URLs

Working with URLs
• URL (Uniform/Universal Resource Locator): a reference (address) to a resource on the Internet
    http://www.cs.huji.ac.il:80/~dbi/main.html#notes
      Protocol:     http
      Host Name:    www.cs.huji.ac.il
      Port Number:  80
      File Name:    /~dbi/main.html
      Reference:    notes

    http://www.google.com/search?hl=en&q=dbi+huji&btnG=Search
      Query:        hl=en&q=dbi+huji&btnG=Search

The Class URL
• The class URL is used for parsing URLs
• Constructing URLs:
  – URL w3c1 = new URL("http://www.w3.org/TR/");
  – URL w3c2 = new URL("http", "www.w3.org", 80, "/TR/");
  – URL w3c3 = new URL(w3c2, "xhtml1/");
• If the string is not an absolute URL, then it is considered relative to the context URL given as the first argument

Parsing URLs
• The following methods of URL can be used for parsing URLs
    getProtocol(), getHost(), getPort(), getPath(), getFile(), getQuery(), getRef()
Read more about the URL class

URLEncoder
• Contains a utility method encode for converting a string into an encoded format (used in URLs, e.g. for searches)
• To convert a string, each char is examined in turn:
  – Space is converted into a plus sign +
  – a-z, A-Z, 0-9, ., -, * and _ remain the same.
  – Every other character is converted to its bytes, and each byte is written as % followed by two hexadecimal digits
• To decode an encoded string, use decode() of the class URLDecoder
Read more about the URLEncoder class

Class URLConnection: High-Level Networking

The class URLConnection
• To retrieve the actual resource, we can use the URLConnection object obtained by url.openConnection()
• If the protocol of the URL is HTTP, the returned object is of class HttpURLConnection
• This class encapsulates all the socket management and HTTP exchanges required to obtain the resource
Read more about the URLConnection class

    import java.io.InputStream;
    import java.net.URL;
    import java.net.URLConnection;

    public class ContentExtractor {
        public static void main(String[] argv) throws Exception {
            URL url = new URL(argv[0]);
            System.out.println("Host: " + url.getHost());
            System.out.println("Protocol: " + url.getProtocol());
            System.out.println("----");

            // getInputStream() connects implicitly and returns the content
            URLConnection con = url.openConnection();
            InputStream stream = con.getInputStream();
            byte[] data = new byte[4096];
            int bytesRead = 0;
            while ((bytesRead = stream.read(data)) >= 0) {
                System.out.write(data, 0, bytesRead);
            }
            System.out.flush();
        }
    }

About URLConnection
• The life cycle of a URLConnection object has two parts:
  – Before actual connection establishment
    • Connection configuration
  – After actual connection establishment
    • Content retrieval
• Passage from the first phase to the second is implicit
  – A result of calling some committing method, like getDate()

About HttpURLConnection
• The HttpURLConnection class encapsulates all HTTP transactions over sockets, e.g.,
  – Content decoding
  – Redirection
  – Proxy indirection
• You can control requests through its methods
  – setRequestMethod, setFollowRedirects, setRequestProperty, ...
Read more about the HttpURLConnection class
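The two-phase life cycle described above can be sketched as follows. This is a minimal illustration, not a definitive client: the class name HeadRequest, the helper configure, the Accept value and the 5000 ms timeouts are assumptions made for this sketch, and the cast assumes an http or https URL. It uses the per-connection setInstanceFollowRedirects rather than the static setFollowRedirects named on the slide.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class HeadRequest {
    // Phase 1: configure the connection. Nothing is sent over the network yet.
    public static HttpURLConnection configure(String spec) throws IOException {
        URL url = new URL(spec);
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setRequestMethod("HEAD");              // ask for the headers only
        con.setInstanceFollowRedirects(false);     // report redirects instead of following them
        con.setRequestProperty("Accept", "text/html");
        con.setConnectTimeout(5000);               // milliseconds (arbitrary value)
        con.setReadTimeout(5000);                  // milliseconds (arbitrary value)
        return con;
    }

    public static void main(String[] argv) throws IOException {
        HttpURLConnection con = configure(argv[0]);
        // Phase 2: getResponseCode() is a committing method; it connects implicitly.
        System.out.println("Status: " + con.getResponseCode());
        System.out.println("Content-Type: " + con.getContentType());
        con.disconnect();
    }
}
```

Once a committing method has been called, further configuration calls such as setRequestMethod are rejected, so all settings belong in the first phase.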