Introduction to HTTP
• The HyperText Transfer Protocol is an ‘application-layer’ protocol for the ‘client/server’ paradigm

‘Request’ and ‘Response’
[Figure: timeline in which the client sends an HTTP Request message to the server, and the server replies with an HTTP Response message]

Built on TCP/IP
• Application programmers need to be aware that HTTP relies on TCP’s reliable, stream-oriented, connection-based transport-layer facilities when they specify the socket types, functions, and options
[Figure: socket-call sequence
   server: socket() → bind() → listen() → accept() → read() → write() → close()
   client: socket() → connect() → write() → read() → close()]

HTTP Request
  Request line
  Headers
  Empty line
  Body (may be absent)

HTTP Response
  Status line
  Headers
  Empty line
  Body (may be absent)

Sample Request line
  “GET /home/web/cs336/syllabus.s09 HTTP/1.0\r\n”
• The line consists of the command (one word, all capitals), a space, the resource pathname (UNIX filename syntax), a space, the protocol and version-number, and finally a carriage-return and line-feed

Sample Request header-lines
  “Connection: close\r\n”
  “User-agent: Mozilla 4.0\r\n”
  “Accept-language: en\r\n”
• Each header-line ends with a carriage-return and line-feed
• The header-lines must be followed by an ‘empty’ line (carriage-return and line-feed)

Sample Response line
  “HTTP/1.0 200 OK\r\n”
• The line consists of the protocol and version-number, a space, the status code, a space, the response phrase, and finally a carriage-return and line-feed

Sample Response header-lines
  “Connection: close\r\n”
  “Date: Tue, 15 March 2009\r\n”
  “Server: Apache/1.3 (Unix)\r\n”
  “Content-Type: text/html\r\n”
• Each header-line ends with a carriage-return and line-feed
• The header-lines must be followed by an ‘empty’ line (carriage-return and line-feed)

Demo: ‘grabfile.cpp’
• We shall construct a simple HTTP client that lets a user obtain a named internet object by typing its URL (Uniform Resource Locator) on the command-line:
  $ grabfile http://www.cs.usfca.edu/index.html

The URL concept
• URL means ‘Uniform Resource Locator’
• It’s a standard way of specifying any kind of information available on the Internet
• Four elements of a URL specification:
  – Method (i.e., the protocol for object retrieval)
  – Host (i.e.,
    the server’s hostname or IP-address)
  – Port (i.e., the port-number for contacting the server)
  – Path (i.e., the pathname of the resource’s file)

The URL Format
  method :// host : port / path
  EXAMPLE: http://cs.usfca.edu:80/~cruse/cs336/syllabus.pdf
• Note: The port-number is often omitted when the ‘method’ is an internet protocol (like HTTP) that uses a ‘well-known port’ (port 80 for HTTP)

Application’s organization
• Parse the URL entered on the command-line to determine the server’s hostname and port-number and the pathname to the desired file-object
• Open a stream-oriented TCP internet socket and establish a connection with the server
• Form the HTTP Request message and write it to the socket
• Read from the socket to receive the HTTP Response message (and echo it to the display)
• Close the socket to terminate the TCP connection

Parsing the URL
• The most challenging part of this program is parsing the command-line argument, allowing for some ‘degenerate’ cases and some malformed specifications
• Several standard string-functions from the UNIX runtime-library are put to good use, including ‘strlen()’, ‘strncpy()’, ‘strtok()’ and ‘strtok_r()’, plus ‘strspn()’ and ‘strcspn()’

‘strlen()’
  size_t strlen( const char *s );
• This function calculates the length of the null-terminated string whose address is supplied as the function-argument (the terminating null is not counted)

  #include <stdio.h>
  #include <string.h>

  char message[ ] = "Hello";

  int main( void )
  {
      size_t len = strlen( message );
      printf( "\'%s\' has %zu characters\n", message, len );
      return 0;
  }

  OUTPUT: 'Hello' has 5 characters

‘strncpy()’
  char *strncpy( char *dst, const char *src, size_t n );
• This function copies at most n characters from the ‘src’ string into the ‘dst’ string, so it provides a ‘safe’ way to copy from a string that might be too long to fit the destination
• Caution: if ‘src’ has n or more characters, no terminating null is written, so the caller must supply one

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  int main( int argc, char *argv[] )
  {
      char param[ 64 ];

      if ( argc == 1 ) { fprintf( stderr, "param?\n" ); exit( 1 ); }

      strncpy( param, argv[ 1 ], 63 );   // source string has unknown length
      param[ 63 ] = '\0';                // strncpy() may leave 'param' unterminated
      …
  }

‘strtok()’
  char *strtok( char *s, const char *delim );
• This function extracts tokens from a string; after the first call (with ‘s’ pointing to the string) it remembers where it stopped, so later calls that pass NULL as ‘s’ extract further tokens from that same string
• It returns NULL when no tokens remain, and it modifies the string it scans (each delimiter where it stops is overwritten with a null)

  char sentence[ ] = "Hello, world!\n";
  char *word1 = strtok( sentence, " ,!\n" );
  char *word2 = strtok( NULL, " ,!\n" );
  char *word3 = strtok( NULL, " ,!\n" );   // no tokens remain, so this is NULL
  printf( " \'%s\' \'%s\' \'%s\' \n", word1, word2, word3 ? word3 : "<null>" );

  OUTPUT: 'Hello' 'world' '<null>'

‘strtok_r()’
  char *strtok_r( char *s, const char *delim, char **saveptr );
• This function is a ‘reentrant’ version of ‘strtok()’: instead of keeping hidden internal state, it stores in ‘*saveptr’ the address of the character where a subsequent search for another token would begin; later calls pass NULL as ‘s’ together with that same ‘saveptr’

  char sentence[ ] = "Hello, world!\n";
  char *saveptr;
  char *word1 = strtok_r( sentence, " ,!\n", &saveptr );
  char *word2 = strtok_r( NULL, " ,!\n", &saveptr );
  char *word3 = strtok_r( NULL, " ,!\n", &saveptr );   // NULL: no tokens remain
  printf( " \'%s\' \'%s\' \'%s\' \n", word1, word2, word3 ? word3 : "<null>" );

  OUTPUT: 'Hello' 'world' '<null>'

‘strspn()’
  size_t strspn( const char *s, const char *accept );
• This function returns the length of the initial segment of ‘s’ that consists entirely of characters found in the ‘accept’ string

  char vowels[ ] = "aeiou";
  char word[ ] = "eating";
  size_t len = strspn( word, vowels );
  printf( "\'%s\' has %zu vowels before any consonant\n", word, len );

  OUTPUT: 'eating' has 2 vowels before any consonant

‘strcspn()’
  size_t strcspn( const char *s, const char *reject );
• This function returns the length of the initial segment of ‘s’ that consists entirely of characters not found in the ‘reject’ string

  char vowels[ ] = "aeiou";
  char word[ ] = "shout";
  size_t len = strcspn( word, vowels );
  printf( "\'%s\' has %zu consonants before any vowel\n", word, len );

  OUTPUT: 'shout' has 2 consonants before any vowel

Examples
• Here are a few examples of ‘malformed’
  and ‘degenerate’ URL parameter-strings:
  http://:54321/index.html              # no server hostname
  http://yahoo.com:/index.html          # missing port
  http://usfca.edu:::54321/index.html   # excess ‘:’s
  www.sfmuni.com/index.html             # no ‘method’
  http://www.bart.gov/                  # no pathname
  www.sfsu.edu:80:57/index.html         # extra chars

In-class exercise
• Download our ‘grabfile.cpp’ application and see whether you are able to retrieve any files by typing a URL as an argument
• HINT: You can use some of the same IP-addresses and hostnames that you tried successfully while you were testing your earlier ‘showpath.cpp’ project
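The ‘Parsing the URL’ strategy can be sketched as a single function built from ‘strspn()’, ‘strcspn()’, ‘strncpy()’ and ‘strlen()’. This is a minimal sketch, not the code in ‘grabfile.cpp’: the names ‘struct url_parts’, ‘copy_part()’ and ‘parse_url()’ are hypothetical, and the policy for the degenerate cases (default method ‘http’, default port 80 when the port is omitted or empty, default path ‘/’, rejection of an empty hostname or of leftover characters after the port) is an assumption chosen for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type for this sketch (not taken from grabfile.cpp). */
struct url_parts {
    char method[16];    /* e.g. "http" */
    char host[256];     /* hostname or dotted IP-address */
    int  port;          /* 80 when omitted */
    char path[1024];    /* "/" when omitted */
};

/* Copy len chars of src into dst (capacity cap), truncating if needed and
   always adding the terminating null that strncpy() alone may omit. */
static void copy_part(char *dst, size_t cap, const char *src, size_t len)
{
    if (len >= cap)
        len = cap - 1;
    strncpy(dst, src, len);
    dst[len] = '\0';
}

/* Split "method://host:port/path" into its parts.
   Returns 0 on success, -1 for a malformed URL. */
int parse_url(const char *url, struct url_parts *out)
{
    static const char letters[] =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const char *p = url;

    /* Method: a run of letters followed by "://"; otherwise assume "http". */
    size_t mlen = strspn(url, letters);
    if (mlen > 0 && strncmp(url + mlen, "://", 3) == 0) {
        copy_part(out->method, sizeof out->method, url, mlen);
        p = url + mlen + 3;
    } else {
        copy_part(out->method, sizeof out->method, "http", 4);
    }

    /* Host: everything up to the first ':' or '/'; it must not be empty. */
    size_t hlen = strcspn(p, ":/");
    if (hlen == 0)
        return -1;
    copy_part(out->host, sizeof out->host, p, hlen);
    p += hlen;

    /* Port: optional ':' plus up to five digits; an empty port means 80. */
    out->port = 80;
    if (*p == ':') {
        p++;
        size_t dlen = strspn(p, "0123456789");
        if (dlen > 5)
            return -1;
        if (dlen > 0) {
            char digits[6];
            copy_part(digits, sizeof digits, p, dlen);
            out->port = atoi(digits);
            if (out->port < 1 || out->port > 65535)
                return -1;
        }
        p += dlen;
        if (*p != '/' && *p != '\0')   /* leftover ':'s or other characters */
            return -1;
    }

    /* Path: the remainder of the string, or "/" when nothing is left. */
    if (*p == '\0')
        copy_part(out->path, sizeof out->path, "/", 1);
    else
        copy_part(out->path, sizeof out->path, p, strlen(p));
    return 0;
}
```

With this policy, ‘http://yahoo.com:/index.html’ is accepted with port 80, while ‘http://:54321/index.html’, ‘http://usfca.edu:::54321/index.html’ and ‘www.sfsu.edu:80:57/index.html’ are rejected; a stricter client might choose to reject the empty port as well.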
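The layout from the ‘HTTP Request’ and ‘Sample Request’ slides (request line, header-lines, then an empty line, with no body for a GET) can be assembled in one call to ‘snprintf()’. In this sketch the function name ‘build_request()’ is hypothetical, and the ‘Host:’ header-line is an addition not shown in the sample: HTTP/1.0 does not require it, but servers that host several sites under one IP-address commonly rely on it.

```c
#include <stdio.h>

/* Writes an HTTP/1.0 GET request for 'path' on 'host' into buf.
   Returns the request's length, or -1 if buf is too small. */
int build_request(char *buf, size_t cap, const char *host, const char *path)
{
    int n = snprintf(buf, cap,
                     "GET %s HTTP/1.0\r\n"     /* request line */
                     "Host: %s\r\n"           /* header-lines ... */
                     "Connection: close\r\n"
                     "\r\n",                  /* ... then the empty line */
                     path, host);
    if (n < 0 || (size_t)n >= cap)            /* output error or truncation */
        return -1;
    return n;
}
```

The returned length is what a client would pass to ‘write()’ when sending the request over the connected socket.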
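On the receiving side, the first line of the reply is the status line described on the ‘Sample Response line’ slide. One possible way to split it is ‘sscanf()’ with field widths that match the buffer sizes; ‘struct status_line’ and ‘parse_status_line()’ are hypothetical names, and accepting a line whose response phrase is missing is a design choice of this sketch.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the three fields of a status line. */
struct status_line {
    char version[16];   /* protocol and version-number, e.g. "HTTP/1.0" */
    int  code;          /* status code, e.g. 200 */
    char phrase[64];    /* response phrase, e.g. "OK" (may be empty) */
};

/* Split "HTTP/x.y NNN phrase\r\n" into its fields.
   Returns 0 on success, -1 if the line does not look like a status line. */
int parse_status_line(const char *line, struct status_line *st)
{
    st->phrase[0] = '\0';
    /* %15s and %63[^\r\n] leave room for the terminating null in each buffer;
       the phrase stops at the carriage-return and line-feed. */
    int got = sscanf(line, "%15s %d %63[^\r\n]",
                     st->version, &st->code, st->phrase);
    if (got < 2 || strncmp(st->version, "HTTP/", 5) != 0)
        return -1;
    if (st->code < 100 || st->code > 999)   /* status codes have three digits */
        return -1;
    return 0;
}
```

Applied to the sample “HTTP/1.0 200 OK\r\n”, this yields the version “HTTP/1.0”, the code 200 and the phrase “OK”.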