20-755: The Internet
Lecture 9: Web Services II

David O’Hallaron
School of Computer Science and Department of Electrical and Computer Engineering
Carnegie Mellon University
Institute for eCommerce, Summer 1999

Lecture 9, 20-755: The Internet, Summer 1999

Today’s lecture
• Dynamic content background (35 min)
• Break (10 min)
• Serving dynamic content with GET and POST (40 min)

How programs run other programs
• Recall that a process is an instance of a running program.
• Suppose a process A, which is running program foo, wants to run the program bar.
• Two-step procedure:
  – First, process A creates a new process B that is a clone of A.
    » A and B are independent processes running concurrently on the machine.
    » A is the parent, B is the child.
    » Each has a unique process id (pid).
  – Second, process B recognizes that it is a clone, overwrites foo with bar, and transfers control to the first instruction in bar.

How programs run other programs
• Initially, foo is running in process A with process id (pid) of 325.
[Figure: process A running foo, pid = 325]

How programs run other programs
• Next, program foo running in process A clones a copy of itself.
• So now we have two identical, independent processes (A and B) running the same code.
• A can wait immediately for B to complete, or do other work in the meantime.
[Figure: process A running foo, pid = 325; process B running foo, pid = 326]

How programs run other programs
• The instance of foo in process B recognizes that it is a clone.
• Process B replaces its copy of foo with the code for bar.
[Figure: process A running foo, pid = 325; process B running bar, pid = 326]

How programs run other programs
• pid = fork()
  – creates a clone of the current process.
  – returns 0 to the child process.
  – returns the positive integer process ID of the child to the parent.
• exec(objfile)
  – replaces the currently running program with the code in the executable file objfile.
  – exec never returns to the caller unless there is an error.
    » e.g., if it can’t locate objfile.

How programs run other programs

    # This is how program foo running in process A
    # runs program bar in a new process B

    # the parent executes this statement
    $child_pid = fork();
    defined($child_pid) or die "can't fork: $!";   # fork returns undef on failure

    # both parent and child run the if statement
    if ($child_pid == 0) {
        # only the child executes this code
        print "I'm the child\n";
        exec("bar");
        # the child only gets to this point if the exec fails
        die "can't exec bar: $!";
    }
    # the parent continues here

Perl abstractions for fork and exec
• backquote operator
  – $output = `foo`;
    » runs the executable program foo and returns the contents of its STDOUT in variable $output.
• system command
  – system("foo", $arg1, $arg2);
    » runs the executable program foo with arguments $arg1 and $arg2.
    » output goes to wherever STDOUT is currently going (e.g., the screen).
  – system("$prog > mydate.txt");
    » runs the program named in $prog (e.g., date) through the shell, which redirects its output to file mydate.txt.

How programs pass info to the programs they create
• Command line arguments
  – the exec operator can pass a list of ASCII arguments to the program that it runs
    » exec("foo.pl", "dave", "ohallaron");

    #!/usr/local/bin/perl5 -w
    # Array @ARGV holds the arguments.
    # In a numeric comparison, @ARGV evaluates to the number of array elements.
    # $0 is the name of the perl script (foo.pl)
    # $ARGV[0] is the first array element (argument)
    # $ARGV[1] is the second array element (argument)
    if (@ARGV != 2) {
        print "usage: $0 first last\n";
        exit;
    }
    print "arg0 = $ARGV[0]\n";   # dave
    print "arg1 = $ARGV[1]\n";   # ohallaron

How programs pass info to the programs they create
• Environment variables
  – Each process maintains a set of “environment variables”
    » list of ASCII (name,value) pairs.
    » represent long-term conditions or preferences.
  – A forked process gets an exact duplicate of the parent’s environment variables.

Unix shell environment variables

    % printenv
    PWD=/usr/droh/afs/
    TERM=emacs
    EMACS=t
    MANPATH=/usr/man:/usr/local/man:/usr/local/apache/man:/usr/X11R6/man
    PRINTER=iron
    login_done=1
    HOSTNAME=kittyhawk.cmcl.cs.cmu.edu
    HOSTTYPE=i386_linux3
    HOST=kittyhawk.cmcl.cs.cmu.edu
    SHLVL=2
    KRBTKFILE=/tkt/3478-030d-379b6ada
    PATH=.:/usr/droh/bin:/usr/sbin:/sbin:/usr/local/apache/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/usr/etc:/etc:/usr/X11R6/bin
    USER=droh
    SHELL=/usr/local/bin/tcsh
    HOME=/usr/droh

Accessing environment variables from Perl
• Environment variables are stored in a special hash called %ENV.

    # sort and list the environment variables
    foreach $key (sort keys %ENV) {
        print "$key=$ENV{$key}\n";
    }

    # add a new (key,value) pair to the environment hash
    $ENV{"IPADDR"} = "128.1.194.242";

    # delete a (key,value) pair from the environment hash
    delete $ENV{"IPADDR"};

Serving dynamic content
• Client sends request to server.
• If the request URI contains the string “/cgi-bin”, then the server assumes that the request is for dynamic content.
[Figure: client sends “GET /cgi-bin/env.pl HTTP/1.1” to server]

Serving dynamic content
• The server creates a child process and runs the program identified by the URI in that process.
[Figure: client; server uses fork/exec to run env.pl]

Serving dynamic content
• The child runs and generates the dynamic content.
• The server captures the child’s output and forwards it without modification to the client.
[Figure: env.pl sends content to server; server sends content to client]

Serving dynamic content
• The child terminates.
• Server waits for the next client request.
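
The fork/exec-and-capture steps above can be sketched in a few lines of Perl. This is only an illustration under stated assumptions, not the code of any real server: the helper name capture_child_output is hypothetical, and a one-line Perl program started through $^X (the path of the running perl) stands in for a CGI program such as env.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the lecture): run a command in a child
# process and return everything the child wrote to its STDOUT.
sub capture_child_output {
    my @command = @_;

    # Forking open: creates a pipe, forks, and connects the child's STDOUT
    # to the read handle held by the parent. Returns 0 in the child.
    my $pid = open(my $from_child, "-|");
    defined($pid) or die "can't fork: $!";

    if ($pid == 0) {
        # Child: replace this process with the requested program.
        exec(@command) or die "can't exec $command[0]: $!";
    }

    # Parent: read the child's output until end of file.
    local $/;                      # slurp mode: read everything at once
    my $content = <$from_child>;
    close($from_child);            # also waits for the child to terminate
    return $content;
}

# Stand-in for a CGI program: a one-line Perl child that prints its own pid.
my $content = capture_child_output($^X, "-e", 'print "hello from child $$\n"');
print "server would forward: $content";
```

A real server would write $content to the client connection instead of printing it, and would set the CGI environment variables before creating the child.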
Issues in serving dynamic content
• How does the client pass program arguments to the server?
• How does the server pass these arguments to the child?
• How does the server pass other info relevant to the request to the child?
• How does the server capture the content produced by the child?
• These issues are addressed by the Common Gateway Interface (CGI) specification.
[Figure: client sends request to server; server creates child env.pl; env.pl returns content to server; server returns content to client]

Break time!

CGI
• Because the children are written according to the CGI spec, they are often called CGI programs.
  – Because many CGI programs are written in Perl, they are often called CGI scripts.
• However, CGI really defines a simple standard for passing information between the client (browser), the server, and the child process.

add.com: THE Internet addition service!
• Ever needed to add two numbers together and you just can’t find your calculator?
• Try Dr. Dave’s addition service at add.com!
  – Takes as input your name and the two numbers you want to add together.
  – Returns their sum in a tasteful personalized message.
  – After the IPO we’ll expand to multiplication!

Serving dynamic content with GET
• Question: How does the client pass arguments to the server?
• Answer: The arguments are appended to the URI.
• Can be encoded directly in a URL typed to a browser or a URL in an HTML link
  – http://add.com/cgi-bin/add.pl?Dave+O'Hallaron&1&2
  – add.pl is the program on the server that will do the addition.
  – argument list starts with “?”
  – arguments separated by “&”
  – spaces represented by “+”
• Can also be generated by an HTML form

    <form method=get action="http://add.com/cgi-bin/post.pl">

Serving dynamic content with GET
• URL:
  – http://add.com/cgi-bin/add.pl?Dave+O'Hallaron&1&2
• Result:

    Mr. Dave O'Hallaron, Welcome to add.com!
    The answer is: 1 + 2 = 3
    Please come again soon! Tell your friends!

Serving dynamic content with GET
• Question: How does the server pass these arguments to the child?
• Answer: In environment variable QUERY_STRING
  – a single string containing everything after the “?”
  – for add.com: QUERY_STRING = "Dave+O'Hallaron&1&2"

    #
    # Child code that parses the add.com arguments
    #
    $args = $ENV{QUERY_STRING};
    $args =~ s/\+/ /g;    # replaces every "+" with " "
    ($name, $a1, $a2) = split(/&/, $args);

Serving dynamic content with GET
• Question: How does the server pass other info relevant to the request to the child?
• Answer: In a collection of environment variables defined by the CGI spec.
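
To make the GET hand-off concrete, here is a minimal Perl sketch of both sides, under stated assumptions. The helper name parse_add_query is hypothetical, and the child is a one-line Perl program started through $^X rather than the real add.pl. The sketch shows that variables the server sets in %ENV are visible to the child it creates, and that the child can recover the add.com arguments from QUERY_STRING.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the lecture): split an add.com-style query
# string ("name&num1&num2") into its arguments.
sub parse_add_query {
    my ($query) = @_;
    my @fields = split(/&/, $query);
    s/\+/ /g for @fields;          # "+" encodes a space in each field
    return @fields;
}

# Server side: the environment set here is inherited by any child it creates.
$ENV{REQUEST_METHOD} = "GET";
$ENV{QUERY_STRING}   = "Dave+O'Hallaron&1&2";

# Child side: a stand-in for add.pl that reports what it finds in its %ENV.
my $child_code = 'print "$ENV{REQUEST_METHOD} $ENV{QUERY_STRING}"';
open(my $from_child, "-|", $^X, "-e", $child_code)
    or die "can't run child: $!";
my $seen_by_child = <$from_child>;
close($from_child);
print "child saw: $seen_by_child\n";

# Parsing the same string the way the child would.
my ($name, $a1, $a2) = parse_add_query($ENV{QUERY_STRING});
print "$name: $a1 + $a2 = ", $a1 + $a2, "\n";
```

Splitting on “&” before replacing “+” is a design choice of this sketch; for the add.com arguments it gives the same result as the lecture’s order of operations.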

Some CGI environment variables
• General
  – SERVER_SOFTWARE
  – SERVER_NAME
  – GATEWAY_INTERFACE (CGI version)
• Request specific
  – SERVER_PORT
  – REQUEST_METHOD (GET, POST, etc.)
  – QUERY_STRING (contains args)
  – REMOTE_HOST (domain name of client)
  – REMOTE_ADDR (IP address of client)
  – CONTENT_TYPE (for POST, type of data in message body, e.g., text/html)
  – CONTENT_LENGTH (length in bytes)

Some CGI environment variables
• In addition, the value of each request header received from the client is placed in an environment variable named HTTP_ followed by the header name, converted to uppercase with any “-” changed to “_”.
  – Examples:
    » HTTP_ACCEPT
    » HTTP_HOST
    » HTTP_USER_AGENT

Serving dynamic content with GET
• Question: How does the server capture the content produced by the child?
• Answer: The child writes its content to stdout.

    #
    # server code that runs child and captures stdout
    #

    # run the child and put its dynamic content in $child_output
    $child_output = `add.pl`;

    # send the child's dynamic content back to the client
    $connfd->print($child_output);

Putting it all together: The CGI script for GET requests to add.com

    #!/usr/local/bin/perl5
    $args = $ENV{QUERY_STRING};
    $args =~ s/\+/ /g;
    ($name, $a1, $a2) = split(/&/, $args);
    print "Content-type: text/html\n\n";
    print "<html><head></head><body>\n";
    print "<h3>Mr. $name, Welcome to add.com!</h3>\n";
    print "<b>The answer is: $a1 + $a2 = ", $a1+$a2, "</b><br>\n";
    print "<p><i>Please come again soon! Tell your friends!</i>\n";
    print "</body></html>\n";

Serving dynamic content with POST
• More complicated and less general than GET
• Less frequently used because of the complexity.
• Only advantage is that it provides arbitrary-length argument lists
  – older browsers and servers had unnecessary limits on URI lengths in GET requests
  – doesn’t seem to be a problem anymore

Serving dynamic content with POST
• Question: How does the client pass arguments to the server?
• Answer: In the message body of the HTTP request generated by a form.
  – spaces converted to “+”
  – punctuation converted to “%” followed by its ASCII hex value
    » e.g., apostrophe becomes “%27”

add.com HTML form (form.html)

    <html>
    <body>
    <form method=post action="http://add.com/cgi-bin/post.pl">
    <p>Name <input name="name" type=text SIZE="48">
    <p>num1 <input name="num1" type=text SIZE="6">
    <p>num2 <input name="num2" type=text SIZE="6">
    <p><input type=submit> <input type=reset>
    </form>
    </body>
    </html>

HTTP request generated by add.com form

    POST /cgi-bin/post.pl HTTP/1.1
    Accept: */*
    Referer: http://add.com/form.html
    Accept-Language: en-us
    Content-Type: application/x-www-form-urlencoded
    Accept-Encoding: gzip, deflate
    User-Agent: Mozilla/4.0 (compatible; MSIE 4.01; Windows 98)
    Host: add.com
    Content-Length: 36
    CRLF
    name=Dave+O%27Hallaron&num1=1&num2=2

Serving dynamic content with POST
• Question: How does the server pass the arguments to the child?
• Answer: Arguments are passed as one line via stdin.

Serving dynamic content with POST
• Question: How does the server pass other info relevant to the request to the child?
• Answer: As with GET, in a collection of environment variables defined by the CGI spec.
• Question: How does the server capture the content produced by the child?
• Answer: As with GET, via stdout.
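
The lecture shows no child code for POST, so the following is only a sketch of how a child such as post.pl might read and decode the message body, under stated assumptions. The helper names url_decode, parse_form_body, and read_post_body are hypothetical, and an in-memory filehandle stands in for the stdin that the server would supply. A real child would call read_post_body(\*STDIN, $ENV{CONTENT_LENGTH}).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: decode one URL-encoded name or value.
sub url_decode {
    my ($text) = @_;
    $text =~ s/\+/ /g;                               # "+" encodes a space
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;    # "%27" becomes "'"
    return $text;
}

# Hypothetical helper: parse "name=value&name=value" into a hash.
sub parse_form_body {
    my ($body) = @_;
    my %form;
    foreach my $pair (split(/&/, $body)) {
        my ($key, $value) = split(/=/, $pair, 2);
        $value = "" unless defined $value;
        $form{url_decode($key)} = url_decode($value);
    }
    return %form;
}

# Hypothetical helper: read up to $length bytes of message body from $fh.
sub read_post_body {
    my ($fh, $length) = @_;
    my $body = "";
    read($fh, $body, $length);
    return $body;
}

# Demo: the body of the add.com request, served from an in-memory
# filehandle instead of the stdin a real server would provide.
my $request_body = "name=Dave+O%27Hallaron&num1=1&num2=2";
open(my $fake_stdin, "<", \$request_body)
    or die "can't open in-memory file: $!";
my %form = parse_form_body(read_post_body($fake_stdin, length($request_body)));
print "$form{name}: $form{num1} + $form{num2} = ", $form{num1} + $form{num2}, "\n";
# prints "Dave O'Hallaron: 1 + 2 = 3"
```

Reading exactly CONTENT_LENGTH bytes, rather than reading until end of file, matters because the server is not required to close the child’s stdin after the body.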