Markup Languages and Web Programming

Objectives
• to learn basic HTML
  – and how to make web pages on our department server (because it is useful)
  – to understand the layout algorithm behind browsers
• to learn basic XML
  – as an example of markup languages for structured data representation
• to use XSL to translate from XML to HTML
  – to learn the value of separating data from code or view
• to talk about types of scripting

HTML
• HyperText Markup Language
• this is what is behind what you see on a web page (type Ctrl-U to ‘view source’)
• early design principle for the web: describe the content, let the browser figure out how to display it
  – examples: line breaks/wrapping, fonts
  – “device-independent”, e.g. terminals that don’t support graphics...
• Tags:
    <HTML>
    <HEAD>
    <TITLE>This is my web page</TITLE>
    </HEAD>
    <BODY>
    <H1>heading</H1>
    Here is some text.
    </BODY>
    </HTML>
  the same page as a tree:
    HTML
      HEAD
        TITLE: This is my web page
      BODY
        H1: heading
        Here is some text.
• Why tags?
  – advantages for parsing
  – can match up open-tags with close-tags
  – they represent the hierarchical structure of the data
• More Tags:
  – <B>boldface</B>, <I>italics</I>
  – <BR> line break, <P> paragraph break, <HR> horizontal rule
  – <!-- comments -->
• Lists:
  – <UL> for unordered lists (bullets)
  – <OL> for ordered lists (numbered)
    <UL>
    <LI>list item
    </UL>
• Note:
  – browsers are actually designed to be flexible and accept loose syntax without properly closed tags
  – a shorthand to close a tag is: <BR/> = <BR></BR>
• Tables
    <TABLE border=1>
    <TR><TD>A<TD>B</TR>
    <TR><TD>C<TD>D</TR>
    </TABLE>
  displays as:
    A  B
    C  D
• Hyperlinks
  – <A HREF="http://www.tamu.edu">TAMU</A>
• Images
  – <IMG SRC="https://www.google.com/images/srpr/logo4w.png">
• of course, you can do many other things, like changing fonts and colors, specifying background colors/images, etc...
  – see this for HTML documentation: http://www.w3schools.com/html/default.asp
• It is important to see what is behind web pages, and to know how to write it by hand.
  – what you see visually is described in the file
  – think about lists and tables
    • we don’t say “put a bullet with a certain indent here...”
    • we say “here is the next item in the list”
  – the browser uses a layout algorithm to determine where to place things, what size to make them, etc.
    • example: how to determine column widths in tables based on content?
        <TABLE border=1>
        <TR><TD>A<TD>narrower</TR>
        <TR><TD>a very wide wide column<TD>D</TR>
        </TABLE>

Markup Languages
• different systems of tags
• There are many markup languages
  – SGML: book contents, for publishers
    • <chapter>, <abstract>, <subsection>...
  – VRML: virtual reality, with tags for describing geometric objects and their positions in 3D
  – MathML: tags for describing formulas
    • √2: <msqrt><mn>2</mn></msqrt>
    • ax²: <mrow><mi>a</mi><msup><mi>x</mi><mn>2</mn></msup></mrow>
  – XML: eXtensible Markup Language
• XML: make up your own tags for representing arbitrary data
  – example: <author>H.G. Wells</author>
  – partly, this was a response to the “semistructured” TABLEs in HTML
  – people didn’t know what the <TD> values meant semantically
  – tags “mark up” or describe the data items
    • also known as metadata
    • data about the data, such as field name, source, units, etc.
    • can also use attributes
    • <price date="9/29/2013" units="euros">2.50</price>
• Nobel Prize data, first in HTML, then in XML:
    <H1>Nobel Prizes</H1>
    <TABLE border=1>
    <TR><TD>Robert G. Edwards<TD>Medicine <TD>2010</TR>
    <TR><TD>Dan Shechtman <TD>Chemistry<TD>2011</TR>
    </TABLE>

    <NobelPrizes>
    <winner>
    <name>Robert G.
    Edwards</name>
    <area>Medicine</area>
    <year>2010</year>
    </winner>
    <winner>
    <name>Dan Shechtman</name>
    <area>Chemistry</area>
    <year>2011</year>
    </winner>
    </NobelPrizes>
  (the version above is in XML)
• there are good parsers available for reading XML files in different languages
  – xerces for Java and C++
  – minidom for Python
  – these APIs provide a parsing function:
    • input: a filename
    • output: the data in a tree-based data structure
• note: XML requires strict syntax
  – every open tag must be properly closed (and not interleaved)
• comparing XML to flat files or .CSV format
  Courses.csv (tab-separated):
    course      title
    CSCE 411    Design and Analysis of Algorithms
    CSCE 121    Introduction to Computing in C++
    CSCE 314    Programming Languages
    CSCE 206    Programming in C
  Courses.csv (comma-separated):
    "course","title"
    "CSCE 411","Design and Analysis of Algorithms"
    "CSCE 121","Introduction to Computing in C++"
    "CSCE 314","Programming Languages"
    "CSCE 206","Programming in C"
• tab-separated or comma-separated
• data laid out in rows and columns, like a spreadsheet
  Courses.xml:
    <courses>
    <course>
    <name>CSCE 411</name>
    <title>Design and Analysis of Algorithms</title>
    </course>
    <course>
    <name>CSCE 121</name>
    <title>Introduction to Computing in C++</title>
    </course>
    </courses>
• XML is less compact (more verbose)
• each item is explicitly labeled
• more flexible: can have 0 or >1 titles, fields in any order
• Now we need a way to display data in XML
  – browsers show XML in raw form by default
  – use XSLT to “translate” XML data into HTML
    • eXtensible Stylesheet Language Transformations
    • http://www.w3schools.com/xsl/xsl_languages.asp
  1. make up a stylesheet (.xsl) file
  2.
     add a reference to the stylesheet from your .xml file
     – this tells the browser how to display the data
    <?xml version="1.0" ?>
    <?xml-stylesheet type="text/xsl" href="books.xsl" ?>
    <BOOKS>
    <book>
    <title>Moby Dick</title>
    <author>Herman Melville</author>
    </book>
    <book>
    <title>Crime and Punishment</title>
    <author>Fyodor Dostoevsky</author>
    </book>
    <owner>Tom</owner>
    </BOOKS>
• XSL files can have HTML code in them, “wrapped” around the data
• Data items in the XML file can be referenced by XPATHs
    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="html" indent="yes"/>
    <xsl:template match="/">
    <HTML>
    <BODY>
    <H1>Library of <xsl:value-of select="BOOKS/owner"/></H1>
    ...
    </BODY>
    </HTML>
    </xsl:template>
    </xsl:stylesheet>
• XPATHs are a way to name and access data items hierarchically by descending a sequence of tags in the XML file
    <H1>Library of <xsl:value-of select="BOOKS/owner"/></H1>
    <TABLE border="1">
    <TR><TH>Author</TH><TH>Title</TH></TR>
    <xsl:for-each select="BOOKS/book">
    <TR>
    <TD><xsl:value-of select="author"/></TD>
    <TD><xsl:value-of select="title"/></TD>
    </TR>
    </xsl:for-each>
    </TABLE>
  rows generated by the for-each, one per book:
    <TR>
    <TD>Herman Melville</TD>
    <TD>Moby Dick</TD>
    </TR>
    <TR>
    <TD>Fyodor Dostoevsky</TD>
    <TD>Crime and Punishment</TD>
    </TR>

XPATHs
    <MEDIA>
    <book>
    <title>Moby Dick</title>
    <author>Herman Melville</author>
    </book>
    <book>
    <title>Crime and Punishment</title>
    <author>Fyodor Dostoevsky</author>
    </book>
    <movie>
    <title>AI</title>
    <director>S. Spielberg</director>
    <studio>Dreamworks</studio>
    <distr>Warner Bros.</distr>
    </movie>
    </MEDIA>
• <xsl:value-of select="MEDIA/movie/studio"/> = Dreamworks
• the same data as a tree:
    MEDIA
      book
        title: Moby Dick
        author: H. Melville
      book
        title: Crime&Punish.
        author: F. Dostoevsky
      movie
        title: AI
        director: S. Spielberg
        studio: Dreamworks
        distributor: Warner Bros.
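The path lookup can also be tried outside a browser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath. The XML string is a copy of the MEDIA example with the studio element holding Dreamworks, matching the result stated on the slide. The helper names value_of and book_rows are made up for illustration and are not part of XSLT or of any library.

```python
import xml.etree.ElementTree as ET

# Copy of the MEDIA example (assumption: studio holds "Dreamworks",
# which is the value the slide reports for MEDIA/movie/studio).
MEDIA_XML = """<MEDIA>
  <book><title>Moby Dick</title><author>Herman Melville</author></book>
  <book><title>Crime and Punishment</title><author>Fyodor Dostoevsky</author></book>
  <movie>
    <title>AI</title>
    <director>S. Spielberg</director>
    <studio>Dreamworks</studio>
    <distr>Warner Bros.</distr>
  </movie>
</MEDIA>"""


def value_of(root, path):
    """Hypothetical stand-in for <xsl:value-of select="path"/>.

    ElementTree paths start below the root element, so the leading
    root tag (here MEDIA) is checked and then stripped from the path.
    Returns the text of the first match, or "" if nothing matches.
    """
    head, _, rest = path.partition("/")
    if head != root.tag:
        return ""
    if not rest:
        return (root.text or "").strip()
    return (root.findtext(rest) or "").strip()


def book_rows(root):
    """Hypothetical stand-in for <xsl:for-each select=".../book">:
    one (author, title) pair per book, in document order."""
    return [(b.findtext("author"), b.findtext("title"))
            for b in root.findall("book")]


root = ET.fromstring(MEDIA_XML)

if __name__ == "__main__":
    print(value_of(root, "MEDIA/movie/studio"))
    for author, title in book_rows(root):
        print(author, "|", title)
```

Each step of the path descends one level of tags, which is the same idea the stylesheet relies on. A full XSLT processor does much more (templates, output methods), so this only illustrates the addressing scheme.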
Separating Data from View/Code
• general principle used in software engineering
• can change the view without touching the data
  – e.g. swap the columns in the books table via XSL
• can change the data without touching the code
  – e.g. internationalization: different sets of text strings in different languages
• MVC (Model-View-Controller) paradigm advocated for programming in Smalltalk
  – M: methods defining how objects work
  – V: methods defining how they are displayed
  – C: methods defining how users interact with them
• “resource forks” in Mac apps
• Making your own web pages in our CSCE department
  – follow these instructions...
  – https://wiki.cse.tamu.edu/index.php/CSE_Web_Pages
  – make a web_home/ directory in your home directory
  – can access from PCs in labs via “H:” drive
  – note: make sure you make .html pages readable by setting permissions

Web Programming
• scripting can make web pages interactive
• client-side vs. server-side processing
  – client-side: JavaScript
  – server-side: CGI, Perl, Python, PHP
[Figure: a client (amazon.com page for a C++ book) and a server. Client-side: JavaScript embedded in the .html changes its appearance dynamically. Server-side: a CGI request is sent when Submit is pressed on a form; the response is a new .html page, e.g. a receipt. Server image borrowed from http://cliffmass.blogspot.com/2012/06/weather-x.html]

Client-side: JavaScript
• examples:
  – popups when you mouse-over something
  – dynamically expand a table or section
  – validate data entered into a field
• how it works
  – associate events like onmouseover or onclick with components of the page (like buttons)
  – add a <script> section in the <head> of your .html
  – define functions to call on these events
• Example from http://www.w3schools.com/js/js_popup.asp:
    <html>
    <head>
    <script>
    function myFunction() {
      alert("I am an alert box!");
    }
    </script>
    </head>
    <body>
    <input type="button" onclick="myFunction()" value="Show alert box">
    </body>
    </html>
• JavaScript can do all sorts of things here:
  – define variables
  – do calculations
  – change look of page
  – update text values
  – popup a dialog box
  – trigger a sound

Server-side: CGI
• CGI = Common Gateway Interface
• FORMs
  – web-page elements like buttons, text-entry fields, drop-downs, etc.
  – these refer to a script on the server which processes the input
  – data gets passed to the server as pairs of variables and values
  – the script generates a response .html page as output
• .html file:
    <FORM name="form1" method="post" action="http://saclab.tamu.edu/cgibin/tom/add.py">
    <H3>Enter 2 numbers to add:</H3>
    A: <input type="text" name="A"> <BR>
    B: <input type="text" name="B"> <BR>
    <input type="submit" value="Submit">
    </FORM>
• .cgi file (executes on the server):
    #!/usr/bin/python
    # note: the cgi module was removed from the standard library in Python 3.13
    import cgi

    if __name__ == "__main__":
        form = cgi.FieldStorage()           # variable/value pairs sent by the form
        a = int(form['A'].value)
        b = int(form['B'].value)
        c = a + b
        print("Content-type: text/html")    # HTTP header
        print("")                           # blank line ends the headers
        print("<HTML><BODY>")
        print("A+B = %s+%s = %s" % (a, b, c))
        print("</BODY></HTML>")
• what is sent back to the browser on the client to display in response:
    <HTML><BODY>
    A+B = 5+10 = 15
    </BODY></HTML>
• other examples: checkboxes, radio buttons, drop-downs...
    <BR>text field: <input type="text" name="state">
    <BR>button: <INPUT type="submit" value="Press Me!">
    <BR>radio buttons:
    VISA <INPUT TYPE="radio" NAME="payment" value="V">
    Mastercard <INPUT TYPE="radio" NAME="payment" value="M">
    AMEX <INPUT TYPE="radio" NAME="payment" value="A">
    <BR>checkboxes:
    <input type="checkbox" name="vote" value="yes"> Yes
    <input type="checkbox" name="vote" value="no"> No
    <BR>drop-down:
    <select name="shipping">
    <option>land</option>
    <option>sea</option>
    <option>air</option>
    </select>
• CGI script sees:
    state = Texas
    payment = M
    vote = yes
    shipping = land
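To make "pairs of variables and values" concrete: with the default form encoding, the browser sends the fields as a single string such as A=5&B=10. The sketch below, assuming only Python 3's standard urllib.parse module, decodes such a string and builds the same response page as the add script. The function names parse_form and add_page are made up for illustration, and no web server is involved; a real CGI script would read the string from its request rather than from a literal.

```python
from urllib.parse import parse_qs


def parse_form(body):
    """Decode a url-encoded form body into a dict of variable -> value.

    parse_qs returns a list of values per variable (a variable may repeat);
    this hypothetical helper keeps only the first value of each.
    """
    return {name: values[0] for name, values in parse_qs(body).items()}


def add_page(body):
    """Build the response page for the 'add two numbers' form."""
    form = parse_form(body)
    a = int(form["A"])
    b = int(form["B"])
    return "<HTML><BODY>\nA+B = %s+%s = %s\n</BODY></HTML>" % (a, b, a + b)


if __name__ == "__main__":
    # What the script would see for the second example form.
    print(parse_form("state=Texas&payment=M&vote=yes&shipping=land"))
    # What would be sent back for A=5, B=10.
    print(add_page("A=5&B=10"))
```

Keeping the parsing and the page generation in separate functions lets each be checked without a browser, which is convenient when debugging a form.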