Optional exercise: Information system design

So far in this worksheet we have explored the relationship between HTML and CSS and have been introduced to the MVC model for application design. With these tools in our toolkit, it is time to try designing our own metadata-rich information service. In designing our service we need to keep in mind that a good information service 1) fills a specific information need, 2) makes efficient use of available data and 3) is durable and easy to use. Let's start by exploring our information need. We will continue by examining possible solutions and will finally design and implement an application that fills our information need.

Information need

As you are aware, libraries purchase licensed resources so that students, faculty and staff can use them to do research. When libraries purchase access to these resources, they have to provide a method of authentication that limits access to authorized users. Often the publishers of these resources make the indexed content, but not the full text, available on the web. For example, if you search Google Scholar you will find citations to millions of articles but will often have trouble getting access to them if you are not on campus.

Libraries provide access to these resources to off-campus patrons using something known as a proxy server. Proxy servers sit in between your client (your web browser) and the destination server and handle (i.e. proxy) all of your requests. The impact of this is that the destination server thinks that you are on campus when in fact you are not! Proxy servers accomplish this by modifying every URL that you click on in your web browser. There are two types of proxies: client-based proxies, which you set up on your own machine, and server-based proxies, which redirect all traffic that you send to them. Most libraries use a server-based proxy to avoid having users configure their own clients. This means that when you visit http://researchport.umd.edu from off campus you will have to log in.
After you log in, you will notice that your URL contains the domain root “proxy-um.researchport.umd.edu”. When the proxy server is between you and the destination server, it actually rewrites every URL on the page you see so that you stay proxied!

Although the proxy server is quite useful, it can be frustrating when we happen across a restricted resource via a Google search and have to re-find the resource using the UM researchport service in order to gain access to it. Wouldn’t it be much nicer if we could push a button that would try to access the restricted resource through the proxy server immediately?

Consideration of possible solutions

It turns out that each HTML document is able to tell us what its URL (Uniform Resource Locator) is. This is the web address that we used to access the document. The URL is a direct path to the resource, and all we need to do is see whether we would have access if we were on campus. If we could just change that URL to make it route through the proxy server, then we would have our solution!

Let's look at an example by exploring URLs when we are proxied.
a. From off campus, go to http://dl.acm.org/citation.cfm?id=1734388
b. You should see a ‘buy this article’ option instead of a link to the full text!
c. If we were going to find this article again, we would have to go to http://researchport.umd.edu, log in, find this database and conduct a search for the article title.

Explore the differences between URLs for our test article.
d. Log in to researchport (http://researchport.umd.edu)
e. Find the ACM Digital Library
f. Search for the article title and navigate to the article
g. Copy the URL from the location bar as you navigate pages. Notice how the domain always includes the root URL in addition to other information?

Identify the elements of the URLs.
h. Copy the URL from step a and the URL that you reached in step f and paste them one above the other.
i. A URL consists of a scheme, a host, a port (optional), a path, a query and a fragment (optional).
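The elements of a URL listed above can also be pulled apart in code. The following is a minimal sketch, assuming an environment that provides the standard URL object (current browsers and Node.js do); the example.com address is made up for illustration so that every element, including the optional ones, is present.

```javascript
// A made-up address (an assumption for illustration) containing every element.
const example = new URL("http://www.example.com:8080/articles/view.cfm?id=42#abstract");

const parts = {
  scheme: example.protocol,   // "http:" (the trailing colon is included)
  host: example.hostname,     // "www.example.com"
  port: example.port,         // "8080"
  path: example.pathname,     // "/articles/view.cfm"
  query: example.search,      // "?id=42" (the leading "?" is included)
  fragment: example.hash,     // "#abstract" (the leading "#" is included)
};

console.log(parts);
```

Note that the scheme is reported with its colon but without the “//” that follows it in the full address. This detail matters when a URL is reassembled from its parts.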
See Figure 1 below to help identify each of these elements in your own URLs.

Figure 1: URL scheme (from http://docstore.mik.ua/orelly/linux/cgi/ch02_01.htm)

Key Questions
Question 1. What are the host values of your two URLs from steps a and f?
Question 2. What elements of the two URLs are different? Why do you think these elements differ? (Note: there are lots of differences and reasons, so there is no need to be exhaustive here!)

The reason proxies work is that they can leverage the Document Object Model (DOM) and scripting languages to rewrite the host values of our URLs. As you can imagine, this can be both very useful and very nefarious. Proxies have also been used in so-called “man in the middle” attacks, which redirect all of your web traffic to another location.

Explore: what is the DOM?

As we recall, the Document Object Model (DOM) is a model for representing documents in web browsers. The DOM applies to a suite of web-based technologies including HTML, XML, JavaScript and CSS. Figure 2 shows the DOM model. Notice that the parent element is the “Window” object. That window has the children history and document. The document object in turn has multiple children.

The DOM and MVC can be viewed as complementary models: together they help us create well-structured web applications. This is possible in part because MVC defines an approach to software design that separates data from logic and display, and the DOM allows us to write abstracted programs that can be applied to multiple documents (or even document types). Without these two concepts we would be limited to integrating our data model (e.g. HTML elements), our design model (e.g. styles, layout and formatting using HTML) and our web service behaviors (e.g. JavaScript commands) into a single document, making it much harder to both create and maintain.

Review the DOM model below and answer the associated questions.

Figure 2: DOM model

Key Questions
Question 3. What is the top element of the DOM?
Question 4. The document element (or object) in this model has what siblings?
Question 5. Think for a moment about having the programmatic ability to manipulate all of the elements of the DOM. One feature of this is the ability to tell the window object to load a new document. Can you think of an example of this in your everyday use of the web? Perhaps one related to the information need we are exploring?

Designing a solution

It turns out that using the DOM and some JavaScript we can access the URL of our non-authenticated page and substitute in the host of our proxy service. While this sounds complicated (and in theory it is somewhat complicated), in practice it is pretty simple! The JavaScript code that we develop can be installed in our browser as a bookmarklet (i.e. a browser bookmark that runs a small piece of JavaScript on a loaded web page) and can be accessed as a simple button in our browser! Figure 3 shows the completed bookmarklet installed in our browser. Let's begin by exploring bookmarklets.

Figure 3: An installed bookmarklet called "ProxyUM"

Look on the web for a definition and example of a bookmarklet. Identify how and where the bookmarklet is used and figure out how to ‘install’ it in your web browser. Be prepared to report back to the class on the bookmarklet you found.

Key Questions
Question 6. What definition did you find for a bookmarklet?
Question 7. Could you find any examples of a bookmarklet on the web?

As we found, bookmarklets tend to be limited in scope and often implement a single-purpose web service. Sometimes this points to a web service at another site, but bookmarklets can also be used for very simple purposes.

Anatomy of a JavaScript statement

JavaScript is a scripting language that runs within the context of a web browser. All web browsers contain a JavaScript interpreter engine to go along with the HTML renderer. You can include JavaScript in a webpage by using the <script type="text/javascript"> element or by including it from another file.
You can invoke JavaScript in a webpage either by running it on page load or by running it in response to specific events. For example, if you want JavaScript to run at the end of a page load, simply call your JavaScript functions in the footer of your HTML document.

1. <div id="footer"><script type="text/javascript">….</script></div>

You can also run JavaScript when a user clicks a link by including a JavaScript statement as the href value of an <a> element.

2. <a href="javascript:…."></a>

JavaScript uses a number of common programming elements including control structures, variables, functions, methods, and objects. Before we move forward, let's quickly define these concepts:

Table 1: Definition of terms

Programming concept | Definition
Control structure | A control structure acts as a gate (e.g. an if/else test) or a cycle (e.g. a loop) that a program must pass through when running. Our program contains no control structures!
Variable | A variable is very similar to an HTML element in that it has a name and a value. Variables are used to store data for the program to use. Our program does not use any variables!
Function | A function is a block of code designed with a specific purpose. Functions are intended to “stand alone” and be relatively self-sufficient. Our program behaves like a single small function that implements our service.
Method | Methods, like functions, are blocks of code that do something specific, but unlike functions, methods are tied directly to objects. The association is a hierarchical one (a method belongs to an object and inherits traits from it).
Object | An object is a set of variables, methods and definitions that form a discrete thing in programs. Objects have real-world analogs: for example, a door has a form, methods (open, close) and statuses (opened, closed, locked). Our objects have similar states and we will work with them extensively in this exercise.
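The terms in Table 1 can be seen side by side in a few lines of JavaScript. This is only an illustrative sketch built on the door analogy from the table; the names door, describeDoor, before and after are hypothetical and are not part of the bookmarklet we build later.

```javascript
// Object: a hypothetical "door" with a status and two methods.
const door = {
  status: "closed",                     // a value stored on the object
  open() { this.status = "opened"; },   // method: belongs to the door object
  close() { this.status = "closed"; },  // method: belongs to the door object
};

// Function: a stand-alone block of code with one purpose.
function describeDoor(someDoor) {
  // Control structure: the if/else acts as a gate.
  if (someDoor.status === "opened") {
    return "The door is open.";
  } else {
    return "The door is not open.";
  }
}

// Variables: names that store values for later use.
const before = describeDoor(door);
door.open();                            // dot syntax: object.method()
const after = describeDoor(door);

console.log(before);
console.log(after);
```

The call door.open() uses the same dot syntax that we will rely on when we ask the window.location object to do something for us.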
Just what you need to know to understand this code

In order to stay focused on our goal, we are just going to explore the elements of the language that we need to create our bookmarklet.

A quick note on referring to objects and methods in JavaScript. As you can see on line 3 below, we have an object (window) with a member (location) and a method (replace). These three elements are connected using dot (.) syntax (e.g. window.location.replace). This hierarchical representation tells our program to execute the replace method on the location member of the window object. The parentheses that open on line 3 and close on line 5 enclose the value that we pass to the replace method. The plus signs on line 4 perform an operation called concatenation, in which strings are joined end to end. On line 4 you see a mix of object references (e.g. window.location.protocol) and text in double quotes. The text in double quotes (e.g. ".proxy-um.researchport.umd.edu") is passed through our program as regular text, while our object references (e.g. window.location.protocol) are evaluated by our JavaScript interpreter.

3. window.location.replace(
4. window.location.protocol + "//" + window.location.host + ".proxy-um.researchport.umd.edu" + window.location.pathname + window.location.search
5. );

Note that line 4 does not add a “?” before window.location.search: the value of window.location.search already begins with one, so adding another would produce “??” in the new URL.

To review, on line 3 we call the method replace on the object window.location. Note that location is a child of the window object and that the method replace tells the window.location object to replace the document it currently has with the one you are going to provide. The method is called using the syntax

6. window.location.replace(REPLACEMENTURL);

Note that the line begins with the method call, encloses the REPLACEMENTURL in parentheses and ends in a semicolon. This is standard syntax for method calls in JavaScript.

Let's try this out by entering JavaScript commands in a browser window.
j. Open your favorite web browser (e.g. Chrome, Firefox, Safari, IE)
k. Type javascript:window.location.replace("http://google.com"); into the location bar
l. Hit enter
m. What happens?

Note: some current browsers strip or ignore the javascript: prefix in the location bar, especially when the text is pasted. If nothing happens, type the prefix by hand or enter the statement (without the prefix) in the browser's developer console.

You notice that we can hard-code a URL to send the document to. As I said before, we can include that JavaScript code as the href value in an <a> element as well, so that when a user clicks on the link it loads a specific page using JavaScript. Normally this would not make much sense, as the href attribute already tells the browser which page to load when you follow the link, but as we will see below, using JavaScript allows us to make that link dynamic!

Let's return to our code example above. Look at line 4. You will see a mix of object references (e.g. window.location.protocol) and hard-coded values (e.g. "//"). You will also see that we concatenate, or combine, each of these values using a plus (+) sign. This is standard JavaScript syntax for combining several values to create a new string. Let's look at that line of code again:

7. window.location.protocol + "//" + window.location.host + ".proxy-um.researchport.umd.edu" + window.location.pathname + window.location.search

Using the JavaScript reference at http://www.w3schools.com/jsref/obj_location.asp, look up each of the properties used in the code line above.
n. window.location.protocol
o. window.location.host
p. window.location.pathname
q. window.location.search

While we are here, let's also look at the meaning of window.location.replace(URL). This method replaces the document in your current window with a new document located at the URL passed to the method. In this web service we use the window.location.replace method to redirect our ACM page to a proxied version.

Before we get too involved with writing JavaScript, let's explore one method of using JavaScript to understand more about our HTML page. In the following steps we will use JavaScript to break apart the value of our acm.org URL into its constituent parts (e.g., protocol, host, pathname and search).
r. Load http://dl.acm.org/citation.cfm?id=1734388 into a web browser of your choice (note: Chrome may be a good browser to use).
s. We can have JavaScript help us decode this URL by entering specific JavaScript commands into the URL bar of our web browser.
t. Try this out by typing “javascript:window.alert(window.location.protocol)” into your URL bar. After you have typed out the script (Figure 4), hit enter. Did you get a pop-up box (e.g., Figure 5)?

Figure 4: JavaScript window.alert method
Figure 5: JavaScript window.alert output

Now let's understand how a real URL maps to each of these properties. Take the URL http://dl.acm.org/citation.cfm?id=1734388 and use the method from step t to extract the value of each of these four properties.
u. window.location.protocol =
v. window.location.host =
w. window.location.pathname =
x. window.location.search =

With our four properties and their values (i.e. name/value pairs) identified, you should see that a few characters of the original URL are missing from the four values. While we will use these four values to help build a new proxied URL, we also have to be aware of the proper syntax for a URL (e.g., protocol://host.com/pathname?search). For each property, what characters were not returned from the URL?
y. window.location.protocol =
z. window.location.host =
aa. window.location.pathname =
bb. window.location.search =

In order to re-insert these characters into our new URL we simply include them as quoted strings (e.g., "//"). When we put characters in quotation marks in JavaScript, we are asking JavaScript to simply pass along the characters without modification. In contrast, when we do not quote characters and strings in JavaScript, the interpreter attempts to map these strings to valid JavaScript names and commands (e.g. window.alert).

Remembering that anything in quotation marks is a non-changing value and that anything in the format window.location.NAME is either a method (i.e. it performs some task) or a value (i.e. it stores some information), take the above URL and apply the code on line 4 to it. Write down the new URL that would be generated if we were to call the window.location.replace method with it.
cc. New URL:

Now, let's copy our new URL and paste it into a web browser. Were you prompted to log in to researchport? Did you pull up a webpage from which you could get the full text?

Let's try it again by taking the code on lines 3-5 and assembling it into a single line of code. Load the ACM URL again and then copy and paste your single line of code, prefixed with javascript:, into the URL bar. Hit enter. What happens? Did you get redirected through the proxy?

A quick note on query strings. A query string is all of the text in a URL that comes after the “?” (and before any “#” fragment). Query strings are used to pass variables to web applications so that they know what to do next. In the above URL, the query string is id=1734388.

Create our bookmarklet

We can see that applying this JavaScript operation to the above URL redirects it through the university proxy server. This means that we can be off campus when we find a resource on the web and use this code to access the resource using the campus proxy server. Now that we have explored the components of our JavaScript statement and understand what it does, let's turn it into a bookmarklet so that we can have simple access to it anytime we need it. Open a text editor and create an HTML page with the following code:

8. <html>
9. <body>
10. <a href="javascript:window.location.replace(window.location.protocol%20+%20'//'%20+%20window.location.host%20+%20'.proxy-um.researchport.umd.edu'%20+%20window.location.pathname%20+%20window.location.search);">test</a>
11. </body>
12. </html>

Save the page to your computer and open it in your web browser. Try the link. What did it do? Reload your page and drag your link to the bookmark toolbar (you may need to tell your web browser to show your bookmark toolbar first). Visit the ACM page again. Click your bookmarklet.
Did it work?
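If the bookmarklet does not behave as expected, it can help to check the URL-building step on its own, outside the browser's location bar. The sketch below is one way to do that, not the worksheet's official solution: buildProxiedUrl and PROXY_SUFFIX are hypothetical names, the suffix is taken from the proxied domain root mentioned earlier in this worksheet, and the standard URL object stands in for window.location so that the code also runs in Node.js.

```javascript
// Assumption: the proxy host suffix is the domain root quoted in this worksheet.
const PROXY_SUFFIX = ".proxy-um.researchport.umd.edu";

// Hypothetical helper: builds the proxied address from any URL string.
function buildProxiedUrl(href) {
  const url = new URL(href);  // parses the string into protocol, host, etc.
  // url.protocol ends with ":" (e.g. "http:"), so "//" is added by hand.
  // url.search already starts with "?" (or is empty when there is no query).
  return url.protocol + "//" + url.host + PROXY_SUFFIX + url.pathname + url.search;
}

const original = "http://dl.acm.org/citation.cfm?id=1734388";
const proxied = buildProxiedUrl(original);

console.log(proxied);
```

Inside a browser, the same helper could be called as window.location.replace(buildProxiedUrl(window.location.href)). Like the bookmarklet, this sketch drops any “#” fragment from the original address.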