03_acq_javascript_worksheet

Optional exercise: Information system design
So far in this worksheet we have explored the relationship between HTML and
CSS and have been introduced to the MVC model for application design. With
these tools in our toolkit it is time to try designing our own metadata-rich
information service. In designing our service we need to keep in mind that a
good information service 1) fills a specific information need, 2) makes
efficient use of available data and 3) is durable and easy to use. Let's start by
exploring our information need. We will continue by examining possible solutions
and will finally design and implement an application that fills our information
need.
Information need
As you are aware, libraries purchase licensed resources so that students, faculty
and staff can use them to do research. When libraries purchase access to these
resources they have to provide a method of authentication to limit access to
them. Often the publishers of these resources make the indexed content, but not
the full text, available on the web. For example, if you search Google Scholar you
will find citations to millions of articles but will often have trouble getting access to
them if you are not on campus.
Libraries provide access to these resources to off-campus patrons using
something known as a proxy server. Proxy servers sit in between your user
client and the destination server and handle (i.e. proxy) all of your requests. The
impact of this is that the destination server thinks that you are on campus when
in fact you are not!
Proxy servers accomplish this by modifying every URL that you click on in your
web-browser. There are two types of proxies: client-based proxies (which you
set up on your own machine) and server-based proxies, which redirect all traffic
that you send to them.
Most libraries use a server-based proxy to avoid having users configure their
own clients. This means that when you visit http://researchport.umd.edu from off
campus you will have to log in. After you do, you will notice that your URL
contains the domain root “proxy-um.researchport.umd.edu.” When the proxy
server is between you and the destination server it actually rewrites every URL
on the page you see so that you stay proxied!
Although the proxy server is quite useful, it can be frustrating when we happen
across a restricted resource via a Google search and have to re-find the resource
using the UM researchport service in order to gain access to it. Wouldn’t it be
much nicer if we could push a button that would try to access the restricted
resource through the proxy server immediately?
Consideration of possible solutions
It turns out that each HTML document is able to tell us what its URL (Uniform
Resource Locator) is. This is the web address that we used to access the
document. The URL is a direct path to the resource and all we need to do is see
if we could have access if we were on campus. If we could just change that URL
to make it route through the proxy server then we would have our solution!
Let's look at an example by exploring URLs when we are proxied
a. From off campus go to http://dl.acm.org/citation.cfm?id=1734388
b. You should see a ‘buy this article’ link instead of a link to the full text!
c. If we were going to find this article again we would have to go to
http://researchport.umd.edu, log in, find this database and conduct a
search for the article title.
Explore the differences between URLs for our test article
d. Log in to researchport (http://researchport.umd.edu)
e. Find the ACM Digital Library
f. Search for the article title and navigate to the article
g. Copy the URL from the bar as you navigate pages. Notice how the
domain always includes the root URL in addition to other
information?
Identify the elements of the URLs
h. Copy the URL from Step 9a and the URL that you got in step 10c
and paste them one above the other.
i. A URL consists of a scheme, a host, a port (optional), a path, a
query and a fragment (optional). See Figure 1 below to help
identify each of these elements in your own URLs.
Figure 1: URL scheme (from http://docstore.mik.ua/orelly/linux/cgi/ch02_01.htm)
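To see these elements in code, here is a minimal sketch that uses JavaScript's built-in URL object to split an address into its parts. The address is a made-up example (it is not one of the worksheet URLs), and the name parts is only an illustrative variable.

```javascript
// Hypothetical URL used only for illustration.
const example = new URL("http://www.example.com:8080/articles/view.html?id=42#summary");

const parts = {
  scheme: example.protocol,   // the scheme plus a trailing colon
  host: example.hostname,     // host name only (example.host would also include the port)
  port: example.port,         // empty string when the URL uses the default port
  path: example.pathname,     // begins with "/"
  query: example.search,      // includes the leading "?"
  fragment: example.hash      // includes the leading "#"
};

console.log(parts);
```

The same property names (protocol, host, pathname, search) are available on window.location in a web-browser.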
Key Questions
Question 1. What are the host values of your two URLs from 9a and 10c?
Question 2. What elements of the two URLs are different? Why do you think
these elements differ? (Note: there are lots of differences and reasons,
so there is no need to be exhaustive here!)
Proxies work because they can leverage the Document Object
Model (DOM) and scripting languages to rewrite the host values of our URLs.
As you can imagine, this can be both very useful and very nefarious. Proxies
have also been used in so-called “man in the middle” attacks, which redirect all of your web traffic to another location.
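The host-rewriting idea can be sketched in a few lines of JavaScript. This is only an illustration of the substitution itself, not how the campus proxy is actually implemented. The function name proxify and the example links are hypothetical, and the proxy suffix is assumed from the domain root shown earlier in this worksheet.

```javascript
// Assumed proxy suffix, taken from the domain root shown earlier in this worksheet.
const PROXY_SUFFIX = ".proxy-um.researchport.umd.edu";

// Rewrite one absolute URL so that its host routes through the proxy.
// URLs that are already proxied are left unchanged.
function proxify(href) {
  const url = new URL(href);
  if (!url.hostname.endsWith(PROXY_SUFFIX)) {
    url.hostname = url.hostname + PROXY_SUFFIX;
  }
  return url.href;
}

// Hypothetical links standing in for the href values found on a page.
const links = [
  "http://www.example.com/journal/issue1.html",
  "http://www.example.com/search?q=metadata"
];

const rewritten = links.map(proxify);
console.log(rewritten);
```

Only the host changes. The scheme, path and query of each link are carried over untouched.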
Explore – what is the DOM?
As we recall, the Document Object Model (DOM) is a model for representing
documents in web-browsers. The DOM applies to a suite of web-based
technologies including HTML, XML, JavaScript and CSS. Figure 2 shows the
DOM model. Notice that the parent element is the “Window” object. That
window has the children history and document. The document object in turn has
multiple children. The DOM and MVC can be viewed as complementary models
in that they help us create web applications using MVC design. This is possible
in part because MVC defines an approach to software design that separates data
from logic and display and the DOM allows us to write abstracted programs that
can be applied to multiple documents (or even document types). Without these
two concepts we would be limited to integrating our data model (e.g. HTML
elements), our design model (e.g. styles, layout and formatting using HTML) and
our web service behaviors (e.g. JavaScript commands) into a single document,
making it much harder to both create and maintain.
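One way to picture the parent and child relationships described above is as a tree of nested objects. The sketch below is a simplified stand-in built from plain JavaScript objects, not the real browser DOM. The two children listed under document are hypothetical placeholders, and outline is an illustrative helper name.

```javascript
// A simplified stand-in for the DOM: plain objects, not real browser objects.
// The children of "document" are hypothetical examples.
const domModel = {
  name: "window",
  children: [
    { name: "history", children: [] },
    {
      name: "document",
      children: [
        { name: "link", children: [] },
        { name: "form", children: [] }
      ]
    }
  ]
};

// Walk the tree and return one indented line per node.
function outline(node, depth = 0) {
  const line = "  ".repeat(depth) + node.name;
  const childLines = node.children.flatMap((child) => outline(child, depth + 1));
  return [line, ...childLines];
}

console.log(outline(domModel).join("\n"));
```

Because every node has the same shape, one small function can walk the whole tree. This is the kind of abstraction that lets a program apply to many different documents.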
Review the DOM model below and answer the associated questions
Figure 2: DOM model
Key Questions
Question 3. What is the top element of the DOM?
Question 4. The document element (or object) in this model has what siblings?
Question 5. Think for a moment about having the programmatic ability to
manipulate all of the elements of the DOM. One feature of this is the ability
to tell the window object to load a new document. Can you think of an example
of this in your everyday use of the web? Perhaps one related to the
information need we are exploring?
Designing a solution
It turns out that using the DOM and some JavaScript we can access the URL of
our non-authenticated page and substitute the host of our proxy service. While
this sounds complicated (and in theory it is somewhat complicated), in practice it is
pretty simple! The JavaScript code that we develop can be installed in our
browser as a bookmarklet (i.e. a browser bookmark that runs a small piece of
JavaScript on a loaded web-page) and can be accessed as a simple button in
our browser! Figure 3 shows the completed bookmarklet installed in our browser.
Let's begin by exploring bookmarklets.
Figure 3: An installed bookmarklet called "ProxyUM"
Look on the web for a definition and example of a bookmarklet. Identify how and
where the bookmarklet is used and figure out how to ‘install’ it in your
web-browser. Be prepared to report back to the class on the
bookmarklet you found.
Key Questions
Question 6. What definition did you find for a bookmarklet?
Question 7. Could you find any examples of a bookmarklet on the web?
As we found, bookmarklets tend to be limited in scope and often implement a
single purpose web-service. Sometimes this points to a web-service at another
site but bookmarklets can also be used for very simple purposes.
Anatomy of a JavaScript statement
JavaScript is a scripting language that runs within the context of a web-browser.
All web-browsers contain a JavaScript interpreter engine to go along with the
HTML renderer. You can include JavaScript in a webpage by using the <script
type="text/javascript"> element or by including it from another file. You can invoke
JavaScript in a webpage by either running it on page load or on specific events.
For example, if you want to have JavaScript run at the end of a page load, simply
call your JavaScript functions in the footer of your HTML document.
<div id="footer"><script type="text/javascript">….</script></div>
You can also run JavaScript when a user clicks a link by including a JavaScript
function as part of an <a href> element.
<a href="javascript:…."></a>
JavaScript uses a number of common programming elements including control
structures, variables, functions, methods, and objects. Before we move forward,
let's quickly define these concepts:
Table 1: Definition of terms
Programming concept | Definition
Control structure | A control structure acts as a gate or cycle that a program must pass through when running. Our program contains no control structures!
Variable | A variable is very similar to an HTML element in that it has a name and a value. Variables are used to store data for the program to use. Our program does not use any variables!
Functions | A function is a block of code designed with a specific purpose. Functions are intended to “stand alone” and be relatively self-sufficient. Our program creates a single function that implements our service.
Methods | Methods, like functions, are blocks of code that do something specific, but unlike functions, methods are tied directly to objects. The association is a hierarchical one (a method inherits traits from its object and belongs to an object).
Objects | An object is a set of variables, methods and definitions that form a discrete thing in programs. Objects have real-world analogs. For example, a door has a form, methods (open, close) and statuses (opened, closed, locked). Our objects have similar states and we will work with them extensively in this exercise.
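The terms in Table 1 can be seen together in one short sketch built around the door analogy. All of the names here (door, describe, before, after) are illustrative only and are not part of the bookmarklet we will build.

```javascript
// Object: a door with a status (a value stored on the object) and two methods.
const door = {
  status: "closed",
  open() { this.status = "opened"; },   // method: tied to the door object
  close() { this.status = "closed"; }   // method: tied to the door object
};

// Function: a stand-alone block of code with one purpose.
function describe(d) {
  // Control structure: the program takes one path or the other.
  if (d.status === "opened") {
    return "The door is open.";
  } else {
    return "The door is not open.";
  }
}

// Variables: names that store values for later use.
const before = describe(door);
door.open();                  // call a method using dot syntax
const after = describe(door);

console.log(before, after);
```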
Just what you need to know to understand this code
In order to stay focused on our goal, we are just going to explore the elements of
the language that we need to create our bookmarklet.
A quick note on referring to objects and methods in JavaScript. As you can see
on line 3 of the code below, we have an object (window) with a member (location)
and a method (replace). These three elements are connected using dot (.) syntax
(e.g. window.location.replace). This hierarchical representation tells our
program to execute the replace method on the location member of the window
object.
The parentheses that open on line 3 and close on line 5 enclose the argument
(here, the new URL) that we pass to the replace method. The plus signs on line 4
perform an operation called concatenation, in which strings are joined end to end.
On line 4 you see a mix of object references (window.location.protocol) and text
in double quotes. The text in double quotes (e.g. “.proxy-um.researchport.umd.edu”)
is passed through our program as regular text, while
our object references (window.location.protocol) are evaluated by our JavaScript
interpreter.
3. window.location.replace(
4. window.location.protocol + "//" + window.location.host + ".proxy-um.researchport.umd.edu" + window.location.pathname + window.location.search
5. );
To review, on line 3 we call the method replace on the object window.location.
Note that location is a child of the window object and that the method replace
tells the window.location object to replace the document it currently has with the
one you are going to provide. The method is called using the syntax
window.location.replace(REPLACEMENTURL);
Note that the line begins with a method call, encloses the REPLACEMENTURL
in parentheses and ends in a semicolon. This is standard syntax for all method
calls in JavaScript.
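Because window.location only exists inside a web-browser, the sketch below uses a stand-in object so that the same call pattern can be run anywhere. The names buildProxiedUrl, fakeLocation and loadedUrl are hypothetical, and the page address is a made-up example. Note that the search value already carries its own leading “?”.

```javascript
// Build the proxied URL from a location-like object (anything with
// protocol, host, pathname and search properties).
function buildProxiedUrl(loc) {
  return loc.protocol + "//" + loc.host +
    ".proxy-um.researchport.umd.edu" + loc.pathname + loc.search;
}

// A stand-in for window.location so the sketch can run outside a browser.
// Its replace method only records the URL it is given.
const fakeLocation = {
  protocol: "http:",
  host: "www.example.com",
  pathname: "/articles/view.html",
  search: "?id=42",
  loadedUrl: null,
  replace(url) { this.loadedUrl = url; }
};

// Same shape as the method call in the worksheet: object.method(argument);
fakeLocation.replace(buildProxiedUrl(fakeLocation));
console.log(fakeLocation.loadedUrl);
```

In a real browser the replace method would load the new page instead of just storing the address.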
Let's try this out by entering JavaScript commands in a browser window.
j. Open your favorite web-browser (e.g. Chrome, Firefox, Safari, IE)
k. Type javascript:window.location.replace("http://google.com"); into
the location bar
l. Hit enter
m. What happens?
Notice that we can hard-code a URL to send the document to. As noted
earlier, we can also include that JavaScript code as the href value of an <a> element
so that when a user clicks on the link it loads a specific page using
JavaScript. Normally this would not make much sense, as the href attribute already
tells the browser to load whatever link you follow, but as we will see below,
using JavaScript allows us to make that link dynamic!
Let's return to our code example above. Look at line 4. You will see a mix of
objects (e.g. window.location.protocol) and hard-coded values (e.g. “//”). You will
also see that we concatenate, or combine, each of these values using an addition
(+) sign. This is standard JavaScript syntax for combining several values to
create a new string. Let's look at that line of code again:
window.location.protocol + "//" + window.location.host + ".proxy-um.researchport.umd.edu" + window.location.pathname + window.location.search
Using the JavaScript reference at
http://www.w3schools.com/jsref/obj_location.asp, look up each of the
objects referenced in the code line above
n. window.location.protocol
o. window.location.host
p. window.location.pathname
q. window.location.search
While we are here, let's also look at the meaning of
window.location.replace(URL). This method replaces the document in
your current window with a new document located at the URL passed
to the method. In this web service we use the window.location.replace
method to redirect our ACM page to a proxied version.
Before we get too involved with writing JavaScript, let's explore one method of
using JavaScript to understand more about our HTML page. In the
following steps we will use JavaScript to break apart the value of our
acm.org URL into its constituent parts (e.g., protocol, host, pathname
and search).
r. Load http://dl.acm.org/citation.cfm?id=1734388 into a web-browser
of your choice (Note: Chrome may be a good browser to use).
s. We can have JavaScript help us decode this URL by entering
specific JavaScript commands into the URL bar of our
web-browser.
t. Try this out by typing
“javascript:window.alert(window.location.protocol)” into your URL
bar. After you have typed out the script (Figure 4), hit enter. Did
you get a pop-up box (e.g., Figure 5)?
Figure 4: JavaScript window.alert method
Figure 5: JavaScript window.alert output
Now let's understand how a real URL maps to each of these objects. Take the
URL http://dl.acm.org/citation.cfm?id=1734388 and use the method
from step 17 to extract the value of each of these four variables.
u. window.location.protocol =
v. window.location.host =
w. window.location.pathname =
x. window.location.search =
With our four variables and their values (e.g. element/value pairs) identified, you
should see that a few characters are missing from the four elements.
While we will use these four elements to help build a new proxied URL
we also have to be aware of the proper syntax for a URL (e.g.,
Protocol://host.com/pathname?search). For each element, what
characters were not returned from the URL?
y. window.location.protocol =
z. window.location.host =
aa. window.location.pathname =
bb. window.location.search =
In order to re-insert these characters into our new URL we simply include them
as quoted strings (e.g., “//”). When we put characters in quotations in
JavaScript, we are asking JavaScript to simply pass along the
characters without modification. In contrast, when we do not quote
characters and strings in JavaScript, the program attempts to map
these strings into valid JavaScript commands (e.g. window.alert).
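The difference between quoted and unquoted text can be seen in a short sketch. The object loc is a hypothetical stand-in for window.location (so the code can run outside a browser), and its values are made-up examples.

```javascript
// A stand-in for window.location with made-up values.
const loc = { protocol: "http:", host: "www.example.com" };

const quoted = "loc.protocol";   // quoted: the characters are passed along unchanged
const unquoted = loc.protocol;   // unquoted: JavaScript looks up the stored value

// Quoted pieces supply the characters that the stored values leave out.
const start = loc.protocol + "//" + loc.host;

console.log(quoted);    // prints the literal text loc.protocol
console.log(unquoted);  // prints the stored value http:
console.log(start);     // prints http://www.example.com
```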
Remembering that anything in “” is a non-changing value and that anything in the
format window.location. is either a method (i.e. it performs some task)
or a value (i.e. it stores some information), take the above URL and
apply the code in line 4 to it. Write down the new URL that
would be generated if we were to call the window.location.replace
method on it.
cc. New URL:
Now, let's copy our new URL and paste it into a web-browser. Were you prompted
to log in to researchport? Did you pull up a webpage from which you
could get the full text?
Let's try it again by taking the code on lines 3-5 and assembling it into a
single line of code.
Load the ACM URL again and then copy and paste your single line of code from
step 10 into the URL bar. Hit enter. What happens? Did you get redirected through the proxy?
A quick note on query strings.
A query string is all of the text in a URL that comes after the “?”. Query strings
are used to pass variables to web applications so that they know what to do next.
In the above URL, the query string is id=1734388.
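As a small sketch of how a program might read a query string, the built-in URL and URLSearchParams objects can pull the id variable out of our ACM URL. The variable names article, query, params and id are illustrative only. Notice that the search property keeps the leading “?”, while URLSearchParams ignores it.

```javascript
const article = new URL("http://dl.acm.org/citation.cfm?id=1734388");

const query = article.search;               // the query string, with its leading "?"
const params = new URLSearchParams(query);  // parses the name=value pairs
const id = params.get("id");                // the value passed for "id"

console.log(query, id);
```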
Create our bookmarklet
We can see that applying this JavaScript operation to the above URL redirects it
through the university proxy server. This means that we can be off campus when
we find a resource on the web and use this code to access the resource using
the campus proxy server. Now that we have explored the components of our
JavaScript function and understand what it does, let's turn it into a bookmarklet so
that we can have simple access to this function anytime we need it.
Open a text editor and create an HTML page with the following code:
<html>
<body>
<a href="javascript:window.location.replace(window.location.protocol%20+%20'//'%20+%20window.location.host%20+%20'.proxy-um.researchport.umd.edu'%20+%20window.location.pathname%20+%20window.location.search);">test</a>
</body>
</html>
Save the page to your computer and open it in your web-browser.
Try the link. What did it do?
Reload your page and drag your link to the bookmark toolbar (you may need to
tell your web-browser to show your bookmark toolbar first).
Visit the ACM page again. Click your bookmarklet. Did it work?
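If you want to build your own bookmarklet links, the href value can be generated from readable code. The sketch below is one possible approach, not the only one: toBookmarkletHref is a hypothetical helper that adds the javascript: prefix and encodes spaces as %20, matching the style of the link in our HTML page.

```javascript
// The statement the bookmarklet should run, written with ordinary spaces.
const source =
  "window.location.replace(window.location.protocol + '//' + " +
  "window.location.host + '.proxy-um.researchport.umd.edu' + " +
  "window.location.pathname + window.location.search);";

// Hypothetical helper: prefix the javascript: scheme and encode spaces as %20.
function toBookmarkletHref(code) {
  return "javascript:" + code.replace(/ /g, "%20");
}

const href = toBookmarkletHref(source);
console.log(href);
```

The printed value can be pasted into the href attribute of an <a> element. Single quotes are used inside the code so that they do not clash with the double quotes around the attribute value.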