Aventail Web Translation Web Developer Guide
May 2007

© Aventail Corporation 2007. All rights reserved. Aventail, Aventail.Net, Aventail ExtraNet Center, Aventail ExtraWeb, Aventail ExtraNet, Aventail Connect, and their respective logos are trademarks, service marks or registered trademarks of Aventail Corporation. Other product and company names mentioned in this publication are the trademarks of their respective owners.

WEB CONTENT TRANSLATION

Table of Contents

Overview
    Introduction
    Version Compliance
    Who is this document for?
    How does the Aventail Web Translation server work?
Content-Type of Web Pages
    Recommendations
Character Encoding
    Recommendations
Cookie translation
    Recommendations
URLs
    Recommendations
HTML translation
    Recommendations
CSS translation
JavaScript translation
    Translation rules
    Recommendations
VBScript translation
Java applet, ActiveX and Flash translation
    Recommendations
XML translation
    Recommendations
Web aliases
    Recommendations
Miscellaneous
    Referrer lookup

Overview

Introduction

A truly clientless VPN appliance requires a robust web-content translation engine. The reason is simple: all network references within the web content must be changed to point to the VPN appliance instead of internal hosts.
With full-client VPNs, or with pseudo-clientless VPN appliances that use web-deployed ActiveX or Java clients, this host mapping can be done on the client. To support the broadest possible range of browsers, however, web content translation is indispensable.

A simple example illustrates how web content is translated. Imagine an HTML page containing the following anchor tag, which links to an internal resource:

    <a href="http://owa.in.aventail.com">Outlook Web Access</a>

Within the corporate network, such a link works perfectly. When the user clicks the link, the browser asks the internal DNS server for the IP address of “owa.in.aventail.com” and retrieves the desired page. Outside the corporate network, however (say, at an employee’s home), this link does not work. The browser asks the local ISP’s DNS server for the IP address of “owa.in.aventail.com” and is told that the name does not exist. Even if the link pointed to a routable IP address within the corporate network, the corporate firewall would probably prevent the browser from reaching the resource.

Web content translation is the process of changing (translating) the link above into something like:

    <a href="https://ex2500.aventail.com/go/owa.in.aventail.com">Outlook Web Access</a>

The hostname is changed from the internal hostname to the DNS-resolvable hostname of the VPN appliance. Because the appliance does not hold the desired resource itself, the end resource must be encoded somewhere in the URL. In this example, it is encoded in the path portion of the URL.

If translating HTML links like the one above were the only kind of translation needed, things would be easy. Unfortunately, this is not so. There are numerous ways to reference network resources in HTML alone, and JavaScript, the now-ubiquitous web scripting language, enlarges the scope of the problem tremendously. In fact, JavaScript makes the problem intractable in the general case.
It lets code run in the browser, and it lets the user supply additional input that is unknown at the time the server-side translation is done. For example, JavaScript can prompt the user for a URL and then instruct the browser to go to that URL.

Version Compliance

This document is updated for each ASAP release from Aventail. Please check that the ASAP version running on your Aventail appliance is covered by this document. This document is valid for releases ASAP 8.6 through ASAP 8.8.

Who is this document for?

This document is for web application developers who want their software to be easy for the Aventail translation engine to translate. It provides a set of guidelines to achieve this goal and gives a brief overview of certain aspects of the translation engine.

How does the Aventail Web Translation server work?

The Aventail Web Translation server is part of the Aventail VPN appliance, which sits at the network perimeter. It isolates and protects private web-based resources from unauthorized external access.

A user first logs in to the Aventail appliance and is presented with the Workplace page. The user then either follows a link on that page to request a resource on the internal network, or enters a URL on the Workplace page. All of these URLs point to the Aventail appliance.

The Aventail Web Translation server translates an incoming URL using an "alias" contained in the URL. Aliases obscure the URLs that point to resources on your internal (or “downstream”) servers. Because all requests are directed to the Aventail appliance, the user sees only the incoming URL that contains the alias. The Aventail Web Translation server matches the alias against a list it keeps in memory and translates the URL.
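The alias lookup described above can be illustrated with a short JavaScript sketch. It is a minimal illustration under stated assumptions, not the appliance's implementation: the alias table, the aliases “owa” and “morty”, their internal base URLs, the “/ALIAS/path” URL layout and the function names resolveAlias and toApplianceUrl are all hypothetical (the Introduction's “/go/owa.in.aventail.com” example shows that the real appliance may encode the target differently). resolveAlias maps an incoming appliance path back to an internal URL; toApplianceUrl performs the reverse rewrite on a link found in web content.

```javascript
// Hypothetical alias table: alias -> internal ("downstream") base URL.
// Every base URL ends with "/" so prefix matching stops at a path boundary.
const aliases = new Map([
  ["owa", "http://owa.in.aventail.com/"],
  ["morty", "http://morty.in.aventail.com/"],
]);

// Incoming direction, assuming an "/ALIAS/rest" URL layout on the appliance:
// map the request path to the internal URL it stands for.
// Returns null when the first path segment is not a known alias.
function resolveAlias(requestPath) {
  const match = /^\/([^\/]+)(\/.*)?$/.exec(requestPath);
  if (match === null) return null;
  const base = aliases.get(match[1]);
  if (base === undefined) return null;
  // Plain concatenation: the internal host comes only from the table,
  // never from the request, so a crafted path cannot select another host.
  const rest = (match[2] || "/").slice(1);
  return base + rest;
}

// Outgoing direction: rewrite an internal absolute URL found in web content
// so that it points at the appliance and carries the alias instead.
// Returns null when the URL does not belong to any configured resource.
function toApplianceUrl(internalUrl, applianceOrigin) {
  const target = new URL(internalUrl).href; // "http://host" becomes "http://host/"
  for (const [alias, base] of aliases) {
    if (target.startsWith(base)) {
      return `${applianceOrigin}/${alias}/${target.slice(base.length)}`;
    }
  }
  return null;
}
```

For example, resolveAlias("/owa/exchange/inbox") returns "http://owa.in.aventail.com/exchange/inbox", and toApplianceUrl("http://owa.in.aventail.com/exchange/inbox", "https://ex2500.aventail.com") returns "https://ex2500.aventail.com/owa/exchange/inbox". Only the second form ever reaches the browser, which is what keeps the internal hostname hidden.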
Once the Web Translation server determines that the URL submitted by the user is valid and points to a resource on the network, the Aventail appliance checks its access control and authentication rules to make sure the user is authorized to access the requested resource.

Content-Type of Web Pages

Although the Aventail translation engine has heuristics for guessing the type of content in an HTTP response from the backend web server, it is best not to rely on them and to specify the type explicitly instead.

Recommendations

The single most important thing you can do to ensure proper translation is to make sure that all pages are served with the correct “Content-Type” header. In particular, it is imperative that:
1. HTML content is served with the “text/html” Content-Type.
2. JavaScript content is served with the “application/x-javascript” Content-Type.
3. XML content is served with the “text/xml” Content-Type.

Character Encoding

As an internationalized network device, the Aventail appliance uses UTF-8 exclusively for its internal work.

Recommendations

1. Use UTF-8 exclusively for all your web content. Do not use the Microsoft code pages. This is particularly important when POSTing form data.

Cookie translation

The path portion of a “Set-Cookie” header is translated, and the domain portion of the header is discarded. For example, if the backend web server sends the header:

    Set-Cookie: x=y; path=/; domain=.in.aventail.com

and the alias associated with the web resource is “morty”, the header is translated to:

    Set-Cookie: x=y; path=/morty/

This forces the web browser to send the cookie back only to the alias (and therefore the web server) that set it.

Recommendations

1. Avoid sophisticated client-side cookie manipulation in JavaScript.
2. Avoid putting URLs in cookies. Although the engine attempts to translate them, there is some risk that they pass through untranslated.
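The Set-Cookie rewriting described in the Cookie translation section can be sketched as follows. This is a hypothetical illustration, not the appliance's parser: the function name translateSetCookie and its simplified attribute handling are assumptions, it receives the header value without the “Set-Cookie:” name, and it does not attempt to translate URLs stored in cookie values. Cookies that have no Path attribute are returned without one, since the document does not say how the appliance treats that case.

```javascript
// Hypothetical sketch: rewrite one Set-Cookie header value for a resource alias.
// - The Domain attribute is discarded.
// - The Path attribute is prefixed with "/ALIAS", so the browser only sends the
//   cookie back to that alias (and therefore to the web server that set it).
// Other attributes (Expires, Secure, HttpOnly, ...) pass through unchanged.
function translateSetCookie(headerValue, alias) {
  const parts = headerValue
    .split(";")
    .map((part) => part.trim())
    .filter((part) => part !== "");
  const [nameValue, ...attributes] = parts;
  const result = [nameValue];
  for (const attribute of attributes) {
    const eq = attribute.indexOf("=");
    const key = (eq === -1 ? attribute : attribute.slice(0, eq)).trim().toLowerCase();
    const value = eq === -1 ? "" : attribute.slice(eq + 1).trim();
    if (key === "domain") continue; // domain portion is discarded
    if (key === "path") {
      // "/" becomes "/morty/", "/app" becomes "/morty/app"
      const path = value.startsWith("/") ? value : "/" + value;
      result.push(`path=/${alias}${path}`);
    } else {
      result.push(attribute);
    }
  }
  return result.join("; ");
}
```

Applied to the example above, translateSetCookie("x=y; path=/; domain=.in.aventail.com", "morty") returns "x=y; path=/morty/".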
URLs

The Aventail translation engine can handle URLs in any of the following forms:
1. Fully qualified URLs (e.g. “http://www.acme.com/dir1/dir2/file.html”)
2. Absolute paths (e.g. “/dir1/dir2/file.html”)
3. Relative paths (e.g. “../dir2/file.html”)

Recommendations

1. It is best to use relative paths exclusively in your web application. The browser resolves them against the already-translated URL of the current page, and they also make your web application more portable (e.g. to another web server or directory).

HTML translation

HTML translation is handled very reliably by the Aventail appliance.

Recommendations

1. Make sure your HTML is formatted according to the standard, especially the quotes around attribute values in tags. Ideally, use XHTML formatting. HTML attributes containing a value (for example, src="path") may not be translated if they contain any of the following errors:
   a. Spaces before or after the equal sign: src ="path" or src= "path"
   b. Leading or trailing spaces within the value: src=" path" or src="path "
   c. A missing opening or closing quotation mark: src="path or src=path"
2. Avoid base tags in your HTML code, such as:

       <base href="http://myapp.internal.acme.com/dir/" />

3. The “meta” tag is commonly used to redirect users to another page. For example:

       <meta http-equiv="refresh" content="5;url=redirectURL.html" />

   Format the meta tag’s content attribute carefully: do not include line breaks or spaces in it.

CSS translation

CSS content should be handled without difficulty.

JavaScript translation

JavaScript translation is complex, and there are certain coding practices you can follow to make sure your JavaScript code translates correctly.

Translation rules

The current Aventail JavaScript translation engine is a rule-based translator that works on a parse tree, so it can handle complex syntax. It relies on Aventail’s client-side JavaScript library. The rules are stored in:

    /usr/local/extranet/etc/jstrans.cfg

The translation rules are divided into four categories:
1. Assignment statements (type ASSIGNMENT)
2. Function calls (type CALL)
3. Substitution of one language token with another (type SUBSTITUTION)
4. A special kind of substitution in a function call (type SUBARGS)

You should not need to write any new rules. It is, however, useful to be aware of the rules as you follow the recommendations below. Here are most of the JavaScript rules as of September 2006:

    # Javascript Translation

    # Assignment Statement Translation
    #
    # Type          Left Hand Side (LHS)   Encapsulate RHS with
    #
    ASSIGNMENT      location               aventail.translate_url
    ASSIGNMENT      .location              aventail.translate_url
    ASSIGNMENT      .href                  aventail.translate_url
    ASSIGNMENT      .src                   aventail.translate_url
    ASSIGNMENT      .action                aventail.translate_url
    ASSIGNMENT      document.domain        aventail.setDomain
    ASSIGNMENT      document.cookie        aventail.setCookie
    ASSIGNMENT      .innerHTML             aventail.postText
    ASSIGNMENT      .url                   aventail.translate_url

    # Function Call Translation
    #
    # Type          Function Name          Param   Encapsulate param with
    #
    CALL            .addBehavior           1       aventail.translate_url
    CALL            .showModalDialog       1       aventail.translate_url
    CALL            .showModelessDialog    1       aventail.translate_url
    CALL            .insertAdjacentHTML    2       aventail.postText
    CALL            location.replace       1       aventail.translate_url
    CALL            location.assign        1       aventail.translate_url
    CALL            eval                   1       aventail.post

    # Substitution of one token with another
    #
    # lvalue/rvalue: 0: substitute always
    #                1: substitute only if token is an rvalue (read from)
    #                2: substitute only if token is an lvalue (written to)
    #
    # Type          Token                  lval/rval   Replacement
    #
    SUBSTITUTION    location.pathname      0           aventail.location.pathname
    SUBSTITUTION    .location.pathname     0           .aventail.location.pathname
    SUBSTITUTION    document.domain        1           document.aventail.getDomain()
    SUBSTITUTION    document.domain        2           aventail.junk
    SUBSTITUTION    .execCommand           0           .aventail.execCommand
    SUBSTITUTION    location.pathname      0           aventail.location.pathname
    SUBSTITUTION    .location.pathname     0           .aventail.location.pathname
    SUBSTITUTION    location.host          0           aventail.location.host
    SUBSTITUTION    .location.host         0           .aventail.location.host
    SUBSTITUTION    location.hostname      0           aventail.location.hostname
    SUBSTITUTION    .location.hostname     0           .aventail.location.hostname
    SUBSTITUTION    location.port          0           aventail.location.port
    SUBSTITUTION    .location.port         0           .aventail.location.port
    SUBSTITUTION    location.protocol      0           aventail.location.protocol
    SUBSTITUTION    .location.protocol     0           .aventail.location.protocol
    SUBSTITUTION    location.href          1           aventail.location.href
    SUBSTITUTION    .location.href         1           .aventail.location.href
    SUBSTITUTION    location.search        1           aventail.location.search
    SUBSTITUTION    .location.search       1           .aventail.location.search
    SUBSTITUTION    location               1           aventail.location
    SUBSTITUTION    .scripts               1           .aventail.getScripts()

    # Substitution of one token with another, with a twist:
    # Take the "stem" of the call and make it the first argument in the new function.
    # For example:
    #   If we have the token "foo.bar" and the replacement "aventail.ourFoo":
    #   We will replace the construction "anObject.foo.bar(arg1, arg2)" with:
    #   aventail.ourFoo(anObject, arg1, arg2)
    # This allows us to verify the type of the anObject object prior to operating on it
    #
    # lvalue/rvalue: 0: substitute always
    #                1: substitute only if token is an rvalue (read from)
    #                2: substitute only if token is an lvalue (written to)
    #                3: special case, turn a flat lvalue into a function call
    #
    # The "3" case above is used in cases such as "foo.location" to allow us to ensure
    # that "foo" is an object such as a document, window, or frame, and not some user-defined
    # object that just happens to have a "location" member.
    #
    # Type          Token                  lval/rval   Replacement
    #
    SUBARGS         document.close         0           aventail.docClose
    SUBARGS         document.write         0           aventail.docWrite
    SUBARGS         document.writeln       0           aventail.docWrite
    SUBARGS         .open                  0           aventail.objOpen
    SUBARGS         .Open                  0           aventail.objOpen
    SUBARGS         .location              3           aventail.objLocation

Recommendations

1. Do not use DOM references as variable names. For example, do not name any of your variables “location”. The translation rules above show which names to avoid.
2. Avoid the “with” construct: with(object) {statements}.
3. Avoid passing DOM objects as parameters to functions. For example, avoid writing functions of the form:

       function test(mywin) {
           mywin.location = "http://owa.in.aventail.com";
       }

   Instead, make sure that the network-sensitive JavaScript appears verbatim, e.g.:

       window.location = "http://owa.in.aventail.com";

   In other words, do not hide the names of the underlying DOM objects.
4. Do not set a base tag using JavaScript. Doing so invalidates all the translated URLs on the page.
5. Do not use Internet Explorer conditional compilation (e.g. “@if …”).
6. Do not use Microsoft Script Encoding (e.g. language “JScript.Encode”).

VBScript translation

VBScript translation is no longer supported.

Java applet, ActiveX and Flash translation

No explicit translation of Java applets, ActiveX controls or Flash objects is performed. If possible, avoid using them entirely.

Recommendations

1. If these objects cannot be avoided, consider having them construct the network references they need from the URL of the page they are on, dynamically at run time.

XML translation

Because XML elements and attributes have no predefined meaning, the translation engine cannot tell on its own which parts of an XML document are URLs. You therefore need to identify the portions of the XML content that require translation. This is done in the file:

    /usr/local/extranet/etc/custom-xmltrans.cfg

Each rule added to this file has the format:

    ELEMENT ATTR1 ATTR2 ... ATTRn

This instructs the translation engine to look for the element “ELEMENT” in the XML and to translate its attributes “ATTR1”, “ATTR2”, ..., “ATTRn”, whose values are expected to be URLs.

Recommendations

1. Add your XML translation rules to custom-xmltrans.cfg.

Web aliases

Web aliases are declared when you configure a resource. They are used to hide the hostname of the internal server.

Recommendations

1. Avoid giving the alias the same name as the top-level directory of your application. For example, if your web application lives at “http://myapp.in.aventail.com/coolapp/”, do not use “coolapp” as the alias for the Aventail resource.

Miscellaneous

Referrer lookup

When a request arrives for an absolute or relative URL that has no matching alias, the Aventail Web Translation server looks at the “Referer” HTTP header, or at the referrer cookie that it sets, and uses it to assemble the correct destination URL. This is a best-effort mechanism and should not be relied upon.
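The referrer fallback can be sketched as follows. This is a best-effort illustration under assumptions, not the appliance's logic: the alias set, the “/ALIAS/path” URL layout and the names firstSegment and recoverAliasedPath are hypothetical, and only the “Referer” header is consulted (the referrer cookie the appliance sets is not modeled). The idea is to borrow the alias from the path of the page that issued the request.

```javascript
// Hypothetical set of configured resource aliases.
const knownAliases = new Set(["owa", "morty"]);

// Return the first path segment, e.g. "/morty/app/x.html" -> "morty".
function firstSegment(path) {
  return path.split("/")[1] || "";
}

// Best-effort recovery for a request path that carries no known alias:
// take the alias from the path of the referring page, if it has one.
// Returns the aliased path, or null if no alias can be determined.
function recoverAliasedPath(requestPath, refererHeader) {
  if (knownAliases.has(firstSegment(requestPath))) {
    return requestPath; // already aliased; nothing to recover
  }
  if (!refererHeader) return null; // header missing (e.g. stripped by a proxy)
  let refererPath;
  try {
    refererPath = new URL(refererHeader).pathname;
  } catch {
    return null; // malformed Referer value
  }
  const alias = firstSegment(refererPath);
  if (!knownAliases.has(alias)) return null;
  return `/${alias}${requestPath}`;
}
```

For example, an untranslated reference to “/images/logo.gif” on a page served from “https://ex2500.aventail.com/morty/app/page.html” is recovered as “/morty/images/logo.gif”. In this sketch, any request whose first segment happens to equal an alias is treated as already aliased, which hints at why an alias that matches an application directory name (the “coolapp” case above) can be ambiguous.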