CyberAide Grid Abstractions for JavaScript

Abstract

In this paper, we describe a service-oriented architecture and Grid abstraction framework that allows access to the Grid through JavaScript. The framework can integrate with other Web 2.0 technologies and allows for the development of new components through an application interface. Moreover, our framework includes easy-to-use server side features to enable the creation of advanced Grid services that are accessible through JavaScript. Hence, the availability of our framework simplifies not only the development of new services, but also the development of advanced client side Grid applications that can be accessed through Web browsers. We demonstrate this ability by providing a mechanism to develop Grid workflows through JavaScript with the help of a client side graphical user interface defined in JavaScript. Overall, Grid developers will have another tool at their disposal that provides a simpler way to distribute and maintain cyberinfrastructure related software while simultaneously delivering advanced interfaces and integrating social services for the scientific community.

Keywords: JavaScript, Grid Computing, Workflow, CoG Kit, Grid Abstractions, Web 2.0.

1 INTRODUCTION

In this paper we present a JavaScript Grid abstraction framework. It is based upon abstractions that have proven to be useful in the Grid community and are the result of research activities from the Java CoG Kit [1]. However, we are enhancing the previous approach with a number of advanced services and are explicitly targeting JavaScript on the client side as the language of choice. Through the framework we can integrate with a variety of Grid middleware and obtain access to the Grid fabric. In this paper, however, we focus primarily on the integration with Globus and SSH as used on the TeraGrid [3]. We are able to expose Grid services through Web clients with the help of a JavaScript API. Using the JavaScript API, Grid users and developers obtain a convenient way to access a Grid in JavaScript.

One of the advantages of the JavaScript API is that we can integrate with a large number of commodity libraries available in JavaScript to go beyond the use of Grid technologies. Hence, we can leverage data-structure, social networking, and communication libraries to enable Web 2.0 programming features and to build a commodity toolkit based on JavaScript.

2 ARCHITECTURE

The framework is organized as a layered architecture and contains several components. We depict the most important components of our architecture in Figure 1 and explain them in more detail.

First, our Web Client API provides a high-level programming interface to the Grid, to a Grid workflow system, and to Grid-specific graphical user interfaces. Hence, while using this API our user community can develop portals specifically targeted to Grids while being able to develop GUIs. Furthermore, we can easily provide functionality such as workflow and job browsing, editing, submission, and execution status querying.

Second, we conduct all interaction with the backend Grid or cyberinfrastructure through a mediator service. Through the mediator service we interact with the underlying Grid services, which are deployed as Web services. Additionally, the mediator provides a persistent service allowing users to keep track of their jobs and tasks on the Grid. It only requires simple X.509 certificate authentication. Because each TeraGrid user has access to a login node as part of the TeraGrid user account management, we can host the mediator on one of these nodes. The result is that the client has a zero install base if we use our Web 2.0 API.

Fig.1. System architecture

Figure 2 depicts a high-level interaction diagram with the Grid through a mediator to illustrate its functionality. The interaction with the mediator service is carried out through the JavaScript API. Internally, the mediator service is responsible for communicating with Grid services through underlying Grid middleware such as the Java CoG Kit, Globus, or even simply SSH. Meanwhile, it exposes the Grid services to Web clients via standard Web services. It allows the use of a personal queue management mechanism that keeps track of all jobs and workflows submitted to the Grid in a more convenient form than a Grid middleware toolkit allows. Hence, we classify our framework as Grid upperware. Additionally, we are developing a collaborative queue management service through which Grid users can share their workflows and execute them collaboratively.

Third, our framework contains a number of support services that coordinate user account management and system status information.

Next, we describe each of the components in more detail.

Fig.2. Role of the mediator between the client and the Grid

2.1 Role of the Web Client in the Architecture

The Web client provides the elementary functionality to access the Grid through a portal user interface. The JavaScript APIs expose several essential Grid services, such as:

• Users can create jobs, as well as workflows, on the client side.
• Users can submit jobs for execution or to shared queues that can be maintained by multiple users.
• Users can query and monitor the status of various jobs.

2.2 Role of the Mediator Service in the Architecture

As the name indicates, the mediator service behaves as an intermediate service between the web client and the underlying Grid services. It hides much of the complexity of the Grid functionality and allows for the deployment and integration of resources where the installation of Grid middleware is not possible. Furthermore, it provides a persistent collaborative environment for users to share workflows.

The mediator is based on standard web service technologies and mediates all function calls and communication patterns between the client and the appropriate cyberinfrastructure backends. The connection to the mediator is secured through HTTP over SSL/TLS. Message level security based on WS-Security [12] may be integrated to enforce end-to-end security.

Internally, the mediator service is developed using either JAX-WS [10] from Java SE 6 or Axis2 [4] from Apache. In our current release, however, we focus on Axis2. The Java CoG Kit is used to interact with the underlying Grid infrastructure and to provide functionality dealing with Grid security, job submission, workflow management, and file transfers. We are able to perform proxy credential delegation for users so that they do not have to authenticate themselves every time they access Grid services during a login session. We have integrated our service with the myProxy server from TeraGrid so as to allow TeraGrid users immediate access to the TeraGrid.

The mediator can be divided into two separately deployed parts to improve security. The two parts can communicate with each other using a Web service architecture or a Remote Method Invocation (RMI) approach. The first part works as a web service agent and forwards web service calls to the other part, which could be placed behind a firewall to tighten security.

2.3 Role of Support Services in the Architecture

The support services enable user account management and job execution status. This includes the following:

1. User account information and access to a certificate authority, maintained in a secure database or directory, and a mapping between the Web portal user name and the actual Grid Distinguished Names (DNs).
2. Grid status, maintained to allow access to downtime and availability information of registered Grid services.

3 DESIGN ISSUES

The system design is based on an object-oriented model and a service-oriented architecture [8], [9], [14]. Entities in the system are all objects, while a service-oriented architecture is used to expose these objects and allow for interaction, resulting in a service-oriented distributed object architecture. The benefit of this approach is that it enables distributed objects to be loosely coupled while accessing them through standard web service technologies. Using this approach, we are mostly concerned with the contracts of objects rather than with reimplementing often-used patterns for them. Furthermore, as long as we keep the contract between entities, changes to the implementation of one part do not affect other parts of the system. Hence we can change objects during runtime without affecting the client side interactions.

JavaScript is not a statically typed language, and it does not provide a general mechanism for defining a class or object type [15]. However, it is a well-established practice to define custom objects in JavaScript that behave in many ways like classes in Java. Hence we will use the term JavaScript classes throughout the text.

3.1 Web Client

We discuss the client side APIs and the related security issues.

3.1.1 Essential Subset of the API

Among the most important APIs for many users will be the JavaScript APIs dealing with jobs, files, and security management. Based on the lessons we learned from the Java CoG Kit, we designed APIs in JavaScript that address the functionality needed by many users. This includes task, job, workflow, and queue management. To obtain the status, each of the APIs provides a convenient method to query the status of a task, job, workflow, or queue. Authentication methods are provided to allow simple access to the services through the APIs. A list of the essential APIs is provided in the appendix. With the help of these APIs it is possible to develop simple AJAX style applications as indicated in Figure 3. Here we illustrate the execution of a job while depicting the interactions between the Web client (through its API) and the backend Grid service.

Fig.3. The use of the client APIs to authenticate, submit jobs and query the execution status

3.1.2 Security Issue on JavaScript and Client Side Session Maintenance

Since the HTTP protocol is stateless, we need to maintain the user's status on the client side during a session. The status is kept in a cookie, which includes user related information and any session specific information.

We tried to keep as little sensitive data as possible on the client to minimize the security risk, as we are exposed to malicious users and Web pages providing opportunities for potential JavaScript related attacks.

The JavaScript security issue has two sides. From the client's point of view, a script from the server side should not harm the client; from the server's point of view, we need to limit the potential harm a client can do.

Two policies are enforced by most browsers to deal with this issue. The first is sandboxing, which means that script files from a server can only be executed in a restricted environment that has no impact on the client system. The second is the same-origin policy, which assures that a script from a certain server can only access resources from the same origin.

The policies do improve security; however, other potential problems still exist. One issue is that they restrict the functionality of web applications. By default JavaScript code cannot access the client side local file system, which makes some operations like file upload impossible. The good news is that there are ways to circumvent this issue. One way is to use signed scripts, through which the JavaScript code can escalate its privileges based on the trust of the client. Another way is to combine it with Form-based File Upload in HTML, which is described in RFC1867 [16]. In such ways we solve the file upload problem. Another issue is that various attacks exist that target the same-origin policy. Cross-site Scripting (XSS) [6] is a typical attack that violates the same-origin policy. Cross-site Request Forgery (XSRF) [7] is a similar kind of attack. There is no panacea for these attacks; instead, a set of rules needs to be considered during development [13, 7]. The basic rules are to always validate the user's input on the server side, even if validation is also performed on the client side, and to escape meaningful HTML characters before writing to a web page. In this way, even if malicious code passes the input validation, it can do no harm to the client side.

3.2 Mediator Service

The application logic of the mediator communicates with the underlying Grid services and wraps these services into Web services. Instead of just being a gateway service, the mediator also contains advanced functionality as part of the services offered. One example of this functionality is the management of jobs (with the help of the CoGQueue) by a group of scientists rather than an individual. The interactions between the mediator service, the support services, and the Web client are conducted through Web service invocations.

3.2.1 Web Services

We use Java to develop and deploy the web services. We can implement the web Service Endpoint Implementation (SEI) class by using the annotation mechanism provided by Java SE 6. JAX-WS provides wsgen [2], a convenient tool that can generate all the artifacts required for web service deployment from a Java SEI class. An alternative method to accomplish this is to use the tools provided by Axis2. Deploying the web service in a web container like Apache Tomcat or the HTTP server built into Java SE 6 exposes the web service to the client side.

Security is obtained through an HTTPS connection and message level security based on WS-Security [12]. Users need to authenticate themselves to the server side using a username/password provided to them. Furthermore, all messages coming from the client are semantically checked for altered or malicious scripts.

3.2.2 Application Logic Layer

The application logic layer utilizes the Java CoG Kit API to interact with the underlying Grid infrastructure. The actual call to the Java CoG Kit API is embedded in a web service and exposed indirectly through a JavaScript API.

The application logic layer includes proxy credential delegation and authentication via the myProxy service available in most Grid environments such as the TeraGrid [11]. As long as a user stores his/her credential on a myProxy server, the user can retrieve the credential by providing the username and passphrase used when generating the proxy credential. Thus the user delegates to the application server the right to communicate with the underlying Grid infrastructure during the user session, without the need to authenticate each time a job is submitted or the status of a submitted job is queried.

3.2.3 Collaborative Queue Management

As part of our JavaScript framework, we are working towards delivering new services to the user community. One of these services, requested by some of our users, is an ad-hoc collaborative queue management service. In contrast to traditional batch queues that are maintained by system administrators, our service is instantiated and maintained by the users as part of a quasi ad-hoc virtual organization. The users regulate access to the queue and can determine priorities collaboratively. This is important if limited resources are available, collaborative means are necessary to determine the priorities, and the assignment to the resource is governed by policy instead of an automatic scheduling system. This is especially useful in emergency situations where compute resources may need to be controlled carefully and decisions about a resource need to be made in an ad-hoc fashion in order to address the most important situation.

Besides managing single jobs in the queue, we also intend to manage entire workflows. As workflows are a superset of jobs, we will therefore refer to shared workflows instead of single jobs. Our shared workflow service provides user-based access control, allowing ownership of workflows by individual users, as well as group-based access control through membership defined in an access control list or through inclusion in an explicit group.

Fig.4. A scenario for shared workflows in our ad-hoc queue

As we internally employ an object-oriented model, we have also defined a number of objects and associated services for dealing with queue management. The important objects are the SharedQueue itself and the SharedWorkflow. The latter allows an object representing a workflow to be executed as part of the shared queue. These objects are

SharedQueue ::= (unique object ID, label, owner, accessControlList, accessControlPolicy, queueObjects, …)

SharedWorkflow ::= (unique object ID, label, owner, locked, status, writeToken, type, Workflow Object, …),

where the attributes are defined as follows:

• label, is a field to store useful information to identify the object through a simple lookup. A label is user defined.
• unique object ID, a unique identification associated with each object. Internally we use different namespaces for the unique IDs of jobs and workflows to allow for easier bookkeeping.
• owner, the person who created the queue. Only the owner can grant other users access to the queue. For the SharedWorkflow, it is the person who uploaded the workflow. Ownership can be reassigned.

In addition to these common attributes (for which we have internally designed an object), we have the following additional attributes:

• accessControlList, defines the users that have access to the job and can reassign the priorities.
• accessControlPolicy, defines the policy that governs how priorities and access to the queue are managed.
• queueObjects, are general objects that can be shared in the queue. Although we have a general object model, for this paper we focus on sharing workflows (and jobs).
• locked, indicates whether or not this workflow is currently being edited by some user.
• status, is an object that reports on the status of the workflow. It consists of a number of individual status messages for each job. The overall status of a workflow can be determined by associating a status evaluation function with it. However, this is beyond the scope of this paper.
• type, is the object's type. It could be a system executable; a shell, Python, or Ruby script; or a CoG Kit Karajan workflow, to name just a few.
• Workflow Object, represents the actual workflow.

3.3 Support Services

Support services are used to store the information that a user needs to access Grids. They use a number of databases to deal with persistent states and objects in Cyberaide. One of them is the account management database; another one is a database to keep track of the jobs submitted by the users. We are in the process of adding further information that helps to improve resource utilization. For example, we are integrating information from the QBETS [5] services used on the TeraGrid. The information is obtained on the mediator service side, through requests from a query or through automatic update notifications; the service listens and responds to information query requests through a common web service interface.

4 IMPLEMENTATION AND USE CASE

We have developed a prototype of the system that handles job submission and job status queries from a web client. It also contains a client side GUI-based environment to manage and edit the jobs and workflows. The services are currently integrated into Axis2 and Tomcat as part of a standard web service. Authentication is conducted via the TeraGrid myProxy server.

Although our focus is not to create a workflow GUI, the problem of defining one in JavaScript is used as a test case to verify our design and to identify limitations due to its complexity on the client side. At this time we did not focus on the beauty of the graphical components, but instead on testing the functionality of our framework.

4.1 Job Construction and Workflow Composition

The prototype uses the Java CoG Kit on the server side to provide workflow and job submission support. On the client side, workflows are assembled in the Karajan workflow language and then passed on to the server to be executed. The GUI widget provides a straightforward way to define a job through the Karajan workflow language. A user only needs to select workflow elements from a panel in order to integrate them into the workflow specification window and define the dependencies between jobs. We depict our client side workflow composition tools in Figures 5 and 6.

Fig.5. Workflow element definition

Fig.6. Workflow dependency construction

4.2 Job Execution

When the construction of a workflow is finished, it can be submitted for execution. The web client invokes the server side through regular web service calls. Once the workflow arrives at the server, it is executed with the help of the Java CoG Kit. A specific workflow ID is assigned and returned to the client via a handler. To show the client side interaction we refer to the screenshot in Figure 7.

Fig.7. Job submission

4.3 Status Query

Users can check the state of submitted workflows by specifying the workflow ID, or obtain the status of all previously submitted workflows. Figure 8 depicts the result of a status query. In the future we will develop more colorful and easier-to-understand GUI components such as a status table.

Fig.8. Job status query

5 CONCLUSIONS AND FUTURE EFFORTS

In this paper we presented the JavaScript Grid abstraction framework. It provides a client side web application through which Grid users can conduct job and workflow management, including submission and monitoring. We have outlined our methodology for sharing workflows. The web services-based application takes advantage of easy deployment and provides interoperability through WS technologies and standards. Thus we have introduced a new and convenient tool for accessing Grids through JavaScript.

Next steps include the implementation and integration of more features for collaborative queue management. We also plan to investigate and support other workflows and other authentication methods.

ACKNOWLEDGEMENT

We leave this section out to preserve anonymity. The final version will have this section included. NSF NMI supports the project. We would like to thank Andrew Younge for his valuable comments.

REFERENCES

[1] Java CoG Kit. Web Pages. Available from World Wide Web: http://wiki.cogkit.org/index.php/Java_CoG_Kit.
[2] JAX-WS wsgen document. Web Page. Available from World Wide Web: http://java.sun.com/javase/6/docs/technotes/tools/share/wsgen.html.
[3] The Globus Alliance. Globus Toolkit. Web Page. Available from World Wide Web: http://globus.org/toolkit/.
[4] Apache. Apache Axis2. Web Page. Available from World Wide Web: http://ws.apache.org/axis2/index.html.
[5] J. Brevik, D. Nurmi, R. Wolski, and G. Obertelli. QBETS: Batch queue prediction system. Presented at TeraGrid '07 Conference. Available from World Wide Web: http://www.teragrid.org/events/teragrid07/archive/presentations/Wednesday/QBETS.pdf.
[6] G.A. Di Lucca, A.R. Fasolino, M. Mastoianni, and P. Tramontana. Identifying cross site scripting vulnerabilities in web applications. In Web Site Evolution, 2004. WSE 2004. Proceedings. Sixth IEEE International Workshop on, pages 71–80, 11 Sept. 2004.
[7] Nenad Jovanovic, Engin Kirda, and Christopher Kruegel. Preventing cross site request forgery attacks. In Securecomm and Workshops, 2006, pages 1–10, Aug. 28 2006-Sept. 1 2006.
[8] Yih-Cheng Lee, Chi-Ming Ma, and Shih-Chien Chou. A service-oriented architecture for design and development of middleware. In Software Engineering Conference, 2005. APSEC '05. 12th Asia-Pacific, page 5 pp., 15-17 Dec. 2005.
[9] Xiaolin Lu. An investigation on service-oriented architecture for constructing distributed web GIS application. In Services Computing, 2005 IEEE International Conference on, volume 1, pages 191–197, 11-15 July 2005.
[10] SUN Microsystems. Java API for XML Web Services (JAX-WS). Web Page. Available from World Wide Web: https://jax-ws.dev.java.net/.
[11] NCSA. MyProxyLogon. Available from World Wide Web: http://grid.ncsa.uiuc.edu/myproxy/MyProxyLogon/.
[12] OASIS. Web Services Security v1.0 (WS-Security 2004). Web Page, 2004. Available from World Wide Web: http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-soap-message-security-1.0.pdf.
[13] Jayamsakthi Shanmugam and M. Ponnavaikko. A solution to block cross site scripting vulnerabilities based on service oriented architecture. In Computer and Information Science, 2007. ICIS 2007. 6th IEEE/ACIS International Conference on, pages 861–866, 11-13 July 2007.
[14] A. Uyar, W. Wu, H. Bulut, and G. Fox. Service-oriented architecture for a scalable videoconferencing system. In Pervasive Services, 2005. ICPS '05. Proceedings. International Conference on, pages 445–448, 11-14 July 2005.
[15] ECMA International. Standard ECMA-262 ECMAScript language specification, 3rd edition. Web page. December 1999. Available from World Wide Web: http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf.
[16] The Internet Engineering Task Force. Form-based file upload in HTML. Web page, November 1995. Available from World Wide Web: http://www.ietf.org/rfc/rfc1867.txt.

APPENDIX A: JAVASCRIPT CLIENT APPLICATION INTERFACES

The appendix includes some more details about a selected number of APIs for our cyberinfrastructure targeted JavaScript.

A.1 Job Management

An elementary class within the API is the class Executable, which is an abstraction for all executable entities such as jobs or workflows. Internally we can assign a provider (a concept developed as part of the Java CoG Kit) to map the execution onto a resource. As our focus in this paper is to demonstrate a non-trivial function, we concentrate on our workflow related classes.

A.2 Workflow Management

The class KarajanWorkflow is used to manage a Karajan workflow. A Karajan workflow can contain more than one job. It also supports hierarchical workflows, which means a workflow can be a "job" in another workflow. The most important JavaScript functions include:

• addJob (name, job) – A new job is added to the workflow.
• deleteJob (name) – The job with the specified name is deleted. This also removes the dependencies the job was associated with.
• listJobs ( ) – List all jobs in a workflow.
• searchByName (name) – Search for a job by its name.
• addDependency (parent, child) – Add a dependency between the child and the parent, meaning that the parent job is executed before the child.
• removeDependency (parent, child) – Remove a dependency between the parent and the child.

We have more methods available, but they are beyond the scope of this paper.

A.3 Queue and Set Management

Queue management is handled in a class called WFQueue. It contains a queue of executable objects that have no dependencies on each other. They are executed one after another. In addition, we have a WFSet that is similar to a queue, but does not impose any order of execution on the objects in the set. Note that in the following we use the name "workflow" for all kinds of Executable objects, whether they are instances of workflows such as Karajan workflows or executable scripts.

• addWorkflow (name, workflow) – Add the workflow specified by the parameter to a workflow queue or set.
• removeWorkflowByIndex (index) – Remove the workflow at the given index from the queue or set.
• removeWorkflowByName (name) – Remove the workflow with the given name from the queue or set.
• clearAll ( ) – Remove all workflows in the queue or set.
• searchByName (name) – Return the index of the workflow with the given name.

A.4 Authentication

The class Authenticator provides an easy interface to authentication from the client side. The function authenticate() at this time uses a standard HTTPS connection to a backend service. Attributes are used to communicate the necessary information back to the server, such as the duration for creating a valid proxy certificate. To communicate with our Axis prototype we use a simple username/password authentication scheme to relay the certificate authentication via a myProxy server.

A.5 Workflow Status

The class WorkflowQuery is used to query the state of workflows that the user submitted for execution. It has the following JavaScript functions:

• query () – Get the state of all workflows submitted by the user.
• query (workflowID) – Get the state of the specified workflows submitted by the user.

A.6 JS CoG API and Abstraction API

The class CoGClass internally uses the above classes to perform workflow submission and result queries. It has the following functions:

• authenticate (Authenticator) – The Authenticator's authenticate() function will be called. For example, if we use HTTPS as provider, then a username/password authentication will be carried out.
• execute (Executable, resources) – Submit a workflow to the server side for execution. The resources specify the necessary resources associated with the jobs.
• transfer (from, to) – Transfer data from "from" to "to".
• query () – Get the state of all workflows that were submitted for execution by the user.
• query (workflowids[]) – Get the state of the specified workflows that were submitted for execution by the user.

The class WorkflowQueue deals with the interactions between the client and the server side. It has the following functions:

• listQueues () – List all the queues in which the user is participating.
• add (queuename, Executable) – Submit a workflow into the server side's queue to store it for future use or to share it with other users. If the queuename exists and the user has the privilege to access that queue (that is, the user is a participant of that queue), the server side program will extract information from the Executable object and construct objects to insert into the queue; otherwise the server side program will construct a new queue with that name and set the user as the owner of the queue.
• remove (queueID, sharedWorkFlowID) – The owner of a workflow can remove it.
• List (queueID) – List the metadata of all workflows shared in the queue.
• listParticipants (queueID) – List all the participants of the queue.
• grantAccess (queueID, userlist) – The owner of a queue can give other users the privilege to access the queue.
• listByName (queueID, username) – List the metadata of the workflows owned by a user in one queue.
• listByType (queueID, provider) – List the metadata of all workflows with the specified type, say, Karajan workflow, from the specified queue.
• getStatus (queueID, sharedWorkFlowID) – Query the status of the specified workflow. It may or may not be under editing by another user.
• obtainWriteToken (queueID, sharedWorkFlowID) – The user tries to obtain the write token.
• editWorkflow (queueID, sharedWorkFlowID) – The user tries to obtain the write token and then to edit the workflow.
• updateWorkflow (queueID, sharedWorkFlowID) – Update the workflow when editing is complete.
• browseWorkflow (queueID, sharedWorkFlowID) – Download the workflow to the client side to browse it.

Figure 1: Cyberaide login window

Figure 2: Cyberaide job submission

Figure 3: Cyberaide shell

Figure 4: Cyberaide dashboard panel

Figure 5: Cyberaide transfer
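As an illustration of the rule from Section 3.1.2 that meaningful HTML characters should be escaped before writing to a web page, the following minimal sketch shows one possible client side helper. The function name escapeHtml and the sample label are assumptions made for this example; they are not part of the Cyberaide API.

```javascript
// Minimal sketch of the "escape before writing" rule from Section 3.1.2.
// escapeHtml is a hypothetical helper, not part of the Cyberaide API.
function escapeHtml(text) {
  const replacements = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;"
  };
  // Replace every character that has a special meaning in HTML.
  return String(text).replace(/[&<>"']/g, (ch) => replacements[ch]);
}

// A user supplied job label is escaped before it is placed into markup.
const label = '<script>alert("x")</script>';
const cell = "<td>" + escapeHtml(label) + "</td>";
console.log(cell);
// prints <td>&lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;</td>
```

Escaping on output complements, but does not replace, the server side input validation that the same section calls for.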
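The function list in A.2 can be made concrete with a small in-memory sketch. It is written in the constructor and prototype style that Section 3 refers to as JavaScript classes. Only the function names come from A.2; the internal fields, the return values, and the sample job objects are assumptions, and the communication with the mediator is not modelled.

```javascript
// Hypothetical in-memory sketch of the A.2 function list. The real class
// talks to the mediator service, which is not modelled here.
function KarajanWorkflow(name) {
  this.name = name;
  this.jobs = new Map();   // job name -> job (a job may itself be a workflow)
  this.dependencies = [];  // list of { parent, child } pairs
}

KarajanWorkflow.prototype.addJob = function (name, job) {
  this.jobs.set(name, job);
};

KarajanWorkflow.prototype.deleteJob = function (name) {
  this.jobs.delete(name);
  // Also drop every dependency the deleted job took part in.
  this.dependencies = this.dependencies.filter(
    (d) => d.parent !== name && d.child !== name
  );
};

KarajanWorkflow.prototype.listJobs = function () {
  return Array.from(this.jobs.keys());
};

KarajanWorkflow.prototype.searchByName = function (name) {
  return this.jobs.has(name) ? this.jobs.get(name) : null;
};

KarajanWorkflow.prototype.addDependency = function (parent, child) {
  // The parent job has to be executed before the child job.
  this.dependencies.push({ parent: parent, child: child });
};

KarajanWorkflow.prototype.removeDependency = function (parent, child) {
  this.dependencies = this.dependencies.filter(
    (d) => !(d.parent === parent && d.child === child)
  );
};

// Usage: two jobs with one dependency, then the parent job is deleted.
const wf = new KarajanWorkflow("demo");
wf.addJob("stage-in", { executable: "/bin/cp" });
wf.addJob("compute", { executable: "/bin/hostname" });
wf.addDependency("stage-in", "compute");
wf.deleteJob("stage-in");
console.log(wf.listJobs(), wf.dependencies.length);
```

Because a job is stored as an opaque value, another KarajanWorkflow instance could be added as a job, which mirrors the hierarchical workflows mentioned in A.2.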
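A comparable sketch for the WFQueue functions listed in A.3 is given below. The array based storage, the boolean return values, and the convention that searchByName returns -1 for an unknown name are assumptions of this example and are not stated in the paper.

```javascript
// Hypothetical in-memory sketch of the WFQueue functions from A.3.
function WFQueue() {
  this.entries = []; // ordered list of { name, workflow }
}

WFQueue.prototype.addWorkflow = function (name, workflow) {
  this.entries.push({ name: name, workflow: workflow });
};

WFQueue.prototype.searchByName = function (name) {
  // Index of the first entry with this name, or -1 if there is none.
  return this.entries.findIndex((entry) => entry.name === name);
};

WFQueue.prototype.removeWorkflowByIndex = function (index) {
  if (index < 0 || index >= this.entries.length) return false;
  this.entries.splice(index, 1);
  return true;
};

WFQueue.prototype.removeWorkflowByName = function (name) {
  return this.removeWorkflowByIndex(this.searchByName(name));
};

WFQueue.prototype.clearAll = function () {
  this.entries = [];
};

// Usage: the queue keeps insertion order, so indices shift after a removal.
const queue = new WFQueue();
queue.addWorkflow("prepare", "workflow A");
queue.addWorkflow("simulate", "workflow B");
queue.addWorkflow("archive", "workflow C");
queue.removeWorkflowByName("simulate");
console.log(queue.searchByName("archive")); // prints 1
```

A WFSet as described in A.3 could reuse the same functions, since it differs from the queue only in that no execution order is implied.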
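The interplay of owner, accessControlList, locked, and writeToken from Section 3.2.3 with the WorkflowQueue functions grantAccess, obtainWriteToken, and updateWorkflow can be sketched as follows. The attribute names follow the SharedQueue and SharedWorkflow definitions; everything else is hypothetical, in particular the locking rule (one write token at a time, released on update), the extra caller and token parameters, the token format, and the helper names createSharedQueue and addSharedWorkflow. The paper does not specify these details.

```javascript
// Hypothetical in-memory model of a shared queue. The locking rules are
// assumptions made for illustration, not the behavior of the real service.
function createSharedQueue(id, label, owner) {
  return { id: id, label: label, owner: owner,
           accessControlList: [owner], queueObjects: new Map() };
}

function grantAccess(queue, caller, userlist) {
  // Only the owner of the queue may grant access to other users.
  if (caller !== queue.owner) return false;
  for (const user of userlist) {
    if (!queue.accessControlList.includes(user)) queue.accessControlList.push(user);
  }
  return true;
}

function addSharedWorkflow(queue, caller, id, label, type, workflowObject) {
  if (!queue.accessControlList.includes(caller)) return false;
  queue.queueObjects.set(id, { id: id, label: label, owner: caller, locked: false,
                               writeToken: null, type: type,
                               workflowObject: workflowObject });
  return true;
}

let tokenCounter = 0;

function obtainWriteToken(queue, workflowId, user) {
  const wf = queue.queueObjects.get(workflowId);
  // No token if the workflow is unknown, already locked, or the user lacks access.
  if (!wf || wf.locked || !queue.accessControlList.includes(user)) return null;
  tokenCounter += 1;
  wf.locked = true;
  wf.writeToken = user + ":" + tokenCounter;
  return wf.writeToken;
}

function updateWorkflow(queue, workflowId, token, workflowObject) {
  const wf = queue.queueObjects.get(workflowId);
  if (!wf || !wf.locked || wf.writeToken !== token) return false;
  wf.workflowObject = workflowObject;
  wf.locked = false;   // editing is complete, so the lock is released
  wf.writeToken = null;
  return true;
}

// Usage: alice owns the queue, bob is granted access, both try to edit.
const shared = createSharedQueue("q1", "emergency runs", "alice");
grantAccess(shared, "alice", ["bob"]);
addSharedWorkflow(shared, "alice", "wf1", "forecast", "karajan", "version 1");
const aliceToken = obtainWriteToken(shared, "wf1", "alice");
const bobToken = obtainWriteToken(shared, "wf1", "bob"); // null while alice edits
updateWorkflow(shared, "wf1", aliceToken, "version 2");
console.log(bobToken, shared.queueObjects.get("wf1").locked); // prints null false
```

In this sketch a second user can obtain the token only after the first update has released it, which is one simple way to realize the locked attribute; a policy stored in accessControlPolicy could replace this rule.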