Globus Online
Prepared by Bekir Güler

What is Globus Online?
Globus Online Transfer is a hosted service (deployed on Amazon's cloud infrastructure) that aims to make high-performance data transfers easy and painless, not just for the end users of the data but also for the organizations hosting the machines those users interact with. In short, it is a transfer service delivered under the software-as-a-service (SaaS) model.

Why data movement is important
Getting data to where it needs to be can be a crucial first (or middle, or final) step for primary work. For example, to analyze satellite data, a researcher might first need to move it from a remote sensor to a local compute resource.

Why moving data is difficult
1. Reliability
2. Security
3. Performance
4. Usability
5. Maintainability

1. Reliability
Potential problems at each endpoint include operating system misconfigurations, filesystem failures, hardware failures, bad firewall settings, authentication failures, authorization failures, and so on. Then there are the many individual network links connecting the two endpoints, any one of which is subject to transient failures.

2. Security
Moving data within a site you administer is relatively simple from a security standpoint: you control exactly who accesses what data. Enabling wide-area transfers, however, necessarily exposes your resource to outside connections. People commonly use SCP or FTP, but SCP has performance drawbacks and FTP is not a secure protocol.

3. Performance
Unfortunately, most data transfer protocols are not designed for high performance. Popular tools like SCP are secure and easy to use, but their performance can be a problem for large datasets. Essentially, the cost of SCP's approach to security is performance: SCP fully encrypts all data even when a more limited approach would suffice (such as applying strong authentication and authorization when establishing connections).

4. Usability
Transferring large datasets across a wide area is not typically a user-friendly experience. Such transfers often involve huge amounts of data and can take hours, days, or even weeks to complete. Long-running transfers often end prematurely because of reliability issues, and working out which transfers succeeded and which failed after a transfer ends can be tedious.

5. Maintainability
Providing a high-performance data transfer capability typically requires installing and maintaining special software on your users' machines. In a large organization such as a national laboratory this is usually not a big deal (or rather, it is a big deal, and people are employed specifically to do it). For a small or medium resource provider, enabling high-performance data movement is often a one-person job. You have to install new software, keep it current, and make sure it is configured to run at peak efficiency, and user support responsibilities may also fall solely on you.

What is the GO-Transfer approach?
• To mitigate reliability issues, GO-Transfer includes built-in automated fault detection and recovery and is backed by a responsive Help Desk staffed by data movement experts.
• To address security concerns, communications with endpoints are strongly authenticated and authorized using native mechanisms. In addition, Globus Online never stores private data such as passwords and keys.
• Performance-wise, Globus Online achieves state-of-the-art speeds through its use of the GridFTP transport protocol. User files are transferred directly from endpoint to endpoint, making use of high-performance networks where available.
• Usability issues are addressed with an asynchronous "fire-and-forget" model that frees users from babysitting long-running transfers. Email notifications are sent for notable events such as transfer completion, a rich query interface returns detailed information on transfer status and events, and the CLI and GUI are implemented with technologies familiar to most users.
• Maintenance costs are reduced by Globus Online's software-as-a-service model. Users do not need special software to control their transfers, streamlined packaging of endpoint software eases the burden of creating new endpoints, and the Globus Online development team stands ready to fix bugs and provide technical assistance where needed.

How to use Globus Online

What is an endpoint?

How to use Globus Online Transfer
First, you need to sign up for a Globus Online account. The sign-up page asks for just some basic information, as shown on the next slide.
Once you have signed up, you are taken directly to your dashboard. Your account is not quite ready yet, though: a notice on the left sidebar tells you that your account is being provisioned.
Once your account has been provisioned, you are ready to do your first transfer with Globus Online. As mentioned earlier, we will not need to set up a GridFTP server or even create our own endpoint for this. We will simply use a pair of public tutorial endpoints that all GO users have access to.
To perform a transfer, click "Start Transfer" on your dashboard. You will be taken to the transfer interface:

How to turn your local computer into an endpoint
Globus Connect (GC for short) sets up a private endpoint on your personal computer, which you can then access from the GO-Transfer web interface. Internally, Globus Connect installs and configures a GridFTP server on your computer, but it hides all those details from you.
To install Globus Connect, click "Get Globus Connect" in the transfer interface.
You should see a pop-out like this:
Globus Connect establishes only outbound connections and can therefore work behind a firewall or other network device that does not allow inbound connections. The Globus Connect server is stateless, so it can be started and stopped at will. All state associated with transfers is maintained by GO. Automatic updates mean the user does not need to maintain the software over time.

Different Interfaces for GO
Globus Online provides different interfaces for different users:
• Friendly web GUI: for ad hoc and less technical users
• CLI (command-line interface): for advanced users
• REST (Representational State Transfer) application programming interface: for system builders

User Profile and Identity Management
An important GO feature is the ability to handle transfers across multiple security domains with multiple user identities. Unlike many systems, including most previous Grid file-transfer services, GO does not require a single, common security credential across all transfer endpoints. GO does not store passwords; instead, it stores identities. Users can configure their profile with various identities, such as MyProxy certificate identities and OAuth protocol identities.

Scalable Cloud-based Implementation
SaaS requires reliability and scalability. The service must keep operating despite the failure of individual components and must behave appropriately as usage grows. To this end, the GO team applies methods commonly used by SaaS providers, running GO on a commercial cloud provider, Amazon Web Services (AWS). The GO implementation uses a combination of Amazon Elastic Compute Cloud (EC2), Amazon Elastic Load Balancing, and Amazon Simple Storage Service (S3).
The vast majority of GO is programmed in Python, running on Ubuntu Linux servers, with Cassandra and PostgreSQL databases.
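The identity model described under User Profile and Identity Management can be illustrated with a minimal Python sketch. It is not GO's actual data model. The classes Identity, Endpoint and Profile, the rule that matches an identity to an endpoint by security domain, and all user, endpoint and domain names are hypothetical, chosen only for illustration. The sketch shows a profile that holds several linked identities but no passwords or keys. It also shows a transfer in which the source and destination endpoints are each matched to a different identity rather than to one common credential.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Identity:
    """One linked identity (hypothetical model): no password or private-key field."""
    kind: str      # illustrative label, e.g. "myproxy" or "oauth"
    username: str  # user name within the issuing security domain
    domain: str    # hypothetical name of the security domain that issued it


@dataclass(frozen=True)
class Endpoint:
    name: str
    domain: str    # security domain whose identities this endpoint accepts


@dataclass
class Profile:
    user: str
    identities: list[Identity] = field(default_factory=list)

    def link(self, identity: Identity) -> None:
        """Attach another identity to the profile; nothing secret is stored."""
        self.identities.append(identity)

    def identity_for(self, endpoint: Endpoint) -> Identity:
        """Return the linked identity recognized by this endpoint's domain."""
        for ident in self.identities:
            if ident.domain == endpoint.domain:
                return ident
        raise LookupError(f"{self.user} has no identity for domain {endpoint.domain!r}")

    def plan_transfer(self, src: Endpoint, dst: Endpoint) -> tuple[Identity, Identity]:
        """Each side is reached with its own identity; no shared credential is needed."""
        return self.identity_for(src), self.identity_for(dst)


# Usage: one GO user, two sites in different security domains (all names hypothetical).
profile = Profile("alice")
profile.link(Identity("myproxy", "alice01", "lab-a.example.org"))
profile.link(Identity("oauth", "asmith", "campus-b.example.edu"))

src = Endpoint("lab-a#dtn", "lab-a.example.org")
dst = Endpoint("campus-b#cluster", "campus-b.example.edu")
src_id, dst_id = profile.plan_transfer(src, dst)
print(src_id.username, dst_id.username)  # alice01 asmith
```

A real service would still have to obtain a credential for each selected identity (for example through MyProxy or an OAuth flow) before contacting the endpoints. That step is deliberately left out of this sketch.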