What is the GO-Transfer approach?

Globus Online
Prepared by Bekir Güler
What is Globus Online


Globus Online Transfer is a hosted service
(deployed on Amazon’s cloud infrastructure)
that aims to make high-performance data
transfers easy and painless, not just for the end users of the data but also for the organizations
hosting the machines that those users interact with.
In short, it is a data transfer service delivered under the
software-as-a-service (SaaS) model.
Why data movement is important
• Getting data to where it needs to be can
be a crucial first (or middle or end) step
for primary work. For example, in order to
analyze satellite data, a researcher might
first need to move it from a remote sensor
to a local compute resource.
Why moving data is difficult
1. Reliability
2. Security
3. Performance
4. Usability
5. Maintainability
1. Reliability

Potential problems at each endpoint include
operating system misconfigurations, filesystem
failures, hardware failures, bad firewall settings,
authentication failures, authorization failures, etc.
Then of course there are the many individual
network links connecting the two endpoints, any
one of which is subject to ephemeral failures.
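
A common way to cope with ephemeral failures is to retry the failed step with an increasing delay. The sketch below is only an illustration of that idea, not GO-Transfer's actual recovery logic; `TransientError`, `with_retries`, and `make_flaky_copy` are hypothetical names introduced here.

```python
import random
import time


class TransientError(Exception):
    """Stands in for an ephemeral failure such as a dropped network link."""


def with_retries(operation, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up: the failure is not ephemeral after all
            # Exponential backoff with a little jitter between attempts.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 10))


def make_flaky_copy(failures_before_success):
    """Build a fake copy operation that fails a fixed number of times."""
    state = {"calls": 0}

    def copy_chunk():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise TransientError("link dropped")
        return "copied after %d attempts" % state["calls"]

    return copy_chunk
```

Calling `with_retries(make_flaky_copy(2))` survives two simulated link drops and succeeds on the third attempt; a failure that persists past `max_attempts` is re-raised so it can be reported.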
2. Security


Moving data within a site you administer is
relatively simple from a security standpoint: you
are able to control exactly who accesses what
data. However, enabling wide-area transfers
necessarily involves exposing your resource to
outside connections.
People commonly use SCP or FTP, but SCP has
performance drawbacks and FTP is not a secure
protocol (plain FTP sends credentials and data unencrypted).
3. Performance


Unfortunately, most data transfer protocols
are not particularly designed for high
performance.
Popular tools like SCP are secure and easy to
use, but performance can be a problem for
large datasets. Basically, the cost of SCP’s
approach to security is performance. SCP fully
encrypts all data even when a more limited
approach would be sufficient (such as
applying strong authentication and
authorization techniques when establishing
connections).
4. Usability

Transferring large datasets across a wide-area network is
not typically a user-friendly experience. Such
transactions often involve huge amounts of data
and can take hours or days (or even weeks) to
complete. Long-running transfers often end
prematurely due to reliability issues, and figuring
out which transfers succeeded and which failed
after a transfer ends can be a tedious task.
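
To make the "which transfers succeeded and which failed" chore concrete, here is a minimal sketch that compares checksums after a copy. It assumes both directory trees are reachable from one machine, which is often not true for wide-area transfers; `file_digest` and `audit_transfer` are hypothetical helper names, not part of any Globus tool.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_transfer(source_dir, dest_dir):
    """Classify every file under source_dir as ok, missing, or corrupt."""
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    report = {"ok": [], "missing": [], "corrupt": []}
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        relative = src.relative_to(source_dir)
        dst = dest_dir / relative
        if not dst.is_file():
            report["missing"].append(relative.as_posix())
        elif file_digest(src) != file_digest(dst):
            report["corrupt"].append(relative.as_posix())
        else:
            report["ok"].append(relative.as_posix())
    return report
```

Doing this by hand for thousands of files after a multi-day transfer is exactly the kind of tedium a managed service is meant to remove.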
5. Maintainability

Providing a high performance data transfer capability
typically requires that you install and maintain special
software on your users’ machines. If you’re a member of
a large organization such as a national laboratory this is
usually not a big deal (or, rather, it probably is a big
deal, and thus people are actually employed to do
that.) If you’re a small or medium resource provider,
enabling high performance data movement is often a
one person job. Not only do you have to install new
software and keep it current, you also have to make
sure it is configured to run at peak efficiency. And, of
course, user support responsibilities might also fall solely
on you.
What is the GO-Transfer approach?


• To mitigate reliability issues, GO-Transfer
includes built-in automated fault detection
and recovery, and is backed by a
responsive Help Desk staffed by data
movement experts.
• To address security concerns,
communications with endpoints are strongly
authenticated and authorized using native
mechanisms; in addition Globus Online never
stores private data like passwords and keys.


• Performance-wise Globus Online is able to obtain
state-of-the-art speeds through its use of the GridFTP
transport protocol; user files are transferred directly
from endpoint to endpoint, making use of high-performance networks where available.
• Usability issues are addressed with an asynchronous
“fire-and-forget” model that frees users from
babysitting long-running transfers; email notifications
are sent for interesting events like transfer
completion; a rich query interface returns detailed
information on transfer status and events; the CLI and
GUI are implemented with technologies familiar to
most users.
•
Maintenance costs are mitigated by
Globus Online’s software-as-a-service
model; users do not need special
software to control their transfers;
streamlined packaging of endpoint
software eases the burden of creating
new endpoints, and the Globus Online
development team stands at the ready to
fix bugs and provide technical assistance
where needed.
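
The asynchronous "fire-and-forget" model mentioned above can be sketched with a toy, in-memory service: submitting a transfer returns a task id immediately, and the status can be queried later without blocking. This is an assumption-laden illustration, not how Globus Online is implemented; `ToyTransferService`, its methods, and the status strings are all hypothetical.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class ToyTransferService:
    """In-memory stand-in for a hosted transfer service (hypothetical)."""

    def __init__(self, copy_func, workers=2):
        self._copy = copy_func  # callable that moves one item
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._futures = {}      # task id -> Future

    def submit(self, source, destination):
        """Queue a transfer and return immediately with a task id."""
        task_id = uuid.uuid4().hex
        self._futures[task_id] = self._pool.submit(self._copy, source, destination)
        return task_id

    def status(self, task_id):
        """Report ACTIVE, SUCCEEDED, or FAILED without blocking."""
        future = self._futures[task_id]
        if not future.done():
            return "ACTIVE"
        return "FAILED" if future.exception() else "SUCCEEDED"

    def wait(self, task_id, timeout=None):
        """Optionally block until the task finishes, then return its status."""
        try:
            self._futures[task_id].result(timeout=timeout)
        except Exception:
            pass  # failures and timeouts are reported through status()
        return self.status(task_id)
```

The caller only keeps the task id; in a real service the work and its state would live on the server side, so the user's own machine can go offline while the transfer continues.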
How to use Globus Online
What is an endpoint?
How to use Globus Online
Transfer
• First of all, you will need to sign up for a
Globus Online account. On the sign-up
page, you only need to provide some
basic information, as shown on the next slide.
• Once you’ve signed up, you will be taken
directly to your dashboard. Your account
is not quite ready yet, though. You’ll see a
notice on the left sidebar telling you that
your account is being provisioned.

You’re now ready to do your first transfer with
Globus Online. As mentioned earlier, we won’t
need to set up a GridFTP server or even create our
own endpoint for this. We will simply use a pair of
public tutorial endpoints that all GO users have
access to. To perform a transfer, click on “Start
Transfer” from your dashboard. You will be taken
to the transfer interface:
How to turn our local computer into an endpoint



Globus Connect (or GC, for short) sets up
a private endpoint on your
personal computer, which you will then be
able to access from the GO-Transfer web
interface. Internally, Globus Connect does
install and set up a GridFTP server on your
computer, but it hides all those details from
you.
To install Globus Connect, just click on “Get
Globus Connect” from the transfer interface.
You should see a pop-up like this:



Globus Connect establishes only outbound
connections and thus can work behind a
firewall or other network interface device that
does not allow for inbound connections.
The Globus Connect server is stateless and
thus can be started and stopped at will; all
state associated with transfers is maintained
by GO.
Automatic updates mean the user need not
maintain the software over time.
Different Interfaces for GO
Globus Online provides different interfaces
for different users:
• Web GUI: a friendly interface for ad hoc and less
technical users
• CLI (command-line interface): for
advanced users
• REST (Representational State Transfer)
application programming interface: for
system builders
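
For system builders, a REST interface means a transfer can be described as a JSON document and sent over HTTP. The sketch below only builds such a request and never sends it. The base URL, the `/transfer` path, and every field name are placeholders invented for illustration and do not reflect the documented Globus Online schema; authentication is omitted entirely.

```python
import json
import urllib.request

# Hypothetical base URL; the real service address is not given in these slides.
BASE_URL = "https://transfer.example.org/v1"


def build_transfer_request(source_endpoint, destination_endpoint, paths):
    """Build (but do not send) a POST request describing a transfer."""
    body = {
        # Field names are illustrative only, not the real API schema.
        "source_endpoint": source_endpoint,
        "destination_endpoint": destination_endpoint,
        "items": [
            {"source_path": src, "destination_path": dst} for src, dst in paths
        ],
    }
    return urllib.request.Request(
        url=BASE_URL + "/transfer",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

A script could pass the returned object to `urllib.request.urlopen` once the real URL, schema, and credentials have been taken from the service's API documentation.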
User Profile and Identity Management
• An important GO feature is the ability to
handle transfers across multiple security
domains with multiple user identities.
Unlike many systems, including most
previous Grid file transfer services, GO
does not require a single, common
security credential across all transfer
endpoints.
• GO doesn’t store passwords. Instead, it
stores identities. Users can configure their
profile with various identities, such as
MyProxy and OAuth identities.
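
One way to picture a profile with several identities is a mapping from each endpoint to the identity used there, with no passwords stored. This is a minimal sketch under that assumption; `Identity`, `Profile`, and the endpoint names are hypothetical and do not describe GO's internal data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Identity:
    provider: str  # for example "myproxy" or "oauth"
    username: str  # account name known to that provider; no secret is kept


@dataclass
class Profile:
    identities: dict = field(default_factory=dict)  # endpoint name -> Identity

    def link(self, endpoint, identity):
        """Remember which identity to use on a given endpoint."""
        self.identities[endpoint] = identity

    def identities_for(self, source, destination):
        """Return the (source, destination) identities for one transfer."""
        missing = [ep for ep in (source, destination) if ep not in self.identities]
        if missing:
            raise KeyError("no identity linked for: " + ", ".join(missing))
        return self.identities[source], self.identities[destination]
```

A single transfer can then use a different identity at each end, which is the point of not requiring one common credential.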
Scalable Cloud-based Implementation

SaaS requires reliability and scalability: the service must continue
to operate despite the failure of individual
components and behave appropriately as usage
grows. To this end, the GO team applies methods
commonly used by SaaS providers, running GO on
a commercial cloud provider, Amazon Web
Services (AWS). The GO implementation uses a
combination of Amazon Elastic Compute Cloud
(EC2), Amazon Elastic Load Balancing, and
Amazon Simple Storage Service (S3).
• The vast majority of GO is programmed in
Python, running on Ubuntu Linux servers,
with Cassandra and Postgres databases.