The Ibis e-Science Software Framework Henri Bal

advertisement
The Ibis e-Science Software
Framework
Henri Bal
Frank J. Seinstra, Jason Maassen, Niels Drost
High Performance Distributed Computing Group
Department of Computer Science
VU University, Amsterdam, The Netherlands
Introduction
●
Distributed systems continue to change
●
●
Distributed applications continue to change
●
●
Clusters, grids, clouds, mobile devices
e-Science, web, pervasive applications
Distributed programming continues to be
notoriously difficult
Distributed Systems: 1980s
Multiple PCs on a (local) network
●
●
●
●
●
Networks of Workstations (NOWs)
Collections of Workstations (COWs)
Processor pools
Condor pools
Clusters
Distributed Systems: 1990s
Sharing wide-area resources
●
●
●
●
●
Metacomputing (Smarr & Catlett, CACM)
Flocking Condor (Epema)
DAS (Distributed ASCI Supercomputer)
Grid Blueprint (Foster & Kesselman)
Desktop grids, SETI@home
Distributed Systems: 2000s
●
Cloud computing
●
●
●
●
●
Pay-on-demand
Virtualization
Hardware diversity /heterogeneous computing
Green IT
The Networked World
●
●
Sensor networks
Smart phones
Our approach
●
●
●
Study fundamental underlying problems
… hand-in-hand with realistic applications
… integrate solutions in one system: Ibis
User
Distributed Systems
● Funding from NWO (2002), VL-e (2003-2009), EU
(JavaGAT, XtreemOS, Contrail), VU, COMMIT
Outline
●
●
●
‘Problem Solving’ vs. ‘System Fighting’
Jungle Computing
Example applications:
●
●
●
●
The Ibis Software Framework
The 3 Common Uses of Ibis
●
●
Computational Astrophysics
Multimedia Content Analysis
‘Master Key’ + ‘Glue’ + ‘HPC’
Some current work: Green Clouds
Ibis: ‘Problem Solving’ vs. ‘System Fighting’
A Random Example: Supernova Detection
●
DACH 2008, Japan
●
●
●
●
Distributed multi-cluster system
● Heterogeneous
Distributed database (image pairs)
● Large vs small databases/images
● Partial replication
Image-pair comparison given (in C)
Find all supernova candidates
●
●
Task 1: As fast as possible
Task 2: Idem, under system crashes
‘Problem Solving’ vs. ‘System Fighting’
●
All participating teams struggled (1 month)
●
●
●
●
Middleware instabilities…
Connectivity problems…
Load balancing…
But not the Ibis team
●
●
●
Winner (by far) in both categories
Note: many Japanese teams with years of experience
● Hardware, middleware, network, C-code, image data…
Focus on ‘problem solving’, not ‘system fighting’
● incl. ‘opening’ of black-box C-code
Ibis Results: Awards & Prizes
AAAI-VC 2007
Most Visionary Research Award
1st Prize: DACH 2008 - BS
1st Prize: DACH 2008 - FT
WebPie: A Web-Scale Parallel Inference
Engine
J. Urbani, S. Kotoulas,
J. Maassen, N. Drost, F.J. Seinstra,
F. van Harmelen, and H.E. Bal
3rd Prize: ISWC 2008
1st Prize: SCALE 2008
1st Prize: SCALE 2010
●
Many domains; data/compute intensive, real-time...
●
Winner Sustainability Award in the Enlighten Your Research (EYR)
competition, 7 Dec. 2011 (Frank Seinstra)
Ibis Users…
…and many more
Jungle Computing
Jungle Computing (Frank Seinstra)
●
‘Worst case’ computing as required by end-users
●
●
●
Distributed
Heterogeneous
Hierarchical (incl. multi-/many-cores)
Why Jungle Computing?
●
Scientists often forced to use a wide variety of
resources simultaneously to solve computational
problems, e.g. due to:
●
●
●
●
●
●
●
Desire for scalability
Distributed nature of (input) data
Software heterogeneity (e.g.: mix of C/MPI and CUDA)
Ad hoc hardware availability
Energy consumption (use most energy-efficient resource)
…
Note: most users do not need ‘worst case’ jungle
●
Ibis aims to apply to any subset
Example Application Domains
●
Computational Astrophysics (Leiden)
●
●
●
AMUSE: multi-model / multi-kernel simulations
“Simulating the Universe on an Intercontinental
Grid” - Portegies Zwart et al (IEEE Computer, Aug 2010)
Climate Modeling (Utrecht)
●
CPL: multi-model / multi-kernel simulations
● Atmosphere, ocean, source rock formation, …
- hardware: (potentially) very diverse
- high resolution => speed & scalability
-…
Domain Example #1:
Computational Astrophysics
Domain Example #1: Computational Astrophysics
Demonstrated live at SC’11, Nov 12-18, 2011, Seattle, USA
Domain Example #1: Computational Astrophysics
●
The AMUSE system (Leiden University)
●
Early Star Cluster Evolution, including gas
gravitational
dynamics
stellar
evolution
AMUSE
hydrodynamics
radiative
transport
●
●
●
Gravitational dynamics (N-body):
GPU / GPU-cluster
Stellar evolution:
Beowulf cluster / Cloud
Hydro-dynamics, Radiative transport: Supercomputer
Domain Example #1: Computational Astrophysics
Demonstrated live at SC’11, Nov 12-18, 2011, Seattle, USA
Domain Example #2:
Multimedia Content Analysis
Multimedia Content Analysis (MMCA)
●
Aim:
●
Automatic extraction of ‘semantic concepts’ from image sets and
video streams
●
Depending on specific problem & size of data set:
● May take hours, days, weeks, months, years…
Multimedia Content Analysis (MMCA)
●
Applications in (a.o):
●
●
●
●
●
●
●
Remote Sensing
Security / Surveillance
Medical Imaging
Document Analysis
Multimedia Systems
Astronomy
Application types:
●
●
●
Real-time vs. off-line
Fine-grained vs. coarse-grained
Data-intensive / compute-intensive / information-intensive
Domain Example #2: Color-based Object
Recognition by a Grid-connected Robot Dog
Seinstra et al (IEEE Multimedia, Oct-Dec 2007)
Seinstra et al (AAAI’07: Most Visionary Research Award)
Successful…
●
…but many fundamental problems unsolved!
●
●
●
●
●
●
●
Scaling up to very large systems
Platform independence
Middleware independence
Connectivity (a.o. firewalls, …)
Fault-tolerance
…
Software support tool(s) urgently needed!
●
●
Jungle-aware + transparent + efficient
No progress until ‘discovery’ of Ibis
The Ibis Software Framework
The Ibis Software Framework
●
Offers all functionality to efficiently & transparently
implement & run Jungle Computing applications
●
Designed for dynamic / hostile environments
●
Modular and flexible
●
●
Allow replacement of Ibis components by external ones, including
native code
Open source
●
Download: http://www.cs.vu.nl/ibis/
Ibis Design
●
Applications need functionality for
●
●
Programming (as in programming languages)
Deployment (as in operating systems)
Programming
Deployment
Logical
Practical
Likes math
Visual (GUI)
Ibis Software Stack
JavaGAT
●
Java Grid Application Toolkit
●
●
●
Developed in EU GridLab project
●
●
High-level API for developing (Grid) applications independently of
the underlying (Grid) middleware
Use (Grid) services; file cp, resource discovery, job submission, …
Thilo Kielmann, Rob van Nieuwpoort
SAGA API standardized by OGF
●
●
Simple API for Grid Applications (a.o. with LSU)
SAGA on top of JavaGAT (and v.v.)
Zorilla
●
A prototype P2P middleware
●
●
●
●
●
A Zorilla system consists of a collection of nodes, connected by a
P2P network
Each node independent & implements all middleware functionality
No central components
Supports fault-tolerance and malleability
Easily combines resources in multiple administrative domains
IbisDeploy
Ibis Portability Layer (IPL)
●
Java-centric ‘run-anywhere’ communication library
●
●
●
Supports fault-tolerance and malleability
●
●
●
Sent along with your application
“MPI for the Grid”
Resource tracking (Join-Elect-Leave model)
Open-world / Closed world
Efficient
●
●
Highly optimized object serialization
Can use optimized native libraries (e.g. MPI, Infiniband)
SmartSockets
●
Robust connection setup
Problems:
Firewalls
Network Address Translation (NAT)
Non-routed networks
Multi-homing
…
●
Always connection in 30 different scenarios
Ibis Programming Models
●
IPL-based programming models, a.o.:
●
●
●
●
Satin:
● A divide-and-conquer model
MPJ:
● The MPI binding for Java
RMI:
● Object-Oriented remote Procedure Call
Jorus:
● A ‘user transparent’ parallel model for multimedia applications
The 3 Common Uses of Ibis
Ibis as ‘Master Key’ (or ‘Passepartout’)
●
Use JavaGAT to access ‘any’ system
●
●
●
Develop/run applications independently of available middlewares
JavaGAT ‘adaptors’ required for each middleware
‘Intelligent dispatching’ even allows for transparent use of multiple
middlewares
package org.gridlab.gat.io.cpi.rftgt4;
import java.net.MalformedURLException;
import java.net.URL;
import java.rmi.RemoteException;
import java.security .cert.X509Certificate;
import java.util.Calendar;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Vector;
import javax.xml.namespace.QName;
import javax.xml.rpc.ServiceException;
import javax.xml.rpc.Stub;
import javax.xml.soap.SOAPElement;
●
Example: file copy
●
JavaGAT vs. Globus
package tutorial;
import org.apache.axis.message.addressing.EndpointReferenc eTy pe ;
import org.apache.axis.ty pes.URI.MalformedURIException;
import org.globus.axis.util.Util;
import org.globus.delegation.DelegationConsta nts;
import org.globus.delegation.DelegationException;
import org.globus.delegation.DelegationUtil;
import org.globus.gsi.GlobusCredential;
import org.globus.gsi.GlobusCredentialExceptio n;
import org.globus.gsi.gssapi.GlobusGSSCredentialImpl;
import org.globus.gsi.jaas.JaasGssUtil;
import org.globus.rft.generated.BaseRequestTy pe ;
import org.globus.rft.generated.CreateReliable FileTra nsferIn putTy pe ;
import org.globus.rft.generated.CreateReliable FileTra nsferOu tputTy pe ;
import org.globus.rft.generated.DeleteRequestTy pe ;
import org.globus.rft.generated.DeleteTy pe ;
import org.globus.rft.generated.OverallStatus;
import org.globus.rft.generated.RFTFaultResou rceProperty Ty pe ;
import org.globus.rft.generated.RFTOptionsTy pe;
import org.globus.rft.generated.ReliableFileTra nsferFa ctory PortTy pe ;
import org.globus.rft.generated.ReliableFileTra nsferPortTy pe ;
import org.globus.rft.generated.Start;
import org.globus.rft.generated.TransferReque stTy pe ;
import org.globus.rft.generated.TransferTy pe ;
import org.globus.transfer.reliable.client.BaseRFTClie nt;
import org.globus.transfer.reliable.service.RFT Constan ts;
import org.globus.wsrf.NotificationConsumerManager;
import org.globus.wsrf.Notify Callback;
import org.globus.wsrf.ResourceException;
import org.globus.wsrf.WSNConstants;
import org.globus.wsrf.container.ContainerExc eption;
import org.globus.wsrf.container.ServiceContainer;
import org.globus.wsrf.core.notification.Resour ceProperty ValueChan geNotification ElementTy pe;
import org.globus.wsrf.encoding.Deserializatio nExcep tion;
import org.globus.wsrf.encoding.ObjectDeserializer;
import org.globus.wsrf.impl.security .authentication.Constants;
import org.globus.wsrf.impl.security .authorization.Authorization;
import org.globus.wsrf.impl.security .authorization.HostAuthor ization;
import org.globus.wsrf.impl.security .authorization.Iden tity Authorization;
import org.globus.wsrf.impl.security .authorization.Self Authorization;
import org.globus.wsrf.impl.security .descriptor.ClientSecurity Descrip tor;
import org.globus.wsrf.impl.security .descriptor.ContainerSecu rity Descriptor;
import org.globus.wsrf.impl.security .descriptor.GSISecureMs gAuthM ethod;
import org.globus.wsrf.impl.security .descriptor.GSITransportAuthMe thod;
import org.globus.wsrf.impl.security .descriptor.ResourceSecu rity Descriptor;
import org.globus.wsrf.impl.security .descriptor.Security DescriptorEx ception;
import org.globus.wsrf.security .Security Manager;
import org.gridlab.gat.CouldNotInitializeCrede ntialExc eption;
import org.gridlab.gat.CredentialExpiredException;
import org.gridlab.gat.GATContext;
import org.gridlab.gat.GATInvocationExceptio n;
import org.gridlab.gat.GATObjectCreationException;
import org.gridlab.gat.Preferences;
import org.gridlab.gat.URI;
import org.gridlab.gat.io.cpi.FileCpi;
import org.gridlab.gat.security .globus.GlobusSecurity Utils;
import org.ietf.jgss.GSSCredential;
import org.ietf.jgss.GSSException;
import org.oasis.wsn.Subscribe;
import org.oasis.wsn.TopicExpressionTy pe;
import org.oasis.wsrf.faults.BaseFaultTy pe;
import org.oasis.wsrf.lifetime.SetTerminationTime ;
import org.oasis.wsrf.properties.GetMultipleRe sourcePropertiesResp onse ;
import org.oasis.wsrf.properties.GetMultipleRe sourceProperties_Element;
import org.oasis.wsrf.properties.ResourceProperty ValueChan geNotif ication Ty pe;
class RFTGT4Notify Callback implements Notify Callback {
RFTGT4FileAdaptor transfer;
OverallStatus status;
public RFTGT4Notify Callback(RFTGT4FileA daptor transfer) {
super();
this.transfer = transfer;
this.status = null;
}
import org.gridlab.gat.GAT;
import org.gridlab.gat.GATContext;
import org.gridlab.gat.URI;
import org.gridlab.gat.io.File;
public class RemoteCopy {
public static void main(String[] args) throws Exception {
GATContext context = new GATContext();
URI src = new URI(args[0]);
URI dest = new URI(args[1]);
File file = GAT.createFile(context, src);
file.copy (dest);
GAT.end();
}
}
@SuppressWarnings("unchecked")
public void deliver(List topicPath, EndpointReferenceTy pe producer,
Object messageWrapper) {
try {
ResourceProperty ValueChangeNotificationTy pe message =
((ResourceProperty ValueChangeNotificationElementTy pe ) messageWrapper)
.getResourceProperty ValueChangeNotification();
this.status = (OverallStatus) message.getNewValue().get_any ()[0]
.getValueAsTy pe(RFTConstants.OVERALL_ST ATUS_ RESO URCE,
OverallStatus.class);
if (status.getFault() != null) {
transfer.setFault(getFaultFromRP(status.getFault()));
}
// RunQueue.getInstance().add(this.resourc eKey );
} catch (Exception e) {
}
transfer.setStatus(status);
}
private BaseFaultTy pe getFaultFromRP (RFTFaultResourceProperty Ty pe fault) {
if (fault == null) {
return null;
}
●
●
Simple, portable, …
SAGA API standardized
if (fault.getDelegationEPRMissingFaultTy pe() != null) {
return fault.getDelegationEPRMissingFaultTy pe();
} else if (fault.getRftAuthenticationFaultTy pe() != null) {
return fault.getRftAuthenticationFaultTy pe();
} else if (fault.getRftAuthorizationFa ultTy pe() != null) {
return fault.getRftAuthorizationFaultTy pe();
} else if (fault.getRftDatabaseFaultTy pe() != null) {
return fault.getRftDatabaseFaultTy pe();
} else if (fault.getRftRepeatedly StartedFaultTy pe () != null) {
return fault.getRftRepeatedly StartedFaultTy pe ();
} else if (fault.getTransferTransientFaultTy pe () != null) {
return fault.getTransferTransientFaultTy pe ();
} else if (fault.getRftTransferFaultTy pe () != null) {
return fault.getRftTransferFaultTy pe ();
} else {
return null;
}
}
}
@SuppressWarnings("serial")
public class RFTGT4FileAdaptor extends FileCpi {
public static final Authorization DEFAULT_AUTHZ = HostAuthorization
.getInstance();
Integer msgProtectionTy pe = Constants.SIGNATURE;
static final int TERM_TIME = 20;
static final String PROTOCOL = "https";
private static final String BASE_SERVICE_PATH = "/wsrf/services/";
public static final int DEFAULT_DURATION_HOURS = 24;
public static final Integer DEFAULT_MSG_PROTECTION = Constants.SIGNATURE;
public static final String DEFAULT_FACTORY_PORT = "8443";
private static final int DEFAULT_GRIDFTP_PORT = 2811;
NotificationConsumerManager notificationConsumerManager;
EndpointReferenceTy pe notificationConsumerEPR;
EndpointReferenceTy pe notificationProducerEPR;
String security Ty pe;
String factory Url;
GSSCredential proxy ;
Authorization authorization;
String host;
OverallStatus status;
BaseFaultTy pe fault;
String locationStr;
ReliableFileTransferFactory PortTy pe factory Port;
throw new GATInvocationException(
"RFTGT4FileAdaptor: setAuthMethods failed, " + e);
}
RFTGT4Notify Callback notify Callback = new RFTGT4Notify Callback(this);
try {
notificationConsumerEPR = notificationConsumerManager
.createNotificationConsumer(topicPath, notify Callback,
security Descriptor);
} catch (ResourceException e) {
throw new GATInvocationException(
"RFTGT4FileAdaptor: createNotificationConsumer failed, "
+ e);
}
Subscribe subscriptionRequest = new Subscribe();
subscriptionRequest.setConsumerReference( notifica tionCo nsumerEPR);
TopicExpressionTy pe topicExpression = null;
try {
topicExpression = new TopicExpressionTy pe(
WSNConstants.SIMPLE_TOPIC_DIALECT,
RFTConstants.OVERALL_STATUS_RE SOURCE);
} catch (MalformedURIException e) {
throw new GATInvocationException(
"RFTGT4FileAdaptor: create TopicExpressionTy pe failed, "
+ e);
}
subscriptionRequest.setTopicExpression(topicExpre ssion);
try {
rft.subscribe(subscriptionRequest);
} catch (RemoteException e) {
throw new GATInvocationException(
"RFTGT4FileAdaptor: subscription failed, " + e);
}
public RFTGT4FileAdaptor(GATContext gatContext, Preferences preferences,
URI location) throws GATObjectCreationException {
super(gatContext, preferences, location);
if (!location.isCompatible("gsiftp")
&& !location.isCompatible("gridftp")) {
throw new GATObjectCreationException("cannot handle this URI");
}
String globusLocation = Sy stem.getenv("GLOBUS_LOCATION");
if (globusLocation == null) {
throw new GATObjectCreationException("$GLOBUS_L OCATI ON is not set");
}
Sy stem.setProperty ("GLOBUS_LOCATION", globusLocation);
Sy stem.setProperty ("axis.ClientConfigFile", globusLocation
+ "/client-config.wsdd");
this.host = location.getHost();
this.security Ty pe = Constants.GSI_SEC_MSG;
this.authorization = null;
this.proxy = null;
try {
proxy = GlobusSecurity Utils.getGlobusCredential(gatContext,
preferences, "globus", location, DEFAULT_GRIDFTP_PORT);
} catch (CouldNotInitializeCredentialException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (CredentialExpiredException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
this.notificationConsumerManager = null;
this.notificationConsumerEPR = null;
this.notificationProducerEPR = null;
this.status = null;
this.fault = null;
factory Port = null;
this.factory Url = PROTOCOL + "://" + host + ":" + DEFAULT_FACTORY_PORT
+ BASE_SERVICE_PATH + RFTConstants.FACTORY_NAME;
locationStr = setLocationStr(location);
}
String setLocationStr(URI location) {
if (location.getScheme().equals("any ")) {
return "gsiftp://" + location.getHost() + ":" + location.getPort()
+ "/" + location.getPath();
} else {
return location.toString();
}
}
protected boolean copy 2(String destStr) throws GATInvocationException {
EndpointReferenceTy pe credentialEndpoint = getCredentialEPR();
TransferTy pe[] transferArray = new TransferTy pe[1];
transferArray [0] = new TransferTy pe();
transferArray [0].setSourceUrl(locationStr);
transferArray [0].setDestinationUrl(destStr);
int lifetime = DEFAULT_DURATION_HOURS * 60 * 60;
ClientSecurity Descriptor secDesc = new ClientSecurity Descriptor();
if (this.security Ty pe.equals(Constants.GSI_SEC_MSG)) {
secDesc.setGSISecureMsg(this.getMessageProtectionTy pe());
} else {
secDesc.setGSITransport(this.getM essageProtectio nTy pe());
}
secDesc.setAuthz(getAuthorization() );
if (this.proxy != null) {
secDesc.setGSSCredential(this.proxy );
}
// Get the public key to delegate on.
X509Certificate[] certsToDelegateOn = null;
try {
certsToDelegateOn = DelegationUtil.getCertificateChainRP(
delegationFactory Endpoint, secDesc);
} catch (DelegationException e) {
throw new GATInvocationException("RFTGT4FileAdaptor: " + e);
}
X509Certificate certToSign = certsToDelegateOn[0];
}
protected EndpointReferenceTy pe getCredentialEPR()
throws GATInvocationException {
this.status = null;
URL factory URL = null;
try {
factory URL = new URL(factory Url);
} catch (MalformedURLException e) {
throw new GATInvocationException(
"RFTGT4FileAdaptor: set factory URL failed, " + e);
}
try {
factory Port = BaseRFTClient.rftFactory Locator
.getReliableFileTransferFactory PortTy pePort(factory URL);
} catch (ServiceException e) {
throw new GATInvocationException(
"RFTGT4FileAdaptor: set factory Port failed, " + e);
}
setSecurity Ty peFromURL(factory URL);
return populateRFTEndpoints(factory Port);
}
// FIXME remove when there is a DelegationUtil.delegate(EPR, ...)
String protocol = delegationFactory Endpoint.getAddress().getScheme();
String host = delegationFactory Endpoint.getAddress().getH ost();
int port = delegationFactory Endpoint.getAddress().getPort();
String factory Url = protocol + "://" + host + ":" + port
+ BASE_SERVICE_PATH + DelegationConstants.FACTORY_PATH;
// send to delegation service and get epr.
EndpointReferenceTy pe credentialEndpoint = null;
try {
credentialEndpoint = DelegationUtil.delegate(factory Url,
credential, certToSign, lifetime, false, secDesc);
} catch (DelegationException e) {
throw new GATInvocationException("RFTGT4FileAdaptor: " + e);
}
return credentialEndpoint;
}
protected void setRequest(BaseRequestTy pe request)
throws GATInvocationException {
CreateReliableFileTransferInputTy pe input = new CreateReliableFileTransferInputTy pe();
if (request instanceof TransferRequestTy pe) {
input.setTransferRequest((Transfer RequestTy pe) request);
} else {
input.setDeleteRequest((DeleteRequ estTy pe) request);
}
public EndpointReferenceTy pe[] fetchDelegationFactory Endpoints(
ReliableFileTransferFactory PortTy pe factory Port)
throws GATInvocationException {
GetMultipleResourceProperties_Element request = new GetMultipleResourceProperties_Element();
request
.setResourceProperty (new QName[] { RFTConstants.DELEGATION_ENDPOINT_FACTO RY });
GetMultipleResourcePropertiesResponse response;
try {
response = factory Port.getMultipleResourceProperties(request);
} catch (RemoteException e) {
e.printStackTrace();
throw new GATInvocationException(
"RFTGT4FileAdaptor: getMultipleResourceProperties, " + e);
}
SOAPElement[] any = response.get_any ();
Calendar termTimeDel = Calendar.getInstance();
termTimeDel.add(Calendar.MINUTE, TERM_TIME);
input.setInitialTerminationTime(termTimeDel);
CreateReliableFileTransferOutputTy pe response = null;
try {
response = factory Port.createReliableFileTransfer(input);
} catch (RemoteException e) {
throw new GATInvocationException(
"RFTGT4FileAdaptor: set createReliableFileTransfer failed, "
+ e);
}
EndpointReferenceTy pe reliableRFTEndpoint = response
.getReliableTransferEPR();
ReliableFileTransferPortTy pe rft = null;
try {
rft = BaseRFTClient.rftLocator
.getReliableFileTransferPortTy pePort(reliableRFTEndpoint);
} catch (ServiceException e) {
throw new GATInvocationException(
"RFTGT4FileAdaptor: getReliableFileTransferPortTy pePort failed, "
+ e);
}
setStubSecurity Properties((Stub) rft);
subscribe(rft);
Calendar termTime = Calendar.getInstance();
termTime.add(Calendar.MINUTE, TERM_TIME);
SetTerminationTime reqTermTime = new SetTerminationTime();
reqTermTime.setRequestedTerminationTime(termTime);
try {
rft.setTerminationTime(reqTermTime);
} catch (RemoteException e) {
throw new GATInvocationException(
"RFTGT4FileAdaptor: setTerminationTime failed, " + e);
}
try {
rft.start(new Start());
} catch (RemoteException e) {
throw new GATInvocationException(
"RFTGT4FileAdaptor: start failed, " + e);
}
RFTOptionsTy pe rftOptions = new RFTOptionsTy pe();
rftOptions.setBinary (Boolean.TRUE);
// rftOptions.setIgnoreFilePermErr(false);
TransferRequestTy pe request = new TransferRequestTy pe();
request.setRftOptions(rftOptions);
request.setTransfer(transferArray );
request.setTransferCredentialEndpoint(cred entialEn dpoint) ;
setRequest(request);
while (!transfersDone()) {
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
throw new GATInvocationException("RFTGT4FileAdap tor: " + e);
}
}
return transfersSucc();
}
public void copy (URI dest) throws GATInvocationException {
String destUrl = setLocationStr(dest);
if (!copy 2(destUrl)) {
throw new GATInvocationException(
"RFTGT4FileAdaptor: file copy failed");
}
}
public void subscribe(ReliableFileTransferPortTy pe rft)
throws GATInvocationException {
Map<Object, Object> properties = new HashMap<Object, Object>();
properties.put(ServiceContainer.CLA SS,
"org.globus.wsrf.container.GSIServiceCon tainer");
if (this.proxy != null) {
ContainerSecurity Descriptor containerSecDesc = new ContainerSecurity Descriptor();
Security Manager.getManager();
try {
containerSecDesc.setSubject(JaasGssUtil
.createSubject(this.proxy ));
} catch (GSSException e) {
throw new GATInvocationException(
"RFTGT4FileAdaptor: ContainerSecurity Descriptor failed, "
+ e);
}
properties.put(ServiceContainer.CO NTAIN ER_D ESCRIPTOR,
containerSecDesc);
}
this.notificationConsumerManager = NotificationConsumerManager
.getInstance(properties);
try {
this.notificationConsumerManager.startListening();
} catch (ContainerException e) {
throw new GATInvocationException(
"RFTGT4FileAdaptor: NotificationConsumerManager failed, "
+ e);
}
List<Object> topicPath = new LinkedList<Object>();
topicPath.add(RFTConstants.OVERALL_ST ATUS_ RESOU RCE);
ResourceSecurity Descriptor security Descriptor = new ResourceSecurity Descriptor();
String authz = null;
if (authorization == null) {
authz = Authorization.AUTHZ_NONE;
} else if (authorization instanceof HostAuthorization) {
authz = Authorization.AUTHZ_NONE;
} else if (authorization instanceof SelfAuthorization) {
authz = Authorization.AUTHZ_SELF;
} else if (authorization instanceof Identity Authorization) {
// not supported
throw new GATInvocationException(
"RFTGT4FileAdaptor: identity authorization not supported");
} else {
// throw an sg
throw new GATInvocationException(
"RFTGT4FileAdaptor: set authorization failed");
}
security Descriptor.setAuthz(auth z);
Vector<Object> authMethod = new Vector<Object>();
if (this.security Ty pe.equals(Constants.GSI_SEC_MSG)) {
authMethod.add(GSISecureMsgAuthMetho d.BOT H);
} else {
authMethod.add(GSITransportAuth Method. BOTH );
}
try {
security Descriptor.setAuthMethods(authMe thod);
} catch (Security DescriptorException e) {
private EndpointReferenceTy pe delegate(
EndpointReferenceTy pe delegationFactory Endpoint)
throws GATInvocationException {
GlobusCredential credential = null;
if (this.proxy != null) {
credential = ((GlobusGSSCredentialImpl) this.proxy )
.getGlobusCredential();
} else {
try {
credential = GlobusCredential.getDefaultCredential();
} catch (GlobusCredentialException e) {
throw new GATInvocationException("RFTGT4FileAdap tor: " + e);
}
}
EndpointReferenceTy pe epr1 = null;
try {
epr1 = (EndpointReferenceTy pe) ObjectDeserializer.toObject(any [0],
EndpointReferenceTy pe.class);
} catch (DeserializationException e) {
throw new GATInvocationException(
"RFTGT4FileAdaptor: ObjectDeserializer, " + e);
}
EndpointReferenceTy pe[] endpoints = new EndpointReferenceTy pe[] { epr1 };
return endpoints;
}
sy nchronized void setStatus(OverallStatus status) {
this.status = status;
}
public int transfersActive() {
if (status == null) {
return 1;
}
return status.getTransfersActive();
}
public int transfersFinished() {
if (status == null) {
return 0;
}
return status.getTransfersFinished();
}
}
public int transfersCancelled() {
if (status == null) {
return 0;
}
return status.getTransfersCancelled();
}
private void setSecurity Ty peFromURL(URL url) {
if (url.getProtocol().equals("http")) {
security Ty pe = Constants.GSI_SEC_MSG;
} else {
Util.registerTransport();
security Ty pe = Constants.GSI_TRANSPORT;
}
}
public int transfersFailed() {
if (status == null) {
return 0;
}
return status.getTransfersFailed();
}
private void setStubSecurity Properties(Stub stub) {
ClientSecurity Descriptor secDesc = new ClientSecurity Descriptor();
if (this.security Ty pe.equals(Constants.GSI_SEC_MSG)) {
secDesc.setGSISecureMsg(this.getMessageProtectionTy pe());
} else {
secDesc.setGSITransport(this.getM essageProtectio nTy pe());
}
public int transfersPending() {
if (status == null) {
return 1;
}
return status.getTransfersPending();
}
secDesc.setAuthz(getAuthorization() );
public int transfersRestarted() {
if (status == null) {
return 0;
}
return status.getTransfersRestarted();
}
if (this.proxy != null) {
// set proxy credential
secDesc.setGSSCredential(this.proxy );
}
stub._setProperty (Constants.CLIENT_DESCRIPTOR, secDesc);
}
public boolean transfersDone() {
return (transfersActive() == 0 && transfersPending() == 0 && transfersRestarted() == 0);
}
public Integer getMessageProtectionTy pe() {
return (this.msgProtectionTy pe == null) ? RFTGT4FileAdaptor.DEFAULT_MSG _PROTECTIO N
: this.msgProtectionTy pe;
}
public boolean transfersSucc() {
return (transfersDone() && transfersFailed() == 0 && transfersCancelled() == 0);
}
public Authorization getAuthorization() {
return (authorization == null) ? DEFAULT_AUTHZ : this.authorization;
}
/*
* private BaseFaultTy pe getFaultFromRP(RFTFaultResourceProperty Ty pe fault) {
* if (fault == null) { return null; }
*
* if (fault.getRftTransferFaultTy pe() != null) { return
* fault.getRftTransferFaultTy pe(); } else if
* (fault.getDelegationEPRMissingFaultTy pe() != null) { return
* fault.getDelegationEPRMissingFaultTy pe(); } else if
* (fault.getRftAuthenticationFaultTy pe() != null) { return
* fault.getRftAuthenticationFaultTy pe(); } else if
* (fault.getRftAuthorizationFa ultTy pe() != null) { return
* fault.getRftAuthorizationFaultTy pe(); } else if
* (fault.getRftDatabaseFaultTy pe() != null) { return
* fault.getRftDatabaseFaultTy pe(); } else if
* (fault.getRftRepeatedly StartedFaultTy pe() != null) { return
* fault.getRftRepeatedly StartedFaultTy pe(); } else if
* (fault.getTransferTransientFaultTy pe() != null) { return
* fault.getTransferTransientFaultTy pe(); } else { return null; } }
*/
private EndpointReferenceTy pe populateRFTEndpoints(
ReliableFileTransferFactory PortTy pe factory Port)
throws GATInvocationException {
EndpointReferenceTy pe[] delegationFactory Endpoints = fetchDelegationFactory Endpoints(factory Port);
EndpointReferenceTy pe delegationEndpoint = delegate(delegationFactory Endpoints[0]);
return delegationEndpoint;
}
/*
* private BaseFaultTy pe deserializeFaultRP(SOAPElement any ) throws
* Exception { return getFaultFromRP((RFTFaultResourceProperty Ty pe)
* ObjectDeserializer .toObject(any , RFTFaultResourceProperty Ty pe.class)); }
*/
void setFault(BaseFaultTy pe fault) {
this.fault = fault;
}
}
Ibis as ‘Glue’
●
Use IPL + SmartSockets, generally for wide-area
communication
●
●
Linking up separate ‘activities’ of an application
● Activities: often largely ‘independent’ tasks implemented in any
popular language or model (e.g. C/MPI, CUDA, Fortran, Java…)
● Each typically running on a single GPU/node/Cluster/Cloud/…
Automatically circumvent connectivity problems
● Example:
With SmartSockets:
No SmartSockets:
Ibis as ‘HPC Solution’
●
Use Ibis as replacement for e.g. C++/MPI code
●
Benefits:
(better) portability
● malleability (open world)
● fault-tolerance
● (run-time) task migration
Downside:
● requires recoding
●
●
●
Comparable speedups:
MMCA: Situation in 2004/2005
SSH
Parallel
Horus
Parallel
Horus
Client
Sockets + SSH Tunneling
Server
C++/MPI
●
●
●
●
Code pre-installed at each cluster site
Instable / faulty communication
Connectivity problems
Execution on each cluster ‘by hand’
Parallel
Horus
Client
Phase 1: Ibis as ‘Master Key’ (2006)
JavaGAT + IbisDeploy
Parallel
Horus
Parallel
Horus
Client
Sockets + SSH Tunneling
Server
C++/MPI
●
●
●
●
Code pre-installed at each cluster site
Instable / faulty communication
Connectivity problems
Execution on each cluster ‘by hand’
Parallel
Horus
Client
Phase 2: Ibis as ‘Glue’ (2006/2007)
JavaGAT + IbisDeploy
Parallel
Horus
Parallel
Horus
Client
IPL + SmartSockets
Server
C++/MPI
●
●
●
●
Code pre-installed at each cluster site
Instable / faulty communication
Connectivity problems
Execution on each cluster ‘by hand’
Parallel
Horus
Client
Phase 3: Ibis as ‘HPC Solution’ (2008)
JavaGAT + IbisDeploy
Parallel
Jorus
Parallel
Jorus
Client
IPL + SmartSockets
Server
Ibis/Java
●
●
●
●
Code pre-installed at each cluster site
Instable / faulty communication
Connectivity problems
Execution on each cluster ‘by hand’
Parallel
Jorus
Client
‘Master Key’ + ‘Glue’ + ‘HPC’
●
Step-wise conversion to 100% Ibis / Java
●
●
●
●
●
Phase 1: JavaGAT as ‘Master Key’
Phase 2: IPL + SmartSockets as ‘Glue’
Phase 3: Ibis as ‘HPC Solution’
After each phase a fully functional, working solution was available!
Eventual result:
●
●
●
‘wall-socket computing from a memory stick’
Remember: the ‘Promise of the Grid’?
Awards at AAAI 2007 and CCGrid 2008
100% Ibis Implementation (2008++)
Seinstra, Maassen, Drost et al. (SCALE 2008 @ CCGrid 2008: First Prize Winner)
Bal, Maassen, Drost, Seinstra et al. (IEEE Computer, Aug 2010)
Some current work
Green Clouds
●
●
●
NWO Smart Energy Systems project with Univ.
of Amsterdam (Cees de Laat) & SARA
How to map high-performance applications onto
hybrid distributed computing system, taking both
performance & energy consumption into account
System-level approach to reduce HPC energy
consumption
DAS-4: infrastructure for Green IT
UvA/MultimediaN (16/36)
Dual quad-core Xeon E5620
Various accelerators
(GPUs, multicores, ….)
Scientific Linux
Built by ClusterVision
VU (74)
SURFnet6
ASTRON (23)
10 Gb/s lambdas
TU Delft (32)
Leiden (16)
Main ideas
●
Adapt resources to application needs dynamically,
accounting for computational & energy efficiency
●
●
Exploit hardware diversity
●
●
●
Using Ibis malleability support
Graphics Processing Units (GPUs) have much higher FLOPS/Watt
for many applications
Use optical and photonic networks
Build a knowledge base & semantic infrastructure
description
Other current PhD projects using Ibis
●
Distributed reasoning over semantic web data
●
●
●
Graph applications (HiPG)
●
●
●
E.g. for bioinformatics applications
http://www.graph500.org/
Games & distributed model checking
●
●
WebPIE: Parallel reasoner on Web scale
Written in Java, uses Hadoop (MapReduce)
Deal with large state space
Distributed smart phone applications
●
Computation & communication offloading to a cloud
Conclusions
●
Ibis enables problem solving (avoids system fighting)
●
Successfully applied in many domains
●
●
●
Astronomy, multimedia analysis, climate modeling,
remote sensing, semantic web, medical imaging, …
Data intensive, compute intensive, real-time…
Open source, download:
●
www.cs.vu.nl/ibis/
Conclusions (2)
●
Jungle Computing is hard
●
High-Performance Jungle Computing even harder
●
While research into efficient & green Jungle-aware
programming models has only just begun…
●
…Ibis provides the basic functionality to efficiently
& transparently overcome most Jungle Computing
complexities
Acknowledgements
Niels Drost
Ceriel Jacobs
Roelof Kemp
Timo van Kessel
Thilo Kielmann
Jason Maassen
Maarten van Meersbergen
Rob van Nieuwpoort
Nick Palmer
Kees van Reeuwijk
Frank J. Seinstra
Kees Verstoep
Ben van Werkhoven
Gosia Wrzesinska
Download