The Ibis e-Science Software Framework Henri Bal Frank J. Seinstra, Jason Maassen, Niels Drost High Performance Distributed Computing Group Department of Computer Science VU University, Amsterdam, The Netherlands Introduction ● Distributed systems continue to change ● ● Distributed applications continue to change ● ● Clusters, grids, clouds, mobile devices e-Science, web, pervasive applications Distributed programming continues to be notoriously difficult Distributed Systems: 1980s Multiple PCs on a (local) network ● ● ● ● ● Networks of Workstations (NOWs) Collections of Workstations (COWs) Processor pools Condor pools Clusters Distributed Systems: 1990s Sharing wide-area resources ● ● ● ● ● Metacomputing (Smarr & Catlett, CACM) Flocking Condor (Epema) DAS (Distributed ASCI Supercomputer) Grid Blueprint (Foster & Kesselman) Desktop grids, SETI@home Distributed Systems: 2000s ● Cloud computing ● ● ● ● ● Pay-on-demand Virtualization Hardware diversity /heterogeneous computing Green IT The Networked World ● ● Sensor networks Smart phones Our approach ● ● ● Study fundamental underlying problems … hand-in-hand with realistic applications … integrate solutions in one system: Ibis User Distributed Systems ● Funding from NWO (2002), VL-e (2003-2009), EU (JavaGAT, XtreemOS, Contrail), VU, COMMIT Outline ● ● ● ‘Problem Solving’ vs. ‘System Fighting’ Jungle Computing Example applications: ● ● ● ● The Ibis Software Framework The 3 Common Uses of Ibis ● ● Computational Astrophysics Multimedia Content Analysis ‘Master Key’ + ‘Glue’ + ‘HPC’ Some current work: Green Clouds Ibis: ‘Problem Solving’ vs. ‘System Fighting’ A Random Example: Supernova Detection ● DACH 2008, Japan ● ● ● ● Distributed multi-cluster system ● Heterogeneous Distributed database (image pairs) ● Large vs small databases/images ● Partial replication Image-pair comparison given (in C) Find all supernova candidates ● ● Task 1: As fast as possible Task 2: Idem, under system crashes ‘Problem Solving’ vs. ‘System Fighting’ ● All participating teams struggled (1 month) ● ● ● ● Middleware instabilities… Connectivity problems… Load balancing… But not the Ibis team ● ● ● Winner (by far) in both categories Note: many Japanese teams with years of experience ● Hardware, middleware, network, C-code, image data… Focus on ‘problem solving’, not ‘system fighting’ ● incl. ‘opening’ of black-box C-code Ibis Results: Awards & Prizes AAAI-VC 2007 Most Visionary Research Award 1st Prize: DACH 2008 - BS 1st Prize: DACH 2008 - FT WebPie: A Web-Scale Parallel Inference Engine J. Urbani, S. Kotoulas, J. Maassen, N. Drost, F.J. Seinstra, F. van Harmelen, and H.E. Bal 3rd Prize: ISWC 2008 1st Prize: SCALE 2008 1st Prize: SCALE 2010 ● Many domains; data/compute intensive, real-time... ● Winner Sustainability Award in the Enlighten Your Research (EYR) competition, 7 Dec. 2011 (Frank Seinstra) Ibis Users… …and many more Jungle Computing Jungle Computing (Frank Seinstra) ● ‘Worst case’ computing as required by end-users ● ● ● Distributed Heterogeneous Hierarchical (incl. multi-/many-cores) Why Jungle Computing? ● Scientists often forced to use a wide variety of resources simultaneously to solve computational problems, e.g. due to: ● ● ● ● ● ● ● Desire for scalability Distributed nature of (input) data Software heterogeneity (e.g.: mix of C/MPI and CUDA) Ad hoc hardware availability Energy consumption (use most energy-efficient resource) … Note: most users do not need ‘worst case’ jungle ● Ibis aims to apply to any subset Example Application Domains ● Computational Astrophysics (Leiden) ● ● ● AMUSE: multi-model / multi-kernel simulations “Simulating the Universe on an Intercontinental Grid” - Portegies Zwart et al (IEEE Computer, Aug 2010) Climate Modeling (Utrecht) ● CPL: multi-model / multi-kernel simulations ● Atmosphere, ocean, source rock formation, … - hardware: (potentially) very diverse - high resolution => speed & scalability -… Domain Example #1: Computational Astrophysics Domain Example #1: Computational Astrophysics Demonstrated live at SC’11, Nov 12-18, 2011, Seattle, USA Domain Example #1: Computational Astrophysics ● The AMUSE system (Leiden University) ● Early Star Cluster Evolution, including gas gravitational dynamics stellar evolution AMUSE hydrodynamics radiative transport ● ● ● Gravitational dynamics (N-body): GPU / GPU-cluster Stellar evolution: Beowulf cluster / Cloud Hydro-dynamics, Radiative transport: Supercomputer Domain Example #1: Computational Astrophysics Demonstrated live at SC’11, Nov 12-18, 2011, Seattle, USA Domain Example #2: Multimedia Content Analysis Multimedia Content Analysis (MMCA) ● Aim: ● Automatic extraction of ‘semantic concepts’ from image sets and video streams ● Depending on specific problem & size of data set: ● May take hours, days, weeks, months, years… Multimedia Content Analysis (MMCA) ● Applications in (a.o): ● ● ● ● ● ● ● Remote Sensing Security / Surveillance Medical Imaging Document Analysis Multimedia Systems Astronomy Application types: ● ● ● Real-time vs. off-line Fine-grained vs. coarse-grained Data-intensive / compute-intensive / information-intensive Domain Example #2: Color-based Object Recognition by a Grid-connected Robot Dog Seinstra et al (IEEE Multimedia, Oct-Dec 2007) Seinstra et al (AAAI’07: Most Visionary Research Award) Successful… ● …but many fundamental problems unsolved! ● ● ● ● ● ● ● Scaling up to very large systems Platform independence Middleware independence Connectivity (a.o. firewalls, …) Fault-tolerance … Software support tool(s) urgently needed! ● ● Jungle-aware + transparent + efficient No progress until ‘discovery’ of Ibis The Ibis Software Framework The Ibis Software Framework ● Offers all functionality to efficiently & transparently implement & run Jungle Computing applications ● Designed for dynamic / hostile environments ● Modular and flexible ● ● Allow replacement of Ibis components by external ones, including native code Open source ● Download: http://www.cs.vu.nl/ibis/ Ibis Design ● Applications need functionality for ● ● Programming (as in programming languages) Deployment (as in operating systems) Programming Deployment Logical Practical Likes math Visual (GUI) Ibis Software Stack JavaGAT ● Java Grid Application Toolkit ● ● ● Developed in EU GridLab project ● ● High-level API for developing (Grid) applications independently of the underlying (Grid) middleware Use (Grid) services; file cp, resource discovery, job submission, … Thilo Kielmann, Rob van Nieuwpoort SAGA API standardized by OGF ● ● Simple API for Grid Applications (a.o. with LSU) SAGA on top of JavaGAT (and v.v.) Zorilla ● A prototype P2P middleware ● ● ● ● ● A Zorilla system consists of a collection of nodes, connected by a P2P network Each node independent & implements all middleware functionality No central components Supports fault-tolerance and malleability Easily combines resources in multiple administrative domains IbisDeploy Ibis Portability Layer (IPL) ● Java-centric ‘run-anywhere’ communication library ● ● ● Supports fault-tolerance and malleability ● ● ● Sent along with your application “MPI for the Grid” Resource tracking (Join-Elect-Leave model) Open-world / Closed world Efficient ● ● Highly optimized object serialization Can use optimized native libraries (e.g. MPI, Infiniband) SmartSockets ● Robust connection setup Problems: Firewalls Network Address Translation (NAT) Non-routed networks Multi-homing … ● Always connection in 30 different scenarios Ibis Programming Models ● IPL-based programming models, a.o.: ● ● ● ● Satin: ● A divide-and-conquer model MPJ: ● The MPI binding for Java RMI: ● Object-Oriented remote Procedure Call Jorus: ● A ‘user transparent’ parallel model for multimedia applications The 3 Common Uses of Ibis Ibis as ‘Master Key’ (or ‘Passepartout’) ● Use JavaGAT to access ‘any’ system ● ● ● Develop/run applications independently of available middlewares JavaGAT ‘adaptors’ required for each middleware ‘Intelligent dispatching’ even allows for transparent use of multiple middlewares package org.gridlab.gat.io.cpi.rftgt4; import java.net.MalformedURLException; import java.net.URL; import java.rmi.RemoteException; import java.security .cert.X509Certificate; import java.util.Calendar; import java.util.HashMap; import java.util.LinkedList; import java.util.List; import java.util.Map; import java.util.Vector; import javax.xml.namespace.QName; import javax.xml.rpc.ServiceException; import javax.xml.rpc.Stub; import javax.xml.soap.SOAPElement; ● Example: file copy ● JavaGAT vs. Globus package tutorial; import org.apache.axis.message.addressing.EndpointReferenc eTy pe ; import org.apache.axis.ty pes.URI.MalformedURIException; import org.globus.axis.util.Util; import org.globus.delegation.DelegationConsta nts; import org.globus.delegation.DelegationException; import org.globus.delegation.DelegationUtil; import org.globus.gsi.GlobusCredential; import org.globus.gsi.GlobusCredentialExceptio n; import org.globus.gsi.gssapi.GlobusGSSCredentialImpl; import org.globus.gsi.jaas.JaasGssUtil; import org.globus.rft.generated.BaseRequestTy pe ; import org.globus.rft.generated.CreateReliable FileTra nsferIn putTy pe ; import org.globus.rft.generated.CreateReliable FileTra nsferOu tputTy pe ; import org.globus.rft.generated.DeleteRequestTy pe ; import org.globus.rft.generated.DeleteTy pe ; import org.globus.rft.generated.OverallStatus; import org.globus.rft.generated.RFTFaultResou rceProperty Ty pe ; import org.globus.rft.generated.RFTOptionsTy pe; import org.globus.rft.generated.ReliableFileTra nsferFa ctory PortTy pe ; import org.globus.rft.generated.ReliableFileTra nsferPortTy pe ; import org.globus.rft.generated.Start; import org.globus.rft.generated.TransferReque stTy pe ; import org.globus.rft.generated.TransferTy pe ; import org.globus.transfer.reliable.client.BaseRFTClie nt; import org.globus.transfer.reliable.service.RFT Constan ts; import org.globus.wsrf.NotificationConsumerManager; import org.globus.wsrf.Notify Callback; import org.globus.wsrf.ResourceException; import org.globus.wsrf.WSNConstants; import org.globus.wsrf.container.ContainerExc eption; import org.globus.wsrf.container.ServiceContainer; import org.globus.wsrf.core.notification.Resour ceProperty ValueChan geNotification ElementTy pe; import org.globus.wsrf.encoding.Deserializatio nExcep tion; import org.globus.wsrf.encoding.ObjectDeserializer; import org.globus.wsrf.impl.security .authentication.Constants; import org.globus.wsrf.impl.security .authorization.Authorization; import org.globus.wsrf.impl.security .authorization.HostAuthor ization; import org.globus.wsrf.impl.security .authorization.Iden tity Authorization; import org.globus.wsrf.impl.security .authorization.Self Authorization; import org.globus.wsrf.impl.security .descriptor.ClientSecurity Descrip tor; import org.globus.wsrf.impl.security .descriptor.ContainerSecu rity Descriptor; import org.globus.wsrf.impl.security .descriptor.GSISecureMs gAuthM ethod; import org.globus.wsrf.impl.security .descriptor.GSITransportAuthMe thod; import org.globus.wsrf.impl.security .descriptor.ResourceSecu rity Descriptor; import org.globus.wsrf.impl.security .descriptor.Security DescriptorEx ception; import org.globus.wsrf.security .Security Manager; import org.gridlab.gat.CouldNotInitializeCrede ntialExc eption; import org.gridlab.gat.CredentialExpiredException; import org.gridlab.gat.GATContext; import org.gridlab.gat.GATInvocationExceptio n; import org.gridlab.gat.GATObjectCreationException; import org.gridlab.gat.Preferences; import org.gridlab.gat.URI; import org.gridlab.gat.io.cpi.FileCpi; import org.gridlab.gat.security .globus.GlobusSecurity Utils; import org.ietf.jgss.GSSCredential; import org.ietf.jgss.GSSException; import org.oasis.wsn.Subscribe; import org.oasis.wsn.TopicExpressionTy pe; import org.oasis.wsrf.faults.BaseFaultTy pe; import org.oasis.wsrf.lifetime.SetTerminationTime ; import org.oasis.wsrf.properties.GetMultipleRe sourcePropertiesResp onse ; import org.oasis.wsrf.properties.GetMultipleRe sourceProperties_Element; import org.oasis.wsrf.properties.ResourceProperty ValueChan geNotif ication Ty pe; class RFTGT4Notify Callback implements Notify Callback { RFTGT4FileAdaptor transfer; OverallStatus status; public RFTGT4Notify Callback(RFTGT4FileA daptor transfer) { super(); this.transfer = transfer; this.status = null; } import org.gridlab.gat.GAT; import org.gridlab.gat.GATContext; import org.gridlab.gat.URI; import org.gridlab.gat.io.File; public class RemoteCopy { public static void main(String[] args) throws Exception { GATContext context = new GATContext(); URI src = new URI(args[0]); URI dest = new URI(args[1]); File file = GAT.createFile(context, src); file.copy (dest); GAT.end(); } } @SuppressWarnings("unchecked") public void deliver(List topicPath, EndpointReferenceTy pe producer, Object messageWrapper) { try { ResourceProperty ValueChangeNotificationTy pe message = ((ResourceProperty ValueChangeNotificationElementTy pe ) messageWrapper) .getResourceProperty ValueChangeNotification(); this.status = (OverallStatus) message.getNewValue().get_any ()[0] .getValueAsTy pe(RFTConstants.OVERALL_ST ATUS_ RESO URCE, OverallStatus.class); if (status.getFault() != null) { transfer.setFault(getFaultFromRP(status.getFault())); } // RunQueue.getInstance().add(this.resourc eKey ); } catch (Exception e) { } transfer.setStatus(status); } private BaseFaultTy pe getFaultFromRP (RFTFaultResourceProperty Ty pe fault) { if (fault == null) { return null; } ● ● Simple, portable, … SAGA API standardized if (fault.getDelegationEPRMissingFaultTy pe() != null) { return fault.getDelegationEPRMissingFaultTy pe(); } else if (fault.getRftAuthenticationFaultTy pe() != null) { return fault.getRftAuthenticationFaultTy pe(); } else if (fault.getRftAuthorizationFa ultTy pe() != null) { return fault.getRftAuthorizationFaultTy pe(); } else if (fault.getRftDatabaseFaultTy pe() != null) { return fault.getRftDatabaseFaultTy pe(); } else if (fault.getRftRepeatedly StartedFaultTy pe () != null) { return fault.getRftRepeatedly StartedFaultTy pe (); } else if (fault.getTransferTransientFaultTy pe () != null) { return fault.getTransferTransientFaultTy pe (); } else if (fault.getRftTransferFaultTy pe () != null) { return fault.getRftTransferFaultTy pe (); } else { return null; } } } @SuppressWarnings("serial") public class RFTGT4FileAdaptor extends FileCpi { public static final Authorization DEFAULT_AUTHZ = HostAuthorization .getInstance(); Integer msgProtectionTy pe = Constants.SIGNATURE; static final int TERM_TIME = 20; static final String PROTOCOL = "https"; private static final String BASE_SERVICE_PATH = "/wsrf/services/"; public static final int DEFAULT_DURATION_HOURS = 24; public static final Integer DEFAULT_MSG_PROTECTION = Constants.SIGNATURE; public static final String DEFAULT_FACTORY_PORT = "8443"; private static final int DEFAULT_GRIDFTP_PORT = 2811; NotificationConsumerManager notificationConsumerManager; EndpointReferenceTy pe notificationConsumerEPR; EndpointReferenceTy pe notificationProducerEPR; String security Ty pe; String factory Url; GSSCredential proxy ; Authorization authorization; String host; OverallStatus status; BaseFaultTy pe fault; String locationStr; ReliableFileTransferFactory PortTy pe factory Port; throw new GATInvocationException( "RFTGT4FileAdaptor: setAuthMethods failed, " + e); } RFTGT4Notify Callback notify Callback = new RFTGT4Notify Callback(this); try { notificationConsumerEPR = notificationConsumerManager .createNotificationConsumer(topicPath, notify Callback, security Descriptor); } catch (ResourceException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: createNotificationConsumer failed, " + e); } Subscribe subscriptionRequest = new Subscribe(); subscriptionRequest.setConsumerReference( notifica tionCo nsumerEPR); TopicExpressionTy pe topicExpression = null; try { topicExpression = new TopicExpressionTy pe( WSNConstants.SIMPLE_TOPIC_DIALECT, RFTConstants.OVERALL_STATUS_RE SOURCE); } catch (MalformedURIException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: create TopicExpressionTy pe failed, " + e); } subscriptionRequest.setTopicExpression(topicExpre ssion); try { rft.subscribe(subscriptionRequest); } catch (RemoteException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: subscription failed, " + e); } public RFTGT4FileAdaptor(GATContext gatContext, Preferences preferences, URI location) throws GATObjectCreationException { super(gatContext, preferences, location); if (!location.isCompatible("gsiftp") && !location.isCompatible("gridftp")) { throw new GATObjectCreationException("cannot handle this URI"); } String globusLocation = Sy stem.getenv("GLOBUS_LOCATION"); if (globusLocation == null) { throw new GATObjectCreationException("$GLOBUS_L OCATI ON is not set"); } Sy stem.setProperty ("GLOBUS_LOCATION", globusLocation); Sy stem.setProperty ("axis.ClientConfigFile", globusLocation + "/client-config.wsdd"); this.host = location.getHost(); this.security Ty pe = Constants.GSI_SEC_MSG; this.authorization = null; this.proxy = null; try { proxy = GlobusSecurity Utils.getGlobusCredential(gatContext, preferences, "globus", location, DEFAULT_GRIDFTP_PORT); } catch (CouldNotInitializeCredentialException e) { // TODO Auto-generated catch block e.printStackTrace(); } catch (CredentialExpiredException e) { // TODO Auto-generated catch block e.printStackTrace(); } this.notificationConsumerManager = null; this.notificationConsumerEPR = null; this.notificationProducerEPR = null; this.status = null; this.fault = null; factory Port = null; this.factory Url = PROTOCOL + "://" + host + ":" + DEFAULT_FACTORY_PORT + BASE_SERVICE_PATH + RFTConstants.FACTORY_NAME; locationStr = setLocationStr(location); } String setLocationStr(URI location) { if (location.getScheme().equals("any ")) { return "gsiftp://" + location.getHost() + ":" + location.getPort() + "/" + location.getPath(); } else { return location.toString(); } } protected boolean copy 2(String destStr) throws GATInvocationException { EndpointReferenceTy pe credentialEndpoint = getCredentialEPR(); TransferTy pe[] transferArray = new TransferTy pe[1]; transferArray [0] = new TransferTy pe(); transferArray [0].setSourceUrl(locationStr); transferArray [0].setDestinationUrl(destStr); int lifetime = DEFAULT_DURATION_HOURS * 60 * 60; ClientSecurity Descriptor secDesc = new ClientSecurity Descriptor(); if (this.security Ty pe.equals(Constants.GSI_SEC_MSG)) { secDesc.setGSISecureMsg(this.getMessageProtectionTy pe()); } else { secDesc.setGSITransport(this.getM essageProtectio nTy pe()); } secDesc.setAuthz(getAuthorization() ); if (this.proxy != null) { secDesc.setGSSCredential(this.proxy ); } // Get the public key to delegate on. X509Certificate[] certsToDelegateOn = null; try { certsToDelegateOn = DelegationUtil.getCertificateChainRP( delegationFactory Endpoint, secDesc); } catch (DelegationException e) { throw new GATInvocationException("RFTGT4FileAdaptor: " + e); } X509Certificate certToSign = certsToDelegateOn[0]; } protected EndpointReferenceTy pe getCredentialEPR() throws GATInvocationException { this.status = null; URL factory URL = null; try { factory URL = new URL(factory Url); } catch (MalformedURLException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: set factory URL failed, " + e); } try { factory Port = BaseRFTClient.rftFactory Locator .getReliableFileTransferFactory PortTy pePort(factory URL); } catch (ServiceException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: set factory Port failed, " + e); } setSecurity Ty peFromURL(factory URL); return populateRFTEndpoints(factory Port); } // FIXME remove when there is a DelegationUtil.delegate(EPR, ...) String protocol = delegationFactory Endpoint.getAddress().getScheme(); String host = delegationFactory Endpoint.getAddress().getH ost(); int port = delegationFactory Endpoint.getAddress().getPort(); String factory Url = protocol + "://" + host + ":" + port + BASE_SERVICE_PATH + DelegationConstants.FACTORY_PATH; // send to delegation service and get epr. EndpointReferenceTy pe credentialEndpoint = null; try { credentialEndpoint = DelegationUtil.delegate(factory Url, credential, certToSign, lifetime, false, secDesc); } catch (DelegationException e) { throw new GATInvocationException("RFTGT4FileAdaptor: " + e); } return credentialEndpoint; } protected void setRequest(BaseRequestTy pe request) throws GATInvocationException { CreateReliableFileTransferInputTy pe input = new CreateReliableFileTransferInputTy pe(); if (request instanceof TransferRequestTy pe) { input.setTransferRequest((Transfer RequestTy pe) request); } else { input.setDeleteRequest((DeleteRequ estTy pe) request); } public EndpointReferenceTy pe[] fetchDelegationFactory Endpoints( ReliableFileTransferFactory PortTy pe factory Port) throws GATInvocationException { GetMultipleResourceProperties_Element request = new GetMultipleResourceProperties_Element(); request .setResourceProperty (new QName[] { RFTConstants.DELEGATION_ENDPOINT_FACTO RY }); GetMultipleResourcePropertiesResponse response; try { response = factory Port.getMultipleResourceProperties(request); } catch (RemoteException e) { e.printStackTrace(); throw new GATInvocationException( "RFTGT4FileAdaptor: getMultipleResourceProperties, " + e); } SOAPElement[] any = response.get_any (); Calendar termTimeDel = Calendar.getInstance(); termTimeDel.add(Calendar.MINUTE, TERM_TIME); input.setInitialTerminationTime(termTimeDel); CreateReliableFileTransferOutputTy pe response = null; try { response = factory Port.createReliableFileTransfer(input); } catch (RemoteException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: set createReliableFileTransfer failed, " + e); } EndpointReferenceTy pe reliableRFTEndpoint = response .getReliableTransferEPR(); ReliableFileTransferPortTy pe rft = null; try { rft = BaseRFTClient.rftLocator .getReliableFileTransferPortTy pePort(reliableRFTEndpoint); } catch (ServiceException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: getReliableFileTransferPortTy pePort failed, " + e); } setStubSecurity Properties((Stub) rft); subscribe(rft); Calendar termTime = Calendar.getInstance(); termTime.add(Calendar.MINUTE, TERM_TIME); SetTerminationTime reqTermTime = new SetTerminationTime(); reqTermTime.setRequestedTerminationTime(termTime); try { rft.setTerminationTime(reqTermTime); } catch (RemoteException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: setTerminationTime failed, " + e); } try { rft.start(new Start()); } catch (RemoteException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: start failed, " + e); } RFTOptionsTy pe rftOptions = new RFTOptionsTy pe(); rftOptions.setBinary (Boolean.TRUE); // rftOptions.setIgnoreFilePermErr(false); TransferRequestTy pe request = new TransferRequestTy pe(); request.setRftOptions(rftOptions); request.setTransfer(transferArray ); request.setTransferCredentialEndpoint(cred entialEn dpoint) ; setRequest(request); while (!transfersDone()) { try { Thread.sleep(1000); } catch (InterruptedException e) { throw new GATInvocationException("RFTGT4FileAdap tor: " + e); } } return transfersSucc(); } public void copy (URI dest) throws GATInvocationException { String destUrl = setLocationStr(dest); if (!copy 2(destUrl)) { throw new GATInvocationException( "RFTGT4FileAdaptor: file copy failed"); } } public void subscribe(ReliableFileTransferPortTy pe rft) throws GATInvocationException { Map<Object, Object> properties = new HashMap<Object, Object>(); properties.put(ServiceContainer.CLA SS, "org.globus.wsrf.container.GSIServiceCon tainer"); if (this.proxy != null) { ContainerSecurity Descriptor containerSecDesc = new ContainerSecurity Descriptor(); Security Manager.getManager(); try { containerSecDesc.setSubject(JaasGssUtil .createSubject(this.proxy )); } catch (GSSException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: ContainerSecurity Descriptor failed, " + e); } properties.put(ServiceContainer.CO NTAIN ER_D ESCRIPTOR, containerSecDesc); } this.notificationConsumerManager = NotificationConsumerManager .getInstance(properties); try { this.notificationConsumerManager.startListening(); } catch (ContainerException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: NotificationConsumerManager failed, " + e); } List<Object> topicPath = new LinkedList<Object>(); topicPath.add(RFTConstants.OVERALL_ST ATUS_ RESOU RCE); ResourceSecurity Descriptor security Descriptor = new ResourceSecurity Descriptor(); String authz = null; if (authorization == null) { authz = Authorization.AUTHZ_NONE; } else if (authorization instanceof HostAuthorization) { authz = Authorization.AUTHZ_NONE; } else if (authorization instanceof SelfAuthorization) { authz = Authorization.AUTHZ_SELF; } else if (authorization instanceof Identity Authorization) { // not supported throw new GATInvocationException( "RFTGT4FileAdaptor: identity authorization not supported"); } else { // throw an sg throw new GATInvocationException( "RFTGT4FileAdaptor: set authorization failed"); } security Descriptor.setAuthz(auth z); Vector<Object> authMethod = new Vector<Object>(); if (this.security Ty pe.equals(Constants.GSI_SEC_MSG)) { authMethod.add(GSISecureMsgAuthMetho d.BOT H); } else { authMethod.add(GSITransportAuth Method. BOTH ); } try { security Descriptor.setAuthMethods(authMe thod); } catch (Security DescriptorException e) { private EndpointReferenceTy pe delegate( EndpointReferenceTy pe delegationFactory Endpoint) throws GATInvocationException { GlobusCredential credential = null; if (this.proxy != null) { credential = ((GlobusGSSCredentialImpl) this.proxy ) .getGlobusCredential(); } else { try { credential = GlobusCredential.getDefaultCredential(); } catch (GlobusCredentialException e) { throw new GATInvocationException("RFTGT4FileAdap tor: " + e); } } EndpointReferenceTy pe epr1 = null; try { epr1 = (EndpointReferenceTy pe) ObjectDeserializer.toObject(any [0], EndpointReferenceTy pe.class); } catch (DeserializationException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: ObjectDeserializer, " + e); } EndpointReferenceTy pe[] endpoints = new EndpointReferenceTy pe[] { epr1 }; return endpoints; } sy nchronized void setStatus(OverallStatus status) { this.status = status; } public int transfersActive() { if (status == null) { return 1; } return status.getTransfersActive(); } public int transfersFinished() { if (status == null) { return 0; } return status.getTransfersFinished(); } } public int transfersCancelled() { if (status == null) { return 0; } return status.getTransfersCancelled(); } private void setSecurity Ty peFromURL(URL url) { if (url.getProtocol().equals("http")) { security Ty pe = Constants.GSI_SEC_MSG; } else { Util.registerTransport(); security Ty pe = Constants.GSI_TRANSPORT; } } public int transfersFailed() { if (status == null) { return 0; } return status.getTransfersFailed(); } private void setStubSecurity Properties(Stub stub) { ClientSecurity Descriptor secDesc = new ClientSecurity Descriptor(); if (this.security Ty pe.equals(Constants.GSI_SEC_MSG)) { secDesc.setGSISecureMsg(this.getMessageProtectionTy pe()); } else { secDesc.setGSITransport(this.getM essageProtectio nTy pe()); } public int transfersPending() { if (status == null) { return 1; } return status.getTransfersPending(); } secDesc.setAuthz(getAuthorization() ); public int transfersRestarted() { if (status == null) { return 0; } return status.getTransfersRestarted(); } if (this.proxy != null) { // set proxy credential secDesc.setGSSCredential(this.proxy ); } stub._setProperty (Constants.CLIENT_DESCRIPTOR, secDesc); } public boolean transfersDone() { return (transfersActive() == 0 && transfersPending() == 0 && transfersRestarted() == 0); } public Integer getMessageProtectionTy pe() { return (this.msgProtectionTy pe == null) ? RFTGT4FileAdaptor.DEFAULT_MSG _PROTECTIO N : this.msgProtectionTy pe; } public boolean transfersSucc() { return (transfersDone() && transfersFailed() == 0 && transfersCancelled() == 0); } public Authorization getAuthorization() { return (authorization == null) ? DEFAULT_AUTHZ : this.authorization; } /* * private BaseFaultTy pe getFaultFromRP(RFTFaultResourceProperty Ty pe fault) { * if (fault == null) { return null; } * * if (fault.getRftTransferFaultTy pe() != null) { return * fault.getRftTransferFaultTy pe(); } else if * (fault.getDelegationEPRMissingFaultTy pe() != null) { return * fault.getDelegationEPRMissingFaultTy pe(); } else if * (fault.getRftAuthenticationFaultTy pe() != null) { return * fault.getRftAuthenticationFaultTy pe(); } else if * (fault.getRftAuthorizationFa ultTy pe() != null) { return * fault.getRftAuthorizationFaultTy pe(); } else if * (fault.getRftDatabaseFaultTy pe() != null) { return * fault.getRftDatabaseFaultTy pe(); } else if * (fault.getRftRepeatedly StartedFaultTy pe() != null) { return * fault.getRftRepeatedly StartedFaultTy pe(); } else if * (fault.getTransferTransientFaultTy pe() != null) { return * fault.getTransferTransientFaultTy pe(); } else { return null; } } */ private EndpointReferenceTy pe populateRFTEndpoints( ReliableFileTransferFactory PortTy pe factory Port) throws GATInvocationException { EndpointReferenceTy pe[] delegationFactory Endpoints = fetchDelegationFactory Endpoints(factory Port); EndpointReferenceTy pe delegationEndpoint = delegate(delegationFactory Endpoints[0]); return delegationEndpoint; } /* * private BaseFaultTy pe deserializeFaultRP(SOAPElement any ) throws * Exception { return getFaultFromRP((RFTFaultResourceProperty Ty pe) * ObjectDeserializer .toObject(any , RFTFaultResourceProperty Ty pe.class)); } */ void setFault(BaseFaultTy pe fault) { this.fault = fault; } } Ibis as ‘Glue’ ● Use IPL + SmartSockets, generally for wide-area communication ● ● Linking up separate ‘activities’ of an application ● Activities: often largely ‘independent’ tasks implemented in any popular language or model (e.g. C/MPI, CUDA, Fortran, Java…) ● Each typically running on a single GPU/node/Cluster/Cloud/… Automatically circumvent connectivity problems ● Example: With SmartSockets: No SmartSockets: Ibis as ‘HPC Solution’ ● Use Ibis as replacement for e.g. C++/MPI code ● Benefits: (better) portability ● malleability (open world) ● fault-tolerance ● (run-time) task migration Downside: ● requires recoding ● ● ● Comparable speedups: MMCA: Situation in 2004/2005 SSH Parallel Horus Parallel Horus Client Sockets + SSH Tunneling Server C++/MPI ● ● ● ● Code pre-installed at each cluster site Instable / faulty communication Connectivity problems Execution on each cluster ‘by hand’ Parallel Horus Client Phase 1: Ibis as ‘Master Key’ (2006) JavaGAT + IbisDeploy Parallel Horus Parallel Horus Client Sockets + SSH Tunneling Server C++/MPI ● ● ● ● Code pre-installed at each cluster site Instable / faulty communication Connectivity problems Execution on each cluster ‘by hand’ Parallel Horus Client Phase 2: Ibis as ‘Glue’ (2006/2007) JavaGAT + IbisDeploy Parallel Horus Parallel Horus Client IPL + SmartSockets Server C++/MPI ● ● ● ● Code pre-installed at each cluster site Instable / faulty communication Connectivity problems Execution on each cluster ‘by hand’ Parallel Horus Client Phase 3: Ibis as ‘HPC Solution’ (2008) JavaGAT + IbisDeploy Parallel Jorus Parallel Jorus Client IPL + SmartSockets Server Ibis/Java ● ● ● ● Code pre-installed at each cluster site Instable / faulty communication Connectivity problems Execution on each cluster ‘by hand’ Parallel Jorus Client ‘Master Key’ + ‘Glue’ + ‘HPC’ ● Step-wise conversion to 100% Ibis / Java ● ● ● ● ● Phase 1: JavaGAT as ‘Master Key’ Phase 2: IPL + SmartSockets as ‘Glue’ Phase 3: Ibis as ‘HPC Solution’ After each phase a fully functional, working solution was available! Eventual result: ● ● ● ‘wall-socket computing from a memory stick’ Remember: the ‘Promise of the Grid’? Awards at AAAI 2007 and CCGrid 2008 100% Ibis Implementation (2008++) Seinstra, Maassen, Drost et al. (SCALE 2008 @ CCGrid 2008: First Prize Winner) Bal, Maassen, Drost, Seinstra et al. (IEEE Computer, Aug 2010) Some current work Green Clouds ● ● ● NWO Smart Energy Systems project with Univ. of Amsterdam (Cees de Laat) & SARA How to map high-performance applications onto hybrid distributed computing system, taking both performance & energy consumption into account System-level approach to reduce HPC energy consumption DAS-4: infrastructure for Green IT UvA/MultimediaN (16/36) Dual quad-core Xeon E5620 Various accelerators (GPUs, multicores, ….) Scientific Linux Built by ClusterVision VU (74) SURFnet6 ASTRON (23) 10 Gb/s lambdas TU Delft (32) Leiden (16) Main ideas ● Adapt resources to application needs dynamically, accounting for computational & energy efficiency ● ● Exploit hardware diversity ● ● ● Using Ibis malleability support Graphics Processing Units (GPUs) have much higher FLOPS/Watt for many applications Use optical and photonic networks Build a knowledge base & semantic infrastructure description Other current PhD projects using Ibis ● Distributed reasoning over semantic web data ● ● ● Graph applications (HiPG) ● ● ● E.g. for bioinformatics applications http://www.graph500.org/ Games & distributed model checking ● ● WebPIE: Parallel reasoner on Web scale Written in Java, uses Hadoop (MapReduce) Deal with large state space Distributed smart phone applications ● Computation & communication offloading to a cloud Conclusions ● Ibis enables problem solving (avoids system fighting) ● Successfully applied in many domains ● ● ● Astronomy, multimedia analysis, climate modeling, remote sensing, semantic web, medical imaging, … Data intensive, compute intensive, real-time… Open source, download: ● www.cs.vu.nl/ibis/ Conclusions (2) ● Jungle Computing is hard ● High-Performance Jungle Computing even harder ● While research into efficient & green Jungle-aware programming models has only just begun… ● …Ibis provides the basic functionality to efficiently & transparently overcome most Jungle Computing complexities Acknowledgements Niels Drost Ceriel Jacobs Roelof Kemp Timo van Kessel Thilo Kielmann Jason Maassen Maarten van Meersbergen Rob van Nieuwpoort Nick Palmer Kees van Reeuwijk Frank J. Seinstra Kees Verstoep Ben van Werkhoven Gosia Wrzesinska