Nenad Medvidovic
USC-CSSE and Computer Science Department
University of Southern California
neno@usc.edu
http://csse.usc.edu/~neno/
Collaborative work with
Joshua Garcia, Ivo Krka, Chris Mattmann, and Daniel Popescu
• A distributed systems technology that enables the sharing of resources across organizations scalably, efficiently, reliably, and securely
• Analogous to the electric grid
• A highly successful technology
• Deficiencies in the existing guidance for building grids
– More on this below
• Grids are not easy to build
– See CERN’s Large Hadron Collider
• Their architecture was published very early
– In the well-known “anatomy” and “physiology” papers
• Yet “What is (not) a grid?” is still a subject of debate
• Grids are large, complex systems
– Thousands of nodes or more
– Span many agency boundaries
• Qualities of Service (QoS) are critical
– Scalability
– Security
– Performance
– Reliability ...
• Software architecture is just what the doctor ordered
– “The set of principal design decisions about a software system” [Taylor, Medvidovic, Dashofy 2009]
• Study grid’s reference requirements and architecture
• Study the architectures of existing grid technologies
• Compare the two
• Suggest how to fix any discrepancies
Knowing that there will likely be very few straightforward answers
Technology            PL                    KSLOC    # Modules
Alchemi               C# (.NET)              26.2        51.4
Apache Hadoop         Java, C/C++            66.5        30.5
Apache HBase          Java, Ruby, Thrift      186         150
Condor                Java, C/C++            1643         320
DSpace                Java                    6.7         659
Ganglia               C                        14         129
GLIDE                 Java                     79         163
Globus 4.0 (GT 4.0)   Java, C/C++            18.5         572
Grid Datafarm         Java, C                84.1        3665
Gridbus Broker        Java                  265.1          97
Jcgrid                Java                    571         362
OODT                  Java                    8.8         962
Pegasus               Java, C                14.1         217
SciFlo                Python                 51.6          22
iRODS                 Java, C/C++            23.4          57
Sun Grid Engine       Java, C/C++            19.3        2522
Unicore               Java                      2         220
Wings                 Java                 2218.7         566
• Establish idealized architecture and candidate architectural style(s)
• Identify data and processing components
– Group implementation modules according to a set of rules
• Map identified data and processing components onto an idealized architecture
• Examine source code, documentation, and runtime behavior
• Tie each component to the requirements it satisfies
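To make the three steps concrete, here is a minimal Python sketch, assuming a toy dependency-graph representation of the implementation; all module names, dependencies, and layer assignments are illustrative, not taken from the study.

```python
# A minimal sketch of the three recovery steps over a toy system.
# Step 1: the idealized architecture -- the layers of the original grid
# reference architecture, ordered top to bottom.
IDEALIZED_LAYERS = ["Application", "Collective", "Resource", "Connectivity", "Fabric"]

# Implementation modules and their dependencies (hypothetical example).
modules = {
    "JobSubmitTool": {"Scheduler"},
    "Scheduler":     {"NodeMonitor", "TransferLib"},
    "NodeMonitor":   {"TransferLib"},
    "TransferLib":   set(),
}

# Step 2: group modules into data/processing components (trivially one
# component per module here; the real grouping rules are sketched below).
components = {name: {name} for name in modules}

# Step 3: map each component onto a layer of the idealized architecture,
# e.g., based on documentation and the requirements it satisfies.
mapping = {
    "JobSubmitTool": "Application",
    "Scheduler":     "Collective",
    "NodeMonitor":   "Resource",
    "TransferLib":   "Connectivity",
}

for comp, members in components.items():
    print(f"{comp} {sorted(members)} -> {mapping[comp]}")
```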
1. Group based on isolated classes
2. Group based on generalization
3. Group based on aggregation
4. Group based on composition
5. Group based on two-way association
6. Identify domain classes
7. Merge classes with a single originating domain class association into domain class
8. Group classes along a domain class circular dependency path
9. Group classes along a path with a start node and end node that reference a domain class
10. Group classes along paths with the same end node, and whose start node references the same domain class
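Rules 1–5 amount to merging classes that are connected by certain relation types, which is naturally expressed with union-find. A minimal sketch, with hypothetical classes and relations (the real rules also cover the domain-class paths of rules 6–10):

```python
# A minimal union-find sketch of grouping rules 2-5: merge classes linked
# by generalization, aggregation, composition, or two-way association.
relations = [
    ("WorkerThread", "Thread",      "generalization"),  # rule 2
    ("JobQueue",     "Job",         "aggregation"),     # rule 3
    ("Scheduler",    "JobQueue",    "composition"),     # rule 4
    ("Scheduler",    "NodeMonitor", "two-way"),         # rule 5
    ("Scheduler",    "Logger",      "one-way"),         # not a grouping relation
]
GROUPING = {"generalization", "aggregation", "composition", "two-way"}

parent = {}

def find(c):
    parent.setdefault(c, c)
    while parent[c] != c:
        parent[c] = parent[parent[c]]  # path compression
        c = parent[c]
    return c

def union(a, b):
    parent[find(a)] = find(b)

for src, dst, kind in relations:
    find(src); find(dst)               # register both classes
    if kind in GROUPING:
        union(src, dst)                # rules 2-5: merge related classes

groups = {}
for c in parent:
    groups.setdefault(find(c), set()).add(c)
print(list(groups.values()))
```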
• Domain class rules
– Class with large majority of outgoing calls
• Exclusion rules
– Class with large majority of incoming calls
– Utility classes
– Heavily passed data structures
– Benchmarking and test classes
• Additional groupings
– By exception
– By interface
– By package if idealized architecture matches first-class component
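A minimal sketch of the domain-class and exclusion heuristics, assuming per-class fan-in/fan-out call counts are available; the 80% threshold and the example data are illustrative assumptions:

```python
# Classify classes by call-direction ratios and naming conventions.
calls = {  # class -> (outgoing calls, incoming calls); hypothetical data
    "GridScheduler": (120, 15),  # mostly outgoing -> domain class candidate
    "StringUtils":   (4, 310),   # mostly incoming -> utility, excluded
    "JobRecord":     (2, 95),    # heavily passed data structure, excluded
    "SchedulerTest": (40, 0),    # test class, excluded by name
}

def classify(name, out_calls, in_calls):
    if name.endswith(("Test", "Benchmark")):
        return "excluded (benchmarking/test class)"
    total = out_calls + in_calls
    if total and out_calls / total > 0.8:   # large majority of outgoing calls
        return "domain class"
    if total and in_calls / total > 0.8:    # large majority of incoming calls
        return "excluded (utility or passed data structure)"
    return "ordinary class"

for name, (o, i) in calls.items():
    print(f"{name}: {classify(name, o, i)}")
```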
• Infer distributor connectors from idealized architecture
• Classes with methods and names similar to first-class components are domain classes
• Classes importing network communication libraries are domain classes
• main() functions often identify first-class components
• Classes deployed onto different hosts must be grouped separately
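These heuristics lend themselves to a simple scan over each class's imports and entry points. A sketch with illustrative library and class names:

```python
# Flag likely domain classes and component entry points in a distributed
# system; the indicator libraries and example classes are assumptions.
NETWORK_LIBS = {"java.net", "java.rmi", "org.globus.io"}

classes = {
    "GramClient":   {"imports": {"java.net", "java.util"}, "has_main": False},
    "JobSubmitCLI": {"imports": {"java.util"},             "has_main": True},
    "MathUtil":     {"imports": {"java.lang"},             "has_main": False},
}

for name, info in classes.items():
    if info["imports"] & NETWORK_LIBS:
        print(f"{name}: domain class (imports a network communication library)")
    if info["has_main"]:
        print(f"{name}: likely entry point of a first-class component (main())")
```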
• Empty layers – e.g., Wings
• Skipped layers – e.g., Pegasus
• Up-calls – e.g., Hadoop
• Multi-layer components – e.g., iRODS
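Given a layer assignment and a list of inter-class calls, most of these discrepancy types can be detected mechanically. A minimal sketch, using the original reference architecture's layer ordering and illustrative data:

```python
# Detect empty layers, up-calls, and skipped-layer calls in a recovered
# layered view; class names and calls are hypothetical.
LAYERS = ["Application", "Collective", "Resource", "Connectivity", "Fabric"]
rank = {layer: i for i, layer in enumerate(LAYERS)}  # 0 = topmost

layer_of = {
    "CommandLineTool": "Application",
    "ReplicaCatalog":  "Collective",
    "GridFTPServer":   "Resource",
    "LocalDiskIO":     "Fabric",     # note: nothing maps to Connectivity
}
calls = [
    ("CommandLineTool", "GridFTPServer"),  # Application -> Resource
    ("LocalDiskIO",     "ReplicaCatalog"), # Fabric -> Collective
]

used = set(layer_of.values())
for layer in LAYERS:
    if layer not in used:
        print(f"empty layer: {layer}")

for src, dst in calls:
    a, b = rank[layer_of[src]], rank[layer_of[dst]]
    if b < a:
        print(f"up-call: {src} ({layer_of[src]}) -> {dst} ({layer_of[dst]})")
    elif b > a + 1:
        print(f"skipped layer(s): {src} -> {dst}")

# A component whose classes span more than one layer would be reported as
# a multi-layer component; that check needs the component->class mapping.
```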
[Figure: discrepancies found in the recovered layered architecture of Globus (GT 4.0) – implementation classes mapped onto the Application, Collective, Connectivity, and Fabric layers, with up-calls, two-layer components, and other violations highlighted.]
• The Connectivity layer is eliminated
• The deployment view is explicitly addressed
• Organized around subsystem types rather than layers
• Four architectural styles comprise the grid
– Client/server
– Peer-to-peer
– Layered
– Event-based
• An improved classification of grid technologies
Revised Grid Reference Architecture
• Application components are clients to Collective components
– e.g., application components query collective components for the locations of resource components
• Application components are clients to Resource components
– e.g., direct job submission from application components to resource components
• Resource components can act as clients to Collective components
– e.g., resource components may obtain locations of other resource components through collective components
• Resource components are peers
– e.g., a Grid Datafarm Filesystem Daemon (gfsd) instance makes requests for file data from other gfsds
• Collective components are peers
– e.g., iRODS agents communicate with each other to exchange data and create replicas
• Resource components notify the Collective components that monitor them
– e.g., executors send heartbeats to managers
• Collective or Resource components request services from Fabric components
– e.g., an iRODS agent accesses a DBMS holding metadata
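One way to read these rules is as a whitelist of (source subsystem, target subsystem, interaction style) triples. A minimal sketch, with illustrative observed interactions:

```python
# Encode the revised reference architecture's sanctioned interactions and
# check observed ones against them; the observed data is hypothetical.
ALLOWED = {
    ("Application", "Collective", "client/server"),
    ("Application", "Resource",   "client/server"),
    ("Resource",    "Collective", "client/server"),
    ("Resource",    "Resource",   "peer-to-peer"),
    ("Collective",  "Collective", "peer-to-peer"),
    ("Resource",    "Collective", "event-based"),   # e.g., heartbeats
    ("Collective",  "Fabric",     "client/server"),
    ("Resource",    "Fabric",     "client/server"),
}

observed = [
    ("Application", "Resource", "client/server"),  # direct job submission
    ("Fabric",      "Resource", "client/server"),  # not sanctioned
]

for interaction in observed:
    verdict = "ok" if interaction in ALLOWED else "violates reference architecture"
    print(f"{interaction}: {verdict}")
```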
• Computational grid
– Implements all Collective components
– e.g., Alchemi and Sun Grid Engine
• Data grid
– Job-scheduling components in the Collective subsystem are not required
– e.g., Grid Datafarm and Hadoop
• Hybrid grid
– Resource components provide services either to perform operations on a storage repository or to execute a job or task
– e.g., Gridbus Broker and iRODS
[Figure: a hybrid grid combining file resources and computational resources.]
• Why were there originally so many upcalls?
– Legitimate client-server and event-based communication
• Why so many skipped layer calls?
– The Fabric layer was at the wrong level of abstraction
– Mostly utility classes that should be abstracted away
• Why so many multi-layer components?
– Connectivity layer was at the wrong level of abstraction
– Not a layer, but utility libraries to enable connector functionality
– Also accounts for skipped layer calls
• Benefit of the deployment view
– Essential for distributed systems
– Helped to identify that the Fabric layer was not abstracted properly
• There are remaining violations
– Are they legitimate or a result of an improperly recast reference architecture?
• The original Focus technique is not ideal for recovering systems of these types
– i.e., distributed systems realized using middleware
• A more automated approach that combines static and dynamic analysis would be preferable
• Use the recast reference architecture to build a new grid
• What are the overarching grid principles?
1. A grid is a collection of logical resources (computing and data) distributed across a wide-area network of physical resources (hosts).
2. In a single grid-based application, the logical resources are owned by a single agency, while the physical resources are owned by multiple agencies.
3. All resources in a grid are described using a common meta-resource language.
4. Atomic-level logical resources are defined independently of the atomic-level physical resources.
5. The allocation of the atomic-level logical resources to the atomic-level physical resources can be N:M.
6. All computation in a grid is initiated by a client, which is a physical resource. The client sends the logical resources to the servers, which are also physical resources. A server can, in turn, delegate the requested computation to other physical resources.
7. All agencies that own physical resources in a grid must be able to specify policies that enforce the manner in and extent to which their physical resources can be used in grid applications.
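Principles 4–7 can be illustrated with a small model of logical resources, physical resources, N:M allocation, and per-agency policies; everything below is an illustrative assumption, not part of the principles' formal statement:

```python
# A toy model of the grid principles: logical resources allocated N:M onto
# agency-owned physical resources, subject to per-agency usage policies.
physical = {"host1": "AgencyX", "host2": "AgencyY"}  # host -> owning agency

# Principle 5: allocation of logical to physical resources can be N:M.
allocation = {
    "dataset-A":  {"host1", "host2"},  # one logical resource on two hosts
    "job-step-1": {"host2"},
}

# Principle 7: each agency specifies a policy over its physical resources.
policies = {
    "AgencyX": lambda res: True,                    # accepts everything
    "AgencyY": lambda res: res != "job-step-1",     # refuses this job
}

for res, hosts in allocation.items():
    for host in hosts:
        agency = physical[host]
        if policies[agency](res):
            print(f"{res} may be placed on {host} (owned by {agency})")
        else:
            print(f"{res} rejected by {agency}'s policy on {host}")
```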