Metadata Organization and Management for
Globalization of Data Access with Onedata
Michał Wrzeszcz, Krzysztof Trzepla, Rafał Słota, Konrad Zemek, Tomasz Lichoń,
Łukasz Opioła, Darin Nikolow, Łukasz Dutka, Renata Słota, Jacek Kitowski
ACC Cyfronet AGH
Department of Computer Science, AGH - UST
PPAM 2015
Krakow, Poland, September 6-9, 2015
Agenda
Motivation
Problems with Global Data Access
Is a new tool needed?
Onedata
Design Assumptions
Key Aspects of Data Access
Global data organization
Globally distributed metadata
Results
Conclusions
Motivation
Scientific communities require global data access that integrates
independently managed resources.
Metadata organization and management are key to making global
access effective, simple and convenient.
Problems with Global Data Access
Storage heterogeneity and delay/bandwidth issues.
No account integration: difficult access (security issues).
Manual transfer of data before/after computations.
Problematic data sharing.
Is a new tool needed?
Globus Connect
iRODS
PanFS
Parrot
Google Drive
Gluster
LFC
Dropbox
BeeFS
Onedata - Design Assumptions
All organizations (providers) supporting a user have access to
all data and metadata concerning that user.
No central metadata server, for the sake of
performance and availability.
No replication of everything to everyone; redundant data
is managed optimally.
Data access efficiency:
Minimal overhead when the data is close to the client.
Efficient access to fragments of remote data.
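Efficient fragment access can be illustrated by mapping a requested byte range onto fixed-size blocks and fetching only the blocks that are not already held locally. This is a minimal sketch under stated assumptions: the uniform block size, the set of locally held block indices and the function names are hypothetical, and Onedata's actual block layout is not described in these slides.

```python
def blocks_for_range(offset: int, length: int, block_size: int) -> range:
    """Indices of the fixed-size blocks overlapping [offset, offset + length)."""
    if length <= 0:
        return range(0)
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return range(first, last + 1)


def blocks_to_fetch(offset: int, length: int, block_size: int,
                    local_blocks: set[int]) -> list[int]:
    """Blocks that must come from a remote provider: overlapping, not held locally."""
    return [b for b in blocks_for_range(offset, length, block_size)
            if b not in local_blocks]


MiB = 1024 * 1024
# Read 10 MiB starting at offset 6 MiB with (hypothetical) 4 MiB blocks;
# blocks 1 and 2 are already stored by the local provider.
print(blocks_to_fetch(6 * MiB, 10 * MiB, 4 * MiB, {1, 2}))  # [3]
```

Only the missing fragment crosses the network, which is the point of the assumption: when most of the data is close to the client, the overhead stays small.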
Onedata - Key Aspects of Data Access
Global data organization
Hides complexity of data distribution from users
Indicates which remote data should be observed by
each organization
Globally distributed metadata
No trust between providers
Caching vs. coherency
Global data organization
Easy management and sharing of data for users.
Limits the metadata each provider needs to know.
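The two goals above, a single view of data for the user and a limited amount of metadata per provider, can be sketched with a toy registry. The user, space and provider names and the dictionary layout are hypothetical and serve only as an illustration.

```python
# Hypothetical registry contents: the spaces each user belongs to,
# and the providers that support (store data for) each space.
USER_SPACES = {
    "alice": {"climate", "genomics"},
    "bob": {"genomics"},
}
SPACE_PROVIDERS = {
    "climate": {"provider1"},
    "genomics": {"provider1", "provider2"},
}


def user_root(user: str) -> list[str]:
    """A user's top-level view: one directory per space, no provider names."""
    return sorted(f"/{space}" for space in USER_SPACES.get(user, set()))


def spaces_known_by(provider: str) -> set[str]:
    """Spaces whose metadata a provider must keep: only those it supports."""
    return {space for space, providers in SPACE_PROVIDERS.items()
            if provider in providers}


print(user_root("alice"))            # ['/climate', '/genomics']
print(spaces_known_by("provider2"))  # {'genomics'}
```

The user's paths never mention a provider, while provider2 never needs to know that the "climate" space exists.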
Global metadata distribution
3 metadata levels
Metadata used to coordinate providers’ cooperation
File metadata stored by each provider
Current usage metadata
Usage optimization
Lower level -> more frequent usage -> higher distribution
Caching and aggregation of changes
Pushing changes to caches
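Pushing changes to caches, instead of letting caches poll for updates, can be sketched as an origin that keeps a list of subscribed caches and forwards every change to them. This is a generic push-based sketch; the class names and the (key, value) change format are hypothetical.

```python
from typing import Callable

Change = tuple[str, str]  # hypothetical change format: (key, new value)


class MetadataOrigin:
    """Authoritative copy of some metadata that pushes changes to its caches."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}
        self._subscribers: list[Callable[[Change], None]] = []

    def subscribe(self, callback: Callable[[Change], None]) -> None:
        self._subscribers.append(callback)

    def update(self, key: str, value: str) -> None:
        self.data[key] = value
        for notify in self._subscribers:  # push, so caches never need to poll
            notify((key, value))


class Cache:
    """Local copy that starts from a snapshot and then receives pushed changes."""

    def __init__(self, origin: MetadataOrigin) -> None:
        self.entries = dict(origin.data)
        origin.subscribe(self.apply)

    def apply(self, change: Change) -> None:
        key, value = change
        self.entries[key] = value


origin = MetadataOrigin()
cache = Cache(origin)
origin.update("/genomics/a.txt", "size=42")
print(cache.entries)  # {'/genomics/a.txt': 'size=42'}
```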
Global metadata distribution
Level 1
Supports cooperation (user account integration)
Provides information on which lower-level metadata should be synchronized with which providers (spaces metadata)
Stored by the Global Registry, a distributed application that acts as a trusted mediator
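Since providers do not trust each other, the Level 1 metadata held by the Global Registry can act as the arbiter of who synchronizes a space with whom. A minimal in-memory sketch, assuming hypothetical class and method names; this is not Onedata's API.

```python
class GlobalRegistry:
    """Hypothetical trusted mediator holding Level 1 (spaces) metadata."""

    def __init__(self) -> None:
        self._space_providers: dict[str, set[str]] = {}

    def add_support(self, space: str, provider: str) -> None:
        self._space_providers.setdefault(space, set()).add(provider)

    def supports(self, space: str, provider: str) -> bool:
        return provider in self._space_providers.get(space, set())

    def sync_peers(self, space: str, provider: str) -> set[str]:
        """Providers with which `provider` should synchronize metadata of `space`."""
        if not self.supports(space, provider):
            return set()
        return self._space_providers[space] - {provider}


def accept_update(registry: GlobalRegistry, space: str,
                  sender: str, receiver: str) -> bool:
    """A receiver accepts a Level 2 update only if the registry vouches for both sides."""
    return registry.supports(space, sender) and registry.supports(space, receiver)


registry = GlobalRegistry()
registry.add_support("genomics", "provider1")
registry.add_support("genomics", "provider2")
print(registry.sync_peers("genomics", "provider1"))                   # {'provider2'}
print(accept_update(registry, "genomics", "provider3", "provider1"))  # False
```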
Global metadata distribution
Level 2
File metadata
Locations of file parts
Stored by each provider that supports a particular space
Fast access to needed metadata
Limited number of synchronization operations
Propagation of changes based on Level 1 metadata
Aggregation of changes
Automatic conflict resolution
Level 1 metadata caching
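The slides do not say how conflicts are resolved automatically. One common approach, shown here purely as an assumption, gives each change a (version, provider id) pair and keeps the highest, so every provider converges on the same winner regardless of the order in which updates arrive. The record fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical Level 2 record: one version of a file's metadata."""
    size: int
    version: int   # incremented by the provider that makes a change
    provider: str  # tie-breaker when two providers reach the same version


def resolve(a: FileMeta, b: FileMeta) -> FileMeta:
    """Deterministic winner: highest version, then largest provider id."""
    return max(a, b, key=lambda m: (m.version, m.provider))


def merge_all(updates: list[FileMeta]) -> FileMeta:
    winner = updates[0]
    for update in updates[1:]:
        winner = resolve(winner, update)
    return winner


x = FileMeta(size=100, version=3, provider="provider1")
y = FileMeta(size=120, version=3, provider="provider2")
z = FileMeta(size=90, version=2, provider="provider2")
# Arrival order does not matter: every provider picks the same record.
print(merge_all([x, y, z]) == merge_all([z, y, x]))  # True
print(merge_all([x, y, z]).size)                     # 120
```

This last-writer-wins style rule is simple and deterministic, but it silently discards the losing concurrent change; that is acceptable for some metadata and not for others.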
Global metadata distribution
Level 3
Metadata about current file usage
Who should be notified about file changes
Where data is currently being modified
Stored by providers, cached by clients
First aggregation on the client side, second at the provider
Updates Level 2 metadata
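Two-stage aggregation can be sketched as coalescing repeated updates to the same file, first in each client and then again at the provider, before a single Level 2 update per file is written. The sketch assumes, hypothetically, that an update is a (file, new size) pair and that only the latest value per file matters.

```python
def aggregate(updates: list[tuple[str, int]]) -> dict[str, int]:
    """Coalesce (file, size) updates, keeping only the latest value per file."""
    latest: dict[str, int] = {}
    for path, size in updates:
        latest[path] = size
    return latest


# Stage 1: each client aggregates its own writes (5 raw updates in total).
client_a = aggregate([("/s/f1", 10), ("/s/f1", 20), ("/s/f2", 5)])
client_b = aggregate([("/s/f2", 7), ("/s/f2", 9)])

# Stage 2: the provider aggregates what the clients sent, in arrival order.
provider_view = aggregate(list(client_a.items()) + list(client_b.items()))

level2_writes = len(provider_view)  # one Level 2 update per file
print(provider_view)  # {'/s/f1': 20, '/s/f2': 9}
print(level2_writes)  # 2
```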
Global metadata distribution
Sum up
[Figure: metadata levels per component: Global Registry (Level 1); Provider 1 and Provider 2 (Level 1 cache, Level 2, Level 3); Client (Level 3 cache)]
More changes -> lower level -> more power
Caching & aggregation vs. time needed to gain global consistency
Set balance at provider level (dynamic client reconfiguration)
Locks for immediate consistency
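The trade-off between caching/aggregation and the time needed to reach global consistency can be sketched as a provider-side buffer that propagates changes after a configurable number of changes or a maximum delay, while locking a file forces immediate propagation. The class and both thresholds are hypothetical knobs for illustration, not Onedata parameters.

```python
import time
from typing import Optional


class AggregatingBuffer:
    """Hypothetical provider-side buffer that batches metadata changes.
    Larger limits mean fewer propagations but a longer window in which
    other providers may see stale metadata."""

    def __init__(self, max_changes: int, max_delay_s: float) -> None:
        self.max_changes = max_changes
        self.max_delay_s = max_delay_s
        self.pending: dict[str, int] = {}
        self.oldest_pending_at: Optional[float] = None
        self.locked: set[str] = set()
        self.propagated: list[dict[str, int]] = []  # stands in for sending upstream

    def record(self, path: str, value: int) -> None:
        if not self.pending:
            self.oldest_pending_at = time.monotonic()
        self.pending[path] = value  # repeated changes to one file are coalesced
        # The age check only runs when a change arrives; a real system
        # would also flush on a timer.
        if (path in self.locked
                or len(self.pending) >= self.max_changes
                or time.monotonic() - self.oldest_pending_at >= self.max_delay_s):
            self.flush()

    def flush(self) -> None:
        if self.pending:
            self.propagated.append(self.pending)
            self.pending = {}
            self.oldest_pending_at = None

    def lock(self, path: str) -> None:
        """Locking a file flushes pending changes and disables batching for it."""
        self.flush()
        self.locked.add(path)


buf = AggregatingBuffer(max_changes=3, max_delay_s=60.0)
buf.record("/s/f1", 1)
buf.record("/s/f1", 2)     # same file: still a single pending entry
print(buf.propagated)      # []
buf.lock("/s/f1")
print(buf.propagated)      # [{'/s/f1': 2}]
buf.record("/s/f1", 3)     # locked: propagated immediately
print(buf.propagated[-1])  # {'/s/f1': 3}
```

Tuning max_changes and max_delay_s per provider corresponds to setting the balance at provider level; locks bypass the batching when immediate consistency is required.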
Results
Simplicity
Easy organization of data
Global distribution hidden
Easy publishing of results
Cooperation
Efficiency
Conclusions
Data organization hides global distribution from
users while preserving providers’ independence
Ready for global cooperation of users
Efficient enough for computations
Onedata status
Onedata v1 installed in the production environment of
ACC Cyfronet AGH
Onedata v2 currently being tested by international
organizations
Thank you
Onedata homepage:
http://www.onedata.org