RSS Data-farm: from local storage to the cloud
Julio Carreira
ESA-ESRIN, Frascati, Roma (Italy)
Storage Solutions for Digital Preservation
APARSEN Webinars, 14th April 2014
EOP-GTR Systems

What is RSS
• The RSS architecture and environments have been designed and made available by the Research/Service Support and Ground Segment Technology (EOP-GSR) section of the ESA Earth Observation Ground Segment Department.
• Its primary mission is to:
  – Provide support to the EO community to exploit EO data, and to researchers and service providers to ease the development of applications that generate value-added information
  – Support and promote ground segment harmonization activities through the identification and collection of ground segment technology needs

RSS systems
[Diagram: standardised IFs following HMA Guidelines. Labels: MEA Storage; ESA RSS Storage; OGC GIS Web Service; G-POD Catalogue; Data Farm; EO Products, Image Collection; DAIL SSE; OGC Web GIS Services; Online Archives; Metadata, Auxiliary data; UK PAC; Reference Data Sets]

RSS storage infrastructure
• G-POD has 20 storage systems (NAS/SAN) with a total capacity of ~600 TB
• ~7 000 000 products stored
• ~350 TB of data
• The storage systems date from 2005 to 2013
• Growth of > 50 TB per year
• RAID 5/6, with or without a spare disk
• The storage systems are monitored with OPSView

The use of data at RSS

Ancient times (around 2010)
The problem
• Each storage system had a different path to the data structure
• A catalogue was needed to know where the data was
• No direct access to a continuous flow of data (e.g. a full year of a dataset)

The answer!
The solution was to create a DataFarm (a distributed file system):
• POSIX-based file system
• Can be deployed on top of a pre-existing file system
• Mirroring and replication
• Load balancing
• Storage quotas
• ACLs for user access to data

RSS DataFarm
• The RSS DataFarm allows much more flexibility than before in accessing data. For example, it is now possible to ingest data directly from the former G-POD dedicated storage systems into the RSS WebMap Server, without first copying the data to local storage. The same applies to SSE, KEO and the other RSS environments.
• Other benefits brought by the DataFarm are:
  – Optimized storage space utilization
  – Easy access control
  – Easy scalability
• The RSS archive, composed of ENVISAT, ERS and third-party mission data, currently has a total volume of around 350 TB, growing by some 50 TB/yr.
• The RSS DataFarm is also a step towards the cloud. This RSS infrastructure model is being extended naturally to the cloud, providing a robust and scalable basis for more efficient and flexible support to EO data users in the coming years.

Today!
• All the storage systems have a single point of access

In the far future (Dec 2014)

Main technical problems
• Network policies
• Network speed
• Security

Different Elements of Risk
• Availability: High
• Performance: High
• Portability: Medium
• Responsiveness: Medium
• Confidentiality: High
• Legal problems: Medium
• Backup
• Data integrity: High
• Disaster recovery: Medium
• Access control: High
• Reporting and monitoring
• Client expectations
• Responsiveness to users
• Maintaining corporate values of quality of service

At this moment!
• In the development environment we currently have 15 TB on the cloud, and we are:
  – Testing the geo-replication of "critical" data
  – Stress testing with massive data requests
  – Monitoring the availability and performance of the file system
  – Planning the security and encryption of communications
  – Developing software for easy portability between cloud and virtualization providers
  – Building a cost model that compares cloud storage with local storage

Conclusions
• We are happy with our unified storage solution based on a distributed file system
• We are successfully testing portability to the cloud
• We are creating policies and rules to mitigate the main risks of keeping the storage on the cloud

Thank you
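
Note on capacity headroom: the figures quoted in the slides (about 600 TB of total space, about 350 TB of data, growth of roughly 50 TB per year) can be turned into a rough estimate of how long the local storage would last. The sketch below is only an illustration: the function name `years_until_full` is hypothetical, growth is assumed to be constant and linear, and the 600 TB is treated as fully usable space, which the slides do not confirm. Since the slides give the growth as "> 50 TB" per year, the result is best read as an upper bound.

```python
def years_until_full(capacity_tb: float, used_tb: float,
                     growth_tb_per_year: float) -> float:
    """Years of headroom left, assuming constant linear growth (an assumption)."""
    if growth_tb_per_year <= 0:
        raise ValueError("growth_tb_per_year must be positive")
    free_tb = max(capacity_tb - used_tb, 0.0)
    return free_tb / growth_tb_per_year


# Figures taken from the slides: ~600 TB total, ~350 TB used, ~50 TB/year.
headroom = years_until_full(capacity_tb=600, used_tb=350, growth_tb_per_year=50)
print(f"Estimated headroom: {headroom:.1f} years")
```

With these inputs the estimate is about five years, which gives some context for comparing local storage with cloud storage in a cost model.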