Cloud Storage in the Czech Republic
Czech national Cloud Storage and Data Repository project
Cloud Computing and Cloud Storage and Data Repository
Grid Computing and Storage
Web Services in the Cloud
What is Cloud Storage and Data Repository?
Understanding current and future requirements for Cloud Storage and Data Repository

Cloud Computing and Storage
• Grid Computing
  – Refers to resource-pooled environments for running compute jobs (such as image processing) rather than long-running processes (such as a Web site or e-mail server)
• Utility Computing
  – Refers to resource-pooled environments for hosting long-running processes; it tends to focus on meeting service levels with the optimal amount of resources necessary to do so
• Cloud Computing
  – Refers to a variety of services available over the Internet that deliver compute functionality on the service provider's infrastructure
  – Its environment (infrastructure) may actually be hosted on either a grid or a utility computing environment, but that does not matter to a service user
  – The data in the cloud, as "Intel inside" (or intelligence inside), is often an important part of the services

Cloud Computing – Simple Definition
Cloud Computing = Software as a Service + Platform as a Service + Infrastructure as a Service + Data as a Service
• Software as a Service (SaaS)
  – From the end user's point of view
  – Apps are located in the cloud
  – Software experiences are delivered through the Internet
• Platform as a Service (PaaS)
  – From the developer's point of view (i.e. cloud users)
  – Cloud providers offer an Internet-based platform to developers who want to create services but don't want to build their own cloud
• Infrastructure as a Service (IaaS)
  – Cloud providers build datacenters
    • Power, scale, hardware, networking, storage, distributed systems, etc.
  – Datacenter as a service
  – Cloud users rent storage, computation, and maintenance from cloud providers (pay-as-you-go, like a utility)

Infrastructure of Mega Datacenters
Not us! We plan 3 + 1 datacenters (3 PB + 6 PB + 12 PB + ? PB) in 3 Czech cities. All will be housed on Czech university campuses in rebuilt server rooms.

Knowledge & Data Intelligence as a Service
Data → Information → Knowledge → Intelligence
– Infrastructure for Web-scale data mining and knowledge discovery
– Empower people with knowledge
– Empower applications and services with intelligence

Summary
• The real underlying value of "cloud + clients" is that it transparently makes software, data, and computing available everywhere

Czech National Storage Cloud and Data Repository (CESNET.cz)
• Funding is provided by the EU (85%) and the Czech government (15%). The whole project totals about 24 million euros: 16 million for the 40+ Gb/s networking, 4 million for the data repository and storage cloud, and the remaining roughly 4 million for the small computing grids and the cloud computing system collocated with the 3 main data repository sites (Pilsen 3+ PB, Pardubice near Prague 5 PB, and Brno 9 PB)
• The design, investment, testing, and realization phase is 2011–2013. The sustainability phase runs until 2018.
  After that, the whole project has to be self-sufficient, funded by the Czech government.

Access protocols
• Standard protocols
  – All standard protocols that users require will be supported where possible
  – CIFS/SMB, NFS v4, WebDAV, FTP, HTTP
• Non-standard protocols
  – Many special protocols that users require will be supported where possible, depending on the project
  – Open source
    • xrootd (LHC, CERN), iRODS middleware, and others

Authentication and Security
• Community-based authentication provided by CESNET and Czech universities will be used when possible
• VPN tunnels will be used for less secure but standard protocols such as CIFS, FTP, or non-secure HTTP
• We will research other means of authentication, either together with Czech universities and academia or based on emerging worldwide authentication and security standards
• We will consider encrypting user data, both in transfer and in storage, starting from the client computer

A typical site (1 of 3)
• tier 0 (first site 0 PB)
• tier 1 – fast FC or SAS disks, 15k rpm (first site 50 TB)
• tier 2 – cheap SATA disks, 7.5k and 5k rpm (first site 400 TB)
• tier 3 – FC tape robots (first site 5 LTO5 drives, 3+ PB)
• dual dedicated DWDM 2 x 10 Gbit/s (future 40, 100 Gbit/s)
• several front-end servers
• HSM (Oracle/Sun SAM, GPFS/Tivoli, DMF, etc.)

Networking
• All three sites will be connected by a dual dedicated 2 x 10 Gb/s network. This will be upgraded to dual (or more) 40 (or 100?) Gb/s links
• All three sites will form one big virtual data repository, with the possibility of remote replicas
• We prefer the replica concept to the classical backup concept

Users
• All three sites will look like one big virtual data repository
• All usage will be free for academic and non-profit users
• Data curation will be set to 7+ years (possibly indefinitely if the funding model works)
• Catch-all users go to the CatchAll virtual organization
• Special users will negotiate special services and conditions
• Special SLDs (Service Level Declarations)
• Open to international collaborations in the EU and elsewhere

Conclusion
• Questions, comments?
• kremenek@cesnet.cz
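Backup material: a toy model of the tiered site. The "typical site" slide describes FC/SAS disk, SATA disk, and LTO5 tape tiers managed by an HSM (Oracle/Sun SAM, GPFS/Tivoli, DMF). The Python sketch below only illustrates the general idea of age-based demotion between such tiers. The tier labels, the idle-time thresholds, and the functions `target_tier`, `migration_plan`, and `recall` are hypothetical assumptions; they do not reflect the policy language or defaults of any of those HSM products.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical tier labels modelled on the "typical site" slide.
# Tier 0 is left out because the first site has 0 PB of it.
TIERS = ["tier1_fc_sas", "tier2_sata", "tier3_lto5_tape"]

# Hypothetical thresholds: days without access after which a file is
# demoted from the given tier to the next, slower one. Not from the slides.
DEMOTE_AFTER_DAYS = {"tier1_fc_sas": 7, "tier2_sata": 90}


@dataclass
class FileRecord:
    path: str
    tier: str
    days_since_access: int


def target_tier(rec: FileRecord) -> str:
    """Return the tier a file should occupy under the toy age-based policy."""
    tier = rec.tier
    # Keep demoting while the file has been idle longer than the current
    # tier's threshold; the tape tier has no threshold, so the loop stops there.
    while (tier in DEMOTE_AFTER_DAYS
           and rec.days_since_access > DEMOTE_AFTER_DAYS[tier]):
        tier = TIERS[TIERS.index(tier) + 1]
    return tier


def migration_plan(records: List[FileRecord]) -> List[Tuple[str, str, str]]:
    """List (path, current_tier, target_tier) for every file that should move."""
    plan = []
    for rec in records:
        dest = target_tier(rec)
        if dest != rec.tier:
            plan.append((rec.path, rec.tier, dest))
    return plan


def recall(rec: FileRecord) -> FileRecord:
    """Model a user access: the file is staged back to the fastest disk tier."""
    return FileRecord(rec.path, TIERS[0], 0)


if __name__ == "__main__":
    files = [
        FileRecord("/data/a", "tier1_fc_sas", 2),
        FileRecord("/data/b", "tier1_fc_sas", 30),
        FileRecord("/data/c", "tier2_sata", 365),
    ]
    for move in migration_plan(files):
        print(move)
```

With these example thresholds, /data/b (idle 30 days) moves from FC/SAS to SATA and /data/c (idle a year) goes to tape, while /data/a stays on fast disk. Real HSM systems usually archive a copy to tape before freeing disk space; this sketch only decides placement.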