Grid Computing and Cloud Computing (网格计算与云计算)

"Cloud" Computing is 1+ yr old
Michael Sheehan's GoGrid Blog, July 25, 2008
http://linux.sys-con.com/node/587717

Confused?
The terms swirl together: Grid Computing, Virtualization, Cluster Computing, SaaS (Software as a Service), P2P, Utility Computing, Cloud Computing.

One can categorize each component:
• Usage model: Utility Computing, Cloud Computing, SaaS
• Infrastructure: Cluster Computing, Virtualization, Grid Computing, P2P

Grid Computing (网格计算)

What is a Grid?
Enable "coordinated resource sharing & problem solving in dynamic, multi-institutional virtual organizations." (Source: "The Anatomy of the Grid")

Virtual Organizations

TeraGrid

What is the TeraGrid?
Technology + Support = Science
– NSF has invested US$246 million.
– In production operation since October 2004; over a high-performance network it now integrates 750 teraflops of computing capability, 30 PB of storage, and database resources spanning more than 100 disciplines.

TeraGrid's 3-pronged strategy to further science
• DEEP Science: Enabling Terascale Science
  – Make science more productive through an integrated set of very-high-capability resources
  – ASTA projects
• WIDE Impact: Empowering Communities
  – Bring TeraGrid capabilities to the broad science community
  – Science Gateways
• OPEN Infrastructure, OPEN Partnership
  – Provide a coordinated, general-purpose, reliable set of services and resources
  – Grid interoperability working group

[Maps: TeraGrid usage, and TeraGrid PIs by institution – blue: 10 or more PIs, red: 5-9 PIs, yellow: 2-4 PIs, green: 1 PI.]

TeraGrid Resources (table condensed)
• Computational resources at ANL/UC, IU, NCSA, ORNL, PSC, Purdue, SDSC and TACC: 100+ TF across 8 distinct architectures (Itanium2, IA-32, SGI SMP, Dell Xeon, IBM p690, Condor flocks, Cray XT3, TCS, Marvel SMP, Power4+, Blue Gene)
• Online storage: ~3 PB of online disk; mass-storage systems of roughly 1.2-6 PB at several sites
• Network: 5-30 Gb/s links to hubs in Chicago, Los Angeles and Atlanta
• Data collections: >100 collections, accessed via URL, DB, SRB, GridFTP, portals, OPeNDAP and web services
• Instruments: SNS and HFIR neutron facilities (opportunistic), proteomics and X-ray crystallography
• Visualization resources for remote interactive (RI), collaborative (RC) and remote batch (RB) use

Science Gateways: a new initiative for the TeraGrid
• Increasing investment by communities in their own cyberinfrastructure, but heterogeneous:
  – Resources
  – Users – from expert to K-12
  – Software stacks, policies
• Science Gateways
  – Provide "TeraGrid Inside" capabilities
  – Leverage community investment
• Three common forms:
  – Web-based portals
  – Application programs running on users' machines but accessing services in TeraGrid
  – Coordinated access points enabling users to move seamlessly between TeraGrid and other grids
Workflow Composer [gateway screenshot]

Gateways are growing in numbers
• 10 initial projects as part of the TG proposal
• >20 Gateway projects today
• No limit on how many gateways can use TG resources
  – Prepare services and documentation so developers can work independently
• Open Science Grid (OSG)
• Special PRiority and Urgent Computing Environment (SPRUCE)
• National Virtual Observatory (NVO)
• Linked Environments for Atmospheric Discovery (LEAD)
• Computational Chemistry Grid (GridChem)
• Computational Science and Engineering Online (CSE-Online)
• GEON (GEOsciences Network)
• Network for Earthquake Engineering Simulation (NEES)
• SCEC Earthworks Project
• Network for Computational Nanotechnology and nanoHUB
• GIScience Gateway (GISolve)
• Biology and Biomedicine Science Gateway
• Open Life Sciences Gateway
• The Telescience Project
• Grid Analysis Environment (GAE)
• Neutron Science Instrument Gateway
• TeraGrid Visualization Gateway, ANL
• BIRN
• GridBlast Bioinformatics Gateway
• Earth Systems Grid
• Astrophysical Data Repository (Cornell)
• Many others interested
  – SID Grid
  – HASTAC

Open Science Grid (OSG)
Origins:
– National grid projects (iVDGL, GriPhyN, PPDG) and the LHC Software & Computing Projects
Current compute resources:
– 61 Open Science Grid sites
– Connected via Internet2, NLR, ... at 622 Mbps to 10 Gbps
– Compute & storage elements
– All are Linux clusters
– Most are shared
  • Campus grids
  • Local non-grid users
– More than 10,000 CPUs
  • A lot of opportunistic usage
  • Total computing capacity difficult to estimate
  • Same with storage

OSG Snapshot
• 96 resources across production & integration infrastructures, using production & research networks
• Sustained through OSG submissions: 3,000-4,000 simultaneous jobs, ~10K jobs/day, ~50K CPU-hours/day; peak of 15K test jobs a day
• 20 Virtual Organizations plus 6 operations VOs; includes 25% non-physics work
• ~20,000 CPUs (sites ranging from 30 to 4,000 CPUs)
• ~6 PB tape, ~4 PB shared disk

What is the Open Science Grid?
[Map of OSG sites at institutions across the US (McGill, Harvard, BNL, FNAL, Caltech, SDSC and many others), plus sites in Brazil, Mexico, Taiwan and the UK.]

OSG applications
• Genome sequence analysis
• Sloan Digital Sky Survey
• Earth System Grid: O(100 TB) online data
• STAR: 5 TB transfer (SRM, GridFTP)

EGEE (Enabling Grids for E-sciencE)
A European grid initiative.

Application domains: archeology, astronomy, astrophysics, civil protection, computational chemistry, earth sciences, finance, fusion, geophysics, high-energy physics, life sciences, multimedia, material sciences, ...

As of June 2, 2008:
• >250 sites in 48 countries
• >50,000 CPUs
• >20 petabytes of storage
• >10,000 users
• >150 VOs
• >150,000 jobs/day

Users and resources distribution (June 2, 2008)

EGEE workload in 2007
• Data: 25 PB stored, 11 PB transferred
• CPU: 114 million hours
http://gridview.cern.ch/GRIDVIEW/same_index.php
Priced with the Amazon Web Services calculator (http://calculator.s3.amazonaws.com/calc5.html, as of 17/05/08), the equivalent capacity comes to $58,688,679.08.
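To see where a figure of that size can come from, here is a rough back-of-envelope sketch in Python using the AWS list prices quoted later in these slides ($0.10 per EC2 server-hour, $0.15 per GB-month in S3, $0.10-0.18 per GB transferred). Treating one grid CPU-hour as one EC2 instance-hour and assuming the full 25 PB sits in S3 for twelve months are my own simplifications, not the calculator's actual method.

# Rough back-of-envelope, not the AWS calculator's exact computation.
# Assumptions: 1 grid CPU-hour ~ 1 EC2 instance-hour ($0.10); the full 25 PB is
# stored in S3 for 12 months ($0.15 per GB-month); transfer billed at $0.10/GB.
GB_PER_PB = 1024 ** 2

cpu = 114e6 * 0.10                    # 114 million CPU-hours
storage = 25 * GB_PER_PB * 0.15 * 12  # 25 PB held for a year
transfer = 11 * GB_PER_PB * 0.10      # 11 PB moved

print(f"~${cpu + storage + transfer:,.0f}")   # ~ $59.7M, same ballpark as the slide's $58.7M

Under these assumptions the estimate lands within a few percent of the figure on the slide, and storage, not CPU, dominates the bill.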
LCG (LHC Computing Grid)

LHC – Large Hadron Collider
• 4 experiments: ATLAS, ALICE, CMS, LHCb
• 27 km long ring
• 7 + 7 TeV
(The following tutorial material is adapted from Federico Calzolari's "GRID Tutorial – How to use LCG".)

LCG – LHC Computing Grid
• Currently integrates 140 computing centres in 33 countries.
• Expected to run 100 million computing jobs in 2008.

Proxy certificate
Get your proxy certificate: a temporary (usually 24h) certificate, depending on your VO:

grid-proxy-init
voms-proxy-init -voms <VO>:/<VO>/Role=<role> -valid 1000:00

Certificate
Install your certificate on the User Interface: log in to the User Interface, copy the file you exported there, and create a directory where your certificate and private key will be stored:

mkdir ~/.globus

Convert the PKCS12 file (.p12) into the supported standard (.pem). This operation splits your mycert.p12 file into two files: the certificate (usercert.pem) and the private key (userkey.pem):

openssl pkcs12 -nocerts -in <mycert.p12> -out ~/.globus/userkey.pem
openssl pkcs12 -clcerts -nokeys -in <mycert.p12> -out ~/.globus/usercert.pem
chmod 0400 ~/.globus/userkey.pem
chmod 0600 ~/.globus/usercert.pem

At the end you should have something like:

[user@userinterface .globus]$ ls -al
-rw------- 1 user user 2008 Nov 13 16:50 usercert.pem
-r-------- 1 user user  963 Nov 13 16:50 userkey.pem

Register to a VO
http://grid-it.cnaf.infn.it for generic users

JDL: Job Description Language
Job overview: JDL (job encapsulation), main script, executable program.
Lifecycle: Creation, Submission, Status, Retrieval.

JDL – test.jdl

Executable          = "script.sh";
StdOutput           = "std.out";
StdError            = "std.err";
InputSandbox        = {"script.sh","exe.bin"};       # Input
OutputSandbox       = {"std.out","std.err","out"};   # Output
VirtualOrganisation = "<VO>";
DataAccessProtocol  = {"file","gsiftp","rfio","dcap"};
InputData           = {"lfn:/grid/<VO>/<FILE>"};
OutputSE            = "<SE>";
Requirements = Member("<SITE>", other.GlueHostApplicationSoftwareRunTimeEnvironment)
               && other.GlueCEName=="<QUEUE>";

Main script – script.sh

#!/bin/sh
# Environment
date >> out2
hostname >> out2
# Get data
lcg-cp [-v] --vo <VO> lfn:<file> file:///data.tgz
# Unpack input [data.tgz: src.cpp, ...]
tar -zxvf data.tgz
# Compile source
g++ src.cpp -o exe.bin
chmod u+x exe.bin
# Exec program
./exe.bin > out
# Pack output
tar -zcvf out.tgz out out2

Submit a Job

edg-job-submit -o ID <JDL>    # save the JOBid in file ID

Selected Virtual Organisation name (from JDL): cms
Connecting to host rb119.cern.ch, port 7772    # Resource Broker
Logging to host rb119.cern.ch, port 9002
*********************************************************************
JOB SUBMIT OUTCOME
The job has been successfully submitted to the Network Server.
Use edg-job-status command to check job current status.
Your job identifier (edg_jobId) is:
- https://rb119.cern.ch:9000/tG3Xp2jT_58IUeXoY1GoZQ    # JOBid
*********************************************************************

Control JOB status

edg-job-status <JOBid>    [e.g. https://rb119.cern.ch:9000/tG3Xp2jT_58IUeXoY1GoZQ]

*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job : https://rb119.cern.ch:9000/tG3Xp2jT_58IUeXoY1GoZQ
Current Status:  Waiting / Scheduled / Running / Done (Success/Abort)
Status Reason:   Job successfully submitted to Globus
Destination:     ce0001.m45.ihep.su:2119/jobmanager-lcgpbs-cms
reached on:      Sat Nov 17 22:38:34 2007
*************************************************************

Get the output
Retrieve the job output:

edg-job-get-output <JOBid>    [e.g. https://rb119.cern.ch:9000/tG3Xp2jT_58IUeXoY1GoZQ]

Retrieving files from host: rb119.cern.ch (for https://rb119.cern.ch:9000/tG3Xp2jT_58IUeXoY1GoZQ)
*********************************************************************
JOB GET OUTPUT OUTCOME
Output sandbox files for the job:
- https://rb119.cern.ch:9000/tG3Xp2jT_58IUeXoY1GoZQ
have been successfully retrieved and stored in the directory:
/tmp/jobOutput/<USER>_tG3Xp2jT_58IUeXoY1GoZQ
*********************************************************************

ls -al /tmp/jobOutput/calzolar_tG3Xp2jT_58IUeXoY1GoZQ
-rw-r--r-- 1 calzolar cms  11 Nov 17 23:59 out
-rw-r--r-- 1 calzolar cms 133 Nov 17 23:59 std.err
-rw-r--r-- 1 calzolar cms   8 Nov 17 23:59 std.out
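The creation/submission/status/retrieval cycle above is easy to script. Below is a minimal sketch in Python that drives the same edg-* commands shown on these slides; it assumes the LCG client tools are on PATH, a valid VOMS proxy exists, and test.jdl is the JDL file shown earlier. The 60-second polling interval is an arbitrary choice.

import re
import subprocess
import time

def run(cmd):
    # Run an LCG client command and return its stdout (raises on failure).
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

# Creation + Submission: submit test.jdl and pull the https://... job identifier out of the output
submit_out = run(["edg-job-submit", "test.jdl"])
job_id = re.search(r"https://\S+", submit_out).group(0)

# Status: poll until the job reaches a final state (Done Success/Abort)
while True:
    status_out = run(["edg-job-status", job_id])
    if "Done" in status_out:
        break
    time.sleep(60)

# Retrieval: fetch the OutputSandbox (std.out, std.err, out) into /tmp/jobOutput/<USER>_<id>
run(["edg-job-get-output", job_id])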
Job Requirements
JDL Requirements examples:

Run anywhere: no Requirements.

Run at Pisa:
Requirements = Member("INFN-PISA", other.GlueHostApplicationSoftwareRunTimeEnvironment);

Run on a queue at least 1 day long:
Requirements = (other.GlueCEPolicyMaxCPUTime > 60*24);

Run on a site with at least 20 free CPUs:
Requirements = (other.GlueCEStateFreeCPUs > 20);

Run on a site with at least 1 TB of local disk available (unit: kB):
Requirements = anyMatch(other.storage.CloseSEs, target.GlueSAStateAvailableSpace > 1000000000);

Run on a site with a given software package locally installed:
Requirements = Member("VO-<VO>-TAG", other.GlueHostApplicationSoftwareRunTimeEnvironment);

Requirements TAGs, from SINICA
http://goc.grid.sinica.edu.tw/gstat/<SITE>/

GlueHostOperatingSystemName: Scientific Linux CERN
GlueHostOperatingSystemRelease: 4.5
GlueHostOperatingSystemVersion: Beryllium
GlueSubClusterPhysicalCPUs: 0
GlueSubClusterLogicalCPUs: 0
GlueHostApplicationSoftwareRunTimeEnvironment:
  LCG-2  LCG-2_1_0  LCG-2_1_1  LCG-2_2_0  LCG-2_3_0  LCG-2_3_1  LCG-2_4_0  LCG-2_5_0  LCG-2_6_0  LCG-2_7_0
  GLITE-3_0_0  R-GMA  INFN-PISA  SI00MeanPerCPU_1800  SF00MeanPerCPU_2000  MPICH  MPI_HOME_NOTSHARED  AFS
  VO-atlas-cloud-IT  VO-atlas-production-12.0.5  VO-atlas-production-12.0.6  VO-atlas-production-12.0.7  […]

Resources search
Query the CPU and storage available per VO:

lcg-infosites --vo <VO> ce

#CPU  Free  Total Jobs  Running  Waiting  ComputingElement
----------------------------------------------------------------
 165     1           1        0        1  ce.phy.bg.ac.yu:2119/jobmanager-pbs-cms
 120    11           0        0        0  fangorn.man.poznan.pl:2119/jobmanager-pbs-cms
 192   110           0        0        0  gridce.atlantis.ugent.be:2119/jobmanager-pbs-cms
 212     0         529      146      383  gridce.iihe.ac.be:2119/jobmanager-pbs-cms
 227     5         312      222       90  ingrid.cism.ucl.ac.be:2119/jobmanager-lcgcondor-cms
  15    15           0        0        0  ce002.ipp.acad.bg:2119/jobmanager-lcgpbs-cms
  80    43           0        0        0  ce02.grid.acad.bg:2119/jobmanager-pbs-cms
  24    13           0        0        0  ce001.grid.uni-sofia.bg:2119/jobmanager-lcgpbs-cms

lcg-infosites --vo <VO> se

Avail Space(Kb)   Used Space(Kb)   Type   SEs
----------------------------------------------------------------
       97470000              n.a    n.a   dpm.phy.bg.ac.yu
      395467659        779205896    n.a   cmsse01.ihep.ac.cn
       27664924         59878772    n.a   se001.grid.uni-sofia.bg
      149180000              n.a    n.a   se.hpc.iit.bme.hu
              1                1    n.a   dcsrm.usatlas.bnl.gov
      190040000              208    n.a   lxdpm101.cern.ch
  1000000000000     500000000000    n.a   castorgrid.cern.ch
  1000000000000     500000000000    n.a   srm.cern.ch

Resources search
Query the sites available for my job:

edg-job-list-match <JDL>

Selected Virtual Organisation name (from JDL): cms
Connecting to host rb119.cern.ch, port 7772
***************************************************************************
COMPUTING ELEMENT IDs LIST
The following CE(s) matching your job requirements have been found:
*CEId*
a01-004-128.gridka.de:2119/jobmanager-pbspro-cmsS
a01-004-128.gridka.de:2119/jobmanager-pbspro-cmsXS
ares02.cyf-kr.edu.pl:2119/jobmanager-pbs-cms
beagle14.ba.itb.cnr.it:2119/jobmanager-lcgpbs-cms
bogrid5.bo.infn.it:2119/jobmanager-lcgpbs-cms
ce-fzk.gridka.de:2119/jobmanager-pbspro-cmsL
ce-fzk.gridka.de:2119/jobmanager-pbspro-cmsS
ce-fzk.gridka.de:2119/jobmanager-pbspro-cmsXS
ce.bg.ktu.lt:2119/jobmanager-lcgpbs-cms
ce.cc.ncu.edu.tw:2119/jobmanager-lcgpbs-cms
[…]
gridce.ilc.cnr.it:2119/jobmanager-lcgpbs-cms
gridce2.pi.infn.it:2119/jobmanager-lcglsf-cms4
gridce.sns.it:2119/jobmanager-lcgpbs-cms

Grid Monitoring
GridICE, INFN GOC, Sinica

Grid Monitoring – AOB

Cloud Computing (云计算)

Cloud Computing Definition
"Cloud computing is a concept of using the Internet to allow people to access technology-enabled services. It allows users to consume services without knowledge of, or control over, the technology infrastructure that supports them." – Wikipedia

Enterprise IT spending challenge
[Chart: global annual IT spending (estimated, US$B, 1996-2010), split into new server spending, server management and administration costs, and power and cooling costs. Source: IBM Corporate Strategy analysis of IDC data, Sept. 2007.]

Dream or Nightmare?
Seasonal Spikes
[Chart: seasonal demand spikes versus fixed capacity.]

A Closer Look at Cloud Computing
End users / requestors – government/academics, industry (startups/SMB/enterprise), consumers – are served through a Public Cloud or an Enterprise Cloud. (Source: Corporate Strategy)
• Innovative business models: new combinations of services form differentiating value propositions at lower cost and in shorter time
• Simplified services: cloud applications enable the simplification of complex services
• A cloud computing platform combines modular components on a service-oriented architecture with flexible pricing
• An "elastic" pool of high-performance virtualized compute resources
• Internet-protocol-based convergence of networks and devices

Examples of Different Types of Services
Web application service, compute service, collaboration services, datacenter infrastructure, database service, job scheduling service, virtual client service, service catalog, storage service, content classification service, storage backup and archive service, ...

Google and Cloud Computing (Google与云计算)
User centric:
• Data stored in the "Cloud"
• Data follows you & your devices
• Data accessible anywhere
• Data can be shared with others
(messages, preferences, news, contacts, calendar, investments, maps, photos, mailing lists, music, e-mails, phone numbers)

Google's three key technologies (Google的三大法宝)
• Google File System (GFS)
• BigTable
• MapReduce

Google File System (GFS)
GFS architecture: a single GFS master, many chunkservers, many clients.
[Diagram: clients ask the master for chunk locations, then read and write replicated chunks (C0, C1, C2, C5, ...) directly on chunkservers 1…N.]
• Files broken into chunks (typically 64 MB)
• Master manages metadata
• Data transfers happen directly between clients and chunkservers

GFS usage at Google
• 200+ clusters
• Filesystem clusters of up to 5,000+ machines
• Pools of 10,000+ clients
• 5+ petabyte filesystems
• All in the presence of frequent hardware failure

BigTable
• Data model: (row, column, timestamp) -> cell contents
• Distributed multi-level sparse map; fault-tolerant, persistent
• Scalable: thousands of servers, terabytes of in-memory data, petabytes of disk-based data
• Self-managing: servers can be added/removed dynamically and adjust to load imbalance
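A toy illustration of the data model just described (plain Python, not Google's actual API; the web-table row and column names follow the example in the BigTable paper): every cell is addressed by (row, column, timestamp), and a read normally returns the newest version.

# Toy sketch of the BigTable data model: a sparse map keyed by (row, column, timestamp).
from collections import defaultdict

table = defaultdict(dict)   # {(row_key, column): {timestamp: value}}

def put(row, column, timestamp, value):
    table[(row, column)][timestamp] = value

def get_latest(row, column):
    versions = table.get((row, column), {})
    return versions[max(versions)] if versions else None

put("com.cnn.www", "contents:", 1_000, b"<html>...v1")
put("com.cnn.www", "contents:", 2_000, b"<html>...v2")
put("com.cnn.www", "anchor:cnnsi.com", 1_500, b"CNN")

print(get_latest("com.cnn.www", "contents:"))   # b'<html>...v2' (newest timestamp wins)

The real system layers column families, versioned garbage collection and range-partitioned tablets on top of this basic map.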
Why not just use a commercial DB?
• Scale is too large, or cost too high, for most commercial databases
• Low-level storage optimizations help performance significantly
  – Much harder to do when running on top of a database layer
• Also fun and challenging to build large-scale systems

BigTable Summary
• Data model applicable to a broad range of clients
  – Actively deployed in many of Google's services
• Provides a high-performance storage system on a large scale
  – Self-managing
  – Thousands of servers
  – Millions of ops/second
  – Multiple GB/s reading/writing
• Largest BigTable cell manages ~3 PB of data spread over several thousand machines

MapReduce
• A simple programming model that applies to many data-intensive computing problems
• Hides messy details in the MapReduce runtime library:
  – Automatic parallelization
  – Load balancing
  – Network and disk transfer optimization
  – Handling of machine failures
  – Robustness
  – Easy to use

MapReduce Programming Model
• Borrowed from functional programming:
  map(f, [x1, …, xm, …]) = [f(x1), …, f(xm), …]
  reduce(f, x1, [x2, x3, …]) = reduce(f, f(x1, x2), [x3, …]) = … (continue until the list is exhausted)
• Users implement two functions:
  map(in_key, in_value) -> (key, value) list
  reduce(key, [value1, …, valuem]) -> f_value

MapReduce – A New Model and System
• Two phases of data processing:
  – Map: (in_key, in_value) -> {(key_j, value_j) | j = 1…k}
  – Reduce: (key, [value1, …, valuem]) -> (key, f_value)
[Dataflow diagram: input key/value pairs are read from the data stores by parallel map tasks, which emit intermediate (key, values) pairs; a barrier aggregates intermediate values by output key; parallel reduce tasks then produce the final values for each key.]

MapReduce version in pseudo-code – Example: WordCount (1/2)
• Input is files with one document per record
• Specify a map function that takes a key/value pair:
  key = document URL, value = document contents
• Output of the map function is key/value pairs: in our case, output (w, "1") once per word in the document

Example – WordCount (2/2)
• The MapReduce library gathers together all pairs with the same key (shuffle/sort)
• The reduce function combines the values for a key: in our case, compute the sum
• The output of reduce is paired with the key and saved

MapReduce Framework
• For certain classes of problems, the MapReduce framework provides:
  – Automatic & efficient parallelization/distribution
  – I/O scheduling: run each mapper close to its input data
  – Fault-tolerance: restart failed mapper or reducer tasks on the same or different nodes
  – Robustness: tolerate even massive failures, e.g. large-scale network maintenance that once lost 1,800 out of 2,000 machines
  – Status and monitoring
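To make the WordCount walkthrough above concrete, here is a minimal, self-contained sketch in plain Python (not Google's MapReduce library): the shuffle is simulated with a dictionary, and the two tiny documents are invented for illustration.

# Minimal WordCount sketch: map emits (word, 1) per word, the shuffle groups by key,
# and reduce sums the values for each key.
from collections import defaultdict

def map_fn(doc_url, doc_contents):
    # (in_key, in_value) -> list of (key, value)
    return [(word, 1) for word in doc_contents.split()]

def reduce_fn(word, counts):
    # (key, [value1, ..., valuem]) -> f_value
    return sum(counts)

documents = {
    "http://example.org/a": "the grid the cloud",
    "http://example.org/b": "the cloud",
}

# Map phase
intermediate = []
for url, contents in documents.items():
    intermediate.extend(map_fn(url, contents))

# Shuffle/sort: gather all pairs with the same key
groups = defaultdict(list)
for word, count in intermediate:
    groups[word].append(count)

# Reduce phase
result = {word: reduce_fn(word, counts) for word, counts in groups.items()}
print(result)   # {'the': 3, 'grid': 1, 'cloud': 2}

In the real framework, only the map and reduce functions are written by the user; partitioning the input, the shuffle, scheduling and fault handling are all done by the runtime library.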
Task Granularity and Pipelining
• Fine-granularity tasks: many more map tasks than machines
  – Minimizes time for fault recovery
  – Can pipeline shuffling with map execution
  – Better dynamic load balancing
• Often use 200,000 map tasks and 500 reduce tasks with 2,000 machines

MapReduce: Uses at Google
• Typical configuration: 200,000 mappers, 500 reducers on 2,000 nodes
• Broad applicability has been a pleasant surprise
  – Quality experiences, log analysis, machine translation, ad-hoc data processing
• Production indexing system: rewritten with MapReduce
  – ~10 MapReduce operations, much simpler than the old code

MapReduce Summary
• MapReduce has proven to be a useful abstraction
• Greatly simplifies large-scale computation at Google
• Fun to use: focus on the problem, let the library deal with the messy details

A Data Playground
• MapReduce + BigTable + GFS = data playground
  – A substantial fraction of the internet available for processing
  – Easy-to-use teraflops/petabytes, quick turn-around
  – Cool problems, great colleagues

Amazon Web Services

Amazon Simple Storage Service (S3)
• Object-based storage
• 1 B – 5 GB per object
• Fast, reliable, scalable
• Redundant, dispersed
• 99.99% availability goal
• Private or public
• Per-object URLs & ACLs
• BitTorrent support
Pricing: $0.15 per GB-month of storage; $0.01 per 1,000-10,000 requests; $0.10-$0.18 per GB of data transfer.

Amazon S3 Concepts
• Objects: opaque data to be stored (1 byte … 5 gigabytes); authentication and access controls
• Buckets: object containers holding any number of objects; 100 buckets per account; buckets are "owned"
• Keys: unique object identifier within a bucket; up to 1,024 bytes long; flat object storage model
• Standards-based interfaces: REST and SOAP; URL-addressability – every object has a URL

S3 SOAP/Query API
• Service: ListAllMyBuckets
• Buckets: CreateBucket, DeleteBucket, ListBucket, GetBucketAccessControlPolicy, SetBucketAccessControlPolicy, GetBucketLoggingStatus, SetBucketLoggingStatus
• Objects: PutObject, PutObjectInline, GetObject, GetObjectExtended, DeleteObject, GetObjectAccessControlPolicy, SetObjectAccessControlPolicy

Amazon Simple Queue Service (SQS)
• Scalable queuing
• Elastic capacity
• Reliable, simple, secure
Uses: inter-process messaging, data buffering, architecture component.
Pricing: $0.10 per 1,000 messages; $0.10-$0.18 per GB of data transfer.

Amazon SQS Concepts
• Queues: named message containers; persistent
• Messages: up to 256 KB of data per message; peek/lock access model
• Scalable: unlimited number of queues per account, unlimited number of messages per queue

SQS SOAP/Query API
• Queues: ListQueues, DeleteQueue, SetVisibilityTimeout, GetVisibilityTimeout
• Messages: SendMessage, ReceiveMessage, DeleteMessage, PeekMessage
• Security: AddGrant, ListGrants, RemoveGrant

Amazon Elastic Compute Cloud (EC2)
• Virtual compute cloud
• Elastic capacity
• 1.7 GHz x86, 1.7 GB RAM, 160 GB disk, 250 Mb/s network per instance
• Network security model
Uses: time- or traffic-based scaling, load testing, simulation and analysis, rendering, software-as-a-service platform, hosting.
Pricing: $0.10 per server-hour; $0.10-$0.18 per GB of data transfer.

Amazon EC2 Concepts
• Amazon Machine Image (AMI): bootable root disk; pre-defined or user-built; catalog of user-built AMIs; OS: Fedora, CentOS, Gentoo, Debian, Ubuntu, Windows Server; app stacks: LAMP, mpiBLAST, Hadoop
• Instance: a running copy of an AMI; launches in less than 2 minutes; start/stop programmatically
• Network security model: explicit access control; security groups
• Inter-service bandwidth is free
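The SOAP/Query operations listed above map directly onto today's AWS SDKs. As a rough illustration only (using the modern boto3 Python SDK, which postdates these slides; the bucket name, queue name and AMI id are made-up placeholders, and credentials/region configuration is assumed), the S3 + SQS + EC2 combination looks like this:

# Sketch of the S3 + SQS + EC2 pattern with the present-day boto3 SDK.
import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs")
ec2 = boto3.client("ec2")

# S3: bucket + key -> object (cf. CreateBucket / PutObject / GetObject above)
s3.create_bucket(Bucket="example-media-bucket")
s3.put_object(Bucket="example-media-bucket", Key="episodes/001.mp3", Body=b"...audio bytes...")
obj = s3.get_object(Bucket="example-media-bucket", Key="episodes/001.mp3")

# SQS: queue the work item (cf. SendMessage / ReceiveMessage above)
queue = sqs.create_queue(QueueName="transcode-jobs")
sqs.send_message(QueueUrl=queue["QueueUrl"], MessageBody="episodes/001.mp3")
msgs = sqs.receive_message(QueueUrl=queue["QueueUrl"])

# EC2: start a worker instance from an AMI, then shut it down (cf. RunInstances / TerminateInstances)
res = ec2.run_instances(ImageId="ami-00000000", MinCount=1, MaxCount=1, InstanceType="t3.micro")
instance_id = res["Instances"][0]["InstanceId"]
ec2.terminate_instances(InstanceIds=[instance_id])

This store-media-in-S3, queue-work-in-SQS, process-on-EC2 pattern is essentially the architecture the GigaVox example below describes.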
Amazon EC2 At Work
• Startups
  – Cruxy – media transcoding
  – GigaVox Media – podcast management
• Fortune 500 clients: high-impact, short-term projects; development hosts
• Science / research: Hadoop / MapReduce, mpiBLAST
• Load-management and load-balancing tools: Pound, Weogeo, RightScale

EC2 SOAP/Query API
• Images: RegisterImage, DescribeImages, DeregisterImage
• Instances: RunInstances, DescribeInstances, TerminateInstances, GetConsoleOutput, RebootInstances
• Keypairs: CreateKeyPair, DescribeKeyPairs, DeleteKeyPair
• Image attributes: ModifyImageAttribute, DescribeImageAttribute, ResetImageAttribute
• Security groups: CreateSecurityGroup, DescribeSecurityGroups, DeleteSecurityGroup, AuthorizeSecurityGroupIngress, RevokeSecurityGroupIngress

Web-Scale Architecture

GigaVox Economics
• Implemented Amazon S3, Amazon EC2 and Amazon SQS in November 2006
• Created an essentially infinitely scalable infrastructure for less than $100 – building the same infrastructure themselves would have cost thousands of dollars
• Reduced staffing requirements – far less responsibility for 24x7 operations

Analysis and Outlook (分析展望)

The explosive growth of networks (网络的迅猛发展)
From 1986 to 2000:
• Computers: × 500
• Networks: × 340,000
The inevitable result of this network growth…

Grid computing vs. cloud computing (网格计算与云计算的比较)

Grid computing                  | Cloud computing
--------------------------------|--------------------------------
Heterogeneous resources         | Homogeneous resources
Multiple institutions           | A single institution
Virtual organizations           | Virtual machines
Mainly scientific computing     | Mainly data processing
High-performance computers      | Servers / PCs
Tightly coupled problems        | Loosely coupled problems
Free of charge                  | Pay per use
Standardized                    | No standards yet
Scientific community            | Commercial world

Cloud computing is one form of the grid in the broad sense (云计算是广义网格的一种)
"The grid is a set of emerging technologies built on the Internet that integrates high-speed networks, high-performance computers, large databases, sensors, and remote instruments, providing scientists and ordinary citizens alike with more resources, capabilities, and interactive services."
– Ian Foster, The Grid, 1998

Science in the next 10 years: Science 2.0 – grid computing
Business in the next 10 years: Business 2.0 – cloud computing

Grid books (网格书籍)
http://www.chinagrid.net
http://www.china-cloud.net