ADVANCE COMPUTING TECHNOLOGY (170704)

PRACTICAL-1

AIM: Introduction to advanced computing technology.

Advanced computing technology is a broad term covering computing paradigms such as cluster computing, grid computing, and cloud computing. This practical introduces the main concepts associated with all three.

1. Cloud Computing

The introduction of cloud computing has provided high scalability and elasticity to companies. Cloud computing is a computing paradigm in which a large pool of systems is connected over private or public networks to provide dynamically scalable infrastructure for application, data, and file storage. With the advent of this technology, the cost of computation, application hosting, content storage, and delivery is reduced significantly.

Cloud computing is a practical approach to obtaining direct cost benefits, and it has the potential to transform a data center from a capital-intensive setup into a variably priced environment. The idea of cloud computing is based on the fundamental principle of "reusability of IT capabilities". What cloud computing brings, compared with traditional concepts such as "grid computing", "distributed computing", "utility computing", or "autonomic computing", is the broadening of horizons across organizational boundaries.

Forrester defines cloud computing as: "A pool of abstracted, highly scalable, and managed compute infrastructure capable of hosting end-customer applications and billed by consumption."

Architecture of cloud computing:

Cloud computing architecture comprises many loosely coupled cloud components. We can broadly divide the cloud architecture into two parts:
● Front End
● Back End

The front end refers to the client part of the cloud computing system. It consists of the interfaces and applications required to access cloud computing platforms, for example a web browser.
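The front-end/back-end split can be sketched in miniature in Python. This is an illustrative toy, not any real cloud API: the class and method names are invented, and a direct method call stands in for the network between the two ends.

```python
# Hypothetical sketch of the cloud's two parts. The back end models the
# cloud's storage resources; the front end is the client-side interface.

class BackEnd:
    """The cloud itself: storage and the service that manages it."""

    def __init__(self):
        self._storage = {}  # stands in for huge data storage / virtual machines

    def store(self, key, data):
        self._storage[key] = data

    def fetch(self, key):
        return self._storage.get(key)


class FrontEnd:
    """Client part: the interface an application uses to reach the cloud."""

    def __init__(self, backend):
        self._backend = backend  # in reality, reached over the Internet

    def upload(self, name, contents):
        self._backend.store(name, contents)

    def download(self, name):
        return self._backend.fetch(name)


client = FrontEnd(BackEnd())
client.upload("report.txt", "quarterly figures")
```

In a real deployment the `FrontEnd` role is played by a browser or client library, and the `BackEnd` role by the provider's servers, storage, and security mechanisms.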
The back end refers to the cloud itself. It consists of all the resources required to provide cloud computing services: huge data storage, virtual machines, security mechanisms, services, deployment models, servers, and so on.

VIMAT/BE/CE/120940107018

The two ends are connected through a network, usually the Internet. The following diagram shows a graphical view of the cloud computing architecture:

Cloud Computing Benefits

Enterprises need to align their applications to exploit the architecture models that cloud computing offers. Some of the typical benefits are listed below:

1. Reduced cost: There are several reasons to attribute lower costs to cloud technology. The billing model is pay-per-usage, and since the infrastructure is not owned by the customer, maintenance costs are lower. Both initial and recurring expenses are much lower than in traditional computing.

2. Increased storage: With the massive infrastructure offered by cloud providers today, storing and maintaining large volumes of data is a reality. Sudden workload spikes are also managed effectively and efficiently, since the cloud can scale dynamically.

3. Flexibility: This is an extremely important characteristic. With enterprises having to adapt ever more rapidly to changing business conditions, speed of delivery is critical. Cloud computing stresses getting applications to market very quickly, using the most appropriate building blocks for deployment.

Disadvantages of Cloud Computing

1. Downtime: As cloud service providers take care of a number of clients each day, they can become overwhelmed and may even run into technical outages, which can leave your business processes temporarily suspended. Additionally, if your Internet connection is offline, you will not be able to access any of your applications, servers, or data in the cloud.

2. Security: Although cloud service providers implement the best security standards and industry certifications, storing data and important files with external service providers always opens up risks. Using cloud-powered technologies means you need to give your service provider access to important business data. Meanwhile, being a public service opens cloud providers up to security challenges on a routine basis. The ease of procuring and accessing cloud services can also give nefarious users the ability to scan, identify, and exploit loopholes and vulnerabilities within a system. For instance, in a multi-tenant cloud architecture, where multiple users are hosted on the same server, a hacker might try to break into the data of other users hosted and stored on that server. However, such exploits and loopholes are unlikely to surface, and the likelihood of a compromise is small.

3. Vendor lock-in: Although cloud service providers promise that the cloud will be flexible to use and integrate, switching between cloud services has not yet fully evolved. Organizations may find it difficult to migrate their services from one vendor to another. Hosting and integrating current cloud applications on another platform may throw up interoperability and support issues; for instance, applications developed on the Microsoft development framework (.NET) might not work properly on the Linux platform.

4. Limited control: Since the cloud infrastructure is entirely owned, managed, and monitored by the service provider, it transfers only minimal control to the customer. The customer can control and manage only the applications, data, and services operated on top of the infrastructure, not the back-end infrastructure itself. Key administrative tasks such as server shell access, updating, and firmware management may not be available to the customer or end user.

It is easy to see how the advantages of cloud computing easily outweigh the drawbacks.
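The pay-per-usage billing model behind the "reduced cost" benefit can be sketched as a small metered calculation. All rates below are invented example figures, not any provider's actual pricing:

```python
# Illustrative pay-per-usage bill: charge only for resources actually
# consumed, instead of a fixed up-front infrastructure investment.

RATES = {
    "compute_hours": 0.05,  # currency units per instance-hour (made up)
    "storage_gb": 0.02,     # per GB-month stored (made up)
    "transfer_gb": 0.01,    # per GB transferred out (made up)
}

def monthly_bill(usage):
    """Sum the metered charges for one month of consumption."""
    return round(sum(RATES[item] * amount for item, amount in usage.items()), 2)

bill = monthly_bill({"compute_hours": 720, "storage_gb": 100, "transfer_gb": 50})
# 720*0.05 + 100*0.02 + 50*0.01 = 38.5
```

Under this model an idle month costs nothing, which is exactly why initial and recurring expenses can undercut a capital-intensive data center.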
Decreased costs, reduced downtime, and less management effort are benefits that speak for themselves.

Applications of cloud computing:

The applications of cloud computing are practically limitless. With the right middleware, a cloud computing system could execute all the programs a normal computer could run. Potentially, everything from generic word-processing software to customized programs designed for a specific company could work on a cloud computing system. Why would anyone want to rely on another computer system to run programs and store data? Here are just a few reasons:

● Clients would be able to access their applications and data from anywhere at any time, using any computer linked to the Internet. Data wouldn't be confined to a hard drive on one user's computer or even to a corporation's internal network.

● It could bring hardware costs down. Cloud computing systems reduce the need for advanced hardware on the client side. You wouldn't need to buy the fastest computer with the most memory, because the cloud system would take care of those needs for you. Instead, you could buy an inexpensive terminal: a monitor, input devices such as a keyboard and mouse, and just enough processing power to run the middleware needed to connect to the cloud. You wouldn't need a large hard drive, because you'd store all your information on a remote computer.

● Corporations that rely on computers have to make sure they have the right software in place to achieve their goals. Cloud computing gives these organizations company-wide access to computer applications. The companies don't have to buy a set of software or software licenses for every employee; instead, the company pays a metered fee to a cloud computing provider.

● Servers and digital storage devices take up space.
Some companies rent physical space to store servers and databases because they don't have it available on site. Cloud computing gives these companies the option of storing data on someone else's hardware, removing the need for physical space on their own premises.

● Corporations might save money on IT support. Streamlined hardware would, in theory, have fewer problems than a network of heterogeneous machines and operating systems.

2. Cluster Computing

Clustering is a technique in which multiple computers are connected together to form a powerful computing device.

Architecture of cluster computing:

3. Grid Computing

Grid computing is a technique in which computer resources from various administrative domains are combined to achieve a common goal. At present, companies are moving their data to the cloud to enable ubiquitous, on-demand access to a shared pool of computing resources.

Architecture of grid computing:

PRACTICAL-2

AIM: To study cluster computing.

What is cluster computing?

Clustering means connecting two or more computers together in such a way that they behave like a single computer. Clustering is used for parallel processing, load balancing, and fault tolerance. It is a popular strategy for implementing parallel-processing applications because it enables companies to leverage the investment already made in PCs and workstations. In addition, it is relatively easy to add new CPUs simply by adding a new PC to the network.

Introduction to cluster computing:

Very often, applications need more computing power than a sequential computer can provide. One way of overcoming this limitation is to improve the operating speed of processors and other components so that they can offer the power required by computationally intensive applications. Even though this is currently possible to a certain extent, future improvements are constrained by the speed of light, thermodynamic laws, and the high financial cost of processor fabrication. A viable and cost-effective alternative is to connect multiple processors together and coordinate their computational efforts. The resulting systems are popularly known as parallel computers, and they allow a computational task to be shared among multiple processors.

There are three ways to improve performance:
1. Work harder,
2. Work smarter, and
3. Get help.

In terms of computing technology, working harder is like using faster hardware (high-performance processors or peripheral devices). Working smarter concerns doing things more efficiently, and this revolves around the algorithms and techniques used to solve computational tasks. Finally, getting help refers to using multiple computers to solve a particular task.

Cluster computing architecture:

A cluster is a type of parallel or distributed processing system consisting of a collection of interconnected stand-alone computers working together as a single, integrated computing resource. A computer node can be a single- or multi-processor system (a PC, workstation, or SMP) with memory, I/O facilities, and an operating system. A cluster generally refers to two or more computers (nodes) connected together. The nodes can exist in a single cabinet or be physically separated and connected via a LAN.
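The "get help" approach, sharing one computational task among several workers, can be sketched in a few lines. Here Python threads stand in for cluster nodes, and the function names are illustrative only:

```python
# Minimal sketch of "getting help": split one task across several workers
# and combine the partial results. On a real cluster each chunk would run
# on a separate machine; here worker threads play that role.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    """The piece of work one node performs."""
    return sum(chunk)

def cluster_sum(data, nodes=4):
    """Divide the data among 'nodes' workers and merge their answers."""
    size = max(1, len(data) // nodes)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        return sum(pool.map(partial_sum, chunks))

total = cluster_sum(list(range(1, 101)))  # → 5050
```

The same divide-and-merge pattern underlies real message-passing programs on clusters, where the merge step is a communication operation rather than a local `sum`.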
An interconnected (LAN-based) cluster of computers can appear as a single system to users and applications. Such a system can provide a cost-effective way to gain features and benefits (fast and reliable services) that have historically been found only on more expensive proprietary shared-memory systems. The typical architecture of a cluster is shown in the figure. The following are some prominent components of cluster computers:

1. Multiple high-performance computers (PCs, workstations, or SMPs)
2. State-of-the-art operating systems (layered or microkernel based)
3. High-performance networks/switches (such as Gigabit Ethernet and Myrinet)
4. Network interface cards (NICs)
5. Fast communication protocols and services (such as Active and Fast Messages)
6. Cluster middleware (Single System Image (SSI) and system availability infrastructure)

The network interface hardware acts as a communication processor and is responsible for transmitting and receiving packets of data between cluster nodes via a network/switch. Communication software offers a means of fast and reliable data communication among cluster nodes and to the outside world. Often, clusters with a special network/switch like Myrinet use communication protocols such as active messages for fast communication among their nodes. These can potentially bypass the operating system, removing the critical communication overheads by providing direct user-level access to the network interface.

The cluster nodes can work collectively, as an integrated computing resource, or they can operate as individual computers. The cluster middleware is responsible for offering the illusion of a unified system image (single system image) and availability out of a collection of independent but interconnected computers.

Programming environments can offer portable, efficient, and easy-to-use tools for the development of applications.
They include message-passing libraries, debuggers, and profilers. It should not be forgotten that clusters can be used for the execution of sequential as well as parallel applications.

Applications of cluster computing (parallel processing):

Clusters have been employed as an execution platform for a range of application classes, from supercomputing and mission-critical applications through e-commerce and database-based ones.

Clusters are used as execution environments for Grand Challenge Applications (GCAs) such as weather modeling, automobile crash simulations, life sciences, computational fluid dynamics, nuclear simulations, image processing, electromagnetics, data mining, aerodynamics, and astrophysics. These applications are generally considered intractable without state-of-the-art parallel supercomputers. The scale of their resource requirements, such as processing time, memory, and communication needs, distinguishes GCAs from other applications. For example, the execution of scientific applications used in predicting life-threatening situations such as earthquakes or hurricanes requires enormous computational power and storage resources. In the past, these applications would run on vector or parallel supercomputers costing millions of dollars in order to calculate predictions well in advance of the actual events. Such applications can be migrated to commodity off-the-shelf clusters and deliver comparable performance at a much lower cost. In fact, in many situations, expensive parallel supercomputers have been replaced by low-cost commodity Linux clusters in order to reduce maintenance costs and increase overall computational resources.

Clusters are increasingly being used for running commercial applications. In a business environment, for example in a bank, many activities are automated. However, a problem arises if the server handling customer transactions fails.
The bank's activities could come to a halt, and customers would not be able to deposit or withdraw money from their accounts. Such situations can cause a great deal of inconvenience and result in loss of business and of confidence in the bank. This is where clusters can be useful. A bank could continue to operate even after the failure of a server by automatically isolating failed components and migrating activities to alternative resources, offering an uninterrupted service.

With the increasing popularity of the Web, computer system availability is becoming critical, especially for e-commerce applications. Clusters are used to host many Internet service sites; for example, free email sites like Hotmail, and search sites like HotBot (which uses Inktomi technologies), run on clusters. Cluster-based systems can be used to execute many Internet applications:
1. Web servers;
2. Search engines;
3. Email;
4. Security;
5. Proxy; and
6. Database servers.

In the commercial arena, these servers can be consolidated to create what is known as an enterprise server. The servers can be optimized, tuned, and managed for increased efficiency and responsiveness, depending on the workload, through various load-balancing techniques. A large number of low-end machines (PCs) can be clustered along with storage and applications for scalability, high availability, and performance. The leading companies building such systems are Compaq, Hewlett-Packard, IBM, Microsoft, and Sun.

Advantages:
• High performance
• Large capacity
• High availability
• Incremental growth

Disadvantages:
• Complexity

PRACTICAL-3

AIM: To study grid computing.

Introduction:

Definition - What does grid computing mean? Grid computing is a processing architecture that combines computer resources from various domains to reach a main objective.
In grid computing, the computers on the network can work on a task together, functioning as a supercomputer. A grid works on various tasks within a network, but it is also capable of working on specialized applications. It is designed to solve problems that are too big for a supercomputer while maintaining the flexibility to process numerous smaller problems. Computing grids deliver a multiuser infrastructure that accommodates the discontinuous demands of large-scale information processing.

Grid computing is a form of distributed computing based on the dynamic sharing of resources between participants, organizations, and companies, combining them to carry out intensive computing applications or to process very large amounts of data. A well-known example of grid computing in the public domain is the ongoing SETI (Search for Extraterrestrial Intelligence) @Home project, in which thousands of people share the unused processor cycles of their PCs in the vast search for signs of "rational" signals from outer space.

Grid computing means applying the resources of many computers in a network to a single problem at the same time, usually a scientific or technical problem that requires a great number of processing cycles or access to large amounts of data. It requires software that can divide and farm out pieces of a program to as many as several thousand computers. Grid computing can be thought of as distributed, large-scale cluster computing and as a form of network-distributed parallel processing.

Grid computing appears to be a promising trend for three reasons:
(1) its ability to make more cost-effective use of a given amount of computer resources;
(2) as a way to solve problems that can't be approached without an enormous amount of computing power; and
(3) because it suggests that the resources of many computers can be cooperatively, and perhaps synergistically, harnessed and managed as a collaboration toward a common objective.

Types of grid:

Computational grid: A computational grid is focused on setting aside resources specifically for computing power. In this type of grid, most of the machines are high-performance servers.

Scavenging grid: A scavenging grid is most commonly used with large numbers of desktop machines. Machines are scavenged for available CPU cycles and other resources. Owners of the desktop machines are usually given control over when their resources are available to participate in the grid.

Data grid: A data grid is responsible for housing and providing access to data across multiple organizations. Users are not concerned with where the data is located as long as they have access to it. A data grid allows them to share their data, manage it, and manage security issues such as who has access to what data.

Architecture:

Applications:

Grid computing has many application fields, and it is being used more and more systematically, for many reasons. The first is the improvement of performance and the reduction of costs due to the combining of resources. The possibility of creating virtual organizations to establish collaboration between teams with scarce and costly data and resources is another. Scientists, who use applications that require enormous resources in terms of computing or data processing, are large consumers of computational grids; one finds, for instance, many grids in particle-physics experiments. Nor are leading industries staying behind: grids are massively present in the automobile and aeronautical business, where digital simulation plays an important part. In practice, grids are very useful in crash simulations, as well as for computer-aided design.
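The "divide and farm out" model behind projects like SETI@Home can be sketched as follows. This is a toy: worker threads stand in for volunteers' scavenged PCs, and the "signal detection" test is an invented placeholder:

```python
# Illustrative SETI@Home-style scavenging grid: a coordinator divides a
# large search into work units and farms them out to idle machines, then
# merges whatever "hits" come back.
from concurrent.futures import ThreadPoolExecutor

def scan_work_unit(unit):
    """One volunteer PC scans its slice of data for 'interesting' values."""
    return [x for x in unit if x % 97 == 0]  # stand-in for signal detection

def farm_out(data, unit_size=10, volunteers=4):
    """Split the data into work units, distribute them, merge the results."""
    units = [data[i:i + unit_size] for i in range(0, len(data), unit_size)]
    hits = []
    with ThreadPoolExecutor(max_workers=volunteers) as pool:
        for result in pool.map(scan_work_unit, units):
            hits.extend(result)
    return hits

signals = farm_out(list(range(1, 300)))  # multiples of 97 below 300
```

A real grid adds what the sketch omits: work-unit tracking across administrative domains, redundancy against unreliable volunteers, and scheduling around each owner's availability.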
More recently, grids have emerged in other areas with the purpose of optimizing company business. The aim is to combine material resources across several services by reallocating them dynamically depending on performance peaks. This strategy offers considerable cost cutting thanks to better management of resources, administrative tasks, and maintenance. This last application field is of particular interest to France Telecom, as we shall see.

Advantages:
● Can solve larger, more complex problems in a shorter time
● Easier to collaborate with other organizations
● Makes better use of existing hardware

Disadvantages:
● Grid software and standards are still evolving
● Learning curve to get started
● Non-interactive job submission

Conclusion:

Grid computing provides a framework and deployment platform that enables resource sharing, access, aggregation, and management in a distributed computing environment, based on system performance, users' quality of service, and emerging open standards such as Web services. This is the era of service computing. Grid computing technologies are entering the mainstream of computing in the financial services industry and are, slowly but surely, changing the face of computing in that industry. Just as the Internet provided a means for explosive growth in information sharing, grid computing provides an infrastructure leading to explosive growth in the sharing of computational resources. This makes possible functionality that was previously unimaginable: near real-time portfolio rebalancing and scenario analysis, risk-analysis models of seemingly limitless complexity, and content distribution of hitherto unparalleled speed and efficiency.

PRACTICAL-4

AIM: Study of ClusterSim, a simulation tool for cluster computing.
Introduction:

Nowadays, clusters of workstations are widely used in academic, industrial, and commercial areas. Usually built with commodity-off-the-shelf hardware components and freeware or shareware available on the web, they are a low-cost, high-performance alternative to supercomputers. The performance analysis of different parallel job-scheduling algorithms, interconnection networks and topologies, heterogeneous nodes, and parallel jobs on real clusters requires: a long time to develop and change software; a high financial cost to acquire new hardware; a controllable and stable environment; less intrusive performance-analysis tools; etc. On the other hand, analytical modeling for the performance analysis of clusters requires too many simplifications and assumptions.

Cluster Simulation Tool (ClusterSim)

ClusterSim is a Java-based parallel discrete-event simulation tool for cluster computing. It supports visual modeling and simulation of clusters and their workloads for performance analysis. A cluster is composed of single- or multi-processor nodes, parallel job schedulers, and network topologies and technologies. A workload is represented by users that submit jobs composed of tasks described by probability distributions and their internal structure. The main features of ClusterSim are:

● It provides a graphical environment to model clusters and their workloads.
● Its source code is available and its classes are extensible, providing a mechanism to implement new job-scheduling algorithms, network topologies, etc.
● A job is represented by probability distributions and its internal structure (loop structures and CPU, I/O, and MPI (communication) instructions). Thus, any parallel algorithm model and communication pattern can be simulated.
● It supports the modeling of clusters with heterogeneous or homogeneous nodes.
● Simulation entities (architectures and users) are independent threads, providing parallelism.
● Most of the collective and point-to-point MPI (Message Passing Interface) functions are supported.
● A network is represented by its topology (bus, switch, etc.), latency, bandwidth, protocol overhead, error rate, and maximum segment size.
● It supports different parallel job-scheduling algorithms (space sharing, gang scheduling, etc.) and node-scheduling algorithms (first-come-first-served (FCFS), etc.).
● It provides a statistical and performance module that calculates several metrics (mean node utilization, mean simulation time, mean job response time, etc.).
● It supports several probability distributions (normal, exponential, Erlang, hyper-exponential, uniform, etc.) to represent the parallelism degree of the jobs and the inter-arrival time between job submissions. Simulation time and seed can be specified.

Architecture of ClusterSim

The architecture of ClusterSim is divided into three layers: the graphical environment, the entities, and the simulation core. The first layer allows the modeling and simulation of clusters and their workloads by means of a graphical environment; moreover, it provides statistical and performance data about each simulation. The second layer is composed of three entities: user, cluster, and node.

Graphical Environment

The graphical environment was implemented using Java Swing and the NetBeans 3.4.1 compiler. It is composed of a configuration and simulation-execution interface, three workload editors (user, job, and task editors), and three architecture editors (cluster, node, and processor editors). Using these tools, it is possible to model, execute, save, and modify simulation environments and experiments.

Like the ClusterSim editors, the simulation model is divided between workload and architecture.
Based on related work, we chose a hybrid workload model, using probability distributions to represent some parameters (parallelism degree and inter-arrival time) together with a description of the internal structure of the jobs. Execution time as a parameter, despite being found in execution logs, is valid only for a particular workload and architecture; moreover, it is influenced by many factors, such as load, the nodes' processing power, network overhead, etc. Thus, the execution time of a job must be calculated during a simulation, according to the simulated workload and architecture. To avoid long execution traces, the jobs' inter-arrival time is also represented by a probability distribution; exponential and Erlang hyper-exponential distributions are widely used in the academic community to represent job inter-arrival times.

Statistical and Performance Module

For each executed simulation, the statistical and performance module of ClusterSim creates a log with the calculation of several metrics. The main calculated metrics are: mean job and task response time; wait, submission, start, and end time of each task; mean job slowdown; mean node utilization; mean job reaction time; and others.

ClusterSim's Entities

Each entity has specific functions in a simulation environment. The user entity is responsible for submitting a certain number of jobs to the cluster, following a pattern of arrival intervals. Each job type has a specific probability of being submitted to the cluster entity. The submission is made through the generation of a job-arrival event. When the cluster receives this event, it decides to which nodes the tasks of the job should be directed; for this, there is a job-management-system scheduler that implements certain parallel job-scheduling algorithms. Other important classes belonging to the cluster entity are the MPI manager, the single system image, and the network.
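The discrete-event mechanics described here (job-arrival events drawn from an exponential inter-arrival distribution, then handled in time order from an event queue) can be sketched in a few lines of Python. This is a toy single-server model with invented parameters, not ClusterSim's actual implementation:

```python
# Toy discrete-event simulation: jobs arrive with exponential inter-arrival
# times and are served one at a time; we record each job's response time.
import heapq
import random

def simulate(num_jobs, arrival_rate=1.0, service_time=0.5, seed=42):
    rng = random.Random(seed)        # the seed can be specified, as in ClusterSim
    events = []                      # priority queue ordered by event time
    t = 0.0
    for job in range(num_jobs):      # generate job-arrival events
        t += rng.expovariate(arrival_rate)
        heapq.heappush(events, (t, "arrival", job))
    server_free_at = 0.0
    response_times = []
    while events:                    # process events in time order
        time, kind, job = heapq.heappop(events)
        start = max(time, server_free_at)    # wait if the server is busy
        server_free_at = start + service_time
        response_times.append(server_free_at - time)
    return sum(response_times) / len(response_times)  # mean job response time

mean_response = simulate(100)
```

ClusterSim's real event queue carries many more event kinds (end of quantum, end of task, end of time slice) and many nodes, but the advance-clock-to-next-event loop is the same idea.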
The single system image works as the operating system of the cluster, receiving and directing events to the classes responsible for their treatment. It also periodically generates the end-of-time-slice event to indicate to the node schedulers that another time slice has ended. The figure shows the event-exchange diagram of ClusterSim, detailing the interaction among the user, cluster, and node entities (some classes are omitted to simplify the diagram).

A cluster entity is composed of several node entities. On receiving a job-arrival event, the node entity, by means of its node-scheduler class, puts the tasks destined for it into a queue. On each clock tick, the scheduler is called to execute tasks on the processors of the node. As each task is composed of CPU, I/O, and MPI instructions, an event is generated at the end of each of these macro-instructions. A quantum is attributed to each task. When a task's quantum finishes, the processor generates an end-of-quantum event so that the node scheduler can execute the necessary actions (change priorities, remove the task from the head of the queue, etc.). When the processor has executed all the instructions of a task, an end-of-task event is generated for the node scheduler.

ClusterSim's Core

The core is built on JSDESLib (Java Simple Discrete Event Simulation Library), a multithreaded discrete-event simulation library in Java, developed by our group, whose main objective is to simplify the development of discrete-event simulation tools.

Verification and Validation of ClusterSim

To verify and test ClusterSim, we simulated a simple workload composed of two jobs and compared the results with an analytical analysis or manual execution of the same workload.
In the figure, each graph represents a job, where the nodes are the tasks and the edges indicate exchanges of messages between the tasks. The value on each node and edge indicates the time spent, in seconds, on processing (CPU instructions) and communication. For example, Job 2 represents a farm of processes or tasks, in which the master task sends data to the slaves; they process the data and return the results to the master process. As ClusterSim does not use execution time as an input parameter, the execution time of the jobs was converted into CPU instructions and bytes sent.

Simulation Results

To demonstrate the use of ClusterSim, we modeled, simulated, and analyzed a case study composed of 12 workloads and 12 clusters. Due to the limited number of pages, we show the analysis based on only one metric: mean node utilization.

Simulation Setup

1) Clusters: The clusters are composed of 16 nodes and a front-end node interconnected by a Fast Ethernet switch. Each node has a Pentium III 1 GHz (0.938 GHz real frequency) processor. In Table 4, we show the main values of the clusters' features and their respective values, obtained from benchmarks and performance libraries (Sandra 2003, PAPI 2.3, etc.).

2) Workloads: In ClusterSim, a workload is composed of a set of jobs represented by their types, internal structures, submission probabilities, and inter-arrival time distributions. Due to the lack of information about the internal structure of the jobs, we decided to create a synthetic set of jobs. In the workload jobs, at each iteration the master task sends a different message to each slave task; in turn, the slaves process a certain number of instructions, according to the previously defined granularity, and then return a message to the master task.
The total number of instructions to be processed by a job and the size of the messages are divided equally among the slave tasks. With regard to the degree of parallelism, which is represented by a probability distribution, we considered jobs with between 1 and 4 tasks as having a low degree of parallelism and jobs with between 5 and 16 tasks as having a high degree. As usual, we used a uniform distribution to represent the degree of parallelism. Combining the parallelism level, number of instructions and granularity characteristics, we obtained 8 different basic job types.
Results Presentation and Analysis
In this section, we present and analyze the performance of the clusters and their gang scheduling algorithms. To analyze them, we compare clusters in which one gang scheduling component is varied while the others are fixed. In the figure, we present the mean node utilization for all workloads and clusters. Considering the packing schemes, when the multiprogramming level is unlimited, first fit is better for HL and LH workloads. Initially, the best fit scheme finds the best slot for a job, but in the long term this decision may prevent new jobs from entering more appropriate slots. In the case of HL and LH workloads, this chance increases, because the long jobs (with a low degree of parallelism) that remain after the execution of the short jobs (with a high degree of parallelism) will probably occupy columns in common, making it harder to defragment the scheduling matrix. On the other hand, first fit initially leaves the matrix more fragmented and increases the multiprogramming level, but in the long term it makes the matrix easier to defragment, because the jobs have fewer columns in common. In the other cases, the best fit scheme presents slightly better performance. In general, both packing schemes perform equivalently.
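The difference between the two packing schemes discussed above can be sketched on a simplified scheduling matrix (rows are time slices, columns are processors). This is an illustrative Python sketch of the general first-fit/best-fit idea, not ClusterSim's actual implementation; the matrix size and job names are made up:

```python
N_PROCS = 8  # processors per time slice (illustration only)

def free_slots(row):
    """Number of free processor slots (None) in a time slice."""
    return sum(1 for s in row if s is None)

def place(matrix, job, k, scheme):
    """Place a job needing k processors into a time slice.
    first_fit takes the first slice with enough room; best_fit takes the
    slice with the fewest free slots that still fits the job."""
    candidates = [i for i, row in enumerate(matrix) if free_slots(row) >= k]
    if not candidates:
        return None
    if scheme == "best_fit":
        row_i = min(candidates, key=lambda i: free_slots(matrix[i]))
    else:  # first_fit
        row_i = candidates[0]
    placed = 0
    for j in range(N_PROCS):
        if placed < k and matrix[row_i][j] is None:
            matrix[row_i][j] = job
            placed += 1
    return row_i

m1 = [[None] * 8, ["A"] * 6 + [None] * 2]
m2 = [[None] * 8, ["A"] * 6 + [None] * 2]
r_first = place(m1, "B", 2, "first_fit")  # first slice with room
r_best = place(m2, "B", 2, "best_fit")    # tightest slice that fits
```

For a 2-task job, first fit picks the empty slice (leaving the matrix more fragmented but keeping options open), while best fit squeezes the job into the nearly full slice, matching the trade-off analyzed in the text.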
Conclusions
In this paper, we proposed, implemented, verified, validated and analyzed the simulation tool ClusterSim. It has a graphical environment that facilitates the modeling and creation of clusters and workloads (parallel jobs and users) in order to analyze their performance by means of simulation. Its hybrid workload model (a probabilistic model plus a structural description) allows the representation of real parallel jobs (instructions, loops, etc.) and makes the simulation more deterministic than a purely probabilistic model. The verification and validation of ClusterSim by means of manual execution and experimental tests showed that it provides mechanisms to repeat and modify the parameters of real experiments in a controllable and trustworthy environment. As shown in our case study, we can create synthetic workloads and evaluate the performance of different cluster configurations. Built in Java and with its source code available, the classes of ClusterSim can be extended, allowing the creation of new network topologies, parallel job scheduling algorithms, and so on. The main contributions of this paper are the definition, proposal, implementation, verification, validation and analysis of ClusterSim. Its main features are a hybrid workload model, a graphical environment, the modeling of heterogeneous clusters, and a statistical and performance module. As future work, we highlight: a network topology editor, support for distributed simulation, simulation of grid architectures, and the generation of statistical and performance graphs.
PRACTICAL-5
AIM: Create an Amazon server for storage and online processing.
First of all, create an account with Amazon. The steps are as follows:
Step 1: Open the website.
Step 2: Click 'Create an AWS Account'.
Step 3: If you are not registered, first register and then log in.
Step 4: Sign up for the AWS services and enter the details of your credit/debit card.
Step 5: Verify the details and create the account.
Amazon services:
The Amazon services are flexible, cost-effective, scalable, elastic and secure. Amazon provides two main cloud services:
I. Amazon EC2
II. Amazon S3
Amazon EC2: Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable compute capacity in the cloud. It is designed to make web-scale cloud computing easier for developers. Amazon EC2's simple web service interface allows you to obtain and configure capacity with minimal friction. It provides you with complete control of your computing resources and lets you run on Amazon's proven computing environment. Amazon EC2 reduces the time required to obtain and boot new server instances to minutes, allowing you to quickly scale capacity, both up and down, as your computing requirements change. Amazon EC2 changes the economics of computing by allowing you to pay only for the capacity that you actually use. Amazon EC2 provides developers the tools to build failure-resilient applications and to isolate themselves from common failure scenarios.
Benefits
I. Elastic Web-Scale Computing: Amazon EC2 enables you to increase or decrease capacity within minutes, not hours or days. You can commission one, hundreds, or even thousands of server instances simultaneously. Because this is all controlled with web service APIs, your application can automatically scale itself up and down depending on its needs.
II. Completely Controlled: You have complete control of your instances. You have root access to each one, and you can interact with them as you would with any machine.
You can stop your instance while retaining the data on your boot partition and then subsequently restart the same instance using web service APIs. Instances can also be rebooted remotely using web service APIs, and you have access to the console output of your instances.
III. Flexible Cloud Hosting Services: You have the choice of multiple instance types, operating systems, and software packages. Amazon EC2 allows you to select a configuration of memory, CPU, instance storage, and boot partition size that is optimal for your choice of operating system and application. For example, your choice of operating systems includes numerous Linux distributions and Microsoft Windows Server.
IV. Designed for Use with Other Amazon Web Services: Amazon EC2 works in conjunction with Amazon Simple Storage Service (Amazon S3), Amazon Relational Database Service (Amazon RDS), Amazon SimpleDB and Amazon Simple Queue Service (Amazon SQS) to provide a complete solution for computing, query processing and storage across a wide range of applications.
V. Reliable: Amazon EC2 offers a highly reliable environment where replacement instances can be rapidly and predictably commissioned. The service runs within Amazon's proven network infrastructure and data centers. The Amazon EC2 Service Level Agreement commits to 99.95% availability for each Amazon EC2 Region.
VI. Secure: Amazon EC2 works in conjunction with Amazon VPC to provide security and robust networking functionality for your compute resources. Your compute instances are located in a Virtual Private Cloud (VPC) with an IP range that you specify. You decide which instances are exposed to the Internet and which remain private. Security groups and network ACLs allow you to control inbound and outbound network access to and from your instances.
You can connect your existing IT infrastructure to resources in your VPC using industry-standard encrypted IPsec VPN connections, and you can provision your EC2 resources as Dedicated Instances, which run on hardware dedicated to a single customer for additional isolation. If you do not have a default VPC, you must create a VPC and launch instances into it to leverage advanced networking features such as private subnets, outbound security group filtering, network ACLs, Dedicated Instances, and VPN connections.
VII. Inexpensive: Amazon EC2 passes on to you the financial benefits of Amazon's scale. You pay a very low rate for the compute capacity you actually consume. There are three purchasing options: On-Demand Instances, Reserved Instances and Spot Instances; see Amazon EC2 Instance Purchasing Options for a more detailed description.
Amazon S3: Amazon Simple Storage Service (Amazon S3) provides developers and IT teams with secure, durable, highly scalable object storage. Amazon S3 is easy to use, with a simple web service interface to store and retrieve any amount of data from anywhere on the web. With Amazon S3, you pay only for the storage you actually use; there is no minimum fee and no setup cost. Amazon S3 offers a range of storage classes designed for different use cases, including Amazon S3 Standard for general-purpose storage of frequently accessed data, Amazon S3 Standard - Infrequent Access (Standard - IA) for long-lived but less frequently accessed data, and Amazon Glacier for long-term archive. Amazon S3 also offers configurable lifecycle policies for managing your data throughout its lifecycle. Once a policy is set, your data will automatically migrate to the most appropriate storage class without any changes to your applications.
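The lifecycle-based migration described above can be illustrated with a small rule evaluator. The day thresholds below are made-up illustration values chosen for this sketch, not AWS defaults; a real lifecycle policy is configured per bucket, not in application code:

```python
# Illustrative lifecycle rules: object age in days -> storage class.
# Threshold values are hypothetical, chosen only for this example.
LIFECYCLE_RULES = [
    (30, "STANDARD"),      # newer than 30 days: general-purpose storage
    (90, "STANDARD_IA"),   # 30-90 days old: infrequent access tier
]
ARCHIVE_CLASS = "GLACIER"  # older than every threshold: long-term archive

def storage_class(age_days):
    """Return the storage class an object of this age would sit in."""
    for limit, cls in LIFECYCLE_RULES:
        if age_days < limit:
            return cls
    return ARCHIVE_CLASS

classes = [storage_class(d) for d in (10, 45, 400)]
```

A 10-day-old object stays in Standard, a 45-day-old object has migrated to Standard - IA, and a 400-day-old object has been archived to Glacier, with no change to the application reading them.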
Amazon S3 can be used alone or together with other AWS services such as Amazon Elastic Compute Cloud (Amazon EC2) and AWS Identity and Access Management (IAM), as well as third-party storage repositories and gateways. Amazon S3 provides cost-effective object storage for a wide variety of use cases including cloud applications, content distribution, backup and archiving, disaster recovery, and big data analytics.
Benefits
I. Durable: Amazon S3 provides durable infrastructure to store important data and is designed for 99.999999999% durability of objects. Your data is redundantly stored across multiple facilities and multiple devices in each facility.
II. Low Cost: Amazon S3 allows you to store large amounts of data at a very low cost. Using lifecycle management, you can set policies to automatically migrate your data to Standard - Infrequent Access and Amazon Glacier as it ages, to further reduce costs. You pay for what you need, with no minimum commitments or up-front fees.
III. Available: Amazon S3 Standard is designed for up to 99.99% availability of objects over a given year and is backed by the Amazon S3 Service Level Agreement, ensuring that you can rely on it when needed. You can also choose an AWS region to optimize for latency, minimize costs, or address regulatory requirements.
IV. Secure: Amazon S3 supports data transfer over SSL and automatic encryption of your data once it is uploaded. You can also configure bucket policies to manage object permissions and control access to your data using AWS Identity and Access Management (IAM).
V. Scalable: With Amazon S3, you can store as much data as you want and access it when needed. You can stop guessing your future storage needs and scale up and down as required, dramatically increasing business agility.
VI. Send Event Notifications: Amazon S3 can send event notifications when objects are uploaded to Amazon S3.
Amazon S3 event notifications can be delivered using Amazon SQS or Amazon SNS, or sent directly to AWS Lambda, enabling you to trigger workflows, alerts, or other processing. For example, you could use Amazon S3 event notifications to trigger transcoding of media files when they are uploaded, processing of data files when they become available, or synchronization of Amazon S3 objects with other data stores.
VII. High Performance: Amazon S3 supports multipart uploads to help maximize network throughput and resiliency, and lets you choose the AWS region that stores your data close to the end user to minimize network latency. Amazon S3 is also integrated with Amazon CloudFront, a content delivery web service that distributes content to end users with low latency, high data transfer speeds, and no minimum usage commitments.
VIII. Integrated: Amazon S3 is integrated with other AWS services to simplify uploading and downloading data and to make it easier to build solutions that use a range of AWS services. Amazon S3 integrations include Amazon CloudFront, Amazon CloudWatch, Amazon Kinesis, Amazon RDS, Amazon Glacier, Amazon EBS, Amazon DynamoDB, Amazon Redshift, Amazon Route 53, Amazon EMR, Amazon VPC, Amazon KMS, and AWS Lambda.
IX. Easy to Use: Amazon S3 is easy to use, with a web-based management console, a mobile app, and full REST APIs and SDKs for easy integration with third-party technologies.
PRACTICAL-6
AIM: Install Eucalyptus in a virtual machine using VMware.
Launch VMware Workstation and follow the simple steps shown below:
Select the "Create New Virtual Machine" option. This brings up a dialog box; select "Typical" and hit Next to continue.
Browse to the downloaded "Eucalyptus Faststart ISO" image and click Next when done.
Next, select "Linux" as the Guest Operating System and "CentOS 64-bit" as the version; this Faststart ISO is based on CentOS 6.4 and is a 64-bit ISO. Provide a suitable name for your virtual machine. You can optionally browse to the location where you want to save your VM files. Provide a disk size of at least 100 GB and choose "store the virtual disk as a single file". Click Next to continue. In the next dialog, hit "Customize Hardware" to increase the RAM and CPU for your VM. You can optionally remove unwanted devices such as the printer and USB controller if you don't require them. Click OK when done. You are now ready to power on your Frontend Controller VM. Once you power on the VM, a boot screen appears with a few options; select the "Install CentOS 6 with Eucalyptus Frontend" option. You will be prompted to run a disk check utility; skip it for now. You will now be walked through a simple step-by-step installer. Click Next to begin. Select your appropriate language and click Next to continue. Select the appropriate keyboard for your system and click Next. You will be prompted to format your current disk; select "Yes, discard any data". In the next prompt, provide a suitable hostname for your controller (in this case, eucafrontend) and fill in the static IP details for your VM. NOTE: It is not recommended to use a DHCP network for any of the Eucalyptus components. Always provide static IPs.
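For reference, a static network configuration on a CentOS 6 node such as this one ends up in the interface's network script. The file below is an example only; the addresses shown are placeholder values and must be replaced with those for your own network:

```ini
# /etc/sysconfig/network-scripts/ifcfg-eth0 (example values only)
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.1.50
NETMASK=255.255.255.0
GATEWAY=192.168.1.1
DNS1=192.168.1.1
```

The installer writes an equivalent configuration for you when you fill in the static IP details; this is just what the result looks like on disk.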
Select your nearest city for the timezone settings and click Next to continue. Provide a suitable root password for your system. You may get a warning that you are using a weak password; you can ignore it and select "Use Anyway" to continue (but don't ignore it on production servers!). In the next dialog, provide the public IP range/list. This will be used as the IP range for your Eucalyptus machine instances. Then set what type of installation you want for your VM; I generally choose "Use All Space", but you can optionally provide your own partitioning if you want. Select "Write changes to disk" when prompted. This begins the installation process, which takes a couple of minutes to complete. Once your Frontend installation completes, Faststart automatically creates a Eucalyptus Machine Image (EMI), which will be used later to deploy instances in our private cloud. Once the installation completes, you will be asked to reboot your system; reboot it. After the VM reboots, a lot of configuration takes place and you will see the Eucalyptus services starting up. Next, you will be asked to configure the Frontend. Click Forward to continue, then accept the license information and click Forward. Next, you will be asked to provide the Node Controller information: provide each Node Controller's IP address, separated by spaces. When you click Forward, you will need to provide each node's root password. Finally, create a user account for your Frontend and click Forward once done.
You also need to sync the date and time over the network; you can optionally provide your own NTP server settings if your Eucalyptus Frontend is on an isolated network. And there you have it!! Your configuration is now done. Note down the User Console and Admin Console credentials before you move forward. You can now launch any browser, type in the credentials, and view the Admin and User Consoles respectively. That's it for now!! In the next tutorial we will look at the User Console and how to go about launching your very first Eucalyptus instance.
PRACTICAL-7
AIM: To study various applications of IaaS, PaaS and SaaS.
SaaS: Software as a Service
Cloud application services, or Software as a Service (SaaS), represent the largest cloud market and are still growing quickly. SaaS uses the web to deliver applications that are managed by a third-party vendor and whose interface is accessed on the client side. Most SaaS applications can be run directly from a web browser without any downloads or installations required, although some require plugins. Because of the web delivery model, SaaS eliminates the need to install and run applications on individual computers. With SaaS, it's easy for enterprises to streamline their maintenance and support, because everything can be managed by the vendor: applications, runtime, data, middleware, OSes, virtualization, servers, storage and networking. Popular SaaS offering types include email and collaboration, customer relationship management, and healthcare-related applications. Some large enterprises that are not traditionally thought of as software vendors have started building SaaS as an additional source of revenue in order to gain a competitive advantage.
SaaS Examples: Google Apps, Salesforce, Workday, Concur, Citrix GoToMeeting, Cisco WebEx.
Common SaaS Use Case: Replaces traditional on-device software.
Technology Analyst Examples: Bill Pray (Gartner), Amy DeMartine (Forrester).
PaaS: Platform as a Service
Cloud platform services, or Platform as a Service (PaaS), are used for application development and deployment, providing cloud components to software. What developers gain with PaaS is a framework they can build upon to develop or customize applications. PaaS makes the development, testing, and deployment of applications quick, simple, and cost-effective. With this technology, enterprise operations or a third-party provider can manage OSes, virtualization, servers, storage, networking, and the PaaS software itself; developers, however, manage the applications. Enterprise PaaS provides line-of-business software developers a self-service portal for managing computing infrastructure from centralized IT operations and the platforms that are installed on top of the hardware. Enterprise PaaS can be delivered through a hybrid model that uses both public IaaS and on-premise infrastructure, or as a pure private PaaS that uses only the latter. Similar to the way you might create macros in Excel, PaaS allows you to create applications using software components that are built into the PaaS (middleware). Applications using PaaS inherit cloud characteristics such as scalability, high availability, multi-tenancy, SaaS enablement, and more. Enterprises benefit from PaaS because it reduces the amount of coding necessary, automates business policy, and helps migrate apps to a hybrid model. For the needs of enterprises and other organizations, Apprenda is one provider of a private cloud PaaS for .NET and Java.
Enterprise PaaS Examples: Apprenda.
Common PaaS Use Case: Increases developer productivity and utilization rates while also decreasing an application's time to market.
Technology Analyst Examples: Richard Watson (Gartner), Eric Knipp (Gartner), Yefim Natis (Gartner), Stefan Ried (Forrester), John Rymer (Forrester).
IaaS: Infrastructure as a Service
Cloud infrastructure services, known as Infrastructure as a Service (IaaS), are self-service models for accessing, monitoring, and managing remote datacenter infrastructure, such as compute (virtualized or bare metal), storage, networking, and networking services (e.g. firewalls). Instead of having to purchase hardware outright, users can purchase IaaS based on consumption, similar to electricity or other utility billing. Compared to SaaS and PaaS, IaaS users are responsible for managing applications, data, runtime, middleware, and OSes; providers still manage virtualization, servers, hard drives, storage, and networking. Many IaaS providers now offer databases, messaging queues, and other services above the virtualization layer as well; some tech analysts draw a distinction here and use the IaaS+ moniker for these options. What users gain with IaaS is infrastructure on top of which they can install any required platform, and they are responsible for updating these platforms when new versions are released.
IaaS Examples: Amazon Web Services (AWS), Microsoft Azure, Google Compute Engine (GCE), Joyent.
Common IaaS Use Case: Extends current data center infrastructure for temporary workloads (e.g. increased Christmas holiday site traffic).
Technology Analyst Examples: Kyle Hilgendorf (Gartner), Drue Reeves (Gartner), Lydia Leong (Gartner), Doug Toombs (Gartner), Gregor Petri (Gartner EU), Tiny Haynes (Gartner EU), Jeffery Hammond (Forrester), James Staten (Forrester).
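The division of management responsibility described across the SaaS, PaaS and IaaS sections can be summarized in a small lookup. The layer split below follows the text above (SaaS: vendor manages everything; PaaS: provider manages from the OS down; IaaS: provider manages from virtualization down); treat it as a study aid rather than an official definition:

```python
# Stack layers from most application-facing to most hardware-facing,
# as listed in the SaaS/PaaS/IaaS descriptions above.
LAYERS = ["applications", "data", "runtime", "middleware", "os",
          "virtualization", "servers", "storage", "networking"]

# Index of the first layer the *provider* manages; the customer
# manages everything before that index.
PROVIDER_MANAGES_FROM = {"saas": 0, "paas": 4, "iaas": 5}

def user_managed(model):
    """Layers the customer manages under a given service model."""
    return LAYERS[:PROVIDER_MANAGES_FROM[model]]

iaas_user = user_managed("iaas")  # apps, data, runtime, middleware, os
saas_user = user_managed("saas")  # empty: the vendor manages it all
```

Reading the table this way makes the text's comparison concrete: moving from IaaS to PaaS to SaaS, responsibility for each successive layer shifts from the customer to the provider.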