ITEC 452 – Final Exam    Name: Daniel Lust

1. (10 points) Explain how "distributed computing" and "grid computing" relate to "cloud computing."

Distributed, grid, and cloud computing are all related in that they use multiple computers to complete a given task. However, they differ in architecture and performance. Distributed computing focuses on using several computers, each processing a portion of a task, to produce one solution. Grid computing follows the same idea but places a heavy emphasis on pooling the resources, such as the CPU and memory, of each computer to complete a task; for example, the Air Force wiring 300 PS3s together to create one supercomputer. As for cloud computing, the biggest difference is that it lets users consume various services without investing in the software. It also offers services such as word processing with real-time updates: if multiple users are viewing the same document, with the proper permissions they can edit it simultaneously. It also supports web hosting. So how are these forms of distribution related? They share an overall goal: to use computing resources economically while maximizing throughput within the time given to complete a task.

Karishma Sundaram – www.brighthub.com, http://www.brighthub.com/environment/green-computing/articles/68785.aspx

2. (10 points) Explain what is "grid computing."

Grid computing, as stated above, is a network of computers pooling their resources to perform a single task. Looking more closely, grid computing harnesses the idle cycles of a computer's processor and applies that processing power to a single job. If a computer is idle, or has room for more processes to execute, the grid uses that CPU's leftover capacity to work on the given task. Whereas in distributed computing only one resource is shared, in grid computing all resources are shared, which makes it possible to complete a task in a fraction of the time. For a task to be built around grid computing, the code needs to be parallelized rather than written in a purely serial fashion; a sketch of this split-and-combine pattern follows below.

Karishma Sundaram – www.brighthub.com, http://www.brighthub.com/environment/green-computing/articles/68785.aspx
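To illustrate that split-and-combine pattern, here is a minimal single-machine sketch in Java. It is only an analogy, not real grid middleware: the thread pool stands in for the networked machines, and the chunked number range stands in for the work units a grid scheduler would hand out. All class and variable names are mine, chosen for illustration.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class ParallelSum {
        public static void main(String[] args) throws Exception {
            // Each worker thread stands in for one machine on the grid.
            int workers = Runtime.getRuntime().availableProcessors();
            ExecutorService pool = Executors.newFixedThreadPool(workers);

            final long n = 1_000_000_000L;    // the job: sum the integers 1..n
            final long chunk = n / workers;   // split into independent work units
            List<Future<Long>> parts = new ArrayList<>();

            for (int w = 0; w < workers; w++) {
                final long start = w * chunk + 1;
                final long end = (w == workers - 1) ? n : (w + 1) * chunk;
                // Submit one work unit; a grid would ship this to a remote node.
                parts.add(pool.submit(() -> {
                    long sum = 0;
                    for (long i = start; i <= end; i++) sum += i;
                    return sum;
                }));
            }

            // Combine the partial results into the single final answer.
            long total = 0;
            for (Future<Long> part : parts) total += part.get();
            pool.shutdown();

            System.out.println("sum = " + total);  // should equal n*(n+1)/2
        }
    }

Swap the thread pool for a job scheduler and the chunks for network messages, and this is essentially how a grid divides a parallelized job among otherwise idle machines.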
3. (10 points) Explain what is "utility computing."

Utility computing is simply the "packaging of computing resources." These resources extend to storage and to services such as word processing. The term "utility" is derived from the idea of paying for household utilities such as electricity, water, natural gas, and telephone service; likewise, here the computing resources are the cost factor. This was one of the starting points that eventually drove what became cloud computing. In the same way, utility computing repackages computing services to be consumed as on-demand computing. The idea was originally proposed by John McCarthy when he spoke at the MIT Centennial in 1961: "If computers of the kind I have advocated become the computers of the future, then computing may someday be organized as a public utility just as the telephone system is a public utility... The computer utility could become the basis of a new and important industry." It wasn't until the 1990s that utility computing resurfaced. One of the companies that implemented the idea McCarthy described was PolyServe Inc. It used a cluster of computers and storage hardware to create a highly available utility for companies running mission-critical applications from vendors such as Oracle and Microsoft. Overall, utility computing offers on-demand computing resources to whoever wants to pay for them.

Various – Wikipedia, http://en.wikipedia.org/wiki/Utility_computing

4. (10 points) Explain security issues in cloud computing.

The main security issues in cloud computing concern data encryption, network security, application security, and access control. Data encryption is a concern because the data you keep in a cloud sits in a shared environment: right next to your tax returns may be someone else's online bank statements. It is therefore very important to encrypt data at rest, in transit, and at disposal. It is also very important to know what is being logged by the cloud service. Network security becomes a concern because of all the current hype surrounding the cloud, which attracts customers as well as hackers. It has been noted that a cloud service is more susceptible to denial-of-service attacks and botnets, and the cloud does not solve the problem of social engineering. The best defenses are strong network firewalls, up-to-date patches, IDS/IPS systems, and event monitoring with a SIEM or log-management product; it is also important to scan for network vulnerabilities with tools such as Nessus. Another issue is application security: it is very important that the applications in the infrastructure be well coded, to protect against XSS, SQL injection, CSRF, and session-management vulnerabilities. One last security issue is how access control is implemented. Access control can follow a MAC or DAC model, but it is important to assign users the proper set of rules when accessing a system, especially through a VPN. Alongside this are the account policies that control password length, the number of allowed password attempts, password expiration dates, and the disabling of idle accounts.

5. (10 points) Explain what "hadoop" software framework in distributed computing is.

Hadoop's relationship to distributed computing lies in HDFS, the Hadoop Distributed File System, which serves as the framework's distributed storage. It runs on Java, so it can be used on any machine, and it is highly fault-tolerant. During distribution, HDFS stores large files across the machine cluster; it not only spreads the data over the cluster but also creates replicas of it for fault tolerance. With this in mind, it is possible to control the replication factor and the block size distributed to the computers (see the sketch below). Upon execution, Hadoop runs various processes to monitor performance and to keep track of errors. When computations are run on different nodes, the results are returned to HDFS, where they are combined. When Hadoop is executed from a remote location against a large cluster, HDFS also helps steer execution toward the set of computers with the closest access to the requested data, which reduces network traffic and congestion. So the distributed aspect of Hadoop is how it takes a cluster of computers, distributes data across them, and runs computations on them while returning results to the main computer.

HDFS Architecture Guide – hadoop.apache.org, http://hadoop.apache.org/common/docs/current/hdfs_design.html
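As a concrete sketch of controlling the replication factor and block size, here is a minimal example against Hadoop's Java FileSystem API. The namenode address and file path are made-up placeholders; in a real cluster the address would come from core-site.xml rather than being set in code.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWriteSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Placeholder address; normally picked up from core-site.xml.
            conf.set("fs.defaultFS", "hdfs://namenode:9000");

            FileSystem fs = FileSystem.get(conf);
            Path file = new Path("/user/demo/sample.txt");  // hypothetical path

            short replication = 3;               // keep three replicas for fault tolerance
            long blockSize = 64L * 1024 * 1024;  // split the file into 64 MB blocks
            int bufferSize = 4096;

            // create(path, overwrite, bufferSize, replication, blockSize)
            FSDataOutputStream out = fs.create(file, true, bufferSize, replication, blockSize);
            out.writeUTF("hello, HDFS");
            out.close();

            fs.close();
        }
    }

The same two knobs can also be set cluster-wide in hdfs-site.xml (for example, dfs.replication for the default replication factor) instead of per file in code.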
6. (10 points) Explain what "MapReduce" software framework in distributed computing is.

MapReduce is a software framework built by Google to organize huge amounts of data. With a growing amount of data and information being stored online, it is important to keep it all organized so it can be reached efficiently. During execution, in the map phase, the master node takes the input, splits it into packets, and sends them out to worker nodes; these nodes organize the information and send it back to the master. The next step is to reduce: this phase takes all the output produced by the nodes and combines it into a single result file. MapReduce was also integrated into Hadoop, not only to organize data but to provide a programming layer on top of it; programs can be written in C, Python, etc. MapReduce is also used in Hadoop to organize the replicated data packets. A minimal sketch of the map and reduce steps appears after question 7 below.

Mapreduce.org – http://www.mapreduce.org/what-is-mapreduce.php

7. Explain what each of the following four companies offers in their cloud computing services.
a. (10 points) IBM
b. (10 points) Google
c. (10 points) Amazon
d. (10 points) Rackspace
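Returning to question 6: here is a minimal sketch of the map and reduce steps, using the classic word-count example written against Hadoop's standard Java MapReduce API. The mapper turns each chunk of input into (word, 1) pairs; the framework groups the pairs by word; the reducer sums each word's counts into the single combined result. The class and variable names are mine, but Mapper and Reducer are Hadoop's real base classes.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class WordCount {

        // MAP: each worker node turns its chunk of input lines into (word, 1) pairs.
        public static class TokenizerMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            public void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);  // emit one count per word occurrence
                }
            }
        }

        // REDUCE: the pairs arrive grouped by word; sum the counts for each word.
        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);   // one (word, total) line in the output
            }
        }
    }

A driver class would then configure a Job with these two classes and the input/output paths; Hadoop handles splitting the input, shipping the map tasks to the nodes that hold the data, and collecting the reduced output into the final result.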