ITEC 452 – Final Exam
Name: Daniel Lust
1. (10 points) Explain how “distributed computing” and “grid computing” relate to “cloud
computing.”
Distributed, grid, and cloud computing are related because each uses multiple computers
to complete a given task. They differ, however, in architecture and in what they
emphasize. Distributed computing focuses on having several computers each process a
portion of a task and then combining the pieces into one solution. Grid computing follows
the same idea but places a strong emphasis on pooling the resources of each computer,
such as CPU and memory, to complete the task, e.g., the U.S. Air Force linking about 300
PS3s into one supercomputer. Cloud computing differs most in that it lets users consume
various services without investing in the software themselves. Examples include word
processing with real-time updates, where multiple users with the proper permissions can
edit the same document simultaneously, and web hosting. So how are these forms of
computing related? All three aim to use computing resources economically while
maximizing throughput and minimizing the time needed to complete a task.
Karishma Sundaram – www.brighthub.com
http://www.brighthub.com/environment/green-computing/articles/68785.aspx
2. (10 points) Explain what is “grid computing.”
Grid computing, as stated above, is a network of computers pooling their resources to
perform a single task. More specifically, it harnesses the idle capacity of each
computer and applies that processing power to one job. If a computer is idle, or has
room to run more processes, the grid uses that CPU’s spare capacity to work on the given
task. Compared with distributed computing, where only one resource is shared, in grid
computing all resources are shared. This can substantially reduce the time needed to
complete a task. For a task to run on a grid, its code needs to be parallelized, that
is, divided into independent pieces that different computers can work on at the same
time.
Karishma Sundaram – www.brighthub.com
http://www.brighthub.com/environment/green-computing/articles/68785.aspx
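The parallelization requirement can be illustrated with a minimal sketch. It assumes a task that splits cleanly into independent chunks (here, summing squares); the function names are hypothetical, and threads on one machine stand in for the separate computers of a real grid.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, num_workers):
    """Divide the work into roughly equal, independent chunks."""
    size = max(1, -(-len(items) // num_workers))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def sum_of_squares(chunk):
    """The unit of work each (hypothetical) grid node performs on its chunk."""
    return sum(x * x for x in chunk)


def run_on_grid(items, num_workers=4):
    """Farm the chunks out to workers, then combine the partial results."""
    chunks = split_into_chunks(items, num_workers)
    # Assumption: threads stand in for separate machines in this sketch.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_results = list(pool.map(sum_of_squares, chunks))
    return sum(partial_results)


print(run_on_grid(list(range(10)), num_workers=3))  # prints 285
```

Because no chunk depends on another, the chunks can be processed in any order and on any machine, which is what makes a task suitable for a grid.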
3. (10 points) Explain what is “utility computing.”
Utility computing is simply the “packaging of computing resources”. These computing
resources include storage and services such as word processing. The term “utility” comes
from the idea of paying for household utilities such as electricity, water, natural gas,
and telephone service. Likewise, customers pay for computing resources according to how
much they use. This was one of the starting points that eventually led to cloud
computing. Utility computing thus repackages computing services for use as on-demand
computing. The idea was originally proposed by John McCarthy when he spoke at the MIT
Centennial in 1961:
“If computers of the kind I have advocated become the computers of the future, then
computing may someday be organized as a public utility just as the telephone system is a
public utility... The computer utility could become the basis of a new and important
industry.”
It wasn’t until the 1990s that utility computing resurfaced. One company that put
McCarthy’s idea into practice was PolyServe Inc. It used a cluster of computers and
storage hardware to create a highly available utility for companies running
mission-critical applications such as Oracle and Microsoft SQL Server databases.
Overall, utility computing offers on-demand computing resources to whoever wants to pay
for them.
Various – Wikipedia.org
http://en.wikipedia.org/wiki/Utility_computing
4. (10 points) Explain security issues in cloud computing.
Key security issues in cloud computing include data encryption, network security,
application security, and access control.
Data encryption is a concern because the data you keep in a cloud lives in a shared
environment. Next to your tax-return files might be someone else’s online bank
statements. It is therefore very important that data be encrypted at rest, in transit,
and at disposal. It is also very important to know what is being logged by the cloud
service.
Network security becomes a concern because of all the current hype surrounding the
cloud, which attracts more customers as well as more hackers. Cloud services have been
noted to be more susceptible to denial-of-service attacks and botnets, and moving to the
cloud does nothing to solve the problem of social engineering. The best defense is to
maintain strong network firewalls, apply all updates, run IDS/IPS systems, and monitor
events using SIEM or log management software. It is also important to scan for network
vulnerabilities using tools such as Nessus.
Another issue is application security. It is very important to have well-coded
applications in the infrastructure to protect against XSS, SQL injection, CSRF, and
vulnerabilities in session management.
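As a minimal sketch of one such defense, the snippet below uses a parameterized query so that user input cannot change the SQL statement. It relies on Python’s built-in sqlite3 module; the users table, its rows, and the helper names are hypothetical and chosen only for illustration.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user(conn, username):
    """Look up a user by name with a parameterized query."""
    # The "?" placeholder makes the driver treat username purely as data,
    # so quotes or SQL keywords inside it are never executed as SQL.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (username,))
    return cur.fetchall()


conn = make_demo_db()
print(find_user(conn, "alice"))        # prints [(1, 'alice')]
print(find_user(conn, "' OR '1'='1"))  # prints [] (the injection attempt matches nothing)
```

Had the query been built by pasting the input into the SQL string, the second call would have matched every row.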
One last security issue is how access control is implemented. Access control can follow
a MAC or DAC model, but it is important to assign users the proper set of rules for
accessing a system, especially through a VPN. Alongside this is how the access policy is
set up to control password length, the number of password attempts allowed, password
expiration dates, and the disabling of idle accounts.
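The account rules listed above (password length, attempt limits, expiration, and idle accounts) can be sketched as a simple policy check. Every threshold below is a hypothetical value picked for illustration, not a recommendation from the source, and the function names are likewise assumptions.

```python
from datetime import datetime, timedelta

# Hypothetical policy values; a real deployment would choose its own.
MIN_LENGTH = 12
MAX_FAILED_ATTEMPTS = 5
MAX_PASSWORD_AGE = timedelta(days=90)
MAX_IDLE_TIME = timedelta(days=60)


def password_long_enough(password):
    """Enforce the minimum password length."""
    return len(password) >= MIN_LENGTH


def account_status(failed_attempts, password_set_at, last_login_at, now):
    """Classify an account according to the policy above."""
    if failed_attempts >= MAX_FAILED_ATTEMPTS:
        return "locked"            # too many wrong password attempts
    if now - last_login_at > MAX_IDLE_TIME:
        return "disabled"          # idle account
    if now - password_set_at > MAX_PASSWORD_AGE:
        return "password_expired"  # password must be changed
    return "active"


now = datetime(2024, 6, 1)
print(account_status(0, now - timedelta(days=10), now - timedelta(days=1), now))
```

The checks run from most to least severe, so an account that is both locked and expired is reported as locked.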
5. (10 points) Explain what “hadoop” software framework in distributed computing is.
Hadoop’s connection to distributed computing lies in HDFS, the Hadoop Distributed File
System. HDFS is the distributed storage layer of the framework. It is written in Java, so
it can run on any machine that supports Java, and it is highly fault-tolerant. HDFS
stores large files across the machines of a cluster. Not only does it spread the data
over the cluster, it also creates replicas of the data for fault tolerance. The
replication factor and the block size distributed to the computers are both
configurable. During execution, Hadoop runs various processes to monitor performance and
to keep track of errors. When computations run on different nodes, the results are
written back to HDFS, where they are consolidated. When Hadoop is run from a remote
location against large clusters and databases, it also steers the computation to the
computers that are nearest to the requested data. This helps reduce network traffic and
congestion. So the distributed aspect of Hadoop is how it takes a cluster of computers,
distributes data across them, and runs computations on them while returning the results
to the main computer.
HDFS architecture guide – hadoop.apache.org
http://hadoop.apache.org/common/docs/current/hdfs_design.html
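The idea of splitting a file into blocks and replicating each block across a cluster can be sketched as follows. This is a simplified illustration, not HDFS’s actual placement algorithm (which is rack-aware); the node names, sizes, and round-robin placement are all assumptions.

```python
def split_into_blocks(file_size, block_size):
    """Return the size of each block; the last block may be smaller."""
    full_blocks, remainder = divmod(file_size, block_size)
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes, round-robin.

    Assumption: simple round-robin placement stands in for the real policy.
    """
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replication)
        ]
    return placement


blocks = split_into_blocks(file_size=300, block_size=128)  # sizes in MB
print(blocks)  # prints [128, 128, 44]
print(place_replicas(len(blocks), ["n1", "n2", "n3", "n4"], replication=3))
```

With three replicas per block, any single node can fail and every block is still available on two other nodes.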
6. (10 points) Explain what “MapReduce” software framework in distributed computing is.
MapReduce is a programming model and software framework developed by Google to process
and organize huge amounts of data. With a growing amount of data and information being
stored online, it is important to keep all of it organized so it can be accessed more
easily. During execution, in the map step, the master node splits the input into smaller
pieces and sends them to worker nodes. These nodes process their pieces and send the
results back to the master. The next step is to reduce: this step takes all the output
produced by the nodes and combines it into a single result. MapReduce is also integrated
into Hadoop, not only to organize data but to provide a programming layer on top of it.
Programs can be written in languages such as C and Python. MapReduce is also used in
Hadoop to organize the replicated data blocks.
Mapreduce.org
http://www.mapreduce.org/what-is-mapreduce.php
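The map and reduce steps can be illustrated with the classic word-count example. This is a minimal single-machine sketch of the programming model, not Google’s or Hadoop’s implementation; the function names are hypothetical, and the intermediate grouping step (often called the shuffle) is written out explicitly.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one piece of input."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key so each key is reduced once."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # In a real cluster each document would be mapped on a different node.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["the cat", "The dog"]))  # prints {'the': 2, 'cat': 1, 'dog': 1}
```

Each call to map_phase touches only its own document, which is why the map step can be spread across many nodes before the results are combined.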
7. Explain what each of the following four companies offers in their cloud computing
services.
a. (10 points) IBM
b. (10 points) Google
c. (10 points) Amazon
d. (10 points) Rackspace