BITS Pilani presentation
BITS Pilani
Hyderabad Campus
D. Powar
Lecturer,
BITS-Pilani, Hyderabad Campus
SSZG527
Lecture 18
Cloud Computing
Lectures
Lecture No | Objectives
Lecture 10 | Capacity management
Lecture 11 | Introduction to PaaS (Drupal, Wolf frameworks, force.com), 5 Principles of UI Design by AWS: MADPO Principles
Lecture 12 | RAID (Redundant Array of Independent Disks)
Lecture 13 | MapReduce - distributed programming framework, Pig, Hive
Lecture 14 | Distributed File System (GFS, HDFS), cloud storage
Lecture 15 | Multi-Tenancy, 4 levels of multi-tenancy
Lecture 16 | Cloud security
Lecture 17 | OpenStack – a cloud computing operating system
MapReduce
Map+Reduce
[Figure: very big data -> MAP -> REDUCE -> result]
Map:
– Accepts input key/value pair
– Emits intermediate key/value pair
Reduce:
– Accepts intermediate key/value* pair
– Emits output key/value pair
MapReduce Programming Model
Data type: key-value records
Map function:
(K_in, V_in) -> list(K_inter, V_inter)
Reduce function:
(K_inter, list(V_inter)) -> list(K_out, V_out)
Examples
let map(k, v) = emit(k.toUpper(), v.toUpper())
– ("foo", "bar") -> ("FOO", "BAR")
– ("key2", "data") -> ("KEY2", "DATA")
let map(k, v) = foreach char c in v: emit(k, c)
– ("A", "cats") -> ("A", "c"), ("A", "a"), ("A", "t"), ("A", "s")
– ("B", "hi") -> ("B", "h"), ("B", "i")
let map(k, v) = if (isPrime(v)) then emit(k, v)
– ("foo", 7) -> ("foo", 7)
– ("test", 10) -> (nothing)
let map(k, v) = emit(v.length, v)
– ("hi", "test") -> (4, "test")
– ("x", "quux") -> (4, "quux")
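The four map functions above can be sketched in Python as generators, where each `yield` plays the role of `emit`. The function names (`map_upper`, `map_chars`, `map_primes`, `map_length`, `is_prime`) are illustrative choices, not part of any MapReduce library.

```python
def map_upper(k, v):
    # Emit the key and value converted to upper case.
    yield (k.upper(), v.upper())


def map_chars(k, v):
    # Emit one (key, character) pair per character of the value.
    for c in v:
        yield (k, c)


def is_prime(n):
    # Simple trial division; sufficient for a small illustration.
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def map_primes(k, v):
    # Emit the pair only when the value is prime; otherwise emit nothing.
    if is_prime(v):
        yield (k, v)


def map_length(k, v):
    # Emit the value keyed by its length.
    yield (len(v), v)
```

For example, `list(map_chars("B", "hi"))` gives `[("B", "h"), ("B", "i")]`, and `list(map_primes("test", 10))` gives an empty list. A map function may therefore emit zero, one, or many pairs per input record.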
Example: Word Count
def mapper(line):
    # Emit (word, 1) for every whitespace-separated word in the line.
    for word in line.split():
        yield (word, 1)

def reducer(key, values):
    # Sum the counts collected for one word.
    yield (key, sum(values))
Word Count Execution
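Input -> Map -> Shuffle & Sort -> Reduce -> Output
Input (one split per Map task):
– the quick brown fox
– the fox ate the mouse
– how now brown cow
Map output:
– Map 1: (the, 1), (quick, 1), (brown, 1), (fox, 1)
– Map 2: (the, 1), (fox, 1), (ate, 1), (the, 1), (mouse, 1)
– Map 3: (how, 1), (now, 1), (brown, 1), (cow, 1)
Reduce output (after Shuffle & Sort):
– Reduce 1: (brown, 2), (fox, 2), (how, 1), (now, 1), (the, 3)
– Reduce 2: (ate, 1), (cow, 1), (mouse, 1), (quick, 1)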
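The execution above can be reproduced with a minimal single-process sketch. It only simulates the three phases in memory and is not how Hadoop schedules work across machines; the helper `run_word_count` and the variable names are assumptions made for illustration.

```python
from collections import defaultdict


def mapper(line):
    # Map: emit (word, 1) for every word in the input line.
    for word in line.split():
        yield (word, 1)


def reducer(key, values):
    # Reduce: sum all counts collected for one word.
    yield (key, sum(values))


def run_word_count(lines):
    # Map phase: apply the mapper to every input split.
    intermediate = []
    for line in lines:
        intermediate.extend(mapper(line))

    # Shuffle & sort phase: group intermediate values by key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce phase: apply the reducer to each key in sorted order.
    result = {}
    for key in sorted(groups):
        for out_key, out_value in reducer(key, groups[key]):
            result[out_key] = out_value
    return result


LINES = ["the quick brown fox", "the fox ate the mouse", "how now brown cow"]
counts = run_word_count(LINES)
```

With these three input lines, `counts["the"]` is 3 and `counts["brown"]` and `counts["fox"]` are both 2, which matches the Reduce output listed on the slide.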
Word Count example code (java)
http://hadoop.apache.org/docs/stable/mapred_tutorial.html
http://wiki.apache.org/hadoop/WordCount
Distributed File Systems
The Google File System
• GFS stores a huge number of files, totaling many terabytes of data
• Individual file characteristics
– Very large, multiple gigabytes per file
– Files are updated by appending new entries to the end (faster than overwriting existing data)
– Files are virtually never modified (other than by appends) and virtually never deleted
– Files are mostly read-only
Google File System
Files are divided into large 64 MB chunks, and the chunks are distributed and replicated across many servers.
A couple of important details:
– The master maintains only a (file name, chunk server) table in main memory, so metadata lookups need minimal I/O
– Files are replicated using a primary-backup scheme; the master is kept out of the loop
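The chunking arithmetic and the in-memory master table can be sketched as follows. This is only an illustration under stated assumptions: the table is keyed by (file name, chunk index), and the file name and chunk server names are made up; the real GFS master keeps richer metadata.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks, as stated on the slide


def chunk_index(byte_offset):
    # Which chunk of the file contains this byte offset.
    return byte_offset // CHUNK_SIZE


def num_chunks(file_size):
    # Number of chunks needed for a file (ceiling division).
    return (file_size + CHUNK_SIZE - 1) // CHUNK_SIZE


# Hypothetical master table held in main memory:
# (file name, chunk index) -> chunk servers holding a replica.
master_table = {
    ("/logs/web.log", 0): ["cs1", "cs2", "cs3"],
    ("/logs/web.log", 1): ["cs2", "cs4", "cs5"],
}


def locate(file_name, byte_offset):
    # A pure dictionary lookup: no disk I/O is needed on the master.
    return master_table[(file_name, chunk_index(byte_offset))]
```

For instance, a 200 MB file needs `num_chunks(200 * 1024 * 1024)` = 4 chunks, and a read at offset 70 MB falls in chunk 1, so `locate` returns the servers for that chunk.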
HDFS
• Hadoop's Distributed File System (HDFS) is designed to reliably store very large files across machines in a large cluster.
• It is inspired by the Google File System.
• HDFS stores each file as a sequence of blocks; all blocks in a file except the last are the same size.
• Blocks belonging to a file are replicated for fault tolerance. The block size and replication factor are configurable per file.
• Files in HDFS are "write once" and have strictly one writer at any time.
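A short sketch of how a file breaks into equal-size blocks with a smaller last block, and how replication multiplies raw storage. The 128 MB block size and replication factor of 3 used in the example are assumed values for illustration, since both are configurable per file.

```python
MB = 1024 * 1024


def block_sizes(file_size, block_size):
    # All blocks have block_size bytes except possibly the last one.
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)
    return sizes


def raw_storage(file_size, replication):
    # Every block is stored `replication` times across the cluster.
    return file_size * replication


# Assumed example values: a 300 MB file, 128 MB blocks, replication factor 3.
example_blocks = block_sizes(300 * MB, 128 * MB)
example_raw = raw_storage(300 * MB, 3)
```

Here `example_blocks` holds two full 128 MB blocks followed by one 44 MB block, and `example_raw` is 900 MB of raw cluster storage.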
Hadoop Distributed File System – Goals:
• Store large data sets
• Cope with hardware failure
• Emphasize streaming data access
From GFS to HDFS
Terminology differences:
– GFS master = Hadoop namenode
– GFS chunkservers = Hadoop datanodes
Functional differences:
– No file appends in HDFS at the time of this comparison (a planned feature, since added in later Hadoop releases)
– HDFS performance is (likely) slower
HDFS Architecture
[Figure: HDFS architecture. An application uses the HDFS client. The client sends (file name, block id) to the HDFS namenode and receives (block id, block location). The namenode holds the file namespace (e.g. /foo/bar -> block 3df2), sends instructions to the datanodes, and receives datanode state. The client sends (block id, byte range) to an HDFS datanode and receives block data. Each HDFS datanode stores its blocks on a local Linux file system.]
Adapted from (Ghemawat et al., SOSP 2003)
Namenode Responsibilities
• Managing the file system namespace:
– Holds file/directory structure, metadata, file-to-block mapping, access permissions, etc.
• Coordinating file operations:
– Directs clients to datanodes for reads and writes
– No data is moved through the namenode
• Maintaining overall health:
– Periodic communication with the datanodes
– Garbage collection
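The read path implied by these responsibilities can be sketched with a small in-memory model: the client asks the namenode only for metadata, then fetches the bytes directly from a datanode. The classes and methods below (`NameNode`, `DataNode`, `get_block_locations`, `read_file`) are hypothetical names for illustration and are not the Hadoop API; block id `a91c` is also made up.

```python
class DataNode:
    """Stands in for an HDFS datanode: holds block contents."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> bytes

    def read(self, block_id, start=0, end=None):
        # Serve a byte range of one block directly to the client.
        return self.blocks[block_id][start:end]


class NameNode:
    """Stands in for the namenode: holds metadata only, never file data."""

    def __init__(self):
        self.namespace = {}  # file name -> ordered list of block ids
        self.locations = {}  # block id -> datanodes holding a replica

    def get_block_locations(self, file_name):
        return [(b, self.locations[b]) for b in self.namespace[file_name]]


def read_file(namenode, file_name):
    # Step 1: metadata lookup at the namenode.
    # Step 2: block data comes straight from a datanode.
    data = b""
    for block_id, datanodes in namenode.get_block_locations(file_name):
        data += datanodes[0].read(block_id)
    return data


dn1, dn2 = DataNode("dn1"), DataNode("dn2")
dn1.blocks["3df2"] = b"hello "
dn2.blocks["3df2"] = b"hello "  # replica of the same block
dn2.blocks["a91c"] = b"world"

namenode = NameNode()
namenode.namespace["/foo/bar"] = ["3df2", "a91c"]
namenode.locations = {"3df2": [dn1, dn2], "a91c": [dn2]}
```

Calling `read_file(namenode, "/foo/bar")` concatenates the two blocks. Note that the `NameNode` object never touches the block bytes, which mirrors the point that no data is moved through the namenode.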
Cloud storage
• Cloud storage is a model of networked online storage in which data is stored in virtualized pools of storage
• Companies operate large data centers, and people who need their data hosted buy or lease storage capacity from them
• Cloud storage services may be accessed through a web service application programming interface (API), a cloud storage gateway, or a Web-based user interface
• It is difficult to pin down a canonical definition of cloud storage architecture, but object storage is reasonably analogous
Multi-tenancy
Basic SaaS maturity model
1. Ad-hoc/custom
2. Configurable, single tenant
3. Configurable, multi-tenant
4. Configurable, multi-tenant (scalable)
Ad-hoc/customizable instances
• Each customer has their own custom version of the software
• Represents an enterprise data center where there are multiple instances and versions of the software
• Each customer would have their own binaries, as well as their own dedicated processes for implementation of the application
• Disadv: Difficult to manage: each customer would need their own management support
Configurable instances
• All customers share the same version of the software (one copy for each customer)
• Adv: Easy management: a single version of the software to maintain
Configurable multi-tenant efficient instances
• All customers share the same version of the software (a single copy shared among all customers)
• Adv: Easy management: only a single instance runs
Configurable multi-tenant efficient instances (scalable)
• All customers share the same version of the software (a single copy shared among all customers)
• Software is hosted on a cluster of computers
• Hence, the capacity of the system can scale almost limitlessly
• Thus, capacity can grow as the number of customers grows
• Ex: Gmail, Yahoo Mail, etc.
• Disadv: Shared storage problem
Share vs isolate
– Business model (can I monetise?)
– Architectural model (can I do it?)
– Operational model (can I guarantee SLAs?)
meta-data
access control
Authentication
• Unlike traditional computer systems, the tenant specifies the valid users, and the cloud service provider authenticates them
• Two basic approaches are used
– Centralized authentication
– Decentralized authentication
Authentication (contd.)
Centralized authentication:
• Authentication is performed using a centralized user database
• The cloud admin gives the tenant admin rights to manage user accounts for that tenant
• Multiple (two) sign-on service
• Given the self-service nature of the cloud, this approach is the more commonly used
Decentralized authentication:
• Each tenant maintains their own user database, and needs to deploy a federation service that interfaces between that tenant's authentication framework and the cloud system's authentication service
• Single sign-on service
Resource sharing
• The two major resources that need to be shared are storage and servers
• Sharing storage resources (two types)
– File system
– Databases
• Since file system storage is a well-known mechanism, we will restrict our discussion to database storage
Database
There are two methods of sharing data in a single database
– Dedicated tables per tenant
– Shared table
Dedicated tables per tenant:
• Each tenant stores their data in a separate set of tables, different from those of other tenants
• Ex: the www.mygarage.com portal, where each auto repair shop (tenant) stores its data in its own tables, and each table may be stored as a separate file
Dedicated tables per tenant:
Best garage: Car license | Service | Cost
Friendly garage: Car license | Service | Cost
Honest garage: Car license | Service | Cost
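A minimal sketch of the dedicated-tables layout using Python's built-in `sqlite3` module. The table naming scheme (`<tenant>_repairs`), the helper `repairs_for`, and the sample rows are assumptions for illustration; the slide gives only the column names.

```python
import sqlite3

# Hypothetical tenant identifiers derived from the garage names on the slide.
TENANTS = ["best_garage", "friendly_garage", "honest_garage"]

conn = sqlite3.connect(":memory:")

# One table per tenant, each with the same three columns.
for tenant in TENANTS:
    conn.execute(
        f"CREATE TABLE {tenant}_repairs "
        "(car_license TEXT, service TEXT, cost REAL)"
    )

# Made-up sample rows.
conn.execute(
    "INSERT INTO best_garage_repairs VALUES (?, ?, ?)",
    ("LIC-001", "oil change", 40.0),
)
conn.execute(
    "INSERT INTO friendly_garage_repairs VALUES (?, ?, ?)",
    ("LIC-002", "brake pads", 120.0),
)


def repairs_for(conn, tenant):
    # Table names cannot be bound as SQL parameters, so validate first.
    if tenant not in TENANTS:
        raise ValueError(f"unknown tenant: {tenant}")
    query = f"SELECT car_license, service, cost FROM {tenant}_repairs"
    return conn.execute(query).fetchall()
```

Because each tenant has its own table, a query for one tenant cannot return another tenant's rows, and each tenant's table could later be given a different schema.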
Shared table:
• The data for all tenants is stored in the same table, in different rows
• One of the columns in the table identifies the tenant to which a particular row belongs
• It is more space-efficient than the previous approach
• An auxiliary table, called a metadata table, stores information about the tenants
Shared table (contd.)
Data table 1
Columns: Tenant ID | Car license | Repair | Cost
Tenant ID values, by row: 1, 2, 2, 1, 3, 2
Metadata table 1
Tenant ID | Data
1 | Best garage
2 | Friendly garage
3 | Honest garage
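The shared-table layout can be sketched with `sqlite3` as one data table carrying a tenant ID column plus a metadata table of tenants. The tenant IDs and garage names follow the slide; the table names, the helper `repairs_for`, and the license, repair, and cost values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Metadata table: one row per tenant.
conn.execute("CREATE TABLE tenants (tenant_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO tenants VALUES (?, ?)",
    [(1, "Best garage"), (2, "Friendly garage"), (3, "Honest garage")],
)

# Shared data table: the tenant_id column says which tenant owns each row.
conn.execute(
    "CREATE TABLE repairs ("
    "tenant_id INTEGER REFERENCES tenants(tenant_id), "
    "car_license TEXT, repair TEXT, cost REAL)"
)
conn.executemany(
    "INSERT INTO repairs VALUES (?, ?, ?, ?)",
    [
        (1, "LIC-001", "oil change", 40.0),
        (2, "LIC-002", "brake pads", 120.0),
        (2, "LIC-003", "battery", 95.0),
        (1, "LIC-004", "wheel alignment", 60.0),
        (3, "LIC-005", "tyre change", 80.0),
        (2, "LIC-006", "oil change", 45.0),
    ],
)


def repairs_for(conn, tenant_id):
    # Every query must filter on tenant_id to keep tenants isolated.
    return conn.execute(
        "SELECT car_license, repair, cost FROM repairs "
        "WHERE tenant_id = ? ORDER BY rowid",
        (tenant_id,),
    ).fetchall()
```

Isolation now depends on every query including the `tenant_id` filter, which is the trade-off against the space savings of keeping a single table.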
Data customization
• It is important for the cloud infrastructure to support customization of the stored data, since different tenants are likely to want to store different data in their tables
• In the dedicated-table method, each tenant has their own table and can therefore have a different schema
• The difficulty lies with the shared-table approach
• Three methods are used
– Pre-allocated columns
– Name-value pair
– XML method
Pre-allocated columns
• Space is reserved in the tables for custom columns, which can be used by tenants for defining new columns
• Salesforce.com reserves 500 columns
• Some of the tenants may not use these columns
• Disadv: There could be a lot of wasted space
Pre-allocated columns
Data table 1
Columns: Tenant ID | Car license | Service | Cost | Custom1 | Custom2
Tenant ID values, by row: 1, 2, 2, 1, 3, 2
Metadata table 1
Tenant ID | Tenant name | Custom1 name | Custom1 type
1 | Best garage | Service rating | int
2 | Friendly garage | Service manager | string
3 | Honest garage | |
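A sketch of pre-allocated columns with `sqlite3`: the data table reserves `custom1` and `custom2`, and the metadata table records what each tenant uses `custom1` for. The metadata rows follow the slide; the table names, the helper `custom_fields`, and the data rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Shared data table with two reserved, untyped custom columns.
conn.execute(
    "CREATE TABLE repairs (tenant_id INTEGER, car_license TEXT, "
    "service TEXT, cost REAL, custom1, custom2)"
)

# Metadata table: the meaning of custom1 for each tenant.
conn.execute(
    "CREATE TABLE tenant_meta (tenant_id INTEGER PRIMARY KEY, "
    "tenant_name TEXT, custom1_name TEXT, custom1_type TEXT)"
)
conn.executemany(
    "INSERT INTO tenant_meta VALUES (?, ?, ?, ?)",
    [
        (1, "Best garage", "Service rating", "int"),
        (2, "Friendly garage", "Service manager", "string"),
        (3, "Honest garage", None, None),  # defines no custom column
    ],
)

# Made-up data rows; unused custom columns stay NULL (the wasted space).
conn.executemany(
    "INSERT INTO repairs VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "LIC-001", "oil change", 40.0, 5, None),
        (2, "LIC-002", "brake pads", 120.0, "Alex", None),
        (3, "LIC-003", "tyre change", 80.0, None, None),
    ],
)


def custom_fields(conn, tenant_id):
    # Translate the generic custom1 column into the tenant's own field name.
    (name,) = conn.execute(
        "SELECT custom1_name FROM tenant_meta WHERE tenant_id = ?",
        (tenant_id,),
    ).fetchone()
    rows = conn.execute(
        "SELECT car_license, custom1 FROM repairs "
        "WHERE tenant_id = ? ORDER BY rowid",
        (tenant_id,),
    ).fetchall()
    if name is None:
        return [(lic, {}) for lic, _ in rows]
    return [(lic, {name: value}) for lic, value in rows]
```

The same physical column holds an integer rating for one tenant and a manager's name for another, and the metadata table is what gives the column its meaning.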
Name-value pair
• The standard table has an extra column that points to a table of name-value pairs, which holds the additional custom fields for a record
• The name-value pair table is also called a pivot table
• This method avoids the storage wastage of the previous method
Name-value pair (contd.)
Data table 1
Columns: Tenant ID | Car license | Service | Cost | Name-value pair record
Tenant ID values, by row: 1, 2, 2, 1, 3, 2 (the first row points to name-value pair record 275)
Data table 2
Name-value pair | Name ID | Value
275 | 15 | 5.5
Metadata table 2
Name ID | Name | Type
15 | Service rating | int
 | Service manager | string
Metadata table 1
Tenant ID | Data
1 | Best garage
2 | Friendly garage
3 | Honest garage
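The name-value (pivot table) layout can be sketched with `sqlite3` as three tables joined at query time. The record ID 275, name ID 15, value 5.5, and the "Service rating" field come from the slide; the table names, the helper `custom_fields`, and the other data values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Standard data table with one extra pointer column.
    CREATE TABLE repairs (tenant_id INTEGER, car_license TEXT,
                          service TEXT, cost REAL, nv_record INTEGER);
    -- Pivot table: one row per custom field value.
    CREATE TABLE name_value (nv_record INTEGER, name_id INTEGER, value);
    -- Metadata: what each name ID means.
    CREATE TABLE field_meta (name_id INTEGER PRIMARY KEY, name TEXT, type TEXT);
    """
)
conn.execute("INSERT INTO field_meta VALUES (15, 'Service rating', 'int')")
conn.execute("INSERT INTO name_value VALUES (275, 15, 5.5)")
conn.execute(
    "INSERT INTO repairs VALUES (1, 'LIC-001', 'oil change', 40.0, 275)"
)
# A record without custom fields stores only a NULL pointer.
conn.execute(
    "INSERT INTO repairs VALUES (2, 'LIC-002', 'brake pads', 120.0, NULL)"
)


def custom_fields(conn, car_license):
    # Follow the pointer into the pivot table and resolve field names.
    rows = conn.execute(
        """
        SELECT m.name, nv.value
        FROM repairs r
        JOIN name_value nv ON nv.nv_record = r.nv_record
        JOIN field_meta m ON m.name_id = nv.name_id
        WHERE r.car_license = ?
        """,
        (car_license,),
    ).fetchall()
    return dict(rows)
```

Records with no custom fields cost only one NULL pointer, which is how this layout avoids the empty reserved columns of the pre-allocated approach, at the price of extra joins on every read.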
OpenStack – a cloud computing
operating system
9 core components of OpenStack (Havana)
Nova - Compute Service
Swift - Object Storage Service
Glance - Image Service
Keystone - Identity Service
Horizon - Dashboard (UI) Service
Neutron (formerly Quantum) - Network connectivity Service
Cinder - Block Storage Service
Ceilometer - Metering/Monitoring Service (for billing, benchmarking, scalability, and statistics purposes)
Heat - Orchestration Service (orchestrates multiple composite cloud applications)
OpenStack conceptual architecture
Table 1.1. OpenStack current services (Havana)
Service | Project name | Description
Dashboard | Horizon | Enables users to interact with OpenStack services to launch an instance, assign IP addresses, set access controls, and so on.
Compute | Nova | Provisions and manages large networks of virtual machines on demand.
Networking | Neutron | Enables network connectivity as a service among interface devices managed by other OpenStack services, usually Compute. Enables users to create and attach interfaces to networks. Has a pluggable architecture that supports many popular networking vendors and technologies.
Storage
Object Storage | Swift | Stores and gets files. Does not mount directories like a file server.
Block Storage | Cinder | Provides persistent block storage to guest virtual machines.
Shared services
Identity Service | Keystone | Provides authentication and authorization for the OpenStack services. Also provides a service catalog within a particular OpenStack cloud.
Image Service | Glance | Provides a registry of virtual machine images. Compute uses it to provision instances.
Metering/Monitoring Service | Ceilometer | Monitors and meters the OpenStack cloud for billing, benchmarking, scalability, and statistics purposes.
Higher-level services
Orchestration Service | Heat | Orchestrates multiple composite cloud applications by using either the native HOT template format or the AWS CloudFormation template format, through both an OpenStack-native REST API and a CloudFormation-compatible Query API.
Summary
• Capacity management
• Introduction to PaaS (Drupal, Wolf frameworks, force.com), 5 Principles of UI Design by AWS
• RAID (Redundant Array of Independent Disks)
• MapReduce - distributed programming framework, Pig, Hive
• Distributed File System (GFS, HDFS), cloud storage
• Multi-Tenancy, 4 levels of multi-tenancy
• Cloud security
• OpenStack – a cloud computing operating system