Final presentation

SProj 3
Libra: An Economy-Driven Cluster Scheduler
Jahanzeb Sherwani
Nosheen Ali
Nausheen Lotia
Zahra Hayat
Project Advisor/Client: Rajkumar Buyya
Faculty Advisor: Dr. Arif Zaman
Problem Statement
Implementing a computational-economy-based, user-centric
scheduler for clusters
What is a cluster?
 A collection of workstations interconnected via a
network technology, in order to take advantage
of combined computational power and resources
 An integrated collection of resources that can
provide a single system image spanning all its
nodes: a virtual supercomputer
 Used for computation-intensive applications
such as AI expert systems, nuclear simulations,
and scientific calculations
Why clusters?
 Cost-effectiveness: low cost/performance ratio compared to a specialized supercomputer
 Increase in workstation performance
 Increase in network bandwidth
 Decrease in network latency
 Scalability higher than that of a specialized supercomputer
 Easier to integrate into an existing network than specialized supercomputers
Computational Economy
 Traditional system-centric performance metrics and policies
 CPU Throughput
 Mean Response Time
 Shortest Job First
 Computational economy is the inclusion
of user-specified quality of service
parameters with jobs so that resource
management is user-centric rather than
system-centric
Computational Economy (cont’d)
 Project focus: to implement a scheduler
that aims to maximize user utility
 Job parameters most relevant to user-centric scheduling
 Budget allocated to job by user
 Deadline specified by user
Computational Economy for Grids
 What is a grid?
 An infrastructure that couples resources such as computers
(workstations or clusters), software (for special-purpose
applications) and devices (printers, scanners) across the Internet
and presents them as a unified integrated single resource that
can be widely used
 How a grid differs from a cluster
 Wide geographical area
 Non-dedicated resources
 No centralized resource management
Computational Economy for Grids
 Management of resources and scheduling of computations in a grid environment is complex because the resources are
 geographically distributed
 heterogeneous in nature
 owned by different individuals or organizations
 subject to different access and cost models
 and because resource discovery and security issues must be handled
 Computational economy has been implemented for grids: the
Nimrod/G resource broker is a global resource management and
scheduling system that supports deadline and economy-based
computations in grid-computing environments
Computational Economy for Clusters
 Market-based Proportional Resource Sharing for
Clusters: Brent Chun and David E. Culler, University
of California at Berkeley, Computer Science Division
 A market-based approach built on the notion of a computational economy that optimizes for user value. It describes an architecture for market-based cluster resource management based on proportional sharing of basic computing resources: cluster nodes act as independent sellers of computing resources, while user applications act as buyers who purchase them. Users are allocated credits/tickets; the more tickets a user holds, the greater his CPU share. Tickets are allocated on the basis of the amount the user is willing to pay: his valuation of the job
 Deadline not incorporated
Cluster Architecture
Cluster Management Software
 Cluster Management Software is designed to administer
and manage application jobs submitted to workstation
clusters.
 Creates a Single System Image
 When a collection of interconnected computers appears to be a
unified resource, we say it possesses a Single System Image
 The benefit of a Single System Image is that the exact location
of the execution of a process is entirely concealed from the user.
The user is offered the illusion of a single powerful computer
 Maintains centralized information about cluster status
and resources
Cluster Management Software
 Commercial and Open-source Cluster Management
Software
 Open-source Cluster Management Software
 DQS (Distributed Queuing System)
 CONDOR
 GNQS (Generalized Network Queuing System)
 MOSIX
 REXEC (Remote Execution)
 SGE (Sun Grid Engine)
 PBS (Portable Batch System)
Cluster Management Software
 Why SGE was rejected
 lack of online support
 lack of stability
 Final choice of CMS: PBS (Portable Batch System)
Pricing the Cluster Resources
 Cost = a × (Job Execution Time) + b × (Job Execution Time / Deadline)
 Cost of using the cluster depends on job length and
job deadline: the longer the user is prepared to wait
for the results, the lower his cost
 Cost formula forces user to reveal his true deadline
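A minimal sketch of this pricing function in Java (the class name, method names, and the coefficient values for a and b are illustrative assumptions, not the actual Libra code):

```java
// Hedged sketch of the Libra cost formula on the slide above.
// The coefficients a and b are site-chosen pricing parameters (assumed values).
public final class LibraPricing {
    private final double a; // charge per unit of execution time
    private final double b; // deadline-sensitivity charge

    public LibraPricing(double a, double b) {
        this.a = a;
        this.b = b;
    }

    /** Cost = a * E + b * (E / D): the longer the deadline D, the lower the cost. */
    public double cost(double executionTime, double deadline) {
        return a * executionTime + b * (executionTime / deadline);
    }

    public static void main(String[] args) {
        LibraPricing pricing = new LibraPricing(1.0, 2.0);
        // A 100 s job: a tight 100 s deadline costs more than a relaxed 1000 s one.
        System.out.println(pricing.cost(100, 100));   // 102.0
        System.out.println(pricing.cost(100, 1000));  // 100.2
    }
}
```

This is why the formula encourages users to reveal their true deadlines: claiming a tighter deadline than needed only raises the price paid.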
Scheduling Algorithm
 How to meet budget and deadline
constraints?
 Ensuring low run-time for the algorithm
 Greedy Algorithm
 Complex solutions are infeasible
 Test run of algorithm:
 5 jobs, arriving at times t = 0, 5, 7, 9, 9, on a 3-node cluster
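A minimal sketch of this greedy acceptance and node-selection step in Java, under the constraints above (class and field names are hypothetical; node capacity is modeled as a committed CPU share between 0 and 1):

```java
import java.util.List;

// Hypothetical sketch of Libra-style greedy admission and node selection.
// A job needing execTime seconds of CPU by its deadline requires a CPU share
// of execTime/deadline; a node can host it if its committed share stays <= 1.
class GreedyScheduler {
    static class Node {
        double committedShare = 0.0; // CPU fraction promised to accepted jobs
    }

    // Assumed pricing coefficients, matching the cost-formula sketch earlier.
    static final double A = 1.0, B = 2.0;

    /** Returns the chosen node, or null to reject the job. */
    static Node tryAccept(List<Node> nodes, double execTime, double deadline, double budget) {
        double cost = A * execTime + B * (execTime / deadline);
        if (cost > budget) return null;              // budget constraint violated
        double neededShare = execTime / deadline;    // minimum share to meet deadline
        Node best = null;
        for (Node n : nodes) {                       // greedy: least-loaded feasible node
            if (n.committedShare + neededShare <= 1.0
                    && (best == null || n.committedShare < best.committedShare)) {
                best = n;
            }
        }
        if (best != null) best.committedShare += neededShare;
        return best;                                 // null: no node can meet the deadline
    }
}
```

Replaying the five-job test run amounts to calling tryAccept against three Node objects as each job arrives at t = 0, 5, 7, 9, 9. Each call runs in O(nodes), which keeps the scheduler's own run-time low.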
LIBRA with PBS
 Portable Batch System (PBS) as the
Cluster Management Software (CMS)
 Robust, portable, effective, extensible batch
job queuing and resource management
system
 Supports different schedulers
 Job accounting
 Technical Support
Setting up the PBS Cluster
 Installation of Linux alongside Windows
 Installation of SGE as well as PBS
 Setting up a Network File System
 Configuring GridSim in Java
 Configuring PBSWeb
 Setting up the Apache web server
 PHP scripting for Apache
 Setting up PostgreSQL
 Setting up SSH
PBS Overview
 Main components of PBS
 Job Server pbs_server
 Job Scheduler pbs_sched
 Job Executor & Resource Monitor pbs_mom
 The server accepts commands and communicates with the daemons
 qsub - submit a job
 qstat - view queue and job status
 qalter - change a job's attributes
 qdel - delete a job
Xpbs – GUI for PBS
Job Scheduling in PBS
 Default FIFO Scheduler in PBS
 FIFO: sort jobs by queuing time and run the earliest job first
 Fair share: sort and schedule jobs based on past usage of the machine by the job owners
 Round-robin: pick a job from each queue in turn
 By key: sort jobs by a set of keys, e.g. shortest_job_first, smallest_memory_first
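These sort-based policies can be pictured as comparators over the job queue; a small illustrative sketch in Java (QueuedJob and its fields are made-up names, not PBS source):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative model of PBS's sort-by-key scheduling policies.
record QueuedJob(String id, long queuedAt, double estRunTime, long memoryKb) {}

class SortPolicies {
    static final Comparator<QueuedJob> FIFO =
            Comparator.comparingLong(QueuedJob::queuedAt);
    static final Comparator<QueuedJob> SHORTEST_JOB_FIRST =
            Comparator.comparingDouble(QueuedJob::estRunTime);
    static final Comparator<QueuedJob> SMALLEST_MEMORY_FIRST =
            Comparator.comparingLong(QueuedJob::memoryKb);

    public static void main(String[] args) {
        List<QueuedJob> queue = new ArrayList<>(List.of(
                new QueuedJob("j1", 0, 500.0, 4096),
                new QueuedJob("j2", 5, 120.0, 8192)));
        queue.sort(SHORTEST_JOB_FIRST); // j2 runs first despite arriving later
        System.out.println(queue.get(0).id()); // prints j2
    }
}
```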
The Libra Scheduler
 Job Input Controller
 Adding parameters at job submission time
 deadline
 budget
 executionTime
 Defining new job attributes
 Job Acceptance and Assignment Controller
 Budget checked through cost function
 Admission control through deadline scheduling
 The execution host with the minimum load that can finish the job on time is selected
 Equal Share instead of Minimum Share
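One plausible reading of "Equal Share instead of Minimum Share", sketched below purely as an assumption about the slide's meaning: minimum share gives a job exactly executionTime/deadline of a CPU, while equal share splits the node's CPU evenly among its jobs, provided every deadline still holds.

```java
// Illustrative contrast between minimum-share and equal-share allocation;
// this interpretation is an assumption, not the actual Libra code.
class ShareAllocation {
    /** Minimum share: exactly enough CPU for the job to finish by its deadline. */
    static double minimumShare(double execTime, double deadline) {
        return execTime / deadline;
    }

    /** Equal share: each of n jobs gets 1/n of the CPU; feasible only if every
     *  job still meets its deadline at that share. */
    static boolean equalShareFeasible(double[] execTimes, double[] deadlines) {
        double share = 1.0 / execTimes.length;
        for (int i = 0; i < execTimes.length; i++) {
            if (execTimes[i] / share > deadlines[i]) return false; // deadline missed
        }
        return true;
    }
}
```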
The Libra Scheduler
 Job Execution Controller
 Job is run on the best node according to the
algorithm
 Cluster and node status updated
 runTime
 cpuLoad
 Job Querying Controller
 Server, Scheduler, Exec Host, and Accounting
Logs
PBS-Libra Web --- Front-end for the Libra Engine
[Screenshots of the PBS-Libra Web front end]
Simulations
 Goal:
 Measure the performance of Libra Scheduler
 Performance = ?
 Maximize user satisfaction
Simulations
 Simulation Software
 Modify GridSim (a grid resource-management simulator)
GridSim Class Diagram
Simulations
 Methodology
 Workload
 120 jobs with deadlines and budgets
 Job lengths: 1000 to 10000
 Resources
 10-node, single-processor (MIPS rating: 100) homogeneous cluster
Simulations
 Assumptions
 Strict deadlines
 Processing overhead due to the scheduler and clock interrupts is ignored
 Scheduler simulated as a function
 Input: job size, deadline, budget
 Output: accept/reject, node #, share allocated
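Under these assumptions the simulated scheduler reduces to a pure function from (job size, deadline, budget) to (accept/reject, node, share); a compact Java sketch driven by the workload parameters from the methodology slide (the deadline/budget model and pricing coefficients are assumed):

```java
import java.util.Random;

// Sketch of the scheduler-as-a-function used in the simulations. The job count,
// length range, node count, and MIPS rating come from the slides; everything
// else (names, pricing coefficients, deadline/budget model) is assumed.
class SimulatedLibra {
    record Decision(boolean accepted, int node, double share) {}

    static final double MIPS = 100.0;   // per-node MIPS rating from the slides
    final double[] committed;           // committed CPU share per node
                                        // (toy model: shares are never released)
    SimulatedLibra(int nodes) { committed = new double[nodes]; }

    Decision schedule(double jobSizeMI, double deadline, double budget) {
        double execTime = jobSizeMI / MIPS;                          // runtime at full share
        double cost = 1.0 * execTime + 2.0 * (execTime / deadline);  // assumed a, b
        if (cost > budget) return new Decision(false, -1, 0);
        double share = execTime / deadline;          // share needed to meet deadline
        int best = -1;
        for (int i = 0; i < committed.length; i++) { // least-loaded feasible node
            if (committed[i] + share <= 1.0 && (best < 0 || committed[i] < committed[best])) {
                best = i;
            }
        }
        if (best < 0) return new Decision(false, -1, 0);
        committed[best] += share;
        return new Decision(true, best, share);
    }

    public static void main(String[] args) {
        SimulatedLibra sim = new SimulatedLibra(10); // 10-node homogeneous cluster
        Random rng = new Random(42);
        int accepted = 0;
        for (int j = 0; j < 120; j++) {              // 120 jobs, lengths 1000-10000
            double size = 1000 + rng.nextInt(9001);
            double deadline = 4 * size / MIPS;       // assumed deadline model
            double budget = 2 * size / MIPS;         // assumed budget model
            if (sim.schedule(size, deadline, budget).accepted()) accepted++;
        }
        System.out.println(accepted + " of 120 jobs accepted");
    }
}
```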
Simulations
 Compared:
 Proportional Share
 FIFO
 Experiments:
 120 jobs, 10 nodes
 Increasing workload to 150 and 200
 Increasing cluster size to 20
Simulation Results
 120 jobs submitted; 20 did not meet budget
[Figure: 100 Jobs, 10 Nodes. FIFO: 23 rejected; Proportional Share: 14 rejected. Completion time vs. experiment number]
Simulation Results
 Increase workload to 200 jobs on the same 10-node cluster
[Figure: 200 Jobs, 10 Nodes. FIFO: 105 rejected; Proportional Share: 93 rejected. Completion time vs. experiment number]
Simulation Results
 Scale the cluster up to 20 nodes
[Figure: 200 Jobs, 20 Nodes. FIFO: 35 rejected; Proportional Share: 23 rejected. Completion time vs. experiment number]
Simulation Results
CPU Utilization
[Figure: Load on Node 4. CPU utilization (%) vs. time in s]
Simulation Results
CPU Utilization
[Figure: Load on Node 7. CPU utilization (%) vs. time in s]
Simulation Results
CPU Utilization
[Figure: Load on Node 9. CPU utilization (%) vs. time in s]
Simulation Results
CPU Utilization
[Figure: Load on Node 6. CPU utilization (%) vs. time in s]
Conclusion & Future Work
 Successfully implemented a Linux-based cluster that schedules jobs using PBS with our economy-driven Libra scheduler, and PBS-Libra Web as the front end
 Successfully tested our scheduling policy
 Proportional Share delivers more value to users
 Exploring other pricing mechanisms
 Expanding the cluster with more nodes and with support for parallel jobs
Download