Scheduling Parametric Jobs on the Grid
Jonathan Giddy
J.P.Giddy@wesc.ac.uk
Parametric computation
• Scientifically:
– Study the behaviour of output variables
against a range of different input
scenarios
• Computationally:
– Execute an application multiple times,
each time with a different combination of
input parameters
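The computational view above can be sketched in a few lines of Python. This is an illustration only: the parameter names (`pressure`, `angle`) and the `run_model` function are hypothetical stand-ins for a real application.

```python
from itertools import product

# Hypothetical parameter ranges; a real study would define its own.
parameters = {
    "pressure": [1.0, 2.0, 3.0],
    "angle": [0, 45, 90],
}

def run_model(pressure, angle):
    """Stand-in for the real application; returns a made-up output variable."""
    return pressure * (1 + angle / 90)

def sweep(parameters):
    """Run the model once for every combination of input parameters."""
    names = list(parameters)
    results = {}
    for values in product(*(parameters[n] for n in names)):
        results[values] = run_model(**dict(zip(names, values)))
    return results

results = sweep(parameters)
```

Each entry in `results` is an independent job, which is what makes this kind of workload easy to spread across many resources.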
Why use the Grid?
• Parametric computations
– Require high performance computational
resources
– Require large numbers of computational
resources
– Generate large amounts of concurrency
– Generate uncoupled computations
– Tolerate high latencies
Nimrod/G
Deadline
Cost
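Minimise Cost
[Figure: animated schedule of jobs 1 to 8 on Nodes 1 to 4 (ordered by increasing price) against time. Node 1 runs jobs 1, 2, 3; Node 2 runs jobs 4, 5, 6; Node 3 runs jobs 7, 8; Node 4 is unused. Counters show jobs remaining (8 down to 0), budget remaining (20 down to 5) and cumulative cost (0 up to 15).]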
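Minimise Time
[Figure: animated schedule of jobs 1 to 8 on Nodes 1 to 4 (ordered by increasing price) against time. Node 1 runs jobs 1, 3; Node 2 runs jobs 2, 5; Node 3 runs jobs 4, 6; Node 4 runs jobs 7, 8. Counters show jobs remaining (8 down to 0), budget remaining (20 down to 0), cumulative cost (0 up to 20) and budget per job (2.5, 2.71, 2.83, 3.2, 3.25, 3.67, 4.0).]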
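The budget-per-job figures on the Minimise Time slide (2.5 rising to 4.0) are consistent with a simple rule: a node may be used only if its price per job does not exceed the remaining budget divided by the number of jobs still to run. The sketch below assumes that rule, and assumes per-job prices of 1, 2, 3 and 4 for Nodes 1 to 4. Those prices reproduce the cost totals on the slides but are not stated there, and this is not Nimrod/G's actual implementation.

```python
def affordable_nodes(prices, budget_left, jobs_left):
    """Return indices of nodes whose per-job price fits the budget-per-job limit."""
    if jobs_left == 0:
        return []
    limit = budget_left / jobs_left
    return [i for i, price in enumerate(prices) if price <= limit]

# Assumed per-job prices for Nodes 1-4 (inferred, not given in the slides).
prices = [1, 2, 3, 4]

print(affordable_nodes(prices, 20, 8))  # limit is 20 / 8 = 2.5
print(affordable_nodes(prices, 16, 5))  # limit is 16 / 5 = 3.2
print(affordable_nodes(prices, 8, 2))   # limit is 8 / 2 = 4.0
```

As jobs complete under the limit, the limit rises and more expensive nodes become eligible, which matches Node 4 being used only for the last two jobs.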
Globus 1.1 GRAM API
int globus_gram_client_job_check(
    char *resource_manager_contact,
    const char *description,
    const float conf_percentage,
    globus_gram_client_time_t *estimate,
    globus_gram_client_time_t *interval);
Note: This is not yet implemented
• This function returns an estimate of the time it would
take for a job of the description provided to reach an
ACTIVE state.
Historical profiling
• Examine characteristics of all jobs in
queue against historical profiles in order
to determine expected start time of a job
• Returns start time and error estimate
Warren Smith, Ian T. Foster, Valerie E. Taylor:
Predicting Application Run Times Using Historical
Information. Job Scheduling Strategies for Parallel
Processing Workshop (JSSPP) 1998: 122-142
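As a rough illustration of the idea (not the method of Smith, Foster and Taylor, which uses richer job templates and statistical estimators), the sketch below predicts a run time as the mean of past runs with matching characteristics and reports the standard error of that mean as the error estimate. The history records, the matching key (user, application) and the single-slot queue model are all hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical history: (user, application) -> past run times in minutes.
history = {
    ("alice", "cfd"): [50.0, 60.0, 70.0],
    ("bob", "render"): [10.0, 10.0, 10.0, 10.0],
}

def predict_run_time(key, history):
    """Return (estimate, standard_error) from past runs with the same key."""
    runs = history[key]
    error = stdev(runs) / sqrt(len(runs)) if len(runs) > 1 else float("inf")
    return mean(runs), error

def predict_start_time(queue, history):
    """Estimate the wait for a new job behind `queue` on a single-slot resource.

    Assumes jobs run one at a time in queue order, and combines the
    per-job errors as if they were independent.
    """
    estimates = [predict_run_time(key, history) for key in queue]
    wait = sum(est for est, _ in estimates)
    error = sqrt(sum(err ** 2 for _, err in estimates))
    return wait, error

wait, error = predict_start_time([("alice", "cfd"), ("bob", "render")], history)
```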
Information Overload
• Too many variables:
– Number of CPUs
– CPU speed
– Processor architecture
– Operating system
– Real memory
– Disk speed
– Bandwidth
– Latency
– Other users
Extrapolation of completion rate
[Figure: completion rates of three resources over a timeline marked at 1 hr and 2 hr. A: 2 jobs/hr; B: 3 jobs/hr; C: 6 jobs/hr.]
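One way to read this slide: once each resource's completion rate has been observed, the scheduler can extrapolate it forward and check whether the remaining jobs will finish before the deadline. The sketch below uses the rates shown for resources A, B and C; the function names, the job count and the deadline are assumptions made for illustration.

```python
import math

# Observed completion rates from the slide, in jobs per hour.
rates = {"A": 2, "B": 3, "C": 6}

def hours_to_finish(jobs_left, rates):
    """Extrapolate: hours needed if every resource keeps its observed rate."""
    total_rate = sum(rates.values())
    if total_rate == 0:
        return math.inf
    return jobs_left / total_rate

def meets_deadline(jobs_left, rates, hours_left):
    """True if the extrapolated finish time is within the time remaining."""
    return hours_to_finish(jobs_left, rates) <= hours_left

print(hours_to_finish(110, rates))                # 110 jobs at 11 jobs/hr
print(meets_deadline(110, rates, 10))             # all three resources
print(meets_deadline(110, {"A": 2, "B": 3}, 10))  # without resource C
```

A scheduler built on this check could add resources when the extrapolation misses the deadline and release them when it finishes comfortably early.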
[Figure: average number of processors (0 to 80) against time (0 to 20) for 20-hour, 15-hour and 10-hour deadlines.]
Assumptions
• Compute time >> Network time
• All jobs are the same length on any
particular resource
• Price of a resource is constant over time
• Not much wriggle room during the endgame
– Both scheduling schemes push up against
the limit that they’re not minimising
(Minimise Cost runs close to the deadline;
Minimise Time spends close to the budget)
– Completion time estimates are heuristic
What we really want…
• Guaranteed completion time
– globus_gram_client_job_check()
with teeth
– Requires scheduler to internally reserve
space for job in advance
• Advance reservation
– As above, but with external interface
And this too…
• A real grid economy
– Incentive for providers to provide resources
– Incentive for consumers to describe
requirements accurately
– Incentive for consumers to use resources
judiciously
– Price mechanism
• budget as a timely global information parameter
• universally understood
• enables trade-offs in making QoS decisions
A final point
• Optimising is really hard in a wide area
network
– Requires centralised decision maker
– Information is missing
– Information is not contemporaneous
– Information is out-of-date
Scalable information
• …is slow to change
• Budget and deadline are (relatively) constant
and can be propagated far and wide in a
timely manner
• Slow information comes from specifying
requirements in the real world
• Satisfying (instead of optimising) a
requirement is relatively simple
– A resource can, so it does
– A resource can’t, so it doesn’t
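The contrast with optimising can be made concrete. In the sketch below, each resource decides locally, using only its own state plus the job's budget and deadline, whether it can satisfy the requirement. The `Resource` fields and the per-job pricing model are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    price_per_job: float   # cost charged for running one job
    hours_per_job: float   # time this resource needs for one job
    queue_hours: float     # estimated wait before the job can start

def can_satisfy(resource, budget, hours_to_deadline):
    """Local yes/no decision: no global view or central optimiser needed."""
    affordable = resource.price_per_job <= budget
    on_time = resource.queue_hours + resource.hours_per_job <= hours_to_deadline
    return affordable and on_time

fast = Resource(price_per_job=4.0, hours_per_job=1.0, queue_hours=0.5)
cheap = Resource(price_per_job=1.0, hours_per_job=3.0, queue_hours=2.0)

print(can_satisfy(fast, budget=2.5, hours_to_deadline=4.0))   # too expensive
print(can_satisfy(cheap, budget=2.5, hours_to_deadline=4.0))  # too slow
print(can_satisfy(cheap, budget=2.5, hours_to_deadline=6.0))  # fits both limits
```

The only inputs that have to travel across the network are the budget and the deadline, which change slowly; everything else is known at the resource.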