Information and Scheduling: What's available and how does it change

Jennifer M. Schopf
Argonne National Lab
Oct 20, 2003

Information and Scheduling
- How a scheduler works is closely tied to the information available
- The choice of algorithm depends on the accessible data

This Talk
- What approaches expect from information
- What data is actually available, and some open questions
- How data changes
- What to do about changing data

NB
- I’m speaking (pessimistically) from my own background
- We’ve heard some talks earlier today (for example PACE) which address some of these problems
- I still think these are interesting open issues to think about

Information systems (NOTE: taken from my standard MDS2 talk)
- Information is always old
  - Time of flight, changing system state
  - Need to provide quality metrics
- Distributed system state is hard to obtain
  - Information is not contemporaneous (thanks j.g.)
  - Complexity of global snapshot
- Components will fail
- Scalability and overhead
  - Approaches are changed for scalability, and this will affect the information available

Scheduling approaches assume
- A lot of data is available
- All information is accurate
- Values don’t change

What some people expect
- Perfect bandwidth info
- Number of operations in an application
- Scalar value of computer “power”
- Mapping of “power” to applications
- Perfect load information

Bandwidth data
- Network Weather Service (Wolski, UCSB)
  - 64k probe BW data
  - Latency data
  - Predictions
- PingER (Les Cottrell, SLAC)
  - Create long-term baselines for expectations on means/medians and variability for response time, throughput, packet loss
- Predicting TCP performance
  - Allen Downey
  - http://allendowney.com/research/tcp/
- But what do Grid applications need?
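A minimal sketch of the kind of long-term baseline mentioned above (means, medians, and variability for a measured quantity). The function name `baseline` and the sample values are made up for illustration; this is not how any of the tools named on the slide actually compute their baselines.

```python
import statistics


def baseline(samples):
    """Summarize a series of measurements (e.g. throughput in Mbit/s)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate variability")
    return {
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "stdev": statistics.stdev(samples),  # sample standard deviation
    }


# Hypothetical throughput samples in Mbit/s, invented for illustration.
throughput = [10.0, 12.0, 14.0]
print(baseline(throughput))
```

Reporting the spread next to the mean and median is one simple way to attach a quality metric to information that is already old by the time it is read.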
Perfect Bandwidth Data
- 64k probes don’t look like large file transfers
- Figure: LBL-ANL GridFTP (approximately 400 transfers at irregular intervals) end-to-end bandwidth and NWS (approximately 1,500 probes every five minutes) probe bandwidth for the two-week August ’01 dataset.

Predicting Large File Transfers
- Vazhkudai and Schopf: use GridFTP logs and some background data (NWS, ioStat) (HPDC 2002)
  - Error rate of ~15%
- M. Faerman, A. Su, R. Wolski, and F. Berman (HPDC 99)
  - Similar results for SARA data
- Hu and Schopf: use an AI learning technique on GridFTP log files only (not published yet)
  - Picks the best place to get a file from 60-80% of the time; using averages only gives you ~50% “best chosen”
- This topic needs much more study!

Data Generally Available From an Application
- What some scheduling approaches want:
  - Number of ops in an application
  - Exact execution time on a platform
  - Perfect models of applications

Application Data Currently Available
- Bad models of applications
- No models of applications
  - Some work (Prophesy, Taylor at Texas A&M) does logging to create models
- Many interesting applications have nondeterministic run times
- User estimates of application run time are (historically) off by 20%+
- We need to be able to figure out ways to do predictions of application run times WITHOUT models

Scalar value of computer “power”
- MDS2 gives me:
  - CPU vendor, model and version
  - CPU speed
  - OS name, release and version
  - RAM size
  - Node count
  - CPU count
- Where is “compute power” in this data?

What is compute “power”
- I could get benchmark data, but what’s the right benchmark(s) to use?
- Computer “power” simply isn’t scalar, especially in a Grid environment
- Goal is really to understand how an application will run on a machine
- Given three different benchmarks, three different platforms will perform very differently: one best on BM1, another best on BM2

Mapping “power” to applications
- Many scheduling approaches assume “power” is a scalar: just multiply it by the set application time and we’re set
- Only problem:
  - Power isn’t a scalar
  - No one knows absolute application run times
  - Mapping will NOT be straightforward
- We need a way to estimate application time on a contended system

Perfect Load Information
- MDS2 gives me:
  - Basic queue data
  - Host load 5/10/15 min avg
  - Last value only

Load Predictions
- Network Weather Service
  - 12+ prediction techniques
  - Work on any time series
  - Expect regularly arriving data
- Only a prediction of the next value
  - *I* want to know what load is going to be like in 20 mins
  - Or the AVERAGE over the next 20 mins?

Information and Scheduling
- What approaches expect us to have
- What we actually have access to
- How it changes
- What to do about changing data

Dedicated SOR Experiments
- Platform: 2 Sparc 2’s, 1 Sparc 5, 1 Sparc 10
- 10 Mbit Ethernet connection
- Quiescent machines and network
- Prediction within 3% before memory spill

Non-dedicated SOR results
- Available CPU on workstations varied from .43 to .53

SOR with Higher Variance in CPU Availability

Improving predictions
- Available CPU has range of 0.48 +/- 0.05
- Prediction should also have a range

Scheduling needs to consider variance
- Conservative Scheduling: Using Predicted Variance to Improve Scheduling Decisions in Dynamic Environments
  - Lingyun Yang, Jennifer M. Schopf, Ian Foster
  - To appear at SC'03, November 15-21, 2003, Phoenix, Arizona, USA
  - www.mcs.anl.gov/~jms/Pubs/lingyun-SCscheduling.pdf

Scheduling with Variance
- Summary: Scheduling with variance can give better mean performance and less variance in overall execution time

Lessons:
- We need work predicting large file transfers, NOT bandwidth
- We need to be able to figure out ways to do predictions of application run times WITHOUT models
- We need predictions over time periods, not just a next value
- We need a way to represent “power” of a machine that takes variance into account
- We need a way to map power to application behavior
- We need better scheduling approaches that take variance into account

Contact Information
- Jennifer M. Schopf
- jms@mcs.anl.gov
- www.mcs.anl.gov/~jms
  - Links to some of the publications mentioned
  - Links to the co-edited book “Grid Resource Management: State of the Art and Future Trends”
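A rough sketch of two ideas from the talk: a prediction that carries a range (available CPU of 0.48 +/- 0.05), and a schedule that takes variance into account. The names `Machine`, `runtime_range`, and `conservative_shares`, the `penalty` parameter, and the assumption that a CPU-bound task's run time scales as dedicated time divided by available CPU are all illustrative assumptions. This is not the algorithm from the Conservative Scheduling paper.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    cpu_mean: float  # predicted mean fraction of CPU available, in (0, 1]
    cpu_sd: float    # predicted variability of that fraction


def runtime_range(dedicated_seconds, cpu_mean, cpu_spread):
    """Return (best, expected, worst) run time for a CPU-bound task.

    Assumes run time = dedicated time / available CPU fraction.
    """
    low = cpu_mean - cpu_spread
    high = cpu_mean + cpu_spread
    if low <= 0:
        raise ValueError("availability range must stay above zero")
    return (
        dedicated_seconds / high,      # best case: most CPU available
        dedicated_seconds / cpu_mean,  # expected case
        dedicated_seconds / low,       # worst case: least CPU available
    )


def conservative_shares(machines, total_units, penalty=1.0):
    """Split work in proportion to (mean - penalty * sd) of available CPU."""
    weights = {
        m.name: max(m.cpu_mean - penalty * m.cpu_sd, 0.0) for m in machines
    }
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no machine has usable predicted capacity")
    return {name: total_units * w / total for name, w in weights.items()}


# The CPU range quoted in the talk, applied to a hypothetical 100 s task.
best, expected, worst = runtime_range(100.0, 0.48, 0.05)
print(f"{best:.1f}s to {worst:.1f}s (expected {expected:.1f}s)")

# Two hypothetical machines with the same mean but different variability.
machines = [Machine("steady", 0.5, 0.0), Machine("noisy", 0.5, 0.25)]
print(conservative_shares(machines, 300))
```

With equal means, the machine with higher predicted variability receives less work, which is the intuition behind scheduling on variance as well as on the mean.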
