Finding the Critical Path: A Simple Approach

Scott Chapman – American Electric Power
Paper 9015
Session 331
Agenda
• What I mean by “critical path”
• My simple way of finding it
• Review some sample code
• Questions
• Bonus material
Geek-required xkcd reference
HTTP://XKCD.COM/399/
Critical path
• Simple definition: how long is this going to take?
• Longest sequence of activities
  • In a project
  • In a batch schedule
• Need to look at:
  • Predecessor-successor relationships
  • Durations
  • Time dependencies
• Ideas originated in Project Management
Project Management History
• Critical Path Method (CPM) originated at DuPont in the 1950s
  • Used to manage chemical plant maintenance projects
• Critical path is the sequence of events that determines the duration of the project
  • Delays in tasks on the CP delay the entire project
  • CP must be managed to stay on schedule
  • To finish the project earlier, tasks on the CP must somehow be shortened
CPM process
• Identify activities
• Identify sequence and dependencies
• Draw network diagram of activities
• Estimate duration of activities
• Identify the longest path in the network (critical path)
• Monitor & update as project progresses
  • Delays in tasks not on CP may change the CP!
Batch Windows & CP
• Business runs on cycles
  • Daily, weekly, monthly processes
• Large applications have large batch schedules
• Batch schedules can be drawn as a network diagram
  • Predecessor – Successor relationships between jobs
• Like projects, we like our batch windows to finish on time!
• Like a project, a batch schedule has a CP
CP Calculation
• Formally:
  • CP is the path with no slack for any task
  • Slack = difference between the earliest and latest start (or finish) time of a task
  • Latest finish = latest time a task can finish without delaying the project
  • And how do you figure that???
• Informally:
  • Find all the paths through the network diagram
  • Add up the task durations on each path
  • Select the longest path
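The informal method can be sketched in JavaScript, the language of the sample application. This is a minimal illustration, not the code on the CD: `successors`, `duration`, `allPaths`, and `longestPath` are hypothetical names, and the network and task durations (in minutes) are transcribed from the simple example that follows. A depth-first search lists every Start-to-End path, sums its durations, and keeps the longest.

```javascript
// Simple example network: each node lists its successors.
// "Start" and "End" are zero-duration markers, not real tasks.
const successors = {
  Start: ["A", "D"],
  A: ["B", "C"],
  B: ["E"],
  C: ["E", "F"],
  D: ["C"],
  E: ["End"],
  F: ["End"],
  End: [],
};

// Task durations in minutes.
const duration = { Start: 0, A: 15, B: 20, C: 15, D: 30, E: 10, F: 5, End: 0 };

// Depth-first search that collects every path from `node` to End.
function allPaths(node, path = [], out = []) {
  const next = path.concat(node);
  if (successors[node].length === 0) {
    out.push(next); // reached End
  } else {
    for (const s of successors[node]) allPaths(s, next, out);
  }
  return out;
}

// Sum the durations on each path and keep the longest one.
function longestPath() {
  let best = null;
  for (const p of allPaths("Start")) {
    const total = p.reduce((sum, t) => sum + duration[t], 0);
    const tasks = p.filter((t) => t !== "Start" && t !== "End");
    console.log(`${tasks.join("+")} = ${total}`);
    if (best === null || total > best.total) best = { tasks, total };
  }
  return best;
}

const cp = longestPath();
console.log(`Critical path: ${cp.tasks.join("-")} (${cp.total} mins)`); // Critical path: D-C-E (55 mins)
```

This prints the same five path totals as the example slide. Enumerating paths is fine for six tasks, but the number of paths can grow exponentially as a schedule grows, so it does not scale to a batch schedule with thousands of jobs.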
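Simple example
[Figure: network diagram. Start → Task A, Task D; Task A → Task B, Task C; Task D → Task C; Task B → Task E; Task C → Task E, Task F; Task E, Task F → End]
Five paths through this simple example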
Simple example
[Figure: the same network diagram, annotated with task durations]
Task durations: A = 15 mins, B = 20 mins, C = 15 mins, D = 30 mins, E = 10 mins, F = 5 mins
Path durations (mins):
A+B+E = 45
A+C+E = 40
A+C+F = 35
D+C+E = 55
D+C+F = 50
Calculate path durations from task durations
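The formal definition (the CP is the path with zero slack) is usually computed with a forward pass for earliest start/finish times and a backward pass for latest start/finish times. Below is a minimal JavaScript sketch applied to this example. The `tasks` layout and the function names are hypothetical, the task ordering uses Kahn's algorithm (a standard topological sort, not something taken from the paper), and the sketch assumes the schedule has no cycles.

```javascript
// Simple example tasks: duration in minutes and predecessor list.
const tasks = {
  A: { dur: 15, preds: [] },
  B: { dur: 20, preds: ["A"] },
  C: { dur: 15, preds: ["A", "D"] },
  D: { dur: 30, preds: [] },
  E: { dur: 10, preds: ["B", "C"] },
  F: { dur: 5, preds: ["C"] },
};

// Kahn's algorithm: order the tasks so every predecessor precedes its
// successors, and build the successor lists used by the backward pass.
function topoOrder(tasks) {
  const names = Object.keys(tasks);
  const indegree = {};
  const succ = {};
  for (const t of names) {
    indegree[t] = tasks[t].preds.length;
    succ[t] = [];
  }
  for (const t of names) for (const p of tasks[t].preds) succ[p].push(t);
  const queue = names.filter((t) => indegree[t] === 0);
  const order = [];
  while (queue.length > 0) {
    const t = queue.shift();
    order.push(t);
    for (const s of succ[t]) {
      indegree[s] -= 1;
      if (indegree[s] === 0) queue.push(s);
    }
  }
  return { order, succ };
}

function criticalPathMethod(tasks) {
  const { order, succ } = topoOrder(tasks);
  const es = {}, ef = {}, ls = {}, lf = {};
  // Forward pass: a task starts when its last predecessor finishes.
  for (const t of order) {
    es[t] = Math.max(0, ...tasks[t].preds.map((p) => ef[p]));
    ef[t] = es[t] + tasks[t].dur;
  }
  const projectEnd = Math.max(...order.map((t) => ef[t]));
  // Backward pass: latest finish = earliest latest-start among successors,
  // or the project end for tasks with no successors.
  for (const t of [...order].reverse()) {
    lf[t] = Math.min(projectEnd, ...succ[t].map((s) => ls[s]));
    ls[t] = lf[t] - tasks[t].dur;
  }
  // Slack = latest start - earliest start; zero slack means critical.
  const slack = {};
  for (const t of order) slack[t] = ls[t] - es[t];
  const critical = order.filter((t) => slack[t] === 0);
  return { projectEnd, slack, critical };
}

const result = criticalPathMethod(tasks);
console.log(`Project length: ${result.projectEnd} mins`);
console.log(`Critical (zero-slack) tasks: ${result.critical.join(" - ")}`);
```

For this example the sketch reports a 55-minute project with zero slack for D, C, and E (the D+C+E path above), 10 minutes of slack for A and B, and 5 minutes for F.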
So that’s simple enough!
Everybody ready to leave?
Real world complications
• Hundreds or thousands of batch jobs
  • Managed by a batch scheduler package
• Time-of-day dependencies
• Extraneous dependencies
  • New jobs added without cleaning up obsolete dependencies
• Variable execution times
  • Variation in data to be processed
  • Contention with other processes
  • External waits
  • Job failures
What that might look like…
Winding your way through that mess is a bit more complicated!
Tooling Options
• Package from scheduler vendor
  + Should be well integrated
  - Cost?
• Microsoft Project
  - Not really meant for this purpose
  + See CMG Proceedings: Schwarz/Aurand, 1999 and Zaslavsky, 2001
• SAS/OR
  - Cost and effort?
• Roll your own
  + Can make output exactly what you want
  - Time / effort
  + Sample code on your CD!
What is the real question?
1. What is the longest path through the schedule?
   - Prediction of the critical path
   - Usually a one-time analysis
2. Why did job X finish late last night?
   - An ongoing question / process
   - Requires the CP for job X
Fortunately, #2 is much easier!
Use what you know
• Predecessors
  • (from batch scheduler)
• End times
  • (from actual executions)
We are answering a question, not predicting the future
We just need to explain what happened
Look at the jobs in the critical path for job X for anomalies
Critical path simplified
• Start at job X
• Find the predecessor that ended last; that was the critical predecessor to X
  • Call that job W
• Find the last predecessor of W, call it V
• Repeat until:
  • You go back some number of levels, or
  • You reach a time dependency
• The resulting list is the critical path, for the day under study, for job X
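The backward walk takes only a few lines of code. This is a minimal JavaScript sketch, assuming the end times and predecessor lists are already in memory (the sample application reads them from its XML files instead). The data is the timed simple example on the next slide, in minutes after midnight. `timeDependent` is a hypothetical set of jobs held by a time-of-day dependency, and `maxLevels` stands in for "some number of levels".

```javascript
// Convert "HH:MM" to minutes after midnight.
function hm(s) {
  const [h, m] = s.split(":").map(Number);
  return h * 60 + m;
}

// Timed simple example: actual end times and predecessor lists.
const endTime = {
  A: hm("19:15"), B: hm("19:35"), C: hm("19:45"),
  D: hm("19:30"), E: hm("19:55"), F: hm("19:50"),
};
const preds = { A: [], B: ["A"], C: ["A", "D"], D: [], E: ["B", "C"], F: ["C"] };

// Hypothetical: jobs whose start is gated by a time-of-day dependency.
// The example has none, so the set starts empty.
const timeDependent = new Set();

// Walk backwards from `job`, following the predecessor that ended last.
function criticalPathFor(job, maxLevels = 50) {
  const path = [job];
  let current = job;
  for (let level = 0; level < maxLevels; level++) {
    if (timeDependent.has(current)) break; // reached a time dependency
    const candidates = preds[current] || [];
    if (candidates.length === 0) break;    // reached the start of the schedule
    let latest = candidates[0];
    for (const p of candidates) {
      if (endTime[p] > endTime[latest]) latest = p; // ties keep the first listed
    }
    path.push(latest);
    current = latest;
  }
  return path;
}

console.log(criticalPathFor("E").join(" - ")); // E - C - D
```

Only end times are compared, so durations never need to be known; that is why this question is so much easier than predicting the longest path.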
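Simple example
[Figure: the example network annotated with durations and end times; Start is at 19:00]
Task A: 15 mins, ended 19:15
Task B: 20 mins, ended 19:35
Task C: 15 mins, ended 19:45
Task D: 30 mins, ended 19:30
Task E: 10 mins, ended 19:55
Task F: 5 mins, ended 19:50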
Working backwards…
• E ended last (19:55)
• What is E’s last predecessor? C (19:45) ended after B (19:35)
• What is C’s last predecessor? D (19:30) ended after A (19:15)
Critical path is E – C – D
Tasks A, B, F had no direct bearing on the end time
Complicating simplicity…
• Schedule changes every day
  • Weekly / monthly processing
  • Application changes
• Schedule relationships may not be pristine
• Jobs may be run multiple times; be sure to use the correct instance
• If you want to graph the entire schedule, it gets more complicated
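One possible rule for "use the correct instance" is sketched below in JavaScript. It assumes that each run has numeric `start` and `end` times on a common clock (for example, seconds since the start of the batch day). It also assumes that the predecessor instance that released a successor is the latest-ending run that finished no later than the successor started. The function and field names are hypothetical, and the rule ignores cases such as manual releases or forced completions.

```javascript
// Pick the predecessor run that most plausibly released `succRun`:
// the latest-ending run that finished no later than succRun started.
// Returns null when no run qualifies (e.g. every run came afterwards).
function pickPredRun(predRuns, succRun) {
  let best = null;
  for (const run of predRuns) {
    if (run.end > succRun.start) continue; // ended too late to be the trigger
    if (best === null || run.end > best.end) best = run;
  }
  return best;
}

// Hypothetical runs of one predecessor: a normal run, a rerun after a
// failure, and an ad hoc run much later in the day.
const predRuns = [
  { id: "run1", start: 100, end: 130 },
  { id: "rerun", start: 200, end: 215 },
  { id: "adhoc", start: 900, end: 940 },
];

console.log(pickPredRun(predRuns, { start: 216 }).id); // rerun
console.log(pickPredRun(predRuns, { start: 140 }).id); // run1
```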
Why bother finding the CP?
• Limit the data you need to look at to investigate a late-finishing job
  • The cause is on the CP
• Find changes
  • If the CP changes day to day: why?
• Investigate impact of periodic schedule differences
  • Addition of monthly processing jobs may change the CP
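To see why the CP changed from one day to the next, two days' paths for the same late job can be compared as sets. This is a minimal JavaScript sketch with hypothetical function names, job names, and elapsed times. It only illustrates the comparison idea; it does not reproduce the sample application's side-by-side display.

```javascript
// Compare two days' critical paths for the same late job. Each path is a
// list of job names ordered from the late job backwards.
function compareCriticalPaths(dayA, dayB) {
  const inA = new Set(dayA);
  const inB = new Set(dayB);
  return {
    common: dayA.filter((job) => inB.has(job)), // on the CP both days
    onlyA: dayA.filter((job) => !inB.has(job)), // on the CP only on day A
    onlyB: dayB.filter((job) => !inA.has(job)), // on the CP only on day B
  };
}

// For jobs on both paths, list elapsed-time changes of at least `threshold`.
function etChanges(common, etA, etB, threshold) {
  return common
    .map((job) => ({ job, delta: etB[job] - etA[job] }))
    .filter((change) => Math.abs(change.delta) >= threshold);
}

// Hypothetical job names and elapsed times (minutes) for two days.
const day1 = ["JOBX", "JOBW", "JOBV", "JOBU"];
const day2 = ["JOBX", "JOBW", "JOBT", "JOBS", "JOBU"];
const diff = compareCriticalPaths(day1, day2);
const slower = etChanges(diff.common, { JOBX: 20, JOBW: 12, JOBU: 5 }, { JOBX: 85, JOBW: 13, JOBU: 5 }, 30);
console.log("Only on day 1:", diff.onlyA.join(", "));
console.log("Only on day 2:", diff.onlyB.join(", "));
console.log("Large ET changes:", slower.map((c) => `${c.job} ${c.delta}`).join(", "));
```

Jobs that appear on only one day mark where the paths diverge, and jobs on both paths with large elapsed-time changes are the first candidates to investigate.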
What I do
• Capture job stats every day to a performance database
  • Standard practice
  • Extract history daily to an XML file
• Extract the schedule once per day and store it for 45 days
  • Saved as XML files
  • Allows historical investigation
• A JavaScript browser application pulls both data sources and allows for investigation
Sample application
Input #1: Schedule XML file
<?xml version='1.0'?><?xml-stylesheet type='text/xsl'?>
<opc>
  <app id='#AMCSMISCBILL'>
    <op id='87' job='#AMCS331' arr='1930'><wkstn>CPUJ</wkstn>
      <desc>Online Bill Image xtract</desc>
      <pred><aid>#AMCSMISCBILL</aid><opid>81</opid></pred>
      …
      <pred><aid>#AMCSMISCBILL</aid><opid>78</opid></pred>
      <succ><aid>#SMCSDAILYRPTS</aid><opid>3</opid></succ>
      …
      <succ><aid>#AMCSDLYBKUP</aid><opid>6</opid></succ>
    </op>
    <op id='90' job='#AMX1358' arr='1930'><wkstn>CPUJ</wkstn>
      <desc>Load O/L Bill Image</desc>
      <pred><aid>#AMCSMISCBILL</aid><opid>87</opid></pred>
      <succ><aid>#AMCSMISCBILL</aid><opid>91</opid></succ>
    </op>
  </app>
• A grouping of jobs is an “application”
• A job is an “operation”
• Each job has predecessors and successors
• Arrival time (arr) is the earliest the job can run
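To show what the schedule file provides, here is a minimal JavaScript sketch that holds the sample above as plain objects and resolves predecessors to job names. In the browser, the sample application does this lookup with XPath and DOM calls instead. The `schedule` layout and the `predJobNames` and `arrivalMinutes` helpers are hypothetical. The sample elides some predecessors and successors, and ops 81 and 78 are not shown, so they resolve to `null` here.

```javascript
// Hypothetical plain-object form of the schedule XML above, keyed by
// application id and then operation id. Only fields shown in the sample
// are included; the sample itself elides other preds and succs.
const schedule = {
  "#AMCSMISCBILL": {
    87: {
      job: "#AMCS331", arr: "1930", wkstn: "CPUJ",
      preds: [{ aid: "#AMCSMISCBILL", opid: 81 }, { aid: "#AMCSMISCBILL", opid: 78 }],
    },
    90: {
      job: "#AMX1358", arr: "1930", wkstn: "CPUJ",
      preds: [{ aid: "#AMCSMISCBILL", opid: 87 }],
    },
  },
};

// Resolve an operation's predecessors (application id + op id) to job names.
// A predecessor whose operation is not in the loaded schedule maps to null.
function predJobNames(appId, opId) {
  const app = schedule[appId];
  const op = app && app[opId];
  if (!op) return [];
  return op.preds.map((p) => {
    const predOp = schedule[p.aid] && schedule[p.aid][p.opid];
    return predOp ? predOp.job : null;
  });
}

// Assumption: arr is a four-digit HHMM clock time, so "1930" means 19:30.
function arrivalMinutes(arr) {
  return Number(arr.slice(0, 2)) * 60 + Number(arr.slice(2, 4));
}

console.log(predJobNames("#AMCSMISCBILL", 90).join(", ")); // #AMCS331
console.log(arrivalMinutes("1930")); // 1170
```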
Sample application
Input #2: Job Data XML file
<job id="#AMCS331">
  <sys>COCJ</sys><cls>0</cls><desc>MCSX4000</desc><cnt>24</cnt>
  <acpu>2.18</acpu><aet>13.8</aet><mcpu>2.70</mcpu><met>26.7</met>
  <run i="2009-05-09 2:01:55" rse=" 1:59:46, 2:01:56, 2:02:09">
    <et>13.7</et><cp>2.16</cp><io>21632</io>
  </run>
  <run i="2009-05-08 0:51:46" rse=" 0:51:45, 0:51:48, 0:52:02">
    <et>16.5</et><cp>2.11</cp><io>21690</io>
  </run>
  <run i="2009-05-07 1:24:41" rse=" 1:22:04, 1:24:41, 1:24:52">
    <et>11.4</et><cp>2.12</cp><io>21740</io><sys>CO1J</sys>
  </run>
</job>
• Jobs by name (the job id)
• Norms, averages, maximums (the acpu / aet / mcpu / met line)
• May have multiple runs per day
• Read, start, end times (rse)
• Stats for a single run (the elements inside each <run>)
Note the relatively compact format, designed to reduce the file size; unfortunately, that makes the data more complex to interpret.
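Decoding the compact `rse` attribute is one place where that complexity shows. This is a minimal JavaScript sketch that assumes each value is three comma-separated H:MM:SS clock times in read, start, end order, as the slide labels them. It also assumes that a time earlier than the one before it means the run crossed midnight. `toSeconds` and `parseRse` are hypothetical helpers, not functions from the sample application.

```javascript
// Convert an "H:MM:SS" clock time to seconds since midnight.
function toSeconds(hms) {
  const [h, m, s] = hms.trim().split(":").map(Number);
  return h * 3600 + m * 60 + s;
}

// Parse an rse attribute such as " 1:59:46, 2:01:56, 2:02:09".
// Assumption: if a time is earlier than the one before it, the run
// crossed midnight, so 24 hours are added to keep the times in order.
function parseRse(rse) {
  const secs = rse.split(",").map((t) => toSeconds(t));
  for (let i = 1; i < secs.length; i++) {
    if (secs[i] < secs[i - 1]) secs[i] += 24 * 3600;
  }
  const [read, start, end] = secs;
  return { read, start, end };
}

const run = parseRse(" 1:59:46, 2:01:56, 2:02:09");
console.log(run.start - run.read, run.end - run.start); // 130 13
```

With read, start, and end as plain numbers, the end time needed for the backward walk and the queue time before a job started are simple subtractions.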
Sample application output
[Screenshot: critical paths for two days compared side by side, with annotations:]
• Compare days in two windows
• Why later on 12/29?
• Completely different CP here
• Critical path comes back together here
• Here’s an ET difference of >1 hour and a >2x CPU increase!
Sample application details
• On CD at back of room
• Very simple example coded quickly one Sunday afternoon
  • May not be bug free
  • Will not satisfy all your needs
• For illustrative purposes only; easier to understand than the examples in the paper
HTML / JavaScript application
• Data is in XML files
• Internet Explorer only
• Uses XPath & XSLT
  • Beyond the scope of this presentation
Application flow
• When the HTML page loads
  • Calls init() to load the XML files
  • Selection criteria are populated in the HTML
• When the user clicks the “Find It” button
  • Calls findIt() to find the critical path, which in turn calls:
    • getRunsDate(job, date) – returns an array of executions of a job on a given batch date
    • getLatestPred(job, app) – returns the predecessor that ran last
getRunsDate is an example of using DOM functions to extract data from XML
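For comparison, here is roughly what getRunsDate(job, date) does, sketched in JavaScript over a plain object instead of the DOM. The `jobData` object transcribes the job data XML sample shown earlier. Treating the date part of the `i` attribute as the batch date is an assumption made for this sketch. A batch day that crosses midnight would need a cutoff time instead.

```javascript
// Hypothetical plain-object form of the job data XML sample.
const jobData = {
  "#AMCS331": {
    runs: [
      { i: "2009-05-09 2:01:55", rse: " 1:59:46, 2:01:56, 2:02:09", et: 13.7, cp: 2.16, io: 21632 },
      { i: "2009-05-08 0:51:46", rse: " 0:51:45, 0:51:48, 0:52:02", et: 16.5, cp: 2.11, io: 21690 },
      { i: "2009-05-07 1:24:41", rse: " 1:22:04, 1:24:41, 1:24:52", et: 11.4, cp: 2.12, io: 21740 },
    ],
  },
};

// Return every run of `job` whose i attribute falls on `date` (YYYY-MM-DD).
// A job may run more than once per day, so the result is always an array.
function getRunsDate(job, date) {
  const entry = jobData[job];
  if (!entry) return []; // unknown job: no runs
  return entry.runs.filter((run) => run.i.split(" ")[0] === date);
}

console.log(getRunsDate("#AMCS331", "2009-05-08").length); // 1
```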
Critical parts of the code
[Screenshot of the findIt() source, annotated:]
• Get runs for this job and date
• Save stats for this job to an array
• Then find preds and add them to the array
• Loop through the array and build an HTML table
[Screenshot of the getLatestPred(cjob, capp) source, annotated:]
• XPath to get an array of preds
• Use DOM calls to get data from XML
• XPath to find the job name from the op id
• Get the runs for a pred job
• Check each run to see if it is the latest pred so far
• If it is, save it
That’s essentially it!
• getRunsDate(job, date) is nothing special; it simply retrieves a list of runs from the XML file
• Typical housekeeping code: initializing variables, etc.
• The sample in the paper was much more complicated because it was pulled from the application that does the graphing
Summary
• Critical path analysis for performance analysis usually involves answering why, not predicting the future
• In such a case, start at the end and work backwards
• That type of analysis is easy to code
• Record a snapshot of the schedule daily, as well as the job performance
Questions / comments?
Bonus material: Export to Excel!
• Sometimes you need to play with your data
• Copy to Excel to
  • Re-sort
  • Filter
  • Pivot tables
  • Graph
  • Summarize
• Cut and paste HTML tables works well
• Even better: automate it
New & improved: sendExcel() function on CD
• If Excel is not open, open it
• If the previous workbook is not open
  • Open a new workbook
• Add a new sheet with the data to the workbook
• Requires IE and Excel
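sendExcel() depends on Internet Explorer and Excel being present. Where that is not an option, one portable fallback is to build tab-separated text, which Excel splits into cells when it is pasted into a worksheet. This is a different technique from the function on the CD. The sketch below is a minimal JavaScript version, and `toTsv` is a hypothetical name.

```javascript
// Build tab-separated text from a header row and data rows.
// Tabs and line breaks inside values are replaced with spaces so they
// cannot split a cell or a row.
function toTsv(headers, rows) {
  const clean = (value) => String(value).replace(/[\t\r\n]+/g, " ");
  return [headers, ...rows].map((row) => row.map(clean).join("\t")).join("\n");
}

// Runs of #AMCS331 from the job data sample: date, ET, CPU.
const text = toTsv(["Date", "ET", "CPU"], [["2009-05-09", 13.7, 2.16], ["2009-05-08", 16.5, 2.11]]);
console.log(text);
```

Pasting `text` into a worksheet gives one column per field, ready for re-sorting, filtering, or a pivot table.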
Useful references
• XPath, XSLT, XML quick reference cards
  • http://www.mulberrytech.com/quickref/index.html
• Browser Book for Web Designers
  • http://www.visibone.com/products/browserbook.html
• Humor for geeks
  • http://xkcd.com/