Process Mining: The next step in Business Process Management

Prof.dr.ir. Wil van der Aalst
Eindhoven University of Technology
Department of Information and Technology
P.O. Box 513, 5600 MB Eindhoven
The Netherlands
w.m.p.v.d.aalst@tm.tue.nl
&
Centre for Information Technology Innovation (CITI)
Queensland University of Technology (QUT)
Brisbane, Australia
Outline
• Motivation
• Overview of process mining
– Basic performance metrics
– Process models
– Organizational models
– Social networks
– Performance characteristics
• Process Mining: Some of our tools
– EMiT
– Thumb
– MinSocN
• Conclusion
Workflow/BPM in The Netherlands
• “The Netherlands is the country with the highest
density of workflow systems per capita.” John
O'Connell (CEO, Staffware)
(cf. population density per sq. km:
390 versus 2.5 for Australia)
• Emphasis on process modeling
and analysis (the European way)
• Innovative companies like Pallas
Athena, Baan, …
I&T department, Eindhoven University of Technology
• Embedded in research institute BETA joining multiple
disciplines
• Three subgroups:
– Business Process Management
(workflow management, Petri
nets, mining, ...)
– ICT Architectures
(agents, transactions, ...)
– Software Engineering
(software quality, ...)
• Team working on process mining: Wil van der Aalst, Ton
Weijters, Ana Karla Alves de Medeiros, Boudewijn van
Dongen, Eric Verbeek, Minseok Song, Monique Vullers-Jansen, Laura Maruster, …
Motivation
25 years of workflow
[Figure: timeline of commercial workflow systems, 1980 to 2000 (Zur Muehlen 2003)]
• Pioneers like Skip Ellis and Michael Zisman already worked on “office
automation” in the 1970s.
• The WFM hype is over, but there are more and more applications, it has
become a mature technology, and WFM has been adopted by many other
technologies (ERP, Web Services, etc.).
Let us reverse the process!
[Figure: order-handling process model (Start, Register order, Prepare shipment, (Re)send bill, Ship goods, Contact customer, Receive payment, Archive order, End), labeled "process mining"]
• Process mining can be used for:
– Process discovery (What is the process?)
– Delta analysis (Are we doing what was specified?)
– Performance analysis (How can we improve?)
• Particularly interesting in pre- and post-workflow
settings!
Process mining: Overview
Classification of process mining
The following types of process mining can be distinguished:
1) Determine basic performance metrics
2) Determine process model
3) Determine organizational model
4) Analyze social network (i.e., relations between actors)
5) Analyze performance characteristics (i.e., derive rules
explaining performance)
[Figure: the five types of process mining arranged around the order-handling process model: 1) basic performance metrics, 2) process model, 3) organizational model, 4) social network, 5) performance characteristics (If … then …)]
(1) Determine basic performance metrics
• Process/control-flow perspective: flow time, waiting time,
processing time and synchronization time.
Questions:
– What is the average flow time of orders?
– What is the maximum waiting time for activity approve?
– What percentage of requests is handled within 10 days?
– What is the minimum processing time of activity reject?
– What is the average time between scheduling an activity and actually starting it?
• Resource perspective: frequencies, time, utilization, and
variability.
Questions:
– How many times did Sue complete activity reject claim?
– How many times did John withdraw activity go shopping?
– How many times did Clare suspend some running activity?
– How much time did Peter work on instances of activity reject claim?
– How much time did people with role Manager work on this process?
– What is the utilization of John?
– What is the average utilization of people with role Manager?
– How many times did John work for more than 2 hours without interruption?
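Questions of this kind can be answered directly from timestamped events. The following is a minimal sketch, not the method of any particular tool: the log contents, the case ids, and the event types "schedule", "start", and "complete" are all assumptions made for illustration.

```python
from datetime import datetime
from statistics import mean

# Hypothetical log: (case id, activity, event type, timestamp).
# The event types "schedule", "start", "complete" are assumed for illustration.
LOG = [
    ("order-1", "approve", "schedule", "2003-02-05 09:00"),
    ("order-1", "approve", "start",    "2003-02-05 11:00"),
    ("order-1", "approve", "complete", "2003-02-05 11:30"),
    ("order-2", "approve", "schedule", "2003-02-05 10:00"),
    ("order-2", "approve", "start",    "2003-02-05 10:30"),
    ("order-2", "approve", "complete", "2003-02-05 12:00"),
]

def parse(ts):
    return datetime.strptime(ts, "%Y-%m-%d %H:%M")

def hours(delta):
    return delta.total_seconds() / 3600.0

def activity_times(log):
    """Return {(case, activity): (waiting_hours, processing_hours)}."""
    stamps = {}
    for case, activity, event, ts in log:
        stamps.setdefault((case, activity), {})[event] = parse(ts)
    needed = ("schedule", "start", "complete")
    return {
        key: (hours(ev["start"] - ev["schedule"]),
              hours(ev["complete"] - ev["start"]))
        for key, ev in stamps.items()
        if all(name in ev for name in needed)
    }

def flow_times(log):
    """Return {case: hours between the first and last event of the case}."""
    first, last = {}, {}
    for case, _, _, ts in log:
        t = parse(ts)
        first[case] = min(first.get(case, t), t)
        last[case] = max(last.get(case, t), t)
    return {case: hours(last[case] - first[case]) for case in first}

times = activity_times(LOG)
# "What is the maximum waiting time for activity approve?"
max_waiting_approve = max(w for (_, act), (w, _) in times.items() if act == "approve")
# "What is the average flow time of orders?"
avg_flow_time = mean(flow_times(LOG).values())
```

Waiting time is taken here as the gap between scheduling and starting an activity, and processing time as the gap between starting and completing it; other definitions are possible.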
Example (ARIS PPM)
IDS Scheer's ARIS Process Performance Manager
(2) Determine process model
• Discover a process model (e.g., in terms of a PN or EPC)
without prior knowledge about the structure of the process.
case 1 : task A
case 2 : task A
case 3 : task A
case 3 : task B
case 1 : task B
case 1 : task C
case 2 : task C
case 4 : task A
case 2 : task B
case 2 : task D
case 5 : task E
case 4 : task C
case 1 : task D
case 3 : task C
case 3 : task D
case 4 : task B
case 5 : task F
case 4 : task D
[Figure: process model α(W) discovered from the log, with tasks A, B, C, D, E, and F]
(3) Determine organizational model
• Discover the organizational model (i.e., roles,
departments,etc.) without prior knowledge about
the structure of the organization.
     John  Alex  Lucia  Peter  Mary
A      88     0    112      0     0
B       0   189      0     11     0
C       8     0      0    192     0
D       0     2      0      0   198
E      38     0     62      0     0
F      50     0     40      0     0
e.g., correspondence analysis (typically
applied in ecology)
[Figure: "Row Points for Source", Symmetrical Normalization; John, Alex, Lucia, Peter, and Mary plotted on Dimension 1 versus Dimension 2]
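To illustrate how such a frequency table can hint at roles, here is a minimal sketch. It does not perform correspondence analysis; it swaps in a simpler technique (cosine similarity between frequency profiles, grouped by single linkage). It assumes the rows A to F denote activities, and the threshold of 0.5 is an arbitrary choice for illustration.

```python
from math import sqrt

# Frequencies per performer, taken from the slide's table
# (assumption: rows A..F are activities; zero entries are omitted).
COUNTS = {
    "John":  {"A": 88, "C": 8, "E": 38, "F": 50},
    "Alex":  {"B": 189, "D": 2},
    "Lucia": {"A": 112, "E": 62, "F": 40},
    "Peter": {"B": 11, "C": 192},
    "Mary":  {"D": 198},
}

def cosine(p, q):
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(p[a] * q.get(a, 0) for a in p)
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

def candidate_roles(counts, threshold=0.5):
    """Group performers whose profiles are similar (single linkage)."""
    groups = []
    for person in counts:
        linked = [g for g in groups
                  if any(cosine(counts[person], counts[other]) >= threshold
                         for other in g)]
        merged = {person}.union(*linked)
        groups = [g for g in groups if g not in linked] + [merged]
    return groups

roles = candidate_roles(COUNTS)
```

On this table the sketch puts John and Lucia in one group and leaves Alex, Peter, and Mary on their own, which matches the intuition that John and Lucia mostly execute the same tasks.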
(4) Analyze social network
• Social Network
Analysis (SNA)
• Based on:
– Handover of work
– Subcontracting
– Working together
– Reassignments
– Doing similar tasks
Example
        John  Alex  Lucia  Peter  Mary
John       0     0      0      0     2
Alex       0     0      0      0     0
Lucia      0     0      0      2     2
Peter      0     0      2      0     2
Mary       2     0      2      2     0
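As an illustration of the first metric, handover of work, the sketch below counts how often one performer is directly followed by another within a case. The cases and their performers are hypothetical and are not the data behind the matrix on the slide.

```python
from collections import Counter

# Hypothetical log: per case, the performers of its activities in execution order.
CASES = {
    1: ["John", "Mary", "Lucia"],
    2: ["Mary", "Peter", "Lucia"],
    3: ["John", "Mary"],
}

def handover_of_work(cases):
    """Count how often work passes directly from one performer to another."""
    handovers = Counter()
    for performers in cases.values():
        for giver, receiver in zip(performers, performers[1:]):
            if giver != receiver:  # consecutive work by one person is no handover
                handovers[(giver, receiver)] += 1
    return handovers

handovers = handover_of_work(CASES)
```

The resulting counts can be read as a weighted, directed graph between performers, which is the input that standard SNA measures expect.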
(5) Analyze performance characteristics
• Each case (process/workflow instance) has a
number of properties:
– Resource that worked on a specific activity
– Value of a characteristic data element (e.g., size of
order, age of customer, etc.)
– Performance metrics of case (e.g., flow time)
• Using machine-learning techniques it is possible
to find relevant relations between these properties.
Example
caseid  Act A  Act B  ...  Act Z  Data D1  Data D2  ...  Data D9  Proc time  Wait time  Flow time
1       John   Mike   ...  Anne   $50      20y      ...  80%      12h        3d         3.5d
2       Clare  Jim    ...  Ike    $75      15y      ...  75%      6h         3d         3.25d
3       John   Mike   ...  Clare  $55      20y      ...  80%      18h        4d         4.75d
...     ...    ...    ...  ...    ...      ...      ...  ...      ...        ...        ...
• If John and Mike work together, it takes longer.
• Expensive cases require less processing.
• Etc.
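A rule such as "if John and Mike work together, it takes longer" amounts to comparing a performance metric between cases that satisfy a condition and cases that do not. The sketch below does only that comparison on the three example cases; it is not a machine-learning technique, and three cases are far too few to support a real conclusion. The field names are assumptions.

```python
from statistics import mean

# The three example cases: performers of activities A and B, flow time in days.
CASES = [
    {"case": 1, "A": "John",  "B": "Mike", "flow_days": 3.5},
    {"case": 2, "A": "Clare", "B": "Jim",  "flow_days": 3.25},
    {"case": 3, "A": "John",  "B": "Mike", "flow_days": 4.75},
]

def mean_flow_by(cases, predicate):
    """Mean flow time of the cases matching the predicate, and of the rest."""
    hit = [c["flow_days"] for c in cases if predicate(c)]
    miss = [c["flow_days"] for c in cases if not predicate(c)]
    return (mean(hit) if hit else None, mean(miss) if miss else None)

together, otherwise = mean_flow_by(
    CASES, lambda c: {c["A"], c["B"]} == {"John", "Mike"})
```

A rule learner would search over many such conditions (resources, data values) instead of testing one that was chosen by hand.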
Process mining: The tools
• EMiT
• Thumb
• MinSocN
Process Mining: Tooling
• workflow management systems: Staffware, InConcert, MQ Series
• case handling / CRM systems: FLOWer, Vectus, Siebel
• ERP systems: SAP R/3, BaaN, Peoplesoft
• common XML format for storing/exchanging workflow logs
• mining tools: EMiT, Thumb, MinSocN
Example: processing customer orders
Example in Staffware: 7 tasks and all basic routing constructs
Fragment of Staffware log
Case 21
Diractive Description  Event         User               yyyy/mm/dd hh:mm
------------------------------------------------------------------------
                       Start         swdemo@staffw_edl  2003/02/05 15:00
Register order         Processed To  swdemo@staffw_edl  2003/02/05 15:00
Register order         Released By   swdemo@staffw_edl  2003/02/05 15:00
Prepare shipment       Processed To  swdemo@staffw_edl  2003/02/05 15:00
(Re)send bill          Processed To  swdemo@staffw_edl  2003/02/05 15:00
(Re)send bill          Released By   swdemo@staffw_edl  2003/02/05 15:01
Receive payment        Processed To  swdemo@staffw_edl  2003/02/05 15:01
Prepare shipment       Released By   swdemo@staffw_edl  2003/02/05 15:01
Ship goods             Processed To  swdemo@staffw_edl  2003/02/05 15:01
Ship goods             Released By   swdemo@staffw_edl  2003/02/05 15:02
Receive payment        Released By   swdemo@staffw_edl  2003/02/05 15:02
Archive order          Processed To  swdemo@staffw_edl  2003/02/05 15:02
Archive order          Released By   swdemo@staffw_edl  2003/02/05 15:02
                       Terminated                       2003/02/05 15:02
Case 22
Diractive Description  Event         User               yyyy/mm/dd hh:mm
------------------------------------------------------------------------
                       Start         swdemo@staffw_edl  2003/02/05 15:02
Register order         Processed To  swdemo@staffw_edl  2003/02/05 15:02
Register order         Released By   swdemo@staffw_edl  2003/02/05 15:02
Prepare shipment       Processed To  swdemo@staffw_edl  2003/02/05 15:02
Fragment of XML file
<?xml version="1.0"?>
<!DOCTYPE WorkFlow_log SYSTEM
"http://www.tm.tue.nl/it/research/workflow/mining/WorkFlow_log.dtd">
<WorkFlow_log>
<source program="staffware"/>
<process id="main_process">
<case id="case_0">
<log_line>
<task_name>Case start</task_name>
<event kind="normal"/>
<date>05-02-2003</date>
<time>15:04</time>
</log_line>
<log_line>
<task_name>Register order</task_name>
<event kind="schedule"/>
<date>05-02-2003</date>
<time>15:04</time>
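A log in this shape can be produced with a few lines of code. The sketch below uses only the element and attribute names visible in the fragment above; the full DTD is not shown here, so anything beyond these names is unknown. The function name `build_log` and its input layout are hypothetical.

```python
import xml.etree.ElementTree as ET

def build_log(source_program, process_id, cases):
    """Build a WorkFlow_log tree.

    cases: {case id: [(task name, event kind, date, time), ...]}
    Only elements that appear in the fragment are generated.
    """
    root = ET.Element("WorkFlow_log")
    ET.SubElement(root, "source", program=source_program)
    process = ET.SubElement(root, "process", id=process_id)
    for case_id, lines in cases.items():
        case = ET.SubElement(process, "case", id=case_id)
        for task, kind, date, time in lines:
            line = ET.SubElement(case, "log_line")
            ET.SubElement(line, "task_name").text = task
            ET.SubElement(line, "event", kind=kind)
            ET.SubElement(line, "date").text = date
            ET.SubElement(line, "time").text = time
    return root

root = build_log("staffware", "main_process", {
    "case_0": [
        ("Case start", "normal", "05-02-2003", "15:04"),
        ("Register order", "schedule", "05-02-2003", "15:04"),
    ],
})
xml_text = ET.tostring(root, encoding="unicode")
```

The XML declaration and the DOCTYPE line of the fragment are not emitted by this sketch and would have to be written separately.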
EMiT
Focus on time.
Thumb
Focus on noise.
Thumb is able to deal with noise (D/F-graphs)
[Figure: D/F-graphs mined with no noise and with 10% noise; label: causality]
Representation in terms of an EPC… (collaboration with IDS Scheer)
[Figure: order-handling process as an EPC: Start, Register order, Prepare shipment, (Re)send bill, Ship goods, Contact customer, Receive payment, Archive order, End]
MinSocN (Mining Social Networks)
Real case: CJIB
• Processing of fines
• 130136 cases
• 99 different activities
Process in EMiT
Complete process model
Validated by CJIB
Conclusion
Conclusion (1)
• Process mining is practically relevant and the logical
next step in Business Process Management.
[Figure: BPM life cycle: process design, implementation/configuration, process enactment, diagnosis]
Conclusion (2)
• Process mining provides many interesting challenges for
scientists, customers, users, managers, consultants, and
tool developers.
[Figure: the five types of process mining arranged around the order-handling process model: 1) basic performance metrics, 2) process model, 3) organizational model, 4) social network, 5) performance characteristics (If … then …)]
More information
http://www.tm.tue.nl/it/research/workflow_mining.htm
http://www.tm.tue.nl/it/research/patterns
http://www.tm.tue.nl/it/staff/wvdaalst
W.M.P. van der Aalst and K.M. van Hee.
Workflow Management: Models, Methods, and
Systems.
MIT press, Cambridge, MA, 2002.
References BPM (just books and far from complete)
• W.M.P. van der Aalst and K.M. van Hee. Workflow Management:
Models, Methods, and Systems. MIT press, Cambridge, MA, 2002.
• Workflow Management: Modeling Concepts, Architecture and
Implementation by Stefan Jablonski and Christoph Bussler; Paperback:
351 pages; International Thomson Publishing, October 1996.
• Production Workflow: Concepts and Techniques, by Frank Leymann,
Dieter Roller, Andreas Reuter; Paperback, 479 pages; Prentice Hall
PTR, 1st edition, September 1999.
• Workflow-Based Process Controlling: Foundation, Design and
Application of Workflow-Driven Process Information Systems, by
Michael Zur Muehlen. Logos, Berlin, 2003
• Proceedings of the International Conference on Business Process
Management (BPM), Eindhoven, The Netherlands, June 26-27, 2003, by
Wil M. P. van der Aalst, Arthur H. M. ter Hofstede, and Mathias Weske
(Editors); Paperback, 391 pages; Springer Verlag, 2003.
• W.M.P. van der Aalst, J. Desel, and A. Oberweis, editors. Business
Process Management: Models, Techniques, and Empirical Studies,
volume 1806 of Lecture Notes in Computer Science. Springer-Verlag,
Berlin, 2000.
References (2)
• Internet Based Workflow Management: Towards a Semantic Web by Dan C.
Marinescu; Hardcover, 626 pages; John Wiley & Sons, 1st edition, April 2002.
• Web Services, by Gustavo Alonso, Fabio Casati, Harumi Kuno, and Vijay
Machiraju; Hardcover, 480 pages, Springer Verlag, June 2003.
• The Workflow Imperative, by Thomas M. Koulopoulos; Hardcover, 240 pages;
Van Nostrand Reinhold, 1st edition, January 1995.
• Database Support for Workflow Management: The WIDE Project, by Paul
Grefen, Barbara Pernici, and Gabriel Sanchez (Editors); Hardcover, 296 pages.
Kluwer Academic Publishers, February, 1999.
• Design and Control of Workflow Processes: Business Process Management for
the Service Industry (Lecture Notes in Computer Science # 2617), by Hajo
Reijers; Paperback, 320 pages; Springer Verlag; October 2003.
• Practical Workflow for SAP - Effective Business Processes using SAP's
WebFlow Engine, by Alan Rickayzen et al; Hardcover, 52 pages; SAP Press,
July 2002.
• Workflow Modeling: Tools for Process Improvement and Application
Development, by Alec Sharp and Patrick McDermott, Hardcover, 345 pages;
Artech House, 1st edition, February 2001.
• Business Process Modelling With ARIS: A Practical Guide, by Rob Davis;
Paperback, 545 pages; Springer Verlag, August 2001.
References (3)
• Workflow Handbook 2003, by Layna Fischer (Editor); Hardcover, 384 pages. Future
Strategies, April 2003.
Specific for process mining:
• W.M.P. van der Aalst, B.F. van Dongen, J. Herbst, L. Maruster, G. Schimm, and
A.J.M.M. Weijters. Workflow Mining: A Survey of Issues and Approaches. Data and
Knowledge Engineering, 47(2):237-267, 2003.
• W.M.P. van der Aalst and B.F. van Dongen. Discovering Workflow Performance
Models from Timed Logs. EDCIS 2002, volume 2480 of Lecture Notes in Computer
Science, pages 45-63. Springer-Verlag, Berlin, 2002.
• A.J.M.M. Weijters and W.M.P. van der Aalst. Rediscovering Workflow Models from
Event-Based Data using Little Thumb. Integrated Computer-Aided Engineering,
10(2):151-162, 2003.
• W.M.P. van der Aalst and A.J.M.M. Weijters, editors. Process Mining, Special Issue
of Computers in Industry, Elsevier Science Publishers, Amsterdam, 2004.
• W.M.P. van der Aalst, A.J.M.M. Weijters, and L. Maruster. Workflow Mining:
Discovering Process Models from Event Logs. IEEE Transactions on Knowledge
and Data Engineering (to appear).
Appendix: A concrete algorithm
Process Mining: The alpha algorithm
[Figure: process model with 22 numbered tasks (labels in Dutch), annotated "alpha algorithm"]
Process log
• Minimal information in
log: case id’s and task
id’s.
• Additional information:
event type, time,
resources, and data.
• In this log there are three
possible sequences:
– ABCD
– ACBD
– EF
case 1 : task A
case 2 : task A
case 3 : task A
case 3 : task B
case 1 : task B
case 1 : task C
case 2 : task C
case 4 : task A
case 2 : task B
case 2 : task D
case 5 : task E
case 4 : task C
case 1 : task D
case 3 : task C
case 3 : task D
case 4 : task B
case 5 : task F
case 4 : task D
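The three sequences can be recovered by grouping the log entries per case while preserving their order. A minimal sketch, with the log entered as (case id, task) pairs in the order shown on this slide:

```python
from collections import Counter

# The workflow log in recorded order: (case id, task).
LOG = [
    (1, "A"), (2, "A"), (3, "A"), (3, "B"), (1, "B"), (1, "C"),
    (2, "C"), (4, "A"), (2, "B"), (2, "D"), (5, "E"), (4, "C"),
    (1, "D"), (3, "C"), (3, "D"), (4, "B"), (5, "F"), (4, "D"),
]

def traces(log):
    """Return {case id: sequence of tasks as a string}."""
    by_case = {}
    for case, task in log:
        by_case.setdefault(case, []).append(task)
    return {case: "".join(tasks) for case, tasks in by_case.items()}

# Distinct sequences and how many cases follow each of them.
variants = Counter(traces(LOG).values())
```

Cases 1 and 3 follow ABCD, cases 2 and 4 follow ACBD, and case 5 follows EF.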
>, →, ||, # relations
• Direct succession: x>y iff for some case x is directly followed by y.
• Causality: x→y iff x>y and not y>x.
• Parallel: x||y iff x>y and y>x.
• Choice: x#y iff not x>y and not y>x.
case 1 : task A
case 2 : task A
case 3 : task A
case 3 : task B
case 1 : task B
case 1 : task C
case 2 : task C
case 4 : task A
case 2 : task B
case 2 : task D
case 5 : task E
case 4 : task C
case 1 : task D
case 3 : task C
case 3 : task D
case 4 : task B
case 5 : task F
case 4 : task D
Direct succession: A>B, A>C, B>C, B>D, C>B, C>D, E>F
Parallel: B||C, C||B
Causality: A→B, A→C, B→D, C→D, E→F
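The four relations defined on this slide follow mechanically from the traces. A minimal sketch, with the traces of the example log written as strings (one character per task):

```python
TRACES = ["ABCD", "ACBD", "EF"]

def direct_succession(traces):
    """x>y: in some trace, x is directly followed by y."""
    return {(x, y) for t in traces for x, y in zip(t, t[1:])}

def relations(traces):
    succ = direct_succession(traces)
    tasks = sorted({task for t in traces for task in t})
    causal = {(x, y) for (x, y) in succ if (y, x) not in succ}    # x→y
    parallel = {(x, y) for (x, y) in succ if (y, x) in succ}      # x||y
    choice = {(x, y) for x in tasks for y in tasks                # x#y
              if (x, y) not in succ and (y, x) not in succ}
    return succ, causal, parallel, choice

succ, causal, parallel, choice = relations(TRACES)
```

Note that under this definition every task is in the choice relation with itself unless it directly follows itself, and that pairs such as A and D, which never follow each other directly, are also related by #.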
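Basic idea (1)
x→y
[Figure: Petri net fragment, sequence from x to y]
Basic idea (2)
x→y, x→z, and y||z
[Figure: Petri net fragment, AND-split from x to y and z]
Basic idea (3)
x→y, x→z, and y#z
[Figure: Petri net fragment, XOR-split from x to y or z]
Basic idea (4)
x→z, y→z, and x||y
[Figure: Petri net fragment, AND-join of x and y into z]
Basic idea (5)
x→z, y→z, and x#y
[Figure: Petri net fragment, XOR-join of x or y into z]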
It is not that simple: Basic alpha algorithm
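Let $W$ be a workflow log over $T$. $\alpha(W)$ is defined as follows.
1. $T_W = \{ t \in T \mid \exists_{\sigma \in W}\; t \in \sigma \}$,
2. $T_I = \{ t \in T \mid \exists_{\sigma \in W}\; t = first(\sigma) \}$,
3. $T_O = \{ t \in T \mid \exists_{\sigma \in W}\; t = last(\sigma) \}$,
4. $X_W = \{ (A,B) \mid A \subseteq T_W \wedge B \subseteq T_W \wedge \forall_{a \in A}\forall_{b \in B}\; a \rightarrow_W b \wedge \forall_{a_1,a_2 \in A}\; a_1 \#_W a_2 \wedge \forall_{b_1,b_2 \in B}\; b_1 \#_W b_2 \}$,
5. $Y_W = \{ (A,B) \in X_W \mid \forall_{(A',B') \in X_W}\; A \subseteq A' \wedge B \subseteq B' \Rightarrow (A,B) = (A',B') \}$,
6. $P_W = \{ p_{(A,B)} \mid (A,B) \in Y_W \} \cup \{ i_W, o_W \}$,
7. $F_W = \{ (a, p_{(A,B)}) \mid (A,B) \in Y_W \wedge a \in A \} \cup \{ (p_{(A,B)}, b) \mid (A,B) \in Y_W \wedge b \in B \} \cup \{ (i_W, t) \mid t \in T_I \} \cup \{ (t, o_W) \mid t \in T_O \}$, and
8. $\alpha(W) = (P_W, T_W, F_W)$.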
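The eight steps translate almost literally into code. The following is a brute-force sketch for small logs only (it enumerates all subsets of tasks), not an efficient or official implementation. Two assumptions are made: traces are non-empty strings with one character per task, and pairs with an empty A or B are excluded in step 4. Places are represented by their (A, B) pair, and "i" and "o" stand for the source and sink place.

```python
from itertools import chain, combinations

def nonempty_subsets(items):
    items = sorted(items)
    return chain.from_iterable(
        combinations(items, n) for n in range(1, len(items) + 1))

def alpha(traces):
    succ = {(x, y) for t in traces for x, y in zip(t, t[1:])}
    causal = lambda a, b: (a, b) in succ and (b, a) not in succ
    choice = lambda a, b: (a, b) not in succ and (b, a) not in succ

    t_w = {task for t in traces for task in t}   # step 1: all tasks
    t_i = {t[0] for t in traces}                 # step 2: first tasks
    t_o = {t[-1] for t in traces}                # step 3: last tasks

    # Step 4: candidate pairs (empty A or B excluded, an assumption).
    x_w = [
        (set(a), set(b))
        for a in nonempty_subsets(t_w)
        for b in nonempty_subsets(t_w)
        if all(causal(p, q) for p in a for q in b)
        and all(choice(p, q) for p in a for q in a)
        and all(choice(p, q) for p in b for q in b)
    ]
    # Step 5: keep only the maximal pairs.
    y_w = [
        (a, b) for a, b in x_w
        if not any(a <= a2 and b <= b2 and (a, b) != (a2, b2)
                   for a2, b2 in x_w)
    ]
    # Steps 6 and 7: places and flow relation.
    place = lambda a, b: (frozenset(a), frozenset(b))
    p_w = {place(a, b) for a, b in y_w} | {"i", "o"}
    f_w = (
        {(p, place(a, b)) for a, b in y_w for p in a}
        | {(place(a, b), q) for a, b in y_w for q in b}
        | {("i", t) for t in t_i}
        | {(t, "o") for t in t_o}
    )
    return p_w, t_w, f_w                         # step 8

places, transitions, flow = alpha(["ABCD", "ACBD", "EF"])
```

For the example log this yields five internal places, one per causal pair (A→B, A→C, B→D, C→D, E→F), plus the source and sink places.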
Results
• If the log is complete with respect to relation >, it can be used to
mine any SWF-net!
• Structured Workflow Nets (SWF-nets) have no implicit places
and the following two constructs cannot be used:
[Figure: the two excluded constructs]
(Short loops require some refinement, but this is not a problem.)
Example
W:
case 1 : task A
case 2 : task A
case 3 : task A
case 3 : task B
case 1 : task B
case 1 : task C
case 2 : task C
case 4 : task A
case 2 : task B
case 2 : task D
case 5 : task E
case 4 : task C
case 1 : task D
case 3 : task C
case 3 : task D
case 4 : task B
case 5 : task F
case 4 : task D
α(W):
[Figure: process model discovered from W, with tasks A, B, C, D, E, and F]