12_Debugging_verteilter_Systeme

XII.1
Debugging of Distributed Systems
XII.2
Debugging of Distributed Systems
• Example of a tool for distributed systems
• Approach to fault finding during testing
• Control and inspection of the program's internal run-time behavior
XII.3
Debugging of Distributed Systems
Requirements
– User-friendliness
– Problem orientation (symbolic debugging)
(String c = „xyz“ instead of „LOC FF2243 AC32...“)
– Reproducibility (quasi-deterministic)
– Presentation of state information
(variables, registers, ports, etc.: „show c“)
– Modification of system state
(set c = „ABC“)
– Supervision mechanisms
[Figure: the user sends queries and modifications to the debugger; the debugger acts on the tested program and returns state information to the user.]
XII.4
Special problems
• Parallel processing
• Indeterminism
• Absence of a global state
• Absence of a common clock
• Interference „Debugger ↔ System“
• Resulting information flooding
• Semantics of special constructs
(breakpoints, break conditions)
• Improved functionality
(inter-process communication)
XII.5
Inter-process communication
• State information comprises the communication state in addition to the process/object state
→ Manipulative intervention is desirable
• Separation into an intra-process layer (conventional) and an inter-process layer (special)
Functionality of the inter-process layer
• Access to messages:
– insert <m> in <port>
– read <m> from <port>
– extract <m> from <port>
– forward <m> to <port>
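The four message operations could be modelled roughly as follows. This is a minimal sketch, not the tool's actual interface: the class `Port`, its method names, and the FIFO behaviour are assumptions made for illustration.

```python
from collections import deque


class Port:
    """Hypothetical message queue of a process, as seen by the debugger."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()

    def insert(self, message):
        # insert <m> in <port>: the debugger injects a message
        self.queue.append(message)

    def read(self):
        # read <m> from <port>: inspect the oldest message without removing it
        return self.queue[0] if self.queue else None

    def extract(self):
        # extract <m> from <port>: remove the oldest message from the port
        return self.queue.popleft() if self.queue else None

    def forward(self, target):
        # forward <m> to <port>: move the oldest message to another port
        message = self.extract()
        if message is not None:
            target.insert(message)
        return message
```

Here `read` leaves the communication state unchanged, while `insert`, `extract` and `forward` modify it, which is the kind of intervention the inter-process layer is meant to allow.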
XII.6
Inter-process communication
• Breakpoints
– set break <port> <mtype> [send | receive]
– set break <port1> ... <portn>
• Statistical accounting records
• Access to operating system objects
(semaphores, processes)
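A communication breakpoint of the form `set break <port> <mtype> [send | receive]` can be thought of as a filter over inter-process events. The sketch below is illustrative only: `PortBreakpoint` and `should_break` are hypothetical names, and the multi-port form is assumed here to mean "break on any of the listed ports".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PortBreakpoint:
    port: str
    mtype: Optional[str] = None      # None matches any message type
    direction: Optional[str] = None  # "send", "receive", or None for both

    def matches(self, port, mtype, direction):
        return (port == self.port
                and self.mtype in (None, mtype)
                and self.direction in (None, direction))


def should_break(breakpoints, port, mtype, direction):
    """Return True if any registered breakpoint matches the event."""
    return any(bp.matches(port, mtype, direction) for bp in breakpoints)


# set break p1 request receive
# set break p2 p3   (assumed: one breakpoint per listed port)
breakpoints = [
    PortBreakpoint("p1", "request", "receive"),
    PortBreakpoint("p2"),
    PortBreakpoint("p3"),
]
```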
XII.7
Consistent state representations
Problem: no common clock and no common storage
→ no consistent state representation
• Approaches
– Clock synchronization (in the range of milliseconds)
– Logical ordering of events
• Basis: Lamport approach
– Partial order („pre-relation“)
– Events are ordered by their causal relationship
(sending before receiving)
– Independent events remain unordered
XII.8
Consistent state representations
• Rules
– a and b in the same process, a before b: a → b
– a is the sending and b the receiving of the same message: a → b
– a → b, b → c ⇒ a → c (transitivity)
→ All events essential to distributed processing can be ordered
(consistent logical „snapshots“)
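The three rules can be checked mechanically by treating events as nodes of a graph and the first two rules as edges; transitivity then becomes reachability. The following is a minimal sketch under that assumption. The function name `happened_before` and the event names in the example are invented for illustration.

```python
def happened_before(process_events, messages):
    """Build the pre-relation and return a predicate before(a, b).

    process_events: dict mapping a process name to its events in local order
    messages: list of (send_event, receive_event) pairs
    """
    successors = {}
    for events in process_events.values():
        for event in events:
            successors.setdefault(event, set())
        # Rule 1: order within one process
        for earlier, later in zip(events, events[1:]):
            successors[earlier].add(later)
    # Rule 2: sending precedes receiving
    for send, receive in messages:
        successors.setdefault(send, set()).add(receive)
        successors.setdefault(receive, set())

    def before(a, b):
        # Rule 3: transitivity, evaluated as graph reachability
        stack, seen = list(successors[a]), set()
        while stack:
            event = stack.pop()
            if event == b:
                return True
            if event not in seen:
                seen.add(event)
                stack.extend(successors[event])
        return False

    return before


# Hypothetical run: P1 sends a message at a1 that P2 receives at b2.
before = happened_before(
    {"P1": ["a1", "a2"], "P2": ["b1", "b2"]},
    [("a1", "b2")],
)
```

In this example `a1` precedes `b2`, while `a2` and `b1` are unordered in both directions, so they count as independent events.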
XII.9
Lamport-Approach
Realization via the algorithm
– each process has an event counter Z (initially zero)
– each inter-process event E has a number N(E),
and so does each message (τ = N(E) of the sending event)
• Sending:
– increment Z (Z := Z + 1)
– mark the sending event: N(E) := Z
– mark the message: τ := Z
• Receiving a message with number τ:
– if τ > Z (receiver's counter), set Z := τ + 1
– otherwise set Z := Z + 1
– mark the receiving event: N(E) := Z
• Intra-process event:
– Z := Z + 1
– N(E) := Z
XII.10
Lamport-Approach
[Figure: space-time diagram of processes P1, P2 and P3; the events carry Lamport numbers from 1 to 12, and the number 12 occurs in both P2 and P3.]
• Causal events ordered completely
• Non-causal events → unordered (for instance, No. 12 in both P2 and P3)
XII.11
Semantics of breakpoints
Problem:
When does a breakpoint satisfy distributed conditions?
Approach:
– simple predicates (one process, „call proc“)
– disjunctive predicates („P1: call proc | P2: call xy“)
– conjunctive predicates („P1: call proc & P1: x=1“),
within a single process only
– joint predicates: coupling of events via the pre-relation:
[Figure: time lines of Process 1 (t11, t12), Process 2 (t21, t22, t23) and Process 3 (t31, t32, t33), connected by messages.]
t31, t22: ordered
t11, t21: unordered
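The first three predicate classes can be expressed as small composable functions over a local event. This is only a sketch: the helper names `simple`, `disjunctive` and `local_and`, and the representation of an event as (process, action, local variables), are assumptions for illustration. Joint predicates would additionally need the pre-relation and are not covered here.

```python
def simple(process, action):
    """Simple predicate: one process, one event, e.g. 'P1: call proc'."""
    return lambda p, a, variables: p == process and a == action


def disjunctive(*predicates):
    """Disjunctive predicate: fires if any alternative fires ('|')."""
    return lambda p, a, variables: any(
        pred(p, a, variables) for pred in predicates)


def local_and(process, action, **conditions):
    """'&' form restricted to one process: an event plus local conditions."""
    return lambda p, a, variables: (
        p == process and a == action
        and all(variables.get(k) == v for k, v in conditions.items()))


# "P1: call proc | P2: call xy"
either = disjunctive(simple("P1", "call proc"), simple("P2", "call xy"))
# "P1: call proc & P1: x=1"
both = local_and("P1", "call proc", x=1)
```

Each of these predicates can be decided by a single process from its own state, which is why they need no global coordination.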
XII.12
Consistent stopping of processes
Problem:
Time delay after a halt command is issued
Approach:
Backtracking to the consistent state directly before the stopping event
(„reset line“)
Procedure:
Backtracking along the causal relationships given by the pre-relation
of messages
[Figure: time lines of Process 1 (t11 to t14), Process 2 (t21 to t24) and Process 3 (t31 to t34), connected by messages.]
t12: stop point event
Process 2: backtracking to t23
Process 3: backtracking to t32
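One way to compute such a reset line is to cut off the stopping process at the stop event and then repeatedly undo every receive whose matching send has been cut off. The sketch below follows that idea; the function `consistent_cut` is hypothetical, the stop event itself is kept in the cut (a choice made here), and the message pairs in the example are assumed, picked so that the result matches the backtracking points named on the slide.

```python
def consistent_cut(history, messages, stop_process, stop_event):
    """Roll each process back until no received message is unsent.

    history: dict mapping a process name to its executed events in order
    messages: list of (send_event, receive_event) pairs
    """
    position = {event: (proc, index)
                for proc, events in history.items()
                for index, event in enumerate(events)}
    # cut[p] = number of events of process p that are kept
    cut = {proc: len(events) for proc, events in history.items()}
    cut[stop_process] = history[stop_process].index(stop_event) + 1

    changed = True
    while changed:
        changed = False
        for send, receive in messages:
            send_proc, send_index = position[send]
            recv_proc, recv_index = position[receive]
            # A kept receive whose send was undone breaks consistency
            if recv_index < cut[recv_proc] and send_index >= cut[send_proc]:
                cut[recv_proc] = recv_index
                changed = True
    return {proc: history[proc][:count] for proc, count in cut.items()}


history = {
    "P1": ["t11", "t12", "t13", "t14"],
    "P2": ["t21", "t22", "t23", "t24"],
    "P3": ["t31", "t32", "t33", "t34"],
}
# Assumed messages: t11 -> t22, t13 -> t24, t24 -> t33
messages = [("t11", "t22"), ("t13", "t24"), ("t24", "t33")]
state = consistent_cut(history, messages, "P1", "t12")
```

With these assumed messages, Process 2 is rolled back to t23 because t24 receives a message sent after the stop event, and Process 3 is rolled back to t32 because t33 depends on t24.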
XII.13
Distributed trace-steps
Basis:
Step mode from sequential debuggers (interactive)
– one trace-step advances execution to the next interaction point (inter-process
event)
– local calculations form a single unit
– sending operations are carried out on all participating processes
– receiving operations are carried out only if a message exists (possibly only
after the sending step)
[Figure: distributed trace-steps 1, 2 and 3; each step consists of a calculation phase that ends at an interaction point.]
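A distributed trace-step can be simulated with scripted processes. The sketch below assumes each process is a list of operations of the form `("local", label)`, `("send", target, message)` or `("recv",)`; the names `SteppedProcess` and `trace_step` are invented for this example.

```python
from collections import deque


class SteppedProcess:
    def __init__(self, name, script):
        self.name = name
        self.script = deque(script)  # remaining operations
        self.inbox = deque()         # messages waiting to be received
        self.trace = []              # operations executed so far

    def run_to_interaction_point(self):
        # Local calculations are executed together as one unit
        while self.script and self.script[0][0] == "local":
            self.trace.append(self.script.popleft())


def trace_step(processes):
    """Advance all processes by one distributed trace-step."""
    by_name = {p.name: p for p in processes}
    # Calculation phase: run up to the next inter-process event
    for p in processes:
        p.run_to_interaction_point()
    # Sending operations are carried out on all participating processes
    for p in processes:
        if p.script and p.script[0][0] == "send":
            _, target, message = p.script.popleft()
            by_name[target].inbox.append(message)
            p.trace.append(("send", target, message))
    # Receiving operations only if a message exists
    for p in processes:
        if p.script and p.script[0][0] == "recv" and p.inbox:
            p.script.popleft()
            p.trace.append(("recv", p.inbox.popleft()))


p1 = SteppedProcess("P1", [("local", "a"), ("send", "P2", "m")])
p2 = SteppedProcess("P2", [("recv",), ("local", "b")])
trace_step([p1, p2])
```

After this single step P1 has computed and sent, and P2 has received the message because it arrived in the same step's sending phase; a second call lets P2 run its local calculation.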
XII.14
Indeterminism handling
Indeterministic program behavior:
race conditions
Options:
– Testing of different possible execution sequences via distributed
single-stepping
– Re-execution / replay via output recording
Approach:
– record all inter-process events
– use the recording to control repeated execution (re-execution)
– high storage requirements, which can be reduced via checkpoints that make
earlier events unnecessary
– replay of a single process is also possible
(also important for technical processes)
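The record-and-replay idea can be illustrated with a receiver whose result depends on the arrival order of two messages. In this sketch the race is simulated by shuffling, the log stores only the sender of each delivered message, and each sender is assumed to send exactly one message; all names are hypothetical.

```python
import random


def receiver(deliveries):
    """Order-dependent computation: the result reveals the arrival order."""
    value = 1
    for _sender, op, operand in deliveries:
        value = value + operand if op == "add" else value * operand
    return value


def record_run(pending, rng):
    """Run with an indeterministic arrival order and record that order."""
    deliveries = list(pending)
    rng.shuffle(deliveries)               # stands in for the race condition
    log = [sender for sender, _op, _operand in deliveries]
    return receiver(deliveries), log


def replay_run(pending, log):
    """Re-execute with the arrival order forced by the recorded log."""
    by_sender = {message[0]: message for message in pending}
    return receiver([by_sender[sender] for sender in log])


pending = [("P1", "add", 3), ("P2", "mul", 2)]
result, log = record_run(pending, random.Random())
assert replay_run(pending, log) == result
```

The two possible outcomes are (1 + 3) * 2 = 8 and 1 * 2 + 3 = 5; whichever one occurred during recording is reproduced by the replay.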
XII.15
Handling of information flooding
Requirement:
The recorded and displayed information must be reduced
• Limitation on inter-process events
• Limitation on relevant time intervals
• Abstraction forms for
– process groups
– execution (Timing-Diagram)
– ports (abstract message flow)
• Graphics support
(control windows, animation tools)
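The first three reduction measures amount to filtering and aggregating a trace. A minimal sketch, assuming trace entries are dictionaries with the hypothetical keys `process`, `kind` and `time`:

```python
def reduce_trace(trace, start, end, kinds=("send", "receive")):
    """Keep only inter-process events inside the interval [start, end]."""
    return [event for event in trace
            if event["kind"] in kinds and start <= event["time"] <= end]


def count_by_group(trace, groups):
    """Abstract from single processes to process groups by counting events."""
    counts = {}
    for event in trace:
        group = groups.get(event["process"], event["process"])
        counts[group] = counts.get(group, 0) + 1
    return counts


trace = [
    {"process": "P1", "kind": "local", "time": 1},
    {"process": "P1", "kind": "send", "time": 2},
    {"process": "P2", "kind": "receive", "time": 3},
    {"process": "P3", "kind": "send", "time": 9},
]
relevant = reduce_trace(trace, start=0, end=5)
per_group = count_by_group(relevant, {"P1": "clients", "P2": "servers"})
```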
XII.16
Distributed debugging: concepts
Hierarchical levels of influence
• Level 1 : „Free runtime“
– no modification, only trace-recording
– minimal interference
• Level 2 : „Self-responsibility“
– freely modifiable execution
– strong interference
– full responsibility of the tester for execution control
• Level 3 : „Pseudo-Real-time“
– „the best possible compensation for strong interference“
– „private clock“ per process
– „private clock“ runs, except in the debugger code
– „private clock“ synchronized via, for instance, the Lamport algorithm on the
partial order
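A private clock that stands still while debugger code runs could look roughly like this. The class `PrivateClock` and its methods are assumptions for illustration; the time source is injectable so the behaviour can be shown without real waiting, and synchronization between processes is left out.

```python
import time


class PrivateClock:
    """Per-process clock that does not advance inside debugger code."""

    def __init__(self, now=time.monotonic):
        self._now = now
        self._start = now()
        self._paused_total = 0.0     # time spent in debugger code so far
        self._pause_started = None   # set while debugger code is running

    def enter_debugger(self):
        if self._pause_started is None:
            self._pause_started = self._now()

    def leave_debugger(self):
        if self._pause_started is not None:
            self._paused_total += self._now() - self._pause_started
            self._pause_started = None

    def read(self):
        # While paused, report the time at which the debugger was entered
        current = (self._pause_started
                   if self._pause_started is not None else self._now())
        return current - self._start - self._paused_total


# Simulated time source (hypothetical) to show the effect
fake_time = [0.0]
clock = PrivateClock(now=lambda: fake_time[0])
fake_time[0] = 5.0
clock.enter_debugger()
fake_time[0] = 20.0      # 15 time units pass inside the debugger
clock.leave_debugger()
fake_time[0] = 22.0
```

In this run the wall clock shows 22 but the private clock reads 7, since the 15 units spent in the debugger are not counted.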
XII.17
Architecture principles
Alternatives:
1. Separate processes: Program / Debugger
2. Separate processes with common data (also lightweight
processes)
3. Integrated processes with direct instrumentation
→ As a rule, alternative 2 or 3 is used
XII.18
Architecture proposal
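[Figure: Computer A runs Process 1 and Process 2 together with a local debugging control; Computer B runs Process 3 and Process 4 together with a local debugging control; a centralized dialogue process is connected to the local debugging controls.]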