Safe Collaborative Driving Systems
NSF 1035178 and 1329593
Nick Maxemchuk
Columbia University

Engineering is the Art of Managing Complexity
Photograph courtesy of NASA

Example – A Collaborative Merge Protocol
• Car 1 signals its intent to merge between cars 2 and 3
• Car 2 uses intelligent cruise control to maintain a safe gap behind both cars F and B
• Car 3 increases the gap to car 2 to 2*safe gap + car length
• Car 1 uses intelligent cruise control to maintain a safe gap behind car B and create a safe gap behind car 2
• When the gaps are safe, the driver in car 1 receives a signal to merge.
Objective: To prove that the protocol will not cause an accident for combinations of failures, including 1) mechanical failures, 2) loss of communications, 3) unexpected obstacles in the roadway, 4) nonparticipating drivers who move into the gap, …

Managing Complexity
1. An architecture that partitions the problem into smaller, more manageable pieces
2. Elimination of ambiguities: replace timers that are initiated over an unreliable channel with deadlines based on synchronized clocks
3. Reduction of pairwise verification of a large number of implementations to checking a single model plus a single conformance test of each implementation

Objectives of an Architecture
Lessons from Communications Architectures
• Break a big design problem into smaller, more manageable modules
  – A stack architecture with well defined interfaces
• Stack Architectures
  – A subset of possible modular architectures
  – Re-use the modules in many applications
  – Modify modules independently, as long as services are preserved
• Stacks also partition testing into smaller pieces
  o The services provided by a layer are verified.
  o A higher layer is verified assuming the services from the lower layer
  o Black box conformance testing at the service interfaces

An Intelligent Vehicle Architecture
Multiple Stack Architecture with Well Defined Interfaces
• One stack for each interaction with the physical world
• Use services from a lower layer in the same stack or any layer in another stack
• In order to guarantee that we can design, verify and modify components independently, we must verify that there are no loops
  Example:
  o There is an implicit loop between anti-lock braking and measurements. Feedback control must be considered when designing anti-lock brakes
  o If a broadcast protocol transmits messages at specified times, and the broadcast protocol is used to synchronize clocks, then the implementations must be designed together.
• Services are provided to specific protocols in a layer, not to the layer
  o The figure shows the services in the merge protocol

Architecture

Synchronized Clocks
1. Time is a critical component in coordinated driving maneuvers
   – Vehicles must start and complete maneuvers according to a planned schedule
   – In the merge protocol, cars commit to the operation for a specified time, and abort the maneuver if the gap isn’t created by a deadline before the end of the commit time
2. Recent advances make synchronized clocks the new capability in protocols
   – Inexpensive, accurate time from atomic clocks is distributed by GPS
   – Crystal oscillators maintain clocks while GPS isn’t available
   – NTP and PTP can synchronize nearby vehicles when necessary
3. Synchronization can reduce the possible protocol sequences
   – Timers that are set over an unreliable communications channel can start at different times in different vehicles, which results in different execution sequences
   – Synchronized clocks can guarantee a unique sequence
   – For instance, attacking armies synchronize their watches
4. 
Synchronization can provide guarantees that cannot be obtained without synchronization
   – A Lock Protocol

A Safe Lock Protocol – Using Synchronized Clocks
• Before the merge protocol is used, 3 cars must obtain a lock
  – Each accepts only 1 lock
• All cars simultaneously release the lock at an absolute time deadline
• The merging car (the master) does not use the lock without receiving acks from the other two

Proving the Safety of Intelligent Vehicles
1. The case for model checking and conformance testing rather than pairwise testing
   – The number of different manufacturers, models per manufacturer, and model years (generations per model) will make pairwise testing unsustainable
   – N different implementations of a system with k participants may require N^k pairwise tests. Formal methods require 1 model check and N conformance tests, the procedure used for the telephone network
   – Formal testing procedures allow us to test only the new components in an architecture, rather than the entire vehicle
2. The case for probabilistic testing rather than test tracks
   – You cannot operate a vehicle on a test track for a day and guarantee that it will crash in the real world less than once every 10 or 20 years
   – Probabilistic verification is a directed simulation that has guaranteed less than one failure per 100 years in communications protocols

Pairwise Testing of Implementations vs. Model Checking + Conformance Testing of Implementations
[Figure: 4 makes, 3 models; pairwise verification compared with conformance testing against an FSM specification]

Method: Model Checking and Conformance Testing of Protocols
1. Unambiguous model of the interactions between users
   • Finite state machines (FSM, EFSM) – component machine (also SDL, pseudocode, Petri nets, …)
2. 
Verification of the model
   • Look at sequences of interactions (instead of proof systems)
   • Composite machine: 4 participants with 10 states each may have 10,000 states (the number of execution sequences is much larger)
   • Differs from program verification (both execution sequences and data values)
3. Conformance testing
   • Prove that the implementation of the component machine for each user correctly and completely implements the model
   • Argue that all N implementations will work together because they all implement the same model, which has been verified

Engineering Applied To Verification
How to solve problems that mathematicians consider intractable
1. Probabilistic Verification:
   • Explore most likely sequences first
   • Don’t reconsider high probability paths many times
     – As in simulations and on test tracks
   • Upper bound on probability that an unexplored sequence will occur
     – Unexplored paths are unlikely in the life-time of the machine
2. Multi-dimensional Architecture:
   • Partition verification into smaller, more manageable pieces
     – The services provided by a layer are verified. The next layer is verified assuming those services
3. Time Synchronized Protocols:
   • Removes time from the finite state machine
   • Continuous values of time are similar to data in program verification
     – Reduces the number of sequences that must be explored

Example: Probabilistic Checking of an ARQ Protocol
• Composite machine: combination of all interactions
• Search the sequence without errors only once
• If P <= 10^-3, one message is transmitted every 1 sec, and we search 5 levels, each unexplored path occurs less than once every 100 billion years

Conformance Testing
Objective: To guarantee that the hardware or software implementation of a protocol matches the model that has been verified
• Test that every edge from every state in the FSM is initiated by the proper input, issues the proper output, and leads to the proper state. The final state is tested using a UIO (unique input/output) sequence, which ends in another state.
  A minimal test sequence is constructed with the Rural Chinese Postman Algorithm
• When an implementation matches the model, it will interoperate with any other implementation that matches the model
  – The implementations of a protocol by different manufacturers will operate together
  – For N implementations, interoperability is guaranteed with N tests of the component machine, instead of N^i i-party tests (composite machine)

Conformance Testing in Intelligent Vehicles
• We test the component machine (one vehicle) rather than the composite machine (the interaction between a number of vehicles)
• We test the procedures in one layer in one stack by applying the inputs and observing the outputs across the well defined interfaces, rather than the entire vehicle
  – Since the same communications routine is used for every collaborative application (cooperative braking, merges, intelligent cruise control), it isn’t necessary to check the communications multiple times.
• Problem: The postman algorithm that was used for communications protocols does not consider time critical events
  – The only timers were for retransmissions. The timer was set and the machine stayed in the state until a message was received or a timeout occurred

Time in Conformance Tests
If the transition between two states occurs because of a timeout, in order to test the edge we must:
1) Execute a transfer sequence to the edge where the timeout is set, and set the timeout
2) Execute a transfer sequence to the state where the timeout occurs.
3) Wait for the timer to expire
– The transfer sequences may also contain edges with timeouts that need to be set and waited for.
  The sequence that can successfully exercise the current timeout may be difficult to find, and the waits may be excessive

A strategy with accurate clocks:
1) A shared memory between processes stores the time of occurrence for all time related events; timeouts are set as outputs from a module
2) In any module, timeouts are an external input from the shared memory
3) To test an edge, the input to the module is just the input that is received from the shared memory
We are designing a shared memory for timeouts, sensor readings, …
• Each memory element has guarantees for the element
  – Which participants definitely have the value, which participants know which other participants have the element, …

Result: A Fail Safe Assisted Merge Protocol
Operation: Notify the driver when there is a safe gap. If there is uncertainty about safety, notify the driver and implement automated spacing and lane maintenance
Dependent on:
• Intelligent cruise control:
  – Maintains distance to more than one car
  – Spacing between cars can be set
• A shared memory of the map of vehicles and the deadlines
• The lock protocol
• A fail-safe, reliable broadcast protocol
  – If anyone cannot recover a message, everyone knows, quickly
  – Scheduled message and ack transmissions (token passing)
  – If a scheduled message is not recovered, stop transmitting, so that no one can recover your scheduled message.
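
Sketch: timeouts as inputs from a shared memory of deadlines
The strategy with accurate clocks can be illustrated with a small Python sketch. It is only an illustration under assumptions, not the design described in these slides: the names SharedDeadlines, MergeModule, exercise_timeout_edge and the "gap_deadline" key are hypothetical, and the two-edge machine is a toy. The point it shows is that once a timeout is an absolute deadline read from shared memory, a conformance test can exercise the timeout edge by supplying the deadline and a clock reading as inputs, with no waiting.

```python
from typing import Dict


class SharedDeadlines:
    """Hypothetical shared memory: event name -> absolute deadline in seconds."""

    def __init__(self) -> None:
        self._deadlines: Dict[str, float] = {}

    def set_deadline(self, event: str, deadline: float) -> None:
        # A module "outputs" a timeout by writing its absolute deadline here.
        self._deadlines[event] = deadline

    def expired(self, event: str, now: float) -> bool:
        # A module "inputs" a timeout by comparing the clock with the stored deadline.
        deadline = self._deadlines.get(event)
        return deadline is not None and now >= deadline


class MergeModule:
    """Toy machine: WAIT_GAP -> MERGE_OK when the gap is ready, -> ABORT at the deadline."""

    def __init__(self, memory: SharedDeadlines) -> None:
        self.memory = memory
        self.state = "WAIT_GAP"

    def step(self, now: float, gap_ready: bool = False) -> str:
        if self.state == "WAIT_GAP":
            if self.memory.expired("gap_deadline", now):
                self.state = "ABORT"      # the deadline takes precedence over a late gap
            elif gap_ready:
                self.state = "MERGE_OK"
        return self.state


def exercise_timeout_edge() -> str:
    """Exercise the WAIT_GAP -> ABORT edge without waiting for real time to pass."""
    memory = SharedDeadlines()
    memory.set_deadline("gap_deadline", 5.0)
    module = MergeModule(memory)
    return module.step(now=5.0)
```

In this sketch the test harness controls both the stored deadline and the clock value passed to step, so the timeout edge is tested like any other input-driven edge.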
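
Sketch: a lock that is released at an absolute deadline
The lock protocol that the merge depends on can be sketched in a few lines of Python. This is a minimal sketch under assumptions, not the protocol as implemented: the Car and LockState classes, the acquire function and the acks_lost parameter (which stands in for an unreliable channel) are all hypothetical names. It keeps the three rules stated on the lock slide: each car accepts only one lock, every car releases at the same absolute deadline, and the master does not use the lock without acks from the other cars.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence


@dataclass
class LockState:
    """Who holds a car's lock and the absolute time at which it is released."""
    holder: str
    deadline: float


class Car:
    """A participant that accepts at most one lock at a time."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.lock: Optional[LockState] = None

    def expire(self, now: float) -> None:
        # Every car releases at the same absolute deadline on the synchronized clock.
        if self.lock is not None and now >= self.lock.deadline:
            self.lock = None

    def request(self, holder: str, deadline: float, now: float) -> bool:
        """Return True (an ack) only if this car holds no other lock."""
        self.expire(now)
        if self.lock is not None or deadline <= now:
            return False
        self.lock = LockState(holder, deadline)
        return True


def acquire(master: Car, others: Sequence[Car], deadline: float, now: float,
            acks_lost: FrozenSet[str] = frozenset()) -> bool:
    """The master may use the lock only if every other car's ack arrives."""
    if not master.request(master.name, deadline, now):
        return False
    acks = 0
    for car in others:
        granted = car.request(master.name, deadline, now)
        if granted and car.name not in acks_lost:  # a lost ack never reaches the master
            acks += 1
    return acks == len(others)
```

If an ack is lost, the car that sent it stays locked and the master does not merge, but no release message is needed: every car frees itself when the shared clock reaches the deadline, so the cars cannot end up disagreeing about whether the lock is still held.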