new frontiers in formal software verification

Gerard J. Holzmann
gholzmann@acm.org

[Title-slide figure: montage of the 23 spacecraft and 10 science instruments exploring the Solar System, grouped into Earth Science (e.g., CLOUDSAT, AQUARIUS, JASON 1/2, GRACE, ACRIMSAT, and the ASTER, MISR, TES, MLS, and AIRS instruments), Planetary (e.g., CASSINI, JUNO, VOYAGER 1/2, MRO, MARS ODYSSEY, OPPORTUNITY, MSL, GRAIL, DAWN, STARDUST, EPOXI/DEEP IMPACT, and the MIRO, Diviner, and MARSIS instruments), and Astrophysics (e.g., KEPLER, SPITZER, WISE, GALEX, Herschel, Planck), with a small automaton (states s0, s1, f) illustrating model checking and static analysis]

formal software verification

• after some ~30 years of development, formal software verification is still rarely used on industrial software; the primary reasons:
  – it is (perceived to be) too difficult
  – it takes too long (months to years)
• even in safety-critical applications, software verification is often restricted to the verification of models of the software, instead of the software itself
• goal: make software verification as simple as testing and as fast as compilation

verification of the PathStar switch (flashback to 1999)

• a commercial data/phone switch designed in Bell Labs research (for Lucent Technologies)
• newly written code for the core call-processing engine
• the first commercial call-processing code that was formally verified
  – 20 versions of the code were verified with model checking during development (1998-2000), with a fully automated procedure

software structure

• basic call processing, plus a long list of features (call waiting, call forwarding, three-way calling, etc.)
• the PathStar code is written in C; the call-processing part is ~30 KLOC, about 10% of the code, on top of the control kernel
• traditional hurdles:
  – feature interaction and feature breakage
  – concurrency problems: race conditions, deadlock scenarios
  – non-compliance with legal requirements, etc.

complex feature precedence relations

[Figure: graph of the precedence relations among call-processing features]
• detecting undesired feature interaction is a serious problem

the verification was automated in 5 steps

[Figure: verification pipeline for the PathStar call-processing code]
1. code conversion
2. defining abstractions (the abstraction map)
3. defining the context
4. defining feature requirements
5. verification with Spin, which reports property violations as message sequence charts (bug reports)

1. code conversion

the PathStar call-processing code is event-driven C, with @-labels marking the control states where the code waits for the next event:

    @dial:  switch(op) {
            default:        /* unexpected input */
                    goto error;
            case Crdtmf:    /* digit collector ready */
                    x->drv->progress(x, Tdial);
                    time = MSEC(16000);     /* set timer */
            @:      switch(op) {
                    default:        /* unexpected input */
                            goto error;
                    case Crconn:
                            goto B@1b;
                    case Cronhook:  /* caller hangs up */
                            x->drv->disconnect(x);
                    @:      if (op != Crconn && op != Crdis)
                                    goto Aidle;
            ...etc...

[Figure: the implied PathStar state machine, with states dial, dial1, and dial2, transitions labeled Crdtmf, Crconn, and Cronhook, and "else" transitions to an error state]

2. defining abstractions

• to verify the code we convert it into an automaton: a labeled transition system
  – the labels (transitions) are the basic statements from the C code
• each statement can be converted via an abstraction, which is encoded as a lookup table that supports three possible conversions:
  – relevant to the requirements: keep (~60%)
  – partially relevant: map/abstract (~10%)
  – irrelevant to the requirements: hide (~30%)
• the generated transition system is then checked against the formal requirements (in linear temporal logic) with the model checker
  – the program and the negated requirements are converted into automata, and the model checker computes their intersection

3. defining the context

[Figure: the C code, a model-extraction map, and an environment model yield a Spin model; together with a database of feature requirements, the model checker turns these into bug reports]

4. defining feature requirements

a sample property: "always when the subscriber goes offhook, a dialtone is generated"

failure to satisfy this requirement: eventually (<>) the subscriber goes offhook, and (/\) thereafter (X) no dialtone is generated until (U) the next onhook.

in Linear Temporal Logic this is written:

    <> (offhook /\ X (!dialtone U onhook))

the formula is mechanically converted into an ω-automaton.
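To make the mechanical conversion concrete, here is a minimal, hypothetical Promela sketch: a stub subscriber model plus a hand-written never claim for the closely related violation <> (offhook && [] !dialtone), the negation of [] (offhook -> <> dialtone). Spin's own LTL syntax deliberately omits the next operator (X), so the slide's exact formula would be encoded directly as a never claim rather than generated with spin -f; all names below are illustrative, not taken from the PathStar code.

    /* stub environment: a subscriber that can go offhook,
     * may or may not receive a dialtone, and hangs up again
     */
    bool offhook, dialtone;

    active proctype subscriber()
    {
        do
        :: !offhook -> offhook = true                      /* go offhook */
        :: offhook  -> dialtone = true                     /* switch responds */
        :: offhook  -> offhook = false; dialtone = false   /* go onhook */
        od
    }

    /* violation to search for: <> (offhook && [] !dialtone),
     * i.e., the subscriber goes offhook and a dialtone never follows;
     * a state labeled accept... must be visited infinitely often
     */
    never {
    T0_init:
        do
        :: (offhook && !dialtone) -> goto accept_S4
        :: (1) -> goto T0_init
        od;
    accept_S4:
        do
        :: (!dialtone) -> goto accept_S4
        od
    }

Running spin -a on this file, compiling the generated pan.c, and executing ./pan -a searches for acceptance cycles; on this stub it finds a cycle in which the switch never responds with a dialtone, which is exactly the kind of counterexample the requirement is meant to catch.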
5. verification

[Figure: the LTL requirement is logically negated; the C code, the abstraction map, and the environment model feed the model extractor; the model checker then produces the bug report, e.g., "no dialtone generated"]

hardware support (1999)

[Figure: a client/server setup in which code and scripts are distributed to networked machines over sockets]

iterative search refinement

• each verification task is run multiple times, with increasing accuracy
[Figure: successive, progressively deeper search passes at t=5 min., t=15 min., and t=40 min.]

performance (1999): "bugs per minute"

[Plot: percent of bugs reported (with the corresponding number of bugs in parentheses) versus minutes since the start of the check, over a 40-minute run]

that was 1999; can we do better now?

1999:
• 16 networked computers running the Plan 9 operating system
  – 500 MHz clock speed (~8 GHz equivalent)
  – 16 x 128 MB of RAM (~2 GB equivalent)

2012:
• one 32-core off-the-shelf system, running standard Ubuntu Linux (~$4K USD)
  – 2.5 GHz clock speed (~80 GHz equivalent)
  – 64 GB of shared RAM

difference: approximately 10x faster and 32x more RAM; does this change the usefulness of the approach?

performance in 2012: "bugs per second"

[Plot: number of bugs found versus seconds since the start of the check, from 1 second to 10 minutes, comparing the 2012 setup (32-core PC, 2.5 GHz per core, 64 GB RAM, Ubuntu Linux; 11 bug reports after 1 second) with the 1999 setup (16 PCs, 500 MHz, 128 MB per PC, Plan 9 OS)]

side-by-side comparison

1999 (16-CPU networked system):
• 25% of all bugs reported in 180 seconds (15 bugs)
• 50% of all bugs reported in 480 seconds (30 bugs)

2012 (one desktop PC):
• 15% of all bugs reported in 1 second (11 bugs)
• 50% of all bugs reported in 7 seconds (38 bugs)

generalization: swarm verification

• goal: leverage the availability of large numbers of CPUs and/or CPU cores
  – if an application is too large to verify exhaustively, we can define a swarm of verifiers that each tries to check a randomly different part of the code
    • using different hash polynomials in bitstate hashing, and different numbers of polynomials
    • using different search algorithms and/or search orders
  – use iterative search refinement to dramatically speed up error reporting ("bugs per second")

swarm search

[Figure: a swarm of small verifiers, each exploring a different part of the state space]

swarm: a spin front-end (http://spinroot.com/swarm/)

the user specifies: (1) the number of CPUs to use, (2) the memory per CPU, and (3) the maximum time

    $ swarm -F config.lib -c6 > script
    swarm: 456 runs, avg time per cpu 3599.2 sec
    $ sh ./script

swarm configuration file:

    # range
    k       1      4           # min and max nr of hash functions

    # limits
    depth   10000              # max search depth
    cpus    128                # nr available cpus
    memory  64MB               # max memory to be used; recognizes MB, GB
    time    1h                 # max time to be used; h=hr, m=min, s=sec
    vector  512                # bytes per state, used for estimates
    speed   250000             # states per second processed
    file    model.pml          # the spin model

    # compilation options (each line defines a search mode)
    -DBITSTATE                       # standard dfs
    -DBITSTATE -DREVERSE             # reversed process ordering
    -DBITSTATE -DT_REVERSE           # reversed transition ordering
    -DBITSTATE -DP_RAND              # randomized process ordering
    -DBITSTATE -DT_RAND              # randomized transition ordering
    -DBITSTATE -DP_RAND -DT_RAND     # both

    # runtime options
    -c1 -x -n

(a hand-run sketch of what the generated script does is shown at the end)

many small jobs do the work of one large job, but much faster and in less memory

[Plot: states reached versus number of processes (log scale), for the DEOS O/S model; the swarm search scales linearly and reaches 100% coverage with a swarm of 100 verifiers that use 0.06% of the RAM each (8 MB), compared with a single exhaustive run (13 GB)]

thank you!

tools mentioned:
• Spin: http://spinroot.com (1989)
  – parallel depth-first search added in 2007
  – parallel breadth-first search added in 2012
• Modex: http://spinroot.com/modex (1999)
• Swarm: http://spinroot.com/swarm (2008)
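To make the swarm workflow concrete, here is a minimal hand-written sketch of the kind of runs the generated script performs (the real script is produced by swarm itself and typically contains hundreds of runs). The compilation and runtime options are the ones listed in the configuration file above; the flags -w24 (a 2^24-bit hash array, i.e., 2 MB, for bitstate hashing) and -RS7 (a seed for the randomized searches) are illustrative additions, not taken from the slides.

    $ spin -a model.pml                                      # generate the verifier pan.c
    $ cc -O2 -DBITSTATE -o pan_dfs pan.c                     # standard dfs
    $ cc -O2 -DBITSTATE -DT_REVERSE -o pan_rev pan.c         # reversed transition ordering
    $ cc -O2 -DBITSTATE -DP_RAND -DT_RAND -o pan_rnd pan.c   # randomized orderings
    $ ./pan_dfs -c1 -x -n -w24 &                             # run the searches in parallel
    $ ./pan_rev -c1 -x -n -w24 &
    $ ./pan_rnd -c1 -x -n -w24 -RS7 &
    $ wait

Because each verifier explores a randomly different part of the state space, the swarm as a whole can report its first bugs within seconds, which is the "bugs per second" effect measured in the 2012 experiments above.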