The Use of Program Profiling for Software Maintenance with Applications to the Year 2000 Problem
Thomas Reps, Thomas Ball, Manuvir Das, and James Larus
Presented by Amy Sliva

The Y2K Problem
  - Many computer programs used two-digit date representations
  - The year 2000 could be interpreted as 1900
      - Madness and mayhem would reign
  - Other date-related problems as well
      - Leap year issues
  - DARPA asked Reps to help plan a project to reduce the impact
      - What technology could help in addition to present commercial products?

Use Path Profiling to Help
  - Determine the sites at which date-manipulation code occurs
      - Dates are hidden in programs
      - Crucial to the creation of effective tools for correcting Y2K problems
  - Determine whether COTS components or tools have date problems
  - Testing of post-renovation code
  - Can distinguish more behavioral differences than node or edge profiling

Path Profiling
  - Instrument the program to count the number of times different paths are executed
  - Paths of interest are loop-free and intraprocedural
  - The distribution of paths from an execution is called a path profile or path spectrum
  - Differences between spectra from different runs can identify date-dependent computations
      - Are different paths executed using pre-2000 and post-2000 date input?
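
As a rough illustration of comparing path spectra from two runs, the sketch below treats a spectrum as a map from path to execution count. The function names, the tuple encoding of paths, and the default ratio of 100 are assumptions made for this example and do not come from the authors' tool.

```python
from collections import Counter

def path_spectrum(executed_paths):
    """Count how many times each loop-free path was executed in one run."""
    return Counter(executed_paths)

def compare_spectra(old_spectrum, new_spectrum, ratio=100):
    """Return (paths only in new, paths only in old, paths with skewed counts).

    `ratio` is an illustrative threshold: a path present in both spectra is
    reported when one count is at least `ratio` times the other.
    """
    old_paths, new_paths = set(old_spectrum), set(new_spectrum)
    only_new = sorted(new_paths - old_paths)
    only_old = sorted(old_paths - new_paths)
    skewed = sorted(
        p for p in old_paths & new_paths
        if max(old_spectrum[p], new_spectrum[p])
        >= ratio * min(old_spectrum[p], new_spectrum[p])
    )
    return only_new, only_old, skewed

# Hypothetical runs: each path is a tuple of control-flow-graph node names.
pre_2000 = path_spectrum(
    [("Start", "A", "Exit")] * 200 + [("Start", "B", "Exit")] * 3
)
post_2000 = path_spectrum(
    [("Start", "A", "Exit")] * 2 + [("Start", "C", "Exit")] * 5
)
only_new, only_old, skewed = compare_spectra(pre_2000, post_2000)
print(only_new)  # [('Start', 'C', 'Exit')]
```

Paths that appear only in the post-2000 run are the first candidates for date-dependent code; the skewed list catches paths that run in both cases but with very different frequencies.
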

Example 1

Differences between Spectra
  - Path-spectrum comparison reveals paths in new_spectrum not found in old_spectrum, and vice versa
  - Determine the shortest prefix of each path that is in new_spectrum but not in old_spectrum
      - The portion of a path representing a different computation
  - Gather information on paths executed different numbers of times in the two spectra
      - A threshold ratio (e.g., 100 to 1) identifies interesting paths

Finding the Shortest Path Prefix
  - Use a trie structure built on a path spectrum to find the shortest prefix
  - The first edge that deviates from the trie is the last edge of the shortest prefix

Efficient Path Profiling
  - Ball and Larus algorithm, with overheads of 30-40%
  - Numbering scheme applied to an acyclic control-flow graph
  - Ball-Larus labels the graph with two quantities:
      - Each node W is labeled with the value num_paths_from(W)
      - Each edge is labeled with a value derived from num_paths_from
  - Each path ends up with a unique number using these two quantities

Ball-Larus Algorithm
  - Backward dataflow analysis
  - Nodes are labeled with num_paths_from
      - Exit is labeled 1
      - For a node W with successors W1, …, Wk: num_paths_from(W) = num_paths_from(W1) + … + num_paths_from(Wk)

Ball-Larus Algorithm (cont.)
  - Number the edges such that every path from Start to Exit has a unique sum of edge labels in the range [0 … num_paths_from(Start) - 1]
  - For an edge W→Wi, the label vi is the sum of the number of paths to Exit from all successors of W that are to the left of Wi
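
The Ball-Larus labeling described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the graph encoding (a dict mapping each node to its successors in left-to-right order), the function names, and the example graph are all assumptions.

```python
def ball_larus_labels(succs, exit_node):
    """Label an acyclic control-flow graph.

    succs maps each node to its successors, ordered left to right.
    Returns (num_paths_from, edge_val).
    """
    num_paths_from = {}

    def count(w):
        # Backward computation: Exit has one path, every other node has
        # the sum of the path counts of its successors.
        if w not in num_paths_from:
            if w == exit_node:
                num_paths_from[w] = 1
            else:
                num_paths_from[w] = sum(count(s) for s in succs[w])
        return num_paths_from[w]

    edge_val = {}
    for w, children in succs.items():
        count(w)
        running = 0  # paths to Exit from successors to the left of `child`
        for child in children:
            edge_val[(w, child)] = running
            running += count(child)
    return num_paths_from, edge_val

def path_number(path, edge_val):
    """Sum the edge labels along a Start-to-Exit path."""
    return sum(edge_val[edge] for edge in zip(path, path[1:]))

# Hypothetical acyclic graph with four Start-to-Exit paths.
succs = {
    "Start": ["A", "B"],
    "A": ["C"],
    "B": ["C"],
    "C": ["D", "Exit"],
    "D": ["Exit"],
    "Exit": [],
}
num_paths_from, edge_val = ball_larus_labels(succs, "Exit")
print(num_paths_from["Start"])                                # 4
print(path_number(["Start", "B", "C", "Exit"], edge_val))     # 3
```

In this example the four paths receive the numbers 0 through 3, so a profiler only needs one counter per path number rather than a record of each path's edges.
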

Example 2

Finding Path Prefixes from Ball-Larus Labels
  - p is a path in new_spectrum that does not occur in old_spectrum
  - An index structure supporting range queries is built on old_spectrum
  - A sequence of queries is issued to determine whether ranges are empty
      - IsRangeEmpty(S, a, b) = true if S does not contain values in the range [a … b]

Finding Path Prefixes from Ball-Larus Labels (cont.)
  - Search state: a node W, a prefix pre, and an offset c; the paths with prefix pre are those numbered in the range [c … c + num_paths_from(W) - 1]
  - Start the search with W = Start, pre = empty, and c = 0
  - Invariant: we have searched from Start to W and have not found a distinguishing edge
  - For the next edge W→Wi of p, query IsRangeEmpty(old_spectrum, c + vi, c + vi + num_paths_from(Wi) - 1)
      - If true, the prefix is pre || (W→Wi) and the prefix value is c + vi
      - Otherwise, continue the search from Wi with pre || (W→Wi) and c + vi

Implementation and Results
  - Prototype system called DYNADIFF
      - Instruments executable files
      - User interfaces for displaying and organizing path spectra
  - Tested on Unix cal and ncftp
      - Correctly identified the path for handling leap years in cal
      - Introduced a two-digit Y2K problem into ncftp and identified different pre- and post-2000 behavior

Other Applications of Path Profiling
  - Testing
  - Systems that warn of internal errors
  - Regression testing
  - Testing for inconsistent data
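
Returning to the prefix search over Ball-Larus labels described earlier, the sketch below walks a path p edge by edge and issues an IsRangeEmpty query at each step. It is an illustration under stated assumptions: old_spectrum is represented as a sorted list of path numbers, the range query uses binary search as a stand-in for whatever index structure is actually used, and the label tables are hard-coded for a small hypothetical graph.

```python
from bisect import bisect_left

def is_range_empty(sorted_values, a, b):
    """True if sorted_values contains no value v with a <= v <= b."""
    i = bisect_left(sorted_values, a)
    return i == len(sorted_values) or sorted_values[i] > b

def shortest_new_prefix(path, old_numbers, num_paths_from, edge_val):
    """Find the shortest prefix of `path` that no path in old_numbers shares.

    Returns (prefix_edges, prefix_value), or None if the path itself
    occurs in old_numbers. old_numbers must be sorted.
    """
    c = 0     # smallest path number among paths that share the prefix so far
    pre = []  # edges searched so far, none of them distinguishing
    for w, wi in zip(path, path[1:]):
        low = c + edge_val[(w, wi)]
        high = low + num_paths_from[wi] - 1
        pre.append((w, wi))
        if is_range_empty(old_numbers, low, high):
            return pre, low  # no old path continues through this edge
        c = low
    return None

# Assumed labels for a hypothetical graph:
# Start -> A | B, A -> C, B -> C, C -> D | Exit, D -> Exit
num_paths_from = {"Start": 4, "A": 2, "B": 2, "C": 2, "D": 1, "Exit": 1}
edge_val = {
    ("Start", "A"): 0, ("Start", "B"): 2,
    ("A", "C"): 0, ("B", "C"): 0,
    ("C", "D"): 0, ("C", "Exit"): 1,
    ("D", "Exit"): 0,
}

old_numbers = [0, 1]  # the old run only took paths through A
result = shortest_new_prefix(
    ["Start", "B", "C", "Exit"], old_numbers, num_paths_from, edge_val
)
print(result)  # ([('Start', 'B')], 2)
```

Each step costs one range query, so the search needs at most as many queries as the path has edges and never has to decode the old path numbers back into edge sequences.
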