Tracing Regression Bugs
Presented by Dor Nir

Outline
- Problem introduction.
- Regression bug – definition.
- Industry tools.
- Proposed solution.
- Experimental results.
- Future work.

Micronose Corp.
- A big company founded by Nataly Noseman.
- 1998 – version 1 of Nosepad (a great success).

Nosepad version 1

    class Nosepad {
        bool bDirty;

        void AddNose()    { ... bDirty = true; }
        void DeleteNose() { ... bDirty = true; }
        void Save()       { ... bDirty = false; }
        bool IsDirty()    { return bDirty; }

        void Exit() {
            if (IsDirty()) {
                if (MsgBox("Save your noses?"))
                    Save();
            }
            CloseWindow();
        }
    };

Nosepad version 2
- The new features require an Undo/Redo mechanism.
- Micronose is expanding: promotions, new recruits.

Undo/Redo design
- Undo stack – each operation is pushed onto the undo stack.
- Redo stack – when an operation is undone, it moves to the redo stack.
(Diagram: operations for keys "a" and "b" on the two stacks, with Undo and Redo arrows between them.)

Nosepad version 2

    class Nosepad {
        ...
        Stack undoStack;
        Stack redoStack;

        void AddNose() {
            ...
            undoStack.Add(AddNoseOp);
            redoStack.Clear();
        }

        void DeleteNose() {
            ...
            undoStack.Add(DelNoseOp);
            redoStack.Clear();
        }

        void Undo() {
            undoStack.Top().Operate(false);
            redoStack.Push(undoStack.Pop());
        }

        void Redo() {
            redoStack.Top().Operate(true);
            undoStack.Push(redoStack.Pop());
        }
    };

Zelda from QA finds a bug.

Nosepad version 2 – correction

    class Nosepad {
        ...
        // AddNose, DeleteNose, Undo, Redo as above, plus:

        bool IsDirty() {
            return bDirty && !undoStack.IsEmpty();
        }
    };

Zelda from QA finds another bug.

Regression bug observations
- The second bug is a regression bug.
- The same test succeeded on version 1 and fails on version 2.
- The specifications for version 1 have not changed in version 2 (only additions).

Regression bug
- Version 1 specifications: 1. X  2. Y  3.
Z.
- Changes in the code lead to version 2, whose specifications are: 1. X  2. Y  3. Z  4. A  5. B.
- A failure against the new specifications (A, B) is a bug, but not a regression.

Regression bug – definition
A regression bug is a change in existing code that alters the behavior of the application so that it no longer meets a specification that was previously met.

How to avoid regression bugs?
- Prevent inserting regression bugs into the code:
  - Simple design.
  - Programming language.
  - Good programming.
  - Methodology: test-driven development, code review, communication.
- Find regression bugs before the product is released:
  - Extensive testing.
  - White-box / black-box testing.

Automatic tools
- Find whether a regression bug exists.
- Example: QuickTest Professional.

Where is it?
- What was the cause of the regression bug?
- What was the change that caused it?

What is a change?
- Changing existing code lines.
- Adding new code lines.
- Deleting code lines.

Problem definition
- Given a checkpoint C that failed and the source code S of the application under test (AUT), we want to find the places (changes) p1, p2, ..., pn in the code S that cause C to fail.
- We want to do this independently of the source-code language or technology.
- We know that at some time T (prior to the failure) the checkpoint passed.

Solution 1
- QA and the programmers cooperate to maintain a map from tests to source code.

Solution 1 – the map
- "Check text in message box" → Windows.cpp, errorMessages.cpp
- "File t.xml was created successfully" → File.cpp, IO.cs, C:\code\files
- "SELECT NAMES from Table1" is not empty → DB project

Solution 1 – drawbacks
- Much work has to be done for each new test.
- Maintenance is hard.
- We end up with a lot of code to analyze.
- Automatic tools (profilers) could help.

Solution 2
- Keep the same map, but analyze only the changes:
  - "Check text in message box" → changes in Windows.cpp, errorMessages.cpp
  - "File t.xml was created successfully" → changes in File.cpp, IO.cs, C:\code\files
  - "SELECT NAMES from Table1" is not empty → changes in the DB project

Source control
- Version-control tool: a database of source code.
- Check-in / check-out operations.
- History of versions.
- Differences between versions.
- Very common in software development.
- Currently on the market: VSS, StarTeam, ClearCase, CVS, and many more.

Finding the regression bug
(Diagram: the checkpoint-to-code tool takes a failed checkpoint and the source code as input; in the first phase it queries the source-control tool for the changes (change A, change B, ...); in the second phase it applies heuristics and outputs the relevant changes, ranked: 1. change X, 2. change Y, 3. change Z, ...)

Heuristics (second phase)
- Rank the changes.
- Each heuristic gets a different weight.
- Two kinds of heuristics: technology dependent and technology independent.

Non-technology heuristics
- Do not depend on the technology of the code.
- Textually driven; no semantics.

Code line affinity
- Checkpoint: Select "clerk 1" from the clerk tree (clerk number 2). Go to the next clerk. The next clerk is "clerk 3".
- The checkpoint text is matched against the code lines around each change.

Check-in comment affinity
- Check-in comment: "Go to the next waiter when the next-item event is raised."
- Checkpoint: Select "clerk 1" from the clerk tree (clerk number 2). Go to the next clerk. The next clerk is "clerk 3".

File affinity
- Word histogram of the file Clerk.cpp: Waiter 186, Waiters 15, Next 26, Number 174, ...
- Checkpoint: Select "clerk 1" from the clerk tree (clerk number 2). Go to the next clerk. The next clerk is "clerk 3".

File name affinity
- File name: ClerkDlg.cpp
- Checkpoint: Select "clerk 1" from the clerk tree (clerk number 2). Go to the next clerk. The next clerk is "clerk 3".

More possible non-technology heuristics
- Programmer history: reliable vs. error-prone programmers; experience in the specific module.
- Time of the change: late at night; close to a release deadline.

Technology heuristics
- Depend on the source-code language.
- Take advantage of known keywords.
- Use the semantics.

Function/class/namespace affinity
- Checkpoint: Select "clerk 1" from the clerk tree (clerk number 2). Go to the next clerk. The next clerk is "clerk 3".
- The checkpoint text is matched against the names of the function, class, and namespace that contain the change.

Code complexity
- Nesting depth, number of branchings.
For example, the first snippet below is ranked as more complex than the second:

    // Higher complexity: deeper nesting, more branches.
    if (b1 && b2) {
        if (c2 && d1)
            c1 = true;
        else {
            if ((c2 && d2) || e1)
                c1 = false;
        }
    }

    // Lower complexity.
    if (b1 && b2 && c2 && d1)
        c1 = true;

Words affinity problem
- Compare the word groups {red, flower, white, black, cloud} and {rain, green, red, coat}.

Words affinity problem (cont.)
- Affinity({red, flower, white, black, cloud}, {rain, green, red, coat}) > Affinity({red, flower, white, black, cloud}, {train, table, love}).

Word affinity
- Affinity(red, flower) < Affinity(red, blue) < Affinity(red, red).

How can we measure affinity?
- The vector space model of information retrieval (Wong, S.K.M. and Raghavan): similarity of documents.
- "Improving web search results using affinity graph" (Benyu Zhang, Hua Li, Lei Ji, Wensi Xi, Weiguo Fan): similarity of documents; diversity vs. information richness of documents.

Affinity definition
- Synonym(a) – the group of words that are synonyms of a or similar in meaning to a.
- Synonym(choose) = chosen, picked out; choice, superior, prime; discriminating, choosy, picky, select, selection.

Words affinity definition (cont.)

    ShallowAffinity(a, b) = 1 if a == b, 0 otherwise

    Affinity(a, b) = 1 if a == b,
                     ShallowAffinity(Synonym(a), Synonym(b)) otherwise

Affinity of groups of words
For word groups A = {a_1, a_2, ..., a_n} and B = {b_1, b_2, ..., b_m}:

    ShallowAffinity(A, B) = ( Σ_{i=1..n} Σ_{j=1..m} ShallowAffinity(a_i, b_j) ) / (|A| · |B|)

    AsymmetricAffinity(A, B) = ( Σ_{i=1..n} max{ Affinity(a_i, b_1), ..., Affinity(a_i, b_m) } ) / |A|

    Affinity(A, B) = ( AsymmetricAffinity(A, B) + AsymmetricAffinity(B, A) ) / 2

Using affinity in the tool
- Words(C) = the group of words in the description of the checkpoint C.
- Words(P) = the group of words in the source code / check-in comment / file, etc.

    Rank(C, P) = Affinity(Words(C), Words(P))

Using affinity in heuristics
- Code line affinity: Words(P, L) = the group of words in the source code located L lines from the change P; β is a coefficient that gives different weights to lines inside the change.
- Check-in comment affinity:

    Rank_2(C, P) = Affinity(Words(C), Words(Checkin(P)))

Using affinity in heuristics (cont.)
- File affinity: P is a change in a file F whose word histogram is the map Hstgrm(F).
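The affinity definitions above can be sketched in C++. This is a minimal illustration, not the tool's actual code: the synonym table is a tiny hypothetical stand-in for the WordNet lookups described later, all function names are invented for the sketch, and word groups are assumed non-empty.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using Words = std::vector<std::string>;

// Hypothetical synonym table standing in for WordNet lookups.
static const std::map<std::string, std::set<std::string>> kSynonyms = {
    {"choose", {"choose", "pick", "select"}},
    {"select", {"select", "choose", "pick"}},
};

// Synonym(a): the word itself plus its listed synonyms.
std::set<std::string> synonym(const std::string& a) {
    auto it = kSynonyms.find(a);
    return it != kSynonyms.end() ? it->second : std::set<std::string>{a};
}

// ShallowAffinity(a, b): 1 if the words are identical, 0 otherwise.
double shallowAffinity(const std::string& a, const std::string& b) {
    return a == b ? 1.0 : 0.0;
}

// ShallowAffinity(A, B): sum over all pairs, normalised by |A|*|B|.
double shallowAffinity(const std::set<std::string>& A,
                       const std::set<std::string>& B) {
    double sum = 0;
    for (const auto& a : A)
        for (const auto& b : B) sum += shallowAffinity(a, b);
    return sum / (A.size() * B.size());
}

// Affinity(a, b): 1 on exact match, else overlap of the synonym sets.
double affinity(const std::string& a, const std::string& b) {
    if (a == b) return 1.0;
    return shallowAffinity(synonym(a), synonym(b));
}

// AsymmetricAffinity(A, B): best match of each a_i in B, averaged over A.
double asymmetricAffinity(const Words& A, const Words& B) {
    double sum = 0;
    for (const auto& a : A) {
        double best = 0;
        for (const auto& b : B) best = std::max(best, affinity(a, b));
        sum += best;
    }
    return sum / A.size();
}

// Affinity(A, B): symmetrised average of the two asymmetric affinities.
double affinity(const Words& A, const Words& B) {
    return (asymmetricAffinity(A, B) + asymmetricAffinity(B, A)) / 2.0;
}
```

Note that Affinity(a, b) falls back to the overlap of the two synonym sets, so "choose" and "select" get a non-zero score even though the strings differ.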
    HstgrmAffinity(A, B, map) = ( Σ_{i=1..n} max{ Affinity(a_i, b_1), ..., Affinity(a_i, b_m) } · map[a_i] ) / ( Σ_{i=1..n} map[a_i] )

    FileRank(C, F) = HstgrmAffinity(Words(C), Words(F), Hstgrm(F))

    Rank_3(C, P) = FileRank(C, F)

Using affinity in heuristics (cont.)
- File name affinity:

    Rank_4(C, P) = Affinity(Words(C), Words(FileName(P)))

- Code elements affinity:

    Rank_5(C, P) = 1/2 · Affinity(Words(C), Words(FunctionName(P)))
                 + 3/8 · Affinity(Words(C), Words(ClassName(P)))
                 + 1/8 · Affinity(Words(C), Words(Namespace(P)))

Algorithm
Input: C – a checkpoint; T – the last time checkpoint C passed.
1. Get the latest version of the source code for C from the source-control tool.
2. Get file versions from the source-control tool one by one until the version's check-in time is earlier than T. For each file version:
   1. Get the change between the two sequential versions.
   2. Analyze and rank the change with respect to the checkpoint C (Rank(C, P)).
   3. Add the rank to the DB.

Observations
- Rank_i(C, P1) > Rank_j(C, P2) with i ≠ j does not imply that P1 is more relevant than P2: ranks produced by different heuristics are not directly comparable.
- Better affinity gives better results.
- The project is not always in a valid state.

Implementation
- Visual SourceSafe – source control.
- Araxis Merge – diff tool.
- MS Word.
- WordNet.
- MS Access – DB.

WordNet
- Developed at Princeton University.
- A large lexical database of English.
- English nouns, verbs, adjectives, and adverbs are organized into synonym sets, each representing one underlying lexicalized concept.
- Different relations link the synonym sets.

Additional views
- Group by file.
- Group by time of change.

The tool
(Screenshots of the tool.)

Experimental results
- Source code: C++, MFC framework.
- 891 files in 29 folders.
- 3 million lines of code.
- 3984 check-ins.

Experimental results (cont.)

    Checkpoint   No grouping   Group by file
    1            1             1
    2            2             7
    3            2             2
    4            -             -
    5            -             1

Challenges
- Time: caching; filtering by one heuristic first.
- Word equality: source-code vocabulary (e.g. m_CountItemInTable); additional synonyms (Clerk ≈ Waiter).

Future work
- Add more heuristics.
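The two-phase algorithm above can be sketched as follows. This is a hedged illustration, not the tool's implementation: the in-memory history stands in for the source-control tool (the real one was Visual SourceSafe), the newer version's text stands in for a real diff (Araxis Merge), and a plain word-overlap score stands in for the WordNet-based Rank(C, P). All names are invented for the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

struct Version { long time; std::string text; };  // one check-in of a file
struct Ranked  { double rank; std::string file; };

// Split a string into its set of words.
static std::set<std::string> wordsOf(const std::string& s) {
    std::istringstream in(s);
    std::set<std::string> w;
    for (std::string t; in >> t;) w.insert(t);
    return w;
}

// Stand-in for Rank(C, P): fraction of checkpoint words found in the change.
static double rankChange(const std::string& checkpoint,
                         const std::string& change) {
    auto cw = wordsOf(checkpoint), pw = wordsOf(change);
    size_t hit = 0;
    for (const auto& w : cw)
        if (pw.count(w)) ++hit;
    return cw.empty() ? 0.0 : double(hit) / cw.size();
}

// Phase 1 + 2: walk each file's history (newest first) back to time T,
// rank every change against the failed checkpoint, return sorted results.
std::vector<Ranked> findRelevantChanges(
        const std::string& checkpoint, long T,
        const std::map<std::string, std::vector<Version>>& history) {
    std::vector<Ranked> out;
    for (const auto& [file, versions] : history) {
        for (size_t i = 0; i + 1 < versions.size() && versions[i].time >= T;
             ++i) {
            // A change P is the difference between two sequential versions;
            // here the newer version's text stands in for a real diff.
            out.push_back({rankChange(checkpoint, versions[i].text), file});
        }
    }
    std::sort(out.begin(), out.end(),
              [](const Ranked& a, const Ranked& b) { return a.rank > b.rank; });
    return out;
}
```

A change in a file whose words echo the checkpoint description ends up at the top of the list, which is exactly the behaviour the heuristics are designed to produce.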
Future work (cont.)
- Learning mechanism – automatic tuning of the heuristic weights.
- Why? To find out more about the sources of regression bugs: bad programmers, deadline pressure, technology, design.
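The automatic-tuning idea is left open in the deck; one naive way to sketch it (purely illustrative, not the proposed design) is to treat the overall rank as a weighted sum of the per-heuristic ranks Rank_1..Rank_n(C, P), and to nudge the weights whenever the tool prefers the wrong change over the real culprit. Both function names below are invented for the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Weighted total rank over the per-heuristic ranks of one change.
double totalRank(const std::vector<double>& ranks,
                 const std::vector<double>& weights) {
    double total = 0;
    for (size_t i = 0; i < ranks.size(); ++i) total += weights[i] * ranks[i];
    return total;
}

// One naive tuning step: reward the heuristics that scored the real culprit
// higher than the change the tool wrongly preferred, penalise the others,
// then re-normalise so the weights stay a convex combination. Assumes at
// least one weight stays positive after the update.
void tune(std::vector<double>& weights,
          const std::vector<double>& culpritRanks,
          const std::vector<double>& wrongTopRanks,
          double step = 0.1) {
    for (size_t i = 0; i < weights.size(); ++i) {
        if (culpritRanks[i] > wrongTopRanks[i])
            weights[i] += step;
        else if (culpritRanks[i] < wrongTopRanks[i])
            weights[i] = std::max(0.0, weights[i] - step);
    }
    double sum = std::accumulate(weights.begin(), weights.end(), 0.0);
    for (double& w : weights) w /= sum;
}
```

After a few such corrections the heuristics that reliably point at real culprits dominate the combined rank, which is the effect the "learning mechanism" bullet is after.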