ClearCase to Subversion bridge

Handling Branch Merged Directory Changes
    Reviewing the Subversion and ClearCase Change Models
        The ClearCase Directory Model
        How Various ClearCase Directory Changes Work
        The Subversion Directory Model
        Interpreting the SVN Log
Design
    Quality Level
    Wish List
    ClearCase to SVN
    Caveats, Assumptions, and Release Notes
        Caveats
    SVN to ClearCase: initial migration
        How to do a directory-only merge
    SVN to ClearCase: synchronization updates
    ClearCase to SVN
Requirements for SVN/ClearCase Synchronization

Handling Branch Merged Directory Changes

This section recaps much of the discussion among Mike Pilato, Bob Jenkins, and Bill Rassieur about how to handle merged directory changes when importing to ClearCase from Subversion.

The algorithm developed so far by Bill is able to parse and interpret the Subversion log for directory changes made directly on the sync branch (the source for changes to be migrated into ClearCase). The current algorithm, as of rev 1.76 of the script, fails in a number of cases when the log of the sync branch contains directory changes merged from a branch that is a child of the sync branch. (Note: the term child branch means that the branch originated in Subversion as a copy of the sync branch.)

Reviewing the Subversion and ClearCase Change Models

This section reviews a few of the key aspects of how Subversion and ClearCase model changes to directories and files. The key difference is that a ClearCase version of a directory lists the versioned files and directories it contains, whereas Subversion also records the specific versions of each of the elements the directory contains. We will also discuss the implications of this difference.

The ClearCase Directory Model

This section gives a very brief description of what data ClearCase stores when it versions directories. Then, for each operation such as adding, deleting, or moving an element, we describe what the directory changes look like under this model.

In ClearCase, files and directories are both treated as versioned entities. The general term for a versioned entity is element.
Every element in ClearCase, whether file or directory, has an associated version tree. The version tree records the entire change history of that element. It consists of branches with versions (along with a representation of the contents of each version).

[Editor’s note: be nice to insert a diagram of a version tree at this point.]

Each element has an OID (object id) that uniquely identifies it among the elements stored in the ClearCase repository. Note that elements conspicuously lack a name! A name is not one of their intrinsic attributes. How, then, do files (and directories) get names? To understand that, we need to consider what ClearCase directory elements actually contain as their contents: a list of names, each associated with an OID pointing to an element in the repository.

A key distinction with Subversion is that ClearCase directory versions contain pointers to elements (i.e., whole version trees), and NOT to specific versions of each of the elements the directory version contains. ClearCase uses another mechanism, a set of rules known as a Configuration Specification, to determine which version of an element a user workspace selects. For more information, see any of the Rational reference books on ClearCase.

A ClearCase directory version can also associate a symbolic link with a name. This, in effect, makes the ClearCase file-system model look very much like the Unix file system, with the added concept that a given file (or directory) can be versioned. As you can see, both hard links (which are just OID values) and symlinks can be found in a given version of a directory.

How Various ClearCase Directory Changes Work

The last section described what data ClearCase stores, especially with regard to versions of directory elements. This section builds on that by describing what various change operations do in terms of this model.
This section is written under the simplifying assumption that we are concerned only with a single VOB database. The ‘relocate’ command is not treated here, as it would complicate the discussion quite a bit and is out of scope for the current Bridge project.

Adding files and directories: To add a file or directory under a given parent directory, the parent is checked out, the new element is created, and the parent is checked in. The command that creates the new element, starting its version tree, also inserts a name and OID pointer into the checked-out version of the parent directory. Once the parent directory is checked in, a diff with its predecessor version clearly shows the addition of the new name. This process works the same whether the new element being added is a file or a directory.

Deleting files or directories: To make a file no longer appear under a directory, that directory is checked out, and then an ‘rmname’ command is executed against the name that is no longer required. This removes the name from the list in the checked-out version of the directory. When the parent directory is checked back in, the change is recorded in the repository.

Moving or renaming elements: The source parent directory and destination parent directory are checked out, and a ‘move’ command is invoked. This results in a name removal in the source parent directory and a name insertion in the destination directory. The entire history of the moved element is retained at the new location, because the OID references in the before and after directory entries point to the same element, hence the same version tree.

Copying elements: ClearCase has no built-in command to perform copies per se. There are commands for manually inserting hard links (name plus OID references) into directory versions at will, and these can be used to implement copy functionality.
Scheible Rassieur always advises clients to avoid using this feature of ClearCase, as it results in “instant change propagation”. The notion of instant change is nearly always anathema to best practices of Change Management, in which the desire is to manage and control change.

As mentioned in an earlier section, ClearCase also supports symbolic links. These might be similarly employed to provide copy semantics if one wished to do so. Scheible Rassieur does recommend limited use of symbolic links, not as a means of implementing copy semantics but for other purposes which we will not enter into here.

The Subversion Directory Model

This section gives a brief description of what data Subversion stores as it versions directories. A description of what happens for copies, moves, adds, and deletes is then given in terms of this model, with an eye towards comparing with what ClearCase does.

Note: this section in its current form was drafted by Bill Rassieur, whose knowledge of Subversion is certainly surpassed by that of Mike Pilato or Robert Jenkins.

The Subversion directory model is substantially the same as the ClearCase model in that directories are indeed versioned objects. This gives Subversion a great deal of power to track directory changes, a power that a number of other popular versioning tools lack. The contrast between how Subversion and ClearCase treat directory objects is that in Subversion the references to children that a directory stores actually point to versions of the children, not to the children as a whole. To repeat a statement made earlier in this document:

    A key distinction with Subversion is that ClearCase directory versions contain pointers to elements (i.e., whole version trees), and NOT to specific versions of each of the elements the directory version contains.
One consequence of this structure is that whenever a file somewhere in the repository tree is checked in, each of its ancestor directories all the way back to the root needs to be ‘bumped up’ a version. One can infer from this that the root directory in a Subversion repository must have the maximum number of internal revisions of any versioned object in the system. In fact, it may be the same as the number of commits in the system. [Mike, is that right, or approximately right?]

Adds: When a new file is added to the system, a new versioned object is created for it. This further results in a new internal revision of the parent and all other ancestor directories of the new file.

Deletes: When a file is deleted, its old history is retained (just like in ClearCase). A new internal revision is created for the parent directory and any other ancestor directories of the deleted file.

Copies: A copy works like the Add of a new object, with one exception: the file at the destination of the copy has an internal predecessor pointer linking back to the source of the copy. In this way, Subversion is able to represent the history of the versioned object. Note that the copy operation is essential to how Subversion treats branching. In Subversion, unlike ClearCase, the file system and branching are intertwined: to copy is to branch. In ClearCase the history of a versioned object is associated with that object; where that object is located in the ClearCase representation of the file system is entirely unrelated and is stored in the history of directory objects.

Moves and Renames: These operations (which are really the same thing) are implemented as a Copy from source to destination followed by a Delete of the source.
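Because a move is stored as a copy plus a delete, anything that reads the Subversion log has to pair the two back up before it can replay the change as a single ClearCase move. The following is a minimal sketch of that pairing for one revision, not the algorithm used in the script. The `Change` record and the `classify` function are hypothetical names introduced here for illustration, and the sketch assumes the changed paths of a revision have already been parsed into (action, path, copy source) form. It does not attempt the merged-branch cases discussed at the start of this document.

```python
from typing import NamedTuple, Optional


class Change(NamedTuple):
    """One changed path from a single revision (hypothetical structure)."""
    action: str                      # "A" (add), "D" (delete), or "M" (modify)
    path: str
    copyfrom: Optional[str] = None   # source path when the add is a copy


def classify(changes):
    """Split one revision's changes into moves, copies, adds, and deletes."""
    deleted = {c.path for c in changes if c.action == "D"}
    moves, copies, adds = [], [], []
    moved_sources = set()
    for c in changes:
        if c.action != "A":
            continue
        if c.copyfrom is None:
            adds.append(c.path)
        elif c.copyfrom in deleted:
            # A copy whose source is deleted in the same revision is a move.
            moves.append((c.copyfrom, c.path))
            moved_sources.add(c.copyfrom)
        else:
            copies.append((c.copyfrom, c.path))
    # Deletes that were consumed by a move are not reported separately.
    deletes = sorted(deleted - moved_sources)
    return {"moves": moves, "copies": copies, "adds": adds, "deletes": deletes}


# Illustrative data only; these paths are made up.
revision = [
    Change("A", "/trunk/src/new.c", copyfrom="/trunk/src/old.c"),
    Change("D", "/trunk/src/old.c"),
    Change("A", "/trunk/doc/readme.txt"),
    Change("D", "/trunk/tmp/scratch.txt"),
]
result = classify(revision)
print(result)
```

A plain copy (source not deleted) is reported separately because ClearCase has no direct equivalent; how the bridge should treat it is a design decision this sketch leaves open.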
Interpreting the SVN Log

In order for the ClearCase to SVN source Bridge to work effectively, it must correctly interpret the SVN log so that it can derive and roll up the directory changes occurring along a sync branch between two given revisions. We are having problems in this area currently. This section highlights some of the theory of operation relative to these problems.

[Editor’s note… add more text to this section]

Design

Quality Level

The quality level of this design has been requested as “down and dirty”: something that arguably works, but may have caveats that a refined, finished product would not have.

Wish List

Following is a list of potential features for future consideration:

- address all the caveats shown below
- provide nice error messages to users, and in no case show a Python stack traceback (if it can at all be helped)
- support Linux/Unix environment
- support snapshot views
- the shutil copy and test stuff could say, hey you've forgotten to update? Log it as an error. Also, at the end, we can recommend NOT checking in anything if an error has occurred.
- as an added feature, if there are no errors, then check in and label everything.
- as an added feature, have clearsvn check for non-versioned (aka private) elements before running, as well as checking the configuration of both spaces. (CC if possible.. maybe not).
- be nice to use a .ini file to store parameters, as well as hold the record of updates
- have a log file, and a directory where update logs are maintained.

ClearCase to SVN

Pre-req: the user has set up
-- an SVN client space and two ClearCase views, a BEFORE and an AFTER.
-- the SVN workspace should exactly match the BEFORE view configuration (we need to also have a way for the initial import into SVN - I'll get to that)
-- The AFTER can represent some point later.
NOTE: It behooves the user to have some type of sane configuration rule for their views, such as UCM baselines, base ClearCase labels, or timestamps. If not, they risk losing the ability to synchronize effectively.

When clearsvn is invoked in CC->SVN mode it does the following:

1. Gets command line args. (duh)
2. Does a tree walk comparing the BEFORE view and the SVN workspace.
   -- any differences are noted as errors.
3. Does a tree walk of the BEFORE and AFTER views to get adds, deletes, and modifies.
4. For each ADD or DELETE, looks up the object ids and resolves any matches as moves.
5. Performs the ADD, DELETE, and MOVE operations in SVN, reporting any errors (remember, you can do stuff in ClearCase that you can't play back in SVN...)
6. Performs the modify updates on changed files, reporting any errors (not that we expect any).

Wish List Items:
-- At step 2, you could also imagine having clearsvn check for checkouts and/or view private files, of which there should be none.

Initial setup. This should be pretty easy, actually. We just provide a mode where there is no BEFORE directory, and clearsvn:

1. Gets command line args.
2. Does a tree walk comparing the AFTER view (the only view) and the SVN workspace.
3. Adds any missing elements. Deletes any extra elements. Updates any changes.

In other words, in this mode, clearsvn is effective (but dangerous), like a table saw: it will force the SVN workspace to look EXACTLY like the AFTER view.

Again, I'm not recommending we do any auto-checkin. In my experience (you can accuse me of being all thumbs if you want :-), these kinds of updates are fraught with peril (even with great tools to help - it's the table saw thing): the last thing you want is to have to undo a thousand files' worth of change when your script checked it all in for you w/o review first. However, it is not technically difficult to add that enhancement. I'm not trying to dig my heels in.
It can be done if you really want.

Caveats, Assumptions, and Release Notes

Caveats

The -x option cannot take an empty value: you always have to specify at least one level of directory in SVN that is going to be truncated off the front of all paths processed from SVN into ClearCase. As an example, you have to do ‘-x /trunk’ or ‘-x /branches’ (referring to what you originally populated your client workspace with, which is probably going to be /trunk). This precludes updating ClearCase with all paths under the repository root of SVN.

Some of this section is redundant. To be cleaned up later.

- if a co or other CC operation fails, we just abort; we don't do things intelligently. User must unco and correct the underlying issue.
  -- could be a lot more helpful here
- not parsing the svn log very intelligently:
  -- if there are comments masquerading as Add and Delete operations, we get fooled
  -- not parsing for the change revision, which would be very cool to do, so repeated add/del of the same file is not necessarily handled correctly.
- user keeps track of svn revision changes themselves
  -- would be great if clearsvn.py maintained some information somewhere
- user does initial setup themselves, i.e., svn->CC is a clearfsimport
- after running update, user validates it and manually does CC checkin, or does svn commit
- it's up to the user to refrain from:
  - making any changes in a receiving workspace. If they do, such will be interpreted as requiring update from the sending side. (although this would be detectable, especially in the SVN->CC case... see next item)
  - having any non-versioned files in either a sending or a receiving workspace. clearsvn will not attempt to identify and skip non-versioned files; such files will cause errors. The user will have to go back and clean up any such elements.
- clearsvn is not attempting to reconcile the SVN Modify changes with what we find in the treewalk. We *could* do this and it would help address the case where the user accidentally made changes on the CC receiving workspace where they shouldn't have.

There are two modes and two major cases that have to be considered. Modes: SVN to ClearCase updates, and ClearCase to SVN updates. Cases: a first-time update (a migration), and a synchronization.

There is another case, which occurs when an update is aborted part way through. The same type of issue, but for different reasons, occurs if the assumption of a clean destination is broken. The assumption is that any branch in SVN or in ClearCase that is the target or destination for an update receives changes solely through update operations. The branch never receives changes for any other reason.

Depending on what time allows, we may or may not provide checks for non-versioned files appearing in either the source or target trees. Such files are called view private files in ClearCase terminology. The simplest approach is to assume (read: require) that the user never has any non-versioned files in either tree. It may be prudent, and cost effective, simply to check for any such files and sternly warn the user if any are found when doing a sync operation.

Another assumption we shall make is that sync updates are atomic operations. This means that if an update operation is aborted for some reason before it has completed in its entirety, there is a way for the user to revert the partial changes made to the target. Furthermore, we assume the user always takes care of this, so that the sync script can be designed relying on the notion that it has a revision number or some other means to specify the state of the source and target branches each time it runs.
This means we are not providing a general way for the sync script to decipher the current state of a target branch and make the right set of changes to bring it into line with a source. We are providing a specific way of doing this in which the target branch is assumed clean and up to a certain point, and in which we can easily establish that same state on the source branch.

The script is designed under the assumption that the user has done all the work to set up the following, so that they are ready and usable for the synchronization script:

- a properly configured ClearCase dynamic view.
- a properly configured SVN client workspace.

SVN to ClearCase: initial migration

In this case, the sync script can invoke “clearfsimport” as its central action. The assumption is generally that the target view in ClearCase will appear empty, and that there is an SVN client workspace that the script should use as a source basis. The revision number specifies the source configuration. The source revision number gets recorded in a text file specially maintained in the ClearCase target view.

How to do a directory-only merge

Here’s the overview of how to set up a UCM view in ClearCase to do the directory-only merge into the receiving branch.

1) Configure a view (not the receiving view, but possibly one configured just like it – Merge Manager won’t permit view-to-same-view merging).
2) (optional) Update that view’s config spec to select only directory types (you can omit this step if you don’t know how to do that 100% successfully).
3) Set up the trees in ClearCase you want to merge, and run the find merge.
4) If you did step 2, skip to step 6.
5) In the graphical list, sort by type, select all files, and delete them from the list.
6) Select all candidates and merge them in. Note their merge type should all say ‘trivial’.
7) Check in the merged directory elements.
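The revision record described under the initial migration (a text file in the ClearCase target view holding the source revision number) could be handled roughly as follows. This is only a sketch under stated assumptions: the file name `sync_revision.txt` and the two helper functions are hypothetical, the file is assumed to contain nothing but the revision number, and in a real ClearCase view the file would have to be checked out before it is rewritten.

```python
import tempfile
from pathlib import Path
from typing import Optional

RECORD_NAME = "sync_revision.txt"  # hypothetical file name, not from the script


def read_last_revision(view_root: str) -> Optional[int]:
    """Return the recorded SVN revision, or None if no record exists yet."""
    record = Path(view_root) / RECORD_NAME
    if not record.exists():
        return None
    return int(record.read_text().strip())


def write_last_revision(view_root: str, revision: int) -> None:
    """Record the SVN revision that the target view now corresponds to.

    Assumes the file is writable; under ClearCase that means it is either
    new (not yet an element) or already checked out.
    """
    (Path(view_root) / RECORD_NAME).write_text(f"{revision}\n")


# Demonstration against a throwaway directory standing in for the view root.
with tempfile.TemporaryDirectory() as view_root:
    before = read_last_revision(view_root)   # no record yet: first migration
    write_last_revision(view_root, 1234)     # 1234 is an arbitrary example
    after = read_last_revision(view_root)
print(before, after)
```

A missing record doubles as the signal that this is a first-time migration rather than a synchronization update.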
SVN to ClearCase: synchronization updates

In this case, the sync script expects to find a special text file in the target view area that indicates the revision level of the SVN source tree the last time any update occurred. The script makes sure the SVN source area is updated, and then parses the log for the changes that have occurred since the last update.

As part of the update, the special text file in the ClearCase view is updated. This file is kept under revision control and checked in after all updates have been made during each synchronization.

The log from SVN is parsed. The overall (cumulative) effect of adds, deletes, and moves is determined and translated into mkelem, rmname, and mv commands for cleartool. These commands are issued.

After that, the source and destination trees are walked. In each directory, contents are compared. At this point, there should be no directory-level differences: all files and directory contents should match up. Any remaining differences are reported as errors.

The files are compared. If any are different, then the source version is copied over a writeable (checked out) copy of the destination file. Finally, everything is checked in.

Note that certain directories shall be excluded from synchronization updates. One case is that no .svn directory from an SVN source tree will ever be moved into ClearCase.

During operation of the script, a log file is kept that tracks every command and every step that the sync script takes. The log file name includes a timestamp so that it is easy to correlate a given log file with the run that produced it.

ClearCase to SVN

To be determined. This is contemplated to work much like the SVN to ClearCase operations discussed above.
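The tree walk described for synchronization updates (compare each directory, report structural differences as errors, collect modified files, skip .svn) can be sketched with Python's standard `filecmp.dircmp`. This is an illustration, not the script's actual walk: `compare_trees` and the report keys are hypothetical names, and the temporary directories merely stand in for an SVN workspace and a ClearCase view.

```python
import filecmp
import os
import tempfile

EXCLUDED = [".svn"]  # names never carried across, per the text above


def _collect(dc, rel, report):
    """Accumulate differences from one dircmp node and its subdirectories."""
    report["source_only"] += [os.path.join(rel, n) for n in dc.left_only]
    report["target_only"] += [os.path.join(rel, n) for n in dc.right_only]
    report["modified"] += [os.path.join(rel, n) for n in dc.diff_files]
    for name, sub in dc.subdirs.items():
        _collect(sub, os.path.join(rel, name), report)


def compare_trees(source, target):
    """Return paths (relative to the roots) that differ between two trees.

    Note: dircmp compares files by stat signature first, so a change that
    keeps both size and modification time identical could be missed.
    """
    report = {"source_only": [], "target_only": [], "modified": []}
    _collect(filecmp.dircmp(source, target, ignore=EXCLUDED), "", report)
    return report


def _write(path, text):
    with open(path, "w") as handle:
        handle.write(text)


# Demonstration with throwaway trees; all file names are made up.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    os.makedirs(os.path.join(src, ".svn"))
    os.makedirs(os.path.join(src, "lib"))
    os.makedirs(os.path.join(dst, "lib"))
    _write(os.path.join(src, "lib", "a.txt"), "new contents")
    _write(os.path.join(dst, "lib", "a.txt"), "old")
    _write(os.path.join(src, "only_in_source.txt"), "x")
    report = compare_trees(src, dst)
print(report)
```

After the cleartool commands have been issued, `source_only` and `target_only` would be expected to be empty, so anything left in them corresponds to the error case in the text, while `modified` lists the files to copy over checked-out versions.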
Requirements for SVN/ClearCase Synchronization

Summary – In general, the requirement is to be able to reproduce the snapshot state of the source version management system in the target version management system. That entails adds, modifications, deletes, renames and moves of files and directories. Synchronization must be supported from Subversion to ClearCase and vice versa.

SVN -> ClearCase Synchronization –

1. Update the SVN working copy (focused on the development branch)
2. Need to identify the files and directories that have moved (or been renamed) via the ‘svn log’ command (e.g., I was able to get the new location/name and old location/name via the following command line, but this would need to filter by either date or revision to get just the changes since the last synchronization)

       svn log --verbose|grep from|sed 's/ A //'|sed 's/(from //'|sed 's/)//'

3. Need to execute the ClearCase command (cleartool mv) to move the identified files and directories in the ClearCase view (focused on a branch for the SVN work)
4. Need to export SVN (svn export)
5. Need to execute clearfsimport to update ClearCase with new files, deleted files and modified files (per the current clearcvs.py with deleted files handled) from the SVN export

ClearCase -> SVN Synchronization –

1. Triggers need to be created for the following ClearCase commands: mv, rmname, rmelem and rmfolder. The triggers would need to record the from and to paths along with the operation to be performed (i.e., move or remove).
2. Need to be able to execute moves (svn move) and deletes in the SVN working copy (focused on a branch for the ClearCase work) working from a list of files and directories that have moved in ClearCase.
3. Copy the contents of the ClearCase view (focused on the internal development branch) into the SVN working copy (being sure to eliminate the lost+found directory)
4. Identify the files and directories that need to be added (e.g., I was able to do this via the following command line).

       svn status |awk '/\?/'| awk 'BEGIN { FS= "?" } { print $2 }' > ..\addfiles

5. Execute an SVN add on the identified files and directories (e.g., I was able to do this via the following command).

       svn add --targets ..\addfiles

6. Execute an SVN commit which will commit the adds, moves and modifies together.

The key here is to automate the steps for the synchronization. We can document how to establish the views and working copies, leaving it to the user to execute properly. The SVN to ClearCase synchronization is the most critical one for the customer, so it is needed first. It would be nice to allow the commit message to be passed to the script.
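Since the aim is to automate these steps, the awk pipeline used above to find unversioned items could be folded into the Python script. The sketch below is one possible shape, assuming the text of `svn status` has already been captured as a string; `unversioned_paths` is a hypothetical helper and the sample output is invented. Unlike the awk version, which splits on every ‘?’, it only looks at the first status column, so a path that itself contains a question mark survives intact.

```python
def unversioned_paths(status_output):
    """Return paths that 'svn status' marks with '?' in the first column."""
    paths = []
    for line in status_output.splitlines():
        if line.startswith("?"):
            # Drop the status flag and the padding that precedes the path.
            paths.append(line[1:].strip())
    return paths


# Invented sample resembling 'svn status' output.
sample = (
    "?       docs/new_page.txt\n"
    "M       src/main.c\n"
    "?       src/what is this?.c\n"
    "A       src/added.c\n"
)
to_add = unversioned_paths(sample)
print(to_add)
```

Writing the resulting list to a file, one path per line, gives the input that `svn add --targets` expects in step 5.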