Version Control with Git
xkcd.com/1597
adam-harding@uiowa.edu

Every project has 2 problems…

Problem 1: Progress
Project changes over time: Old? New? Both?
[Diagram: three snapshots of the project over time: initial effort (last month), add stuff (last week), change stuff (yesterday)]
• When was some feature added? Why?
• Maintain previous version while working on new?
• Fix the same bug once in old and new version?

“Many people’s version-control method of choice is to copy files into another directory (perhaps a timestamped directory, if they’re clever). This approach is very common because it is so simple, but it is also incredibly error prone.” –Pro Git

Difficult even if you are the only one! But collaboration is totally impractical, because of…

Problem 2: Edit Wars
Multiple, simultaneous changes: Laptops? Colleagues?
[Diagram: two edited copies of the project, one containing "int main(){} int foo(){}" and the other "int main(){} void foo(){}". Can I keep both?]

Different files:
• Overwrite old version of each. (Trivial. A computer could do it without hints!)
Same file, but lines don’t overlap:
• Edit the file to incorporate each line modification. (Manually? Hmm…)
Same file, same lines:
• Conflicts! For each line, I choose what to do. (Can the computer help? Hmm…)
Solution: VCS
Version Control System
[Diagram: a repository holds versions v7 and v8 of awesome.h and things.c. Through the VCS tool you (1) get bits from the repository and (2) store changes, which creates v9 containing stuff.h, awesome.h, and things.c.]

Why Git:
• Basic features are really useful, and… …advanced features if you want them
• Robustly retains your data, but… …can undo almost anything if needed
• Adaptable: you, the lab, MegaCorp…

Git: Outline
• Using Git to record your project while you work
• How Git records your progress
• Using Git to organize your progress
• Sharing a project with other Git users

Commit
• Primary unit for storing work
• Exists inside a repository
• A snapshot of what your project’s files looked like when you made the commit
• Contains: author name and email address, timestamp, snapshot data, and other metadata (such as the commit message and links to parent commits)
• Identified by a unique hash of its contents

Commit History
• Unless it is an “orphan”, each commit descends (as the “child”) from at least one earlier commit (its “parent”)
• Project history is all the project’s commits in order
• Yes, history is a Directed Acyclic Graph whose nodes are commits. Hold that thought!
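The fields listed on the Commit slide can be seen by printing a raw commit object. The following is a minimal sketch, not part of the original deck: the temporary directory, the file contents, the second commit message, and the "Demo User" identity are assumptions made up for illustration.

```shell
#!/bin/sh
# Sketch: create two commits in a throwaway repository, then print the newest one.
set -eu
demo=$(mktemp -d)                           # hypothetical scratch directory
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master     # name the first branch "master", as in the slides
git config user.name "Demo User"            # made-up identity for this demo
git config user.email "demo@example.com"

echo 'int main(){}' > things.c
git add things.c
git commit -q -m 'add initial functions'

echo 'int foo(){}' >> things.c
git add things.c
git commit -q -m 'add foo'

# A commit object lists its snapshot (tree), parent(s), author, committer, and message.
commit_text=$(git cat-file -p HEAD)
printf '%s\n' "$commit_text"

commit_hash=$(git rev-parse HEAD)           # the hash that identifies this commit
parent_hash=$(git rev-parse HEAD~1)         # the hash of its parent
echo "commit: $commit_hash"
echo "parent: $parent_hash"
```

The "parent" line in the output is the link that makes the history a graph: following it from commit to commit visits every ancestor.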
Repository: 3 Areas
• Working Copy: The file tree you see on your filesystem
• Staging Area: List of changes to the previous snapshot you will apply as the next snapshot
• Commit History: The snapshots you stored previously
[Diagram: files stuff.h, awesome.h, things.c move from the Working Copy through the Staging Area into the Commit History]

Demo: Store a new project!

1) Files are ready!
(^>^) cd somefolder
[somefolder/ contains awesome.h and things.c]

0) (Need a repository first)
(^>^) git init    # Create a new, empty repository in the current directory
[somefolder/.git/ now holds the Staging Area and the Commit History]

1) (Files still untracked)
(^>^) git status
awesome.h    # untracked
things.c     # untracked
# Git sees any file, but doesn’t care about it until you say so.

2) Add changes
(^>^) git add things.c awesome.h
(^>^)    # Changes are staged.
# Git starts tracking a file the first time you add its changes!
2) (These are now tracked)
(^>^) git status
new file: awesome.h
new file: things.c

3) Commit staged changes
(^>^) git commit -m 'add initial functions'
(^>^)    # Always write a nice commit message!

Commit terminology
Yes, this means you “commit a commit”. Sorry.

Repeat: 1) Files are ready
(^>^) $EDITOR things.c stuff.h

Repeat: 1) (some tracked, some new)
(^>^) git status
stuff.h     # untracked (The previous commit doesn’t have this file.)
things.c    # modified (The previous commit has a version of this file.)

Repeat: 2) Add changes
(^>^) git add things.c stuff.h

Repeat: Commit Stuff
(^>^) git commit -m 'new parms in somefunction'

Check the log
(^>^) git log
new parms in somefunction
add initial functions
# The log lists the newest commit first.

Good advice
• Use commit messages: Somebody in the future needs the help
• Separate your build artifacts: Use a build directory
• Avoid committing build artifacts: If you can get the project, you can build it!
• “Release” means you published binaries: Others can download from the website (or FTP site, etc.)
[Diagram: build output lives in somefolder/build/, separate from the tracked files in somefolder/]
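The demo and the build-directory advice above can be combined into one script. This is a sketch under assumptions: the file contents, the pretend object file, and the use of a .gitignore entry to keep build/ out of the repository are illustrative choices, not something the slides prescribe.

```shell
#!/bin/sh
# Sketch: two commits as in the demo, with build artifacts kept out of the history.
set -eu
demo=$(mktemp -d)                           # hypothetical scratch directory
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"            # made-up identity for this demo
git config user.email "demo@example.com"

# First commit: two files, staged and committed.
printf 'int awesome(void);\n' > awesome.h
printf '#include "awesome.h"\nint main(void){return 0;}\n' > things.c
git add things.c awesome.h
git commit -q -m 'add initial functions'

# Build artifacts go in build/, and .gitignore tells Git to leave that directory alone.
mkdir build
echo 'pretend object file' > build/things.o
echo 'build/' > .gitignore

# Second commit: one new file, one modified file, plus the ignore rule.
printf 'int stuff(void);\n' > stuff.h
echo '/* changed */' >> things.c
git add .gitignore stuff.h things.c
git commit -q -m 'new parms in somefunction'

log_subjects=$(git log --format=%s)         # commit messages, newest first
tracked_files=$(git ls-files)               # everything Git is tracking
status_after=$(git status --porcelain)      # empty when nothing is pending

printf '%s\n' "$log_subjects"
printf '%s\n' "$tracked_files"
```

With the ignore rule in place, build/things.o never shows up as untracked, so it cannot be staged by accident.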
Git: Outline
• Using Git to record your project while you work: Commits
• How Git records your progress
• Using Git to organize your progress
• Sharing a project with other Git users

The DAG
History:
• The DAG grows as commits arrive. What about a new feature I want to keep separate?
• Work on a new feature, but keep it separated until it’s ready
• I drew another “column”, but this history is just a string!
• Work continues on the stable version: a commit can have more than one child

Git: Outline
• Using Git to record your project while you work: Commits
• How Git records your progress: The DAG
• Using Git to organize your progress
• Sharing a project with other Git users

The DAG: Branches
History:
• Easy to find the history of any commit: follow links to visit every ancestor
Branches:
• We only need to store a single reference to identify each path; call it a branch ref (or just “branch”)
• Git creates the “master” branch by default, and stores commits there unless told otherwise
The columns are just a drawing convention! Only topology matters.
[Diagram: commits A, B, C on master; commit P on feat42, branching off at B]

The DAG: Branches
Q: What are you working on right now?
A: Current location is simply what the HEAD ref is pointing to
Git appends your staged commit onto the DAG at the current branch, then moves the branch to that commit.
[Diagram: HEAD points to master]

Branch details
Commits A, B, and P are “on the feat42 branch”, so feat42 usually means both of these identically:
• “the ref named feat42”
• “all the commits in the history of the ref named feat42”
But if somebody says you can “delete your feat42 branch”, you know they mean the ref!

Branch details
Deleting your feat42 branch is removing the branch ref!
• P is no longer in the history of ANY branch
• You can still access it, but…
• …Git will eventually garbage-collect such unreferenced commits (after roughly 30 days by default)

Branch demo
OK, same example again. This time, note how some commands modify the branch refs.
[Diagram: HEAD points to master, which points to B; B’s parent is A]

(^>^) git branch    # list the branches; '*' indicates current branch
* master

(^>^) git branch feat42    # create the branch 'feat42' at current location
[feat42 now also points to B]

(^>^) git branch    # notice we are still on master!
  feat42
* master

(^>^) git checkout feat42
(^>^) git branch
* feat42
  master
[HEAD now points to feat42]

(^>^) $EDITOR newfeature.c
(^>^) git add newfeature.c
(^>^) git commit -m 'add function for feat42'
[New commit P on feat42; its parent is B]

(^>^) git checkout master
[HEAD points to master again]

(^>^) $EDITOR mainprog.c
(^>^) git add mainprog.c
(^>^) git commit -m 'fixed a bug'
[New commit C on master; its parent is B]

Problem:
• Commit C contains a bugfix
• Commit P contains your new feature
• But your feature needs the bugfix, and
• Commit C is only in the history of master, not feat42
That is, you need the changes between B and C to be in the history of your feat42 branch.

Merge
The most common technique is called merging:
• Snapshot Pm will combine the image of C with the image of P
• Pm will have more than one parent, making it a merge commit

Merge demo
git merge <source>
Git will merge the branch you specify into your current branch (be careful!):
(^>^) git checkout feat42
(^>^) git merge master
[Diagram: merge commit Pm on feat42 with parents P and C; HEAD points to feat42]

Merge conflicts
Notice that Git must figure out how to merge two snapshots. Recall:
Different files:
• Overwrite old version of each. (Trivial. Git does this automatically.)
Same file, but lines don’t overlap:
• Edit the file to incorporate each line modification. (Git is very good at figuring out how to do this.)
Same file, same lines:
• Conflicts! For each line, I choose what to do. (Git identifies conflicting files, and marks any conflicts inside files.)
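The “same file, same lines” case can be reproduced on purpose to see how Git marks a conflict. This is a sketch only: it reuses the int foo / void foo idea from the Edit Wars slide, and the file name, commit messages, and "Demo User" identity are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: two branches add different text at the same place, so the merge conflicts.
set -eu
demo=$(mktemp -d)                           # hypothetical scratch directory
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"            # made-up identity for this demo
git config user.email "demo@example.com"

printf 'int main(){}\n' > things.c
git add things.c
git commit -q -m 'add main'
git branch feat42                           # feat42 starts at the same commit as master

# On master: add one version of foo.
printf 'int main(){}\nint foo(){}\n' > things.c
git commit -q -am 'add int foo'

# On feat42: add a different version of foo on the same line.
git checkout -q feat42
printf 'int main(){}\nvoid foo(){}\n' > things.c
git commit -q -am 'add void foo'

# Merge master into the current branch (feat42). Git stops and reports a conflict.
if git merge master >/dev/null 2>&1; then
    merge_result=clean
else
    merge_result=conflict
fi
echo "merge result: $merge_result"

# List the files Git could not merge, then show the markers it wrote into the file.
conflicted=$(git diff --name-only --diff-filter=U)
echo "conflicted files: $conflicted"
cat things.c
```

The section between the "<<<<<<<" and "=======" markers is the current branch’s version, and the section up to ">>>>>>>" is the version from the branch being merged. Resolving means editing the file to the text you want, then running git add and git commit.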
Merge conflicts
• Conflicts! For each line, I choose what to do. (Git identifies conflicting files, and marks any conflicts inside files.)
• Resolution: often interactively (meld, kdiff3, etc.)
• Simply part of life; not unique to Git.
• Not discussed further here.
• OK in small numbers, but pile up quickly.
• You can avoid many of them entirely!

Merge conflicts
branches diverge a lot between merges == a lot of merge conflicts at the next merge
“feat42 is done. I bet this merge takes all day. I wish I saw those changed lines before I edited them!”
[Diagram: master has commits A through F; feat42 has commits P, Q, R, with no merges from master along the way]

Merge conflicts
branches diverge a little between merges == fewer merge conflicts at the next merge
“feat42 is done. Easy! Git combined most changes from master before I edited them!”
[Diagram: master has commits A through E; feat42 has commits P, Pm, Q, Qm, R, where Pm and Qm are merges from master]

Merging feat42 into master will be easy!

Discipline counts:
• Many small commits!
• Separate changes logically!
• Merge often!

Git only had to move the master ref forward from E to R to indicate feat42 is in its history. No merge commit! This is a “fast-forward merge”.
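A fast-forward merge can be observed in a much smaller history than the one drawn above. In this sketch (file names, commit messages, and identity are assumptions), master does not move while feat42 gains a commit, so merging only moves the master ref.

```shell
#!/bin/sh
# Sketch: master has nothing new, so merging feat42 just moves the master ref forward.
set -eu
demo=$(mktemp -d)                           # hypothetical scratch directory
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"            # made-up identity for this demo
git config user.email "demo@example.com"

echo 'first' > notes.txt
git add notes.txt
git commit -q -m 'commit on master'

git checkout -q -b feat42                   # create feat42 and switch to it
echo 'feature' > newfeature.c
git add newfeature.c
git commit -q -m 'add function for feat42'

git checkout -q master
before=$(git rev-parse master)              # where master points before the merge
git merge -q feat42                         # nothing to combine: Git fast-forwards
after=$(git rev-parse master)

# Count the parents of the commit master now points to (a merge commit would have 2).
words=$(git rev-list --parents -n 1 master | wc -w | tr -d ' ')
parent_count=$((words - 1))

echo "master before: $before"
echo "master after:  $after"
echo "parents of master's commit: $parent_count"
```

After the merge, master and feat42 point to the same commit and no new commit was created.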
Git: Outline
• Using Git to record your project while you work: Commits
• How Git records your progress: The DAG
• Using Git to organize your progress: Branches
• Sharing a project with other Git users

Sharing
• Repositories are identified by URIs, but each repository can give the others short names for convenience
• You transfer commits between a repository and its remote repositories (“remotes”)
[Diagram: Alice’s machine (/home/alice/) and the remote repository https://foo.org/bar.git with its master branch]

Sharing
When you first access a project whose files are in a Git repository, you perform the following sequence:
1. create a new repo
2. add to it the remote you specified (naming it “origin” by convention)
3. copy origin’s branches and commits into your new repo (so you can easily keep your repo up-to-date)

Remotes
VERY common, so Git offers “clone” as a shortcut:
(^>^) git clone https://foo.org/bar.git
[Diagram: /home/alice/bar/ has a remote named origin (https://foo.org/bar.git) and a master branch copied from it]

Remote tracking branches
“clone” also sets up a branch ref in your repository that points to the same commit as the corresponding branch in the remote, as of the last time you contacted it.
This remote tracking branch is read-only: Git moves it for you based on where the remote’s branch was when you last checked. Here, your master branch is the tracking branch paired with origin/master.
Notice that remote tracking branches are qualified by remote names!
[Diagram: /home/alice/bar/ has master and origin/master; origin is https://foo.org/bar.git, which has its own master]

Syncing remotes
Project members add new commits to the master branch on the origin repository…
…so you need to update your repository! This happens in 2 steps:
1. fetch the state of tracked branches from the remote, along with any new commits
2. merge origin/master into your master branch
VERY common, so Git offers “pull” as a shortcut for both steps:
(^>^) git pull origin

Syncing remotes
Project members add new commits to the master branch on the origin repository…
…and that means you (or maybe Bob); here’s how he did it:
1. git pull origin    # Just in case Alice put new commits there! Do this often!
2. place the new commits into the desired branch (master in this case); merge, direct commit, whatever
3. git push origin    # Naturally, this updates origin/master automatically
[Diagram: /home/alice/bar/ and /home/bob/bar/ each have master and origin/master, with origin = https://foo.org/bar.git]

Project organization
• There are multiple copies of the repository. (Your laptop, your workstation, Bob’s laptop…)
• In principle, you can transfer commits from any repository to any other repository
• You and your team choose one to be the central repository by convention
GitHub, Bitbucket, and similar services provide nice tools for organizing this.

Git: Outline
• Using Git to record your project while you work: Commits
• How Git records your progress: The DAG
• Using Git to organize your progress: Branches
• Sharing a project with other Git users: Remotes
The remainder is elaboration and convenience.

So… How about now?
xkcd.com/1597
“If that doesn't fix it, git.txt contains the phone number of a friend of mine who understands git. Just wait through a few minutes of 'It's really pretty simple, just think of branches as...' and eventually you'll learn the commands that will fix everything.”

Thanks! Questions?
atlassian.com/git
git-scm.com/book
gitguys.com
icts.uiowa.edu/confluence
events.uiowa.edu/event/git_workshop
see “Version Control with Git”
Prof. Hans Johnson
Michigan Room, IMU
Monday, 1:30-3:00
Next: hands-on session in the lab
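For practice, the clone, fetch, merge, and push cycle from the Syncing remotes slides can be tried without a server. This is a sketch under assumptions: a local bare repository (origin.git) stands in for https://foo.org/bar.git, the alice and bob directories stand in for the two home directories, and the file contents and identities are made up.

```shell
#!/bin/sh
# Sketch: Alice publishes, Bob clones and pushes, Alice fetches and merges.
set -eu
demo=$(mktemp -d)                           # hypothetical scratch directory
cd "$demo"

# A bare repository plays the role of the shared "origin".
git init -q --bare origin.git
git --git-dir=origin.git symbolic-ref HEAD refs/heads/master

# Alice: create a repo, add the remote, and publish the first commit.
git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice"                # made-up identity
git config user.email "alice@example.com"
git remote add origin ../origin.git
echo 'int main(){}' > things.c
git add things.c
git commit -q -m 'add initial functions'
git push -q origin master
cd ..

# Bob: clone (new repo + remote named origin + copy of branches and commits).
git clone -q origin.git bob
cd bob
git config user.name "Bob"                  # made-up identity
git config user.email "bob@example.com"
echo 'int foo(){}' >> things.c
git commit -q -am 'add foo'
git pull -q origin master                   # just in case Alice pushed something new
git push -q origin master
cd ..

# Alice: update in two explicit steps instead of using pull.
cd alice
git fetch -q origin                                     # step 1: moves origin/master
behind=$(git rev-list --count master..origin/master)    # commits Alice is missing
git merge -q origin/master                              # step 2: here a fast-forward
echo "Alice was behind by $behind commit(s)"
git log --format=%s
```

Running fetch and merge separately makes it visible that origin/master moves first and the local master only moves at the merge.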