Monday, December 23, 2013

How to use GIT with Subversion

       The integration between git and Subversion (git-svn) is so well done that several of us have been using git as our interface to all our Subversion repositories. Doing this is fairly simple, but there are some interesting tricks, and so I thought I would share a day in the Viget life with git-svn.

Getting your repository set up

Checking out a Subversion repository is as simple as can be:
  
git-svn clone -s http://example.com/my_subversion_repo local_dir

The -s is there to signify that my Subversion repository has a standard layout (trunk/, branches/, and tags/). If your repository doesn't have a standard layout, you can leave that off.


As you would expect, this leaves you with a git repository under local_dir. It should map to the trunk of your Subversion repository, with a few exceptions. First, any empty directories under Subversion won’t show up here: git doesn’t track empty directories, as it tracks file contents, not files themselves. Also, files you were ignoring via svn:ignore are not ignored in this git repository. To ignore them again, run the following command in the root of your repository:

 git-svn show-ignore > .gitignore

You may see this same command as git-svn show-ignore >> .git/info/exclude elsewhere, and that is a valid way to get the same outcome. The difference is that with the former method, you are adding a file in your repository that you can then track. Even though we will be committing back to Subversion, I like tracking .gitignore. As you may have noticed, running git-svn show-ignore is slow, and by committing .gitignore back to the repository, others using git-svn won’t have to run this again.
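If you go the .gitignore route, tracking it might look something like this (the commit message is just an example):

    git-svn show-ignore > .gitignore
    git add .gitignore
    git commit -m "Track svn:ignore patterns in .gitignore"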

Making branches

One of the best reasons to use git is its lightweight local branches. These are not the same as Subversion branches: they live locally and can be created, destroyed, and merged easily. When working on a project, you'll probably want to create a branch every time you start on a new feature. It's very simple to do: run git branch new_branch_name master, where "master" is the branch you are forking the new branch from. If you are forking from the current branch, you don't have to say so; you can just type git branch new_branch_name. To move to the new branch, run git checkout new_branch_name. To make things easier, all of these steps can be combined, like so:

 git checkout -b new_branch_name [old_branch_name]


Again, you do not have to include the old branch name if you do not want to.
To see all the branches in your repository, you can execute git branch. You might wonder why your Subversion branches do not show up. They exist as remote branches, not local branches, but you can still get to them. Execute git branch -a to see local and remote branches, which should show you your Subversion branches and tags.
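For example (the branch name here is purely illustrative):

    git checkout -b add_search master   # fork add_search from master and switch to it
    git branch                          # list local branches only
    git branch -a                       # list local and remote branches, including the Subversion ones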

  

Adding changes

Once you are in a branch for your feature, basic git usage is much like Subversion. When you create a new file, you execute git add <new_filename> to start tracking the new file, and git rm <filename> to remove a file. You will note that git add is recursive, and so git add . will add all new files in the current directory or below. git rm will do the same thing, but you have to explicitly give it a -r flag. Do not make the same mistake I did early on: I manually removed a few files, and then, extrapolating from my experience with git add ., I thought, “I’ll run git rm -r . to stop tracking those files.” Definitely not my smartest moment.
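A few illustrative examples (the file names are made up):

    git add app/models/search.rb     # start tracking a new file
    git rm app/models/old_search.rb  # stop tracking (and delete) a file
    git add .                        # recursive: stages every new or changed file from here down
    git rm -r tmp/cache              # rm only recurses when you ask it to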

For files that are already being tracked, you will still have to run git add to add changes to a changeset. You can do this by executing git add <filename> or git add . (to add every change), but I prefer another way:

    cnixon$ git add --patch
    diff --git a/config/environment.rb b/config/environment.rb
    index f3a3319..34ece3a 100644
    --- a/config/environment.rb
    +++ b/config/environment.rb
    @@ -19,6 +19,8 @@ Rails::Initializer.run do |config|
       # Only load the plugins named here, in the order given. By default, all plugins in vendor/plugins are loaded in alphabetical order.
       # :all can be used as a placeholder for all plugins not explicitly named.
       # config.plugins = [ :exception_notification, :ssl_requirement, :all ]
    +
    +  config.gem 'chronic'
     
       # Add additional load paths for your own custom dirs
       # config.load_paths += %W( #{RAILS_ROOT}/extras )
    Stage this hunk [y/n/a/d/?]? y
    <stdin>:9: trailing whitespace.
    warning: 1 line adds whitespace errors.


When you run git add --patch, git shows you every hunk - that is, every contiguous set of changes - and asks whether you want to stage it. This makes it easy to commit some changes but not others, and, more importantly, it begins to show you how git tracks changes differently: git does not track files, it tracks content and changes to that content. It is a good habit to get into: looking at each of your changes before a commit helps make sure you are committing what you think you're committing.

Reverting changes

One command I used to use a lot in Subversion that doesn’t have a clear equivalent in git is svn revert. In git, how you revert depends on whether the change has been added to the current changeset. If it hasn’t, you can execute:

 git checkout <filename>

I know, that seems incredibly strange. I thought so, too, at first. It makes sense when you think of it as re-checking out the last committed version over your changes.

If you have added your change to the changeset, you will have to remove it from the changeset first, by executing:
  
git reset HEAD <filename>
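Putting the two together, undoing a change that has already been added looks like this (using the file from the earlier example):

    git reset HEAD config/environment.rb   # pull the change back out of the changeset
    git checkout config/environment.rb     # then throw away the edit in your working copy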

Committing changesets

Committing is very familiar if you are used to Subversion. You execute git commit and get prompted for a message in your favorite text editor. If you have outstanding changes, and want to add them without looking at them, you can execute git commit -a.
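For example (the inline message is just illustrative):

    git commit                    # prompts for a message in your editor
    git commit -a                 # also stages changes to files git already tracks
    git commit -m "Add chronic"   # supply the message on the command line instead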

Merging a branch back to master

Once your feature is finished, you’ll want to merge your branch back to the master branch. This is way easier than it would be in Subversion. You just execute:

    git checkout master
    git merge <feature_branch>

If there are conflicts, git will tell you about them, but it won't resolve them interactively: you will have to go fix them in your editor. Add the conflict fixes to the changeset with git add, and then commit.
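When a merge does conflict, the cleanup looks roughly like this (the conflicted file is hypothetical):

    # fix the conflict markers in your editor, then:
    git add config/routes.rb   # mark the conflict as resolved
    git commit                 # finish the merge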

Sometimes, you may make a lot of commits in your feature branch that you want represented as one commit in the main branch. This is especially useful if you’re using CruiseControl or some other automated testing tool: you can make one large commit that passes all the tests, and don’t necessarily have to worry about passing all the tests while developing your feature. To do this, add the --squash flag to git merge, like so:
 
 git merge --squash <feature_branch>
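Note that --squash stages the combined changes but does not create the commit for you, so the full sequence looks like this:

    git checkout master
    git merge --squash <feature_branch>
    git commit   # one commit on master containing the whole feature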


Updating from and committing back to Subversion

Before committing back to Subversion, you will want to update to apply any new changes in the repository to your local Git repo.

 git-svn rebase

This will download all new changesets from Subversion, apply them to the last checkout from Subversion, and then re-apply your local changes on top of that.

When you’re ready to commit back to Subversion, execute:
  
git-svn dcommit
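Put together, pushing a finished feature back to Subversion might look like this (the branch name is just an example):

    git checkout master
    git merge my_feature   # or git merge --squash my_feature, followed by git commit
    git-svn rebase         # pick up anything new from Subversion first
    git-svn dcommit        # replay your local commits onto the Subversion repository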

There are a lot more commands and options in git to learn, but these should be sufficient for most day-to-day usage of git as a front-end to your Subversion repository. After using it for the last month, I cannot imagine going back. 



 

Friday, December 20, 2013

Easy builds - Distributed CI

I recently defined Continuous Integration as the practice of constantly merging development work into a Master/Trunk/Mainline branch so that you can test your changes, and test that your changes work with other changes: the "as early as possible" integration methodology. The idea is to test your code as often as possible to catch issues early (Continuous Delivery vs Continuous Deployment vs Continuous Integration).

Watching a presentation by Jez Humble of ThoughtWorks, who defines Continuous Integration (CI) in relation to Continuous Delivery, I realized that my definition is in direct opposition to two minutes of his presentation: http://www.youtube.com/watch?v=IBghnXBz3_w&feature=youtu.be&t=10m. So why is Jez so adamant about these points, when I feel I am doing CI without everyone committing to Mainline daily, with feature branches, and often working locally, making commits without running ALL the integration tests? Can I be? I say yes, and the reason is where the integration points are and what kind of code moves forward to Mainline: unstable code or stable code. These differences keep a clean, stable Mainline, which in turn gives you the ability to deploy code to Production at any time. Jez's definition of CI (Centralized CI) causes bottlenecks and is in direct contradiction with the process of Continuous Deployment (CD), whereas Distributed CI removes these barriers while still giving confidence that good code is moving to Production.


Argument

The argument goes that a Continuous Integration process constantly integrates all development work across the entire project into Mainline to detect issues early in the code's lifecycle. I argue that when utilizing Continuous Deployment this is detrimental, and that the proper way is to integrate only Production-ready code into Mainline while merging Mainline back onto the isolated development work. Recently, I explained this concept to another developer, and their response was: "I have never thought about merging backwards from master to branches in order to run tests, amazing". Continuous Deployment is not achievable using traditional CI methodologies as described by ThoughtWorks and others, because the bottlenecks will prevent the flow of code from developer to Production; but the simple notion of merging backwards to run tests, so that you see a view of Mainline before you integrate upwards, will let you achieve Continuous Deployment.


Broken Builds


Oh, and let's not forget broken builds: a broken build prevents anyone from reliably taking code from Mainline or moving code from Mainline to Production. Centralized CI has a way to "fix" this, rule #2: never break the build, and if you do, you must fix it and not leave until you do. Never mind the insanity of the first part of this rule, since the only way to guarantee satisfying it is to never commit: I cannot control how my code integrates with other, unknown code. And sure, if I break something, I understand that I must fix it. But how do we know it was my code that broke the build and not yours? We don't. But assume it is me who broke it; now everyone is waiting for me to fix it. I have become a bottleneck, and no code can move past Mainline. Shoot, I guess Friday night beers are not in my future. The pipeline from developer to Production has halted dead in its tracks; no code can be Continuously Deployed. Some places call this a lesson (http://www.hanselman.com/blog/FirstRuleOfSoftwareDevelopment.aspx) and assume that you will learn from the ordeal. I think you will fear it, yes, but to work in fear . . . I prefer not to. I prefer not to have a system that requires me to stay late unduly to keep Mainline stable, but rather one that always ensures Mainline is stable, before and after my code is merged.

Distributed CI deals with this via a "mergeback" from Mainline to your developer branch: you ensure that you have a clean build after running the unit tests on the branch, and only then merge up to Mainline. Otherwise, having failed a unit test on the developer branch, you do not merge to Mainline and do not become the bottleneck; go out for beers, rest well, and fix it tomorrow. After the developer branch is merged to Mainline, do not run the unit tests again: the test suite was already run against a copy of Mainline in your development branch, so, already knowing that all tests have passed, the code is deployed right out to Production. Then Mainline is merged backwards to other developers' branches, and unit tests are run individually on each developer branch. If any tests fail, the owner of that branch must fix the issue.
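A rough sketch of that mergeback flow in git terms (the branch names and test command are placeholders, not something prescribed here):

    git checkout my_feature
    git merge master      # the "mergeback": bring Mainline into the developer branch
    rake test             # run the unit tests against the merged result
    # only if everything passes:
    git checkout master
    git merge my_feature  # merge up: Mainline gets exactly the code that was just tested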


Feature Branches


Rule #3: no feature branches. Wait, what? Hold on, I like my feature branches. But Jez did just say: "But you can't say you're doing Continuous Integration and be doing feature branching. It's just not possible by definition (while waving hands)", didn't he? Looking at the definition of CI from Martin Fowler of ThoughtWorks:

Continuous Integration is a software development practice where members of a team integrate their work frequently, usually each person integrates at least daily - leading to multiple integrations per day. Each integration is verified by an automated build (including test) to detect integration errors as quickly as possible.  


I do not actually see anything there that says CI cannot have feature branches, just that all developers' work must integrate frequently - not even necessarily daily. Well, in Jez's Centralized CI, feature branches are a no-go: you must integrate ALL changes daily into a centralized Mainline. But hold on - this code can't be integrated into Mainline, it's weeks away from delivery. In Distributed CI, a feature branch is nothing more than a developer's branch. Feature branch away. Mainline will be integrated back into the feature branch after any Production deploy, and all new stable code will be integrated and tested immediately.

Hmmm, sounds like feature branching and CI are not mutually exclusive - nice. But Jez was very clear on this. Yes, his worry is long-running code that never gets integrated with the rest of the work. OK, so you want a true feature branch and don't want to integrate up to Mainline for longer than today. Well, you can, since you are always integrating backwards from Mainline - but you must do this continually. Once we are ready to move the feature branch into Mainline, we already have a snapshot of what Mainline would look like: the feature branch is Mainline plus all the new code of the feature branch, and it has been Continuously Integrated with every time Mainline was updated since its creation. If two developers want to integrate with each other, they can in the Distributed CI world: they can be moved behind another integration point, where their work is merged up to it and then to Mainline. This is basically utilizing the ability to create a distributed network of developers working off of various localized Mainlines that merge up to a Master Mainline.
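A hypothetical sketch of such an intermediate integration point (all branch names invented for illustration):

    git checkout -b team_integration master   # a localized Mainline shared by two developers
    git merge dev_a_feature
    git merge dev_b_feature
    # run the tests here; once green, promote the combined work up to the Master Mainline
    git checkout master
    git merge team_integration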

Is Distributed CI really CI?


Yes. If you are implementing Distributed CI and CD together, you will always be integrating your developer code with the latest stable release to Production. If an integration test fails after a release to Production, it fails on a developer's branch, not for everyone, so we know the failure is isolated to that branch and its new code; the developer who owns the failing branch must attend to it. In Centralized CI, the developers must discuss and work out the issue to see who is at fault. CD demands that you deliver code to Production often. In Distributed CI, any time you deliver code to Production, the Mainline branch is merged back to developer branches and tested (thank goodness for automation). What does this mean? Let's look at the life cycle of a commit a little closer:

The top diagram shows a Centralized CI process, where the developer merges into a Mainline integrated with other developers' commits before running unit tests. Mainline cannot necessarily be deployed, since some work from other developers may not be Production ready. The lower diagram shows a Distributed CI process, where the developer merges Mainline backwards, then integrates up to Mainline, then pushes on through to Production.

If you are performing CD, you are releasing very often, and each stable commit is still tested against every developer's development work - except that in Distributed CI it is tested individually, so detection of issues is easier (the amount of conflicting code is smaller) and isolated to a developer's branch. Distributed CI is a more thorough form of CI when implemented with CD, which attempts to break processes down into smaller automated pieces. Centralized CI is suited for iteration release planning, but Distributed CI works just as well for this, and better. If you are doing Centralized CI, then you are not doing Continuous Deployment. Distributed CI removes all the bottlenecks and constraints that Centralized CI places on the workflow from developer to Production.

So why all this confusion around CI and the artificial constraints put on it? Clearly, the centralized VCS is the basis for these constraints; perhaps it's because CI grew up in a time of non-distributed VCS and has not modernized since. Basically, Distributed CI treats each developer branch as if it were Mainline and tests there, instead of up on a centralized Mainline. Now is the time to utilize the advantages of a distributed VCS. Unleash those feature branches, commit code now, and merge to Mainline to see it in Production moments later - yes, moments later, every time.

