SteinBlog

Linus on GIT on Google TechTalks

I’m a big fan of Google TechTalks and watch a lot of them during flights. This week I enjoyed the recording of Linus Torvalds insulting all kinds of people, including the whole SVN development team, while introducing his distributed source code management system GIT. Egon had pointed me to GIT quite a while ago, but seeing Linus himself discuss the issue made a difference. While CDK is still considerably smaller than the Linux kernel, I can see a lot of commonalities, and I think that with our current development practice of having our fellow co-admins review important patches and branches, GIT sounds like a much easier way to do it.

In GIT the source code is distributed – there is no concept of a central source repository. Developers commit their changes to their local GIT repositories, with all the advantages of versioning and source code history. Other developers pull code from you if they think that the changes you’ve advertised via your favourite communication channels are interesting. In theory, this allows for a very democratic and evolutionary code development. In addition to being distributed, GIT seems to be very fast when it comes to merging. Linus reports that he does hundreds of full merges per day and nothing takes longer than 5 seconds.
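To make the commit-locally-then-pull idea a bit more concrete, here is a minimal toy sketch in Python. It is not how GIT is implemented; the `Repo` class, its methods and the "maintainer"/"contributor" names are all made up for illustration. It only assumes that each repository keeps its own full history and that pulling means copying the other side's commits and then either fast-forwarding or recording a merge.

```python
import hashlib


class Repo:
    """Toy stand-in for one developer's local repository (hypothetical, not GIT's data model)."""

    def __init__(self, owner):
        self.owner = owner
        self.commits = {}  # commit id -> (tuple of parent ids, message)
        self.head = None   # id of the latest local commit

    def commit(self, message, parents=None):
        """Record a commit locally; no other repository is contacted."""
        if parents is None:
            parents = (self.head,) if self.head else ()
        payload = "|".join(parents) + "|" + message
        cid = hashlib.sha1(payload.encode()).hexdigest()[:8]
        self.commits[cid] = (tuple(parents), message)
        self.head = cid
        return cid

    def ancestors(self, cid):
        """Return cid plus every commit reachable from it."""
        seen, stack = set(), [cid]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            stack.extend(self.commits[current][0])
        return seen

    def pull(self, other):
        """Copy the other repository's history, then integrate it into ours."""
        if other.head is None:
            return self.head
        for cid in other.ancestors(other.head):
            self.commits.setdefault(cid, other.commits[cid])
        if self.head is None or self.head in self.ancestors(other.head):
            self.head = other.head  # we are simply behind: fast-forward
        elif other.head not in self.ancestors(self.head):
            # histories diverged: record a merge commit with two parents
            self.commit("merge from " + other.owner, parents=(self.head, other.head))
        return self.head


maintainer = Repo("maintainer")
contributor = Repo("contributor")

base = maintainer.commit("initial import")
contributor.pull(maintainer)              # contributor now holds the full history
fix = contributor.commit("fix a bug")     # committed locally only
docs = maintainer.commit("update docs")   # meanwhile the maintainer moves on
merged = maintainer.pull(contributor)     # diverged histories: a merge commit
```

Neither object is special here: the "maintainer" is only the repository that others happen to pull from, which is the convention discussed in the next paragraph.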

In practice, as Linus points out in his talk, there will always be one or very few repositories that people pull from – for the Linux kernel it will be Linus’s machine. In CDK it will very likely be Egon’s. Sorry Egon, you’ve got to be online all day 🙂

The last sentence already brings me to the point. I wonder if we should give GIT a try for CDK development. The advantages do sound enormous. Ok, there are disadvantages too, such as losing the central web browsing of the SVN repository on SF. There may be ways around this, as Egon described here, but this seems like not using the real thing.

This is a brief impression dump after watching Linus’ talk today and I’m happy to hear your opinions 🙂


Categorised as: Blue Obelisk, Chemistry Development Kit, Informatics, Open Standards, Publishing, Scientific Culture


8 Comments

  1. Sounds cool – for browsing, one could always designate the main person’s (i.e. Egon’s) repo as the one to browse, and others could always provide web interfaces if they so desired. The former would be important since the Javadocs would need to link to the sources in one repo or another.

  2. I’ve started using git for my personal repositories (my private electronic notebooks, so to say), and I quite like it. I still need to get the hang of merging, but I’m all in favor of this. The only real problem right now is that I do not have a place at this moment where I can keep a browsable version of my (main) branch…

    Two other issues… One: cdk/ is not the only project… the others would need git repositories too… For example, Thomas could host the master branch for cdk-taverna. Two: we still desperately need someone to maintain the cdk1.0.x/ branch!

  3. As came up at the Science Blogging 2008 London conference, one drawback of choosing a 100% distributed service would be the lack of an authority that can certify that something got uploaded at some point… do we find timestamping important?

  4. If I understand GIT correctly, there is indeed no particular time when a piece of code is commonly accepted. The workaround could be more frequent releases.

  5. The video mentions that people have been using git to do the merges SVN cannot do. I’ve just tried this on the CDK, and it works brilliantly! I just did this:

    1. apply a patch by Stefan in trunk to the cdk-1.2.x branch
    2. hack on cdk-1.2.x and commit
    3. merge cdk-1.2.x with trunk

    And git properly recognized that Stefan’s patch actually came from trunk, and that it did not have to apply it again from trunk. If I had known about this power earlier, it would have saved me and Miguel quite a few hours of merging horror.

  6. Egon, I think if we can resolve the question of which archive is the “normative” one, then we should probably move. In principle, one would say that the release manager would have the normative archive, but I guess GIT means having a less clear, more democratic way of doing things. But that again sounds like blah-blah considering that we will only have to sort out major contributions from 3 to 5 developers.

  7. I have set up a *mirror* at GitHub.com:

    http://github.com/egonw/cdk/tree/master

    It’s read-only, as SVN remains the main repository for now. If you would like to argue for replacing SVN with Git, then please reply to the relevant thread in the last week of September.

  8. Egon, excellent!
    I think that in the long run branches will play a more and more pronounced role in CDK and if GIT makes it much easier to merge, we should move after a thorough investigation.
    Cheers, Chris
