SteinBlog

Creating and Reviewing Patches in the Chemistry Development Kit (CDK)

In order to prevent major turbulences in the main source code development line of the Chemistry Development Kit (CDK), we decided a while ago to have separate branches in our subversion source code management system for each developer and each of his subprojects. Once a project has been finalized by a developer in her branch, she would then publish a patch in the CDK patch tracker system and ask for it to be reviewed by posting to the CDK developers mailing list. A CDK senior developer would the assign the patch to himself or another senior developer.

I have just been assigned the task to review the recent Iterator/Iterable patch for CDK and will protocol my task for reference reasons.The patch was published on the CDK patch tracker.

The executive summary of the reviewing task goes like:

  1. browse the code
  2. mark up code you think is buggy
  3. note missing unit tests
  4. note missing JavaDoc
  5. warn for subjected PMD warnings
  6. optionally note other problems
  7. optionally any other comment you have

So, let’s see how it went:

Browse the Code

I got the gzipped archive with Egon’s patch and looked at the code. A large part of the changes involve

removing       public Iterator<IIsotope> isotopes() {
and adding    public Iterable<IIsotope> isotopes() {

to enable things like

double overallCharge = 0.0
for (IAtom atom : molecule.atoms()) {
overallCharge += atom.getCharge();
}

In order to implement Iterable, one needs to have methods returning an Iterator, so a lot of code essentially implements those.

Remove:       public java.util.Iterator atoms() {
and add:      public Iterable<IAtom> atoms() {
logger.debug("Getting atoms iterator");
return super.atoms();
}

And then there is code actually using those iterators and all of these instances had to be adapted too (I’m just giving the patch syntax):

for(IReactionScheme rm : scheme.reactionSchemes()){
-                       for(Iterator<IAtomContainer> iter = getAllMolecules(rm, molSet).atomContainers(); iter.hasNext(); ){
-                       IAtomContainer ac = iter.next();
-                       boolean contain = false;
-                       for(Iterator<IAtomContainer> it2 = molSet.molecules();it2.hasNext();){
-                               if(it2.next().equals(ac)){
-                               contain = true;
-                               break;
-                       }
-                       }
-                       if(!contain)
-                               molSet.addMolecule((IMolecule)(ac));
-                       }
+                for (IAtomContainer ac : getAllMolecules(rm, molSet).atomContainers()) {
+                    boolean contain = false;
+                    for (IAtomContainer atomContainer : molSet.molecules()) {
+                        if (atomContainer.equals(ac)) {
+                            contain = true;
+                            break;
+                        }
+                    }
+                    if (!contain)
+                        molSet.addMolecule((IMolecule) (ac));
+                }

Overall, the patch affected 288 classes including test classes, with almost 2000 lines of code changed.

Mark up code you think is buggy

Impossible to do for me for such a large bunch of changes, so one must rely here on the unit tests to work.

Note missing unit tests

Egon had posted some notes about comparing failing and passing between unit tests earlier but we also need an automatic check for unit test coverage. And yes, of course, there are limits to what such an automated coverage tool can do.

With regard for failing unit tests, the “iterable” branch did have anymore failures and errors than the head branch.

Note missing JavaDoc

We’ve go DocCheck results on our CDK nightly pages but nothing tells you whether a patched method is missing neccessary JavaDoc. Presumably, we could “grep” the patches class names into a DocCheck input file and get customized info about it.

Warn for subjected PMD warnings

PMD is a tool for checking code with respect to adherence to certain coding standards.Again, the CDK nightly page contains all PMD reports on the CDK code, generated in nightly runs. The same can be achieved for each branch with a “ant -f pmd.xml”  on your local copy of the branch.

Optionally note other problems

I love optional things and tend to let them be optional

Optionally any other comment you have

Dto.

So, overall I would like to conclude that according to the best of my knowledge, the Iterable patch should be safe and can be applied to the HEAD branch.


Categorised as: Chemistry Development Kit, Life of Chris, Open Source, Open Standards, Scientific Culture


3 Comments

  1. Excellent! Going to merge now then. Using git…

  2. baoilleach says:

    To me, it looks like this patch does not pass 3 and 4. I would reject it and send it back to the submitter for evidence that 3 and 4 are passed. Better to fix the problems now…

  3. Christoph says:

    I found it hard to detect, in reasonable time, which of the methods that were affected, had no unit tests before. I wasn’t therefore sure what to do here.

Leave a Reply

Your email address will not be published. Required fields are marked *