In a heroic act, our collaborator Dan Gezelter, University of Notre Dame, has unearthed a historical document about our Chemistry Development Kit (CDK) from a SCSI DDS-4 tape he found somewhere in his attic. This morning, I found an email from Dan in my inbox saying:
I finally located a working SCSI DDS-4 tape drive and found a tape of the appropriate age which had copies of those pictures. I'll spare you the details of the hardware hacking required, but we do have the original images attached below. I hope they are useful at the next CDK meet-up, and I'm sorry it took so long to line up the tape drive.
All the best! --Dan
Thanks, Dan – I appreciate what you went through.
Now, why all this hype from my side? Well, first of all because it is a piece of CDK history, in fact the earliest possible piece of documentation for the CDK. Second, because it is a piece of personal history. And third, because it nicely shows how fragile our modern way of handling digital documents is.
So here is the background story: After I had written and published my first Java cheminformatics toolkit, the so-called CompChem classes, which formed the basis for my first version of JChemPaint and also for my structure elucidation package SENECA, it quickly became clear that there were some design flaws, including the whole thing not really being object-oriented, and that the package needed a rewrite. By that time, my friend Egon had come on board and contributed a lot to JChemPaint development, and we had also joined the development of Jmol, now the world's premier open source applet for 3D visualization of molecular structure, started by Dan Gezelter.
In August 2000, Dan had just moved to his first assistant professorship at the University of Notre Dame in Indiana, and it happened that Egon and I were going to a conference in Washington, so we decided to stop over at ND and spend some time with Dan to discuss what a new cheminformatics toolkit should look like. During the brainstorming sessions in Dan’s office, the attached snapshot was taken of Dan’s whiteboard. On the flight back to Europe, I wrote the first version of the base classes and released them. The CDK grew rapidly and the rest is history. The photo was available on Dan’s Open Science website for a while but fell victim to some rearrangement or upgrade of the site.
This year, we held the fourth CDK workshop, for the first time at the European Bioinformatics Institute (EBI), and it occurred to me that this picture should be part of any introductory talk about the CDK. Having realized that I could not find the picture anywhere, and neither could Egon, I contacted Dan literally five minutes before the workshop. He immediately recognized the pressing nature of my request and promised to look at the old backup tapes he knew were lying around somewhere in his attic. And so he did. Understandably, he didn’t make it in time for the workshop, but to be honest I did not think that he would find anything, and so I’m extremely happy now.
And here is the photo

Ah, cheers for the elaborate blog! This is very much justified and *very* important.
very cool!
Honestly, it wasn’t all that heroic. I found the tape within 5 minutes. After that, it was just a matter of pride and stubbornness to extract the data off of it, whether or not other people wanted it.
And you are completely correct about the fragility of modern electronic documents. These were tapes that were only 9 years old, and they were essentially unreadable without rebuilding a clunker PC. I wonder what gives some formats (e.g. CDs) longevity, while others (tape, floppy, zip disks, CF cards) are so much more ephemeral. 10 years from now, will we be scrounging around for old machines with USB ports so that we can recover data from thumb drives?
My current approach to not losing anything is to not use any traditional storage media but a couple of large NAS devices which are kept in sync. That works for my digital life so far (despite a couple of things which got lost due to my own stupidity), including almost all email I ever read and received since 1989. It will clearly not work as easily for research data, large spectral files, etc.
The NAS devices are updated on a regular basis, assuming that storage capacity will grow quicker than my ability to fill it. Again, no chance for that approach in areas where machines fill the storage space.
It seems that the API of earlier versions of the CDK was severely limited by the size of available whiteboards.
Programming design principle looks like a classical blue *waterfall* in the middle, with some green *pragmatic*s on the left, and some red *extreme*s surrounding it.
LOL
Design Principle? What Design Principle?