{"id":58,"date":"2008-10-07T17:19:00","date_gmt":"2008-10-07T15:19:00","guid":{"rendered":"http:\/\/www.steinbeck-molecular.de\/steinblog\/?p=58"},"modified":"2008-10-07T17:19:00","modified_gmt":"2008-10-07T15:19:00","slug":"faster-fingerprints-for-the-cdk","status":"publish","type":"post","link":"http:\/\/www.steinbeck-molecular.de\/steinblog\/index.php\/2008\/10\/07\/faster-fingerprints-for-the-cdk\/","title":{"rendered":"Faster Fingerprints for the CDK"},"content":{"rendered":"<p><a href=\"http:\/\/www.ebi.ac.uk\/Information\/Staff\/person_maintx.php?s_person_id=261\" target=\"_blank\">Mark Rijnbeek<\/a>, who moved to my team last month to work on the chemistry search engine for our <a href=\"http:\/\/www.steinbeck-molecular.de\/steinblog\/index.php\/2008\/07\/23\/breaking-news-open-access-to-large-scale-drug-discovery-data-at-ebi\/\" target=\"_blank\">new chemogenomics data<\/a>, has given <a href=\"http:\/\/rguha.wordpress.com\/\" target=\"_blank\">Rajarshi<\/a>&#8217;s <a href=\"http:\/\/rguha.wordpress.com\/2008\/09\/12\/faster-fingerprinting\/\" target=\"_blank\">new fingerprint implementation<\/a> a test. 
Mark was bored to hell by the performance of the version he had in hand and it turned out that it was my old one, which had served us well for quite a while but proved unusable for the amount of data we are testing now.<\/p>\n<p>So he <a href=\"http:\/\/sourceforge.net\/project\/showfiles.php?group_id=20024\" target=\"_blank\">downloaded CDK 1.04<\/a>, just released a few days ago, and gave it a shot.<\/p>\n<p>Mark wrote:<\/p>\n<p>&#8220;Here&#8217;s what happens: fetch  1000 molfile clobs from Oracle, put them in a list, create a list of  Molecule objects from that, and lastly calculate fingerprints on that  last list.<br \/>\nBelow is Java system output, each CDK version tested against a 1000  compounds, twice.<br \/>\nThe numbers are milliseconds [passed] since program start.<br \/>\nThe performance increase is very significant; the older CDK  fingerprinter took about a minute (see below) for 1000 fingerprints, the  new one about 7 seconds.&#8221;<\/p>\n<p>The numbers for the &#8220;old code&#8221;:<\/p>\n<pre>0 - Start benchmark 1000 compounds.\r\n84 - Fingerprinter set up\r\n531 - Connected to database\r\n120 - Resultset opened\r\n1706 - Molfile strings retrieved from database, stored in list\r\n3231 - Molecule objects list built\r\n64202 - Fingerprints calculated<\/pre>\n<p>And then CDK 1.04:<\/p>\n<pre>0 - Start benchmark 1000 compounds.\r\n77 - Fingerprinter set up\r\n536 - Connected to database\r\n118 - Resultset opened\r\n909 - Molfile strings retrieved from database, stored in list\r\n2360 - Molecule objects list built\r\n9900 - Fingerprints calculated<\/pre>\n<p>These numbers are just one representative instance from multiple runs performed by Mark. They do not quite fit the <a href=\"http:\/\/rguha.wordpress.com\/2008\/09\/12\/faster-fingerprinting\/\" target=\"_self\">numbers reported by Rajarshi<\/a>, but the conditions were too different to be comparable. 
In our case, the achieved speed-up is 8-fold, which is a nice success and even better than Rajarshi&#8217;s reported 4-fold speed-up.<\/p>\n<p>We plan to report soon on benchmarks with a much larger dataset.<\/p>\n<p>Thanks, Rajarshi. Great stuff!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Mark Rijnbeek, who moved to my team last month to work on the chemistry search engine for our new chemogenomics data, has given Rajarshi&#8217;s new fingerprint implementation a test. Mark was bored to hell by the performance of the version he had in hand and it turned out that it was my old one, [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[],"class_list":["post-58","post","type-post","status-publish","format-standard","hentry","category-open-science"],"_links":{"self":[{"href":"http:\/\/www.steinbeck-molecular.de\/steinblog\/index.php\/wp-json\/wp\/v2\/posts\/58"}],"collection":[{"href":"http:\/\/www.steinbeck-molecular.de\/steinblog\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.steinbeck-molecular.de\/steinblog\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.steinbeck-molecular.de\/steinblog\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/www.steinbeck-molecular.de\/steinblog\/index.php\/wp-json\/wp\/v2\/comments?post=58"}],"version-history":[{"count":0,"href":"http:\/\/www.steinbeck-molecular.de\/steinblog\/index.php\/wp-json\/wp\/v2\/posts\/58\/revisions"}],"wp:attachment":[{"href":"http:\/\/www.steinbeck-molecular.de\/steinblog\/index.php\/wp-json\/wp\/v2\/media?parent=58"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.steinbeck-molecular.de\/steinblog\/index.php\/wp-json\/wp\/v2\/categories?post=58"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/w
ww.steinbeck-molecular.de\/steinblog\/index.php\/wp-json\/wp\/v2\/tags?post=58"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}