I was invited to give my views on some new chemistry in European bioinformatics at a meeting organised by the CICAG group of the Royal Society of Chemistry at Burlington House, London.
Peter Murray-Rust set the scene by emphasising the importance of Open Data. He showed some fantastic work on data extraction from theses by OSCAR, where his group had parsed a synthetic chemistry thesis into an interactive graph of a reaction network. He also showed an SVG animation of this graph as a reaction sequence, all automatically generated from an OSCAR run. Peter pointed out in the subsequent discussion that data cannot be copyrighted, which was acknowledged by all publishers in the audience. The reality is different, however, because publishers' licenses often prevent downloading more than a few articles in a row. Detection of a robotic download for text mining comes with the danger of the whole university being disconnected. It is unclear to me how robotically parsing papers and extracting data would damage the business model of publishers. It could, of course, lower the number of subscriptions from
Ian Russell of ALPSP presented on Open Access models and how they are used by ALPSP members. He pointed out that a lot of long-tail publishers publish only two or three journals, quite in contrast to ACS and RSC, for example. He stated that making a profit is good, because it can be reinvested into innovation. I'm not sure that I saw much innovation in the publishing business before the emergence of the Open Access model. He further commented on self-archiving, stating that only pre-peer-review manuscripts can be self-archived without permission from the publisher. A librarian in the audience pointed out that the duplication of costs caused by mixed reader-pays and author-pays models has significantly increased libraries' expenses, and Ian Russell's comment was that there are no cost savings in Open Access. Not sure if this helps. My impression is that it is not in the interest of publishers to resolve this conflict. [Editorial note after submission: Ian has replied to this part. See comments.]
Robert Kiley, Wellcome Trust, summarized the Trust's OA policy, under which Trust-funded research needs to be deposited in PubMed Central within six months of publication. If I remember right, the Trust funds more than 90% of biomedical research in Britain. The NIH now has a similar policy, and so has the European Research Council. Robert mentioned that most text mining so far is based on PubMed abstracts, but that the full text would be required for serious efforts. He further pointed out that the number-one option for researchers to comply with the Trust's OA policy would be to publish in a true Open Access journal (BMC, PLOS, etc.). The second-best choice would be to publish anywhere and self-archive. The least preferable choice would be to publish with the ACS (one of the very few publishers without a Wellcome Trust-compliant OA policy) and try to change the copyright notice. The Trust is in contact with publishers to make sure that authors have a wide variety of journals with open access policies to choose from. Robert highlighted the importance of OA for the long-term preservation of articles and the data therein, with special emphasis on future-proofing the record of medicine. To check the compliance of authors with this new OA policy, the Trust conducted a study of 279 papers on Trust-funded research, where, if I remember right, over 90 percent of researchers were in compliance. Robert concluded his talk by mentioning UK PubMed Central, which will be exposed for text mining, including chemical entities. The Trust's next steps are to continue to work with publishers, monitor compliance of researchers, make funds available for OA and develop UKPMC.
My own talk was about chemistry at the EBI and in European bioinformatics in general.
Simon Coles, University of Southampton, talked about building repositories to preserve chemical data and publications. His view is that of an active crystallographer, and he pointed out that this community could serve as a paradigm for other areas of chemistry. He noted that spectra are often published as inaccessible supplemental information without proper guidelines for representation. Movements like OpenWetWare and Open Notebook Science offer a great chance to capture and publish data from the very beginning: they record and publish laboratory experiments as they are done, not nine months later, once published and filtered. Simon mentioned that less than a quarter of the crystal structures determined are actually published, so a lot of data is just lost. This is certainly true for NMR data, an area of interest of my own research. The crystallographic community has taken a step to increase the number of published crystal structures by creating Acta Cryst E (now open access), where publications consist of just the crystal structure itself and a few additional remarks. They just publish the crystal structure!!! In CIF format, computer-readable and harvestable. This would be a great step for NMR: publish structure and spectral evidence, semantically enriched, in a short communication. Simon further reported that Southampton has built the eCrystals Data Repository software based on the eGrid project, where crystallographic data can be easily deposited, supported by authoring tools, and which also serves as a laboratory archive. An embargo mechanism is implemented, but once published, data can be harvested and analysed. The whole funded study was really about the preservation of research data, exemplified by crystallographic data. To scale this up, Simon suggested having a Federation of Crystallographic Data Repositories (does it exist already?). He further mused about how that could be transferred to less well organized disciplines such as NMR, synthesis, etc.
Long-tail science is extremely fragmented: unorganised, inaccessible material sitting on laptops. A new kind of electronic lab notebook, the “Smart Tea Project”, can actually be taken into the lab, using mobile input devices. Analysis and discussion of data captured on the fly can then be published within the workgroup, or shared between collaborators around the world, using blog technology. Some software can enable machines and sensors to blog their results. Readings from a sensor for room temperature and/or air flow in the lab, for example, can then be correlated with the outcome of NMR experiments using time stamps.
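To make the time-stamp idea a bit more concrete, here is a minimal Python sketch of how an experiment could be matched to the nearest sensor reading. This is not the Smart Tea software: the function name `nearest_reading`, the ten-minute tolerance, the experiment labels and all of the values are made up for illustration.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def nearest_reading(readings, when, max_gap=timedelta(minutes=10)):
    """Return the sensor value closest in time to `when`.

    `readings` is a list of (timestamp, value) pairs sorted by timestamp.
    Returns None if there is no reading within `max_gap` of `when`.
    """
    times = [t for t, _ in readings]
    i = bisect_left(times, when)
    # Only the reading just before and the one at/after `when` can be nearest.
    candidates = readings[max(i - 1, 0):i + 1]
    if not candidates:
        return None
    t, value = min(candidates, key=lambda r: abs(r[0] - when))
    return value if abs(t - when) <= max_gap else None


# Hypothetical room-temperature log (degrees Celsius), as a sensor might blog it.
readings = [
    (datetime(2008, 1, 1, 9, 0), 20.5),
    (datetime(2008, 1, 1, 9, 30), 21.0),
    (datetime(2008, 1, 1, 10, 0), 24.5),
]

# Hypothetical NMR experiments with their start times.
experiments = [
    ("nmr-001", datetime(2008, 1, 1, 9, 2)),
    ("nmr-002", datetime(2008, 1, 1, 9, 58)),
    ("nmr-003", datetime(2008, 1, 1, 12, 0)),
]

# Annotate each experiment with the room temperature recorded closest to it.
annotated = {name: nearest_reading(readings, t) for name, t in experiments}
```

With the temperature attached to each experiment, one could then look for patterns, for example whether poor spectra cluster at times when the room was unusually warm. Experiments with no reading close enough in time are left as `None` rather than being given a guessed value.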
Diana Leitch, University of Manchester, gave “the academic librarian’s perspective”, in a unique way: just talking and *not* showing any slides. What a relief.
I had to miss the two talks by Chris Leonhard from BMC, whose title seems to indicate growing support for Open Access, and by David Hool from the Nature Publishing Group, because I had to catch my flight from Stansted. I almost missed that too, thanks to the fantastically reliable British rail system. All trains from London to Stansted were cancelled, and we were advised to take a local train to a small station in the middle of nowhere instead, where of course there was nothing like a connection to Stansted. I was one of the lucky few to get the last available taxi, driven at a snail's pace by an old lady. It could have been very funny as a movie. Well, I made it to the gate in time, where “the machine was otherwise fully boarded and awaiting an on-time departure” (those of you travelling with Ryanair regularly know what I’m referring to). I will never again be arrogant toward people showing up in Stansted sweating and at the very last minute. They all come from central London.
So, what is the bottom line from this meeting? The important message is perhaps that OA publishing has not yet quite reached chemistry, but that there are grass-roots movements which are going to revolutionize the way in which we publish science and scientific data, starting at the very first moment when research is performed in the lab.