Structure Elucidation of Unknown Metabolites (2) – De Novo vs Look-Up
There has always been a bit of confusion in the terminology around the subject of computer-assisted structure elucidation (CASE), so let’s define some terms:
- Structure Elucidation – Determining the structure of truly unknown compounds de novo from spectroscopic (or, in earlier days, hard-core chemistry :)) experiments
- Structure Identification – Recovering the structure of already known compounds from databases or printed reference material (including experimental sections of the primary literature)
- Dereplication – Same as Structure Identification.
The more we discover, the more likely it is that we will be able to dereplicate by database lookup. This of course requires well-curated, well-developed open access databases that cover many chemical compounds/metabolites.
In organic chemistry, spectroscopic databases for structure identification were published quite early, albeit as closed-access, commercial systems. The most widely used example is probably the SpecInfo database, which now seems to be marketed by Wiley, followed by the more recently (considering the 40-year horizon of the topic :)) published ACD/Labs spectral libraries and management system. Wolfgang Robien in Vienna has been developing NMR spectral databases and prediction tools for a long time.
The general way of searching in such databases is to measure an NMR spectrum of your isolated unknown compound, perform peak picking, and search the database with the resulting peak list (a feature vector, if you wish).
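To make the idea of searching with a peak list a bit more concrete, here is a minimal sketch in Python. It is not how SpecInfo, ACD/Labs or any other real system does it: the greedy matching, the 1 ppm tolerance, the scoring formula, the function names and the tiny in-memory "database" are all assumptions made for illustration, and the chemical shifts are invented toy values, not measured data.

```python
from typing import Dict, List, Tuple


def match_score(query: List[float], reference: List[float], tol: float = 1.0) -> float:
    """Score how well a query peak list matches a reference peak list.

    Each query shift is greedily paired with the closest still-unused
    reference shift within `tol` ppm. The score is the number of pairs
    divided by the size of the larger list, so extra or missing peaks
    lower the score. (Illustrative scoring only, an assumption.)
    """
    if not query or not reference:
        return 0.0
    unused = sorted(reference)
    hits = 0
    for shift in sorted(query):
        best_index = None
        for i, ref in enumerate(unused):
            delta = abs(ref - shift)
            if delta <= tol and (best_index is None or delta < abs(unused[best_index] - shift)):
                best_index = i
        if best_index is not None:
            hits += 1
            unused.pop(best_index)  # each reference peak may be used only once
    return hits / max(len(query), len(reference))


def search(query: List[float], database: Dict[str, List[float]], tol: float = 1.0) -> List[Tuple[str, float]]:
    """Rank all database entries by their match score, best first."""
    ranked = [(name, match_score(query, peaks, tol)) for name, peaks in database.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Invented toy data: hypothetical compounds with made-up 13C shifts in ppm.
database = {
    "compound_A": [14.1, 22.7, 31.9, 62.8],
    "compound_B": [21.0, 128.3, 129.1, 137.8],
    "compound_C": [20.5, 128.0, 170.2],
}

# A made-up "picked" peak list for the unknown.
query = [21.2, 128.1, 129.4, 137.5]

results = search(query, database)
for name, score in results:
    print(f"{name}: {score:.2f}")
```

With these toy values, compound_B comes out on top because all four query peaks find a partner within the tolerance. A real search would also have to deal with solvent effects, referencing errors and incomplete peak lists, which is where the curated databases earn their keep.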
In the early 2000s, Stephan Kuhn in my group developed the NMRShiftDB database, which was the first open access, open source, open submission, web-based NMR database, and where you can now test how this all works without running into paywalls. Stephan has since left the lab and now runs version 2 of this database in collaboration with the NMR lab at the Department of Chemistry at the University of Cologne.
One caveat: It is much easier to search for carbon-13 NMR spectra or mass spectra than for proton NMR spectra. The latter has rarely been addressed, not least because of the lack of full-spectrum proton data to which you could match a real-life proton spectrum. Peak-picking proton NMR spectra is often problematic due to overlap and complex coupling patterns.
Take, for example, the carbon-13 spectrum of pinocarveol from the metabolomics section of BioMagResBank (BMRB). Using your NMR software’s peak picking method, you would end up with this list of NMR signals. If you have a decent browser, such as Firefox, you can hold the CMD (on Mac) or CTRL key and select the chemical shifts in the table linked above. If not, here they are: