The IUPAC Chemical Identifier
Steve Stein, Steve Heller,
Dmitrii Tchekhovskoi
National Institute of Standards and Technology
Gaithersburg, MD, USA


 
U.S. Government Chemical Databases, Frederick, MD
July 22, 2003
The most widely used, and perhaps best understood, object of discussion in chemistry is the chemical compound. Of course, we define chemical compounds by their chemical structure, typically shown as 2D diagrams or stored as ‘connection tables’.

Pronounceable names have been developed for oral and written communication, but deriving a name from a structure can require highly complex rules known only to experts. ‘Understanding’ these names requires reversing the naming process to recover the original structure. Names are therefore a less direct, and often less effective, means of identifying chemicals.

In the digital age, where compounds are represented electronically, effective identifiers are no less important. Freed from the restriction of ‘pronounceability’, chemical identifiers can be tied more directly to structures. In fact, they can be derived from the structure by an algorithm, so that any structure that can be drawn can be ‘identified’.

This project aims to develop such a set of algorithms to produce a unique identifier for a compound: its digital signature. Since information is stored and transmitted as strings of characters, the output format is such a string, derived from the structure.
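To make the idea of an identifier ‘derived from structure by algorithm’ concrete, the following is a minimal brute-force sketch, not the algorithm developed in this project. It assumes a connection table given as a list of element symbols (hydrogens omitted) plus bonds written as (atom i, atom j, bond order). It tries every numbering of the atoms, serializes each relabeled table to a string, and keeps the lexicographically smallest one. Any two drawings of the same structure therefore yield the same string. The function names `encode` and `canonical_id` and the string layout are hypothetical, chosen only for illustration.

```python
from itertools import permutations

# Hypothetical toy canonicalizer: exhaustive search over atom numberings.
# Cost grows factorially with atom count, so it only suits tiny structures.


def encode(atoms, bonds, order):
    """Serialize the connection table with atom order[k] relabeled as k."""
    label = {old: new for new, old in enumerate(order)}
    elements = ".".join(atoms[old] for old in order)
    # Each bond is stored as (lower label, higher label, bond order), sorted.
    edges = sorted(
        (min(label[a], label[b]), max(label[a], label[b]), n) for a, b, n in bonds
    )
    edge_text = ",".join(f"{i}-{j}:{n}" for i, j, n in edges)
    return f"{elements}/{edge_text}"


def canonical_id(atoms, bonds):
    """Return the smallest encoding over all numberings, a canonical string."""
    return min(encode(atoms, bonds, order) for order in permutations(range(len(atoms))))


if __name__ == "__main__":
    # The ethanol heavy-atom skeleton drawn with two different atom numberings.
    ethanol_a = canonical_id(["O", "C", "C"], [(0, 1, 1), (1, 2, 1)])
    ethanol_b = canonical_id(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
    # Dimethyl ether has the same atoms but a different connectivity.
    ether = canonical_id(["C", "O", "C"], [(0, 1, 1), (1, 2, 1)])
    print(ethanol_a, ethanol_b, ether)
```

Both ethanol numberings produce the same string, while dimethyl ether, with identical atoms but different connectivity, produces a different one. A practical identifier would also need to encode stereochemistry, charges and other features, and would need a far more efficient canonical numbering than exhaustive search.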


While the most fundamental description of the identity of a compound is its structure, a structure requires a picture, which is unusable in speech and often inconvenient in text. Pronounceable names are very efficient for common substances, which often acquire a ‘trivial’ name, but they can be cumbersome or impossible for complex compounds.


Describe what has been done and what remains to be done.