An Open Standard for Chemical Structure Representation

The IUPAC Chemical Identifier
S. E. Stein, S. R. Heller, D. V. Tchekhovskoi
Physical & Chemical Properties Division
NIST
Communication of Chemical Identity
Human
Verbal – Common name
Text – Systematic/Common name
Pictorial – Structure diagram
Digital ‘Naming’ of Chemicals
Chemical structure is the true ‘identifier’
But, structure representations are not unique or convenient for computers.
So, convert structure to a unique ‘name’ by fixed algorithms
The Iupac CHemical Identifier (IChI)
Customer Needs
“Authors”
Precise
Convention-free
Wide coverage
“Readers”
Robust
Variable specificity
Long life
“Publishers” (Software)
Ready access
Two Problems
Chemicals
Fast isomerization (tautomerization)
Ill-defined connectivity
Chemists
Differing conventions
Depends on discipline, education and convenience
Imprecision/uncertainty
3 Steps to IChI
Chemistry
‘Normalize’ Input Structure
Implement chemical rules
Math
‘Canonicalize’ (label the atoms)
Equivalent atoms get the same label
Format
‘Serialize’ Labeled Structure
Output as character string (‘name’)
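
The three steps above can be sketched in a few lines of Python. This is only a toy illustration, not the IChI algorithm: the data layout (atoms as an id-to-element dict, bonds as `(a, b, order)` tuples) and every function name are assumptions made for this example. The labeling step only groups atoms into equivalence classes by iterative refinement; a real canonicalizer must also break ties, and this toy string is not guaranteed to be unique for every structure.

```python
def normalize(atoms, bonds):
    """Chemistry step: keep heavy-atom connectivity, fold H atoms into counts, drop bond orders."""
    heavy = {i: el for i, el in atoms.items() if el != "H"}
    h_count = {i: 0 for i in heavy}
    neighbors = {i: set() for i in heavy}
    for a, b, _order in bonds:
        if a in heavy and b in heavy:
            neighbors[a].add(b)
            neighbors[b].add(a)
        elif a in heavy:
            h_count[a] += 1
        elif b in heavy:
            h_count[b] += 1
    return heavy, h_count, neighbors


def _rank(values):
    """Map each atom's invariant to its 1-based position among the distinct invariants."""
    order = sorted(set(values.values()))
    return {i: order.index(v) + 1 for i, v in values.items()}


def canonicalize(heavy, h_count, neighbors):
    """Math step: refine class labels until stable; equivalent atoms end up with the same label."""
    rank = _rank({i: (heavy[i], len(neighbors[i]), h_count[i]) for i in heavy})
    while True:
        refined = {
            i: (rank[i], tuple(sorted(rank[j] for j in neighbors[i])))
            for i in heavy
        }
        new_rank = _rank(refined)
        if len(set(new_rank.values())) == len(set(rank.values())):
            return new_rank
        rank = new_rank


def serialize(heavy, h_count, neighbors, rank):
    """Format step: write the labeled structure as a character string."""
    atom_terms = sorted((rank[i], heavy[i], h_count[i]) for i in heavy)
    bond_terms = sorted(
        tuple(sorted((rank[a], rank[b])))
        for a in neighbors for b in neighbors[a] if a < b
    )
    atoms_part = ";".join(f"{r}:{el}H{h}" for r, el, h in atom_terms)
    bonds_part = ";".join(f"{x}-{y}" for x, y in bond_terms)
    return atoms_part + "/" + bonds_part


def toy_identifier(atoms, bonds):
    heavy, h_count, neighbors = normalize(atoms, bonds)
    rank = canonicalize(heavy, h_count, neighbors)
    return serialize(heavy, h_count, neighbors, rank)


# Ethanol drawn with two different atom numberings.
ethanol_a = (
    {1: "C", 2: "C", 3: "O", 4: "H", 5: "H", 6: "H", 7: "H", 8: "H", 9: "H"},
    [(1, 2, 1), (2, 3, 1), (1, 4, 1), (1, 5, 1), (1, 6, 1),
     (2, 7, 1), (2, 8, 1), (3, 9, 1)],
)
ethanol_b = (
    {1: "O", 2: "H", 3: "H", 4: "H", 5: "C", 6: "H", 7: "H", 8: "H", 9: "C"},
    [(9, 5, 1), (5, 1, 1), (9, 2, 1), (9, 3, 1), (9, 4, 1),
     (5, 6, 1), (5, 7, 1), (1, 8, 1)],
)
print(toy_identifier(*ethanol_a))
print(toy_identifier(*ethanol_b))
```

Both calls print the same string, because the labels depend only on element, hydrogen count and connectivity, never on the numbering used in the input drawing.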
Normalize
Simplify
Divide structure into ‘layers’
Each layer ‘refines’ structure
Ignore ‘Electron Density’
Use simple ‘connectivity’ only
Ignore bond type and electron location
Stereochemistry
sp2 and sp3 only
Free rotation around single bonds
No Z/E stereo for small rings (default)
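
To illustrate why ignoring bond type helps, the sketch below reduces two Kekulé drawings of benzene to connectivity only. The bond-list layout `(atom, atom, order)` and the function name are assumptions for this example, not part of the IChI software.

```python
def connectivity_only(bonds):
    """Drop bond orders and bond direction; keep only which atoms are connected."""
    return frozenset(frozenset((a, b)) for a, b, _order in bonds)


# The two Kekulé drawings of the benzene ring (carbon atoms numbered 1..6).
kekule_a = [(1, 2, 2), (2, 3, 1), (3, 4, 2), (4, 5, 1), (5, 6, 2), (6, 1, 1)]
kekule_b = [(1, 2, 1), (2, 3, 2), (3, 4, 1), (4, 5, 2), (5, 6, 1), (6, 1, 2)]

same_bonds = kekule_a == kekule_b
same_connectivity = connectivity_only(kekule_a) == connectivity_only(kekule_b)
print(same_bonds, same_connectivity)
```

The bond lists differ, yet the connectivity is identical, so the drawing convention for alternating double bonds no longer affects the result.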
4 Connectivity ‘Sublayers’
Disconnect metals and H-atoms
Skeleton
Reconnect fixed H-atoms
Tautomerism
Reconnect mobile H-atoms (optional)
All connections fixed
Reconnect metals (optional)
Represent bonds to metals
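
A rough sketch of the metal-disconnection idea, assuming a small illustrative set of metal symbols and a bond list of `(atom, atom)` pairs; the names `METALS`, `disconnect_metals` and `components` are hypothetical and do not come from the IChI software. Bonds to metals are set aside so the skeleton can be processed on its own, and they stay available for the optional reconnected sublayer.

```python
METALS = {"Li", "Na", "K", "Mg", "Ca", "Fe", "Cu", "Zn"}  # illustrative subset only


def disconnect_metals(atoms, bonds):
    """Split bonds into the metal-free skeleton and the bonds that touch a metal."""
    skeleton, metal_bonds = [], []
    for a, b in bonds:
        if atoms[a] in METALS or atoms[b] in METALS:
            metal_bonds.append((a, b))
        else:
            skeleton.append((a, b))
    return skeleton, metal_bonds


def components(atoms, bonds):
    """Return the connected components of the structure as sorted lists of atom ids."""
    neighbors = {i: set() for i in atoms}
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, parts = set(), []
    for start in sorted(atoms):
        if start in seen:
            continue
        stack, part = [start], set()
        while stack:
            i = stack.pop()
            if i in part:
                continue
            part.add(i)
            stack.extend(neighbors[i] - part)
        seen |= part
        parts.append(sorted(part))
    return parts


# Sodium acetate drawn with an explicit Na-O bond (hydrogens omitted).
atoms = {1: "C", 2: "C", 3: "O", 4: "O", 5: "Na"}
bonds = [(1, 2), (2, 3), (2, 4), (4, 5)]

skeleton, metal_bonds = disconnect_metals(atoms, bonds)
print(components(atoms, skeleton))                # metal disconnected
print(components(atoms, skeleton + metal_bonds))  # metal reconnected
```

Whether the author drew the salt with a covalent Na-O bond or as separate ions, the disconnected view yields the same two fragments.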
Tautomer Sublayer
Stereochemical Sublayers
sp2 – double bond
sp3 – tetrahedral
{others added later}
relative, absolute or racemic
Assume Free Rotation Around Single Bonds
Nitrobenzene
MSG (tautomeric)
MSG (fixed)
Ferrocene
Byproducts:
Stereogenic Centers and Equivalent Atoms
Aids structure validation
Auxiliary Output
Warnings/Errors
Unusual valences
Unrecognized input
‘Reversibility’
Coordinates
Bond/Charge Location
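
The kind of warning listed above can be sketched as a simple valence check. The valence table is a deliberately tiny assumption covering a few neutral atoms, and the function name and message wording are hypothetical; the actual IChI checks are not reproduced here.

```python
# Illustrative table of usual valences for a few neutral atoms (an assumption).
COMMON_VALENCES = {"H": {1}, "C": {4}, "N": {3}, "O": {2}, "F": {1}}


def valence_warnings(atoms, bonds):
    """Return warning strings for unrecognized elements and unusual valences."""
    total = {i: 0 for i in atoms}
    for a, b, order in bonds:
        total[a] += order
        total[b] += order
    warnings = []
    for i in sorted(atoms):
        element = atoms[i]
        if element not in COMMON_VALENCES:
            warnings.append(f"atom {i}: unrecognized element {element!r}")
        elif total[i] not in COMMON_VALENCES[element]:
            warnings.append(f"atom {i}: unusual valence {total[i]} for {element}")
    return warnings


# Methane is accepted; a carbon drawn with five hydrogens is flagged.
methane = ({1: "C", 2: "H", 3: "H", 4: "H", 5: "H"},
           [(1, 2, 1), (1, 3, 1), (1, 4, 1), (1, 5, 1)])
bad_carbon = ({1: "C", 2: "H", 3: "H", 4: "H", 5: "H", 6: "H"},
              [(1, 2, 1), (1, 3, 1), (1, 4, 1), (1, 5, 1), (1, 6, 1)])

print(valence_warnings(*methane))
print(valence_warnings(*bad_carbon))
```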
IChI FAQs
How can you represent chemistry without electrons?
Chemistry is not represented, just identity
Whole molecule properties are easily added.
Do big molecules have big IChIs?
Yes, just like systematic names
How to handle other tautomers, substructures, ...?
Other software
Is IChI reversible?
Partly - contains only data needed for ‘naming’
Auxiliary fields can carry other information
Is IChI extensible?
New layers can add refinement
IChI Capabilities
Identify compounds at the known level of detail
Convention-free (mostly)
Generate quickly from structure
Contains all essential connectivity information
Simple ASCII representation
Utility of Digital ‘Dictionary’
Traceability
Clarity (especially for computers)
Indexing
Effective ‘keywording’
Accuracy
Error checking
Automated Processing
Goal: Color Books as Source of Basic Chemical Terms in XML
Why IUPAC?
International Acceptance
Comprehensive
Open Process
Long-standing
Part of its mission
The Gold Book is ‘Indexed’ on the Web
Gold Book in XML
Provide uniform chemical terminology for XML documents
Root for digital ‘tags’ in chemistry
Model for future IUPAC recommendations
Gold Book – PDF to XML
(implicit to explicit)
Text
‘Tag’ data and relations
Chemical Structures
To connection tables/CML/SVG
Equations
To MathML
Figures & Complex Schemes
Redraw in SVG
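
As a sketch of what making implicit text explicit could look like, the example below tags a term, its definition and its cross-references with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`entry`, `definition`, `seeAlso`, `ref`) and the placeholder definition text are assumptions for illustration; they are not the schema used for the Gold Book.

```python
import xml.etree.ElementTree as ET


def make_entry(term, definition, related):
    """Build one tagged dictionary entry with explicit links to related terms."""
    entry = ET.Element("entry", {"term": term})
    ET.SubElement(entry, "definition").text = definition
    for other in related:
        ET.SubElement(entry, "seeAlso", {"ref": other})
    return entry


# Placeholder content; a real entry would carry the published definition.
entry = make_entry("tautomerism", "placeholder definition text", ["isomerization"])
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)

# Once tagged, relations can be read back by software instead of inferred from prose.
parsed = ET.fromstring(xml_text)
related_terms = [node.get("ref") for node in parsed.findall("seeAlso")]
print(related_terms)
```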
Green Book - Promise
‘Template’ for numeric property validation
Ensure proper units and representation
Traceable to IUPAC definition
Basic Tags for Common Properties
Covers 15 ‘fields’ of chemistry
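
A template for numeric property validation might work roughly as sketched below. The property tags, the allowed-unit sets, the lower bounds and the function name are all hypothetical choices made for this example; none of them are taken from the Green Book.

```python
# Hypothetical templates: each property tag lists its accepted units and a lower bound.
TEMPLATES = {
    "molar_mass": {"units": {"g/mol", "kg/mol"}, "minimum": 0.0},
    "thermodynamic_temperature": {"units": {"K"}, "minimum": 0.0},
}


def validate(prop, value, unit):
    """Return a list of problems found for one reported property value."""
    template = TEMPLATES.get(prop)
    if template is None:
        return [f"unknown property tag {prop!r}"]
    problems = []
    if unit not in template["units"]:
        problems.append(f"unit {unit!r} not allowed for {prop}")
    if value < template["minimum"]:
        problems.append(f"value {value} below minimum {template['minimum']}")
    return problems


print(validate("molar_mass", 46.07, "g/mol"))
print(validate("thermodynamic_temperature", -5.0, "C"))
```

Because each tag points at one template, a reported value can be checked for units and representation before it enters a database.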
Next
Nov 12-14 Meeting at NIST
IChI
‘Final’ Beta Nov. 2003
Dissemination
Databases, Software
Version 2
XML Data Dictionary
Gold Book Conversion
Maintenance Method
Green Book
Thank you.
I will be happy to answer any questions.