|
|
|
S. E. Stein, S. R. Heller, D. V. Tchekhovskoi
Physical & Chemical Properties Division
NIST |
|
|
|
Human |
|
Verbal – Common name |
|
Text – Systematic/Common name |
|
Pictorial – Structure diagram |
|
|
|
|
|
Chemical structure is the true ‘identifier’ |
|
But structure representations are neither unique nor convenient for computers. |
|
So, convert structure to a unique ‘name’ by fixed algorithms |
|
The Iupac CHemical Identifier (IChI) |
|
|
|
|
|
“Authors” |
|
Precise |
|
Convention-free |
|
Wide coverage |
|
|
|
“Readers” |
|
Robust |
|
Variable specificity |
|
Long life |
|
|
|
“Publishers” (Software) |
|
Ready access |
|
|
|
|
|
|
Chemicals |
|
Fast isomerization (tautomerization) |
|
Ill-defined connectivity |
|
Chemists |
|
Differing conventions |
|
Depends on discipline, education and convenience |
|
Imprecision/uncertainty |
|
|
|
|
|
|
Chemistry |
|
‘Normalize’ Input Structure |
|
Implement chemical rules |
|
|
|
Math |
|
‘Canonicalize’ (label the atoms) |
|
Equivalent atoms get the same label |
|
|
|
Format |
|
‘Serialize’ Labeled Structure |
|
Output as character string (‘name’) |
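The three steps above can be illustrated with a toy sketch. It is not the IChI algorithm: canonical labels are found here by brute force over all atom orderings, which is only practical for a handful of atoms, and the output format and the function names (`normalize`, `serialize`, `toy_identifier`) are invented for illustration.

```python
from itertools import permutations


def normalize(atoms, bonds):
    """Chemistry step (toy): keep connectivity only.

    Bond orders are dropped and reversed or repeated bonds collapse
    into a single undirected edge.
    """
    edges = {frozenset((a, b)) for a, b, *_ in bonds if a != b}
    return list(atoms), edges


def serialize(atoms, edges, order):
    """Format step: write the labeled structure as a string.

    order[new_label] is the input index of the atom that gets that label.
    """
    new_label = {old: new for new, old in enumerate(order)}
    symbols = "".join(atoms[old] for old in order)
    pairs = sorted(tuple(sorted(new_label[i] for i in e)) for e in edges)
    return symbols + "/" + ",".join(f"{a + 1}-{b + 1}" for a, b in pairs)


def toy_identifier(atoms, bonds):
    """Math step (brute force): try every labeling, keep the smallest string."""
    atoms, edges = normalize(atoms, bonds)
    return min(
        serialize(atoms, edges, order)
        for order in permutations(range(len(atoms)))
    )


# Heavy atoms of ethanol, numbered two different ways by two "authors".
ethanol_a = toy_identifier(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
ethanol_b = toy_identifier(["O", "C", "C"], [(1, 0, 1), (2, 1, 1)])
# Dimethyl ether has the same atoms but different connectivity.
ether = toy_identifier(["C", "O", "C"], [(0, 1, 1), (1, 2, 1)])
```

Both numberings of ethanol give the same string, while dimethyl ether gives a different one. A practical implementation would replace the factorial search with an efficient canonical labeling procedure.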
|
|
|
|
|
Divide structure into ‘layers’ |
|
Each layer ‘refines’ structure |
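A minimal sketch of how layering allows comparison at a chosen level of detail. The layer names, the `/` separator, and the placeholder layer contents are assumptions made for illustration, not the IChI format.

```python
# Hypothetical layer order, from coarsest to most refined.
LAYER_ORDER = ("formula", "connectivity", "hydrogens", "stereo")


def build_identifier(layers):
    """Join the known layers in order; stop at the first missing layer.

    Layer contents are assumed not to contain the '/' separator.
    """
    parts = []
    for name in LAYER_ORDER:
        if name not in layers:
            break
        parts.append(layers[name])
    return "/".join(parts)


def match_at_depth(id_a, id_b, depth):
    """Compare two identifiers using only their first `depth` layers."""
    return id_a.split("/")[:depth] == id_b.split("/")[:depth]


# Two placeholder compounds that differ only in the stereo layer.
isomer_1 = build_identifier(
    {"formula": "F", "connectivity": "c1", "hydrogens": "h1", "stereo": "sA"}
)
isomer_2 = build_identifier(
    {"formula": "F", "connectivity": "c1", "hydrogens": "h1", "stereo": "sB"}
)
same_without_stereo = match_at_depth(isomer_1, isomer_2, 3)
same_with_stereo = match_at_depth(isomer_1, isomer_2, 4)
```

Here the two identifiers match through the third layer and differ only when the stereo layer is included.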
|
|
|
Ignore ‘Electron Density’ |
|
Use simple ‘connectivity’ only |
|
Ignore bond type and electron location |
|
|
|
Stereochemistry |
|
sp2 and sp3 only |
|
Free rotation around single bonds |
|
No Z/E stereo for small rings (default) |
|
|
|
|
|
|
|
Disconnect metals and H-atoms |
|
Skeleton |
|
Reconnect fixed H-atoms |
|
Tautomerism |
|
Reconnect mobile H-atoms (optional) |
|
All connections fixed |
|
Reconnect metals (optional) |
|
Represent bonds to metals |
|
|
|
|
|
|
sp2 – double bond |
|
sp3 – tetrahedral |
|
{others added later} |
|
relative, absolute or racemic |
|
|
|
Aids structure validation |
|
|
|
|
|
|
Warnings/Errors |
|
Unusual valences |
|
Unrecognized input |
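The kind of check listed above might look like the following sketch. It assumes neutral atoms with explicit hydrogens, and the valence table is a deliberately small illustration; the function name `check_structure` and the message wording are hypothetical.

```python
# Illustrative table of usual valences for a few neutral atoms.
USUAL_VALENCES = {"H": {1}, "C": {4}, "N": {3}, "O": {2}}


def check_structure(atoms, bonds):
    """Return warning and error messages for a structure.

    atoms: list of element symbols.
    bonds: list of (atom_index, atom_index, bond_order) tuples.
    """
    total = [0] * len(atoms)
    for a, b, order in bonds:
        total[a] += order
        total[b] += order

    messages = []
    for i, symbol in enumerate(atoms):
        if symbol not in USUAL_VALENCES:
            messages.append(
                f"error: atom {i + 1}: unrecognized element {symbol!r}"
            )
        elif total[i] not in USUAL_VALENCES[symbol]:
            messages.append(
                f"warning: atom {i + 1} ({symbol}): unusual valence {total[i]}"
            )
    return messages


# Methane passes; a carbon with five hydrogens triggers a warning.
methane = check_structure(
    ["C", "H", "H", "H", "H"],
    [(0, 1, 1), (0, 2, 1), (0, 3, 1), (0, 4, 1)],
)
five_h = check_structure(
    ["C", "H", "H", "H", "H", "H"],
    [(0, 1, 1), (0, 2, 1), (0, 3, 1), (0, 4, 1), (0, 5, 1)],
)
```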
|
|
|
‘Reversibility’ |
|
Coordinates |
|
Bond/Charge Location |
|
|
|
How can you represent chemistry without electrons? |
|
Chemistry is not represented, just identity |
|
Whole molecule properties are easily added. |
|
Do big molecules have big IChIs? |
|
Yes, just like systematic names |
|
How to handle other tautomers, substructures, ...? |
|
Other software |
|
Is IChI reversible? |
|
Partly - contains only data needed for ‘naming’ |
|
Auxiliary fields can carry other information |
|
Is IChI extensible? |
|
New layers can add refinement |
|
|
|
|
Identify compounds at the known level of detail |
|
Convention-free (mostly) |
|
Generate quickly from structure |
|
Contains all essential connectivity information |
|
Simple ASCII representation |
|
|
|
|
|
|
|
|
|
Traceability |
|
Clarity (especially for computers) |
|
Indexing |
|
Effective ‘keywording’ |
|
Accuracy |
|
Error checking |
|
Automated Processing |
|
|
|
|
|
|
Why IUPAC? |
|
International Acceptance |
|
Comprehensive |
|
Open Process |
|
Long-standing |
|
Part of its mission |
|
|
|
Provide uniform chemical terminology for XML documents |
|
Root for digital ‘tags’ in chemistry |
|
Model for future IUPAC recommendations |
|
|
|
|
|
Text |
|
‘Tag’ data and relations |
|
Chemical Structures |
|
To connection tables/CML/SVG |
|
Equations |
|
To MathML |
|
Figures & Complex Schemes |
|
Redraw in SVG |
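As a rough illustration of tagging text, the sketch below wraps a compound name in an XML element that carries an identifier. The element name `compound`, the attribute name `identifier`, and the placeholder value are hypothetical and not taken from any IUPAC schema.

```python
import xml.etree.ElementTree as ET


def tag_compound(sentence, name, identifier):
    """Return a <p> element with the first occurrence of `name` tagged."""
    before, found, after = sentence.partition(name)
    para = ET.Element("p")
    if not found:
        # Name not present: leave the sentence untagged.
        para.text = sentence
        return para
    para.text = before
    compound = ET.SubElement(para, "compound", {"identifier": identifier})
    compound.text = name
    compound.tail = after
    return para


tagged = ET.tostring(
    tag_compound("Dissolve ethanol in water.", "ethanol", "ID-PLACEHOLDER"),
    encoding="unicode",
)
```

The identifier attribute gives software an unambiguous key for the compound, while the human-readable name stays in the text.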
|
|
|
‘Template’ for numeric property validation |
|
Ensure proper units and representation |
|
Traceable to IUPAC definition |
|
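A template-based check of a numeric property could be sketched as follows. The property names, unit spellings, and lower bounds are assumptions chosen for illustration; a real data dictionary would define its own entries.

```python
# Hypothetical templates: allowed units mapped to the lowest valid value.
PROPERTY_TEMPLATES = {
    "boiling_point": {"K": 0.0, "degC": -273.15},
    "density": {"kg/m3": 0.0, "g/cm3": 0.0},
}


def validate_property(name, value, unit):
    """Return a list of problems; an empty list means the value passes."""
    template = PROPERTY_TEMPLATES.get(name)
    if template is None:
        return [f"unknown property {name!r}"]
    if unit not in template:
        return [f"unit {unit!r} not allowed for {name}"]
    if value < template[unit]:
        return [f"{name} = {value} {unit} is below the physical minimum"]
    return []


ok = validate_property("boiling_point", 373.15, "K")
bad_unit = validate_property("boiling_point", 100.0, "F")
bad_value = validate_property("boiling_point", -5.0, "K")
```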
Basic Tags for Common Properties |
|
Covers 15 ‘fields’ of chemistry |
|
|
|
|
|
|
Nov 12-14 Meeting at NIST |
|
|
|
IChI |
|
‘Final’ Beta Nov. 2003 |
|
Dissemination |
|
Databases, Software |
|
Version 2 |
|
|
|
XML Data Dictionary |
|
Gold Book Conversion |
|
Maintenance Method |
|
Green Book |
|
|
|