S. E. Stein, S. R. Heller, D. V. Tchekhovskoi
Physical & Chemical Properties Division
NIST

Chemical structure is the true ‘identifier’
But structures are not unique or convenient for computers
Convert the structure (connection table) to a unique string of characters by algorithm
The IUPAC Chemical Identifier (IChI)

Originators
  Capable of full representation
  Eliminate ‘conventions’
  Maximum application
‘Clients’
  Robust
  Selectable specificity
Software
  Implement through external structure processing software

Chemicals
  Rapid reaction (tautomerization)
  Ambiguous/uncertain structure
Chemists
  Differing conventions, based on discipline, education and convenience

Chemistry
  ‘Normalize’ the input structure
  Implement chemical rules
Math
  ‘Canonicalize’ (label the atoms)
  Equivalent atoms get the same label
Convention
  ‘Serialize’ the labeled structure
  Output as a series of bytes

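The ‘Canonicalize’ and ‘Serialize’ steps can be illustrated with a toy sketch. This is not the IChI algorithm: it assumes a heavy-atom connection table (element symbols plus bond pairs, bond orders ignored), skips normalization entirely, and uses a Morgan-style class refinement as a stand-in for canonical labeling. Atoms left in the same class look equivalent and receive the same label; the resulting string does not depend on the input atom order, but refinement alone cannot separate every pair of non-equivalent atoms, so this is an invariant rather than a guaranteed-unique identifier. The names `refine_classes` and `serialize` are hypothetical.

```python
def _rank(keys):
    """Replace each key by its position among the distinct keys, sorted."""
    order = {key: rank for rank, key in enumerate(sorted(set(keys)))}
    return [order[key] for key in keys]


def refine_classes(elements, bonds):
    """Morgan-style refinement: atoms that stay in one class look equivalent.

    elements: element symbol per atom; bonds: (i, j) pairs, orders ignored.
    """
    n = len(elements)
    neighbors = [[] for _ in range(n)]
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    # Start from element symbol plus number of connections.
    classes = _rank([(elements[a], len(neighbors[a])) for a in range(n)])
    while True:
        # Split classes using the sorted classes of each atom's neighbors.
        refined = _rank([(classes[a], tuple(sorted(classes[b] for b in neighbors[a])))
                         for a in range(n)])
        if len(set(refined)) == len(set(classes)):
            return refined  # partition is stable
        classes = refined


def serialize(elements, bonds):
    """Write the class-labeled graph as a string independent of input order."""
    classes = refine_classes(elements, bonds)
    atoms = sorted((classes[a], elements[a]) for a in range(len(elements)))
    edges = sorted(tuple(sorted((classes[i], classes[j]))) for i, j in bonds)
    return (",".join(f"{c}{e}" for c, e in atoms) + "/" +
            ",".join(f"{a}-{b}" for a, b in edges))


# Ethanol heavy atoms entered in two different orders give the same string.
print(serialize(["C", "C", "O"], [(0, 1), (1, 2)]))  # 0C,1C,2O/0-1,1-2
print(serialize(["O", "C", "C"], [(0, 1), (1, 2)]))  # 0C,1C,2O/0-1,1-2
```

In dimethyl ether (`["C", "O", "C"]`) both carbons end up with the same label, which is the "equivalent atoms get the same label" property.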
Ignore ‘electron density’
  Double/triple/coordination bonds
  Odd electrons/charges
Stereochemistry
  Free rotation around single bonds
  No stereo in rings of fewer than 8 members (default)
Divide structure information into ‘layers’

Not required for compound identification
Represent ‘excited states’
Simplify representations
  Delocalization, aromaticity, zwitterions, coordination …

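Ignoring bond orders means that different drawings of the same delocalized system reduce to one connectivity. A minimal sketch, assuming bonds are given as hypothetical `(i, j, order)` triples and that both drawings already share one atom numbering (in a full pipeline the canonicalization step takes care of numbering): the two Kekulé structures of benzene's ring differ only in bond orders, so they yield the same key.

```python
def connectivity_key(bonds):
    """Drop bond orders and keep only which atom pairs are connected."""
    return frozenset(frozenset((i, j)) for i, j, _order in bonds)


ring = [(i, (i + 1) % 6) for i in range(6)]  # six-membered ring, atoms 0..5
# Two Kekulé structures: alternating double (2) and single (1) bonds.
kekule_a = [(i, j, 2 if k % 2 == 0 else 1) for k, (i, j) in enumerate(ring)]
kekule_b = [(i, j, 1 if k % 2 == 0 else 2) for k, (i, j) in enumerate(ring)]

print(connectivity_key(kekule_a) == connectivity_key(kekule_b))  # True
print(set(kekule_a) == set(kekule_b))  # False: the bond orders differ
```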
Formula
Connectivity
Stereochemistry
Isotopic ‘corrections’

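Layering is what makes ‘selectable specificity’ possible: a client keeps only as many layers as it needs. The sketch below assumes a hypothetical layer dictionary and a `/` separator, and that each non-formula layer carries its own prefix; it does not reproduce the actual IChI string syntax. The formula layer uses the Hill order (carbon first, then hydrogen, then the remaining elements alphabetically; with no carbon, all elements alphabetically).

```python
from collections import Counter

LAYER_ORDER = ("formula", "connectivity", "stereo", "isotopic")


def hill_formula(elements):
    """Hill-order formula from a list of element symbols (one per atom)."""
    counts = Counter(elements)
    if "C" in counts:
        order = ["C"] + (["H"] if "H" in counts else [])
        order += sorted(e for e in counts if e not in ("C", "H"))
    else:
        order = sorted(counts)
    return "".join(e if counts[e] == 1 else f"{e}{counts[e]}" for e in order)


def layered_id(layers, specificity):
    """Keep only the first `specificity` layers, skipping empty ones."""
    kept = [layers[name] for name in LAYER_ORDER[:specificity] if layers.get(name)]
    return "/".join(kept)


ethanol = ["C", "C", "O"] + ["H"] * 6
layers = {"formula": hill_formula(ethanol), "connectivity": "c1-2-3"}  # hypothetical layer text
print(layered_id(layers, 1))  # C2H6O
print(layered_id(layers, 4))  # C2H6O/c1-2-3
```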
Disconnect metals and H-atoms
Reconnect metals
Reconnect H-atoms
  Non-mobile (non-tautomers)
  Mobile (distinguish tautomers)

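Only the first step listed above, disconnecting metals and H-atoms, is sketched here; reconnection and mobile-H handling are not shown. Assumptions: terminal H atoms are folded into per-atom hydrogen counts, any bond to a metal is set aside, and `METALS` is an illustrative subset rather than a complete list. The helper names are hypothetical. With sodium acetate drawn with a covalent Na–O bond, the metal ends up as its own component.

```python
# Illustrative subset only; a real implementation needs a complete metal list.
METALS = {"Li", "Na", "K", "Mg", "Ca", "Fe", "Cu", "Zn"}


def disconnect(elements, bonds, metals=METALS):
    """Split bonds into a heavy-atom skeleton, H counts and metal bonds."""
    h_count = [0] * len(elements)
    skeleton, metal_bonds = [], []
    for i, j in bonds:
        if elements[i] == "H" and elements[j] != "H":
            h_count[j] += 1              # fold terminal H into its neighbor
        elif elements[j] == "H" and elements[i] != "H":
            h_count[i] += 1
        elif elements[i] in metals or elements[j] in metals:
            metal_bonds.append((i, j))   # set aside; may be reconnected later
        else:
            skeleton.append((i, j))
    return skeleton, h_count, metal_bonds


def components(n, bonds, skip=()):
    """Connected components via union-find, ignoring atoms listed in skip."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in bonds:
        parent[find(i)] = find(j)
    groups = {}
    for a in range(n):
        if a not in skip:
            groups.setdefault(find(a), []).append(a)
    return sorted(groups.values())


# Sodium acetate, CH3-C(=O)-O-Na, with the three methyl H atoms explicit.
elements = ["C", "C", "O", "O", "Na", "H", "H", "H"]
bonds = [(0, 1), (1, 2), (1, 3), (3, 4), (0, 5), (0, 6), (0, 7)]
skeleton, h_count, metal_bonds = disconnect(elements, bonds)
print(components(len(elements), skeleton, skip={5, 6, 7}))  # [[0, 1, 2, 3], [4]]
```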
sp2 – double bond
sp3 – tetrahedral
(others to be added later)

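For the sp3 (tetrahedral) case, one common way to encode a configuration independently of atom numbering is a parity: list the four neighbors in an order that fixes their spatial arrangement, replace them by their canonical ranks, and record whether sorting those ranks takes an even or odd number of swaps. The sketch below assumes that neighbor-ordering convention and treats two neighbors with equal rank as "not stereogenic", which is a simplification; it is not presented as the IChI stereo algorithm.

```python
def permutation_parity(seq):
    """0 for an even permutation, 1 for odd, by counting inversions."""
    inversions = sum(1 for a in range(len(seq))
                     for b in range(a + 1, len(seq)) if seq[a] > seq[b])
    return inversions % 2


def tetrahedral_parity(neighbor_ranks):
    """Parity of an sp3 center from the canonical ranks of its 4 neighbors.

    neighbor_ranks must be listed in an order fixed by the 3-D arrangement
    (an assumed convention). Equal ranks mean equivalent neighbors, so the
    atom is treated as non-stereogenic in this simplified sketch.
    """
    if len(set(neighbor_ranks)) < 4:
        return None
    return "even" if permutation_parity(neighbor_ranks) == 0 else "odd"


print(tetrahedral_parity([1, 2, 3, 4]))  # even
print(tetrahedral_parity([2, 1, 3, 4]))  # odd: swapping two neighbors inverts the center
print(tetrahedral_parity([1, 1, 2, 3]))  # None
```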
Byproduct of IChI creation
  Assist chemists with structure confirmation

Byproduct
  Label stereogenic atoms
  Identify equivalent atoms
Warnings/errors
  Unusual valences
  Unrecognized input
‘Reversibility’
  Coordinates
  Bond/charge location

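The ‘Unusual valences’ and ‘Unrecognized input’ diagnostics listed above could be produced by a check along these lines. This is a minimal sketch, assuming the input still carries bond orders as `(i, j, order)` triples plus per-atom hydrogen counts, ignoring charges and radicals, and using a small hypothetical table of typical valences; the message wording is invented for illustration.

```python
# Hypothetical table of typical neutral valences; a real check needs far more.
TYPICAL_VALENCES = {"H": {1}, "C": {4}, "N": {3}, "O": {2},
                    "S": {2, 4, 6}, "Cl": {1}}


def check_valences(elements, bonds, h_count):
    """Return warning/error strings; charges and radicals are ignored."""
    total = [0] * len(elements)
    for i, j, order in bonds:
        total[i] += order
        total[j] += order
    messages = []
    for a, el in enumerate(elements):
        valence = total[a] + h_count[a]
        if el not in TYPICAL_VALENCES:
            messages.append(f"error: atom {a}: unrecognized element '{el}'")
        elif valence not in TYPICAL_VALENCES[el]:
            messages.append(f"warning: atom {a} ({el}): unusual valence {valence}")
    return messages


print(check_valences(["C", "O"], [(0, 1, 1)], [3, 1]))  # [] (methanol)
print(check_valences(["C", "O"], [(0, 1, 1)], [4, 1]))  # five-bonded carbon
```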
Each term on a PDF page
  Looks like the printed page
  Links to other definitions
For display only
  Not easily convertible
  Graphics, symbols, equations
  Text not ‘parsed’
  No ‘metadata’

Text
  Perceive and tag data types and relationships
Simple structures
  To connection tables/CML/SVG
Equations
  To MathML
Figures & complex schemes
  Redraw in SVG

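Tagging a term's text turns a display-only page into data that programs can query. The sketch below uses Python's standard `xml.etree.ElementTree` to build one tagged entry; the element and attribute names (`term`, `name`, `definition`, `seeAlso`, `id`, `ref`) and the IDs are hypothetical placeholders, not the actual Gold Book schema, and equations or figures (MathML, SVG) are not handled.

```python
import xml.etree.ElementTree as ET


def term_to_xml(term_id, name, definition, related=()):
    """Build a tagged entry; element and attribute names are hypothetical."""
    entry = ET.Element("term", {"id": term_id})
    ET.SubElement(entry, "name").text = name
    ET.SubElement(entry, "definition").text = definition
    for ref in related:
        # Cross-links become explicit references instead of page hyperlinks.
        ET.SubElement(entry, "seeAlso", {"ref": ref})
    return ET.tostring(entry, encoding="unicode")


xml_text = term_to_xml("T0001", "example term", "A placeholder definition.", ["T0002"])
root = ET.fromstring(xml_text)  # the tagged text can be parsed back
print(root.get("id"), root.find("seeAlso").get("ref"))  # T0001 T0002
```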
Use by other MLs
  Schema
  Integrate dictionary via STMML
Internal maintenance
  Develop ‘authoring’ procedures

Provide uniform chemical terminology for XML documents
Traceability of terminology
Root for the ‘tagging’ of chemistry
Model for future IUPAC recommendations

‘Template’ for numeric property validation in chemistry
  Ensure proper units and representation
  Traceability to IUPAC definition
Basic standards for numeric data ‘tagging’

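A validation template of this kind might pair each property with the units it accepts and a few sanity rules. The sketch below is purely illustrative: the property names, the allowed-unit sets, and the single range rule are assumptions chosen for the example, not IUPAC-specified content, and unit conversion is not attempted.

```python
from dataclasses import dataclass

# Hypothetical template: property name -> units accepted for that property.
ALLOWED_UNITS = {
    "temperature": {"K"},
    "molar mass": {"g mol-1", "kg mol-1"},
    "pressure": {"Pa", "kPa", "MPa"},
}


@dataclass
class Quantity:
    prop: str
    value: float
    unit: str


def validate(q):
    """Return a list of problems; an empty list means the value passed."""
    problems = []
    if q.prop not in ALLOWED_UNITS:
        problems.append(f"unknown property '{q.prop}'")
    elif q.unit not in ALLOWED_UNITS[q.prop]:
        problems.append(f"unit '{q.unit}' not allowed for {q.prop}")
    if q.prop == "temperature" and q.unit == "K" and q.value < 0:
        problems.append("thermodynamic temperature cannot be negative")
    return problems


print(validate(Quantity("temperature", 298.15, "K")))  # []
print(validate(Quantity("pressure", 760, "mmHg")))     # unit rejected by the template
```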
Periodic table and relative molar masses
  ‘Official’ digital source
  Connect to relevant IUPAC information
Root of chemical information ‘tree’
  Spectroscopy, electrochemistry, thermochemistry, catalysis, …

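Once atomic weights are available from a digital source, relative molar masses follow directly from formulas. The sketch below hard-codes a few rounded (abridged) standard atomic weights as an assumption; the official values are whatever the IUPAC table publishes. The parser handles only flat formulas such as `C2H6O` (no brackets or hydrates).

```python
import re

# Abridged standard atomic weights (rounded); an assumption for this sketch.
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
                  "Na": 22.990, "Cl": 35.45}

TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def relative_molar_mass(formula):
    """Sum atomic weights for a flat formula such as 'C2H6O'."""
    total, pos = 0.0, 0
    for m in TOKEN.finditer(formula):
        if m.start() != pos:  # something between tokens that we cannot parse
            raise ValueError(f"cannot parse {formula!r} at position {pos}")
        symbol, count = m.group(1), int(m.group(2) or 1)
        if symbol not in ATOMIC_WEIGHTS:
            raise ValueError(f"no atomic weight for {symbol!r}")
        total += ATOMIC_WEIGHTS[symbol] * count
        pos = m.end()
    if pos != len(formula):
        raise ValueError(f"cannot parse {formula!r} at position {pos}")
    return total


print(round(relative_molar_mass("H2O"), 3))    # 18.015
print(round(relative_molar_mass("C2H6O"), 3))  # 46.069
```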
Nov 12-14 meeting at NIST
IChI
  Final beta Nov. 2002
  Dissemination
  Version 2
XML Data Dictionary
  Finish Gold Book conversion
  Maintenance path
  Begin Green Book