1
The IUPAC International Chemical Identifier Project – InChI. An Open Source/Open Access Project.
  • Stephen Heller, Stephen Stein, & Dmitrii Tchekhovskoi
  • Physical & Chemical Properties Division
  • NIST
  • Gaithersburg, MD
  • srheller@nist.gov
2
The slides from this talk can be found at:
http://www.hellers.com/steve/pub-talks/pittcon305/frame.htm
8
Digital ‘Naming’ of Chemicals
  • Chemical structure is the true ‘identifier’
  • But, structure representations are not unique or convenient for computers.
  • So, convert structure to a unique ‘name’ by fixed algorithms


    • The IUPAC International Chemical Identifier (InChI)

9
Two Problems
  • Chemicals
    • Fast isomerization (tautomerization)
    • Ill-defined connectivity
  • Chemists
    • Differing conventions
      • Depends on discipline, education and convenience
    • Imprecision/uncertainty
10
3 Steps to InChI
  • Chemistry
    • ‘Normalize’ Input Structure
      • Implement chemical rules

  • Math
    • ‘Canonicalize’ (label the atoms)
      • Equivalent atoms get the same label

  • Format
    • ‘Serialize’ Labeled Structure
      • Output as character string (‘name’)
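The three steps above can be sketched on a toy molecular graph. This is only an illustration under stated assumptions, not the InChI algorithm: the function names (`normalize`, `canonicalize`, `serialize`, `toy_identifier`) and the output format are hypothetical, hydrogens are left out, and the canonical labeling is a brute-force search that is practical only for very small graphs.

```python
from itertools import permutations

def normalize(atoms, bonds):
    """Chemistry step (toy): keep connectivity only, discarding bond orders."""
    edges = {frozenset((i, j)) for i, j, _order in bonds}
    return list(atoms), edges

def canonicalize(atoms, edges):
    """Math step (toy): try every renumbering and keep the smallest result."""
    n = len(atoms)
    best = None
    for perm in permutations(range(n)):  # perm[old_index] = new_label
        relabeled = [None] * n
        for old, new in enumerate(perm):
            relabeled[new] = atoms[old]
        new_edges = sorted(tuple(sorted(perm[i] for i in e)) for e in edges)
        candidate = (tuple(relabeled), tuple(new_edges))
        if best is None or candidate < best:
            best = candidate
    return best

def serialize(canonical):
    """Format step (toy): write the labeled structure as one character string."""
    atoms, edges = canonical
    atom_part = "".join(atoms)
    edge_part = ",".join(f"{i + 1}-{j + 1}" for i, j in edges)
    return f"{atom_part}/{edge_part}"

def toy_identifier(atoms, bonds):
    return serialize(canonicalize(*normalize(atoms, bonds)))

# The same heavy-atom skeleton (C-C-O) drawn with two different atom orders.
drawing_a = toy_identifier(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
drawing_b = toy_identifier(["O", "C", "C"], [(0, 1, 1), (1, 2, 1)])
print(drawing_a == drawing_b)
```

Because the string is built only after relabeling, the order in which the atoms were drawn does not affect the result, which is the point of the canonicalization step.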
11
Normalize
Simplify
  • Divide structure into ‘layers’
    • Each layer ‘refines’ structure

  • Ignore ‘Electron Density’
    • Use simple ‘connectivity’ only
    • Ignore bond type and electron location


  • Stereochemistry
    • sp2 and sp3 only
    • Free rotation around single bonds
    • No Z/E stereo for small rings (default)
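The layered design shows up directly in the identifier string, where layers are separated by `/`. A minimal sketch of splitting an identifier into its layers follows. The ethanol string is a widely published identifier and is not taken from these slides; the function name `split_layers` is hypothetical, and the sketch assumes each layer prefix occurs only once, which does not hold for every identifier.

```python
def split_layers(inchi):
    """Split an InChI string into version, formula, and prefixed layers."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    parts = inchi[len(prefix):].split("/")
    layers = {"version": parts[0]}
    if len(parts) > 1:
        layers["formula"] = parts[1]
    for part in parts[2:]:
        if part:
            # The first character names the layer, e.g. 'c' for connectivity
            # and 'h' for hydrogen atoms.
            layers[part[0]] = part[1:]
    return layers

ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
for name, value in split_layers(ethanol).items():
    print(name, value)
```

Each later layer refines the ones before it, so a reader or program can stop at whichever layer matches the detail it needs.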
19
InChI FAQs
  • How can you represent chemistry without electrons?
    • Chemistry is not represented, just identity
    • Whole molecule properties are easily added.
  • Do big molecules have big InChIs?
    • Yes, just like systematic names
  • How are other tautomers, substructures, etc. handled?
    • By other software
  • Is InChI reversible?
    • Partly - contains only data needed for ‘naming’
    • Auxiliary fields can carry other information
  • Is InChI extensible?
    • New layers can add refinement

20
InChI Capabilities
  • Identify compounds at the known level of detail
  • Convention-free (mostly)
  • Generate quickly from structure
  • Contains all essential connectivity information
  • Simple ASCII representation
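Identifying compounds "at the known level of detail" can be illustrated by comparing identifiers with some layers left out. The sketch below drops the stereo layers (`/b`, `/t`, `/m`, `/s`) before comparing. The two strings are assumed to be the identifiers of the two alanine enantiomers and are not taken from these slides; `drop_stereo` is a hypothetical helper, not part of the InChI software.

```python
STEREO_PREFIXES = ("b", "t", "m", "s")

def drop_stereo(inchi):
    """Return the identifier with its stereo layers removed."""
    parts = inchi.split("/")
    head = parts[:2]  # 'InChI=<version>' and the formula are always kept
    kept = [p for p in parts[2:] if p and p[0] not in STEREO_PREFIXES]
    return "/".join(head + kept)

# Assumed identifiers for the two enantiomers of alanine.
alanine_1 = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
alanine_2 = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"

print(alanine_1 == alanine_2)                            # full detail
print(drop_stereo(alanine_1) == drop_stereo(alanine_2))  # connectivity only
```

A structure whose stereochemistry is unknown would simply lack those layers, so it can still be matched against fully specified records at the connectivity level.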
21
Early InChI Adopters
  • NIH/NCBI/PubChem project – 700,000+ structures
  • NCI Database – 250,000+ structures
  • EPA – DSSTox database
  • KEGG database – 9584 structures



23
InChI FAQs
  • Available from Nick Day, Cambridge University, UK:
  • ned24@cam.ac.uk
25
Future
  • Future versions of InChI, for example, could include phase information and crystal structure, conformations, electronic states and additional classes of stereochemistry.


  • First additional project: Investigate adding polymers to InChI
26
Acknowledgements
  • Steve Bachrach, Steve Bryant, Nick Day, Rene Deplanque, Gary Mallard, Alan McNaught, Bill Milne, Peter Murray-Rust, Henry Rzepa, Peter Shepherd, and Bill Town