1
InChI
The IUPAC International Chemical Identifier
4
Current Practice: Guanine
5
A Unique Chemical Substance Has a Unique, Natural Name
  • Identity expressed as ‘connectivity’
  • Its nature may be expressed at different levels of specificity
  • Creating a unique ‘name’ requires complex algorithms to convert connectivity to a ‘canonical’ string
  • InChI is proposed as such a public name
6
InChI Goal
  • Provide a unique string representing a chemical substance of known structure
    • Independent of specific depiction
    • Derived from a conventional ‘connection table’
    • Freely available
    • Extensible
7
Outline
  • Evolution
  • Technical Points
  • Current Usage
  • Searching the Internet
  • Extensions
8
InChI: NIST History
  • NIST/EPA/NIH Mass Spectral Database (1976)
    • 32,000 connection tables (1986)


  • Structures & Properties (1990)
    • Reference Data & Structure-Based Property Prediction


  • NIST Chemistry Webbook (1991)
    • http://webbook.nist.gov/

  • ‘Standard’ canonical connection table (1992)
    • Internal use only

  • NIST + IUPAC (2000)
12
Unplanned Extensions
2003-2006
  • Salts and H-migration
  • Organometallics
  • Protonation
  • InChI -> Depiction
  • Conformance test
13
http://www.iupac.org/inchi
15
International Chemical Identifier
16
Some Technical Details
17
Formal Requirement 1
  • Different compounds have different identifiers
18
Formal Requirement 2
  • One compound has only one identifier
19
Practical Requirements
  • Simple text format
  • Represent ‘equilibrated mixtures’
  • Account for incomplete stereo information
  • Handle coordination/imprecise bonding
  • Extensible
21
InChI Layers: L-Histidine
22
3 Steps to InChI
  • Chemistry
    • ‘Normalize’ Input Structure
      • Implement chemical rules

  • Math
    • ‘Canonicalize’ (label the atoms)
      • Equivalent atoms get the same label

  • Format
    • ‘Serialize’ Labeled Structure
      • Output as character string (‘name’)
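
The three steps above can be sketched on a toy scale. This is not the InChI algorithm: the "chemistry" rule here only discards bond orders, the "math" step is a brute-force search over all atom numberings (feasible only for very small structures), and the output string format is invented for illustration. It picks one numbering but does not compute classes of equivalent atoms. All function names are hypothetical.

```python
from itertools import permutations

def normalize(atoms, bonds):
    """Chemistry step (toy rule): discard bond orders, keep connectivity."""
    edges = {frozenset((a, b)) for a, b, _order in bonds}
    return list(atoms), edges

def canonicalize(atoms, edges):
    """Math step (brute force): try every relabeling, keep the smallest key."""
    best = None
    for perm in permutations(range(len(atoms))):  # perm[old] = new label
        elements = [None] * len(atoms)
        for old, new in enumerate(perm):
            elements[new] = atoms[old]
        relabeled = sorted(tuple(sorted(perm[i] for i in edge)) for edge in edges)
        key = (tuple(elements), tuple(relabeled))
        if best is None or key < best:
            best = key
    return best

def serialize(key):
    """Format step: write the labeled structure as one text string."""
    elements, edges = key
    return "".join(elements) + "/" + ",".join(f"{a + 1}-{b + 1}" for a, b in edges)

def toy_identifier(atoms, bonds):
    """Chain the three steps: normalize, canonicalize, serialize."""
    return serialize(canonicalize(*normalize(atoms, bonds)))

# Heavy atoms only; each bond is (atom index, atom index, bond order).
ethanol_a = toy_identifier(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
ethanol_b = toy_identifier(["O", "C", "C"], [(0, 1, 1), (1, 2, 1)])  # renumbered
ether = toy_identifier(["C", "O", "C"], [(0, 1, 1), (1, 2, 1)])      # dimethyl ether

print(ethanol_a == ethanol_b)  # same compound, different input numbering
print(ethanol_a == ether)      # different compound, same formula
```

The two comparisons correspond to the two formal requirements: one compound yields one string regardless of how the input was numbered, and a different compound yields a different string.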
23
Normalization
  • Divide input structure into ‘layers’
    • Each layer ‘refines’ structure

  • Ignore ‘Electron Density’
    • Ignore bond type and charge location


  • Stereochemistry
    • sp2 and sp3 only
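
A minimal sketch of what "ignore bond type" buys, assuming a bond is stored as a hypothetical (atom, atom, order) triple: two Kekulé drawings of the same six-membered aromatic ring differ as drawn but coincide once bond orders are dropped. The real normalization applies many more chemical rules than this.

```python
def connectivity_only(bonds):
    """Drop bond orders; keep each bond as an unordered pair of atom indices."""
    return {frozenset((a, b)) for a, b, _order in bonds}

# Six ring bonds: 0-1, 1-2, 2-3, 3-4, 4-5, 5-0
ring = [(i, (i + 1) % 6) for i in range(6)]

# Two alternating single/double assignments for the same ring
kekule_a = [(a, b, 2 if i % 2 == 0 else 1) for i, (a, b) in enumerate(ring)]
kekule_b = [(a, b, 1 if i % 2 == 0 else 2) for i, (a, b) in enumerate(ring)]

print(kekule_a == kekule_b)                                        # drawings differ
print(connectivity_only(kekule_a) == connectivity_only(kekule_b))  # connectivity agrees
```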
25
Current Usage
29
Nick Day’s InChI FAQ
30
Illustration - Draw Compound
31
Generate InChI
32
Reverse InChI
33
Reversed Structure
34
Searching the Internet
35
"Results 1 - 8 of..."
  • Results 1 - 8 of about 24 for “InChI=1/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-10H,1H2”. (0.12 seconds)
  • International Chemical Identifier - Wikipedia, the free encyclopedia ethanol, InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3. Image:Ascorbic_acid.png L-ascorbic acid, InChI=1/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-10H,1H2/t2-,5+/m0/s1 ...
    en.wikipedia.org/wiki/International_Chemical_Identifier - 18k - Cached - Similar pages
  • L-ascorbic acid: Carcinogenic Potency Database InChI Code for l-Ascorbic acid: InChI=1/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-10H,1H2/t2?,5-/m0/s1 Source for SMILES and InChI: USEPA Distributed ...
    potency.berkeley.edu/chempages/L-ASCORBIC%20ACID.html - 17k - Cached - Similar pages
  • carcinogenic potency project s1 50-81-7 l-Ascorbic acid 176.1241 C6H8O6 OC([C@H]([C@H](O)CO)O1)=C(O)C1=O InChI=1/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-10H,1H2/t2?,5-/m0/s1 ...
    potency.berkeley.edu/chemical-structures.text - 234k - Cached - Similar pages
    [ More results from potency.berkeley.edu ]
  • International Chemical Identifier at AllExperts L-ascorbic acid, InChI=1/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-10H,1H2/t2-,5+/m0/s1. Layer types. There are six InChI layer types: # Main layer ...
    experts.about.com/e/i/in/International_Chemical_Identifier.htm - 22k - Cached - Similar pages
36
Limitations in Searching
  • Punctuation
    • Misses structures
    • Loses uniqueness
  • Layering is inflexible
    • Needs separate parser
  • Depiction is lost
    • Needs 2-d display module
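
The "separate parser" mentioned above can be quite small. The sketch below splits an InChI string on "/" and keys each layer by its one-letter prefix, then compares the two L-ascorbic acid strings returned by the search on the previous slide. The function names are hypothetical, and the sketch assumes each prefix occurs at most once, which does not hold for every InChI.

```python
def split_layers(inchi):
    """Split an InChI string into a dict of layers keyed by prefix letter."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    parts = inchi[len("InChI="):].split("/")
    layers = {"version": parts[0]}
    if len(parts) > 1:
        layers["formula"] = parts[1]  # the formula layer carries no prefix
    for part in parts[2:]:
        if part:
            layers[part[0]] = part[1:]
    return layers

def same_connectivity(inchi_a, inchi_b):
    """Compare only the formula, connectivity (c) and hydrogen (h) layers."""
    a, b = split_layers(inchi_a), split_layers(inchi_b)
    return all(a.get(k) == b.get(k) for k in ("formula", "c", "h"))

wikipedia = ("InChI=1/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5"
             "/h2,5,7-10H,1H2/t2-,5+/m0/s1")
potency_db = ("InChI=1/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5"
              "/h2,5,7-10H,1H2/t2?,5-/m0/s1")

print(split_layers(wikipedia)["t"])              # stereo layer of the first hit
print(split_layers(potency_db)["t"])             # stereo layer of the second hit
print(same_connectivity(wikipedia, potency_db))  # the strings agree below the stereo layer
```

A plain text search treats the two strings as unrelated; a layer-aware comparison shows they differ only in the stereo layer.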
37
Information Loss:
replace punctuation with spaces
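
A rough illustration of this information loss, assuming a search engine that treats every non-alphanumeric character as a word break (the `tokenize` function is a hypothetical stand-in, not any particular engine's behavior). The two L-ascorbic acid strings from the search results differ only in stereo signs, and those signs are punctuation.

```python
import re

def tokenize(text):
    """Replace every run of non-alphanumeric characters with a space, then split."""
    return re.sub(r"[^A-Za-z0-9]+", " ", text).split()

inchi_a = ("InChI=1/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5"
           "/h2,5,7-10H,1H2/t2-,5+/m0/s1")
inchi_b = ("InChI=1/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5"
           "/h2,5,7-10H,1H2/t2?,5-/m0/s1")

print(inchi_a == inchi_b)                      # the identifiers are different strings
print(tokenize(inchi_a) == tokenize(inchi_b))  # the token sequences are the same
print(tokenize(inchi_a)[-4:])                  # what is left of the stereo layers
```

Once "-", "+" and "?" become spaces, the stereo layers "t2-,5+" and "t2?,5-" both reduce to the tokens "t2" and "5", so the two identifiers can no longer be told apart.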
38
Extensions
39
Canonical Numbers Are Key
  • Each number in a structure provides a precise way to connect other moieties.
    • Unspecified atom/group
      • Connection point, atom type, …
    • Predefined structural unit
      • Sugars, polymers, …
    • Markush series
      • Substructure search queries
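
One way to picture canonical numbers as connection points, as a sketch only: read the atom numbers out of the connectivity (/c) layer and record an attachment against one of them. The `Attachment` record and the `attach` helper are hypothetical and are not part of InChI or of any proposed extension; the ethanol string is the one quoted in the search results earlier.

```python
import re
from dataclasses import dataclass

def canonical_numbers(inchi):
    """Return the set of atom numbers used in the /c (connectivity) layer."""
    for part in inchi.split("/")[2:]:
        if part.startswith("c"):
            return {int(n) for n in re.findall(r"\d+", part[1:])}
    return set()

@dataclass(frozen=True)
class Attachment:
    """Hypothetical record: a group attached at a canonical atom number."""
    core: str   # InChI of the core structure
    atom: int   # canonical atom number within the core
    group: str  # label of the attached moiety, e.g. an unspecified group "R"

def attach(core, atom, group):
    """Build an Attachment, rejecting atom numbers the core does not have."""
    if atom not in canonical_numbers(core):
        raise ValueError(f"atom {atom} is not a canonical number of {core}")
    return Attachment(core, atom, group)

ethanol = "InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3"
print(sorted(canonical_numbers(ethanol)))
print(attach(ethanol, 3, "R"))
```

Because the numbers come from the canonical labeling rather than from a drawing, the same attachment point is named the same way regardless of how the core was originally depicted.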
40
IUPAC vs InChI Atom Numbering
41
GLYDE-CT
  • Glycan Data Exchange Format – Connection Table
  • Carbohydrate structural data exchange
  • Use InChI canonical numbers for building glycans from simple sugars


  • http://lsdis.cs.uga.edu/~cory/GLYDE-v2_1-08-08-06.pdf
43
Depiction