1
The InChI/InChIKey – an Open Source Algorithm for Linking Chemical, Scientific, Medical, and Healthcare Information.

Stephen R. Heller
IUPAC
steve@hellers.com
2
The InChI Team
  • Stephen R. Heller
  • Alan McNaught
  • Igor Pletnev
  • Stephen E. Stein
  • Dmitrii Tchekhovskoi
3
Objective from the Health & Life Sciences perspective
5
InChI
  • Using an InChI/InChIKey, you will find a match if one exists, without needing to worry whether the structure was coded differently by another person or program.


  • The InChI/InChIKey is a system for use by both public and private (fee-based) sources.
7
InChI Characteristics
  • 1. Easy to generate (It will use existing software.)
  • 2. Expressive (It will contain structural information.)
  • 3. Unique/Unambiguous
  • 4. Easy to search for a structure via Internet search engines (Google, Yahoo, Microsoft Live, etc.) using the InChIKey (a hash of the InChI).
12
InChIKey
  • The InChI string has been found to be too long for Internet search engines to use, hence the need for a fixed-length InChIKey. The InChIKey is a 25-character hash code of the InChI string (14 + 8 = 22 characters, plus 1 check character, 1 flag character and 1 dash). It is made up of four parts:


  •                               AAAAAAAAAAAAAA-BBBBBBBBCD


  • 14 characters for the basic structure
  • 8 characters for the layers
  • 1 “check” character
  • 1 flag character indicating certain features (e.g., fixed or non-fixed hydrogen atoms)


  • A hash code is a fixed-length, condensed digital representation of a variable-length character string.


  • The InChIKey is based on a truncated SHA-256 cryptographic hash function.
  •    (http://en.wikipedia.org/wiki/SHA-2)
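To make the idea of a truncated SHA-256 key concrete, here is a minimal Python sketch. It is not the InChIKey algorithm: the real mapping of hash bits to letters, the meaning of the flag and the check-character formula are defined by the InChI software and its technical manual. The helper names (`split_layers`, `letters`, `check_char`, `toy_inchikey`), the choice of which layers count as the "basic structure" (formula, `/c` and `/h`), the byte counts used for truncation, the constant placeholder flag and the checksum are all assumptions made for illustration, so the output will not match real InChIKeys.

```python
from __future__ import annotations

import hashlib

# Assumption for this sketch: the "basic structure" is the formula plus the
# connectivity (/c) and hydrogen (/h) layers; everything else (stereo /t, /m,
# /s, isotopes, charges, ...) goes into the second block.
SKELETON_PREFIXES = ("c", "h")


def split_layers(inchi: str) -> tuple[str, str]:
    """Split an InChI string into (basic-structure part, remaining layers)."""
    body = inchi.split("=", 1)[1]              # drop the "InChI=" prefix
    _version, formula, *layers = body.split("/")
    skeleton = [formula] + [s for s in layers if s[:1] in SKELETON_PREFIXES]
    rest = [s for s in layers if s[:1] not in SKELETON_PREFIXES]
    return "/".join(skeleton), "/".join(rest)


def letters(data: str, n_chars: int, n_bytes: int) -> str:
    """Truncate SHA-256(data) to n_bytes and re-express it as n_chars A-Z letters."""
    digest = hashlib.sha256(data.encode("utf-8")).digest()[:n_bytes]
    value = int.from_bytes(digest, "big")
    out = []
    for _ in range(n_chars):
        value, r = divmod(value, 26)           # peel off one base-26 digit
        out.append(chr(ord("A") + r))
    return "".join(out)


def check_char(s: str) -> str:
    """Hypothetical check character: position-weighted letter sum mod 26."""
    total = sum((i + 1) * (ord(c) - ord("A")) for i, c in enumerate(s))
    return chr(ord("A") + total % 26)


def toy_inchikey(inchi: str, flag: str = "A") -> str:
    """Build a 25-character key shaped like AAAAAAAAAAAAAA-BBBBBBBBCD."""
    skeleton, rest = split_layers(inchi)
    block1 = letters(skeleton, 14, 9)          # 72 bits cover all 26**14 letter strings
    block2 = letters(rest, 8, 5)               # 40 bits cover all 26**8 letter strings
    body = block1 + block2 + flag              # placeholder flag, then check character
    return f"{block1}-{block2}{flag}{check_char(body)}"


if __name__ == "__main__":
    d_fru = "InChI=1/C6H12O6/c7-1-3(9)5(11)6(12)4(10)2-8/h3,5-9,11-12H,1-2H2/t3-,5-,6-/m1/s1"
    l_fru = "InChI=1/C6H12O6/c7-1-3(9)5(11)6(12)4(10)2-8/h3,5-9,11-12H,1-2H2/t3-,5-,6-/m0/s1"
    print(toy_inchikey(d_fru))
    print(toy_inchikey(l_fru))
```

Run on the two fructose InChIs that appear later in this deck, the sketch gives keys with identical first blocks (same formula, connectivity and hydrogens) and different second blocks (different `/m` stereo layer), which is the behaviour the two-block design aims for.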



13
InChIKey
  • The principal new features of the InChIKey are:

  • A fixed-length (25-character) condensed digital representation of the Identifier, to be known as the InChIKey. In particular, this will:


    * facilitate web searching, previously complicated by search engines breaking InChI character strings unpredictably
    * allow development of a web-based InChI lookup service
    * permit an InChI representation to be stored in fixed-length fields
    * make chemical structure database indexing easier
    * allow verification of InChI strings after network transmission
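The fixed-length-field and transmission-check points can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_key`, `package`, `verify` and `KeyIndex` are hypothetical names, and the key here is simply the first 25 hex digits of SHA-256 rather than the real InChIKey encoding, so its values will not match published InChIKeys.

```python
from __future__ import annotations

import hashlib


def toy_key(inchi: str) -> str:
    """Hypothetical fixed-length key: first 25 hex digits of SHA-256, upper-cased.

    This is NOT the InChIKey encoding; it only shares the idea of a
    truncated SHA-256 hash of the InChI string.
    """
    return hashlib.sha256(inchi.encode("utf-8")).hexdigest()[:25].upper()


def package(inchi: str) -> tuple[str, str]:
    """Sender side: transmit the InChI together with its key."""
    return inchi, toy_key(inchi)


def verify(received_inchi: str, received_key: str) -> bool:
    """Receiver side: recompute the key and compare it with the one transmitted."""
    return toy_key(received_inchi) == received_key


class KeyIndex:
    """Database-style index in which every key has the same length (25)."""

    def __init__(self) -> None:
        self._rows: dict[str, list[str]] = {}

    def add(self, inchi: str, record_id: str) -> None:
        # Records are filed under the fixed-length key, not the long InChI.
        self._rows.setdefault(toy_key(inchi), []).append(record_id)

    def lookup(self, inchi: str) -> list[str]:
        return self._rows.get(toy_key(inchi), [])
```

Because every key has the same length, it fits a fixed-width database column; and a receiver that recomputes the key from a damaged InChI string (for example, one with a layer altered in transit) will almost certainly see a mismatch.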


14
Really long InChI (Palytoxin)
15
D-Fructose (natural)

InChI=1/C6H12O6/c7-1-3(9)5(11)6(12)4(10)2-8/h3,5-9,11-12H,1-2H2/t3-,5-,6-/m1/s1

InChIKey=BJHIKXHVCXFQLS-UYFOZJQFBH


L-Fructose

InChI=1/C6H12O6/c7-1-3(9)5(11)6(12)4(10)2-8/h3,5-9,11-12H,1-2H2/t3-,5-,6-/m0/s1

InChIKey=BJHIKXHVCXFQLS-FUTKDDECBR
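The two fructose InChIs differ only in the `/m` stereo layer, and their InChIKeys share the 14-character first block while differing in the 8-character second block. A minimal Python sketch for splitting a key along the AAAAAAAAAAAAAA-BBBBBBBBCD layout and comparing skeletons is shown below; `KeyParts`, `parse_key` and `same_skeleton` are hypothetical helpers, and the sketch deliberately does not interpret characters C and D. Grouping keys by their first block is one plausible way to collect stereoisomers of the same structure.

```python
from typing import NamedTuple


class KeyParts(NamedTuple):
    skeleton: str  # 14 characters: hash of the basic structure
    layers: str    # 8 characters: hash of the remaining layers
    c: str         # character at position C of AAAAAAAAAAAAAA-BBBBBBBBCD
    d: str         # character at position D


def parse_key(key: str) -> KeyParts:
    """Split a 25-character InChIKey (optionally prefixed with "InChIKey=")."""
    prefix = "InChIKey="
    if key.startswith(prefix):
        key = key[len(prefix):]
    if len(key) != 25 or key[14] != "-":
        raise ValueError(f"not a 25-character InChIKey: {key!r}")
    return KeyParts(key[:14], key[15:23], key[23], key[24])


def same_skeleton(a: str, b: str) -> bool:
    """True when two keys share the 14-character basic-structure block."""
    return parse_key(a).skeleton == parse_key(b).skeleton


d_fructose = parse_key("InChIKey=BJHIKXHVCXFQLS-UYFOZJQFBH")
l_fructose = parse_key("InChIKey=BJHIKXHVCXFQLS-FUTKDDECBR")
print(same_skeleton("BJHIKXHVCXFQLS-UYFOZJQFBH", "BJHIKXHVCXFQLS-FUTKDDECBR"))  # True
print(d_fructose.layers, l_fructose.layers)  # UYFOZJQF FUTKDDEC
```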
16
Stereoisomers of menthol
18
InChI URLs
Main IUPAC InChI page: http://iupac.org/inchi/

InChI Google video lecture (11/06):
http://video.google.com/videoplay?docid=-6653695245776470969&q=heller+chemical

InChI Google video lecture (10/07): http://youtube.com/watch?v=F9XppyZg4E4

B. Kosata (Prague):
www.inchi.info

P. Murray-Rust/Nick Day (Cambridge): http://wwmm.ch.cam.ac.uk/inchifaq/

ChemSpider:
http://www.chemspider.com/inchi.asmx
19
Summary - Overall Features of InChI (1)
  • 1. InChI is the only publicly available method for creating a unique chemical identifier for a given chemical structure. In addition, InChI has a number of other valuable attributes, noted below.

    2. InChI is free, open-source software.

    3. Any organization (public or private) can use it at no cost for internal and/or external structure files.


20
Summary - Overall Features of InChI (2)
  •     4. It is sponsored by IUPAC and primarily implemented by the US scientific standards agency – NIST.

    5. It allows the scientific and medical/healthcare communities to use the InChIKey as a universal chemical identifier. This means InChIs can be freely searched for via Internet search engines.

    6. The InChIKey unlocks the data and information from all sites around the world that choose to use it. It gives commercial chemical information providers (e.g., Thieme, Elsevier, Thomson, Prous Science, and John Wiley) a free system for structure numbering and linking.