1
The IUPAC Chemical Identifier Project – InChI. A Status Report

Stephen R. Heller
Stephen E. Stein
Dmitrii Tchekhovskoi
Igor Pletnev
Alan McNaught

steve@hellers.com
2
The slides from this presentation can be found at:

http://www.hellers.com/steve/pub-talks/turin-807/frame.htm


The main web site for the IUPAC InChI project is:

http://www.iupac.org/inchi
3
Outline of Presentation
  • 1. Background/History/Objective
  • 2. Development of InChI
  • 3. InChI Key
  • 4. InChI Adoption & Use
  • 5. Conclusion
4
Date: Mon, 15 Nov 1999 18:48:30 -0500 (EST)
From: Stephen R. Heller<srheller@cliff.nal.usda.gov>
To: stein <sstein@enh.nist.gov>
Subject: Re: A strawman proposal

Steve-

First rough draft. Let's talk tomorrow about it.

Steve

--------------
11/15/99

An IUPAC Chemical Registry System

        In response to the upcoming March 2000 IUPAC meeting -
Representations of Molecular Structure: Nomenclature and its Alternatives
- I would like to propose the creation of an IUPAC public domain chemical
registry system.
…
5
                 Objective

The objective of the IUPAC Chemical Identifier Project is to create a unique label, the IUPAC Chemical Identifier (InChI): an Open Source, non-proprietary identifier for chemical substances that can be used in printed and electronic data sources, making it easier to link diverse data and information compilations.
6
InChI
  • A project whose time has come. The Internet, an international scientific body (IUPAC), and international cooperation (US, UK, Czech Republic) have led to the rapid development, implementation, and use of InChI. Furthermore, cooperation from software vendors, particularly those with structure-drawing software, has made generating InChIs very easy for all chemists.


  • While InChI is an Open Source, public domain system for creating a unique computer-readable identifier (“name”), it is NOT a registry system. InChIs are created only by those who choose to adopt and use the algorithm. Registry systems that index the literature are complementary to any InChI database that anyone creates.
7
 
8
InChI Characteristics
  • 1. Easy to generate (It will use existing software.)
  • 2. Expressive (It will contain structural information.)
  • 3. Unique/Unambiguous
  • 4. Easy to search for structure via Internet search engines (Google, Yahoo, Microsoft Live, etc.) using the InChI (hash) Key.
9
 
10
 
11
 
12
 
13
 
14
Example of Basic Connectivity
15
Example of Basic Connectivity
  • The input structure and its normalized structure are shown below; dots correspond to pi-electrons and are shown for illustrative purposes only.
16
Example of a Tautomer: Guanine

This layer is derived from the Basic Layer by the logical removal of mobile H-atoms and the tagging of H-donor and H-acceptor atoms. The input structure and its normalized structure are shown below; dots correspond to pi-electrons and are shown for illustrative purposes only.
17
 
18
InChIKey
  • The InChI string has proved too long for Internet search engines to use, hence the need for a fixed-length InChIKey. The InChIKey is a 22-character hash code of the InChI string (12 + 6 + 1 check + 1 flag = 20 letters, plus 2 dashes). It is made up of four (4) parts:


  •                                AAAAAAAAAAAA-BBBBBBC-D


  •    12 characters for the basic structure
  •      6 characters for the layers
  •      1 character is a “check” character
  •      1 character is a flag indicating certain features
  •                      (e.g., fixed or not fixed hydrogen atoms)


  • A hash code is a fixed-length, condensed digital representation of a variable-length character string.


  • The InChIKey is based on a truncated SHA-256 cryptographic hash function.
  •    (http://en.wikipedia.org/wiki/SHA-2)
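
The layout described on this slide can be illustrated with a short Python sketch. It is not the official InChIKey algorithm and will not reproduce real InChIKeys: the letter encoding, the check-character rule, the flag letters ("F"/"N"), the order of check and flag, and the way the InChI string is split into "basic structure" and "layers" are all assumptions made here for illustration. Only the overall shape (12 + 6 hashed characters, 1 check, 1 flag, 2 dashes, derived from SHA-256) comes from the slide. The function names are hypothetical.

```python
import hashlib
import string

ALPHABET = string.ascii_uppercase  # 26 letters, A-Z


def letters_from_hash(text: str, length: int) -> str:
    """Hash `text` with SHA-256 and keep `length` base-26 letters.

    Assumption: this takes the low-order base-26 digits of the digest.
    The real InChIKey encoding is different.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    value = int.from_bytes(digest, "big")
    chars = []
    for _ in range(length):
        value, remainder = divmod(value, 26)
        chars.append(ALPHABET[remainder])
    return "".join(chars)


def sketch_key(basic: str, layers: str, fixed_h: bool = False) -> str:
    """Build a 22-character key in the slide's AAAAAAAAAAAA-BBBBBBC-D shape."""
    first = letters_from_hash(basic, 12)   # 12 characters for the basic structure
    second = letters_from_hash(layers, 6)  # 6 characters for the layers
    # Hypothetical check character: sum of letter positions modulo 26.
    check = ALPHABET[sum(ALPHABET.index(c) for c in first + second) % 26]
    # Hypothetical flag letters for fixed / not fixed hydrogen atoms.
    flag = "F" if fixed_h else "N"
    return f"{first}-{second}{check}-{flag}"


if __name__ == "__main__":
    # Ethanol, split by hand into two parts purely for illustration.
    print(sketch_key("C2H6O/c1-2-3", "h3H,2H2,1H3"))
```

Because the key has a fixed length and contains only letters and dashes, it can be pasted into a search engine as a single token, which is the motivation given above for hashing the InChI string.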



19
 
20
 
21
 
22
 
23
 
24
 
25
Other InChI Adopters
  • Publishers:


  • Royal Society of Chemistry www.rsc.org/Publishing/Journals/ProjectProspect/
  • Prous Science - Drugs of the Future
  •  BioMed Central - Chemistry Central www.chemistrycentral.com


  • Other:


  • 1. European Patent Office (EPO)
26
            InChI URLs

Main IUPAC InChI page: http://iupac.org/inchi/

InChI Google video lecture:
http://video.google.com/videoplay?docid=-6653695245776470969&q=heller+chemical

B. Kosata (Prague): www.inchi.info

P. Murray-Rust/Nick Day (Cambridge): http://wwmm.ch.cam.ac.uk/inchifaq/
27
Summary - Overall Features of InChI (1)
  • 1. InChI is the only publicly available method for creating a unique chemical identifier for a given chemical structure. In addition, InChI has a number of other valuable attributes, noted below.

    2. InChI is free, open source software. (Web 2.0)

    3. Any organization (public or private) can use it for internal and/or external structure files at no cost. (Web 2.0)
  • Web 2.0 is the second generation of web-based communities and hosted services, such as social-networking sites, which facilitate collaboration and sharing between users. In Web 1.0, information comes from one central source.
28
Summary - Overall Features of InChI (2)
  • 4. It is sponsored by IUPAC and primarily implemented by the US scientific standards agency, NIST.

    5. It allows the chemistry community to use the InChI Key as a universal chemical identifier. This means InChIs can be freely searched for via Google, Yahoo, Microsoft Live, and other Internet search engines. (Web 2.0)

    6. The InChI Key unlocks the data and information from all sites around the world that choose to use it. The InChI Key allows commercial chemical information providers (e.g., Elsevier, Thomson, Prous Science, and John Wiley) to have a free structure and number/linking system. (Web 2.0)


29
 Acknowledgments
  • Steve Bachrach, Ted Becker, Jost Bohlen, Pieter Bolman, Evan Bolton, Bob Bovenschulte, Steve Bryant, Harry Collier, Alice Cooper,  Nick Day, Rene Deplanque, Ron Dunn, Guenter Grethe, Stevan Harnad, Wolf-Dietrich Ihlenfeldt, Sami Kassab, Sandy Lawson, David Lipman, Gary Mallard, Randy Marcinko, Bill Milne, Carmen Nitsche, Josep Prous, Chris Reed, Rich Roberts, Peter Murray-Rust, Henry Rzepa,  Peter Shepherd, Bill Town, Andrea Twiss-Brooks, Wendy Warr, Tony Williams, and Ann Wolpert