1
Open Source/Open Access and the IUPAC International Chemical Identifier (InChI)
  • Stephen Heller, Stephen Stein, & Dmitrii Tchekhovskoi
  • Physical & Chemical Properties Division
  • NIST
  • Gaithersburg, MD
  • srheller@nist.gov
2
Scribes in the 15th century were not happy with Johann Gutenberg.

Publishers in the 21st century are not happy with Tim Berners-Lee.

Organizations that fail to recognize and confront technological and market changes tend to lose their positions, if not the organizations themselves. History is replete with such examples. In the 18th century, power looms replaced the handloom weavers. In the early 20th century, the horse and buggy industry gave way to automobiles. In the late 20th century, the airplane replaced the train and the boat for long-distance travel. Now, at the start of the 21st century, the Internet is threatening the way in which the more than three-century-old scientific publishing industry, and the libraries that subscribe to scholarly publications, have done business for many decades.
3
Publication System Problems
  • Costs are high
  • No cost for manuscript submission. Under ANY economic model, the high volume of manuscripts submitted via the Internet will drown any system.
  • Lack of leadership at research institutions in demanding changes to researchers' publication behavior
  • Difficulty of instituting change
4
Nucleic Acids Research
Impact Factor – 7.260 (2004)

  • 1. Institutional membership of NAR means that corresponding authors based at the member institution will qualify for substantially discounted NAR Open Access publication charges ($950 compared with $1900 per article in 2006).

    2. Online access free of charge in 2006. Please note that a 2006 print subscription does not include institutional membership.

    3. Online access will be completely free of charge in 2005. A print subscription or institutional membership provides discounted publication charges for corresponding authors based at the member institution. See www.nar.oupjournals.org/openaccess
    and
    http://www3.oup.co.uk/nar/special/14/default.html
5
 
6
 
7
 
8
 
9
 
10
Sample manuscript – From the lab of Charles Casey – ACS Past President
11
Digital ‘Naming’ of Chemicals
  • Chemical structure is the true ‘identifier’
  • But structure representations are neither unique nor convenient for computers
  • So, convert structure to a unique ‘name’ by fixed algorithms
    • The IUPAC International Chemical Identifier (InChI)
12
Two Problems
  • Chemicals
    • Fast isomerization (tautomerization)
    • Ill-defined connectivity
  • Chemists
    • Differing conventions
      • Depends on discipline, education and convenience
    • Imprecision/uncertainty
13
3 Steps to InChI
  • Chemistry
    • ‘Normalize’ Input Structure
      • Implement chemical rules

  • Math
    • ‘Canonicalize’ (label the atoms)
      • Equivalent atoms get the same label

  • Format
    • ‘Serialize’ Labeled Structure
      • Output as character string (‘name’)
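
The three steps above can be illustrated with a toy sketch. It is not the InChI algorithm: the molecule format (a list of element symbols plus `(i, j, order)` bonds), the function names `normalize`, `canonicalize`, `serialize` and `toy_identifier`, and the output syntax are all assumptions made for illustration. The canonical numbering is found by brute force over every renumbering, which is only workable for a handful of atoms, and the sketch does not compute classes of equivalent atoms.

```python
from itertools import permutations


def normalize(atoms, bonds):
    """Chemistry step (toy): keep connectivity only, discarding bond orders."""
    edges = {frozenset((i, j)) for i, j, _order in bonds}
    return list(atoms), edges


def canonicalize(atoms, edges):
    """Math step (toy): try every renumbering and keep the smallest result."""
    best = None
    for perm in permutations(range(len(atoms))):
        # perm[old] is the new index assigned to the atom at index `old`.
        relabeled = [None] * len(atoms)
        for old, new in enumerate(perm):
            relabeled[new] = atoms[old]
        new_edges = sorted(tuple(sorted(perm[k] for k in edge)) for edge in edges)
        candidate = (tuple(relabeled), tuple(new_edges))
        if best is None or candidate < best:
            best = candidate
    return best


def serialize(canonical):
    """Format step (toy): write the labeled structure as one ASCII string."""
    atoms, edges = canonical
    atom_part = ".".join(atoms)
    bond_part = ",".join(f"{i + 1}-{j + 1}" for i, j in edges)
    return f"{atom_part}/{bond_part}"


def toy_identifier(atoms, bonds):
    """Run the three toy steps in order."""
    return serialize(canonicalize(*normalize(atoms, bonds)))


# Heavy atoms of ethanol, drawn with two different atom numberings.
drawing_a = (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
drawing_b = (["O", "C", "C"], [(0, 1, 1), (1, 2, 1)])

print(toy_identifier(*drawing_a))
print(toy_identifier(*drawing_b))
```

Both calls print the same string, because the numbering chosen by whoever drew the structure no longer matters after canonicalization. Dropping bond orders in `normalize` mirrors the "connectivity only" idea in spirit and does not reproduce InChI's actual normalization rules.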
14
Normalize
Simplify
  • Divide structure into ‘layers’
    • Each layer ‘refines’ structure

  • Ignore ‘Electron Density’
    • Use simple ‘connectivity’ only
    • Ignore bond type and electron location


  • Stereochemistry
    • sp2 and sp3 only
    • Free rotation around single bonds
    • No Z/E stereo for small rings (default)
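
The layering shows up in the identifier string itself, where layers are separated by `/`. The sketch below splits an identifier into named layers by plain string handling. The function name `split_layers` and the short descriptions in `LAYER_NAMES` are illustrative assumptions, and the prefix list is not claimed to be complete. The example is the ethanol identifier as written by current software, with `1S` as the version field; verify it against the output of the InChI program before relying on it.

```python
# Single-letter prefixes that introduce a layer (illustrative, not exhaustive).
LAYER_NAMES = {
    "c": "connections",
    "h": "hydrogens",
    "q": "charge",
    "p": "protons",
    "b": "double-bond stereo",
    "t": "tetrahedral stereo",
    "m": "stereo inversion flag",
    "s": "stereo type",
    "i": "isotopes",
}


def split_layers(inchi):
    """Split an identifier string into (layer name, layer text) pairs."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    version, *segments = inchi[len(prefix):].split("/")
    layers = [("version", version)]
    for segment in segments:
        first = segment[:1]
        if first in LAYER_NAMES:
            # Known prefix: report the layer name and the text after the prefix.
            layers.append((LAYER_NAMES[first], segment[1:]))
        elif first.isupper() or first.isdigit():
            # No lowercase prefix: treat the segment as the chemical formula.
            layers.append(("formula", segment))
        else:
            # Anything this sketch does not recognize is passed through.
            layers.append(("other", segment))
    return layers


ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
for name, text in split_layers(ethanol):
    print(f"{name}: {text}")
```

Each later layer adds detail to the ones before it, so a reader or a program can stop at the layer that matches what is actually known about the compound.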
15
 
16
 
17
 
18
InChI Capabilities
  • Identify compounds at the known level of detail
  • Convention-free (mostly)
  • Generate quickly from structure
  • Contains all essential connectivity information
  • Simple ASCII representation
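
One way to read "identify compounds at the known level of detail" is that two identifiers can be compared after dropping the layers that are unknown or irrelevant. The sketch below drops stereo layers by plain string handling. `strip_layers` and `STEREO_PREFIXES` are hypothetical names. The two example strings are intended to be the identifiers of the two enantiomers of alanine and should be verified with the InChI software. Real identifiers that carry isotopic or fixed-hydrogen sublayers would need more careful parsing than this.

```python
# Layer prefixes treated as stereo information in this sketch (an assumption).
STEREO_PREFIXES = ("b", "t", "m", "s")


def strip_layers(inchi, prefixes=STEREO_PREFIXES):
    """Return the identifier with every layer starting with `prefixes` removed."""
    head, *segments = inchi.split("/")
    kept = [segment for segment in segments if not segment.startswith(prefixes)]
    return "/".join([head, *kept])


# Two strings that differ only in the /m layer (believed to be the alanine enantiomers).
form_a = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
form_b = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"

print(form_a == form_b)                              # full level of detail
print(strip_layers(form_a) == strip_layers(form_b))  # connectivity level only
```

The first comparison treats the two strings as different compounds. The second treats them as the same, which is the behavior wanted when the stereochemistry of a sample is not known.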
19
 
20
 
21
 
22
 
23
 
24
Current InChI Project Status
  • Description:
    Version 1.0 of the Identifier expresses chemical structures in a standard machine-readable format, in terms of atomic connectivity, tautomeric state, isotopes, stereochemistry, and electronic charge. It deals with neutral and ionic well-defined, covalently-bonded organic molecules, and also with inorganic, organometallic and coordination compounds.
  • We propose to promote actively the use of the algorithm and its associated implementations to developers of commercial chemical software, database compilers and publishers of chemical information, in order to enable sharing of molecular information throughout the worldwide community of chemical scientists.
  • We propose also to extend the applicability of the Identifier to polymeric structures, and to explore the need for and the practicality of an extension to cover Markush structures.
  • In addition, we will evaluate the need for inclusion of information on other attributes such as phases and excited states, and take steps to include such information if appropriate.
25
InChI References/Publications
  • 1. Sophie Rovner, C&E News, "CHEMICAL 'NAMING' METHOD UNVEILED", August 22, 2005, Volume 83, Number 34, pp. 39-40
  • 2. International chemical identifier goes online, Chem. World, 16 May 2005
  • 3. M.D. Prasanna, J. Vondrasek, A. Wlodawer and T.N. Bhat, Application of InChI to Curate, Index, and Query 3-D Structures, Proteins: Structure, Function, and Bioinformatics, 2005, 60, 1-4
  • 4. Enhancement of the chemical semantic web through the use of InChI identifiers, S.J. Coles, N.E. Day, P. Murray-Rust, H.S. Rzepa and Y. Zhang, Org. Biomol. Chem., 2005, 3(10), 1832-1834
  • 5. InChI FAQ, by Nick Day (Unilever Centre for Molecular Informatics, Cambridge University)
  • 6. Representation and Use of Chemistry in the Global Electronic Age, P. Murray-Rust, H.S. Rzepa, S.M. Tyrrell and Y. Zhang, Org. Biomol. Chem., 2004, 3192-3203 [www.ch.ic.ac.uk/rzepa/obc/]
  • 7. That INChI feeling, Reactive Reports, issue 40, Sep 2004
  • 8. Unique labels for compounds, Chem. & Eng. News, 2 Dec 2002
  • 9. Chemists synthesize a single naming system, Nature, 23 May 2002
  • 10. That IChI feeling ... The Alchemist, 24 Apr 2002
  • 11. What's in a Name? The Alchemist, 21 Mar 2002
  • 12. Stephen E. Stein, Stephen R. Heller, and Dmitrii Tchekhovskoi, An Open Standard for Chemical Structure Representation: The IUPAC Chemical Identifier, Proceedings of the 2003 International Chemical Information Conference (Nimes), Infonortics, pp. 131-143.



26
Early InChI Adopters
  • NIST – 150,000 structures
  • PubChem project – 5.2+ million structures
  • ISI – 2+ million structures
  • IBM – 1.6+ million structures
  • NCI Database – 23+ million structures
  • EPA – DSSTox database – 1450 structures
  • KEGG database – 9584 structures
  • UCSF ZINC – 3.3 million structures



27
 
28
 
29
 
30
 
31
 
32
Acknowledgements
  • Steve Bachrach, Steve Bryant, Denise Creech, Nick Day, Rene Deplanque, Guenter Grethe, Stevan Hanard, Sami Kassab, Gary Mallard, Randy Marcinko, Alan McNaught, Bill Milne, Carmen Nitsche, Chris Reed, Rich Roberts, Peter Murray-Rust, Henry Rzepa, Steve Stein, Peter Shepherd, Bill Town, Andrea Twiss-Brooks, Wendy Warr, and Ann Wolpert