Open Source/Open Access and the IUPAC International Chemical Identifier (InChI)
Stephen Heller, Stephen Stein, & Dmitrii Tchekhovskoi
Physical & Chemical Properties Division
NIST
Gaithersburg, MD
srheller@nist.gov
Internet Sources of Open Access Information
Peter Suber - SPARC
http://www.arl.org/sparc/soa/index.html
Harnad list http://www.cogsci.soton.ac.uk/~harnad/Hypermail/Amsci/index.html
Slide 3
Scribes in the 15th century were not happy with Johann Gutenberg.

Publishers in the 21st century are not happy with Tim Berners-Lee.

Organizations that fail to recognize and confront technological and market changes often lose their positions, if not their existence. History is replete with such examples. In the 18th century, power looms replaced the handloom weavers. In the early 20th century, the horse-and-buggy industry gave way to automobiles. In the late 20th century, the airplane replaced the train and the boat for long-distance travel. Now, at the start of the 21st century, the Internet is threatening the way in which the scientific publishing industry, more than three centuries old, and the libraries that subscribe to scholarly publications have done business for many decades.
Slide 5
System Problems
Costs are high
No cost for manuscript submission. Under ANY economic model, the high volume of manuscripts submitted via the Internet will drown the system.
Lack of leadership at research institutions in demanding changes to researchers' publication behavior.
Difficulty of instituting change
Nucleic Acids Research
Impact Factor -- 6.575
Overview of NAR’s Open Access model for 2005
From 1st January 2005, all articles published in NAR will be made freely available online immediately upon publication. This means that it will no longer be necessary to hold a subscription in order to read NAR online – content published in the journal will be easily accessible to everyone.
Our decision to implement an Open Access model for 2005 is based in part on a large-scale survey of NAR authors and reviewers. Between March and April 2004, over 1000 members of the journal’s community responded to our survey, with the majority supporting a move to full Open Access partially funded by author publication charges. We have also discussed possible models with representatives of the librarian community, who have expressed support for our experimentation with Open Access.
http://www3.oup.co.uk/nar/special/14/default.html
Slide 11
Discussions are underway to use InChI as the chemical identifier in the Beilstein Journal of Organic Chemistry (BJOC)
Digital ‘Naming’ of Chemicals
Chemical structure is the true ‘identifier’
But, structure representations are not unique or convenient for computers.
So, convert structure to a unique ‘name’ by fixed algorithms
The IUPAC International Chemical Identifier (InChI)
Two Problems
Chemicals
Fast isomerization (tautomerization)
Ill-defined connectivity
Chemists
Differing conventions
Depends on discipline, education and convenience
Imprecision/uncertainty
3 Steps to InChI
Chemistry
‘Normalize’ Input Structure
Implement chemical rules
Math
‘Canonicalize’ (label the atoms)
Equivalent atoms get the same label
Format
‘Serialize’ Labeled Structure
Output as character string (‘name’)
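The three steps above can be sketched on a toy scale. The code below is only an illustration under stated assumptions: it is not the InChI algorithm, the function names and the `TOY/` output format are hypothetical, hydrogens are left out, and the canonical labelling is a brute-force search over all atom orderings that is practical only for very small molecules.

```python
from itertools import permutations

def normalize(atoms, bonds):
    """Chemistry step (toy): keep element symbols and plain connectivity only.
    Any bond order given as a third tuple item is dropped."""
    edges = {frozenset((a, b)) for a, b, *_ in bonds}
    return list(atoms), edges

def canonicalize(atoms, edges):
    """Math step (toy): try every atom ordering and keep the smallest
    (elements, edge list) pair, so the result does not depend on the
    numbering used in the input."""
    best = None
    for perm in permutations(range(len(atoms))):
        # perm[new_index] == old_index
        new_of_old = {old: new for new, old in enumerate(perm)}
        elements = tuple(atoms[old] for old in perm)
        relabelled = tuple(sorted(
            tuple(sorted(new_of_old[i] for i in edge)) for edge in edges))
        candidate = (elements, relabelled)
        if best is None or candidate < best:
            best = candidate
    return best

def serialize(canonical):
    """Format step (toy): write the labelled structure as one ASCII string."""
    elements, edges = canonical
    bond_part = ",".join(f"{i + 1}-{j + 1}" for i, j in edges)
    return f"TOY/{''.join(elements)}/c{bond_part}"

def toy_identifier(atoms, bonds):
    return serialize(canonicalize(*normalize(atoms, bonds)))

# Ethanol heavy atoms, numbered two different ways, and dimethyl ether.
ethanol_a = toy_identifier(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
ethanol_b = toy_identifier(["O", "C", "C"], [(0, 1, 1), (1, 2, 1)])
ether = toy_identifier(["C", "O", "C"], [(0, 1, 1), (1, 2, 1)])
print(ethanol_a, ethanol_b, ether)
```

Both numberings of ethanol produce the same string, while the isomer dimethyl ether produces a different one, which is the property a structure-derived identifier needs.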
Normalize
Simplify
Divide structure into ‘layers’
Each layer ‘refines’ structure
Ignore ‘Electron Density’
Use simple ‘connectivity’ only
Ignore bond type and electron location
Stereochemistry
sp2 and sp3 only
Free rotation around single bonds
No Z/E stereo for small rings (default)
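The layered design can be seen by splitting an identifier string at its `/` separators. The sketch below is a minimal illustration, not part of the InChI software: the function name `split_layers` is hypothetical, the prefix table is an assumption that should be checked against the official documentation, and the example string is written in version 1 syntax for ethanol. It also ignores the fact that layers following an isotopic or fixed-hydrogen marker belong to that section.

```python
# Assumed meanings of the one-letter layer prefixes; verify against the
# official InChI documentation before relying on them.
LAYER_NAMES = {
    "c": "connectivity",
    "h": "hydrogens",
    "q": "charge",
    "p": "protons",
    "b": "double-bond stereo",
    "t": "tetrahedral stereo",
    "m": "stereo inversion flag",
    "s": "stereo type",
    "i": "isotopes",
}

def split_layers(inchi):
    """Return (layer name, layer text) pairs in the order they appear."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    parts = inchi[len("InChI="):].split("/")
    if len(parts) < 2:
        raise ValueError("expected at least a version and a formula")
    version, formula, *rest = parts
    layers = [("version", version), ("formula", formula)]
    for segment in rest:
        # The first character names the layer; unknown prefixes are kept as "other".
        layers.append((LAYER_NAMES.get(segment[:1], "other"), segment[1:]))
    return layers

for name, text in split_layers("InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3"):
    print(f"{name:>12}: {text}")
```

Reading the pairs from top to bottom shows how each successive layer adds detail to the one before it.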
InChI Capabilities
Identify compounds at the known level of detail
Convention-free (mostly)
Generate quickly from structure
Contains all essential connectivity information
Simple ASCII representation
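Because the identifier is a plain ASCII string built from layers, two records can be compared at whatever level of detail is actually known. The sketch below assumes that stereochemical layers start with the prefixes `b`, `t`, `m` and `s`; the helper names are hypothetical, and the alanine strings are written in the layer syntax for illustration only and should be regenerated with the official software before being relied on.

```python
STEREO_PREFIXES = ("b", "t", "m", "s")  # assumed stereo layer prefixes

def without_stereo(inchi):
    """Drop the stereo layers, keeping version, formula and all other layers."""
    head, formula, *layers = inchi.split("/")
    kept = [layer for layer in layers if layer[:1] not in STEREO_PREFIXES]
    return "/".join([head, formula, *kept])

def same_compound(a, b, ignore_stereo=False):
    """Compare two identifiers, optionally only down to the connectivity level."""
    if ignore_stereo:
        a, b = without_stereo(a), without_stereo(b)
    return a == b

base = "InChI=1/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)"
l_alanine = base + "/t2-/m0/s1"   # illustrative string with stereo layers
d_alanine = base + "/t2-/m1/s1"   # illustrative string, opposite configuration
unspecified = base                # stereochemistry not known

print(same_compound(l_alanine, d_alanine))                        # False
print(same_compound(l_alanine, d_alanine, ignore_stereo=True))    # True
print(same_compound(l_alanine, unspecified, ignore_stereo=True))  # True
```

A database that stores only the connectivity can therefore still be matched against records that carry full stereochemical detail.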
Current InChI Project -1
Chemical Nomenclature and Structure Representation Division (VIII)
Number: 2004-039-1-800
Title: IUPAC International Chemical Identifier (InChI): promotion and extension
Task Group
Chairman: Alan McNaught
Members: Stephen R. Heller, Jaroslav Kahovec, Stephen Stein, Dmitrii Tchekhovskoi, and Andrey Yerin
Objective:
Following the launch of InChI version 1.0:
to promote its use throughout the chemical information community
to extend its applicability to include polymeric structures
to explore the need for other extensions, including the ability to handle Markush structures, and to include information on other attributes such as phases and excited states
Current InChI Project -2
Description:
Version 1.0 of the Identifier expresses chemical structures in a standard machine-readable format, in terms of atomic connectivity, tautomeric state, isotopes, stereochemistry, and electronic charge. It deals with neutral and ionic well-defined, covalently-bonded organic molecules, and also with inorganic, organometallic and coordination compounds.
We propose to promote actively the use of the algorithm and its associated implementations to developers of commercial chemical software, database compilers and publishers of chemical information, in order to enable sharing of molecular information throughout the worldwide community of chemical scientists.
We propose also to extend the applicability of the Identifier to polymeric structures, and to explore the need for and the practicality of an extension to cover Markush structures.
In addition, we will evaluate the need for inclusion of information on other attributes such as phases and excited states, and take steps to include such information if appropriate.
Current InChI Project -3
Progress:
Version 1 of IUPAC's International Chemical Identifier (InChI) was released in April 2005; software, documentation, source code and licensing conditions are available from the IUPAC website at www.iupac.org/inchi
An InChI FAQ presented by Nick Day (Unilever Centre for Molecular Informatics, Cambridge University) is available from http://wwmm.ch.cam.ac.uk/inchifaq/
May 2005 update
To enable development of InChI facilities and applications in an Open Source context, a project to encompass this work has been registered with SourceForge.net (see http://sourceforge.net/projects/inchi); people wishing to participate should contact the project administrator (mcnaughta@rsc.org) or the IUPAC Secretariat (secretariat@iupac.org). To receive and discuss proposals for InChI enhancements, an internet listserver has also been established; people wishing to participate in these discussions should contact Alan McNaught (mcnaughta@rsc.org).
InChI References/Publications
1. International chemical identifier goes online, Chem. World, 16 May 2005
2. M.D. Prasanna, J. Vondrasek, A. Wlodawer and T.N. Bhat, Application of InChI to Curate, Index, and Query 3-D Structures, Proteins: Structure, Function, and Bioinformatics, 2005, 60, 1-4
3. S.J. Coles, N.E. Day, P. Murray-Rust, H.S. Rzepa and Y. Zhang, Enhancement of the chemical semantic web through the use of InChI identifiers, Org. Biomol. Chem., 2005, 3(10), 1832-1834
4. Nick Day (Unilever Centre for Molecular Informatics, Cambridge University), InChI FAQ
5. P. Murray-Rust, H.S. Rzepa, S.M. Tyrrell and Y. Zhang, Representation and Use of Chemistry in the Global Electronic Age, Org. Biomol. Chem., 2004, 3192-3203 [www.ch.ic.ac.uk/rzepa/obc/]
6. That INChI feeling, Reactive Reports, issue 40, Sep 2004
7. Unique labels for compounds, Chem. & Eng. News, 2 Dec 2002
8. Chemists synthesize a single naming system, Nature, 23 May 2002
9. That IChI feeling ..., The Alchemist, 24 Apr 2002
10. What's in a Name?, The Alchemist, 21 Mar 2002
11. Stephen E. Stein, Stephen R. Heller, and Dmitrii Tchekhovskoi, An Open Standard for Chemical Structure Representation: The IUPAC Chemical Identifier, in Proceedings of the 2003 International Chemical Information Conference (Nimes), Infonortics, pp. 131-143.
Early InChI Adopters
NIST – 150,000 structures
NIH/NCBI/PubChem project – 800,000+ structures
ISI – 2+ million structures
NCI Database – 23 million+ structures
EPA DSSTox database – 1450 structures
KEGG database – 9584 structures
UCSF ZINC – 3.3 million structures
Future
Future versions of InChI, for example, could include phase information and crystal structure, conformations, electronic states and additional classes of stereochemistry.
First additional project: Investigate adding polymers to InChI
Acknowledgements
Steve Bachrach, Mila Becker, Pieter Bolman, Bob Bovenschulte, Steve Bryant, Rene Deplanque, Guenter Grethe, Stevan Harnad, Sami Kassab, Gary Mallard, Randy Marcinko, Alan McNaught, Bill Milne, Carmen Nitsche, Chris Reed, Rich Roberts, Peter Murray-Rust, Henry Rzepa, Steve Stein, Peter Shepherd, Bill Town, Andrea Twiss-Brooks, and Ann Wolpert