|
|
|
Stephen Heller, Stephen Stein, & Dmitrii
Tchekhovskoi |
|
Physical & Chemical Properties Division |
|
NIST |
|
Gaithersburg, MD |
|
srheller@nist.gov |
|
|
|
|
Peter Suber - SPARC |
|
http://www.arl.org/sparc/soa/index.html |
|
|
|
Harnad list http://www.cogsci.soton.ac.uk/~harnad/Hypermail/Amsci/index.html |
|
Costs are high |
|
No cost for manuscript submission. Under ANY
economic model, the high volume of submissions made possible by Internet
submission will drown any system. |
|
Lack of leadership at research institutions to
demand changes in researchers' publication behavior. |
|
Difficulty of instituting change |
|
|
|
|
|
|
Overview of NAR’s Open Access model for 2005 |
|
From 1st January 2005, all articles published in
NAR will be made freely available online immediately upon publication. This
means that it will no longer be necessary to hold a subscription in order
to read NAR online – content published in the journal will be easily
accessible to everyone. |
|
|
|
Our decision to implement an Open Access model
for 2005 is based in part on a large-scale survey of NAR authors and
reviewers. Between March and April 2004, over 1000 members of the journal’s
community responded to our survey, with the majority supporting a move to
full Open Access partially funded by author publication charges. We have
also discussed possible models with representatives of the librarian
community, who have expressed support for our experimentation with Open
Access. |
|
|
|
http://www3.oup.co.uk/nar/special/14/default.html |
|
Chemical structure is the true ‘identifier’ |
|
But, structure representations are not unique or
convenient for computers. |
|
So, convert structure to a unique ‘name’ by
fixed algorithms |
|
The IUPAC International Chemical Identifier
(InChI) |
|
|
|
|
|
|
Chemicals |
|
Fast isomerization (tautomerization) |
|
Ill-defined connectivity |
|
Chemists |
|
Differing conventions |
|
Depends on discipline, education and convenience |
|
Imprecision/uncertainty |
|
|
|
|
|
|
Chemistry |
|
‘Normalize’ Input Structure |
|
Implement chemical rules |
|
|
|
Math |
|
‘Canonicalize’ (label the atoms) |
|
Equivalent atoms get the same label |
|
|
|
Format |
|
‘Serialize’ Labeled Structure |
|
Output as character string (‘name’) |
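
The three steps above (normalize, canonicalize, serialize) can be illustrated with a toy sketch. This is not the InChI algorithm: it assumes a structure is given as a list of element symbols plus a list of bonds, drops everything except connectivity, and finds a canonical labeling by brute force over all atom orderings, which is only practical for a handful of atoms. The function names and the output format are hypothetical and chosen for illustration only.

```python
from itertools import permutations

def normalize(atoms, bonds):
    """Chemistry step (toy): keep element symbols and plain connectivity only."""
    # Each bond is (i, j) or (i, j, order); the bond order is discarded.
    edges = {frozenset((a, b)) for a, b, *_ in bonds}
    return list(atoms), edges

def canonicalize(atoms, edges):
    """Math step (toy): try every atom ordering, keep the smallest description."""
    best = None
    for order in permutations(range(len(atoms))):
        # order[k] is the original index of the atom that receives new label k
        new_label = {old: new for new, old in enumerate(order)}
        elements = tuple(atoms[old] for old in order)
        relabeled = tuple(sorted(
            tuple(sorted((new_label[a], new_label[b]))) for a, b in edges
        ))
        candidate = (elements, relabeled)
        if best is None or candidate < best:
            best = candidate
    return best

def serialize(canonical):
    """Format step (toy): write the labeled structure as one character string."""
    elements, bonds = canonical
    return "".join(elements) + "/" + ",".join(f"{a + 1}-{b + 1}" for a, b in bonds)

def toy_identifier(atoms, bonds):
    return serialize(canonicalize(*normalize(atoms, bonds)))

# Ethanol heavy atoms, drawn with two different atom numberings
ethanol_a = toy_identifier(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
ethanol_b = toy_identifier(["O", "C", "C"], [(1, 0, 1), (2, 1, 1)])
# Dimethyl ether has the same atoms but different connectivity
ether = toy_identifier(["C", "O", "C"], [(0, 1, 1), (1, 2, 1)])
print(ethanol_a, ethanol_b, ether)
```

Both numberings of ethanol give the same string, while dimethyl ether gives a different one. A production algorithm replaces the brute-force search with a much more efficient canonical labeling procedure and applies real chemical rules in the normalization step.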
|
|
|
|
|
Divide structure into ‘layers’ |
|
Each layer ‘refines’ structure |
|
|
|
Ignore ‘Electron Density’ |
|
Use simple ‘connectivity’ only |
|
Ignore bond type and electron location |
|
|
|
Stereochemistry |
|
sp2 and sp3 only |
|
Free rotation around single bonds |
|
No Z/E stereo for small rings (default) |
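
As a rough illustration of the layered design, the sketch below splits an identifier string into its layers. It assumes the slash-separated form with single-letter layer prefixes (for example `c` for connectivity, `h` for hydrogens, `t` for sp3 stereo); the helper names `split_layers` and `keep_layers` are hypothetical, and the ethanol string is written in the version 1 style as an assumed example.

```python
# Single-letter layer prefixes and a short description of each
LAYER_NAMES = {
    "c": "connectivity",
    "h": "hydrogens",
    "q": "charge",
    "p": "protons",
    "b": "double-bond (sp2) stereo",
    "t": "tetrahedral (sp3) stereo",
    "m": "stereo parity inversion",
    "s": "stereo type",
    "i": "isotopes",
    "f": "fixed-H",
}

def split_layers(inchi):
    """Return (layer name, content) pairs; assumes a formula layer is present."""
    prefix, _, body = inchi.partition("=")
    if prefix != "InChI" or not body:
        raise ValueError("not an InChI string")
    parts = body.split("/")
    layers = [("version", parts[0])]
    if len(parts) > 1:
        layers.append(("formula", parts[1]))
    for part in parts[2:]:
        if part:
            layers.append((LAYER_NAMES.get(part[0], "unknown"), part[1:]))
    return layers

def keep_layers(inchi, wanted):
    """Drop every prefixed layer not in `wanted`, giving a coarser identifier."""
    head, *rest = inchi.split("/")
    kept = rest[:1] + [p for p in rest[1:] if p and p[0] in wanted]
    return "/".join([head] + kept)

ethanol = "InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3"
for name, content in split_layers(ethanol):
    print(f"{name:12s} {content}")
print(keep_layers(ethanol, {"c"}))
```

Because each layer only refines the ones before it, dropping later layers yields a valid but less specific description, which is one way to compare compounds at the known level of detail.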
|
|
|
|
|
|
|
|
Identify compounds at the known level of detail |
|
Convention-free (mostly) |
|
Generate quickly from structure |
|
Contains all essential connectivity information |
|
Simple ASCII representation |
|
Chemical Nomenclature and Structure
Representation Division (VIII) |
|
Number: 2004-039-1-800 |
|
Title: IUPAC International Chemical Identifier
(InChI): promotion and extension |
|
Task Group
Chairman: Alan McNaught |
|
Members: Stephen R. Heller, Jaroslav Kahovec, Stephen
Stein, Dmitrii Tchekhovskoi, and Andrey Yerin |
|
Objective:
Following the launch of InChI version 1.0: |
|
to promote its use throughout the chemical
information community |
|
to extend its applicability to include polymeric
structures |
|
to explore the need for other extensions,
including the ability to handle Markush structures, and to include
information on other attributes such as phases and excited states |
|
|
|
|
|
|
Description:
Version 1.0 of the Identifier expresses chemical structures in a standard
machine-readable format, in terms of atomic connectivity, tautomeric state,
isotopes, stereochemistry, and electronic charge. It deals with neutral and
ionic well-defined, covalently-bonded organic molecules, and also with
inorganic, organometallic and coordination compounds. |
|
We propose to promote actively the use of the
algorithm and its associated implementations to developers of commercial
chemical software, database compilers and publishers of chemical
information, in order to enable sharing of molecular information throughout
the worldwide community of chemical scientists. |
|
We propose also to extend the applicability of
the Identifier to polymeric structures, and to explore the need for and the
practicality of an extension to cover Markush structures. |
|
In addition, we will evaluate the need for
inclusion of information on other attributes such as phases and excited
states, and take steps to include such information if appropriate. |
|
|
|
|
Progress:
Version 1 of IUPAC's
International Chemical Identifier (InChI) was released in April 2005;
software, documentation, source code and licensing conditions are available
from the IUPAC website at www.iupac.org/inchi |
|
An InChI FAQ presented by Nick Day (Unilever
Centre for Molecular Informatics, Cambridge University) is available from http://wwmm.ch.cam.ac.uk/inchifaq/ |
|
May 2005 update
To enable development of InChI facilities and applications in an Open
Source context, a project to encompass this work has been registered with
SourceForge.net (see http://sourceforge.net/projects/inchi); people wishing
to participate should contact the project administrator (mcnaughta@rsc.org)
or the IUPAC Secretariat (secretariat@iupac.org). To receive and discuss
proposals for InChI enhancements, an internet listserver has also been
established; people wishing to participate in these discussions should
contact Alan McNaught (mcnaughta@rsc.org). |
|
|
|
|
|
|
1. International chemical identifier goes
online, Chem. World, 16 May 2005 |
|
|
|
2. M.D. Prasanna, J. Vondrasek, A. Wlodawer and
T.N. Bhat, Application of InChI to Curate, Index, and Query 3-D Structures,
Proteins: Structure, Function, and Bioinformatics, 2005, 60, 1-4 |
|
|
|
3. Enhancement of the chemical semantic web
through the use of InChI identifiers, S.J. Coles, N.E. Day, P. Murray-Rust,
H.S. Rzepa and Y. Zhang, Org. Biomol. Chem., 2005, 3(10), 1832-1834 |
|
|
|
4. InChI FAQ, by Nick Day (Unilever Centre for
Molecular Informatics, Cambridge University) |
|
|
|
5. Representation and Use of Chemistry in the
Global Electronic Age, P. Murray-Rust, H.S. Rzepa, S.M. Tyrrell and Y.
Zhang, Org. Biomol. Chem., 2004, 3192-3203 [www.ch.ic.ac.uk/rzepa/obc/] |
|
|
|
6. That INChI feeling, Reactive Reports, issue
40, Sep 2004 |
|
|
|
7. Unique labels for compounds, Chem. & Eng.
News, 2 Dec 2002 |
|
|
|
8. Chemists synthesize a single naming system, Nature,
23 May 2002 |
|
|
|
9. That IChI feeling ... The Alchemist, 24 Apr
2002 |
|
|
|
10. What's in a Name? The Alchemist, 21 Mar 2002 |
|
11. Stephen E. Stein, Stephen R. Heller,
and Dmitrii Tchekhovskoi, An Open Standard for Chemical Structure
Representation: The IUPAC Chemical Identifier, in Proceedings of the 2003
International Chemical Information Conference (Nimes), Infonortics, pp.
131-143. |
|
|
|
|
|
|
NIST – 150,000 structures |
|
NIH/NCBI/PubChem project – 800,000+ structures |
|
ISI – 2+ million structures |
|
NCI Database – 23 million+ structures |
|
EPA DSSTox database – 1450 structures |
|
KEGG database – 9584 structures |
|
UCSF ZINC – 3.3 million structures |
|
Future versions of InChI, for example, could
include phase information and crystal structure, conformations, electronic
states and additional classes of stereochemistry. |
|
|
|
First additional project: Investigate adding
polymers to InChI |
|
|
|
|
|
|
Steve Bachrach, Mila Becker, Pieter Bolman, Bob
Bovenschulte, Steve Bryant, Rene Deplanque, Guenter Grethe, Stevan Harnad,
Sami Kassab, Gary Mallard, Randy Marcinko, Alan McNaught, Bill Milne,
Carmen Nitsche, Chris Reed, Rich Roberts, Peter Murray-Rust, Henry Rzepa,
Steve Stein, |
|
Peter Shepherd, Bill Town, Andrea Twiss-Brooks,
and Ann Wolpert |
|
|
|
|
|