|
|
|
Stephen Heller*, Stephen Stein, & Dmitrii
Tchekhovskoi |
|
Physical & Chemical Properties Division |
|
NIST |
|
Gaithersburg, MD 20899 |
|
|
|
*affiliation for InChI project work – This is not a
NIST presentation |
|
steve@hellers.com |
|
|
|
|
|
|
|
|
|
|
The opinions presented on these slides are those
of the slides and not necessarily those of the speaker. |
|
No animals were harmed in the preparation of
this talk; however a few WWW sites were hit. This talk conforms to PETA
& NIH treatment of human subjects guidelines. |
|
These slides were made from 100% recycled
electrons. |
|
There are no George W. Bush jokes in this
presentation. |
|
This will be a well balanced presentation. I
have a chip on both shoulders. |
|
|
|
|
Key Factors: |
|
|
|
1. Internet |
|
2. Internet |
|
3. Internet |
|
4. Internet |
|
5. Internet |
|
|
|
|
Overview of the new world |
|
Open Data |
|
Open Access |
|
Open Source |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Structural
Changes in communications and interactions |
|
|
|
Lack of Allegiances |
|
|
|
|
|
|
|
|
|
|
|
|
Membership in professional societies is
declining because people interact and communicate differently – using the
Internet. The trend is most obvious among younger scientists. |
|
While AAAS has lost perhaps 20% of its
membership in the past few years, the ACS drop, while smaller, follows the same
trend. |
|
|
|
Year              1998       2004 |
|
Full Paid Rate    106,463    103,227 |
|
|
|
|
|
|
|
|
|
|
Information is not free. Someone needs to pay.
Databases are never “finished”. Someone needs to update, add to, and/or
correct databases. Software is never “finished”. Someone needs to correct
errors, add new features, and maintain the
software as new hardware and computer networks become available. |
|
|
|
The Internet has changed the way databases are
created, as now virtually everything comes in computer-readable form. This
lowers the cost of obtaining raw data/information. But the data
still require good science and manual, labor-intensive effort to be
curated or checked. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Hindsight is Easy |
|
Creative vs. Custodial |
|
Vision vs. Blindness |
|
Constructive vs. Destructive |
|
|
|
|
|
|
Momentum – People are conservative and change
slowly. Even an airplane at 30,000
feet does not fall straight into the sea… |
|
|
|
|
|
|
|
|
|
|
Open Data - NIH/Roadmap PubChem Project |
|
|
|
Accepting
PubChem policies and plans and working with NIH to improve public
health and fight deadly diseases (25): |
|
ASINEX, BIND, BioCyc, CMLD-BU, ChemBank,
ChemBlock, ChemBridge, ChemDB, ChemExper Chemical Dictionary, ChemIDplus,
DPISMR, DTP/NCI, KEGG,
LipidMAPS, MICAD, MMDB, MOLI, NCGC, NIAID, NIST, NIST Chemistry WebBook,
NMRShiftDB, Nature Chemical Biology,
UPCMLD, ZINC |
|
|
|
|
|
Not accepting
PubChem policies or plans and not yet working with NIH (1): |
|
ACS |
|
|
|
Users: PubChem: 10,000 IP addresses/day; CAS: 1,000
subscriber organizations |
|
|
|
|
|
|
|
|
1. Costs are high |
|
|
|
2. No cost for manuscript submission. Under ANY
economic model, the high volume of submissions made possible by
the Internet will drown any system. |
|
|
|
3. Lack of leadership at research institutions
to demand changes in researchers’ publication behavior. |
|
|
|
4. Difficulty of instituting change |
|
|
|
|
|
|
Peter Suber - SPARC |
|
http://www.arl.org/sparc/soa/index.html |
|
|
|
Harnad list http://www.cogsci.soton.ac.uk/~harnad/Hypermail/Amsci/index.html |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
An Open Access Publication is one that
meets the following two conditions: |
|
1. The author(s) and copyright holder(s) grant(s) to all users a free,
irrevocable, worldwide, perpetual right of access to, and a license to
copy, use, distribute, transmit and display the work publicly and to make
and distribute derivative works, in any digital medium for any responsible
purpose, subject to proper attribution of authorship, as well as the right
to make small numbers of printed copies for their personal use. |
|
2. A complete version of the work and all supplemental materials, including a
copy of the permission as stated above, in a suitable standard electronic
format is deposited immediately upon initial publication in at least one
online repository that is supported by an academic institution, scholarly
society, government agency, or other well-established organization that
seeks to enable open access, unrestricted distribution, interoperability,
and long-term archiving (for the biomedical sciences, PubMed Central is
such a repository). |
|
From PLOS web site |
|
|
|
|
|
|
|
|
|
|
The scholarly community needs organizations to
accept, review, disseminate, and
archive manuscripts |
|
|
|
Only institutions have infinite lifetimes;
humans don’t (i.e., self-archiving is nice, but too finite for civilization
to benefit) |
|
|
|
|
Costs are high |
|
No cost for manuscript submission. Under ANY
economic model, the high volume of submissions made possible by
the Internet will drown any system. |
|
Lack of leadership at research institutions to
demand changes in researchers’ publication behavior. |
|
Difficulty of instituting change |
|
|
|
|
|
|
Researchers |
|
Publishers |
|
Libraries |
|
|
|
Stevan Harnad |
|
|
|
|
Peer Review |
|
Archiving |
|
Economics |
|
|
|
|
|
|
The worst system, except for all the others. |
|
|
|
|
|
With apologies to Winston Churchill |
|
|
|
|
Peer review is nice, but |
|
|
|
Reproducibility is what counts in science. |
|
|
|
Peer review has nothing to do with OA. |
|
|
|
|
Peer review is about to collapse under the
weight of too many short papers (LPUs, least publishable units), too many
poor science papers, and too many poorly written manuscripts – all of which
are too easily submitted via the Internet. |
|
|
|
|
|
|
|
|
|
Vatican Library – 4th century |
|
Bibliothèque Nationale de France – 1367 |
|
National Library of Sweden – 1568 |
|
Harvard University – 1638 |
|
German State Library in Berlin – 1661 |
|
National Library of Spain – 1711 |
|
British Library – 1753 |
|
US Library of Congress – 1800 |
|
ACS Electronic Journals – 1996 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Provide global, universal free access to
information |
|
Resolve the serial & budget crises at
libraries |
|
|
|
Accelerate scientific progress and research |
|
|
|
Enhance research productivity |
|
|
|
Improve Quality Assurance |
|
|
|
Grow hair on bald spots - ? |
|
|
|
|
|
|
|
|
|
|
Scholarly journals: the only item in the USA
whose cost is rising faster than health care. |
|
|
|
Ann Wolpert, MIT |
|
|
|
|
The publishers’ financial models have
changed due to the Internet. They have replaced purchases, copyright, and
fair use with leases and contracts. |
|
|
|
|
If I had one dollar for every Stevan Harnad
e-mail about OA, I could fund OA. |
|
|
|
(~4,400 messages from 2/98 to 3/05) |
|
|
|
|
Cost of publishing an OA article is US $100 -
$15,000 |
|
|
|
|
|
(All financial numbers have been audited and
approved by Arthur Andersen, Inc.) |
|
|
|
|
Publishing Costs – Subscription model: |
|
|
|
Editorial Staff |
|
Sales |
|
Marketing |
|
Legal – Contracts, Copyright |
|
IT/Computer Systems |
|
|
|
|
|
|
Derk Haank – Info. World Rev., December 2004,
page 18 |
|
|
|
“The people calling for “free access for all”
should realise that the relevant audience already has free access – through
the libraries and research institutions. I am sure that today everyone can
access the results they need for their work. |
|
… |
|
The new Springer Open Choice model, in which
case they pay a fee of $3,000 (1,620 UK pounds). In return the paper is
freely accessible to anyone interested via the online service SpringerLink
and can be read and downloaded free of charge.” |
|
|
|
(Are the latter, who pay $3,000, the “irrelevant
audience”?) |
|
|
|
|
Whine about increasing prices |
|
Reduce journal and book purchases |
|
Attempted to educate researchers about pricing
issues |
|
Provided journals in electronic form to
researchers’ desktop |
|
Provided electronic document delivery |
|
|
|
|
|
|
|
|
Keep the cash flowing in: |
|
Raise prices |
|
Replace copyright with contracts/licenses |
|
Object to any changes (e.g., Open Access) |
|
Suggest various doomsday scenarios to any change from the outside |
|
Provide content in electronic form |
|
Provide archives in electronic form |
|
|
|
|
|
|
Business as usual – publish wherever they want |
|
No change in where or how (e.g., use features of
electronic media to enhance manuscript) they publish |
|
|
|
|
|
|
|
|
What are the library infrastructure costs? |
|
Purchasing, licensing agreements (staff size
including lawyers), inventory, budgeting for journal/book reductions,
document delivery, interlibrary loans, etc. |
|
What costs disappear with OA? |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Provosts at universities and colleges will
mandate that researchers post their publications either on their institution’s
web site or on one or more public sites – libraries, universities, etc. |
|
|
|
|
Winners: |
|
Industry Libraries – they pay much less |
|
Researchers – they have access to everything at
no additional cost |
|
Public - they have access to everything at no
cost |
|
Libraries (they become relevant again) |
|
|
|
|
|
Losers: |
|
Most publishers |
|
|
|
|
|
|
Between researchers putting their results on the
web and Google/Yahoo/Microsoft developing ways to search text and chemical
structures, all non-copyright, non-proprietary information will be readily
available. Who knows, Google might even buy all of Elsevier’s back-file
content one day. |
|
|
|
|
In The Hitchhiker’s Guide to the Galaxy,
the Earth was vaporized to make way for a new intergalactic highway
that was needed. Destroying Earth was, to quote Defense Secretary
Rumsfeld, “collateral damage”. Perhaps we will be saying that of publishers
one day soon as well. |
|
|
|
|
There is a problem with too many manuscripts:
how to publish them (peer review) and how to make them available. OA may be
smoke and mirrors, but where there is smoke there usually is fire.
And someone will put out the fire. I am betting on Google or some
version of it to do so. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Chemical structure is the true ‘identifier’ |
|
But, structure representations are not unique or
convenient for computers. |
|
So, convert structure to a unique ‘name’ by
fixed algorithms |
|
The IUPAC International Chemical Identifier
(InChI) |
|
|
|
|
|
|
|
|
Chemicals |
|
Fast isomerization (tautomerization) |
|
Ill-defined connectivity |
|
Chemists |
|
Differing conventions |
|
Depends on discipline, education and convenience |
|
Imprecision/uncertainty |
|
|
|
|
|
|
Chemistry |
|
‘Normalize’ Input Structure |
|
Implement chemical rules |
|
|
|
Math |
|
‘Canonicalize’ (label the atoms) |
|
Equivalent atoms get the same label |
|
|
|
Format |
|
‘Serialize’ Labeled Structure |
|
Output as character string (‘name’) |
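
The three steps above can be illustrated with a toy sketch. This is not the InChI algorithm: it assumes a molecule is given as a list of element symbols plus a list of bonds, keeps connectivity only, and finds a canonical atom order by brute force over all orderings, which is only practical for a handful of atoms. It also gives every atom its own label rather than sharing labels among equivalent atoms. All function names here are hypothetical.

```python
from itertools import permutations


def normalize(atoms, bonds):
    """Chemistry step (toy): keep connectivity only, dropping bond orders,
    self-bonds, and duplicate bonds."""
    edges = {frozenset((a, b)) for a, b, *_ in bonds if a != b}
    return list(atoms), edges


def canonicalize(atoms, edges):
    """Math step (toy): try every atom ordering and keep the one whose
    (elements, connection table) description sorts smallest."""
    best = None
    for order in permutations(range(len(atoms))):
        # order[k] is the original index of the atom that gets label k
        label = {old: new for new, old in enumerate(order)}
        elems = tuple(atoms[old] for old in order)
        conn = tuple(sorted(tuple(sorted(label[i] for i in e)) for e in edges))
        candidate = (elems, conn)
        if best is None or candidate < best:
            best = candidate
    return best


def serialize(canonical):
    """Format step (toy): write the labeled structure as one ASCII string."""
    elems, conn = canonical
    return "".join(elems) + "/" + ",".join(f"{i + 1}-{j + 1}" for i, j in conn)


def toy_identifier(atoms, bonds):
    return serialize(canonicalize(*normalize(atoms, bonds)))


# Ethanol heavy atoms, drawn with two different atom numberings
drawing_a = toy_identifier(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
drawing_b = toy_identifier(["O", "C", "C"], [(0, 1, 1), (1, 2, 1)])
# Dimethyl ether: same atoms, different connectivity
ether = toy_identifier(["C", "O", "C"], [(0, 1, 1), (1, 2, 1)])
print(drawing_a, drawing_b, ether)
```

The point of the sketch is only that the same structure entered in two different ways yields the same string, while a different structure yields a different one.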
|
|
|
|
|
Divide structure into ‘layers’ |
|
Each layer ‘refines’ structure |
|
|
|
Ignore ‘Electron Density’ |
|
Use simple ‘connectivity’ only |
|
Ignore bond type and electron location |
|
|
|
Stereochemistry |
|
sp2 and sp3 only |
|
Free rotation around single bonds |
|
No Z/E stereo for small rings (default) |
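
As a hedged illustration of the layered design: an InChI string is a sequence of segments separated by "/", and most segments start with a one-letter prefix that names the layer. The sketch below splits such a string into its layers. The example string is the standard InChI for ethanol, which uses the later "1S" prefix rather than the version 1.0 format discussed in this talk; the function name `split_layers` and the `LAYER_NAMES` table are hypothetical and cover only a few common prefixes.

```python
# Assumed mapping of layer prefixes to readable names (illustration only)
LAYER_NAMES = {
    "c": "connectivity",
    "h": "hydrogens",
    "q": "charge",
    "p": "protons",
    "b": "double-bond stereo",
    "t": "tetrahedral stereo",
    "i": "isotopes",
}


def split_layers(inchi):
    """Split an InChI string into a dict of layer name -> layer content."""
    prefix, _, body = inchi.partition("=")
    parts = body.split("/")
    if prefix != "InChI" or len(parts) < 2:
        raise ValueError("not an InChI string")
    layers = {"version": parts[0], "formula": parts[1]}
    for segment in parts[2:]:
        if not segment:
            continue  # skip empty segments
        # Unknown prefixes are kept under their own letter
        name = LAYER_NAMES.get(segment[0], segment[0])
        layers[name] = segment[1:]
    return layers


ethanol = split_layers("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
for name, content in ethanol.items():
    print(f"{name:>12}: {content}")
```

Each later layer refines the one before it: two records that agree on formula and connectivity but differ in a stereo layer describe the same skeleton at different levels of detail.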
|
|
|
|
|
|
|
|
Identify compounds at the known level of detail |
|
Convention-free (mostly) |
|
Generate quickly from structure |
|
Contains all essential connectivity information |
|
Simple ASCII representation |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Chemical Nomenclature and Structure
Representation Division (VIII) |
|
Number: 2004-039-1-800 |
|
Title: IUPAC International Chemical Identifier
(InChI): promotion and extension |
|
Task Group
Chairman: Alan McNaught |
|
Members: Stephen R. Heller, Jaroslav Kahovec, Stephen
Stein, Dmitrii Tchekhovskoi, and Andrey Yerin |
|
Objective:
Following the launch of InChI version 1.0: |
|
to promote its use throughout the chemical
information community |
|
to extend its applicability to include polymeric
structures |
|
to explore the need for other extensions,
including the ability to handle Markush structures, and to include
information on other attributes such as phases and excited states |
|
|
|
|
|
|
Description:
Version 1.0 of the Identifier expresses chemical structures in a standard
machine-readable format, in terms of atomic connectivity, tautomeric state,
isotopes, stereochemistry, and electronic charge. It deals with neutral and
ionic well-defined, covalently-bonded organic molecules, and also with
inorganic, organometallic and coordination compounds. |
|
We propose to promote actively the use of the
algorithm and its associated implementations to developers of commercial
chemical software, database compilers and publishers of chemical
information, in order to enable sharing of molecular information throughout
the worldwide community of chemical scientists. |
|
We propose also to extend the applicability of
the Identifier to polymeric structures, and to explore the need for and the
practicality of an extension to cover Markush structures. |
|
In addition, we will evaluate the need for
inclusion of information on other attributes such as phases and excited
states, and take steps to include such information if appropriate. |
|
|
|
|
Progress:
Version 1 of IUPAC's
International Chemical Identifier (InChI) was released in April 2005;
software, documentation, source code and licensing conditions are available
from the IUPAC website at www.iupac.org/inchi |
|
An InChI FAQ presented by Nick Day (Unilever
Centre for Molecular Informatics, Cambridge University) is available from http://wwmm.ch.cam.ac.uk/inchifaq/ |
|
November 2005 update
To enable development of InChI facilities and applications in an Open
Source context, a project to encompass this work has been registered with
SourceForge.net (see http://sourceforge.net/projects/inchi); people wishing
to participate should contact the project administrator (mcnaughta@rsc.org)
or the IUPAC Secretariat (secretariat@iupac.org). To receive and discuss
proposals for InChI enhancements, an internet listserver has also been
established; people wishing to participate in these discussions should
contact Alan McNaught (mcnaughta@rsc.org). |
|
|
|
|
|
|
1. Sophie Rovner, Chemical 'Naming' Method Unveiled,
Chem. & Eng. News, August 22, 2005,
Volume 83, Number 34, pp. 39-40 |
|
|
|
2. International chemical identifier goes
online, Chem. World, 16 May 2005 |
|
|
|
3. M.D. Prasanna, J. Vondrasek, A. Wlodawer and
T.N. Bhat, Application of InChI to Curate, Index, and Query 3-D Structures,
Proteins: Structure, Function, and Bioinformatics, 2005, 60, 1-4 |
|
|
|
4. Enhancement of the chemical semantic web
through the use of InChI identifiers, S.J. Coles, N.E. Day, P. Murray-Rust,
H.S. Rzepa and Y. Zhang, Org. Biomol. Chem., 2005, 3(10), 1832-1834 |
|
|
|
5. InChI FAQ, by Nick Day (Unilever Centre for
Molecular Informatics, Cambridge University) |
|
|
|
6. Representation and Use of Chemistry in the
Global Electronic Age, P. Murray-Rust, H.S. Rzepa, S.M. Tyrrell and Y.
Zhang, Org. Biomol. Chem., 2004, 3192-3203 [www.ch.ic.ac.uk/rzepa/obc/] |
|
|
|
7. That INChI feeling, Reactive Reports, issue
40, Sep 2004 |
|
|
|
8. Unique labels for compounds, Chem. & Eng.
News, 2 Dec 2002 |
|
|
|
9. Chemists synthesize a single naming system, Nature,
23 May 2002 |
|
|
|
10. That IChI feeling ... The Alchemist, 24 Apr
2002 |
|
|
|
11. What's in a Name? The Alchemist, 21 Mar 2002 |
|
12. Stephen E. Stein, Stephen R. Heller,
and Dmitrii Tchekhovskoi, An Open Standard for Chemical Structure
Representation: The IUPAC Chemical Identifier, Proceedings
of the 2003 International Chemical Information Conference (Nimes),
Infonortics, pp. 131-143. |
|
|
|
|
|
|
NIST – 150,000 |
|
NIH/NCBI/PubChem project – 5,200,000+ |
|
IBM – 1.6 million |
|
ISI – 2+ million structures |
|
NCI Database – 23 million+ |
|
EPA – DSSTox database – 1450 |
|
KEGG database – 9584 |
|
UCSF ZINC – 3.3 million |
|
|
|
|
|
|
|
|
|
|
|
|
Future versions of InChI, for example, could
include phase information and crystal structure, conformations, electronic
states and additional classes of stereochemistry. |
|
|
|
First additional project: Investigate adding
polymers to InChI |
|
|
|
|
|
|
Steve Bachrach, Steve Bryant, Denise Creech,
Nick Day, Rene Deplanque, Guenter Grethe, Stevan Harnad, Martin Hicks, Sami
Kassab, Gary Mallard, Randy Marcinko, Alan McNaught, Bill Milne, Carmen
Nitsche, Chris Reed, Rich Roberts, Peter Murray-Rust, Henry Rzepa, Steve
Stein, Peter Shepherd, Bill Town, Andrea Twiss-Brooks, Wendy Warr, and Ann
Wolpert |
|
|
|
|
|