|
|
|
Stephen Heller*, Stephen Stein, & Dmitrii
Tchekhovskoi |
|
Physical & Chemical Properties Division |
|
NIST |
|
Gaithersburg, MD 20899 |
|
|
|
*affiliation for InChI project work – This is not a
NIST presentation |
|
steve@hellers.com |
|
|
|
|
|
|
|
|
|
The opinions presented on these slides are those
of the slides and not necessarily those of the speaker. |
|
No animals were harmed in the preparation of
this talk; however, a few WWW sites were hit. This talk conforms to PETA
& NIH treatment of human subjects guidelines. |
|
These slides were made from 100% recycled
electrons. |
|
There are no George W. Bush jokes in this
presentation. |
|
This will be a well balanced presentation. I
have a chip on both shoulders. |
|
|
|
|
Key Factors: |
|
|
|
1. Internet |
|
2. Internet |
|
3. Internet |
|
4. Internet |
|
5. Internet |
|
|
|
|
|
Overview of the new world |
|
Open Access |
|
Open Data |
|
Open Source |
|
|
|
|
|
|
|
|
Structural
Changes in communications and interactions |
|
|
|
Lack of Allegiances |
|
|
|
|
|
|
|
|
|
|
|
|
Membership in professional societies is
declining because people interact and communicate differently – using the
Internet. The trend is most obvious among younger scientists. |
|
While AAAS has lost perhaps 20% of its
membership in the past few years, the ACS drop, while smaller, follows the same
trend. |
|
|
|
Full Paid Rate: 106,463 (1998), 103,227 (2004) |
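To put the two Full Paid Rate figures above (106,463 in 1998 and 103,227 in 2004) in proportion, the relative decline can be worked out directly. This is only a back-of-the-envelope sketch; the variable names are illustrative and the calculation assumes the two numbers are directly comparable counts.

```python
# Relative change between the two Full Paid Rate figures on the slide.
# Assumption: both numbers count the same membership category.
paid_1998 = 106_463
paid_2004 = 103_227

absolute_drop = paid_1998 - paid_2004
percent_drop = absolute_drop / paid_1998 * 100

print(f"Drop: {absolute_drop:,} members ({percent_drop:.2f}%)")
```

On these figures the drop is roughly 3%, well below the "perhaps 20%" quoted for AAAS.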
|
|
|
|
|
|
|
Information is not free. Someone needs to pay.
Databases are never “finished”. Someone needs to update, add to, and/or
correct databases. Software is
never “finished”. Someone needs to correct errors, add new features, and maintain the
software as new hardware and computer networks become available. |
|
|
|
The Internet has changed the way databases are
created, as now virtually everything comes in computer-readable form. This lowers the cost of obtaining
raw data/information. But the data
still require manual, labor-intensive effort to be curated and checked. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Hindsight is Easy |
|
|
|
Creative vs. Custodial |
|
Vision vs. Blindness |
|
Constructive vs. Destructive |
|
|
|
|
|
|
Momentum – People are conservative and change
slowly. Even an airplane at 30,000
feet does not fall straight into the sea… |
|
|
|
|
|
|
|
|
Open Data - NIH/Roadmap PubChem Project |
|
|
|
Accepting
PubChem policies and plans and working with NIH to improve public
health and fight deadly diseases: |
|
ASINEX, BIND, BioCyc, CMLD-BU, ChemBank,
ChemBridge, ChemExper Chemical Dictionary, ChemIDplus, DPISMR, DTP/NCI,
Elsevier, KEGG, MICAD, MMDB, MOLI, NCGC, NIAID, NIST, NIST Chemistry
WebBook, NMRShiftDB, Nature Chemical Biology |
|
|
|
|
|
Not accepting
PubChem policies or plans and not yet working with NIH: |
|
ACS |
|
|
|
|
|
|
|
|
|
|
|
1. Costs are high due to legacy expenses. |
|
|
|
2. No cost for manuscript submission. Under ANY
economic model the high volume of submissions generated by the submission
via the Internet will drown any system. |
|
|
|
3. Lack of leadership at research institutions
to demand changes in researchers' publication behavior. |
|
|
|
4. Difficulty of instituting change |
|
|
|
|
|
|
Peter Suber - SPARC |
|
http://www.arl.org/sparc/soa/index.html |
|
|
|
Harnad list http://www.cogsci.soton.ac.uk/~harnad/Hypermail/Amsci/index.html |
|
|
|
|
|
|
|
|
|
|
An Open Access Publication is one that
meets the following two conditions:
1. The author(s) and copyright holder(s) grant(s) to all users a free,
irrevocable, worldwide, perpetual right of access to, and a license to
copy, use, distribute, transmit and display the work publicly and to make
and distribute derivative works, in any digital medium for any responsible
purpose, subject to proper attribution of authorship, as well as the right
to make small numbers of printed copies for their personal use.
2. A complete version of the work and all supplemental materials, including a
copy of the permission as stated above, in a suitable standard electronic
format is deposited immediately upon initial publication in at least one
online repository that is supported by an academic institution, scholarly
society, government agency, or other well-established organization that
seeks to enable open access, unrestricted distribution, interoperability,
and long-term archiving (for the biomedical sciences, PubMed Central is
such a repository).
Source: PLOS web site |
|
|
|
|
|
|
|
The scholarly community needs organizations to
accept, review, disseminate, and
archive manuscripts |
|
|
|
Only institutions have infinite lifetimes;
humans don’t (i.e., self-archiving is nice, but too finite for civilization
to benefit) |
|
|
|
|
|
|
Researchers |
|
Publishers |
|
Libraries |
|
|
|
Stevan Harnad |
|
|
|
|
Peer Review |
|
Archiving |
|
Economics |
|
|
|
|
|
|
The worst system, except for all the others. |
|
|
|
|
|
With apologies to Winston Churchill |
|
|
|
|
Peer review is nice, but |
|
|
|
Reproducibility is what counts in science. |
|
|
|
Peer review has nothing to do with OA. |
|
|
|
|
Peer review is about to collapse under the
weight of too many short papers (LPUs, least publishable units), too many
poor-science papers, and too many poorly written manuscripts – all of which
are too easily submitted via the Internet. |
|
|
|
|
|
|
|
For example: |
|
“Two-membered unsaturated rings” |
|
Part 1 – Synthesis |
|
Part 2 – Nitrogen derivatives |
|
Part 3 – Sulfur derivatives |
|
Authors: |
|
G. Marx, H. Marx, & Z. Marx, Freedonia
Academy of Sciences |
|
|
|
|
|
Vatican Library – 4th century |
|
Bibliothèque Nationale de France – 1367 |
|
National Library of Sweden – 1568 |
|
Harvard University – 1638 |
|
German State Library in Berlin – 1661 |
|
National Library of Spain – 1711 |
|
British Library – 1753 |
|
US Library of Congress – 1800 |
|
ACS Electronic Journals – 1996 |
|
|
|
|
|
|
|
|
|
|
|
|
Provide global, universal free access to
information |
|
Resolve the serials & budget crises at
libraries |
|
|
|
Accelerate scientific progress and research |
|
|
|
Enhance research productivity |
|
|
|
Improve Quality Assurance |
|
|
|
Grow hair on bald spots - ? |
|
|
|
|
|
|
|
|
|
|
The financial models from the publishers have
changed due to the Internet. They have replaced purchases, copyright, and
fair use with leases and contracts. |
|
|
|
|
The cost of publishing an OA article ranges from US $100 to
$15,000 |
|
|
|
|
|
(All financial numbers have been audited and
approved by Arthur Andersen, Inc.) |
|
|
|
|
Publishing Costs – Subscription model: |
|
|
|
Editorial Staff |
|
Sales |
|
Marketing |
|
Legal – Contracts, Copyright |
|
IT/Computer Systems |
|
|
|
|
|
|
Whined about increasing prices |
|
Reduced journal and book purchases |
|
Attempted to educate researchers about pricing
issues |
|
Provided journals in electronic form to
researchers’ desktop |
|
Provided electronic document delivery |
|
|
|
|
|
|
|
|
Keep the cash flowing in: |
|
Raise prices |
|
Replace copyright with contracts/licenses |
|
Object to any changes (e.g., Open Access) |
|
Suggest various doomsday scenarios to any change from the outside |
|
Provide content in electronic form |
|
Provide archives in electronic form |
|
|
|
|
|
|
Business as usual – publish wherever they want |
|
No change in where or how they publish (e.g., using features of
electronic media to enhance manuscripts) |
|
|
|
|
|
|
|
|
What are the library infrastructure costs? |
|
Purchasing, licensing agreements (staff size
including lawyers), inventory, budgeting for journal/book reductions,
document delivery, interlibrary loans, etc. |
|
What costs disappear with OA? |
|
|
|
|
|
|
|
|
|
Provosts at universities and colleges will
mandate that researchers post their publications either on their institution's
web site or on one or more public sites – libraries, universities, etc. |
|
|
|
|
|
Between researchers putting their results on the
web and Google/Yahoo/Microsoft developing ways to search text and chemical
structures, all non-copyright, non-proprietary information will be readily
available. Who knows, Google might even buy all of Elsevier’s back-file
content one day. |
|
|
|
|
In The Hitchhiker's Guide to the Galaxy,
the Earth was vaporized to make way for a new intergalactic highway
which was needed. Destroying Earth was, to quote Defense Secretary
Rumsfeld, “collateral damage”. Perhaps we will be saying that of publishers
one day soon as well. |
|
|
|
|
There is a problem with too many manuscripts:
how to publish them (peer review) and how to make them available. OA may be smoke and mirrors, but where
there is smoke there usually is fire.
And someone will put out the fire. I am betting on Google or some
version of it to do so. |
|
|
|
|
|
|
|
|
|
|
|
|
Chemical structure is the true ‘identifier’ |
|
But, structure representations are not unique or
convenient for computers. |
|
So, convert structure to a unique ‘name’ by
fixed algorithms |
|
The IUPAC International Chemical Identifier
(InChI) |
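One practical consequence of a structure-derived 'name' is that it can serve as a join key between collections that otherwise use unrelated local IDs. The sketch below is illustrative only: the database names, local IDs, and the `merge_by_identifier` helper are hypothetical, and the identifier strings are written in present-day standard InChI form for water, methane, and ethanol.

```python
from collections import defaultdict

def merge_by_identifier(*sources):
    """Group local record IDs from several sources under a shared identifier.

    Each source is a (source_name, records) pair, where records is a list
    of (inchi_string, local_id) tuples. All names here are hypothetical.
    """
    merged = defaultdict(dict)
    for source_name, records in sources:
        for inchi, local_id in records:
            merged[inchi][source_name] = local_id
    return dict(merged)

# Two made-up collections with unrelated local numbering schemes.
db_a = ("db_a", [
    ("InChI=1S/H2O/h1H2", "A-001"),   # water
    ("InChI=1S/CH4/h1H4", "A-002"),   # methane
])
db_b = ("db_b", [
    ("InChI=1S/CH4/h1H4", "B-17"),    # methane again, different local ID
    ("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3", "B-42"),  # ethanol
])

merged = merge_by_identifier(db_a, db_b)
for inchi, ids in merged.items():
    print(inchi, ids)
```

Because the key is computed from the structure itself, neither collection has to adopt the other's numbering for the methane records to line up.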
|
|
|
|
|
|
|
Chemicals |
|
Fast isomerization (tautomerization) |
|
Ill-defined connectivity |
|
Chemists |
|
Differing conventions |
|
Depends on discipline, education and convenience |
|
Imprecision/uncertainty |
|
|
|
|
|
|
Chemistry |
|
‘Normalize’ Input Structure |
|
Implement chemical rules |
|
|
|
Math |
|
‘Canonicalize’ (label the atoms) |
|
Equivalent atoms get the same label |
|
|
|
Format |
|
‘Serialize’ Labeled Structure |
|
Output as character string (‘name’) |
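The canonicalize and serialize steps can be illustrated with a deliberately naive toy. This is not the InChI algorithm: it skips the chemical normalization step entirely, tries every possible atom numbering (so it is only usable for a handful of atoms), and emits a made-up string format. The function name `canonical_name` and the output layout are assumptions for illustration. What it does show is the core idea: the same connectivity, entered with atoms in any order, yields the same string.

```python
from itertools import permutations

def canonical_name(elements, bonds):
    """Return an input-order-independent string for a tiny molecular graph.

    elements: list of element symbols, one per (heavy) atom
    bonds:    list of (i, j) index pairs; connectivity only, no bond type
    Brute force over all numberings, so only suitable for a few atoms.
    """
    n = len(elements)
    best = None
    for perm in permutations(range(n)):
        # perm[old_index] is the new label assigned to that atom.
        atoms = [None] * n
        for old, new in enumerate(perm):
            atoms[new] = elements[old]
        edges = sorted(tuple(sorted((perm[i], perm[j]))) for i, j in bonds)
        candidate = (tuple(atoms), tuple(edges))
        # 'Canonicalize': keep the lexicographically smallest labeling.
        if best is None or candidate < best:
            best = candidate
    atoms, edges = best
    # 'Serialize': write the labeled structure as a character string.
    bond_part = ",".join(f"{a + 1}-{b + 1}" for a, b in edges)
    return "".join(atoms) + "/c" + bond_part

# Ethanol heavy atoms drawn in two different atom orders.
ethanol_1 = canonical_name(["C", "C", "O"], [(0, 1), (1, 2)])
ethanol_2 = canonical_name(["O", "C", "C"], [(0, 1), (1, 2)])
# Dimethyl ether has the same atoms but different connectivity.
ether = canonical_name(["C", "O", "C"], [(0, 1), (1, 2)])

print(ethanol_1, ethanol_2, ether)
```

The two ethanol inputs produce one string and the ether produces another, which is the property an identifier needs. A production algorithm replaces the brute-force search with an efficient labeling procedure.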
|
|
|
|
|
Divide structure into ‘layers’ |
|
Each layer ‘refines’ structure |
|
|
|
Ignore ‘Electron Density’ |
|
Use simple ‘connectivity’ only |
|
Ignore bond type and electron location |
|
|
|
Stereochemistry |
|
sp2 and sp3 only |
|
Free rotation around single bonds |
|
No Z/E stereo for small rings (default) |
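The layered design is visible in the identifier string itself: layers are separated by `/`, and each layer after the formula begins with a one-letter prefix (for example `c` for connectivity and `h` for hydrogens). The following is a minimal sketch of splitting a simple identifier into its layers. The helper name `split_layers` is hypothetical, and the sketch does not interpret layer contents or handle the special meaning of prefixes that can repeat in more complex identifiers.

```python
def split_layers(inchi):
    """Split an InChI string into (prefix, content) pairs, in order.

    Simplified sketch: the version and formula segments get descriptive
    labels; every other segment is labeled by its one-letter prefix.
    """
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    version, *rest = inchi[len("InChI="):].split("/")
    layers = [("version", version)]
    # The formula segment, when present, does not start with a lowercase prefix.
    if rest and rest[0] and not rest[0][0].islower():
        layers.append(("formula", rest.pop(0)))
    for segment in rest:
        layers.append((segment[:1], segment[1:]))
    return layers

# Ethanol, written in present-day standard InChI form.
layers = split_layers("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
for prefix, content in layers:
    print(prefix, content)
```

Truncating the list after the connectivity layer gives a coarser description of the same compound, which is the sense in which each layer refines the structure.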
|
|
|
|
|
|
|
Identify compounds at the known level of detail |
|
Convention-free (mostly) |
|
Generate quickly from structure |
|
Contains all essential connectivity information |
|
Simple ASCII representation |
|
|
|
|
|
|
|
|
1. Sophie Rovner, C&E News, “CHEMICAL
'NAMING' METHOD UNVEILED”, August 22, 2005,
Volume 83, Number 34, pp. 39-40 |
|
|
|
2. International chemical identifier goes
online, Chem. World, 16 May 2005 |
|
|
|
3. M.D. Prasanna, J. Vondrasek, A. Wlodawer and
T.N. Bhat, Application of InChI to Curate, Index, and Query 3-D Structures,
Proteins: Structure, Function, and Bioinformatics, 2005, 60, 1-4 |
|
|
|
4. Enhancement of the chemical semantic web
through the use of InChI identifiers, S.J. Coles, N.E. Day, P. Murray-Rust,
H.S. Rzepa and Y. Zhang, Org. Biomol. Chem., 2005, 3(10), 1832-1834 |
|
|
|
5. InChI FAQ, by Nick Day (Unilever Centre for
Molecular Informatics, Cambridge University) |
|
|
|
6. Representation and Use of Chemistry in the
Global Electronic Age, P. Murray-Rust, H.S. Rzepa, S.M. Tyrrell and Y.
Zhang, Org. Biomol. Chem., 2004, 3192-3203 [www.ch.ic.ac.uk/rzepa/obc/] |
|
|
|
7. That INChI feeling, Reactive Reports, issue
40, Sep 2004 |
|
|
|
8. Unique labels for compounds, Chem. & Eng.
News, 2 Dec 2002 |
|
|
|
9. Chemists synthesize a single naming system, Nature,
23 May 2002 |
|
|
|
10. That IChI feeling ... The Alchemist, 24 Apr
2002 |
|
|
|
11. What's in a Name? The Alchemist, 21 Mar 2002 |
|
12. Stephen E. Stein, Stephen R. Heller,
and Dmitrii Tchekhovskoi, An Open Standard for Chemical Structure
Representation: The IUPAC Chemical Identifier, Proceedings
of the 2003 International Chemical Information Conference (Nimes),
Infonortics, pp. 131-143. |
|
|
|
|
|
|
NIST – 150,000 |
|
NIH/NCBI/PubChem project – 5.2 million+ |
|
IBM – 1.6+ million |
|
ISI – 2+ million |
|
NCI Database – 23 million+ |
|
EPA – DSSTox database – 1450 |
|
KEGG database – 9584 |
|
UCSF ZINC – 3.3 million |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Future versions of InChI, for example, could
include phase information and crystal structure, conformations, electronic
states and additional classes of stereochemistry. |
|
|
|
First additional project: Investigate adding
polymers to InChI |
|
|
|
|
|
|
I really think my friends would prefer if I left
their names off this slide. |
|
|
|
|
Steve Bachrach, Steve Bryant, Denise Creech,
Nick Day, Rene Deplanque, Guenter Grethe, Stevan Harnad, Sami Kassab, Gary
Mallard, Randy Marcinko, Alan McNaught, Bill Milne, Carmen Nitsche, Chris
Reed, Rich Roberts, Peter Murray-Rust, Henry Rzepa, Steve Stein, Peter
Shepherd, Bill Town, Andrea Twiss-Brooks, Wendy Warr, and Ann Wolpert |
|
|
|
|
|