1
30 Years of Evolution & Revolution of Chemical Information Resources
  • Stephen R. Heller
  • Consultant
  • Silver Spring, MD 20902
  • steve@hellers.com
2
The slides from this and related presentations can be found at:

http://www.hellers.com/steve/pub-talks/prous1008
3
Content/Information

Where do you get it and what do you get?
How is it disseminated?
How can you get it and how do you use it?

Technology is changing all of this!
4
10 Critical Factors Affecting Chemical Information collection, processing, and dissemination in the 21st Century

1. Internet - WWW
2. Internet - WWW
3. Internet - WWW
4. Internet - WWW
5. Internet - WWW
6. Internet - WWW
7. Internet - WWW
8. Internet - WWW
9. Internet - WWW
10. Internet - WWW
5



To everything there is a season,
a time for every purpose under the sun.

 Ecclesiastes 3:1
6
Evolution
  •    From the 1970’s to 2006 there has been an evolution of scientific information from paper to electronic form, coupled with a revolution in computer and network communication capabilities (i.e., the Internet) which is transforming the way information is collected, processed,  disseminated, and used.
7
Evolution
  • Web 1.0 - We have evolved from everything on paper, which needed to be centrally organized and distributed from a central source to …


  • Web 2.0 - Currently uncontrolled chaos and a revolution with data and information being dumped into systems around the world. → Web 2.0 is an attitude, not a technology
8
Scribes in the 15th century were not happy with Johann Gutenberg.

Publishers in the 21st century are not happy with Tim Berners-Lee/Internet.

If horses could vote we would never have cars.
9
1970’s
  • Printed Abstracts from CAS, UK
  • Few databases/compilations
  • All on paper – Simple text; few diagrams
  • A handful of computers worldwide
  • Chemical Information was supported by a thriving chemical industry
10
1970’s
  •    The chemist would read the CAS or ISI Current Contents sections appropriate to their research needs.  Then he/she would either send a postcard for a reprint or go to the library to read the full journal article of interest.  In the library this meant a request for an interlibrary loan to obtain the article.
11
2008
  • Everything is electronic
  • Databases are common in chemistry and biology
  • Everyone has a PC and Internet access
  • Data and databases are commonplace and VERY large
  • Databases have gone from primarily text to value-added indexing, coding, structures, and linking (e.g. PubChem)
  • The chemical industry has been overtaken by biology/biochemistry/biomedicine causing problems for publishers
  • Bioinformatics data is the antithesis of the chemical data franchise
  • Current Awareness has evolved into Continuous Awareness
12
2008
  • The chemist logs onto CAS/SciFinder®, ISI Web of Science®, Integrity®, ScienceDirect®, Scirus.com®, Chemindustry.com®, PubChem, or Chemweb.com® to search for something of interest. Then he/she clicks on the hyperlink, using LitLink or ChemPort, and, assuming he/she has paid-for access to the journal article, the article appears immediately on the computer screen to read, or to print out and take to the bathroom to read. Now document delivery is easy and fast. More importantly, one learns from the experiences of others: being able to do computer searches of the literature helps a lot and allows one to read more articles of interest.
13
Organizations that fail to recognize and confront technological and market changes often tend to lose their positions, if not their organizations. 

History is replete with such examples. In the 18th century power looms replaced the handloom weavers. In the early 20th century the horse and buggy industry gave way to automobiles.

 In the late 20th century the airplane replaced the train and boat for long distance traveling.  Now, at the start of the 21st century the technology of the Internet is threatening the way in which the 3+ century old scientific publishing industry and libraries which subscribe to scholarly publications have done business for many decades.
14
Growth of Open Access
  • Steve Heller’s Google News Alert Service
  • (A non-scientific study)

  • Started May 2004 – 6 news articles

  • September 2008 – 97 news articles
15
From the 1950’s to the early 1990’s scientists had considerable support staff to type, edit, file, communicate, and so on. But now, with computers and the Internet everyone in this room can:



Edit text better than a secretary
Calculate more accurately than the bookkeeper
File better than the office clerk
Communicate cheaper and faster than the mail room
Draw better than the art department
16
Most Popular Web Sites
  • Yahoo!- free
  • Google - free
  • MySpace – free
  • Amazon
  • Craigslist – free classified ads
  • Wikipedia - free


  • # 22 – NY Times - free
  • # 44 – BBC - free
  • # 5 – Facebook – free university/college social network
  • # 371 – NLM-PubMed & PubChem - free
  • # 6,310 - ACS
  • # 48,927 – CAS
  • # 222,273 – ISI/Web of Knowledge
  • # 240,141 - ChemSpider


  • Source: Alexa.com – October 2008


17

Science Publishing and the Web

Web 2.0, social networking, wikis, mashups, and so on are poised to radically change the ability of scientists to share data and develop ideas both within and between organizations.

“Scientists are eager to apply the awesome power of the Internet revolution to scientific communication, but have been stymied by the conservative nature of scientific publishing,” says PLoS co-founder Michael Eisen



http://www.bio-itworld.com/issues/2006/july-aug/first-base/
18
Journals are a method of destroying information and data on a gigantic scale.

Johnny Gasteiger
19
Web 2.0 changes

1. Diminished stature for many existing institutions

2. Hierarchies are coming undone

3. Gatekeepers are being bypassed

4. Many intermediaries are no longer necessary

Andrew Shapiro
The Control Revolution, Public Affairs, New York, 1999 (ISBN: 1-891620-19-3)
20
The availability of for-free services such as those offered in the
patent field by the EPO and now Google (plus others) is a real threat
to financial viability of many traditional, high-cost information
providers. A small core of faithful users -- who feel they need
advanced features -- may stay with Thomson, CAS, Questel, Dialog, etc.
But this small core may well be too small to support high-cost
services.

Harry Collier, private communication, January 2007




…And this small core of users is aging and retiring, while the new generation has been brought up on Google, Facebook, MySpace, and similar technology and services.
21
Steve-

“I had lunch with a senior manager at Springer. He told me that 95% of their search referrals come from Google and PubMed!   I found this number quite impressive. I wonder why Elsevier spends millions developing their search interface?”

February 2007
22
RSC – Project Prospect
  • From 2/2007, electronic RSC journals will have metadata added to each article: CML (Chemical Markup Language), InChI/InChIKey, and OBO (Open Biomedical Ontologies) terms. This way one can search by chemical structure and by these index terms.


  • Sooner, rather than later, secondary publishers (e.g., CAS) will find their role is no longer needed.
23
InChI/InChIKey
  • A project whose time has come. Without the Internet, InChI would be just another in a series of technically excellent, soon forgotten projects for representing chemical structures. The Internet, an international scientific body (IUPAC), and international cooperation (US, UK, Czech Republic) have led to the speedy development, implementation, and use of InChI/InChIKey.


24
InChI/InChIKey

  • While InChI is a public domain, open source system for creating a unique computer-readable identifier (“name”), it is NOT a registry system. InChIs and InChIKeys are created only by those who choose to adopt and use the algorithm. Registry systems which index the literature are complementary to any InChI/InChIKey databases that anyone creates.


  • What has made InChI/InChIKey so successful and so widely adopted by the community is that its adoption has been uncoerced. Adopters include the US Government, large database producers (NIH, ChemSpider, Beilstein, etc.), publishers (Wiley, Nature, RSC, Prous, etc.), Big Pharma (Pfizer, Novartis, etc.), and chemistry software companies (Symyx, Microsoft, CambridgeSoft, ChemAxon, ACD Labs, etc.).


  • Why is this happening?
25
How does InChI differ from SMILES?

Like InChI, the SMILES language allows a canonical serialization of molecular structure. However, SMILES is proprietary and, unlike InChI, is not an open project. This has led to the use of different generation algorithms, and thus different SMILES versions of the same compound have been found.

In fact, we have found seven different unique SMILES for caffeine on Web sites:

1. [c]1([n+]([CH3])[c]([c]2([c]([n+]1[CH3])[n][cH][n+]2[CH3]))[O-])[O-]
2. CN1C(=O)N(C)C(=O)C(N(C)C=N2)=C12
3. Cn1cnc2n(C)c(=O)n(C)c(=O)c12
4. Cn1cnc2c1c(=O)n(C)c(=O)n2C
5. N1(C)C(=O)N(C)C2=C(C1=O)N(C)C=N2
6. O=C1C2=C(N=CN2C)N(C(=O)N1C)C
7. CN1C=NC2=C1C(=O)N(C)C(=O)N2C
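
A rough way to see that the seven strings above describe the same compound, short of running a cheminformatics toolkit, is to compare heavy-atom counts. The sketch below is a sanity check only: it assumes the strings contain no elements other than C, N, O, and H (true for caffeine, C8H10N4O2), and equal counts are necessary but not sufficient for identical structures. The name heavy_atom_counts is illustrative and not part of any SMILES or InChI library.

```python
from collections import Counter

# The seven caffeine SMILES strings quoted on the slide.
CAFFEINE_SMILES = [
    "[c]1([n+]([CH3])[c]([c]2([c]([n+]1[CH3])[n][cH][n+]2[CH3]))[O-])[O-]",
    "CN1C(=O)N(C)C(=O)C(N(C)C=N2)=C12",
    "Cn1cnc2n(C)c(=O)n(C)c(=O)c12",
    "Cn1cnc2c1c(=O)n(C)c(=O)n2C",
    "N1(C)C(=O)N(C)C2=C(C1=O)N(C)C=N2",
    "O=C1C2=C(N=CN2C)N(C(=O)N1C)C",
    "CN1C=NC2=C1C(=O)N(C)C(=O)N2C",
]


def heavy_atom_counts(smiles):
    """Count C, N and O atoms in a SMILES string.

    Assumption: the string uses only C, N, O and H, so every
    C/N/O letter (upper case = aliphatic, lower case = aromatic)
    is exactly one atom. Hydrogens, charges, bonds and ring
    closure digits are ignored.
    """
    return Counter(ch.upper() for ch in smiles if ch in "CNOcno")


for s in CAFFEINE_SMILES:
    print(dict(heavy_atom_counts(s)), s)
```

All seven strings give 8 carbons, 4 nitrogens, and 2 oxygens even though no two of them match as text, which is the gap a single canonical identifier such as InChI is meant to close.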
26
InChI/InChIKey
  • InChI/InChIKey is being adopted because organizations need interoperability. Organizations need to be able to connect all their internal data and information associated with their chemicals and link both internally and externally. Times are now difficult for Pharma and others who need to be able to correlate and link both internal and external information.


  • There is an insane amount of information available today, both public and commercial/private. People and organizations don't change when they see the light; they change when they feel the heat. InChI is open, free, and universal, as opposed to unique identifiers which are proprietary. Keeping a unique identifier proprietary and (in some cases) available at a high cost is stupidity on stilts. InChI/InChIKey has become the world standard for uniquely representing a defined chemical structure because it meets the needs of today’s chemical community.


27
5 Useful InChI/InChIKey URLs


IUPAC InChI URL:  http://www.iupac.org/inchi

The InChI-L Listserver WebBoard URL:
http://webboard.rsc.org:8080/~INCHI-L

InChI FAQs, created by Nick Day, Cambridge University, UK:
http://wwmm.ch.cam.ac.uk/inchifaq/

IUPAC Prague Group InChI URL:
www.inchi.info

 http://video.google.com/videoplay?docid=-6653695245776470969

 http://www.youtube.com/watch?v=F9XppyZg4E4
28
 
29
 
30
Technical/Economic/Political Features of InChI/InChIKey

1. It works as well as any other system.

2. It is free, open source software.  (Web 2.0)

3. Any organization can use it for internal and/or external structure files at no cost. (Web 2.0)

4. It is sponsored by IUPAC and primarily implemented by the US scientific standards agency – NIST.

5. It provides an alternative to the CAS Registry, and InChIs can be freely searched for via Google/Yahoo/Microsoft.  (Web 2.0)

6. It allows all those chemical information providers who compete with CAS to have a free alternative. (Web 2.0)
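
Features 5 and 6 rest on the identifier being plain text that any search engine can index. A minimal sketch of building such a query follows. The InChIKey shown is the one commonly listed for caffeine in public databases and is used here as an assumed example; web_search_url is a hypothetical helper, not part of the InChI software.

```python
from urllib.parse import quote_plus

# Assumed example: the InChIKey commonly listed for caffeine.
CAFFEINE_INCHIKEY = "RYYVLZVUVIJVGH-UHFFFAOYSA-N"


def web_search_url(identifier, base="https://www.google.com/search?q="):
    """Return a search URL that looks for the identifier as an exact phrase."""
    # Wrapping the identifier in double quotes asks the search engine
    # for an exact match; quote_plus makes the result URL-safe.
    return base + quote_plus('"' + identifier + '"')


print(web_search_url(CAFFEINE_INCHIKEY))
# → https://www.google.com/search?q=%22RYYVLZVUVIJVGH-UHFFFAOYSA-N%22
```

No registry lookup or license is involved: anyone who can compute the identifier can search for it.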
31
The Future

1. People will continue to pay for REAL added value.

2. People will pay for software and analysis tools that are worth the money.

3. Open Access journals will continue to evolve.

4. Open Source resources, such as IUPAC/InChI/InChIKey, will become the predominant structure representation form.

5. E-Notebooks/LIMS will grow and evolve into organization-wide linked information systems.
32
Acknowledgements


Steve Bachrach, Mila Becker, Pieter Bolman, Evan Bolton, Steve Bryant, Harry Collier, Alice Cooper, Rene Deplanque, Guenter Grethe, Stevan Harnad, Sami Kassab, David Lipman, Gary Mallard, Randy Marcinko, Alan McNaught, Bill Milne, Carmen Nitsche, David Prous, Josep Prous, Chris Reed, Rich Roberts, Peter Murray-Rust, Henry Rzepa, Steve Stein, Peter Shepherd, Bill Town, Wendy Warr, Ann Wolpert