1
When will the evolution of chemical information on the Internet turn into a revolution?
  • Stephen R. Heller
  • Silver Spring, MD 20902
  • steve@hellers.com
2
The slides from this and related presentations can be found at:

http://www.hellers.com/steve/pub-talks/
3
Disclaimer

The opinions presented on these slides are those of the slides and not necessarily those of the speaker.
 
These slides were made from 100% recycled electrons.

This will be a well balanced presentation.
I have a chip on both shoulders.
4
Content/Information

Where do you get it and what do you get?

Technology is CHANGING all of this!
5
10 Critical Factors Affecting the CHANGE in Chemical Information collection, processing, and dissemination in the 21st Century

1. Internet - WWW
2. Internet - WWW
3. Internet - WWW
4. Internet - WWW
5. Internet - WWW
6. Internet - WWW
7. Internet - WWW
8. Internet - WWW
9. Internet - WWW
10. Internet - WWW
6
Prediction is very difficult, especially about the future.
-  Niels Bohr



Give them a number and give them a date, but never both
- Edgar Fiedler


If you have to forecast, forecast often
- Anonymous
7
Wherever there is power, there is resistance.
 (Again, no names ---to protect the guilty)

Michel Foucault

The History of Sexuality, Volume 1: An Introduction, New York, Vintage Books, 1978, page 95
8
Evolution
  • From the 1960’s to 2007 there has been an evolution of scientific information from paper to electronic form, coupled with a revolution in computer and network communication capabilities (i.e., the Internet), which is transforming the way information is collected, processed, disseminated, and used.
9
Evolution
  • Web 1.0 - We have evolved from everything on paper, via technology, to electronic form, which needed to be centrally organized and distributed from a central source to …


  • Web 2.0 - Currently uncontrolled chaos and a revolution, with data and information being dumped into systems around the world. Web 2.0 is more an attitude than a technology. Web 2.0 is “leading from below”.
10
More Change 

Scribes in the 15th century were not happy with Johann Gutenberg.

Publishers in the 21st century are not happy with Tim Berners-Lee/Internet

If horses could vote we would never have cars.
11
1960’s
  • Printed Abstracts from CAS, UK, Germany
  • Few databases/compilations
  • All on paper – Simple text; few diagrams
  • A handful of computers worldwide
  • Chemical Information was supported by a thriving, high profit margin chemical and pharmaceutical industry
12
1960’s
  •    The chemist would read the CAS sections appropriate to their research needs.  Then he/she would go to the library to read the full journal article of interest. Often this meant a request for an interlibrary loan to obtain the article.
13
2007
  • Everything is electronic
  • Databases are common in chemistry and biology
  • Everyone has a PC and Internet access
  • Data and databases are commonplace and large
  • Databases have gone from primarily text to value-added indexing, coding, structures, and linking (e.g. PubChem)
  • The chemical industry has been overtaken by biology/biochemistry/biomedicine causing problems for the ACS/CAS
  • Bioinformatics data is the antithesis of the chemical data franchise – virtually all free vs. fee-based
  • Current Awareness has evolved into Continuous Awareness
14
2007
  • The chemist logs onto CAS/SciFinder®, ISI Web of Science®, Integrity®, ScienceDirect®, Scirus.com®, Chemindustry.com®, PubChem, or Chemweb.com® to search for something of interest. Then he/she clicks on the hyperlink, using LitLink or ChemPort, and, assuming he/she has paid-for access to the journal article, the article appears immediately on the computer screen to read or print out and take to the bathroom to read. Now document delivery is easy and fast. More importantly, one learns from the experiences of others: being able to do computer searches of the literature helps a lot and allows one to read more articles of interest.
15
Evolution becomes a revolution when there are a sufficient number of mutations/changes to take over and replace the old “forms” of life.   

When the “tipping or change point” comes is impossible to predict, but it WILL come.
16
When it comes to change, some organizations are so dense, light bends around them.
(My lawyer suggests I not give any example of such an organization run by a lawyer.)


But others see change coming and adopt and adapt, or at least try to.
17
            Individual Change

From the 1950’s to the early 1990’s scientists had considerable support staff to type, edit, file, communicate, and so on. But now, with computers and the Internet, everyone in this room can:


Edit text better than a secretary
Calculate more accurately than the bookkeeper
File better than the office clerk
Communicate cheaper and faster than the mail room
Draw better than the graphics art department
18
Corporate Internet Company Change

AOL – Started charging $19.95 per month for Internet access. It evolved and grew to be the largest online service. But the Internet also evolved and grew and the AOL monthly fee model began to develop problems.  The number of users went from 35+ million to under 20 million, while usage of the free service Google went from zero (0) to tens of millions.

In August 2006 AOL announced: “We’re in the process of offering all of our content and many of our services for free -- with or without an AOL Internet connection.”
19
         Content vs. Distributor
             Revenue Change

1980’s – 70% - Distributor (e.g., Dialog)
               30% - Content Owner

2000’s – 70% - Content Owner
               30% - Distributor


Question – Why?
Answer – Google/Internet
20
                                Reading Habit Change


The circulation of daily U.S. newspapers is 55.2 million, down from 62.3 million in 1990.  The percentages of adults who say they read a paper "yesterday" are ominous:

 65 and older  --  60 percent.

 50-64  --  52 percent.

 30-49  --  39 percent.

 18-29  --  23 percent.



A structural change in the way we get information
4/25/2005
“Unread and Unsubscribing” - George F. Will – US syndicated columnist
21
Most Popular Web Sites
  • Yahoo! - free (#1 of top 10)
  • Google - free
  • MySpace – free social network
  • MSN - free
  • eBay
  • Amazon
  • Craigslist – free classified ads
  • CNN news - free
  • Wikipedia - free


  • # 19 – NY Times - free
  • # 27 – BBC - free
  • # 66 – Facebook – free university/college social network
  • # 290 – NLM/NIH - free
  • # 7,756 - ACS
  • # 41,695 – CAS
  • # 180,328 – ISI/Web of Science


22
Members/Users
  • MySpace – 100 million users/profiles; 2,210,000 users/day
  • eBay – 100 million users; 5,044,00 users/day
  • Facebook – 8 million users/profiles of university students
  • Yahoo!  -  16,031,000 users/day
  • Google –  15,130,000 users/day
  • Wikipedia – 4,260,000 users/day
  • NLM/NIH – PubMed/PubChem – 500,000 users/day
  • CAS – 1000 organizations - ? users/day



  • ComScore.com – June 2006 analysis


23

Science Publishing and the Web

Web 2.0, social networking, wikis, mashups, and so on are poised to radically change the ability of scientists to share data and develop ideas both within and between organizations.

“Scientists are eager to apply the awesome power of the Internet revolution to scientific communication, but have been stymied by the conservative nature of scientific publishing,” says PLoS co-founder Michael Eisen



http://www.bio-itworld.com/issues/2006/july-aug/first-base/
24
                        Web 2.0 changes

1. Diminished stature for many existing institutions

2. Hierarchies are coming undone

3. Gatekeepers are being bypassed

4. Many intermediaries are no longer necessary

Andrew Shapiro
The Control Revolution, Public Affairs, New York, 1999 (ISBN: 1-891620-19-3)
25
Growth of Open Access
  • Steve Heller’s Google News “Open Access” Alert Service
  • (A non-scientific study)


  • Started May 2004 – 6 news articles/month


  • As of February 2007 – 48 news articles/month
26
Publishers’ response to the change created by the Internet and Open Access –


A unique combination of panic and lethargy
27
“The availability of for-free services such as those offered in the
patent field by the EPO and now Google (plus others) is a real threat
to financial viability of many traditional, high-cost information
providers. A small core of faithful users -- who feel they need
advanced features -- may stay with Thomson, CAS, Questel, Dialog, etc.
But this small core may well be too small to support high-cost
services.”

Harry Collier, private communication, January 2007




…And this small core of users is aging and retiring, while the new generation has been brought up on Google, Facebook, MySpace, and similar technology and services.
28


“I had lunch with a senior manager at Springer.  He told me that 95% of their search referrals come from Google and PubMed! I found this number quite impressive. I wonder why Elsevier spends millions developing their search interface?”

February 2007 email
29
Elsevier Share/Stock Price

  • London Stock Exchange Information


  • 3/19/1997   567.45 pence
  • 3/19/2007   589.50 pence


  • 100 UK pound investment in 1997 is now worth 104 pounds.


  • http://www.reedelsevier.com/index.cfm?articleid=125
30
Thomson Share/Stock Price
  • Toronto Stock Exchange Information


  • 3/19/1997   28.15 dollars
  • 3/19/2007   49.01 dollars


  • http://www.investcom.com/cgi-bin/redir.cgi?url=http://finance.yahoo.com/q/hp?s=TOC.TO&frame=frame/yahoo.html
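The ten-year comparison on the Elsevier and Thomson slides is simple ratio arithmetic. A minimal sketch, assuming a price-only return (dividends, splits, fees, and currency effects are ignored, which the slides do not address); the function name `value_after` is hypothetical:

```python
def value_after(initial, start_price, end_price):
    """Value of `initial` invested at start_price once the price reaches end_price.

    Assumption: price-only return, no dividends, splits, or fees.
    """
    return initial * end_price / start_price


# Prices quoted on the slides for 3/19/1997 and 3/19/2007.
elsevier = value_after(100, 567.45, 589.50)  # pence, London Stock Exchange
thomson = value_after(100, 28.15, 49.01)     # dollars, Toronto Stock Exchange

print(f"Elsevier: 100 becomes {elsevier:.0f}")
print(f"Thomson:  100 becomes {thomson:.0f}")
```

On these figures the Elsevier investment is essentially flat over ten years, while the same amount in Thomson grows by roughly three quarters.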


31
Microsoft  Change
February 2007

Hires an “Industry Technology Strategist for the Pharma Industry”
32
'Wikinomics: How Mass Collaboration Changes Everything'
  

Talk of the Nation (radio program), January 2, 2007 · YouTubers, wiki-users and MySpacers are at the vanguard of a movement that's changing the way we do business.

Guest:
Don Tapscott, co-author of Wikinomics: How Mass Collaboration Changes Everything


http://www.npr.org/templates/story/story.php?storyID=6711038
33
A new Chemistry Wiki
  • ChemSpider is a new Open Access project of chemical structures linked to data and property prediction programs (at present limited to ACD/Labs software), all available free on the Internet.


  • Going live this week at the ACS meeting, it contains some 10½ million structures.
34
What is ChemSpider?

ChemSpider is a chemistry search engine. It has been built with the intention of aggregating and indexing chemical structures and their associated information into a single searchable repository and making it available to everybody, at no charge.

ChemSpider is a value-added offering of publicly available chemical structures since many additional properties have been added to each of the chemical structures. We intend ChemSpider to offer the fastest chemical structure searches available online and delivered with the flexibility and usability necessary to encourage repeat usage.
35

Microsoft – February 2007
  • “companies that create no content of their own, and make money solely on the back of other people’s content are raking in billions”


  • Tom Rubin, Associate General Counsel, Microsoft , March 2007


  • (taken out of context, but it is a great quote)
36

Google/Microsoft/Yahoo/ChemSpider – serving the public


STM Publishers – serving the rich
37
Open Access in Chemistry
  • Beilstein Journal of Organic Chemistry – less than 50 articles since August 2005


  • ChemistryCentral.com – just starting


  • Arkivoc - The Organic Chemistry Journal
38
Arkivoc & BJOC publications

  Year      Arkivoc            BJOC
            (Arkat-usa.org)

  2000          90
  2001         174
  2002         209
  2003         327
  2004         230
  2005         305              18
  2006         248              26
  2007          98               3  (up to 2/07)

  Totals      1681              47


39
“Journals are a method of destroying information and data on a gigantic scale.”

Johnny Gasteiger
40
More Change --
RSC – Project Prospect
  • From 2/2007, electronic RSC journals will have metadata added to each article: CML, InChI, and OBO (Open Biomedical Ontologies). This way one can search using chemical structure and these index terms.


  • Sooner, rather than later, secondary publishers (e.g., CAS) will find their role is no longer needed. See David Flaxbart:
  • http://www.istl.org/07-winter/viewpoints.html
41
InChI
  • A project whose time has come. Without the Internet, InChI would be just another in a series of technically excellent, soon forgotten projects for representing chemical structures. The Internet, an international scientific body (IUPAC), and international cooperation (US, UK, Czech Republic) have led to the speedy development, implementation, and use of InChI.


  • While InChI is a public domain, open source system for creating a unique computer-readable identifier (“name”), and (soon) an InChI number, it is NOT a registry system. InChIs are created only by those who choose to adopt and use the algorithms. Registry systems which index the literature are complementary to any InChI databases and InChI numbers that anyone creates.
42
InChI
  • Digital ‘Naming’ of Chemicals:


  • Chemical structure is the true ‘identifier’
  • But, structure representations are not unique or convenient for computers.
    • So, convert structure to a unique ‘name’ by fixed algorithms
    • The IUPAC International Chemical Identifier (InChI)
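The idea of turning a structure into a unique "name" by a fixed algorithm can be illustrated with a toy. The sketch below is NOT the InChI algorithm: it is a brute-force canonical labelling, workable only for very small heavy-atom skeletons, that tries every atom numbering and keeps the lexicographically smallest description. The function name `canonical_name` and the output format are hypothetical.

```python
from itertools import permutations


def canonical_name(atoms, bonds):
    """Return a string that depends on the structure, not on the atom numbering.

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j) index pairs into `atoms`
    Toy brute force over all numberings; illustration only, not InChI.
    """
    best = None
    for perm in permutations(range(len(atoms))):
        # perm[new_index] is the old index placed at that position.
        new_of_old = {old: new for new, old in enumerate(perm)}
        labels = tuple(atoms[old] for old in perm)
        edges = tuple(sorted(
            tuple(sorted((new_of_old[i], new_of_old[j]))) for i, j in bonds
        ))
        candidate = (labels, edges)
        if best is None or candidate < best:
            best = candidate
    labels, edges = best
    return "".join(labels) + "/" + ";".join(f"{i + 1}-{j + 1}" for i, j in edges)


# Ethanol heavy atoms drawn with two different atom numberings.
ethanol_a = canonical_name(["C", "C", "O"], [(0, 1), (1, 2)])
ethanol_b = canonical_name(["O", "C", "C"], [(0, 1), (1, 2)])
# Dimethyl ether: same atoms, different connectivity.
ether = canonical_name(["C", "O", "C"], [(0, 1), (1, 2)])

print(ethanol_a, ethanol_b, ether)
```

Both ethanol inputs yield the same string, while the isomer yields a different one. That is the property a structure-derived identifier needs.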
43
Two Problems with InChI
  • 1. Chemicals
    • Fast isomerization (tautomerization)
    • Ill-defined connectivity
  • 2. Chemists
    • Differing conventions
      • Depends on discipline, education and convenience
      • Imprecision/uncertainty
44
InChI Layers

  • Formula
  • Connectivity
  • Stereochemistry/Chirality
  • Isotope
  • Charge
  • Fixed/Mobile Hydrogens
  • And so on
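The layered design is visible in the identifier itself: layers are separated by "/" and most begin with a one-letter prefix. The sketch below splits an InChI string into its layers. The descriptive names in `LAYER_NAMES` are informal labels chosen here, the function name `split_inchi` is hypothetical, and the example string is the commonly published identifier for L-alanine in the later "standard" (1S) form; treat all three as assumptions, not as part of the original slides.

```python
# Informal labels for the one-letter layer prefixes (assumed wording).
LAYER_NAMES = {
    "c": "connectivity",
    "h": "hydrogens",
    "q": "charge",
    "p": "protonation",
    "b": "double-bond stereo",
    "t": "tetrahedral stereo",
    "m": "stereo inversion marker",
    "s": "stereo type",
    "i": "isotope",
    "f": "fixed hydrogens",
}


def split_inchi(inchi):
    """Split an InChI string into (layer name, value) pairs, in order."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    version, *segments = inchi[len(prefix):].split("/")
    layers = [("version", version)]
    for segment in segments:
        if not segment:
            continue
        head = segment[0]
        if head.isupper() or head.isdigit():
            # The formula layer has no prefix letter.
            layers.append(("formula", segment))
        else:
            layers.append((LAYER_NAMES.get(head, "unknown:" + head), segment[1:]))
    return layers


alanine = split_inchi(
    "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
)
for name, value in alanine:
    print(f"{name:24s} {value}")
```

A list of pairs is returned rather than a dictionary because prefixes can repeat after the fixed-hydrogen layer.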
45
How does InChI differ from SMILES?

Like InChI, the SMILES language allows a canonical serialization of molecular structure. However, SMILES is proprietary and unlike InChI is not an open project. This has led to the use of different generation algorithms, and thus, different SMILES versions of the same compound have been found.

In fact, we have found seven different unique SMILES for caffeine on Web sites:

1. [c]1([n+]([CH3])[c]([c]2([c]([n+]1[CH3])[n][cH][n+]2[CH3]))[O-])[O-]
2. CN1C(=O)N(C)C(=O)C(N(C)C=N2)=C12
3. Cn1cnc2n(C)c(=O)n(C)c(=O)c12
4. Cn1cnc2c1c(=O)n(C)c(=O)n2C
5. N1(C)C(=O)N(C)C2=C(C1=O)N(C)C=N2
6. O=C1C2=C(N=CN2C)N(C(=O)N1C)C
7. CN1C=NC2=C1C(=O)N(C)C(=O)N2C
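A quick sanity check makes the point concrete: the seven strings are all different as text, so a plain-text search for one will not find the others, yet each describes the same heavy-atom composition. The sketch below only counts atoms with a regular expression. It is not a SMILES parser and does not prove the structures are identical (that would need a cheminformatics toolkit); the function name `heavy_atom_counts` is hypothetical.

```python
import re
from collections import Counter

CAFFEINE_SMILES = [
    "[c]1([n+]([CH3])[c]([c]2([c]([n+]1[CH3])[n][cH][n+]2[CH3]))[O-])[O-]",
    "CN1C(=O)N(C)C(=O)C(N(C)C=N2)=C12",
    "Cn1cnc2n(C)c(=O)n(C)c(=O)c12",
    "Cn1cnc2c1c(=O)n(C)c(=O)n2C",
    "N1(C)C(=O)N(C)C2=C(C1=O)N(C)C=N2",
    "O=C1C2=C(N=CN2C)N(C(=O)N1C)C",
    "CN1C=NC2=C1C(=O)N(C)C(=O)N2C",
]

# A bracket atom such as [CH3] or [n+], or a bare organic-subset atom.
ATOM = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")
# Element symbol inside a bracket atom, after an optional isotope number.
BRACKET_ELEMENT = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})")


def heavy_atom_counts(smiles):
    """Count atoms by element; aromatic lowercase symbols are capitalised.

    Hydrogens written inside brackets (e.g. the H3 in [CH3]) are not counted.
    """
    counts = Counter()
    for token in ATOM.findall(smiles):
        if token.startswith("["):
            symbol = BRACKET_ELEMENT.match(token).group(1)
        else:
            symbol = token
        counts[symbol.capitalize()] += 1
    return counts


formulas = [heavy_atom_counts(s) for s in CAFFEINE_SMILES]
print("distinct strings:", len(set(CAFFEINE_SMILES)))
for smiles, formula in zip(CAFFEINE_SMILES, formulas):
    print(dict(formula), smiles)
```

Every string comes out as eight carbons, four nitrogens, and two oxygens, matching caffeine's formula C8H10N4O2.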
46
5 Useful InChI URLs

IUPAC InChI URL:  http://www.iupac.org/inchi

The InChI-L Listserver WebBoard URL:
http://webboard.rsc.org:8080/~INCHI-L

InChI FAQs, created by Nick Day, Cambridge University, UK:
http://wwmm.ch.cam.ac.uk/inchifaq/


IUPAC Prague Group InChI URL:
www.inchi.info

http://video.google.com/videoplay?docid=-6653695245776470969&q=heller+chemical
47
InChI take-up by software developers and database providers

Software:

1. Structure Drawing

    a. ACD Labs: ChemSketch http://www.acdlabs.com
    b. CambridgeSoft: ChemDraw http://www.camsoft.com
    c. ChemAxon: Marvin http://www.chemaxon.com
    d. BK-Chem: http://bkchem.zirael.org/inchi_en.html
    e. MDL Draw

2. Structure Search

    a. IBM (internal project)

3. Analysis software

    a. SciTegic: http://www.scitegic.com

4. Structure file interconversion

    a. OpenBabel: http://openbabel.sourceforge.net/RELEASE.shtml

5. Other software

    a. World Wide Molecular Matrix: http://wwmm.ch.cam.ac.uk/gridsphere/gridsphere
48
Databases:
(ordered by when adopted)

1. NIST WebBook http://webbook.nist.gov
2. NIH PubChem http://pubchem.ncbi.nlm.nih.gov
3. NCI DTP http://cactus.nci.nih.gov/ncidb2/
4. EPA - DSSTox http://www.epa.gov/nheerl/dsstox/
5. UC-SF ZINC project http://blaster.docking.org/zinc/
6. KEGG http://www.genome.ad.jp/kegg/
7. ISI Web of Science http://portal.isiknowledge.com/
8. Carcinogenic Potency http://potency.berkeley.edu/structure.html
9. ChEBI http://www.ebi.ac.uk/chebi
10. Wiley Mass Spectra http://www.wiley.com/WileyDCA/Section/id-131370.html
11. Prous Science Integrity http://integrity.prous.com/integrity/servlet/xmlxsl/
12. FDA GeneTox and Chronic/subchronic Databases http://www.leadscope.com/fdadb_cat.php
13. Compendium of Pesticide Common Names http://www.alanwood.net/pesticides
49
Other InChI Adopters
  • Journals:
  •       RSC
  •       Prous Science – Drugs of the Future


  •  Organizations:
  •       EPO
50
Technical/Economic/Political Features of InChI

1. It works as well as any other system.

2. It is free, open source software.  (Web 2.0)

3. Any organization can use it for internal and/or external structure files at no cost. (Web 2.0)

4. It is sponsored by IUPAC and primarily implemented by the US scientific standards agency – NIST.

5. It offers an alternative to the CAS Registry, and InChIs can be freely searched for via Google/Yahoo/Microsoft.  (Web 2.0)

6. It allows all those chemical information providers who compete with CAS to have a free structure and number system alternative. (Web 2.0)
51
The Future

Between researchers putting their results on the web, journal metadata projects such as the one the RSC has started, and Google/Yahoo/Microsoft developing ways to search text and chemical structures, all non-copyright, non-proprietary information will be readily available. Who knows, Google or Microsoft might even buy all of Elsevier’s back-file content one day.
52
             The Future

1. People will continue to pay for real added value.

2. People will pay for software and analysis tools that are worth the money.

3. Open Access journals will continue to evolve and will become the mainstream of scientific publishing.

4. Open Source, such as IUPAC/InChI, will become the predominant structure representation form.

5. E-Notebooks/LIMS will grow and evolve into organization-wide linked information systems.
53

For any organization I haven’t picked on yet, perhaps I can take care of this oversight in the question and answer session.
54
Acknowledgements


Steve Bachrach, Mila Becker, Jost Bohlen, Evan Bolton, Pieter Bolman, Bob Bovenschulte, Steve Bryant, Harry Collier, Alice Cooper, Rene Deplanque, Ron Dunn, Guenter Grethe, Stevan Harnad, Sami Kassab, David Lipman, Gary Mallard, Randy Marcinko, Alan McNaught, Bill Milne, Carmen Nitsche, Josep Prous, Chris Reed, Rich Roberts, Peter Murray-Rust, Henry Rzepa, Steve Stein, Peter Shepherd, Bill Town, Andrea Twiss-Brooks, Wendy Warr, Ann Wolpert