1
The Evolution and Revolution of Scientific Information Resources in the Last 50 Years
  • Stephen R. Heller
  • Consultant/Guest Researcher
  • NIST/PCPD
  • Gaithersburg, MD 20899
  • steve@hellers.com
2
The slides from this presentation can be found at:

http://www.hellers.com/steve/pub-talks/
3
Disclaimer

The opinions presented on these slides are those of the slides and not necessarily those of the speaker.

No animals were harmed in the preparation of this talk; however a few WWW sites were hit.
These slides were made from 100% recycled electrons.
 
There are no George W. Bush jokes in this presentation.

This will be a well balanced presentation. I have a chip (grudge) on both shoulders.
4
Content


Where do you get it?

How is it disseminated?

How do you use it?
5
There are three ways to ruin yourself:
gambling, women, and technology.

Gambling is the fastest
Women are the most pleasurable
Technology is the most certain


Georges Pompidou
6
5 Critical Factors Affecting Chemical Information in the 21st Century

1. Internet - WWW
2. Internet - WWW
3. Internet - WWW
4. Internet - WWW
5. Internet - WWW
7
Evolution
  • From the 1950s to 2006, scientific information has evolved from paper to electronic form, coupled with a revolution in computer and network communication capabilities (i.e., the Internet) that is transforming the way information is collected, processed, disseminated, and used.
8
Evolution
  • Web 1.0 - We have evolved from everything on paper, which needed to be centrally organized and distributed from a central source to …


  • Web 2.0 - Currently uncontrolled chaos and a revolution with data and information being dumped into systems around the world.
9
Three main factors will be mentioned when discussing software and databases, as they all relate to the evolution and revolution in scientific information.

  • Technical
  • Economic
  • Political/Cultural
10
Scribes in the 15th century were not happy with Johannes Gutenberg.


Publishers in the 21st century are not happy with Tim Berners-Lee/Internet
11
1950’s
  • Printed Abstracts from CAS, UK
  • Few databases/compilations
  • All on paper
  • A handful of computers worldwide
  • Chemical Information was supported by a thriving chemical industry
12
1950’s

The chemist would read the CAS sections appropriate to their research needs.  Then he/she would go to the library to read the full journal article of interest.  Often this meant a request for an interlibrary loan to obtain the article.
13
Computer & Information Trends
  • 1. Hardware:
    Smaller
    Faster
    Cheaper
    Networked

  • 2. Software:
    Bigger
    Slower
    More expensive (fee based)
    Open Source (free)

  • 3. Data/Information:
    More expensive (fee based)
    Free (Open Access)


14
If software is so user friendly, why are there so many training classes?

Because the information is more complex.

Because you can do more with it.

Because the software offers (too) many options.
15
The Paperless Office is as likely as the Paperless Bathroom
16
2006
  • Everything is electronic
  • Databases are common in chemistry and biology
  • Everyone has a PC and WWW access
  • Data and databases are commonplace and large
  • Databases have gone from primarily text to value-added indexing, coding, structures, and linking (e.g. PubChem)
  • The chemical industry has been overtaken by biology/biochemistry/biomedicine causing problems for the ACS/CAS
  • Bioinformatics data is the antithesis of the chemical data franchise
  • Current Awareness has evolved into Continuous Awareness
17
2006
  • The chemist logs onto CAS/SciFinder®, ISI Web of Science®, Integrity®, ScienceDirect®, Scirus.com®, Chemindustry.com®, PubChem, or Chemweb.com® to search for something of interest. Then he/she clicks on the hyperlink, using LitLink or ChemPort, and, assuming he/she has paid-for access to the journal, the article appears immediately on the computer screen to read or to print out and take to the bathroom to read. Document delivery is now easy and fast. More importantly, one learns from the experiences of others: being able to do computer searches of the literature helps a lot and allows one to read more articles of interest.
18
Internet – 2006


The Internet is like a box of chocolates -
you never know what you will get.
(with apologies to Forrest Gump)


The Internet is like drinking from a fire hydrant
19
Old Chemistry is Useful
(reactions/synthesis/patents)

Old Biology is not Useful
(Mendel’s genetics experiments on peas)
20
"New scientific truth does
not triumph by convincing its
opponents and making them see
the light, but rather because
its opponents eventually die,
and a new generation grows up
that is familiar with it."


Max Planck,
"Scientific Autobiography and
Other Papers",
Williams & Norgate,
London (1950), pages 33-34.
21
Evolution becomes a revolution when there are a sufficient number of mutations to take over and replace the old forms of life.
22
Organizations that fail to recognize and confront technological and market changes often tend to lose their positions, if not their organizations. History is replete with such examples. In the 18th century the power loom replaced the handloom weavers. In the early 20th century the horse and buggy industry gave way to automobiles. In the late 20th century the airplane replaced the train and boat for long-distance travel.
Now, at the start of the 21st century, the technology of the Internet is threatening the way in which the 3+ century old scientific publishing industry, and the libraries which subscribe to scholarly publications, have done business for many decades.
23
When it comes to change, some organizations are so dense, light bends around them.
24
The circulation of daily U.S. newspapers is 55.2 million, down from 62.3 million in 1990.  The percentages of adults who say they read a paper "yesterday" are ominous:

 65 and older  --  60 percent.

 50-64  --  52 percent.

 30-49  --  39 percent.

 18-29  --  23 percent.


A structural change in the way people get information
4/25/2005
“Unread and Unsubscribing” - George F. Will – US syndicated columnist
25
Most Popular Web Sites
  • Yahoo!- free
  • Google - free
  • MySpace – free social network
  • MSN - free
  • eBay
  • Amazon
  • Craigslist – free classified ads
  • CNN news - free
  • Wikipedia - free


  • # 19 – NY Times - free
  • # 27 – BBC - free
  • # 66 – Facebook – free university/college social network
  • # 290 – NLM/NIH - free
  • # 7,756 - ACS
  • # 41,695 – CAS
  • # 180,328 – ISI/Web of Science


26
Members/Users
  • MySpace – 100 million users/profiles; 2,210,000 users/day
  • eBay – 100 million users; 5,044,00 users/day
  • Facebook – 8 million users/profiles of university students
  • NLM/NIH – PubMed/PubChem – 500,000 users/day
  • CAS – 1000 organizations - ? users/day
  • Yahoo!  -  16,031,000 users/day
  • Google –  15,130,000 users/day
  • Wikipedia – 4,260,000 users/day



  • ComScore.com – June 2006 analysis


27

Science Publishing and the Web

Web 2.0, social networking, wikis, mashups, and so on are poised to radically change the ability of scientists to share data and develop ideas both within and between organizations.

“Scientists are eager to apply the awesome power of the Internet revolution to scientific communication, but have been stymied by the conservative nature of scientific publishing,” says PLoS co-founder Michael Eisen.


http://www.bio-itworld.com/issues/2006/july-aug/first-base/
28
Journals are a method of destroying information and data on a gigantic scale.

Johnny Gasteiger
29
Open Access Information
  • Peter Suber list (started in 2001):
  •   http://www.arl.org/sparc/soa/index.html


  • Stevan Harnad list (started in 1998):
  •    http://amsci-forum.amsci.org/archives/American-Scientist-Open-Access-Forum.html
30
Open Access in Chemistry
  • Beilstein Journal of Organic Chemistry


  • Chemistrycentral.com
31
 
32
 
33
 
34
InChI
  • A project whose time has come. Without the Internet, InChI would be just another in a series of technically excellent, soon forgotten projects for representing chemical structures. The Internet, an international scientific body (IUPAC), and international cooperation (US, UK, Czech Republic) have led to the speedy development, implementation, and use of InChI.


  • While InChI is a public domain, open source system for creating a unique computer-readable identifier (“name”), it is NOT a registry system. InChIs are created only by those who choose to adopt and use the algorithm. Registry systems which index the literature are complementary to any InChI databases that anyone creates.
35
InChI
  • Digital ‘Naming’ of Chemicals:


  • Chemical structure is the true ‘identifier’
  • But, structure representations are not unique or convenient for computers.
    • So, convert structure to a unique ‘name’ by fixed algorithms
    • The IUPAC International Chemical Identifier (InChI)
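
To illustrate why a fixed algorithm is needed, here is a minimal Python sketch of the general idea of canonical naming. It is NOT the InChI algorithm: the function name `canonical_name`, the brute-force search over atom orderings, and the output format are all assumptions made for this illustration, and the approach is only practical for a handful of atoms. The same molecule entered with two different atom numberings comes out with the same "name".

```python
from itertools import permutations

def canonical_name(atoms, bonds):
    """Toy canonical string for a small molecular graph (hypothetical helper).

    atoms: list of element symbols, indexed by atom number.
    bonds: iterable of (i, j) pairs of atom indices.
    Tries every atom ordering and keeps the lexicographically smallest
    (labels, edges) description, so the result ignores input numbering.
    """
    bond_set = {frozenset(b) for b in bonds}
    best = None
    for perm in permutations(range(len(atoms))):
        # perm[k] is the original index of the atom placed at new position k
        labels = tuple(atoms[i] for i in perm)
        pos = {old: new for new, old in enumerate(perm)}
        edges = tuple(sorted(tuple(sorted((pos[a], pos[b]))) for a, b in bond_set))
        candidate = (labels, edges)
        if best is None or candidate < best:
            best = candidate
    labels, edges = best
    return "".join(labels) + "/" + ",".join(f"{a}-{b}" for a, b in edges)

# Ethanol heavy atoms, numbered two different ways (hydrogens omitted).
ethanol_a = canonical_name(["C", "C", "O"], [(0, 1), (1, 2)])
ethanol_b = canonical_name(["O", "C", "C"], [(0, 1), (1, 2)])
# Dimethyl ether has the same atoms but different connectivity.
ether = canonical_name(["C", "O", "C"], [(0, 1), (1, 2)])

print(ethanol_a, ethanol_b, ether)
```

The two ethanol inputs give the same string, while dimethyl ether gives a different one. Real identifiers also have to deal with hydrogens, stereochemistry, charge, and tautomers, which this toy ignores.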
36
Two Major Problems
  • 1. Chemicals
    – Fast isomerization (tautomerization)
    – Ill-defined connectivity
  • 2. Chemists
    – Differing conventions
      • Depends on discipline, education and convenience
      • Imprecision/uncertainty
37
InChI Layers

  • Formula
  • Connectivity
  • Stereochemistry/Chirality
  • Isotope
  • Charge
  • Fixed/Mobile Hydrogens
  • And so on
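
The layers listed above appear in an InChI string as segments separated by "/", most of them introduced by a one-letter prefix. The following is a rough Python sketch, not an official parser: the function name `split_inchi` and the grouping of prefixes into the slide's layer names are assumptions for illustration, and the ethanol example uses the later "1S" standard-InChI prefix.

```python
# One-letter layer prefixes, grouped under the layer names used on the slide.
LAYER_NAMES = {
    "c": "connectivity",
    "h": "hydrogens",
    "q": "charge",
    "p": "charge (protons)",
    "b": "stereochemistry",
    "t": "stereochemistry",
    "m": "stereochemistry",
    "s": "stereochemistry",
    "i": "isotope",
    "f": "fixed hydrogens",
    "r": "reconnected",
}

def split_inchi(inchi):
    """Split an InChI string into (layer name, content) pairs (hypothetical helper)."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    version, *segments = inchi[len("InChI="):].split("/")
    layers = [("version", version)]
    for k, seg in enumerate(segments):
        if k == 0 and seg[:1].isupper():
            # The formula layer has no prefix letter and starts with an element symbol.
            layers.append(("formula", seg))
        else:
            layers.append((LAYER_NAMES.get(seg[:1], "unknown"), seg[1:]))
    return layers

ethanol_layers = split_inchi("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
for name, content in ethanol_layers:
    print(f"{name:14s} {content}")
```

A list of pairs is used rather than a dictionary because some prefixes can occur more than once in a single identifier.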
38
How does InChI differ from SMILES?

Like InChI, the SMILES language allows a canonical serialization of molecular structure. However, SMILES is proprietary and, unlike InChI, is not an open project. This has led to the use of different generation algorithms, and thus different SMILES versions of the same compound have been found.

In fact, we have found seven different unique SMILES for caffeine on Web sites:

1. [c]1([n+]([CH3])[c]([c]2([c]([n+]1[CH3])[n][cH][n+]2[CH3]))[O-])[O-]
2. CN1C(=O)N(C)C(=O)C(N(C)C=N2)=C12
3. Cn1cnc2n(C)c(=O)n(C)c(=O)c12
4. Cn1cnc2c1c(=O)n(C)c(=O)n2C
5. N1(C)C(=O)N(C)C2=C(C1=O)N(C)C=N2
6. O=C1C2=C(N=CN2C)N(C(=O)N1C)C
7. CN1C=NC2=C1C(=O)N(C)C(=O)N2C
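
The practical consequence can be sketched in a few lines of Python. The seven strings are all different as text, so an exact-match text search on any one of them would miss the other six, yet a crude tally of non-hydrogen atoms agrees across all of them. The helper `heavy_atoms` and its regular expressions are assumptions written for this illustration; they handle only the simple SMILES subset seen here, and a matching tally shows consistency, not identity of structure.

```python
import re
from collections import Counter

CAFFEINE_SMILES = [
    "[c]1([n+]([CH3])[c]([c]2([c]([n+]1[CH3])[n][cH][n+]2[CH3]))[O-])[O-]",
    "CN1C(=O)N(C)C(=O)C(N(C)C=N2)=C12",
    "Cn1cnc2n(C)c(=O)n(C)c(=O)c12",
    "Cn1cnc2c1c(=O)n(C)c(=O)n2C",
    "N1(C)C(=O)N(C)C2=C(C1=O)N(C)C=N2",
    "O=C1C2=C(N=CN2C)N(C(=O)N1C)C",
    "CN1C=NC2=C1C(=O)N(C)C(=O)N2C",
]

# A bracket atom such as [CH3], or a bare organic-subset atom such as C, n, Cl.
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")
# Element symbol at the start of a bracket atom, after an optional isotope number.
BRACKET_SYMBOL = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})")

def heavy_atoms(smiles):
    """Count non-hydrogen atoms per element in a simple SMILES string."""
    counts = Counter()
    for token in ATOM_TOKEN.findall(smiles):
        if token.startswith("["):
            token = BRACKET_SYMBOL.match(token).group(1)
        # Aromatic atoms are written in lowercase; fold them onto the element symbol.
        counts[token.capitalize()] += 1
    counts.pop("H", None)
    return counts

distinct_strings = len(set(CAFFEINE_SMILES))
tallies = [heavy_atoms(s) for s in CAFFEINE_SMILES]

print(distinct_strings, "distinct strings")
for tally in tallies:
    print(dict(tally))
```

Each tally comes out as eight carbons, four nitrogens, and two oxygens, matching the heavy atoms of caffeine, even though no two of the strings are equal.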
39
4 Useful InChI URLs

IUPAC InChI URL:  http://www.iupac.org/inchi

The InChI-L Listserver WebBoard URL:
http://webboard.rsc.org:8080/~INCHI-L

InChI FAQs: Created by Nick Day, Cambridge University, UK:
http://wwmm.ch.cam.ac.uk/inchifaq/


IUPAC Prague Group InChI URL:
www.inchi.info
40
InChI take-up by software developers and database providers

Software:

1. Structure Drawing

    a. ACD Labs: ChemSketch http://www.acdlabs.com
    b. CambridgeSoft: ChemDraw http://www.camsoft.com
    c. ChemAxon: Marvin http://www.chemaxon.com
    d. BK-Chem: http://bkchem.zirael.org/inchi_en.html

2. Structure Search

    a. IBM (internal project)

3. Analysis software

    a. SciTegic: http://www.scitegic.com

4. Structure file interconversion

    a. OpenBabel: http://openbabel.sourceforge.net/RELEASE.shtml

5. Other software

    a. World Wide Molecular Matrix: http://wwmm.ch.cam.ac.uk/gridsphere/gridsphere
41
Databases:
(ordered by when adopted)

1. NIST WebBook http://webbook.nist.gov
2. NIH PubChem http://pubchem.ncbi.nlm.nih.gov
3. NCI DTP http://cactus.nci.nih.gov/ncidb2/
4. EPA - DSSTox http://www.epa.gov/nheerl/dsstox/
5. UC-SF ZINC project http://blaster.docking.org/zinc/
6. KEGG http://www.genome.ad.jp/kegg/
7. ISI Web of Science http://portal.isiknowledge.com/
8. Carcinogenic Potency http://potency.berkeley.edu/structure.html
9. ChEBI http://www.ebi.ac.uk/chebi
10. Wiley Mass Spectra http://www.wiley.com/WileyDCA/Section/id-131370.html
11. Prous Science Integrity http://integrity.prous.com/integrity/servlet/xmlxsl/
12. FDA GeneTox and Chronic/subchronic Databases http://www.leadscope.com/fdadb_cat.php
13. Compendium of Pesticide Common Names
http://www.alanwood.net/pesticides
42
Technical/Economic/Political Features of InChI

1. It works as well as any other system.

2. It is free, open source software.

3. Any organization can use it for internal and/or external structure files at no cost.

4. It is sponsored by IUPAC and primarily implemented by the US standards agency – NIST.

5. It offers an alternative to the CAS Registry, and InChIs can be freely searched for via Google/Yahoo.

6. It allows all those chemical information providers who compete with CAS to have a free alternative.
43
Prediction is very difficult, especially about the future.

  -  Niels Bohr



Give them a number and give them a date, but never both

- Edgar Fiedler


If you have to forecast, forecast often

- Anonymous
44
The Future

Between researchers putting their results on the web and Google/Yahoo/Microsoft developing ways to search text and chemical structures, all non-copyright, non-proprietary information will be readily available. Who knows, Google might even buy all of Elsevier’s back-file content one day.
45
From my May 2003 Prous Users Forum:

Two major areas of chemical information software and database development will be ADMET and E-notebooks/LIMS/e-data integration.
46
The Future

1. People will continue to pay for real added value.

2. People will pay for software and analysis tools that are worth the money.

3. Open Access journals will continue to evolve.

4. Open Source, such as IUPAC/InChI, will become the predominant structure representation form.

5. E-Notebooks/LIMS will grow and evolve into organization-wide linked information systems.
47
Acknowledgements

I really think my friends would prefer if I left their names off this slide.
48
Acknowledgements


Steve Bachrach, Mila Becker, Pieter Bolman, Bob Bovenschulte, Steve Bryant, Harry Collier, Alice Cooper, Rene Deplanque, Guenter Grethe, Stevan Harnad, Sami Kassab, Gary Mallard, Randy Marcinko, Alan McNaught, Bill Milne, Carmen Nitsche, Josep Prous, Chris Reed, Rich Roberts, Peter Murray-Rust, Henry Rzepa, Steve Stein, Peter Shepherd, Bill Town, Andrea Twiss-Brooks, Wendy Warr, Ann Wolpert