1
Chemical Informatics and Databases – 30+ Years of Progress
  • Stephen R. Heller,
  • NIST, Gaithersburg, MD 20899 USA
  • srheller@nist.gov


3
An expert is someone who knows no more
about the subject than anyone else,
but is organized and shows slides.
4
Noordwijkerhout 1972

The first international conference on chemical information
7
The only thing new in the world is the history you don’t know.

Harry Truman
8
The further backward you look, the further forward you can see.

Winston Churchill
9
There are three ways to ruin yourself:
Gambling, women, and technology


Gambling is the fastest,
Women are the most pleasurable,
Technology is the most certain

Georges Pompidou
10
Computer Literacy
  • 1970
  • Low – few chemists used computers
  • Most users did quantum chemistry calculations
  • 2003
  • High – almost all chemists use computers every day for scientific and administrative work
11
User Needs
  • 1970
  • Edit text – Secretary
  • Calculate – Bookkeeper
  • File – Office Clerk
  • Communicate – Mail Room
  • Draw – Art Department
  • 2003
  • WordPerfect/Word
  • Excel
  • Access/
  • E-mail
  • PowerPoint
12
Bibliographic Databases

  • 1970
  • CAS – Just starting to automate and put its information into computer-readable form.

  • 2003
  • CAS – Almost totally electronic. No real new content, but THE source of bibliographic information for chemists.
13
Structure Representation
  • 1970
  • WLN
  • Connection tables
  • IUPAC Names
  • 2003
  • Connection tables
  • SMILES
  • IUPAC IChI
30
IChI software available from
Steve Stein:

steve.stein@nist.gov
31
Distributed Computing
  • 1970
  • Timesharing PDP-10’s
  • Initially used for calculations because hardware was expensive
  • 2003
  • GRID computing
  • Initially used for computing because of the need for enormous processing capabilities
32
Distributed Computing
  • 1970
  • Timesharing used to access scientific databases: the NIH/EPA CIS
  • 2003
  • GRID computers used to access databases in multiple remote systems
33
Computer Trends
  • Hardware
  • Smaller
  • Faster
  • Cheaper
  • Software
  • Bigger
  • Slower
  • More Expensive
34
Multiple Database Access
  • 1970
  • NIH/EPA CIS
  • Structure Searching linked to data and information
  • 200,000 structures
  • ~ 1 million facts
  • Searches only CIS databases
  • 2003
  • MDL DiscoveryGate
  • Structure searching linked to data and information
  • 11 million structures
  • 200 million facts
  • Searches only MDL databases
35
Journals
  • 1970
  • All print journals
  • Since the 18th century, the only way to distribute information to chemists around the world
  • 2003
  • Most chemistry journals available electronically
  • Print/snail mail – replaced by the Internet
36
Physics Pre-Print Server
  • 1930s
  • High-energy physicists sent mimeographed copies of their research preprints to colleagues
  • 2003
  • In the 1990s, Ginsparg moved this distribution system to all-electronic Internet access
37
Numeric Databases
  • 1970
  • Very few databases in electronic form
  • Printed handbooks in libraries are the norm
  • 2003
  • No one would think of anything but an electronic database
38
Mass Spec
  • 1970
  • First NIH database: 8,124 spectra, including duplicates
  • Very limited Quality Control
  • 2003
  • Current 2002 NIST database:
  • 174,948 total spectra
  • 147,198 unique spectra
  • Extensive Quality Control
39
Mass Spec
  • 1970
  • Electron Impact Mass Spec very useful for structure identification
  • 2003
  • Electron Impact Mass Spec very useful for structure identification
40
2D-3D structures
  • 1970
  • Many home-grown methods
  • 2003
  • CORINA – the gold standard
41
Journal Access
  • 1970
  • Library
  • Interlibrary Loan – slow delivery
  • 2003
  • Online Internet access
  • Download for local printing – instant delivery
42
Delivery of Journals
  • 1970-1990s
  • Print
  • Microfilm
  • FAX
  • CD-ROM
  • 2003
  • Print/PDF
  • Mostly electronic
43
Next-Generation Journals?

Each organization will publish its own journal articles on local servers, refereed by peers in the field, and all journal databases will be linked together via a grid of networked computers that know where each article is stored.
44
..publishers have grown fat by charging libraries hundreds or thousands of dollars a year for subscriptions to printed artefacts that might not contain information of real importance.

Harry Collier, Digital Publishing Strategies, 11/97, page 16
45
Most scientific manuscripts are write-only.
46
For a talk to be immortal
it does not have to be eternal.

Muriel Humphrey's comment to
Vice President Hubert Humphrey
47
Major projects
  • Mass Spectrometry Database & Search System (MSSS)
  • NIH/EPA Chemical Information System (CIS)
  • SciWords scientific spelling databases
  • IUPAC Chemical Identifier Project (IChI)
48
Acknowledgements
  • The following have been instrumental in the work I have undertaken in the past 30+ years:
  • Steve Bachrach
  • Hank Fales
  • Richard Feldmann
  • Sandy Lawson
  • Jerry Mitsche
  • Bill Milne
  • Kay Pool
  • Rudy Potenzone
  • Steve Stein
  • Morris Yaguda