Developing Numeric Databases – The NIST Mass Spec and WebBook


Stephen Heller
Guest Researcher
 NIST/PCPD
Gaithersburg, MD 20899-8380
The slides from this presentation can be found at:

http://www.hellers.com/steve/pub-talks/ali2004/frame.htm
An expert is someone who knows no more
about the subject than anyone else,
but is organized and shows slides.
There are three ways to ruin yourself:
Gambling, women, and technology


Gambling is the fastest,
Women are the most pleasurable,
Technology is the most certain

Georges Pompidou
NIST and others have developed many good, high-quality numeric databases, but VERY few are used by anyone other than the creator and his/her mother.
Good and useful are two different words with very different meanings.
If databases were perfect,
they wouldn’t be.


With apologies to Yogi Berra
Outline of this talk

Part 1 – Mass Spec Database

Part 2 – NIST WebBook
The NIST Mass Spectrometry
Data Program
MS Library History
1971 – EPA/NIH Collection of Collections
1978 – First Distribution – Tape, On-Line, Books
1983 – To EPA, Cincinnati
1988 – To NIST
1990 – Manual Evaluation/Algorithms
1998 – Evaluated Library
2002 – Major Update
New Spectra
Focus Areas
New Commercially Available
Replicates for Important Compounds
Derivatives
Chemical Weapons Related
New Spectra
Quality Up Front
Complete
With chemical structures
Documented
Calibrated instruments
Upstream filter
Some Sources
14,000 Japan AIST/NIMC Collection: Commercially available common organic compounds.
6,976 Russian Academy of Sciences, Institute of Petrochemical Synthesis: Mostly derivatives (silyl, acyl).
7,182 NIH measurements: Synthetic analogues of natural compounds, prospective drugs, drug metabolites, and their intermediates.
1,735 NIST: Commercially available common compounds, pesticides, drugs.
1,022 Eastman Chemical Company: Commercial and synthetic compounds and silyl derivatives.
406 Verifin (Finland): Chemical weapons and precursors.
348 HD-Science (UK): Silyl derivatives of drugs.
138 Military Institute of Chemistry and Radiometry (Poland): Chemical weapons related.
When the pear is ripe it falls by itself.
NIST98 => NIST02
Mainlib: 107,886 => 147,370 spectra
91,856 spectra from old mainlib
1,331 spectra from old replib
54,183 new spectra
Replib: 21,250 => 27,844 spectra
14,050 spectra from old replib
7,378 spectra from old mainlib
6,416 new spectra
Excluded in the new database:
8,652 spectra from old mainlib
5,869 spectra from old replib
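The counts on this slide can be cross-checked with simple bookkeeping. The following is a minimal sketch, not part of the original talk: it copies the numbers from the slide and checks that each new library equals the sum of its sources, and that every old spectrum was either carried over or excluded. The dictionary and function names are illustrative assumptions.

```python
# Counts copied from the NIST98 => NIST02 slide.
OLD = {"mainlib": 107_886, "replib": 21_250}
NEW = {"mainlib": 147_370, "replib": 27_844}

# Where the spectra in each NIST02 library came from.
SOURCES = {
    "mainlib": {"old mainlib": 91_856, "old replib": 1_331, "new": 54_183},
    "replib": {"old replib": 14_050, "old mainlib": 7_378, "new": 6_416},
}

# Old spectra left out of the new database, keyed by the old library.
EXCLUDED = {"mainlib": 8_652, "replib": 5_869}


def check_new_totals():
    """Each new library should equal the sum of its listed sources."""
    return {lib: sum(parts.values()) == NEW[lib] for lib, parts in SOURCES.items()}


def check_old_accounting():
    """Each old spectrum should be carried into a new library or excluded."""
    result = {}
    for lib in OLD:
        carried = sum(parts.get(f"old {lib}", 0) for parts in SOURCES.values())
        result[lib] = carried + EXCLUDED[lib] == OLD[lib]
    return result


print("new totals consistent:", check_new_totals())
print("old spectra accounted for:", check_old_accounting())
print("combined:", sum(OLD.values()), "=>", sum(NEW.values()))
```

The combined figures agree with the overall totals of 129,136 and 175,214 spectra.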
More Statistics
129,136  => 175,214 spectra
90,311    => 134,949 with CAS number
69,061    => 107,105 unique CAS numbers
107,829  => 147,350 structures
255,234  => 440,764 names
Peaks per Spectrum
79    => 99 median
96    => 111 average
12% => 5% with fewer than 20 peaks/spectrum
2%   => 0.5% with fewer than 10 peaks/spectrum
Distributors
Instrument Data Systems (15)
Agilent, Bruker, ThermoFinnigan, Hitachi, Inficon, JEOL, LECO, Los Gatos, Micromass, MSS, ONIX/Fisons, Perkin-Elmer, Shimadzu, Shrader, Varian/Bear
Software (21)
ACD, Aldrich, ARLS, Bio-Rad/Sadtler, Chemical Concepts, ChemSW, ChroMaSoft, CSS, Digital Data Management, Fiveash, Galactic, HD Science, Hiden, JEMS, KORE, Monitor Group, Pro-Lab, Axel Semrau, Spectra Seriea, SIS, Stanton
Data (3)
ERM, JAICI, Wiley
The NIST WebBook

http://webbook.nist.gov/
WebBook
Motivation
Data available
Quality of data
Data presentation
Usage patterns
Future directions
Motivation
Standard Reference Data Act
Provide access to NIST standard reference data over the internet
Existing NIST databases
Additional data from NIST
Data from outside NIST
Data Available (1)
Thermodynamic
Gas phase
Condensed phase
Phase change
Reaction
Ion energetics
Ion clustering
Fluid property models
Group additivity model
Data Available (2)
Other
IR spectrum
Mass spectrum
UV / Vis spectrum
Vibrational and electronic energy levels
Diatomic constants
Henry’s law
Thermochemical Properties
Many types of thermochemical data
Tool for data evaluation
Reference to original literature
Useful comments provided in many cases
Multiple values for many properties
Related properties (ΔfH ⇔ ΔrH)
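The link between enthalpies of formation and enthalpies of reaction is Hess's law: the reaction enthalpy is the stoichiometry-weighted sum of the formation enthalpies of the products minus that of the reactants. A minimal sketch follows, assuming rounded textbook values at 298.15 K rather than values taken from the WebBook. The function name and the data dictionary are illustrative only.

```python
# Approximate textbook enthalpies of formation in kJ/mol at 298.15 K.
# Assumption: rounded illustrative values, not copied from the WebBook.
DFH = {
    "CH4(g)": -74.8,
    "O2(g)": 0.0,
    "CO2(g)": -393.5,
    "H2O(l)": -285.8,
}


def reaction_enthalpy(reactants, products, dfh=DFH):
    """Hess's law: sum(nu * dfH, products) - sum(nu * dfH, reactants)."""
    def side_total(side):
        return sum(coeff * dfh[species] for species, coeff in side.items())
    return side_total(products) - side_total(reactants)


# Combustion of methane: CH4 + 2 O2 -> CO2 + 2 H2O(l)
drH = reaction_enthalpy(
    reactants={"CH4(g)": 1, "O2(g)": 2},
    products={"CO2(g)": 1, "H2O(l)": 2},
)
print(f"Reaction enthalpy: {drH:.1f} kJ/mol")
```

The same relation can be run in reverse: a measured reaction enthalpy plus known formation enthalpies for all but one species gives the missing formation enthalpy.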
Thermochemical Properties (1)
ΔfH° - gas, liq., cr.
S° - gas, liq., cr.
Cp - gas, liq., cr.
ΔH - phase change
ΔS - phase change
Boiling point
Melting point
Thermochemical Properties (2)
Tc
Pc
Antoine parameters
Heat of combustion
ΔH - reaction
ΔG - reaction
ΔS - reaction
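Antoine parameters describe vapor pressure as a function of temperature through log10(P) = A - B / (T + C). The sketch below evaluates that equation and its inverse. It assumes P in bar and T in kelvin, and the coefficients are illustrative values of roughly the magnitude seen for water, not verified data. Actual parameters and their valid temperature range should be taken from the database entry.

```python
import math


def antoine_pressure(T, A, B, C):
    """Vapor pressure from the Antoine equation log10(P) = A - B / (T + C)."""
    return 10.0 ** (A - B / (T + C))


def antoine_temperature(P, A, B, C):
    """Invert the Antoine equation: temperature at which vapor pressure is P."""
    return B / (A - math.log10(P)) - C


# Assumption: illustrative coefficients (P in bar, T in K), not verified data.
COEFFS = dict(A=4.6543, B=1435.264, C=-64.848)

for T in (300.0, 350.0, 373.15):
    P = antoine_pressure(T, **COEFFS)
    print(f"T = {T:7.2f} K  ->  P = {P:.4f} bar")

# Temperature at which the vapor pressure reaches 1 bar for these coefficients.
print(f"T(P = 1 bar) = {antoine_temperature(1.0, **COEFFS):.2f} K")
```

Antoine fits are only meaningful inside the temperature range over which the parameters were determined, so extrapolation should be treated with caution.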
Quality of Data
Most data from peer-reviewed literature
NIST QA efforts
Preserve data in original form (digits, units)
Compilers are qualified scientists
Inspect multiple determinations
Error report tracking system
Data Presentation
Tabular Data
Antoine Equation Parameters
IR Spectrum
Reaction Search
Reaction Data
Usage Patterns (1)
Steadily increasing usage over time
Over 20,000 “users” per week in non-holiday months
Large numbers of users return to the site
Used worldwide
Usage Patterns (2)
Future Directions
NIST retention index database (gas chromatography)
Links to more NIST data projects and sites
Support for the IUPAC Chemical Identifier (INChI)
Updates to existing databases
For a talk to be immortal
it does not have to be eternal.


Muriel Humphrey comment to
Vice-President Hubert Humphrey
Acknowledgements

Mass Spec Database:

Hank Fales (NIH), Bill Milne (NIH), Bill Budde (EPA), Steve Stein (NIST), Jane Klassen (NIST), Anzor Mikaya (NIST)

WebBook:
Peter Linstrom (NIST), Gary Mallard (NIST)