(This is the fourth in a series of articles aiming to promote
greater awareness of computer applications in the management,
dissemination, and use of chemical data. It describes a number
of computerized spectroscopy databases.)
This article is part of the efforts of CCDB to familiarize
the IUPAC membership with various aspects of computer activities in
chemistry. Previous articles have described online databases
(1), the Beilstein and Gmelin databases (2), and chemical
structure searching (3). This article will concentrate on the
spectroscopy databases listed in Table 1. Crystallographic
databases will not be covered here owing to space limitations.
For further details about these databases (Powder Diffraction,
NIST Single Crystal Data, Organic X-ray Crystal Data, and
Inorganic X-ray Crystal Data), please refer to the sources and
related information (4-8).
Table 1. Types of spectral databases covered in this article
   Mass Spectrometry
   Infrared
   Raman
   1H NMR
   11B NMR
   13C NMR
   15N NMR
   17O NMR
   19F NMR
   31P NMR
   ESR
Computer-based spectral databases are among both the oldest
and the most widely used computerized products in chemistry.
This article will give an overview of the field and highlight
some of the more interesting activities and sources of spectral
data. For more comprehensive treatments of this subject, the
reader is referred elsewhere (9,10). This article covers only
computer-readable databases available on magnetic tape, floppy
disk, or CD-ROM. No mention will be made of the many spectral
collections available in printed form.
The long-term and widespread use of computerized spectral
databases stems from their practical value in analytical and
organic chemistry laboratories, as well as from the ease with
which these data could be made available, retrievable, and
searchable. While the value of such databases seems clear, the
need has not been met to the satisfaction of many (11, 12, 13).
Isenhour (12) wrote of the frustration that he and his colleagues
felt at the lack of large, representative, high-quality spectral
databases that would enable further research in spectral search
and interpretation. Furthermore, Shelly (13) has written,
"Although many spectral databases have been created, few are of
high quality and many are useless". The reasons for this are
errors in the databases, incomplete data, and the lack of
structural information associated with a spectrum, all of which
inhibit further useful work in the areas of concern to Isenhour.
As we will see later in this article, there has been some
progress in the five years since these comments were made. As
these databases have continued to develop, the emphasis has moved
from the quantity of data to the quality of the data. That is,
the largest collections are not necessarily the best, nor the
most widely used. This point is discussed further below.
Infrared Data
Infrared data are the oldest type of spectral data available to
the chemical community. The original printed collections, such as
those from Sadtler, were recorded on prism and grating
instruments, which have lower resolution than modern FT-IR
instruments. As Lias has noted (9), many spectroscopists believe
that such older spectra are not adequate for good reference
databases. Examples of the new generation of IR data are the
Aldrich-Nicolet Digital FT-IR database of condensed phase
(liquid) spectra and the condensed phase Sigma-Aldrich
Biochemical Library, each of which contains about 12,000 spectra.
In addition, there is a vapor phase Aldrich FT-IR database of
5,000 spectra. Sadtler offers a collection of about 61,000
condensed phase spectra, as well as a vapor phase library of some
5,000 spectra. A smaller collection of 3,000 FT-IR spectra was
commissioned by the EPA in the 1980s for environmental analysis
studies, and NIST has more recently been undertaking experimental
work to expand this collection and provide good reference data.
The NCLI (National Chemical Laboratory for Industry, Tsukuba,
Japan) (14) has a database of about 26,000 unique IR spectra,
which grows to 60,000 spectra when multiple spectra of the same
chemical are included. Chemical Concepts also has a database of
some 28,000 IR spectra obtained from BASF. These are full FT-IR
spectra with connection tables.
Raman
The NCLI (14) has a database of about 3600 Raman spectra,
without connection tables.
Mass Spectrometry
Mass spectral data, primarily owing to their importance in
environmental analysis, have become the most widely used basis for
chemical substance identification. The regulatory power of the
US EPA has been the driving force behind this activity and behind
the development of what is now called the NIST/EPA/MSDC mass
spectral database (15). Because the quality of mass spectral data
has recently been discussed (16), it is instructive to comment on
this issue here. Over the past three decades a number of mass
spectral databases have been developed in the US, UK, Germany, and
Japan. At present, the two largest of these have been built
according to very different philosophies. A third database of
about 30,000 high-quality mass spectra from a number of Max Planck
Institutes in Germany, as well as from universities, is also
available with associated search software from Chemical Concepts
(17).
The John Wiley collection (18) attempts to collect any
available spectra, with no published acceptance criteria. This
database lacks the structural information on which Shelly has
commented (13) and does not contain any connection tables.
Moreover, one often finds only 2-5 peaks in a spectrum (some
17,000 out of almost 140,000 spectra). To compensate for the
missing data, the developer of the Wiley database generates
isotopic abundance peaks by computer. On the other hand, the NIST
collection of some 54,000 spectra, virtually all of which have
connection tables, consists primarily of reference spectra for
chemical analysis, many of which were obtained by the US
Government by running the mass spectra in the laboratory, thus
ensuring spectra that are both of high quality and complete. The
importance of this point is highlighted by the fact that almost
42% (some 58,000) of the spectra unique to the Wiley database
have fewer than 10 peaks, whereas this is true of only 3.5% of
the NIST spectra (16). The NIST scientists also found that a
spectrum unique to the NIST database contains, on average, nearly
five times as many peaks as a spectrum unique to the Wiley
database. In addition to the NIST data quality study, the Max
Planck Institute in Mülheim, Germany, is conducting its own
analysis of mass spectral data quality and has found similar
results. In summary, problems of data quality are not easily
discovered, and potential users of these spectral databases
should carefully examine their content and documentation before
deciding which database will meet their needs.
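The two quality measures quoted above, the share of spectra with fewer than 10 peaks and the average number of peaks per spectrum, are simple to compute once each library is in machine-readable form. The following Python sketch is only an illustration under stated assumptions: each spectrum is held as a list of (m/z, intensity) pairs keyed by a compound identifier such as a CAS Registry Number, "unique" simply means the identifier is absent from the other library, and the function names and toy libraries are hypothetical. The actual procedure used by the NIST scientists is described in (16).

```python
from statistics import mean

# A spectrum is a list of (m/z, relative intensity) peaks.
Spectrum = list[tuple[int, float]]
# A library maps a compound identifier (assumed here to be something like a
# CAS Registry Number) to its spectrum.
Library = dict[str, Spectrum]


def unique_to(a: Library, b: Library) -> Library:
    """Entries of library `a` whose identifiers do not occur in library `b`."""
    return {cid: peaks for cid, peaks in a.items() if cid not in b}


def fraction_sparse(lib: Library, min_peaks: int = 10) -> float:
    """Fraction of spectra in `lib` having fewer than `min_peaks` peaks."""
    if not lib:
        return 0.0
    return sum(len(peaks) < min_peaks for peaks in lib.values()) / len(lib)


def mean_peak_count(lib: Library) -> float:
    """Average number of peaks per spectrum (0.0 for an empty library)."""
    return float(mean(len(peaks) for peaks in lib.values())) if lib else 0.0


# Hypothetical toy libraries; identifiers and peak lists are invented.
lib_x: Library = {
    "cmpd-1": [(43, 100.0), (58, 25.0)],
    "cmpd-2": [(mz, 10.0) for mz in range(30, 50)],
    "cmpd-3": [(15, 5.0), (28, 100.0), (44, 40.0)],
}
lib_y: Library = {
    "cmpd-2": [(mz, 10.0) for mz in range(30, 50)],
    "cmpd-4": [(mz, 5.0) for mz in range(20, 60)],
}

only_x = unique_to(lib_x, lib_y)  # spectra found only in lib_x
only_y = unique_to(lib_y, lib_x)  # spectra found only in lib_y

print("sparse share:", fraction_sparse(only_x), fraction_sparse(only_y))
print("peak-count ratio:", mean_peak_count(only_y) / mean_peak_count(only_x))
```

With real collections the matching step is the difficult part: deciding that two entries describe the same compound depends on exactly the structural information and identifiers discussed above.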
NMR databases
While publicly available spectral databases exist for seven
different nuclei (1H, 11B, 13C, 15N, 17O, 19F, and 31P), the
discussion here will focus on the 13C NMR databases, as these data
are presently regarded as the most useful.
13C NMR
The premier collection, with over 100,000 13C NMR spectra, is the
so-called Bremser database, named after its original developer at
BASF (19). This database has been developed at BASF over the past
two decades and has been carefully evaluated and checked by BASF
scientists. The database also contains chemical structures, and
the chemical shifts have been assigned. CAS Registry Numbers have
recently been added to the database entries, further enhancing
its value and usefulness.
There is also a database of about 30,000 spectra and
corresponding connection tables available from Sadtler (20).
The German government has recently decided to sponsor a company,
Chemical Concepts (17), to take the Bremser BASF database,
combine it with spectral data from a wide variety of other sources
and with associated search and analysis software, and make the
entire compilation available to the chemical community. This
database, which continues to grow at a substantial rate, now
contains about 100,000 spectra, all of which have connection
tables in which each carbon atom is assigned a particular chemical
shift.
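To make the idea of a connection table with assigned shifts concrete, the Python sketch below stores the atoms and bonds of a molecule and attaches a 13C chemical shift to each carbon atom. The class and method names are hypothetical and do not reflect the record format of the Bremser, Sadtler, or Chemical Concepts databases; hydrogens are left implicit, and the ethanol shifts are rounded, approximate values used purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Atom:
    element: str
    shift_ppm: Optional[float] = None  # assigned 13C shift in ppm; None if unassigned


@dataclass
class ConnectionTable:
    atoms: list[Atom]
    bonds: list[tuple[int, int, int]]  # (atom index, atom index, bond order)

    def neighbors(self, i: int) -> list[int]:
        """Indices of the atoms bonded to atom i."""
        return [b if a == i else a for a, b, _ in self.bonds if i in (a, b)]

    def carbon_shifts(self) -> dict[int, float]:
        """Map each carbon atom index to its assigned chemical shift."""
        return {i: at.shift_ppm for i, at in enumerate(self.atoms)
                if at.element == "C" and at.shift_ppm is not None}


# Ethanol, CH3-CH2-OH, with hydrogens implicit.  The shifts are rounded,
# approximate values chosen only to illustrate per-atom assignment.
ethanol = ConnectionTable(
    atoms=[Atom("C", 18.0), Atom("C", 58.0), Atom("O")],
    bonds=[(0, 1, 1), (1, 2, 1)],
)

# Which assigned carbons are directly bonded to an oxygen atom?
o_bonded = {i: s for i, s in ethanol.carbon_shifts().items()
            if any(ethanol.atoms[j].element == "O" for j in ethanol.neighbors(i))}
print(o_bonded)
```

Because each shift is tied to a specific atom, a question such as "what shifts do carbons bonded to oxygen show?" can be answered directly from the stored data, which is not possible when a spectrum is stored without its structure.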
1H NMR
There are four major collections of computer-based 1H (proton)
NMR data, obtained mostly at 60 or 90 MHz: Chemical Concepts
(about 13,000 spectra); S. Sasaki, Japan (10,000 coded spectra and
4,000 digitized full spectra) (21); NCLI (about 6,500 spectra)
(14); and the Institute of Organic Chemistry, Novosibirsk, USSR
(about 50,000 spectra) (22).
11B NMR
Chemical Concepts (17) distributes a database of about 9000
spectra of 11B NMR data.
15N NMR
Chemical Concepts (17) distributes a database of about 1000
spectra of 15N NMR data.
17O NMR
Chemical Concepts (17) distributes a database of about 900
spectra of 17O NMR data.
19F NMR
Fraser-Williams (23) distributes a PC-searchable database of
about 10,000 spectra of 19F NMR data. The database can be searched
by both data and text. There is also a 19F NMR database of about
2,000 spectra available from Chemical Concepts (17).
31P NMR
Chemical Concepts (17) distributes a small database of about
2200 spectra of 31P NMR data.
ESR
NCLI (14) has created a database of about 1,300 ESR spectra.
It has not yet been made publicly available.
Integrated Systems
The BASF system, parts of which have been described earlier in
this article, is the one that Chemical Concepts has been refining
and improving into a viable and easy-to-use product for the
scientific community; it is called SpecInfo. The unique feature of
SpecInfo, as opposed to most other single spectral database
systems, is the integration of its software into a total solution
to a problem, using spectral identity and similarity searching,
spectral interpretation, spectral simulation, spectral
calculations, and structure determination. This is best
illustrated in Figure 1, which is a schematic of how data from a
number of different spectroscopies can be integrated into the
solution of a typical laboratory problem. Figure 2 shows the
status of the SpecInfo spectral databases.
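This article does not describe how SpecInfo scores spectral identity and similarity. As an assumed illustration only, one common way of comparing mass spectra treats each spectrum as a vector of intensities indexed by m/z and ranks library entries by the cosine of the angle between the query and each reference spectrum (a normalized dot product). The function names, library entries, and unknown spectrum below are hypothetical.

```python
import math

# A mass spectrum as a mapping from m/z to relative intensity.
Spectrum = dict[int, float]


def cosine_score(a: Spectrum, b: Spectrum) -> float:
    """Normalized dot product of two spectra; 0.0 if either is empty."""
    dot = sum(inten * b.get(mz, 0.0) for mz, inten in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: Spectrum, library: dict[str, Spectrum],
           top: int = 3) -> list[tuple[str, float]]:
    """Return the `top` library entries ranked by cosine score, best first."""
    scored = [(name, cosine_score(query, ref)) for name, ref in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]


# Hypothetical toy reference library and unknown spectrum.
library = {
    "ref-A": {43: 100.0, 58: 25.0},
    "ref-B": {31: 100.0, 45: 50.0, 46: 20.0},
    "ref-C": {43: 100.0, 58: 30.0, 71: 5.0},
}
unknown = {43: 100.0, 58: 26.0}

for name, score in search(unknown, library):
    print(f"{name}: {score:.3f}")
```

An identical spectrum scores 1.0 (up to rounding) and spectra with no peaks in common score 0.0. Practical library search programs commonly add mass and intensity weighting and other refinements that are omitted here.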
There are two important features of SpecInfo which should be
noted here. The first is the ability to add one's own spectra to
the database, so that an individual or laboratory can make use of
the powerful and varied analysis programs in SpecInfo. The second
is the policy of Chemical Concepts of providing universities with
a no-cost copy of the system in return for spectral contributions.
Summary
In summary, this article has described a number of
computer-readable databases in the area of spectroscopy, as well
as one integrated spectral system. The size and quality of these
databases vary considerably, as does their cost. Anyone
considering the use of these databases should first understand
the problems to be solved, so that the database chosen will indeed
provide the proper solution.
References
1. S. R. Heller, "Online Chemical Information", Chem. Int., 9, 136-138 (1987).
2. S. R. Heller, "Computer Databases of the Beilstein and Gmelin Institutes", Chem. Int., 11, 49-52 (1989).
3. S. R. Heller and D. E. Meyer, "Chemical Substructure Search Software for Personal Computers", Chem. Int., 12, no. 3, 89-94 (1990).
4. International Centre for Diffraction Data, 1601 Park Lane, Swarthmore, PA 19081 USA. The cost of the complete set on CD-ROM is $1,250 for those who already subscribe to the printed product.
5. National Institute of Standards and Technology, Office of Standard Reference Data, Building 221, Room A-325, Gaithersburg, MD 20899 USA. The cost of the database is $1000.
6. Cambridge Crystallographic Data Centre, Lensfield Road, Cambridge CB2 1EW, UK. The cost of the database is difficult to specify owing to the unique pricing and access policy of this data center.
7. Inorganic Crystal Data Center, University of Bonn, Institute of Inorganic Chemistry, Gerhard-Domagk Strasse 1, D-5300 Bonn 1, Germany. This database is available online via the STN and CAN/SND (Canada) networks.
8. A discussion of the above four databases as well as related databases and software systems can be found in "Crystallographic Databases", published by the Data Commission of the International Union of Crystallography, Chester, UK (1987).
9. S. G. Lias, "Numeric Databases for Chemical Analysis", J. Res. of the NIST, 94, 25-35 (1989).
10. W. A. Warr, "Spectral Databases", Chemometrics and Intelligent Lab. Sys., in press, 1991.
11. S. R. Heller and R. Potenzone Jr., "Computer Readable Analytical Chemical Data - Comments on a Critical Need", Trends in Anal. Chem., 2, 218-221 (1983).
12. T. Isenhour, "Spectroscopic Databases", J. Chem. Inf. Comput. Sci., 26, 2A (1986).
13. C. Shelly, "Problems That Prevent Computer-Assisted Structure Elucidation From Becoming a Practical Tool", pages 6-25, in Computer-Supported Spectroscopic Databases, Ed. J. Zupan, Ellis Horwood, UK (1986).
14. Dr. K. Tanabe, National Chemical Laboratory for Industry, 1-1 Higashi, Tsukuba, Ibaraki 305, Japan.
15. The database is available either on high-density floppy disks or on CD-ROM. The system program was written by Dr. Stephen E. Stein, National Institute of Standards and Technology, Office of Standard Reference Data, Building 221, Room A-325, Gaithersburg, MD 20899 USA. The database on floppy disks is available from NIST, OSRD, Physics Building, Room A323, Gaithersburg, MD 20899 for $1,050.00. The same database on CD-ROM is available (catalog # Z21,399-3) from Aldrich Chemical Company, 1001 West Saint Paul Avenue, Milwaukee, WI 53233 USA for $1,050.00.
16. S. E. Stein, P. Ausloos, and S. G. Lias, "Comparative Evaluations of Mass Spectral Databases", J. Amer. Soc. Mass Spectrom., 2, in press, (1991).
17. Chemical Concepts, Boschstrasse 12, PO Box 10 02 02, D-6940 Weinheim, Germany.
18. John Wiley & Sons, 605 Third Avenue, New York, NY 10158 USA.
19. W. Bremser, "Structure Elucidation and Artificial Intelligence", Angew. Chem., 100, 252-265 (1988).
20. Bio-Rad, Sadtler Division, 3316 Spring Garden Street, Philadelphia, PA 19104 USA.
21. Prof. S. Sasaki, Toyohashi University of Technology, 1-1 Hibarigaoka, Tempaku, Toyohashi 441, Japan.
22. Dr. B. Derendjaev, Institute of Organic Chemistry, Prospect Lavrentiev 9, Siberian Division of the USSR Academy of Sciences, 630090 Novosibirsk-90, USSR.
23. Fraser-Williams Ltd, London House, London Road South, Poynton, Cheshire, SK2 1NJ, UK.