Stephen R. Heller
Research Leader
USDA, ARS, Beltsville, MD 20705
As a recent American Biotechnology Laboratory editorial (1)
indicated, after over 150 years of printed inorganic (Gmelin) and
organic (Beilstein) handbooks, there are considerable computer
activities taking place in Germany that will have a significant
effect on scientists throughout the world. While the editorial
gave a very good, but brief description of the activities of the
Gmelin Institute, this article will describe the activities of
the institute with which Gmelin shares a building in Frankfurt,
West Germany: The Beilstein Institute.
While Gmelin is a decade older than the Beilstein Institute,
the Beilstein group has taken the lead in computerization
activities. As might be expected, while the Gmelin Handbook
presents information on more than 250,000 inorganic compounds,
the Beilstein Handbook of Organic Chemistry, comprising over 300
volumes, contains factual data on over three million organic
chemicals.
The Beilstein Handbook of Organic Chemistry is the premier
printed collection of important published data on the preparation
and properties of carbon compounds. It is, by far, the largest
collection of evaluated scientific data in the field of
chemistry. The Beilstein Handbook is produced by the nonprofit
Beilstein Institute and distributed by the German publisher
Springer-Verlag. The Beilstein Institute editors and staff
critically sift and correlate the data from the literature and
point out errors in the published data, providing the user with
much more than just extracted results of scientific publications.
Chemical Abstracts, which is purely bibliographic in nature,
performs no quality control or examination of results and, in
recent years, has primarily used authors' abstracts directly without
any review. The current Beilstein Handbook consists of over 300
printed volumes, covering from 1830 to 1979. The original work or
Basic Series (Hauptwerk in German) covered the literature from
1830 to 1909, the latter year being just about the time Chemical
Abstracts started its bibliographic abstracting and indexing
service. A number of Supplementary Series (Erganzungswerk in
German) have since been published. The first, E Series I runs
from 1910 to 1919, E II from 1920 to 1929, E m from 1930 to
1949, E IV from 1950 to 1959, and lastly, E IV runs from 1960 to
1979. In addition, there are some 27 Cumulative Indexes. Until
the start of the 5th Series, the entire handbook was written in
German,after which time the handbook was written in English, to
take into account the movement of the scientific community from
German to English as the primary language of scientific
communication.
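The series boundaries listed above lend themselves to a simple lookup table. The following is a minimal sketch in Python, not part of SANDRA or any Beilstein product. The name `series_for_year` is hypothetical, and the year ranges are simply those given in the text.

```python
# Literature periods covered by each Beilstein Handbook series,
# as stated in the article (H = Hauptwerk, E = Ergänzungswerk).
SERIES = [
    ("H", 1830, 1909),
    ("E I", 1910, 1919),
    ("E II", 1920, 1929),
    ("E III", 1930, 1949),
    ("E IV", 1950, 1959),
    ("E V", 1960, 1979),
]


def series_for_year(year):
    """Return the series covering a literature year, or None if outside 1830-1979."""
    for label, first, last in SERIES:
        if first <= year <= last:
            return label
    return None
```

Calling `series_for_year(1935)` would return the label for the third Supplementary Series, while a year after 1979 returns `None` because the printed Handbook does not yet cover it.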
SANDRA
With this background, it is easy to understand both the need of
chemists for the Beilstein Handbook and the great value offered
by a computer program that is able to quickly and accurately
indicate where, among the 300 volumes, data and information can
be found for a given organic chemical. The Beilstein Handbook is
well organized, but the structure of the ordering system is
difficult to learn, and easily forgotten. Thus the SANDRA program
is a tool every chemist dealing with properties of organic
chemicals should have available for use with a PC.
SANDRA (available from Springer-Verlag Publishers, New York, New
York), the acronym for Structure and Reference Analyzer, is an
IBM PC DOS-based program (a Macintosh version is not expected)
which takes a chemical structure the user draws easily on the PC
screen, and indicates where in the 300 volumes of the printed
Beilstein Handbook the referenced compound can be found. The program,
written at the Beilstein Institute, is a major advance in the
tools which the Institute has created to help the user locate a
chemical in the Beilstein Handbook. The highly structured
Beilstein ordering and indexing system has always been a handicap
to the ability of chemists to use the 300-odd volumes of the
Beilstein Handbook. Now, with one very well designed and easy-to-use program, it is possible for organic chemists and even non-chemists to easily find references in the Beilstein Handbook.
SANDRA is easy to install, and the manual gives extensive instructions on how to use the program. It takes only a few minutes to load the program on a hard disk and start it running. The program requires 256K memory, DOS 2.0 or higher, a Microsoft or equivalent mouse, and an IBM CGA or equivalent graphics adapter board with a resolution of 640 x 200 pixels. Version 1.0 of the program will not run with a Hercules graphics board, but a later version, due to be released by the end of 1987, will work with the Hercules board. The flexible graphical structure input is easy to learn. If a command is forgotten, the user need only touch the mouse pointer to the HELP COMMANDS box, and the list of commands is immediately displayed on the screen. There are 30 predefined structure templates, and the user's own templates can be created and stored.
Figure 1. Result of entering the amino-hydroxy aromatic ring, C11.H17.N.O, into the SANDRA program.
Figure 2. Final output of the SANDRA program analysis for the amino-hydroxy aromatic compound with pointer information and reference information.
A sample structure, C11.H17.N.O, shown after input, is
illustrated in Figure 1. Atoms can be labeled and numbered for
easier identification. The analysis is performed after an
acceptable structure is entered, by simply typing "Q" to quit the
structure entry and "2" to start the analysis part of the
program. The analysis normally takes from 2 to 6 sec, depending
on the computer used (IBM XT or AT, or Compaq 386) and the
complexity and size of the structure entered. The author was able
to draw a 70-atom molecule, which is the maximum number of atoms
allowed by the program. The correct pointers and page numbers for
the molecule were supplied in 12 seconds.
Figure 2 shows the output of the program after the analysis is
finished. The examples shown in Figures 1 and 2 are a direct
screen dump of what one sees on the screen, and were created
using the IBM PC DOS print screen keyboard function. The output
information in Figure 2 shows the value of the SANDRA program. A
number of pointers are shown in Figure 2. The first is the H-page
number (in this case 574-624), and then the Beilstein System
number (in this case 1855) is shown. The degree of unsaturation
(2n-6) and the carbon number (in this case 7, which turns out not
to be the same as the number of carbons in the molecule) are
further indicators to finding the exact page in the Beilstein
Handbook where this compound can be found. The information in the
bottom left-hand corner of Figure 2 shows the other Supplementary
volumes (E IV, 13/3 and E III, 13/3) where more recent
information on this chemical can be found. The lower right-hand
corner contains the molecular formula of the molecule.
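The "2n-6" pointer can be read as a general-formula label. The sketch below assumes, which this article does not spell out, that the label describes a hydrocarbon of formula CnH(2n+x) by its offset x. The function name `unsaturation_label` is hypothetical, and the code is illustrative only, not a reproduction of SANDRA's analysis.

```python
def unsaturation_label(carbons, hydrogens):
    """Label a hydrocarbon CnHm as '2n+x', where x = m - 2n.

    Assumption: the degree-of-unsaturation pointer follows the
    general-formula convention CnH(2n+x).
    """
    offset = hydrogens - 2 * carbons
    if offset == 0:
        return "2n"
    # The '+d' format spec always prints the sign, giving '2n-6' or '2n+2'.
    return f"2n{offset:+d}"
```

Under this reading, a seven-carbon hydrocarbon C7H8 gives `unsaturation_label(7, 8)`, which evaluates to "2n-6". That would be consistent with a carbon number of 7 being reported for a molecule that contains 11 carbons in total.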
Computer readable files
The second area of computer activity at the Beilstein Institute
is the creation of two databases, a Structure File and a Factual
File. The Structure File will consist, eventually, of about three
million chemical structures, and will provide a complete
topological structure representation for each chemical. This file
will consist of Beilstein Registry Connection Tables (BRCT), the
largest collection of complete structure representations ever
compiled. The BRCT will contain stereochemical information on
organic molecules that the Chemical Abstracts database (of over
eight million chemicals) does not contain. The BRCT will contain
a number of fields, but details of the structure record can be
found elsewhere (2). The entire structure file on computer tape will
be available for lease in 1988 with a number of existing
structure search software systems, such as the French DARC
system, the Hungarian HTSS software, and the Molecular Design
MACCS software.
The Beilstein Factual File will contain over 7.5 million factual records
for organic compounds dating back to 1830. More than 400 fields
of information exist in the Factual File, with more than 60 of
these being numeric data fields. The database will contain all
the physical and chemical properties relevant to the compounds in
the database. Each property will have a literature citation. The
entire Factual File will be available in 1988, but only on-line.
At present, the Beilstein Institute does not plan to lease the
Factual File. The database will be available first on the
Lockheed DIALOG system, and later on the STN Network. The delay
in the latter version is due to the lack of appropriate software
to perform numeric data searching. It is probable that additional
on-line vendors will make the Factual File available at a later
date. The decision as to what software DIALOG will use for
structure searching has not yet been finalized; however, the data
searching software is expected to be an enhanced version of the
current DIALOG search software.
The actual factual database will consist of two parts, evaluated
data and non-evaluated data. All data from the 1830s through 1979
(corresponding to the printed Beilstein Handbook H, E-I, E-II, E-III, E-III/IV, E-IV, and E-V Series) will be critically evaluated
before going on-line in the DIALOG system. From 1980 onward, the
data will first be put on-line in a non-evaluated form, to be
replaced as the data are critically evaluated by the Beilstein
scientists. This will update the Beilstein database. A further
distinction will be made in the critically evaluated data that
will be available. For the evaluated data, there will be two
types of compounds, the Large Information Compounds (LIC) and the
Small Information Compounds (SIC). LICs, which comprise just a
few percent of the entire database (probably less than 5% of the
total chemicals in the database), are chemicals which are very
important in the fields of chemistry, biochemistry,
pharmaceuticals, and agrochemistry. For these LICs, the on-line
Beilstein database on DIALOG will have only a subset of all the
information available for the compound. However, the Beilstein
Handbook will continue to have all the information for each LIC.
For the SICs all the information in the Beilstein Handbook
will be available in the DIALOG on-line version.
The search capabilities of the system on DIALOG will allow for the
type of searches described by Andersen in his editorial (1).
Thus, one will be able to enter a melting point (or boiling
point) and a second property, such as a density of 0.8852, and
quickly get a list of all chemicals that meet these criteria,
along with the relevant literature citations. If there are
still too many compounds which satisfy these criteria, then a
third criterion could be added to further narrow down the
search to a reasonable number of possible answers.
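The successive narrowing described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the DIALOG or STN search software. The `Record` fields, the `search` function, the tolerances, and every data value below are invented for the example and are not Beilstein data.

```python
from dataclasses import dataclass


@dataclass
class Record:
    name: str
    melting_point_c: float
    density: float
    refractive_index: float
    citation: str


def search(records, criteria):
    """Return records matching every (field, target, tolerance) criterion."""
    return [
        r for r in records
        if all(abs(getattr(r, field) - target) <= tol
               for field, target, tol in criteria)
    ]


# Invented records, for illustration only.
RECORDS = [
    Record("compound A", 5.5, 0.8852, 1.501, "hypothetical citation 1"),
    Record("compound B", 5.6, 0.8850, 1.442, "hypothetical citation 2"),
    Record("compound C", 80.0, 0.8852, 1.501, "hypothetical citation 3"),
]

# Two criteria: a melting point and a density.
two_criteria = [("melting_point_c", 5.5, 0.5), ("density", 0.8852, 0.001)]
first_pass = search(RECORDS, two_criteria)

# Too many hits? Add a third criterion to narrow the answer set.
second_pass = search(RECORDS, two_criteria + [("refractive_index", 1.501, 0.005)])
```

Here the first pass keeps compounds A and B, and the added third criterion leaves only compound A. Each surviving record carries its citation, mirroring the promise that every property will have a literature reference.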
While the cost to the user for the DIALOG and STN on-line
versions has yet to be established, the Beilstein Institute
and Springer-Verlag have stated that those who do subscribe
and continue to subscribe to the printed Handbook editions
will receive a discount on their on-line use.
Summary
With the activities of the Max Planck Society, Gmelin Institute, and the Beilstein Institute, the chemical community will be provided with a vast treasure of high quality, evaluated chemical and physical property data on millions of organic and inorganic chemicals in computer-readable form. This will enable scientists to easily obtain, manipulate, and correlate data in a manner never before possible.
References
1. ANDERSEN, H.C., Am. Biotech. Lab. 5 (1), 4-6 (1987).
2. JOCHUM, C., WITTIG, G., and WELFORD, S., "Search possibilities
depend on the data structure: The Beilstein facts," in Proceedings
of the 10th International Online Meeting (London), December
1986 (Learned Information, Medford, New Jersey, 1986), 43-52.