The Beilstein Online Database
Stephen R. Heller
U.S. Department of Agriculture
Agricultural Research Service
Northeastern Region, Beltsville, MD 20705
One of the most exciting events in chemical information has been the conversion of
the Beilstein Handbook of Organic Chemistry (1), commonly referred to as Beilstein,
into computer-readable form and its availability as an online database. This event is
quite recent and has generated considerable interest in the chemical community.
Because no broad-based description of the online Beilstein database exists, a book
on this subject will be of value to various groups in the chemical community,
including organic chemists, information specialists, and those in academe. This
book, based on a symposium (2), is the first collection of papers discussing the
important aspects of the computer-based version of the Beilstein Handbook.
The Beilstein Handbook, the most complete and systematic collection of evaluated
data on organic compounds, consists of over 350 printed volumes comprising more
than 275,000 pages of text. A chemical is included in Beilstein if it satisfies the
following three requirements. First, the chemical must be an organic compound.
Second, it must have a known, verified structure and must be able to exist as a pure
compound. Third, a fully described method of preparation and some published
physical or chemical data about the compound must be available. Entries in the
Beilstein Handbook come from the chemical literature including journals, from
patents, and from monographs.
To appreciate the enormous task involved in making the Beilstein database
available online, a brief history is in order. Beilstein, or more accurately, the standard
reference work known today as Beilsteins Handbuch der Organischen Chemie, is a
descendant of the original Handbuch, whose first edition was created by Friedrich
Konrad Beilstein in St. Petersburg in 1881.
Professor Beilstein, born in St. Petersburg, Russia, of German parents in 1838,
was educated at the Universities of Heidelberg, Munich, and Goettingen and assumed
a professorship at the Imperial Technical Institute in St. Petersburg in 1866. The first
edition of Beilsteins Handbuch der Organischen Chemie was published in 1881-1882
and consisted of two volumes totaling 2200 pages, in which about 15,000 organic
compounds were described. Beilstein published a second edition (three volumes, 4080 pages)
between 1885 and 1889 and a third edition (eight volumes, 11,000 pages) between 1892 and
1906, the year of his death.
The size and scope of the Handbook were such that it could no longer be managed by one individual, and accordingly, the German Chemical Society undertook this responsibility after Beilstein's death. The publication of the current edition of Beilstein, the fourth edition, began in 1918, under the editorship of P. Jacobson and B. Prager. In 1933, F. Richter was named as editor, and he was followed in 1961 by H.-G. Boit. The current editor, R. Luckenbach, succeeded Professor Boit in 1978. The Beilstein Handbook is distributed by Springer Verlag publishers (3).
The fourth edition of Beilstein is the basis of the Beilstein Online database. It consists of a main
work (Hauptwerk) and five supplementary series (Ergaenzungswerke). The combined supplement
E III/IV is not regarded as a separate supplement. Each of these supplementary series covers
a different time period, as shown in Table I. The basic work and the first four supplements were
published in German, but the fifth supplement is in English, as will be all future supplements.
Table I. Organization of the Beilstein Handbook Fourth Edition
|Period of Literature Covered
|Supplement III, IV
1 Volumes 17-27 of the Supplemental Series III and IV, covering the heterocyclic compounds,
were combined into a joint issue as a result of some disruptions occurring in Germany during the
The main work and each of the supplementary series consist of 27 "volumes," each of which may
be one or more physical books. Which compounds appear in which volume is determined by the
compound's chemical structure, as described in the Beilstein classification system procedures.
Table II shows the main divisions of the Beilstein Handbook.
Table II. Main Divisions of the Beilstein Handbook
|1. Acyclic compounds
|2. Isocyclic or carbocyclic compounds
|3. Heterocyclic compounds
A central feature of the Handbook, therefore, is the way in which it is organized. A specific
structure will be found at essentially the same place in any of the supplementary series. In the
past, determining that location from the structure required a knowledge of the Beilstein
systematic rules for filing. These rules are well described in a brochure published by the Beilstein
Institute and entitled "How to Use Beilstein," which is available from either the Beilstein
Institute or Springer Verlag publishers (1, 3).
A given compound will appear in the same volume in each series; thus thiophene is found in
Volume 17 of the main work and Volume 17 of each of the supplementary series, because
Volume 17 is devoted to heterocyclics containing one chalcogen (Group VI: O, S, Se, or Te).
The main consequence of this organization is that the Beilstein Handbook, rather than being a
linear chronological record, is really a series of "snapshots," taken at 10- or 20-year intervals, of
the entire organic chemical world.
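The correspondence between volume number and main division can be written down as a small lookup. The following is only an illustrative sketch: the volume ranges are the ones given in Table III (Volumes 1-4 acyclic, 5-16 isocyclic, 17-27 heterocyclic), and the function name main_division is a hypothetical choice, not part of any Beilstein software.

```python
def main_division(volume: int) -> str:
    """Return the main division of the Handbook for a volume number (1-27).

    Ranges follow Table III of this chapter; the function itself is a
    hypothetical helper written for illustration only.
    """
    if not 1 <= volume <= 27:
        raise ValueError(f"Handbook volumes run from 1 to 27, got {volume}")
    if volume <= 4:
        return "acyclic"
    if volume <= 16:
        return "isocyclic"
    return "heterocyclic"


# The same volume number applies in the main work and in every
# supplementary series, so one lookup serves all of them.
print(main_division(17))  # thiophene is filed in Volume 17
```

Because the filing position depends only on structure, the lookup does not need to know which series is being consulted.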
Today, one can use the computer program SANDRA (Structure and Reference Analyzer) (3) to
locate a system number when looking up a compound in the Handbook. Alternatively, the
Lawson Number (see Chapter 10) can be used with the online system. When a chemical is
retrieved from Beilstein Online, the Beilstein citations, typically one per series, are provided.
These citations are in the form 4-17-00-00093, which indicates page 93 of Volume 17 of the
fourth supplementary series.
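A citation of this form is easy to take apart by program. The sketch below assumes only what is stated above: the first field is the series, the second the volume, and the last the page. The meaning of the third field is not explained in this chapter, so it is kept verbatim under the hypothetical name middle. The names BeilsteinCitation and parse_citation are likewise illustrative.

```python
import re
from typing import NamedTuple


class BeilsteinCitation(NamedTuple):
    series: int   # 4 = fourth supplementary series, per the text
    volume: int   # Handbook volume number
    middle: str   # third field; meaning not stated in the text, kept as-is
    page: int     # page number, leading zeros dropped


# Four groups of digits separated by hyphens; spaces around hyphens allowed.
_CITATION = re.compile(r"^\s*(\d+)\s*-\s*(\d+)\s*-\s*(\d+)\s*-\s*(\d+)\s*$")


def parse_citation(text: str) -> BeilsteinCitation:
    """Split a citation such as '4-17-00-00093' into its fields."""
    match = _CITATION.match(text)
    if match is None:
        raise ValueError(f"not a four-field citation: {text!r}")
    series, volume, middle, page = match.groups()
    return BeilsteinCitation(int(series), int(volume), middle, int(page))


cite = parse_citation("4-17-00-00093")
print(cite.series, cite.volume, cite.page)
```

Since a retrieved compound typically carries one citation per series, a list of such parsed records can be sorted by series to follow a compound through the successive "snapshots."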
The heart of the Beilstein Handbook is the factual information associated with each compound.
Each published series contains new information about a compound, which means that a searcher
may have to look into a number of volumes to find all the information on a given compound. In
contrast, in the online database, data from the main work and all the supplementary series are
combined to form a record that encompasses all the available knowledge about a structure. The
online version includes both the complete records (Full File), which contain evaluated data,
and the incomplete records (Short File), which contain unevaluated information. The
incomplete records provide a more up-to-date database for searching. Each full record of a
compound in the Beilstein database contains the following information:
• Identification, structure, and configuration
• Natural occurrence and isolation from natural products
• Preparation and purification
• Physical properties
• Chemical properties
• Analytical and characterization data
• Related salts and addition compounds
The number of types of data that may be available for a compound can exceed 200, as is the
case for pyridine. Each of these fields may be searched and displayed. However, very few
compounds actually have many of these fields filled in their records.
The exact details of searching and retrieval vary among the different online host implementations
of the Beilstein database. At present Beilstein is available online on STN (4, 5) and DIALOG (6),
and by mid-1990, it will be available through the Maxwell Online ORBIT system (7).
In addition to all the factual data, which are extensively reviewed and cross-checked before they are added to the database, a record is kept of the source of the data. A literature citation for every
measurement of a datum is provided. It is possible also to search through these literature
citations, but only on the basis of the first author's surname.
Implementation of the Database
In contrast to the weekly updates of the CAS (Chemical Abstracts Service) Registry file, the
Beilstein Structure and Factual Database files are currently being updated on a much less
frequent basis. In addition, the entire Beilstein Handbook is not yet available online. The
schedule for implementation of the computer-readable form of the database is indicated in Table III.
Table III. Implementation of Beilstein Online Database
|Class of Compounds
|Heterocyclic (Volumes 17 - 27) (Full File)
|Heterocyclic (Volumes 17 - 27) (Short File or excerpts)
|Acyclic (Volumes 1-4)
|Isocyclic (Volumes 5 - 16) (Short File or excerpts)
|Isocyclic (Volumes 5 - 16)
Online Access to Beilstein
As indicated in Table III, the first part of the database to be mounted and made searchable online was the Full File on heterocyclic compounds, Volumes 17-27 of the Beilstein Handbook. The period of the literature covered was from 1830 to 1959. Since then, the so-called Short File, or excerpts, of additional information and additional heterocyclic
compounds has been made available online. The Short File, or excerpts, includes
compounds from the scientific literature from 1960 to 1979 that have yet to be
critically reviewed and evaluated. These compounds correspond to the organic
chemicals that are found in the printed Beilstein Handbook Supplementary Series
V. The initial Full File of heterocyclic compounds compiled from 1830 to 1959
encompassed about 350,000 compounds. The Short File compiled from 1960 to
1979 added 2.65 million compounds to the database, to bring the total
to a little over 3 million compounds. At this time, neither the Full File nor the Short File
contains any salts of the compounds.
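The compound counts quoted above can be checked with a few lines of arithmetic. This is merely a worked restatement of the figures in the text (both of which are approximate); the variable names are illustrative.

```python
# Approximate counts quoted in the text for the heterocyclic portion.
full_file = 350_000      # evaluated records, literature 1830-1959
short_file = 2_650_000   # unevaluated records, literature 1960-1979

total = full_file + short_file
short_share = short_file / total  # fraction of records not yet evaluated

print(f"total compounds: {total:,}")
print(f"share in Short File: {short_share:.1%}")
```

On these rounded figures, nearly nine out of ten records in the database at this stage come from the unevaluated Short File.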
Since the end of 1988, the first part of the Beilstein database has been available
online through the STN International system. The actual STN host computer on
which the database is mounted is in Karlsruhe, the Federal Republic of Germany
(4). At the end of 1989, the Beilstein database became available online on the
DIALOG system. By mid-1990 it will also be available through the Maxwell
Online ORBIT system. All three systems are provided with the same database
and chemical structures on computer tapes by the Beilstein Institute in
Frankfurt/Main. Details of the implementations of the STN and DIALOG versions
of Beilstein Online can be found in Chapters 3 and 4, respectively. The following
sections offer some overall comments on the three systems.
Each of the online vendors has different data and text search software. To
handle the types of numeric and factual data found in the Beilstein database, all
three vendors had to develop additional search software capabilities. As with
bibliographic searching, the differences in these three systems are both objective
and subjective. It is not the purpose of this chapter to evaluate the search software
of these three vendors. Indeed it would be of little value, because searchers tend to
have their own criteria or requirements for searching a particular vendor or vendors.
However, an overall view of the systems available from each vendor is given
so that, should a particular feature be of importance to a user, it can be properly
identified.
The main difference among the three vendors is the kind of structure-searching
and display software used for exact and substructure searching. The three
systems are summarized in Table IV.
Table IV. Chemical Structure Searching Software for Beilstein Online
|First Commercial Use
It is interesting that none of the vendors chose the DARC software, available
on the Questel online system, which was one of the earliest structure-searching
software systems to be available commercially (first used commercially in 1980).
The NIH/EPA CIS SANSS (10) software (first made commercially available in
1977) was also not chosen by any vendor for this database.
From the viewpoint of the end user or the online searcher, the variety of software packages for structure searching offers a unique opportunity to examine the different algorithms and methods of structure searching, and it is possible that, for some types of searching, one system may prove superior to another. The availability of these three systems will undoubtedly provide a
fertile ground for a number of interesting studies on the structure-searching capabilities of
different software algorithms.
The cost of online searching has been rising over the years. As hourly royalties have climbed from some $4 in the early 1970s to what is now well over $100, users have seen their online search system budgets increase at a rate much faster than that of inflation in most countries. The cost of searching bibliographic databases is now about $135 per hour, on the basis of 1989 prices from the three vendors who will be providing Beilstein Online. Structure searching of the CAS file of more than 9
million connection tables costs about $250-300 per hour.
The current cost of the Beilstein Online factual and structure files averages $225-250 per hour for both STN and DIALOG, although the cost can go considerably higher, depending on the exact nature of what one does on the system. The system on STN has various connect time, search term, and other charges, so the average cost will depend on the exact type of search conducted. The DIALOG system charges a fixed hourly connect time price, and the cost
of using it for data searching is thus independent of the type of search.
However, some additional fees may be charged for printing out certain records and structures. ORBIT has chosen a third and different method of charging. However, as one vendor candidly pointed out to a user grumbling about the cost of Beilstein Online, all the vendors really charge essentially the same fees but use different algorithms. The reason for this is simple. The
contracts each of the three vendors signed, which allow them to make the Beilstein database
available, require a certain royalty revenue for usage. Thus, practically speaking, $250 per hour
is the most likely cost you will incur on any of the three systems.
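For budgeting purposes, the hourly figures above translate into a per-session estimate as follows. This is a rough sketch only: it treats the charge as a flat hourly rate, which the text says is a practical approximation rather than any vendor's actual pricing algorithm, and the function name estimate_search_cost is hypothetical.

```python
def estimate_search_cost(minutes: float, hourly_rate: float = 250.0) -> float:
    """Estimate the cost in dollars of an online session.

    Assumes a flat hourly rate (default $250, the figure quoted in the
    text); real vendor charges also include per-term and print fees.
    """
    if minutes < 0:
        raise ValueError("session length cannot be negative")
    return round(minutes * hourly_rate / 60.0, 2)


# A 20-minute session on Beilstein Online versus a bibliographic
# database at the $135 per hour quoted for 1989.
print(estimate_search_cost(20))         # Beilstein Online
print(estimate_search_cost(20, 135.0))  # bibliographic database
```

The difference between the two estimates is the premium paid for evaluated data, which is the subject of the cost-benefit argument that follows.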
Although this amount may, at first glance, be regarded as high, the value for the money, or the
cost-benefit analysis, is considered to be well worth the price, because in a bibliographic
database, one must first perform a search and then look up the references or literature citations to
find the actual data desired. The extra cost of using Beilstein Online pays for the "preprocessing"
of each literature citation, which involved the intellectual extraction and evaluation of the data in
the original article. Furthermore, the data are then compared with other values found in other
published articles for the same property of the chemical. The cost of this massive, high-level, high-quality,
and very labor-intensive effort, which the Institute must pay for in salaries for its Ph.D. chemists, must
be recovered in some way. Although the government of the Federal Republic of Germany has
provided subsidies over the years, in the near future the Beilstein Online database must recover
its obviously high costs from its only real source, the user community.
Outline of This Book
This book consists of 10 chapters. This first chapter is an introduction to and an
overview of the Beilstein Online database and provides a short description of the
computer systems and related search software that were operational as of the date
of the symposium. The next chapter, by Clemens Jochum, the president of the
Beilstein Institute computer division, details how the database was assembled, its
current status, and the future of the computer activities at the Beilstein Institute.
Chapters 3 and 4 describe the specific implementations of the two currently operational systems, STN and DIALOG, respectively. The STN chapter, "The STN Implementation of the Beilstein Factual and Structure Databases," was written by Andreas Barth, the person responsible for the implementation of the system for STN. The chapter describes the types of information in the database on STN and gives numerous search examples. The DIALOG chapter, "Beilstein on
DIALOG," written by Kathy Haglund, the person responsible for the implementation of the system for DIALOG, follows a similar pattern, with examples of how one performs a search on DIALOG. This chapter explains how DIALOG has indexed the Beilstein database and how it can be searched. One nice feature of the DIALOG implementation is the use of the S search operator to do a combined search for a property with a related refinement, such as boiling point
at a given pressure or heat of formation at a specific temperature. In addition, the chapter
describes how structures can be created, searched, and displayed by using a
combination of the MOLKICK (11) program or the ROSDAL linear-notation structure strings,
S4 structure searching, and the DIALOG GEOFF (Graphics Enhanced Output File Format)
structure display program.
Besides the introduction in Chapter 4, little published information is available about the newly released S4 chemical structure search software system that DIALOG is using. This system is available for the first time to the online community. Chapter 5 is devoted to an extended discussion of the S4 system. Included in this chapter is a discussion of the MOLKICK (11) PC-based software package that allows one to easily draw structures for up-loading to the DIALOG mainframe computer for structure searching. The Wiswesser-type linear-notation
scheme used by the MOLKICK program, the ROSDAL (Representation of Organic Structure
Descriptions Arranged Linearly) string, which is how structures are entered as queries into the S4
search system, is briefly mentioned.
Chapter 6 gives an industrial view from a large pharmaceutical company on how they use the Beilstein Online database. The examples in this chapter relate to using the Beilstein Online database as an aid in drug design research. A novel application of the use of the Lawson Number
(described in detail in Chapter 10) is also presented.
The next presentation, and by far the longest in this book, is by Damon Ridley. Chapter 7
describes the very valuable chemical reaction information that is part of the Beilstein Online
database, with examples of how one can search for this information.
Chapter 8 examines another area of strength of the Beilstein Online database, the physical property data in the system. In his second chapter in this book, Andreas Barth gives a detailed account of data range searching for numeric data and what range searching really means as
implemented on STN. A number of figures in the chapter give examples showing whether a particular search gives a hit or a miss. The more sophisticated data searches are explained. From this discussion, the reader will be able to understand better the differences between checked (evaluated) and unchecked data in the database. The very useful and important
ability of the STN system to accept different units and to convert units within the system is also
described.
Chapter 9 is a view from academia relating the experience of an organic chemist in a chemistry
department and of a librarian in a chemical engineering department, both of whom have been
using Beilstein Online since it was first available.
Finally, Chapter 10 describes a valuable and new tool for structure similarity searching in
Beilstein Online. The Lawson Number (LN), the computer system equivalent of the Beilstein
classification system number, is valuable for many types of searches for particular types of
organic chemical structures. This chapter very clearly explains the significance of the LN, why
compounds have more than one LN, and how the new version of the SANDRA program (3)
allows one to create the LNs for any structure drawn.
1. Beilstein Institute, Varrentrappstrasse 40-42, D 6000 Frankfurt/M 90, Federal
Republic of Germany. The Beilstein Help Desk phone number is 49-69-7917-258. The FAX
number is 49-69-7917-492.
2. "Experiences with Beilstein Online," sponsored by the Division of Computers in
Chemistry of the American Chemical Society, Fall ACS Meeting, Miami, FL,
September 12, 1989.
3. a) Springer Verlag New York, 175 Fifth Avenue, New York, NY 10010. The
Beilstein phone number is (212) 460-1622. The FAX number is (212) 473-6272.
b) Springer Verlag, Tiergartenstrasse 17, D-6900 Heidelberg, Federal Republic of Germany. The
phone number is 49-6221-487-457. The FAX number is 49-6221-43982.
4. STN International, Postfach 2465, D 7500 Karlsruhe 1, Federal
Republic of Germany. The phone number is 49-7247-82-4566.
5. STN International, CAS, PO Box 02228, Columbus, OH 43210. The
phone number is (800) 848-6538 or (614) 447-3600. The FAX
numbers are (614) 447-3709 or (614) 447-3713.
6. DIALOG Information Services, 3460 Hillview Avenue, Palo Alto, CA
94304. The phone number is (800) 334-2564 or (415) 858-2700.
7. Maxwell Online ORBIT Search Service, 8000 Westpark Drive, McLean, VA 22102. The
phone number is (800) 456-7248 or (703) 442-0900. The FAX number is (703) 893-4632.
8. S4 is available from Softron GmbH, Rudolph Diesel Strasse 1, D 8032 Graefelfing, Federal
Republic of Germany. The phone number is 49-89-855-056. The FAX number is
9. HTSS is available from ORAC Ltd., 18 Blenheim Terrace, Woodhouse Lane, Leeds LS2 9HD,
United Kingdom. The phone number is 44-532-441-821. The FAX number is 44-532-448-283.
10. G. W. A. Milne, S. R. Heller, A. E. Fein, E. F. Frees, R. G. Marquart, J. A. McGill, and J. A.
Miller, "The NIH/EPA Structure and Nomenclature Search System (SANSS)," J. Chem. Inf.
Comput. Sci., 1978, 18, 181-185.
11. MOLKICK is available from Springer Verlag. See references 3a and 3b.
RECEIVED February 10, 1990