The Beilstein Online Database

An Introduction



Stephen R. Heller

U.S. Department of Agriculture

Agricultural Research Service

Northeastern Region, Beltsville, MD 20705



One of the most exciting events in chemical information has been the conversion of the Beilstein Handbook of Organic Chemistry (1), commonly referred to as Beilstein, into computer-readable form and its availability as an online database. This event is quite recent and has generated considerable interest in the chemical community. Because no broad-based description of the online Beilstein database exists, a book on this subject will be of value to various groups in the chemical community, including organic chemists, information specialists, and those in academe. This book, based on a symposium (2), is the first collection of papers discussing the important aspects of the computer-based version of the Beilstein Handbook.

The Beilstein Handbook, the most complete and systematic collection of evaluated data on organic compounds, consists of over 350 printed volumes comprising more than 275,000 pages of text. A chemical is included in Beilstein if it satisfies the following three requirements. First, the chemical must be an organic compound. Second, it must have a known, verified structure and must be able to exist as a pure compound. Third, a fully described method of preparation and some published physical or chemical data about the compound must be available. Entries in the Beilstein Handbook come from the chemical literature, including journals, patents, and monographs.

To appreciate the enormous task involved in making the Beilstein database available online, a brief history is in order. Beilstein, or more accurately, the standard reference work known today as Beilsteins Handbuch der Organischen Chemie, is a descendant of the original Handbuch, whose first edition was created by Friedrich Konrad Beilstein in St. Petersburg in 1881.

Professor Beilstein, born in St. Petersburg, Russia, of German parents in 1838, was educated at the Universities of Heidelberg, Munich, and Goettingen and assumed a professorship at the Imperial Technical Institute in St. Petersburg in 1866. The first edition of Beilsteins Handbuch der Organischen Chemie was published in 1881-1882 and consisted of two volumes for a total of 2200 pages, in which about 15,000 organic compounds were described. Beilstein published a second edition (three volumes, 4080 pages) between 1885 and 1889 and a third edition (eight volumes, 11,000 pages) between 1892 and 1906, the year of his death.

The size and scope of the Handbook were such that it could no longer be managed by one individual, and accordingly, the German Chemical Society undertook this responsibility after Beilstein's death. The publication of the current edition of Beilstein, the fourth edition, began in 1918, under the editorship of P. Jacobson and B. Prager. In 1933, F. Richter was named as editor, and he was followed in 1961 by H.-G. Boit. The current editor, R. Luckenbach, succeeded Professor Boit in 1978. The Beilstein Handbook is distributed by Springer Verlag publishers (3).

The fourth edition of Beilstein is the basis of the Beilstein Online database. It consists of a main work (Hauptwerk) and five supplementary series (Ergaenzungswerke). The combined supplement E III/IV is not regarded as a separate supplement. Each of these supplementary series covers different time periods, as shown in Table I. The basic work and the first four supplements were published in German, but the fifth supplement is in English, as will be all future supplements.

Table I. Organization of the Beilstein Handbook Fourth Edition



Series               Period of Literature Covered   Abbreviation
Basic Series         1830-1909                      H
Supplement I         1910-1919                      E I
Supplement II        1920-1929                      E II
Supplement III       1930-1949                      E III
Supplement III, IV   1930-1959                      E III/IV*
Supplement IV        1950-1959                      E IV
Supplement V         1960-1979                      E V

* Volumes 17-27 of the Supplemental Series III and IV, covering the heterocyclic compounds, were combined into a joint issue as a result of some disruptions occurring in Germany during the period involved.

The main work and each of the supplementary series consist of 27 "volumes," each of which may be one or more physical books. Which compounds appear in which volume is determined by the compound's chemical structure, as described in the Beilstein classification system procedures. Table II shows the main divisions of the Beilstein Handbook.

Table II. Main Divisions of the Beilstein Handbook

Compound Group                           Volume Numbers
1. Acyclic compounds                     1-4
2. Isocyclic or carbocyclic compounds    5-16
3. Heterocyclic compounds                17-27

A central feature of the Handbook, therefore, is the way in which it is organized. A specific structure will be found at essentially the same place in any of the supplementary series. In the past, determining that location from the structure required a knowledge of the Beilstein systematic rules for filing. These rules are well described in a brochure published by the Beilstein Institute and entitled "How to Use Beilstein," which is available from either the Beilstein Institute or Springer Verlag publishers (1, 3).

A given compound will appear in the same volume in each series; thus thiophene is found in Volume 17 of the main work and Volume 17 of each of the supplementary series, because Volume 17 is devoted to heterocyclics containing one chalcogen (Group VI: O, S, Se, or Te) heteroatom.
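The volume ranges in Table II amount to a simple lookup from volume number to compound group. The following is a minimal sketch of that lookup only; the function name `main_division` is illustrative and is not part of any Beilstein software, and the finer filing rules that place a structure within a volume are not modeled.

```python
# Volume ranges and group names taken from Table II.
MAIN_DIVISIONS = [
    (range(1, 5), "Acyclic compounds"),
    (range(5, 17), "Isocyclic or carbocyclic compounds"),
    (range(17, 28), "Heterocyclic compounds"),
]


def main_division(volume: int) -> str:
    """Return the Table II compound group for a Handbook volume number."""
    for volumes, group in MAIN_DIVISIONS:
        if volume in volumes:
            return group
    raise ValueError(f"volume must be between 1 and 27, got {volume}")


# Thiophene is filed in Volume 17 of every series.
print(main_division(17))  # prints "Heterocyclic compounds"
```

Because the volume assignment is the same in every series, the lookup needs no series argument.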

The main consequence of this organization is that the Beilstein Handbook, rather than being a linear chronological record, is really a series of "snapshots," taken at 10- or 20-year intervals, of the entire organic chemical world.

Today, one can use the computer program SANDRA (Structure and Reference Analyzer) (3) to locate a system number when looking up a compound in the Handbook. Alternatively, the Lawson Number (see Chapter 10) can be used with the online system. When a chemical is retrieved from Beilstein Online, the Beilstein citations, typically one per series, are provided. These citations are in the form 4-17-00-00093, which indicates page 93 of Volume 17 of the fourth supplementary series.
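The citation format lends itself to mechanical parsing. The sketch below splits a citation into its four fields under the reading given above: the first field is the series number, the second the volume, and the last the page. The third field is not explained in this chapter, so it is kept verbatim rather than interpreted. The names `BeilsteinCitation`, `parse_citation`, and `describe` are illustrative assumptions, not part of any vendor's software.

```python
import re
from typing import NamedTuple

# Supplementary series abbreviations from Table I, keyed by series number.
# Other series numbers are not described in the text and are left unlabeled.
SERIES_ABBREVIATIONS = {1: "E I", 2: "E II", 3: "E III", 4: "E IV", 5: "E V"}


class BeilsteinCitation(NamedTuple):
    series: int
    volume: int
    middle_field: str  # meaning not given in the text; kept as written
    page: int


# Four groups of digits separated by hyphens, with optional spaces.
_CITATION_PATTERN = re.compile(
    r"\s*(\d+)\s*-\s*(\d+)\s*-\s*(\d+)\s*-\s*(\d+)\s*"
)


def parse_citation(text: str) -> BeilsteinCitation:
    """Split a citation such as '4-17-00-00093' into its fields."""
    match = _CITATION_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a four-field citation: {text!r}")
    series, volume, middle_field, page = match.groups()
    if not 1 <= int(volume) <= 27:
        raise ValueError(f"volume must be between 1 and 27: {text!r}")
    return BeilsteinCitation(int(series), int(volume), middle_field, int(page))


def describe(citation: BeilsteinCitation) -> str:
    """Render a citation in words, using Table I abbreviations when known."""
    label = SERIES_ABBREVIATIONS.get(citation.series, f"series {citation.series}")
    return f"page {citation.page} of Volume {citation.volume} ({label})"


print(describe(parse_citation("4 - 17 - 00 - 00093")))
# prints "page 93 of Volume 17 (E IV)"
```

The pattern tolerates spaces around the hyphens because citations are sometimes printed that way.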

The heart of the Beilstein Handbook is the factual information associated with each compound. Each published series contains new information about a compound, which means that a searcher may have to look into a number of volumes to find all the information on a given compound. In contrast, in the online database, data from the main work and all the supplementary series are combined to form a record that encompasses all the available knowledge about a structure. The online version includes both the complete full records (Full File), which contain evaluated data, and the partial, incomplete records (Short File), which contain unevaluated information. The incomplete records provide a more up-to-date database for searching. Each full record of a compound in the Beilstein database contains the following information:

• Identification, structure, and configuration
• Natural occurrence, and isolation from natural products
• Preparation and purification
• Physical properties
• Chemical properties
• Analytical and characterization data
• Related salts and addition compounds

More than 200 types of data may be available for a single compound, as is the case for pyridine. Each of these fields may be searched and displayed. However, very few compounds have many of these fields actually populated in their records. The exact details of searching and retrieval vary among the different online host implementations of the Beilstein database. At present, Beilstein is available online on STN (4, 5) and DIALOG (6), and by mid-1990, it will be available through the Maxwell Online ORBIT system (7).

In addition to all the factual data, which are extensively reviewed and cross-checked before they are added to the database, a record is kept of the source of the data. A literature citation for every measurement of a datum is provided. It is possible also to search through these literature citations, but only on the basis of the first author's surname.

Implementation of the Database

In contrast to the weekly updates of the CAS (Chemical Abstracts Service) Registry file, the Beilstein Structure and Factual Database files are currently being updated on a much less frequent basis. In addition, the entire Beilstein Handbook is not yet available online. The schedule for implementation of the computer-readable form of the database is indicated in Table III.

Table III. Implementation of Beilstein Online Database

Step   Year Available   Class of Compounds
1      1988             Heterocyclic (Volumes 17-27) (Full File)
2      1989             Heterocyclic (Volumes 17-27) (Short File or excerpts)
3      1989             Acyclic (Volumes 1-4) (Full File)
4      1990             Isocyclic (Volumes 5-16) (Short File or excerpts)
5      1991             Isocyclic (Volumes 5-16) (Full File)





Online Access to Beilstein

As indicated in Table III, the first part of the database to be mounted and made searchable online was the full portion on heterocyclic compounds of the Beilstein Handbook, Volumes 17-27. The period of the literature covered was from 1830 to 1959. Since then, the so-called Short File, or excerpts, of additional information and additional heterocyclic compounds has been made available online. The Short File, or excerpts, includes compounds from the scientific literature from 1960 to 1979 that have yet to be critically reviewed and evaluated. These compounds correspond to the organic chemicals that are found in the printed Beilstein Handbook Supplementary Series V. The initial Full File of heterocyclic compounds compiled from 1830 to 1959 encompassed about 350,000 compounds. The Short File compiled from 1960 to 1979 added 2.65 million compounds to the database, to bring the total to a little over 3 million compounds. At this time, neither the Full File nor the Short File contains any salts of the compounds.

Since the end of 1988, the first part of the Beilstein database has been available online through the STN International system. The actual STN host computer on which the database is mounted is in Karlsruhe, Federal Republic of Germany (4). At the end of 1989, the Beilstein database became available online on the DIALOG system. By mid-1990 it will also be available through the Maxwell Online ORBIT system. All three systems are provided with the same database and chemical structures on computer tapes by the Beilstein Institute in Frankfurt/Main. Details of the implementations of the STN and DIALOG versions of Beilstein Online can be found in Chapters 3 and 4, respectively. The following sections offer some overall comments about the three systems.

Each of the online vendors has different data and text search software. To handle the types of numeric and factual data found in the Beilstein database, all three vendors had to develop additional search software capabilities. As with bibliographic searching, the differences in these three systems are both objective and subjective. It is not the purpose of this chapter to evaluate the search software of these three vendors. Indeed, it would be of little value, because searchers tend to have their own criteria or requirements for searching a particular vendor or vendors. However, an overall view of the systems available from each vendor is given, so that a feature of particular importance to a user is properly explained here.

The main difference among the three vendors is the kind of structure-searching and display software used for exact and substructure searching. The three systems are summarized in Table IV.























Table IV. Chemical Structure Searching Software for Beilstein Online



Vendor   Software     First Commercial Use   Reference
STN      CAS ONLINE   1981                   5
DIALOG   S4           1989                   8
ORBIT    HTSS         1987                   9




It is interesting that none of the vendors chose the DARC software, available on the Questel online system, which was one of the earliest (first used commercially in 1980) structure-searching software systems to be available commercially. The NIH/EPA CIS SANSS (10) software (first made commercially available in 1977) was also not chosen by any vendor for this database.

From the viewpoint of the end user or the online searcher, the variety of software packages for structure searching offers a unique opportunity to examine the different algorithms and methods of structure searching, and it is possible that, for some types of searching, one system may prove superior to another. The availability of these three systems will undoubtedly provide fertile ground for a number of interesting studies on the structure-searching capabilities of different software algorithms.

Costs

The cost of online searching has been rising over the years. From an hourly royalty of some $4 in the early 1970s to what is now well over $100, users have seen their online search system budgets increase at a rate much faster than that of inflation in most countries. The cost of searching bibliographic databases is now about $135 per hour, on the basis of 1989 prices from the three vendors who will be providing Beilstein Online. Structure searching of the CAS files of more than 9 million connection tables costs about $250-300 per hour.

The current cost of the Beilstein Online factual and structure files averages $225-250 per hour for both STN and DIALOG, although the cost can go considerably higher, depending on the exact nature of what one does on the system. The system on STN has various connect time, search term, and other charges, so the average cost will depend on the exact type of search conducted. The DIALOG system charges a fixed hourly connect time price, and the cost of using it for data searching is thus independent of the type of search performed. However, some additional fees for certain records and structures being printed out may be charged. ORBIT has chosen a third and different method of charging. However, as one vendor candidly pointed out to a user grumbling about the cost of Beilstein Online, all the vendors really charge essentially the same fees but use different algorithms. The reason for this is simple. The contracts each of the three vendors signed, which allow them to make the Beilstein database available, require a certain royalty revenue for usage. Thus, practically speaking, $250 per hour is the most likely cost you will incur on any of the three systems.

Although this amount may, at first glance, be regarded as high, the value for the money is considered to be well worth the price. In a bibliographic database, one must first perform a search and then look up the references or literature citations to find the actual data desired. The extra cost of using Beilstein Online pays for the "preprocessing" of each literature citation, which involves the intellectual extraction and evaluation of the data in the original article. Furthermore, the data are then compared with other values found in other published articles for the same property of the chemical. The cost of this massive, high-level, high-quality, and very labor-intensive effort, which the Institute must pay in salaries for its Ph.D. chemists, must be recovered in some way. Although the government of the Federal Republic of Germany has provided subsidies over the years, in the near future the Beilstein Online database must recover its obviously high costs from its only real source, the user community.

Outline of This Book

This book consists of 10 chapters. This first chapter is an introduction to and an overview of the Beilstein Online database and provides a short description of the computer systems and related search software that were operational as of the date of the symposium. The next chapter, by Clemens Jochum, the president of the Beilstein Institute computer division, details how the database was assembled, its current status, and the future of the computer activities at the Beilstein Institute.

Chapters 3 and 4 describe the specific implementations of the two currently operational systems, STN and DIALOG, respectively. The STN chapter, "The STN Implementation of the Beilstein Factual and Structure Databases," was written by Andreas Barth, the person responsible for the implementation of the system for STN. The chapter describes the types of information in the database on STN and gives numerous search examples. The DIALOG chapter, "Beilstein on DIALOG," written by Kathy Haglund, the person responsible for the implementation of the system for DIALOG, is written in a similar fashion, with examples of how one performs a search on DIALOG. This chapter explains how DIALOG has indexed the Beilstein database and how it can be searched. One nice feature of the DIALOG implementation is the use of the S search operator to do a combined search for a property with a related refinement, such as boiling point at a given pressure or heat of formation at a specific temperature. In addition, the DIALOG chapter describes how structures can be created, searched, and displayed by using a combination of the MOLKICK (11) program or the ROSDAL linear-notation structure strings, S4 structure searching, and the DIALOG GEOFF (Graphics Enhanced Output File Format) structure display program.

Besides the introduction in Chapter 4, little published information is available about the newly released S4 chemical structure search software system that DIALOG is using. This system is available for the first time to the online community. Chapter 5 is devoted to an extended discussion of the S4 system. Included in this chapter is a discussion of the MOLKICK (11) PC-based software package that allows one to easily draw structures for uploading to the DIALOG mainframe computer for structure searching. The Wiswesser-type linear-notation scheme used by the MOLKICK program, the ROSDAL (Representation of Organic Structure Descriptions Arranged Linearly) string, which is how structures are entered as queries into the S4 search system, is briefly mentioned.



Chapter 6 gives an industrial view from a large pharmaceutical company on how they use the Beilstein Online database. The examples in this chapter relate to using the Beilstein Online database as an aid in drug design research. A novel application of the use of the Lawson Number (described in detail in Chapter 10) is also presented.

The next presentation, and by far the longest in this book, is by Damon Ridley. Chapter 7 describes the very valuable chemical reaction information that is part of the Beilstein Online database, with examples of how one can search for this information.

Chapter 8 examines another area of strength of the Beilstein Online database, the physical property data in the system. In his second chapter in this book, Andreas Barth gives a detailed account of data range searching for numeric data and what range searching really means as implemented on STN. Several examples are given in the chapter's figures, showing how a particular search gives either a hit or a miss. The more sophisticated data searches are explained. From this discussion, the reader will be able to understand better the differences between checked (evaluated) and unchecked data in the database. The very useful and important ability of the STN system to accept different units and to convert units within the system is also well described.

Chapter 9 is a view from academia relating the experience of an organic chemist in a chemistry department and of a librarian in a chemical engineering department, both of whom have been using Beilstein Online since it was first available.

Finally, Chapter 10 describes a valuable and new tool for structure similarity searching in Beilstein Online. The Lawson Number (LN), the computer system equivalent of the Beilstein classification system number, is valuable for many types of searches for particular types of organic chemical structures. This chapter very clearly explains the significance of the LN, why compounds have more than one LN, and how the new version of the SANDRA program (3) allows one to create the LNs for any structure drawn.

Literature Cited

1. Beilstein Institute, Varrentrappstrasse 40-42, D 6000 Frankfurt/M 90, Federal Republic of Germany. The Beilstein Help Desk phone number is 49-69-7917-258. The FAX number is 49-69-7917-492.

2. "Experiences with Beilstein Online," sponsored by the Division of Computers in Chemistry of the American Chemical Society, Fall ACS Meeting, Miami, FL, September 12, 1989.

3. a) Springer Verlag New York, 175 Fifth Avenue, New York, NY 10010. The Beilstein phone number is (212) 460-1622. The FAX number is (212) 473-6272.

b) Springer-Verlag, Tiergartenstrasse 17, D-6900 Heidelberg, Federal Republic of Germany. The phone number is 49-6221-487-457. The FAX number is 49-6221-43982.

4. STN International, Postfach 2465, D 7500 Karlsruhe 1, Federal Republic of Germany. The phone number is 49-7247-82-4566.

5. STN International, CAS, PO Box 02228, Columbus, OH 43210. The phone number is (800) 848-6538 or (614) 447-3600. The FAX numbers are (614) 447-3709 or (614) 447-3713.

6. DIALOG Information Services, 3460 Hillview Avenue, Palo Alto, CA 94304. The phone number is (800) 334-2564 or (415) 858-2700.

7. Maxwell Online ORBIT Search Service, 8000 Westpark Drive, McLean, VA 22102. The phone number is (800) 456-7248 or (703) 442-0900. The FAX number is (703) 893-4632.

8. S4 is available from Softron GmbH, Rudolph Diesel Strasse 1, D 8032 Grafelfing, Federal Republic of Germany. The phone number is 49-89-855-056. The FAX number is 49-89-852-170.

9. HTSS is available from ORAC Ltd., 18 Blenheim Terrace, Woodhouse Lane, Leeds LS2 9HD, United Kingdom. The phone number is 44-532-441-821. The FAX number is 44-532-448-283.

10. G. W. A. Milne, S. R. Heller, A. E. Fein, E. F. Frees, R. G. Marquart, J. A. McGill, and J. A. Miller, "The NIH/EPA Structure and Nomenclature Search System (SANSS)," J. Chem. Inf. Comput. Sci., 1978, 18, 181-185.

11. MOLKICK is available from Springer Verlag. See references 3a and 3b.



RECEIVED February 10, 1990