Computer Databases at the Beilstein and Gmelin Institutes

Stephen R. Heller
Secretary, IUPAC Committee on Chemical Databases
USDA, ARS, BARC-W
Building 007, Room 56
Beltsville, MD 20705-2350 USA




Introduction

Part of the educational activities of the Committee on Chemical Databases (CCDB) is to inform IUPAC members of interesting and significant computer activities. This is part of the CCDB terms of reference to promote a higher awareness of the applications of computers in the management, dissemination, and uses of chemical data. The first article described online computer services (1). This article is the second in a series the CCDB is providing to Chemistry International.

The shelves of a few thousand libraries around the world are home to hundreds and hundreds of volumes of the well-known and highly respected German works, the Beilstein and Gmelin Handbooks (2). Starting at the end of 1988, and continuing for the next few years, these two massive and important reference works will become even better known and more widely used as these two giants of chemical data and information become more active as the 20th century comes to an end.

Beilstein

By the end of 1988 parts of the Beilstein Handbook of Organic Chemistry will be available online. In addition, the connection table structures will be available for online searching. It will also be possible to lease these connection tables for searching on in-house computers (3). Thus, a new, computerized, and complementary resource for evaluated data on organic compounds will be available to the scientific community.

This will enhance the value of this unique scientific resource to the entire chemical community.

The Beilstein database is really composed of two separate parts. These two computer readable databases, the factual and structure databases, have been developed under the direction of Dr. C. Jochum, a member of the IUPAC CCDB. The Beilstein databases will be available on the DIALOG and STN systems in 1988, and shortly thereafter on the ORBIT and Datastar systems (4). The painful days of learning German and the complex and exacting Beilstein organizational rules ended last year, when the SANDRA (Structure AND Reference Analyzer) computer program (written by Dr. S. Lawson, a member of the IUPAC Publications Committee) was made available (5). Even so, one still needed to go to the multi-volume Beilstein Handbook to get the data and information one needed. Within months this will begin to be unnecessary for most (but not all) of the data one will be looking for in the Beilstein Handbook. The database is going online in increments, and for the group of large information compounds (LIC's, those compounds with too much data and information to be usefully stored online, such as benzene, aniline, and so forth) the full data will never go online. Nevertheless, the online database will certainly be a considerable improvement over what was the case in the past with the printed Handbook volumes. The first increment to go online will be the heterocyclic compounds, with acyclic and carbocyclic compounds to follow over the next few years, so don't expect too much too soon.

So that Chemistry International readers don't misunderstand what Beilstein Online is and is not, it should be stated clearly that the database will be data and structure searchable. The original data from the Main or Basic (Hauptwerk) and Supplementary (Ergänzungswerk) volumes I-IV which appear in German are still in German in the online version. (However, most of the information from H-IV will be in English in the online version.) Nevertheless, the names of all compounds which are still in German in the Handbook are being translated into English for the online version. While only the data and information from the 5th Supplemental Series (E-V) onwards will be in English, this will, in fact, comprise the bulk of the information in the entire Beilstein database. There will be some 400 different parameter fields or data elements in the factual portion of the database, most of which will be searchable, while one will be able to display all parameter fields. Of the searchable factual fields, about 60 are numerically searchable, some are searchable text fields (such as molecular formula, chemical names, and so forth), while the rest are searchable using keywords.

For example, one will be able to search for all compounds with a dipole moment within a specified range which also have toxicity data. One could also search for a chemical which is a reaction product, or for the starting materials of a reaction. Thus it will be possible to search for how to prepare 2-nitro-thiophene using acetic anhydride as a starting material and obtain the appropriate reaction conditions and literature citations. The immense number of useful parameters makes it impossible to do justice to the content of the database in this brief article.
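
The first kind of query above, a numeric range combined with a test that another field is present, can be illustrated with a minimal sketch. It is not the Beilstein Online search language: the record layout, the field names (dipole_moment, toxicity_notes), and the sample records are all hypothetical and invented purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CompoundRecord:
    """Hypothetical record layout; the real Beilstein field names are not given here."""
    name: str
    dipole_moment: Optional[float] = None  # in debye; None if no value is recorded
    toxicity_notes: List[str] = field(default_factory=list)


def search_dipole_with_toxicity(records, low, high):
    """Return records whose dipole moment lies in [low, high] and that carry toxicity data."""
    return [
        r for r in records
        if r.dipole_moment is not None
        and low <= r.dipole_moment <= high
        and r.toxicity_notes
    ]


# Invented placeholder records, not real Beilstein data.
records = [
    CompoundRecord("compound A", 1.5, ["placeholder toxicity entry"]),
    CompoundRecord("compound B", 1.7),
    CompoundRecord("compound C", 4.2, ["placeholder toxicity entry"]),
]

for hit in search_dipole_with_toxicity(records, 1.0, 2.0):
    print(hit.name)
```

In this sketch only "compound A" satisfies both conditions: "compound B" has no toxicity entry and "compound C" falls outside the requested range.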

There will also be two parts to the online factual database, the Full File (completely evaluated data) and the Short File (not yet evaluated data). The Full File will consist of virtually all the data which have been printed in the Handbook and which have been checked for errors and redundancies by the highly trained Beilstein staff. It will contain two types of compounds: the so-called large information compounds (LIC's) and the short information compounds (SIC's). As stated previously, the LIC's are compounds for which there is a considerable amount of information, and their entries often run to a number of pages in the Handbook. The LIC's comprise less than 2% of the total entries in the Beilstein Handbook. The SIC's usually take less than 1/4 of a page in the Handbook. The Short File, which is not to be confused with the SIC's, will be made up of the abstracts from the primary literature which have not yet undergone the detailed checking for which the Beilstein Handbook is so well known. Factual data from the current literature will be added to the Short File on a regular basis. In this way Beilstein Online will be more up-to-date than the printed Handbook volumes. As a record is checked, it will be transferred to the Full File, and also printed in the latest appropriate printed Handbook volume. To distinguish between the Full File and the Short File, all Full File records will be noted in the online versions as "Handbook Data". (For the complete details about Beilstein Online searching, which are too lengthy to outline here, the reader is referred to the Beilstein Online Manual, available from the Beilstein publisher (2).) All parameters can be displayed or printed, including the chemical structure of the compound.

The database provided to the online vendors will be the same on all online systems. However, each vendor will implement the database slightly differently owing to the different data and structure search software it uses, so one should carefully examine the features of a particular vendor before choosing which one to use. As a final point, it should again be noted that the first part of the Beilstein Handbook to go online in 1988 will be the heterocycles, Beilstein volumes 17-27, and the time covered will be from 1830 to the latest published volumes in the E-IV series.

While the cost of searching has not been announced, it is plausible to assume it will exceed the approximately $150 per hour it costs to search the CAS bibliographic databases online. The CAS database goes back (in an online form) only to the 1960's, whereas Beilstein goes back to 1830. CAS also does not give "hard" answers to a query, but rather "soft" and unevaluated answers (i.e., literature citations, not scientific data). Of course, CAS is more up-to-date in terms of literature citations than the Beilstein Handbook. Organizations which subscribe to the printed Beilstein volumes will receive a respectable discount from the standard online prices of all vendors which make the Beilstein database available. Owners of the printed volumes will also save money by being able to use the PC (Personal Computer) SANDRA program, which runs on IBM and compatible computers, to quickly find a compound in the printed Handbook volumes. Owners of the printed volumes who do a lot of online searching can benefit further: they can take just the Beilstein volume and page numbers from an online search and avoid the considerable cost of typing or printing out a full record online. Thus it would be best for someone to subscribe to the Handbook and use the online service to have access to everything from Beilstein.

Gmelin

The inorganic counterpart to the Beilstein Handbook is the Gmelin Handbook. While there has been considerable interest in, and publicity about, the Beilstein Handbook, the scientists at the Gmelin Institute have been working in their usual efficient manner, under the direction of Professor E. Fluck, a long-standing member of IUPAC, and currently the President of the IUPAC Inorganic Chemistry Division. The Gmelin Handbook of Inorganic Chemistry is prepared by the Gmelin Institute, which is part of the Max-Planck Society. The Gmelin Handbook, now in its 8th edition, is composed of over 570 volumes with more than 180,000 pages of text which systematically cover the field of inorganic chemistry (2). Since 1980 all the volumes have been published in English. While the Beilstein computerization activities are done by the Beilstein staff in Frankfurt and their collaborators, the Gmelin Institute has chosen to establish a separate organization, Chemplex, to handle the computerization of the Gmelin database. While the Gmelin database will not be available online for a few years, Gmelin made the Gmelin Molecular Formula Index available in 1987, almost two full years before the Beilstein database will be online.

The computerized version of the Gmelin Formula Index (GFI) consists of the 20 Gmelin Handbook index volumes plus the abstracts and bibliographic information from the Gmelin Catalog. A second Supplement of 8 volumes, containing references up to 1987, is now in preparation. About one half of this information has been online on STN since early 1988 and is called GFI, Version 2. GFI can be searched in a number of ways: by molecular formula using the Hill notation, by substance groups (such as Solutions), by System Components (such as HAsO2-H2O), and by keywords (such as electrochemical behavior). In addition, Boolean logic can be used to combine search terms, such as element counts and molecular weights. Thus one can search for entries with a molecular weight between 250 and 260 and with 6 to 8 atoms. The basic results of a search are the Gmelin volume number and page number from the printed Handbook.
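
As a rough illustration of the formula-based search terms just described, the sketch below writes a formula in Hill order (carbon first, then hydrogen, then the other elements alphabetically; all elements alphabetically when no carbon is present) and combines a molecular weight range with an atom count range. It is an assumption-laden sketch, not the GFI software: the function names, the small table of rounded atomic weights, and the test formulas are chosen only for illustration, and the parser handles simple formulas without parentheses.

```python
import re
from collections import Counter

# Rounded standard atomic weights for the few elements used in this sketch.
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "O": 15.999, "Cl": 35.45, "As": 74.922}


def parse_formula(formula):
    """Count the atoms of each element in a simple formula such as 'HAsO2'."""
    counts = Counter()
    for symbol, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(digits) if digits else 1
    return counts


def hill_formula(counts):
    """Write element counts in Hill order."""
    symbols = sorted(counts)
    if "C" in counts:
        rest = [s for s in symbols if s not in ("C", "H")]
        symbols = ["C"] + (["H"] if "H" in counts else []) + rest
    return "".join(s + (str(counts[s]) if counts[s] > 1 else "") for s in symbols)


def molecular_weight(counts):
    """Sum the atomic weights of all atoms in the formula."""
    return sum(ATOMIC_WEIGHTS[s] * n for s, n in counts.items())


def search(formulas, mw_min, mw_max, atoms_min, atoms_max):
    """Return Hill formulas whose weight AND total atom count both fall in range."""
    hits = []
    for formula in formulas:
        counts = parse_formula(formula)
        atoms = sum(counts.values())
        if mw_min <= molecular_weight(counts) <= mw_max and atoms_min <= atoms <= atoms_max:
            hits.append(hill_formula(counts))
    return hits


# Illustrative input formulas only; these are not taken from the GFI.
print(search(["HAsO2", "AsCl5", "CH3OH"], 250, 260, 6, 8))
```

The two range conditions are joined with a logical AND, which mirrors the Boolean combination of search terms described above; an OR or NOT combination would only change the condition inside the loop.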

As for future computerization activities, a systems analysis was performed in 1985 and resulted in the decision to build a substance oriented factual database. There will be logical data records connected to individual inorganic substances, which will be searchable by structure, substructure, or molecular formula and by data parameter fields for various chemical and physical properties. As not all inorganic compounds can be represented by a structure, the Chemplex staff plans to develop methods which will allow searches for nonstoichiometric compounds, systems, diadochous compounds, and others.

The final database, which is not expected to become completely available until about 1991-1992, will contain data from four sources. The first is the evaluated data from the existing Gmelin Handbook volumes. The second is the chemical substances and factual data from the old literature which are not yet covered in Gmelin. There are no immediate plans to evaluate these data.

The third source of data for the online Gmelin database is the chemical substances and factual data from the recent literature, similar to the Beilstein Short File. This information will then be checked, published in the printed Gmelin Handbook volumes, and then become part of the evaluated Gmelin database. The last source is the completely checked and evaluated chemical substances and evaluated data which is covered in the new Gmelin Handbook volumes.

The Gmelin Handbook and the online Gmelin database will independently continue to meet the information needs of the inorganic and organometallic chemist. The two products, the hardcopy printed volumes and the online database, will continue to be complementary, and all chemists should have access to and use of both resources. In the Handbook there will continue to be related information which is compiled in the same place and allows for a complete review of the material in question. It is the information in context which catalyzes the reflection, the association of ideas, and perhaps the inspiration that make up the innovative and creative thought process. Many new ideas have been inspired or stimulated by browsing through review type literature. On the other hand, the online Gmelin database will allow quick access to numeric and alphanumeric data on elements, compounds, and chemical systems, as well as searches for substances possessing specific properties, for specified structural formulas or structure fragments, and so forth.

Summary

This article has presented the computerization activities of the Beilstein and Gmelin Institutes, two of the most important and useful sources of high quality scientific data available to chemists throughout the world. The factual and structure files of these two databases, in both hardcopy and computer readable forms, will be of considerable value to the chemical community in the upcoming years.

References

1. S. R. Heller, "Online Chemical Information", Chem. Int., 9, 136-138 (1987).

2. Springer-Verlag Publishers, Department of New Media/Handbooks, Tiergartenstrasse 17, D-6900, Heidelberg 1, West Germany or Electronic Information Services, 175 Fifth Avenue, New York, NY 10010 USA.

3. a) L. Domokos, C. Jochum, and G. Wittig, Mikrochim. Acta, II, 423-429 (1986), b) C. Jochum, G. Wittig, and S. Welford, Proceedings of the 10th International Online Conference, London, December 1986, pages 43-52 (1986), c) L. Domokos and C. Jochum, Anal. Chim. Acta, 191, 481-485 (1986), and d) R. Luckenbach and C. Jochum, CODATA Bulletin, 64, pages 28-31 (1986).

4. For further details about online access please contact the local Datastar, DIALOG, ORBIT, or STN offices or refer to the Addresses section of the "Directory of Online Databases" (ISSN 0193-6840), published quarterly by Cuadra/Elsevier, 52 Vanderbilt Avenue, New York, NY 10017 USA.

5. S. Lawson, Chapter 8, pages 80-87, in "Graphics for Chemical Structures: Integration with Text and Data", ACS Symposium Series #341, Edited by W. Warr (1987).