Introduction
One of the educational activities of the Committee on Chemical Databases (CCDB) is to inform IUPAC members of interesting and significant computer activities. This is part of the CCDB terms of reference, which call for promoting a higher awareness of the applications of computers in the management, dissemination, and use of chemical data. The first article described online computer services (1). This article is the second in a series that the CCDB is providing to Chemistry International.
The shelves of a few thousand libraries around the world hold hundreds of volumes of the well-known and highly respected German Beilstein and Gmelin Handbooks (2). Starting at the end of 1988 and continuing over the next few years, these two massive and important reference works will become even better known and more widely used as these two giants of chemical data and information move into computer-readable form in the closing years of the 20th century.
Beilstein
By the end of 1988, parts of the Beilstein Handbook of Organic Chemistry will be available online; in addition, the structures, stored as connection tables, will be available for online searching. It will also be possible to lease these connection tables for searching on in-house computers (3). Thus a new, computerized, complementary resource for evaluated data on organic compounds will be available to the scientific community.
This will enhance the value of this unique scientific resource to
the entire chemical community.
The Beilstein database is actually composed of two separate parts. These two computer-readable databases, the factual and structure databases, have been developed under the direction of Dr. C. Jochum, a member of the IUPAC CCDB. The Beilstein databases will be available on the DIALOG and STN systems in 1988, and shortly thereafter on the ORBIT and Datastar systems (4). The painful days of having to learn German and the complex and exacting Beilstein organizational rules ended when the SANDRA (Structure AND Reference Analyzer) computer program (written by Dr. S. Lawson, a member of the IUPAC Publications Committee) was made available last year (5), but one still had to go to the multi-volume Beilstein Handbook to get the data and information one needed. Within months this will begin to be unnecessary for most (but not all) of the data one looks for in the Beilstein Handbook. The database is going online in increments, and for the group of large information compounds (LIC's, those compounds with too much data and information to be usefully stored online, such as benzene, aniline, and so forth) the full data will never go online. Even so, the online database will certainly be a considerable improvement over what was available in the past with the printed Handbook volumes alone. The first increment to go online will be the heterocyclic compounds, with acyclic and carbocyclic compounds to follow over the next few years, so don't expect too much too soon.
So that Chemistry International readers do not misunderstand what Beilstein Online is and is not, it should be stated clearly that the database will be searchable by both data and structure. The original data from the Main or Basic Series (Hauptwerk) and the Supplementary Series (Ergänzungswerk) I-IV, which appear in German, are still in German in the online version. (However, most of the information from H-IV will be in English in the online version.) Nevertheless, the names of all compounds that are still in German in the Handbook are being translated into English for the online version. While only the data and information from the 5th Supplementary Series (E-V) onwards will be in English, this will, in fact, comprise the bulk of the information in the entire Beilstein database.
There will be some 400 different parameter fields or data elements in the factual portion of the database. Most of them will be searchable, and all of them can be displayed. Of the searchable factual fields, about 60 are numerically searchable, some are searchable text fields (such as molecular formula, chemical names, and so forth), and the rest are searchable by keyword.
For example, one will be able to search for all compounds with a dipole moment within a specified range that also have toxicity data. One could also search for a chemical that is a reaction product, or for the starting materials of a reaction. Thus it will be possible to search for how to prepare 2-nitrothiophene using acetic anhydride as a starting material and obtain the appropriate reaction conditions and literature citations. The immense number of useful parameters makes it impossible to do justice to the content of the database in this brief article.
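The dipole moment example combines a numeric range condition on one field with a presence condition on another field, joined by a Boolean AND. The minimal Python sketch below illustrates only that logic over in-memory records; the record layout, the field names `dipole_moment` and `has_toxicity_data`, and the placeholder compounds are hypothetical and do not reflect the actual Beilstein field definitions or any vendor's search commands.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FactualRecord:
    # Hypothetical, simplified record; real factual records carry far more fields.
    name: str
    dipole_moment: Optional[float] = None  # numerically searchable field (debye)
    has_toxicity_data: bool = False        # True if any toxicity data are present


def search(records: List[FactualRecord], low: float, high: float) -> List[str]:
    """Names of records with low <= dipole moment <= high AND toxicity data."""
    return [
        r.name
        for r in records
        if r.dipole_moment is not None          # skip records lacking the field
        and low <= r.dipole_moment <= high      # numeric range condition
        and r.has_toxicity_data                 # presence condition
    ]


# Placeholder records for illustration only, not real measurements.
SAMPLE = [
    FactualRecord("compound A", dipole_moment=1.5, has_toxicity_data=True),
    FactualRecord("compound B", dipole_moment=2.8, has_toxicity_data=True),
    FactualRecord("compound C", dipole_moment=1.9, has_toxicity_data=False),
    FactualRecord("compound D", dipole_moment=None, has_toxicity_data=True),
]

if __name__ == "__main__":
    print(search(SAMPLE, 1.0, 2.0))  # ['compound A']
```

Compound C falls inside the range but is excluded because it lacks toxicity data, and compound D is excluded because it has no dipole moment value at all; both conditions must hold.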
There will also be two parts to the online factual database: the Full File (completely evaluated data) and the Short File (not yet evaluated data). The Full File will consist of virtually all the data that have been printed in the Handbook and checked for errors and redundancies by the highly trained Beilstein staff. The Full File will contain two types of compounds: the so-called large information compounds (LIC's) and the short information compounds (SIC's). As stated previously, the LIC's are compounds for which there is a considerable amount of information, often several pages in the Handbook; they make up less than 2% of the total entries in the Beilstein Handbook. The SIC's usually take up less than 1/4 of a page in the Handbook. The Short File, which is not to be confused with the SIC's, will be made up of abstracts from the primary literature that have not yet undergone the detailed checking for which the Beilstein Handbook is so well known. Factual data from the current literature will be added to the Short File on a regular basis, so Beilstein Online will be more up-to-date than the printed Handbook volumes. Once a record has been checked, it will be transferred to the Full File and also printed in the latest appropriate Handbook volume.
To distinguish between the Full File and the Short File, all Full File records will be labeled "Handbook Data" in the online versions. (For the complete details of Beilstein Online searching, which are too lengthy to outline here, the reader is referred to the Beilstein Online Manual, available from the Beilstein publisher (2).) All parameters can be displayed or printed, including the chemical structure of the compound. Although the database supplied to the online vendors will be the same on all systems, each vendor will implement it slightly differently owing to the different data and structure search software it uses, so one should carefully examine the features of each vendor before choosing one. As a final point, it should again be noted that the first part of the Beilstein Handbook to go online in 1988 will be the heterocycles, Beilstein volumes 17-27, and the period covered will be from 1830 to the latest published volumes in the E-IV series.
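The Short File and Full File arrangement amounts to a simple record life cycle: new entries from the current literature start in the Short File, and checked records move to the Full File, where they carry the "Handbook Data" label. The Python sketch below models only that life cycle; the names `DataFile`, `Record`, and `promote_after_check` are hypothetical illustrations, not part of Beilstein Online or any vendor's software. Because the text specifies a label only for Full File records, the sketch leaves Short File records unlabeled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DataFile(Enum):
    SHORT = "Short File"  # abstracted from current literature, not yet evaluated
    FULL = "Full File"    # checked and evaluated by the Beilstein staff


@dataclass
class Record:
    # Hypothetical record; only the fields needed for the life cycle are shown.
    compound: str
    file: DataFile = DataFile.SHORT  # new records enter the Short File

    @property
    def label(self) -> Optional[str]:
        # Only Full File records are labeled; no Short File label is assumed.
        return "Handbook Data" if self.file is DataFile.FULL else None


def promote_after_check(record: Record) -> Record:
    """Transfer a record to the Full File once its checking is complete.

    Printing the record in the latest Handbook volume is not modeled here.
    """
    record.file = DataFile.FULL
    return record


if __name__ == "__main__":
    r = Record("compound A")
    print(r.file.value, r.label)  # Short File None
    promote_after_check(r)
    print(r.file.value, r.label)  # Full File Handbook Data
```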
While the cost of searching has not been announced, it is plausible to assume it will exceed the approximately $150 per hour cost of searching the CAS bibliographic databases online. In online form, the CAS database goes back only to the 1960s, whereas Beilstein goes back to 1830; moreover, CAS does not give "hard" answers to a query, but rather "soft" and unevaluated answers (i.e., literature citations, not scientific data). Of course, CAS is more up-to-date in terms of literature citations than the Beilstein Handbook. Organizations that subscribe to the printed Beilstein volumes will receive a respectable discount from the standard online prices of all vendors that make the Beilstein database available. Owners of the printed volumes will also save money by being able to use the SANDRA program, which runs on IBM and compatible PCs (Personal Computers), to find a compound quickly in the printed Handbook volumes. Owners of the printed volumes who do a lot of online searching also benefit, since they can get the Beilstein volume and page numbers from an online search without paying the considerable cost of typing (displaying) or printing out a full record online. Thus it would be best to subscribe to the Handbook and also use the online service, so as to have access to everything from Beilstein.
Gmelin
The inorganic counterpart to the Beilstein Handbook is the Gmelin Handbook. While there has been considerable interest in and publicity about the Beilstein Handbook, the scientists at the Gmelin Institute have been working in their usual efficient manner under the direction of Professor E. Fluck, a long-standing member of IUPAC and currently the President of the IUPAC Inorganic Chemistry Division. The Gmelin Handbook of Inorganic Chemistry is prepared by the Gmelin Institute, which is part of the Max Planck Society. The Gmelin Handbook, now in its 8th edition, is composed of over 570 volumes with more than 180,000 pages of text, which systematically cover the field of inorganic chemistry (2). Since 1980 all the volumes have been published in English. While the Beilstein computerization activities are carried out by the Beilstein staff in Frankfurt and their collaborators, the Gmelin Institute has chosen to establish a separate organization, Chemplex, to handle the computerization of the Gmelin database. Although the Gmelin database will not be available online for a few years, Gmelin made the Gmelin Molecular Formula Index available in 1987, almost two full years before the Beilstein database will be online.
The computerized version of the Gmelin Formula Index (GFI) consists of the 20 Gmelin Handbook index volumes plus the abstracts and bibliographic information from the Gmelin Catalog. A second Supplement of 8 volumes, containing references up to 1987, is now in preparation. About one half of this information has been online on STN since early 1988 and is called GFI, Version 2. GFI can be searched in a number of ways: by molecular formula in Hill notation, by substance group (such as Solutions), by System Components (such as HAsO2-H2O), and by keyword (such as electrochemical behavior). In addition, search terms such as element counts and molecular weights can be combined with Boolean logic. Thus one can search for entries with a molecular weight between 250 and 260 and with 6 to 8 atoms. The basic results of a search are the Gmelin volume number and page number in the printed Handbook.
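The formula and Boolean searches described above can be illustrated with a short Python sketch. It writes formulas in Hill order (carbon first, then hydrogen, then the remaining elements alphabetically; with no carbon present, all elements alphabetically, hydrogen included) and applies the molecular weight and atom count example as a Boolean AND. It assumes that "6 to 8 atoms" means the total number of atoms in the formula, handles only simple formulas without parentheses, charges, or component systems such as HAsO2-H2O, and uses a small subset of standard atomic weights chosen for this example. The function names are hypothetical; this is not the STN command language or the GFI record format.

```python
import re
from collections import Counter

# Small subset of standard atomic weights (g/mol), enough for this example.
ATOMIC_WEIGHTS = {
    "H": 1.008, "C": 12.011, "O": 15.999, "Na": 22.990,
    "K": 39.098, "As": 74.922, "Br": 79.904, "I": 126.904,
}


def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'K3AsO4' (no parentheses or charges)."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


def hill_formula(counts: Counter) -> str:
    """Write element counts in Hill order."""
    if counts["C"]:  # a missing key reads as 0 in a Counter
        rest = sorted(e for e in counts if e not in ("C", "H"))
        order = ["C"] + (["H"] if counts["H"] else []) + rest
    else:
        order = sorted(counts)  # no carbon: strictly alphabetical
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


def molecular_weight(counts: Counter) -> float:
    # Raises KeyError for elements missing from the small weight table.
    return sum(ATOMIC_WEIGHTS[e] * n for e, n in counts.items())


def search_index(formulas, mw_range, atom_range):
    """Boolean AND of a molecular weight range and a total atom count range."""
    mw_low, mw_high = mw_range
    n_low, n_high = atom_range
    hits = []
    for formula in formulas:
        counts = parse_formula(formula)
        mw = molecular_weight(counts)
        n_atoms = sum(counts.values())
        if mw_low <= mw <= mw_high and n_low <= n_atoms <= n_high:
            hits.append(hill_formula(counts))
    return hits


if __name__ == "__main__":
    candidates = ["HAsO2", "K3AsO4", "Na3AsO4", "I2", "BrC2H5"]
    print(search_index(candidates, (250, 260), (6, 8)))  # ['AsK3O4']
```

Here I2 (about 253.8 g/mol) passes the weight test but has only 2 atoms, and Na3AsO4 has 8 atoms but weighs about 207.9 g/mol, so only K3AsO4 (about 256.2 g/mol, 8 atoms) satisfies both terms. In GFI itself, as noted above, a hit would be reported as a Gmelin volume and page reference rather than a formula.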
As for future computerization activities, a systems analysis was performed in 1985 and resulted in the decision to build a substance-oriented factual database. Logical data records will be connected to individual inorganic substances and will be searchable by structure, substructure, or molecular formula, and by data fields for various chemical and physical properties. As not all inorganic compounds can be represented by a structure, the Chemplex staff plans to develop methods that will allow searches for nonstoichiometric compounds, systems, diadochous compounds, and others.
The final database, which is not expected to become completely available until about 1991-1992, will contain data from four sources. The first source is the evaluated data from the existing Gmelin Handbook volumes. The second is the chemical substances and factual data from the older literature not yet covered in Gmelin; there are no immediate plans to evaluate these data.
The third source of data for the online Gmelin database is the chemical substances and factual data from the recent literature, similar to the Beilstein Short File. These data will be checked, published in the printed Gmelin Handbook volumes, and then become part of the evaluated Gmelin database. The fourth and last source is the completely checked and evaluated substances and data covered in the new Gmelin Handbook volumes.
The Gmelin Handbook and the online Gmelin database will independently continue to meet the information needs of the inorganic and organometallic chemist. The two products, the printed hardcopy volumes and the online database, will continue to complement each other, and all chemists should have access to and use both resources. In the Handbook, related information will continue to be compiled in one place, allowing a complete review of the material in question. It is information in context that catalyzes reflection and the association of ideas, and perhaps provides the inspiration behind the innovative and creative thought process. Many new ideas have been inspired or stimulated by browsing through review-type literature. The online Gmelin database, on the other hand, will allow quick access to numeric and alphanumeric data on elements, compounds, and chemical systems, and will also allow searches for substances possessing specific properties, for specified structural formulas or structure fragments, and so forth.
Summary
This article has presented the computerization activities of the Beilstein and Gmelin Institutes, two of the most important and useful sources of high-quality scientific data available to chemists throughout the world. The factual and structure files of these two databases, in both hardcopy and computer-readable form, will be of considerable value to the chemical community in the coming years.
References
1. S. R. Heller, "Online Chemical Information", Chem. Int., 9, 136-138 (1987).
2. Springer-Verlag Publishers, Department of New Media/Handbooks, Tiergartenstrasse 17, D-6900 Heidelberg 1, West Germany, or Electronic Information Services, 175 Fifth Avenue, New York, NY 10010, USA.
3. a) L. Domokos, C. Jochum, and G. Wittig, Mikrochim. Acta, II, 423-429 (1986); b) C. Jochum, G. Wittig, and S. Welford, Proceedings of the 10th International Online Conference, London, December 1986, pages 43-52 (1986); c) L. Domokos and C. Jochum, Anal. Chim. Acta, 191, 481-485 (1986); and d) R. Luckenbach and C. Jochum, CODATA Bulletin, 64, 28-31 (1986).
4. For further details about online access please contact the local Datastar, DIALOG, ORBIT, or STN offices or refer to the Addresses section of the "Directory of Online Databases" (ISSN 0193-6840), published quarterly by Cuadra/Elsevier, 52 Vanderbilt Avenue, New York, NY 10017 USA.
5. S. Lawson, Chapter 8, pages 80-87, in "Graphics for Chemical Structures: Integration with Text and Data", ACS Symposium Series 341, edited by W. Warr (1987).