A Survey of Reaction Databases
Stephen R. Heller
US Department of Agriculture
Agricultural Research Service
BARC-W, Bldg. 011A, Room 164
Beltsville, MD 20705-2350 USA
Phone: 1-301-344-1709, FAX: 1-301-344-1823
Telemail: SRHELLER, BITNET: SRHELLER@UMDARS
Keywords: Chemical Reactions, Chemical Reaction Retrieval
Systems, ORAC, REACCS, CASREACT, ChemInform, Beilstein
Abstract: This paper briefly surveys chemical reaction
databases, their contents, and the corresponding search and
retrieval software.
Over the past few years there has been considerable growth
in the number and types of databases of chemical reactions.
This growth has been stimulated by the availability of two
commercial chemical reaction retrieval software systems.
These are Organic Reactions Accessed by Computer (ORAC) from
ORAC Ltd. (1) and REaction ACCess System (REACCS) from MDL
Ltd. (2). This paper introduces the reaction databases
symposium presentations at the 1989 London Online
Conference and is the first summary of
the various databases which has been compiled for
Chemical synthesis is a fundamental activity in chemistry,
particularly organic chemistry. A chemist planning a
synthesis needs to know which methods are available to
convert chemical A into chemical B quickly, efficiently,
in as high a yield as possible, and at the lowest possible
cost. To find them, the chemist needs to perform a
substructure search on a database of chemical reactions.
The ORAC and REACCS in-house software systems, along with
the online CAS STN system, perform such searches. The
software capabilities of these systems differ, but that
comparison is outside the scope of this paper; the reader
is referred elsewhere (3).
2 AVAILABLE DATABASES
The chemical reaction libraries discussed in this article
are listed below. Additional details on these databases can
be found in the papers following this article. The
FIZ-Chemie ChemInform and Beilstein databases are not
included, as they were not commercially available when this
paper was being prepared in mid-1989.

     Database                              Number of     Years of
                                           Reactions     Coverage

 A.  CASREACT                              570,000       1985 - present
 B.  ORAC Core database                     50,000       1900 - present *
 C.  ORAC - Theilheimer - Synthetic
     Methods of Organic Chemistry           47,000       1946 - 1980
 D.  ORAC Academic Collaboration             5,000       1987 - present
 E.  ORAC Heterocyclic                      15,000       1980 - present
 F.  REACCS - Theilheimer - Synthetic
     Methods of Organic Chemistry           47,000       1946 - 1980
 G.  REACCS - Derwent's Journal of
     Synthetic Methods                      29,000       1980 - present
 H.  REACCS - Organic Synthesis              5,000       1921 - present
 I.  REACCS - Current Literature
     File (CLF)                             25,000       1983 - present
 J.  REACCS - CHIRAS - Asymmetric
     Synthesis                               5,000       1975 - present

(* Most of the reactions in the ORAC Core database are from
1980 onwards, but important reactions dating back to the
turn of the century are in the database.)

Neglecting redundancy, the overall numbers of reactions in
the databases available on CAS, ORAC, and REACCS are,
respectively, about 570,000, 120,000, and 110,000. As
might be expected, these "raw" numbers have little meaning
for most chemists. The main reasons are that the ORAC and
REACCS databases have only about 100,000 unique reactions,
and that the large CAS database is limited to the recent
(post-1985) literature and carries limited information
(e.g., no stereochemistry).
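
The rounded totals quoted above can be checked against the table in this section. The short Python sketch below simply sums the listed per-database counts; the dictionary layout and the names `COUNTS` and `raw_totals` are illustrative assumptions, not part of any of the systems described.

```python
# Reaction counts copied from the table in Section 2.
# The grouping by vendor and all names here are illustrative only.
COUNTS = {
    "CAS": {"CASREACT": 570_000},
    "ORAC": {
        "Core database": 50_000,
        "Theilheimer": 47_000,
        "Academic Collaboration": 5_000,
        "Heterocyclic": 15_000,
    },
    "REACCS": {
        "Theilheimer": 47_000,
        "Journal of Synthetic Methods": 29_000,
        "Organic Synthesis": 5_000,
        "Current Literature File": 25_000,
        "CHIRAS": 5_000,
    },
}


def raw_totals(counts):
    """Sum the listed reaction counts per vendor, ignoring any overlap."""
    return {vendor: sum(dbs.values()) for vendor, dbs in counts.items()}


totals = raw_totals(COUNTS)
for vendor, total in totals.items():
    print(f"{vendor}: {total:,}")
```

The sums come to 570,000, 117,000, and 111,000, which round to the figures given in the text. Theilheimer is counted once under each of ORAC and REACCS, which is one visible source of the redundancy mentioned above.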
3 DATABASE CONTENT
The databases listed in the previous section have
considerable variation in their content. The CASREACT
database (4), which was initiated a few years ago, has
reactions from the chemical literature going back only to
1985. It contains about 570,000 single-step reactions
found in some 39,000 records. As many of the very
important and most frequently used chemical reactions go
back decades, if not longer, there are clear limitations to
the CAS database. While the CAS database does include "all"
reactions, as opposed to just the "important" or
"interesting" reactions which appear in the databases from
ORAC and REACCS, it is easy to be swamped by its volume.
CAS decided that quantity would be its main focus, with
little or no concern about the chemistry or content of the
database. This is consistent with the bibliographic nature
of its abstract service. The CASREACT database is derived
from
chemical reactions found in over 100 important synthetic
organic chemistry journals.
The data elements or parameters searchable in the CASREACT
database include the CAS Registry Number, starting
materials, products, catalyst, reagents, and solvents.
Missing from the database are parameters such as
stereochemistry, reaction temperature, comments on the
reaction (such as mechanism information), and labeling of
reaction centers (which atoms in the molecules were
involved in the reaction). A number of parameters can be
displayed, but are not searchable. These include
bibliographic information, in-depth substance and subject
indexing, and abstracts.
ORAC and REACCS Databases
Both of these systems contain the Theilheimer database, and
their databases are created in essentially the same manner.
An example from ORAC, shown in Figures 1-4, illustrates how
a reaction is entered into a database. The bottom right-hand
corner of Figure 1 shows the entire reaction being entered,
with the product shown in the main drawing area. Figure 2
shows how the atom-to-atom correspondences between reactant
and product are mapped. The asterisks tag the atoms of
concern and the numbers show the details of the
correspondences. Figure 3 shows the data entry form for
additional information about the reaction; its fields are
self-evident from the labels in the figure. Figure 4 shows
part of the final version of an entry.
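
The atom-to-atom mapping step of Figure 2 is what allows a database to record which bonds change during a reaction. The Python sketch below is a minimal illustration of that idea and should not be read as the ORAC or REACCS implementation: the bond representation, the function names `normalize` and `reaction_center`, and the example map numbers are all assumptions made for illustration. Each bond is written as two atom-map numbers plus a bond order, and a mapped atom carries the same number on both sides of the reaction.

```python
def normalize(bonds):
    """Index bonds by an unordered atom pair so (1, 2) and (2, 1) match."""
    return {frozenset((a, b)): order for a, b, order in bonds}


def reaction_center(reactant_bonds, product_bonds):
    """Compare mapped bonds on both sides and report what changed."""
    before = normalize(reactant_bonds)
    after = normalize(product_bonds)
    broken = set(before) - set(after)      # bonds present only in the reactants
    formed = set(after) - set(before)      # bonds present only in the products
    order_changed = {
        bond for bond in set(before) & set(after) if before[bond] != after[bond]
    }
    center_atoms = set()
    for bond in broken | formed | order_changed:
        center_atoms |= bond
    return {
        "broken": broken,
        "formed": formed,
        "order_changed": order_changed,
        "center_atoms": center_atoms,
    }


# Hypothetical substitution: atom 1 (C) loses atom 2 (Br) and gains atom 3 (O).
# Atom 4 is a spectator carbon whose bond to atom 1 is unchanged.
reactants = [(4, 1, 1), (1, 2, 1)]
products = [(4, 1, 1), (1, 3, 1)]

result = reaction_center(reactants, products)
print(sorted(result["center_atoms"]))  # [1, 2, 3]
```

In this example the spectator atom 4 is excluded from the reaction center because its only bond is identical on both sides, which is the kind of distinction the mapping is meant to capture.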
The Theilheimer - Synthetic Methods of Organic Chemistry
database is derived from volumes 1-35 of the printed
editions, covering chemical reactions published from 1946
to 1980. There have been no updates to Theilheimer since
The ORAC core database comes from literature searches
performed by the ORAC staff.
The ORAC Academic Collaboration database comes from
reactions submitted by university collaborators who have
access to the ORAC software in exchange for providing
reactions to the database.
The ORAC Heterocyclic database comes from literature
abstracting of heterocyclic reactions.
The Derwent database is the computer readable version of
the 1980 - 1987 printed publication entitled the Journal of
Synthetic Methods (JSM), published monthly by Derwent. JSM
includes patent coverage. JSM can be thought of, in some
ways, as picking up where Theilheimer stopped in 1980.
The Organic Synthesis database is the computer readable
version of the printed Organic Synthesis reference
collection, which dates back to 1921 and currently runs
through Volume 67 (1987). Organic Synthesis is a
collection of well-tested, verified methods for the
preparation of specific compounds. About 100 new reactions
are added to the database each year.
The Current Literature File (CLF) is a database originally
created by an MDL REACCS customer and now extended with
contributions from a number of sources. The reactions come
from about 35 journals abstracted since 1983.
The CHIRAS database of asymmetric synthesis contains
synthetic routes for optically active materials used
primarily in the agrochemical and pharmaceutical
industries. The reactions cover the literature from 1975
to the present. CHIRAS was initially developed by
scientists at Hoffmann-La Roche labs in the USA.
The data elements or parameters searchable in ORAC and
REACCS databases include the starting materials, products,
catalyst, reagents, solvents, stereochemistry, bonds which
change during the reaction, author, journal, year of
publication, name of reaction, and temperature. The ORAC
version also has comments about the reaction (such as
"Product steam distilled immediately on completion of
reaction."), reaction keywords (such as metal amide,
migration, and rearrangement), and available physical data
(such as melting point, boiling point, and refractive
index). Only the REACCS CHIRAS
database has the CAS Registry Number included as a
searchable field.
This article has presented the reader with an overview of
the different chemical reaction library databases and their
contents. In addition to these databases, others are being
developed by ORAC Ltd. and MDL Ltd., as well as by the
German government's FIZ-Chemie in Berlin and the Beilstein
Institute. As these last two groups have not yet released
any products, no discussion of their databases has been
included here. In any event, these two databases would
likely be made available to both the in-house systems (ORAC
and REACCS) and the online STN CASREACT system, with
essentially the same file content as the databases
discussed in this article.
REFERENCES
1. ORAC Limited, 175 Woodhouse Lane, Leeds LS2 3AR, United
Kingdom (Telephone: 0532-441821, FAX: 0532-448283).
2. Molecular Design Limited, 2132 Farallon Drive, San
Leandro, CA 94577 (Telephone: 415-895-1313 or
800-635-0064, FAX: 415-352-2870).
3. Borkent, J. H., Oukes, F., and Noordik, J. H.,
"Chemical Reaction Searching Compared in REACCS, SYNLIB,
and ORAC", J. Chem. Inf. Comput. Sci., 28, 148-150 (1988).
4. CASREACT is available only online from CAS STN, 2540
Olentangy River Road, PO Box 3012, Columbus, OH 43210
(Telephone: 614-447-3600 or 800-848-6538, ext. 3731).