Online Chemical Information
ABSTRACT
A brief summary of the chemical information available online
in computer systems around the world is presented and discussed.
Both databases and software systems are included.
INTRODUCTION
IUPAC has recently created a new committee on Chemical
Databases, with members from the USA, USSR, FRG, and Japan. The
Committee has three main terms of reference. The first is to
advise the IUPAC president and Executive Committee on all aspects
of computerized databases of chemical properties, needs for
standardization for databases and chemical structure records, and
policy on database dissemination. The second is to work with the
IUPAC Commissions on the design and implementation of databases and
appropriate software and to encourage maximum compatibility of
databases from different groups within IUPAC. Lastly, the committee
is to promote, in collaboration with other ICSU bodies, a higher
level of awareness of the application of computers in the
management, dissemination, and use of chemical data. As part of
this third activity, this presentation on Online
Chemical Information has been prepared as the committee's first
educational effort. The committee is also working on a
Glossary of Computer Terms for chemists, and is preparing a draft
list of chemical codes for IUPAC (including Greek letters and other
special symbols), which would be the computer equivalent of the
IUPAC Green Book of symbols and standards. With the heightened
awareness of the need for and use of computers as a tool to aid the
chemist, it is felt the committee will have a growing and
interested audience throughout all of IUPAC for its activities.
The chemical literature and chemical information were once a
finite and manageable resource. However, with the information
explosion and the growth of the scientific literature, they have
become almost infinite and unmanageable, making it very difficult
for a chemist to keep abreast of the latest developments in a given
area of research. This problem has been recognized by the
scientific information community, most notably Chemical
Abstracts, and this has led over the past twenty years to a
considerable amount of automation of chemical information and data.
There are many books (1) and articles on the subject of
chemical information and computerized databases, so this brief
article will only survey the field, to ensure that the reader is
aware of what is available and the major features and
characteristics of the products and services on the market. One
very critical point to be made, which it is hoped will be the
impetus for the reader to learn more about this subject, is that
the existence and availability of online information makes it
possible for anyone, anywhere in the world, in both big and small
organizations, to have the same access to information at the same
time and at (essentially) the same cost.
TYPES OF CHEMICAL INFORMATION
There are a number of types of chemical information which have
been automated or computerized, and it is important to know and
understand the differences between these types. The first is
reference data, often called bibliographic chemical data. This
type includes Chemical Abstracts, which publishes some 400,000
abstracts per year, as well as the Institute for Scientific
Information (ISI) Index Chemicus, which publishes a similar
number of abstracts per year. These databases contain textual
information, citations, sometimes abstracts, but not factual data.
The second type of chemical database is called non-bibliographic, factual, source or numeric data. These databases
contain actual numbers or measurements, like mass spectral data,
infrared data, boiling points, partition coefficient values, and so
forth. Handbooks and similar types of databases, such as Beilstein
- The Handbook of Organic Chemistry, Heilbron - The Dictionary of
Organic Chemicals, The Cambridge Crystal Database, The Merck Index,
and so forth are examples of databases which would fit into the
category of source or non-bibliographic databases.
In the field of chemistry there is a third type of database,
related to and a possible companion to the other
two types, which is called a chemical structure database. This
is simply a database in which the chemical structure has been
represented in computer readable form, usually called a connection
table. An example of a connection table is given in Figure 1,
which shows the usual chemical structure diagram, with a molecular
formula C8H7ClO, and below it the computer
representation of the chemical. Each atom is numbered and then
identified by the letter or letters of its element symbol,
followed by the other atoms to which it is connected. Lastly, the
type of bond connection is given. Bond type 9 means aromatic, type
5 means a chain single bond, and type 1 is a ring single bond. In
the past (and also continuing into the present) there have been
many other representations of chemical structures, including
nomenclature, such as IUPAC names, and Wiswesser Line Notation
(WLN). What differentiates a connection table from these linear
notations (i.e., notations which can be written on one line) is the
two-dimensional nature of the connection table as well as the
ability to search for chemical fragments or sub-structures in a
completely open and total manner. (It is worthwhile to note that
structure searching by name fragments and WLN is possible, but it
is not as efficient or complete as connection table
searching.)
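As a rough illustration of the connection table idea, the sketch
below stores a small molecule as numbered atoms, each with its
element and its bonded neighbours, using the three bond-type codes
named above (9, 5, and 1). The molecule is a hypothetical one chosen
so that its implied formula is C8H7ClO; it is not necessarily the
structure shown in Figure 1. The function names, the valence table,
and the simple fragment search are all assumptions made for this
sketch, not the method of any particular online system.

```python
from collections import Counter

# Bond-type codes as described in the text.
AROMATIC, CHAIN_SINGLE, RING_SINGLE = 9, 5, 1

# Hypothetical molecule (an assumption, not Figure 1): atom number -> element.
ATOMS = {1: "O", 2: "C", 3: "C", 4: "C", 5: "C",
         6: "C", 7: "C", 8: "C", 9: "C", 10: "Cl"}

# (atom, atom, bond type); hydrogens are implicit.
BONDS = [
    (1, 2, RING_SINGLE), (2, 3, RING_SINGLE), (3, 4, RING_SINGLE),
    (4, 5, AROMATIC), (5, 6, AROMATIC), (6, 7, AROMATIC),
    (7, 8, AROMATIC), (8, 9, AROMATIC), (9, 4, AROMATIC),
    (9, 1, RING_SINGLE), (6, 10, CHAIN_SINGLE),
]

def build_table(atoms, bonds):
    """Return {atom number: (element, {neighbour number: bond type})}."""
    table = {number: (element, {}) for number, element in atoms.items()}
    for a, b, bond_type in bonds:
        table[a][1][b] = bond_type
        table[b][1][a] = bond_type
    return table

# Twice the bond order, so that an aromatic bond (1.5) stays an integer.
DOUBLED_ORDER = {AROMATIC: 3, CHAIN_SINGLE: 2, RING_SINGLE: 2}
# Twice the usual valence; a simplification covering only these elements.
DOUBLED_VALENCE = {"C": 8, "O": 4, "Cl": 2}

def molecular_formula(table):
    """Derive the formula, filling unused valence with implicit hydrogens."""
    counts = Counter(element for element, _ in table.values())
    for element, neighbours in table.values():
        used = sum(DOUBLED_ORDER[t] for t in neighbours.values())
        counts["H"] += (DOUBLED_VALENCE[element] - used) // 2
    others = sorted(e for e in counts if e not in ("C", "H"))
    parts = []
    for element in ["C", "H"] + others:
        n = counts[element]
        if n:
            parts.append(element + (str(n) if n > 1 else ""))
    return "".join(parts)

def find_fragment(table, element, neighbour_element, bond_type):
    """Find (atom, neighbour) pairs joined by the given bond type."""
    hits = []
    for number, (el, neighbours) in table.items():
        if el != element:
            continue
        for other, t in neighbours.items():
            if t == bond_type and table[other][0] == neighbour_element:
                hits.append((number, other))
    return hits

def is_aromatic(table, number):
    """True if the atom takes part in at least one aromatic bond."""
    return AROMATIC in table[number][1].values()

TABLE = build_table(ATOMS, BONDS)
formula = molecular_formula(TABLE)
chlorides = find_fragment(TABLE, "Cl", "C", CHAIN_SINGLE)
ring_ethers = find_fragment(TABLE, "O", "C", RING_SINGLE)
print(formula, chlorides, ring_ethers)
```

Because every atom and bond is stored explicitly, a fragment query
such as "chlorine attached to an aromatic carbon" reduces to walking
the table, which is much harder to do reliably against a name or a
linear notation.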
CHEMICAL DATABASES
This short article cannot go into considerable detail about
the many chemical and chemical related databases, so it will be
necessary for the reader to refer to either the database producer,
the online system on which a particular database is available, or
to a directory of databases for the details desired. In 1980 there
were about 500 computer readable databases available in all fields
of science, technology, business, and other areas, with some 75
companies making these databases available online in a computer
system which could be accessed by telephone and computer
terminal connection. By 1986 these numbers had grown to over 2900
databases available from about 450 different sources.
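To put these figures in perspective, the short calculation below
converts the 1980 and 1986 counts into an equivalent average annual
growth rate. It assumes steady compound growth between the two
years, which the figures themselves do not establish, and the
function name is chosen only for this sketch.

```python
def annual_growth_rate(start, end, years):
    """Average compound growth rate per year between two counts."""
    return (end / start) ** (1 / years) - 1

years = 1986 - 1980
databases = annual_growth_rate(500, 2900, years)  # databases available
sources = annual_growth_rate(75, 450, years)      # online sources

print(f"databases: {databases:.1%} per year")
print(f"sources:   {sources:.1%} per year")
```

Under that assumption both the number of databases and the number of
sources grew by roughly a third each year over the period.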
In many cases the same database is available from more than
one source. For example CA Search (which has a total of over five
million abstracts in computer readable form), the online computer
version of Chemical Abstracts, is available from some nine online
vendors throughout the world, and is updated every 2-4 weeks,
depending on which online vendor you choose. The Merck Index is
available from three different companies. The 13-CNMR database is
available from two companies, while the LogP database is available
from only one company. The definition of a chemical database is
generally quite broad, and includes the usual bibliographic
databases, many patent databases, chemical property databases, and
chemical structure/nomenclature databases. An excellent source of
information for the latest summary of chemical (and other)
databases is the Directory of Online Databases (2), published
quarterly, and usually available in the library.
In the field of bibliographic databases, the most widely used
is the Chemical Abstracts database, which adds over 400,000
citations per year to the database which goes back to 1967, and
totals over 7,000,000 citations. The ISI database, Index Chemicus,
which covers the literature from 1962 to date, includes only new
chemicals, and thus is smaller (4,000,000) in size than the CA
database.
In addition to these two large abstracting services in
chemistry, the American Chemical Society (ACS) has computerized
nineteen of its journal publications, so that now the entire
journal article (less the tables and diagrams) is in computer
readable form and can be searched (3).
In the area of non-bibliographic, factual, or numeric
databases, there are many, and the list continues to grow. One
important point to be made about these databases as opposed to the
bibliographic databases is their size. The numeric databases are
usually very small in numbers of chemicals. Some, like CESARS (a
database of detailed and evaluated toxicological data), have
information on about 200 chemicals. The 13-CNMR databases range in
size from 15,000 to 50,000. The mass spectral databases range from
40,000 to over 100,000. In the case of mass spectral data, the
larger database does not have as complete information on each
chemical as the smaller database. So one must be careful to
examine quality as well as quantity.
Thermodynamic databases are available from producers like DECHEMA (FRG), the Thermodynamics Research Center (Texas A&M University), and the Physical Properties Data System (PPDS) of the UK. The IUPAC Committee on Chemical Databases is working with the IUPAC Commissions to make databases of properties such as conductance, solubility, transport properties, and enthalpy of vaporization available in the near future, both on computer tapes and floppy disks and through online vendors.
ONLINE COMPUTER SYSTEMS
Each type of database mentioned above requires computer
software to search, retrieve, and/or analyze the information in the
particular database. Such software needs a computer to run on.
Thus the online computer systems (all of which are in the USA,
except as noted), such as DIALOG, ORBIT, BRS, Data-Star
(Switzerland), JICST (Japan), DARC (France), STN (USA and FRG), CAS
ONLINE, Pergamon Online (UK), TDS, CIS, and so forth, are each a
combination of a database, software, and computer hardware which
together form a complete system (2). Generally speaking these online
computer system vendors do not create and own the databases which
are available on their systems. Thus, the approximately 200
databases on the DIALOG system are maintained and owned by their
respective creators, not DIALOG. DIALOG is simply a supplier of
the information from others. Some, like CAS, have primarily their
own databases on their system (STN), but most of their databases
are also found on other systems (such as DIALOG, BRS, QUESTEL,
ORBIT, Pergamon Online, and others).
In addition to the above systems which search bibliographic
and non-bibliographic databases, there are two major online
systems which search for chemical structures. They are CAS ONLINE
and the QUESTEL DARC system. Both have the entire CAS file of over
seven and one half million chemical structures, and the QUESTEL
DARC system also has the ISI database of over three and one half
million chemical structures which are associated with the ISI Index
Chemicus database.
All of these systems are available, usually via a local dial-up telephone connection, in most countries, and usually at nominal prices.
Figure 2 shows the author sitting at a computer terminal connected
to a chemical database system via an ordinary telephone line which
can be seen in the bottom left-hand corner of the figure. The computer
systems of some vendors are available 24 hours per day, 7 days per
week. Availability of the remaining systems is 5-6 days per week,
and usually about 20 hours per day. Thus it is fair to say
chemical information is essentially available anywhere and at
any time. As telecommunications become easier and less expensive,
usage will increase. In many countries and organizations, access
is available through a library or similar information group, often
at little or no cost to the end user. For a novice it is probably
better to let someone else perform the searching, so you can learn
how it is done.
SUMMARY
It has been the intent of this article to provide a brief
overview of computer based chemical information in terms of the
databases and how they are available to the worldwide scientific
community. Understanding what is available and where such
information and chemical data can be found is becoming more and
more important in the high technology world we live in today.
IUPAC is participating in this area through its new Committee on
Chemical Databases, and all IUPAC members are encouraged to provide
input into this committee to help create and make available for
dissemination and distribution the valuable data being gathered and
evaluated by the IUPAC Commissions.
REFERENCES
1. See, for example, Y. Wolman, "Chemical Information - A Practical Guide to Utilization", John Wiley & Sons, New York (1983), and J. Ash, et al., "Communication, Storage and Retrieval of Chemical Information", Ellis Horwood, Chichester (1985).
2. For an explanation of all the acronyms, what databases these online vendor companies have available, and how these companies can be contacted, an excellent reference source is: Directory of Online Databases, Cuadra Associates, 2001 Wilshire Blvd., Suite 305, Santa Monica, CA 90403 USA.
3. S. W. Terant, L. R. Garson, B. E. Myers, and S. M. Cohen, "Online Searching: Full text of American Chemical Society Primary Journals", J. Chem. Inf. Comput. Sci., 24, 230 (1984).