The current state of chemical information technology in a
number of areas is presented. The author speculates in some
detail about several of these areas and presents a list of
predictions as to the likely state of the field in about the year
2000. The economics of chemical information are also briefly
discussed. The author concludes that until the computer is made
an easy-to-use tool, not a barrier, reasonable, let alone optimum,
usage will not result.
INTRODUCTION
This presentation [1] is designed to stimulate discussion of
new technology which is becoming available to chemists,
applied to chemistry, and, most importantly, used by chemists in
their everyday activities by the beginning of the next century.
This paper has evolved over the past four years, and no doubt
will continue to evolve as new phenomena stimulate changes in the
habits and activities of chemists.
As computer technology has developed, the use of computers in chemistry has expanded from simple arithmetic calculations to very broad areas of chemistry. This paper delves into some of these areas and tries to summarize the current state of the use of computers in chemistry and what the author believes that use will be a decade from now, which is roughly the beginning of the 21st century.
BACKGROUND
Computers, like any other technological tool, have become
integrated gradually into the daily routine of chemists.
Anything new, particularly in science, is usually taken with some
skepticism. This has been the case in science for a long time, as
Max Planck, more widely known for Planck's constant, noted: "New
scientific truth does not triumph by convincing its opponents and
making them see the light, but rather because its opponents
eventually die, and a new generation grows up that is familiar
with it" [2].
The widespread use of computers in chemistry has clearly
been handicapped by a number of factors, a major one being the
lack of familiarity with this new technology on the part of
chemists and managers in the field of chemistry. This is true
from academia to government to industry. In working to locate
supporting facts for this article I heard the spirit of Max
Planck evoked a number of times. Phrases like you need to "raise
a generation of people who are comfortable with these tools" [3],
and "raise a generation of advocates" [4] came from professionals
in the field of market research. Thus I concluded that wide-scale and heavy use of computers by chemists has not yet started.
At present, the routine use of computers in support of
research and production in a chemistry lab or office, other than
for word processing, spreadsheets, and literature searching, is
low [5] (defined as less than 25% of the potential users). Why
is this the case? There are a few hundred thousand chemists in
the USA [6] and many of them have computers [7]. In the
worldwide pharmaceutical industry, in 1989, there were some
54,000 scientists employed in R&D activities [8]. Of these
almost 44,000 were in the USA and some 28,500 of them are
categorized as scientific and professional staff, the remaining
13,500 being categorized as technical and support staff. It is
generally thought that most (>90%) of these individuals have
computers, virtually all of which are IBM PC's and clones or
Apple Macintoshes, the remainder being workstations from Sun,
Silicon Graphics, DEC, or other manufacturers [5,7]. It has not
been possible to obtain any definitive information on the number
of scientists or chemists with PC's. Marketing surveys have not
addressed such questions [9].
If one combines these numbers with those in other developed
countries one could estimate some 800,000 chemists [4] as a
potential market for various computers and computer systems and
for software specifically designed to support the needs of the
chemist. This paper will examine some possible reasons why large
numbers of chemists have not yet decided that computers are a
necessary tool for conduct of their everyday research and
administrative work, thus explaining the lack of extensive use of
computers and related computer technology.
Please note that when the qualitative phrases "few" or "low"
(as defined in reference 5) are mentioned for the overall use of
a particular piece of computer hardware, a computer program, or a
computerized database, the phrase is meant in comparison to the
overall potential purchase and use by some 800,000 potential
users (chemists) worldwide. For example, when a group of
chemists belonging to the COMP (Computers in Chemistry) division
of the ACS reports their current level of use of computers for
electronic communication at less than 15%, the term "few" or
"low" seems justified. With the exception of one series of
marketing studies in the area of computational chemistry [4],
there have, to date, been no published studies of the use of
computers and related computer systems by the end users in the
chemical community [9]. (Studies on the use of computers and
databases in libraries are not regarded here as end-user
studies.) The reason for this is the small size of the current
market which does not justify the investment for such a survey
[3,4]. Thus the reader will have to accept the lack of hard
statistics for many of the statements presented here.
While selling a total (over the lifetime of the program) of
a few hundred or even a few thousand molecular modeling or
structure drawing programs is, today, a major accomplishment in
the business of software for chemistry, it is a minor event
relative to the daily sales of word processing, database
management, spreadsheets, and other such programs. The lack of
any public software companies in chemistry (i.e., software
companies devoted exclusively to selling software for chemistry
and whose stock is available to the public) is indirect evidence
to support this position that there is, at present, no major
financial incentive to go into this business.
Before proceeding to the main thrust of the speculations
into the future of computers in chemistry it is important to note
that there are some labs as well as areas of chemistry in which
the use of computers is very high. As mentioned above, the area
of computational chemistry is clearly one of these areas. While
it is estimated there are 1000 sites worldwide with some 2000
academic and industrial chemists now involved in this area [4,
page 4], this is less than 1% of the chemists in the world. In
almost all areas of spectroscopy computers are heavily used to
acquire and analyze data. A reader involved in these areas of
chemistry would certainly not fall within the "low" range of
computer use. However, I believe that these "pockets" of high
computer use, when averaged over the entire chemistry community,
are consistent with the levels of usage stated here.
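As a rough check on the "less than 1%" figure above, the share can be worked out directly from the two estimates quoted in the text. The short Python sketch below is illustrative only: the function name is hypothetical, and both inputs are the estimates attributed to reference 4, not survey results.

```python
# Figures quoted in the text; both are estimates, not measured data.
COMPUTATIONAL_CHEMISTS = 2_000   # chemists at roughly 1000 sites [4]
CHEMISTS_WORLDWIDE = 800_000     # estimated potential users [4]


def share_percent(part: int, whole: int) -> float:
    """Return `part` expressed as a percentage of `whole`."""
    return 100.0 * part / whole


if __name__ == "__main__":
    share = share_percent(COMPUTATIONAL_CHEMISTS, CHEMISTS_WORLDWIDE)
    print(f"Computational chemists: {share:.2f}% of all chemists")
```

On these estimates the share comes to a quarter of one percent, comfortably inside the "low" range used in this paper.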
COMPUTER AND CHEMICAL INFORMATION ISSUES
Table 1 summarizes both the issues which are to be discussed
here as well as the current and predicted level of activities in
these areas. Space in this journal does not permit a full
analysis of all of these topics. Thus a few representative
issues will be mentioned. Tables 2-14 list details for many of
these issues.
Topic | Today | 2000 |
Computer Literacy | Low-Moderate | Moderate - High |
Computer Chip technology | Intel 386, 486; Motorola 68000; RISC | Intel 986; Motorola 98000; RISC |
Operating Systems | DOS, UNIX, Windows, OS/2, Macintosh | Mostly enhanced, friendly, UNIX |
Telecommunications | Moderate usage; 2400-9600 baud speeds | Heavy usage; 1 million++ baud speeds |
Interfaces | Offensive/exacting | Transparent/Voice based |
Graphics | Low usage in most software | Predominant usage in most software |
CD-ROM | Low end-user usage | High end-user usage |
Chemical Information | Raw & unprocessed | Processed & analyzed |
Online usage for chemistry | Low | Low |
SDI | Manual or by post | Electronic |
Databases | Bibliographic | Numeric & factual |
Beilstein | E-V Series being published | E-V Series still being published |
Chemical catalogs | Online searching of catalogs | Online ordering from catalogs |
Chemical Identification | CAS RN & BRN | Chemical structure |
Molecular Modeling | Few | Some |
Educational software | Random; not integrated with textbooks | Integrated with textbooks |
Publishing | Semi-Electronic | Mostly Electronic |
Books | Thought of as probable dinosaurs | Thought of as probable dinosaurs |
Instruments | Semi-automated | Fully-automated with ISO data transfer standards |
Today -
IBM PC widely used. Macintosh usage increasing.
2000 -
McDonald's selling the McIBM. Graphics, mouse, user-friendly
programs, very fast cpu (>250 MHz), lots of storage
(> 1 billion bytes), 1200 dpi laser printers, and high-speed
modems.
Today -
Networks being used routinely by many chemists. BITNET, Internet and other networks used by scientists a few times
per week. Some companies have internal networks for many of
their end-user PC's. Telecommunications speeds in the range
of 2400 - 9600 baud.
2000 -
Networks and e-mail used all of the time. Automatic interfacing
between all networks routine. Automatic logins for mail done
every day before the scientist comes to work. E-mail
automatically re-routed as you travel to meetings, holidays,
and home. Local-area and wide-area networks are widely available
within most organizations. Large databases more readily
available within organizations. Telecommunications speeds
in the range of 2.4 million + baud.
Today -
Programs in their infancy.
2000 -
Voice control for input with lots of graphics. Standards for graphics and data are common. IUPAC, CODATA, ASTM, ISO,
and other organizations agree on data transfer protocols.
Today -
Usage in its infancy.
Lack of compatibility.
Lack of standards.
FAX transmission in its infancy.
2000 -
Graphics software packages are widespread.
Graphics routinely sent electronically.
(Microsoft Chart)
PC's have built in FAX's for receiving and transmitting
chemical structures and tables of data.
Today -
Chemistry CD-ROM products are rare today.
Low density (600 MB) CD's.
e.g., Aldrich MSDS, Beilstein Current Facts, Canadian Toxicity
Databases, NIST Mass Spectrometry, Kirk-Othmer Encyclopedia,
CAS 12th Collective Index, Chapman & Hall - Dictionary of
Natural Products
2000 -
New products and high density CD-ROM's (6 Billion + Bytes)
Heilbron Dictionary of Organic Chemicals
CRC Handbook, CAS volume(s) on CD-ROM
CAS subsets (e.g., polymers, patents)
Beilstein subsets, Gmelin subsets
Collections of small numeric databases
(e.g., IR and MS databases from NIST)
Most Journals
Today -
Most information is raw, unprocessed, and un-evaluated.
CAS, Beilstein, VINITI - most abstracting and data extraction
is done in-house.
2000 -
Greater reliance on processed and evaluated data, such as
Beilstein, Gmelin, IUPAC data series, CRC Handbooks.
CAS, Beilstein, VINITI - economic factors will cause most abstracting and data extraction to be done by free-lance
workers at home. Articles and abstracts all sent
electronically from abstractor to abstracting service.
Today -
Popular feature for vendors. Results mailed to
customers or left for online downloading. [10]
2000 -
Popular feature for vendors. Results automatically sent
electronically to customers' PC via networks.
Today -
Still in the age of Bibliographic databases.
2000 -
Second generation of databases - numeric and factual data
overtake bibliographic databases in usage. Usage increases
as scientists realize need for (good) data for dry lab work
(modeling, etc.)
Today -
Lots of printed catalogs. A few catalogs on disk or CD-ROM (e.g. Aldrich)
2000 -
Catalogs on CD-ROM's. Users order directly over the phone from their labs. Ordering by credit card is routine.
Today -
CAS Registry Number reigns supreme.
2000 -
With chemical structures in all important databases, special
identification numbers have little use. Standard molecular
data formats allow for interfacing between all public and
private files.
Today -
Random Usage. Software used in teaching high school and
college chemistry does not come with textbooks, but as
separate products.
2000 -
Software integrated into textbooks [11]
(G. D. Wiggins, "Chemical Information Sources" [12])
PC floppy disk programs part of all undergraduate texts.
Chemical Information courses have PC based tutorials and practical online sessions.
Today -
Journal articles are almost the only socially acceptable form of communication and reward/promotion. Some scientific
manuscripts submitted in electronic form, but process is
neither widespread nor practical. Virtually all refereeing
done by postal system mail, with some done by FAX.
2000 -
Printed journals still predominate, but electronic data submissions, electronic journals, and software programs are now part of the reward/promotion system for academic, government, and industrial chemists.
Leading journal publishers use electronic submissions to speed up processing of publications, ease data extraction,
and improve overall quality.
Electronic (FAX and e-mail) peer review predominates.
Today -
Every instrument has a computer & most have different
computers and operating systems.
No universal interfacing.
ASTM and instrument manufacturers discussing standards.
2000 -
Everyone has a computer & there are universal protocols for input and output.
Data readily shipped to other computers for identification
and analysis.
The heart of the matter is computer literacy. Growing up
with, being familiar with, and making regular use of computers
and computer systems of information will not become the norm and
"triumph" (à la Max Planck) without the necessary atmosphere and
background being part of your upbringing. As mentioned in the
introduction, the initial use of computers by chemists (and other
scientists) was limited to performing simple calculations. Hence
it is no surprise that the area of chemistry in which computers
have been used is primarily computational chemistry. But the
usage even in this area is low. As Casale and Gelin [4] point
out, "as a scientific discipline, computational chemistry is in
its infancy". The current state of education in many parts of
the world will make further usage difficult. However I would
hope that in college and graduate school there would be
sufficient competence to train the upcoming generation of
chemists to become very familiar with computers, through the
introduction of computer application courses taught by chemists
in chemistry departments. Without an increase in the level of
computer literacy the remaining issues are pretty much
irrelevant.
There are two facets in using computers: writing programs
and using programs. The writing of programs is really a rather
limited issue. A computer is a tool. When a chemist gets too
involved in the tool then he or she is, more often than not, no
longer doing chemistry. What matters is using programs. To do
this effectively and properly you need to know what a computer
can do for you in the area in which you need to solve a problem.
I don't need to be an automotive engineer to know that to get
somewhere by car, I need a car, and need to know how to drive it,
and know where I am going. The same is true with computers.
Understanding what a computer can, or cannot, do is the important
step. Then either finding software and hardware to do it, or
getting someone to produce what is needed to get the job done, is
relatively simple. I believe that virtually no chemists use
computers as an end in themselves and that chemists should use
computers as one of many tools to do their job, but only if the
computer is the most effective way of doing the job better and
more efficiently, and not a barrier.
Most chemists use computers only for administrative purposes (like writing a manuscript which may or may not include chemical diagrams). I would argue that the reason for the lack of extensive use of computers is that the majority of computers (PC's of the 8086, 8088, 286, and 386 vintage) which are readily available to the chemist are of insufficient capacity and capability to do effective work other than word processing, structure drawing, and spreadsheet calculations. (Without suitable computers being available, moreover, there has been no incentive to develop the software for chemists.) Until very recently the computers with the necessary cpu speed (e.g., 486 cpu PC's and DEC, Sun, Silicon Graphics, and other workstations) and available disk space to do a variety of scientific applications (modeling, quantum chemistry calculations, spectral interpretation and prediction, database searching of spectral data, image analysis, etc.) were much too expensive for most individual chemists to have on their desks or in their labs. As little as 2 years ago a computer with an Intel 286 cpu and 40 MB hard disk was considered a state-of-the-art computer system. Almost nobody with a PC would keep a mass spectral database [13] and search system, requiring some 23 MB of hard disk space, on a computer system with a 40 MB hard disk. Today, to run a modern PC operating system (using DOS 5 and the Windows or OS/2 operating systems) one needs at least an Intel 486 cpu (with a 50-66 MHz clock) and 300-500 MB of disk space. A recent article from a monthly computer magazine [14] added up the disk storage requirements for a little over a dozen pieces of popular business-oriented software, and the total came to almost 100 MB, not including any of the disk space required for program swapping or any of the space needed for files of data and information.
In the next few years chemists will be able to replace
existing low power (e.g., 286 or equivalent type of PC) computers
or buy new ones with the computer power of an Intel 486/586 cpu
(or its Sun, Silicon Graphics, DEC, or other equivalent) with
sufficient disk storage space to readily run complex and powerful
programs.
The low usage of computers by chemists in the recent few
years may be attributable to the lack of affordable adequate
hardware, but it will take a number of years for this new and
more powerful hardware to work its way into the system and into
everyday use. Furthermore, unless software prices follow those
of hardware, it is difficult to believe that many chemists will
pay $2000 for a computer system and then spend thousands of
dollars for additional software packages. Only low cost, high
volume software is likely to succeed in the future. An
experiment in mass marketing to the chemistry community is now
being undertaken by Autodesk, which is hoping to increase the
number of scientists using PC-based molecular modeling packages
from a world-wide total of 5,000 to 100,000 or more [4].
Included in Autodesk's effort is a multi-million-dollar grant
program for encouraging university use of their HyperChem
molecular modeling software product.
Computers are used for electronic communication by a small,
but growing number of chemists. Among the reasons for the low
usage are the lack of modems and dedicated phone lines as well as
the difficulty in finding where people are located or information
is located and initiating communication. There is also the lack
of computer addressees on the necessary computer networks
(Internet, BITNET, Sprintnet, Compuserve, etc.) and the problem
of connecting between networks. If I want to telephone someone
in another city or country I need only get the phone number from
a telephone operator at a price (except for unlisted numbers).
With computer networks, there is no readily available phone book,
no operator. Practically all numbers (actually computer network
addresses) are unlisted. That does present (using a good
chemical phrase) an energy barrier to solving a problem. However,
I can see changes coming. A few years ago a business card had a
name, title, address, and phone number. Today many business
cards have FAX and Internet addresses. It is even possible in
many cases to access a computer address online, although this is
only starting to be widely used. This is part of computer
literacy. This is progress. I believe it will still take years
for chemists to make routine use of Internet and the related
networks connected to it. From discussions with a number of
people, the estimated usage of Internet by chemists was in the
range of 10-15% of those who have a computer and can access
computers outside their organizations [5]. This number is quite
similar to that found for the current level of electronic
communication of chemists belonging to the COMP division of the
ACS, noted below. At present about 1/3 of the use of Internet
is for e-mail [15]. Perhaps even more interesting is that up to
20% of the Internet traffic in the USA is flow-through traffic
between Europe & Asia [15]. While the overall number of
computers connected to Internet (estimated at more than 727,000
as of January 1992 [16]), the volume of usage (traffic) on
Internet [17] and other statistics about Internet are available
[17], these numbers do not address the question of micro-usage or
end-usage. While the number of computers and the number of users
with accounts on these computers can be reasonably estimated, it
is not possible at present to determine how many people are
actually using Internet and what their usage level is. It is
possible that only a few hundred of the hundreds of thousands of
computers and users are generating large percentages of the total
volume of use of Internet.
Electronic mail or e-mail is slowly (due to a lack of
knowledge about it) becoming a new type of network for chemists,
as well as other scientists [18]. It is not an "old-boy"
network or "invisible college" because it allows anyone accessing
these systems to be an "equal" of anyone else on the system.
Electronic bulletin boards, discussion groups, and news groups,
dealing with all subjects, are slowly sprouting up everywhere in
all areas, each with dozens to hundreds of users [19]. Of almost
800 such news groups surveyed by Kovacs, fewer than 100 are in the
physical sciences and of these only 20 are in chemistry [20].
Again a small percentage is observed when the topic relates to
chemistry. A few will be mentioned here. For the area of
chemical information these include the Chemical Information
Sources Discussion List managed by Gary Wiggins at Indiana
University (CHMINF-L@IUBVM.UCS.INDIANA.EDU or CHMINF@IUBACS -
with about 425 subscribers with a maximum volume of 5 messages
per day), the Chemical Education News Group managed by Bill
Halpern at the University of West Florida (CHEMED-L@UWF.BITNET),
and the Computational Chemistry News Group managed by Jan
Labanowski at the Ohio Supercomputer Center at Ohio State
University (CHEMISTRY@OSC.EDU - with about 1000 subscribers and
with a volume of 10 messages per day). In a 1990 survey [21] it
was estimated that some 10% of the overall working population in
the USA and Canada uses e-mail systems, vs. only 1.3% in Europe.
The ACS Computers in Chemistry (COMP) Division now distributes
its newsletter via e-mail on Internet, as well as hard-copy. In
mid-1992 a little less than 10% of the COMP members received the
newsletter electronically [22], a number comparable with the
survey mentioned above. The survey estimated that by 1994 usage
in North America would grow to almost 29%, while in Europe
usage would expand to just under 5%. Certainly there are
cultural differences between those two areas in the use of the
telephone and modems, but the European PTT's and their policies
add to the difficulty of use. I would expect the e-mail usage in
chemistry to be higher today, and that e-mail will become a
necessity by the end of this decade. The low cost, ease of use,
and ability readily to send written information to colleagues
around the world make this an ideal replacement for the existing
"old-boy" network of phone calls, meetings, and letters. For
anyone, from a Nobel prize winner to an undergraduate student, to
be able to communicate freely and easily, and to see what topics
and areas are of current interest should improve scientific
communication and research work. E-mail will be able to reduce
the time for papers to be sent back and forth. Once the graphics
problem is solved (both the technical standard for graphics and
the speed of transmission of graphics) and put into practical
use, e-mail will allow for real electronic journals [23]. This
area has a great and important future for chemists throughout the
world.
Another major problem with computer programs is the
difficulty associated with their use. Pacman and Nintendo (the
popular video games of the 1980's and early 1990's) never came
with manuals. Some manuals seem more designed for weight lifting
than explaining how to use a particular computer program.
Rarely are manuals available in computer-readable form [24]. In
computerized form, a manual could be searched for any word of
interest. Installing and
running programs is a major energy barrier for most people. My
philosophy is that if I must read the manual to use the computer
program, I probably am better off without it. There is no way
someone can become and remain proficient with a wide variety of
programs, remembering what each does and how to perform
particular tasks, as well as doing their assigned job as a
chemist. Few people use their VCR's to record TV shows, because
they cannot figure out how to do it. This even created a market
for a device which automatically sets up the VCR to record based
on a set of 5 digits you type into the device. The 5 digits are
published in newspapers in the USA every day next to each TV
program listing.
Table 1 speaks of today's interfaces as being frightful and
difficult. If people are not comfortable with a tool they will
not use it [3]. As computers become more powerful and better
software engineers graduate and get jobs, one can only hope and
expect that the interfaces in the year 2000 will become
transparent and even voice based [25]. One way to accomplish
this is through the extended use of graphics in computers. Today
the use of high resolution graphics (1024 x 1024 pixels) is low.
Color screens are small (12 - 14 inches) and expensive. By
the year 2000 I would expect that every computer will have a 20
inch or larger color monitor with at least 2048 x 2048
resolution, along with a color laser printer or plotter with the
same capabilities. With the cost of hardware decreasing this
equipment should be available to most scientists in the coming
decade. Related to the need for high-resolution
graphics is the problem of how to transmit all the information
quickly enough to be of practical use. Today's modem speeds of
2400 - 9600 baud are much too slow for graphics to be practical.
With the current trend of better networks and telecommunications,
it seems reasonable to believe that the speeds of transmissions
needed for chemistry graphics will be available in the next few
years.
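To make the point about modem speeds concrete, the time needed to send a single full-screen image can be estimated as below. This is a minimal sketch under assumptions that are not in the text: the image is sent uncompressed, each pixel takes 8 bits (256 colors), each baud is treated as carrying one bit per second, and protocol overhead is ignored. The function name is hypothetical.

```python
def transfer_seconds(width_px: int, height_px: int,
                     bits_per_pixel: int, line_rate_bps: float) -> float:
    """Seconds needed to send one uncompressed image over a serial line.

    Assumes one bit per baud and no start/stop bits or protocol overhead,
    so real transfers would be somewhat slower than this estimate.
    """
    total_bits = width_px * height_px * bits_per_pixel
    return total_bits / line_rate_bps


if __name__ == "__main__":
    # 1024 x 1024 screen (the resolution quoted in the text), 8 bits per pixel.
    for rate in (2_400, 9_600, 2_400_000):
        seconds = transfer_seconds(1024, 1024, 8, rate)
        print(f"{rate:>9} bit/s: {seconds:10.1f} s ({seconds / 60:.1f} min)")
```

Under these assumptions one screen takes close to an hour at 2400 baud and about a quarter of an hour at 9600 baud, but only a few seconds at the 2.4 million baud predicted for the year 2000.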
In the area of chemical identification it has taken some 20
years for the CAS (Chemical Abstracts Service) Registry Number
(CAS RN) to be used widely and routinely in databases and in
searching for chemicals. While the CAS RN now reigns supreme for
chemical identification, it suffers from the lack of any inherent
intellectual value; it is, like the US Social Security number, an
idiot number (notwithstanding its check digit), assigned
sequentially over time. A larger number just means it entered
the CAS Registry system more recently. In the past few years
optical scanning devices, coupled with advances in character and
vector recognition, have led to the development of computer
programs (see, for example, the work of Johnson et al. [26])
which are able to scan articles from the scientific literature
(or from internal research reports) and extract chemical
information, including connection tables of chemical structures
and chemical reaction data (such as solvent, temperature of
reaction, etc.). Kekulé, a similar but less ambitious program
[27], is able to scan a structure and create a connection table.
Both of these systems will make adding connection tables to
databases much less labor-intensive than in the past.
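The check digit mentioned above is the only part of a CAS RN that can be verified without a database. The sketch below uses the commonly documented rule: the digits before the check digit are weighted 1, 2, 3, ... from right to left, and the check digit is the weighted sum modulo 10. The function names are hypothetical, and the code checks only the format and arithmetic, not whether the number has actually been assigned.

```python
def cas_check_digit(body_digits: str) -> int:
    """Check digit for the digits of a CAS RN that precede it (no hyphens)."""
    total = sum(weight * int(digit)
                for weight, digit in enumerate(reversed(body_digits), start=1))
    return total % 10


def is_valid_cas_rn(cas_rn: str) -> bool:
    """True if `cas_rn` looks like NNNN-NN-N and its check digit matches."""
    parts = cas_rn.split("-")
    if len(parts) != 3:
        return False
    if not all(p.isascii() and p.isdigit() for p in parts):
        return False
    first, second, check = parts
    if not (2 <= len(first) <= 7 and len(second) == 2 and len(check) == 1):
        return False
    return cas_check_digit(first + second) == int(check)


if __name__ == "__main__":
    for rn in ("7732-18-5", "64-17-5", "7732-18-4"):
        print(rn, is_valid_cas_rn(rn))
```

For water (7732-18-5) the weighted sum is 8x1 + 1x2 + 2x3 + 3x4 + 7x5 + 7x6 = 105, which gives the check digit 5. The number passes, yet nothing in it says the compound is water, which is the limitation described in the text.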
In spite of the wide use of the CAS RN in chemistry, and
particularly in chemical regulation by the EPA (Environmental
Protection Agency) [28], chemical names are also still very
widely used for administrative and regulatory purposes. In fact
the recently developed AUTONOM program [29] was initially
conceived for internal processing at the Beilstein Institute for
their handbook and database work. AUTONOM takes most (>75%) [30]
chemical structures and creates an IUPAC approved name for that
chemical structure. Its administrative value for internal and
regulatory purposes is such that it is now a commercial package
[31]. Thus while there will be a need for chemical names and
registry numbers, the primary need will not be a scientific one.
The ease of creating large databases of chemical structures,
along with the efforts underway to create standard molecular data
descriptions of molecules (e.g., the SMD (Standard Molecular
Data) [32] and STAR (Self-defining Text Archive and Retrieval)
[33] projects) and the increased ability to send large volumes of
data over networks at high speeds, make it seem reasonable to
predict that the use of the CAS RN for searching for a chemical
will decrease over time. One of the major drawbacks of the CAS
RN (and the Beilstein Registry Number as well) is the lack of
these numbers in the private and generally confidential files of
companies. It is not possible to use an internal identification
number to search public files and vice versa. Only the chemical
structure itself, when used as the "search term", will be a
practical way to see if a chemical is in another database. As
different organizations represent their structures slightly
differently, only the advent of a standard molecular
representation or an interchange program (such as the recently
developed program ConSystant [34]) will allow a user to readily
search for related structures in another database of chemical
structures.
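The idea of using the structure itself as the search term can be illustrated with two connection tables that number the same molecule differently. The following is a minimal sketch, not the method of SMD, STAR, or ConSystant: the table layout and function names are assumptions made for illustration, hydrogens are left implicit, and the brute-force matching is practical only for very small molecules (production systems rely on canonical numbering schemes instead).

```python
from itertools import permutations


def make_table(atoms, bonds):
    """Build a connection table: (element list, {atom-pair: bond order})."""
    return list(atoms), {frozenset((i, j)): order for i, j, order in bonds}


def same_structure(table_a, table_b) -> bool:
    """True if the two tables describe the same graph of atoms and bonds."""
    atoms_a, bonds_a = table_a
    atoms_b, bonds_b = table_b
    if sorted(atoms_a) != sorted(atoms_b) or len(bonds_a) != len(bonds_b):
        return False
    n = len(atoms_a)
    # Try every renumbering of A's atoms onto B's atoms (factorial cost).
    for perm in permutations(range(n)):
        if any(atoms_a[i] != atoms_b[perm[i]] for i in range(n)):
            continue  # elements do not line up under this renumbering
        remapped = {frozenset(perm[i] for i in pair): order
                    for pair, order in bonds_a.items()}
        if remapped == bonds_b:
            return True
    return False


if __name__ == "__main__":
    # Ethanol (heavy atoms only), numbered two different ways.
    ethanol_1 = make_table(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
    ethanol_2 = make_table(["O", "C", "C"], [(0, 1, 1), (1, 2, 1)])
    # Dimethyl ether: same atoms, different connectivity.
    ether = make_table(["C", "O", "C"], [(0, 1, 1), (1, 2, 1)])
    print(same_structure(ethanol_1, ethanol_2))
    print(same_structure(ethanol_1, ether))
```

The two ethanol tables match even though no identification number is shared between them, while dimethyl ether, which has the same atoms, does not match. This is the behavior a private file and a public file would need in order to be searched against each other.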
Most of the data and information in the major chemical
databases of the world are raw and unprocessed. The two largest
collections, those of CAS [35] and VINITI [36] are bibliographic.
In these two databases, whatever the author says is accepted at
face value. Since almost all of the papers abstracted are
refereed, either the author's abstract is used or CAS or VINITI
write an abstract, based on the information provided in the
publication. Only the Beilstein and Gmelin databases perform
some measure of evaluation, although most of this work is really
extraction of information. For example, in the Beilstein
Handbook, online database, and CD-ROM Current Facts, the data are
extracted. In the past there were some additional efforts made
to assure that enough information was published in the original
work to guarantee the work could be reproduced, and not every
chemical reaction or piece of data was used by Beilstein. The
Beilstein staff has never had the financial capability to
evaluate the very large volume of data they process. Such data
evaluation is rare, with the most well known example being that
of the US NIST/SRD (National Institute of Standards and
Technology, Standard Reference Data Program). Beilstein
performed a "second" and valuable peer review, albeit too late to
keep questionable or poorly defined or unexplained science from
being published. In any event, today, due to the high costs of
labor, both the Beilstein Institute and VINITI have fewer in-house staff than in past years and rely more on part-time outside
workers. With some 65% of the costs at CAS being labor, it is
reasonable to believe that CAS will be moving in this direction
again. (CAS once had almost all of its abstracting done by
outside chemists.)
CD-ROM's are hardware devices that are just beginning to
find use in chemistry. Again the problem of the lack of good
software, adequate computer hardware, and available databases has
limited the growth and use of this medium. While most of the
422 [37] educational science libraries in the USA have CD-ROM
drives, it is estimated that fewer than 1% of the computers in
chemistry labs have a CD-ROM drive [38]. This estimate is
supported by a non-scientific, non-systematic request for
information which the author sent to the approximately 400
subscribers to the Chemical Information News Group (see above).
The request resulted in two responses, one from Exxon and one
from Rutgers (Chemistry and Physics Departments) [39]. In both
cases the information specialists who replied indicated they
knew of no end users in their organizations who had CD-ROM
drives on their PC's. In
addition to this survey, a number of vendors of CD-ROM's were
contacted. All considered the sales and types of users to be
confidential information. None kept track of the type of users
who were buying their products. In one case, that of the Aldrich
Chemical catalog on CD-ROM, it was learned that while sales of
the CD-ROM version (at $25 per CD-ROM) were under 1000 copies,
the company publishes 2.7 million copies of its printed catalog
[40], which are distributed free of charge. Even in an area where
computers are
used more routinely in chemistry, namely computational chemistry,
fewer than 10% of the customers using the molecular modeling
software program SYBYL have requested to receive their software
update on the CD-ROM offered by the vendor [41]. This could be
compared to the computer science community, where more than 80%
of the users of Sun workstations receive their software and
documentation on CD-ROM [42]. Thus chemists clearly have a long
way to go before they become as comfortable with this medium as
computer programmers and computer systems staff are. At present
I would summarize the state of the chemists' use of CD-ROM as in
its infancy, but I strongly feel that the growth curve for CD-ROM
usage is likely to be exponential in the coming years, as
evidenced by the use of CD-ROM in other fields.
Even for those who have a CD-ROM drive, the small size of
the market often leads to databases which are not updated. No
doubt owing to the lack of sales, even the Microsoft Bookshelf
CD-ROM, which contains an almanac, dictionary, thesaurus, US Zip
code directory, and other databases hasn't been updated in 5
years [43]. CD-ROM's, which today store about 660 million
characters (about 330,000 pages of text), will, by the year 2000,
replace many reference books and chemical catalogs on the
chemists' bench and bookshelf. A few pioneers in this area, such
as the Beilstein Institute in Frankfurt, Germany, are leading the
way to what will clearly be the library of the future. The
Beilstein Current Facts CD-ROM has about one year of extracted
data from the literature (without author names, titles, or
abstracts), along with a computer chemical structure search
system, all neatly collected on a single CD-ROM. Someday, the
weekly issue of Chemical Abstracts will come to each chemist this
way. Each chemist will have the Merck Index, CRC Handbook of
Chemistry and Physics, ACS Directory of Graduate Research, and a
few ACS journals, all on CD-ROM's. By the year 2000 it should be
possible to custom order a set of books on CD-ROM. For example,
the ACS Symposium Series of several hundred books could be
entered into computer readable form and then books "printed" on a
CD-ROM on demand, the same way floppy disks are copied today.
Using keywords or phrases you could select a set of books you
might want on your bookshelf (actually your CD-ROM jukebox
device), and send the order for such a disk to be mastered and
mailed to you. Certainly custom made orders would be more
expensive than pre-packaged ones, but, if marketed and priced
favorably, should be well within the means of most chemists.
Groups of chemists, such as the polymer or materials chemists,
could create their own CD-ROM's based on existing volumes already
printed. IUPAC could create a CD-ROM of Pure and Applied
Chemistry. The list is almost endless.
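The capacity figure quoted above can be checked with a little
arithmetic. The short Python sketch below is an illustration
only; it divides the two numbers given in the text to show the
page size they imply. The variable and function names are
assumptions made for this example.

```python
# Figures quoted in the text for a present-day CD-ROM.
CHARACTERS_PER_DISC = 660_000_000
PAGES_PER_DISC = 330_000


def characters_per_page(characters: int, pages: int) -> float:
    """Return the average characters per page implied by the two totals."""
    return characters / pages


implied_page_size = characters_per_page(CHARACTERS_PER_DISC, PAGES_PER_DISC)
print(f"Implied page size: {implied_page_size:.0f} characters")
```

The two figures are mutually consistent at 2,000 characters per
page of plain text.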
The last specific topic to be covered in this paper is the
area of books, journals, and online chemical information. In the
online area it can be seen from the current usage of scientific
and technical databases that the current generation of chemists is
not very familiar with computers and chemical information. The
costs of searching the chemical literature (including the various
charges of connect time, search hits, printouts, and so forth)
are high, averaging well over $100 per hour connected to a host
mainframe computer. Compared to browsing through a book,
journal, or an issue of the printed Chemical Abstracts, this is
expensive. Most of the information is not evaluated. The
details of the chemical synthesis method or the properties of a
molecule or material are either not in the abstract or need to be
found by reading the journal article or book chapter. With high
fixed expenses in the creation of the information, due to the
fact that abstracting and indexing is and, I believe, will always
be a very labor intensive effort (even with such expected
developments as the potentially useful software of Johnson et
al. [26]), there are two ways to recover the costs: either
charge a lot of people small sums of money or charge a few people
a lot of money. The chemical information industry, for the most
part (and there are a few exceptions), has decided to opt for
high prices. The results are what most would expect. Few of the
hundreds of thousands of chemists referred to in the beginning of
this article use computerized databases. Few subscribe to weekly
literature searching (Selective Dissemination of Information -
SDI) of online databases. The reason is primarily economic.
Schools and even many companies cannot afford to have hundreds of
chemists spending such large sums of money on literature and
related online searching. Hopefully some of the database and
vendor companies will begin to experiment with the notion of
marketing to the thousands of potential users waiting for
reasonably priced products. Years ago many people had personal
subscriptions to sections of Chemical Abstracts, to journals, and
so on. Will the computer revolution in general and CD-ROM's in
particular cause history to come full circle? I believe by the
year 2000 this is a distinct possibility if there are changes in
the way in which vendors market their products. While books will
never disappear from the chemist's desk, I think CD-ROM will
become the preferred medium of distribution and use in many areas
of chemical information. These areas include reference works,
collections of books and articles on a particular subject, as
well as chemical catalogs of supplies, and software updates.
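The pricing argument above can be made concrete with a small
calculation. The Python sketch below is an illustration only: the
fixed-cost figure and the user counts are hypothetical numbers
chosen for the example, not data from any database producer, and
the function name is likewise an assumption. It shows the price
each user would have to pay for a product with high fixed costs
to break even.

```python
def break_even_price(fixed_cost: float, users: int,
                     cost_per_user: float = 0.0) -> float:
    """Annual price per user at which total revenue equals total cost."""
    if users <= 0:
        raise ValueError("users must be a positive number")
    return fixed_cost / users + cost_per_user


# Hypothetical annual cost of abstracting and indexing a database.
HYPOTHETICAL_FIXED_COST = 10_000_000.0

for users in (1_000, 10_000, 100_000):
    price = break_even_price(HYPOTHETICAL_FIXED_COST, users)
    print(f"{users:>7} users -> ${price:,.0f} per user per year")
```

With the per-user cost left at zero, a hundredfold increase in
the number of users lowers the break-even price a hundredfold,
which is the "many people, small sums" option described above.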
As for computer-based journals, as stated in Table 13,
publishing in a printed journal is now the socially accepted
means of communication and leads to rewards and promotions.
While the means of communication can easily change, the social
reward situation is quite different. Universities and most other
organizations which have peer review use refereed scientific
journals very heavily in their evaluation criteria. While I feel
my career has not suffered due to the software and databases I
have written and developed, I do not think this is the usual
case. Experiments in journals which have a substantial portion
of their activity in non-hard copy form are now starting to
appear. One such case was Tetrahedron Computer Methodology (TCM)
[44]. This journal died after about 4 years, owing to a variety
of technical and nontechnical reasons. A new partly online
journal, the Online Journal of Current Clinical Trials, has
finally gotten off the ground and is now available [45, 46].
This journal has more institutional support than TCM, and so it
may make inroads in this area. Additionally there is another new
journal, Protein Science [47], a biochemistry journal which
started publishing in January 1992. Protein Science comes
with a floppy disk of graphics, which the journal calls
"kinemages". In any event, I can see that these experiments,
coupled with better delivery mechanisms (for chemistry this is
primarily software for the transmission and viewing of graphics),
will by the end of the decade lead to a few journals making real
headway toward automated journals for the chemical community.
ECONOMIC ISSUES [48]
The recent recession in a number of developed countries of
the world has led to the re-examination of how to sell products.
When people don't fly, airlines lower their fares to fill seats.
When people don't buy automobiles, General Motors, Ford, and
Chrysler, along with foreign car companies, lower their prices to
stimulate sales. When hotels have occupancy rates below 50% and
need 65% occupancy to at least break even financially, they
offer cheap rooms. There are many more examples outside the
chemical information area, but it should suffice to state that
the Japanese domination of the consumer electronics industry
clearly shows that lower prices lead to higher volumes and
generally higher profits. As examples in chemical information, I
need only mention such publications as the 11th edition of the
Merck Index [49] (priced at $30) or the CRC Handbook of Chemistry
and Physics [50] (priced at $100), now in its 73rd edition. Both
of these products sell tens of thousands of copies.
In chemical information there seems to be a pervasive
attitude that information is valuable and prices must be high.
Information is no doubt valuable, as evidenced by state and
corporate intelligence gathering. In 1978 the total annual
online information (scientific and non-scientific (primarily
legal) information) revenues were about $40 million [51]. By
1990 this had grown to an annual rate of $690 million. The most
successful computer chemistry software company, Molecular Design
Ltd (MDL) of San Leandro, California, in roughly the same period
of time has seen revenues go from $0 to about $50 million per
year. Molecular modeling companies, of which there are at least
a half dozen, together probably have total annual revenues of
less than current MDL sales. (Sales revenues are based on
software sales and exclude hardware and consulting/consortium
groups.) Compared with other industries and especially compared
to other areas of the computer industry, these revenues are
rather low and unimpressive [52]. I would
hope that companies in this field will begin to experiment with
new marketing approaches which will both increase the usage of
their products and reach a larger segment of the chemistry
population. The Autodesk effort with its HyperChem software is
one bright example in a gloomy field. Without a greater volume
of usage it is possible that information will remain a commodity
for only a small portion of the chemical community.
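The growth in online information revenues quoted above can be
restated as an average annual rate. The Python sketch below
assumes smooth compound growth between 1978 and 1990, which is a
simplification; only the two revenue figures come from the text
[51], and the function name is an assumption made for this
example.

```python
def compound_annual_growth(start: float, end: float, years: int) -> float:
    """Average yearly growth rate that takes `start` to `end` in `years` years."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return (end / start) ** (1.0 / years) - 1.0


# Online information revenues in millions of US dollars, from the text.
rate = compound_annual_growth(start=40.0, end=690.0, years=1990 - 1978)
print(f"Average growth, 1978-1990: {rate:.1%} per year")
```

Under this assumption the industry as a whole grew by a little
under 27% per year over the twelve-year period.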
SUMMARY
I believe there are two main reasons for the low use of
computers and computer systems by chemists - cost and ease of
use. The economics of chemical information have, up to this
point in time, made it a tool for a few users: the wealthy in the
more developed nations of the world and the wealthier companies
in those countries. Easier-to-use computer systems will, in
the long run, generate more usage. This should, in turn, lower
individual computer costs. (The classic chicken and the egg
situation.) I believe that with the current and upcoming
generation of hardware with the power of an Intel 486 (at 50-100
MHz) or an equivalent UNIX-based Sun or DEC or Silicon Graphics
workstation, software can be designed and implemented which will
have two main features. The software will be reasonably easy to
use (so that the user still remembers, the next day or week, how
to use the program or database system being accessed) and
powerful enough to do the job that actually needs to be done. By
powerful I mean that the software will have the necessary
"user-friendly" interfaces (graphics, mouse, voice command, and
so on), and have some AI (artificial intelligence) capability and
knowledge of the subject to assist the end user in getting his or
her job done.
However, without close cooperation between software developers
and database producers and their end users, this will not happen.
Both the software and databases need to be properly designed to
meet the actual end-user needs, not the needs which the vendors
perceive the users have. Talking to, and more importantly,
listening to, the customer or end user, is something the chemical
information and related industry will have to come to grips with
in the next few years if real and substantial progress is to be
made for both parties.
Computers and the related technology described in this
article hold the promise that by the 21st century more
chemical information and computer systems will be available to
the entire world-wide community. With larger numbers of users
this should allow the costs of the products being developed to be
spread across a much larger number of people, leading to higher
usage, higher productivity, and lower costs for all
computer-related products.
Acknowledgements
The author wishes to acknowledge a number of colleagues who have provided information and comments on this paper. These include:
Bob Badger (Springer-Verlag)
Mike Bowen (ACS)
Pierre Buffet (Questel)
Chuck Casale (Aberdeen Group)
Hideaki Chihara (JAICI)
Thomas Clerc (Bern)
Harry Collier (Infonortics)
Larry Dusold (FDA)
Tom Greeves (Daratech)
Richard Hong (Hawk Scientific)
Sandy Lawson (Beilstein)
Dave Lide (CRC)
Bill Milne (NIH)
Glen Ouchi (Brego)
Kris Pettersen (Autodesk)
Tom Pierce (Rohm & Haas)
Rudy Potenzone (CAS)
Craig Shelley (SoftShell)
Steve Schultz (Aldrich Chemical)
Babu Venkataraghavan (Lederle Labs)
Wendy Warr (Wendy Warr & Associates)
Gary Wiggins (Indiana University)
Joanne Witiak (Rohm & Haas)
Chezi Wolman (Hebrew University)
Hugh Woodruff (Merck)
[1] Based on lectures given at the 10th International Conference on Computers in Chemical Research and Education (ICCCRE), Jerusalem, Israel, July 1992 and the Second International Conference on Computer Applications to Materials and Molecular Science and Engineering, Yokohama, Japan, September 1992.
[2] Planck, M., "Scientific Autobiography and Other Papers", Williams & Norgate, London (1950), pages 33-34.
[3] Tom Greeves, Daratech Inc., 140 6th Street, Cambridge, MA 02142. Phone: 617-354-2339.
[4] Casale, Charles T. and Gelin, Bruce R., "Growth and Opportunity in Computational Chemistry", 1992, The Aberdeen Group, 92 State Street, Boston, MA. Phone: 617-723-7890, FAX: 617-723-7897. There is also a more extensive 1989 report published by the same organization entitled "Conflicting Trends in Computational Chemistry".
[5] Ouchi, G., Brego Research, private communication. R. Venkataraghavan, Lederle Labs, private communication. T. Pierce, Rohm & Haas, private communication. J. Witiak, Rohm & Haas, private communication. H. Woodruff, Merck Labs, private communication. Low usage is defined as 0-25%, moderate usage as 26-49%, high as 50-75%, and very high as 76-100%.
[6] The information on the number of chemists appears to be inconsistent. In 1990 the US Bureau of Labor Statistics reported there were 125,000 chemists in the USA. According to a 1990 survey, ACS members with chemistry degrees numbered about 137,000. The same survey indicated that the number of ACS members who majored in chemistry was 111,000. In 1992 there were about 145,000 members of the ACS. Lastly, the 1988 Kline report to the ACS stated that there were 213,000 chemists in the USA and 137,000 chemical engineers. Among the reasons the Kline report was commissioned by the ACS was to find out how many potential members there would be for ACS membership. Mike Bowen, Director, ACS Membership Division, private communication.
[7] For example, Merck & Co, a drug company of over 35,000 employees (not all of whom are scientists) has in excess of 10,000 PC's in total. Hugh Woodruff, private communication.
[8] Tim Brogan, Pharmaceutical Manufacturers Association, 1100 15th Street, NW, Washington DC 20005. Phone 202-835-3400.
[9] Attempts to find evidence or even an estimate of the number of chemists who have PC's have proven futile. The few firms that have done market surveys in this field or for computer sales in general, such as the Aberdeen Group [reference 4], as well as Daratech, Inc., Dataquest, and International Data Corporation, had no information, nor did they have any idea where to get such information.
[10] In October 1992 STN began to deliver SDI results electronically, but only to an STNmail ID. STNews, 8, #10, page 1, October 1992, North American Edition.
[11] In the PC computer field a regular flow of books comes with computer disks. These disks are intimately related to the contents of the book. For example, a disk of DOS enhancement programs comes with the Dvorak book on DOS and PC performance. Dvorak, J. C. and Anis, N., "Dvorak's Inside Track to DOS & PC Performance", ISBN: 0-07-881759-5, $39.95. Osborne McGraw-Hill, 1992. 2600 Tenth Street, Berkeley, CA 94710. Phone: 510-549-6600, FAX: 510-549-6603.
[12] Wiggins, G. D., "Chemical Information Sources", McGraw-Hill Series in Advanced Chemistry, ISBN: 0-07-909939-4, McGraw-Hill, New York, 1991. This book includes a "Chemistry Reference Sources Database" of 2156 records plus the Pro-Cite search software for IBM PC's. Pro-Cite is available from Personal Bibliographic Software, PO Box 4250, Ann Arbor, MI 48106-4250. Phone: 313-996-1580; FAX: 313-996-4672.
[13] PC version of the NIST/EPA/NIH Mass Spectral Database, March 1992 Version. Available from NIST/SRD, Bldg 221/A320, Gaithersburg, MD 20899 (Phone: 301-975-2208; FAX: 301-926-0416). The price is $1200 for the database or $200 for those who had bought previous versions.
[14] PC World, page 210, August 1992.
[15] L. Dusold (FDA, Washington, DC), private communication.
[16] Lottor, M., "Internet Growth", RFC#1296, SRI International, Network Information Systems Center, 333 Ravenswood Avenue, Room EJ-291, Menlo Park, CA 94025. Phone: 415-859-6387, FAX: 415-859-6028, E-mail: NISC@NISC.SRI.COM.
[17] MERIT - Information Center for Internet. MERIT Network, Inc., 2901 Hubbard, Pod-G, University of Michigan, Ann Arbor, MI 48105-2016. Phone: 313-936-3000, e-mail: NSFNET-INFO@MERIT.EDU.
[18] See the recent series on electronic mail in the October 1992 issue of Spectrum magazine, the monthly publication of the IEEE. The articles in this issue include: a) Perry, T. S. and Adam, J. A., "E-mail: Pervasive and Persuasive", Spectrum, pages 22-23 (1992), b) Perry, T. S., "E-mail at Work", Spectrum, pages 24-28 (1992), c) Adam, J. A., "Playing on the net", page 29 (1992), and d) Perry, T. S., "Forces for Social Change", pages 30-32 (1992).
[19] One list of directories of academic e-mail conferences is available from the Kent State University file server. It was developed by Diane K. Kovacs and is copyrighted. It is available, via Internet, by ftp (file transfer protocol) from KSUVXA.KENT.EDU. The lists of directories are in the LIBRARY directory of the computer and are titled: ACADLIST.FILEx, where x is 1 to 7, depending on your area of interest (physical sciences, biological sciences, etc.). File7 is the file containing the news groups in the physical sciences and mathematics.
[20] There are new user groups being formed all the time. However it remains to be seen how long it will take for these to actually take hold and have staying power. Springer-Verlag, the company that distributes the Beilstein database, started a system for its users of the Beilstein database on the Compuserve computer system. After a year it was found that usage was too low to continue it. In the past few months user group conferences on HyperChem software, CHARMm software, Organic Chemistry (actually a restart of a system that died a few years ago), Amber software, BioSym software, and SYBYL software have been started up. IUPAC plans to initiate one in the near future for its many members and affiliates around the world. It will be interesting to see how many of these remain and in what form they remain in 1-2 years.
[21] BIS Strategic Decisions Global Electronic Messaging Service, 1991.
[22] Pierce, T., Rohm & Haas, private communication.
[23] Meadows, A. J., and Buckle, P., "Changing Communication Activities in the British Scientific Community", J. Doc., 48, 276-290 (1992).
[24] The WordPerfect Corporation makes its WordPerfect manuals available on disk, which can be searched for words using any word processor.
[25] Calem, R. E., "Coming Soon: The PC With Ears", New York Times, Business Section, page F9, August 30, 1992.
[26] Ibison, P., Johnson, A. P., Kam, F., Neville, A. G., Simpson, R., Tonnelier, R., and Venczel, T., "Automatic Extraction of Chemical Information from the Literature", page 25, Abstracts of the 10th ICCCRE, Jerusalem, Israel, July 1992.
[27] McDaniel, Joe R., and Balmuth, Jason R., "Kekulé: OCR - Optical Chemical (Structure) Recognition", J. Chem. Inf. Comput. Sci., 32, 373-378 (1992).
[28] EPA Order 2880.2, "Use of Chemical Abstracts Service Registry Data in ADP Systems", June 30, 1975.
[29] Goebels, L., Lawson, A. J., and Wisniewski, J. L., "AUTONOM: System for Computer Translation of Structural Diagrams into IUPAC-Compatible Names. Nomenclature of Chains and Rings", J. Chem. Inf. Comput. Sci., 31, 216-225 (1991). Wisniewski, J. L., "AUTONOM: System for Computer Translation of Structural Diagrams into IUPAC-Compatible Names. 1. General Design", J. Chem. Inf. Comput. Sci., 30, 314-332 (1990).
[30] The figure of 75% refers to chemicals published in the current organic chemistry literature. The program does not handle stereochemistry, charged species, inorganics, peptides, or sugars. A. J. Lawson, private communication.
[31] AUTONOM is an IBM-PC software program and costs $1980 (industry price) or $980 (academic price). It is available from Springer-Verlag Publishers, 175 Fifth Avenue, NY 10010. Phone: 212-460-1622, FAX: 212-533-5781.
[32] Barnard, J. M., "Draft Specification for Revised Version of the Standard Molecular Data (SMD) Format", J. Chem. Inf. Comput. Sci., 30, 81-96 (1990).
[33] Hall, S. R., "The STAR File: A New Format for Electronic Data Transfer and Archiving", J. Chem. Inf. Comput. Sci., 31, 326-333 (1991).
[34] ConSystant is an IBM PC-based DOS program available for $199 from ExoGraphics, PO Box 655, West Milford, NJ 07480-0655. Phone: 201-728-0188, FAX 201-728-0735.
[35] The American Chemical Society established the Chemical Abstracts Service in 1907. The printed Chemical Abstracts and the related Chemical Abstracts databases are available from CAS, 2540 Olentangy River Road, PO Box 3012, Columbus, OH 43210-0012. Phone: 614-447-3600, FAX: 614-447-3713. At present there are some 9.5 million abstracts in the CA computer readable database and about 11.5 million chemical structures with CAS REGN, in the structure file associated with the bibliographic database. There are some 17.5 million names associated with the 11.5 million structures.
[36] VINITI, The All Russian Institute of Scientific and Technical Information, was established in 1952. Since that time it has collected over 31 million source documents in all areas of science (not just chemistry). Of these there are some 11 million abstracts in computer readable form. Its main publication is "Referativnyi Zhurnal VINITI". VINITI is located at 20a Uslevitcha Street, Moscow 125219, Russia. Phone: 011-7-095-152-6163, FAX: 011-7-095-943-0060. VINITI distributes its products outside of Russia through Access Innovations, Inc., 4314 Mesa Grande S.E., Albuquerque, NM 87108. Phone: 505-265-3591, FAX: 505-256-1080.
[37] J. Comstock (Head, ACS Books Department, ACS Publications Division, Washington, DC.), private communication.
[38] Badger, R., Springer-Verlag, New York, private communication. Gary Wiggins, Chemistry Library, Indiana University, private communication.
[39] D. Johnson, Exxon Corporation, private communication. H. Dess, Rutgers University/Chemistry & Physics Library, private communication.
[40] Publication Department, Aldrich Chemical Company, 940 West St. Paul Avenue, Milwaukee, WI 53233. Phone: 414-273-3850.
[41] TRIPOS Associates, Inc., 1699 South Hanley Road, Suite 303, St. Louis, MO 63144. Phone: 314-647-1099 or 800-323-2960; FAX: 314-647-9241.
[42] Sun Microsystems, 2550 Garcia Avenue, Mountain View, CA 94043. Phone: 415-960-1300; FAX: 415-969-9131.
[43] Mitchell, J., editor, "The CD-ROM Directory 1991", 5th edition, Entry 1010, page 169, TFPL Publishing, 22 Peters Lane, London, EC1M 6DS, UK (Phone: 44-71-251-5522).
[44] Tetrahedron Computer Methodology (TCM) was published by Pergamon Press between 1988 and 1992.
[45] The Online Journal of Current Clinical Trials (CCT), a joint venture of the American Association for the Advancement of Science (AAAS) and the OCLC Online Computer Library Center, Inc. The price is $95 plus monthly telecommunication charges. For further information contact the journal at 1333 H Street, NW, Washington, DC 20005. Phone: 202-326-6446.
[46] On 28 August 1992 it was announced (Science, Volume 257, 4 September 1992, page 1341) that CCT [45] has linked up with the journal "The Lancet" so that "The Lancet" could publish a printed, abridged form of a current CCT article.
[47] This new journal was described by Stu Borman in an article in Chemical & Engineering News, February 17, 1992, pages 26-27.
[48] Heller, S. R., "The Economic Future of Numeric Databases in Chemistry", Proceedings of the 15th International Online Information Meeting, London, December 1991, pages 47-50. Published by Learned Information, Oxford, UK.
[49] The Merck Index, 11th Edition, S. Budavari, Editor, Merck & Co. Inc., Rahway, NJ 07065-0900 USA, 1989 (Phone: 908-594-4904).
[50] CRC Handbook of Chemistry and Physics, 73rd Edition, D. Lide, Editor, CRC Press Inc., 2000 Corporate Blvd., N.W., Boca Raton, FL 33431 USA, 1992 (Phone: 407-994-0555, FAX: 407-994-3625).
[51] Williams, M., "Highlights of the Online Database Industry", pages 1-4, Proceedings of the National Online Meeting, New York, May 1992. Published by Learned Information Inc., 143 Old Marlton Pike, Medford, NJ 08055 (Phone: 609-654-6226, FAX: 609-654-4309).
[52] For example, the sales figure of $50 million for MDL sales after 10 years can be compared with that of Lotus Development Corporation, which was $53 million in its first year (1983). To date over 9 million copies of Lotus 1-2-3 have been sold. Barron's magazine, September 14, 1992, page 12.